
Crawlers on Gemini and best practices

Stephane Bortzmeyer stephane at sources.org

Fri Dec 11 10:16:05 GMT 2020

- - - - - - - - - - - - - - - - - - - 

On Thu, Dec 10, 2020 at 11:37:34PM +0530, Sudipto Mallick <smallick.dev at gmail.com> wrote a message of 40 lines which said:

- ask for /bots.txt

Speaking of this, I suggest it could be better to have a /.well-known (or equivalent) directory to put all these "meta" files in. The Web does it (RFC 5785) and it's cool since it avoids colliding with "real" resources. (Also, crawling the geminispace shows strange robots.txt files which are probably "wildcards" or "catchall", created by a program which replies for every possible path. Having a /.well-known would make it possible to define an exception.)

It requires no change in clients (except bots) or servers; it is just a convention.
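A minimal sketch of the bot-side change, assuming a bot that looks for a meta file under /.well-known/ first and falls back to the current root location (e.g. /robots.txt). The function names and the fallback order are hypothetical, not an agreed Gemini convention. The code only builds and checks URLs; fetching them over Gemini is left out.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical prefix mirroring RFC 5785's /.well-known/ on the Web.
# Gemini defines no such path; this is an assumption for illustration.
WELL_KNOWN_PREFIX = "/.well-known/"


def meta_file_candidates(page_url: str, name: str) -> list[str]:
    """Return the URLs a bot could try, in order, for a meta file such as
    robots.txt: the /.well-known/ location first, then the server root
    (assumed fallback for servers that do not use the convention)."""
    parts = urlsplit(page_url)
    if parts.scheme != "gemini" or not parts.netloc:
        raise ValueError(f"not an absolute gemini URL: {page_url!r}")
    paths = [WELL_KNOWN_PREFIX + name, "/" + name]
    # Keep scheme and host (including any port); drop query and fragment.
    return [urlunsplit((parts.scheme, parts.netloc, p, "", "")) for p in paths]


def in_well_known(url: str) -> bool:
    """True if the URL path is under the reserved prefix, i.e. a path a
    catch-all server could exclude from its wildcard responses."""
    return urlsplit(url).path.startswith(WELL_KNOWN_PREFIX)
```

For example, `meta_file_candidates("gemini://example.org/foo/bar.gmi", "robots.txt")` yields the /.well-known/ URL followed by `gemini://example.org/robots.txt`. The `in_well_known` check is the server-side half: a program answering every path could skip the reserved prefix, so a bot would not mistake a wildcard reply for a real meta file.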


gemini://gemini.bortzmeyer.org/rfc-mirror/rfc5785.txt RFC 5785 "Defining Well-Known URIs"

Meta-remark: is there a place with all the "Gemini good practices" or "Gemini conventions", which do not change the protocol or the format but are useful?