💾 Archived View for rawtext.club › ~sloum › geminilist › 003973.gmi captured on 2024-02-05 at 10:15:07. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
Sudipto Mallick smallick.dev at gmail.com
Thu Dec 10 18:07:34 GMT 2020
- - - - - - - - - - - - - - - - - - -
On 12/10/20, Stephane Bortzmeyer <stephane at sources.org> wrote:
> Opinion: maybe we should specify a syntax for Gemini's robots.txt,
> not relying on the broken Web one?

Here it is:
'bots.txt' for gemini bots and crawlers.
- know who you are: archiver, indexer, feed-reader, researcher, etc.
- ask for /bots.txt
- if the response is 20 text/plain, then
-- allowed = set()
-- denied = set()
-- split the response by newlines; for each line:
--- split by spaces and tabs into fields
---- paths = fields[0] split by ','
---- if fields[2] is "allowed" and you are in fields[1] split by ',', then allowed = allowed union paths
----- if fields[3] is "but" and fields[5] is "denied" and you are in fields[4] split by ',', then denied = denied union paths
---- if fields[2] is "denied" and you are in fields[1] split by ',', then denied = denied union paths

you always match "all" and never match "none".

the union of paths is special: a path absorbs every path beneath it, so { "/a/b" } union { "/a/b/c" } == { "/a/b" }
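A minimal Python sketch of the parsing steps above, assuming a bot's identity is a set of role names such as {"indexer"}. All function names (`covers`, `path_union`, `you_match`, `parse_bots_txt`, `parse_bots_response`) are hypothetical. Two further assumptions are not in the proposal: a path covers another only at a "/" segment boundary, so "/priv1" does not absorb "/priv10", and lines with fewer than three fields are skipped. The "but ... denied" clause is nested under the "allowed" check, as the pseudocode's indentation suggests.

```python
from __future__ import annotations

from typing import Iterable, Optional


def covers(prefix: str, path: str) -> bool:
    """True if path equals prefix or lies beneath it (assumed: segment-wise)."""
    if path == prefix:
        return True
    base = prefix if prefix.endswith("/") else prefix + "/"
    return path.startswith(base)


def path_union(paths: set[str], new: Iterable[str]) -> set[str]:
    """The special union: a broader path absorbs any narrower one."""
    result = set(paths)
    for p in new:
        if any(covers(q, p) for q in result):
            continue  # already covered by a broader existing entry
        result = {q for q in result if not covers(p, q)}  # drop entries p absorbs
        result.add(p)
    return result


def you_match(field: str, me: set[str]) -> bool:
    """'all' matches every bot; 'none' matches no bot."""
    names = set(field.split(","))
    names.discard("none")
    return "all" in names or not names.isdisjoint(me)


def parse_bots_txt(body: str, me: set[str]) -> tuple[set[str], set[str]]:
    allowed: set[str] = set()
    denied: set[str] = set()
    for line in body.splitlines():
        fields = line.split()  # splits on runs of spaces and tabs
        if len(fields) < 3:
            continue  # assumption: ignore blank or malformed lines
        paths = fields[0].split(",")
        if fields[2] == "allowed" and you_match(fields[1], me):
            allowed = path_union(allowed, paths)
            if (len(fields) >= 6 and fields[3] == "but"
                    and fields[5] == "denied" and you_match(fields[4], me)):
                denied = path_union(denied, paths)
        if fields[2] == "denied" and you_match(fields[1], me):
            denied = path_union(denied, paths)
    return allowed, denied


def parse_bots_response(response: str, me: set[str]) -> Optional[tuple[set[str], set[str]]]:
    """Parse only a '20 text/plain' Gemini response; otherwise return None."""
    header, _, body = response.partition("\r\n")
    status, _, meta = header.partition(" ")
    if status != "20" or meta.split(";")[0].strip() != "text/plain":
        return None  # hypothetical: caller falls back to the default
    return parse_bots_txt(body, me)
```

For the complex example, a bot whose identity is {"indexer"} ends up with allowed = {"/cgi-bin"} and denied = {"/priv1", "/priv2", "/login"}.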
when you request a path, find the longest matching entry among allowed and denied; if it is in allowed, you're allowed, otherwise not. on a tie: undefined behaviour, do what you want.
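The longest-match rule can be sketched in Python as below. `covers` and `is_allowed` are hypothetical names, and three choices here are assumptions rather than part of the proposal: an entry matches a path only at a "/" segment boundary; when no entry matches, the bot is allowed, following the stated default of "/ all allowed"; and a tie, which the proposal leaves undefined, is resolved by denying.

```python
from __future__ import annotations


def covers(prefix: str, path: str) -> bool:
    """True if path equals prefix or lies beneath it (assumed: segment-wise)."""
    if path == prefix:
        return True
    base = prefix if prefix.endswith("/") else prefix + "/"
    return path.startswith(base)


def is_allowed(path: str, allowed: set[str], denied: set[str]) -> bool:
    # Length of the longest matching entry in each set; -1 means no match.
    best_allow = max((len(p) for p in allowed if covers(p, path)), default=-1)
    best_deny = max((len(p) for p in denied if covers(p, path)), default=-1)
    if best_allow == best_deny == -1:
        return True  # assumption: nothing matched, use the "/ all allowed" default
    if best_allow == best_deny:
        return False  # tie: undefined in the proposal; this sketch denies
    return best_allow > best_deny
```

Denying on a tie is the conservative choice, and it also gives "but archiver denied" a visible effect on a bot that is both an indexer and an archiver.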
examples:

default, effectively:
/ all allowed
or:
/ none denied

complex example:
/priv1,/priv2,/login all denied
/cgi-bin indexer allowed but archiver denied
/priv1/pub researcher allowed but blabla,meh,heh,duh denied
what do you think?