On Thu, Dec 10, 2020 at 11:37:34PM +0530, Sudipto Mallick <smallick.dev at gmail.com> wrote a message of 40 lines which said:

> 'bots.txt' for gemini bots and crawlers.

Interesting. The good thing is that it moves away from robots.txt (underspecified, full of variants, impossible to know what a good bot should do).

> - know who you are: archiver, indexer, feed-reader, researcher etc.
> - ask for /bots.txt
> - if 20 text/plain then
> -- allowed = set()
> -- denied = set()
> -- split response by newlines, for each line
> --- split by spaces and tabs into fields
> ---- paths = fields[0] split by ','
> ---- if fields[2] is "allowed" and you in fields[1] split by ',' then allowed = allowed union paths
> ----- if fields[3] is "but" and fields[5] is "denied" and you in fields[4] split by ',' then denied = denied union paths
> ---- if fields[2] is "denied" and you in fields[1] split by ',' then denied = denied union paths
> you always match all, never match none
> union of paths is special:
> { "/a/b" } union { "/a/b/c" } ==> { "/a/b" }
>
> when you request a path, find the longest match from allowed and denied; if it is in allowed you're allowed, otherwise not; when there is a tie: undefined behaviour, do what you want.

It seems perfect.
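
For anyone who wants to try it, here is a rough Python sketch of my reading of the rules quoted above. The field layout, the "but ... denied" clause, and the special union come from the quoted pseudocode; the function names (parse_bots_txt, is_allowed) and the example categories are mine, and a tie is resolved as "allowed" only because the proposal leaves it undefined.

```
def parse_bots_txt(text, who):
    """Return (allowed, denied) path sets for a bot identifying as `who`."""
    allowed, denied = set(), set()

    def matches(names_field):
        # "you always match all, never match none"
        names = names_field.split(',')
        return 'all' in names or who in names

    def union(paths, new):
        # Special union: a shorter path absorbs longer ones under it,
        # e.g. {"/a/b"} union {"/a/b/c"} == {"/a/b"}
        for p in new:
            if any(p == q or p.startswith(q.rstrip('/') + '/') for q in paths):
                continue  # already covered by a shorter path
            prefix = p.rstrip('/') + '/'
            paths.difference_update({q for q in paths if q.startswith(prefix)})
            paths.add(p)

    for line in text.splitlines():
        fields = line.split()  # split by spaces and tabs
        if len(fields) < 3:
            continue
        paths = set(fields[0].split(','))
        if fields[2] == 'allowed' and matches(fields[1]):
            union(allowed, paths)
            # optional "... but <bots> denied" clause on the same line
            if (len(fields) >= 6 and fields[3] == 'but'
                    and fields[5] == 'denied' and matches(fields[4])):
                union(denied, paths)
        elif fields[2] == 'denied' and matches(fields[1]):
            union(denied, paths)
    return allowed, denied

def is_allowed(path, allowed, denied):
    # Longest (plain prefix) match between the two sets decides; a tie is
    # undefined behaviour in the proposal, here resolved as "allowed".
    best_allow = max((p for p in allowed if path.startswith(p)), key=len, default='')
    best_deny = max((p for p in denied if path.startswith(p)), key=len, default='')
    return len(best_allow) >= len(best_deny)

# Example: for a bot identifying as "archiver" and a hypothetical bots.txt of
#     / archiver,indexer denied
#     /pub archiver allowed
# parse_bots_txt() yields allowed={"/pub"} and denied={"/"}, so
# is_allowed("/pub/feed.gmi", ...) is True (longest match is "/pub")
# while is_allowed("/private/notes.gmi", ...) is False (only "/" matches).
```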