On Friday, 11 December 2020, 09:26:54 CET, Stephane Bortzmeyer wrote:
> > - know who you are: archiver, indexer, feed-reader, researcher etc.
> > - ask for /bots.txt
> > - if 20 text/plain then
> > -- allowed = set()
> > -- denied = set()
> > -- split response by newlines, for each line
> > --- split by spaces and tabs into fields
> > ---- paths = field[0] split by ','
> > ---- if field[2] is "allowed" and you in field[1] split by ',' then
> > allowed = allowed union paths
> > ----- if field[3] is "but" and field[5] is "denied" and you in
> > field[4] split by ',' then denied = denied union paths
> > ---- if field[2] is "denied" and you in field[1] split by ',' then
> > denied = denied union paths
> > you always match all, never match none
> > union of paths is special:
> > { "/a/b" } union { "/a/b/c" } ==> { "/a/b" }
> >
> > when you request a path, find the longest match from allowed and
> > denied; if it is in allowed you're allowed, otherwise not; on a
> > tie: undefined behaviour, do what you want.
>
> It seems perfect.

I guess I'm not the only one who needs some examples to fully understand how this would work. If I get it right, it's something like this:

path1,path2 archiver,crawler allowed but path3 denied
path4 * denied
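For reference, a minimal Python sketch of one reading of the quoted rules. It assumes the body has already been fetched with status 20 text/plain, reads "you always match all, never match none" as "all" and "none" being special bot names, treats a path as covering every path that starts with it (plain string prefix), follows "otherwise not" so that no match at all means not allowed, and resolves ties as denied (the rules leave ties open). All function names are hypothetical.

```python
# Sketch of one reading of the quoted bots.txt rules.
# All function names are hypothetical; nothing here is a published API.


def names_match(field, me):
    """True if bot kind `me` is listed in a comma-separated names field.

    Assumption: "all" matches every bot; "none" never matches, since only
    "all" and the bot's own kind are checked.
    """
    names = field.split(",")
    return "all" in names or me in names


def path_union(a, b):
    """The "special" union: drop any path already covered by a shorter one.

    Assumption: a path covers every path that starts with it (string prefix).
    """
    merged = set(a) | set(b)
    return {p for p in merged
            if not any(q != p and p.startswith(q) for q in merged)}


def parse_bots_txt(body, me):
    """Build the allowed/denied path sets for bot kind `me` from a bots.txt body."""
    allowed, denied = set(), set()
    for line in body.splitlines():
        fields = line.split()  # split on runs of spaces and tabs
        if len(fields) < 3:
            continue  # blank or incomplete line
        paths = set(fields[0].split(","))
        if fields[2] == "allowed" and names_match(fields[1], me):
            allowed = path_union(allowed, paths)
            # optional "<paths> <names> allowed but <names> denied" form
            if (len(fields) >= 6 and fields[3] == "but"
                    and fields[5] == "denied" and names_match(fields[4], me)):
                denied = path_union(denied, paths)
        elif fields[2] == "denied" and names_match(fields[1], me):
            denied = path_union(denied, paths)
    return allowed, denied


def longest_match(path, prefixes):
    """Longest prefix in `prefixes` that `path` starts with, or None."""
    return max((p for p in prefixes if path.startswith(p)), key=len, default=None)


def is_allowed(path, allowed, denied):
    a = longest_match(path, allowed)
    d = longest_match(path, denied)
    if a is None:
        return False  # "otherwise not": no allowed match means not allowed
    if d is None:
        return True
    return len(a) > len(d)  # tie resolved as denied (the rules leave it open)


if __name__ == "__main__":
    rules = ("/ all allowed\n"
             "/private all denied\n"
             "/archive all allowed but archiver denied\n")
    allowed, denied = parse_bots_txt(rules, "archiver")
    print(is_allowed("/index.gmi", allowed, denied))     # True
    print(is_allowed("/archive/2020", allowed, denied))  # False
    allowed, denied = parse_bots_txt(rules, "indexer")
    print(is_allowed("/archive/2020", allowed, denied))  # True
```

In this example the special union absorbs "/archive" into "/" in the allowed set, so for the archiver "/archive" in the denied set wins by longest match rather than by the tie rule.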