Hi, I suppose I am chiming in a bit too late here, but I think the robots.txt thing was always a rather ugly mechanism - a bit of an afterthought.

Consider gemini://example.com/~somebody/personal.gmi - if somebody wishes to exclude personal.gmi from being crawled, they need write access to example.com/robots.txt, and how do we go about making sure that ~somebodyelse, also on example.com, doesn't overwrite robots.txt with their own rules? Then there is the problem of transitivity: if we have a portal, proxy, or archive, how does it relay the information to its downstream users? See also the exchange between Sean and Drew...

The way I remember it, robots.txt was a quick hack to prevent spiders from getting trapped in a maze of CGI-generated data and thereby hammering the server. It wasn't designed to solve matters of privacy and redistribution.

I have pitched this idea before: I think a footer stating the license/rules under which a page can be distributed/cached is more sensible than robots.txt. This approach is: