Gemini Archiving and WARC

Brian Evans b__m__e at mailfence.com

Fri Sep 4 19:43:10 BST 2020

- - - - - - - - - - - - - - - - - - - 

acdw writes:

I think any archiver or spider should also respect *robots.txt* files -- though them being opt-in vs. opt-out is kind of moot, since spiders gonna spider, you know?

I think opt-in vs opt-out is definitely not moot. The web largely operates on an opt-out basis (where there is an option at all). We are at a point where we can develop different norms for a different system, and I think we should.

I definitely agree that nothing can be done about spiders that do not follow recommended community guidelines, and that anything you post that is not behind a client cert requirement or the like is public.

However, I do think that using robots.txt for spiders of all sorts is a bad idea for gemini and will create less user choice in the long run. robots.txt is suggested often because it already exists... but it is not designed for multi-user systems (the predominant form of system on gemini at present) and is explicitly designed to opt you out. That means that if users don't even know that spiders are a thing (as many non-technical people do not), they do not get to have a choice. My suggestion is simply about community norms and trying to push, at least for spiders that are willing to respect a community standard, an opt-in that works at the directory level and can be managed by users rather than by system administrators. The idea being that if someone does not have a document, let's call it `green-light.txt`, saying yes to various sorts of spidering, then a well-behaved spider should ignore content in that directory.
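
To make the directory-level opt-in a bit more concrete, here is a rough Python sketch of how a well-behaved spider might check for it. Only the file name `green-light.txt` comes from the suggestion above. Everything else is an assumption made for illustration: the function names, the `fetch` callback, and especially the file format (one permitted activity per line, such as `archive` or `search`), which has not been specified anywhere.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

# File name taken from the post; the line-per-activity format is assumed.
PERMISSION_FILE = "green-light.txt"


def permission_url(url):
    """Return the URL of the opt-in file for the directory containing `url`."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # A path ending in "/" is itself a directory; otherwise use its parent.
    directory = path if path.endswith("/") else posixpath.dirname(path)
    if not directory.endswith("/"):
        directory += "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, directory + PERMISSION_FILE, "", "")
    )


def may_spider(url, activity, fetch):
    """Opt-in check: allowed only if the directory's file lists `activity`.

    `fetch` is a hypothetical caller-supplied function that returns the
    file's text, or None if the file is missing or cannot be read.
    """
    text = fetch(permission_url(url))
    if text is None:
        return False  # no green light means the spider stays out
    allowed = {line.strip().lower() for line in text.splitlines() if line.strip()}
    return activity.lower() in allowed
```

The point of the design is the default: a missing file means "no", which is the reverse of robots.txt. Because the lookup is per directory, a user on a shared host can drop the file into their own directory without asking the system administrator.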

Having said all of that: I agree this is not a protocol issue, and that the conversation is more about philosophical/ethical preferences and could be moved over to gemini posts rather than here on the mailing list. So I will likely not post more on it here... but maybe I'll write something up on my gemlog tonight.