robots.txt for Gemini formalised

On Tue, 2020-11-24 at 13:31 +0000, James Tomasino wrote:
> 
> As much as I'd love to wave a magic wand and say, "it's all opt-in
> here" we don't really have any legal footing to do so.
> 

James and I talked a bit more about this one on IRC. Key to this
argument, AIUI, is how robots.txt (or the lack of it) is treated for
FTP: the FTP spec never mentions it, yet it has apparently been given
weight in DMCA-related rulings involving FTP.

I'm not sure I agree with the reasoning, which goes something like "the
robots.txt Internet-Draft is already de jure part of Gemini, and we
can't change that", but IANAL ^^. In particular, I've been thinking
about this almost entirely in GDPR terms so far, and have a bunch of
DMCA-related reading to do now.

In the event that it *is* accurate, we talked about an alternative way
to implement the functionality. Rather than having the Gemini
robots.txt spec say "if the client doesn't receive a robots.txt, it
must assume this one", the *server* could be made to return a defined
robots.txt response body whenever it would otherwise issue a 51 (not
found) response to `/robots.txt`.

(51 may be too specific; it could be any 5x. But I don't *think* it
would be appropriate for 4x responses, which are temporary failures
that crawlers would be expected to retry.)
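A minimal sketch of that server-side fallback, assuming a server built
around a handler that maps a request path to a (header, body) pair. The
handler shape, `with_robots_fallback`, `static_handler` and the default
body are all hypothetical; the `indexer` and `archiver` user-agent names
are borrowed from the Gemini robots.txt companion spec as an assumption.
By default only 51 triggers the substitution, `any_5x=True` widens it to
any permanent failure, and 4x responses always pass through untouched.

```python
from typing import Callable, Tuple

# (header line, body) as a Gemini server would write them to the socket.
Response = Tuple[str, bytes]
Handler = Callable[[str], Response]

# Hypothetical default body; the real default would be whatever a
# "server best practice" recommendation settles on.
DEFAULT_ROBOTS_TXT = (
    "User-agent: indexer\n"
    "Disallow: /\n"
    "\n"
    "User-agent: archiver\n"
    "Disallow: /\n"
)


def with_robots_fallback(handler: Handler, any_5x: bool = False) -> Handler:
    """Wrap a handler so a failed /robots.txt lookup returns a defined body."""

    def wrapped(path: str) -> Response:
        header, body = handler(path)
        status = header[:2]
        # Substitute only on permanent failure: 51 by default, or any 5x.
        # Temporary failures (4x) pass through, since crawlers should retry.
        failed = status.startswith("5") if any_5x else status == "51"
        if path == "/robots.txt" and failed:
            return "20 text/plain\r\n", DEFAULT_ROBOTS_TXT.encode("utf-8")
        return header, body

    return wrapped


def static_handler(path: str) -> Response:
    """Hypothetical handler: one page, and 51 for every other path."""
    if path == "/index.gmi":
        return "20 text/gemini\r\n", b"# Hello\n"
    return "51 Not found\r\n", b""


serve = with_robots_fallback(static_handler)
for path in ("/robots.txt", "/index.gmi", "/missing.gmi"):
    header, _ = serve(path)
    print(path, header.strip())
```

Wrapping the handler rather than editing it keeps the fallback in one
place, so an operator who does ship a real robots.txt never hits it.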

Of course, any server could do that already today, so the ask is to put
a recommendation about it into "server best practice", perhaps
incorporating the `--permit-indexing` and `--permit-archiving` flags I
talked about in another post.
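One way the flags could feed that default body, sketched under the
assumption that `--permit-indexing` and `--permit-archiving` are plain
opt-in booleans (both off by default), each mapping to one virtual
user-agent. `build_robots_txt` and `parse_args` are hypothetical names,
not part of any existing server.

```python
import argparse
from typing import List, Optional


def build_robots_txt(permit_indexing: bool, permit_archiving: bool) -> str:
    """Build a default robots.txt body from hypothetical opt-in flags."""
    rules = []
    # Every virtual user-agent the operator has not opted in to is
    # disallowed from the whole capsule.
    if not permit_indexing:
        rules.append("User-agent: indexer\nDisallow: /\n")
    if not permit_archiving:
        rules.append("User-agent: archiver\nDisallow: /\n")
    if not rules:
        # An empty Disallow value allows everything for all agents.
        return "User-agent: *\nDisallow:\n"
    return "\n".join(rules)


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Hypothetical Gemini server options")
    parser.add_argument("--permit-indexing", action="store_true",
                        help="allow indexer crawlers (off by default)")
    parser.add_argument("--permit-archiving", action="store_true",
                        help="allow archiver crawlers (off by default)")
    return parser.parse_args(argv)


args = parse_args(["--permit-indexing"])
print(build_robots_txt(args.permit_indexing, args.permit_archiving), end="")
```

Since the server answers with an ordinary 20 response either way, a
crawler cannot tell a flag-derived body from one the operator wrote by
hand.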

Another advantage of this approach is that it becomes opaque to crawler
authors whether the user has explicitly selected a preference or not.
I'm also inclined to trust server implementors over crawler
implementors.

/Nick

P.S. There was also some question as to whether someone hosting Gemini
content counts as a "Gemini user" in the sense we use that term on the
project homepage. To me it seems a reasonable extrapolation, but
perhaps it's a topic that deserves more debate or clarification.
