robots.txt for Gemini formalised

Hi

I suppose I am chipping in a bit too late here, but I think
the robots.txt thing was always a rather ugly mechanism - a
bit of an afterthought.

Consider gemini://example.com/~somebody/personal.gmi -
if somebody wishes to exclude personal.gmi from being
crawled, they need write access to example.com/robots.txt.
And how do we go about making sure that ~somebodyelse,
also on example.com, doesn't overwrite robots.txt with
their own rules?

Then there is the problem of transitivity - if we
have a portal, proxy or archive, how does it relay
the information to its downstream users? See also
the exchange between Sean and Drew...

So the way I remember it, robots.txt was a quick hack
to prevent spiders getting trapped in a maze of
CGI-generated data and so hammering the server.
It was never designed to solve matters of privacy
and redistribution.

I have pitched this idea before: I think a footer containing
the license/rules under which a page can be distributed/cached
is more sensible than robots.txt. Each author controls the
footer of their own pages, and the rules travel with the page
through any portal, proxy or archive downstream.
I speak under correction, but I believe a decent amount of the
public web was mined for faces to train the neural networks
that now make totalitarian surveillance possible. Had these
been labelled "CC ND (no derivative works)" there would
have been a legal impediment - not to the regimes now, but to
the universities and research labs which pioneered this.

We now have people more aware of this problem, and some
of us wish to put up material limited to gemini-space only,
and not export it to the web. A footer line "-- GMI: A. User"
could prohibit export to the web, while one "-- CC-SA: J. Soap"
would permit it...
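To make the idea concrete, here is a minimal sketch of how a
proxy or archiver might honour such a footer. This is my own
illustration, not a spec: I am assuming the footer is the last
non-blank line, that it takes the form "-- LICENCE: Author",
that "GMI" means gemini-space only, and that the set of
export-permitting licences (here just CC-SA) is configurable.

```python
# Hypothetical sketch of footer-based export control.
# Assumptions (mine, not a spec): the licence footer is the
# last non-blank line, formatted "-- LICENCE: Author".

EXPORTABLE = {"CC-SA"}  # licences assumed to permit export to the web


def may_export(gmi_text: str) -> bool:
    """Return True if the page's footer permits export outside gemini-space."""
    lines = [line for line in gmi_text.splitlines() if line.strip()]
    if not lines:
        return False  # empty page: be conservative, do not export
    last = lines[-1].strip()
    if not last.startswith("-- "):
        return False  # no licence footer: default to gemini-only
    licence = last[3:].split(":", 1)[0].strip()
    return licence in EXPORTABLE


page_a = "# My page\nSome text.\n\n-- GMI: A. User\n"
page_b = "# Another page\nMore text.\n\n-- CC-SA: J. Soap\n"
print(may_export(page_a))  # False: "GMI" keeps it in gemini-space
print(may_export(page_b))  # True: "CC-SA" permits redistribution
```

The deliberate default is "no footer, no export" - a web proxy
that cannot find a licence line should behave as if the author
had written "-- GMI".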

regards

marc

---

Previous in thread (19 of 70): 🗣️ Drew DeVault (sir (a) cmpwn.com)

Next in thread (21 of 70): 🗣️ Johann Galle (johann (a) qwertqwefsday.eu)
