robots.txt for Gemini

On Tue, Mar 24, 2020 at 05:35:08PM -0400, Sean Conner wrote:
 
>   Two possible solutions for robot identification:
> 
> 1) Allow IP addresses to be used where a user-agent would be specified. 
> Some examples:
> 
> 	User-agent: 172.16.89.3
> 	User-agent: 172.17.24.0/27
> 	User-agent: fde7:a680:47d3::/48
> 
> Yes, I'm including CIDR (Classless Inter-Domain Routing) notation to specify
> a range of IP addresses.  And for a robot, if your IP address matches a
> listed address (or range), then you need to follow the rules that follow.

Hmm, I'm not a huge fan of this idea (although I recognise it as a valid
technical solution to the problem at hand, which is perhaps all you
meant it to be).  Mostly because I don't like to encourage people to
think of IP addresses as permanently mapping to, well, just anything.
The address of a VPN running an abusive bot today might be handed out to
a different customer running a well-behaved bot next year.
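
For what it's worth, the matching side of this is easy enough for a robot to
implement.  A rough sketch in Python, using the standard library's ipaddress
module -- the User-agent values are just the examples quoted above, and
rule_matches is a made-up helper name:

    import ipaddress

    # Hypothetical User-agent values, echoing the quoted robots.txt examples
    rules = ["172.16.89.3", "172.17.24.0/27", "fde7:a680:47d3::/48"]

    def rule_matches(my_address, rule):
        """Return True if my_address falls within the rule's address or range."""
        addr = ipaddress.ip_address(my_address)
        try:
            # A bare address parses as a /32 (or /128) network, so one code
            # path covers both single addresses and CIDR ranges.
            net = ipaddress.ip_network(rule, strict=False)
        except ValueError:
            return False    # not an IP-style User-agent line at all
        return addr in net  # False on an IPv4/IPv6 version mismatch

    print(any(rule_matches("172.17.24.14", r) for r in rules))   # True
    print(any(rule_matches("10.0.0.1", r) for r in rules))       # False

The robot would check its own outbound address against every such line and,
on a match, obey the rules that follow it.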
 
> 2) Use the fragment portion of a URL to designate a robot.  The fragment
> portion of a URL has no meaning for a server (it does for a client).  A
> robot could use this fact to slip in its identifier when making a request. 
> The server MUST NOT use this information, but the logs could show it.  For
> example, a robot could request:
> 
> 	gemini://example.com/robots.txt#GUS
> 
> A review of the logs would reveal that GUS is a robot, and the text "GUS"
> could be placed in the User-agent: field to control it.  It SHOULD be the
> text the robot would recognize in robots.txt.

Hmm, nice out-of-the-box thinking.  Since the suggestion has come from
you I will assume it does not violate the letter of any RFCs, even
though I can't shake a strange feeling that this is "abusing" the
fragment concept a little...
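
On the robot's side this would be barely any extra work.  A rough sketch of
the fetch in Python -- "GUS" is just the example name from the quoted mail,
and the lax TLS settings below stand in for whatever TOFU handling a real
crawler would do:

    import socket
    import ssl

    ROBOT_NAME = "GUS"    # example identifier from the quoted proposal

    def fetch_robots_txt(host, port=1965):
        """Fetch robots.txt over Gemini, tagging the request URL with the
        robot's name in the fragment so it shows up in the server's logs."""
        url = "gemini://{}/robots.txt#{}".format(host, ROBOT_NAME)
        ctx = ssl.create_default_context()
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE    # stand-in for proper TOFU checking
        with socket.create_connection((host, port)) as raw:
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                tls.sendall((url + "\r\n").encode("utf-8"))
                chunks = []
                while True:
                    chunk = tls.recv(4096)
                    if not chunk:
                        break
                    chunks.append(chunk)
        header, _, body = b"".join(chunks).partition(b"\r\n")
        return header.decode("utf-8"), body.decode("utf-8", "replace")

A server operator grepping the access log would then see the #GUS suffix on
the request line, and that same string is what a robots.txt author would put
in the User-agent: field to target the crawler.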

Cheers,
Solderpunk
