On Tue, Mar 24, 2020 at 05:35:08PM -0400, Sean Conner wrote:

> Two possible solutions for robot identification:
>
> 1) Allow IP addresses to be used where a user-agent would be specified.
>    Some examples:
>
>    User-agent: 172.16.89.3
>    User-agent: 172.17.24.0/27
>    User-agent: fde7:a680:47d3::/48
>
> Yes, I'm including CIDR (Classless Inter-Domain Routing) notation to
> specify a range of IP addresses. And for a robot, if your IP address
> matches an address (or range) listed, then you need to follow the rules
> that come after it.

Hmm, I'm not a huge fan of this idea (although I recognise it as a valid
technical solution to the problem at hand, which is perhaps all you meant
it to be), mostly because I don't like to encourage people to think of IP
addresses as permanently mapping to, well, just anything. The address of a
VPN running an abusive bot today might be handed out to a different
customer running a well-behaved bot next year.
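To be fair, the matching itself is cheap to implement. Here is a minimal
sketch in Python using the standard ipaddress module; the rule structure
and function name are illustrative inventions, not anything from a spec:

    import ipaddress

    # Hypothetical parsed robots.txt: each User-agent entry (a bare
    # address or a CIDR range) maps to the rules that follow it.
    ROBOT_RULES = {
        "172.16.89.3": ["Disallow: /private/"],
        "172.17.24.0/27": ["Disallow: /"],
        "fde7:a680:47d3::/48": ["Disallow: /cgi-bin/"],
    }

    def rules_for(client_ip):
        """Return every rule whose User-agent entry covers client_ip."""
        addr = ipaddress.ip_address(client_ip)
        matched = []
        for entry, rules in ROBOT_RULES.items():
            # A bare address parses as a /32 (or /128) network, so
            # plain and CIDR entries are handled uniformly; mismatched
            # IPv4/IPv6 versions simply fail the containment test.
            if addr in ipaddress.ip_network(entry, strict=False):
                matched.extend(rules)
        return matched

    # e.g. rules_for("172.17.24.9") -> ["Disallow: /"]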
> 2) Use the fragment portion of a URL to designate a robot. The fragment
>    portion of a URL has no meaning for a server (it does for a client).
>    A robot could use this fact to slip in its identifier when making a
>    request. The server MUST NOT use this information, but the logs could
>    show it. For example, a robot could request:
>
>    gemini://example.com/robots.txt#GUS
>
> A review of the logs would reveal that GUS is a robot, and the text
> "GUS" could be placed in the User-agent: field to control it. It SHOULD
> be the text the robot would recognize in robots.txt.

Hmm, nice out-of-the-box thinking. Since the suggestion has come from you,
I will assume it does not violate the letter of any RFCs, even though I
can't shake a strange feeling that this is "abusing" the fragment concept
a little...
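On the robot's side this is nothing more than appending "#GUS" to the URL
it was going to request anyway. Recovering the identifiers from a server
log is a few lines as well; here is a rough sketch, again in Python, where
the log format and the regular expression are assumptions for illustration:

    import re

    # Hypothetical log lines that record the raw request URL, e.g.
    #   2020-03-24 22:10:03 172.16.89.3 gemini://example.com/robots.txt#GUS
    FRAGMENT = re.compile(r"gemini://\S+#(\S+)")

    def robot_names(log_lines):
        """Collect fragment identifiers seen in requested URLs."""
        names = set()
        for line in log_lines:
            match = FRAGMENT.search(line)
            if match:
                names.add(match.group(1))
        return names

Each name recovered this way could then be used verbatim as a User-agent:
value in robots.txt, which is exactly the round trip Sean describes.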
Cheers,
Solderpunk