💾 Archived View for rawtext.club › ~sloum › geminilist › 000519.gmi captured on 2020-09-24 at 02:30:50. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
Sean Conner sean at conman.org
Tue Mar 24 21:35:08 GMT 2020
- - - - - - - - - - - - - - - - - - -
```
It was thus said that the Great solderpunk once stated:
> The biggest question, in my mind, is what to do about user-agents, which
> Gemini lacks (by design, as they are a component of the browser
> fingerprinting problem, and because they encourage content developers to
> serve browser-specific content which is a bad thing IMHO). The 2019 RFC
> says "The product token SHOULD be part of the identification string that
> the crawler sends to the service" (where "product token" is bizarre and
> disappointingly commercial alternative terminology for "user-agent" in
> this document), so the fact that Gemini doesn't send one is not
> technically a violation.

Two possible solutions for robot identification:

1) Allow IP addresses to be used where a user-agent would be specified.
Some examples:

User-agent: 172.16.89.3
User-agent: 172.17.24.0/27
User-agent: fde7:a680:47d3::/48

Yes, I'm including CIDR (Classless Inter-Domain Routing) notation to
specify a range of IP addresses.  For a robot, if your IP address matches
an IP address (or range) given here, then you need to follow the rules
that come after it.

2) Use the fragment portion of a URL to designate a robot.  The fragment
portion of a URL has no meaning for a server (it does for a client).  A
robot could use this fact to include its identifier when making a request.
The server MUST NOT use this information, but the logs could show it.  For
example, a robot could request:

gemini://example.com/robots.txt#GUS

A review of the logs would reveal that GUS is a robot, and the text "GUS"
could be placed in the User-agent: field to control it.  It SHOULD be the
text the robot would recognize in robots.txt.

One clarification: this

gemini://example.com/robots.txt#foo%20bot

would be

User-agent: foo bot

but a robot ID SHOULD NOT contain spaces; it SHOULD be one word.

Anyway, those are my ideas.

-spc
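
P.S.  Purely as an illustration of both schemes above, here is a rough
Python sketch of how a crawler might check itself against a robots.txt
User-agent: value, and how a robot ID could be recovered from a logged
request.  The helper names and the case-insensitive ID comparison are my
own assumptions, not anything specified anywhere.

import ipaddress
import urllib.parse

def agent_matches(user_agent_value, robot_ip, robot_id):
    # A User-agent: value might be the wildcard "*", a robot ID ("GUS"),
    # a single IP address ("172.16.89.3") or a CIDR range
    # ("172.17.24.0/27" or "fde7:a680:47d3::/48").
    if user_agent_value == "*":
        return True
    try:
        network = ipaddress.ip_network(user_agent_value, strict=False)
    except ValueError:
        # Not an IP address or CIDR range, so treat it as a robot ID
        # (one word, compared case-insensitively; an assumption).
        return user_agent_value.lower() == robot_id.lower()
    return ipaddress.ip_address(robot_ip) in network

def robot_id_from_request(url):
    # Recover a robot ID from the fragment of a logged request URL.
    fragment = urllib.parse.urlsplit(url).fragment
    return urllib.parse.unquote(fragment) or None

print(robot_id_from_request("gemini://example.com/robots.txt#foo%20bot"))
# foo bot
print(agent_matches("172.17.24.0/27", "172.17.24.5", "GUS"))
# True
print(agent_matches("GUS", "172.17.24.5", "GUS"))
# True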