On Sun, Nov 22, 2020 at 6:03 PM Drew DeVault <sir at cmpwn.com> wrote:

> A web portal is a regular user agent, not a robot.

Agreed. However, the spec says "publicly serve the result", and a *public* proxy can pound a Gemini server if a lot of Web clients are accessing it concurrently. Such a proxy should be able to find out whether the server can cope with that kind of load. By the same token, a public Gopher proxy (if there are any) should respect "Disallow: gopherproxy".

Other points:

+1 for Allow:
+1 for Virtual-Agent
+1 for ignoring unknown lines

I'm unsure what the difference is between Crawl-Delay: and Check:, but having a retry delay is a Good Thing.

Additionally, "Agent:" should specify a SHA-256 hash of the client cert used by a particular crawler rather than a random, easy-to-forge name. Thus GUS should crawl using a cert and publicly post the hash of this cert. Then callers presenting that cert are necessarily GUS, since the cert itself is not published. (Of course, it's still possible for a server to steal GUS's client cert.)

> Maybe we could normalize robots fetching robots.txt with the query
> string set to some useful identifying information? This would allow
> gemini administrators to make bot-specific rules, understand the
> behavior of their logs, and get in touch with the operator if
> necessary.

The trouble is that completely different pages can be returned for different query strings, for reasons entirely unrelated to actual searching, so it's inappropriate to usurp the query string for this purpose. That's not to say that agent control can't rely on the query string.

John Cowan          http://vrici.lojban.org/~cowan        cowan at ccil.org
Gules six bars argent on a canton azure 50 mullets argent six five six five six five six five and six --blazoning the U.S. flag <http://web.meson.org/blazonserver>
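A rough sketch of how a crawler might read a robots.txt that includes the directives endorsed above: Allow, a Crawl-delay-style retry delay, virtual agent names, and silently skipping unknown lines. Everything here is an assumption for illustration. The function names are hypothetical, "webproxy" and "indexer" stand in for virtual agent names, and the rule "longest matching prefix wins, Allow wins a tie" plus the fallback to the "*" group are borrowed from the web's robots.txt conventions (RFC 9309), not taken from any Gemini document.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Group:
    """One User-agent block: the agent names it applies to plus its rules."""
    agents: list[str] = field(default_factory=list)
    allow: list[str] = field(default_factory=list)
    disallow: list[str] = field(default_factory=list)
    crawl_delay: Optional[float] = None


def parse_robots(text: str) -> list[Group]:
    groups: list[Group] = []
    current: Optional[Group] = None
    reading_agents = False  # True while consecutive User-agent lines are read
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if current is None or not reading_agents:
                current = Group()  # a User-agent after rules starts a new group
                groups.append(current)
            current.agents.append(value.lower())
            reading_agents = True
            continue
        reading_agents = False
        if current is None:
            continue  # rule before any User-agent line: ignored
        if key == "allow" and value:
            current.allow.append(value)
        elif key == "disallow" and value:
            current.disallow.append(value)  # an empty Disallow restricts nothing
        elif key == "crawl-delay":
            try:
                current.crawl_delay = float(value)
            except ValueError:
                pass  # unparseable delay: ignored
        # Any other directive falls through and is silently ignored.
    return groups


def select_groups(groups: list[Group], agent_names: list[str]) -> list[Group]:
    """Groups for the first name (most specific first) that has any, else '*'."""
    for name in agent_names:
        matched = [g for g in groups if name.lower() in g.agents]
        if matched:
            return matched
    return [g for g in groups if "*" in g.agents]


def is_allowed(groups: list[Group], agent_names: list[str], path: str) -> bool:
    rules = []
    for g in select_groups(groups, agent_names):
        rules += [(p, True) for p in g.allow]
        rules += [(p, False) for p in g.disallow]
    matches = [(len(p), ok) for p, ok in rules if path.startswith(p)]
    if not matches:
        return True  # nothing applies: allowed
    best = max(length for length, _ in matches)
    # Longest prefix wins; on a tie, any Allow wins.
    return any(ok for length, ok in matches if length == best)


def retry_delay(groups: list[Group], agent_names: list[str]) -> Optional[float]:
    delays = [g.crawl_delay for g in select_groups(groups, agent_names)
              if g.crawl_delay is not None]
    return max(delays) if delays else None


ROBOTS = """\
User-agent: webproxy
Disallow: /

User-agent: *
Disallow: /private/
Allow: /private/public-note.gmi
Crawl-delay: 5
Unknown-Directive: ignored
"""

groups = parse_robots(ROBOTS)
print(is_allowed(groups, ["webproxy"], "/index.gmi"))                # False
print(is_allowed(groups, ["indexer"], "/private/public-note.gmi"))   # True
print(retry_delay(groups, ["indexer"]))                              # 5.0
```

Here a public web proxy identifying as "webproxy" is shut out entirely, while a crawler with no group of its own falls back to the "*" rules and is asked to wait 5 seconds between requests.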
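The certificate-fingerprint idea above could look something like this on the server side. This is a sketch under assumptions: the `published` table (crawler name to fingerprint) and all function names are hypothetical, and how the server obtains the client certificate's DER bytes depends on its TLS library. Hashing the DER encoding with SHA-256 is one common way to fingerprint a certificate. A match is meaningful because a TLS client must prove during the handshake that it holds the certificate's private key.

```python
import hashlib
from typing import Mapping, Optional


def cert_fingerprint(der_bytes: bytes) -> str:
    """Lower-case hex SHA-256 digest of a DER-encoded certificate."""
    return hashlib.sha256(der_bytes).hexdigest()


def normalize(fingerprint: str) -> str:
    """Accept 'AB:CD:...' or 'abcd...' forms of a published fingerprint."""
    return fingerprint.replace(":", "").strip().lower()


def identify_crawler(der_bytes: bytes,
                     published: Mapping[str, str]) -> Optional[str]:
    """Name of the crawler whose published fingerprint matches the presented
    certificate, or None if the certificate is not in the table."""
    presented = cert_fingerprint(der_bytes)
    for name, fingerprint in published.items():
        if normalize(fingerprint) == presented:
            return name
    return None


# Hypothetical usage: the byte strings are placeholders, not real certificates.
published = {"gus": cert_fingerprint(b"placeholder-der-bytes")}
print(identify_crawler(b"placeholder-der-bytes", published))  # gus
print(identify_crawler(b"some-other-cert", published))        # None
```

A server could then apply the robots.txt rules for the returned name, so that only a client actually holding GUS's key gets GUS's treatment.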