💾 Archived View for rawtext.club › ~sloum › geminilist › 005415.gmi captured on 2024-02-05 at 11:11:08. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
Mansfield mansfield at ondollo.com
Sun Feb 21 21:37:18 GMT 2021
- - - - - - - - - - - - - - - - - - -
On Sun, Feb 21, 2021 at 1:48 PM Johann Galle <johann at qwertqwefsday.eu> wrote:
Hi,
why is robots.txt not the obvious answer here? The companion
specification[1] has a "User-agent: webproxy" for this specific case:
### Web proxies
Gemini bots which fetch content in order to translate said content into
HTML and publicly serve the result over HTTP(S) (in order to make
Geminispace accessible from within a standard web browser) should respect
robots.txt directives aimed at a User-agent of "webproxy".
So this should suffice:
```
User-agent: webproxy
Disallow: /
```
Regards,
Johann
I must admit, I'm woefully lacking skill or background with robots.txt. It seems like it could be a great answer.
A few questions to help me educate myself:
1. How often should that file be referenced by the proxy? It feels like an answer might be to check that URL before every request, but that goes in the direction of some of the negative feedback about the favicon. One user action - one gemini request, and more.
2. Is 'webproxy' a standard reference to any proxy, or is that something left to us to decide?
3. Are there globbing-like syntax rules for the Disallow field?
4. I'm assuming there could be multiple rules that need to be mixed. Is there a standard algorithm for that process (see the sketch below)? E.g.:
```
User-agent: webproxy
Disallow: /a
Allow: /a/b
Disallow: /a/b/c
```
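My rough understanding of questions 3 and 4, sketched in Python (everything here is an assumption on my part, not something taken from the companion spec), is that most robots.txt implementations treat Allow/Disallow values as path prefixes with '*' and a trailing '$' as the only special characters, and that the longest matching rule wins, with Allow winning ties:
```
import re

def rule_matches(pattern, path):
    """Prefix match where '*' matches any run of characters and a trailing '$' anchors the end."""
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\$", "$")
    return re.match(regex, path) is not None

def is_allowed(rules, path):
    """rules: list of ('allow' | 'disallow', pattern) pairs from one user-agent group."""
    best_len, allowed = -1, True   # no matching rule at all means the path is allowed
    for kind, pattern in rules:
        if pattern and rule_matches(pattern, path):
            # Longest matching pattern wins; on a tie, Allow wins.
            if len(pattern) > best_len or (len(pattern) == best_len and kind == "allow"):
                best_len, allowed = len(pattern), (kind == "allow")
    return allowed

# The example from question 4:
rules = [("disallow", "/a"), ("allow", "/a/b"), ("disallow", "/a/b/c")]
print(is_allowed(rules, "/a/x"))      # False: only "Disallow: /a" matches
print(is_allowed(rules, "/a/b/d"))    # True:  "Allow: /a/b" is the longest match
print(is_allowed(rules, "/a/b/c/e"))  # False: "Disallow: /a/b/c" is longer still
```
If that's roughly right, then question 1 probably comes down to caching: fetch the host's robots.txt once and reuse it for some period (web crawlers commonly cache it for up to a day or so) rather than re-requesting it before every proxied request.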
Again - it seems like this could work out really well.
Thanks for helping me learn a bit more!