robots.txt for Gemini


solderpunk writes:

> But since there is no way for Gemini server admins to learn the
> user-agent of arbitrary bots, we could define a small (I'm thinking ~5
> would suffice, surely 10 at most) number of pre-defined user-agents
> which all bots of a given kind MUST respect (in addition to optionally
> having their own individual user-agent). A very rough sketch of some
> possibilities, not meant to be exhaustive or even very good, just to
> give the flavour:

I think this is probably the right approach, since it doesn't require
adding user-agents to the protocol.

> * A user-agent of "webproxy" which must be respected by all web
> proxies. Possibly this could have sub-types for proxies which do and
> don't forbid web search engines?

webproxy-bot and webproxy-nobot, perhaps.

> * A user-agent of "search" which must be respected by all search
> engine spiders

> * A user-agent of "research" for bots which crawl a site without
> making specific results of their crawl publically available (I've
> thought of writing something like this to study the growth of
> Geminispace and the structure of links between documents)

Another type I can think of is "archive", for services that rehost
existing Gemini content elsewhere on Gemini. Besides being another
use case, this category also carries the implication that it may
make deleted content available (à la the Wayback Machine).
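
To make the idea concrete, a server's robots.txt under this scheme
might look something like the sketch below. The category names are
just the ones proposed in this thread (including the
webproxy-bot/webproxy-nobot split suggested above); the paths are
purely illustrative.

  # Served as gemini://example.org/robots.txt
  # (paths are illustrative only)

  # Web proxies that let web search engines index them
  # should stay out of the gemlog entirely.
  User-agent: webproxy-bot
  Disallow: /gemlog/

  # Web proxies that forbid web search engines are welcome.
  User-agent: webproxy-nobot
  Disallow:

  # Archiving bots should not capture ephemeral content.
  User-agent: archive
  Disallow: /ephemeral/

  # Search engine spiders and research crawlers may fetch anything.
  User-agent: search
  User-agent: research
  Disallow:

  # Everything else.
  User-agent: *
  Disallow: /private/

A bot in one of these categories would then be expected to honour the
record for its category token in addition to any record matching its
own individual user-agent, per solderpunk's sketch above.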

--
+-----------------------------------------------------------+
| Jason F. McBrayer                    jmcbray at carcosa.net  |
| If someone conquers a thousand times a thousand others in |
| battle, and someone else conquers himself, the latter one |
| is the greatest of all conquerors.  --- The Dhammapada    |
+-----------------------------------------------------------+
