Yes, you should respect robots.txt in my opinion. It's not compulsory, but it's
currently the best way we have to respect servers' wishes and bandwidth
constraints. There is even a companion spec for doing so, which accompanies the
main Gemini spec:

gemini://gemini.circumlunar.space/docs/companion/robots.gmi

Read the companion spec for more detail, but you're indeed correct that bots
don't advertise who they are since there's no user-agent. Instead, we have some
agreed-upon crawler categories, like `researcher`, `indexer`, `archiver`. It
sounds like you may want to respect `researcher` and call it a day :)

Nat

On Tue, Dec 08, 2020 at 02:36:56PM +0100, Stephane Bortzmeyer wrote:
> I just developed a simple crawler for Gemini. Its goal is not to build
> another search engine but to perform some surveys of the
> geminispace. A typical result will be something like (real data, but
> limited in size):
>
> gemini://gemini.bortzmeyer.org/software/crawler/
>
> Currently, I did not yet let it loose on the Internet, because there
> are some questions I have.
>
> Is it "good practice" to follow robots.txt? There is no mention of it
> in the specification but it could work for Gemini as well as for the
> Web and I notice that some programs query this name on my server.
>
> Since Gemini (and rightly so) has no User-Agent, how can a bot
> advertise its policy and a point of contact?
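
To make the category idea a bit more concrete, here is a rough Python sketch of how a survey crawler might check a path against the `researcher` category. It is only an illustration under several assumptions: the robots.txt body is a made-up sample, the helper names `parse_robots` and `may_fetch` are hypothetical, fetching the file over Gemini is left out, and honoring both `researcher` and `*` rules is a deliberately conservative choice rather than something taken from the companion spec. It leans on Python's standard `urllib.robotparser`, which was written for the Web but only needs the text of the file.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt body for illustration. A real crawler would first
# fetch gemini://<host>/robots.txt (the network code is omitted here).
SAMPLE_ROBOTS = """\
User-agent: researcher
Disallow: /private/

User-agent: *
Disallow: /cgi-bin/
"""


def parse_robots(text):
    """Parse the text of a robots.txt file into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def may_fetch(parser, path, agents=("researcher", "*")):
    """Return True only if every listed agent group allows the path.

    Checking "*" as well as "researcher" is an assumption made here to
    err on the side of the server's wishes.
    """
    return all(parser.can_fetch(agent, path) for agent in agents)


if __name__ == "__main__":
    rules = parse_robots(SAMPLE_ROBOTS)
    for path in ("/index.gmi", "/private/notes.gmi", "/cgi-bin/search"):
        print(path, may_fetch(rules, path))
```

An `indexer` or `archiver` bot could presumably reuse the same check by passing its own category in `agents`.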