Posted Mon 03 Jan, 2022.
I have noticed that some search engines in Gemini are indexing content that perhaps they shouldn't index. For example, the jar files of SpaceBeans releases (which are perhaps a bit too heavy to be distributed over Gemini, but I like the idea of a Gemini server that you can download and set up using the small web itself). Honest question: why would anyone want to index a jar file in Gemini?
I also found that some pages from Cherrygrove City Pokémon Center are being indexed (go to geminispace.info and search for Electabuzz, for example). Although this may be useful if you are looking for information about one specific Pokémon (LOL), in practice it adds a lot of noise to the search results.
So I have added this robots.txt to my capsule:
User-agent: *
Disallow: /spacebeans/releases/
Disallow: /pokemon/db/
Every now and then I read posts from people complaining about bad (or misconfigured) actors that waste resources by crawling Gemini space the wrong way. I hadn't realised that my own capsule was contributing to that waste, because I wasn't giving any guidance to the good crawlers out there that do honour the robots.txt file.
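For the curious, this is roughly how a well-behaved crawler could apply those rules before requesting a URL. It is a minimal sketch, not the code of any real Gemini crawler: it assumes plain prefix matching on the path (as in the original robots exclusion convention), ignores Allow lines and wildcards, and the function names `disallowed_prefixes` and `can_fetch` are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# The robots.txt from this post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /spacebeans/releases/
Disallow: /pokemon/db/
"""


def disallowed_prefixes(robots_txt: str, agent: str = "*") -> list[str]:
    """Collect Disallow paths from groups whose User-agent is '*' or `agent`.

    Hypothetical helper: only User-agent and Disallow fields are handled.
    """
    prefixes: list[str] = []
    applies = False          # does the current group apply to us?
    in_agent_block = False   # True while reading consecutive User-agent lines
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not in_agent_block:
                applies = False  # a new group of rules starts here
                in_agent_block = True
            if value == "*" or value.lower() == agent.lower():
                applies = True
        else:
            in_agent_block = False
            # An empty "Disallow:" means "allow everything", so skip it.
            if field == "disallow" and applies and value:
                prefixes.append(value)
    return prefixes


def can_fetch(url: str, prefixes: list[str]) -> bool:
    """Return False if the URL path starts with any disallowed prefix."""
    path = urlsplit(url).path or "/"
    return not any(path.startswith(prefix) for prefix in prefixes)


if __name__ == "__main__":
    rules = disallowed_prefixes(ROBOTS_TXT)
    # The Electabuzz page name below is made up for illustration.
    print(can_fetch("gemini://capsule.usebox.net/pokemon/db/electabuzz.gmi", rules))  # False
    print(can_fetch("gemini://capsule.usebox.net/gemlog/20220103-adding-a-robots-txt.gmi", rules))  # True
```

The point is that the whole mechanism relies on the crawler's good will: the server still answers any request, and it is the crawler that decides not to ask for paths under `/spacebeans/releases/` or `/pokemon/db/`.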
Let's see if this has the desired effect or not.
Update 2022-01-04: it does work, indeed! geminispace.info won't give you results for Electabuzz any more. Excellent!