Dear search engine makers, how can I tell your services not to crawl some specific folders?
I use a folder where I upload drafts, and I don't want it to be discoverable.
Can you help me with this?
Thanks!
11 months ago
It's pretty easy. As others have said, putting a "robots.txt" in the root of your capsule is what you need. It contains lines that tell crawlers which URLs to ignore. Solderpunk defined a really primitive version of robots.txt for Gemini. Here is mine:
gemini://gemi.dev/robots.txt
Any URL that starts with the text listed on a line will be ignored. So if you want a crawler to ignore anything inside "gemini://example.com/drafts/", put this in:
User-agent: *
Disallow: /drafts/
· 11 months ago
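A slightly fuller sketch, assuming the companion robots.txt spec for Gemini and its virtual user-agents (names like "indexer" and "archiver" that different kinds of Gemini crawlers are expected to honor; "/drafts/" stands in for your own path):

User-agent: indexer
Disallow: /drafts/

User-agent: archiver
Disallow: /drafts/

Crawlers that match none of the listed agents fall back to any "User-agent: *" group, so the two-line version above is enough if you simply want every crawler to stay out of /drafts/.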
@moddedbear thanks! · 11 months ago
I'm just noticing that robots.txt is way simpler than most pages in search results make it out to be. · 11 months ago
You could start it off with "User-agent: *" to target all crawlers, and then if you wanted to disallow a directory called drafts, the next line should be "Disallow: /drafts/". · 11 months ago
@moddedbear I did not understand how to use it...
My level of abstraction doesn't decode information laid out that way... · 11 months ago
I don't think you typically have to worry as long as you're not linking to your drafts anywhere, but you should look into putting a robots.txt in your capsule root. I'll let you look up the specifics, but it's a pretty simple text file that tells crawlers where they should or shouldn't go. · 11 months ago