💾 Archived View for station.martinrue.com › freezr › e7837e28023940f7afd24beb3a1d2f5e captured on 2023-11-14 at 10:22:49. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

👽 freezr

Dear search engine makers, how can I tell your services not to crawl specific folders?

I have a folder where I upload my drafts, and I don't want it to be discoverable.

Can you help me with this? 🙏

Thanks! 👍

1 year ago

6 Replies

👽 acidus

It's pretty easy. As others have said, putting a "robots.txt" in the root of your capsule is what you need. It contains lines that tell crawlers which URLs to ignore. Solderpunk defined that Gemini uses a really primitive version of robots.txt. Here is mine:

gemini://gemi.dev/robots.txt

Any URL whose path starts with the text listed on a Disallow line will be ignored. So if you want a crawler to ignore anything inside "gemini://example.com/drafts/", put this in:

User-agent: *
Disallow: /drafts/

· 1 year ago
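For anyone who finds the rule easier to follow as code, here is a rough sketch of how a crawler might apply the prefix matching described above. It assumes a crawler written in Python; the names `parse_disallowed` and `is_allowed` are made up for illustration and do not come from any real Gemini crawler, and real crawlers may parse the file more strictly.

```python
from urllib.parse import urlparse


def parse_disallowed(robots_txt: str, agent: str = "*") -> list[str]:
    """Collect the Disallow prefixes from groups that apply to `agent`.

    Hypothetical helper: a lenient reading of the simple robots.txt
    subset discussed in this thread (User-agent and Disallow only).
    """
    prefixes = []
    applies = False    # does the current group apply to `agent`?
    in_agents = False  # True while reading consecutive User-agent lines
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not in_agents:
                applies = False  # a new group starts here
            in_agents = True
            applies = applies or value == agent
        else:
            in_agents = False
            if field == "disallow" and applies and value:
                prefixes.append(value)
    return prefixes


def is_allowed(url: str, prefixes: list[str]) -> bool:
    """Return False if the URL's path starts with any disallowed prefix."""
    path = urlparse(url).path or "/"
    return not any(path.startswith(prefix) for prefix in prefixes)


ROBOTS_TXT = "User-agent: *\nDisallow: /drafts/\n"
prefixes = parse_disallowed(ROBOTS_TXT)
drafts_ok = is_allowed("gemini://example.com/drafts/post.gmi", prefixes)
gemlog_ok = is_allowed("gemini://example.com/gemlog/post.gmi", prefixes)
```

With the two-line file from the reply, anything under /drafts/ is skipped and everything else on the capsule stays crawlable. Keep in mind that robots.txt is only a request: well-behaved crawlers honor it, but it does not hide or protect the files.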

👽 freezr

@moddedbear thanks! 👍🍺 · 1 year ago

👽 moddedbear

I'm just noticing that robots.txt is way simpler than most pages in search results make it out to be. · 1 year ago

👽 moddedbear

You could start it off with "User-agent: *" to target all crawlers, and then, if you wanted to disallow a directory called drafts, the next line should be "Disallow: /drafts/". · 1 year ago

👽 freezr

@moddedbear I did not understand how to use it...

My level of abstraction can't decode information laid out that way... 😩😩😩 · 1 year ago

👽 moddedbear

I don't think you typically have to worry as long as you're not linking to your drafts anywhere, but you should look into putting a robots.txt in your capsule root. I'll let you look up the specifics, but it's a pretty simple text file that tells crawlers where they should or shouldn't go. · 1 year ago