
👽 freezr

Dear search engine makers, how can I tell your services not to crawl some specific folders?

I have a folder where I upload drafts, and I don't want it to be discoverable.

Can you help me with this? 🙏

Thanks! 👍

11 months ago


6 Replies

👽 acidus

It's pretty easy. As others have said, putting a "robots.txt" file in the root of your capsule is what you need. It contains lines that tell crawlers which URLs to ignore. Solderpunk specified that Gemini uses a really primitive version of robots.txt. Here is mine:

gemini://gemi.dev/robots.txt

Any URL that starts with the text listed on a Disallow line will be ignored. So if you want crawlers to ignore anything inside "gemini://example.com/drafts/", put this in:

User-agent: *

Disallow: /drafts/

· 11 months ago
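As an illustration of that prefix rule, here is a minimal Python sketch of the check a crawler might make (hypothetical code, not from this thread; it ignores User-agent grouping for brevity):

# Sketch of the prefix rule: a URL path is skipped if it starts
# with any Disallow prefix found in robots.txt (hypothetical helpers).
def parse_robots(text):
    prefixes = []
    for line in text.splitlines():
        line = line.strip()
        if line.lower().startswith("disallow:"):
            prefixes.append(line.split(":", 1)[1].strip())
    return prefixes

def allowed(path, prefixes):
    # An empty Disallow value disallows nothing, so empty prefixes are skipped.
    return not any(prefix and path.startswith(prefix) for prefix in prefixes)

prefixes = parse_robots("User-agent: *\nDisallow: /drafts/")
print(allowed("/drafts/post1.gmi", prefixes))  # False -- crawler skips it
print(allowed("/blog/post1.gmi", prefixes))    # True  -- crawler may fetch it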

👽 freezr

@moddedbear thanks! 👍🍺 · 11 months ago

👽 moddedbear

I'm just noticing that robots.txt is way simpler than most pages in search results make it out to be. · 11 months ago

👽 moddedbear

You could start it off with "User-agent: *" to target all crawlers. Then, if you wanted to disallow a directory called "drafts", the next line should be "Disallow: /drafts/". · 11 months ago
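Put together, the two lines described above make up the entire file:

User-agent: *
Disallow: /drafts/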

👽 freezr

@moddedbear I didn't understand how to use it...

My level of abstraction doesn't decode information laid out that way... 😩😩😩 · 11 months ago

👽 moddedbear

I don't think you typically have to worry as long as you're not linking to your drafts anywhere, but you should look into putting a robots.txt in your capsule root. I'll let you look up the specifics, but it's a pretty simple text file that tells crawlers where they should or shouldn't go. · 11 months ago
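For example, assuming a capsule served at gemini://example.com/ (a placeholder host), crawlers would look for the file at:

gemini://example.com/robots.txt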