[discussion] The matter of Robots.txt

Andrew Singleton <singletona082 (a) gmail.com>

I'm going to lead in with a question prompted by Sean's experiences.

Do we even need a robots.txt?

-- 
-----
http://singletona082.flounder.online
gemini://singletona082.flounder.online
My online presence

Alan Bunbury <gemini (a) bunburya.eu>

Why wouldn't we? We certainly have a lot of bots so it seems reasonable 
to have robots.txt.

I learned the value of robots.txt soon after setting up Remini, my 
Gemini proxy for Reddit. Many Reddit pages tend to link to a lot of 
other Reddit pages, so crawlers that visited Remini were sent down a 
rabbit hole which ultimately led to them trying to index all of Reddit 
(which is huge) via the proxy.

That's obviously not a usual case but I don't think it's *that* unusual 
either, in Geminispace. More generally, it seems obvious to me that 
there should be a (mostly) agreed-upon way to direct the behaviour of 
bots that visit one's capsule, so if there are good arguments against 
robots.txt I'd be interested in hearing them. Strictly speaking, though, 
I don't think this is a Gemini question, as the robots exclusion 
standard is quite separate from Gemini (or HTTP).
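
To illustrate the proxy case, here is a minimal sketch of how a 
well-behaved crawler might check a capsule's robots.txt before following 
links. The /r/ path prefix and the "indexer" agent name are assumptions 
made for the example, not Remini's actual layout. The rules are parsed 
from a string because Python's standard library does not fetch gemini:// 
URLs; a real crawler would retrieve the file itself first.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a proxy capsule. The /r/ prefix is an
# assumption for illustration, not Remini's real URL structure.
ROBOTS_TXT = """\
User-agent: *
Disallow: /r/
"""


def allowed(agent: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/index.gmi", "/r/python/comments"):
        print(path, allowed("indexer", path))
```

With rules like these, a crawler that honours the file would still index 
the capsule's own pages but would not descend into the proxied content.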

On 21/10/2021 13:41, Andrew Singleton wrote:
>
> I'm going to lead in with a question prompted by Sean's experiences.
>
> Do we even need a robots.txt?
>
> -- 
> -----
> http://singletona082.flounder.online
> gemini://singletona082.flounder.online
> My online presence
