Requests for robots.txt

On Sat, Mar 21, 2020 at 09:39:46PM -0400, Sean Conner wrote:
 
>   I don't mind the crawling, but I am concerned about the references to
> robots.txt.  In the web world, robots.txt lives at the top level and *only*
> at the top level.  I don't think there's been an official response from
> solderpunk about robots.txt, but I would expect it to be very similar to how
> it works on the web---the top level only.
> 
>   But a clarification would be nice (either way).  In my opinion, it should
> only live at the top level, but I can adapt to every "directory" as well.
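
The two placements differ in which URLs a crawler would have to fetch before requesting a page. Below is a minimal Python sketch. It assumes a `gemini://` URL that `urllib.parse.urlsplit` can split into host and path. The function names and the `example.org` host are hypothetical, and the per-directory variant only illustrates the alternative Sean mentions; it is not an agreed convention.

```python
from urllib.parse import urlsplit


def top_level_robots_url(url: str) -> str:
    # Web-style convention: exactly one robots.txt, at the root of the host.
    # netloc keeps any explicit port (e.g. example.org:1965).
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def per_directory_robots_urls(url: str) -> list[str]:
    # Hypothetical alternative: a robots.txt may sit in every directory on
    # the path, so a crawler would check each one from the root downwards.
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}"
    # Drop the leading "" and the final segment (a file name, or "" when the
    # path ends in "/"), leaving only the directory names.
    directories = parts.path.split("/")[1:-1]
    urls = [base + "/robots.txt"]
    prefix = "/"
    for name in directories:
        prefix += name + "/"
        urls.append(base + prefix + "robots.txt")
    return urls


if __name__ == "__main__":
    page = "gemini://example.org/users/sean/phlog.gmi"
    print(top_level_robots_url(page))
    for candidate in per_directory_robots_urls(page):
        print(candidate)
```

Under the top-level rule a crawler fetches one file per host and can cache it. The per-directory reading multiplies the number of lookups by the depth of the path.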

This is nicely timed, actually, as things like robots.txt are now
looming larger on my personal radar than they have previously - with
CAPCOM I am writing for the first time a program which automatically
makes Gemini requests, and I'm very keen on making sure that it's a
"good citizen".  There hasn't been too much overt discussion of good
Gemini citizenship yet, but now that non-human clients are becoming more
common, there should be.  Robots.txt is obviously part of that package.

(It's *not* super relevant to feed aggregation, because nobody publishes
a feed without the expectation that it will be read by bots, but
other issues, especially rate limiting, are.)
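
As one illustration of rate limiting for an automated client such as a feed aggregator, here is a minimal Python sketch that enforces a minimum gap between requests to the same host. The class name and the 1-second default are assumptions. Nothing in this thread or in Gemini specifies a limit. The clock and sleep functions are injectable so the behaviour can be checked without real waiting.

```python
import time
from typing import Callable
from urllib.parse import urlsplit


class PerHostRateLimiter:
    """Hypothetical helper: at most one request per host every min_interval seconds.

    The 1.0 s default is an arbitrary assumption, not a Gemini rule.
    """

    def __init__(
        self,
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_request: dict[str, float] = {}  # host -> time of last request

    def wait(self, url: str) -> float:
        """Block until a request to url's host is allowed; return seconds slept."""
        host = urlsplit(url).netloc
        now = self._clock()
        last = self._last_request.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, last + self.min_interval - now)
        if delay > 0:
            self._sleep(delay)
        # Record when this request is actually allowed to go out.
        self._last_request[host] = now + delay
        return delay
```

A crawler would call `wait(url)` immediately before each request. Different hosts do not delay each other, so a crawl spread over many capsules stays fast while no single server is hammered.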

It's been many years since I read any robots.txt specs from the web.  I
will refresh my memory and start thinking about this, and asking
questions, in the hopes that we can finalise some stuff soon.

Cheers,
Solderpunk

---

Previous in thread (1 of 3): 🗣️ Sean Conner (sean (a) conman.org)

Next in thread (3 of 3): 🗣️ Natalie Pendragon (natpen (a) natpen.net)