💾 Archived View for gemi.dev › gemini-mailing-list › 000054.gmi captured on 2024-08-25 at 08:58:33. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Requests for robots.txt

📧 Messages: 3
🗣️ Authors: 3
📅 First Message: 2020-03-22 01:39
📅 Last Message: 2020-03-22 11:59

1. Sean Conner (sean (a) conman.org)

📅 Sent: 2020-03-22 01:39
📧 Message 1 of 3


  I'm going through my Gemini logs, and I'm finding this:

remote=XXX.XXX.XXX.XXX status=51 request="gemini://gemini.conman.org/sourcecode/lua:1965/robots.txt" bytes=14 subject="" issuer=""
remote=XXX.XXX.XXX.XXX status=51 request="gemini://gemini.conman.org/sourcecode/lua/glv-1:1965/robots.txt" bytes=14 subject="" issuer=""
remote=XXX.XXX.XXX.XXX status=51 request="gemini://gemini.conman.org/sourcecode/lua/glv-1/handlers:1965/robots.txt" bytes=14 subject="" issuer=""
remote=XXX.XXX.XXX.XXX status=51 request="gemini://gemini.conman.org/sourcecode/lua/glv-1/handlers/filesystem.lua:1965/robots.txt" bytes=14 subject="" issuer=""
remote=XXX.XXX.XXX.XXX status=51 request="gemini://gemini.conman.org/sourcecode/lua/glv-1/handlers/sample.lua:1965/robots.txt" bytes=14 subject="" issuer=""
remote=XXX.XXX.XXX.XXX status=51 request="gemini://gemini.conman.org/sourcecode/lua/glv-1/handlers/userdir.lua:1965/robots.txt" bytes=14 subject="" issuer=""
remote=XXX.XXX.XXX.XXX status=51 request="gemini://gemini.conman.org/sourcecode/lua/glv-1/msg.lua:1965/robots.txt" bytes=14 subject="" issuer=""
remote=XXX.XXX.XXX.XXX status=51 request="gemini://gemini.conman.org/sourcecode/lua/glv-1/cgi.lua:1965/robots.txt" bytes=14 subject="" issuer=""

(I'm censoring the IP to protect the guilty here)

  I don't mind the crawling, but I am concerned about the references to
robots.txt.  In the web world, robots.txt lives at the top level and *only*
at the top level.  I don't think there's been an official response from
solderpunk about robots.txt, but I would expect it to be very similar to how
it works on the web---the top level only.

  But a clarification would be nice (either way).  In my opinion, it should
only live at the top level, but I can adapt to every "directory" as well.

  -spc


2. solderpunk (solderpunk (a) SDF.ORG)

📅 Sent: 2020-03-22 11:51
📧 Message 2 of 3

On Sat, Mar 21, 2020 at 09:39:46PM -0400, Sean Conner wrote:

>   I don't mind the crawling, but I am concerned about the references to
> robots.txt.  In the web world, robots.txt lives at the top level and *only*
> at the top level.  I don't think there's been an official response from
> solderpunk about robots.txt, but I would expect it to be very similar to how
> it works on the web---the top level only.
> 
>   But a clarification would be nice (either way).  In my opinion, it should
> only live at the top level, but I can adapt to every "directory" as well.

This is nicely timed, actually, as things like robots.txt are now
looming larger on my personal radar than they have previously - with
CAPCOM I am writing for the first time a program which automatically
makes Gemini requests, and I'm very keen on making sure that it's a
"good citizen".  There hasn't been too much overt discussion of good
Gemini citizenship yet, but now that non-human clients are becoming more
common, there should be.  Robots.txt is obviously part of that package.

(It's *not* super relevant to feed aggregation, because nobody publishes
a feed without the expectation that it is read entirely by bots, but
other issues, especially rate limiting, are.)

It's been many years since I read any robots.txt specs from the web.  I
will refresh my memory and start thinking about this, and asking
questions, in the hopes that we can finalise some stuff soon.

Cheers,
Solderpunk


3. Natalie Pendragon (natpen (a) natpen.net)

📅 Sent: 2020-03-22 11:59
📧 Message 3 of 3

FWIW I'm 99% sure those are requests from GUS, and I agree that it
should be top level only. That was a regression in GUS' crawling code,
which I've now fixed!

I'm still very happy to accommodate more official guidance on how
robots.txt should work, but in the meantime (and in the absence of any
more regressions, eep!) I plan to check top-level-only robots.txt.

So sorry about the :bug:!
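
The top-level-only rule discussed in this thread can be sketched in
Python as follows. This is a minimal illustration, not GUS's actual
code: the function name `robots_url` and the constant
`DEFAULT_GEMINI_PORT` are hypothetical, and the sketch assumes a
crawler checks exactly one robots.txt per host and port, at the root
path, whatever page it is about to fetch.

```python
from urllib.parse import urlsplit

# Assumption: 1965 is used when the URL carries no explicit port.
DEFAULT_GEMINI_PORT = 1965


def robots_url(page_url: str) -> str:
    """Return the top-level robots.txt URL for the host serving page_url.

    The path, query and fragment of page_url are discarded on purpose,
    so every page on a host maps to the same robots.txt location.
    """
    parts = urlsplit(page_url)
    if parts.scheme != "gemini" or not parts.hostname:
        raise ValueError(f"not a Gemini URL: {page_url!r}")

    host = parts.hostname
    if ":" in host:
        # IPv6 literals need their brackets restored in a URL.
        host = f"[{host}]"

    port = parts.port or DEFAULT_GEMINI_PORT
    return f"gemini://{host}:{port}/robots.txt"
```

For a page such as
gemini://gemini.conman.org/sourcecode/lua/glv-1/cgi.lua this yields
gemini://gemini.conman.org:1965/robots.txt, rather than a request with
the port and robots.txt appended after the page's own path.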

