[tech] robots.txt format

From: rwagner at rw-net.de

Date: Wed, 27 Jan 2021 10:40:39 +0100 (CET)

Hi,

simple question:

is the following robots.txt format valid in a form that the "disallow" is applied to all User-agents mentioned before?

---
User-agent: researcher
User-agent: indexer
User-agent: archiver
Disallow: about
---

or do i need to be more chatty?

---
User-agent: researcher
Disallow: about
User-agent: indexer
Disallow: about
User-agent: archiver
Disallow: about
---

kind regards

René

--------

From: Stephane Bortzmeyer

Date: Wed, 27 Jan 2021 11:27:41 +0100

On Wed, Jan 27, 2021 at 10:40:39AM +0100, René Wagner <rwagner at rw-net.de> wrote a message of 23 lines which said:

> simple question:

Complicated answers:

> is the following robots.txt format valid in a form that the
> "disallow" is applied to all User-agents mentioned before?

1) There is no standard for robots.txt.

2) There is not yet an "official" adaptation to Gemini, just proposals.

--------

From: Sean Conner

Date: Wed, 27 Jan 2021 05:56:04 -0500

It was thus said that the Great René Wagner once stated:

> Hi,
> simple question:
> is the following robots.txt format valid in a form that the "disallow" is applied to all User-agents mentioned before?
> ---
> User-agent: researcher
> User-agent: indexer
> User-agent: archiver
> Disallow: about
> ---

That will work, but you need to add a leading '/' to the Disallow line:

Disallow: /about

That will match any request starting with '/about', like '/about', '/aboutthis', '/about/that', etc.
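
The prefix rule described above can be sketched in a few lines of Python. This is only an illustration, assuming a plain string-prefix comparison against the request path; the function name is_disallowed is made up for this example, and real crawlers may implement matching differently.

```python
def is_disallowed(path: str, disallow_prefixes: list[str]) -> bool:
    """Return True if the request path starts with any non-empty Disallow prefix."""
    # An empty Disallow value conventionally means "nothing is disallowed",
    # so empty prefixes are skipped.
    return any(path.startswith(prefix) for prefix in disallow_prefixes if prefix)


if __name__ == "__main__":
    # With the leading '/', the rule covers everything that starts with '/about'.
    for path in ["/about", "/aboutthis", "/about/that", "/index.gmi"]:
        print(path, is_disallowed(path, ["/about"]))
    # Request paths begin with '/', so a prefix without it never matches.
    print("/about", is_disallowed("/about", ["about"]))
```

Under this assumption, 'Disallow: about' without the slash would never block anything, which is why the leading '/' matters.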

> or do i need to be more chatty?
> ---
> User-agent: researcher
> Disallow: about
> User-agent: indexer
> Disallow: about
> User-agent: archiver
> Disallow: about
> ---

That will work too (same thing about the Disallow: line though). You can read more about it at <http://www.robotstxt.org/>.
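
Both layouts can be compared with Python's standard urllib.robotparser module, which groups consecutive User-agent lines into one record. A minimal sketch, assuming the leading '/' has been added as suggested; the host example.org and the agent name someotherbot are placeholders, and other crawlers, Gemini ones in particular, may parse the file differently.

```python
from urllib.robotparser import RobotFileParser

# Compact form: several User-agent lines share one Disallow rule.
GROUPED = """\
User-agent: researcher
User-agent: indexer
User-agent: archiver
Disallow: /about
"""

# Chatty form: the rule is repeated for each User-agent.
REPEATED = """\
User-agent: researcher
Disallow: /about
User-agent: indexer
Disallow: /about
User-agent: archiver
Disallow: /about
"""

# Placeholder agents and URLs used only for this comparison.
AGENTS = ["researcher", "indexer", "archiver", "someotherbot"]
URLS = ["gemini://example.org/about/that", "gemini://example.org/index.gmi"]


def decisions(text):
    """Parse a robots.txt body and return the allow/block decision per (agent, URL)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return {(a, u): parser.can_fetch(a, u) for a in AGENTS for u in URLS}


if __name__ == "__main__":
    grouped = decisions(GROUPED)
    repeated = decisions(REPEATED)
    print("same decisions:", grouped == repeated)
    for (agent, url), allowed in grouped.items():
        print(agent, url, "allowed" if allowed else "blocked")
```

With this parser, the two files lead to the same decisions for every agent and URL checked here; an agent that is not listed is not restricted by either file.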

-spc

--------

From: Stephane Bortzmeyer

Date: Wed, 27 Jan 2021 12:12:54 +0100

On Wed, Jan 27, 2021 at 05:56:04AM -0500, Sean Conner <sean at conman.org> wrote a message of 33 lines which said:

> That will work too (same thing about the Disallow: line though). You can
> read more about it at <http://www.robotstxt.org/>.

But do note that many Gemini capsules do not follow this specification but one of the others (typically more complicated).

--------

From: rwagner at rw-net.de

Date: Wed, 27 Jan 2021 15:38:48 +0100 (CET)

Thanks for the replies.

I've opted for the first version at the moment.

Of course, no one knows exactly how the crawlers out there are implemented, or whether they obey robots.txt at all.

At least I can serve a valid robots.txt now.

cheers

René

--------