💾 Archived View for gemi.dev › gemini-mailing-list › 000641.gmi captured on 2024-08-19 at 01:30:44. Gemini links have been rewritten to link to archived content


[tech] robots.txt format

1. René Wagner (rwagner (a) rw-net.de)

Hi,

simple question: 
is the following robots.txt format valid in a form that the "disallow" is 
applied to all User-agents mentioned before?
---
User-agent: researcher
User-agent: indexer
User-agent: archiver
Disallow: about
---

or do i need to be more chatty?
---
User-agent: researcher
Disallow: about
User-agent: indexer
Disallow: about
User-agent: archiver
Disallow: about
---

kind regards
René


2. Stephane Bortzmeyer (stephane (a) sources.org)

On Wed, Jan 27, 2021 at 10:40:39AM +0100,
 René Wagner <rwagner at rw-net.de> wrote
 a message of 23 lines which said:

> simple question:

Complicated answers:

> is the following robots.txt format valid in a form that the
> "disallow" is applied to all User-agents mentioned before?

1) There is no standard for robots.txt.

2) There is not yet an "official" adaptation to Gemini, just
proposals.


3. Sean Conner (sean (a) conman.org)

It was thus said that the Great René Wagner once stated:
> Hi,
> 
> simple question: 
> is the following robots.txt format valid in a form that the "disallow"
> is applied to all User-agents mentioned before?
> ---
> User-agent: researcher
> User-agent: indexer
> User-agent: archiver
> Disallow: about
> ---

  That will work, but you need to add a leading '/' to the Disallow line:

Disallow: /about

That will match any request starting with '/about', like '/about',
'/aboutthis', '/about/that', etc.  
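
  The prefix matching described above can be sketched in a few lines of Python. This is only an illustration, not how any particular crawler does it: the function name is_disallowed is made up for the example, and it assumes plain string-prefix comparison with no wildcards or Allow lines.

```python
def is_disallowed(path: str, disallow_prefixes: list[str]) -> bool:
    """Return True if the request path starts with any Disallow prefix.

    Assumption: plain prefix matching only. An empty Disallow value is
    ignored here, since it conventionally means "nothing is disallowed".
    """
    return any(prefix and path.startswith(prefix) for prefix in disallow_prefixes)


if __name__ == "__main__":
    rules = ["/about"]
    for path in ["/about", "/aboutthis", "/about/that", "/contact"]:
        print(path, is_disallowed(path, rules))
```

  Under the same assumption, a rule written as 'about' without the leading '/' never matches, because request paths start with '/'.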

> or do i need to be more chatty?
> ---
> User-agent: researcher
> Disallow: about
> User-agent: indexer
> Disallow: about
> User-agent: archiver
> Disallow: about
> ---

  That will work too (same thing about the Disallow: line though).  You can
read more about it at <http://www.robotstxt.org/>.
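
  As a rough check that the grouped and the chatty forms mean the same thing, here is a minimal parser sketch in Python. It is not a reference implementation: parse_robots is a name invented for this example, it only understands User-agent and Disallow lines, and it assumes that a User-agent line following a rule line starts a new group.

```python
def parse_robots(text: str) -> dict[str, list[str]]:
    """Map each user-agent (lowercased) to its list of Disallow values.

    Assumptions: consecutive User-agent lines share the rules that follow,
    and a User-agent line after a rule line begins a new group.
    """
    rules: dict[str, list[str]] = {}
    current_agents: list[str] = []
    seen_rule = False  # True once the current group has a rule line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or unrecognised line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:
                current_agents = []
                seen_rule = False
            agent = value.lower()
            current_agents.append(agent)
            rules.setdefault(agent, [])
        elif field == "disallow":
            seen_rule = True
            for agent in current_agents:
                rules[agent].append(value)
    return rules


GROUPED = """\
User-agent: researcher
User-agent: indexer
User-agent: archiver
Disallow: /about
"""

CHATTY = """\
User-agent: researcher
Disallow: /about
User-agent: indexer
Disallow: /about
User-agent: archiver
Disallow: /about
"""

if __name__ == "__main__":
    print(parse_robots(GROUPED) == parse_robots(CHATTY))
```

  Both inputs yield the same mapping, with each of the three agents getting the single rule '/about'. A real crawler may group lines differently, so treat this as a sanity check only.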

  -spc


4. Stephane Bortzmeyer (stephane (a) sources.org)

On Wed, Jan 27, 2021 at 05:56:04AM -0500,
 Sean Conner <sean at conman.org> wrote 
 a message of 33 lines which said:

>   That will work too (same thing about the Disallow: line though).  You can
> read more about it at <http://www.robotstxt.org/>.

But do note that many Gemini capsules do not follow this specification
but one of the others (typically more complicated).


5. René Wagner (rwagner (a) rw-net.de)

Thanks for the replies.

I've opted for the first version at the moment.
Of course, no one knows exactly how the crawlers out there are implemented or
whether they obey robots.txt at all.

At least I can serve a valid robots.txt now.

cheers
René

