Here’s the proposal that was recently discussed on the Gopher mailing list.
We love gopher apps and we love seeing them, but it is very hard for robots crawling gopher-space to recognize them automatically. That means lots of manual work to pull stuff out of the index that should never have been there in the first place. Please use a `robots.txt` selector to keep spiders out of these areas.
A robot MUST check `robots.txt`. A robot MAY check `0/robots.txt` if `robots.txt` is not found.
The reason for the first selector is that almost every server interprets a selector of `robots.txt` as a file in its root. The reason for the second is that UMN or UMN-alike gopherds like to have the item type repeated at the start of the selector. The first takes precedence.
Note that this doesn’t include a leading slash!
How to test? The following should return the contents of the site’s `robots.txt`.
echo robots.txt | nc alexschroeder.ch 70
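The lookup order can also be sketched in Python. This is a minimal sketch, not a reference implementation: the names `fetch_selector`, `looks_missing` and `fetch_robots` are made up for illustration. Gopher has no status codes, so the check for "not found" (an empty reply, or a reply starting with the error item type `3`) is an assumption that a real robot may want to refine.

```python
import socket


def fetch_selector(host, selector, port=70, timeout=10.0):
    """Send one Gopher request and return the raw response as text."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # A Gopher request is just the selector followed by CRLF.
        conn.sendall(selector.encode("utf-8") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def looks_missing(response):
    # Assumption: treat an empty reply or an error item (type "3")
    # as "robots.txt not found". This is a heuristic, not part of the proposal.
    return response.strip() == "" or response.startswith("3")


def fetch_robots(host, fetch=fetch_selector):
    """Try `robots.txt` first, then `0/robots.txt`; return None if both are missing."""
    for selector in ("robots.txt", "0/robots.txt"):
        response = fetch(host, selector)
        if not looks_missing(response):
            return response
    return None
```

The `fetch` parameter exists only so the lookup order can be exercised without a network connection.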
A robot SHOULD cache the `robots.txt` file for 24h.
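One way to honour the 24h recommendation is a small per-host cache. The sketch below is only an illustration: `RobotsCache` and its parameters are hypothetical names, and `fetch` stands for whatever function the robot uses to retrieve a host's `robots.txt`.

```python
import time

CACHE_SECONDS = 24 * 60 * 60  # the 24h from the recommendation above


class RobotsCache:
    """Remember each host's robots.txt for up to 24 hours."""

    def __init__(self, fetch, clock=time.time):
        self._fetch = fetch    # callable: host -> robots.txt text (or None)
        self._clock = clock    # injectable so the expiry can be tested
        self._entries = {}     # host -> (time fetched, text)

    def get(self, host):
        now = self._clock()
        entry = self._entries.get(host)
        if entry is not None and now - entry[0] < CACHE_SECONDS:
            return entry[1]    # still fresh, no new request
        text = self._fetch(host)
        self._entries[host] = (now, text)
        return text
```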
A `robots.txt` file consists of lines separated by a newline (`\n`) or a carriage return and a newline (`\r\n`).
A robot MUST consider all lines starting with `Disallow:`. Each such line specifies a pattern indicating that all selectors matching the pattern are to be ignored by robots.
1. Whitespace after `Disallow:` MUST be ignored
2. Patterns match from the beginning of the selector
Example:
The following line disallows robots from indexing any links starting with a slash:
Disallow: /
Note that the selector `robots.txt` does not start with a slash.
In terms of regular expressions, this means that every pattern implicitly starts with `^`.
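The rules so far (line splitting, ignoring whitespace after `Disallow:`, matching from the beginning of the selector) can be sketched as follows. The function names are made up for illustration, and wildcards and comments are left out. The proposal does not say what an empty pattern means, so skipping it here is an assumption.

```python
def disallow_patterns(robots_txt):
    """Collect the patterns of all `Disallow:` lines."""
    patterns = []
    for line in robots_txt.split("\n"):
        line = line.rstrip("\r")  # lines may end in \r\n
        if not line.startswith("Disallow:"):
            continue
        # Whitespace after "Disallow:" is ignored.
        pattern = line[len("Disallow:"):].lstrip()
        if pattern:  # assumption: an empty pattern is skipped
            patterns.append(pattern)
    return patterns


def is_disallowed(selector, patterns):
    """True if any pattern matches from the beginning of the selector."""
    return any(selector.startswith(pattern) for pattern in patterns)
```

With the single line `Disallow: /`, the selector `/cgi-bin/app` is disallowed while `robots.txt` is not.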
Patterns MAY contain one or more asterisks (`*`). These are wildcards matching zero or more characters.
Example:
The following line disallows robots from indexing any links containing a slash:
Disallow: */
Note that there is no way to specify that a pattern must match up to the end of the selector.
In terms of regular expressions, this means that there is no way to specify `$`.
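The wildcard rule can be sketched by translating each pattern into a regular expression that is anchored only at the start. The helper names `pattern_to_regex` and `matches` are hypothetical; this is one possible reading of the proposal, not a definitive implementation.

```python
import re


def pattern_to_regex(pattern):
    """Translate a Disallow pattern into a compiled regular expression."""
    # Escape every literal piece, then join the pieces with ".*"
    # so that each "*" matches zero or more characters.
    pieces = [re.escape(piece) for piece in pattern.split("*")]
    return re.compile(".*".join(pieces))


def matches(pattern, selector):
    # re.match anchors at the beginning of the selector only,
    # so the pattern never has to match up to the end.
    return pattern_to_regex(pattern).match(selector) is not None
```

Here `matches("*/", "foo/bar")` is true, while `matches("*/", "robots.txt")` and `matches("/", "foo/bar")` are both false.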
Authors SHOULD use the `#` character to indicate a comment that runs up to the end of the line.
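A robot could drop comments before looking for `Disallow:` lines. The sketch below rests on two assumptions that the proposal does not spell out: a `#` starts a comment wherever it appears in the line (so a pattern cannot contain `#`), and whitespace left in front of the comment is trimmed. The name `strip_comment` is made up for illustration.

```python
def strip_comment(line):
    """Remove a trailing `#` comment from one line of robots.txt."""
    # Assumption: "#" always starts a comment, even in the middle of a line.
    before_comment = line.split("#", 1)[0]
    # Assumption: whitespace between the pattern and the comment is not
    # part of the pattern.
    return before_comment.rstrip()
```

A line that consists only of a comment becomes the empty string and no longer starts with `Disallow:`.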
There is currently no support for other keywords we know from the web’s `robots.txt` standard. [1]
[1] https://en.wikipedia.org/wiki/Robots_exclusion_standard
#Gopher