robots.txt for Gemini formalised

"Drew DeVault" <sir at cmpwn.com> writes:

> A web portal is a one-to-one mapping of a user request to a gemini
> request. It's not an automated process. It's a genuine user agent, an
> agent of a user.

I believe the concern is not that a web portal will archive pages, or
run on its own as an automated process, but that it will be used by a
third-party web bot (i.e., one not run by the owner of the portal) to
crawl Gemini sites and index them on the web.

> As the maintainer of such a web portal, I officially NACK any
> suggestion that it should obey robots.txt, and will not introduce such
> a feature.

It seems to me that the correct thing is for people who run web portals
to have a very strong robots.txt on /their/ web site and, additionally,
to be proactive about blocking web bots that don't observe robots.txt.
I think people want to block web portals in their Gemini robots.txt
because they don't trust web portal authors to do those two things. I
understand the feeling, but then they're still trusting web portal
authors to obey robots.txt, which is honestly more work for them than
the two web-side measures.
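
For example, a portal could serve something like this at
https://portal.example/robots.txt (the hostname is just a placeholder)
to tell well-behaved web crawlers to stay out entirely:

    User-agent: *
    Disallow: /

That only helps against bots that honour robots.txt, of course; the
rest still have to be blocked at the web server, by user-agent or IP.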

-- 
+-----------------------------------------------------------+
| Jason F. McBrayer                    jmcbray at carcosa.net  |
| A flower falls, even though we love it; and a weed grows, |
| even though we do not love it.            -- Dogen        |
+-----------------------------------------------------------+
