
On: WWW indexing concerns

Published 2020-02-26

m68k brought up an interesting point on the mailing list: https://portal.mozz.us doesn't have a robots.txt. As a result, all of the gemini content reachable through the portal can be indexed by Google and other WWW crawlers. This was an oversight on my part, something I just never considered. I went ahead and added a robots.txt file that disallows all crawlers.

https://portal.mozz.us/robots.txt
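
For anyone curious what "disallows all crawlers" means in practice, here is a small sketch using Python's standard urllib.robotparser. The DISALLOW_ALL text is the conventional two-line form and only my assumption about what such a file looks like, not a copy of the file served by the portal. The is_allowed helper is a made-up name for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed content: the conventional "block everything" robots.txt.
DISALLOW_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A well-behaved crawler should skip every path on the host.
    print(is_allowed(DISALLOW_ALL, "Googlebot", "https://portal.mozz.us/"))
```

Keep in mind that robots.txt is only a request. Crawlers that ignore it will still fetch whatever they like.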

Personally, web crawlers don't bother me much concerning my own content. I suppose if I had a choice, I would rather *not* have my gemini content accessible on the web. I feel like there's somewhat of a romantic appeal to publishing original content on a fringe platform. It makes my ultimately pointless and insignificant contributions feel a tiny bit less pointless and insignificant.

Back in gopher land, I actually experimented with setting up a captcha specifically to keep web crawlers from getting stuck in request loops. This worked out well. I think something similar might be achievable in gemini, perhaps by presenting an input challenge and then requesting a transient client certificate to validate the session. If only jetforce supported transient TLS certs...

gopher://mozz.us:7006/7/newgame/lost-pig/

(click "Play Lost Pig" to solve a math problem)
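
To make the gemini idea a bit more concrete, here is a rough sketch of the challenge logic in plain Python. It is not jetforce code and it does no TLS at all. ChallengeGate, handle, and the fingerprint argument are all names I made up, and the sketch assumes the server can hand the handler some stable identifier for the client certificate. The numbers 10, 20, and 60 are the gemini status codes for input, success, and client certificate required.

```python
import random


class ChallengeGate:
    """Hypothetical math-challenge gate keyed on a client certificate fingerprint."""

    def __init__(self, rng=None):
        self._rng = rng or random.Random()
        self._pending = {}      # fingerprint -> expected answer
        self._verified = set()  # fingerprints that already passed

    def handle(self, fingerprint, query):
        """Return a (status, meta) pair loosely modeled on a gemini response header."""
        if fingerprint is None:
            # No certificate means no session to attach the challenge to.
            return 60, "Client certificate required"
        if fingerprint in self._verified:
            return 20, "text/gemini"
        expected = self._pending.get(fingerprint)
        if expected is not None and query is not None and query.strip() == str(expected):
            del self._pending[fingerprint]
            self._verified.add(fingerprint)
            return 20, "text/gemini"
        # First visit or wrong answer: issue a fresh challenge.
        a, b = self._rng.randint(1, 9), self._rng.randint(1, 9)
        self._pending[fingerprint] = a + b
        return 10, f"What is {a} + {b}?"
```

A crawler that never answers the prompt just keeps getting status 10 back, so it never reaches the pages behind the gate.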

On the subject of HTTP -> Gemini proxies in general: I would rather be using a native client, no question about that. But I mostly browse gemini on my phone and I'm not a mobile developer (damn you Apple for charging $99 for a license). So I set up a browser shortcut on my home screen that opens portal.mozz.us. I check every couple of days to see if anybody has updated their gemini sites.