💾 Archived View for freeshell.de › gemlog › 2022-05-15_Loopy_links.gmi captured on 2022-07-16 at 13:38:17. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
One thing I found by looking at where my #hashtag crawler went was that some people have a bottomless pit of links.
The first one I saw was someone exposing a repository of the site content. That included a link to the content itself. Not a link to the actual site, but to the copy of it in the repo. That had a repo link, where you could find a site link, and so on. I spotted this when it got to several levels of site/repo/site/repo/site/repo and told the crawler to give up. I'm mildly curious how deep that could go. I suppose it's limited by the maximum length of a Gemini request (the spec caps the URL at 1024 bytes), assuming that either the server or the client respects that limit.
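For what it's worth, here is a minimal sketch of how a crawler might decide to give up on that kind of site/repo/site/repo nesting. This is not what my crawler actually does: the function name, the repeat threshold of 3 and the example.org URLs are all made up for illustration. The only hard number is the 1024-byte URL limit from the Gemini spec.

```python
from collections import Counter
from urllib.parse import urlsplit

MAX_URL_BYTES = 1024      # URL length limit from the Gemini specification
MAX_SEGMENT_REPEATS = 3   # hypothetical threshold, tune to taste


def should_give_up(url: str) -> bool:
    """Return True if the URL is too long or its path looks like a loop."""
    # A URL over the protocol limit can't be requested anyway.
    if len(url.encode("utf-8")) > MAX_URL_BYTES:
        return True
    # Count how often each path segment occurs; a loop such as
    # site/repo/site/repo/... repeats the same names over and over.
    segments = [s for s in urlsplit(url).path.split("/") if s]
    counts = Counter(segments)
    return any(n > MAX_SEGMENT_REPEATS for n in counts.values())
```

Counting repeated segments is crude and could reject a legitimate path that happens to reuse a directory name a lot, so the threshold is a judgment call.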
There's another one where someone has a public bookmarking system which is paginated. The first page has a link like /bookmarks?2 that goes to the next page, and so on. The interesting part is that this link is there even if there are no more bookmarks. I tried /bookmarks?9999999999999999999 and it broke. One fewer 9 was OK. I told my crawler to give up on those too.
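A rough sketch of one way to stop a crawler walking endless pagination like this: treat a purely numeric query string as a page number and refuse to go past some cap. The cap of 50 and the function names are assumptions for the sake of the example, not anything the bookmarking capsule or my crawler defines.

```python
from typing import Optional
from urllib.parse import urlsplit

MAX_PAGE = 50  # hypothetical cap on how many pages to follow


def page_number(url: str) -> Optional[int]:
    """Return the page number if the query is all ASCII digits, else None."""
    query = urlsplit(url).query
    if query.isascii() and query.isdigit():
        return int(query)  # Python ints don't overflow, however many 9s
    return None


def within_page_limit(url: str, max_page: int = MAX_PAGE) -> bool:
    """Return False only for numeric pages beyond the cap."""
    page = page_number(url)
    return page is None or page <= max_page
```

A fixed cap will miss bookmarks on a capsule that really does have more pages than that. Stopping when a page offers no links the crawler hasn't already seen would probably be a better test, but it needs the crawler's own state.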
Having read Sean Conner's experiences with crawlers that won't give up on an infinite redirect loop, I think that some crawlers are probably probing the limits of those two capsules.