About a month ago, I was checking my webserver logs when I noticed mutiple requests to pages that have long been marked as gone. The webserver has been returning HTTP (HyperText Transport Protocol) status code 410 Gone [1], in some cases, for over fifteen years! At first I was annoyed—why are these webbots still requesting pages I've marked as gone? But then I started thinking about it—if I were writing a webbot to scan web pages, what would I do if I got a “gone” status? Well, I'd delete any references to said page, for sure. But when what if I came across the link on another page? I don't have the link (because I deleted it earlier) so let's add it to scan. Lather, rinse, repeat.
So there's a page or pages out there that are still linking to the pages that are long gone. And upon further investigation, I found the pages—my own site!
Sigh.
I've fixed some of the links—mostly the ones that have been causing some real issues with Gemini requests, but I still have scores of links to fix in the blog [2].
I also noticed a large number of permanent redirects, and again, the cause are pages on my own site linking to the non-canonical source. This isn't that much of an issue for HTTP (because the HTTP connection is still open for further requests) but it is one for Gemini (because each request is a separate connection to the server). I started fixing them, but when I did a full scan of the site (and it's mostly links on my blog) there are a significant number of links to fix—around 500 or so. And mostly in the first five years of entries.
[1] https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.11