💾 Archived View for freeshell.de › gemlog › 2022-01-08.gmi captured on 2023-07-22 at 16:30:37. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
See previous posts about #hashtags
Other crawlers report hundreds of thousands of URLs. Mine's looking like it might stay in the tens of thousands. Estimating this is hard.
There are all sorts of things a crawler finds that I don't see in everyday usage. There are message boards and mirrors of web content and blog entries from decades ago and many many things that make no sense to me at all. You won't see those just following Antenna.
Sometimes the crawler just waits forever for a URL. I thought it was broken, but other clients behave the same way. So occasionally I have to kill a stuck request by hand. Odd.
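One way to avoid killing stuck requests by hand would be an overall deadline on each response. What follows is only a sketch of that idea, not what my crawler does. The names `read_until_closed` and `FetchTimeout` and the 30 second default are made up for illustration, and it assumes the caller has already opened the connection and sent the request line.

```python
import socket
import time


class FetchTimeout(Exception):
    """Raised when the response is not complete before the overall deadline."""


def read_until_closed(sock, total_timeout=30.0, chunk_size=4096, max_bytes=1_000_000):
    """Read from a connected socket until the peer closes it.

    Hypothetical helper: gives up once total_timeout seconds have passed
    in total, rather than allowing that long for every single recv().
    """
    deadline = time.monotonic() + total_timeout
    chunks = []
    received = 0
    # Stop reading once max_bytes have arrived; the result may be cut short.
    while received < max_bytes:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise FetchTimeout("response not finished before deadline")
        # Shrink the per-call timeout so the total wait stays bounded.
        sock.settimeout(remaining)
        try:
            data = sock.recv(chunk_size)
        except socket.timeout:
            raise FetchTimeout("response not finished before deadline") from None
        if not data:
            break  # the peer closed the connection, so the response is done
        chunks.append(data)
        received += len(data)
    return b"".join(chunks)
```

A plain per-call socket timeout would probably catch most stuck requests too. The total deadline also covers a server that keeps trickling out a byte now and then.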
The sequence of URLs is random, and I thought that would be enough to avoid hammering anyone's capsule. But I still got some "44 slow down" responses. Apologies to those people. I noticed that in all cases the requested wait before the next request was many days, so I just stopped crawling those hosts.
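The handling of status 44 could be sketched roughly like this. It is an illustration, not my crawler's code. It assumes the meta field of a 44 response is a whole number of seconds to wait, which is how I read the Gemini spec. `HostBackoff`, `parse_header` and the one day cutoff are hypothetical.

```python
import time
from urllib.parse import urlsplit

MAX_WAIT_SECONDS = 24 * 60 * 60  # hypothetical cutoff: drop hosts asking for more than a day


def parse_header(line):
    """Split a Gemini response header '<status> <meta>' into (int, str)."""
    status, _, meta = line.strip().partition(" ")
    return int(status), meta


class HostBackoff:
    """Remember which hosts asked us to slow down, and for how long."""

    def __init__(self, max_wait=MAX_WAIT_SECONDS):
        self.max_wait = max_wait
        self.not_before = {}  # host -> earliest time of the next request
        self.dropped = set()  # hosts we have stopped crawling

    def record(self, url, header, now=None):
        """Note a response header; only status 44 changes anything."""
        now = time.time() if now is None else now
        host = urlsplit(url).hostname
        status, meta = parse_header(header)
        if status != 44:
            return
        try:
            wait = int(meta)
        except ValueError:
            wait = self.max_wait + 1  # unreadable wait: treat as too long
        if wait > self.max_wait:
            self.dropped.add(host)
        else:
            self.not_before[host] = now + wait

    def allowed(self, url, now=None):
        """True if a request to this URL's host is currently acceptable."""
        now = time.time() if now is None else now
        host = urlsplit(url).hostname
        if host in self.dropped:
            return False
        return now >= self.not_before.get(host, 0)
```

The crawler would call `allowed(url)` before each request and `record(url, header)` after each response. A host that asks for a wait of many days ends up in `dropped` and is never tried again.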
I find it hard to just let the crawler run. I want to know how it's doing. All the time. I keep running stats scripts and checking for this and that. I should let it be. Particularly as there is no point at which it will finish.
OK, that's all the things.