Archived View for station.martinrue.com › krixano › 33ad4a11ddfd4a4296bc7d4ed066110d captured on 2024-05-26 at 16:38:43. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
Currently running my crawler. It's quite slow right now, but it works. If anyone has problems with it hammering your server (the IP of the crawler is 67.60.37.132), do tell me so I can figure something out.
3 years ago · 👍 skyjake, aka_dude
@skyjake You can try what I have for my Search Engine so far here: gemini://pon.ix.tc/searchengine/
So far 1393 pages have been indexed. The ranking/scoring system hasn't been implemented yet, and neither have backlinks, but those are coming soon. · 3 years ago
gemini://pon.ix.tc/searchengine/
I see! I find crawling and indexing an interesting problem, but don't have the time or resources to work on it myself... Good luck and keep us posted. · 3 years ago
@skyjake Right now I'm just trying to make it fast enough that it doesn't take 111 days to crawl all of geminispace, lol. I will be handling big files - right now I just download a file until I hit a file size limit, then I close the connection before the rest of the file can download - but I'll probably change this in different ways in the future. It is intended to run continuously - it's for a new Gemini Search Engine I am making. Currently, capsules are randomly spread out due to the random ordering of hash tables, so hitting one capsule a bunch of times at once happens less often (this was kinda an accidental optimization, lol). · 3 years ago
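For anyone curious what those two ideas might look like, here is a rough Python sketch. It is not the crawler's actual code: the names `read_capped`, `shuffled_queue`, `MAX_BYTES` and `CHUNK`, the 1 MiB cap, and the example URLs are all made up for illustration. An explicit shuffle stands in for the hash table ordering mentioned above, since Python dicts keep insertion order.

```python
import io
import random

MAX_BYTES = 1024 * 1024  # hypothetical cap; the post does not say what the real limit is
CHUNK = 4096             # hypothetical read size


def read_capped(stream, limit=MAX_BYTES, chunk_size=CHUNK):
    """Read at most `limit` bytes from a binary stream.

    Returns (data, truncated). One byte past the limit is requested so an
    oversized body can be told apart from one that ends exactly at the cap.
    The caller is expected to close the connection afterwards, so the rest
    of a big file is never transferred.
    """
    buf = bytearray()
    while len(buf) <= limit:
        chunk = stream.read(min(chunk_size, limit + 1 - len(buf)))
        if not chunk:
            return bytes(buf), False  # stream ended at or before the cap
        buf.extend(chunk)
    return bytes(buf[:limit]), True  # more data was available; drop the extra byte


def shuffled_queue(urls, rng=random):
    """Return the URLs in random order so requests to one capsule are
    less likely to arrive back to back. An explicit shuffle is used here
    in place of relying on hash table iteration order."""
    queue = list(urls)
    rng.shuffle(queue)
    return queue


# An in-memory stream stands in for the file object of a real connection.
body, truncated = read_capped(io.BytesIO(b"x" * 25), limit=10)

# Placeholder URLs, not real capsules.
queue = shuffled_queue([
    "gemini://a.example/1",
    "gemini://a.example/2",
    "gemini://b.example/1",
])
```

With a real capsule, the stream would be something like the file object of a TLS socket, read after the response header line. A shuffle only makes bursts against one capsule less likely; it does not guarantee a minimum delay between requests to the same host.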
What are your plans for the crawler? Will it have some special intelligence for dealing with capsules that contain large/deep path trees, and/or multi-MB files? Is it intended for running continuously/autonomously, and does it have some sort of heuristics for figuring out how often content might be changing on capsules, so they can be reindexed in a timely (but not excessive) manner? · 3 years ago