💾 Archived View for gemi.dev › gemini-mailing-list › 000477.gmi captured on 2023-11-04 at 12:51:19. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
Greetings,

I'm happy to report that I have finished my effort to create a historical snapshot of geminispace and upload it to the Internet Archive. I ended up running three separate crawls, spaced over a few months. In total, 115,223 unique gemini URLs were captured.

Here are some general statistics and download links:

Crawl | September | October | November
--- | --- | --- | ---
Date | 2020-09-24 | 2020-10-31 | 2020-11-07
Size | 9.3 GB | 12.9 GB | 13.5 GB
Domains seen | 283 | 276 | 314
Total Responses | 51,995 | 71,632 | 65,347
2x Responses | 43,425 | 61,771 | 56,680

https://archive.org/details/mozz-gemini-crawl-2020-1
https://archive.org/details/mozz-gemini-crawl-2020-2
https://archive.org/details/mozz-gemini-crawl-2020-3

More information on the crawls can be found here:

gemini://mozz.us/archive/

The crawling software and related tools can be found here:

https://github.com/michael-lazar/mozz-archiver

I am also temporarily hosting a mirror of this snapshot on my gemini server. It works using proxy URLs (which I thought was a neat idea). You can send a request for any gemini URL to mozz.us:1966, and the server will attempt to retrieve that URL from the snapshot.

Example using gemget:

$ gemget --proxy mozz.us:1966 -o - gemini://gemini.circumlunar.space/capcom/

Best,
Michael
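For readers without gemget, the proxy request described in the post above can be sketched in Python. This is only an illustration, not the tooling used for the crawl or the mirror. It relies on the Gemini request format (the absolute URL followed by CRLF) and on the response header layout of a status code, a space, and a meta string. The function names `build_request`, `parse_header`, and `fetch_via_proxy` are made up for this example, and disabling certificate verification is an assumption made because Gemini servers commonly use self-signed certificates.

```python
import socket
import ssl


def build_request(url: str) -> bytes:
    """Encode a Gemini request: the absolute URL followed by CRLF."""
    encoded = url.encode("utf-8")
    if len(encoded) > 1024:
        raise ValueError("Gemini request URLs are limited to 1024 bytes")
    return encoded + b"\r\n"


def parse_header(line: bytes) -> tuple:
    """Split a response header '<STATUS> <META>' into (status, meta)."""
    text = line.decode("utf-8").rstrip("\r\n")
    status, _, meta = text.partition(" ")
    return int(status), meta


def fetch_via_proxy(url, proxy_host="mozz.us", proxy_port=1966, timeout=10.0):
    """Connect to the proxy host but request an arbitrary gemini:// URL.

    Returns (status, meta, body). The mirror was described as temporary,
    so the default host and port may no longer answer.
    """
    # Assumption: skip certificate verification, since a self-signed
    # certificate is likely. A real client would pin the certificate (TOFU).
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=proxy_host) as tls:
            tls.sendall(build_request(url))
            chunks = []
            while True:
                chunk = tls.recv(4096)
                if not chunk:  # the server closes the connection when done
                    break
                chunks.append(chunk)

    header, _, body = b"".join(chunks).partition(b"\r\n")
    status, meta = parse_header(header)
    return status, meta, body
```

Calling `fetch_via_proxy("gemini://gemini.circumlunar.space/capcom/")` would correspond to the gemget example: the TLS connection goes to the proxy, while the request line names the original URL that should be looked up in the snapshot.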
This is amazing! As an amateur archivist, I really appreciate this. I predict geminispace will only grow, and this will be more and more valuable as time goes on. Will be seeding all three :) The more I look into this, the more pleased I am. Thanks for doing it so well.
---