The Delorean Time Machine (Delorean for short) is a historical archive of content from Geminispace. It's like the Internet Archive's Wayback Machine, just for Gemini.
Delorean tracks a list of URLs in Geminispace. For each URL, Delorean stores "snapshots." A snapshot is the content that URL returned at a specific date and time. As content changes over time, Delorean stores the newer versions alongside the old. By looking at the different snapshots, you can view a URL's content and see how it has changed over time, even if the original capsule is no longer available.
To build its search engine database, Kennedy crawls capsules and downloads each URL's content. Delorean imports this content into its archive.
Absolutely! Delorean respects a capsule's robots.txt file, which allows you to tell crawlers, search engines, and archivers what content can be included or excluded.
Content passes through two filters before being archived in Delorean. First, since all content archived by Delorean comes as a byproduct of Kennedy's crawler, any part of a capsule that is excluded from search engines will not appear in Delorean: content blocked by "Disallow" rules targeting all user-agents, or the "indexer" virtual user-agent, is never crawled by Kennedy, so it cannot appear in Delorean.
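For example, a robots.txt rule like this sketch (/journal/ is a hypothetical path) keeps a directory out of Kennedy's search index, and therefore out of Delorean as well:

```
user-agent: indexer
Disallow: /journal/
```

Since Kennedy never crawls anything under /journal/, those URLs never reach Delorean's import step.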
The second filter is also controlled by robots.txt. Since Delorean is building an archive, it also respects any "archiver" rules in robots.txt. When importing content from Kennedy, Delorean checks the capsule's robots.txt file for "Disallow" rules targeting the "archiver" virtual user-agent, and will not import any content that is blocked.
Create a `robots.txt` file in the root directory of your capsule if it doesn't already exist. Add these lines to it:
```
user-agent: archiver
Disallow: /
```
No worries, mistakes happen. Drop me an email and I can remove it.
You can use "archiver" rules in robots.txt to control what Delorean archives. By excluding content from the "archiver" user-agent while not excluding it from the "indexer" user-agent, your content will appear in search results without appearing in Delorean's archive.
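For example, a robots.txt like this sketch (/drafts/ is a hypothetical path) leaves the capsule searchable while keeping one directory out of the archive:

```
user-agent: archiver
Disallow: /drafts/
```

Because there is no matching rule for the "indexer" user-agent, Kennedy still crawls and indexes /drafts/, but Delorean will not import it.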
If it's served over Gemini, is smaller than 5 MiB, and isn't excluded by robots.txt, it should be archived.
If a URL meets those requirements, and is not excluded by robots.txt rules for indexers or archivers, it will appear in Delorean.
The oldest content in the archives is from September 2020, with most content appearing by mid-2022.
The completeness of the archive really reflects the history of the Kennedy crawler's capabilities, and how often I have run it.