The Delorean Time Machine (Delorean for short) is a historical archive of content from Geminispace. It's like the Internet Archive's Wayback Machine, just for Gemini.
Delorean tracks a list of URLs in Geminispace. For each URL, Delorean stores "snapshots." A snapshot is the content that URL returned at a specific date and time. As content changes over time, Delorean stores the newer versions alongside the old. By looking at the different snapshots, you can view a URL's content and see how it has changed over time, even if the original capsule is no longer available.
To build its search engine database, Kennedy crawls capsules and downloads each URL's content. Delorean imports this content into its archive.
Absolutely! Delorean respects a capsule's robots.txt file, which allows you to tell crawlers, search engines, and archivers what content can be included or excluded.
Content passes through two filters before being archived in Delorean. First, since all content archived by Delorean comes as a byproduct of Kennedy's crawler, any part of a capsule that is excluded from search engines will not appear in Delorean: content blocked by "Disallow" rules targeting all user-agents, or the "indexer" virtual user-agent, is never crawled by Kennedy, so it cannot appear in Delorean.
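For example, a robots.txt rule like this sketch (/journal/ is a hypothetical path) keeps a directory out of Kennedy's search index, and therefore out of Delorean as well:

```
user-agent: indexer
Disallow: /journal/
```

Since Kennedy never crawls anything under /journal/, those URLs never reach Delorean's import step.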
The second filter is also controlled by robots.txt. Since Delorean is building an archive, it also respects any "archiver" rules in robots.txt. When importing content from Kennedy, Delorean checks the capsule's robots.txt file for "Disallow" rules targeting the "archiver" virtual user-agent, and will not import any content that is blocked.
Create a `robots.txt` file in the root directory of your capsule if it doesn't already exist. Add these lines to it:
```
user-agent: archiver
Disallow: /
```
No worries, mistakes happen. Drop me an email and I can remove it.
You can use "archiver" rules in robots.txt to control what Delorean archives. By excluding content from the "archiver" user-agent while not excluding it from the "indexer" user-agent, your content will appear in search results without appearing in Delorean's archive.
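For example, a robots.txt like this sketch (/drafts/ is a hypothetical path) leaves the capsule searchable while keeping one directory out of the archive:

```
user-agent: archiver
Disallow: /drafts/
```

Because there is no matching rule for the "indexer" user-agent, Kennedy still crawls and indexes /drafts/, but Delorean will not import it.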
If it's served over Gemini, is smaller than 5 MiB, and isn't excluded by robots.txt, it should be archived.
If a URL meets those requirements, and is not excluded by robots.txt rules for indexers or archivers, it will appear in Delorean.
The oldest content in the archives is from September 2020, with most content appearing by mid-2022.
The completeness of the archive really reflects the history of the Kennedy crawler's capabilities, and how often I have run it.