2024-12-13 Archiving homepages

I keep thinking about self-hosting and people dying, myself included. So my first wish for the end of the year is a solar-powered machine that takes all my websites and turns them into clay tablets to bury and survive the coming darkness.

Other than that, however, I think the solution would have to involve a kind of decentralized archive sharing where I offer an archive (zip, tarball) for download and whoever has it can share it peer-to-peer with others. Is this how BitTorrent works? I don't think I understand what words like "tracker" mean. Also, where does the original torrent come from and where does it go? I know there are sites where I can search for and download torrent files. But what happens if Alice has a file she wants to share with others, including Bob? Does she create a torrent and offer it on her website? Bob finds it, downloads it, runs a torrent client and gets a copy. If Alice and her website disappear, how does Charlie get a copy now? Bob isn't hosting Alice's torrent file on his website. So are they all dependent on a torrent hosting site?

BitTorrent
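One partial answer to the questions above: what identifies a torrent's content is not the .torrent file's location but its "infohash", the SHA-1 of the bencoded info dictionary inside it. Anyone holding that hash can look up peers, so Charlie doesn't strictly need Alice's website. Here is a minimal sketch of bencoding and the infohash computation; the file name and field values are invented for illustration:

```python
# Sketch: how a torrent's "infohash" identifies content independently of
# where the .torrent file was first hosted. We bencode a single-file
# "info" dictionary (bencoding is BitTorrent's real wire format) and
# take its SHA-1 -- that is the infohash peers use to find each other.
import hashlib

def bencode(value) -> bytes:
    """Encode ints, bytes, strings, lists and dicts in bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # bencoded dictionary keys must be sorted byte strings
        items = sorted((k.encode() if isinstance(k, str) else k, v)
                       for k, v in value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(type(value))

info = {
    "name": "alice-site-archive.zip",  # hypothetical archive name
    "length": 4,                       # file size in bytes
    "piece length": 262144,
    "pieces": hashlib.sha1(b"demo").digest(),  # one piece's SHA-1
}
infohash = hashlib.sha1(bencode(info)).hexdigest()
print(infohash)  # 40 hex characters; same bytes always give the same hash
```

The same bytes always produce the same infohash, which is why the identity of the archive survives the disappearance of whoever first offered it.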

I'm only half-aware of the InterPlanetary File-System (IPFS) and when I read the Wikipedia page, there's stuff about hashes and content addressing, but how does that work from a user perspective? Is there a directory? How does Charlie learn about Alice's site that's no longer online and how does Charlie get a copy from Bob? Can Bob make a list of files on offer and Charlie can get them all, maybe from Bob and maybe from others?

InterPlanetary File-System
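The core idea behind the hashes and content addressing can be sketched in a few lines. This is not the real IPFS CID algorithm (which adds multihash and multibase framing), only the principle: the address is derived from the content itself, so it doesn't matter who serves the bytes, and any copy can be verified.

```python
# Sketch of content addressing, the principle behind IPFS: the "address"
# of a file is a hash of its bytes, so any peer's copy can be verified.
# (Real IPFS CIDs add multihash/multibase framing; this is simplified.)
import hashlib

def address(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

store = {}  # a stand-in for "whatever peers happen to hold the data"

def put(content: bytes) -> str:
    addr = address(content)
    store[addr] = content
    return addr

def get(addr: str) -> bytes:
    content = store[addr]
    assert address(content) == addr, "corrupted or wrong content"
    return content

# Bob pinned Alice's archive; Charlie only needs the address, not Alice.
addr = put(b"contents of alice-site-archive.zip")
assert get(addr) == b"contents of alice-site-archive.zip"
```

So Charlie learns about Alice's site by learning its address (from Bob's list, say), and the network figures out which peers still hold the matching bytes.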

In this case, preservation would mean: you need people interested in keeping a copy; the copies need to survive; the copies must be listed; the lists must be distributed widely; at least some people must make copies of these lists.

So, for me and you and some other fedi randos, we could have a "fedi website archive" list where our names are listed together with the hashes pointing to the content, and some IPFS client would keep it in sync.
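What might such a shared list look like? A sketch, with entirely hypothetical field names: a JSON list of entries, each pairing a name and URL with a content hash. Anyone holding the list can then verify a downloaded archive no matter which peer supplied the bytes, even after the URL has gone dark.

```python
# Sketch of a "fedi website archive" list: names and URLs paired with
# content hashes. All names, URLs, and field names are invented.
import hashlib
import json

def entry(name: str, url: str, archive: bytes) -> dict:
    return {"name": name, "url": url,
            "sha256": hashlib.sha256(archive).hexdigest()}

directory = [
    entry("Alice's Site", "https://alice.example", b"alice archive bytes"),
    entry("Bob's Site", "https://bob.example", b"bob archive bytes"),
]
listing = json.dumps(directory, indent=2)  # what members would share

def verify(directory, name: str, data: bytes) -> bool:
    """Check a downloaded archive against the hash in the list,
    regardless of which peer supplied the bytes (the URL may be dead)."""
    e = next(e for e in directory if e["name"] == name)
    return hashlib.sha256(data).hexdigest() == e["sha256"]

assert verify(directory, "Alice's Site", b"alice archive bytes")
```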

The next question, though: how do we keep this list updated? Neither of these protocols uses a blockchain, but content-addressed data is effectively immutable — a hash identifies exactly one version — so is there a way to say: "this is the updated list"? That would require some sort of social control and trust, too. An association of the living members of the "fedi website archive" list that manages the yearly updates, perhaps?

And so how would the maintenance actually work, I wonder. I write a web app. We chat. (I think the human element is important.) You have an account based on an email address, upload an archive, and give it a name (the name of your website, a short description, its current URL). Once a year, the living associates meet and discuss whether to dump some of their members who have turned fascist or whatnot (sadly, always a possibility). Then we use the data gathered by the website to generate a new "directory" list with names, descriptions, URLs and hashes (the URLs may no longer work), and all the members share or host (??) this new directory and drop the previous directories so that the old versions of our sites can be forgotten. And the IPFS clients do the magic of actually exchanging the archive bytes?
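The yearly update step described above could be sketched like this, assuming the (hypothetical) web app hands us the member data; names, hashes, and field names are all invented. The point is that each edition replaces rather than extends the last, so dropped members and old versions can be forgotten:

```python
# Sketch of the yearly directory update: take the member data gathered
# by the (hypothetical) web app, drop anyone voted out at the meeting,
# and emit a fresh dated directory that replaces last year's edition.
import json

members = [  # as uploaded via the web app; all data invented
    {"name": "Alice's Site", "description": "gardening notes",
     "url": "https://alice.example", "hash": "0" * 64},  # placeholder hash
    {"name": "Mallory's Site", "description": "ranting",
     "url": "https://mallory.example", "hash": "1" * 64},
]
removed = {"Mallory's Site"}  # decided at the yearly meeting

def new_directory(members, removed, year: int) -> str:
    kept = [m for m in members if m["name"] not in removed]
    return json.dumps({"edition": year, "entries": kept}, indent=2)

print(new_directory(members, removed, 2025))
```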

Perhaps all of this only requires BitTorrent and hosting the torrent files on our own websites.

Would that work? Would you want to be part of this association? We could create an association according to Swiss law. There are some famous international orgs that use this format.

an association according to Swiss law

@edsu@social.coop wrote back, talking about pincushion. And that's what it is all about:

pincushion

The basic idea is that users should be able to download and view their
data without losing the context they have added. We want a pincushion
to represent a user’s collections, pins, images, videos, audio, tags,
locations, comments … and we want users to be able to view this content
when Historypin is no longer online, or even when the user isn’t
online. Maybe the pincushion is discovered on an old thumbdrive in a
shoebox under the bed.
This means that the resources being served dynamically by the
Historypin application need to be serialized as files, and specifically
as files that can be viewed directly in a browser: HTML, CSS,
JavaScript, JPEG, PNG, MP3, MP4, JSON. Once a user's content can be
represented as a set of static files they can easily be distributed,
copied, and opportunities for replicating them using technologies like
IPFS become much more realistic.

This is what I'm talking about! It's all about exporting user data out of a dynamic site into a form based on static files.

Oddµ tries to do the same: all the pages you write, all the files you upload, everything can be downloaded as a static website. The dynamic wiki application on top of it is essentially optional.

Oddµ

There are two things I still haven't solved, though:

1. Bots crawling the web, downloading the zip file again and again.

2. A distribution procedure and organisation that keeps it alive.

Perhaps BitTorrent is the better solution. There are a gazillion tools. It's been around for a while. It seems like a stable platform.

@edsu@social.coop also added:

You might want to check out how Magnet Links work, which let users
share links to their torrents and retrieve the metadata and data
from peers using a Distributed Hash Table (DHT). IPFS uses a DHT as
well.
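A magnet link is a small thing: essentially the infohash plus an optional display name and tracker, packed into a URI. Given just the infohash, a client can find peers via the DHT and retrieve both the metadata and the data, so the link itself is all the directory would need to publish. A sketch, with an invented hash and file name:

```python
# Sketch: building a magnet link. The infohash (xt=urn:btih:...) is the
# part that matters; dn is a display name, tr an optional tracker.
# The hash and names below are invented.
from urllib.parse import quote

def magnet_link(infohash_hex: str, name: str, tracker: str = "") -> str:
    link = f"magnet:?xt=urn:btih:{infohash_hex}&dn={quote(name)}"
    if tracker:
        link += f"&tr={quote(tracker, safe='')}"
    return link

link = magnet_link("c" * 40, "fedi-website-archive-2025.zip")
print(link)
# magnet:?xt=urn:btih:cccccccccccccccccccccccccccccccccccccccc&dn=fedi-website-archive-2025.zip
```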

I think this is the way forward. People link to the latest torrent file and host a copy of the data.

The org gets together once a year to celebrate, to welcome new members, to honor those who have passed, and to review complaints; it produces a new torrent file and we all agree to update our links.

All of this while keeping in mind the two forces pulling in opposite directions: We want the right to curate, forget, delete, revise, but we also want to archive in a decentralized fashion.

It seems to me that the only way to "solve" this currently is via a social process. I suspect that the problem of reconciling these two requirements is fundamentally not solvable on a technical level.

And we need to start practising now, while we're alive.

#Archives #Web

a wiki for the association