Mirroring sources used in nixpkgs (software preservation)

Introduction

This may look like a very niche use case: in my quest for software conservancy for nixpkgs, I didn't meet many people who understood why I was doing this.

I would like to present a project I made to easily download all the source files required to build packages from nixpkgs, which makes it possible to keep offline copies.

Why would you want to keep a local copy? If upstream disappears, you can't access the sources anymore, except maybe through Hydra, but then you rely on a third party for them. It's still valuable to keep local copies of software you care about, if only to have extra copies. It's not absolutely useful for everyone, but it's important to have such tools available.

nixpkgs-mirror-tarballs project page

How to use it

You must run it on a system with `nix` installed.

After cloning the repository and changing into its directory, run `./run.sh some package list | ./mirror.pl`. The script `run.sh` generates a JSON structure containing all the dependencies used by the packages listed as arguments, and the script `mirror.pl` iterates over the JSON list and uses nix's fetcher to gather the sources in the nix store, verifying checksums on the go. This creates a directory `distfiles` containing symlinks to the source files kept in the store.

The symlinks are very important: they prevent garbage collection of the sources from the store, and they are also used internally to quickly check whether a file is already in the store.
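
To illustrate the "already in the store" check, here is a minimal sketch in Python (the real `mirror.pl` is written in Perl and may work differently). It assumes each JSON entry has a `name` field and that the symlink in `distfiles` is named after it; both the field name and the function name `missing_sources` are hypothetical.

```python
import os


def missing_sources(entries, distfiles_dir):
    """Return the entries whose symlink in distfiles_dir is absent or dangling.

    Assumption: each entry is a dict with a "name" key, and the symlink
    in distfiles_dir carries that same name.
    """
    missing = []
    for entry in entries:
        link = os.path.join(distfiles_dir, entry["name"])
        # os.path.exists() follows symlinks, so it is False both when the
        # link is missing and when its target no longer exists.
        if not os.path.exists(link):
            missing.append(entry)
    return missing
```

Only the entries returned by such a function would need to go through the fetcher again; everything else can be skipped without touching the network.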

To delete a file from the store, remove its symlink and run the garbage collector.
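
The two steps can be sketched in Python as follows. This is only an illustration: the function name `forget_source` and its parameters are hypothetical, and the garbage collection step assumes the standard `nix-collect-garbage` command is available in `PATH`.

```python
import os
import subprocess


def forget_source(distfiles_dir, name, run_gc=False):
    """Remove the symlink for `name`, then optionally run the garbage collector."""
    link = os.path.join(distfiles_dir, name)
    if not os.path.islink(link):
        raise FileNotFoundError(f"{link} is not a symlink")
    # unlink() removes the link itself, never the file it points to.
    os.unlink(link)
    if run_gc:
        # Deletes store paths that are no longer reachable from a GC root.
        subprocess.run(["nix-collect-garbage"], check=True)
```

Keeping `run_gc` off by default makes it possible to remove several symlinks first and collect garbage once at the end.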

Limitations

I still need to figure out how to get a full list of all the packages. I currently have a work in progress relying on `nix search --json`, but for some reason it doesn't work for 100% of the packages.

It's currently not possible to easily trim distfiles that aren't useful anymore. I plan to maybe add this someday.
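
As a rough idea of what trimming could look like, here is a sketch in Python, not something the project implements. It assumes the symlinks in `distfiles` are named after the sources and that the list of names still wanted is already known; `stale_links` is a hypothetical helper.

```python
import os


def stale_links(distfiles_dir, wanted_names):
    """Return the sorted names of symlinks in distfiles_dir that are no longer wanted."""
    wanted = set(wanted_names)
    return sorted(
        name
        for name in os.listdir(distfiles_dir)
        # Only symlinks are considered, so regular files are never proposed for removal.
        if name not in wanted and os.path.islink(os.path.join(distfiles_dir, name))
    )
```

The function only reports candidates, which leaves the decision to remove them, and to run the garbage collector afterwards, to the user.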

Trivia

This task is natively supported by dpb, the OpenBSD tool that builds packages: it can fetch multiple files in parallel and automatically remove files that aren't used anymore. It was really complicated to figure out how to replicate this with nixpkgs.