2022-02-08 NNCP distributed text

I just finished watching an episode of Star Trek: The Original Series. Something about super humans being smug.

This article is a continuation of a previous article, 2022-02-04 Wondering about NNCP. There, I set up NNCP on my laptop and started using John Goerzen’s quux server as a relay. You should try it, if you have a little sysadmin experience. 😃

2022-02-04 Wondering about NNCP

I haven’t made any progress on running INN, the Usenet server, and the longer I think about it, the more I suspect that infrastructure would be too brittle and too esoteric. I’m thinking that maybe I should rethink what I actually want.

I used to say that I don’t want a federated wiki, because so many things about it would be unclear.

So here’s what I am thinking:

I want a hypertext for a network of friends, maybe a hundred people or so, enabling them to talk to each other and to link to each other. Other media are important, but they take up a lot of space, and I’m currently not willing to dedicate that much storage to this hypertext. Let’s focus on text.

I want a world without servers, with offline nodes, with connectivity from one machine to the next, via all sorts of transports. NNCP will do fine.

Ideally, it would work like an offline copy of the WWW consisting of our own sites. Each one of us can create such sites, and we can link to each other. If we write it well, we can publish those sites on the web and our links will continue to work!

What is a “site”, in this owner-centric way of thinking? It’s a collection of files, possibly from different people, interlinked with a strong theme. The edges are fuzzy. Currently, the edges are arbitrarily formed by the domain names. That’s why alexschroeder.ch and transjovian.org and campaignwiki.org are different sites, even though they are just directories on my virtual machine hosted on some server out there.

So me writing a site called “Alex Schroeder” or “kensanata” is still possible. Our network has a node with that name, and in it a directory of that name.

I’m starting to suspect that all these ideas I’m talking about have already been solved and it’s called Secure Scuttlebutt or something like it. But bear with me, since I only understand a tiny bit of NNCP right now.

So everybody gets a directory tree, and we exchange copies of our directory trees, and feeds. It’s all file based and therefore it’s possible to examine and edit using regular text editors and command line tools. There are tools to refresh the entire installation if we botched it. We agree to only make changes in our own trees, and our software makes sure that we reject files that haven’t been signed by the correct node.

In a way I’m thinking of it like exchanging Gemini capsules using NNCP. All we have to do is make sure we don’t overwrite each other.

For collaboration, we could think of common authorship over a shared directory tree, or we could think of a way to tightly interlink our pages. That would simply involve more social engineering. Think of the blogs on a particular topic making up the “blogosphere”. If a lot of the RPG blogs agree to update the RPG Planet, and all the Gemini capsules agree to update Antenna, then the collection forms an interlinked network, with clusters for each blog and each capsule, since authors perhaps interlink their own pages more tightly – at least they do right now. Perhaps we just need tool support to collect those feeds and sitemaps into larger collections.

So how would it work?

NNCP allows us to copy packets between nodes, and we can specify how we want these to be treated on the remote end. Let’s assume we have a shell script that handles received packets in a way that makes the web described above work.

Each system, like my laptop, is an NNCP node. When you generate your config, you get an id. In my case, that’s the following id and public keys:

id: R23WEIHB52TMA4EKGJPKUDBFSYP2HG4HHW2HGJ3RJATCCRLYDUZQ
exchpub: EGP2MMLQJQUKWTHI22JTIRMR2UV3BA2ATE3AYLVOFODMTNRGAMEA
signpub: YO6SZXVEIU77OQQRKMAUFUT4V3NJER4U7LQE5JI7JORJXKXY5FBA
noisepub: 6ECO4WXJNDED6WHJ6SM2HGRQMUO75X65ALT2YRKZ3YGGBDXRNV4A

That’s how I got my id; in your /etc/nncp.json, you would also give my node a nickname.
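
A sketch of what such an entry might look like on your side, following the hjson-style layout that NNCP generates (the nickname “alex” is just a suggestion):

neigh: {
  # ... your other nodes ...
  alex: {
    id: R23WEIHB52TMA4EKGJPKUDBFSYP2HG4HHW2HGJ3RJATCCRLYDUZQ
    exchpub: EGP2MMLQJQUKWTHI22JTIRMR2UV3BA2ATE3AYLVOFODMTNRGAMEA
    signpub: YO6SZXVEIU77OQQRKMAUFUT4V3NJER4U7LQE5JI7JORJXKXY5FBA
    noisepub: 6ECO4WXJNDED6WHJ6SM2HGRQMUO75X65ALT2YRKZ3YGGBDXRNV4A
  }
}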

What we want is one directory where all the capsules reside, each named after its owner’s id – which we’ll also be able to identify using the nickname.

Let’s imagine that /var/capsules/R23…UZQ/ is where my capsule is, my home. Your capsule would end up in a neighbouring directory.

Knowing this, you can now request files. Let’s talk about the files containing metadata: data about the other files containing the actual texts.

We agree on a bunch of conventions:

“index.txt.zst” is the file listing all the files you need to download my capsule; it might be very big. If my site has 10,000 pages, then this file has 10,000 lines. The .zst suffix indicates the use of Zstandard compression.

The filenames are all relative paths, UTF-8 encoded and newline separated, and each line starts with a signature that matches the id of the directory we’re in. Paths may contain spaces (the path is the last field) but no newlines.

SIGNATURE page/About.gmi

This is important. /var/capsules/R23…UZQ/ is my directory; R23…UZQ is my id; and the signatures in my index are by R23…UZQ for each file. This is how we make sure that all the files in the R23…UZQ directory are by R23…UZQ.
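
Here’s a minimal sketch of how such an index could be produced. It assumes a hypothetical capsule-sign helper that prints a signature for a given path using the node’s signing key – NNCP itself doesn’t ship such a tool, as far as I know:

#!/bin/sh
# sketch: build index.txt.zst for my capsule
# capsule-sign is a hypothetical helper printing a signature for a path
cd /var/capsules/R23WEIHB52TMA4EKGJPKUDBFSYP2HG4HHW2HGJ3RJATCCRLYDUZQ || exit 1
find . -type f ! -name index.txt.zst ! -name changes.txt.zst -printf '%P\n' |
while IFS= read -r path; do
  printf '%s %s\n' "$(capsule-sign "$path")" "$path"
done | zstd -q -f -o index.txt.zst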

“changes.txt.zst” is a file listing the files that have recently changed, with a rough timestamp. We’re keeping it rough because we expect this network to be slow. Again, the content is UTF-8 encoded and newline separated; paths may contain spaces (the path is the last field) but no newlines.

SIGNATURE YYYY-MM-DD page/About.gmi

I don’t think we need version numbers, time stamps, or anything like that. It’s the slow net. The list of changes also doesn’t have to go back forever. If some node gets an update and sees no change it knows about in the file, then it’s time to download the index and download any missing or changed files.
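
A matching sketch for the changes file, listing whatever was touched in the last 30 days – the 30 days are an arbitrary choice, and capsule-sign is the same hypothetical helper as above:

#!/bin/sh
# sketch: build changes.txt.zst from files modified in the last 30 days
cd /var/capsules/R23WEIHB52TMA4EKGJPKUDBFSYP2HG4HHW2HGJ3RJATCCRLYDUZQ || exit 1
find . -type f -mtime -30 ! -name index.txt.zst ! -name changes.txt.zst \
  -printf '%TY-%Tm-%Td %P\n' |
while read -r date path; do
  printf '%s %s %s\n' "$(capsule-sign "$path")" "$date" "$path"
done | zstd -q -f -o changes.txt.zst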

This allows people to get an update of the entire site, or an update of recent changes, which they could assemble into maps and feeds across multiple capsules.

The system doesn’t use MIME types like the web does, which is why we need to rely on regular file system tools. In other words, we would rely on the file extension most of the time.

At first, we’d use text formats like .txt (plain text), .md (Markdown), .org (Org Mode), .gmi (Gemtext), .html (why not HTML for rich content?), and so on. Later, we can add multimedia files, maybe with a 100KiB file size limit, if at all. Perhaps we work with dithered, black and white images during stage 2. I’d like the script that accepts incoming packets to ensure that any larger media file is either converted, with loss, or dropped.
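
As a sketch of what that acceptance script might do with images, assuming ImageMagick is installed (the 800×800 bound and the 100KiB limit are just examples):

#!/bin/sh
# sketch: shrink an incoming image to a dithered black and white version;
# if it is still over the size limit afterwards, drop it
limit=102400   # 100KiB
f="$1"
if [ "$(wc -c < "$f")" -gt "$limit" ]; then
  convert "$f" -resize '800x800>' -dither FloydSteinberg -monochrome "$f"
  [ "$(wc -c < "$f")" -gt "$limit" ] && rm -- "$f"
fi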

In order to build aggregators, I suggest that we use subdirectories:

“contrib/medusa/index.txt.zst” is the file listing all the files I want to contribute to the “medusa” project.

“contrib/medusa/changes.gmi.zst” is the file linking to the recently changed files that I want to contribute to the “medusa” project.

If I want to assemble the medusa project, I can check the “contrib/medusa” directories of all the capsules I’m connected to using a command line tool which merges all the indexes and changes and presents them to me as a unified directory – possibly using links, copying files, or with a cool GUI client.
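
A sketch of such a merge, using symbolic links and assuming the paths in the contrib indexes are relative to the capsule root (the /var/medusa target directory is made up):

#!/bin/sh
# sketch: build a unified view of the "medusa" project by linking every
# contributed file into one directory tree
mkdir -p /var/medusa
for f in /var/capsules/*/contrib/medusa/index.txt.zst; do
  id=$(echo "$f" | cut -d/ -f4)
  zstdcat "$f" | while read -r sig path; do
    mkdir -p "/var/medusa/$id/$(dirname "$path")"
    ln -sf "/var/capsules/$id/$path" "/var/medusa/$id/$path"
  done
done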

How do changes spread? With a kind of pull request. My friend R and I connect to each other using NNCP. We have a collection of scripts (that don’t exist yet):

“capsule-update” is used to call R and request either the index or changes. Using the index allows me to get a complete copy of all the capsules R has (and I’d exclude the copy of my own capsule on R’s system, of course). Using the changes file allows me to get updates. If R’s changes don’t go back far enough, I might miss some updates to existing pages. If there is no overlap between the changes I have and the changes I’m getting, I might decide to get the index and try to find missing files. It’s not perfect, but it’s a start. We could add hashes to the index to make it work.
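
The heart of “capsule-update” could be little more than a file request, a call, and a toss. A sketch, assuming R is known as “r” in my config and has file requests (freq) enabled, rooted at their capsules directory:

#!/bin/sh
# capsule-update, a sketch: ask r for their changes file, exchange packets,
# then process whatever arrived; the requested file lands in the incoming
# directory configured for r, and the reply may only arrive at a later call
nncp-freq r:changes.txt.zst r-changes.txt.zst
nncp-call r
nncp-toss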

I also send R an encrypted email, asking them to add my capsule and also run “capsule-update” on their end. That’s how they would get my updates.

Using this system we might get around the use of nncp-exec, I think. And we wouldn’t have to call every single one of the people in our network. We can agree on a central node that calls them all and that we all call in turn, or we can just interconnect in messy ways, and it would still work.

There’s the problem of trust, of course. I’m sure that we could figure out a way to use signatures instead of hashes in the index file so that every capsule has every file signed by the capsule owner.

That is, /var/capsules/R23…UZQ/index.txt.zst contains a list of files in the R23…UZQ capsule, signed by R23…UZQ. Nobody should be able to change those files without our “capsule-update” script alerting us. We’d need to add those signatures to the text file. Maybe start each line with a signature, then a space, then the path.

Does that sound interesting?

NNCP, by John Goerzen

Zstandard, on Wikipedia

The Transjovian Capsules

#NNCP #Wikis

Comments

(Please contact me if you want to remove your comment.)

This is similar to a problem I was considering late last year, when Solarpunk inspired a discussion regarding Git as a potential distributed offline-capable publishing tool. The hard problem (as I see it) is making your files content-addressable in a manner where they can be easily “linked” to by other nodes.

My proposal was here: https://mntn.xyz/posts/2021-10-25-a-proposal-for-a-git-link-url/

https://mntn.xyz/posts/2021-10-25-a-proposal-for-a-git-link-url/

Similarly, links in Secure Scuttlebutt: https://github.com/ssbc/docs/blob/master/ssb/linking.md

https://github.com/ssbc/docs/blob/master/ssb/linking.md

The downside is that these links are UGLY and hard to write, but I think at some level hashes and/or signatures are needed to identify content in a system like this. I know that “cool URLs don’t change”, but in reality most URLs aren’t that cool and many people eventually want to change domains, especially for personal sites as they outgrow their old personas. Additionally, with networks like NNCP and Secure Scuttlebutt, you can obtain the same content from a different place than someone else, but both links need to point to the same content. On top of this, you need to know that the author is indeed who they say they are (thus the need for signatures)!

Anyway, the problem you are looking at adds another layer of difficulty onto this one, that of collaborating with a trusted group. I’ve seen some projects attempting this but almost all of them rely on overly complex solutions for synchronization (like SSB, which I love but it’s a bit heavy).

The only thing I’m aware of that may be compatible with NNCP is an esoteric tool known as “V” that was developed by a somewhat shady group of Bitcoin enthusiasts – see https://archive.is/pRfAz. It operates on signed patches that can be distributed in whatever way is most convenient. Their solution may be worth a look because of how straightforward it is. They literally just iterate across files in a few directories, check signatures, and generate an up-to-date source based on whatever “trusted” patches can be applied.

https://archive.is/pRfAz

– mntn 2022-02-09 19:09 UTC

mntn

---

While NNTP is simple, INN (which implements NNTP) is *not*, and from what I remember of my days having to deal with it, it was a complete nightmare to configure and keep running. You might have better luck just implementing NNTP (RFC-3977 or the older RFC-977) yourself rather than dealing with INN.

– Sean Conner 2022-02-09 21:04 UTC

Sean Conner

---

Thank you both! My takeaway from mntn’s comment is that I should think about linking some more. We do have the InterLink or URL Abbreviations system for wikis, so it’s not impossible to do. It’d be nice if we could use URLs, though.

To stick with my example, and assuming Gemtext for a moment: the following example links to a file in the same directory. That’s the easy part.

=> hello.gmi Hello

If my friend R wanted to link to my About page, they would have to write something like this:

=> ../R23…UZQ/page/About.gmi About Alex

I hope I got that right. Up from their directory, down into my directory, and then to the page. At least, that’s what I would write if I were authoring my pages in Gemtext.

How could we avoid the need to write those ugly ids? Perhaps we can also write something like the following:

=> file://alex/page/About.gmi About Alex

Yes, there’s a hostname in a file URL!

Usually, the hostname is empty for file URLs. But here’s what the RFC says about the hostname for file URLs:

The “host” is the fully qualified domain name of the system on which the file is accessible. This allows a client on another system to know that it cannot access the file system, or perhaps that it needs to use some other local mechanism to access the file. – RFC 8089, section 2

RFC 8089, section 2

I think this is what we’ll use: “some other local mechanism to access the file.”

I already mentioned the special files index.txt.zst and changes.txt.zst, but now I’m going to add a third one: aliases.txt.zst. This is where every capsule announces the aliases it will use for the various ids.

So R might have the following aliases defined:

alex R23WEIHB52TMA4EKGJPKUDBFSYP2HG4HHW2HGJ3RJATCCRLYDUZQ

This is a surprising turn of events, but that’s it: within the capsule network, all the links are file URLs. There is no protocol like HTTP or Gemini to specify – the files are already there. 😃
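
A sketch of how a tool might resolve such a link, given the capsule directory the link was found in and its aliases file as defined above:

#!/bin/sh
# sketch: turn file://alex/page/About.gmi into a path under /var/capsules
link="$1"      # e.g. file://alex/page/About.gmi
capsule="$2"   # e.g. the directory of the capsule containing the link
rest=${link#file://}
alias=${rest%%/*}
path=${rest#*/}
id=$(zstdcat "$capsule/aliases.txt.zst" | awk -v a="$alias" '$1 == a { print $2 }')
echo "/var/capsules/$id/$path"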

When serving the site to the outside world, we could use this knowledge, and the knowledge that “alex” has his site hosted on a web server for the “alexschroeder.ch” domain, to turn the link into something like this:

=> gemini://alexschroeder.ch/page/About.gmi About Alex

That would require yet another mapping from aliases to domain names, and it would only work for isolated clusters of sites. Surely there will be sites that aren’t available on the open Internet, or only available via particular protocols, and all of that would need to be taken into account.
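
A sketch of that export step, assuming a hypothetical domains.txt mapping aliases to hostnames, one “alias hostname” pair per line, applied to an export copy of the capsule rather than the original:

#!/bin/sh
# sketch: rewrite capsule links into gemini URLs for publication
# domains.txt is a hypothetical mapping, e.g. "alex alexschroeder.ch"
cp -r /var/capsules/R23WEIHB52TMA4EKGJPKUDBFSYP2HG4HHW2HGJ3RJATCCRLYDUZQ /var/export
while read -r alias host; do
  find /var/export -name '*.gmi' \
    -exec sed -i "s|file://$alias/|gemini://$host/|g" {} +
done < domains.txt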

Also, copyright, and moral obligations to not do surprising things with other people’s capsules – like putting them all into the open Internet.

Regarding Sean’s comment: the problem is that NNCP is like UUCP. Packets can travel for many hours before they get received, and many more hours pass before the reply packets get back. That’s why I suspect that NNTP might not work. What would work is if files depart and arrive in some incoming and outgoing spool areas.

As for git: this is not my area of expertise, so take this with a grain of salt. I think a shallow git checkout would be something like a website, but a full repo has each commit linked to its ancestor via hashes, so you cannot remove a revision of a file in the distant past without recomputing all the hashes, which makes your repo incompatible with everybody else’s copy (force push and all the problems it entails). This is a feature for repositories like git, and it is what I want to avoid.

Git also lacks a key part of the capsule network: every user only edits their own files, yet relays all the files they know about.

– Alex 2022-02-09 22:30 UTC

---

Yes, you’re onto what I was describing. One thing you should consider, though, is using URNs instead of URLs. URLs are intended to reference a particular resource at a particular location, while URNs are intended to reference content based on some kind of unchanging identifier. A URN doesn’t attempt to tell you where to find the resource, it just gives you the information you need to locate it using another source (libraries, directories, etc).

To use a URN natively within a browser, you’d need two things:

Of course, if you have the URN, you can also just manually resolve the URN yourself by fetching the NNCP data and opening the referenced file from disk.

The local proxy approach gives you the option of displaying some kind of interstitial status page if the NNCP data hasn’t been loaded yet. If it’s from a new or unknown source, you could simply alert the user so they can decide how to proceed.

Anyway, just food for thought! I’m interested to see where you end up going with this.

(Also re: Git, I agree that it wouldn’t work for your use case.)

– mntn 2022-02-10 07:14 UTC

mntn

---

The URN proposal is interesting, but I think if we keep using (file) URLs, existing clients should be able to just read and navigate the local copy of the hypertext: a web browser, a Gemini client – as long as they can handle local files, they can browse it. At least, that would be the goal.

A local proxy allowing users to request more “sites” to be added is an interesting idea, but also more infrastructure that has to work. With the proposal as-is, we could have a tool that reads the aliases.txt.zst file and we’d know immediately what we are missing – although not necessarily how to get it.
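
A sketch of that tool: list every alias announced by the capsules I already have, minus the ones present on disk.

#!/bin/sh
# sketch: which announced capsules am I missing?
for f in /var/capsules/*/aliases.txt.zst; do
  zstdcat "$f"
done | sort -u | while read -r alias id; do
  [ -d "/var/capsules/$id" ] || echo "missing: $alias ($id)"
done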

How to request to be added as a peer remains interesting. I guess you always need some other line of communication (unlike the web and other centralised solutions).

So, if my friend R adds a new alias to his list, my tool will know this, and it can request the index of all the pages the new id is offering, and request the files from R, no problem. But how would somebody new ever get added? They need a friend on the existing network. And how does that work? I guess a tool can look at my aliases, and the aliases of my aliases, and request the files of aliases one, two, or three degrees removed. Eventually, you could download the whole world’s hypertext, but perhaps that doesn’t make much sense.

Also, legality issues start popping up.

I really should start a list of the actions we need to implement in tools.

– Alex 2022-02-10 11:50 UTC