Misfin(C) Proposal

Updated February 9 2024

Context

This proposal was written primarily before Lem's return on January 9th.

Misfin(C) is an informal update to Misfin that is NOT backwards compatible with Lem's Misfin(B). It proposes to:

increase the maximum content length of messages to 16KiB
change the request format to something more consistent with other smallnet protocols and distinguishable from Misfin(B)
change the message metadata format to something less extensible, in the spirit of the smallnet
add the concept of message IDs so that there can be threads and replies

It also brings up other issues with the protocol that may or may not need to be considered.

The decisions were made on libera.chat, in the ##misfin channel, in discussions that included these participants:

jmjl (Julián Marcos)
mk270 (Martin Keegan)
satchlj
BBSman
cipres (gemalaya)
jeang3nie
clseibold

These decisions are made without prejudice to any future governance arrangements for the protocol.

More Consistent Request Format

The new format that Misfin(C) specifies is largely based on Spartan, Gemini, and Titan, where data blocks come after a header ending in CRLF. In contrast, Misfin(B) places the CRLF *after* the data block and separates the data block from the header with a space.

Additionally, to distinguish Misfin(C) from Misfin(B), the content length is delimited from the address with a tab, which intentionally fails on Misfin(B) servers that expect a space or that try to parse the address and fail on the tab. Misfin(B) servers that fail on parsing the request header for those reasons are likely to return a 59 Invalid Request error, which clients can use to detect that a server doesn't support Misfin(C).

The new format is as follows:

misfin://<MAILBOX>@<HOSTNAME><TAB><CONTENT-LENGTH><CRLF><MESSAGE>
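As a rough illustration, a client could assemble such a request as follows. This is only a sketch: the function name build_request is hypothetical, and it assumes that the content length is written as the decimal number of bytes in the UTF-8 encoded message.

```python
def build_request(mailbox: str, hostname: str, message: str) -> bytes:
    """Assemble a Misfin(C) request: header line, CRLF, then the message."""
    body = message.encode("utf-8")
    # Assumption: the content length counts bytes, not characters.
    header = f"misfin://{mailbox}@{hostname}\t{len(body)}\r\n"
    return header.encode("utf-8") + body


print(build_request("alice", "example.org", "hi"))
```

Because the length is known up front, the message itself may contain CRLF sequences without ending the request early.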

One can create a socket request reader and parser that supports Gemini, Spartan, Titan, and Misfin all at once using the following strategy:

Read from the socket until a CRLF or a space.
Check whether what was read contains a tab. If it does, it is a Misfin(C) request.
Split the string into two parts at the tab: the first part is the address and the second part is the content length (if applicable).
Parse the address using a URL parser, and check the scheme.
For Misfin, continue reading to get the message data: for Misfin(B), read until CRLF; for Misfin(C), read the number of bytes specified in the content length.
For Spartan, continue reading until a CRLF to get the rest of the header parameters, then read the data block (for data uploads).
For Titan, read the number of bytes specified in the content length.
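The strategy above can be sketched in Python roughly as follows. This is an illustrative sketch, not a reference implementation: the function name read_request and the labels "misfin-b" and "misfin-c" are hypothetical, only the two Misfin variants are read to completion, and other schemes are simply returned to the caller. It assumes a file-like stream whose read(n) returns n bytes unless the stream ends, such as the object returned by socket.makefile("rb").

```python
import io
from urllib.parse import urlsplit

MAX_HEADER_BYTES = 1024        # header limit, including the CRLF
MAX_MESSAGE_BYTES = 16 * 1024  # Misfin(C) message limit


def read_request(stream):
    """Return (kind, address, message) for one request read from stream."""
    token = bytearray()
    terminator = None
    # Read byte by byte until a space or a CRLF.
    while len(token) < MAX_HEADER_BYTES:
        byte = stream.read(1)
        if not byte:
            raise ValueError("stream ended before the header was complete")
        if byte == b" ":
            terminator = "space"
            break
        token += byte
        if token.endswith(b"\r\n"):
            del token[-2:]
            terminator = "crlf"
            break
    if terminator is None:
        raise ValueError("header exceeds 1024 bytes")

    # A tab marks Misfin(C); split the address from the content length.
    address, tab, length_field = token.decode("utf-8").partition("\t")

    # Let a URL parser extract the scheme.
    scheme = urlsplit(address).scheme
    if scheme != "misfin":
        # Gemini, Spartan and Titan would be dispatched from here.
        return scheme, address, None

    if tab:
        # Misfin(C): exactly <content-length> bytes follow the CRLF.
        if terminator != "crlf":
            raise ValueError("Misfin(C) header must end in CRLF")
        length = int(length_field)
        if not 0 <= length <= MAX_MESSAGE_BYTES:
            raise ValueError("content length out of range")
        message = stream.read(length)
        if len(message) != length:
            raise ValueError("stream ended before the message was complete")
        return "misfin-c", address, message

    # Misfin(B): the message follows the space and ends at CRLF.
    message = bytearray()
    if terminator == "space":
        while not message.endswith(b"\r\n"):
            byte = stream.read(1)
            if not byte:
                raise ValueError("stream ended before CRLF")
            message += byte
        del message[-2:]
    return "misfin-b", address, bytes(message)


print(read_request(io.BytesIO(b"misfin://alice@example.org\t5\r\nhello")))
print(read_request(io.BytesIO(b"misfin://alice@example.org hello\r\n")))
```

Reading one byte at a time keeps the sketch short; a real server would more likely read from a buffered stream, and would also enforce the Misfin(B) size limit, which is omitted here.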

Max Lengths

The maximum header length before the CRLF should be 1KiB (1024 bytes), including the CRLF.

The maximum length of a message, including its metadata, should be 16KiB (16384 bytes).

The message length was raised to 16KiB based on an average email size of 8KiB on the Gemini mailing list, multiplied by 2 to account for languages that use 2 bytes per character in UTF-8 (including Arabic, Hebrew, Greek, Cyrillic, and others). While most eastern languages use 3 bytes per character, they also tend to use fewer characters. Accounting for 2 bytes per character gives us a middle ground.
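The byte counts behind this reasoning are easy to check. The following sketch shows the UTF-8 length of a few sample characters and a simple limit check; the names utf8_len and within_limits are hypothetical and not part of the proposal.

```python
MAX_HEADER_BYTES = 1024        # 1KiB, including the CRLF
MAX_MESSAGE_BYTES = 16 * 1024  # 16KiB, including the metadata lines


def utf8_len(text: str) -> int:
    """Number of bytes the text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def within_limits(header: bytes, message: bytes) -> bool:
    """header includes its trailing CRLF; message includes its metadata."""
    return len(header) <= MAX_HEADER_BYTES and len(message) <= MAX_MESSAGE_BYTES


# Latin, Cyrillic, Hebrew and Han samples: 1, 2, 2 and 3 bytes respectively.
for sample in ("a", "д", "ש", "字"):
    print(sample, utf8_len(sample))
```

A limit expressed in bytes therefore allows roughly 16384 Latin characters, 8192 Cyrillic characters, or about 5461 Han characters per message.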

New Metadata Format

See our reasoning for this format and its implications:

Misfin(C) metadata format

The new metadata format is more consistent, and hopefully less extensible. Every message is prefixed with three lines, each ending in LF, in the following fixed order:

Senders
Recipients
Timestamps

The linetype prefixes ('<', ':', and '@') have been removed from these lines because they are no longer needed. Each line can list multiple values, delimited by a comma followed by optional whitespace. This combines all values of the same type into one line, whereas Misfin(B) required multiple lines for multiple senders and multiple timestamps. If one of the lines has no values, the line must not be omitted; it remains as an empty line.

Misfin(C) timestamps must be in RFC 3339 format, in UTC, like YYYY-MM-DDTHH:MM:SSZ, with an optional fractional-second part.
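One way to validate such a timestamp is sketched below. The function name parse_timestamp is hypothetical, and truncating the fractional part to microseconds is a choice made for this sketch, not something the proposal requires.

```python
import re
from datetime import datetime, timezone

# YYYY-MM-DDTHH:MM:SS, an optional fractional part, then a literal Z.
_TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(\.\d+)?Z")


def parse_timestamp(value: str) -> datetime:
    """Parse a UTC timestamp, raising ValueError if it is malformed."""
    match = _TIMESTAMP.fullmatch(value)
    if match is None:
        raise ValueError(f"not a UTC RFC 3339 timestamp: {value!r}")
    # strptime also rejects impossible dates such as month 13.
    base = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    # Keep at most six fractional digits (microseconds), padding with zeros.
    digits = (match.group(2) or ".")[1:7].ljust(6, "0")
    return base.replace(microsecond=int(digits), tzinfo=timezone.utc)


print(parse_timestamp("2024-02-09T12:30:00Z"))
print(parse_timestamp("2024-02-09T12:30:00.25Z"))
```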

Similarly to Misfin(B), the senders value list is ordered from most recent sender/forwarder to least recent sender (the original sender).
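A parser for the three metadata lines might look like the sketch below. The names split_values and parse_metadata and the dictionary keys are hypothetical; values are treated as opaque strings, and the sketch does not validate addresses or timestamps.

```python
def split_values(line: str) -> list:
    """Split a comma-delimited metadata line, dropping surrounding whitespace."""
    return [value.strip() for value in line.split(",") if value.strip()]


def parse_metadata(message: str) -> dict:
    """Separate the senders, recipients and timestamps lines from the body."""
    parts = message.split("\n", 3)
    if len(parts) < 4:
        raise ValueError("expected three LF-terminated metadata lines")
    senders, recipients, timestamps = (split_values(part) for part in parts[:3])
    return {
        "senders": senders,
        "recipients": recipients,
        "timestamps": timestamps,
        # Senders run from most recent to original, so the original is last.
        "original_sender": senders[-1] if senders else None,
        "body": parts[3],
    }


example = (
    "carol@c.example, alice@a.example\n"
    "\n"
    "2024-02-09T12:30:00Z\n"
    "Hello\n"
)
print(parse_metadata(example))
```

The empty second line in the example shows a message with no listed recipients: the line is still present, so the three-line layout stays fixed.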

Message IDs (WIP)

Below are some problems and possible design considerations for Message IDs:

The problem with using a hash is that it assumes you cannot have two messages with the same content from the same sender, which in turn assumes that a message sent to multiple people belongs to the same thread. Putting the recipients into the hash avoids this, but then the hash changes with the recipient, and a CC system wouldn't really work.

Message IDs just need to be globally unique; they don't have to be fully re-creatable on different systems, because the sender sends the message ID to the Misfin server. We can prevent senders from intentionally creating IDs that conflict with other messages by separating the ID from the domain and user: the sender sends some ID, and the *full message ID* is that ID combined with the address of the user. We can call the sender-supplied part something like a SenderID:

MessageID = SenderID + Sender_User@Domain

This SenderID could, for example, be the timestamp that the message is sent, which is presumably different from the timestamp it is received. The SenderID would be sent over in the message metadata, and then the full MessageID can be reconstructed using this SenderID and the address from the TLS cert.

Using the address of the sender is a way of namespacing all messages. All MessageIDs are namespaced to the sender address, which is verifiable via TLS certs and TOFU. This makes everything globally unique as long as each user makes their MessageIDs unique within their own namespace.
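The namespacing idea can be sketched as a pair of values rather than a single string, since the proposal does not yet fix how the two parts are joined. The class name MessageID and the function full_message_id are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageID:
    sender_id: str       # chosen by the sender, carried in the metadata
    sender_address: str  # taken from the verified TLS certificate


def full_message_id(sender_id: str, cert_address: str) -> MessageID:
    """Combine the sender-supplied ID with the address from the certificate."""
    return MessageID(sender_id, cert_address)


a = full_message_id("2024-02-09T12:30:00Z", "alice@a.example")
b = full_message_id("2024-02-09T12:30:00Z", "mallory@m.example")
print(a == b)  # the same SenderID under two addresses gives two distinct IDs
```

Because the address comes from the certificate and not from the message, a sender can only ever collide with their own earlier IDs.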

One proposed message ID format is an MD5 hash of the most recent RFC822-style send address along with the message timestamp, as implemented by the Skylab misfin client.
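A hash of this kind could be computed as in the sketch below. How Skylab actually joins and encodes the address and timestamp is not described here, so joining them with a single space and encoding as UTF-8 are assumptions of this sketch, as is the function name md5_message_id.

```python
import hashlib


def md5_message_id(sender_address: str, timestamp: str) -> str:
    """Hash the sender address and timestamp into a 32-character hex ID."""
    # Assumption: the two fields are joined with one space before hashing.
    material = f"{sender_address} {timestamp}".encode("utf-8")
    return hashlib.md5(material).hexdigest()


print(md5_message_id("alice@a.example", "2024-02-09T12:30:00Z"))
```

MD5 is used here only to derive a compact identifier; nothing in this scheme relies on it for security.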

Backwards Compatibility

The Misfin(C) protocol is deliberately incompatible with Misfin(B) (and Misfin(A)) so that a client can distinguish between the two. However, backwards compatibility can be achieved by having all Misfin(C) servers support Misfin(B) requests, and by having Misfin(C) clients detect whether a server supports Misfin(C) by sending a request and checking for a request invalid error (or a 40 temporary failure for some servers, like Lem's server). When a Misfin(C) client detects such a failure, it can fall back to the Misfin(B) protocol and try sending the message again. In this way, we can have backwards compatible software even if the protocols are incompatible.

The request format for Misfin(C) was changed so that it is distinguishable from Misfin(B), which allows both clients and servers to support both versions. One reason for having this "Backwards Compatibility" in software is to support those who choose not to move to Misfin(C) and keep the smaller 2KB limit of Misfin(B).

To allow for better interoperability, while all new Misfin(C) clients must use the content length and the new Misfin(C) request and metadata format, they can fall back to the Misfin(B) format when a server does not support Misfin(C). All new Misfin(C) servers must handle both the new Misfin(C) format and the old Misfin(B) format for backwards compatibility. Note that the Misfin(B) format will use the old metadata format, so Misfin(C) servers must be able to convert between metadata formats.
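A conversion from the old metadata lines to the new three-line layout could be sketched as follows. Several details are assumptions rather than statements of this proposal: that '<' marks sender lines, ':' marks recipient lines, and '@' marks timestamp lines, that these lines appear at the very start of a Misfin(B) message, and that Misfin(B) recipients are separated by whitespace. The function name convert_b_to_c is hypothetical.

```python
def convert_b_to_c(message: str) -> str:
    """Rewrite leading Misfin(B) metadata lines as three Misfin(C) lines."""
    senders, recipients, timestamps = [], [], []
    lines = message.split("\n")
    index = 0
    # Consume prefixed lines from the top until the first ordinary line.
    while index < len(lines):
        line = lines[index]
        if line.startswith("<"):
            senders.append(line[1:].strip())
        elif line.startswith(":"):
            # Assumption: Misfin(B) separates recipients with whitespace.
            recipients.extend(line[1:].split())
        elif line.startswith("@"):
            timestamps.append(line[1:].strip())
        else:
            break
        index += 1
    body = "\n".join(lines[index:])
    header = [", ".join(senders), ", ".join(recipients), ", ".join(timestamps)]
    # All three lines are always emitted, even when a list is empty.
    return "\n".join(header) + "\n" + body


old = "< alice@a.example Alice\n: bob@b.example carol@c.example\n@ 2024-02-09T12:30:00Z\nHello\n"
print(convert_b_to_c(old))
```

The order of sender and timestamp lines is preserved, so a most-recent-first list in the old format stays most-recent-first in the new one.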

Misfin(C) clients can detect whether a server supports Misfin(C) by sending a Misfin(C) request and checking whether the response is an error (usually a 59 request invalid error). If it is, they can fall back to the Misfin(B) format and send again.
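The detection and fallback logic can be sketched independently of any networking code. In this sketch, send_c and send_b are hypothetical callables that perform one delivery attempt in the respective format and return the numeric status code; which codes trigger a fallback is left configurable, with 59 and 40 taken from the text above.

```python
FALLBACK_STATUSES = frozenset({59, 40})


def send_with_fallback(send_c, send_b, fallback_statuses=FALLBACK_STATUSES):
    """Try Misfin(C) first; retry as Misfin(B) if the server rejects it."""
    status = send_c()
    if status in fallback_statuses:
        # The server most likely does not understand the Misfin(C) header.
        return "B", send_b()
    return "C", status


# Stand-in callables simulating a Misfin(B)-only server.
print(send_with_fallback(lambda: 59, lambda: 20))
```

Treating 40 as a fallback trigger is a trade-off: a Misfin(C) server may also return it for a genuine temporary failure, in which case the retry in the old format is wasted but harmless.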

Senders List Spoofing (WIP)

Misfin(B) servers only ever verify the very last sender. This is a big issue because this sender can spoof a whole chain of senders before it, and even the original sender. This may or may not be a problem the protocol wants to address.

Note: this can be addressed by treating the most recent sender, not the original sender, as the only trusted piece of information in implementations for which this is relevant. For example, mailing lists should not accept mail forwarded to them, as doing so would display unverified information to the mailing list members:

AuraGem Misfin Mailing list note

Known existing software

Clients

Reference implementation (by lem)

cipres' fork

gemalaya browser (by cipres)

Skylab client (by satch)

Servers

Reference implementation

cipres' fork

miselfin (by mk270, unreleased)

Dory (by jeang3nie)

and a few others.

Status

The original Lem spec is under a CC licence:

https://git.sr.ht/~lem/misfin/tree/master/item/COPYING

It was agreed that our document should include this by reference rather than inclusion, and be a "proposal" rather than any kind of usurpation.