💾 Archived View for gemini.ctrl-c.club › ~tjp › gl › 2023-10-25-guppy-v0.3.gmi captured on 2023-11-04 at 11:46:02. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
It feels like there's a renaissance of text client/server protocols in the small web these days, and I love seeing these come out.
dimkr posted a v0.3 draft spec for a new text protocol named "guppy"
The stated goal of the spec is being easy to implement even for hosting servers on microcontrollers such as the Pi Pico W, going more barebones than Spartan in the process.
The largest deviation from similar recent work is eschewing TCP in favor of UDP and implementing out-of-order resolution and retransmission in the application layer. But the spec and implementation complexity arising from this decision are very much at odds with the goals.
Probably the quickest way to get up to speed on this spec is in terms of other small web protocols.
Spartan simplified Gemini by removing TLS from the picture, but complicated it by adding an additional line type to gemtext with request/upload implications. Guppy also declines to encrypt content, and adopts the original form of gemtext without the Spartan := line type.
Like other modern-but-small-web protocols, Guppy adopts a simple single-line CRLF-terminated request type, and a small number of response types differentiated by a leading number. Unlike them, though, any number from 2 to 32767 represents success, and doubles as a starting sequence number.
It then adds the message types needed to provide ordering and retransmission on top of UDP: continuations, EOFs, and acks. Continuations increment the sequence/identification number of the packet they are continuing and carry additional payload, EOFs look just like an empty continuation, and acks simply echo back a sequence number. Requests and acks are sent by the client, while success/redirect/error responses, continuations, and EOFs are sent by the server (response messages serve as acknowledgement to the client that a request was received).
A bunch of things jumped out at me in the spec immediately.
In the section "Packet Order" we have this:
> Servers should transmit multiple packets at once, instead of waiting for the client to acknowledge a packet before sending the next one.
but later, the section on acknowledgement messages directly contradicts this using the same "should" language:
> The server should wait for the client to acknowledge the previous chunk of the response (the success packet or the previous continuation packet) before sending the next continuation packet, to avoid waste of network bandwidth.
It seems like this probably was changed but the acknowledgement section wasn't updated. Or possibly the other way around, but the "Packet Order" version is more pervasive in the spec.
32767 immediately catches the eye of anybody who has written a little bit-twiddling C before. It's the maximum value of a signed 16-bit integer. I'd suggest that there are readers who will skim this and assume that a signed 16-bit integer is enough to hold the sequence number (particularly among readers who are writing software for microcontrollers).
But that initial response sequence number can (and generally will) be incremented by subsequent continuation and end-of-file packets, so a response that starts at or near 32767 overflows a signed 16-bit counter almost immediately. This is a danger that is probably worth calling out. Note also that with the spec-provided 512 byte minimum chunk size, storing the sequence number in an unsigned 16-bit number caps the guaranteed download size at 16MB: starting from 32767 leaves only 32768 increments before reaching 65535, and 32768 chunks of 512 bytes is 16MB. Some discussion of the interplay between sequence number integer sizes, chunk sizes, and file download sizes may be in order.
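The arithmetic behind that cap can be checked with a few lines of Python. This is only a sketch: the function name is hypothetical, and it assumes the worst-case starting number of 32767, one 512-byte chunk in the success packet and in each continuation, and a final increment spent on the EOF packet.

```python
MAX_START_SEQ = 32767   # highest success / starting sequence number
MIN_CHUNK = 512         # minimum chunk size in bytes, as quoted from the spec

def guaranteed_bytes(counter_max: int) -> int:
    """Worst-case payload that fits before the sequence counter overflows.

    Assumption: the response starts at MAX_START_SEQ. The success packet
    carries one chunk, each continuation adds 1 to the sequence number and
    carries one chunk, and the last available increment is used by the EOF.
    That makes the chunk count equal to the number of available increments.
    """
    increments = counter_max - MAX_START_SEQ
    return max(increments, 0) * MIN_CHUNK

signed_16 = guaranteed_bytes(2**15 - 1)     # no room even for the EOF packet
unsigned_16 = guaranteed_bytes(2**16 - 1)   # 32768 chunks
unsigned_32 = guaranteed_bytes(2**32 - 1)

print(signed_16, unsigned_16, unsigned_32)
```

Under these assumptions a signed 16-bit counter guarantees nothing at all, an unsigned 16-bit counter guarantees exactly 16 MiB, and an unsigned 32-bit counter guarantees roughly 2 TiB.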
The section on acknowledgement messages ends with this:
> The client may attempt re-transmission of an acknowledgement packet
A client won't generally know whether an acknowledgement packet was received by the server because there are no acks of acks in guppy.
One possible (but not guaranteed) ack failure indication would be receiving a re-transmission of an already-acked packet, but this is something the spec elsewhere suggests clients ignore, and is a pretty awkward heuristic to code.
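To make the awkwardness concrete, here is a minimal sketch of that heuristic. The function name and the returned action strings are hypothetical and nothing here comes from the spec; it only models the decision, not the networking.

```python
def handle_packet(acked: set, seq: int) -> str:
    """Return the action a client would take for an incoming packet.

    `acked` holds the sequence numbers this client has already acknowledged.
    """
    if seq in acked:
        # Ambiguous: either our ack was lost and the server retransmitted,
        # or this is just a delayed duplicate of a packet the server
        # already knows we received. The client cannot tell which.
        return "re-ack"
    acked.add(seq)
    return "ack"

acked = set()
actions = [handle_packet(acked, seq) for seq in (7, 8, 7)]
print(actions)
```

The second arrival of packet 7 triggers a re-sent ack, but that ack is only useful in one of the two cases described in the comment, and it requires the client to process exactly the duplicates it is otherwise told to ignore.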
The spec on continuation packets:
> The client must ignore packets where the sequence number is not the sequence number of the previous packet plus 1.
This contradicts just about everything said elsewhere about out-of-order packet handling so it probably just wasn't updated in some prior iteration.
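The difference between the two readings is easy to see in code. The sketch below is illustrative only: both function names are hypothetical, packets are modelled as `(sequence number, payload)` tuples rather than real datagrams, and retransmission is left out entirely.

```python
def strict_receive(packets, first_seq):
    """Follow the quoted rule: ignore anything that is not previous + 1."""
    out = bytearray()
    expected = first_seq
    for seq, payload in packets:
        if seq != expected:
            continue            # dropped; the server must send it again
        out += payload
        expected += 1
    return bytes(out)

def buffered_receive(packets, first_seq):
    """Out-of-order handling: hold early packets until the gap is filled."""
    out = bytearray()
    expected = first_seq
    pending = {}                # seq -> payload for packets that came early
    for seq, payload in packets:
        if seq < expected or seq in pending:
            continue            # duplicate of something already handled
        pending[seq] = payload
        while expected in pending:
            out += pending.pop(expected)
            expected += 1
    return bytes(out)

# Packets 10, 11, 12 arrive with the last two swapped.
arrivals = [(10, b"a"), (12, b"c"), (11, b"b")]
print(strict_receive(arrivals, 10))
print(buffered_receive(arrivals, 10))
```

With the strict rule, packet 12 is thrown away and has to be retransmitted even though it arrived intact; with buffering, the same three datagrams produce the complete payload.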
On request packets:
> The client may attempt re-transmission of a request packet if no response is received after a while and the server must ignore duplicate request packets.
The server won't be able to distinguish re-transmission of the "same" request packet from a legitimate re-request of the same page, perhaps from a client with a refresh button. These are indistinguishable because request packets don't have a sequence number.
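One way a server might try to "ignore duplicates" is a time window keyed on the client address and the request line. The sketch below is an assumption for illustration (the class name, the two-second window, and the example URL are all made up); it shows that any such window misclassifies some legitimate requests.

```python
class DedupServer:
    """Ignore a repeated (address, url) pair seen within `window` seconds."""

    def __init__(self, window: float = 2.0):
        self.window = window
        self.seen = {}          # (addr, url) -> time the request was served

    def should_serve(self, addr, url, now: float) -> bool:
        key = (addr, url)
        last = self.seen.get(key)
        if last is not None and now - last < self.window:
            return False        # treated as a duplicate request
        self.seen[key] = now
        return True

server = DedupServer(window=2.0)
client = ("192.0.2.1", 50000)
url = "guppy://example.org/"

first = server.should_serve(client, url, now=0.0)       # original request
retransmit = server.should_serve(client, url, now=1.0)  # timer-driven resend
refresh = server.should_serve(client, url, now=1.5)     # user hit refresh
late = server.should_serve(client, url, now=5.0)        # resend after window
print(first, retransmit, refresh, late)
```

The retransmission at one second is correctly ignored, but so is the deliberate refresh half a second later, and a retransmission arriving after the window is served again as if it were new. Without a per-request identifier, no choice of window fixes both cases.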
I've omitted some other more nit-picky things, partially because they are more minor, and partially because the above list really highlights that the use of UDP is overall a huge problem in the guppy spec. It contributes to the spec complexity that led to some mistakes, and it creates protocol complexity that I think this spec under-appreciates, leading to ambiguity and complications for would-be implementors. Every last issue in the above list would be resolved by adopting TCP and stripping out the parts of the spec that deal with rebuilding its capabilities (which I expect would constitute a majority of it).
It's a classic mistake to look at complicated machinery like TCP and assume it's bloated. But the effort here to provide a simpler alternative only misses *required complexity*, because it was an under-appreciation of the complexity inherent in the problem itself that led to the assumption in the first place.
Even though TCP contains a more complicated and convoluted solution to the problems of re-ordering and re-transmission, its use would be a massive simplification both for this spec and especially for implementors.
But, to each their own! Don't let me dissuade you from tackling these problems on top of UDP if it's what you really want to do. It's certainly an interesting area, and in that case I hope the issues identified above are helpful in fleshing out a comprehensive solution. It's just that what you're doing at that point is not simplification.
Here are a few suggestions, informed by existing implementations (like TCP), which can help with some of the problems I noticed.
---