Protocol pondering intensifies
------------------------------

This is the first of a three-part epic miniseries of, well, ponderings of new gopher-like protocols. If you're not into that kind of thing, feel free to ignore them. If you *are*, grab a coffee or beer or something and get comfy!

Any sort of document retrieval protocol needs to specify two things: the format of a client's request, and the format of a server's response. Focusing only on these two things, Gopher is impossible to make simpler or more minimal. It literally is the simplest thing that could possibly work, and this aspect of Gopher can barely even be considered to have been "designed". All of the actual *decisions* are in other details like the item type system and the menu format. The request and response formats are pure Void.

For those unfamiliar with the protocol, when you came here, your client connected to zaibatsu.circumlunar.space on port 70 and said:

----------
~solderpunk/phlog/protocol-pondering-intensifies.txt<CR><LF>
----------

(<CR> and <LF> are newline characters; read up on ASCII if this is new to you)

That's it. Just a unique identifier for the document you want (a "selector" in gopher lingo, equivalent in every way to a "path" in HTTP), plus an unambiguous way of terminating the request, and that's it. Both of those things are essential - any functional protocol will have them (and we'll see them in HTTP shortly) - and Gopher has nothing else. Real ultimate simplicity!

In response the server said, well, the contents of this file, and that's it. According to RFC1436, the server ought to include a terminating line of ".<CR><LF>", but the RFC also says that "The client should be prepared for the server closing the connection without sending the Lastline", and a lot of modern servers seem to leave it out. That's all there is, folks! It's this brutal minimalism which means you can use telnet as a gopher client with only mild discomfort.

Let's bump the complexity up a bit! What does an HTTP request look like? The simplest valid one would be:

----------
GET /~solderpunk/phlog/protocol-pondering-intensifies.txt HTTP/1.0<CR><LF>
<CR><LF>
----------

UPDATE 17/06/2019: Thanks to Mastodon user @gcupc@glitch.social for pointing out that I originally had standards-non-compliant HTTP/1.1 requests in these posts!

What's the extra baggage here? The "GET" is called an HTTP method, and tells the server that we want to, well, get the document at the specified path, as opposed to, e.g., upload something new there or delete something which is already there. This is actually another real, core protocol-level difference between HTTP and Gopher - Gopher is a strictly consumption-oriented protocol. It's for reading, not writing. You may have seen gopher guestbooks around the place, and surely those involve writing? They're a clever hack using gopher's search functionality. From the point of view of the protocol, you're actually "searching" for your guestbook comment, and the server just does something decidedly non-searchlike with your (256 characters or fewer) query. There's also Alex Schroeder's Oddmuse wiki server[1], which has a gopher mode using a non-standard item type to allow writes. But I'm getting off track - back to the HTTP request!

After the "GET", there's the path - no different to gopher, really. Finally, the "HTTP/1.0" is a protocol version number; it tells the server we're using HTTP 1.0 and not a later or earlier version. This information is useful if a protocol changes substantially over its lifetime.
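To make the comparison concrete, here's a rough Python sketch of my own (not part of either protocol spec) of what a client does in both cases: open a TCP connection, send one small request, read until the server closes. The gopher request fetches this very post; the HTTP host is a stand-in, since this phlog isn't actually served over HTTP.

----------
import socket

def fetch_raw(host, port, request_bytes):
    # Both protocols look the same at this level: connect, send one
    # small request, then read the response until the connection closes.
    with socket.create_connection((host, port)) as s:
        s.sendall(request_bytes)
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)

# Gopher: the request is just the selector plus CRLF, nothing else.
gopher_doc = fetch_raw(
    "zaibatsu.circumlunar.space", 70,
    b"~solderpunk/phlog/protocol-pondering-intensifies.txt\r\n")

# HTTP/1.0: a request line (method, path, version), then a blank line.
# (example.com is a hypothetical stand-in host for illustration only.)
http_doc = fetch_raw(
    "example.com", 80,
    b"GET /~solderpunk/phlog/protocol-pondering-intensifies.txt HTTP/1.0\r\n\r\n")
----------

The only real differences are the port number and the shape of those few request bytes - everything discussed below lives inside them.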
If the protocol is fixed in stone, though, that version number is just dead weight.

Why the blank second line, containing only <CR><LF>? The above is the simplest possible HTTP request, but you can add a lot of extra optional stuff, in the form of "headers". Because you can add as few or as many headers as you like, the number of lines in an HTTP request is variable, and so a blank line is needed to unambiguously end the request. A slightly fancier request might look like this:

----------
GET /~solderpunk/phlog/protocol-pondering-intensifies.txt HTTP/1.0<CR><LF>
If-Modified-Since: Wed, 12 Jun 2019 01:02:03 GMT<CR><LF>
Accept-Language: en-US, de<CR><LF>
<CR><LF>
----------

This request has two headers, and it says "Send me this phlog post, but only if it's changed since the last time I fetched it a few days ago (if it hasn't changed, I'll use my cached copy), and only if you have a version in US English or German". This is still a *very* minimal HTTP request. A modern browser like Firefox or Chrome will probably jam at least a dozen headers into a typical request.

What's wrong with request headers? Well, there is nothing fundamentally wrong with them. A lot of the headers in HTTP are related to caching, and caching is neither dumb nor evil. If all your content is small and changes rarely then caching is totally unnecessary, but for a fully general-purpose protocol it makes sense. I don't think there is any need for caching in gopher, and I don't advocate adding it to a hypothetical new protocol to sit somewhere "between gopher and the web". The language preference thing seems like a nice idea, but in practice I've never seen it actually used. Every multilingual website I've ever visited makes you play "hunt the tiny flag icon" to change the language, so in reality it's more dead weight. A lot of HTTP headers fall into these categories: genuinely useful stuff for a sufficiently aspirational protocol, or good intentions which are rarely used in practice.

However, request headers are also the mechanism for a lot of the nastiness in the modern web. The "User-Agent" header is how your browser tells the server which browser version you're using, which is None of Their Damn Business and is only something the server actually needs to know if different clients have substantially different ways of handling the same response, which is a Really Dumb Idea. The "Referer" header is how your browser tells the server which *other* webpage linked you to the one you're actually requesting, which is yet more None of Their Damn Business (it has an arguably valid application in preventing "hot linking" of images, but that's not a big concern for anything vaguely gopher-like). And, of course, the "Cookie" header is half of how cookies work - cookies come *into* your browser via a "Set-Cookie" HTTP *response* header (more on those in the next entry), and are then sent back via a "Cookie" header in subsequent requests to the same server. Even if you wrote a browser which never sent any of these Three Evil Headers, it turns out that the unique combination of seemingly-harmless headers you might send, about your cache status and your encoding preferences and language preferences and bla-bla-bla, can act as a nearly-unique browser "fingerprint" to facilitate tracking (a problem widely publicised by the EFF with their "Panopticlick" site[2] back in 2010. Really, only 2010? It feels older to me...). So, request headers have a lot to answer for.
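To make that fingerprinting point concrete, here's a deliberately crude sketch of my own (real trackers are more sophisticated) of how a server could boil the combination of "harmless" headers down to a stable identifier:

----------
import hashlib

def header_fingerprint(headers):
    # Hash the exact combination of optional headers a client sent.
    # No User-Agent, Referer or Cookie needed: the particular mix of
    # Accept, language and encoding preferences is often nearly unique.
    canonical = "\n".join(
        f"{name.lower()}: {value}" for name, value in sorted(headers.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

print(header_fingerprint({
    "Accept": "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US, de",
    "Accept-Encoding": "gzip, deflate, br",
}))
----------

The headers themselves look innocent; it's the combination, repeated identically on every request, that gives you away.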
Do we need to ban them outright, or can we just put strong restrictions in place, maybe limit ourselves to one or two or three request headers which are obviously harmless? Well, if we want to have anything even remotely like a strong guarantee of anonymity and untrackability, we'd need to insist on a principle like "almost all requests for the same resource from different clients should look almost exactly the same". And if we meet that condition, then I think request headers become dead weight. If everybody is specifying more or less the same information, then just let that information become an unspoken assumption of the protocol and drop the headers. So, in a protocol which is supposed to be anonymous, I just don't see any place for request headers. In this respect, gopher gets it exactly right and I see no reason to advocate anything other than keeping exactly the same request format in the Future Protocol of Truth and Glory.

Strictly forbidding request headers breaks any possible return path for information from the server to the client and back, like cookies. (Well, not quite: an unscrupulous server can always inject pseudo-cookies into paths. I've written about this elsewhere[3]. It keeps me up at night, but there's no way to guard against it, so that's that.) By breaking this connection, the decision to leave out request headers renders any and all possible *response* headers harmless. Response headers are already less scary, simply because we don't have to worry about clients tracking servers in the same way we worry about servers tracking clients. But if there's *any* way for information that rides in on a response header to make it back to the server, even if it's seemingly harmless information, that channel can and eventually will be abused for tracking. In HTTP, this has happened with "ETag" headers[4]. ETags are a kind of lightweight checksum intended for cache validation, but they have been used as part of so-called "super-cookies", where different clients are sent slightly different ETags for the same resource. Then, even if you delete that site's cookies, if you don't also clear your browser cache, the site can recognise you from the ETag and send you the *same* cookie back. Insidious! (There's a sketch of how this trick works after the links below.) So even seemingly harmless response headers can in fact be Pure Evil if there is a back channel.

Breaking that channel lets us relax when thinking about response headers - which is good, because I think that they're actually the place where genuinely useful enhancements can be made, compared to just sending the content and nothing else, which is how gopher works. More on this in another entry soon!

[1] https://oddmuse.org/wiki/Gopher_Server
[2] https://panopticlick.eff.org/
[3] gopher://zaibatsu.circumlunar.space:70/0/~solderpunk/phlog/on-gopher-conservatism.txt
[4] https://lucb1e.com/rp/cookielesscookies/
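As promised above, here's a minimal sketch of the ETag super-cookie trick - a hypothetical, simplified server-side handler of my own, not taken from any real site - showing how cache revalidation doubles as re-identification:

----------
import secrets

known_visitors = {}  # ETag -> visitor id (an assumed in-memory store)

def handle_request(if_none_match=None):
    # Returning visitor: the browser revalidates its cached copy by
    # echoing the ETag back in If-None-Match, and is thereby recognised.
    if if_none_match in known_visitors:
        return 304, {"ETag": if_none_match}, known_visitors[if_none_match]
    # New visitor: hand out a unique "checksum" that identifies nobody
    # but them, and lives for as long as their browser cache does.
    etag = '"' + secrets.token_hex(8) + '"'
    known_visitors[etag] = f"visitor-{len(known_visitors) + 1}"
    return 200, {"ETag": etag}, known_visitors[etag]
----------

Deleting cookies doesn't help here; only clearing the cache (or using a client that never sends If-None-Match at all) breaks the link.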