2019-06-21 Solderpunk's Gemini Protocol

@solderpunk is moving all this thinking about a new protocol to a dedicated area of his site. He calls it the Gemini Protocol. I like the name.

@solderpunk

Gemini Protocol

I recently sent some feedback via Mastodon, and I’d like to have a summary of it. Here goes.

Redirection

In the post about Redirection he said he liked redirection but didn’t like URL shorteners, and I agree with that. Solderpunk then decided that he wanted some way to prevent URL shorteners from working *at the protocol level* and suggested that redirects work only within one site.

Redirection

I’m not sure I like this idea. In the past, for example, my private blog was hosted on http://emacswiki.org/alex – and the redirect still works. That makes it hard for me to agree to an “automatic local redirects only” proposal.

http://emacswiki.org/alex

I have a different solution: what if we didn’t implement restrictions at the protocol level but at the client level? Sure, in the future some corporate entity could write a “better” client and follow redirects automatically, and eventually we’d get back URL shorteners. But my hope is that the protocol stays simple enough so that we’ll have plenty of clients. And one good design pattern should be that when a human is following a link, they should see some text and click through. Perhaps they’ll have to click through three or four times. That’s OK. Conversely, a program such as moku pona should continue finding a feed that has moved elsewhere.

moku pona

Thus: I want humans not to be redirected automatically and I’d like machines to be redirected automatically and I want redirection to work for other domains, too. Therefore, the distinction should happen at the client level. Bad design shouldn’t force us to design a solution that’s not good enough to serve our needs.
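
Here is a minimal sketch of what such a client-level policy could look like. It assumes Python, a hypothetical `fetch` function that returns a status and either a body or a redirect target, and a hypothetical `confirm` callback that stands in for the human clicking through. None of these names come from Solderpunk's proposal.

```python
from typing import Callable, Optional, Tuple

# Hypothetical response shape: (status, payload). For status 303 the
# payload is the new URL, otherwise it is the document body.
Response = Tuple[int, str]


def resolve(url: str,
            fetch: Callable[[str], Response],
            confirm: Optional[Callable[[str, str], bool]] = None,
            max_hops: int = 5) -> Optional[Response]:
    """Follow redirects according to a client-side policy.

    If confirm is None, the caller is a program (a feed reader, say) and
    redirects are followed automatically, across domains as well.
    Otherwise confirm(old_url, new_url) is asked before every hop, so a
    human always gets to see where they are being sent.
    """
    for _ in range(max_hops + 1):
        status, payload = fetch(url)
        if status != 303:
            return status, payload
        if confirm is not None and not confirm(url, payload):
            return None  # the human declined to click through
        url = payload
    raise RuntimeError("too many redirects")
```

A feed reader would call `resolve(url, fetch)`, while an interactive client would pass a `confirm` function that shows the target and waits for a click.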

Status

In his Speculative Specification, Solderpunk writes about headers. He mentions two of them in particular: a status code, and a MIME type. Both of them make sense but I have a few points I’d like to make.

Speculative Specification

The first regards the status codes themselves. The proposal talks about a single “UTF-8 byte”. I’m guessing that means only the ASCII subset of UTF-8. I think I’m in favour of reusing the well known HTTP codes, though. Using 2 for OK instead of 200 is a gimmick; using 200 saves us all mental bandwidth. Let’s just list the codes that MUST be handled by conforming implementations: 200 OK, 400 BAD REQUEST, 303 SEE OTHER, 500 SERVER ERROR, 429 TOO MANY REQUESTS, for example. Everything else is undefined.
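
As a sketch of how small that list is for an implementation, here are the five codes named above in Python. The `describe` helper and the `UNDEFINED` label are my own illustration, not part of any specification.

```python
# The codes a conforming implementation would have to handle,
# exactly as listed in the paragraph above.
REQUIRED_CODES = {
    200: "OK",
    303: "SEE OTHER",
    400: "BAD REQUEST",
    429: "TOO MANY REQUESTS",
    500: "SERVER ERROR",
}


def describe(status: int) -> str:
    """Return the name of a required code, or UNDEFINED for anything else."""
    return REQUIRED_CODES.get(status, "UNDEFINED")
```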

In a way, my proposal is open-ended. I’m saying: perhaps we’ll add more HTTP status codes in the future. And that again invites the well known embrace, extend, and extinguish tactics used against free software in the past. And yet. I don’t know. I guess I feel like a tinkerer. I like to program. Just look at Gopher. Yes, it has served us well, and yet people have used other character encodings, new item types, left off the trailing period at the end of a communication, and so on. Making changes is a natural thing. Sure, we want the protocol to stay simple. We want it to resist changes. But if we design ourselves into a corner where it’ll be hard to extend in the future, then the future will bring us backwards-incompatible versions of the protocol and that thought pains me a bit.

embrace, extend, and extinguish

Headers

When talking about status codes and MIME types, Solderpunk suggests we should simply print them as the first line of the document: separated by a horizontal tabulator and terminated by a carriage return and a line feed.
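
To make the format concrete, here is a minimal parser sketch in Python for a response that starts with such a line. The function name is hypothetical, and the sample status `2` is taken from the "2 for OK" example above. I'm assuming the rest of the response is the document itself.

```python
from typing import Tuple


def parse_header_line(raw: bytes) -> Tuple[str, str, bytes]:
    """Split a response into (status, mime_type, body).

    The first line is expected to be: status, TAB, MIME type, CR LF.
    """
    line, crlf, body = raw.partition(b"\r\n")
    if not crlf:
        raise ValueError("header line is not terminated by CR LF")
    status, tab, mime_type = line.decode("utf-8").partition("\t")
    if not tab:
        raise ValueError("status and MIME type are not separated by a tab")
    return status, mime_type, body
```

Note that there is nowhere to put a third value without changing what the tab or the line ending means.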

The reason I don’t like this particular format is the same reason I mentioned above in a different guise. I want things to be open ended. I’m seeing a future where we want to add more headers. Remember how tricky it was to extend the Gopher menu? We did extend it, of course. The `h` item type was added to link to web resources. That’s an extension that worked, because it was possible to add more item types to the list. But when we needed a new field to indicate that an item was to be retrieved via Gopher + TLS, we didn’t know what to do. All we had was four fields: item type and name, selector, host, and port. Where do we specify TLS? We cannot. If only we had a more flexible, extensible way of indicating links. That’s what I mean. Open ended, extensible features are good.

Again, the counter argument is that others will “embrace, extend, and extinguish.” It’s true. Let’s take cookies as an example. They were invented for online shops and eventually they got used to track people and that’s bad. But cookies also enable people to build games and that’s good. It seems to me that we should keep the protocol simple but extensible and provide simple clients that don’t automatically do things that can be abused (follow redirects automatically, store all the cookies, send all the cookies). I guess I’m just not convinced that we need to prevent such things at the protocol level.

The problem with the current web is not just that HTTP and its open-ended features can be abused. It’s that they can be abused and browsers are such behemoths of web clients that we don’t dare implement them ourselves anymore. It’s a huge undertaking because we need HTML, CSS, Javascript, encryption, proxies, caching, and so many more things. That is why we cannot get rid of abusive cookies and Javascript again. Or if we do, we now have multiple problems: we need a behemoth of a web client and a plethora of add-ons that disable half of the byzantine features. That’s what’s so terrible.

And thus, finally, we get to the format. Assuming that we want an extensible list of headers, and I do, then I see two options.

We could use a human readable format. There’s a well established format for this. Mail headers, news headers, HTTP headers, they all follow this pattern. It’s a list of key-value pairs, one per line, with the format key, colon, space, value. The list of headers ends with an empty line.

To: alex@gnu.org
Subject: Test

This is a test.

There’s also a rule that allows us to wrap these lines in case the values are very large. Hopefully we don’t need this. 🙂
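
One practical upside of this format is that parsers already exist. As a sketch, Python's standard `email` module can split the example above into headers and text. The `split_response` name is my own, and treating a protocol response as a mail message is an assumption for illustration.

```python
from email import message_from_string
from typing import Dict, Tuple


def split_response(text: str) -> Tuple[Dict[str, str], str]:
    """Split mail-style text into a dict of headers and the remaining body."""
    message = message_from_string(text)
    headers = dict(message.items())  # key: value pairs up to the empty line
    body = message.get_payload()     # everything after the empty line
    return headers, body


headers, body = split_response(
    "To: alex@gnu.org\n"
    "Subject: Test\n"
    "\n"
    "This is a test.\n"
)
```

Adding a new header later means adding one more line, and older clients can simply ignore keys they don't know.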

If we’re not going to use colons, spaces and line breaks, I think we should use established ASCII control characters. This is from the ASCII man page on my system:

Oct   Dec   Hex   Char

000   0     00    NUL '\0' (null character)
001   1     01    SOH (start of heading)
002   2     02    STX (start of text)
003   3     03    ETX (end of text)
004   4     04    EOT (end of transmission)
005   5     05    ENQ (enquiry)
006   6     06    ACK (acknowledge)
007   7     07    BEL '\a' (bell)
010   8     08    BS  '\b' (backspace)
011   9     09    HT  '\t' (horizontal tab)
012   10    0A    LF  '\n' (new line)
013   11    0B    VT  '\v' (vertical tab)
014   12    0C    FF  '\f' (form feed)
015   13    0D    CR  '\r' (carriage ret)
016   14    0E    SO  (shift out)
017   15    0F    SI  (shift in)
020   16    10    DLE (data link escape)
021   17    11    DC1 (device control 1)
022   18    12    DC2 (device control 2)
023   19    13    DC3 (device control 3)
024   20    14    DC4 (device control 4)
025   21    15    NAK (negative ack.)
026   22    16    SYN (synchronous idle)
027   23    17    ETB (end of trans. blk)
030   24    18    CAN (cancel)
031   25    19    EM  (end of medium)
032   26    1A    SUB (substitute)
033   27    1B    ESC (escape)
034   28    1C    FS  (file separator)
035   29    1D    GS  (group separator)
036   30    1E    RS  (record separator)
037   31    1F    US  (unit separator)

We could start headers using SOH, separate headers using RS, separate key and value using US, and begin the text using STX and end it using ETX. When saving binary files, the server would hang up at the end and we’d strip the trailing ETX character, of course.
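
A rough sketch of that framing in Python, assuming header keys and values never contain control characters themselves. The function names are hypothetical. Only the final ETX is stripped, so a binary body may contain any byte.

```python
from typing import Dict, Tuple

SOH, STX, ETX = b"\x01", b"\x02", b"\x03"
RS, US = b"\x1e", b"\x1f"


def encode(headers: Dict[str, str], body: bytes) -> bytes:
    """Frame headers and body: SOH key US value RS ... STX body ETX."""
    fields = RS.join(k.encode() + US + v.encode() for k, v in headers.items())
    return SOH + fields + STX + body + ETX


def decode(raw: bytes) -> Tuple[Dict[str, str], bytes]:
    """Reverse encode(), stripping only the trailing ETX from the body."""
    if not raw.startswith(SOH) or not raw.endswith(ETX):
        raise ValueError("missing SOH or trailing ETX")
    # The first STX ends the headers; later STX bytes belong to the body.
    head, stx, rest = raw[1:].partition(STX)
    if not stx:
        raise ValueError("missing STX")
    headers = {}
    for field in head.split(RS):
        if field:
            key, _, value = field.partition(US)
            headers[key.decode()] = value.decode()
    return headers, rest[:-1]
```

It works, but a dump of such a response is much harder to read and to type into a test than the mail header version.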

Given the choice between the two, I’m currently thinking that we should simply use mail headers. It’s old technology, but it’s easy to read, easy to edit, which means easy to write tests for, easier to communicate bugs, and so on.

Remember RSS 3.0? Good times! 😀

RSS 3.0

Conclusion

The more I think about it, the more I get a sinking feeling about this. Writing the Simple Text Server and the Simple Text Client was a fun exercise. Reading those posts by Solderpunk, and thinking (and coding), was time well spent. But I do confess I don’t expect this to actually go anywhere.

Simple Text Server

Simple Text Client

What I’m thinking is that slowly we’ll normalise a few extensions for Gopher and Gopher clients:

1. UTF-8 encoding of text files

2. URL parsing in text files

3. TLS to encrypt traffic

4. Reflowing text is the rule

I don’t know. Perhaps I’m just tired because I wrote so much. But I’m already seeing that two or three people are interested in the idea and already I’m dissenting. How is this going to end well? Perhaps it’s better to not only let people write up their ideas but also start implementing something and see how far it goes? I’m a bit stuck.

This is how I personally work best: think of something, write a quick sketch, write a working implementation (even if minimal), and learn from the experience. Right now I feel like I’m just being a cold blanket.

cold blanket

​#Web ​#Gemini ​#Gopher

Comments

(Please contact me if you want to remove your comment.)

I am not sure I understand the problem you’re trying to solve. What you’re doing here is basically reinventing HTTP/1.0. There is nothing wrong with HTTP, it is a very simple, text based protocol. Easily extendable. Has a lot of clients available.

I don’t think what drew you to gopher in the first place was the transfer protocol. It was the limitation to text. You can transfer text via HTTP. HTTP isn’t the bloated mess you like to avoid, text/html is.

– Andreas Gohr 2019-06-21 23:23 UTC

Andreas Gohr

---

Yeah, perhaps you are right. HTTP/1.0 didn’t include virtual hosts, chunked encoding, and many other things. But there are already small complications:

HTTP/1.0

1. content coding for compression

2. multipart types are left open

3. referrer header is suggested

4. user agent header is suggested

But with just a few small tweaks I could write a simple http server without any features. 🙂

And I could specify that the request method is optional and defaults to GET and that the protocol version is optional and if not provided, no further request headers are expected... and it should all work. 🤔
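
A quick sketch of that request line rule in Python. The function name is made up, and I'm assuming the parts are separated by whitespace and that the version, if present, starts with `HTTP/`.

```python
from typing import Optional, Tuple


def parse_request_line(line: str) -> Tuple[str, str, Optional[str]]:
    """Return (method, path, version) with both method and version optional.

    The method defaults to GET. A version of None means that no further
    request headers are expected.
    """
    parts = line.split()
    version = None
    if parts and parts[-1].startswith("HTTP/"):
        version = parts.pop()
    if len(parts) == 2:
        method, path = parts
    elif len(parts) == 1:
        method, path = "GET", parts[0]
    else:
        raise ValueError("cannot parse request line: %r" % line)
    return method, path, version
```

With this, a request consisting of nothing but `/alex` followed by a line ending would be treated as a GET without headers.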

– Alex Schroeder 2019-06-22 11:29 UTC

---

Reply by Solderpunk.

Reply by Solderpunk

– Alex Schroeder 2019-06-22 19:47 UTC

---

This site is also available via Gemini, at the moment.

– Alex Schroeder 2020-05-29 18:05 UTC

---

A longer argument by Solderpunk about the decision not to repurpose HTTP and HTML: Why not just use a subset of HTTP and HTML? The argument goes that since the border between this “safe web” and the regular web is so smooth, you never know when you’re leaving the “safe web”, and since you could always use the browser behemoths to browse the “safe web”, the temptation to use just a tiny little extra of the HTTP and HTML features would always be there. Like my use of Javascript on this site, I guess! All I wanted to do was paste images.

Why not just use a subset of HTTP and HTML?

– Alex Schroeder 2020-06-18 21:36 UTC