The HTTP client person as seen from Gemini

https://daniel.haxx.se/blog/2023/05/28/the-gemini-protocol-seen-by-this-http-client-person/

As a sign of this, the protocol is designed by the pseudonymous “Solderpunk” – and the IETF or other suitable or capable organizations have not been involved – and it shows.

"Designed by committee" isn't exactly a ringing endorsement; it is not hard to find baroquely complicated protocols, emissions of some expert committee or the other--and it shows.

The reduced complexity however also makes it less visually pleasing to users and by taking shortcuts in the protocol, it risks adding complexities elsewhere instead.

If we're trading random opinions about what is visually pleasing, I find gemtext to be quite pleasing and readable, and tag the modern web as "vigorously unusable" for a variety of reasons that very much include the visual--annoying pop-ups, CPU-wasting animations, anemic fonts, low contrast, and unreadable colors. If a site works in w3m, it can sometimes be as good as gemtext. For example:

webview.png

geminiview.png

That web page isn't bad. Would I try to view it in Firefox? Nope. Meanwhile, elsewhere, it's often the iron curtain of javascript:

PrincipleOfLeastSurprise.png

modernweb.png

Would I try to view these in Firefox? Nope.

Now, the average meat popsicle is gonna want their bling and jiggle, but that's hardly "augmenting the human intellect", and some might start to wonder why worker productivity is so low these days. Perhaps the modern web has too many distractions, too much bloat? Ezra Klein has some thoughts in this space.

But of course, the protocol is also so simple that it lacks the power to do a lot of things you can otherwise do on the web.

Good.

The only protocol specification is a single fairly short page that documents the over-the-wire format mostly in plain English (undoubtedly featuring interpretation conflicts), includes the URL format specification (very briefly) and oddly enough also features the text/gemini media type: a new document format that is “a kind of lightweight hypertext format, which takes inspiration from gophermaps and from Markdown“.

The web has a hilariously huge specification--a "reckless, infinite scope"--that undoubtedly features interpretation conflicts (to say nothing of what Google wants). Point for gemini. A new document format is not odd at all, if you think about it for a little bit. Another point for gemini.

The spec says “Although not finalised yet, further changes to the specification are likely to be relatively small.” The protocol itself however has no version number or anything and there is no room for doing a Gemini v2 in a forward-compatible way. This way of a “living document” seems to be popular these days, even if rather problematic for implementers.

I did not find gemini very problematic to implement, though admittedly I do not have the benefit of 30 years' experience implementing internet application protocols, and have never worked as a programmer.

HTTP/3? HTML+CSS+JavaScript? Not even gonna try.

The Gemini protocol reeks of GOPHER and HTTP/0.9 vibes. Application protocol style anno mid 1990s with TLS on top. Designed to serve single small text documents from servers you have a relation to.

Reek to one is a fabulous blue cheese to another.

Or, so what if the web has some shiny HTTP rails if they go straight to a pond of shit.

We know from HTTP and a primary reason for the introduction of HTTP/1.1 back in 1997 that doing short-lived bursty TCP connections makes it almost impossible to reach high transfer speeds due to the slow-starts. Also, re-doing the TCP and TLS handshakes over and over could also be seen a plain energy waste.

Why does the protocol need high transfer speeds? And the energy waste of gemini is trivial compared to that of the modern web. Start Firefox? No, I already know what the CPU fans sound like, and how much memory does that piggy want this year?

Future protocols can probably dial down the energy use. These would doubtless be neither the modern web nor gemini. But degrowth sure ain't popular with the powers that be.

Serving an average HTML page using a number of linked resources/images over this protocol is going to be significantly slower than with HTTP/1.1 or later. Especially for servers far away. My guess is that people will not serve “normal” HTML content over this protocol.

If by "normal" you mean a document bloated with menus, tracking, ugly visuals, dark patterns, autoplaying videos, pop-ups, images, ads, and javascript with a visible refresh rate like Word 6 had, and who knows how many security vulnerabilities along for the ride, then, probably not.

A recent web page I was trying to summarize weighed in at ~350,000 bytes; the actual content on that page was probably 1300 bytes. And that was without any additional resources that a Firefox would go out and grab by default. Gemini? Cat the text into a file. HTML? Load up a parser library, learn how CSS Selectors are useless, and that all that pretty OO code is too slow, try another parser library, figure out how to build a buffer up with the text you want, emit those blocks into a plain-text form... err, you were saying something funny about gemini being a waste of CPU?
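
To illustrate how line-oriented gemtext is, here is a rough sketch that pulls the link lines out of a page with awk; "page.gmi" is just a hypothetical filename, not anything from the page in question.

    # gemtext is line-oriented: link lines start with "=>", so plain
    # awk can extract them; no parser library, no CSS selectors
    awk '/^=>/ { sub(/^=>[ \t]*/, ""); print }' page.gmi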

By the way, text/html is the 10th most popular MIME type on gemini according to one search engine; better formats such as text/gemini and text/plain predominate.

Gemini only exists done over TLS. There is no clear text version.

So? I heard a rumor that HTTP was moving to only HTTPS.

There is nothing written about how a client should deal with the existing query part in this situation. Like if you want to send a query and answer the prompt. Or how to deal with the fact that the entire URL, including the now added query part, still needs to fit within the URL size limit.

The prompts work pretty good in amfora, or are not that difficult to write manually with a printf(1) piped to nc(1). So some folks were, somehow, able to figure this out well enough.
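
For the curious, a request is the URL on a single line ended with CRLF, sent over TLS. A rough sketch, substituting openssl s_client for a TLS-capable nc; the host and /search path are made up for illustration, though 1965 is the standard port.

    # send a Gemini request with a query component and read the reply;
    # example.org and /search are placeholders, not a real endpoint
    printf '%s\r\n' 'gemini://example.org/search?hello%20world' |
        openssl s_client -quiet -connect example.org:1965 -servername example.org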

A challenge is of course that on the first visit a client cannot spot an impostor, and neither can it when the server updates its certificates down the line. Maybe an attacker did it? It trains users on just saying “yes” when asked if they should trust it. Since you as a user might not have a clue about how runs that particular server or whatever the reason is why the certificate changes.

Trusting all of the Certificate Authority certificates that ship with a typical web client has its own set of problems. Maybe some nice government told someone to sign something, or there's a corrupt employee, or the black hats can use lettuce encrypt just as well as anybody else, but hey, there's some blue checkmark, so we must be good!

The concept of storing certificates to compare against later is a scaling challenge in multiple dimensions:
(blah blah blah blah blah)

Nice gish gallop. However, in practice,

    $ cd ~/.cache
    $ wc amfora/tofu.toml
         946    2838   65359 amfora/tofu.toml
    $ wc /etc/ssl/cert.pem
        6417   13657  341121 /etc/ssl/cert.pem
    $ du -sh amfora
    68.0K   amfora
    $ du -sh mozilla
    17.7M   mozilla

I'd better order another disk, stat, given how unscalable gemini is!

My host count is probably higher than many as I like to visit new hosts found by the search engine. Haven't seen speed issues in amfora. Nearly always see speed issues in Firefox.

P.S. Firefox hasn't been run in a while, and I tell it to delete "everything" when it quits. For some value of everything. I guess. "Firefox" here can stand for Chromium or whatever. Last I checked Chromium was crashing at startup, and of course Chrome is not available.

I strongly suspect that many existing Gemini clients avoid this huge mess by simply not verifying the server certificates at all or by just storing the certificates temporarily in memory.

Depends on the client. For the popular ones, you'll probably want to be wearing edible shoes if you want to keep making that argument.

(And of course no government would ever try to backdoor a major browser, and of course users are well known to do the right thing when there isn't a blue checkmark or whatever in their browser, and of course Chrome did not have, what, nine zero-days last year...)

You can opt to store a hash or fingerprint of the certificate instead of the whole one, but that does not change things much.

You will see fewer certificate rotation notices in amfora if the new certificate is signed with the existing private key. But this argument is pretty technical and probably boils down to the definition of "much".
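
A rough sketch of that sort of rotation, assuming a self-signed setup with a server.key and server.crt lying around; the filenames and CN are made up.

    # issue a new certificate from the existing key pair, so clients
    # that pin the public key see nothing new to complain about
    openssl req -x509 -new -key server.key -out server-new.crt \
        -days 365 -subj '/CN=example.org'
    # the public-key fingerprints before and after should match
    openssl x509 -in server.crt -noout -pubkey | openssl sha256
    openssl x509 -in server-new.crt -noout -pubkey | openssl sha256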

I think insisting on TOFU is one of Gemini’s weakest links and I cannot see how this system can ever scale to a larger audience or even just many servers. I foresee that they need to accept Certificate Authorities or use DANE in a future.

Certificate Authorities have various problems, as mentioned above, and some folks do use lettuce encrypt certificates on their gemini servers, and a gemini client could easily offer HTTPesque verification, if you want (mine does). In practice TOFU hasn't been a problem, so a mandate of CA certs is not showing up in my crystal ball. How much have you used gemini?
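
A rough sketch of what that HTTPesque verification amounts to: check the server certificate against the system CA bundle instead of (or in addition to) a pinned TOFU entry. example.org is a stand-in; the cert.pem path is the one from the listing above.

    # fail the handshake unless the chain verifies against the CA bundle
    openssl s_client -connect example.org:1965 -servername example.org \
        -verify_return_error -CAfile /etc/ssl/cert.pem < /dev/null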

Whether DNS security happens is an open question. Some have not been exactly enthusiastic about DNSSEC. "Don't hold your breath" might be good advice here. Ah, and this just came in:

DNSSEC KSK rollover breaks DNS resolution for .nz domains

https://status.internetnz.nz/incidents/gq1c6slz3198

Whoops!

On the plus side, we have an actual experiment running to see how well TOFU works out (besides SSH). Would that have gotten past a committee?

Another point: TOFU may make you slow down and think about the verification. Who are you trusting? How automatic is this? Are the certificate authorities really secure? Most won't think, like with the SSH hostkeys, or the blue checkmarks... well, you can lead a horse to water, but not everyone will take a drink.

The Gemini URL scheme is explained in 138 words, which is of course very terse and assumes quite a lot. It includes “This scheme is syntactically compatible with the generic URI syntax defined in RFC 3986“.

Probably this area needs more work. But then again so does my main web browser, w3m, which does not support punycode, yet. Neither the URL spec being wishy-washy nor w3m's lack of punycode support has caused any great deal of trouble.

The document is carelessly thinking “host name” is a good authority boundary to TLS client certificates, totally ignoring the fact that “the web” learned this lesson long time ago. It needs to restrict it to the host name plus port number . Not doing that opens up Gemini for rather bad security flaws. This can be fixed by improving the spec.

Probably? The amfora author did mention changing their implementation to use host:port, but that's an easy fix, unlike the outrageous complexity of the modern web.

The text/gemini media type should simply be moved out the protocol spec and be put elsewhere. It documents content that may or may not be transferred over Gemini. Similarly, we don’t document HTML in the HTTP spec.

text/gemini is by far the most common MIME type in gemini space. By, like, two orders of magnitude, excluding image/jpeg, in which case it's only one order of magnitude. So a default of text/gemini when the response header leaves the MIME type empty looks pretty sensible, and is fairly integral to the protocol, so should not be moved elsewhere. But here we argue opinions, again.
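
For reference, a success response header looks something like the following (a sketch from memory of the spec, not a quotation); when the MIME type after the status code is left empty, clients fall back to text/gemini.

    20 text/gemini; charset=utf-8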

More urgent might be to move Chrome out of the web specifications, but, uh, good luck with that. Find me a regulator who would geld Google.

I am fairly sure that once I push publish on this blog post, some people will insist that I have misunderstood parts or most of the protocol spec. I think that is entirely plausible and kind of my point: the spec is written in such an open-ended way that it will not avoid this. We basically cannot implement this protocol by only reading the spec.

And, yet, somehow, there are a lot of servers, and a lot of clients, and it mostly seems to work out for us geminauts. From this, we might conclude...