An outsider's view of the `gemini://` protocol

🗣️ From: solderpunk (solderpunk (a) SDF.ORG)
📅 Sent: 2020-02-28 22:32
📧 Message 7 of 17
On Fri, Feb 28, 2020 at 01:16:30AM +0200, Ciprian Dorin Craciun wrote:
> Hello all!
> 
> Today I've stumbled upon the `gemini://` protocol specification
> (v0.10) and FAQ, and after reading them both, I thought that perhaps
> an "outsiders" point of view could be useful.

Howdy!

Thanks very much for taking the time to provide this outside
perspective.  I've done my best to take your comments in the
constructive fashion you intended them.  I'm going to reply relatively
briefly to some major points below - please don't take brevity as me
being dismissive, it's more to do with my available time!
 
> * caching -- given that most content is going to be static, caching
> should be quite useful;  however it doesn't seem to have been present
> as a concern neither in the spec, FAQ or the mailing list archive;
> I'm not advocating for the whole HTTP caching headers, but perhaps for
> a simple SHA of the body so that clients can just skip downloading it
> (although this would imply a more elaborate protocol, having a
> "headers" and separate "body" phase);

Not just a more elaborate protocol (although that does count, by itself,
against caching, as implementation simplicity is a driving goal of the
protocol), but a more extensible protocol.  I've fought since day one
against anything that acts to divide the response header into parts,
equivalent to the multiple header lines of HTTP.  Extensibility, for all
its benefits, is the eventual death of simplicity.
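
For readers who have not seen the response format, here is a minimal Python sketch of what "one undivided header" means in practice.  It assumes the header is a single line of the form `<STATUS><whitespace><META>` terminated by CRLF; the function name `parse_header` and the sample headers are illustrative only, not taken from the spec.

```python
from typing import Tuple


def parse_header(line: bytes) -> Tuple[int, str]:
    """Split a one-line response header into (status, meta)."""
    if not line.endswith(b"\r\n"):
        raise ValueError("header must end with CRLF")
    text = line[:-2].decode("utf-8")
    # Everything after the first run of whitespace is META, taken whole.
    parts = text.split(None, 1)
    status = parts[0] if parts else ""
    meta = parts[1] if len(parts) > 1 else ""
    if len(status) != 2 or not (status.isascii() and status.isdigit()):
        raise ValueError("status must be two ASCII digits: %r" % status)
    return int(status), meta


print(parse_header(b"20 text/gemini\r\n"))
```

With this shape there is no slot for a body hash or length: adding one would mean either overloading META or introducing further header lines, which is the extensibility being argued against above.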

Caching is not a bad thing, but it pays off the most for large content.
Leaving caching out actively encourages content producers to make their
content as small as possible.  I like that.
 
> * compression -- needless to say that `text/*` MIME types compress
> very well, thus saving both bandwidth and caching storage;  (granted
> one can use compression on the TLS side, although I think that one was
> dropped due to security issues?);

As above, compression is not a bad thing, but for small content the
benefit is not proportionate to the implementation effort.  Gopherspace
is an existence proof that worthwhile textual content can be served
uncompressed and still be orders of magnitude smaller than the average
website which *does* use compression.

You're right about TLS compression having security problems.

> * `Content-Length` -- I've seen this mentioned in the FAQ or the
> mailing lists;  I think the days of "unreliable" protocols has passed;
>  (i.e. we should better make sure that the intended document was
> properly delivered, in its entirety and unaltered;)

This is definitely the biggest existing pain point in Gemini so far, I
think.  I might write about this in another email.  I still think for
various reasons we can live without this, but I won't swear that if the
right solution is proposed I won't consider it.

Someone did mention earlier on the list that TLS has a way to explicitly
signal a clean shut down of a connection, which would provide "in its
entirety".
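
As a rough illustration of that idea, the Python sketch below reads a body until the stream ends and reports whether the end was clean.  It assumes the caller passes the `recv` method of an `ssl.SSLSocket` created with `suppress_ragged_eofs=False`, in which case Python's `ssl` module should return an empty read only after the peer's `close_notify` alert and raise `ssl.SSLEOFError` when the TCP connection simply drops.  The name `read_body` is hypothetical.

```python
import ssl
from typing import Callable, Tuple


def read_body(recv: Callable[[int], bytes]) -> Tuple[bytes, bool]:
    """Read until end of stream; return (body, complete).

    `complete` is True only when the stream ended with an empty read,
    which (under the assumption above) means close_notify was received.
    """
    chunks = []
    while True:
        try:
            chunk = recv(4096)
        except ssl.SSLEOFError:
            # Connection closed without close_notify: possibly truncated.
            return b"".join(chunks), False
        if not chunk:
            return b"".join(chunks), True
        chunks.append(chunk)
```

This only covers "in its entirety"; it says nothing about how many bytes to expect in advance, so a client still cannot show a progress bar or pre-allocate a buffer.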

> * status codes -- although both Gemini and HTTP use numeric status
> codes, I do believe that these are an artifact of ancient times, and
> we could just replace them with proper symbols (perhaps hierarchical
> in nature like `redirect:temporary` or `failure:temporary:slow-down`;

This seems to me like extra bytes with very little benefit?  The status
codes are supposed to be machine-readable, so what's wrong with numbers?
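
To show how little code a client needs for numeric codes, here is a small Python sketch that dispatches on the first digit of a two-digit status.  The category names are as recalled from the Gemini spec and should be checked against it; the `category` helper is hypothetical.

```python
# First digit of the status -> broad category (names assumed from the spec).
CATEGORIES = {
    1: "input",
    2: "success",
    3: "redirect",
    4: "temporary failure",
    5: "permanent failure",
    6: "client certificate required",
}


def category(status: int) -> str:
    """Map a two-digit status code to its category by its first digit."""
    return CATEGORIES.get(status // 10, "unknown")


print(category(31))
```

A simple client can act on the first digit alone and ignore the second, which gives much the same hierarchy as `redirect:temporary` in two bytes.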

> * keep-alive -- although in Gopher and Gemini the served documents
> seem to be self-contained, and usually connections will be idle while
> the user is pondering what to read, in case of crawlers having to
> re-establish each time a new connection (especially a TLS one) would
> eat a lot of resources and incur significant delays;  (not to mention
> that repeated TCP connection establishment to the same port or target
> IP might be misinterpreted as an attack by various security appliances
> or cloud providers;)

The overhead of setting up a new TLS connection each time is a shame.
TLS 1.3 introduces new functionality to reuse previously negotiated
content.  This is currently not widely supported by libraries, but I
hope that it will become easier to use in the future and ease some of
the pain on this point.
 
> Now on the transport side, somewhat related to the previous point, I
> think TLS transient certificates are an overkill...  If one wants to
> implement "sessions", one could introduce
> "client-side-generated-cookies" which are functionally equivalent to
> these transient certificates.  Instead of creating a transient
> certificate, the client generates a unique token and sends that to the
> server instead.  The server has no more control over the value of that
> cookie as it does for the transient certificate.
>
> Moreover the way sessions are signaled between the server and client,
> piggy-backed ontop of status codes, seems rather an afterthought than
> part of an orthogonal design.  Perhaps these sessions should "moved"
> to a higher level (i.e. after transport and before the actual
> transaction, just like in the case of OSI stack).
 
This is all true, but once client certificate support was already in the
protocol for reasons unrelated to sessions, since it was *possible* to
implement sessions using client certificates instead of adding some new
part to the protocol, I chose to do it.  This is part of the "maximise
power to weight" principle that has guided Gemini's design.  Once you
are paying the weight penalty for some part of the protocol, you should
extract as much power from it as you can by using it to solve any problem
you can.  This will lead to somewhat clunky solutions to problems
cobbled together from two or three existing parts, even when there is an
obvious neater solution that could be achieved with one non-existing
part, but I'm okay with that.
 
> Also these transient certificates are sold as "privacy enablers" or
> "tracking preventing" which is far from the truth.  The server (based
> on IP, ASN or other information) can easily map various transient
> certificates as "possibly" belonging to the same person.  Thus just by
> allowing these one opens up the possibility of tracking (even if only
> for a given session).  Moreover, securely generating these transient
> certificates does require some CPU power.

But servers can do that with raw requests anyway, right?

The CPU power point is well taken, believe me.  I have considered having
the spec (or maybe this belongs in our Best Practices document)
encourage implementers to support and to prefer the computationally
lighter ciphers in TLS (e.g. the ChaCha stream cipher).
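
As a sketch of what "prefer the lighter ciphers" could look like on the client side, the Python snippet below builds an `ssl.SSLContext` whose TLS 1.2 cipher list puts ChaCha20-Poly1305 suites first.  It assumes an OpenSSL build that includes ChaCha20; the function name and the exact cipher string are illustrative choices, not a recommendation from the spec or the Best Practices document.

```python
import ssl


def chacha_preferring_context() -> ssl.SSLContext:
    """Client context that lists ChaCha20 suites ahead of AES-GCM."""
    ctx = ssl.create_default_context()
    # Order in the cipher string is preference order for TLS 1.2 suites.
    # set_ciphers() does not change the TLS 1.3 suites.
    ctx.set_ciphers("ECDHE+CHACHA20:ECDHE+AESGCM")
    return ctx


names = [c["name"] for c in chacha_preferring_context().get_ciphers()]
print(names)
```

Note that a client's ordering is only a hint: the server may apply its own preference when choosing the suite.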
 
> On a second thought, why TLS?  Why not something based on NaCL /
> `libsodium` constructs, or even the "Noise Protocol"
> (http://www.noiseprotocol.org/)?

Mostly because TLS library support is much more widespread than
anything else.

> For example I've tried to build the
> Asuka Rust-based client and it pulled ~104 dependencies and took a few
> minutes to compile, this doesn't seem too lightweight...

A slight off-topic rant:  That's not Asuka's fault, it's not TLS's fault
and it's not Gemini's fault, that's Rust's fault.  Every single Rust
program I have ever tried to build has had over 100 dependencies.  Every
single one has had at least one dependency with a minimum required
version (of either the library, or Rust itself) which was released only
yesterday.  The Rust toolchain and community seem to support and even
actively encourage this unsustainable approach to development.  It
strikes me (as an outsider!) as a total mess.
 
> Why not just re-use PGP to sign / encrypt requests and replies?  With
> regard to PGP, given that Gopher communities tend to be quite small,
> and composed of mostly "techie" people, this goes hand-in-hand with
> the "web-of-trust"

I would prefer not to do anything like explicitly designing Gemini to
cater to a small and tight-knit group of techies.  I know it's that now,
and maybe that's all it will ever be, but I would like to give it a
decent chance of being more.

There is an `application/pgp-encrypted` MIME type that Gemini can serve
content with, and people can write clients that handle this, so
Gemininaut cypherpunks can do this if they want to!

> Now getting back to the `gemini://` protocol, another odd thing I
> found is the "query" feature.  Gemini explicitly supports only `GET`
> requests, and the `text/gemini` format doesn't support forms, yet it
> still tries to implement a "single input-box form"...  Granted it's a
> nice hack, but it's not "elegant"...  (Again, like in the case of
> sessions, it seems more as an afterthought, even though this is the
> way Gopher does it...)

> Perhaps a simple "form" solution would be better?  Perhaps completely
> eliminating for the time these "queries"?  Or perhaps introducing a
> new form of URL's like for example:
> `gemini-query:?url=gemini://server/path&prompt=Please+enter+something`
> which can be served either in-line (as was possible in Gopher) and /
> or served as a redirect (thus eliminating another status code family).

I did, back during the long, drawn-out contemplation of whether to use
one, two or three digit status codes, consider having the META content
for query status be a string in some kind of small DSL for defining a
form, but decided against it.  You can simulate the effect using a
sequence of "single input forms" tied together with a client certificate
session.  This is, IMHO, "elegant" in its own way - a FORTHy kind of
elegance where you build complicated things up by combining a small set
of sharp primitives in creative ways.
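
The chaining described above can be sketched server-side in a few lines of Python.  This is only an illustration under stated assumptions: the server identifies a session by some fingerprint of the client certificate, status 10 asks for one line of input with the prompt in META, and status 20 signals success.  The field prompts, the `handle` function and the in-memory `sessions` store are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import unquote

FIELDS = ["Name?", "Email?", "Message?"]   # hypothetical prompts
sessions: Dict[str, List[str]] = {}        # cert fingerprint -> answers so far
completed: List[List[str]] = []            # finished forms


def handle(fingerprint: str, query: Optional[str]) -> Tuple[int, str]:
    """Return (status, meta) for one request in a multi-step form."""
    answers = sessions.setdefault(fingerprint, [])
    if query is not None and len(answers) < len(FIELDS):
        # The client repeats the request with its input as the query string.
        answers.append(unquote(query))
    if len(answers) < len(FIELDS):
        # Status 10: ask for the next single line of input.
        return 10, FIELDS[len(answers)]
    completed.append(sessions.pop(fingerprint))
    return 20, "text/gemini"
```

Each request answers exactly one prompt, and the certificate is what ties the answers together; no new status codes or form syntax are involved.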

> Regarding the `text/gemini` format -- and taking into account various
> emails in the archive about reflowing, etc -- makes me wonder if it is
> actually needed.  Why can't CommonMark be adopted as the HTML
> equivalent, and a more up-to-date Gopher map variant as an alternative
> for menus?  There are already countless safe CommonMark parsers
> out-there (for example in Rust there is one implemented by Google) and
> the format is well understood and accepted by a large community
> (especially the static side generators community).

Sorry, I'm still too busy recovering from the trauma of our text/gemini
discussion around Markdown to respond to this now. :)
 
> All in all I find the `gemini://` project quite interesting, and I'll
> keep an close eye on it.

Please do!  And please continue to share your thoughts with us here.  I
hope it doesn't seem too much like I've not taken some of your points
seriously enough and have just stubbornly stuck to previous decisions.
I really do see challenging questions regarding our design decisions as
valuable things, and tried to consider your questions seriously - and
I'll continue to do so in coming days.

Cheers,
Solderpunk