💾 Archived View for rawtext.club › ~sloum › geminilist › 000449.gmi captured on 2020-09-24 at 02:33:45. Gemini links have been rewritten to link to archived content


-=-=-=-=-=-=-


An outsider's view of the `gemini://` protocol

Ciprian Dorin Craciun ciprian.craciun at gmail.com

Fri Feb 28 10:04:19 GMT 2020

- - - - - - - - - - - - - - - - - - -

On Fri, Feb 28, 2020 at 11:07 AM Sean Conner <sean at conman.org> wrote:
>   Why is a numeric status code so bad?  Yes, the rest of the protocol is
> English centric (MIME types; left-to-right, UTF-8).  It just seems that
> using words (regardless of language) is just complexity for its own sake.


Why did people use `/etc/hosts` files before DNS was invented?  Why do we have `/etc/services`?  Why do we have `O_READ`?  Why do we have `chmod +x`?

Because numbers are hard to remember, and say nothing to a person that doesn't know the spec by heart.  (For example, although I do a lot of HTTP related work with regard to routing and such, I never remember which of the 4-5 HTTP redirect codes says "temporary redirect but keep the same method" as opposed to "temporary redirect but switch to `GET`".)




> > As minor issues:
> >
> > * why `CRLF`?  it's easier (both in terms of availability of functions
> > and efficiency) to split lines by a single character `\n` than by a
> > string;
>
>   That was discussed earlier on the list:
>
>         https://lists.orbitalfox.eu/archives/gemini/2019/000116.html


OK, reading that email the answer seems to be "because other protocols have it"...  And even you admit that in your own code you also handle just `LF`.

So then why bother?  Why not simplify the protocol?
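In practice, a lenient parser can accept both forms at no extra cost: split on `\n` and strip an optional trailing `\r`.  The sketch below (mine, not from any Gemini implementation) shows that strict-CRLF and bare-LF bodies then parse identically:

```python
def split_lines(payload: bytes) -> list[str]:
    """Split a response body on LF, tolerating an optional trailing CR.

    Accepts both strict CRLF (as the spec mandates) and bare LF
    (as some implementations emit anyway).
    """
    text = payload.decode("utf-8")
    return [line.rstrip("\r") for line in text.split("\n")]

# Both forms yield the same lines:
strict = b"# Hello\r\nSecond line\r\n"
lenient = b"# Hello\nSecond line\n"
assert split_lines(strict) == split_lines(lenient)
```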




> > On a second thought, why TLS?  Why not something based on NaCL /
> > `libsodium` constructs, or even the "Noise Protocol"
> > (http://www.noiseprotocol.org/)?
>
>   1) Never, *NEVER* implement crypto yourself.
>
> > I was never proposing to implement crypto ourselves.  `libsodium` /
> > NaCL provides very useful high-level constructs, tailored for specific
> > use-cases (like for example message encryption and signing), that are
> > proven to be safe, and exports them with a very simple API that can be
> > easily understood and used.
>
>   TLS was chosen because the COMMUNICATIONS LINK is encrypted, not just the
> payload.  All Eve (the eavesdropper) can see is what IP address you are
> connecting to, not what content you are reading, nor (depending upon the TLS
> version) what virtual server you might be talking to.


Although I do agree that encryption at the "transport" level to hide the entire traffic is a good idea, if you take into account that `gemini://` requires one request and one reply per TCP connection (thus TLS connection), there is no actual "communications link" here.

Basically you are using TLS to encrypt only one payload.  Moreover, because there is exactly one request / one reply, one can just look at the traffic pattern and deduce what the user is doing, by analyzing the length of the stream (in both directions) and the time the server takes to respond (which says static or dynamically generated).  (Granted, TLS records are padded;  however even so, having the size as a multiple of some fixed value still gives an insight into what was requested.)

For example, say someone lives in a country where certain books (perhaps about cryptography) are forbidden;  now imagine there is a library out there that serves these books through `gemini://`;  now imagine the country wants to see what books are read by its own citizens;  all it has to do is record each session and deduce a response size range, then crawl that library and see which books fit into that range.
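The attack above can be sketched in a few lines.  This is a toy model (the 16-byte padding block size is my assumption for illustration): the observer records a padded response length and matches it against sizes obtained by crawling the public library.

```python
BLOCK = 16  # assumed padding granularity of the encrypted records

def padded_len(n: int) -> int:
    """Length after padding up to the next multiple of BLOCK."""
    return ((n + BLOCK - 1) // BLOCK) * BLOCK

def candidate_books(observed: int, library: dict[str, int]) -> list[str]:
    """Return every book whose padded size matches the observed stream."""
    return [title for title, size in library.items()
            if padded_len(size) == observed]

# Sizes gathered by crawling the library (hypothetical values):
library = {"crypto-101.gmi": 48212, "poetry.gmi": 9731, "ciphers.gmi": 48210}

# Two books collide after padding, but the observer has still narrowed
# the user's reading down to a tiny candidate set.
print(candidate_books(padded_len(48212), library))
```

Padding only coarsens the measurement; it does not remove the correlation.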

Therefore I would say (I'm no cryptographer) TLS doesn't help at all here;  neither does PGP, nor `libsodium` / NaCL...




Another related topic regarding TLS that just struck me:  given that `gemini://` supports out-of-the-box virtual hosts, do you couple that with TLS SNI?

If not, then TLS is basically just "obfuscation" rather than actual end-to-end encryption.  Why I say that:  because the spec says one should use SSH-style "do you trust this server" questions and keep that certificate in mind.  But how about when the certificate expires, or is revoked?  (SSH server public keys never expire...)  How does the user know that the certificate was rightfully replaced, or whether he is a victim of a MITM attack?




> > Why not just re-use PGP to sign / encrypt requests and replies?  With
> > regard to PGP,
>
>   There are issues with using PGP:
>
>         https://latacora.micro.blog/2019/07/16/the-pgp-problem.html
>
> > There are issues with any technology, TLS included.
> >
> > However I would say it's easier to integrate GnuPG (even through
> > subprocesses) in order to encrypt / decrypt payloads (especially given
> > how low in count they are for Gemini's ecosystem) than implementing
> > TLS.  Moreover it offers out-of-the-box the whole client side
> > certificate management, which adding to a TLS-based client would be
> > much more involved, more on this below...
>
>   As I have mentioned, that only protects the payload, not the
> communications channel.


But as said, you don't have an actual communications channel, because you use TLS for a single request / reply payload pair...  :)




>   The hardest problem with crypto is key management.  If anything, key
> management with PGP seems more problematic than with OpenSSL and the CA
> infrastructure (as bad as the CA infrastructure is).
>
> > One of the `gemini://` specifications explicitly states that the
> > server certificate authentication model is similar to SSH's first use
> > accept and cache afterward.  However say you'll go with the actual CA
> > model, now you need to juggle Let's Encrypt (each 3 months) (or add
> > support for ACME in your server), then juggle PEM files, etc.
> > Regardless, either way one will have to implement all this certificate
> > management from scratch.
>
>   Or self-signed certificates.
>
>   Okay, we use NaCL.  Now what?  What's needed to secure the communication
> channel?  A key exchange.  Again, rule 1---never implement crypto.


Given that one has the public key of the server (more on that later), one could use the following on client / server sides:

    https://libsodium.gitbook.io/doc/public-key_cryptography/sealed_boxes

The function creates a new key pair for each message, and attaches the public key to the ciphertext.  The secret key is overwritten and is not accessible after this function returns.

The crypto_box_seal_open() function decrypts the ciphertext c whose length is clen, using the key pair (pk, sk), and puts the decrypted message into m (clen - crypto_box_SEALBYTES bytes).

How does one get the public key of the server?  One could change the protocol so that the server speaks first and sends its own public key.
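The wire-level change could be as small as one extra frame.  Below is a rough sketch of the "server speaks first" idea: the server's first frame carries its long-term public key, after which the client can seal its request to that key (e.g. with libsodium's crypto_box_seal).  The frame format (4-byte big-endian length prefix) and the handshake ordering are my illustrative assumptions, not part of any Gemini spec.

```python
import struct

def write_frame(payload: bytes) -> bytes:
    """Prefix a payload with its 4-byte big-endian length."""
    return struct.pack(">I", len(payload)) + payload

def read_frame(buf: bytes) -> tuple[bytes, bytes]:
    """Return (payload, rest) parsed from the front of buf."""
    (n,) = struct.unpack(">I", buf[:4])
    return buf[4:4 + n], buf[4 + n:]

# Handshake order: 1. server -> client: its public key
#                  2. client -> server: request sealed to that key
server_pk = b"\x01" * 32                 # stand-in for a real X25519 key
wire = write_frame(server_pk)
pk, rest = read_frame(wire)
assert pk == server_pk and rest == b""
```

The sealing itself would be done by the library, never by hand, which keeps rule 1 intact.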

My take on this:  given a set of clear requirements for the `gemini://` protocol (which I've seen there are), one can come up with better solutions than TLS, ones that better fit the use-case.

(Again, just to be clear, I'm not saying "let's invent our own crypto", but instead "let's look at other, tested alternatives".  As a side-note, NaCL, on which `libsodium` is based, was created by Daniel J. Bernstein...)

> > Regarding an up-to-date Gopher map alternative, I think this is an
> > important piece of the Gopher ecosystem that is missing from today's
> > world: a machine-parsable standard format of indexing documents.  I
> > very fondly remember "directory" sites of yesteryear (like DMOZ or the
> > countless other clones) that strove to categorize the internet not by
> > "machine learning" but by human curation.
>
>   Could you provide an example of what you mean by this?  I'm not sure why a
> map alternative is needed.
>
> > One problem with today's web is that the actual "web structure" is
> > embedded in unstructured documents as links.  What I liked about
> > Gopher maps is that it gave a machine-readable, but still
> > user-friendly, way to map and categorize the "web contents".
>
>   One problem with that---incentives.  What's my incentive to make all this
> information more easily machine readable?  On the web, you do that, and what
> happens?  Google comes along, munches on all that sweet machine readable
> data and serves it up directly to users, meaning the user just has to go to
> Google for the information, not your server.  Given those incentives, I have
> no reason to make my data easily machine readable when it means less
> traffic.

The incentive is a clear one:  for the end-user.  Given that we can standardize on such an "index", then we can create better "user-agents" that are more useful to our actual users.  (And I'm not even touching on the persons that have various disabilities that hamper their interaction with computers.)

For example, say I'm exposing API documentation via `gemini://`.  How do I handle the "all functions index page"?  Do I create a large `text/gemini` file, or a large HTML file?  How does the user interact with that?  With search?  Wouldn't he be better served by a searchable interface which filters the options as he types, like `dmenu` / `rofi` / `fzf` (or the countless other clones) do?  (Currently each programming language from Rust to Scheme tries to do something similar with JavaScript, and the result is horrible...)

Or, to take another approach, why do people use Google to search for things?  Because our web pages are so poor when it comes to structuring information that, more often than not, when I want to find something on a site I just Google: `site:example.com the topic i'm interested in`.

>   I recall the large push for RDF (Resource Description Framework) back
> around 2004 or so ... embed machine parsable relations and metadata and it
> would be oh so wonderful.  Some people even bothered to do all that work.
> And for what?  It was a pain to maintain, the tooling was poor, and Google
> would just suck it up and serve it to users directly, no reason for anyone
> to actually visit your site.

I'm not advocating for RDF (it was quite convoluted), or the semantic web, or GraphQL, etc.  I'm just advocating something better than the Gopher map.

>   As a user, that's great!  As a web site operator, not so much.

OK...  Now here is something I don't understand:  aren't you building Gemini sites for "users"?  Or are you building them for "operators"?

Because if the operator is what you optimize for, then why not just SSH into the operator's server, where he provides you with his "favourite" BBS clone.

> > * and perhaps add support for content-based addressing (as opposed to
> > server-based addressing) (i.e. persistent URL's);
>
>   There already exist such protocols---I'm not sure what a new one based
> around Gemini would buy.
>
> > I agree that `gemini://` is first and foremost a "transfer" protocol.
> > However one can include a document's identity as a first class citizen
> > of the protocol.
> >
> > For example say each document is identified by its SHA; then when
> > replying with a document also send that SHA in form of a permanent URL
> > like say `gemini-object:?sha={SHA}&location=gemini://first-server/...&location=gemini://second-server/...`;
> > then a client (that perhaps has bookmarked that particular version of
> > that document) could send that URL to a server (of his choosing via
> > configuration, to the first one specified in `location`, etc.) and if
> > that server has that document just reply with that, else use
> > `location`, else return 404.
>
>   Hey, go ahead and implement that.  I'd like to see that ...

There are already FreeNet and IPFS that implement content-based addressing.  I just wanted something in between that is still "location" driven, but is "content identity" aware.
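To make the earlier `gemini-object:` sketch concrete, here is one hypothetical way a server could build such a permanent URL: the document's SHA-256 plus one or more fallback locations.  The scheme name and parameter names come from the proposal in this thread and are not part of any deployed protocol.

```python
import hashlib
from urllib.parse import urlencode

def permanent_url(document: bytes, locations: list[str]) -> str:
    """Build a content-identity URL: the document's SHA-256 digest
    plus an ordered list of fallback `location` servers."""
    sha = hashlib.sha256(document).hexdigest()
    params = [("sha", sha)] + [("location", loc) for loc in locations]
    return "gemini-object:?" + urlencode(params)

url = permanent_url(b"# Hello Gemini\n",
                    ["gemini://first-server/doc.gmi",
                     "gemini://second-server/doc.gmi"])
print(url)
```

A client holding such a URL could verify any copy it retrieves against the embedded digest, regardless of which `location` actually served it.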

Ciprian.