Understanding the Gemini request and response (COSCUP 2024 draft)

Author's note: This post is a draft for an upcoming talk I'll be giving at COSCUP 2024 in the "Let's read the code" track, where I'll be using Gemini as a simple case to introduce the RFC format and how to read it.
This post only focuses on the actual protocol itself: sending a request and receiving a response. The goal is to give people a taste of reading RFCs and understanding them.
I should have written it much earlier; the talk is about a week away.
Therefore, this post is very rushed.

Introduction

The Gemini protocol is a simple document transport protocol published by Solderpunk in 2021 that has gained some traction in the tech community. Its users, among others, form what's referred to as the "small web" or the "small internet". The protocol itself is described across three documents:

Formal specification of the Gemini protocol

Formal specification of the Gemini "Gemtext" document format

A high-level overview of the Gemini protocol (formerly the Gemini speculative specification).

In this post, we will be reading the high level overview first, then dive into the formal specification of the protocol. You can read the Gemtext specification on your own time.

The RFC language

Most internet protocols are written in the RFC format. RFC stands for Request For Comments. RFCs are how engineers who know their trade communicate with each other, and they are managed by the Internet Engineering Task Force (IETF).

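In any case, the important thing to know is that you'll see a lot of MAY, MUST, SHOULD, etc. These are keywords defined in RFC 2119, which is really easy to read. Here's the TL;DR nevertheless:

MUST (or REQUIRED, SHALL): an absolute requirement.

MUST NOT (or SHALL NOT): an absolute prohibition.

SHOULD (or RECOMMENDED): do it unless you have a valid reason not to, and you understand the implications of skipping it.

SHOULD NOT (or NOT RECOMMENDED): don't do it unless you have a valid reason to, and you understand the implications.

MAY (or OPTIONAL): truly optional; an implementation can go either way.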

For example, if a protocol says "The sender MUST NOT send more than 1000 bytes of data", a sender that conforms to the protocol will never send more than 1000 bytes of data. But as a protocol implementer, you still have to be prepared for a non-conforming sender that does send more than 1000 bytes (in most cases, by sending some kind of error message back and killing the connection).
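Here is a minimal Python sketch of the receiving side of that hypothetical rule. The 1000-byte limit and the name `read_limited` are made up for illustration, and it assumes the message ends when the stream ends. The trick is to ask for one byte more than the limit, so an oversized message is detected and rejected rather than silently truncated.

```python
import io
from typing import BinaryIO

MAX_BYTES = 1000  # the hypothetical limit from the example above


def read_limited(stream: BinaryIO, limit: int = MAX_BYTES) -> bytes:
    """Read one message from a sender that may not follow the rules.

    A conforming sender never sends more than `limit` bytes, but we cannot rely
    on that, so we ask for one extra byte: if it arrives, the sender broke the
    rule and we refuse the message instead of silently truncating it.
    """
    data = stream.read(limit + 1)
    if len(data) > limit:
        raise ValueError(f"sender exceeded the {limit}-byte limit")
    return data


# In-memory streams stand in for a network connection here
print(read_limited(io.BytesIO(b"hello")))
try:
    read_limited(io.BytesIO(b"x" * 1001))
except ValueError as err:
    print("rejected:", err)
```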

The Gemini Tech Overview/Speculative Specification

We'll start by going through the speculative specification, as it is easier to understand (given some prior knowledge of how similar protocols, like HTTP, work).

Overview and transactions

Let's start from the beginning and see what the protocol is about.

Gemini is a client-server protocol featuring request-response transactions, broadly similar to gopher or HTTP. Connections are closed at the end of a single transaction and cannot be reused. When Gemini is served over TCP/IP, servers should listen on port 1965.

Here we learn that Gemini is a client-server protocol, and that it's similar to HTTP (and gopher). But what is a transaction? We can read one line down and find out. For Gemini, a transaction is a single request-response cycle: the client sends a request, the server sends a response, and the connection is closed. The details are in later sections.

## 1.1 Gemini transactions
There is one kind of Gemini transaction, roughly equivalent to a gopher request or a HTTP "GET" request. Transactions happen as follows:
C: Opens connection
S: Accepts connection
C/S: Complete TLS handshake (see section 4)
C: Validates server certificate (see 4.2)
C: Sends request (one CRLF terminated line) (see section 2)
S: Sends response header (one CRLF terminated line), closes connection under non-success conditions (see 3.1 and 3.2)
S: Sends response body (text or binary data) (see 3.3)
S: Closes connection (including TLS close_notify, see section 4)
C: Handles response (see 3.4)

It describes how a client talks to a Gemini server. With some guesswork, we can assume C means the client and S means the server. The client opens a connection to the server, both sides negotiate a TLS connection, the client validates the server's certificate, the client sends a request, the server sends a response header and then the response body, and finally the server closes the connection and the client handles the response.
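The last step, "Handles response", starts by separating the header line from the body. A minimal sketch, assuming the whole response has already been read into memory (easy enough, since the server closes the connection when it's done); `split_response` is our own name, not something from the spec. Everything before the first CRLF is the header, everything after it is the body, which may be empty.

```python
def split_response(raw: bytes) -> tuple[bytes, bytes]:
    """Split a complete Gemini response into (header line, body).

    The header is the first CRLF-terminated line; the body is every byte
    after it and may be empty (non-success responses carry no body).
    """
    header, sep, body = raw.partition(b"\r\n")
    if not sep:
        raise ValueError("the response header is not CRLF-terminated")
    return header, body


print(split_response(b"20 text/gemini\r\n# Hello\n"))
```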

Requests and responses

We won't bother ourselves with TLS until later, partially because it is in section 4 and partially because it's good old boring TLS that we all know and love/hate. Let's see what requests and responses look like.

Gemini requests are a single CRLF-terminated line with the following structure:
<URL><CR><LF>
<URL> is a UTF-8 encoded absolute URL, including a scheme, of maximum length 1024 bytes. The request MUST NOT begin with a U+FEFF byte order mark.

That looks simple. The request is literally just "gemini://your.domain.tld/path\r\n". And here is our first instance of those RFC keywords. MUST NOT here demands that the client never sends a byte order mark at the beginning of the request; if it does, the client is non-conforming and servers will most likely reject the request. Likewise, it implies that servers should look for a byte order mark at the beginning of the request and reject the request if one is found. The text continues:
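In Python, building a request could look like the sketch below. The name `build_request` is hypothetical, and the absolute-URL check is deliberately crude (it only looks for "://", which every gemini:// URL has). Note that the 1024 limit is measured in bytes of UTF-8, not in characters, so the length is checked after encoding.

```python
MAX_URL_BYTES = 1024  # maximum URL length from the spec


def build_request(url: str) -> bytes:
    """Encode a Gemini request: the UTF-8 URL followed by CR LF."""
    if url.startswith("\ufeff"):
        raise ValueError("a request MUST NOT begin with a U+FEFF byte order mark")
    if "://" not in url:  # rough check: gemini URLs always carry a scheme and authority
        raise ValueError("the URL must be absolute, including a scheme")
    encoded = url.encode("utf-8")
    if len(encoded) > MAX_URL_BYTES:  # the limit is in bytes, not characters
        raise ValueError("the URL is longer than 1024 bytes")
    return encoded + b"\r\n"


print(build_request("gemini://example.com/"))
```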

Resources hosted via Gemini are identified using URIs with the scheme "gemini". This scheme is syntactically compatible with the generic URI syntax defined in RFC 3986, but does not support all components of the generic syntax. In particular, the authority component is allowed and required, but its userinfo subcomponent is NOT allowed. The host subcomponent is required. The port subcomponent is optional, with a default value of 1965. The path, query and fragment components are allowed and have no special meanings beyond those defined by the generic syntax. An empty path is equivalent to a path consisting only of "/". Spaces in paths should be encoded as %20, not as +.
Clients SHOULD normalise URIs (as per section 6.2.3 of RFC 3986) before sending requests (see section 2) and servers SHOULD normalise received URIs before processing a request.

Let's dissect this paragraph. First, it references RFC 3986, which defines the URI syntax, and says Gemini uses the "gemini" scheme. Furthermore, the authority component is allowed and required, but the userinfo subcomponent is not allowed. What the heck are those? Never heard of them in years of using a computer, right?

You can read the URI specification on your own time, but in short, URIs are formed from several parts. First comes the scheme, more commonly called the protocol: it's the https in https://example.com. The authority is the domain name (and port), while userinfo refers to the username and password. Yes, technically you can put a username and password in a URI, but hardly anyone ever does that. The path is the path we all know.
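Python's `urllib.parse.urlsplit` can pull these components apart for us. The sketch below is an illustration under our own assumptions, not a full implementation of RFC 3986 normalization: `normalize_gemini_uri` is a hypothetical helper that enforces the rules quoted above (scheme "gemini", host required, no userinfo) and performs only the easy normalizations (lower-case scheme and host, drop the default port 1965, empty path becomes "/"). Percent-encoding and dot-segment normalization are left out, and IPv6 literals are not handled.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORT = 1965


def normalize_gemini_uri(uri: str) -> str:
    """Check the gemini URI rules quoted above and return a normalized form."""
    parts = urlsplit(uri)  # urlsplit already lower-cases the scheme
    if parts.scheme != "gemini":
        raise ValueError("the scheme must be 'gemini'")
    if "@" in parts.netloc:  # userinfo is written as user:password@host
        raise ValueError("the userinfo subcomponent is not allowed")
    if not parts.hostname:  # .hostname is lower-cased; IPv6 literals are not handled here
        raise ValueError("the host subcomponent is required")
    authority = parts.hostname
    if parts.port is not None and parts.port != DEFAULT_PORT:
        authority += f":{parts.port}"  # keep the port only when it isn't the default
    path = parts.path or "/"  # an empty path is equivalent to "/"
    return urlunsplit(("gemini", authority, path, parts.query, parts.fragment))


print(normalize_gemini_uri("GEMINI://Example.COM:1965"))
```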

Now let's see what a response looks like. The spec first says:

Gemini response consist of a single CRLF-terminated header line, optionally followed by a response body.

This is followed by the definition of a response header.

Gemini response headers look like this:
<STATUS><SPACE><META><CR><LF>
<STATUS> is a two-digit numeric status code, as described below in 3.2 and in Appendix 1.
<SPACE> is a single space character, i.e. the byte 0x20.
<META> is a UTF-8 encoded string of maximum length 1024 bytes, whose meaning is <STATUS> dependent.

So the response header is formed from several parts: the status code, followed by a space, followed by a meta string. The status code is a two-digit number, and the meta string is a UTF-8 encoded string of maximum length 1024 bytes whose meaning depends on the status code. The status codes are listed in the appendix. From this, we can infer that the following is probably a valid response header:

10 Hello, world!\r\n
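To check that inference, here is a minimal header parser written straight from the three definitions above. `parse_header` is a hypothetical name; it expects the full line including CR LF, checks for exactly two digits and a single 0x20 space, and enforces the 1024-byte limit before decoding <META> as UTF-8.

```python
def parse_header(line: bytes) -> tuple[int, str]:
    """Parse '<STATUS><SPACE><META><CR><LF>' into (status, meta)."""
    if not line.endswith(b"\r\n"):
        raise ValueError("the header must end with CR LF")
    line = line[:-2]
    status, space, meta = line[:2], line[2:3], line[3:]
    if len(status) != 2 or not status.isdigit():
        raise ValueError("<STATUS> must be two digits")
    if space != b" ":
        raise ValueError("<STATUS> must be followed by a single space (0x20)")
    if len(meta) > 1024:  # the limit is in bytes, so check before decoding
        raise ValueError("<META> is longer than 1024 bytes")
    return int(status), meta.decode("utf-8")


print(parse_header(b"10 Hello, world!\r\n"))
```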

The spec then gives us a list of status codes categorized by their meanings.
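Since the first digit carries the category, a client can branch on that digit alone. A minimal sketch, assuming the six categories of the current spec (1x INPUT, 2x SUCCESS, 3x REDIRECT, 4x TEMPORARY FAILURE, 5x PERMANENT FAILURE, 6x CLIENT CERTIFICATE REQUIRED); the table and function names are ours.

```python
STATUS_CATEGORIES = {
    1: "INPUT",
    2: "SUCCESS",
    3: "REDIRECT",
    4: "TEMPORARY FAILURE",
    5: "PERMANENT FAILURE",
    6: "CLIENT CERTIFICATE REQUIRED",
}


def status_category(status: int) -> str:
    """Map a two-digit status code to its category using the first digit."""
    if not 10 <= status <= 99:
        raise ValueError("the status must be a two-digit number")
    category = STATUS_CATEGORIES.get(status // 10)
    if category is None:
        raise ValueError(f"unknown status category {status // 10}x")
    return category


print(status_category(51))
```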

Let's look at the SUCCESS status code.

The request was handled successfully and a response body will follow the response header. The <META> line is a MIME media type which applies to the response body.

Ah! So that's how the client knows what to do with the response body. The server tells the client what it is sending back via the MIME type, followed by the actual response. Good! We can update our guess of what a valid response looks like, body included:

20 text/plain\r\nHello, world!
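A <META> like `text/gemini; charset=utf-8` can also carry parameters, so a client has to split the MIME type from them before decoding the body. A minimal sketch under two assumptions taken from the overview's response-body rules, which aren't quoted here: an empty <META> means `text/gemini; charset=utf-8`, and `text/*` without an explicit charset is UTF-8. `decode_body` is a hypothetical helper, and its parameter parsing is naive (no quoted values).

```python
from typing import Union


def decode_body(meta: str, body: bytes) -> tuple[str, Union[str, bytes]]:
    """Return (MIME type, body), decoding text/* bodies into str."""
    # Assumed default: an empty <META> means text/gemini in UTF-8
    meta = meta.strip() or "text/gemini; charset=utf-8"
    mime_type, *params = [part.strip() for part in meta.split(";")]
    charset = None
    for param in params:  # naive parameter parsing: no quoted values
        key, _, value = param.partition("=")
        if key.strip().lower() == "charset":
            charset = value.strip()
    mime_type = mime_type.lower()
    if not mime_type.startswith("text/"):
        return mime_type, body  # binary data is handed over untouched
    # Assumed default: text/* without an explicit charset is UTF-8
    return mime_type, body.decode(charset or "utf-8")


print(decode_body("text/plain", b"Hello, world!"))
```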

TLS Requirements

The spec then goes into how TLS should be handled. Skimming through, we can collect the following quotes:

Servers MUST use TLS version 1.2 or higher and SHOULD use TLS version 1.3 or higher.
Clients can validate TLS connections however they like (including not at all) but the strongly RECOMMENDED approach is to implement a lightweight "TOFU" certificate-pinning system which treats self-signed certificates as first-class citizens.

It demands that servers use TLS 1.2 or higher, and 1.3 if possible. Clients can validate TLS connections however they like, but the recommended approach is to implement a lightweight "TOFU" certificate-pinning system. TOFU stands for Trust On First Use, which means the client trusts the server's certificate the first time it sees it and remembers it for future connections (you can read the details yourself; I won't paste them here).
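TOFU is simple enough to sketch in a few lines. This is a minimal in-memory version, assuming we pin a SHA-256 fingerprint of the server's DER-encoded certificate per host and port; `TofuStore` is a hypothetical name. A real client would persist the pins to disk and deal with certificate expiry and rotation, which this sketch ignores. With Python's `ssl` module, the DER bytes of the server certificate can be obtained with `getpeercert(binary_form=True)` on the connected socket.

```python
import hashlib


class TofuStore:
    """Trust On First Use: pin the first certificate seen for each host and port."""

    def __init__(self) -> None:
        self._pins: dict[tuple[str, int], str] = {}

    def check(self, host: str, port: int, der_cert: bytes) -> bool:
        """Return True if der_cert is trusted for (host, port), pinning it on first use."""
        fingerprint = hashlib.sha256(der_cert).hexdigest()
        key = (host.lower(), port)
        if key not in self._pins:
            self._pins[key] = fingerprint  # first use: trust it and remember it
            return True
        return self._pins[key] == fingerprint  # later uses: must match the pin


store = TofuStore()
print(store.check("example.com", 1965, b"first certificate"))
print(store.check("example.com", 1965, b"a different certificate"))
```

The second call returns False, which is the moment a client should warn the user instead of silently connecting.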

A minimal Gemini client

Armed with the above information, we can write a minimal Gemini client that does the following:

(printf 'gemini://gemini.clehaxze.tw/contact.gmi\r\n'; sleep 1) | socat - OPENSSL:gemini.clehaxze.tw:1965,verify=0

Running it gives the output below. As we can see, the server first replies with a header containing status code 20 and MIME type `text/gemini`, followed by the actual response.

20 text/gemini
# Contact - Martin's capsule

I'm actively looking for new stuff and totally open to contact attempts.

You can reach me using email at marty1885 \at protonmail.com. If you wish to, or prefer other means of contact:



=> /misc/marty1885-at-protonmail.gpg My GPG key, if you want to encrypt your message.
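The same request can be made with nothing but Python's standard library. A minimal sketch, assuming we mirror socat's `verify=0` and skip certificate validation entirely (a real client should do TOFU, as discussed above); `fetch` is a hypothetical helper. Passing `server_hostname` makes Python send SNI, so a server hosting several capsules on one address can pick the right certificate.

```python
import socket
import ssl
from urllib.parse import urlsplit


def fetch(url: str, timeout: float = 10.0) -> tuple[str, bytes]:
    """Send a single Gemini request and return (header line, body)."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 1965  # 1965 is the default Gemini port
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False  # like socat's verify=0: no certificate checks at all
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # server_hostname makes Python send SNI
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            tls.sendall(url.encode("utf-8") + b"\r\n")  # the request: URL + CRLF
            chunks = []
            while True:  # the server closes the connection after the body
                chunk = tls.recv(4096)
                if not chunk:
                    break
                chunks.append(chunk)
    header, _, body = b"".join(chunks).partition(b"\r\n")
    return header.decode("utf-8"), body
```

Calling `fetch("gemini://gemini.clehaxze.tw/contact.gmi")` should return the same `20 text/gemini` header as the socat one-liner, with the page as the body.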

The formal specification

Now that we understand how the Gemini protocol works at a high level, we can dive into the formal specification. It is a lot more detailed and rigid, but also more complete. It is meant to convey the _exact_ details of the protocol, and is aimed at implementers of the protocol.

(A)BNF

Backus-Naur Form (BNF) is a common way to describe the syntax of a language. Here we use the word language in the computer science sense: any set of strings of characters that follows a certain set of rules. You have probably seen it if you have ever read a document describing a programming language itself, or written a compiler yourself. Most RFCs use an extended form of BNF called Augmented BNF, or ABNF (defined in RFC 5234), and the Gemini specification also uses ABNF to describe the syntax of the protocol.

To put it simply, any BNF rule is defined as `rule = definition`, where `rule` is the name of the rule and `definition` describes the sequence of characters the rule matches. For example, the following defines a rule called `foo` that matches the character `a`.

foo     = "a"

Multiple rules can be combined together. For example, the following rule `ABC` matches the sequence of characters `a`, `b`, and `c`.

A       = "a"
B       = "b"
C       = "c"
ABC     = A B C

The `/` symbol denotes alternatives. The following rule `RULE` matches both "ABC" and "ABD".

RULE    = "AB" ("C" / "D")

The `-` symbol denotes a continuous range of values, where `%x` introduces a character code written in hexadecimal (so `%x30-39` is any character from 0x30 to 0x39 in ASCII). Here `DIGIT` matches any character between `0` and `9`, equivalent to `[0-9]` in regex, and `NUM` matches a sequence of digits. The `*` symbol denotes the minimum and maximum number of times a rule can repeat. The full form is `<min>*<max>rule`; when omitted, min is 0 and max is infinity. Therefore `1*DIGIT` matches the rule `DIGIT` at least once.

DIGIT   = %x30-39
NUM     = 1*DIGIT

Square brackets `[]` denote optional parts of a rule. The following rule FLOATING_POINT matches a sequence of digits, optionally followed by a period and another sequence of digits. The period and the digits after it are optional together, so both `42` and `3.14` match, but `3.` does not.

DIGIT   = %x30-39
FLOATING_POINT = 1*DIGIT ["." 1*DIGIT]

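And finally, there are common shorthands for non-printable characters. For example, CR and LF denote the carriage return and line feed characters respectively, and CRLF is CR followed by LF. These, along with DIGIT, SP, VCHAR and OCTET used below, are core rules defined in the ABNF specification (RFC 5234).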
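If ABNF still feels abstract, it can help to translate the toy rules above into Python regular expressions. This is only an illustration: ABNF and regex are not equivalent in general, but these small rules map over directly. One detail worth knowing is that quoted strings in ABNF are case-insensitive (so `"a"` also matches `A`), which is why the literal rules below use `re.IGNORECASE`.

```python
import re

# foo = "a"  (quoted strings in ABNF are case-insensitive, hence IGNORECASE)
FOO = re.compile(r"a", re.IGNORECASE)
# ABC = A B C with A = "a", B = "b", C = "c": rules written side by side are concatenated
ABC = re.compile(r"abc", re.IGNORECASE)
# RULE = "AB" ("C" / "D"): "/" is alternation, parentheses group
RULE = re.compile(r"ab(?:c|d)", re.IGNORECASE)
# DIGIT = %x30-39 is a character range; NUM = 1*DIGIT repeats it one or more times.
# In general <min>*<max>rule becomes {min,max}, e.g. 2*3DIGIT is [0-9]{2,3}
NUM = re.compile(r"[0-9]+")
# FLOATING_POINT = 1*DIGIT ["." 1*DIGIT]: square brackets make a part optional
FLOATING_POINT = re.compile(r"[0-9]+(?:\.[0-9]+)?")


def matches(pattern: re.Pattern, text: str) -> bool:
    """An ABNF rule describes the whole input, so require a full match."""
    return pattern.fullmatch(text) is not None


for text in ("42", "3.14", "3."):
    print(text, matches(FLOATING_POINT, text))
```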

Responses

In the formal specification, the response is defined as follows. A `reply` can be one of `input`, `success`, `redirect`, `tempfail`, `permfail`, or `auth`. For example, `success` is the digit "2" followed by another digit, a space, a MIME type, a CRLF and then the body, where the body is a sequence of octets (tech speak for bytes of arbitrary data). Likewise, `redirect` is the digit "3" followed by another digit, a space, the URI the client should redirect to, and a CRLF. The `=/` on the `VCHAR` line is ABNF for "add these alternatives to an existing rule": here it extends the ASCII-only core `VCHAR` rule with non-ASCII UTF-8 characters.

        reply    = input / success / redirect / tempfail / permfail / auth

        input    = "1" DIGIT SP prompt        CRLF
        success  = "2" DIGIT SP mimetype      CRLF body
        redirect = "3" DIGIT SP URI-reference CRLF
                        ; NOTE: [STD66] allows "" as a valid
                        ;       URI-reference.  This is not intended to
                        ;       be valid for cases of redirection.
        tempfail = "4" DIGIT [SP errormsg]    CRLF
        permfail = "5" DIGIT [SP errormsg]    CRLF
        auth     = "6" DIGIT [SP errormsg]    CRLF

        prompt   = 1*(SP / VCHAR)
        mimetype = type "/" subtype *(";" parameter)
        errormsg = 1*(SP / VCHAR)
        body     = *OCTET

        VCHAR    =/ UTF8-2v / UTF8-3 / UTF8-4
        UTF8-2v  = %xC2 %xA0-BF UTF8-tail ; no C1 control set
                 / %xC3-DF UTF8-tail

Let's take the response we got earlier as an example. The response header we got is `20 text/gemini`. Following the rules, we can match it like this:

           20 text/gemini
           2
            DIGIT = 0
             SP
              mimetype = text/gemini
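To check that reading mechanically, here is a rough translation of the `reply` rule into Python regular expressions, applied to a header line with its CRLF already removed. It is an approximation under our own assumptions: `1*(SP / VCHAR)` becomes "no control characters", a `mimetype` is only checked as type/subtype with optional parameters, and a redirect target just has to be non-empty, as the NOTE in the grammar asks. `match_reply` and `REPLY_RULES` are hypothetical names.

```python
import re
from typing import Optional

# Rough stand-in for 1*(SP / VCHAR): spaces and printable text, no C0/C1 control characters
TEXT = r"[^\x00-\x1f\x7f-\x9f]+"

# One pattern per alternative of `reply`; group 1 is the status, group 2 the text after SP
REPLY_RULES = {
    "input": re.compile(rf"(1[0-9]) ({TEXT})"),                       # "1" DIGIT SP prompt
    "success": re.compile(r"(2[0-9]) ([^/\s]+/[^;\s]+(?:\s*;.*)?)"),  # "2" DIGIT SP mimetype
    "redirect": re.compile(r"(3[0-9]) (\S+)"),                        # "3" DIGIT SP URI-reference
    "tempfail": re.compile(rf"(4[0-9])(?: ({TEXT}))?"),               # "4" DIGIT [SP errormsg]
    "permfail": re.compile(rf"(5[0-9])(?: ({TEXT}))?"),               # "5" DIGIT [SP errormsg]
    "auth": re.compile(rf"(6[0-9])(?: ({TEXT}))?"),                   # "6" DIGIT [SP errormsg]
}


def match_reply(header: str) -> tuple[str, int, Optional[str]]:
    """Match a header line (CRLF removed) and return (rule name, status, text or None)."""
    for name, rule in REPLY_RULES.items():
        match = rule.fullmatch(header)
        if match:
            return name, int(match.group(1)), match.group(2)
    raise ValueError(f"not a valid reply: {header!r}")


print(match_reply("20 text/gemini"))
```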

Request

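Requests are dead simple. A request is literally just a URL followed by a CRLF, and the spec tells you to find the rest of the details in the referenced standards: [STD66] is the URI specification, RFC 3986 (which is in and of itself a very complicated RFC, but still), and [STD68] is the ABNF specification, RFC 5234.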

	request = absolute-URI CRLF

	; absolute-URI from [STD66]
	; CRLF         from [STD68]
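On the server side, the grammar boils down to a few checks on the raw bytes before anything else happens. A minimal sketch with a hypothetical `parse_request` helper, combining `request = absolute-URI CRLF` with the 1024-byte and byte-order-mark rules from the overview. RFC 3986's `absolute-URI` has no fragment component, so a `#` in the request is rejected; the scheme check is deliberately loose, and a real server would also check that the scheme and host are its own.

```python
from urllib.parse import urlsplit

MAX_URL_BYTES = 1024  # from the overview: the URL is at most 1024 bytes


def parse_request(raw: bytes) -> str:
    """Check `request = absolute-URI CRLF` on raw bytes and return the URL."""
    if not raw.endswith(b"\r\n"):
        raise ValueError("the request must end with CRLF")
    url_bytes = raw[:-2]
    if b"\r" in url_bytes or b"\n" in url_bytes:
        raise ValueError("the request must be a single line")
    if len(url_bytes) > MAX_URL_BYTES:
        raise ValueError("the URL is longer than 1024 bytes")
    if url_bytes.startswith(b"\xef\xbb\xbf"):  # U+FEFF encoded as UTF-8
        raise ValueError("the request must not begin with a byte order mark")
    url = url_bytes.decode("utf-8")  # invalid UTF-8 raises UnicodeDecodeError (a ValueError)
    if not urlsplit(url).scheme:
        raise ValueError("absolute-URI requires a scheme")
    if "#" in url:
        raise ValueError("absolute-URI has no fragment component")
    return url


print(parse_request(b"gemini://example.com/\r\n"))
```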

Writing a Gemini server

Let's put everything together and write a simple Gemini server to prove that we understand the protocol. We will do the following:

First, generate a self-signed certificate. There is no need for a CA-signed certificate, as Gemini browsers use TOFU. You can use the following command to generate one (replace `<hostname>` with your server's host name):

openssl req -new -subj "/CN=<hostname>" -x509 -newkey ec -pkeyopt ec_paramgen_curve:prime256v1 -days 365 -nodes -out cert.pem -keyout key.pem

Start by creating a TCP socket and initializing the SSL context, then bind the socket and wrap it with TLS:

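import socket
import ssl

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # servers MUST use TLS 1.2 or higher
ctx.load_cert_chain("cert.pem", "key.pem")

server.bind(("localhost", 1965))  # only reachable locally; use "0.0.0.0" for remote clients
server.listen(0)

tls_server = ctx.wrap_socket(server, server_side=True)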

Next, we can start the main loop of the server. Once we accept a connection, a conforming client will send us a URL followed by a CRLF. We read the URL and parse it. If the path is `/`, we treat it as `/index.gmi` (like most web servers do) and serve the file. If the file does not exist, we respond with a 51 (NOT FOUND) status code.

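import contextlib
import os
from urllib.parse import urlparse

ROOT = os.path.realpath(".")  # only serve files below the current directory
while True:
    try:
        connection, client_address = tls_server.accept()  # also runs the TLS handshake
    except OSError:  # includes ssl.SSLError: one bad client must not stop the server
        continue
    connection.settimeout(10)  # don't let a silent client hang the server
    try:
        with connection.makefile("rb") as reader:
            line = reader.readline(1027)  # one CRLF line; recv() may return only part of it
        if not line.endswith(b"\r\n") or len(line) > 1026 or line.startswith(b"\xef\xbb\xbf"):
            connection.sendall(b"59 Bad Request\r\n")
            continue
        path = urlparse(line[:-2].decode("utf-8")).path  # TODO: check scheme and host
        if path in ("", "/"):
            path = "/index.gmi"
        full_path = os.path.realpath(os.path.join(ROOT, path.lstrip("/")))  # resolves ".."
        try:
            if os.path.commonpath([ROOT, full_path]) != ROOT:  # path escapes ROOT
                raise FileNotFoundError(path)
            with open(full_path, "rb") as file:
                body = file.read()
        except OSError:
            connection.sendall(b"51 Not Found\r\n")
            continue
        connection.sendall(b"20 text/gemini\r\n" + body)
    except (ValueError, OSError):
        pass  # undecodable URL, timeout or dropped connection
    finally:
        with contextlib.suppress(OSError):
            connection.unwrap()  # send TLS close_notify before closing (section 1.1)
        connection.close()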
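Putting it all together, here is the complete server:

import contextlib
import os
import socket
import ssl
from urllib.parse import urlparse

ROOT = os.path.realpath(".")  # only serve files below the current directory

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # servers MUST use TLS 1.2 or higher
ctx.load_cert_chain("cert.pem", "key.pem")

server.bind(("localhost", 1965))  # only reachable locally; use "0.0.0.0" for remote clients
server.listen(0)

tls_server = ctx.wrap_socket(server, server_side=True)

while True:
    try:
        connection, client_address = tls_server.accept()  # also runs the TLS handshake
    except OSError:  # includes ssl.SSLError: one bad client must not stop the server
        continue
    connection.settimeout(10)  # don't let a silent client hang the server
    try:
        with connection.makefile("rb") as reader:
            line = reader.readline(1027)  # one CRLF line; recv() may return only part of it
        if not line.endswith(b"\r\n") or len(line) > 1026 or line.startswith(b"\xef\xbb\xbf"):
            connection.sendall(b"59 Bad Request\r\n")
            continue
        path = urlparse(line[:-2].decode("utf-8")).path  # TODO: check scheme and host
        if path in ("", "/"):
            path = "/index.gmi"
        full_path = os.path.realpath(os.path.join(ROOT, path.lstrip("/")))  # resolves ".."
        try:
            if os.path.commonpath([ROOT, full_path]) != ROOT:  # path escapes ROOT
                raise FileNotFoundError(path)
            with open(full_path, "rb") as file:
                body = file.read()
        except OSError:
            connection.sendall(b"51 Not Found\r\n")
            continue
        connection.sendall(b"20 text/gemini\r\n" + body)
    except (ValueError, OSError):
        pass  # undecodable URL, timeout or dropped connection
    finally:
        with contextlib.suppress(OSError):
            connection.unwrap()  # send TLS close_notify before closing (section 1.1)
        connection.close()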