💾 Archived View for mozz.us › diagnostics › 2020-01-08 › notes.gmi captured on 2022-03-01 at 15:24:13. Gemini links have been rewritten to link to archived content
2020-01-09
This page contains my conclusions from running the jetforce server diagnostic script against the current landscape of gemini servers.
/jetforce/jetforce_diagnostics.py
These are my personal, informal notes and takeaways regarding server design and improvements that could be made to the gemini specification.
In total, 14 gemini servers were tested. This was every hostname that I could find that responded to a gemini request. I also tried to determine the software powering each server based on public information and some educated guessing.
gemini.conman.org (GLV-1.12556)
zaibatsu.circumlunar.space (gegobi)
gemini.circumlunar.space (jetforce 0.1.0)
konpeito.media (jetforce 0.1.0)
earthlight.xyz (jetforce 0.0.7)
No surprise, all servers resolved their IPv4 addresses.
Only four servers resolved to IPv6, and of those four, only two accepted a connection. I haven't played around with IPv6 much yet, but it's something that I want to set up on mozz.us.
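A minimal sketch of how an IPv4/IPv6 resolution check like this could look with Python's standard library. The function names are hypothetical and are not taken from the diagnostic script; 1965 is the default gemini port.

```python
import socket


def split_by_family(addrinfo):
    """Group getaddrinfo() results into sorted, de-duplicated IPv4 and IPv6 lists."""
    v4 = sorted({sa[0] for fam, _, _, _, sa in addrinfo if fam == socket.AF_INET})
    v6 = sorted({sa[0] for fam, _, _, _, sa in addrinfo if fam == socket.AF_INET6})
    return v4, v6


def resolve(host, port=1965):
    """Resolve a hostname; an unresolvable name yields two empty lists."""
    try:
        info = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        info = []
    return split_by_family(info)
```

Resolving an address is only half of the check. Whether the server accepts a connection on each address still has to be tested separately, for example with socket.create_connection().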
From the gemini spec:
Servers MUST use TLS version 1.2 or higher and SHOULD use TLS version 1.3 or higher. Clients MAY refuse to connect to servers using TLS version 1.2 or lower.
I found it interesting that the split was almost down the middle. This likely has to do with the version of OpenSSL or LibreSSL installed on the server.
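A rough sketch of how the negotiated TLS version could be probed from Python. The helper names are hypothetical. The context deliberately skips certificate verification, on the assumption that the probe only cares about the protocol version and that many servers use self-signed certificates.

```python
import socket
import ssl


def make_probe_context():
    """Build a client context that refuses anything older than TLS 1.2."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # check_hostname must be disabled before verify_mode can be set to CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def negotiated_tls_version(host, port=1965, timeout=5.0):
    """Connect and return the negotiated version string, e.g. "TLSv1.3"."""
    ctx = make_probe_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

The version a client observes also depends on the TLS library on the client side, so the probing machine needs to support TLS 1.3 itself for the result to mean anything.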
For this test, I checked the validity of some of the TLS certificate claims. Even with certificate pinning, it's reasonable to expect the certificate to not be expired and have a matching hostname.
Four servers had expired certs; 2 were self-signed and 2 were Let's Encrypt. One of the Let's Encrypt servers was my own. I discovered that I need to restart my server after certbot rotates my certificate. I need to look into whether I can detect this automatically with jetforce.
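A small sketch of an expiry check, assuming the certificate is available as the dict returned by SSLSocket.getpeercert(). The function name is hypothetical. One caveat: Python returns an empty dict when the certificate was not validated, so for self-signed certificates the binary form would have to be parsed with another library instead.

```python
import ssl
import time


def cert_is_expired(cert, now=None):
    """Return True if the certificate's notAfter timestamp is in the past.

    cert is a dict shaped like the return value of SSLSocket.getpeercert().
    now is a Unix timestamp; it defaults to the current time.
    """
    if now is None:
        now = time.time()
    return ssl.cert_time_to_seconds(cert["notAfter"]) < now
```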
All of the servers' hostnames matched either their CN or their subjectAltName except for one. That one didn't include any subject in its certificate at all.
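A simplified sketch of a CN / subjectAltName comparison, again assuming a getpeercert()-style dict. The helper names are hypothetical, and the wildcard handling is intentionally minimal (one leftmost label only), so this is not a full implementation of the hostname matching rules.

```python
def _name_matches(pattern, hostname):
    """Case-insensitive match; "*.example.org" covers exactly one extra label."""
    pattern, hostname = pattern.lower(), hostname.lower()
    if pattern.startswith("*."):
        head, _, rest = hostname.partition(".")
        return bool(head) and rest == pattern[2:]
    return pattern == hostname


def hostname_matches(cert, hostname):
    """Check the hostname against every DNS subjectAltName and every commonName."""
    sans = [v for k, v in cert.get("subjectAltName", ()) if k == "DNS"]
    cns = [v for rdn in cert.get("subject", ()) for k, v in rdn if k == "commonName"]
    return any(_name_matches(name, hostname) for name in sans + cns)
```

A certificate with no subject and no subjectAltName, like the one exception above, produces two empty lists and therefore never matches.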
This was a check to see how many servers were using trusted CAs vs. self-signed. This is deliberately not a requirement in the specification, but I found it interesting nonetheless.
I wanted to see what happened if I sent a gemini request without wrapping the socket in TLS. All servers either dropped or closed the connection without sending any unencrypted bytes.
This was a simple check to see if any servers were running on a single thread without any support for asynchronous connection handling. All servers passed.
For this check, I requested the server's homepage and looked at the structure of the response.
All servers returned a status of 20 with a mime type of "text/gemini" and no charset.
Not a single server used <CR><LF> line endings consistently. The spec requires them for link lines and heavily implies that they should be used for all other text lines as well (at least that's my take on it).
My takeaway from this is that the requirement for <CR><LF> is not practical and should be abandoned in favor of a plain <LF>. Or at least make the <CR> part optional. Who uses Windows anymore, anyways?
I expected all of the pages to end with a newline, but two did not. In my opinion, this should be enforced for text/gemini because it makes display consistent for naive clients that print the output to the screen.
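Both body checks (line ending style and trailing newline) can be sketched in a few lines. The function name and the sample body in the usage note are made up for illustration.

```python
def count_line_endings(body: bytes):
    """Count CRLF and bare-LF line endings and note whether the body ends in a newline."""
    crlf = body.count(b"\r\n")
    lf_only = body.count(b"\n") - crlf
    return {
        "crlf": crlf,
        "lf": lf_only,
        "ends_with_newline": body.endswith(b"\n"),
    }
```

For a body such as b"# Title\r\n=> /a link\nplain", this reports one CRLF ending, one bare LF ending, and no trailing newline, which is the kind of inconsistent mix described above.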
This is not a hard requirement, but it's (arguably) a good practice in HTTP to only have one canonical URL for every resource, and then leverage redirects to point to that resource. So if I have two routes:
Since the resource is actually a directory and thus should be represented with a trailing slash, the first URL should return a redirect to the second URL instead of responding with the content directly.
The only server that appears to do this redirect is GLV-1.12556, and that server responds with a relative path
I'll be honest, at first I thought this was a mistake because gemini requests are only supposed to use absolute URLs. But it's right here in the gemini spec:
The server is redirecting the client to a new location for the requested resource. There is no response body. The header text is a new URL for the requested resource. The URL may be absolute or relative.
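For a client, supporting relative redirects means resolving the header text against the URL that was requested. A sketch in Python follows, with one gotcha: urljoin() only resolves relative references for schemes it knows about, and gemini is not one of them, so this example registers the scheme first. The function name and example.org URLs are assumptions for illustration.

```python
from urllib.parse import urljoin, uses_netloc, uses_relative

# Teach urllib.parse that gemini URLs have a host part and support relative references.
for _registry in (uses_relative, uses_netloc):
    if "gemini" not in _registry:
        _registry.append("gemini")


def resolve_redirect(request_url, meta):
    """Resolve a redirect's header text, absolute or relative, against the request URL."""
    return urljoin(request_url, meta.strip())
```

With this in place, a request for gemini://example.org/docs that is answered with the relative target /docs/ resolves to gemini://example.org/docs/, while an absolute target passes through unchanged.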
All servers appropriately responded with a status code of 51. All servers set the meta text to some variant of "Not Found".
This was also the first test where I noticed that all of the servers use a tab character for whitespace, e.g.
EXCEPT for gegobi, which opts to use a single space character.
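A tolerant client can accept either separator by splitting the header on the first run of whitespace. This is a sketch with a hypothetical function name; the header bytes in the usage note are made-up samples, not captured server output.

```python
def parse_header(line: bytes):
    """Split a response header into (status, meta, separator).

    Accepts a tab, a space, or any run of whitespace between status and meta.
    """
    text = line.decode("utf-8").rstrip("\r\n")
    parts = text.split(None, 1)
    status = int(parts[0])
    meta = parts[1] if len(parts) > 1 else ""
    # Whatever sits between the status digits and the meta text is the separator.
    sep = text[len(parts[0]):len(text) - len(meta)] if meta else ""
    return status, meta, sep
```

Both b"51\tNot Found\r\n" and b"51 Not Found\r\n" parse to status 51 with meta "Not Found"; only the reported separator differs.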
I added this check because I suspected some servers were looking for a newline "\n" to end a request instead of a whole <CR><LF> pair.
It turned out that 3 servers (all running unique software) accepted my request without the <CR>.
In my opinion, this is more evidence that the carriage return should be made optional or dropped from the spec. It's too easy to overlook when most programming standard libraries have some variant of "readline()" that doesn't require a "<CR>" by default.
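The difference between a strict and a lenient server comes down to a couple of lines. This sketch uses a hypothetical function name and a strict flag of my own invention to show both behaviours side by side.

```python
def read_request_line(raw: bytes, strict: bool = True):
    """Return the URL bytes from a raw request, or None if the terminator is rejected.

    strict=True insists on <CR><LF>; strict=False also accepts a bare <LF>,
    which is what a plain readline()-style implementation ends up doing.
    """
    line, newline, _ = raw.partition(b"\n")
    if not newline:
        return None  # no line terminator at all
    if line.endswith(b"\r"):
        return line[:-1]
    return None if strict else line
```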
In this test I added the port number to the URL:
All servers interpreted this correctly, likely because everybody is using a URL parsing library that handles this for them.
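As an illustration of why this tends to just work, here is how Python's standard URL parser separates the host from an explicit port. The function name and the example.org URLs are assumptions for illustration.

```python
from urllib.parse import urlsplit


def host_and_port(url, default_port=1965):
    """Return (hostname, port), falling back to the default gemini port."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else default_port
    return parts.hostname, port
```

A URL with ":1965" appended and the same URL without a port both come out as the same (hostname, port) pair, so the server never sees a difference.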
If the scheme of the URL is not specified, a scheme of gemini:// is implied.
I don't know exactly what a URL without a scheme is supposed to look like, but here's what I came up with:
Every single server returned a bad request response for this format. So either I don't know what a scheme-less URL looks like, or nobody else does either?
In my opinion, this exception for scheme-less URLs is unnecessary and should be removed from the gemini specification.
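To show where the ambiguity comes from, here is a sketch of how a server might apply the implied scheme. It guesses at two possible scheme-less forms, "//host/path" and "host/path"; both forms and the function name are assumptions, since the spec does not say which one is meant.

```python
from urllib.parse import urlsplit


def apply_default_scheme(url, default="gemini"):
    """Prepend the default scheme to a URL that lacks one."""
    parts = urlsplit(url)
    if parts.scheme:
        return url
    if parts.netloc:
        # Network-path form: "//example.org/docs/"
        return f"{default}:{url}"
    # Bare form: "example.org/docs/" (the parser sees this as a path only)
    return f"{default}://{url}"
```

The bare form is fragile: once a port is added, as in "example.org:1965/docs", the part before the colon is syntactically a valid scheme name, so a generic URL parser has no reliable way to tell the two apart.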
I was curious how the IP address would be handled in a URL since it doesn't match the hostname. The two geminal servers allowed it; the rest returned a variety of error codes: 50, 51, and 59.
From the spec:
<URL> is a UTF-8 encoded absolute URL
I stuck a random latin-1 character in the URL that wasn't valid UTF-8.
The jetforce servers rejected it (59 BAD REQUEST). The gegobi server dropped the connection. This does not surprise me, as both of these servers were written in Python, and Python 3 rams UTF-8 handling down your throat.
All of the other servers accepted the URL and did not reject it as malformed. I'm curious what they're doing under the hood. Are they not attempting to decode the URL?
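In Python the check is essentially free, because decoding the request bytes fails loudly on invalid input. This is a sketch with a hypothetical function name; a server would map the None result to a 59 response.

```python
def decode_request_url(raw: bytes):
    """Return the URL as text, or None if the bytes are not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None
```

A latin-1 encoded "é" is the single byte 0xE9, which is not a complete UTF-8 sequence, so a URL ending in that byte is rejected, while the two-byte UTF-8 encoding 0xC3 0xA9 decodes fine. A server that treats the URL as an opaque byte string and maps it straight onto a filesystem path would never notice the difference.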
Gemini requests are a single CRLF-terminated line with the following structure:
<URL><CR><LF>
<URL> is a UTF-8 encoded absolute URL, of maximum length 1024 bytes.
First off, I tried sending a URL that was exactly 1024 bytes long. Two servers (jetforce and geminal) choked while attempting to read a filename this long from the OS and returned a 4x internal error status. Jetforce chokes harder and actually exposes sensitive information about the filesystem. I need to fix this ASAP.
The other servers appropriately return a 51 NOT FOUND status.
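One way a static file server could arrive at a clean 51 for an over-long filename is to treat every OS-level failure to open the file the same way and never echo the path back. This is only a sketch with a hypothetical function name, not how jetforce or any other server here actually does it, and it ignores path traversal checks entirely.

```python
import os


def read_static_file(root, relative_path):
    """Return (20, file bytes) on success or (51, b"Not Found") on any OS error."""
    full_path = os.path.join(root, relative_path)
    try:
        with open(full_path, "rb") as f:
            return 20, f.read()
    except OSError:
        # Covers a missing file and "File name too long" alike.
        # The response deliberately contains nothing derived from full_path.
        return 51, b"Not Found"
```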
Next, I tried adding on an extra byte which bumped the URL over the 1024 limit. Jetforce was the only server that marked this as a bad request. The rest either failed internally or marked it as a standard 51 NOT FOUND.
My takeaway from this is that the 1024 byte limit for URLs is unnecessary and won't be enforced anyway. It should be up to each server to decide how long its URLs can be, based on whether it's manually allocating memory, etc.
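For reference, enforcing the limit as written is a one-line comparison, as long as it counts encoded bytes rather than characters. The names below are hypothetical, and the 59 status follows the bad request behaviour described above.

```python
MAX_URL_BYTES = 1024


def url_length_status(url: str):
    """Return None if the URL fits the limit, else a (59, meta) bad request tuple."""
    size = len(url.encode("utf-8"))  # the limit is defined in bytes, not characters
    return None if size <= MAX_URL_BYTES else (59, "Bad Request")
```

The bytes-versus-characters distinction matters for non-ASCII URLs: 513 copies of "é" is only 513 characters but 1026 bytes, which is over the limit.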
The rest of the tests were variations on screwing with the URL format and seeing what happened:
I won't go over each test individually but here are some takeaways.