2021-07-24 An anatomy of the request

Early Internet protocols were simple. To make a request, you connect to a machine on a particular port and send a line of text terminated by a carriage return (CR or \r) and a line feed (LF or \n). We can simulate this with echo and nc (netcat) and make these requests ourselves.

This is a request for user information using the finger protocol. Here, we’re sending a name to the server. The response is plain text.

echo -ne "alex\r\n" | nc alexschroeder.ch 79
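To see exactly what this sends, we can pipe the same request into “od -c” instead of “nc”. It prints the carriage return and line feed as \r and \n. This sketch uses “printf” rather than “echo -ne”, because the options and escapes that echo understands differ between shells, while printf handles them the same way in any POSIX shell.

```shell
# Print the bytes of the finger request instead of sending them.
# od -c shows the carriage return as \r and the line feed as \n.
printf 'alex\r\n' | od -c
```

The output should show the four letters of the name followed by \r and \n, six bytes in all.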

Gopher, at the protocol level, is the same thing. Here, we’re sending a selector to the server. The response is plain text.

echo -ne "page/Alex_Schroeder\r\n" | nc alexschroeder.ch 70
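Since finger and Gopher requests are both a single line terminated by CRLF, one small shell function can send either; only the port differs. The name “line_request” is made up for this sketch, and it relies on “nc” behaving the same way as in the commands above.

```shell
# Hypothetical helper: send a single CRLF-terminated line to HOST:PORT
# and print whatever the server sends back.
# Usage: line_request HOST PORT LINE
line_request() {
  printf '%s\r\n' "$3" | nc "$1" "$2"
}
```

For example, “line_request alexschroeder.ch 79 alex” repeats the finger request, and “line_request alexschroeder.ch 70 page/Alex_Schroeder” the Gopher one.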

HTTP/1.0, at the protocol level, is already a lot more complicated. We’re sending the “method” (in our case, “GET”), the resource we’re interested in, the protocol version, the HTTP headers (in our case, none), and an empty line, i.e. another CRLF:

echo -ne "GET /wiki/Alex_Schroeder HTTP/1.0\r\n\r\n" | nc alexschroeder.ch 80

The response in this case is actually a redirect to the HTTPS URL. We’ll talk about this down below. For now, let’s just note that we got a response that we can act upon.
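To spot the redirect without reading through the whole response, a small filter can print just the status code and the “Location” header. This is a sketch: the name “show_redirect” is made up, and it only looks at the headers, which end at the first empty line.

```shell
# Hypothetical helper: read an HTTP response on stdin and print its
# status code and, if there is one, the value of the Location header.
show_redirect() {
  # Header lines end in CRLF; drop the CRs so the patterns below match.
  tr -d '\r' | awk '
    NR == 1 { print "status: " $2; next }   # status line: VERSION CODE REASON
    /^$/ { exit }                           # an empty line ends the headers
    tolower($0) ~ /^location:/ {            # header names are case-insensitive
      sub(/^[^:]*:[ \t]*/, "")              # keep only the header value
      print "location: " $0
    }
  '
}
```

Append “| show_redirect” to the command above to try it.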

None of the protocols mentioned above can do virtual hosting. Virtual hosting is when multiple domains are served from the same machine. We’re used to this on the web: we know that “http://alexschroeder.ch/” and “http://campaignwiki.org/” return different responses, which may be surprising, since both are hosted on the same server. How is this possible? It isn’t possible with the HTTP/1.0 requests we have seen so far, that’s for sure: the server doesn’t know which host you think you’re talking to. If we try it, both requests get redirected to the page on “alexschroeder.ch”, the default domain.

echo -ne "GET /wiki/Alex_Schroeder HTTP/1.0\r\n\r\n" | nc alexschroeder.ch 80
echo -ne "GET /wiki/Alex_Schroeder HTTP/1.0\r\n\r\n" | nc campaignwiki.org 80

You could argue that this is a bit of a security concern. After all, somebody making this request for “campaignwiki.org” now knows that it’s hosted on the same server as “alexschroeder.ch”. I’m going to ignore that, however.

In any case, the technical reason for all of this is that domain names do not exist at the TCP/IP level. We tell “nc” to send the request to “alexschroeder.ch” port 80, but what it actually does is look up the name using the domain name system (DNS) and then connect to the IP number instead.

We can do our own lookup using “dig”:

dig alexschroeder.ch

The answer is “178.209.50.237”, the IPv4 address, because “dig” asks for records of type “A” by default. To get the IPv6 address, you need to ask for type “AAAA”:

dig -t AAAA alexschroeder.ch

The answer is “2a02:418:6a04:178:209:50:237:1”, the IPv6 address.

We can verify the results above by substituting the IP numbers ourselves:

echo -ne "GET /wiki/Alex_Schroeder HTTP/1.0\r\n\r\n" | nc 178.209.50.237 80

The solution, as far as HTTP was concerned, was to use additional headers. HTTP has a ton of headers: they tell the server whether the client already has a resource cached and how old it is, which languages and which MIME types the client would prefer to get back, and so on. One of these headers, “Host”, tells the server which host we think we’re talking to.

Here’s how to do the requests with a host header:

echo -ne "GET /wiki/Alex_Schroeder HTTP/1.0\r\nhost: alexschroeder.ch\r\n\r\n" | nc alexschroeder.ch 80
echo -ne "GET /wiki/Alex_Schroeder HTTP/1.0\r\nhost: campaignwiki.org\r\n\r\n" | nc campaignwiki.org 80
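Writing the request line and the Host header by hand gets repetitive, so here is a sketch of a helper that fills in both. The name “http_get” is hypothetical, and like the commands above it always connects to port 80.

```shell
# Hypothetical helper: send an HTTP/1.0 GET request with a Host header.
# Usage: http_get HOST PATH
http_get() {
  # Request line, then the Host header naming the site we want,
  # then the empty line that ends the headers.
  printf 'GET %s HTTP/1.0\r\nHost: %s\r\n\r\n' "$2" "$1" | nc "$1" 80
}
```

“http_get campaignwiki.org /wiki/Alex_Schroeder” sends the same request as the second command above.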

The request for “campaignwiki.org” now gets a redirect to the same resource on “campaignwiki.org”. You can double-check by using the IP number (the IPv6 one, this time):

echo -ne "GET /wiki/Alex_Schroeder HTTP/1.0\r\nhost: campaignwiki.org\r\n\r\n" | nc 2a02:418:6a04:178:209:50:237:1 80
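On the server side, virtual hosting then comes down to reading the Host header and falling back to a default domain when it is missing, which is what happened to the earlier HTTP/1.0 requests without one. The following is a minimal sketch of that lookup in shell, not how any particular server does it; “request_host” is a made-up name, and it does not strip a port number from the header value.

```shell
# Hypothetical sketch: read a request on stdin and print the host it names,
# or DEFAULT_HOST if the request has no Host header.
# Usage: request_host DEFAULT_HOST < request
request_host() {
  host=$(tr -d '\r' | awk '
    NR == 1 { next }          # skip the request line
    /^$/ { exit }             # the headers end at the first empty line
    tolower($0) ~ /^host:/ {  # header names are case-insensitive
      sub(/^[^:]*:[ \t]*/, "")
      print
      exit
    }
  ')
  # Note: a port such as ":80" in the header value is left in place.
  printf '%s\n' "${host:-$1}"
}
```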

What about Gemini? It doesn’t have headers to send along; instead of just naming the path of the resource on the server like Gopher does, the request names the whole URL. Sadly, we can’t illustrate this without TLS, because Gemini requires it. If you are lucky, your “netcat” or “nc” has the “--ssl” option and we can keep using it.

First, let’s just quickly show that HTTPS is HTTP over SSL or TLS, on a different port:

echo -ne "GET /wiki/Alex_Schroeder HTTP/1.0\r\nhost: alexschroeder.ch\r\n\r\n" | nc --ssl alexschroeder.ch 443
echo -ne "GET /wiki/Alex_Schroeder HTTP/1.0\r\nhost: campaignwiki.org\r\n\r\n" | nc --ssl campaignwiki.org 443

And so we get to Gemini. Since the request is the full URL, including the host name, we no longer need the complication of HTTP headers:

echo -ne "gemini://alexschroeder.ch/page/Alex_Schroeder\r\n" | nc --ssl alexschroeder.ch 1965
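A Gemini request is just the absolute URL followed by CRLF, and the Gemini specification limits that URL to 1024 bytes. Here is a sketch of a helper that builds the request and checks the limit; “gemini_request” is a made-up name, and sending the request is left to “nc --ssl” as above.

```shell
# Hypothetical helper: print a Gemini request for URL, refusing URLs
# longer than the 1024 bytes the Gemini specification allows.
# Usage: gemini_request URL
gemini_request() {
  # wc -c counts bytes, not characters, which matters for non-ASCII URLs;
  # tr removes the padding some wc implementations add.
  len=$(printf '%s' "$1" | wc -c | tr -d ' ')
  if [ "$len" -gt 1024 ]; then
    echo "gemini_request: URL is longer than 1024 bytes" >&2
    return 1
  fi
  printf '%s\r\n' "$1"
}
```

“gemini_request gemini://alexschroeder.ch/page/Alex_Schroeder | nc --ssl alexschroeder.ch 1965” is equivalent to the command above.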

All right! 🚀🚀😃🎉

I hope you can see how using the command line tools helped me understand these things.

Now you know how a server like Phoebe can look at the first line of the request it gets and determine whether to serve a Finger response, a Gopher response, a Web response, or a Gemini response.
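Here is a rough sketch of that decision in shell. It is not Phoebe’s code: “classify_request” is a made-up name, and the line is assumed to have its CRLF already removed. A Gemini request starts with a URL and an HTTP request line looks like METHOD PATH VERSION, but finger names and Gopher selectors are both bare strings, so this sketch also takes the port the request arrived on and uses it to tell those two apart.

```shell
# Hypothetical sketch: name the protocol of a request from its first line,
# using the port only when the line alone is ambiguous.
# Usage: classify_request LINE PORT
classify_request() {
  case $1 in
    gemini://*)          echo gemini ;;  # Gemini: an absolute URL
    [A-Z]*" "*" HTTP/"*) echo web ;;     # HTTP: METHOD PATH VERSION
    *)
      # Finger names and Gopher selectors are both bare strings.
      case $2 in
        79) echo finger ;;
        *)  echo gopher ;;
      esac ;;
  esac
}
```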

References, linking to the older RFCs because they’re often simpler to read:

RFC 742, NAME/FINGER

RFC 1436, The Internet Gopher Protocol

RFC 1945, Hypertext Transfer Protocol -- HTTP/1.0

RFC 2616, Hypertext Transfer Protocol -- HTTP/1.1

Project Gemini

RFCs, yo! I heard Gemini might get one, eventually?

#Programming #Finger #Gopher #Web #Gemini #Phoebe

Comments

(Please contact me if you want to remove your comment.)

If you don’t have an --ssl option on nc, you can try:

echo -ne "gemini://alexschroeder.ch/page/Alex_Schroeder\r\n" \
  | openssl s_client alexschroeder.ch:1965

It fails, wanting a client certificate, but then I assume nc --ssl would fail that way too. I also assume an option on openssl would allow adding one, but I haven’t looked into it.

– Ed Davies 2021-07-24 19:37 UTC

---

I think it works if you add the -quiet option and ignore stderr:

echo -ne "gemini://alexschroeder.ch/page/Alex_Schroeder\r\n" \
  | openssl s_client -quiet alexschroeder.ch:1965 2>/dev/null

At the time, I was experimenting with bash functions as clients, but I have since abandoned that.

At least I think there’s nothing in my personal setup, nothing in my “~/.ssh/config” file that modifies s_client.

– Alex 2021-07-24 20:58 UTC

---

Actually, /b wrote in to let me know that the key part is not -quiet itself, but that -quiet also turns on -ign_eof, and that’s the important one.

– Alex 2021-07-28 07:50 UTC