💾 Archived View for gem.acdw.net › raw › 2020-11-06.%20Edwin%2C%20a%20Gemini%20server captured on 2022-07-16 at 15:09:52.
What follows is the org-file I'm working on to literately write my POSIX awk/sh Gemini server, edwin. It's what I worked on today and it's mostly words, so it's what you get. Enjoy! Bah, never mind. I'll convert it to gemtext, *by hand*. Whatever. Oh, also: it's not finished. So... take that for what it's worth. I hope to finish it up this weekend.

# Edwin, a Gemini server in POSIX AWK (with some other bits)

# Foreword

Sort of on a whim, and sort of because the only programming languages I'm /really/ comfortable in are shell and awk, I've decided to write a Gemini server in those languages. I've already written a [[https://git.sr.ht/~acdw/bollux][Gemini browser in bash]] (or most of one; I /still/ need to add some quality-of-life improvements) for much the same reasons, so I always knew there would come a day when I'd need to write something for the other end of the pipe. It turns out, today is that day.

What follows is a literate Org file containing a functioning Gemini server that's as POSIX-compatible as possible. Awk handles the textual parts of the request and response, but since POSIX awk can't do networking (and even GNU awk, which can, can't do TLS), I'm wrapping that core logic in a call to =socat= in a shell script. A dream of mine is to shoehorn Make in as a multiplexer, but I'm not sure if it's possible or even necessary. Let's find out!!

# Requirements
function usplit(url, uarr) {
    # Split url into its RFC 3986 parts, stored in uarr by name.
    # scheme - scheme:
    if (match(url, /^[^:\/?#]+:/)) {
        uarr["scheme"] = substr(url, RSTART, RLENGTH - 1);
        url = substr(url, RSTART + RLENGTH);
    }
    # authority - //authority
    if (match(url, /^\/\/[^\/?#]*/)) {
        uarr["authority"] = substr(url, RSTART + 2, RLENGTH - 2);
        url = substr(url, RSTART + RLENGTH);
    }
    # path - path (may be empty)
    if (match(url, /^[^?#]*/)) {
        uarr["path"] = substr(url, RSTART, RLENGTH);
        url = substr(url, RSTART + RLENGTH);
    }
    # query - ?query
    if (match(url, /^\?[^#]*/)) {
        uarr["query"] = substr(url, RSTART + 1, RLENGTH - 1);
        url = substr(url, RSTART + RLENGTH);
    }
    # fragment - #fragment (whatever is left)
    if (match(url, /^#.*/))
        uarr["fragment"] = substr(url, RSTART + 1);
    # sanity checks: an empty path means the root
    if (uarr["path"] == "")
        uarr["path"] = "/";
}
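Each of usplit()'s five patterns is one piece of the URI-splitting regular expression in Appendix B of RFC 3986. As a cross-check of what usplit() should put in each slot, here's a sketch that runs that whole expression in one go with =sed -E= (which GNU and BSD sed both support). =url_part= is a made-up helper name, not part of edwin.

```sh
# url_part URL N: print capture group N of RFC 3986's Appendix B regex.
# Groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
# (url_part is a hypothetical helper for checking usplit() by hand.)
url_part() {
  printf '%s\n' "$1" |
    sed -E "s@^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?@\\$2@"
}
```

For ~gemini://example.com/a/b?q=1#frag~, groups 2, 4, 5, 7 and 9 come out as ~gemini~, ~example.com~, ~/a/b~, ~q=1~ and ~frag~: the same values usplit() stores under =scheme=, =authority=, =path=, =query= and =fragment=. The one difference is that usplit() turns an empty path into =/=.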
# Response

## Gemini spec

> Gemini response headers look like this:
>
> =<STATUS><SPACE><META><CR><LF>=
>
> =<STATUS>= is a two-digit numeric status code, as described below in 3.2 and in Appendix 1.
>
> =<SPACE>= is a single space character, i.e. the byte 0x20.
>
> =<META>= is a UTF-8 encoded string of maximum length 1024 bytes, whose meaning is =<STATUS>= dependent.
>
> =<STATUS>= and =<META>= are separated by a single space character.
>
> If =<STATUS>= does not belong to the "SUCCESS" range of codes, then the server MUST close the connection after sending the header and MUST NOT send a response body.
>
> If a server sends a =<STATUS>= which is not a two-digit number or a =<META>= which exceeds 1024 bytes in length, the client SHOULD close the connection and disregard the response header, informing the user of an error.

## Status codes

Edwin is going to be "fancy," meaning it'll use the whole gamut of 2-digit codes. They are as follows:
| Code | Meaning                     | Layer   |
|------+-----------------------------+---------|
| 10   | INPUT                       | cgi     |
| 11   | SENSITIVE INPUT             | cgi     |
| 20   | SUCCESS                     | awk     |
| 30   | REDIRECT - TEMPORARY        | file?   |
| 31   | REDIRECT - PERMANENT        | file?   |
| 40   | TEMPORARY FAILURE           | --      |
| 41   | SERVER UNAVAILABLE          | sh      |
| 42   | CGI ERROR                   | awk     |
| 43   | PROXY ERROR                 | ???     |
| 44   | SLOW DOWN                   | sh?     |
| 50   | PERMANENT FAILURE           | --      |
| 51   | NOT FOUND                   | awk     |
| 52   | GONE                        | file?   |
| 53   | PROXY REQUEST REFUSED       | awk     |
| 59   | BAD REQUEST                 | awk     |
| 60   | CLIENT CERTIFICATE REQUIRED | cgi     |
| 61   | CERTIFICATE NOT AUTHORISED  | cgi     |
| 62   | CERTIFICATE NOT VALID       | cgi     |
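The table marks the redirects as a maybe-file-level job. Here's one way the =redirect.31= idea in the next paragraph could look, sketched in shell rather than awk for brevity. =redirect_for= is a hypothetical helper, and checking =.31= before =.30= is an arbitrary choice.

```sh
# redirect_for PATH: if PATH.31 or PATH.30 exists, print a redirect header
# pointing at the URL on the file's first line and succeed; otherwise fail.
# (Hypothetical helper; 31 is checked before 30 arbitrarily.)
redirect_for() {
  for redirect_code in 31 30; do
    if [ -r "$1.$redirect_code" ]; then
      # read fails on a missing final newline but still fills the variable
      IFS= read -r redirect_target < "$1.$redirect_code" || true
      printf '%s %s\r\n' "$redirect_code" "$redirect_target"
      return 0
    fi
  done
  return 1
}
```

With =/some/path/redirect.31= containing =gemini://example.com/some/other/path/=, ~redirect_for /some/path/redirect~ prints ~31 gemini://example.com/some/other/path/~ followed by CRLF, and a path with neither file makes it fail so the normal lookup can carry on.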
The =10= codes really only make sense in the context of CGI scripts, so they can handle those themselves. Ditto for the =60=s. =20= is the default, and works as long as the awk script can find the file or CGI script and can read or execute it, so awk can handle that.

I'm thinking the =30= codes can be implemented on a file level, possibly with something as simple as =/some/path/redirect.31= containing a single line, =gemini://example.com/some/other/path/=, that edwin could read to send the client over there. Of course, the client would only have to request =/some/path/redirect= to be redirected. Another option for these is using something like a =.molly= or =.htaccess= file.

=41= only really applies if the shell script can't call the awk script. Likewise, =42= only makes sense in the awk layer, since that's what calls the CGI. =43= doesn't make sense unless we're planning on proxying to other hosts, which I'm not right now, so. =44= needs to be in the sh layer, if it's anywhere at all; I'm not sure that I'll implement it.

=51= will be implemented in the awk layer, since it tries to find the file. For my purposes, I don't see a meaningful difference between =51= and =52=, so I won't implement =52=. It /might/ be usable at a file level à la the =31=-style file extensions (i.e., move the to-be-deleted file to =delete.52= and, after waiting an "appropriate" amount of time, fully delete it), but that seems complicated and not overly helpful. =53= will be handled by the awk layer, since only awk will see what the request is. Same with =59=.

### Generate the status code
function respond(code, meta) {
printf "%s %s\r\n", code, meta
}
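~respond()~ trusts its arguments. The spec excerpt above puts two limits on the header (a two-digit status and a META of at most 1024 bytes), so here's a shell sketch of a header writer that checks both before printing anything. =gemini_header= is a hypothetical name; the limit is in bytes, hence =wc -c= rather than a character count.

```sh
# gemini_header STATUS META: print "<STATUS><SPACE><META><CR><LF>",
# or complain on stderr and fail if either part breaks the spec's limits.
# (Hypothetical helper, not part of edwin.)
gemini_header() {
  case $1 in
    [0-9][0-9]) ;;                       # exactly two digits
    *) echo "gemini_header: bad status '$1'" >&2; return 1 ;;
  esac
  # wc -c counts bytes; tr drops the padding some wc implementations add
  if [ "$(printf '%s' "$2" | wc -c | tr -d ' ')" -gt 1024 ]; then
    echo "gemini_header: META longer than 1024 bytes" >&2
    return 1
  fi
  printf '%s %s\r\n' "$1" "$2"
}
```

For example, ~gemini_header 51 'Not found.'~ writes the same bytes as ~respond(51, "Not found.")~, while ~gemini_header 200 x~ refuses.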
## Serve things

### Check permissions

Edwin can't serve a file that doesn't exist, of course. ~test()~ is the function that deals with that and other possibilities. Luckily, awk has a ~system()~ function that works like POSIX's ~system()~ call, which "shells out" to a shell, meaning we can run Unix commands like ~test~. A caveat: because of the conventions of Unix, we need to negate our ideas of success and failure in awk. ~system()~ returns the command's exit status, and ~test~ exits with a 1 (true, as far as awk is concerned) if it /fails/. That's why the ~true~ branch of the ~if~ block below is the failure case: it sends an error response (if it was given one) and returns 0.
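function test(file, test_arg, err_code, err_text,    quoted) {
    # The file name comes from the client's request, so single-quote it
    # (turning each ' into '\'') before the shell ever sees it.
    quoted = file;
    gsub(/'/, "'\\\\''", quoted);
    # system() returns test's exit status: nonzero means the test failed.
    if (system("test -" test_arg " '" quoted "'")) {
        if (err_code && err_text)
            respond(err_code, err_text);
        return 0;
    }
    return 1;
}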
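To see the flipped truth values for yourself, this snippet (the function name is made up) runs one passing and one failing ~test~ through awk's ~system()~ and prints what awk gets back.

```sh
# system_status_demo: show what awk's system() returns for a passing and a
# failing `test` (hypothetical demo, not part of edwin).
system_status_demo() {
  awk 'BEGIN {
    ok  = system("test -d /")          # / is a directory: test succeeds
    bad = system("test -d /dev/null")  # /dev/null is not: test fails
    printf "ok=%d bad=%d\n", ok, bad
    if (bad)                           # awk-true, i.e. the test FAILED
      print "nonzero means failure"
  }'
}
```

It prints ~ok=0 bad=1~ followed by the ~nonzero means failure~ line: the passing test is the one awk considers false.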
### TODO Mime types

To serve a file, we need to know its mime-type so we can pass that on to the client. One day, I'll figure out something fancy with =/etc/mime.types= or something, but for now, we'll assume everything is ~DEFAULT_MIME~, which in edwin's case is =text/gemini=.
function get_mime(file) {
return DEFAULT_MIME;
}
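For the "something fancy" day: each non-comment line of =/etc/mime.types= is a MIME type followed by the file extensions that map to it. This sketch looks a file's extension up in such a table and falls back to =text/gemini=, like ~DEFAULT_MIME~. =mime_for= is a hypothetical helper, and answering =text/gemini= for =.gmi= and =.gemini= files before consulting the table is an assumption (don't count on the system table listing them).

```sh
# mime_for FILE [TABLE]: print FILE's MIME type, looked up by extension in a
# mime.types-style TABLE (default /etc/mime.types), else text/gemini.
# (Hypothetical helper; the .gmi/.gemini shortcut is an assumption.)
mime_for() {
  mime_base=${1##*/}                     # strip the directory part
  case $mime_base in
    *.gmi|*.gemini) echo text/gemini; return ;;
    *.*) mime_ext=${mime_base##*.} ;;    # text after the last dot
    *) echo text/gemini; return ;;       # no extension at all
  esac
  mime_table=${2:-/etc/mime.types}
  if [ ! -r "$mime_table" ]; then
    echo text/gemini
    return
  fi
  awk -v ext="$mime_ext" '
    /^[ \t]*#/ { next }                 # skip comment lines
    {
      for (i = 2; i <= NF; i++)         # fields after the type are extensions
        if ($i == ext) { print $1; found = 1; exit }
    }
    END { if (!found) print "text/gemini" }
  ' "$mime_table"
}
```

Inside edwin, ~get_mime()~ could call something like this through a pipe, or do the same table scan in awk directly.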
### Serve files

Here is the main "heart" of edwin, the whole reason we're here: we're serving a file. I'm not sure when we'd pass the mime-type in, but hey, it's there in case we do; at any rate, if it's not there we're going to find the mime type through the ~get_mime()~ function. After that, it's simple: print out the response header, then read the file line by line and pass it through. Finally, close the file and exit this iteration of the awk bit.
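function serve_file(path, mime,    line) {
    if (!mime)
        mime = get_mime(path);
    respond(20, mime);
    # getline returns 1 per line, 0 at EOF and -1 on a read error;
    # stopping only at > 0 keeps a bad read from looping forever.
    while ((getline line < path) > 0)
        print line;
    close(path);
    exit 0;
}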
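One consequence of the line-by-line loop: awk's ~print~ ends every line with a newline, so a file without a final newline gains one, and POSIX only promises that awk handles text files. If byte-exact output ever matters (images, say), a shell fallback is one option. =serve_file_sh= is a hypothetical name, and its =text/gemini= default mirrors ~DEFAULT_MIME~.

```sh
# serve_file_sh FILE [MIME]: hypothetical shell counterpart to serve_file().
# Sends the success header, then the file's bytes exactly as stored on disk.
serve_file_sh() {
  printf '20 %s\r\n' "${2:-text/gemini}"
  cat < "$1"
}
```

For a 9-byte file with no final newline, this sends a 16-byte header plus exactly those 9 bytes; the awk loop would send one extra newline.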
### TODO Serve CGI

### Respond to requests
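{
    # Gemini requests end in CRLF; drop the CR so it doesn't end up in the URL
    sub(/\r$/, "");
    # clean out the URL array
    for (part in url)
        delete url[part];
    # and reassign it
    usplit($0, url);
    # sanity checks
    if (url["scheme"] != "gemini") {
        respond(53, "Only gemini supported.");
        exit 53;
    }
    if (url["authority"] != HOST_NAME) {
        respond(53, "No proxying to other hosts!");
        exit 53;
    }
    # refuse ".." segments so requests can't climb out of BASE_PATH
    if (url["path"] ~ /(^|\/)\.\.(\/|$)/) {
        respond(59, "Bad request.");
        exit 59;
    }
    # figure out the file we're serving
    path = BASE_PATH url["path"];
    # only regular files are served (directories pass test -x, so without
    # this check they'd be treated as CGI; for now they get a 51)
    if (!test(path, "f", 51, "Not found."))
        exit 51;
    # is the file executable? serve cgi
    if (test(path, "x"))
        serve_cgi(path);
    # if not, can we at least read it? serve the file.
    if (test(path, "r", 51, "Not found."))
        serve_file(path);
    else
        exit 51;
    exit 0;
}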
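~serve_cgi()~ is still a TODO, so the block above calls a function that doesn't exist yet (gawk, for one, won't start a program like that). Here's a hedged sketch of one way to fill it in, wrapped in a shell function so it can be tried on its own: run the script, collect its output, and pass it along only if the script exits cleanly, answering =42= otherwise. The script writes its own header, which is how it gets to handle the =10= and =60= codes. =cgi_demo= and the buffering approach are assumptions, and this sketch doesn't hand the script any request details (such as the query).

```sh
# cgi_demo SCRIPT: run a hypothetical awk serve_cgi() on SCRIPT and print
# the response it would send.
cgi_demo() {
  cgi_prog=$(mktemp) || return 1
  cat > "$cgi_prog" <<'EOF'
function respond(code, meta) {
    printf "%s %s\r\n", code, meta
}
# Hypothetical serve_cgi(): buffer the script's output and only pass it on
# if the script exits successfully; otherwise answer 42.
function serve_cgi(path,    q, cmd, line, n, i, body) {
    q = path
    gsub(/'/, "'\\\\''", q)          # single-quote the path for the shell
    cmd = "'" q "'"
    n = 0
    while ((cmd | getline line) > 0)
        body[++n] = line
    if (close(cmd) != 0) {           # nonzero: the script failed
        respond(42, "CGI error.")
        exit 42
    }
    for (i = 1; i <= n; i++)         # the script wrote its own header
        print body[i]
    exit 0
}
BEGIN { serve_cgi(script) }
EOF
  cgi_rc=0
  awk -v script="$1" -f "$cgi_prog" || cgi_rc=$?
  rm -f "$cgi_prog"
  return "$cgi_rc"
}
```

Buffering the whole output costs memory, but it's what lets edwin send a clean =42= instead of half a response when a script dies partway through.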
# TODO TLS
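Here's a sketch of the =socat= wrapper the Foreword mentions: socat terminates TLS, forks a child per connection, and hands each connection to awk on stdin and stdout. Everything in it is an assumption: =edwin.awk= stands for whatever file the awk above gets tangled into, the function names are made up, and the certificate and key are files you'd generate yourself. ~verify=0~ tells socat not to demand a client certificate, which most Gemini requests won't carry.

```sh
# Hypothetical wrapper; none of these names are defined by edwin yet.
EDWIN_PORT=${EDWIN_PORT:-1965}

# edwin_listen_addr PORT CERT KEY: socat's TLS listening address.
# fork gives each connection its own child; verify=0 lets clients
# connect without presenting a certificate.
edwin_listen_addr() {
  printf 'OPENSSL-LISTEN:%s,reuseaddr,fork,cert=%s,key=%s,verify=0' "$1" "$2" "$3"
}

# edwin_exec_addr HOST ROOT SCRIPT: run the awk core for one connection.
# socat splits EXEC's command on spaces, so these values must not contain
# spaces (or the commas and colons socat treats specially).
edwin_exec_addr() {
  printf 'EXEC:awk -v HOST_NAME=%s -v BASE_PATH=%s -v DEFAULT_MIME=text/gemini -f %s' "$1" "$2" "$3"
}

# edwin_serve CERT KEY HOST ROOT SCRIPT
edwin_serve() {
  socat "$(edwin_listen_addr "$EDWIN_PORT" "$1" "$2")" \
        "$(edwin_exec_addr "$3" "$4" "$5")"
}
```

Usage would be something like ~edwin_serve cert.pem key.pem example.com /srv/gemini ./edwin.awk~. =BASE_PATH= should have no trailing slash, since request paths already start with one.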