Proposed minor spec changes, for comment.

solderpunk solderpunk at SDF.ORG

Mon May 18 21:35:44 BST 2020

- - - - - - - - - - - - - - - - - - -

Ahoy!

The three month spec freeze announced, well, almost three months ago, will be expiring soon. Things to ponder/discuss have been piling up. So, I've been considering dealing with some of the "low hanging fruit" early (I have some time off work later this week because of a national holiday). I'm thinking in particular of fairly minor changes, where it is obvious that there is a problem in what's already specced or important functionality is missing, and where there are fairly obvious solutions.

To this end, I'm going to outline some proposals below for feedback. I *hope* that these will be pretty uncontroversial. Feedback is welcome, as always, but we have to do *something* about these issues, so if you really think what I propose below is a bad idea, a better alternative would be a very good thing to bring to the discussion!

Here we go, then...

ISSUE 1:

Problem: The current spec does not impose any limit on response header length. The status code and META field can be separated by arbitrarily many spaces and/or tabs. Malicious or buggy servers can hang or crash carelessly written clients by sending an infinite stream of whitespace. It's not clear *why* anybody would want to do this (a "reverse DOS attack" is not very useful!), but it's clearly a problem nevertheless.

Proposal: Redefine response headers from:

<STATUS><whitespace><META><CR><LF>

to:

<STATUS><SPACE><META><CR><LF>

i.e. exactly one space character between <STATUS> and <META>

Rationale: Allowing multiple whitespace characters of different kinds makes sense in, e.g., the link syntax of text/gemini - that has to be written and read by human content authors, so it's a good idea to accommodate different editor behaviours and different personal preferences for laying things out. But response headers are written and read by software, so there's no need to be so generous. Specifying the header format more precisely actually just makes life slightly easier for client authors. As a result of this, the maximum length of a response header becomes finite (as the lengths of <STATUS> and <META> are already well defined elsewhere).

Client authors who want to follow Postel's law won't need to make any changes here. I imagine many server authors also won't actually need to. The most probable scenarios are that no change is needed (the server already sends one space) or that a single s/\t/ / is needed.
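
For client authors who would rather be strict, the proposed format could be parsed roughly as follows. This is only a sketch under stated assumptions: the two-digit status and the 1024-byte META limit are placeholders standing in for the lengths "defined elsewhere", and `read_header` and `HeaderError` are hypothetical names, not part of any existing library.

```python
import io

# Assumption: illustrative limit only; the real value comes from the spec.
MAX_META_BYTES = 1024


class HeaderError(ValueError):
    """Raised when a header does not match <STATUS><SPACE><META><CR><LF>."""


def read_header(stream, max_meta=MAX_META_BYTES):
    """Read one response header from a binary stream and return (status, meta)."""
    # Assumed longest legal header: 2 status digits + 1 space + META + CRLF.
    limit = 2 + 1 + max_meta + 2
    # readline() stops at the limit, so endless whitespace cannot hang us.
    line = stream.readline(limit)
    if not line.endswith(b"\r\n"):
        raise HeaderError("header too long or not terminated by CRLF")
    body = line[:-2]
    status, sep, meta = body[:2], body[2:3], body[3:]
    if len(status) != 2 or not status.isdigit():
        raise HeaderError("status must be two digits")
    if sep != b" ":
        raise HeaderError("expected exactly one space after the status")
    if meta[:1] in (b" ", b"\t"):
        raise HeaderError("extra whitespace before META")
    return int(status), meta.decode("utf-8")


# Usage: any binary file-like object works, e.g. one made from a TLS socket.
status, meta = read_header(io.BytesIO(b"20 text/gemini\r\n# Hello\r\n"))
```

A client following Postel's law could instead keep accepting tabs and repeated spaces, but the length cap is worth keeping either way.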

ISSUE 2:

Problem: The spec makes a big fuss about how text/gemini is line-oriented, but does not clearly state what exactly constitutes a line. The definition of link lines includes a <CR><LF> at the end but it's not clear if that applies to all line types - or whether I even meant to do this or it was a careless error.

Proposal: Actually, it turns out this is decided for us. RFC2046, which defines the text/* MIME media type and the text/plain subtype, covers this very clearly:

---
4.1.1. Representation of Line Breaks

The canonical form of any MIME "text" subtype MUST always represent a line break as a CRLF sequence. Similarly, any occurrence of CRLF in MIME "text" MUST represent a line break. Use of CR and LF outside of line break sequences is also forbidden.

This rule applies regardless of format or character set or sets involved.
---

Since text/gemini is, well, text/gemini, it is a "text" subtype and using anything other than CRLF means we're violating the RFCs we're supposedly building on top of.

So, CRLF everywhere it is.

I propose it be mostly the server's job to handle this. Text editors on different operating systems used by content authors will use various different line break encodings which are beyond our control, so we can't really make it the author's job. Servers can translate LF to CRLF before sending content over the network. This way clients only need to handle the "canonical" format, no matter what authors do.

Rationale: Don't break foundational RFCs.

Yeah, I know, this is tedious and no fun for server authors, but, well, see above.
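
For what it's worth, the server-side translation can be quite small. The sketch below assumes the whole document is already in memory as bytes; `to_canonical_crlf` is a hypothetical helper name. It only rewrites bare LF and leaves existing CRLF pairs alone, so files saved with either convention come out the same.

```python
import re

# A "\n" that is not already preceded by "\r", i.e. a bare LF.
_BARE_LF = re.compile(rb"(?<!\r)\n")


def to_canonical_crlf(data: bytes) -> bytes:
    """Return data with every bare LF replaced by CRLF."""
    return _BARE_LF.sub(b"\r\n", data)


# Usage: mixed line endings in, uniform CRLF out.
page = to_canonical_crlf(b"# Title\nfirst line\r\nsecond line\n")
```

A server would presumably apply this only to text/* responses, since rewriting bytes in binary files would corrupt them. Bare CR line endings are not handled here.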

ISSUE 3:

Problem: There's no way to specify the (human) language a text/gemini document is written in.

Proposal: Define a new parameter for the text/gemini MIME type (alongside the previously defined `charset`) to specify language. Following the example set by HTML, it seems natural to call the parameter `lang` and to allow values as per RFC1766, e.g.:

text/gemini; charset=utf-8; lang=en
text/gemini; charset=utf-8; lang=en-US
text/gemini; charset=utf-8; lang=en-GB
text/gemini; charset=utf-8; lang=es
text/gemini; charset=utf-8; lang=fr
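
On the client side, pulling the proposed parameter out of a META value like the ones above might look something like this. It is a deliberately simplified sketch: `parse_meta` is a hypothetical name, and it does not handle every corner of MIME parameter syntax (for example, a quoted value containing a semicolon).

```python
def parse_meta(meta: str):
    """Split a MIME type string into (media_type, {parameter: value})."""
    media_type, *raw_params = [part.strip() for part in meta.split(";")]
    params = {}
    for raw in raw_params:
        if not raw:
            continue  # tolerate a trailing semicolon
        name, _, value = raw.partition("=")
        # Parameter names are case-insensitive; values are kept as written.
        params[name.strip().lower()] = value.strip().strip('"')
    return media_type.lower(), params


# Usage: read the proposed `lang` parameter, if the server sent one.
media_type, params = parse_meta("text/gemini; charset=utf-8; lang=en-GB")
language = params.get("lang")  # None when the parameter is absent
```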

Rationale: A protocol for a global network which targets human beings reading textual content as its first-class application shouldn't be Anglocentric! Gemini already has:

* A Spanish-only server at gemini://gagarin.p4g.club
* An auditory browser which is/will be language aware
* A search engine which will eventually become more difficult to use without the ability to limit searches to target languages.

This looks a bit scary at first from an extensibility point of view, because it does kind of open the door to defining all sorts of additional parameters. However, the pre-existing MIME RFCs we're leveraging here make it pretty clear that (i) these things aren't open-ended, each MIME type and subtype has a fixed and finite set of defined parameters, and (ii) that only certain kinds of semantic information are really appropriate here. So this is about as safe as extensibility gets.

ISSUE 4:

Problem: Name-based virtual hosting is explicitly described as being supported in the spec, but no mention is made of SNI (Server Name Indication, a TLS extension which puts the desired server hostname in the TLS handshake). Without this, virtual hosting can't be made to work reliably.

Proposal: Mandate use of SNI by clients.

Rationale: Earlier I proposed speccing that clients SHOULD use SNI but requiring that servers be robust against its absence, by assigning a default hostname. Upon more thought, this won't work. I was thinking about how a missing Host: header was handled in this situation in HTTP, where a default host works just fine. But with TLS involved, this is a problem: if the default host is not the one the client has actually requested, the default certificate's Common Name and Subject Alternative Names won't match what the client expects and the certificate will be rejected. So, I think we just have to require SNI.

If you're a client developer, please check whether or not the TLS library you are using supports SNI! If not, let me know. I imagine in this day and age they all will, so this won't be a burdensome requirement.
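
As one example of such a check, here is roughly how it could be done with Python's standard `ssl` module, where passing `server_hostname` is what causes SNI to be sent. The helper names (`make_context`, `client_hello_for`, `connect`) are hypothetical, and switching off CA verification is purely an assumption to keep the sketch short; a real client needs its own certificate policy.

```python
import socket
import ssl


def make_context() -> ssl.SSLContext:
    """Build a client-side TLS context for the sketch."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Assumption: certificate validation is out of scope here, so it is
    # disabled. check_hostname must be cleared before verify_mode.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def client_hello_for(hostname: str) -> bytes:
    """Return the first handshake bytes the library would send, offline."""
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = make_context().wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: no server reply is ever fed in
    return outgoing.read()


def connect(hostname: str, port: int) -> ssl.SSLSocket:
    """Open a TLS connection that names the desired host via SNI."""
    raw = socket.create_connection((hostname, port), timeout=10)
    return make_context().wrap_socket(raw, server_hostname=hostname)


# Usage: confirm library support, then look for the name in the handshake.
sni_supported = ssl.HAS_SNI
hello = client_hello_for("example.org")
```

If the hostname shows up in the bytes returned by `client_hello_for`, the library is putting it in the handshake, which is what a virtual-hosting server needs in order to choose the right certificate.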

That's it!

Cheers,
Solderpunk