The spec says, as of v0.14.3, November 29th 2020, under # 2 Gemini requests:

<URL> is a UTF-8 encoded absolute URL, including a scheme, of maximum length 1024 bytes.

This is wrong.

What's wrong is URL. It should read IRI instead, to make it consistent with UTF-8. A URL cannot be UTF-8, while an IRI can. As UTF-8 precedes URL in the sentence, UTF-8 has to take precedence.

I suggest the following correction:

<URL> is a UTF-8 encoded absolute IRI, including a scheme, of maximum length 4,096* bytes.
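To make the URL/IRI distinction above concrete, here is a minimal sketch, not anything taken from the spec. It assumes a hypothetical host `example.org` and a hypothetical helper `describe`, and it only handles non-ASCII characters in the path (a non-ASCII host name would need IDNA handling instead of percent-encoding). It shows an IRI carrying a raw non-ASCII character, the percent-encoded ASCII-only form of the same address, and how each measures against the 1024-byte limit quoted from the spec.

```python
from urllib.parse import quote

# Limit quoted from spec v0.14.3; the name is an assumption for this sketch.
MAX_REQUEST_BYTES = 1024


def describe(target: str) -> dict:
    """Report whether a request target is ASCII-only and how long it is in UTF-8."""
    encoded = target.encode("utf-8")
    return {
        "ascii_only": target.isascii(),
        "utf8_bytes": len(encoded),
        "within_limit": len(encoded) <= MAX_REQUEST_BYTES,
    }


# Hypothetical IRI: the path contains a raw non-ASCII character.
iri = "gemini://example.org/café"

# Percent-encode the UTF-8 bytes of non-ASCII characters, leaving
# URL delimiters and existing percent signs untouched.
uri = quote(iri, safe=":/?#[]@!$&'()*+,;=%")

print(uri)
print(describe(iri))
print(describe(uri))
```

The byte length is measured after UTF-8 encoding, so the two forms of the same address count differently against the limit.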
On Sat Dec 26, 2020 at 2:34 AM CET, Petite Abeille wrote:

> The spec says, as of v0.14.3, November 29th 2020, under # 2 Gemini
> requests:
>
> <URL> is a UTF-8 encoded absolute URL, including a scheme, of maximum
> length 1024 bytes.
>
> This is wrong.
>
> What's wrong is URL. It should read IRI instead, to make it consistent
> with UTF-8. A URL cannot be UTF-8, while an IRI can. As UTF-8 precedes
> URL in the sentence, UTF-8 has to take precedence.

I've already made it very clear in the main [spec] thread for this topic that whether we adopt IRIs or not, the use of "UTF-8 encoded URL" should be fixed, as it is likely to cause confusion. Rest assured, when the decision is made, this will be fixed accordingly. There's no need to split the thread over this.

It's not true that URLs cannot be UTF-8. In fact, the opposite is true. The characters valid in a URL are a subset of those which can be represented in ASCII, and all byte strings which are valid ASCII are also valid UTF-8, with equivalent decodings. Hence, *every* URL is UTF-8. So there. :p

The idea that, as a general principle, when an existing sentence in the spec is ambiguous or inconsistent, the problem should be resolved by granting absolute priority to whichever term occurs first in the current form of the sentence is, plainly, absurd.

It's probably not a hill I'll choose to die on when it comes time to update the spec based on the IRI decision, but personally I reject the modern notion that the URL/URN (or IRL/IRN) distinction is not important and thus everything should be specified at maximum generality as a URI (or IRI). The difference matters and protocols/formats should choose appropriately. The practical question of how to handle a text/gemini document becomes *considerably* murkier when link lines contain URNs (of course, you and I know this quite well already).

Cheers,
Solderpunk
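The ASCII-subset argument in the message above can be checked mechanically. The following is a small sketch, using a hypothetical host `example.org` and hypothetical helper names: it confirms that the bytes of an ASCII-only URL decode identically as ASCII and as UTF-8, and that the converse fails for a UTF-8 string containing a non-ASCII character.

```python
def decodes_as(raw: bytes, encoding: str) -> bool:
    """Return True if the byte string is valid in the given encoding."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical ASCII-only URL: every byte is below 0x80.
url_bytes = "gemini://example.org/docs/spec.gmi".encode("ascii")

# Valid ASCII is valid UTF-8, and both decodings yield the same text.
same_text = url_bytes.decode("ascii") == url_bytes.decode("utf-8")

# Hypothetical IRI with a non-ASCII character: valid UTF-8, not valid ASCII.
iri_bytes = "gemini://example.org/café".encode("utf-8")

print(decodes_as(url_bytes, "ascii"), decodes_as(url_bytes, "utf-8"), same_text)
print(decodes_as(iri_bytes, "ascii"), decodes_as(iri_bytes, "utf-8"))
```

This only demonstrates the encoding relationship; it says nothing about whether a given byte string is a syntactically valid URL.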
> On Dec 26, 2020, at 16:01, Solderpunk <solderpunk at posteo.net> wrote:
>
> Hence, *every* URL is UTF-8. So there. :p

A subset thereof :D

> plainly, absurd.

Yes. And yet, we need a bit of formalism in the spec.

> (of course, you and I know this quite well already).

Indeed. The rabbit hole is deep.