[spec] Limit valid encodings of text/gemini to UTF-8

On Tue, Dec 29, 2020 at 10:11 AM Petite Abeille
<petite.abeille at gmail.com> wrote:
> > On Dec 29, 2020, at 10:03, Sean Conner <sean at conman.org> wrote:
> >
> >  Per this wording, any client that receives "text/plain; charset=us-ascii"
> > is allowed to just drop it on the floor and do absolutely nothing with it.
> Nonsense. A compliant client MUST support UTF-8. US-ASCII is a strict
> subset of UTF-8. Therefore a compliant client supports US-ASCII
> out-of-the-box.  Nothing more, and nothing less.

A car contains people. Therefore people are cars.

Petite, you are confusing Is-A and Has-A relationships [1][2]. UTF-8
is a character encoding, separate from US-ASCII, that contains the
ASCII charset. If the spec said "clients MUST support ONLY UTF-8" then
any page specifying "charset=us-ascii" would have to result in an error.

[1] https://en.wikipedia.org/wiki/Is-a
[2] https://en.wikipedia.org/wiki/Has-a

Back to a more productive topic: the wording in the spec, "clients
MUST support UTF-8 encoded responses", is ambiguous. It doesn't
actually mean that the acceptable values for "charset" must include
"utf-8", and it says nothing about which values of "charset" are
acceptable. It says that clients must at the very least try to decode
the response using a UTF-8 decoder. Responses encoded with US-ASCII
and UTF-8 (and UTF-PETER, which is a random subset of UTF-8) will
indeed work.
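
To make the decoding point concrete, here is a minimal sketch in Python.
The helper name try_decode_utf8 and the sample bodies are made up for
illustration and are not taken from any client. It shows only that a
UTF-8 decoder handles US-ASCII bytes unchanged, while bytes in some
other encoding may fail to decode.

```python
from typing import Optional


def try_decode_utf8(body: bytes) -> Optional[str]:
    """Decode a response body with a UTF-8 decoder, ignoring any charset label.

    Hypothetical helper: returns None when the bytes are not valid UTF-8.
    """
    try:
        return body.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Sample bodies, invented for this illustration.
ascii_body = "Hello, Gemini!".encode("ascii")
utf8_body = "Héllo, Gemini!".encode("utf-8")
latin1_body = "Héllo, Gemini!".encode("latin-1")

decoded_ascii = try_decode_utf8(ascii_body)    # every ASCII byte is valid UTF-8
decoded_utf8 = try_decode_utf8(utf8_body)      # plain UTF-8 round-trips
decoded_latin1 = try_decode_utf8(latin1_body)  # byte 0xE9 followed by "l" is not valid UTF-8
```

Whether a client should also look at the "charset" label before
decoding is exactly what the current wording leaves open.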

Looking at the latest stats on
gemini://gemini.bortzmeyer.org/software/lupa/stats.gmi, it looks like
UTF-8 (this includes unspecified charsets, which per spec default to
UTF-8) is used by 81% of pages, while US-ASCII accounts for 17%.

Given this, I suggest the spec be rephrased such that it instead
specifies minimum acceptable values of "charset" (specifically
us-ascii and utf-8).
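
As a rough sketch of what such a rule could look like on the client
side (Python; the names ACCEPTED_CHARSETS, charset_of and is_acceptable
are hypothetical, and the parameter parsing is deliberately simplistic
rather than a full MIME parser):

```python
# Assumed minimum set, following the suggestion above; not spec text.
ACCEPTED_CHARSETS = {"utf-8", "us-ascii"}


def charset_of(meta: str) -> str:
    """Return the lowercased charset parameter of a MIME type string.

    Falls back to "utf-8" when no charset parameter is present, matching
    the default mentioned earlier in this message.
    """
    _mime_type, *params = meta.split(";")
    for param in params:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip('"').lower()
    return "utf-8"


def is_acceptable(meta: str) -> bool:
    """True if the declared (or defaulted) charset is in the minimum set."""
    return charset_of(meta) in ACCEPTED_CHARSETS
```

With this, "text/plain; charset=us-ascii" and a bare "text/gemini" are
both accepted, while "text/gemini; charset=iso-8859-1" is not, so the
client's behaviour follows from the declared charset rather than from
what the decoder happens to tolerate.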
