Statement of intent regarding document encodings

Drew DeVault <sir (a) cmpwn.com>

I am stopping by to clarify another interpretation of the Gemini spec
made by my implementations.

https://git.sr.ht/~sircmpwn/gmni
https://git.sr.ht/~sircmpwn/kineto

With respect to the charset parameter of the document mimetype for
text/gemini documents, it is our intention to ONLY support UTF-8, and to
raise an error if any other content encoding is specified.

We won't refuse other text/* documents with an arbitrary encoding,
though we won't display them - we'll just let the user download them.
All text/gemini documents are new, and can be expected to be written in
a sane text encoding. Other text documents may use other encodings for
historical reasons, and therefore will not be refused outright.
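
A minimal sketch of the policy described above, written in Python for illustration and not taken from gmni or kineto. The names parse_meta and classify and the three outcome strings are hypothetical. It assumes the spec's rule that an absent charset (or an empty META) defaults to UTF-8, and it assumes that other text/* documents in UTF-8 are displayed; non-text types are outside the message and are simply passed to download here.

```python
def parse_meta(meta: str) -> tuple[str, dict[str, str]]:
    """Split a Gemini success META value into (mimetype, parameters)."""
    parts = [p.strip() for p in meta.split(";")]
    # An empty META is treated as text/gemini with the default charset.
    mimetype = parts[0].lower() or "text/gemini"
    params = {}
    for part in parts[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip().lower()] = value.strip().strip('"').lower()
    return mimetype, params


def classify(meta: str) -> str:
    """Return 'render', 'download', or 'error' (hypothetical outcome names)."""
    mimetype, params = parse_meta(meta)
    charset = params.get("charset", "utf-8")  # absent charset means UTF-8
    if mimetype == "text/gemini":
        # text/gemini in anything but UTF-8 is refused outright.
        return "render" if charset == "utf-8" else "error"
    if mimetype.startswith("text/"):
        # Other text/* in a legacy encoding is not refused, only not displayed.
        return "render" if charset == "utf-8" else "download"
    # Assumption: non-text types are not covered by the message above.
    return "download"
```

For example, classify("text/gemini; charset=iso-8859-1") gives "error", while classify("text/plain; charset=koi8-r") gives "download".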

It's 2020, and I have a zero tolerance policy for dumb encodings.

Philip Linde <linde.philip (a) gmail.com>

On Tue, 10 Nov 2020 12:47:07 -0400
"Drew DeVault" <sir at cmpwn.com> wrote:

> With respect to the charset parameter of the document mimetype for
> text/gemini documents, it is our intention to ONLY support UTF-8, and to
> raise an error if any other content encoding is specified.

I have opted to only support UTF-8 in my client, as well. This is
allowed by the spec. I'm not sure that I've ever come across a document
in Gemini space that uses a different encoding.

Allowing alternative encodings is a shortcoming of the spec IMO. If you
want to serve old documents, you can transcode them in advance. If your
client needs to render to a device that only supports some
non-utf8/ascii encoding (say, an old terminal), let it do the
transcoding from UTF-8 to its preferred encoding rather than burdening
every other client author with that problem.
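
Both directions of transcoding mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, ISO-8859-1 stands in for whatever legacy encoding the old document or old terminal actually uses, and unencodable characters are replaced with "?" on output as one possible choice.

```python
def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Server side: convert a legacy document to UTF-8 before serving it."""
    return data.decode(source_encoding).encode("utf-8")


def encode_for_terminal(text: str, terminal_encoding: str) -> bytes:
    """Client side: convert decoded UTF-8 text for a legacy output device.

    Characters the device cannot represent become '?' (an assumed policy).
    """
    return text.encode(terminal_encoding, errors="replace")


if __name__ == "__main__":
    legacy = "café".encode("iso-8859-1")        # an "old document"
    served = transcode_to_utf8(legacy, "iso-8859-1")
    shown = encode_for_terminal(served.decode("utf-8"), "iso-8859-1")
    print(served, shown)
```

Only the client with the unusual output device pays the cost of the second function; every other client handles UTF-8 alone.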

-- 
Philip

Sean Conner <sean (a) conman.org>

It was thus said that the Great Drew DeVault once stated:
> I am stopping by to clarify another interpretation of the Gemini spec
> made by my implementations.
> 
> https://git.sr.ht/~sircmpwn/gmni
> https://git.sr.ht/~sircmpwn/kineto
> 
> With respect to the charset parameter of the document mimetype for
> text/gemini documents, it is our intention to ONLY support UTF-8, and to
> raise an error if any other content encoding is specified.

  You may want to revisit that decision and allow US-ASCII as well.  It's a
strict subset of UTF-8, and about half the text pages return that encoding:

	https://portal.mozz.us/gemini/gus.guru/statistics
	(bottom of page, by charset)
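
  The subset relationship can be sketched in Python: every byte sequence that is valid US-ASCII decodes to the same text under UTF-8, so a UTF-8-only client can accept both labels with one decoder. The function name decode_body and the accepted-label set are hypothetical, and other aliases (such as "utf8" or "ascii") are deliberately not handled in this sketch.

```python
ACCEPTED_CHARSETS = {"utf-8", "us-ascii"}  # hypothetical allow-list


def decode_body(body: bytes, charset: str) -> str:
    """Decode a text/gemini body, accepting US-ASCII as a subset of UTF-8."""
    if charset.strip().lower() not in ACCEPTED_CHARSETS:
        raise ValueError(f"unsupported charset: {charset}")
    # Valid ASCII bytes are valid UTF-8 and decode to the same characters,
    # so a single UTF-8 decoder covers both labels.
    return body.decode("utf-8")


if __name__ == "__main__":
    sample = b"# Hello, Gemini\n"
    # Identical result under either codec for pure-ASCII input.
    assert sample.decode("ascii") == sample.decode("utf-8")
    print(decode_body(sample, "US-ASCII"))
```

  One consequence of this choice is that a document labelled US-ASCII but containing valid UTF-8 multibyte sequences is still decoded rather than rejected.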

  -spc

John Cowan <cowan (a) ccil.org>

+1.  Even Google does not distinguish US-ASCII from UTF-8 when it spiders
the web, which means that more than 95% of all pages count as UTF-8 (by
actual inspection, not by Content-Type: declaration).

On Tue, Nov 10, 2020 at 4:24 PM Sean Conner <sean at conman.org> wrote:

> It was thus said that the Great Drew DeVault once stated:
> > I am stopping by to clarify another interpretation of the Gemini spec
> > made by my implementations.
> >
> > https://git.sr.ht/~sircmpwn/gmni
> > https://git.sr.ht/~sircmpwn/kineto
> >
> > With respect to the charset parameter of the document mimetype for
> > text/gemini documents, it is our intention to ONLY support UTF-8, and to
> > raise an error if any other content encoding is specified.
>
>   You may want to revisit that decision and allow US-ASCII as well.  It's a
> strict subset of UTF-8, and about half the text pages return that encoding:
>
>         https://portal.mozz.us/gemini/gus.guru/statistics
>         (bottom of page, by charset)
>
>   -spc
>
>
