I am stopping by to clarify another interpretation of the Gemini spec made by my implementations:

https://git.sr.ht/~sircmpwn/gmni
https://git.sr.ht/~sircmpwn/kineto

With respect to the charset parameter of the document mimetype for text/gemini documents, it is our intention to ONLY support UTF-8, and to raise an error if any other content encoding is specified. We won't refuse other text/* documents with an arbitrary encoding, though we won't display them; we'll just let the user download them.

All text/gemini documents are new, and can be expected to be written in a sane text encoding. Other text documents may use other encodings for historical reasons, and therefore will not be refused outright. It's 2020, and I have a zero tolerance policy for dumb encodings.
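The policy above can be sketched as a small decision function that a client might run on the MIME type of a successful (status 20) response. This is a hypothetical illustration, not code from gmni or kineto: the helper names `parse_meta` and `handle_success` are invented here. Treating a missing charset as UTF-8 follows the Gemini spec's default for text/* types; rendering UTF-8 text/* documents other than text/gemini, and offering non-text types for download, are assumptions the message does not state.

```python
from __future__ import annotations


def parse_meta(meta: str) -> tuple[str, dict[str, str]]:
    """Split a META string such as 'text/gemini; charset=utf-8' into type and params."""
    mime, *raw_params = (part.strip() for part in meta.split(";"))
    params: dict[str, str] = {}
    for raw in raw_params:
        key, sep, value = raw.partition("=")
        if sep:
            params[key.strip().lower()] = value.strip().strip('"').lower()
    return mime.lower(), params


def handle_success(meta: str) -> str:
    """Return 'render' or 'download' for a 20 response, or raise ValueError."""
    mime, params = parse_meta(meta)
    # Gemini spec: text/* with no explicit charset is assumed to be UTF-8.
    charset = params.get("charset", "utf-8")
    if mime == "text/gemini":
        if charset != "utf-8":
            raise ValueError(f"refusing text/gemini with charset={charset}")
        return "render"
    if mime.startswith("text/") and charset == "utf-8":
        return "render"  # assumption: UTF-8 text/* is shown as plain text
    # Other text/* encodings are not refused, just offered for download;
    # non-text types are downloaded too (an assumption of this sketch).
    return "download"


print(handle_success("text/gemini"))                     # render
print(handle_success("text/plain; charset=ISO-8859-1"))  # download
```

Raising only for text/gemini, while merely downgrading other text/* types to a download, mirrors the distinction drawn in the message between new Gemini documents and older text files.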
On Tue, 10 Nov 2020 12:47:07 -0400
"Drew DeVault" <sir at cmpwn.com> wrote:

> With respect to the charset parameter of the document mimetype for
> text/gemini documents, it is our intention to ONLY support UTF-8, and to
> raise an error if any other content encoding is specified.

I have opted to only support UTF-8 in my client, as well. This is allowed by the spec. I'm not sure that I've ever come across a document in Gemini space that uses a different encoding.

Allowing alternative encodings is a shortcoming of the spec IMO. If you want to serve old documents, you can transcode them in advance. If your client needs to render to a device that only supports some non-utf8/ascii encoding (say, an old terminal), let it do the transcoding from UTF-8 to its preferred encoding rather than burdening every other client author with that problem.

-- 
Philip
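Both transcoding steps Philip describes are short in most languages. Below is a minimal Python sketch, assuming the server operator knows the legacy document's original encoding (Latin-1 in the example) and that replacing unrepresentable characters with '?' is acceptable on the client's narrow device. The function names are hypothetical.

```python
def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Server side: convert a legacy document to UTF-8 once, before serving it."""
    return data.decode(source_encoding).encode("utf-8")


def encode_for_terminal(utf8_text: bytes, terminal_encoding: str) -> bytes:
    """Client side: re-encode UTF-8 text for a device with a narrower charset.

    Characters the device cannot represent become '?' instead of raising.
    """
    return utf8_text.decode("utf-8").encode(terminal_encoding, errors="replace")


legacy = b"caf\xe9"                          # "café" stored as Latin-1
served = transcode_to_utf8(legacy, "latin-1")
print(served)                                # b'caf\xc3\xa9'
print(encode_for_terminal(served, "ascii"))  # b'caf?'
```

The cost of supporting a legacy encoding lands once on the publisher and once on the client with the unusual device, rather than on every client.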
It was thus said that the Great Drew DeVault once stated:
> I am stopping by to clarify another interpretation of the Gemini spec
> made by my implementations.
>
> https://git.sr.ht/~sircmpwn/gmni
> https://git.sr.ht/~sircmpwn/kineto
>
> With respect to the charset parameter of the document mimetype for
> text/gemini documents, it is our intention to ONLY support UTF-8, and to
> raise an error if any other content encoding is specified.

You may want to revisit that decision and allow US-ASCII as well. It's a
strict subset of UTF-8, and about half the text pages return that encoding:

https://portal.mozz.us/gemini/gus.guru/statistics
(bottom of page, by charset)

-spc
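Every US-ASCII byte sequence is also valid UTF-8 and decodes to the same text. Accepting the `us-ascii` label therefore costs a strictly-UTF-8 client nothing, because the same decoder handles both. The sketch below uses a hypothetical `decode_gemini_body` helper and an accepted-label set that are assumptions, not part of any existing client:

```python
ACCEPTED_CHARSETS = {"utf-8", "us-ascii"}


def decode_gemini_body(body: bytes, charset: str = "utf-8") -> str:
    """Decode a text/gemini body, accepting only UTF-8 or its US-ASCII subset."""
    label = charset.strip().strip('"').lower()
    if label not in ACCEPTED_CHARSETS:
        raise ValueError(f"unsupported charset for text/gemini: {charset}")
    # Every US-ASCII byte (0x00-0x7F) maps to the same code point in UTF-8,
    # so a single UTF-8 decoder covers both labels. A body labelled us-ascii
    # that actually contains valid UTF-8 multibyte sequences is tolerated too.
    return body.decode("utf-8")


print(decode_gemini_body(b"# Hello", "US-ASCII"))  # # Hello
```

Tolerating mislabelled UTF-8 under a us-ascii label is a design choice of this sketch. A stricter client could decode with "ascii" for that label and reject such bodies instead.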
+1. Even Google does not distinguish US-ASCII from UTF-8 when it spiders the web, which means that more than 95% of all pages are UTF-8 (by actual inspection, not by Content-Type: declaration).

On Tue, Nov 10, 2020 at 4:24 PM Sean Conner <sean at conman.org> wrote:

> You may want to revisit that decision and allow US-ASCII as well. It's a
> strict subset of UTF-8, and about half the text pages return that encoding:
>
> https://portal.mozz.us/gemini/gus.guru/statistics
> (bottom of page, by charset)
>
> -spc
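Classifying pages "by actual inspection" rather than by their declared charset can be approximated as follows. This only illustrates the idea and is not Google's method. `sniff_charset` and `utf8_compatible_share` are hypothetical names, and a body counts as UTF-8-compatible if it is pure ASCII or decodes cleanly as UTF-8.

```python
from __future__ import annotations


def sniff_charset(body: bytes) -> str:
    """Classify a body by its bytes, ignoring any declared charset."""
    if body.isascii():
        return "us-ascii"  # pure ASCII is also valid UTF-8
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        return "other"
    return "utf-8"


def utf8_compatible_share(bodies: list[bytes]) -> float:
    """Fraction of bodies a UTF-8-only client could decode as-is."""
    if not bodies:
        return 0.0
    ok = sum(sniff_charset(b) != "other" for b in bodies)
    return ok / len(bodies)


pages = [b"plain ascii", "na\u00efve".encode("utf-8"), b"caf\xe9"]
print([sniff_charset(p) for p in pages])  # ['us-ascii', 'utf-8', 'other']
print(utf8_compatible_share(pages))
```

Counting the "us-ascii" and "utf-8" buckets together is what makes ASCII pages show up as UTF-8 in such a tally.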
---