💾 Archived View for rawtext.club › ~sloum › geminilist › 000841.gmi captured on 2020-11-07 at 01:48:05. Gemini links have been rewritten to link to archived content


Proposed minor spec changes, for comment.

Sean Conner sean at conman.org

Tue May 19 01:07:53 BST 2020

- - - - - - - - - - - - - - - - - - - 

It was thus said that the Great jan6 at tilde.ninja once stated:

On Mon, May 18, 2020 at 05:03:41PM -0400, Sean Conner wrote:
What's a client to do if 'lang=' isn't there? Assume English? Assume nothing?
I'd think only the MIME type should be mandatory, and the rest would fall back to defaults when not specified...
Of course, the spec shouldn't specify what the defaults are...
A client could also attempt to auto-detect the language and prompt the user if it matters (normal text browsers will probably be indifferent, but an audio browser could ask, and search engines could warn, which would encourage authors to specify a language anyway), but that's a client-specific extra...
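To make the "only the MIME type is mandatory" idea concrete, here is a minimal Python sketch of how a client might split the MIME type from a Gemini response header into the type and its parameters. The names `parse_meta` and `content_language` are hypothetical, quoted parameter values are not handled, and a missing `lang` is deliberately returned as `None` instead of a guessed default, leaving that choice to the client.

```python
from typing import Dict, Optional, Tuple


def parse_meta(meta: str) -> Tuple[str, Dict[str, str]]:
    """Split e.g. 'text/gemini; charset=utf-8; lang=en' into type and params.

    Hypothetical helper for illustration; quoted values are not supported.
    """
    mime, *params = (part.strip() for part in meta.split(";"))
    values: Dict[str, str] = {}
    for param in params:
        key, sep, value = param.partition("=")
        if not sep:
            continue  # skip malformed parameters that have no '='
        values[key.strip().lower()] = value.strip()
    return mime.lower(), values


def content_language(meta: str) -> Optional[str]:
    """Return the lang= value, or None when the server did not send one."""
    _, params = parse_meta(meta)
    return params.get("lang")  # no default here: the client decides
```

For example, `content_language("text/gemini; lang=fr")` returns `"fr"`, while `content_language("text/gemini")` returns `None`, and an audio browser or search engine can then apply its own policy.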

I thought about autodetection: Unicode is organized into blocks, and each alphabet (script) is assigned its own block or blocks. I then realized that multiple languages share the Latin blocks. Sure, detecting Greek is easy since it has its own alphabet, but what about Spanish, French and German? They all use the same alphabet.

Nice idea, but there are some tough issues to address.
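The limitation can be shown with a few lines of Python. This sketch hard-codes a handful of Unicode block ranges (a small subset of the official block list, chosen only for illustration; `block_of` and `blocks_used` are hypothetical names) and reports which blocks the letters of a string fall into. Greek text lands in its own block, while Spanish, French and German words land in the same Latin blocks, so block membership alone cannot tell those languages apart.

```python
# A small subset of Unicode block ranges (inclusive code point bounds).
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
]


def block_of(ch: str) -> str:
    """Return the name of the (listed) block containing `ch`, else 'Other'."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "Other"


def blocks_used(text: str) -> set:
    """Collect the blocks used by the letters in `text` (ignoring spaces etc.)."""
    return {block_of(ch) for ch in text if ch.isalpha()}
```

For instance, `blocks_used("año")`, `blocks_used("garçon")` and `blocks_used("Straße")` all return `{"Basic Latin", "Latin-1 Supplement"}`, which is exactly the ambiguity described above.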

I'm not sure I see the point of the encoding part, though...
Practically everything can be converted to UTF-8 rather easily, making it a bit pointless to specify...

Think legacy documents. And not every legacy encoding scheme can round-trip through Unicode; I recall there being issues with several East Asian languages (Chinese and Japanese in particular).
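One way to test the round-trip concern for a particular legacy document is to decode its bytes into Unicode, re-encode them with the same codec, and compare. The Python sketch below does that with the standard codecs (`roundtrips` is a hypothetical helper name). It only detects loss for the specific bytes and codec given; it cannot tell you whether some other converter would map the same bytes to different Unicode characters.

```python
def roundtrips(data: bytes, encoding: str) -> bool:
    """True if `data` decodes under `encoding` and re-encodes to the same bytes.

    An unknown encoding name raises LookupError rather than returning False.
    """
    try:
        text = data.decode(encoding)          # legacy bytes -> Unicode
        return text.encode(encoding) == data  # Unicode -> legacy bytes
    except (UnicodeDecodeError, UnicodeEncodeError):
        return False  # the bytes could not be converted at all
```

A server that keeps documents in their original encoding could run such a check before deciding whether converting to UTF-8 is safe, and otherwise serve the original bytes with an explicit charset.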

Another interesting point: what specification does the lang= tag follow?

Solderpunk mentioned RFC 1766, which uses the two-letter ISO 639 codes for languages.

It should probably be encouraged to use some special-use codes too, taking ISO 639-2 (the standard specifying three-letter codes for languages) as an example:
mis, for "uncoded languages";
mul, for "multiple languages";
und, for "undetermined";
zxx, for "no linguistic content; not applicable";

I buy that.
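A client or search engine could handle the special-use codes listed above separately from ordinary language codes. The Python sketch below is illustrative only: `describe_lang` is a hypothetical helper, and its pattern is a simplified shape check (a two- or three-letter primary code plus optional hyphenated subtags), not the full RFC 1766 grammar and not a lookup against the real ISO 639 code lists.

```python
import re

# ISO 639-2 special-use codes, as quoted in the message above.
SPECIAL_USE = {
    "mis": "uncoded languages",
    "mul": "multiple languages",
    "und": "undetermined",
    "zxx": "no linguistic content; not applicable",
}

# Simplified shape check (an assumption): "en", "de-CH", "mul", ...
_TAG_RE = re.compile(r"[A-Za-z]{2,3}(?:-[A-Za-z0-9]{1,8})*")


def describe_lang(tag: str) -> str:
    """Classify a lang= value as a special-use code or an ordinary language."""
    if not _TAG_RE.fullmatch(tag):
        raise ValueError(f"not a recognised language tag: {tag!r}")
    primary = tag.split("-", 1)[0].lower()  # codes are case-insensitive
    if primary in SPECIAL_USE:
        return f"special: {SPECIAL_USE[primary]}"
    return f"language: {primary}"
```

For example, `describe_lang("zxx")` lets an audio browser know there is no speech to synthesize, while `describe_lang("und")` tells a search engine the author explicitly did not determine the language.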

-spc