💾 Archived View for rawtext.club › ~sloum › geminilist › 001107.gmi captured on 2020-11-07 at 01:59:28. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
solderpunk solderpunk at SDF.ORG
Thu May 28 19:43:21 BST 2020
- - - - - - - - - - - - - - - - - - -
Ahoy!
Let's pick this issue up again, in its own thread this time.
My original proposal was that we add a new parameter to the text/gemini media type to specify the human language a document is written in. Following the lead of RFC1766, the parameter would be called "lang" and take values based on ISO 639 language codes and ISO 3166 country codes.
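For illustration, here is a minimal sketch of how a client might read such a parameter from a response's MIME type string. The function names `parse_meta` and `document_language` are hypothetical, and the example value "en-GB" assumes RFC1766-style tags; nothing here is taken from the spec.

```python
def parse_meta(meta):
    """Split a MIME type string into (type, parameter dict)."""
    parts = [p.strip() for p in meta.split(";")]
    mime_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            # Parameter names are case-insensitive; values may be quoted.
            params[key.strip().lower()] = value.strip().strip('"')
    return mime_type, params


def document_language(meta):
    """Return the declared lang value, or None if absent or not text/gemini."""
    mime_type, params = parse_meta(meta)
    if mime_type != "text/gemini":
        return None
    return params.get("lang")


print(document_language("text/gemini; lang=en-GB"))
print(document_language("text/gemini"))
```

A missing parameter comes back as None, so the caller decides what to do when no language is given.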
As far as I recall, nobody actually objected to this as something we should do in principle; instead we just got distracted by various edge cases. But I guess I may as well ask now: does anybody think this is a *bad* idea?
The two concrete motivations for adding this were:
1. Screenreaders need to know this information to know which settings to use for their text-to-speech engine: the same letters correspond to different sounds in different languages.
2. Search engines may want to offer their users the ability to ask for results only in a particular set of languages.
Can anybody think of additional likely use cases besides these?
Since these are the main motivations, that also means that "normal clients" (i.e. for use by sighted human users) have minimal use for this information and can more or less ignore it. So, in considering the edge cases that came up, we should be thinking about screenreaders and search engines, not the stuff that most people here are presumably using day to day.
The first question was what to do if the parameter is not specified.
I was, and am, opposed to putting a default language in the spec.
In the case of a screenreader, it seems entirely sensible to me that the user of any such screenreader should be able to specify their own default based on their primary reading languages, and that the software should make it easy to change this when it is clear there is a problem. It's not really the Gemini spec's job to say anything about this.
The case of search engines is trickier, since their resulting database does not have just one user but many. This was where autodetection first came up, which some people seemed to get carried away with. Fully generalised autodetection of language is computationally expensive and it gives answers with some uncertainty. A large search engine project *may* want to think about it - the idea of clients for human users doing it as a routine response to a lack of a lang parameter is nuts.
A simpler option for search engines might simply be to interpret a user request of "only show me results in languages X" as "don't show results *known* to be in languages other than X". i.e. documents for which the language is not known are always possible search results. This is imperfect, but, well, sometimes life is.
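A minimal sketch of that filtering rule, under some assumptions: the index is a hypothetical list of (url, lang) pairs, with None meaning no lang parameter was seen, and only the primary language subtag is compared, so "en-GB" counts as "en". The name `filter_results` is made up for this example.

```python
def filter_results(results, wanted):
    """Drop only results *known* to be in a language outside `wanted`.

    results: list of (url, lang) pairs, lang is None when unknown.
    wanted:  set of lowercase primary language codes, e.g. {"en", "fi"}.
    """
    kept = []
    for url, lang in results:
        if lang is None:
            # Unknown language: always a possible search result.
            kept.append(url)
        elif lang.split("-")[0].lower() in wanted:
            kept.append(url)
    return kept


index = [
    ("gemini://a.example/", "en-GB"),
    ("gemini://b.example/", None),
    ("gemini://c.example/", "fi"),
]
print(filter_results(index, {"en"}))
```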
In short, I am not sure that the lack of specified default behaviour is a good reason not to go ahead with this.
The second question was what to do when a document contains text in multiple languages. This is a trickier question. I'd prefer not to define a new line type to handle it. We could at least allow the lang parameter to accept multiple values separated by some delimiter. It wouldn't be clear from that which parts were what, but it could at least act as a strong hint to screenreaders. Search engines could include such pages in results if any of the declared languages matched one the user had requested. Actually, perhaps that's a perfectly adequate solution, in which case this is not trickier at all.
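As a rough sketch of the multi-valued idea: the delimiter has not been chosen, so the comma used below is purely an assumption, as are the helper names `declared_languages` and `matches_request` and the choice to compare only primary language subtags.

```python
def declared_languages(lang_param, delimiter=","):
    """Split a lang parameter value into a list of language tags."""
    if not lang_param:
        return []
    return [tag.strip() for tag in lang_param.split(delimiter) if tag.strip()]


def matches_request(lang_param, wanted):
    """True if any declared language is wanted, or if none is declared."""
    tags = declared_languages(lang_param)
    if not tags:
        # Language not known: remains a possible result.
        return True
    return any(tag.split("-")[0].lower() in wanted for tag in tags)


print(declared_languages("en, fi"))
print(matches_request("en,fi", {"fi"}))
```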
There's also the question of directionality, which I think might require a separate parameter entirely. But let's focus on the language thing for now. How does the above sound to people?
Cheers,
Solderpunk