I would say, generally, that the base directionality of text is given by the script one is using, which is defined by the language tag. A language has a default script (for en-US, that's Latin), and if someone wants to use a different script, it's very easy to do so via the script part of the lang tag: for example, yi-US (which is shorthand for yi-Hebr-US, and is RTL) vs yi-Latn-US (LTR).

Nicole

On Thu, May 28, 2020 at 11:43, solderpunk <solderpunk at SDF.ORG> wrote:

> Ahoy!
>
> Let's pick this issue up again, in its own thread this time.
>
> My original proposal was that we add a new parameter to the text/gemini
> media type to specify the human language a document is written in.
> Following the lead of RFC1766, the parameter would be called "lang" and
> take values based on ISO 639 language codes and ISO 3166 country codes.
>
> As far as I recall, nobody actually objected to this as something we
> should do in principle; instead we just got distracted by various edge
> cases. But I guess I may as well ask now: does anybody think this is a
> *bad* idea?
>
> The two concrete motivations for adding this were:
>
> 1. Screenreaders need to know this information to know which settings to
> use for their text-to-speech engine: the same letters correspond to
> different sounds in different languages.
>
> 2. Search engines may want to offer their users the ability to ask
> for results only in a particular set of languages.
>
> Can anybody think of additional likely use cases besides these?
>
> Since these are the main motivations, that also means that "normal
> clients" (i.e. for use by sighted human users) have minimal use for this
> information and can more or less ignore it. So, in considering the edge
> cases that came up, we should be thinking about screenreaders and search
> engines, not the stuff that most people here are presumably using day to
> day.
>
> The first question was what to do if the parameter is not specified.
>
> I was, and am, opposed to putting a default language in the spec.
>
> In the case of a screenreader, it seems entirely sensible to me that the
> user of any such screenreader should be able to specify their own
> default based on their primary reading languages, and that the software
> should make it easy to change this when it is clear there is a problem.
> It's not really the Gemini spec's job to say anything about this.
>
> The case of search engines is trickier, since their resulting database
> does not have just one user but many. This was where autodetection
> first came up, which some people seemed to get carried away with. Fully
> generalised autodetection of language is computationally expensive and
> it gives answers with some uncertainty. A large search engine project
> *may* want to think about it - the idea of clients for human users
> doing it as a routine response to a lack of a lang parameter is nuts.
>
> A simpler option for search engines might simply be to interpret a user
> request of "only show me results in languages X" as "don't show results
> *known* to be in languages other than X". i.e. documents for which the
> language is not known are always possible search results. This is
> imperfect, but, well, sometimes life is.
>
> In short, I am not sure that the lack of specified default behaviour is
> a good reason not to go ahead with this.
>
> The second question was what to do when a document contains text in
> multiple languages. This is a trickier question. I'd prefer not to
> define a new line type to handle it. We could at least allow the lang
> parameter to accept multiple values separated by some delimiter. It
> wouldn't be clear from that which parts were what, but it could at least
> act as a strong hint to screenreaders. Search engines could include
> such pages in results if any of the declared languages matched one the
> user had requested. Actually, perhaps that's a perfectly adequate
> solution, in which case this is not trickier at all.
>
> There's also the question of directionality, which I think might require
> a separate parameter entirely. But let's focus on the language thing
> for now. How does the above sound to people?
>
> Cheers,
> Solderpunk
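
A minimal sketch of the rule described at the top of this message (base direction follows from the script, and the script follows from the language tag), written in Python. The function name `base_direction` and both lookup tables are hypothetical and deliberately tiny; a real client would presumably take default scripts from the Suppress-Script fields of the IANA Language Subtag Registry or from CLDR. The tag parsing is simplified and does not handle every BCP 47 form.

```python
# Illustrative tables only (assumptions, not part of any spec).
# Default script per language, in the spirit of the registry's
# Suppress-Script field.
DEFAULT_SCRIPT = {
    "en": "Latn",
    "yi": "Hebr",
    "he": "Hebr",
    "ar": "Arab",
    "fa": "Arab",
}

# A few right-to-left scripts (ISO 15924 codes); not exhaustive.
RTL_SCRIPTS = {"Hebr", "Arab", "Syrc", "Thaa", "Nkoo"}


def base_direction(lang_tag, fallback="ltr"):
    """Guess 'ltr' or 'rtl' from a language tag such as 'yi-Latn-US'."""
    subtags = lang_tag.replace("_", "-").split("-")
    language = subtags[0].lower()

    script = None
    for sub in subtags[1:]:
        # A single-character subtag starts an extension or private-use
        # section; nothing after it is a script subtag.
        if len(sub) == 1:
            break
        # A script subtag is exactly four letters (e.g. 'Latn', 'Hebr').
        if len(sub) == 4 and sub.isalpha():
            script = sub.title()
            break

    # No explicit script: fall back to the language's default script.
    if script is None:
        script = DEFAULT_SCRIPT.get(language)

    # Unknown language and no script: let the caller choose the default.
    if script is None:
        return fallback

    return "rtl" if script in RTL_SCRIPTS else "ltr"


if __name__ == "__main__":
    for tag in ("en-US", "yi-US", "yi-Hebr-US", "yi-Latn-US"):
        print(tag, base_direction(tag))
```

With these tables, yi-US and yi-Hebr-US resolve to the same direction, while yi-Latn-US overrides the default script. Whether a Gemini client should infer direction this way, rather than from a separate parameter, is exactly the open question in the quoted message.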