-=-=-=-=-=-=-
defdefred defdefred at protonmail.com
Tue May 19 10:35:26 BST 2020
- - - - - - - - - - - - - - - - - - -
On Tuesday 19 May 2020 09:20, solderpunk <solderpunk at SDF.ORG> wrote:
I don't think it's viable for interactive user clients (especially light
and simple ones) to attempt this, but in the context of, say, a search
engine which really wants to categorise everything (which is not to say
that GUS necessarily has to shoulder this burden!), even distinguishing
languages with the same alphabet is possible by looking at bigram and
trigram frequencies if there's enough text. German text will have many
more occurrences of "lich" and "heit" than French or Spanish, etc.

Agreed, and French has éèà, Spanish has ñ¿ and German has ß :-)

Nice to have UTF-8 to display all of them in the same document...
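
To make the idea above concrete, here is a rough Python sketch that scores a text by counting a few telltale n-grams plus the characteristic letters mentioned in this thread. Only "lich", "heit" and the letters éèà, ñ¿ and ß come from the messages above; every other marker, and the names NGRAM_MARKERS, CHAR_MARKERS and guess_language, are made up for illustration and are not taken from any real corpus or from GUS. A serious detector would compare full bigram/trigram frequency profiles rather than a handful of hand-picked markers.

```python
# Hypothetical marker tables, for illustration only.
# "lich"/"heit" and the accented letters come from the thread;
# the remaining n-grams are guesses, not corpus statistics.
NGRAM_MARKERS = {
    "german": ["lich", "heit", "sch", "ung"],
    "french": ["eau", "ais", "que", "ent"],
    "spanish": ["ción", "ado", "que", "los"],
}
CHAR_MARKERS = {
    "german": "ß",
    "french": "éèà",
    "spanish": "ñ¿",
}


def guess_language(text):
    """Return the language with the most marker hits, or None if no marker matches."""
    lowered = text.lower()
    scores = {}
    for lang, markers in NGRAM_MARKERS.items():
        # Count occurrences of each marker n-gram in the text.
        score = sum(lowered.count(marker) for marker in markers)
        # Add hits for letters that are characteristic of the language.
        score += sum(lowered.count(char) for char in CHAR_MARKERS[lang])
        scores[lang] = score
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(guess_language("Die Freiheit ist natürlich wichtig, und die Straße ist schön"))
    print(guess_language("Le château est très près de l'eau, mais les élèves étaient déjà là"))
    print(guess_language("¿Dónde está el niño? La información ha llegado a los señores"))
```

As the quoted message says, this only becomes reliable when there is enough text; on a short line the counts are too small to mean much.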