💾 Archived View for rawtext.club › ~sloum › geminilist › 000854.gmi captured on 2020-10-31 at 01:52:21. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
solderpunk solderpunk at SDF.ORG
Tue May 19 08:20:16 BST 2020
- - - - - - - - - - - - - - - - - - -
On Mon, May 18, 2020 at 08:07:53PM -0400, Sean Conner wrote:
> Sure, detecting Greek is easy since they have their own alphabet, but what about Spanish, French and German? They use the same alphabet.
I don't think it's viable for interactive user clients (especially light and simple ones) to attempt this, but in the context of, say, a search engine which really wants to categorise everything (which is not to say that GUS necessarily has to shoulder this burden!), even distinguishing languages with the same alphabet is possible by looking at bigram and trigram frequencies if there's enough text. German text will have many more occurrences of "lich" and "heit" than French or Spanish, etc.
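To make the n-gram idea concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: the function names (trigram_profile, cosine, guess_language) and the SAMPLES table are hypothetical, the one-sentence reference texts are toy stand-ins for the large corpora a real classifier would need, and nothing here describes what GUS or any existing tool actually does. It builds a character-trigram frequency profile per language and picks the profile closest to the input by cosine similarity.

```python
import math
from collections import Counter

def trigram_profile(text):
    """Return relative frequencies of character trigrams in text."""
    cleaned = " ".join(text.lower().split())  # lowercase, collapse whitespace
    counts = Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))
    total = sum(counts.values())
    return {gram: n / total for gram, n in counts.items()} if total else {}

def cosine(a, b):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(value * b.get(gram, 0.0) for gram, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Toy reference texts (assumption: far too small for real use).
SAMPLES = {
    "de": "die freiheit und die sicherheit sind natürlich wichtig und "
          "wirklich möglich für die menschen und nicht nur für die regierung",
    "fr": "la liberté et la sécurité sont naturellement importantes et "
          "vraiment possibles pour les gens et pas seulement pour le gouvernement",
    "es": "la libertad y la seguridad son naturalmente importantes y "
          "realmente posibles para la gente y no solo para el gobierno",
}
PROFILES = {lang: trigram_profile(text) for lang, text in SAMPLES.items()}

def guess_language(text):
    """Return the language code whose trigram profile best matches text."""
    profile = trigram_profile(text)
    return max(PROFILES, key=lambda lang: cosine(profile, PROFILES[lang]))

if __name__ == "__main__":
    print(guess_language("die freiheit ist natürlich wirklich wichtig"))
```

With reference texts this small, the guess is only trustworthy for input that resembles the samples, which is the "if there's enough text" caveat in practice.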
> Nice idea, but there are some tough issues to address.
Yeah, this language proposal may have been poorly categorised as "quick and easy" compared to the others.
Cheers,
Solderpunk