💾 Archived View for rawtext.club › ~sloum › geminilist › 006248.gmi captured on 2021-11-30 at 19:37:34. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

[users] Language tagging does not always tell the truth

Petite Abeille petite.abeille at gmail.com

Tue Mar 30 08:16:04 BST 2021

- - - - - - - - - - - - - - - - - - - 
On Mar 30, 2021, at 09:03, Stephane Bortzmeyer <stephane at sources.org> wrote:
> but the language tagging ('zh-TW') is misleading, all the texts are in English

franc: detect the language of text

https://github.com/wooorm/franc/tree/main/packages/franc

'/usr/local/bin/franc' --ignore glg,vec --min-length 256 < '04.content.utf.txt' 2>/dev/null
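
To act on the detected language rather than just print it, the result can be compared with the declared tag. This is only a rough sketch: `primary_to_iso3` and `check_tag` are hypothetical helper names, the mapping table is a tiny illustrative subset, and it assumes the detector prints a single ISO 639-3 code such as `eng`.

```shell
#!/bin/sh
# Hypothetical helpers, not part of franc.

# Map the primary subtag of a language tag ("zh-TW" -> "zh") to an
# ISO 639-3 code. Illustrative subset only.
primary_to_iso3() {
    primary=$(printf '%s' "$1" | cut -d- -f1 | tr 'A-Z' 'a-z')
    case "$primary" in
        en) echo eng ;;
        fr) echo fra ;;
        zh) echo cmn ;;   # assumption: Mandarin is reported as "cmn"
        *)  echo unknown ;;
    esac
}

# Compare a declared tag ($1) with a detected ISO 639-3 code ($2).
check_tag() {
    declared=$1
    detected=$2
    expected=$(primary_to_iso3 "$declared")
    if [ "$expected" = "unknown" ]; then
        echo "unknown: no mapping for $declared"
    elif [ "$expected" = "$detected" ]; then
        echo "ok: $declared matches $detected"
    else
        echo "mismatch: tagged $declared, detected $detected"
    fi
}
```

For example, `check_tag zh-TW "$(franc --min-length 256 < 04.content.utf.txt)"` would flag the page quoted above if the detector answers `eng`.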

±0¢