[users] Language tagging does not always tell the truth

Petite Abeille petite.abeille at gmail.com

Tue Mar 30 08:16:04 BST 2021

- - - - - - - - - - - - - - - - - - - 
On Mar 30, 2021, at 09:03, Stephane Bortzmeyer <stephane at sources.org> wrote:
> but the language tagging ('zh-TW') is misleading, all the texts are in English

franc: detect the language of text

https://github.com/wooorm/franc/tree/main/packages/franc

'/usr/local/bin/franc' --ignore glg,vec --min-length 256 < '04.content.utf.txt' 2>/dev/null
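The detected language still has to be checked against the declared tag. franc prints an ISO 639-3 code (e.g. eng), while the tag in question is BCP 47 ('zh-TW'), so the two need a mapping before they can be compared. Here is a minimal POSIX shell sketch of that step. The helper names bcp47_to_iso639_3 and tag_matches are hypothetical, and the mapping table is a small illustrative subset, not a complete one.

```sh
# Hypothetical helper: map the primary subtag of a BCP 47 tag
# (e.g. 'zh' in 'zh-TW') to the ISO 639-3 code franc would print.
# Illustrative subset only; anything else maps to 'und'.
bcp47_to_iso639_3() {
  primary=$(printf '%s' "$1" | cut -d- -f1 | tr '[:upper:]' '[:lower:]')
  case "$primary" in
    en) echo eng ;;
    zh) echo cmn ;;   # franc reports Mandarin Chinese as 'cmn'
    fr) echo fra ;;
    de) echo deu ;;
    es) echo spa ;;
    ja) echo jpn ;;
    *)  echo und ;;   # not covered by this sketch
  esac
}

# Hypothetical helper: exit status 0 if the declared tag ($1)
# agrees with the code detected by franc ($2), non-zero otherwise.
tag_matches() {
  [ "$(bcp47_to_iso639_3 "$1")" = "$2" ]
}
```

Possible usage, assuming franc is on the PATH and prints a single code:

    detected=$(franc --min-length 256 < 04.content.utf.txt)
    tag_matches zh-TW "$detected" || echo "declared zh-TW, detected $detected"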

±0¢