💾 Archived View for noa.smol.pub › 1695139432 captured on 2023-11-04 at 11:22:00. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Thoughts on learning with texts tools

There are a few tools around in the style of Learning with Texts. Common ones are LingQ, Readlang, and Learning with Texts itself. I personally use a simple website called vocabtracker, which does much the same thing.

The general gist of these tools is to show you an article with every unknown word highlighted, let you look up definitions, and so slowly learn new words. But I think every implementation I've tried has some issues.

LingQ's big issue is that the software is expensive and clunky: it tries to do too much. I think its core works better than the alternatives, and I reckon that if they had focused on just that, they could still make a decent amount of money and be genuinely loved rather than merely put up with. But they didn't. Its main advantage is that there is already a huge library of texts with accompanying audio, which is convenient.

Learning with Texts is hardly less clunky, and it is quite inconvenient to set up, as it requires a server.

I didn't try Readlang much, but it had real issues with Chinese word separation.

Dong Chinese is exclusively for Chinese and is great at separating words. It also highlights words in time with the audio and has a nice library to work through.

Vocab tracker also sometimes has issues with word separation, as did all the others, but it's simpler and free, so I have less to complain about. LingQ seems the worst: it has community-sourced definitions but no way to community-source the correct word boundaries for a given sentence. Often the same word appears in two neighbouring sentences but counts as two different vocabulary items because of poor word splitting.

If I were making my own such tool for Chinese, I would make a few design decisions. First, good word splitting is a priority, and something like jieba[1] should help with that. But it should also be possible to highlight a string of characters and have that override the default splitting for that particular occurrence.
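A minimal Python sketch of the per-occurrence override, under some assumptions: treat the default segmentation (in practice something like `jieba.lcut(sentence)`, which returns a list of words) as a set of cut points between characters, and let a highlighted character span `[start, end)` remove any cuts inside it and add cuts at its edges. The function name `apply_override` and the example segmentation are hypothetical, not real jieba output.

```python
def cut_points(tokens: list[str]) -> set[int]:
    """Character offsets where the default segmentation places a word boundary."""
    cuts = {0}
    pos = 0
    for tok in tokens:
        pos += len(tok)
        cuts.add(pos)
    return cuts


def apply_override(tokens: list[str], start: int, end: int) -> list[str]:
    """Re-segment so that characters [start, end) form exactly one word.

    `tokens` is the default split of a single sentence (e.g. from jieba).
    Only this sentence's tokens change, so the override is per occurrence.
    """
    text = "".join(tokens)
    if not 0 <= start < end <= len(text):
        raise ValueError("highlight must be a non-empty span inside the sentence")
    # Drop boundaries strictly inside the highlight, force boundaries at its edges.
    cuts = {c for c in cut_points(tokens) if not start < c < end}
    cuts |= {start, end}
    ordered = sorted(cuts)
    return [text[a:b] for a, b in zip(ordered, ordered[1:])]


# Hypothetical bad default split of 他在图书馆看书 ("he is reading in the library").
default = ["他", "在", "图书", "馆", "看书"]
print(apply_override(default, 2, 5))  # the user highlights 图书馆
```

Working on cut points rather than on tokens means a highlight can also split a token the segmenter merged: `apply_override(["他在"], 0, 1)` gives `["他", "在"]`. Storing the `(start, end)` span against the sentence it came from keeps the override local to that occurrence.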

Second, the colours for highlighting words. Unknown words should have a neutral background colour, which fades out as a word becomes known. LingQ uses blue for unknown words.
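One way to get the fading highlight is to map the 0 (unseen) to 5 (known) scale used for known levels onto the opacity of a single neutral colour. This is a sketch only: the grey value, the linear fade, and the name `highlight_css` are assumptions for illustration.

```python
KNOWN_LEVEL = 5
NEUTRAL_RGB = (200, 200, 200)  # hypothetical neutral grey


def highlight_css(level: int) -> str:
    """CSS background for a word: fully opaque when unknown, transparent when known."""
    level = max(0, min(KNOWN_LEVEL, level))       # clamp to the 0..5 scale
    alpha = (KNOWN_LEVEL - level) / KNOWN_LEVEL   # linear fade from 1.0 down to 0.0
    r, g, b = NEUTRAL_RGB
    return f"rgba({r}, {g}, {b}, {alpha:.2f})"


for lvl in range(KNOWN_LEVEL + 1):
    print(lvl, highlight_css(lvl))
```

Keeping one hue and varying only opacity means the page reads as "how much is left to learn" at a glance, without the reader having to memorise a colour per level.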

Each word should have a default dictionary entry, which the user can edit as a free-text field. The edited entry is then shown for every occurrence of the word.
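A minimal sketch of that lookup order, assuming entries are keyed by the word itself (so one edit applies everywhere the word appears). The `Glossary` class and the sample entry are hypothetical.

```python
class Glossary:
    """Default dictionary entries, overridden by the user's own free-text notes."""

    def __init__(self, defaults: dict[str, str]) -> None:
        self._defaults = defaults
        self._user: dict[str, str] = {}

    def edit(self, word: str, text: str) -> None:
        """Save the user's version; keyed by word, so it applies to every occurrence."""
        self._user[word] = text

    def reset(self, word: str) -> None:
        """Throw away the user's edit and fall back to the default entry."""
        self._user.pop(word, None)

    def entry(self, word: str) -> str:
        """User text if present, else the default entry, else an empty string."""
        return self._user.get(word, self._defaults.get(word, ""))


g = Glossary({"看书": "to read"})
g.edit("看书", "to read (a book); kàn shū")
print(g.entry("看书"))
```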

How should a word's known level change? Most tools require you to set it manually, or fall back on flashcards. But none of the flashcard systems I've seen handle this well: if you meet a lot of unknown words all at once, the flashcards get flooded. And of course the right level is not easy to judge manually either.

My current thought is this: clicking on an unknown word marks it as level 1, on a scale from 0 (unseen) to 5 (known). A lookup counter is also incremented every time the word is looked up. The reader can manually change the known level if they think they know the word. After that, any time the reader goes to the next page without looking up the word, it moves up the known scale. If they click for the definition five times (or potentially more, if its known level is already higher), the word gets added to flashcards.
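The scheme above can be sketched as a small state machine. This is a hypothetical Python sketch, not an existing tool: the class and method names are made up, only words already at level 1 or higher move up when the page turns, they move one level per page, and "more lookups if the level is higher" is assumed to mean one extra lookup per level above 1.

```python
from dataclasses import dataclass

UNSEEN, FIRST_LOOKUP, KNOWN = 0, 1, 5
BASE_FLASHCARD_LOOKUPS = 5  # five clicks for the definition sends a word to flashcards


@dataclass
class WordState:
    level: int = UNSEEN
    lookups: int = 0
    looked_up_this_page: bool = False
    in_flashcards: bool = False


class Tracker:
    def __init__(self) -> None:
        self.words: dict[str, WordState] = {}

    def _state(self, word: str) -> WordState:
        return self.words.setdefault(word, WordState())

    def look_up(self, word: str) -> None:
        """The reader clicked the word to see its definition."""
        s = self._state(word)
        if s.level == UNSEEN:
            s.level = FIRST_LOOKUP
        s.lookups += 1
        s.looked_up_this_page = True
        # Assumption: each level above 1 demands one extra lookup before flashcards.
        if s.lookups >= BASE_FLASHCARD_LOOKUPS + (s.level - FIRST_LOOKUP):
            s.in_flashcards = True

    def set_level(self, word: str, level: int) -> None:
        """Manual override by the reader, clamped to the 0..5 scale."""
        self._state(word).level = max(UNSEEN, min(KNOWN, level))

    def next_page(self, words_on_page: list[str]) -> None:
        """Tracked words on the finished page that were not looked up move up one level."""
        for word in set(words_on_page):
            s = self.words.get(word)
            if s is None:
                continue  # never clicked: this sketch leaves unseen words alone
            if s.level != UNSEEN and not s.looked_up_this_page:
                s.level = min(KNOWN, s.level + 1)
        for s in self.words.values():
            s.looked_up_this_page = False
```

Tying promotion to page turns rather than to a review queue means a burst of new words only reaches the flashcards if the reader keeps needing the definition, which is the flooding problem described above.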

As for audio alongside text, I'm happy for the two not to be synchronised, but having audio at all is definitely useful. I would also want to support video. Both would only work offline, though, rather than grabbing content from YouTube or other sources.

[1] https://github.com/fxsjy/jieba
