💾 Archived View for station.martinrue.com › kevinsan › 64a6d5d2ad364e4d8f307514c87601da captured on 2023-04-26 at 14:43:59. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
I need a way to efficiently filter out common dictionary words from a list of strings, ideally using existing CLI tools or libraries.
2 years ago
Might want to filter out words which are not nouns, after removing stop words. nltk can give you part-of-speech tags. · 2 years ago
The keyword for this is "stop words". You can find lists of stop words online, and then filter them out using Python or `grep -vf`. · 2 years ago
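A rough sketch of the Python side of that suggestion, for illustration only. The `STOP_WORDS` set here is a tiny hand-picked assumption, not a real published list, and `interesting_words` is a hypothetical helper name; in practice you would load a downloaded stop-word list from a file, one word per line.

```python
import re

# Assumption: a tiny hand-picked stop-word set for illustration only.
# A real list would be loaded from a downloaded file, one word per line.
STOP_WORDS = {"a", "an", "the", "to", "of", "in", "for", "and", "how", "with", "on", "is"}

def interesting_words(title, stop_words=STOP_WORDS):
    """Return the lowercased words of `title` that are not stop words, in order."""
    # Split on anything that is not a letter, digit, or apostrophe.
    words = re.findall(r"[a-z0-9']+", title.lower())
    return [w for w in words if w not in stop_words]

titles = [
    "How to Filter Words with Python",
    "An Introduction to the Gemini Protocol",
]
for title in titles:
    print(interesting_words(title))
```

Using a set keeps each membership check constant-time, so this should stay fast even with a stop-word list of a few thousand entries.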
Grab a word frequency list off somewhere (I think I've found them on Wikipedia at times), plop into a file, read it into a set in Python, match vs that? · 2 years ago
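Something like the following might work for the frequency-list idea. It assumes the list is plain text with one `word count` pair per line, which may not match whatever file you actually find; `load_common_words` and the `top_n` cutoff are made-up names for this sketch, and the sample data is invented.

```python
import io

def load_common_words(lines, top_n):
    """Read 'word count' lines and return the top_n most frequent words as a set."""
    counts = []
    for line in lines:
        parts = line.split()
        # Skip blank or malformed lines rather than failing on them.
        if len(parts) < 2 or not parts[1].isdigit():
            continue
        counts.append((parts[0].lower(), int(parts[1])))
    counts.sort(key=lambda pair: pair[1], reverse=True)
    return {word for word, _ in counts[:top_n]}

# Invented sample data standing in for a real frequency file;
# with a real file you would pass open("freq.txt") instead.
sample = io.StringIO("the 500\nof 300\npython 12\nand 280\ngemini 3\n")
common = load_common_words(sample, top_n=3)

strings = ["the", "python", "and", "gemini", "of"]
kept = [s for s in strings if s.lower() not in common]
print(kept)
```

The `top_n` cutoff is the knob for "most common English words": a small value throws out only the most frequent words, while a larger one starts removing words you may consider interesting.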
@ethereal I have used aspell in the past, but its dictionaries are too comprehensive. I just want to throw out the most common English words. I'm looking at nltk and TextBlob in Python at the moment. I'm trying to extract 'interesting words' from bookmark titles. · 2 years ago
Don't most Linux distros ship with a list of dictionary words, primarily for spellchecking? Maybe you could start from there. · 2 years ago