💾 Archived View for station.martinrue.com › kevinsan › 64a6d5d2ad364e4d8f307514c87601da captured on 2022-07-16 at 18:51:56. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
I need a way to efficiently filter out common dictionary words from a list of strings, ideally using existing CLI tools or libraries.
10 months ago
You might also want to filter out words that are not nouns, after removing stop words. nltk can give you part-of-speech tags. · 10 months ago
The keyword for this is "stop words". You can find lists of stop words online, and then filter them out using Python or `grep -vf`. · 10 months ago
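A minimal Python sketch of the stop-word approach, doing per word roughly what `grep -vf` does per line. The tiny `STOP_WORDS` set and the `interesting_words` helper are illustrative assumptions, not from any particular library; a real list would be loaded from one of the stop-word files found online.

```python
import re

# Tiny illustrative stop-word set (an assumption); a real list would be
# read from a file, one word per line.
STOP_WORDS = {"a", "an", "and", "for", "how", "in", "of", "the", "to", "with"}


def interesting_words(title, stop_words=STOP_WORDS):
    """Return the words of `title` that are not stop words, lowercased."""
    # Keep runs of letters, digits, and apostrophes; drop everything else.
    words = re.findall(r"[a-z0-9']+", title.lower())
    return [w for w in words if w not in stop_words]


if __name__ == "__main__":
    print(interesting_words("How to Filter Stop Words with Python"))
```

Using a set keeps each lookup cheap, so this should scale fine to a long list of bookmark titles.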
Grab a word frequency list from somewhere (I think I've found them on Wikipedia at times), plop it into a file, read it into a set in Python, and match against that? · 10 months ago
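A rough sketch of the frequency-list idea, assuming each line of the list looks like `word count` separated by whitespace (real lists vary, so the parsing would need adjusting). The names `load_common_words` and `drop_common` are hypothetical helpers made up for illustration.

```python
def load_common_words(lines, top_n=1000):
    """Build a set of the `top_n` most frequent words.

    Assumes each line is "word count"; blank or malformed lines are skipped.
    """
    counts = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2 or not parts[1].isdigit():
            continue  # skip blank or malformed lines
        counts.append((parts[0].lower(), int(parts[1])))
    # Sort by count, highest first, and keep only the top_n words.
    counts.sort(key=lambda pair: pair[1], reverse=True)
    return {word for word, _ in counts[:top_n]}


def drop_common(words, common):
    """Return the words that are not in the `common` set (case-insensitive)."""
    return [w for w in words if w.lower() not in common]


if __name__ == "__main__":
    sample = ["the 500", "of 300", "gemini 2", "and 250"]
    common = load_common_words(sample, top_n=3)
    print(drop_common(["The", "Gemini", "protocol"], common))
```

With a real list saved to a file, passing the open file object as `lines` should work the same way, and `top_n` controls how aggressively common words are thrown out.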
@ethereal I have used aspell in the past, but its dictionaries are too comprehensive. I just want to throw out the most common English words. I'm looking at nltk and TextBlob in Python at the moment. I'm trying to extract 'interesting words' from bookmark titles. · 10 months ago
Don't most Linux distros ship with a list of dictionary words, primarily for spellchecking? Maybe you could start from there. · 10 months ago