“Time flies like an arrow, but fruit flies like a banana.”

Noise words.

That's what I'm working on right now. Noise words.

Not words like clang or pththththt, but words that can be ignored in Natural Language Processing. Interesting problem. Words like “the” and “a” can be stripped as noise words. But what else? And does frequency of occurrence count?

Conjunctions, interjections, and maybe prepositions can be cut. Maybe.
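
To make that concrete, here is a rough sketch in Python. It assumes a tiny, hand-picked noise list (articles, conjunctions, a few prepositions); the list and the function names are made up for illustration, not taken from any real word list. It strips the noise words and also counts raw frequencies, to see whether frequency alone would pick out the same words.

```python
import re
from collections import Counter

# Hypothetical, hand-picked noise words: articles, conjunctions,
# and a few prepositions. A real list would be much longer.
NOISE_WORDS = {"a", "an", "the", "and", "but", "or", "of", "in", "on", "to"}

def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def strip_noise(tokens, noise=NOISE_WORDS):
    """Drop every token that appears in the noise word set."""
    return [t for t in tokens if t not in noise]

def word_frequencies(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)

text = "Time flies like an arrow, but fruit flies like a banana."
tokens = tokenize(text)
kept = strip_noise(tokens)

print(kept)
print(word_frequencies(tokens).most_common(3))
```

In a sentence this short, the most frequent words are “flies” and “like”, which carry the whole joke, so frequency on its own looks like a poor guide without a much larger body of text behind it.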

Doing a quick search for precompiled word lists, I came across the Language Technology Group Helpdesk FAQ [1], which is incredible if you're into this type of thing.

[1] http://www.ltg.ed.ac.uk/helpdesk/faq/index.html
