1 upvotes, 1 direct replies (showing 1)
View submission: Unicode matching bug in AutoModerator
This explains why my attempt at a rule isn't working. I am trying to remove posts that use any non-standard Latin characters in titles. This is what I was trying:
priority: 2
title (includes, regex): ['[^\u0000-\u007f]']
moderators_exempt: false
comment: |
    Your submission has been removed. The title may only include standard Latin characters (those on your keyboard). If you wish to re-submit, please do so with only standard characters.
action: remove
action_reason: "Non-Standard Characters In Title"
---
Do you think it would work if I replaced the Unicode range with every individual character?
Comment by dequeued at 11/05/2019 at 20:13 UTC
1 upvotes, 1 direct replies
I think it's probably just the syntax you're using. Other than including the literal characters, this syntax for Unicode ranges[1] is the only one that has worked for me.
That being said, for this specific use case, I'd probably do something more like this:
--------------------------------------------------------------------------------
## not tested!
title (regex, includes): ['[^\t !-~]']
action: remove
--------------------------------------------------------------------------------
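The character class in that rule can be sanity-checked outside AutoModerator with a quick Python sketch. This assumes the pattern behaves the same under Python's `re` module as it does inside AutoModerator, which I haven't verified. The function name `has_disallowed_char` and the sample titles are made up for illustration.

```python
import re

# Same class as the rule above: match anything that is NOT a tab, a space,
# or a printable ASCII character ('!' through '~').
DISALLOWED = re.compile(r'[^\t !-~]')

def has_disallowed_char(title):
    """Return True if the title contains a character outside the allowed set."""
    return DISALLOWED.search(title) is not None

print(has_disallowed_char("Plain ASCII title, with punctuation!"))  # False
print(has_disallowed_char("Caf\u00e9 review"))                       # True (e with acute accent)
print(has_disallowed_char("It\u2019s a smart quote"))                # True (U+2019)
```

Note that the last title is flagged only because of the curly apostrophe, which is the kind of false positive the rule would produce on otherwise ordinary titles.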
Given how much smart quotes and a few other special characters are being pushed by browsers and apps these days, I think you'll be hard pressed to not add some non-ASCII stuff to that character class. These are the ones that I see the most often:
00A3 POUND SIGN
2013 EN DASH
2014 EM DASH
2019 RIGHT SINGLE QUOTATION MARK
201C LEFT DOUBLE QUOTATION MARK
201D RIGHT DOUBLE QUOTATION MARK
2026 HORIZONTAL ELLIPSIS
20AC EURO SIGN
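One way to allow those characters is to append them to the negated character class. Here is a minimal Python sketch of that idea, assuming Python `re` semantics rather than anything confirmed about AutoModerator itself. The names `EXTRA_ALLOWED` and `title_is_allowed` are hypothetical.

```python
import re

# The eight code points listed above, written as Python string escapes:
# pound sign, en dash, em dash, right single quote, left and right double
# quotes, horizontal ellipsis, euro sign.
EXTRA_ALLOWED = "\u00a3\u2013\u2014\u2019\u201c\u201d\u2026\u20ac"

# Tab, space, and printable ASCII, plus the extras. None of the extras is
# special inside a character class, so they can be appended directly.
DISALLOWED = re.compile("[^\\t !-~" + EXTRA_ALLOWED + "]")

def title_is_allowed(title):
    """Return True if every character in the title is in the allowed set."""
    return DISALLOWED.search(title) is None

print(title_is_allowed("It\u2019s \u00a35 \u2013 or \u20ac6\u2026"))  # True
print(title_is_allowed("Caf\u00e9 review"))                           # False
```

In an actual rule the same characters would have to be written in whatever form AutoModerator accepts, for example as the literal characters themselves.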