Comment by Djentleman420 on 11/05/2019 at 19:45 UTC


View submission: Unicode matching bug in AutoModerator

This explains why my attempt at a rule isn't working. I am trying to remove posts whose titles use any non-standard Latin characters. This is what I was trying:

priority: 2
title (includes, regex): ['[^\u0000-\u007f]']
moderators_exempt: false
comment: |
    Your submission has been removed. The title may only include standard Latin characters
    (those on your keyboard).

    If you wish to re-submit, please do so with only standard characters.
action: remove
action_reason: "Non-Standard Characters In Title"
---
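As a sanity check of the pattern on its own, here is a minimal Python 3 sketch. It assumes AutoModerator's `regex` modifier follows Python-style regular expressions. It only exercises the pattern with Python's `re` module (which accepts `\uXXXX` escapes in patterns from Python 3.3 on) and says nothing about how AutoModerator itself parses the rule. `has_non_ascii` is a hypothetical helper name.

```python
import re

# The rule's pattern as a raw string, so the regex engine (not Python's string
# parser) interprets the \u escapes: any character outside U+0000..U+007F.
NON_ASCII = re.compile(r"[^\u0000-\u007f]")


def has_non_ascii(title: str) -> bool:
    """Return True if the title contains any character above U+007F."""
    return NON_ASCII.search(title) is not None


# Labels are plain ASCII so the output prints cleanly in any terminal.
samples = [
    ("ascii", "Plain ASCII title"),
    ("accent", "Caf\u00e9 review"),
    ("quotes", "Smart \u201cquotes\u201d"),
]
for label, title in samples:
    print(label, has_non_ascii(title))
```

If this prints `True` for the accented and curly-quoted titles but the live rule still never fires, the difference lies in how the rule's pattern is being interpreted, not in the pattern's intent.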

Do you think it would work if I replaced the Unicode range with every individual character?

Replies

Comment by dequeued at 11/05/2019 at 20:13 UTC*


I think it's probably just the syntax you're using. Other than including the literal characters, this syntax for Unicode ranges[1] is the only one that has worked for me.

1: https://old.reddit.com/r/AutoModerator/comments/bn4u8j/unicode_matching_bug_in_automoderator/en2f8w2/

That being said, for this specific use case, I'd probably do something more like this:

--------------------------------------------------------------------------------

## not tested!
title (regex, includes): ['[^\t !-~]']
action: remove

--------------------------------------------------------------------------------
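To see what that class covers, here is a short Python sketch, again assuming Python-style regex semantics; `disallowed_chars` is a hypothetical helper. In the YAML single-quoted string the `\t` stays as a backslash plus `t`, so the regex engine reads it as a tab. The class therefore allows tab, space, and `!` through `~` (U+0021 to U+007E), and anything else, including curly quotes, is reported:

```python
import re

# Same class as the rule: NOT tab, space, or printable ASCII "!".."~".
# Raw string so the regex engine sees "\t", as it does from the YAML rule.
DISALLOWED = re.compile(r"[^\t !-~]")


def disallowed_chars(title: str) -> list:
    """Return 'U+XXXX' labels for every character the class would match."""
    return [f"U+{ord(ch):04X}" for ch in DISALLOWED.findall(title)]


print(disallowed_chars("Normal title: 100% ASCII!"))      # []
print(disallowed_chars("It\u2019s \u201cquoted\u201d"))  # ['U+2019', 'U+201C', 'U+201D']
```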

Given how heavily browsers and apps push smart quotes and a few other special characters these days, I think you'll be hard pressed not to add some non-ASCII characters to that character class. These are the ones I see most often:

 00A3   POUND SIGN
 2013   EN DASH
 2014   EM DASH
 2019   RIGHT SINGLE QUOTATION MARK
 201C   LEFT DOUBLE QUOTATION MARK
 201D   RIGHT DOUBLE QUOTATION MARK
 2026   HORIZONTAL ELLIPSIS
 20AC   EURO SIGN
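If you do decide to allow those characters, one way to build the extended class is sketched below in Python. Allowing exactly these eight, and the helper names `build_class` and `is_disallowed`, are assumptions for illustration. It inserts the extras as literal characters, since literal characters are the form described above as working in AutoModerator; the string returned by `build_class(EXTRA_ALLOWED)` is what you would adapt for the rule's `title (regex, includes)` line.

```python
import re

# Code points from the list above that we (hypothetically) choose to allow.
EXTRA_ALLOWED = [
    0x00A3,  # POUND SIGN
    0x2013,  # EN DASH
    0x2014,  # EM DASH
    0x2019,  # RIGHT SINGLE QUOTATION MARK
    0x201C,  # LEFT DOUBLE QUOTATION MARK
    0x201D,  # RIGHT DOUBLE QUOTATION MARK
    0x2026,  # HORIZONTAL ELLIPSIS
    0x20AC,  # EURO SIGN
]


def build_class(extra_code_points) -> str:
    """Negated class: tab, space, printable ASCII, plus the given extras.

    Extras are inserted as literal characters. re.escape is not strictly
    needed for these eight, but guards against class-special ones like ']'.
    """
    extras = "".join(re.escape(chr(cp)) for cp in extra_code_points)
    return "[^\\t !-~" + extras + "]"


PATTERN = re.compile(build_class(EXTRA_ALLOWED))


def is_disallowed(title: str) -> bool:
    """True if the title contains a character outside the allowed set."""
    return PATTERN.search(title) is not None


print(is_disallowed("It\u2019s \u20ac5 \u2014 a deal\u2026"))  # False: all allowed
print(is_disallowed("Caf\u00e9 review"))                       # True: U+00E9 is not
```

Keeping the extras as a list of code points makes it easy to add or drop one character without hand-editing the regex.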