💾 Archived View for rawtext.club › ~sloum › geminilist › 006160.gmi captured on 2023-11-14 at 09:34:15. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

[tech] Zero-width characters and tracking via pasted text

nervuri nervuri at disroot.org

Mon Mar 22 13:59:14 GMT 2021

- - - - - - - - - - - - - - - - - - - 

On Mon, Mar 15, 2021, Oliver Simmons wrote:

E0020: _󠀠_
... (E0020–E007F used for invisibly tagging texts by language)
E007F: _󠁿_
These *were* used for tagging texts by language, but have been
deprecated in favour of using other non-Unicode metadata for this
purpose.
They are planned to be used in emojis and are (were?) used (but not
widely supported) for country codes/flags with codes longer than 2
characters (3?), such as USA states or counties of England.
Wikipedia has a ~ok description of their history.
https://en.wikipedia.org/wiki/Tags_(Unicode_block)

Thanks, I replaced "used" with "formerly used". Wikipedia says "The release of Emoji 5.0 in March 2017 considers these characters to be emoji for use as modifiers in special sequences." I take that to mean that they will remain zero-width, but will generate emojis when used in special sequences, as with the flag of England:

🏴󠁧󠁢󠁥󠁮󠁧󠁿 = 🏴<U+E0067><U+E0062><U+E0065><U+E006E><U+E0067><U+E007F>
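
For anyone who wants to poke at this, here is a rough Python sketch of how such a sequence can be put together: the black flag (U+1F3F4), then one tag character per ASCII letter of the region code (the letter's code point plus 0xE0000), then the cancel tag U+E007F. The function names subdivision_flag and strip_tags are made up for this example, and strip_tags is only one naive way to remove tag characters from pasted text. It does nothing about other zero-width characters.

```python
TAG_BASE = 0xE0000         # start of the Unicode "Tags" block
CANCEL_TAG = 0xE007F       # terminates an emoji tag sequence
BLACK_FLAG = "\U0001F3F4"  # base emoji for subdivision flags


def subdivision_flag(code: str) -> str:
    """Build an emoji tag sequence for a region code such as 'gbeng'.

    Hypothetical helper: each ASCII character of the code is shifted
    into the Tags block, then the cancel tag is appended.
    """
    tags = "".join(chr(TAG_BASE + ord(c)) for c in code.lower())
    return BLACK_FLAG + tags + chr(CANCEL_TAG)


def strip_tags(text: str) -> str:
    """Drop every code point in the Tags block (U+E0000..U+E007F)."""
    return "".join(ch for ch in text if not 0xE0000 <= ord(ch) <= 0xE007F)


england = subdivision_flag("gbeng")

# List the code points so the invisible characters become visible.
print(" ".join(f"U+{ord(ch):04X}" for ch in england))

# Removing the tag characters leaves only the plain black flag.
print(strip_tags(england) == BLACK_FLAG)
```

Whether the result renders as a flag or as a bare black flag depends on the font and client. The invisible tag characters are in the string either way, which is what matters for the tracking question.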

Unicode keeps getting weirder.