💾 Archived View for rawtext.club › ~sloum › geminilist › 006120.gmi captured on 2023-09-28 at 16:54:52. Gemini links have been rewritten to link to archived content

[tech] Zero-width characters and tracking via pasted text

Oliver Simmons oliversimmo at gmail.com

Mon Mar 15 16:43:33 GMT 2021

- - - - - - - - - - - - - - - - - - - 

On Sun, 14 Mar 2021 at 16:55, nervuri <nervuri at disroot.org> wrote:

First, as a point of reference, here are a few positive-width Unicode
characters:
0020: _ _ | 00E9: _é_ | 03A9: _Ω_ | 5B57: _字_ | 1F407: __

All fine for me! (Gmail seems to strip emoji in plain-text replies though, which is rather odd.)

FFF9: __
FFFA: __
FFFB: __

These three show as the replacement box for me. I've never quite understood what the "interlinear annotation" characters are, but I think they're some form of control character, so having them display as a box when used incorrectly might be correct.
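
One quick way to check what these three code points are is to ask Python's standard unicodedata module for their names and general categories. This is only a sketch; the helper name describe() is made up for illustration. On a current Python it reports them as interlinear annotation anchor, separator and terminator, all in category Cf (format) rather than Cc (control), which fits the idea that they are not meant to be drawn as ordinary glyphs.

```python
import unicodedata

# The three code points quoted above.
ANNOTATION_CHARS = ["\uFFF9", "\uFFFA", "\uFFFB"]


def describe(ch):
    """Return (code point label, Unicode name, general category) for one character."""
    label = f"U+{ord(ch):04X}"
    name = unicodedata.name(ch, "<unnamed>")
    category = unicodedata.category(ch)
    return (label, name, category)


if __name__ == "__main__":
    for ch in ANNOTATION_CHARS:
        print(*describe(ch))
```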

E0020: _󠀠_
... (E0020–E007F used for invisibly tagging texts by language)
E007F: _󠁿_

These *were* used for tagging texts by language, but have been deprecated in favour of using other non-Unicode metadata for this purpose. They are planned to be used in emojis and are (were?) used (but not widely supported) for country codes/flags with codes longer than 2 characters (3?), such as US states or counties of England. Wikipedia has a ~ok description of their history:

https://en.wikipedia.org/wiki/Tags_(Unicode_block)
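
To make the flag usage concrete, here is a rough sketch of how the tag characters map onto ASCII: each one is the ASCII code point plus 0xE0000, and a flag sequence is U+1F3F4 (waving black flag), followed by the tagged region code, followed by U+E007F (cancel tag). The function names to_tags(), from_tags() and subdivision_flag() are invented for this example, and "gbeng" (England) is just one sample code. Whether the result renders as a flag depends entirely on the font and client.

```python
TAG_BASE = 0xE0000           # tag characters mirror ASCII at this offset
CANCEL_TAG = "\U000E007F"    # terminates a tag sequence
BLACK_FLAG = "\U0001F3F4"    # base character for subdivision flags


def to_tags(text):
    """Map printable ASCII (0x20-0x7E) to the matching invisible tag characters."""
    for c in text:
        if not 0x20 <= ord(c) <= 0x7E:
            raise ValueError(f"not printable ASCII: {c!r}")
    return "".join(chr(TAG_BASE + ord(c)) for c in text)


def from_tags(text):
    """Recover the ASCII payload from any tag characters found in text."""
    return "".join(
        chr(ord(c) - TAG_BASE) for c in text if 0xE0020 <= ord(c) <= 0xE007E
    )


def subdivision_flag(code):
    """Build a flag tag sequence for a lowercase region code such as 'gbeng'."""
    return BLACK_FLAG + to_tags(code.lower()) + CANCEL_TAG


if __name__ == "__main__":
    flag = subdivision_flag("gbeng")
    print([f"U+{ord(c):04X}" for c in flag])
    print(from_tags(flag))
```

The same mapping is why these characters matter for the tracking concern in this thread: from_tags("hello" + to_tags("id42")) recovers "id42" from text that looks like plain "hello" in most renderers.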

-Oliver Simmons