💾 Archived View for thingvellir.net › log › 2022-12-04.gmi captured on 2023-03-20 at 17:41:32. Gemini links have been rewritten to link to archived content


The Þog of 2022-12-04: Not-Unicode Character Encodings

Recently, I've been thinking that domain-specific one-byte code pages are nicer than troublesome UTF-8 decoding or dealing with the atrocious mess of Unicode: see homoglyph attacks, zalgo text, ZWJ sequences that combine pictographs, and so on. The only real upside I see to Unicode is that it's nearly universally supported and allows uniformly encoded, multi-language text.
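
To make a few of those complaints concrete, here is a small sketch using only Python's standard `unicodedata` module. The sample strings are illustrative assumptions, not taken from any real attack.

```python
import unicodedata

# Homoglyphs: Latin 'a' (U+0061) and Cyrillic 'а' (U+0430) look alike
# in most fonts but are different code points.
latin = "paypal"
spoofed = "p\u0430yp\u0430l"
print(latin == spoofed)  # False

# Combining characters: two spellings of 'é' that render identically.
precomposed = "\u00e9"  # single code point
combining = "e\u0301"   # 'e' followed by a combining acute accent
print(precomposed == combining)  # False
print(unicodedata.normalize("NFC", combining) == precomposed)  # True

# UTF-8 is variable-width: code point count and byte count disagree.
print(len(precomposed), len(precomposed.encode("utf-8")))  # 1 2
```

With a one-byte code page none of these cases can arise, since every byte is exactly one character and there is only one way to spell each of them.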

ASCII was designed at a time when combining characters were a useful hack and out-of-band communication with the teletype wasn't an option. Does anyone still use the ASCII separator, device control, or vertical tab characters for what they were designed for?

Example Encoding

# This is written roughly in the style of a Unicode mapping table.
# A '#' sign starts a comment, the left hex number is the source byte,
# and the right hex number is the Unicode code point for that character.

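# Printable ASCII shifted down by 0x20, with two substitutions:
# the currency sign replaces the dollar sign at 0x04,
# and line feed replaces DEL at 0x5F.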
0x00	0x0020	# Space
0x01	0x0021	# Exclamation mark
0x02	0x0022	# Double quotation mark
0x03	0x0023	# Number sign
0x04	0x00A4	# Currency sign
0x05	0x0025	# Percent sign
0x06	0x0026	# Ampersand
0x07	0x0027	# Apostrophe, Right single quotation mark
0x08	0x0028	# Left parenthesis
0x09	0x0029	# Right parenthesis
0x0A	0x002A	# Asterisk
0x0B	0x002B	# Plus sign
0x0C	0x002C	# Comma
0x0D	0x002D	# Hyphen, Minus
0x0E	0x002E	# Full stop
0x0F	0x002F	# Solidus, Forward slash
0x10	0x0030	# Digit Zero
0x11	0x0031	# Digit One
0x12	0x0032	# Digit Two
0x13	0x0033	# Digit Three
0x14	0x0034	# Digit Four
0x15	0x0035	# Digit Five
0x16	0x0036	# Digit Six
0x17	0x0037	# Digit Seven
0x18	0x0038	# Digit Eight
0x19	0x0039	# Digit Nine
0x1A	0x003A	# Colon
0x1B	0x003B	# Semicolon
0x1C	0x003C	# Less-than sign, Left angle-bracket
0x1D	0x003D	# Equals sign
0x1E	0x003E	# Greater-than sign, Right angle-bracket
0x1F	0x003F	# Question Mark
0x20	0x0040	# 'At' sign
0x21	0x0041	# Latin uppercase 'A'
0x22	0x0042	# Latin uppercase 'B'
0x23	0x0043	# Latin uppercase 'C'
0x24	0x0044	# Latin uppercase 'D'
0x25	0x0045	# Latin uppercase 'E'
0x26	0x0046	# Latin uppercase 'F'
0x27	0x0047	# Latin uppercase 'G'
0x28	0x0048	# Latin uppercase 'H'
0x29	0x0049	# Latin uppercase 'I'
0x2A	0x004A	# Latin uppercase 'J'
0x2B	0x004B	# Latin uppercase 'K'
0x2C	0x004C	# Latin uppercase 'L'
0x2D	0x004D	# Latin uppercase 'M'
0x2E	0x004E	# Latin uppercase 'N'
0x2F	0x004F	# Latin uppercase 'O'
0x30	0x0050	# Latin uppercase 'P'
0x31	0x0051	# Latin uppercase 'Q'
0x32	0x0052	# Latin uppercase 'R'
0x33	0x0053	# Latin uppercase 'S'
0x34	0x0054	# Latin uppercase 'T'
0x35	0x0055	# Latin uppercase 'U'
0x36	0x0056	# Latin uppercase 'V'
0x37	0x0057	# Latin uppercase 'W'
0x38	0x0058	# Latin uppercase 'X'
0x39	0x0059	# Latin uppercase 'Y'
0x3A	0x005A	# Latin uppercase 'Z'
0x3B	0x005B	# Left square bracket
0x3C	0x005C	# Reverse solidus, Backslash
0x3D	0x005D	# Right square bracket
0x3E	0x005E	# Caret
0x3F	0x005F	# Underscore
0x40	0x0060	# Backtick, Left single quotation mark
0x41	0x0061	# Latin lowercase 'a'
0x42	0x0062	# Latin lowercase 'b'
0x43	0x0063	# Latin lowercase 'c'
0x44	0x0064	# Latin lowercase 'd'
0x45	0x0065	# Latin lowercase 'e'
0x46	0x0066	# Latin lowercase 'f'
0x47	0x0067	# Latin lowercase 'g'
0x48	0x0068	# Latin lowercase 'h'
0x49	0x0069	# Latin lowercase 'i'
0x4A	0x006A	# Latin lowercase 'j'
0x4B	0x006B	# Latin lowercase 'k'
0x4C	0x006C	# Latin lowercase 'l'
0x4D	0x006D	# Latin lowercase 'm'
0x4E	0x006E	# Latin lowercase 'n'
0x4F	0x006F	# Latin lowercase 'o'
0x50	0x0070	# Latin lowercase 'p'
0x51	0x0071	# Latin lowercase 'q'
0x52	0x0072	# Latin lowercase 'r'
0x53	0x0073	# Latin lowercase 's'
0x54	0x0074	# Latin lowercase 't'
0x55	0x0075	# Latin lowercase 'u'
0x56	0x0076	# Latin lowercase 'v'
0x57	0x0077	# Latin lowercase 'w'
0x58	0x0078	# Latin lowercase 'x'
0x59	0x0079	# Latin lowercase 'y'
0x5A	0x007A	# Latin lowercase 'z'
0x5B	0x007B	# Left curly bracket
0x5C	0x007C	# Vertical line, Pipe
0x5D	0x007D	# Right curly bracket
0x5E	0x007E	# Tilde
0x5F	0x000A	# Line feed, Newline

# Greek letters, intended for mathematical notation
0x60	0x03B1	# Greek lowercase Alpha
0x61	0x03B2	# Greek lowercase Beta
0x62	0x03B3	# Greek lowercase Gamma
0x63	0x03B4	# Greek lowercase Delta
0x64	0x03B8	# Greek lowercase Theta
0x65	0x03BB	# Greek lowercase Lamda
0x66	0x03BC	# Greek lowercase Mu
0x67	0x03C0	# Greek lowercase Pi
0x68	0x03C4	# Greek lowercase Tau
0x69	0x03C6	# Greek lowercase Phi
0x6A	0x03C8	# Greek lowercase Psi
0x6B	0x03C9	# Greek lowercase Omega
0x6C	0x0394	# Greek uppercase Delta
0x6D	0x03A0	# Greek uppercase Pi
0x6E	0x03A3	# Greek uppercase Sigma
0x6F	0x03A9	# Greek uppercase Omega

# Punctuation, mathematical operators, and miscellaneous symbols
0x70	0x00A1	# Inverted exclamation mark
0x71	0x00BF	# Inverted question mark
0x72	0x2022	# Black bullet
0x73	0x25E6	# White bullet
0x74	0x00D7	# Multiplication sign
0x75	0x00F7	# Division sign
0x76	0x221A	# Square root
0x77	0x221E	# Infinity sign
0x78	0x263A	# Outlined smiley face
0x79	0x263B	# Filled smiley face
0x7A	0x2665	# Heart suit
0x7B	0x2666	# Diamond suit
0x7C	0x2663	# Club suit
0x7D	0x2660	# Spade suit
0x7E	0x00A7	# Section sign
0x7F	0x2588	# Full block
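
A table in the format above is straightforward to turn into a decoder. What follows is a minimal sketch in Python, not a definitive implementation: the names `parse_mapping`, `decode`, and `encode` are hypothetical, and only a short excerpt of the table is inlined to keep the example small.

```python
# Sketch: build decode/encode tables from a mapping in the format above.
# All names here are hypothetical; only an excerpt of the table is inlined.

MAPPING_EXCERPT = """\
# Excerpt only; paste the full table here for real use.
0x00\t0x0020\t# Space
0x04\t0x00A4\t# Currency sign
0x28\t0x0048\t# Latin uppercase 'H'
0x49\t0x0069\t# Latin lowercase 'i'
0x5F\t0x000A\t# Line feed, Newline
0x67\t0x03C0\t# Greek lowercase Pi
"""


def parse_mapping(text):
    """Return {source byte: Unicode character} for each non-comment line."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        src, dst = line.split()
        table[int(src, 16)] = chr(int(dst, 16))
    return table


def decode(data, table):
    """Decode bytes with one lookup per byte; unmapped bytes raise KeyError."""
    return "".join(table[b] for b in data)


def encode(text, table):
    """Encode a string; characters outside the code page raise KeyError."""
    reverse = {ch: b for b, ch in table.items()}
    return bytes(reverse[ch] for ch in text)


table = parse_mapping(MAPPING_EXCERPT)
raw = bytes([0x28, 0x49, 0x00, 0x67, 0x5F])
message = decode(raw, table)          # "Hi π" followed by a newline
assert encode(message, table) == raw  # the mapping round-trips
```

Because every byte maps to exactly one character, decoding needs no state and cannot fail partway through a multi-byte sequence. The trade-off is that any text outside the 128 chosen characters cannot be represented at all.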