💾 Archived View for gemi.dev › gemini-mailing-list › 000037.gmi captured on 2024-03-21 at 16:42:34. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
I'm seeing some people wanting to support some other formatting options like color, leading to the support of ECMA-48 [1]. I would rather NOT see this supported because of several issues.

1). Complexity. The spec is rather complex and wide ranging. There are 130 escape codes (and I'm EXCLUDING those defined by ANSI, codes 0 through 31). If you are filtering them, then five or six cases (depending on whether you want to support one non-standard encoding used by xterm) need to be considered (which is still four or five more than most would expect).

2). Security. Passing raw escape sequences can not only leave the terminal in an unknown state, but there are a few sequences that are especially alarming:

	DCS	Device Control String
	OSC	Operating System Command
	APC	Application Program Command

which, if supported, are exactly what they say on the tin, and thus, you *don't* want these to be processed *at all*. Personally, I haven't come across any terminal or program that supports these. Also, ANSI.SYS, originally for MS-DOS but maybe it still exists for Windows? (I don't know, I don't use Windows) allows one to redefine any key on the keyboard to send a new sequence. "DELTREE C:\*" anyone?

Below is the BNF (RFC-5234) for ECMA-48 control sequences:

	CSI      = %d27 '['
	         / %d155               ; ISO charset
	         / %d194 %d155         ; UTF-8

	OSC      = %d27 ']'
	         / %d157               ; ISO
	         / %d194 %d157         ; UTF-8

	ST       = %d27 '\'
	         / %d156               ; ISO
	         / %d194 %d156         ; UTF-8

	string   = %d27 ( 'P' / 'X' / '^' / '_')
	         / (%d144 / %d152 / %d158 / %d159)        ; ISO
	         / %d194 (%d144 / %d152 / %d158 / %d159)  ; UTF-8

	iso      = %d160-255
	; the utf8 rule is any UTF-8 codepoint 160 or higher

	sequence = CSI *(%d48-63) *(%d32-47) %d64-126
	         / OSC *(%d8-13 / %d32-126 / iso / utf8) (ST / %d7)
	         / string *(%d8-13 / %d32-126 / iso / utf8) ST
	         / %d27 (%d64-126)
	         / %d128-159           ; ISO
	         / %d194 (%d128-159)   ; UTF-8

If you are parsing an ISO charset (like ISO-8859-1) then remove the rules marked UTF-8; similarly, if you are parsing UTF-8, then remove the rules marked ISO.
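
For anyone who would rather filter than support these, the grammar above can be turned into a filter fairly directly. What follows is only a rough sketch in Python, under a few assumptions that are not part of the grammar itself: the text has already been decoded from UTF-8 into code points (so the two-byte form %d194 %d155 shows up as the single code point U+009B, and so on), and the names ECMA48 and strip_ecma48 are made up for illustration.

```python
import re

# Introducers and terminator, written for *decoded* text (an assumption):
# each pairs the 7-bit ESC form with its single C1 code point.
CSI = r"(?:\x1b\[|\x9b)"
OSC = r"(?:\x1b\]|\x9d)"
ST = r"(?:\x1b\\|\x9c)"
STRING = r"(?:\x1b[PX^_]|[\x90\x98\x9e\x9f])"  # DCS, SOS, PM, APC

# Payload of OSC and string sequences: codes 8-13, 32-126,
# and any code point 160 or higher.
BODY = r"[\x08-\x0d\x20-\x7e\u00a0-\U0010ffff]*"

# One alternative per branch of the 'sequence' rule, tried in this order.
ECMA48 = re.compile(
    "|".join([
        CSI + r"[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]",  # control sequence
        OSC + BODY + "(?:" + ST + r"|\x07)",           # OSC; BEL is the xterm form
        STRING + BODY + ST,                            # DCS/SOS/PM/APC string
        r"\x1b[\x40-\x7e]",                            # two-character escape
        r"[\x80-\x9f]",                                # bare C1 control
    ])
)


def strip_ecma48(text: str) -> str:
    """Return text with everything matching the grammar removed."""
    return ECMA48.sub("", text)


if __name__ == "__main__":
    print(strip_ecma48("\x1b[31mred\x1b[0m plain"))  # prints "red plain"
```

Note that this only removes what the grammar describes. An unterminated OSC or string sequence falls through to the two-character escape case, so its introducer is removed but its payload stays behind as ordinary text, and an ESC followed by something outside 64 through 126 is left alone.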
  -spc

[1] The so-called 'ANSI escape codes', which is funny because they're not defined by ANSI, but ISO, so of course they're called ECMA.