It was thus said that the Great Brian Evans once stated:
> Aaron Janse writes:
> > Hmmm. It does seem, though, that *allowing* ANSI colors would require
> > non-terminal clients to strip ANSI colors, which would be a PITA,
> > especially considering that ANSI is a hot mess (I built an ANSI parser
> > a while ago [1])
>
> Currently Bombadillo has a few different modes. The normal mode removes
> ansi escape codes. As I am parsing a document if I read an `\033` character I
> just toggle an escape code boolean and then consume until I read a A-Za-z
> character (and consume that char as well). It works very quickly and handles
> removing them quite well. I do the same thing for the color mode for any
> escape codes that do not end in `m`. That said, it may not work as well for
> people not parsing by writing characters into a buffer char by char.

  Having written an ECMA-48 parser (ECMA-48 defines the terminal control
codes everybody calls ANSI escape codes, even though they aren't defined by
ANSI), I can say that approach will probably catch 99% of the control codes
in use. But the actual definition is (in RFC 5234 ABNF):

	CSI      = %d27 '['
	         / %d155          ; ISO-8859-1 or similar
	         / %d194 %d155    ; UTF-8 encoding
	param    = %d48-63        ; chars '0' through '?'
	meta     = %d32-47        ; chars ' ' through '/'
	cmd      = %d64-126       ; chars '@' through '~'
	sequence = CSI *param *meta cmd

  There are other ECMA-48 sequences that could prove dangerous if not
filtered out. I do have Lua code to parse these [1][2], and I use it in my
current gopher client to filter them out (and yes, I have come across sites
that embed ECMA-48 control codes).

> 2. Do a simple find and replace on the whole document for '\033' and replace
> it with "ESC". While this will still leave the codes displaying to the viewer
> they will not actually render, thus you do not need to worry about line
> movement, screen clears, etc.
  You might want to replace the following codepoints to render control codes
harmless:

	0-31    ; C0 set, except the range 7-13 inclusive (BEL through CR),
	        ; which should still be interpreted
	127     ; DEL
	128-159 ; C1 set

  I say codepoints because in UTF-8, the C1 set is represented by the
sequences 194 128 through 194 159.

  -spc

[1]	https://github.com/spc476/LPeg-Parsers/blob/master/iso/control.lua

	This handles encodings in ISO-8859-1 and similar. I have a UTF-8 one
	that is separate. This one just returns the escape sequence as a
	unit with no further parsing of the actual sequence.

[2]	https://github.com/spc476/LPeg-Parsers/blob/master/iso/ctrl.lua

	This does a more complete parse of the escape sequence, to include
	its name (if any). Again, this is for ISO-8859-1 and similar
	encodings. I have another version for UTF-8.
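
  For illustration, here is a minimal sketch of the two filters described
above: one strips whole CSI sequences following the grammar, the other
replaces the remaining control codepoints with a visible placeholder. It is
Python rather than the Lua in [1] and [2], and it is not that code. The
function names and the `<1B>`-style placeholder are invented for this
example, and it assumes the document has already been decoded into
codepoints (so the UTF-8 bytes 194 155 arrive as the single codepoint 155).

```python
import re

# sequence = CSI *param *meta cmd, working on decoded text so that
# ESC '[' and the single C1 codepoint 155 (U+009B) are both covered.
CSI_SEQUENCE = re.compile(
    r"(?:\x1b\[|\x9b)"  # CSI
    r"[\x30-\x3f]*"     # param: '0' through '?'
    r"[\x20-\x2f]*"     # meta:  ' ' through '/'
    r"[\x40-\x7e]"      # cmd:   '@' through '~'
)


def strip_csi(text: str) -> str:
    """Remove complete CSI sequences; everything else is left alone."""
    return CSI_SEQUENCE.sub("", text)


def neutralize_controls(text: str) -> str:
    """Replace C0 (except 7-13), DEL and C1 codepoints with a placeholder.

    The "<XX>" hex placeholder is an assumption made for this sketch; any
    printable stand-in (such as the literal "ESC") would do.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        is_control = cp <= 31 or cp == 127 or 128 <= cp <= 159
        still_interpreted = 7 <= cp <= 13  # BEL, BS, HT, LF, VT, FF, CR
        if is_control and not still_interpreted:
            out.append(f"<{cp:02X}>")
        else:
            out.append(ch)
    return "".join(out)


def sanitize(text: str) -> str:
    """Strip CSI sequences first, then defuse whatever controls remain."""
    return neutralize_controls(strip_csi(text))
```

  Running `strip_csi` first means ordinary color codes vanish cleanly, while
anything the grammar does not cover (a bare ESC, other ECMA-48 sequences)
still ends up visible but inert after `neutralize_controls`.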