
Text reflow woes (or: I want bullets back!)

Sean Conner sean at conman.org

Sat Jan 18 23:36:37 GMT 2020

- - - - - - - - - - - - - - - - - - - ```

It was thus said that the Great Brian Evans once stated:
> Aaron Janse writes:
> > Hmmm. It does seem, though, that *allowing* ANSI colors would require
> > non-terminal clients to strip ANSI colors, which would be a PITA,
> > especially considering that ANSI is a hot mess (I built an ANSI parser
> > a while ago [1])
>
> 
> Currently Bombadillo has a few different modes. The normal mode removes
> ANSI escape codes. As I am parsing a document, if I read an `\033` character I
> just toggle an escape code boolean and then consume until I read an A-Za-z
> character (and consume that char as well). It works very quickly and handles
> removing them quite well. I do the same thing for the color mode for any
> escape codes that do not end in `m`. That said, it may not work as well for
> people not parsing by writing characters into a buffer char by char.
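As a rough illustration of the character-by-character approach Brian describes, here is a Python sketch. It is not Bombadillo's actual code; the function name `strip_escapes` and the `keep_colors` flag are hypothetical. On ESC it consumes characters up to and including the first ASCII letter, and in "color mode" it keeps sequences whose final letter is `m`.

```python
def strip_escapes(text: str, keep_colors: bool = False) -> str:
    """Hypothetical sketch: drop ESC ... <letter> runs from text.

    On ESC (0x1B), consume characters up to and including the first
    ASCII letter A-Za-z. If keep_colors is True, runs ending in 'm'
    (the colour sequences) are copied to the output instead.
    An escape still open at end of input is silently dropped.
    """
    out = []
    seq = []           # characters of the escape run being consumed
    in_escape = False
    for ch in text:
        if in_escape:
            seq.append(ch)
            if ch.isascii() and ch.isalpha():   # exactly A-Za-z
                in_escape = False
                if keep_colors and ch == "m":
                    out.extend(seq)             # color mode: keep it
                seq = []
        elif ch == "\033":
            in_escape = True
            seq = [ch]
        else:
            out.append(ch)
    return "".join(out)
```

Because it stops at the first ASCII letter, a sequence whose terminator is not a letter, such as the title-setting string `\033]0;title\a`, is cut short after the `t`, leaving `itle` and a BEL in the output.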

  Having written an ECMA-48 parser myself (ECMA-48 defines the terminal control codes everybody calls ANSI escape codes, even though they aren't defined by ANSI), I'd say you'll probably catch 99% of the control codes in use.  But the actual definition is (in RFC 5234 ABNF):

	CSI   = %d27 "["
	      / %d155       ; ISO-8859-1 or similar
	      / %d194 %d155 ; UTF-8 encoding
	param = %d48-63     ; chars '0' through '?'
	meta  = %d32-47     ; chars ' ' through '/'
	cmd   = %d64-126    ; chars '@' through '~'

	sequence = CSI *param *meta cmd
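A minimal Python sketch of a matcher for that grammar follows. It assumes the input has already been decoded to text, so both the ISO-8859-1 byte 155 and the UTF-8 pair 194 155 arrive as the single code point U+009B and two CSI alternatives suffice. The names `CSI_SEQUENCE`, `parse_csi` and `strip_csi` are hypothetical, and this is not the Lua/LPeg code referenced in [1] and [2]. It covers CSI sequences only, not the other ECMA-48 sequences.

```python
import re

# CSI sequence per the ABNF above, applied to decoded text.
CSI_SEQUENCE = re.compile(
    r"(?:\x1b\[|\x9b)"   # CSI: ESC '[' or the C1 code point U+009B
    r"([\x30-\x3f]*)"    # param: '0' through '?'
    r"([\x20-\x2f]*)"    # meta (intermediate): ' ' through '/'
    r"([\x40-\x7e])"     # cmd (final): '@' through '~'
)

def parse_csi(text: str):
    """Yield (param, meta, cmd) for each CSI sequence found in text."""
    for m in CSI_SEQUENCE.finditer(text):
        yield m.group(1), m.group(2), m.group(3)

def strip_csi(text: str, keep_sgr: bool = False) -> str:
    """Remove CSI sequences; optionally keep those whose cmd is 'm' (colours)."""
    def repl(m: re.Match) -> str:
        return m.group(0) if keep_sgr and m.group(3) == "m" else ""
    return CSI_SEQUENCE.sub(repl, text)
```

With `keep_sgr=True` this mirrors the color mode described in the quoted message: `\x1b[1;31m` survives while `\x1b[2J` (clear screen) is removed. Unlike the letter-based scan, it also ends correctly on non-letter final bytes such as `@` or `~`.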

  There are other ECMA-48 sequences that could prove dangerous if not filtered for.  I do have Lua code to parse these [1][2] and use them in my current gopher client to filter them out (and yes, I have come across sites that embed ECMA-48 control codes).

> 2. Do a simple find and replace on the whole document for '\033' and replace
>    it with "ESC". While this will still leave the codes displaying to the viewer,
>    they will not actually render, thus you do not need to worry about line
>    movement, screen clears, etc.

  You might want to replace the following codepoints to render control codes harmless:

	0-31	; C0 set, except interpret the range from 7-13 inclusive
	127	; DEL
	128-159	; C1 set

I say codepoints because in UTF-8, the C1 set is represented by the sequences

	194 128 through 194 159
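One way to apply that list, sketched in Python on decoded text (so the UTF-8 pair 194 155 is already the single code point U+009B), is to swap each control code point for a visible, inert stand-in. The stand-ins are an assumption chosen for illustration: Unicode Control Pictures (U+2400 through U+241F, plus U+2421 for DEL) for C0 and DEL, and a hypothetical `<XX>` hex marker for C1, which has no control pictures. The function name `neutralize_controls` is likewise hypothetical. Code points 7 through 13 are passed through for the client to interpret.

```python
def neutralize_controls(text: str) -> str:
    """Hypothetical sketch: make C0, DEL and C1 control code points inert.

    - 7..13 (BEL, BS, HT, LF, VT, FF, CR): passed through unchanged
    - other C0 (0..31): Unicode Control Picture U+2400 + code point
    - DEL (127): U+2421 SYMBOL FOR DELETE
    - C1 (128..159): "<XX>" hex marker, since C1 has no control pictures
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if 7 <= cp <= 13:
            out.append(ch)                 # left for the client to interpret
        elif cp <= 31:
            out.append(chr(0x2400 + cp))   # e.g. ESC (27) -> U+241B
        elif cp == 127:
            out.append("\u2421")
        elif 128 <= cp <= 159:
            out.append(f"<{cp:02X}>")      # e.g. CSI (155) -> "<9B>"
        else:
            out.append(ch)
    return "".join(out)
```

Unlike replacing only `\033` with "ESC", this also disarms the single-code-point CSI (U+009B) and the rest of the C1 set.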

-spc

[1]	https://github.com/spc476/LPeg-Parsers/blob/master/iso/control.lua

	This handles encodings in ISO-8859-1 and similar.  I have a UTF-8 one that is separate.  This one just returns the escape sequence as a unit with no further parsing of the actual sequence.

[2]	https://github.com/spc476/LPeg-Parsers/blob/master/iso/ctrl.lua

	This does a more complete parse of the escape sequence, to include its name (if any).  Again, this is for ISO-8859-1 and similar encodings.  I have another version for UTF-8.