💾 Archived View for gemi.dev › gemini-mailing-list › 000030.gmi captured on 2023-11-04 at 12:19:32. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Color and escape codes in text/gemini

Baschdel <baschdel (a) disroot.org>

In my last post on this list I called escape codes spaghetti without much 
of an explanation.

Tl;dr:
-Do we want color in text/gemini?
-If yes how should it be standardized?

I really like that there are creative people out there looking for ways to 
push systems like gemini outside of what they are supposed to be able to do.
And I don't want to decide (at least not now) whether the escape code thingy is 
a "bug" or a feature in most clients.
The first question we have to ask here is whether we want color support at all: 
I know that gemini should remain simple, and support for 
colors takes away a good part of this simplicity.
But there are a lot of use cases for color support
(code highlighting, pretty ASCII art, footnotes, ...).
For me, these are two pretty strong arguments against each other.

So in case we want color support:
Assuming the escape code thingy is a feature:
We should standardize which escape codes are allowed and how clients should handle them.
Assuming it is a "bug":
Some people (me included after thinking about it) like to see color being 
used in gemini so we have to find another way to put colors in while the 
text/gemini format should remain easy to parse and human readable.
(Correct me if I'm wrong but that is how I interpret the text/gemini 
format: Markup that is easy to parse but still human readable if you open 
it in something like a text editor)

If I had to standardize it, I'd use custom escape sequences, similar to 
those already in use but kept as short as possible, that 
ALWAYS end in a ';'.
This is not really text editor friendly, but it avoids having to escape the 
escape character (I'm looking at you, XML), makes it easy to parse/strip the 
escape sequences, and even if you don't, you at least get very short spaghetti.
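The proposal doesn't pin down a concrete syntax. The Python sketch below therefore assumes a hypothetical one: ESC, then any characters other than ';' or a newline, terminated by the first ';'. Under that assumption, stripping takes a single regular expression. The syntax and the names `CUSTOM_SEQ` and `strip_custom` are illustrative only and are not part of any Gemini specification.

```python
import re

# HYPOTHETICAL syntax, assumed for illustration only: ESC, then any run of
# characters other than ';' or newline, ALWAYS terminated by the first ';'.
# Excluding newline keeps an unterminated sequence from swallowing later lines.
CUSTOM_SEQ = re.compile(r"\x1b[^;\n]*;")


def strip_custom(text: str) -> str:
    """Remove every sequence matching the hypothetical syntax."""
    return CUSTOM_SEQ.sub("", text)


# "\x1bfg1;" and "\x1b0;" are made-up colour-on / colour-off sequences.
print(strip_custom("\x1bfg1;red\x1b0; plain"))  # red plain
```

A client that doesn't support colour would only need this one substitution. A client that does would split on the same pattern.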

In any case I'm for filtering out unwanted escape codes.
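One way to read "filtering out unwanted escape codes" is as an allowlist. Keep SGR sequences (CSI sequences made of digits and ';' that end in 'm', which set colours and attributes) and drop every other escape or control code. The Python sketch below works under that assumption. The policy and the name `keep_only_sgr` are ours. 8-bit C1 controls are simply deleted, which can leave their parameters visible.

```python
import re

# Alternatives are tried left to right at each position in the text.
TOKEN = re.compile(
    r"(?P<csi>\x1b\[[0-?]*[ -/]*[@-~])"                 # CSI: ESC [ params inter final
    r"|(?P<esc>\x1b[@-~]?)"                              # any other ESC (+ next byte)
    r"|(?P<ctrl>[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f])"  # C0 except HT/LF/CR, DEL, C1
)
# SGR = CSI with only digits and ';' as parameters and 'm' as final byte.
SGR = re.compile(r"\x1b\[[0-9;]*m")


def keep_only_sgr(text: str) -> str:
    """Keep colour/attribute (SGR) sequences; drop every other control."""
    def replace(m):
        seq = m.group("csi")
        if seq is not None and SGR.fullmatch(seq):
            return seq  # allowed: colours and attributes
        return ""       # filtered out: cursor movement, clearing, BEL, ...
    return TOKEN.sub(replace, text)


print(repr(keep_only_sgr("\x1b[31mred\x1b[0m \x1b[2Jcleared\x07")))
# '\x1b[31mred\x1b[0m cleared'
```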

I'm looking forward to getting some other opinions on this.

Greetings
- Baschdel

Sean Conner <sean (a) conman.org>

It was thus said that the Great Baschdel once stated:
> In my last post on this list I called escape codes spaghetti without much
> of an explanation.
> 
> Tl;dr:
> -Do we want color in text/gemini?
> -If yes how should it be standardized?

  Oh goody!  We still haven't figured out our last question about Gemini:
To reflow or not to reflow, that is the question.  

> I really like that there are creative people out there looking for ways to
> push systems like gemini outside of what they are supposed to be able to
> do.

> And I don't want to decide (at least not now) whether the escape code thingy is
> a "bug" or a feature in most clients. The first question we have to ask here
> is whether we want color support at all: I know that gemini should remain
> simple, and support for colors takes away a good part of this simplicity.
> But there are a lot of use cases for color support (code highlighting,
> pretty ASCII art, footnotes, ...). For me, these are two pretty strong
> arguments against each other.
> 
> So in case we want color support: Assuming the escape code thingy is a
> feature: We should standardize which escape codes are allowed and how
> clients should handle them. Assuming it is a "bug": Some people (me
> included after thinking about it) like to see color being used in gemini
> so we have to find another way to put colors in while the text/gemini
> format should remain easy to parse and human readable. (Correct me if I'm
> wrong but that is how I interpret the text/gemini format: Markup that is
> easy to parse but still human readable if you open it in something like a
> text editor)

  Ah, the so-called ANSI escape codes, which aren't at all defined by ANSI
but ISO, and are technically known as ECMA-48.  First off, there are the
control codes that *are* defined by ASCII, codes 0 to 31 [1].  There are
only seven control codes that relate to text (we're excluding the unit
separators, which are rarely, if ever, used):

	07	BEL	audible alarm
	08	BS	move left one space
	09	HT	horizontal tab
	0A	LF	move down one line
	0B	VT	vertical tab
	0C	FF	form feed
	0D	CR	carriage return

  Of these, HT, LF and CR are the most used---the rest not so much in text
files.  Now technically, what we consider the backspace is actually the
action of two control characters, BS (move left one space) and DEL (ignore
this character---if things worked this way, an easy way to get, say, an
umlaut is 'a BS "' or an n with a tilde over it by 'n BS ~'), and there are
fights over how to end a line (CR, LF, or both?  Technically, Microsoft got
this right with both).

  Then there's ECMA-48, which is what is under discussion here.  It's a vast
standard, but its sequences all fall under a few patterns (using a slightly
modified RFC 5234 format---"a" - "z" means the range of characters between
"a" and "z"):

	Pattern group 1 (largely, a few exceptions in here as shown below):

		group1 = %d27 ( "`" - "~")

	Pattern group 2 (largely, a few exceptions in here shown below):

		group2 = %d27 ( "@" - "_")
	               / %d128-159 ; [3]

	Pattern group 3 (these are what is popularly known as ANSI codes,
	and a subset of group 2):

		group3 = (%d27 "[" / %d155) *("0" - "?") *1(" " - "/") ("@" - "~")

	and finally pattern group 4 (a subset of group 2):

		cmd     = %d144 / %d152 / %d157 / %d158 / %d159
		        / %d27 ( "P" / "X" / "]" / "^" / "_")
		group4  = cmd *(%d8-13 / " " - "~") (%d156 / %d27 "\")
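The four patterns can be turned into a small classifier. This Python sketch covers only the 7-bit ESC forms and leaves out the single-character C1 forms for brevity. It keeps the `*1` on intermediate bytes from the group 3 rule as written. Group 4 and group 3 are tried before group 2 because their introducers (ESC P, ESC X, ESC ], ESC ^, ESC _ and ESC [) fall inside the ESC "@" - "_" range. The names `SEQUENCE` and `classify` are ours.

```python
import re

# 7-bit forms only; at each position the alternatives are tried in order.
SEQUENCE = re.compile(
    r"(?P<group4>\x1b[PX\]^_][\x08-\x0d -~]*(?:\x1b\\|\x9c))"  # string command ... ST
    r"|(?P<group3>\x1b\[[0-?]*[ -/]?[@-~])"                   # CSI ("ANSI") sequence
    r"|(?P<group2>\x1b[@-_])"                                 # ESC + 0x40-0x5F
    r"|(?P<group1>\x1b[`-~])"                                 # ESC + 0x60-0x7E
)


def classify(text: str) -> list:
    """Return (group, sequence) pairs for each escape sequence in text."""
    return [(m.lastgroup, m.group()) for m in SEQUENCE.finditer(text)]


sample = "\x1b]0;title\x1b\\hi \x1b[31mred\x1b[0m\x1bc\x1bD"
for group, seq in classify(sample):
    print(group, repr(seq))
```

`m.lastgroup` gives the name of the alternative that matched, because exactly one named group takes part in each match.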

  Group 4 is the most problematic, security-wise, as they carry actual
messages (arbitrary strings).  'ESC]' is defined as "OPERATING SYSTEM
COMMAND" and 'ESC_' is "APPLICATION PROGRAM COMMAND".  Not many terminal
emulators support group 4, but they are defined.

  And there are several commands under group 3 that are problematic.
Microsoft defined 'ESC[...p' (which is in the private use area of
ECMA-48) to redefine the keys on the keyboard (so beware of the text file
that does 'ESC[13;"deltree c:\"p').

  And there are codes that do more than just define colors: they can define
fonts, move the cursor, define locked regions on the screen, set or clear
tab stops, insert or delete lines or characters on the screen, and even
define print parameters.  But oddly, there's no defined sequence to query
the size of the screen (fancy that!).

  Now, not all terminal emulators support all of ECMA-48, but it's a large
standard (and there is plenty of room for terminal emulators to extend
support with private codes).  

> If I had to standardize it, I'd use custom escape sequences, similar to
> those already in use but kept as short as possible, that ALWAYS end in
> a ';'.

  Hard to enforce, and ending in ';' clashes with the current standard,
which uses ';' to separate parameters.  For instance, to set a blinking red
foreground on a green background:

	ESC[31;42;5m
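To make the clash concrete, here is a Python sketch that parses the parameters of an SGR sequence. A rule that ended every sequence at the first ';' would cut this one off after `ESC[31;`. The `MEANING` table covers only the codes used here plus a few common ones. The names are ours.

```python
import re

# SGR: CSI, then digits separated by ';', then the final byte 'm'.
SGR = re.compile(r"\x1b\[([0-9;]*)m")

# Standard meanings for the handful of codes used in this example.
MEANING = {0: "reset", 1: "bold", 5: "slow blink", 7: "inverse",
           31: "red foreground", 42: "green background"}


def sgr_params(seq: str) -> list:
    """Split an SGR sequence into its numeric parameters."""
    m = SGR.fullmatch(seq)
    if m is None:
        raise ValueError(f"not an SGR sequence: {seq!r}")
    # An empty parameter (as in ESC[m) means 0, i.e. reset.
    return [int(p) if p else 0 for p in m.group(1).split(";")]


params = sgr_params("\x1b[31;42;5m")
print(params)                                         # [31, 42, 5]
print([MEANING.get(p, f"code {p}") for p in params])
# ['red foreground', 'green background', 'slow blink']
```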

> This is not really text editor friendly, but it avoids
> having to escape the escape character (I'm looking at you, XML), makes it
> easy to parse/strip the escape sequences, and even if you don't, you at
> least get very short spaghetti.

  So how would you define the control sequences?

  One thing I noticed with gemini://konpeito.media/ is that the default page
was very hard for me to read, but that's because the page made one
assumption that isn't universally true---that the default background color
is black.  I don't use a black background on my terminals.

> In any case I'm for filtering out unwanted escape codes.

  That's what I do, only I filter out all escape codes, since I don't want
the screen to get messed up and my program to get confused as to what is where.
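Here is a Python sketch of the "filter out everything" approach; it is not Sean's actual code. It also removes the payload of terminated string commands such as OSC, so a page can't smuggle a window-title change through. Accepting BEL as an OSC terminator is an xterm convention rather than part of ECMA-48. `strip_controls` is our name.

```python
import re

# Alternatives are tried left to right at each position.
CONTROL = re.compile(
    r"\x1b[PX\]^_].*?(?:\x1b\\|\x9c|\x07)"      # string commands, payload included
    r"|\x1b\[[0-?]*[ -/]*[@-~]"                 # CSI sequences
    r"|\x1b[@-~]?"                              # any other ESC (+ following byte)
    r"|[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f]",  # other C0 controls, DEL, C1
    re.DOTALL,
)


def strip_controls(text: str) -> str:
    """Remove every escape sequence and control code except HT, LF and CR."""
    return CONTROL.sub("", text)


page = "\x1b]0;pwned\x07Hello \x1b[1mworld\x1b[0m\x1b[2J\r\n"
print(repr(strip_controls(page)))  # 'Hello world\r\n'
```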

  -spc

[1]	32, or space, is special in that it can be treated as a control code
	(part of the unit separator group starting with 28), or as part of
	the graphical portion (even though it has no graphical
	representation).  And 127 is not part of any control group, as it
	technically means "ignore me entirely" [2].

[2]	It comes from paper tape, which was originally 7 bits wide, where a
	hole represents a 1.  If there was an error, to fix it, all 7 bits
	were punched out (representing 127) and the reading side was known
	to ignore that character entirely.
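The arithmetic behind footnote [2] fits in a tiny Python sketch. A punch can only turn a 0 into a 1, so overpunching any 7-bit code with every hole is a bitwise OR with 0b1111111. That always lands on 127. `rub_out` is a name we made up.

```python
DEL = 0x7F  # 127: all seven holes punched


def rub_out(code: int) -> int:
    """Punch out every remaining hole over a 7-bit character.

    Punching can only add holes (turn bits on), so this is a bitwise OR
    with all seven bits set, and the result is always DEL.
    """
    return code | 0b1111111


print(rub_out(ord("A")))  # 127
```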

[3]	If you are using the UTF-8 encoding scheme, these characters will be
	encoded as two-byte UTF-8 sequences, so 155 is encoded as the byte
	sequence 194,155, as if things weren't bad enough.
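A quick Python check of footnote [3] follows. A filter that works on raw bytes has to look for 0xC2 followed by 0x80 to 0x9F. A filter that works on decoded text sees U+0080 to U+009F instead. A lone byte 155 isn't valid UTF-8 at all. `c1_as_utf8` is our name.

```python
def c1_as_utf8(code_point: int) -> list:
    """Return the UTF-8 bytes for a single code point as a list of ints."""
    return list(chr(code_point).encode("utf-8"))


print(c1_as_utf8(155))  # [194, 155]  (CSI)
print(c1_as_utf8(156))  # [194, 156]  (ST)

# The lone byte 155 is not valid UTF-8 on its own:
try:
    bytes([155]).decode("utf-8")
except UnicodeDecodeError:
    print("0x9B by itself is not valid UTF-8")
```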

Jason McBrayer <jmcbray (a) carcosa.net>

Sean Conner <sean at conman.org> writes:

> It was thus said that the Great Baschdel once stated:
>> In any case I'm for filtering out unwanted escape codes.
>   That's what I do, only I filter out all escape codes, since I don't
> want the screen to mess up and my program get confused as to what is
> where.

It seems to me that the idea of putting ANSI escape codes in Gemini pages
(oversimplifying here; thanks Sean for the more complete explanation)
makes a number of assumptions:

1. That all Gemini clients are running on a terminal (emulator) that
   interprets ANSI escape codes (or the client provides enough terminal
   emulation to interpret these escape codes).

2. That ANSI escape codes sent by Gemini pages will only interact with
   your terminal in a safe way.

3. That it is desirable for untrusted content from the Internet to be
   able to control your terminal.

I, honestly, don't think any of these are true.

My suggestion: that the text/gemini format not be allowed to contain
ANSI escape codes, even with a MIME type specifying an encoding that
could contain them. If you want to send control codes, they should be sent
with an appropriate MIME type. I'm not sure what that would be,
actually: is text/plain; charset=us-ascii correct? Or does it need to
be application/something?

-- 
Jason McBrayer      | "Strange is the night where black stars rise,
jmcbray at carcosa.net | and strange moons circle through the skies,
                    | but stranger still is lost Carcosa."
                    | -- Robert W. Chambers, The King in Yellow

Brian Evans <b__m__e (a) mailfence.com>

This is interesting timing. Bombadillo v1.x supported printing any escape
codes it received. With 2.0.0 we eliminated that support in favor of
filtering out escape codes, in a likely naive but mostly functional way:
any time `\033` is encountered, anything that follows it until a
`[A-Za-z]` character is reached will not be rendered (including the
terminating character that ends the sequence). That is where things stand
with Bombadillo's mainline release right now.
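The filtering rule described above can be sketched in Python as follows. This restates the rule as written and is not Bombadillo's source. The second call shows why the rule is naive: an OSC sequence stops at the first letter of its payload, so the rest of the payload leaks through. `naive_strip` is our name.

```python
def naive_strip(text: str) -> str:
    """Drop ESC and everything after it up to and including the next ASCII letter."""
    out = []
    skipping = False
    for ch in text:
        if skipping:
            if "A" <= ch <= "Z" or "a" <= ch <= "z":
                skipping = False  # the letter ends the sequence; drop it too
            continue
        if ch == "\x1b":
            skipping = True
            continue
        out.append(ch)
    return "".join(out)


print(repr(naive_strip("\x1b[31mred\x1b[0m")))    # 'red'
print(repr(naive_strip("\x1b]0;title\x07ok")))    # 'itle\x07ok'
```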

However, there has been an issue in our backlog for a while now to
reinstate color. I worked on this a bit over the last week and have done
the following:

I have added a "theme" called 'color' (previously there were only 'normal'
and 'inverse'). This theme looks identical to the normal theme, except that
it renders the escape sequences. This approach allows the user to move in
and out of this rendering at will: if something doesn't look right in
`color` mode, they can always switch to `normal` mode. Sadly, color is not
compatible with `inverse` mode, since I use escape sequences to achieve the
inverse effect and it is immediately removed any time someone tosses out a
`\033[0m`.

I am kind of opposed to developing any syntax for text/gemini that would treat escape
sequences any differently than any other text. Clients can choose how and if to implement
escape codes in a way that makes sense to them and their user-base. I think most clients
will be well served taking Sean's approach of filtering out escape codes. 

For those wanting to keep clients lightweight, a simple string replace of
`\033` with any other character (maybe a box?) will keep the escape codes
from rendering but still show the document's intent. This is REALLY easy
to implement in just about any language.
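The lightweight option, sketched in Python. Replacing the ESC character with a visible symbol is enough to stop the terminal from acting on the sequence, and the rest of the sequence stays readable. Two choices here are ours: U+241B SYMBOL FOR ESCAPE as the "box", and also catching the 8-bit CSI (U+009B). `neuter_escapes` is a hypothetical name.

```python
# Map ESC (and the 8-bit CSI, U+009B) to a visible stand-in.
# U+241B SYMBOL FOR ESCAPE is our choice; any visible character would do.
NEUTER = str.maketrans({"\x1b": "\u241b", "\x9b": "\u241b["})


def neuter_escapes(text: str) -> str:
    """Make escape sequences print as inert text instead of acting."""
    return text.translate(NEUTER)


result = neuter_escapes("\x1b[31mred\x1b[0m")
# result == "\u241b[31mred\u241b[0m", which displays as ␛[31mred␛[0m
```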

I really like the usage of color on cat's recent page and hope to see more
people using color. I do 100% agree with Sean, though, that using color is
often presumptuous, and I have run into issues where my terminal's
background color did not mesh well at all with the colors used by an
application (which is partly why Bombadillo's TUI does not use colors, only
inversion).

I also agree with Sean (it seems to be happening a lot this time around) that getting
closer to an answer on reflow eventually would be a good idea. As it stands I am 
wrapping but doing no reflow whatsoever.

Michael Lazar <lazar.michael22 (a) gmail.com>

On Thu, Dec 12, 2019 at 10:18 AM Jason McBrayer <jmcbray at carcosa.net> wrote:

> 1. That all Gemini clients are running on a terminal (emulator) that
>    interprets ANSI escape codes (or the client provides enough terminal
>    emulation to interpret these escape codes).
>
> 2. That ANSI escape codes sent by Gemini pages will only interact with
>    your terminal in a safe way.
>
> 3. That it is desirable for untrusted content from the Internet to be
>    able to control your terminal.
>
> I, honestly, don't think any of these are true.

I agree, and adding to this:

4. That gemini terminal clients write text directly to stdout.

Libraries like curses maintain their own internal screen buffer and don't allow
passing through raw escape codes. Doing so would likely break any TUI display
anyway, since escape codes could also change the cursor position.

My opinion is that ANSI escape codes should be neither endorsed nor prohibited
in the Gemini protocol. Even prohibiting them would add additional complexity,
because then servers would need to worry about what's a valid gemini document
and what isn't. Treat it as a fun little easter egg and leave it at that.

- mozz

Aaron Janse <aaron (a) ajanse.me>

> Even prohibiting them would add additional complexity,
> because then servers would need to worry about what's a valid gemini document
> and what isn't. Treat it as a fun little easter egg and leave it at that.

Wouldn't that lead to client support fragmentation? I agree that "prohibiting"
content on the protocol level would be difficult, but would we want to at
least "discourage" things such as ANSI color codes that are difficult to
implement and not part of the spec?

Also, maybe this is off-topic, but in my opinion, a lot of the beauty of the gemini
spec is that the source code is so transparent. Once we have escape codes (and
invisible characters), however, the source code is much more opaque. I wouldn't
want to see source code displayed any differently by `cat` than when displayed
by `vim`, for example.

Just my two cents :-)

Bradley D. Thornton <Bradley (a) NorthTech.US>



On 12/12/2019 9:02 PM, Michael Lazar wrote:

> 
> My opinion is that ANSI escape codes should be neither endorsed nor prohibited
> in the Gemini protocol. Even prohibiting them would add additional complexity,
> because then servers would need to worry about what's a valid gemini document
> and what isn't. Treat it as a fun little easter egg and leave it at that.
> 

The way we handle this in the BBS world is as follows:

1.) If the server is capable, then it checks or asks if the client is ANSI
capable.

2.) If the client supports ANSI (many do, and many don't) then ANSI is
served.

The level of complication is mitigated by this approach, especially with
regards to client implementation.

I would note that many on this list expressed interest and were
supportive of the effort, while using the approach above would not
disenfranchise those not interested.

-- 
Bradley D. Thornton
Manager Network Services
http://NorthTech.US
TEL: +1.310.421.8268

Sean Conner <sean (a) conman.org>

It was thus said that the Great Bradley D. Thornton once stated:
> On 12/12/2019 9:02 PM, Michael Lazar wrote:
> 
> > My opinion is that ANSI escape codes should be neither endorsed nor prohibited
> > in the Gemini protocol. Even prohibiting them would add additional complexity,
> > because then servers would need to worry about what's a valid gemini document
> > and what isn't. Treat it as a fun little easter egg and leave it at that.
> 
> The way we handle this in the BBS world is as follows:
> 
> 1.) If the server is capable, then it checks or asks if the client is ANSI
> capable.
> 
> 2.) If the client supports ANSI (many do, and many don't) then ANSI is
> served.
> 
> The level of complication is mitigated by this approach, especially with
> regards to client implementation.
> 
> I would note that many on this list expressed interest and were
> supportive of the effort, while using the approach above would not
> disenfranchise those not interested.

  And thus we get user-agent strings being sent.

  -spc (I'm just saying ... )
