A proposed scheme for parsing preformatted alt text

Nathan Galt mailinglists at ngalt.com

Fri Sep 11 20:56:08 BST 2020

- - - - - - - - - - - - - - - - - - -

On Sep 11, 2020, at 11:36 AM, Luke Emmet <luke at marmaladefoo.com> wrote:

On 11-Sep-2020 18:28, Gary Johnson wrote:

[...]

To assuage any concerns about opening the door to endless possible syntaxes for machine-readable processing of preformatted blocks being introduced to Gemtext, I'll retract my previous suggestion of using the top line for (generic) machine-readable purposes and the bottom line for the alt text.

The only real machine-readable behavior I want to see in Gemini is source code syntax highlighting anyway, and the spec already supports and encourages that. In all other cases, alt text (which can clearly be ignored or used "at the client's discretion" as per the spec) is just fine on the top line of preformatted blocks. I suppose I don't really see much machine-readable value in tagging a block as "image" or "table" currently anyway. YMMV.

To that end, maybe we just need some community agreement (and/or a clearer codification in the Gemini spec) of how to use alt text "for computer source code to identify the programming language which advanced clients may use for syntax highlighting".
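As a rough illustration of what such an agreement would need to pin down, here is a minimal Python sketch of a client pulling preformatted blocks and their alt text out of a text/gemini document. It relies only on the spec's rule that a line starting with ``` toggles preformatted mode and that text after the opening backticks is alt text. Treating the first word of the alt text as the language name is a hypothetical convention, and the names `PreBlock`, `extract_pre_blocks` and `language_hint` are made up for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PreBlock:
    alt_text: str                                    # text after the opening ``` (may be empty)
    lines: list[str] = field(default_factory=list)   # raw lines inside the block


def extract_pre_blocks(gemtext: str) -> list[PreBlock]:
    """Collect preformatted blocks and their alt text from a text/gemini document."""
    blocks: list[PreBlock] = []
    current: PreBlock | None = None
    for line in gemtext.splitlines():
        if line.startswith("```"):
            if current is None:
                # Opening toggle: anything after the backticks is alt text.
                current = PreBlock(alt_text=line[3:].strip())
            else:
                # Closing toggle: text after the backticks is ignored.
                blocks.append(current)
                current = None
        elif current is not None:
            current.lines.append(line)
    if current is not None:
        # Unterminated block at end of document: keep what was collected.
        blocks.append(current)
    return blocks


def language_hint(block: PreBlock) -> str | None:
    """Hypothetical convention: the first word of the alt text names the language."""
    words = block.alt_text.split()
    return words[0].lower() if words else None
```

Under this convention, a block opened with ```python example would yield the hint "python", while a block with no alt text yields None and is shown as plain preformatted text.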

Whilst I think it is nice to support a practice of source code language labelling (to assist syntax highlighting), I think it would be insufficient to cover current practice.

In particular, I'm thinking of ANSI markup that some authors sprinkle in their content.

ANSI codes are effectively platform-specific formatting instructions (for example, foreground and background colours) that only make sense to terminal-type clients.

If authors wish to use these in preformatted regions, they really should be hinting this to the client so that the client can take appropriate steps to render (or ignore) the terminal ANSI codes. Interpreting these terminal escape codes means not treating the content as plain text, but rather applying a particular content interpretation to drive the visual UI. Similar to embedded <font> tags, you might say.

So this in itself suggests the need to be able to hint at content-type in the preformatted region.

e.g.

```text/x-ansi
(ANSI marked up content in color etc)
```

(or perhaps ```content-type: text/x-ansi to label it correctly)
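Either labelling form could be recognised with a few lines of code. The following Python sketch is one possible reading, not an agreed scheme: it accepts a bare MIME type (```text/x-ansi) or the explicit ```content-type: text/x-ansi form, and otherwise treats the alt text as ordinary description. The MIME check is a deliberately loose type/subtype pattern rather than the full RFC 6838 grammar, and `content_type_hint` is a hypothetical name.

```python
from __future__ import annotations

import re

# Loose type/subtype shape such as text/x-ansi; NOT the full RFC 6838 grammar.
_MIME_RE = re.compile(r"^[A-Za-z0-9][\w.+-]*/[A-Za-z0-9][\w.+-]*$")
_LABEL = "content-type:"


def content_type_hint(alt_text: str) -> str | None:
    """Return a lower-cased MIME type hinted by preformatted alt text, else None."""
    text = alt_text.strip()
    # Explicit form: "content-type: text/x-ansi" (label matched case-insensitively).
    if text.lower().startswith(_LABEL):
        text = text[len(_LABEL):].strip()
    # Bare form: the first word must look like type/subtype.
    words = text.split()
    candidate = words[0] if words else ""
    return candidate.lower() if _MIME_RE.match(candidate) else None
```

One weakness of the bare form shows up quickly: alt text such as "a/b test results" also passes the loose check and comes back as "a/b", whereas the explicit content-type: prefix leaves no such ambiguity.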

Or are we going to leave these implementation-specific terminal escape codes as an ad-hoc convention? That seems to have its own risks, as discussed on this thread and elsewhere.

Best Wishes

- Luke

[shock and horror that people are using ANSI codes for color]

Prior reading:

https://en.wikipedia.org/wiki/Escape_character#ASCII_escape_character

https://the.exa.website/ a modern ls(1) replacement

I think ANSI color codes are up to 24-bit color now. Not all terminals support them (Terminal.app doesn’t; iTerm2 does), but they’re out there. I was looking up color codes so I could make my EXA_COLORS variable nicer and the whole process wasn’t pleasant.
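For reference, a 24-bit ("truecolor") foreground colour is usually selected with the SGR sequence ESC [ 38;2;R;G;B m (48;2 for the background), and ESC [ 0 m resets attributes; as noted, terminal support varies. The Python helpers below only build those strings, and their names are illustrative. Every such sequence starts with U+001B.

```python
ESC = "\x1b"          # U+001B, the character every ANSI sequence starts with
RESET = f"{ESC}[0m"   # SGR 0: reset all attributes


def _check(*channels: int) -> None:
    """Reject channel values outside 0..255."""
    for c in channels:
        if not 0 <= c <= 255:
            raise ValueError(f"colour channel out of range: {c}")


def truecolor_fg(r: int, g: int, b: int) -> str:
    """SGR 38;2;R;G;B: set a 24-bit foreground colour."""
    _check(r, g, b)
    return f"{ESC}[38;2;{r};{g};{b}m"


def truecolor_bg(r: int, g: int, b: int) -> str:
    """SGR 48;2;R;G;B: set a 24-bit background colour."""
    _check(r, g, b)
    return f"{ESC}[48;2;{r};{g};{b}m"


if __name__ == "__main__":
    # On a truecolor-capable terminal this prints orange text.
    print(truecolor_fg(255, 128, 0) + "orange text" + RESET)
```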

Sounds like a good reason to explicitly disallow U+001B in the text/gemini spec and:

- give dirty looks to any page author that uses it
- give dirty looks to any client author that doesn’t strip it out before presenting it to the user (whether the client is terminal-based or in a GUI)
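If a client does take the "strip it out" route, removing only U+001B would leave fragments like [1;31m on screen, so a slightly friendlier approach is to drop whole CSI sequences first and then any stray ESC characters. A minimal Python sketch, assuming CSI sequences (ESC [ followed by parameter, intermediate and final bytes as laid out in ECMA-48) are the main case worth handling; other sequence types, such as OSC, only lose their leading ESC. `strip_escapes` is a hypothetical name.

```python
import re

# CSI per ECMA-48: ESC '[' then parameter bytes (0x30-0x3F)*,
# intermediate bytes (0x20-0x2F)*, and one final byte (0x40-0x7E).
_CSI_RE = re.compile(r"\x1b\[[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]")


def strip_escapes(text: str) -> str:
    """Remove CSI sequences (colours, cursor moves), then any leftover U+001B."""
    text = _CSI_RE.sub("", text)      # drop whole sequences such as ESC[1;31m
    return text.replace("\x1b", "")   # assumption: other sequence types only lose their ESC
```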