💾 Archived View for auragem.letz.dev › devlog › 20240325.gmi captured on 2024-09-29 at 00:21:30. Gemini links have been rewritten to link to archived content


2024-03-25 The Simplicity of List Nesting: How AsciiDoc Does It

This is part of my series on re-assessing the designs of Gemini and Gopher:

2024-03-22 Gopher's Uncontextualized Directories vs. Gemini's Contextualized Directories

2024-03-23 What Gemini Gets Wrong With Anti-Extensibility

2024-03-24 The Necessary Semantics behind Emphasis and Strong

Today we are going to talk about List Nesting, which Gemini deliberately excluded to keep the gemtext format very simple. I believe this was a mistake, because at the very least 2-3 levels of nesting would take care of most common uses for list-nesting. In fact, Gemini already has heading-nesting up to a max of 3 levels. The same approach can be used for lists.

Part of the reason Gemini might have done away with nesting is Markdown's syntax, which indents nested bullets using spaces. This Markdown-like syntax breaks one of Gemini's rules, which is that a client must be able to determine a linetype within its first three characters. However, the syntax we choose determines whether clients will be able to easily parse nested lists or not, and Markdown is not the only syntax we can choose.

The Uses of List Nesting

The most obvious use for nested lists is todo lists. Nesting allows you to have bullets of actions to take that can be further subdivided. Here's an example:

### 2023-03-25


  * Research
    * Markdown's syntax
    * AsciiDoc's syntax
    * Uses
  * Write the parsing code in scroll-term-golang and scroll-term-rust
  * Write
  * Revise
  * Edit

  * Chapter 9 on Error Handling
  * Chapter 10 on Generic Types, Traits, and Lifetimes

  * Research
    * Markdown vs. AsciiDoc vs. Gemtext
  * Write
  * Revise
  * Edit

  * Make less extensible
    * Mimetype parameters
    * Prevent URL parameters
    * last field of requests and responses should allow all characters but CRLF (following anti-extensibility principles I wrote about earlier)

The other obvious use is an outline and table of contents:

# List Nesting Outline

  * Todo lists
  * Outlines and Table of Contents
  * Feature lists
  * Note-taking and summary notes

  * The semantics of nested lists - a semantic that is unique to them

  * Markdown and why it won't work
  * AsciiDoc's solution
  * Our proposed solution = AsciiDoc

While clients can already take a document and construct its outline or table of contents, if someone were writing an outline separate from the paper or document, they might want to be able to present this outline over gemtext or Markdown. Using preformatted text might work, but it breaks word-wrapping and removes semantic detail.

Here's another use that is particularly common in repository Readme files - feature lists for software:

# Scroll-Term

Features:

  * go, home, back, forward, up, root
  * refresh, download, url

  * Metadata and Abstracts
  * Language request parameter

  * Text Streaming: all text documents are converted and printed as they are streamed in.
  * Audio Streaming: mp3, ogg vorbis, wave, and flac
  * Video files piped into mpv

There's one more important one - note taking, or summary notes. It looks very similar to the above, although there's more reason in notes to want multiple paragraphs in one bullet.

The Purpose of Nesting Lists

All of the uses above have one core purpose: semantic detail. Lists have meaning. Sub-lists have semantic meaning. Why don't we just use text? Because plain paragraphs don't have the semantic detail of relationships between themselves.

In fact, as I referenced before, this semantic idea of nesting is the same principle used by headings. What makes nesting lists different from headings is that list items are not sections, they are paragraphs themselves. They are items. Headings cover sections, bullets cover items.

In English we have lists too! I can list off things, items, words, attributes, and actions, using commas. What we *do not* have in most languages is sublists, at least not in a way that is easy to understand. My list of items, including objects, ideas, emotions, and concepts, words, including nouns, verbs, adjectives, prepositions, and particles, attributes, including color, size, demeanor, background, and context, and actions, including passive and active, including participles, adverbials, and basic verbs.

It's hard to tell where the heck the sublists end, isn't it? (There's that sneaky "and" + last item.) It's also a little ambiguous whether some are sublists of sublists. Not to mention it is really hard to read as well! I *could* use parentheses to list out these items (including objects, ideas, emotions, concepts), words (including nouns, verbs, adjectives, prepositions, particles), attributes (including color, size, demeanor, background, context), and actions (including passive and active (including participles, adverbials, and basic verbs)).

Unfortunately, nesting parentheses more than 2 levels makes your list look like Lisp. Yuck!! And yet, even so, we've just introduced a syntax. A parser could try to read this syntax, actually, but determining whether parentheses are being used for lists or something else is certainly a problem. Parsing it would certainly be harder than Markdown nested lists, though.

Hey! Why don't we just introduce a nested list syntax! Parsers and clients could then read it. Clients could even *control the presentation* of this list, instead of putting it in preformatted toggles where the *author* has to control the presentation. But what will our syntax be...

Syntax

Gemini seems to rest on the assumption that to support nesting, we must give up ease of parsing. This is an incorrect assumption. But let us first cover why Markdown's syntax might be rejected.

Markdown syntax for normal lists places an asterisk (or hyphen) at the beginning of a new line followed by a space and some text, like so:

* List item
* Another list item

For sublists, you *indent* the list item with spaces, like so:

* List item
  - sub-item
    * sub-sub-item
  - Another sub-item
* Another list item

Parsers now have to make sure to read every line to see if it was indented by spaces, and if this indent is followed by an asterisk or hyphen, then it's a sublist. This isn't *hard*, but it is a bit more work than necessary if we just switched the syntax to something easier.
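
To make that extra work concrete, here is a minimal Go sketch of level detection for Markdown-style indented bullets. It assumes two spaces of indentation per nesting level and only `*` or `-` markers; real Markdown parsers follow more involved indentation rules, and `markdownBulletLevel` is a hypothetical helper name, not code from any client.

```go
package main

import (
	"fmt"
	"strings"
)

// markdownBulletLevel is a hypothetical helper. It reports the 1-based
// nesting level and the text of a Markdown-style bullet, assuming two
// spaces of indentation per level. ok is false when the line is not a bullet.
func markdownBulletLevel(line string) (level int, text string, ok bool) {
	// Step 1: strip and count the leading spaces.
	trimmed := strings.TrimLeft(line, " ")
	indent := len(line) - len(trimmed)
	// Step 2: only after skipping the indent can we look for the marker.
	if !strings.HasPrefix(trimmed, "* ") && !strings.HasPrefix(trimmed, "- ") {
		return 0, "", false
	}
	// Step 3: turn the indent width into a level.
	return indent/2 + 1, strings.TrimSpace(trimmed[2:]), true
}

func main() {
	// Illustrative input lines only.
	for _, line := range []string{"* item", "  - sub-item", "    * sub-sub-item", "  plain text"} {
		level, text, ok := markdownBulletLevel(line)
		fmt.Println(level, text, ok)
	}
}
```

The point of the sketch is that the linetype cannot be decided from the first few characters: the parser has to consume an unknown amount of indentation before it even sees the marker.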

The benefit of Markdown's syntax is that it's very readable if one were to just read the raw text of the Markdown document. It's almost as if Markdown lets the author control the raw presentation (they can specify how much indentation they want, for example), while clients control the rendered presentation. I suppose it tries to find a balance between these two sides.

Since AsciiDoc is primarily used to convert to other documents, it doesn't worry about the readability of the raw text; it cares about its parseability. (Kinda: it sometimes puts modifiers/attributes on different lines from what they modify, which makes the parser less line-based.) Here is AsciiDoc's syntax:

* List item
** sub-item
*** sub-sub-item
** Another sub-item
* Another list item

Notice how similar this syntax is to markdown and gemtext headings:

# This is a heading
## sub-heading
### sub-sub-heading
## Another sub-heading
# Another heading

Using AsciiDoc's list nesting with gemtext/Markdown's headings creates an interesting consistency between list items and headings, both of which allow nesting to convey relationships between different items and sections, respectively. Parsers can parse AsciiDoc's lists just as well as they could parse nested headings.
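
As a rough illustration of that consistency, the Go sketch below uses one function to find the level of both headings and bullets by counting a repeated marker at the start of the line. `prefixLevel` is a hypothetical helper, the sample lines are made up, and requiring a space or tab after the markers is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// prefixLevel is a hypothetical helper. It counts how many times marker
// repeats at the start of line and returns that count with the remaining
// text. The level is 0 when the run of markers is longer than maxLevel or
// is not followed by a space or tab.
func prefixLevel(line string, marker byte, maxLevel int) (int, string) {
	level := 0
	for level < len(line) && line[level] == marker {
		level++
	}
	if level == 0 || level > maxLevel || level == len(line) {
		return 0, line
	}
	if line[level] != ' ' && line[level] != '\t' {
		return 0, line
	}
	return level, strings.TrimLeft(line[level:], " \t")
}

func main() {
	// Illustrative input, not taken from a real document.
	doc := []string{"# Heading", "## Sub-heading", "* item", "** sub-item", "*** sub-sub-item"}
	for _, line := range doc {
		if level, text := prefixLevel(line, '#', 3); level > 0 {
			fmt.Printf("heading level %d: %s\n", level, text)
		} else if level, text := prefixLevel(line, '*', 4); level > 0 {
			fmt.Printf("bullet level %d: %s\n", level, text)
		}
	}
}
```

The same few lines of logic serve both linetypes; only the marker character and the maximum depth differ.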

The main consideration is going to be readability of the raw text. AsciiDoc's list nesting is not *unreadable*, but it does take some time to get used to. Here's this post's outline presented with AsciiDoc instead of the Markdown version from above:

# List Nesting Outline

One main reason for the decreased readability could actually just be where the text of each line starts. Because we can allow an optional amount of whitespace after the linetype prefix without complicating parsers, we could easily fix it:

# List Nesting Outline

There, much more readable. Nested bullets now have two spaces after their linetype prefix. This is totally readable for those who care about the readability of the raw text. And it still works well with our parsers.
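
Here is a small Go sketch of why the extra padding costs the parser nothing. It assumes the padding sits between the asterisks and the text, and it prints each bullet with two spaces of indentation per level regardless of how the source was padded. `renderBullet` is a hypothetical helper and the sample lines are illustrative; no maximum depth is enforced here.

```go
package main

import (
	"fmt"
	"strings"
)

// renderBullet is a hypothetical helper. It takes an AsciiDoc-style bullet
// line and returns it indented by two spaces per nesting level, ignoring
// however much padding the author put after the asterisks. Lines that are
// not bullets are returned unchanged.
func renderBullet(line string) string {
	// The level is the number of leading asterisks.
	level := len(line) - len(strings.TrimLeft(line, "*"))
	rest := line[level:]
	if level == 0 || rest == "" || (rest[0] != ' ' && rest[0] != '\t') {
		return line
	}
	// Padding after the prefix is dropped, so the client decides the layout.
	text := strings.TrimLeft(rest, " \t")
	return strings.Repeat("  ", level-1) + "* " + text
}

func main() {
	// Illustrative source lines; the nested items carry extra padding.
	for _, line := range []string{"* Syntax", "**  Markdown", "**  AsciiDoc", "***   Padding"} {
		fmt.Println(renderBullet(line))
	}
}
```

A padded line and an unpadded one produce the same output, so authors who care about the raw text can align it however they like.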

Ordered vs. Unordered Lists

So far we have only talked about unordered lists, but there are also ordered lists. Now, supporting these actually is more complicated, but I will make the case that making these optional is totally doable, and it was inspired by this post by Acidus:

(And see what I did there to make an ordered list? BOOM! Another thing @freezr brought up that you really don't need to worry about. Client could detect that pattern if they really want to and render it in a special way, just like how clients like Lagrange detect link lines whose link text starts with an emoji and render it in a special way.)

Tables in Gemtext, the non-hacky way

The way he did ordered lists was by starting each item with the exact same bullet syntax as unordered lists, but then placing the number + dot right after, like so:

# List Nesting Outline

Parsers that don't parse this will still print out these numbers/labels, just with a bullet attached to the front. Parsers that want to support the numbers/letters, however, can. I wrote code in Rust to support this quite easily, which you can see below.

Multiple Paragraphs In a Bullet?

The other thing that Markdown supports is nesting paragraphs under bullets. While we don't need to support this, we *could* add an extra syntax for this pretty simply (by using `*+`), and it wouldn't necessarily require much extra code (not even in the renderer, which just has to indent already-existing rendering code):

# List Nesting Outline

Now, we *could* go all out and allow `*>` and `*=>` as well, but that's a bit much, and might not be used all that often, if at all. Sub-paragraphs under lists, though, *are* used quite a lot.
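
A rough Go sketch of how little the `*+` idea might need follows. The exact syntax is an assumption made for this sketch: N asterisks followed by `+` and a space mark a paragraph that continues a level-N bullet. `classify` is a hypothetical helper and the sample lines are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// classify is a hypothetical helper. It assumes that N asterisks followed by
// "+ " mark a paragraph continuing a level-N bullet, and that N asterisks
// followed by a space mark the bullet itself. kind is "bullet", "para",
// or "text" for anything else.
func classify(line string) (kind string, level int, text string) {
	level = len(line) - len(strings.TrimLeft(line, "*"))
	rest := line[level:]
	switch {
	case level > 0 && strings.HasPrefix(rest, "+ "):
		return "para", level, strings.TrimLeft(rest[2:], " \t")
	case level > 0 && strings.HasPrefix(rest, " "):
		return "bullet", level, strings.TrimLeft(rest, " \t")
	}
	return "text", 0, line
}

func main() {
	lines := []string{
		"* First point",
		"*+ More detail on the first point.",
		"** Nested point",
		"**+ Detail on the nested point.",
	}
	for _, line := range lines {
		kind, level, text := classify(line)
		// Paragraphs align with the text of the bullet they belong to.
		indent := strings.Repeat("  ", level)
		if kind == "bullet" {
			// Bullets start one step to the left so the marker fits.
			indent = strings.Repeat("  ", level-1) + "* "
		}
		fmt.Println(indent + text)
	}
}
```

The renderer reuses the same indentation it already computes for bullets; the only new work is recognizing one more prefix.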

How Much More Code Is Really Needed to Parse Nesting?

In reality, parsing gemtext is a fairly small fraction of a whole Gemini client, which must also deal with printing, a GUI/TUI/CLI system, TLS, user certificates, navigation, and TOFU. If we can add a couple of features to our markup language with minimal changes that have a giant impact on the usability of the protocol as a whole, then that is a big win.

It turns out my gemtext parsing is about 200 lines of code, and it handles Gemtext, Spartan's Gemtext variant, and a subset of scrolltext and markdown (excluding inline links), with basic word-wrapping. Printing out emphasis, italics, and monospace within paragraphs and list items is another 100 lines that took an hour or so to write - and it has about 13 lines of printf statements for the CLI presentation of gemtext. To support GUIs and TUIs, that part would go up a little, like most everything else in GUI code, but it would be marginal compared to the rest of the GUI that would be required.

In contrast, my Scroll Protocol client library code, which is based on makeworld's Gemini client code, is over 800 lines of code, excluding the actual command line UI stuff. The gemtext parsing code is about 26% of 1100 lines of code for a basic client without TOFU handling or certificate storage, and with basic gemtext/scrolltext parsing. **To support bullets with nesting using AsciiDoc's syntax, I added just 17 lines of code, 3 of which were for the nesting itself,** which took about 30 minutes to do.

When we make decisions about formats, we need to look at real code. Many times our assumptions turn out to be completely different from the reality when we start writing the code to solve the problem. Sometimes it takes thinking out of the box, but sometimes it takes just actually writing the code.

Bullet nesting is such an easy addition that requires fairly little effort on the part of client writers, and it impacts usability and expressiveness a lot; there's significantly more power than weight. It does, however, depend on *how* we do this. Markdown's approach is non-intuitive to parsers, but that's partially because it is focused on the raw-text still being readable. AsciiDoc, on the other hand, uses an approach that is significantly easier to parse, but requires some adjustments for readers of the raw text. I do believe these adjustments are fairly small and ultimately worth it.

Golang Code Example

Here's example code in Golang, taken from my Scroll-Term client, that will parse list nesting, up to a max of 4 levels. I have not added `*+` support yet, but it should be fairly simple.

Note that this code is not much more than what it would take to do lists without nesting. I supported nesting by adding 3 new lines to the code. That's it, *just three.* Instead of assuming the level is 1, I find what the level is by taking the index of the first character on the line that is not an asterisk.

// A bullet line starts with one to four asterisks followed by a space.
if strings.HasPrefix(line, "* ") || strings.HasPrefix(line, "** ") ||
	strings.HasPrefix(line, "*** ") || strings.HasPrefix(line, "**** ") {
	// The nesting level is the index of the first character that is not an asterisk.
	// Note that this works because we know that asterisk is one byte in UTF-8.
	level := strings.IndexFunc(line, func(r rune) bool {
		return r != '*'
	})
	parsingState := TextParsingState{}
	// Leave room for the "* " marker plus two columns of indent per extra level.
	wordWrapper = wordwrap.Wrapper(context.maxwidth-2-((level-1)*2), false)
	multiline := wordWrapper(strings.TrimLeft(line[level:], " \t"))
	wrappedLines := strings.Split(multiline, "\n")
	for i, wrapped := range wrappedLines {
		if i == 0 {
			// The first wrapped line carries the bullet marker.
			fmt.Printf("%s%s* ", indentationString, strings.Repeat("  ", level-1))
		} else {
			// Continuation lines align with the text after the marker.
			fmt.Printf("%s%s  ", indentationString, strings.Repeat("  ", level-1))
		}
		parsingState.print_markdown(bufio.NewReader(strings.NewReader(wrapped)), true)
		fmt.Printf("\n")
	}
}

Rust Code Example, With Ordered List Support

Here's a Rust example that will parse bullets as well as get the number for ordered bullets. It returns an enum value that contains the necessary information for the bullet, or `None` if the line is not a bullet:

// Assumes ScrollLine borrows its text from the input line, with variants
// OrderedBullet(label, text, level) and UnorderedBullet(text, level).
fn parse_bullet(line: &str) -> Option<ScrollLine<'_>> {
    if line.starts_with("* ") || line.starts_with("*\t")
        || line.starts_with("** ") || line.starts_with("**\t")
        || line.starts_with("*** ") || line.starts_with("***\t")
        || line.starts_with("**** ") || line.starts_with("****\t")
    {
        // The level is the byte index of the first character that is not an asterisk.
        // Note that this works because we know that asterisk is one byte in UTF-8.
        let level = line.find(|c: char| c != '*')?;
        let line = line[level..].trim_start();
        let (ordered, label, text) = bullet_is_ordered(line);
        if ordered {
            Some(ScrollLine::OrderedBullet(label, text, level))
        } else {
            Some(ScrollLine::UnorderedBullet(line, level))
        }
    } else {
        None
    }
}

// Pass in a line string slice with the bullet prefix (e.g., "* ") trimmed. Returns whether it's an
// ordered or unordered bullet, the label, and the text. If unordered, the label is an empty slice.
fn bullet_is_ordered(line: &str) -> (bool, &str, &str) {
    let text = line.trim_start();
    // The label is the leading run of digits and dots. If the whole line is
    // digits and dots (or empty), the label runs to the end of the line.
    let label_end = text
        .find(|c: char| !c.is_ascii_digit() && c != '.')
        .unwrap_or(text.len());

    if label_end == 0 {
        // Unordered: no label.
        (false, &text[0..0], text)
    } else {
        // Ordered: strip the trailing dot from the label and the space after it.
        let label = text[..label_end].trim_end_matches('.');
        (true, label, text[label_end..].trim_start())
    }
}

Continue the Series

Here are the next articles in this series:

2024-03-26 The Case for a 4th-Level Heading

2024-03-27 Who Controls Presentation? Presentation vs. Semantics

2024-03-28 Headers, Footers, Sidebars, and Footnotes