💾 Archived View for auragem.letz.dev › devlog › 20240325.gmi captured on 2024-09-29 at 00:21:30. Gemini links have been rewritten to link to archived content
⬅️ Previous capture (2024-05-10)
-=-=-=-=-=-=-
This is part of my series on re-assessing the designs of Gemini and Gopher:
2024-03-22 Gopher's Uncontextualized Directories vs. Gemini's Contextualized Directories
2024-03-23 What Gemini Gets Wrong With Anti-Extensibility
2024-03-24 The Necessary Semantics behind Emphasis and Strong
Today we are going to talk about List Nesting, which Gemini deliberately excluded to keep the gemtext format very simple. I believe this was a mistake, because at the very least 2-3 levels of nesting would take care of most common uses for list-nesting. In fact, Gemini already has heading-nesting up to a max of 3 levels. The same approach can be used for lists.
Part of the reason Gemini might have done away with nesting is because of the Markdown syntax, which indents the nested bullets using spaces. This markdown-like syntax breaks one of Gemini's rules, which is that a client must be able to determine a linetype within its first three characters. However, the syntax we choose determines whether clients will be able to easily parse nested lists or not, and Markdown is not the only syntax we can choose.
The most obvious use for nested lists is todo lists. Nesting allows you to have bullets of actions to take that can be further subdivided. Here's an example:
### 2023-03-25
The other obvious use is an outline and table of contents:
# List Nesting Outline
While clients can already take a document and construct its outline or table of contents, if someone was writing an outline separate from the paper or document, they might want to be able to present this outline over gemtext or markdown. Using preformated text might work, but it breaks word-wrapping and removes semantic detail.
Here's another use that is particularly common in repository Readme files - features lists for software:
# Scroll-Term Features:
There's one more important one - note taking, or summary notes. It looks very much similar to the above, although there's more reason in notes to want multiple paragraphs in one bullet.
All of the uses above have one core purpose: semantic detail. Lists have meaning. Sub-lists have semantic meaning. Why don't we just use text? Because plain paragraphs don't have the semantic detail of relationships between themselves.
In fact, as I referenced before, this semantic idea of nesting is the same principle used by headings. What makes nesting lists different from headings is that list items are not sections, they are paragraphs themselves. They are items. Headings cover sections, bullets cover items.
In English we have lists too! I can list off things, items, words, attributes, and actions, using commas. What we *do not* have in most languages is sublists, at least not in a way that is easy to understand. My list of items, including objects, ideas, emotions, and concepts, words, including nouns, verbs, adjectives, prepositions, and particles, attributes, including color, size, demeanor, background, and context, and actions, including passive and active, including participles, abverbials, and basic verbs.
It's hard to tell where the heck the sublists end, isn't it? (There's that sneaky "and" + last item.) It's also a little ambiguous whether some are sublists of sublists. Not to mention it is really hard to read as well! I *could* use parentheses to list out these items (including objects, ideas, emotions, concepts), words (including nouns, verbs, adjectives, prepositions, particles), attributes (including color, size, demeanor, background, context), and actions (including passive and active (including participles, adverbials, and basic verbs)).
Unfortunately, nesting parentheses more than 2 levels makes your list look like Lisp. Yuck!! And yet, even so, we've just introduced a syntax. A parser could try to read this syntax, actually, but determining whether parentheses are being used for lists or something else is certainly a problem. Parsing it would certainly be harder than Markdown nested lists, though.
Hey! Why don't we just introduce a nested list syntax! Parsers and clients could then read it. Clients could even *control the presentation* of this list, instead of putting it in preformatted toggles where the *author* has to control the presentation. But what will our syntax be...
Gemini seemed to have rested on the assumption that to support nesting, we must trade ease of parsing. This is an incorrect assumption. But let us first cover why Markdown syntax might be rejected.
Markdown syntax for normal lists places an asterisk (or hyphen) at the beginning of a new line followed by a space and some text, like so:
For sublists, you *indent* the list item with spaces, like so:
Parsers now have to make sure to read every line to see if it was indented by spaces, and if this indent is followed by an asterisk or hyphen, then it's a sublist. This isn't *hard*, but it is a bit more work than necessary if we just switched the syntax to something easier.
The benefit of Markdown's syntax is that it's very readable if one were to just read the raw text of the markdown document. It's almost as if Markdown allows its raw presentation to be controlled by the author (they can specify how much indentation they want, for example), but markdown's presentation by clients isn't. I suppose it tries to find a balance between these two sides.
Since AsciiDoc is primarily used to convert to other documents, it doesn't worry about the readability of the raw text, it cares about its parseability (kinda, it sometimes puts modifiers/attributes on different lines from what they modify, which makes the parser less line-based). Here is AsciiDoc's syntax:
Notice how similar this syntax is to markdown and gemtext headings:
# This is a heading ## sub-heading ### sub-sub-heading ## Another sub-heading # Another heading
Using AsciiDoc's list nesting with gemtext/markdown's headings creates an interesting consistency between list items and headings, both of which allow nesting to convey relatioships between different items and sections, respectively. Parsers can parse AsciiDoc's lists just as well as they could parse nested headings.
The main consideration is going to be readability of the raw text. AsciiDoc's list nesting is not *unreadable*, but it does take some time to get used to. Here's this post's outline presented with AsciiDoc instead of the Markdown version from above:
# List Nesting Outline
One main reason for the decreased readability could actually just be where the lines start. Because we can have an optional number of whitespace at the start of lines without complicating parsers, we could easily fix it:
# List Nesting Outline
There, much more readable. Nested bullets now have two spaces after their linetype prefix. This is totally readable for those who care about the readability of the raw text. And it still works well with our parsers.
So far we have only talked about unordered lists, but there are also ordered lists. Now, supporting these actually is more complicated, but I will make the case that making these optional is totally doable, and it was inspired by this post by Acidus:
(And see what I did there to make an ordered list? BOOM! Another thing @freezr brought up that you really don't need to worry about. Client could detect that pattern if they really want to and render it in a special way, just like how clients like Lagrange detect link lines whose link text starts with an emoji and render it in a special way.)
Tables in Gemtext, the non-hacky way
The way he did ordered lists was by starting each item with the exact same bullet syntax as unordered lists, but then placing the number + dot right after, like so:
# List Nesting Outline
Parsers that don't parse this will still print out these numbers/labels, just with a bullet attached to the front. Parsers that want to support the numbers/letters, however, can. I wrote code in Rust to support this quite easily, which you can see below.
The other thing that Markdown supports is nesting paragraphs under bullets. While we don't need to support this, we *could* add an extra syntax for this pretty simply (by using `*+`), and it wouldn't necessarily require much more extra code (not even in the renderer, which just has to indent already-existing rendering code):
# List Nesting Outline
Now, we *could* go all out and allow `*>` and `*=>` as well, but that's a bit much, and might not be used all that often, if at all. Sub-paragraphs under lists, though, *are* used quite a lot.
In reality, parsing gemtext is a fairly insubstantial percentage of a whole Gemini client, which must deal with printing, a GUI/TUI/CLI system, TLS, User Certificates, navigation, and TOFU. If we can add a couple of features to our markup language with minimal changes that adds a giant impact to the usability of the protocol as a whole, then that is a big win.
It turns out my gemtext parsing is about 200 lines of code, and it handles Gemtext, Spartan's Gemtext variant, and a subset of scrolltext and markdown (excluding inline links), with basic word-wrapping. Printing out emphasis, italics, and monospace within paragraphs and list items is another 100 lines that took an hour or so to write - and it has about 13 lines of printf statements for the CLI presentation of gemtext. To support GUIs and TUIs, that part would go up a little, like most everything else in GUI code, but it would be marginal compared to the rest of the GUI that would be required.
In contrast, my Scroll Protocol client library code, which is based on makeworld's Gemini client code, is over 800 lines of code, excluding the actual command line UI stuff. The gemtext parsing code is about 26% of 1100 lines of code for a basic client without TOFU handling or certificate storage, and with basic gemtext/scrolltext parsing. **To support bullet with nesting using AsciiDoc's syntax, I added just 17 lines of code, 3 of which were added for just the nesting itself,** which took about 30 minutes to do.
When we make decisions about formats, we need to look at real code. Many times our assumptions turn out to be completely different from the reality when we start writing the code to solve the problem. Sometimes it takes thinking out of the box, but sometimes it takes just actually writing the code.
Bullet nesting is such an easy addition that requires fairly little effort on the part of client writers, and it impacts usability and expressiveness a lot; there's significantly more power than weight. It does, however, depend on *how* we do this. Markdown's approach is non-intuitive to parsers, but that's partially because it is focused on the raw-text still being readable. AsciiDoc, on the other hand, uses an approach that is significantly easier to parse, but requires some adjustments for readers of the raw text. I do believe these adjustments are fairly small and ultimately worth it.
Here's example code in Golang, taken from my Scroll-Term client, that will parse list nesting, up to a max of 4 levels. I have not added `*+` support yet, but it should be fairly simple.
Note that this code is not much more than what it would take to do lists without nesting. I supported nesting by adding 3 new lines to the code. That's it, *just three.* Instead of assuming the level is 1, I find what the level is by taking the index of the first character on the line that is not an asterisk.
if strings.HasPrefix(line, "* ") || strings.HasPrefix(line, "** ") || strings.HasPrefix(line, "*** ") || strings.HasPrefix(line, "**** ") { // Note that this works because we know that asterisk is one byte in UTF-8. level := strings.IndexFunc(line, func(r rune) bool { return r != '*' }) parsingState := TextParsingState{} wordWrapper = wordwrap.Wrapper(context.maxwidth-2-((level-1)*2), false) multiline := wordWrapper(strings.TrimLeft(line[level:], " \t")) lines := strings.Split(multiline, "\n") for i, line := range lines { if i == 0 { fmt.Printf("%s%s* ", indentationString, strings.Repeat(" ", level-1)) } else { fmt.Printf("%s%s ", indentationString, strings.Repeat(" ", level-1)) } parsingState.print_markdown(bufio.NewReader(strings.NewReader(line)), true) fmt.Printf("\n") } }
Here's a rust example that will parse bullets as well as get the number for ordered bullets. It returns an enum type that contains the necessary information for the bullet:
func parse_bullet(line &str) -> ScollLine { if line.starts_with("* ") || line.starts_with("*\t") || line.starts_with("** ") || line.starts_with("**\t") || line.starts_with("*** ") || line.starts_with("***\t")|| line.starts_with("**** ") || line.starts_with("****\t") { // Note that this works because we know that asterisk is one byte in UTF-8. let Some(level) = line.find(|c| { return c != '*' }) else { todo!() }; let line = line[level..].trim_start(); let (ordered, label, text) = bullet_is_ordered(line); if ordered { ScrollLine::OrderedBullet(label, text, level) } else { ScrollLine::UnorderedBullet(line, level) } } } // Pass in a line string slice with the bullet prefix (e.g., "* ") trimmed. Returns whether it's an // ordered or unordered bullet, the label, and the text. If unordered, the label is an empty slice. fn bullet_is_ordered(line: &str) -> (bool, &str, &str) { let mut text = line.trim_start(); let Some(label_end) = text.find(|c: char| { return !c.is_digit(10) && c != '.' }) else { todo!() }; return if label_end == 0 { // Unordered (false, &text[0..0], text) } else { // Ordered let label = &text[..label_end].trim_end_matches("."); text = &text[label_end..]; (true, label, text) } }
Here are the next articles in this series:
2024-03-26 The Case for a 4th-Level Heading
2024-03-27 Who Controls Presentation? Presentation vs. Semantics