💾 Archived View for auragem.letz.dev › devlog › 20240506.gmi captured on 2024-12-17 at 09:53:38. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
I have been dissatisfied with Gemini for a while. It's not because it's terrible, but because it doesn't meet what I think its full potential is, and excuses are often made for this. There is a divergence of values between what I wished Gemini to be and what Gemini has developed into. I am going to do the unspeakable and post a former Node.JS developer's[^1] talk on Platform Values, where he explains how the values of Node.JS in the beginning changed around 2012-2014 to become more aligned to the values of JavaScript. Originally, Node.JS was focused on performance, simplicity, and approachability, but over time the community moved toward JavaScript's values of velocity, expressiveness, and approachability, and this resulted in many core developers leaving the project.
Platform as a Reflection of Values by Bryan Cantrill
[^1] Bryan Cantrill also has lots of experience in systems programming, working at Sun Microsystems, and later Oracle, and then Joyent, and now Oxide Computer Company.
I will start with my criticisms of Gemini's current core-values:
Gemini does not live up to some of these core values, and it sacrifices some core values for others to what I believe to be the detriment of the protocol, but more importantly, the default textual document format (gemtext).
Gemini confuses minimalism with simplicity and approachability. Like in music, minimalism is not always approachable in that some minimalist music is seen as boring by the mainstream, but more importantly, minimalist music can also be more complex and harder to create than less-minimalist music. It's harder to attach emotions to minimalist music, it's harder to convey meaning. Minimalism is always in tension with approachability, simplicity, and expressiveness. However, as will be demonstrated later, Gemini prioritizes minimalism over everything else almost at all times, to a great detriment to expressiveness, which is hard to imagine for a protocol and document format that is meant to provide a "timeless" and "text centric" reading experience.
Gemini's aspirations towards User Autonomy and Privacy are respectable, but it takes an all-or-nothing approach that lacks balance and nuance. For example, the lack of inline images is excused by the core value of user autonomy. When criticisms are made, the FAQ responds with:
nobody in Gopherspace complains about it
This is a clear case of selection bias. If you only listen to those who currently use Gemini, then you are only looking at opinions that are more likely to agree with you because they use the protocol. This same technique is also used by certain individuals on Twitter and other Social Media sites when they create polls. Their polls are always biased towards those who would be on Twitter or Social Media, and towards those who follow or read their tweets. In addition, a group of people can use a protocol or software and still wish it had some feature, even if they don't complain about it, or rather, even if you don't *see them* complain about it.
My response to inline images is this: users expect that documents that allow inline images might actually use inline images; users are not stupid, don't insult them. Sometimes Gemini goes so far to the extreme User Autonomy side that it forgets that inline images are not JavaScript and they can't run arbitrary code on the user's computer. In the case of reducing the number of connections made in the background, a solution is embedding images into documents (which is already doable with clients that support `data:` URLs). Regardless, there is a bigger implication underlying how inline images are responded to that I view to be more severe and extreme, and it is woven into the fabric of the Gemini community and parts of the Gopher community as a fundamental foundation, a foundation that I find to be extremely flawed and that is one of the reasons for the divergence of core values in the Scroll Protocol.
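To make the `data:` URL point concrete, here is a minimal Go sketch that embeds image bytes directly into a gemtext link line, so no second connection is needed to fetch the image. The function names `dataURL` and `linkLine` are hypothetical helpers of mine, and whether a given client actually renders such a link inline is up to that client.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// dataURL encodes raw bytes as a base64 data: URL (RFC 2397) with the
// given media type.
func dataURL(mediaType string, content []byte) string {
	return "data:" + mediaType + ";base64," + base64.StdEncoding.EncodeToString(content)
}

// linkLine formats a gemtext link line: "=> URL label".
func linkLine(url, label string) string {
	return "=> " + url + " " + label
}

func main() {
	// The 8-byte PNG signature stands in for a real image file here.
	png := []byte{0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A}
	fmt.Println(linkLine(dataURL("image/png", png), "Embedded image"))
}
```

The image travels inside the document itself, so the only request the client makes is the one the user already chose to make.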
As with privacy, Gemini's core value on privacy is notable, but it's not wholly successful, unfortunately. I will talk more about this in the Anti-extensibility section.
Gemini's FAQ overstates its text-centric and "timeless" experience. This is the *biggest* complaint I have with Gemini - its document format does not even come close to meeting this core value. What good is a library with limited semantic meaning? The expressiveness of gemtext is severely hampered by arbitrary restrictions that *are not* informed by real-world examples or experience in writing parsers. The implicit assumption behind the FAQ's response is that all drawn lines will be arbitrary.
a line has to be drawn somewhere, even if it's ultimately arbitrary, otherwise expansion continues forever.
Emphasis and strong (which are presented as italics and bold in visual-textual documents) have been in printed and electronic textual documents for a very long time. They have value because they convey meaning that has no linguistic equivalent. Gemini seems to be just fine with punctuational markup because it has become standard in most writing and because it is a part of the text encoding that underlies gemtext documents (i.e., UTF-8, etc.). Punctuational markup includes spaces, commas, parentheses, quotes, periods, exclamation points, and question marks, among others. They all have semantic value. They convey phrases, pauses, parentheticals, ends of sentences, word boundaries, and mood and voice, which are often conveyed with intonation in spoken language. Interestingly, many of these were invented much later on, with spaces between words originating around the 7th century with Irish and British scribes. Chapter and verse numbers in the Bible are likewise not original to the text; they were added much later, and the Geneva Bible of 1560 was the first English Bible printed with numbered verses, although an original sectioning scheme was used at least 1500 years prior in the Hebrew Manuscripts, often called the open and closed parsha (petuhah and setumah, respectively), as well as the Sedarim.
Emphasis and strong also convey semantic detail that would be conveyed through phrase-level intonation in spoken language (and other methods in sign language), and yet they are missing from gemtext, even when other forms of semantic detail are presented visually, like blockquotes, headings, and list-items. This doesn't just hamper semantic detail and user autonomy, it also hampers text-to-speech systems, and other potential translation systems for those with disabilities.
For users to be able to control the presentation of documents, browsers and document viewers must be able to parse the different elements that users might want to change the visuals of. Having a consistent syntax for representing these is necessary so that clients have something specific that they can use to parse. Elements that users might want to have control over might include thematic breaks, footnotes/endnotes, tables, nested list items, and ordered lists. These are all presentable in plain text, but you miss the parseability aspect that is required for clients to affect the visual presentation of them.
There are two underlying assumptions behind much of the missing textual features of gemtext:
Those in geminispace severely overestimate the work it requires to parse inline markup. I wrote an article recently on Strong and Emphasis:
2024-03-24 The Necessary Semantics behind Emphasis and Strong
When I wrote this, I wanted to show the importance of these old and useful semantic ideas, but I also wanted to present a consistent syntax that tries to adhere to some common usage and reduces edge-cases. Had I removed the goal of using what is common, a consistent syntax without edge cases would have been easy, and has been done in many other markup languages. I also presented Golang code that would parse these from a stream. What I found is that it consisted of 350 lines of code of three separate functions to both parse and print to the terminal the scroll, asciidoc, and markdown variants of emphasis and strong inline markup. If you just wanted the parser for scroll, it's just 107 lines of code. This isn't much compared to the rest of the gemtext/scrolltext parser, let alone the rest of the client itself.
I want to quickly bring up a point Gemini's FAQ brings up:
It seems on the face of it like it would be a trivial extra little thing to add, but it would force a switch from line-by-line parsing of the format with a single bit of state ... to character-by-character, or at least word-by-word, parsing of the format with three bits of state, introduce a bunch of edge-cases and make it easy to accidentally mangle a document.
Yes, the parser would introduce two more bytes, or bits if using a bitfield (which of course means there would be no additional memory usage due to padding), of information. And yes, it *could* introduce edge-cases if you didn't define the syntax well, like in Markdown. It also could allow one to mangle a document by accident, just like they already can with preformatted toggles. Of course, document authors would be able to tell whether a document is mangled by checking how it is rendered. A consistent and well-thought-out syntax would also remove edge-cases. Additionally, the parser would actually just combine line-by-line with character-by-character, so you don't actually need to remove the line-by-line part of the parser, you would just add the character-by-character part. Scroll-term and Profectus take this obvious approach, and they both also parse word-by-word anyways to get word-wrapping. More importantly, this has been a solved problem that doesn't take a lot of work to write by even the most basic of programmers, and it is ultimately very little code in comparison to the TLS, TOFU, and rendering parts of a client that are required by the Gemini spec. The power-to-weight ratio leans towards power in my opinion, and that is why I made this an optional feature for clients in the Scroll Protocol spec. Not to mention people will still use markdown-like syntax for strong and emphasis even if it's unsupported by clients or not in the spec, and text-to-speech parsers would probably want to add the parser in anyways so that the text isn't disruptive when the markup occurs. Again, user control of presentation requires that the elements to be modified have a standard syntax that can be parsed.
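As a rough illustration of how small the combined approach is, here is a minimal Go sketch that keeps the line-by-line pass and drops into a character-by-character pass only for ordinary text lines, with the whole state held in three bits. It is not the parser from my article or from Scroll-term or Profectus. The markers are an assumption for this example only (`*` toggles strong, `_` toggles emphasis), the output uses HTML-like tags purely so the result is easy to inspect, and escaping and mis-nested spans are not handled.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Parser state packed into a bitfield: one bit for the existing
// preformatted toggle plus two bits for the inline spans.
const (
	inPre uint8 = 1 << iota
	inStrong
	inEmphasis
)

// tag returns an opening or closing marker for the given span name.
func tag(name string, open bool) string {
	if open {
		return "<" + name + ">"
	}
	return "</" + name + ">"
}

// render walks the document line by line and scans characters only
// inside ordinary text lines.
// ASSUMPTION: "*" toggles strong and "_" toggles emphasis; these
// markers are illustrative and not taken from any spec.
func render(doc string) string {
	var out strings.Builder
	var state uint8
	sc := bufio.NewScanner(strings.NewReader(doc))
	for sc.Scan() {
		line := sc.Text()
		// Line-level pass: the preformatted toggle, as in gemtext.
		if strings.HasPrefix(line, "```") {
			state ^= inPre
			continue
		}
		if state&inPre != 0 {
			out.WriteString(line + "\n") // preformatted text is passed through untouched
			continue
		}
		// Character-level pass for inline markup.
		for _, r := range line {
			switch r {
			case '*':
				state ^= inStrong
				out.WriteString(tag("strong", state&inStrong != 0))
			case '_':
				state ^= inEmphasis
				out.WriteString(tag("em", state&inEmphasis != 0))
			default:
				out.WriteRune(r)
			}
		}
		// Close spans left open so a stray marker cannot mangle
		// anything beyond its own line.
		if state&inEmphasis != 0 {
			out.WriteString("</em>")
		}
		if state&inStrong != 0 {
			out.WriteString("</strong>")
		}
		state &^= inStrong | inEmphasis
		out.WriteString("\n")
	}
	return out.String()
}

func main() {
	fmt.Print(render("plain *strong* and _emphasis_\n```\n*not parsed*\n```\n"))
}
```

Resetting the inline bits at the end of each line is one possible design choice: it limits the damage of an unclosed marker to a single line, which is a smaller blast radius than a forgotten preformatted toggle has today.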
I wrote two other articles that readers may find interesting, on Gemini's arbitrary lack of list nesting and headings *three* levels under the document title (or what would be more-consistently called a "level-4 heading" in gemtext):
The Simplicity of List Nesting: How AsciiDoc Does It
The Case for a 4th-Level Heading
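For a sense of scale, the AsciiDoc-style approach to nesting amounts to counting how many times a marker repeats at the start of a line. The Go sketch below uses one hypothetical helper, `level`, for both list depth (`*`, `**`, `***`) and heading depth (`#` through `####`). Requiring a space after the marker is a simplification I am assuming here, not a quotation from either spec.

```go
package main

import (
	"fmt"
	"strings"
)

// level counts how many times marker repeats at the start of line and
// requires a space after the run, e.g. "** item" -> 2. It returns 0
// when the line does not start with the marker in that form.
func level(line string, marker byte) int {
	n := 0
	for n < len(line) && line[n] == marker {
		n++
	}
	if n == 0 || n >= len(line) || line[n] != ' ' {
		return 0
	}
	return n
}

func main() {
	for _, line := range []string{"* fruit", "** apple", "*** granny smith", "#### A fourth-level heading"} {
		if d := level(line, '*'); d > 0 {
			// Indent list items by their nesting depth.
			fmt.Println(strings.Repeat("  ", d-1) + "- " + line[d+1:])
		} else if h := level(line, '#'); h > 0 {
			fmt.Printf("[h%d] %s\n", h, line[h+1:])
		}
	}
}
```

The parser stays line-oriented and needs no extra state: the depth is read off the line itself.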
Onto the next point: Gemini takes a hard stance against graphic design and the visual arts having semantic substance. Typography is seen as supporting the textual document, like drums and harmony support mainstream music genres. But just as there's genres that consist primarily of drums or harmony at the forefront, there are genres of graphic design and art where the typography or the visual presentation is at the forefront of the semantic meaning of the art. The division between presentation and semantics is more fluid than what is typically described by users within Geminispace, and that is the main point of my recent article on this:
Who Controls Presentation? Presentation vs. Semantics
Both approaches, presentational markup and semantic markup, have their uses. Even gemtext allows for limited presentational markup in the form of preformatted blocks. Gemtext does not need to be a presentational markup, but one cannot dismiss the validity of presentational markup merely to emphasize the semantic-only approach.
This is where Gemini takes a major divergence from Gopher: Gopher is document-centric, not text-centric. The difference is in how the community treats linking to media resources from their menus/pages. Interestingly, the text-centric emphasis in Gemini's FAQ doesn't appear to represent everyone within the community, from my experience, which has caused debates and arguing about what Gemini is meant to be and whether audio and video, and even streaming, are viable over the protocol. The inconsistency between the documentation and the historical development articles certainly causes some confusion. At times solderpunk celebrates streaming, audio, and gemini-apps, and then other times, like in the FAQ, these are relegated to second-class.
2020-06-16 A vision for Gemini applications (Solderpunk)
Gemini's anti-extensibility is one of the reasons why many useful and powerful, yet simple, features were not added to the protocol. It is the very reason given in the FAQ for why content lengths and upload request types were not added - because the addition of more fields to requests and responses would add a way to extend the protocol. Therefore, Gemini takes an all-or-nothing approach where nothing is better than features that might be seen to sacrifice certain goals of the protocol, particularly anti-extensibility:
we think it's worthwhile giving up some small luxuries to reduce the odds of that happening, even a little.
The problem is that this is binary thinking that is far-removed from reality. This binary, black-and-white thinking is what leads people to believe that every situation is a Trolley problem where only two options are available, rather than considering that there may be many more options that don't require sacrificing the few for the many. It's a shoot-before-thinking approach. Had the problems been thought through, however, one might realize that you can have anti-extensibility *and* multiple fields within requests and responses, and that gemini was not completely successful in being anti-extensible in the first place. One should also point out that gemini responses use a mechanism that allows them to have two fields without extensibility concerns, and that requests use fields that are extensible anyways. These are some of the general thoughts that are explored in-depth in my article on Gemini's Anti-extensibility:
2024-03-23 What Gemini Gets Wrong With Anti-Extensibility
Ultimately, what Gemini gets wrong with anti-extensibility is not that many parts of it are anti-extensible, but *how* they are anti-extensible to begin with. It is not the minimalism and the lack of extra fields and metadata that make it anti-extensible, it's the way the request and response parsing works!
Fairly early on Gemini emphasized the "power-to-weight" ratio: that features should have more power than weight to be considered worth adding to the protocol. Weight included work taken to parse something, or the feature's effects on the rest of the protocol, especially in terms of spec approachability, privacy, and extensibility. However, when you have a skewed sense of weight due to ill-defined or mis-defined goals, you tend to over-prioritize minimalism at the expense of power, and you use fallacious justifications like the following:
If one or two of these features were added, there would be no substantial net change in community satisfaction; if all of them were added, then everybody would find something to be upset about. Meanwhile, the spec would get longer and longer and clients would have to be updated again and again, all without the obvious payoffs.
Aside from the slippery-slope fallacy, this approach to features comes from a disconnect between the values and goals of the project, the project's spec, and the project's community. Bryan Cantrill's video above explains this well. This is why you must have a clear standard of values and goals and you stick to them. If your values and goals are ambiguous or contradictory, then that poses a problem both for how the spec is defined, leading to arbitrary decisions, and how the community interacts with and uses the project.
Gemini's goals were not completely sufficient, they contradicted the decisions being made, and the "text-centric" approach was a later addition, according to what I remember.[^2] This has led to a community of diverse viewpoints as well as arguments about streaming, videos, music, social media, gemini apps, favicons, etc. The community comes to a natural consensus either by convincing each other of their ideas or by people leaving the project, like what happened with Node.JS. Usually it's a bit of both. BDFL projects are not completely immune to the arguments that might happen within a project's community. Rather, having a clear set of values and goals upfront and *sticking to them* reduces community strife because people know what they are getting into beforehand. Gemini was only mildly successful in this by having some values that it stuck to, namely the focus on privacy and user autonomy. However, people have left and criticisms within the community have been made due to ambiguities and contradictions in the values and goals.
[^2] Gemini always had some emphasis on text, but from what I remember, early on this was never described as "text-centric," which implies text is at the center and is the most important over everything else.
I view many of the decisions made about Gemini and gemtext to be very misplaced. Minimalism is overemphasized so much that it contradicts the power-to-weight value. Anti-extensibility is misused as the justification for missing features because of an all-or-nothing approach that leads to rash decisions and rejections. Some features one might wish gemtext had are missing and have no justification whatsoever, like the lack of list-nesting, and when they do have a justification, as with strong and emphasis, the percentage that parsing takes within a full gemini client is inflated. This has led to certain features that are indisputably used to great effect within the printing, writing, and electronic publishing fields being omitted from gemtext, with no replacement aside from plain text alternatives that offer no parseability and no user control over presentation. This contradicts the current (and new) value of being an electronic library. These elements include thematic breaks, a third nesting of headings under the document title, list nesting, quote nesting, footnotes/endnotes, and yes, strong and emphasis.
This has made gemtext *less* simple for the sake of minimalism, because it has reduced expressibility. Markdown to gemtext, HTML to gemtext, and AsciiDoc to gemtext converters become harder to write. Gathering semantic information from across the space is more limited. Distinguishing between ASCII art and code blocks is impossible without complex programming language detectors. Distinguishing between ASCII art and regular preformatted text is significantly harder than it should be. Text-to-speech systems have less semantic detail to utilize. These are not good tradeoffs for document-centric or text-centric protocols, formats, and systems. For all of these reasons, Gemtext is less approachable than it needs to be, and it is less simple in that you have to find workarounds for a lot of missing features, just like you do with the idiosyncrasies of C, Assembly, and even Golang. Minimalism and simplicity are not the same thing.
Fortunately, many of these problems are actually very simple to define, fix, and implement in clients; we just need someone to do it...
The Scroll Protocol's values are as follows:
Scroll's decisions will be based on real-world experiences in writing, organizing, and designing textual documents, like that of books, as well as real-world experience in writing parsers and clients (i.e., Profectus and Scroll-term). Decisions should always be made by using your project and experiencing it, which is why Scroll's spec is being designed alongside a terminal and a graphical browser for it. This provides necessary context and experience to make the right design decisions in accordance with the above values.
In terms of expressiveness, some of the most common elements of textual documents are being added to the document format (scrolltext), namely list nesting, 4th level headings (nested 3rd under the document's title), strong and emphasis (although optional), thematic breaks, and a formal specification for ordered lists. Footnotes and basic tables are also planned. All of these are simple to parse, approachable to write, and increase the client and user's control over presentation within a document. They are also all document-centric in that many of them give more structural elements to documents.
In terms of a document-centric approach, Scroll explicitly allows for serving documents of various media types over the protocol in ways that respect the user's autonomy and privacy, and therefore leans more in line with Gopher than with Gemini. A document centric approach requires features to organize many different forms of documents, which is why document metadata like bylines, publication, and modification info have been added to the protocol in a way that is stable, not extensible, and simple to parse. Abstracts are also added for increased approachability and expressiveness, without an added cost to anti-extensibility (scrolltext and gemtext documents are inherently extensible due to being fluid textual documents). More on Scroll's document-centric approach will be discussed in future articles.
In terms of Privacy, I will be refining Scroll's request and response formats to ensure that things like mimetypes, query strings, and other fields cannot be misused to violate user privacy and autonomy. There are ways to do this without sacrificing the current featureset of the protocol.
Streaming has a larger emphasis within Scroll to improve the user experience and to accommodate the document-centric approach. Larger documents as whole files are easier to handle and view when these documents are streamed in. We are at a point in computing history where displaying a document as it streams in should not freeze the browser's UI.
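One way to keep the UI responsive while a document streams in can be sketched in Go with a goroutine and a channel: the read happens off the main loop, and each line is handed over as soon as it is complete. This is a minimal sketch under my own assumptions, not how Profectus or Scroll-term are actually structured. `streamLines` and `collect` are hypothetical names, and `io.Pipe` stands in for a network connection.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// streamLines reads r on its own goroutine and hands each complete line
// to the caller over a channel, so the consumer (for example, a UI loop)
// never blocks on the read itself.
func streamLines(r io.Reader) <-chan string {
	lines := make(chan string, 64) // small buffer smooths bursty input
	go func() {
		defer close(lines)
		sc := bufio.NewScanner(r)
		for sc.Scan() {
			lines <- sc.Text()
		}
		// A real client would surface sc.Err() to the user here.
	}()
	return lines
}

// collect drains a stream into a slice; it stands in for a render loop.
func collect(doc string) []string {
	var got []string
	for line := range streamLines(strings.NewReader(doc)) {
		got = append(got, line)
	}
	return got
}

func main() {
	// io.Pipe simulates a connection that delivers the body in pieces.
	pr, pw := io.Pipe()
	go func() {
		defer pw.Close()
		for _, chunk := range []string{"# Title\nFirst ", "paragraph\n", "Second paragraph\n"} {
			io.WriteString(pw, chunk)
		}
	}()
	for line := range streamLines(pr) {
		fmt.Println("render:", line) // displayed as soon as the line is complete
	}
}
```

In a GUI client the consuming loop would be replaced by whatever mechanism the toolkit offers for posting work to the UI thread, but the shape stays the same: the reader never waits on the renderer for long, and the renderer never waits on the network.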
My hope is that Scroll is more in line with the concept of being a library of documents. You can already see this document-centric approach in a recent update I made to the Scroll Protocol scrollery/capsule. The homepage is a full document about the protocol and links out to other documents, like the spec, and the Profectus doc.
While writing Profectus, the hardest part was not the gemtext or scrolltext parser, it was not the protocol handling, or calling into a TLS library. The hardest part was the GUI and the font rendering. And not even word-wrapping, which is actually pretty simple when you have all of the necessary sizes to calculate and when you have a decent unicode library. Rather, opening the font files, finding each glyph, caching the glyphs, and then converting them to bitmaps so they can be rendered using the glyph metrics - that was one of the hardest parts. In the terminal application, the hardest part was not the protocol handling, TLS handling, or the parsing, and it wasn't even the word-wrapping, which is even simpler than in GUI applications. It was the rest of the stuff - the actual user interface.
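To back up the claim that terminal word-wrapping is the easy part, here is a minimal greedy word-wrap sketch in Go. It assumes every rune occupies exactly one terminal column, which is not true for wide or combining characters; a real client would use a proper Unicode width library. `wrap` is a hypothetical helper for this example, not code from Scroll-term.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// wrap greedily packs whitespace-separated words into lines of at most
// width columns. A word longer than width is placed on its own line
// unbroken.
// ASSUMPTION: one rune equals one terminal column.
func wrap(text string, width int) []string {
	var lines []string
	var cur strings.Builder
	curWidth := 0
	for _, word := range strings.Fields(text) {
		w := utf8.RuneCountInString(word)
		// Start a new line if the word plus a separating space would overflow.
		if curWidth > 0 && curWidth+1+w > width {
			lines = append(lines, cur.String())
			cur.Reset()
			curWidth = 0
		}
		if curWidth > 0 {
			cur.WriteByte(' ')
			curWidth++
		}
		cur.WriteString(word)
		curWidth += w
	}
	if curWidth > 0 {
		lines = append(lines, cur.String())
	}
	return lines
}

func main() {
	for _, line := range wrap("The hardest part was the GUI and the font rendering, not the parser.", 24) {
		fmt.Println(line)
	}
}
```

A GUI version follows the same loop but measures each word with glyph metrics instead of counting runes, which is where the real work of font handling comes in.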
Inflating the time and amount of code it takes to implement a document parser leads to misplaced decisions and unnecessary sacrifices to expressiveness. The expressiveness value is listed in the core values because I feel it is important for a true library experience that accounts for common usage in most fields. However, it is balanced by the other core values, namely simplicity and approachability. Scroll will not become XML, or God forbid, SGML, but it will not become plain text either.