2022-06-06 |#gemipedia #wikipedia #cgi | @Acidus
Building Gemipedia has taught me a lot about how to translate various types of HTML formatting in Wikipedia content into Gemtext. Some things, like italics or emphasis, you can simply ignore, since you often don't need them. But how do you represent math formulas? A web browser has lots of options.
Math formulas are especially challenging, not only because they can have so many symbols (such as sigma 'Σ'), but because those symbols must be rendered in a certain layout (e.g. Σ used to sum a series has the variable and its starting value, as well as the value it's going to, arranged in 2 different rows to the right of the Σ).
In fact, text layout of formulas and symbols is such a complicated domain that Donald Knuth literally created TeX, an entire digital typesetting system, while he was writing "The Art of Computer Programming."
Given all this complexity, I didn't even try to represent math formulas in gemtext. Instead, to solve this in Gemipedia, I detect math formulas and create link lines out to PNG images of the rendered math formula. (And how are those math formulas written by Wikipedia authors? Funnily enough, using LaTeX, a document preparation system built on top of TeX!) I don't use SVG because I found no Gemini client that supported SVG rendering, and the PNGs look crisp enough.
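To make the "detect and link" idea concrete, here is a rough sketch in Python. It is not Gemipedia's actual code (Gemipedia is written in C#). It assumes formulas arrive as <img> elements marked with a CSS class, and the class name "math-image" and the example.org URL are made up for illustration. The only fixed part is the gemtext link line syntax, "=> URL label".

```python
from html.parser import HTMLParser


class MathImageCollector(HTMLParser):
    """Collect gemtext link lines for <img> tags that carry a given class."""

    def __init__(self, math_class):
        super().__init__()
        # Hypothetical marker class; real markup may use a different one.
        self.math_class = math_class
        self.link_lines = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if self.math_class not in classes:
            return
        url = attributes.get("src")
        if not url:
            return
        # Use the alt text (often the formula source) as the link label.
        label = attributes.get("alt") or "math formula"
        self.link_lines.append(f"=> {url} Formula: {label}")


def math_link_lines(html, math_class="math-image"):
    collector = MathImageCollector(math_class)
    collector.feed(html)
    return collector.link_lines


if __name__ == "__main__":
    sample = '<p>Energy: <img class="math-image" src="https://example.org/formula.png" alt="E=mc^2"></p>'
    for line in math_link_lines(sample):
        print(line)
```

Images without the marker class are skipped, so ordinary illustrations would not be turned into formula links.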
Chemical formulas or physics formulas are also an issue, but can be solved in a way that is much better aligned with gemtext. These formulas tend to be less complex than many math formulas, mainly using subscript and superscript characters. Originally, I supported superscripts by rendering anything inside of a <sup> tag using the "^" character, like this:
e=mc^2 or 2^(x-1)
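As a rough illustration of that first approach, here is a small Python sketch. It is not the code Gemipedia uses (that is C#), and the function name is made up. It assumes simple, un-nested <sup> tags without attributes, and wraps multi-character exponents in parentheses.

```python
import re

# Matches simple, un-nested <sup>...</sup> tags without attributes.
SUP_TAG = re.compile(r"<sup>(.*?)</sup>", re.IGNORECASE | re.DOTALL)


def caret_superscripts(html):
    """Replace <sup>...</sup> with caret notation."""

    def repl(match):
        inner = match.group(1).strip()
        if not inner:
            return ""
        # Parenthesize longer exponents so they stay unambiguous.
        return f"^{inner}" if len(inner) == 1 else f"^({inner})"

    return SUP_TAG.sub(repl, html)


if __name__ == "__main__":
    print(caret_superscripts("e=mc<sup>2</sup> or 2<sup>x-1</sup>"))
```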
This was... kind of acceptable. However, it only worked for superscripts. As soon as I had something with a subscript, like using H2O for water, there was nothing I could do. Surely someone must have thought of how to deal with all of this, right? Luckily someone had:
The Unicode "Subscript and Superscript" block contains characters which allow chemical and algebra formulas and phonetics to be written without using markup.
So instead of e=mc^2, there is a Unicode character '²' so I can write e=mc². Subscripts are also supported, which allow me to render chemical formulas like CH₄ and even whole chemical reactions:
CH₄ + 2 O₂ → CO₂ + 2 H₂O
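Here is a minimal Python sketch of that substitution, assuming the input still contains simple <sub> and <sup> tags. It is only an illustration, not Gemipedia's C# code, and the tables cover just digits, a few symbols, and the letter n. Any character without an entry in the table is passed through unchanged.

```python
import re

# Partial lookup tables: digits, a few symbols, and superscript n.
SUPERSCRIPTS = str.maketrans("0123456789+-=()n", "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ")
SUBSCRIPTS = str.maketrans("0123456789+-=()", "₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎")

# Matches simple, un-nested <sub>...</sub> and <sup>...</sup> tags.
SCRIPT_TAG = re.compile(r"<(sub|sup)>(.*?)</\1>", re.IGNORECASE | re.DOTALL)


def unicode_scripts(html):
    """Replace <sub>/<sup> content with Unicode subscript/superscript characters."""

    def repl(match):
        is_sub = match.group(1).lower() == "sub"
        table = SUBSCRIPTS if is_sub else SUPERSCRIPTS
        # Characters missing from the table are left as they are.
        return match.group(2).translate(table)

    return SCRIPT_TAG.sub(repl, html)


if __name__ == "__main__":
    print(unicode_scripts("CH<sub>4</sub> + 2 O<sub>2</sub> → CO<sub>2</sub> + 2 H<sub>2</sub>O"))
    print(unicode_scripts("e=mc<sup>2</sup>"))
```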
Beyond just numbers, symbols and even lowercase and uppercase letters are available to handle variables: 2ᵉ, Xⁿ⁻¹
Test page for Unicode Superscript and Subscript characters
Many of the subscript and superscript letters come from different Unicode blocks, so there isn't a simple arithmetic trick to convert a regular letter into its subscript or superscript version, the way you can compute the ASCII code of a lowercase letter from the ASCII code of its uppercase version. So the code to do the conversion is just a giant switch statement that substitutes one character for another. You can see that in the "SuperscriptConverter.cs" and "SubscriptConverter.cs" files in the Gemipedia source code:
SuperScript/Subscript converter code in Gemipedia
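To show why a lookup table is needed, here is a small Python sketch (an illustration only; the real converters are the C# files mentioned above, and this table is deliberately partial). ASCII case conversion works with a fixed offset of 32, while the superscript code points are scattered across several blocks, so the offset changes from character to character.

```python
def ascii_lower(ch):
    """ASCII case conversion is plain arithmetic: 'a' is 32 above 'A'."""
    return chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch


# Superscripts have no fixed offset, so an explicit table is needed.
SUPERSCRIPTS = {
    "0": "\u2070",  # Superscripts and Subscripts block
    "1": "\u00b9",  # Latin-1 Supplement block
    "2": "\u00b2",  # Latin-1 Supplement block
    "3": "\u00b3",  # Latin-1 Supplement block
    "4": "\u2074",  # Superscripts and Subscripts block
    "e": "\u1d49",  # Phonetic Extensions block
    "n": "\u207f",  # Superscripts and Subscripts block
}


def to_superscript(text):
    """Substitute each character that has a table entry; keep the rest."""
    return "".join(SUPERSCRIPTS.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(ascii_lower("G"))
    for plain, sup in SUPERSCRIPTS.items():
        # The last column is the offset, which differs between entries.
        print(plain, sup, f"U+{ord(sup):04X}", ord(sup) - ord(plain))
```

A dictionary plays the same role here as the switch statement: one entry per character, with no formula behind it.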
I should probably package this code up as a reusable library via NuGet.