💾 Archived View for stacksmith.flounder.online › gemlog › 2021-09-26.GMI_parser_state_machine.gmi captured on 2023-07-22 at 16:20:12. Gemini links have been rewritten to link to archived content
Legend has it that implementing a Gemini client is a weekend job. Maybe I am much stupider than expected, or maybe I care too much about being correct and minimal, but it's taking me much longer. Procrastination does not help, I admit.
As I see it, a parser for streamed data should be a state machine that eats characters. In my case I am starting with a string already (and as I am using GTK, it is a good idea to shove the entire string into the text buffer, and style afterwards). So my parser is there for styling purposes only.
So it is a state machine that eats characters and returns NIL unless it detects the end of a stylable span, in which case it returns enough information to style said span.
Since the kind of a line is determined at its beginning and takes one to three characters to settle, there are more states than one would expect. This reaffirms my conjecture that the GMI format is rather unfortunate...
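To make the idea concrete, here is a minimal sketch of such a character-eating machine. It is written in Python purely for illustration and is not my actual code; the state names, the kind names, the GmiStyler class and the style_spans helper are all made up for this example. It assumes the usual gemtext line prefixes (#, ##, ###, =>, "* ", > and the three-backtick toggle) and reports each span as (kind, start, end) in character offsets, with the end exclusive.

```python
# Hypothetical sketch: every name below is illustrative only.
# (state, character) -> (next state, line kind decided so far, or None)
TRANSITIONS = {
    ("START", "#"): ("H1", "h1"),
    ("H1", "#"): ("H2", "h2"),
    ("H2", "#"): ("BODY", "h3"),
    ("START", "="): ("EQ", None),
    ("EQ", ">"): ("BODY", "link"),
    ("START", "*"): ("STAR", None),
    ("STAR", " "): ("BODY", "list"),
    ("START", ">"): ("BODY", "quote"),
    ("START", "`"): ("TICK1", None),
    ("TICK1", "`"): ("TICK2", None),
    ("TICK2", "`"): ("BODY", "toggle"),
}


class GmiStyler:
    def __init__(self):
        self.state = "START"   # where we are in the line prefix
        self.kind = "text"     # kind of the current line, as known so far
        self.pre = False       # True while inside a preformatted block
        self.pos = 0           # offset of the next character to eat
        self.line_start = 0    # offset where the current line began

    def feed(self, ch):
        """Eat one character; return None, or (kind, start, end) at a line end."""
        pos = self.pos
        self.pos += 1
        if ch == "\n":
            return self._end_line(pos)
        if self.state == "BODY":
            return None        # kind already settled, just keep eating
        if self.pre and self.state == "START" and ch != "`":
            self.state = "BODY"  # inside a pre block only a toggle matters
            return None
        # Any pair missing from the table means the prefix did not pan out.
        self.state, kind = TRANSITIONS.get((self.state, ch), ("BODY", None))
        if kind is not None:
            self.kind = kind
        return None

    def finish(self):
        """Flush a last line that has no trailing newline."""
        if self.pos > self.line_start:
            return self._end_line(self.pos)
        return None

    def _end_line(self, end):
        span = (self.kind, self.line_start, end)  # end excludes the newline
        if self.kind == "toggle":
            self.pre = not self.pre
        self.state = "START"
        self.line_start = end + 1
        self.kind = "pre" if self.pre else "text"
        return span


def style_spans(text):
    """Feed a whole string and collect every span the machine reports."""
    machine = GmiStyler()
    spans = [s for s in map(machine.feed, text) if s is not None]
    last = machine.finish()
    return spans + ([last] if last else [])
```

Feeding it a whole string through style_spans yields one span per line, empty lines included, which is all the styling pass needs. Even this toy version has eight states just to tell the line kinds apart, which is rather the point.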
Each line starts with a 'kind' character: a specially selected Unicode character that visually communicates the purpose of the line, be it a bullet, an arrow for links, or one of three headers. Only blockquote lines do not start with a kind character, allowing us to easily copy and paste them...
I just have to slog through it with a 100-line state machine...