Last week I wrote a hasty patch for md2gemini because it was eating the code blocks inside lists in one of my posts.
So, how did I fix it? I was thinking of the original problem as a finite state machine.

                            ```
                   ┌───────────────────┐
                   ▼                   │
    ┌───┐     ┌──────────┐  ```   ┌──────────┐
    │ . │ ──▶ │ no_fence │ ─────▶ │ in_fence │
    └───┘     └──────────┘        └──────────┘
                ▲   *  │            ▲   *  │
                └──────┘            └──────┘

This is correct, but as an illustration it’s too simple. It hides the complexity of detecting the fence conditions, and that was the problem with my first approach. Or, more precisely, I implemented that detection wrong.
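Taken on its own, the diagram is just a transition table. This is a minimal sketch that assumes every line has already been classified as either a fence line ("```") or anything else ("*"); the names `TRANSITIONS` and `run` are hypothetical, and the classification step it takes for granted is exactly the part the diagram hides.

```python
# Transition table for the two-state machine in the diagram.
# Inputs are pre-classified tokens: "```" for a fence line, "*" for anything else.
TRANSITIONS = {
    ("no_fence", "```"): "in_fence",
    ("no_fence", "*"): "no_fence",
    ("in_fence", "```"): "no_fence",
    ("in_fence", "*"): "in_fence",
}


def run(tokens, state="no_fence"):
    """Feed tokens through the machine and return the state after each one."""
    states = []
    for token in tokens:
        state = TRANSITIONS[(state, token)]
        states.append(state)
    return states


print(run(["*", "```", "*", "```", "*"]))
# ['no_fence', 'in_fence', 'in_fence', 'no_fence', 'no_fence']
```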
I didn’t pay attention to where in the input stream those three ticks/graves showed up, just that they did. So it didn’t support nested blocks.
My naive thinking was that if the code is indented to match the list indent, then lstrip() would still spot a fence. That was partly correct: it would spot a fence, but it would also incorrectly leave the in_fence state whenever it encountered ticks/graves that happened to be inside the fence.
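To make the failure concrete, here is a hypothetical reduction of the first attempt’s fence test (lstrip() followed by startswith on three ticks), run over a list item whose code contains an indented line of ticks. It is an illustrative sketch, not the md2gemini code, and `naive_states` is a made-up name.

```python
def naive_states(lines):
    """Toggle in/out of a fence whenever a line, ignoring indentation, starts with ```."""
    in_fence = False
    states = []
    for line in lines:
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        states.append(in_fence)
    return states


item = [
    "Some list text",
    "```",            # opening fence
    "    ```",        # ticks that belong to the code, not a fence
    "    more code",  # should still be inside the fence
    "```",            # the real closing fence
]
print(naive_states(item))
# [False, True, False, False, True]
```

The indented ticks flip the state early, so "more code" is treated as ordinary text and the real closing fence opens a new, unterminated fence.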
So the new fix is the same finite state machine, but each line is now matched against a regular expression, and the indent level of the fence is recorded so that only another set of ticks/graves at that same indent level will end the fence.
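The regular expression does two jobs at once: it decides whether a line is a fence, and its capture group measures the indent. A small sketch of just that step, using the same pattern as the fix; the helper name `fence_indent` is hypothetical.

```python
import re

# Same pattern as in the fix: leading spaces (captured) followed by three ticks.
FENCE_EXPR = re.compile(r"^( *)```")


def fence_indent(line):
    """Return the indent of a fence line, or None if the line is not a fence."""
    m = FENCE_EXPR.match(line)
    return len(m.group(1)) if m else None


print(fence_indent("```python"))  # 0
print(fence_indent("    ```"))    # 4
print(fence_indent("x = '```'"))  # None
```

Because re.match anchors at the start of the string, ticks that appear later in a line never count as a fence.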
I’ve glossed over it here, and probably last week too, but this is what has to happen: inside fences, newlines must be preserved; outside fences, each newline should be replaced with a single space.
Here’s the new fix:
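```python
def list_item(self, text, level):
    new_text = ''
    last_offset = 0  # indent of the fence that opened the current block
    in_fence = False
    # Force a line split after each paragraph delimiter.
    text = text.replace(PARAGRAPH_DELIM, PARAGRAPH_DELIM + '\r\n')
    for item in text.splitlines():
        was_in_fence = in_fence
        m = FENCE_EXPR.match(item)  # FENCE_EXPR = re.compile(r'^( *)```')
        if m:
            this_offset = len(m.group(1))  # number of leading spaces
        # Toggle only on a fence line at the recorded indent level.
        in_fence = not in_fence if m and this_offset == last_offset else in_fence
        if in_fence and not was_in_fence:
            last_offset = this_offset  # just entered a fence: remember its indent
        if in_fence:
            # Inside a fence: keep the line and its line break.
            new_text += LINEBREAK + item
        else:
            if was_in_fence:
                # Closing fence line: end the block with a line break.
                new_text += LINEBREAK + item + LINEBREAK
            else:
                # Outside a fence: join lines with a single space,
                # with no leading space before the first line.
                if new_text:
                    new_text += ' ' + item.lstrip()
                else:
                    new_text = item
    return new_text + NEWLINE
```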
Now it tracks the indent level of the first fence it encounters and uses that offset to decide whether each later fence matches, and it will only end the fence if one does.
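Here is a standalone sketch of the same idea, reduced to computing an in-fence flag per line. One difference is an assumption of this sketch, not what was committed: the committed code starts last_offset at 0 and only toggles when a fence line’s indent equals last_offset, so an opening fence is only recognized at indent 0. This sketch instead accepts any fence line while outside a fence, records its indent, and closes only on a fence at that recorded indent. `fence_states` is a hypothetical name.

```python
import re

FENCE_EXPR = re.compile(r"^( *)```")


def fence_states(lines):
    """Return one in_fence flag per line.

    A fence opens on any fence line while outside a fence, and its indent is
    recorded; while inside, only a fence line at that same indent closes it.
    """
    in_fence = False
    open_indent = None
    states = []
    for line in lines:
        m = FENCE_EXPR.match(line)
        if m:
            indent = len(m.group(1))
            if not in_fence:
                in_fence, open_indent = True, indent
            elif indent == open_indent:
                in_fence = False
        states.append(in_fence)
    return states


item = [
    "Some list text",
    "  ```",          # opening fence, indented to the list level
    "    ```",        # deeper ticks inside the code: ignored
    "    more code",
    "  ```",          # closing fence at the recorded indent
    "back to text",
]
print(fence_states(item))
# [False, True, True, True, False, False]
```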
There’s a bit of a fence-post problem at the end, when concatenating items that were presumably split over line breaks. The original code used reduce on a list for the same issue. Here, the innermost if/else checks whether new_text already has anything in it, so the space is only added when there is existing content to append to. This was another problem with my first solution. It seems like there should be a way to do this with less work, but I’m drawing a blank, so this is what I’ve committed.
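One common way around the fence-post problem in Python is to collect the pieces and let str.join put the separator only between them, which is effectively what the original code’s functools.reduce call did. This sketch covers only the non-fence case (it does not emit the LINEBREAKs that fenced lines need), and `join_item_lines` is a hypothetical helper.

```python
def join_item_lines(lines):
    """Collapse wrapped lines into one, with a single space between them.

    str.join inserts the separator only between items, so there is no
    leading or trailing space to trim.
    """
    return " ".join(line.strip() for line in lines)


print(repr(join_item_lines(["first line", "  second line", "third"])))
# 'first line second line third'
```

Unlike reduce without an initial value, which raises TypeError on an empty list, join simply returns an empty string.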
But this still suffers from the problem that it only detects a fence delimited by ticks/graves. How else can you define a code fence? I looked at the documentation for Pandoc, and it supports…
Oh, and simply indenting a line by four or more spaces also makes it a code block. No delimiter before or after the block, just the indent level.
Wouldn’t it be awesome if I didn’t have to worry about it? Well, I wrote some tests while preparing to solve these problems, and it turns out I don’t have to worry about it. By the time the method I’ve modified is called, the fences, no matter what style, have already been converted to three ticks/graves.
Note: The Vim syntax highlighter seems to have a similar problem spotting fences made of ticks/graves. Its highlighting of the diagram above broke at the second occurrence of ticks/graves in the diagram, so I had to switch the diagram to a space-indented block.
Now I’ve created a GitHub account and submitted a pull request. Time to get back to studying.
Here’s the original code:
```python
def list_item(self, text, level):
    items = [item.strip() for item in text.splitlines()]
    text = functools.reduce(lambda x, y: x + " " + y, items)
    return text + NEWLINE
```
Here was my first attempt at a fix:
```python
def list_item(self, text, level):
    new_text = ''
    in_fence = False
    text = text.replace(PARAGRAPH_DELIM, PARAGRAPH_DELIM + '\r\n')
    for item in text.splitlines():
        was_in_fence = in_fence
        in_fence = not in_fence if item.lstrip().startswith('```') else in_fence
        if in_fence:
            new_text += LINEBREAK + item
        else:
            if was_in_fence:
                new_text += LINEBREAK + item + LINEBREAK + LINEBREAK  # extra newline
            else:
                new_text += ' ' + item  # extra space prepended
    return new_text + NEWLINE
```
The Graphviz (DOT) source for the graph above:
```dot
digraph G {
    graph [layout=dot rankdir=LR]
    node [shape="point"]
    initial [label="."]
    node [shape="oval"]
    no_fence
    in_fence
    initial -> no_fence
    no_fence -> in_fence [label="```"]
    no_fence -> no_fence [label="*"]
    in_fence -> in_fence [label="*"]
    in_fence -> no_fence [label="```"]
}
```
created: 2022-12-15
(re)generated: 2024-12-17