updated md2gemini patch

Last week I wrote a hasty patch for md2gemini because it was eating the code blocks inside lists in one of my posts.

So, how did I fix it? I was thinking of the original problem as a finite state machine.

                      ```
            ┌───────────────────┐
            ▼                   │
┌───┐     ┌──────────┐  ```   ┌──────────┐
│ . │ ──▶ │ no_fence │ ─────▶ │ in_fence │
└───┘     └──────────┘        └──────────┘
            ▲ *    │            ▲ *    │
            └──────┘            └──────┘

This is correct, but as an illustration it’s too simple. It hides the complexity of detecting the fence conditions, and that was the problem with my first approach. Or, more precisely, I implemented it wrong.

I didn’t pay attention to where in the input stream those three ticks/graves showed up, just that they did. So it didn’t support nested blocks.

My naive thinking was that if the code is indented to match the list indent, then lstrip would still spot a fence, which was partly correct. It would spot a fence, but it would also incorrectly leave the in_fence state when it encountered ticks/graves that happened to be inside the fence.
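
To make that failure concrete, here is a minimal sketch of the naive check pulled out into a standalone function. The name `naive_fence_states` and the sample lines are made up for illustration; they are not part of md2gemini.

```python
def naive_fence_states(lines):
    """Return, for each line, whether the naive check thinks it is inside a fence."""
    in_fence = False
    states = []
    for line in lines:
        # Any line whose first non-space characters are three ticks toggles
        # the state, no matter how far it is indented.
        if line.lstrip().startswith('```'):
            in_fence = not in_fence
        states.append(in_fence)
    return states


sample = [
    'some list text',
    '```',        # outer fence opens
    'example:',
    '    ```',    # indented ticks that are content of the outer block
    '    inner',
    '    ```',
    '```',        # outer fence closes
    'more text',
]

# 'inner' (index 4) is wrongly reported as outside the fence.
print(naive_fence_states(sample))
```

The indented ticks flip the state even though they are only content, so everything between them is treated as ordinary list text.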

So the new fix is the same finite state machine, but the input is now tested against a regular expression and the indent level is recorded to ensure that only another set of ticks/graves at that indent level will end the fence.
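
A minimal sketch of that rule, pulled out of the method so it can run on its own. The helper name `fence_states` and the sample lines are assumptions made for illustration; only the regular expression is taken from the patch.

```python
import re

# Same pattern as in the patch: optional leading spaces, then three ticks.
FENCE_EXPR = re.compile(r'^( *)```')


def fence_states(lines):
    """Return, for each line, whether it is inside a fence (indent-aware)."""
    in_fence = False
    fence_indent = 0
    states = []
    for line in lines:
        m = FENCE_EXPR.match(line)
        if m:
            indent = len(m.group(1))
            if not in_fence:
                # Opening fence: remember how far it was indented.
                in_fence = True
                fence_indent = indent
            elif indent == fence_indent:
                # Only ticks at the recorded indent close the fence.
                in_fence = False
        states.append(in_fence)
    return states


sample = [
    'some list text',
    '```',        # outer fence opens at indent 0
    'example:',
    '    ```',    # indent 4, so it is treated as content
    '    inner',
    '    ```',
    '```',        # indent 0 again, so the fence closes here
    'more text',
]
states = fence_states(sample)
print(states)
```

With the indent recorded, the indented ticks no longer end the block; only the final line at the original indent does.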

I’ve glossed over it here, and probably last week too, but inside fences newlines must be preserved, and outside of fences newlines should be replaced with a single space.

Here’s the new fix:

    def list_item(self, text, level):
        new_text = ''
        last_offset = 0
        in_fence = False
        text = text.replace(PARAGRAPH_DELIM, PARAGRAPH_DELIM+'\r\n')
        for item in text.splitlines():
            was_in_fence = in_fence
            m = FENCE_EXPR.match(item) # FENCE_EXPR = re.compile(r'^( *)```')
            if m:
                this_offset = len(m.group(1))
                if not in_fence:
                    # opening fence: remember its indent level
                    in_fence = True
                    last_offset = this_offset
                elif this_offset == last_offset:
                    # only a fence at the recorded indent ends the block
                    in_fence = False
            if in_fence:
                new_text += LINEBREAK + item
            else:
                if was_in_fence:
                    new_text += LINEBREAK + item + LINEBREAK
                else:
                    if new_text:
                        new_text += ' ' + item.lstrip()
                    else:
                        new_text = item
        return new_text + NEWLINE

Now it tracks the indent level of the first fence it encounters and uses that offset to decide whether a later fence ends the block: only a fence at the same offset will.

There’s a bit of a fencepost problem when concatenating the items across line breaks. The original code used reduce on a list for the same issue. Here, the block checks whether new_text has any content, so that the space is only added when there is something to append to. This was another problem with my first solution. It seems like there should be another way to do this without so much work, but I’m drawing a blank, so this is what I’ve committed.
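
One possible way around the fencepost check, sketched under some assumptions: collect the pieces in lists and let str.join place the separators, since join never emits a leading or trailing separator. The names `join_list_item` and `flush_prose` and the value of `LINEBREAK` are made up for this illustration, and the output may not match what the patch produces exactly (there is no extra line break after a closing fence, for instance).

```python
import re

FENCE_EXPR = re.compile(r'^( *)```')  # same pattern as in the patch
LINEBREAK = '\n'  # assumed value; md2gemini defines its own constant


def join_list_item(lines):
    """Join lines with spaces outside fences and line breaks inside them."""
    chunks = []   # finished pieces, joined with LINEBREAK at the end
    prose = []    # current run of non-fence lines, joined with one space
    in_fence = False
    fence_indent = 0

    def flush_prose():
        # Turn the pending run of prose lines into a single chunk.
        if prose:
            chunks.append(' '.join(prose))
            prose.clear()

    for line in lines:
        m = FENCE_EXPR.match(line)
        if m and not in_fence:
            flush_prose()
            in_fence, fence_indent = True, len(m.group(1))
            chunks.append(line)
        elif in_fence:
            # Fenced lines are kept verbatim, one chunk per line.
            chunks.append(line)
            if m and len(m.group(1)) == fence_indent:
                in_fence = False
        else:
            stripped = line.strip()
            if stripped:
                prose.append(stripped)

    flush_prose()
    return LINEBREAK.join(chunks)


print(join_list_item(['first', '  second', '```', 'code', '  more', '```', 'after', 'text']))
```

Because the separators only ever appear between elements, there is no need to test whether anything has been written yet.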

Further considerations

But this still only detects fences delimited by ticks/graves. How else can you define a code fence? I looked at the documentation for Pandoc, and it supports…

three (or more) ticks/graves or tildes before and after a fence
the closing row does not have to be the same length as the opening row, only at least as long

Oh, and just using four or more spaces to indent a line is considered a fence. No delimiter before or after the block, just the indent level.

Wouldn’t it be awesome if I didn’t have to worry about it? Well, I wrote some tests in preparing to solve these problems and it turns out I don’t have to worry about it. By the time the method I’ve modified is called, the fences, no matter what style, have already been converted to the three ticks/graves.

Note: It seems the Vim syntax highlighter has a similar issue spotting fences with ticks/graves. Highlighting of the diagram above broke at the second occurrence of ticks/graves, so I had to switch the diagram to a space-indented block.

Now I’ve created a GitHub account and submitted a pull request. Time to get back to studying.

For reference

Last week’s post.

Here’s the original code:

    def list_item(self, text, level):
        items = [item.strip() for item in text.splitlines()]
        text = functools.reduce(lambda x, y: x + " " + y, items)
        return text + NEWLINE

Here was my first attempt at a fix:

    def list_item(self, text, level):
        new_text = ''
        in_fence = False
        text = text.replace(PARAGRAPH_DELIM, PARAGRAPH_DELIM+'\r\n')
        for item in text.splitlines():
            was_in_fence = in_fence
            in_fence = not in_fence if item.lstrip().startswith('```') else in_fence
            if in_fence:
                new_text += LINEBREAK + item
            else:
                if was_in_fence:
                    new_text += LINEBREAK + item + LINEBREAK + LINEBREAK #extra newline
                else:
                    new_text += ' ' + item # extra space prepended
        return new_text + NEWLINE

The graph above:

    digraph G {
        graph [layout=dot rankdir=LR]

        node [shape="point"]
        initial [label="."]

        node [shape="oval"]
        no_fence
        in_fence

        initial -> no_fence
        no_fence -> in_fence [label="```"]
        no_fence -> no_fence [label="*"]
        in_fence -> in_fence [label="*"]
        in_fence -> no_fence [label="```"]
    }
