HTML to gmi

Posted on 2021-12-21

I have been thinking about a few ideas for porting websites from .html to .gmi. Below are some of my considerations and thoughts:

Why?

Probably the best use case would be to easily mirror HTML-only websites to Gemini, but that is not exactly what I want to achieve with this.

I have thought about a personal knowledge base for a while and have already tried a few programs. But a few days ago I thought, "Why should I build up my own knowledge base if the entire web is full of knowledge I can use?" Using the web as a knowledge base has a few disadvantages, though:

But how does Gemini help?

How it will work

External view

The converter should of course take the HTML content of the website as a parameter. It will also need the URL to find out which template to use (more on that later). Given this information, it will convert the HTML to gmi and write it into a directory matching the URL, from where it can be served by a server like Agate. I can then access all saved pages using any Gemini browser I like, e.g. Amfora. This also has the advantage that I can easily access the content from anywhere else.

Agate

Amfora
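
A minimal sketch of the "directory matching the URL" step, assuming one subdirectory per host and one .gmi file per page. The function name `url_to_gmi_path` and the `capsule` root directory are hypothetical, and query strings and fragments are simply ignored here.

```python
from pathlib import Path
from urllib.parse import urlsplit


def url_to_gmi_path(url: str, root: str = "capsule") -> Path:
    """Map a web URL to a .gmi file path below `root` (hypothetical layout)."""
    parts = urlsplit(url)
    # Drop empty segments and "." / ".." so the result always stays inside root.
    segments = [s for s in parts.path.split("/") if s not in ("", ".", "..")]
    if not segments or parts.path.endswith("/"):
        # Directory-like URLs get an index file.
        segments.append("index")
    # Strip a trailing .html/.htm; the .gmi suffix is added below.
    last = segments[-1]
    for ext in (".html", ".htm"):
        if last.lower().endswith(ext):
            last = last[: -len(ext)]
            break
    segments[-1] = (last or "index") + ".gmi"
    return Path(root, parts.netloc, *segments)
```

With a layout like this, pointing a static Gemini server at the root directory should expose every saved page under its original host and path.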

Internal view

Internally, the conversion should take the HTML and the template and output the generated gmi. One remaining problem is images: they would have to be downloaded separately, but I am sure there will be a good solution for this.
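
For the image problem, a first step could be to collect the image URLs of a page so they can be downloaded separately. The following is only a sketch using Python's standard `html.parser`. The names `ImageCollector` and `find_images` are made up for illustration, and the actual download step is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect the absolute URLs of all <img src=...> tags in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative references against the page URL.
            self.images.append(urljoin(self.base_url, src))


def find_images(html, base_url):
    """Return the image URLs found in `html`, in document order."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.images
```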

The templates

The templates should describe how to build the gmi from the given HTML. Probably the best approach is to insert content from a DOM element into the gmi based on a CSS selector. But there will still be some minor annoyances:

For the inline links, I will probably have to do some pre-processing of the HTML, inserting some paragraphs where necessary. For repeating sections, I will likely need some dedicated syntax.
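
Since gmi only allows links on their own lines (`=> URL label`), one possible form of this pre-processing is to keep the paragraph text as it is and list its links directly below it. This is a sketch under that assumption, not a settled design, and `ParagraphLinks` and `paragraph_to_gmi` are hypothetical names.

```python
from html.parser import HTMLParser


def _squash(chunks):
    """Join text chunks and collapse all runs of whitespace to single spaces."""
    return " ".join("".join(chunks).split())


class ParagraphLinks(HTMLParser):
    """Split one paragraph into its plain text and the links found inside it."""

    def __init__(self):
        super().__init__()
        self.text = []     # all text chunks, link labels included
        self.links = []    # (href, label) pairs in document order
        self._href = None  # href of the <a> that is currently open, if any
        self._label = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._label = []

    def handle_data(self, data):
        self.text.append(data)
        if self._href is not None:
            self._label.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, _squash(self._label)))
            self._href = None


def paragraph_to_gmi(html):
    """Return the paragraph text followed by one gmi link line per inline link."""
    parser = ParagraphLinks()
    parser.feed(html)
    parser.close()
    lines = [_squash(parser.text)]
    lines += [f"=> {href} {label}".rstrip() for href, label in parser.links]
    return "\n".join(lines)
```

Relative links such as `/faq` are passed through unchanged here. A real converter would probably resolve them against the page URL first.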

Speaking of syntax, here is a simplified mockup of what I think would fit, written for StackOverflow:

# (#question-header h1 a)

{[#question .post-layout .postcell div p]
(p)

}


{[#answers .answer]
## Answer
{[.post-layout .answercell div p]
(p)

}

}

What happens here?

This is a very simplified representation, currently without images, links or comments. But it already shows that the templates can easily be extended to other output formats like Markdown as well.

What might a generic "one template fits all" fallback look like for websites that have no special template?

{[*]
<[h1]
# (h1)
>
<[h2]
## (h2)
>
<[p]
(p)

>
}

Another new piece of syntax was introduced here; let's look at what it does:
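
As a rough sketch of the output such a fallback template is aiming for, the following hard-codes the same three rules in Python instead of interpreting the template. It assumes that `<[h1]` ... `>` emits its body only when the current element matches the selector, which is one possible reading of the mockup. `FallbackConverter` and `html_to_gmi_fallback` are hypothetical names.

```python
from html.parser import HTMLParser

# gmi line prefix for each tag the fallback handles.
PREFIXES = {"h1": "# ", "h2": "## ", "p": ""}


class FallbackConverter(HTMLParser):
    """Emit gmi lines for h1, h2 and p elements; ignore everything else."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._tag = None  # handled tag that is currently open, if any
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in PREFIXES and self._tag is None:
            self._tag = tag
            self._buf = []

    def handle_data(self, data):
        # Text outside of h1/h2/p (scripts, navigation, ...) is dropped.
        if self._tag is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            text = " ".join("".join(self._buf).split())
            if text:
                self.lines.append(PREFIXES[tag] + text)
                if tag == "p":
                    # Blank line after each paragraph, as in the template.
                    self.lines.append("")
            self._tag = None


def html_to_gmi_fallback(html):
    """Convert a whole HTML page to gmi using only the three generic rules."""
    converter = FallbackConverter()
    converter.feed(html)
    converter.close()
    return "\n".join(converter.lines)
```

Elements that are never closed, for example a `<p>` without `</p>`, lose their text in this sketch, so real-world HTML would need more forgiving handling.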

Conclusion

This is only a short overview of the features I want and the thoughts I have. There are probably many more issues to run into and features to add.

If I continue to think about and implement this project, I will likely write another blog post.
