HTML to gmi

Posted on 2021-12-21

I have been thinking about ways to port websites from .html to .gmi. Below are some of my considerations and thoughts:

Why?

The most obvious use case would be mirroring HTML-only websites to Gemini, but that is not exactly what I want to achieve here.

I have thought about a personal knowledge base for a while and have already tried a few programs. A few days ago, though, I thought: "Why should I build my own knowledge base if the entire web is full of knowledge I can use?" Using the web as a knowledge base has a few disadvantages, however:

HTML is bloated and ugly, and pages often require technologies like JavaScript (not to mention cookie banners).
Finding a website again after you have already found information on it can be hard.
I cannot edit the content or add notes just for myself.

So how does Gemini help?

Gemini is far simpler and not bloated.
Gemini has a search feature. I have not yet figured out how it works, but I might be able to build a search engine for local pages.
I can just convert HTML to gmi and add my own notes, e.g. as quote lines starting with >.
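A search over local pages does not have to depend on Gemini's own search feature. As a rough sketch, assuming the converted pages are stored as .gmi files somewhere below one directory, a plain case-insensitive line search might already be enough. The function name search_gmi and the directory layout are assumptions made for this example only.

```python
from pathlib import Path


def search_gmi(root: str, query: str) -> list:
    """Return (relative path, line number, line) for every line containing query.

    Hypothetical helper: a case-insensitive substring search over all
    .gmi files below root. Files with other extensions are ignored.
    """
    needle = query.lower()
    hits = []
    for path in sorted(Path(root).rglob("*.gmi")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                relative = path.relative_to(root).as_posix()
                hits.append((relative, number, line.strip()))
    return hits
```

The result could be rendered as a gmi page of links to the matching files, but that part is left out here.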

How it will work

External view

The converter should of course take the HTML content of the page as a parameter. It also needs the URL in order to decide which template to use (more on that later). Given this information, it converts the HTML to gmi and writes the result into a directory matching the URL, from where it can be served by a server like Agate. I can then access all saved pages using any Gemini browser I like, e.g. Amfora. This also has the advantage that I can easily access the content from anywhere else.
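How a URL maps to "a directory matching the URL" is not fixed yet. One possible mapping, sketched below, uses the host name as the top-level directory and the URL path as the rest. The function name url_to_gmi_path, the root directory name "capsule", and the use of "index" for directory-like URLs are all assumptions for illustration. The query string is ignored in this sketch.

```python
from urllib.parse import urlsplit


def url_to_gmi_path(url: str, root: str = "capsule") -> str:
    """Map a web URL to a .gmi file path below root (hypothetical layout)."""
    parts = urlsplit(url)
    host = parts.hostname  # lower-cased by urlsplit
    if not host:
        raise ValueError(f"URL has no host: {url!r}")
    # Drop empty, "." and ".." segments so the result cannot escape root.
    segments = [s for s in parts.path.split("/") if s not in ("", ".", "..")]
    if not segments or parts.path.endswith("/"):
        # Directory-like URL: store the page as an index file.
        segments.append("index")
    else:
        # Strip a trailing .html/.htm; ".gmi" is appended below.
        name = segments[-1]
        for ext in (".html", ".htm"):
            if name.lower().endswith(ext) and len(name) > len(ext):
                name = name[: -len(ext)]
                break
        segments[-1] = name
    return "/".join([root, host, *segments]) + ".gmi"
```

For example, a page at https://example.org/docs/page.html would end up in capsule/example.org/docs/page.gmi. The same host name could also serve as the key for picking a template.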

Agate

Amfora

Internal view

Internally, the conversion should take the HTML and the template and output the generated gmi. Images are one more problem: they would have to be downloaded separately, but I am sure there is a good solution for this.

The templates

The templates should describe how to build the gmi from the given HTML. The best approach is probably to insert content from DOM elements into the gmi, selected by CSS selector. There will still be some minor annoyances, though:

Inline links: HTML supports inline links, gmi does not (every link needs its own line).
Repeating sections: Many elements might map to one template part.

For inline links, I will probably have to pre-process the HTML, inserting paragraphs where necessary. For repeating sections, I will likely need dedicated template syntax.
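One way such pre-processing could look is sketched below. Note that it swaps in a footnote-style approach rather than splitting paragraphs: the link text stays in the sentence, followed by a reference number, and each link is emitted as its own "=> URL label" line after the paragraph. The class and function names are made up for this example, and only Python's standard html.parser module is used.

```python
from html.parser import HTMLParser


class InlineLinkExtractor(HTMLParser):
    """Collect paragraph text and move inline links onto separate link lines."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.text = []    # pieces of the running paragraph text
        self.links = []   # (href, [label pieces]) in document order
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.links.append((href, []))
            self._in_link = True

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self._in_link = False
            # Mark the spot in the text with the link's reference number.
            self.text.append(f"[{len(self.links)}]")

    def handle_data(self, data):
        self.text.append(data)
        if self._in_link:
            self.links[-1][1].append(data)


def paragraph_to_gmi(html: str) -> str:
    """Convert one HTML paragraph to a text line plus gemtext link lines."""
    parser = InlineLinkExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace, as a browser would when rendering.
    lines = [" ".join("".join(parser.text).split())]
    for number, (href, pieces) in enumerate(parser.links, start=1):
        label = " ".join("".join(pieces).split())
        lines.append(f"=> {href} [{number}] {label}")
    return "\n".join(lines)
```

Relative href values are passed through unchanged here. A real converter would probably have to resolve them against the page URL first.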

Speaking of syntax, here is a simplified mockup of what I think would fit, written for Stack Overflow:

# (#question-header h1 a)

{[#question .post-layout .postcell div p]
(p)

}


{[#answers .answer]
## Answer
{[.post-layout .answercell div p]
(p)

}

}

What happens here?

The element with selector "#question-header h1 a" (the question title) will be inserted as the header.
For each element with selector "#question .post-layout .postcell div p", print the content of "p", so this will print the entire question.
For each element matching "#answers .answer", print an "## Answer" heading followed by each paragraph of that answer.

This is a very simplified representation, currently without images, links or comments. Still, it already shows that the templates could easily be extended to other output formats such as Markdown.

What might a generic "one template fits all" fallback look like for websites that have no dedicated template?

{[*]
<[h1]
# (h1)
>
<[h2]
## (h2)
>
<[p]
(p)

>
}

This introduces another piece of syntax. Let's look at what it does:

Go over all elements ({[*] ... }).
If the current element matches the selector "h1" (<[h1] ... >), use it as a heading. The cases "h2" and "p" work the same way. If nothing matches, do nothing.
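The behaviour of this fallback template can be approximated without any template engine at all. The sketch below hard-codes the three cases (h1, h2, p) using Python's standard html.parser module. It is meant as an illustration of the intended output, not as the planned implementation, and the names FallbackConverter and html_to_gmi are made up for this example.

```python
from html.parser import HTMLParser

# gemtext prefix for each element the fallback template handles
PREFIXES = {"h1": "# ", "h2": "## ", "p": ""}


class FallbackConverter(HTMLParser):
    """Emit a gemtext line for every h1, h2 and p element; ignore the rest."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.lines = []
        self._tag = None   # element currently being collected, if any
        self._buffer = []  # text pieces of that element

    def handle_starttag(self, tag, attrs):
        # Only start collecting when not already inside a handled element.
        if self._tag is None and tag in PREFIXES:
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.lines.append(PREFIXES[tag] + text)
                if tag == "p":
                    # Blank line after a paragraph, as in the template.
                    self.lines.append("")
            self._tag = None


def html_to_gmi(html: str) -> str:
    """Convert an HTML document to gemtext using the fallback rules."""
    parser = FallbackConverter()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

Text outside of these three elements (navigation, scripts, sidebars) is dropped, which matches the "if nothing matches, do nothing" rule. The sketch assumes that paragraphs have explicit closing tags. HTML allows omitting them, and a more robust version would have to handle that.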

Conclusion

This is only a short overview of the features I want and the thoughts I have. There are probably many more issues to solve and features to add.

If I keep thinking about this project and start implementing it, I will likely write another blog post.
