2023-02-22 | #html #gemtext #proxy | @Acidus
A few months back, I quietly added a "raw mode" to NewsWaffle. Normally, NewsWaffle takes the HTML of a news article and uses a port of the Readability library to extract the article content. It then runs the article HTML through a custom HTML-to-gemtext converter that I have tuned on the HTML and structure typically found on news websites. Sometimes Readability fails to find an article, or to fully extract it, so I added "raw mode" as a fallback that converts all of the page's HTML. The output isn't as clean as the optimized article view, but at least it means users can see the content.
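To make the two modes concrete, here is a rough Python sketch of the fallback logic. It is not NewsWaffle's actual code: `extract_article` is a hypothetical stand-in for the Readability port (assumed to return None when no article is found), and `GemtextConverter` is a deliberately tiny converter built on the standard library's `html.parser`. It emits one gemtext line per block element and moves each block's links onto their own `=>` lines, since gemtext has no inline links.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional, Tuple

# Block-level tags and the gemtext line prefix each one produces.
BLOCK_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ",
                "li": "* ", "blockquote": "> ", "p": ""}
# Tags whose contents are dropped entirely.
SKIP_TAGS = {"script", "style", "nav", "footer"}


class GemtextConverter(HTMLParser):
    """Toy converter: one gemtext line per block element, with that
    block's links emitted afterwards as separate '=>' lines."""

    def __init__(self) -> None:
        super().__init__()
        self.out_lines: List[str] = []
        self.block_text: List[str] = []
        self.block_prefix = ""
        self.block_links: List[Tuple[str, str]] = []
        self.cur_href: Optional[str] = None
        self.cur_link_text: List[str] = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_PREFIX:
            self.flush_block()
            self.block_prefix = BLOCK_PREFIX[tag]
        elif tag == "a":
            self.cur_href = dict(attrs).get("href")
            self.cur_link_text = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_PREFIX:
            self.flush_block()
        elif tag == "a":
            if self.cur_href:
                label = " ".join("".join(self.cur_link_text).split())
                self.block_links.append((self.cur_href, label))
            self.cur_href = None

    def handle_data(self, data):
        if self.skip_depth:
            return  # inside script/style/nav/footer
        self.block_text.append(data)
        if self.cur_href is not None:
            self.cur_link_text.append(data)

    def flush_block(self):
        # Collapse whitespace, emit the text line, then the block's links.
        text = " ".join("".join(self.block_text).split())
        if text:
            self.out_lines.append(self.block_prefix + text)
        for href, label in self.block_links:
            self.out_lines.append(f"=> {href} {label}".rstrip())
        self.block_text, self.block_links, self.block_prefix = [], [], ""


def html_to_gemtext(html: str) -> str:
    conv = GemtextConverter()
    conv.feed(html)
    conv.close()
    conv.flush_block()  # emit any trailing text outside a block element
    return "\n".join(conv.out_lines)


def render_page(html: str,
                extract_article: Callable[[str], Optional[str]],
                raw_mode: bool = False) -> str:
    """Article view by default; fall back to converting the whole page."""
    if not raw_mode:
        article_html = extract_article(html)  # hypothetical Readability stand-in
        if article_html:
            return html_to_gemtext(article_html)
    # Raw mode was requested, or extraction failed: convert all of the HTML.
    return html_to_gemtext(html)
```

Dropping a few noisy tags like this is only the bare minimum; real pages need many more heuristics before the raw output becomes pleasant to read.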
But this also meant that users could use NewsWaffle in raw mode to view any HTML page over HTTP! In essence, it was a super hacky general-purpose Gemini-to-HTTP gateway.
Over the next month or so, I found myself taking especially bad, slow, or gross website URLs, pasting them into NewsWaffle's "Enter your own favorite news site" feature, and viewing them in raw mode. It was an OK way to clean up the content and make it readable.
Of course, there are existing Gemini proxies, like Duckling proxy, which let you access websites from a Gemini client. They work by accepting HTTP URLs in a normal Gemini request. Images and other content are proxied through, and any HTML is converted to gemtext on the fly.
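For anyone curious how a request flows through a proxy like this, here is a minimal Python sketch. It is not Duckling's or Stargate's code, and it leaves out TLS and networking entirely: `fetch(url)` is a hypothetical callable returning the upstream Content-Type and body, and `convert` is a hypothetical HTML-to-gemtext function. Because a Gemini request is just an absolute URL followed by CRLF, the proxy only has to check for an http(s) scheme, fetch the page, convert HTML into `text/gemini`, and pass anything else (like images) through with its original MIME type. The status codes used (20 success, 43 proxy error, 53 proxy request refused, 59 bad request) come from the Gemini specification.

```python
from typing import Callable, Tuple
from urllib.parse import urlsplit

# Hypothetical upstream fetcher: url -> (Content-Type header value, body bytes).
Fetcher = Callable[[str], Tuple[str, bytes]]


def handle_proxy_request(request: bytes, fetch: Fetcher,
                         convert: Callable[[str], str]) -> bytes:
    """Turn one raw Gemini request line into a full Gemini response."""
    # A Gemini request is an absolute URL (at most 1024 bytes) ending in CRLF.
    if not request.endswith(b"\r\n") or len(request) - 2 > 1024:
        return b"59 Bad request\r\n"
    url = request[:-2].decode("utf-8", errors="replace")

    # Only HTTP(S) URLs are proxied; refuse anything else.
    if urlsplit(url).scheme.lower() not in ("http", "https"):
        return b"53 Only http and https URLs are proxied\r\n"

    try:
        content_type, body = fetch(url)
    except Exception:
        # Keep the META line free of arbitrary exception text.
        return b"43 Could not fetch upstream URL\r\n"

    mime = content_type.split(";")[0].strip().lower()
    if mime == "text/html":
        # HTML is converted to gemtext on the fly (charset handling simplified).
        gemtext = convert(body.decode("utf-8", errors="replace"))
        return b"20 text/gemini; charset=utf-8\r\n" + gemtext.encode("utf-8")

    # Images and other content are passed through with their original type.
    return f"20 {content_type}\r\n".encode("utf-8") + body
```

Injecting `fetch` and `convert` keeps the protocol handling separate from the hard part, which is the conversion itself.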
In fact, I set up a public instance of Duckling, creating a Gemini-to-HTTP gateway that anyone could use.
Public Gemini-to-HTTP Gateway via Duckling
While Duckling did work, I found the overall experience pretty poor, even on relatively simple sites. That's not to denigrate Duckling at all; it's a great tool. Rather, the complexity of the HTML in modern websites makes converting it into legible gemtext really hard, for many of the reasons I've discussed before:
Building a better HTML-to-gemtext converter
As I used NewsWaffle as a hacky proxy, I began making tweaks to convert the HTML better. At a certain point, I found that its output was better than what Duckling was producing. So I pulled the converter logic out into a common library and used it to create a true general-purpose Gemini proxy, which I call Stargate, that can be used to browse arbitrary websites from any Gemini client.
I've done a lot of work on Stargate to make the output readable and clean. It's stable enough that a few weeks ago I switched the public Gemini-to-HTTP gateway at stargate.gemi.dev:1994 from Duckling to Stargate. This means anyone here can start playing with it to browse the web.
I still need to do some work to package it up so people can run it locally. In the meantime, you can use the public proxy instance with the instructions linked below:
Public Stargate Proxy: A Gemini-to-HTTP gateway
I plan to write another post about some of the work I've done to convert modern HTML into gemtext more cleanly, since it has taught me a lot. There is still more work to do. In the meantime, please give it a try and let me know what you think!