2023-02-08 | #search #proxy | @Acidus
Recently I've been working on a general Gemini proxy that fetches HTTP/HTML resources and converts them on the fly to gemtext. The plan is to replace the public instance of Duckling with something a bit more feature-ful.
Public Gemini-to-HTTP Gateway via Duckling
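To give a feel for what "converting on the fly" involves, here is a minimal sketch (not the actual code behind Duckling or the new proxy) that turns HTML anchors into gemtext "=>" link lines using Python's standard html.parser. The names LinkExtractor, html_links_to_gemtext and the base_url parameter are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects <a href="..."> elements as gemtext "=>" link lines."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.lines = []
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Resolve relative links against the page URL so they work from Gemini.
            self._href = urljoin(self.base_url, href) if href else None
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so the label stays on one gemtext line.
            label = " ".join("".join(self._text).split())
            self.lines.append(f"=> {self._href} {label}".rstrip())
            self._href = None

def html_links_to_gemtext(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

Gemtext only allows links on their own lines, which is why each anchor becomes a separate line here. A full converter also has to deal with paragraphs, headings, lists and so on; this only covers links.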
A big part of building a new proxy is testing it against various HTML pages, specifically the kind of HTML found on the fun sites that appeal to Geminauts. @Marginalia's HTTP search engine does a nice job crawling and indexing these kinds of interesting sites, and their gemlog contains a lot of details about their journey building an HTTP search engine in the 2020s:
@Marginalia's awesome weblog on building a search engine
Anyway, while Gemini proxies like Duckling or the one I'm working on can render Marginalia's HTML search page, it's still an HTML page, so it uses FORM/INPUT tags to receive search queries from users.
Unfortunately, forms can't really be converted into gemtext. So while you can view Marginalia search in a Gemini client via an HTTP proxy, you can't actually submit search queries.
To solve this, I wrote a quick CGI script that allows you to submit search queries to Marginalia inside of Gemini.
This allows you to submit a search query via Gemini's standard "10 [input]" method. It then uses a cross-protocol redirect to send you from the "gemini://" URL to the "https://" URL of Marginalia's results page for that query. I also tell Marginalia to only return websites that use no JavaScript, to ensure that the pages returned are easily converted to gemtext.
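The round trip can be sketched in a few lines of Python. This is a hypothetical illustration that assumes the client percent-encodes the user's input into the query component, as the Gemini spec asks; example.org and both function names are made up, while the /cgi-bin/marginalia.py/s path and the Marginalia URL come from the CGI script shown below.

```python
from urllib.parse import quote

MARGINALIA_SEARCH = "https://search.marginalia.nu/search"

def gemini_input_request(cgi_url, user_input):
    """What a client requests after a "10" prompt: the same URL with the
    user's input percent-encoded into the query component (assumed behavior)."""
    return cgi_url + "?" + quote(user_input)

def redirect_header(query_string):
    """The "30" redirect the CGI answers with; the query is spliced in still
    encoded, with JS-free results requested via js=no-js."""
    return ("30 " + MARGINALIA_SEARCH + "?query=" + query_string
            + "&profile=default&js=no-js")

request = gemini_input_request("gemini://example.org/cgi-bin/marginalia.py/s", "small web")
print(request)                                 # gemini://example.org/cgi-bin/marginalia.py/s?small%20web
print(redirect_header(request.split("?", 1)[1]))
```

Because the client has already encoded spaces and characters like "&", the CGI can pass the raw query string through to Marginalia without re-encoding it.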
Two things to note:
This functionality is small enough that I implemented it as a CGI script. For fun, I decided to write it in a language I hadn't used before, so this is actually the very first Python program I've ever written! Here it is:
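```python
#!/usr/local/bin/python3
# Gemini CGI: prompt for a search query, then redirect to Marginalia's HTTP results page.
import os
import sys

# Treat a missing variable the same as an empty one instead of raising KeyError.
qs = os.environ.get("QUERY_STRING", "")  # normally already percent-encoded by the client
pi = os.environ.get("PATH_INFO", "")

splash = """# Marginalia.nu search redirect

This script allows you to submit a search query to search.marginalia.nu and be redirected to the results page. It works best if your Gemini client has been configured to use an HTTP proxy.

=> /cgi-bin/marginalia.py/s Search
=> /stargate.gmi Need an HTTP proxy? Try Stargate
"""

def respond(header, body=""):
    # Gemini response headers must end with CRLF, which print() would not emit.
    sys.stdout.write(header + "\r\n" + body)

if pi == "":
    # No extra path: show the landing page.
    respond("20 text/gemini", splash)
elif qs == "":
    # Status 10 makes the client prompt the user for a line of input.
    respond("10 Search query for search.marginalia.nu")
else:
    # Cross-protocol redirect to the HTTPS results page, limited to sites without JS.
    url = "https://search.marginalia.nu/search?query=" + qs + "&profile=default&js=no-js"
    respond("30 " + url)
```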
To be clear, this is an oh-so-dirty hack that uses a cross-protocol redirect to get the user's query over to Marginalia, and I wrote it quickly so I could more easily test my Gemini proxy. Still, searching Marginalia from Gemini and accessing its results is valuable, so how could we make this better?
Ideally, Marginalia could expose its search engine over Gemini itself. Of course, I'm sure they have other things to do, and as I have learned, one of the dumbest things you can say to a developer is "it's easy, you just need to do _____." So let's assume for now that doesn't happen.
Second best would be to use the Marginalia API to query the search engine from inside Gemini. You could then format the results as gemtext, with HTTP links to each result. That avoids the strange cross-protocol redirect and could allow for better presentation of the search results.
This seems pretty straightforward, and is something I could do. In fact, I've already received an API key, and given how much fun it was to play with Python, that might be a good next step.
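A sketch of the formatting half of that idea, assuming the API hands back a list of results that each carry "url", "title" and "description" fields. That shape is an assumption, not something confirmed here, so check Marginalia's API documentation for the real one. Fetching is left out because the endpoint and key format aren't covered in this post; results_to_gemtext and _one_line are hypothetical names.

```python
def _one_line(text):
    # Collapse newlines and runs of whitespace so a field can't spill into extra gemtext lines.
    return " ".join(str(text).split())

def results_to_gemtext(query, results):
    """Render search results as a gemtext page: a heading, then one HTTP link line per hit.

    ASSUMPTION: each result is a dict with "url" plus optional "title" and
    "description" keys; the real Marginalia API response may be shaped differently.
    """
    lines = [f"# Marginalia results for: {_one_line(query)}", ""]
    for hit in results:
        url = hit["url"]
        title = _one_line(hit.get("title") or url)  # fall back to the URL as the label
        lines.append(f"=> {url} {title}")
        description = _one_line(hit.get("description") or "")
        if description:
            lines.append(description)
        lines.append("")
    return "\n".join(lines)
```

A production version would also need to guard against a description that happens to start with gemtext markup such as "=>" or "#", since a Gemini client would render that line as a link or heading.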
In the meantime, I'll leave this CGI script up so others can search Marginalia via Gemini.
Gemini's "10" status code won't help here, and neither would something like Spartan's input line type, since the URL of the search query looks like this:
I have been working on a Gemini proxy, similar to Duckling, powered by what