💾 Archived View for tilde.club › ~cyrus › 2022-09-11-backtick.gmi captured on 2024-05-10 at 10:55:02. Gemini links have been rewritten to link to archived content
▀█▄ ▄▄ ▄▄ ██ ▄▄
▀▀ ██ ██ ██ ▀▀ ██
██▄███▄ ▄█████▄ ▄█████▄ ██ ▄██▀ ███████ ████ ▄█████▄ ██ ▄██▀
██▀ ▀██ ▀ ▄▄▄██ ██▀ ▀ ██▄██ ██ ██ ██▀ ▀ ██▄██
██ ██ ▄██▀▀▀██ ██ ██▀██▄ ██ ██ ██ ██▀██▄
███▄▄██▀ ██▄▄▄███ ▀██▄▄▄▄█ ██ ▀█▄ ██▄▄▄ ▄▄▄██▄▄▄ ▀██▄▄▄▄█ ██ ▀█▄
▀▀ ▀▀▀ ▀▀▀▀ ▀▀ ▀▀▀▀▀ ▀▀ ▀▀▀ ▀▀▀▀ ▀▀▀▀▀▀▀▀ ▀▀▀▀▀ ▀▀ ▀▀▀
2022-09-11 | #tilde.wtf #backtick #golang #postgresql #search
With the crawler that spiders the tildeverse at a regular interval and the database behind it both built, the next step was the API component. I wanted to make sure the community would not be bound to a WWW-only interface for the search data (though I will be creating a WWW frontend myself in the near future), so it made sense to start with an API for searching the index. That way the community can build whatever frontends to the data they like, whether for IRC, Gemini, or something else.
The API is now up and running and serves parseable JSON with the results for your query. You can access it via HTTPS:
https://search.tilde.wtf/search?q=tilde
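As a rough illustration of a non-WWW client, here is a minimal Go sketch that builds the request URL and decodes whatever JSON comes back. The `q` parameter is taken from the URL above. The `offset` parameter name is an assumption, and because the response format is not shown here, the sketch decodes into a generic value instead of a typed struct.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"os"
	"strconv"
	"time"
)

const baseURL = "https://search.tilde.wtf/search"

// searchURL builds the request URL. "q" comes from the post;
// the "offset" parameter name is an assumption.
func searchURL(base, query string, offset int) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	v := url.Values{}
	v.Set("q", query)
	if offset > 0 {
		v.Set("offset", strconv.Itoa(offset))
	}
	u.RawQuery = v.Encode()
	return u.String(), nil
}

// fetch performs the request and decodes the JSON body into a generic
// value, since the exact response shape is not documented in the post.
func fetch(target string) (any, error) {
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get(target)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	var results any
	if err := json.NewDecoder(resp.Body).Decode(&results); err != nil {
		return nil, err
	}
	return results, nil
}

func main() {
	if len(os.Args) < 2 {
		// Without a search term, only show the URL that would be requested.
		u, _ := searchURL(baseURL, "tilde", 0)
		fmt.Println("usage: search <term>; example request:", u)
		return
	}
	target, err := searchURL(baseURL, os.Args[1], 0)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	results, err := fetch(target)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	pretty, err := json.MarshalIndent(results, "", "  ")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(string(pretty))
}
```

Running it with a search term as the first argument performs the request; an IRC bot or Gemini CGI script could wrap the same two functions.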
It supports pagination via an offset value. Offset pagination slows down on a very large index, because the database still has to produce and discard every skipped row, but the tildeverse is unlikely to grow large enough for that to matter. The index could multiply in size many times over before it became a problem.
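For client authors, the arithmetic behind offset pagination can be sketched as below. The page size of 30 matches the LIMIT in the search query; the helper names are hypothetical, and a 1-based page number is just one possible convention.

```go
package main

import "fmt"

// pageSize matches the LIMIT 30 used by the search query.
const pageSize = 30

// offsetForPage converts a 1-based page number into a row offset.
// Pages below 1 are treated as the first page.
func offsetForPage(page int) int {
	if page < 1 {
		page = 1
	}
	return (page - 1) * pageSize
}

// nextOffset returns the offset of the following page, or -1 when the
// current page came back short and there is nothing further to fetch.
// A completely full page may still be the last one; in that case the
// next request simply returns no rows.
func nextOffset(current, returned int) int {
	if returned < pageSize {
		return -1
	}
	return current + pageSize
}

func main() {
	for page := 1; page <= 3; page++ {
		fmt.Printf("page %d -> offset %d\n", page, offsetForPage(page))
	}
	fmt.Println("after a short page:", nextOffset(30, 12))
}
```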
So how does the search work? It uses PostgreSQL's built-in full-text search:
rows, err := db.Query(`
    SELECT url, title, crawled_on,
           ts_headline(body, plainto_tsquery(' '|| $1 ||' '),
                       'MaxFragments=0, MinWords=25, MaxWords=60') AS headline
    FROM tildes
    WHERE searchtext @@ plainto_tsquery(' '|| $2 ||' ')
    LIMIT 30 OFFSET $3;`, query, query, offset)
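To show how a query like this might sit inside an HTTP handler, here is a sketch under several assumptions: the JSON field names, the `offset` parameter name, and the column types (non-null text for `url` and `title`, a timestamp for `crawled_on`) are guesses, and a PostgreSQL driver such as lib/pq or pgx would have to be imported in the real program. It is not the actual tilde.wtf handler.

```go
package main

import (
	"database/sql"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"strconv"
	"strings"
	"time"
)

// Result mirrors the four selected columns. JSON names are assumptions.
type Result struct {
	URL       string    `json:"url"`
	Title     string    `json:"title"`
	CrawledOn time.Time `json:"crawled_on"`
	Headline  string    `json:"headline"`
}

const searchSQL = `
SELECT url, title, crawled_on,
       ts_headline(body, plainto_tsquery(' '|| $1 ||' '),
                   'MaxFragments=0, MinWords=25, MaxWords=60') AS headline
FROM tildes
WHERE searchtext @@ plainto_tsquery(' '|| $2 ||' ')
LIMIT 30 OFFSET $3;`

// parseParams validates the raw "q" and "offset" query-string values.
func parseParams(rawQuery, rawOffset string) (string, int, error) {
	q := strings.TrimSpace(rawQuery)
	if q == "" {
		return "", 0, errors.New("missing q parameter")
	}
	if rawOffset == "" {
		return q, 0, nil
	}
	offset, err := strconv.Atoi(rawOffset)
	if err != nil || offset < 0 {
		return "", 0, fmt.Errorf("invalid offset %q", rawOffset)
	}
	return q, offset, nil
}

// searchHandler runs the query and writes the matches as a JSON array.
func searchHandler(db *sql.DB) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		params := r.URL.Query()
		q, offset, err := parseParams(params.Get("q"), params.Get("offset"))
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		rows, err := db.QueryContext(r.Context(), searchSQL, q, q, offset)
		if err != nil {
			http.Error(w, "search failed", http.StatusInternalServerError)
			return
		}
		defer rows.Close()

		results := []Result{} // encodes as [] rather than null when empty
		for rows.Next() {
			var res Result
			if err := rows.Scan(&res.URL, &res.Title, &res.CrawledOn, &res.Headline); err != nil {
				http.Error(w, "search failed", http.StatusInternalServerError)
				return
			}
			results = append(results, res)
		}
		if err := rows.Err(); err != nil {
			http.Error(w, "search failed", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(results)
	}
}

func main() {
	// Only the parameter validation is exercised here; serving requests
	// needs sql.Open with a real driver and http.Handle("/search", searchHandler(db)).
	q, offset, err := parseParams(" tilde ", "30")
	fmt.Println(q, offset, err)
}
```

Passing the search term as a bound parameter, as the original query does, keeps user input out of the SQL text entirely.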
Details are available in the Postgres docs:
https://www.postgresql.org/docs/current/textsearch-controls.html
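The benefit of all this, quoting the docs: "PostgreSQL provides two predefined ranking functions, which take into account lexical, proximity, and structural information; that is, they consider how often the query terms appear in the document, how close together the terms are in the document, and how important is the part of the document where they occur." Those two functions are ts_rank and ts_rank_cd. Results only come back in relevance order once one of them appears in an ORDER BY clause, which the query above does not include yet.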
All of this without a bunch of complicated effort on my part. Lovely.
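For reference, relevance ordering could be added along these lines. This is a sketch, not the query tilde.wtf runs: it assumes `searchtext` is a `tsvector` column (which `ts_rank` requires), follows the pattern from the Postgres docs of naming the tsquery once in the FROM clause, and drops the padding spaces around the parameter for brevity. `RankedResult` and `searchRanked` are hypothetical names.

```go
package main

import (
	"database/sql"
	"errors"
	"fmt"
	"strings"
)

// rankedSearchSQL orders matches by ts_rank, best match first.
// The url tie-breaker keeps pagination stable between requests.
const rankedSearchSQL = `
SELECT url, title,
       ts_headline(body, q, 'MaxFragments=0, MinWords=25, MaxWords=60') AS headline,
       ts_rank(searchtext, q) AS rank
FROM tildes, plainto_tsquery($1) AS q
WHERE searchtext @@ q
ORDER BY rank DESC, url
LIMIT 30 OFFSET $2;`

// RankedResult holds one row of the ranked query.
type RankedResult struct {
	URL      string
	Title    string
	Headline string
	Rank     float64
}

// searchRanked validates its input, runs the ranked query, and
// collects the rows. The db handle needs a PostgreSQL driver behind it.
func searchRanked(db *sql.DB, query string, offset int) ([]RankedResult, error) {
	query = strings.TrimSpace(query)
	if query == "" {
		return nil, errors.New("empty query")
	}
	if offset < 0 {
		return nil, fmt.Errorf("negative offset %d", offset)
	}
	rows, err := db.Query(rankedSearchSQL, query, offset)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var results []RankedResult
	for rows.Next() {
		var r RankedResult
		if err := rows.Scan(&r.URL, &r.Title, &r.Headline, &r.Rank); err != nil {
			return nil, err
		}
		results = append(results, r)
	}
	return results, rows.Err()
}

func main() {
	// Print the statement so it can be tried directly in psql.
	fmt.Println(rankedSearchSQL)
}
```

Swapping `ts_rank` for `ts_rank_cd` would weight how close together the query terms appear, at the cost of needing positional information in the `tsvector`.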
Next up is a basic WWW frontend that will live as the official tilde.wtf page. I'll build this component in Go as well, using the standard library's html/template package. It will be minimal and use zero JavaScript.
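A JavaScript-free results page with html/template could look roughly like the sketch below. The markup, the `Page` and `Result` types, and the sample data are all hypothetical. One thing to watch for: `ts_headline` wraps matches in `<b>` tags by default, and html/template will escape them. Rendering them as markup would mean treating the headline as trusted HTML, which the Postgres docs caution against for untrusted documents.

```go
package main

import (
	"html/template"
	"log"
	"os"
	"strings"
)

// Result and Page are hypothetical view models for the results page.
type Result struct {
	URL      string
	Title    string
	Headline string
}

type Page struct {
	Query   string
	Results []Result
}

// pageTmpl is a plain HTML form plus a result list; no scripts involved.
var pageTmpl = template.Must(template.New("results").Parse(`<!DOCTYPE html>
<html lang="en">
<head><meta charset="utf-8"><title>tilde.wtf search</title></head>
<body>
<form action="/search" method="get">
  <input type="search" name="q" value="{{.Query}}">
  <button type="submit">Search</button>
</form>
{{range .Results}}<p><a href="{{.URL}}">{{.Title}}</a><br>{{.Headline}}</p>
{{else}}<p>No results.</p>
{{end}}</body>
</html>
`))

// renderString executes the template into a string.
func renderString(p Page) (string, error) {
	var sb strings.Builder
	if err := pageTmpl.Execute(&sb, p); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Made-up sample data, only to show the escaping behaviour.
	page := Page{
		Query: "tilde",
		Results: []Result{
			{URL: "https://example.org/", Title: "An example page", Headline: "a <b>tilde</b> server"},
		},
	}
	if err := pageTmpl.Execute(os.Stdout, page); err != nil {
		log.Fatal(err)
	}
}
```

Because html/template escapes by context, a page title or query containing markup is rendered as text rather than interpreted by the browser.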