Stephane Bortzmeyer stephane at sources.org
Sat Feb 27 10:16:46 GMT 2021
- - - - - - - - - - - - - - - - - - -
On Sat, Feb 27, 2021 at 10:21:18AM +0100, Côme Chilliet <come at chilliet.eu> wrote a message of 4 lines which said:
I was kind of expecting to see a solution based on an existing
search engine to emerge, such as elastic search, by implementing
only the gemini specific parts, but I looked into quite a few
projects and all were terribly complicated…
A search engine service has three parts: the crawler, the indexer and the querier (the one the user interacts with). ElasticSearch could be a good idea for the last two (at least the second, and maybe part of the third). You still have to write the crawler and, speaking from experience, this is not a one-weekend project. At the beginning it is: you have a prototype running quite rapidly. But then, in the real world, a lot of problems happen. My "favorite" is capsules accepting TCP, completing the TLS handshake, but then not replying to queries, but there are also endless redirections and other "funny" stuff. A crawler has to be paranoid! Managing such a beast takes time, and the growth of the geminispace (47 capsules added yesterday, a new record, including one in Catalan, apparently the first one) requires that you plan in advance: what works today won't in a few months.
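
To make "paranoid" a bit more concrete, here is a minimal sketch of a defensive fetch in Python. It is an illustration only, not the code of any existing crawler. The names (fetch_once, fetch, CrawlError) and the limits (10 seconds, 5 redirects, 1 MB) are assumptions chosen for the example. It guards against the two problems mentioned above: a capsule that completes the TLS handshake and then stays silent (an overall deadline on reads) and endless redirections (a hop limit plus loop detection).

```python
import socket
import ssl
import time
import urllib.parse
from urllib.parse import urljoin, urlsplit

# urljoin() only resolves relative references for schemes it knows about,
# so register "gemini" (otherwise a redirect to "/foo" is returned as-is).
for known in (urllib.parse.uses_relative, urllib.parse.uses_netloc):
    if "gemini" not in known:
        known.append("gemini")

# Hypothetical limits, picked for the example.
TIMEOUT = 10.0
MAX_REDIRECTS = 5
MAX_BYTES = 1_000_000


class CrawlError(Exception):
    """Any reason to give up on a URL."""


def parse_header(line):
    """Split a response header such as b'20 text/gemini' into (status, meta)."""
    text = line.decode("utf-8", errors="replace").rstrip("\r\n")
    status, _, meta = text.partition(" ")
    if len(status) != 2 or not status.isdigit():
        raise CrawlError("malformed header: %r" % text)
    return int(status), meta


def fetch_once(url, timeout=TIMEOUT, max_bytes=MAX_BYTES):
    """One request, no redirect handling, bounded in time and size."""
    parts = urlsplit(url)
    ctx = ssl.create_default_context()
    # Assumption: many capsules use self-signed certificates, so this
    # sketch skips CA validation. A real crawler may want TOFU pinning.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    deadline = time.monotonic() + timeout
    data = b""
    try:
        with socket.create_connection((parts.hostname, parts.port or 1965),
                                      timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=parts.hostname) as tls:
                tls.sendall(url.encode("utf-8") + b"\r\n")
                while len(data) < max_bytes:
                    remaining = deadline - time.monotonic()
                    if remaining <= 0:
                        raise CrawlError("deadline exceeded: " + url)
                    # A silent server makes recv() time out instead of
                    # blocking the crawler forever.
                    tls.settimeout(remaining)
                    chunk = tls.recv(4096)
                    if not chunk:
                        break
                    data += chunk
    except OSError as exc:  # covers timeouts, TLS errors, refused connections
        raise CrawlError("network error for %s: %s" % (url, exc)) from exc
    header, _, body = data.partition(b"\r\n")
    status, meta = parse_header(header)
    return status, meta, body


def fetch(url, fetch_one=fetch_once, max_redirects=MAX_REDIRECTS):
    """Follow 3x redirects, refusing loops and overly long chains."""
    seen = set()
    for _ in range(max_redirects + 1):
        if url in seen:
            raise CrawlError("redirect loop at " + url)
        seen.add(url)
        status, meta, body = fetch_one(url)
        if 30 <= status < 40:
            url = urljoin(url, meta)
            continue
        return url, status, meta, body
    raise CrawlError("too many redirects")
```

Calling fetch("gemini://example.org/") would return the final URL, the status, the meta field and the body, or raise CrawlError. The fetch_one parameter exists only so the redirect logic can be exercised without touching the network.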