💾 Archived View for warmedal.se › ~bjorn › posts › 2022-07-13-automatic-gemlog-discovery.gmi captured on 2024-07-09 at 00:10:15. Gemini links have been rewritten to link to archived content
Discovery in geminispace is hard. I've thought a lot about it, running Antenna and all, and blippy and Masqq have also had some thoughts:
2022-07-09 blippy: NEW: Talking point: blogging discovery
2022-07-09 Masqq: Automatic discovery of blogs
They suggest adding a file in the root of the server akin to security.txt, robots.txt, and humans.txt.
(You never heard of humans.txt because it didn't catch on; the website certificate has even expired)
Similar to how humans.txt never caught on, and security.txt is very rare, another txt file pointing out blog, mini-blog, or micro-blog paths will never catch on. Especially not in geminispace. Why?
But all hope is not lost! Some sort of automatic discovery is still possible. I should know, because Antenna does it, in a way.
No, Antenna doesn't find gemlogs, but whenever a URL is submitted it does verify whether it's a gemsub feed, an atom feed, a twtxt file, or none of these. Whatever MIME type the page is served with, it checks for all three formats, because I've noticed that the MIME type isn't always correct.
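A rough sketch of that kind of check, for illustration only. This is not Antenna's actual code; the function name classify and the regular expressions are assumptions made up for this example. It ignores the MIME type and simply tries each format in turn on the response body:

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by the root <feed> element of an atom document.
ATOM_NS = "{http://www.w3.org/2005/Atom}"
# Assumed gemsub entry shape: "=> URL YYYY-MM-DD title"
GEMSUB_LINK = re.compile(r"^=>\s*\S+\s+\d{4}-\d{2}-\d{2}(\s|$)")
# Assumed twtxt entry shape: "timestamp<TAB>text"
TWTXT_LINE = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}\S*\t\S")


def classify(body: str) -> str:
    """Return 'atom', 'gemsub', 'twtxt' or 'none' for a response body."""
    # 1. Atom: the body parses as XML and the root element is an atom feed.
    try:
        if ET.fromstring(body).tag == ATOM_NS + "feed":
            return "atom"
    except ET.ParseError:
        pass

    lines = body.splitlines()

    # 2. Gemsub: at least one link line whose label starts with a date.
    if any(GEMSUB_LINK.match(line) for line in lines):
        return "gemsub"

    # 3. twtxt: every non-empty, non-comment line is "timestamp<TAB>text".
    content = [l for l in lines if l.strip() and not l.startswith("#")]
    if content and all(TWTXT_LINE.match(l) for l in content):
        return "twtxt"

    return "none"
```

Trying every format, rather than trusting the declared MIME type, is the whole point: a feed served as text/plain or text/gemini still gets recognised.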
I believe that's the way to do it. A search crawler (which must respect robots.txt of course) can check the pages it finds to suss out if they're feeds or twtxt and add them to an index of gemlogs/micro-blogs if they are.
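The robots.txt part could look roughly like this. It is a minimal sketch that assumes the crawler has already fetched the capsule's robots.txt over Gemini and has it as a string. The helper name may_index and the default agent name "indexer" are assumptions for this example, and example.org is a placeholder host. Python's urllib.robotparser can't fetch gemini:// URLs itself, but it can parse rules handed to it:

```python
from urllib.robotparser import RobotFileParser


def may_index(robots_txt: str, url: str, agent: str = "indexer") -> bool:
    """Check a URL against robots.txt rules that were fetched beforehand."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no HTTP request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical rules for a capsule that keeps one directory private.
rules = "User-agent: indexer\nDisallow: /private/\n"

print(may_index(rules, "gemini://example.org/gemlog/atom.xml"))
print(may_index(rules, "gemini://example.org/private/notes.gmi"))
```

Only pages that pass this check would be fetched, tested for being a feed or twtxt file, and added to the index.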
The big problem is all those mini-blogs/mini-gemlogs where the entire thing is in one file with a totally arbitrary format. I tried to build a parser for them, but it's just not feasible because they're so different from one another.
The search engine geminispace.info keeps a list of known atom feeds. That's a solid start.
-- CC0 ew0k, 2022-07-13