
Automatic Gemlog Discovery

Discovery in geminispace is hard. I've thought a lot about it, running Antenna and all, and blippy and Masqq have also shared some thoughts:

2022-07-09 blippy: NEW: Talking point: blogging discovery

2022-07-09 Masqq: Automatic discovery of blogs

They suggest adding a file in the root of the server akin to security.txt, robots.txt, and humans.txt.

(You've never heard of humans.txt because it didn't catch on; its website's certificate has even expired.)

Just as humans.txt never caught on and security.txt is still very rare, another .txt file pointing out blog, mini-blog, or micro-blog paths will never catch on. Especially not in geminispace. Why?

Nobody will agree on a format.
Most people will never hear of it.
Most importantly: a whole lot of gemlogs are hosted on tildes, where authors don't have write access to the server root.

But all hope is not lost! Some sort of automatic discovery is still possible. I should know, because Antenna already does something like it, in a way.

No, Antenna doesn't find gemlogs, but whenever a URL is submitted it checks whether the page is a gemsub feed, an Atom feed, a twtxt file, or none of these. It runs every check on every page, because I've noticed that the MIME type isn't always correct.
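Content sniffing of that kind could be sketched roughly as follows. This is not Antenna's actual code. The function names are hypothetical, and the detection rules are simplified assumptions:

* An Atom feed is XML whose root is `feed` in the `http://www.w3.org/2005/Atom` namespace.
* A twtxt file consists of lines made of a timestamp, a tab, then text.
* A gemsub page has at least one link line whose label starts with a `YYYY-MM-DD` date.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

ATOM_FEED_TAG = "{http://www.w3.org/2005/Atom}feed"

# Assumed gemsub rule: a gemtext link line ("=> URL label") whose label
# starts with an ISO 8601 date (YYYY-MM-DD).
GEMSUB_ENTRY = re.compile(r"^=>\s*\S+\s+\d{4}-\d{2}-\d{2}\b")


def is_atom(body: str) -> bool:
    """True if the body parses as XML with an Atom <feed> root element."""
    try:
        root = ET.fromstring(body)  # untrusted input: consider defusedxml in production
    except ET.ParseError:
        return False
    return root.tag == ATOM_FEED_TAG


def is_twtxt(body: str) -> bool:
    """True if every non-comment line is a timestamp, a tab, then text."""
    lines = [ln for ln in body.splitlines() if ln.strip() and not ln.startswith("#")]
    if not lines:
        return False
    for line in lines:
        stamp, tab, _text = line.partition("\t")
        if not tab:
            return False
        try:
            # Map a trailing "Z" to "+00:00" so older Pythons accept it too.
            datetime.fromisoformat(stamp.strip().replace("Z", "+00:00"))
        except ValueError:
            return False
    return True


def is_gemsub(body: str) -> bool:
    """True if at least one link line looks like a dated gemsub entry."""
    return any(GEMSUB_ENTRY.match(ln) for ln in body.splitlines())


def classify(body: str) -> str:
    """Hypothetical helper: run every check, ignoring the declared MIME type."""
    if is_atom(body):
        return "atom"
    if is_twtxt(body):
        return "twtxt"
    if is_gemsub(body):
        return "gemsub"
    return "none"
```

Because the checks look only at the body, a feed served with a wrong MIME type is still recognized.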

I believe that's the way to do it. A search crawler (which must respect robots.txt, of course) can check the pages it finds to suss out whether they're feeds or twtxt files, and add them to an index of gemlogs/micro-blogs if they are.
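A crawler loop along those lines might look like the sketch below. It makes several assumptions:

* The caller supplies a `fetch` function (returning the page body, or `None` on failure).
* The caller also supplies a content-sniffing `classify` function that returns a feed type or `"none"`. Both are hypothetical.
* robots.txt bodies have already been fetched per host.
* The crawler identifies as the virtual user-agent `indexer` from the Gemini robots.txt companion spec.

It uses Python's standard `urllib.robotparser`.

```python
from typing import Callable, Dict, Iterable, Optional
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Assumed virtual user-agent from the robots.txt companion spec for Gemini.
USER_AGENT = "indexer"


def load_robots(robots_txt: str) -> RobotFileParser:
    """Parse an already-fetched robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawl(
    urls: Iterable[str],
    fetch: Callable[[str], Optional[str]],
    classify: Callable[[str], str],
    robots_by_host: Dict[str, RobotFileParser],
) -> Dict[str, str]:
    """Return {url: feed_type} for every allowed page that looks like a feed."""
    index: Dict[str, str] = {}
    for url in urls:
        # Hosts without a parsed robots.txt are treated as allowed.
        robots = robots_by_host.get(urlsplit(url).netloc)
        # Check robots.txt *before* fetching: a disallowed page is never requested.
        if robots is not None and not robots.can_fetch(USER_AGENT, url):
            continue
        body = fetch(url)
        if body is None:  # fetch failed or the response was not a success
            continue
        kind = classify(body)  # e.g. "atom", "gemsub", "twtxt" or "none"
        if kind != "none":
            index[url] = kind
    return index
```

The robots.txt check happens before the fetch, so the crawler never requests a disallowed page. Pages that are fetched but don't look like any known feed format are simply left out of the index.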

The big problem is all those mini-blogs/mini-gemlogs where the entire thing lives in one file with a totally arbitrary format. I tried to build a parser for them, but it's just not possible because they're all so different.

The search engine geminispace.info keeps a list of known Atom feeds. That's a solid start.

-- CC0 ew0k, 2022-07-13