💾 Archived View for warmedal.se › ~bjorn › posts › 2022-07-13-automatic-gemlog-discovery.gmi captured on 2022-07-16 at 13:41:16. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Automatic Gemlog Discovery

Discovery in geminispace is hard. I've thought a lot about it, running Antenna and all, and blippy and Masqq have also had some thoughts:

2022-07-09 blippy: NEW: Talking point: blogging discovery

2022-07-09 Masqq: Automatic discovery of blogs

They suggest adding a file in the root of the server akin to security.txt, robots.txt, and humans.txt.

(You've probably never heard of humans.txt because it didn't catch on; the website's certificate has even expired.)

Similar to how humans.txt never caught on, and security.txt is very rare, another txt file pointing out blog, mini-blog, or micro-blog paths will never catch on. Especially not in geminispace. Why?

But all hope is not lost! Some sort of automatic discovery is still possible. I should know, because Antenna does it, in a way.

No, Antenna doesn't find gemlogs, but whenever a URL is submitted it does verify whether it's a gemsub feed, an atom feed, a twtxt file, or none of these. Whatever the page claims to be, it checks for all of them, because I've noticed that the MIME type isn't always correct.
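
To make the idea concrete, here is a minimal sketch of that kind of content sniffing. It is not Antenna's actual code. The function names and the regular expressions are my own assumptions: a gemsub feed is taken to be any page with a link line whose label starts with a YYYY-MM-DD date, an atom feed is any XML document whose root element is the Atom feed element, and a twtxt file is one where every non-comment line is a timestamp, a tab, and some text. The MIME type is deliberately never consulted.

```python
import re
import xml.etree.ElementTree as ET

# Root element of an Atom document, in ElementTree's {namespace}tag notation.
ATOM_FEED_TAG = "{http://www.w3.org/2005/Atom}feed"
# Assumed gemsub shape: "=> URL YYYY-MM-DD optional title".
GEMSUB_LINK = re.compile(r"^=>\s*\S+\s+\d{4}-\d{2}-\d{2}(\s|$)")
# Assumed twtxt shape: "TIMESTAMP<tab>text" (timestamp checked only loosely).
TWTXT_LINE = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}\S*\t\S")


def is_atom(body):
    """True if the body parses as XML with an Atom <feed> root."""
    try:
        return ET.fromstring(body).tag == ATOM_FEED_TAG
    except ET.ParseError:
        return False


def is_gemsub(body):
    """True if at least one link line carries a dated label."""
    return any(GEMSUB_LINK.match(line) for line in body.splitlines())


def is_twtxt(body):
    """True if there is at least one entry and every entry line is well formed."""
    entries = [
        line for line in body.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    return bool(entries) and all(TWTXT_LINE.match(line) for line in entries)


def classify(body):
    """Try every format in turn; return its name, or None if nothing fits."""
    for name, check in (("atom", is_atom), ("gemsub", is_gemsub), ("twtxt", is_twtxt)):
        if check(body):
            return name
    return None
```

Calling classify() on the text of a submitted page returns "atom", "gemsub", "twtxt", or None. A real checker would likely be stricter about timestamps and more forgiving about stray lines.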

I believe that's the way to do it. A search crawler (which must respect robots.txt of course) can check the pages it finds to suss out if they're feeds or twtxt and add them to an index of gemlogs/micro-blogs if they are.
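
A rough sketch of what that crawler step could look like, under some assumptions: the crawler has already retrieved the capsule's robots.txt as text, `fetch` is a hypothetical function that returns the body of a gemini URL, and `looks_like_feed` is whatever feed check the crawler uses. The default user agent name "indexer" is also an assumption on my part. The only library call involved is Python's standard urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser


def build_index(urls, robots_txt, fetch, looks_like_feed, user_agent="indexer"):
    """Return the URLs that robots.txt allows and that turn out to be feeds.

    fetch and looks_like_feed are supplied by the caller (both hypothetical):
    fetch(url) returns the page body as text, looks_like_feed(body) returns
    True for a gemsub feed, an atom feed, or a twtxt file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    index = []
    for url in urls:
        # Check robots.txt before making any request for the page itself.
        if not parser.can_fetch(user_agent, url):
            continue
        if looks_like_feed(fetch(url)):
            index.append(url)
    return index
```

The robots.txt check comes first on purpose, so a disallowed path is never requested at all.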

The big problem is all those mini-blogs/mini-gemlogs where the entire thing is in one file with a totally arbitrary format. I tried to build a parser for them, but it's just not possible because the formats are so different.

The search engine geminispace.info keeps a list of known atom feeds. That's a solid start.

-- CC0 ew0k, 2022-07-13