💾 Archived View for station.martinrue.com › marginalia › d4ce18fc5e834ac4a2a32f92cd297317 captured on 2024-06-16 at 14:26:21. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
I spent the better part of the day tinkering with a Wikipedia cleaner that generates stripped-down HTML that's so clean you can read the articles with netcat. It's supposed to be a part of my search engine, but it's pretty cool on its own. Check it out: https://search.marginalia.nu/wiki/Memex
2 years ago · 👍 warpengineer, samwise
@defunct Check for example man7.org/linux/man-pages -- very clean page. Couldn't figure out why my search engine wouldn't just gobble it up. Except it turns out it has a Google Analytics tag at the bottom. Sadly a common story. · 2 years ago
@defunct Well, maybe. I think the thing is rather that most people don't have a grudge against JavaScript, so most plain websites come with a tag or two, maybe some analytics or some such, and that's just how it is. Maybe it's something that should be discussed and problematized more than it is. · 2 years ago
@marginalia Shouldn't we as responsible Gemini residents then provide an incentive to fill that domain with something useful? Well, then again, we already have that, here in geminispace ;) · 2 years ago
@defunct I could extract results that are only in the top quality segment, but what you typically end up getting in that range is plain text files, changelogs and mailing lists. It seems hard to get useful results in that domain. · 2 years ago
can you build a second version of your search engine that ignores all web pages if they include a single script tag? that could be the grand escape 😍 · 2 years ago
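For illustration, the "ignore any page with a script tag" idea might be sketched roughly like this. It is only a minimal sketch using Python's standard-library HTML parser, not how the Marginalia search engine actually works; the names `ScriptDetector`, `has_script_tag` and `filter_script_free` are made up for the example.

```python
from html.parser import HTMLParser


class ScriptDetector(HTMLParser):
    """Records whether any <script> start tag appears in the document."""

    def __init__(self):
        super().__init__()
        self.found_script = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this handler.
        if tag == "script":
            self.found_script = True


def has_script_tag(html: str) -> bool:
    """Return True if the HTML contains at least one <script> tag."""
    detector = ScriptDetector()
    detector.feed(html)
    detector.close()
    return detector.found_script


def filter_script_free(pages: dict[str, str]) -> list[str]:
    """Hypothetical helper: keep only URLs whose HTML has no <script> tag."""
    return [url for url, html in pages.items() if not has_script_tag(html)]
```

A filter this strict would also drop an otherwise plain page that carries a single analytics snippet.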
of course just as our internet may be subverted by those that can: 1984, Brazil lol. good work with your engine, thank you. · 2 years ago
@digbat_camping I know, right. I stumbled upon it a few days ago and I keep wondering what the heck happened with that idea. It seems not just absolutely brilliant, but quite doable. · 2 years ago
Thank you for highlighting such an important concept: memex. · 2 years ago