💾 Archived View for station.martinrue.com › marginalia › d4ce18fc5e834ac4a2a32f92cd297317 captured on 2022-07-16 at 19:02:17. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
I spent the better part of the day tinkering with a Wikipedia cleaner that generates stripped-down HTML that's so clean you can read the articles with netcat. It's supposed to be a part of my search engine, but it's pretty cool on its own. Check it out: https://search.marginalia.nu/wiki/Memex
11 months ago · 👍 warpengineer, samwise
[1] https://search.marginalia.nu/wiki/Memex
@defunct Check for example man7.org/linux/man-pages -- very clean page. Couldn't figure out why my search engine wouldn't just gobble it up. Except it turns out it has a Google Analytics tag at the bottom. Sadly a common story. · 11 months ago
@defunct Well, maybe. I rather think the thing is that most people don't have a grudge against JavaScript, so most plain websites come with a script tag or two, maybe some analytics or similar, and that's just how it is. Maybe it's something that should be discussed and problematized more than it is. · 11 months ago
@marginalia Shouldn't we as responsible Gemini residents then provide an incentive to fill that domain with something useful? Well, then again, we already have that, here in geminispace ;) · 11 months ago
@defunct I could extract results that are only in the top quality segment, but what you typically end up getting in that range is plain text files, changelogs and mailing lists. It seems hard to get useful results in that domain. · 11 months ago
can you build a second version of your search engine that ignores all web pages if they include a single script tag? that could be the grand escape 😍 · 11 months ago
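For anyone curious what such a filter could look like, here is a minimal sketch in Python. It is not how the Marginalia search engine actually does it; the names ScriptDetector and is_script_free are made up for illustration, and it only assumes the page HTML is already available as a string.

```python
from html.parser import HTMLParser


class ScriptDetector(HTMLParser):
    """Hypothetical helper: records whether any <script> start tag was seen."""

    def __init__(self):
        super().__init__()
        self.has_script = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <SCRIPT> is caught as well.
        if tag == "script":
            self.has_script = True


def is_script_free(html: str) -> bool:
    """Return True if the page contains no <script> tag at all."""
    detector = ScriptDetector()
    detector.feed(html)
    detector.close()
    return not detector.has_script


# An otherwise clean page with a single analytics tag is still rejected.
clean_page = "<html><body><h1>man pages</h1><p>Plain text.</p></body></html>"
tagged_page = clean_page.replace(
    "</body>", '<script src="analytics.js"></script></body>'
)
print(is_script_free(clean_page))
print(is_script_free(tagged_page))
```

A crawler could call is_script_free on each fetched page and skip indexing whenever it returns False. As the man7.org example in this thread suggests, that rule would also drop otherwise very clean pages that carry only an analytics tag.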
of course, just as our internet may be subverted by those that can: 1984, Brazil lol. good work with your engine, thank you. · 11 months ago
@digbat_camping I know, right. I stumbled upon it a few days ago and I keep wondering what the heck happened with that idea. It seems not just absolutely brilliant, but quite doable. · 11 months ago
Thank you for highlighting such an important concept: memex. · 11 months ago