👽 marginalia

I spent the better part of the day tinkering with a Wikipedia cleaner that generates stripped-down HTML so clean you can read the articles with netcat. It's supposed to be a part of my search engine, but it's pretty cool on its own. Check it out: https://search.marginalia.nu/wiki/Memex

2 years ago · 👍 warpengineer, samwise

Links

[1] https://search.marginalia.nu/wiki/Memex

8 Replies

👽 marginalia

@defunct Check, for example, man7.org/linux/man-pages -- a very clean page. I couldn't figure out why my search engine wouldn't just gobble it up, until it turned out to have a Google Analytics tag at the bottom. Sadly, a common story. · 1 year ago

👽 marginalia

@defunct Well, maybe. I rather think the thing is that most people don't have a grudge against JavaScript, so most plain websites come with a script tag or two, maybe some analytics, and that's just how it is. Maybe it's something that should be discussed and problematized more than it is. · 1 year ago

👽 defunct

@marginalia Shouldn't we as responsible Gemini residents then provide an incentive to fill that domain with something useful? Well, then again, we already have that, here in geminispace ;) · 1 year ago

👽 marginalia

@defunct I could extract results that are only in the top quality segment, but what you typically end up getting in that range is plain text files, changelogs and mailing lists. It seems hard to get useful results in that domain. · 1 year ago

👽 defunct

can you build a second version of your search engine that ignores all web pages if they include a single script tag? that could be the grand escape 😍 · 1 year ago
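
The filter suggested here can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Marginalia search engine actually works: the names `ScriptTagDetector`, `has_script_tag` and `keep_script_free` are hypothetical, and the sketch assumes the pages are already fetched and available as HTML strings.

```python
from html.parser import HTMLParser


class ScriptTagDetector(HTMLParser):
    """Records whether any <script> tag appears in the parsed document."""

    def __init__(self):
        super().__init__()
        self.found_script = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method,
        # so <SCRIPT> and <script> are treated the same.
        if tag == "script":
            self.found_script = True


def has_script_tag(html: str) -> bool:
    """Return True if the HTML contains at least one script tag."""
    detector = ScriptTagDetector()
    detector.feed(html)
    detector.close()
    return detector.found_script


def keep_script_free(pages: dict[str, str]) -> list[str]:
    """Return the URLs whose HTML has no script tag (hypothetical helper)."""
    return [url for url, html in pages.items() if not has_script_tag(html)]
```

A single analytics tag at the bottom of an otherwise plain page is enough to exclude it under this rule, which is the trade-off discussed in the replies.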

👽 digbat_camping

Of course, just as our internet may be subverted by those that can: 1984, Brazil, lol. Good work with your engine, thank you. · 2 years ago

👽 marginalia

@digbat_camping I know, right. I stumbled upon it a few days ago and I keep wondering what the heck happened with that idea. It seems not just absolutely brilliant, but quite doable. · 2 years ago

👽 digbat_camping

Thank you for highlighting such an important concept: memex. · 2 years ago