💾 Archived View for station.martinrue.com › marginalia › 7c3f21cee3994414a741861b3ce81d18 captured on 2023-04-26 at 14:43:08. Gemini links have been rewritten to link to archived content
Building a search engine is no task for an instant-gratification junkie. I think I've made huge improvements, but I won't know for certain until the dust settles in about a week.
2 years ago · 👍 cobradile94
@marginalia But isn't that how a lot of projects start out? It doesn't have to be a polished piece of gold, it just has to be something that works. I used to feel the same way about the documentation I write, but over time I realized that if I don't write it, probably no one will, especially for something as undocumented as a search engine. Just my thoughts. · 2 years ago
@dimitrigorvachov If you check my capsule, you'll find I have written *some* on the topic, including a few sketches of the design, but I don't think I'm the guy to write the authoritative guide on how to build search engines, as all I have done is cobble one together from several months of guesswork and trial and error. · 2 years ago
You said before that there isn't much good material to read about making a search engine. Have you ever considered writing down the basics of making one? · 2 years ago
@kevinsan You might have caught the first iteration of what I'm testing now, which is designed specifically to improve the relevance of the search results. There were a few wrinkles, though, which meant I had to restart indexing a few hours ago. It turns out some people use newlines to terminate HTML tags: they just <h1>start a tag and never close it. This confuses my (relatively modern) HTML parser and makes the scraper put too much weight on the text of the page. · 2 years ago
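A minimal Python sketch of that failure mode, assuming a hypothetical scorer that simply tallies words inside `<h1>` separately from body text (this is an illustration, not the author's actual scraper). The standard library's event-based `html.parser` never emits an end-tag event for an unclosed `<h1>`, so a naive tracker counts every later word as heading text. The `newline_closes_heading` flag is a hypothetical heuristic that treats the first newline inside a heading as an implicit `</h1>`.

```python
from html.parser import HTMLParser


class HeadingWeigher(HTMLParser):
    """Hypothetical sketch: count words inside <h1> vs. everywhere else."""

    def __init__(self, newline_closes_heading: bool = False):
        super().__init__()
        self.newline_closes_heading = newline_closes_heading
        self.in_heading = False
        self.heading_words = 0
        self.body_words = 0

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_heading = True

    def handle_endtag(self, tag):
        # Never called for an <h1> the author forgot to close.
        if tag == "h1":
            self.in_heading = False

    def handle_data(self, data):
        if self.in_heading and self.newline_closes_heading:
            text = data.lstrip()  # ignore whitespace right after <h1>
            if "\n" in text:
                # Assumed heuristic: the first newline ends the heading.
                head, _, rest = text.partition("\n")
                self.heading_words += len(head.split())
                self.body_words += len(rest.split())
                self.in_heading = False
                return
        if self.in_heading:
            self.heading_words += len(data.split())
        else:
            self.body_words += len(data.split())


def weigh(html: str, newline_closes_heading: bool = False) -> tuple[int, int]:
    """Return (heading_words, body_words) for a page."""
    parser = HeadingWeigher(newline_closes_heading)
    parser.feed(html)
    parser.close()
    return parser.heading_words, parser.body_words


if __name__ == "__main__":
    page = "<h1>Chacha recipes\n<p>Some text about dancing the chacha.</p>"
    print(weigh(page))                               # naive tracker
    print(weigh(page, newline_closes_heading=True))  # newline heuristic
```

On the sample page the naive tracker files all eight words under the heading, while the heuristic splits them two and six. The heuristic is crude: a properly closed heading that wraps across lines would be cut short.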
I did a search (less than 24 hours ago) for 'chacha', which gave a good range of results. The top result was what I was looking for. I remember feeling impressed. · 2 years ago