💾 Archived View for station.martinrue.com › marginalia › 178f59bcc7204b27a3eb509261b90b37 captured on 2022-07-16 at 19:02:19. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
I devised a fast compression scheme for my search engine dictionary that reduces its size to a third while still allowing O(1) lookups. I also had to implement my own hashmap because everything available was too generalized (and therefore wasted too much memory). Every byte per entry is a gigabyte when your dictionary has a billion entries, and a Java object header alone is at least 8 bytes.
11 months ago · 👍 kevinsan, defunct, lykso, ttocsneb, skyjake
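The post doesn't describe the custom hashmap, so the following is only a hypothetical sketch of the general idea: an open-addressing map with linear probing, stored in two parallel primitive arrays, so no per-entry objects (and no per-entry object headers or boxed `Long`/`Integer` values) are allocated. It assumes the dictionary keys are 64-bit term hashes mapped to `int` term ids; the class name `LongIntHashMap`, the reserved key 0, and the 50% load limit are all illustrative choices. Each slot costs 8 + 4 = 12 bytes, so at most about 24 bytes per entry at the load limit.

```java
/**
 * Hypothetical sketch, not the author's implementation: an open-addressing
 * hash map from long keys (assumed 64-bit term hashes) to int values
 * (assumed term ids), backed by two primitive arrays. No per-entry objects
 * exist, so there is no per-entry object header or pointer overhead.
 * Key 0 is reserved as the "empty slot" marker.
 */
public class LongIntHashMap {
    private final long[] keys;
    private final int[] values;
    private final int mask;
    private int size;

    public LongIntHashMap(int expectedEntries) {
        // Round up to a power of two holding at least 2x the expected entries,
        // so the table stays at most half full and probe chains stay short.
        int target = Math.max(2, expectedEntries * 2);
        int capacity = Integer.highestOneBit(target - 1) << 1;
        keys = new long[capacity];
        values = new int[capacity];
        mask = capacity - 1;
    }

    private int slot(long key) {
        // MurmurHash3 fmix64 finalizer: spreads structured keys across slots.
        long h = key;
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return (int) h & mask;
    }

    public void put(long key, int value) {
        if (key == 0) throw new IllegalArgumentException("key 0 is reserved");
        int i = slot(key);
        while (keys[i] != 0 && keys[i] != key) {
            i = (i + 1) & mask; // linear probing: try the next slot
        }
        if (keys[i] == 0) {
            // Enforce the 50% load limit so lookups always reach an empty slot.
            if (size + 1 > keys.length / 2) throw new IllegalStateException("map is full");
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    /** Returns the value for key, or `missing` if absent. Expected O(1). */
    public int get(long key, int missing) {
        if (key == 0) return missing;
        int i = slot(key);
        while (keys[i] != 0) {
            if (keys[i] == key) return values[i];
            i = (i + 1) & mask;
        }
        return missing;
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        LongIntHashMap map = new LongIntHashMap(1000);
        map.put(42L, 7);
        map.put(-5L, 9);
        map.put(42L, 8); // overwrites the previous value for key 42
        System.out.println(map.get(42L, -1)); // 8
        System.out.println(map.get(-5L, -1)); // 9
        System.out.println(map.get(1L, -1));  // -1 (absent)
        System.out.println(map.size());       // 2
    }
}
```

A `java.util.HashMap<Long, Integer>` instead allocates a node object plus a boxed key and a boxed value per entry, each with its own header, which is the kind of overhead the post is complaining about. Note that a real billion-entry table would have to be split across several arrays, since a single Java array holds at most about 2^31 elements.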
As for distribution, I think it requires a pretty large scale before it's actually helpful. A lot of why it's fast and good now is specifically because everything has been tuned very tightly. It would probably need several dozen nodes doing work before it even breaks even with my single node. What you'd basically need to build is a very fast, extraordinarily large, and highly fault-tolerant distributed key-value dictionary. That's rough. (2/2) · 11 months ago
@kevinsan Yeah, I've had a bunch of revelations over the last few weeks that have made it a hundred times better. You can actually find relevant stuff with it now, at least in some domains. (1/2) · 11 months ago
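The reply names the requirement (a large, fault-tolerant distributed key-value dictionary) but not a design. One common building block for spreading such a dictionary over many nodes is consistent hashing; the following is a hypothetical sketch of that technique only, not anything the search engine is said to use. `HashRing`, `nodesFor`, and the node names are made up for illustration, and MD5 is used purely to place keys on the ring, not for security.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical sketch: consistent hashing to decide which nodes own a
 * dictionary key, returning a primary plus replicas for fault tolerance.
 */
public class HashRing {
    // Ring position -> node name. Each node occupies several virtual positions
    // so keys spread evenly and a departing node's keys scatter across the rest.
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    public HashRing(List<String> nodes, int virtualNodes) {
        this.virtualNodes = virtualNodes;
        for (String node : nodes) addNode(node);
    }

    public void addNode(String node) {
        for (int v = 0; v < virtualNodes; v++) ring.put(hash(node + "#" + v), node);
    }

    public void removeNode(String node) {
        // remove(key, value) only drops positions that still belong to this node.
        for (int v = 0; v < virtualNodes; v++) ring.remove(hash(node + "#" + v), node);
    }

    /** Up to n distinct nodes, walking clockwise from the key's ring position. */
    public List<String> nodesFor(String key, int n) {
        List<String> owners = new ArrayList<>();
        long h = hash(key);
        // First walk from the key's position to the end of the ring...
        for (String node : ring.tailMap(h, true).values()) {
            if (owners.size() == n) return owners;
            if (!owners.contains(node)) owners.add(node);
        }
        // ...then wrap around to the start.
        for (String node : ring.headMap(h, false).values()) {
            if (owners.size() == n) return owners;
            if (!owners.contains(node)) owners.add(node);
        }
        return owners;
    }

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xffL); // first 8 digest bytes
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is required on every Java platform
        }
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing(List.of("node-a", "node-b", "node-c"), 64);
        System.out.println(ring.nodesFor("marginalia", 2)); // primary + one replica
        ring.removeNode("node-b");
        System.out.println(ring.nodesFor("marginalia", 2)); // owners after node-b leaves
    }
}
```

When a node joins or leaves, only the keys adjacent to its ring positions move, which is why this scheme is a usual starting point; it says nothing about the harder parts the reply alludes to, such as keeping lookups as fast as a tightly tuned single node.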
I'm pretty impressed with your results; I've found plenty of interesting things. Will the current design lend itself to being distributed, e.g. allowing others to run crawling/index nodes to contribute to the service? · 11 months ago