I devised a fast compression scheme for my search engine dictionary which reduces its size to a third while still allowing O(1) lookups. I also had to implement my own hashmap, because anything available was too generalized (and therefore wasted too much memory). A byte is a gigabyte when your dictionary has a billion entries. A Java object header is 8 bytes.
3 years ago · kevinsan, defunct, lykso, ttocsneb, skyjake
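To make the memory point concrete, here is a minimal sketch of the kind of primitive-array hashmap described above. It is an illustration, not the actual implementation: the class name, the EMPTY sentinel, and the omission of resizing are all my own simplifications. The point is that each entry costs a flat 16 bytes in two long arrays, with no per-entry object headers at all.

```
import java.util.Arrays;

// Sketch: open-addressing map from long keys to long values, backed by
// primitive arrays so entries carry no per-object header overhead.
public class CompactLongMap {
    // Sentinel for an empty slot; assumes this key value never occurs in real data.
    private static final long EMPTY = Long.MIN_VALUE;

    private final long[] keys;
    private final long[] values;
    private final int mask;

    public CompactLongMap(int capacityPow2) {
        keys = new long[capacityPow2];
        values = new long[capacityPow2];
        mask = capacityPow2 - 1;          // capacity must be a power of two
        Arrays.fill(keys, EMPTY);
    }

    // Resizing and load-factor checks are omitted for brevity; the table
    // must be sized generously up front, as a dictionary-backed map would be.
    public void put(long key, long value) {
        int slot = slotOf(key);
        keys[slot] = key;
        values[slot] = value;
    }

    // Returns EMPTY if the key is absent.
    public long get(long key) {
        int slot = slotOf(key);
        return keys[slot] == key ? values[slot] : EMPTY;
    }

    // Linear probing: expected O(1) lookups while the table stays sparse enough.
    private int slotOf(long key) {
        int slot = (int) (mix(key) & mask);
        while (keys[slot] != EMPTY && keys[slot] != key) {
            slot = (slot + 1) & mask;
        }
        return slot;
    }

    // Cheap 64-bit mixer to spread the hash bits before masking.
    private static long mix(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        return k;
    }
}
```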
As for distribution, I think it requires a pretty large scale before it's actually helpful. A lot of why it's fast and good now is specifically because everything has been tuned very tightly. It would probably need several dozen nodes doing work before it even breaks even with my single node. What you'd basically need to build is a very fast, extraordinarily large, and highly fault-tolerant distributed key-value dictionary. That's rough. (2/2) · 3 years ago
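As a rough illustration of why a few nodes don't break even: routing a lookup to the node that owns a key turns every dictionary access into a network round-trip. The sketch below is hypothetical; PartitionedDictionary and RemoteDictionary are illustrative names, not anything that exists in the project.

```
import java.util.List;

// Hypothetical sketch: route a term lookup to one of N index nodes by key hash.
public class PartitionedDictionary {
    private final List<RemoteDictionary> nodes; // one client stub per index node

    public PartitionedDictionary(List<RemoteDictionary> nodes) {
        this.nodes = nodes;
    }

    public long get(long termHash) {
        // Every lookup now crosses the network, which is why a handful of
        // nodes won't beat a tightly tuned single machine.
        int owner = (int) Long.remainderUnsigned(termHash, nodes.size());
        return nodes.get(owner).get(termHash);
    }

    // Stand-in for whatever RPC client would talk to a remote index node.
    public interface RemoteDictionary {
        long get(long termHash);
    }
}
```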
@kevinsan Yeah, I've had a bunch of revelations over the last few weeks that have just made it a hundred times better. You can actually find relevant stuff with it now, at least in some domains. (1/2) · 3 years ago
I'm pretty impressed with your results; I've found plenty of interesting things. Will the current design lend itself to being distributed, e.g. allowing others to run crawling/index nodes to contribute to the service? · 3 years ago