Against the Flood [2021-09-19]

So Hacker News apparently discovered my search engine, and really took a liking to the idea. Actually that's a bit of an understatement: the thread has gotten 3.3k points and lingered on the front page for half a week. I wasn't planning for it to go quite that public yet. It has quietly been online for a while, but it was only very recently that it started to feel like it was really coming together. It wasn't perfect; it still had a lot of jankiness and limitations that could have been fixed with more time. The index was half the size it should have been. But someone discovered it and shared it. It took off like a rocket, and I'm still at a loss for words at the reception it's gotten. I have received so many encouraging comments, emails and offers of collaboration, and a few people have even joined the Patreon. I've been working through all the messages and I aim to reply to them all, but it takes time. I'm very grateful for all of this, since I half thought I was alone in this.

In building this, I had a hunch I was the next TempleOS-guy, quietly building something ambitious the world just wouldn't be able to relate to. Turns out that's just not the case at all.

But rewinding a bit, to last Thursday when this all began: I looked at a log and noticed I got more searches than usual. It quickly turned into a lot more searches. The logs just kept scrolling at a dizzying rate as I was tailing them. I didn't know it then, but the server was getting about 2 search queries per second, a sustained load that lasted most of the night. The server withstood the barrage without going down, without even feeling slow.

To be perfectly clear, I have just one server, and it's a single computer. It is not a 42U tower like what you see on /r/homelab, but simple consumer hardware. The motherboard is a kinda shitty mATX board, the CPU is a Ryzen 3900X, and it has 128 GB of RAM but no ECC. Stick a high-end GPU in it and it would basically be a gaming PC with a silly amount of RAM and a weird disk configuration. The modest little cube sits quietly humming in my living room next to a UPS I got a few weeks ago because of all the thunderstorms and outages this summer.

My home network flows through a cheap 100 Mbit router I've had since 2006; I purchased it when I first moved to my own apartment. I really think this is the craziest part of the whole story. If anything were to just keel over and die at managing tens of HTTP requests per second, it would be that piece of IBM-beige antiquity (actually, looking at the backside reveals that it was once grayish-white, but sitting in the sun for 15 years does things to plastic).

I had done some performance testing, and knew the search engine ought to hold up to a decent search pressure. But you don't really know if the ship floats until it's in the water, and here it suddenly found itself on an unexpected maiden voyage across a stormy ocean. There are a lot of moving parts in software this complex, and only one of them needs to scale poorly to bring it all down. But apparently none did. In fact, due to how memory mapping interacts with disk caching, it searches faster now than it did before.
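For the curious, here is a rough sketch of what I mean by memory mapping interacting with disk caching. It is not the search engine's actual code; the file layout (fixed-width 8 byte records) and the names build_demo_index, read_record and timed_lookups are made up for illustration. The idea is that a memory-mapped file is read through the operating system's page cache, so the more often a region is touched, the more likely it is already in RAM when the next query comes along.

```python
import mmap
import os
import tempfile
import time


def build_demo_index(path, count, record_size=8):
    """Write `count` fixed-width records to a file (hypothetical layout)."""
    with open(path, "wb") as f:
        for i in range(count):
            f.write((i * 3).to_bytes(record_size, "little"))


def read_record(mapped, index, record_size=8):
    """Read one record straight out of the mapping, without an explicit read()."""
    start = index * record_size
    return int.from_bytes(mapped[start:start + record_size], "little")


def timed_lookups(path, indices):
    """Map the file read-only, look up the given records, return (values, seconds)."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            started = time.perf_counter()
            values = [read_record(mapped, i) for i in indices]
            return values, time.perf_counter() - started


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.idx")
        build_demo_index(path, 1_000_000)
        wanted = list(range(0, 1_000_000, 997))
        # Pages touched by the first pass tend to stay in the OS page cache,
        # so a second pass over the same records is usually served from RAM.
        _, first = timed_lookups(path, wanted)
        _, second = timed_lookups(path, wanted)
        print(f"first pass: {first:.6f}s, second pass: {second:.6f}s")
```

On a freshly written file like this one the pages are probably cached already, so don't expect a dramatic difference between the two passes. The effect shows up with an index much bigger than what has recently been read, where heavy traffic keeps the popular parts warm.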

How is this even possible?

I'm too well-acquainted with survivorship bias to pretend I know exactly what the secret sauce is. But I can offer some guesses:

The Future

I'm still processing all of this. It's extremely encouraging how many people seem to like the idea. The project is in its infancy, and I have many ideas for improvements. There are also things that need to be tested to see if they work. It's probably going to be a pretty bumpy road, but I'm extremely grateful that I have people with me.

Below are the things I'm working toward right now.

Short term

Long term

Pictures

My Server

The Pi-cluster

Links

HN Thread

My Search Engine

Reach me at kontakt@marginalia.nu