There are a few ways to scale databases.
The simplest way to scale is to throw more resources at the underlying machine that hosts the db (vertical scaling). On AWS RDS, that means increasing the vCPU count, memory, and volume size.
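Just as a rough sketch (the instance name and target class below are made up), bumping an RDS instance's size from code looks something like this with boto3:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.modify_db_instance(
    DBInstanceIdentifier="comments-db",   # hypothetical instance name
    DBInstanceClass="db.r6g.2xlarge",     # bigger instance class = more vCPU/memory
    AllocatedStorage=500,                 # grow the volume (GiB)
    ApplyImmediately=True,                # otherwise it waits for the maintenance window
)
```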
With relational databases that guarantee ACID, the bottleneck for scaling is usually writes, since writing requires locking (or similar coordination) to keep transactions consistent.
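Here's a toy example of why writes are the hard part: a row-level lock is held until the transaction commits, so anyone else writing to the same row has to wait. (Postgres via psycopg2; the table and credentials are made up.)

```python
import psycopg2

# Hypothetical DSN and table; the point is that the locked row blocks
# other writers until this transaction commits.
conn = psycopg2.connect("dbname=app user=app password=secret host=db.example.com")

with conn:                       # commits on success, rolls back on exception
    with conn.cursor() as cur:
        # SELECT ... FOR UPDATE takes a row-level lock; other transactions
        # trying to update this row wait until we commit.
        cur.execute("SELECT balance FROM accounts WHERE id = %s FOR UPDATE", (42,))
        (balance,) = cur.fetchone()
        cur.execute("UPDATE accounts SET balance = %s WHERE id = %s", (balance - 10, 42))

conn.close()
```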
In this case, you can typically split up the single writer from multiple readers. When someone writes to the writer database, the data is automatically replicated to the readers. There's usually a bit of replication lag, typically a few milliseconds.
Readers are extremely easy to scale. You just add another "computer". When someone reads from your collective database, they get routed to one of the readers. There is tooling that does this for you automatically. Like web load balancing, but for databases.
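A rough sketch of what the read/write split looks like from the application side (the endpoints and credentials are made up; in practice a proxy or your ORM usually handles the routing for you):

```python
import random
import psycopg2

# Hypothetical endpoints; with RDS you get one primary endpoint and one
# endpoint per read replica (Aurora also offers a single load-balanced
# reader endpoint).
PRIMARY = "primary.example.com"
REPLICAS = ["replica-1.example.com", "replica-2.example.com"]

def connect(readonly: bool = False):
    """Route reads to a randomly chosen replica, writes to the primary."""
    host = random.choice(REPLICAS) if readonly else PRIMARY
    return psycopg2.connect(host=host, dbname="app", user="app", password="secret")

# Writes go to the single writer...
with connect() as conn, conn.cursor() as cur:
    cur.execute("INSERT INTO comments (user_id, body) VALUES (%s, %s)", (7, "hello"))

# ...reads can hit any replica (and may lag the writer by a few milliseconds).
with connect(readonly=True) as conn, conn.cursor() as cur:
    cur.execute("SELECT body FROM comments WHERE user_id = %s", (7,))
    rows = cur.fetchall()
```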
Scaling up "writers" is more difficult. One approach is called sharding. Essentially it's partitioning data across distinct databases. If you generate a unique identifier for each comment based on some property (maybe user ID), even numbers could go to database A while odd numbers go to database B. In reality it gets a bit more complicated, but that's the gist.
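Sketching that even/odd example in code (the DSNs are made up; real setups usually use consistent hashing or a lookup service rather than a plain modulo):

```python
import psycopg2

# Hypothetical shard map: two databases, keyed by user_id % 2
# (the "even numbers go to A, odd to B" scheme from above).
SHARDS = {
    0: "dbname=comments_a host=shard-a.example.com user=app password=secret",
    1: "dbname=comments_b host=shard-b.example.com user=app password=secret",
}

def shard_for(user_id: int) -> str:
    """Pick the shard DSN that owns this user."""
    return SHARDS[user_id % len(SHARDS)]

def insert_comment(user_id: int, body: str) -> None:
    # Connect to the owning shard and write there.
    with psycopg2.connect(shard_for(user_id)) as conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO comments (user_id, body) VALUES (%s, %s)",
            (user_id, body),
        )

insert_comment(42, "even user IDs land on shard A")
insert_comment(7, "odd user IDs land on shard B")
```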
Before scaling up the databases further, you may want to improve the cache layer. A typical cache is an in-memory key-value store, so lookups are much faster than hitting the database. Solutions like Redis have built-in sharding for this. Caches are a lot easier to scale than a database, especially when you don't care about referential integrity. The downside is that random-access storage (memory) is more expensive than sequential storage (disk).
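A rough sketch of the usual cache-aside pattern with Redis (hostnames, table, and TTL are made up):

```python
import json
import redis
import psycopg2

# Hypothetical connections; the cache sits in front of the database.
cache = redis.Redis(host="cache.example.com", port=6379)
db = psycopg2.connect("dbname=app host=db.example.com user=app password=secret")
db.autocommit = True   # reads only here, so autocommit keeps things simple

def get_comment(comment_id: int) -> dict:
    """Cache-aside read: try Redis first, fall back to the database."""
    key = f"comment:{comment_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)

    with db.cursor() as cur:
        cur.execute("SELECT user_id, body FROM comments WHERE id = %s", (comment_id,))
        user_id, body = cur.fetchone()

    comment = {"user_id": user_id, "body": body}
    cache.setex(key, 300, json.dumps(comment))   # expire after 5 minutes
    return comment
```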