How to select the most appropriate DB for your project?

Author: udanisaloni

Score: 5

Comments: 10

Date: 2021-11-30 09:08:31

________________________________________________________________________________

toast0 wrote at 2021-12-01 00:58:00:

Are you using BEAM and does key-value work for your data? Use mnesia with disc_copies (side question: does your data fit in memory? The answer must be yes, but you can put a lot of memory in a machine and you can partition your data)

Otherwise, use Postgres or MySQL depending on what your team prefers or has experience with.

Done.

uniqueuid wrote at 2021-11-30 09:27:10:

Personal experience tells me to go by the following rules

- Pick what you know

- Pick what's boring (i.e. settled technology)

- Pick what others know and understand

- Don't be afraid to change schemata or DBs - in fact, get comfortable with the idea from the beginning. This prevents mental lock-in

- You don't need a specialized database (i.e. graph, time-series). If you need one, you already know it.

- Always, always have backups and test them.

And some more subjective takes:

- Sharding is easier than distributed DBs. If you can design your data flow to be largely unidirectional and compartmentalized (i.e. user profiles etc.), you benefit massively

- Multi-Master replication is often a huge cost in terms of complexity and risk. Avoid it wherever you can. If you can't, get someone who has battle scars to help you.

bjourne wrote at 2021-11-30 11:23:10:

You use sqlite until you realize that it is unsuitable for your project. That realization may never come and so you have not wasted any time pondering over what database to use. :)

sharps_xp wrote at 2021-12-01 01:30:10:

doesnt it get unsuitable really fast if you have to support concurrent write requests

gtsteve wrote at 2021-11-30 11:18:03:

There are many criteria surrounding performance etc. When it comes to cloud computing which is where I tend to do all my work these days, my key ones are:

1. Portability - Ensure that the database in use can be installed on a developer workstation or off-cloud in an on-prem production environment. Amazon's DynamoDB looks amazing and it's a shame I'll never use it.

2. Compatibility - If you can't get 100% portability then at least ensure the protocol is identical to something you could use elsewhere. This is why I compromised on using Amazon Aurora for example, because it's protocol compatible with MySQL.

3. Maintenance - if you pick a relatively new database technology then you will need to run your own servers. You will need to figure out backups and test them. If you use something boring like MySQL or Postgres, you can just host this in a managed service with everything you need to use it in production. You pay more, but you spend less time, which is the only thing you cannot borrow or buy.

komon wrote at 2021-11-30 11:31:13:

FWIW, DynamoDB (at least, an API-co patible, sqlite-backed stand in) can be installed to developer machines.

But your point about on-prem stands

gtsteve wrote at 2021-11-30 13:27:51:

Yes, the developer copy is nice. It's a shame you can't get that with other AWS services, although LocalStack (not personally tried) does look nice.

I am tempted to use DynamoDB from time to time but I've won a few jobs with our ability to deploy to a customer environment, even if they don't always end up doing that. Most of our competitors won't even entertain the idea, so it's a great box to be able to tick.

Note: My experience is in B2B. This isn't such a concern for B2C at least not today. Nonetheless, I would still not want to tie myself to AWS too tightly for a B2C app. Amazon is great today. Tomorrow, who knows? After all, I remember the days when nobody got fired for choosing IBM. I'm not so sure that's true now.

komon wrote at 2021-11-30 14:34:50:

DynamoDB is an interesting beast. It's hard for me to recommend it for most use cases anyway.

It was designed to power experiences that look like shopping carts: relatively short-lived, mostly append-only, needing a flattish performance curve as the number of items and users go up, with a really natural partition key.

Using it as the primary storage of your app means you play the part of query planner a lot of the time.

If your data is especially graph-structured you might end up doing so-called "single table design" where all objects end up in the same table that has its keys constructed carefully (compound keys formed by string concatting), And has secondary sparse indices to power other access patterns.

And definitely plan on overfetching as an _optimization_ to prevent many small queries to dynamo, especially in a GraphQL context where we have no idea what fields are being requested at fetch time.

bnchrch wrote at 2021-11-30 14:10:39:

If it’s a greenfield/saas/crud project? 99% of the time choose Postgres. It will hit nearly any use case.

You can always refactor if the project is a huge success and could benefit from a more obscure db.

SirChainsaw wrote at 2021-11-30 10:01:04:

If your data contains entities or data points that are related to each other somehow....use an RDBMS. Surprisingly uncommon.