Databases shouldn't be exciting. They should _just work_.
The main reason webcrap works is because the databases underneath work. If the Javascript crowd had to manage data storage, nothing persistent would work reliably.
It is often downvoted to talk about the "javascript crowd" on HN, but I've worked in the software industry long enough to say, unequivocally and with certainty, that it totally lives up to the cliches.
It never gets old to scold people who are not serious, disciplined and principled in endeavors that affect the rest of the world. The Javascript community absolutely deserves the avalanche of criticism it faces.
It is the reason why the modern web sucks, and frankly, I wish the folks at Postgres would develop a new web standard, including browsers, that is not governed by corporations and can be implemented in a couple of weeks. Not trivializing what goes into building browsers, but making the whole standard small enough that anyone could build a browser during a sabbatical.
I'm seeing this in action now; the difference between the backend code and the frontend code is stark, and the backend code isn't up to my standard either (and my standards aren't that high, I know I'm merely competent by the standards of many places).
It just seems to be the nature of the web side of things, lots of sloppiness.
The devs are smart enough, they just don't _think_ and it shows.
That or they use something but clearly don't understand the thing they are using.
I replaced nearly 500 lines of code with ~60 by taking a bunch of massive conditionals, re-ordering them, then extracting them into methods and returning early on the simple case.
This isn't voodoo; you have to work harder and remember more to do it the _wrong_ way.
Also comments: they don't seem to understand that you document _why_ you did the thing, not what you did. What you did is in the code; I don't trust your comments to tell me what you did, I want to know _why_ you did it.
That and general documentation (cross-link stuff, link at the top of the class to the documentation in the wiki, update the documentation when you get a ticket that changes the behaviour, write the high-level flow in plain English a 10-year-old could understand).
_rant over_.
I actually like my job, the people are lovely and the work is interesting; it's just frustrating when an ounce of thought saves a pound of cure.
Hello fellow "merely competent" engineer. I recommend we both remove "merely" from our title. Inexperience grows from the bottom, I've gotten "better" by simply being too stubborn to get worse. Keep up the mediocre work!
> Inexperience grows from the bottom, I've gotten "better" by simply being too stubborn to get worse.
That is quite possibly one of the best quotes I have heard on how to approach discipline in engineering.
I don't understand, can you please explain?
> Inexperience grows from the bottom
In what way?
> I've gotten "better" by simply being too stubborn to get worse
How does one get better by wanting to get worse?
This makes no sense to me.
Should I assume from your comment that you never saw code worth improving other than code written by JavaScript developers?
Ah, pointless exclusion of the middle.
No.
That's quite the perspective.
It's a big enough world where you don't have to interact with every technology that exists, but this is one of the more prevalent ones, and you might find it useful to make peace with that community.
Anyway, good luck out there.
It's a stereotype, and like a lot of stereotypes, it has a little bit of statistical truth, but I don't think it does justice for the excellent front-end engineers that do exist and whom I've worked with personally.
That said, there is some truth in it. The thing about the back end is that there's no satisfaction in it other than the satisfaction of doing it well and earning the esteem of peers, so most back end developers are paying attention to the quality of the code they write. The front end has other attractions as well, and many front-end developers start out focused on those attractions, and learn by experience (if they learn at all) that writing high quality code is related to the cool stuff that got them into programming in the first place.
I didn't mean to be divisive. Backend engineers need good front-end partners to be successful. Front-end engineers depend on the backend folk not to fuck shit up, which happens time and again.
I meant that the developer community, the folks that are at the helm of developing Javascript frameworks (gahhhh!) and other shenanigans, the entire browser based ecosystem just reeks of incompetence, lack of discipline and rigor.
Frontend and Backend engineers should get together and build a better system that serves both well. That's what I meant by asking Postgres core devs to build front-end tools, something like Gemini [1].
[1]
https://gemini.circumlunar.space/
Sorry, but I can't agree.
Yes, they are capable of making great software, but it's exactly what gp said... they don't _think_. I've seen so much code which looks nice from the outside, and is all structured. But if you start reading, it's riddled with wtfs and questions about why things weren't done in a more simple, concise and stable way.
I agree. We've also spent the last 10 years trying to make web apps behave like desktop apps, and they still don't match.
Typical bitter backend developer spew. Works with Java or C# and hasn't explored much more than OO and procedural. Threatened by anything new, creative or explorative. Stuck in his ways and prefers to do mundane, uncreative and repetitive tasks, like writing boilerplate to get stuff in and out of a database. Soon to be automated away by serverless.
Good luck running serverless code in a lightbulb.
https://azure.microsoft.com/en-us/services/iot-edge
Ooh, permanent lock-in to a single company's billed-by-the-second mainframe! How exciting!
I see this criticism a lot for serverless but I don't see it in reality. Serverless is still just running code; the only part that wouldn't be portable is the entrypoint data structure, which amounts to practically nothing.
Please explain further: what negative characteristics do you apply to the _"js crowd"_ as a whole?
As a developer who only develops in JavaScript, I feel personally attacked by your post, so I would appreciate if your attack could be more specific to the point of being constructive or at least informative.
> Databases shouldn't be exciting. They should _just work_.
With enough experience with the alternative, things that just work _are_ exciting.
Amen.
PG steamrolls every other general purpose database, and has tons of extensions, and frontends so you can use it in any database paradigm.
With 30 minutes of time, you can turn it into a document db, log/time series db, columnar, graph, key value, whatever.
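For the document-db case, a minimal sketch using the built-in jsonb type (table and column names invented for illustration):
```
-- Postgres as a document store via jsonb:
CREATE TABLE events (
    id  bigserial PRIMARY KEY,
    doc jsonb NOT NULL
);

-- A GIN index makes containment queries on the documents fast.
CREATE INDEX events_doc_idx ON events USING gin (doc);

INSERT INTO events (doc) VALUES ('{"type": "signup", "user": "alice"}');

-- Find every document containing a given key/value pair.
SELECT doc FROM events WHERE doc @> '{"type": "signup"}';
```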
I bet we'd both seriously consider (if not prefer) sqlite over postgres in a bunch of scenarios?
I love PG and sometimes I see some cases where maybe Sqlite would work "better" but just as soon as it starts to get a little bigger I start missing all the cool stuff that PG has.
Sqlite is still awesome but PG is more awesomer. And since the cost is the same I just stick with PG now. Unless I'm resource constrained (which for my work is rare)
sqlite is a wonderful, amazing file manipulation API. If single-threaded file manipulation is what you need, it's great. But it is not an application backend in the way PG can be.
SQLite is amazing, and "lite" is really not fair. I've seen it handle 1 TB+ with at least decent performance.
It's not a replacement for a real DB but it is damn close.
Actually, as I understand it (and a couple of minutes of Googling doesn't back up my case), the "lite" on the end is pronounced "ite" and refers more to SQL as a geology-like term. In other words, the SQL is a rock.
Even if that was what the name was intended to be, the vast majority of people pronounce it as SQL-lite. (In fact, this is the first I've heard of someone pronouncing it that way.)
The documentation seems to suggest that it is in fact pronounced "SQL-lite":
https://www.sqlite.org/fullsql.html
> Don't be misled by the "Lite" in the name.
It's not, because it is a real DB. It's just one for single user access only.
> Think of SQLite not as a replacement for Oracle but as a replacement for fopen().
Well sure, it's better thought of as a replacement for fopen() than as one for Oracle. (Unless you were thinking of buying Oracle in order to use it on your desktop.) An even better comparison would be with MS Access, for example.
A lot of people imagine databases that way, but they are complex beasts with a lot of features. Everyone seems to say features don't matter, except of course for the ones that _they_ use. Which is often a growing set as your application matures.
All of these features are exciting to the people that can use them.
> If the Javascript crowd had to manage data storage, nothing persistent would work reliably.
There are different incentives to writing UIs than there are when you're writing a backend. Often a business needs you to iterate very quickly, so people change their practices to suit this. Some are able to get away with never raising the quality bar once they finish iterating and it's unfortunate how the industry allows this. Additionally, there are, of course, teams and companies that want high quality code, but even here quality is defined differently from how it would be for data storage logic. For instance, it often means "high framerate" and "easy to understand", instead of "optimal" and "robust".
There are fewer JavaScript engineers writing backend code, and most who do are generally making simplistic CRUD Node.js apps. However, there are companies and people that specialise in high-quality complex Node.js apps, and they try just as hard as backend engineers from other programming languages to do this well. (I'm thinking of projects made by developers on the core Node team. I've seen good backend code there, perhaps because they're more aware of lower-level concerns.)
I personally write a lot of JavaScript/TypeScript code and I try to do a good job, but a "good job" is defined differently depending on the kind of product you're creating.
Edit: I just realised you are _the John Nagle_, so you likely have a more refined taste in certain kinds of programming than I do, however I think my point about incentives probably still stands.
> Databases shouldn't be exciting. They should _just work_.
A complicated thing which just works _is_ exciting.
One man's exciting is another's snorefest. A lot of people would probably laugh at the suggestion that _anything_ involving computers could be exciting.
It is unproductive to talk about subjective opinions of excitement.
The GP was referring to "Databases shouldn't be exciting" - not in the traditional sense of the word, but a metaphor that hinges on "excitement" being new features, frivolous decorations, unnecessary tangents, losing vision of what a database should do, frantic patching and heated bikeshedding discussions.
It wasn't meant as "Ooooo I am excited about a database".
I think you're restating my point.
> If the Javascript crowd had to manage data storage, nothing persistent would work reliably.
Wait until this guy learns what the JS in "json" column types means, and how the largest webcrap sites rely on them for persistence.
Note that 'Animats didn't comment on JS as a technology, he commented on the _crowd_.
In 2017 Bryan Cantrill gave a wonderful talk titled "Platform as a Reflection of Values: Joyent, Node.js and Beyond"
In it, he talks about how even though they liked JS as a technology, they came to realize that the values of the JS/Node.js community differed from their values at Joyent.
In the talk (at around 19:30), he identifies that in 2014 Joyent's core values wrt node.js were "debuggability", "robustness", and "stability"; while the core values of the broader node.js community were "approachability", "expressiveness", and "velocity". And there's nothing wrong with either of those sets values, but they're not the same!
With that lens, I'd characterize the parent comment as contrasting the PostgreSQL team's core value of "robustness" with the Javascript crowd's core value of "velocity". That's a distinction in _values_, not in _technology_.
Json has nothing to do with the persistence part.
If you need to persistently store frontend state in the backend, would it not be easier to use the native frontend data format?
It's not entirely about persistence... it's about the ease and reliability of the persistence.
My point is that the JSON isn't being delivered via a js db but instead translated from data stored in a database written in C. So the "what about json" gotcha isn't as good as it was made to seem.
Json is just a format. There are trillions of formats that work just as well as json.
This was a really good article, and thank you @jerodsanto for submitting it. I learned probably 6-7 _big_ things about Postgres reading it that I didn't know before.
I won't list out those things, because, I'm sure it wouldn't be relevant to most people reading this comment. :). But, there are a couple things I noticed that I _do_ want to mention:
There were these edge cases in there, where it's kind of like less safe. And Postgres was there, and it had a reputation of being just rock-solid, consistent, and more strict with your data
"... I never had that moment where I'm like "Oh, Postgres, you screwed me over."
This is a huge deal, and, IMO, _the_ reason to choose Postgres over MySQL. MySQL has some _horrendous_ defaults. You can make things a little better, but, AFAIK, you can't even get to where Postgres starts out of the box. Most of that consists of MySQL being extremely permissive with what it allows you to do in your SQL. Many of those things are non-standard, and some are quite unsafe. Postgres not only prioritizes safety, it keeps the application developer honest.
It can be a little complain-y, but I'd rather have my database complain than do stuff that could lose data. One great example is when you try to stuff more characters into a text field than it can accommodate. By default, MySQL just truncates the string silently; Postgres just doesn't allow it.
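To make that concrete, a rough sketch (table name invented; exact messages vary by version, and newer MySQL enables strict mode by default, as comments below note):
```
CREATE TABLE t (name varchar(5));

INSERT INTO t (name) VALUES ('abcdefgh');
-- Postgres: ERROR: value too long for type character varying(5)
-- MySQL with the old non-strict sql_mode: silently stores 'abcde'
-- and only emits a warning you have to ask for (SHOW WARNINGS).
```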
PostGIS is a huge one. It's a whole geospatial database.
This is the other point I want to call out. I've had the pleasure of working with PostGIS before. The main thing I'd want to say about PostGIS is that, although there are issues working with geojson and GIS in general, PostGIS doesn't exacerbate any of them. Generally speaking, PostGIS is about as pleasant to work with as a geospatial database can be.
> Most of that consists of MySQL being extremely permissive with what it allows you to do in your SQL. Many of those things are non-standard, and some are quite unsafe. Postgres not only prioritizes safety, it keeps the application developer honest.
This is incredibly important: if you use MySQL, unless you are extremely pedantic about using safe defaults and a safe-by-default ORM you will almost certainly end up in the position where migrating will be hard because you are going to need to rigorously test & remediate all of the areas where your code was relying on MySQL allowing erroneous code to run without errors and nobody ever looked at the warnings.
I've even seen a few cases where people had data loss which had either gone unnoticed or been sporadic enough that they assumed it was user error or random corruption.
> I've even seen a few cases where people had data loss which had either gone unnoticed or been sporadic enough that they assumed it was user error or random corruption.
I've gotten to troubleshoot this specific condition myself. Amusingly, the big thing that saved us was that we could use lingering redundancy in a poorly normalized schema to rebuild some relationships that should've been enforced by the database.
> I've even seen a few cases where people had data loss which had either gone unnoticed
Maybe under MyISAM, but InnoDB has been the default storage engine since 2010.
Under InnoDB, because we're not talking about file corruption but cases where MySQL would silently and irreversibly discard data rather than throwing an error. For example, inserting a value which is too large for the target data type - it truncates the value, so that value is lost unless you have another copy or way to reconstruct it.
People always criticize MySQL, and when asked what the problem is, they always mention the same thing about column length, which is just a default that can easily be changed and hasn't even been the default for years.
I think you are getting lost in the details. Databases are inherently complex and there are sharp edges as well as tricks you need to learn for each one. Beyond that row stores like MySQL and PostgreSQL are essentially interchangeable. The biggest technical success factor is whether you understand the DBMS well enough to use it effectively. The biggest business success factor is licensing: MySQL is GPLv2 and PostgreSQL is BSD.
Specific features are somewhat interesting but at this point not many teams are starting from scratch on MySQL or PostgreSQL. It's generally the case you already have experience on one or the other or both.
Your example is outdated, MySQL doesn't truncate by default in later versions. Defaults can easily be changed. Choosing one database over another because of defaults is not a good strategy.
The point still stands, however. Here's an incomplete list of everything you have to worry about with a MySQL database which you don't need to worry about with most other databases:
https://dev.mysql.com/doc/refman/8.0/en/sql-mode.html#sql-mo...
(I believe there are also still issues with Unicode, too)
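For reference, a sketch of opting in to stricter behavior on an older server, plus the usual Unicode advice (flag list abbreviated; see the sql-mode docs linked above):
```
-- Enable strict/safe modes explicitly rather than relying on defaults:
SET GLOBAL sql_mode = 'STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_ZERO_DATE,NO_ZERO_IN_DATE';

-- For Unicode, prefer utf8mb4 over MySQL's 3-byte "utf8" (utf8mb3),
-- which cannot store characters outside the Basic Multilingual Plane:
CREATE TABLE messages (body TEXT)
    CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```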
I ran into a fun one a few weeks back: code which had been running for years failed on an AWS Aurora cluster because they'd defined a field as BIGINT and were inserting UUID_SHORT() values into it. For some reason, Aurora always generates values with the first bit set and so they learned the hard way that that field needed to be declared as unsigned.
Yes, the defaults can be changed but many, many people do not know this and will only learn why they need to after they have something blow up painfully. Often, even people who know this in theory will forget about it at some point when setting up a new one. It's the same reason why the industry is trying to get away from services like MongoDB / ELK listening on 0.0.0.0 with no password or having a default password which can easily be scanned. For something as widely installed as MySQL, even a 10% chance of oversights will mean a LOT of collateral damage.
Fine, my example is outdated. That's irrelevant. I could probably find 3 more examples in the current default config for MySQL that would be as bad or worse.
As for the rest of your comment, sensible defaults most certainly _are_ a valid reason to choose one piece of software over another. Bad default configs are equivalent to what the Dungeons & Dragons community calls "newbie traps." They're things that either don't look important, or don't look like they'll hurt you, but really are important, or can hurt you, if you're not extremely careful. What's more, some of these options are probably not going to be backward compatible with your code. So, you get into this situation where MySQL is doing shitty things you don't want it to do, and, when you find out later that it's doing those things, you either can't do much about it, or doing something about it involves major surgery to your application. That's bad.
So, yes, you can configure MySQL to have better than default behavior. That's no surprise; that's literally the point of having configuration options, after all. The problem is that some of these options don't look too bad. There is something to be said for having a permissive configuration. But, for the reasons I spelled out in the previous paragraph, that shouldn't be the _default_ configuration. It's not too much to ask that you should be able to install and configure a database and not have it lose or corrupt your data.
> I'd rather have my database complain than do stuff that could lose data. One great example is when you try to stuff more characters into a text field than it can accommodate. By default, MySQL just truncates the string silently; Postgres just doesn't allow it.
That's your opinion, but the truth is not necessarily black and white.
Postgres will probably lose more data in your case when the application developer doesn't handle the exception for an over-long text field, and loses the whole row. (If you're an application developer, you should audit your code now.)
A well-written application using Postgres would have to truncate the data before insert, achieving the same result as MySQL's auto-truncation, but with more initial and ongoing maintenance effort in the case of Postgres.
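Something like this hedged sketch (names invented), which at least makes the clipping visible in the code:
```
-- Truncate explicitly on insert instead of relying on the database:
INSERT INTO users (bio) VALUES (left($1, 200));
```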
And the existence proof that MySQL's behavior is ok is that MySQL is the most widely used relational database, and powers most Internet sites.
Source: DBA.
Here's the deal: I am not presenting my opinion as objective fact, nor do I actually care that you're a DBA. If anything, the fact that you are a DBA makes your opinion _less_ relevant to me as an application developer. Simply put, you do not share my pain, and the people who have upvoted me, while your comment is currently gray as I write this, have.
The example was intended to show a scenario in which MySQL _silently_ loses data. There are certainly situations in which this sort of data loss is acceptable. I've been involved in scenarios where it was not acceptable, I was surprised by the behavior, and I ended up with a bunch of missing data.
The problem here is that I was _surprised_ by my database.
That should literally _never_ happen. The correct behavior in this specific case is for the database driver to raise an exception or warning (exactly which should probably be a configurable option, defaulting to an exception). That way, if an exception gets raised in my application, at the very minimum, I have a traceback in my logs to _tell_ me something went wrong, which allows me to fix the issue sooner rather than later.
As for your existence proof that "MySQL's behavior is ok," the fact that MySQL has a bigger installed base literally proves nothing. I guarantee you there are folks out there who are experiencing such behavior and being surprised by it, just like I was. There are people out there for whom this specific behavior is not a problem. And, there are people out there who are length checking every bit of text they try to stuff into a VARCHAR(n) or CHAR(n) field using MySQL, because they started with MySQL and they're basically stuck with it, unless they want to do major surgery on their app.
Apart from being the most mature relational database, I would point to the code that runs it. It's pure poetry. I have yet to see code this clean anywhere. Just opening any source code file at random[1] will show you exactly what I mean.
[1]
https://github.com/postgres/postgres/blob/master/src/backend...
Let's not get carried away. It is very much debatable which database is "most mature".
Here are some discussions about how Postgres compares with Oracle DB (if $ is not an issue); it is often said that Postgres is a Cessna whereas Oracle DB is like a fighter jet:
https://news.ycombinator.com/item?id=24582937
I've never had the opportunity to test Oracle DB at large, just some minor playing around with 12c many, many years ago, so I don't have an accurate view. That said, given my experience with other Oracle products and the fighter jet analogy... De omnibus dubitandum est...
Fully agree with this. I once heard it stated by someone "I don't like writing C, but I don't mind at all writing Postgres C". It is definitely a quality codebase and the credit goes to a pretty small set of maintainers over the years.
I started using PostgreSQL because, at the time, MySQL wouldn't do subqueries. The fact it silently truncated VARCHAR data to the size of the column and didn't give out an error didn't help much.
Overall, my impression is that the project's philosophy is to do the Correct Thing, even if it's slow.
A few years back I had a friend explain the difference between MySQL and PostgreSQL like this:
- MySQL: do it fast, then make it right
- PSQL: do it right, then make it fast
I had switched after learning that CHECK constraints were parsed, but ignored -- I've yet to be as offended by a program as I was at that instant.
Apparently they're supported now, but what the hell; I don't think it even threw a warning about it.
PostgreSQL isn't exciting. It's boring, with roots in the Berkeley Ingres project from the mid-80s. Which is why 10 years ago, so many HNers went with trendy NoSQL data stores instead, even though they lost data. Data loss is very exciting, and PostgreSQL is very boring. This preference for what's exciting and trendy over what's old and proven is a large reason why so much software today sucks.
That seems unfair.
NoSQL was trying out new _ideas_ in data storage. It was exciting to try out new or re-imagined core concepts, and some of those young projects had teething issues. But several are still around and remain popular, mainly for certain niches they excel at (and those niches were largely discovered through trial and error).
In the SQL-sphere a lot of people skipped Postgres because MySQL had, at the time, the momentum in the free/cheap relational database space. Between then and now Postgres has grown more elegantly than MySQL, and people are rightfully looking to it.
> NoSQL was trying out new ideas in data storage
Was it? Or was it just rehashing old ideas made obsolete by relational databases?
> https://en.wikipedia.org/wiki/IBM_Information_Management_Sys...
> Initial release 1966; 54 years ago
> https://en.wikipedia.org/wiki/Category:NoSQL
MySQL was also a lot faster in the early days.
Plus it "just worked" - not sanely, mind you, but you had a lot fewer problems with type mismatches, up/downcasts, etc. Well, unless you wanted quality data.
But back then we were just happy that it didn't bother us with "minor" details.
It was a lot faster, and sometimes the odds that your data might live long enough for those integrity issues to matter seemed very remote.
Around 2001, the conventional wisdom was that it was worth the risk to take the performance victory, keep your hosting bill down, use mysql, and build some other approach to data integrity. (If memory serves, the integrity downsides were bigger then, too...)
The math is pretty different now. I'm not sure it's fair to hang all of it on Oracle, but it feels like that acquisition derailed some MySQL progress that might've made it a close call today. Now, I use Postgres unless I have a very specific, very compelling reason not to.
> Around 2001
ok boomer. It's 2020.
Surprised they didn't cover transaction id wraparound, which has caused more than a few public outages at scale.
IIRC some Postgres folks have a plan to fix it? Maybe Craig knows?
> Surprised they didn't cover transaction id wraparound, which has caused more than a few public outages at scale.
There have been some significant improvements in the last couple years - most importantly the introduction of the freeze map in 9.6. But also quite a sprinkling of other incremental improvements.
It can obviously still be a problem, but it's not as pronounced as it once was.
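For anyone who wants to keep an eye on it, one common check (a sketch; alarm thresholds are deployment-specific):
```
-- How far each database is from transaction ID wraparound. age() counts
-- transactions since the oldest unfrozen XID; autovacuum forces an
-- anti-wraparound vacuum at autovacuum_freeze_max_age (200M by default),
-- well before the ~2 billion hard limit.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```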
can you elaborate?
> can you elaborate?
On what exactly?
The freeze map bit I referenced above is the following commit:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...
Commit a892234f830e832110f63fc0a2afce2fb21d1584 gave us enough infrastructure to avoid vacuuming pages where every tuple on the page is already frozen. So, replace the notion of a scan_all or whole-table vacuum with the less onerous notion of an "aggressive" vacuum, which will scan pages that are all-visible, but still skip those that are all-frozen. This should greatly reduce the cost of anti-wraparound vacuuming on large clusters where the majority of data is never touched between one cycle and the next, because we'll no longer have to read all of those pages only to find out that we don't need to do anything with them.
This means that an anti-wraparound vacuum (automatically started, even when autovacuum is disabled) is cheaper than it used to be. Still not necessarily cheap, as indexes still need to be scanned (but see below).
Some of the additional changes were (reverse chronological order):
* 2020-03-28 - "Trigger autovacuum based on number of INSERTs" -
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...
This is important because it will reduce the cost of an eventual anti-wraparound vacuum, as there will be less work left for it to do.
* 2020-01-20 - "Allow vacuum command to process indexes in parallel." -
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...
As index processing can be expensive (no equivalent to scanning only changed parts of table), processing them in parallel can greatly reduce the time for a vacuum. Note that this isn't yet done by autovacuum.
* 2019-04-04 - "Allow VACUUM to be run with index cleanup disabled." -
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...
This can be extremely useful for manual vacuum when getting close to a wraparound, since the index processing step is not necessary to stave off wraparound.
* 2018-04-04 - "Skip full index scan during cleanup of B-tree indexes when possible" -
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...
This can make vacuums for pretty clean tables vastly cheaper.
There's also a number of changes that make indexes smaller / less likely to bloat. That in turn makes vacuuming more efficient. E.g.
* https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...
* https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...
Edit: formatting (gah, why is HN formatting so limited)
Just chiming in to say, Andres above can answer this far more in detail than I can. I suspect most people that do hit this today 1. have nowhere near as bad pain from it and 2. generally recover much more quickly than previously. Today vs. say Postgres 9.6 is a huge difference.
Features I miss in PostgreSQL:
- Support for a large number of connections (without external pooling tools), and stop starting a new process per connection (which consumes a huge amount of resources). Other DBMSs work with threads instead of processes and don't have any issues.
- Easy to configure high availability with multiple nodes
Less important:
- Graph capabilities directly in PostgreSQL, without extensions or anything else
- Running scheduled tasks directly in PostgreSQL without external tools (for example, a script that removes old data from the database)
- Supporting Temporal Tables
But overall I like PostgreSQL more than Oracle or SQL Server, even if I miss the above features.
I like this whole class of questions: "What's so exciting/good about <SOMETHING>?"
Online (forums, etc.) this type of question usually elicits informative answers, and probably in face-to-face communication too.
It was really funny, Uber switched from MySQL to Postgres, and then Postgres to MySQL. So they've gone back and forth a couple of times, but when they've swapped to Postgres, they had to figure out - they had all this app logic that relied on things being case-insensitive, because the database just doesn't respect case search.
This suggests to me that the guest here maybe doesn't know about collations? But that doesn't seem likely (possible?) given his stated experience and the roles he's held. So what gives?
Guest here. The context is Uber was on MySQL, which by default (or at least in their version and setup at the time) doesn't respect case sensitivity. To MySQL, craig is the same as Craig when matching.
Their app's search relied on this. The solution for them was the citext data type to create a case-insensitive string -
https://www.postgresql.org/docs/13/citext.html
I'm fairly familiar with Postgres, but I have not heard of a collation that tackles case insensitivity without tweaking the system itself. In Postgres you can tweak this at the system level, but I've not seen it in the wild, and per the docs you're then non-deterministic. For MySQL it appears you can change this, but it seems a surprising default that they expected case insensitivity.
Yes, another option could have been indexing and searching on lower or upper casing, but they wanted the minimal change to their app.
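For illustration, a minimal citext sketch (the extension ships in Postgres contrib; table and values invented):
```
CREATE EXTENSION IF NOT EXISTS citext;

CREATE TABLE riders (email citext UNIQUE);

INSERT INTO riders VALUES ('Craig@example.com');

-- Matches despite the case difference:
SELECT * FROM riders WHERE email = 'craig@example.com';
```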
There was once a conference talk they gave about their migration process from MySQL->Postgres, but I'm not immediately able to find the video online so it may have been removed.
I'm not familiar with pg, but the choice of case sensitivity is a fundamental property of collations in other DBMSes like MSSQL. Web search does make it look like pg only got case-insensitivity-via-collation in v12 (2019), and it also requires you to make your own collation with case-insensitivity, since it doesn't ship any built-in ones.
It's been a while and don't remember the specifics, but when we implemented a POSIX-compliant filesystem at Maginatics (acq. by EMC), we used MySQL (Percona actually) to store the FS logic (inode numbers, filenames, etc) and case sensitivity was one of the FS features. Maybe Percona supports it? I do remember (at least at that time) that full utf-8 support was poor though.
Btw, don't get me started about case-sensitive file systems - Windows got it right (case-preserving, but case-insensitive). Why should both /home/ellen/Music and /home/ellen/music exist? :)
Because once you go there, you have to answer questions which are not as easy. Does M match m? Does i match ı? Does a match ă? Does ă match ăą? Does m match đ? What happens when they need to coexist?
But they're different! I see where you're coming from though (unfortunately I know all these examples too well..).
Case insensitive search is accomplished with ILIKE no?
How well does ILIKE work with indexes? In MySQL the collation (which includes case- and accent-insensitivity options) is known to the index.
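In Postgres, a bare ILIKE can't use an ordinary btree index; two common workarounds, sketched with invented names:
```
-- 1. An expression index on lower(), queried the same way:
CREATE INDEX users_name_lower_idx ON users (lower(name));
SELECT * FROM users WHERE lower(name) = lower('Craig');

-- 2. A trigram index, which ILIKE (even with wildcards) can use:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX users_name_trgm_idx ON users USING gin (name gin_trgm_ops);
SELECT * FROM users WHERE name ILIKE '%craig%';
```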
Great practical note by Josh Berkus on why Uber left PostgreSQL. Basically: runaway table bloat, because Uber had a use case that Postgres doesn't address as well as InnoDB.
https://www.postgresql.org/message-id/5797D5A1.5030009%40agl...
The whole VACUUM paradigm is the biggest thing that bugs me about pgsql. The fact that it can actually freeze things always worries me. Can't this happen constantly in the background like modern GCs?
It does, though? That's what autovacuum is, unless I'm missing something.
It's just not always enough without tuning, just like GC.
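And the tuning is often per-table; one common knob, as a sketch (table name invented):
```
-- Make autovacuum kick in after ~1% of rows change instead of the
-- default 20% (autovacuum_vacuum_scale_factor defaults to 0.2):
ALTER TABLE big_table SET (autovacuum_vacuum_scale_factor = 0.01);
```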
I learned that about postgres collations and case sensitivity about 6 years back, and it was within 3 months of my first heavy use of postgres. So I wonder if this person was just experiencing their first usage of postgres too? And I've used relational databases for more than 25 years.
I read that as: Uber discovered their code relied on MySQL's default case insensitivity. Migrating to Postgres might have been tricky, as collation support for this is a very recent addition in Postgres.
That could be. It was the "because the database just doesn't respect case search" part that struck me -- to say that MySQL doesn't respect case search is false, and the only scenario I can think of where it can even appear to be the case is if you're new to MySQL and you've made a big assumption without even googling it. Or maybe if you're new to string comparison (and therefore programming?) in general.
Uber once switched Postgres->MySQL because of human error and politics around "taking remediation action".
Wow, what a great episode! Probably one of the most info-loaded podcasts I've listened to. Definitely going to be using psql's \e command with $EDITOR set to VS Code.
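If it helps anyone else, a sketch of that setup (psql consults PSQL_EDITOR, then EDITOR, then VISUAL; VS Code needs --wait so psql blocks until you save and close the buffer):
```
export EDITOR="code --wait"
psql mydb
# ...then \e inside psql opens the current query buffer in VS Code.
```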
It works reliably and is not at the mercy of Oracle - so it's pretty much the winner for relational storage, unless you need something different (doc db, time series db...).
Title has a typo: What's so _exciting_ about Postgres