Databases shouldn't be exciting. They should _just work_.
The main reason webcrap works is because the databases underneath work. If the Javascript crowd had to manage data storage, nothing persistent would work reliably.
It is often downvoted to talk about the "javascript crowd" on HN, but I've worked in the software industry long enough to say, unequivocally and with certainty, that it totally lives up to the cliches.
It never gets old to scold people who are not serious, disciplined and principled in endeavors that affect the rest of the world. The Javascript community absolutely deserves the avalanche of criticism it faces.
It is the reason why the modern web sucks, and frankly, I wish the folks at Postgres would develop a new web standard, including browsers, that is not governed by corporations and can be implemented in a couple of weeks. Not trivializing what goes into building browsers, but making the whole standard small enough that anyone could build a browser during a sabbatical.
I'm seeing this in action now; the difference between the backend code and the frontend code is stark, and the backend code isn't up to my standard either (and my standards aren't that high, I know I'm merely competent by the standards of many places).
It just seems to be the nature of the web side of things, lots of sloppiness.
The devs are smart enough, they just don't _think_ and it shows.
That or they use something but clearly don't understand the thing they are using.
I replaced nearly 500 lines of code with ~60 by taking a bunch of massive conditionals, re-ordering them, then extracting them into methods and returning early on the simple case.
This isn't voodoo; you have to work harder and remember more to do it the _wrong_ way.
Also comments: they don't seem to understand that you document _why_ you did the thing, not what you did. What you did is in the code; I don't trust your comments to tell me what you did, I want to know _why_ you did it.
That and general documentation (cross-link stuff, link at the top of the class to the documentation in the wiki, update the documentation when you get a ticket that changes the behaviour, write the high-level flow in plain English a 10-year-old could understand).
_rant over_.
I actually like my job, the people are lovely and the work is interesting; it's just frustrating when an ounce of thought saves a pound of cure.
Hello fellow "merely competent" engineer. I recommend we both remove "merely" from our title. Inexperience grows from the bottom, I've gotten "better" by simply being too stubborn to get worse. Keep up the mediocre work!
> Inexperience grows from the bottom, I've gotten "better" by simply being too stubborn to get worse.
That is quite possibly one of the best quotes I have heard on how to approach discipline in engineering.
I don't understand, can you please explain?
> Inexperience grows from the bottom
In what way?
> I've gotten "better" by simply being too stubborn to get worse
How does one get better by wanting to get worse?
This makes no sense to me.
Should I assume from your comment that you never saw code worth improving other than code written by JavaScript developers?
Ah, pointless exclusion of the middle.
No.
That's quite the perspective.
It's a big enough world where you don't have to interact with every technology that exists, but this is one of the more prevalent ones, and you might find it useful to make peace with that community.
Anyway, good luck out there.
It's a stereotype, and like a lot of stereotypes, it has a little bit of statistical truth, but I don't think it does justice for the excellent front-end engineers that do exist and whom I've worked with personally.
That said, there is some truth in it. The thing about the back end is that there's no satisfaction in it other than the satisfaction of doing it well and earning the esteem of peers, so most back end developers are paying attention to the quality of the code they write. The front end has other attractions as well, and many front-end developers start out focused on those attractions, and learn by experience (if they learn at all) that writing high quality code is related to the cool stuff that got them into programming in the first place.
I didn't mean to be divisive. Backend engineers need good front-end partners to be successful. Front-end engineers depend on the backend folk not to fuck shit up, which happens time and again.
I meant that the developer community, the folks that are at the helm of developing Javascript frameworks (gahhhh!) and other shenanigans, the entire browser based ecosystem just reeks of incompetence, lack of discipline and rigor.
Frontend and Backend engineers should get together and build a better system that serves both well. That's what I meant by asking Postgres core devs to build front-end tools, something like Gemini [1].
[1]
https://gemini.circumlunar.space/
Sorry, but I can't agree.
Yes, they are capable of making great software, but it's exactly what gp said... they don't _think_. I've seen so much code which looks nice from the outside, and is all structured. But if you start reading, it's riddled with wtfs and questions about why things weren't done in a more simple, concise and stable way.
I agree. We've also spent the last 10 years trying to make web apps behave like desktop apps, and they still don't match.
Typical bitter backend developer spew. Works with Java or C# and hasn't explored much more than OO and procedural. Threatened by anything new, creative or explorative. Stuck in his ways and prefers to do mundane, uncreative and repetitive tasks, like writing boilerplate to get stuff in and out of a database. Soon to be automated away by serverless.
Good luck running serverless code in a lightbulb.
https://azure.microsoft.com/en-us/services/iot-edge
Ooh, permanent lock-in to a single company's billed-by-the-second mainframe! How exciting!
I see this criticism a lot for serverless but I don't see it in reality. Serverless is still just running code; the only part that wouldn't be portable is the entrypoint data structure, which amounts to practically nothing.
Please explain further: what negative characteristics do you apply to the _"js crowd"_ as a whole?
As a developer who only develops in JavaScript, I feel personally attacked by your post, so I would appreciate if your attack could be more specific to the point of being constructive or at least informative.
> Databases shouldn't be exciting. They should _just work_.
With enough experience with the alternative, things that just work _are_ exciting.
Amen.
PG steamrolls every other general purpose database, and has tons of extensions, and frontends so you can use it in any database paradigm.
With 30 minutes of time, you can turn it into a document db, log/time series db, columnar, graph, key value, whatever.
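For the document-db case, a minimal sketch using the built-in jsonb type (table and column names invented for illustration):
```
-- Postgres as a document store via jsonb:
CREATE TABLE events (
    id  bigserial PRIMARY KEY,
    doc jsonb NOT NULL
);

-- A GIN index makes containment queries on the documents fast.
CREATE INDEX events_doc_idx ON events USING gin (doc);

INSERT INTO events (doc) VALUES ('{"type": "signup", "user": "alice"}');

-- Find every document containing a given key/value pair.
SELECT doc FROM events WHERE doc @> '{"type": "signup"}';
```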
I bet we'd both seriously consider (if not prefer) sqlite over postgres in a bunch of scenarios?
I love PG and sometimes I see some cases where maybe Sqlite would work "better" but just as soon as it starts to get a little bigger I start missing all the cool stuff that PG has.
Sqlite is still awesome but PG is more awesomer. And since the cost is the same I just stick with PG now. Unless I'm resource constrained (which for my work is rare)
sqlite is a wonderful, amazing file manipulation API. If single-threaded file manipulation is what you need, it's great. But it is not an application backend in the way PG can be.
SQLite is amazing, and "lite" is really not fair. I've seen it handle 1 TB+ with at least decent performance.
It's not a replacement for a real DB but it is damn close.
Actually, as I understand it (and a couple of minutes of Googling doesn't back up my case), the "lite" on the end is pronounced "ite" and refers more to SQL as a geology-like term. In other words, the SQL is a rock.
Even if that was what the name was intended to be, the vast majority of people pronounce it as SQL-lite. (In fact, this is the first I've heard of someone pronouncing it that way.)
The documentation seems to suggest that it is in fact pronounced "SQL-lite":
https://www.sqlite.org/fullsql.html
> Don't be misled by the "Lite" in the name.
It's not, because it is a real DB. It's just one for single user access only.
> Think of SQLite not as a replacement for Oracle but as a replacement for fopen().
Well sure, it's better thought of as a replacement for fopen() than as one for Oracle. (Unless you were thinking of buying Oracle in order to use it on your desktop.) An even better comparison would be with MS Access, for example.
A lot of people imagine databases that way, but they are complex beasts with a lot of features. Everyone seems to say features don't matter, except of course for the ones that _they_ use. Which is often a growing set as your application matures.
All of these features are exciting to the people that can use them.
> If the Javascript crowd had to manage data storage, nothing persistent would work reliably.
There are different incentives to writing UIs than there are when you're writing a backend. Often a business needs you to iterate very quickly, so people change their practices to suit this. Some are able to get away with never raising the quality bar once they finish iterating and it's unfortunate how the industry allows this. Additionally, there are, of course, teams and companies that want high quality code, but even here quality is defined differently from how it would be for data storage logic. For instance, it often means "high framerate" and "easy to understand", instead of "optimal" and "robust".
There are fewer JavaScript engineers writing backend code, and most who do are generally making simplistic CRUD Node.js apps. However, there are companies and people that specialise in high-quality complex Node.js apps, and they try just as hard as backend engineers from other programming languages to do this well. (I'm thinking of projects made by developers on the core Node team. I've seen good backend code there, perhaps because they're more aware of lower-level concerns.)
I personally write a lot of JavaScript/TypeScript code and I try to do a good job, but a "good job" is defined differently depending on the kind of product you're creating.
Edit: I just realised you are _the John Nagle_, so you likely have a more refined taste in certain kinds of programming than I do, however I think my point about incentives probably still stands.
> Databases shouldn't be exciting. They should _just work_.
A complicated thing which just works _is_ exciting.
One man's exciting is another's snorefest. A lot of people would probably laugh at the suggestion that _anything_ involving computers could be exciting.
It is unproductive to talk about subjective opinions of excitement.
The GP was referring to "Databases shouldn't be exciting" - not in the traditional sense of the word, but a metaphor that hinges on "excitement" being new features, frivolous decorations, unnecessary tangents, losing vision of what a database should do, frantic patching and heated bikeshedding discussions.
It wasn't meant as "Ooooo I am excited about a database".
I think you're restating my point.
> If the Javascript crowd had to manage data storage, nothing persistent would work reliably.
Wait until this guy learns what the JS in "json" column types means, and how the largest webcrap sites rely on them for persistence.
Note that 'Animats didn't comment on JS as a technology, he commented on the _crowd_.
In 2017 Bryan Cantrill gave a wonderful talk titled "Platform as a Reflection of Values: Joyent, Node.js and Beyond"
In it, he talks about how even though they liked JS as a technology, they came to realize that the values of the JS/Node.js community differed from their values at Joyent.
In the talk (at around 19:30), he identifies that in 2014 Joyent's core values wrt node.js were "debuggability", "robustness", and "stability"; while the core values of the broader node.js community were "approachability", "expressiveness", and "velocity". And there's nothing wrong with either of those sets values, but they're not the same!
With that lens, I'd characterize the parent comment as contrasting the PostgreSQL team's core value of "robustness" with the Javascript crowd's core value of "velocity". That's a distinction in _values_, not in _technology_.
Json has nothing to do with the persistence part.
If you need to persistently store frontend state in the backend, would it not be easier to use the native frontend data format?
It's not entirely about persistence... it's about the ease and reliability of the persistence.
My point is that the JSON isn't being delivered via a js db but instead translated from data stored in a database written in C. So the "what about json" gotcha isn't as good as it was made to seem.
Json is just a format. There are trillions of formats that work just as well as json.
This was a really good article, and thank you @jerodsanto for submitting it. I learned probably 6-7 _big_ things about Postgres reading it that I didn't know before.
I won't list out those things, because, I'm sure it wouldn't be relevant to most people reading this comment. :). But, there are a couple things I noticed that I _do_ want to mention:
There were these edge cases in there, where it's kind of like less safe. And Postgres was there, and it had a reputation of being just rock-solid, consistent, and more strict with your data
"... I never had that moment where I'm like "Oh, Postgres, you screwed me over."
This is a huge deal, and, IMO, _the_ reason to choose Postgres over MySQL. MySQL has some _horrendous_ defaults. You can make things a little better, but, AFAIK, you can't even get to where Postgres starts out of the box. Most of that consists of MySQL being extremely permissive with what it allows you to do in your SQL. Many of those things are non-standard, and some are quite unsafe. Postgres not only prioritizes safety, it keeps the application developer honest.
It can be a little complain-y, but I'd rather have my database complain than do stuff that could lose data. One great example is when you try to stuff more characters into a text field than it can accommodate. By default, MySQL just truncates the string silently; Postgres just doesn't allow it.
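To make that concrete, a rough sketch (table name invented; exact messages vary by version, and newer MySQL enables strict mode by default, as comments below note):
```
CREATE TABLE t (name varchar(5));

INSERT INTO t (name) VALUES ('abcdefgh');
-- Postgres: ERROR: value too long for type character varying(5)
-- MySQL with the old non-strict sql_mode: silently stores 'abcde'
-- and only emits a warning you have to ask for (SHOW WARNINGS).
```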
PostGIS is a huge one. It's a whole geospatial database.
This is the other point I want to call out. I've had the pleasure of working with PostGIS before. The main thing I'd want to say about PostGIS is that, although there are issues working with geojson and GIS in general, PostGIS doesn't exacerbate any of them. Generally speaking, PostGIS is about as pleasant to work with as a geospatial database can be.
> Most of that consists of MySQL being extremely permissive with what it allows you to do in your SQL. Many of those things are non-standard, and some are quite unsafe. Postgres not only prioritizes safety, it keeps the application developer honest.
This is incredibly important: if you use MySQL, unless you are extremely pedantic about using safe defaults and a safe-by-default ORM you will almost certainly end up in the position where migrating will be hard because you are going to need to rigorously test & remediate all of the areas where your code was relying on MySQL allowing erroneous code to run without errors and nobody ever looked at the warnings.
I've even seen a few cases where people had data loss which had either gone unnoticed or been sporadic enough that they assumed it was user error or random corruption.
> I've even seen a few cases where people had data loss which had either gone unnoticed or been sporadic enough that they assumed it was user error or random corruption.
I've gotten to troubleshoot this specific condition myself. Amusingly, the big thing that saved us was that we could use lingering redundancy in a poorly normalized schema to rebuild some relationships that should've been enforced by the database.
> I've even seen a few cases where people had data loss which had either gone unnoticed
Maybe under MyISAM, but InnoDB has been the default storage engine since 2010.
Under InnoDB, because we're not talking about file corruption but cases where MySQL would silently and irreversibly discard data rather than throwing an error. For example, inserting a value which is too large for the target data type - it truncates the value, so that value is lost unless you have another copy or way to reconstruct it.
People always criticize MySQL, and when asked what the problem is, they always mention the same thing about column length, which is just a default that can easily be changed and hasn't even been the default for years.
I think you are getting lost in the details. Databases are inherently complex and there are sharp edges as well as tricks you need to learn for each one. Beyond that row stores like MySQL and PostgreSQL are essentially interchangeable. The biggest technical success factor is whether you understand the DBMS well enough to use it effectively. The biggest business success factor is licensing: MySQL is GPLv2 and PostgreSQL is BSD.
Specific features are somewhat interesting but at this point not many teams are starting from scratch on MySQL or PostgreSQL. It's generally the case you already have experience on one or the other or both.
Your example is outdated, MySQL doesn't truncate by default in later versions. Defaults can easily be changed. Choosing one database over another because of defaults is not a good strategy.
The point still stands, however. Here's an incomplete list of everything you have to worry about with a MySQL database which you don't need to worry about with most other databases:
https://dev.mysql.com/doc/refman/8.0/en/sql-mode.html#sql-mo...
(I believe there are also still issues with Unicode, too)
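For reference, a sketch of opting in to stricter behavior on an older server, plus the usual Unicode advice (flag list abbreviated; see the sql-mode docs linked above):
```
-- Enable strict/safe modes explicitly rather than relying on defaults:
SET GLOBAL sql_mode = 'STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_ZERO_DATE,NO_ZERO_IN_DATE';

-- For Unicode, prefer utf8mb4 over MySQL's 3-byte "utf8" (utf8mb3),
-- which cannot store characters outside the Basic Multilingual Plane:
CREATE TABLE messages (body TEXT)
    CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```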
I ran into a fun one a few weeks back: code which had been running for years failed on an AWS Aurora cluster because they'd defined a field as BIGINT and were inserting UUID_SHORT() values into it. For some reason, Aurora always generates values with the first bit set and so they learned the hard way that that field needed to be declared as unsigned.
Yes, the defaults can be changed but many, many people do not know this and will only learn why they need to after they have something blow up painfully. Often, even people who know this in theory will forget about it at some point when setting up a new one. It's the same reason why the industry is trying to get away from services like MongoDB / ELK listening on 0.0.0.0 with no password or having a default password which can easily be scanned. For something as widely installed as MySQL, even a 10% chance of oversights will mean a LOT of collateral damage.
Fine, my example is outdated. That's irrelevant. I could probably find 3 more examples in the current default config for MySQL that would be as bad or worse.
As for the rest of your comment, sensible defaults most certainly _are_ a valid reason to choose one piece of software over another. Bad default configs are equivalent to what the Dungeons & Dragons community calls "newbie traps." They're things that either don't look important, or don't look like they'll hurt you, but really are important, or can hurt you, if you're not extremely careful. What's more, some of these options are probably not going to be backward compatible with your code. So, you get into this situation where MySQL is doing shitty things you don't want it to do, and, when you find out later that it's doing those things, you either can't do much about it, or doing something about it involves major surgery to your application. That's bad.
So, yes, you can configure MySQL to have better than default behavior. That's no surprise; that's literally the point of having configuration options, after all. The problem is that some of these options don't look too bad. There is something to be said for having a permissive configuration. But, for the reasons I spelled out in the previous paragraph, that shouldn't be the _default_ configuration. It's not too much to ask that you should be able to install and configure a database and not have it lose or corrupt your data.
> I'd rather have my database complain than do stuff that could lose data. One great example is when you try to stuff more characters into a text field than it can accommodate. By default, MySQL just truncates the string silently; Postgres just doesn't allow it.
That's your opinion, but the truth is not necessarily black and white.
Postgres will probably lose more data in your case when the application developer doesn't handle the exception for an over-long text field, and loses the whole row. (If you're an application developer, you should audit your code now.)
A well-written application using Postgres would have to truncate the data before insert, achieving the same result as MySQL's auto-truncation, but with more initial and ongoing maintenance effort in the case of Postgres.
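Something like this hedged sketch (names invented), which at least makes the clipping visible in the code:
```
-- Truncate explicitly on insert instead of relying on the database:
INSERT INTO users (bio) VALUES (left($1, 200));
```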
And the existence proof that MySQL's behavior is ok is that MySQL is the most widely used relational database, and powers most Internet sites.
Source: DBA.
Here's the deal: I am not presenting my opinion as objective fact, nor do I actually care that you're a DBA. If anything, the fact that you are a DBA makes your opinion _less_ relevant to me as an application developer. Simply put, you do not share my pain, and the people who have upvoted me, while your comment is currently gray as I write this, have.
The example was intended to show a scenario in which MySQL _silently_ loses data. There are certainly situations in which this sort of data loss is acceptable. I've been involved in scenarios where it was not acceptable, I was surprised by the behavior, and I ended up with a bunch of missing data.
The problem here is that I was _surprised_ by my database.
That should literally _never_ happen. The correct behavior in this specific case is for the database driver to raise an exception or warning (exactly which should probably be a configurable option, defaulting to an exception). That way, if an exception gets raised in my application, at the very minimum, I have a traceback in my logs to _tell_ me something went wrong, which allows me to fix the issue sooner rather than later.
As for your existence proof that "MySQL's behavior is ok," the fact that MySQL has a bigger installed base literally proves nothing. I guarantee you there are folks out there who are experiencing such behavior and being surprised by it, just like I was. There are people out there for whom this specific behavior is not a problem. And, there are people out there who are length checking every bit of text they try to stuff into a VARCHAR(n) or CHAR(n) field using MySQL, because they started with MySQL and they're basically stuck with it, unless they want to do major surgery on their app.
Apart from being the most mature relational database, I would point to the code that runs it. It's pure poetry. I have yet to see code this clean anywhere. Just opening any source code file at random[1] will show you exactly what I mean.
[1]
https://github.com/postgres/postgres/blob/master/src/backend...
Let's not get carried away. It is very much debatable which database is "most mature".
Here are some discussions about how Postgres compares with Oracle DB (if $ is not an issue); it is often said that Postgres is a Cessna whereas Oracle DB is like a fighter jet:
https://news.ycombinator.com/item?id=24582937
I've never had the opportunity to test Oracle DB at large, just some minor playing around with 12c many, many years ago, so I don't have an accurate view. That said, given my experience with other Oracle products and the fighter jet analogy... De omnibus dubitandum est...
Fully agree with this. I once heard it stated by someone "I don't like writing C, but I don't mind at all writing Postgres C". It is definitely a quality codebase and the credit goes to a pretty small set of maintainers over the years.
I started using PostgreSQL because, at the time, MySQL wouldn't do subqueries. The fact it silently truncated VARCHAR data to the size of the column and didn't give out an error didn't help much.
Overall, my impression is that the project's philosophy is to do the Correct Thing, even if it's slow.
A few years back I had a friend explain the difference between MySQL and PostgreSQL like this:
- MySQL: do it fast, then make it right
- PSQL: do it right, then make it fast
I had switched after learning that CHECK constraints were parsed, but ignored -- I've yet to be as offended by a program as I was at that instant.
Apparently they're supported now, but what the hell; I don't think it even threw a warning about it.
PostgreSQL isn't exciting. It's boring, with roots in the Berkeley Ingres project from the mid-80s. Which is why 10 years ago, so many HNers went with trendy NoSQL data stores instead, even though they lost data. Data loss is very exciting, and PostgreSQL is very boring. This preference for what's exciting and trendy over what's old and proven is a large reason why so much software today sucks.
That seems unfair.
NoSQL was trying out new _ideas_ in data storage. It was exciting to try out new or re-imagined core concepts, and some of those young projects had teething issues. But several are still around and remain popular, mainly for certain niches they excel at (and those niches were largely discovered through trial and error).
In the SQL-sphere a lot of people skipped Postgres because MySQL had, at the time, the momentum in the free/cheap relational database space. Between then and now Postgres has grown more elegantly than MySQL, and people are rightfully looking to it.
> NoSQL was trying out new ideas in data storage
Was it? Or was it just rehashing old ideas made obsolete by relational databases?
> https://en.wikipedia.org/wiki/IBM_Information_Management_Sys...
> Initial release 1966; 54 years ago
> https://en.wikipedia.org/wiki/Category:NoSQL
MySQL was also a lot faster in the early days.
Plus it "just worked" - not sanely, mind you, but you had a lot fewer problems with type mismatches, up/downcasts, etc. Well, unless you wanted quality data.
But back then we were just happy that it didn't bother us with "minor" details.
It was a lot faster, and sometimes the odds that your data might live long enough for those integrity issues to matter seemed very remote.
Around 2001, the conventional wisdom was that it was worth the risk to take the performance victory, keep your hosting bill down, use mysql, and build some other approach to data integrity. (If memory serves, the integrity downsides were bigger then, too...)
The math is pretty different now. I'm not sure it's fair to hang all of it on Oracle, but it feels like that acquisition derailed some MySQL progress that might've made it a close call today. Now, I use Postgres unless I have a very specific, very compelling reason not to.
> Around 2001
ok boomer. It's 2020.
Surprised they didn't cover transaction id wraparound, which has caused more than a few public outages at scale.
IIRC some Postgres folks have a plan to fix it? Maybe Craig knows?
> Surprised they didn't cover transaction id wraparound, which has caused more than a few public outages at scale.
There have been some significant improvements in the last couple years - most importantly the introduction of the freeze map in 9.6. But also quite a sprinkling of other incremental improvements.
It can obviously still be a problem, but it's not as pronounced as it once was.
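For anyone who wants to keep an eye on it, one common check (a sketch; alarm thresholds are deployment-specific):
```
-- How far each database is from transaction ID wraparound. age() counts
-- transactions since the oldest unfrozen XID; autovacuum forces an
-- anti-wraparound vacuum at autovacuum_freeze_max_age (200M by default),
-- well before the ~2 billion hard limit.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```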
can you elaborate?
> can you elaborate?
On what exactly?
The freeze map bit I referenced above is the following commit:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...
Commit a892234f830e832110f63fc0a2afce2fb21d1584 gave us enough infrastructure to avoid vacuuming pages where every tuple on the page is already frozen. So, replace the notion of a scan_all or whole-table vacuum with the less onerous notion of an "aggressive" vacuum, which will scan pages that are all-visible, but still skip those that are all-frozen. This should greatly reduce the cost of anti-wraparound vacuuming on large clusters where the majority of data is never touched between one cycle and the next, because we'll no longer have to read all of those pages only to find out that we don't need to do anything with them.
This means that an anti-wraparound vacuum (automatically started, even when autovacuum is disabled) is cheaper than it used to be. Still not necessarily cheap, as indexes still need to be scanned (but see below).
Some of the additional changes were (reverse chronological order):
* 2020-03-28 - "Trigger autovacuum based on number of INSERTs" -
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...
This is important because it will reduce the cost of an eventual anti-wraparound vacuum, as there will be less work left for it to do.
* 2020-01-20 - "Allow vacuum command to process indexes in parallel." -
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...
As index processing can be expensive (no equivalent to scanning only changed parts of table), processing them in parallel can greatly reduce the time for a vacuum. Note that this isn't yet done by autovacuum.
* 2019-04-04 - "Allow VACUUM to be run with index cleanup disabled." -
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...
This can be extremely useful for manual vacuum when getting close to a wraparound, since the index processing step is not necessary to stave off wraparound.
* 2018-04-04 - "Skip full index scan during cleanup of B-tree indexes when possible" -
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...
This can make vacuums for pretty clean tables vastly cheaper.
There's also a number of changes that make indexes smaller / less likely to bloat. That in turn makes vacuuming more efficient. E.g.
* https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...
* https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...
Edit: formatting (gah, why is HN formatting so limited)
Just chiming in to say, Andres above can answer this far more in detail than I can. I suspect most people that do hit this today 1. have nowhere near as bad pain from it and 2. generally recover much more quickly than previously. Today vs. say Postgres 9.6 is a huge difference.
Features I miss in PostgreSQL:
- Support for a large number of connections (without external pooling tools), and stop starting a new process per connection (which consumes a huge amount of resources). Other DBMSs work with threads instead of processes and don't have any issues.
- Easy to configure high availability with multiple nodes
Less important:
- Graph capabilities directly in PostgreSQL, without extensions or anything else
- Running scheduled tasks directly in PostgreSQL without external tools (for example, a script that removes old data from the database)
- Supporting Temporal Tables
But overall I like PostgreSQL more than Oracle or SQL Server, even if I miss the above features.
I like this whole class of questions: "What's so exciting/good about <SOMETHING>?"
Online (forums, etc.) this type of question usually elicits informative answers, and probably in face-to-face communication too.
It was really funny, Uber switched from MySQL to Postgres, and then Postgres to MySQL. So they've gone back and forth a couple of times, but when they've swapped to Postgres, they had to figure out - they had all this app logic that relied on things being case-insensitive, because the database just doesn't respect case search.
This suggests to me that the guest here maybe doesn't know about collations? But that doesn't seem likely (possible?) given his stated experience and the roles he's held. So what gives?
Guest here. The context is Uber was on MySQL, which by default (or at least in their version and setup at the time) doesn't respect case sensitivity. To MySQL, craig is the same as Craig when matching.
Their app's search relied on this. The solution for them was the citext data type to create a case-insensitive string -
https://www.postgresql.org/docs/13/citext.html
I'm fairly familiar with Postgres, but I have not heard of a collation that tackles case insensitivity without tweaking the system itself. In Postgres you can tweak this at the system level, but I've not seen it in the wild, and per the docs you're then non-deterministic. For MySQL it appears you can change this, but it seems a surprising default that they expected case insensitivity.
Yes, another option could have been indexing and searching on lower or upper casing, but they wanted the minimal change to their app.
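For illustration, a minimal citext sketch (the extension ships in Postgres contrib; table and values invented):
```
CREATE EXTENSION IF NOT EXISTS citext;

CREATE TABLE riders (email citext UNIQUE);

INSERT INTO riders VALUES ('Craig@example.com');

-- Matches despite the case difference:
SELECT * FROM riders WHERE email = 'craig@example.com';
```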
There was once a conference talk they gave about their migration process from MySQL->Postgres, but I'm not immediately able to find the video online so it may have been removed.
I'm not familiar with pg, but the choice of case sensitivity is a fundamental property of collations in other DBMSes like MSSQL. Web search does make it look like pg only got case-insensitivity-via-collation in v12 (2019), and it also requires you to make your own collation with case-insensitivity, since it doesn't ship any built-in ones.
It's been a while and don't remember the specifics, but when we implemented a POSIX-compliant filesystem at Maginatics (acq. by EMC), we used MySQL (Percona actually) to store the FS logic (inode numbers, filenames, etc) and case sensitivity was one of the FS features. Maybe Percona supports it? I do remember (at least at that time) that full utf-8 support was poor though.
Btw, don't get me started about case-sensitive file systems - Windows got it right (case-preserving, but case-insensitive). Why should both /home/ellen/Music and /home/ellen/music exist? :)
Because once you go there, you have to answer questions which are not as easy. Does M match m? Does i match ı? Does a match ă? Does ă match ăą? Does m match đ? What happens when they need to coexist?
But they're different! I see where you're coming from though (unfortunately I know all these examples too well..).
Case insensitive search is accomplished with ILIKE no?
How well does ILIKE work with indexes? In MySQL the collation (which includes case- and accent-insensitivity options) is known to the index.
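In Postgres, a bare ILIKE can't use an ordinary btree index; two common workarounds, sketched with invented names:
```
-- 1. An expression index on lower(), queried the same way:
CREATE INDEX users_name_lower_idx ON users (lower(name));
SELECT * FROM users WHERE lower(name) = lower('Craig');

-- 2. A trigram index, which ILIKE (even with wildcards) can use:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX users_name_trgm_idx ON users USING gin (name gin_trgm_ops);
SELECT * FROM users WHERE name ILIKE '%craig%';
```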
Great practical note by Josh Berkus on why Uber left PostgreSQL. Basically: runaway table bloat, because Uber had a use case that Postgres doesn't address as well as InnoDB.
https://www.postgresql.org/message-id/5797D5A1.5030009%40agl...
The whole VACUUM paradigm is the biggest thing that bugs me about pgsql. The fact that it can actually freeze things always worries me. Can't this happen constantly in the background like modern GCs?
It does, though? That's what autovacuum is, unless I'm missing something.
It's just not always enough without tuning, just like GC.
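And the tuning is often per-table; one common knob, as a sketch (table name invented):
```
-- Make autovacuum kick in after ~1% of rows change instead of the
-- default 20% (autovacuum_vacuum_scale_factor defaults to 0.2):
ALTER TABLE big_table SET (autovacuum_vacuum_scale_factor = 0.01);
```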
I learned that about postgres collations and case sensitivity about 6 years back, and it was within 3 months of my first heavy use of postgres. So I wonder if this person was just experiencing their first usage of postgres too? And I've used relational databases for more than 25 years.
I read that as: Uber discovered their code relied on MySQL's default case insensitivity. Migrating to Postgres might have been tricky, as collation support for this is a very recent addition in Postgres.
That could be. It was the "because the database just doesn't respect case search" part that struck me -- to say that MySQL doesn't respect case search is false, and the only scenario I can think of where it can even appear to be the case is if you're new to MySQL and you've made a big assumption without even googling it. Or maybe if you're new to string comparison (and therefore programming?) in general.
Uber once switched Postgres->MySQL because of human error and politics around "taking remediation action".
Wow, what a great episode! Probably one of the most info-loaded podcasts I've listened to. Definitely going to be using psql's \e command with $EDITOR set to VS Code.
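If it helps anyone else, a sketch of that setup (psql consults PSQL_EDITOR, then EDITOR, then VISUAL; VS Code needs --wait so psql blocks until you save and close the buffer):
```
export EDITOR="code --wait"
psql mydb
# ...then \e inside psql opens the current query buffer in VS Code.
```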
It works reliably and is not at the mercy of Oracle - so it's pretty much the winner for relational storage, unless you need something different (doc db, time series db...).
Title has a typo: What's so _exciting_ about Postgres