I won't list those things out, because I'm sure they wouldn't be relevant to most people reading this comment. :) But there are a couple of things I noticed that I do want to mention:
> There were these edge cases in there, where it’s kind of like less safe. And Postgres was there, and it had a reputation of being just rock-solid, consistent, and more strict with your data
> "... I never had that moment where I’m like “Oh, Postgres, you screwed me over.”
This is a huge deal, and, IMO, the reason to choose Postgres over MySQL. MySQL has some horrendous defaults. You can make things a little better, but, AFAIK, you can't even get to where Postgres starts out of the box. Most of that comes down to MySQL being extremely permissive about what it allows you to do in your SQL. Many of those things are non-standard, and some are quite unsafe. Postgres not only prioritizes safety, it keeps the application developer honest.
It can be a little complain-y, but I'd rather have my database complain than do something that could lose data. One great example is when you try to stuff more characters into a text column than it can accommodate. By default, MySQL just silently truncates the string; Postgres simply doesn't allow it.
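A minimal sketch of the difference (table and column names are made up; note that recent MySQL versions ship with strict mode on by default, so this is mainly about older defaults and servers where strict mode was turned off):

```sql
CREATE TABLE t (name VARCHAR(5));
INSERT INTO t (name) VALUES ('abcdefgh');
-- MySQL, non-strict sql_mode: stores 'abcde' and emits only a warning
-- MySQL, STRICT_ALL_TABLES:   ERROR 1406: Data too long for column 'name'
-- Postgres:                   ERROR: value too long for type character varying(5)
```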
> PostGIS is a huge one. It’s a whole geospatial database.
This is the other point I want to call out. I've had the pleasure of working with PostGIS before. The main thing I'd want to say about PostGIS is that, although there are issues working with geojson and GIS in general, PostGIS doesn't exacerbate any of them. Generally speaking, PostGIS is about as pleasant to work with as a geospatial database can be.
This is incredibly important: if you use MySQL, then unless you are extremely pedantic about using safe settings and a safe-by-default ORM, you will almost certainly end up in a position where migrating is hard, because you will need to rigorously test and remediate every place where your code relied on MySQL running erroneous SQL without errors while nobody ever looked at the warnings.
I've even seen a few cases where people had data loss which had either gone unnoticed or been sporadic enough that they assumed it was user error or random corruption.
I've gotten to troubleshoot this specific condition myself. Amusingly, the big thing that saved us was that we could use lingering redundancy in a poorly normalized schema to rebuild some relationships that should've been enforced by the database.
Maybe under MyISAM, but InnoDB has been the default storage engine since 2010.
In 1998, MySQL was appealing because it was very fast on simple queries and didn't cost a fortune. Until the early 2010s it had a better replication story than Postgres. But by now it's often slower than Postgres on my apps and feature-poor, and you don't notice how many limitations you've internalized until you realize it's been years since you had to repair or kludge around a MySQL quirk.
I commented in another thread about this. It's not black or white whether that's a bug or a feature since:
1) not truncating causes the whole row to be rejected in Postgres, or with the relevant MySQL server setting
2) for many SaaS and social media apps, truncating that column type is fine, since typically that's a description field
3) auto-truncation means you don't have to sync your app and database schema after every change.
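For reference, the server setting mentioned in (1) is sql_mode; a rough sketch of flipping it (strict mode is the shipped default in recent MySQL versions, as far as I know):

```sql
-- Reject over-long values (the whole row fails) instead of truncating:
SET GLOBAL sql_mode = 'STRICT_ALL_TABLES';
-- Revert to permissive truncate-and-warn behavior:
SET GLOBAL sql_mode = '';
```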
I prefer and rely on MySQL's behavior actually, even though I work with both MySQL and Postgres.
As someone who's trained a bunch of DBAs, here's how you should explain that this is a bug:
A database's job is to reliably store and retrieve data: that's why we talk about ACID since the whole concept is built around moving from one correct state to another. If we didn't care about getting back what we stored, we could just write flat files on disk and shrug when it breaks. Silently discarding some of the input breaks that contract.
Similarly, a core part of the SQL standard is about the database ensuring that values match the defined schema constraints. If we're going to allow silent truncation, the same reasoning would allow inserting foreign keys which don't exist, coercing non-numeric values into NULLs or other ways of “handling” non-numeric data types, replacing invalid dates with "0000" (oh, wait…), ignoring CHECKs (which MySQL did until 8.0.16), etc. If we're not doing that, we could just use MongoDB and stop pretending to be more than a blob store.
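To make that list concrete, here's a toy sketch of the constraint kinds at issue (all names are invented); the argument is that a strict database rejects violating rows outright rather than "handling" them:

```sql
CREATE TABLE customers (id integer PRIMARY KEY);
CREATE TABLE orders (
    qty         integer CHECK (qty > 0),            -- CHECK constraint
    customer_id integer REFERENCES customers (id),  -- foreign key
    placed_on   date NOT NULL                       -- no "0000" fallback
);
-- Each of these INSERTs should fail, not be silently coerced:
INSERT INTO orders VALUES (-1, 1, '2020-01-01');   -- violates the CHECK
INSERT INTO orders VALUES (1, 999, '2020-01-01');  -- no such customer
```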
Rejecting the entire row is exactly what a DBA should want because it means that a) the database only contains the valid data it promised to save and b) the application team is immediately alerted to a mismatch between what their application assumes and how the database schema is actually configured — since there's been a breakdown in the migration process, there are likely other problems as well which might be more subtle. Especially in this century there's no excuse for not using a migration framework which makes the process of keeping the database schema in sync with the application automatic and reliable.
In one of the rare cases where data truncation is okay — and note that I've never encountered one of those in 3 decades where the users saw it as anything other than “This garbage software doesn't let us store more and that's why our department really runs on the Excel spreadsheet one of the admin assistants created” — the correct approach is to implement it at the application level so the user can see a fixed-length input field, warnings if they try to enter more, etc. and the application doesn't produce odd errors when, say, it stores a value and attempts to retrieve the same value only to be told it doesn't exist.
Specific features are somewhat interesting but at this point not many teams are starting from scratch on MySQL or PostgreSQL. It's generally the case you already have experience on one or the other or both.
That's your opinion, but the truth is not necessarily black and white.
Postgres will probably lose more data in your case: when the application developer doesn't handle the exception for a too-long text field, the whole row is lost. (If you're an application developer, you should audit your code now.)
A well-written application using Postgres would have to truncate the data before insert, achieving the same result as having MySQL auto-truncation, but with more initial and on-going maintenance effort in the case of Postgres.
And the existence proof that MySQL's behavior is OK is that MySQL is the most widely used relational database and powers most Internet sites.
The example was intended to show a scenario in which MySQL silently loses data. There are certainly situations in which this sort of data loss is acceptable. I've been involved in scenarios where it was not acceptable, I was surprised by the behavior, and I ended up with a bunch of missing data.
The problem here is that I was surprised by my database.
That should literally never happen. The correct behavior in this specific case is for the database driver to raise an exception or warning (exactly which should probably be a configurable option, defaulting to an exception). That way, if an exception gets raised in my application, at the very minimum, I have a traceback in my logs to tell me something went wrong, which allows me to fix the issue sooner rather than later.
As for your existence proof that "MySQL's behavior is ok," the fact that MySQL has a bigger installed base literally proves nothing. I guarantee you there are folks out there who are experiencing such behavior and being surprised by it, just like I was. There are people out there for whom this specific behavior is not a problem. And, there are people out there who are length checking every bit of text they try to stuff into a VARCHAR(n) or CHAR(n) field using MySQL, because they started with MySQL and they're basically stuck with it, unless they want to do major surgery on their app.
Edit: I notice a lot of gray on your profile page under comments. Do you need help? This is a genuine offer. My email is in my profile.
The thing is, you'll noisily lose your data, which is far more likely to be caught and corrected. Even the most roughly put together web app is probably going to have some mechanism to alert about server-side production errors.
> A well-written application using Postgres would have to truncate the data before insert, achieving the same result as having MySQL auto-truncation, but with more initial and on-going maintenance effort in the case of Postgres.
Not necessarily - you could alert the user that they need to provide less data, or mitigate it in some other way (maybe storing a summary and the full version in a blob somewhere).
(I believe there are still issues with Unicode, too.)
I ran into a fun one a few weeks back: code which had been running for years failed on an AWS Aurora cluster because they'd defined a field as BIGINT and were inserting UUID_SHORT() values into it. For some reason, Aurora always generates values with the first bit set and so they learned the hard way that that field needed to be declared as unsigned.
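A sketch of that failure mode (table name invented): UUID_SHORT() returns an unsigned 64-bit value, and anything with the high bit set exceeds the signed BIGINT maximum of 9223372036854775807:

```sql
CREATE TABLE events (id BIGINT PRIMARY KEY);    -- signed: too small
INSERT INTO events (id) VALUES (UUID_SHORT());  -- out-of-range error
-- Fix: use the unsigned variant so the full 64-bit range fits
ALTER TABLE events MODIFY id BIGINT UNSIGNED;
```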
Yes, the defaults can be changed but many, many people do not know this and will only learn why they need to after they have something blow up painfully. Often, even people who know this in theory will forget about it at some point when setting up a new one. It's the same reason why the industry is trying to get away from services like MongoDB / ELK listening on 0.0.0.0 with no password or having a default password which can easily be scanned. For something as widely installed as MySQL, even a 10% chance of oversights will mean a LOT of collateral damage.
As for the rest of your comment, sensible defaults most certainly are a valid reason to choose one piece of software over another. Bad default configs are equivalent to what the Dungeons & Dragons community calls "newbie traps." They're things that either don't look important, or don't look like they'll hurt you, but really are important, or can hurt you, if you're not extremely careful. What's more, some of these options are probably not going to be backward compatible with your code. So, you get into this situation where MySQL is doing shitty things you don't want it to do, and, when you find out later that it's doing those things, you either can't do much about it, or doing something about it involves major surgery to your application. That's bad.
So, yes, you can configure MySQL to have better than default behavior. That's no surprise; that's literally the point of having configuration options, after all. The problem is that some of these options don't look too bad. There is something to be said for having a permissive configuration. But, for the reasons I spelled out in the previous paragraph, that shouldn't be the default configuration. It's not too much to ask that you should be able to install and configure a database and not have it lose or corrupt your data.
NoSQL was trying out new ideas in data storage. It was exciting to see new or re-imagined core concepts, and some of those young projects had teething issues. Several are still around and remain popular, but for certain niches they excel at (niches largely discovered through trial and error).
In the SQL-sphere, a lot of people skipped Postgres because MySQL had, at the time, the momentum in the free/cheap relational database space. Between then and now, Postgres has grown more elegantly than MySQL, and people are rightfully looking to it.
Was it? Or was it just rehashing old ideas made obsolete by relational databases?
> Initial release 1966; 54 years ago
Plus, it “just worked” (not sanely, mind you), but you had a lot fewer problems with type mismatches, up/downcasts, etc. Well, unless you wanted quality data.
But back then we were just happy that it didn’t bother us with “minor” details.
Around 2001, the conventional wisdom was that it was worth the risk to take the performance victory, keep your hosting bill down, use mysql, and build some other approach to data integrity. (If memory serves, the integrity downsides were bigger then, too...)
The math is pretty different now. I'm not sure it's fair to hang all of it on Oracle, but it feels like that acquisition derailed some MySQL progress that might've made it a close call today. Now, I use Postgres unless I have a very specific, very compelling reason not to.
Here are some discussions about how Postgres compares with Oracle DB (if $ is not an issue); it is often said that Postgres is a Cessna whereas Oracle DB is a fighter jet: https://news.ycombinator.com/item?id=24582937
Overall, my impression is that the project's philosophy is to do the Correct Thing, even if it's slow.
- MySQL: do it fast, then make it right
- PSQL: do it right, then make it fast
Apparently it's supported now, but what the hell; I don't think they even threw a warning about it.
With enough experience with the alternative, things that just work are exciting.
With 30 minutes of time, you can turn it into a document DB, log/time-series DB, columnar, graph, key-value, whatever.
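For instance, the document-DB case looks roughly like this (names invented): jsonb plus a GIN index gets you indexed containment queries out of the box:

```sql
CREATE TABLE docs (id serial PRIMARY KEY, body jsonb);
INSERT INTO docs (body) VALUES ('{"user": "alice", "tags": ["pg", "json"]}');
CREATE INDEX docs_body_idx ON docs USING gin (body);
-- containment query, served by the index:
SELECT body->>'user' FROM docs WHERE body @> '{"tags": ["pg"]}';
```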
Sqlite is still awesome but PG is more awesomer. And since the cost is the same I just stick with PG now. Unless I'm resource constrained (which for my work is rare)
It’s not a replacement for a real DB but it is damn close.
The documentation seems to suggest that it is in fact pronounced "SQL-lite": https://www.sqlite.org/fullsql.html
> Don't be misled by the "Lite" in the name.
> nothing persistent would work reliably.
Edit: I just realised you are _the John Nagle_, so you likely have a more refined taste in certain kinds of programming than I do, however I think my point about incentives probably still stands.
A complicated thing which just works is exciting.
The GP was referring to "Databases shouldn't be exciting" - not in the traditional sense of the word, but a metaphor that hinges on "excitement" being new features, frivolous decorations, unnecessary tangents, losing vision of what a database should do, frantic patching and heated bikeshedding discussions.
It wasn't meant as "Ooooo I am excited about a database".
All of these features are exciting to the people that can use them.
It is the reason why the modern web sucks, and frankly, I wish the folks at Postgres would develop a new web standard, including browsers, not governed by corporations and implementable in a couple of weeks. Not to trivialize what goes into building browsers, but make the whole standard small enough that anyone can build a browser during a sabbatical.
It just seems to be the nature of the web side of things, lots of sloppiness.
The devs are smart enough, they just don't think and it shows.
That or they use something but clearly don't understand the thing they are using.
I replaced nearly 500 lines of code with ~60 by taking a bunch of massive conditionals, re-ordering them, then extracting them into methods and returning early on the simple case.
This isn't voodoo; you have to work harder and remember more to do it the wrong way.
Also, comments: they don't seem to understand that you document why you did the thing, not what you did. What you did is in the code. I don't trust your comments to tell me what you did; I want to know why you did it.
That, and general documentation (cross-link stuff, link at the top of the class to the documentation in the wiki, update the documentation when you get a ticket that changes the behaviour, write the high-level flow in plain English a 10-year-old could understand).
I actually like my job, the people are lovely and the work is interesting; it's just frustrating when an ounce of thought saves a pound of cure.
That is quite possibly one of the best quotes I have heard on how to approach discipline in engineering.
> Inexperience grows from the bottom
In what way?
> I've gotten "better" by simply being too stubborn to get worse
How does one get better by wanting to get worse?
This makes no sense to me.
I'm not sure I agree, but I can understand the sentiment.
It's not just "the web side of things". Which makes it even more scary.
You know, I feel like this has bled into backend development too. And it's not entirely the fault of engineers... or so I like to think, to excuse myself :P
What worries me a lot is that I find myself being one of those people, not being able to fully help my situation.
It's just... too much. It seems like every few months a new thing pops up and everyone suddenly wants to adopt it. And even if you try and resist it because it's not necessary, someone from above (e.g. CTO) will decide that the whole org has to adopt that and now you're screwed.
In a typical backend application you have to deal with:
- Application framework
- Testing framework
- Logging framework
- Metrics framework
- Visualization tool(s) for the three things above
Which is already quite a bit, but obviously manageable and you could get pretty knowledgeable on all of that if they stay static.
But hey, now your org has to expose every service on gRPC. Ok, one more thing to tack onto your app...
Now everyone has to expose GraphQL endpoints too!
Oh and next month we're migrating to a new metrics solution. Because. Great, let's throw away all the knowledge built on the old one and scramble to migrate to the new things we don't have time to learn fully.
Alright, now let's start writing new services in this new language.
And now everyone has to throw events on this new message bus thing that some team over there built using Kafka!
Finally, let's move everything off of AWS managed services and onto Kubernetes! Yay!
All of that can easily happen within the span of 2-3 years. And in the meantime you have to deal with on-call rotations, customer issues, implementing new features... and if you're a senior dev you're also busy mentoring folks, going to meetings with other teams, etc etc.
You know what I'm talking about, right? Over time, I feel like it's becoming impossible to achieve mastery of anything. There are way too many things to learn to be able to make sense of the whole, and no time to do deep dives on all of them. And then there's all the new stuff that keeps being adopted.
And then things blow up and they're freaking mysterious and when you finally find out what it was, you realize you'd only have been able to catch that problem beforehand if you'd read the freaking source code for some library you adopted.
It's madness. It's pure hell.
Boy I miss working on low level code.
Half my battles with my team are them wanting to pull in dependencies. Sometimes it's crazy, like a wrapper dependency pulled in just to use a different dependency, when it's simpler to pull in the indirect dependency and use it directly.
It's very much a thing that comes from experience though, when you've been bitten by upstream enough to be a little circumspect about it.
I agree with all of it, particularly "Let's throw away X and switch to Y; we lose all the domain knowledge we have with X, but Y offers this absolutely tiny marginal improvement, so it's worth it!"
As a lead I spend a lot of time pushing back on adding stuff until someone has made the case why we should add it.
You want to use Y? Great: document why you want to add Y, what benefit it brings, who runs Y, where the code is, how many maintainers it has, whether it's well tested, whether removing Y later would be a big job, what it offers that we can't currently do, etc.
It’s a big enough world where you don’t have to interact with every technology that exists, but this is one of the more prevalent ones - and you might find it useful to make peace with that community.
Anyway, good luck out there.
That said, there is some truth in it. The thing about the back end is that there's no satisfaction in it other than the satisfaction of doing it well and earning the esteem of peers, so most back end developers are paying attention to the quality of the code they write. The front end has other attractions as well, and many front-end developers start out focused on those attractions, and learn by experience (if they learn at all) that writing high quality code is related to the cool stuff that got them into programming in the first place.
Frontend and backend engineers should get together and build a better system that serves both well. That's what I meant by asking the Postgres core devs to build front-end tools, something like Gemini.
Yes, they are capable of making great software, but it's exactly what the GP said... they don't think. I've seen so much code which looks nice from the outside and is all structured. But if you start reading, it's riddled with wtfs and questions about why things weren't done in a simpler, more concise and stable way.
Recently had to deal with some Slack bot written in js at work, and my coworkers had to deal with me ranting about that crap for a whole afternoon.
The way I put it was, this code was written in the most complicated way possible, for no apparent reason.
Thank you, this has been entertaining.
Wait until this guy learns what the JS in "json" column types means, and how the largest webcrap sites rely on them for persistence.
In 2017 Bryan Cantrill gave a wonderful talk titled "Platform as a Reflection of Values: Joyent, Node.js and Beyond" https://vimeo.com/230142234 In it, he talks about how even though they liked JS as a technology, they came to realize that the values of the JS/Node.js community differed from their values at Joyent.
In the talk (at around 19:30), he identifies that in 2014 Joyent's core values wrt node.js were "debuggability", "robustness", and "stability"; while the core values of the broader node.js community were "approachability", "expressiveness", and "velocity". And there's nothing wrong with either of those sets of values, but they're not the same!
It's not entirely about persistence... it's about the ease and reliability of the persistence.
IIRC some Postgres folks have a plan to fix it? Maybe Craig knows?
There have been some significant improvements in the last couple years - most importantly the introduction of the freeze map in 9.6. But also quite a sprinkling of other incremental improvements.
It can obviously still be a problem, but it's not as pronounced as it once was.
On what exactly?
The freeze map bit I referenced above is the following commit:
Commit a892234f830e832110f63fc0a2afce2fb21d1584 gave us enough
infrastructure to avoid vacuuming pages where every tuple on the
page is already frozen. So, replace the notion of a scan_all or
whole-table vacuum with the less onerous notion of an "aggressive"
vacuum, which will scan pages that are all-visible, but still skip those
that are all-frozen.
This should greatly reduce the cost of anti-wraparound vacuuming
on large clusters where the majority of data is never touched
between one cycle and the next, because we'll no longer have to
read all of those pages only to find out that we don't need to
do anything with them.
Some of the additional changes were (reverse chronological order):
* 2020-03-28 - "Trigger autovacuum based on number of INSERTs" - https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...
This is important because it reduces the cost of an eventual anti-wraparound vacuum: the insert-triggered vacuums leave less work for it to do
* 2020-01-20 - "Allow vacuum command to process indexes in parallel." - https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...
As index processing can be expensive (no equivalent to scanning only changed parts of table), processing them in parallel can greatly reduce the time for a vacuum. Note that this isn't yet done by autovacuum.
* 2019-04-04 - "Allow VACUUM to be run with index cleanup disabled." - https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...
This can be extremely useful for a manual vacuum when getting close to a wraparound, since the index processing step is not necessary to stave off wraparound.
* 2018-04-04 - "Skip full index scan during cleanup of B-tree indexes when possible" - https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...
This can make vacuums for pretty clean tables vastly cheaper.
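The manual knobs from the 2019/2020 items above look roughly like this (assuming Postgres 12+ for INDEX_CLEANUP and 13+ for PARALLEL; table name invented):

```sql
-- Emergency anti-wraparound vacuum, skipping the expensive index step:
VACUUM (INDEX_CLEANUP OFF) big_table;
-- Manual vacuum using up to 4 workers for index processing:
VACUUM (PARALLEL 4) big_table;
```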
There's also a number of changes that make indexes smaller / less likely to bloat. That in turn makes vacuuming more efficient. E.g.
Edit: formatting (gah, why is HN formatting so limited)
- Support for a large number of connections (without external pooling tools), and stop starting a new process per connection (it consumes a huge amount of resources). Other DBMSs work with threads instead of processes and don't have this issue.
- Easy-to-configure high availability with multiple nodes
- Graph capabilities directly in PostgreSQL, without extensions or anything else
- Running scheduled tasks directly in PostgreSQL without external tools (for example, a script that removes old data from the database)
- Support for temporal tables
But overall I like PostgreSQL more than Oracle or SQL Server, even if I miss the above features.
This suggests to me that the guest here maybe doesn't know about collations? But that doesn't seem likely (possible?) given his stated experience and the roles he's held. So what gives?
Their app for search relied on this. The solution for them was the citext data type to create a case insensitive string - https://www.postgresql.org/docs/13/citext.html
I'm fairly familiar with Postgres, but I haven't heard of a collation that tackles case insensitivity without tweaking the system itself. In Postgres you can tweak this at the system level, but I've not seen it in the wild, and per the docs you're then nondeterministic. For MySQL it appears you can change this, but it seems surprising that they made case insensitivity the default.
Yes, another option could have been indexing and searching on lower or upper casing, but they wanted the minimal change to their app.
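Both approaches sketched out (table and column names invented): citext makes comparisons case-insensitive at the type level, while the expression-index route keeps plain text but requires queries to spell out lower():

```sql
CREATE EXTENSION IF NOT EXISTS citext;

-- Option 1: case-insensitive column type
CREATE TABLE users (email citext UNIQUE);
SELECT * FROM users WHERE email = 'Foo@Example.com';  -- matches any casing

-- Option 2: plain text plus an expression index
CREATE TABLE users2 (email text);
CREATE INDEX users2_email_lower ON users2 ((lower(email)));
SELECT * FROM users2 WHERE lower(email) = lower('Foo@Example.com');
```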
There was once a conference talk they gave about their migration process from MySQL->Postgres, but I'm not immediately able to find the video online so it may have been removed.
It's just not always enough without tuning, just like GC.
Online (forums, etc) this type of question usually elicits informative answers, and probably in face to face communication too.
"lemming status of developers"
Looks like you (or anyone else) can directly contribute these corrections
I'm pretty sure he's talking about Oleg Bartunov! He's one of the best speakers about Postgresql at Russian Highload conference: https://www.youtube.com/watch?v=uhvqly8MtoI&list=PLH-XmS0lSi...