Hacker News
A PostgreSQL response to Uber [pdf] (thebuild.com)
142 points by kissgyorgy on April 28, 2017 | 82 comments

Title should be changed to "A PostgreSQL consultancy's response to Uber [pdf]". Many people in this thread, including myself, assumed this was an official response by PostgreSQL.

Thanks. Comments like yours are why I always check the HN thread before deciding if something's worth reading.

The use of graphics in this PDF/slideshow was incredibly effective, more so than most articles I read.

The PDF lays out Uber's statements and either lays them over a real-world analogy (a road on a sinkhole for database corruption) or over a picture that primes the response (like a picture of apples and oranges when they plan to respond that Uber is comparing different features of MySQL and PostgreSQL).

The use of the elephant picture to give "elefacts" (sort of a parody of PolitiFact, where they evaluate the truth of Uber's statements) is also great.

The images add humor and reinforce the content - great use of graphics!

Not to mention the elephant painted to look like a taxi :-)

This is missing one huge point of the [uber engineer post](https://eng.uber.com/mysql-migration/).

They did not switch from one Postgres instance to putting all their data in one MySQL instance. They switched from a single Postgres instance to sharding their data across many MySQL instances. That is an entire reworking of the architecture, and this PowerPoint completely ignores it.
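To make the sharding point concrete, here is a minimal sketch of routing rows by entity key, assuming simple hash-modulo placement. The shard names, the `shard_for` helper, and the use of SHA-1 are hypothetical; this is not the scheme from Uber's post, only an illustration of the extra routing layer that sharding adds to every application.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to connection strings.
SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2", "mysql-shard-3"]


def shard_for(key: str, shards: list[str] = SHARDS) -> str:
    """Route an entity key to one shard using a stable hash.

    A stable hash (not Python's per-process randomized hash()) ensures that
    every application process routes the same key to the same shard.
    """
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    for trip_id in ["trip-1001", "trip-1002", "trip-1003"]:
        print(trip_id, "->", shard_for(trip_id))
```

Modulo placement moves most keys whenever the shard count changes, which is why sharded systems typically use a fixed set of logical shards mapped onto physical hosts, or consistent hashing, instead.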

Honestly I think Uber switching was more of a business decision as their employees had trouble figuring out how to properly use Postgres. Instagram definitely has different data needs, but is probably much bigger than Uber wrt data. They use Postgres successfully via Django no less!


As someone who participated in this transition, I don't think that was a part of it. The new database layer took quite a bit of instruction to use properly as well, and there were plenty of people misusing it.

> switched from a single postgres instance

did you even read the post?

There's replication, but I mean that all the data is in one pg instance, whereas there isn't a single mysql instance with all the data.

This comes across as a bit defensive and snarky. I'm not sure what it's hoping to achieve. The not-so-subtle message, reading between the lines, is that Uber engineers had bad experiences with postgres because they are morons. Well, maybe. But my main takeaway from that is that postgres is hard to use correctly, which, if true, is actually a good reason not to use it.

If only programmers could be polite to each other occasionally.

I do not think that Postgres is hard to use correctly, and I also think that Uber did a poor job of trying to fix their problems. This PDF just outlines the details. Uber also has a tendency to switch databases every few years. It is hard for me to take their opinion on databases seriously.


Postgres is not hard to use correctly, and the implication is that Uber didn't make an attempt to clear up certain misunderstandings, not that they wouldn't have been able to. It's not moronic to succumb to talking points or FUD you've seen in blogs, or to incompletely informed personal bias, when making decisions. It's just human. Since people are going to keep looking up literature on the topic, here's a new, higher-quality piece of literature.

Come on, it's fun. Programming can be so boring, a little passion and trolling can't hurt too much.

> This comes across as a bit defensive and snarky

As somebody who works on postgres a good chunk of his time, I agree. I preferred the initial type of responses (where we just looked into the valid complaints and disregarded other angles). But everyone can speak about something like this, and that's good too.

It's not harder to use than MySQL. And for everything (even MySQL) the rule of thumb is: if you are not confident with the technology don't use it or let somebody else manage it.

Sure, I wasn't making a claim about how easy to use postgres actually is. My point was that these slides give the impression (I assume inadvertently) that postgres is difficult to use by making fun of Uber engineers' alleged inability to use it correctly.

My reading was less that PostgreSQL was "too difficult to use" and more that MySQL is less strict about safety.

That difference is more substantial than it may sound. "An ounce of prevention is worth a pound of cure", as they say. While PostgreSQL may do things in a way that require a little extra annoyance up front, they're usually that way to prevent disastrous data integrity issues further down the line.

What can you say when someone says things like "We prefer logical replication because it uses less bandwidth"? There are extremely good justifications for the high bandwidth cost of real (bit-level) replication. Logical replication has uses but IMO it should be applied cautiously and sparingly.

I felt all of the ASCII shrugs were justified. They signify a tradeoff that Uber made. My personal feeling is that most of those tradeoffs were bad, but what can you do. The slides shrug because you can't argue. If that's the problemset that their organization prefers to deal with, more power to them.

Well, if your system spawns a new database connection inside an open transaction, then you have more problems than just your database. They even had another talk where they let their master die by just ignoring "free space left on device" messages. Even Skype could handle PostgreSQL, and they probably had a smaller pool of engineers.

I'm not a big fan of Uber as a company and I don't necessarily think that their engineering team is the shit (I really don't know). But my heuristic here is "any problem that Uber engineers aren't smart enough to avoid is a problem that I might very well face myself".

I'm not so sure that's an appropriate heuristic here. The problems they faced were partly a result of scale[1]. Most companies with scaling problems of the same magnitude as Uber's have engineers competent enough to correctly handle the issues they cited as reasons for switching. Honestly, there are plenty of people at any number of Postgres consultancies who could have helped them set up logical replication (since they complained about it being difficult to do), which seems like it would have solved many of their issues. I'd like to believe Uber had the resources to do this, if they had cared to.

[1] For example, write amplification.
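A deliberately simplified model of the write amplification Uber described. In PostgreSQL, index entries point at a row version's physical location, so an UPDATE that cannot be done as a HOT (heap-only tuple) update writes a new heap tuple plus a new entry in every index. In InnoDB, secondary index entries store the primary key, so only secondary indexes whose key columns changed are touched. The function names and counts are illustrative assumptions; the model ignores WAL/redo volume, page splits, and vacuum.

```python
def postgres_writes(num_indexes: int, changed_indexed_columns: int, room_on_page: bool) -> int:
    """Heap tuple + index entry writes for one PostgreSQL UPDATE (simplified).

    A HOT update is possible only when no indexed column changes and the new
    row version fits on the same heap page; then no index entries are added.
    Otherwise every index (including the primary key) gets a new entry.
    """
    heap_writes = 1
    hot = changed_indexed_columns == 0 and room_on_page
    index_writes = 0 if hot else num_indexes
    return heap_writes + index_writes


def innodb_writes(num_secondary_indexes: int, changed_secondary_indexes: int) -> int:
    """Clustered row + secondary index writes for one InnoDB UPDATE (simplified).

    Secondary indexes reference the primary key, not a physical location,
    so only secondary indexes whose key columns changed are rewritten.
    """
    clustered_writes = 1
    return clustered_writes + min(changed_secondary_indexes, num_secondary_indexes)


# Table with a primary key and 4 secondary indexes; the update changes one indexed column.
print(postgres_writes(num_indexes=5, changed_indexed_columns=1, room_on_page=True))  # 6
print(innodb_writes(num_secondary_indexes=4, changed_secondary_indexes=1))  # 2
print(postgres_writes(num_indexes=5, changed_indexed_columns=0, room_on_page=True))  # 1 (HOT)
```

The gap only opens up for non-HOT updates on heavily indexed tables, which is why it shows up as a scale problem rather than something most deployments notice.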

Well, I think a lot of problems arise in these fast-moving companies. A company that scales up slowly and moves its tech stack forward with a small team that adds people gradually (so that everybody can learn from people who have worked on the stack a little longer) will end up with a far saner tech stack than one with an extremely fast-growing team and technology.

Also, I loved some talks about Instagram (https://de.slideshare.net/iammutex/scaling-instagram): "Scaling: replacing all components of a car while driving it at 100mph" (and the end slides, 150+, are awesome too). They use PostgreSQL and talk about everything, but they also did the engineering quite right, and that's probably Uber's problem. They tried to get too big too fast, and now everything burns (not just their tech).

Having just read through the slide deck, I'm having a hard time coming to the same conclusion you do. Overall, I thought it was quite balanced, pointing out strengths and weaknesses of both MySQL and PostgreSQL. The author even started out with a slide with Inigo Montoya, "You insulted my elephant. Prepare to die", so he was clearly cognizant of the fact that this could be taken as purely a blind defense of PostgreSQL, and that is not the intent.

Would you mind taking the time to point out what phrases or slides gave you the impression that this was written from the position that "Uber engineers had bad experiences with postgres because they are morons"? I know that different people can get different impressions of the same material, so it would be helpful for me to understand better what gave you that impression.

There are a couple of comments like:

> I assume the company the size of Uber can figure it out. C’mon

> But... c’mon. Uber?

Both of these, I think, reflect the idea that Uber would likely have been able to keep using Postgres if they had been interested in fixing the issues they had with it, rather than having other motivations for switching. The belief is that the Postgres-specific issues they list are likely solvable if they had wanted to put in the effort. They're a large enough organization that they should have had the resources to do so.

That doesn't mean that their decision to move off of Postgres for those other reasons wasn't the right thing to do: just that there's not enough information there for us to really understand the decision process. From a Postgres community standpoint, it's important to make sure that they have quality answers to the issues publicly raised by very visible companies such as Uber. Many people will read about Uber's experience with Postgres, and it makes sense for the Postgres community to be clear what can be done about them.

As for your point that "postgres is hard to use correctly": I think a lot of the systems out there, not just Postgres, are hard to use at the scale that Uber or some of the other large installations operate at. That's when you really start to see where the stresses on the system show up, and what you need to know to tune and set it up correctly for your use case.

Overall, I think 'gdulli's response (https://news.ycombinator.com/item?id=14223170) is largely on point.

Like I said above, if you'd point out which parts struck you as particularly unfair, I know I'd benefit from it to hear more from your perspective.

I found the use of shrug emojis in response to substantive points from the Uber paper pretty infuriating.

I didn't say that postgres is difficult to use. I said that the slides give that impression.

Interesting. I'm not a fan of emoji (or memes for that matter) in general, but I realize they're part of the tech culture now and try not to let them bother me too much.

The first shrug emoji was after the 9.2 data corruption bug. I took that one to mean "Yup. What are we going to do? There was a bug, and we fixed it as quickly as we correctly could. Incredibly regrettable, but that kind of stuff is going to happen." There are bugs in software. Knowing the Postgres developer community, they take correctness very seriously.

The second one is after the Uber quote describing their tolerance for developers holding transactions open while performing blocking I/O, which they do from a position of inexperience, since they're not database experts. I understood this one to mean "If you're going to handle transactions in this manner, that's not something Postgres itself is going to be able to help you with."

The third was in response to the Uber quote regarding Uber application bugs which resulted in open idle connections. I took this shrug to mean that if this is an issue, it likely can (and should) be fixed in the Uber applications. It's not really a Postgres issue.

The fourth (and last) was in response to the lack of quantitative information regarding their Postgres issues (which makes honest, in-depth third-party investigation difficult), Uber's decision to go schema-less, and that MySQL is more tolerant of the bugs in Uber software. I took this one again to mean that there's little that Postgres itself is responsible for, or can do anything about here.

A question I always like to ask myself when someone has issues with something a third party is responsible for is what is a realistic and reasonable response from the third party. That often makes me realize that there's little they can be expected to do or are really responsible for, at least for some of the issues. For the most part, I think these shrugs reflect that.

If the shrug emojis were removed, would it be okay? Anything else?

The issue is that it's rude to quote someone and then respond with an emoji. If what they're saying is relevant then it merits a proper response. If it isn't relevant, or you don't have anything to say in response, then you shouldn't quote it in the first place.

You're welcome to come up with your own theories of what the emojis mean, but I don't see any point in doing so.

This is a slide deck from a presentation at Percona Live 2017.


We're missing everything that was said by the presenter. I would strongly suspect that the presenter said something about the slide.

Yeah that's a fair point. I’d suggest that’s a reason to think carefully before posting a slide deck, though. If essential points are left off the slides entirely, then it may not be such a good idea to post the slides by themselves.

I think many people haven't had the issues you personally do with the presentation. That said, other comments in this thread make clear that you are not the only one to have read them as snarky. It's very useful to be charitable and give people the benefit of the doubt when reading their communication. I think there's a lot of useful information in the slide deck presented in a succinct way, and publishing them as-is is a useful and efficient way to do so.

I think with very little effort it's easy to interpret the shrugs as I did above, and there's little if any additional information that needs to be addressed along with the final shrug. I wouldn't have presented it this way, but I don't see any malevolence or negative intent on the part of the author.

There's kinda two audiences for this paper: Uber, the engineering team that migrated away from psql, and the tech community at large, which saw Uber's technical decision as a signal about the quality of psql.

IMO, the shrug emoji is fine. If it were the only response, that'd be a problem, but the author gives a proper response in the following slide: it was a short-lived bug, and it's not like MySQL is objectively better. I can think of a few MySQL problems I've run into, like a data loss bug in the rollback code that was worked around by refusing transactions larger than 10 percent of the rollback buffer until a real fix could be published, which made backups, restores and migrations pretty much impossible.

But the tl;dr here is that psql will continue to get a black eye in the tech community as long as Slony exists and is useful.

The implication that the stated reasons are wrong and that Uber had some ulterior motive for switching is weird. They didn't have to say anything. They instead gave a rather detailed accounting of their reason. What's the motivation to lie? Nobody would have even known they switched if they hadn't said anything.

> stated reasons are wrong and that Uber had some ulterior motive

> What's the motivation to lie?

You're right. Uber didn't have to publish anything. In all of the discussion regarding this, I don't think anyone has tried to imply that Uber is lying about their reasons for switching. If that's how you read what I wrote above, that was not my intent (indeed, I tried very hard to make clear that isn't the case). That's different from the position that Uber, given their resources, likely could have gotten Postgres to work in their environment if the Postgres issues they described in the article were the only reason they chose MySQL.

Again, I don't think anyone is accusing Uber of lying about or obscuring the reasons they switched from PostgreSQL to MySQL. At least I know I'm not.

The slides do say "There was plenty of speculation about "real" motives."

I think this is humorous, and, in a way, brutal. Keep in mind it's not an official document from Postgres but rather just a guy who is big on Postgres writing a funny, and yes, snarky response to Uber. He makes some decent points, if not about the technologies themselves, about people switching to something more or less equivalent for reasons that could easily be interpreted as more gut reactions than solid business and tech necessities.

To me, Uber acted as though someone bought a Honda and it had some mechanical issues (and no seat heat), so he went apeshit, drove it off a cliff, then bought a Toyota thinking he will never have that problem again.

The response to "MySQL handles our devs’ bugs better" is ¯\_(ツ)_/¯ over and over, but, in my opinion, it's perfectly valid criticism. When writing a tool for real-world businesses to use, the path of least resistance needs to lead to bug-free code, and the tool must handle common buggy usage gracefully.

The path of least resistance should lead to quick and obvious failure rather than a false sense of security that the system is working. Careless development creates a price that has to be paid one way or another. You can't get something for nothing.

Wait, that's not what's happening. We're not talking about something that actually failed but appeared to work. We're talking about something that was maybe poorly used but still worked correctly. That can be genuinely beneficial.

That's a very good point. Unfortunately, it is not the meaning of "¯\_(ツ)_/¯" in any language that I know of.

> You can't get something for nothing.

If by that you mean that you can't have a developer tool whose "least-resistance" usage leads to anything other than failure (be it quick and obvious, or hidden), I'm sorry that that's the nature of the tools you've found. I do think we can have nice things; it's why the level of abstraction of our tools keeps steadily rising over time.

I put effort into learning the tools I use that are as important as the place where I store all the data. So it's a moot point.

If you want your product to succeed it has to be easy to get to work. That means having insecure defaults etc. When a product is in production, people will go through hell to make it work a little better; when they're trying out a product they'll drop it at the slightest inconvenience.

Some people work that way. Other people do a more in-depth evaluation. The first group tends to end up with MySQL, the second with PostgreSQL. I'm painting with a broad brush here, there are situations where it might make more sense to use MySQL.

It would be hard to argue that PostgreSQL is not successful. MySQL has a larger market share, but the PostgreSQL project is alive and thriving.

> When a product is in production, people will go through hell to make it work a little better

No they won't. https://www.linux.com/news/about-40000-mongodb-databases-fou...

> When writing a tool for real-world businesses to use, the path of least resistance needs to lead to bug-free code, and the tool must handle common buggy usage gracefully.

Disagree. There's a reason fail fast and hard is gaining steam as a best practice. The number of latent, surprising or buggy behaviours explodes combinatorially with each layer of a system that acts permissively in this way.

PostgreSQL doesn't catch the issues discussed in the presentation at compile time, in testing, or during canarying. The database "failing" here, according to the presentation, meant outages for Uber.

Except these weren't DB failures, they were very clearly programmer failures. And the OP suggested addressing such programming failures by using more permissive tools.

But you don't solve failures by making the systems involved more permissive, you solve problems by making them more strict. If the opposite were true, our most reliable systems would be written in bash or perl.

Take for example opening processes instead of threads. How would you make that more strict? Spin up a VM each time? Is MySQL wrong in using threads?

Strict vs. permissive has little bearing on this question, but it still isn't a valid problem on Uber's part as it's trivially solved by connection pooling.

"Trivially solved by <doing some work> instead of not having to do anything if you used MySQL" is precisely the ¯\_(ツ)_/¯ answer that's just a snarky way to ignore constructive criticism.

Is it constructive criticism? Doesn't seem like it to me. It's absurd to suggest that ~20 lines of code in a system of hundreds of thousands of lines amounts to a valid criticism.

Only in a world in which all else was equal between Postgres and MySQL would this even remotely be plausible as a criticism. We don't live in that world.
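To give a rough sense of the "~20 lines" in question, here is a minimal fixed-size connection pool sketch. The `ConnectionPool` class and its `connection()` method are hypothetical, and `sqlite3` stands in for a PostgreSQL driver only so the example runs without a server; in practice a driver's own pool or an external pooler such as PgBouncer does this job.

```python
import queue
import sqlite3
from contextlib import contextmanager
from typing import Callable, Iterator


class ConnectionPool:
    """Fixed-size pool of long-lived connections, reused across requests."""

    def __init__(self, connect: Callable[[], sqlite3.Connection], size: int) -> None:
        self._idle: "queue.Queue[sqlite3.Connection]" = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())  # open every connection once, up front

    @contextmanager
    def connection(self, timeout: float = 5.0) -> Iterator[sqlite3.Connection]:
        # Wait for a free connection; raises queue.Empty after `timeout` seconds.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Roll back anything the caller left uncommitted so the next user
            # never inherits a connection that is idle inside a transaction.
            conn.rollback()
            self._idle.put(conn)


pool = ConnectionPool(lambda: sqlite3.connect(":memory:", check_same_thread=False), size=4)
with pool.connection() as conn:
    print(conn.execute("SELECT 1").fetchone())  # prints (1,)
```

The database server then sees a bounded number of connections no matter how many requests the application handles, which is the point of pooling when each server-side connection is relatively expensive. Python PostgreSQL drivers also ship pools of their own (for example psycopg2's `psycopg2.pool` module).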

What 20 lines of code, and what has the amount to do with whether the criticism is constructive or not, and valid or invalid?

"Doing things this other way would serve us better, for this reason" is constructive criticism by definition. Non-constructive would be "doing things the way you do them is stupid".

Responding with ¯\_(ツ)_/¯, with "don't write any bugs lol", or with "just do extra work" is ignoring feedback.

My personal experience dealing with Postgres people vs. MySQL people has always been oddly lopsided, with Postgres users seeming to be crazy defensive about their product and getting very offended about any perceived slight when compared with MySQL, and MySQL users generally shrugging their shoulders and saying "post what?"

One camp seems to be made up of perfectionists who spend a lot of time worrying about how things "should" be, and the other seems to consist of pragmatists who just want it to work.

I will leave it to the reader to decide which is which and which has more appeal to business decision makers.

In case somebody missed it, there is another article from Uber, from a while back, justifying their MySQL -> Postgres move.


Really nice to see another side of Uber's issues. Mainly because since Uber is a huge company most people will read their article as the absolute truth.

Is there a video of this presentation? Was this from pgconf 2017?

I saw it presented at Percona Live three days ago. As far as I am aware, there was no video.

I think this slideshow is doing the project a disservice. MySQL is definitely a "lesser" database; that seems to be widely known and accepted. It also seems to be very frustrating for the Postgres team that this is not the sole factor, but it isn't. For a shared database it's not even that high up the list. This slideshow is bemoaning that Uber finds Postgres's design decisions a problem. "Really, they're fine decisions; just get a department to deal with them" is the message. And yes, that is very much the Postgres attitude.

The great thing about MySQL is that it generally just keeps working with incredibly small amounts of maintenance, whereas PostgreSQL constantly needs attention. This has always been my personal complaint. From vacuuming (yes, I used PostgreSQL before autovacuum, and you can still fuck up autovacuum) to upgrades, everything is just fiddly fiddly fiddly.

The end result is that with MySQL, you start it, you run it, you do your normal OS upgrades, and everything just kinda hums along. For years and years. PostgreSQL is like all enterprise solutions: you start it and run it, and a month or so later it suddenly refuses to accept connections, or it starts using too much disk (e.g. misconfigured autovacuum), or ... It has a bazillion things you need to configure and make cooperate, and there are large procedures for everything you need to do. Every week some warning light starts flashing and won't stop until you press a few buttons, even though it was perfectly predictable which buttons needed to be pressed. It forces you to consider 2000 configuration options instead of picking sensible defaults.

But yes, you get something back for that. A bigger, better, more correct and far more featureful database. In many ways it starts having the issues of other large databases (e.g. the 3-page-and-totally-inscrutable SQL stored procedure functions).

This is very much a case of "pick your poison". But frankly, if you want your app to just run, like we all do, MySQL will serve you better. If your OCD can't deal with small imperfections, datatypes that fit only 99%, or values that your text-mode SELECT can't print ... if those bother you, stay away from MySQL. And of course the classic: if you have a "real database workload" (very heavy load with constant reads AND constant writes), yes, you probably need PostgreSQL.

You could say MySQL is halfway between LevelDB and Postgres.

By the way, if you need a mobile database with zero maintenance, SQLite will serve you even better. It can't be shared with other applications and is not meant for database-behind-network approaches, but you'd be surprised how well it can work.

> I think this slideshow is doing the project a disservice.

I think you know, but I just want to emphasize: This is not the project's response, it's an individual's response.

This is fun, but here's the actual, less snarky, response from Postgres developers, as previously discussed: https://news.ycombinator.com/item?id=12201353

The index issue and the memory used per connection I can understand, but when they didn't even try one of the many logical replication systems that have been used at a scale bigger than Uber's (hello, Marco at Skype), the argument against Postgres gets a bit confusing. Uber seemed to understand some things in amazing depth and not understand others that are documented outside of Uber. I think politics played a big part.

I know PostgreSQL meant to defend themselves, but it just made the matter worse.

1) I didn't even know Uber had switched databases; now I know. I also know the reasons.

2) It comes across as unprofessional. You don't see Microsoft defending MSSQL this way; they let users see for themselves.

> I know PostgreSQL meant to defend themselves


> you don't see Microsoft defending MSSQL this way.

PostgreSQL Experts Inc. is a consultancy that specializes in PostgreSQL; they don't appear to be particularly linked to Postgres development organizationally (and none of their staff profiles highlight involvement in Postgres development). If this was the PostgreSQL Global Development Group, you'd be a bit more on point.

If it's not PostgreSQL but a consultancy instead, then the title should be changed to reflect that. The misunderstanding is likely to affect the comments (as above).

The title does already reflect that. The name of the consultancy is on the title page of the document.

This is not a response from the PostgreSQL team. The author is a consultant specialising in PostgreSQL.

I think they reacted perfectly - they showed they are passionate about their software, while keeping it lighthearted and informative.

Streisand effect indeed but I don't mind. I got to see two posts and some technical aspects I wouldn't otherwise.

> now I know. I also know the reasons.

Good for you. Now you can make informed decisions.

> matter worse.

What do you mean by "worse"? Your comment makes it sound like having issues in a software product is something embarrassing, something worth hiding. I always assumed it's a good thing to have an open discussion about problems in your own software, no?

I'm a little confused by the pdf. Is it implying that the program is opening and closing a connection for each query? Is that normal these days?

Some applications will indeed do that to guard against the server-side memory cost of thousands of postgres pool connections. It makes sense when queries are relatively rare.

This was the solution that we used at MX Logic in the early years, before we moved to pgbouncer and went back to long-lived stateless connection pools.

I thought it was more that they were opening a connection for several queries but doing other blocking IO tasks in between the queries, rather than just computation.
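A minimal sketch of that pattern and the usual fix: finish slow, blocking work (network calls, file I/O) before opening the transaction, and keep the transaction itself short. `fetch_fare` is a hypothetical stand-in for a slow external call, the `fares` table is made up, and `sqlite3` stands in for a PostgreSQL driver so the sketch runs anywhere.

```python
import sqlite3
import time


def fetch_fare(trip_id: int) -> float:
    """Hypothetical slow external call (e.g. an HTTP request)."""
    time.sleep(0.1)
    return 12.5 + trip_id


def record_fare_long_txn(conn: sqlite3.Connection, trip_id: int) -> None:
    # Anti-pattern: the INSERT opens a transaction that stays open while we
    # wait on I/O. On a server database, row locks are held for the whole call,
    # and a long-running transaction can also hold back VACUUM.
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO fares (trip_id, amount) VALUES (?, NULL)", (trip_id,))
        amount = fetch_fare(trip_id)
        conn.execute("UPDATE fares SET amount = ? WHERE trip_id = ?", (amount, trip_id))


def record_fare_short_txn(conn: sqlite3.Connection, trip_id: int) -> None:
    # Preferred: do the blocking work first, then write in one short transaction.
    amount = fetch_fare(trip_id)
    with conn:
        conn.execute("INSERT INTO fares (trip_id, amount) VALUES (?, ?)", (trip_id, amount))
```

Both functions store the same data; the difference is only how long the transaction stays open, which is what matters once many requests run concurrently.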

Facebook (TAO), Dropbox (Edgestore) also moved to a schema-less database design. I wonder how these large orgs manage versioning of their models...

You put a version key & number in the model blob metadata itself.
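A minimal sketch of that idea, assuming JSON blobs and a hypothetical `schema_version` key: readers upgrade old records one version at a time on read, and writers always store the current version, so old blobs never need a bulk migration. The field names and upgrade steps are made up for illustration.

```python
import json
from typing import Any, Callable, Dict

Record = Dict[str, Any]


def _v1_to_v2(rec: Record) -> Record:
    # Hypothetical change: v2 splits "name" into "first_name" / "last_name".
    first, _, last = rec.pop("name").partition(" ")
    rec.update(first_name=first, last_name=last)
    return rec


def _v2_to_v3(rec: Record) -> Record:
    # Hypothetical change: v3 stores the fare in integer cents instead of a float.
    rec["fare_cents"] = round(rec.pop("fare") * 100)
    return rec


# Maps version N to the function that upgrades a record from N to N + 1.
UPGRADES: Dict[int, Callable[[Record], Record]] = {1: _v1_to_v2, 2: _v2_to_v3}
CURRENT_VERSION = 3


def load(blob: str) -> Record:
    """Decode a stored blob and upgrade it step by step to CURRENT_VERSION."""
    rec = json.loads(blob)
    version = rec.get("schema_version", 1)  # blobs written before versioning count as v1
    while version < CURRENT_VERSION:
        rec = UPGRADES[version](rec)
        version += 1
        rec["schema_version"] = version
    return rec


def dump(rec: Record) -> str:
    """Always write blobs at the current version."""
    return json.dumps({**rec, "schema_version": CURRENT_VERSION})


print(load('{"name": "Ada Lovelace", "fare": 12.34}'))
# {'first_name': 'Ada', 'last_name': 'Lovelace', 'fare_cents': 1234, 'schema_version': 3}
```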

Love the shrug emoji

I think it is an emoticon, but I am not sure! Or are emoticons always sideways? Hmm...!

I always thought "emoticon" is the most generic word that includes everything from ":)" to the shrug to unicode emoji.

given that it's made with unicode characters, as opposed to just ascii, I believe it is an emojicon ;)

The reason (and timing) for the somewhat snarky response is probably because Uber right now have a bit of a black mark against them PR wise.

Meanwhile Amazon and AWS have essentially banned putting a relational database behind any publicly facing website or service.

While I don't doubt that they'll tell you it was a scaling decision, it actually has more to do with an architectural decision that proved difficult to overcome once its flaw became apparent.

They operated warehouses using a monolithic Oracle database, one for each FC (fulfillment center). They had hundreds of different services using the same database. Whenever one service wanted to do something new, they had to spend a massive amount of time running their proposed database change past every team on the database. I've seen a single column addition take 9 months and hundreds of engineer hours to get approved.

So once the warehouses got really big, sharding was the obvious answer, but they couldn't make sharding work because they couldn't coordinate their way out of the mess they had created. They couldn't scale because they had engineered themselves into a corner that made it impossible to use normal best practices for scaling SQL databases.

NoSQL has an interesting lack of a feature that solves their problem. Because they're not relational, they don't really work very well for sharing data across services and teams, so they don't get into major coordination tangles on shared databases. Maybe that works for them, but it's more of an indictment of their engineering culture than a slight on SQL databases. And it's pretty punitive in a TSA kind of way: We fucked up once so none of you can have nice things anymore.

I'm interested to know where you got this from -- is there a link or video somewhere to find out more?

Can anybody explain about the buffer pools part?

Caching can happen in a number of places, with varying degrees of success. In psql, a lot of caching is deferred to the OS filesystem layer. In MySQL, apparently, there is an additional cache in the MySQL address space.

There are two sides to this. Generally speaking, a general-purpose OS will use cache management algorithms that suck compared to what an application could do, because the application has more structured knowledge. In the case of a DB, it knows about indexes and row sizes, and is less likely to evict half an index or row.

On the other hand, the OS is sort of the last authority. Varnish, in particular, argues that programmers should rely on the OS caching algorithms, because you have them whether or not you want them. A poor interaction between userspace and kernelspace caches can end up increasing I/O activity if the kernel pages something out to disk before userspace does (Varnish had a doc somewhere explaining this better, which I can no longer find). The penalty here, though, is context switching. A userspace cache is available in memory, whereas a filesystem/buffer cache will incur a context switch to retrieve the data from kernelspace to userspace.

Finally, both have a number of caches, so this is more about how much and what type of userspace caching.
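To make the userspace-cache side concrete, here is a toy buffer pool with plain least-recently-used eviction. It is a simplification: InnoDB's buffer pool uses a modified LRU with midpoint insertion, and PostgreSQL's shared buffers use a clock-sweep algorithm. `BufferPool` and the `read_page_from_disk` callback are hypothetical names for illustration.

```python
from collections import OrderedDict
from typing import Callable


class BufferPool:
    """Toy userspace page cache with plain LRU eviction."""

    def __init__(self, capacity: int, read_page_from_disk: Callable[[int], bytes]) -> None:
        self.capacity = capacity
        self._read = read_page_from_disk
        self._pages: "OrderedDict[int, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, page_no: int) -> bytes:
        if page_no in self._pages:
            self.hits += 1
            self._pages.move_to_end(page_no)  # mark as most recently used
            return self._pages[page_no]
        self.misses += 1
        # On a real system this read may still be served by the OS page cache,
        # which is the double-caching interaction described above.
        data = self._read(page_no)
        self._pages[page_no] = data
        if len(self._pages) > self.capacity:
            self._pages.popitem(last=False)  # evict the least recently used page
        return data


pool = BufferPool(capacity=2, read_page_from_disk=lambda n: f"page-{n}".encode())
for n in [1, 2, 1, 3, 2]:
    pool.get(n)
print(pool.hits, pool.misses)  # 1 4
```

The application-level advantage mentioned above comes from replacing plain LRU with policies that know about access patterns (for example, keeping a one-off table scan from flushing hot index pages), which a general-purpose OS cache cannot do.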
