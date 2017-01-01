The PDF lays out Uber's statements, and either lays them over a real-world analogy (a road on a sinkhole for database corruption) or lays them over a picture that primes the response (like a picture of apples and oranges when they plan to respond that Uber is comparing different features of MySQL and PostgreSQL).
The use of the elephant picture to give "elefacts" (sort of a parody of PolitiFact, where they evaluate the truth of Uber's statements) is also great.
The images add humor and reinforce the content - great use of graphics!
They did not switch from a Postgres instance to put all their data in a MySQL instance. They switched from a single Postgres instance to sharding their data across many MySQL instances. This is an entire reworking of the architecture that is completely ignored in this PowerPoint.
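To make the architectural difference concrete, here is a minimal sketch of hash-based shard routing. It is an illustration only: the shard names and the choice of SHA-256 are my assumptions, and nothing here is taken from Uber's post or their actual design.

```python
import hashlib

# Hypothetical instance names, for illustration only.
SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2", "mysql-shard-3"]


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash.

    A stable hash (rather than Python's randomized built-in hash())
    ensures that every application process routes a key the same way.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def instance_for(key: str) -> str:
    """Return the (hypothetical) instance that owns the given key."""
    return SHARDS[shard_for(key, len(SHARDS))]
```

With a single instance, every query goes to one place. With a scheme like this, the application (or a layer in front of the databases) has to pick the right instance for each key, and cross-shard queries and transactions become the application's problem.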
did you even read the post?
If only programmers could be polite to each other occasionally.
As somebody who works on postgres a good chunk of his time, I agree. I preferred the initial type of responses (where we just looked into the valid complaints and disregarded other angles). But everyone can speak about something like this, and that's good too.
That difference is more substantial than it may sound. "An ounce of prevention is worth a pound of cure", as they say. While PostgreSQL may do things in a way that requires a little extra annoyance up front, they're usually that way to prevent disastrous data integrity issues further down the line.
What can you say when someone says things like "We prefer logical replication because it uses less bandwidth"? There are extremely good justifications for the high bandwidth cost of real (bit-level) replication. Logical replication has uses but IMO it should be applied cautiously and sparingly.
I felt all of the ASCII shrugs were justified. They signify a tradeoff that Uber made. My personal feeling is that most of those tradeoffs were bad, but what can you do. The slides shrug because you can't argue. If that's the problemset that their organization prefers to deal with, more power to them.
[1] For example, write amplification.
Also, I loved some talks about Instagram (https://de.slideshare.net/iammutex/scaling-instagram).
Scaling: replacing all components of a car while driving it at 100mph (And the end slides 150+ are awesome, too).
And while they use PostgreSQL and talk about everything, they also did the engineering quite right, and that's probably Uber's problem.
They tried to get too big too fast, and now everything burns (not just their tech).
Would you mind taking the time to point out what phrases or slides gave you the impression that this was written from the position that "Uber engineers had bad experiences with postgres because they are morons"? I know that different people can get different impressions of the same material, so it would be helpful for me to understand better what gave you that impression.
There are a couple of comments like:
> I assume the company the size of Uber can figure it out. C’mon
> But... c’mon. Uber?
Both of these I think reflect the idea that Uber likely could have continued to use Postgres if they had been interested in fixing the issues they had with the system, and that they likely had additional, other motivations for switching. The belief is that the Postgres-specific issues they list are likely soluble had they wanted to put in the effort. They're a large enough organization that they should have had the resources to do so.
That doesn't mean that their decision to move off of Postgres for those other reasons wasn't the right thing to do: just that there's not enough information there for us to really understand the decision process. From a Postgres community standpoint, it's important to make sure that they have quality answers to the issues publicly raised by very visible companies such as Uber. Many people will read about Uber's experience with Postgres, and it makes sense for the Postgres community to be clear what can be done about them.
As for your point about "postgres is hard to use correctly": I think it's hard to use a lot of the systems out there, not just Postgres, at the scale that Uber or some of the other large installations do. That's when you really become aware of where the stresses put on the systems start to show and what you need to be aware of to tune and set them up correctly for your use case.
Overall, I think 'gdulli's response (https://news.ycombinator.com/item?id=14223170) is largely on point.
Like I said above, if you'd point out which parts struck you as particularly unfair, I know I'd benefit from it to hear more from your perspective.
I didn't say that postgres is difficult to use. I said that the slides give that impression.
The first shrug emoji was after the 9.2 data corruption bug. I took that one to mean "Yup. What are we going to do? There was a bug, and we fixed it as quickly as we correctly could. Incredibly regrettable, but that kind of stuff is going to happen." There are bugs in software. Knowing the Postgres developer community, they take correctness very seriously.
The second one is after the Uber quote which describes their tolerance for developers holding open transactions and blocking I/O operations, which they do from a position of inexperience because they're not database experts. I understood this one to mean "If you're going to handle transactions in this manner, that's not something Postgres itself is going to be able to help you with."
The third was in response to the Uber quote regarding Uber application bugs which resulted in open idle connections. I took this shrug to mean that if this is an issue, it likely can (and should) be fixed in the Uber applications. It's not really a Postgres issue.
The fourth (and last) was in response to the lack of quantitative information regarding their Postgres issues (which makes honest, in-depth third-party investigation difficult), Uber's decision to go schema-less, and that MySQL is more tolerant of the bugs in Uber software. I took this one again to mean that there's little that Postgres itself is responsible for, or can do anything about here.
A question I always like to ask myself when someone has issues with something a third party is responsible for is what is a realistic and reasonable response from the third party. That often makes me realize that there's little they can be expected to do or are really responsible for, at least for some of the issues. For the most part, I think these shrugs reflect that.
If the shrug emojis were removed, would it be okay? Anything else?
You're welcome to come up with your own theories of what the emojis mean, but I don't see any point in doing so.
IMO, the shrug emoji is fine. If it were the only response, that'd be a problem but the author gives a proper response in the following slide: it was a short lived bug, and it's not like MySQL is objectively better. I can think of a few MySQL problems I've run into, like fixing a data loss bug in the rollback code by refusing transactions larger than 10 percent of the rollback buffer until a real fix can be published, making backups, restores and migrations pretty much impossible.
But the tl;dr here is that psql will continue to get a black eye in the tech community as long as slony exists and is useful.
We're missing everything that was said by the presenter. I would strongly suspect that the presenter said something about the slide.
I think with very little effort it's easy to interpret the shrugs as I did above, and there's little if any additional information that needs to be addressed along with the final shrug. I wouldn't have presented it this way, but I don't see any malevolence or negative intent on the part of the author.
> What's the motivation to lie?
You're right. Uber didn't have to publish anything. In all of the discussion regarding this, I don't think anyone has tried to imply that Uber is lying about their reasons for switching. If that's how you read what I wrote above, that was not my intent (indeed, I tried very hard to make clear that isn't the case). That's different from the position that Uber, given their resources, likely could have gotten Postgres to work in their environment if purely the Postgres issues they described in the article were why they chose to use MySQL.
Again, I don't think anyone is accusing Uber of lying about or obscuring the reasons they switched from PostgreSQL to MySQL. At least I know I'm not.
If by that you mean that you can't have a developer tool whose "least-resistance" usage leads to anything other than failure (be it quick and obvious, or hidden), I'm sorry that that's the nature of the tools you've found. I do think we can have nice things; it's why the level of abstraction of our tools keeps steadily rising over time.
It would be hard to argue that PostgreSQL is not successful. MySQL has a larger market share, but the PostgreSQL project is alive and thriving.
No they won't. https://www.linux.com/news/about-40000-mongodb-databases-fou...
Disagree. There's a reason fail fast and hard is gaining steam as a best practice. The number of latent, surprising or buggy behaviours explodes combinatorially with each layer of a system that acts permissively in this way.
But you don't solve failures by making the systems involved more permissive, you solve problems by making them more strict. If the opposite were true, our most reliable systems would be written in bash or perl.
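A toy sketch of the difference, in Python. Both functions are hypothetical and do not model any particular database's behaviour; they only contrast silently "fixing" bad input with rejecting it at the boundary.

```python
def store_permissive(value: str, max_len: int) -> str:
    """Permissive: silently truncate oversized input and carry on.

    The caller never learns that data was lost.
    """
    return value[:max_len]


def store_strict(value: str, max_len: int) -> str:
    """Strict: fail fast and loudly when the input does not fit."""
    if len(value) > max_len:
        raise ValueError(
            f"value of length {len(value)} exceeds limit {max_len}"
        )
    return value
```

The permissive version keeps the application running, but the corruption only surfaces later and somewhere else. The strict version forces the bug to be fixed where it was introduced.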
Only in a world in which all else was equal between Postgres and MySQL would this even remotely be plausible as a criticism. We don't live in that world.
"Doing things this other way would serve us better, for this reason" is constructive criticism by definition. Non-constructive would be "doing things the way you do them is stupid".
Responding with ¯\_(ツ)_/¯, with "don't write any bugs lol", or with "just do extra work" is ignoring feedback.
To me, Uber acted as though someone bought a Honda and it had some mechanical issues (and no seat heat), so he went apeshit, drove it off a cliff, then bought a Toyota thinking he will never have that problem again.
One camp seems to be made up of perfectionists who spend a lot of time worrying about how things "should" be, and the other seems to consist of pragmatists who just want it to work.
I will leave it to the reader to decide which is which and which has more appeal to business decision makers.
The great thing about MySQL is that it generally just keeps working with incredibly small amounts of maintenance, whereas PostgreSQL just constantly needs attention. This has always been my personal complaint. From vacuuming (yes, I used PostgreSQL before autovacuum, and you can still fuck up autovacuum) to upgrades, everything is just fiddly fiddly fiddly.
The end result is that with MySQL, you start it, you run it, you do your normal OS upgrades and everything just kinda hums along. For years and years. PostgreSQL is like all enterprise solutions: you start it and run it, and a month or so later it suddenly refuses to accept connections, or suddenly it starts using too much disk (e.g. from misconfiguring autovacuum), or ... It has a bazillion things you need to configure and make cooperate, and there are large procedures for everything you need to do. Every week some warning light goes all flashy and won't stop flashing until it has made you press a few buttons, where it was perfectly predictable which buttons needed to be pressed. It forces you to consider 2000 configuration options, asking you instead of picking sensible defaults.
But yes, you get something back for that. A bigger, better, more correct and far more featureful database. In many ways it starts having the issues of other large databases (e.g. the 3-page-and-totally-inscrutable SQL stored procedure functions).
This is very much a case of "pick your poison". But frankly, if you want your app to just run, like we all do, MySQL will serve you better. If your OCD can't deal with small imperfections, datatypes that fit only 99%, having values that your text mode SELECT in the database can't print ... if those bother you, stay away from MySQL. And of course the classic: if you have a "real database workload" (very heavy load with constant reads AND constant writes), yes, you probably need PostgreSQL.
You could say MySQL is halfway between LevelDB and Postgres.
By the way, if you need a mobile database with zero maintenance, SQLite will serve you even better. It can't be shared with other applications and is not meant for database-behind-network approaches, but you'd be surprised how well it can work.
I think you know, but I just want to emphasize: This is not the project's response, it's an individual's response.
This was the solution that we used at MX Logic in the early years, before we moved to pgbouncer and went back to long-lived stateless connection pools.
They operated warehouses using a monolithic oracle database, one for each FC. They had hundreds of different services using the same database. Whenever one service wanted to do something new, they had to spend a massive amount of time running their proposed database change past every team on the database. I've seen a single column addition take 9 months and hundreds of engineer hours to get approved.
So once the warehouses got really big, sharding was the obvious answer, but they couldn't make sharding work because they couldn't coordinate their way out of the mess they created. They couldn't scale because they engineered themselves into a corner that made it impossible to use normal best practices for scaling SQL databases.
NoSQL has an interesting lack of a feature that solves their problem. Because NoSQL databases are not relational, they don't really work very well for sharing data across services and teams, so teams don't get into major coordination tangles on shared databases. Maybe that works for them, but it's more of an indictment of their engineering culture than it is a slight on SQL databases. And it's pretty punitive in a TSA kind of way: We fucked up once so none of you can have nice things anymore.
There are two sides to this. Generally speaking, a general-purpose OS will use cache management algorithms that suck compared to what an application could do, because the application has more structured knowledge. In the case of a DB, it knows about indexes, and row sizes, and is less likely to evict half an index or row.
On the other hand, the OS is sort of the last authority. Varnish, in particular, argues that programmers should rely on the OS caching algorithms, because you have them whether or not you want them. A poor interaction between userspace and kernelspace caches can end up increasing I/O activity if the kernel pages something to disk before the userspace does (Varnish had a doc somewhere explaining this better, which I can no longer find). The penalty here, though, is context switching. A userspace cache is available in memory, whereas a filesystem / buffer cache will incur a context switch to retrieve the data from kernelspace to userspace.
Finally, both have a number of caches, so this is more about how much and what type of userspace caching.
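For anyone who hasn't looked at one, here is a minimal sketch of a userspace cache with plain LRU eviction. This is only an illustration of the idea; the class is hypothetical, and a real database buffer manager uses structure-aware policies (knowledge of indexes, pinned pages and so on) rather than a bare LRU.

```python
from collections import OrderedDict


class LRUCache:
    """A tiny userspace cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        """Return the cached value and mark the key as recently used."""
        if key not in self._items:
            return default
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        """Insert or update a value, evicting the oldest entry if full."""
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)
```

Everything cached here is also a candidate for the OS page cache (and for being paged out), which is exactly the double-caching interaction described above.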
1) I didn't even know Uber switched databases; now I know. I also know the reasons.
2) Comes across as unprofessional, you don't see Microsoft defending MSSQL this way. They let the users see it for themselves.
[...]
> you don't see Microsoft defending MSSQL this way.
PostgreSQL Experts Inc. is a consultancy that specializes in PostgreSQL; they don't appear to be particularly linked to Postgres development organizationally (and none of their staff profiles highlight involvement in Postgres development). If this was the PostgreSQL Global Development Group, you'd be a bit more on point.
Good for you. Now you can make informed decisions.
> matter worse.
What do you mean by "worse"? Your comment sounds like having issues in a software product is something embarrassing, something worth hiding. I always assumed that it is a good thing to have an open discussion about problems in your own software, no?
