
Why Uber Engineering Switched from Postgres to MySQL - myhrvold
https://eng.uber.com/mysql-migration/
======
jedberg
> MySQL supports multiple different replication modes:

> Statement-based replication replicates logical SQL statements (e.g., it
> would literally replicate literal statements such as: UPDATE users SET
> birth_year=770 WHERE id = 4)

Postgres has that too (using a 3rd party tool, but it's an officially
supported tool). We were using it on reddit 10 years ago. It caused a lot of
problems. I wouldn't call that an advantage for MySQL.

Honestly, reading this it seems like the summary is: "We don't follow great
engineering practices so we need a database more forgiving". Which is fine if
that's how you want to run your business, but isn't really the death knell for
Postgres.

A specific example:

> This problem might not be apparent to application developers writing code
> that obscures where transactions start and end. For instance, say a
> developer has some code that has to email a receipt to a user. Depending on
> how it’s written, the code may implicitly have a database transaction that’s
> held open until after the email finishes sending. While it’s always bad form
> to let your code hold open database transactions while performing unrelated
> blocking I/O, the reality is that most engineers are not database experts
> and may not always understand this problem, especially when using an ORM
> that obscures low-level details like open transactions.

Your developer should understand database transactions. But you should make it
easier for them by abstracting it so that they don't have to. And in this
particular case, I'd say they shouldn't be using the database to do locking
around sending a receipt. It should be put into a queue and that queue should
be processed separately, which avoids the transaction problem altogether.
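The queue pattern described above could look something like this minimal sketch (the schema, names, and in-process queue are all illustrative assumptions; a real system would use a durable queue):

```python
import queue
import sqlite3

# Hypothetical sketch: commit the database work first, then hand the
# email off to a queue that a separate worker drains, so no transaction
# is held open during slow I/O.
receipt_queue = queue.Queue()

def record_purchase(conn, user_id, amount):
    with conn:  # the transaction covers only the database write
        conn.execute(
            "INSERT INTO receipts (user_id, amount) VALUES (?, ?)",
            (user_id, amount),
        )
    # already committed here; the slow email send happens elsewhere
    receipt_queue.put({"user_id": user_id, "amount": amount})

def email_worker(send_email):
    # runs separately, touching no open database transaction
    while not receipt_queue.empty():
        job = receipt_queue.get()
        send_email(job["user_id"], job["amount"])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE receipts (user_id INTEGER, amount REAL)")
record_purchase(conn, 4, 19.99)

sent = []
email_worker(lambda uid, amt: sent.append((uid, amt)))
```

Even if the email worker dies or blocks, the database row is already committed and no locks are held.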

~~~
spotman
Try enforcing this on teams that use ORMs like hibernate with 500 developers.

Super, duper common issue; you will find this at every large shop at some
point in its lifetime, usually around the time it's hiring and expanding
extremely fast and taking on some tech debt.

All functions of extreme scale, hyper-growth, and, yeah, not following the
absolute best practices all the time. But tech debt is like any debt: you get
something now and pay later. If they continue going up and to the right they
will be able to afford it.

~~~
djsumdog
ORMs have mostly just been painful at every shop I've been at. I've used
ActiveRecord, Squirl, Hibernate, django.db... they're all various levels of
suck.

The one huge advantage of an ORM is the ability to support multiple databases,
but that only really works if you can do everything through the ORM. The
moment a query is too complex and you need to write some SQL, you need case
statements and multiple integration tests for every database your product
needs to support.

They remove some boilerplate while adding other boilerplate. In one of my own
projects, I just created several files (pgsql.commands, mysql.commands, etc.),
a basic set of classes around them, and a base set of commands that works for
all the DBs I wanted to support (so the command files had an inheritance
model, albeit only one layer).

With all that being said, most ORMs I've used do have explicit transaction
support. I know Squirl had a `transaction {}` block you could wrap commands
in. Transactions shouldn't be an excuse. They should be off by default and
explicitly added around blocks of things that need to be atomic.
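A minimal sketch of that explicit-transaction idea (plain sqlite3 here, not any particular ORM's API; the schema is invented for illustration):

```python
import sqlite3

# Autocommit is the default; atomicity is opt-in around the statements
# that actually need it, and a failure rolls the whole block back.
conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, birth_year INTEGER)")
conn.execute("INSERT INTO users (id, birth_year) VALUES (4, 770)")

try:
    conn.execute("BEGIN")  # explicit transaction block starts here
    conn.execute("UPDATE users SET birth_year = 771 WHERE id = 4")
    raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    conn.execute("ROLLBACK")  # the whole block is undone atomically

year = conn.execute("SELECT birth_year FROM users WHERE id = 4").fetchone()[0]
# year is still 770 after the rollback
```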

> Try enforcing this on teams that use ORMs like hibernate with 500
> developers.

I realize this is hyperbole (I hope), because you really shouldn't have 500
developers all on the same monolithic project (unless you're developing,
like... the Linux kernel). Getting your team to at least try to implement best
practices does take some effort, but with things like weekly demos and
code-reviewed commits, it's doable.

~~~
jessedhillon
I used to be a huge proponent of ORMs everywhere, but I've come to realize
that if you're writing your app in such a way that a developer needs to be
able to do any arbitrary data fetch or transformation whenever they want,
that's your real problem. The set of retrievals and transformations you want
to support should be well-defined, and abstracted into a layer whose interface
allows the level above it to only think in terms of models and not their
backing store.

After you have that, then it doesn't even matter on the backend. The models
you present to the layer above can have complex and changing relationships to
the actual storage -- maybe they contain data which is derived from what's in
the database, but transformed after being fetched so that none of their
properties actually correspond to a column or a field in a store. In my
experience -- having seen the tragedy that is a Rails project gone full
ActiveRecord -- this pattern enforces an excellent separation of concerns and
constrains a problem which can otherwise grow unboundedly in complexity.

~~~
barrkel
I don't really agree; I think you either need to have a very thin layer
between your DB facts and your domain, or else use the DB as a kind of
persistence layer for a complex graph.

The latter only really works if you've got a primary implementation language
and aren't integrating lots of applications / libraries written in different
languages communicating with the same database. You need to go down the SOA /
distributed RPC / FFI route to integrate different languages, and that has its
own complexities.

Personally I prefer treating the DB as a canonical store of facts. Models with
a lot of code are pretty suspect. Retrievals can be tuned to just the facts
required, ma'am - you don't accidentally drag in the banana + gorilla + whole
forest. Doesn't stop you building a higher-level service layer if that's what
you need, either. You'll need that when you scale up anyway; chatty models
won't work at that level either.

~~~
jessedhillon
Yeah, I'm saying that as your app grows out of being a simple CRUD app into
something more useful and involved, there will be less of a relationship
between what you need to store things efficiently, and what you need to
present them well. Your model will become more graph-like, probably. For this
reason, patterns designed around AR-style models will fail to scale. I
disagree that this only works in a mono-lingual environment, although you will
need tooling and infrastructure to support it; a model-centric architecture
typically doesn't afford the possibility of multi-lingual support.

The code doesn't go in the models, it goes in the service/arbitration layer.
DB as a 1:1 representation of what yet-unforeseen
features, UIs, and platforms will need is a naive and limiting
assumption. You have to build your application in a way that future product
needs won't be constrained by storage and modeling decisions, which is a
tension that Rails apps frequently encounter.

------
ledjon
I would argue that most of these Postgres "flaws" are actually advantages over
MySQL when you look at them holistically rather than the very specific Uber
use-case.

Postgres's MVCC is superior (you can roll back DDL, add indexes online, and
hold read transactions open for a VERY long time without impacting other
parts of the system).

Postgres supports many types of indexes, not just b-tree. One thing it doesn't
have is clustered b-tree indexes... which is really the thing MySQL does that
makes it somewhat "better." I wonder how Uber adds an index to a table that
already has 1B+ rows in it with MySQL?

Postgres's WAL-level replication is a better guarantee of actually
replicating the data correctly. I cannot tell you how many times I've had to
tell my boss that the "mysql replicas might be slightly out of sync with the
master" because of various replication issues. The way MySQL handles triggers
and scheduled events alone is garbage and can very easily break replication
and/or silently cause inconsistency.

As for data corruption: if there is a bug that causes corruption, then there
is a bug. I don't think that is a fundamental design flaw as implied in this
article. You shouldn't rely on a half-assed replication design to accidentally
save you from a data corruption bug. There are many downsides to MySQL's
design that are simply not listed here.

I have been both a professional MySQL administrator as well as Postgresql (as
well as SQL Server and many NoSQL engines). Many of these Postgres issues are
only issues at crazy huge scale, and I would say at that point you probably
want to move away from relational anyway. MySQL has its own very large set of
problems at scale as well.

It sounds like Uber is using MySQL as just a data bucket with primary keys
("Schemaless") which is good -- because you can't alter tables to save your
life with MySQL.

At the end of the day each developer/business needs to use what works for
them, but I would really shy away from pointing to this article as a linchpin
in the "MySQL vs. Postgres" war (if there even is such a thing).

~~~
jobu
> _It sounds like Uber is using MySQL as just a data bucket with primary keys_

They have a couple posts about "Schemaless", but I still don't understand why
they used MySQL as the data store instead of something like Cassandra. (
[https://eng.uber.com/schemaless-part-one/](https://eng.uber.com/schemaless-
part-one/) ) From that post it looks like they basically built a no-sql
database on top of a relational database.

The only reason given was operational trust ( _" If we get paged at 3 am when
the datastore is not answering queries and takes down the business, would we
have the operational knowledge to quickly fix it?"_ ). The project took nearly
a year to roll out, and in that time the operational knowledge could surely be
trained, hired, or contracted.

~~~
smoodles
Operating Cassandra at the scale Uber requires is going to be painful and as
operationally draining as MySQL, if not more so.

There are really not a large number of options here anymore with the departure
of FoundationDB from the market. CockroachDB might be an option in a few
years, though I'm still confused why they are moving towards a SQL-ish vs key-
value interface...

~~~
nickpsecurity
"departure of FoundationDB from the market"

Pissed me off so much. Only thing close to Google's F0 RDBMS on the market, at
a reasonable rate, and the beginning of a good offer to enterprises. Then,
"poof!" It's a good example of why I tell companies to not put anything
critical into something from a startup. If they do, better have a
synchronized, backup option tested and ready to go.

"why they are moving towards a SQL-ish vs key-value interface..."

That's easy: most databases and buyers use SQL. Key-value is preferred by
startups & the non-critical side projects in big companies you see here a lot,
but those aren't representative of most of the market. You need first-rate
SQL support. I think EnterpriseDB shows that it's also a good idea to clone a
market leader's features onto an alternative database.

~~~
ngrilly
> Only thing close to Google's F0 RDBMS

Did you mean F1 (instead of F0)?

~~~
nickpsecurity
Yeah, yeah. I keep getting the 0 and 1 mixed up. Thank you.

------
sam_pointer
We did something very similar at EA Playfish, at least one alumnus of which is
part of the Uber engineering team.

We used a 2 column InnoDB-backed table for all of our data storage, massively
sharded, and run in a 3-host master-slave-slave configuration.

At that time EC2 would routinely kill hosts without the courtesy of a poke via
ACPI and as such we became very good at quickly recovering shards. In a
nutshell this mechanism was to have the new host contact a backup slave,
perform an lvm snap, pipe the compressed snap over a TCP connection, unroll it
and carry on, letting replication take up the delta.

That enabled us not only to manage the 10 million or so daily active users of
that title, but also to serve as the platform under the 12 or so additional
titles that studio had.

We had lots and lots of very simple things and failures were contained.

I think at the time we were the 3rd-largest consumer of EC2 after Netflix and
"another" outfit I never learned the name of. EA being what it was, however,
we were never permitted to open source a lot of the cool stuff Netflix and
ourselves seemed to develop in parallel.

~~~
sandGorgon
this is frikking awesome! do you have any of the lvm and pipe scripts publicly
available? i'm kinda struggling with building and setting up lvm on ec2
automatically and was wondering if there are any tidbits you can pass along.

~~~
sam_pointer
Unfortunately not. That company and codebase is very much dead, and I don't
own any of it. Even if I did it would be a bunch of context-less shell (in the
first iteration) and then a bunch of context-less Chef (in the second platform
iteration).

Things I do remember:

\- We used ephemeral volumes for all data stores. This was pre-provisioned
IOPs and EBS was flaky as heck back then. I can't remember the disk layout,
although we did experiment a great deal.

\- We took great pains to ensure there was enough space to make the snapshot
(IIRC it was a telemetry/monitoring item)

\- The pipe scripts were essentially "netcat".

The best I can offer is this talk:
[http://vimeo.com/57861199](http://vimeo.com/57861199)

------
fusiongyro
The article could be summed up as "Postgres is not a distributed database."
MySQL isn't either, although it certainly has more friendly replication
technology. I think it's a lot more likely that what's really happening here
is that they've designed their "schemaless" schema or its supporting software
to handle the kind of soft errors that MySQL is permitting and Postgres was
not.

We have MySQL replication across the country where I work and I certainly
wouldn't characterize it as robust; it fails every 3-6 months. MySQL
replication is certainly a lot older and easier to use than Postgres's, but
SQL databases are fundamentally CP systems. When you say "This design means
that replicas can routinely lag seconds behind master, and therefore it is
easy to write code that results in killed transactions" it sounds like you're
blaming the way replication was implemented for a physical problem. There is
no way to design a replication system such that two highly-consistent
databases can achieve perfect availability in the face of real-world networks.
A worse protocol can exacerbate the problem, but a better one can't make it go
away.

I have never seen corruption with Postgres (unlike MySQL), but I have never
tried cross-datacenter replication with it. Apart from that, Postgres
generally seems to do much better with consistency than MySQL does, where DDL
statements are not transactional, etc. So I am not surprised to hear that
their system trips harder on Postgres's more aggressive consistency.

In short, I suspect a more robust solution to their problem is a NoSQL
database. On the other hand, it sounds like they want a combination of
availability and consistency that will be difficult to get off-the-shelf. I'm
glad they found a way to make it work. I wouldn't generally choose Postgres
for a scalable system with an aggressive availability constraint--but then
again, I wouldn't choose MySQL either, and I generally avoid problems that
demand highly scalable, highly available solutions.

------
drob
We've hit a lot of the same fundamental limits scaling PostgreSQL at Heap.
Ultimately, I think a lot of the cases cited here in which PostgreSQL is
"slower" are actually cases in which it does the Right Thing to protect your
data and MySQL takes a shortcut.

Our solution has been to build a distribution layer that makes our product
performant at scale, rather than sacrificing data quality. We use CitusDB for
the reads and an in-house system for the writes and distributed systems
operations. We have never had a problem with data corruption in PostgreSQL,
aside from one or two cases early on in which we made operational mistakes.

With proper tuning and some amount of durability-via-replication, we've been
able to get great results, and that's supporting ad hoc analytical reads. (For
example, you can blunt a lot of the WAL headaches listed here with
asynchronous commit.)

------
pella
Roadmaps ( PostgreSQL ) 2016-2017-...

* Postgres Professional roadmap ( Pluggable storages, Multimaster cluster with sharding, Effective partitioning, Adaptive query planning, Page-level data compression, Connection pooling, Native querying for jsonb with indexing support, ....) [https://wiki.postgresql.org/wiki/Postgres_Professional_roadm...](https://wiki.postgresql.org/wiki/Postgres_Professional_roadmap)

* EnterpriseDB database server roadmap ( Parallelism, Replication, Vertical Scalability, Performance ) [https://wiki.postgresql.org/wiki/EnterpriseDB_database_serve...](https://wiki.postgresql.org/wiki/EnterpriseDB_database_server_roadmap)

====

And "Scalable PostgreSQL for real-time workloads"
[https://www.citusdata.com](https://www.citusdata.com) -->
[https://github.com/citusdata/citus](https://github.com/citusdata/citus)

~~~
leblancfg
My big question here is why they decided to move over to MySQL instead of
using the Citus (also open source) Postgres extension. They don't mention it,
so we don't know whether it was considered, and if so, why it was not
selected. Postgres' rapid feature growth in the past years is interesting in
itself.

~~~
bonesmoses
Citus currently has major problems joining non-distributed tables with
distributed tables, and it doesn't fully support schemas yet. It's _almost_
there, but not quite.

~~~
leblancfg
Thank you!

------
thinkingkong
Posts like this are important.

We too often rely on a buzzword heuristic, and that's how you end up with
dozens of random technologies that are harder to maintain and don't
necessarily solve any of your problems. This method is good because it shows
that when you understand the problem the right way, you can find the right
solution, even if by popularity it looks like a "step backwards".

Massive Kudos.

~~~
millamox
I strongly disagree. It would have been useful if they'd stuck to problems
without well-known solutions.

Sadly, they also mixed in issues which are easily solved, or in a particularly
egregious case, where they just complain about a bug. As though MySQL never
had a bug. That was silly.

My read of it was: Postgres annoyed us a few times, and we got fed up with
it, so now something different will annoy us. Please look forward to our blog
post in 4 years about how we're using X instead of MySQL/Schemaless because
those were also imperfect.

~~~
tracker1
I got a similar impression... though with Uber's scale, funding, and
resources, they probably could have worked with and through their issues with
Postgres. I'm actually surprised they didn't take a multi-pronged approach to
their issues. Since they're using Schemaless, I'm curious why they didn't go
for one of the many NoSQL databases that may well be a much closer match to
their use case.

It seems to me that Cassandra (C*) may have been a better match to their
needs... yes it means more administrative and application tuning, but would
definitely scale to meet their needs. RethinkDB would also likely be a better
match to what they are wanting to do.

That said, I've been holding out for some time for PostgreSQL's in-the-box
replication story to take root and mature. There are definitely more mature
features and solutions for sharding and replication around MySQL; I just tend
to find MySQL brittle, and every time I've worked with it I've had at least a
half dozen WTF moments... from binary data handling/indexing, to foreign key
syntax, to being out of spec with ANSI, and others. PostgreSQL has some great
features, and once the replication issues settle, it will become the default
choice for a lot of projects in a lot of organizations. Though MySQL/Maria and
even MS-SQL are currently better options for many SQL use cases.

~~~
chucky_z
What replication issues are you referring to? I never once had a problem with
pgsql's replication across 8.1->9.5.

It would randomly die, but that was always either my fault or the
application's fault, never pgsql itself.

The lack of master-master seems to be the big thing everyone mentions, but
PostgresXL is currently in a usable-in-production state.

~~~
tracker1
But what is the _current_ replication setup that comes _with_ Postgres and is
well documented in the PostgreSQL (current-version) documentation? In the
past, when I've looked, there's been mention of 2-3 solutions (none in the
box) and others that require at least a 5-figure support contract.

Compare to MongoDB, RethinkDB, MS-SQL, and others, where the tooling for
replication comes in the box. Yes, two of the examples are "no-sql", but even
MySQL's replication comes in the box and is supported as such.

~~~
chucky_z
Did you read this?

[https://www.postgresql.org/docs/current/static/high-
availabi...](https://www.postgresql.org/docs/current/static/high-
availability.html)

I'm not sure what more you can ask from documentation.

Does MySQL document Vitess, Galera, MaxScale, etc...?

------
NhanH
Well, this is heresy. Does that mean we are now officially boycotting Uber?

Jokes aside, one thing I've been trying to figure out for a while is the
limit at which certain components/systems break down. Basically, something
along the lines of "given X records, this operation would take Y time, or
would cause problems Z, A, B, C". I've actually had developer friends ask me
how fast a simple "SELECT * FROM X WHERE index=?" would be on a million-row
table, since they were surprised that some NoSQL DB could do a query on a
hundred million rows in a few seconds.

I guess that's part of why you only learn how to scale after having done it
once.

~~~
Illniyar
"surprised that some NoSQL DB could do a query on hundred million rows in a
few seconds"

The tone seems to suggest this is fast; a simple index query on a table of
even a hundred million rows will take milliseconds in an RDBMS on even a weak
computer.

~~~
xaprb
Justify that with actual math, please? I don't think you know how computers
(processors, bus, memory access, etc) work. How exactly do you think an RDBMS
can query an index at a hundred billion rows per second?

~~~
gkop
Illniyar is right: a b-tree with a branching factor of 100 can index 100M rows
in 4 levels. Even with the index on a spinning disk, it should have no trouble
coming in well under 100ms.
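The arithmetic above can be checked in a few lines (the branching factor and row count are just the illustrative numbers from the comment):

```python
# How many levels does a B-tree need to cover 100M keys if each
# node fans out to ~100 children?
branching = 100
rows = 100_000_000

levels = 0
capacity = 1
while capacity < rows:
    capacity *= branching
    levels += 1
# levels == 4, so an index lookup is only a handful of page reads;
# even at ~10ms per random read from a spinning disk, that's ~40ms.
```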

~~~
falcolas
Indexes like this are also typically in memory to begin with, making it even
faster, since you only have to find the actual record on disk.

Even better, if you only need one field from the record, and it's part of a
compound index, you can frequently return the data from just the indexes; no
disk seeks required. Small tip with InnoDB on MySQL - any non-primary key
index is automatically a compound index with the primary key.
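For illustration, a covering (index-only) read, using SQLite standing in for MySQL (table and index names invented): because the query touches only columns stored in the compound index, the plan never visits the table itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("CREATE INDEX users_name_email ON users (name, email)")
conn.execute("INSERT INTO users VALUES (1, 'ada', 'ada@example.com')")

# Ask the planner how it will execute a query whose columns are all
# present in the compound index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT email FROM users WHERE name = 'ada'"
).fetchone()[-1]
# plan reads something like: SEARCH users USING COVERING INDEX users_name_email
```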

------
forgotpwtomain
Great write up. A couple points -

I'm not sure this post is illustrative of any generally applicable
considerations (re: the title) in the choice of Postgresql vs MySQL, since
Uber seems to no longer be using a relational model for most of their data and
is using MySQL effectively as a key-value store.

> say a developer has some code that has to email a receipt to a user.
> Depending on how it’s written, the code may implicitly have a database
> transaction that’s held open until after the email finishes sending. While
> it’s always bad form to let your code hold open database transactions while
> performing unrelated blocking I/O, the reality is that most engineers are
> not database experts and may not always understand this problem, especially
> when using an ORM that obscures low-level details like open transactions.

I have to very seriously disagree here, ORMs make a lot of things easy - and
you can get away with building stuff for a while without understanding the
underlying databases or SQL but only to a certain scale (I'd say more like
medium-scale, definitely not large or Uber level). If you have engineers
writing code that interacts with a database without understanding
transactional semantics, the engineer in question not the database is the
problem.

> We started out with Postgres 9.1 and successfully completed the upgrade
> process to move to Postgres 9.2. However, the process took so many hours
> that we couldn’t afford to do the process again.

There seem to be ways [0][1] to do online upgrades with Postgres (before
logical decoding in 9.4), although I haven't personally used them. Not sure if
they explored these options at Uber or not?

[0] [https://github.com/markokr/skytools](https://github.com/markokr/skytools)
[1] [http://slony.info/](http://slony.info/)

~~~
spotman
> I have to very seriously disagree here, ORMs

In spirit I agree with you; it's the engineer's fault for not reading the
documentation of their ORM or equivalent.

But in these big shops with hundreds of programmers, leaving transactions open
in hibernate is a daily occurrence somewhere.

Usually it's caught before production, but it happens oh-so-frequently, so
anything that exacerbates the pain from this would be seen in a negative light
for sure.

------
BadassFractal
I've heard from technical leaders at multiple now-well-established unicorns
that they'd never use Postgres, or switched from Postgres, simply because
MySQL has a lot more tooling built around it and many more people have been
exposed to its shortcomings at "web scale", so it's very well known where and
when things will break.

Disclaimer, I'm a hardcore Postgres user myself, but I also keep tabs on the
other tools.

~~~
segmondy
I've been a hardcore and exclusive Postgres user since 2004. Always rolled my
eyes when folks say they use MySQL or moved to it from Postgres. Reading about
that bug sent shivers down my spine. I know no system is bug free, but
Postgres is one of those systems that has held up very well for me, and now
I'm super paranoid. :-( I guess I understand tho, I moved from BSD to Linux
for more tooling.

------
quotemstr
> Each of these system calls incurs a context switch

System calls are not context switches. I wish people would distinguish between
them. A system call is just a change of privilege level and is efficient ---
there's no cache invalidation required on almost any system.

A context switch, on the other hand, involves a call to the scheduler, saving
and restoring of much more CPU register state, and various kinds of cache
invalidation. (It's even more expensive if you're switching between different
processes instead of different threads in the same process.)

The kernel may perform a context switch while executing a system call --- this
context switch is what makes blocking calls blocking. But even IO system calls
do not _necessarily_ cause context switches, especially in the case where an
operation can be satisfied by accessing only the page cache.

tl;dr A system call is not necessarily a context switch

------
Illniyar
So the major issue detailed here is that Postgres basically uses immutable
rows, which creates performance issues with writes.

I just read about their new schemaless db on their blog, and the first
paragraph contains this:

"The basic entity of data is called a cell. It is immutable, and once written,
it cannot be overwritten. (In special cases, we can delete old records.) A
cell is referenced by a row key, column name, and ref key. A cell’s contents
are updated by writing a new version with a higher ref key but same row key
and column name."
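The cell model quoted above can be sketched in a few lines (a toy in-memory version, entirely my own construction, not Uber's code):

```python
# Writes are append-only versions keyed by (row key, column name,
# ref key); a read returns the version with the highest ref key.
cells = {}

def put_cell(row_key, column, ref_key, body):
    key = (row_key, column, ref_key)
    if key in cells:
        raise ValueError("cells are immutable; write a higher ref_key instead")
    cells[key] = body

def get_latest(row_key, column):
    versions = [(rk, body) for (r, c, rk), body in cells.items()
                if r == row_key and c == column]
    if not versions:
        return None
    return max(versions, key=lambda v: v[0])[1]

put_cell("trip:1", "BASE", 0, {"status": "requested"})
put_cell("trip:1", "BASE", 1, {"status": "completed"})
latest = get_latest("trip:1", "BASE")
```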

So, mmm... not saying that Postgres didn't pose a problem for them, but I
think Postgres's db model fits their new db better than MySQL's does. They
probably had to work really hard to get MySQL to work like Postgres.

Setting that aside, it looks like only two things would have needed to be done
with Postgres to solve their problems: have secondary indexes point to the
primary id, and do logical replication (which they say a plugin solved in
9.4).

Is this a case of "I got burned by something so I won't use it again"?

~~~
Illniyar
So more from here: [https://eng.uber.com/schemaless-part-
two/](https://eng.uber.com/schemaless-part-two/)

"Each Schemaless shard is a separate MySQL database, and each MySQL database
server contains a set of MySQL databases. Each database contains a MySQL table
for the cells (called the entity table) and a MySQL table for each secondary
index, along with a set of auxiliary tables."

So... 1 table, with 1 index and manually created and updated secondary index
tables.

With this scheme I can only assume postgres will work just as well or better.
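That layout can be sketched with SQLite standing in for MySQL (the schema here is my assumption, not Uber's actual one): one entity table for the cells plus a hand-maintained secondary-index table, kept in step inside a single transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (row_key TEXT, ref_key INTEGER, body TEXT,
                         PRIMARY KEY (row_key, ref_key));
    CREATE TABLE idx_city (city TEXT, row_key TEXT);
""")

def put(row_key, ref_key, city, body):
    with conn:  # entity row and index row commit atomically
        conn.execute("INSERT INTO entity VALUES (?, ?, ?)",
                     (row_key, ref_key, body))
        conn.execute("INSERT INTO idx_city VALUES (?, ?)", (city, row_key))

put("trip:1", 0, "sf", '{"fare": 12}')
put("trip:2", 0, "nyc", '{"fare": 30}')

# A "secondary index" query goes through the index table first:
sf_trips = conn.execute(
    "SELECT e.row_key FROM idx_city i "
    "JOIN entity e ON e.row_key = i.row_key WHERE i.city = ?",
    ("sf",)).fetchall()
```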

------
0xmohit
Facebook maintains its own fork [0] of MySQL. A couple of interesting talks
are also available: MySQL at Facebook, Current and Future [1] and Massively
Distributed Backup at Facebook Scale [2].

[0]
[https://github.com/facebook/mysql-5.6](https://github.com/facebook/mysql-5.6)

[1]
[https://www.youtube.com/watch?v=jqwegP9xwVE](https://www.youtube.com/watch?v=jqwegP9xwVE)

[2]
[https://www.youtube.com/watch?v=UBHcmP2TSvk](https://www.youtube.com/watch?v=UBHcmP2TSvk)

~~~
lazyant
From what I understand, FB uses MySQL as permanent storage, not as a
relational database.

~~~
yeukhon
I believe they use it for almost everything. They built all the graphs on top
of MySQL data. I don't know what permanent storage means here, but if you are
referring to storing images and videos, I doubt it. Highly doubt that. They
may be storing pointers, but as a file system, I doubt it. But I wouldn't know
for sure; I don't work there. Oh, they do have Cassandra (well, they built
Cassandra)...

~~~
jhartmann
They use a system called Haystack for binary large object storage. They very
much don't use MySQL for those.

~~~
yeukhon
That's my feeling as well: they wouldn't use MySQL as a blob store.

------
pritambaral
I wonder what the design decisions are behind (or what it would take) to make
Postgres store secondary indexes on disk like InnoDB does. Sure, the extra
index lookup through the primary index is a cost, but it seems like write
amplification can be an even greater concern. Ultimately, it would be nice if
Postgres at least gave the DBA the choice of secondary-index indirection
through the primary index like InnoDB's, if not moving to it outright.
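The trade-off can be modeled in toy form (entirely my own construction, not either engine's real data structures): a secondary index that points at the row's physical location, Postgres-style, versus one that points at the primary key, InnoDB-style.

```python
# One row version on the "heap", a primary-key index, and two
# flavors of secondary index over the "name" column.
heap = {"loc1": {"id": 4, "name": "ada", "birth_year": 770}}
pk_index = {4: "loc1"}           # primary key -> location
name_direct = {"ada": "loc1"}    # Postgres-style: name -> location
name_indirect = {"ada": 4}       # InnoDB-style: name -> primary key

# A non-HOT update writes the new row version at a new location:
heap["loc2"] = dict(heap["loc1"], birth_year=771)
pk_index[4] = "loc2"

# The direct index must be rewritten even though "name" never changed...
name_direct["ada"] = "loc2"
# ...while the indirect one is untouched, at the cost of an extra
# primary-key lookup on every read:
row = heap[pk_index[name_indirect["ada"]]]
```

The extra write to `name_direct` is the amplification; the extra hop through `pk_index` is the read cost of indirection.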

~~~
saurik
I would definitely not want PostgreSQL to "move to" the mechanism used by
InnoDB as that mechanism is slower for reads (which this article even admits).
This disk layout is one of the reasons I continue to use PostgreSQL. If you
care more about write performance than reads you might even want to look into
something entirely different than a traditional database.

FWIW, PostgreSQL's mitigation of the write amplification problem mentioned
here is "heap-only tuples". It is highly likely that Uber could have
prevented a lot of this pain by learning more about this feature -- which is
notably not mentioned even once in the entire article, which to me completely
undermines the feel they are trying to achieve of "we really really really
know what we are doing" -- and tuning their table density parameters to take
maximal advantage.

Instead of "why we moved off of X" it would be much better to see "we are
considering moving off of X: anyone know what we are doing wrong?". Sure,
maybe Uber knows about HOT, and did extensive analysis to determine it wasn't
a solution to their problem; but it frankly does not seem at all to be the
case. Anyone who knows a lot about PostgreSQL would have explained HOT to
them, so they probably didn't even consult with PostgreSQL experts behind the
scenes.

Sadly, "we are looking for help with a complex technical challenge" is
something the market punishes under the premise that everyone has to be
entirely self-sufficient gods of their technology stack :(. The only time I
remember ever having seen a company reach out to the community for help was
reddit (with respect to something involving PostgreSQL or Cassandra... I don't
remember the specific issue).

~~~
pritambaral
Thank you for introducing me to "heap only tuples", I did not know about it.
From what I read about it from Postgres's documentation[0], it would have
helped in the updates where no indexed column was updated.

0: [https://wiki.postgresql.org/wiki/Index-only_scans#Interaction_with_HOT](https://wiki.postgresql.org/wiki/Index-only_scans#Interaction_with_HOT)

~~~
dialogbox
You can find a more detailed explanation on Bruce Momjian's site (from page
64:
[https://momjian.us/main/writings/pgsql/mvcc.pdf](https://momjian.us/main/writings/pgsql/mvcc.pdf))

------
denishpatel
They migrated from MySQL to Postgres
([https://www.yumpu.com/en/document/view/53683323/migrating-uber-from-mysql-to-postgresql](https://www.yumpu.com/en/document/view/53683323/migrating-uber-from-mysql-to-postgresql))
for almost the same reasons, and now they are arguing the reverse to excuse
their inability to use the software and their lack of skills to upgrade!

------
scotty79
Funny how the article is just:

\- we used X in a fashion that suited us best

\- it caused us problems Y because of some technicalities of X

\- so we switched to Z and could avoid Y thanks to how Z handles those
technicalities differently than X

and the top rated HN comments are:

\- you used the X wrong

\- all the technicalities of X that caused you problems Y are actually
superior features of X

~~~
zihotki
But think how funny it would be if "all the technicalities of X that caused
you problems Y are actually superior features of X" actually turned out to be
true.

------
appleflaxen
This was a great overview and write-up.

Anyone know why they are using MySQL over MariaDB[1]?

1\. [https://mariadb.org/](https://mariadb.org/)

~~~
mwpmaybe
I'm not saying this is the case at Uber, but it's quite common for MariaDB
users to refer to it as MySQL, or to use the two names interchangeably. It's
difficult to get out of the habit as all of the CLI tools are still mysql*,
clients (i.e. the protocol) are all named "mysql", CM modules (like Ansible's)
are all named "mysql", etc. And sometimes it's just simpler (if lazier) to say
you're using MySQL rather than explain what MariaDB is and why it's
preferable.

~~~
zokier
I still occasionally call LibreOffice OpenOffice. Besides, mysql has fewer
syllables than mariadb.

------
jswny
It really is sad that we all just can't appreciate a company like Uber giving
us insight into their internal engineering choices. So many people on here
think Postgres is so perfect that if you don't like it you must be using it
wrong. Postgres is a tool like anything else. It has good use cases, and bad
use cases.

------
pella
anno 2013 "MIGRATING UBER FROM MYSQL TO POSTGRESQL"

[https://www.yumpu.com/en/document/view/53683323/migrating-uber-from-mysql-to-postgresql](https://www.yumpu.com/en/document/view/53683323/migrating-uber-from-mysql-to-postgresql)

------
markpapadakis
Great write-up. A few observations:

1\. The encoding and translation schemes of Postgres and mySQL/InnoDB are well
described in the blog post, and I would also agree that InnoDB’s design is,
all things considered, better for all the reasons outlined in the post.

2\. I don’t understand why anyone still uses lseek() followed by
read()/write() instead of the pread()/pwrite() syscalls. It’s trivial to
replace the pair of calls with one. Aerospike is another datastore that
resorts to pairs of seek/read-write calls instead of pread/pwrite.

3\. Process/connection model makes no real sense nowadays - although to be
fair, there is, today, practically almost no difference in terms of footprint
between OS threads and OS processes (other than memory and FDs sharing
semantics, they are practically the same). It’s still more appropriate to use
threads (although I ‘d argue maintaining a pool of threads for processing
requests and one/few threads for multiplexing network I/O is the better
choice).

4\. ALTER TABLE is obviously a pain point with mySQL, although I am not really
sure many users with large datasets care; they probably figured out long ago
it’s going to be an issue and they designed and expanded accordingly. It’s
also a relatively rare operation. That said, other than using mySQL (or any
other RDBMS) to build the data plane for an elaborate, distributed KV store,
one should consider Salesforce’s approach too. Their tables have some 50 or so
columns, and the column names are generic (e.g column_0, column_1, … ). They
have a registry where they assign column indices (e.g column_0) to a specific
high-level entity type (e.g customer title, or price), and whenever they need
to query, they just translate from the high level entity to the actual column
names and it works. They also, IIRC, use other tables to index those columns
(e.g such an index table can have just 3 columns, table id, column index,
value) and they consult that index when needed (FriendFeed did something
similar).

5\. Cassandra should have no problem supporting the operations and semantics
of Schemaless as described in their blog posts. However, given they already
operate it in production, they probably considered it and decided against it.

~~~
fdr
w.r.t. 2:

[https://www.postgresql.org/message-id/6248.1046130083%40sss.pgh.pa.us](https://www.postgresql.org/message-id/6248.1046130083%40sss.pgh.pa.us)

    
    
         Manfred Spraul <manfred(at)colorfullife(dot)com> writes:
         > Tom Lane wrote:
         >> It seems unlikely to me that eliminating lseek on some platforms would
         >> be worth the hassle of maintaining two code paths.  lseek is mighty
         >> cheap as system calls go.
         >> 
         > It was considered expensive enough to write a syscall avoidance layer 
         > that caches the file pointer and skips lseek if fpos==offset.
         
         You're missing the point: that layer is mostly there to ensure that we
         don't foul up the kernel's readahead recognition for sequential fetches.
         It's nice that Linux doesn't care, but Linux is not the only platform
         we worry about.
         
         			regards, tom lane

~~~
markpapadakis
This ML thread's only real argument is that some OSes/kernels may not support
pread/pwrite. The readahead argument makes little to no sense IMO. Unless
there are very many uses of random-access IO in the codebase, they should use
pread and friends where available, especially considering that most people run
it on Linux nowadays, not some exotic OS.
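
For anyone unfamiliar with the difference being discussed, here is a minimal
sketch using Python's thin wrappers over the same syscalls (os.pread is a
POSIX feature and may be unavailable on other platforms):

```python
import os
import tempfile

# Scratch file to read back two ways.
fd, path = tempfile.mkstemp()
os.write(fd, b"abcdefghij")

# lseek + read: two syscalls, and the file's shared offset is mutated,
# which is what forces databases to serialize access to the descriptor.
os.lseek(fd, 4, os.SEEK_SET)
two_calls = os.read(fd, 3)

# pread: one syscall with an explicit offset; the file position is untouched.
one_call = os.pread(fd, 3, 4)

os.close(fd)
os.unlink(path)
```

Both reads return the same bytes; the difference is purely in syscall count
and in whether the descriptor's offset state is shared between callers.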

------
0xmohit
Worth quoting from the article:

    
    
      Accordingly, using pgbouncer to do connection pooling with
      Postgres has been generally successful for us. However, we have
      had occasional application bugs in our backend services that
      caused them to open more active connections (usually “idle in
      transaction” connections) than the services ought to be using,
      and these bugs have caused extended downtimes for us.

~~~
fdr
What I don't understand: no system gets to maintain open transactions for
free. MySQL keeps UNDO logs, so the effect there, much like Oracle, is
possibly running out of UNDO space, as well as slowing down reads that have to
apply UNDO to get the last-committed row version. So what gives? Did Uber
Engineering fix the dangling transaction issues while migrating off, or are
they relying on some other property of MySQL vs. Postgres?

~~~
0xmohit
They seem to be relying on the database to take care of those :)

------
vanviegen
One major advantage of MySQL's clustered indexes the article doesn't mention
is that, although secondary key reads may be a little slower, primary key
reads will be faster. The row data lives _in_ the primary key index, so there
is no need for referencing an additional database page (possibly causing
random I/O).

This is especially relevant when doing range queries over the primary key.
Imagine a table containing billions of chat messages, from which you want to
retrieve a single conversation history. With a clustered primary key on
(conversation id, message id), MySQL would need to process just a couple of
database pages. Postgres, on the other hand, would need to reference a semi-
random page for each of the messages.

Now imagine a 10k message chat conversation, a table too large to fit into
RAM, and storage by means of spinning rust (yeah, yeah, I know what year it is
:-)). The difference would be somewhere between 2 and 3 orders of magnitude.

~~~
pritambaral
Uhh ... Postgres supports Index-only scans; as long as the data you're asking
for is in the index, that is.

So if you have an index on (conversation_id, message_id), and you try to
retrieve message ids of a specific conversation, only the index will be
touched.
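
A toy demonstration of that kind of read, again with SQLite standing in
(SQLite calls it a "covering index"; Postgres calls it an index-only scan and
additionally has to consult the visibility map):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (conversation_id INT, message_id INT, body TEXT)")
conn.execute("CREATE INDEX idx_conv ON messages (conversation_id, message_id)")
conn.executemany("INSERT INTO messages VALUES (?, ?, 'x')",
                 [(1, m) for m in range(10)])

# Only indexed columns are selected, so the planner never touches the table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN"
    " SELECT message_id FROM messages WHERE conversation_id = 1"
).fetchall()
detail = plan[0][-1]  # e.g. "SEARCH messages USING COVERING INDEX idx_conv ..."
```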

~~~
vanviegen
Doh, you're right of course! We were having this problem in the Postgres 9.1
era, before index-only scans were a thing.

Still, it's somewhat inefficient to maintain an extra copy of the data that is
never actually read, though no longer multiple orders of magnitude less
efficient.

However, I'd guess that programmers don't often think to add these seemingly
useless fields to an index, as it feels inefficient and just wrong. But at
least it offers an out in pathological cases.

------
jasode
Fyi... a related (not duplicate) discussion of a previous Uber story:
[https://news.ycombinator.com/item?id=10923848](https://news.ycombinator.com/item?id=10923848)

------
trequartista
Wow, this is such a detailed analysis. Having used Postgres and suffered
issues with data replication as well as database crashes, this post was really
helpful.

------
ismdubey
The fact that Uber scaled to so many users with Postgres is such a relief.
For now, I am good!

------
cdelsolar
So basically, if you don't intend to use it as a relational database, and you
have enough scale to run cross-data-center (and across-the-world) master-
master replication, then you should maybe switch from PostgreSQL to MySQL?

------
vbezhenar
Why would anyone run hundreds of connections? A server can only process
number_of_processor_cores connections at once. Sure, a few connections might
be waiting on I/O, but not hundreds, unless the database is very atypical.

~~~
brianwawok
For example: You are using Python and you have 10 web servers and 20
background servers connected to a common DB. Each server has 10 threads, and
each thread holds 1 connection open. That is 300 open connections.

Opening and closing connections is very slow and expensive, so almost always
better to keep these 300 connections open than to try to be fancy.

You COULD try to say give each server only 3 connections and make them share,
which cuts you to 90 connections.. but then you have to try to share state
between different python processes (not easy), and will often end up with
deadlocks and sync overhead.
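
A minimal sketch of that keep-them-open pooling pattern (sqlite3 stands in
for a real network database here; a production pool such as pgbouncer or a
driver's built-in pool does much more):

```python
import queue
import sqlite3

class ConnectionPool:
    """Toy thread-safe pool: open N connections once, then reuse them."""

    def __init__(self, size, factory):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())

    def acquire(self, timeout=5.0):
        # Blocks when all connections are checked out, instead of paying
        # the cost of opening a brand-new connection per request.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

pool = ConnectionPool(3, lambda: sqlite3.connect(":memory:",
                                                 check_same_thread=False))
conn = pool.acquire()
result = conn.execute("SELECT 1").fetchone()[0]
pool.release(conn)
```

The design choice is exactly the trade brianwawok describes: a bounded number
of long-lived connections, with callers blocking briefly rather than the
database paying per-connection setup cost.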

~~~
tracker1
Agreed, which is why many clients create a pool of connections that gets
reused. Connection cost is expensive, and the rdbms already handles
concurrency and even a couple thousand connections shouldn't be a significant
overhead.

~~~
brianwawok
I'm trying to find evidence of the memory usage of MySQL with 1k connections
vs. Postgres with 1k connections. I am finding a lot of people saying Postgres
has heavier connections, but for 1k connections what's the difference: 1MB of
memory? 1GB?

~~~
kasey_junk
[https://wiki.postgresql.org/wiki/Number_Of_Database_Connecti...](https://wiki.postgresql.org/wiki/Number_Of_Database_Connections)

It's a little outdated but I've found it largely holds. That is, you'd want a
mighty box for 1000 concurrent connections.

That said, connection queuing works really well with postgres such that
throughput is frequently better at lower connection counts than at higher
ones.

I have no experience with mysql that is less than 15 years out of date.

------
viraptor
I'd like to see their migration strategy as well. I mean, they say moving from
pgsql 9.2 to a higher version (which would then allow online upgrades) is too
much work. Yet they'll have to migrate to mysql, which will take much more
engineering effort. For anything close to realtime, they'll need to copy the
old data while simultaneously forking the new writes into both the pgsql
slaves and the new mysql servers. And they cannot use the WAL for that without
some advanced processing.

I hope this follows in the next blog post.

------
sriharis
A common solution to conserve bandwidth is to use compression. This can be
done easily in PostgreSQL by using ssh tunnels and turning on compression. I
wonder why they didn't try that.

~~~
Terretta
Reading this, I wondered if they deeply understood the difference between
bandwidth and latency. I doubt compression would be the thing, as I'd guess
they had a latency problem, not a bandwidth problem.

You see this kind of misunderstanding commonly grouped with other "a computer
has physically moving parts" misunderstandings, like the ORM or connection
pooling concerns outlined here.

After tyranny of abstractions, nobody knows how the moving parts really work.

// As alternatives given you'd like to keep the immutable data approach which
brings a lot of goodness, consider a log-structured file system for the disk
concerns, and geo-sensible replication for the latency concerns. At this
scale, for near real-time app, bi-coastal DB is a bad model. You shouldn't
have all users in SF querying a database in DC. Given the nature of the
business model, they can share geographically at one time scale, and roll up
and replicate geo diverse data at a high latency leisure.

------
polskibus
Does anyone know if citusdb or enterprisedb improve on the postgresql issues
mentioned in the post, compared with the latest postgresql version?

~~~
brandur
CitusDB is essentially vanilla Postgres now that they've moved their
implementation to an extension.

It won't specifically address the problems posted here, but will solve other
ones like trying to scale a system beyond a single node.

------
brandur
Interesting post! While I suspect that a MySQL installation is just as likely
to have its own problems in the long run, I'm not smart enough to provide any
kind of compelling point-by-point refutation. However, a number of the points
made strike me as having trade-offs that were not really addressed in depth.

My summary of the arguments against Postgres and some basic thoughts on each:

1\. Writes are more expensive because all secondary indexes must be updated
with a new physical location.

This may be true, but the MySQL model of using primary keys from secondary
indexes will mean that reads are inherently expensive. They even mention this:

> This design means that InnoDB is at a slight disadvantage to Postgres when
> doing a secondary key lookup, since two indexes must be searched with InnoDB
> compared to just one for Postgres.

So it seems like a classic read vs. write trade-off.

I'm also a little skeptical of any performance claims that don't include any
numbers. It's possible that efficient coding in Postgres makes this much more
of a wash in terms of performance than claimed here.
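
The trade-off can be modeled with plain dictionaries (a toy model using the
article's sample row, not either engine's actual on-disk structures):

```python
# Rows keyed by physical location ("ctid") vs. by primary key.
heap = {(0, 1): {"id": 4, "first": "Muhammad", "birth_year": 780}}
clustered = {4: {"id": 4, "first": "Muhammad", "birth_year": 780}}

# Postgres-style secondary index: points straight at the physical location.
pg_idx = {"Muhammad": (0, 1)}
row_pg = heap[pg_idx["Muhammad"]]               # one structure traversed

# InnoDB-style secondary index: stores the primary key, so a read
# traverses two structures.
innodb_idx = {"Muhammad": 4}
row_innodb = clustered[innodb_idx["Muhammad"]]  # two structures traversed

# But when the row moves to a new physical location, only the heap mapping
# changes: the InnoDB-style index stays valid, while the Postgres-style one
# is now stale and must be rewritten -- the write-amplification trade.
heap[(7, 3)] = heap.pop((0, 1))
```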

2\. Replication is less efficient because it's sending a lot of physical
information out along the stream.

This is quite true, but IMO unlikely to be a major issue for most users unless
they're dealing with a huge amount of data and streaming it over a slow
connection (i.e. across the continent, like Uber's disaster recovery center).

3\. Data corruption from a bug found in 9.2.

Certainly a bad situation, but IMO not really a valid complaint at this point.
9.2 is way behind now, and there's nothing to say that they wouldn't have
encountered a similar bug or something worse in MySQL in all that time,
especially operating at scale.

To give a counter-anecdote, I operated Postgres at scale for a long time
across many versions starting at 9.1 and was lucky enough to have never once
encountered a bug with data corruption.

4\. Postgres' MVCC model makes it easy for replicas to accidentally fall
behind their master.

This one is valid (and annoying), but there are very good reasons for it, and
you have some switches to control the behavior depending on whether you value
transactions finishing on followers or prompt replication more highly.

5\. Upgrades are difficult because the WAL stream works at a physical level
and is not compatible between database versions.

Again, this is valid, but statement-based replication is a scary idea.
Row-level replication is more interesting and probably something that Postgres
should have, though.

Some good news is that Postgres is getting closer to logical WAL streaming,
which should make in-place upgrades possible.

~~~
codedokode
> This may be true, but the MySQL model of using primary keys from secondary
> indexes will mean that reads are inherently expensive.

With MySQL the indexes are usually kept in memory so there should not be
noticeable overhead.

------
nierman
with respect to "Difficulty upgrading to newer releases":

pg_upgrade has a --link option which uses hard links in the new cluster to
reference files from the old cluster. This can be a very fast way to do
upgrades even for large databases (most of the data between major versions
will look the same; perhaps only some mucking with system catalogs is required
in the new cluster). Furthermore, you can use rsync with --hard-links to very
quickly upgrade your standby instances (creating hard links on the remote
server rather than transferring the full data).

that is all referenced in the current documentation:
[https://www.postgresql.org/docs/current/static/pgupgrade.htm...](https://www.postgresql.org/docs/current/static/pgupgrade.html)
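
The reason --link is fast is that hard links share the underlying file rather
than copying it; the mechanism in miniature (this is just os.link on a scratch
file, not pg_upgrade itself):

```python
import os
import tempfile

# pg_upgrade --link avoids copying relation files by hard-linking them
# into the new cluster directory; the same trick on a scratch file:
d = tempfile.mkdtemp()
old_path = os.path.join(d, "old_cluster_file")
new_path = os.path.join(d, "new_cluster_file")

with open(old_path, "wb") as f:
    f.write(b"relation data" * 1024)

# Instant regardless of file size: no bytes are copied, just a new
# directory entry pointing at the same inode.
os.link(old_path, new_path)

same_inode = os.stat(old_path).st_ino == os.stat(new_path).st_ino
```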

------
mace
I think a fair summary is "We had a few problems with PostgreSQL mostly due to
our rapid growth. We rethought the problem and changed the way we use a
relational database for large-scale storage and are now using MySQL as a dumb
key-value store."

The conclusion reinforces this: "Postgres served us well in the early days of
Uber, but we ran into significant problems scaling Postgres with our growth."

I read this as both an endorsement of PostgreSQL and a highlight of some of
the problems that any large-scale use of it would run into.

------
snarfy
I wonder how much of this could have been solved by using a different file
system. There is all of this talk about the physical layer but no mention of
the file system used.

> Typically, write amplification refers to a problem with writing data to SSD
> disks: a small logical update (say, writing a few bytes) becomes a much
> larger, costlier update when translated to the physical layer.

This is exactly the type of problem solved by the file system layer.

~~~
pritambaral
How would any filesystem help with that? SSDs typically write entire blocks,
even if the OS asks them to only write a few bytes. That's just how SSDs work.

~~~
Terretta
Rethink the file system. Think log-structured.

~~~
pritambaral
Doesn't matter. Even a log-structured fs, when told to write 4 bytes of new
data to disk, has to write those 4 bytes immediately and return. If an
application asks the OS (via fsync), the OS asks the fs, and the fs doesn't
write to disk but tells the OS it did, then the fs just lied and risked data
loss.

If power is lost between the fs lying to the OS and its subsequently actually
writing to disk, the data that the fs lied about is lost.

You don't want that with a DB on top of it.
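
The application-side contract being described here is the classic
write-then-fsync sequence (a minimal sketch):

```python
import os
import tempfile

# Durability contract: data is only safe once fsync() has returned,
# because a plain write() may land only in the OS page cache.
fd, path = tempfile.mkstemp()
os.write(fd, b"commit record")
os.fsync(fd)   # blocks until the kernel reports the data on stable storage
os.close(fd)
```

A filesystem that acknowledges the fsync before the bytes are actually
durable breaks exactly this contract, which is why databases can't tolerate
it.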

------
mspradley
Why did they not consider Oracle or MS SQL Server? They can afford the
licensing and both have numerous replication technologies to choose from.

~~~
je42
Oracle ??? Let's start:

\- No transactions for DDL changes.

\- Oldschool commandline client. auto commit disabled by default. no history.

\- Weird sql syntax + semantics. f.e. null == empty string.

~~~
quicksilver03
I'm with you on all of your points except auto-commit: having it off by
default is much better; it forces you to explicitly commit when you change
data and thus to think about whether you really want to persist the changes.

~~~
je42
Yes, except that all other clients have auto-commit on, so if you transition
from those to auto-commit off you face two issues: 1\. you don't know, and you
think you committed something but actually didn't; 2\. you forget to commit,
the transaction keeps a row lock open, and you bring down the server... :)

------
manigandham
They should use SQL Server (which has great replication abilities, although
horizontal scale out is still difficult) or MemSQL (which is distributed,
scalable, and can do everything they need).

Or use Cassandra which is a perfect fit (or ScyllaDB which is a better version
of it).

This all sounds like an aversion to just paying for or using better products
when the problem is easily solved.

~~~
darklajid
SQL Server needs a couple hits with a cluebat to even satisfy the basic three
requirements for a database system they listed (I'm talking about the MVCC
line of course).

Honestly, if a commercial database provides what you require and you have the
budget? Sure, why not. But I'd always try to avoid that myself, because the
serious players (MS, Oracle, is anyone still using DB2?) are really, really
expensive.

~~~
manigandham
> basic three requirements

SQL Server works just fine and has lots of concurrency control to do whatever
they need. And the way they're using the database doesn't seem to really make
this an issue outside of their replication.

> are really, really expensive

This whole janky setup they have sounds even worse. None of the commercial
relational databases are really that expensive considering what they offer,
and we're talking about Uber here. We're a small startup that pays for both.

Money for a (better) working product with support is the right call, not
building it yourself. This is just a poor tech decision (aside from using an
RDBMS in the first place).

~~~
brianwawok
Maybe you should call up Google, Facebook, LinkedIn, and Twitter, and tell
them they all made mistakes and should use MSSQL.

~~~
manigandham
Those are all massive scale tech companies that actually needed to invent many
of the datastore technologies used for big data today. Uber is not among them.

But I'm sure you know all this as it seems you work for an enterprise database
company that actually makes the exact product Uber should use.

------
exabrial
My big thing with MySQL is that the last time I needed to make this decision
(about 3 years ago), the support tooling, community knowledge, and
documentation around MySQL were light years ahead of Postgres. There were
literally hundreds of MySQL clients and utilities, while Postgres was "just a
database".

Competition breeds excellence.

------
simon2Q
I've tried to produce a full reply to most of the technical points raised
there:
[http://blog.2ndquadrant.com/thoughts-on-ubers-list-of-postgres-limitations/](http://blog.2ndquadrant.com/thoughts-on-ubers-list-of-postgres-limitations/)

------
_navaneethan
From this article, can I assume the following point?

Postgres is designed around "physical replication", rather than MySQL's
logical-only replication, because Postgres prioritizes _data integrity_ with
the help of strictly enforced constraints.

------
hyperion2010
This is a fantastic read. I hope the pg folks can turn as many of the issues
brought up here into bug reports as possible (I think many of the issues,
especially re: replication, are known); this kind of feedback is invaluable.

~~~
davidw
Some of these may be tradeoffs rather than bugs. I'd _love_ to read a reply
from someone knowledgeable about Postgres internals - it'd be very
interesting.

~~~
hyperion2010
Yep! saurik's reply above sheds some light on the issue (reads being more
important than writes). Also, to the downvoter: sometimes the solution to a
bug report is improved and more visible documentation of existing features.

------
raisyer
While Postgres might be better if you use it 'as-is', the community around
MySQL is much better and the tools available are more mature... just goes to
prove that even if something is not-that-good, it can still be successful,
scalable, and popular if there is a strong community behind it...

~~~
madhusudhan000
>community of MySQL is much better and the tools available are more mature

Can you back up your statements with some facts? I have found the Postgres
community to be much better, particularly now that Oracle owns MySQL.

------
distantsounds
Perhaps their engineers can design a web page that allows the scroll wheel to
work.

------
madhusudhan000
So let me try to summarise this.

Poor replica MVCC support

They are actually pointing to a blog article written in 2010 ->
[http://blog.2ndquadrant.com/tradeoffs_in_hot_standby_deplo/](http://blog.2ndquadrant.com/tradeoffs_in_hot_standby_deplo/)

Do they realise that it is 2016?

Guess they didn't bother to understand the hot standby feedback system.

> Postgres’s design resulted in inefficiencies and difficulties for our data
> at Uber.

What kind of inefficiency? They explain the purpose of WAL and replication,
which every database person already knows, but don't care to explain the
actual problem at hand.

Data corruption

> During a routine master database promotion to increase database capacity, we
> ran into a Postgres 9.2 bug

Why the heck didn't they upgrade to a newer version? Did you report this bug
to the pg devs? Did they take that long to fix it, or were you just assuming
that the bug would fix itself?

> The bug we ran into only affected certain releases of Postgres 9.2 and has
> been fixed for a long time now. However, we still find it worrisome that
> this class of bug can happen at all.

Postgres 9.2 is pretty old and there have been 3 major releases since. WTF?

I can cite countless instances where MySQL data corruption was a constant
nuisance with version 5.5, which they fixed in newer releases.

Replication

> During peak traffic early on, our bandwidth to the storage web service
> simply wasn’t fast enough to keep up with the rate at which WALs were being
> written to it

So you ran into a hardware limitation and then blame Postgres. What limit did
you hit? I don't understand this point at all.

Concept of context switching

I am surprised that this is actually an issue; in a database the slowest part
is always the disk, not the CPU. I am confused about how they hit this
limitation first without hitting the others.

Time taken by a context switch:
[http://stackoverflow.com/questions/21887797/what-is-the-overhead-of-a-context-switch](http://stackoverflow.com/questions/21887797/what-is-the-overhead-of-a-context-switch)

Which is in microseconds.

InnoDB buffer pool

> By comparison, the InnoDB storage engine implements its own LRU in something
> it calls the InnoDB buffer pool

Postgres has something similar called shared_buffers. They speak as if
Postgres relies entirely on the operating system, which is false.

> It makes it possible to implement a custom LRU design. For instance, it’s
> possible to detect pathological access patterns that would blow out the LRU
> and prevent them from doing too much damage

Not sure what kind of damage they are speaking of. In a Postgres sequential
scan (full table scan), a ring buffer is used instead, so the shared buffers
are not blown away.

If you need a custom LRU design, there is definitely something wrong in the
way that you are using an OLTP database.
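
The ring-buffer idea is a form of scan resistance; a toy sketch (not
Postgres's or InnoDB's actual implementation) of how scan pages can be kept
from evicting the hot working set:

```python
from collections import OrderedDict, deque

class ScanResistantCache:
    """Toy buffer manager: normal reads go through an LRU, but pages
    fetched by a large sequential scan are confined to a small ring so
    they cannot evict the hot working set (the idea behind Postgres's
    ring buffer and InnoDB's midpoint-insertion LRU)."""

    def __init__(self, lru_size, ring_size):
        self.lru = OrderedDict()
        self.lru_size = lru_size
        self.ring = deque(maxlen=ring_size)  # old scan pages fall off

    def get(self, page, scan=False):
        if scan:
            # Scan pages recycle a tiny ring and never touch the LRU.
            self.ring.append(page)
            return
        if page in self.lru:
            self.lru.move_to_end(page)
        else:
            self.lru[page] = True
            if len(self.lru) > self.lru_size:
                self.lru.popitem(last=False)  # evict least recently used

cache = ScanResistantCache(lru_size=4, ring_size=2)
for p in ("a", "b", "c", "d"):   # hot working set
    cache.get(p)
for p in range(1000):            # huge sequential scan
    cache.get(p, scan=True)
hot_survived = all(p in cache.lru for p in ("a", "b", "c", "d"))
```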

Connection Handling

This is complete BS. Nobody uses databases without connection pools. Agreed
that a thread is more lightweight than a process, but you would never notice
this overhead in practice, since it is again on the order of microseconds. In
a production system, one opens connections up front and then holds them in the
connection pool, so the overhead is almost invisible. If you are constantly
opening and closing connections then there is something seriously wrong with
your design.

> However, we have had occasional application bugs in our backend services
> that caused them to open more active connections (usually “idle in
> transaction” connections) than the services ought to be using, and these
> bugs have caused extended downtimes for us

So they are blaming the database for a bug in their design/system. Computers
are no match for human stupidity.

> Accordingly, using pgbouncer to do connection pooling with Postgres has been
> generally successful for us.

Again, what is the problem? The whole article smells more and more like a
useless rant: blaming the tools just because you don't know how to use them.

Conclusion

Another thing is that they have not given any kind of query/access pattern in
which they use postgres/mysql. They put in a couple of low level things and
then say that postgres is badly designed.

I can think of only two logical explanations

1) The article writer was already familiar with MySQL and didn't bother to dig
deeper into Postgres

2) They have been paid by oracle :P

~~~
david_k_kim
I completely agree with you. The blog post seems seriously biased, written to
justify choices that were made for other reasons.

------
postila
Well done :-)
[https://twitter.com/LeviNotik/status/757991465649778689](https://twitter.com/LeviNotik/status/757991465649778689)

------
xaprb
Why not PostgreSQL? (Sorry, someone had to say it.)

------
zeeshanm
I like the sample data they have used:

    
    
      id  first         last            birth_year
      1   Blaise        Pascal          1623
      2   Gottfried     Leibniz         1646
      3   Emmy          Noether         1882
      4   Muhammad      al-Khwārizmī    780
      5   Alan          Turing          1912
      6   Srinivasa     Ramanujan       1887
      7   Ada           Lovelace        1815
      8   Henri         Poincaré        1854

~~~
antonius
Mind explaining the significance? I didn't pick up on it.

~~~
muddyrivers
It caught my eyes as well. Like it very much.

------
dschiptsov
Nice to see how Postgres (a relative of Informix) follows the old-school maxim
of focusing on Consistency and Durability by being "append-only" and never
overwriting data. Sticking to the right principles is better than
over-optimization.

The Uber engineers should, perhaps, take a look at the changelogs of the last
5 or so releases of MySQL to see how many InnoDB bugs have been found in each
release, and read the stories about data losses due to the inability to repair
storage.

According to old-school DBA tradition, it is much better to have a
straightforward storage engine based on the right principles and written in C
than a fancy storage engine written in C++. At least if one values one's data.

Well, in the age of in-memory "databases", "durability through replication",
and "eventual consistency", the old-school focus on disk commits might sound a
bit outdated, until one reaches that moment when the shards get messed up and
there is no way to know how many writes are missing, or where.

A database is durable storage that guarantees data consistency and the
ability to roll back to a clean state through direct-access writes (bypassing
all caches). At least this is what we had in the glorious times of IDS 7.3.

------
frik
Also check out highscalability.com for more stories that value MySQL and its
great InnoDB engine:
[http://highscalability.com/blog/category/mysql](http://highscalability.com/blog/category/mysql)

------
morekozhambu
There is this interesting talk on MySQL vs Postgres.

[https://www.youtube.com/watch?v=emgJtr9tIME](https://www.youtube.com/watch?v=emgJtr9tIME)

------
cnfjdnx
Uber """"""""""engineering""""""""""

------
kev009
I like this transparency, I know I never want to work at uber.

------
slantedview
The connection handling section was surprising to me: Postgres uses a process
per connection! This is pretty shocking, in a bad way.

~~~
iampims
pgbouncer works pretty well for that. I kinda wish it was part of the default
Postgres install.

~~~
scrollaway
How does pgbouncer work with Amazon RDS (where you can't install separate
software)?

~~~
dijit
You install it on the client machines themselves and "connect locally" to the
bouncer, or you put a bouncer tier (a server pair) in front of the database
connections.
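For the client-side setup, a minimal pgbouncer.ini might look something like this (host name, database name, and paths here are placeholders; see the pgbouncer docs for the full option set):

```ini
[databases]
; Clients connect to "appdb" locally; pgbouncer forwards to the RDS endpoint.
appdb = host=mydb.example.rds.amazonaws.com port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; Transaction pooling keeps the server-side connection count low.
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
```

The application then points at 127.0.0.1:6432 instead of the RDS endpoint, so many client connections share a small pool of server connections.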

------
swasheck
this reads like a laundry list of buzzwords that were designed to justify not
throwing any effort into postgresql and just going with a shiny new toy (not
that mysql is new; yes, i know it's been around for a while).

it happens everywhere.

~~~
edgyswingset
From the post:

> [...] This design difference means that the MySQL replication binary log is
> significantly more compact than the PostgreSQL WAL stream.

Doesn't sound like what you described at all.

~~~
swasheck
so you're pulling one line from the article to tell me that i'm wrong?

on-disk format/write amplification:

> For tables with a large number of secondary indexes, these superfluous
> steps can cause enormous inefficiencies. For instance, if we have a table
> with a dozen indexes defined on it, an update to a field that is only
> covered by a single index must be propagated into all 12 indexes to reflect
> the ctid for the new row.

How wide is their data? Depending on the answer to this, it could be that
they've over-indexed, have a poor indexing strategy, or are reacting to the
poor queries generated by an ORM (not sure if they use one, or if they
hand-code their own SQL).

data corruption: everyone has bugs.
[https://bugs.mysql.com/search.php?search_for=&status=Active&...](https://bugs.mysql.com/search.php?search_for=&status=Active&severity=1&limit=10&order_by=&cmd=display&direction=ASC&os=0&phpver=&bug_age=0)
add in mysql's tendency to adhere only loosely to the SQL standard, and there
are many ways that you can actually corrupt your own data.

i'm not here to debate whether postgres is better than mysql. i'm just saying
that it seems like a lot of research went into justifying a switch. who knows,
maybe that research could have been spent optimizing their current
environment.

~~~
coldtea
> _so you 're pulling one line from the article to tell me that i'm wrong?_

Better than telling them that they are wrong while not only not pulling even
one line from their article, but also mischaracterizing it as something very
different from what it is.

In what world can the response to the concerns and analysis in the article
ever be: "data corruption: everyone has bugs"...

~~~
swasheck
the same world where mysql lets you corrupt your own data. did you just stop
reading? in what world does a data corruption event prompt you to change
platforms to another platform that has a history of data corruption?

~~~
fadzlan
They do address other stuff. They have huge writes and needs better writes
performance. Maybe not what you and I need, but hey, I suppose they know Uber
needs better?

MySQL handles it differently than Postgres, and gives them better performance
for their purposes (based on their experience/tests). They were explaining it
in the parlance that MySQL and Postgres use. If those are buzzwords, then
MySQL and Postgres are both built out of buzzwords?

Their explanation is not perfect (for one, why does their data model need
massive updates?). But I wouldn't write it off as buzzwords, or as dismissing
Postgres merely over data corruption. There are a lot of other things they
were trying to explain there.

