
Re: Why Uber Engineering Switched from Postgres to MySQL - GordonS
https://ayende.com/blog/175137/re-why-uber-engineering-switched-from-postgres-to-mysql
======
codeulike
Explanation: Ayende is a well known programmer in the .NET community. Among
other things he works on RavenDB (open-source NoSql db for .NET) and is
currently working on a new storage engine for it called Voron. In this post he
discusses Uber's blog post about Postgres and MySQL and compares the low level
workings of both with what his team is doing with RavenDB and Voron.

~~~
crescentfresh
Thank you for explaining what Voron is.

~~~
gtrubetskoy
BTW - "voron" is raven in Russian.

------
gtrubetskoy
The article talks about Postgres mostly delegating caching to the OS while
dealing with it via system calls (which they describe as "expensive"), while
InnoDB has its own implementation and is therefore faster.

I wonder if part of the problem is really Linux, not Postgres and that another
way they could have addressed this is by trying FreeBSD (which is what a lot
of people run Postgres on).

~~~
jasode
_> I wonder if part of the problem is really Linux, not Postgres_

I'm not a PostgreSQL expert but it seems like it has a different philosophy
with OS filesystem buffering.

Using Oracle and MS SQL Server as counterexamples... you can configure Oracle
to bypass os filesystem buffers by using "raw" disk io. With MS SQL Server, it
opens datafiles with CreateFile(,,,FILE_FLAG_NO_BUFFERING) to bypass the
Windows NTFS caching. (Microsoft recently ported SQL Server to Linux so not
sure what combination of techniques they are doing on there since
CreateFile(FILE_FLAG_NO_BUFFERING) is a Win32 api and not relevant to Linux.)

The closest approximation PostgreSQL has is the "shared_buffers" parameter but
from the documentation[1] I've read, the best practice for that setting is 25%
to 40% of main memory. In contrast, Oracle and MS SQL Server db engines with
their self-managed buffer pools are designed to use almost 100% of the main
memory.

Conclusion: PostgreSQL relies on the os filesystem cache and its own db buffer
does not supersede the os buffers. Other db engines' buffer pools are designed
to bypass the os filesystem cache.

Installing PostgreSQL on FreeBSD instead of Linux wouldn't drastically change
the recommended range for "shared_buffers".

[1][https://www.postgresql.org/docs/9.1/static/runtime-config-
re...](https://www.postgresql.org/docs/9.1/static/runtime-config-
resource.html)

~~~
valarauca1
>Microsoft recently ported SQL Server to Linux so not sure what combination of
techniques they are doing on there

Open options:

    
    
            int f = open("your file", O_DIRECT|O_DSYNC|O_RDWR);
    

O_DIRECT signals that a file descriptor should by-pass kernel level caching
and write directly to the device.

O_SYNC signals that a file descriptor calls should not return until all
data+metadata has been synced with disk.

O_DSYNC does the same as O_SYNC but doesn't force meta-data synchronicity
before the block ends.

Related Stack Overflow [https://stackoverflow.com/questions/5055859/how-are-
the-o-sy...](https://stackoverflow.com/questions/5055859/how-are-the-o-sync-
and-o-direct-flags-in-open2-different-alike)

Related LWN article
[https://lwn.net/Articles/457667/](https://lwn.net/Articles/457667/)

------
neilc
> a lot of the pain the Uber is feeling has to do with the way Postgres has
> implemented MVCC. Because they write new records all the time, they need to
> update all indexes, all the time

This is not true in general: there are circumstances where Postgres can skip
updating indexes when a tuple is updated, assuming the update only changes
values in non-indexed columns. See:

[http://pgsql.tapoueh.org/site/html/misc/hot.html](http://pgsql.tapoueh.org/site/html/misc/hot.html)

[https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f...](https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/heap/README.HOT;h=f12cad44e56c363e62fa617497bbedfe1ba8c1fe;hb=HEAD)

~~~
nl
The Uber article referenced makes it extremely clear they are working on
indexed columns - the examples are all about updating the secondary indexed.

~~~
Uberphallus
I'm curious on the particular usecase that requires slamming indexed columns
that often.

~~~
matthewrudy
Imagine the Uber `Rides` table, indexed on `status`

Every time a new order is created the status flows from

* unassigned

* assigned

* passenger on board

* completed

That should really be an event stream, but if you want to find all
`unassigned` orders, an index would be good.

We had a similar problem with this kind of table, at much lower volume than
uber.

But only when there was a two column index

`status, pickup time`

Or

`status, created at`

~~~
Uberphallus
> That should really be an event stream, but if you want to find all
> `unassigned` orders, an index would be good.

No, an index is not good for that, because it only has 4 values (at most 4
times faster than a full table scan), and it hammers the index on every
update, using PostgreSQL or not.

Same for the two column index: single column indexes on pickup time and
creation time are highly selective, and their values are updated basically
never, whereas status has low selectivity and updates are common: combining
them you don't significantly improve selects on the two columns, and limit the
performance edge on time select where time.

Status index only performs well when you select one of the values which is is
significantly underrepresented in the column (e.g. all except completed). The
most performing approach without doing black magic is to split in columns with
separate indexes: Assigned Time, Pickup Time, Completed Time. If you want
unassigned ones, you select where Assigned Time IS NULL, same for the other
columns. Performance on update for most RDBMS blasts, as index updates are now
distributed among several, allowing higher concurrency and overall throughput.

------
jakozaur
Discussion on original blog post was yesterday:
[https://news.ycombinator.com/item?id=12166585](https://news.ycombinator.com/item?id=12166585)

------
xomateix
Was down for me, cached version:
[https://webcache.googleusercontent.com/search?q=cache:https%...](https://webcache.googleusercontent.com/search?q=cache:https%3A%2F%2Fayende.com%2Fblog%2F175137%2Fre-
why-uber-engineering-switched-from-postgres-to-mysql)

------
pella
more info: Matt Ranney - presentation ( Senior Staff Engineer at Uber )

"Matt Ranney at All Your Base 2015"

[https://vimeo.com/145842299](https://vimeo.com/145842299) ( 30 min )

keywords: Uber + PostgreSQL + Chaos Monkey-style failure testing

"WHAT WILL I LEARN?

After this talk, you’ll be able to better assess the risk from the different
failure modes of databases in your system’s architecture."

------
darkhorn
If you want performance and the data is not critical like a Facebook post you
need to use MySQL. But if you deal with money, like a bank or an ERP then the
way to go is PostgreSQL. It should be a known fact for software developers.

~~~
BHSPitMonkey
As someone unfamiliar with the inner workings of both, why is that the case?

~~~
ak39
Because guaranteed transactions matter.

Moving $10 from Bob's account to Jack's involves:

1\. Inserting a transaction to remove $10 from Bob's account

2\. Inserting a transaction to add $10 to Jack's account.

Both 1 and 2 must succeed or fail in total if either of them fail (or both).

You can imagine the disaster in banking apps, ERPs and other financial systems
when 1 succeeds but 2 fails and the database is happy with that imbalanced
state.

~~~
BHSPitMonkey
MySQL has transactions, so why does this example prove that it's unsafe for
critical applications?

------
fritzy
Both of these articles fail to mention that as of Postgres 9.5, you can use
logical replication rather than WAL. This is why Amazon's RDS now supports
multi-region slaving with Postgres.

------
bogomipz
Why is there a reference and link to a wikipedia article about a sci-fi novel
in the second to last paragraph? Did I miss something? Is this an inside joke?

~~~
codeulike
Good question. [http://www.catb.org/jargon/html/O/on-the-gripping-
hand.html](http://www.catb.org/jargon/html/O/on-the-gripping-hand.html)

------
curiousgal
Are we supposed to know who Ayende is? Please change the title.

~~~
dev360
Almost everybody in the .NET community knows who he is.

~~~
soundoflight
I agree with @dev360. I think it has to do with RavenDB being more common with
.NET developers (at least from my experience).

~~~
V-2
Hm, isn't RavenDB a .NET solution? As such how could it not be "more common
with .NET developers"?

~~~
sjm-lbm
RavenDB is just a server application. There's nothing stopping you from
connecting to it via, say, Python.

It's just much more commonly used in .NET apps.

------
mrmondo
As someone else mentioned, this is a repost from yesterday - mods?

~~~
GordonS
This isn't a repost, it's a blog post by Ayende in response to yesterday's
article.

I thought it worthy of posting since Ayende heads up RavenDB and knows a thing
or two about databases and their low level implentation

~~~
mrmondo
Oh my gosh you're totally correct! Sorry for my mistake there.

------
joering2
Very nice write-up! Hope this helps them spam the heck out of people who never
opted into their advertisement in the first place.

/sarcasm

I only wish LE would treat CAN-SPAM seriously and put more sources into
criminal enforcement.

~~~
philliphaydon
I don't understand your comment...

~~~
curiousgal
HN's own r/SubredditSimulator/ I presume.

