

Google and Facebook Team Up to Modernize Old-School Databases - salemh
http://www.wired.com/wiredenterprise/2014/03/webscalesql/

======
collyw
So relational databases are old school, and column stores are all the hip
trendy thing, despite throwing away a ton of features?

~~~
jlouis
Well, there are three types of databases: Relational, Columnar, and finally
massively distributed. They throw away different feature sets in order to
handle a certain subset better.

The primary reason Relational databases are so friggin' strong is that they
have a solid foundation and maturity. Most new "NoSQL" stuff is immature crap
which should never be used in production for any kind of persistent data. But
they are being used as such, and I do note that this matures the products over
time.

~~~
collyw
Tell me about it. I have been hauled into a project, where we are
reimplementing a relational database using NoSQL. No idea why, other than the
fact that they want to use NoSQL.

------
mixmastamyk
What is the rationale of putting this work into Mysql? With its sloppy
execution and the dark clouds of Oracle hanging over it? Why not Postgres, or
even MariaDB instead?

~~~
davidw
Because they are locked in to Mysql and can't easily switch to something
better.

~~~
MrBuddyCasino
I'd really like to see the evidence for that. Postgres > MySQL, and MySQL
doesn't have that many lock-in features last time I looked.

Any Google/FB/Twitter employees here that can elaborate?

~~~
HarrisonFisk
(I am a manager of the Data-Perf team at FB which deals with lots of different
DB technologies)

MySQL (specifically InnoDB) is extremely efficient as a storage backend
compared to PostgreSQL.

There are a few features that make InnoDB better in many cases:

1\. Change buffering for IO bound workloads: If you are IO bound, then InnoDB
change buffering is a huge, huge win. It basically is able to reduce IO
required for secondary index maintenance by a huge amount.

[http://dev.mysql.com/doc/innodb/1.1/en/innodb-performance-ch...](http://dev.mysql.com/doc/innodb/1.1/en/innodb-performance-change_buffering.html)

2\. InnoDB compression: When you are space constrained (say using flash
storage), then being able to compress your data is a big win. In our case, it
reduces space by around 40%, which translates directly to 40% fewer servers
required. While you could do something like run PG on ZFS with compression,
for an OLTP workload, you want the compression in the DB so that it can do a
lot of work to minimize the compressing and decompressing of data.

[https://dev.mysql.com/doc/refman/5.6/en/innodb-compression-i...](https://dev.mysql.com/doc/refman/5.6/en/innodb-compression-internals.html)

3\. Clustered index: The InnoDB PK is a clustered index. This makes a lot of
query patterns (such as range scans of the PK) very cheap. Combined with
covering indexes (which PG now has too!), you can really minimize the IO
required by properly tuned queries.
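
To make the range-scan point concrete, here is a toy sketch (not how InnoDB or PG lay out data internally). It assumes a made-up page capacity of 100 rows and compares how many pages hold the rows of one PK range when rows are stored in PK order versus in arbitrary order. The names `ROWS_PER_PAGE` and `pages_touched` are hypothetical.

```python
import random

ROWS_PER_PAGE = 100  # assumed page capacity, purely illustrative


def pages_touched(layout, lo, hi):
    """Count distinct pages holding rows with lo <= pk <= hi.

    `layout` is the list of primary keys in physical storage order.
    """
    return len({pos // ROWS_PER_PAGE
                for pos, pk in enumerate(layout)
                if lo <= pk <= hi})


keys = list(range(10_000))

# Clustered layout: rows are physically stored in primary-key order.
clustered = keys

# Heap-style layout: rows sit in arbitrary order (shuffled here).
heap = keys[:]
random.Random(0).shuffle(heap)

clustered_pages = pages_touched(clustered, 1000, 1199)
heap_pages = pages_touched(heap, 1000, 1199)
print(clustered_pages, heap_pages)
```

In this model the 200-row range sits on 2 adjacent pages in the clustered layout, while the shuffled layout spreads the same rows over many more pages. A real planner has more options than this, so treat it only as an illustration of locality.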

There are a variety of smaller things as well, such as InnoDB doing logical
writes to the redo log vs. PG doing full page writes, so on very high write
systems, the REDO log bytes written will be dramatically less. Also MySQL
replication has traditionally been more flexible than PG, but PG has made some
great strides recently, so I don't know if I would maintain that position
still.
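
A back-of-the-envelope sketch of the log-volume argument, under loudly stated assumptions: the page size, the per-record header, and the workload are all made up, and the "full page" case is a worst case where every change logs a whole page image, which real systems do not necessarily do.

```python
PAGE_SIZE = 8192      # assumed page size in bytes
RECORD_HEADER = 32    # assumed fixed per-record overhead in bytes


def logical_log_bytes(change_sizes):
    """Each log record carries only the changed bytes plus a header."""
    return sum(RECORD_HEADER + size for size in change_sizes)


def full_page_log_bytes(change_sizes):
    """Worst case: every change logs an image of the whole page."""
    return sum(RECORD_HEADER + PAGE_SIZE for _ in change_sizes)


changes = [64] * 1000  # 1000 updates of 64 bytes each (made-up workload)
logical = logical_log_bytes(changes)
full_page = full_page_log_bytes(changes)
print(logical, full_page, round(full_page / logical, 1))
```

With these numbers the logical log is 96,000 bytes and the worst-case page-image log is 8,224,000 bytes. The ratio depends entirely on how small the changes are relative to the page and on how often a full image is actually written.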

~~~
Jweb_Guru
Your smaller points don't make much sense to me... PG doesn't have a REDO log,
that's just not the way it's architected. If you mean WAL, a patch went out in
9.4 to prevent updates to a page from rewriting more than necessary, so
they're not doing full-page writes each time. Clustered index also makes
inserts slower--it's again not a straight win for MySQL (and what query
patterns other than range scans of the PK does it make cheaper?). Finally,
while obviously it's not the same as InnoDB compression, TOAST does a rather
good job in practice, and Postgres's indexes are quite efficiently compressed
(especially with new changes in 9.4). I can't speak to (1), but it's not at
all clear to me that any of these advantages put MySQL ahead in the long term,
and certainly not for all workloads.

~~~
HarrisonFisk
Right, PG calls the REDO log the WAL (for most purposes they are the same
thing). I did not know that 9.4 can do partial page writes to the WAL now.
Guess I will have more reading to do, thanks for pointing it out! A nice blog
post by a colleague recently showing how large writes to redo logs matter is
(not about PG, but why it is significant in the context of size of entries):

[http://smalldatum.blogspot.com/2014/03/redo-logs-in-mongodb-...](http://smalldatum.blogspot.com/2014/03/redo-logs-in-mongodb-and-innodb.html)

As far as when clustering a table is useful, see the CLUSTER command in PG. It
is roughly the same places you would want to do it, except it is automatically
maintained. You do need to realize what is going on to minimize impact on
inserts, but in a lot of cases, data is inserted in generally ascending order
so it is mostly 'free'. This does make GUIDs really bad for PKs in InnoDB.
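
A rough sketch of why key order matters for a structure kept in PK order. It uses a plain sorted Python list as a stand-in for the clustered index (an assumption, a real B-tree behaves differently in detail) and counts how many inserts land somewhere other than the end. Random 128-bit integers stand in for GUIDs, and `mid_index_inserts` is a hypothetical helper.

```python
import bisect
import random


def mid_index_inserts(keys):
    """Insert keys into a sorted list; count inserts that are not appends."""
    index = []
    not_appended = 0
    for key in keys:
        pos = bisect.bisect_left(index, key)
        if pos != len(index):
            not_appended += 1
        index.insert(pos, key)
    return not_appended


n = 5000
ascending = list(range(n))  # auto-increment style keys
rng = random.Random(0)
guid_like = [rng.getrandbits(128) for _ in range(n)]  # stand-in for GUIDs

ascending_mid = mid_index_inserts(ascending)
guid_mid = mid_index_inserts(guid_like)
print(ascending_mid, guid_mid)
```

Ascending keys always append, so nothing already stored has to move. Random keys almost always land in the middle, which in a page-based clustered index would mean touching and splitting pages all over the table.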

Clustered indexes are like covering indexes (which PG got recently). You don't
quite realize how useful they are until you get access to them ;)

InnoDB compression for us is primarily for non-lob objects, so TOAST is quite
a bit different than the cross-row compression that we get. We will normally
do compression of large objects outside of the DB whenever possible.
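
As a rough illustration of per-value versus cross-row compression, here is a sketch using Python's `zlib` on made-up rows. It is not how TOAST or InnoDB compression is implemented; it only shows that small, similar rows compress far better together than one at a time.

```python
import zlib

# Made-up rows that share a lot of structure, as rows in one table often do.
rows = [
    f"user_id={i};country=US;status=active;plan=free".encode()
    for i in range(200)
]

# Per-value compression: each row is compressed on its own.
per_row = sum(len(zlib.compress(row)) for row in rows)

# Cross-row compression: the rows are compressed together as one block.
together = len(zlib.compress(b"\n".join(rows)))

raw = sum(len(row) for row in rows)
print(raw, per_row, together)
```

Compressing each short row separately gains little because there is not enough repetition inside a single row, while the combined block can exploit the repetition across rows.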

I'm not saying that InnoDB is _always_ better than PG, but in _a lot_ of cases
that I have tested it with, it is indeed better. PG has come a long way
recently, including options such as covering indexes to close the gap.

------
rakoo
I don't think "Webscale" is the right word; they should have used
"Worldscale". The web part is irrelevant here.

~~~
RyanMcGreal
Every time I see the word "Webscale" I think of this animation:

[http://www.youtube.com/watch?v=b2F-DItXtZs](http://www.youtube.com/watch?v=b2F-DItXtZs)

~~~
coldtea
The name was SPECIFICALLY picked to refer to that animation, it's an inside
joke kind of thing.

------
jpalomaki
Previous discussion on HN:
[https://news.ycombinator.com/item?id=7480843](https://news.ycombinator.com/item?id=7480843)

Project page: [http://webscalesql.org/](http://webscalesql.org/)

------
mikkelewis
Currently under what circumstances will we see performance increases, and by
how much?

------
arrc
Well, not just Google / Facebook; LinkedIn and Twitter are also on the team.

