HarrisonFisk's comments

(I am a manager of the Data-Perf team at FB which deals with lots of different DB technologies)

MySQL (specifically InnoDB) is extremely efficient as a storage backend compared to PostgreSQL.

There are a few features that make InnoDB better in many cases:

1. Change buffering for IO-bound workloads: If you are IO bound, then InnoDB change buffering is a huge win. It can dramatically reduce the IO required for secondary index maintenance.
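
The idea in (1) can be sketched with a toy model (hypothetical Python, not InnoDB's actual implementation): instead of touching a secondary-index page on disk for every write, pending changes are buffered in memory and merged only when the page is next read.

```python
# Toy sketch of change buffering: secondary-index page updates are
# queued in memory and merged only when the page is actually read,
# turning many random writes into one deferred merge.
class ChangeBufferedIndex:
    def __init__(self):
        self.pages = {}        # page_id -> sorted list of keys ("on disk")
        self.buffer = {}       # page_id -> pending inserts (in memory)
        self.page_writes = 0   # count of "IO" operations on pages

    def insert(self, page_id, key):
        # No page IO here: just remember the pending change.
        self.buffer.setdefault(page_id, []).append(key)

    def read(self, page_id):
        # Merge pending changes exactly once, when the page is needed.
        pending = self.buffer.pop(page_id, [])
        if pending:
            page = self.pages.setdefault(page_id, [])
            page.extend(pending)
            page.sort()
            self.page_writes += 1  # one write instead of len(pending)
        return self.pages.get(page_id, [])

idx = ChangeBufferedIndex()
for k in [5, 3, 9]:
    idx.insert("p1", k)       # three logical inserts...
print(idx.read("p1"))         # [3, 5, 9]
print(idx.page_writes)        # ...but only 1 page write
```

In the toy, three inserts cost a single page write; the real change buffer is far more sophisticated, but the IO-saving principle is the same.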


2. InnoDB compression: When you are space constrained (say, using flash storage), being able to compress your data is a big win. In our case, it reduces space by around 40%, which translates directly to 40% fewer servers required. While you could do something like run PG on ZFS with compression, for an OLTP workload you want the compression in the DB so that it can do a lot of work to minimize the compressing and decompressing of data.


3. Clustered index: The InnoDB PK is a clustered index. This makes a lot of query patterns (such as range scans of the PK) very cheap. Combined with covering indexes (which PG now has too!), you can really minimize the IO required by properly tuned queries.
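
As a toy illustration of why PK range scans are cheap on a clustered layout (hypothetical Python, nothing InnoDB-specific): because the rows themselves are stored in PK order, a range scan is two binary searches plus one contiguous read, rather than an index probe per row.

```python
import bisect

# Toy clustered table: full rows stored sorted by primary key, so a
# PK range scan is a single contiguous slice -- no per-row lookups.
rows = sorted([(7, "g"), (2, "b"), (5, "e"), (3, "c"), (9, "i")])
keys = [pk for pk, _ in rows]

def pk_range(lo, hi):
    # One binary search on each end, then a contiguous read.
    i = bisect.bisect_left(keys, lo)
    j = bisect.bisect_right(keys, hi)
    return rows[i:j]

print(pk_range(3, 7))   # [(3, 'c'), (5, 'e'), (7, 'g')]
```

With a non-clustered heap, each matching key in the index would typically mean a separate fetch of the row itself.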

There are a variety of smaller things as well, such as InnoDB doing logical writes to the redo log vs. PG doing full page writes, so on very high write systems, the REDO log bytes written will be dramatically less. Also MySQL replication has traditionally been more flexible than PG's, but PG has made some great strides recently, so I don't know if I would still maintain that position.


Your smaller points don't make much sense to me... PG doesn't have a REDO log, that's just not the way it's architected. If you mean WAL, a patch went out in 9.4 to prevent updates to a page from rewriting more than necessary, so they're not doing full-page writes each time. Clustered index also makes inserts slower--it's again not a straight win for MySQL (and what query patterns other than range scans of the PK does it make cheaper?). Finally, while obviously it's not the same as InnoDB compression, TOAST does a rather good job in practice, and Postgres's indexes are quite efficiently compressed (especially with new changes in 9.4). I can't speak to (1), but it's not at all clear to me that any of these advantages put MySQL ahead in the long term, and certainly not for all workloads.


Right, PG calls the REDO log the WAL (for most purposes they are the same thing). I did not know that 9.4 can do partial page writes to the WAL now. Guess I will have more reading to do, thanks for pointing it out! A colleague recently wrote a nice blog post showing why large writes to redo logs matter (not about PG, but it explains why the size of entries is significant):


As far as when clustering a table is useful, see the CLUSTER command in PG. It is useful in roughly the same places, except with InnoDB it is automatically maintained. You do need to realize what is going on to minimize the impact on inserts, but in a lot of cases data is inserted in generally ascending order, so it is mostly 'free'. This does make GUIDs really bad for PKs in InnoDB.

Clustered indexes are like covering indexes (which PG got recently). You don't quite realize how useful they are until you get access to them ;)

InnoDB compression for us is primarily for non-lob objects, so TOAST is quite a bit different than the cross-row compression that we get. We will normally do compression of large objects outside of the DB whenever possible.

I'm not saying that InnoDB is always better than PG, but in a lot of cases that I have tested it with, it is indeed better. PG has come a long way recently, including options such as covering indexes to close the gap.


Thanks a lot for the detailed reply, this is why I love HN!

1-2 years back I saw a benchmark stating that PG works better than MySQL on I/O constrained AWS instances, so I didn't expect this ("I've learned something today").


I think Postgres is a better default choice because you are likely to be safer and for it to behave properly with your data. Apparently MySQL might be faster for certain workloads, with certain tradeoffs. That's the kind of informed decision you can make once you're scaling to that size.


Thanks. Presumably these optimizations could be added to postgres as well, no?

Since work will have to be done to whichever DB is chosen, it would be best for everyone if the one chosen were of high integrity.


There is nothing that prevents these from being implemented for PostgreSQL. In fact, they have already done a lot of optimizations that close the gap, such as covering indexes. Before covering indexes, InnoDB was even further ahead of them.

Of the ones mentioned, I would guess compression is the most complex. For InnoDB it took many years to get to the point of being usable and efficient. Many naive implementations cause huge CPU overheads, which makes them unusable.


There is nothing that precludes public contributions to the project. The important part is that the change be useful to 'webscale' type applications.

Prior to this announcement, the only people involved were these companies, in order to get it bootstrapped, so the work so far looks a bit slanted toward them.

Taken from the FAQ:


"We’ll be doing everything in the open, so everyone in the MySQL community will be able to take what they want from what we do and contribute however they like."


Yes, we are still very active users of MySQL. Most of the primary portions of facebook.com are still served from a backend MySQL system (with lots of caching and many other services involved). Some data is not stored in MySQL, but in other systems such as HBase (messages being a big one).


Doing compression on the ZFS level is significantly worse than InnoDB compression.

InnoDB has a lot of really smart optimizations which make it much better than just zipping things up. Included are the modification log (so you only have to re-compress occasionally) and a dynamic scaling ability to keep compressed pages in memory rather than always decompressing. These optimizations are really only possible with an understanding of the data.
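
The modification-log idea can be sketched roughly like this (hypothetical Python using zlib; InnoDB's actual on-disk format is entirely different): small changes are appended to an uncompressed log alongside the compressed page, and the page is only recompressed when the log fills up.

```python
import zlib

# Toy compressed page with a modification log: small updates append to
# an uncompressed log, and we only pay for recompression when it fills.
class CompressedPage:
    def __init__(self, records, mlog_limit=4):
        self.blob = zlib.compress("\n".join(records).encode())
        self.mlog = []                 # uncompressed pending changes
        self.mlog_limit = mlog_limit
        self.recompressions = 0

    def insert(self, record):
        self.mlog.append(record)
        if len(self.mlog) >= self.mlog_limit:
            # Log is full: merge everything and recompress once.
            records = self.read()
            self.blob = zlib.compress("\n".join(records).encode())
            self.mlog = []
            self.recompressions += 1

    def read(self):
        base = zlib.decompress(self.blob).decode().split("\n")
        return base + self.mlog

page = CompressedPage(["r1", "r2"])
for r in ["r3", "r4", "r5"]:
    page.insert(r)
print(page.recompressions)    # 0 -- three inserts, no recompression yet
page.insert("r6")             # fourth pending change triggers a merge
print(page.recompressions)    # 1
```

A filesystem doing block-level compression has no equivalent of this: it must recompress on every write, because it cannot tell a small logical change from a full page rewrite.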

I would only consider ZFS for something like an append-only data warehouse type system.


I'm sure you would want the compression done by the database and not the filesystem, since there are many ways to do compression to fit specific applications, and the database knows what it's trying to do. I read a little bit of the MySQL docs regarding how it uses compression, and it sounded pretty different than general-purpose compression.


Do you have any references?


This is a good intro for those (like me) who are unfamiliar:


I know that a lot of work has been done on InnoDB compression, but I didn't quite grasp the extent, thinking it wasn't far different from Postgres ( http://www.postgresql.org/docs/current/static/storage-toast.... ).


In addition, when you are dealing with many thousand application servers and many hundred database connections, the database will not end up happy having to maintain > 10k connections.

So you will end up needing to add a proxy or connect every time. As you indicated, since MySQL connections are so cheap (i.e. you can do >50k per second), it is faster and easier to create a new connection each time.


Not just the database, at those quantities of connections, the OS itself starts to have problems.

One frequently implemented solution is to kill off old idle connections, but you may be surprised at how many pooling solutions don't deal with killed connections well (or force the handling on the application at the wrong layer of abstraction).
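
One common way to handle this well (a generic Python sketch, not any particular pooling library) is to validate connections on checkout and silently discard dead ones, so the application layer never sees a killed connection:

```python
import collections

# Generic sketch: validate connections on checkout so the application
# never receives one that the server (or an idle-timeout) already killed.
class FakeConn:
    def __init__(self):
        self.alive = True
    def ping(self):
        return self.alive

class Pool:
    def __init__(self, size, factory):
        self.factory = factory
        self.idle = collections.deque(factory() for _ in range(size))
        self.replaced = 0

    def checkout(self):
        while self.idle:
            conn = self.idle.popleft()
            if conn.ping():        # still usable?
                return conn
            self.replaced += 1     # dead: drop it, try the next one
        return self.factory()      # pool exhausted: open a fresh one

pool = Pool(3, FakeConn)
pool.idle[0].alive = False         # simulate the server killing an idle conn
conn = pool.checkout()
print(conn.ping())                 # True -- caller gets a live connection
print(pool.replaced)               # 1
```

The ping adds a round trip per checkout, which is the usual tradeoff versus pushing dead-connection handling up into the application.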


> it is faster and easier to create a new connection each time.

Unless you're creating an SSL connection, right?

Did you guys read the same OP I read?


Why would the license matter for a kernel module? You know that the Linux kernel is also GPL v2, right?


How about the usual reason? Because it can't be pulled into non-GPL projects.

This is actually relevant, at least, to me. I've been porting Twitter's fatcache[1] to iOS, and wouldn't be if it was GPL'd. I haven't even looked at Facebook's code, due to the GPL license, which makes it—literally—unusable to me on iOS and Android.

[1] https://github.com/twitter/fatcache


It is possible to make a legal argument that a Linux-specific device driver is in fact derivative of the Linux kernel code that it interacts with. If said argument is made, then no matter what the author of said code says, it may actually be forced to be GPL v2.

I am not a lawyer, and I have no real idea how likely said argument is to hold up in an actual court. It also seems unlikely that anyone would actually file a lawsuit about this sort of stuff. But if you want to be as careful as you can be, you won't go wrong if you use the GPL v2 on principle for a Linux device driver.

On this topic it is worth noting that Linus Torvalds has stated different seeming opinions at different times. In general as more and more of the Linux kernel has been made available to loadable modules, he's become less and less inclined towards the argument that they are not derivative works. See http://linuxmafia.com/faq/Kernel/proprietary-kernel-modules.... for a comparison of his views at two points in time. (I don't know his current views.)


What qualifies as a derivative work is always a complicated issue. The FSF and Linus Torvalds have both given their opinions, but it is really up to a court to decide whether a kernel module relies on the kernel. A court would ask whether the module is based on the functionality of the kernel, to the point where it expands on what the kernel does.

Words like "based on", "expands", and "relies on" are the keywords a judge or jury would look at in deciding whether something is a derivative work. A kernel module would likely have to pass each of these tests to not be declared derivative.

The words "linking" and "loadable modules", however, are not part of copyright law, and would only come into question regarding the intent of the author. One could argue that, thanks to Linus's posts, his intention is to allow such actions under non-GPL terms even if the module would be a derivative work under copyright law. That would, however, be limited to code that Linus himself has written.


IANAL, but if you create a derivative work of GPL v2 code you have to release it as GPL v2. This doesn't mean you, the copyright holder, cannot release it also with another license. The GPL forces you to ensure that people can use your extension of the original code under the same terms as the original code.

In the case of kernel modules, if you distribute a binary built with kernel headers (as distributions do), then the sources of that kernel module have to be available under the GPL license.

But if you distribute the sources of a kernel module, and each user compiles it, it can legally be loaded into a kernel without breaking the GPL. That's how ZFS can legally run on Linux as a native module. I don't know if that's really necessary, but better safe than sorry.

Distributions can automate this compilation step so that from the point of view of the end user it doesn't make any difference, just a nuisance to the package maintainers (and yes, a few more MB of downloads for compilers and such).


IANAL, but if you create a derivative work of GPL v2 code you have to release it as GPL v2. This doesn't mean you, the copyright holder, cannot release it also with another license.

Sorry, not true.

The whole point of it being a derivative work is that someone else is also a copyright owner. If you release under a different license, you're claiming to give permission over their work. You can SAY that, but you're wrong.


As the copyright holder of a specific work, you can license the said work any way you like (GPLv2, Commercial, BSD, Apache, et cetera). Dual licensing [1] is not out of the question.

[1] http://en.wikipedia.org/wiki/Multi-licensing


It's indeed terribly unfortunate that you can't take free software and make it nonfree.


But it is kernel code. Would you expect it to be trivial to port a device mapper kernel module from Linux to Darwin? Seems like a bit of a stretch.


It wasn't for 1B cash, it was 800M cash, 200M Sun stock.


The easiest way is to avoid running into the problem.

The main reason the central table space grows is due to rollback segments. To avoid collecting a lot of them there are two tips (in addition to file per table you mentioned):

1. Use multi-threaded purge.

2. Avoid very long-running transactions (i.e. 12+ hours).

Since doing that, we don't really see this problem any longer.


This really reminds me a lot of the progression of MySQL.

Originally it was used with ISAM/MyISAM and it was pretty popular. Then InnoDB came around and it quickly revolutionized the MySQL world, allowing MySQL to grow to the next level. Now InnoDB is by far the most commonly used storage engine and the default on several distributions.


When you delete your account, the data is actually removed. As part of the recent privacy audit, the Irish DPC reviewed the account deletion framework and verified that it does indeed work:

http://www.dataprotection.ie/docs/21-09-12--Facebook-Ireland... (section 2.10 - page 41)


