Ingesting MySQL data at scale – Part 1

netcraft · on Aug 12, 2016

> Every day we collect more than 100 terabytes of data from MySQL databases that is an amazing amount of data - I dont use pinterest but had no idea they were that large.

firasd · on Aug 12, 2016

They have over 100 million MAUs so even 1000 UTF-8 characters for each each active user (email, first name, last name, location, tokens, other stuff) would be up to 0.4 TB. Then add all the inactive users, and pins, tags, friend connections, and all the other data they'll process for recommendations/discovery.

delta1 · on Aug 12, 2016

MAU == Monthly Active User for those that don't know the acronym like me

pbreit · on Aug 12, 2016

Curious if Postgres would be up to the task here?

jasode · on Aug 12, 2016

I notice you often ask that question on MySQL threads.[1][2]

Here's the recurring pattern: if it's a single-instance database, either PostgreSQL or MySQL will work. However, if it is sharded database with multi-master replication, the overwhelming industry preference is MySQL instead of PostgreSQL.

When it's just a single-instance db, a compelling case can be made for PostgreSQL because of features such as stricter type checking and stronger stored procedure language.

However for sharded databases, there have been several high-profile case studies of migrations from PostgreSQL to MySQL including Etsy and Uber. I can't think of any major company doing the reverse of migrating a multi-master db from MySQL to PostgreSQL.[3] From an operational standpoint, the replication of Postgre dbs is fragile compared to MySQL. Instagram is a famous example of a large website using PostgreSQL but keep in mind that they did(do) master-slave instead of master-master. Reddit on Postgre is another example configured as master-slave. As for this particular thread about Pinterest, they use master-master and I've seen no evidence that Postgres+3rd party tools is superior for that scenario.

If you disagree with the above assessment, it would be helpful if you explain how PostreSQL is equivalent or better than MySQL for multi-master db replication architectures. If you just repeatedly ask, "is Postgres good for this?", the replies don't seem to give you the answers you're looking for. If you have a strong position, you should state it so it guides further discussion.

[1]https://news.ycombinator.com/item?id=10087412

[2]https://news.ycombinator.com/item?id=10926854

[3] when Uber migrated from MySQL to PostgreSQL in 2013, it was still a single-instance to single-instance migration. They did it to take advantage of PostGIS features. The subsequent 2016 operational difficulties of multi db replication pushed them back to MySQL.

rwultsch · on Aug 12, 2016

Hi, I was the first MySQL DBA hired Pinterest and before that I worked at Facebook and GoDaddy. At none of these places did we run active/actice. One of the first things I did at Pinterest was rip out the multi-master configure because it is dangerous.

Why MySQL? Really easy replication, no vacuum, and great point lookup performance.

cookiecaper · on Aug 12, 2016

You're obviously qualified so I'd love to get more info on how you deployed MySQL replication. I have a low opinion of it after using it at my employer for the last few years. MySQL allows writes to slaves (yikes) and the binlog coordinates are a constant hassle. ROW, STATEMENT, and MIXED binlog formats can make interoperability a mess and further increase the risk that there will be discrepancies between slave and master data (for some reason, we're on STATEMENT and/or MIXED replication, which means that the slave receives a list of the queries to execute and may have different output than the master). To sync up a new slave, you have to set up slave_skip_errors to ignore primary key conflicts and other common errors that occur because it's impossible to really get a firm binlog coord (it's really only marginally possible due to Percona's efforts with innobackupex/mydumper). I know that GTIDs are available now and are supposed to resolve the issues with binlog coords, but we haven't activated/used this featureset yet. Is it really a magic bullet?

MySQL just seems to do replication in a very unstable way that no other major RDBMS is willing to approach. I use PostgreSQL replication on a personal project and while it takes a smidgen longer to set up (maybe 20 minutes instead of 5), it seems well worth it to me. I never have to worry about whether my slave's output matches the master (because the query is performed on the master and the result shipped out via the WAL and that's the only way to use replication). I don't have to worry that I connected to the wrong endpoint and accidentally wrote to the slave, causing a conflict that requires a new full dump and resync. I don't have to worry that binlog coords are going to be wrong or that a write will go in at just the wrong moment and potentially require me to redump and resync the whole thing again (but at least require me to skip errors).

How do you handle these issues with MySQL replication, and justify the risks in order to shave a few minutes off the upfront setup time?

rwultsch · on Aug 12, 2016

Hi cookiecaper, All our operational code (other than pinterest specific bits) is open sourced https://github.com/pinterest/mysql_utils

Both statement and row based replication can be very reliable in modern versions of MySQL. It is my experience the ways data is corrupted are: 1. read_only not being set on slaves, so random users can write to the slave. We set read_only on startup based on service discovery. 2. Bad automation for failovers. See https://github.com/pinterest/mysql_utils/blob/master/mysql_f... for how we do it. 3. Crashes without all the durability settings being on.

If you are having to run slave_skip_errors, you are doing it wrong. You should checkout out our automation for backups and restores. They can be found in mysql_restore.py and mysql_backup.py .

With regard to PG replication, I suggest you watch https://m.youtube.com/watch?v=bNeZYVIfskc&t=26m54s Uber had a 16 hours outage in large part caused by pg replication issues.

Vacuum issues are also no joke.

There are a host of other issues: MySQL can deal with large numbers of connections, PG needs middleware. MySQL is more efficient with "web" workloads where most queries only need to pull on row. etc...

-Rob

cookiecaper · on Aug 13, 2016

Those MySQL utilities definitely have a lot of interesting stuff in them and will be good for further research, so thanks. After digging up the xtrabackup command used in there I don't see anything obvious that we were missing (we no longer perform xtrabackup commands since we're all RDS now), but will continue to research. It actually looks like some instances may have been as simple as using the wrong slave coords file; apparently xtrabackup_slave_info contains the master coords, not xtrabackup_binlog_info, which records where the slave is in its own binlog. That's a bit counterintuitive.

We've had other problems with innobackupex, though. We were working with Percona support on a case where a backup lock blocked writes to the DB for 40 minutes and never really resolved it. We had to use minimalist locking parameters in several other cases, which may have also contributed to incorrect binlog coordinates. We experienced a variety of other bugs and issues as well, including corrupted database files, and normally had to invoke our Percona Gold contract to get workarounds or patches.

I'll just note that this class of errors doesn't seem possible in any non-MySQL replication system; there is no slave_skip_errors setting in PgSQL. Your slave either has integrity or it doesn't. That's the way slaves should be. A sane database system won't allow a user to write to a slave or to skip replication rows. I'll also note that the band-aids that make MySQL semi-usable are only there because of Percona's efforts. This stuff doesn't make MySQL seem promising, even if there are workarounds for some of the problems.

Uber's problems as described in that video had nothing to do with PgSQL replication. The carnage was caused by running out of disk space. MySQL doesn't behave well when it gets to 0 free space either; I know from experience. It's as much AWS's fault as PgSQL's, because the reason their disk filled up was a change to IAM requirements. He mentions briefly an attempt to hack Pg replication so it would try to resync from a file with a corrupted header, but probably good for him that that didn't work.

Can't speak to complications associated with vacuum as I've never had to deal with super-large PgSQL databases and pgbouncer is indeed annoying.

olavgg · on Aug 12, 2016

I cannot understand the fuzz about multi-master replication, it either involves dirty/phantom reads, or it is awfully slow. Wikipedia has a lot of examples of these issues https://en.wikipedia.org/wiki/Isolation_%28database_systems%...

There should be a book about the dark side of multi-master replication at scale, about what can you trade away and still be happy with. Preferring either PostgreSQL or MySQL for this is just avoiding the technical complexity that multi-master replication is.

pbreit · on Aug 12, 2016

I was just genuinely curious and not experienced enough to hold an opinion stronger than that generally Postgres seems to be a good choice for most of what could be thrown at it in the web world.

Thaxll · on Aug 12, 2016

Who do people feel the need to mention Postgres on every post talking about MySQL... It reminds me the same problem with Linux and *BSD.

dump121 · on Aug 12, 2016

They are two competing open source databases, knowing strength/weakness of each is always good. So unless the answer is `we rock, you suck`, it would be a useful info. for lots of people

threeseed · on Aug 12, 2016

PostgreSQL has been mentioned on EVERY database post for years now. And it always involve an endless amount of condescension in particular directed at those that god forbid choose to use a NoSQL database.

Everything seems to get turned into a competition despite the fact that the world have moved away from the centralised, vertically scaled database towards a more heterogenous landscape.

infamouscow · on Aug 12, 2016

What drives me mad is when people claim they choose PostgreSQL over MySQL because data integrity concerns, yet they run PostgreSQL on Linux as opposed to FreeBSD or Illumos using ZFS.

dijit · on Aug 12, 2016

Postgres' MVCC is almost like copy-on-write by itself, so anything more than a journal is overkill in any realistic scenarios.

Personally we tried ZFS with PostgreSQL and we managed to fill the WAL drive before the vacuum could clear it out with our update rate.

So we went back to XFS on the more commonly adopted (in our company); CentOS

EGreg · on Aug 12, 2016

BSD > Linux

Postgres > Mysql (in most cases)

But larger learning curve

spudlyo · on Aug 12, 2016

Sadly at present Postgres has absolutely awful mysqldump support (It's not even on their roadmap) so I doubt it would be a good fit for Pinterest's environment.

dijit · on Aug 12, 2016

is that a joke?

of course they wont support mysqldump, but there is pg_dump, which does the same sort of thing except you can dump to binary and compress it on the way without pipes/forks.

spudlyo · on Aug 12, 2016

Wait, hold the phone, are you telling me by migrating to Postgres I can shave a handful of system calls off my long-running i/o bound logical backups!? Oh Postgres, is there anything you can't do better than MySQL?

dijit · on Aug 12, 2016

No I'm telling you that `mysqldump` will not support postgres the same way that a diesel truck will not take petrol.

spudlyo · on Aug 12, 2016

[flagged]

dreaminvm · on Aug 12, 2016

Not sure if you are trolling, but why would Postgres want to support mysqldump? Postgres has pg_dump and MySQL has mysqldump. If you are comparing the capabilities of pg_dump vs mysqldump, that's a different discussion.

shenli3514 · on Aug 12, 2016

If there is a RDMS with great scale capacity, it will save the effort of data migration. Spark/Hadoop could read data from the RDMS directly.

spudlyo · on Aug 12, 2016

I found this article very unsatisfying due to the lack of key details. This reads like a blog post tailored for an audience of folks with pointy hair.

As far as I can tell, they went from Hadoop mappers pulling logical backups from the DBs using python and mysqldump to ... Hadoop mappers pulling logical backups from S3, which were pushed there by scripts running on the DBs, probably still using mysqldump. Although I have no idea, since there are no details. Are these backups physical? Logical? How is the 12 hour big table problem solved by this approach? Why was there a limitation on the number of mappers usable by the old approach?

And what of the old system's DB failover problem? Nothing that can't be solved with a script that is failover aware! Nice. No reason to have dedicated ingestion slaves that aren't, you know, master candidates, when you have scripts that can restart themselves.

haney · on Aug 12, 2016

As someone with pointy hair I had no idea this was a stereotype. Should I migrate to a more web scale haircut?

matteuan · on Aug 12, 2016

"Ingesting MySQL data at scale - Part 1"