

MySQL in the cloud at Airbnb - bkwok
http://nerds.airbnb.com/mysql-in-the-cloud-at-airbnb

======
blantonl
I run a fairly busy but normal implementation of MySQL on Amazon EC2 - 1
Master server and 3 read only slaves, with about 50GB of data spread between
our portal database, a mediawiki instance, a forums database, a large dataset
data for FCC license data, and a large core content database.

In initial testing of RDS after their recent updated release (we were excited
about it!), we've found that RDS performance is significantly lower than
running your own MySQL implementation with your storage being a 4 disk RAID-0
implementation on EBS. I suspect RDS uses a single EBS disk which is probably
the root of the performance issues.

We didn't run actual benchmarks, but we quickly determined that RDS wasn't
going to cut it from a performance perspective. In some cases, queries that
take 3 seconds on our own MySQL implementation took well over 20 secs on RDS.

I see the value of RDS for many use cases, but in our case the performance of
RDS is so significantly lower for a high-traffic use case that it wasn't going
to work out for us.

Now, we still see performance problems when running MySQL on RAID-0 EBS - so
we're considering going to an ephemeral disk implementation on EC2 and letting
a single slave run on EBS and then snapshot from there often. If we lose an
instance, we could restore from a snapshot quite quickly (within minutes).

In any case, regardless of our approach, RDS isn't going to be a part of it
for now.

~~~
tknaup
Assuming you use InnoDB, is your innodb_buffer_pool_size big enough to hold
your data (or a significant amount of it)? It's by far the most important
setting to get speedy reads, which form 95% of the queries in our case. When
you did the testing, did you warm up the RDS instance before, for example by
running a number of common queries? It naturally takes a while after booting
until MySQL can make efficient use of caches and buffers.

I'm not sure if instance-local storage will give you better performance
compared to EBS. From the EBS page: "The latency and throughput of Amazon EBS
volumes is designed to be significantly better than the Amazon EC2 instance
stores in nearly all cases."

There is a problem with the recovery approach you described: as far as I know
you can't use an EBS snapshot to populate local storage on a new instance, so
you would be forced to use EBS for the new master. Promoting the slave to be
the new master would be the better approach, with less downtime and data loss,
provided that MySQL replication didn't break unnoticed.

\- Tobi

------
davcro
I can't imagine life without RDS. I'm a solo developer responsible for a large
web app. Before RDS I hosted MySQL on dedicated hardware from Softlayer. I had
a dedicated box for the master, one for the slave, and another for storing
backups. The DB was about 100GB which made backing up a total nightmare.
Mysqldump was too slow, I ended up having to use xtrabackup from Percona.
Anyhow it got to the point were I was spending over 50% of my time maintaining
the database instead of doing productive things, like building new features
for my app.

Then I switched to RDS. I instantly fell in love with snapshots. I love that
you can backup your database while it is hot. Sometimes, before doing a scary
migration or query I'll take a snapshot just in case. I love that.

Even more amazing is the ability to scale up/down while the database is live.
When I was with Softlayer and I needed to migrate to new machines there was
always downtime and I always had to pay for both the new machine and the old
machine for a small duration of time. Thus migrating would cost a couple grand
in lost revenue and servers expenses, not to mention my Friday and Saturday
night.

~~~
rbranson
FYI: On Linux, you can do instant snapshots with LVM. This is pretty useful as
it's possible to snapshot an InnoDB database as it's running without losing
data.

~~~
davcro
This doesn't work at scale. The IO performance with LVM is terrible. I lost a
lot of money trying to implement this. Percona wrote Xtrabackup as a
replacement for LVM.

Documentation:

[http://www.mysqlperformanceblog.com/2009/02/05/disaster-
lvm-...](http://www.mysqlperformanceblog.com/2009/02/05/disaster-lvm-
performance-in-snapshot-mode/)

[http://www.mysqlperformanceblog.com/2009/02/24/xtrabackup-
op...](http://www.mysqlperformanceblog.com/2009/02/24/xtrabackup-open-source-
alternative-for-innodb-hot-backup-call-for-ideas/)

~~~
spudlyo
The _write_ performance with LVM is terrible.

LVM snapshots on read slaves or during times when you have sufficient write
capacity work as advertised.

------
Maakuth
Okay, this justifies my HN-browsing once again. I was thinking about hosting
MySQL-using application in the cloud literally thirty minutes ago, and then
this caught my eye in the HN front page. RDS looks very promising, I think it
will prove itself useful to us. Thanks HN!

~~~
ScotterC
Same here! I was on the fence about starting my app with RDS or moving to it
later. This article puts me over the edge

------
mceachen
For anyone considering using RDS -- make sure you set your character encoding
properly. Out of the box (at least in mid August), RDS used latin1_swedish.

Changing the database parameters is pretty painful, and the ec2 tool
documentation is a bit misleading. Here are instructions, including how to
verify your setup: [http://matthew.mceachen.us/blog/howto-configure-an-amazon-
rd...](http://matthew.mceachen.us/blog/howto-configure-an-amazon-rds-instance-
to-use-utf-8-925.html)

------
dan_manges
Does anybody how much Multi-AZ synchronous replication affects performance? I
searched, but I couldn't find any benchmarks. Amazon's FAQ[1] says "You may
observe elevated latencies relative to a standard DB Instance deployment in a
single Availability Zone as a result of the synchronous data replication
performed on your behalf" but they don't quantify it.

[1] <http://aws.amazon.com/rds/faqs/#39>

~~~
spudlyo
Since the hot standby can't serve read requests and is synchronous, I assume
something like DRBD is being used on the block level to replicate the data to
a separate AZ. I assume AMZN has adequate interconnect bandwidth between their
AZs. Gigabit Ethernet has around 100-200μs RTT latency, and then there is the
fact that you're writing the data twice...

------
perssontm
Would it be possible to get some figures of how much you pay amazon for this?
And also db-size and queries per day or something?

Good post!

~~~
samratjp
I suppose one could do a quick guesstimation using this
<http://calculator.s3.amazonaws.com/calc5.html>

------
mikeyk
Would have loved to have seen some actual perf numbers, especially in terms of
how much of a cost the Multi-AZ writes entail. We're doing Postgres on EC2,
which means RDS doesn't quite help, but Postgres 9's hot standby/streaming
replication at least gets a bit closer (and could be easily extended to a
Multi-AZ set-up if one of our hot standbys was in a different AZ).

------
Johngibb
How does the scalability offered by Heroku compare to this? Does Heroku let
you interface with RDS if you outgrow their db offerings?

~~~
detst
It's the very first item on their add-ons listing.

