Ask HN: Anyone used Amazon EC2 with a database or other high IO operation?
62 points by MichaelApproved on Nov 15, 2009 | 56 comments
I'm thinking of using Amazon's EC2 to run my database and web farm, but I'm concerned about the intense IO that's required to run the database. I've shared resources on a VPS in the past and it's been a nightmare. I know they offer high-IO servers that have limited sharing, but it's still sharing.

Has anyone had good or bad experience running a high IO operation using EC2?




We run all of reddit on ec2, which includes a bunch of postgres servers. Each one is running with a single EBS. However, I've heard horror stories of people trying to run much less busy databases and having lots of problems, usually with MySQL.

Those databases are all on XLarge instances, so there is minimal sharing, and we've also gone to great lengths to make sure all of our normal queries are covered by indexes, so the disk gets hit less.

We also have a read slave for every database to alleviate read loads.

One thing you might want to do is run 'iostat -xtc' on your current box and put that in a log file. Then go back and analyze it and see what your average and peak reads and writes are. Amazon's max for a single EBS appears to be about 1000 ops / second (at least, that is what we were doing when they told us we maxed out the performance of the disk).
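
Concretely, a minimal sketch of that kind of logging and analysis (the device name and the awk column positions are assumptions; check them against your own iostat output before trusting the numbers):

    # sample extended device stats every 60 seconds into a log
    nohup iostat -xtc 60 >> /var/log/iostat.log &

    # later: average and peak read/write ops per second for one device
    # (in sysstat of this era r/s and w/s are columns 4 and 5 -- verify first)
    grep '^sda' /var/log/iostat.log | awk '
      { r += $4; w += $5; n++;
        if ($4 > pr) pr = $4; if ($5 > pw) pw = $5 }
      END { printf "avg r/s %.1f  avg w/s %.1f  peak r/s %.1f  peak w/s %.1f\n",
            r/n, w/n, pr, pw }'

If your peaks are already near that ~1000 ops/sec figure on your current hardware, a single EBS volume probably won't be enough.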

Good luck!

Edit: I forgot to mention that on all the database disks, we use ext2 and noatime. Both decrease the total number of writes necessary, and have very little downside (the biggest being that you have to fsck on a crash).
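
For anyone wondering what that looks like, a minimal sketch (the device name and mount point are just examples, not our actual layout):

    # /etc/fstab entry for an EBS volume holding the database files
    /dev/sdf   /var/lib/postgresql   ext2   noatime   0   0

    # or flip an already-mounted volume over without a reboot
    mount -o remount,noatime /var/lib/postgresql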


I was just talking to someone wondering how you guys at reddit were able to do it. The article about taking your last server offline really pushed me to move forward on this.

I should mention that I'm running Windows with MSSQL. I imagine the performance should be similar, but there must be some differences, of course.

I like the option of Rackspace allowing for a dedicated server next to the cloud, but I should probably give EC2 a few test runs of my own too. I think if EC2 is going to work it won't be an overnight solution. I'll have to move over slowly, learning to tune the box as I use it more.


If you're using Windows, I would stay away from EC2 right now. They aren't quite mature enough for Windows yet. If you look at the EC2 forums, the majority of complaints are about Windows bugs.

Actually, you might want to try and get in on Microsoft's Azure beta.


Azure is tempting, especially since MS is trying hard to get into this market, but I worry about putting all my eggs in the MS basket this early in Azure's life.


Are you able to build something quickly to do some test runs against Windows Azure to see if it has the performance you need? You may also want to check out SQL Azure (part of the Windows Azure Platform), also in beta, to see if it can do what you need for DB performance.

As far as Windows Azure being early in its life, you are correct. However, it has been in beta for about a year and it is running Windows Server 2008, which has been in the market for almost two years. You may also want to look at the Windows Azure forums at http://bit.ly/MSDNWinAzureForum to get an idea of other people's experiences with it.

You can also get more info about Azure and its beta availability at http://bit.ly/WinAzurePlatformDevCenter

(Jason - working for M80, representing Microsoft)


I'm going to PDC this week and hopefully there will be people there with some good real-world experience with Azure that I can poll.


Have a great time at PDC. I'm sure finding Windows Azure experts there will be like finding coffee in Seattle :-), and you'll definitely find plenty of experienced Azure users at the Azure presentations.


I am running MSSQL on Windows on EC2 without any issues, normal db tuning rules apply.


Here at RescueTime we use EC2 for all of our servers, with the exception of one server outside EC2 for monitoring.

All of our MySQL instances are currently each running on 4x EBS volumes, using LVM striped across all 4 volumes.
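
A rough sketch of that kind of layout, in case it helps (device names, volume size, stripe size, and filesystem are assumptions, not our exact config):

    # four EBS volumes attached as /dev/sdf .. /dev/sdi
    pvcreate /dev/sdf /dev/sdg /dev/sdh /dev/sdi
    vgcreate dbvg /dev/sdf /dev/sdg /dev/sdh /dev/sdi

    # -i 4 stripes extents across all four volumes, -I 256 = 256KB stripe size
    lvcreate -n dblv -i 4 -I 256 -L 400G dbvg

    mkfs.xfs /dev/dbvg/dblv
    mount -o noatime /dev/dbvg/dblv /var/lib/mysql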

Our primary transactional table is nearing a billion rows, and just this weekend we performed a migration that segmented this table into 40 partitions, containing data back to 2007.

We are currently handling roughly 3,000,000 inserts and 2,500,000 updates on this one table daily. IOWait on our DB holds steady at about 5% throughout our peak (5am - 4pm PST).

It is certainly not Fibre Channel backboned speeds, but the other benefits of using EBS CURRENTLY outweigh any IO performance hits we might take. The ability to back up production and refresh any of our other DB environments in less than 15 minutes using EBS snapshots is worth it, especially for the price we pay monthly for this infrastructure.

If you need any more details on our setup or tips in troubleshooting/setting up your own, feel free to ping me.


We have a highly trafficked site with a large dataset that we aimed to move from our datacenter to EC2. This is all in MySQL. Upon doing so, we had to spend significant effort trying to get our system stable, and we ended up narrowing it down to EBS simply not being able to handle the IOPS we needed.

We explored all the options available and worked with both Percona and Amazon on the issues, but it was clear that EBS and EC2 just are not meant to handle such load. Our database is currently back running in our datacenter.

There have been a number of reports with similar results using EBS. It all depends on your particular IO profile, but since you said it was high, I encourage you to tread lightly.

If you want more information you are welcome to email me.


The pain point in this particular case (I worked with qhoxie and bjclark on this) wasn't so much that the EBS performance was consistently bad, but that it was very unpredictable. Servers would suddenly, for no apparent reason, start performing very poorly.

That's been a consistent theme I've seen with EC2 instances--it's hard to predict how fast something is going to be, and once it's running, you don't know if its performance is going to change. That's one reason to take benchmarks of EC2 (and possibly other cloud providers) with a grain of salt.

I've heard anecdotes of people starting up n EC2 instances, running benchmarks on each, then killing all but the fastest ones.
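
Something as crude as a sequential dd pass per instance would be enough for that kind of cull (path and size are arbitrary):

    # run on each freshly launched instance; keep the fastest, terminate the rest
    dd if=/dev/zero of=/mnt/ddtest bs=1M count=1024 oflag=direct 2>&1 | tail -1
    rm -f /mnt/ddtest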


We don't go that far, but we did choose the data center that had the faster machines. It had upgraded hardware that the others didn't. It won't be the fastest forever, but you can certainly shop around internally to find faster machines.


Just to add to what QHoxie said (I worked with him on this project), the site is doing Alexa top 750 traffic. Every page view would hit the db at least a few times. And it's not that the site just couldn't run off EBS volumes, but we would see massive performance swings, where EBS IOPS would slow to a crawl for a few minutes to even a couple of hours.


This is a good clarification to make. It was not that our entire application ground to a halt when we flipped the switch. We would experience intermittent and extreme performance issues. We dug deep with tools like cacti and ended up being able to point only at the 'unknowns' of the shared resources and EBS.


I'm beginning to think that MySQL does something very different with disks than Postgres, because everyone I've talked to that said they had a problem had been running MySQL, and everyone who has been successful has been running Postgres (myself included).


This really would not be all that surprising, but it also would not surprise me if the average Postgres setup was better tuned/architected than the majority of MySQL setups.

Our setup was surely not ideal, but even with significant tuning, it was not sufficient.

It is really good to hear successes like Reddit's, though. How long have you folks had your databases in EC2?


We've had some of our databases on EC2 since November of 2008, and moved the rest of them over in May 2009.


Maybe. I know when we migrated Chesspark from native machines to EC2, the postgresql database performance was at least an order of magnitude worse, and we spent two weeks doing caching and index hunting that we never had to worry about before.


I work on the Cassandra distributed database for Rackspace. A _lot_ of people start out trying Cassandra on EC2 but the universal conclusion has been that I/O performance is miserable.

Anecdotally, Rackspace Cloud Servers I/O works much better, although the only serious head-to-head comparison I have seen is this one: http://pl.atyp.us/wordpress/?p=2240 (tl;dr: < 20MBps on writes with local storage or EBS on EC2 vs 100MBps+ on RCS)


At OpenX we're using Cassandra on EC2 and it has served our needs well so far. Our use case is a little different in that we set hard timeouts on the client side reads and can tolerate a small percentage of requests that don't complete in time.

According to Amazon the I/O depends on the instance type (http://aws.amazon.com/ec2/instance-types/) so the OP may want to take that into consideration when picking an instance type.


Unfortunately, last night I completely failed to remember that Rackspace and EC2 are direct competitors in this space so what was intended to be a simple data-point may be construed as more than that - my apologies.

What I intended to say was "you can make Cassandra work on EC2." However, if the I/O throughput of EC2 is a fraction that of some other provider, you'll need more machines in EC2 to achieve the same throughput.

My personal site is hosted on the Rackspace Cloud and I've been very pleased with it, but my workload is different so I can't give a comparison.

I also know that we've benefited directly from jbellis' support on IRC and the mailing list, so this should be considered an endorsement of Cassandra, if anything.


Of course no offense taken. Glad EC2 is working out for your workload. :)


I had heard you were using MySQL as your database earlier. What made you switch to Cassandra? Just curious.


We're using both. Existing MySQL back-ends are still there, but for new development that can leverage a distributed key-value store we're using Cassandra.


I can't answer your question from personal experience because I always just use plain old EBS volumes with my EC2s.

That said, here is a good writeup on RAID use with EBS by someone at Heroku: http://orion.heroku.com/past/2009/7/29/io_performance_on_ebs...


That's got some really good information. Most important to me is that it does confirm my fears of flaky IO performance. He does point out some ways to tune it but it still seems like a rough experience.

These cloud services seem great for low IO apps but once you have to hit the database hard it seems like anyone's guess if your db will perform.


That's why I'm a big fan of Rackspace. You can get bare metal servers for heavy I/O and simultaneously have "Cloud" servers on the same local network (example: Github).


Are you actually using this sort of service from them? When I talked to them about 2 months ago, we couldn't get anything like that since their "cloud" stuff is in a datacenter you can't currently buy dedicated servers in. In theory, they're opening another next year that may solve the problem. It's hard to tell, though, since their sales people don't seem to have any experience selling a hybrid setup.

My understanding is that Github virtualizes things themselves, they're not running Rackspace cloud servers.


Anyone have any experience with NewServers? They offer dedicated server provisioning in about 5 minutes. On paper they look great but I have not found much commentary about their product anywhere.


I've been hosting with them for five years (last two @ NewServers). Their support is great and they even helped install my scripts. I ordered 7 different servers since the beginning to create farms, and their load balancing mechanism is also very cool. Cheap and quality service. I'm a non-US customer, and my site is a top 100 site in my country.


Murat SARIKAYA http://www.msxlabs.org


FWIW: SoftLayer offers the same combo of dedicated servers and cloud offerings, and I prefer their turn around time, support, and pricing to RS.


That sounds like a good mix. I'll explore that option some more.


Seconding the EBS Raid0 option. Some comparison here http://blog.mudy.info/2009/04/disk-io-ec2-vs-mosso-vs-linode...

Also, if you have a part of a DB that just requires unholy amounts of IO, there are of course the new 68GB instances: move that part and run it purely in memory, with frequent dumps to disk.
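
A crude sketch of that approach (paths, size, and schedule are made up; anything in tmpfs is lost on a crash or reboot, so the periodic dumps are your only safety net):

    # carve out a RAM-backed mount for the hot portion of the data
    mount -t tmpfs -o size=32g tmpfs /var/lib/mysql-hot

    # crontab: dump the hot database to an EBS-backed path every 5 minutes
    */5 * * * * mysqldump hotdb | gzip > /vol/backups/hotdb-latest.sql.gz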


Maybe a combination of DB file in RAM and transaction logs to disk?


It's worth noting that you can speed up write operations on EBS volumes by first initializing them by writing data to every block on the volume. Subsequent writes will be faster due to the way Amazon virtualizes disks.

http://docs.amazonwebservices.com/AWSEC2/2007-08-29/Develope...
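
A sketch of what that initialization pass looks like (the device name is an example, and this destroys anything on the volume, so only do it on a fresh volume before creating the filesystem):

    # touch every block on a brand-new EBS volume with one full write pass
    dd if=/dev/zero of=/dev/sdf bs=1M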


I'm curious about the economics of this. If you reserve an extra large high-CPU instance for 3 years, you end up paying $3,000 per year ($2,800 initial fee + hourly rate). Isn't it cheaper to have your own servers?
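
Roughly how that yearly figure pencils out, assuming a reserved hourly rate of about $0.24 (the exact rate may differ):

    $2,800 upfront / 3 years        ~   $930 / year
    ~$0.24/hr x 8,760 hours         ~ $2,100 / year
    total                           ~ $3,000 / year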


To buy physical hardware equivalent to that instance, yes, it would be cheaper. However, you need to also factor in the costs of keeping that machine in a rack for 3 years with power and cooling. When I worked out the costs, it ended up being cheaper to use EC2 by about 30%. My datacenter was a bit expensive, but that is because I was looking at datacenters in San Francisco, since I don't want to have to drive to the middle of nowhere when I have to do maintenance on the physical machine.

Also, I was comparing against 1 year reserved instances, which is what I use now.


If it's not a commercial secret, I wonder, are sites like Reddit and Digg low margin businesses, or are hardware/bandwidth costs a tiny speck compared to the ad revenue generated?

In your "IAMA" thread on reddit, someone estimated that Reddit must be pulling on the order of $1M per month, making the costs neglibile. Digg must be earning even more, because of its greater number of users and their higher susceptibility to ads.


I'm not at liberty to discuss reddit's revenues, but what I can tell you is that the entire operational cost of the site (about $15K last month) is small compared to the human cost (i.e. salary and benefits).


cheaper but less flexible.


But you are already giving up flexibility with the 3 year term and reserved instances, or am I misunderstanding something?


I don't think you're misunderstanding anything at all

My opinion of what constitutes "middle of nowhere" may differ from the average San Franciscan's, but I've had the opposite cost estimate from jedberg's. That is, EC2 was 20% more expensive than operating one's own hardware in a South Bay datacenter for a year.

This would have been about a year ago, but, IIRC, including operating costs, one would break even at around 10 months of unreserved EC2. Pricing has certainly changed in the meantime, so 30% in the other direction is well within the realm of possibility.

Back then, it cost about $3 per year per watt. Now it's pushing $4. The trick is, in the case of one's own hardware, one usually pays for the available capacity, not just what one uses. On a 500W server, that's $2k annually.

That's the flexibility that AWS offers: one doesn't pay for the operating capacity, only the use. It would seem, however, that one pays for the hardware up front either way.

The big flexibility gain from operating one's own hardware is not being limited to ordering off an abbreviated menu. Germane to the OP's question, I have never found any virtual or dedicated server offerings that come close (say, within a factor of 10) to the I/O performance of something custom configured/assembled (even from inexpensive, commodity building blocks). Whether this constitutes a problem is subject to interpretation.

In general, I remain skeptical that there exists in The Cloud cheap-enough commodity servers suited to even a majority of applications. Remember, even Google had custom hardware very early on.


I've had the same problem with disk I/O bound apps in the cloud. Having said that, the cloud is a great way to launch a site/startup then move off if it actually works out.


I was talking about a different sort of flexibility. The ability to spin up another instance identical to this one if you suddenly need more capacity. The ability to spin up instances to do some data crunching, then turn them off when done, so you only pay for what you use.

If the capacity you need is fixed and known, it will be cheaper to get a dedicated server. Here's a counterexample: I once worked for an online retailer and we got an order of magnitude more traffic at Christmas. With a dedicated server we had to pay for the capacity needs of Christmas all year. On the cloud we could add capacity for Christmas and then turn off the extra capacity the rest of the year. It was a lot cheaper.


I am using MySQL on EC2 c1.medium instances. I haven't noticed an IO limit yet, but that could be how I'm using it plus the data set is still under 2GB.

I use the database as a raw data source and have a cron that generates read only data sets for the web servers to use. This works for my application, but probably not most.
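
A sketch of the pattern (table, query, and paths are hypothetical): a cron job runs the heavy query once and the web servers just read the flat file.

    # crontab on the DB box: refresh a read-only extract every 15 minutes
    */15 * * * * mysql mydb -B -e "SELECT id, score FROM items ORDER BY score DESC LIMIT 1000" > /var/www/data/top_items.tsv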


With any high IO web application, it all comes down to horizontal sharding to make it work. But, we recently switched from a high powered DB to Amazon and we're doing fine.


We've had good luck so far using EBS for MySQL, logs, etc. I/O costs are not significant yet.

You can RAID EBS devices, btw.
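
For example, a bare-bones RAID0 over two EBS volumes with mdadm (device names are examples; RAID0 here is purely for throughput, it adds no redundancy of its own):

    mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdg /dev/sdh
    mkfs.ext3 /dev/md0
    mount -o noatime /dev/md0 /vol/mysql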


RAID helps if disk is the bottleneck. If it's the network (EBS is network attached), then you're SOL and should probably try a bigger instance instead and hope fewer people share that physical host.


Doesn't Amazon already RAID the storage they offer you? Is drive failure something I need to worry about? I thought they had a beefy RAID system already in place.


They do, but I think he's talking about doing a RAID0 just to squeeze out more IO, not for safety.


I see now from all the comments that everyone is referring to getting more spindles serving the data.


If you enable striping (RAID0) for performance, how do you manage snapshots and backups? Seems like there would be consistency issues re-building the array from multiple snapshots.


You wouldn't want to use the virtual hard drive they provide you, but rather EBS (Elastic Block Store). With that you can set up a mirrored RAID (redundancy is a given already) of as many devices as you please.


Just write to /mnt (150GB ramdrive on the small instance) then occasionally sync to an EBS volume. That'll probably give you the best I/O.


/mnt is not a ram drive, it's ephemeral hard disk, and EBS has better IO than the /mnt. You only get 1.7GB RAM on the small instance.



