

Disk IO and throughput benchmarks on Amazon's EC2 - sstrudeau
http://stu.mp/2009/12/disk-io-and-throughput-benchmarks-on-amazons-ec2.html

======
rbranson
EC2 isn't the fastest, I'll admit. What makes EC2 a great choice for a hosting
platform is that it is fairly consistent, stable, well-documented, and lets
you experiment much more rapidly than really any other option. If you go
with dedicated hosting, you'd better hope you set everything up properly with
the right balance of CPU/Memory/Disk the first time, or you'll have to make
expensive, painful changes to your infrastructure.

~~~
windsurfer
EC2 is to the hosting world as McDonald's is to the restaurant world.
Consistent, predictable, obvious.

~~~
psadauskas
But also one of the most expensive... Doesn't really fit that analogy.

~~~
m_eiman
Whether it's cheap depends on what you compare it to. Most other VPS systems
are like home cooking: it's cheaper (and likely tastes better) but you have to
do more work yourself. EC2 is more like McD: the price is higher but all you
have to do is eat the thing / run the actual app. You can also buy just a
cheeseburger when you need it; you don't have to order a big meal.

------
spudlyo
It's worth noting that you can speed up write operations on EBS volumes by
first initializing them by writing data to every block on the volume.
Subsequent writes will be faster due to the way Amazon virtualizes disks.

[http://docs.amazonwebservices.com/AWSEC2/2007-08-29/Develope...](http://docs.amazonwebservices.com/AWSEC2/2007-08-29/DeveloperGuide/instance-storage.html)

~~~
aolnerd
Those docs refer to ephemeral storage. I haven't seen any claims by Amazon or
test results that indicate it is also true for EBS volumes. I have been
meaning to test this myself.

~~~
spudlyo

      gracie:~# time dd if=/dev/zero of=/dev/sdl bs=1024k count=10240
      10240+0 records in
      10240+0 records out
      10737418240 bytes (11 GB) copied, 344.745 seconds, 31.1 MB/s
    
      real    5m44.748s
      user    0m0.025s
      sys     0m10.152s
    
      gracie:~# time dd if=/dev/zero of=/dev/sdl bs=1024k count=10240
      10240+0 records in
      10240+0 records out
      10737418240 bytes (11 GB) copied, 217.106 seconds, 49.5 MB/s
    
      real    3m37.125s
      user    0m0.023s
      sys     0m10.268s
    
      gracie:~# time dd if=/dev/zero of=/dev/sdl bs=1024k count=10240
      10240+0 records in
      10240+0 records out
      10737418240 bytes (11 GB) copied, 247.319 seconds, 43.4 MB/s
    
      real    4m7.323s
      user    0m0.036s
      sys     0m9.794s
    

EDIT: I ran it a third time just to make sure the results held. I'd still take
this with a grain of salt; in my experience, EBS I/O results are pretty
inconsistent.
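
For anyone who wants to sanity-check the numbers above, here is a minimal sketch that recomputes the throughput from the byte count and elapsed seconds in the three runs. The helper name `throughput_mbs` is made up for illustration, and it assumes dd's "MB" means 10^6 bytes.

```shell
# Hypothetical helper: recompute dd's MB/s figure from bytes and seconds.
# Assumes dd reports decimal megabytes (1 MB = 1,000,000 bytes).
throughput_mbs() {
  # $1 = bytes copied, $2 = elapsed seconds
  awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1000000 }'
}

# Elapsed times taken from the three dd runs above (same 10737418240 bytes each).
for secs in 344.745 217.106 247.319; do
  echo "$secs s -> $(throughput_mbs 10737418240 "$secs") MB/s"
done
```

The second and third passes work out to roughly 1.4x to 1.6x the first one, but with only three samples that is a rough indication, not a benchmark.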

~~~
3ijsflksflk
data rules

------
mark_l_watson
Good article; it quantified my own take and made me feel justified in setting
up RAID0. Lately, I have spent 5 to 10 hours a week doing setup/experiments on
EC2 + EBS (about half on customer projects, and about half for my own stuff).
I expect (hope!) that this time spent per week will decrease drastically:
although I have been having a lot of fun, this is time spent not doing
development.

I continue to be a big fan of Heroku because it saves me so much time, but I
have been having limited success getting customers to go with Heroku - they
seem to want to pay for custom EC2 setups.

BTW, one huge advantage of running on EC2 is that it is so easy to run an old
version of a system to test against. I moved my own reserved instance from
booting on an ephemeral drive (out of S3) to booting off of EBS. I only
deployed what I currently need, and will always have the old version around to
test against, check a bootup script, etc.

------
aolnerd
Another thing to consider is that EBS offers snapshot backups and is therefore
not functionally equivalent to the ephemeral disk, which is not durable.

Ephemeral disks offer more predictable performance, especially on the largest
instances; however, they will not provide the random IOPS of a RAID set of EBS
volumes.

------
jedberg
We use EBS for all of reddit's databases. We occasionally run into IO limits
when doing a vacuum, but that is about it.

~~~
aolnerd
Do you RAID them? What size instances do you use? EBS seems susceptible to
poor performance when there is contention on your instance's shared network
card or contention on the EBS device itself.

Vacuums use sequential IO, so it makes sense that you max out pretty quickly.

~~~
jedberg
> Do you RAID them?

No

> What size instances do you use?

Databases are m1.xlarge.

