

EC2 I/O - snewman
http://blog.scalyr.com/2012/10/16/a-systematic-look-at-ec2-io/

======
jread
I’ve been working on some similar analysis with EC2 and other providers. I
think the big missing point in this post (acknowledged by the author) is with
regard to EBS-optimized instances and provisioned IOPS, where we’ve observed a
dramatic improvement in IO consistency. Another interesting observation is
that performance consistency often declines with multi-volume EBS RAID, likely
due to variations in spindle tenancy or network latency. Our EBS test volumes
were 300GB; better performance and consistency may be possible with larger
volumes.

Here are links to a couple of summaries of the analysis I’ve done on EC2,
Rackspace and HP. I plan to write a blog post about this analysis soon.

Disk Performance: The value column is a percentage relative to a bare-metal
baseline 15k SAS drive, where 100% signifies comparable performance.
Benchmarks included in this measurement are fio (4k random read/write/rw; 1m
sequential read/write/rw), fio with the Intel IOMeter pattern, CompileBench,
Postmark, TioBench and AIO stress:

[http://dl.dropbox.com/u/20765204/1012-disk-io-analysis/disk-...](http://dl.dropbox.com/u/20765204/1012-disk-io-analysis/disk-performance.html)

Disk IO Consistency: The value column is a percentage relative to the same
baseline. A value less than 100 represents better IO consistency than the
baseline. The value is calculated by running multiple tests on an instance,
measuring the standard deviation of IOPS between tests, and comparing those
standard deviations to the baseline. Testing was conducted over a period of a
month on multiple instances in different AZs.

[http://dl.dropbox.com/u/20765204/1012-disk-io-analysis/disk-...](http://dl.dropbox.com/u/20765204/1012-disk-io-analysis/disk-consistency.html)
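
In rough Python, here's how the two metrics boil down (names and numbers are
illustrative, not my actual scripts):

```python
import statistics

def relative_performance(instance_score, baseline_score):
    """Performance as a percentage of the bare-metal 15k SAS baseline.
    100 means comparable to the baseline; lower means slower."""
    return 100.0 * instance_score / baseline_score

def relative_consistency(instance_iops_runs, baseline_iops_runs):
    """Stddev of IOPS across repeated runs on the instance, as a
    percentage of the baseline's stddev. Below 100 = more consistent
    than the baseline."""
    return 100.0 * (statistics.stdev(instance_iops_runs) /
                    statistics.stdev(baseline_iops_runs))

# Hypothetical numbers, for illustration only:
print(relative_performance(84.0, 120.0))   # 70.0 (% of the baseline)
print(relative_consistency([950, 720, 1100, 640], [1010, 990, 1005, 995]))
```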

------
staunch
This is one of the key reasons I recently started my new project, Uptano
(shameless plug: <https://uptano.com>), where all servers use dedicated RAID 1
(two drives) with 10K RPM or SSD storage.

The same issue applies to network performance. I've seen very expensive EC2
instances that couldn't even push 50 Mbit/s to the net, while instances of the
same type could do at least a few hundred Mbit/s. AWS' answer was always to
simply buy even more expensive instances, so _fewer_ people are sharing, but
that's a terribly costly answer.

I'm doing bonded (802.3ad) 2x1 Gbit/s connections on all servers, because
that's what I wish EC2 had.

Multiple customers with highly varied workloads sharing the same physical
server hardware is a fundamentally flawed idea. IMHO, it only makes sense to
use a VPS for very small personal projects, where you don't want to justify
~$140/mo in server costs.

EC2 was a really novel thing and it brought lots of great technology to the
scene, but they made a few fundamentally wrong choices.

~~~
verelo
We've had similar issues in the past, such as when we used RDS for some of our
projects (in particular, issues with disk IO). AWS is a great place to start,
but chances are the "one size fits all" solution will get difficult once
you're doing more advanced tasks.

------
snewman
OP here. We'd like this work to be a useful resource: everyone benefits when
there's more and better information about how these complex cloud systems
perform in real life. So please comment with suggestions, questions, or any
other feedback!

~~~
darkarmani
I wish you had tested with larger numbers of EBS volumes. I've just started
researching EC2 and EBS, and the links I found seem to indicate that striping
large numbers of EBS volumes with RAID 0 smooths out the performance (8, 16,
or 24 volumes).
[http://orion.heroku.com/past/2009/7/29/io_performance_on_ebs...](http://orion.heroku.com/past/2009/7/29/io_performance_on_ebs/)
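
The intuition, as I understand it: if stripe throughput is roughly the sum of
N independently varying volumes, the relative jitter of the total shrinks like
1/sqrt(N). A quick simulation with made-up numbers:

```python
import random
import statistics

def raid0_jitter(n_volumes, samples=10_000):
    """Model each volume's throughput as independent noise (made-up
    mean/spread) and sum across the RAID 0 stripe; return the
    coefficient of variation of the aggregate."""
    totals = [sum(random.gauss(100, 40) for _ in range(n_volumes))
              for _ in range(samples)]
    return statistics.stdev(totals) / statistics.mean(totals)

for n in (1, 4, 8, 16, 24):
    print(f"{n:2d} volumes: CV ~ {raid0_jitter(n):.2f}")
```

This ignores correlated slowdowns (shared hosts, network congestion), which
would erode the benefit in practice.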

The graphs are very well done. That amount of data would have been
incomprehensible if not for your carefully thought-out charts.

~~~
snewman
Thanks. I put quite a bit of work into finding the best ways to present the
data, which depending on how you count is around five- or six-dimensional.

I'll see whether we can work larger RAID configurations into a follow-up.

------
kanwisher
Great article; I got all the way through it. It matches what we saw: ephemeral
storage on large and xlarge instances is almost always the way to go. One
thing missing from the article was a good comparison of $/iops and
$/iops/storage between the instance types.

~~~
snewman
Thanks!

$/iops between storage types is the subject of the "Cost Effectiveness" charts
near the top of the post, but I suspect I'm not quite catching your meaning.
What do you mean by $/iops/storage?

------
jordanthoms
Enjoyed the article, but it feels very incomplete without discussion of the
solid-state, provisioned IOPS, and EBS optimization options. It would be
interesting to see whether those get rid of the bad apples and what sort of
benefit they give.

------
frew
Really great article and neat product!

One question: on the throughput graphs, I understand why you normalized them
per graph, but were there any differences between graphs (particularly in
terms of EBS vs. ephemeral) that would be sufficient to drown out the
variability within the throughput graphs?

~~~
snewman
Yes, the differences between graphs were fairly substantial. You can get a
sense of the EBS vs. ephemeral difference by looking at the "Throughput by
Threadcount" graphs (about 1/3 of the way through the post). Briefly, EBS has
much higher throughput for random writes; ephemeral (instance storage) is
faster for reads and bulk writes.

------
jnsaff2
What size were the EBS volumes? It would be interesting to know how important
volume size is to performance.

If you provision only 1TB (or larger, if they're available now) EBS volumes,
then you'd have spindles dedicated to you, whereas with smaller ones there
might be a lot more variation because you're sharing.

More background: [http://perfcap.blogspot.com/2011/03/understanding-and-using-...](http://perfcap.blogspot.com/2011/03/understanding-and-using-amazon-ebs.html)

~~~
snewman
We're double-checking, but it looks like all of the EBS volumes were 100GB. It
would be interesting to see the impact of using 1TB volumes.

~~~
snewman
Confirmed: all EBS volumes were 100GB.

------
zurn
Hmm, it shows "4-EBS RAID" getting around a 2.6x speedup for 4k writes. They
don't say what RAID configuration they're using, but it sounds odd.

A 4k write has to be synced to all the disks unless they have a <=4k stripe
size AND are using RAID-0 AND are using stripe-aligned IO ops. It's also
possible they issue 4k writes to cache that end up forming large dirty blocks
which the OS then syncs as larger I/Os. But that would be measuring something
other than what the benchmark claims.

~~~
zurn
Correcting myself: they do say in the methodology section that they're using
RAID 0, and that the write benchmark is multithreaded. So that explains it:
it's measuring aggregate throughput for concurrent 4k writes.
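
To make the stripe arithmetic concrete, here's a toy model of the RAID-0
chunk-to-disk mapping (chunk size and disk count are made up):

```python
CHUNK_SIZE = 256 * 1024   # hypothetical mdadm chunk size, in bytes
NUM_DISKS = 4

def disks_touched(offset, length=4096):
    """Which RAID-0 member disks a write at `offset` lands on."""
    first = offset // CHUNK_SIZE
    last = (offset + length - 1) // CHUNK_SIZE
    return {chunk % NUM_DISKS for chunk in range(first, last + 1)}

print(disks_touched(0))                  # {0}: aligned 4k write hits one disk
print(disks_touched(CHUNK_SIZE - 2048))  # {0, 1}: straddles a chunk boundary

# With many threads issuing 4k writes at scattered offsets, the writes
# spread across all members, which is where the aggregate speedup comes from.
```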

Nothing to see here, move on...

------
jaequery
I wish the OP would've at least included some conclusion/final words. For
someone in the same boat of choosing the right platform, this benchmark is of
no real help in deciding which to go with. A comparison with Rackspace Cloud,
for one, would be very helpful.

~~~
alanctgardner2
I think the conclusion is that you need to provision a lot of machines and
benchmark your own workload thoroughly to assess any cloud provider. That way
you'll have solid performance numbers that are relevant to your case. They
(rightly, in my mind) don't come down one way or the other on particular
instances or options, because the cost/value analysis depends so specifically
on your workload. If you already know your workload very well, they give
enough detail that you could reuse some of their data.
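
Even something as simple as collecting per-instance numbers from repeated runs
and flagging the outliers goes a long way. A sketch, assuming you've already
gathered IOPS results (e.g. from fio) yourself:

```python
import statistics

def summarize(results):
    """results: {instance_id: [iops_per_run, ...]} from repeated
    benchmark runs on several instances of the same type."""
    fleet_mean = statistics.mean(
        run for runs in results.values() for run in runs)
    for instance, runs in sorted(results.items()):
        mean = statistics.mean(runs)
        cv = statistics.stdev(runs) / mean  # run-to-run jitter
        flag = "  <- bad apple?" if mean < 0.5 * fleet_mean else ""
        print(f"{instance}: mean {mean:.0f} IOPS, CV {cv:.2f}{flag}")

# Hypothetical numbers from four instances of the same type:
summarize({
    "i-01": [1050, 980, 1110],
    "i-02": [990, 1020, 940],
    "i-03": [310, 450, 280],   # the kind of outlier the post describes
    "i-04": [1005, 995, 1030],
})
```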

~~~
njharman
Or not worry about it so much and just go with good enough.

------
thegyppo
Amazon's IO/IOPS performance pales in comparison to a lot of providers
(shameless plug): <http://serverbear.com/benchmarks/io>

