

RAID0 ephemeral storage on AWS EC2 - cemerick
http://www.gabrielweinberg.com/blog/2011/05/raid0-ephemeral-storage-on-aws-ec2.html

======
andrew311
I've done thorough testing on random read/write performance of ephemeral vs
EBS, and I can tell you that EBS is waaaay better in the case of random IO. I
can attest to the accuracy of the random IO performance in one of the
references from the post:

<http://victortrac.com/EC2_Ephemeral_Disks_vs_EBS_Volumes>

Amazon even says this on their page (also referenced by Gabriel).

Joe Stump's article might lead you to believe that ephemeral and EBS are equal
for random IO, but Joe only tested a RAID0 config with two EBS volumes.

In general, even with the best instances under the best circumstances, you
won't crack 2K IOPS on ephemeral RAID0 with four drives.

EBS, on the other hand, with 8 volumes configured in RAID0 will exceed 24K
random reads/sec and 12K random writes/sec. The reads are so much higher
because EBS is mirrored.

The downside with EBS is you can see worse performance when there are noisy
neighbors. I've seen performance drop 50-70% for hours at a time on m1.large
because of network card contention, but you can avoid this when on larger
instances (m1.xlarge or m2.4xlarge do the trick).

Sequential IO is another matter. I haven't thoroughly compared, but I believe
in this case things are more even.

Performance aside, there is something to be said for removing dependencies on
a complex system like EBS. It also frees up network bandwidth and provides
quite a bit of storage without the $0.10/GB cost of EBS. If IOPS isn't a
bottleneck and you plan on replicating, then ephemeral can be a big win.
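For anyone who wants to reproduce this kind of random-IO comparison themselves, a tool like fio makes it straightforward. This is only a sketch: the device path (`/dev/md0`), block size, and queue depth are assumptions you'd tune to match your own workload.

```shell
#!/bin/sh
# Random 4K reads against a RAID0 array (or an EBS volume -- adjust the
# --filename). --direct=1 bypasses the page cache so you measure the
# device, not RAM; --group_reporting prints one aggregate IOPS figure.
fio --name=randread --filename=/dev/md0 --rw=randread \
    --bs=4k --iodepth=16 --ioengine=libaio --direct=1 \
    --runtime=60 --time_based --group_reporting

# Same workload for random writes. This is destructive to the target
# device -- only run it against scratch/ephemeral storage.
fio --name=randwrite --filename=/dev/md0 --rw=randwrite \
    --bs=4k --iodepth=16 --ioengine=libaio --direct=1 \
    --runtime=60 --time_based --group_reporting
```

Comparing the aggregate "iops=" line from each run on an ephemeral RAID0 array versus an EBS RAID0 array gives you numbers directly comparable to the ones above.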

------
dotBen
So I'm kinda confused. The instance goes down, you lose your data. That's the
main reason why Amazon built EBS _(granted it has issues of its own)_

I'm not quite sure how, in production, this setup really helps you because you
KNOW that you will lose your data.

~~~
ghotli
Some examples where you might not care: HDFS has every block on three
DataNodes around the cluster. Cassandra is similar, and both will auto-heal if
a node goes down. Projects like DRBD will do block-level mirroring of a
partition between a set of machines, with only one master elected at a time.

Some usage patterns don't care if the data is lost, they just care that the
I/O is fast. Consider a group of identical machines serving up read-only data
round-robin'd via a load balancer. If one goes down, who cares? Only the load
balancer does. A new machine gets provisioned and gets added back to the
group. DuckDuckGo probably has a data lifecycle that pushes out a new read-
only index set on a regular schedule. If the data is kept in S3 and pushed out
to read-only nodes then this is a very good use case for only using the
ephemeral drives.
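That provisioning step can be sketched in a few lines of shell. The bucket name and paths here are hypothetical, and the s3cmd tool is just one way to do the pull; the point is that the node rebuilds its local copy from S3 before taking traffic.

```shell
#!/bin/sh
# A freshly booted read-only node pulls the current index set from S3
# onto its ephemeral storage before joining the load balancer pool.
# (Bucket and local path are placeholders for illustration.)
s3cmd sync s3://example-index-bucket/current/ /mnt/ephemeral/index/

# ...then start the service against /mnt/ephemeral/index and register
# the node with the load balancer. If the instance dies, nothing is
# lost: the next node runs the same sync and takes its place.
```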

Servers where writes occur are really the ones that need to be durable. Best
case would be for the application to wait until the write was fsync'd on
multiple machines if you're faced with node failure at any moment.

This topic is kind of a rabbit hole. If you're using ephemeral drives and
can't stand to lose data since your last backup then you have to use some sort
of multiple machine architecture. Once you do that you're at the mercy of the
network. A set of postgres servers in master/slave replication will always
have the possibility of losing writes since the slave replication is
asynchronous. Cassandra and other quorum-write systems can make it so that
even though you send the write to a single node, a quorum of nodes must
acknowledge the write before success is reported back to the client.

Really the CAP theorem[1] explains it all. Read the paxos wikipedia article[2]
and the chubby paper[3] if you want to see how to keep a set of nodes in total
agreement as to the current state of the system.

[1] <http://en.wikipedia.org/wiki/CAP_theorem>

[2] <http://en.wikipedia.org/wiki/Paxos_(computer_science)>

[3] <http://labs.google.com/papers/chubby.html>

------
RyanKearney
Why would anyone go with RAID 0 for a server? At least do RAID 10.

~~~
jonburs
The article describes techniques for setting up RAID 0 against EC2 local
ephemeral storage for performance gains. As there's no way to replace a failed
drive, RAID 10 wouldn't provide any benefit.
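The setup the article describes boils down to a few commands. This is a hedged sketch: the device names are assumptions (depending on the AMI and kernel they may show up as /dev/sd[b-e] or /dev/xvd[b-e]), and the filesystem choice is yours.

```shell
#!/bin/sh
# Stripe four ephemeral disks into a single RAID 0 array. --level=0
# means pure striping: capacity and throughput add up, but one failed
# member loses the whole array -- fine for disposable instance storage.
mdadm --create /dev/md0 --level=0 --raid-devices=4 \
    /dev/sdb /dev/sdc /dev/sdd /dev/sde

# Put a filesystem on the array and mount it. XFS is a common choice
# for large striped volumes, but ext3/ext4 work just as well.
mkfs.xfs /dev/md0
mkdir -p /mnt/md0
mount /dev/md0 /mnt/md0
```

Since the array only lives as long as the instance does, there's no need to persist the mdadm config or fstab entry beyond whatever your boot scripts recreate.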

~~~
sobbybutter
Well, you'd have complete data loss on RAID-0 if there's a single drive
failure. With RAID-10, you can sustain at least one drive failure without data
loss. That way, even though you can't replace a failed drive, you can at least
get the data off to a new instance.

~~~
aaronblohowiak
The whole instance can disappear at any time, so you can't rely on hardware
redundancy. This means you _must_ back up your data off the ephemeral drive in
some fashion already, in which case going for RAID-10 doesn't provide much
benefit.

~~~
sobbybutter
Ah, I see. I'm not an EC2 customer, but how often do instances disappear? This
seems kind of odd on Amazon's end. What's the reason for instances
disappearing?

~~~
wmf
Hardware failure.

