
New High I/O EC2 Instance Type - hi1.4xlarge - 2 TB of SSD-Backed Storage - jeffbarr
http://aws.typepad.com/aws/2012/07/new-high-io-ec2-instance-type-hi14xlarge.html
======
bravura
What are good use-cases for _on-demand_ high I/O servers?

At $3.10/hr, these instances work out to $2k/mo. There are probably many more
cost-effective options if you want a 2TB SSD server.

Since the benefit of using EC2 is that you can provision instances
elastically, what are the sorts of scenarios in which one needs to provision
high I/O servers elastically?

[edit: A few minutes of Googling, and I can't find any dedicated servers with
2 TB of SSD.]

~~~
mbreese
I do genome mapping where our indexes won't entirely fit in memory. It would
be very handy to be able to spin up a few of these instances, load the indexes
from an EBS volume onto the local SSDs, then run for a couple of hours or so.
This is a very I/O-intensive job that we need to run about once a week; the
rest of the time the machines would sit idle.

SSDs would make our jobs run significantly faster. So much so that we've toyed
with the idea of adding SSDs to our in-house cluster, but couldn't quite
justify the costs. This might actually shift the cost savings to get our lab
to migrate to EC2 as opposed to our in-house or university cluster.

~~~
lonnyk
Any chance we could get an example of the data set and the calculation that
needs to be done?

~~~
Gmo
Well, the raw output of a typical so-called "next gen sequencing" machine
(which is actually very current gen) is around 1TB (at least on the ones we
used here).

That's the raw file, though, so once processed (but not yet analyzed) I
believe the sizes are around 50 to 100GB (but that's not really what I work
on, so don't quote me on this).

The next steps vary on what you want to do exactly, but it usually involves
alignment of base pairs (basically, trying to tie DNA sequences together by
their ends by seeing if they "fit").

~~~
Gmo
I said by their ends, but it can also be the full sequence, depending on the job

------
Fizzer
This is a game changer for big sites on EC2. The key word here is local: 2 TB
of _local_ SSD-backed storage.

In this video [1], Foursquare says the biggest problem they're facing with EC2
is consistency in I/O performance. They say that the instance storage simply
isn't fast enough for them, and while EBS is fast enough when RAIDed, it isn't
consistent since it isn't local (EBS traffic goes over the network). Reddit
has also complained about EBS, but they've been able to move onto the instance
storage.

If you're willing to reserve the instance for 3 years, the average monthly
cost becomes only $656. That's quite a good deal.

Foursquare says in that video they're planning to migrate off of EC2, in part
due to I/O performance. I'll be interested to hear whether or not this
instance type changes their minds.

[1] [http://www.10gen.com/presentations/MongoNYC-2012/MongoDB-
at-...](http://www.10gen.com/presentations/MongoNYC-2012/MongoDB-at-
foursquare)

~~~
XERQ
_If you're willing to reserve the instance for 3 years, the average monthly
cost becomes only $656. That's quite a good deal._

The only problem with reserving that instance for 3 years is that better
hardware always comes along, especially with the cost of SSDs coming down
significantly every year. Usually if you're in the big-data space, your
hardware is likely retired after 24 months (12 months if you're well funded)
so locking yourself in for 36 months might be a bad investment.

~~~
jaylevitt
Has Amazon ever bumped the specs on existing hardware types? Or do they just
create new hardware types? e.g. is it possible that if you get a 3-year
reservation for an h1.xlarge, by 2015 h1.xlarge might have newer specs?

I had thought that EC2 reservations were upgradeable, but a quick check on the
forums shows you're right, they're not. Of course, you can play your own
"tiered usage" game, like laptops in IT departments, where the old h1.xlarge
becomes cheap enough to use as a second-tier machine and you go reserve the
h1.xxlarge for Cassandra.

~~~
gtaylor
If you sign up for a reservation, you seem to be able to send support a
message in order to have them cancel it so that you can change to the new
hotness. We had to do this for our three-year reservations when high usage
reservations came out. We were getting shafted because our previous generic
"Reservations" were converted to medium use, whereas we were using them as
high use.

So it does at least appear that in some cases, they'll let you out of your
reservation so that you may sign up for something similar. Or at least they
let us do that.

------
josephcp
Reminder, since it's a pair of SSDs and most people will probably look into
using this for their DB store: if you use current-generation
controllers/software and SSDs, you're going to have a bad time if you turn on
RAID without knowing exactly what you're doing.

TRIM ( <https://en.wikipedia.org/wiki/TRIM> ) isn't supported with RAID on
SSDs today on hardware controllers, and most Linux distributions don't support
TRIM on RAID out of the box if you're doing software RAID, so you're going to
see performance plummet like a rock after one full pass of writes to the disk.
Many RAID configurations zero-write the entire disk when formatting it, so
performance is going to suck from the get-go. For this reason, even if you
have a tiny database and don't expect to write 1TB worth of data, your
performance might still suck. Personally, I haven't tried Linux software (md)
RAID TRIM in production; the patch is pretty recent, so you're on your own
there. (If possible, scaling out horizontally may be a solution to consider
for redundancy. I have no idea what Amazon is using for SSDs, but recent
SandForce generations fail all the time, so plan for that.)
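One way to check whether discard actually reaches a device is `lsblk -D`: a
DISC-GRAN of 0B means TRIM won't pass through that layer. A sketch parsing a
captured sample (device names and values here are made up; an md device
typically shows 0B on kernels of this era):

```shell
# Parse sample lsblk -D output: DISC-GRAN of 0B means no TRIM pass-through.
# The sample values are illustrative, not from a real hi1.4xlarge.
sample='NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
xvdf 0 512B 2G 0
md0 0 0B 0B 0'
echo "$sample" | awk 'NR > 1 { print $1, ($3 == "0B" ? "no TRIM" : "TRIM ok") }'
```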

If you don't know to look for this issue, you're going to be scratching your
head when your RAID10 SSD configuration write throughput is worse than a
single 7200rpm drive. On the other hand, IOPS on SSDs are _AMAZING_ for
databases/datastores. Amazon may have solved this for you already behind their
virtualization layer, and they might be running their own software striping
beneath whatever RAID you're doing, so be sure to test it out fully first.

~~~
benmccann
What's a good way to utilize the two disks for your datastore? Run MongoDB on
one disk, for example, and back up to the other? Run one sharded instance on
one disk and another sharded instance on the other disk?

~~~
tlack
With MySQL (and Oracle before that) it was common to simply move different
parts of the database data files to different disks. I don't use Mongo so I
can't speak to that, but the concept works pretty much universally. See here
for more information about spreading your database around multiple disks:
[http://www.mysqlperformanceblog.com/2010/12/25/spreading-
ibd...](http://www.mysqlperformanceblog.com/2010/12/25/spreading-ibd-files-
across-multiple-disks-the-optimization-that-isnt/)
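The mechanics are just move-and-symlink; the toy sketch below uses temp
directories standing in for the data dir and the second disk (with a real
server you'd stop the database first, and note the linked post questions how
much this actually helps on its own):

```shell
# Toy demo of moving a data file to a faster disk and symlinking it back.
# Paths are stand-ins; nothing here touches a real database.
datadir=$(mktemp -d)    # pretend: /var/lib/mysql/mydb
fastdisk=$(mktemp -d)   # pretend: the second SSD's mount point
touch "$datadir/big_table.ibd"
mv "$datadir/big_table.ibd" "$fastdisk/"
ln -s "$fastdisk/big_table.ibd" "$datadir/big_table.ibd"
ls -l "$datadir/big_table.ibd"   # now resolves to the fast disk
```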

------
ehsanu1
This has been a long time coming, but AWS has consistently been improving
their service (as long as you can ignore the particularly bad reliability as
of late).

It's telling that they have only enabled this for a huge (quadruple extra
large) instance type. It's probably hard to make this work for someone who
just wants a 10GB disk with great IO. The problem at the low end is that disks
are larger and would thus have to be divided up to make proper use of them,
leading to IO contention.

The high IO options will probably only ever be available for pretty large
instances.

~~~
cperciva
_It's telling that they have only enabled this for a huge (quadruple extra
large) instance type._

My guess (not based on any knowledge of EC2 internals) is that they don't have
any way to do fair I/O sharing between guests. If they did, they could split
these boxes into 32 small instances with 1 ECU, 1.7 GB RAM, and a 60 GB disk
with 2500 random reads / 250-4000 random writes per second.

~~~
XERQ
Xen offers easy ways of doing fair I/O sharing between guests. These servers
they're using are most likely multi-tenant systems with 256-512GB of RAM and
6-12TB of SSD storage. Providers don't like keeping expensive systems around
that aren't making money, especially when demand changes every hour, so I
expect that they have at least 4 instances sharing the I/O of each host
(especially when they mention broad ranges of expected I/O).

The most likely reason for not slicing these systems up to smaller instances
is they want to maintain consistent, high performance I/O.

~~~
arohner
AFAIK, the largest tier in any AWS instance type has always been the full box,
i.e. an m1.xlarge is a whole box, an m2.4xlarge is a whole box, etc.

~~~
XERQ
I would agree with you, but them listing such broad write IOPS ranges makes me
think otherwise. I could be wrong though.

~~~
agwa
There's a technical reason for the range, explained in the blog post:

> Why the range? Write IOPS performance to an SSD is dependent on something
> called the LBA (Logical Block Addressing) span. As the number of writes to
> diverse locations grows, more time must be spent updating the associated
> metadata. This is (very roughly speaking) the SSD equivalent of seek time
> for a rotating device, and represents per-operation overhead.

------
jbarham
FWIW it costs $27,156 a year on-demand or $12,720 as a reserved heavy
utilization instance.

For heavy analytics workloads I'd bet that Google BigQuery
(<https://developers.google.com/bigquery/>) would be cheaper and faster and
more reliable.

------
mrb
Can anyone report the model of SSD they use (via ATA IDENTIFY)?

My guess based on perf characteristics is each instance has 2 x 960GB OCZ
Talos 2 C Series SSDs: [http://www.oczenterprise.com/ssd-
products/talos-2-c-sas-2.5-...](http://www.oczenterprise.com/ssd-
products/talos-2-c-sas-2.5-mlc.html)
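For anyone who spins one up: the model string comes back in the ATA IDENTIFY
data, e.g. via `hdparm -I /dev/xvdf` (whether Xen passes it through to the
guest is another question). Extracting it from a captured line would look
roughly like this (the model string below is invented):

```shell
# Pull the model out of an hdparm -I style line. The value is a placeholder,
# not a real report from an hi1.4xlarge.
sample='Model Number:       EXAMPLE-SSD-960G'
echo "$sample" | sed 's/^Model Number:[[:space:]]*//'
```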

~~~
spartango
It's very likely that they are using Intel SSDs (likely 320/330, maybe 510+)
for a variety of reasons:

Intel has a good drive-reliability history and is very enterprise-friendly in
bulk purchasing. Intel has also had excellent firmware for its drives, which
is valuable at datacenter scale; people who have dealt with RAID controller
firmware (including Amazon) know all about this.

Traditionally, Amazon has not used SAS drives in EC2, opting for lower-cost
SATA drives. It's also unlikely that Amazon is using small numbers of high-
capacity (>=500GB) drives, because they still aren't perfectly cost effective:
the price per GB is OK, but replacing a failed drive is more costly.

Also keep in mind that, to get to today, Amazon has been rolling these drives
out in huge numbers across two enormous data centers, so it's unlikely that
Amazon has picked a brand-new drive (say, the latest OCZ Vertex 4).

There are other factors that Amazon has bumped into while testing drives, but
they remain unreported and internal.

------
btb
Would these be suitable for running a single big SQL Server instance? I mean,
spec-wise they seem perfect for our size/use. They say data will survive a
reboot, but what are the chances I'd some day have to wake up in the middle of
the night and restore the database to a new server? Would some of the recent
Amazon outages be cases where that could happen?

~~~
jedberg
That would be a really bad idea unless you have real-time replication to
another server. SSD or not, you don't want to write data to the ephemeral
store without a real-time backup, unless you're willing to lose it all.

------
oomkiller
Hopefully we will start to see some other providers offer SSD-backed storage
since Amazon does it now. It would be nice if they offered it on some smaller
instances too though.

~~~
thegyppo
Storm on Demand has offered SSD storage for a while now, with pretty solid
IOPS numbers too. I'll run a benchmark comparison tonight.

~~~
thegyppo
Ran some benchmarks: really insane IOPS on this plan, but pretty average CPU
performance (they're using fairly old E5620s).

[http://blog.serverbear.com/post/27553311076/hi1-4xlarge-
benc...](http://blog.serverbear.com/post/27553311076/hi1-4xlarge-benchmarks)

~~~
signifiers
I just ran the numbers on the new EC2 instance, and I'm pretty skeptical about
the benchmarks above. I'm not sure that, for example, half a second of dd from
/dev/zero really tells us much.

When interpreting _any_ benchmarks on EC2, it's important to understand that
there is a 5-10% read/write performance hit on first use because AWS uses lazy
block wipes between customer instance launches. See
<http://www.youtube.com/watch?v=IedaYaKsb-4#t=29m49s> (should pre-cue, if not,
skip to 29:49). This is referenced in the docs, but it's easy to miss:
[http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/In...](http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/InstanceStorage.html?r=409#disk-
performance)

So here you go, for hi1.4xlarge:

***

Summary for the impatient: after initialization (i.e., second write), quasi-
realistic I/O on the new SSD EC2 instances sustains writes @ ~420 MB/sec and
cached reads @ ~6 GB/sec. The entire 8.6GB root (/) filesystem copied over to
SSD in ~21 seconds.

Not bad.

***

    
    
        # df -h
    
        Filesystem            Size  Used Avail Use% Mounted on
        /dev/sda1            8.0G  1.1G  6.9G  14% /
        tmpfs                  30G     0   30G   0% /dev/shm
        /dev/xvdf            1023G   16G  957G   2% /media/ephemeral0
    
        (Note: /dev/xvdf and /dev/xvdg are just soft links to /dev/sdf and /dev/sdg respectively)
    
        Crude stats on first-use:
    
        # hdparm -tT /dev/xvdf
    
        /dev/xvdf:
        Timing cached reads:   14788 MB in  1.99 seconds = 7446.69 MB/sec
        Timing buffered disk reads:  1066 MB in  3.00 seconds = 355.04 MB/sec
    
        Wipe the device:
    
        dd if=/dev/zero of=/dev/xvdf bs=1M& pid=$!
        while true; do kill -USR1 $pid; sleep 4; done;
         [...]
        dd: writing `/dev/xvdf': No space left on device
    
        1048567+17 records in
        1048566+17 records out
        1099511627776 bytes (1.1 TB) copied, 1955.42 s, 562 MB/s
    
        Stats after zero-wipe (dd /dev/zero) to device:
    
        hdparm -tT /dev/xvdf
    
        /dev/xvdf:
        Timing cached reads:   13260 MB in  1.99 seconds = 6673.05 MB/sec
        Timing buffered disk reads:  1124 MB in  3.01 seconds = 374.02 MB/sec
    
        hdparm -tT /dev/xvdf
    
        /dev/xvdf:
        Timing cached reads:   11188 MB in  1.99 seconds = 5624.17 MB/sec
        Timing buffered disk reads:  1122 MB in  3.00 seconds = 373.99 MB/sec
    
        hdparm -tT /dev/xvdf
    
        /dev/xvdf:
        Timing cached reads:   12930 MB in  1.99 seconds = 6505.78 MB/sec
        Timing buffered disk reads:  1124 MB in  3.00 seconds = 374.15 MB/sec
    
        Confirming Effect Of Pre-wiped I/O:
    
        hdparm -tT /dev/xvdg
    
        Timing cached reads:   11796 MB in  1.99 seconds = 5931.68 MB/sec
        Timing buffered disk reads:  1038 MB in  3.00 seconds = 345.87 MB/sec
    
        hdparm -tT /dev/xvdg
    
        /dev/xvdg:
        Timing cached reads:   12658 MB in  1.99 seconds = 6367.41 MB/sec
        Timing buffered disk reads:  1050 MB in  3.00 seconds = 349.47 MB/sec
    
        hdparm -tT /dev/xvdg
    
        /dev/xvdg:
        Timing cached reads:   12856 MB in  1.99 seconds = 6468.39 MB/sec
        Timing buffered disk reads:  1066 MB in  3.00 seconds = 354.80 MB/sec
    
        Post- vs. pre-wipe performance: 373.6 MB/sec vs. 349.3 MB/sec (6-7% faster after the wipe)
    
        Somewhat more real-world numbers:
    
        dd if=/dev/sda1 of=/dev/xvdf bs=1M
        8192+0 records in
        8192+0 records out
        8589934592 bytes (8.6 GB) copied, 19.7876 s, 434 MB/s
    
        dd if=/dev/sda1 of=/dev/xvdf bs=1M
        8192+0 records in
        8192+0 records out
        8589934592 bytes (8.6 GB) copied, 20.0365 s, 429 MB/s
    
        dd if=/dev/sda1 of=/dev/xvdf bs=1M
        8192+0 records in
        8192+0 records out
        8589934592 bytes (8.6 GB) copied, 21.4193 s, 401 MB/s
    
    

*Edit: formatting

------
damian2000
The pricing in the blog post is a bit unclear; the prices on
<http://aws.amazon.com/ec2/pricing/> are: US East $3.10 for Linux and $3.58
for Windows; EU West $3.41 for Linux and $3.58 for Windows.

(Reserved instance prices are cheaper.)

------
Loic
Another approach is something like OVH SSD servers (24GB ECC memory, 2x300GB
SSD, £210/month) <https://www.ovh.co.uk/dedicated_servers/mg_ssd_max.xml>

If you are using MongoDB, you take 3 or 4 of them, shard, and back up the
replica set with "conventional" storage. You end up with a 6-node cluster for
less than the price of this Amazon instance.

Lesson: your business needs to benefit from a lot of starting and stopping of
instances for them to make sense from a purely financial point of view.

~~~
cbg0
Some notes:

1\. You can't order from OVH if you're from outside their list of approved
countries.

2\. You're not using any RAID, and those are desktop-grade SSD drives; they
tend to die, sometimes without a clear warning, as they're not really intended
for 24/7 server use.

~~~
Loic
1\. The list of countries from which you can order servers is expanding on a
regular basis. If your country is not there yet, it should come in the future.

2\. You get 2x 300GB with a (battery-backed) RAID card, so you can put them in
RAID. As for reliability, I keep my fingers crossed (I have some of these
servers), but no failures yet, and in an interview the operators said they
basically see extremely good reliability. This is not marketing in this case;
they need it to be financially sustainable.

By the way, do you know what kind of SSDs (SLC, MLC, real disks or cards?)
Amazon uses?

~~~
caw
As a storage admin, I can tell you that once you have gobs of hard drives,
there's always one failing. You'll have dry spells where nothing fails, and
then all of a sudden you're looking at 2-3+ failures on a system, although not
necessarily in the same RAID group.

If your hardware provider needs to cut down on warranties to be financially
sustainable, I'd be concerned. It looks like these are rentals and not
purchases, so why wouldn't these guys pass warranty claims to Dell/HP or the
drive manufacturer directly? Are they buying gray market to reduce the cost,
trying to pass that savings on to you, but then running out of recourse when
they need to replace a drive?

I'm just speculating; I have no idea if this company is good or not. I'm just
concerned about the statement you made about the company, whether it's from
your understanding or what they actually said.

~~~
Loic
Sorry, English isn't my first language, so my comment was maybe not really
clear. They do not buy on the grey market (they are the largest hosting
provider in Europe, with 120k+ servers), but they carefully select the drives
with the best reliability, because they cannot afford to swap the drives of
the dedicated servers too often, as they operate on a low-margin approach.
They are not cutting down on the guarantee; these are dedicated servers with
guaranteed hardware, and they change the drives at no cost in case of failure.

If you operate on low margins, you'd better have systems that need minimal
manual operations, because as soon as you have one guy pulling a dedicated
server, changing the drive, and putting in another one, you have lost a couple
of months of your earnings on that particular server. If you do that too
often, you are not happy at the end.

------
rit
It seems we all may be missing the "backed" part, which I did on my first
read-through. They don't seem to be revealing how much of the logical volume
is actual SSD, which I think is why they're publishing IOPS numbers instead.

Still, a huge and significant improvement over anything previously available.
I'm looking forward to playing with it.

~~~
spartango
It's very likely that the entire volume is SSD; otherwise the tp99 would be
atrocious (and if you look at the Netflix numbers, it's actually quite good).
The reason for the broad ranges is more likely that the SSDs have some
inherent performance variability due to wear leveling and GC processes.
Additionally, as time goes on, multi-tenancy will start to come into play (if
Amazon offers smaller SSD instances or larger machines), which will stay
comfortably within that range.

------
jaredstenquist
Since I'm spending more than $1,000/month on an EBS-backed RDS master, I'm
intrigued by the idea of running our database off one of these. Of course I'd
lose all the awesome automated features of RDS, but it's worth considering.

Maybe AWS will release a High Performance RDS option that runs off of them.
Wishful thinking.

------
drudru11
This is nice, but they need a smaller, cheaper instance type that also has
SSDs. Until then, I'm not going to blow $5k a month on a system like this.

------
amnigos
This would be very useful for elastic EMR workloads; it should be good for
killing I/O bottlenecks.

------
level09
wow, would like to do a redis benchmark on one of those !

------
beedogs
Amazon will find some way to make this slower than shit and less reliable than
a campaign promise.

~~~
jedberg
I've used them and they are quite performant. They definitely live up to the
promise.

