

New Amazon EC2 High Storage Instances - zeratul
http://phx.corporate-ir.net/phoenix.zhtml?c=176060&p=RssLanding&cat=news&id=1769314

======
pella
"The New EC2 High Storage Instance Family"

<http://aws.typepad.com/aws/2012/12/the-new-ec2-high-storage-instance-family.html>

" _The High Storage Eight Extra Large (hs1.8xlarge) instances are a great fit
for applications that require high storage depth and high sequential I/O
performance. Each instance includes 117 GiB of RAM, 16 virtual cores
(providing 35 ECU of compute performance), and 48 TB of instance storage
across 24 hard disk drives capable of delivering up to 2.4 GB per second of
I/O performance._

 _This instance family is designed for data-intensive applications that
require high storage density and high sequential I/O -- data warehousing, log
processing, and seismic analysis (to name a few). We know that these
applications can generate or consume tremendous amounts of data and that you
want to be able to run them on EC2. The storage on this instance family is
local, and has a lifetime equal to that of the instance. You should think of
these instances as building blocks that you can use to build a complete
storage system. You should build a degree of redundancy into your storage
architecture (e.g. RAID 1, 5, or 6) and you should use a fault-tolerant file
system like HDFS or Gluster. Of course, you should also back up your data to
Amazon S3 for increased durability._ "

------
alimoeeny
Why is the link URL like this?

~~~
mbell
Seriously, is this a phishing site? An Amazon page being served off a
phx.corporate-ir.net domain?

~~~
jeffbarr
Great question, here's the scoop!

The corporate-ir site is actually part of Thomson Reuters and our press
releases end up there.

If a particular AWS release includes both a press release and a blog post, the
press release goes out first.

After the release shows up in public I publish the blog post and submit it to
HN.

Quantum fluctuations of the universe caused the press release to get more
votes than the blog post and that's why it's on the front page.

~~~
MichaelApproved
I thought it was the industry norm to link a press release to the blog post.
In that case, the blog post would go out first. What's the reason you decided
to send out the release first? I'm genuinely curious about the reasoning.

~~~
jeffbarr
The press release is considered "definitive" for some reason. This is how our
PR team explained it to me.

This is actually a very interesting conversation. It is interesting to see
that the post on the AWS blog, hosted on TypePad, is seen as more official
than the press release.

~~~
wmf
This may be related to Reg FD. AFAIK press releases are considered fair to
investors since they're guaranteed to hit everybody's Bloomberg terminal.
Although Jonathan Schwartz convinced the SEC that blogging is compliant with
Reg FD, I suspect many companies don't want to risk it.

~~~
apaprocki
Yes, Reg FD -- professional IR services cater to this market. In case you're
interested in the timeline from the Bloomberg Terminal point of view:
<http://imgur.com/wwzUp>

tl;dr PR newswire (BUS) comes first @ 8:54; it hit the Bloomberg News wire
four seconds after release. The Amazon blog wasn't until 9:17, followed by
others such as TechCrunch @ 10:11.

------
corresation
Worth noting that it sounds like hs1.8xlarge is built on magnetic disks (24
2TB HDs - edit: originally put 1TB), each reading some 100MB/sec, yielding the
theoretical max of 2.4GB/s in a RAID-0 configuration. No one actually uses
disks in such a fashion, and gross throughput of magnetic drives has seldom
been of much utility (hence the strong demand for SSDs; random IO matches the
vast majority of workloads more appropriately).

Just caveats. This doesn't look like a terribly interesting option.
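
For anyone who wants to check the numbers, here is a rough sketch of that
back-of-envelope math. The 100 MB/s per-drive rate is the assumption from
above, not a published spec, and the function names are made up for
illustration.

```python
# Back-of-envelope check of the quoted 2.4 GB/s aggregate figure.
DISK_COUNT = 24            # drives per hs1.8xlarge, per the announcement
DISK_SIZE_TB = 2           # 48 TB / 24 drives
PER_DISK_MB_PER_S = 100    # ASSUMED sequential read rate per drive

def aggregate_throughput_gb_per_s(disks=DISK_COUNT, mb_per_s=PER_DISK_MB_PER_S):
    """Best case: every drive streams sequentially at the same time."""
    return disks * mb_per_s / 1000

def full_scan_hours(size_tb=DISK_SIZE_TB, mb_per_s=PER_DISK_MB_PER_S):
    """Hours to read every drive end to end, all drives in parallel."""
    seconds = size_tb * 1_000_000 / mb_per_s
    return seconds / 3600

if __name__ == "__main__":
    print(f"aggregate: {aggregate_throughput_gb_per_s():.1f} GB/s")
    print(f"full scan of all 48 TB: {full_scan_hours():.1f} hours")
```

Even at the theoretical best case, one pass over the full 48 TB takes several
hours, and any seek-heavy access pattern will land well below that rate.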

~~~
JoachimSchipper
Good point, but their blog post does give a few possible uses: "Storage
instances are ideal for data-intensive applications including Hadoop
workloads, log processing and data warehousing, and parallel file systems to
process and analyze large data sets in the AWS Cloud". If your code fits the
pattern, a high-storage instance may fit well.

Of course, getting that much data into and out of the cloud is its own
problem.

~~~
corresation
Even getting the data to that machine presents a problem that seems to
undermine the value of it entirely: As you mentioned, the real value in this
is linear processing of large sets of data, but the storage is ephemeral, so
your process has to be some variation of firing this instance up, copying TBs
of data to the machine, and then doing linear processing. Given that you have
to get the data there, the value of the high aggregated gross-throughput seems
secondary -- just stream process it, etc.

I'm having a tough time seeing where this type of instance fits.

~~~
edvinasbartkus
That's why AWS Data Pipeline came along. <http://aws.amazon.com/datapipeline/>

~~~
wmf
Data Pipeline looks like a fine orchestration service, but it's not going to
ingest 48 TB of data any faster than you can do it yourself. Which is probably
not that fast.
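
To put a rough number on "not that fast", a minimal sketch follows. The link
speeds are purely illustrative assumptions (not measured EC2 figures), and it
ignores protocol overhead and source-side bottlenecks.

```python
def ingest_hours(terabytes, gbit_per_s):
    """Hours to move `terabytes` (decimal TB) over a link sustaining `gbit_per_s`."""
    bits = terabytes * 1e12 * 8
    return bits / (gbit_per_s * 1e9) / 3600

if __name__ == "__main__":
    # ASSUMED sustained link speeds, for illustration only.
    for speed in (1, 10):
        hours = ingest_hours(48, speed)
        print(f"48 TB at {speed} Gbit/s: {hours:.1f} hours ({hours / 24:.1f} days)")
```

Under those assumptions, filling the instance takes days at 1 Gbit/s and the
better part of a working day even at a fully saturated 10 Gbit/s.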

~~~
throw_away
maybe + <http://aws.amazon.com/importexport/>

------
RyanZAG
I'm going to need to mortgage my house to get my hands on some of these,
aren't I?

~~~
petercooper
On-demand pricing is $4.60 at the moment which isn't bad considering what you
get ($110 per day; $3300ish per month).

I just ran some 'Reserved Instances' quotes and for 12 months I got $3968.00
upfront and then $2.24 per hour OR $9200.00 upfront and then $1.38 per hour.
For 3 years you can go as far as $16924.00 upfront and then $0.76 per hour
(for a long term effective rate of $1.404/hr).
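
The arithmetic behind those figures can be sketched as below, in case anyone
wants to plug in other quotes. It assumes the instance runs 24/7 for the whole
term and a year of 8,760 hours; `effective_hourly` is a hypothetical helper,
not anything from AWS.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760; ASSUMES the instance runs around the clock

def effective_hourly(upfront, hourly, years):
    """Spread the upfront fee over the whole term and add the hourly rate."""
    return upfront / (years * HOURS_PER_YEAR) + hourly

on_demand_per_day = 4.60 * 24
on_demand_per_month = on_demand_per_day * 30  # 30-day month

one_year_low_upfront = effective_hourly(3968.00, 2.24, 1)
one_year_high_upfront = effective_hourly(9200.00, 1.38, 1)
three_year = effective_hourly(16924.00, 0.76, 3)

if __name__ == "__main__":
    print(f"on-demand: ${on_demand_per_day:.2f}/day, ${on_demand_per_month:.2f}/month")
    print(f"1 year, $3968 upfront:   ${one_year_low_upfront:.3f}/hr")
    print(f"1 year, $9200 upfront:   ${one_year_high_upfront:.3f}/hr")
    print(f"3 years, $16924 upfront: ${three_year:.3f}/hr")
```

The effective rates only hold at full utilization; an instance that sits idle
part of the time still pays the whole upfront fee.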

~~~
moe
_On-demand pricing is $4.60 at the moment which isn't bad considering what you
get ($110 per day; $3300ish per month)_

Well, depends on what you compare to.

The reserved price for 3 years is quite revealing in this case. So Amazon asks
$16924 upfront and... wait, $17k upfront?

You can buy an equivalent Supermicro box with 24x2TB, 192GB RAM (not 128) for
$10k. Thus if you rent the reserved EC2 variant for 3 years you end up paying
at least a 4x markup versus housing a dedicated box.

~~~
ihsw
You fail to mention rack rental fees and network IO fees, and maintenance
(parts replacement, parts shipping/handling, cost of downtime, maintenance
staff salaries).

~~~
moe
We're talking about a difference of $10k USD _per instance_ per year.

You'll want at least two for redundancy, so that's $20k spare change if you go
dedicated. That buys quite a lot of rack space, network IO and spare parts.

Staff salaries don't factor in because if you need storage of that scale you
can't do without a competent admin either way (the $20k comfortably pays for a
fully managed colo with remote hands).

Needless to say, most deployments of that size will need more than two boxes,
at which point the markup tips entirely into wtf-territory. Note my
calculation was really generous here, too. In reality you get steep hardware
discounts on top that make Amazon look even worse, and you can buy boxes with
higher storage density for an even better $/GB.

~~~
TillE
My impression of EC2 and S3 has always been that they're appropriate for
meeting short term needs (to handle extra load or to function as backups), but
that's about it.

Their long-term pricing is terrible compared to other options, and they offer
few benefits. At the low end, standard VPSes are much cheaper. At the high
end, never mind colo, there are dedicated and even managed servers which offer
far better value.

