At $3.10/hr, these instances work out to roughly $2,250/mo if run around the clock.
There are probably many more cost-effective options if you want a 2TB SSD server.
Since the benefit of using EC2 is that you can provision instances elastically, what are the sorts of scenarios in which one needs to provision high I/O servers elastically?
[edit: A few minutes of Googling, and I can't find any dedicated servers with 2 TB of SSD.]
SSDs would make our jobs run significantly faster. So much so that we've toyed with the idea of adding SSDs to our in-house cluster, but couldn't quite justify the cost. This might actually shift the cost equation enough to get our lab to migrate to EC2 instead of our in-house or university cluster.
That's definitely something we will look into :)
What I'm working on is mapping those short sequences (50-75 bases) to the genome and then either looking for mutations or measuring expression levels (how many of those reads map to a particular location). There are a couple of ways to do the mapping, but most approaches these days use either a big hash table or a Burrows-Wheeler transform.
And that's all just to get the data that you can then do something else with (gene expression, variation modeling, etc...).
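For a rough sense of the hash-table approach (a toy sketch only; real short-read aligners are far more sophisticated, and the genome string, read, and parameters below are made-up placeholders), in Python:

from collections import defaultdict

def build_kmer_index(genome, k):
    # Index every k-mer in the genome by its starting positions.
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index

def map_read(read, genome, index, k, max_mismatches=2):
    # Seed with the read's first k bases, then verify the rest of the read
    # at each candidate position, allowing a few mismatches (e.g. SNPs).
    hits = []
    for pos in index.get(read[:k], []):
        candidate = genome[pos:pos + len(read)]
        if len(candidate) == len(read):
            mismatches = sum(a != b for a, b in zip(read, candidate))
            if mismatches <= max_mismatches:
                hits.append((pos, mismatches))
    return hits

# Toy data: a real genome is ~3 billion bases and reads are 50-75 bases.
genome = "ACGTACGTGGTCCAGTACGTTAGC"
index = build_kmer_index(genome, k=4)
print(map_read("GGTCCAGTACGA", genome, index, k=4))  # -> [(8, 1)]

The fun (and the I/O pain) starts when the index for a whole genome no longer fits comfortably in RAM and you have billions of reads to stream through it.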
That's the raw file size though, so once processed (but not yet analyzed) I believe we have sizes around 50 to 100GB (but that's not really what I work on, so don't quote me on this).
The next steps vary depending on what exactly you want to do, but they usually involve sequence alignment (basically, trying to tie sequences of DNA together by their ends by seeing whether they "fit").
CloudBurst (a Hadoop-based aligner) has a good description of one such algorithm:
Though they can get much more sophisticated, and there are a number of open- and closed-source implementations... I only link this one because of the quality of the figure.
The data sets we work with in my group can be up to 400 GB of compressed text for the reads from a single individual.
Another example from biology with a similar computational profile would be searching through a huge number of mass spectrometer outputs to identify the components of a new sample.
I am the founder of SSD Nodes, Inc. (http://www.ssdnodes.com) and we offer smaller SSD-backed storage plans in addition to custom cloud and dedicated plans ranging from 1-12TB of local SSD storage, at comparable performance. [/plug]
EDIT: Downvoted? Please offer your point of view.
EDIT2: Sample pricing offered in comment below.
8 x 3.4GHz E3-1270
11 x 200GB SLC SATA III SSDs (2TB usable in RAID 5)
32GB DDR3 ECC RAM
6000GB Outbound BW
2 x 1 GigE Public/Private Interfaces
That's just month-to-month, if a longer term were purchased we could give a discount depending on the term.
Way too many people seem to believe that the only benefit of "on-demand" is "elasticity", and then make bogus arguments here that "if you can plan your traffic you shouldn't be using EC2". EC2 is cheaper than people like to claim (it is in fact quite price competitive), and the ability to turn machines on and off on a whim changes the way you look at hardware so drastically that, honestly, it makes traditional ways of dealing with hardware seem draconian and only worth putting up with if you are dealing with some weird corner case or have horribly special requirements.
How often do we have to repeat this argument here on HN?
When running 24/7, EC2 instances are 2-3x more expensive than the cheapest equivalent rented-server options, and orders of magnitude more expensive than the physical hardware if you buy it outright.
The numbers have been recited countless times, I'm not digging them up yet again.
So, no, EC2 is not cost effective for steady loads at the mid-range. EC2 shines at the very low and the very high end and in specific workloads, i.e. it shines when the benefits can be quantified to an amount greater than the price difference.
Orders of magnitude? So, like 100x or 1000x more expensive? Really?
We were in the $5k/month ballpark for EC2, and cut it to under $600/month with a few grand of hardware outlay spread over the course of a quarter.
That said, all of my current projects are on EC2 for the provisioning flexibility, and because I hate having to drive down to a datacenter at 4AM to swap a drive.
If you find a company that has reasonable support, reliable servers, good datacenters, and the minimal features required to debug issues remotely, then you are looking at prices fairly equivalent to those offered by EC2 heavy-utilization reserved instances (and are going to end up with a similar contract length anyway). If this somehow doesn't work out: call Amazon AWS's sales division and see if you are compelling enough for them to negotiate with (they totally have a sales division, and they really do "want your business").
Regardless, your choice of quote is really bothersome: "people like to claim" that EC2 is as expensive as its on-demand list prices, and that is clearly demonstrated by the person I'm responding to (who is quite obviously claiming EC2 is more expensive than it really is). That position is not defensible, because the price you should be looking at is the heavy-utilization reserved pricing. If you'd like to respond to my claim that EC2 "is in fact quite price competitive", then you should quote that and adjust your argument accordingly.
Honestly, the history of HN is not much better (as I scour around trying to find the "numbers" you claim "have been recited countless times"). It is actually difficult to find people who don't claim that Amazon EC2 is more expensive than it is; I'm almost wondering if you and I are living on different versions of the site...
"EC2 is about 10-20 times more expensive than dedicated hosting. Even if reserved instances save us 22% over 3 years, it still doesn't even come close." -- cmer
^ No, EC2 reserved instances save you 71% over 3 years.
"It costs $576/mo to run an extra large EC2 instance fulltime" -- stephenjudkins
^ No, even two years ago (before "heavy utilization reserved instances") you could drop this price by 66% to $195.84/mo.
"With EC2 prices at about $0.10 per hour, I can't imagine ever using a service with such a high premium." -- apinstein
^ Obviously: no, but the fact that this person is angry about the price of a small instance at $72/mo is quite telling; he isn't willing to pay more than about $20/mo.
I found a price comparison by vladd from earlier this year, comparing a high-end VPS to EC2's largest offering, coming up with a nearly 10x difference, but the server is entirely useless: it is a consumer-level product running non-ECC RAM. Later comments claim the same hosting company has "competitively priced servers with ECC ram".
A couple months ago I found a thread that linked to a fairly detailed argument stating that EC2 instances are 2-3x more expensive than a VPS. However, this person again is performing a comparison with non-ECC hardware. What damns this comparison, however, is that he is not taking advantage of 3-year reserved instances for a long-term high-end use case: his numbers seem to be based on 49% off, when he can easily get 71% off, nearly a 2x difference. <- Again, EC2 is cheaper than people like to claim.
Seriously: I can't find anyone who is actually doing legitimate comparisons of Amazon's offerings. People either compare EC2 to "I spent a week of time negotiating a deal to take over a bunch of hardware from a failing company down the street" (which, for the record, will also give you a great deal on chairs and office furniture: comparing the cloud to a fire sale is inane), assume "a server is a server is a server" and find "the cheapest" option (which seems to always have unreliable RAM), or (frankly: "and") fail to take into account Amazon's reserved instance discounts.
That said, Hacker News has a really horrible search system, and I'm trying to find something kind of esoteric (as I want to search for a dollar sign, and thereby have to use proxies such as "expensive" and "cost"). I would thereby love to see an honest comparison, and am happily willing to believe that I missed it: do you have a link to such?
Sorry to break it to you, but EC2 instances are in all likelihood not running on ECC RAM either. If they had ECC RAM then Amazon would probably advertise it prominently, or at least respond when asked directly. If you can find a link proving the opposite then I'll take that back.
> I would thereby love to see an honest comparison, and am happily willing to believe that I missed it: do you have a link to such?
You have probably already seen any of the blog posts I could cite here, so I'll instead just address your two claims:
1. You claim that dedicated servers are more labor intensive (setup, hardware failures) and require more staff. This is not my experience at all. In fact, the complexity and idiosyncrasies of the AWS platform are much harder to abstract over in the beginning, and no less labor intensive in the long term. You're just trading one set of problems (hardware issues) for a different one (cloud issues). What you may save on the hardware-management front you have to spend on adapting your application to a cloud environment.
2. You claim that hardware equivalent to an EC2 instance (with comparable performance, good support, network, etc.) would be roughly the same price as an EC2 instance. Sorry, but that is laughable. When did you last benchmark an EC2 instance? Even a cheap rented dedicated server (Hetzner, LeaseWeb, OVH) will normally give you twice the bang for the buck on every key metric (I/O, RAM, CPU). And this quickly rises to beyond an order of magnitude when you start comparing EBS to a local array, or a 256GB RAM box to 256GB of RAM spread across EC2 instances. Where redundancy is a concern you can usually quite literally buy two of each and still be cheaper than EC2.
I'll say what I always say: EC2 does have its place. However, for deployments in the range of 10 to ~50 servers you will in pretty much all cases save a lot of money by sticking with dedicated servers for the base load. That is unless your app needs the cloud flexibility, of course (most apps don't).
What makes you believe this flexibility comes for free, anyway? Like all things, it comes with a price tag, and quite a hefty one in this case.
I've been dealing with hosting providers and hardware for long enough to know that nothing goes without saying.
> Xeons and Opterons which only support ECC
Have you actually checked the CPU models they use?
All I know is that Amazon uses a range of different CPUs, and some Xeon/Opteron models do accept non-ECC RAM.
> only be a few percent more expensive
In the past, ECC DIMMs were significantly more expensive.
Either way, as said, I don't know whether they're using ECC RAM. I agree it should go without saying, but I don't share your optimism that it actually does. I also wonder why they explicitly mention it for their GPU instances if it goes without saying otherwise.
A little more than an anonymous one-liner in a forum would really help my confidence...
Interesting! Any other hacks enabled by EC2 and the like that make life much easier than real dedicated hardware?
Since we moved to EC2, updating is simpler. The service runs on a micro instance. We launch a large instance to do all the CPU- and IO-intensive processing that prepares the new dataset, then launch a new micro instance, upload the dataset, run a few smoke tests, and if all is well, cut over to it. Because we're doing it off line, we were able to optimize the data processing for speed rather than low resource usage, and cut the runtime down to 45 minutes.
One thing that's often missed in discussions of IaaS versus bare metal is that the elasticity of a particular application can be affected by its design. When we were running on dedicated machines, we smoothed out the load to avoid idle hardware, but after moving to EC2 we concentrated it into spikes to get maximum productivity from running instances. In our case, spiky load is better from a business point of view, because serving data that's 1-25 hours old is better than data that's 8-32 hours old.
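For anyone curious what that kind of cutover looks like in code, here's a minimal sketch using today's boto3 SDK (the AMI ID, instance types, and Elastic IP are placeholders, the dataset build and smoke tests are left abstract, and it assumes EC2-Classic-style addressing):

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

def launch(instance_type, ami="ami-00000000"):  # placeholder AMI ID
    resp = ec2.run_instances(ImageId=ami, InstanceType=instance_type,
                             MinCount=1, MaxCount=1)
    instance_id = resp["Instances"][0]["InstanceId"]
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    return instance_id

# 1. A big box does the heavy dataset preparation, then goes away.
worker = launch("m1.large")
# ... run the data processing on `worker`, stash the result ...
ec2.terminate_instances(InstanceIds=[worker])

# 2. Bring up a fresh micro, load the new dataset, smoke test it.
candidate = launch("t1.micro")
# ... upload dataset, run smoke tests against `candidate` ...

# 3. If all is well, move the public address over to the new instance.
ec2.associate_address(InstanceId=candidate, PublicIp="203.0.113.10")  # placeholder EIP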
1. Keeping your app on EC2 and working around the lack of high I/O options
2. Keeping your app somewhere else and working around the lack of EC2-style elasticity
In TFA, Netflix had chosen #1: they used to run an extra memcached layer in front of the I/O spread across 48 instances. They were able to bring this down to 15 I/O instances with no intervening cache, and lower overall latency.
That said, I'd guess the on-demand hi1.4xlarge won't get a lot of usage; I imagine they offer it just for orthogonality's sake (all other instance types are available both on-demand and reserved), plus the ability to try before you buy.
What's really exciting is that Amazon clearly recognizes their lack of good I/O solutions. Maybe we'll see a whole range of options stem out of this... one can hope.
Say I have a batch process with huge I/O requirements that has to run once per month and finish within X hours of starting, or SLAs are broken. (Plenty of these types of custom workloads exist in the enterprise.)
I can either buy a server with specialist enterprise-grade SSD / Fusion-IO / whatever (> $20,000 most likely) for this once a month process or I can spin up one of these high I/O servers for 1 day per month for a grand total of $50.
In this scenario, this new server type is a godsend.
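Back-of-the-envelope math on that trade-off (a sketch: the run length, hardware price, and run count are assumptions; the $3.10/hr on-demand rate is from elsewhere in the thread):

on_demand_rate = 3.10        # $/hr for the high I/O instance, on demand
hours_per_run  = 16          # assumed length of the monthly batch job
runs_per_year  = 12

ec2_cost_per_run  = on_demand_rate * hours_per_run       # ~$50 per run
ec2_cost_per_year = ec2_cost_per_run * runs_per_year     # ~$595 per year

dedicated_box = 20_000       # assumed price of a Fusion-io class server

print(f"${ec2_cost_per_run:.0f} per run, ${ec2_cost_per_year:.0f}/year, "
      f"break-even after ~{dedicated_box / ec2_cost_per_year:.0f} years of rental")
# -> $50 per run, $595/year, break-even after ~34 years of rental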
You may find that Fusion-IO is cheaper than $50 + 3 months of consulting time.
NB Honestly, I think that is a business opportunity....
And as with all EC2 pricing, reservations drop the price substantially. A 1-year "heavy utilization" reservation comes out to $7,280 per year + $0.621 per hour, for a total of roughly $12,720.
I believe that's $7280/year plus $0.621 per hour, or about a thousand dollars a month amortized over a year.
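The arithmetic, for anyone double-checking (this just reproduces the figures quoted above):

upfront_per_year = 7280        # heavy-utilization reservation, amortized per year
hourly           = 0.621       # $/hr, billed for every hour of the term
hours_per_year   = 24 * 365    # 8760

total_per_year = upfront_per_year + hourly * hours_per_year
print(f"${total_per_year:,.0f}/year, ${total_per_year / 12:,.0f}/month")
# -> $12,720/year, $1,060/month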
(Don't process cc transactions this way.)
Otherwise, as noted below, the most likely use case is in a reserved situation running Cassandra or Postgres or something.
(Not trying to be argumentative, I'm just trying to figure out what you have in mind.)
But what I'm thinking of is the tokenization of a large corpus of text into bigrams and trigrams for example.
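Something along these lines, for instance (a toy sketch; "corpus.txt" is a placeholder, and a real job over a large corpus would stream shards from disk, which is exactly where sequential I/O throughput starts to matter):

from collections import Counter

def ngrams(tokens, n):
    # Yield every n-token window from a token list.
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])

bigrams, trigrams = Counter(), Counter()
with open("corpus.txt") as f:      # placeholder corpus file
    for line in f:
        tokens = line.lower().split()
        bigrams.update(ngrams(tokens, 2))
        trigrams.update(ngrams(tokens, 3))

print(bigrams.most_common(10))
print(trigrams.most_common(10))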
Hmm, that reminds me, I have an interesting toy problem which is sequential I/O limited...
Note that there's a code contest for Common Crawl that was recently announced: http://commoncrawl.org/first-ever-code-contest/
This kind of instance is a nice addition for us as far as our auto scaling is concerned.
If lots of your infrastructure is in EC2, you may need a good DB server inside EC2 that is used constantly, e.g. if you have a read-heavy Cassandra workload, you need SSDs.
In this video, Foursquare says the biggest problem they're facing with EC2 is consistency in I/O performance. They say that the instance storage simply isn't fast enough for them, and while EBS is fast enough when RAIDed, it isn't consistent since it isn't local (EBS traffic goes over the network). Reddit has also complained about EBS, but they've been able to move onto the instance storage.
If you're willing to reserve the instance for 3 years, the average monthly cost becomes only $656. That's quite a good deal.
Foursquare says in that video they're planning to migrate off of EC2, in part due to I/O performance. I'll be interested to hear whether or not this instance type changes their minds.
The only problem with reserving that instance for 3 years is that better hardware always comes along, especially with the cost of SSDs coming down significantly every year. Usually if you're in the big-data space, your hardware is likely retired after 24 months (12 months if you're well funded) so locking yourself in for 36 months might be a bad investment.
I had thought that EC2 reservations were upgradeable, but a quick check on the forums shows you're right, they're not. Of course, you can play your own "tiered usage" game, like laptops in IT departments, where the old h1.xlarge becomes cheap enough to use as a second-tier machine and you go reserve the h1.xxlarge for Cassandra.
So it does at least appear that in some cases, they'll let you out of your reservation so that you may sign up for something similar. Or at least they let us do that.
AWS cuts their prices at a relatively reliable rate. We've done the math and found that the 1-year reservations are absolutely worthwhile, but that the 3-year reservations are not. Granted, that was for our specific workload / use case.
Money has a time value, and this stuff is getting cheaper fast.
TRIM (https://en.wikipedia.org/wiki/TRIM) isn't supported with RAID on SSDs today on hardware controllers, and most Linux distributions don't support TRIM on software RAID out of the box either, so you're going to see performance plummet like a rock after one full pass of writes over the disk. In many RAID configurations you're going to zero-write the entire disk when formatting it, so performance is going to suck from the get-go. For this reason, even if you have a tiny database and don't expect to write 1TB worth of data, your performance might still suck. Personally, I haven't tried Linux software md TRIM in production; the patch is pretty recent, so you're on your own there. (If possible, scaling out horizontally may be a solution to consider for redundancy. I have no idea what Amazon is using for SSDs, but recent SandForce generations fail all the time, so plan for that.)
If you don't know to look for this issue, you're going to be scratching your head when your RAID10 SSD configuration's write throughput is worse than a single 7200rpm drive. On the other hand, IOPS on SSDs are AMAZING for databases/datastores. Amazon may have already solved this for you behind their virtualization layer, and they might be running their own software striping behind whatever RAID you're doing, so be sure to test it out fully first.
That said, you're absolutely right about being cautious/not RAIDing the volumes. There are almost no RAID configurations that support TRIM at present, so it's definitely not a good idea to be RAIDing up these drives. Just go JBOD.
Red Hat also warns that software RAID levels 1, 4, 5, and 6 are not recommended for use on SSDs. During the initialization stage of these RAID levels, some RAID management utilities (such as mdadm) write to all of the blocks on the storage device to ensure that checksums operate properly. This will cause the performance of the SSD to degrade quickly.
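One quick sanity check from inside the instance is whether the block device even advertises discard/TRIM support (a sketch; it assumes a Linux kernel that exposes these sysfs attributes, the device names are examples, and it says nothing about whether layers above, like md, pass discards through):

import os

def discard_supported(dev):
    # A block device advertises discard/TRIM if discard_max_bytes is non-zero.
    path = f"/sys/block/{dev}/queue/discard_max_bytes"
    if not os.path.exists(path):
        return None  # attribute not exposed by this kernel
    with open(path) as f:
        return int(f.read().strip()) > 0

for dev in ("xvdf", "xvdg", "md0"):   # example device names
    print(dev, discard_supported(dev))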
It's telling that they have only enabled this for a huge (quadruple extra large) instance type. It's probably hard to make this work for someone who just wants a 10GB disk with great I/O. The problem at the low end is that the disks are large and would have to be divided up among many guests to make proper use of them, leading to I/O contention.
The high I/O options will probably only ever be available for pretty large instances.
My guess (not based on any knowledge of EC2 internals) is that they don't have any way to do fair I/O sharing between guests. If they did, they could split these boxes into 32 small instances with 1 ECU, 1.7 GB RAM, and a 60 GB disk with 2500 random reads / 250-4000 random writes per second.
The most likely reason for not slicing these systems up to smaller instances is they want to maintain consistent, high performance I/O.
> Why the range? Write IOPS performance to an SSD is dependent on something called the LBA (Logical Block Addressing) span. As the number of writes to diverse locations grows, more time must be spent updating the associated metadata. This is (very roughly speaking) the SSD equivalent of seek time for a rotating device, and represents per-operation overhead.
For heavy analytics workloads I'd bet that Google BigQuery (https://developers.google.com/bigquery/) would be cheaper and faster and more reliable.
My guess based on perf characteristics is each instance has 2 x 960GB OCZ Talos 2 C Series SSDs: http://www.oczenterprise.com/ssd-products/talos-2-c-sas-2.5-...
Intel has a good drive-reliability history and is very enterprise-friendly in bulk purchasing. Intel has had excellent firmware for their drives, which is valuable at datacenter scale; people who have dealt with RAID controller firmware (including Amazon) know all about this.
Traditionally, Amazon has not used SAS drives in EC2, opting for lower-cost SATA drives. It's also unlikely that Amazon is using small numbers of high-capacity (>=500GB) drives, because they still aren't perfectly price effective; price per GB is OK, but replacing a failed drive is more costly.
Also keep in mind that to get to today, Amazon has been rolling these drives out in huge numbers across two enormous data centers, so it's unlikely that Amazon has picked a brand-new drive (say, the latest OCZ Vertex 4).
There are other factors that Amazon has bumped into while testing drives, but they remain unreported and internal.
When interpreting any benchmarks on EC2, it's important to understand that there is a 5-10% read/write performance hit on first use, because AWS lazily wipes blocks between customer instance launches. See http://www.youtube.com/watch?v=IedaYaKsb-4#t=29m49s (should pre-cue; if not, skip to 29:49). This is referenced in the docs, but it's easy to miss: http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/In...
So here you go, for hi1.4xlarge:
Summary for the impatient - After initialization (i.e., on second write), quasi-realistic I/O on the new SSD EC2 instance sustains writes @ ~420 MB/sec and buffered disk reads @ ~374 MB/sec (hdparm's cached reads, which mostly measure memory bandwidth, come in around 6 GB/sec). The entire 8.6GB / filesystem copied over to the SSD in about 20 seconds.
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 8.0G 1.1G 6.9G 14% /
tmpfs 30G 0 30G 0% /dev/shm
/dev/xvdf 1023G 16G 957G 2% /media/ephemeral0
(Note: /dev/xvdf and /dev/xvdg are just soft links to /dev/sdf and /dev/sdg respectively)
Crude stats on first-use:
# hdparm -tT /dev/xvdf
Timing cached reads: 14788 MB in 1.99 seconds = 7446.69 MB/sec
Timing buffered disk reads: 1066 MB in 3.00 seconds = 355.04 MB/sec
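# ("cached reads" is hdparm's memory/cache throughput test; "buffered disk reads" is sequential reads from the device itself)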
Wipe the device:
dd if=/dev/zero of=/dev/xvdf bs=1M& pid=$!
while true; do kill -USR1 $pid; sleep 4; done;
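# (sending SIGUSR1 to a running GNU dd makes it print transfer statistics without interrupting the copy)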
dd: writing `/dev/xvdf': No space left on device
1048567+17 records in
1048566+17 records out
1099511627776 bytes (1.1 TB) copied, 1955.42 s, 562 MB/s
Stats after zero-wipe (dd /dev/zero) to device:
hdparm -tT /dev/xvdf
Timing cached reads: 13260 MB in 1.99 seconds = 6673.05 MB/sec
Timing buffered disk reads: 1124 MB in 3.01 seconds = 374.02 MB/sec
hdparm -tT /dev/xvdf
Timing cached reads: 11188 MB in 1.99 seconds = 5624.17 MB/sec
Timing buffered disk reads: 1122 MB in 3.00 seconds = 373.99 MB/sec
hdparm -tT /dev/xvdf
Timing cached reads: 12930 MB in 1.99 seconds = 6505.78 MB/sec
Timing buffered disk reads: 1124 MB in 3.00 seconds = 374.15 MB/sec
Confirming Effect Of Pre-wiped I/O:
hdparm -tT /dev/xvdg
Timing cached reads: 11796 MB in 1.99 seconds = 5931.68 MB/sec
Timing buffered disk reads: 1038 MB in 3.00 seconds = 345.87 MB/sec
hdparm -tT /dev/xvdg
Timing cached reads: 12658 MB in 1.99 seconds = 6367.41 MB/sec
Timing buffered disk reads: 1050 MB in 3.00 seconds = 349.47 MB/sec
hdparm -tT /dev/xvdg
Timing cached reads: 12856 MB in 1.99 seconds = 6468.39 MB/sec
Timing buffered disk reads: 1066 MB in 3.00 seconds = 354.80 MB/sec
Post-wipe (/dev/xvdf) vs. first-use (/dev/xvdg) performance: 373.6 MB/sec vs. 349.3 MB/sec (a 6-7% improvement once the device has been initialized)
Somewhat more real-world numbers:
dd if=/dev/sda1 of=/dev/xvdf bs=1M
8192+0 records in
8192+0 records out
8589934592 bytes (8.6 GB) copied, 19.7876 s, 434 MB/s
dd if=/dev/sda1 of=/dev/xvdf bs=1M
8192+0 records in
8192+0 records out
8589934592 bytes (8.6 GB) copied, 20.0365 s, 429 MB/s
dd if=/dev/sda1 of=/dev/xvdf bs=1M
8192+0 records in
8192+0 records out
8589934592 bytes (8.6 GB) copied, 21.4193 s, 401 MB/s
From my limited testing, a 1-node instance from GridVirt at $30/month can compile gcc-4.6.3 within 15 minutes (not exactly an I/O-intensive example...)
(Reserved instance prices are cheaper)
If you are using MongoDB, you take 3 or 4 of them, shard, and back up the replica set to "conventional" storage. You end up with a 6-node cluster for less than the price of this Amazon instance.
Lesson: You need to have a business which can benefit from a lot of start/stop of your instances for them to make sense from a pure financial point of view.
1. You can't order from OVH if you're from outside their list of approved countries.
2. You're not using any RAID, and those are desktop-grade SSD drives; they tend to die, sometimes without a clear warning, as they're not really intended for 24/7 server use.
2. You get 2 x 300GB with a battery-backed RAID card, so you can put them in RAID. As for reliability, I keep my fingers crossed (I have some of these servers) but no failures yet, and in an interview the operators said that they basically see extremely good reliability. This is not marketing in this case; it's because they need it to be financially sustainable.
By the way, do you know what kind of SSD (SLC, MLC, real disks or cards?) are used by Amazon?
If your hardware provider needs to cut down on warranty handling to be financially sustainable, I'd be concerned. It looks like these are rentals and not purchases, so why wouldn't these guys go to Dell/HP or the drive manufacturer directly for warranty replacements? Are they buying gray market to reduce the cost, trying to pass the savings on to you, but then in turn running out of recourse when they need to replace a drive?
I'm just speculating; I have no idea if this company is good or not. I'm just concerned about the statement you made about the company, whether it's from your understanding or what they actually said.
If you operate on low margins, you'd better have systems that need minimal manual operations, because as soon as you have one guy pulling a dedicated server, changing the drive, and putting in another one, you've lost a couple of months of your earnings on that particular server. If you do that too often, you are not happy at the end.
Still, a huge and significant improvement over anything previously available. I'm looking forward to playing with it.
Maybe AWS will release a High Performance RDS option that runs off of them. Wishful thinking.