I would love to see some longer-term measurements. I get the impression that this benchmark was a one-time measurement. But the big problem I have with cloud services is I/O performance over longer time periods.
We read so many times about very unreliable performance. Sometimes it's ok and sometimes it's really, really bad.
Without running the benchmark continuously over some period of time, this doesn't really help. For all we know, the first-place service was just having a good day and the last-place one a very bad one.
As a hobby, I've been running long-term benchmarks on a handful of cloud services, for exactly the reason you suggest -- performance varies quite a bit over time. You can browse through some of the data at (http://amistrongeryet.com/dashboard.jsp). The UI on this site is abysmal, and it only shows 30 days of data (I've actually collected almost two years), but it still gives some flavor of how much variability there is. For instance, check out this graph of SimpleDB reads: (http://amistrongeryet.com/op_detail.jsp?op=simpledb_readInco...). I've blogged on the data from time to time; for instance, (http://amistrongeryet.blogspot.com/2010/04/three-latency-ano...).
If there's interest, I'll work on making this data more accessible, adding documentation, and providing access to the full two years. In any case it's limited by the fact that I'm only probing AWS and Google App Engine, and only one or two instances of each. What I'd really like to do is open this up to crowdsourcing -- as I discussed a while back at (http://amistrongeryet.blogspot.com/2011/07/cloudsat.html). If anyone is interested in participating in a project like this, let me know!
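For anyone curious what the probes look like: each one is just a loop that times an operation and records the result. A stripped-down sketch (hypothetical operation name; the real harness records into a database rather than stdout):

    import time

    def probe_simpledb_read():
        # Placeholder for the operation under test, e.g. a single
        # SimpleDB read via boto; stubbed out here with a sleep.
        time.sleep(0.01)

    def run_probe(op, interval_seconds=60):
        # Time one call per interval, forever; log (timestamp, op, ms, ok).
        while True:
            start = time.time()
            try:
                op()
                ok = True
            except Exception:
                ok = False
            latency_ms = (time.time() - start) * 1000.0
            print(int(start), op.__name__, round(latency_ms, 2), ok)
            time.sleep(interval_seconds)

    run_probe(probe_simpledb_read)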
I was surprised that the Storm on Demand SSD instances didn't seem to do better (really only ~2x as fast as spinning disks?). Turns out the score is some weird bundle of benchmarks.
If I'm reading it right, it's ~868 MB/s sequential write (CPU bound) and ~594 MB/s sequential read (CPU bound). Not sure how to read the random I/O results. The bigger instances would probably be faster still (more CPU).
So the Storm on Demand SSD instances seem to be blazingly fast.
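If anyone wants to sanity-check sequential numbers like that on their own instance, something along these lines works (assuming fio, which is one of the tools they list; their actual arguments aren't published, so this is only an approximation):

    import subprocess

    # Approximate sequential-write throughput check via fio (fio must be
    # installed; these arguments are a guess, not the benchmark's own).
    cmd = [
        "fio", "--name=seqwrite", "--rw=write",   # sequential writes
        "--bs=1M", "--size=4G", "--direct=1",     # 1 MiB blocks, bypass the page cache
        "--ioengine=libaio", "--iodepth=32", "--numjobs=1",
    ]
    subprocess.run(cmd, check=True)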
Interesting from an IO point of view. It would be nice to have a clearer tie-in with CPU and other benchmarks for these providers. Does anyone have any experience with Storm on Demand to share? Their prices for SSD-based servers look enticing.
We co-locate with their parent company Liquid Web and have conducted extensive testing of their cloud services. They are a very solid service: knowledgeable technical staff, very quick support response, great hardware, excellent reliability, and reasonable prices. We've been monitoring Storm for 2 years with 100% availability in 2 of their 3 regions: http://cloudharmony.com/status
Here is a CPU performance comparison for the same set of servers. This metric is an approximation of EC2's ECU for other providers (the algorithm uses 16 CPU benchmarks and EC2 instance performance as a baseline): http://cloudharmony.com/benchmarks?benchmarkId=ccu&selec...
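For anyone wondering what "approximation of ECU" means in practice, the general shape of the calculation is a normalized composite against an EC2 baseline. Simplified illustration only (hypothetical numbers; not the production algorithm or its weighting):

    from math import prod

    def ccu_score(candidate, baseline, baseline_ecu):
        # Geometric mean of per-benchmark ratios vs. the EC2 baseline,
        # scaled to the baseline instance's ECU rating.
        ratios = [candidate[name] / baseline[name] for name in baseline]
        return (prod(ratios) ** (1.0 / len(ratios))) * baseline_ecu

    # Hypothetical benchmark results, for illustration only (higher = better).
    baseline = {"bench_a": 100.0, "bench_b": 250.0, "bench_c": 80.0}
    candidate = {"bench_a": 180.0, "bench_b": 300.0, "bench_c": 160.0}
    print(round(ccu_score(candidate, baseline, baseline_ecu=8.0), 2))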
The service seems really interesting, but the charts/graphs could use a little Tufte love.
Why are the bars in the chart in a different order than the rows in the table? Maybe make the different Amazon/xyz-provider entries different shades of the same color?
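For example, something like this (hypothetical data, matplotlib) keeps the bars in table order and gives each provider family shades of one color:

    import matplotlib.pyplot as plt

    # Hypothetical rows, in the same order as the table.
    rows = [
        ("ec2 m1.large",    120, "amazon"),
        ("ec2 cc1.4xlarge", 310, "amazon"),
        ("storm ssd 4gb",   560, "storm"),
    ]
    # One hue per provider, lighter/darker shades within it.
    shades = {"amazon": ["#ff9900", "#b36b00"], "storm": ["#6699ff", "#3366cc"]}
    seen, colors = {}, []
    for _, _, provider in rows:
        i = seen.get(provider, 0)
        colors.append(shades[provider][i % len(shades[provider])])
        seen[provider] = i + 1

    plt.bar(range(len(rows)), [r[1] for r in rows],
            color=colors, tick_label=[r[0] for r in rows])
    plt.xticks(rotation=30, ha="right")
    plt.tight_layout()
    plt.show()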
If you are running additional queries and need a web service token (the site currently only allows you to generate 5 reports/day for free), ping me and I'll send you one for free.
Is it just me, or is Linode looking pretty bad in these benchmarks? What gives? And that's without even mentioning price ... Storm on Demand's first SSD option starts at only $75/month and gives you 3GB RAM!
Those servers were not tested at the same time. Storm was tested a week ago, but Linode two years ago - not a fair comparison, as servers do get upgraded continuously.
I tested Linode, EC2, and Slicehost a few years ago, and Linode was twice as fast as the others in I/O. Though I was testing the smallest instances available, while the benchmark above used large instances.
That's false; it's a fine configuration for a slave database server where you don't care if it goes down after a year or two. In production you could make it a RAID 10 if you cared, but if we're just trying to get a sense of the numbers, RAID 0 is cheaper/easier. Or you could just use a single SSD as a baseline, but I would think that those who care enough about performance to go SSD are probably going for a RAID.
I've been running some RAID 0 SSDs on slaves for years under moderately heavy write load. If you leave enough space free for them to do wear leveling, there's no reason they should fail within months.
I really wish they'd give more information about their methodology. For example, I've used the SoD SSD servers for some of my own testing, and they pull 30K IOPS for small random synchronous writes. How does that translate to "361.62"? WTF are the units here? What workload were they even testing? Yes, I know they list the random grab bag of tools they're using, but most of those are capable of generating many different kinds of I/O and they don't say what arguments they used. "361.62" seems very precise. I'm sure the two digits after the decimal point really impress the pithed snails or MBAs who are the benchmarkers' apparent target market. However, given both the bogosity of combining disparate measurements like this and the well known variability over time of cloud performance, that precision is not justified. Numbers that are more precise than they are accurate or meaningful are just decoration.
P.S. I expect someone will ask for more specifics, so here are a few. First, Bonnie++ sucks. Many of the numbers it produces measure the memory system or the libc implementation more than the actual I/O system. I've seriously gotten more testing value from building it than running it, so its very presence taints the result. Second, fio/hdparm/iozone might be redundant, depending on which arguments are used. Or the results might be non-comparable. Either way, the aggregate result could only apply to an application with exactly that (unspecified) balance of read vs. write, sequential vs. random, sync vs. async, file counts and sizes, etc. Did they even run tests over enough data to get past caching effects? That's particularly important since they used different memory sizes on different providers. Similarly, what thread counts did they use on these different numbers of CPUs/cores? Same across all, or best-performing for each component benchmark? With such sloppy methodology anything less than an order-of-magnitude difference doesn't even tell you which platforms to test yourself.
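To make the point concrete: with fio alone, a 4K synchronous random-write job and a 1M buffered sequential-read job can report numbers that differ by orders of magnitude on the same box, so the arguments ARE the benchmark. Hypothetical invocations (theirs aren't published):

    import subprocess

    # Hypothetical fio jobs; the benchmark's actual arguments are unknown.
    # On the same SSD these two can differ by orders of magnitude, which is
    # why a single aggregate like "361.62" says nothing without the workload.
    random_sync_writes = [
        "fio", "--name=randwrite", "--rw=randwrite",
        "--bs=4k", "--direct=1", "--sync=1",        # 4 KiB, O_DIRECT + O_SYNC
        "--size=8G", "--runtime=60", "--time_based",
    ]
    buffered_seq_reads = [
        "fio", "--name=seqread", "--rw=read",
        "--bs=1M", "--size=8G",                     # 1 MiB blocks, page cache allowed
        "--runtime=60", "--time_based",
    ]
    for job in (random_sync_writes, buffered_seq_reads):
        subprocess.run(job, check=True)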
So this compared Amazon's cc.4xl server at $2.10 an hour against, e.g., Rackspace's 16GB server at $0.96 an hour, and used the Amazon instance's local storage in RAID 0 vs. Rackspace's SAN storage? I'm pretty sure you couldn't have compared apples to oranges any better.