Hacker News
"Amazon's EBSs are a barrel of laughs in terms of performance and reliability" (reddit.com)
270 points by quilby on Mar 18, 2011 | 152 comments

Having been at a startup that used hundreds of EC2 instances and EBS volumes I can assure you all that Amazon EBS performance is downright terrible and Amazon didn't inspire any confidence that they could solve it.

Even worse than the EBS performance is that Amazon does not offer any shared storage solution between EC2 instances. You have to cobble together your own shared storage using NFS and EBS volumes, making it sucky to the Nth power.

EC2 is fine for Hadoop-style distributed work loads, and distributed data stores that can tolerate eventual consistency, that's all good. But for production database applications requiring constant and reliable performance, forget it.

My experience with the AWS RDS database product has been excellent.

We looked at RDS and had a call with some of their engineers, but we basically had our EC2 + raid'd EBS set up almost the same as they did, all best practices already being done.

Since RDS really is EC2 + EBS, they couldn't provide any real assurances it performed better than our own installation.

We ended up moving off of AWS as a whole. After several discussions about how we can continue to scale, the ultimate answer was without AWS.

EC2 is great for distributed stuff, but when you need something that is I/O-heavy, for instance, it is a big problem. Working around AWS's performance problems at scale ends up costing more than going elsewhere.

Did you guys find a better cloud service, or did you roll your own in a datacenter somewhere?

We went with a managed hosting provider who built us a private cluster. Basically a private cloud. But that way we could get a dedicated SAN and move our DB servers out to dedicated boxes with whatever disk configuration we desired.

Which provider? Can you email me? jedberg@reddit.com.

Interesting, I wonder if this becomes a trend as other startups and cloud customers discover these limitations and look for more custom solutions.

Yeah they have a few products (e.g. EMR, RDS) where they charge by the instance anyway so you're just paying them by the hour for the five minutes it would take you to set up the server once

Hmm. I think you underestimate the effort that is spent on those two. RDS has really good replication which is really hard to configure and set up yourself. And having configured Hadoop I know it takes more than 5 minutes :) Perhaps Whirr makes that easier. Also, EMR's Hadoop is tuned to work really well with S3, which you don't get with stock Hadoop (or even with Cloudera's).

The biggest issue I have with RDS is that I can't do a multi-master deployment to scale up writes. I've got a very write-heavy workload in my systems (roughly one write for every two reads).

You can't do multi-master with MySQL anyways, which until very recently has been the only "engine" RDS supports. Even if you could do multi-master, replication is still single threaded. You have to come up with your own sharding scheme. This is a limitation of MySQL not RDS.

You can't do multi-master with MySQL? News to me - we've been using circular replication between two servers, each a master and slave, for quite some time now.

Not possible with RDS, unfortunately, but works fine on two EC2 instances.

You can do it, but it won't scale your write load. In most cases it actually makes it worse, because each node has to apply all the writes taken on the other master serially.

There is a difference between multi-master and circular replication. To me, multi-master means that I can write to both masters at the same time, which implies there is a way to resolve conflicts. Databases like Cassandra (timestamps) and Riak (vector clocks) have this; MySQL does not. If you write to the same record on both masters, bad shit happens and it's very hard to sort out.
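
A minimal sketch of the timestamp-based resolution mentioned above, usually called last-write-wins. The `(timestamp, value)` tuple layout and the function name are assumptions made for illustration; this is not Cassandra's actual implementation.

```python
def last_write_wins(version_a, version_b):
    """Resolve two conflicting versions of a record by timestamp.

    Each version is a hypothetical (timestamp, value) tuple. The version
    with the later timestamp wins; on a tie, version_a is kept.
    """
    return version_a if version_a[0] >= version_b[0] else version_b


# The same record written on two different masters at different times.
on_master_1 = (1300440000, "alice@example.com")
on_master_2 = (1300440005, "alice@example.org")

print(last_write_wins(on_master_1, on_master_2))  # the later write, from master 2
```

Note that the earlier write is silently discarded, which is the trade-off of this scheme.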

You can write to both masters at the same time in a MySQL multi-master circular replication setup. It's done via auto_increment_increment and auto_increment_offset configuration settings in my.cnf - each server generates autoincrement keys that are unique to that server.
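
To illustrate why those two settings avoid key collisions, here is a small sketch that models only the key sequence each server would generate. It does not talk to MySQL; the function name is hypothetical, and the values (increment 2, offsets 1 and 2) are assumed for a two-master setup.

```python
def auto_increment_keys(offset, increment, count):
    """Return the first `count` keys for one server.

    Models the sequence offset, offset + increment, offset + 2*increment, ...
    """
    return [offset + i * increment for i in range(count)]


# Two masters: both use auto_increment_increment = 2, with offsets 1 and 2.
master_a = auto_increment_keys(offset=1, increment=2, count=5)
master_b = auto_increment_keys(offset=2, increment=2, count=5)

print(master_a)  # [1, 3, 5, 7, 9]
print(master_b)  # [2, 4, 6, 8, 10]
print(set(master_a) & set(master_b))  # set()
```

The sequences interleave and never overlap, so inserts on either master get distinct primary keys. This says nothing about two masters modifying the same existing row.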

This still doesn't protect you from UPDATE statements. I suppose you could pull it off if you could be absolutely certain that you were only INSERTing. You still have the problem that replication is single threaded, so this doesn't scale your writes beyond one thread.

Anything more you could share about it? We got scared off (at least for now) by the inability to replicate in from a self-hosted MySQL instance (for migration purposes), but would still love to hear more about your experiences.

We had consistent serious problems related to EBS for a several-month streak about a year ago, and I heard almost identical stories from other EC2 users around the same time. Instances with EBS attached would suddenly become completely unreachable via the network. Sometimes we had to terminate the instances, but usually we could revive them by detaching all (or most) of the EBS volumes, then reattaching and rebooting. Amazon seems to have fixed this problem, but I wouldn't be surprised if we suffered in the future the way reddit has.

Overall, EC2 is a very impressive offering, for which I commend Amazon. At times, I've been so frustrated that I'm ready to switch, but they fix things just quickly enough that I never quite get around to it. In the end, I'm willing to accept that what they're doing is hard, there will be mistakes, and it's worth suffering to get the flexibility and cost-effectiveness that EC2 offers.

This comment further down, supposedly from an Amazon employee, paints a grim picture for EBS: http://www.reddit.com/r/blog/comments/g66f0/why_reddit_was_d...

The perspectives of disgruntled employees have been known to be worse than reality, on occasion. Not definitively saying that's the case here, just saying.

I work at Amazon - a lot of teams are like this. They're stuck managing a woefully broken product and spend all of their time propping up the beast, leaving no capacity left for meaningful fixes (in these cases, meaningful fixes are always gigantic engineering projects).

The team develops a reputation internally for glorified firefighting, and has trouble recruiting. More senior engineers eventually flee (having, well, choice in the matter), leaving a team heavy with junior talent and with no seasoned gurus leading the way.

The company is also growing at ludicrous speed, and hiring is difficult. When the product is in such a painful state, attrition from the team is high, and with slow hiring you are barely countering attrition (exacerbating the junior talent problem), and not even close to growing the team to be in a position to take care of the problem for good.

I suspect this is an industry-wide problem though, and is hardly unique to this place.

This is quite disturbing to read about the place that tons of companies rely on to host their apps. As EC2 (and friends) grows, it's undoubtedly growing more complex. Combine that with many of the original/senior devs leaving for greener pastures, and it seems like hosting with Amazon becomes riskier as more time passes. Doesn't really inspire confidence ...

This is very accurate of many teams within Amazon. Also, I don't know the exact story of what is happening with EBS now, but I have heard an increasing number of horror stories about EBS.

I wonder if it's time for AWS to open a development office in a more startup-oriented city (i.e. SF). It might help them attract and retain more talent.

A.) Amazon isn't a startup. B.) A lot of Bay Area companies (Zynga, Facebook, Salesforce) are opening Seattle offices to take advantage of the Amazon and Microsoft talent pools. C.) They already have a Bay Area office. http://public.a2z.com/index.html I believe some core SimpleDB guys (Jim Larson) were based out of there.

As are often the perspectives of folks from another team, or trolls posing as employees. Definitely a grain of salt needed.

Even if true, I don't think that comment is fair to the EBS team. It seems a likely reason for that behaviour is they don't have enough resource to work on and test (presumably) large changes to fix the underlying issue while also fixing the tickets which crop up. The tone of the comment seems to imply they're foolishly overlooking the obvious solution.

This is totally true, but at the same time, given the success and scale of AWS, it's insane that they would not have the resources they need.

Word in the industry is that AWS is insanely profitable, so they've got no problem finding the money to hire the help.

My gripe with EBS is that hiccups in EBS cause my Linux instances to "lock up", consume 100% CPU and become unresponsive. AMZN is providing their own Linux distribution and drivers for the EBS devices so they can also attack this problem by patching the Linux kernel.

If you switch to their distribution, do you get any benefit now?

Compatibility with the platform is excellent and it's comfortable for anyone who's familiar with Red Hat or Fedora.

I'd compile my own web server stack, but it's as good a foundation to build on as any Linux distribution.

The hiring market is very competitive right now. It is hard for Amazon (or anyone) to hire good engineers. I think this is exacerbated by Amazon's lack of perks. My opinion is that if Amazon wants to hire the best of the best to work on AWS products they need to stop being so cheap ("frugal") and match the perks (and pay) other software companies offer.

This comment does not seem legitimate. There's nothing in it that would imply special knowledge of AWS or EBS.

Yeah, I call BS on this one. Amazon AWS is for certain use cases. It has never been a platform for all solutions. In fact, it has mostly been a platform for people to craft their own solutions. AWS is not a web hosting platform. If you want a web hosting platform, you create one the best you can from the tools available. This is the sort of response I would expect from an employee of AWS, not the one I saw in those comments. That or maybe the comment was from a customer service guy who isn't a developer.

I'm surprised Reddit ever thought AWS would be a good platform to host on. You don't bitch about it, you create the best system you can and if something doesn't work, then you need to do more work. If you don't want to put in the work, then AWS is wrong for you. You don't see Heroku bitching about AWS; rather, they made the thing work for them with great engineers.

RAIDing together multiple EBS volumes feels like a massive hack to me. I can't help but wonder if this compounds the problem at Amazon's end. If EBS performance is a problem, Amazon need to fix it. For example, if some way of tying together multiple EBS volumes is a reasonable way of working around the problem, then why aren't Amazon providing "high performance" EBS volumes which do that under the hood?

If I were faced with EBS performance issues, I would see this as a big red flag, consider EBS unsuitable for the application and avoid it, rather than carrying on with such a workaround.

One other huge downside of raiding EBS volumes is you can't use EBS's snapshotting features as you cannot guarantee a perfect sync (you could use LVM yourself however).

Honestly, since EBS vols are supposedly not tied to a single disk, the raiding should be done on Amazon's end. That it isn't is telling.

You have to snapshot at the system level anyway if you want a consistent snapshot: otherwise the filesystem (or your database) could have been reordering and delaying writes that end up not being part of the "consistent snapshot". This is simply not a RAID-specific issue, nor is it a problem with EBS (as it is generally easy to use LVM, xfs, and/or PostgreSQL to handle that part of the job).

This is something I've never quite understood. Best practice guides say you need to do a "flush all tables" in MySQL and then do a filesystem freeze (possible in XFS) before you can use a snapshot system like the ones built into EBS or LVM. If you don't, you apparently stand a good chance of getting an inconsistent snapshot, even if the snapshotting mechanism itself is (like EBS and LVM) "point in time" consistent.
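
The ordering those guides describe can be sketched as follows. This is only an illustration of the sequence, not a working backup tool: all five callables are hypothetical and supplied by the caller. In practice they might issue `FLUSH TABLES WITH READ LOCK` on a MySQL session that stays open, run `xfs_freeze -f` and `xfs_freeze -u` on the mount point, and start the EBS or LVM snapshot.

```python
def consistent_snapshot(lock_tables, freeze_fs, take_snapshot,
                        unfreeze_fs, unlock_tables):
    """Run the quiesce -> snapshot -> resume sequence in a safe order.

    All five arguments are caller-supplied callables (hypothetical here).
    """
    lock_tables()                   # stop writes at the database level
    try:
        freeze_fs()                 # flush and block writes at the filesystem level
        try:
            return take_snapshot()  # point-in-time copy of a quiet volume
        finally:
            unfreeze_fs()           # always thaw, even if the snapshot failed
    finally:
        unlock_tables()             # always release the database lock last


# Dry run that only records the order of operations.
log = []
consistent_snapshot(
    lock_tables=lambda: log.append("lock"),
    freeze_fs=lambda: log.append("freeze"),
    take_snapshot=lambda: log.append("snapshot"),
    unfreeze_fs=lambda: log.append("unfreeze"),
    unlock_tables=lambda: log.append("unlock"),
)
print(log)  # ['lock', 'freeze', 'snapshot', 'unfreeze', 'unlock']
```

The nested `try`/`finally` blocks are there so that a failed snapshot never leaves the filesystem frozen or the tables locked.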

Why is all this necessary? If the system (i.e. DB + FS + block device) are all working as they should, then once a commit returns, the data should be on disk. If it's not, you have no guarantee data that you thought was committed will still be there after a kernel panic or power outage.

In that case, no amount of xfs-freeze or table flushing during a snapshot is going to save you from the fact that your DB is one kernel panic away from losing what the rest of your system believed were committed transactions.

In the specific case of a database server that actually has correct fsync semantics that the user has not disabled for some crazy performance reason, you are correct. However, there are many use cases that people want consistent snapshots across, like "apt-get install", that do not use a write barrier for every atomic-feeling operation.

(In fact, with a good database solution, like PostgreSQL, the RAID issue of the parent post is also solved: put your write-ahead or checkpoint logs on a single device, as its linear writes will easily swamp network I/O on an EBS, and use RAID only for backend storage, where you need random I/O.)

This is one reason why Oracle is still the gold standard. When entering hot backup mode, which is what you do during a snapshot, it logs the FULL BLOCKS that are changed. Failures and inconsistencies can be replayed from the archive logs.

Of course this means you can quickly blow out your log archival, so it's meant to be a transitory mode.

PostgreSQL has this exact same feature.

True. However, for some cases where you don't mind losing some data due to a recovery process EBS snapshots are 'good enough'. Additionally, with a database like CouchDB with a 'crash only' design, it should work for some cases as well.

We use EBS snapshots as a last-resort backup. They're really convenient that way. We have a more robust backup system, but in the unlikely event that something goes wrong at least we have those snapshots, even if they're not perfect.


In fact there is a handy package called ec2-consistent-snapshot (https://launchpad.net/ec2-consistent-snapshot) that will manage this for you!

Maybe I'm missing something here: why is there even a discussion about RAID at the EBS level? Amazon says, "Amazon EBS volumes are designed to be highly available and reliable"; if we have to talk about RAID, then the issue is on Amazon's end.

I think most people are doing RAID-0 to get more perf out of EBS volumes

It also seems that in 2008 adding mirroring also hurt performance. I'm going to dive into this tonight to see if things have changed at all with these benchmarks.

"His results show a single drive maxing out at just under 65MB/s, RAID 0 hitting the ceiling at 110MB/s, RAID 5 maxxing out about 60MB/s, and RAID 10 “F2” at under 55MB/s."

Summary source: http://www.nevdull.com/2008/08/24/why-raid-10-doesnt-help-on...

Data source (google cache): http://webcache.googleusercontent.com/search?q=cache:Vscz-VX...

Yes. Except, anybody who is doing RAID-0 over an EBS volume for perf reasons is ASKING for trouble.

You need to do RAID-10. EBS volumes CAN and DO fail.

I wish I had more than one upvote for this: swimming against a trend like that never works out well.

Generally speaking this is the sort of thing that people warn about when they say "if you want to run on a cloud, you need to design your application for a cloud". Meaning, you can't presume your infrastructure is dedicated and carries an MTBF similar to that of (say) an enterprise hard drive, which is upwards of 1 million hours.
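
For a sense of scale, an MTBF figure can be turned into a rough yearly failure probability. The sketch below assumes a constant failure rate (an exponential lifetime model), which is a simplification that real drives do not strictly follow; the function name is hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760


def annual_failure_probability(mtbf_hours):
    """Probability that a single device fails within one year.

    Assumes a constant failure rate (exponential lifetime model).
    """
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


# The 1 million hour figure quoted above.
print(f"{annual_failure_probability(1_000_000):.2%}")  # roughly 0.87%
```

Under that assumption, a 1 million hour MTBF corresponds to a bit under a 1% chance of failure per drive per year.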

Amazon provides plenty of opportunities to mitigate for this, such as providing multiple availability zones. Reddit, if you read the original blog post, wasn't designed for that - it was designed for a single data centre.

OTOH, the variability of EBS performance is real, and frustrating. If you do a RAID0 stripe across 4 drives, you can expect around 100 MB/sec sustained, modulo hiccups that can bring it down by a factor of 5. On a compute cluster instance (cc1.4xlarge) it's more like up to 300 MB/sec if you go up to 8 drives, since they provision more network bandwidth and seem to be able to cordon it off better with a placement group.

> modulo hiccups that can bring it down by a factor of 5.

The comments on reddit indicated hiccups more on the order of 10x and, sometimes, 100x.

Either way, the issue is that the more drives you add to your RAID0, the more often one of those drives experiences a "hiccup," and kills the performance of the entire volume.
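
A quick sketch of that effect: if each volume independently has probability `p` of hiccuping during some interval, the stripe is affected whenever at least one member is. Both the independence assumption and the 1% figure are made up for illustration; correlated causes such as a shared network problem would change the picture.

```python
def p_any_hiccup(p_single, n_volumes):
    """Probability that at least one of n independent volumes hiccups."""
    return 1.0 - (1.0 - p_single) ** n_volumes


# Hypothetical 1% chance per volume per interval.
for n in (1, 2, 4, 8):
    print(n, round(p_any_hiccup(0.01, n), 4))
```

With these assumed numbers, an 8-volume stripe is affected nearly eight times as often as a single volume.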

It's not clear this was a single volume problem so much as an issue with one or more network switches in that availability zone (if you look at the AWS service health notes for that date).

Even in your own data centre, if your FC fabric goes wonky, your whole SAN is hosed.

Never fails: a cloud provider has issues with a specific cloud product, so clearly the cloud is an illusion that will crash down on you[1]. Any discussion about any cloud provider's product is obviously a chance to soapbox about the industry as a whole.

[1]: http://www.reddit.com/r/blog/comments/g66f0/why_reddit_was_d...

In the minds of many people, Amazon is the most well known, and respected cloud provider. Their outage is merely a reminder that the one big difference between cloud services and in-house services is, I can't control it. Of course the probability that Amazon will have better uptime than you is pretty high for most people, but you have no recourse when there is a problem.

We've been looking at moving some or all of our stuff to either Amazon EC2/EBS/S3 or Rackspace cloud hosting, and it has been interesting.

Amazon seems more flexible, since you buy block storage (EBS) independent of instances. If you have an application that needs a massive amount of data, but only a little RAM and CPU, you can do it.

Rackspace, on the other hand, ties storage to instances. If you only need the RAM and CPU of the smallest instance (256 MB RAM) but need more than the 10 GB of disk space that provides, you need to go for a bigger instance, and so you'll probably end up with a bigger base price than at Amazon.

On the other hand, the storage at Rackspace is actual RAID storage directly attached to the machine your instance is on, so it is going to totally kick Amazon's butt for performance. Also, at Amazon you pay for I/O (something like $0.10 per million operations).

Looking at our existing main database and its usage, at Amazon we'd be paying more just for the I/O than we now pay for colo and bandwidth for the servers we own (not just the database servers...our whole setup!).
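
The per-I/O charge is easy to underestimate, so here is a back-of-the-envelope sketch. It uses the approximate $0.10 per million operations quoted above and a 30-day month; the sustained I/O rates are hypothetical, not measurements from the database described here.

```python
PRICE_PER_MILLION_IO = 0.10          # USD, the approximate figure quoted above
SECONDS_PER_MONTH = 30 * 24 * 3600   # assumes a 30-day month


def monthly_io_cost(avg_iops, price_per_million=PRICE_PER_MILLION_IO):
    """Estimated monthly I/O charge for a sustained average I/O rate."""
    total_ops = avg_iops * SECONDS_PER_MONTH
    return total_ops / 1_000_000 * price_per_million


# Hypothetical sustained I/O rates.
for iops in (100, 1_000, 5_000):
    print(iops, f"${monthly_io_cost(iops):,.2f}")
```

At a sustained 1,000 operations per second this comes to about $259 a month for I/O alone, before storage or instance charges.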

The big lesson we've taken away from our investigation so far is that Amazon is different from Rackspace, and both are different from running your own servers. Each of these three has a different set of capabilities and constraints, and so a solution designed for one will probably not work well if you just try to map it isomorphically to one of the others. You don't migrate to the cloud--you re-architect and rewrite to the cloud.

If you're interested to see how sites perform on EC2 and Rackspace over time:



You're monitoring from AWS US-East it looks like, you'll want to mention that to give people some context around the latency numbers.

That's true, but I think what's more interesting is the number of incidents (timeouts, exceptions, and significant slowdowns).

I use the Rackspace cloud for a few Windows servers. The experience has been mostly positive, but they disappear for a few minutes each week, which is kind of troubling.

Disappear as in they crash and reboot, or disappear as in they're unpingable?

Gah, sorry for the late response, I just saw this. They just go unpingable for a few minutes. They seem fine when they return from purgatory.

It's always been at random times overnight, so far it hasn't happened when I know there's been load on the servers.

We were bitten by EBS' slowness at my company recently, when moving an existing project to AWS. You effectively can't get decent performance off of a single EBS volume with PostgreSQL; you need to set up 10 or so of them and make a software RAID to remove the bottleneck. It's a fairly large time commitment to build and maintain, but it's pretty fast and reliable once it's up and running (cases like the recent downtime notwithstanding).

Can anyone tell me if MySQL fares any better than Postgres on a single EBS volume? I wouldn't assume it does but I shouldn't be making assumptions.

MySQL does not fare any better on a single EBS volume. The issues with EBS are systemic. Similarly, you have to RAID several volumes together to see decent performance, and this is the recommended AWS solution.

Did you use Raid10? I would love to see a post on using postgresql with ec2/ebs -- how to setup raid, etc.

Orion Henry at Heroku wrote about this and described different software RAID configurations and the performance characteristics of each a while back:


Yes, but as a lowly developer, I have no idea how to set read-ahead buffers or change io schedulers.

Plus, that's a year old, would love to see some updated advice. You'd think Amazon would write more guides like this.

Well, that's really just "--setra" and other file system mounting options, and mdadm (Linux software RAID) configuration options. Yes, there's a little bit of a learning curve and pain to get things set up, but it's not completely out of reach.

Despite being relatively old, I think the advice and approach still holds. Clearly, EBS hasn't improved since then and the need to do this kind of striping over EBS volumes hasn't been obviated yet.

I found a benchmark from 2008 that details the problems with RAID10 and sourced it in a comment above [1]. These are just raw disk transfer numbers, though. I can only imagine how they would change as CPU usage/postgres load climbs. IIRC disk IO is network traffic and network traffic is CPU dependent, so as load increases, IO will suffer greatly.

[1] http://news.ycombinator.com/item?id=2341425

Build-out Script for Postgres/PostGIS with RAID 10 on Amazon EBS volumes: http://sproke.blogspot.com/2010/12/build-out-script-for-post...

I second that.

Did you do any performance tweaking to PostgreSQL with respect to EBS? You have an insanely deep write buffer and quite good random read performance with EBS, which is nothing like the disks people normally deploy PostgreSQL to.

I tuned the hell out of our big postgresql instance a year ago, but I'll be damned if I can remember the rationale for every change. I have a list of all the changes from default, but I've long since forgotten/lost the reason for making them.

That being said, we get more bang for our buck by spreading our data across many small databases that don't need much tuning beyond upping the memory defaults. The EC2 cloud isn't great for the uber-server, but it's halfway decent for many small servers.

I've never understood how people can use EBS in production. The durability numbers they quote are bad and they wave their hands around about increased durability with snapshots, but never quantify what that means.

Hard drives are unreliable and they certainly don't fail independently of one another, but their failures are much more independent than those of EBS volumes.

With physical drives and n-parity RAID you drastically reduce the rate of data loss. This is because although failures are often correlated, it's quite unlikely to have permanent failure of 3 drives out of a pool of 7 within 24 hours. It happens, but it is very rare.
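
As a rough illustration of how rare that is, the sketch below computes the chance of at least 3 failures among 7 drives in one window using the binomial distribution. It assumes the drives fail independently, which the comment itself notes is not quite true, so treat the result as optimistic. The per-drive probability of 0.001 is a made-up example value.

```python
from math import comb


def p_at_least_k_failures(n, k, p):
    """Probability of k or more failures among n independent drives,
    each failing with probability p in the window."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Hypothetical: each drive has a 0.1% chance of failing in a 24-hour window.
print(p_at_least_k_failures(n=7, k=3, p=0.001))
```

With that assumed per-drive figure the result is on the order of a few in a hundred million per window; correlated failures push it higher.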

With EBS, your 7 volumes might very well be on the same underlying RAID array. So you have no greater durability by building software RAID on top of that. If anything, it potentially decreases durability.

You could utilize snapshots to S3, but is that really a good solution? It seems that deploying onto EBS at any meaningful scale is a recipe for guaranteed data loss. RAID on physical disks isn't a great solution either, and there is no substitute for backups, but at least you can build a 9 disk RaidZ3 array that will experience pool failure so rarely that you can more safely worry about things like memory and data bus corruption.

The increased durability based on snapshots is actually quite simple, and they explain it in various places: if one of the drives in Amazon's RAID fails, they need to bring up a new disk to replace it in the array. When they bring up new disks they typically can do this instantaneously, because they really just dynamically page fault the drive from your latest snapshot. However, all dirty data since the last snapshot will have to be copied from the other drive(s). This is a window of time during which your array is exposed to data loss from unrecoverable read errors. The less dirty data you have, the smaller this window of time.

We (Cedexis) presented our findings on - How do EC2's East, West, EU & APAC zones compare: (pdf) http://www.cloudconnectevent.com/2011/presentations/free/76-...

If you would like to know more please send me an email: prakash [at] cedexis.com

You should post that to HN, if you haven't already. Possibly wrap a blog post around it.

Anybody care to comment on using EC2 with local (what Amazon calls ephemeral) storage and backup to S3? Seems to me the advantages are: it's cheaper and you avoid the performance and reliability problems with EBS. The disadvantages?

EBS has other features that are hard to overlook, such as snapshots and the ability to quickly move your volumes to another instance when an instance fails, or if you need to change the size of an instance (which you couldn't do directly until very recently).

All of your EC2 instances can disappear without warning and everything on the local storage is now gone forever.

That's the "backup to S3" part.

That's a fair point, but I don't think it holds up real well. What are the semantics? Do you block until everything is fully backed up on S3? Are you continuously taking database snapshots and forwarding them to S3? What happens if the backups start to fall further and further behind production?

What do you tell the hordes of angry redditors when the last thirty minutes of carefully (or angrily) composed comments vanish?

EBS-RAID0 is much faster for reads than local. Local is faster for writes.

This seems to contradict several comments here. "Citation needed".

I run a database cluster with dozens of nodes on EC2. Small entries, lots of small IOPS.

From http://orion.heroku.com/past/2009/7/29/io_performance_on_ebs..., "On a good day, an EBS can give you 7,000 seeks per second and on a not so good day will give you only 200."

The ephemeral store will never give you more than a hundred seeks per second. If you're seek-bound, then EBS, every time.

My experience has been that EBS handles concurrent loads better, ephemeral drives handle non-concurrent loads better.

Transferring 100 GB+ of data on EBS (even with an 8x RAID) is a nightmare. On ephemeral drives, however, it's fairly fast.

Throw 100+ database connections at a few ephemeral drives (even in a RAID) and watch your web site slow to a crawl.

Lesson for startups: start in the cloud, grow your business, build your own cloud.

Never trust critical parts of your business to others.

Netflix seems to be the biggest counter case - grew their data centers and effectively gave up and moved it to AWS. I suspect the sweet spot is doing a bit of public and private cloud, adjusting how much is on one or the other based on costs, service levels and capacity requirement volatility.

Good advice, but I'd argue there's one tweak to make that even better: start outside the cloud (say, just some Linux VMs from Linode or whatever), and move to a cloud provider only if you get enough real customer/visitor demand to warrant easy/virtual scaling. Needing a cloud/elastic hosting provider is a bit of a Maserati Problem. If you get to the point where you have to build/manage your own data centers (like Google, Amazon, Orbitz), you have a Fleet-of-Maseratis Problem.

I'll probably be downvoted for this, but it seems to me the root cause of this problem is Reddit's architectural decision to remain in a single availability zone. If it wasn't EBS, it could have been some other issue related to the single AZ that brought the site down. Blaming EBS, particularly if you knew it to be a potential weakness in your architecture, seems like a deflection of responsibility.

Perhaps reddit could've mitigated some downtime with some cross-zone redundancy, but the underlying frustration is that Amazon does not provide a well behaved storage solution, which is a very critical infrastructure component for most web services.

Exactly. While Amazon clearly tries to make single-zone reliability as good as possible, I think they expect customers to use a multi-AZ setup if they expect true reliability.

Having run a 200 GB, millions-of-transactions-per-day Postgres cluster on Amazon's EC2 cloud for two years now, I can attest to the fact that EBS performance and reliability SUCKS. It is our SINGLE biggest problem with EC2.

200 GB really isn't all that big of a database. It shouldn't have to be this hard.

This very moment our team is restoring Postgres volumes because the EBS volumes our primary and secondary were on both failed simultaneously.

Were both in the same availability zone?

How is it that Amazon.com is so reliable if there are so many problems with their "cloud" products? Do they not use the same software to run their site?

If you understand the limitations of the various products you can build a VERY reliable service. The reddit assumption of a single datacenter and single technology to store that data was an engineering failure. They essentially didn't have a disaster recovery plan in place.

I'm sure reddit's engineers are as capable as any of producing a seamless disaster recovery plan, but the most common obstacle to implementing it is cost. Most web services choose the occasional risk of downtime in one data center instead of incurring the cost of being in two data centers at all times.

Yep. And there's that whole asymptotic cost/complexity curve where, as you chase more 9's of perfection, your cost and complexity rise out of proportion to the value you're getting. At the end of the day, no matter how much we might like Reddit, it's still just a website with social discussion forums and link sharing, full of non-essential chatter and pictures of kitties. (Again, I love Reddit, don't get me wrong, but it's far from a Mission Critical resource for any business or person's life.) So achieving perfect reliability & performance is probably not worth the cost/pain.
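
To put numbers on "more 9's", this small sketch converts an availability target into the downtime it allows per year. It assumes a 365-day year and treats "N nines" as an availability of 1 - 10^-N; the function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525600


def downtime_minutes_per_year(nines):
    """Allowed downtime per year for an availability of 1 - 10**-nines."""
    unavailability = 10.0 ** (-nines)
    return unavailability * MINUTES_PER_YEAR


for n in range(2, 6):
    print(f"{n} nines: {downtime_minutes_per_year(n):.1f} minutes/year")
```

Each extra nine cuts the allowed downtime tenfold: three nines leaves roughly 8.8 hours a year, five nines only about 5 minutes, which is where the cost and complexity climb steeply.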

I suspect it's because amazon.com has different performance requirements. For instance, I imagine the read/write balance is very different for amazon.com than for reddit.com.

Amazon.com is not hosted on EC2. It's entirely separate.

This isn't entirely true. Amazon.com uses EC2 in addition to dedicated servers.


"She's the director of IT services for the retail giant, although she has nothing to do with the main website operations"


Unless Amazon has managers named Jen Boden, Jen Boden is a Business and HR director, not IT director.

If I ran the tech at Amazon, I'd want to reuse as much internal tooling, software architecture, and best practice between EC2 and core Amazon.com as possible, but have physically separate machines and network zones. Maybe share some of the same data center, of course, but that's as far as I'd take it, and even that sounds a little risky.

I was at the Cloud Connect conference last week. In a session on cloud performance, Adrian Cockcroft (Netflix's Cloud Architect) spoke and said they do not use EBS because of performance and reliability issues. They initially had some bad experiences with EBS and because of this decided to stick with ephemeral storage almost exclusively.

The guys from Reddit also spoke about their use of EC2. Apparently they are running entirely on m1 instances which suffer from notoriously poor EBS performance relative to m2 and cc1/cg1 instances.

What's the failure rate of EBS versus having direct access to physical disks? My guess is that at scale, it's probably similar.

Although you would hope that the storage components of AWS's cloud were highly reliable, I think the main benefit is not single instance reliability but being able to recover faster because of quickly available hardware.

I don't have solid numbers, just some experience using this. Ephemeral drives outright fail more often than EBS volumes, however, EBS volumes suffer performance degradation significantly more often than ephemeral drives. EBS volume performance is HIGHLY variable, at all times of day, no matter what load you throw at it. Ephemeral drives are very consistent most of the time.

Both types of drives CAN and DO fail, so RAID-10, failover, and replication are a must.
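If you want to see the variability on your own volumes rather than take my word for it, here is a rough sketch that times small write+fsync cycles and reports the spread. The function names, sample count, and block size are arbitrary choices of mine, and the path in the example is a placeholder; point it at a directory on the volume you care about. It is a crude probe, not a proper benchmark.

```python
import os
import statistics
import tempfile
import time


def fsync_write_latencies(directory, samples=50, block_size=4096):
    """Time `samples` small write+fsync cycles in `directory`, in seconds."""
    payload = os.urandom(block_size)
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(samples):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to flush this file to the device
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


def summarize(latencies):
    """Return median, approximate 99th percentile, and max, in milliseconds."""
    ordered = sorted(latencies)
    return {
        "median_ms": statistics.median(ordered) * 1000,
        "p99_ms": ordered[int(0.99 * (len(ordered) - 1))] * 1000,
        "max_ms": ordered[-1] * 1000,
    }


if __name__ == "__main__":
    # Placeholder target: replace with a directory on the volume under test.
    print(summarize(fsync_write_latencies(tempfile.gettempdir())))
```

Run it at different times of day on both kinds of drive; a large gap between the median and the max is the kind of inconsistency I'm describing.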

I firmly believe "the cloud" is a fad, unless for some reason you own and operate all the hardware yourself (i.e. Google).

Like other technical fads, everyone will probably come back to servers they can reach out and touch when needed, sooner or later.

The cloud significantly lowers capital expenditure to get into an Internet-enabled business, which cultivates the very startup ecology that Y Combinator exists to leverage and support. Those teenagers who started the Facebook Pokemon game would have never had the resources to build a scalable solution with hardware that they own. (That is, unless Y Combinator paid a lot more money as part of participating. They might also be a bad example, because I remember that one of them had a successful sale...it's true for a lot of other ideas, so work with the example.) The cloud lowers the barrier of entry enough that good ideas can be explored and built, with very little financial risk to those getting into it.

This was the role of shared hosting in the past. Several years ago, everybody realized that having root is better. Now, instead of colocating two servers and negotiating transit and dealing with remote hands, you can spin up two Linodes for $40 and have enough power to build anything. Critical mass? Add three more. You're not waiting for a shipment of servers to the datacenter to handle a sudden load from a positive mention on HN.

Saying that the cloud is a fad and we should all own our gear does two things: (a) increases humanity's carbon footprint, since most organizations never utilize hardware to their full potential, and (b) guarantees that only those with significant capital to buy a fleet, a cage, and power will ever compete in the Internet space, which is where we were many years ago. It is very arguable that the cloud is progress, and everybody sitting on the sidelines calling it a "fad" is scared by it.

Jeremy Edberg of Reddit had a good comment later in that thread, to someone who paralleled the cloud to electricity generation:


What sucks is, my remarks really depend on what you define "cloud" as, which -- partially thanks to Microsoft television commercials -- is currently up in the air.

The cloud's real advantage is the ability to build out fast, but it is not cost. It is cheaper to build it yourself and run it yourself if you know exactly what you need, and have time to do so. If you don't, the cloud is cheaper.

So you're right that the cloud is great for startups. It is not so great for established stuff.

Oh, it's certainly about cost. When you talk about paying for your own transit, your own power, cage space, and remote hands, cloud providers can be significantly cheaper than owning the hardware. You also lose the administrative overhead of having to perform drive swaps when your units degrade -- it's just computing capacity that exists with a minimum of hassle to you. I think if you add up all of the variables, cloud can (and does) come out more cost-effective.

I think private clouds are fantastic for established stuff, and many companies use public clouds to their benefit as well.

I added this in an edit after you replied, but cloud is a term that is difficult to nail to the wall: my explanation to people that I like to run with is that the cloud is a way to think about your architecture.

Rather than have a DNS box, two Web servers, a DB box, and so on, then another server for every development environment, virtualizing the hardware makes a lot of sense. You get a lot more traction out of each U, and with a large number of off-the-shelf utilities, you can automate the hell out of that. Need a clean test environment to try an installation of your software? There are ways to accomplish that in minutes, and dispose of it and reuse the space. That to me is a cloud. Virtualization and automation on top of it. That's what Linode has been doing for nearly eight years now, so it's arguable that Linode pioneered the cloud space. In 2003, it was just called VPS hosting.

Integrating a public cloud and a private cloud makes a lot of sense, and a lot of established big-iron is taking this approach. Big players are realizing that the cloud makes a lot of sense, which we see with HP's announcement that they intend to enter the cloud market.

As the parent post mentions, it's also about moving your costs from a mix of capital and operating to 100% operating. This is one of the arguments that has motivated Netflix's move to AWS (and it makes sense for other SaaS): their costs scale more or less directly with their customer base, and thus revenue, with no up-front capital required.

See http://qconsf.com/dl/qcon-sanfran-2010/slides/AdrianCockcrof... for more.

You can do a lot of the things you describe here with VPS (Virtual Private Servers). You get root access, you don't manage hardware, you often receive some virtualization benefits (images, snapshots). Does that count as "Cloud Computing"?

Definitely. Another term for mostly the same thing, in my book. I talked about this here: http://news.ycombinator.com/item?id=2340734

VPS providers give you the tools, cloud providers give you the tools and a few finished products with less flexibility. You can use VPS as part of a cloud implementation, just like you can use dedicated servers as part of a cloud implementation, too.

If you start with the cloud, you best formulate an "exit plan".

Reddit for example doesn't seem to have one and seems quite stuck.

Reddit didn't start with the cloud, though. Very early, they were on dedicated and found it inflexible to their needs, and scaling a site with the eyeballs that Reddit has would have been very difficult with their first architecture.

The "exit plan" is, really, not marrying your entire architecture to one provider. Spreading the love gives you a bargaining chip and flexibility to see which provider will perform better for you in the long run, and allows you to see the strengths and weaknesses of each. Internet latency is pretty bad, though, so sharding an app across multiple providers can be a bit of a challenge.

We're not that stuck. We can be out of Amazon in a month if necessary. We very specifically don't use any of their "lock-in" services to make it easier on our open source users, which has the side effect of not locking us in either.

Cloud = Marketing(VPS);

It's not quite that simple — there's another level of 'true cloud' platform services like GAE, Heroku, Force.com, etc. that really deliver on the promise.

PaaS = Productize(VPS);

> I firmly believe "the cloud" is a fad

You are wrong.

> everyone will probably come back to servers they can reach out and touch when needed, sooner or later.

No they won't, because most of us don't want to be managing hardware, ever.

If you've ever had to deal with the expense and overhead associated with running a business that has extensive production systems, you wouldn't say that. The cloud represents a huge decrease in the initial cost necessary to set up production systems, and it relieves businesses of all kinds of issues regarding long-term leases on equipment or depreciation / amortization of equipment. You don't have to worry about swapping out racks just because they've reached an arbitrary end-of-lease date. You don't have to worry about provisioning hardware months in advance to make sure it's available "if" you need it.

There are definitely hiccups, but I can't imagine many guys running an internet-heavy business going forward are seriously going to say "let's build out our own datacenter rather than solve the issues with the cloud" unless they're doing something really, really, specialized.

It's not a fad, it's shared services. Sharing comes at the cost of flexibility, which can be a pain in the butt.

Personally, if I'm going to be operating a large computing environment, I'd rather stick 80% of my workload in a cloud environment and pay someone to deal with utilities, buildings, hardware, etc.

The remaining 20% may require a "higher touch" setup at a colo or a facility that I control. The smaller I can make that 20%, the less I need to spend on setting up and maintaining infrastructure.

That is entirely wrong. With AWS, we've built a multi-AZ load balanced infrastructure for very little time and money. Getting an equivalent setup out of our own hardware would have been orders of magnitude more expensive and time consuming.

Can you please explain how you built a multi-AZ load balanced infrastructure, given that Amazon's ELB only load balances within a given AZ? I assume you used some external service. Would you mind providing the details? Thanks.

ELB load balances very easily across availability zones. In fact, that pretty much seems to be the setup they expect you to use. It doesn't allow you to balance across regions, though, which may be what you mean.

Ya, thanks for the response. I went and double checked what ELB does. I was looking for a way to load balance across regions. I know some services that can do it, but are quite expensive.

You know, back in the 70s, there was the concept of a "computer bureau": some people who had a mainframe, and you would rent time on it by the hour. So if you had a payroll run, or a simulation, or whatever, you would upload it to them via a modem (or courier them the punchcards!), run it there, and download the results (or get them delivered printed out). Early BBSs and MUDs often ran in spare capacity on these mainframes.

There ain't nothin' new under the sun...

For a data set in the mere tens to hundreds of GB (in MongoDB, if anyone's curious), is there any reason I shouldn't conclude from this that I should use instance storage only (with multi-AZ replication and backups to S3, both of which I would be doing in any case)? Moderately slower recovery in the rare event of an instance failure seems better than the constant possibility of incurable killing performance degradation.

(Edit: I hadn't considered the possibility of somehow killing all my instances through human error. Ouch. That probably warrants one slave on EBS per AZ.)

I recently had an EBS volume lose data for no apparent reason. I'm not a heavy EC2 user at all - I was just doing some memory/cpu-heavy stuff that wouldn't fit in to RAM on my laptop and using EBS as a temporary store so I could transfer data using a cheap micro instance and only spin up the big expensive instances when everything was in place. I ended up downloading files on an m2.4xlarge because the files I had just downloaded to the EBS volume vanished.

Are you certain the data left the filesystem buffer and actually got acknowledged by EBS?

No; I'm very much a beginner when it comes to EC2. I unmounted the filesystem, detached the volume, then shut down the instance.
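For reference, making sure a write has left the filesystem buffer from application code looks roughly like the sketch below. The helper name is made up, and the directory fsync assumes a POSIX system such as Linux. A clean unmount should also flush dirty buffers, so this alone may not explain the lost files.

```python
import os
import tempfile


def durable_write(path, data):
    """Write `data` to `path` and ask the OS to flush it to the block device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's userspace buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to flush the file to the device
    # Also fsync the containing directory so the new directory entry
    # is flushed too (POSIX only; this is an assumption about the platform).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    # Throwaway location for illustration; use a path on the EBS mount instead.
    target = os.path.join(tempfile.mkdtemp(), "example.bin")
    durable_write(target, b"hello")
    print(os.path.getsize(target))
```

Even then, fsync only tells you the kernel handed the data to the device and the device reported success. Whether the storage underneath is telling the truth is a separate question.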

This seems too much of a coincidence.

We released a Dropbox-like sync product and the back-end is on EBS. Twice yesterday we saw a device fill up to 7GB, and as it got closer it became slower and slower and slower. We did not have any instrumentation/monitoring in place, and we immediately suspected it was something on our end.

We (wrongly?) assumed reliability and (decent) performance from AWS.
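Even a very basic check would have told us how full the device was getting. A minimal sketch, where the function names and the 80% threshold are arbitrary choices and the path is a placeholder for wherever the volume is mounted:

```python
import shutil


def usage_fraction(path):
    """Return the fraction (0.0 to 1.0) of the filesystem at `path` in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def check_volume(path, warn_at=0.8):
    """Return a one-line status string, flagging volumes above `warn_at`."""
    fraction = usage_fraction(path)
    status = "WARNING" if fraction >= warn_at else "ok"
    return f"{status}: {path} is {fraction:.0%} full"


if __name__ == "__main__":
    # Placeholder path: replace with the mount point of the EBS volume.
    print(check_volume("/"))
```

Run from cron every few minutes, this at least separates "the disk is nearly full" from "the disk is just slow".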

Being totally new to AWS, why does everyone skip right past using ZFS?

http://blogs.sun.com/marchamilton/entry/a_brilliant_argument... "Cloud Storage Will Be Limited By Drive Reliability, Bandwidth ... The key feature of ZFS enabling data integrity is the 256-bit checksum that protects your data."

ZFS will ensure that what was written to disk comes back to memory consistently, or with errors spotted. It won't ensure that the right thing was written to disk, or that the database IDs which were written leave your database relationships in a consistent state, etc.

ZFS will do nothing about this "More recently we also discovered that these disks will also frequently report that a disk transaction has been committed to hardware but are flat-out lying.", for instance, other than tell you the data you want isn't there to be read - like any filesystem would.
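A toy illustration of that distinction, using SHA-256 from Python's hashlib as a stand-in checksum. This is not ZFS's on-disk format, and the record layout and names are invented for the example.

```python
import hashlib


def store(block):
    """Keep a block alongside a checksum computed at write time."""
    return {"data": bytearray(block), "checksum": hashlib.sha256(block).digest()}


def verify(record):
    """Return True if the stored data still matches its checksum."""
    return hashlib.sha256(bytes(record["data"])).digest() == record["checksum"]


# Case 1: the media flips a bit after the write. The checksum catches it.
record = store(b"balance=100")
record["data"][0] ^= 0x01
print(verify(record))  # False

# Case 2: the application wrote the wrong value in the first place.
# The checksum was computed over the wrong data, so it verifies fine.
wrong = store(b"balance=999")
print(verify(wrong))  # True
```

The checksum protects the round trip from disk back to memory. It has no opinion on whether the bytes were correct to begin with, or on data that never reached the disk at all.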

I love the idea behind EBS, a SAN makes life so much easier, but I too find that EBS glitches are the largest cause of unreliability in AWS.

I'm not immediately planning to move out of AWS, but the trouble with EBS has certainly got me thinking about other options and has made me much less inclined to make an increased commitment to AWS.

EBS is not a SAN, which is largely the point being made in these comments and in the other HN article on reddit's post mortem.

Isn't EBS intended for stuff like Hadoop job temporary data used during processing?

This kind of complaint reminds me of people who buy a product that does A very well, but then they trash it in reviews for not doing B. It was never advertised as doing B, but you'd never know that from the complaining.

We used Amazon and got bad performance in the beginning too. It is bad when you pull files out of S3. By bad I mean the latency is high.

We tried GoGrid and they lost or crashed our server instance.

I've personally used Rackspace, so far so good, but I've only been doing development on it.

Why is reddit relying on only one cloud provider? AWS can/should do better, but service providers of the size of reddit should be using multi-vendor set-ups for sure.

They did say in their original post-mortem that spreading the load among multiple availability zones has been on their todo list for a while. It has just taken longer than they expected with their limited engineering staff.

It probably has something to do with the group being very small. Sure they turn a lot of traffic, but there's only so much you can do with a group of their size on what I imagine is still a limited budget.

Sounds like a case similar to safety systems at a nuclear plant. Not pressing until it is REALLY PRESSING! It's the usual dilemma: investing time/money in something that most likely won't be needed versus adding that cool feature all the users will immediately see the benefit of. In a competitive environment, it isn't difficult to understand how they ended up on one vendor.

If a nuclear plant has problems, it can kill a lot of people, and wreck the lives of many others.

If reddit has problems, I suppose the worst that can happen is a cloud of toxic and poorly thought out comments is released on the internet.

So the tradeoffs they've made, in saving some money, are probably sensible.

> If reddit has problems, I suppose the worst that can happen is a cloud of toxic and poorly thought out comments is released on the internet.

Actually, that's what happens when reddit is working :)

Well, depending on prevailing conditions, they might be more widely dispersed rather than contained within the special "echo chamber" that reddit has built for that purpose.

Is a multi-provider setup common? I certainly think Reddit should be on multiple availability zones within AWS, but spanning multiple providers seems hugely more difficult.

On the comment itself, I have this: http://news.ycombinator.com/item?id=2339715

EMR is a mess too. The Amazon-blessed Pig is almost a year and 2 major releases behind, and the official EMR documentation seems to describe a version of EMR that doesn't even exist.

"Elastic" is AWS's claim to fame, but I am not seeing it.

Trying to resize an EMR cluster (which is half the point of having an EMR cluster instead of buying our own hardware) generates the cryptic error "Error: Cannot add instance groups to a master only job flow" that is not documented anywhere.

(Why would Amazon even implement a "master only job flow", which serves no purpose at all?)

The master only job flow is designed to let users play around with the instance and discover things without having to pay for a full cluster. A single node versus multi-node cluster is configured way differently and that is why you can't expand a single node cluster. If you had started with a two node cluster you would have been able to expand it.

Also, if you want Pig you should complain about it vocally on the EMR forum. That is the best way to get them to listen to you.

The AWS business model is to sell shared hosting on commodity hardware. Cloud is a cool buzzword but it is still sharing hardware. Cheap, commodity hardware is the magic that lets you scale up so big and so fast for a highly accessible price.

But you're still sharing the same hardware as everyone else and it's still just commodity hardware.

For what it's worth, it's not entirely accurate to say that you are always using shared hardware on AWS, at least for your servers. It depends on how you set up your environment.

Sharing hardware is an implementation detail. You could potentially build a cloud infrastructure where everyone has dedicated hardware. The whole point of cloud computing is that the implementation shouldn't matter to end users.
