Per the linked dashboard, some instances in a single AZ in a single Region are having storage issues. Calling EC2 "down" is a bit dramatic, provided AMZN are being sincere with their status reports. Any system that can competently fail over to another AZ will be unaffected.
I would agree with you, but Amazon is just downright dishonest in their reports, which makes me sad, because I love Amazon. Go look at the past reports: they've never shown a red marker, only "degraded performance", even when services in multiple availability zones went down at the same time due to their power outage (so even if you had architected for multiple AZs you were still fucked). When they have a single AZ go down, they won't even give it a yellow marker on the status page; they'll just put a footnote on a green marker. It makes their status dashboard pretty much useless for at-a-glance checking (why even have colors if they don't mean anything?).
Read their report from the major outage earlier this year. They start out by saying "elevated error rates" when many services were in fact down, and it wasn't until hours later that they finally admitted to having an issue that affected more than just one availability zone.
From Forbes:
“We are investigating elevated errors rates for APIs in the US-EAST-1 (Northern Virginia) region, as well as connectivity issues to instances in a single availability zone.” By 11:49 EST, it reported that, “Power has been restored to the impacted Availability Zone and we are working to bring impacted instances and volumes back online.” But by 12:20 EST the outage continued, “We are continuing to work to bring the instances and volumes back online. In addition, EC2 and EBS APIs are currently experiencing elevated error rates.” At 12:54 AM EST, AWS reported that “EC2 and EBS APIs are once again operating normally. We are continuing to recover impacted instances and volumes.”
It's like grade inflation. You can never give out an F (Mr. Admissions officer, are you so bad at your job that you would admit such an unqualified student?), so a Gentleman's C is handed around. In Amazon's case, it's a gentleman's B+ (green, with an info icon).
A: fine
A-: problems
B+: servers are on fire
I really like Amazon as a company and use a lot of their services, but this is dishonest.
The last Netflix postmortem mentioned a bug in their configuration that kept sending traffic to ELB instances that were already down, which was the cause of their last outage, if I remember correctly.
Not necessarily. It could be some element of the Netflix architecture that, due to their size and/or design trade-offs, has taken longer or is harder to eliminate than it would be for others.
Other services, like Twilio, have come through several of these major problems with US-EAST generally unscathed while Netflix has had issues repeatedly.
According to a site which doesn't document what its reports are based on. Given that Netflix worked for me during that period, I'm suspicious that downrightnow might be using EBS somewhere.