
Amazon EC2 currently down. Affecting Heroku, Reddit, Others - fredoliveira
http://status.aws.amazon.com/?t
======
pg
Convenient that we're too backward to use AWS. That means everyone can at
least talk about it here when AWS is down.

~~~
toomuchtodo
Funny thing is, in the last couple of interviews I've had in Chicago and
Silicon Valley, I actually got points for explaining that caution is
necessary when using AWS for production.

A magic bullet it isn't.

~~~
31reasons
What's the alternative? Building your own certainly isn't it.

~~~
mrkurt
Old school colo/dedicated servers/etc. There's something delightfully simple
about only having to deal with "standard" hardware failures.

~~~
vidarh
Not to mention that unless you have very unusual traffic patterns (spinning
up lots of servers for short periods of time), colo/dedicated servers will
usually be vastly cheaper than EC2, especially because with a little bit of
thought you can get servers that are a substantially better fit for your use.

E.g. I'm currently about to install a new 2U chassis in one of our racks. It
holds 4 independent servers, each with dual 6-core 2.6GHz Intel CPUs, 32GB
RAM and an SSD RAID subsystem that easily gives 500MB/sec throughput.

Total leasing cost + cost of a half rack in that data centre + 100Mbps of
bandwidth is ~$2500/month. Oh, and that leaves us with 20U of space for other
servers, so every additional one adds $1500/month for the next 7-8 or so of
them (when counting some space for switches and PDUs). The amortized cost of
putting 2U with 100Mbps in that data centre is more like $1700/month.

Amazon doesn't have anything remotely comparable in terms of performance. To
be charitable to EC2, at the low end we'd be looking at 4 x High Mem Quadruple
Extra Large instances + 4 x EBS volumes + bandwidth, and end up in the $6k
region (adding the extra memory to our servers would cost us an extra
$100-$200/month in leasing cost, but we don't need it). But the EBS IO
capacity is simply nowhere near what we see from a local high-end RAID setup
with high-end SSDs, and disk IO is usually our limiting factor. More likely
we'd be looking at $8k-$10k to get anything comparable through a higher number
of smaller instances.

I get that developers like the apparent simplicity of deploying to AWS. But I
don't get companies that stick with it for their base load once they grow
enough that the cost overhead could easily fund a substantial ops team...
Handling spikes or bulk jobs that are needed now and again, sure. As it is,
our operations cost in man hours spent, for 20+ chassis across two colos, is
~$120k/year: $10k/month, or $500 per chassis. So consider our fully loaded
cost per box at ~$2200/month for quad-server chassis of the level mentioned
above with reasonably full racks. Let's say $2500 again, to be charitable to
EC2...
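
To make that concrete, here's a quick back-of-the-envelope check of those
numbers (a Python sketch; every figure is this comment's estimate, not a
quoted price):

    # Hypothetical check of the colo numbers above.
    chassis_amortized = 1700              # $/month: 2U chassis + 100Mbps
    ops_cost_per_year = 120000            # $/year in man hours, 20+ chassis
    ops_per_chassis = ops_cost_per_year / 12.0 / 20      # = $500/month

    fully_loaded = chassis_amortized + ops_per_chassis   # = $2200/month
    ec2_low_end = 6000                    # $/month: charitable EC2 estimate
    print(fully_loaded, ec2_low_end / fully_loaded)      # 2200.0, ~2.7x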

This is with operational support far beyond what Amazon provides, as it
includes time from me and other members of staff who know the specifics of
our applications, handle backups, handle configuration and deployment, etc.

I've so far not worked on anything where I could justify the cost of EC2 for
production use for base load, and I don't think that'll change anytime soon...

~~~
cstejerean
If disk performance is important you can also take a look at the High IO
instances, which give you 2x 1TB SSDs, 60GB of RAM and 35 ECUs across 16
virtual cores. At 24x7 for 3 years you end up with ~$656/mo per instance, plus
whatever you would need for bandwidth. By the time you fill up an entire rack
it still ends up being slightly more expensive than your amortized 2U cost,
but you also don't need to scale it up in 2U increments.
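
For reference, the ~$656/mo figure falls out of amortizing the 3-year
reserved pricing. A quick sketch (the upfront and hourly figures are my
recollection of 2012 hi1.4xlarge reserved pricing in us-east-1, so treat them
as assumptions):

    # Effective monthly cost of a 3-year heavy-utilization reserved instance
    # (assumed 2012-era hi1.4xlarge pricing; check Amazon's price list).
    upfront = 10960.0            # $, paid once, amortized over 36 months
    hourly = 0.482               # $ per instance-hour, billed 24x7
    hours_per_month = 24 * 365 / 12.0

    monthly = upfront / 36 + hourly * hours_per_month
    print(round(monthly))        # ~656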

------
diego
The N. Virginia datacenter has been historically unreliable. I moved my
personal projects to the West Coast (Oregon and N. California) and I have seen
no significant issues in the past year.

N. Virginia is both cheaper and closer to the center of mass of the developed
world. I'm surprised Amazon hasn't managed to make it more reliable.

~~~
acangiano
Realistically, it's at least in part because everyone defaults to the East
region. So it's the most crowded and demanding on the system.

~~~
jey
Yeah, it's got to be much larger than the other regions, so it makes sense
that we see more errors, since error_rate = machines * error_rate_per_machine.

~~~
nasmorn
The whole region is down, you just calculated the chance of at least one
machine having an error.

~~~
jey
No, I calculated the error rate for the region. If us-east-1 has 5 times the
machines (or availability zones, or routers, or EBS backplanes, or other
thing-that-can-fail) as us-west-1, we would expect to see us-east-1 have each
type of error occur about 5 times as often as us-west-1.
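
A toy illustration of that point, with made-up numbers:

    # Expected failure counts scale linearly with fleet size, even when the
    # per-machine failure rate is identical (all numbers hypothetical).
    per_machine_rate = 0.001                 # failures per machine per day
    us_west_machines = 10000
    us_east_machines = 5 * us_west_machines  # "5 times the machines"

    print(us_west_machines * per_machine_rate)  # 10.0 failures/day
    print(us_east_machines * per_machine_rate)  # 50.0 failures/day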

------
btilly
As <https://twitter.com/DEVOPS_BORAT> says, _At conference you can able tell
cloud devops by they are always leave dinner for respond to pager._

Also, _What is happen in cloud is stay in cloud because nobody can able
reproduce outside of cloud._

(And many other relevant quotes.)

~~~
eloisant
To be fair, AWS downtime always makes the news because it affects a lot of
major websites, but that doesn't mean an average sysadmin (or devops,
whatever) would do better in terms of uptime with his own bay and his toys.

~~~
toomuchtodo
When our gear is down, we can actually get into the datacenter to fix it.

What do you do when Amazon is down other than sweat?

~~~
darrencauthon
There's not much sweating to do, as it always comes up relatively quickly.

~~~
toomuchtodo
How long was Amazon AWS "degraded" today?

~~~
acdha
2 minutes if you checked the "multi-AZ" box on your RDS instances or ELBs.
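
(If you create instances through the API rather than the console, that
checkbox is just the multi_az flag; e.g. with boto, where the identifiers
below are made up:)

    # Create a Multi-AZ RDS instance via boto (identifiers are made up).
    import boto.rds

    rds = boto.rds.connect_to_region("us-east-1")
    db = rds.create_dbinstance(
        id="mydb",
        allocated_storage=10,          # GB
        instance_class="db.m1.small",
        master_username="admin",
        master_password="change-me",
        multi_az=True,                 # synchronous standby in another AZ
    )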

------
jasonkester
For added fun, their EC2 console is down. I got this for a while:

    
    
      <html><body><b>Http/1.1 Service Unavailable</b></body> </html>
    

... then an empty console saying "loading" for the last 20 minutes. Then
recently it upgraded to saying "Request limit exceeded." in place of the
loading message (because hey, I'd refreshed the page four times over the
course of 20 minutes).

On the upside, their status page shows all green lights.

~~~
hornbaker
They've acknowledged it at <http://status.aws.amazon.com/> for a while now
with a tiny "i" status icon (I can't load my instances pane):

12:07 PM PDT We are experiencing elevated error rates with the EC2 Management
Console.

~~~
mrb
Amazon misrepresents reality on this status page!

They have standardized icons to represent various levels of issues (orange =
perf issue, red = disruption). But they don't even use them. Instead they add
[i] to the green icons to indicate perf issues (Amazon Elastic Compute Cloud -
N. Virginia) and disruptions of service (Amazon Relational Database Service -
N. Virginia).

Maybe this status page is controlled by marketing bozos who want to pretend
the situation is not so bad.

------
jmvoodoo
All of our EC2 hosts appear to be functioning fine, but they can't connect to
their RDS instance which renders our app useless. If you scroll down the page
you'll also see that RDS instances are having connectivity issues. Not sure if
it's related but for RDS users the impact is far worse.

EDIT: We are also using multi-AZ RDS, so either Amazon's claims for multi-AZ
are bs, or their claim that this is only impacting a single zone is bs.

~~~
Banduin
Our multi-zoned RDS instance was able to fail over to another zone with
minimal downtime. It took about 2 minutes.

~~~
jmvoodoo
Lucky you? :)

------
ojiikun
Per the linked dashboard, some instances in a single AZ in a single Region are
having storage issues. Calling EC2 "down" is a bit dramatic, provided AMZN are
being sincere with their status reports. Any system that can competently fail
over to another AZ will be unaffected.

~~~
nowarninglabel
I would agree with you, but Amazon is just downright dishonest in their
reports, which makes me sad, because I love Amazon. Go look at the past
reports: they've never shown a red marker, only "degraded performance", even
when services in multiple availability zones went down at the same time due
to their power outage (so had you architected for multiple AZs you were still
fucked). When they have a single AZ go down, they won't even give it a yellow
marker on the status page; they'll just put a footnote on a green marker. It
makes their status dashboard pretty much useless for at-a-glance checking
(why even have colors if they don't mean anything?).

Read their report from the major outage earlier this year, they start out by
saying "elevated error rates", when many services were in fact down, and it
wasn't until hours later they finally admitted to having an issue that
affected more than just one availability zone.

From Forbes: "We are investigating elevated errors rates for APIs in the
US-EAST-1 (Northern Virginia) region, as well as connectivity issues to
instances in a single availability zone." By 11:49 EST, it reported that
"Power has been restored to the impacted Availability Zone and we are working
to bring impacted instances and volumes back online." But by 12:20 EST the
outage continued: "We are continuing to work to bring the instances and
volumes back online. In addition, EC2 and EBS APIs are currently experiencing
elevated error rates." At 12:54 AM EST, AWS reported that "EC2 and EBS APIs
are once again operating normally. We are continuing to recover impacted
instances and volumes."

~~~
kalid
It's like grade inflation. You can never give out an F (Mr. Admissions
officer, are you so bad at your job that you would admit such an unqualified
student?), so a Gentleman's C is handed around. In Amazon's case, it's a
gentleman's B+ (green, with an info icon).

A: fine
A-: problems
B+: servers are on fire

I really like Amazon as a company, use a lot of their services, but this is
dishonest.

------
run4yourlives
Heh, the cynical side of me would like to point out that this is a great way
to get people to stop talking about wiping a user's kindle. :-)

~~~
macrael
This kind of thinking is poisonous. I know it's in good fun and it's fun to
look for connections in things, but it is actually preposterous to think that
Amazon would purposely disrupt wide swaths of highly paying customers for much
of a day to bury one story about bad customer relations. My guess is there are
a lot of people working very hard to try and solve this problem right now,
let's not belittle their efforts because of a conspiracy, let's belittle their
efforts because of bad systems design.

~~~
Angostura
It was a joke. I made the same joke earlier today. No-one is seriously going
to believe this.

~~~
xenophanes
He knows you were joking. It's a bad joke. "It's a joke" is not a magic bullet
that means you can do no wrong.

Poisonous ideas spread as jokes. That is one of the ways they spread. A person
thinking well about the issue wouldn't find the joke funny because it doesn't
make sense. The joke relies on some poisonous, bad thinking to be understood.
It has bad assumptions, and a bad way of looking at the world, built in.

~~~
Angostura
Let me explain to you why it is a funny joke. It is funny because it involves
Amazon undertaking massive technical measures, with huge reputational damage
in order to try to kill a story which is primarily not spreading via Amazon-
hosted sites anyway.

It's akin to a man with athlete's foot deciding to remedy it by discharging a
shotgun into his leg.

~~~
xenophanes
It's easy to understand shooting a leg with a shotgun. That's a simple thing.

The Amazon thing in question is far more complicated, and far harder to
understand.

Thinking they are "akin" is a mistake. It shows you're thinking about it wrong
and failing to recognize how completely different they are.

One isn't going to confuse anyone or be misunderstood, the other will confuse
most of the population and be misunderstood by most people.

One, if someone misunderstood, only involves one individual being an idiot.
The other involves a large company being evil and thus can help feed
conspiracy theories.

I'm not sure if you are aware of the difficulty of Amazon doing this. Suppose
Jeff Bezos wants to do it. He can't simply order people to do it because they
will refuse and leak it to the media and he'll look really bad and then he'll
definitely have to make sure to try super hard for there to be no outages
anytime soon.

Shooting yourself in the foot is stupid but easy. Doing this is stupid and
essentially impossible. To think it's possible requires thinking that Amazon
has a culture of unthinking obedience, or has an evil culture that all new
hires are told about and don't leak to the media. Totally different.

Casually talking about impossible, evil conspiracies by big business, as if
they are even _possible_ , is a serious slander against those businesses,
capitalism, and logic. Slandering a bunch of really good things -- especially
ones that approximately 50% of US voters want to harm -- and then saying "it's
just a joke, it's funny" is bad.

~~~
EternalLight
Casually joking about impossible, evil conspiracies by big business on the
other hand is something completely different and also funny.

No one will believe it's related, and it's certainly not slander to joke
about it. Also, you might want to leave the political opinions out of Hacker
News... there is no 50% of the US who dislike those things; they only have
different ideas about how to support them.

------
incision
Unfortunate.

I have to deal with a number of folks who will be overjoyed to read this news
when their tech cartel vendor of choice forwards it this evening.

There's a huge contingent of currently endangered infrastructure folks (and
vendors who feed off them) out there who throw a party every time AWS has a
visible outage.

~~~
rdl
AWS sucking at availability (and especially specific parts like EBS, and then
services built on top of EBS and on top of AWS) doesn't mean the correct
option is to mine your own sand and dig up your own coal to run servers in
your own datacenters in the basement.

Even if you're totally sold on the cloud, you can still have a requirement
that things be transparent all the way down. AWS is one of the least
transparent hosting options around.

If you're a customer of a regular colo, or even a managed hosting provider
themselves based at a colo, it's pretty easy to dig into how the
infrastructure is set up, identify areas where you need your own redundancy,
etc. Essentially impossible within AWS -- there is no reason intelligible to
me that ELB in multiple AZs should depend on EBS in a single AZ, but that's
how they have it set up.

------
benwerd
This would be a great time to post a guide to architecting systems for
failover using AWS. Anyone got a great guide?

~~~
jedberg
We (Netflix) have done a bunch of presentations on it which are on our
slideshare page and across the internet.

After this issue is over I can give a longer answer. In short, we've just
evacuated the affected zone and are mostly recovered.

~~~
typicalrunt
I'm guessing your Chaos Gorilla helped to harden your architecture against
this threat.

Since you've mostly recovered, how did your system do? Are there side-cases
that Chaos Gorilla didn't touch?

~~~
ohashi
EDIT: I WAS WRONG. Chaos Monkey and Chaos Gorilla both exist and simulate
different forms of chaos.

~~~
zackzackzack
"Create More Failures

Currently, Netflix uses a service called "Chaos Monkey" to simulate service
failure. Basically, Chaos Monkey is a service that kills other services. We
run this service because we want engineering teams to be used to a constant
level of failure in the cloud. Services should automatically recover without
any manual intervention. We don't however, simulate what happens when an
entire AZ goes down and therefore we haven't engineered our systems to
automatically deal with those sorts of failures. Internally we are having
discussions about doing that and people are already starting to call this
service "Chaos Gorilla"."

<http://techblog.netflix.com/2011_04_01_archive.html>
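
The core idea is simple enough to sketch. A minimal chaos-monkey-style script
using the boto library might look like this (the opt-in tag and region are
hypothetical, and Netflix's actual implementation is far more sophisticated):

    # Minimal chaos-monkey-style sketch with boto (not Netflix's code).
    import random
    import boto.ec2

    conn = boto.ec2.connect_to_region("us-east-1")
    # Find running instances that opted in via a (hypothetical) tag.
    reservations = conn.get_all_instances(filters={"tag:chaos": "opt-in"})
    candidates = [i for r in reservations for i in r.instances
                  if i.state == "running"]
    if candidates:
        victim = random.choice(candidates)
        print("Terminating %s" % victim.id)
        conn.terminate_instances(instance_ids=[victim.id])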

~~~
ohashi
My apologies! I was wrong.

------
dave1619
Our app is down because it's hosted on Heroku, and it's frustrating: N
Virginia seems to be the least reliable Amazon datacenter. Every year it
seems to go down a couple of times for at least a couple of hours.

Heroku should offer a choice between N Virginia and Oregon hosting (I think
they're almost comparable in price nowadays). That way people who want more
uptime/reliability can choose Oregon. Sure it will be further from Europe (but
then it will be closer to Asia) and people can make that choice on their own.

But basing an entire hosting service on N Virginia doesn't make sense anymore,
considering the history of major downtime in that region.

~~~
nlh
Or better yet, Heroku should offer an add-on "instant failover" service that,
for a premium of course, offers a package for multi-site (or, knowing they're
100% AWS, multi-datacenter) deployment with all of the best practices, etc.
Seems like a logical next step for them (or a competitor) given the recent
spate of outages.

------
kellyhclay
Can't update my list fast enough, but other major services experiencing
problems are Netflix and Pinterest. Lots of other (smaller) sites are starting
to fail too.

[http://www.forbes.com/sites/kellyclay/2012/10/22/amazon-
aws-...](http://www.forbes.com/sites/kellyclay/2012/10/22/amazon-aws-goes-
down-again-takes-reddit-with-it/)

------
rdl
Why the fuck do the two most critical services (ELB and Console) have
dependencies on their historically most unreliable pile of shit (EBS)?

~~~
taligent
Seriously, THIS has to be addressed.

I can tolerate EC2/EBS going down, but why on earth is ELB/Console always
going down at the same time?

------
fredoliveira
One of the most frustrating issues here is that we have to deal with Amazon's
status page for information. It's a complex page, divided by continent instead
of region, which means at least 5 or 6 clicks to figure out progress. They
should learn from these issues about how people want to be informed - to date,
they haven't. Also, they have a twitter account, which would be the perfect
fit to keep everyone up-to-date with what's going on; to at least show a human
side to these issues. Alas, they're not updating that either.

I've been working with AWS since early 2006 when they first launched - I was
lucky to be granted a VIP invite to try out EC2 before everyone else, and
ended up launching the first public app on EC2. This might be the first time
when frustration has overcome my love for these guys.

------
orourkek
It's degraded performance for some EBS volumes in a single availability zone -
isn't this title a bit sensationalistic?

~~~
thomaslutz
No. Amazon just understates the issues. Reddit, Heroku etc. all have problems.

~~~
orourkek
Even if they do understate the issue, it's limited to US-EAST-1, and it's an
EBS issue. Saying that EC2 is "down" because of this is totally off the mark -
I've got dozens of EBS volumes in EAST-1 that are unaffected, plus all of the
other zones are operating normally...

~~~
thomaslutz
EC2 instances are affected by the EBS issues. But you're right, it's not
correct that EC2 is "down".

~~~
onenine
Not all instances are EBS backed.

------
gtaylor
Does anyone know if this really is just one AZ? Seems like an awful lot of
larger sites are down. I'd expect at least some of them to be multi-AZ by now.

~~~
rynop
I'm multi-AZ and am being impacted. I also use multi-AZ RDS and it's being
impacted. So I'm calling BS on the 1-AZ impact.

~~~
ejdyksen
One of our Multi-AZ RDS instances failed over successfully. It was down for
about 5-10 minutes.

------
jeaguilar
This is affecting more than a single Availability Zone, but probably for
reasons that have been seen before. One reason might be that an EBS failure in
one AZ triggers a spike in EBS activity in other AZs which overwhelms EBS. (I
believe this is what happened in April 2011).

Does anybody have any experience with migrating to Oregon or N. California in
terms of speed and latency?

------
ceejayoz
My dashboard says "Degraded EBS performance in a single Availability Zone". It
then lists each of the five zones as "Availability zone is operating
normally." <http://cl.ly/image/202F3B0I371g>

~~~
dirtae
I was seeing that too, but it looks like Amazon has now updated the
availability zone status. When I run ec2-describe-availability-zones from the
command line, it's telling me that us-east-1b is impaired. (Availability zone
mapping is different for each account, so my 1b may be some other availability
zone for you.)
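
If you'd rather not trust the dashboard, the same per-account zone status is
available from the EC2 API; for example, a small boto sketch (the region is
just for illustration):

    # Check availability-zone status for this account via the EC2 API.
    import boto.ec2

    conn = boto.ec2.connect_to_region("us-east-1")
    for zone in conn.get_all_zones():
        # Prints e.g. "us-east-1b impaired" during an event like this one.
        print(zone.name, zone.state)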

------
blhack
I'm sorry, am I in a time machine? Didn't this exact thing happen last year,
and didn't these exact people explain exactly how they were going to make sure
it never happens again?

What the hell is happening here?

------
d0m
Ironically, at the first Hacking Health event, our server was running on
Heroku... and Heroku went down. Three months later, we have the second
Hacking Health in Toronto and all of AWS is going down.

------
1SaltwaterC
ELB is also fucked. I've seen that nobody mentions it. Some of our load
balancers are completely unreachable.

Just one multi-AZ RDS instance claimed the automatic failover. However, the
200+ alerts triggered by the failover's internal DNS changes pointing to the
new master show that things aren't as simple as the RDS DB Events log
describes.

Some instances reported high disk I/O (EBS backed) via the New Relic agent
(the console still has some issues).

So far, this is what I see from my side.

------
tyw
I reacted quickly enough to get a new instance spun up and added it to my
site's load balancer, but the load balancer is failing to recognize the new
instance and send traffic to it... yay. Console is totally unresponsive on the
load balancer's "instances" tab. If I point my browser at the ec2 public DNS
of the new instance it seems to be running just fine. So much for the load
balancer being helpful.
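
For what it's worth, when the console hangs you can usually still register
the instance and poll its health through the API. A boto sketch (the load
balancer name and instance id are made up):

    # Register an instance with an ELB and poll its health (names made up).
    import boto.ec2.elb

    elb = boto.ec2.elb.connect_to_region("us-east-1")
    lb = elb.get_all_load_balancers(load_balancer_names=["my-site-lb"])[0]
    lb.register_instances(["i-12345678"])

    for state in elb.describe_instance_health("my-site-lb"):
        # state.state becomes "InService" once health checks pass.
        print(state.instance_id, state.state)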

------
lgleason
Now is a good time to stroke your dev pair :)

<http://news.ycombinator.com/item?id=4680178>

------
cupcake_death
Maybe I'm being daft, but why does this sh*t never seem to affect
www.amazon.com?

~~~
mvelie
Amazon doesn't run their site off the same set of EC2 servers and systems that
everyone else does. They just used their experience to build it.

~~~
yap
Amazon runs all of the Amazon.com web servers on EC2 hosts (and has since
2010:
[https://www.youtube.com/watch?feature=player_detailpage&...](https://www.youtube.com/watch?feature=player_detailpage&v=dxk8b9rSKOo#t=393s)).
They've just made sure that they have enough hosts in each of the other AZs to
withstand an outage.

------
hornbaker
Update from <http://status.aws.amazon.com/>:

2:20 PM PDT We've now restored performance for about half of the volumes that
experienced issues. Instances that were attached to these recovered volumes
are recovering. We're continuing to work on restoring availability and
performance for the volumes that are still degraded.

We also want to add some detail around what customers using ELB may have
experienced. Customers with ELBs running in only the affected Availability
Zone may be experiencing elevated error rates and customers may not be able to
create new ELBs in the affected Availability Zone. For customers with multi-AZ
ELBs, traffic was shifted away from the affected Availability Zone early in
this event and they should not be seeing impact at this time.

~~~
nkohari
This is infuriating. Our instances aren't EBS-backed, so we have an entire AZ
sitting idle while the rest of our instances are overwhelmed with traffic. Why
did they make this decision for us?

I'm normally an Amazon apologist when outages happen, but this is absolutely
ridiculous.

------
rsync
This reminds me of an ad campaign we sometimes run on reddit:

[http://www.reddit.com/comments/hg9oa/your_platform_is_on_aws...](http://www.reddit.com/comments/hg9oa/your_platform_is_on_aws_and_your_offsite_backups/)

... which you'll be able to click as soon as reddit is back up :)

------
gorkemcetin
Countly mobile analytics platform servers are also affected by this issue
(<http://count.ly>). Waiting and waiting. :)

@caseysoftware thanks for the links, we now have something to read and
implement this week.

------
patrickgzill
You have a virtualized platform, on top of which are many pieces like load-
balancing, EBS, RDS, the control-plane itself, etc.

You have burstable network connections which, by their nature, have hard
limits (you can't burst above 10Gbps on a 10Gbps pipe, for example, even
assuming the host machine is connected to a 10Gbps port).

Burstable (meaning quite frankly, over-provisioned) disk and CPU resources.

And if any piece fails, you may well have downtime...

It is always surprising to me that people feel that layering complexity upon
complexity will result in greater reliability.

------
blaze33
Hosted on Heroku with a hosted Heroku Postgres dev instance, I observed a drop
in the db response time:

\- from a consistent average of 10ms over the last week

\- to a new consistent average of ~2.5ms

between 17:31 and 17:35 UTC. AWS started to report the current issue at 17:38
UTC. My app then experienced some intermittent issues (reported by New Relic
pinging it every 30s). Don't know if it's related - could it be some sort of
hw upgrade that went wrong?

I did a push affecting my most used queries, but that was an hour and a half
earlier, at 15:56 UTC, so it's probably unrelated.

------
mattbillenstein
I thought at least Reddit had learned the "don't use EBS" rule from past
outages - I was bitten by the April/2011 outage and switched everything over
to instance storage shortly thereafter.

For most applications I think architecting EBS out should be straightforward -
instance storage doesn't put a huge single-point-of-failure in the middle of
your app if you're doing replication, failover, and backups properly.

And EBS seems to be the biggest component of the recent AWS failures upon
which they've built a lot of their other systems.

------
Naushad
The Console itself is down: <html><body><b>Http/1.1 Service
Unavailable</b></body> </html>

It's high time AWS did something about this.

------
bdcravens
Noticed that Rackspace was down for a bit - I presume they're getting
hammered on their WWW as people consider an infrastructure switch?

------
thehigherlife
I jokingly said the internet was broken this morning as I've been having weird
connectivity issues all morning. Funny that I was sort of right.

------
lucb1e
Dropbox runs on S3, they are unaffected? And Heroku's website seems okay. Only
reddit is down. That's because of this?

------
jasonlingx
I am frankly surprised so many still use EC2 considering how frequently it
breaks. It's not cheap, so the only reasons to use it would be reliability or
scale, right? Why not just get lots of boxes at Hetzner and OVH (40 EUR a
month for 32GB of RAM and 4x2x3.4GHz cores) and scale up / add redundancy
that way?

------
jaysonelliot
I suppose this explains why Airbnb is down now, as well:
<http://aws.amazon.com/solutions/case-studies/airbnb/>

I was just in the middle of booking a stay in a Palo Alto startup embassy for
this week, too!

~~~
mikeevans
They've been down for several hours, actually. I tried to get to their site
this morning to check out the jedberg talk, and got the "We're on the case"
message.

------
traderd65
Building infrastructure on top of Amazon with global replication across
multiple availability zones can sidestep such failures and guarantee
uninterrupted operations to customers - www.pubnub.com is unaffected by AWS
being down.

------
foxylad
Not wanting to shill for Google, but we've been very happy with Appengine's
reliability - particularly since moving to the HR datastore. Have other
Appengine users had any significant downtime apart from M/S datastore issues?

~~~
valhallarecords
What site are you running on App Engine?

------
robbiet480
I made this to express my outrage and hope to help others easily express
theirs too! <http://robbiet480.github.com/StopLyingToUsAmazon/>

------
jaequery
isn't it about time they supported a multi-regional failover system?

------
LaSombra
Coursera is also down.

------
jbarham
FWIW here's Google App Engine status page:
<http://code.google.com/status/appengine>

------
lgsilver
Two of our four EC2 servers are down. Both have RDS connections and EBS
storage. Looks like the RDS connection has something to do with it...

~~~
spartango
RDS is built on EBS, so if EBS has problems you can bet that RDS will feel the
effects.

------
johnmurray_io
Here comes another round of Hacker News posts about how you should never host
your app in just one availability zone...

------
j45
How does Twilio manage to stay up when AWS goes down, and how much of that can
the average developer reasonably do?

~~~
driverdan
See Keith's (of Twilio) response here:
<http://news.ycombinator.com/item?id=4684766>

~~~
j45
Nice, thanks!

------
pcorsaro
My site is currently down because of this.

------
potomak
This is why you should think about a multi-region architecture to prevent
these types of service interruptions.

------
justplay
I read in my book that Amazon is the best cloud service provider, and seeing
this doesn't really prove it. I am still confused about which cloud service I
should use. Amazon is the only one Indians currently rely on, since latency
is a bit lower for Amazon's servers compared to other providers.

------
orph
Adding insult to injury: when trying to view instances in the console I now
only see "Rate limit exceeded".

------
gavinlynch
Disappointing, but I guess not totally unexpected. Our prod site was down for
a bit with many others.

------
gourneau
These semi-regular outages are why I am going to use the Google Cloud Platform
for my next project.

------
timmythebest
a ton of other stuff is down too... www.reddit.com, netflix,
www.freewebsite.com and a bunch more

------
playhard
As of now, I don't have a problem with my site. Elastic Beanstalk with EC2
small instances.

------
sigkill
Isn't it worrying that there are so many services/sites that are so dependent
on Amazon EC2?

------
halis
Thanks! I have a friend whose site is down from this, and this helps shed
some light on it.

------
rynop
EIP remapping is all messed up as well: API errors, and the console for it
does not work either.

------
Revex
I just recently signed up for EC2 (3 weeks ago) and my small site is also
down. :-(

------
zenwheels
Perfectly timed with our product launch, now our site is down too -
www.zenwheels.com

------
fsckin
My instances just came back up.

------
nell
The tech industry is manufacturing its version of a "too big to fail" entity.

------
johansch
The emperor has no clothes!

------
Lashawndazqd
I wonder if this is going to be an attack or a hardware/software failure...

------
Yrlec
Does this just affect normal EBS or EBS with provisioned IOPS as well?

------
gailees
At least HN isn't down! :D

------
espeed
When is Google Compute Engine supposed to be open to everyone?

~~~
quotemstr
There's Azure.

------
stinky613
Somewhere Steve Wozniak is saying "I told you so"

------
bas
Some of our sites are affected. Good times!

------
mtgx
Time to switch to Google's Compute Engine.

------
briandear
I fixed EC2 using this one weird trick..

------
zenwheels
Woot, Back Online www.zenwheels.com

------
losvedir
Sigh, my heroku site is down.

------
Latricerzj
Single point of failure?

------
GigabyteCoin
It's back up now.

------
interro
just checked my site and it is up now!

------
taigeair
back up :)

~~~
SquareWheel
As far as I can see Reddit is still down.

------
Ramonaxvh
This is the kind of stuff I think about when I hear people talk about the
cloud and promise that downtime is a thing of the past.

Cloud hosting is not drastically different from any other type of service and
is still vulnerable to the same problems.

~~~
gfodor
If you go through the pains of architecting your system to span multiple AZs,
or you avoid using EBS, then you probably dodge most of the EC2 outages.
(Remains to be seen if that is the case here.)

That said, I don't think most people think using the cloud means that downtime
is a thing of the past. I think the more attractive proposition is when
hardware breaks, or meteors hit the datacenter, etc, it is _their_ problem,
not yours. You still have to deal with software-level operations, but
hardware-level operations is effectively outsourced. The question is if you
think you can do a better job than Amazon -- some companies think they can,
most startups know they can't.

~~~
w_t_payne
Yeah. Even with this, they still do better than I would. My record:
misconfigured air-conditioning unit alarm leading to servers being baked at
high temperature over a weekend, leading to much wailing and gnashing of
teeth. I now know to be really careful to set up air conditioning units
properly, but what other lessons am I still waiting to learn? The main lesson
that I took from this is that I should stick to what I am good at: cutting
code & chewing data. :-)

~~~
gfodor
Yeah this is another important point. Part of the cost of AWS is also a bit of
an insurance policy against you physically breaking your servers :)

------
Cordiapxq
Workplace productivity is going to skyrocket today

------
gailees
WAT!?

------
Codhisattva
Worst infrastructure ever.

------
valhallarecords
This is why I trust Google's data centers
<http://www.google.com/about/datacenters/gallery/#/>

~~~
taligent
HN really needs to hold a "dumbest comments of the year" award.

This has to be a strong contender.

~~~
JackWebbHeller
HN really needs to hold a "snarkiest response of the year" award.

~~~
pardner
Snap!

------
lr
pg, or someone else from HN... Could you please edit this title for accuracy?
Maybe, "Poorly designed sites taken out because of problems in one Amazon
availability zone."

~~~
Terretta
That's editorializing.

For many sites, a single server in a single zone (e.g., a non redundant
server, an instance, a slice, a VM, whatever) _is the right decision_ for ROI.

For many sites, the money spent on redundancy could be better spent on, say,
Google Adwords, until they're big enough that a couple hours downtime has
_irreplaceable_ costs higher than the added costs of redundancy (dev, hosting,
admin) for a year.

~~~
lr
Yes, it is editorializing. My point, though I guess it was too subtle, is
that the current link text is very much an editorial comment, especially
since the content at that link location has nothing to do with the sites
mentioned in the link text.

