
US-East AWS Connectivity Issues - fjordan
http://status.aws.amazon.com/
======
rbranson
This appears to be connectivity issues entirely to/from the Internet or other
EC2 regions from a single availability zone in us-east-1. The intra-AZ
networks within us-east-1 have remained available during the event. One of the
AZs we use was affected, but no external traffic flows to it. I noticed this
because an auto-scale group was trying to bring up instances inside of the
affected zone (our us-east-1a) and was unable to contact a server outside of
AWS.

~~~
cperciva
I'm definitely seeing issues in multiple AZs. It seems to be partly firewall-
related, however: I've seen cases where it's hard to get an initial SYN
through, but once a TCP connection is established it stays established.
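
One rough way to check that symptom from your own box is to open a series of fresh TCP connections and count how many handshakes complete. This is only a sketch: `probe_connect` and `success_rate` are hypothetical helper names, and the host and port you point it at are whatever endpoint you care about.

```python
import socket
import time


def probe_connect(host, port, attempts=5, timeout=2.0):
    """Open `attempts` fresh TCP connections and record (succeeded, seconds) for each."""
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # Each call performs a new three-way handshake (SYN, SYN-ACK, ACK).
            with socket.create_connection((host, port), timeout=timeout):
                ok = True
        except OSError:
            # Timeouts and refusals both count as a failed attempt.
            ok = False
        results.append((ok, time.monotonic() - start))
    return results


def success_rate(results):
    """Fraction of attempts that completed the handshake."""
    if not results:
        return 0.0
    return sum(1 for ok, _ in results if ok) / len(results)
```

A rate well below 1.0 for new connections, while existing long-lived connections keep working, would be consistent with the SYN-level behaviour described above.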

~~~
fjordan
This, in addition to the increase in traffic we detected directly before,
smells of DoS. Also, it is Friday the 13th.

------
tomweingarten
Did anyone else notice a huge spike in incoming network traffic on their EC2
instance immediately before the outage? Roughly 9:55AM EST.

~~~
justinsb
Did it look like a DDoS attack, or do you think something went wrong where you
were getting traffic meant for other EC2 nodes?

I'm not quite sure how you would tell the difference of course...

~~~
tomweingarten
We didn't get enough data to be able to determine that, but I'd be very
curious to hear if someone else did.

------
rschmitty
Does anyone know why in the world they display a green checkmark with a near
invisible little 'i' for this?

~~~
iota
There are 4 statuses.

Green checkmark (status 0)

Green checkmark with "info" badge (status 1)

Yellow triangle (status 2)

Red "do not enter" rectangle (status 3)

I suspect that status 0 indicates that they are investigating a problem with
the server, and it switches to status 1 once the problem has been confirmed.

This is also a good example of poor icon design...they aren't self-
explanatory, and so they should not be used.

~~~
cperciva
What happened to status 2?

~~~
iota
Good catch. Fixed!

------
jolan
Amazon is continuing the trend of announcing outages 30 minutes after they
start.

Just signed up for a support contract since the status page said everything
was fine.

~~~
colinbartlett
And by "announcing" you mean indicating everything is a-okay with the green
checkmark but putting a tiny footnote next to it.

------
frabcus
We (ScraperWiki) can still access some of our US East servers. From those, we
can daisy-chain SSH into the ones that are offline. Those servers can't see the
world, but are working fine and can see other EC2 instances.
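
For anyone wanting to script the daisy chain, here is a minimal sketch that builds the `ssh` command line. The helper name `ssh_via_jump` and the host names in the usage note are made up for illustration. It relies on the `-J` (ProxyJump) option, which needs OpenSSH 7.3 or later; older clients would need a `ProxyCommand` setup instead.

```python
def ssh_via_jump(jump_host, target_host, remote_command=None):
    """Build an argv list that reaches target_host by hopping through jump_host."""
    # -J tells the OpenSSH client to connect to jump_host first and tunnel onward.
    argv = ["ssh", "-J", jump_host, target_host]
    if remote_command:
        # Optional command to run on the target instead of an interactive shell.
        argv.append(remote_command)
    return argv
```

Something like `subprocess.run(ssh_via_jump("admin@reachable-box", "admin@10.0.0.12", "uptime"))` would then run `uptime` on the unreachable instance by way of the reachable one.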

~~~
devy
Hi, can you use port forwarding to get the website up on those affected nodes?

------
jpea
I wonder if it extends beyond Amazon, since my gmail now doesn't pull anything
up after 2009, web or IMAP.

------
aquark
I'm getting external monitoring failures that are firing on and off, but have
no problem reaching the servers or the site.

Interestingly, New Relic is reporting the site down at the same time it is
reporting a normal level of load on it.

------
joe010
I've recently moved some of our servers over to Digital Ocean, but I'm still
using AWS for DNS since their Route 53 weighted DNS with health checks works as
a basic load balancer for our needs. I'm seeing DNS health checks that point
at individual servers at Digital Ocean that are showing 0.91 for a status (1
being up and 0 being down). The alarms attached to the health checks keep
flipping from "alarm" to "ok" and causing tons of alerts. As of about 15
minutes ago all of my checks started holding steady back at a status of 1 (ok).
Good stuff :)
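
To illustrate how a fractional status and a flapping alarm can come out of the same data, here is a small sketch. It assumes (this is my guess, not something confirmed) that the 0.91 is an average of individual 0/1 check results over a reporting period, and that the alarm fires whenever that average drops below 1. The function names are hypothetical.

```python
def status_average(samples):
    """Mean of 0/1 health results collected over one reporting period."""
    return sum(samples) / len(samples)


def alarm_state(average, threshold=1.0):
    """Return 'alarm' when the average is below the threshold, otherwise 'ok'."""
    return "alarm" if average < threshold else "ok"
```

Under that assumption, ten passing checks and one failing check in a period average out to roughly 0.91, which is enough to trip the alarm, and the next fully passing period flips it back to "ok".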

------
jd007
ELBs are also having problems. One of mine is reporting all instances out of
service (transient error), then all instances in service, intermittently. But
the ELB is never reachable (even when it reports all instances healthy and
up). All instances behind this one are reachable, up and running. US-East-1.

Some of our other instances are reachable but some are not, same as others
have been reporting.

------
sadris
Why does this never happen to AWS West? I should really get to migrating over
with 3 outages in the past 2 years on US East.

~~~
knodi
It does happen, you just never notice because you don't have instances in US
West.

------
brryant
There are definitely issues with network connectivity between AZs as well as
public internet connectivity.

------
jipumarino
I got into one of our machines that presented the connectivity issues from
another one which was still reachable. It had no external (curl
www.google.com) connectivity. Just two minutes ago it started resolving again.

------
ihaveajob
It looks ok now for us (appfluence.com), but even when it was down, our
website was still up, only the sync services went offline. And even then, they
were accessible from the web server...

------
NotDaveLane
It's region us-east-1c for now, at least from where I'm sitting... I have
instances in other us-east datacenters that are fine.

~~~
trevyn
Specific availability zones in a region are mapped per-account, so your
east-1c might be my east-1a:

"To ensure that resources are distributed across the Availability Zones for a
region, we independently map Availability Zones to identifiers for each
account. For example, your Availability Zone us-east-1a might not be the same
location as us-east-1a for another account. Note that there's no way for you
to coordinate Availability Zones between accounts."

[http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-reg...](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html)
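
A toy model of what that means in practice. The account names, the `site-N` labels, and the mapping itself are all invented for illustration; the real mapping is not exposed in this form.

```python
# Hypothetical per-account mappings from zone name to a physical location label.
ZONE_MAP = {
    "account_a": {"us-east-1a": "site-1", "us-east-1b": "site-2", "us-east-1c": "site-3"},
    "account_b": {"us-east-1a": "site-3", "us-east-1b": "site-1", "us-east-1c": "site-2"},
}


def physical_location(account, zone_name):
    """Look up which (made-up) physical site a zone name refers to for this account."""
    return ZONE_MAP[account][zone_name]


def same_location(account_x, zone_x, account_y, zone_y):
    """True when two accounts' zone names resolve to the same physical site."""
    return physical_location(account_x, zone_x) == physical_location(account_y, zone_y)
```

In this made-up mapping, `account_a`'s us-east-1c and `account_b`'s us-east-1a are the same site, so two people comparing zone letters in a thread like this one may be talking about different places.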

~~~
ceejayoz
I wonder how that works with new zones. I remember us-east-1e being added
separately to the original four. Presumably, that one's the same for all
accounts that'd already signed up at the time.

------
scrabble
So what is the best way to balance a hosted site between Amazon and a separate
service? Because these connectivity issues suck.

~~~
bredman
One option would be to use Route 53 weighted round robin (WRR) DNS records and
health checks to accomplish this.
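
Roughly, the idea is that each answer is chosen with probability proportional to its weight, and records whose health check is failing drop out of the pool. A simplified simulation of that selection logic follows. It is not the Route 53 API; `pick_endpoint`, the record names, and the weights are hypothetical, and the real service handles edge cases (for example, every record being unhealthy) differently from this sketch.

```python
import random


def pick_endpoint(records, health, rng=random):
    """Pick one record name by weight, skipping unhealthy or zero-weight records.

    records: list of (name, weight) pairs
    health:  dict mapping name -> True/False from some health check
    """
    healthy = [(name, weight) for name, weight in records
               if health.get(name, False) and weight > 0]
    if not healthy:
        # Simplification: this sketch gives up when nothing is healthy.
        return None
    names = [name for name, _ in healthy]
    weights = [weight for _, weight in healthy]
    # random.choices draws with probability proportional to the given weights.
    return rng.choices(names, weights=weights, k=1)[0]
```

With records like `[("aws", 3), ("other-host", 1)]`, about three quarters of lookups would land on AWS while both are healthy, and all of them would shift to the other host once the AWS health check fails.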

------
jday
this has taken openredis offline:
[https://twitter.com/openredis](https://twitter.com/openredis)

heroku is also reporting issues:

[https://status.heroku.com/incidents/554](https://status.heroku.com/incidents/554)

------
martin_
All of mine just started magically working

~~~
martin_
I retract that statement!
[http://shutter.io/img/vs6jjs/raw](http://shutter.io/img/vs6jjs/raw)

------
TallboyOne
Aaand we're back up now.

------
knodi
Always on a Friday...

~~~
jlgaddis
Not just any Friday...

    $ date
    Fri Sep 13 12:07:58 EDT 2013

~~~
xdissent
Not just any Friday the 13th...
[http://en.wikipedia.org/wiki/Programmers'_Day](http://en.wikipedia.org/wiki/Programmers'_Day)

~~~
TallboyOne
Not just any Friday the 13th Programmer's Day...
[http://www.holidayinsights.com/other/fortunecookie.htm](http://www.holidayinsights.com/other/fortunecookie.htm)

------
o0-0o
Down in Manhattan

