
AWS increased error rates / intermittent outages - needcaffeine
http://downdetector.com/status/aws-amazon-web-services

https://twitter.com/search?f=tweets&vertical=default&q=ec2&src=typd
======
frakkingcylons
FYI there's a great Chrome extension that hides all the working services
(green checks) on the AWS status page so you can quickly see what's down.

[https://chrome.google.com/webstore/detail/real-aws-status/ka...](https://chrome.google.com/webstore/detail/real-aws-status/kaegondhonfdclembpcgaaammmlfaekj)

~~~
savant
Yeah, I wrote this during the last extended outages. I had some downtime since,
well, I couldn't interact with the API :P

The source code for it is here:
[https://github.com/josegonzalez/real-aws-status](https://github.com/josegonzalez/real-aws-status)

~~~
elwell
Well their status page is down too:
[https://status.aws.amazon.com/](https://status.aws.amazon.com/)

Edit: http(s) was at fault (no SSL cert?)

~~~
Thaxll
Yes no HTTPS.

------
dewyatt
Sorry guys, I knew I shouldn't have started my 4 instances at once.

~~~
0xmohit
It wasn't you. Somebody had set up auto-scaling at the behest of AWS support.

~~~
zymhan
And it solved all their problems!

Because the whole thing stopped working.

------
elwell
Status page is down too:
[https://status.aws.amazon.com/](https://status.aws.amazon.com/)

Shouldn't there be a separation of concerns for status pages, maybe use:
[https://www.statuspage.io/](https://www.statuspage.io/)

Edit: http(s) was at fault (no SSL cert?)

~~~
aschuster93
Looks like they don't have a cert on their status page.
[http://status.aws.amazon.com/](http://status.aws.amazon.com/) works fine,
though.

------
Zenfinch
Brace for half the internet going down if it gets any worse, especially if it's
us-east (wish I was being ironic).

------
needcaffeine
Dammit now my Amazon Echo doesn't work.

~~~
elwell
I feel sad for a lonely grandma somewhere; her grandchildren never visit, but
they bought her an Alexa to talk to that Amazon couldn't keep online.

~~~
0xmohit
I can't even ask Alexa if AWS is up.

------
ed661266
API Gateway is returning 500s. So much for server-less architecture :(

~~~
dewyatt
I've had both 500s and 401s interestingly.

------
joshwa
Many customer-facing amazon services are also affected, including amazon.com!

Twitter is not pretty:

[https://twitter.com/search?f=tweets&vertical=default&q=amazo...](https://twitter.com/search?f=tweets&vertical=default&q=amazon%20down&src=typd)

------
c0achmcguirk
I blame Pokémon Go. Somehow or another I'm sure it's to blame.

~~~
elwell
Global Outage?:
[http://downdetector.com/status/pokemon-go/map/](http://downdetector.com/status/pokemon-go/map/)

------
XiZhao
I like how half the internet can die when EC2 has problems.

~~~
mrweasel
Part of the problem is that a large number of sites pull in services,
retargeting, A/B tests, and weird JavaScript in general. These things are
pulled in without questioning or demanding an SLA or putting in an easy way of
pulling them back out. Even if you don't use AWS yourself, you can be sure
that some third party you rely on is deploying on EC2. Of course they'll never
tell you that.

For most new stuff, we require that it can be loaded with something like
Google Tag Manager or UberTag, so we can quickly disable them when they fail.

It doesn't help that Amazon's status page isn't all that good and sometimes
doesn't seem to actually reflect the true state of their service.

------
0xmohit
AWS would wish that all those downtime reporting services moved to their
infrastructure. That way there won't be anyone to report the downtime.

------
zzleeper
I can't log into amazon.com to check my orders (first time that happened).
That's quite surprising, as these problems are often unrelated.

------
jimaek
Am I the only one having daily issues with SQS? I have 150+ servers writing to
a queue and they time out at random a few times per day.

~~~
dopamean
The app I work on is a single server writing to a dozen or so queues. The
throughput is pretty low but I still manage to see failures for a few minutes
once or twice a week.
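
For short, random failures like the ones described in this subthread, a common mitigation is to retry the write with exponential backoff and jitter. The following is only a minimal sketch under stated assumptions: `send` is a hypothetical callable standing in for whatever actually enqueues the message, and the attempt count and delays are illustrative, not anything from AWS or from the posters' setups.

```python
import random
import time


def send_with_retry(send, message, attempts=5, base_delay=0.2, max_delay=5.0,
                    sleep=time.sleep):
    """Call send(message), retrying failed calls with exponential backoff.

    `send` is a hypothetical stand-in for the real enqueue call.
    `sleep` is injectable so the function can be exercised without waiting.
    """
    for attempt in range(attempts):
        try:
            return send(message)
        except Exception:
            # Assumption: every exception is treated as transient. Real code
            # would likely catch only timeout/throttling errors.
            if attempt == attempts - 1:
                raise  # out of retries, surface the last error
            # Full jitter: wait a random time up to the capped exponential
            # delay, so many writers do not all retry at the same moment.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
```

Something like `send_with_retry(enqueue, payload)` would then wrap each write, where `enqueue` is again a placeholder name. Retries only help with brief blips; a write that is retried after a timeout may also have succeeded the first time, so consumers should tolerate duplicates.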

------
0xmohit
AWS should design for resilience [0].

[0] This is a part of the wisdom shared by AWS support upon reporting an
outage.

~~~
toomuchtodo
1. "Aren't you fault tolerant against AZ failures?"

2. "Aren't you fault tolerant against region failures?"

 _throws up hands, moves back to bare metal_

------
cdsmarty
Seems to be affecting everything, but I have some running instances that seem
fine. [https://cloudstatus.eu/status/aws](https://cloudstatus.eu/status/aws)

Edit: Does seem to be recovering now

------
the_watcher
Amazon.com remains up... unless you attempt to buy something, when you get a
500 error.

~~~
FT_intern
I wonder how much revenue is being lost every second

------
needcaffeine
11:27 AM PDT We are investigating increased API error rates in the US-EAST-1
Region.

~~~
joshwa
11:50 AM PDT We can confirm increased error rates for the EC2 APIs and are
currently working to resolve. We also observed isolated periods of impaired
network connectivity for some EC2 instances.

12:21 PM PDT We have identified the root cause for error rates accessing the
EC2 APIs and EC2 Management Console and are currently working to resolve. We
observed isolated periods of impaired network connectivity for some EC2
instances, however running instances are currently operating normally.

------
loourr
Is anyone having issues with their lambda functions not working? Mine stopped
working and is just returning "Service error." when I try to test even though
they claim it's operating normally.

------
carterh062
Looks like it's just their service API for now, i.e. it's not responding to
any queries on existing instances for me at all. However, everything else is
working fine as far as I can tell.

------
tamcap
We are getting very intermittent issues with S3 and SNS so far.

~~~
0xmohit
SNS informing about down services would be too bad. So it must go down with
others.

------
elwell
Can't log in to AWS Console. Instances running fine.

------
carterh062
In the console, the running instances view was timing out on load for EC2, and
some of our AWS API calls to get hostnames are timing out in PHP.

------
swingbridge
Ditto, can't see any instance data either via the API or graphical dashboard.
Instances themselves seem normal for the moment.

------
elwell
Just got through to the AWS Console. Finally can get back to work.

------
nnd
On this note: what are some good alternatives to AWS?

------
awsofflineagain
Dear smarta$$es, could you please tell me the benefits of serverless
architecture (lambda) once again? My monolith app works perfectly fine even
when the whole AWS is offline :P

------
tschellenbach
everything on getstream.io is still up and running. most issues seem related
to provisioning more instances.

------
tschellenbach
does anybody have more details about what's actually down? their description
is a bit vague.

