
AWS S3 Having Problems Again? (Monday 12PM Pacific) - RyanGWU82
We're seeing similar problems to last night -- lots of 503s from S3. Anyone else?
======
wbharding
Indeed. As I write this we're in the midst of our third S3 outage of the day.
The past two were eventually documented on the AWS Service Dashboard. The
latest one has not yet received its tiny status icon to indicate an outage.

It's one thing that S3 keeps going down today; we run our own server cluster
and I accept that 100% uptime isn't possible. But it's aggravating that they
can't at least figure out how to give timely updates on their dashboard when
something is broken.

We inevitably learn of S3 outages through our internal error reporting systems
before AWS posts it to their status page. When they do finally post, it is
usually a tiny "information" icon, even when reporting a problem that makes
the service unusable. The laggy, misleading nature of their status page gives
the impression they must be tying bonuses to the status icons. Can't fathom
why else they would be so inept when it comes to keeping us updated when
something is wrong. Surely they have sufficient internal monitoring to pick up
on these outages long before they update their customers.
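
For anyone without that kind of internal reporting yet, here is a minimal sketch of a sliding-window error-rate tracker. The class name, the 60-second window and the "5xx counts as an error" rule are all assumptions for illustration, not a description of any particular monitoring product.

```python
from collections import deque


class ErrorRateWindow:
    """Tracks the share of failed requests over the last `window_seconds`.

    Hypothetical helper: timestamps are plain seconds supplied by the caller.
    """

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, is_error) pairs, oldest first

    def record(self, timestamp, status_code):
        # Assumption: any 5xx status (500, 503, ...) counts as an error.
        self._events.append((timestamp, status_code >= 500))
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events that have fallen out of the window.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            self._events.popleft()

    def error_rate(self, now):
        """Return errors / total for events still inside the window."""
        self._evict(now)
        if not self._events:
            return 0.0
        errors = sum(1 for _, is_error in self._events if is_error)
        return errors / len(self._events)
```

Feed it every S3 response and alert when `error_rate(now)` crosses whatever threshold suits your traffic; the threshold itself is something you would have to tune.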

~~~
ceejayoz
It shouldn't be, but I've found
[https://twitter.com/ylastic](https://twitter.com/ylastic) by far the best way
to find early info on AWS issues.

------
KenCochrane
From Amazon:

"Hello, We have just become aware of EC2 network connectivity issues in the
US-EAST-1 region. The impact of this issue is loss of network connectivity to
EC2 instances in US-EAST-1. The AWS support and engineering teams are actively
working on bringing closure to this issue. I will share additional information
as soon as I learn more about this issue."

------
dkuebric
Yep, same. Lots of latency too--here's what we're seeing:
[http://kuebri.ch/bucket/s3_latency_081015.png](http://kuebri.ch/bucket/s3_latency_081015.png)

~~~
zeeta6
What tool is that?

~~~
dkuebric
[http://www.appneta.com/products/traceview/](http://www.appneta.com/products/traceview/)

~~~
zeeta6
Thanks

------
Negitivefrags
I'm risking being inflammatory here, but do people really believe that they
get better uptime from AWS compared to renting dedicated servers?

I feel like AWS has way too many moving parts to be stable.

It's very tempting for them to reuse bits of infrastructure everywhere which
increases the chances that if something goes wrong somewhere it will break
your stuff. So for example, hosting instance images on S3 means that when S3
has issues, now EC2 has issues.

~~~
deanCommie
AWS is so massive that even when 0.1% of the customers are having problems, it
is huge news like this.

The reality is most customers are not affected, and overall service uptime is
highest anywhere around.

Not to mention that whenever AWS is having issues it's always in one region at
a time, and frequently a single availability zone. As long as you build your
application to be AZ-tolerant, you won't run into problems.

~~~
mnutt
_The reality is most customers are not affected, and overall service uptime is
highest anywhere around._

Unfortunately it's really impossible to say in this case, since they don't
release numbers. Informally everyone I know with S3 buckets in US-Default had
issues this morning.

_As long as you build your application to be AZ-tolerant, you won't run into
problems._

What you say about multiple AZs is true for EC2, but many other AWS services
(especially EBS-backed ones) tend to go down across the entire region. If
you're serious about availability, you really need to be in multiple regions.
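
A rough sketch of what reading across regions could look like, assuming the data is already replicated to each region. The `region_readers` list and the reader callables are hypothetical placeholders; in practice each one would wrap a real client pointed at a bucket in that region.

```python
def read_with_failover(key, region_readers):
    """Try each (region, reader) pair in order and return the first success.

    region_readers: list of (region_name, callable) pairs. Each callable takes
    the key and returns the object body, or raises on failure. Both the names
    and the callables are illustrative assumptions.
    """
    errors = {}
    for region, reader in region_readers:
        try:
            return region, reader(key)
        except Exception as exc:  # deliberately broad for this sketch
            errors[region] = exc
    raise RuntimeError(f"all regions failed for {key!r}: {errors}")


def broken_reader(key):
    # Simulates a region that is currently returning errors.
    raise IOError("503 Service Unavailable")


def healthy_reader(key):
    # Simulates a region that is serving requests normally.
    return b"contents of " + key.encode()


region, body = read_with_failover(
    "reports/today.csv",
    [("us-east-1", broken_reader), ("us-west-2", healthy_reader)],
)
print(region, body)
```

This only covers reads. Writes need a replication story of their own, which is where most of the real work in going multi-region ends up.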

------
edgan
The us-east-1 region gets treated differently than all other regions by AWS.
Part of the reason it gets treated differently is that it is the default, and
hence the most popular. It also doesn't help that it is on the east coast, and
experiences more weather.

For the above reasons, and that I work in the SF bay area, I put everything in
us-west-2. us-west-2 sometimes has its own issues, but nothing quite at the
level of us-east-1.

~~~
mdellabitta
IIRC, the AWS console itself is hosted out of us-east-1. Which means you're
always somewhat exposed to whatever failure modes it has.

~~~
not_kurt_godel
This is no longer true.

------
thspimpolds
"12:28 PM PDT Between 12:03 PM to 12:19 PM PDT we experienced elevated errors
for requests made to Amazon S3 in the US-STANDARD Region. The issue has been
resolved and the service is operating normally"

Our AWS TAM called us. I don't think he wanted the nasty call I gave him at
4:30am.

------
atopuzov
Amazon yet again lying to its customers about the status of the service is
the only real issue I see here. Services fail, it's a fact of life, but at
least admit it's broken and that the issue is being fixed instead of blatantly
lying and saying minor disruptions.

~~~
eric_h
[http://status.aws.amazon.com](http://status.aws.amazon.com) appears to
indicate that they did have problems and have now resolved them.

------
bhz
We saw a short burst of 503s a short while ago, but we have not seen any
since. Hopefully we do not see any more though.

Also, for the record, S3 has been very stable for us otherwise. We have been
rather happy with AWS overall.

~~~
onyxraven
Same, though just as I write this we see another spike of errors.

~~~
bhz
Ok, no 503s, but we just got a very small burst of 500s:

"com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 500, AWS
Service: Amazon S3, AWS Request ID: -redacted-, AWS Error Code: InternalError,
AWS Error Message: We encountered an internal error. Please try again., S3
Extended Request ID: -redacted-"

:/
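
Since the error message itself says "Please try again", here is a minimal sketch of retrying with capped exponential backoff and jitter. `TransientS3Error` and `call_with_backoff` are hypothetical names made up for this example, not part of the AWS SDK, and the delays are arbitrary defaults.

```python
import random
import time


class TransientS3Error(Exception):
    """Hypothetical stand-in for a retryable 500/503 response."""

    def __init__(self, status_code):
        super().__init__(f"transient S3 error, status {status_code}")
        self.status_code = status_code


def call_with_backoff(operation, max_attempts=5, base_delay=0.1,
                      max_delay=5.0, sleep=time.sleep):
    """Call operation(), retrying on TransientS3Error with capped backoff.

    `sleep` is injectable so the function can be tested without waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientS3Error:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Full jitter: wait a random time up to the capped exponential delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

The jitter is there so that many clients hitting the same error do not all retry at the same instant. The official SDKs already retry some errors on their own, so check your client configuration before layering something like this on top.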

------
RyanGWU82
Looks like it got better around 12:20 PM, about 10 minutes after the incident
started. We haven't seen any problems in the last few minutes.

~~~
RyanGWU82
... and errors started up again at 1:00 PM.

------
autotune
What happened to that 99.99% availability? Either way this just got posted at
reddit.com/r/sysadmin which might be useful to some for tracking error rate:
[https://pulse.turbobytes.com/results/55c8751aecbe400bf80005f...](https://pulse.turbobytes.com/results/55c8751aecbe400bf80005f2/)

~~~
ceejayoz
Their SLA guarantees 99.9% on a monthly basis. The 99.99% mentioned on the
product page isn't guaranteed at all.
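
To put those two numbers in perspective, here is a quick back-of-the-envelope calculation. It assumes a 30-day month and reads availability purely as wall-clock time; the actual SLA may compute its percentage differently (for example from error rates), so treat this as a rough reading only.

```python
def allowed_downtime_minutes(availability_pct, days=30):
    """Minutes of downtime permitted over `days` at the given availability."""
    total_minutes = days * 24 * 60  # 43,200 minutes in a 30-day month
    return total_minutes * (1 - availability_pct / 100)


for pct in (99.9, 99.99):
    minutes = allowed_downtime_minutes(pct)
    print(f"{pct}% over 30 days -> {minutes:.1f} minutes")
# 99.9% over 30 days -> 43.2 minutes
# 99.99% over 30 days -> 4.3 minutes
```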

As for what happened, my money is on this:
[https://aws.amazon.com/about-aws/whats-new/2015/08/amazon-s3...](https://aws.amazon.com/about-aws/whats-new/2015/08/amazon-s3-introduces-new-usability-enhancements/)

> You can now increase your Amazon S3 bucket limit per AWS account... Amazon
> S3 now supports read-after-write consistency for new objects added to Amazon
> S3 in US Standard region.

The 100 bucket limit used to be an absolute, unchangeable hard limit - rare
for AWS and thus likely something deep in the architecture from S3 being one
of their first services - so I suspect the lifting of that limit involved some
fairly major changes to the backend.

~~~
StabbyCutyou
They actually would let you increase that, but only up to a certain point and
only if you specifically requested it. I don't see them mention the absolute
ceiling being lifted, so that is probably still in place somewhere.

I'd wager it's more likely that read-after-write change.

------
toomuchtodo
503s galore. Is anyone seeing issues in other S3 regions?

------
arturhoo
We had problems connecting to the S3 US Standard region from us-east-1 at
19:00 UTC, but they were resolved 20 minutes later.

edit: seeing connectivity issues again at 19:50 UTC

------
azundo
We're seeing similar symptoms here as well.

------
matwood
We have also seen a higher rate of port scans/attacks today. I wonder if it is
AWS wide causing system overload issues.

------
kordless
Interesting this article was bumped from the front page so quickly. Makes you
wonder...

------
needcaffeine
Just started again in us-east-1.

------
andrebrov
We had problems with AWS ML tonight

------
mstkrft
Same here :(

------
kernel_sanders
Same for us

------
Stovoy
Yes, seeing the same thing.

------
AnonNo15
Seeing it too. 15:00 EST

------
ronreiter
Yes, same here.

------
ninjakeyboard
bad day.

------
mej10
Yep!

