

Show HN: AWS S3 outage test from across the world - sajal83
https://pulse.turbobytes.com/results/55c8751aecbe400bf80005f2/

======
chr15
Is there really any way to design your application to handle S3 failures like
this? S3's SLA has 99.99% availability, but is there a way to handle the
remaining 0.01% so your application is not affected? Options I can think of:

      1. Using a CDN to serve files can help in some cases
      2. On-prem systems may be able to use gateway-cached volumes and read from the local disk cache instead of S3

Other ideas?
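
One more possibility, roughly sketched: read from S3 first and fall back to a copy kept elsewhere (another region, another provider, or a local cache) when the primary read fails. This is only a minimal Python sketch under stated assumptions, not a definitive implementation. `fetch_with_fallback`, the source callables, and the retry and backoff values are all hypothetical; real code would wrap actual storage client calls and catch narrower exceptions.

```python
import time


def fetch_with_fallback(key, sources, retries=2, base_delay=0.1, sleep=time.sleep):
    """Try each source in order, retrying each with exponential backoff.

    `sources` is a list of hypothetical callables (for example a primary
    S3 read followed by a replica or cache read). Each takes a key and
    either returns bytes or raises an exception.
    """
    last_error = None
    for source in sources:
        for attempt in range(retries + 1):
            try:
                return source(key)
            except Exception as exc:  # assumption: any exception means "try again"
                last_error = exc
                if attempt < retries:
                    # Wait base_delay, then 2x, 4x, ... before the next attempt.
                    sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"all sources failed for {key!r}") from last_error
```

Passing `[read_from_s3, read_from_replica]` as `sources` (both hypothetical) would only touch the replica after the primary has failed `retries + 1` times, so the common case stays a single S3 read.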

~~~
untog
Slightly OT, but there's an interesting phenomenon at work now that so much of
the internet depends on Amazon's infrastructure. When it goes down you might
not even need to worry about it that much, as so many sites/apps will be
broken that most users will just assume that the internet is broken.

~~~
alpb
It has happened a few times before. And no, the Internet was not broken, and
not as many sites use AWS as you would assume. A lot of services still run on
their datacenters or on-premise servers, or maintain a hot backup that they can
switch to immediately.

>> When it goes down you might not even need to worry about it that much

I'm afraid this is pretty much getting the entire cloud thing wrong.

~~~
untog
_A lot of services still run on their datacenters or on-premise servers, or
maintain a hot backup that they can switch to immediately._

I'm not an idiot, I'm well aware of that. My point is that when a large number
of consumer-facing sites go down, users (who aren't aware of Amazon cloud
servers) simply assume something is wrong with the internet.

Obviously if you have a mission critical service this isn't acceptable. But
for a lot of average sites/apps it might not be worth the investment in
time/effort to cover relatively small outages such as these.

_I'm afraid this is pretty much getting the entire cloud thing wrong._

Not really. It's the utility of the cloud - if there's an outage there are
already a lot of people working to fix it. If you're self-hosted, that's on
you.

~~~
tedunangst
The first thing I do when Netflix goes down is complain on Facebook. Facebook
up, Netflix down? It's not the Internet that's broken.

------
Sami_Lehtinen
"Oh no, our server made a boo boo. Please try again."

~~~
sajal83
Pls try again. The server had crashed due to "too many open files". I'm
leaking file descriptors somewhere.

~~~
13
You should fix that error message too, it's pretty awful.

------
rbinv
I just re-ran the test and got an error rate of 1.23% (vs. 41.98%):
[https://pulse.turbobytes.com/results/55c88a0fecbe400bf800073...](https://pulse.turbobytes.com/results/55c88a0fecbe400bf800073e/)

edit: S3 seems to be back up and running according to the AWS status page.

------
imrehg
Yeah, I got a 2.53% error rate, but that's nothing to worry about - using 79
servers, that's exactly 2 errors, both of them in China, which kinda makes it
feel less like Amazon's fault than the Great Firewall's... Maybe there should
be a more meaningful error measure than the raw failure percentage.
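
One candidate would be to group failures by country instead of pooling every probe into one number. A rough Python sketch of that idea follows. Only the totals (79 probes, 2 failures, both in China) come from the run described above; the number of probes located in China is an assumption made up for illustration, and `error_rates` is a hypothetical helper, not something the test tool provides.

```python
from collections import Counter


def error_rates(results):
    """Return (overall_rate, {country: rate}) from (country, ok) pairs."""
    total = Counter()
    failed = Counter()
    for country, ok in results:
        total[country] += 1
        if not ok:
            failed[country] += 1
    overall = sum(failed.values()) / sum(total.values())
    per_country = {c: failed[c] / total[c] for c in total}
    return overall, per_country


# Hypothetical breakdown: 79 probes, 4 of them ASSUMED to be in China.
results = [("CN", False)] * 2 + [("CN", True)] * 2 + [("other", True)] * 75

overall, per_country = error_rates(results)
print(f"overall: {overall:.2%}")       # overall: 2.53%
print(f"CN: {per_country['CN']:.2%}")  # CN: 50.00%
```

Under that assumed breakdown the pooled figure looks harmless while the per-country figure shows the failures are concentrated in one place, which is the distinction the raw percentage hides.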

~~~
scott_karana
Vancouver, Tokyo, Cebu, Singapore, Bangkok, and Kharkiv all show EOF.

Paris, Roubaix, Manchester, Portsmouth, Budapest, Tokyo, Bilthoven, Vleuten,
Utrecht, Manila, Sovetskaya Gavan, Tyuven, Singapore, Bangkok, Taipei City,
Kiev, Kherson, Ashburn, Mountain View, Rowley, and Newark all show 503s.

It's more than just the Great Firewall. :-)

~~~
imrehg
It depends, you have to re-run the test...

------
cddotdotslash
Got some alerts from a couple services that rely on S3 this morning. Perhaps
this is related, but everything is back up for now.

