
I trust S3 a lot (in fact, there was a time I had >1% of all objects in S3; I have since deleted a very large percentage, but I believe I still have well over a billion objects stored).

I would definitely agree: it doesn't fail under load; for a while I was seriously using S3 as a NoSQL database with a custom row-mapper and query system (not well generalized at all) that I built.

However, this particular aspect is a known part of S3 that has been around since the beginning: that it is allowed to fail your request with a 500 error, and that you need to retry.

This is something people comment on regularly in the S3 forums, something every major client library contains code to handle, and something Amazon explicitly documents.

"Best Practices for Using Amazon S3", emphasis mine:

> 500-series errors indicate that a request didn't succeed, but may be retried. Though infrequent, these errors are to be expected as part of normal interaction with the service and should be explicitly handled with an exponential backoff algorithm (ideally one that utilizes jitter). One such algorithm can be found at...

-- http://aws.amazon.com/articles/1904
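In code, that boils down to something like the sketch below (my own illustration of retry-with-"full jitter", not the algorithm from that article; doRequest is just a placeholder for whatever S3 call you are making):

    // Sketch: retry 500-series responses with exponential backoff + full jitter.
    // `doRequest` is a placeholder for whatever S3 request you are issuing;
    // it is assumed to resolve to something with a `status` code.
    async function withRetries(doRequest, maxAttempts = 5, baseDelayMs = 100) {
      let res;
      for (let attempt = 0; attempt < maxAttempts; attempt++) {
        res = await doRequest();
        if (res.status < 500) return res; // success or client error: don't retry
        // sleep a random amount up to an exponentially growing cap ("full jitter")
        const cap = baseDelayMs * 2 ** attempt;
        await new Promise(resolve => setTimeout(resolve, Math.random() * cap));
      }
      return res; // out of attempts: surface the last 500 to the caller
    }

The important parts are that the cap grows exponentially and that the delay is randomized, so a burst of clients doesn't retry in lockstep.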

Regardless, the 2% failure rate on that one S3 IP endpoint is definitely a little high, so I filed a support ticket (I pay for the AWS Business support level) with a list of request IDs and correlation codes that returned a 500 error during my "static hosting + redirect" test today. I'll report back here if I hear anything useful from them.



2% failure rates are excessive, agreed - but why is the requirement to retry on 500 so off-putting? Virtually all APIs have this occur on some level, and you do the exponential backoff song and dance.

What am I missing that makes this such a showstopper with browsers? You can still do the backing off client-side with a line or two of JavaScript.

Seems like the pros outweigh the cons but I'm probably missing something.


You are still thinking about this as an API, with, for example, JavaScript and some AJAX. The use case here is zone apex static website hosting: if you go to http://mycompany.com/ and get a 500 error, the user is just going to be staring at an error screen... there will be no JavaScript, and the browser will not retry. As I explicitly said multiple times: for an API that is a perfectly reasonable thing to have, but for static website hosting it just doesn't fly.


Oh I see what you mean, you're concerned about the first bytes to the browser being faulty. Well, that 2% error rate is spread out across the total number of requests, so the likelihood of a user getting a 500 on his first hit should be significantly less than 2%. (But it does seem like it will still be way too high.)

Very valid point, saurik, thanks for pointing out the extent of the problem. It is a dilemma. Seems kind of silly to have to run an instance just so the first hit goes through reliably for visitors, goddammit Amazon.

Edit: Wait a minute, maybe this could be solved with custom error pages which I think they support. :P


I... what? That's not how percentages work.


No, it isn't, but that's not what I meant either. I'm assuming that not all requests are equivalent because of the nature of S3.


You're going to need to explain how the requests would differ. If anything I'd expect image files to be more cache-friendly and have fewer visible failures than the critical HTML files. An image might face the 2% failure chance only once or twice, followed by fifty error-free cache loads, while an HTML page faces the 2% failure chance on every single click.


The likelihood of a user getting a 500 on his first hit is the same as the user getting a 500 on his 100th hit - 2%.


Interesting. I guess in the case of static web hosting you could use onerror to deal with failed frontend requests and smooth out the broken images from the user's perspective. Though, as I say, it hasn't been a problem for me.
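Something roughly like this, for example (just a sketch; the retry query parameter is only there to make the browser actually re-request the file rather than give up):

    // Sketch: retry broken <img> loads a couple of times via their onerror handler.
    document.querySelectorAll('img').forEach(function (img) {
      var attempts = 0;
      img.onerror = function () {
        if (attempts++ < 2) {
          // cache-bust so the browser issues a fresh request
          img.src = img.src.split('?')[0] + '?retry=' + attempts;
        }
      };
    });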


Yeah, for images you can probably deal with that; but if your JavaScript doesn't load because the script request itself returned a 500, or the entire website doesn't load because of a 500 error... well, you're screwed. The use case here is zone-apex, whole-site static website hosting (whether just the canonicalizing redirects or the final web page: same issue).


I've just discovered that you can use the same technique for script and style tags too (though I'd rather not have to).
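For scripts, it ends up looking something like this (a sketch; '/app.js' is just a stand-in path, and the same pattern can be applied to dynamically created link tags for stylesheets):

    // Sketch: load a script and retry once if the request fails (e.g. a 500).
    function loadScript(src, retried) {
      var s = document.createElement('script');
      s.src = src;
      s.onerror = function () {
        if (!retried) loadScript(src + '?retry=1', true);
      };
      document.head.appendChild(s);
    }
    loadScript('/app.js'); // '/app.js' is a hypothetical path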

It sounds like Jeff has tracked down the specific issue in your case, so things are looking up :)



