Right. Blame it on the victim. How do you make a "fault tolerant" service when core services like ELB, together with the API behind it, start to fail? Multi-region? Multi-cloud? When is the "designed to make web-scale computing easier" part supposed to kick in? With half-baked products like ELB, or things like EIP that cease to work when you need them the most?
I actually asked AWS Premium Support about the ELB multi-AZ issues, hoping to make things easier for everyone. This is the answer I got:
"As it stands right now, you would need to make a call to ELB to disable the failed AZ. It may be possible for you to programatically/script this process in the case of an event.
Going forward, this is something that we would like to address but I don't have any ETA for when something like this might be implemented."
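For what it's worth, the "script this process" suggestion could look roughly like the sketch below. It assumes a classic ELB and a boto3-style client; `disable_availability_zones_for_load_balancer` is the real call for that, but the helper names (`failed_zones`, `disable_zones`), the `keep_at_least` guard, and the idea of treating a zone as failed when none of its instances are `InService` are my own assumptions, not anything support told me. And of course it only helps if the ELB API itself is still answering, which was exactly the problem.

```python
from collections import defaultdict


def failed_zones(instance_zone, instance_state, keep_at_least=1):
    """Return the AZs in which no registered instance is InService.

    instance_zone:  {instance_id: availability_zone}
    instance_state: {instance_id: "InService" | "OutOfService" | "Unknown"}
    Both dicts are hypothetical inputs; in practice they would be built from
    the EC2 and ELB describe calls.
    """
    total = defaultdict(int)
    healthy = defaultdict(int)
    for instance_id, zone in instance_zone.items():
        total[zone] += 1
        if instance_state.get(instance_id) == "InService":
            healthy[zone] += 1
    dead = sorted(zone for zone in total if healthy[zone] == 0)
    # Safety guard (an assumption): never disable so many zones that fewer
    # than keep_at_least would remain enabled.
    if len(total) - len(dead) < keep_at_least:
        return []
    return dead


def disable_zones(elb_client, lb_name, zones):
    """Ask the ELB API to stop routing to the given zones; no-op if empty."""
    if not zones:
        return None
    return elb_client.disable_availability_zones_for_load_balancer(
        LoadBalancerName=lb_name,
        AvailabilityZones=list(zones),
    )


# Usage sketch (needs boto3 and credentials; "my-lb" is a placeholder):
#   import boto3
#   elb = boto3.client("elb")
#   disable_zones(elb, "my-lb", failed_zones(instance_zone, instance_state))
```

The client is passed in rather than created inside the function so the decision logic can be exercised without touching AWS at all.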
IMO, there's plenty of blame to go around, but the onus really should be on the individuals who decide to go with Amazon and trust that their services will always be up. Unfortunately, some people don't know any better, so they just blindly choose Amazon for its name recognition.
Places that truly care about reliability, and have the technical staff to make informed decisions, should understand the limits of reliability with various architectures. As I mentioned before, one of the tenets of reliability is isolation. When the scope of isolation is increased (e.g. single host vs. multi-host), one must also handle failures at that scope. Amazon isolates at the datacenter level. So should those utilizing Amazon's offerings.