I think it will be a while before we know the exact reasons behind the EC2 outage. At first blush, it appears to be a cascading system failure: for example, a networking error triggers EBS re-mirroring, which in turn causes other failures.
What kinds of simulation tools exist to model failures like this? Is there any overlap between the tools used to simulate mechanical systems, such as the Gulf oil spill or Fukushima Daiichi, and tools that could model computer systems like AWS?
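For a rough feel of the kind of cascade described above, here is a toy load-redistribution model of the sort often used to study cascading failures in power grids and other networks. It is a hypothetical sketch, not a model of how EBS actually works. It assumes identical nodes with a fixed `capacity`, and it assumes that a failed node's load is spread evenly over the survivors. The function name `simulate_cascade` and all parameters are illustrative.

```python
from typing import List


def simulate_cascade(loads: List[float], capacity: float,
                     initial_failures: List[int]) -> List[int]:
    """Toy cascading-failure model (hypothetical, equal load sharing).

    Each node carries some load and can handle up to `capacity`.
    When nodes fail, their load is spread evenly over the surviving
    nodes; any survivor pushed over capacity fails in the next round.
    Returns the sorted indices of every node that ends up failed.
    """
    loads = list(loads)  # copy so the caller's list is not mutated
    failed = set(initial_failures)
    newly_failed = set(initial_failures)

    while newly_failed:
        survivors = [i for i in range(len(loads)) if i not in failed]
        if not survivors:
            break  # the whole cluster is down
        # Load shed by the nodes that failed in this round.
        shed = sum(loads[i] for i in newly_failed)
        for i in newly_failed:
            loads[i] = 0.0
        # Assumption: shed load is redistributed evenly to survivors.
        share = shed / len(survivors)
        for i in survivors:
            loads[i] += share
        # Survivors now over capacity fail in the next round.
        newly_failed = {i for i in survivors if loads[i] > capacity}
        failed |= newly_failed

    return sorted(failed)


if __name__ == "__main__":
    # Ten nodes at 70% utilisation.
    print(simulate_cascade([0.7] * 10, 1.0, [0]))           # absorbed
    print(simulate_cascade([0.7] * 10, 1.0, [0, 1, 2, 3]))  # full cascade
```

With ten nodes at 70% load, losing one node is absorbed (each survivor rises to about 78%), but losing four pushes every survivor past capacity and the whole cluster fails. That threshold behaviour, where a slightly larger initial shock tips the system into total collapse, is the main thing this kind of toy model illustrates.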