
Simple Testing Can Prevent Most Critical Failures - luu
https://www.usenix.org/conference/osdi14/technical-sessions/presentation/yuan
======
yeukhon
I really enjoy going to USENIX and watching USENIX presentations. Almost all
of them include the PDF and a pretty good quality recording of the
presentation (plus audio for those who only wish to listen). Also, the
presentation mode is awesome: slides and speaker are on the same page. I just
can't stand it when a camera person or video editor constantly switches
between the slides and the speaker. I want to read the slides as much as I
enjoy the gestures of the presenter.

I think one takeaway is not just to handle exceptions, but to actually monitor
them and make use of them. Very recently I was debugging an application inside
a container. The code caught Exception (the base class of all exceptions) and
discarded it, so I had to ask the developer to log the stack trace. The stack
trace finally revealed the actual problem, and I was able to write a PoC to
test it and narrow the root cause down to the security of the container.
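
To make the difference concrete, here is a minimal Python sketch (the
application I was debugging is not shown here; `load_config`, the logger name
and the path are made up for illustration). The first handler swallows the
error, the second catches just as broadly but keeps the stack trace:

```python
import io
import logging


def make_logger(stream):
    """Build a logger that writes to the given stream (hypothetical setup)."""
    logger = logging.getLogger("poc")
    logger.handlers.clear()
    logger.addHandler(logging.StreamHandler(stream))
    logger.setLevel(logging.ERROR)
    logger.propagate = False
    return logger


def load_config(path):
    # Hypothetical operation that can fail for many different reasons.
    with open(path) as f:
        return f.read()


def run_swallowing(path):
    # Anti-pattern: the broad handler hides what actually went wrong.
    try:
        return load_config(path)
    except Exception:
        return None


def run_logging(path, logger):
    # Same broad handler, but the stack trace is recorded before moving on.
    try:
        return load_config(path)
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("load_config failed for %s", path)
        return None


if __name__ == "__main__":
    stream = io.StringIO()
    run_logging("/nonexistent/config.yaml", make_logger(stream))
    print(stream.getvalue())
```

With the second version, whoever is on call at least sees the exception type
and where it was raised, instead of a silent `None`.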

With a linter and static analysis we can probably nudge developers: "hey look,
you are catching too much or too little."
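
As a rough idea of what the "catching too much" half of such a check could
look like, here is a small sketch built on Python's standard `ast` module. The
function name and the set of "too broad" classes are my own choices, and it
does not try to detect the "too little" case:

```python
import ast

# Exception classes treated as "too broad" (an assumption for this sketch).
BROAD = {"Exception", "BaseException"}


def find_broad_excepts(source):
    """Return sorted (line, description) pairs for overly broad except clauses."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ExceptHandler):
            continue
        if node.type is None:
            # "except:" with no class catches everything.
            findings.append((node.lineno, "bare except"))
        elif isinstance(node.type, ast.Name) and node.type.id in BROAD:
            findings.append((node.lineno, f"except {node.type.id}"))
    return sorted(findings)


if __name__ == "__main__":
    sample = (
        "try:\n"
        "    pass\n"
        "except Exception:\n"
        "    pass\n"
    )
    for line, what in find_broad_excepts(sample):
        print(f"line {line}: {what} is catching too much")
```

A real linter would also want to skip handlers that log and re-raise, which
this sketch does not attempt.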

Lastly, planning for failure by doing HA design is critical. In AWS we have to
prepare for the underlying host going bad (which in turn means we have to stop
and start the EC2 instance to move it to a different host). It's easy to say
but actually hard to do, as applications are often not truly stateless. PoC
failure tests, stress tests, and performance tests are necessary.

