Hacker News

Having the code behind the chaos monkey is not nearly as valuable as having the guts to run it in the first place.

If this were Reddit, I'd link to that meme about testing in production.

Since this is HN, I'll instead say that if you don't have the guts, you're only lulling yourself into a false sense of security. AWS or not, your systems will fail.

By running the monkey, at least you can make things fail on a schedule that is convenient for you, instead of having them fail when you're drunk on a Saturday night (or whatever your vice of choice might be).
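To make the "schedule that is convenient to you" idea concrete, here is a minimal sketch of a victim picker that only fires on weekdays during business hours. It is not chaos monkey's actual code: the instance names, the 09:00-17:00 window and the `pick_victim` helper are all hypothetical, and the actual termination step is deliberately left out so you can plug in whatever kill mechanism your own infrastructure uses.

```python
import random
from datetime import datetime
from typing import Optional, Sequence

# Hypothetical fleet; in practice this would come from your own inventory.
INSTANCES = ["web-1", "web-2", "web-3", "worker-1", "worker-2"]

BUSINESS_HOURS = range(9, 17)  # assumed "convenient" window: 09:00-16:59
WEEKDAYS = range(0, 5)         # datetime.weekday(): Monday=0 .. Friday=4


def in_convenient_window(now: datetime) -> bool:
    """True only on weekdays during business hours, when people are around to respond."""
    return now.weekday() in WEEKDAYS and now.hour in BUSINESS_HOURS


def pick_victim(now: datetime, instances: Sequence[str],
                rng: random.Random) -> Optional[str]:
    """Return one instance to kill, or None if we're outside the window."""
    if not in_convenient_window(now) or not instances:
        return None
    return rng.choice(list(instances))


if __name__ == "__main__":
    # Saturday night: nothing gets killed.
    print(pick_victim(datetime(2024, 6, 1, 23, 0), INSTANCES, random.Random(0)))
    # Wednesday morning: some instance is chosen.
    print(pick_victim(datetime(2024, 6, 5, 10, 0), INSTANCES, random.Random(0)))
```

The point of the window check is that the failure still happens, just at a time you picked instead of one the hardware picked.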

At my work (a large corporate office), we have random node outages. It's not quite as in-depth as chaos monkey, but it serves the same purpose: just pull the plug on the server. More than once, a random node outage has caught a novice developer hardcoding links to specific nodes behind the load balancer. We also run random pen-tests designed to DoS or otherwise disable services around the network. Controlled destruction of your infrastructure is the quickest way to expose any faults.

But remember: what's the difference between hacking and pentesting? Permission.

Without chaos monkey, instances will not die randomly anywhere near as often as they do with it. (Otherwise, why run it at all?)

Increasing the number of failures in order to increase the percentage of nice (not 3 a.m.) failures takes guts.

Yes, it's like knowing a mad axeman and then letting said person go wild in your server room.
