
This is incredibly reckless and naive advice.

You roll out when it least impacts customers because:

1. Impacted customers affect your bottom line. Minimize the number affected and you minimize losses if something screws up.

2. Murphy's law. I don't care how much testing and QA you do; stuff will ALWAYS creep through, and sometimes it'll be nasty. QA works to minimize this, but it can't eliminate it entirely due to diminishing returns. Show me your guaranteed bug free deployed code and then I'll consider changing my view.

3. If you've designed and tested your rollback procedure before the real deployment, the chance of being unable to roll back is orders of magnitude lower than the chance of a failed deployment requiring a rollback, which in turn, if you've done your testing and QA, is orders of magnitude lower than the chance of a successful rollout (but not 0, thus the midnight rollout).
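To put rough numbers on point 3, here is a small worked example. The probabilities are made up purely for illustration (nothing above states actual figures), and it assumes the rollback failing is conditional on the deployment having failed first.

```python
from fractions import Fraction

# Hypothetical figures, chosen only to illustrate "orders of magnitude".
p_deploy_fails = Fraction(1, 100)                   # rollout needs a rollback
p_rollback_fails_given_failure = Fraction(1, 100)   # tested rollback still fails

# Chance of ending up stuck: the rollout fails AND the rollback fails too.
p_stuck = p_deploy_fails * p_rollback_fails_given_failure

print(p_stuck)  # 1/10000
```

Under those assumed numbers the stuck case is rare but not zero, which is the argument for rolling out when the fewest customers are affected.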

If you're worth your salt, you have a tested rollback procedure, laid out in simple-to-follow instructions (or better yet, an automated rollback mechanism with a simple-to-follow manual process for when the automated method inevitably fucks up).

You roll out, and if it fails, you roll back. And if that fails, you use the manual procedure. You should have the entire process time-boxed to the worst-case scenario (assuming a successful manual rollback) so that you know beforehand what the impact is, and won't need to go around waking people up asking what to do.
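The flow above can be sketched roughly as follows. This is only an illustration: `deploy`, the three step callables, and the outcome strings are hypothetical names, and each step is assumed to return True on success.

```python
import time
from typing import Callable, Tuple

# Hypothetical step type: a callable that returns True on success.
Step = Callable[[], bool]


def deploy(roll_out: Step, roll_back: Step, manual_rollback: Step,
           time_box_seconds: float) -> Tuple[str, bool]:
    """Try the rollout, then the automated rollback, then the manual one.

    Returns the outcome and whether the whole run stayed inside the time box.
    """
    started = time.monotonic()
    if roll_out():
        outcome = "rolled out"
    elif roll_back():
        outcome = "rolled back (automated)"
    elif manual_rollback():
        outcome = "rolled back (manual)"
    else:
        # Every planned path failed; this is the only case where you wake people up.
        outcome = "escalate"
    elapsed = time.monotonic() - started
    return outcome, elapsed <= time_box_seconds


# Example: the rollout fails, the automated rollback succeeds.
print(deploy(lambda: False, lambda: True, lambda: True, time_box_seconds=3600.0))
```

The time box here is only checked after the fact; a real procedure would presumably also put a timeout on each individual step.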

Rollbacks are a myth. You can never roll back. Always be rolling forward. Enabling a culture and environment that allows for small, frequent changes solves that problem.

The way to not impact a customer is to make deploys trivial, automated, and tolerant of failure, because everything fails.

> Always be rolling forward

I basically agree with this idea, but when I'm selling people on the idea of making deployments trivial non-events that happen in the daytime, having the notion of "if something goes wrong, you can very easily jump back" gives people a sense of security.

In practice, when things go wrong, I've found it easier to roll forward than to roll backwards.

I've done a number of rollbacks in my time (for enterprise banking systems). They work so long as you do a few dry runs first and have an audit system in place.

A part of this is looking at the risk assessment and the notion of "guaranteed bug free deployed code" in a different way.

For a stable system that's in production, you don't need "guaranteed bug free deployed code"; you need code that is not any worse than what's currently running out there. Doing frequent (daytime) deployments makes it easier to make a change, test that change (both with humans and robots), and get it out there. You don't have to manually test everything in order to change anything when you're changing just one thing at a time.
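One way to make "not any worse than what's currently running" concrete is to compare error rates between the current release and the candidate. The sketch below is a minimal illustration under that assumption; `no_worse_than`, its parameters, and the `tolerance` knob are hypothetical, and a real check would likely look at more than one metric.

```python
def no_worse_than(baseline_errors: int, baseline_requests: int,
                  candidate_errors: int, candidate_requests: int,
                  tolerance: float = 0.0) -> bool:
    """Return True if the candidate's error rate does not exceed the baseline's.

    `tolerance` is an assumed allowance for noise, as an absolute rate.
    """
    if baseline_requests <= 0 or candidate_requests <= 0:
        raise ValueError("need at least one request on each side to compare")
    baseline_rate = baseline_errors / baseline_requests
    candidate_rate = candidate_errors / candidate_requests
    return candidate_rate <= baseline_rate + tolerance


# Example: the candidate has slightly fewer errors than what is in production.
print(no_worse_than(10, 1000, 9, 1000))  # True
```

With one small change per deployment, a failed check points at a single change rather than at a pile of them.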

I've come to believe that the far riskier approach is to make a bunch of changes at once, introduce a bunch of bugs, test and fix bugs until you feel confident, and then release this huge change all at once in the middle of the night.
