
Continuous deployment for mission-critical applications - peter123
http://www.startuplessonslearned.com/2009/12/continuous-deployment-for-mission.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+startup%2Flessons%2Flearned+%28Lessons+Learned%29&utm_content=Google+Reader
======
ghshephard
I think the problem is in our definition of "Mission Critical", which is a
fuzzy term. Depending on how "critical" a failure would be, continuous
deployment may or may not make sense.

The author talks about 2-year certification periods as though that were a big
deal, which indicates he has a different definition of "Mission Critical"
than, say, a Shuttle Engineer, Power Grid Manager, or Nuclear Regulatory
Organization (all of which experience 5+ year certification periods _after_
the release of a new system).

A multi-hour failure at amazon.com has different consequences than a
multi-hour failure of the west coast power grid.

As one who deals with organizations that have QA, Code Drop, Full Scale Test
Environment _in addition_ to their staging environment prior to allowing
anything (including software that has been fully regressed with the coverage
described in the article) to be pushed into production, I understand the
resistance to deploying "latest and greatest".

The workaround - maintain a Full Scale Test Environment (preferably,
_multiple_ such environments) internally for your Engineering, QA and
Operations Group - and drive your continuous deployment builds into _those_
environments, and then stage checkpoint releases for your "Production"
environments.
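
A minimal Python sketch of that flow, under stated assumptions: the
environment names, the `Pipeline` class and the `checkpoint` flag are all
hypothetical and not taken from the article. Every build is pushed to the
internal environments automatically; production only receives builds that
someone has explicitly marked as a checkpoint release.

```python
from dataclasses import dataclass, field

# Hypothetical environment names, chosen for illustration only.
INTERNAL_ENVS = ["engineering", "qa", "operations"]
PRODUCTION_ENV = "production"


@dataclass
class Pipeline:
    # Maps environment name -> build ids deployed there, in order.
    deployed: dict = field(default_factory=dict)

    def deploy(self, env: str, build_id: str) -> None:
        # Stand-in for a real deployment step; here we only record it.
        self.deployed.setdefault(env, []).append(build_id)

    def on_new_build(self, build_id: str, checkpoint: bool = False) -> None:
        # Every build is driven continuously into the internal environments.
        for env in INTERNAL_ENVS:
            self.deploy(env, build_id)
        # Only builds explicitly marked as checkpoints reach production.
        if checkpoint:
            self.deploy(PRODUCTION_ENV, build_id)


pipeline = Pipeline()
pipeline.on_new_build("build-1")
pipeline.on_new_build("build-2")
pipeline.on_new_build("build-3", checkpoint=True)
print(pipeline.deployed["qa"])            # all three builds
print(pipeline.deployed[PRODUCTION_ENV])  # only the checkpoint build
```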

The only downside is that production customers don't get the opportunity to
provide rapid feedback on new features, so you still get a bit of the
negatives associated with waterfall-class
requirements-architecture-design-implement lifecycles, but you are able to
enjoy the benefits of rapid prototyping and bug triage internally that are
associated with continuous builds, massive automation frameworks, and
close-to-real-world deployments of your software.

~~~
justinfreitag
Continuous deployment is all about avoiding such failures. As such, the more
"critical" the system is, the more valuable this form of deployment is (though
I believe it's also valuable for less critical systems!). And of course,
continuous deployment is not complete without shakedown and back-out
procedures.

~~~
ghshephard
I don't buy that the more critical a system is, the more valuable continuous
deployment is. The downside of continuous deployment is that you see more
problems in production than you would with a fully regressed build. The
advantage is that those problems cost 1/10th as much to fix, releases occur
much more frequently, and you have a tight feedback chain between your
customer and the product.

The argument for Continuous Deployment is that you _always_ see problems in
production, and that all you are introducing by moving from a staged-release
cycle to CD is 10-15% more issues (let's say 60 P2 or greater issues instead
of 50 P2 or greater issues), but the cost of fixing everything has dropped
significantly, fixes occur much, much more quickly (sometimes same day
instead of multiple months), and you are able to enjoy the productivity
advantages of software that is finely tuned to the users' needs much, much
more quickly. CD can (and usually does) result in less downtime than staged
releases - mostly because of the rapid cycle from coding
-> error detection -> problem resolution.
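
As a rough worked example of that trade-off in Python, using only the
figures from this comment (50 vs. 60 issues, fixes costing 1/10th as much).
The cost unit is an arbitrary assumption picked so the division stays in
whole numbers.

```python
# Issue counts taken from the comment; cost units are arbitrary.
STAGED_ISSUES = 50        # P2-or-greater issues under staged releases
CD_ISSUES = 60            # P2-or-greater issues under continuous deployment
STAGED_COST_PER_FIX = 10  # assumed baseline cost of one fix, in units
CD_COST_PER_FIX = STAGED_COST_PER_FIX // 10  # "1/10th as much to fix"

staged_total = STAGED_ISSUES * STAGED_COST_PER_FIX
cd_total = CD_ISSUES * CD_COST_PER_FIX

print(staged_total)  # 500 units
print(cd_total)      # 60 units
```

With these numbers, more issues reach production under CD, yet the total
cost of fixing them is far lower.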

The _only_ problem, as I see it, is that you don't have as much control over
the probability of a P1 issue hitting your system. That's fine in the case of
something like Amazon.com, where a P1 issue might cost the organization $10
million, but they've received $50 million in value from using CD. It's not
fine where a P1 issue might result in a catastrophic loss measured in tens of
billions of dollars (Power Grid, Shuttle Launch, Nuclear Systems) - for those,
you need to stage releases and ensure 100% coverage/regression/code review.
In fact, you need 100% coverage/review of your _development techniques_, not
just the code produced.
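
A back-of-the-envelope sketch of that comparison in Python. The $50 million
value and $10 million P1 cost come from the paragraph above, and $10 billion
stands in for "tens of billions". The P1 probability, and the assumption that
CD would be worth the same $50 million to a power grid, are made up purely
for illustration.

```python
def net_value(cd_value: int, p1_cost: int, p1_probability: float) -> float:
    """Value gained from CD minus the expected loss from a P1 incident."""
    return cd_value - p1_probability * p1_cost


# Assumption: an illustrative chance of a P1 over the period considered.
ASSUMED_P1_PROBABILITY = 0.5

retail = net_value(
    cd_value=50_000_000,
    p1_cost=10_000_000,
    p1_probability=ASSUMED_P1_PROBABILITY,
)
power_grid = net_value(
    cd_value=50_000_000,     # assumption: same benefit as the retail case
    p1_cost=10_000_000_000,  # low end of "tens of billions"
    p1_probability=ASSUMED_P1_PROBABILITY,
)

print(retail > 0)      # True: the expected loss is small next to the benefit
print(power_grid > 0)  # False: the expected loss swamps the benefit
```

In the retail case the result stays positive even if the P1 is certain
($50 million - $10 million), which is why the extra risk is tolerable there.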

~~~
skmurphy
I think another way of looking at this is to ask what you can do to make
critical systems more robust and resilient.

The capability to monitor, deploy, and roll back new features/fixes on short
notice would seem to be a part of that, even if actually used infrequently. I
would think that you would need to be able to respond to a change in the
environment the system is operating in as much as ensure that a new release
doesn't contain bugs that would manifest in your current understanding of the
operating environment.

I am suggesting that much if not all of the infrastructure needed to reliably
patch/rollback critical systems can also be used for continuous deployment at
the option of the development team and/or the customer.

So in the event that a P1 hit a continuously operating network application
(e.g. power grid), the ability to deploy and roll back new features rapidly in
response might be a valuable option to have. It's an approach that increases
resilience. This does not mean you have to do this all of the time.
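
A minimal Python sketch of the deploy / monitor / roll back capability
described above. The `Service` class, the version strings and the health
check are all hypothetical stand-ins; a real system would replace them with
its own deployment tooling and monitoring.

```python
from typing import Callable, List


class Service:
    """Tracks the release history of a hypothetical service."""

    def __init__(self, initial_version: str) -> None:
        self.history: List[str] = [initial_version]

    @property
    def current(self) -> str:
        return self.history[-1]

    def deploy(self, version: str) -> None:
        self.history.append(version)

    def rollback(self) -> None:
        # Never roll back past the first known version.
        if len(self.history) > 1:
            self.history.pop()


def deploy_with_rollback(service: Service, version: str,
                         is_healthy: Callable[[str], bool]) -> bool:
    """Deploy `version`, then roll back if the health check fails.

    Returns True if the new version stayed in place.
    """
    service.deploy(version)
    if is_healthy(service.current):
        return True
    service.rollback()
    return False


# Hypothetical health check: pretend version "2.1" is faulty.
def check(version: str) -> bool:
    return version != "2.1"


grid = Service("2.0")
kept_bad = deploy_with_rollback(grid, "2.1", check)
kept_good = deploy_with_rollback(grid, "2.2", check)
print(kept_bad, kept_good, grid.current)
```

The same machinery serves both the rare emergency patch and, if the team
chooses, frequent releases.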

~~~
ghshephard
I absolutely agree with you that continuous deployment increases resilience.
The problem is that it also increases instability.

Perhaps the closest example I can think of with regard to continuous
deployment in mission-critical situations is the Mars rovers - I think
they had some real-time deployment of new code. But the implications of a
problem with them were relatively minor - a few hundred million dollars, and
no lives lost.

Are there any examples of continuous deployment in a scenario in which
hundreds of lives and/or billions of dollars are at stake?

------
dshah
Eric has an uncanny knack for helping us see the light.

My favorite part:

"These beliefs are rooted in fears that make sense. But, as is often the case,
the right thing to do is to address the underlying cause of the fear."

This line of reasoning is generally much more persuasive than the "you're an
idiot for being scared of this, get over it."

And for the record, I think this whole continuous deployment thing is on the
path to truth and justice.

~~~
kiba
Hmm. I actually don't have fear of continuous deployment. I just see it as a
natural thing to do.

It makes sense to me that you want people to test your game for gameplay
feedback so that you can tweak it to perfection.

------
aik
Great stuff.

Am I correct in this statement of why a continuous-deployment model does not
suffer from an increase in bugs:

In a non-continuous-deployment model: With a huge QA process, a lot of bugs
are found and fixed up front so the product is more bug-free once shipped.

In a continuous-deployment model: A less-tested product is shipped often;
however, with an engineer-client feedback model in place, bugs can be
quickly and easily reported to engineers. In addition, since the product is
shipped often, bugs can be fixed quickly, before most clients see them.

Overall, continuous-deployment results in more and happier clients due to the
quick response the company provides?

------
grogers
I understand that continuous deployment could be very beneficial, but how do
you change a customer from using US Army type deployment to accepting
continuous deployment? And if you need to satisfy multiple different
customers, each with their own taste or distaste for risk and new releases,
how do you make the transition to continuous deployment?

It seems crucial to the benefit of continuous deployment that your code
actually be used by lots of people as soon as you check it in, so that
regressions are found immediately. If you can't get that to happen, the entire
process grinds to a halt.

~~~
teej
I don't think continuous deployment necessitates deploying to production at
the end. I worked at a startup that used continuous deployment to deploy to
staging, then used push deployment to QA and to production. This worked really
well with the mostly-outsourced dev and QA teams. It gave rapid and concrete
feedback to those teams, but left high risk operations in the hands of the dev
lead and ops.

