Continuous Deployment (avc.com)
179 points by cwan on Feb 12, 2011 | 82 comments

One of the great benefits of continuous deployment is that it really helps you home in on a useful testing philosophy.

We've been doing continuous deployment at Ars for about 1.5 years. We would generally write tests for our apps before that, but they were somewhat directionless (and awfully easy to blow off for a "minor" change). Once our tests became crucial, we did a much better job of picking relevant tests and appreciating their importance.

Many people visibly shudder when we tell them about our deploys. What they don't consider, though, is that manually ensuring a given release is "good" is less reliable than letting a well-instructed computer system do it. When you make changes, run your tests, and then eyeball things to make sure everything's cool, you're really just doing unstructured integration testing. You're likely to miss regressions, unintentional bugs in seemingly unrelated systems, etc.

It's great to see ideas like CI trickle into the startup world at large, to the point where we have VCs blogging about it.

The thing that's amazing to me is that anyone would choose NOT to work like this. If you are a web-based startup and cannot do multiple deployments a day, or you don't empower every last person on your technical team to do a production deploy, you are at a serious competitive disadvantage.

There is a difference between CI and continuous deployment. CI can just mean you continuously build, run tests, and (maybe) deploy to staging. The article talks about continuous deployment to production.

I asked how to roll back the changes. He said "we don't roll back, we fix the code."

Not the best idea. I don't debug as well under the extreme pressure of my site being broken with the clock ticking.

It's not always like that. John Allspaw's answer is correct in the sense that actual rollbacks aren't done -- the fix will be a new push -- but sometimes, before you do that, you will just turn the new feature off in production.

This is possible because much new behavior is controlled by "feature flags" which are associated with the server configuration. So you have the benefit of everybody, development and production, staying very close to trunk, but still being able to have radically different behavior on development machines versus production.
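As a rough sketch of the feature-flag idea, combined with the percentage rampups mentioned elsewhere in this thread: the flag names, structure, and rampup scheme below are illustrative assumptions, not any particular company's framework.

```python
# Minimal feature-flag sketch: flags live in server configuration,
# and a percentage rampup hashes each user into a stable bucket so
# the same users stay in (or out of) a partially-rolled-out feature.
import hashlib

FLAGS = {
    "new_checkout": {"enabled": True, "rampup_pct": 25},
    "beta_search": {"enabled": False, "rampup_pct": 0},
}

def flag_enabled(name, user_id):
    """Return True if feature `name` is on for this user."""
    flag = FLAGS.get(name)
    if not flag or not flag["enabled"]:
        return False
    # Hash gives a deterministic bucket in 0-99 per (flag, user) pair.
    digest = hashlib.md5(f"{name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < flag["rampup_pct"]
```

Because the bucket is derived from a hash rather than a random draw, raising `rampup_pct` from 25 to 50 keeps the original 25% enabled and adds new users, which is what makes gradual rollouts (and instant "turn it off" rollbacks) practical.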

That said, I have participated in debugging sessions on a product which used continuous deployment, and they can indeed be nerve-wracking. Personally I wouldn't want to use CD by itself, without a good culture of code review and a great test suite.

Feature flags are a very powerful feature, especially for services. Many of the most successful web companies (Amazon, Google, Facebook, Twitter) have entire frameworks and infrastructure dedicated to supporting the design pattern.

You'd probably find it easier when the deployed changes were extremely small. Our last automated deploy was a single line change. Most are bigger, but not huge.

Also, while they don't do full rollbacks, I suspect more than one fix has been "remove the offending code until we can figure out what's wrong".

I've seen single line changes cause data loss, corruption, system outages, remote root exploits...

I'm not sure that the number of lines makes the change more or less risky.

And how exactly does this relate to the way they deploy their code? As far as I can tell, they actually review code before marking it ready for deploy. Those kinds of changes would be an issue with "usual" large-batch deploys as well.

I've also seen many devs look at a broken release and instantly realize their mistake. A forgotten production config, a hard coded variable, or an empty cache. If code is in fact reviewed before going to production, the risk is significantly lower.

Nice choice of username, by the way.

The safest change to make to a stable system is the smallest change possible.

Agreed; it seems to me that the emphasis is on the time a production deployment takes, rather than on rollback vs. fix.

When you know you can't really roll back, I'm sure you're a helluva lot more sure that your code won't take down the site.

I'd bet it's semantics.

Rolling back the code-base would mean everyone else would need to roll back theirs, and you'd lose history. It's easier across the board to simply re-write what it was - a "roll forward", if you will.

Not the best idea, if you don't debug as well under such pressure. Some people do.

Or, more commonly, people think they don't perform well under this kind of pressure, but they actually do. Usual scenario:

Employee calls the CEO of the company in the middle of the night panicked, saying the site is down and there is no way they can fix the problem, they don't even have the right skill set.

CEO tells the employee to calm down and do their best.

Site is back online twelve minutes later.

More likely, the developer can handle the tech pressure, but having their manager FREAKING OUT over their shoulder and asking questions the whole time is what really causes the slowdown.

Step 0 in getting started with continuous deployment is having an organization that doesn't lose its mind every time there's a blip.

That often means you need managers who are especially good at (or at least dedicated to) deflecting the freak out/crazy.

CD won't increase your change-related incidents, but to paraphrase the IBM parable, "no one ever got fired for doing quarterly releases and heavy QA."

I used to code for a website handling about 25,000 logged-in users per day, and there was a post-commit hook on SVN which pushed changes live, immediately. Working on that taught me a lot (mostly how *not* to do things); one thing I learnt is that I code much, much faster under that kind of pressure.

* Note: I am not endorsing this as a setup, it's crazy. I don't think I've ever seen such high staff turnover.

"At Etsy, they push out code about 25 times per day. It has worked out very well for Etsy and has led to [...] improved morale."

I am in the opposite situation at work. My company has scheduled, monolithic, all hands on deck releases once a month. It's an insane legacy policy from a time before our dev team had scripted releases and automated tests running against every commit in development. We've solved the technical issues of continuous deployment, but socially, we're stuck in the dark ages. It's a huge morale killer. We've had several great developers rally to change this policy and were stonewalled, and eventually left.

Without knowing the details of this company, here is my theory:

With established companies, with established revenue and legacy product/process, management thinking is probably something along the lines of "if it ain't broke, don't fix it". They don't --pardon my french-- give a shit about morale if the profit is rolling in. If they didn't start with CD, they aren't going to do it now.

So to champion any change, you're going to have to make a case around greater revenue/margins, better (less) customer support, more defensibility in the marketplace, or something else that gets the suits all geeked.

Complementary part of this theory: Startups need every advantage they can get since they have the weaker market position when they begin. So they're more willing / required to be innovative about every last bit of their process, including deployments.

Other advice -- start slow. Prove the process on a smaller part of the product.

We adopted continuous deployment at my last company, and it was a huge win for us. It resulted in less downtime, reduced the cognitive load on our developers, and let us turn around changes and bug fixes faster, which just made everyone happier. Here is the approach we took:

We got continuous integration working first by setting up an automated test server. We used cerberus, a trivially simple Ruby gem that polls a repo, grabs code, runs tests, and reports the result. You could install this anywhere, even an old Mac mini if you wanted. We spun up a low-end server for it. We wrote great tests, got decent coverage, and made adjustments in our automated testing strategy to increase our confidence.

Then we worked on zero-downtime deployment and rollback. This was actually the hardest part for us. With regard to schema changes, if we had to hammer our database (new index on a big table) then we needed to take the site down, but otherwise our strategy was to always add to the schema, wait for the dust to settle, and then remove things later. This worked for us, but we had a relatively simple schema.

I haven't quite figured out how an ajax-heavy site would pull this off. That seems like a hard problem since you need to detect the changes and refresh your javascript code.

We then combined these two to get automated deployment to a staging server. We could have written monitoring code at this point, but we decided to punt on that, relying on email notification for crashes and watching our performance dashboard.

And finally, we set it up to deploy to production, and it just worked, and we never looked back. It was the most productive and pleasant setup I've ever worked in.

Regarding ajax heavy applications, I have been faced with that particular problem. A few of my sites are javascript heavy apps, with long flows between page reloads. If I ever change my server API in a way that would break the javascript, I need to signal to the client that it needs to refresh. (I keep track of version numbers for the server code, and whenever that is bumped, it means the client is out of sync)

This can be kinda awkward. What I've opted for is either a lightbox or some kind of message saying we need to refresh the page. But that is not ideal.

Has anyone dealt with this issue before? With javascript heavy apps, the development is more like a traditional desktop app, or a mobile app that has to deal with the client/server model interface in a non-trivial manner.

I've been doing continuous deployment (multiple production deployments per day) with an ajax-heavy application and have been doing a sort of rolling API change, where both the client and the server still function using the last generation's contract, so I rarely get a client request that the server can't deal with, or send a response that the client can't deal with. It doesn't work with every kind of change, but it has helped me.

Something that you could do (and I may wind up doing) is just having the page ask the server if it needs to refresh itself every so often.
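The polling idea above, combined with the parent comment's "last generation's contract still works" approach, boils down to a tiny version check on the server. This is a hypothetical sketch; the version constants and function name are assumptions.

```python
# The server knows the currently deployed API version and the oldest
# client version it can still talk to. A client polls with the version
# it was served; "refresh needed" just means it has fallen off the
# back of the compatibility window.
DEPLOYED_API_VERSION = 7
OLDEST_COMPATIBLE_VERSION = 6  # previous generation's contract still honored

def needs_refresh(client_version):
    """True if this client is too old to keep using the live API."""
    return client_version < OLDEST_COMPATIBLE_VERSION
```

Keeping the window one generation wide means most deploys never force a refresh; only a client that has sat on a long-lived page through two API-breaking deploys gets the lightbox.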

I haven't directly, but I just happen to be staring at Pivotal Tracker, which handles it pretty nicely. They do as you say and push a little lightbox which says "A system change has occurred requiring a refresh of this page. Please click 'OK' to reload Tracker."

And Gmail would have the same problem. But to date, I have never seen Gmail ask to refresh.

I think how Gmail handles the problem is they keep multiple instances of the server-side software running, one for each version of the API. From my experience, whenever GMail rolls out a new feature, I don't see it until I do a refresh and frequently that's what GMail tells me I need to do to see the new feature.

It's great seeing VC's get involved on this level with their portfolio companies. I imagine it makes explaining when things go wrong easier.

yes, and it is fun to be able to "go into the factory and see how things are made"

i had so much fun

Reminds me of the Calacanis article about Facebook's developer culture - continuous deployment not only makes updates faster, it democratizes the process and gives every developer the power to make things better. Good for the product, and good for the team. http://launch.is/blog/launch002-what-i-learned-from-zuckerbe...

TBH if you are a startup and have downtimes, people don't trust you. I know I won't go to a site if it failed on me on the second click because somebody was _adding features_.

If you are a startup and don't have downtimes, you're either building something trivial or a god.

We've actually had less broken downtime since we started doing automated deploys than we did beforehand. It's partially the result of good testing, partially because the changes are just so much smaller than they used to be, and partially because pushing to our master git branch is now serious business.

Another advantage is that when something breaks, the code is fresh in your mind.

If you only deploy in big steps, you have already worked on a few other things and have forgotten a lot about the (broken) code. So fixing things takes more time, and it becomes more difficult to learn the 5 whys behind the bug.

Imagine this scenario: peak traffic time in the day, and the site goes down because of a deployment. An investor sees it and threatens to pull money out. Ad networks that you work with see a report showing their advertisements weren't served during peak hours, which causes major trouble. I work for one of the biggest sites in my region, and take my word for it: uptime is essential and not something left to chance or gods.

Continuous deployment ensures delivery of features with a turnaround time fast enough for a startup's changing business model, and can't be stopped either.

So uptime and continuous deployment model has to happen at the same time.

Deployments should always be automated and revertible. If you think you can run a healthy startup with just-about-adequate handling of mistakes, perhaps you've got an easier place to work at :)

Your scenario has nothing to do with CD. In fact, if you aren't using aspects of CD, your scenario is even more dangerous.

Just getting to this now, but there's a slight misquote in Fred's post. :) I (and I think Kellan said it at the same time) said "...we don't roll back, we roll forward..."

We do that because it's simply faster and easier to roll forward than to roll back the entire deploy. Rolling forward means taking advantage of the percentage rampups of new code paths, feature and config flags to turn things off/on, and even reverting the 5 line change is simpler than rolling it all back.

So no, I'm not suggesting we let bugs in prod languish while they are debugged for a long while.

Anyone have good pointers on the best tools for continuous deployment of Django? I know there are tools like Capistrano, Fabric and Django Evolution out there but if someone has first hand experience using some of these it would be good to learn about.

I use Fabric at work, with South for migrations.

One useful technique I've found is to have Fabric deploy to a "staging" install on the target server (we still use SVN). It runs unit tests in that install, and if the tests fail, it stops deployment. If they pass, Fabric will then complete deployment to the production server (and run tests again for good measure).

Of course that depends on having good testing practices and coverage, but it helps reduce the number of stupid last-minute mistakes that creep into the repo. Fabric is also useful in numerous repetitive small tasks that you need to run without going through the rigamarole of SSH.

Generally though we avoid running South through Fabric - changes to data schema are run very carefully after backing up etc. South is great, but any migration tool should be used with due care.
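The staging-gate part of that workflow reduces to a small piece of decision logic. This sketch abstracts the actual Fabric tasks (SVN update, `manage.py test`, etc.) behind stub callables, since the real commands are environment-specific; the function and return strings are hypothetical.

```python
# Sketch of a test-gated deploy: push to staging, run the suite there,
# and only touch production if staging is green.
def deploy(run_tests, push_to):
    """`push_to(env)` deploys code; `run_tests(env)` returns pass/fail."""
    push_to("staging")
    if not run_tests("staging"):
        return "aborted: staging tests failed"
    push_to("production")
    run_tests("production")  # run again, for good measure
    return "deployed"
```

The key property is that a red staging run leaves production completely untouched, which is what catches the "stupid last-minute mistakes" mentioned above.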

Interesting. Somehow, I had never heard of South. Thanks!

For others like me, from http://south.aeracode.org/docs/about.html

South brings migrations to Django applications. Its main objectives are to provide a simple, stable and database-independent migration layer to prevent all the hassle schema changes over time bring to your Django applications.

You read about these "20 deploys a day" type situations, and it sounds great, and I'm sure it makes VCs all warm and creamy inside. But they're talking about small changes, and not all deploys are small. You can't incrementally change a database engine, for example.

What I would be more impressed by is if they could run tests against a full load of real user input, and have useful/reliable metrics come out the other end. There's no reason for the deploy to fail if you have a real mechanism for testing the deploy. I've yet to hear of a shop of the size of an Etsy that does real production/load testing.

"[...] 4.4 machine hours of automated tests to be exact. Over an hour of these tests are [...] Selenium. The rest of the time is [...] unit tests."

Impressive. But this is not a production test, where you tee the actual input from users into the system. Nor is it a load test, where you measure whether or not the site is performant under real load. For that matter, there's nothing here that indicates you debug your operations (provision/deploy/backup) at all. My comment about not seeing such an operation stands.

Read a bit farther.

> Load average, cpu usage, php errors and dies and more are sampled by the push script, as a basis line. ... A minute later the push script again samples data across the cluster and if there has been a statistically significant regression then the revision is automatically rolled back.

Precisely so, but that's not testing, it's repair. You've just rolled back from a production problem automatically. The user still saw the problem.

Sorry if this seems rude. You're definitely way ahead of the game, but you're not addressing the thing that I'm talking about.

None taken. I don't work for IMVU.
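The automated check quoted earlier ("sample a baseline, sample again after the push, roll back on a statistically significant regression") can be sketched as a simple threshold test. The 3-sigma cutoff and the shape of the samples here are illustrative assumptions, not the actual push script's rule.

```python
# Decide whether a post-push metric (error rate, load average, etc.)
# has regressed significantly relative to a pre-push baseline.
import statistics

def should_roll_back(baseline_samples, post_push_samples, sigmas=3.0):
    """True if the post-push mean exceeds the baseline mean by more
    than `sigmas` baseline standard deviations."""
    mean = statistics.mean(baseline_samples)
    stdev = statistics.stdev(baseline_samples) or 1e-9  # guard flat baselines
    post_mean = statistics.mean(post_push_samples)
    return (post_mean - mean) / stdev > sigmas
```

The parent's objection still applies: this detects and repairs damage after some users have seen it, rather than preventing it, which is exactly the gap between automated rollback and true production testing.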

I am working on a service that will offer load testing using Ruby scripts. One of the main goals is to enable continual load testing. I'm building out the REST interface now. If anyone wants to be an early adopter, drop me a line.

That's how we roll at Cloudkick. We ship the code. We're going to get all of Rackspace to do this too if we can.

Will you really be able to pull that off? I imagine certain parts of the code base you could use CI, but Rackspace servers cannot have downtime. If I were a Rackspace customer, I wouldn't want innovation - I'd want my boxes to never go down. And I believe that means a rigorous QA process.

It's not always true that development is a trade-off between fast feature development and stability. If you're building a monitoring system (like Cloudkick), more features means more stability for whatever you're monitoring.

It seems like you're creating a false dichotomy between rigorous QA and continuous deployment.

If you have a working and stable production system and multiple layers of test automation, rigorous QA gets easier, in that they only have to validate one change at a time.

Interesting stuff. I've been doing something similar with my sites for over two years now. It's more of a workflow than just a system, really.

1. A staging server where the new changes are first tested
2. A production server which gets the updated and verified working code
3. In case of any trouble (which hasn't happened in quite some time, btw), the old solution backup is kept with a date and time and can be reverted just as easily as code is updated from the staging server

You can actually use a load balancer to roll over the servers during deployment too, which takes away the factor of downtime to an extent. But this can't be done with PowerShell or any shell script; a network admin has to be present at the time of deployment, which makes it difficult.

I have had next to no downtime on my servers. The last time I remember my sites going down was during a DNS shuffle well over 6 months ago, never because of some developer or sysadmin screw-up. And yes, I do have more than ten deployments on a single site in a day.

We use Capistrano[0] to automate releases in the same manner. It keeps multiple copies of the site in timestamped directories and has a 'current' symlink which just points to the latest. If you ever need to roll back, just point the symlink to a different timestamp.

Similarly, for our rails apps we use unicorn[1] to do no-downtime deploys. When we roll out new code, unicorn brings up new workers while the old workers are still running. The old workers are not sent any new requests, and once they finish their in-progress requests, they are killed.

[0] https://github.com/capistrano/capistrano/wiki - While most of the docs talk about Rails, Capistrano isn't Ruby-specific. We use it to deploy node.js apps as well.
[1] http://unicorn.bogomips.org/ - A forking Ruby HTTP server

Do you also have timestamped databases? How do you deal with schema changes between versions?

Most of our deploys don't have a schema change, and often when adding tables or columns the migration can be run before the new code is deployed. When dropping or renaming columns you'd usually need to first deploy code that is able to gracefully handle the migration.

However, sometimes it's not worth the extra code complexity and we just take the site down and migrate.
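The "deploy code that gracefully handles the migration first" ordering described above is often called the expand/contract pattern. For a column rename it might look like the steps below; the step list and helper function are illustrative, not the poster's actual code.

```python
# Expand/contract ordering for renaming a column, each step its own
# small deploy so the site never sees a schema its code can't handle:
#   1. ALTER TABLE ... ADD COLUMN new_name      (additive, safe)
#   2. Deploy code that writes both columns and reads whichever is set
#   3. Backfill old values into new_name
#   4. Deploy code that reads/writes only new_name
#   5. ALTER TABLE ... DROP COLUMN old_name     (cleanup, later)

def display_name(row):
    """Step-2 read path: tolerate rows written by either code version."""
    return row.get("new_name") or row.get("old_name")
```

The cost is exactly the "extra code complexity" the comment mentions: the tolerant read path in step 2 exists only to bridge the deploys, and gets deleted in step 4.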

So what do you do when you want to make a change that isn't a logical evolution of your existing codebase? Say, for example, you have an ecommerce site, and you're switching payment processors to use one with an entirely different API. Surely you can't just push a change like this straight to your production server.

Make the code change conditional and push it out turned off; then use a cookie (or query string or URL) to force some users into the new path...

If you don't already have an A/B or multivariate framework in place, first push the no-effect change that makes the old code path conditional on XX.

IMO, adding a new payment provider is a logical evolution. If your intent is to turn the old one off, I still believe it's worth the complexity to run them both until you're sure.
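A minimal sketch of "push it out off, then force some users into the new path": the new processor ships dark behind a config flag, and a cookie opts a request into it for testing in production. Flag and cookie names are hypothetical.

```python
# Decide which payment path a request takes. The new processor is off
# by default; a cookie lets developers (or a test cohort) exercise it
# in production before the flag is flipped for everyone.
def use_new_processor(config, request_cookies):
    if config.get("new_processor_enabled"):
        return True
    return request_cookies.get("force_new_processor") == "1"
```

Once the forced-in traffic looks clean in the logs, flipping `new_processor_enabled` in config is the whole launch - and turning it back off is the whole rollback.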

Yep, after all the tests run. I'm sure you have good test coverage on your payment stuff, right?

Besides, you should have your calls to your payment processor both abstracted and logged, so switching processors shouldn't change the way your application files payments. And you can review the logs to make sure things are still working, too. Which you should do anyway, code changes or not.

Continuous Delivery is a great book about how to build such deployment systems and infrastructure (I'm only about a third of the way through it):


Shameless plug -- we're building hosted continuous integration and deployment for heroku. We'll be opening up our beta real soon, email hn@zenslap.me to get an invite. More information at http://zenslap.me.

Does anyone have experience trying this in a more enterprise-friendly product? We'd have a struggle figuring out how to communicate these changes to our customers on a daily basis, not to mention the absolute horror if something changed/broke right before or during a crucial production run. There's a lot of things I want to implement from these ideas, but the examples I see are usually high-volume consumer websites, so I am curious if people in other areas have success with this.

That's pretty cool. This could work in mobile, but Apple's review process kills it there, and the monitoring part is hard. App stores, though, make this possible. I wish Apple would allow a post-review path for trusted developers, those who have established a track record of successful releases.

If a mobile app wanted to be updated every single week, I would get annoyed at some point. The point is that web applications can do this in a seamless manner.

There's no technical reason why that should be so. Google Chrome doesn't make you click a button to update it, or even tell you that it has updated. It just does it.

Mobile is different because people have to pay for bandwidth

Not a problem with Google's binary delta based updater.


Unless you live in Canada.

one of the many reasons i prefer android. this could work in the android model

I don't even roll out changes really; I edit the live production code, one small change at a time. I always felt this was kind of dumb, and I know it brings my site down for a few seconds to a minute here and there (about 1k visitors are on the site at any given time). But I've always felt this allows me to get 5x more work done than I would within a structured rollout system with version control, etc., and it has given me a great sense of instant satisfaction that motivates me to keep working.

You don't use version control?

I'd seriously recommend learning git (or svn). After a few days learning curve it won't slow you down in the slightest, and it will have a huge effect on the quality of your codebase since you won't need to keep old code around and you'll always be able to figure out what you changed, why and when.

Is anyone doing this at a site where serious money is involved? For instance, that 4 minutes of downtime described in the article would be a loss of something like $120k at Amazon.

Looking at the opportunity cost lost by four minutes of downtime without comparing it to the increased cost of not doing continuous deployment doesn't seem like a fair comparison.

Also, with huge monolithic deployments, the risk of much longer downtime is increased. If you can get deployments into non-events, you don't have as many catastrophic problems.

there is money involved at Etsy. every minute of downtime costs them $1000

Etsy exhibits an ever-changing array of glitches and bugs which are difficult to pin down, monitor and understand. These are exhibited in both the web interface and the API.

I think that their plan to 'push code' 25 times a day or whatever plays a role in that. And really, I don't think having their dogs, VCs and first day employees publish changes to the site helps.

I think the issue is which test they have written - their tests aren't catching a few important details here and there.

I wonder if they have some kind of automated test suite just to make sure everything works fine. Having worked on large codebases, it's almost impossible to make sure everything is still alright manually, after a new deploy.

Good tests are almost a pre-requisite to continuous deployment. Usually code is only deployed after passing the tests. Check what "continuous integration" is.

You can do continuous deployment without an automated test suite. Heck, I've hacked source code on production servers more times than I like to admit, skipping the deployment part altogether :p

Tests are the reason why continuous deployment works. The path going from what the programmer has typed to running on the production server becomes so short, running the test suite non-stop increases confidence on the changes.

Yes, there are tons of unit tests that are run (voluntarily) before commits and (automatically) on staging pushes.

And then once code is live we have many system and business-level monitors in place, so we know almost immediately if anything's wrong. More info about that here:


One thing to add is our test wizards have also conjured up a system we call the "try server", which allows the execution of all the test suites asynchronously on fast machines before you commit. So, you don't have to wait for your laptop or machine you're working on to run the tests, you just kick it off and get an email with your results in a few minutes. This makes it so it's painless to be sure you'll never commit a red build.

Yeah, I was thinking about doing a similar thing here at Cloudkick.

Currently, our test suite is not that large and it takes around 6-8 minutes to complete, but with testing every minute counts (testing is generally not that fun and a slow test suite just makes it more painful).

We have two types of tests:

- Twisted tests: these run asynchronously and finish pretty fast, so they are generally not that problematic.
- Django tests: Django tests don't run in parallel, so they are pretty slow. Recently, I was playing around with the Django test runner and made some modifications to it to run the tests in parallel. Now the Django tests finish around 50% faster.

The only problem with this solution is that it is a "hack" and it requires some modifications to the Django core (I guess I should play more with the nose parallel test runner).

We also use some other "tricks" which make tests run faster - for example, MySQL data directory on a test server is stored on a ram disk.

Coupled with slow tests is the fact that unless you have something like the try server it ties up your machine while the tests are running. Being able to just kick it off and continue working on the next feature or bug goes a long way to reducing the pain.

There is a great example of continuous deployment, good tools and methods here: http://vimeo.com/14830327

Big changes create big problems. Little changes create little problems

The real problem comes when little changes (unknowingly) create big problems

