We've been doing continuous deployment at Ars for about 1.5 years. We would generally write tests for our apps before that, but they were somewhat directionless (and awfully easy to blow off for a "minor" change). Once our tests became crucial to the deploy, we did a much better job of picking relevant tests and taking their importance seriously.
Many people visibly shudder when we tell them about our deploys. What they don't consider, though, is that manually ensuring a given release is "good" is less reliable than letting a well-instructed computer system do it. When you make changes, run your tests, and then eyeball things to make sure everything's cool, you're really just doing unstructured integration testing. You're likely to miss regressions, unintentional bugs in seemingly unrelated systems, etc.
The thing that's amazing to me is that anyone would choose NOT to work like this. If you are a web-based startup and cannot do multiple deployments a day, or you don't empower every last person on your technical team to do a production deploy, you are at a serious competitive disadvantage.
Not the best idea. I don't debug as well under the extreme pressure of my site being broken with the clock ticking.
This is possible because much of the new behavior is controlled by "feature flags" tied to the server configuration. So you get the benefit of everybody, development and production, staying very close to trunk, while still being able to have radically different behavior on development machines versus production.
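For anyone unfamiliar with the pattern, here's a minimal sketch of config-driven flags in Python; the config path, flag names, and per-user ramp-up field are my own assumptions, not the poster's setup:

    import json

    # Load flags from server config; dev and production ship the same
    # trunk code but read different files. (Assumed path/format.)
    with open("/etc/app/flags.json") as f:
        FLAGS = json.load(f)
    # dev:        {"new_editor": {"enabled": true, "rollout_pct": 100}}
    # production: {"new_editor": {"enabled": false}}

    def flag_enabled(name, user_id=None):
        cfg = FLAGS.get(name, {})
        if not cfg.get("enabled", False):
            return False
        pct = cfg.get("rollout_pct", 100)
        if user_id is None:
            return pct >= 100
        # Deterministic per-user bucketing keeps a percentage ramp-up stable.
        return hash((name, user_id)) % 100 < pct

The same trunk then branches at runtime: if flag_enabled("new_editor", user.id) is true you take the new code path, otherwise the old one, and production flips when the config does.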
That said, I have participated in debugging sessions on a product which used continuous deployment, and they can indeed be nerve-wracking. Personally I wouldn't want to use CD by itself, without a good culture of code review and a great test suite.
Also, while they don't do full rollbacks, I suspect more than one fix has been "remove the offending code until we can figure out what's wrong".
I'm not sure that the number of lines makes the change more or less risky.
Rolling back the code-base would mean everyone else would need to roll back theirs, and you'd lose history. It's easier across the board to simply commit a new change that restores the old behavior - a "roll forward", if you will.
Employee calls the CEO of the company in the middle of the night panicked, saying the site is down and there is no way they can fix the problem, they don't even have the right skill set.
CEO tells the employee to calm down and do their best.
Site is back online twelve minutes later.
That often means you need managers who are especially good at (or at least dedicated to) deflecting the freak out/crazy.
CD won't increase your change-related incidents, but to paraphrase the IBM parable, "no one ever got fired for doing quarterly releases and heavy QA."
* Note: I am not endorsing this as a setup, it's crazy. I don't think I've ever seen such high staff turnover.
I am in the opposite situation at work. My company has scheduled, monolithic, all-hands-on-deck releases once a month. It's an insane legacy policy from a time before our dev team had scripted releases and automated tests running against every commit in development. We've solved the technical issues of continuous deployment, but socially, we're stuck in the dark ages. It's a huge morale killer. Several great developers have rallied to change this policy, been stonewalled, and eventually left.
With established companies, with established revenue and a legacy product/process, management thinking is probably something along the lines of "if it ain't broke, don't fix it". They don't --pardon my French-- give a shit about morale if the profit is rolling in. If they didn't start with CD, they aren't going to adopt it now.
So to champion any change, you're going to have to make a case around greater revenue/margins, better (less) customer support, more defensibility in the marketplace, or something else that gets the suits all geeked.
Complementary part of this theory:
Startups need every advantage they can get since they have the weaker market position when they begin. So they're more willing / required to be innovative about every last bit of their process, including deployments.
Other advice -- start slow. Prove the process on a smaller part of the product.
We got continuous integration working first by setting up an automated test server. We used cerberus, a trivially simple Ruby gem that polls a repo, grabs the code, runs the tests, and reports the result. You could install it anywhere, even on an old Mac mini if you wanted; we spun up a low-end server for it. We wrote great tests, got decent coverage, and made adjustments to our automated testing strategy to increase our confidence.
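For the curious, the whole poll-test-report loop fits on a page. Here's a toy Python equivalent (cerberus itself is Ruby, and the paths and test command below are assumptions):

    import subprocess
    import time

    REPO_DIR = "/srv/ci/checkout"  # assumed checkout location
    last_seen = None

    while True:
        # Pull the latest code and see if there's a new revision.
        subprocess.call(["git", "pull", "--ff-only"], cwd=REPO_DIR)
        rev = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], cwd=REPO_DIR).decode().strip()
        if rev != last_seen:
            last_seen = rev
            # Run the suite; a real setup would email the result.
            ok = subprocess.call(
                ["python", "manage.py", "test"], cwd=REPO_DIR) == 0
            print("%s: %s" % (rev[:8], "PASS" if ok else "FAIL"))
        time.sleep(60)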
Then we worked on zero-downtime deployment and rollback. This was actually the hardest part for us. With regard to schema changes, if we had to hammer our database (say, a new index on a big table) we needed to take the site down, but otherwise our strategy was to always add to the schema, wait for the dust to settle, and remove things later. This worked for us, but we had a relatively simple schema.
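Concretely, the add-now-remove-later strategy looks something like this; the table and column names and the MySQLdb driver are made up for illustration:

    import MySQLdb

    conn = MySQLdb.connect(host="localhost", user="app",
                           passwd="secret", db="app")
    cur = conn.cursor()

    # Deploy N: add the new column as nullable, so code that doesn't
    # know about it keeps working.
    cur.execute("ALTER TABLE articles ADD COLUMN body_html TEXT NULL")

    # ...ship code that writes both columns, then code that reads only
    # the new one, and let the dust settle...

    # Deploy N+2, days later: drop the now-unreferenced column.
    cur.execute("ALTER TABLE articles DROP COLUMN body_markdown")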
We then combined these two to get automated deployment to a staging server. We could have written monitoring code at this point, but we decided to punt on that, relying on email notification for crashes and watching our performance dashboard.
And finally, we set it up to deploy to production, and it just worked, and we never looked back. It was the most productive and pleasant setup I've ever worked in.
This can be kinda... awkward. What I've opted for is either a lightbox or some kind of message saying we need to refresh the page. But that's not ideal.
Something that you could do (and I may wind up doing) is just having the page ask the server if it needs to refresh itself every so often.
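The server half of that can be tiny. A sketch as a Django view, since this thread is Django-heavy; the revision file is an assumption about what your deploy script writes out:

    from django.http import HttpResponse

    def deployed_rev(request):
        # DEPLOYED_REV is (hypothetically) written by the deploy script.
        with open("/srv/app/DEPLOYED_REV") as f:
            return HttpResponse(f.read().strip(), content_type="text/plain")

The page then polls this endpoint every minute or so and shows the "please refresh" message when the value changes.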
I had so much fun.
We've actually had less broken downtime since we started doing automated deploys than we did beforehand. It's partially the result of good testing, partially because the changes are just so much smaller than they used to be, and partially because pushing to our master git branch is now serious business.
If you only deploy in big steps, you have already worked on a few other things and have forgotten a lot about the (broken) code. So fixing things takes more time, and it becomes more difficult to learn the 5 whys behind the bug.
Continuous deployment ensures feature delivery and turnaround fast enough for a startup's changing business model, so it can't be dropped either.
So uptime and the continuous deployment model have to happen at the same time.
Deployments should always be automated and revertible. If you think you can run a healthy startup with a merely adequate handle on mistakes, perhaps you've got an easier place to work at :)
We do that because it's simply faster and easier to roll forward than to roll back the entire deploy. Rolling forward means taking advantage of the percentage ramp-ups of new code paths and the feature and config flags to turn things off/on; even reverting the 5-line change is simpler than rolling it all back.
So no, I'm not suggesting we let bugs in prod languish while they are debugged for a long while.
One useful technique I've found is to have Fabric deploy to a "staging" install on the target server (we still use SVN). It runs unit tests in that install, and if the tests fail, it stops deployment. If they pass, Fabric then completes deployment to the production server (and runs tests again for good measure).
Of course that depends on having good testing practices and coverage, but it helps reduce the number of stupid last-minute mistakes that creep into the repo. Fabric is also useful in numerous repetitive small tasks that you need to run without going through the rigamarole of SSH.
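A condensed fabfile sketch of that staging-then-production flow, using Fabric 1.x; the hosts, paths, and test command are placeholders:

    from fabric.api import abort, cd, env, run, settings

    env.hosts = ["app1.example.com"]

    def deploy():
        # Stage one: update the staging checkout and run the suite there.
        with cd("/srv/app/staging"):
            run("svn update")
            with settings(warn_only=True):
                result = run("python manage.py test")
            if result.failed:
                abort("Tests failed in staging; stopping deployment.")
        # Stage two: tests passed, promote to production, test again.
        with cd("/srv/app/production"):
            run("svn update")
            run("python manage.py test")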
Generally though we avoid running South through Fabric - changes to data schema are run very carefully after backing up etc. South is great, but any migration tool should be used with due care.
For others like me, from http://south.aeracode.org/docs/about.html
> South brings migrations to Django applications. Its main objectives are to provide a simple, stable and database-independent migration layer to prevent all the hassle schema changes over time bring to your Django applications.
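To make that concrete, here's a minimal South schema migration in the additive style discussed earlier in the thread (model and field names invented):

    from south.db import db
    from south.v2 import SchemaMigration
    from django.db import models

    class Migration(SchemaMigration):

        def forwards(self, orm):
            # Add the column as nullable so running code is unaffected.
            db.add_column('blog_post', 'subtitle',
                          models.CharField(max_length=200, null=True))

        def backwards(self, orm):
            db.delete_column('blog_post', 'subtitle')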
What I would be more impressed by is if they could run tests against a full load of real user input, and have useful/reliable metrics come out the other end. There's no reason for the deploy to fail if you have a real mechanism for testing the deploy. I've yet to hear of a shop of the size of an Etsy that does real production/load testing.
Impressive. But this is not a production test, where you tee the actual input from users into the system. Nor is it a load test, where you measure whether the site is performant under real load. For that matter, there's nothing here that indicates you debug your operations (provision/deploy/backup) at all. My comment about not seeing such an operation stands.
> Load average, cpu usage, php errors and dies and more are sampled by the push script, as a basis line. ... A minute later the push script again samples data across the cluster and if there has been a statistically significant regression then the revision is automatically rolled back.
Sorry if this seems rude. You're definitely way ahead of the game, but you're not addressing the thing that I'm talking about.
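For what it's worth, the check quoted above could be approximated in a few lines; the metric source, threshold, and rollback hook below are stand-ins, not Etsy's actual push script:

    import math
    import time

    def sample_error_rate():
        """Stand-in for sampling php errors/load across the cluster."""
        return [0.0] * 20  # replace with real samples

    def mean(xs):
        return sum(xs) / float(len(xs))

    def stddev(xs):
        m = mean(xs)
        return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

    def significantly_worse(before, after, z=3.0):
        """Crude test: post-push mean more than z sigmas above baseline."""
        sigma = stddev(before) or 1e-9
        return (mean(after) - mean(before)) / sigma > z

    baseline = sample_error_rate()
    # ...push the new revision here...
    time.sleep(60)
    if significantly_worse(baseline, sample_error_rate()):
        print("regression detected; rolling back")  # trigger rollback here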
If you have a working and stable production system and multiple layers of test automation, rigorous QA gets easier, in that they only have to validate one change at a time.
1. A staging server where the new changes are first tested
2. A production server which gets the updated and verified working code
3. In case of any trouble (which hasn't happened in quite some time, btw) the old solution is kept as a dated backup and can be reverted just as easily as code is updated from the staging server
You can actually use a load balancer to roll over the servers during deployment too, which takes the downtime factor away to an extent. But this can't be done with PowerShell or any other shell script alone; a network admin has to be present at the time of deployment, which makes it difficult.
I have had next to no downtime on my servers. The last time I remember my sites going down was during a DNS shuffle well over 6 months ago, and never because of a developer or sysadmin screw-up. Yes, I do have more than ten deployments on a single site in a day.
Similarly, for our rails apps we use unicorn to do no-downtime deploys. When we roll out new code, unicorn brings up new workers while the old workers are still running. The old workers are not sent any new requests, and once they finish their in-progress requests, they are killed.
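The deploy side of that is just a signal dance against the unicorn master. A sketch in Python; the pid-file path and the sleep are assumptions, and a real script would poll for the .oldbin pid file rather than sleeping:

    import os
    import signal
    import time

    PID_FILE = "/var/run/unicorn.pid"  # assumed location

    with open(PID_FILE) as f:
        old_master = int(f.read().strip())

    # USR2 asks the running master to re-exec itself with the new code;
    # unicorn renames the old pid file to unicorn.pid.oldbin.
    os.kill(old_master, signal.SIGUSR2)

    # Give the new master time to boot its workers.
    time.sleep(10)

    # QUIT gracefully stops the old master: its workers finish their
    # in-progress requests and then exit.
    os.kill(old_master, signal.SIGQUIT)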
While most of the docs talk about rails, capistrano isn't ruby-specific. We use it to deploy node.js apps as well.
A forking Ruby HTTP server.
However, sometimes it's not worth the extra code complexity and we just take the site down and migrate.
If you don't already have an A/B or multivariate framework in place, first push a no-effect change that wraps the old code, to establish that it now needs to be conditional on XX.
IMO, adding a new payment provider is a logical evolution. If your intent is to turn the old one off, I still believe it's worth the complexity to run them both until you're sure.
Besides, you should have your calls to your payment processor both abstracted and logged, so switching processors shouldn't change the way your application files payments. And you can review the logs to make sure things are still working, too. Which you should do anyway, code changes or not.
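In code, that abstraction can be as simple as this; the class and method names are hypothetical:

    import logging

    log = logging.getLogger("payments")

    class PaymentProcessor(object):
        def charge(self, user_id, cents):
            raise NotImplementedError

    class OldProcessor(PaymentProcessor):
        def charge(self, user_id, cents):
            log.info("old charge user=%s cents=%s", user_id, cents)
            # ...call the legacy provider's API here...
            return True

    class NewProcessor(PaymentProcessor):
        def charge(self, user_id, cents):
            log.info("new charge user=%s cents=%s", user_id, cents)
            # ...call the new provider's API here...
            return True

    def get_processor(use_new):
        """Flag-controlled, so both providers can run side by side."""
        return NewProcessor() if use_new else OldProcessor()

Because every charge goes through one interface and gets logged, you can flip providers with a flag and grep the logs to confirm payments are still being filed.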
I'd seriously recommend learning git (or svn). After a few days learning curve it won't slow you down in the slightest, and it will have a huge effect on the quality of your codebase since you won't need to keep old code around and you'll always be able to figure out what you changed, why and when.
Also, with huge monolithic deployments, the risk of much longer downtime is increased. If you can get deployments into non-events, you don't have as many catastrophic problems.
I think that their plan to 'push code' 25 times a day or whatever plays a role in that. And really, I don't think having their dogs, VCs, and first-day employees publish changes to the site helps.
I think the issue is which tests they have written - their tests aren't catching a few important details here and there.
You can do continuous deployment without an automated test suite. Heck, I've hacked source code on production servers more times than I like to admit, skipping the deployment part altogether :p
Tests are the reason continuous deployment works. The path from what the programmer typed to code running on the production server becomes so short that you run the test suite non-stop, which increases confidence in every change.
And then once code is live we have many system and business-level monitors in place, so we know almost immediately if anything's wrong. More info about that here:
Currently, our test suite is not that large and it takes around 6-8 minutes to complete, but with testing every minute counts (testing is generally not that fun and a slow test suite just makes it more painful).
We have two types of tests:
- Twisted tests - these run asynchronously and finish pretty fast, so they are generally not that problematic
- Django tests - Django tests don't run in parallel, so they are pretty slow. Recently, I was playing around with the Django test runner and made some modifications to it to run the tests in parallel. Now the Django tests finish around 50% faster.
The only problem with this solution is that it is a "hack" and it requires some modifications to the Django core (I guess I should play more with the nose parallel test runner).
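A cruder shard-by-app approach avoids touching Django core entirely. A sketch, where the app labels, worker count, and the settings hook for per-shard databases are all assumptions:

    import os
    import subprocess

    APPS = ["accounts", "billing", "feeds", "search"]  # assumed labels
    WORKERS = 2

    # Deal the apps out across the workers round-robin.
    shards = [APPS[i::WORKERS] for i in range(WORKERS)]
    procs = []
    for i, shard in enumerate(shards):
        # Hypothetical: settings.py reads this to give each shard its
        # own test database name, so the runs don't collide.
        env = dict(os.environ, TEST_DB_SUFFIX=str(i))
        procs.append(subprocess.Popen(
            ["python", "manage.py", "test"] + shard, env=env))

    codes = [p.wait() for p in procs]
    print("FAIL" if any(codes) else "PASS")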
We also use some other "tricks" which make tests run faster - for example, MySQL data directory on a test server is stored on a ram disk.
The real problem comes when little changes (unknowingly) create big problems.