For this to work, every action that results in a database write needs to go into a log, and that action log needs to be replayable by the updated version.
In the context of a web application, you have:
1. Take your database backup, start recording action log.
2. Perform your database migration against your database backup.
3. Install your updated web application (not available to users yet).
4. Test your updated web application.
5. Disable your existing web application.
6. Replay the contents of your action log using the updated web application.
7. Make your updated web application available to your users.
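The record-and-replay idea in steps 1 and 6 can be sketched in a few lines. This is a minimal in-memory mock-up, with hypothetical action names and an in-memory dict standing in for the database; a real action log would be durable (a file or a queue) and totally ordered across processes.

```python
import json

class ActionLog:
    """Append-only record of write actions, replayable against a new app version."""

    def __init__(self):
        self.entries = []

    def record(self, action, payload):
        # Serialize each write so it can be re-applied later, in order.
        self.entries.append(json.dumps({"action": action, "payload": payload}))

    def replay(self, handlers):
        """Re-apply every logged action using the updated app's write handlers."""
        for raw in self.entries:
            entry = json.loads(raw)
            handlers[entry["action"]](entry["payload"])

# Step 1: the live (old) app records every write while the migration runs.
log = ActionLog()
log.record("create_user", {"id": 1, "name": "alice"})
log.record("rename_user", {"id": 1, "name": "bob"})

# Step 6: the updated app replays the log against the migrated database.
migrated_db = {}

def upsert_user(payload):
    migrated_db[payload["id"]] = payload["name"]

# In this toy example both actions happen to map to the same handler.
log.replay({"create_user": upsert_user, "rename_user": upsert_user})
```

The important property is that the handlers belong to the *updated* version, so replayed writes land in the new schema.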
It's definitely not a trivial matter to sort out: it means a lot of time spent on your deployment process. For example, you have to be set up to run both versions simultaneously, which likely means a lot of mucking about with URL rewriting. The application also has to be built to support it.
Maybe there is a better way?
The author makes some amazing points that I'm surprised intelligent people are missing.
Let me give an example. We are currently migrating from storing blobs of data in Voldemort (a key/value store) to storing them in S3. They should never have been in there in the first place, but whatever. We're going to do it with "zero" migration time. In fact, we're already doing it.
- Set up a job that copies existing data in Voldemort to S3.
- Deploy a minor release of our code that multiplexes the current writes to both Voldemort and S3.
- Continue migrating existing data.
- When existing data is finished migrating, deploy a new release that forces all traffic to S3 instead of Voldemort.
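The multiplexing step above can be sketched as a wrapper over both stores. This is a simplified illustration, not our actual code: the `BlobStore` stand-in and the class and method names are hypothetical, and a real client would talk to Voldemort or S3 over the network.

```python
class BlobStore:
    """Stand-in for a blob client (Voldemort or S3); a real one would do I/O."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

class MultiplexingStore:
    """The dual-write phase: write to both stores, read from the old one
    until the backfill of existing data finishes."""

    def __init__(self, old, new, read_from_new=False):
        self.old, self.new = old, new
        self.read_from_new = read_from_new  # flipped once backfill completes

    def put(self, key, value):
        self.old.put(key, value)  # current source of truth
        self.new.put(key, value)  # duplicate write to the new store

    def get(self, key):
        return (self.new if self.read_from_new else self.old).get(key)

voldemort, s3 = BlobStore(), BlobStore()
store = MultiplexingStore(voldemort, s3)
store.put("avatar:42", b"blob-bytes")  # lands in both stores
# ...backfill job copies any pre-existing Voldemort keys into S3 here...
store.read_from_new = True             # final release: all reads go to S3
```

Because every new write already exists in both stores, the final cutover release only changes where reads go.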
People need to learn to do things like dark launching and feature flags. Dark launching lets you exercise new code paths with no impact on the user. Feature flags give you the ability to enable features for some or all of your users. Feature flags are also a great way to A/B test.
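To make the distinction concrete, here is one way the two techniques can combine in a request handler. Everything here is hypothetical (the flag store, the `search` functions, the flag names); it's a sketch of the pattern, not a specific library's API.

```python
import logging

# Hypothetical in-process flag store; real ones live in config or a database.
FLAGS = {"new_search": {"enabled_for": {"alice"}, "dark": True}}

def old_search(query):
    return [query.lower()]

def new_search(query):
    return [query.lower()]

def search(user, query):
    results = old_search(query)
    flag = FLAGS["new_search"]
    if flag["dark"]:
        # Dark launch: run the new code path in production, log any
        # disagreement or failure, but always serve the old results.
        try:
            if new_search(query) != results:
                logging.warning("new_search mismatch for %r", query)
        except Exception:
            logging.exception("new_search failed")
    elif user in flag["enabled_for"]:
        # Feature flag: serve the new path, but only to the flagged users.
        results = new_search(query)
    return results
```

Flipping `dark` off and widening `enabled_for` turns a silent test into a gradual rollout without a deploy.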
People need to stop doing stupid shit like redefining what some property means mid-release and instead define NEW properties and deprecate old ones in subsequent releases.
Same goes for schema changes. If some part of your code base cannot tolerate an additional column that it doesn't need, that's a bug.
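The usual way that bug creeps in is `SELECT *` plus positional indexing. A minimal sketch of the tolerant version, using an in-memory SQLite table as the example schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

def load_names(conn):
    # Name the columns you need instead of `SELECT *` with positional
    # indexing, so a column added by a later migration can't break this code.
    return [row[0] for row in conn.execute("SELECT name FROM users")]

before = load_names(conn)
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")  # a new release adds a column
after = load_names(conn)
```

`load_names` returns the same result before and after the migration; a `SELECT *` version that unpacked rows by position would not survive the `ALTER TABLE`.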
You can also adopt some configuration management that allows you to provision duplicates of the components you're deploying so you can swap back and forth between them in the case where you might have a breaking release. That's what we do and it's one of the upsides of using EC2.
All of this requires discipline and dedication but the benefit is so worth it. You have to stay on top of dependencies. You can't let bitrot take hold by going 4 major revisions without upgrading some package. We did and it bit us in the ass.
This is why we adopted the swinging strategy of duplicating our entire stack (takes about 30 minutes depending on Amazon allocation latency) on major upgrades.
As for deploying to a segment of users, put a "version" field on the user and then gate features based on that version. If you name your versions or use a unique numbering scheme, it should be fairly easy to remove the old gating later.
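A minimal sketch of that version-field gating, with made-up user records and version names:

```python
# Hypothetical user records; the "version" field is set when a user is
# moved onto a release.
USERS = {
    "alice": {"version": "v2-rollout"},
    "bob":   {"version": "stable"},
}

def gated(user, feature_version):
    """Serve the new behavior only to users tagged with that release version."""
    return USERS[user]["version"] == feature_version

def render_dashboard(user):
    if gated(user, "v2-rollout"):
        return "new dashboard"
    return "old dashboard"
```

Because the gate checks one named version string, removing it later is a grep for `"v2-rollout"` rather than an archaeology project.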
Yes, none of these things are foolproof, but then again your existing system probably isn't either. The more you deploy, the better you get at it and the more automated it becomes. Force the issue by deploying every day to begin with, then ramp up to twice a day, etc. After a month or two you'll have worked out almost all of your deployment problems, and the organization will have a well-known solution for each of them, whether it's gating features or schema migrations.
What about having two code bases on your production server and having your web server (nginx) route the traffic accordingly? 85% goes to the current code base and 15% goes to the new one?
If that's possible it would seem pretty easy.
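It is possible with stock nginx: the `split_clients` module hashes something per-client (the address, a cookie) and buckets clients by percentage. A sketch, with hypothetical upstream names and ports:

```nginx
# Deterministically bucket clients: ~15% to the new code base, rest to current.
split_clients "${remote_addr}" $app_backend {
    15%     new_app;
    *       current_app;
}

upstream current_app { server 127.0.0.1:8000; }
upstream new_app     { server 127.0.0.1:8001; }

server {
    listen 80;
    location / {
        proxy_pass http://$app_backend;
    }
}
```

Hashing `$remote_addr` keeps a given client pinned to the same code base across requests, which matters if the two versions render things differently.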
Gating would probably work better in the long term as you get more devs and users, and need to do things like A/B/C testing.