Ask HN: It's 2016, how do you do your automated zero downtime DB migrations? - raimille1
======
umut
Sometimes zero downtime migrations are not worth the effort. In most of the
cases, you end up thinking of a 3-state picture (BEFORE / DURING / AFTER) and
the delta between DURING and AFTER needs some manual work. As long as you
don't drop data or end up with irreversible changes, you manage the process
and complete the migration just fine.

------
orionblastar
When I did DB migrations, I first tested out my code on a test server that had
a copy of the production database so see how well it would go and if there
were any problems.

I would test out all code on a test server first before putting it on a
production server.

Migrations were usually scheduled at Midnight and finished before anyone came
into work. Before the Migration a backup of all data was done.

It is the only safe way to do things.

~~~
mrits
You risk losing data before the between the start of the migration and the end
of it. Unless you disable the service. In that case you certainly don't have
zero downtime.

------
bobf
Some aspects of zero downtime DB migrations require your changes and
deployment protocols to follow certain rules, so I'm not sure a generic
automated solution is possible. I was writing and speaking a fair amount about
this back in 2012 and the concepts are still basically the same today -- I'll
outline the core ideas here and provide a few links with more configuration
examples, etc. at the end.

The basic idea behind zero downtime DB migrations is to use a Blue-Green
deployment model. Blue-Green deployment uses two identical production
environments, called "Blue" and "Green", with a router or proxy setup in front
of them. Only one of the environments serves traffic, while the other is idle.

For our example, let's say Blue is initially live. When you deploy new
application code, it would be deployed to Green and tested. Once you're happy
with it, you switch the router or proxy to send requests to Green. Blue
becomes idle, and then you update Blue.

With MySQL as the database, I like to call this setup "Active-Passive", using
a Master-Master infrastructure. Normally, Master-Master implies an "Active-
Active" setup with each instance being able accept reads and writes. (People
that have used Active-Active Master-Master MySQL recognize the big downside:
conflicting writes break replication.)

Using an Active-Passive approach for supporting zero downtime schema changes
lets you avoid the problem of conflicting writes break replication. Basically,
think of Active-Passive as the ideas used in Blue-Green deployment, extended
to your database.

The basic process is:

    
    
      - Run STOP SLAVE on both the Active and Passive servers.
      - Run SQL for the schema change on the Passive server.
      - Run START SLAVE on the Active server.
      - Wait for replication lag on the Active server to become small enough (ideally about a second). You can check replication lag with SHOW SLAVE STATUS "Seconds_Behind_Master", although that isn't 100% reliable and you are better off with something like Percona's MySQL Toolkit's pt-heartbeat.
      - Run LOCK TABLES on the Active server for the final replication catchup
      - Ensure replication lag is zero on the Active server
      - Modify your proxy configuration to change the Active/Passive designations
      - Unlock the new Passive server
      - Run START Slave on the new Active server
    

Now, there are some limitations:

    
    
      The new schema must be backwards compatible with the previous schema:
        - Add new columns with triggers rather than modifying in place.
        - New columns cannot be required immediately, or old writes will not replicate appropriately.
        - You can't use server-generated data functions (UUID, NOW, RAND, etc)
      And it can't conflict with pending writes:
        - No auto-increment INSERT unless the application doesn't insert to that table.
        - No DROP COLUMN nor DELETE rows if they are used in the previous schema version.
    

Resources:

My article in SysAdvent on Zero Downtime MySQL schema changes:
[http://sysadvent.blogspot.com/2012/12/day-3-zero-downtime-
my...](http://sysadvent.blogspot.com/2012/12/day-3-zero-downtime-mysql-schema-
changes.html)

Slides from a presentation I gave at a Devops conference in 2012:
[http://www.completefusion.com/zero-downtime-deployments-
with...](http://www.completefusion.com/zero-downtime-deployments-with-mysql/)

Pivotal Labs about Blue-Green deployment:
[https://docs.pivotal.io/pivotalcf/devguide/deploy-
apps/blue-...](https://docs.pivotal.io/pivotalcf/devguide/deploy-apps/blue-
green.html)

(As a side note, if you haven't heard of it before, SysAdvent is awesome: ~25
sysadmin/devops posts published annually in December.)

~~~
raimille1
Blue-Green with that strategy really sounds like a great strategy, also thanks
for the sysadvent link!

