
Migrations and Future Proofing - w01fe
https://github.com/Prismatic/eng-practices/blob/master/swe/Migrations_and_future_proofing.md
======
spoondan
This is a really good write-up.

In consulting and mentoring on this topic, I've found a lot of engineers push
back against how "dirty" it is to have multiple copies of the data around in
different formats. It feels wrong to not have a single, authoritative data
format at any given instant. If the idea is to change the column type, why not
just `ALTER TABLE ... ALTER COLUMN` instead of `ALTER TABLE ... ADD`?

But if you think about it, excepting trivial cases, once you're migrating
data, there are parallel realities at least for the duration of the migration
and deployment. It's not a question of whether you create divergence by
versioning/staging (in some fashion) your data. It's a question of whether you
manage the divergence and convergence of the parallel realities that already
exist as part of a migration. If you don't, you either incur downtime or risk
data corruption.
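To make that concrete, here is one common way the divergence gets managed, the expand/backfill/contract sequence, sketched against an in-memory stand-in for a table (the column names and helpers are mine, not from the write-up):

```python
def expand(table):
    """Step 1: ALTER TABLE ... ADD -- the new column appears alongside the old."""
    for row in table:
        row.setdefault("signup_at", None)  # new datetime-style column, made-up name

def dual_write(table, row_id, signup_date, signup_at):
    """Step 2: new code writes both formats; old code still reads the old one."""
    for row in table:
        if row["id"] == row_id:
            row["signup_date"] = signup_date
            row["signup_at"] = signup_at

def backfill(table, convert):
    """Step 3: migrate existing rows; safe to re-run since finished rows are skipped."""
    for row in table:
        if row["signup_at"] is None:
            row["signup_at"] = convert(row["signup_date"])

# Step 4 (contract): only after every reader uses signup_at do you drop
# signup_date. Between steps 1 and 4, the two "parallel realities" coexist
# by design -- which is exactly the divergence being managed.
```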

One big win here is that, by being disciplined about your code and data
changes, you can cleanly separate _deployment_ from _release_. You can deploy
a feature but have it disabled or only enabled for a subset of users.
Releasing a feature means enabling its feature flag, not orchestrating a set
of migrations, replications, and deployments.
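The deploy-vs-release split can be as small as this toy flag check (all names hypothetical; percentage rollout keyed on a stable user id is just one common scheme):

```python
import zlib

# Percent of users each feature is enabled for. A deploy ships with 0:
# the code is live everywhere but the feature is released to nobody.
FLAGS = {"new_checkout": 0}

def enabled(flag, user_id):
    """Deterministic bucketing: the same user always gets the same answer."""
    bucket = zlib.crc32(f"{flag}:{user_id}".encode()) % 100
    return bucket < FLAGS.get(flag, 0)

# "Releasing" the feature is then a config change, not a deployment:
# FLAGS["new_checkout"] = 10, then 50, then 100.
```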

------
nostrademons
When I was at Google this was the single worst problem we had in engineering,
at least in terms of engineer-hours consumed. We came up with a bunch of
solutions, a few of which (like protobufs) are open-sourced and many of which
are just in the heads of the engineers who did them, but there's unfortunately
no general solution to the problem. Sometimes I dream about a programming
language that has thought through all these issues and includes "evolvability"
as a first-class design constraint, but oftentimes these problems show up in
multi-process situations where you may be using multiple programming
languages.

------
tcopeland

    When you introduce a new API endpoint or format
    for data at rest, think hard

Yup. I've added columns using a datetime where a date would have sufficed and
then regretted it later once tons of data was already in the table. Or added a
varchar(255) and only later realized that it wasn't big enough. Sometimes the
wrongness of a type only becomes clear down the road.

    If you're designing an experimental server-side
    feature, see if you can store the data off to the
    side (e.g., in a different location, rather than
    together with currently critical data) so you can
    just delete it if the experiment fails rather than
    being saddled with this data forever without a
    huge migration project.

Yup, sometimes an extra join or lazy-load is well worth the isolation.

~~~
DenisM
Why is it hard to drop an extra column from a database, or to change its type?
ALTER TABLE...

~~~
taeric
It a) takes time, and b) means the currently deployed code cannot possibly
keep working. Worse, it c) means canceling a rollout is now a complicated
process, because of b.

------
shykes
When we introduced pluggable storage drivers in Docker 0.7, we wanted all
existing data to work as usual (full compatibility of data at rest), but we
also wanted to migrate the layout of the legacy storage system (based on AUFS)
so that it would be "just another driver" instead of a perpetual special case.
At the same time, we didn't have the luxury of a full-stop mandatory
migration, because if anything went wrong, the upgrade would fail and the user
would be stuck in a hairy half-migrated situation. Keep in mind we are not
talking about a relational database, but directories used to mount the root
filesystems of live containers. That means that some of those directories may
be mounted and therefore unmovable. So we had to accommodate partial migration
failure, and the possibility of a partially migrated install.

So we shipped a migration routine which ran at startup _every time_ and gave
up (gracefully and atomically) at the slightest sign of trouble. Over time, we
reasoned, each install would converge towards full migration, and the huge
majority of containers would be migrated within seconds of the upgrade. The
rest would be much easier to deal with if anybody had any trouble.

Of course we had the luxury of a data structure which allowed this.
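The shape of that routine might look something like this (a Python sketch of the pattern, not Docker's actual Go code; the predicate and move step stand in for the real AUFS-layout work):

```python
def migrate_at_startup(items, already_migrated, can_migrate, do_migrate):
    """Runs at every startup; converges toward full migration over time."""
    for item in items:
        if item in already_migrated:
            continue                  # idempotent: skip work already finished
        if not can_migrate(item):
            continue                  # e.g. directory currently mounted; retry next boot
        try:
            do_migrate(item)          # must itself be atomic for each item
            already_migrated.add(item)
        except OSError:
            continue                  # give up gracefully at the slightest sign of trouble
    return already_migrated
```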

------
jamessantiago
At least for the server side, Entity Framework partially solves this with
code-first migrations. Using the code-first model you define your data types
in code and have Entity Framework generate the appropriate SQL. If you change
your data structure down the road, you can autogenerate a migration that moves
the database from one version to the next. If you deploy a version that is a
few migrations ahead, it will execute the pending migrations one after the
other.
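The "run the pending migrations in order" behavior can be sketched like so (against an in-memory schema dict; the version numbers and steps are invented, and this glosses over how EF records which migrations have been applied):

```python
# Each migration is (version, step). Applying the steps in order takes a
# schema from any older version to the target. Names here are illustrative.
MIGRATIONS = [
    (1, lambda schema: schema.update({"users": ["id", "name"]})),
    (2, lambda schema: schema["users"].append("email")),
    (3, lambda schema: schema.update({"orders": ["id", "user_id"]})),
]

def migrate(schema, current_version, target_version):
    """Apply every migration after current_version, in order, up to target."""
    for version, step in MIGRATIONS:
        if current_version < version <= target_version:
            step(schema)
            current_version = version
    return current_version
```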

For the client side it's usually a good idea to specify a versioning
relationship between server and client. With AWS, for example, you request the
API version you want to use:
[http://docs.aws.amazon.com/AmazonSimpleDB/latest/DeveloperGu...](http://docs.aws.amazon.com/AmazonSimpleDB/latest/DeveloperGuide/APIUsage.html)
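In that query-API style, the pinned version travels with every request; a client-side sketch (the parameter names mimic the AWS style, but the code is illustrative, not a real client):

```python
from urllib.parse import urlencode

API_VERSION = "2009-04-15"  # the API version this client was written against

def build_request(action, params):
    """Every request carries the version, so the server can keep old shapes working."""
    query = {"Action": action, "Version": API_VERSION, **params}
    return "https://sdb.amazonaws.com/?" + urlencode(sorted(query.items()))
```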

------
tieTYT
How does Erlang deal with these problems? It's often touted for minimal
downtime and the ability to update your code while it's running.

I _think_ that means you can have Process V1 and Process V2 running on the
same server simultaneously. If they read from the same database, won't you run
into issues?

~~~
jamii
When you update an Erlang module, the VM keeps both the old and new versions
of the code. Calls like foo() go to the currently running version; calls like
module:foo() go to the latest version of that module. So typically you would
keep all the control flow for a given process in one module and use the
fully qualified call to control when the upgrade happens. At that point the
process gets a chance to pause and migrate its data.

The process isolation and code swap mechanisms do give you some help, but when
it comes to messages sent between processes you are back in the same boat with
API versioning or carefully ordered upgrades.

There is no real magic involved. It takes careful thought and tons of testing
to make live upgrades work. Most of the projects I worked on preferred to just
eat a few seconds of downtime rather than risk getting the server into some
weird in-between state.

------
w01fe
Author here, would love feedback on this, and also happy to answer any
questions.

~~~
ivanceras
I saved the document for later use, but it got me thinking about how ORMs like
Hibernate deal with migration and future-proofing issues. I wrote code
libraries for a small company where part of the design was to let the data
models work against the new database schema while still presenting the data
structure as if it came from previous versions. I ended up writing generators
that create a mapping between the DAOs and the models exposed to the APIs, so
that when the database changes while you still have to support a previous
version of the API, I only have to edit/tweak the mappers for that version of
the API.

The library has been open-sourced here:
[https://github.com/ivanceras/orm](https://github.com/ivanceras/orm)

