
Feature flagging to mitigate risk in database migration - mikojava
http://blog.launchdarkly.com/feature-flagging-to-mitigate-risk-in-database-migration/
======
languagehacker
Sure, this works for one kind of migration -- a whole-hog swap of data source.

How would this work for alterations to an existing database?

I don't think it would, at least not for an unsharded relational database.

Anyhow, just something to think about. NoSQL's magic, right?

~~~
pkaeding
You can definitely use feature flags to slowly (and controllably) roll out an
alteration to an existing database, but exactly what it would look like
depends on the alteration.

If you are going to make a schema migration with no downtime (even if you are
not feature-flagging it) you will need to make the code work with both the old
and new schema for a period, anyway. So, if you are feature flagging it, this
period might just be extended, and you roll out the migration slowly. The
'new' thing I introduce with this approach is really just the 'integrity
check' (and the gradual rollout, allowing you to monitor for
errors/performance issues).
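To make the "gradual rollout" part concrete, here is a minimal sketch of a percentage-based flag check. Everything here is hypothetical and illustrative (it is not LaunchDarkly's actual API); the point is just that the same user deterministically lands on the same side of the flag while you slowly turn the dial up:

```python
import hashlib

class PercentageFlag:
    """Deterministic percentage rollout: a user is 'in' the flag
    if a hash of their id falls under the rollout threshold."""
    def __init__(self, name, percent):
        self.name = name
        self.percent = percent  # 0-100

    def enabled_for(self, user_id):
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < self.percent

# Roll the new-schema code path out to 10% of users first,
# then bump the percentage as you gain confidence.
use_new_schema = PercentageFlag("new-schema-read", percent=10)

def load_profile(old_db, new_db, user_id):
    # old_db / new_db are hypothetical store objects.
    if use_new_schema.enabled_for(user_id):
        return new_db.get(user_id)   # new schema path
    return old_db.get(user_id)       # old schema path (still authoritative)
```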

For example, if your migration involves:

\- adding/removing a field: your code needs to deal with the fact that the
field might not be there, usually by falling back to a reasonable default
value.

\- restructuring: this is a broad category, so let's work with a more concrete
example: you currently have a nested (unbounded) repeating sub-object, and you
want to break those out into their own top-level documents in another
table/collection. In this case, you will have a DAO function
`getChildren(parent)`, which uses the feature flag to decide whether it should
query the parent and pull out the nested list of children, or load the
children from their separate collection. You can save the children in both
places, and do the dual reads as well, comparing the results. When the
cutover is complete, you run a script that removes the nested children
(instead of decommissioning the old database, as in the example in the blog
post).

Ultimately, the concept is the same: you do the work in both places for a
period of time, and check the results. If there are any discrepancies, you
still have the fully-functional original data source.

------
drichelson
Did you encounter any performance problems during the 'Early Canary Read'
phase? This seems like a lot of DB action.

~~~
pkaeding
Author here.

Yeah, at that point, you are doing two writes, and (possibly) two reads. I
simplified the code a bit for the blog post, but you can do the read and write
pairs concurrently, so you don't have to wait longer to get data to return to
the caller.
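For what "do the read pairs concurrently" might look like, here's a sketch using a thread pool (the two read callables are stand-ins for the real store calls): the caller only waits for the slower of the two reads, not their sum.

```python
from concurrent.futures import ThreadPoolExecutor

def dual_read(read_old, read_new, log=print):
    """Issue both reads at once; compare results, but always return
    the old (authoritative) one to the caller."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        old_future = pool.submit(read_old)
        new_future = pool.submit(read_new)
        old_result = old_future.result()
        new_result = new_future.result()
    if old_result != new_result:
        log(f"canary mismatch: {old_result!r} != {new_result!r}")
    return old_result
```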

------
albertmw
Interesting concept, but I don't see how rollbacks would work. Aren't you at
risk of losing data?

~~~
ivan_ah
Why would you need to rollback?

The new DB is not authoritative until the final step of the switchover, so you
could easily scrap it at any point.

