Data migrations? Denormalizing columns from one table to one or more child table...

liyanchang · on May 10, 2019

I’ll openly admit that we don’t have everything figured out. You’re absolutely right that currently, we constrain ourselves on what we migrate to admittedly simple migrations.

I think there’s a whole set of problems to be solved in this space and frankly, I’m as surprised as anyone that given how SQL is declarative, we use procedural code to do migrations so part of my post was hoping people would tell me what tool I should be using or how this approach fails over time. So your examples are really helpful for me as I think through if it’s possible to do automatically, workaround, or get by without.

It seems to me that we just lack the ability to express these transitions mathematically that can help us do them. And of those, there’s probably only a subset which are possible to do without taking downtime.

In particular, the class of migrations that you outlines are a combination of DDL and DML changes and also have quite a bit of code complexity to do without downtime. It’s definitely a current weakness.

evanelias · on May 10, 2019

A totally valid point, but I'd argue those should be handled by a separate tool or process. Data migrations tend to be fully programmatic; tools and frameworks can help reduce the code required, but cannot handle every possible case. (having performed numerous multi-billion-row data migrations, I learned this painfully first-hand...)

For simpler cases, where it may make sense to run a data migration immediately after a schema change, a good generic middle-ground may be configurable hook scripts. A declarative schema management system can then pass relevant info to the hook (which tables were changed, for example) and then the script can run any arbitrary row data diff/apply/migrate type of logic.

I do understand your point though; for relatively straightforward data migrations, an imperative system can capture these much more cleanly by just coupling them with the corresponding schema migration code.

barrkel · on May 11, 2019

I honestly like the way Rails does it: both capturing the imperative deltas and dumping the final schema which gets checked in. Not a big fan of down migrations, usually a waste of time.

Otherwise I like Percona's OSC, particularly how it can tune down table rewrites when there's competing work, or replication is lagging too much. We're just at the point where we need to automate the OSC tool rather than using it as a point solution for migrating our bigger tenants.