
I've experienced the exact opposite - no end of problems with Rails/Django/Alembic migrations. Abandoning version numbers and chains of migration files for a diff-based approach that works directly on the database level made things so much simpler and more reliable.

Comparing your goal state explicitly with the current production state means you can typically autogenerate the changes you need, and once applied you can compare again to make sure everything matches perfectly.

Everything becomes reduced to:

production state + pending changes = goal state

Where "goal state" might be the state of your ORM models.
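To make that identity concrete, here's a tiny sketch in Python against sqlite3 (not what any real tool does internally, and all names are made up): introspect the live schema, diff it against a goal spec, and emit the pending changes.

```python
import sqlite3

def pending_changes(conn, goal):
    """Diff the live schema against a goal spec; return ALTER statements."""
    stmts = []
    for table, goal_cols in goal.items():
        live = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
        for col, coltype in goal_cols.items():
            if col not in live:
                stmts.append(f"ALTER TABLE {table} ADD COLUMN {col} {coltype}")
    return stmts

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER)")

# Goal state, e.g. derived from your ORM models:
goal = {"users": {"id": "INTEGER", "email": "TEXT"}}

# production state + pending changes = goal state
for stmt in pending_changes(conn, goal):
    conn.execute(stmt)

# Compare again: the states now match, so the diff is empty.
assert pending_changes(conn, goal) == []
```

A real diff tool also has to handle type changes, indexes, constraints, and drops, but the shape of the loop is the same: diff, apply, re-diff to verify.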

I did a talk at PostgresOpen a while back about the problems with Django-style migrations and how to do better: https://djrobstep.com/talks/your-migrations-are-bad-and-you-...




I've done automated diff-based migrations before using SQLDelta. It caused me a lot of trouble and I think it's a bad approach.

These diff tools just don't have the information they need to solve the data problems that are the actual hard part of schema migrations.

Simple example: I want to concatenate these two text columns into one.

I do think SQLDelta and similar tools can still be useful after doing the real migration using incremental scripts. You can use them to check for problems, and to fix up easy things like views and stored procedures.


Of course they don't automatically solve data migrations for you, I'm not sure how any tool could possibly guess your intent like that.

If you have an example of how a chain-based migration tool makes this easier, I'd love to hear about it.


I've never worked on anything that separated data migrations from schema migrations, so I suspect I'm not quite understanding you, but...

Your data migration is just part of your chain of incremental patches?

For my example you just write:

    ALTER TABLE my_table ADD COLUMN my_column TEXT NOT NULL DEFAULT '';
    UPDATE my_table SET my_column = my_old_column_a || my_old_column_b;
    ALTER TABLE my_table ALTER COLUMN my_column DROP DEFAULT;


Beyond a fairly small scale you have to separate data migrations from schema migrations because they just take too damn long.
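The usual trick is to keep the schema change instant and backfill the data in key-range batches, committing between batches so no single transaction holds locks on the whole table for the duration. A rough sketch using sqlite3 as a stand-in (table and column names are just illustrative; sqlite doesn't support DROP DEFAULT, so that final step is omitted here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY,"
             " my_old_column_a TEXT, my_old_column_b TEXT)")
conn.executemany(
    "INSERT INTO my_table (my_old_column_a, my_old_column_b) VALUES (?, ?)",
    [(f"a{i}", f"b{i}") for i in range(10)])

# Schema migration: fast, effectively metadata-only.
conn.execute("ALTER TABLE my_table ADD COLUMN my_column TEXT NOT NULL DEFAULT ''")

# Data migration: backfill in small key-range batches, committing between
# them, so long-running locks never build up on the whole table.
BATCH = 3
last_id = 0
while True:
    rows = conn.execute(
        "SELECT id FROM my_table WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, BATCH)).fetchall()
    if not rows:
        break
    upper = rows[-1][0]
    conn.execute(
        "UPDATE my_table SET my_column = my_old_column_a || my_old_column_b "
        "WHERE id > ? AND id <= ?",
        (last_id, upper))
    conn.commit()
    last_id = upper
```

On a real Postgres/MySQL box you'd pick a much larger batch size and probably sleep between batches to keep replication lag down.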


I believe it and it makes sense, but I've never encountered it myself.

Sounds like your definition of "small scale" encompasses every project I've ever worked on. :D


Right, that's exactly what you might add to a diff-generated script to do the same thing.

So it doesn't inherently seem better or worse to me.


The difference is that - if I understood right - you're generating your diff just before you actually do the migration.

By comparison, the incremental migration scripts are written at the same time as the feature (I like to do them as step 1 a lot of the time) and by the same developer, who currently has the problem in their head.


I don't know of anybody doing that - unless you had automated checks that it was a fast and non-destructive operation, that would be highly risky.

I'm simply talking about using the current production database as the basis for generating the required changes, rather than "here's what I think my production database currently looks like, based on a version number and the 75 chained migration scripts I've run on it in the past".


I'm still confused. When do you generate your diff then?

I think migrations ought to be done by the developer at the same time that they write their other code changes, because they have all the context at that time.

However, it doesn't seem like that could work for diffs, because the version of the production database you're diffing against may not be the same as the version you end up deploying to.


Generate it whenever you prefer, probably with the rest of the code changes as you say.

Immediately before applying, check that the database is still in the same state as it was when you generated the diff script. If yes, apply. If no, abort.
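One way to implement that check (a hedged sketch, again using sqlite3 as a stand-in for the production DB): fingerprint the schema when the diff script is generated, and refuse to apply if the database has drifted since.

```python
import hashlib
import sqlite3

def schema_fingerprint(conn):
    """Hash the full DDL of the database as a cheap drift check."""
    ddl = "\n".join(row[0] or "" for row in conn.execute(
        "SELECT sql FROM sqlite_master ORDER BY name"))
    return hashlib.sha256(ddl.encode()).hexdigest()

def apply_if_unchanged(conn, expected_fp, script):
    """Apply the diff script only if the schema matches the fingerprint
    taken when the script was generated; otherwise abort."""
    if schema_fingerprint(conn) != expected_fp:
        raise RuntimeError("database drifted since diff was generated; aborting")
    conn.executescript(script)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
fp = schema_fingerprint(conn)          # taken at generation time

# Database unchanged, so the script applies:
apply_if_unchanged(conn, fp, "ALTER TABLE t ADD COLUMN x TEXT;")

# The first apply changed the schema, so reusing the stale fingerprint aborts:
try:
    apply_if_unchanged(conn, fp, "ALTER TABLE t ADD COLUMN y TEXT;")
except RuntimeError:
    pass
```

Comparing full introspected schemas rather than a hash gives you a better error message, but the go/no-go logic is the same.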


I like the idea but haven't been able to make it work in practice. Diffing produces migrations that can't realistically be run on big production DBs, and it doesn't work properly if you're using schema-based partitioning, or partitioning that depends on the inserted data.


These problems are definitely solvable with proper tooling. The specific challenges vary between RDBMS though. For example when running schema changes on large-scale MySQL/MariaDB, the key things are supporting external OSC tools, and supporting sharding. I wrote a diff-based schema management system, https://skeema.io, designed with precisely these concerns in mind.


Which non-diff-based migration tools automatically generate migrations that are immediately suitable for big production databases? I don't think any tool is capable of that; certainly none I've heard of.


> production state + pending changes = goal state

That's exactly what Django is doing, btw.

> I did a talk at PostgresOpen a while back about the problems with Django-style migrations

Good talk generally, and a lot of stuff I agree with.

If the migrations sit in the code base for a year, it's not a big deal. It looks like a big deal, but in practice it isn't. Migrations can be reset back to zero in Django if you want.

And when you're developing in parallel developer environments, the Django migration DAG alerts you when you have two leaves that haven't been merged.

And if you ever had to roll back a failed prod deploy, Django supports a backwards migration too.
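For anyone unfamiliar, those workflows look roughly like this ("shop" and the migration numbers are placeholders, not from a real project):

```shell
# Roll the app back to migration 0041, reversing 0042 and everything after it:
python manage.py migrate shop 0041

# Unapply every migration for the app (the "reset back to zero" case):
python manage.py migrate shop zero

# Two developers created divergent leaf migrations in parallel;
# Django flags the conflict and this generates a merge migration:
python manage.py makemigrations --merge
```

Reversal only works for operations Django knows how to invert, so data migrations still need an explicit reverse function.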

I get what you're saying, but in practice the Django approach, where the code is the configuration, beats trying to keep code and configuration separate and match them up later.



