Ask HN: How does your development team handle database migrations?
269 points by abetlen on May 10, 2019 | 146 comments
What's your team workflow for delivering features that require database migrations? How do you keep migrations from slowing down development as your team grows?



I've been really happy with how my current company[0] has been doing migrations and I've seen a couple others do it but it seems like it should be more widespread.

Database Schema as Code

Instead of writing up and down migrations, you define what the end state should look like. Then the computer will figure out how to get there. This is just how the industry started managing server configurations (Puppet) and infrastructure (Terraform).

We use protocol buffers so it was pretty straightforward to have a definition of what our tables should look like. We have a script that figures out what the delta is between two states (either proto files or a db) and can calculate the schema migration SQL (e.g. CREATE TABLE, etc).

From there, we run it through a safety check. Any unsafe migration (either for data loss or performance issues e.g. DROP TABLE) requires an extra approval file.

There's no real difference between an up migration and a down migration (except that one tends to result in an unsafe migration). It's calculable at CI time so we can give devs a chance to look at what it's going to do and approve any unsafe migrations. API compatibility checks enforce that you need to deprecate before you can drop.
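
To make the idea concrete, here's a toy sketch of the diff-plus-safety-check step (this is not the poster's actual protobuf-based tooling, it only works at whole-table granularity, and the table names are invented):

  # Toy "schema as code" diff: compare desired vs. current state, emit DDL,
  # and quarantine anything destructive behind an explicit approval step.
  def diff_schemas(desired: dict, current: dict):
      """Both args map table name -> CREATE TABLE statement."""
      safe, unsafe = [], []
      for name, ddl in desired.items():
          if name not in current:
              safe.append(ddl)                     # new table: safe to apply
      for name in current:
          if name not in desired:
              unsafe.append(f"DROP TABLE {name}")  # possible data loss: needs approval
      return safe, unsafe

  safe, unsafe = diff_schemas(
      desired={"users": "CREATE TABLE users (id INT PRIMARY KEY, email TEXT)"},
      current={
          "users": "CREATE TABLE users (id INT PRIMARY KEY, email TEXT)",
          "legacy": "CREATE TABLE legacy (id INT)",
      },
  )
  assert unsafe == ["DROP TABLE legacy"]  # CI would require an approval file here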

DML, that is data changes, is handled via a standard check-in of a SQL file, and CI will run it before the code deploy and after the schema migration.

Alembic is the one other place I've seen this concept (a couple others mentioned this), so it's not new, but I'm surprised I haven't seen it in more places.

[0] Shameless plug: We're hiring if you're interested in changing how healthcare is paid for, delivered, and experienced. https://www.devoted.com/about/tech-jobs/


I call this declarative schema management, since the repo declares the desired state, and the tooling knows how to reach this state. This concept is finally catching on lately, although some huge companies have already been doing it this way for quite some time. Facebook is a key example; they've managed their schema changes in a pure-SQL declarative fashion, company-wide, for nearly a decade.

I'm developing a suite of tools [1] to provide declarative schema management and "schema change by pull request" functionality, initially targeting MySQL and MariaDB. A few large companies have built pipelines using one of my tools -- including Twilio SendGrid, who wrote about their process in-depth recently [2].

[1] https://skeema.io

[2] https://sendgrid.com/blog/schema-management-with-skeema/


This is good to know. As someone who didn't do much with databases before, I was frankly worried given how it didn't seem like many others were taking this approach when it made so much sense (we did have the advantage of having a defined schema which I know isn't always available). Seems like I just didn't know what to search for.

Git would never have worked if it required devs to write the up/down patches - why should we have to write the up/down migrations for our schema?

Excited to see more tooling around declarative schema!


I'm surprised this isn't more of a thing. It seems like the natural evolution of "[X] as code". I've always been a little turned off by migrations (though they were certainly an improvement over the previous situation, which was basically just nondeterministic changes on the fly).


My thoughts exactly. But it's a major paradigm shift for those coming from the world of Rails/Django/etc migrations, and that unfamiliarity understandably leads to some initial resistance and skepticism.

fwiw, other declarative tools are starting to pop up -- besides my tool Skeema, some others I've seen recently are Migra [1] and sqldef [2]. And meanwhile a bunch of enterprise tools for MS SQL Server have operated in the declarative fashion for quite a long time, although usually with GUIs instead of being git / pull-request-driven. So I think/hope it's just a matter of time before this concept becomes more widely known.

[1] https://github.com/djrobstep/migra

[2] https://github.com/k0kubun/sqldef/


It's definitely a thing, eg SQL Server Data Tools has this as a default - Schema Compare and Data Compare - and you can just use declarative approaches to define your final state and let the tool take care of it.

That being said - if you want to do this, the downside is usually that it's slow as hell, and the non-migration approaches can cost you downtime.

Generic solutions to specific states often means copying all data somewhere else so you can modify the table and then put it back in a useful fashion - a migration often allows more piecemeal approaches.

Edit: a guy I like wrote a good model/migration set of articles http://dlmconsultants.com/model-vs-mig/


Curious, how do you deal with renaming fields or tables?

This is a (minor) pain point for traditional migration systems.


Excellent question! The short answer is Skeema doesn't directly support renames yet. Renames are inherently more imperative than declarative, so they don't fit in well with the model. I've thought about handling them via tracking their history/state, but it would be hacky.

Two workarounds exist in Skeema currently:

* You can do the rename "out of band" (e.g. manually, rather than via `skeema push`), and then update your schema repo via `skeema pull`. This isn't ideal, but then again, table/col renames typically involve nasty code-deploy-order complexities to begin with (regardless of migration system): there's no way to rename something at the same exact instant that your new code goes live, and it's difficult/annoying to write code that can correctly interact with both names.

* For new empty tables, happily a rename is equivalent to drop-then-re-add. So this case is trivial, and Skeema can be configured to allow destructive changes only on empty tables.

I've written a bit more about this at https://github.com/skeema/skeema/blob/master/doc/requirement... , as well as the bottom of https://www.skeema.io/blog/2019/01/18/declarative/ .


How about a column named x_no_wait_y declares a column named y, but if a column named x exists it's renamed?


If you use MS SQL Server SSDT, you use refactor/rename and it finds all references and changes them; then when you go to deploy, it generates an sp_rename - 100% killer feature right there :)


Data migrations? Denormalizing columns from one table to one or more child tables, possibly more than one relation away? Switching one set of fields in a single table to be in a different table via a relation, converting something from 1:1 to 1:n?

The concept appeals to me, but it only seems to work for trivial migrations.


I’ll openly admit that we don’t have everything figured out. You’re absolutely right that currently we constrain what we migrate to admittedly simple migrations.

I think there’s a whole set of problems to be solved in this space, and frankly I’m as surprised as anyone that, given how declarative SQL is, we use procedural code to do migrations. Part of my post was hoping people would tell me what tool I should be using, or how this approach fails over time. So your examples are really helpful for me as I think through whether each one is possible to do automatically, work around, or get by without.

It seems to me that we just lack a mathematical way to express these transitions that would help us perform them. And of those, there’s probably only a subset that is possible to do without taking downtime.

In particular, the class of migrations that you outline is a combination of DDL and DML changes, and also has quite a bit of code complexity to do without downtime. It’s definitely a current weakness.


A totally valid point, but I'd argue those should be handled by a separate tool or process. Data migrations tend to be fully programmatic; tools and frameworks can help reduce the code required, but cannot handle every possible case. (having performed numerous multi-billion-row data migrations, I learned this painfully first-hand...)

For simpler cases, where it may make sense to run a data migration immediately after a schema change, a good generic middle-ground may be configurable hook scripts. A declarative schema management system can then pass relevant info to the hook (which tables were changed, for example) and then the script can run any arbitrary row data diff/apply/migrate type of logic.
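
As a rough sketch of what such a hook might look like (the registry, table names, and SQL below are invented for illustration, not any particular tool's API):

  # Hypothetical post-schema-change hook: the declarative tool passes in the
  # list of tables it just altered, and we run any registered backfill for them.
  import sqlite3

  BACKFILLS = {
      "users": "UPDATE users SET display_name = full_name WHERE display_name IS NULL",
  }

  def on_schema_changed(conn: sqlite3.Connection, changed_tables: list) -> None:
      for table in changed_tables:
          sql = BACKFILLS.get(table)
          if sql:
              with conn:  # commit (or roll back) each backfill as a unit
                  conn.execute(sql)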

I do understand your point though; for relatively straightforward data migrations, an imperative system can capture these much more cleanly by just coupling them with the corresponding schema migration code.


I honestly like the way Rails does it: both capturing the imperative deltas and dumping the final schema which gets checked in. Not a big fan of down migrations, usually a waste of time.

Otherwise I like Percona's OSC, particularly how it can tune down table rewrites when there's competing work, or replication is lagging too much. We're just at the point where we need to automate the OSC tool rather than using it as a point solution for migrating our bigger tenants.


I am guessing that you are probably not using Python/Django... but is this any different than what Django offers?

Django allows you to define your models (schema) and then run a command that will generate the migrations. If you don't like the migration that was generated, you can modify it. You can customize up and down operations.

There are also tools that will take an existing database and generate the Django models for you.

All of these operations can also export the exact SQL that will run on each supported RDBMS platform in case you want to be extra sure on what exactly will be executed.


Django migrations can be problematic because they're meant to be sequential and have interdependencies. I've had problems merging multiple feature branches because of this, even though there are no code conflicts.

A system like Saltstack or Puppet for databases would not have checked-in migrations; these would be generated on the fly at deploy time.

So you could very well have multiple state changes in a single run, by comparing actual DB state and desired DB state, then creating the SQL code as needed for that deployment.

Honestly not having to fiddle with the migrations table on a live server seems pretty nice ;-)

This could very well turn out to be Django's next gen migration tool...


> Django migrations can be problematic because they're meant to be sequential and have interdependencies. I've had problems merging multiple feature branches because of this, even though there are no code conflicts.

They're actually a directed graph; this means a conflict that should have been handled on the branches wasn't, and it would probably have been a problem regardless.


This was helpful to think about, thanks.

I've rarely encountered logical merge conflicts with migrations, but I could see it happening.

I used to be on the SQL Server team at Microsoft and had some exposure to the customer support teams. So data integrity and eliminating any potential for errors was huge.

So while I love the idea of migrations being generated on the fly from actual state in Production-System-5 to desired state of commit 27a73e, I'm skeptical of it working that well in practice. Certain cases come to mind where there might be intermediate migrations from [full name] -> ([first name] [last name]) -> ([first initial] [last name]). The system would have to be smart enough to know A -> C may require A -> B -> C or prompt the engineering team for a better DML migration script.

Also, you will want there to be documentation about what was performed whether that is a migrations table that points to a .py file... or a .json output... or a log file.


Yeah. I’d love to see the academic paper with formalizations that help me understand the true scope of this problem. Your example is a great one that prompts many questions. Is it possible to travel directly to the commit in O(1), or will the code have to calculate the diff of each commit and apply them one at a time in O(n)? And how much definition and dependency mapping do humans need to do to make it work correctly?


The closest I can think of is trying to define a set of CRDT-compatible operations that are expressive enough to describe your database schema, starting from an empty database. Then, the migration you need to perform is whatever the CRDT merge operator says you need to do.
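
A toy way to picture that, using only additive operations (which conveniently dodges the hard drop/rename cases discussed elsewhere in the thread):

  # Schema modeled as a grow-only set (G-Set) of operations. Merge is set union
  # (commutative, associative, idempotent), and the "migration" for a given
  # database is whatever merged operations it hasn't applied yet.
  def merge(ops_a: set, ops_b: set) -> set:
      return ops_a | ops_b

  def pending(merged: set, applied: set) -> set:
      return merged - applied

  branch_a = {"CREATE TABLE users", "ADD COLUMN users.email"}
  branch_b = {"CREATE TABLE users", "ADD COLUMN users.name"}
  desired = merge(branch_a, branch_b)
  print(pending(desired, applied={"CREATE TABLE users"}))
  # -> the two ADD COLUMN operations (ordering them is a separate problem)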


That's great initially, but problems definitely crop up at scale:

* What happens when your company creates new systems that aren't Python/Django? You can either still shoehorn all migrations into Django models, or have multiple separate schema change processes/pipelines... neither option is good.

* If someone makes an out-of-band schema change manually (either by accident or to do a rapid hotfix), you're no longer on a known version. Not sure about Django, but when this happens, most traditional migration systems cannot operate without additional manual cleanups... whereas declarative tools can inherently transition any state to any other state.

* Depending on the DBMS, with large tables and certain types of ALTERs, using an external online schema change tool is greatly preferable to just running an ALTER directly.

* Does Django support sharding / the notion that a change to a sharded model must be made to multiple shards?


* I see your point on not standardizing on one framework. Generally when that has happened for me, it turns into a new service and it has its own database/tables/migration management. It does get quite annoying, for sure.

* I've seen enough things go wrong that on my teams I do not allow DDL to be executed outside of a controlled process that comes from code. But yeah, if that were to happen, it would be annoying to figure out what was done and then try to re-model.

* With Django you can specify exact SQL to run, so you can break operations up into multiple smaller steps... the canonical example is building a new column based on an old column: you first add the column as nullable, then you populate it in batches of ~10k records, then you add the constraints/indexes (see the sketch below).

* I haven't used Django with sharding. It appears there are some posts about it, but it all appears to be community generated content and not part of the official docs.

All in all, I could see that at large scale, in very mature engineering organizations with lots of activity and complex operations, something like Django could fall short and a home-grown system like this may be beneficial, assuming it were reliable enough.
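
For what it's worth, here is a hedged sketch of that add-nullable-then-backfill-in-batches pattern as a Django data migration (app, table, and column names are invented, and the LIMIT-on-UPDATE batching is MySQL-flavored; other backends would need an id-range loop instead):

  # Hypothetical Django migration: backfill a new nullable column in small
  # batches so no single statement holds a long lock. The column was added
  # (nullable) in an earlier migration; NOT NULL / indexes come in a later one.
  from django.db import migrations

  BATCH_SIZE = 10_000

  def backfill_total_cents(apps, schema_editor):
      with schema_editor.connection.cursor() as cursor:
          while True:
              cursor.execute(
                  "UPDATE app_order SET total_cents = legacy_total * 100 "
                  "WHERE total_cents IS NULL LIMIT %s",
                  [BATCH_SIZE],
              )
              if cursor.rowcount == 0:
                  break  # nothing left to backfill

  class Migration(migrations.Migration):
      atomic = False  # don't wrap the whole backfill in one giant transaction
      dependencies = [("app", "0007_add_total_cents_nullable")]
      operations = [
          migrations.RunPython(backfill_total_cents, migrations.RunPython.noop),
      ]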


Sharding is usually handled by patching the QuerySet / ObjectManager, how Citus handles it is a good example: http://docs.citusdata.com/en/v8.1/develop/migration_mt_djang...


Prepare for the exciting future of DevOps transformation: it's `./manage.py makemigrations`.


This sounds nice! One question: you said that DML changes are handled via a "standard check in sql file". Does this simply mean a new SQL file for each migration? And how are DML changes connected to DDL changes? For example, if some code is two versions behind and is updated to the current schema, wouldn't this mean that the DDL is updated in one step to the current state, but the DML potentially in two steps, breaking the update?


That's correct. The DML changes as part of CI are somewhat new so we haven't ironed it all out yet.

Here's the scenario that I think you're laying out:

  1. Commit A creates column foo
  2. Commit B has DML that references column foo
  3. Commit C removes column foo

This works fine if our CI deployer does each commit individually. First roll out any schema changes, then run any DML SQL.

However, our deployer might pick up all those changes at once, and since we roll out the schema migrations first (in this case a create + drop -> no-op) and then run the DML (which will error), the rollup becomes an issue.

In practice, we have yet to see this case (most of the time, the dev who writes the DML is close enough to the code to know if it's going to be dropped soon, and we don't drop that many columns - in part because we know that there be dragons), but truthfully, I haven't thought about it much and need to think through what the impact is beyond this example. Thanks for helping me refine my thinking and I'll have something to ponder on this weekend!


Yep, your example describes exactly (and better) what I meant. Thanks!


Can that handle column renames? Most schema-to-schema diff tools can't tell the difference between a rename and a delete/add.


I’ll openly admit that we don’t have everything ironed out. In fact my next big project is to tackle derived columns (a rename is a derived column where the transformation is the identity function).

It requires a bit more finesse and integration into our code base as it requires multiple deploys with code that knows how to handle both columns.


Strong agree. You can do much better than Rails/Django migrations.

I have been advocating for this approach for a while now: https://djrobstep.com/talks/your-migrations-are-bad-and-you-...


Not sure about the state of the world currently after living in BigCo filter bubble for the past few years, but do you even need custom tools to calculate the delta between the schema as checked into VCS vs the database's actual state?

Spanner (https://cloud.google.com/spanner/) I think can auto-compute the diff between its current state and a given schema, generate appropriate SQL statements to perform a migration and get user confirmation for destructive schema changes.


> This is just how the industry started managing server configurations (Puppet) ...

Yes, and CFEngine pioneered this in 1993 (Mark Burgess).

I make my living as a CFEngine consultant.


I've read and reread a great article titled "Evolutionary Database Design" on Martin Fowler's web-site [0]. This article describes database migrations as being a process. We've found that for complex changes, we'll often need a pre-migration and a post-migration (temporally being before the code change and after the code change respectively).

We commit the migrations along-side the application code and in our case we use FlywayDB [1]. The only down-side is that this tool doesn't perform roll-back operations automatically. You can always do them by writing another migration that goes forward to the past.

Another popular DB migration tool is Liquibase [2]. I don't have much experience with this tool as it doesn't fit our build pipe-line as well but it does support and encourage defining a roll-back for each migration.

[0] https://www.martinfowler.com/articles/evodb.html

[1] https://flywaydb.org/

[2] http://www.liquibase.org/

EDIT: HN is the new StackOverflow? I think this is a really important question for development teams and yet I could see it being closed due to the "Questions asking for tool or library recommendations are off-topic for Stack Overflow ..." rule. Sad!


> We commit the migrations along-side the application code and in our case we use FlywayDB [1]. The only down-side is that this tool doesn't perform roll-back operations automatically. You can always do them by writing another migration that goes forward to the past.

I've done that too, and agree re: rollbacks... I've also never been a huge fan of the fact that the final, in-production schema isn't ever really reified as a source file. What you have instead is the initial state and a bunch of migration files. It works, and it's the most reflective way of showing how the final schema got to that state, but it's not the final state. I guess that's the trade-off of the relatively unsophisticated approach taken by tools like Flyway.


I'm a fan of liquibase. It's a very mature tool which allows you to run it every time the application starts, useful for development to create in-memory dbs. Or you can just call it as part of the build process if you don't want to check the DB changelog tables on startup.

It tracks changes that have been applied in changelog tables so you don't apply them multiple times.

Common operations such as adding a column are defined in a supported markup language but for more complicated things, such as migrating data, you can reference ad-hoc SQL files. All of which can be checked-in to your codebase.

The only real downside is because you are applying the entire development history of the database, you can sometimes be caught doing illogical things like adding and then removing something later on. This can be mitigated by rewriting history if needed though.

Flyway was not as good for me because it lacked a domain language; raw SQL is less easy to interpret, but I know developers who preferred that.


> HN is the new StackOverflow?

Could you clarify what you mean by this?


This is actually a good place to ask important questions that will be closed by the moderators there.


In fact these are the only questions I'm really interested in asking of other developers any more, and most of the reason why I almost never ask or reply on SO, despite 37k rep.

It was looser in the earlier days, but I guess moderators wanted easier-to-evaluate rules, and it's easier for moderators to decide to come down on the side of moderators than on the side of people who ask questions and start discussions.


Makes sense, though, right? They want a Q&A site where you can get definitive answers. This sort of discussion-oriented thing is better suited to a forum with threaded replies and whatnot.


I actually think there ought to be levels, where you can access different kinds of conversations at higher rep levels. Or some other way of gating out clueless people. Noise drowns out signal otherwise.


Thank you, the Fowler article is very helpful. Does your team have someone act as a DBA to oversee each migration?


We differentiate between DBas and DBAs (administrators versus architects). Our team has DBAs who make sure what we're doing is sane and, more importantly, tune SQL emitted by ORMs as needed.


It doesn't matter what tool you use, as long as you have automated migrations as part of the automated deployment process.

A lot of the implementations look like this:

create a migrations directory; add an initial migration script in it; make a migrate command to execute before service starts but after the backup.

The migrate command recipe: create a migrations table in the db if it doesn't exist, otherwise fetch the list of migrations that have been applied inside this database; then, apply the initial migration script if its name is not found in the said migrations table, and insert its name in the migrations table so that it will not be executed again in this database by the migrate command.
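
A minimal sketch of that recipe (assuming sqlite3 and a ./migrations directory of .sql files; neither is anyone's specific setup here):

  import pathlib
  import sqlite3

  def migrate(db_path: str = "app.db", migrations_dir: str = "migrations") -> None:
      conn = sqlite3.connect(db_path)
      # Bookkeeping table: which scripts have already run in this database?
      conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (name TEXT PRIMARY KEY)")
      applied = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}
      for script in sorted(pathlib.Path(migrations_dir).glob("*.sql")):
          if script.name in applied:
              continue                            # already applied here
          conn.executescript(script.read_text())  # apply the migration
          conn.execute("INSERT INTO schema_migrations (name) VALUES (?)", (script.name,))
          conn.commit()                           # record it so it never runs twice
      conn.close()

  if __name__ == "__main__":
      migrate()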


We have an ant script that does almost exactly this


We use dbup[0]. Its philosophy[1] is that you should treat your DB changes like code changes in source control, and only perform up migrations. I agree with this. We previously spent a lot of time and effort writing down migrations that were never, ever used. If you need a down migration, take a backup before running your up migrations.

We're a C# shop. Our DB migration scripts are simply named with a datestamp and a change description. The scripts are added to a console exe project as embedded resources and everything is committed to source control. These exe projects are very thin and contain a single line of C# code that passes the command line arguments directly into an in-house DB deployment library which is installed via nuget. This library handles command line parsing and executing dbup in various ways, using the scripts embedded in the calling assembly.

The result is a simple executable file that, depending on its command line, can upgrade the DB directly, preview which scripts would run against a particular DB instance, test the DB deployment against a clean SQL Server LocalDB instance, or generate SQL scripts for DBAs.

One nice feature is that the exe can also return the connection string of the SQL Server LocalDB test instance to external code as a test fixture. We can use this to directly unit test our repository types against a real, clean database. When the test fixture is disposed by XUnit, the temporary DB is cleaned up and removed.

The console projects are built and tested as part of our CI builds, then pushed to Octopus Deploy[2] as packages. The Octopus deployment process simply runs the executable and passes it the connection string of the DB to update.

[0] https://dbup.github.io/

[1] https://dbup.readthedocs.io/en/latest/philosophy-behind-dbup...

[2] https://octopus.com/


>If you need a down migration, take a backup before running your up migrations.

Aren't you missing some steps? Or does your db back up schema structure and data separately?

  1. Make sure nobody can insert/update/delete anything
  2. Take a backup
  3. Run your up migration
  4. Thoroughly test everything
  5. Allow insert/update/delete again


That is our upgrade process, yes.


a lot of time writing down migrations? why was it so hard?

I use https://fluentmigrator.github.io/ and also used dbup..

I generally like the up/down approach better; both the up and the down are remarkably trivial to write, and the downs are useful during development when you might change your mind about the DB structure. Never used a down in production.


We used to do this but found it was error prone, since the down migrations would have bugs and wouldn’t always catch everything. We have dedicated SQL devs that write the migrations by hand, so moving to generated code is a slow and painful process. We now encourage developers working on the DB to thrash out their design in a local SQL Server Express instance before baking their changes into a migration script that will be automatically run against our dev env by Octopus. If the scripts don’t run or the tests against the temporary DB fail during build, then they never get to our shared dev DB.


We never have problems with it. It's all local; we don't have a shared dev DB - I think that's asking for trouble.

We have test stacks, and a production stack. Those DBs are TBs in size. Locally we have a conditioned DB that's much smaller. The rule locally is that you only change your DB via migration.


We use Rails migrations, which means incremental deltas rather than automatic derivation of the result. Usually, our more interesting migrations include data transformation, and not merely adding / removing tables, columns and indexes. For example, perhaps a field needs to be denormalized on one of its associations to speed up queries, or a 1:1 relationship now needs to become 1:n. I think it's less easy to build declarative tooling around this.

We have a tenanted architecture, so we need to run the same migrations on lots of different databases of wildly different sizes and data complexity. We test migrations by restoring a snapshot of a sample of customers and running the migration through, and some smoke tests before and after that can be compared.

More recently, migrations are becoming a release bottleneck. That is, they take too long to run within a downtime window (fintech, so Sunday afternoon is when releases go out). We're looking at building tooling around Percona OSC, and using triggers to maintain data invariants for data migrations until code is able to catch up.

Migrations aren't what really slows us down though. Instead, it's data volume, and the difficulty in writing efficient queries while also providing a flexible UI experience.


Would be cool to have git for databases.

"oh no! our migration deleted columns without re-representing that data in the new manner, and our users have already done changes to the database so we can't simply restore from a backup!"

quick!

  dbgit checkout -b fixed-migration before-original-migration
  # *run fixed migration*
  dbgit rebase --onto fixed-migration after-original-migration master
day saved!

If only it were so simple.


The main issue is you can't "reverse migrate" creating a table or adding a column as that deletes prod data. So your base tables are always forward migrated, and probably have only basic constraints.

With good updatable view support, you can maintain versions of table views and constraints on the views that may be forward, reverse or laterally migrated. (You'd have a different set of views for each version of the schema in use, so each build of the client is connecting to exactly the schema it expects.)

Then you'll want to checkpoint the base tables. So the system must track usage of view-set versions to determine when they can be dropped entirely. Since the constraints are enforced at the view level, you have to deal with legacy data that violates current constraints, but was hidden by the views.

Then the system can do a checkpoint and delete obsolete columns and tables, moving forward the earliest version you can roll back to.

Of course, that all requires good updateable view support, and I don't think any products offer that.
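
For what it's worth, simple single-table views are auto-updatable in modern PostgreSQL, which is enough to sketch the versioned-view part of this idea (though not the per-view constraint enforcement described above). A hedged illustration with invented names:

  # Each application version gets its own schema of views over the base tables,
  # so an old build keeps seeing the column names it expects.
  import psycopg2

  DDL = """
  CREATE SCHEMA IF NOT EXISTS app_v2;
  -- v2 clients see display_name; the base table still stores full_name.
  CREATE OR REPLACE VIEW app_v2.users AS
      SELECT id, full_name AS display_name FROM public.users;
  """

  with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:
      cur.execute(DDL)
  # v2 clients connect with search_path=app_v2 and can read and write through
  # the view, since a single-table view without aggregates is auto-updatable.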


That’s basically the proposition of any event sourcing system https://www.martinfowler.com/eaaDev/EventSourcing.html but you’re kinda right, that involves a level of code complication (but then it would be as simple as you described to restore / rebase / etc).


You can run/test your migrations in a transaction and roll back if it doesn't work.


Not all problems are obvious, though. You can have an app work 100%, your database be completely coherent, and then realize some of the data is missing.

It would be cool to not have to be so careful when committing migrations, needing to be absolutely sure that we're not screwing something up. This is similar to how one would be careful of changes done in source code before we learned how to use version control systems like git.

I can go completely wild with git, deleting random files, overwriting others with random junk, sharing them to my coworkers, etc. It would only take a few minutes to fix that.


This is OK for git because updates to a repo happen maybe 100 times a day at the most.

A production DB is changed potentially millions of times a day.

If you have this goal from the beginning, you can create an append-only database, but that's orthogonal to migrations.

To have "append-only" migrations is seemingly out of scope for mainstream database engines.


This isn't always true. In MySQL many DDL statements can't be rolled back.

See https://dev.mysql.com/doc/refman/5.7/en/cannot-roll-back.htm... for an example.


sometimes, maybe, if you happen to run the right kind of DBMS...


isn't this what datomic does?


By hand and with careful consideration. Nope, wait - we use Alembic. It’s actually pretty good. I like the notion of not doing any data-destroying migrations. For example, if you are adding a column that replaces a different one, keep them both. Then at a later time, when no code paths touch the old column and that can be proven, drop the column that was deprecated. It’s safer that way. But I’ve not seen this done in practice, just something I’ve been thinking about.


We use Alembic as well and it does make the process easier. I like the suggestion of not applying destructive migrations immediately.

Just out of curiosity, has your team used Alembic's branching system with any success?


Is branching where it forks and then comes back? I'm still a bit new to the process, but if that's the case then yes. The team is still cool on the idea of my way of doing migrations; they tend to do the destructive ones and call it a day, albeit with the common sense to take backups for truly data-destroying things like dropping or redefining columns.


I'm skeptical of tools that claim to handle schema migrations automatically or declaratively. I've had them fail in the past at critical moments. I also don't believe that they can handle the most important cases.

I think that numbered sequential migration steps are still the correct way to go. They should be written at the time the feature/change is developed, ideally by the person doing that development.

Whether they are in SQL or some sort of ORM thing probably doesn't matter that much.

I don't know how to handle branching/merging issues with this approach beyond "pay attention, be careful, talk to your colleagues about database changes you're making".

At my first company we stored the whole schema instead of the incremental steps to get there. We used SQL Delta to synchronize the real database to what we wanted.

SQL Delta is a good tool, but one time we ran into a case it couldn't handle during a big deployment at a large hospital and had to abort and roll everything back. Very embarrassing. We switched to incremental scripts after that.

SQL Delta is still a good tool if you have lots of views, user defined functions, or stored procedures. As long as they aren't needed for the migration itself, you can just synchronize all of those at the end.

It's also a great tool for actually inspecting if the result of your migration is what you expected.

I do like the idea of having easy access to the full schema text as well as the migrations (someone mentioned that Rails does this). Ideally I think this should be generated from the migrations and not checked into version control.


At my .NET shop we use FluentMigrator [1], which allows you to just run up/down migrations. Our deployments are completely automated and, in all honesty, while this has been a great and painless way to manage migrations, there really is no silver bullet. Minor releases are dead simple. Larger releases require quite a bit of planning to make sure things are deployed in the correct order and SQL updates are backwards-compatible (at least during the deployment window) so that we don't incur downtime or completely break things. The only time we ever had to roll back was when one of these large releases deployed a non-backwards-compatible SQL script which caused us a lot of bad data during a 5-minute deployment window. So, this is usually a good motivating factor to try to deploy slim releases as frequently as possible. And if you do that, you'll appreciate it when your DB migrations become boring and mundane things that just happen on auto-pilot.

[1] https://github.com/fluentmigrator/fluentmigrator


How do you apply your migrations? I was setting up a project a few weeks ago and reached for FluentMigrator as it used to be my go to tool... but apparently the command line tool is now deprecated and they recommend writing custom code to do migrations. I eventually decided to give EF Core migrations a try, which is working pretty well so far.


We use the CLI you're referring to on CI/CD servers. Frankly, I had no idea it was deprecated and couldn't find much on this after a little bit of googling. Would you mind sharing a link?


Perhaps "deprecated" isn't quite accurate. I could have sworn one of their documentation pages basically said, "there's a good chance you'll run into problems if you use the CLI, and we don't recommend using it." If there was something that explicit I can't find it at the moment. There is [0] that says "please use the in-process runner if possible," [1] discussing problems with the cli migrators, and [2] where the maintainer seems to have a preference for removing the cli's in the long run.

[0]: https://fluentmigrator.github.io/articles/migration-runners....

[1]: https://github.com/fluentmigrator/fluentmigrator/issues/1006

[2]: https://github.com/fluentmigrator/fluentmigrator/issues/1016


I hand code my migrations for Postgres. When starting a project I create a base.sql file. Then for the first database change I create db/000001-<name of change>.sql, db/00002 [...] and so forth. When I want to create a new database I just run psql < base.sql followed by psql < db/*.sql. Very simple but extremely effective. I don’t think any tool can handle migrations better than hand coded sql. Sometimes I use PL/PgSQL to handle procedural data migrations, which cannot be solved using normal sql.


If you think no tool can handle this better, you are woefully misinformed.

I was doing this method back in 2006, let me assure you that the automated tools available today make this look like amateur hour.

Automatic schema transactions, automatic up/down scripts, automatic detection that a schema change hasn't been committed. Automatic change tracking table in the db. Ability to seed a database with test data.

With a single command I can go back to any arbitrary point of changes, switch branch, work on a bug fix, switch back to dev and the next time I fire up the Dev site it'll automatically redeploy the new changes.

Just streets ahead of hand coded scripts. Yes, there's a learning curve, yes you have to pay attention to your design still, but it is so good.

I was skeptical at first, but honestly I'm never going back and I don't miss it. I'm using Entity Framework Migrations, I assume other languages have similar/better tools.


EF Migrations works pretty well if you use the database as a dumb object store. If you need anything more advanced than that, eg functions, triggers, custom types or materialised views, then it doesn’t gel at all. Raw sql is the only way to go. To me, the database is an API, not just a dumb store for OO.


How do you know which ones have already been run on the environment you are targeting?


I keep a simple table with the latest version run and a timestamp. Then I check for each file whether its version # (in the filename) is less than or greater than what’s in the table. It’s easy because I can do it in simple sql.


Have you experienced any problems with how long it might take to run the whole list of incremental changes?


No never. It is as fast as you can pipe data through to psql.


What's your precise problem?

Migrate in a backwards compatible manner, so that version N of the app works with the N+1 schema (eg add a column, but don't destroy existing ones; use triggers to keep data aligned). When all nodes for an app are at N+1, you can make a new version with destructive changes (that would break N but not N+1). There's a Fowler article about this.
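
A hedged sketch of the "use triggers to keep data aligned" step for PostgreSQL via psycopg2 (table, column, and trigger names are invented; older PostgreSQL spells the last line EXECUTE PROCEDURE instead of EXECUTE FUNCTION):

  # While version N still writes full_name and version N+1 reads display_name,
  # a trigger keeps the new column in sync with old-style writes. Assumes
  # full_name remains the source of truth until the cutover release.
  import psycopg2

  SQL = """
  CREATE OR REPLACE FUNCTION sync_display_name() RETURNS trigger AS $$
  BEGIN
      NEW.display_name := NEW.full_name;
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  DROP TRIGGER IF EXISTS users_sync_display_name ON users;
  CREATE TRIGGER users_sync_display_name
      BEFORE INSERT OR UPDATE ON users
      FOR EACH ROW EXECUTE FUNCTION sync_display_name();
  """

  with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:
      cur.execute(SQL)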


I'm more interested in hearing about what the workflow is like for developers on larger teams. Do they each work on their own features, write separate migrations, and have a DBA approve and merge them?


For a "very large company dedicated to moving fast" example, here's what the process looked like at Facebook a few years ago. AFAIK same process today, with one improvement noted below.

Background:

* Almost everything is self-service by necessity. Except for some high-blast-radius cases, dev teams are able to manage their own schemas without needing MySQL team intervention. This is made possible by having automation that has appropriate safeties built in.

* There's a repository (git, hg, whatever) storing schemas. It has a couple levels of subdirectories to organize different database tiers and individual databases. In each of the bottom-level subdirs, there are text files containing CREATE TABLE statements, one file per table. In other words, this is a declarative repo, modeling the ideal state of tables in each database.

Process to add or change a table:

1. Just add or change a CREATE TABLE statement, and commit in SCM.

2. Submit a diff (pull request). Someone on your team reviews it, same as a code review.

3. Once merged, the schema change can be kicked off. (A few years ago, a dev would need to run a simple CLI command to tell the automation "please begin working on this table", but I believe this has been automated away since then.)

The tooling automatically manages running the correct DDL safely, on the correct machine(s), even in the case of a large sharded table. Devs never need to write ALTER TABLE statements; everything is just based on CREATE TABLE.

There was a separate flow (with extra steps, on purpose) for destructive actions like dropping tables or columns.


The one weakness of this system is that it doesn't understand or handle foreign key constraints. If you have those, you have to manage them the old-fashioned way (whatever that is for you).


That's true. Most large-scale MySQL shops, including Facebook, discourage or outright forbid foreign key constraints. This is sacrilege to many relational db purists, but there are a number of solid reasons:

Foreign keys aren't shard-aware, greatly reducing their utility.

They introduce performance bottlenecks due to extra locking. In an insanely-high-write-volume OLTP environment, such as a social network, this really matters.

They don't play nice with online schema change tools in general -- not just fb-osc. These tools all involve creating shadow tables and propagating changes to them, which is problematic with foreign keys.


We were a large MS-SQL shop and we had the same. No FKeys in test or prod, and we were "only" a billion-and-change e-commerce operation, nowhere near a social media site's level of traffic.

To the original question: hand-written ALTER scripts, each taggable as pre, during, or post release actions. We had standard patterns for adding non-null columns (a pre step to add a nullable column and a cursor-based/batched update, then another ALTER to make the nullable column (now populated) non-nullable). We also had a set of rules to ensure version N of the code (web and DB) could run on the DB at version N or N+1.


When I worked at Etsy (a couple years ago now, so this is out of date), devs wrote `ALTER` statements and included them in a ticket. Once a week, the DBAs would run all the migrations. The only real things I remember worrying about were making sure to set a default if you were adding a column, and that if you were changing a large table, it'd take a long time.

My current company uses mongodb, so migrations aren't a thing. It's pretty nice, TBH.


We wrote a tool that pulls in several open source tools, the most important of which is sqitch[1] - a beautifully simple Perl based tool for managing migrations that supports rollbacks and verifications - all written in SQL.

All hooked up into CI, as well as being available from the CLI for developers, so they can open a PR and get feedback, as well as seeing what locks are required for a given migration.

1: https://sqitch.org/


For Java projects, the most common one tends to be Flyway, in my experience. There's also Liquibase that I've heard of, but never used.

Flyway is okay in my experience. Can't complain about it, but I can't praise it either, it just does what you'd expect.


Liquibase is pretty solid, my company has used it across a couple of projects. Similar to your Flyway experience, it's not anything particularly amazing but it works.


Django migrations - it’s a truly great tool.


Yes it is. (Except when they made makemigrations be like O(2^N) or something in the number of models around 1.9).


At my previous job we either didn't migrate and wrote application level transforms that would update records as they were encountered by users (mongodb) or we had a custom built migration system that ran JavaScript snippets on our shards. The migration system was miserable to work with and it was hard to debug the code on stage in such a way that would allow us to anticipate whatever might be on prod...


This sounds like a nightmare. Why?


>(mongodb)


This is a fairly typical approach minus the pain points of our specific migration system for Mongodb as far as I know. We favored application level transforms for the sake of safety and speed. Doing that obviously gets really bothersome as time goes on though.


rails db:migrate

It’s always one of the first things I miss when I have to work on a non-rails codebase.


We use the RedGate SQL compare tools [1] to compare our new schema to our old one and auto-apply the diffs to the production DB (this is done automatically by our deployment process).

To reduce the chance of error we don’t destroy columns or tables.

Our application then has an update step which runs on startup for any data migrations (or new data additions), and then updates a version number stored in the DB. The data migrations are super rare.

This all means we can migrate from any past version to our latest one: because we have all previous schemas stored in git and n update functions in our app that can walk the versions

It’s worked reliably for over a decade and is pretty much entirely pain free

[1] https://www.red-gate.com/products/sql-development/sql-compar...


I've got a fair amount of experience with it and the Change Automation tool they have. Both are fantastic products, the latter for certain scenarios over SSDT itself. Sounds like you have had it sorted for a while; my main concern is when it wants to rebuild certain tables when a simple sp_rename or other step would be sufficient.


This is such a common complaint!


> To reduce the chance of error we don’t destroy columns or tables.

Does this mean you have a lot of unused tables and columns deprecated in the database?


No. It’s very rare that columns or tables are deprecated. It’s a very mature web app that basically just grows.


Django. Django generates the schema migrations automatically from the changed table schema definitions that the developer used when testing their branch, and those rarely cause problems. Data migrations need to be tested against staging DBs with realistic data. But neither is really a major pain point: individual developers create and commit the migration files while preparing their branches for review.


The only major gotcha is renaming fields: by default it'll drop the old field and create a new one. You have to do the renaming by hand - trivial, but it can't be forgotten... which is why you always test on staging first, of course.


Makemigrations detects if the type is the same and asks you if you renamed it.

So, when changing columns do the type change and name change in separate migrations and you won't have to do it by hand.


really the best part about Django


Migrate during continuous deployment.

We use a staging environment, so CD deployed migrations are always run at least once before running on production.

Migrations have to be written so that currently deployed code will work after the migration. Eg, to rename a column, add+copy, deploy, then in a second migration drop the column.


There is a great write-up from Stripe which explains their process for database migration:

  - Dual writing to the existing and new tables to keep them in sync.
  - Changing all read paths in our codebase to read from the new table.
  - Changing all write paths in our codebase to only write to the new table.
  - Removing old data that relies on the outdated data model.

https://stripe.com/fr/blog/online-migrations
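
A toy illustration of the dual-writing phase (sqlite3 with invented table names, not Stripe's actual code):

  # During the migration window every write goes to both tables inside one
  # transaction, so readers can be switched to the new table at any point.
  import sqlite3

  def save_subscription(conn: sqlite3.Connection, sub_id: int, plan: str) -> None:
      with conn:  # commit both writes together, or neither
          conn.execute(
              "INSERT OR REPLACE INTO subscriptions_old (id, plan) VALUES (?, ?)",
              (sub_id, plan),
          )
          conn.execute(
              "INSERT OR REPLACE INTO subscriptions_new (id, plan) VALUES (?, ?)",
              (sub_id, plan),
          )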


Depends on the database, in my opinion. Each one has quirks to be mindful of, depending on the amount of data you're migrating/touching.

For example, if you accidentally put a 'default' on a column when you add it to postgres, it will lock the entire table while it rewrites every row, inserting that default value.

Another common postgres blunder is creating indexes on big tables without using 'concurrently'. This also locks the table for writing and you'll have a bad time.
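
A minimal example of the second point (psycopg2, with an invented table/index): CREATE INDEX CONCURRENTLY refuses to run inside a transaction block, so the connection has to be in autocommit mode.

  import psycopg2

  conn = psycopg2.connect("dbname=app")
  conn.autocommit = True  # required: CONCURRENTLY can't run in a transaction
  with conn.cursor() as cur:
      cur.execute(
          "CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_events_user_id "
          "ON events (user_id)"
      )
  conn.close()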


Just for reference, as of PG11 it no longer locks or rewrites for default column creation.

It's one of the nice things about Postgres, they're always improving. A few years ago, concurrent index creation wasn't a thing either. Nor were "CREATE/DROP IF (NOT) EXISTS" statements for various bits and pieces, but they just keep adding to it over time.

In the upcoming PG12 there is REINDEX CONCURRENTLY, so we'll finally be able to fix corrupt/bloated indexes on the fly without having to drop and recreate them.


The one pain is having to manually decide on a cut-off point for migration scripts, meaning when to start fresh from a single schema and restart the migrations again. This is basically free food for some incubator at HN, so if they are going to make some breakthrough product: make it so you never have to worry about that, and it will be worth me posting this.


On my team we use numbered sql scripts. We restore a production db on our local boxes. This is the starting point. The numbered scripts are executed against this. So the actual deployment is getting tested out locally all the time. This is dead reliable, and handles tricky data transformation that would fail under a declarative/calculate approach.


I've used Flyway[0] at my current and previous shops. It seems to work well.

[0] https://flywaydb.org/


Using DbUp, which is basically a library that you use to wrap SQL scripts in a console app.

This gives you full control over your migrations, while still being incredibly simple - it's just SQL!

For views, sprocs and functions, we don't use sequential migration scripts, instead opting for idempotent creation. This also means it's easy to see the evolution of views etc when looking in source control.

In the past I've used Entity Framework and Entity Framework Core migrations, and hated them both - I'll never be a fan of codegen, but aside from that they sometimes generated wrong code that had to be manually adjusted, and you also quickly end up with hundreds of scripts.

I like DbUp very much.


My team works with five database systems containing more or less the same data because we "switch" to a new DB every year without investing enough in migrating off the old one.


It's "job security" ;)


I don't think there is a way to change the database without slowing down development. After all, the data is the most valuable thing you have.

Besides that, I tend to follow the same general principles RainforestHQ presented in 2014 https://www.rainforestqa.com/blog/2014-06-27-zero-downtime-d...



I second this. migrate works great for me.


We have been doing migrations with our internal ORM for many years now; first it was written in Perl, then ported to Java, later ported to PHP, and after that ported to C# (past ~10 years), and it does up/down migrations with safeguards in place. It (the underlying algorithm/heuristics) has been working fine for around 20 years on around 1000 projects (client consultancy mostly) in that time. Lessons learned (opinionated/IMHO, no sweeping wisdom attempt):

a) database changes should be in code so there is no discrepancy between code + db

b) your tooling (in this case our ORM) should figure out how to get from the current state to the new state; it should not be a manual job (because you will make mistakes)

c) migrations including rollback should be fully automated (we only had issues when people tried to do clever things manually)

d) have common-sense safeguards in place; do not try to be too clever to save a few MBs of disk space; for instance, deploy will fail if you attempt to remove a column - you can only do that manually


We used to use dbschema (https://pypi.org/project/dbschema/) to apply migrations generated by SchemaSync (http://mmatuson.github.io/SchemaSync/) using a helper script to simplify capturing a migration from a known base of previous migrations.

After submitting a number of bug fix and feature patches to the upstream projects I ended up writing a (IMO better) tool to apply the migration scripts (https://bitbucket.org/koalephant/mallard). We currently still rely on SchemaSync for that part, but it’s been more reliable and it’s ultimately a tool for developers who review the generated sql anyway - the tool to apply them needs to run automated on remote environments.


.NET/MSSQL shop here:

Our source controlled files contain the database definition:

- a dacpac file generated by SSDT for the schema

- sql scripts for seed data (which changes a lot in our case), generated by a custom CLI tool (uses bcp.exe).

We have a "dev" database that is basically the "master". We have a CLI tool to generate the database definition files from the dev database. When someone make/need changes in the database, he makes them in the dev DB then call the CLI tool to update the source files.

When publishing a version, the dacpac and data script are included. Migrations are created on demand (no need to go through each version; skipping versions is common). Our migration tool creates reference databases of the source and target versions, then generates a schema diff script with SSDT and a data diff script with ApexSQL Data Compare. We can review/edit the migration scripts before applying migrations.

It works well enough that we run automated migrations at night (with backup before/restore after in case of issue).


Scripts.

And I hate to tell you but despite what you’ll read on HN every single company I’ve seen and every single vendor I’ve dealt with - is using scripts.

Just simple scripts, in source control, amended by hand, lord hopes applied as part of a CI system, often without any kind of version numbering (so once it’s in the database you don’t know where it came from) and with no rollback.

Declarative migrations do exist. If it's in .NET then it's invisible to me, but I imagine it's pretty rare. Using SSDT (Microsoft) for tooling is possible but has so many edge cases and is poorly documented, so coming up with an architecture requires the kind of expert of which there may literally only be a dozen in the world, so it isn't done outside of demos that fail and get thrown out the second you ever try to implement them.


I recently had to update 3 db machines with the updated schemas from one, and the data from a third. Then copy the finished version over to the first and finally the third.

Turns out Visual Studio has a 'diff' generator for both schema and data. Holy hell, that worked a treat.


We use FluentMigrator for .NET, but basically the workflow is similar to Rails migrations. It is working fine and did not slow us down.

What slows us down and is related to this topic is cross-database migrations. We have a bunch of microservices in different codebases. Sometimes we need to adjust bounded contexts, which moves responsibilities from one service to another, and we found it's much harder to migrate between the different databases behind those services. It's either hard to keep track of in the same repo and/or too slow to migrate large volumes of data across hosts. In the end we wrote some optimized SQL to do that. Maybe a monorepo could help, but this doesn't happen too often, and we haven't started trying it yet.


We have had good success with liquibase [1] and flyway [2].

[1] http://www.liquibase.org/

[2] https://flywaydb.org/


We use https://github.com/cakephp/phinx - works pretty well. It just runs on deployment. Thankfully I've never had to try using the rollback feature.


- Flyway for relational databases, and

- flyway-like go based homegrown process for Cassandra (that includes support for executing go scripts for data migrations etc)

Above work pretty well - for many, many microservices and continuous deployments all the way to prod.


I feel like Terraform would be a fantastic tool for this but never found the time to look into what it would take. Less so for complex data changes maybe (or maybe not) but for things like basic schema changes certainly.


We use RedGate SQL Source Control. However, we also deploy our applications using MSDeploy and SQL Source Control has no good solution for that. We stitch the migration scripts together manually. Also, no matter how good you control it, your database gets out of sync with the migrations after a while. We also have an integration test which restores a base database and applies the migrations of the current branch on top of it.

We're going to research DACPACs to deploy databases. We hope it will be better.


.NET has Entity Framework, an ORM that has migrations. It isn't problem free, but we found workable solutions to our issues.

By default EF will throw an error if the model (in code) and the database are out of sync. There is a setting you can set to tell the ef not to care about the version, but then you have to take the responsibility of making sure the old code will run against the new db version.

Most of our stuff is adding a new column or table so the old code doesn't rely on it and everything goes fine anyway.


Yes, we have used that for a bit and for the most part this works fine. We had however a case where several migrations were not applied in production, and that is a mess to figure out.

But the development experience, especially at the start of the project (if you don't maintain your seeding code), is awesome. You just point the project at an empty or non-existing database, the schema is automatically created, and any necessary base data can automatically be seeded.


But that is the most important part of migrations - throwing that error so you are aware of it. First you fix your db and then push to the test server, and you never have a broken db in production, NEVER.


Is this question asking how to move from one schema to another, or one database backend to another? Most answers here focus on schema changes but I'm not sure that was the question.

Schema changes are relatively straightforward, there are plenty of tools that can help. Changing the database backend is a different story altogether. There are so many unknown unknowns when changing databases, that it's generally best to avoid it if you can.


I work at a Perl shop and we use Sqitch [1]. It helps manage your migrations, you write three raw SQL scripts per change: deploy, verify and revert. The tool works with any DB or backend, since it only does the schema.

It does NOT integrate with your server framework so you need to figure an ORM solution out, independent of Sqitch.

[1] https://sqitch.org/


I wrote my own schema as code library and then switched back to normal migrations with no down method. It takes some care to not delete anything for a while until it’s definitely not needed ever again.

https://www.ajostrow.me/articles/how-to-run-migrations


Previous shop we used entity framework migrations in .net. So migrations are applied by octopus deploy.

This place we do it manually, but because of how the app is architected migrations are rare. If we migrate we try to make the code compatible with both the before and after schemas.



If anyone is looking for a javascript specific solution, I maintain a module originally written (and abandoned) by TJ.

$ npm i migrate

https://www.npmjs.com/package/migrate


Why does it matter what the tool is written in?

If anything, I'm a lot less likely to use something written in JS when correctness is absolutely paramount.


Django and other modern web development frameworks handle db migrations pretty well.

$ python manage.py migrate


We use dbmate [1].

SQL migration management with in-database version tracking.

Super simple, but works for us.

[1] https://github.com/amacneil/dbmate


We use https://github.com/rubenv/sql-migrate for Go

You basically write SQL queries for up and down.


For my current project I have not needed to migrate data.

I have a system called timequerylog and a tool tql-cli where data is just appended to JSONL files organized by key, date, and time.


Ohh, this is a favorite topic of mine and I'm of the opinion there's no clear best solution (possibly good market opportunity here), only a series of trade-offs.

----

In one project, we use DBGhost (for MS SQL Server). It's like RedGate SQL Compare, but in deployable, self-contained executable form. It does a complete schema sync, so internally we run it on every build, and externally it can upgrade from every previous release, without the pesky "in between" evolution you tend to get with ordered migration scripts. It's run as part of our upgrade process for every app version, and our deploy package is built from the 'initial database create' SQL script in source control.

To make a schema change such as adding a new column, you modify the relevant `CREATE TABLE` script, commit it to source, and that's it.

We also use a hand-built pre- and post-deploy script to do anything tricky that can't be automated (renaming a column, copying/converting data from one column to another, etc). Importantly, these scripts are idempotent (eg: `if (old_column_exists) { start transaction; create new column; copy+transform data; drop old_column; commit transaction; }`). We generally avoid major changes like this as much as we can, but it's possible when necessary. We also have DBGhost configured not to drop anything (to avoid mistakes, or dropping customer customization that they do even though our support contract explicitly says not to), and instead write those by hand.
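
A rough sketch of that idempotent guard, written against a generic DB-API connection with invented table/column names (the real scripts in question are hand-written T-SQL):

  # Pre-deploy step: convert old_total (dollars) into total_cents, but only if
  # the old column still exists, so re-running the script is a no-op.
  def convert_total_to_cents(conn) -> None:
      cur = conn.cursor()
      cur.execute(
          "SELECT 1 FROM information_schema.columns "
          "WHERE table_name = 'orders' AND column_name = 'old_total'"
      )
      if cur.fetchone() is None:
          return  # already migrated on this database
      # With autocommit off, SQL Server and PostgreSQL run this DDL+DML in one
      # transaction, so a failure leaves both columns intact.
      cur.execute("ALTER TABLE orders ADD total_cents INT")
      cur.execute("UPDATE orders SET total_cents = old_total * 100")
      cur.execute("ALTER TABLE orders DROP COLUMN old_total")
      conn.commit()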

This process has been in many dozens of customer-facing releases -- including on-premise customers -- for several years, and 'just works'. DBGhost is a clunky app, but our interaction with it is now entirely scripted (via either build or deployment scripts), and the compare engine part of it is actually very good.

----

In another pretty simple app (that runs on our cloud infrastructure only, with 'production' deployments mirrored in a couple different regions, and usually a single 'dev' deployment though sometimes there are other temporary ones) we opted to only use an idempotent, hand-written script, executed as part of deployment. It has `create table if not exists` statements to setup from scratch, and it also has the transforms for changes done over time (`create index if not exists;` `drop column if exists` etc). We periodically remove the transform statements to clean it up after all deployments have been updated past that point.

Even though it's manual, it's actually quite easy to maintain, so long as you're careful about ensuring it's idempotent. The nice part is it typically gets deployed to the dev infrastructure multiple times during the course of working on something, so if there's a problem it becomes obvious very quickly when the deployment fails.

----

There's also another app which uses the more traditional ordered migrations files, which I liked at first but over time I find it annoying. Deploying from scratch installs a really old initial database schema, then proceeds to apply dozens of modifications, many of which overwrite previous modifications.

----

I've also worked on an ordered migrations file app where there was a separate 'create' step for the initial creation (instead of running through each migration). The first time I deployed from scratch I found a bunch of problems where the create script wasn't 100% synchronized with the migrations, but also was synchronized enough that it wasn't possible to 'fix' by running each migration (one of the early ones failed). The only fix was to go through by hand and manually run the relevant migration files. I'm sure there can be better practices to help prevent this (eg some compare step on build), but this happened years ago and still sticks with me as a terrible way to do migrations.


SQL scripts. It is painful.


The horror. I actually had a dev tell me he didn't trust Doctrine (Symfony project), so he would do it all in SQL. Thankfully the project was relatively new, so it was possible to rewrite everything using migrations.


Poorly


SSDT / DACPAC


This doesn't fit everywhere, but...

  1) Never modify existing tables (okay, almost never).
  2) Add new tables at will.
  3) Pay a consultant to write PL/SQL that glues it all together and blame them for any and all issues.


3) ha ha



