First step is to decouple migrations from deploys, you want manual control over when the migrations run, contrary to many frameworks default of running migrations when you deploy the code.
Secondly, each code version has to work with the current schema and the schema after a future migration, making all code effectively backwards compatible.
Your deploys end up being something like:
- Deploy new code that works with current and future schema
- Verify everything still works
- Run migrations
- Verify everything still works
- Clean up the acquired technical debt (the code that worked with the schema that no longer exists) at some point, or run out of runway and it won't be an issue
There's a little bit more to it. Firstly you can deploy the migration first as long as it's forwards compatible (ie. old code can read from it). That migration needs to be zero downtime; it can't, for example, rewrite whole tables or otherwise lock them, or requests will time out. Doing a whole new schema is one way to do it, but not always necessary. In any case you probably then need a backfill job to fill up the new schema with data before possibly removing the old one.
I thought it was correct to run the DB migrations for the new code first, then deploy the new code. While making sure that the DB schema is backwards compatible with both versions of the code that will be running during the deployment.
So maybe there's something I'm missing about running DB migrations after the new code has been deployed - could you explain?
I'm not the person you've asked, but I've worked in devops before.
It kinda doesn't matter which you do first. And if you squint a little, it's effectively the same thing, because the migration will likely only become available via a deployment too
So yeah, the only things that's important is that the DB migration can't cause an incompatibility with any currently deployed version of the code - and if it would, you'll have to split the change so it doesn't. It'll force another deploy for the change you want to do, but it's what you're forced to do if maintenance windows aren't an option. Which is kinda a given for most b2c products
There's a couple of fundamental rules to follow. First, don't put something that will have insane impact into the application deploy changes. 99% of the DB changes are very cheap, and very minor. If the deploy is going to be very expensive, then just don't do it, we'll do it out of band. This has not been a problem in practice with our 20ish person team.
Second, it was kind of like double entry accounting. Once you committed the change, you can not go back and "fix it". If you did something really wrong (i.e. see above), then sure, but if not, you commit a correcting entry instead. Because you don't know who has recently downloaded your commit, and run it against their database.
The changes are a list of incremental steps that the system applies in order, if they had not been applied before. So, they are treated as, essentially, append only.
And it has worked really well for us, keeping the diverse developers who deploy again local databases in sync with little drama.
I've incorporated the same concept in my GUI programs that stand up their own DB. It's a very simple system.
The main challenge I have noticed with that approach is maintaining the sequencing across different branches being worked upon by different developers - solvable by allocating/locking the numbers from a common place. The other is rolling back multiple changes for a given view/stored proc where, say, each change added a separate column - if only one is rolled back, how do you automate that? Easily done manually though.
I will say that stored procs are specifically messy, and we did not have many of those. They had a tendency to really explode the change file. With DDL, you can fix a table column in isolation. Fixing a typo in a 100 line stored proc is another 100 lines. And we certainly didn't have multiple people working on the same proc at the same time.
We had no real need to address that aspect, and I would do something more formal with those if I had to, such as having a separate file with the store proc, and simply a note that it has changed in the change file. I mean, that's a bit of a trick.
> Though I'm still surprised that some people run DB alteration on application start up
I think I've seen it more commonly in the Golang ecosystem, for some reason. Also not sure how common it is nowadays, but seen lots of deployments (contained in Ansible scripts, Makefiles, Bash scripts or whatever) where the migration+deploy is run directly in sequence automatically for each deploy, rather than as discrete steps.
Edit: Maybe it's more of an educational problem than something else, where learning resources either don't specify when to actually run migrations or straight up recommend people to run migrations on application startup (one example: https://articles.wesionary.team/integrating-migration-tool-i...)
It makes things somewhat easier if your app is smallish and your workflow is something like e.g. Github Actions automatically deploying all commits on main to Fly or Render.
At my company, DB migrations on startup was a flag that was enabled for local development and disabled for production deploys. Some teams had it enabled for staging/pre-production deploys, and a few teams had it turned on for production deploys (although those teams only had infrequent, minor changes like adding a new column).
Personally I found the idea of having multiple instances running the same schema update job at the same time (even if locks would keep it from running in practice) to be concerning so I always had it disabled for deploys.
Secondly, each code version has to work with the current schema and the schema after a future migration, making all code effectively backwards compatible.
Your deploys end up being something like:
- Deploy new code that works with current and future schema
- Verify everything still works
- Run migrations
- Verify everything still works
- Clean up the acquired technical debt (the code that worked with the schema that no longer exists) at some point, or run out of runway and it won't be an issue