The problem is so trivially solved that there's almost no product to build. But people want to build something to solve it, so they build a much more complicated solution that can be configured to solve any potential database migration scenario, regardless of whether anybody in history has ever needed to handle that scenario.
Here's how you actually version databases:
    alter table buns add column seed_count int;
    update version_number set version = 31;
Extra points for building the tool that spins through all the files in source control and executes the ones that come after the current version_number on the server in question. And for automatically creating a change_0031-0032.sql file for the next guy to use (with that update line pre-filled).
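For reference, that whole tool is maybe 25 lines. A sketch in Python against SQLite (the file layout and names are my assumptions; as above, each change script carries its own version bump):

    # migrate.py -- sketch of the roll-your-own runner described above.
    # Assumes migrations/change_0031-0032.sql style names; each script
    # ends with its own "update version_number set version = N" line.
    import re, sqlite3
    from pathlib import Path

    conn = sqlite3.connect("app.db")
    conn.execute("create table if not exists version_number (version int)")
    if conn.execute("select count(*) from version_number").fetchone()[0] == 0:
        conn.execute("insert into version_number values (0)")
    current = conn.execute("select version from version_number").fetchone()[0]

    # run every script targeting a version above the server's current one
    for script in sorted(Path("migrations").glob("change_*.sql")):
        target = int(re.search(r"-(\d+)\.sql$", script.name).group(1))
        if target > current:
            conn.executescript(script.read_text())
            print("applied", script.name)

    # pre-fill the next change file for the next guy
    v = conn.execute("select version from version_number").fetchone()[0]
    nxt = Path("migrations") / "change_{:04d}-{:04d}.sql".format(v, v + 1)
    if not nxt.exists():
        nxt.write_text("\nupdate version_number set version = %d;\n" % (v + 1))
    conn.commit()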
You'll be hard-pressed to spend an entire day building this, but sadly nobody does. Everybody downloads some overcomplicated monstrosity like the one in the article that describes changes in XML and whatever else. Then spends two days fiddling with config files to get it working.
I'd give up complaining about it if I didn't keep ending up working with shops that use stuff like this.
Sure, but as with just about any problem in this space, the first 80% is the easy part and the final 20% will occasionally just flat out kill you.
How do you handle versioning stored procedures and triggers? What if your dev envs don't use partitioning but your QA / prod envs do? How do you rollback? Perhaps you're using the expand-contract pattern to avoid the need for rollback, great, how do you manage the complexity of inflight (expanded) but not yet contracted changes?
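(For anyone who hasn't run into it, expand-contract splits a breaking change into backward-compatible halves, and the pain lives in the gap between them. A sketch for a column rename, with hypothetical table/column names:)

    # Expand-contract sketch: renaming users.fullname to users.display_name.
    # Each phase is its own migration, with an app deploy in between.

    expand = """
    alter table users add column display_name text;
    update users set display_name = fullname;  -- backfill
    """
    # ...deploy app code that writes BOTH columns and reads display_name...

    contract = "alter table users drop column fullname;"
    # The awkward in-flight state is between the two deploys: the schema
    # is expanded, stragglers may still write only fullname, and contract
    # can't run until every writer has been upgraded.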
For basic use cases you definitely don't need liquibase but there comes a point where something like flyway makes a difference without adding much complexity at all.
> versioning stored procedures and triggers?
Procedures and the like get their own .sql files and live under source control like everything else. At build time, they all get blown away and rebuilt.
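Concretely, the blow-away-and-rebuild step can be a short loop; a sketch assuming Postgres, with one idempotent script per object under a procs/ directory (names are my assumptions):

    # Rebuild all procedures/functions from source control at build time.
    # Each procs/*.sql file is written idempotently, e.g.
    # "create or replace function ..." so re-running is always safe.
    import psycopg2
    from pathlib import Path

    conn = psycopg2.connect("dbname=app")  # hypothetical DSN
    with conn, conn.cursor() as cur:       # one transaction for the lot
        for sql_file in sorted(Path("procs").glob("*.sql")):
            cur.execute(sql_file.read_text())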
> What if your dev envs don't use partitioning but your QA / prod envs do?
SQL still works though, right?
> How do you roll back?
Nobody has ever needed to roll back. It's the scary ghost story that convinces people to build big ugly database build systems.
In short, most shops just plain won't ever need any of the stuff they're paying for (in dev hours) to run these big tools.
Hopefully somebody will build this small thing as a tool, so that people can download it and therefore be "using a tool" and thus not running up against Not Invented Here syndrome.
In practice, you end up having lots of complex restrictions - you have to change your schema definition slightly between dev and prod, because your dev box is Linux and your prod machine is AIX and there's a bug in your DBMS on AIX. You can't just drop stored procedures, because rebuilding them takes 30 minutes of downtime you can't allow for. Or you have security policies on data in prod that change the table definition.
In an ideal world you wouldn't need these things, but once you have a complicated enough system, you basically have to have a tool that does your database release management.
This is at the same time the truest and the wrongest sentiment.
True, because experience teaches that DB rollbacks are exceedingly rare in production. The cost of maintaining them is larger than the cost of doing it by hand. Moreover, since the rollback path typically is much less tested and validated, you run a significant risk of a blind rollback messing up your schema or data even further. With general-purpose programming we typically choose to hide new functionality behind a feature flag and toggle it, instead of relying on downgrading the software version.
Wrong, because there should be the ability to perform provably lossless schema and data transformations. A lossless transformation would by its nature be amenable to a safe rollback. Just like a unified patch(1) file is a lossless transformation of a file and thus can be reversed, a lossless database change could be too. The database engine could even offer explicit support for checking, validating, and enforcing lossless (reversible) transformations, in a similar spirit to foreign key constraints guarding against mistaken deletion of relational data. Come to think of it, it would be very similar to a long-running transaction that encompasses both data and schema changes, and that has side effects visible to other transactions before it is closed and has its metadata discarded.
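Nothing stops a migration tool from approximating this today: require every step to declare its inverse and reject lossy steps outright. A toy sketch of the idea (none of this is a real engine feature):

    # Toy model of enforced reversibility: a change set is accepted only
    # if every step carries an inverse; lossy steps are rejected up front.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Step:
        up: str
        down: Optional[str] = None  # None marks a lossy, irreversible step

    rename = Step(
        up="alter table buns rename column seed_count to seeds",
        down="alter table buns rename column seeds to seed_count",
    )
    drop = Step(up="alter table buns drop column seeds")  # no inverse: lossy

    def check_reversible(steps):
        lossy = [s.up for s in steps if s.down is None]
        if lossy:
            raise ValueError("irreversible steps, no safe rollback: %s" % lossy)

    check_reversible([rename])  # fine
    # check_reversible([drop])  # raises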
Some engines implement some support for some things. Many DDL commands can be run in a transaction in SQL Server. I believe Oracle creates a savepoint of some kind after every commit and DDL commands typically imply a commit. But it varies across vendors and it's a lot to untangle.
In practice a lot of rollback scripts tend to make the assumption that the change script succeeded without issue. When the change script fails for an unknown reason often the rollback script will fail as well, in the worst case also having partially succeeded.
I've done rollbacks several times, but never a DB schema rollback. The potential to lose data is too big. I try to write code and migrations that support rolling back at least one deployment without rolling back the migration. Most of the time you need this anyway to support zero-downtime deployments.
Writing and testing backward migrations is ... problematic.
Having rollback and other safeguards properly in place allows you to move faster and make mistakes with fewer consequences. Everything is a trade-off.
A couple of things a tool like Liquibase gives you that a homegrown script usually misses:

- an exclusive lock to prevent multiple instances running at the same time

- forcing immutability of change sets that have already been applied
> Then spends two days fiddling with config files to get it working.
Look, dozens and dozens of developers have been using this for many years at our shop without problems. If it takes two days to set up: so be it.
And we're writing our migration scripts in SQL, not XML.
Liquibase has its share of gotchas, especially in the SQL syntax. My sense is that most of the problems I experienced had to do with weird edge cases introduced by trying to handle some complex enterprisey scenario that I would rather deal with by just not letting things get that complicated in the first place. That exclusive lock mechanism, for example, isn't foolproof, and I suspect that it has bitten me more times than it's helped me.
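That said, the lock itself is the easy part to roll your own: on Postgres an advisory lock disappears with the session, so a crashed runner can't leave it wedged the way a lock row can. A sketch (run_migrations is a stand-in for whatever your runner does):

    # Serialize migration runs with a Postgres advisory lock.
    import psycopg2

    LOCK_KEY = 42424242  # arbitrary app-wide constant

    conn = psycopg2.connect("dbname=app")
    cur = conn.cursor()
    cur.execute("select pg_try_advisory_lock(%s)", (LOCK_KEY,))
    if not cur.fetchone()[0]:
        raise SystemExit("another migration run holds the lock")
    try:
        run_migrations(conn)  # stand-in for your actual runner
    finally:
        cur.execute("select pg_advisory_unlock(%s)", (LOCK_KEY,))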
Roll-your-own is not necessarily the easy option. I'd say that, if your goal is to get something robust enough to use in production, a day to implement is a very optimistic estimate. It's one to steer clear of unless you're sure you know what you're doing.
Which one is better depends, as usual, on your situation. I think my rubric would probably be: if the XML schema definition language is at all attractive to you, use Liquibase (or similar). That's far and away its most distinctive contribution. If, on the other hand, the knowledge that anyone would even consider defining their SQL schema in a language other than SQL fills you with holy rage, maybe choose an option that sits closer to the metal. This doesn't necessarily need to be a DIY project, but, thanks to Zawinski's Law, that might be the only way to ensure that your solution also stays close to the metal.
I'm frustrated to see people make a bunch of new tools that ignore the lessons from old tools, but for me this has been solved for a while.
As a datapoint, I've been a backend guy for coming up on 25 years now, and not only have I never needed to roll back a database migration, I'm not aware of that ever having needed to be done anywhere, even anecdotally. It just never happens in the reality we live in.
Yet all migration tools support it, require reverse migration scripts, etc. And you listed it in your "must have" list.
About rollbacks: yes, they are not always useful and you can think of ways to avoid them. But these days, when you can roll back a Docker application upgrade with a few commands/clicks, this way you can also roll back the DB changes. Not all changes require this, but there are many ways a schema upgrade can go wrong. Datapoint: changing stored procedures/functions that have to match the calling application's version. Yes, not everybody uses logic in the DB, but sometimes you need that performance.
Automatic rollbacks strike me as somewhat risky. My thinking is that if the migration (or upgrade) failed it was likely due to something that hadn't been thought about when the migration script was written. In that case it seems unlikely to me that another automated process (the rollback) will be any more successful.
Or you can spend a few hours to learn a tool that has tutorials online (where you can point newcomers). When you move to another project, you can take your little tool with you, but you will need to start again teaching and convincing teammates.
Just have a target schema definition somewhere, and use a schema diff tool to generate the changes you need on demand.
No need for version numbers at all. No need to run through change scripts to set up a database from scratch. Almost no need to manually write scripts.
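On Postgres, migra is one tool built around exactly this workflow; a sketch of how it slots in (CLI usage from memory, so check its docs):

    # Diff the live database against a scratch database holding the target
    # schema, and capture the ALTER statements that take one to the other.
    import subprocess

    result = subprocess.run(
        ["migra", "--unsafe",       # --unsafe permits DROPs in the diff
         "postgresql:///app_live",  # hypothetical connection URLs
         "postgresql:///app_target"],
        capture_output=True, text=True,
    )
    print(result.stdout)  # review, then apply as your change script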
I agree with the other replies - PLEASE don't do this. Use liquibase, flyway, or any other tool. They both let you write everything in SQL if that's your preference. The initial setup time will be saved later on if the team grows.
One thing that is nice from Liquibase is that you get a nice CLI that lets you do some fun stuff, like updating the database directly and then auto-generating a diff (your example above). See https://docs.liquibase.com/commands/home.html
Liquibase abstracts it away and allows you to use plain SQL where you need it. Your solution looks simple at first glance, but does not scale very well. For small projects your approach will work, but it's an instance of "not invented here".
Not really the case in the project I've used liquibase with.
Also: what about not having versions at all (as those may clash between branches), but using migration creation datetimes to line them up?
Basically, you have a project with all of your tables, stored procedures, etc. as separate .sql files per object, but they are always "CREATE" statements, never ALTER/DROP/etc. So your project defines a normative schema of how you want it to be.

What makes this really interesting is that you can then "compile" your database schema, and you actually get "compiler errors" for things like invalid syntax. (I'm putting quotes around that because it's not 100% the same as a programming language compiler error, but close enough.) And you get warnings and static analysis that help you detect issues before you even deploy to your local database. My understanding of how this works during compilation is that it basically spins up a temporary database, deploys your fresh schema to it, and validates that it deploys successfully. It can even validate against specific server target versions for compatibility.

Plus, you get IntelliSense since it knows about your full schema, making editing the .sql files easier.
Where it becomes very helpful is with changes. There are no migration scripts that you code by hand. Need to add a column? Just add it to the MyTable.sql file's CREATE TABLE statement. No ALTER necessary. You get nice git history showing the column was added to the table, without having to track back through migration history. But for new developers coming onto your team, or when you want to just start your local database over again, you don't need to run through years of migration history. You just start with the current normative schema and deploy it as only CREATE statements. Of course, you can also add pre- and post-deployment scripts as well for seeding data or whatnot.
How this works when it comes time to deploy changes to an existing database is that it builds the temporary database of the normative schema, then does a schema comparison to the target database, and checks for any changes that would cause data loss, such as dropping or dangerously altering columns. If no such changes are needed (e.g. if you're only adding a column, or changing a sproc), it can safely go ahead and generate and run that DDL script (the ALTER TABLE statements, plus your pre- and post-deployment scripts). If you want to proceed allowing the data loss, you can do so if you feel it is safe. But you have that safety check to prevent screw-ups. And all of this is generated into a nice .dacpac file that you can pass around in your CD pipeline, or hand off to a DBA for deployment. You can also generate that script without running it directly on the target if you prefer, and hand that script off as a SQL file.
Also, I've found this is quite compatible with git branching, due to the normative-schema nature, whereas with migrations you can easily end up with conflicting migration order when, e.g., two branches create migration #101. In the SQL Server Database Project case, if the changes are not conflicting in a git merge (e.g. columns added to two different tables), there's no problem and you don't have to touch any files at all. If they are compatible changes to the same file (e.g. two columns added to the same table), you can usually do an auto-merge and you'll likely end up with a valid schema. And in the rare case of a conflict, you can merge that however you see fit.
Is it perfect? No. Is it complex? Yes. But once you get used to how it works, I've almost never had to script out any DDL changes by hand (I often forget the ALTER syntax), I don't think I've ever had to roll back (rollbacks are usually as simple as re-deploying the older .dacpac file), and I've found so many issues due to compiler errors and static analysis that have saved a huge amount of effort. Not having something like this is the biggest thing I miss when I have to work with PostgreSQL or MySQL. If anyone knows of something similar for those databases, I'd love to know!
Internally it uses sqitch and apgdiff.
The process is almost exactly like the one you described for your tool.
It's completely language agnostic, super easy to embed into any deployment pipelines, and works using regular SQL (so you don't need to learn a new syntax).
Not exactly a Liquibase alternative for anybody working in the enterprise space.
I always find these tools a bit lacking though. Over the long term, the things that I find matter are:
- Correctness of changes being made
- Support in getting to zero-downtime migrations
- Support in CI around testing against the right schemas
- Squashing changes/garbage collecting after a given point
- Support for applying the same change to multiple production environments (where you can't roll back one because another failed).
These are all hard problems! Some are quite specific to certain kinds of application too. But companies always seem to end up building custom tooling for this sort of stuff. I'm sure there's a place for a tool that gets the right balance between support for these complex use-cases and defining a way of working.
Unfortunately I suspect that the tool _also_ needs runtime instrumentation of the database in some cases to understand behaviour enough to address all of these concerns.
Just for context - over the last 13 years, I've developed professionally in PHP, Java, Scala and Python, together with some code in Go and Rust. I can patch C and C++ code. By saying professionally, I mean being paid to do it as a main job.
The integrated nature of Django (ORM, admin, migrations) is a killer, but very rare to see. Spring + Hibernate doesn't come even close. Play framework seems to go in this direction but misses a lot of stuff. Maybe it's matched in modern PHP or in Microsoft land but I don't have much experience there. You can collect all the components and tie them together, but you're focusing on assembling the tooling when you could be focusing on the real problem. Being able to develop alone or in a very small team means real agility and low overhead. The time to start matters very much.
I know Django has its own pain points (for me, it's mostly performance). Each part of the stack has a better alternative as a standalone library. But I'm willing to sacrifice using the top tools in favor of using the top stack. Most of the web apps have a core product that's unique, but otherwise they all need to deal with database, authentication, authorization, CRUD admin interface etc.
Yes, Laravel has something like what you're describing: https://laravel.com/docs/5.3/migrations
Django has a set of views where you can manage users and their permissions, plus any data of yours. You mostly just specify what classes and what fields are visible in the admin interface. There are extension points so you can override some behaviors with code. Eventually, you can replace parts with your own views.
The integrated nature is beautiful. You already have the ORM models. With a few lines of code, you now have a GUI for them.
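For anyone who hasn't seen it, "a few lines" is not an exaggeration (the model and its fields are hypothetical):

    # admin.py -- register an existing ORM model; Django generates the
    # list, search, filter, and edit UI from the model's fields.
    from django.contrib import admin
    from .models import Bun

    @admin.register(Bun)
    class BunAdmin(admin.ModelAdmin):
        list_display = ("name", "seed_count")  # columns in the list view
        search_fields = ("name",)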
Does Django admin provide a way to model the schema in a GUI and create migrations automatically?
Have a look at http://forestadmin.com. It's a full GUI admin component. I found it when looking for Django alternatives when developing in Java.
* XML is a lot better for configuration if you don't have very good YAML validation. Wrong indentation and stuff can be easy to miss
* Make sure your k8s probes work with long-running SQL operations.
You can also include multiple change sets in one XML file; this helps with keeping the number of files down.
If you want, you can just write pure SQL inside the XML with the <sql> tag.
Note: we only implemented rolling forward. Rolling back is hard. Especially for dropping columns.
Btw https://github.com/amacneil/dbmate also seems cool.
However, this seems like traditional database migrations, which have been used since forever, doesn't it?
For our Ruby shop, we use Sequel Migrations. For generic migrations, using pure SQL, MyBatis migrations is fantastic.
XML is a non-starter for most people. I know this is a Java product, and Java developers have a perverse love affair with XML, but literally no one else uses it. The page mentions that changesets can be written in JSON and YAML, but there are no examples. XML is a horrid abomination and I have a viscerally negative reaction to seeing it. At least show something that doesn't cause PTSD to the viewer.
I've used Rails DB migrations (written in Ruby with ActiveRecord's DSL), Alembic (written in Python with SQLAlchemy's DSL), and roll-my-own pure SQL, and nothing beats SQL. You're working with a database, why not work with its language? Any migration of significant complexity requires the use of actual SQL, and trying to abstract that away just results in mixing languages and obscuring the change being made. It only takes a few lines of code to manage the versioning.
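e.g. the core of it, assuming a database with transactional DDL like Postgres (a sketch; names are mine):

    # Apply one change script and bump the version in the same transaction,
    # so a half-failed script can never leave the counter out of step.
    # (Relies on transactional DDL: fine on Postgres, not on MySQL.)
    import psycopg2

    def apply_change(conn, version, sql):
        with conn, conn.cursor() as cur:  # commits or rolls back as a unit
            cur.execute(sql)
            cur.execute("update version_number set version = %s", (version,))

    conn = psycopg2.connect("dbname=app")
    apply_change(conn, 32, "alter table buns add column glaze text")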
> updateTestingRollback tests rollback support by deploying all pending changesets to the database, executes a rollback sequentially for the equal number of changesets that were deployed, and then runs the update again deploying all changesets to the database. 
In a lot of common multi-tenant designs you could have thousands of schemas. You can't have a human doing this. Additionally, the hard part of schema versioning is really the data migration and rollback steps, which you failed to mention.
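To make that concrete: with one schema per tenant on Postgres, the runner ends up doing something like this (sketch, hypothetical names), and it's the retry and reporting around this loop that gets hard:

    # Apply one change script across every tenant schema.
    import psycopg2

    conn = psycopg2.connect("dbname=app")
    cur = conn.cursor()
    cur.execute("select schema_name from information_schema.schemata"
                " where schema_name like 'tenant_%'")
    tenants = [row[0] for row in cur.fetchall()]

    change = "alter table buns add column seed_count int"
    for schema in tenants:
        with conn, conn.cursor() as c:  # one transaction per tenant
            c.execute('set local search_path to "%s"' % schema)
            c.execute(change)
    # A failure midway leaves earlier tenants migrated and later ones not:
    # the multi-environment rollback problem in a nutshell.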
The GUI database client generates up and down SQL files automatically when you click and change schema.
Currently, it supports MySQL, MSSQL, PostgreSQL, SQLite.
Here is a simple demo of how this GUI works: https://youtu.be/IKvraKy0S90
GitHub: https://github.com/xgenecloud/xgenecloud
(Full disclosure : I'm the creator)
Liquibase also allows you to run Java-based change sets, not only XML or YAML, which is very good for more complex migration logic. But you have to be careful not to use your DAOs or some service that might mutate in the future to support a schema change; rather, use pure SQL hardcoded into that change set. Since change sets run in order, you can be sure of the current state of your schema.
Relatively flat learning curve, too.
I've always preferred Flyway since SQL is the natural language to interact with your RDBMS. I would only consider writing migrations in XML if I had to support multiple database implementations, e.g. when you build a product like Jira or SonarQube that supports different databases (but think twice whether you want to go down that rabbit hole).
"Introduction to managing DB shcema changes with Liquibase"