
Migra – A schema diff tool for PostgreSQL
https://github.com/djrobstep/migra
======
muxator
This fills a gap for me.

I tend to think of a database schema in declarative terms ("table T has these
fields"), whereas the SQL DDL statements are more akin to commands (ALTER
table T, adding a column). This always felt bad to me, because in order to
define the final state of your schema, you need to define the initial state
plus the commands necessary to get to the new one.

A diff tool frees you from this pain.
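
As a toy illustration of that idea (not migra's actual algorithm; the tables and columns here are invented), a diff tool takes two declarative descriptions and derives the imperative DDL for you:

```python
# Toy declarative schema diff (illustrative only, not how migra works).
# Each schema maps table name -> {column name: SQL type}.

def diff_schemas(current, desired):
    """Emit naive DDL that takes `current` to `desired`.
    Handles only new tables and new columns, for brevity."""
    statements = []
    for table, columns in desired.items():
        if table not in current:
            cols = ", ".join(f"{c} {t}" for c, t in columns.items())
            statements.append(f"CREATE TABLE {table} ({cols});")
            continue
        for col, sqltype in columns.items():
            if col not in current[table]:
                statements.append(f"ALTER TABLE {table} ADD COLUMN {col} {sqltype};")
    return statements

current = {"users": {"id": "serial", "name": "text"}}
desired = {
    "users": {"id": "serial", "name": "text", "email": "text"},
    "orders": {"id": "serial", "user_id": "integer"},
}
print("\n".join(diff_schemas(current, desired)))
```

You declare only the end state; the commands fall out of the comparison.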

~~~
evanelias
Indeed, although you can take it a step further than just diff'ing. Following
the philosophy of infrastructure-as-code, a fully declarative approach would
be a repository of schema files, and tooling that can convert changes to the
repo to actual DDL, which it then knows how to execute safely (even taking
into account multiple dev/stage/prod environments, sharding, etc).

I built a tool called Skeema that does this, although it only supports MySQL /
MariaDB / Percona Server for now.
[https://github.com/skeema/skeema/](https://github.com/skeema/skeema/)

It's currently a stateless CLI tool, using git-like semantics: you can pull
from a db to the filesystem, or push from the filesystem to a db /
environment, or diff to preview what DDL would be generated. But the next
iteration may include integration with CI/CD workflows, the GitHub API, etc.,
for a more seamless workflow: if you want to add a column to a table, you
create a pull request modifying the file, and the tooling automatically
applies the DDL to a staging environment. Then get your PR reviewed/approved,
and once you merge to master the tooling automatically applies the DDL to
production.

------
cellis
There is also a Javascript version of this at
[https://github.com/gimenete/dbdiff](https://github.com/gimenete/dbdiff) and a
Go version at
[https://github.com/joncrlsn/pgdiff](https://github.com/joncrlsn/pgdiff), each
with varying degrees of completeness, as well as apgdiff (Java) at
[https://github.com/fordfrog/apgdiff](https://github.com/fordfrog/apgdiff)
(these tools have been tried many times, and I've been keeping track of them
:D). I'm really tired of writing db migrations. Will try this out!

~~~
ssahoo
We used apgdiff for automated schema diff generation. 2.4 was released many
years ago and lacks compatibility with PG 10. Looking for alternatives. Thanks
for the mentions.

Edit: Migra does not compare pg_dump outputs.

~~~
djrobstep
That's right, migra does not compare dump outputs.

I've found that the best parser of PostgreSQL dump files is PostgreSQL, and
don't want to reimplement that.

Hence, migra works directly on database connections. If you have a dump file
you want to compare, you can just create a temporary database and load the
dump into that first. There's an example in the docs of how to do this easily.
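
For instance, a rough sketch of that recipe (the database names and dump path here are made up; migra's docs have the canonical version):

```python
# Hypothetical sketch: load a dump into a throwaway database, diff it
# against a live one with the migra CLI, then clean up.
import subprocess

def dump_diff_steps(dump_path, target_url, temp_db="migra_tmp"):
    """Return the commands to run, in order."""
    return [
        ["createdb", temp_db],
        ["psql", "-q", "-d", temp_db, "-f", dump_path],
        ["migra", f"postgresql:///{temp_db}", target_url],
        ["dropdb", temp_db],
    ]

# for step in dump_diff_steps("schema_dump.sql", "postgresql:///appdb"):
#     subprocess.run(step, check=True)
```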

~~~
ssahoo
This complicates workflows where schema comparison needs to happen in source
control.

------
blaisio
This is awesome!

A lot of people point to migrations as the best way to track changes to a
database schema, but there are a lot of problems with them. For example, they
layer a second version control system on top of your existing one, which can
cause a ton of problems. They also don't maintain themselves well: if you
leave them alone, running the full chain of migrations just takes longer and
longer over time, without giving you any more utility.

I think we need more support from databases themselves to solve this problem.

In the meantime, this is a really good stopgap, because it can theoretically
allow you to just have a file with your "ideal schema" for each commit. No
need to maintain a separate database migration history too. You can even
generate a series of migrations by looking at your git history!
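
For example (a hypothetical sketch: it assumes each commit's schema file has already been parsed into a dict, and it only handles added columns):

```python
# Hypothetical sketch: turn a history of "ideal schema" snapshots (one
# per commit, oldest first) into a series of migrations by diffing
# consecutive pairs. The snapshots and names here are invented.

def diff_columns(old, new):
    """Naive diff: detects only columns added to existing tables."""
    ddl = []
    for table, cols in new.items():
        for col, sqltype in cols.items():
            if col not in old.get(table, {}):
                ddl.append(f"ALTER TABLE {table} ADD COLUMN {col} {sqltype};")
    return ddl

history = [
    {"users": {"id": "serial"}},
    {"users": {"id": "serial", "name": "text"}},
    {"users": {"id": "serial", "name": "text", "email": "text"}},
]

migrations = [diff_columns(a, b) for a, b in zip(history, history[1:])]
```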

~~~
djrobstep
This is a great summary of the advantages, as I see them. Would you mind if I
add this comment as a testimonial somewhere in the docs?

~~~
blaisio
Sure, go for it!

------
yangyang
This looks interesting.

If you're using SQLAlchemy, Alembic is a good migration management /
generation / application tool:
[http://alembic.zzzcomputing.com/en/latest/](http://alembic.zzzcomputing.com/en/latest/),
and works with most (all?) of the databases supported by SQLAlchemy. It
doesn't handle every kind of database object or attribute thereof, but it's
reasonably easy to extend.

~~~
djrobstep
Hi, migra author here. I wrote migra partly out of frustration with Alembic
and similar migration tools.

Some of the problems:

- Annoying version number management

- Management of long "chain" of migration files

- Unnecessary coupling between ORM and migration tool

- Heavyweight process to generate and run each migration, which slows down
local development

- Bad testability

I talked in more depth about this topic at last year's PostgresOpen, if you're
interested:
[https://www.youtube.com/watch?v=xr498W8oMRo](https://www.youtube.com/watch?v=xr498W8oMRo)

~~~
zzzeek
Alembic / SQLAlchemy author here.

> Annoying version number management

the GUIDs? OK.

> Management of long "chain" of migration files

Alembic does not require migration files to be linked together in any
particular way, except that individual migrations have dependencies on others.
Each file represents an individual series of changes to a database, which is a
reasonable system for developers where a particular new feature comes with a
particular new set of schema artifacts. It integrates very well with version
control and accommodates workflows like branching and merging of source
trees.

That said, I would assume the Migra way is that you keep the entire database
schema in a versioned file so you can diff directly, and the "generate the
changes" step happens on the fly (otherwise, if you were persisting the DDL
for changes, I don't see how you'd avoid having a "chain" of migration files
dependent on each other). That is more lightweight, if you can get by with it.
When you have your tables written out in terms of SQLAlchemy metadata, that's
your "schema"; it's there, just in terms of SQLAlchemy structures. Alembic
could generate the migrations and run them without you ever having to see or
commit them; however, I can't provide a diff tool that I can guarantee is
100% accurate and whose results would never need to be checked before
running. Having the individual changeset go into source control allows the
developer who creates that migration to make sure it generates correctly, and
to add other elements of the change that aren't necessarily part of a plain
schema diff; it can then be committed permanently without the chance of it
mis-generating at a later time.

> Unnecessary coupling between ORM and migration tool

100% false. There is no usage of SQLAlchemy ORM within Alembic anywhere. The
"sqlalchemy.orm" library is not imported. There's a single unused artifact in
the test suite that was copied over from SQLAlchemy:

    $ grep -e "sqlalchemy.orm" `find alembic -name "*.py"`
    alembic/testing/plugin/plugin_base.py:  from sqlalchemy.orm.util import randomize_unitofwork

You're likely referring to the fact that Alembic builds on schema structures
defined in terms of SQLAlchemy Core. It's hard to say whether this is
"necessary" or "unnecessary"; the schema has to be defined somewhere. With
Migra it seems to typically be hand-written DDL or some other generation
tool; with Alembic it's SQLAlchemy Core, since it's for people already using
SQLAlchemy.

> Heavyweight process to generate and run each migration, which slows down
> local development

Alembic has an autogenerate feature that is in some ways similar to migra's
schema diff, but there are areas where it does not accommodate certain
structures; the most prominent is PostgreSQL enumerated types. This is not a
design flaw, though, it's a missing feature I'd love someone to work on.

There is also, of course, the need to express your schema in terms of Python
structures, which is probably what you're referring to when you say "coupled
with the ORM" as well as "heavyweight"; but this assumes an application is
already based on SQLAlchemy and already has this information.

If not, then Migra is a good choice, assuming database agnosticism is not
required and you have some totally other means of generating your schema,
which I'd imagine is hand-written DDL, if I'm understanding the documentation
correctly.

> Bad testability

This is an area where Alembic is lacking in guidance and pre-built features,
but not in general capability. We have Alembic migration test fixtures in
OpenStack that might serve as the basis for something more generically part
of Alembic, but I don't have the resources to work on this solo. Alembic does
include schema comparison tools, and the examples for testing in Migra don't
look much different from what one would do with Alembic (indeed, this is what
we do in OpenStack: compare schemas using Alembic).

~~~
djrobstep
Hey, thanks for the response!

I've written a lot of code on top of SQLAlchemy over the years, big fan of
your work.

These points are mostly philosophical differences; I appreciate that not
everybody has the same perspective I do.

But to me, thinking of database changes in terms of current state/goal state
and the differences between them is more intuitive than as a sequence of
versions.

By heavyweight, I mean the need to generate or modify a migration file and
version each time a change is required. When I'm working on an app and
playing around with table structures, I find it much easier to just type
"sync" and have everything automatically match what's defined in my models or
setup scripts or whatever, without generating a new numbered version each
time I make a change.

You don't have to build your schemas through hand-written DDL. You can still
build your schemas through Python structures or however you want, as long as
you can generate a target with it to compare against.
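
A minimal sketch of that "sync" idea, using sqlite3 from the standard library as a stand-in (migra itself diffs live Postgres connections and handles far more than added columns; the table here is invented):

```python
# Toy "sync": introspect the current schema, diff against a declared
# target, and apply the DDL needed to catch up. sqlite3 stands in for
# a real Postgres connection purely for illustration.
import sqlite3

target = {"users": ["id", "name", "email"]}  # desired columns per table

def sync(conn, target):
    applied = []
    for table, cols in target.items():
        existing = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
        if not existing:
            conn.execute(f"CREATE TABLE {table} ({', '.join(cols)})")
            applied.append(f"create {table}")
            continue
        for col in cols:
            if col not in existing:
                conn.execute(f"ALTER TABLE {table} ADD COLUMN {col}")
                applied.append(f"add {table}.{col}")
    return applied

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id, name)")
sync(conn, target)  # adds users.email; calling it again is a no-op
```

The point is that "sync" is idempotent: run it as often as you like while iterating, and the database converges on whatever the models declare.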

~~~
zzzeek
Yes, I can definitely see Migra is more productive when switching around
between schemas in development. But I'm not sure how I'd go about using this
approach if I need to deploy a fixed series of schema migrations to a
production server farm.

------
xmatos
I like the idea of a tool like this, but I'm not sure it's a better process
than having migration scripts. And I dislike migration scripts.

To me, the perfect process would be, using an MVC web framework as an
example, to generate a temp schema from the models and run a schema diff tool
like this to update the destination database. That would eliminate migration
scripts, but would probably slow down the db sync process once your db starts
getting bigger.

Now, if you're not using an ORM, I think it's a bad idea to use a tool like
this to update a production db from development. It will work for a single
dev, but not for a team project, since you'd need a common dev schema that
would be used as a source for the diff, and that will get clogged quickly.

It's still a nice tool to find and update small inconsistencies between
different environments. Nice work!

~~~
stephenr
I’ve been using a couple of python tools wrapped in two shell scripts to
achieve what I think works well:

migration-capture.sh (using
[https://github.com/mmatuson/SchemaSync](https://github.com/mmatuson/SchemaSync))
creates a temporary db using a base schema SQL file, applies previously
created migrations, and then creates up/down migration scripts by comparing
that to a db that’s been modified manually or by an mvc model tool, etc.

migration-apply.sh (using
[https://github.com/gabfl/dbschema](https://github.com/gabfl/dbschema))
applies the “up” scripts that haven’t been applied previously OR runs the
“down” scripts corresponding to “up” scripts that have been run previously but
are not found in the deployed scripts (ie rolling back a deployment
automatically runs the down script for migrations that were in the rolled back
versions)

Right now my only concern/goal for improvement with the setup is to replace
the Python tool the apply script relies on with a pure shell solution, so
it's one less dependency to install on production machines.

Edit: spelling typo

Edit2: Wtf is a bass schema.

------
Twisell
I don’t really get the part about extensions. It says it leaves extensions
out of the diff.

However, is it possible to include extensions using a flag? The first use
case I was thinking of was precisely generating upgrade scripts in the
context of an extension.

I’ve experimented with pure SQL extensions for some projects, and the real
pain point was generating upgrade scripts. So far I’ve only used a "nuke and
reinstall" approach, but being able to prepare upgrade scripts with migra
would be much smoother.

------
conceptme
I used pgModeler ([https://pgmodeler.io](https://pgmodeler.io)) some time ago
for this, and MySQL Workbench for MySQL. It was still a bit unstable the last
time I tried it, but it's pretty nice when it works.

------
Ericson2314
This is a very nice idea! Even if it just did schema equality checks, that
would be nice.

------
wereHamster
I've used this one in the past:
[https://github.com/eulerto/pgquarrel](https://github.com/eulerto/pgquarrel)

------
ergo14
What are the benefits of using this over Alembic? Both are Python tools, and
Alembic supports other database vendors too.

------
HugoDaniel
How does it compare to Sqitch?

~~~
djrobstep
Migra is strictly a diff tool. It doesn't impose a particular migration
workflow on you (but is suited to some more than others).

Sqitch is a migration workflow tool, for its particular preferred workflow.

------
girishso
Looks great. Just yesterday I was thinking of creating something like this!

