
Django zero-downtime migrations for Postgres that respect database locks - tbicr
https://github.com/tbicr/django-pg-zero-downtime-migrations
======
StavrosK
So basically what this does is avoid locking the database while migrating?
Obviously (and the README mentions this) the two releases have to be
backwards- and forwards-compatible, so with this backend you avoid locking out
other processes while a migration runs.

Very useful, especially for long migrations, but I would like to see a bit
more detail about how this library achieves this, what the caveats are, etc.

EDIT: Never mind, there's a comprehensive "how it works" section farther
down; I just needed to scroll far enough. This is very useful.
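
For anyone else skimming: the gist is that the backend runs its DDL with
conservative lock/statement timeout settings, so a blocked ALTER fails fast
instead of queueing behind long-running queries and stalling all other
traffic. A rough Python sketch of that idea (hypothetical connection details,
not the library's actual code):

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # hypothetical connection
    with conn.cursor() as cur:
        cur.execute("SET lock_timeout TO '2s'")        # don't queue forever
        cur.execute("SET statement_timeout TO '10s'")  # bound the DDL itself
        # adding a column without a default is a metadata-only change
        cur.execute("ALTER TABLE app_user ADD COLUMN bio text")
    conn.commit()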

~~~
tbicr
I found only two caveats:

\- it doesn't use transactions, so if a migration fails partway you will need
to fix the state manually (one point for improvement); on the other hand, you
cannot run `CREATE INDEX CONCURRENTLY` inside a transaction anyway.

\- it can use a `CHECK (column IS NOT NULL)` constraint as a safe replacement
for `NOT NULL` on huge tables (sketched below), which isn't compatible with
standard Django behavior.

Both of these cases are highlighted in the README.
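
For illustration, the safe `NOT NULL` replacement looks roughly like this
(table and constraint names are made up): add an equivalent `CHECK`
constraint as NOT VALID, which is a quick catalog-only change, then validate
it separately, which scans the table but only takes a SHARE UPDATE EXCLUSIVE
lock:

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # hypothetical connection
    conn.autocommit = True
    with conn.cursor() as cur:
        # fast: no table scan, just a catalog change
        cur.execute(
            "ALTER TABLE app_user "
            "ADD CONSTRAINT app_user_email_not_null "
            "CHECK (email IS NOT NULL) NOT VALID"
        )
        # slow scan, but concurrent reads and writes keep working
        cur.execute(
            "ALTER TABLE app_user "
            "VALIDATE CONSTRAINT app_user_email_not_null"
        )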

------
aidos
Looks like this handles a bunch of cases around locking you can run into,
which seems a noble goal. I like the approach of making a matrix of all the
migration operations and figuring out how to work around the locks, kudos.

There are definitely some wonderful ways to mess things up when starting out
with postgres migrations. Nothing quite like the surprise you get the first
time you rewrite every tuple on a table with 20M rows because you added a
column with a default value (no longer an issue with the latest postgres).
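
The classic workaround, sketched below with made-up names, was to add the
column with no default (a metadata-only change), set the default for new rows
separately, and backfill existing rows in small batches so that no single
statement holds locks for long:

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # hypothetical connection
    conn.autocommit = True
    with conn.cursor() as cur:
        # both of these are catalog-only changes, no table rewrite
        cur.execute("ALTER TABLE big_table ADD COLUMN flag boolean")
        cur.execute("ALTER TABLE big_table ALTER COLUMN flag SET DEFAULT false")
        # backfill existing rows in batches of 10k
        while True:
            cur.execute(
                "UPDATE big_table SET flag = false "
                "WHERE id IN (SELECT id FROM big_table "
                "WHERE flag IS NULL LIMIT 10000)"
            )
            if cur.rowcount == 0:
                break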

As pointed out in the docs, your code must be prepared to support the schema
both before and after the migration. I don't use Django myself, but normally
I'm more worried about the interleaving of the code and db changes to keep
things running smoothly. Migrate a bit, release some code, migrate some more,
etc.

------
bmdavi3
This is cool.

Another alternative would be to use postgres savepoints, which are like
transactions inside transactions, as a wrapper around each migration. You can
do the same thing: set lock_timeout, catch the error when it's exceeded, and
retry that step.

Provide an option to run the handful of operations that can't be run inside a
transaction as an escape hatch, and then you can retain the ability to run
migrations inside transactions, which is usually a good thing.
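
Roughly what that wrapper could look like (a hypothetical helper, not an
existing tool): each step runs inside a savepoint with a lock_timeout set,
and gets rolled back and retried if it loses the lock race:

    import time
    import psycopg2
    import psycopg2.errors

    def run_with_retry(conn, sql, attempts=5):
        """Run one migration step in a savepoint, retrying on lock_timeout."""
        with conn.cursor() as cur:
            cur.execute("SET lock_timeout TO '2s'")
            for attempt in range(attempts):
                cur.execute("SAVEPOINT migration_step")
                try:
                    cur.execute(sql)
                    cur.execute("RELEASE SAVEPOINT migration_step")
                    return
                except psycopg2.errors.LockNotAvailable:
                    # lock_timeout hit: undo only this step, back off, retry
                    cur.execute("ROLLBACK TO SAVEPOINT migration_step")
                    time.sleep(2 ** attempt)
            raise RuntimeError("could not acquire lock after retries")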

~~~
tbicr
Unfortunately, savepoints only live inside a transaction, and an ACCESS
EXCLUSIVE lock is released only when the whole transaction commits or rolls
back, so savepoints don't look like they offer any benefit over a plain
transaction for schema migrations.

------
rbanffy
This is awesome. It's not a silver bullet and won't solve every case. You'll
still need to design degraded modes into your application for when parts of
it become unusable (which you should be doing anyway), but this covers a lot
of your likely needs.

------
etaioinshrdlu
Is there any equivalent for django+mysql? I historically found more tools for
online schema changes in the mysql sphere, such as gh-ost (
[https://github.com/github/gh-ost](https://github.com/github/gh-ost) )

~~~
rantanplan
gh-ost is hardly useful for anyone outside of GitHub.

It's predicated on you not using foreign keys. Now, why someone would use
MySQL without FKs is beyond me, but I'm sure they have their reasons.

MySQL has [https://dev.mysql.com/doc/refman/5.6/en/innodb-online-ddl.html](https://dev.mysql.com/doc/refman/5.6/en/innodb-online-ddl.html)
for online schema migration, and

MariaDB has ALTER ONLINE: [https://mariadb.com/kb/en/library/alter-table/](https://mariadb.com/kb/en/library/alter-table/)

They both work with triggers + ghost tables, so they don't need transactions.
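
For reference, InnoDB's online DDL is requested per statement, and MySQL
errors out rather than silently falling back to a blocking table copy if the
requested mode isn't possible. A minimal sketch (hypothetical table name and
connection details):

    import mysql.connector

    conn = mysql.connector.connect(database="app")  # hypothetical connection
    cur = conn.cursor()
    # ask for an in-place, non-blocking change; fail if unsupported
    cur.execute(
        "ALTER TABLE orders ADD COLUMN note VARCHAR(255), "
        "ALGORITHM=INPLACE, LOCK=NONE"
    )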

~~~
orf
Rails does not use foreign keys by default.

~~~
rantanplan
What does "by default" mean?

If you want to have referential integrity there's no other choice. Otherwise
you create tables that have no relation to each other. No one stops you from
doing this, but then you probably don't want an RDBMS in the first place.

~~~
orf
Rails by default does not add a foreign key constraint at the database level
when defining a relationship[1], and once you have a fairly large Rails app
that uses this default, it's somewhat tricky to migrate.

It's madness, I know.

1\. [https://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/SchemaStatements.html#method-i-add_reference](https://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/SchemaStatements.html#method-i-add_reference)

~~~
rantanplan
WTF!

Thanks for that tidbit. I have no experience with Rails, but I would never
have imagined they'd do something so atrocious.

~~~
lloeki
The rationale is that non-trivial conditions on ActiveRecord model
associations can be quite dynamic (possibly implying Ruby code evaluation),
so constraints are best handled at the model level; otherwise the constraints
would only be partially checked at the db level, or could even conflict with
the models (e.g. if a constraint is to be enforced based on some condition
evaluated at runtime, like a simple _if_ clause, or when using STI or
polymorphism).

~~~
rantanplan
I have no clue what you're talking about. That's an honest statement, not
trying to be confrontational.

If you have constraints that can be enforced by the DB, you simply use the
DB's constraints, because the DB guarantees they're going to work 100% of the
time and your data will be correct.

If they are more dynamic or require custom business logic... well, you do it
at the application layer. That's what everyone does.

I mean, you can probably implement your own transactions too; that doesn't
mean that you should. And if you do, then just admit that there's no point in
using an off-the-shelf RDBMS.

~~~
matthewmacleod
Well, Rails is kind of an off-the-shelf solution for building web apps, so it
makes some compromises to present a uniform process and API for some actions.
One of those trade-offs is that the relationship between database tables in
ActiveRecord is handled at the model/application layer instead of the database
layer.

This has some benefits - it allows for a uniform definition of relationships
regardless of database backend, allows for constraints that can’t be expressed
by the database itself, and allows constraints to be used as a first-class
concept for things like presenting error messages to users. But it also means
that data integrity is not guaranteed - modifying records concurrently or
outside of the application can result in a data model that’s valid according
to the database schema but not according to the application model.

FWIW I exclusively use Postgres as my Rails database backend these days, and
foreign key relationships are extremely easy to include in migrations. This
still requires a companion definition in application code so that good error
messages can be presented, but that seems acceptable to me. I’d hope that this
eventually becomes the default for databases that support these keys.

~~~
rantanplan
Sure. I'm all for options and pluralism.

I am just that crazy person that believes that default options should err on
the safety/integrity/consistency side.

I also love abstractions, but abstractions can't change their underlying
fundamental reality. So I would like to be the one who makes the compromises
and I'd like those compromises to be explicit rather than implicit.

------
orf
There have been some messages on the development mailing list about adding
support for CREATE INDEX CONCURRENTLY when using PostgreSQL, which is a big
pain point for larger migrations.

It's a bit tricky, though, as you cannot use it inside a transaction.
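
One common workaround today (a sketch, not whatever API the mailing list
discussion might produce): mark the migration as non-atomic and issue the raw
SQL yourself.

    from django.db import migrations

    class Migration(migrations.Migration):
        # CREATE INDEX CONCURRENTLY refuses to run inside a transaction
        atomic = False

        dependencies = [("app", "0042_previous")]  # hypothetical

        operations = [
            migrations.RunSQL(
                "CREATE INDEX CONCURRENTLY app_user_email_idx "
                "ON app_user (email)",
                reverse_sql="DROP INDEX CONCURRENTLY app_user_email_idx",
            ),
        ]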

~~~
tbicr
Constraint creation is also tricky, because in a transaction it takes an
ACCESS EXCLUSIVE lock (i.e. it locks the table against reads and writes) and
then spends time validating the constraint (CHECK, NOT NULL, FOREIGN KEY), so
the lock is released only on transaction commit/rollback.
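
The usual escape hatch is to split the operation in two (made-up names
below): add the constraint as NOT VALID, which is a quick catalog change, and
run the long validation scan afterwards under a weaker lock that doesn't
block reads or writes:

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # hypothetical connection
    conn.autocommit = True
    with conn.cursor() as cur:
        # quick catalog change; existing rows aren't checked yet
        cur.execute(
            "ALTER TABLE orders ADD CONSTRAINT orders_user_fk "
            "FOREIGN KEY (user_id) REFERENCES app_user (id) NOT VALID"
        )
        # long scan, but under SHARE UPDATE EXCLUSIVE instead
        cur.execute("ALTER TABLE orders VALIDATE CONSTRAINT orders_user_fk")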

------
crescentfresh
This is amazing. My conundrum is we don't use Django lol. Wish this sort of
thoroughness existed in standalone pg migration tooling.

~~~
ggregoire
Same here. Our backend team uses Python, SQLAlchemy and PG. They are looking
for a tool to handle migrations and schema versioning. Any recommendations?

~~~
aidos
Alembic. It's made by zzzeek (Mike Bayer), who created SQLAlchemy, and it's
hackable enough to do custom migrations.

We use it by changing the models in code and then having Alembic automatically
create a suitable migration for us. Then we change the migration as required
(adding steps for data migrations etc). It's really simple:

    alembic revision --autogenerate -m 'changing something...'

That generates a Python file with an upgrade and a downgrade function. You
can then customise it fully as you require.
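
What the generated file looks like, roughly (revision ids, table and column
names are all made up):

    from alembic import op
    import sqlalchemy as sa

    revision = "abc123"       # hypothetical revision id
    down_revision = "def456"  # hypothetical parent revision

    def upgrade():
        # autogenerated schema change
        op.add_column("user", sa.Column("bio", sa.Text(), nullable=True))
        # hand-added data migration step
        op.execute("UPDATE \"user\" SET bio = '' WHERE bio IS NULL")

    def downgrade():
        op.drop_column("user", "bio")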

Then on the db server you can run:

    alembic upgrade head  # or downgrade etc
