
GitHub's online schema migration for MySQL - qiuyesuifeng
https://github.com/github/gh-ost
======
MichaelGlass
At NoRedInk, we've been using gh-ost for a few years now, and it's been a
pleasure.

\- The ability to control a running migration is crucial. We have pretty
predictable load, and we generally run long-running migrations during off-peak
hours. If a migration runs longer than we expected and might run into peak
hours, we can pause it so it doesn't impact users.

\- Hooks make it trivial to integrate with other tools. Right now it reports
to Slack, but if we used it more, we'd likely hook it up to real monitoring
infrastructure.

\- There's a lot of default behavior that we want. I'd recommend regular users
wrap their best practices in another script and not call gh-ost directly. It's
nice not to worry about good defaults for e.g. throttling, or about whether
gh-ost is hooked up to some kind of external monitoring.
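For illustration, such a wrapper might just bake the team's defaults into the command it builds. This is a hypothetical sketch: the flag names below are real gh-ost options, but the threshold values and the hooks path are made-up placeholders; check the gh-ost docs before copying any of it.

```python
# Sketch of a team wrapper that bakes in throttling defaults
# before shelling out to gh-ost. All values are illustrative.

def build_ghost_command(database, table, alter, execute=False):
    cmd = [
        "gh-ost",
        f"--database={database}",
        f"--table={table}",
        f"--alter={alter}",
        # Baked-in safety defaults so individual engineers
        # don't have to remember them each time:
        "--max-load=Threads_running=25",
        "--critical-load=Threads_running=100",
        "--chunk-size=1000",
        "--hooks-path=/opt/migrations/hooks",  # e.g. Slack reporting
    ]
    if execute:
        cmd.append("--execute")  # without this, gh-ost runs a noop/dry run
    return cmd
```

The point is that engineers call the wrapper, not gh-ost, so the safety knobs are never forgotten.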

~~~
killbrad
I'm probably really ignorant asking this, but how do you "pause" schema
migrations, period? And even if you did, how do you ensure a consistent
experience for your users if your db is broken? Some sort of application logic
to deal with inconsistencies? That seems really expensive (from a development-
work perspective).

~~~
radicality
Not OP but I’m familiar with the topic and run similar tooling on large
clusters. By pause he probably means prevent it from starting on more
databases and let whatever is inflight finish. For the second point, correct,
your application needs to handle both schemas during transition. When that’s
done, you can rip out the unneeded logic from your application.
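A toy example of what "handle both schemas" can look like at the application layer (the column names here are entirely made up for illustration): suppose a migration renames `name` to `full_name`. During the transition, reads tolerate both shapes and writes populate both columns, so the app works against either schema version.

```python
# Hypothetical transition code for a migration renaming
# `name` -> `full_name`. Rows are modeled as dicts.

def read_name(row):
    # Prefer the new column, fall back to the old one.
    if "full_name" in row:
        return row["full_name"]
    return row["name"]

def write_name(row, value):
    # Dual-write both columns until the migration and backfill
    # are complete; afterwards this helper gets deleted.
    row["name"] = value
    row["full_name"] = value
    return row
```

Once the migration finishes everywhere, the fallback and dual-write paths are exactly the "unneeded logic" you rip out.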

~~~
xenomachina
> your application needs to handle both schemas during transition.

How is this typically done? Have a version number in the db? Have the app
examine the schema with every transaction? Have the app assume the old/new
schema optimistically, and if that fails, roll back and retry with the other
schema? Something else?

------
perlgeek
We evaluated gh-ost, but the killer for us is that it doesn't support any kind
of foreign keys.

I understand that at GitHub's scale, foreign keys might be more of a hassle
than what they are worth, but for a smallish company that values data
integrity over scale and uptime, this is not an acceptable choice.

~~~
cup-of-tea
I'm interested to know what companies are doing "at scale" to not need to use
foreign keys. Do they just write user ids or whatnot into other tables?

~~~
Chilinot
I'm guessing they are validating the constraints in their applications instead
of their databases. While this puts more requirements on your setup and
developers, it offloads a lot of stress from your database.

~~~
cup-of-tea
Yeah, but from a data integrity point of view that is horrifying. I guess they
must have some very comprehensive testing framework to validate the code.

Also, pardon my further ignorance, but if you're not going to use foreign key
constraints, what is the point of using a relational db? Why not just a fast
key-value store for each index?

~~~
fipar
You may be able to architect your application to do well enough without
foreign keys yet still require features that key-value stores do not provide,
like SQL, triggers, etc.

In the specific case of MySQL, while still horrifying (I agree with you! but
it is one of the things you sometimes have to do at scale), you can create the
foreign key constraints but then disable their verification and periodically
look for violations, as described here:
[https://www.percona.com/blog/2011/11/18/eventual-
consistency...](https://www.percona.com/blog/2011/11/18/eventual-consistency-
in-mysql/)
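The periodic violation check described there boils down to an anti-join: find child rows whose parent no longer exists. A minimal sketch of the idea, with plain Python over in-memory rows standing in for the SQL (table and column names are made up):

```python
# The "check for FK violations periodically" idea in miniature.
# In SQL this is roughly:
#   SELECT c.id FROM child c
#   LEFT JOIN parent p ON c.parent_id = p.id
#   WHERE p.id IS NULL

def find_orphans(parent_ids, child_rows):
    """Return child rows that reference a missing parent."""
    parents = set(parent_ids)
    return [c for c in child_rows if c["parent_id"] not in parents]
```

Run on a schedule, anything this returns is a violation that the disabled FK constraint would otherwise have rejected up front.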

------
throwawaypls
Back when I worked for Shopify, I got a chance to work on something similar --
Ghostferry ([https://github.com/shopify/ghostferry](https://github.com/shopify/ghostferry)),
which supports all sorts of migrations, even between different databases.

It was recently open-sourced. Do take a look.

~~~
pwnna
Hey! I'm the current maintainer of Ghostferry. Thank you for all your work!

For the reader here: one thing to clarify is that gh-ost performs a schema
migration via a data migration between two tables in the same database, and
does so in a very efficient way. Ghostferry, on the other hand, is a general-
purpose data migration library that moves data between different databases,
most likely on different hosts. Both schema migrations and data migrations are
frequently abbreviated as "migrations", which can cause confusion. Ghostferry's
domain of operation does not necessarily overlap with gh-ost's, as it would be
very inefficient to use Ghostferry to implement gh-ost.

That said, it is a very interesting project in its own right, with a lot of
potential use cases. I don't want to hijack the thread any further than I
already have, so if anyone has any further questions, you can find contact
information and docs in the repo.

------
viraptor
I used it and it's really impressive. Works as described. The only issue with
this is that you can't easily use it without understanding how it works. It's
more of a system you have to own rather than a tool you can use, so you can't
just point a new person at it and go "just run this".

~~~
groodt
I agree. I've used it a lot too, but only after a few test runs against some
snapshots to get familiar with the operational aspects of it.

------
analogmemory
So my understanding is that this is for migrating a db to a new one? Can
someone explain, like I'm a beginner, why/how you would use this?

~~~
jschmitz28
In certain scenarios, modifying the schema for a table in MySQL will lock the
entire table, and for large tables this can mean a noticeable outage for users
if you need to run queries on that table. One case where we faced this problem
was changing the primary key for a table from 32-bit to 64-bit ints, since we
were running out of key space. We used Percona's online schema change tool to
handle it, which wrapped up the creation of a new 'ghost' table (with the
target schema you want), rate-limited copying of rows from the original table
to the ghost table, triggers that applied new writes to the ghost table as
they came in, and finally a rename of the ghost table to the original table
name, performing the full migration with no data loss or outage.
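The ghost-table dance described above can be sketched in miniature. This is a toy model with in-memory dicts standing in for tables; real tools do the backfill in rate-limited chunks, capture concurrent writes via triggers (pt-osc) or the binlog (gh-ost), and swap with an atomic RENAME:

```python
# Toy model of an online schema change: backfill a ghost table,
# replay writes that happened during the copy, then swap.

def migrate(table, transform, concurrent_writes):
    ghost = {}
    # 1. Backfill: copy existing rows, transformed to the new schema.
    for pk, row in table.items():
        ghost[pk] = transform(row)
    # 2. Replay writes that arrived during the copy, so the ghost
    #    table catches up to the live one.
    for pk, row in concurrent_writes:
        ghost[pk] = transform(row)
    # 3. "Atomic rename": the ghost table becomes the live table.
    return ghost
```

The hard parts in production are exactly what this toy omits: chunking and throttling the backfill, and correctly interleaving step 2 with ongoing traffic.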

Sounds like this tool is doing something similar but avoiding the use of
triggers for flexibility.

~~~
manigandham
Modern advice: always use 64 bit integer ids. If it's a small table, it won't
matter. If it's a big table, you'll need them anyway.

~~~
daigoba66
That’s not always the best advice.

Consider a table that you know will only ever have a few rows, but which is
referenced by one or more FK columns. If I use a 64-bit integer when 32-bit
(or smaller!) is enough, I'm now using twice as much space for the FK column.
If there are millions of rows, plus indexes, that can add up pretty fast.
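A back-of-the-envelope calculation makes the tradeoff concrete (row and index counts below are purely illustrative):

```python
# Rough extra cost of using BIGINT (8 bytes) instead of INT
# (4 bytes) for one FK column. Numbers are made up.
rows = 50_000_000
indexes_containing_fk = 2           # e.g. the FK index plus one composite
extra_bytes_per_value = 8 - 4       # BIGINT vs INT
copies = 1 + indexes_containing_fk  # the column itself + each index entry
extra_bytes = rows * extra_bytes_per_value * copies
print(extra_bytes / 1024**2, "MiB")  # roughly 572 MiB of pure overhead
```

For one small-but-hot FK that's real memory and cache pressure; for a table that will genuinely grow past 2^31 rows, the 8 bytes are unavoidable anyway, which is the parent comment's point.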

------
Existenceblinks
That's a really old and still good strategy. [off-topic] I first heard of it
in a novel (1964).

Flynn.io uses the same kind of strategy; transaction log && async replication
([https://flynn.io/docs/databases](https://flynn.io/docs/databases))

It's a little sad that nanobox.io, which one of my apps runs on, has an
inferior strategy: it goes temporarily offline at the final sync moment
([https://docs.nanobox.io/data-management/data-migrations-
scal...](https://docs.nanobox.io/data-management/data-migrations-scaling/))

------
AdamJacobMuller
This is a really amazing, very well designed and thought out, tool that solves
a problem that should never exist.

------
pkulak
Holy crap, an alternative to Percona? Why does MySQL get two awesome tools and
Postgres nothing?

~~~
josegonzalez
Postgres supports transactional DDL statements natively, and many alter table
statements don't end up locking the table nearly as severely as some MySQL
versions do.

~~~
michaeldejong
Actually both lock for many (crucial) schema operations, and often severely
enough to block your application from reading from the table(s) under change.
I've been researching this stuff for a while. Check out
[http://blog.minicom.nl/blog/2015/04/03/revisiting-
profiling-...](http://blog.minicom.nl/blog/2015/04/03/revisiting-profiling-
ddl-statements-mysqls-return/) . It's slightly outdated, but still holds.

------
thathoo
Square also has an online schema migration tool that is open source:
[https://github.com/square/shift](https://github.com/square/shift)

It's pretty cool. Check it out as well.

~~~
sciurus
That's not a schema migration tool per se. It's a web interface for managing
runs of a schema migration tool (in their case the venerable pt-osc, but there
is an open issue for supporting gh-ost too).

------
ceohockey60
Very cool! Curious, does this leverage this go-mysql library at all?
[https://github.com/siddontang/go-mysql](https://github.com/siddontang/go-
mysql)

~~~
tejasmanohar
Yes, [https://github.com/github/gh-
ost/search?utf8=%E2%9C%93&q=go-...](https://github.com/github/gh-
ost/search?utf8=%E2%9C%93&q=go-mysql&type=)

------
kd22
Can someone shed some light on how this tool compares to something like
Flyway?

~~~
bpicolo
It's an alternative to e.g. pt-online-schema-change [0]. The problem is that,
for very large mysql tables / clusters, running DDL against the tables live
will lock up reads/writes against the table for ages. These tools allow you to
run those changes without taking downtime.

[https://www.percona.com/doc/percona-toolkit/LATEST/pt-
online...](https://www.percona.com/doc/percona-toolkit/LATEST/pt-online-
schema-change.html)

------
magoon
I believe RDS uses this same technique for instance resize/replace.

------
zmoazeni
We use gh-ost at Harvest[1] and it's a dream in comparison to manually
migrating on a replica and switching master/slave roles [2].

Also the linked post[3] in the readme hit us very close to home. We originally
tried some of our migrations with pt-online-schema-change, which was great in
theory but caused a lot of locking contention during the actual process.

I see many people hammering on the lack of foreign key support, which is
interesting to me. At some point, a database system grows to where relying on
MySQL's Online DDL[4] technically "works", but not really under production
load. I feel like a team knows when they need to bring in a tool like this.

The dev in me understands how wonderful FKs are for consistency. But the db-
guy in me that has had to deal with locking issues recognizes FKs as a
tradeoff, not dogma.

If you shy away from migrating your large or busy tables, or are scheduling
frequent maintenance down times in order to migrate these tables, that's when
gh-ost (and others) are appropriate to evaluate.

So for us it's not an immediate red flag that gh-ost doesn't support FKs. We
just have to work around that limitation[5] because the alternatives are much
worse.

For the record, we don't gh-ost all of our migrations. Only the ones deemed
sufficiently large are gh-osted, and those heuristics will change from team
to team.

But as a guy who has had to deal with our database issues AND as a developer
who doesn't want to be chained by a database design decision from a decade
ago, I love the flexibility gh-ost gives us as we continue to grow.

[1] [https://www.getharvest.com/](https://www.getharvest.com/)

[2] [https://dev.mysql.com/doc/refman/5.6/en/replication-
features...](https://dev.mysql.com/doc/refman/5.6/en/replication-features-
differing-tables.html)

[3] [https://dev.mysql.com/doc/refman/5.6/en/replication-
features...](https://dev.mysql.com/doc/refman/5.6/en/replication-features-
differing-tables.html)

[4] [https://dev.mysql.com/doc/refman/5.6/en/innodb-create-
index-...](https://dev.mysql.com/doc/refman/5.6/en/innodb-create-index-
overview.html)

[5] [https://github.com/github/gh-
ost/issues/507#issuecomment-338...](https://github.com/github/gh-
ost/issues/507#issuecomment-338725563)

------
z3t4
I was investigating using the binary log for another project a few years ago,
but came to the conclusion that it was too hard to work with... I don't
remember the details though; maybe someone can fill me in?

------
qaq
You can jump through hoops or just use an RDBMS that supports transactional
DDL.

~~~
dtech
That does not solve the problem. Transactional DDL still needs a full table
lock for most operations, which on large tables can take minutes to hours.
Then it's not really an online schema migration anymore.

~~~
c2h5oh
Depends on the migration. Postgres can add or drop a column on a table with a
billion rows in milliseconds, as long as you don't provide a default value for
the new column.

~~~
anarazel
And in v11, even with a default value!

