
Migrating bajillions of database records at Stripe - luu
http://robertheaton.com/2015/08/31/migrating-bajillions-of-database-records-at-stripe/
======
rtpg
When migrating to new data models, I almost always follow this list now.

[https://matthew.mceachen.us/blog/how-to-make-breaking-change...](https://matthew.mceachen.us/blog/how-to-make-breaking-changes-and-not-break-all-the-things-1315.html)

Pretty much the perfect checklist for ensuring your migration strategy is
safe.

------
Animats
What database are they using? Their description reads like they used some
NoSQL database, and then they needed to do an ALTER TABLE.

Also, how many merchants can they have? I have a database of all US businesses
on a desktop machine and a server. It's a few gigabytes. There are about 20
million business entities in the US, and some of them may not be Stripe
customers.

~~~
timr
This can happen even with something as robust as Postgres. If you have a huge
table undergoing lots of read/write activity, the locking needed to modify the
table schema can take (essentially) infinite time, and/or slow your writes to
a crawl, and/or spike memory usage on your DB to the point that you no longer
have faith in the process completing.

Example from my past: setting a default column value on a huge table
undergoing hundreds of writes per second. Oops.

~~~
beambot
IIRC, this isn't true for Postgres. Many common schema alterations in Postgres
are lock-free (non-blocking). In MySQL, these alterations used to be locking
(i.e. O(n) in the number of rows)... but perhaps it has gotten better?
(Wouldn't know; don't use MySQL.) I did a quick google search to find some
substantiation:

\- [http://www.estelnetcomputing.com/index.php?/archives/12-Lock...](http://www.estelnetcomputing.com/index.php?/archives/12-Locking-when-altering-a-table-in-postgres-vs.-mysql.html)

\- [http://stackoverflow.com/a/6542757](http://stackoverflow.com/a/6542757)

~~~
timr
The relevant bit from your first link: "postgres will update the table
structure and there is no locking _unless you do something that needs to alter
existing rows._ "

So, altering a column (e.g. adding a default) will trigger the problem. Adding
a new column or table will not.

------
manigandham
This is a pretty normal way to handle things: double writes, active
sync/migration, double reads, disable old writes, finish sync, disable old
reads.
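A minimal Ruby sketch of those phases, assuming invented names throughout: an in-memory hash stands in for each database, and the phase transitions are just feature-flag flips on a thin repository wrapper.

```ruby
# Hypothetical dual-write repository. Each migration phase listed above
# (double writes -> double reads -> disable old writes -> disable old
# reads) becomes a flag change rather than a code change.
class MigratingRepository
  attr_reader :old_store, :new_store

  def initialize(old_store, new_store, flags)
    @old_store = old_store
    @new_store = new_store
    # e.g. { write_old: true, write_new: true, read_new: false }
    @flags = flags
  end

  def write(key, value)
    @old_store[key] = value if @flags[:write_old] # disabled last
    @new_store[key] = value if @flags[:write_new] # enabled first
  end

  def read(key)
    @flags[:read_new] ? @new_store[key] : @old_store[key]
  end

  def set_flag(name, value)
    @flags[name] = value
  end
end
```

The backfill ("active sync/migration") step would run alongside this, copying old records into the new store; once it catches up, `read_new` and then `write_old` get flipped.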

Over the last 12 months we've migrated our entire system handling hundreds of
millions of requests a day through 2 different database systems. Just requires
testing and good release management.

Also, it would be nice to have real numbers other than "bajillions"... what
does that even mean? It doesn't sound like much more than a few gigs of data,
in which case this transition could take seconds by just using an in-memory
system.

------
ianhawes
I imagine the approval for posting this article went something like, "Sure you
can do it, just uh, don't use any real numbers."

------
meesterdude
Awesome writeup! It's validating to see that the way I do it is the way Stripe
does it with bajillions of records, while the system is running. Very
pragmatic and incremental - maybe a little overly cautious with the feature
flag, but better safe than sorry.

I've had to do something similar when migrating password schemes. When users
log in, they validate against their current password, and then a new password
hash is generated to use for the next login. These types of migrations usually
take quite a while, though, as a user has to log in for the migration to
happen.
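A sketch of that rehash-on-login scheme in Ruby. The digest functions are stand-ins (the real old/new schemes might be SHA-1 vs. bcrypt); `authenticate!` and the user hash layout are invented for illustration.

```ruby
require "digest"

# Stand-in for the deprecated hash scheme.
def legacy_digest(password)
  Digest::SHA1.hexdigest(password)
end

# Stand-in for the new hash scheme.
def modern_digest(password)
  Digest::SHA256.hexdigest("salted:" + password)
end

# user is a hash like { scheme: :legacy, hash: "..." }
def authenticate!(user, password)
  case user[:scheme]
  when :legacy
    return false unless legacy_digest(password) == user[:hash]
    # Valid login: we have the plaintext in hand, so upgrade in place.
    # The next login validates against the new scheme.
    user[:hash]   = modern_digest(password)
    user[:scheme] = :modern
    true
  when :modern
    modern_digest(password) == user[:hash]
  end
end
```

The key property is that the migration only needs the plaintext at the one moment it is available (login), which is exactly why it drags on until every user has logged in at least once.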

------
joebeetee
I really enjoy Rob's writing. Here are a couple of other good ones:

[http://robertheaton.com/2014/03/07/lessons-from-a-silicon-va...](http://robertheaton.com/2014/03/07/lessons-from-a-silicon-valley-job-search/)

[http://robertheaton.com/2014/07/14/getting-nothing-done-a-mi...](http://robertheaton.com/2014/07/14/getting-nothing-done-a-misguided-quest-for-productivity/)

------
dmmalam
Did you consider using views/triggers on the database to migrate to the new
API? [1]. Would still have to change all models, though static typing would
make this much easier/safer.

[1] [http://johannesbrodwall.com/2010/10/13/database-refactoring-...](http://johannesbrodwall.com/2010/10/13/database-refactoring-replace-table-with-view/)

------
beambot
Does the "before_save do" hook happen inside the same transaction as the other
object's save method? If not, this could result in some gnarly data
consistency issues...

------
arenaninja
You know, this is great for Stripe and all, but it makes me shudder to think
of the size of the databases at, say, Bank of America, Wells Fargo, etc. Must
be humongous.

~~~
afarrell
And flubbing that migration means that people lose their houses.

~~~
scott_karana
Doubtful. Banks have effectively infinite piles of paper trails, and I suspect
they have many, _many_ backups in a variety of other forms as well. They're
strongly risk-averse, _and_ they try hard to avoid culpability.

Nobody would lose their houses, but their IT staff would lose man-years of
sleep, and people in _many_ departments would probably be fired after the
smoke settled.

------
duellsy
I'm always intrigued by how companies with data of this magnitude handle
migrations. I'm really glad Rob put this post together with such detail.

------
dreamdu5t
Bajillions? I see what you did there... Afraid your 10-20mil records aren't
impressive enough? How many is it?

~~~
softawre
You realize it was probably the lawyers who wouldn't let him say, right?

How many users does your company have?

