
Ask HN: How do you version control a Database? - 719Ben
Hello All,

I am working on a project with 2 others and we have a PostgreSQL database that links to our project. We are trying to figure out the best way to share the layout of the database as it is edited.

Currently we email each other when we change the table layout, but this takes a lot of time and feels very backwards compared to regular Git commits.

Does anyone have any advice on version-controlling a database layout? We couldn't care less about the actual data at this point because we are still developing the system.
======
lastofus
If one of the many schema migration solutions out there doesn't work for you,
hacking one together is simple enough:

* Put your schema/data migrations into .sql files as raw SQL

* Number the files from 0001 on (zero-padded, so lexical order matches the order they should be applied in)

* Write a script to apply each migration file to the DB in order, recording the file name to a log table after successful application. Only apply migrations not found in the log table (a sketch of such a script follows this list).

* Commit your .sql migration files along with the code they are tied to.
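
A minimal sketch of that apply script in Python, assuming psycopg2, a local database named "mydb", and a log table called "schema_migrations" (all of those names are illustrative, not required):

```python
import os
import psycopg2

MIGRATIONS_DIR = "migrations"  # holds 0001_init.sql, 0002_add_users.sql, ...

conn = psycopg2.connect(dbname="mydb")

# The log table records every migration file that has been applied.
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS schema_migrations (
            filename   text PRIMARY KEY,
            applied_at timestamptz NOT NULL DEFAULT now()
        )
    """)
    cur.execute("SELECT filename FROM schema_migrations")
    applied = {row[0] for row in cur.fetchall()}

# Apply pending migrations in filename order; zero-padded numbering
# makes sorted() match the intended order.
for name in sorted(os.listdir(MIGRATIONS_DIR)):
    if not name.endswith(".sql") or name in applied:
        continue
    with open(os.path.join(MIGRATIONS_DIR, name)) as f:
        sql = f.read()
    # Run the migration and record it in the same transaction, so a
    # failed migration is never marked as applied.
    with conn, conn.cursor() as cur:
        cur.execute(sql)
        cur.execute(
            "INSERT INTO schema_migrations (filename) VALUES (%s)", (name,)
        )
    print("applied", name)

conn.close()
```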

~~~
guh_me
That's a good approach - I've used it in a legacy system, although I named the
files in the format "TIMESTAMP_migration_description", which is a bit more
descriptive.

------
rmurri
Not sure if this is helpful, but you can use a database migration framework.
Check update scripts into your version control, then the others can just run
them when they pull your changes. We use alembic
([http://alembic.readthedocs.org/en/rel_0_7/](http://alembic.readthedocs.org/en/rel_0_7/))
but something else may be a better fit for you depending on what software you
use.
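
For a flavor of it: an Alembic migration is just a Python file with upgrade/downgrade functions. This one is hypothetical (the revision ids and the users table are made up); after pulling it, teammates run `alembic upgrade head`:

```python
# Hypothetical migration skeleton, as generated by
# `alembic revision -m "create users"`.
from alembic import op
import sqlalchemy as sa

revision = "a1b2c3d4e5f6"  # made-up revision id
down_revision = None       # first migration in the chain

def upgrade():
    op.create_table(
        "users",
        sa.Column("id", sa.Integer, primary_key=True),
        sa.Column("email", sa.Text, nullable=False, unique=True),
    )

def downgrade():
    op.drop_table("users")
```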

------
davismwfl
So there are solutions to this issue, as others have pointed out. Whole products are built around it.

There are a few ways to do it quickly and more down and dirty though. 1, do what lastofus suggested; it works and is simple.

2, the way I have used with small groups when there is limited data for testing and we are using localhost databases for dev. First, check in drop/create scripts for every element in the database. Second, create a SQL script that inserts all the test data, users, etc. Third, create a shell script you can run that gets the latest DB scripts and insert script and runs them all. It won't scale forever, but it works for longer than people think. Then whoever is making a new change is responsible for updating the insert script to make sure it stays 100% operational. And personally, we always tied the DB script checkin to the same code checkin that had the breaking change, if there was one. Otherwise, if it wasn't a breaking change (which was usually our goal), it could go in separately.
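
A rough sketch of that rebuild script in Python (the "devdb" name and the db/ directory layout are assumptions for illustration; psql must be on the PATH):

```python
import glob
import subprocess

def run_sql(path):
    # psql -f runs a .sql file against the local dev database.
    subprocess.run(["psql", "-d", "devdb", "-f", path], check=True)

# Drop/create scripts, one per database object, checked into git.
for script in sorted(glob.glob("db/schema/*.sql")):
    run_sql(script)

# The single insert script with all the test data, users, etc.
run_sql("db/test_data.sql")
```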

Of course, a migration framework is superior, but sometimes the weight of it
just isn't what you need yet.

~~~
SayWhatIMean
Migration frameworks superior? No way.

Migration frameworks work by diffing dev against a copy of the deployment target. This is _inferior_. If you rename a table, the diff may drop the table (losing all its data) and then create a new one, because a diff has no way of telling a rename apart from two unrelated tables: it emits a DROP TABLE plus a CREATE TABLE instead of an ALTER TABLE ... RENAME TO. It also includes junk/testing objects that should never pollute production.

Hope your bank is not using one of these migration frameworks.

Numbered scripts are the way to go, with a strict policy to never update a script, only create new ones. You're modifying state, so you must capture the ordered steps that moved you from state 1 -> 2. This will avoid a host of subtle issues. After a release you can create a backup as a baseline for the next set of scripts to run against.

~~~
davismwfl
Migration/comparison frameworks/tools can be used poorly, just like any tool, but they don't have to be. When you have the funds and a moderately sized team, especially in dev, I think migration tools/frameworks can be invaluable time savers, and are superior. Also, the OP isn't in production yet, and once in production I would expect schema changes to be less frequent and generally less breaking, with occasional exceptions. So having a tool now is advantageous in many ways, but it may not fit their budget or learning-curve window; it isn't needed, it just can be nice.

No tool removes the responsibility of reviewing the updates before applying them in production; skipping that would just be irresponsible IMO. But then again, to me there is no such thing as a SQL database schema change that doesn't require a person monitoring the deployment. I know for my teams over the years, many times we have tested a script in dev and stage, only to find out in production that some set of records, or some other change that wasn't reflected in the dev/stage environments, caused an issue. That is also why we started using migration frameworks to do diffs for us between environments, to help minimize the chance we would run into these issues. It also let us compare the scripts marked for deployment to make sure there weren't conflicting changes, again something that occasionally happens on larger or fast-moving teams.

It isn't that numbered files don't work, but just like my other suggestion, they only go so far; past that point you really need a tool to help you keep things straight and to compare your perception to reality. For the OP, either numbered scripts or my other suggestion works for now, but I wouldn't rule out migration/comparison tools, as they can really be helpful.

And actually the bank point isn't really a good one: many of the financial clients I have worked for use SQL migration tools to manage their deployments, with their DBAs playing overwatch. This let them configure rules within those tools that would prevent errors from happening.

~~~
SayWhatIMean
I think a diff is OK as a verification check after a deployment. But a diff is not OK for generating the change script itself.

It's not about abusing the diff. It's about diff being fundamentally the wrong technique for transitioning from state (C)urrent to (T)arget.

A diff cannot always bring you to the desired T. The diff only factors in two states (C and T), not intermediate states X, Y, Z, such as an import you may need to run during a transitional state before T is reached.

It also doesn't account for the possibility that the desired state T doesn't even exist to diff against yet, because developers have moved ahead adding objects not meant for this release. It makes things easier on web developers at the expense of deployments.

Which "scales" better? Migration tools scale better at letting web devs freewheel in the DB.

For bulletproof deployments, migration tools don't scale. They excel at the simple cases but hit a hard limit.

BTW, any details on what kind of data in production broke a deployment? A heads-up would help me avoid it myself.

~~~
davismwfl
I agree you may have a few transitional states to go through before dev becomes production. And in general, we didn't let the migration tools write our scripts for us; we used them to automate the testing and deployment of our scripts, and maybe that is the key difference in how we used them. Which may be why I have never experienced what you are describing: when we put the tools in place, set things up properly, and used them consistently, we had fewer issues and smoother deployments (not perfect, but smoother). But again, these were larger teams all working in the same data sets, so we had to have coordination. Plus we had DBAs who were gatekeepers for the database systems, and we never allowed developers write access to production; that is just asking for issues.

As for some of the issues I have seen with teams that are larger or moving fast: in one case we had around 40 commonly used databases, hundreds of tables in most of them, millions of data rows, multiple servers, etc. Not saying the DB architecture was ideal, but it was not uncommon for the larger systems I have seen as a consultant at mid-sized-and-up enterprises. It is often when an organization is still between rapid growth and real discipline that you see a lot more of these types of issues (larger teams, still moving fast, with a lack of been-there-done-that leadership). The issues we saw between dev, stage, and production were: 1. the datasets were too large to replicate from production to each environment; 2. security concerns over too many people having access to some of the data; 3. multiple teams split functionally instead of per sprint.

So how did these cause issues?

1. We would not see all the data in each environment, and while we would test across a wide swath of data and run queries trying to validate our alter and update scripts, it wasn't uncommon every couple of deployments for DB scripts to fail because of some data we couldn't test against. The toolset helped us a little here, because the DBAs set it up to auto-test changes against the production data before it would let the script run. And every time a script failed, they would add new test cases if it warranted it.

2. Since not everyone could have access to all the data, they couldn't predict every outcome; see #1. So the toolset helped us automate what they couldn't see, so we could prevent some issues. This is very common when dealing with financial data or PII in health-care settings. Also, this SHOULD and could have been mostly resolved by not splitting teams functionally (see #3).

3. Teams were split by function instead of by sprint. We had a data team, an analytics team, and development, each working mostly independently of the others. It wasn't rare for the data team to release a script to production in the morning that didn't get applied to dev/staging, and then development comes along a few hours later, tests, and gets sign-off on their changes in staging. Then development deploys to production and has scripts fail to run, or cause some data havoc (or vice versa between the teams). Had the teams been integrated better, this would have been less likely, though still not impossible. And I hear all the purists saying: well, the problem was someone skipped stage, etc. Yep, it sure is a core problem, but frankly I have seen it happen in every environment I have worked in, so dismissing it as something that shouldn't happen is naive. That's like saying I am not testing for errors in my code because there shouldn't be any, uh huh. So when we used the toolset, it would tell us: hey, something has changed here, the DDL looks different from what you said it should look like, abort.

Also, most places I have seen need to routinely do production-to-stage and production-to-dev refreshes to help prevent a lot of these types of issues, which right there tells you that shit happens and it is nearly impossible to plan for all outcomes when it comes to complex systems. So tools are the right thing to help you.

Last point: when we did have major deployments with a lot of DB scripts going out, the toolset would also run through all the scripts and warn us of conflicts. It gave us false positives sometimes, and missed some things, but it really saved our asses more than once.

Sorry for the length, just wanted to share. FWIW, we agree the simple methods work; it's just that tools and frameworks, implemented properly, really do mitigate some of the human and process failures we all have from time to time.

~~~
SayWhatIMean
Ty for the share.

------
dragonwriter
> Currently we email each other when we change the table layout, but this
> takes a lot of time and feels very backwards compared to regular Git
> commits. Does anyone have any advice on version-controlling a database
> layout?

One method is to use pg_dump to produce a script that recreates the database, and keep that in version control with the source. It may not be the best method, but it's probably the one that requires the fewest changes to your existing workflow for modifying the database, and it still gives you something like what you want.
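
A minimal sketch of that in Python (the "mydb" database name and the output path are assumptions; --schema-only skips the data, which the OP doesn't care about yet):

```python
import subprocess

# Dump just the schema to a file that lives in git next to the code.
with open("schema.sql", "w") as out:
    subprocess.run(
        ["pg_dump", "--schema-only", "--no-owner", "mydb"],
        stdout=out,
        check=True,
    )
```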

------
xdanger
[http://sqitch.org/](http://sqitch.org/) + git

------
chishaku
A simple framework for database migration that has worked for my team:

[https://github.com/guilhermechapiewski/simple-db-migrate](https://github.com/guilhermechapiewski/simple-db-migrate)

------
bigsexyjoe
Flyway is a Java based migration framework. It's pretty good. It's simple and
straightforward. [http://flywaydb.org/](http://flywaydb.org/)

------
monknomo
My team is using Red Gate.

