

Ask HN: How to version control DB data? - cammil

What are the standard/recommended ways to store version-controlled, database data? I'm curious about solutions that would enable querying on tables/rows but restricted to those values at a certain point in time.<p>Fyi, I have posted to SO, with little satisfaction:
http://stackoverflow.com/questions/8522766/what-are-the-standard-recommended-ways-to-store-version-controlled-database-dat
======
jacksondeane
If I am reading this correctly, you want to maintain multiple versions of the
same blog post, correct? This isn't really 'version control' (git, svn,
etc...) in the traditional sense .

At a high level, you could just keep a record of every version of every blog
post, then separately maintain which entry is most current, or active, or
whatever metric you want to display/query. This will let you maintain all
versions of the blog post and you can run diffs on different versions, revert
to older versions and branch from any version.

~~~
whichdan
I'm curious - has anyone here had experience doing this with a few million
rows of data? Even a few revisions each means the table would grow rapidly.
You'd also (probably) need three operations for each insert:

1\. Insert the new row

2\. Find the previous row

3\. Set previous row to `active`=0

Are there performance tricks to keep this reasonable? I'd also wonder how to
modify the table itself (dropping/adding columns) when there would potentially
be selects + inserts + updates running 100% of the time - but I realize that's
borderline out of scope of the OP's question.

~~~
blcArmadillo
Not sure about the other questions but in regards to changing the actual table
structure. If you're dropping columns you'd first make the changes to your
code base so that it no longer relies on those columns. If you have long
running queries you'd just have to wait till the old ones using those columns
are finished and then you could change the DB. And you'd do it in just the
opposite order when you wanted to add a column.

------
JS_startup
I don't know of any type of version control that will work like that. You
could, of course, add your databases to an SVN repository but you won't be
able to branch/merge individual rows in the database.

I'd re-examine your requirements on this one. You normally wouldn't be doing
intense revision control directly inside of the database

------
shrikant
Look into Change Data Capture?
<http://en.wikipedia.org/wiki/Change_data_capture>

------
mattvot
WordPress will just create a new entry in the db with a different status, like
draft. The current version would be 'published'.

------
irrumator
<http://chronicdb.com/blogs/undelete_from_whoops>

------
equark
This is already implemented in HBase. View your table as 3D: row, column,
time.

------
pasbesoin
I haven't looked into the details of what they do, but with respect to your
concern about blog entries (on your SO post), I'm immediately put in mind of
MediaWiki (the engine behind Wikipedia).

