
Backup your Database in Git - tortilla
http://www.viget.com/extend/backup-your-database-in-git/
======
dryicerx
I find myself saying this a lot lately:

 _Just because technology A can do action B, does not mean it's the most
appropriate path to take_

~~~
tdavis
I know what you mean. Yeah, I _could_ do this, or I could just rely on
automated iSCSI snapshots to seamlessly back up the entire machine whenever I
want.

~~~
bretthoerner
Are those reliable for a database, though?

I haven't done enough at such a low level to really understand how that would
work. I mean, the database application layer typically promises to be atomic,
sure. But at any given point in time are the bits on disk truly atomic? Does a
database not sometimes take 2+ writes to fully change data? And does an iSCSI
backup really copy the entire disk at a single point in time? I thought that
was magic specific to something like ZFS. Because if it doesn't, couldn't you
have a database backup that isn't really usable?

I guess what I'm trying to say is that application level DB dumps exist for a
reason - and I'm no expert so I'd love someone to explain how full disk
backups work re: databases.

~~~
tdavis
_DB dumps exist for a reason_

Indeed they do, and that is the "faultless" way of creating a backup.
Depending on your RDBMS (and perhaps luck), the backups can be fine or they can
be unusable. In my experience, MySQL databases backed up this way almost
always end up being corrupted. On the other hand, I've had good success with
PostgreSQL. Sure, occasionally the internal registry indices will be out of
sync and will need to be recreated, but it has never been a big deal.

Even still, I wasn't decrying DB-level backups since, obviously, they're far
better (and sometimes necessary). But you can still start a cronjob to write
the backup and let the snapshots take care of the rest. My point is, you
already get heavy-duty compression and incremental algorithms and restoration,
not to mention the ability to restore the complete system state in the event
of a catastrophe.
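
A minimal sketch of that kind of cron-driven dump; the database name, dump
path, and schedule are all made up for illustration, and the snapshot layer
then just picks up the directory:

    #!/bin/sh
    # /etc/cron.hourly/dump-db -- hypothetical path; assumes a Postgres database named "app"
    set -e
    pg_dump app | gzip > /backups/app-$(date +%Y%m%d-%H).sql.gz
    # the iSCSI/filesystem snapshot schedule then captures /backups along with everything else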

------
gaika
Too bad git is not good for large files -
<http://kerneltrap.org/mailarchive/git/2006/2/8/200591>

~~~
silentbicycle
Well, the extent to which this matters depends on how often you run it. Even
if the hashing for a 10gb database dump takes ten minutes*, it will still get
backed up ten minutes after the hour, every hour. There are very likely
_better_ solutions, but it would be easy to set up, and better than nothing.

(Besides, any solution that has to deal with a complete db dump is going to
have to hash, diff, or otherwise process a large volume of data.)

* As a single data point, sha1 just ran for me at approx. 800mb/min.
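
To get the same kind of data point for your own dump (the file name here is
just a stand-in):

    time sha1sum dump.sql    # reports the wall-clock time taken to hash the file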

~~~
imajes
A better fix is to mysqldump every table individually, and then git commit
those as distinct .sql files. That way, less frequently updated data (reference
tables, etc.) doesn't need re-committing, and the files git has to hash are
typically smaller. Plus, you can skip log/session-type tables entirely if
you're so inclined. :)

~~~
patrickg-zill
That is exactly what one client of mine does for Postgres.

------
tezza
This git backup approach is crap. Pure crap.

Sorry to lower the tone on HN, and I'll get to the _reasons_ soon, but my
brain overloaded at how over-engineered this solution is.
------------------

Deterministic SQL Output ::

For the delta set in git not to jump around, the ordering of the statements
MUST BE THE SAME every time; git's delta handling is designed for source code,
where line order is stable.

Trivial Amounts of Space Saving ::

Just creating a well-known bz2 of the dump is going to give you excellent
compression, and then you can save it to S3, Dropbox, or your home computer.

ALTER TABLE ::

An ALTER TABLE or any table-wide UPDATE is going to yield a line-by-line
delta... effectively checking in the whole DB/table again. Given that this is
app-specific, WHY not just compress the whole dump?

Backup Complexity ::

You've written a cron script and validated that the whole db is being dumped;
HDD space is cheap, so why involve another step that can go wrong?

Duplicating a tar.bz2 file is a lot easier to validate than a git commit. Much
much much faster too.
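
For comparison, the simple dump-and-compress route described above is roughly
this; the database name, backup directory, and remote host are made up:

    #!/bin/sh
    # hypothetical paths and host; produces one compressed dump per run
    STAMP=$(date +%F-%H%M)
    mysqldump --all-databases | bzip2 > /backups/all-$STAMP.sql.bz2
    # ship the compressed dump off the machine, e.g. with rsync or an S3 client
    rsync -a /backups/all-$STAMP.sql.bz2 backuphost:/srv/db-backups/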

~~~
pilif
MySQL usually does not reorder the data on disc, so in the OP's case, this
might actually work for him. I do agree though that just moving around bz2
files is much easier. And it scales better (as in: continues to work) once you
begin accumulating REAL data.

A common size for a complete dump of one of the plattforms I'm working with is
400 MB bzip2 compressed. You certainly would not want to have that amount of
data uncompressed in a git repository.

Should you decide to compress before committing, you of course lose all the
advantages of using git for this in the first place.

This feels like a non-solution to me.

------
wvenable
Backing up MySQL's binary log is probably a better solution than doing
periodic mysqldumps. Also the ordering of the inserts in the dump isn't likely
to be deterministic, screwing up some of the benefit of this method.
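
A rough sketch of the binary-log route, with illustrative paths and a made-up
remote host; you still need an occasional full dump as the base to replay the
logs against:

    # my.cnf -- enable the binary log (path is illustrative)
    [mysqld]
    log_bin = /var/log/mysql/mysql-bin

    # periodically close the current log and copy the finished ones off the machine
    mysql -e "FLUSH BINARY LOGS"
    rsync -a /var/log/mysql/mysql-bin.* backuphost:/srv/binlogs/

For recovery, mysqlbinlog turns the logs back into SQL that can be replayed on
top of the base dump.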

~~~
akkartik
Yep. I've tried tracking mysql dumps, and lots of little things can cause
spurious diffs.

------
nir
I wrote a short Ruby script that does a similar operation with Subversion -
it's somewhat Rails specific in that it takes the username/password from RoR
config files, but that can be easily modified.

The code is (ironically) here:
<http://github.com/niryariv/dbbackup/tree/master> and some background here:
<http://niryariv.wordpress.com/2009/03/23/backing-up-mysql-database-with-subversion/>

------
mlok
Happy to see someone confirm this can be done. It also made me think of
<http://www.liquibase.org/>

