
Ask HN: Archive a MySQL database - wenbert
How do you archive a database? I have a database that has been receiving data for the last 3 years, and I would like to archive the old data WITHOUT affecting the current data (current transactions, etc.). The database is heavily used and is close to a million records, and in some cases the complicated queries take a little longer to complete...

What is the best way to do this? I have no experience with this. In fact, this is the first time I have handled a database of this volume. Perhaps you can direct me to a good resource, etc.

I am using MySQL 5 in a LAMP environment. If it requires downtime, I can sacrifice a day for this.

EDIT: If I back up the entire database, I would still have the "old" records and the database would still be the same size.
======
tmm1
<http://www.xaprb.com/blog/2006/05/02/how-to-write-efficient-archiving-and-purging-jobs-in-sql/>

<http://www.maatkit.org/doc/mk-archiver.html>

~~~
wenbert
Thanks for this. This is what I am looking for.

------
chadr
Here is what I do with MySQL 5.1 and my InnoDB tables on Linux:

    mysqldump --single-transaction --skip-lock-tables --all-databases | gzip > \
        "/some/directory/backup-`hostname`-hourly-`date +%H`.sql.gz"

These two options are important: --single-transaction and --skip-lock-tables.
At least in MySQL 5.1, mysqldump defaults to locking each table before dumping
it (not good if you have a busy app and a large db).

I run the above in a cron once per hour. It gives me 24 hours worth of hourly
backups. I then use rsnapshot (<http://www.rsnapshot.org/>) to store daily,
weekly, and monthly backups offsite.

I also run my backups with a low priority (nice -n 19) so that my application
gets higher priority than the backup script.
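
For reference, a sketch of the crontab entry (the script name is hypothetical; it would just wrap the mysqldump pipeline above):

    # five minutes past each hour, run the backup at low priority
    # (mysql-hourly-backup.sh is a hypothetical wrapper around the pipeline above)
    5 * * * * nice -n 19 /usr/local/bin/mysql-hourly-backup.sh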

~~~
chadr
One last thing... mysqldump has the following option:

    -w, --where=name    Dump only selected records; QUOTES mandatory!

You can probably use this to back up just the rows you are interested in and
then delete them once you are sure they are properly archived.
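
For example, something along these lines (database, table, and column names here are hypothetical):

    # dump only the rows older than the cutoff from one (hypothetical) table;
    # verify the dump before deleting anything
    mysqldump -u root -p --where="created_at < '2006-01-01'" mydb orders \
        > orders_archive.sql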

------
agotterer
One of the ways we archive data is by setting an archive cutoff point. Say
we only need the last 6 months of data... Every month we run a script that
takes anything older than 6 months and moves it to a separate table (or
database). Obviously this solution isn't perfect for every situation. But
depending on how much data you have in each row, a million rows usually isn't
all that much. So for example we may have a table called "data" and a second
table called "data_archive". This keeps the data table and indexes small and
manageable.
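
A minimal SQL sketch of that monthly job, assuming "data" has a created_at timestamp column (the column name is hypothetical):

    -- copy old rows into the archive table, then remove them from the live table
    INSERT INTO data_archive
        SELECT * FROM data
        WHERE created_at < DATE_SUB(NOW(), INTERVAL 6 MONTH);
    DELETE FROM data
        WHERE created_at < DATE_SUB(NOW(), INTERVAL 6 MONTH);

In practice you'd pin the cutoff to a fixed date (or wrap both statements in a transaction on InnoDB) so the two statements can't drift apart.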

You should of course still backup your data since the archive is really just a
reference.

~~~
wenbert
This makes sense. But I imagine it will be complicated when one table has a
lot of other dependent tables associated with it. Am I right to assume this?

Perhaps if I had foreseen this, I would have made my database schema easier to
archive. :-(

------
brianr
If you can't afford downtime, check out InnoDB Hot Backup (a commercial
product sold by Oracle).

If you already happen to be using LVM, check out this post:
<http://www.mysqlperformanceblog.com/2006/08/21/using-lvm-for-mysql-backup-and-replication-setup/>

Otherwise, you're going to have some downtime, which could range from a few
seconds to a lot longer depending (mostly) on the size of your data. The
simplest way is using mysqldump, but that may be too slow (probably not
though, unless you have really low-end hardware; 1m records isn't that many).

There are faster (and less straightforward) ways, which basically involve
(rough sketch below):

  - stop the server
  - copy the data directory somewhere else
  - restart the server
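
Something like this, assuming a stock Linux setup with the data directory in /var/lib/mysql (paths and init scripts vary by distribution):

    # stop mysqld so the files on disk are consistent
    /etc/init.d/mysql stop
    # copy the whole data directory, preserving ownership and permissions
    cp -a /var/lib/mysql /backups/mysql-$(date +%F)
    # bring the server back up
    /etc/init.d/mysql start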

Google is your friend here. If you have a spare server, you might also
consider setting up replication so that the next time you want to back up your
data, you can take it from a slave and it won't require any downtime.

------
gojomo
This technique using LVM snapshots always struck me as the optimal way to
achieve negligible-downtime backups of even largish databases:

<http://blog.dbadojo.com/2007/09/mysql-backups-using-lvm-snapshots.html>
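
Roughly, the sequence described there looks like this (volume group, snapshot size, and paths are all hypothetical):

    # 1. in a mysql client session, block writes and KEEP the session open:
    #        FLUSH TABLES WITH READ LOCK;
    # 2. from another shell, snapshot the volume holding the data directory
    lvcreate --snapshot --size 1G --name mysql-snap /dev/vg0/mysql
    # 3. back in the mysql session, release the lock:
    #        UNLOCK TABLES;
    # 4. mount the snapshot, copy the files off, then discard it
    mkdir -p /mnt/mysql-snap
    mount /dev/vg0/mysql-snap /mnt/mysql-snap
    rsync -a /mnt/mysql-snap/ backuphost:/backups/mysql/
    umount /mnt/mysql-snap
    lvremove -f /dev/vg0/mysql-snap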

(Haven't done it myself, though.)

~~~
chrisbolt
We've been using this at work for a while now, and it's a good way of getting
consistent backups. The only drawbacks are that I think it only works with
MyISAM tables, and that writes block for a few seconds while the snapshot is
created.

------
wenbert
Another thing I am curious about: when archiving/purging/whatever it is
called, what happens to the dependent tables?

For example, the Header-to-detail relationship between tables.

Anyway, I will be trying out the tools found at this link:
<http://www.maatkit.org/doc/mk-archiver.html>

------
chrisbolt
If you're using MyISAM, you can use MERGE tables to split data over multiple
tables, or you can use partitioning. Then when you decide one of the 'sub'
tables is 'full', you can take it out of the MERGE table, do a read lock,
rsync it to a backup server, then drop it from the main server.
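
A rough sketch of the MERGE setup (the table definitions here are made up):

    -- two identically-structured MyISAM 'sub' tables (hypothetical schema)
    CREATE TABLE data_2007 (id INT NOT NULL, payload TEXT) ENGINE=MyISAM;
    CREATE TABLE data_2008 LIKE data_2007;
    -- a MERGE table presenting both as one; new rows go to the last table
    CREATE TABLE data_all (id INT NOT NULL, payload TEXT)
        ENGINE=MERGE UNION=(data_2007, data_2008) INSERT_METHOD=LAST;
    -- once data_2007 is 'full' and safely copied elsewhere, retire it
    ALTER TABLE data_all UNION=(data_2008);
    DROP TABLE data_2007;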

------
thomasmallen

    # mysqldump -u (user) -p (database name) > (destination filename)
    mysqldump -u root -p some_project > some_project.sql

    # to import:
    # mysql -u (user) -p (database name) < (backup file)
    mysql -u root -p some_project < some_project.sql

------
newt0311
Read up on WALs (write-ahead logs) and incremental backups. I know Postgres
has support for these, and MySQL should too (its binary log plays a similar
role).

------
slater
You could use phpMyAdmin:

Go to the "export" tab and use the "Save as file" option.

~~~
wenbert
Exporting the database? But then I would still have a lot of rows in the
database. What I was trying to say is that I want to archive the "old"
records and still have the currently used ones available. How do you go about
this?

~~~
elijahbuck
What are your criteria for a record being "old"? Once you've established
that, just archive the database as it is now and then drop the rows that
match your oldness criteria.
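
For instance (database, table, and column names here are hypothetical):

    # archive everything as it stands today
    mysqldump -u root -p mydb | gzip > mydb-archive-$(date +%F).sql.gz
    # then drop the rows that match the oldness criteria
    mysql -u root -p mydb -e "DELETE FROM data WHERE created_at < '2006-01-01';"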

~~~
wenbert
I imagine that I would also have to go down through the dependent tables?

