

Ask HN: How do you deal with "database debt" (old data) - vyew

tl;dr - how to efficiently deal with mass quantities of temporary data...<p>Theres been a lot of talk lately about technical debt and such. Our product deals with online meetings and we create a lot of temporary data, that we keep around for 3-6 months, sometimes 1 year to be safe.<p>Over time a lot of temporary data gets created in our database. We run a cleaner each night to search and find old data and delete it. Its getting expensive (cpu and disk-wise) because first we need to go through and navigate all the tables to identify what can be deleted, then, running the deletes in mass quantities).<p>Its getting to the point where the cleaner has to run all night and I'm still not sure if we're keeping up with the data were creating. As we grow internationally our off-peak hours become shorter... so it has less time to run. Lately I need to keep an eye on the cleaner frequently so doesn't put too much load on the master db as traffic starts ramping up the following day.<p>I'm curious how other companies deal with this issue, if at all. Do large companies just ignore that, and just keep adding disks? As our product grows and the db relationships get more complex, its becoming a real chore to keep adding to the logic of the cleaner.<p>Tim
Vyew.com
======
bartonfink
When you say that you keep this temporary data around for 3-6 months, do you
have a requirement that it be kept in the database for 3-6 months or can you
keep backups and fill requests for old data from them? That can reduce the
size of your tables so your cleaner has less data to process.

If you know that certain tables are subject to this record-keeping
requirement, why not add a created_timestamp field to each table you need to
keep around? That way, your "cleaner" logic is as simple as "delete * from
table where created_timestamp before (six months ago)" with the database-
specific date functions and appropriate table name.

