Back in the day (1999) I wrote a data processing and archiving pipeline for a research satellite, and I was surprised to discover the specificity of their data archival requirements. Off the top of my head, these were decision points:
* Pretty much only used "gold" CDs, rated for longer storage.
* CD write speed was restricted - higher write speeds caused more write failures
* The total data payload was regulated (~600 MB I think), partly because of the physical impact of human handling on the outermost areas of the disc.
* Storage temperature was also regulated
But ultimately, the two main data storage strategies employed were:
* Physical redundancy - six copies of everything, mailed out to six physically separate facilities every week.
* No archiving "forever" - Archived data was (theoretically) to be restored to contemporary storage technology whenever there was a significant step change in storage technology. At the time, another member of the lab was working on restoring old Voyager data feeds, although I have no awareness of how common this actually was.
There is something funny about how pirated movies and music are probably the most resilient data available, while irreplaceable data like what you're referencing is the most vulnerable.
Maybe we could set up a scheme where a few megabytes of precious data are appended to torrents of movies and mp3s for extremely resilient back-ups?
I have a collection of old burned CDs, all somewhere between 8 and 13 years old. Recently I went through and either copied the data to an external drive or tossed 'em. I was able to get all the data off probably more than 80% of them. But there were a few that I couldn't get anything off. So I'd say maybe 10 years for burned disks.
My pressed disks (store bought albums and old video games from the same vintage) seem to be 100% okay unless there's scratches on them.
I have been told that is the difference between "duplicating" CDs and "replicating" CDs, with the latter typically done when making large quantities of copies.
Maybe I need to try mine again but last year I went through a stack of CD-Rs I burned back in 98-99 and didn't have one failure. They've pretty much lived in the dark in a plastic tub, but in Florida (indoor) humidity.
Digital data is best suited for constant replication instead of long term "cold" archival. 400,000 CDs is just 320TB of data, a perfectly reasonable amount to keep on live hard drives these days. And once you are handling files instead of physical discs, you can just keep changing the underlying hardware as tech marches on. With checksumming and redundancy, it seems almost impossible to have bitrot eating data.
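To make the "checksumming and redundancy" part concrete, here is a minimal sketch of a scrub pass that hashes every copy of a file and flags the ones that disagree with the majority. The function names (`sha256_of`, `find_rotten_copies`) and the choice of SHA-256 are my own assumptions for illustration, not anything a particular archive is known to use. It only gives a meaningful answer with three or more copies, since two disagreeing copies cannot outvote each other.

```python
import hashlib
from collections import Counter
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_rotten_copies(copies):
    """Return the copies whose hash differs from the most common hash.

    `copies` is a list of paths that are all supposed to hold the same data.
    """
    hashes = {Path(p): sha256_of(p) for p in copies}
    majority_hash, _ = Counter(hashes.values()).most_common(1)[0]
    return [p for p, h in hashes.items() if h != majority_hash]
```

Any copy this returns would be overwritten from a good one, and the same pass would run again after each hardware migration.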
I recently ripped all my CDs to WAV files (got tired of the space all those physical discs were taking up). 800+ CDs, some of which dated back to the dawn of the digital music era in 1983. Some of them (music from Eurythmics, Dire Straits, Pink Floyd) required several passes to get a clean copy, but I was able to copy them all.
In my case, I always kept them in their jewel boxes, which were stored vertically, much like you'd store an LP. For most of their life they were kept indoors, but they also spent about 3 years in an unheated storage unit.
The fun part was finding adapters for the few 8 cm CD singles I had, so they'd play in a modern slot-loading drive.
> Some of them (music from Eurythmics, Dire Straits, Pink Floyd) required several passes to get a clean copy, but I was able to copy them all.
Did you use something like Exact Audio Copy? It has options to automatically try try and try again if it reads a bad sector, which really reduces the amount of (human) time required to rip a disc.
The one that worked came with a Rhino Records mini CD of Little Richard. Luckily I didn't have too many of them, and was worried every time that it would get hung up inside the drive...
fwiw, you could convert those wav files to flac without losing any quality and save considerably on file size. you'd also end up with the added benefit of being able to properly tag them
I converted them to .mp3 (at a fairly high bitrate) so they'd play off a USB stick in the car (it doesn't know flac).
It increases my storage, having two copies like that. But I think of the .mp3 files as my "mix tapes" that I wouldn't be upset to lose since I can make another at any time. :)
Backups are a requirement, now that I've gotten rid of the physical media. Local storage is on a mirrored drive array. I also have a copy on a TrueCrypt'd drive at work, and another copy at a relative's house several states away. Online backup really isn't an affordable option for 350+ GB. :/
Since the main problem seems to be oxidation, I'm surprised nobody recommends putting them in a sealed cabinet and flushing with compressed nitrogen. Seems easy and cheap.
All you really need to keep data forever is RAID (10 is nice, 1 is the minimum; for more resiliency, you can mirror the entire array a couple of times too) with a load of hot spares, perhaps with the copies stored at 4 or more sites (2+ continents). Radiation shield the lot to reduce bit flips, and if one occurs, you have a background read process that catches them and uses a simple quorum to determine the correct value; some form of ECC could also be built into either the file format or the filesystem itself. As more storage is needed and/or the distro they are using updates a major version, build a new system in parallel, replicate first, test, then remove the old one once it's all verified as good.
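The "simple quorum" idea can be sketched as a per-byte majority vote across replicas. This is only an illustration under my own assumptions: `quorum_repair` is a hypothetical name, the replicas are read fully into memory, and a real system would more likely vote per block using checksums rather than per byte.

```python
from collections import Counter


def quorum_repair(replicas):
    """Majority-vote each byte position across equal-length replicas.

    Returns (repaired_bytes, offsets_where_replicas_disagreed).
    Raises ValueError if lengths differ or an offset has no strict majority.
    """
    if len({len(r) for r in replicas}) != 1:
        raise ValueError("replicas must all have the same length")
    repaired = bytearray()
    disagreements = []
    for offset, column in enumerate(zip(*replicas)):
        value, votes = Counter(column).most_common(1)[0]
        if votes * 2 <= len(replicas):
            raise ValueError(f"no majority at byte offset {offset}")
        if votes != len(replicas):
            disagreements.append(offset)
        repaired.append(value)
    return bytes(repaired), disagreements
```

With three replicas, a single flipped bit in one of them is outvoted by the other two. With only two replicas, any disagreement raises an error, which is one reason to keep an odd number of copies.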
The 200,000 CDs at the Library of Congress could be backed up on a RAID-1 NAS, but partnering with a cloud storage service might still be a better idea, plus it would allow people to download some of the data if the author allows it.
One of the variables is aluminum vs gold on the reflective surface. I've yet to have a gold one come back unreadable or corrupted but have a number of aluminum ones that have.
Compact disc owners have noted 'black crud' as a failure mode, where tendrils of black oxide begin to develop on the disk surface between the layers. My original Cars CD had that happen to it and it was fascinating. It seemed to have started from a defect on the back of the disk which was not noticeable until black stuff started growing there.
During the years I spent backing up my CDs to FLAC, I remember looking for a good DVD+R brand, and all the various audiophile and archiving-obsessed people I could find online had settled on Taiyo Yuden. I haven't had a problem with the DVDs I burned 10+ years ago now (I checked recently, I read 30 without a hitch before I was satisfied), but now I'm wondering if anything has changed since that time.
When I was in college (mid 90s) it cost about $1 per blank CD. They were rated at "10 years". However, you could pick up a gold blank for about $5 each, and those were rated at "100 years". I wonder how accurate that was. I don't have ready access to either one, sadly.
I spent this summer ripping my stack of about 100 backup CD-Rs/DVD-Rs dating back to 1997-99? to store on Glacier instead. They were just lying in stacks on CD spindles, pretty much the worst kind of storage possible. The majority copied fine. 2-3 failed with visible damage. 2-3 failed with no visible damage.
While most of the discs were the cheapest ones teenaged me could get his hands on, 2 of them were Kodak Gold CD-Rs. One of those was one that failed...
I've never had a recordable CD last longer than a couple years, no matter the brand. I always kept them in cool dark closets, too.* Most of them (but not all) at least stay readable, but if you happen to have recorded the checksums I'd wager you wouldn't find a single one that matched after 3-5 years - I haven't.
Maybe it was better with gold blanks. I don't believe I had any of them.
I'd say if you have any archival data on CDs, pull them out, make sure they're still good, and back that data up somewhere else too.
* With the exception of those I burned to listen to in the car. None of those lasted more than a few months.
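For anyone who wants to actually check discs against recorded checksums, a rough sketch: build a manifest of SHA-256 digests when the disc is burned, keep it somewhere other than the disc, and compare later. The names `build_manifest` and `verify_against` are made up for this example, and it assumes the disc is mounted as an ordinary directory.

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """Hash one file in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_against(manifest, root):
    """Return relative paths that are missing or whose digest changed."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items()
        if current.get(name) != digest
    )
```

The manifest is a plain dict, so it can be dumped to JSON and stored alongside the other backup copies. A file that cannot be read at all will raise an error rather than show up as a mismatch, which is also a useful signal.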