Ask HN: How would you store 5TB of data for 50 years, untouched?
78 points by icey on Aug 8, 2010 | 172 comments
Let's say you were thinking of putting digital data in a time capsule where it couldn't be touched for 50 years. How would you store it? How would you ensure it was readable at the end of the 50 years?



Here's what I would suggest:

Take a bunch of fibrous, cellulosic material, pound it into a pulp and then squeeze it into very thin, flexible sheets of material. Let them dry.

Then, take some form of pigment or dye, and with a very fine stylus impregnated with the dye, visually encode the data on the cellulosic sheets using a set of graphemes. Each grapheme would roughly represent a phoneme in a spoken language.

It would take quite a while to encode all that data. I'd suggest building some type of mechanical device to automate the task of transferring dye onto the cellulosic sheets. I'd also want to bundle these individual cellulosic sheets into stacks of 200-500 for organization's sake. I'd probably cover them with a more durable material such as animal hide or perhaps a thicker layer of cellulosic material.

I'd then take all these bundles of data-laden cellulosic material, and I'd build a structure to protect these bundles from the elements. Developing a cataloging or indexing system for these bundles shouldn't be too hard. I'm sure it's been done before.

Regardless, you could either preserve the materials or let the public have free access to the information. You'd run the risk of damaging the data, but if you had a mechanical replication system, you could simply make multiple copies of each data set, and ensure the safety of the data that way.

Sheets of fibrous, cellulosic material should last several thousand years if kept in the right environment.

You know. Now that I think about it. It's probably much too complex a system to handle something like that. I really don't think it would work.


It is important to remove the lignin from your fibrous material, or otherwise ensure the flexible sheets have a basic pH, else the sheets will deteriorate.


Out of curiosity, how large would the time capsule need to be to contain 5TB of data encoded that way?


The storage density of A-format mass-market paperbacks containing dense UTF-8 text is roughly 4 MB/kg. (A 400-page novel weighs around 250 grams and contains roughly 1 MB. Source: I went and weighed one of my novels, of known word count.) We can up the density somewhat by bzip2-compressing and then uuencoding (or similar); maybe 10 MB/kg is achievable.

Normal A-format paperbacks use acidic wood pulp for paper, but acid-free paper doesn't add a whole lot to the cost. So we get roughly 10 GB/ton, and the whole kaboodle comes in at roughly 500 tons. As the density of wood pulp is close to that of water, this approximates to a 10 x 10 x 5 metre lump. Quite a large time capsule, but not unmanageable :)

However. If we posit the availability of decent optical scanners in 50 years' time, there's no reason not to go denser.

We can print the data as a bitmap at 300dpi and be reasonably sure of retrieving it using a 2400dpi scanner. Let's approximate 300dpi to 254dpi, and call it 10 dots/mm. Printed on reasonable-quality acid-free paper, we can get 100 Mbit/square metre, and that square metre will weigh around 80 grams (it's 80gsm paper -- same as your laser printer runs on). Call it 1.1 Mbit/gram. At this density, we can print the whole 5TB (or 40 Tbit) on just 40 tons of paper, without compression; with compression, call it 20 tons. That's a 2.71 metre cube; just under 9' on an edge.

This assumes a monochrome bitmap. If we go to colour and use archival-quality inks we can probably -- conservatively -- retrieve one bit each of red/blue/green per dot, and it's probably not unreasonable to expect to get a whole byte out in place of each bit in our mono bitmap. So we can probably shrink our archive to roughly 2.5 tons of paper -- a pallet-load.
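
For anyone who wants to fiddle with the assumptions, here's the same back-of-the-envelope in Python. The dot pitch, paper weight and a hand-wavy 2:1 compression ratio are just the guesses from above; with less generous rounding it lands nearer 16 tonnes than my 20:

    DOTS_PER_MM = 10          # ~254 dpi printed, read back with a 2400 dpi scanner
    PAPER_GSM = 80            # grams per square metre of ordinary acid-free paper
    DATA_BITS = 5e12 * 8      # 5 TB
    COMPRESSION = 2           # hand-wavy 2:1 from bzip2

    bits_per_m2 = (DOTS_PER_MM * 1000) ** 2      # 1e8 bits per square metre
    bits_per_gram = bits_per_m2 / PAPER_GSM      # ~1.25 Mbit per gram
    tonnes = DATA_BITS / COMPRESSION / bits_per_gram / 1e6
    print(round(tonnes), "tonnes of monochrome-bitmap paper")   # ~16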


I wonder if anyone has actually attempted this and seen how dense you can pack it and reliably recover. I imagine you would need measures to counter small misalignments when rescanning and imperfections in the physical media.


200kb per A4 page using a 600dpi b/w laser printer: http://ronja.twibright.com/optar/


Take a look at http://www.ollydbg.de/Paperbak/index.html It encodes the binary onto paper and uses Reed-Solomon ECC to restore unreadable data.

I've tested it out myself, and it's only after you start to crumple it together that it stops working. I tested it with an inkjet printer though. A laser printer may stand up better.



So roughly half a Library of Congress.


Hah, I'm building a time capsule right now. Acid-free paper, laser printed, and encased in epoxy resin. I'm targeting 300 to 500 years.

Eventually I want to do the same thing with a 20ft cargo container and a bunch of concrete.


ha, yeah right.

no way that would ever work.


I know nothing about this topic. Would you care to elaborate on this point so the rest of us can learn?


Paper, books, printing presses and libraries.


paper


Well that's embarrassing. Thanks, now I know.


Use ion beam deposition (http://en.wikipedia.org/wiki/Ion_beam_deposition). You can inscribe 20,000 pages on a 5" nickel disk. The data can be read by any civilization that can build an optical microscope. It will last for millennia.


Since some of the solutions people are proposing assume you have a lot of space for storage, I'll assume that too.

1. Get a 9600 bps modem. Use it to encode your data, and record the output as an audio file.

2. Take this audio file, and split it up into 60 minute segments.

3. Record these 60 minute segments onto two-sided vinyl LPs, 30 minutes per side. This will take about a million LPs.

4. Print on acid-free paper, using ink that will survive 50 years too, instructions on how a 9600 bps modem works. Describe the encoding in detail, sufficient so that someone using the equivalent of MATLAB or Mathematica or something 50 years from now on the computers they will have then could easily write a program to decode a modem signal (see the sketch after this list).

5. Also print and include instructions for making a record player. As with the modem, the important part is describing how the signal is encoded on the LP. They'll have no trouble building a record player 50 years from now. (Assuming they don't just photograph the LPs with the 3D terapixel camera on their iPhone 54, and then write an app to extract the signal from the photo...)

6. Store all of this somewhere. LPs will last 50 years easily in a typical office environment, so you probably don't have to resort to something like a hermetically sealed vault buried in an old salt mine or anything extreme like that.
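
For illustration, here's roughly what the encoding in step 1 could look like in pure software, no physical modem required. This is only a toy: a 300 baud phase-continuous FSK encoder using Bell 103-style mark/space tones (the frequencies, baud rate and sample rate are arbitrary choices, not a real 9600 bps modulation scheme):

    import math, struct, wave

    SAMPLE_RATE = 44100
    BAUD = 300                             # toy rate; a real 9600 bps modem is far fancier
    FREQ_ZERO, FREQ_ONE = 1070.0, 1270.0   # Bell 103-style tones, purely illustrative
    SAMPLES_PER_BIT = SAMPLE_RATE // BAUD

    def bits(data):
        for byte in data:                  # most significant bit first
            for i in range(7, -1, -1):
                yield (byte >> i) & 1

    def encode_to_wav(data, path):
        # Render data as phase-continuous FSK in a mono 16-bit WAV file.
        phase, frames = 0.0, bytearray()
        for bit in bits(data):
            step = 2 * math.pi * (FREQ_ONE if bit else FREQ_ZERO) / SAMPLE_RATE
            for _ in range(SAMPLES_PER_BIT):
                frames += struct.pack('<h', int(32000 * math.sin(phase)))
                phase += step
        with wave.open(path, 'wb') as w:
            w.setnchannels(1)
            w.setsampwidth(2)
            w.setframerate(SAMPLE_RATE)
            w.writeframes(bytes(frames))

    encode_to_wav(b'hello 2060', 'capsule.wav')

Document the two tone frequencies and the bit order on the record sleeve and a future decoder has everything it needs.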


They put ZX Spectrum loading sounds (kind of like a modem, except totally audio) on this LP, XL1:

http://en.wikipedia.org/wiki/Pete_Shelley

It is software which, when fed into the Speccy's audio-in port, does funny lightshows in time to the album's songs.

BTW, the LP speed doesn't matter, the Speccy picks it up anyway.

Pretty innovative. And the best thing is you can now get the LP in a TAP-style emulator format! So it survived over 25 years.


Here's the catch: Transmitting 5TB at 9600bps takes exactly 132 years, 1 month, 14 days, 21 hours, 24 minutes and 27 seconds. So the time capsule would be opened before you're done loading it.


Get 300 modems, then. You will be done in less than six months.


Jesus, you'd struggle to get the data off those the day after you did it, nevermind in 50 years.


Nice idea. Probably no need for an actual modem for the encoding; that should be feasible in software nowadays?


It's nice to work with standard things when you're talking archival storage. A physical modem is a lot more standard than a custom one-off program.


I had the same question. Can someone explain?


If a constraint was that it's going in a literal time capsule that would be buried underground (i.e. physical damage to the area is not controllable) I would get a couple of SSDs and a couple of backup tapes, and save redundant copies on different types of media. Given enough space I'd also stick a couple of machines capable of reading the data just in case.

Removing that constraint and completely ignoring cost, I'd also set up a low-risk savings account with $1M in it and put the data on S3 and Rackspace Cloud. I'd store the access credentials in the capsule. Odds are pretty good one of those two will be around in 50 years (and you'll have a chunk of money left over in interest).

Try to keep everything ASCII, with really good text descriptions of data formats.

Realistically 50 years is not a long time: I would bet we'll still have legacy access to USB, SATA, and probably ext3 & NTFS (though probably not IDE). Tons of computer folk who used these technologies will still be alive to work them. English will still be the primary language in the US.

An interesting problem is what to do when the timescale allows these things to change. What if nobody remembers USB, or what spinning platters are. Or the English language?


Neither tape nor SSDs will last 50 years. Within about 10 years, tapes will lose magnetization through thermal movement and SSD storage cells will flip due to cosmic radiation. Over some decades the plastic the media and/or casings are made of will simply decay (a serious problem for museums of modern art and design: http://www.getty.edu/conservation/science/plastics/index.htm...). The only media with a proven track record of preservation over decades are acid-free paper, parchment and non-organic materials like steel, stone and clay. But getting 5TB onto a stone tablet has its very own challenges. [Edit: And using acid-free paper won't buy you anything if you print using the plastic-based toner common in modern laser printers; at least use an inkjet with inorganic pigment-based (not dye-based) ink. If you look out for pitfalls like this, you might be able to implement your requirements with http://ronja.twibright.com/optar/ and only 125 metric tons of acid-free paper, which you should be able to buy for less than $200,000.]

Some people claim that MO media and DVD-RAM can guarantee 30 years, but this is still an estimate; they have not been around long enough to actually know.

The only "reliable" way to store digital data for more than five years known today is to copy them to new media well in advance of the old media loosing them, and even that is difficult if the amount of data is growing faster than the the storage technologies get faster. (I don't know if I should trust Eric Schmidt, but a few days ago he claimed that currently humanity generates as much data every two days as it did up until 2003, http://techcrunch.com/2010/08/04/schmidt-data )


People said floppy disks wouldn't last 30 years too. Yet, I've got some C64 disks that are approaching that age quickly. While I'm sure there is some corruption, most of them still work just fine.

If I was to store on magnetic media, I'd do it in a way that allows for some data loss (like usenet does with .par2 files). If you can stand to lose some of it, just pad it with enough redundancy for recovery and you'll be fine.


> People said floppy disks wouldn't last 30 years too. Yet, I've got some C64 disks that are approaching that age quickly. While I'm sure there is some corruption, most of them still work just fine.

I think that this is luck; I found a batch of 8-year-old floppies a few years ago, and more than 2/3 of them were unreadable.


I'd give DVDs a shot: make 3 copies of each disc, put the physical media in an air-tight container filled with helium, and put it all in a salt mine used for physical records storage.


You wouldn't need to use helium. Nitrogen is the standard "inert gas" for such purposes. At standard temperatures and pressures, N_2 does just as good a job at not reacting with stuff as noble gases do.


Just curious -- what is wrong with laser toner? It's carbon, and doesn't fade. And petroleum is "organic", but lasts for millions of years.

Edit: answered my own question. There are several kinds of toner (I had never heard of liquid toner), but some kinds are just fine:

"Toners composed of stable resin materials and a stable pigment such as carbon black are capable of strong bonding to the paper surface. Copies using these toners and printed onto permanent or archival quality paper can be considered permanent and suitable for long-term storage." -- National Archives Australia


SSDs with batteries and/or solar power for 50 years?


If it's solar radiation that flips the bits, why not just stick it in a lead case?


A SSD (flash-based) doesn't need to be powered. Finding a computer that has a SATA/IDE interface to read it back out after 50 years is an interesting problem, however.

[update]

Spansion quotes 20 years minimum. Under optimum conditions (whatever they may be) and with an adequate amount of redundancy, 50 years should be achievable.

Spansion single-bit-per-cell floating-gate flash devices are designed to provide 20 years of data retention after initial programming when exposed to a 125° C environment. Spansion two-bit-per-cell MirrorBit flash devices are designed to provide 20 years of data retention after initial programming when exposed to an 85° C environment. Both MirrorBit flash and floating gate flash are guaranteed to provide 20 years of data retention after initial programming when exposed to a 55° C environment. MirrorBit flash is guaranteed to retain data for up to the minimum guaranteed cycles (10,000). [F]loating gate is guaranteed to retain data after the guaranteed minimum of 100,000 cycles.

http://www.spansion.com/Support/AppNotes/EndureRetentn_AN_A0...


I've never heard of rechargeable batteries with a lifespan of 50 years. Also, the best quality photovoltaic solar cells have a lifespan of 25 years, and their output is down to around 5% of original capacity by the time 20 years rolls around.


Even better, run a program on multiple servers that has the ability to move the data around different online storage services and to seek out and buy more online storage if required over time.


My answer to the last question would probably be to include a reader right with it. AC current will likely be around a lot longer than 50 years. So if you have some kind of computer containing the data, you just need to plug it in and it starts a self-explaining film (ideally with pictograms, explanations in several languages, etc.) that should make the data usable...


Convert the data to a string of letters.

Have a child.

Name your child that string of letters.

Now preserving your data is the government's problem- they have to produce a birth certificate and keep track of him/her in their databases.


They'll just rename your child.

The thing about government is, all the laws apply to you. But they can do anything they want.


Encrypt the data with a key long enough that, by Moore's law, you'd expect computers to be able to break it in 50 years. Submit the data to Wikileaks. Destroy the key.


Modern encryption systems do not fall out of use because computers get faster; it's because new mathematical attacks on the underlying encryption are found, and these make brute-forcing it easier than people thought at the time.

In other words, the reason we don't use (say) the MD5 hash is not because computers can now brute-force its 128-bit output, but because people have discovered flaws in the MD5 algorithm that mean it doesn't deliver anything like 128 bits of strength. In this case it's not the hardware (CPU clock speed) that gets better, but the software (programs that break MD5) that gets better.

Hence, you could use Moore's Law to predict what computers in 50 years' time will be like, but you can't know what mathematical techniques will be like in 10, let alone 50, years. The encryption system you use might get broken in 10 years' time.


DES is arguably an exception to that. Twenty years after the standard was published, the EFF published designs for a machine that could brute-force a DES key in a matter of days.

But even when it was published, people were saying a 56-bit key was too small, which they aren't saying of modern cryptosystems (to my knowledge).

http://en.wikipedia.org/wiki/Data_Encryption_Standard


Yes, OK, a small key size makes brute force more likely. Bruce Schneier wrote about how it's not possible to brute-force 256-bit keys. Short section here http://www.schneier.com/crypto-gram-9902.html#snakeoil ; there's a longer explanation saying you can't even flip all the bits in a 256-bit key before the universe expires, but I can't find it at the moment. So once you hit that upper limit (256 bits), you can't brute-force anything ever, and faster computers are useless. Algorithms like that get broken by new maths, which could happen at any time; you can't predict it, so you can't rely on it.
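
To put rough numbers on the brute-force side of it (the 10^18 guesses per second is an invented, absurdly generous rate, just for scale):

    SECONDS_PER_YEAR = 60 * 60 * 24 * 365
    RATE = 10 ** 18                       # hypothetical guesses per second

    for key_bits in (56, 128, 256):
        years = 2 ** key_bits / RATE / SECONDS_PER_YEAR
        print(key_bits, "bits:", "%.1e" % years, "years")
    # 56 bits:  2.3e-09 years (a fraction of a second -- hence dead DES)
    # 128 bits: 1.1e+13 years
    # 256 bits: 3.7e+51 years

So yes: past a certain key size, hardware progress alone is irrelevant; only a mathematical break matters.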


If you're doing it along those lines, a better solution would be to encrypt the data with something that you don't expect to be broken in 50 years and store the key on sealed paper, or whatever should last 50 years, in a time capsule for that time.


A lot of people are talking about 50 years like it was a super long time, and propose solutions that are really intended for hundreds to thousands of years. I think it's overkill. [Also I think a lot of you are under 30 :-) ]

For only 50 years, I'd probably risk making many thousands of DVDs and CDs, using different manufacturers and drives. Store with tons of redundancy and ECC, don't use inner / outer tracks for anything important, etc.

Also, are all of the data equally important? You can afford to store the more critical pieces in more expensive and less compact, but more robust formats.

I think the real enemy is obsolescence, and that keeping the data simple (and providing decompression programs and indices in easily understandable formats) is likely more important than worrying about bit-dropout, which seems largely manageable over your specified time.

For 500 years, I'd print it, or micro-inscribe it. (One problem with printed matter is that it has other inherent value, e.g., fuel for heating the yurts of cold barbarians).

For 5K years, micro-inscription and (if you are worried about technological crashes) an archive in the sky. You could populate a host of satellites in various orbits, timed to re-enter at intervals of (say) a decade over a few thousand years (hard to be exact with atmospheric drag and climate change, but you get my drift). Getting something from orbit down to the ground is not hard, getting /noticed/ and picked up as an interesting artifact is probably harder.

For 5M years, add a metric buttload of ECC and stick it in the DNA of some critter that doesn't get out much. A bottom-feeder in a radiation-shielded environment would be cool. Say, a lobster.


I love the lobster idea. It would be even better if the lobsters' survival were based on the integrity of the data. This would provide evolutionary ECC.

You would also need some mechanism to signal people in the far future that the lobsters were data carrying devices. Otherwise they wouldn't have any reason to randomly decode sea creatures. Perhaps you could program the lobsters to develop spots on their shells every century which denote the first 10 prime numbers.


A religion that worships the lobster-god?


Compact Cassettes are now in their 47th year of production, still going strong in developing countries. I’m willing to bet that you will still be able to buy cassettes and players in twenty years.

The story for CDs seems to me to be similar and they are still popular everywhere. I give them at least another forty to fifty years.


The DNA idea is great. You'd need a long "this is a message" intro, say a long ATATATATAT repeating sequence (much too long to occur by chance).

As you say, you can use forward error correction to preserve the data. The hard part is describing the data format to the reader.

From the ATATA...intro, a reader knows they have a message. But now they need to know how to interpret it. You need a way of encoding information (english text?) in DNA and you also need a way of describing that encoding mechanism in DNA too...

Basically, over 5k years you should look up all the protocols SETI people have thought up. And/or re-read Gödel, Escher, Bach.


Why is everybody worried about the future people knowing how to read the data? Barring some unprecedented catastrophe, we should still have detailed technical specs of today's formats in 50 years.

Just bury 250TB worth of SSD storage, along with a device that activates every year and copies from one 5TB block to the next. Any single SSD will only be in use for a year. If the drives can survive 49 years before their first use, it will work.

Storing the data in some ridiculous format is just going to discourage anyone from ever reading it. I'm sure the people of tomorrow have better things to do than OCR millions of sheets of paper just to see grandpa's porn collection.


How many digital devices that were created in 1960 are still running today? Most digital devices back then were built using vacuum tubes. How many vacuum tube makers still exist? The military was buying vacuum tubes from Czechoslovakia in the '80s to keep the SAGE early warning system running. ref: http://en.wikipedia.org/wiki/Semi_Automatic_Ground_Environme... That's because there were no American manufacturers of vacuum tubes after the late '70s.

Sure, we still have the technical specifications for how to build it, but manufacturing the individual components would be a giant pain in the ass.


I suppose it depends on how much effort we assume will be expended reading this data. If humanity bands together and exploits every available technology, it should be a piece of cake. If it's just our grandkids doing it for kicks, that may present a real challenge.

Regardless, I think it's likely that we will be better at reading 2010 media in 2060 than we are at reading 1960 media in 2010.


Yeah, but by then we'll be able to build everything with 3d printers, so all they'll need is the specs.


I've read that, for some systems, they use microcontrollers and opamps to precisely emulate the original tube's current and temperature response characteristics.


Just bury a laptop with it. They'll figure it out.


Who says that in 50 years SSD storage won't be a ridiculous format? It might be just as hard to get data off an SSD as it would be to get data off an LP, or some other device.


Even so, it should be possible, especially if full documentation for the interface and filesystem used is included. There will probably still be a few computers around which still use SATA interfaces, even if it's only because of how old they are.

And currently, it's fairly easy to get data off of LPs, and they're actually somewhat popular for musical releases. There are devices which let you record LPs into computers, even. So perhaps not quite the best comparison.

(A minor nitpick: it being a SSD doesn't matter. The interface used to connect to it does. There is no difference in the hardware used to talk to a SSD or a HD, as long as they're both using a SATA connector. Note that I refer to the entire unit, what you get if you buy a HD or SSD, not to the actual internal systems that unit uses to access and store data. Those are obviously different, but the computer talking to it doesn't have to know about them.)


> I'm sure the people of tomorrow have better things to do than OCR millions of sheets of paper just to see grandpa's porn collection.

That's actually a pretty good thought - whatever it is, label it as porn.

People have spent millions of dollars restoring vintage erotica films.


Contract with the Long Now Foundation's Rosetta Project to put the data on their 'Rosetta Disks', readable by any civilization with high-powered optical microscopes:

http://en.wikipedia.org/wiki/Rosetta_Project

Supposedly one holds 13,000 pages of text in human languages. If we assume your data is similar text, and one page is 58 lines of 66 characters (as are plain text IETF RFCs), you'll need:

(5TB / (3828 bytes)) / 13000 = 110473 disks


1. Convert all the data to decimal (digits 0-9).

2. Put a decimal point in front of this long string. The result will be a rational number between 0 and 1. Call it x.

3. Get a titanium rod exactly 12 inches long.

4. Using a fine laser, etch a line in the rod precisely 12x inches from the end.

5. Done. Precise, durable, elegant, compact, and green.

EDIT: </sarcasm>


Ah, you added the `</sarcasm>` tag while I was responding …. Anyway, it was an excuse to break out Frink (http://futureboy.us/fsp/frink.fsp), which reports that the resolution required, which (I think) is (1 foot)/(50 terabytes/byte), is 6 * 10^(-15) m, i.e., on the order of the diametre of a proton. Honestly, I thought it would be much smaller.

Another problem is that any rod etched in this way will have two decodings. :-)


Gah, anti-procrastination settings prevented me from correcting this, but it's too stupid to let stand. At least this post demonstrates once again the important principle that a smart (computer) calculator is no match for a dumb (human) calculator.

Of course, n bytes can store a number up to 2^(8n), not up to n. Thus, the number we'd be recording has order of magnitude (50 terabytes/bit)×log[2]/log[10] ≈ 1.20e+14, so we'd need to distinguish that many digits after the decimal point.

Conversely, as bayes pointed out (http://news.ycombinator.com/item?id=1585498), if we have a resolution of 10 digits after the decimal point, then we can handle about 10*log[10]/log[2] ≈ 33 bits.

(EDIT: To be clear, it's the same person posting; I just can't stand to wait 154 minutes to correct the error.)


It's certainly possible to imagine universes in which that would work.

But in ours, where stuff is made of atoms, I can't see you positioning the mark on the rod any more precisely than the width of an atom, which I think is about 10 to the -10 meters. So I'm guessing you could only encode 30 or 40 bits, even with super-advanced etching and measuring equipment.


My answer is that you don't put digital data in a time capsule. Digital data is easy to copy, and that's what you want to leverage. Put two pairs of servers in two data centers and keep them running, migrating to new tech when needed. I'd assume 5-10 generations of hardware would be necessary.


This is the best answer.

I also think things like NASA datasets, other govt agency datasets, etc, should be placed on torrents for anyone who wants to make a copy. Let the self-replicating nature of the internet serve as the backup backup plan.

If you put those Apollo datasets online, it's a guaranteed certainty that some hacker somewhere will have them in 50 years.


Bleh. It'll just sit on TPB with 0 seeds, 0 peers for years. I'd recommend bundling it with pr0n.


Just out of curiosity: what's the background of this question? Are you planning to actually do this? What kind of data?

If so, you could maybe give us some more information on the constraints involved (although I must admit thinking about it without any constraints is fun, too).


There isn't really much in the way of background, I was thinking about people who decide to use cryopreservation, or the potential of sending out spacecraft for long periods of time where it may be out of communication range but the craft is meant to return, or even something as simple as a school's time capsule.

The constraints were chosen in order to remove the easiest answers (file sizes, period of time, etc).

Ultimately I think it's an unsolved problem that will become more important over time. My family has photo albums from over 50 years ago, but that medium doesn't have the kind of bandwidth we need for larger datasets (audio, video, etc).

So I guess it's just a thought experiment I thought was interesting.


If this digital data didn't have to be put away somewhere, I would make the most of intelligence. Put someone in charge, with the skills to transfer the data to newer media as they become popular and to ensure that the copies are not corrupted in the process. This person would be paid in whatever way leads to the most loyalty; I would also leverage their sense of pride. Maybe have a few different people each tasked with protecting overlapping segments of the data, to help ensure nothing is ever lost.

Ideally some kind of artificial intelligence would come about sometime in the future to assume the role of data keeper - hiring people to do any work it couldn't do from within the computer and running off some kind of fund that had been set up. Maybe one day there will be a market for creating intelligent services like this, I hope I have something to do with them.


Transmit the data with a laser to a mirror 25 light years away.


Put it on regular hard drives and write a note to your future self to come back for them once time machines are invented.


The beauty is in the simplicity.


Unfortunately, this is far from simple. Laser transmission is lossy, in that all laser beams diverge.

For example, even just sending a laser to the moon and back is quite sophisticated.

"Laser beams are used because they remain tightly focused for large distances. Nevertheless, there is enough dispersion of the beam that it is about 7 kilometers in diameter when it reaches the Moon and 20 kilometers in diameter when it returns to Earth. Because of this very weak signal, observations are made for several hours at a time. By averaging the signal for this period, the distance to the Moon can be measured to an accuracy of about 3 centimeters (the average distance from the Earth to the Moon is about 385,000 kilometers)." [source: http://www.lpi.usra.edu/lunar/missions/apollo/apollo_11/expe...]

Thus, even if you could get a mirror 25 light years away, the light wouldn't hit it or bounce back.


Paper? It's possible to print binary data on paper now. I'm not sure what 5TB would look like, though. It's probably better to keep the data at hand and migrate it as the world and the technology evolve.


An A4 page is 8.3 x 11.7 inches.

Use a 1200 dpi printer to print b/w dots; this gives 8.3 * 11.7 * 1200 * 1200 bits/page, or about 17 MB/page.

5 TB is then 142 books with 1000 double-sided pages each (the size of a small personal library).

Case closed.


5 TB on paper needs too much room to store.


It depends on who's doing the storing.

An A4 page is 210x297mm; say a border of 10mm of non-printable area all around, which gives 190x277mm.

Say we print at a resolution of 0.5mm, and in 16 shades, so we get 4 pixels x 4 bits = 16 bits per square millimetre, or 2 bytes. That gives 105,260 bytes per page. Probably we can squeeze more bytes in than this, but let's allocate those to redundancy and error correcting codes.

So 5TB (in hard-drive terms, 5e12 bytes) would take about 23.75 million sheets of paper printed on both sides. At 5g per sheet, that's about 119 metric tons.

1 ream (500 pages) of the 75g/m^2 A4 paper I have beside my printer here is about 50mm thick. Say 215x305x55mm including some slack for packaging, is a little over 3.6 litres in volume; total volume for 23.75 million sheets is 171,285 litres, or 171.285 cubic metres.

A room with a ceiling height of 3 metres would need to be only 8 meters on each side to store this. Of course, the room shouldn't have any windows and should be at the appropriate humidity, etc.

The cost of the paper, assuming $5 per 500 pages, is less than a quarter of a million. Much more forbidding is the labour and temporary acquisition of printers required to transcribe the data to paper. A good printer @50ppm would take nearly 2 years, assuming zero downtime and very quick paper and toner changes. To do it in more reasonable time and with more reliability, you'd want a bunch more printers; and of course, you'd need to hire the people to do the work of shuffling paper and carting it around, but I'd bet you could probably do it for less than the cost of the paper, particularly if you did it in a cheap labour country.
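
The same arithmetic as a quick Python sanity check, using the numbers from the paragraphs above:

    SHEET_AREA_MM2 = 190 * 277      # printable area per side, mm^2
    BYTES_PER_MM2 = 2               # 0.5mm pixels, 16 shades = 4 bits/pixel
    DATA_BYTES = 5e12

    bytes_per_sheet = SHEET_AREA_MM2 * BYTES_PER_MM2 * 2       # both sides
    sheets = DATA_BYTES / bytes_per_sheet                      # ~23.75 million
    print(sheets / 1e6, "million sheets")
    print(sheets * 5 / 1e6, "tonnes at 5g per sheet")          # ~119
    print(sheets / 500 * 3.6 / 1000, "cubic metres of reams")  # ~171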


This guy claims 200 kB per A4 page http://ronja.twibright.com/optar/


Which gives you 125 tons for 5TB on 80g/m² paper single sided, which gives you about the same cost as the 119 tons double sided in the GP, but without paper jams in the duplexer unit.


The estimated error rate is however about 1 in 100,000 pages, which isn't quite good enough. Probably a slightly better code can be used with a small amount of inflation; even something like RAID-5 except over pages (i.e. parity pages) would do.
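
Parity pages really are simple to do; here's a minimal sketch in Python, assuming each page in a group has already been scanned back in as an equal-length byte string (the group size and page contents below are made up):

    def parity_page(pages):
        # XOR a group of equal-length page images into one parity page (RAID-5 style).
        parity = bytearray(len(pages[0]))
        for page in pages:
            for i, b in enumerate(page):
                parity[i] ^= b
        return bytes(parity)

    def recover_lost_page(surviving_pages, parity):
        # Rebuild the single missing page of a group from the survivors plus parity.
        return parity_page(list(surviving_pages) + [parity])

    group = [bytes([i] * 8) for i in range(1, 5)]        # four fake 8-byte "pages"
    p = parity_page(group)
    assert recover_lost_page(group[1:], p) == group[0]   # lose page 0, get it back

At one parity page per hundred data pages that's a 1% inflation, and any single unreadable page in a group costs you nothing.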


190mm * 277mm = 52630 mm2

size of one pixel: 0.5mm * 0.5mm= 0.25 mm2

1 page can then hold: 52630/0.25 = 210520 pixel

1 pixel is 2 byte thus 1 page is 421040 byte is about 0.5 MB/page (or 1 MB if you print double sided)

5 TB= 5000 GB = 5,000,000 MB = 5 Million pages

this is 5000 books with 1000 pages each.


Note that you have increased the density of information from my calculation of 16 shades per pixel of 0.5mm x 0.5mm, giving 4 bits per pixel (i.e. half a byte, or one nibble, per pixel), so 16 bits per 1mm^2, and thus 2 bytes per square millimeter.

I intentionally chose a pretty big pixel for redundancy reasons rather than trying to be clever and working out a code etc., but the link in the reply to my post looks more worked out.


Use archival CD-R Media - Good for 300 Years

Start by looking at these guys: http://www.falconrak.com/pro_archival_cd-r_gold_ep.html


Any idea where they get the number 300 years? In the archival world, we're leery of such blatant marketing claims, especially since CD technology is only a few decades old.


Not as much a marketing claim as it is an MTBF-style analysis procedure. They run their media through accelerated heat/cold/light cycles to approximate how long the media will last. There are a number of vendors/technology providers out there working on this technology - see http://www.millenniata.com/. Electronics vendors do the same thing when creating 20-year+ lifetime components - just heat it up to 85 degrees, then drop it down to -45, and repeat over and over to accelerate the aging process.

I've heard that the LDS church / Vatican have both been interested in archival media, and they have a pretty long-term perspective, so it might be worth checking with technologists in that realm.


According to Wikipedia, the LDS Granite Mountain Vault (http://en.wikipedia.org/wiki/Granite_Mountain_(Utah)) adds 40,000 rolls of microfiche per year. Not sure how current that info is, though; the LDS church is pretty secretive with a lot of its methodology.


You'd also want to store a few CD drives with them. And a few computers to attach the CD drives to. And from there the problem arises. Both bearings and electrolytic capacitors are essential for computers and I doubt that they'd last 50 years.


I still keep around a 286 that is 25 years old and it's fine -- are you suggesting that it will cease to function sometime in the next 25 years?


But your 286 is already exceptional; how many other computers of that vintage are still functional today? This is analogous to particle decay in a radioactive material: you can't predict accurately when any given atom of uranium will decay, but you can describe the rate of decay by the time at which close to half the particles in the sample will have decayed. The half-life of the typical consumer-grade PC is ten years or so, from what I've seen (this is an estimate, but it would be a good research project for someone), but just because a given model of computer still works after 20 years doesn't mean it will work in another five.


If the half life of a modern bit of electronics is 10 years, that implies that in 50 years 1/32 of currently manufactured drives will still be around. I'd reckon the number would be a lot smaller, but given the terrific quantity of CD drives that exist out in the world right now, I think the chances of one still existing is pretty good.

Even if it didn't, if you compare the costs of many other storage techniques, they're probably equivalent to jerry-rigging a CD player to read back these disks. The difference is the cost is shifted to the reading and not the storage.

Your main risk is the shonky assumptions of the "archival" CD manufacturer. Not that I know what those shonky assumptions are, but I have a vivid memory of the hosts of Tomorrow's World demonstrating the durability of CDs by spreading jam on one, then wiping it off.


Yes.

Also I'd claim that your 286 has higher build quality and it's less sensitive than modern computers.


With the rate at which both our culture and our technology change, communicating with people 50 years in the future is like communicating with aliens. So you may apply the same principles:

http://en.wikipedia.org/wiki/Communication_with_Extraterrest...

Another proven way of communicating your knowledge across thousands of years is to start your own ethno-religious group/nation, like, for example, the Jews.

If you want to combine both approaches - try Scientology ;)


This made me think of something similar: start a sect devoted to preserving your files, to keep the ancient hardware running and to protect it from outside influence. Kind of like in "Foundation".


My dad doesn't think we are aliens, and he used to listen to music encoded on disks as a kid...

Humanity is changing much slower than you may think. The big leap was electricity, and that is well behind us. I doubt the next big leap (nanotech) will become widespread for a hundred or more years.


1. The technological gap between your father and your grandfather was much smaller than between you and your father. The gap between you and your son in 50 years will be much bigger. The iPhone 4 and Android will be antiquated technology for your son in 50 years. I remember some article about a kid who found an old Walkman in his father's closet. It took him some time to learn how to use it. And he wrote an article comparing the Walkman with an iPod ;)

2. Nanotech is much closer than you think! Toshiba is already working on a 16nm semiconductor process. The transistors will be approaching the size of atoms. Intel's Nehalem is already a 32nm process, so 16nm is two tick-tock cycles away.

32 nm — 2010

22 nm — approx. 2011

16 nm — approx. 2013

11 nm — approx. 2015

You will need nanorobots to build chips of the NEAR future. My startup is in the nanotech area, so I know what I'm talking about.


Yeah I know nanotech has current limited applications. That's why I said "widespread". As in electricity once had limited applications, now it's widespread.


50 years isn't all that long - there are plenty of tapes and records that are still perfectly usable from then. As long as you included a decent amount of redundancy you'd be alright with a few hard drives surely? There's always the issue of software being able to read the data, but we have no problems opening images and documents from 27 years ago now. In 50 years time there'll probably be a niche industry producing software that converts old formats - just as there is now converting VHS/Cinefilm.


The major film studios, faced with a related problem a long time ago, opted for a method called YCM separation, where they separate the image into yellow, cyan and magenta and record it onto very stable black and white polyester film stock. Properly stored, this supposedly has a lifetime of 500 years or more.

A modern laser film recorder is capable of a resolution of 4096x3112 and 10 bits per pixel, so that's about 16MB of data per 35mm frame with black and white film.


Print it out on acid-free paper (I have books that are over 300 years old), so this will definitely work.

After 50 years you can OCR the data etc. (or ask your personal robot to do it for you) and print it using a variant of TeX/LaTeX. TeX has already survived for 34 years, so another fifty years is almost guaranteed ;) Knuth predicted some years back that TeX will last for about 100 years.



Lots of good (and bad :-) ideas; here's another:

Mylar "paper" tape, or use some other plastic that's known to have serious archival qualities.

Bulky, but if stable (enough) in the presence of water it'll survive various failure modes that would kill acid free paper. Of course you could etch Mylar or some other stable plastic to gain greater data densities like with the suggestions for paper. Just pick a plastic we know is seriously stable from actual experience, like we know with acid-free paper.

We also have such experience with emulsion based storage methods (microfilm, fiche, etc.), but those are rather delicate for my taste.


Assume we encode the data on acid free paper with color-retaining ink using colored squares of size 1/64" x 1/64", using one of 64 colors in each square. There are then 4,096 of the squares in 1 square inch, so (assuming we print on 8"x10" regions) we can fit 327680 squares on a side of a sheet of paper, so that there are 655360 squares on a sheet of paper (if we use both sides). Each square encodes 6 bits of information, so we have 3932160 bits per piece of paper, or 491520 bytes, which is 480 KB.

At this rate, encoding a gigabyte requires 2,185 pages. As an aside, this is only 5 pages fewer than are contained in the "Art of Computer Programming" box set.

We can comfortably fit a gigabyte, then, on printed paper, in a 10"x12"x5" box. A terabyte will then fit comfortably in a 10'x10'x5' space. Throw a few of these together to get 5 terabytes. Let's add, say, 1 TB more of error correcting codes. In the unused margins of each page add in some information about alignment, a printing of all the colors used (to try to protect against inks changing color over time) and the page number. All together, this is certainly big, but could probably fit in, say, a tractor-trailer. Throw in some books describing the data format and the meaning of the data, and you're done.
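
If anyone feels like playing with this, here's a toy version of the encoding step. The specific 64-colour palette (two bits per RGB channel) and the plain-text PPM output are my own arbitrary choices, and it omits the alignment marks, page numbers and error correction a real print run would need:

    def to_symbols(data):
        # Yield 6-bit symbols (0-63) from a byte string, MSB first, zero-padding the tail.
        acc, nbits = 0, 0
        for byte in data:
            acc, nbits = (acc << 8) | byte, nbits + 8
            while nbits >= 6:
                nbits -= 6
                yield (acc >> nbits) & 0x3F
        if nbits:
            yield (acc << (6 - nbits)) & 0x3F

    def symbol_to_rgb(sym):
        # Map a 6-bit symbol to one of 64 colours: 2 bits per channel, scaled to 0-255.
        return ((sym >> 4 & 3) * 85, (sym >> 2 & 3) * 85, (sym & 3) * 85)

    def encode_to_ppm(data, path, width=320):
        # Write one coloured square per symbol into a plain-text PPM image.
        symbols = list(to_symbols(data))
        height = max(1, -(-len(symbols) // width))
        symbols += [0] * (width * height - len(symbols))
        with open(path, "w") as f:
            f.write("P3 %d %d 255\n" % (width, height))
            for sym in symbols:
                f.write("%d %d %d\n" % symbol_to_rgb(sym))

    encode_to_ppm(b"data bound for 2060", "page.ppm")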


Store the drive along with a computer to read it? :)


It seems like optical discs (CD/DVD/BluRay) are the right idea, they're just not made from stable enough materials. As far as I know, you could apply the same technology used to make optical discs to a more stable material like gold. Assuming that you could get the same storage density as a DVD (which seems reasonable, given that we have the technology/know-how to make 25GB single-layer BluRay discs), you'd need something like 500-600 disks (if you etched both sides). I'm a bit suspicious of my math here, but if my back of the envelope calculation is right a single disc made of gold would be something like 0.8 kg.

The nice thing about this solution is that there's a lot of existing knowledge about how to build and design optical media, how to build in parity checking, detect jitter when reading, etc. This could be relatively cheap to do in bulk also, if you found the right material (I just mentioned gold because I know it to be chemically stable, but there are probably better materials).


Is putting it into orbit with solar power to keep it alive an option?


If it's electronic media, I would guess that solar wind/flares will kill it pretty fast without earth's magnetic field for protection.


You'd probably have to build a desktop with 5x1TB HDDs and make sure the case is super strong and waterproof. Stick the machine in a fireproof and waterproof safe and, of course, be sure to have formatted the partitions differently and have RAID.

Alternatively, burn a hundred or so Blu-rays and get two Blu-ray readers (one on a mobile device, the other an external reader that you will attach to the aforementioned desktop).

Or who knows, maybe holographic storage (http://en.wikipedia.org/wiki/Holographic_data_storage) will come around finally and store the 5TB in a toothpick sized gizmo (which might probably run 512 cores of Googple's Chip (in my alternate reality, Google and Apple merge and buy Intel)...


For the HDD route, I'd probably prefer lower density storage, heaps of redundancy, and a good spread of HDD manufacturers.

A quick Google of HDD MTBF suggests that 1 million hours (over a century) is wildly optimistic, and typical failure rate is 2-4% per year, possibly as high as 13%. If e is the failure rate (as a fraction of 1), and assuming a constant and independent failure rate over time, then the survival chances for any one drive are:

    (1 - e)^50
So the chance that any one will fail is:

    1 - (1 - e)^50
With n mirrors (assuming a reliable checksum to verify data in the event of only a single copy of a mirror surviving), the chances of all failing, f, are:

    f = (1 - (1 - e)^50)^n

    log(f) = n log(1 - (1 - e)^50)

    n = log(f) / log(1 - (1 - e)^50)
So, for a reliability of 99.999%, and hoping you can keep the individual yearly failure rate at 3%, so f=0.00001 and e=0.03, n would need to be at least 47.
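
The same calculation in a couple of lines of Python, for anyone who wants to plug in other failure rates (the 3% and 13% yearly rates are the guesses quoted above):

    import math

    def mirrors_needed(yearly_failure_rate, years, target_loss_probability):
        # Number of independent copies so the chance of losing all of them stays below target.
        p_one_copy_dies = 1 - (1 - yearly_failure_rate) ** years
        return math.ceil(math.log(target_loss_probability) / math.log(p_one_copy_dies))

    print(mirrors_needed(0.03, 50, 1e-5))   # -> 47
    print(mirrors_needed(0.13, 50, 1e-5))   # -> over 12,000 at the pessimistic rate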


I think you're making the improper assumption that the drives would need to be left on during this entire time. I don't believe they would.

I don't know if the bits eventually lose their magnetism over time, so if they do, you may need to spin up the drives every so often and copy to and from drives to make sure the data is still "fresh", but I seriously doubt they'd need to be left on and spinning for the entire 50 year span.


I'm not assuming that the disks are spinning all the time; I don't know what the failure rates of drives left unpowered are, so I took the spinning rate instead. A Google search suggests that failure rates for unpowered hard drives are high - sticking heads etc. - and, as drives are not designed for this, probably higher than for powered drives.


This is also what I've read. Also, anecdotally, I've had a disproportionate number of drives fail on power-up.


There are a lot of amazing solutions on this page. Things that I haven't even thought about, but why do we have to assume that a data reader cannot be kept in the capsule itself? Why should we create something of arbitrary complexity that will cause people to just tear their hair out in frustration?

Since the basics of logic will not change, and since theoretically any computer can simulate another, why shouldn't we just keep a hackable computer with detailed visual instructions and specifications? Further, to enable someone to read the specifications we could have a "learning board" with each symbol and the component next to it. Also, we could even have a haptic output with which people can interact with the computer.

Let's assume that we have a nuclear battery made out of technetium. This feeds into a bank of capacitors and high-performance rechargeable cells. Slowly over time the batteries are kept topped off and they are "exercised" by the computer. Further, for redundancy, 4 or more computing units could be placed in parallel that would wake up sequentially and call the others to check how the entire unit is working. If we keep something like this in a hermetically sealed environment, use the radiation source to manage the temperature, and use passive cooling technologies to let out heat, it should be able to sit still until someone finds it.

Now the data itself would be stored on a series of solid state devices [edit: a specially designed optical storage medium would be far better, but this is 50 years not 1050.] attached to a display. Why shouldn't this suffice?

Presuming that civilization has not collapsed anyone should be able to read it.

By the way, the Phoenix lander has a DVD that tries to do just this (see: http://en.wikipedia.org/wiki/Phoenix_(spacecraft)#Phoenix_DV...). It even has this awesome intro by Carl Sagan. (hear: http://www.planetary.org/special/vomgreetings/greet2/SAGAN.h...)

[edit: Dyslexic errors.]


Endow an educational institution with funding to teach the data to students, and a trust fund to pay a stipend to any student who can prove they have taught all of the data to all of their children, and do the same for the children's children. The institution will determine invariant replication through rigorous testing of a recitation of the data set.

The human brain has superior data storage capacity and resilience to any of the mentioned media, and a longer functional lifespan. It can also adapt on the fly to changes in technology and language.

This also makes the data proof against illiteracy and technological collapse, assuming a cultural impetus to memorize the data and teach it to others can be maintained.


Take a person. Make the person maintain 1MB of data -- copy it, memorize it, something like that. Possibly understand it. Repeat for 5 million people.

If 1MB is too much, try 1KB, and repeat for 5 billion people. There should still be some people left for systems maintenance.


0. Write the data to a hard disk (named 0) and calculate multiple hashes for the various fragments of data. The hard disk must also contain a minimal OS which is only able to dd partitions, and must be connected to a computer.

1. Create some type of mechanical device which is able to periodically do these tasks:

   - create a hard disk (named n)

   - plug it to the computer above

   - transfer the data from hard disk n-1 to hard disk n

   - check the hashes

   - unplug hard disk n-1

   - reboot the system using the new hard disk


To minimize the failure rate of this setup you should add more computers which check whether something is not working. (A sketch of the copy-and-verify step is below.)
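
The 'transfer and check the hashes' step might look something like this; the paths and the single-image layout are made-up details for the sketch:

    import hashlib, shutil

    def sha256_of(path, chunk=1 << 20):
        # Hash a file in 1 MB chunks so huge images don't need to fit in RAM.
        h = hashlib.sha256()
        with open(path, 'rb') as f:
            while True:
                block = f.read(chunk)
                if not block:
                    break
                h.update(block)
        return h.hexdigest()

    def migrate(old_path, new_path):
        # Copy an image from disk n-1 to disk n and verify it bit for bit.
        expected = sha256_of(old_path)
        shutil.copyfile(old_path, new_path)
        if sha256_of(new_path) != expected:
            raise RuntimeError("copy of %s is corrupt; do not unplug disk n-1" % old_path)
        return expected

    migrate('/mnt/disk_n_minus_1/archive.img', '/mnt/disk_n/archive.img')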


My idea.. include with the data storage devices (5TB of data on them) the complete but unassembled parts of the data reader (maybe a full computer ready to go), plus instructions on how to assemble a compatible power supply for the time it's being recovered in..

vacuum seal everything, with a packet of silica.. no air at all, moisture removed.. part by part.

1) then vacuum seal the container or 2) pack it with closed cell insulation

no light, no air, no corrosion or UV damage. advantage to #2 is that it would be ruggedized for hits & transport.


Use a high energy laser to etch thick platinum CDs. The pits produced by the laser should be deep. Coat the etched surface with a thick layer of diamond using vapor deposition. Enclose the CD in a vacuum case made with extremely durable material (granite maybe). Entomb the CD case in a 10 ton cement block. Stack the Cement blocks (if more than 1 CD) in a shape of a pyramid on a dry desert plateau. Irradiate the area around the pyramid. That should last more than 50 years.


Write a book about how you collected the data. Then just collect it again when you need it in 50 years. On a long time scale, processing power is cheaper than storage.


Very interesting topic.

For preservation, encode the data in the DNA of a bacterium, replicate it massively and put them in some kind of suspended animation (hand waving wildly).

Getting the meaning of the data is another matter. I wonder whether it will be possible to create species of bacteria that can decode the above DNA and present it directly accessible to human senses - ex: bacteria that change color, form shapes, etc.


I don't think genomes are anything like as big as the requirement, e.g. this link gives 0.35GB as the size of the human genome:

http://www.utheguru.com/fun-science-how-many-megabytes-in-th...


Nothing stopping you from making a much larger one! Or a lot of different ones, if it's automated.


Read the data and convert it to binary numbers. Use teddy bears to encode the data. Each teddy bear has a different color which represents a sequence of bits, e.g. yellow is 1010101011101, blue 10101011010101010101010 and so on. We have 16,777,216 color possibilities with the normal RGB or CMYK color modes. I believe that is more than enough to encode the data (although I didn't check it).

Easy isn't it? ;)


Two sets of hard drives, five sets of CDs, and five sets of DVDs. I'd store it in a fireproof locked box with silica gel. Would probably work.


No, it won't. Some CDs from the early '90s are not playable anymore because the foil layer that stores the data has been damaged by aging. DVDs and hard drives have similar problems and won't last longer than 5-15 years if you are not lucky.


Would it help to seal the CDs/DVDs/CDROMs in a vacuum bag? And/or in cryogenic storage?


A nitrogen-filled container would be more practical than a vacuum. Lowering the temperature would definitely help, although too cold could be bad.


Well, you know what they say: a picture is worth a thousand words. Pictures are quite good at lasting for eons when put on a proper substrate.


I have received Commodore 64 floppies from a lot of different people, some of which have dates written in the early 80s or late 70s (for the disk not the data of course). And in every box I've gotten there have only been a couple that haven't worked. Maybe they all just happen to have been stored in optimum ways.. but I fail to see what the big deal is.


Here's an associated question that archives everywhere are beginning to struggle with (though they should have begun struggling long ago). Many municipal records are being born digital these days, and records retention schedules tell us to keep a wide variety of materials permanently.

How would you store an ever-increasing amount of digital data indefinitely?


Given any method, memory diamond[1]. It's incredibly dense (25 grams of the stuff stores a bit over half a Yottabyte), and quite durable. If restricted to existing methods ... a metric fuckton of hard drives in raid arrays, outsourcing backups to cloud storage providers where possible, and investing as much money as possible in getting a way to use memory diamond worked out.

[1] Charles Stross describes it: "Memory diamond is quite simple: at any given position in the rigid carbon lattice, a carbon-12 followed by a carbon-13 means zero, and a carbon-13 followed by a carbon-12 means one. To rewrite a zero to a one, you swap the positions of the two atoms, and vice versa." See http://www.antipope.org/charlie/blog-static/2007/05/shaping_...


It is not only the physical storage, but also the software that can read the data which matters. So choose your formats wisely.


Amazon S3 would surely be the most economical, right? No upfront costs, and running costs steadily decreasing?


I wouldn't count on the 'steadily decreasing' part; S3 hasn't come anywhere near to following the downward price trend in hard drive cost per gigabyte / power per gigabyte. But even at today's costs, that's only $3900, which sounds pretty reasonable compared to what others are suggesting (if I did my math right) to keep 50 gigs around for 50 years. Of course, while I'd rather depend on S3 being around in 50 years than a RAID in my closet, I still don't know that I'd give it great odds.

I don't think 'use Rackspace /and/ use Amazon' is a good strategy either, just 'cause if market conditions change enough that one goes under, it's likely the other will too... and it's also likely that they could be bought by the same entity. You'd want to put one copy on S3, and then use some completely different storage method, like microfiche or archival CD-ROMs or something in a safety deposit box.


1) Buy 5-10 different brands of hard drive and make a copy of the data on each.

2) Invest $100K in very long-term inflation-protected investments.

3) In 50 years, use the proceeds (around $1m) to offer a prize to anyone who can decode the disks to a current format.


What if there is some kind of solar event that renders the population unable to generate electricity. Does this leave paper/microfiche/stone as the only options? Surely this kind of possibility must be accounted for when determining the viability of digital options.


Sandisk has already solved this: http://www.electronista.com/articles/10/06/23/sandisk.outs.s...

In a few years we'll have those with 64gb/128gb capacity.


Holographic storage. Seriously. The current technology has a 50 year archive life.

http://www.inphase-tech.com/products/default.asp?tnn=3


Here's a good question: given the probability of a bit on an SSD being flipped by cosmic radiation, how much redundancy would be needed to recreate the data using parity bits or something along those lines?


How long does an integrated circuit last in storage?

Design a few of them with the data just burned in. No capacitors to fail, and you could get your redundancy by just making a couple dozen of each. Design them to wait a few seconds after power-on and then start spitting out the data and a clock. Label the power, ground, serial out and data pins.

Given those pins, there isn't anything else you could do with it besides hook it up and see what it has to say. And all you need to read it is a stable power supply and sufficiently fast analog-to-digital converters.


I think you have to state some assumptions about technology levels at the time the time capsule is opened and whether you want the data to survive much beyond the 50 years if it is not opened.


I would go with b/w microfilm, but in any case see here: http://www.ifs.tuwien.ac.at/dp/timecapsule/home.html


Doesn't every solution involving magnetic media require a power source (since HDDs and USB drives need a power supply with the right electrical rating)?

Optical media is the only way to guarantee that whichever future "creatures" encounter the data can actually figure out how to access it.

If you are a blind alien (so that you do not understand the _concept of light_) you can potentially still have measuring equipment that can sense the pits in the media and make sense out of the binary data.


We're talking 50 years, not 50 centuries.


I can't see infrared or UV, but that doesn't mean I don't understand the concept. (To get waaay off topic.)


Just write your own encoding scheme and encode it that way. Rent some dedicated servers with enough HDD space and RAID, store it there, and grant various people access. :)


1. Encode the data into a stream of bits.
2. Send the data using a laser and bounce it off a planet 25 light years away.
3. Receive the data in 50 years.



Print it as a high-density barcode on acid-free paper.


You people have way too much free time. I'm going to get Congress to pass a law stating all free time must be given to the gubbermint.

Cheers :)


Make a bunch of asian kids memorize it. You'll need some redundancy, but not much. Most of them will live to be 60.

(note: self.race == asian)


While these are some rather colorful and interesting ideas, might I suggest ROM chips?

goes back to playing 30+ year old Atari games


Use microfilm; write it as text.

http://en.wikipedia.org/wiki/Microform


If it's media, store it on 80+ iPads. That bundles the reader with the data. Only requires AC power in 50 years.


Wouldn't that make the "kindle/nook/something else" a better platform for that kind of solution?


this is the only way to make an ipad useful :p



Or just go to the supplier:

http://hstrial-norsamtechnol.homestead.com/Rosetta.html

The New York Times used them to make a nickel disk of their archives for a very long-lived time capsule.


Encode it and distribute it over the Internet, so Google and other search engines will store it.


Metal punch cards, stored in low-acid clay vessels in a very dry place. Seems to have worked so far... http://en.wikipedia.org/wiki/Copper_Scroll


Put a webserver on the Moon accessible via satellite and/or short wave radio.


And store data on ceramics or porcelain? It's easier than writing on rocks.


Stone, or perhaps clay. Some of the oldest written records we still have are from Sumer using cuneiform on clay tablets. Use modern micro-abrasion techniques to encode data to stone, then put that stone away into an area little accessed by humans on a daily or yearly basis.

Assuming there is no cost limit here, I would go one step further and say use some form of metal. Say stainless steel, aluminum, gold, or titanium. Some metal that is very stable over time and does not interact with the atmosphere readily. Again, use micro-abrasion / carving technology to write data to the materials.

The next question is what format to use for the data. It depends on what's being stored. The biggest issue is that of "formats".

Let's look at things that last a long time. English (or any other language) is unlikely to change that much in 50, 100, or even 200 years. Words and their meanings will change, but for the most part a native English speaker 200 years from now could read what we write now. Whether or not they understand the usage of the language is a different question. So if it's a textual document you're saving, write it in plain English. No abbreviations, etc.

What about media? That gets complicated. If it's a static image, perhaps keeping it simple is best. In plain English, write that the following section of data is an image. Each number ranges from 0 through 255. Proceeding from left to right, each group of three values represents Red, Green, and Blue; each group of three numbers is what we call a "pixel". The image is 300 pixels wide and 800 pixels high (arbitrary numbers for this argument).
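
A minimal sketch of that self-describing plain-text format, in Python; the header wording and layout here are illustrative, not any real standard:

    # Minimal sketch (Python) of the self-describing plain-text image format
    # described above. The header wording and layout are illustrative only.

    HEADER = ("The following data is an image. Each group of three numbers is one\n"
              "pixel: Red, Green, Blue, each from 0 to 255. Pixels run left to right,\n"
              "top to bottom. The image is {w} pixels wide and {h} pixels high.\n")

    def encode_image(pixels, width, height):
        # pixels: flat list of (r, g, b) tuples, length width * height
        lines = [HEADER.format(w=width, h=height)]
        for y in range(height):
            row = pixels[y * width:(y + 1) * width]
            lines.append(" ".join("%d %d %d" % rgb for rgb in row))
        return "\n".join(lines)

    # A 2x2 test image: red, green, blue, white.
    print(encode_image([(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)], 2, 2))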

For moving images, further expand on the single image description and say every 24 images should be spaced equally and viewed over the course of 1 second to achieve animation.

Sound is something I don't know anything about from a data format perspective, but I would again find the simplest mechanical way to produce a sound and store it in that format, with ample verbiage describing how to handle it.

Edit: After reading other responses that came in while I was writing this, I want to add some more thoughts.

Remember that our technology is ephemeral. We don't really use much tech from 50 years ago, hardly any from 100 years ago, and it just gets worse from there.

Things like microfiche, SSDs, CD-ROMs, Blu-ray, etc. are all bad ideas for long-term backup. Paper books are a better option than any of these for near-term storage, for time periods up to 50 years.

If we want to actually store data in a meaningful way for long periods of time, say over 100 years, we have to keep it simple. Your devices will probably not last 100 years, even if kept in storage under the most secure environment. But in 100 years people will still have eyes, ears, and hands.

We have to look back over history at the material types that survive long periods. Stone, and metal to an extent, are very good long-term materials. Cloth and paper are not. DNA is potentially a usable data store, but is corruptible. Plus you can't read DNA patterns with the eye.


> We don't really use much tech from 50 years ago, hardly any from 100 years ago, and it just gets worse from there.

Printing presses, microfiche, film, radio, telephones, television, computers, light bulbs, electrical sockets, toasters, cars, bicycles, electric stoves, speakers, microphones, projectors, etc.

Technology is not nearly as fickle as you'd think. Remember, 50 years ago is only 1960. VHS tape technology is almost 40 years old. Sure, things are moving faster now, but things that work tend to stick around.


I think you're largely right, but I have one nitpick: microfiche, and other types of microform, are pretty stable, and only require a magnification tool to be human-readable. They're pretty much the gold standard for efficient archival long-term storage of information.

But as you say, stone and metal as well as acid-free paper made from rags or mulberry fibres have a proven track record for longevity.


Stone/metal tablets would be the best format, but consider the amount of data in question. 5 TERABYTES. That's a lot of tablets. And a lot of work to engrave those tablets.
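
A back-of-envelope count, assuming (generously) around 5 KB of legible engraved text per tablet; the per-tablet capacity is pure guesswork for scale:

    # Rough tablet count (Python). The ~5 KB of engraved text per tablet is a
    # pure guess to get a sense of scale.
    data_bytes = 5 * 10**12            # 5 TB
    bytes_per_tablet = 5 * 10**3       # assumed capacity per tablet
    print(data_bytes // bytes_per_tablet)   # -> 1000000000, i.e. a billion tablets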


"Some of the oldest written records we still have are from Sumer using cuneiform on clay tablets."

Actually that alone is no proof that clay tablets are very durable. Who knows, there might have been billions of them in circulation, and only a few of them survived. That would be a rather bad track record.


I'd store it on Betamax tapes :) They do say history repeats itself.


Paper; it will outlast anything other than maybe stone.


punchcards.


I had a good talk with my ex-wife's uncle about this a few years ago. He's a records archival specialist in Australia, and they seem to be doing a lot of work with microfiche. Apparently it lasts quite a long time and is highly dense. They encode data in some sort of XML format, and the first slide on each sheet has a reasonably explanatory description of how to decode the data.


B&W film is the best archival format I've come across; it'll last a century stored in a shoebox in the attic, and you can look at it just by holding it up to the light. That's important: anyone who picks it up, regardless of their prior experience, will know this is a thing with information on it. Would you even recognize a hard drive if you'd never seen one before?


I agree - I think this is the route for more substantial (centuries) storage. However, I probably wouldn't use XML - rather some variant of http://ronja.twibright.com/optar/ mentioned elsewhere on this page.

I'd include the definition of a simple machine, and the text of a program written in opcodes for the machine, for decoding.
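
A toy sketch of what that might look like, assuming a made-up three-opcode machine (a real capsule would spell out the machine's rules in prose on the medium itself):

    # Toy illustration (Python) of "define a simple machine, ship a decoder in
    # its opcodes". The machine and the opcodes are made up for this example.

    def run(program, tape):
        acc, out, pc = 0, [], 0                  # accumulator, output, program counter
        while pc < len(program):
            op, arg = program[pc]
            if op == "LOAD":   acc = tape[arg]   # read cell `arg` of the input tape
            elif op == "XOR":  acc ^= arg        # undo a simple obfuscation
            elif op == "EMIT": out.append(acc)   # write the accumulator to output
            pc += 1
        return out

    # A "decoder" that reads three tape cells and un-XORs each with 42.
    program = [("LOAD", 0), ("XOR", 42), ("EMIT", 0),
               ("LOAD", 1), ("XOR", 42), ("EMIT", 0),
               ("LOAD", 2), ("XOR", 42), ("EMIT", 0)]
    print(run(program, [b ^ 42 for b in (72, 73, 33)]))   # -> [72, 73, 33]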



