
Ask HN: How would you store 5TB of data for 50 years, untouched? - icey
Let's say you were thinking of putting digital data in a time capsule where it couldn't be touched for 50 years. How would you store it? How would you ensure it was readable at the end of the 50 years?
======
iamelgringo
Here's what I would suggest:

Take a bunch of fibrous, cellulosic material, pound it into a pulp and then
squeeze it into very thin, flexible sheets of material. Let them dry.

Then, take some form of pigment or dye, and with a very fine stylus
impregnated with the dye, visually encode the data on the cellulosic sheets
using a set of graphemes. Each grapheme would roughly represent a phoneme in a
spoken language.

It would take quite a while to encode all that data. I'd suggest building some
type of mechanical device to automate the task of transferring dye on to the
cellulosic sheets. I'd also want to bundle these individual cellulosic sheets
into stacks of 200-500 for organization's sake. I'd probably cover them with a
more durable material such as animal hide or perhaps a thicker layer of
cellulosic material.

I'd then take all these bundles of data-laden cellulosic material, and I'd
build a structure to protect these bundles from the elements. Developing a
cataloging or indexing system for these bundles shouldn't be too hard. I'm
sure it's been done before.

Regardless, you could either preserve the materials or let the public have
free access to the information. You'd run the risk of damaging the data, but
if you had a mechanical replication system, you could simply make multiple
copies of each data set, and ensure the safety of the data that way.

Sheets of fibrous, cellulosic material should last several thousand years if
kept in the right environment.

You know, now that I think about it, it's probably much too complex a system
to handle something like that. I really don't think it would work.

~~~
Nogwater
Out of curiosity, how large would the time capsule need to be to contain 5TB
of data encoded that way?

~~~
cstross
The storage density of A-format mass market paperbacks containing dense UTF-8
text is roughly 4Mb/Kg. (A 400 page novel weighs around 250 grams and contains
roughly 1Mb. Source: I went and weighed one of my novels, of known word
count.) We can up the density somewhat by bzip2-compressing and then
uuencoding (or similar); maybe 10Mb/kg is achievable.

Normal A-format paperbacks use acidic wood pulp for paper, but acid-free paper
doesn't add a whole lot to the cost. So we get roughly 10Gb/ton, and the whole
kaboodle comes in at roughly 500 tons. As the density of wood pulp is close to
that of water, this approximates to a 10 x 10 x 5 metre lump. Quite a large
time capsule, but not unmanageable :)

However. If we posit the availability of decent optical scanners in 50 years'
time, there's no reason not to go denser.

We can print the data as a bitmap at 300dpi and be reasonably sure of
retrieving it using a 2400dpi scanner. Let's approximate 300dpi to 254dpi, and
call it 10 bits/mm. Printed on reasonable quality acid-free paper, we can get
100mbits/square metre, and that square metre will weigh around 80 grams (it's
80gsm paper -- same as your laser printer runs on). Call it 1.1 mb/gram. At
this density, we can print the whole 5Tb (or 40tbits) on just 40 tons of
paper, without compression; with compression, call it 20 tons. That's a 2.71
metre cube; just under 9' on an edge.

This assumes a monochrome bitmap. If we go to colour and use archival-quality
inks we can probably -- conservatively -- retrieve one bit each of
red/blue/green per dot, and it's probably not unreasonable to expect to get a
whole byte out in place of each bit in our mono bitmap. So we can probably
shrink our archive to roughly 2.5 tons of paper -- a pallet-load.
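The monochrome-bitmap numbers above can be sanity-checked with a short script (assuming decimal terabytes, i.e. 5e12 bytes, and the stated 10 bits per linear mm on 80gsm paper):

```python
# Back-of-envelope check of the monochrome bitmap-on-paper estimate.
# Assumptions from the comment above: 300dpi print, read back at 2400dpi,
# approximated as 10 bits per linear mm, on 80 gsm acid-free paper.

BITS_PER_MM = 10                          # linear density
bits_per_m2 = (BITS_PER_MM * 1000) ** 2   # 1e8 = 100 Mbit per square metre
paper_gsm = 80                            # grams per square metre

data_bits = 5e12 * 8                      # 5 TB = 40 Tbit
area_m2 = data_bits / bits_per_m2         # paper area needed
mass_tonnes = area_m2 * paper_gsm / 1e6

print(f"{area_m2:.0f} m^2, {mass_tonnes:.0f} tonnes")  # 400000 m^2, 32 tonnes
```

The exact result is about 32 tonnes; the 40-ton figure above comes from rounding the density down to roughly 1 Mbit per gram.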

~~~
falien
I wonder if anyone has actually attempted this and seen how dense you can pack
it and reliably recover. I imagine you would need measures to counter small
misalignments when rescanning and imperfections in the physical media.

~~~
blasdel
200kb per A4 page using a 600dpi b/w laser printer:
<http://ronja.twibright.com/optar/>

------
GnarfGnarf
Use ion beam deposition (<http://en.wikipedia.org/wiki/Ion_beam_deposition>).
You can inscribe 20,000 pages on a 5" nickel disk. The data can be read by any
civilization that can build an optical microscope. It will last for millennia.

------
tzs
Since some of the solutions people are proposing assume you have a lot of
space for storage, I'll assume that too.

1\. Get a 9600 bps modem. Use it to encode your data, and record the output as
an audio file.

2\. Take this audio file, and split it up into 60 minute segments.

3\. Record these 60 minute segments onto two-sided vinyl LPs, 30 minutes per
side. This will take about a million LPs.

4\. Print on acid-free paper, using ink that will survive 50 years too,
instructions on how a 9600 bps modem works. Describe the encoding in detail,
sufficient so that someone using the equivalent of MATLAB or Mathematica or
something 50 years from now on the computers they will have then could easily
write a program to decode a modem signal.

5\. Also print and include instructions for making a record player. As with
the modem, the important part is describing how the signal is encoded on the
LP. They'll have no trouble building a record player 50 years from now.
(Assuming they don't just photograph the LPs with the 3D terapixel camera on
their iPhone 54, and then write an app to extract the signal from the
photo...)

6\. Store all of this somewhere. LPs will last 50 years easily in a typical
office environment, so you probably don't have to resort to something like a
hermetically sealed vault buried in an old salt mine or anything extreme like
that.
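The "about a million LPs" estimate checks out (a sketch, assuming decimal terabytes and one hour of audio per two-sided LP):

```python
# 5 TB pushed through a 9600 bps modem, recorded 30 minutes per LP side.

data_bits = 5e12 * 8          # 5 TB in bits
seconds = data_bits / 9600    # transmission time at 9600 bps
lps = seconds / 3600          # 60 minutes of audio per two-sided LP
years = seconds / (3600 * 24 * 365)

print(f"{years:.0f} years of audio on {lps:,.0f} LPs")
# 132 years of audio on 1,157,407 LPs
```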

~~~
phreeza
Nice idea. Probably no need for an actual modem in the encoding, I think that
should be feasible with software nowadays?

~~~
sliverstorm
It's nice to work with standard things when you're talking archives. A physical
modem is a lot more standard than a custom one-off program.

------
bkrausz
If a constraint was that it's going in a literal time capsule that would be
buried underground (i.e. physical damage to the area is not controllable) I
would get a couple of SSDs and a couple of backup tapes, and save redundant
copies on different types of media. Given enough space I'd also stick a couple
of machines capable of reading the data just in case.

Removing that constraint and completely ignoring cost, I'd also set up a low-
risk savings account with $1M in it and put the data on S3 and Rackspace
Cloud. I'd store access credentials in the capsule. Odds are pretty good one
of those 2 will be around in 50 years (and you'll have a chunk of money left
over in interest).

Try to keep everything ASCII, with really good text descriptions of data
formats.

Realistically 50 years is not a long time: I would bet we'll still have legacy
access to USB, SATA, and probably ext3 & NTFS (though probably not IDE). Tons
of computer folk who used these technologies will still be alive to work them.
English will still be the primary language in the US.

An interesting problem is what to do when the timescale allows these things to
change. What if nobody remembers USB, or what spinning platters are. Or the
English language?

~~~
fhars
Neither tape nor SSDs will last 50 years. Within about 10 years, tapes will
lose magnetization through thermal movement and capacitors in SSD storage
cells will flip due to cosmic radiation. Over some decades the plastic the
media and/or casings are made of will just decay (a serious problem for
museums of modern art and design:
[http://www.getty.edu/conservation/science/plastics/index.htm...](http://www.getty.edu/conservation/science/plastics/index.html)).
The only media with a proven track record of preservation over decades are
acid-free paper, parchment and non-organic materials like steel, stone and
clay. But getting 5TB on a stone tablet has its very own challenges. [Edit:
And using acid free paper won't buy you anything if you print using plastic
based toner common in modern laser printers, at least use an inkjet with
inorganic pigment (and not dye based) ink. If you look out for pitfalls like
this, you might be able to implement your requirements with
<http://ronja.twibright.com/optar/> and only 125 metric tons of acid free
paper, which you should be able to buy for less than $200000.]

Some people claim that MO media and DVD-RAM can guarantee 30 years, but this
still is an estimate, they have not been around long enough to actually know.

The only "reliable" way to store digital data for more than five years known
today is to copy them to new media well in advance of the old media losing
them, and even that is difficult if the amount of data is growing faster than
the storage technologies get faster. (I don't know if I should trust Eric
Schmidt, but a few days ago he claimed that currently humanity generates as
much data every two days as it did up until 2003,
<http://techcrunch.com/2010/08/04/schmidt-data> )

~~~
wslh
SSDs with batteries and/or solar power for 50 years?

~~~
sliverstorm
If it's solar radiation that flips the bits, why not just stick it in a lead
case?

------
sliverstorm
Convert the data to a string of letters.

Have a child.

Name your child that string of letters.

Now preserving your data is the government's problem- they have to produce a
birth certificate and keep track of him/her in their databases.

~~~
lzw
They'll just rename your child.

The thing about government is, all the laws apply to you. But they can do
anything they want.

------
goodside
Encrypt the data with a key long enough that, by Moore's law, you'd expect
computers to be able to break it in 50 years. Submit the data to Wikileaks.
Destroy the key.

~~~
rmc
Modern encryption systems do not fall out of use because computers get faster,
it's because new mathematical attacks to the underlying encryption are found,
and this makes brute forcing it easier than people thought at the time.

In other words, the reason we don't use (say) the MD5 hash system is not
because computers are able to break the 128-bit hash, but because people
have discovered flaws in the MD5 algorithm that mean it doesn't give '128 bits
of strength'. In this case it's not the hardware (CPU clock speed) that gets
better, but the software (programmes that break MD5) that gets better.

Hence, you could use Moore's Law to predict what computers in 50 years' time
will be like, but you can't know what mathematical techniques will be like in
10, let alone 50, years. The encryption system you use might get broken in 10
years' time.

~~~
philh
DES is arguably an exception to that. Twenty years after the standard was
published, the EFF published designs for a machine that could brute-force a
DES key in a matter of days.

But even when it was published, people were saying a 56-bit key was too small,
which they aren't saying of modern cryptosystems (to my knowledge).

<http://en.wikipedia.org/wiki/Data_Encryption_Standard>

~~~
rmc
Yes OK, a small key size makes brute force more likely. Bruce Schneier wrote
about how it's not possible to brute-force 256-bit keys. Short section here
<http://www.schneier.com/crypto-gram-9902.html#snakeoil> , there's a longer
explanation saying you can't even flip all the bits in a 256-bit key before the
universe expires, but I can't find that at the moment. So once you hit an upper
limit (256 bits), you can't brute force anything ever, so faster computers are
useless. Algorithms like that get broken by new maths, which could happen at
any time; you can't predict it, so you can't rely on it.

------
kabdib
A lot of people are talking about 50 years like it was a super long time, and
propose solutions that are really intended for hundreds to thousands of years.
I think it's overkill. [Also I think a lot of you are under 30 :-) ]

For only 50 years, I'd probably risk making many thousands of DVDs and CDs,
using different manufacturers and drives. Store with tons of redundancy and
ECC, don't use inner / outer tracks for anything important, etc.

Also, are all of the data equally important? You can afford to store the more
critical pieces in more expensive and less compact, but more robust formats.

I think the real enemy is obsolescence, and that keeping the data simple (and
providing decompression programs and indices in easily understandable formats)
is likely more important than worrying about bit-dropout, which seems largely
manageable over your specified time.

For 500 years, I'd print it, or micro-inscribe it. (One problem with printed
matter is that it has other inherent value, e.g., fuel for heating the yurts
of cold barbarians).

For 5K years, micro-inscription and (if you are worried about technological
crashes) an archive in the sky. You could populate a host of satellites in
various orbits, timed to re-enter at intervals of (say) a decade over a few
thousand years (hard to be exact with atmospheric drag and climate change, but
you get my drift). Getting something from orbit down to the ground is not
hard, getting /noticed/ and picked up as an interesting artifact is probably
harder.

For 5M years, add a metric buttload of ECC and stick it in the DNA of some
critter that doesn't get out much. A bottom-feeder in a radiation-shielded
environment would be cool. Say, a lobster.

~~~
parallax7d
I love the lobster idea. It would be even better if the lobsters' survival was
based on the integrity of the data. This would provide evolutionary ECC.

You would also need some mechanism to signal people in the far future that the
lobsters were data carrying devices. Otherwise they wouldn't have any reason
to randomly decode sea creatures. Perhaps you could program the lobsters to
develop spots on their shells every century which denote the first 10 prime
numbers.

~~~
crpatino
A religion that worships the lobster-god?

------
extension
Why is everybody worried about the future people knowing how to read the data?
Barring some unprecedented catastrophe, we should still have detailed
technical specs of today's formats in 50 years.

Just bury 250TB worth of SSD storage, along with a device that activates every
year and copies from one 5TB block to the next. Any single SSD will only be in
use for a year. If the drives can survive 49 years before their first use, it
will work.

Storing the data in some ridiculous format is just going to discourage anyone
from ever reading it. I'm sure the people of tomorrow have better things to do
than OCR millions of sheets of paper just to see grandpa's porn collection.

~~~
iamelgringo
How many digital devices that were created in 1960 are still running today?
Most digital devices back then were created using vacuum tubes. How many
vacuum tube makers still exist? The military was buying vacuum tubes from
Czechoslovakia in the '80s to keep the SAGE early warning system running. ref:
[http://en.wikipedia.org/wiki/Semi_Automatic_Ground_Environme...](http://en.wikipedia.org/wiki/Semi_Automatic_Ground_Environment)
That's because there were no American manufacturers of vacuum tubes after the
late 70's.

Sure, we still have the technical specifications for how to build it, but
manufacturing the individual components would be a giant pain in the ass.

~~~
extension
I suppose it depends on how much effort we assume will be expended reading
this data. If humanity bands together and exploits every available technology,
it should be a piece of cake. If it's just our grandkids doing it for kicks,
that may present a real challenge.

Regardless, I think it's likely that we will be better at reading 2010 media
in 2060 than we are at reading 1960 media in 2010.

------
gojomo
Contract with the Long Now Foundation's Rosetta Project to put the data on
their 'Rosetta Disks', readable by any civilization with high-powered optical
telescopes:

<http://en.wikipedia.org/wiki/Rosetta_Project>

Supposedly one holds 13,000 pages of text in human languages. If we assume
your data is similar text, and one page is 58 lines of 66 characters (as are
plain text IETF RFCs), you'll need:

(5TB / (3828 bytes)) / 13000 = 110473 disks
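The 110,473 figure can be reproduced exactly, but only if "5TB" is read as binary terabytes (5 × 2^40 bytes):

```python
import math

# 13,000 pages per Rosetta Disk; one page = 58 lines x 66 chars = 3828 bytes.
page_bytes = 58 * 66           # 3828
data_bytes = 5 * 2**40         # 5 TiB; decimal 5e12 gives ~100,477 instead
disks = math.ceil(data_bytes / page_bytes / 13000)
print(disks)  # 110473
```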

------
edw519
1\. Convert all the data to decimal (digits 0-9).

2\. Put a decimal point in front of this long string. The result will be a
rational number between 0 and 1. Call it x.

3\. Get a titanium rod exactly 12 inches long.

4\. Using a fine laser, etch a line in the rod precisely 12x inches from the
end.

5\. Done. Precise, durable, elegant, compact, and green.

EDIT: </sarcasm>

~~~
JadeNB
Ah, you added the `</sarcasm>` tag while I was responding …. Anyway, it was an
excuse to break out Frink (<http://futureboy.us/fsp/frink.fsp>), which reports
that the resolution required, which (I think) is (1 foot)/(50 terabytes/byte),
is 6 * 10^(-15) m, i.e., on the order of the diameter of a proton. Honestly, I
thought it would be much smaller.

Another problem is that any rod etched in this way will have two decodings.
:-)

~~~
JadeNB_redux
Gah, anti-procrastination settings prevented me from correcting this, but it's
too stupid to let stand. At least this post demonstrates once again the
important principle that a smart (computer) calculator is no match for a dumb
(human) calculator.

Of course, n bytes can store a number up to 2^(8n), not up to n. Thus, the
number we'd be recording has _order of magnitude_ (50
terabytes/bit)×log[2]/log[10] ≈ 1.20e+14, so we'd need to distinguish that
many _digits_ after the decimal point.

Conversely, as bayes pointed out
(<http://news.ycombinator.com/item?id=1585498>), if we have a resolution of 10
digits after the decimal point, then we can handle about 10*log[10]/log[2] ≈
33 bits.

(EDIT: To be clear, it's the same person posting; I just can't stand to wait
154 minutes to correct the error.)
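The bits-to-digits conversion referenced above is a one-liner (a sketch of the general rule, not specific to either post's figures):

```python
import math

# d decimal digits carry d * log2(10) bits; conversely, n bits need
# n * log10(2) decimal digits.
bits_in_10_digits = 10 * math.log2(10)
print(round(bits_in_10_digits))  # 33
```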

------
zokier
My answer is that you don't put digital data in a time capsule. Digital data
is easy to copy, and that's what you want to leverage. Put two pairs of
servers in two data centers and keep them running, migrating to new tech when
needed. I'd assume 5-10 generations of hardware would be necessary.

~~~
waterlesscloud
This is the best answer.

I also think things like NASA datasets, other govt agency datasets, etc,
should be placed on torrents for anyone who wants to make a copy. Let the
self-replicating nature of the internet serve as the backup backup plan.

If you put those Apollo datasets online, it's a guaranteed certainty that some
hacker somewhere will have them in 50 years.

~~~
jpcx01
Bleh. It'll just sit on TPB with 0 seeds, 0 peers for years. I'd recommend
bundling it with pr0n.

------
phreeza
Just out of curiosity: whats the background of this question? Are you planning
to actually do this? What kind of data?

If so, you could maybe give us some more information on the constraints
involved (although I must admit thinking about it without any constraints is
fun, too).

~~~
icey
There isn't really much in the way of background, I was thinking about people
who decide to use cryopreservation, or the potential of sending out spacecraft
for long periods of time where it may be out of communication range but the
craft is meant to return, or even something as simple as a school's time
capsule.

The constraints were chosen in order to remove the easiest answers (file
sizes, period of time, etc).

Ultimately I think it's an unsolved problem that will become more important
over time. My family has photo albums from over 50 years ago, but that medium
doesn't have the kind of bandwidth we need for larger datasets (audio, video,
etc.).

So I guess it's just a thought experiment I thought was interesting.

------
bluemetal
If this digital data didn't have to be put away somewhere, I would make the
most of intelligence. Put someone in charge, with the skills to transfer the
data to newer mediums as they become popular and to ensure that the copies
are not corrupted in the process. This person would be paid in whatever kind
of way leads to the most loyalty; I would also leverage their sense of pride.
Maybe have a few different people each tasked with protecting overlapping
segments of the data to help ensure nothing is ever lost.

Ideally some kind of artificial intelligence would come about sometime in the
future to assume the role of data keeper - hiring people to do any work it
couldn't do from within the computer and running off some kind of fund that
had been set up. Maybe one day there will be a market for creating intelligent
services like this, I hope I have something to do with them.

------
nhnifong
Transmit the data with a laser to a mirror 25 light years away.

~~~
Sapient
The beauty is in the simplicity.

~~~
rubidium
Unfortunately, this is far from simple. Laser transmission is lossy, in that
all laser beams diverge.

For example, even just sending a laser to the moon and back is quite
sophisticated.

"Laser beams are used because they remain tightly focused for large distances.
Nevertheless, there is enough dispersion of the beam that it is about 7
kilometers in diameter when it reaches the Moon and 20 kilometers in diameter
when it returns to Earth. Because of this very weak signal, observations are
made for several hours at a time. By averaging the signal for this period, the
distance to the Moon can be measured to an accuracy of about 3 centimeters
(the average distance from the Earth to the Moon is about 385,000
kilometers)." [source:
[http://www.lpi.usra.edu/lunar/missions/apollo/apollo_11/expe...](http://www.lpi.usra.edu/lunar/missions/apollo/apollo_11/experiments/lrr/)]

Thus, even if you could get a mirror 25 light years away, the light wouldn't
hit it or bounce back.

------
ecaradec
Paper? It's possible to print binary data on paper now. I'm not sure what
5TB would look like, though. It's probably better to keep the data at hand and
migrate it as the world and the technology evolve.

~~~
js4all
5 TB on paper needs too much room to store.

~~~
barrkel
It depends on who's doing the storing.

A4 page, which has 210x297mm, say a border of 10mm all around non-printable
area, gives 190x277.

Say we print at a resolution of 0.5mm, and in 16 shades, so we get 4x4 = 16
bits per mm, or 2 bytes. That gives 105,260 bytes per page. Probably we can
squeeze more bytes in than this, but let's allocate those to redundancy and
error correcting codes.

So 5TB (in hard-drive sizing, 5e12 bytes) would take about 23.75 million sheets
of paper printed on both sides. At 5g per sheet, that's about 119 metric tons.

1 ream (500 pages) of the 75g/m^2 A4 paper I have beside my printer here is
about 50mm thick. Say 215x305x55mm including some slack for packaging, is a
little over 3.6 litres in volume; total volume for 23.75 million sheets is
171,285 litres, or 171.285 cubic metres.

A room with a ceiling height of 3 metres would need to be only 8 meters on
each side to store this. Of course, the room shouldn't have any windows and
should be at the appropriate humidity, etc.

The cost of the paper, assuming $5 per 500 pages, is less than a quarter of a
million. Much more forbidding is the labour and temporary acquisition of
printers required to transcribe the data to paper. A good printer @50ppm would
take nearly 2 years, assuming zero downtime and very quick paper and toner
changes. To do it in more reasonable time and with more reliability, you'd
want a bunch more printers; and of course, you'd need to hire the people to do
the work of shuffling paper and carting it around, but I'd bet you could
probably do it for less than the cost of the paper, particularly if you did it
in a cheap labour country.
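The per-page arithmetic above can be sketched in a few lines (using the stated assumptions: 0.5mm dot pitch, 16 shades per dot, both sides, ~5g per sheet):

```python
# A4 printable area (190 x 277 mm after a 10mm border) at 0.5mm dot
# pitch, 16 shades (4 bits) per dot, printed on both sides.

printable_mm2 = 190 * 277                 # square mm per side
dots_per_mm2 = (1 / 0.5) ** 2             # 4 dots per square mm
bits_per_dot = 4                          # 16 shades
bytes_per_side = printable_mm2 * dots_per_mm2 * bits_per_dot / 8

sheets = 5e12 / (bytes_per_side * 2)      # both sides
tonnes = sheets * 5 / 1e6                 # ~5 g per sheet

print(f"{bytes_per_side:.0f} B/side, {sheets/1e6:.2f}M sheets, {tonnes:.0f} t")
# 105260 B/side, 23.75M sheets, 119 t
```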

~~~
staunch
This guy claims 200 kB per A4 page <http://ronja.twibright.com/optar/>

~~~
fhars
Which gives you 125 tons for 5TB on 80g/m² paper single-sided - about the same
cost as the 119 tons double-sided in the GP, but without paper jams in the
duplexer unit.

~~~
barrkel
The estimated error rate is however about 1 in 100,000 pages, which isn't
quite good enough. Probably a slightly better code can be used with a small
amount of inflation; even something like RAID-5 except over pages (i.e. parity
pages) would do.

------
ghshephard
Use archival CD-R Media - Good for 300 Years

Start by looking at these guys:
<http://www.falconrak.com/pro_archival_cd-r_gold_ep.html>

~~~
ascuttlefish
Any idea where they get the number 300 years? In the archival world, we're
leery of such blatant marketing claims, especially since CD technology is only
a few decades old.

~~~
ghshephard
Not as much a marketing claim as it is an MTBF analysis procedure. They run
their media through advanced heat/cold/light cycles to approximate how long
the media will last. There are a number of vendors/technology providers out
there that are working on this technology - see <http://www.millenniata.com/>.
Electronics vendors do the same thing when creating 20 year+ lifetime
components - just heat it up to 85 degrees, then drop it down to -45. Repeat
over and over to accelerate the aging process.

I've heard that the LDS church / Vatican have both been interested in the
archival media, and they have a pretty good long perspective, so might be
worth checking with technologists in that realm.

~~~
SageRaven
According to Wikipedia, the LDS Granite Mountain Vault
(<http://en.wikipedia.org/wiki/Granite_Mountain_(Utah)>) adds 40,000 rolls of
microfiche per year. Not sure how current that info is, though; the LDS church
is pretty secretive with a lot of its methodology.

------
nivertech
With the rate at which both our culture and technology change, communicating
with people 50 years into the future is like communicating with aliens. So you
may apply the same principles:

[http://en.wikipedia.org/wiki/Communication_with_Extraterrest...](http://en.wikipedia.org/wiki/Communication_with_Extraterrestrial_Intelligence)

Another proven way of communicating your knowledge through thousands of years
is to start your own ethno-religious group/nation, like for example Jews.

If you want to combine both approaches - try Scientology ;)

~~~
parallax7d
My dad doesn't think we are aliens, and he used to listen to music encoded on
disks as a kid...

Humanity is changing much slower than you may think. The big leap was
electricity, and that is well behind us. I doubt the next big leap (nanotech)
will become widespread for a hundred or more years.

~~~
nivertech
1\. The technological gap between your father and your grandfather was much
smaller than the gap between you and your father. The gap between you and your
son in 50 years will be much bigger. iPhone 4 and Android will be antiquated
technology for your son in 50 years. I remember an article about a kid who
found an old walkman in his father's closet. It took him some time to learn
how to use it. And he wrote an article comparing the walkman with an iPod ;)

2\. Nanotech is much closer than you think! Toshiba is already working on a
16nm semiconductor process, where transistors approach the scale of atoms.
Intel Nehalem is already a 32nm process, so 16nm is two tick-tock cycles away.

32 nm — 2010

22 nm — approx. 2011

16 nm — approx. 2013

11 nm — approx. 2015

You will need nanorobots to build the chips of the NEAR future. My startup is
in the nanotech area, so I know what I'm talking about.

~~~
parallax7d
Yeah I know nanotech has current limited applications. That's why I said
"widespread". As in electricity once had limited applications, now it's
widespread.

------
AlexMuir
50 years isn't all that long - there are plenty of tapes and records that are
still perfectly usable from then. As long as you included a decent amount of
redundancy you'd be alright with a few hard drives surely? There's always the
issue of software being able to read the data, but we have no problems opening
images and documents from 27 years ago now. In 50 years time there'll probably
be a niche industry producing software that converts old formats - just as
there is now converting VHS/Cinefilm.

------
mevodig
The major film studios, faced with a related problem a long time ago, opted
for a method called YCM separation, where they separate the image into yellow,
cyan and magenta and record it onto very stable black and white polyester film
stock. Properly stored, this supposedly has a lifetime of 500 years or more.

A modern laser film recorder is capable of a resolution of 4096x3112 and 10
bits per pixel, so that's about 16MB of data per 35mm frame with black and
white film.
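The per-frame figure checks out; extending it to the full 5TB (my own rough extrapolation, not from the comment above) gives the frame count too:

```python
# 4096 x 3112 pixels at 10 bits per pixel on black-and-white film.
bytes_per_frame = 4096 * 3112 * 10 // 8   # 15,933,440 bytes, ~16 MB
frames_for_5tb = 5e12 / bytes_per_frame   # ~314,000 frames of 35mm stock

print(f"{bytes_per_frame/1e6:.1f} MB/frame, {frames_for_5tb:,.0f} frames")
```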

------
yannis
Print them out on acid-free paper (I have books that are over 300 years old),
so this will definitely work.

After 50 years you can OCR the data etc. (or ask your personal robot to do it
for you) and print it using a variant of TeX/LaTeX. This has already survived
for 34 years, so another fifty years is almost guaranteed ;) Knuth predicted
some years back that TeX will last for about 100 years.

------
fxj
use the 1000 year dvd: <http://www.millenniata.com/>

or: <http://de.wikipedia.org/wiki/Langzeitarchivierung>

------
okmjuhb
Assume we encode the data on acid free paper with color-retaining ink using
colored squares of size 1/64" x 1/64", using one of 64 colors in each square.
There are then 4,096 of the squares in 1 square inch, so (assuming we print on
8"x10" regions) we can fit 327680 squares on a side of a sheet of paper, so
that there are 655360 squares on a sheet of paper (if we use both sides). Each
square encodes 6 bits of information, so we have 3932160 bits per piece of
paper, or 491520 bytes, which is 480 KB.

At this rate, encoding a gigabyte requires 2,185 pages. As an aside, this is
only 5 pages fewer than are contained in the "Art of Computer Programming" box
set.

We can comfortably fit a gigabyte, then, on printed paper, in a 10"x12"x5"
box. A terabyte will then fit comfortably in a 10'x10'x5' space. Throw a few
of these together to get 5 terabytes. Let's add, say, 1 TB more of error
correcting codes. In the unused margins of each page add in some information
about alignment, a printing of all the colors used (to try to protect against
inks changing color over time) and the page number. All together, this is
certainly _big_ , but could probably fit in, say, a tractor-trailer. Throw in
some books describing the data format and the meaning of the data, and you're
done.
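The colored-square arithmetic above reproduces cleanly (note the per-gigabyte page count assumes binary gigabytes, 2^30 bytes):

```python
# 1/64" squares, 64 colors (6 bits) each, in an 8"x10" printed region
# on both sides of each page.

squares_per_in2 = 64 * 64                   # 4,096 per square inch
squares_per_side = squares_per_in2 * 8 * 10
bits_per_page = squares_per_side * 2 * 6    # both sides, 6 bits per square
kb_per_page = bits_per_page / 8 / 1024

pages_per_gib = (2**30 * 8) / bits_per_page
print(f"{kb_per_page:.0f} KB/page, {pages_per_gib:.0f} pages/GiB")
# 480 KB/page, 2185 pages/GiB
```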

------
crawlerMonkey
Store the drive along with a computer to read it? :)

------
eklitzke
It seems like optical discs (CD/DVD/BluRay) are the right idea, they're just
not made from stable enough materials. As far as I know, you could apply the
same technology used to make optical discs to a more stable material like
gold. Assuming that you could get the same storage density as a DVD (which
seems reasonable, given that we have the technology/know-how to make 25GB
single-layer BluRay discs), you'd need something like 500-600 disks (if you
etched both sides). I'm a bit suspicious of my math here, but if my back of
the envelope calculation is right a single disc made of gold would be
something like 0.8 kg.

The nice thing about this solution is that there's a lot of existing knowledge
about how to build and design optical media, how to build in parity checking,
detect jitter when reading, etc. This could be relatively cheap to do in bulk
also, if you found the right material (I just mentioned gold because I know it
to be chemically stable, but there are probably better materials).
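The "500-600 disks" figure is easy to check (a sketch, assuming the standard 4.7 GB single-layer DVD capacity per side):

```python
import math

# 5 TB at single-layer DVD density, etched on both sides of each disc.
sides = math.ceil(5e12 / 4.7e9)   # 1064 single-layer sides
discs = math.ceil(sides / 2)      # 532 double-sided discs
print(discs)  # 532
```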

------
buro9
Is putting it into orbit with solar power to keep it alive an option?

~~~
ehsanul
If it's electronic media, I would guess that solar wind/flares will kill it
pretty fast without earth's magnetic field for protection.

------
samratjp
Probably would have to build a desktop with 5x1TB HDDs and make sure the case
is super strong and waterproof. Stick the machine in a fireproof and
waterproof safe and, of course, be sure to have formatted the partitions
differently and have RAID.

Alternatively, burn a 100 or so Blu-rays and get two Blu-ray readers: one on a
mobile device, the other an external reader that you will attach to the
aforementioned desktop.

Or who knows, maybe holographic storage
(<http://en.wikipedia.org/wiki/Holographic_data_storage>) will come around
finally and store the 5TB in a toothpick sized gizmo (which might probably run
512 cores of Googple's Chip (in my alternate reality, Google and Apple merge
and buy Intel)...

~~~
barrkel
For the HDD route, I'd probably prefer lower density storage, heaps of
redundancy, and a good spread of HDD manufacturers.

A quick Google of HDD MTBF suggests that 1 million hours (over a century) is
wildly optimistic, and typical failure rate is 2-4% per year, possibly as high
as 13%. If e is the failure rate (as a fraction of 1), and assuming a constant
and independent failure rate over time, then the survival chances for any one
drive are:

    
    
    (1 - e)^50

So the chance that any one will fail is:

    1 - (1 - e)^50

With n mirrors (assuming a reliable checksum to verify data in the event of
only a single copy of a mirror surviving), the chances of all failing, f, are:

    f = (1 - (1 - e)^50)^n
    log(f) = n log(1 - (1 - e)^50)
    n = log(f) / log(1 - (1 - e)^50)
    

So, for a reliability of 99.999%, and hoping you can keep the individual
yearly failure rate at 3% (so f=0.00001 and e=0.03), n would need to be at
least 47.
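That last step can be checked numerically; this is a direct transcription of
the formulas above:

```python
import math

e = 0.03       # assumed yearly failure rate per drive
f = 0.00001    # target probability that *all* mirrors fail
years = 50

survive = (1 - e) ** years    # P(one drive survives 50 years)
fail_one = 1 - survive        # P(one drive fails at some point)
n = math.ceil(math.log(f) / math.log(fail_one))

print(round(survive, 3))   # ~0.218: each drive has only a 1-in-5 chance
print(n)                   # 47 mirrors needed
```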

~~~
ahlatimer
I think you're making the improper assumption that the drives would need to be
left on during this entire time. I don't believe they would.

I don't know if the bits eventually lose their magnetism over time, so if they
do, you may need to spin up the drives every so often and copy to and from
drives to make sure the data is still "fresh", but I seriously doubt they'd
need to be left on and spinning for the entire 50 year span.

~~~
barrkel
I'm not assuming that the disks are spinning all the time; I don't know what
the failure rates of drives left unpowered are, so I took the spinning rate
instead. A Google search suggests that failure rates for unpowered hard drives
are high - sticking heads etc. - and as drives are not designed for this,
probably higher than for powered drives.

~~~
lsc
this is also what I've read. Also, anecdotally, I've had a disproportionate
number of drives fail on power-up.

------
todayiamme
There are a lot of amazing solutions on this page - things that I hadn't even
thought about. But why do we have to assume that a data reader cannot be kept
in the capsule itself? Why should we create something of arbitrary complexity
that will cause people to just tear their hair out in frustration?

Since the basics of logic will not change, and since theoretically any
computer can simulate another, why shouldn't we just keep a hackable computer
with detailed visual instructions and specifications? Further, to enable
someone to read the specifications we could have a "learning board" with each
symbol and the component next to it. Also, we could even have a haptic output
with which people can interact with the computer.

Let's assume that we have a nuclear battery made out of Technetium. This feeds
into a bank of capacitors and high-performance rechargeable cells. Slowly over
time the batteries are kept topped off and "exercised" by the computer.
Further, for redundancy, 4 or more computing units could be placed in
parallel, waking up sequentially and calling the others to check how the
entire unit is working. If we keep something like this in a hermetically
sealed environment, use the radiation source to manage the temperature, and
use passive cooling technologies for letting out heat, it should be able to
sit still until someone finds it.

Now the data itself would be stored on a series of solid state devices [edit:
a specially designed optical storage medium would be far better, but this is
50 years not 1050.] attached to a display. Why shouldn't this suffice?

Presuming that civilization has not collapsed anyone should be able to read
it.

By the way, the Phoenix lander has a DVD that tries to do just this (see:
[http://en.wikipedia.org/wiki/Phoenix_(spacecraft)#Phoenix_DV...](http://en.wikipedia.org/wiki/Phoenix_\(spacecraft\)#Phoenix_DVD)).
It even has this awesome intro by Carl Sagan. (hear:
[http://www.planetary.org/special/vomgreetings/greet2/SAGAN.h...](http://www.planetary.org/special/vomgreetings/greet2/SAGAN.html))

[edit: Dyslexic errors.]

------
woodrobin
Endow an educational institution with funding to teach the data to students,
and a trust fund to pay a stipend to any student who can prove they have
taught all of the data to all of their children, and do the same for the
children's children. The institution will determine invariant replication
through rigorous testing of a recitation of the data set.

The human brain has superior data storage capacity and resilience to any of
the mentioned media, and a longer functional lifespan. It can also adapt on
the fly to changes in technology and language.

This also makes the data proof against illiteracy and technological collapse,
assuming a cultural impetus to memorize the data and teach it to others can be
maintained.

------
LaGrange
Take a person. Make the person maintain 1MB of data -- copy it, memorize it,
something like that. Possibly understand it. Repeat for 5 million people.

If 1MB is too much, try 1KB, and repeat for 5 billion people. There should
still be some people left for systems maintenance.

------
framp
0\. Write the data on a hard disk (named 0) and calculate multiple hashes for
the various fragments of data. The hard disk must also contain a minimal OS
which is only able to dd partitions, and it must be connected to a computer

1\. Create some type of mechanical system which is able to periodically do
these tasks:

    - create a hard disk (named n)
    - plug it into the computer above
    - transfer the data from hard disk n-1 to hard disk n
    - check the hashes
    - unplug hard disk n-1
    - reboot the system using the new hard disk

To minimize the failure rate of this environment you should create more
computers which check that everything is working
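The transfer-and-check steps could be sketched in Python along these lines
(the paths and the choice of SHA-256 are illustrative assumptions; a real rig
would drive dd and the disk hardware):

```python
import hashlib
import shutil

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so a huge disk image need not fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def copy_and_verify(src, dst):
    """Copy src to dst, then confirm the copy's hash matches the original."""
    expected = sha256_of(src)
    shutil.copyfile(src, dst)
    if sha256_of(dst) != expected:
        raise IOError("hash mismatch: copy of %s is corrupt" % src)
    return expected

# e.g. copy_and_verify("/mnt/disk0/archive.img", "/mnt/disk1/archive.img")
```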

------
ianglang
My idea.. include with the data storage devices (5tb data on them), the
complete but unassembled parts of the data reader (maybe a full computer ready
to go) plus instructions on how to assemble a compatible power supply for the
time it's being recovered in..

vacuum seal everything, with a packet of silica.. no air at all, moisture
removed.. part by part.

1) then vacuum seal the container or 2) pack it with closed cell insulation

no light, no air, no corrosion or UV damage. advantage to #2 is that it would
be ruggedized for hits & transport.

------
Biotele
Use a high energy laser to etch thick platinum CDs. The pits produced by the
laser should be deep. Coat the etched surface with a thick layer of diamond
using vapor deposition. Enclose the CD in a vacuum case made with extremely
durable material (granite maybe). Entomb the CD case in a 10 ton cement block.
Stack the Cement blocks (if more than 1 CD) in a shape of a pyramid on a dry
desert plateau. Irradiate the area around the pyramid. That should last more
than 50 years.

------
Rhapso
Write a book about how you collected the data. Then just collect it again when
you need it in 50 years. On a long time scale, processing power is cheaper
than storage.

------
aufreak3
Very interesting topic.

For preservation, encode the data in the DNA of a bacterium, replicate it
massively and put them in some kind of suspended animation (hand waving
wildly).

Getting the meaning of the data is another matter. I wonder whether it will be
possible to create species of bacteria that can decode the above DNA and
present it directly accessible to human senses - ex: bacteria that change
color, form shapes, etc.

~~~
arethuza
I don't think genomes are anything like as big as the requirement, e.g. this
link gives 0.35GB as the size of the human genome:

[http://www.utheguru.com/fun-science-how-many-megabytes-in-
th...](http://www.utheguru.com/fun-science-how-many-megabytes-in-the-human-
body)

~~~
sorbus
Nothing stopping you from making a much larger one! Or a lot of different
ones, if it's automated.

------
dashnc
Read the data and convert it to binary numbers. Use teddy bears for encoding
the data. Each teddy bear has a different color which represents a sequence of
numbers, e.g. yellow is 1010101011101, blue is 10101011010101010101010, and so
on. We have 16777216 color possibilities with the normal RGB or CMYK color
mode. I believe that is more than enough to encode the data (although I didn't
check it).

Easy isn't it? ;)

------
staunch
Two sets of hard drives, five sets of CDs, and five sets of DVDs. I'd store it
in a fireproof locked box with silica gel. Would probably work.

~~~
bitboxer
No, it won't. Some CDs from the early '90s are no longer playable because the
foil layer that stores the data has degraded with age. DVDs and hard drives
have similar problems and won't last longer than 5-15 years unless you are
lucky.

~~~
Luyt
Would it help to seal the CDs/DVDs/CDROMs in a vacuum bag? And/or in cryogenic
storage?

~~~
kahirsch
A nitrogen-filled container would be more practical than a vacuum. Lowering
the temperature would definitely help, although too cold could be bad.

------
sliverstorm
Well, you know they say a picture is worth a thousand words. Pictures are
quite good at lasting for eons when put upon a proper substrate.

------
jawee
I have received Commodore 64 floppies from a lot of different people, some of
which have dates written on them from the early 80s or late 70s (for the disk,
not the data, of course). And in every box I've gotten there have only been a
couple that haven't worked. Maybe they all just happen to have been stored in
optimum ways.. but I fail to see what the big deal is.

------
ascuttlefish
Here's an associated question that archives everywhere are beginning to
struggle with (though they should have begun struggling long ago). Many
municipal records are born digital these days, and records retention
schedules tell us to keep a wide variety of materials permanently.

How would you store an ever-increasing amount of digital data _indefinitely_?

~~~
sorbus
Given any method, memory diamond[1]. It's incredibly dense (25 grams of the
stuff stores a bit over half a Yottabyte), and quite durable. If restricted to
existing methods ... a metric fuckton of hard drives in raid arrays,
outsourcing backups to cloud storage providers where possible, and investing
as much money as possible in getting a way to use memory diamond worked out.

[1] Charles Stross describes it: "Memory diamond is quite simple: at any given
position in the rigid carbon lattice, a carbon-12 followed by a carbon-13
means zero, and a carbon-13 followed by a carbon-12 means one. To rewrite a
zero to a one, you swap the positions of the two atoms, and vice versa." See
[http://www.antipope.org/charlie/blog-
static/2007/05/shaping_...](http://www.antipope.org/charlie/blog-
static/2007/05/shaping_the_future.html)

------
js4all
It is not only the physical storage, but also the software that can read the
data which matters. So choose your formats wisely.

------
resdirector
Amazon S3 would surely be the most economical, right? No upfront costs, and
running costs steadily decreasing?

~~~
lsc
I wouldn't count on the 'steadily decreasing' part; S3 hasn't come anywhere
near to following the downward price trend in hard drive cost per
gigabyte/power per gigabyte. But even at today's prices, keeping 50 gigs
around for 50 years is only $3900 (if I did my math right), which sounds
pretty reasonable compared to what others are suggesting. Of course, while I'd
rather depend on S3 being around in 50 years than a RAID in my closet, I still
don't know that I'd give it great odds.

I don't think 'use Rackspace /and/ use Amazon' is a good strategy either, just
'cause if market conditions change enough that one goes under, it's likely the
other will too... and it's also likely that they could be bought by the same
entity. You'd want to put one copy on S3, and then use some completely
different storage method, like microfiche or archival CD-ROMs or something in
a safety deposit box.

------
SemanticFog
1) Buy 5-10 different brands of hard drive and make a copy of the data on
each.

2) Invest $100K in very long-term inflation-protected investments.

3) In 50 years, use the proceeds (around $1M) to offer a prize to anyone who
can decode the disks to a current format.

------
cdb
What if there is some kind of solar event that renders the population unable
to generate electricity. Does this leave paper/microfiche/stone as the only
options? Surely this kind of possibility must be accounted for when
determining the viability of digital options.

------
ricardobeat
Sandisk has already solved this:
[http://www.electronista.com/articles/10/06/23/sandisk.outs.s...](http://www.electronista.com/articles/10/06/23/sandisk.outs.sd.worm.cards.for.forensics/)

In a few years we'll have those with 64gb/128gb capacity.

------
oomkiller
Holographic storage. Seriously. The current technology has a 50 year archive
life.

<http://www.inphase-tech.com/products/default.asp?tnn=3>

------
safetytrick
Here's a good question: given the probability of a bit on an SSD being flipped
by cosmic radiation, how much redundancy would allow recreating the data using
parity bits or something along those lines?
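As a toy answer for the simplest such scheme: with one XOR parity block per
stripe (RAID-4/5 style), any single lost block can be rebuilt. A sketch,
assuming block-level loss rather than scattered single-bit flips:

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def make_parity(data_blocks):
    """One parity block is the XOR of all data blocks."""
    return xor_blocks(data_blocks)

def recover(surviving_blocks, parity):
    """Rebuild the single missing data block from the rest plus parity."""
    return xor_blocks(surviving_blocks + [parity])

data = [b"ABCD", b"1234", b"wxyz"]
parity = make_parity(data)
lost = data.pop(1)              # block 1 is lost to radiation
rebuilt = recover(data, parity)
print(rebuilt)                  # b'1234'
```

Note this alone doesn't answer the detection half of the question: you also
need per-block checksums (or a proper erasure code such as Reed-Solomon) to
know *which* block went bad before parity can repair it.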

------
sigstoat
how long does an integrated circuit last in storage?

design a few of them with the data just burned in. no capacitors to fail, you
could get your redundancy by just making a couple dozen of each. design them
to wait a few seconds after power on, and then start spitting out the data and
a clock. label the power, ground, serial out and data pins.

given those pins, there isn't anything else you could do with it besides hook
it up and see what it has to say. and all you need to read it is a stable
power supply, and sufficiently fast analog to digital converters.

------
arethuza
I think you have to state some assumptions about technology levels at the time
the time capsule is opened and whether you want the data to survive much
beyond the 50 years if it is not opened.

------
swschilke
I would go with b/w microfilm, but anyhow, see here:
<http://www.ifs.tuwien.ac.at/dp/timecapsule/home.html>

------
sandGorgon
Doesn't every solution involving magnetic media require a power source (since
HDDs and USB drives need power at a recommended electrical rating)?

Optical media is the only way to guarantee that whichever future "creatures"
encounter the data can actually figure out how to access it.

If you are a blind alien (so that you do not understand the _concept of
light_) you can potentially still have measuring equipment that can sense the
pits in the media and make sense out of the binary data.

~~~
corin_
We're talking 50 years not 50 centuries

------
wire
Just write your own algorithm and encode the data that way. Rent some
dedicated servers with enough HDDs and RAID, store the data there, and grant
various persons access. :)

------
straightsilver
1\. Encode the data into a stream of bits

2\. Send the data using a laser and bounce it off a planet 25 light years away

3\. Receive the data in 50 years

------
thedjpetersen
Like this: <http://en.wikipedia.org/wiki/Wikipedia:Size_in_volumes>

------
varjag
Print them as high-density barcode on acid-free paper.

------
ecommando
You people have way too much free time. I'm going to get Congress to pass a
law stating all free time must be given to the gubbermint.

Cheers :)

------
Deuterium
Make a bunch of asian kids memorize it. You'll need some redundancy, but not
much. Most of them will live to be 60.

(note: self.race == asian)

------
Jeema3000
While these are some rather colorful and interesting ideas, might I suggest
ROM chips?

 _goes back to playing 30+ year old Atari games_

------
fxj
use microfilm, write as text.

<http://en.wikipedia.org/wiki/Microform>

------
asinger
If it's media, store it on 80+ iPads. Bundles the reader with the data. Only
requires AC power in 50 yrs.

~~~
safetytrick
Wouldn't that make the "kindle/nook/something else" a better platform for that
kind of solution?

~~~
framp
this is the only way to make an ipad useful :p

------
iterationx
<http://rosettaproject.org/about/>

~~~
zach
Or just go to the supplier:

<http://hstrial-norsamtechnol.homestead.com/Rosetta.html>

The New York Times used them to make a nickel disk of their archives for a
very long-lived time capsule.

------
wslh
Put it encoded and distributed over the Internet, so Google and other search
engines can store it.

------
thelonecabbage
metal punch cards, stored in low acid clay vessels in a very dry place. Seems
to have worked so far... <http://en.wikipedia.org/wiki/Copper_Scroll>

------
zandorg
Put a webserver on the Moon accessible via satellite and/or short wave radio.

------
reyman
Or store the data on ceramics or porcelain? It's easier than writing on rocks

------
geuis
Stone, or perhaps clay. Some of the oldest written records we still have are
from Sumer using cuneiform on clay tablets. Use modern micro-abrasion
techniques to encode data to stone, then put that stone away into an area
little accessed by humans on a daily or yearly basis.

Assuming there is no cost limit here, I would go one step further and say use
some form of metal. Say stainless steel, aluminum, gold, or titanium. Some
metal that is very stable over time and does not interact with the atmosphere
readily. Again, use micro-abrasion / carving technology to write data to the
materials.

The next question is what format to use for the data. That depends on what's
being stored. The biggest issue is that of "formats".

Let's look at things that last a long time. English (or any language) is
unlikely to change that much in 50, 100, or even 200 years. Words and their
meanings will change, but for the most part a native English speaker 200 years
from now could read what we write now. Whether or not they understand the
usage of the language is a different question. So if it's a textual document
you're saving, write it in plain English. No abbreviations, etc.

What about media? That gets complicated. If it's a static image, perhaps
keeping it simple is best. In plain English, write that the following section
of data is an image. Each group of three numbers ranges from 0 through 255. In
procession from left to right, the values represent Red, Green, and Blue. Each
group of 3 numbers is what we call a "pixel". The image is 300 pixels wide and
800 pixels high (arbitrary numbers for this argument).
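That plain-English pixel format can be sketched in a few lines (the dimensions
and pixel values here are made up for illustration):

```python
def encode_image(pixels, width):
    """Flatten (R, G, B) tuples into space-separated decimal triples,
    one row of the image per line of text."""
    lines = []
    for row_start in range(0, len(pixels), width):
        row = pixels[row_start:row_start + width]
        lines.append(" ".join("%d %d %d" % p for p in row))
    return "\n".join(lines)

def decode_image(text):
    """Reverse the encoding back into rows of (R, G, B) tuples."""
    rows = []
    for line in text.splitlines():
        vals = [int(v) for v in line.split()]
        rows.append([tuple(vals[i:i + 3]) for i in range(0, len(vals), 3)])
    return rows

# A 2x2 test image: red, green / blue, white
img = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]
text = encode_image(img, width=2)
print(text)    # two lines: "255 0 0 0 255 0" and "0 0 255 255 255 255"
```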

For moving images, further expand on the single image description and say
every 24 images should be spaced equally and viewed over the course of 1
second to achieve animation.

Sound is something I don't know _anything_ about from a data format
perspective, but I would again find the simplest mechanical way to produce a
sound and store it in that format, with ample verbiage describing how to
handle it.

_Edit: After reading other responses that came in while I was writing this, I
want to add some more thoughts._

Remember that our technology is ephemeral. We don't really use much tech from
50 years ago, hardly any from 100 years ago, and it just gets worse from
there.

Things like microfiche, SSDs, CD-ROMs, Blu-ray, etc. are all bad ideas for
long-term backup. Paper books are a better option than any of these for
near-term storage, for time periods up to 50 years.

If we want to actually store data in a meaningful way for long periods of
time, say over 100 years, we have to keep it simple. Your devices will
probably not last 100 years, even if kept in storage under the most secure
environment. But in 100 years people will still have eyes, ears, and hands.

We have to look back over history at the material types that survive long
periods. Stone, and metal to an extent, are very good long-term materials.
Cloth and paper are _not_. DNA is potentially a usable data store, but is
corruptible. Plus, you can't read DNA patterns with the naked eye.

~~~
tghw
> _We don't really use much tech from 50 years ago, hardly any from 100 years
> ago, and it just gets worse from there._

Printing presses, microfiche, film, radio, telephones, television, computers,
light bulbs, electrical sockets, toasters, cars, bicycles, electric stoves,
speakers, microphones, projectors, etc.

Technology is not nearly as fickle as you'd think. Remember, 50 years ago is
only 1960. VHS tape technology is almost 40 years old. Sure, things are moving
faster now, but things that work tend to stick around.

------
ollieread
I'd store it on BetaMax tapes :) They do say history repeats itself

------
lordmetroid
Paper, it will outlast anything other than maybe stone.

------
ehosca
punchcards.

------
dnsworks
I had a good talk with my ex-wife's uncle about this a few years ago. He's a
records archival specialist in Australia, and they seem to be doing a lot of
work with microfiche. Apparently it lasts quite a long time and is highly
dense. They encode data in some sort of XML format, and the first slide on
each sheet has a reasonably explanatory description of how to decode the data.

~~~
gaius
B&W film is the best archival format I've come across; it'll last a century
stored in a shoebox in the attic, and you can look at it just by holding it up
to the light. That's important; anyone who picks it up, regardless of their
prior experience, will know this is a thing with information on it. Would you
even recognize an HD if you'd never seen one before?

------
hga
Lots of good (and bad :-) ideas; here's another:

Mylar "paper" tape, or use some other plastic that's known to have serious
archival qualities.

Bulky, but if stable (enough) in the presence of water it'll survive various
failure modes that would kill acid free paper. Of course you could etch Mylar
or some other stable plastic to gain greater data densities like with the
suggestions for paper. Just pick a plastic we _know_ is seriously stable from
actual experience, like we know with acid-free paper.

We also have such experience with emulsion based storage methods (microfilm,
fiche, etc.), but those are rather delicate for my taste.

