Ask HN: What's the most stable form of digital storage?
186 points by agomez314 on April 25, 2022 | 235 comments
I wrote a program which I'm proud of having done and would like to keep it for posterity. What's a good storage medium where I can keep and load again in the future? Requirements are: size < 1GB, must keep for at least 3 decades, must be easily transportable (for moves between houses and such) and can sit on a shelf. Bonus points for suggestions on an equally stable storage type that some computer will still be able to understand in the future.


If the question is literally about just one program source code, the answer is easy: print it out.

All my oldest preserved code (early 80s) is on paper, the things it occurred to me at the time to print out. No fancy archival paper either, just listings printed out on my dot matrix printer onto fanfold printer paper.

Anything from that era that I didn't print out is gone.

From the late 80s onward I still have all the files that I've cared to save. The general answer to that is that there is no persistent medium, you need to commit to keep migrating that data forward to whatever makes sense every so often.

I copied my late 80s 5.25" floppies to 1.44MB floppies in the early 90s. In the mid 90s I copied anything accumulated to CD-Rs. In the 2000s I started moving everything to DVD-Rs.

From the late 2000s until today I have everything (going back to those late 80s files) on a ZFS pool with 4-way mirroring.

Of course, aside from preserving the bits you also need to be able to read them in a future. Avoid all proprietary formats, those will be hopeless. Prefer text above all else, that will always be easily readable. For content where text is impossible, only use open formats which have as many independent open source implementations as possible to maximize your chances of finding or being able to port code that can still read the file 30-40 years from now. But mostly just stick with plain text.


But please, do not print on a laser printer. Use an inkjet printer or dot matrix printer. Laser prints have the bad tendency to "unstick" themselves from the paper, you end up losing everything.

"The best long term backup strategy is a string of robust middle term solutions." This was for me the most insightful comment I read (as far as I can remember) on Tim Bray's blog[0] many years ago.

[0]: https://www.tbray.org/ongoing/


Dot matrix prints also tend to fade over the years, especially on more acidic paper (such as recycled stock). Saying this as thousands of pages of my programming-related dot matrix prints slowly fade away :)


> Laser prints have the bad tendency to "unstick" themselves from the paper, you end up losing everything.

Have NEVER heard of this


It can happen due to various reasons, but the way laser printers work is they electrostatically charge a toner drum, the toner particles "jump" to the right spot, and are then deposited on the paper that has an opposite electric charge. Then the whole thing goes through fuser rollers that "bake" the toner into the paper. If the toner is low quality, the fusion step wasn't hot enough, or the paper and toner are incompatible, you might not get complete adherence of the toner to the paper. This can cause that flaking behavior in the future.

The trick is to do your research on the printer and toner to make sure they have a high quality finish.

For paper, a heavier than normal printer paper (like 40lb text paper instead of the normal 20 or 28 lb) that has some percentage of cotton in it can last for hundreds of years if you store it in an airtight container.


I have a TON of old HP LaserJet printouts from the 1980s and 1990s that have done this - if you have them stacked in a box, you find them stuck together, and pulling them apart takes the print off with one or the other sheet.

I haven't experimented with workarounds, but perhaps separating sheets with wax paper or parchment might help.


In the 1980s we used photocopies and sheet protectors to play a tabletop game (SFB), and over time toner would pry off the paper and stick to the plastic. Not sure if modern paper stock or printer fusers do better.


I would imagine that it would be related to regional environmental conditions such as humidity, temperature, air pressure, etc. though I'm no expert on the chemical properties of toner.


I also never had this happen, and conversely find that inkjet fades, and will be totally ruined if it ever becomes damp.


If you're printing with inkjet and you want to use color, make sure that all cartridges are pigment-based (which is generally true for the HP DesignJet series). Otherwise, since almost all HP black inks are pigment-based, stick to all-black printouts.


I have seen the issue that you are talking about but I am not sure what causes it. I have 25 year old documents printed with laser printer that have not released from the paper at all yet I have ten year old documents where it happens when I bend the paper. My pet theory is that modern paper is not as good because of the high recycled content but that would obviously take a significant effort to test. Another possibility is that modern printers are the problem but this would be even harder to test. I don't print anything to keep anymore so it doesn't really matter anyway.


High quality papers are coated with clay or other material. Perhaps some of these coatings are prone to releasing or otherwise incompatible with toner?

https://en.m.wikipedia.org/wiki/Coated_paper


> Laser prints have the bad tendency to "unstick" themselves from the paper, you end up losing everything.

That's bullshit.

It's the other way around. Inkjet fades out and gets washed off by a little bit of moisture, while dot-matrix prints "just" fade.

Don't skimp on toner and paper, don't get it to rot and it'll last centuries.


It might seem like bullshit for any short-term storage (sub 5 year), but I’ll provide an anecdatum of letters literally unsticking from laser printed paper and instead adhering to the plastic bin they were stored in for 7 years.


Wrong paper, and/or a bad fuser.

My stuff printed on HP LJ 4 in late 90s is still as good as new.


> But please, do not print on a laser printer.

My understanding is that until recently the UK was printing laws on vellum, for maximum archival durability... but they printed onto the vellum using a normal laser printer. So it must be pretty durable? A laser printer uses simple carbon, rather than complex inks.


I'd be delighted to learn that all our laws before 2000 had faded from the statute book.. but you're probably right, it seems durable enough.


> A laser printer uses simple carbon

No, it uses what's basically a fine plastic powder. By mass it's mostly carbon, yes.


I have some docs printed in late 80s / early 90s on HP laser. Still pretty much in perfect condition. No "unsticking".


Heh, if we are going that route, why not just laser-engrave it into animal skin?


Here's another benefit to printing: once a decade, when you migrate storage boxes or move houses, what are the odds you'll look at your printouts and reminisce? Probably pretty good, since it's easy to look at something.

Meanwhile, personally, I back up old data but hardly ever look at it, since it's mixed in with old programs that probably won't work and a thousand photos I don't want to see. So maybe laser-etched platinum will last longer, but the barrier to reading it will certainly be higher.


>If the question is literally about just one program source code, the answer is easy: print it out.

Doesn't seem like it's just one source file if the OP states: "Requirements are: size < 1GB"

Depending on font size (say 10pt) and average characters per line, that would mean printing several hundred thousand pages, which is not feasible for the average homeowner to round-trip back into usable digital files.

Instead of a cheap flatbed scanner, you now need a high-speed auto-fed document scanner and then run batch jobs to OCR several hundred thousand tif images back to digital source files.

One could reduce the paper count by compressing the source text to zip and then printing a form of binary-to-text (e.g. UUENCODE) but now the papers have random-looking gibberish instead of readable text.

Printing <10 MB source code is more realistic than <1 GB.

(But I'm guessing the author may actually have much less than 1,000,000,000 bytes of original source code if one leaves out 3rd-party dependencies.)
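Some quick envelope arithmetic backs this up (the characters-per-line and lines-per-page figures below are assumptions, not numbers from the thread):

```python
# Rough estimate of how many printed pages 1 GB of plain text needs,
# assuming one byte per character, 80 chars/line, 60 lines/page.
chars_per_page = 80 * 60  # 4800

pages_1gb = 1_000_000_000 / chars_per_page
pages_10mb = 10_000_000 / chars_per_page

print(f"1 GB : {pages_1gb:,.0f} pages")   # several hundred thousand pages
print(f"10 MB: {pages_10mb:,.0f} pages")  # only a couple of thousand
```

At roughly 200,000 sheets for 1 GB, the <10 MB figure really is the practical ceiling for paper.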


For a fully reproducible and backed up build, you will need to print out the 3rd (and 4th and so on!) party stuff as well :(

Good luck with anything that has a "node_modules" folder ;) Please do not deforest earth for that.


But do use some kind of archival-grade paper. The £2 per ream supermarket crap that curls up in hot weather is probably full of acid and will become brittle and fall apart eventually.

And I would check if there are issues with either cheap ink fading or cheap toner flaking off on the multi-decade timescale. Though you will probably appreciate a decent printer anyway to chug through 1GB of text!


> The £2 per ream supermarket crap that curls up in hot weather is probably full of acid

I wouldn't be so sure. I need paper that is not basic for a hobby of mine (cyanotype printing, a photographic process that is very sensitive to high pH) and that is actually quite difficult to find these days. Almost all available paper is acid-free, because it doesn't cost much to the manufacturer to add calcium carbonate buffer to the paper.

It doesn't mean that cheap paper is good for archival (if only because it likely lacks mechanical strength) but paper made in the last two decades or so is rather unlikely to become yellow and brittle in the future, and it should keep quite long if stored in correct conditions.


But, meanwhile we got USB. I'd argue that USB thumbdrives/HDDs will continue to work for another decade; and ext4 will probably survive that, too.

Everything I stored on diskettes, CDs, DVDs, and Blu-rays turned out to be only short-term backups, in my opinion; due to the rapidly ever-growing need for more space and Sony pushing their patented technologies onto every market. I had to buy a drive on eBay to restore backups years later, only to realize that the CDs were totally unreadable due to UV degradation.

These days my backup strategy is redundant USB hard drives, with the assumption that USB will continue to be supported longer than current SATA versions and disc-based media.

The only things that survived all this time were ZIP drives and DVD-RAMs. They are still awesome. But sadly nobody uses them anymore, so access to replacement media and drives is a little limited :(


There are 2 big problems with USB: data retention and connector format.


How long does data on an USB stick tend to last?


We don't know. They have not been around long enough.


Sounds good (better than what we'd noticed so far)


> These days my backup strategy is redundant USB hard drives

If you're talking spinning rust, beware: hard drives that haven't been spun up in a long while have a tendency to "stick". I'd suggest starting every hard drive up at least yearly and scrubbing the contents.


Cuneiform. We still have fired clay Babylonian tablets from the early Bronze Age.


So, "fired clay" then =)


Great points. I worked with some archivists on a project several years ago (When ODF was a big thing) and was surprised at the amount of controversy.

There’s a couple of schools of thought. In general archivists want to preserve the original document, but at that time they were already losing access to 1980s word processing formats.

Some folks advocate PDF/A output as a “standard” preservation technique. The people I was working with were making a point in time TIFF image of whatever was being preserved and storing it side by side at the time. (I think they transitioned to PDF/A when the spec was revised) PDF/A is the standard for US Courts, so renderers will be available for a hundred years or more.

It’s an interesting problem space because time is not kind to electronic documents. Even stuff like PowerPoint from circa 2000 doesn’t always render cleanly today. When “H.269” is released in 2050, will anyone ship H.264 codecs?


> I copied my late 80s 5.25" floppies to 1.44MB floppies in the early 90s. In the mid 90s I copied anything accumulated to CD-Rs. In the 2000s I started moving everything to DVD-Rs.

BluRay discs are expected to last 50 to 100 years at least. It's longer than magnetic tapes. Still not paper but, well, kinda inconvenient to "print" 1 GB of data on paper in a way that's easy to store / re-read.

I have 80s floppies (5"1/4) that still can be read fine but I'd say at least 1/3rd of them are now failing. Still: after about 35 years, I'd say it's not bad. I expect BluRay discs to completely outlive me.


It is interesting to think of ancient Egypt's use of papyrus for paper. Very little of it remains. But all of those carvings in rock, you can go to a museum and see stuff 2000 years old.


Use a ZFS snapshot on a rotation so you can restore accidentally deleted files (a hedge against user error, ransomware, etc.)
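A snapshot rotation is just a retention policy. A minimal sketch of the keep/prune decision (the function name and defaults are mine; the real work is done by `zfs snapshot` and `zfs destroy`):

```python
from datetime import date, timedelta

def snapshots_to_keep(snapshot_dates, keep_daily=7, keep_weekly=4, keep_monthly=12):
    """Grandfather-father-son style rotation: decide which dated
    snapshots to retain. This sketches only the policy; the actual
    calls (e.g. `zfs snapshot pool/data@2022-04-25`,
    `zfs destroy pool/data@old`) would live in a shell wrapper."""
    snapshot_dates = sorted(snapshot_dates, reverse=True)  # newest first
    keep = set(snapshot_dates[:keep_daily])                # recent dailies
    weeks_seen, months_seen = set(), set()
    for d in snapshot_dates:
        iso_year, iso_week, _ = d.isocalendar()
        if (iso_year, iso_week) not in weeks_seen and len(weeks_seen) < keep_weekly:
            weeks_seen.add((iso_year, iso_week))
            keep.add(d)  # newest snapshot of each recent week
        if (d.year, d.month) not in months_seen and len(months_seen) < keep_monthly:
            months_seen.add((d.year, d.month))
            keep.add(d)  # newest snapshot of each recent month
    return keep
```

Run it against the list of existing snapshot dates and destroy whatever is not in the returned set.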


Or microfiche, tiny paper.


The only hard part is probably finding someone who can "print" onto microfilm these days. Might be worth talking to your local librarian, they probably know who still does it and how much it costs.

Film is incredibly durable, will easily last 100 years.


This. Inkjet printer and acid-free cotton paper.


"Hashes + Copies + Distribution"

I used to work in the data protection industry, doing backup software integration. Customers would ask me stupid questions like "what digital tape will last 99 years?"

They have a valid business need, and the question isn't even entirely stupid, but it's Wrong with a capital W.

The entire point of digital information vs analog is the ability to create lossless copies ad infinitum. This frees up the need to reduce noise, increase fidelity, and rely on "expensive media" such as archival-grade paper, positive transparency slides, or whatever.

You can keep digital data forever using media that last just a few years. All you have to do is embrace its nature, and utilise this benefit.

1. Take a cryptographic hash of the content. This is essential to verify good copies vs corrupt copies later, especially for low bit-error-rates that might accumulate over time. Merkle trees are ideal, as used in BitTorrent. In fact, that is the best approach: create torrent files of your data and keep them as a side-car.

2. Every few years, copy the data to new, fresh media. Verify using the checksums created above. Because of the exponentially increasing storage density of digital media, all of your "old stuff" combined will sit in a corner of your new copy, leaving plenty of space for the "new stuff". This is actually better than accumulating tons of low-density storage such as ancient tape formats. This also ensures that you're keeping your data on media that can be read on "current-gen" gear.

3. Distribute at least three copies to at least three physical locations. This is what S3 and similar blob stores do. Two copies/locations might sound enough, but temporary failures are expected over a long enough time period, leaving you in the expected scenario of "no redundancy".

... or just pay Amazon to do it and dump everything into an S3 bucket?


S3 eliminates the risk of a disk becoming unreadable, or losing data in a fire. And it's overwhelmingly likely S3 will still exist in an easily readable form in 30 years time.

But it doesn't provide protection against you forgetting to pay AWS, you losing your credentials, your account getting hacked, or your account getting locked by some overzealous automation.


> And it's overwhelmingly likely S3 will still exist in an easily readable form in 30 years time.

There is no indication that this statement holds true. Not even remotely.

Businesses fold all the time. How many services still exist today that existed 30 years ago? Not in some archive, but still operational?

In addition to that problem, tech half-life continues to decrease. 30 years in the future is likely more comparable to 60 years in the past. Hello punch-cards.


> There is no indication that this statement holds true. Not even remotely.

Well, over 30 years I'd bet on S3 over blu-ray or magnetic tape at the very least :)

For one thing, S3 itself is already 16 years old - and Ceph, B2, GCS and Azure all offer extremely similar products, indicating there's solid demand for this product.

Second, it's not clear to me that 'tech half-life continues to decrease' - granted, there's huge churn in javascript web frameworks, but for PCs and laptops? Very little has changed in the last 5 years.

And thirdly, some technologies stay around for absolutely ages. Right now, you can buy a brand new data projector with a 15-pin analog VGA port - and a motherboard with a VGA output. You can get a motherboard for a 64-core ThreadRipper processor... which has a PS/2 connector.


To add to this, the core of the S3 argument (specifically with AWS and Azure) rests on the premise that AWS/Azure (AA from now on) are too big to fail, but that has no bearing on whether the specific services will continue as-is. They may well be fine for storage, but like any service, you need to keep an eye on them, no matter what they tell you now. It's impossible to predict what changes they may introduce even next year (or next month) that make AA S3 storage not feasible/usable.

Furthermore, keep in mind with this, once your data is in AA, it's no longer your data, it's AA's. Sure, you can pay to retrieve it, but that's the catch -- you gotta pay the toll. Unexpected bills or hidden fees or changes to fees may make the retrieval process simply not fiscally possible after a time if your account is delinquent. (I've seen this with _many_ clients; they tried to squeeze out every coin of the budget and didn't have flexibility with AA, and they ended up with a delinquent account and lost access: https://aws.amazon.com/premiumsupport/knowledge-center/react...)

Now, of course this is considered "your responsibility", and it can be true of anything, but if the data size is that low, then I think just manually managing it is probably a safer and perhaps cheaper bet. S3 as a service is mostly fine, but a lot of people very much underestimate the expense of it and never bother to test recovery scenarios, and it ends up as a real surprise. (At least a few customers let their AWS bill go delinquent until the next fiscal year allowed them to pay, only to find the data deleted by the above-mentioned policy.)

Basically, the idea of "set and forget" backups is a pipe dream; if it's important, you need to maintain it, and it basically can be like a second job.


>30 years in the future is likely more comparable to 60 years in the past. Hello punch-cards.

this part is a pro for S3 not a con. In this analogy OP is trying to store the information held by the punch-card, not the punch-card itself. So giving the information over to a business means they will preserve your binary data by moving it from punch-cards to HDD to SSD, etc - they will handle the hardware changes and redundancy.

Your first point stands strong though


For S3 specifically you want to use Glacier. It's made for long term storage and is very, very cheap to store in.

Be warned though that restoration takes special procedures, time, and can be expensive. So Glacier is most definitely a place for storing stuff you hope you'll never need, not just a cheap file repository.

The Glacier fees for retrieving data in minutes are incredibly awful, so take that into account. Count on waiting 12 hours to get your stuff for cheap.


Glacier Deep is the cheapest option. It does come with a catch that there's a minimum of 180 days commitment for their infrequent access tier. Last time I checked, the cost for US-East-1 is roughly like this:

At $0.00099/GB/month, it would cost ~$12/year to store 1TB. Retrieval cost is $0.0025/GB and bandwidth down is $0.09/GB (exorbitant! But you get 100GB/mo free)

So, retrieving 1TB (924GB chargeable) once will run ~$85. I've also excluded their HTTP request pricing, which shouldn't matter much unless you have millions of objects.

For the same amount of data, Backblaze costs ~$60/year to store but only $10 to retrieve (at $0.01/GB).

I suppose an important factor to consider in archival storage is the expected number of retrievals, and whether you can handle the cost.
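The arithmetic above is easy to capture in a small calculator. The prices are the ones quoted in this comment and will drift, so treat them as illustrative:

```python
def yearly_storage_cost(gb, price_per_gb_month):
    """Annual storage bill for a flat per-GB-month price."""
    return gb * price_per_gb_month * 12

def retrieval_cost(gb, retrieval_per_gb, egress_per_gb, free_egress_gb=0):
    """One-off cost to pull the data back out."""
    return (gb * retrieval_per_gb
            + max(gb - free_egress_gb, 0) * egress_per_gb)

# Glacier Deep Archive, US-East-1 figures quoted above
print(yearly_storage_cost(1024, 0.00099))                       # ≈ $12/yr
print(retrieval_cost(1024, 0.0025, 0.09, free_egress_gb=100))   # ≈ $85

# Backblaze B2 figures quoted above
print(yearly_storage_cost(1024, 0.005))   # ≈ $61/yr
print(retrieval_cost(1024, 0, 0.01))      # ≈ $10
```

The asymmetry is the takeaway: Glacier Deep is ~5x cheaper to hold but ~8x more expensive to retrieve, so the break-even depends on how often you expect to pull the archive back.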


Sounds like points 1 and 2 can be elegantly combined using "next-gen" filesystems like zfs or btrfs. The hashing and scrubbing (automatic repair) happens in the background and the swapping to new/fresh media is automatic through replacing failing hard drives. Plus, the two are open and widely adopted standards.

I always thought a, say, ZFS pool with 2-disk redundancy is not only redundant (RAID) but also serves as a backup (through snapshots). The 3-2-1 rule is good, but I feel like ZFS is powerful enough to change that. A pool with scrubbing, some hardware redundancy and snapshots could/should no longer require two backups, just a single, offsite one.


What if I don't want to backup stuff, but archive and then forget about it?

Edit: Oh, and I want it to keep existing after I'm no longer alive.


If it's code like the OP seems to indicate, publish it on github; many services draw copies of source code from Github, and they themselves once put all code into cold storage for posterity: https://archiveprogram.github.com/arctic-vault/

> Each was packaged as a single TAR file.

> For greater data density and integrity, most data was stored QR-encoded, and compressed.

> A human-readable index and guide found on every reel explains how to recover the data

> The 02/02/2020 snapshot, consisting of 21TB of data, was archived to 186 reels of film by our archive partners Piql and then transported to the Arctic Code Vault, where it resides today.


That's the "pay someone else to do it" option.

It's this way because "archive and then forget about it" isn't really a thing. It turns out an archive that is not maintained is no archive.


Build a pyramid and carve your data into walls deep inside the pyramid.


And then pay somebody to guard it. Aka the "pay someone else to do it" option that your sibling comment talks about (and of which there are many different flavors, S3 being another one).


I'd prefer a mobile solution. Something I can just put into a box and forget about. Someone else can then stumble upon it and be able to read it.

Basically like a box of photographs. But I know that's a foolish dream to have.


Any medium that is stable physically for at least a few decades and can be read optically. Acid free paper with the data machine encoded, laser etched metal, etc. Anything traditional would need to be online (HDD), easily reread and verified over time (tape), or is not recommended (SSD).

It costs the Internet Archive $2/GB to store content in perpetuity, maybe create an account, upload your code as an item, donate $5 to them, and call it a day. Digitally sign the uploaded objects so you can prove provenance in the future (if you so desire); you could also sign your git commits with GPG and bundle the git repo up as a zip for upload.

EDIT: @JZL003

The Internet Archive has their own storage system. I would assume it caps out because they're operating under Moore's Law assumption that cost of storage will continue to decrease into the future (and most of their other costs are fixed). Of course, don't abuse the privilege. There are real costs behind the upload requests, and donating is cheap and frictionless.

https://help.archive.org/help/archive-org-information/

> What are your fees?

> At this time we have no fees for uploading and preserving materials. We estimate that permanent storage costs us approximately $2.00US per gigabyte. While there are no fees we always appreciate donations to offset these costs.

> How long will you store files?

> As an archive our intention is to store and make materials in perpetuity.

https://archive.org/web/petabox.php


I never thought to donate to Internet Archive before. Thanks, done. I use the Wayback Machine too much to not pay for it!


Where did you get that number, out of curiosity? Google Cloud Storage is 2 cents per GB per month and Backblaze B2 is $0.005 per GB per month. I understand in perpetuity is more expensive, but why does it cap out, as opposed to being a yearly price per GB (maybe if you assume hard drive storage costs will decrease at a similar rate)?

Quick envelope math: if they were using Backblaze pricing, $5 would give a GB 83 years of storage. But it's unclear if Backblaze is actually regionally duplicated.


The idea is that cost of storing 1GB will reduce over time ending up being a convergent infinite sum.


what if it gets stuck at the low low price of 1cent per exabyte?


Even if the yearly cost is lower bounded, the net-present cost will be finite for perpetual storage so long as economic growth continues. Basically, an endowment: give Internet Archive X dollars as a one-time lump sum to invest and they can spend rX dollars per year on storage forever, where r ~ 3%.
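Numerically, the endowment reasoning is a convergent geometric series. Taking $0.06/GB/year (roughly Backblaze's rate) as the starting cost, with an assumed 10% yearly cost decline and 3% discount rate (both rates illustrative, not IA's actual model):

```python
def perpetual_storage_npv(cost_now, cost_decline=0.10, discount=0.03, years=500):
    """Net-present cost of storing 1 GB forever: the yearly cost falls
    by `cost_decline` each year, and future dollars are discounted at
    `discount`. 500 years is effectively infinity for these rates."""
    return sum(cost_now * ((1 - cost_decline) ** t) / ((1 + discount) ** t)
               for t in range(years))

# Closed form of the same series: cost_now / (1 - (1-d)/(1+r))
print(perpetual_storage_npv(0.06))  # ≈ $0.48 to fund that GB forever
```

Under these assumptions, a one-time donation of well under a dollar per gigabyte funds storage in perpetuity, which is consistent with IA quoting a flat $2/GB with margin for overhead.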



May I reference this link directly in the future as the canonical storage cost model the Internet Archive is using presently?

EDIT: Understood, thank you for the reply.


I'm not involved in this stuff and I don't know if the IA uses that particular model, although the underlying principle and math is pretty fundamental.


83 years != perpetuity, hosted != archived.


This is very relevant, and perhaps deserves a submission of its own:

http://news.bbc.co.uk/2/hi/technology/2534391.stm

"""But the snapshot of in the UK in the mid-1980s was stored on two virtually indestructible interactive video discs which could not be read by today's computers. """

I can't find the back story now, but if they hadn't been able to source a working LaserDisc player from a member of the public (which IIRC took quite a bit of effort), then accessing this data - digitized in the early 1980s - would have cost a fortune.

The inspiration for this project, the 900-year-old Domesday Book, is just as readable today as it was in 1980 (and in 1200 or so). The ability to read data with one's eyes should not be underestimated.


Remark on the side:

This entire page is about 122 kB, is clearly laid out and easy to read.

If I check a similar short-ish news item today (https://www.bbc.com/news/business-61185298) my browser (with ad blocker) needs to load 3.8 MB of data (31 times as much) and I can see less of the actual content.

Instead of Web3, can we maybe go back to Web1?


To add on to your observation, reading mode is even better to look at and loads just shy of 16KB.

As an aside, I still don't understand what Web3 aims to solve but I feel Web 2 is good enough if people don't go crazy with js, images, ads and other shenanigans.


There isn't. Sorry, but there just isn't a permanent format. The real problem isn't the storage media but that technological standards evolve. Tape media is excellent at surviving. I have a 9-track digital tape keepsake from when I used to work with it regularly some 20 years ago. I'm absolutely certain that the data on it is still good. I don't have the 300-pound "dishwasher" drive that can read it, the three-phase power to run it, nor a DEC VAX that understands EBCDIC encoding.

The only true solution is a living one, where you make sure you have the ability to get your data from an old format to a new one periodically. More importantly, you should look into the idea of 3-2-1 backups. Anything that you intend to keep indefinitely is subject to random events: fire, flood, tornado, theft, etc. Having multiple archives in separate systems is more important than trying to ensure a single copy will last a long time.

Storing less than a gigabyte is very cheap to do in multiple formats, such as USB flash drive, external hard drive, CD, Blu-ray disc, etc. You can hedge against data corruption with PAR2 files. Also, consider storing a copy in the cloud, e.g. Backblaze B2, AWS S3, etc. Again, I suggest creating PAR2 files and/or using an archive format that can resist damage.

Just create calendar events to periodically check the integrity of your archives. Having problems reading a CD? Use the hard drive backup to burn a new one. This is also a good time to consider whether one or more of your formats is no longer viable.

Finally, realize that a program runs within an environment, and those get replaced over time. You need to not only back up your program, but probably also store the operating system and tools around it.


An addition to your list:

Use mainstream technology media formats for physical storage. It's trivial to get a USB floppy drive for reading floppies from the early 80's, but getting hold of a new drive to read LS-120 disks from the late 90's/early 00's is pretty much impossible. BluRay is probably the best bet for physical media for the next 20 years. I've done some trials with SD cards but they seem less reliable than BluRay.


> You can hedge against data corruption with PAR2 files.

This is the key phrase from the entire thread. PAR2 lets the user create recovery files for when (not if) part of the original data becomes corrupted. The recovery files (by default 5% of the size of the original files) should be stored alongside the originals in each location.
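PAR2 proper uses Reed-Solomon erasure codes; as a toy illustration of the underlying idea, a single XOR parity block (same length as the data blocks) can rebuild any one lost block:

```python
from functools import reduce

def xor_parity(blocks):
    """One parity block that can rebuild any single missing block.
    All blocks must be the same length. This is the simplest cousin
    of what PAR2 does with Reed-Solomon codes."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def recover(surviving_blocks, parity):
    """XOR the parity with every surviving block to get the lost one."""
    return xor_parity(surviving_blocks + [parity])

data = [b"archival ", b"data in  ", b"3 places "]
parity = xor_parity(data)            # store this alongside the data
lost = data.pop(1)                   # pretend one copy is corrupted
assert recover(data, parity) == lost # the missing block comes back
```

Real PAR2 generalizes this to rebuilding several missing blocks, which is why a 5% overhead buys meaningful protection.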


DEC VAXes have never used EBCDIC. At least not natively. Probably they could convert from/into it, but the Unix/Linux program dd on my Raspberry Pi can do that, too.

The character encoding is not the problem; the hardware is.


9-track tape is common enough that you can pay to have it read at reasonable cost.


If you want to keep it secure for at least three decades you should follow the principle of Lockss https://www.lockss.org/ "Lots of copies keeps stuff safe".

You might like to read through the site, but if not, then I would suggest keeping it safe via storage in multiple formats and locations. If I really wanted to keep something safe and wanted to put effort into it, I would put it on a remote service and on external physical media stored somewhere else safe, and whenever I get a new computer it would get backed up there too. This of course puts extra managerial requirements on you (which for me would be difficult because of my ADHD), and you would need to maintain the remote service, or have a plan for moving the data if you get rid of it.

In my case I have multiple computers so I would also make sure important to preserve stuff was backed up to all of them.

All of which reminds me I should update a bunch of my stuff.


Is this something to store on their site? I see it is open source but didn't see any examples of a running site - has anyone come across an instance just to explore some of the feature set?


M-DISC: https://en.m.wikipedia.org/wiki/M-DISC

They’re special DVD and Blu-ray discs designed for long-term storage. DVD and Blu-ray are so widely used, it seems likely you’d be able to find some equipment in 30 years that could still read them.


Do you really think physical disc and media players for them will last long? I still cling to an old blu-ray player but every time I buy some discs it feels like I’m sifting through the ruins of a collapsed building (in some giant bin in the middle of a large retailer hallway). I also feel like I never see a single other person looking at or purchasing discs…


> Do you really think physical disc and media players for them will last long?

Yes.

There are too many use cases for physical, immutable, long-term offline storage for this niche to go unfilled, but the niche is too small (at present) to prompt the development of a replacement medium and format. So while I am sure the materials and read/write hardware will continue to evolve (better data longevity guarantees, read/write speed, physical durability, etc.), the implementations will remain compatible, or at least the reading ones will.


The research suggests that M-DISC Blu-Rays should be fairly durable if not handled often.

I think the disc players are the weak link. I can definitely imagine them going away nigh-entirely in a decade or three.


I didn't manage to find anywhere to buy these in my country. They could be tricky to get.


Microsoft glass storage is probably close to the best but not commercially available: https://www.microsoft.com/en-us/research/project/project-sil...

35 mm film is also interesting but probably costs a fortune: https://www.piql.com/services/long-term-data-storage/


This is definitely the correct answer - the glass project is really good, and having spoken to people working on it they are absolutely the closest to nailing this. It is close too, at least to a wider rollout.


Ah damn, that glass thing actually seems really cool. I'll have to look and see if there's much info out there about it right now. I've heard a little bit about very fast lasers (the link mentions femtosecond lasers) and was curious what non-academic uses they see.


Stone tablets are the gold standard in the most stable storage dept. They can support arbitrary information encodings including text and binary and satisfy the < 1 GB, > 30 years criteria.


Baked clay is likely superior. Ceramics have fewer of the impurities that stone has, the kind that cause data loss from ordinary processes like erosion. Much of the oldest preserved writing is preserved on ceramics.

For another expensive approach, I suspect gold, like the Voyager golden record, would surpass it.

I imagine you could encode a lot of data in something like a QR code stamped in clay.


The oldest surviving tablets are stone tablets, so as a stone tablet marketing guy I'll have to say that clay is still relatively unproven technology. (A more hasty experimentalist might conclude that it is proven - to be less durable than stone)


As a titanium engraving marketing guy, I'd argue that my technology is younger and even less proven than ceramics, but I'm going to bet it's more durable than engraving in stone.


The technology is probably superior, but there's a conceptual flaw: titanium is a valuable material. This creates a risk that the data gets erased and the carrier material turned into jewellery, prosthetics or other funky stuff.


Clay tablets were often wiped and re-used. A lot of the ones we have preserved were preserved because the buildings used for storage burnt down, hardening the clay tablets and preserving the writings.

In other words, clay tablets get re-purposed too.


The stone or ceramic tablets are fragile, so they can break.

Bronze tablets will not break. Unlike stone or ceramic, they are slowly corroded, but we have well preserved bronze tablets which have survived at least 2200 years.

A bronze tablet, or better, a stainless steel sheet, can be engraved with text using a computer-controlled mill.


Wouldn't corrosion be easily prevented using electroplating? A gilded surface would have significant longevity on top of any metal.


If money is no object, where does, say, platinum-iridium stand on this?


Microfiche would be a far more practical alternative, I think :)


Marble seems to be holding pretty well. Lots of broken ceramics.


I'd go with granite. But environmental conditions are just as important on longer time scales: humidity, temperature, acidity, biochemistry, etc.

Then again, selecting a physical carrier isn't going to ensure the longevity of data stored.

Far more pressing are questions of whether data on those stone tablets will be readable, let alone understandable. Ultimately, those tablets will contain marks that - to us - signify 1's and 0's. If you want data to outlive us for just a couple of millennia, anyone far down the future will first need to figure out which numeral system is used. Then they have to understand the concept of binary representation of information. And then they have to understand the encoding.

Ironically, a JPEG picture of a cave painting stored on a stone tablet is still far more brittle than the actual cave paintings, which have lasted for tens of thousands of years.

Of course, other concerns such as curation and the relevance of cultural objects stored as digital representations also need to be factored in. Some stuff simply doesn't survive because later generations lose interest. The more time passes, the more the longevity of objects hinges on simple happenstance and chance.


In fact, passing on information over the long term is a real-life concern. Long-term nuclear waste warning messages, for instance, have spawned the field of "nuclear semiotics".

https://en.wikipedia.org/wiki/Long-term_nuclear_waste_warnin...

In which you end up with these types of proposals:

> The linguist Thomas Sebeok was a member of the Bechtel working group. Building on earlier suggestions made by Alvin Weinberg and Arsen Darnay he proposed the creation of an atomic priesthood, a panel of experts where members would be replaced through nominations by a council. Similar to the Catholic church — which has preserved and authorized its message for almost 2,000 years — the atomic priesthood would have to preserve the knowledge about locations and dangers of radioactive waste by creating rituals and myths. The priesthood would indicate off-limits areas and the consequences of disobedience

Meanwhile, there's GitHub's Archive Program, which has this nice tidbit on its roadmap:

> The GitHub Archive Program is partnering with Microsoft’s Project Silica to ultimately archive all active public repositories for over 10,000 years, by writing them into quartz glass platters using a femtosecond laser.

https://archiveprogram.github.com/approach/

Which sounds nice. But it's also unrealistic, since technology alone isn't enough to ensure longevity. Embedding the longevity of data into culture could also involve founding a "digital priesthood" analogous to Sebeok's "atomic priesthood": a group of people who safeguard and pass on the knowledge required to read and interpret artefacts containing digital representations of information.


Pencil on quality paper is another good one. The main difference being, of course, that one lasts millennia and the other centuries.


Every few years we get some news about phase change memory (basically, microscopic plastic tablets), but they never go anywhere.

This question could have an answer by now, but it looks like everybody is optimizing for cost/GB.


What's the best way to encode the data though? I'm imagining QR codes. I'm guessing the resolution of any CNC router that could do this would be pretty poor and limit your data storage, though. Has anyone done it?


To be fair, transportability and shelf-storage are not particularly great for hundreds of megs of data stored on stone tablets...


Putting your code on GitHub and meeting the requirements for inclusion in their Arctic Vault would put it on storage designed to last at least 1,000 years [0]. They use film reels for storage [1]. Then again, retrieval is not super easy ;-)

[0] https://archiveprogram.github.com/arctic-vault/ [1] https://www.piql.com/


Until one day you realize that you have the wrong name or you live in the wrong country and all your data is gone.


The best place to archive it is your current computer, which you use every day, ideally with a VM of the machine that can run that code (maybe Open Virtualisation Format can keep it usable longer?). Then hope that when you want to use that program again in the future, the VM will still boot on whatever host you have available.

Then come back to it at least once a year to run it again and make sure it still works.

At present we are lucky enough that Windows programs from the 1990s still run under Windows 10 to some degree. Thank the folks at Microsoft for maintaining their operating system as a digital museum of archaic bug-compatible APIs.

Something to keep on the shelf would mean you ignore it for too long and it stops working.

Even something like a Python script may stop working due to changes in the language, and old versions of the interpreter no longer being maintained.


This sounds more like a riddle...

But taking it at face value, it's hard not to wonder: would you like to preserve a functional program, its source code, or its architectural and design ideas?

Either way, your current perception of the program is likely tied to the current technology or perhaps even whole ecosystem around it.

So to preserve something like that, you'd need more than just storage.

If it's just the source code, as in text, then the golden rule of backup applies - keep many copies in distributed but known locations. In other words: diversify and distribute. Whatever the storage - digital, analog, or organic, as in human memory (storytelling is a type of storage too).

Though, likely, you mean the functional program. Then you'd need to preserve the platform too, along with the build tools. So at least some system specs need to be preserved, or a VM image for a more or less stable virtualization environment.


The understanding of the original real-world problem, the very idea of a solution, and the language-agnostic mapping of that solution to program architecture (making good use of existing patterns and algorithms[0]) are the genius, the core value of a high-level programmer's work.

These ideas, as well as the appropriate external context, can be captured using any medium, be it analog or digital, since no one came up with a better way to convey it than text and possibly some diagrams.

It is part of the reason I consider well-written and well-structured high-level documentation more critical than tests or even the functioning program itself. The latter are lossy, narrower representations.

[0] If there is anything novel as far as specific patterns and algorithms involved, they can be appropriately formalized and documented in isolation of the larger program and also in a language-agnostic way.


Indeed, there's an argument that the culture around anything (including software) is the main thing. And if the culture thrives, then that becomes the stable backup (or rather stable living vessel). Isn't there a Linus Torvalds quote that he has the world's largest distributed backup system for his code? But the point isn't that he bamboozled people into keeping copies of the Linux source everywhere -- it's that he started and oversees a living and helpful project.

I had a teacher who was known as a master printer. Students would ask, "How can I make a truly archival print?" His answer: "Show me something you make worth keeping, and I'll tell you."


SanDisk produced an archival SSD card that is supposed to be stable for at least 100 years. You could put that next to instructions for reading it through the SPI interface, which you can connect to with wires if an SD card slot were simply unavailable.

The card is write-once. They run around $90-100.

https://www.dpreview.com/articles/1049391591/sandiskwormsd


This is a neat idea. Raises the question, is there a device reader and screen that would last 100 years to make the system complete?


Honestly, if there's no device that can read SPI in 100 years, then not having access to your digital artifacts is probably not a big concern...

I would say that an e-ink display might last that long, and you could build a system with a microcontroller and a pretty adaptable power source. It could connect to a keyboard, or you could use a few buttons for a browsable interface. As long as nothing corrodes too badly. And you would want to inspect it for tin whiskers using a microscope or loupe before starting it up.


oops, I meant SD card.


"The most durable digital storage medium is stable at room temperature for 300 quintillion years, a material created by researchers from the University of Southampton’s Optoelectronics Research Centre, as published on 23 January 2014.

"The material, a nanostructured glass disc, also has an estimated lifetime of 13.8 billion years (roughly the current age of the universe) at elevated temperature of 462 K (190 C), and a capacity of 360 TB. It has been hailed as a particularly significant invention, as no other existing storage medium can so safely ensure that data will be accessible by future generations."

https://www.guinnessworldrecords.com/world-records/412399-mo...


Definitely more durable than the link you provided.


Wherever you store this program, I think you should store all the comments attached to this post as well. I bet it will be pretty interesting reading in 30 years.


I don't trust any consumer storage medium to last 30 years, but the good news is I have had much success keeping document archives accessible for roughly a decade, then transferring the content to current technology. From floppy to mechanical hdd to cd-r to now ssd. Maybe consider something shorter term than 30 years, but with an upgrade path.


I’ve been doing something similar. After losing some important personal files back in the early 1990s due to floppies that failed, and hearing a lot of stories then about people losing data in hard disk crashes, I got paranoid about my personal archives and started backing them all up systematically and redundantly to the latest media (CD-R, DVD-R, etc.).

My strategy changed, though, with the arrival of cloud storage. Now I have all of my files—about a terabyte and a half—synced to multiple computers through Dropbox, and I back up each of those computers to an offline hard disk every ten days or so. My main computer is also backed up continuously to an external hard disk.

File formats are a different problem. Until around the end of the 1990s, I used a series of different word processor, spreadsheet, page layout, email, and other programs, all for the Macintosh, and I can no longer open a lot of those files. In more recent years, file formats seem to have become more stable, and I have tried to be careful to use formats that seem likely to survive as long as I will—docx, odt, mp3, pdf, jpeg, and especially txt.


This sounds like a solid strategy.

Most of my oldest documents were WordPerfect 5, so they were still readable. I have converted them to txt (accessibility) and pdf (to preserve formatting).

My biggest miscalculations were using CD-RW media during a period in the early 2000s (the media often cannot be read by CD drives other than the one the disc was created on), as well as choosing cheap CD-R media whose plastic became cloudy over time, or whose reflective layer oxidized and then flaked off.


Others have mentioned the Internet Archive. There is also Zenodo [1] where you can archive code and have a good chance that it survives long-term. Zenodo is run by CERN and even has an integration with GitHub, so you can easily archive GitHub repositories. Each archived item can get a Digital Object Identifier, which you could etch into glass to put on your shelf (so you can find your stuff again in a few decades).

[1] https://zenodo.org


Less than 1GB is pretty easy.

Copy-1. Compress a copy and email it to yourself.

Copy-2. Burn to a DVDR and keep it on your shelf.

Copy-3. On a USB stick and store where you keep your passport.

Copy-4. On your rolling backups (you have backups right?...).

Copy-5&6. An extra DVDR and USB stick kept off-site (family/friend). Feel free to Encrypt it.

Copy-7. Your rolling backups that you keep off-site. Encrypted.

To be honest, since you already should have a good backup strategy, the extra cost should be like $5 for a couple of USB sticks and DVD-Rs.


Yeah I think there is a lot to a strategy this simple. In the future there will be a big market for getting data from old media, and by picking normie-tech you will be able to tap into that.

I might swap DVD-r for HDD (with cables to connect to either usb or sata), but either would probably be fine.

30 years is not that long. 1GB is not that much. The main concern is protecting yourself from being unlucky: "what if there is a fire", "what if I forget my password / get banned", "what if I drop my backups on the marble floor", "what if I die".


I think off-site backups are the way to go regarding the fire scenario.

Having a printout of where your financial assets are located is a good first step for the scenario where you die. This way your next of kin can track them down.

You should save your passwords in an encrypted file that gets backed up.

I've not got a good solution for if your email provider bans you.


Multiple methods would be most reliable, since that spreads the risk across them.

For a single long-term copy, paper or etched metal is probably the most reliable.

Now what is the highest density you can get on standard paper? That's a more interesting question.

Probably some collection of QR codes, with multiple copies.

Real talk: every 6 months, when checking fire alarm batteries, check your storage (and as necessary migrate it to copies on new cloud systems etc.).

I wonder if printing microfiche is something you can find easily.


Do what the Internet Archive has done[1] for some content, and pay for storage on Arweave[2], where it will be stored permanently (~200 years) across a broad set of servers that are highly incentivized to keep it strictly intact.

[1]: https://arweave.medium.com/arweave-the-internet-archive-buil... [2]: https://arweave.org


Wow that website is terrible on a phone. I wanted to know ardrive pricing but only found a page that asked me to find my wallet and take a picture.

I left my wallet in another room, but I don't think this is what it meant. Too bad it didn't tell me much more than that.


Your problem isn't storing the code, it's storing the environment it compiles and runs in.

Even if it's plain command line C you're still going to have potential issues with compiler compatibility 30 years from now. C will probably be ok if you code defensively to avoid explicit hardware dependencies, but for all anyone knows C will only be available in museums by then.

If it's something high level like Python, it's impossible to guess what state that ecosystem is going to be in 30 years from now.

Same applies to operating systems and tooling.

Vintage computer museum projects either store the complete hardware and software stack or run old code under emulation.

This was easy when you had (for example...) a VAX or PDP-11 that was essentially self-contained. It's going to get harder as processing and dependencies become more and more distributed.

I wouldn't even want to assume that something like Docker will look much like it does today, or if it does that it will be compatible with thirty year old images.


I visited the British Library archive some years ago and asked the same question. They said: removable hard drives, stored horizontally in a climate-controlled room. Apparently that's the best price point for density and MTBF for mechanical or chemical (temperature-related) degradation.


Did they say how often they scrub / replace the drives?


I got the impression it was a "keep forever" policy.


Hands down the best method is to print it out. This provides the advantage of being able to yak-shave your own typesetting pipeline / make your own paper / build your own press / design your own paper-based data encoding that's better than anything we've seen before.


Paper. It's proven technology: written records last hundreds of years. You can easily encode arbitrary data as 8 bit mode QR codes and print them out. This gives you machine readability and error correction. Data density is not that good but it's very effective, especially for small amounts of data.

I wrote some binary decoding patches for ZBar for this exact use case. You can, for example, store video games in a QR code:

https://youtu.be/ExwqNreocpg
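For anyone who wants to roll their own paper pipeline without special tooling, here's a stdlib-only Python sketch of the general idea (my own toy scheme, not what ZBar or the video above uses): chunk the bytes into printable base32 lines, each carrying an index and a CRC32 so scanning or OCR errors are at least detected. Actual QR generation would use a library such as qrcode on top of something like this.

```python
import base64
import zlib

def encode_for_paper(data: bytes, chunk_size: int = 30) -> list:
    """Split bytes into printable lines: index, base32 payload, CRC32."""
    lines = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        payload = base64.b32encode(chunk).decode("ascii")
        crc = zlib.crc32(chunk)
        lines.append(f"{i // chunk_size:04d} {payload} {crc:08x}")
    return lines

def decode_from_paper(lines: list) -> bytes:
    """Reassemble the chunks, verifying each line's checksum."""
    chunks = {}
    for line in lines:
        idx, payload, crc = line.split()
        chunk = base64.b32decode(payload)
        if zlib.crc32(chunk) != int(crc, 16):
            raise ValueError(f"checksum mismatch on line {idx}")
        chunks[int(idx)] = chunk
    return b"".join(chunks[i] for i in sorted(chunks))
```

Round-tripping `decode_from_paper(encode_for_paper(data))` gives back the original bytes, and a corrupted line fails loudly instead of silently reassembling garbage.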


Thank you very much for your videos!


The video is from someone else. He also contributed code to zbar.


Perhaps you could consider turning it into an e-book and depositing it with your local legal deposit library? I.e., make it someone else's problem.

On a more serious note, there is lots of good information out there about digital preservation, e.g. from UK national archives[1]

[1] https://www.nationalarchives.gov.uk/information-management/m...


Perhaps I simplified your question, but for me, every important digital "thing" I have is stored in my paid Dropbox account (2TB). I believe that as a rule of thumb, private tech companies invent and upgrade new technologies to make their product better, and a company like Dropbox is surely thinking of this. In my Dropbox account, I have EXE files of small programs I wrote in Visual Basic 20 years ago, and they still work after being stored there for at least 12 years. Good luck.


But "private tech companies" do not provide a process to keep our data alive beyond their owners. Data will disappear if you stop paying. You might stop paying because you are dead, but also because you have health issues.


> most stable form of digital storage?

Well, if you're asking literally, then probably cuneiform clay tablets (fired on purpose, of course). However, a higher density medium with a reasonable lifetime would probably be a 2D barcode engraved on a plate of stainless steel or something like that.

The ultimate of course would be 3D storage in synthetic quartz, but as a DIY solution, that is much more difficult to write (you need a short pulse laser for that), or even to read (for 2D barcode, any camera works).


While not very practical (currently), for the sake of this discussion it's worth including storage of digital data in DNA [0]. In theory, if the encoded data were inserted into an organism with a long history of survival, such as horseshoe crabs, it could be retrieved from their offspring in hundreds of thousands of years.

https://en.wikipedia.org/wiki/DNA_digital_data_storage


Make multiple copies in various places, including read-only like CD/DVD. Re-copy at regular intervals.

Do not underestimate the resilience of the paper format; however, it's harder to move it back to digital.

More ideas that you could also use for programs: https://meaningofstuff.blogspot.com/2015/05/backup-your-smar...


> however it's harder to move it to digital again.

Should be fairly easy if you print bar codes, qr codes, etc. We used to have programs distributed on paper with a simple hand held bar code reader. With modern printers and error correction this should be extremely robust. You could just take a phone cam picture.


Not completely related, but I learnt programming partly from photocopies of "game programming" books in BASIC (which I had to port between dialects).

My implementations of those programs are all gone. I have many of them on 5.25" disks but even if they're working, I have no way of reading them now.

However, the photocopied books are still intact, with the pages held together by an aging paper tag. Go figure.


These have worked OK for 4000+ years. There's a collection in San Jose:

https://egyptianmuseum.catalogaccess.com/search?search=contr...

Though some are the only remaining replicas from RAID-1 (RAIT?) groups.

Consider adding LDPC or something while you're at it.
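To make the error-correction suggestion concrete with something small enough to hand-check, here's a toy Hamming(7,4) codec in Python (my own sketch; a real archival scheme would use something far stronger, like LDPC or Reed-Solomon). Every 4 data bits get 3 parity bits, and any single flipped bit per 7-bit block, say from a chipped glyph, can be located and corrected:

```python
def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits as 7 bits with 3 parity bits (Hamming(7,4))."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p0 = d[0] ^ d[1] ^ d[3]  # covers positions 1,3,5,7
    p1 = d[0] ^ d[2] ^ d[3]  # covers positions 2,3,6,7
    p2 = d[1] ^ d[2] ^ d[3]  # covers positions 4,5,6,7
    bits = [p0, p1, d[0], p2, d[1], d[2], d[3]]  # positions 1..7
    return sum(b << i for i, b in enumerate(bits))

def hamming74_decode(code: int) -> int:
    """Correct up to one flipped bit, then return the 4 data bits."""
    bits = [(code >> i) & 1 for i in range(7)]
    # XOR the (1-based) positions of all set bits; a nonzero result
    # is exactly the position of the single erroneous bit
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos - 1]:
            syndrome ^= pos
    if syndrome:
        bits[syndrome - 1] ^= 1
    d = [bits[2], bits[4], bits[5], bits[6]]
    return sum(b << i for i, b in enumerate(d))
```

With two flips in one block it will miscorrect, which is why real systems layer longer codes and interleaving on top.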


I try not to reply to myself, but here's an interactive 3D scan of a 4770 year old homework assignment on a clay tablet:

https://sketchfab.com/3d-models/ancient-mesopotamian-stars-p...

The cost per bit of this data is mind boggling at this point.


Blu-ray? The discs are hardened so the surface shouldn't scratch, and the laser burns a metallic layer instead of a polymer dye like writable DVDs.


Only 3 decades puts an LTO tape firmly in play. But the drive is pricy if you buy new.

They're used so much by huge businesses[1] that those archives will still be on tape in 30 years.

The tar format (tape archive) likewise will still be around.

Probably the biggest worry would be the interface connector, but considering you can buy serial adapters, and RS-232 is over 60 years old, you'll be able to get a USB adapter for whatever ports we have in 30 years.

The standard archive mechanism for the film industry when committing a film's footage to The Vault, is LTO and hard drive. Good enough for Disney is good enough for me.

If a cataclysmic change happens in storage media enough to unseat billions upon billions of LTO tapes, there'll be plenty of warning as the whole world changes over. And you'll be able to pick up spare drives for a pittance.

[1]: Shipping over 100 exabytes per year. This is slightly less than what Seagate alone shipped in HDD capacity, but every byte of LTO storage is bought with long-term retention in mind.


> But the drive is pricy if you buy new.

They seem insanely expensive for what they are! The tapes themselves are cheaper than regular HDDs, which is nice. But when you look at the drives, what's basically a box for recording data on the tape costs thousands of dollars!

A single drive like that would be more expensive than my entire computer! Why is the pricing like that? Is it because overpricing things for enterprises is okay in the minds of the manufacturers? Low supply? Various certification requirements?

How come there are no cheaper compatible drive alternatives, and everyone must therefore go to the used market? Actually, why haven't most people even heard of backing up to tape drives - is it the pricing that kept this from ever getting big, or the fact that not many people take backups that seriously?


Older people who have been IT professionals are familiar with tape. LTO-2 drives are pretty cheap and you can buy new tapes. I got mine inside a second-hand server I bought; it's my go-to "last resort" backup mechanism.


3 decades is short enough to be "living memory", so no need for cuneiform tablets ;-)

My dad's Ph.D. is on PDP-8 magnetic tape. We went to a computer museum to try to recover it, but their PDP-11's Winchester drive (hard drive) had broken (and made dramatic noises), so we weren't able to boot fully in order to mount the tape. Eventually we ran out of time.

Over 30 years, the best way would be to teach someone new. Especially a child.

Educational computing is how Apple built their market, and how the Raspberry Pi is gently gaining market adoption for Linux. There's some LOGO code from 1997 that I wrote on a Mac Plus (and copied forwards repeatedly) at age 7, that still runs on Mini vMac.

The challenge is in finding someone else who believes in your posterity just as much as you. And that's just one challenge of having kids. (thankfully I'm yet to experience that responsibility).


DNA: https://en.m.wikipedia.org/wiki/DNA_digital_data_storage

It will last 10k years if stored reasonably. Storing GBs is no problem. Won’t go obsolete - the technology has been around for nearly four billion years.
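The core mapping really is trivial: four bases, two bits per nucleotide. A toy Python sketch (real schemes such as the Church and Goldman encodings add constraints this ignores, like avoiding homopolymer runs, plus error correction):

```python
BASES = "ACGT"  # 2 bits per base

def bytes_to_dna(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )

def dna_to_bytes(seq: str) -> bytes:
    """Inverse mapping: pack each run of four bases back into one byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)
```

At this density a 1 GB program comes out to four billion bases, roughly the size of a human genome, which puts the synthesis cost in perspective.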


> It will last 10k years if stored reasonably.

I've heard people bring up the half-life of DNA in topics of longevity (e.g. why it'd be problematic even if we'd solve many other problems), which is apparently around 500 or so years? https://www.popsci.com/science/article/2013-02/whats-half-li...

Wouldn't you also have to contend with such eventual decay, unless you picked some very stable materials? Though the page you linked has "in vivo" as a separate category (only one of the approaches), so maybe this is not as applicable to the alternative methods.


500 years refers to the chemical stability of DNA exposed to the environment. Under protected conditions it would last longer, and if inserted into a living genome, it could be replicated each generation and readable hundreds of thousands of years later.


I don't think it fares very well against any kind of radiation.


There is a good cyberpunk short story here, with biotech graffiti artists releasing a virus in the underground with I_WAS_HERE encoded in it. Perhaps with a variant released as a cure for the ensuing pandemic that is eventually discovered to have HERE_I_WAS encoded in it too.


Redundancy?

I have a .tar.gz of my university account that's 25 years old at this point and a .ZIP of my old DOS, Turbo Pascal, etc. stuff that's a few years older. They've been copied so many times over the years and I'm not even sure the path my current copies took. They lived on floppies, on a CD-R for a long time, different PC hard drives, external backup hard drives, flash thumb drives, Dropbox / Google Drive / iCloud, and most recently a microSD card that lives in my MacBook's port. I'm sure it's been on a couple tapes and Zip disks but those media likely long outlived any installed drives I had. Can't remember ever getting a bit error or corruption on a copy. Even the CD-Rs that were well past their alleged lifespan read fine for me.


Suggest Arweave.

https://www.arweave.org/


What about tape?

Tape cartridges are high volume, inexpensive and the drives can be found on eBay or similar for under $200.

They don't do random access in any sort of reasonable time, but can be great for archival work.

Also, isn't there at least thirty years of development roadmap on the books for tape?


If just the source code: print it out on acid-free paper and put it in a safe deposit box.


Paper?

https://www.monperrus.net/martin/store-data-paper

I mostly jest. It would require a lot of paper. But it should be stable storage for potentially centuries.


You could upload it to some cloud storage like aws s3 (infrequent access tier) or cloudflare r2. For a GB of storage it will cost you around 1.25-1.5 cents a month. AWS will cost you 9c every time you download it, cloudflare will cost nothing.
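At those rates the 30-year arithmetic is almost negligible; a quick sanity check in Python (prices as quoted above, and they will certainly drift over three decades):

```python
gb = 1.0
price_per_gb_month = 0.0125  # USD, roughly the infrequent-access rate quoted above
years = 30

# total storage cost over the whole retention period
storage_cost = gb * price_per_gb_month * 12 * years
print(f"~${storage_cost:.2f} to keep {gb:g} GB for {years} years")  # ~$4.50
```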


And one day a random bot decides to ban your account and everything is gone.


You copy to Google and Dropbox as well.


That assumes that the policies of these companies don't change suddenly to ban all of the associated accounts in unison.


Have a credit card you maintain, with dedicated cloud storage accounts on AWS, Backblaze, GCP and Azure.

Upload your data to each of their services. AWS Glacier Deep Archive would charge pennies per month to store your data on their cold storage platform. Put a copy in four separate regions there, then again on the other platforms.

Then buy your own M-Disc burner/reader, perhaps two for backup, and burn the data to that. "They" say the media will last decades or more. Who knows about the readers, though.


If you only need to store a small amount of text I sell a product that can theoretically hold data for over 200 years on ferroelectric RAM. Ideally you would store the data on several different mediums in different locations but this device may be something you'd want in that mix.

https://machdyne.com/product/stahl-secure-storage-device/


I've heard that magneto-optical drives https://en.wikipedia.org/wiki/Magneto-optical_drive can keep your data for a real long time.

But in all honesty, I think you are better off uploading your data to GitHub, Google, Dropbox (and whoever else you can find), and relying on at least one of them being around for that long.


Not really what you're asking for, and I can't find the project anymore (I think it's the search engines? I used to be good at this.)

Anyway, there's a guy who etches data into ceramic discs and stores them in a cave or old mine in, um, I want to say Switzerland?. The discs themselves would conceivably last millions of years, and barring cave-in of the tunnels they're stored in, uh, yeah.


Quality consumer media in a current format; store three copies each in two locations. Evaluate the integrity of the data each year to catch bit-rot early and make new copies as needed, and decide if the format is beginning to disappear and it's time to create six new copies in a modern format.

It's cheaper and easier than finding media that you can literally forget about for decades and still find intact, and easily recoverable.
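A minimal sketch of that yearly integrity check in Python, assuming a simple checksum manifest written when the copies are made (the file layout and manifest name are arbitrary):

```python
import hashlib, json, os

def sha256(path, chunk=1 << 20):
    # Stream the file so large archives don't need to fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def write_manifest(root, manifest="manifest.json"):
    # Record a digest for every file under root (keep the manifest
    # outside root, and on every copy).
    sums = {}
    for dirpath, _, files in os.walk(root):
        for name in files:
            p = os.path.join(dirpath, name)
            sums[os.path.relpath(p, root)] = sha256(p)
    with open(manifest, "w") as f:
        json.dump(sums, f, indent=2)

def verify(root, manifest="manifest.json"):
    # Return the list of files whose current digest no longer matches.
    with open(manifest) as f:
        sums = json.load(f)
    return [rel for rel, digest in sums.items()
            if sha256(os.path.join(root, rel)) != digest]
```

Run `verify` against each copy on the yearly pass; any non-empty result means it's time to re-copy from a still-good copy.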


> ...store three copies each in two locations.

> ...decide if the format is beginning to disappear and it's time to create six new copies in a modern format.

This is funny to me because you might end up with way too many copies to manage.

But the advice about formats and the backups needing to be "alive" is good, even if nobody might want to hear that due to it needing procedures built around it. Most people would simply want to make backups and have them sit somewhere for decades, which sadly isn't as reliable of an approach.


Two copies could be enough if the filesystem has mechanisms for verifying file integrity, so never more than four copies in total - no point accumulating them.

It's possible flash-based media could be left unused for two to three years before needing to be powered up, but I think this time-span gets shorter as the storage density increases.



Tape is still what many use to archive project files and the daily rushes for feature films. Certainly more robust than the multiple drives that get used in post.


Tape suffers from hardware progress. It gets harder and harder to find hardware to read a given tape format.

The market for tape is much smaller than consumer markets (in units sold)

You'll probably be able to read CDs in fifty years, just because there are so many of them.

If you have a fifty year old tape, even in perfect condition, you probably won't be able to find hardware to read it.


LTO drives can read two generations back. Migration is pretty straightforward.

> If you have a fifty year old tape, even in perfect condition, you probably won't be able to find hardware to read it.

As I found out when migrating a pre-LTO AS/400 tape, there are rental places known to IBM partners that rent the drives that can hook to modern equipment.


If you know where I can get a QIC tape drive, I'd really like to know!


What do you need to hook it up to?


I've thought about this a lot. It's becoming a serious issue for museums since increasingly contemporary artwork incorporates digital processes, for example video and photographs. Archivists worry about stability of storage media as well as long-term usability of CDROM, DVD, magnetic drives, SSDs, etc.

The ability (or lack of it) to play/view an original archive is already a major problem. Think about the history of floppy disks, zip media, digital tape cartridges, and numerous others. I recall these media being quite prevalent back in the 90's, and that's only <=30 years ago. (I have my own share of them.) Today these media are ancient history and soon, if not already, as inscrutable as the writing systems of obscure prehistoric civilizations.

As said in several comments, the situation is worse for software. Keeping long superseded equipment running is very difficult. (I have some of that too.)

Preserving source code should be pretty straightforward: printing it out with archival ink on 100% cotton-fiber, acid-free, buffered paper would work. The only catch is long-term storage of the printed document. Paper itself is subject to environmental degradation. Museum standards specify a constant 20°C, 50% humidity, sealed against atmospheric pollutants and no light exposure. That should hold it for at least 100 years. :-)

Technology moves fast; it seems a good bet that some brilliant startup will think of new ways to preserve the history of our epoch.


Just put a copy in S3 Glacier and maybe the equivalents from Azure and GCP. It’s ludicrously cheap, on the order of cents per GB per month.


Wow! Thanks guys for all the helpful comments! I didn't think I'd get so many replies and ideas! Thank you, thank you!


I have often wondered if baked clay tablets are the most cost-effective, durable medium we've invented, in dollars per bit-year.

Yeah, you can do archival diamond, or archival ink on archival paper. But will you be able to read it 1,000 years hence?

I doubt there's a market for a printer, except as a novelty device. I wonder what kind of bit density you can achieve, in clay?


We have parchments with inks that are thousands of years old.

That said, etched metal or ceramic would likely be better. A CNC with a Dremel would let you digitally generate your tablets at ~$200 + tablet materials.

Cheap ceramic tiles are $0.50-$1 per sq ft.


It seems like at a certain point it's the environment, not the material used, that is the most important factor.


It's not only storage. I think most will be fine after 30 years.

The problem is often: how are you going to read it after all those years?

I have floppy disks that are ~25 years old, but without an old USB floppy drive there is no way to know if they're still readable. And if USB-C becomes the norm, I'm not sure I'll ever be able to read them again.


You can get an adapter from USB-C to USB-A.


Getting it to appear somewhere in archive.org certainly can't hurt as long as it's not something private


In terms of permanence with no need to interact, it's pretty hard. But with minimal effort you can do it.

Keep the stuff in text and move it around to different places periodically, and keep redundant copies.

At the moment I favour git. It's easy to set up multiple repos, and if one goes offline you can set up another.
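A rough sketch of the multi-repo setup, driving git from Python; the mirror paths are local stand-ins for whichever hosted remotes (GitHub, GitLab, a friend's server, ...) you'd actually use:

```python
import subprocess

def run(*args, cwd=None):
    # Thin wrapper so each git call fails loudly on error.
    subprocess.run(args, cwd=cwd, check=True, capture_output=True)

def mirror_repo(workdir, mirror_paths):
    """Push the current branch of workdir to each mirror path/URL."""
    for i, target in enumerate(mirror_paths):
        # Create a local bare mirror; hosted remotes are created on
        # the host instead, and target would be a URL.
        run("git", "init", "--bare", target)
        run("git", "remote", "add", f"mirror{i}", target, cwd=workdir)
        run("git", "push", f"mirror{i}", "HEAD", cwd=workdir)
```

If one mirror goes offline, `git remote add` plus one push stands up a replacement.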


Etch the code onto metal sheets?

The CrafsMan has a DIY video on how to do it, he's also just a wonderful treat in general. https://www.youtube.com/watch?v=4tYMUqsVhfc


If it's just for posterity, then consider printing it out as a poster. Because it's "art" it's less likely to be lost, too. For extra credit, scan it back in and run it, at least once. I'd wager you'd learn a lot doing that.


Maybe archival DVDs? They sound good on paper and some of them show promising marketing test data but does anyone have any information on whether these products are likely to be any good for long term retention with infrequent reads?


Burned CD.

When I got married (20+ years ago) I burned a CD full of music for the DJ. Songs that were not likely in his/her collection.

I found the CD in a box about a year ago, and it is currently in my car stereo. All the songs are intact.

Oh, and my Xbox also reads and plays the music.


This recent HN discussion has a lot of information about the reliability of optical discs: https://news.ycombinator.com/item?id=30888906

My main comment has some hard numbers based on my own experience: https://news.ycombinator.com/item?id=30889474

In short: 77% of the discs I ripped after ~15-18 years had minor issues as demonstrated through read errors (when ripping with ddrescue) and checksums (in png, zip, and gz files; I did not manually checksum files back in the day). I was able to rip 99.88% on the worst disk, and I estimate that about 97% of the files are intact on that disk. CDs and optical discs more broadly seem to be reasonable, but I'm not really sure they'll be in good shape after 30 years. I'd consider them part of a backup strategy, not an entire backup strategy. I personally use a combination of external (magnetic) hard drives, flash drives, and optical discs for backups. Replacing the optical discs after 10 years or so seems prudent, as is redundancy and/or par2 files.


CDs and DVDs are cheap and burning them is fast and easy. Make several copies, calculate the checksum, and store them in different places. Every few years see how badly they've degraded, and after hitting some pre-chosen threshold you can re-burn all of them. If the data really is less than 1 GB you could even store 4-5 copies per DVD for additional redundancy. Even on a single disc I'd imagine the errors are unlikely to be in the same region on all 4 copies.
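The observation that errors are unlikely to hit the same region on every copy can even be exploited at recovery time. A sketch, assuming equal-length copies with independent, localized corruption, of byte-wise majority voting:

```python
from collections import Counter

def majority_merge(copies):
    """Recover a file from several equal-length copies by taking, at
    each byte offset, the value most of the copies agree on."""
    assert len({len(c) for c in copies}) == 1, "copies must be equal length"
    # zip(*copies) yields one column of bytes per offset.
    return bytes(Counter(col).most_common(1)[0][0] for col in zip(*copies))
```

With three copies, any byte corrupted on only one of them is outvoted by the other two. Note this is a crude stand-in for proper erasure coding (e.g. par2), which also handles copies of differing length and missing sectors.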


Optical media doesn't hold enough, though. I've got a terabyte of downloaded media. Even with Blu-rays that hold 50 GB, I'm not going to bother burning and storing and keeping 40 of them.

I go with simple external hard drives. I get a new one every couple years and copy everything to it as my main backup. The likelihood of my in-machine drive and fairly young backup drive failing at the same time is pretty low. Even if that happens, the previous backup drive will have a good portion of it. And my important docs folder also goes to Carbonite online backup.


That's your use case (and probably fairly common). The original question specified < 1 GB so my comment was based on that.

If I had to save 1 TB I'd probably save 1 copy with a specialized storage provider, 1 hdd in a safety deposit box and 1 hdd at home. You could also use tapes if you don't trust the HDD hardware to last.


Blu-ray holds more than that.


They still play back. That doesn't mean there isn't progressive degradation that the FEC is correcting for. The only long term reliable optical disc medium is M-disc DVD. Everything else will likely be bad in 50 years or less.


I put my trust in Kodak Gold CD-Rs and a few years later they just didn't read.

https://brianhenryie.s3.amazonaws.com/2022/kodak-gold-cdr.jp...


That's a good scenario. I burned a lot of CDs back in the day too, and in 5-10 years the degradation became apparent. I could recover approximately 90% of the data, but not without tools like cdparanoia.


I’m surprised so many people suggest magnetic media for long term storage. They are extremely likely to be damaged at any point and from several sources and in the very long term they are guaranteed to fail due to cosmic radiation.


How do you encrypt your data in a way that you can decrypt several decades later?

The script you downloaded from GitHub, or data format used, may long be gone.

Especially relevant if you store in the cloud (you should probably encrypt in that case).


Rot13


Keep a copy on your local computer, either with cloud or local backup. Put another copy in a private github repo. Maybe email another copy to yourself, just in case. You should be fine.


Make your program so good or impactful that it is in high demand and/or goes viral (or make it part of something that does), let the Internet handle redundancy and preservation for you.


AKA the Torvalds backup system https://lkml.org/lkml/1996/7/22/8


Just go for a USB or cloud backup. If you want something more durable, you could encode the code in a QR code and carve it into a rock; it may last some millennia.


Wasn't USB stick storage (flash) one of the best options? I remember reading that these basically last forever if you don't actually use them.



I love how the answers in this thread range from highly technical subject matter expertise to build a pyramid and carve your data on the walls.


yes... well, your code is the least of your worries, I'd say. Keeping the same environment around for decades will be far harder!


HTL Blu-ray discs (the expensive ones) are quite good; Panasonic used to sell an archival version that they advertised as lasting 50 years.


Write two copies to tape, keep one on your shelf and put the other into long-term storage on a different tectonic plate.


... but take care that they are not too close to the fault (San Andreas)


I've been told paper/plastic tape is actually one of the best archival methods.


Print it out, make it into a painting kinda, sign it and enjoy it on your wall :)


You wrote a program that's a gigabyte of source code?

I find that incredibly hard to believe


< 1GB


Multiple copies that are themselves copied to new mediums every few years.


Store it in the blockchain


1 GB of data? Also make sure you don't choose one of the 100,000 or so that didn't last five minutes


Stainless steel punchcards


Masked ROM.


M-Disc


Gdrive


In approx 30 yrs time you might be able to get the data back on Linux.


publish it to a blockchain


There are many different forms of digital storage, and each one has its pros and cons.

The most common form is magnetic storage, which is what you find on your hard drive or a floppy disk. It's really good at storing data and can hold a lot of it. However, it is not very stable. Magnetic storage degrades over time, meaning that the longer you use it, the more unreliable it becomes. This means that you need to frequently back up your data and/or replace your hard drive fairly regularly.

Another form is optical storage, which stores information as a pattern of pits on a surface such as a CD-ROM or DVD. Optical storage is also not very stable, as discs can be scratched easily. But optical storage drives are cheap and easy to use (you probably already have one in your computer.) Optical drives can also read from many different types of discs, which makes them versatile.

Finally there is tape storage, which records data onto magnetic tape like the kind used for analog audio cassettes. Tape storage is great for recording large amounts of data quickly, but accessing information on tape takes longer than accessing something stored on magnetic or optical media.


> What's a good storage medium where I can keep and load again in the future?

USB flash? HDD? SSD?

> must keep for at least 3 decades

Most storage mediums won't have problems lasting that long if they're stored well. Flash and SSD storage in particular shouldn't have issues; tape and HDD have a small chance of mechanical failure even if you don't use them.

> must be easily transportable (for moves between houses and such) and can sit on a shelf

I mean, I think most of our current formats do just fine in that regard.

> Bonus points for suggestions on an equally stable storage type that some computer will still be able to understand in the future.

Almost everyone is still using SATA. There is unlikely to be a world in the future that is unable to "understand" serial ATA storage.


>> must keep for at least 3 decades

> Most storage mediums won't have problems lasting that long if they're stored well. Particularly flash and SSD storage shouldn't take issue: tape and HDD has a small chance of mechanical failure even if you don't use them.

I think you're overstating the durability of these systems.

I saved most every HDD I've used dating back to the early 90s. I wanted to get rid of the box of drives, so I got an adapter and tried to see what I could salvage from them before throwing them out. Ages varied from about 8 to 28 years. None of the ancient drives (>15 years) worked. I could usually hear them spin up, but I couldn't get anything off them. Several had the dreaded click of death. I got some of the newer (<12 year old) drives to work. Even the newer 8 year old drives had really high failure rates. They were all working when I pulled them out of service.

My old USB flash drives have also had high failure rates for me.

From my experience, expecting an HDD that isn't in use to survive more than ~5 years totally idle is asking to lose the data on it. The data might still technically be there, but the mechanical+electrical system that makes up an HDD does not have high longevity.

To the original question: I actually just researched this topic extensively to make purchases for my backup systems, and my conclusion is: if I really want data to last 30+ years, I'm using archive-grade Blu-ray discs (M-Disc and LTH). I burn multiple copies of the data I want to keep, and I store them in dark places at separate locations.


Flash is a terrible idea. NAND cells are basically little capacitors, and they leak electrons over time. Current TLC drives actually rewrite themselves every month or so to keep read error rates low. If you leave them unplugged, they can't do that.

If you must use flash, keep it cool. We have some deer cameras in direct sunlight. After six months or so, the SD cards become unmountable due to media read errors.


An IDE or SCSI hard disk from 1992 would technically be easily accessible using adapters but mechanically it probably wouldn't spin up. Removable storage from that era like magneto-optical or SyQuest would require a rare and very obsolete drive that may also have mechanical problems.

I expect similar problems looking forward. Hard disks made today aren't rated for 30 years, NAND flash retention is much less than 30 years, and Blu-ray drives may be hard to find.


I had an experience a few years ago, retrieving old data from SyQuest drives. I was unsuccessful getting even a single bit recovered: several drives and several disks ultimately produced zero overlap in readability.


> but mechanically it probably wouldn't spin up

might not spin up. I recently booted a Sun 3/80 from 1990 and while the battery backed RAM that told it how to boot was long dead, the HD was not a problem at all. Once I replaced and re-coded the RAM, the machine booted right up. And this isn't just a one off story. I've got a bunch of machines from that era, and the HDs are not usually an issue (although occasionally they are).


I recently tried to boot and read a number of drives from 10 to 40(?) years old. (Pandemic. Was bored).

About half of them still spun up and were readable. Probably could have gotten more with heroic measures.

Most 3.5 floppies were likewise readable.


> NAND flash retention is much less than 30 years

Hard failures aside, I assume very slow charge decay might be somewhat mitigated by periodically copying files flash-to-same-flash (let alone another) -- or perhaps even rewriting them in place.
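A hedged sketch of that refresh idea: copy each file aside, verify the copy, then move it back so the drive writes the data into freshly programmed cells. Whether a given flash controller actually re-programs the original cells this way is an assumption, not a guarantee.

```python
import hashlib, os, shutil

def _digest(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).digest()

def refresh_file(path):
    tmp = path + ".refresh"
    shutil.copy2(path, tmp)          # forces a fresh write of the data
    if _digest(tmp) != _digest(path):
        os.remove(tmp)
        raise IOError("copy mismatch, original left untouched: " + path)
    os.replace(tmp, path)            # atomic rename on the same filesystem

def refresh_tree(root):
    # Run this periodically (say, yearly) on a shelved flash drive.
    for dirpath, _, files in os.walk(root):
        for name in files:
            refresh_file(os.path.join(dirpath, name))
```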


Honestly, the best answer to this question is one that requires more effort than "burn once, sit on shelf".

The best answer is "host a NAS that you actively maintain and has an offsite mirror".

I personally host an all-flash NAS for the performance and MTTF benefits over HDD. Since it's always on and performs regular scrubs there is no concern of bit-rot. The data I put there will be there as long as I'm around to take care of the box and pay the B2 bill.


This is ultimately the best option - checksummed in-flight data has a higher likelihood of being verified at both read-source and write-drain endpoints.

That said - I'm at a loss to understand what software is ultimately worth using without modification after 30y....


> I'm at a loss to understand what software is ultimately worth using without modification after 30y

I wrote a video game for my child’s birthday. Does that count?


> I wrote a video game for my child’s birthday. Does that count?

For your child, 100%! So if that's the goal, why not?

Modification is a big word. Old cars can still be valuable in capital terms and make you feel much more alive than a brand new car.



