
>>Digital records are exponentially more complex to maintain in the long term. You need to deal with formats being unreadable, turnover of media, etc.

No they aren't. There are many open, non-proprietary formats that have existed for many years and will continue to exist for decades. TIFF is a great example.

In contrast, paper records cost a ton of money to store (they take up physical storage space, which means paying rent and/or opportunity cost) and they are also very easily destroyed in accidents such as fires or earthquakes. Oh, you want to create backups? Good fucking luck making copies of all the crumbling pieces of city legislation from the 1960s.




The "black hole" argument is real, though.

Think of a catastrophe (natural or otherwise) which yielded a long-term interruption to the power grid. All those cloud-archived images and documents carefully striped with redundancy across multiple disks in different availability zones? All gone.

And yet a stack of papers in a fire-proof safe will be there and perfectly readable for centuries, with no intervention or maintenance required, and no special technology (at all) on the receiving end.


No organization keeps their records in fireproof safes, dude. They keep them in basements and warehouses and the trunks of employees' cars. I sell and implement backscanning services for a living. I've seen it all, trust me.


Bit rot isn't supposed to set in for at least a decade on spinning rust drives. That's a hell of a power outage.


To be clear, I'm speaking about a more or less permanent outage. The kind where even if the data center could continue to run indefinitely on diesel, there'd be no point because the customers of those companies have experienced sufficient disruption that shopping for things on the internet is no longer a major part of their lives.

There are any of dozens of ways this can happen; suffice to say that Romans in 400 AD also thought their way of life was permanent, and a few hundred years later, there was basically nothing left.

So picture a future archeologist exploring an abandoned data center populated by AWS hosts or Backblaze storage pods or whatever, attempting to recover information from the hardware he finds there. Even assuming all the disks are fully intact, how much chance would he have of recovering anything at all of meaning?

We don't know tons about the Egyptians or Romans or Aztecs from what they left us as their worlds collapsed, but it's possible we know considerably more about them than we'll be leaving for those looking back on us a millennium or two from now.

(And it doesn't even take a catastrophe: look at the effort YC is putting into getting that Alto working, and that's a machine that's just a few decades old!)


I worked on a project in high school to digitize and index government records from 1600-1850. The paper stuff is in great shape.

The abstracts written in WordPerfect v.whatever? Not so much.


How many of those non-proprietary formats are really extensively used, though? The biggest chunk are probably in .doc, .xls, and .pdf files, generated from obsolete versions of Microsoft and Adobe products.

Does anybody actually use TIFF? I've almost never seen one in the wild.


TIFF is the most common format for scanning applications. PDF (which is also an open, non-proprietary format) is the second.

Any organization that uses document management or records management software will have most of their documents in either (or a combination of) those formats.



