Back up your precious files on ordinary paper (ollydbg.de)
193 points by adulau on Aug 13, 2012 | 91 comments



Does anyone have a PDF of the source code? (Edit: well, good luck, folks: http://web.mit.edu/~mherdeg/www/paperbak.pdf)


This encoding system kind of reminds me of how the Nintendo eReader worked... http://en.wikipedia.org/wiki/Nintendo_e-Reader


> Redundancy helps to recover partially damaged data. Redundancy 1:5 means that for every 5 consecutive data blocks, if one block is completely unreadable, PaperBack will be able to restore it. To reduce damages caused by coffee pots and other common dangers, blocks are distributed around the page. Higher redundancy decreases page capacity but improves reliability.

Please do study error correcting codes. Most of them can offer much better error correction with less overhead. Take a look at Wolfram's MathWorld article [1]. This topic is heavily covered in most information theory books.

[1] http://mathworld.wolfram.com/Error-CorrectingCode.html


The page claims to implement Reed-Solomon error correction:

http://ollydbg.de/Paperbak/#8


The application allows you to select between 1:2 and 1:10 redundancy.
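For anyone wondering what "1:5 redundancy" buys you, here is a minimal Python sketch. It assumes the simplest scheme with that property, one XOR parity block per group of five data blocks. It only illustrates the quoted description and is not taken from PaperBack's source; the function names are made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_parity(group):
    """Build one recovery block for a group of data blocks (assumed scheme)."""
    return xor_blocks(group)

def restore_missing(group, parity):
    """Restore the single block marked None from the survivors plus the parity block."""
    missing = [i for i, block in enumerate(group) if block is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can restore exactly one lost block per group")
    survivors = [block for block in group if block is not None]
    restored = list(group)
    restored[missing[0]] = xor_blocks(survivors + [parity])
    return restored

data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD", b"EEEE"]
parity = make_parity(data)
damaged = data[:2] + [None] + data[3:]   # pretend block 2 sat under the coffee pot
recovered = restore_missing(damaged, parity)
```

With this scheme a second lost block in the same group is unrecoverable, which is presumably why the blocks of a group are scattered around the page.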


I don't know how many people remember this, but this reminds me of a technology in the mid-to-early 80s where computer magazines offered a gigantic QR-like code that could be scanned (if you bought the device) so that you wouldn't have to spend any time typing in the BASIC source code.


Oscar Databar: http://www.mainbyte.com/ti99/hardware/oscar/oscar.html

Software was published via standard one-dimensional bar codes, and instead of typing in the code from a book or magazine, you'd manually scan line after line of barcodes using a specially designed hand scanner.


That's a good find. The one I was thinking about that appeared in Apple ][ mags was the Cauzin Softstrip:

http://en.wikipedia.org/wiki/Cauzin_Softstrip


This reminds me of one thing

There was this small Casio musical keyboard at home, bought in the 80s (made in Japan I think)

It came with a scanning pen, you could scan a song (from a book) made up of several barcodes (just like the links posted here) and it would play the song.


Dan has a nice article about this: http://www.dansdata.com/gz094.htm


Baking information into clay tablets is probably the best bit-per-year-of-retention technique that mankind has invented. Not sure what's second best -- ink on archival paper?

[I don't count the recordings attached to the Voyager probes. We can't easily retrieve those.]


Oh goodness gracious no, if you're going to measure in linear bit-years it isn't even remotely close. It's hard drives, because the amount of bits stored on hard drives is just so staggeringly enormous it dwarfs the faint traces of fragments of bits and pieces of the ancient world that we've managed to barely save. Orders and orders of magnitude off, even if you decide to count things like raw scans of the ancient stuff as opposed to merely the bits of information in the text and the other handful of useful details like the carbon dating.

You need to do some sort of correction that nonlinearly accounts for lasting more than a few years.


Exactly. A 2TB hard drive that is filled and lasts only one hour racks up about as many bit-years as 32KB that lasts 7000 years.
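The arithmetic checks out if you read "linear bit-years" as bits stored times years retained. A quick Python check, assuming a decimal terabyte (10^12 bytes), a binary kilobyte (1024 bytes) and a 365.25-day year; all three conventions are my assumptions:

```python
HOURS_PER_YEAR = 365.25 * 24  # 8766 hours

def bit_years(num_bytes, years):
    """Linear bit-years: bits stored multiplied by years retained."""
    return num_bytes * 8 * years

drive = bit_years(2 * 10**12, 1 / HOURS_PER_YEAR)  # 2 TB kept for one hour
tablet = bit_years(32 * 1024, 7000)                # 32 KB kept for 7000 years
ratio = drive / tablet                             # just under 1
```

Both come out at roughly 1.8 billion bit-years, so a purely linear measure cannot tell the two apart.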


Yep. Though I recommend also creating a network of archives: http://carlos.bueno.org/2010/09/paper-internet.html


I'm sure we can do better; we just haven't tested the newer alternatives over a long enough time span to be sure. Take a look, for example, at the Long Now Foundation's Rosetta disk:

http://blog.longnow.org/02008/08/20/very-long-term-backup/

They made a disk of sturdy, corrosion-resistant metal, micro-etched thousands of pages of text on one side, and covered the other side with a design of words spiraling into the center and getting smaller, until they were no longer visible with the naked eye -- meant to suggest at a glance what the disk was, and how to read it. The text is a record of thousands of existing languages, like a modern, souped-up version of the original Rosetta stone.

Clay tablets have nothing on that.


As with any archival storage system, the CAP theorem applies. Clay, Acid-free, Papyrus: pick two.


Acid-free archival paper kept in a dark, fireproof box is, to my knowledge, the single best technology we've invented and have easily at our disposal for long-term storage.

I think I might totally start 'printing out' photos I want to theoretically have available in fifty years and place them in a safety deposit box…


How stable would the various colors of ink needed for pictures be? I suppose you could also print out a separate B/W version of each color channel for later reconstruction.


Maybe you could print out a color calibration strip with known values, and then use that as a benchmark so you could reverse fading via software? Kind of like what NASA/JPL does so they can color-correct photos taken by planetary landers.

(I know that this isn't what the OP meant, but it's interesting to try and mitigate the limitations of the more common backup solution -- i.e., just printing the photos themselves).
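The calibration-strip idea could look something like this in Python. It is a sketch under a strong assumption: that fading is linear within each channel, so a gain and an offset fitted on the strip's patches are enough to undo it. Real ink fading is probably not that tidy. The patch values and function names are invented for illustration.

```python
def fit_channel(scanned, reference):
    """Least-squares fit of reference ~= gain * scanned + offset for one channel."""
    n = len(scanned)
    mean_s = sum(scanned) / n
    mean_r = sum(reference) / n
    var_s = sum((s - mean_s) ** 2 for s in scanned)
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(scanned, reference))
    gain = cov / var_s
    offset = mean_r - gain * mean_s
    return gain, offset

def correct(value, gain, offset):
    """Apply the fitted correction and clamp the result to the 0..255 range."""
    return max(0, min(255, round(gain * value + offset)))

# Values printed on the strip, and what a hypothetical faded scan reads back.
reference = [0, 50, 100, 150, 200, 250]
scanned = [50, 80, 110, 140, 170, 200]   # invented: contrast lost, blacks lifted
gain, offset = fit_channel(scanned, reference)
restored = [correct(v, gain, offset) for v in scanned]
```

You would fit one gain and offset per color channel, then apply them to every pixel of the scanned photo.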


He is putting the data on the sheet for the photo file, not the photo itself.


Besides paper or parchment (mentioned below), there is a technique that uses microform. Here in Germany a lot of long-term archival work is done with this material. The information is scanned and "printed" on microfiches. One image can hold many pages (I think up to 16), and one roll of film holds many images. These rolls are then sealed in special containers and stored in an old mine in the German "Black Forest".

To date there are more than 28,000 kilometers of microfilm holding roughly 850,000,000 images. They estimate that even in 500 years there will be no information loss on these images.

German Link: http://de.wikipedia.org/wiki/Barbarastollen


Very true. Acid-free paper has an expected life of over 1000 years. Think about how long paper documents like the Magna Carta, ancient illuminated bibles from the 700s, and Shakespeare's plays have survived without special climate-controlled environments.

Also paper in bulk is hard to destroy. A stack of paper is pretty dense and can survive to some degree a house fire or flood.


Aren't those documents on sheepskin, not paper?


Not just parchment, but also Iron Gall Ink. Which will eat both your paper and your fountain pen.

http://en.wikipedia.org/wiki/Iron_gall_ink


Good point. "Parchment" can also be made of calfskin or goatskin.



Don't forget papyrus!


Are you sure? How many of the clay tablets made did not survive? Just because you have an old one with data does not make it effective retention.

Engraving data on diamonds is probably better...


If the room the clay tablet is in survives, the clay tablet survives. You can't say the same about hard drives and burned DVDs. Pressed DVDs are probably fine though.


The longevity of clay tablets would be biased by the limited technology of the time. Even engraved granite would be a better choice today, since we can easily machine it now.


The coins produced by Long Now would probably beat this by a long shot.


"bitmaps produced by PaperBack are also human-readable (with the small help of any decent microscope). I'm joking"

Actually, tiny human-readable text is incredibly useful for verifying the data was backed up correctly (at least for text-based files). I read about this paper backup technique a while ago, and that was touted as one of the main benefits.


Reminds me of a proposal I read about once for a time capsule -- the kind intended to survive the fall of civilization and to help reboot it. It would be plain human-readable text and diagrams engraved into a hard disc of some stable material, starting out big and legible at the edges and gradually getting smaller and smaller as it spirals towards the center, until it's microscopic.

The text would start out as a Rosetta Stone type of thing, to try to establish a common language. Then as it got smaller, it would describe how to grind a hand lens so you can read further. Then even smaller text describes how to build a microscope. Maybe somewhat further on it talks about CD-ROM drives, and eventually transitions to /that/ format...


You're probably thinking of the Rosetta Project: http://rosettaproject.org/disk/concept/

...I always thought it was a very cool idea too.


Sounds like microfiche


True, the one I read about was paper based though, from a regular printer I believe.


Very funny. And not so impractical - send a letter with a description on one side and the full data on the other.

For backups I can imagine a situation like this: "OMG! My HDD crashed!!! Ha! No panic! I made backups! Now I just need to restore them from this bunch of sheets. And my restore program is on this sheet. Now .. Erm.. Hm.."


And printing in just 3-tone color potentially increases the information density by a factor of 8.


Three color will take three times as many bits as input and store three times as many bits. Each pixel cell will have eight possible values, but that is just three times the information.
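Put as arithmetic: the information in one cell is the base-2 logarithm of the number of states it can take, not the number of states itself. A small Python check, assuming each of the three inks is simply present or absent in a cell:

```python
import math

def bits_per_cell(values_per_cell):
    """Information carried by one cell that can take this many distinct values."""
    return math.log2(values_per_cell)

mono = bits_per_cell(2)      # one ink, on or off
tri = bits_per_cell(2 ** 3)  # three inks, each on or off: 8 combinations
gain = tri / mono            # density improvement over monochrome
```

Eight states per cell is three bits, so the gain is a factor of 3 rather than 8.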


... at the cost of durability. Though some sort of error recovery algorithm (along the lines of parity archives) could be used, adding redundancy at a lower rate than what is gained from the additional colors.


Color might be useful as redundancy: For example, one could read the data into HSV; the bit being stored could be replicated three times, once in Hue, once in Saturation, and once in Value. I'm not sure how practical this actually is, especially on the radical end of those values. Also, this kind of redundancy doesn't help against coffee spills, just printing errors, and not even well against those.
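A rough Python sketch of that idea: it is a three-way repetition code decoded by majority vote. This assumes the scanner has already thresholded hue, saturation and value down to one bit each, which is the hard part and is skipped here; the names are invented.

```python
def encode_bit(bit):
    """Replicate one bit into three channels (hue, saturation, value)."""
    return (bit, bit, bit)

def decode_bit(channels):
    """Majority vote over the three channel readings."""
    return 1 if sum(channels) >= 2 else 0

cells = [encode_bit(b) for b in [1, 0, 1, 1]]
cells[1] = (0, 1, 0)   # a printing error flips one channel of cell 1
cells[2] = (0, 0, 0)   # a coffee stain wipes out all three channels of cell 2
decoded = [decode_bit(c) for c in cells]
```

The single-channel error in cell 1 is voted away, but cell 2 decodes wrongly because all three copies sit in the same spot on the page. That is the coffee-spill weakness in a nutshell.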


The color of the ink usually changes much faster, so it could be a problem to try to read it later. If the information is stored digitally, it is probably still possible to read it with the right calibration. But if the information is stored in analog form, it is a really big problem; for example: http://secrethistoryofstarwars.com/savingstarwars.html



And this: http://ronja.twibright.com/optar/

Check out the nice picture on their home page: http://twibright.com/ :-)


This is similar to "Paperkey - an OpenPGP key archiver": http://www.jabberwocky.com/software/paperkey/


I used to run backups onto VHS tapes using a PCI/ISA card adaptor. IIRC, you could get about 2GB on a tape, with a SVHS recorder, or 1GB on a normal drive. Something like this: http://en.wikipedia.org/wiki/ArVid

Of course, it was absolutely useless. Might as well have piped the data to /dev/null.


Was it not possible to read the data off of it?


It was, but I wouldn't have trusted it with anything important.


Surely if you can use colour as well you could store LOADS more data per A4 sheet than 80 Meg...


I would be worried about yellow inks fading into white.


No one ever said yellow :) Cyan/Magenta/Green or any other trio of darker and differentiated colors may be able to do the trick.


Good idea, but that may introduce some really thorny issues with color calibration.


Also, potential data-corruption issues with color laser printers that add spurious yellow watermarking dots [1], though perhaps a sufficiently robust error-correction scheme would be able to work around that.

[1] http://en.wikipedia.org/wiki/Printer_steganography


I guess this would be useful in the case of an apocalypse scenario involving mass EM pulses (assuming anyone is still around to build new computers with OCR to reload the code). Better use good acid-free paper stored at near-vacuum though.


If I can find a computer that survived the EM pulses to scan the paper, then most likely I don't need this paper backup anyway. On the other hand, if all my data is dead after the EM pulse attack, I wouldn't put too much faith in my chances of getting hold of any surviving computer.


Well, you could house your precious scanner in a Faraday cage along with a RepRap and other "reboot the world" tech.


Please elaborate; I'm intrigued...

A Faraday cage will protect against an EM pulse? http://en.wikipedia.org/wiki/Faraday_cage

RepRap is a self-reproducible 3D printer? http://reprap.org/wiki/RepRap

And so with the RepRap (and some power source/converter) inside a faraday cage (with other physical protections as well), you're saying that this is the ultimate backup system for worst case scenarios? Do you have any links to provide more information about such a setup?


You seem to have answered your own questions.

Yes, the Faraday cage protects against EM interference. But so does turning your computer off. Your main computer is fried? Pull out your old laptop and dust it off...


Would turning your computer off help? I can see it helping for normal EM interference, but in an apocalyptic EM pulse scenario I thought the induced current would be enough to damage components.


EM would not erase optical media like CDs either.


No, instead those degrade naturally on their own.

"CD-Rs are expected to have an average life expectancy of 10 years" -- http://en.wikipedia.org/wiki/CD-R#Expected_lifespan

CD/DVD experiential life expectancy is 2 to 5 years even though published life expectancies are often cited as 10 years, 25 years, or longer -- http://www.archives.gov/records-mgmt/initiatives/temp-opmedi...


Regarding degradation: CD-Rs degrade because you write them by using a laser to effect chemical changes, and the chemicals can break down. Commercially-produced CDs, on the other hand, are pressed; they are physically molded by being pressed against a "glass master", and the physical pits are much more durable. Making the glass master is expensive, but the incremental cost of pressing is tiny compared to burning CDs.

So, if you could figure out a way to etch CDs instead of burning them, or make the pressing process cheap enough, you could make very durable CDs. Or if you want to make lots of durable copies of one CD, you can do that now with a glass master.


It sounds like it is possible to make longer lasting CDs using a glass master, but it is only practical if I am making lots of copies. So, for purely archival purposes, optical media doesn't really have a process for extending longevity.


That's why I write all my crucial data onto stone tablets.

The write speed and data density are terrible, the drive, media and storage space cost a fortune, but dang if the data won't last a couple thousand years.


Discussions like these always bring me back to the "10,000 year clock" ... http://en.wikipedia.org/wiki/Clock_of_the_Long_Now


I heard someone did this years ago, but with the intention of sending files via fax


Do you recall the name of the project? I'm very interested in finding alternative implementations of this.


I can imagine how difficult source control would be when using paper: "Hey Fred, I need sheet 7 thru 9 from revision 2.6.3!" "Which filing cabinet is it in?"


So a bit like backup tapes, then? ;)


Yup, like a cassette with its tape spilling out. Just imagine dropping the sheaf of papers on your way to the scanner!


Hopefully the papers would have some sort of index on them, so that the order in which they were scanned would be irrelevant.

Then again, punch cards didn't.


Punch cards were not that hard to sort after dropping them.

Once you have a deck, draw something on the deck edges with a permanent marker, making sure that all the card edges have been marked.

If the deck is dropped, first of all it doesn't usually scatter that much (cards tend to stick to their neighbors), and the marks on the edge help you visually sort the deck pretty fast.


Actually punch cards could have indexes. Several programming languages did not use the last columns of the cards and reserved them for a number which could be used by card sorting machines.

https://en.wikipedia.org/wiki/Card_sorter


Punch cards were generally created by hand so the ability to cut and paste new punch cards was considered a significant advantage. This on the other hand is created by a computer and not human readable so a simple index is easy enough to add.


Interestingly, that's exactly what radix sort was invented for: a non-comparison sort in O(n) time.
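For the curious, a minimal Python sketch of a least-significant-digit radix sort over sequence numbers, mimicking a card sorter that drops cards into ten pockets by one column per pass. The deck contents and the three-column width are invented for the example.

```python
def radix_sort_cards(cards, digits):
    """LSD radix sort of (sequence_number, text) cards, one decimal column per pass."""
    for column in range(digits):
        pockets = [[] for _ in range(10)]      # one pocket per digit 0-9
        for card in cards:
            digit = (card[0] // 10 ** column) % 10
            pockets[digit].append(card)        # stable: keeps arrival order
        cards = [card for pocket in pockets for card in pocket]
    return cards

dropped = [(30, "PRINT X"), (10, "X = 1"), (120, "END"), (20, "X = X + 1")]
sorted_deck = radix_sort_cards(dropped, digits=3)
```

Each pass touches every card once, so the total work is the number of cards times the number of columns. That is linear in the deck size when the column count is fixed.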


That's why you striped your card decks.


That's dangerous for long-term reliability - you need at least RAID1!

("Redundant Array of Independent Dockets"?)


And ...

The striping actually is redundancy -- it's a redundant ordering indicator for the deck. The nominal ordering is ... the card order. But should that be disturbed, the stripe (a diagonal marking across the edge of the cards) will indicate which cards are out of sequence.

I first encountered this trick in a college library with a physical card catalog. Having been pranked once too often by students re-arranging the drawers, the fronts of the catalogs were crossed with multiple bands of different colored tape. Any out of sequence drawers could be immediately noticed and quickly sorted visually.
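The stripe check is easy to model in Python. This sketch assumes each card's mark sits a little further along the edge than the mark on the card before it, so any backwards jump flags a misplaced card. The offsets and names are invented.

```python
def stripe_breaks(mark_offsets):
    """Return deck positions where the diagonal stripe jumps backwards."""
    return [i for i in range(1, len(mark_offsets))
            if mark_offsets[i] < mark_offsets[i - 1]]

def restore_by_stripe(deck):
    """Reorder (mark_offset, card) pairs so the stripe is a clean diagonal again."""
    return sorted(deck, key=lambda pair: pair[0])

deck = [(0, "card A"), (1, "card B"), (4, "card E"), (2, "card C"), (3, "card D")]
breaks = stripe_breaks([offset for offset, _ in deck])
restored = restore_by_stripe(deck)
```

Here the only break is at position 3, where the stripe jumps back from offset 4 to offset 2. Sorting on the offsets puts card E back at the end.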


Striped decks make for a better pick-up line.


You could even implement RAID striping! For extra security, store the 'stripes' in different filing cabinets!


You could cheaply achieve RAID-6 this way! Data throughput might be abysmally slow, though... so don't use it for servers. ;)


I will be nothing less than disappointed if I do not see someone on HN implement 'papernode 0.1' by the end of this weekend. Maybe some more literal multithreading action. Why not do the whole system... servers from fax machine arrays!


I just thought about this today. Worst-case scenarios don't seem probable, though how devastated and delayed would life become if one did happen?


I thought we were supposed to be scanning all our important documents and shredding them?


"Olly, the author of OllyDbg, presents his new open source joke"

This page is 5 years old. Glad to see that it finally made it onto Hacker News.


Please tell me this is a troll? I mean, I know the author acknowledges the fact this is completely unnecessary, but still - I'm no environmentalist, but this seems to be an incredibly silly idea..


Less of a troll and more of a playful look at barcode style systems, including byte encoding, compression, error correction, format limitations, etc.

As the author says it's useful for a small number of very important files.


The very first line of the page introduces it as a joke


The author did all that work as a joke? I hardly think so.

I looked at the source code. It's 5,575 lines of original C code. (I excluded the AES encryption, bzip2, Reed-Solomon, and CRC16 source code from my count.)

I imagine that he wrote the words "open source joke" as a defense mechanism against people who would mock his efforts with lines like "please tell me this is a troll", "that is completely unnecessary", and "an incredibly silly idea".

Lots of people make self-effacing remarks when they introduce a work that they know will draw criticism: "It's just a first draft, it needs a lot more work." "I hacked it up during lunch [when it really took two weeks]." Etc.


> The author did all that work as a joke? I hardly think so.

Possibly not a joke per se. But have you never written something substantial just for fun?


Good catch, I didn't notice that at all.



