PaperBack: How to store arbitrary data on A4 sheets of paper (2007) (ollydbg.de)
131 points by ubutler 16 days ago | 41 comments

This seems like one of the most efficient ways to store/print read-only data on paper. Perhaps the data can be compressed with wrt+paq8px, so the amount of compressed (text) data can potentially be doubled to a total of 12*500 KB of books: 6 MB of text on one A4, so the KJV Bible could fit. https://www.mattmahoney.net/dc/text.html#1250

Or... a few high-res photographs of important moments or persons, a VR environment, or 45 minutes of speech or music @1.5 kbps with Vocos or Meta's EnCodec https://gemelo-ai.github.io/vocos/
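As a rough check on the audio figure: at a constant 1.5 kbps, 45 minutes comes to about 500 KB, in line with the ~500 KB per page mentioned in this thread. A minimal sketch, assuming a constant bitrate and no container overhead (the function name is illustrative, not from any codec's API):

```python
def audio_size_bytes(minutes: float, kbps: float) -> float:
    """Size of a constant-bitrate stream, ignoring container overhead."""
    bits = minutes * 60 * kbps * 1000  # kbps taken as 1000 bits per second
    return bits / 8

print(audio_size_bytes(45, 1.5))  # 506250.0
```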


Perhaps make it an Android, Linux or iOS app... even though running an old Windows binary on Android is feasible.

Or perhaps one should just use paper to fit a maximum of 75 KB of squashed, readable six-column text on a landscape A4 printed with Tahoma at the smallest print size. Perhaps use an alphabetic shorthand like bref, yash or briefhand for a human-readable 30% compression. Who needs more than 100 KB on an A4 anyway? That's one and a half months of journaling. Would be nice if I could get one year of human-readable personal insights/interactions/things completed on a double-sided A4 without advanced compression.

I spent way too much time on a project similar to this years ago.

I figured an 8.5x11 sheet of paper reduced to a printable 8x10.5. At 1000dpi that is 84 megabits of data you can spit out. Way more if you use color: x5 if you just use the standard colors (white, black, cyan, yellow, magenta), so 420 megabits. That comes out to about 52.5 megabytes.
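The arithmetic behind those numbers can be sketched as follows. This is only a back-of-the-envelope model, assuming every printed dot is read back perfectly; the function names are made up for illustration. The last line shows a stricter reading of "five colors", where each dot is one five-valued symbol and carries log2(5) bits rather than 5 bits.

```python
import math

def raw_capacity_bits(width_in, height_in, dpi, bits_per_dot=1.0):
    """Raw bits on a page if every printed dot carried bits_per_dot bits."""
    dots = (width_in * dpi) * (height_in * dpi)
    return dots * bits_per_dot

def megabytes(bits):
    return bits / 8 / 1e6

mono = raw_capacity_bits(8, 10.5, 1000)            # 84,000,000 bits
times_five = raw_capacity_bits(8, 10.5, 1000, 5)   # the "x5" multiplier from the comment
five_symbols = raw_capacity_bits(8, 10.5, 1000, math.log2(5))  # one 5-valued symbol per dot

print(megabytes(mono))        # 10.5
print(megabytes(times_five))  # 52.5
print(round(megabytes(five_symbols), 1))
```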

Now realistically, you will probably have a pretty awful scanner (100-300dpi); you can get better, but they cost more. Also your scanner has to be spot on the grid to get the full amount, and your print heads need to be exactly calibrated. So that means some sort of slop in the grid, usually by lowering the DPI and making the pixels bigger, plus some sort of alignment reticle, plus some sort of data recovery (like in CDs). That wildly reduces the amount of actual data you can get in there. Oh, and then fun things like using an inkjet printer and its ink bleed. I was lucky to get about 2-8 megabytes with my methods.

I stopped playing with it. The ink got too expensive for me to keep playing with it. A box of 10 reams of paper of 1000 sheets each put it in the 40-80GB per box range as you could use both sides of the paper. And wildly slow to get back out. But a fun experiment.
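For scale, the box estimate works out roughly like this. A small sketch, assuming decimal units (1 GB = 1000 MB) and the 10,000 sheets per box stated above; the function name is illustrative. The 40-80 GB range corresponds to about 2-4 MB per side, and the upper 8 MB figure would give 160 GB.

```python
def box_capacity_gb(sheets: int, mb_per_side: float, sides: int = 2) -> float:
    """Total capacity of a box of paper in GB, assuming 1 GB = 1000 MB."""
    return sheets * sides * mb_per_side / 1000

sheets_per_box = 10 * 1000  # 10 reams of 1000 sheets, as in the comment

for mb in (2, 4, 8):
    print(mb, "MB/side ->", box_capacity_gb(sheets_per_box, mb), "GB")
```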

I've got this stuck in my head a bit now and I'm wondering if you have any thoughts on my idea.

I feel that instead of trying to encode data directly in individual dots, it would be better to encode data as small squares (or overlapping circles) of varying intensities. So a slower symbol rate but with more symbols. Using color you could have a 4D (C M Y K) space to distribute symbols into and probably hit some pretty high densities.

This way you're less reliant on how precisely your printer/scanner can place and read a dot (which is not what they're meant to do) and more reliant on how well they can produce an image (which they are meant to do).
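To get a feel for the trade-off, here is a rough sketch of that idea. The cell size, number of intensity levels, and page dimensions are all assumptions picked for illustration, and the model ignores error correction and alignment marks. Each cell carries `channels * log2(levels)` bits.

```python
import math

def bits_per_cell(levels_per_channel: int, channels: int = 4) -> float:
    """Bits carried by one cell with independent intensity levels per ink channel."""
    return channels * math.log2(levels_per_channel)

def page_capacity_bytes(width_in, height_in, dpi, cell_dots,
                        levels_per_channel, channels=4):
    """Raw page capacity when data is stored in square cells of cell_dots x cell_dots."""
    cells_x = int(width_in * dpi) // cell_dots
    cells_y = int(height_in * dpi) // cell_dots
    return cells_x * cells_y * bits_per_cell(levels_per_channel, channels) / 8

# Hypothetical setup: 8x10.5 in, 600 dpi, 8x8-dot cells, 4 levels per CMYK channel
print(bits_per_cell(4))                         # 8.0
print(page_capacity_bytes(8, 10.5, 600, 8, 4))  # 472200.0
```

Whether a real printer and scanner can reliably distinguish that many levels per channel is exactly the open question; the numbers only show how the density scales.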

not a bad idea... could work.

Also the child to yours has a decent link to using masking. That way you could encode 'color filters' into a few different spaces and get the most of the color ranges.

The more dimensions you can pack in, the higher your bit rate. For example, I could have 256 glyphs, just black and white. But if I rotate them (and make sure all the glyphs are rotatable) I can at least 8x the amount of data packed into one location (just using 45-degree rotations). Color for me was just another dimension. You could pack more by breaking the glyphs into 4 regions (or more) and varying the colors. There are all sorts of interesting dimensions you can pack in there.
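One way to count what each extra dimension buys: every independent dimension multiplies the number of distinguishable symbols, and the bits per position grow with the base-2 logarithm of that count. A small sketch (the function name is illustrative) showing that 8 rotations make the alphabet 8 times larger, which adds 3 bits per glyph position:

```python
import math

def bits_per_glyph(glyphs: int, rotations: int = 1, colors: int = 1) -> float:
    """Bits per glyph position when shape, rotation and color vary independently."""
    return math.log2(glyphs * rotations * colors)

print(bits_per_glyph(256))                         # 8.0
print(bits_per_glyph(256, rotations=8))            # 11.0
print(bits_per_glyph(256, rotations=8, colors=4))  # 13.0
```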

Wow, so with state of the art text compression and error correction, that could conceivably reach 400 megabytes. That'd be all my paper books!

Now if I print my text with the smallest readable font I could fit around 500 kilobytes of human-readable text on an A4, and more with shorthand compression. The same as with PaperBack uncompressed. That has the advantage of not needing the software, power, a camera or a computer. Just a magnifying glass, good eyes, functioning cognition and light.

Megabits, not bytes. It is about 50 megabytes of raw data. You will lose most of it to paper alignment and Reed-Solomon error correction. The 52MB is super optimistic, and only holds if you can get the paper to align correctly and read back the data extremely accurately. I too started with glyphs like you did. But if you can get more colors and just treat them as pixels, you will probably approach the maximum of the medium, at which point the Shannon CS papers kick in hard. For something like a 10pt font and 1000 dpi, you could be looking at about 3200 glyphs per sheet. Smaller is better, obviously, but just to ballpark it that is about 80x40 glyphs. Using 256 different glyphs that is about 1MB of data, probably within most printers'/scanners' capabilities. 4x more if you use colors (not 5, as you cannot use white), more if you do things like multi-color glyphs. You may run into the problem of getting the OCR to get the 256 glyphs right, though.

Give it a shot. It was fun to mess around with for a few weeks. Until I realized I was basically coating sheets of paper with expensive ink for 2-5MB of data.

OP's information density estimations seem quite optimistic to me. For 2D barcodes such as QR codes, it is generally recommended to use 4×4px blocks or equivalent. So assuming a 600dpi printing and scanning device you would have to encode your data at "150 dpi" or so.
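The effect of that rule of thumb on raw capacity can be sketched like this. The 8x10.5 inch printable area is borrowed from another comment in this thread and is only an assumption here, as are the function names; the result is before any error correction or finder patterns.

```python
def effective_dpi(device_dpi: float, px_per_module: int = 4) -> float:
    """Data resolution when each module is px_per_module device pixels wide."""
    return device_dpi / px_per_module

def raw_modules(width_in, height_in, device_dpi, px_per_module=4) -> int:
    """Number of one-bit modules that fit on the printable area."""
    per_inch = device_dpi // px_per_module
    return int(width_in * per_inch) * int(height_in * per_inch)

print(effective_dpi(600))                # 150.0
modules = raw_modules(8, 10.5, 600)
print(modules, "bits =", modules // 8, "bytes")  # 1890000 bits = 236250 bytes
```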

It says at the top "presents his new open source joke", but I actually think this sounds genuinely useful. So many other media formats become unsupported by modern operating systems. Cloud services get shut down left and right. Tapes and floppies and spinning rust get demagnetized. Optical media has a lifespan too. We have paper from hundreds of years ago that is in decent condition.

> Optical media has a lifespan too. We have paper from hundreds of years ago that is in decent condition.

All the optical media from hundreds of years ago degraded? What a shame

I was curious how this would compare to QR codes:

TFA suggests that consumer scanners and printers can successfully work at a minimum of 200DPI. A QR code can store almost 3kB in a 177x177 matrix, which allowing for some padding nicely fits in a square-inch @200DPI.

This would allow approximately 285485 bytes per page before compression. For better error correction, you could up the ECC in the QR codes, or you could apply Reed-Solomon to the data before encoding. The latter has the advantage of correcting errors that take out most of a single QR code.
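That estimate can be reproduced approximately by multiplying the A4 area in square inches by the byte capacity of one version 40 QR code at the lowest error correction level (2953 bytes in byte mode). A sketch under those assumptions, with illustrative function names; the second function counts only whole one-inch codes, which is what would physically fit.

```python
QR_V40_L_BYTES = 2953    # version 40 (177x177 modules), EC level L, byte mode
A4_MM = (210, 297)
MM_PER_INCH = 25.4

def capacity_by_area(bytes_per_sq_inch: int = QR_V40_L_BYTES) -> float:
    """Page capacity if every square inch of A4 held one full QR code."""
    w_in, h_in = (d / MM_PER_INCH for d in A4_MM)
    return w_in * h_in * bytes_per_sq_inch

def capacity_whole_codes(bytes_per_code: int = QR_V40_L_BYTES) -> int:
    """Page capacity counting only whole 1x1 inch codes (8 x 11 on A4)."""
    w_in, h_in = (int(d / MM_PER_INCH) for d in A4_MM)
    return w_in * h_in * bytes_per_code

print(round(capacity_by_area()))  # close to the ~285 KB quoted above
print(capacity_whole_codes())     # 259864
```

At one dot per module, 177 modules at 200 DPI take 0.885 inch, which is where the room for padding comes from.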

I did something similar a few years back. Basically converted binary data to a matrix and printed it at max res (1200dpi inkjet), and also wrote the software to read it like a QR code (with perspective & error correction etc). You can easily save several MB per page. The theoretical number is much, much higher for color printing, but it gets very complicated (inkjets are basically stochastic with regards to dots, YMCK balance is finicky etc). Also, some time after that project I noticed that some gov agencies use printed "QR" codes to send sensitive data (i.e. biometrics) from office to office in the traditional mail format. It works, but so do floppy disks, and nobody's using them anymore for a reason. It was a cool project though.

HCCB is like a color QR

Someone should reimplement this for use with cheap laser engraving machines so we can make physical backups on any material like stone, wood, stainless steel, or plastic.

How cheap are laser engraving machines? For ultra critical keys and generationally important data this sounds like a really good idea. I use M-disc currently but an engraved A4 or Letter size scannable durable material seems like the best option if it is available.

Edit: would this work?


Since it seems to accept an image, the question is what material would be best to run it on and then have it be scannable well on a standard flat bed scanner, which almost certainly will be available in some form as long as our current civilization survives.

You can get table top models for 1-2k.

That’s what I was thinking of

Any opinions on the best material to be scanned later on a flatbed?

Not sure but if you want to use a flat bed scanner then maybe if the engraving is deep enough you could just ink the plate and then press a sheet of paper onto it and scan the paper instead.

And not a single photo of a resulting printed page. Why do developers absolutely loathe showcasing their final product?

I agree with you! I see this so often that I often wonder what's going on in the minds of the developers.

You put so much effort and invest so much time to write software and documentation - but in the end not enough for an ELI5 with a picture or a simply descriptive text.

Natural history collections may still be interested in such. They merge the concept of needing small-footprint labels (e.g. 4 point Times) that go under the physical specimen. The label must not take a lot of space, so for 2mm long insects you have maybe 4-5 lines of ~30 characters. We've long used QR or 128 codes to encode a small barcode of the catalog number for that specimen. The hotness now is the "digital twin" or "digital extended specimen" for these physical specimens. Perhaps cheaply encoding some key attributes on another label would let the physical part of the twin track the evolution of the digital twin. The key here is that these collections' goal is protection of the specimen over time. We've been doing it since Linnaeus (over 250 years), so archival considerations (acid-free paper and ink) and persistence of data beyond the initial layer are important.

>You may ask - why? Why, for heaven's sake, do I need to make paper backups, if there are so many alternative possibilities like CD-R's, DVD±R's, memory sticks, flash cards, hard disks, streamer tapes, ZIP drives, network storages, magnetooptical cartridges, and even 8-inch double-sided floppy disks formatted for DEC PDP-11?

Ah, the good old days! It is amazing what we have at our disposal now for very little money. I really do like the concept behind this however. It catches you off guard because you do think, just as they wrote, "why in the world would I use paper for a backup in 2024?" ... then they explain it in one sentence and it makes obvious common sense.

Is it? If you wanted to store something for 50 years, which would you choose? 100 years? 1000? A million? It seems hard to beat engraving in stone. Possibly fill the grooves with something. 3D printing is perhaps good enough.

Sounds like you might have a start-up idea?

Supposedly Github Arctic vault was stored in giant 2d barcodes on film reels https://en.wikipedia.org/wiki/Boxing_barcode

I am reminded of a chat I had with the founder of https://www.clump.org/ , trying to do paper backups as a service.

Interesting one, also for storing records long term.

Paper isn't ideal but if it's stored properly and there's no fire / water damage, it'll last for a hundred years easily, after that I believe paper gets fragile etc. But I'm sure there's modern day manufacturing and storage processes that can make it longer lasting, like including plastics or something in the paper, cool environment, and draining oxygen from the environment or replacing air with nitrogen, which can be done passively (a box type storage unit, fill with nitrogen, no pressure needed as nitrogen is heavier than air I believe).

But anyway, that's a lot of effort for relatively data-loose (not-dense) information and a lot of ideas for an industry (physical archiving) that's been around for hundreds if not thousands of years already.

For long term storage you could laser etch metal or stone plates. Then when you wanted to scan a copy you would first print the etched plates with ink onto paper.

Surely the manufacturing process for paper uses more carbon than was extracted from the atmosphere by the former tree? Or have I misunderstood the idea here/it's not regular paper?

Paper manufacturing isn't very carbon intensive. Paper mills for example look great on environmentalist scare-propaganda, but the big clouds of smoke are almost 100% water vapor :P The mills themselves burn lignin for power and even export it.

That's a very fun idea. I would really love for the blog to get filled with content because the economics of it fill me with questions

I'd love to see how mechanized you can get it as well

I back up my private SSH keys by printing them as QR codes.

I liked the cryptocurrency wallet idea of writing down a set of 10 or so words on a piece of paper.

I mean that ultimately won't matter because cryptocurrencies are unusable when society collapses, but still.

> cryptocurrencies are unusable when society collapses, but still.

Nah, I'm sure you could still scam a few people out of their rations with it.

It would be enough to write any hex-looking string in monospace font on a paper, though. No need for it to be backed by a "real" wallet.

The example image linked in that thread is 404ing, and I think I found a working equivalent: https://i.extremetech.com/imagery/content-types/00mAYSZIZNhu...

(it comes from this article https://www.extremetech.com/extreme/134427-a-paper-based-bac... )

"Grid should be more or less parallel to the sides of the scanner (maximal angle must not exceed ±7°), but general orientation is unimportant: portrait, landscape, upside down or even, if you use transparencies, flipped. Orientation may change from one paper sheet to another."

Transparencies. So technically not just paper!

Same name, same idea, better-looking implementation: https://github.com/cyphar/paperback

Before I clicked I assumed it was just going to be a pencil.
