The Cuneiform Tablets of 2015 [pdf] (vpri.org)
147 points by akavel on April 16, 2016 | 34 comments



One of the sad consequences of perpetual copyrights is that copies of interesting items are not being made, and hence risk getting lost completely.


See the graph on this page; it's very sad: https://www.techdirt.com/articles/20120330/12402418305/why-m...

Also many old films have been lost for similar reasons.


A good example of this is the early iOS game Rolando. Once the Angry Birds of its time, it's now just... gone.



Yes. I remember it from 2008, when gaming press for iOS was trying to become a thing. Some guys published great and expensively produced video reviews of iOS games, and this was one of the best games they ever reviewed.


As someone who doesn't care about proprietary 80s laserdiscs but does care about software produced in 2016, 'depend on the oldest still-used technology' is an interesting (non-horrible) design choice.

The Python PEP for universal Linux binaries advises building on an ancient version of Fedora for ABI compatibility.

If you think about it, supporting many different deployment targets is kind of like time travel.


The "ancient version of fedora" is the OS I work on for one of my deploys. It takes a 3-level bootstrap to get a working C++11/C11 compiler.


OK, so there are downsides to being a time traveler. But build-system blues are a small price to pay for passing your final history exam, saving your parents' prom, or finding Carmen Sandiego.


They mention Lincos; there's another project inspired by it, CosmicOS, that attempts to transmit Scheme: https://cosmicos.github.io/


As a side note, I like the (A B | C | D) and $A ideas from it; they would be some nice syntactic sugar for an actual Lisp that we use today.


> The Computer History Museum, run by “hardware guys,” has an extensive and impressive collection of vintage hardware of all kinds and from all eras, but it sits there as a collection of lifeless beige boxes.

And yet The Centre for Computing History in Cambridge has its exhibits powered up - including a collection of several Domesday Machines.


I think this is the point of the Living Computer Museum - http://www.livingcomputermuseum.org/


...and that's why I like to shoot film. It seems improbable that JPG or TIFF will go obsolete any time soon, but—someday—they will. The RAW files from my digital cameras are far less likely to be readable ten years from now.

But a negative or a positive image? Not a problem. Never gonna be unreadable. There's a time and place for both formats, just like there is a time and a place for both a Kindle book and a real paper book, and it's important that we think carefully about which one the circumstances call for: convenience or long-term survival.


Pragmatism and realism are important.

Some weirdo RAW format? Will probably suck to deal with. Already sucks to deal with. Don’t expect it to last long, and maybe keep the JPGs around for your grandkids, or rather your great-great-great-grandkids.

JPG? I’m quite willing to bet that, barring the apocalypse, this format will be easy to read in 100, 500 or 1000 years. I’m really that certain about it. I’m even certain that in 100 or 500 years whatever OS on whatever device will come with out-of-the-box support for JPGs. Maybe you will need an afternoon of digging to get JPGs displayed in 1000 years … but that should be about it.

Ok, ok, those predictions are quite absurd, but I really would be wary of underestimating the existing and continuing ubiquity of the JPG format. There are probably billions of JPGs created every day, even if you limit it just to JPGs that result from light hitting some sort of image sensor. And that will continue for at least a couple decades. In the end a, what, maybe 80 to 100 year active lifetime of the format (remember, we are already at a quarter century) might seem short in the grand scheme of things, but the cost of just keeping support around will be so low … (again, barring the apocalypse)

My OS can open some weird-ass image formats right out of the box. None of those formats (no other image format, period) has ever been as popular or as widespread as JPG … by a wide, wide margin. If nothing else, JPG is gonna be ok.

So I wouldn’t worry about the format. I would worry about what you use to store those images and how you will hand it down. (Often analog photos sit in some attic or similar place somewhere, often long forgotten even by the person who took them. Then someone stumbles across them and that is beautiful. Not so easy to recreate with digital images. How do you stumble across that? How do you just keep it around somewhere safely?)

You are certainly right that film negatives or positives are more likely to survive the literal apocalypse (where we lose all knowledge about using computers), but I’m not so sure about them being more likely to survive into the deep future … (Though in some ways you do have a point: I think in some ways it’s more likely that your grandkids will stumble across and know how to deal with your box of negatives in the attic than, I don’t know, your old hard drive? Your old PC? How do you even keep such stuff around?)

But my point is this: JPGs are easy. Move beyond that and it gets hard. Any kind of interactive content is a nightmare.


Having a lot of faded old photos, I am not convinced of their longevity.


My fiber-based prints, properly fixed, washed and toned, are likely to survive for 150+ years.


Famous last words :-)

But seriously, how many photos does anyone bother to preserve on archival quality materials? I've lost family photos due to fading, floods, and simply losing them. I've digitized most of the remainder, in order to preserve them.


For me? All of them that matter. I mostly print on gelatin silver paper in a darkroom, and I don't use digital inkjet paper that contains OBAs.

I have digital copies, of course, too, and those are stored on my iMac, on a backup hard drive, in Backblaze, and on S3.


> But a negative or a positive image? Not a problem. Never gonna be unreadable.

This is pretty dishonest. Images fade; that's not a different problem from formats being lost, except to the extent that an image in a hard-to-read format might still be recoverable. For example, TIFF is uncompressed pixel color values; you could recover that easily without having a TIFF reader or even knowing what TIFF was.


A properly developed, stopped, fixed, washed black and white negative stored in a box in reasonable temperature conditions isn't going to fade for a very long time.


And a digital image with a text file accompanying it describing the image format is similarly resilient. Heck, you could provide text describing the format within the image file.
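
A minimal sketch of that idea (the file name, dimensions, and layout here are hypothetical, just to illustrate): put a human-readable preamble in the file itself, then a blank line, then the raw pixels.

    #include <stdio.h>

    /* Hypothetical self-describing image dump: an English preamble, a blank
       line, then raw 8-bit RGB pixel data. */
    int main(void)
    {
        int ww = 4, hh = 4;                 /* stand-in dimensions */
        unsigned char px[4 * 4 * 3] = {0};  /* stand-in pixel data */
        FILE *f = fopen("photo.img", "wb");
        if (!f) return 1;
        fprintf(f, "After the first empty line: %d x %d pixels, row-major,\n"
                   "top-left first, 3 bytes per pixel (red, green, blue, 0-255),\n"
                   "no compression, no row padding.\n\n", ww, hh);
        fwrite(px, 1, sizeof px, f);
        fclose(f);
        return 0;
    }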

Also, I'll note that the comment I responded to specifically claimed that positive images last forever.


The problem isn't that it's digital. The problem is where do you store it? Hard drives, CDs, floppy disks? All of that stuff has a very limited lifetime, after which it simply stops working. And even if it continues to work, what are your chances of being able to buy a CD reader in five to ten years?

The big difference between analog and digital is that with analog, you can simply stuff it in a box and it will stay there. With digital, you have to take the time every now and then to make sure that you don't lose that stuff. And do you really think that you (or someone else, when you are gone) will be doing that properly? I mean, you will most likely have forgotten why that stuff was even "important" in a few years.

There's so much great stuff our generation is going to lose, simply because we'll forget to migrate our backups.


Maybe you should use PPM P6 format instead of JPEG, TIFF, RAW, or analog. Here's the implementation I used in My Very First Ray Tracer:

    /* PPM P6 file format; see <http://netpbm.sourceforge.net/doc/ppm.html> */
    #include <stdio.h>

    /* assumed vector type, reused for colors: x, y, z hold r, g, b (see note below) */
    typedef struct { double x, y, z; } color;

    static void
    output_header(int ww, int hh)
    { printf("P6\n%d %d\n255\n", ww, hh); }

    static unsigned char
    byte(double dd) { return dd > 1 ? 255 : dd < 0 ? 0 : dd * 255 + 0.5; }

    static void
    encode_color(color co)
    { putchar(byte(co.x)); putchar(byte(co.y)); putchar(byte(co.z)); }
(I was using x, y, and z to hold the r, g, and b components of the color, respectively.)
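
For example, a tiny test driver (hypothetical, not part of the ray tracer) that writes a 256x256 gradient to stdout:

    int main(void)
    {
        int ww = 256, hh = 256, x, y;
        output_header(ww, hh);
        for (y = 0; y < hh; y++)
            for (x = 0; x < ww; x++) {
                color co;
                co.x = x / 255.0;
                co.y = y / 255.0;
                co.z = 0.5;
                encode_color(co);
            }
        return 0;
    }
Redirect stdout to a file (./a.out > test.ppm) and most image viewers will open it as-is.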


Searchable online version of the BBC Domesday data:

http://www.bbc.co.uk/history/domesday


Excellent, except it's half missing as the article mentioned. This turned out to be a dramatic demonstration of how difficult it is to preserve data even when you're trying and you're as big as the BBC.

"You may remember collecting data for the 'National Disc' which, unfortunately, we have not been able to re-publish at this time."


The other day I was thinking about what would be required to preserve all scientific knowledge. A great many papers have been made available by hackers, but the file sizes are quite large. Many of them are scanned documents.

Raw text itself is pretty compressible, but PDFs record everything about the layout and typesetting, font choice, small smudges on the page, etc. You could maybe run OCR on them and get the text (if the OCR is reliable enough). But then you lose equations, figures, unusual symbols, and other important info.


There has been some effort in Project Gutenberg to typeset public domain books (and at least one journal issue) in mathematics: http://www.gutenberg.org/wiki/Mathematics_%28Bookshelf%29 Doing this for all of the scientific literature is rather daunting, however.


Cuneiform Tablets vs 2013/2014:

http://imgur.com/IaiqCXX


Yes, but the storage space of the clay tablet is a dealbreaker: it could maybe store 1 KB. And the file format has been lost, and is only usable by historians who have spent decades deciphering it.


This paper is a major contribution to digital archival. I'm embarrassed that I hadn't read it until now. Thanks, HN!

I've been thinking a lot about how to solve this problem myself.

Kay says their Smalltalk virtual machine for the 8086 was 6 kilobytes of machine code; I think we can do several times better than that. The most recent BF interpreter I wrote, in 2014, http://canonical.org/~kragen/sw/aspmisc/brainfuck.c, is a bit over a page of code, and with -Os, it compiles to 863 bytes of i386 code (768 bytes .text, 38 bytes .init, 23 bytes .fini, 34 bytes .rodata). Maybe the ideal archival architecture would take a little more code than BF to interpret, because an actually efficient BF implementation has to do all kinds of somewhat unpredictable optimizations.
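
For a sense of scale, a naive sketch of such an interpreter (not the brainfuck.c linked above, and with none of those optimizations, so take it as illustrative only) fits in a few dozen lines of C:

    /* Naive BF interpreter sketch: 30000-cell tape, brackets matched by
       scanning at run time, no optimization, no error checking. */
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        static char prog[65536];
        static unsigned char tape[30000];
        unsigned char *p = tape;
        long n, i, depth;
        int c;
        FILE *f = argc > 1 ? fopen(argv[1], "r") : NULL;
        if (!f) return 1;
        n = fread(prog, 1, sizeof prog - 1, f);
        fclose(f);
        for (i = 0; i < n; i++) {
            switch (prog[i]) {
            case '>': p++; break;
            case '<': p--; break;
            case '+': (*p)++; break;
            case '-': (*p)--; break;
            case '.': putchar(*p); break;
            case ',': c = getchar(); if (c != EOF) *p = c; break;
            case '[':          /* cell is zero: skip forward to matching ] */
                if (!*p)
                    for (depth = 1; depth; )
                        depth += prog[++i] == '[' ? 1 : prog[i] == ']' ? -1 : 0;
                break;
            case ']':          /* cell is nonzero: jump back to matching [ */
                if (*p)
                    for (depth = 1; depth; )
                        depth += prog[--i] == ']' ? 1 : prog[i] == '[' ? -1 : 0;
                break;
            }
        }
        return 0;
    }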

Some kind of Forth machine, like Calculus Vaporis https://github.com/kragen/calculusvaporis, is one possibility.

Another would be a simple register machine, something like the PIC; also in 2014, I wrote a proposal for a nearly-MOV-machine version called "dontmove" at http://canonical.org/~kragen/sw/aspmisc/dontmove.md. The C implementation at http://canonical.org/~kragen/sw/aspmisc/dontmove.c compiles to 855 bytes of i386 machine code (38 .init, 784 .text, 23 .fini, 10 .rodata) and is dramatically more efficient than a simple BF implementation, and unlike BF, it has features like memory indexing and subroutine calls; but it's not as well tested.

Each of BF and Dontmove took me about half an hour to implement. Even though a simple Dontmove implementation is exponentially faster than a simple BF implementation, it's still orders of magnitude slower than native code. I'm still exploring how to better bridge that gap; ideally implementing the virtual machine wouldn't take the entire afternoon that is the goal of Chifir, and it wouldn't run as slow as Chifir. I suspect that some kind of SIMT architecture (like GLSL and GPUs in general) might be the right path, allowing a simple emulator to amortize interpretation overhead over many lanes of data. I expect Alan would be allergic to this idea.

As mentioned in the Cuneiform paper, Lorie and van der Hoeven have published some papers on what they call a Universal Virtual Computer, directed at archival, but unfortunately some of the design decisions in the UVC run strongly counter to the goal of ensuring that from-scratch implementations have a good chance of being compatible: bignum registers, for example, and complicated fundamental operations like float division. The consequence is that writing emulators to run on the UVC should be very easy, but no two implementations of the UVC will be compatible, so those emulators will not run successfully on new implementations of the UVC written after we are all dead.

An issue barely mentioned in this paper is I/O devices. You'll note, for example, that Chifir has no mouse and no real-time clock, although it does have a keyboard and a framebuffer; its keyboard interface is somewhat underspecified but appears to lack control, alt, or other similar modifier keys, and there are no key-release events, and as abecedarius points out, reading the keyboard is blocking. This means that it will be impossible to write simple video games for Chifir that move your guy only as long as you hold down a key, and the lack of a real-time clock means that it can't do animations at a constant speed. It's likely that the specification and implementation of I/O devices for an archival virtual machine will require as much effort as that of the CPU.


Somehow my brain turned off for the logo & the authors and I just started reading the main text. So somewhere in section 5, I started thinking, "Wow, this is great, it's like several talks I've heard from Alan Kay. ... ... Wait a minute..." Yep, he's the coauthor.


But there is a mere 3-4 decade gap separating (1) when incomprehensibly complex but working computer systems are created and (2) when sufficiently powerful reverse-engineering systems can simulate those obsolete systems.


I went to UCLA with one of the authors, weird to see this pop up on HN.


They wrote a paper with Alan Kay, why wouldn't it pop up on HN?



