
A 1970s Cray-1 hard drive has been imaged - 1880
http://blog.archive.org/2011/09/03/hard-drive-archaeology-and-hackerspaces/
======
ilamont
About four years ago I interviewed Ray Kurzweil, and asked him about the issue
of old data formats. In _Singularity_ or one of his earlier papers or books,
he had lamented the inaccessibility of old data, which prompted me to ask
about the role of standards. His response:

 _We do use standard formats, and the standard formats are continually
changed, and the formats are not always backwards compatible. It's a nice
goal, but it actually doesn't work.

I have in fact electronic information that in fact goes back through many
different computer systems. Some of it now I cannot access. In theory I could,
or with enough effort, find people to decipher it, but it's not readily
accessible. The more backwards you go, the more of a challenge it becomes.

And despite the goal of maintaining standards, or maintaining forward
compatibility, or backwards compatibility, it doesn't really work out that
way. Maybe we will improve that. Hard documents are actually the easiest to
access. Fairly crude technologies like microfilm or microfiche which basically
has documents are very easy to access.

So ironically, the most primitive formats are the ones that are easiest.

So something like acrobat documents, which are basically trying to preserve a
flat document, is actually a pretty good format, and is likely to last a
pretty long time. But I am not confident that these standards will remain. I
think the philosophical implication is that we have to really care about
knowledge. If we care about knowledge it will be preserved. And this is true
knowledge in general, because knowledge is not just information. Because each
generation is preserving the knowledge it cares about and of course a lot of
that knowledge is preserved from earlier times, but we have to sort of re-
synthesize it and re-understand it, and appreciate it anew._

Source:

[http://blogs.computerworld.com/the_kurzweil_interview_contin...](http://blogs.computerworld.com/the_kurzweil_interview_continued_portable_computing_virtual_reality_immortality_and_strong_vs_narrow_ai)

~~~
ilamont
Related thought: The "Rosetta Project" (<http://rosettaproject.org/>) aims to
build a publicly accessible library of human languages. Besides the website,
part of the original concept was to seed the world with "Rosetta Discs", metal
spheres with an approximately 2,000-year lifespan, etched with optically
readable samples of thousands of written languages. Future humans who found
a disc would be able to review and understand the dead languages on it as
long as at least one of the languages on the disc was still known. The website
(<http://rosettaproject.org/disk/concept/>) describes how it would work:

 _The Disk surface shown here, meant to be a guide to the contents, is etched
with a central image of the earth and a message written in eight major world
languages: “Languages of the World: This is an archive of over 1,500 human
languages assembled in the year 02008 C.E. Magnify 1,000 times to find over
13,000 pages of language documentation.” The text begins at eye-readable scale
and spirals down to nano-scale. This tapered ring of languages is intended to
maximize the number of people that will be able to read something immediately
upon picking up the Disk, as well as implying the directions for using it—‘get
a magnifier and there is more.’

On the reverse side of the disk from the globe graphic are over 13,000
microetched pages of language documentation. Since each page is a physical
rather than digital image, there is no platform or format dependency. Reading
the Disk requires only optical magnification. Each page is .019 inches, or
half a millimeter, across. This is about equal in width to 5 human hairs, and
can be read with a 650X microscope (individual pages are clearly visible with
100X magnification)._

The disk still seems to be a work in progress, but the Rosetta project is
concentrating on many Internet and audio-based initiatives (see
<http://www.nytimes.com/2011/07/29/us/29bcculture.html>).

~~~
arjunnarayan
I like the way the disk begins in eye-readable scale and tapers down. But why
not continue this even further? At some point, when enough exposition has been
done so that the corpus is fairly understandable, you could then introduce the
concept of binary encoding. Then, after a few short examples and mappings,
you could continue the rest of the disc in binary form.

One could theoretically take this further by then explaining how we built our
primitive computers, some simple math, and continue.

I would love an accompanying Rosetta project that was in just one language
but laid out our understanding of math, physics and computer science. A
civilization that discovered the twin discs could use the first one to
learn English (as long as they knew or could decipher at least one of the
languages) and the second one to reconstruct that understanding, rebuild a
2000 AD-era computer, and finally feed it a tar.gz dump of all of Wikipedia.

------
epo
Gosh, some actual "hacking" in Hacker News.

Just waiting for the comments asking "what does this have to do with startups"
...

~~~
4ad
I'd like to see more hacking news and less stuff about startups as well.

Great article. It's incredible that so little Cray-1 software has been preserved.

------
michael_dorfman
Oddly enough, I dreamt last night about the old 9-track, half-inch,
reel-to-reel tapes we used to use, back in the day, and woke up wondering how
much a reel actually held.

Looks like it was 140MB at highest density.
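
For anyone curious where a number like that could come from, here is a rough sketch of the arithmetic. Every figure in it is my assumption rather than something from the article: a 2400 ft reel, 6250 bytes per inch as the highest density, and a 0.3 inch gap between records. The block size is also just a guess, and it matters a lot, because every block pays for one gap.

```python
# Rough capacity estimate for a 9-track reel. All constants are assumptions.
TAPE_LENGTH_IN = 2400 * 12   # assumed 2400 ft reel, in inches
DENSITY_BPI = 6250           # assumed highest density, bytes per inch
GAP_IN = 0.3                 # assumed inter-record gap at that density, in inches

# Upper bound: the whole tape is data, with no gaps at all.
RAW_BYTES = TAPE_LENGTH_IN * DENSITY_BPI


def reel_capacity_bytes(block_bytes):
    """Usable bytes when the tape is written in fixed-size blocks."""
    inches_per_block = block_bytes / DENSITY_BPI + GAP_IN
    blocks = int(TAPE_LENGTH_IN / inches_per_block)
    return blocks * block_bytes


if __name__ == "__main__":
    print(f"raw, no gaps: {RAW_BYTES / 1e6:.0f} MB")
    for size in (512, 4096, 8192, 32768):
        print(f"{size:>6}-byte blocks: {reel_capacity_bytes(size) / 1e6:.0f} MB")
```

Under these assumptions the gap overhead shrinks as blocks get larger, so the usable capacity depends heavily on how the tape was written.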

Now, excuse me a moment, while I try to get those pesky kids off of my lawn.

~~~
sgt
I think that is the type of dream HAL would have.

------
ericHosick
Specs on storage module drive:
[http://ia600707.us.archive.org/32/items/9760_9762_oct73/9760...](http://ia600707.us.archive.org/32/items/9760_9762_oct73/9760_9762_Oct73.pdf)
.

The unit had an average seek time of 30ms in 1973. Today, in common 7200 RPM
drives, it seems to be around 8-9ms. That is an improvement of only a factor
of 3 to 4.

I find this pleasantly surprising.

~~~
mjb
Latency has been scaling worse than almost any other factor for nearly the
entire history of computing. The latency/throughput ratio for storage,
networking, memory, cache and almost everything else has been rising
continuously since the 70s and is likely to carry on rising.

This is going to have big consequences for the way that we design systems in
the future. Transferring a large amount of data is going to be cheap and fast.
Seeking, handshaking, back-and-forth and any other latency sensitive
operations are going to be slow and expensive. This has already played a big
role in algorithm design in HPC, and is going to start being felt to a much
larger degree in the wider field over the next decade. As the
latency/throughput ratio gets bigger, the tradeoffs behind optimal system
design will change.

30ms for a Cray 1 was only 2400 clock cycles. 8ms for a modern CPU is around
30 million. That's a big change.

~~~
pflanze
> 30ms for a Cray 1 was only 2400 clock cycles

2.4 million clock cycles. (According to Wikipedia, the Cray-1 had an 80 MHz
clock, which you also mention in your other post.)

~~~
mjb
You are indeed correct. That's kind of how it worked in my head (about 12x
change) but I didn't type it that way for some reason.
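
For anyone who wants to check the arithmetic in this subthread, a quick sketch follows. The 80 MHz Cray-1 clock is the Wikipedia figure quoted above. The 3.75 GHz modern clock is an assumption: it is simply the rate implied by "8ms is around 30 million cycles", not a measured value.

```python
# Clock cycles that pass during one average disk seek.
CRAY1_CLOCK_HZ = 80_000_000       # 80 MHz, as quoted from Wikipedia above
MODERN_CLOCK_HZ = 3_750_000_000   # assumption: implied by "8ms is ~30 million cycles"


def cycles_per_seek(seek_ms, clock_hz):
    """Cycles elapsed during a seek of seek_ms milliseconds (integer math)."""
    return seek_ms * clock_hz // 1000


cray_cycles = cycles_per_seek(30, CRAY1_CLOCK_HZ)
modern_cycles = cycles_per_seek(8, MODERN_CLOCK_HZ)

if __name__ == "__main__":
    print(f"Cray-1: {cray_cycles:,} cycles per seek")
    print(f"Modern: {modern_cycles:,} cycles per seek")
    print(f"Ratio:  {modern_cycles / cray_cycles}x")
```

So the seek itself got only 3 to 4 times faster, while its cost measured in CPU cycles went up by roughly an order of magnitude.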

------
sgt
Didn't I read about someone who built a replica of a Cray, but struggled to
find actual software for it? This could be exactly what they needed. I think
the story of the Cray replica (a scale model) was featured on HN.

~~~
ra
<http://news.ycombinator.com/item?id=1645291>

Same guy!

------
0x12
That's pretty impressive. Not only did they image the disk, it was a disk that
had had a head crash and one damaged head which they managed to restore to
functioning well enough to capture the data.

Now of course the big question: what is on that disk?

~~~
danbee
I believe the head crash occurred on a different disk pack, one that they were
using to test it.

~~~
Maci
From the Paper:

 _Unfortunately, within 30 seconds of the heads being loaded a high-pitched
whining noise began to be emitted from the drive, implying a potential head-
to-disk contact was taking place. The drive was then powered down and the disk
pack and heads were carefully examined. Thorough examination revealed that
Head #4 on the drive (which reads the bottom surface of the lowest data
platter) had 'crashed' into the disk surface and scraped away a concentric
ring of oxide material, permanently damaging the platter. This is a good time
to point out the advantages of not experimenting with your primary source
material when performing digital archeology experiments!_

Src: [http://www.archive.org/details/2011-cdc-disk-archaeology-fen...](http://www.archive.org/details/2011-cdc-disk-archaeology-fenton)

------
ColinDabritz
This reminds me of the 'Digital Needle' project, which was a hacking project
done by Ofer Springer back in 2002. The idea was to use a flatbed scanner to
play a vinyl record.

<http://www.phys.huji.ac.il/~springer/DigitalNeedle/>

It came surprisingly close. Perhaps with today's technology we could do
better?

~~~
DerekL
A company called ELP sells laser-based turntables. They start at $9900.

------
pud
This reminds me of the world's geekiest joke.

Q: Why do Cray supercomputers have a clear panel in the front (you can see it
in the photos)?

A: So you can Seymour Cray.

------
apaprocki
I can imagine how difficult this was to pull off. I came across some old
Colorado tapes that had backups of BBS-related stuff, files, etc. from 10
years ago. I figured I'd try to dump the data off of them using a Linux live
CD only to find that Linux dropped floppy-interface tape drive support a long
time ago. Next time around, I'll pull an ancient live CD version and get the
data off before it goes the way of the Cray. Constantly migrate your data
forward to avoid these hassles!

------
WalterBright
About a year ago, I needed to get some data off some old PDP-11 8" diskettes.
Fortunately, an old friend had an LSI-11 in his closet with the disk drive. He
hadn't run it in years, but it fired right up, and the disks read perfectly,
and he sent me the images.

Whew!

(thank you, Cheshire Engineering!)

~~~
WalterBright
My PDP-10 software, unfortunately, was gone for good. It was all on magtape,
but the drive that wrote it was way out of spec, so the tapes were unreadable
on any other drive. (sob)

------
mMark
Norton Ghost?..

~~~
zephjc
Seeing as the disk crashed, it was closer to something like divining the data
by waving a magic magnetic wand over it.

