Recovering “lost” treasure-filled floppy discs with an oscilloscope (scarybeastsecurity.blogspot.com)
306 points by scarybeast 27 days ago | 53 comments



Oh wow, I did not realize that floppy discs used the same exact encoding that was used on tape drives of the era, e.g. the TI-99/4A [1]. The wave file in the article is the exact same sound I'd listen to as a child, loading programs from cassettes.

Total coincidence -- I recently have been recovering games from my old TI-99/4A cassettes, also using an oscilloscope (the same model in fact; for no particularly good reason beyond it was the most convenient way to record off the only tape player I still own), Audacity, and two open-source tools to recover such data [2] and [3].

[1] http://www.unige.ch/medecine/nouspikel/ti99/cassette.htm

[2] https://github.com/dimhoff/ti99_4a_tape_decode

[3] http://www.mrousseau.org/programs/ti99sim/


It's just FM, on floppies known as "single density" mode: https://en.wikipedia.org/wiki/Frequency_modulation

The common IBM PC and related formats use "double density" MFM: https://en.wikipedia.org/wiki/Modified_frequency_modulation
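To make the difference concrete, here is a minimal Python sketch of the two cell-level encodings, written from the usual textbook description rather than from any particular controller. The function names are made up for illustration, and each output list is a sequence of half-cells where 1 means "flux transition here".

```python
def fm_encode(bits):
    """FM (single density): every bit cell is a clock pulse followed by the data bit."""
    out = []
    for b in bits:
        out += [1, b]
    return out


def mfm_encode(bits, prev=0):
    """MFM (double density): a clock pulse is written only between two 0 data bits.

    `prev` is the last data bit of whatever preceded this run (assumed 0 here).
    """
    out = []
    for b in bits:
        clock = 1 if (prev == 0 and b == 0) else 0
        out += [clock, b]
        prev = b
    return out


data = [1, 0, 0, 1, 1, 0]
print(fm_encode(data))   # [1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0]
print(mfm_encode(data))  # [0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0]
```

Note that the MFM output never has two pulses in adjacent half-cells, which is what allows the cells to be written twice as densely on the same media.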


Interesting - I should really try to find the specifics on how AmigaOS stored its data. I'm guessing it's not MFM as the optional driver slash virtual device for reading IBM floppies required an MFM driver (thus implying MFM is not in use for standard AOS floppies).

[edit] it actually IS mfm after all https://wiki.amigaos.net/wiki/Amiga_Floppy_Boot_Process_and_...


FYI, MFM (Modified Frequency Modulation) is also described in the original manuals for the Shugart SA800/SA801 8-inch floppy diskette drives.

The OEM Manual describes the basic MFM information and the Maintenance Manual contains the PCB layouts, logic information and the schematic diagrams. The first two links are to the OEM/Theory of Operation Manual and the third link is to the Maintenance Manual.

https://hxc2001.com/download/datasheet/floppy/thirdparty/Shu...

http://avitech.com.au/mm-files/dilog-dq419/SA800-801_Diskett...

http://bitsavers.informatik.uni-stuttgart.de/www.computer.mu...

BTW, I still have two SA800 drives from my old CompuPro 8/16 S-100 system that I thought would be nice to play with again when I had a moment but I've no 8" floppies either single or double density. Does anyone know where one can get some?


The Amiga floppy controller hardware just DMAs the bits to / from RAM. MFM encoding and decoding is implemented in the software trackdisk.device by using the blitter.


Great that you're working on something similar! How are you finding those tools for handling degraded tape waveforms? I keep bouncing back and forth between hacking up my own vs. wanting to find some existing tool that has some clever math formula.


Both those I linked seem to work fine, though the tapes I'm recovering from are mostly in good condition. The Python one is a bit more robust than the one included with ti99sim, but much much slower and picky about DC offset.

(Note that both of these also decode the TI's frame format; but it should be easy to pull out the core waveform decoder of either.)


rewrite it for speed :)


> the same model in fact

It could only be Rigol or Siglent. Fascinating though. I know how slow floppies are, but I wouldn't have imagined an oscilloscope to have the necessary resolution.


Siglent SDS 1104X-E, same as in one of the photos.

In my case, the data was coming from cassette tapes, so the oscilloscope was absolutely overkill in terms of raw bandwidth by 3 orders of magnitude. However I was operating at its memory limit of 14 MSa (megasamples)... running at a sampling rate of 20 kSa/s, I had barely enough memory to grab the full 10 minutes of each tape side.

Its vertical resolution (8 bits) I was a little worried about, but the dynamic range of Differential Manchester/FM is so low it turned out to be a nonissue. Certainly the quantization noise is comparable to the noise floor of the basic-quality cassettes that my data was on.
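For anyone wanting to check the numbers, the arithmetic is short. This is a back-of-the-envelope sketch only: the quantization figure uses the standard ideal-ADC formula for a full-scale sine wave, which is an assumption and will be optimistic for a real capture that doesn't fill the scope's input range.

```python
memory_samples = 14_000_000  # 14 MSa memory depth, as quoted above
sample_rate = 20_000         # 20 kSa/s

capture_seconds = memory_samples / sample_rate
print(capture_seconds / 60)  # minutes of capture that fit in memory

bits = 8
# Ideal signal-to-quantization-noise ratio for a full-scale sine (assumption)
sqnr_db = 6.02 * bits + 1.76
print(round(sqnr_db, 2))     # 49.92
```

So the memory holds 700 seconds, a little under 12 minutes, which is why a 10-minute tape side only just fits.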


> It could be that the disc read head is more sensitive, or able to hover closer to the disc surface

Unlike HDDs, floppy drive heads do actually rub on the surface of the media in normal operation. This is notable because repeated "scrubbing" in an attempt to statistically recover data[1] can make things much worse, especially if the media surface (or worse, the head) is already damaged.

> The attempt that bore some fruit was to instead locate the start of the sector data, and continually look at the next 8us of the analog stream. If the voltage appears to generally drift in one direction, then it's a "0" bit, or if the voltage peaks one way and then the other, it's a "1" bit.

Put more precisely: if two samples 8us apart have approximately the same value, it's a 1; if they differ by a large amount ("large" being judged against something suitable and AGC-ish, like a running average or similar), it's a 0.

[1] For optical media this is not a big concern, and I once wrote some code to recover degraded CD/DVD(+/-R) sectors by disabling the ECC of the drive and reading the raw data many times; it was surprisingly effective and interesting to graph the results as they were being obtained in realtime and see the bits literally rising out of the noise.
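The 8us difference rule above could be sketched roughly like this. It is a toy, not the article's code: the idealised waveform (half a cosine cycle for a 0, a full cycle for a 1), the 16 samples per cell, the known cell alignment and the 0.8/0.2 smoothing factors are all assumptions chosen for illustration.

```python
import math

SAMPLES_PER_CELL = 16  # assumption: 8us bit cell sampled at 2 MSa/s


def synth(bits, amplitude=1.0):
    """Idealised read signal: half a cosine cycle for a 0, a full cycle for a 1."""
    level, samples = amplitude, []
    for b in bits:
        cycles = 1.0 if b else 0.5
        for i in range(SAMPLES_PER_CELL):
            samples.append(level * math.cos(2 * math.pi * cycles * i / SAMPLES_PER_CELL))
        if not b:
            level = -level  # a 0 cell ends at the opposite polarity
    samples.append(level)   # closing sample of the last cell
    return samples


def decode(samples, n=SAMPLES_PER_CELL):
    """Compare samples one cell apart against a running average of the swing."""
    bits, swing_avg = [], None
    for start in range(0, len(samples) - n, n):
        cell = samples[start:start + n + 1]  # includes first sample of next cell
        swing = max(cell) - min(cell)
        # crude AGC: exponentially smoothed peak-to-peak swing
        swing_avg = swing if swing_avg is None else 0.8 * swing_avg + 0.2 * swing
        diff = abs(cell[-1] - cell[0])
        bits.append(0 if diff > 0.5 * swing_avg else 1)
    return bits


data = [1, 0, 1, 1, 0, 0, 1, 0]
print(decode(synth(data)) == data)
```

Because the threshold follows the measured swing, the same code decodes a much weaker signal (for example `synth(data, amplitude=0.3)`) without retuning.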


> I once wrote some code to recover degraded CD/DVD(+/-R) sectors by disabling the ECC of the drive

How did you do this? Is there some SCSI/ATAPI command for this?


I found out how to use a vendor-specific command to read the raw data from the buffer of the drive before descrambling and ECC. Look up the "FriiDump" project for some more background information. I then patched the firmware in memory (that took a lot of work...) to cause regular read commands to return the raw data.

(Contrary to some of the other comments in this thread, the drive was actually an LG.)


Wow! It does sound extremely painful haha. Thanks for sharing, that's pretty useful info.


What data you can get depends on the disc drive. DiscImageCreator [0] or the redump.org wiki [1] are good starting points.

[0] https://github.com/saramibreak/DiscImageCreator

[1] http://wiki.redump.org/index.php?title=Dumping_Guides


I'd also like to know. I tried to send those commands (SCSI I think?) to the drive, but it seemed to just ignore them. Heck, it seemed to ignore a huge portion of the different commands I tried to send it.


Sounds like "READ RAW". From memory it only works on certain drives - I want to say LiteOn but my memory is hazy.


That matches my recollection too. I remember getting a specific model of LiteOn drive to ensure that I had the ability to do this, both for data recovery and because some forms of copy protection abused disc read errors and made reading your own CDs a (kernel-blocking) pain under Linux. I think the source of `cdparanoia` [1] has some good examples.

[1] https://xiph.org/paranoia/faq.html


I'd be interested to see the results of mounting a much smaller head (eg. a hard drive head) to the floppy drive, and then creating a full 2D image of the disk surface magnetics.

That should be able to read disks where the disk surface has stretched or warped, and the tracks are no longer perfectly circular.

I think there's also a reasonable chance such a method could also be used to recover data on a disc that has been overwritten.


Related to that, there are many different copy protection schemes and other formats for writing the data that don't result in circular tracks. Lots of old computers addressed the drive at such a low level that it was possible to directly control the head between tracks, and control the rotational speed of the disk.

For example there's this project that reads stranger formats http://cowlark.com/fluxengine/



Some drives (like those for Atari 8-bits) were intelligent, so software didn't have direct control over the recording parameters. So publishers would create custom authoring drives that could write bad sectors on command; the program loader would check for a bad sector and refuse to load if it wasn't present.

A more-clever technique would write two sectors in a row with the same sector number but different data in each. The program loader would rapidly demand that sector twice, getting different data as the disk passed under the read head. A copy program, however, would only copy each sector once and be missing the data from the sector with the duplicate number.
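A toy model of the duplicate-sector trick, to show why a naive copier loses data. Everything here (the `Track` class, the sector contents, reading "the next matching header to pass under the head") is a simplified assumption for illustration, not how any specific drive or loader worked.

```python
class Track:
    def __init__(self, sectors):
        self.sectors = sectors  # list of (sector_id, data) in rotational order
        self.pos = 0            # index of the next sector to pass under the head

    def read(self, sector_id):
        """Return data from the next sector with this ID to pass under the head."""
        n = len(self.sectors)
        for step in range(n):
            idx = (self.pos + step) % n
            sid, data = self.sectors[idx]
            if sid == sector_id:
                self.pos = (idx + 1) % n
                return data
        raise LookupError("sector not found")


def loader_check(track):
    """Protected loader: two back-to-back reads of sector 2 must differ."""
    return track.read(2) != track.read(2)


original = Track([(1, b"AAAA"), (2, b"KEY1"), (2, b"KEY2"), (3, b"CCCC")])
print(loader_check(original))  # True

# A naive copier asks for each sector number exactly once.
copy = Track([(sid, original.read(sid)) for sid in (1, 2, 3)])
print(loader_check(copy))      # False: the second sector 2 was never copied
```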


> A more-clever technique would write two sectors in a row with the same sector number but different data in each.

This is also done in some optical disc copy protections, although the identically numbered sectors are far apart, causing seeks from the end to return different data than seeks from the beginning of the media.


Hey @PostThisTooFast, you seem to be shadowbanned based on two flagged comments when your account was new in 2015.

Hey @dang, this account has made many good comments since then.


> To store a "0" data bit, there will only be one peak in an 8us window; to store a "1" data bit there will be two peaks.

Looks to me like this is just FSK (https://en.wikipedia.org/wiki/Frequency-shift_keying), and if so that's great because it means you can use techniques more sophisticated than the ones in that post, which in turn means that you have a chance to go after more 'unreadable' discs.


Do you have any recommendations for any particular techniques or software? I'd gladly try them.


The classical approach is a matched filter.

Here are some references: https://www.rfcafe.com/references/articles/wj-tech-notes/fsk...

http://edge.rit.edu/content/P09141/public/FSK.pdf

There's lots of matlab code floating around out there, if you're willing to try gnuradio that'll work (for example: https://nccgroup.github.io/RFTM/fsk_receiver.html), and this looks promising: http://www.whence.com/minimodem/

ETA: Audacity might be able to do it too : https://www.youtube.com/watch?v=tKNNnbgoGdI
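To give a flavour of the matched-filter idea in this setting, here is a minimal sketch in plain Python. It correlates each bit cell against two reference shapes and picks the better match, which is what a matched filter amounts to when the cell timing is already known. The templates, the 16 samples per cell, the noise level and the assumption of perfect cell alignment are all illustrative; a real decoder would also need clock recovery.

```python
import math
import random

N = 16  # samples per bit cell (assumed; cell alignment is taken as known)
T0 = [math.cos(math.pi * i / N) for i in range(N)]      # "0": half a cycle
T1 = [math.cos(2 * math.pi * i / N) for i in range(N)]  # "1": one full cycle


def correlate(cell, template):
    return sum(c * t for c, t in zip(cell, template))


def matched_filter_decode(samples):
    bits = []
    for start in range(0, len(samples) - N + 1, N):
        cell = samples[start:start + N]
        # abs() because a cell may begin at either polarity
        bits.append(1 if abs(correlate(cell, T1)) > abs(correlate(cell, T0)) else 0)
    return bits


def synth(bits, noise=0.0, seed=0):
    """Idealised cells plus Gaussian noise, for testing only."""
    rng = random.Random(seed)
    level, out = 1.0, []
    for b in bits:
        out += [level * t + rng.gauss(0, noise) for t in (T1 if b else T0)]
        if not b:
            level = -level  # a 0 cell ends at the opposite polarity
    return out


data = [1, 0, 1, 1, 0, 0, 1, 0]
print(matched_filter_decode(synth(data, noise=0.2)) == data)
```

The advantage over comparing two single samples is that every sample in the cell contributes to the decision, so uncorrelated noise averages out.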


This is how I've done it in the past, seems to work quite well on your wav file :)

https://colab.research.google.com/drive/1Zvb6bQfC3thc5-39exp...


Just a little nitpicking note: the spelling of disc versus disk. Per dictionary definition, it's two spellings of the same word. Per convention, "disk" is usually used with magnetic media (like floppies) while "disc" is usually used with optical media (CDs/DVDs/Blu-ray).

It struck me as just a little odd to see "floppy disc" even if it's technically correct.


The blog post uses conventions associated with the machine in question, the BBC Micro, which is an iconic 1980s UK machine. It was pretty much "Disc" back then, e.g. the dreaded "Disc error 0E" from the OS, or the spelling written on the discs themselves, e.g. this Watford Electronics Diagnostics disc:

http://www.computinghistory.org.uk/det/21293/Diagnostics-Dis...

Not sure if it's a UK thing or a 1980s thing.


It’s an en-GB vs en-US thing that warrants its own Wikipedia article: https://en.wikipedia.org/wiki/Spelling_of_disc


> Early BBC technicians differentiated between disks (in-house transcription records) and discs (the colloquial term for commercial gramophone records, or what the BBC dubbed CGRs).

I love that.

In my own usage I've always made the magnetic distinction without really knowing why. At least now I can identify the boundary as 'magnetism'..!


>UK thing or a 1980s thing //

I think it's both, in the UK at least.

The Beeb in school in the 80s had a 5¼-inch floppy disc drive. The BBC Domesday project, 84-86, was on LaserDisc. But as Compact Discs took flight in the domestic market we'd moved to 3½-inch floppy disks, which by the 90s were labelled "diskette" IIRC and so were called disks. Then hard drives were called hard disk drives, I think because they came from the global/USA market, so the only "discs" that remained in common use were optical discs, and the split - for me at least - was kinda-retconned in that optical discs were always 'discs' and magnetic were always 'disks'.

The early UK made hard disk drives were [sometimes?] called "disc drives", http://www.computinghistory.org.uk/det/22567/%20Acorn%20BBC%... this one from Cumana, Guildford, UK, http://www.computinghistory.org.uk/det/36026/Cumana%205.25-i... says "disk drive" though, and I'm shocked ;o) ...


Fading memories of playing c64 games from both sides of the pond suggest that "disk" was the US spelling and "disc" was the UK.


I (American) agree with chungy; there is a strong convention that specifies floppy disks and compact discs. We could say that "disk" is the American spelling of "floppy disk", but not that it is the American spelling of disc.

I suspect that "disk" is used because it is shortened from "diskette", which wouldn't work at all if spelled "discette".


From https://en.wikipedia.org/wiki/Spelling_of_disc:

By the 20th century, the "k" spelling was more popular in the United States, while the "c" variant was preferred in the UK. Consequently, in computer terminology today it is common for the "k" word to refer mainly to magnetic storage devices (particularly in British English, where the term disk is sometimes regarded as a contraction of diskette, a much later word and actually a diminutive of disk).

So in the mid eighties there was a distinct color/colour kind of split between disk/disc in the US/UK. And someone immersed in the world of restoring data from magnetic storage for a distinctly UK computer of the eighties? Eminently sensible for them to use the UK term, when everything on the computer is going to be saying "INSERT DISC 2".

----

There are also some notes in that page on how Phillips/Sony's choice of "disc" for the CD has ended up with that being the common choice for optical media vs magnetic; back in the eighties this convention was not yet established. And then there are also sections for disc/disk in medical literature, and in disc-throwing games. English spelling is weird.


And talking of interesting spelling, it's "Philips", not "Phillips".


that's my own special typo :)


I had to Wikipedia what term is used for magneto-optical media. As much as I'd love it to be "disck", alas "disc" (as in "MiniDisc") seems correct in that case.


> As much as I'd love it to be "disck"

You reminded me of the https://kol.coldfront.net/thekolwiki/index.php/Boock_of_darc... :D


Disk is (or can be) short for diskette - which means a disc in a protective sheath.

Meanwhile a "disc" is any flat circular object.

So floppy disk, hard disk, but compact disc, is correct.

The only medium I know of that doesn't obey this rule is the Sony MiniDisc. MDs are definitely diskettes, and so should in theory be spelled MiniDisk. But presumably that's for trademark reasons.


yeah geez, immediate reaction. Well established just search around that 'disk' became standard. It's weird when someone writes a modern piece like this and like, the whole time they never noticed it's spelt disk everywhere??! It's another one of those things where you ask yourself "what internet do these ppl surf?!"


Several years back I borrowed a colleague’s KryoFlux to archive hundreds of old 800KB Mac floppies that for over a decade had been sitting in a shoebox with no way to read them. Dozens were unreadable but I still created backups of the raw flux dumps of every one of them so I could come back to them later and do more analysis to see if they were simply unformatted or corrupted but recoverable. (Naturally none of them had useful labels.)

Doubtful there are any treasures to be found here, but reading this article is giving me some inspiration to spend some time on this project again.


I love this. It's crazy the amount of effort, procuring hardware, debugging and manual massaging it takes to work with old hardware, but I think it's great not to lose the past even if the past seems trivial like Old McDonald's Farm game.

Another great restoration in all its technical glory is one of the flight computers from Apollo[0]

[0] https://youtu.be/Bh_gP5aF3ys?t=185


I had heard about a technique where tiny particles are spread over the media, and then optically imaged. This method was used to recover the most damaged media.


Super cool work! Those bits are worth working to recover! Unlike the bits about disk/disc ;)


How does one go about studying and working in data archeology? This is really cool, and it seems like something that will be in greater demand as old media becomes obsolete and needs to be retrieved and preserved somehow.


Honestly I think myself and Chris (@scarybeast) would like to know too...! We've been learning as we go along.

Certainly I'd like to make contact with other people who are working in this area and compare notes... if anyone doing so is reading this, please do make contact. I'm on Twitter under the same username, Chris is @scarybeasts.

If you prefer email you can stick "@gmail.com" at the end of my HN username to get that :)

At the moment I'm working on cleaning up the remains of the DiscFerret project, and turning the project website into a general retrocomputing data recovery wiki.


This reminds me of my own attempts to read Acorn Atom programs from cassette tapes. I wrote a program to process wave files. It took me quite a while to find a reliable method for recognizing the zeros and ones. But even that was not perfect and required some manual editing. For more details, see: http://www.iwriteiam.nl/D0301.html#20


Fascinating and incredible work.


This reminds me I have a 200 MHz-capable oscilloscope that I only used when I opened the package, and even then only to test the built-in function generator :(

Good thing I bought it cheap from China.


Interesting they think it was written to after the dent. I'd think it'd error during write.


The floppy disc controller doesn't have a way of knowing if the write worked or not. There are some corner cases here, like the drive can signal a fault line if it's on fire, but if there's simply a dodgy patch of disc surface, nothing will be apparent during the write.

So, the operating system could verify the write by reading it back, but I don't think the BBC Micro disc ROMs typically did that for file writes -- only for formats.



