"Q. What is this?
A. A challenge to confirm whether or not a professional, established data recovery firm can recover data from a hard drive that has been overwritten with zeros once. We used the 32 year-old Unix dd command using /dev/zero as input to overwrite the drive. [...]"
It's been over a year, and nobody has accepted the challenge yet. Even if there isn't any prize money to win, I'd think that the PR opportunity would be quite enough for any data recovery firm to do it.
So my conclusion is: overwriting once is plenty good enough. You want to overwrite the whole disk though, otherwise the filesystem might leave metadata clues even after the file has been overwritten and unlinked.
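The whole-disk point matters: overwriting only the file's blocks can leave filesystem metadata (names, sizes, directory entries) behind. As a rough illustration, here is a minimal Python sketch of what `dd if=/dev/zero` does, applied to a single file rather than a raw device; the function name is mine:

```python
import os

def zero_fill(path: str, block_size: int = 1 << 20) -> None:
    """Overwrite every byte of an existing file with zeros, in place.

    File-level sketch of the challenge's `dd if=/dev/zero of=...`;
    a real wipe targets the whole device (e.g. /dev/sdX) so that
    filesystem metadata is destroyed too.
    """
    size = os.path.getsize(path)
    zeros = b"\x00" * block_size
    with open(path, "r+b") as f:
        remaining = size
        while remaining > 0:
            step = min(block_size, remaining)
            f.write(zeros[:step])
            remaining -= step
        f.flush()
        os.fsync(f.fileno())  # push the zeros past the OS cache
```

Even then, on SSDs or copy-on-write filesystems an in-place overwrite is not guaranteed to hit the same physical cells.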
It's not worth it. Who needs the PR? It's worthless when we recover zeroed disks frequently anyway :) If they're willing to meet the cost to recover it, I'd do it cheap for £600. We are UK based so I guess they won't.
I don't think you can draw the conclusion that overwriting once is plenty good enough based on their conclusions. At least not until:
- Someone has tried their disk (people not wanting to is different from giving it a shot :))
- Someone has tried a more real-life example (install an OS, then copy some files in, then dd it).
Considering the second one has been done... :)
They are quite welcome to clone any old OS drive onto a disk and wipe it the same way with dd. Then fly it over to us here and pay the £1000 (ish) cost to recover it. (Yes, I know that is a bit outrageous, but so is their "challenge".)
It has? Do you have any source/article you can link to? I'd like to read about it and how it's done, seems to me like you'd need a fair amount of black magic to do it!
As I explained elsewhere, we get wiped disks sent to us weekly. Some will have been blanked with 0s (perhaps one a month). I know of only a few that specifically have had dd used on them, but usually we don't know the story of the disks :) so it could be higher.
We have a SEM that produces an image for our analysts to rebuild with a variety of software packages (EnCase Enterprise is one example, and we have several pieces of kit from AccessData, plus scripts/programs written in-house).
With a zeroed disk you're looking at a minimum of £1000 upwards and at least a month's work (most of that time spent on the SEM and on one of our clusters processing the data).
Am I near the truth if I say that you analyze lots of residual bits to see how the drive usually manages to overwrite a one, and then use that to get a fuzzy logic version of the contents of the drive?
Yes, that kind of explains the process. It's rather complex and not something I am fully versed in (it not being my field; I process the data) but I will have a shot at explaining. It is possible to analyse the individual bits and predict what the byte was before by seeing what has "moved" (i.e. when you zero a byte or a cluster, it simply moves all the 1 bits to zero).
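To make the "moved bits" idea concrete, here is a deliberately toy numerical model. It is my own invention, not real drive physics and not anything this firm describes doing: pretend each zeroed bit cell keeps a faint analog residue when it held a 1 before the wipe, and that an instrument can read that residue back.

```python
# Toy illustration only (NOT real drive physics): a zeroed cell keeps a
# tiny analog residue if its bit "moved" from 1 to 0 during the wipe.

def zero_with_residue(byte: int, residue: float = 0.05) -> list:
    """Zero a byte, leaving a small residue on each bit that was a 1."""
    return [residue if (byte >> i) & 1 else 0.0 for i in range(8)]

def reconstruct(cells: list, threshold: float = 0.02) -> int:
    """Guess the pre-wipe byte by thresholding the per-cell residues."""
    return sum(1 << i for i, v in enumerate(cells) if v > threshold)
```

In this cartoon the residue is clean and deterministic; the poster's point is that any real signal would be statistical and noisy, which is why the recovery is cluster-scale work rather than a script.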
The reason 3 passes (mostly) defeats this is that it deliberately makes sure every bit is moved at least once (for example by writing first FF and then 00 to it), then a final zeroing pass. Because you write the inverse of the first pass on the second run, it ensures every bit is moved. Then when you write 0s, anything that can be reconstructed is just the random garbage from the second pass.
Anyway: a 120GB disk will produce about 1TB of statistical data from the SEM process, which we can analyse. Once you get a handle on a few "known" files (like the OS ones) you can begin to rebuild unknown portions based on that data. Keyword recognition and file signatures help identify when we successfully recover something.
You are talking about a week's processing on a 25-node cluster (100 cores).
I'm more or less wondering if a RAIDed system (with, say, 64 KB chunks of data) will make it impossible to recover the data.
I suspect that a RAID would foil it. For a start, we would need to program in the facility to rebuild the RAID (and analyse based on chunks). I doubt it would work out.
We do quote a price for SEM RAID recovery but it is in the tens of thousands - a.k.a. no thanks :D
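Why striping complicates things: each member disk only holds every Nth chunk, so imaging one disk gives you interleaved fragments, and reassembly needs the layout (chunk size, disk order, and for other RAID levels the parity scheme). A toy RAID-0 sketch, my own illustration with made-up function names:

```python
CHUNK = 64 * 1024  # 64 KB stripe unit, as in the question above

def stripe(data: bytes, n_disks: int, chunk: int = CHUNK) -> list:
    """RAID-0-style striping: deal fixed-size chunks across disks round-robin."""
    disks = [bytearray() for _ in range(n_disks)]
    for i in range(0, len(data), chunk):
        disks[(i // chunk) % n_disks] += data[i : i + chunk]
    return [bytes(d) for d in disks]

def unstripe(disks: list, total: int, chunk: int = CHUNK) -> bytes:
    """Reassembly only works if you know the stripe layout exactly."""
    out = bytearray()
    idx = [0] * len(disks)
    d = 0
    while len(out) < total:
        out += disks[d][idx[d] : idx[d] + chunk]
        idx[d] += chunk
        d = (d + 1) % len(disks)
    return bytes(out[:total])
```

A keyword scan of one member disk fails at every chunk boundary, which is the poster's point: the analysis tooling would have to be taught the array geometry first.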
Because of how standard data wiping software works, one single pass would leave lots of traces. Perhaps not enough to pull entire documents etc., but with professional reconstruction software quite a lot of data can be recovered (I do this every week in my job). Given that you can guess at the contents of portions of the data (the OS :)), rebuilding is fairly easy.
3 passes is the correct method: one pass writing random data, one pass writing its complement (the bitwise inverse; these passes ensure every bit has been "moved"), then write it out with 0s. This ensures nearly untraceable data.
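A file-level sketch of that three-pass scheme in Python. Assumptions are mine: a plain file on a filesystem that overwrites in place; real wiping tools target the whole device, and on SSDs or journaling/copy-on-write filesystems in-place overwrites are not guaranteed to hit the same physical cells.

```python
import os

def three_pass_wipe(path: str) -> None:
    """Pass 1: random data. Pass 2: bitwise inverse of pass 1, so every
    bit is 'moved' at least once. Pass 3: zeros.
    (Sketch only: buffers the whole file; real tools stream block by block.)
    """
    size = os.path.getsize(path)
    first = os.urandom(size)
    passes = (first, bytes(b ^ 0xFF for b in first), b"\x00" * size)
    with open(path, "r+b") as f:
        for payload in passes:
            f.seek(0)
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # force each pass out before starting the next
```

The fsync per pass matters: without it the OS could coalesce the three writes in cache and put only the final zeros on the medium.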
I'm more familiar with CD-ROMs than HDDs, so I'll explain the need for encoding in those terms. The Red and Yellow Books specify that a track consists of "pits" and "lands", where pit-staying-pit and land-staying-land both represent "0", while pit-becoming-land and land-becoming-pit both represent "1". Due to limitations of the pressing process (which involves physically squeezing a piece of metal in a press to create the pits), you need a specific minimum number of "0" bits separating each "1" bit (because the metal in question isn't ductile enough). But people want to write arbitrary data to their CDs. This is a problem.
The solution: an encoding (Yellow Book, Annex D, Table D.1) maps 8-bit bytes into 14-bit "bytes". Because they had 16384 outputs for only 256 inputs, it was easy to find a list of outputs that spaced out the "1" bits with sufficient distance. (The other 16128 bit patterns are forbidden.)
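That headroom is easy to sanity-check by brute force. The sketch below (mine) enforces only the minimum-gap rule, at least two 0 bits between consecutive 1 bits, over all 16384 14-bit words; that alone leaves 277 candidates, comfortably above the 256 needed, and the real EFM table's extra rules (maximum run length, merging bits between codewords) trim the list further:

```python
def min_gap_ok(word: int, width: int = 14, min_zeros: int = 2) -> bool:
    """True if every pair of consecutive 1 bits in `word` is separated by
    at least `min_zeros` zero bits (the CD minimum-run-length rule)."""
    ones = [i for i in range(width) if (word >> i) & 1]
    return all(b - a > min_zeros for a, b in zip(ones, ones[1:]))

valid = [w for w in range(1 << 14) if min_gap_ok(w)]
print(len(valid))  # 277 candidates for the 256 byte values
```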
Venturing out of familiar territory and back on topic, HDDs have a similar problem: high frequency bit changes are more difficult to write to disk, because the neighboring bit domains bleed into each other. That is, if you choose "north-up" to mean "1" and "north-down" to mean "0", then writing the pattern "10101010" will tend to muddle out the magnetic field, strongly increasing the likelihood that you'll lose the data (especially if it sits for a while before you write to that spot again). The solution is another mapping code: I think hard drives use something more tame (like 8-to-11) and have different requirements, but the principles are the same.
The downside from a data destruction POV: there are no guarantees that manufacturers use the same codes, even within their own product lines, since HDDs are sealed and have integrated controllers. The manufacturers don't even bother to tell you the physical encoding, because there's no need for anyone else to know it. The net result is that there is no longer any "magic code" you can write to a disk to stress sectors, guarantee data destruction, or have any other effect whatsoever upon the physical disk. All that stuff died out back when IDE and SCSI hit the scene and RLL and MFM drives disappeared.
I agree: 1 pass will tend to hide a lot of stuff. But when I did a quick example and showed him us recovering SAM files (Windows password files) containing the MSCACHE hashes of several employees on his Windows domain from one of his HDDs, he was convinced
(edit: of course that was a lucky break - and you do have to crack the passwords too - but we got some contact info and other document segments too :)).
SDelete works by using the NT defragmentation API to discover what disk sectors are allocated to a file, so it can write to them directly. However, for whatever reason, in practice this does not work with encrypted (EFS) files. I've looked at the source for sdelete, and I can't see how it's going wrong, but I do know from experience (twice) that it does. The symptoms include blue-screens from corrupted OS files, overwritten documents, etc.
There is further corroboration here:
Since I know if someone formats a drive, I can get bits of the data back with
but if they dd it with zeros, I can't do that.
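For the formatted-drive case, the cheap end of that recovery is just carving printable strings out of a raw image, since a quick format rewrites filesystem structures but leaves most sectors intact. A minimal sketch of my own, loosely mimicking `strings(1)`:

```python
import re

def carve_strings(image: bytes, min_len: int = 6) -> list:
    """Return runs of printable ASCII at least `min_len` bytes long
    from a raw disk image."""
    return re.findall(rb"[ -~]{%d,}" % min_len, image)
```

Against an image that has been dd'd with zeros, there is simply nothing left to carve.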
I suspect most/all data recovery firms will only deal with the 95%* of easier cases, which involve them basically running some software and not opening the drive. If they are a bit better they might attempt the 4.9%* of cases where they have to replace/fix/bypass the drive's firmware.
They write off the 0.1%* with dd-style problems because they would cost too much for the customer and would need a better class of staff (recovery typically only costs a three-digit sum).
That said, it depends how secure you need your data to be. I think given a budget of $100k (if not less) for one drive this could be possible (i.e. for a government or a competitor company). For example, you could employ people from the company that makes the drives, reprogram the firmware to read the weak magnetic data, etc.
* percentages made up, to illustrate the point.
If a prosecutor (or government torturer) can prove the suspect had a file and wiped it out (e.g. /documents/superillegalfile.txt), I wouldn't like to be in his place.
In the UK that would only be circumstantial evidence; you'd have a hard time getting the CPS (Crown Prosecution Service) to actually prosecute based on it. Recovering the file is 9/10ths of the law :P
(this is our main revenue stream btw - forensics for law enforcement).
It all adds up.