Hacker News
Secure deletion: a single overwrite will do it (h-online.com)
34 points by ilitirit on Mar 11, 2009 | 19 comments

If you think that you can recover overwritten data, feel free to accept 'The Great Zero Challenge' over at http://16systems.com/zero/

"Q. What is this?

A. A challenge to confirm whether or not a professional, established data recovery firm can recover data from a hard drive that has been overwritten with zeros once. We used the 32 year-old Unix dd command using /dev/zero as input to overwrite the drive. [...]"

It's been over a year, and nobody has accepted the challenge yet. Even if there isn't any prize money to win, I'd think that the PR opportunity would be quite enough for any data recovery firm to do it.

So my conclusion is: overwriting once is plenty good enough. You want to overwrite the whole disk though, otherwise the filesystem might leave metadata clues even after the file has been overwritten and unlinked.
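For anyone who wants to see what a single zero pass amounts to, here is a minimal Python sketch of the same idea as dd reading from /dev/zero. It is an illustration only: zero_fill is a hypothetical helper name, and it targets a regular file such as a disk image (os.path.getsize reports 0 for many block devices), so it is not a drop-in way to wipe a real drive.

```python
import os


def zero_fill(path, chunk_size=1024 * 1024):
    """Overwrite every byte of an existing regular file with zeros, in place.

    Returns the number of bytes overwritten.
    """
    size = os.path.getsize(path)
    zeros = bytes(chunk_size)  # a reusable buffer of zero bytes
    with open(path, "r+b") as f:  # r+b: update in place, do not truncate
        remaining = size
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the writes to the device
    return size
```

Note that this only replaces the file's contents. As said above, the filesystem may still hold metadata about the file, which is why wiping the whole disk is the safer route.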

I'd love to take them on. However, they are right: it is highly unlikely that data can be recovered from that drive. It seems they have only a few folders/files on there, leaving little to go on to rebuild the image.

It's not worth it. Who needs the PR? It's worthless when we recover zeroed disks frequently anyway :) That changes if they're willing to meet the cost to recover it (I'd do it cheap for £600). We are UK based so I guess they won't.

I don't think you can draw the conclusion that overwriting once is plenty fine based on their conclusions. At least not until:

- Someone has tried their disk (people not wanting to is different from giving it a shot :))
- Someone tries a more real-life example (install an OS, then copy some files in, then dd it).

Considering the second one has been done... :)

They are quite welcome to clone any old OS drive onto a disk and wipe it the same way with dd, then fly it over to us here and pay the £1000 (ish) cost to recover it. (Yes, I know that is a bit outrageous, but so is their "challenge".)

"Considering the second one has been done... :)"

It has? Do you have any source/article you can link to? I'd like to read about it and how it's done. It seems to me like you'd need a fair amount of black magic to do it!

Well umm no article, rather practical experience :).

As I explained elsewhere, we get wiped disks sent to us weekly. Some will have been blanked with 0's (perhaps one a month). I know of only a few that specifically have had dd used on them, but usually we don't know the story of the disks :) so it could be higher.

We have a SEM that produces an image for our analysts to rebuild with a variety of software packages (EnCase Enterprise is one example, and we have several pieces of kit from AccessData, plus scripts/programs written in house).

With a zeroed disk you're looking at a minimum of £1000 and at least a month's work (most of that time spent on the SEM and on one of our clusters processing the data).

Excuse the ignorant, but what is SEM short for?

Am I near the truth if I say that you analyze lots of residual bits to see how the drive usually manages to overwrite a one, and then use that to get a fuzzy logic version of the contents of the drive?

SEM = Scanning Electron Microscope. Actually when I say "ours", it is jointly owned with a local university, who house it and use it when we don't need it. It is specialised (or rather adapted) for HDD scans though.

Yes, that kinda explains the process. It's rather complex and not something I am fully versed in (it not being my field, I process the data) but I will have a shot at explaining. It is possible to analyse the individual bits and predict what the byte was before by seeing what has "moved" (i.e. when you zero a byte or a cluster it simply moves all the 1 bits to zero).

The reason 3 passes defeats it (mostly) is that it deliberately makes sure every bit is moved at least once (for example by writing first FF and then 00 to it). Then comes a final zeroing pass. Because you write the inverse of the first pass on the second run, it ensures every "pin" is moved. Then when you write 0's, anything that can be reconstructed is just the random garbage from the second pass.

Anyway, a 120GB disk will produce about 1TB of statistical data from the SEM process, which we can analyse. Once you get a handle on a few "known" files (like the OS ones) you can begin to rebuild unknown portions based on that data. Keyword recognition and file signatures help identify when we successfully recover something.

You are talking about a week's processing on a 25-node cluster (100 cores).

Interesting. Are the known files required to be able to find the data, and how big do they have to be?

I'm more or less wondering if a RAIDed system (with, say, 64 KB chunks of data) will make it impossible to recover the data.

No, they're not needed, but they shortcut the process because the software can get a better handle on the data it is given. It just cuts the processing down a bit (I've never tested it without that kind of searching so I couldn't say how much).

I suspect that a RAID would foil it. For a start we would need to program in the facility to rebuild the RAID (and analyse based on chunks). I doubt it would work out.

We do quote a price for SEM RAID recovery but it is in the tens of thousands - a.k.a. no thanks :D

I was going to say that my opinion had shifted - except that ErrantX's reply seems reasonable. A $500 prize with relatively little publicity seems unreasonable for an operation that putatively takes a specialized SEM and a week of clustered computer time. A $50,000 prize (possibly paid for by insurance) would be more to the point, if they're really confident; and if the recoverers need a standard OS or other recognizable standard files in order to decode the mechanical operation of the HD, why not make that part of the challenge too?

Whilst I am in agreement that 35 writes is serious overkill, I would dispute that a single write is suitable.

Because of how standard data wiping software works, one single pass would leave lots of traces. Perhaps not enough to pull entire documents etc., but with professional reconstruction software quite a lot of data can be recovered (I do this every week in my job). Given that you can guess at the contents of portions of the data (the OS :)), rebuilding is fairly easy.

3 passes is the correct method. One pass writing random data, one pass writing the 2's complement data (these passes ensure every bit has been "moved"), then write it out with 0's. This ensures nearly untraceable data.
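To make the pass sequence concrete, here is a rough Python sketch of the three patterns as described: random data, its inverse, then zeros. Two assumptions are mine: the function names are hypothetical, and the second pass is taken to be the bitwise inverse of the first (strictly the 1's complement), since that is the pattern that flips every bit. It runs on an in-memory buffer and makes no claim about what actually lands on a physical platter.

```python
import os


def three_pass_patterns(length):
    """Return the three write patterns: random, bitwise inverse, zeros."""
    first = os.urandom(length)                # pass 1: random data
    second = bytes(b ^ 0xFF for b in first)   # pass 2: every bit of pass 1 flipped
    third = bytes(length)                     # pass 3: all zeros
    return first, second, third


def wipe_buffer(buf):
    """Apply the three passes, in order, to a mutable in-memory buffer."""
    for pattern in three_pass_patterns(len(buf)):
        buf[:] = pattern  # overwrite the whole buffer with this pass
    return buf
```

Since pass 2 is pass 1 XOR 0xFF, each bit position holds both a 0 and a 1 at some point before the final zeroing, which is the "every bit has been moved" property at the logical level.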

Complements (either 1's or 2's) are no longer particularly useful patterns with modern HDD codings, which are wrapped in Reed-Solomon and then heavily encoded before hitting the disk.

I'm more familiar with CD-ROMs than HDDs, so I'll explain the need for encoding in those terms. The Red and Yellow Books specify that a track consists of "pits" and "lands", where pit-staying-pit and land-staying-land both represent "0", while pit-becoming-land and land-becoming-pit both represent "1". Due to limitations of the pressing process (which involves physically squeezing a piece of metal in a press to create the pits), you need a specific minimum number of "0" bits separating each "1" bit (because the metal in question isn't ductile enough). But people want to write arbitrary data to their CDs. This is a problem.

The solution: an encoding (Yellow Book, Annex D, Table D.1) maps 8-bit bytes into 14-bit "bytes". Because they had 16384 outputs for only 256 inputs, it was easy to find a list of outputs that spaced out the "1" bits with sufficient distance. (The other 16128 bit patterns are forbidden.)
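As a quick sanity check on why 14 bits is enough, the sketch below brute-forces all 16384 patterns and counts those that keep the "1" bits spaced out. The specific limits are my assumption, not something stated above: at least 2 and at most 10 "0" bits in any run, which are the limits commonly quoted for the CD code. The function names are hypothetical too.

```python
def satisfies_run_lengths(word, width=14, min_zeros=2, max_zeros=10):
    """True if the pattern has >= min_zeros zeros between consecutive ones
    and no run of zeros (anywhere in the word) longer than max_zeros."""
    bits = format(word, "0{}b".format(width))
    if "0" * (max_zeros + 1) in bits:
        return False  # a zero run that is too long
    # Zero runs that sit between two ones: trim the outer zeros, split on "1",
    # and drop the empty pieces before the first and after the last "1".
    gaps = bits.strip("0").split("1")[1:-1]
    return all(len(gap) >= min_zeros for gap in gaps)


def count_valid_patterns(width=14):
    """Count the patterns of the given width that satisfy the constraints."""
    return sum(satisfies_run_lengths(w, width) for w in range(2 ** width))
```

Under those assumed limits count_valid_patterns() gives 267, a little more than the 256 needed for one byte. How neighbouring 14-bit words are joined without breaking the spacing at the boundary is a separate matter that this sketch ignores.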

Venturing out of familiar territory and back on topic, HDDs have a similar problem: high frequency bit changes are more difficult to write to disk, because the neighboring bit domains bleed into each other. That is, if you choose "north-up" to mean "1" and "north-down" to mean "0", then writing the pattern "10101010" will tend to muddle out the magnetic field, strongly increasing the likelihood that you'll lose the data (especially if it sits for a while before you write to that spot again). The solution is another mapping code: I think hard drives use something more tame (like 8-to-11) and have different requirements, but the principles are the same.

The downside from a data destruction POV: there's no guarantee that manufacturers use the same codes, even within their own product lines, since HDDs are sealed and have integrated controllers. The manufacturers don't even bother to tell you the physical encoding, because there's no need for anyone else to know it. The net result is that there is no longer any "magic code" you can write to a disk to stress sectors, guarantee data destruction, or have any other effect whatsoever upon the physical disk. All that stuff died out back in the days when IDE and SCSI hit the scene, after RLL and MFM drives died out.

I would be inclined to agree with you. I don't think wiping with three passes, just to make sure, is too great an inconvenience.

I had a big discussion about this not very long ago with a client we were destroying some disks for. He had read something along similar lines (one pass is fine).

I agree: 1 pass will tend to hide a lot of stuff. But when I did a quick example and showed him us recovering SAM files (Windows password files) from one of his HDDs, containing the MSCACHE hashes of several employees on his Windows Domain, he was convinced.

(edit: of course that was a lucky break - and you do have to crack the passwords too - but we got some contact info and other document segments too :)).

I learned a nice trick with OS X's Disk Utility the other day. You can securely delete already-deleted files, after the fact, with this tool. Just select a volume, go to the Erase tab, and click "Erase Free Space..."

I'd like to add something to this discussion, though, so others won't feel the pain I once felt some time ago: SDelete from Sysinternals is not safe to use if you are also using EFS (encrypted files on XP Pro / Vista etc.). You will get random corruption of random files on your file system.

SDelete works by using the NT defragmentation API to discover what disk sectors are allocated to a file, so it can write to them directly. However, for whatever reason, in practice this does not work with encrypted (EFS) files. I've looked at the source for sdelete, and I can't see how it's going wrong, but I do know from experience (twice) that it does. The symptoms include blue-screens from corrupted OS files, overwritten documents, etc.

There is further corroboration here:



I can kind of believe this is true, since I know that if someone formats a drive, I can get bits of the data back with strings /dev/sda, but if they dd it with zeros, I can't do that.
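The difference is easy to show on a toy buffer. Below is a small Python stand-in for what strings does (pull out runs of printable ASCII, with 4 as the minimum length to mirror the usual default). printable_strings is a hypothetical name and the sample "disk" contents are made up for illustration.

```python
import re


def printable_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII bytes found in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# A made-up "formatted" disk: old contents still sit between zeroed regions.
formatted = bytes(64) + b"secret-report.txt" + bytes(32) + b"hello world" + bytes(64)
# The same disk after a zero pass over its whole length.
zeroed = bytes(len(formatted))
```

Here printable_strings(formatted) returns ["secret-report.txt", "hello world"], while printable_strings(zeroed) returns an empty list: at the level the OS can read, nothing is left to find.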

I suspect most/all data recovery firms will only deal with the 95%* of easier cases, which involve them basically running some software and not opening the drive. If they are a bit better they might attempt the 4.9%* of cases where they have to replace/fix/bypass the drive's firmware. They write off the 0.1%* with dd-style problems because they would cost too much for the customer and would need a better class of staff (recovery typically only costs a three-digit sum).

That said, it depends on how secure you need your data to be. I think given a budget of $100k (if not less) for one drive this could be possible (e.g. for a government or a competitor company). For example you could employ people from the company who makes the drives, reprogram the firmware to read the weak magnetic data etc.

* percentages made up, to illustrate the point.

Most people forget to overwrite the directory structures that hold the file names (in the different file systems). The names of the deleted files stay there for a long time. This is more so on FAT and FFS-type directories (guesstimate: ~99% of file systems).
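FAT makes a handy illustration of this. Deleting a file there replaces the first byte of its 32-byte directory entry with 0xE5 and leaves the rest of the 8.3 name in place. The Python sketch below builds a simplified entry (name, extension and size only, every other field zeroed) and reads it back after "deletion". The helper names are hypothetical and this is not a full FAT parser.

```python
DELETED_MARKER = 0xE5  # first byte of a FAT directory entry for a deleted file


def make_entry(name, ext, size):
    """Build a simplified 32-byte FAT directory entry (8.3 name plus size)."""
    raw = name.upper().ljust(8).encode("ascii")   # bytes 0-7: name, space padded
    raw += ext.upper().ljust(3).encode("ascii")   # bytes 8-10: extension
    raw += bytes(17)                              # bytes 11-27: left zeroed here
    raw += size.to_bytes(4, "little")             # bytes 28-31: file size
    return bytearray(raw)


def mark_deleted(entry):
    """Do what a FAT delete does to the entry: overwrite the first byte only."""
    entry[0] = DELETED_MARKER


def read_entry(entry):
    """Return (deleted, filename, size); a lost first character shows as '?'."""
    deleted = entry[0] == DELETED_MARKER
    first = "?" if deleted else chr(entry[0])
    name = (first + entry[1:8].decode("ascii")).rstrip()
    ext = entry[8:11].decode("ascii").rstrip()
    size = int.from_bytes(entry[28:32], "little")
    return deleted, name + "." + ext, size
```

For an entry made with make_entry("SECRET", "TXT", 1234) and then passed to mark_deleted, read_entry returns (True, "?ECRET.TXT", 1234): the contents may be gone, but most of the name and the size are still sitting in the directory.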

If a prosecutor (or government torturer) can prove the suspect had a file and wiped it out (e.g. /documents/superillegalfile.txt), I wouldn't like to be in his place.

> If a prosecutor (or government torturer) can prove the suspect had a file and wiped it out (e.g. /documents/superillegalfile.txt), I wouldn't like to be in his place.

In the UK that would be only sideline evidence. You'd have a hard time getting the CPS (Crown Prosecution Service) to actually prosecute based on that evidence. Recovering the file is 9/10ths of the law :P

(this is our main revenue stream btw - forensics for law enforcement).

If [the proof that a file name was deleted] is backed by other evidence, like ISP logs, the CPS will likely hold [the case] valid. In this example, the suspect can't claim it was a housemate or neighbour rather than him.

It always adds.
