
Recovering deleted files using only grep - ezisezis
http://blog.nullspace.io/recovering-deleted-files-using-only-grep.html
======
lelf
I'm wondering a lot why it's 2014 and
[https://gist.github.com/llelf/7862414](https://gist.github.com/llelf/7862414)
(it's HAMMER on DragonFly BSD) is WOW (and not some ordinary boring thing
everyone uses)

~~~
natch
parse error

~~~
danielweber
I _think_ he's saying "how come it's now 2014 and everyone doesn't use
<awesome tool> yet?"

------
mpweiher
We used to do free-space scanning on our PDP-11. Disk blocks were not cleared,
so you could just open a very large temporary file and look through it.

Actually recovered quite a few people's homework assignments that way. Of
course, you could also write to that file, so we had free-space sweepers. And
since you could write to that empty space, you could also leave messages, even
from the public (100,0) account, which had a zero permanent disk quota (all
files deleted on logout).

~~~
davidgerard
Yeah, I did that too. Was fun!

------
peterwwillis
Recovering deleted files on ext3/4 systems using either file-format-matching
tools or the journal:
[http://linux.sys-con.com/node/117909](http://linux.sys-con.com/node/117909)
(also
[https://wiki.archlinux.org/index.php/File_recovery](https://wiki.archlinux.org/index.php/File_recovery))
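
If the journal route interests you, a rough sketch with extundelete (device
name and path are just examples; the filesystem should be unmounted or mounted
read-only first):

    umount /dev/sdb1
    # pull back one known path, relative to the filesystem root
    extundelete /dev/sdb1 --restore-file home/user/notes.txt
    # or grab everything it can reconstruct from the journal/metadata
    extundelete /dev/sdb1 --restore-all
    # recovered files end up under ./RECOVERED_FILES/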

------
twotwotwo
Yesterday I stupidly deleted some code I'd been toying with by running 'rm file
dir' instead of 'mv file dir'. It was not yet in any version control, but
there was a chunk of work since the last editor-made backup I could find.
(Right there is problem 1.)

My editor was still open, and the gocode code-completion daemon too. I looked
up some way to dump a process's memory from StackOverflow and used it. (I used
a Python script[1], but I see gdb's gcore command and other ways recommended
elsewhere.) It worked out. An extra complication was that my home directory is
encrypted, so searching the raw disk was out.

So: never too early to use version control. Also, if you think you even might
have fat-fingered something, think a second before you do anything that could
make the situation worse. I didn't pay enough attention to the error message
that could have told me I'd messed up, and answered 'yes' when my editor asked
whether it should close the file, thinking the file had just been moved--if I
hadn't done that, it would have saved a lot of time. On the other hand, had I
kept charging forward once I realized I'd messed up, I could easily have closed
the editor, killing the gocode process I wound up recovering stuff from.

So, yeah: be smarter than I was, y'all.

[1] [http://unix.stackexchange.com/questions/6267/how-to-re-load-...](http://unix.stackexchange.com/questions/6267/how-to-re-load-all-running-applications-from-swap-space-into-ram/6271#6271)
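
In case it helps anyone else, the gdb route looks roughly like this (the pid
and the search string are made up):

    # dump the still-running daemon's memory to a core file
    gcore -o /tmp/gocode-dump 12345       # writes /tmp/gocode-dump.12345
    # then fish the text back out of it
    strings -a /tmp/gocode-dump.12345 \
        | grep -C 20 'funcNameIRemembered' > /tmp/maybe-my-code.txt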

------
madaxe_again
Yeah, been there, done that, although my first step is to umount and dd onto
another volume so the sectors don't get reallocated and obliterate the data -
and that's generally a sane first step for ANY recovery of this ilk.
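
Concretely, something like this (device and destination are examples; the
destination has to live on a different volume, of course):

    umount /dev/sda2
    # raw-copy the whole partition, skipping unreadable sectors
    dd if=/dev/sda2 of=/mnt/other-volume/sda2.img bs=4M conv=noerror,sync
    # ...and then do all the grepping/carving against the image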

You haven't lived until you've unwittingly run rm -rf on an NFS mount of / on a
remote box. Which happens to be the fileserver for a trading shop. In the
middle of trading hours.

~~~
snori74
Indeed. The whole "never run _rm -rf_ " thing is so ingrained that someone I
know took real joy in _finally_ getting to legitimately use this to nuke an
old server once the new one was signed off.

Pity he'd not considered that the old one had nfs mounts to the live data...

~~~
aquadrop
I think 'rm -rf /' shouldn't work; there should be a special command instead,
something like rmrfhell, which would ask you three times if you're sure and
tell you about the pain and tears of the other people who ran it.

~~~
LukeShu
In the GNU version, that mostly is the case. You need to specify `--no-
preserve-root` for it to work, which is obscure enough.
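
Roughly, with a reasonably recent GNU coreutils (please don't actually run the
second one):

    rm -rf /                       # refuses, pointing you at the override flag
    rm -rf --no-preserve-root /    # the only spelling that actually goes through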

------
Qiasfah
I guess it's a good thing he didn't have an SSD and TRIM enabled!

~~~
amckenna
Not sure why you are getting down-voted. I assume you are referencing the SSD
forensics article that was on here a while back that brought up the fact that
TRIM on an SSD can cause forensic recovery issues. Basically blocks can be
cleared even if the drive is unmounted but powered on.

 _SSD designers developed an interface allowing the operating system (e.g.
Windows, Linux, Mac OS X etc.) to inform the controller that certain blocks
are no longer in use via the TRIM command. This allows the internal garbage
collector to electronically erase the content of these blocks, preparing them
for future write operations._

[http://forensic.belkasoft.com/en/why-ssd-destroy-court-evide...](http://forensic.belkasoft.com/en/why-ssd-destroy-court-evidence)

~~~
Qiasfah
That forensic article is a consequence of what I was referring to.

When you delete a file on a mechanical hard drive the physical contents of the
file still exist on disk, so you can use tricks like these to recover deleted
data.

When the drive is later told to write over those locations, it doesn't matter
that old data is there; the new data simply gets written in place.

SSDs, however, store data in pages grouped into larger erase blocks. They can
write directly to an empty page, but they cannot overwrite a page that already
holds data: the drive has to read the existing data, merge in the new data,
erase, and write the result back as a whole. This read-modify-write cycle is a
major reason why SSDs (even now) lose performance as they fill up.

The issue is that when you delete a file on disk there is no way for the SSD
to know that those data blocks aren't important anymore (without TRIM). The
controller of the SSD has to manage a full drive of data (even if you're only
actually using some percent of it) and only figures out that a file was
deleted when it is finally told to write something else to that location.

TRIM tells the SSD that a file was removed and lets the controller reclaim
those blocks ahead of time, which helps maintain performance.

There is a really good discussion of this topic in this article from way back
in 2009:
[http://www.anandtech.com/show/2829](http://www.anandtech.com/show/2829)
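
On Linux you can check whether a drive and filesystem support discard, and
issue a TRIM by hand (device and mount point are examples):

    lsblk --discard /dev/sda   # non-zero DISC-GRAN/DISC-MAX => TRIM supported
    fstrim -v /                # trim the free space of a mounted filesystem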

------
a3_nm
How could this work? I would expect this method to yield a bunch of matches
corresponding to every version of the file that was once saved and continues
to live on the disk. Unless you happen to have a string that only existed in
the last version or so, but that's hard to come by...

I once did recovery of this kind for a friend (using, I think, photorec or
extundelete, not grep) and the hardest part by far was piecing together the
"right" version of the files from all matching versions that were recovered
from disk.

~~~
danielweber
I don't think there are any guarantees here, but if he kept on saving the same
file with the same name, it probably overwrote the same sector on disk.

~~~
a3_nm
I guess it depends on the text editor. Some editors will save new versions of
a file with the same name by writing the current buffer to a new file and then
atomically moving that file to the target location, which has no reason to
write on the same sectors.

~~~
rnicholson
What is the reason for saving then moving for existing files? Is this an
optimization specific to certain file systems?

~~~
samdk
Moving a file (within the same file system) is an atomic operation on most
file systems, but writing data is not.

If you instead overwrite the file directly and the write fails for some reason,
the old data is gone and you're left with only a partially-written new file in
its place.

This also helps with systems that continuously poll files and watch for
changes. If you have, say, a compiler watching your file, you don't want it to
start compiling a partially-written version of your file and give you some
strange error just because it happened to poll before the write finished.
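
A shell-level sketch of the same pattern (filename is a placeholder):

    # write the new contents to a temporary file on the same filesystem...
    tmp=$(mktemp config.txt.XXXXXX)
    printf '%s\n' "new contents" > "$tmp"
    # ...then rename over the original; readers see either the old file or
    # the new one, never a half-written mix
    mv "$tmp" config.txt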

------
moron4hire
A very important lesson regarding the Unix principles of "everything is a
file" and programs that "do one job well".

~~~
sparkie
It's kind of flawed where he says "If you pick x to be big enough, you should
get the entire file, plus a bit of junk around the edges" - because this is
simply not how filesystems work. If you manage to get the full file, it's by
luck that it was small enough for the filesystem to allocate its contents
contiguously on the block device.

Anything bigger and you're gonna quickly need something more sophisticated
which can understand the filesystem it's dealing with, as it will need to
collect the many pieces of your deleted file scattered across the block device
and merge them. I'm sure that would be mountains of fun to do with bash.

And in this case, the "do one job well" program that you're gonna need is a
program which specifically recovers deleted files from a specific filesystem.
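
You can see how lucky you'd be before the fact with something like filefrag
from e2fsprogs (filename is an example):

    filefrag -v /var/log/syslog
    # "1 extent found" is the contiguous case the grep trick relies on;
    # many extents means the pieces are scattered across the device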

~~~
antics
I'm the author of the post -- that's good to know, thanks! :) I know
embarrassingly little about filesystems. I'm glad you pointed this out.

EDIT: though, I'd point out that if you really wanted to recover the file you
should probably try to use /proc or something (at the time I didn't know about
this). This approach requires crawling the disk which is obv pretty slow. :)
It's less of a "here's a useful thing" and more of an excited "HEY DID YOU
KNOW THAT YOU CAN DO X".

EDIT 2: I updated the blog to link to your comment, because it's baller.

~~~
dredmorbius
If _processes_ which hold your file open are still running, then you can
access the file via the /proc/<pid>/fd/ entries. Run an 'ls -l' in the proc
directory of the process to see those.

You can simply copy the (proc) file to a new location to recover it.
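
Something like this (pid and fd number are made up):

    ls -l /proc/12345/fd            # deleted-but-open files show "... (deleted)"
    cp /proc/12345/fd/4 /tmp/recovered-file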

Remember: open files keep the contents present on disk _until the controlling
process exits or closes the file handle_.

Since you're actually accessing _the file on disk_, the storage-contiguity
issues above don't come into play -- it's all read back to you in proper file
order.

But yes, files (and virtual files) on Linux are pretty slick.

I also remember being really excited learning about disk image files and the
ways in which they can be manipulated. Including the options of creating
filesystems on virtual disks, partitioning them, and then mounting those
partitions, etc. First tried with bootable floppy distros, but I've played
around with them in a bunch of contexts since.
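
The basic loop-device dance, for anyone who hasn't tried it (sizes and paths
are arbitrary):

    dd if=/dev/zero of=disk.img bs=1M count=64   # empty 64 MB image file
    mkfs.ext4 -F disk.img                        # put a filesystem on it
    mkdir -p /mnt/img
    mount -o loop disk.img /mnt/img              # mount it like a real disk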

------
dredmorbius
Back in the days when Netscape Navigator was the browser of choice on Linux, I
found more than once I could recover the contents of a half-composed post to a
web log by searching through /proc/kcore (or its precursor -- I think that's
changed), and looking for string fragments from the file.

My success rate in recovery was markedly better than Navigator's capability in
running without crapping out.

The most painful story I've heard was of a BSD admin who had to recover gzipped
financial audit files from a corrupted disk, requiring both recovery from the
media _and_ reconstructing files from fragments of the compressed data.
Apparently somewhat painful, but not entirely without success.

------
ScottBurson
I once used dd and grep to recover files from a badly corrupted ZFS pool. It
was painful. (The cause was a slowly failing PSU -- the +12v line was not
holding up under load. So some writes would successfully make it onto the
disks and some wouldn't. I don't blame ZFS for failing under such
circumstances; but I still wished it had some kind of salvager.)
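
For the curious, that combination is essentially the article's trick pointed at
an image instead of the live device (device, paths, and search string are
examples):

    # image the sick disk somewhere safe first, skipping unreadable sectors
    dd if=/dev/ada1 of=/backup/ada1.img bs=1M conv=noerror,sync
    # then search the raw image as text, with context around each hit
    grep -a -B 25 -A 100 'string unique to the lost file' /backup/ada1.img \
        > candidates.txt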

