Recovering deleted files using only grep (nullspace.io)
142 points by ezisezis on June 25, 2014 | hide | past | favorite | 45 comments



I'm really wondering why it's 2014 and https://gist.github.com/llelf/7862414 (it's hammerfs on DragonflyBSD) is still a WOW (and not some ordinary boring thing everyone uses)


Yep.. installed Dragonfly last week on my home server, the concept of virtual kernels and Hammerfs was the selling point for me, +1 for you sir.


parse error


I think he's saying "how come it's now 2014 and everyone doesn't use <awesome tool> yet?"


We used to do free-space scanning on our PDP-11. Disk blocks were not cleared, so you could just open a very large temporary file and look through it.

Actually recovered quite a few people's homework assignments that way. Of course, you could also write to that file, so we had free-space sweepers. And since you could write to that empty space, you could also leave messages, even from the public (100,0) account, which had a zero permanent disk quota (all files deleted on logout).


Yeah, I did that too. Was fun!


Recovering deleted files on ext3/4 systems using either file-format-matching tools or the journal http://linux.sys-con.com/node/117909 (also https://wiki.archlinux.org/index.php/File_recovery)


Yesterday I stupidly deleted some code I'd been toying with by running 'rm file dir' instead of 'mv file dir'. It was not yet in any version control, but there was a chunk of work since the last editor-made backup I could find. (Right there is problem 1.)

My editor was still open, and the gocode code-completion daemon too. I looked up some way to dump a process's memory from StackOverflow and used it. (I used a Python script[1], but I see gdb's gcore command and other ways recommended elsewhere.) It worked out. An extra complication was that my home directory is encrypted, so searching the raw disk was out.

So: never too early to use version control. Also, if you think you even might have fat-fingered something, think a second before you do anything that could make the situation worse. I didn't pay enough attention to the error message that could have told me I'd messed up, and answered 'yes' to my editor asking if I should close the file, thinking it had just moved--if I hadn't done that it'd've saved a lot of time. On the other hand, had I kept on charging forth once I realized I had messed up, I could've easily closed the editor, ending the gocode process I wound up recovering stuff from.

So, yeah: be smarter than I was, y'all.

[1] http://unix.stackexchange.com/questions/6267/how-to-re-load-...


Yeah, been there, done that, although my first step is to umount and dd onto another volume so the sectors don't get reallocated and obliterate the data - and that's generally a sane first step for ANY recovery of this ilk.
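For the curious, that first step amounts to "unmount the device, then dd it onto an image somewhere else, and run your recovery tools against the image". Here's a minimal runnable simulation, with a plain file standing in for the partition (on a real system you'd use the actual device node and need root; all names here are made up):

```shell
# Create a fake 64 KiB "partition" full of random data.
dd if=/dev/urandom of=/tmp/fakepart bs=1k count=64 2>/dev/null
# Image it, ignoring read errors and padding short reads,
# the same flags you'd use on a dying real disk.
dd if=/tmp/fakepart of=/tmp/fakepart.img bs=4k conv=noerror,sync 2>/dev/null
# Verify the image matches the source before touching the original.
cmp -s /tmp/fakepart /tmp/fakepart.img && echo "image matches source"
```

On a real disk you'd then leave the original alone and do all grepping/carving against the .img file.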

You haven't lived until you unwittingly run rm -rf on a nfs mount of / on a remote box. Which happens to be the fileserver for a trading shop. In the middle of trading hours.


Indeed. The whole "never run rm -rf" thing is so ingrained that someone I know took real joy in finally getting to legitimately use this to nuke an old server once the new one was signed off.

Pity he'd not considered that the old one had nfs mounts to the live data...


I think 'rm -rf /' shouldn't work, and there should be a special command instead, something like rmrfhell, which would ask you three times if you're sure and tell you about the pain and tears of other people who ran it.


In the GNU version, that mostly is the case. You need to specify `--no-preserve-root` for it to work, which is obscure enough.


Most versions have --preserve-root (fail if the target is /) on by default, so you have to use --no-preserve-root, although rm -rf /* will still work.


I was sysadmin for a small ISP about 15 years ago. I was on vacation about 1500 miles away when I called to check in, the owner gets on the phone and says, "I deleted the entire /bin directory on [the primary web server], is that bad?" I told him not to touch it til I got home, and whatever he did, do NOT shut it off!

Thankfully we had two machines which were virtually identical OS-wise (RedHat 6 if memory serves). I was able to get everything put back from the twin machine and keep everybody happy.

Thankfully that server kept running with relatively little issue the entire time even with all those core OS files gone. I don't think any customers were at all aware.


Open processes will hold open their files. So long as it's an 'rm' that you've run (which merely removes directory entries) and not a destructive action on the disk contents themselves, it's often possible for things to continue in a startlingly unaffected manner. Though not always.

The extent to which new calls to deleted files are made will have a strong impact on this.


Is unmounting really the best first action? Surely that forces a sync, and that could be when the data actually gets removed?

I seem to remember having a problem where I'd rm-ed a file I needed; I used lsof to find the file handle and was then able to cat the data into a new file using the handle instead of the filename. Details pretty fuzzy, sorry.

Edit:

Example of recovery this way, tested now, works for me - http://pastebin.com/c2djEcqr - the crucial part that I was forgetting is that there needs to be a file handle somewhere that's still open, which is probably not true in most cases. Worth a quick check before unmounting.

I've used ddrescue and photorec for these sort of "issues" before with much success.


What did you do in that case?


1) Panic.

2) Run to comms room and yank out power cord.

3) Spend several days piecing files back together from backups and the remnant data on disk.

4) Learned a new respect for rm.


I guess it's a good thing he didn't have an SSD and TRIM enabled!


Not sure why you are getting down-voted. I assume you are referencing the SSD forensics article that was on here a while back that brought up the fact that TRIM on an SSD can cause forensic recovery issues. Basically blocks can be cleared even if the drive is unmounted but powered on.

SSD designers developed an interface allowing the operating system (e.g. Windows, Linux, Mac OS X etc.) to inform the controller that certain blocks are no longer in use via the TRIM command. This allows the internal garbage collector to electronically erase the content of these blocks, preparing them for future write operations.

http://forensic.belkasoft.com/en/why-ssd-destroy-court-evide...


That forensic article is a consequence of what I was referring to.

When you delete a file on a mechanical hard drive the physical contents of the file still exist on disk, so you can use tricks like these to recover deleted data.

When the drive is then told to write over these locations it doesn't matter that there is old data there and it writes the new data to the location.

SSDs however store data in pages, and while they can write directly to an empty page they can not write directly to a page that already has data in it. Instead, an SSD has to read the current data from the page, modify that data with the new data that it wants to be there, and write the new data to the whole page at once. This is called a read-modify-write operation and is a major reason why SSDs (even now) decrease in performance as they fill up.

The issue is that when you delete a file on disk there is no way for the SSD to know that those data blocks aren't important anymore (without TRIM). The controller of the SSD has to manage a full drive of data (even if you're only actually using some percent of it) and only figures out that a file was deleted when it is finally told to write something else to that location.

TRIM tells the SSD that a file was removed and allows a controller to recover that area to help maintain its performance.

There is a really good discussion of this topic in this article from way back in 2009: http://www.anandtech.com/show/2829


How could this work? I would expect this method to yield a bunch of matches corresponding to every version of the file that was once saved and continues to live on the disk. Unless you happen to have a string that only existed in the last version or so, but that's hard to come by...

I once did recovery of this kind for a friend (using, I think, photorec or extundelete, not grep) and the hardest part by far was piecing together the "right" version of the files from all matching versions that were recovered from disk.


I don't think there are any guarantees here, but if he kept on saving the same file with the same name, it probably overwrote the same sector on disk.


I guess it depends on the text editor. Some editors will save new versions of a file with the same name by writing the current buffer to a new file and then atomically moving that file to the target location, which has no reason to write on the same sectors.


What is the reason for saving then moving for existing files? Is this an optimization specific to certain file systems?


Moving a file (within the same file system) is an atomic operation on most file systems, but writing data is not.

If you don't do this and you're overwriting a file directly and the write fails for some reason, the data from the old file will be gone and you'll only have a partially-written new file in its place.

This also helps with systems that continuously poll files and watch for changes. If you have, say, a compiler watching your file, you don't want it to start compiling a partially-written version of your file and give you some strange error just because it happened to poll before the write finished.
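A minimal sketch of that write-then-rename pattern (file names are illustrative):

```shell
target=/tmp/notes.txt
echo "old contents" > "$target"

# Write the new version completely to a temp file on the
# same filesystem, so the final rename stays atomic.
tmp=$(mktemp "${target}.XXXXXX")
echo "new contents" > "$tmp"

# rename(2) is atomic: any reader sees either the old file
# or the new one, never a half-written mixture.
mv "$tmp" "$target"
cat "$target"   # → new contents
```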


So if there's a write failure the user won't lose his file.


A very important lesson regarding the Unix principles of "everything is a file" and programs that "do one job well".


It's kind of flawed where he says " If you pick x to be big enough, you should get the entire file, plus a bit of junk around the edges." - because this is simply not how filesystems work - if you manage to get the full file, it's by luck that it was small enough for the filesystem to allocate its contents contiguously on the block device.

Anything bigger and you're gonna quickly need something more sophisticated which can understand the filesystem it's dealing with, as it will need to collect the many pieces of your deleted file scattered across the block device and merge them. I'm sure that would be mountains of fun to do with bash.

And in this case, the "do one job well" program that you're gonna need is a program which specifically recovers deleted files from a specific filesystem.
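For what it's worth, the lucky contiguous case can be reproduced on a file-backed image instead of a real block device (offsets and strings here are made up for the demo):

```shell
# Build a 2 MiB zero-filled "disk" and plant some "deleted" text
# at a known offset, simulating leftover file contents.
dd if=/dev/zero of=/tmp/disk.img bs=1M count=2 2>/dev/null
printf 'package main // my lost code' | \
    dd of=/tmp/disk.img bs=1 seek=65536 conv=notrunc 2>/dev/null

# -a treats the binary image as text, -b prints the byte offset,
# -o prints only the match; grep -C around the hit would then
# dump the surrounding context, as in the article.
grep -a -b -o 'package main' /tmp/disk.img
```

As the parent says, this only hands you the whole file if the filesystem happened to lay it out contiguously.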


I'm the author of the post -- that's good to know, thanks! :) I know embarrassingly little about filesystems. I'm glad you pointed this out.

EDIT: though, I'd point out that if you really wanted to recover the file you should probably try to use /proc or something (at the time I didn't know about this). This approach requires crawling the disk which is obv pretty slow. :) It's less of a "here's a useful thing" and more of an excited "HEY DID YOU KNOW THAT YOU CAN DO X".

EDIT 2: I updated the blog to link to your comment, because it's baller.


If processes which hold your file open are still running, then you can access the file via the /proc/<pid>/fd/ entries. Run an 'ls -l' in the proc directory of the process to see those.

You can simply copy the (proc) file to a new location to recover it.
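A minimal demo of that, with a tail process standing in for the still-running program (Linux only; all names made up):

```shell
# A "precious" file held open by a long-running process.
echo "precious data" > /tmp/demo.txt
tail -f /tmp/demo.txt >/dev/null 2>&1 & pid=$!
sleep 1                      # give tail time to open the file

rm /tmp/demo.txt             # unlink: the inode survives while the fd is open

# Find the fd whose symlink points at the deleted file and copy it out.
for fd in /proc/$pid/fd/*; do
    case $(readlink "$fd") in
        '/tmp/demo.txt (deleted)') cp "$fd" /tmp/recovered.txt;;
    esac
done
kill $pid
cat /tmp/recovered.txt       # → precious data
```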

Remember: open files keep the contents present on disk until the controlling process exits or closes the filehandle.

Since you're actually accessing the file on disk, your issues of storage contiguity don't come into play -- it's all read back to you in proper file order.

But yes, files (and virtual files) on Linux are pretty slick.

I also remember being really excited learning about disk image files and the ways in which they can be manipulated. Including the options of creating filesystems on virtual disks, partitioning them, and then mounting those partitions, etc. First tried with bootable floppy distros, but I've played around with them in a bunch of contexts since.


Last time I ran fsck on my ext2 partition the fragmentation ratio was pretty low, and I tend to fill up my disks. Fortunately, homework assignments tend to be shorter, and more likely to fit in a contiguous spot. Anyway, what else can you do?

From a different perspective, hopefully /tmp is on a different filesystem from /home, otherwise reading the man pages might overwrite the blocks you need to recover with the temporary files they produce. (And less, more, sort, etc.) Also, doing Google/StackOverflow searches is probably unwise due to the browser writing stuff to the 50 MB disk cache (FF default, anyway) on the filesystem you want. Probably step 1 should be "remount the partition read-only". Or better yet, "find another computer to use for research" :)


Also a very important lesson regarding the dangers of poor user interface design. It's a little bit crazy that in 2014 we still have a significant amount of serious work being done on systems where a slip of the finger or a one-character typo in a script can literally destroy whole systems with no confirmation and no reliable recovery mechanism.


This is why every system I administer has 'rm' aliased to 'rm -i' (along with 'cp' and 'mv' just in case). I believe this is the default on RHEL/CentOS boxes. Certainly for root, but should be for every user. Sure, it can be a pain sometimes to have to confirm, but at least you get the chance....unless you add '-f'.
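For reference, the aliases in question (typically set in ~/.bashrc or, on RHEL-style systems, in /etc/profile.d):

```shell
alias rm='rm -i'    # prompt before every removal
alias cp='cp -i'    # prompt before overwriting an existing file
alias mv='mv -i'    # same, for moves
```

Note that rm -f, or invoking /bin/rm directly, bypasses the alias, so it's a seatbelt rather than a guarantee.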


This is why every system I administer has 'rm' aliased to 'rm -i' (along with 'cp' and 'mv' just in case).

Glad I'm not the only one. :-)

However, that is rather a specific case, albeit a common one. I have lost count of how many times I've seen even very experienced sysadmins do something disastrous by accident that is entirely due to the poor usability of some Linux shell or other command line-driven software with a similar design style and culture.

I have seen someone nuke an entire system, with a shell script that failed at string interpolation and literally did an 'rm -rf /', after I explicitly warned them of the danger and they thought they'd guarded against it. That person was a very capable sysadmin with many years of experience, but expecting anyone to never make a mistake with that kind of system is like expecting a similarly experienced programmer to write bug-free code with nothing but an 80x25 terminal window and a line editor.


Nothing makes you appreciate "don't miss" like deleting /etc on a live system. For a good few weeks after that I nearly introduced a peer review process to my own shell.

That being said, there's certainly something to that one event doing more to reform my being fast and loose with destructive commands than years of being told/telling myself to do so. (Something likely being that I'm apparently a slow learner.)


Looking at the man page now, there is now a -I option, which prompts only once before removing more than three files, or when removing recursively.


We are moving in the right direction with copy-on-write snapshots. What would be neat is an 'immutable' filesystem, where nothing is erased (up to garbage collection). This is likely too extreme to be practical, as we don't want to copy an entire block to change one bit, or read through a journal for every fs action. Even in theory, we don't want to spend the disk space to record the precise state at every point in time.

Now that I think about it, it shouldn't be too hard to turn this into a workable product for general use. Automatically take a snapshot every 5 minutes, and present the user with a program that browses the filesystem at time X, probably with integration into the file manager to restore files/folders. Practically speaking, it needs some form of pruning. Probably along the lines of save every 5 minutes for the past hour, etc. My only concern with this is how well optimized Btrfs is for frequent snapshots. Either way, I know what I am doing this weekend.
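A hypothetical crontab line for the five-minute snapshot idea (assumes /home is a btrfs subvolume and /home/.snapshots already exists; pruning old snapshots is left out):

```shell
# m h dom mon dow  command   (% must be escaped in crontab)
*/5 * * * * /usr/bin/btrfs subvolume snapshot -r /home /home/.snapshots/$(date +\%Y\%m\%d-\%H\%M)
```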


There are certainly some very grand schemes we could adopt to improve this specific problem, but let's not overlook the simple things. Every popular OS GUI has included some sort of "recycle bin" concept for a long time. There is no reason at all that a text shell shouldn't provide the same safety net for its delete command.


> Automatically take a snapshot every 5 minutes, and present the user with a program that browses the filesystem at time X, probably with integration into the file manager to restore files/folders.

Something like this? http://java.dzone.com/news/killer-feature-opensolaris-200


That sounds like it is doing something fancier than periodic snapshotting, such as using an immutable filesystem, where every fs operation is inherently lossless (up to garbage collection).

Of course, I might be reading too much into the continuous nature of a slider. Does anyone have experience with that feature?


(Sorry for the late reply)

In OpenSolaris it just used cronjobs to create zfs snapshots.


go find solace with plan9


Back in the days when Netscape Navigator was the browser of choice on Linux, I found more than once I could recover the contents of a half-composed post to a web log by searching through /proc/kcore (or its precursor -- I think that's changed), and looking for string fragments from the file.

My success rate in recovery was markedly better than Navigator's capability in running without crapping out.

The most painful story I've heard was of a BSD admin who had to recover gzipped financial audit files from a corrupted disk, requiring both recovery from media and reconstruction of files from fragments of the compressed data. Apparently somewhat painful, but not entirely without success.


I once used dd and grep to recover files from a badly corrupted ZFS pool. It was painful. (The cause was a slowly failing PSU -- the +12v line was not holding up under load. So some writes would successfully make it onto the disks and some wouldn't. I don't blame ZFS for failing under such circumstances; but I still wished it had some kind of salvager.)



