Actually recovered quite a few people's homework assignments that way. Of course, you could also write to that file, so we had free-space sweepers. And since you could write to that empty space, you could also leave messages, even from the public (100,0) account, which had a zero permanent disk quota (all files deleted on logout).
My editor was still open, and the gocode code-completion daemon too. I looked up a way to dump a process's memory on StackOverflow and used it. (I used a Python script, but I see gdb's gcore command and other ways recommended elsewhere.) It worked out. An extra complication was that my home directory is encrypted, so searching the raw disk was out.
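For reference, the gcore route mentioned above looks roughly like this (a sketch; the process name, paths, and search string are placeholders, and the dump should go to a different filesystem than the one you're trying to recover from):

    # find the pid of the still-running daemon
    pgrep gocode
    # gcore ships with gdb; it writes the process image to <prefix>.<pid>
    gcore -o /mnt/otherdisk/dump 12345
    # search the dump as text for a phrase from the lost file
    grep -a 'some phrase from the lost file' /mnt/otherdisk/dump.12345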
So: never too early to use version control. Also, if you think you even might have fat-fingered something, think a second before you do anything that could make the situation worse. I didn't pay enough attention to the error message that could have told me I'd messed up, and answered 'yes' to my editor asking if I should close the file, thinking it had just moved--if I hadn't done that it'd've saved a lot of time. On the other hand, had I kept on charging forth once I realized I had messed up, I could've easily closed the editor, ending the gocode process I wound up recovering stuff from.
So, yeah: be smarter than I was, y'all.
You haven't lived until you've unwittingly run rm -rf on an NFS mount of / on a remote box. Which happens to be the fileserver for a trading shop. In the middle of trading hours.
Pity he'd not considered that the old one had NFS mounts to the live data...
Thankfully we had two machines which were virtually identical OS-wise (RedHat 6 if memory serves). I was able to get everything put back from the twin machine and keep everybody happy.
Thankfully that server kept running with relatively little issue the entire time even with all those core OS files gone. I don't think any customers were at all aware.
How much the system needs to open the deleted files afresh, rather than keep using handles that were already open, will have a strong impact on this.
I seem to remember having a problem where I'd rm-ed a file I needed, used lsof to find the file handle, and was then able to cat the data into a new file using the handle instead of the filename? Details pretty fuzzy, sorry.
Example of recovery this way, tested now, works for me - http://pastebin.com/c2djEcqr - the crucial part that I was forgetting is that there needs to be a file handle somewhere that's still open, which is probably not true in most cases. Worth a quick check before unmounting.
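For anyone who lands here later, the shape of it is something like this (the PID and FD numbers are made up for illustration):

    # find processes still holding deleted files open
    lsof -nP | grep '(deleted)'
    # say PID 1234 has the file open on descriptor 5;
    # the kernel still exposes the contents under /proc
    cat /proc/1234/fd/5 > /tmp/recovered-file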
I've used ddrescue and photorec for these sort of "issues" before with much success.
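In case it helps anyone, the usual shape of that workflow (device and paths here are placeholders) is:

    # image the failing disk first, so the recovery tools never touch the original
    ddrescue -n /dev/sdX /mnt/bigdisk/disk.img /mnt/bigdisk/rescue.map
    # then carve recoverable files out of the image (photorec is interactive)
    photorec /mnt/bigdisk/disk.img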
2) Run to comms room and yank out power cord.
3) Spend several days piecing files back together from backups and the remnant data on disk.
4) Learned a new respect for rm.
SSD designers developed an interface allowing the operating system (e.g. Windows, Linux, Mac OS X etc.) to inform the controller that certain blocks are no longer in use via the TRIM command. This allows the internal garbage collector to electronically erase the content of these blocks, preparing them for future write operations.
When you delete a file on a mechanical hard drive the physical contents of the file still exist on disk, so you can use tricks like these to recover deleted data.
When the drive is then told to write over these locations it doesn't matter that there is old data there and it writes the new data to the location.
SSDs however store data in pages, which are grouped into larger erase blocks, and while they can write directly to an empty page they cannot overwrite a page in place: flash can only be erased a whole block at a time. To change existing data, the SSD has to read the still-valid pages of the block, merge in the new data, erase the block, and write everything back. This read-modify-write cycle is a major reason why SSDs (even now) decrease in performance as they fill up.
The issue is that when you delete a file on disk there is no way for the SSD to know that those data blocks aren't important anymore (without TRIM). The controller of the SSD has to manage a full drive of data (even if you're only actually using some percent of it) and only figures out that a file was deleted when it is finally told to write something else to that location.
TRIM tells the SSD that a file was removed and allows a controller to recover that area to help maintain its performance.
There is a really good discussion of this topic in this article from way back in 2009: http://www.anandtech.com/show/2829
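On a modern Linux box you can poke at this yourself (a sketch; the mount point is a placeholder):

    # does the device advertise discard/TRIM support?
    lsblk --discard
    # manually discard unused blocks on a mounted filesystem
    sudo fstrim -v /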
I once did recovery of this kind for a friend (using, I think, photorec or extundelete, not grep) and the hardest part by far was piecing together the "right" version of the files from all matching versions that were recovered from disk.
If you don't do this and you're overwriting a file directly and the write fails for some reason, the data from the old file will be gone and you'll only have a partially-written new file in its place.
This also helps with systems that continuously poll files and watch for changes. If you have, say, a compiler watching your file, you don't want it to start compiling a partially-written version of your file and give you some strange error just because it happened to poll before the write finished.
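For concreteness, the pattern being described (write to a temp file, then rename it over the original) looks like this; filenames are just examples:

    # write the new contents to a temporary file on the same filesystem
    printf '%s\n' "new contents" > target.txt.tmp
    # rename(2) is atomic: readers see either the old file or the
    # complete new one, never a half-written version
    mv target.txt.tmp target.txt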
Anything bigger and you're gonna quickly need something more sophisticated which can understand the filesystem it's dealing with, as it will need to collect the many pieces of your deleted file scattered across the block device and merge them. I'm sure that would be mountains of fun to do with bash.
And in this case, the "do one job well" program that you're gonna need is a program which specifically recovers deleted files from a specific filesystem.
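For ext3/ext4, for example, that's something like extundelete (device and path here are made up; the filesystem should be unmounted or read-only first):

    extundelete /dev/sdb1 --restore-file home/user/thesis.tex
    # or try to get everything back at once
    extundelete /dev/sdb1 --restore-all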
EDIT: though, I'd point out that if you really wanted to recover the file you should probably try to use /proc or something (at the time I didn't know about this). This approach requires crawling the disk which is obv pretty slow. :) It's less of a "here's a useful thing" and more of an excited "HEY DID YOU KNOW THAT YOU CAN DO X".
EDIT 2: I updated the blog to link to your comment, because it's baller.
You can simply copy the (proc) file to a new location to recover it.
Remember: open files keep the contents present on disk until the controlling process exits or closes the filehandle.
Since you're reading the file back through the filesystem rather than scraping the raw device, issues of on-disk contiguity don't come into play -- it's all read back to you in proper file order.
But yes, files (and virtual files) on Linux are pretty slick.
I also remember being really excited learning about disk image files and the ways in which they can be manipulated, including the options of creating filesystems on virtual disks, partitioning them, and then mounting those partitions, etc. First tried with bootable floppy distros, but I've played around with them in a bunch of contexts since.
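If anyone wants to try it, the basic loopback dance goes something like this (sizes and paths are arbitrary):

    # create a 100 MB empty image and put a filesystem on it
    dd if=/dev/zero of=disk.img bs=1M count=100
    mkfs.ext4 -F disk.img   # -F: it's a regular file, not a block device
    # mount it via a loop device and use it like any other disk
    sudo mount -o loop disk.img /mnt/virtual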
From a different perspective, hopefully /tmp is on a different filesystem from /home, otherwise reading the man pages might overwrite the blocks you need to recover with the temporary files they produce. (And less, more, sort, etc.) Also, doing Google/StackOverflow searches is probably unwise due to the browser writing stuff to the 50 MB disk cache (FF default, anyway) on the filesystem you want. Probably step 1 should be "remount the partition read-only". Or better yet, "find another computer to use for research" :)
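The remount step, for reference, is just:

    # stop everything from writing to the filesystem holding the deleted data
    sudo mount -o remount,ro /home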
Glad I'm not the only one. :-)
However, that is rather a specific case, albeit a common one. I have lost count of how many times I've seen even very experienced sysadmins do something disastrous by accident that is entirely due to the poor usability of some Linux shell or other command line-driven software with a similar design style and culture.
I have seen someone nuke an entire system, with a shell script that failed at string interpolation and literally did an 'rm -rf /', after I explicitly warned them of the danger and they thought they'd guarded against it. That person was a very capable sysadmin with many years of experience, but expecting anyone to never make a mistake with that kind of system is like expecting a similarly experienced programmer to write bug-free code with nothing but an 80x25 terminal window and a line editor.
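The classic version of that bug (illustrative, not the actual script from the story) is an unquoted or empty variable:

    # if PREFIX is unset or empty, this expands to 'rm -rf /*'
    rm -rf $PREFIX/*
    # one guard: ${VAR:?} makes the shell abort if VAR is unset or empty
    rm -rf "${PREFIX:?PREFIX is not set}"/*

(Modern coreutils rm also refuses a bare 'rm -rf /' without --no-preserve-root, but that won't save you from 'rm -rf /*', since each expanded argument is a path below /.)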
That being said, there's certainly something to that one event doing more to reform my being fast and loose with destructive commands than years of being told/telling myself to do so. (Something likely being that I'm apparently a slow learner.)
Now that I think about it, it shouldn't be too hard to turn this into a workable product for general use. Automatically take a snapshot every 5 minutes, and present the user with a program that browses the filesystem at time X, probably with integration into the file manager to restore files/folders. Practically speaking, it needs some form of pruning. Probably along the lines of save every 5 minutes for the past hour, etc. My only concern with this is how optimized Btrfs is for frequent snapshots. Either way, I know what I am doing this weekend.
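The mechanics on Btrfs are pretty simple, for what it's worth (a sketch; the subvolume layout and schedule are whatever you choose):

    # a read-only snapshot of /home, e.g. from a cron job every 5 minutes:
    # */5 * * * * /usr/local/bin/snap-home.sh
    btrfs subvolume snapshot -r /home "/home/.snapshots/$(date +%Y%m%d-%H%M)"
    # pruning is then just deleting old snapshots
    btrfs subvolume delete /home/.snapshots/20150101-0000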
Something like this? http://java.dzone.com/news/killer-feature-opensolaris-200
Of course, I might be reading too much into the continuous nature of a slider. Does anyone have experience with that feature?
In OpenSolaris it just used cronjobs to create zfs snapshots.
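i.e. roughly this per run, with another job pruning old snapshots (the dataset name is a placeholder):

    zfs snapshot rpool/export/home@auto-$(date +%Y%m%d-%H%M)
    # list what you've accumulated
    zfs list -t snapshot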
My success rate in recovery was markedly better than Navigator's ability to run without crapping out.
The most painful story I've heard was of a BSD admin who had to recover gzipped financial audit files from a corrupted disk, requiring both recovery from the media and reconstruction of files from fragments of the compressed data. Apparently somewhat painful, but not entirely without success.