What I found most interesting from working on some FUSE filesystems (and from th...

AceJohnny2 · on April 24, 2018

> A neat trick here is that you can effectively still access the file (even restore it) if a process has the file open, through the /proc filesystem.

A technique used in the most extreme unix system recovery story I've ever read, by Al Viro:

http://yarchive.net/comp/linux/extreme_system_recovery.html

Wherein:

* system libraries are recovered from init still having them mapped/opened, as you describe

* basic system utilities like 'ln' are recreated from their syscalls and writing the assembly

* ELF binaries are recreated by crafting their headers manually

jwilk · on April 24, 2018

> access permissions are also properties of the embedding in the directory, rather than the bits of the file itself.

No, in POSIX they are properties of the file. You can use fchmod() and fchown() to change mode and ownership via an fd.
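
A quick way to check this yourself is the Python sketch below. It is only an illustration: the function name and file names are made up, and it assumes a filesystem that supports hard links. It changes the mode through an fd and then reads it back through two different names for the same inode.

```python
import os
import stat
import tempfile


def mode_follows_inode():
    """Change the mode via an fd, then read it back through two hard links."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "first")
        second = os.path.join(d, "second")
        fd = os.open(first, os.O_CREAT | os.O_RDWR, 0o644)
        try:
            os.link(first, second)   # second directory entry, same inode
            os.fchmod(fd, 0o600)     # no path involved, only the fd
            modes = [stat.S_IMODE(os.stat(p).st_mode) for p in (first, second)]
            same_inode = os.stat(first).st_ino == os.stat(second).st_ino
        finally:
            os.close(fd)
    return modes, same_inode
```

If permissions belonged to the directory entry, the two names could report different modes; here both should report the mode set through the fd.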

> A neat trick here is that you can effectively still access the file (even restore it) if a process has the file open, through the /proc filesystem.

Yes, you can access them; but I don't believe you can link them. (But I'd love to be proven wrong. A few years ago I actually needed to un-unlink a non-regular file that was still open.)

nemo1618 · on April 24, 2018

> Yes, you can access them; but I don't believe you can link them.

I think you can if you use debugfs. I wrote a post here about recovering a running binary after deleting the file on disk: http://lukechampine.com/recoverbin.html

jwilk · on April 24, 2018

Huh, I didn't know debugfs can operate on a mounted filesystem. Sounds incredibly dangerous...

debugfs(8) manpage says that ln "does not adjust the inode reference counts". Is there a way to increase that number?

adrianmonk · on April 24, 2018

I haven't tried it, but based on the manpage, I would expect this to work:

    set_inode_field foo links_count 1

SpikedCola · on April 24, 2018

Why does it sound dangerous? (Genuinely curious, it looks like it could be a very useful tool in certain situations)

I've never used it, but it appears to operate in read-only mode by default[0]:

> -w

> Specifies that the file system should be opened in read-write mode. Without this option, the file system is opened in read-only mode.

[0] https://linux.die.net/man/8/debugfs

hiccuphippo · on April 25, 2018

> Yes, you can access them; but I don't believe you can link them.

Yes you can. I remember YouTube's flash viewer back in the day would put the downloaded flv video in /tmp and then delete it. I used to check the flash pid, go to /proc/{pid}/fd and see the symlink to the deleted file. Then a cp would give me the actual file.

nick0garvey · on April 25, 2018

I don't think this is the same as linking the file. You are not creating a new link to an existing file, you are creating a copy of it and creating a link to that.

If you modified the old file after the cp, you wouldn't see the changes in the new one.
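
The difference can be sketched in Python, assuming Linux with /proc mounted. The function and file names are hypothetical. The copy taken through /proc/self/fd gets its own inode, so later writes to the still-open deleted file never reach it.

```python
import os
import tempfile


def copy_out_of_proc():
    """Rescue a deleted-but-open file by copying it out of /proc/self/fd."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "video.flv")
        rescued = os.path.join(d, "rescued.flv")
        fd = os.open(original, os.O_CREAT | os.O_RDWR, 0o600)
        try:
            os.write(fd, b"first part")
            os.unlink(original)                      # link count drops to 0
            with open(f"/proc/self/fd/{fd}", "rb") as src:
                snapshot = src.read()                # same as `cp` out of /proc
            with open(rescued, "wb") as dst:
                dst.write(snapshot)
            os.write(fd, b", second part")           # modify the old file
            with open(rescued, "rb") as f:
                copied = f.read()
            live = os.pread(fd, 100, 0)              # what the open fd sees now
            same_inode = os.fstat(fd).st_ino == os.stat(rescued).st_ino
        finally:
            os.close(fd)
    return copied, live, same_inode
```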

andrewla · on April 24, 2018

That's interesting about the access permissions and ownership; I thought that access permissions in POSIX were path-dependent. Some quick experimentation indicates that ownership and access do in fact apply across hard links.

There's still some truth to the path-dependent notion, in that you may not be able to access a file through a hard link in a directory that you do not have access to, even if you have access to that same hard link through another path. But if you don't have access to the file itself then you're out of luck.

This does make sense from a security perspective, but I thought that the path-dependent checks in the kernel were strong enough to not require inode-associated ACLs.

You're right about restoring the file by re-linking to the hard link, but you can access the contents and cp it out of proc at least.

tinus_hn · on April 25, 2018

On Linux, the linkat(2) function can link up open files using an empty origin path and the AT_EMPTY_PATH flag.

That’s not part of POSIX though.

dfox · on April 24, 2018

You certainly cannot un-unlink a file by linking the pseudofile in /proc somewhere, as that would involve cross-filesystem hardlinks.

But it is probably possible to write a simple kernel module that would allow you to do that through some non-standard interface.

daurnimator · on April 25, 2018

Yes you can.

    linkat(AT_FDCWD, "/proc/self/fd/N", destdirfd, newname, AT_SYMLINK_FOLLOW);

Will do it. This is the longest-available `flink` syscall method. See https://lwn.net/Articles/562488/

Sadly the AT_EMPTY_PATH change was backed out between 3.11-rc7 and release. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...

jwilk · on April 25, 2018

As pointed out in another comment, this doesn't work when the link count is 0:

  $ uname -rv
  4.15.0-3-amd64 #1 SMP Debian 4.15.17-1 (2018-04-19)

  $ touch foo

  $ exec 3<>foo

  $ rm foo

  $ ls -l /proc/$$/fd/3
  lrwx------ 1 jwilk jwilk 64 Apr 25 10:02 /proc/324/fd/3 -> '/home/jwilk/foo (deleted)'

  $ strace -e linkat ln -L /proc/$$/fd/3 foo
  linkat(AT_FDCWD, "/proc/3447/fd/3", AT_FDCWD, "foo", AT_SYMLINK_FOLLOW) = -1 ENOENT (No such file or directory)
  ln: failed to create hard link 'foo' => '/proc/3447/fd/3': No such file or directory

  $ sudo strace -e linkat ln -L /proc/$$/fd/3 foo
  linkat(AT_FDCWD, "/proc/3447/fd/3", AT_FDCWD, "foo", AT_SYMLINK_FOLLOW) = -1 ENOENT (No such file or directory)
  ln: failed to create hard link 'foo' => '/proc/3447/fd/3': No such file or directory
  +++ exited with 1 +++

daurnimator · on April 25, 2018

Unless you use `O_TMPFILE` without `O_EXCL` to create the file.

    open("/tmp", O_RDWR|O_TMPFILE, 0666)    = 3
    linkat(3, "", AT_FDCWD, "/tmp/bar", AT_EMPTY_PATH) = 0
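
Here is a rough Python sketch of the same idea, using the /proc/self/fd route rather than AT_EMPTY_PATH. Assumptions: Linux, /proc mounted, and a filesystem that supports `O_TMPFILE`. The name `publish_tmpfile` is made up. As far as I know, passing `dst_dir_fd` is what makes CPython's `os.link` issue linkat() with AT_SYMLINK_FOLLOW instead of plain link().

```python
import os


def publish_tmpfile(directory, name, data):
    """Create an unnamed file with O_TMPFILE, fill it, then give it a name."""
    dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # No O_EXCL here, so the file may be linked into the tree later.
        fd = os.open(directory, os.O_RDWR | os.O_TMPFILE, 0o600)
        try:
            os.write(fd, data)
            # linkat(AT_FDCWD, "/proc/self/fd/N", dir_fd, name, AT_SYMLINK_FOLLOW)
            os.link(f"/proc/self/fd/{fd}", name,
                    dst_dir_fd=dir_fd, follow_symlinks=True)
        finally:
            os.close(fd)
    finally:
        os.close(dir_fd)
    return os.path.join(directory, name)
```

The file only becomes visible once it is complete, which is the usual reason to do this instead of writing to a named temp file and renaming it.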

andrewla · on April 24, 2018

I don't think the cross-filesystem hardlink is the problem, since the link in proc is a symbolic link.

I can do this experimentally by creating a symbolic link in /tmp (a different filesystem) to a file in /home, and then creating a hard link (with ln -L) from the symbolic link to another file in /home, and the result is a valid hardlink to the same inode as the original file.

This doesn't work through /proc for an unlinked file, but only because the underlying link call requires a path, not an inode. You can create a hard link out of proc if the file has not been deleted, though, without any cross-filesystem problems.

dfox · on April 24, 2018

You have to somehow increment the inode's reference count and write a reference to it into some directory.

symlink() does not increase the reference count of anything, and in fact its target does not have to be a meaningful filename at all (although in the practical non-POSIX sense there does not exist any string that is not a valid filename). One interesting ab-use of this is that you can use symlink()/readlink() as an ad-hoc key-value store with atomicity guarantees (that hold true even on NFS). For example, emacs uses exactly this for its file locking mechanism.

IIRC the files in /proc/pid/fd are not true symlinks but something that behaves as both a file (you can do the same IO operations as on the original FD) and a symlink (i.e. you can readlink() them and get some string) at once.
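
The symlink()/readlink() lock trick can be sketched like this in Python. This is not how emacs implements it, just the general shape. The function names and the owner string format are invented for the example.

```python
import os
import tempfile


def try_lock(lock_path, owner):
    """Atomically claim lock_path; the symlink target stores the owner string."""
    try:
        os.symlink(owner, lock_path)   # the target need not name a real file
        return True
    except FileExistsError:
        return False                   # somebody else already holds the lock


def lock_owner(lock_path):
    """Return the stored owner string, or None if the lock is free."""
    try:
        return os.readlink(lock_path)
    except FileNotFoundError:
        return None


def demo_lock():
    with tempfile.TemporaryDirectory() as d:
        lock = os.path.join(d, ".#notes.txt")
        first = try_lock(lock, "alice@host.123")
        second = try_lock(lock, "bob@host.456")
        holder = lock_owner(lock)
        os.unlink(lock)                # release
        after = lock_owner(lock)
    return first, second, holder, after
```

Creating the symlink either succeeds or fails as a whole, so "check and take the lock" and "record who holds it" happen in a single step.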

zokier · on April 24, 2018

man 2 open section about O_TMPFILE seems to strongly imply that you can linkat from /proc/<pid>/fd to a concrete file. Not sure if there are some special cases for /proc/self/fd vs /proc/<pid>/fd, but that would seem a bit odd.

http://man7.org/linux/man-pages/man2/open.2.html

edit: nevermind, seems like O_TMPFILE is the one that has been special-cased here, from man 2 linkat:

> This will generally not work if the file has a link count of zero (files created with O_TMPFILE and without O_EXCL are an exception).

http://man7.org/linux/man-pages/man2/linkat.2.html

:(

amelius · on April 24, 2018

In POSIX, why can't you link a file to which you have read access, but you're not an owner or have group access? It's a pretty annoying restriction.

cat199 · on April 24, 2018

because you are modifying the file's inode, which you do not have permission to modify.

amelius · on April 24, 2018

Ok, but what does it modify besides the refcount?

cat199 · on April 24, 2018

not sure. but you don't have permission to modify that inode, hence, no permission to link. the model is pretty straightforward.

also, being able to 'create' files 'owned' by another user in other locations (by linking them into place) could create quite a few bizarre and undefined corner cases, some of which might have implications for system stability and/or security.

jwilk · on April 25, 2018

But... traditionally, Unix systems do allow creating hardlinks of other users' files. And yes, this misfeature is a source of a great number of security holes.

An option to disable this behavior (/proc/sys/fs/protected_hardlinks) was added only in Linux 3.6, and even then it's still disabled by default.
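
To see what a given machine does, you can read the sysctl directly. Here is a small Python sketch with a made-up function name. It returns None when the file is missing, for example on a pre-3.6 kernel or a non-Linux system.

```python
def read_protected_hardlinks(path="/proc/sys/fs/protected_hardlinks"):
    """Return the sysctl value as an int, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None
```

A value of 0 means the traditional behavior is in effect; 1 means the restriction is on.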

foobiekr · on April 25, 2018

consider what would happen if the file was counted against someone's quota and they rm'd the file but your link was still outstanding.

nine_k · on April 25, 2018

I suppose you can read the file if you can intercept an open fd, read it, and write it somewhere. It could possibly be the now vacant previous location.

jwilk · on April 25, 2018

For me it was (IIRC) a socket, so I couldn't "read it".

Rapzid · on April 24, 2018

I believe you may be able to relink the inode with debugfs, depending on the filesystem.

microtherion · on April 25, 2018

On macOS, you might be able to accomplish this with fclonefileat()

superkuh · on April 25, 2018

This trick used to be the way I would download (or play in VLC) videos sent through Flash embeds in webpages. Way back in the day there was an actual temp file. But many companies didn't like that so they started deleting the file immediately to preserve the consensual hallucination that is 'streaming' and keep the lawyers happy. The way around this was to use stat, proc and awk in a bashrc function, ie:

    vlc $(stat -c %N /proc/*/fd/* 2>&1 | awk -F[\`\'] '/tmp\/Flash/{print$2}')

With newer versions of Flash this too went away (around 2014). But I still keep a browser profile around that uses an old one just so I can access the downloaded file to play in VLC (much smoother).

linza · on April 24, 2018

Access permissions are part of the inode, IIRC, and not part of the directory entry (basically the same as in Win32?).

dfox · on April 24, 2018

On FAT the "permissions" are in fact part of the directory entry (IIRC some OSes in "multitasking DOS" family even have FAT extended by having what essentially is the unix mode, uid, gid tuple in the directory entry).

On NTFS permissions live in MFT, which essentially is same thing as inode.

sigjuice · on April 24, 2018

Yes, this is documented in inode(7)

bscphil · on April 25, 2018

Another neat side effect of this is that you can remove a file that you don't own, so long as it's in a directory that you do. rm prompts to confirm by default, but it's intuitively surprising that you can delete a root-owned file that you have no read or write permissions to.
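
Here is a small Python sketch of the effect. A file with mode 000 stands in for the root-owned file, since creating a real root-owned file needs root; the function name is made up. Note that sticky directories such as /tmp add an ownership check, which this sketch does not cover.

```python
import os
import tempfile


def remove_without_file_permissions():
    """Unlink a file we cannot read or write, inside a directory we own."""
    with tempfile.TemporaryDirectory() as d:
        victim = os.path.join(d, "locked")
        with open(victim, "w") as f:
            f.write("secret")
        os.chmod(victim, 0o000)   # no permissions at all on the file itself
        os.unlink(victim)         # needs write+search on the directory only
        return os.path.lexists(victim)
```

unlink() modifies the directory, not the file, so only the directory's permissions are consulted.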

dsnuh · on April 25, 2018

This is also a cause of frequent confusion among less knowledgeable users when they try to clear up space on a filesystem that is reporting full, but there is a process keeping the file descriptor open. As far as they can tell the file is gone, but the usage hasn't gone down. There is a guy at my work that I point out lsof +L1 to about every three months or so. He can't seem to wrap his head around the concept.
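
For the curious, here is roughly what that check boils down to for a single process, as a Python sketch. It assumes Linux with /proc mounted; the function name is made up, and it is far cruder than lsof. It lists open fds whose regular file has a link count of 0.

```python
import os
import stat


def deleted_open_files(pid="self"):
    """Return {fd: link target} for open regular files with link count 0."""
    fd_dir = f"/proc/{pid}/fd"
    found = {}
    for name in os.listdir(fd_dir):
        entry = os.path.join(fd_dir, name)
        try:
            st = os.stat(entry)   # follows the /proc link to the open file
            if stat.S_ISREG(st.st_mode) and st.st_nlink == 0:
                found[int(name)] = os.readlink(entry)
        except OSError:
            continue              # fd closed in the meantime, or no permission
    return found
```

The space is only released once every fd that shows up here has been closed.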

tinus_hn · on April 25, 2018

Using hard links you can have a file that exists in multiple places with none of these being ‘the’ place. If you remove one of these links, nothing happens in the other places. All you see is the reference count going down by one.

Only when the reference count is zero will the file data be removed.
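
You can watch the count move with fstat(). This is a small Python sketch with a made-up function name, assuming a filesystem with ordinary hard link support. It records the link count after each step and then reads the data back once the count is 0.

```python
import os
import tempfile


def link_count_lifecycle():
    """Track st_nlink while links to one inode are added and removed."""
    counts = []
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a")
        b = os.path.join(d, "b")
        fd = os.open(a, os.O_CREAT | os.O_RDWR, 0o600)
        try:
            os.write(fd, b"payload")
            counts.append(os.fstat(fd).st_nlink)   # one name: a
            os.link(a, b)
            counts.append(os.fstat(fd).st_nlink)   # two names: a and b
            os.unlink(a)
            counts.append(os.fstat(fd).st_nlink)   # only b is left
            os.unlink(b)
            counts.append(os.fstat(fd).st_nlink)   # no names, fd still open
            data = os.pread(fd, 7, 0)              # contents are still there
        finally:
            os.close(fd)
    return counts, data
```

The data stays readable after the last name is gone because the open fd keeps the inode alive; the kernel frees it only once that fd is closed too.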

JadeNB · on April 26, 2018

> Only when the reference count is zero will the file data be removed.

Not even necessarily then, right? (That is, there's no guarantee that a file's data is zeroed out just because its reference count drops to 0.) It's more just that it's only when the reference count is 0 that the actual space occupied on disk can be overwritten.

digi_owl · on April 24, 2018

Flash (ab)use this to hide the cache files for streamed videos. They create a temp file, then open it, then delete it. You can however still grab it out of proc...

zokier · on April 24, 2018

O_TMPFILE sort of does that trick automatically now; it creates a file without a directory entry, so you don't need to jump through those (race-prone) hoops of unlinking the file.