Unix filesystems: How mv can be dangerous (jstimpfle.de)
89 points by jstimpfle 939 days ago | 38 comments



Once upon a time, mv was "a stupid system call interface to rename(2)". However, some users didn't like having to think about whether the source and destination were on the same mountpoint, so someone decided to make mv more complex[1]. The Unix designers lamented the growing complexity, but the great many naive users celebrated the change, and demanded that all mv implementations do that.

[1]: I believe GNU mv was the first to do that, but I could be mistaken.

Edit: As userbinator points out, this isn't totally true. I still believe it is true for moving directories, but for files, mv has copied them since at least Research UNIX V4 (1973), implemented by either Ken Thompson or Dennis Ritchie himself.


> However, some users didn't like having to think about whether the source and destination were on the same mountpoint

That's actually a good thing, because it makes scripts more composable. Imagine you wrote a script with a mv buried somewhere in it: you'd have to document this restriction all the way up to the end users of your script. If the mv command didn't handle the mount-point-spanning case, this functionality would have to be repeated in all scripts that use mv (which would be a disaster).

> the great many naive users celebrated the change

That's a very arrogant, unwarranted thing to say.


> > However, some users didn't like having to think about whether the source and destination were on the same mountpoint

> That's actually a good thing, because it makes scripts more composable.

I argue that it actually precludes any form of composability:

http://jstimpfle.de/fun/composability.html


I didn't say it was a bad change. But even good changes should make one frown at the growing complexity.


> frown at the growing complexity

When I have the chance to remove replicated complexity from a myriad of places and to concentrate it on one central point (and make the interface simpler and more consistent in the process), I don't frown, I laugh with joy.


But for more than a decade, it was a bug that moving across devices screwed up ownership. So you still had to be aware of whether you were moving across devices, but now, if you were and didn't realize it, mv would automatically screw up your data.


That must be ancient history, because even Unix V6's mv would copy-delete (it didn't support moving directories to different parts of the tree, though; that would be introduced in V7).

http://man.cat-v.org/unix-6th/1/mv

http://cm.bell-labs.com/7thEdMan/v7vol1.pdf

http://man.cat-v.org/unix_8th/1/mv


I just did some digging. You're right that copying to different filesystems was introduced in Research UNIX.

In V5 and V6, mv actually calls `execl("/bin/cp", ...)` to do the copy!

Note that neither V5 nor V6 does this for directories, though, only for regular files (which I guess is implied by the fact that they force the parent directory to be the same for directory moves).

I can't find the source or binaries of V3 mv, but the V3 manual does not mention copying (the text in the V6 manual originated in the V4 manual).

Edit: more:

V7 gained support for moving a directory within the same device, but it still didn't support copying directories across devices.

I can't find much on Research UNIX >= V8, but they were based on BSD.

Edit: even more:

Copying directories in mv was first added to BSD in 1989 by Ken Smith <kensmith@cs.buffalo.edu> (committed by Keith Bostic). It used `cp -R` to do the recursive copies.


> I believe GNU mv was the first to do that, but I could be mistaken.

I was mistaken: BSD mv gained copying directories across devices in 1989; GNU mv didn't get that until 1998.


> not understanding that the (non-existing) target will be a subdirectory of the source.

> mv first tries to rename("/mnt", "/mnt/foo") and only after that fails with EXDEV it decides to mkdir("/mnt/foo")

To me this looks like a bug: the order of the checks has been reversed. mv should never attempt to put a directory into itself, since there is no situation in which that ever makes sense. Only after that check passes should it decide if rename() is enough, or if it has to do a copy/delete.

The problem here is that the error conditions are not mutually exclusive - it is possible to have a rename cross devices and attempt to move a directory into one of its descendants - but there's no specification in the standard[1] of which error should be returned in such a case. The priority of non-mutually-exclusive errors is seldom considered, yet this is a good example of why it's important.

In other words the "rename("/mnt", "/mnt/foo")" should have failed with EINVAL, not EXDEV. A look through the Linux source shows that at least there, cross-device has higher priority than parent-descendant: http://lxr.free-electrons.com/source/fs/namei.c#L4240

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/re...


> since there is no situation in which that ever makes sense

Actually, there is:

  mkdir foo bar
  ln -s ../foo bar/foo
  mv bar bar/foo
...so mv(1) leaves it up to the kernel to detect the error condition, as it should.


After resolving symbolic links, that turns into "mv bar foo", which is perfectly acceptable since they are siblings. The checks for the destination being a descendant of the source should be performed after symlink resolution.

There's nothing in the standard about rename()'s error code priority (it could be EXDEV or EINVAL - they are not mutually exclusive), so I'd consider that a bug too.


The kernel has to resolve the symbolic link - /bin/mv should not try because it cannot do so without a race.


Presuming that you wouldn't invoke mv nanoseconds before creating the symlink, I think mv should resolve the symlink first and, if the move doesn't make sense, abort. It will still need to handle the case where it looks like the move makes sense, but the kernel returns EINVAL or whatever.


> Presuming that you wouldn't invoke mv nanoseconds before creating the symlink

This kind of thinking has led to lots of security bugs around symlinks.


indeed, and not just symlinks.

being vigilant about not creating unnecessary race conditions is important. doubly so when you're building a tool upon which others will compose their own programs/scripts/etc.


True, symlinks complicate things.

We have to be careful about doing too much (adding edge cases and complexity), though in this case how about:

    char *canon_src = canonicalize(src);  /* e.g. realpath(src, NULL) */
    char *canon_dst = canonicalize(dst);
    int ret = rename(src, dst);
    if (ret == -1 && errno == EXDEV) {
      /* Note: if symlinks are recreated between the
         canonicalize() and rename() calls above, then
         it's better not to fall back to a cross-device
         copy anyway.  */
      if (common_prefix(canon_src, canon_dst))
          errno = EINVAL; /* treat like "copy into self" */
    }


> In other words the "rename("/mnt", "/mnt/foo")" should have failed with EINVAL, not EXDEV

I thought about that one, too. But I guess there are two kinds of subdirectory-of: in the mount hierarchy, possibly across devices, and on a single device in terms of inode reachability.

Absence of the first condition does not preclude the second condition: Many filesystems can be mounted at multiple places.

I think the point of EINVAL is to prevent screwing up the intra-device inode links. For this case, the check can't be made in the mount hierarchy. So it seems natural to me that the EXDEV check comes first.
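
Bind mounts make the multiple-mountpoint case concrete. A sketch (needs root; the paths are made up):

    mkdir -p /mnt/data2
    mount --bind /srv/data /mnt/data2   # same filesystem, second mountpoint
    mv /srv/data/file /mnt/data2/       # same device and inodes underneath,
                                        # but the rename(2) crosses vfsmounts
                                        # and fails with EXDEV, so mv falls
                                        # back to copy+delete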


I agree with you, but wonder if historical compatibility is forcing them to retain the buggy behavior.


If the problem were just historical compatibility, I don't see what would be preventing them from adding a flag --do-things-sensibly to turn on the reasonable behavior.


Here's a good article about how it should be done. It does not just say "use mv"; it also explains how to back up the old state, how to use traps to remove your workaround code, etc.

http://www.davidpashley.com/articles/writing-robust-shell-sc...


My article is not about how to write robust shell scripts. (Which is impossible in a very broad sense: for example, take a single pipeline A | B and include in your operational semantics that any process can fail (SIGKILL). In shell, you can't access A's exit code.)
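
For instance, with A and B standing in for arbitrary commands (PIPESTATUS is a bash extension, not POSIX sh):

    A | B                      # $? now holds B's exit status; A's is lost
    echo "${PIPESTATUS[0]}"    # bash only: A's status; plain sh has no equivalent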

It is about how given infelicities, like multiple devices, should sometimes not be abstracted away behind an opaque layer. How a seemingly simple command can fail and eat your data in unrecoverable ways, even in the absence of data races (which many programs must ignore to be able to do useful things at all), resource starvation, etc.

Notice that so far, the reason why mv went wrong hasn't been identified: is it the kernel API, or is it the mv implementation? Maybe it's just not reasonably possible to avoid this kind of going wrong...


I agree that it's good to understand why a tool is destructive. But in a complex system it's hard to do some things without being destructive. Knowing that "mv" can be destructive, isn't it a logical conclusion to learn how to handle such situations in your bash code? For me that's nearly as important as understanding why mv can be destructive.
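
For example, a defensive "move" might look like this in shell (just a sketch; src and dst are placeholders):

    cp -a src dst && diff -r src dst && rm -rf src   # delete only after verifying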


I can't imagine how outrageously frustrating `mv` would be if it choked on moving files across mountpoints, especially these days when OSes have more of them than ever.


I think differently about this than you do. When copying across mountpoints, I often do something like

   tar cf - . | (cd target; tar xf -)
of course nowadays on OS X I prefer to use "ditto", but that's pretty much the same thing.

Then I at least do something like "du -s" on source and target to see if they're more or less the same size. And I do some "ls" operations to see if things look OK. Or if it's important data I'll run SHA256 on all source and destination files and compare results.

Then, and only then, do I delete the source.

Remember, you're not paranoid if the computer really is out to get you. :)

Edit: if I'm rearranging files and directories on the same filesystem, I'll just use "mv". But years of bugs and glitches on NFS and such have left me with a good healthy paranoia about copying data in general.


Why not just rsync? That's exactly what it's for!


In my mind rsync is more complicated than the "tar | tar" idiom that I routinely typed.

For important data, the real secret is the verification before deleting source data. I used to have some simple scripts that did (more or less) the following on source and destination:

   find ... -print0 | xargs -0 -x md5 | sort >md5list
I'd have high confidence my files were safely copied when I compared those checksum files.
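
The final comparison was then just a diff of the two sorted lists (paths hypothetical):

    diff /src/md5list /dst/md5list && echo "copy verified"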

Nowadays I mostly use canned solutions like ditto or Carbon Copy Cloner (which uses rsync under the hood!) and let them do their thing. But I then also verify by using my own Python script that uses "os.walk" to traverse the directory trees and "hashlib.sha256" to create file checksums.


I am fond of this shorter syntax:

   tar c . | tar x -C target

This syntax also avoids trouble when the target doesn't exist: with the subshell form, a failed cd leaves the second tar extracting into the current directory, while tar -C just fails cleanly.
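
To spell out the difference (a sketch, assuming target is missing):

   tar cf - . | (cd target; tar xf -)   # cd fails, but the second tar
                                        # still runs and extracts into $PWD
   tar c . | tar x -C target            # tar fails cleanly instead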


Copy-then-remove-old has vastly different semantics. It creates a new file identity. For a start, just imagine what happens when the old file is still held open by some process.
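
For example (a sketch; /other-fs is a made-up mountpoint):

   ls -i file               # note the inode number
   mv file /other-fs/       # rename() fails with EXDEV: mv copies, then unlinks
   ls -i /other-fs/file     # a new inode; any process still holding the
                            # old file open now has an orphaned copy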

I'm not saying there should not be convenience wrappers taking the dangerous route. I'm saying there is no shell interface that fails gracefully if no safe operation is possible.

Maybe mv should just get an -x flag, like GNU cp and other commands have.



I'm far from being an expert, but I also did a little digging (which might be surface-level to people with more knowledge). "man mv" on Ubuntu 15.04 starts off with "mv - move (rename) files", and the Description starts with "Rename SOURCE to DEST, or move SOURCE(s) to DIRECTORY".

This is pretty much the result of the blog post, right? So it's not really a surprise that it only renames in some cases. It's not really a frontend for rename(2); it only uses rename in some (admittedly common) cases as an optimization. Atomic filesystem changes are simply a nice side effect.

Or am I understanding the quoted two sentences incorrectly?


There are different things that "move" can mean. rename(2) can "move" in a safe way, but not across devices.

> Atomic file system changes are simply a nice side effect.

Disagree.

Also, my article was meant to show how mv's task is so complex that it can fail and eat your data even if atomicity is not a requirement.


Okay, I hadn't gotten that so far. Maybe I'm missing some background needed to understand.


You've understood the general description of mv's operation (rename, or copy-then-delete), but what OP demonstrates is a situation in which it does something unexpected (and inappropriate) when dealing with mounted directories.

See the part where he creates the file 'bar' in the mounted directory /mnt, attempts to move /mnt within itself to /mnt/foo, and then lists what /mnt contains.

The result is that the mv fails (so it shouldn't do anything, right?), but still creates an additional directory (/mnt/foo) that now contains 'bar' and has (destructively) altered the fs hierarchy.

However, when he tries it again after unmounting /mnt, the behavior is completely normal, as for any other file: mv fails, nothing happens, the /mnt/bar hierarchy is preserved, and no /mnt/foo is created.
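
A sketch of the whole sequence (needs root; /dev/sdb1 stands in for any mountable filesystem):

   mount /dev/sdb1 /mnt
   touch /mnt/bar
   mv /mnt /mnt/foo    # fails, yet /mnt/foo now exists and contains bar:
                       # the copy fallback already ran
   umount /mnt
   mv /mnt /mnt/foo    # fails again, but this time nothing is created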


Thanks, now I understand what's going on! Great explanation!


You could say the same for scripts, languages, etc. when not using safety precautions.

I would look at the link from erikb's answer: http://www.davidpashley.com/articles/writing-robust-shell-sc...


Would --no-target-directory help here?

Edit: I see the article does mention this GNU-specific option ("-T") as a fix


That's not the issue that is being observed here, it's just a side-note.



