David A. Wheeler, FOSS (and occasionally security) luminary, who also happens to be the creator of the popular sloccount tool, has an excellent page that covers this topic and how to use paths safely and portably in shell scripts (spoiler: it's hard): http://www.dwheeler.com/essays/filenames-in-shell.html
I strongly recommend his many other essays to HN readers: http://www.dwheeler.com/
Edit: a simple way to avoid these problems is to prepend the wildcard with ./ (so globbed files won't start with - or -- but with the path ./), and on GNU systems put -- before the wildcard, telling the tool that the following arguments are not options.
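For illustration, a minimal sketch of both forms (the /backup/ destination is made up):

    cp * /backup/        # risky: a file named "-r" or "--help" is parsed as an option
    cp ./* /backup/      # every globbed name now starts with "./", so none can look like an option
    cp -- * /backup/     # GNU/getopt-style tools: "--" marks the end of options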
Looks like the convention was introduced as part of getopt in AT&T Unix System III (1980), though initially only in a handful of places. Then POSIX adopted it as standard for all utilities sometime later.
When --longopt was starting to get popular (in an ad-hoc way) for GNU and other utilities, there was a poll (gnu.misc? late 90s?) on whether to make it a GNU standard. The other choices were a +, or having commands not accept both bundled single-letter options and longopts. There may have been other choices I've forgotten about.
The double-dash won. There were some people who were concerned that it might be confused with the end-of-args double dash.
Maybe that is true, but they are certainly unambiguously parsed.
At what point do you give up and, if you must have portability, bootstrap a saner environment?
Are there many options in this area? Assume Perl is available? Use an autoconf-like shell-script compiler? Start with Lua (which I hope/assume can build anywhere, though I know it's missing lots of functionality out of the box)?
I'm lucky not to have to worry about this, but I figured that having Perl is a safe assumption for desktop and server Unix systems.
Not embedded devices, but I guess those are different enough that you're probably not targeting them for portable scripts. Oh no, have I contradicted myself? :)
I really wonder who this person asked that was an "old-school Unix admin" and didn't know of this attack. This article also doesn't mention the countermeasure that is available in every utility I know of: the -- argument, which ends option parsing, so everything after it is treated as a filename (or other operand).
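A quick way to see the difference, in a scratch directory (the filenames here are just for demonstration):

    mkdir /tmp/demo && cd /tmp/demo
    mkdir adir && touch ./-rf file1
    rm *        # expands to "rm -rf adir file1", so "-rf" is parsed as options
    rm -- *     # after "--", "-rf" is treated as an ordinary filename to remove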
1. For “--” to work, all maintainers would have to faithfully use “--” in practically every command invocation. That just doesn’t happen in real life, even after decades of people trying. People forget it all the time; no one is that consistent, especially since code seems to work without it. Very few commands require it, after all.
So because other people may or may not forget it, I shouldn’t use it in my scripts/day-to-day usage? That’s about as silly as saying that, because other people will write unreadable code anyway, I shouldn’t bother with comments, short functions, or sensible variable names.
2. You can’t do it anyway, even if you were perfectly consistent; many programs and commands do not support “--”. POSIX even explicitly forbids echo from supporting “--”, and echo must support “-n” (and GNU coreutils echo supports other options too).
This is a problem, but only if you have to use echo for some reason. printf works nearly as well and supports -- just fine. I may have read somewhere that using printf is actually advocated nowadays, but I’m not sure where or why.
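For example (assuming bash or another POSIX-ish shell):

    var="-n"
    printf '%s\n' "$var"   # prints "-n" literally, followed by a newline
    echo "$var"            # in bash, prints nothing: "-n" is taken as the no-newline option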
> So because other people may or may not forget it, I shouldn’t use it in my scripts/day-to-day usage?
He specifically mentions that that is not his point, but rather that he is arguing against exactly the sort of "just use '--'" response that one can find in this post:
> Do feel free to use “--” between options and pathnames, when you can do it, as an additional protective measure. But using “--” as your primary (or only) mechanism for dash-prefixed filenames is bad idea.
This article starts with the premise that the person executing the command has no idea what files are in the current working directory. That is itself a more serious problem than the behaviour of wildcards.
Later in the article we learn that it also assumes GNU utilities. That is a second problem (IMO), and arguably also one more serious than the behaviour of wildcards. GNU userland and unneeded complexity (e.g. more features than any user will ever use) are practically synonymous.
Then there is the peculiar assumption that someone can place arbitrary files beginning with - or -- on this system. That itself is a far more serious problem than the behaviour of wildcards; I would say with that capability it is more or less "game over". In BSD you have, at the very least, mtree. How does the Linux user know she isn't executing some substituted executable?
Moreover, if caution was important to the hypothetical user in the examples, I think they would be in the form
> GNU userland and unneeded complexity (e.g. more features than any user will ever use) are practically synonymous.
Ahhhh, the memories. These accusations bring me back to the early 1990's. Remember? Remember how it was? Oh, boy, how did we all get so old? To be young and running System V again...
And still today, whenever I find myself on a system without GNU utilities, I end up installing them to get what seems to me like basic functionality. We have never changed, have we?
> This article starts with the premise that the person executing the command has no idea what files are in the current working directory. That is itself a more serious problem than the behaviour of wildcards.
Substitute `person executing the command' with `simple script', and you've got a better justification. I don't want my simple scripts to break down in the face of silly filenames.
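Something like this is usually enough for the simple-script case (a minimal sketch, bash assumed):

    for f in ./*; do
        [ -e "$f" ] || continue          # empty directory: the pattern stays literal, so skip it
        printf 'processing %s\n' "$f"    # "$f" always starts with ./ and is always quoted
    done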
I keep forgetting that GNU echo has an -e option, which, if I'm not mistaken, makes it behave like printf. (Why does it need this feature when there is also a builtin printf? Never mind.)
Anyway, I didn't think about what happens if you create one file called "-e" and then do
echo *
Anyway, as someone else pointed out, using ./* instead of * will defeat the "exploits" in the article.
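For example, in bash:

    touch ./-e ./hello.txt
    echo *      # expands to "echo -e hello.txt"; bash's echo eats the "-e"
    echo ./*    # prints "./-e ./hello.txt"; nothing looks like an option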
I remember when I used pdfimages to extract images from a pdf file and I didn't read the manual beforehand. It turned out that you should call it like pdfimages <pdf_file> <prefix> and if you don't specify a prefix it generates filenames in the form of -img001, -img002, ... (or something like that). I had a hard time deleting those images.
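For anyone hitting the same thing, either trick from elsewhere in the thread works (the exact filenames below are just a guess at pdfimages' output):

    rm ./-img*      # the ./ prefix keeps rm from treating the names as options
    rm -- -img*     # or explicitly end option parsing first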
Tangentially, anyone know how to make zsh less greedy about parsing wildcards? Something like this will fail with "no files matched", and the command won't run:
rsync example.com:/foo/* .
My workaround is to quote the argument, but it's annoying.
Well, the problem is that "example.com:/foo/*" isn't actually a local path that can ever be globbed by zsh alone -- it's an rsync/ssh/scp-style remote path, yet zsh tries to glob it as if it were local. Not sure why. I think the solution is to make sure zsh ignores arguments containing colons, but I don't know what the config option is for that.
Edit: Just remembered that zsh's over-eager globbing also fails with git -- e.g., "HEAD^" must be quoted.
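For what it's worth, a couple of zsh knobs that should help here (taken from the zsh documentation; a sketch, not tested against your setup):

    setopt no_nomatch            # if a pattern matches nothing, pass it through unchanged (bash-like)
    alias rsync='noglob rsync'   # or skip globbing entirely for specific commands
    alias git='noglob git'       # same idea for things like HEAD^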
Good summary of surprising behavior. You can work around a lot of these issues using "--" as an argument before you use any wildcards. This tells most commands to stop processing options and treat the rest of the arguments as files (or whatever other non-option arguments the command takes). That's getopt(3)'s behavior[0]. For example, "rm -- *" will not have the problem where directories are removed if there's an entry called "-rf" in the directory.
A better workaround is to prefix your wildcarded arguments with ./ as this will work with all commands. For example, rm ./* will safely remove files regardless of how they're named.
This is why I not only use "--" everywhere, but also religiously use full quoting of "${vars[@]}" and options like mv(1)'s --no-clobber when appropriate. Even without the security concerns, this kind of "least privilege" approach can help prevent a lot of really-annoying bugs.
That said, I'm going to have to check a few scripts for that chmod attack (or similar) - I think I've seen that type of attack before, but I must have forgotten about it... sigh
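A rough sketch of that style, assuming bash and GNU mv (the destination path is made up):

    #!/usr/bin/env bash
    set -u
    shopt -s nullglob                   # an empty directory yields an empty array, not a literal "./*"
    files=( ./* )                       # ./ prefix: no expansion can start with "-"
    dest="/some/backup/dir"             # hypothetical destination
    (( ${#files[@]} )) && mv --no-clobber -- "${files[@]}" "$dest"/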
This is one of the reasons sudo should (by default) only allow a whitelist of built-in commands to be run with wildcards.
Somewhat like sudoedit.
This is of course for the corporate case of a less privileged user performing a certain task at elevated privileges. Not for the more common use of sudo (these days) of people managing their own personal machines.
There are hosts with multiple users on them, who have some level of write access to somewhere on the filesystem. After all, Unix is a multi-user system, so it is not unheard of to have multiple users on it. That being said, this article is just saying to be careful with wildcards when you are sitting in a user-owned (or user-writable, such as /tmp) directory.
Not by any sane person, no. Why are you letting people upload and name their own files on your server? Should we be posting articles about the vulnerabilities in the finger daemon or Solaris 8's NIS implementation, while we're at it?
It seems like this article is aiming towards shared servers where you actually allow shell login to "untrusted" users, which IMO is a relic of days long past, that only really persists at (maybe) universities. Hell, even at the university I last worked at 6 years ago we just gave everybody their own VM. And nowadays I wouldn't even need to give them that... students can download and run a vagrant environment on their personal machine in like 2 commands.
It's not to say there's no audience for articles like this... I'm sure there's plenty of environments out there that still follow the multi-user server model from the 1970's, and I certainly pity anyone who has to administer those types of systems. But it certainly should be no surprise to anyone that there's a lot of malicious things you can do if you have shell access to a system (or to your point, the ability to upload arbitrarily-named files with arbitrary content. shudder)
This kind of article is good to keep bubbling up over time, to educate new users in best practices. Not everyone is a 20-year unix admin that's seen a bit of everything.
Let's say you have a web server, and this web server has a web application that allows users to upload files. The web application is sane: it doesn't allow path traversal and has a proper .htaccess inside the upload directory.
Now all you need is a user who kindly requests a copy of the uploaded files. If you're not aware of this issue (and you must be "actively aware", i.e. watch out for it all the time), you can end up doing something that shouldn't be possible.
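Concretely, something like the GNU tar checkpoint trick, if I remember the original writeup correctly (paths and filenames here are made up):

    # the upload directory contains files named:
    #   --checkpoint=1
    #   --checkpoint-action=exec=sh evil.sh
    #   evil.sh
    cd /var/www/uploads
    tar cf /backup/uploads.tar *     # GNU tar parses those names as options and runs evil.sh
    tar cf /backup/uploads.tar ./*   # safe: every expanded name starts with "./"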
I'm a bit late replying, but I wouldn't consider an application that allows users to upload files and pick their own filenames to be a sane application. Do you think imgur (as an example) lets users name their files? What about stuff like the defunct megaupload or other file-sharing sites? You get to pick filenames, but that's really metadata... the URLs and (presumably) the underlying file storage structure are database-driven.
In most environments it would be insane to allow anybody untrusted to put files on your server. That includes a trusted sysadmin extracting unknown tar files on your server. It's a case of "if they got this far, you're already fucked."