Developing some proficiency with commands such as find, xargs, sed, and grep is where you can really gain productivity. There are studies showing that the perception that the command line is more efficient than a GUI is false, and that the GUI is actually faster. This may be true for relatively simple operations such as moving, copying, or deleting files, but by combining find, xargs, and a few pipes you can quickly accomplish operations that would be either very tedious or flatly impossible in most GUI file "explorers".
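For instance (path and age threshold made up for illustration), compressing every log file older than 30 days, four at a time, is a one-liner that would be tedious at best in a file explorer:

find /var/log/myapp -name '*.log' -mtime +30 -print0 | xargs -0 -P 4 gzip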
I am a believer in the CL, but I want to point out what "faster" can mean in these GUI vs CLI debates: on one hand, GUIs are probably faster if one is discovering a command in an interactive environment; on the other, command lines and key-oriented interfaces are probably faster for pure execution once one knows what one is doing in the same interactive environment.
sed, awk, xargs, and friends aren't used interactively like either of the above; rather, they are super powerful ingredients of non-interactive environments.
Every study I've seen like this indicates that GUIs are faster to learn for the average Joe, and possibly faster for certain types of tasks (stretching an image), not that they're faster in general -- I'm not even sure what exactly that would mean.
Using xargs with a pipe is easier. I don't know any reason why I'd want to "save a pipe" when working at the command line.
Also note that text surrounded by asterisks in your comment becomes italic. Indent text by two or more spaces to reproduce it verbatim, as for code.
I think it was more about the appropriateness of the examples. z_'s comment is right on, and I was going to post the same thing. It's a good intro, but the examples are contrived because you just don't need xargs for them.
It's not necessarily about saving a pipe; when the tool provides first-class support for the function, it's typically less prone to error. For example, -print0 becomes unnecessary, and forgetting it has burnt me before.
I also appreciate the writeups that don't teach poor examples. We all know how prolific copy&paste coding is. How many times have you seen "grep foo bar | wc -l" when you know it's just all-around better to "grep -c foo bar"?
I would prefer we only teach "grep -c" as a special case optimization to people who already understand how "grep | wc -l" works, because the latter is more generally useful.
I felt bad using contrived examples, but I wanted a short post to cover the basics of xargs without getting sidetracked by discussions of the options of find. Based on the feedback I've seen here I'll go ahead and update the post, though. If you can share some simple but less contrived examples please let me know; I'd love to update the post.
Don't get me wrong; it's a perfectly good and useful tutorial. The meat comes at the end when you talk about parallelism and argument batching. That can make a world of difference when you're working on real-world problems, like moving millions of files (mv * won't work unless you're on a system without ARG_MAX, and even then there are performance implications).
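Something along these lines (GNU mv assumed for the -t option; src/ and dest/ are placeholders) batches the arguments and sidesteps ARG_MAX:

find src/ -maxdepth 1 -type f -print0 | xargs -0 -n 1000 mv -t dest/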
I think a good intro to xargs starts with a list of things that you can't do without it. (Easy for me to say that, but of course I haven't written that piece...) It'd be great to know why to use it, not just how, you know what I mean?
Anyway, this is just off-the-cuff commentary, not criticism. Thanks for writing it up.
I rarely use find because zsh has great globbing. I really wish zsh were the default shell everywhere, most people wouldn't even notice if you changed the prompt from % to $
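For example, a recursive glob covers the common find-plus-grep case (zsh-only syntax):

grep import **/*.py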
Please stop calling it ack-grep. Debian already had an ack package and didn't get their priorities the way I would have wished. That's their problem, not the official name.
I didn't say "correct". I said "the way I would have wished". I expressed a personal preference, you're making me sound like more of a jerk than I already am.
That being said, yes, I believe they're plain wrong on this one. They shouldn't cater to me, but to their users. Thank God they have stats.
"find . -name '* .py' -exec grep 'import' {} +" will work more like the xargs example, i.e. all filenames will be passed as parameters to a single grep process, so the filename is displayed by default.
I usually don't bother with this kind of command line micro-optimization, but in my experience find and grep are the commands for which it's worth knowing and using every option.
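To make the contrast concrete (the pattern and grep here are just placeholders):

find . -name '*.py' -exec grep import {} \;   # one grep per file; no filename prefix in the output
find . -name '*.py' -exec grep import {} +    # a few greps over many files each; filenames are printed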
It's more efficient to use the pipe with xargs. If you're removing 10,000 files, the difference in time between -exec (which forks and execs rm 10,000 times) and xargs rm is quite significant. (On my system getconf ARG_MAX is 2180000.)
One of the reasons I love xargs so much is the archaic syntax (and surprises based on your input) behind -exec. I think the pipe is cleaner, or using find in a set of backticks.
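The backtick/command-substitution form is the shortest, though it falls over on names with whitespace, which the -print0 pipeline handles:

rm $(find . -name '*~')                   # simple, but splits on spaces/newlines in names
find . -name '*~' -print0 | xargs -0 rm   # whitespace-safe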
It's not the default because things like 'find | grep foo' wouldn't work as well: the output would appear all concatenated, as the terminal doesn't break lines on null characters.
I wrote a utility I named print0, which simply converts line-oriented input into null-terminated output. Very useful for building pipelines of line-oriented utilities, where each line is a filename. It's quite common to have spaces in filenames (if nothing else, user files on NAS shares), and vanishingly rare to see newlines, so I find it to be a sensible tradeoff. Things like 'find | egrep | sed | print0 | xargs -0' work as you'd expect.
Would not « print0() { tr '\n' '\0'; } » work in most cases? Does your utility do something special to existing instances of \0? That's the only case I can imagine where the transformation is necessarily lossy and it's not clear what to do. I suppose in theory you could have newline normalization or something too.
No, because that doesn't handle command-line arguments or '-' in the same way that cat does. It also doesn't handle the differences between DOS, Unix and Mac (\r, \n and \r\n) line endings properly. Finally, it's also slower than my 80-line C version.
As I already said, newlines[1] are vanishingly rare in filenames (most file manager UIs treat attempts at introducing a newline into the name as committing the file rename), so I find it useful, which is why I wrote the utility, which only I use.
How you figure my solution to my problem is only symptomatic treatment, with technical drawbacks I already mentioned but treat as acceptable, is beyond me.
[1] As I mention in a cousin comment to this one, my utility handles all usual ASCII forms of newlines - \r, \n and \r\n (and \n\r for good measure).
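For the simple pipeline case, a rough shell approximation (GNU sed and tr assumed; unlike the C version it handles neither file arguments, '-', nor bare \r) would be:

print0() { sed 's/\r$//' | tr '\n' '\0'; }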
The chances of such a file coming into existence accidentally and also being picked up in the find filter seem slim. If it's around maliciously, I think you have bigger problems. Anyway, there are many other failure modes of rm -rf that are a lot more likely and that you should worry about.
> The chances of such a file coming into existence accidentally
When sharing servers with less UNIX-savvy developers you will observe A) notoriously polluted home directories and B) plenty of files with funny names such as -, *, and user@server.com (from failed scp attempts) all over the place.
It's indeed relatively hard to create a file called '/' by accident. But I've seen files containing the '/' along with spaces, which can be just as deadly. Oh, and the popular '*' file is not to be taken lightly either.
As a rule of thumb: Learn the safe way to chain these commands (find/xargs in particular) once and stick to it. And always perform a dry-run before launching the real deal.
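For example, with echo as the dry-run, swapped out for the real command once the list looks right:

find . -name '*~' -print0 | xargs -0 echo rm   # dry-run: just prints what would be executed
find . -name '*~' -print0 | xargs -0 rm        # the real deal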
Unix filenames can not include / characters or null bytes, so I call shenanigans. Nonetheless your point stands: be careful. (And make liberal use of -print0 !)
A few issues with this article.
No, GNU xargs does not truncate the generated command. It will span multiple commands. Nor is it "-print 0" but "-print0" in find. And as mentioned in other comments, GNU parallel is much better for job parallelization.
You are right, I mistakenly assumed xargs truncated the command because the only thing I saw on the screen was the output of the last invocation. GNU parallel is great, but xargs is installed by default on OS X and BSD. I'll go ahead and update the post and fix the -print0.
One thing that has bitten me in the past is the fact that xargs always runs at least once, even if there is no input. With the -r / --no-run-if-empty flag, xargs does not run the command if the input does not contain any nonblanks.
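For example (-r is the GNU spelling; BSD xargs already skips the command on empty input):

find . -name '*.tmp' | xargs -r rm   # with no matches, rm is never invoked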
I was just trying to figure out how to change the position of the arguments yesterday. I figured it out (thanks, man), but discovered the OS X version (FreeBSD?) uses -I while certain GNU versions use -i. Lame.
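For example, the replacement-string form works on both, and lets the filename sit anywhere in the command (the target directory is just a placeholder):

find . -name '*.bak' | xargs -I{} mv {} /tmp/backups/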
On a slightly related note, am I the only one who finds the argument syntax for `find` to be inconsistent? Shouldn't it be `find --name "*.bar"` or even `find --name=foo.bar`?
The "single - for single-letter -- for fullname" option style was popularized by gnu getopt, by far the most popular style. Find has been around for a while and probably doesn't update to that style because of the huge amount of stuff already using find.
It's inconsistent in comparison to most utilities, for sure.
It's consistent internally, and POSIX-compliant, at least. (iirc)
find's arguments might very well be inconsistent, but the name test you ask for does exist as -name. -iname just means find should treat the name as case-insensitive; -name, on the other hand, is case sensitive, so -name 'Foo.bar' won't match foo.bar.
zsh and its zargs command can easily overcome many of the limitations of xargs noted in the article. They also make the use of the "find" command unnecessary.
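Something along these lines (zargs has to be autoloaded first) replaces the usual find | xargs pipeline:

autoload -U zargs
zargs -- **/*.py -- grep import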
The orthogonality ship has sailed with Unix, for better or (IMHO) for worse. The Unix Haters were right about this, among other things, and the sections of the handbook about the shell are still valid, even as time and Moore have rendered others quaint.
For worse, I concur. I still fire up my SGI boxes once in a while to remind myself things didn't use to/don't have to be as binary as they mostly are today (Linux vs BSD, iOS vs Android, Intel vs AMD etc.). That, and to run electropaint, of course. =)
That was already fixed with -print0 | xargs -0, but then this solution is dismissed with "The problem is that this is not a portable construct;...".
The -delete isn't either, so this is a straw man argument, although it probably is the most efficient and secure of all.
Because find changes to the directory first (carefully not following symlinks), and then deletes the file from there.
It does not delete the file using the entire path (which may contain a sudden symlink).
It's not possible to do this safely using xargs.
Take a look also at -execdir which does the same thing - changes to the directory first, and runs things from there. -exec is not safe and should not be used.
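For example (the path is hypothetical), this deletes from inside each directory rather than via the full path:

find /srv/shared -name '*~' -execdir rm {} +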
xargs is not safe if you are running against a directory not your own. You should use find and -execdir instead.
Yes, the original authors of POSIX made a mistake here.
> Also, rant rant, I really don't understand why find was extended with -delete in the first place.
Rob Pike and Brian Kernighan warned about this trend in their seminal paper "Program Design in the UNIX Environment" (also known as "cat -v considered harmful"), which describes how proper UNIX programs should be designed.
I did a quick look online to see who has it. GNU, DragonFly BSD, NetBSD, and FreeBSD all have a -delete option. OpenBSD seems to be the only one that doesn't (although its online manpage does give examples on how to do it with -exec and piping to xargs).
That's basically two flavours of UNIX (Linux and BSD), but that's OK - others are unfortunately either dead or dying. The UNIX family tree (http://www.levenez.com/unix) has been shrinking so rapidly in the last few years, that the current situation looks like the early years.
Nobody remembers the various command line switches of find or grep. I use xargs quite often because it's a very simple concept; I know how to mix and merge any commands together with it, and that's something I only had to learn once.
I'm not an expert by any means, so please provide corrections/feedback if I'm wrong, but I've read that xargs is more efficient because it can send parameters to the command in batches. I've read this in contrast to `-exec {}`, not `-delete`, however.
For example:
find . -name '*~' -exec rm {} \;
The statement above executes `rm result` for every result. By contrast:
find . -name '*~' | xargs rm
The example above would group the results and pass them to rm like this:
rm result1 result2 result3 result4
Because I'm not an expert, I don't know how many parameters it will pass, or whether spawning many new processes has a significant performance impact on newer systems. I would suspect that for anything involving the disk, I/O will be the bottleneck, not process start-up time.
-delete will be the fastest because find can do everything, and it already has the file loaded, and doesn't have to do a second lookup.
-exec will be slower because for each file it has to spawn a new process (this is where most of the time goes), and then that new process has to look up the file.
xargs will be faster than -exec because it will collect a few hundred, or thousand, filenames and pass them all to one command (the article claims 4096 is the default; on some systems it may be lower). This means that typically, only 3 processes need to spawn: `find', `xargs', and one `rm'; instead of find, and many, many `rm's.
Now, xargs is still slower than -delete because it will buffer the filenames, either waiting for the list to end, or 4000 filenames to pass to `rm'. Then, rm must look up the file from the filename.
To make my point I just set up a test situation with 1110 files ending with `~', among a total of 2221 files. I tested how fast it is to delete all the files ending with `~' using the 3 following commands:
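(For reference, the three variants would be along these lines; the exact flags may have differed:)

find . -name '*~' -delete
find . -name '*~' -exec rm {} \;
find . -name '*~' -print0 | xargs -0 rm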
Most systems have a find command that supports the "-exec command {} +" syntax, which does not spawn the command for each result, but instead puts as many arguments on the command line as the system will allow.
You are correct about -delete. You could also use -exec to execute arbitrary commands for each file returned by find. I needed some examples however, and I often see "find | xargs" because people don't know, or forget, the options to find. I should write a follow up post on find.
Well, you want xargs as soon as your project is big, because you'll reach the limits on arguments fairly quickly.
I won't go too deep on the issues this will trigger if your filenames contain special characters. We often think of spaces, but you could have some nastier stuff. If you want to sound smart at your next geeks reunion, simply read http://www.dwheeler.com/essays/fixing-unix-linux-filenames.h...
And I would guesstimate that (Linux, kernel >= 2.6.23) still accounts for a fairly small share of the machines people interact with professionally through a command line.
And if in a script/snippet, you often want to cover a vast majority of the systems you _could_ end up with. Won't be System III, at least for me, but there has to be a RHEL5 system in a closet right? :)
That's the kernel's limit; Bash (3) has its own limit, which is significantly lower -- I've hit it before. I think Bash 4 can handle an arbitrary number of arguments, within the kernel's limit.
No, bash doesn't and didn't have its own limit. On an ancient system (Debian Sarge), ARG_MAX=131072:
$ bash --version
GNU bash, version 2.05b.0(1)-release (i386-pc-linux-gnu)
Copyright (C) 2002 Free Software Foundation, Inc.
$ strace bash -c '/bin/echo `seq 1 30000`' 2>&1 | grep exec
execve("/bin/bash", ["bash", "-c", "/bin/echo `seq 1 30000`"], …) = 0
execve("/bin/echo", ["/bin/echo", "1", …) = -1 E2BIG (Argument list too long)
As you can see, the "argument list too long" error came back from the execve syscall, i.e., from the kernel. (Note that I shortened the strace output to make it fit the page.)
Of course, I meant my version as an alternative to xargs. And I think you have about 128k of space for the filenames, but yeah with large projects that can be a problem.
Thanks for the link, that's more interesting than the submission. :)