
Things you (probably) didn't know about xargs - tswicegood
http://offbytwo.com/2011/06/26/things-you-didnt-know-about-xargs.html
======
ams6110
Developing some proficiency with commands such as find, xargs, sed, grep, etc.
is where you can really gain productivity. There are studies showing that the
perception that the command line is more efficient than a GUI is false; the
GUI is actually faster. This may be true for relatively simple operations such
as moving, copying, or deleting files, but by involving find, xargs, and a few
pipes you can quickly accomplish operations that would be either very tedious
or flatly impossible in most GUI file "explorers".

~~~
forkandwait
I am a believer in the CL, but I want to point out what "faster" can mean in
these GUI vs CLI debates: On one hand, GUIs are probably faster if one is
_discovering_ a command in an interactive environment, but command lines and
key oriented interfaces are probably faster for pure _execution_ once one
knows what one is doing in the same interactive environment.

sed, awk, xargs, friends aren't used interactively like _either_ of the above,
but are rather super powerful ingredients of _non-interactive_ environments.

~~~
dustingetz
it's not a dichotomy. for example, staging non-trivial commits in git is much
easier from a frontend than from git's CLI.

------
z_
You can skip the pipe entirely by using find's '-exec' option.

The example: "find . -name '* .py' | xargs grep 'import'" would become "find .
-name '* .py' -exec grep -H 'import' {} \;".

You need to include the -H in grep to get the filename in which the match
occurs.

~~~
bellaire
Using xargs with a pipe is easier. I don't know any reason why I'd want to
"save a pipe" when working at the command line.

Also note that text surrounded by asterisks in your comment becomes italic.
Indent text by two or more spaces to reproduce it verbatim, as for code.

~~~
ibejoeb
I think it was more about the appropriateness of the examples. z_'s comment is
right-on, and I was going to post the same thing. It's a good intro, but the
examples are contrived because you just don't need it.

It's not necessarily about saving a pipe: when the tool provides first-class
support for the function, it's typically less error-prone. For example,
-print0 becomes unnecessary, and I've been burnt by that before.

I also appreciate the writeups that don't teach poor examples. We all know how
prolific copy&paste coding is. How many times have you seen "grep foo bar | wc
-l" when you know it's just all-around better to "grep -c foo bar"?

~~~
cstejerean
I felt bad using contrived examples, but I wanted a short post covering the
basics of xargs without getting sidetracked into a discussion of find's
options. Based on the feedback I've seen here I'll go ahead and update the
post though. If you can share some simple but less contrived examples please
let me know; I'd love to include them.

~~~
ibejoeb
Don't get me wrong; it's a perfectly good and useful tutorial. The meat comes
at the end when you talk about parallelism and argument batching. That can
make a world of difference when you're working on real-world problems, like
moving millions of files (mv * won't work unless you're on a system without
ARG_MAX, and even then there are performance implications).
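The batching trick looks roughly like this (a sketch with a tiny file set; the paths are made up, and `mv -t`, which names the target directory up front, is a GNU coreutils option):

```shell
mkdir -p /tmp/batch_src /tmp/batch_dst
for i in 1 2 3 4 5; do touch "/tmp/batch_src/file$i.log"; done

# Split the file list into chunks of at most 1000 arguments, so no
# single mv invocation can exceed ARG_MAX
find /tmp/batch_src -name '*.log' -print0 \
  | xargs -0 -n 1000 mv -t /tmp/batch_dst
```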

I think a good intro to xargs starts with a list of things that you _can't_ do
without it. (Easy for me to say that, but of course I haven't written that
piece...) It'd be great to know why to use it, not just how, you know what I
mean?

Anyway, this is just off-the-cuff commentary, not criticism. Thanks for
writing it up.

------
billpg
I wonder why -print0 / -0 isn't the default, as it seems that _not_ using
those options is the wrong way to do it.

(Either that or why filenames can have spaces or LFs in them.)

~~~
barrkel
It's not the default because things like 'find | grep foo' wouldn't work as
well: the output would appear all concatenated, as the terminal doesn't break
lines on null characters.

I wrote a utility I named print0, which simply converts line-oriented input
into null-terminated output. Very useful for building pipelines of line-
oriented utilities, where each line is a filename. It's quite common to have
spaces in filenames (if nothing else, user files on NAS shares), and
vanishingly rare to see newlines, so I find it to be a sensible tradeoff.
Things like 'find | egrep | sed | print0 | xargs -0' work as you'd expect.

~~~
premchai21
Would not « print0() { tr '\n' '\0'; } » work in most cases? Does your utility
do something special to existing instances of \0? That's the only case I can
imagine where the transformation is necessarily lossy and it's not clear what
to do. I suppose in theory you could have newline normalization or something
too.

~~~
barrkel
No, because that doesn't handle command-line arguments or '-' in the same way
that cat does. It also doesn't handle the differences between DOS, Unix and
Mac (\r, \n and \r\n) line endings properly. Finally, it's also slower than my
80-line C version.
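A rough shell approximation of that idea (an assumption about the utility's behavior, not its actual source): read files named as arguments or stdin, strip DOS carriage returns, and emit NUL-terminated records:

```shell
# Sketch of a print0-like filter: delete \r so DOS line endings
# become plain \n, then turn each newline into a NUL terminator
print0() { cat "$@" | tr -d '\r' | tr '\n' '\0'; }

# Filenames with spaces survive the pipeline intact
printf 'a.txt\r\nb c.txt\n' | print0 | xargs -0 -n 1 echo
```

This still doesn't handle bare-\r Mac line endings or many edge cases the C version presumably covers; it's only meant to show the shape of the transformation.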

------
tszming
Use GNU Parallel if you need to execute jobs in parallel
(<http://www.gnu.org/software/parallel/>)
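xargs itself can also fan out with -P; a minimal comparison (throwaway paths, gzip assumed available; parallel's syntax is shown only as a comment since it may not be installed):

```shell
mkdir -p /tmp/par_demo
for i in 1 2 3 4; do printf 'data\n' > "/tmp/par_demo/f$i.txt"; done

# Up to 4 concurrent gzip processes, one file per invocation
ls /tmp/par_demo/*.txt | xargs -P 4 -n 1 gzip
# GNU parallel equivalent (if installed):
#   ls /tmp/par_demo/*.txt | parallel gzip
```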

------
RexRollman
The best thing about Unix, and maybe the worst, is that you never stop
learning about it. Nice article.

------
botker
Never pipe `find` to `xargs rm -f`. All you need is a malicious empty file
called / and you're screwed. That's why `find` has a -delete flag.

~~~
Jach
The chances of such a file coming into existence accidentally and also being
picked up in the find filter seem slim. If it's around maliciously, I think
you have bigger problems. Anyway, there are many other failure modes of rm -rf
that are a lot more likely and that you should worry about instead.

~~~
moe
_The chances of such a file coming into existence accidentally_

When sharing servers with less UNIX-savvy developers you will observe A) a
notoriously polluted home directory and B) plenty of files with funny names
such as '-', '*', and 'user@server.com' (from failed scp attempts) all over
the place.

It's indeed relatively hard to create a file called '/' by accident. But I've
seen files _containing_ the '/' along with spaces, which can be just as
deadly. Oh, and the popular '*' file is not to be taken lightly either.

As a rule of thumb: Learn the safe way to chain these commands (find/xargs in
particular) once and stick to it. And _always_ perform a dry-run before
launching the real deal.
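The dry run can be as simple as prefixing the destructive command with echo (the paths below are made up for illustration):

```shell
mkdir -p /tmp/dry_demo
touch '/tmp/dry_demo/notes.txt~' '/tmp/dry_demo/a b.txt~'

# Show exactly what would be deleted; nothing is removed yet.
# Drop the 'echo' once the output looks right.
find /tmp/dry_demo -name '*~' -print0 | xargs -0 echo rm --
```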

~~~
koenigdavidmj
Unix filenames cannot include / characters or null bytes, so I call
shenanigans. Nonetheless your point stands: be careful. (And make liberal use
of -print0!)

~~~
moe
Seems you are right! I could've sworn I've seen them before, guess my memory
got mixed up there.

------
onedognight
While not the point of the article, I was happy to learn about bash's {..}
operator! I've always missed perl's .. in the shell.

    echo {1..100}

------
st3fan
Note that the first three examples can also be written in a much shorter form
if you use zsh:

    wc -l **/*.py
    rm **/*~
    grep 'import' **/*.py

I like zsh a lot for this small feature.

~~~
scrame
I like zsh for the same reason, but you will want to quote or escape the tilde
in your second example.

    rm **/*\~
    rm '**/*~'

------
ptramo
A few issues with this article. No, GNU xargs does not truncate the generated
command. It will span multiple commands. Nor is it "-print 0" but "-print0" in
find. And as mentioned in other comments, GNU parallel is much better for job
parallelization.

~~~
cstejerean
You are right, I mistakenly assumed xargs truncated the command because the
only thing I saw on screen was the output of the last invocation. GNU
parallel is great, but xargs is installed by default on OS X and the BSDs.
I'll go ahead and update the post and fix the -print0.

------
uggedal
One thing that has bitten me in the past is the fact that xargs always runs
its command at least once, even if there is no input. With the -r /
--no-run-if-empty flag, xargs does not run the command if the input contains
no nonblanks.
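A quick demonstration (the -r behavior shown is GNU xargs):

```shell
# With empty input, plain xargs still runs its command once...
printf '' | xargs echo ran-anyway
# ...while -r suppresses the run entirely (GNU extension)
printf '' | xargs -r echo ran-anyway
```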

~~~
ptramo
As with many cool features, it's a GNU-only option.

One can refer to
[http://pubs.opengroup.org/onlinepubs/009695399/utilities/xar...](http://pubs.opengroup.org/onlinepubs/009695399/utilities/xargs.html)
for the portable flags.

------
tlrobinson
I was just trying to figure out how to change the position of the arguments
yesterday. I figured it out (thanks, man), but discovered the OS X version
(FreeBSD?) uses -I while certain GNU versions use -i. Lame.

~~~
james2vegas
In GNU utils, incompatible features and extensions are a feature, not a bug.

------
underwater
On a slightly related note, am I the only one who finds the argument syntax
for `find` to be inconsistent? Shouldn't it be `find --name "*.bar"` or even
`find --name=foo.bar`?

~~~
kaens
The "single - for single-letter -- for fullname" option style was popularized
by gnu getopt, by far the most popular style. Find has been around for a while
and probably doesn't update to that style because of the huge amount of stuff
already using find.

It's inconsistent in comparison to most utilities, for sure.

It's consistent internally, and POSIX-compliant, at least. (iirc)

------
gte910h
The trace param to Xargs is nice too: You can check what will happen before
doing something.
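That's the -t flag; it echoes each constructed command to stderr before executing it (input here is an arbitrary example):

```shell
# stderr shows the command line xargs built ("echo a b c"),
# stdout carries the command's normal output
printf 'a b c\n' | xargs -t echo
```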

------
gnosis
zsh and its zargs command can easily overcome many of the limitations of xargs
noted in the article. They also make the use of the "find" command
unnecessary.

------
sabat
_Recursively find all Emacs backup files and remove them_

    find . -name '*~' | xargs rm

_Recursively find all Python files and search them for the word 'import'_

    find . -name '*.py' | xargs grep 'import'

Hmm. I don't mean to be a tweak, but you don't need xargs to do either of
those things. Just:

    find . -name '*~' -delete

    find . -name '*.py' | grep 'import'

Note: I can't figure out how to get an asterisk to show up, and don't have
time to look it up.

~~~
nuxi
The -delete option is not available on all UNIX systems, as it's not part of
the POSIX spec
([http://pubs.opengroup.org/onlinepubs/009604599/utilities/fin...](http://pubs.opengroup.org/onlinepubs/009604599/utilities/find.html)).

(Also, rant rant, I really don't understand why find was extended with -delete
in the first place. What's next, "ls --delete" or maybe "cat --grep"?)

~~~
jfb
The orthogonality ship has sailed with Unix, for better or (IMHO) for worse.
The Unix Haters were right about this, among other things, and the sections of
the handbook about the shell are still valid, even as time and Moore have
rendered others quaint.

~~~
nuxi
For worse, I concur. I still fire up my SGI boxes once in a while to remind
myself things didn't use to/don't have to be as binary as they mostly are
today (Linux vs BSD, iOS vs Android, Intel vs AMD etc.). That, and to run
electropaint, of course. =)

~~~
ars
-delete was added to find because of the race condition with doing it using xargs.

See section 9.1.5
[http://www.gnu.org/software/findutils/manual/html_node/find_...](http://www.gnu.org/software/findutils/manual/html_node/find_html/Deleting-Files.html)

~~~
nuxi
That was already fixed with -print0 | xargs -0, but then this solution is
dismissed with "The problem is that this is not a portable construct;...". The
-delete isn't either, so this is a straw man argument, although it probably is
the most efficient and secure of all.

~~~
ars
Read it again.

-print0 | xargs -0 does not fix the race condition.

The problem is someone can swap in a symlink after the find, and before the
xargs.

~~~
nuxi
I'll have to run some more tests, but I can't see how -delete would help in
that case.

~~~
ars
Because find changes to the directory first (carefully not following
symlinks), and then deletes the file from there.

It does not delete the file using the entire path (which may contain a sudden
symlink).

It's not possible to do this safely using xargs.

Take a look also at -execdir which does the same thing - changes to the
directory first, and runs things from there. -exec is not safe and should not
be used.

xargs is not safe if you are running against a directory not your own. You
should use find and -execdir instead.

Yes, the original authors of posix made a mistake here.

> Also, rant rant, I really don't understand why find was extended with
> -delete in the first place.

I'm hoping you understand it now.
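A sketch of the safer pattern (paths are illustrative, GNU find assumed):

```shell
mkdir -p /tmp/execdir_demo/sub
touch /tmp/execdir_demo/sub/old.tmp

# -execdir first chdirs into the matched file's directory (without
# following symlinks) and then runs 'rm ./old.tmp', so an attacker
# can't swap a path component for a symlink between find and the rm
find /tmp/execdir_demo -name '*.tmp' -execdir rm -- {} \;
```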

~~~
nuxi
Yes, thanks for the extensive info.

------
gcb
for a long time I avoided xargs like the plague. it's much less error prone to
use a bash loop or something....

but that parallelization parameter may win me back, as it's cleaner than & and
global vars for counters....

