A lot of people like writing bash for loops. I try to avoid them as much as possible: xargs -n1 is the shell equivalent of a call to 'map' in a functional language.
For instance, let's say you want to create thumbnails of a bunch of jpegs:
find images -name "*.jpg" | xargs -n1 -IF echo F F | sed -e 's/\.jpg$/_thumb.jpg/' | xargs -n2 echo convert -geometry 200x
Additionally, it's fully parallelizable as xargs supports something akin to pmap.
You might be interested in "zargs" in zsh, which would save you the call to find.
Furthermore, instead of the pipe to sed and extra xargs, it might be clearer and simpler to do something like:
zargs -n 1 **/*.jpg -- make-thumb
Where "make-thumb" is a short script (or even a zsh function, if you care about saving a fork for each input file) containing:
convert -geometry 200x "$1" "${1%.jpg}_thumb.jpg"
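To see what the suffix-stripping expansion in such a script produces without running ImageMagick at all, a minimal sketch follows. The helper name thumb_name and the sample paths are made up for illustration.

```shell
# Hypothetical helper: print the thumbnail path for a given JPEG path.
# ${1%.jpg} removes a trailing ".jpg" from the first argument.
thumb_name() {
  printf '%s\n' "${1%.jpg}_thumb.jpg"
}

thumb_one=$(thumb_name "images/cat.jpg")
thumb_two=$(thumb_name "images/my holiday.jpg")
printf '%s\n' "$thumb_one" "$thumb_two"
```

Quoting the expansion is what keeps the second path, which contains a space, in one piece.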
But, in real life, instead of writing such a script or function, what I'd probably do instead is:
zargs -n 1 **/*.jpg | vipe > myscript
and do some quick editing in vim to modify the zargs output by hand to do whatever I need -- and then I'd run the resulting "myscript". Just fyi, "vipe" is part of the "moreutils" package [1] and lets you use your editor in the middle of a pipe.
One final trick is for when you need to do in-place image manipulation. Instead of using "convert", you can use another ImageMagick command: "mogrify". It will overwrite the original file with the modified file. Of course, you should be very careful with it.
I use xargs a lot for refactoring work when I cannot simply use sed, e.g.
vim $(grep -lr foo | xargs)
and doing what I need to do on a file by file basis. Otherwise, for renaming functions and the like, I do a lot of:
find . -name foo_fn -exec sed -i s/foo_fn/bar_fn/g '{}' \;
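A rough sketch of a grep-driven variant of this kind of rename, so that sed only touches files that actually contain the old name. It assumes GNU grep and GNU sed (the -Z and -i options behave differently or are missing elsewhere), and the file names and contents are invented for the demo.

```shell
# Assumes GNU grep and GNU sed; runs in a throwaway directory.
workdir=$(mktemp -d)
printf 'call foo_fn()\n' > "$workdir/a.c"
printf 'nothing here\n' > "$workdir/b.c"

# List files containing foo_fn (NUL-terminated), then edit only those in place.
grep -rlZ 'foo_fn' "$workdir" | xargs -0 sed -i 's/foo_fn/bar_fn/g'

renamed=$(cat "$workdir/a.c")
untouched=$(cat "$workdir/b.c")
printf '%s\n%s\n' "$renamed" "$untouched"
rm -rf "$workdir"
```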
I generally love abusing bash. Just today I was asked about how to rename a bunch of files, specifically containing spaces, and came up with either of these two options:
find . -name foo_bar -exec cp {} {}.bak \;
and
find . -name foo_bar -print0 | while IFS= read -r -d '' file; do cp "$file"{,.bak}; done
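For paths containing spaces, another form worth knowing hands each path to a tiny sh -c script as "$1", which avoids relying on how find substitutes {} inside a larger argument. This is only a sketch in a scratch directory; the directory layout is invented.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/sub dir"
touch "$workdir/foo_bar" "$workdir/sub dir/foo_bar"

# Each path arrives as "$1" inside the small script, so spaces survive intact.
# The "_" fills $0; the path that find substitutes for {} becomes $1.
find "$workdir" -name foo_bar -exec sh -c 'cp "$1" "$1.bak"' _ {} \;

backups=$(find "$workdir" -name 'foo_bar.bak' | wc -l | tr -d ' ')
echo "backups: $backups"
rm -rf "$workdir"
```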
Ultimately, the great thing is, if you learn CTRL-R, you can always search for these types of commands and modify them as necessary for the particular task at hand and not necessarily remember them. One I use all the time, to push git branches upstream is the following:
You are correct on both counts regarding the vim and grep example - I guess I just assumed I would have to have all the files on a single line before handing them off to vim.
Thanks for the suggestion about -exec +; I will have to remember it in the future.
Would anyone mind doing me a favor by explaining xargs in more detail? I've tried learning it a couple times but I always seem to forget the primary situations in which it's useful. Thank you in advance!
Xargs takes a newline separated list and maps the list to a command.
find ./ -name '*.log' | xargs rm
Finds all log files and maps them to an 'rm' command. e.g. if it finds system.log and rails.log it will run the command `rm system.log rails.log`.
xargs will automatically do things like break up very long lists into multiple command calls so that it doesn't exceed the maximum number of arguments a command can have.
Another useful flag is '-P <NUM>', which runs up to NUM copies of the command in parallel. I use this with curl to do quick-and-dirty benchmarks.
The next major flag is `-n <NUM>` which changes the number of arguments per command call. e.g. `-n 1` will run the command per argument passed to xargs.
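A small illustration of how `-n` changes the grouping, using echo as a stand-in for the real command so that nothing is modified:

```shell
# Default: as many arguments as fit go into a single command.
all_at_once=$(printf '%s\n' a b c | xargs echo)

# -n 2: at most two arguments per command, so echo runs twice.
two_at_a_time=$(printf '%s\n' a b c | xargs -n 2 echo)

# -n 1: one command per argument, the "map" style.
one_at_a_time=$(printf '%s\n' a b c | xargs -n 1 echo)

printf '%s\n---\n%s\n---\n%s\n' "$all_at_once" "$two_at_a_time" "$one_at_a_time"
```

Each line of output corresponds to one invocation of echo, which makes the batching visible.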
And the last flag I commonly use is `-I {}`. This sets `{}` as a placeholder that is replaced by each input line. (It also implies `-L 1`, so the command runs once per input line.) This is useful for things like:
find ./ -name '*.log' | xargs -I{} sh -c "if [ -f {}.gz ]; then rm {}; fi"
Only do that if you know exactly what '*.log' will expand to (i.e. don't use it in scripts and avoid using it on the command line). This is because the delimiter for xargs is a newline character, but filenames can have a newline character in them. This can lead to unexpected results.
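A sketch of a more defensive variant, assuming a find and xargs that support -print0 and -0 (GNU and BSD versions do). Instead of splicing the file name into the script text, it passes the name as a positional argument, so spaces, quotes and newlines in names are not reinterpreted by the shell. The file names are invented for the demo.

```shell
workdir=$(mktemp -d)
touch "$workdir/old.log" "$workdir/old.log.gz" "$workdir/live log.log"

# NUL-delimited names in, one name per sh invocation, name available as "$1".
find "$workdir" -name '*.log' -print0 |
  xargs -0 -n 1 sh -c 'if [ -f "$1.gz" ]; then rm -- "$1"; fi' _

# Record what survived before cleaning up.
if [ -e "$workdir/old.log" ]; then old_log_left=yes; else old_log_left=no; fi
if [ -e "$workdir/live log.log" ]; then live_log_left=yes; else live_log_left=no; fi
echo "old.log left: $old_log_left, live log.log left: $live_log_left"
rm -rf "$workdir"
```

Only old.log has a matching .gz next to it, so it is the only file removed.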
Almost everywhere I see xargs used, find ...-exec {} ; will work as well and find ...-exec {} + may work even better.
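The difference between the two terminators can be seen by counting invocations. In this sketch echo stands in for the real command, so each output line is one process launched by find:

```shell
workdir=$(mktemp -d)
touch "$workdir/a.log" "$workdir/b.log" "$workdir/c.log"

# \; runs the command once per file: three echo calls, three lines.
per_file=$(find "$workdir" -name '*.log' -exec echo {} \; | wc -l | tr -d ' ')

# + appends many files to one command: one echo call, one line.
batched=$(find "$workdir" -name '*.log' -exec echo {} + | wc -l | tr -d ' ')

echo "per file: $per_file, batched: $batched"
rm -rf "$workdir"
```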
Fixes that issue and xargs is far more efficient, it doesn't launch a new process for each line like exec does, but far more importantly, xargs is generalizable to all commands so you only have to learn it once; exec is just an ugly hack on find, you can't generalize it across all commands; xargs is much more unixy.
True, but the issue isn't removing files, the issue is generalizing a command for mapping output of a command to another command. rm was just a simple example. xargs is far more useful than simply deleting files.
Removing files is a common need coupled with find yet many readers don't know of -delete; I was pointing it out. That doesn't weaken xargs's valid uses. You seem a little defensive? :-)
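For reference, a minimal sketch of -delete in a scratch directory. It assumes a find that implements -delete, as GNU and BSD find do; the file names are made up.

```shell
workdir=$(mktemp -d)
touch "$workdir/system.log" "$workdir/rails.log" "$workdir/notes.txt"

# find removes the matches itself; no rm or xargs process is involved.
find "$workdir" -name '*.log' -delete

left=$(ls "$workdir")
echo "$left"
rm -rf "$workdir"
```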
Note that xargs actually takes a whitespace-delimited list. This often leads to problems when a filename has a space in it. To fix it, you should either use:
Maybe you'll call me a pedant, but you should be aware that `find .` is not equivalent to `ls .* *`. The find command starts at the indicated directories (. in this case) and lists each file and directory within it, recursing into subdirectories. You can use things like -type, -[i]name, and -mtime to filter the results, as well as -mindepth and -maxdepth to constrain the traversal.
Note also that "-print" is the default command for find, so you can leave it off. Other commands include -print0 (NUL-delimited instead of newline-delimited) and -exec.
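A short sketch of those filters on a throwaway tree. The directory and file names are invented, and -maxdepth is assumed to be available (it is in GNU and BSD find).

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/sub/deeper"
touch "$workdir/top.txt" "$workdir/sub/mid.txt" \
      "$workdir/sub/deeper/low.txt" "$workdir/sub/mid.log"

# find recurses by default: all three .txt files are found.
all_txt=$(find "$workdir" -type f -name '*.txt' | wc -l | tr -d ' ')

# -maxdepth 2 stops before descending into sub/deeper, so low.txt is skipped.
shallow_txt=$(find "$workdir" -maxdepth 2 -type f -name '*.txt' | wc -l | tr -d ' ')

# -print is implied; -type d lists directories only, including the start point.
dirs=$(find "$workdir" -type d | wc -l | tr -d ' ')

echo "all: $all_txt, shallow: $shallow_txt, dirs: $dirs"
rm -rf "$workdir"
```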
You can parallelize this operation with xargs simply by adding a -P. You could add a & to your convert here but that would run all the jobs at the same time. xargs allows you to only run n at a time. That would be a lot harder to replicate in bash.
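A sketch of the bounded parallelism described here, with sleep standing in for a slow job such as convert. The job count and the limit of two are arbitrary choices for the demo, and the order of the output lines is not guaranteed.

```shell
start=$(date +%s)

# Four one-second jobs, at most two running at any moment.
finished=$(printf '%s\n' 1 2 3 4 |
  xargs -n 1 -P 2 sh -c 'sleep 1; echo "job $1 done"' _ | wc -l | tr -d ' ')

elapsed=$(( $(date +%s) - start ))
echo "$finished jobs finished in about ${elapsed}s"
```

With the limit of two, the run should take roughly two seconds rather than four; raising -P shortens it further until the machine runs out of cores.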
I have a tendency to use xargs too, but often you can go with find's -exec, especially "command {} +" construct used for spitting many files at once to the given command. E.g.
However, unlike xargs, GNU Parallel gives you a guarantee on the order of the output. From the man page:
GNU parallel makes sure output from the commands is the same output as
you would get had you run the commands sequentially. This makes it
possible to use output from GNU parallel as input for other programs.
Want to comment just to stress that this feature is EXTREMELY useful and has saved me from all sorts of tricks with file intermediate output with filenames that match process ids, etc.
The nice thing about parallel is that it can actually run some of the instances on remote machines using SSH, transparently (as long as they have the commands). You just need a couple of parameters and it takes care of uploading the files the command needs and then downloading the results. It's quite awesome.