
My #1 abuse is xargs -n1.

A lot of people like writing bash for loops; I try to avoid those as much as possible. xargs -n1 is the shell equivalent of a call to 'map' in a functional language.

For instance, let's say you want to create thumbnails of a bunch of jpegs:

find images -name "*.jpg" | xargs -n1 -IF echo F F | sed -e 's/.jpg$/_thumb.jpg/' | xargs -n2 echo convert -geometry 200x

Additionally, it's fully parallelizable as xargs supports something akin to pmap.
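As a minimal sketch of that pmap analogy, using GNU xargs' -P flag (the numeric inputs and echo are stand-ins for real files and a real command):

```shell
# Up to 4 invocations run concurrently; -n1 gives each one a single input.
# Output order is not guaranteed when jobs run in parallel.
seq 1 8 | xargs -n1 -P4 sh -c 'echo "processed $0"'
```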



You might be interested in "zargs" in zsh, which would save you the call to find.

Furthermore, instead of the pipe to sed and extra xargs, it might be clearer and simpler to do something like:

  zargs -n 1 **/*.jpg -- make-thumb
Where "make-thumb" is a short script (or even a zsh function, if you care about saving a fork for each input file) containing:

  convert -geometry 200x $1 ${1%%.jpg}_thumb.jpg
But, in real life, instead of writing such a script or function, what I'd probably do instead is:

  zargs -n 1 **/*.jpg | vipe > myscript
and do some quick editing in vim to modify the zargs output by hand to do whatever I need -- and then I'd run the resulting "myscript". Just fyi, "vipe" is part of the "moreutils" package [1] and lets you use your editor in the middle of a pipe.

One final trick is for when you need to do in-place image manipulation. Instead of using "convert", you can use another ImageMagick command: "mogrify". It will overwrite the original file with the modified file. Of course, you should be very careful with it.

[1] - http://joeyh.name/code/moreutils/


I use xargs a lot for refactoring work when I cannot simply use sed, e.g.

vim $(grep -lr foo | xargs)

and doing what I need to do on a file by file basis. Otherwise, for renaming functions and the like, I do a lot of:

find . -name foo_fn -exec sed -i s/foo_fn/bar_fn/g '{}' \;

I generally love abusing bash. Just today I was asked about how to rename a bunch of files, specifically containing spaces, and came up with either of these two options:

find -name foo_bar -exec cp {} {}.bak \;

and

find -name foo_bar -print0 | while IFS= read -rd '' file; do cp "$file"{,.bak}; done

Ultimately, the great thing is that if you learn CTRL-R, you can always search for these types of commands and adapt them to the task at hand, without having to remember them exactly. One I use all the time, to push git branches upstream, is the following:

CTRL-R --set-

which gives me:

git push -f --set-upstream origin `git rev-parse --abbrev-ref HEAD`

This is entirely unique in my history, a common part of my workflow, and trivially searchable.

I also enjoy being able to perform something along the lines of:

vim $(bundle show foo-gem)


Sorry, what do you think

    vim $(grep -lr foo | xargs)
is doing? Assuming a missing directory after the `foo', why can't it just be

    vim $(grep -lr foo .)
I don't see that xargs's default behaviour is adding anything.

For

    find . -name foo_fn -exec sed -i s/foo_fn/bar_fn/g '{}' \;
you may find -exec's + of use. The above has the fork/execve overhead for each file found.

    find -name '*.[ch]' -exec sed -i 's/\<foo\>/bar/g' {} +


You are correct on both counts regarding the vim and grep example - I guess I just assumed I would have to have all the files on a single line before handing them off to vim.

Thanks for the suggestion about -exec +; I will have to remember it in the future.


Backticks (``) and $() use IFS to split the command's output into words, which then become the substitution's result, and IFS normally contains \n.


Would anyone mind doing me a favor by explaining xargs in more detail? I've tried learning it a couple times but I always seem to forget the primary situations in which it's useful. Thank you in advance!


Xargs takes a newline separated list and maps the list to a command.

    find ./ -name '*.log' | xargs rm
Finds all log files and map them to 'rm' commands. e.g. if it finds system.log and rails.log it will run the command `rm system.log rails.log`.
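To see the batching without deleting anything, you can prefix the command with echo as a dry run (the file names here are just examples):

```shell
# xargs hands both names to a single invocation of 'rm' (echoed, not run):
printf '%s\n' system.log rails.log | xargs echo rm
# prints: rm system.log rails.log
```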

xargs will automatically break very long lists up into multiple command calls so that it doesn't exceed the system's limit on the total length of a command line (ARG_MAX).

Another useful flag is '-P <NUM>', which runs up to NUM invocations of the command in parallel. I use this with curl to do ghetto benchmarks.

The next major flag is `-n <NUM>`, which changes the number of arguments per command call. e.g. `-n 1` will run the command once per argument passed to xargs.
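A quick illustration of -n, again with echo and placeholder arguments:

```shell
# Six inputs with -n2 become three separate echo calls:
printf '%s\n' a b c d e f | xargs -n2 echo
# prints:
# a b
# c d
# e f
```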

And the last flag I commonly use is `-I {}`. This sets `{}` as a placeholder that is replaced by each argument. (With GNU xargs, this also implies one input line per command, like `-L 1`.) This is useful for things like:

    find ./ -name '*.log' | xargs -I{} sh -c "if [ -f {}.gz ]; then rm {}; fi"


    find ./ -name '*.log' | xargs rm
Only do that if you know exactly what '*.log' will match (i.e. don't use it in scripts, and avoid using it on the command line). This is because xargs splits its input on whitespace and newlines, but filenames can contain both. This can lead to unexpected results.

Almost everywhere I see xargs used, `find ... -exec {} \;` will work as well, and `find ... -exec {} +` may work even better.


    find ./ -name '*.log' -print0 | xargs -0 rm
Fixes that issue, and xargs is far more efficient: it doesn't launch a new process for each line like exec does. But far more importantly, xargs is generalizable to all commands so you only have to learn it once; exec is just an ugly hack on find, you can't generalize it across all commands; xargs is much more unixy.


> it doesn't launch a new process for each line like exec does

Exec doesn't either if you use "+" as a terminator:

    find . $(options) -exec command {} \;
executes one command per match,

    find . $(options) -exec command {} +
executes a single command with all matches

> exec is just an ugly hack on find

Obvious and complete nonsense, -exec is both an important predicate of find and a useful and efficient tool.

-print0/xargs -0, now that is a hack.


You apparently missed the part where I said "but far more importantly".


No, that's just your opinion and you're entitled to it so I don't care, the rest is factually incorrect.


Oh, I agree on the + thing, I wasn't aware of that option, however, it doesn't make exec any more generalizable across commands which is what matters.


Wanting to remove the files was so common that some finds have it built-in. Avoids even the overhead of -exec rm {} +.

    find -name '*.log' -delete


True, but the issue isn't removing files, the issue is generalizing a command for mapping output of a command to another command. rm was just a simple example. xargs is far more useful than simply deleting files.


Removing files is a common need coupled with find yet many readers don't know of -delete; I was pointing it out. That doesn't weaken xargs's valid uses. You seem a little defensive? :-)


Too much reddit perhaps. I use the delete switch myself regularly.


Note that xargs actually takes a whitespace-delimited list. This often leads to problems when a filename has a space in it. To fix it, you should either use:

  find ... -print0 | xargs -0 ...
or:

  ... | xargs -d'\n' ...
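To illustrate the -d variant (a GNU xargs extension), a sketch with made-up names containing spaces:

```shell
# With -d '\n', only newlines delimit items, so embedded spaces survive:
printf '%s\n' 'a b.txt' 'c d.txt' | xargs -d '\n' printf '[%s]'
# prints: [a b.txt][c d.txt]
```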


Its most common use is to take items from stdin and use them as arguments to a command. Here is a fancy way to do `ls *`[1] using xargs:

    find . -print | xargs ls
`find` dumps the results to stdout, the pipe shuttles `find`'s stdout to `xargs` stdin, `xargs` uses its stdin as arguments for `ls`.

(Don't run this in your home directory)

[1] Pedants will realize this is actually equivalent to `ls .* *` as hidden files are found by `find`.


Maybe you'll call me a pedant, but you should be aware that `find .` is not equivalent to `ls .* *`. The find command starts at the indicated directories (. in this case) and lists each file and directory within it, recursing into subdirectories. You can use things like -type, -[i]name, and -mtime to filter the results, as well as -mindepth and -maxdepth to constrain the traversal.

Note also that "-print" is the default command for find, so you can leave it off. Other commands include -print0 (NUL-delimited instead of newline-delimited) and -exec.


Why do that when a loop is clearer, safer, and shorter?

  find images -name '*.jpg' | while read jpg; do
      convert -geometry 200x "$jpg" "$(echo "$jpg" | sed 's/\.jpg$/_thumb.jpg/')";
  done
This version works even when there are spaces in a filename, whereas yours will break.


false

    $ ls -Ql
    total 4
    -rw-r--r-- 1 zed users 33 Sep  6 07:30 "    spaces    "
    $
    $ find -name '*spaces*' | while read text; do
        cat "$text";
    done
    cat: ./    spaces: No such file or directory
    $
    $ find -name '*spaces*' -print0 | xargs -0 cat
    while read is broken with spaces
    $


IFS needs some care and attention and read should have -r.

    $ ls -Q
    "   spaces   "
    $ ls | while IFS= read -r f; do ls "$f"; done
       spaces   
    $
For lots of grim detail see David A. Wheeler's http://www.dwheeler.com/essays/fixing-unix-linux-filenames.h...


That is a good point, but it is a much more degenerate case than the more common internal space.


Only for file names starting or ending with spaces, as far as I can tell, which are pretty unusual.


You can parallelize this operation with xargs simply by adding a -P. You could add a & after your convert, but that would run all the jobs at the same time; xargs lets you run only n at a time. That would be a lot harder to replicate in bash.
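For comparison, a rough bash-only sketch of what -P gives you for free (requires bash 4.3+ for wait -n; the sleep/echo job is a stand-in for convert):

```shell
#!/usr/bin/env bash
# Keep at most 3 background jobs running at a time.
for f in a b c d e f; do
  while (( $(jobs -pr | wc -l) >= 3 )); do
    wait -n   # block until any one job finishes
  done
  { sleep 0.1; echo "done $f"; } &
done
wait  # let the last jobs finish
```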


Why use sed here?

    find images -name "*.jpg" | while read -r jpg; do
        convert -geometry 200x "$jpg" "${jpg%.jpg}_thumb.jpg"
    done
This correctly handles spaces in file names and uses built in shell string replacement.
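The expansion in question, shown with a hypothetical file name:

```shell
# ${var%pattern} removes the shortest matching suffix (POSIX shell):
jpg='images/photo.jpg'
echo "${jpg%.jpg}_thumb.jpg"
# prints: images/photo_thumb.jpg
```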


Even shorter, avoiding the echo sed business:

  find images -name '*.jpg' | while read jpg; do
      convert -geometry 200x "$jpg" "${jpg%%.jpg}_thumb.jpg";
  done


I have a tendency to use xargs too, but often you can go with find's -exec, especially the "command {} +" construct, which passes many files at once to the given command. E.g.

    find . -iname '*.pdf' -exec pdfgrep -i keyword {} +


Upvoted just for mentioning that there is such thing as pdfgrep, thanks :)


If you like xargs, but want jobs to execute in parallel, you may enjoy GNU parallel.

Your trivial example:

find images -name "*.jpg" | parallel convert -geometry 200x {} {.}_thumb.jpg


xargs --max-procs=#


However, unlike xargs, GNU Parallel gives you a guarantee on the order of the output. From the man page:

       GNU parallel makes sure output from the commands is the same output as
       you would get had you run the commands sequentially. This makes it
       possible to use output from GNU parallel as input for other programs.


I want to comment just to stress that this feature is EXTREMELY useful; it has saved me from all sorts of tricks with intermediate output files named after process IDs, etc.


The nice thing about parallel is that it can actually run some of the instances on remote machines using SSH, transparently (as long as they have the commands). You just need a couple of parameters and it takes care of uploading the files the command needs and then downloading the results. It's quite awesome.



I too like xargs, but I see no need for these two xargs calls given that sed is already used?

    find images -name '*.jpg' |
    sed 's/\.jpg$//
        s/.*/convert -geometry 200x &.jpg &_thumb.jpg/'
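Note that this sed pipeline only prints the convert commands; pipe its output to sh to actually run them. A dry run with hypothetical file names:

```shell
# The generated command lines (append '| sh' to execute them):
printf '%s\n' images/a.jpg images/b.jpg |
sed 's/\.jpg$//
    s/.*/convert -geometry 200x &.jpg &_thumb.jpg/'
# prints:
# convert -geometry 200x images/a.jpg images/a_thumb.jpg
# convert -geometry 200x images/b.jpg images/b_thumb.jpg
```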


Is there a way to use an alias with xargs like in the command below ?

  xargs -a cmd_list.txt -I % alias %
This has been bothering me for a while.



