Unix Commands I Abuse Every Day (everythingsysadmin.com)
392 points by BCM43 on Sept 5, 2012 | 103 comments



My #1 abuse is xargs -n1.

A lot of people like writing bash for loops, but I try to avoid that as much as possible; xargs -n1 is the bash equivalent of a call to 'map' in a functional language.

For instance, let's say you want to create thumbnails of a bunch of jpegs:

find images -name "*.jpg" | xargs -n1 -IF echo F F | sed -e 's/.jpg$/_thumb.jpg/' | xargs -n2 echo convert -geometry 200x

Additionally, it's fully parallelizable as xargs supports something akin to pmap.
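For example, something like this (just a sketch; the job count and file pattern are arbitrary, and -P/-print0/-0 assume a GNU or BSD xargs and find):

    find . -name '*.log' -print0 | xargs -0 -n1 -P4 gzip

runs up to four gzip processes at a time.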


You might be interested in "zargs" in zsh, which would save you the call to find.

Furthermore, instead of the pipe to sed and extra xargs, it might be clearer and simpler to do something like:

  zargs -n 1 **/*.jpg -- make-thumb
Where "make-thumb" is a short script (or even a zsh function, if you care about saving a fork for each input file) containing:

  convert -geometry 200x $1 ${1%%.jpg}_thumb.jpg
But, in real life, instead of writing such a script or function, what I'd probably do instead is:

  zargs -n 1 **/*.jpg | vipe > myscript
and do some quick editing in vim to modify the zargs output by hand to do whatever I need -- and then I'd run the resulting "myscript". Just fyi, "vipe" is part of the "moreutils" package [1] and lets you use your editor in the middle of a pipe.

One final trick is for when you need to do in-place image manipulation. Instead of using "convert", you can use another ImageMagick command: "mogrify". It will overwrite the original file with the modified file. Of course, you should be very careful with it.
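For instance, something like this sketch (the size and glob are arbitrary) would resize every jpeg in the current directory in place:

    mogrify -resize 200x *.jpg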

[1] - http://joeyh.name/code/moreutils/


I use xargs a lot for refactoring work when I cannot simply use sed, e.g.

vim $(grep -lr foo | xargs)

and doing what I need to do on a file by file basis. Otherwise, for renaming functions and the like, I do a lot of:

find . -name foo_fn -exec sed -i s/foo_fn/bar_fn/g '{}' \;

I generally love abusing bash. Just today I was asked how to rename a bunch of files (specifically, ones containing spaces) and came up with these two options:

find -name foo_bar -exec cp "{}"{,.bak} \;

and

for file in $(find -name foo_bar); do cp "$file"{,.bak}; done

Ultimately, the great thing is that if you learn CTRL-R, you can always search for these types of commands and modify them as necessary for the task at hand, without having to remember them exactly. One I use all the time, to push git branches upstream, is the following:

CTRL-R --set-

which gives me:

git push -f --set-upstream origin `git rev-parse --abbrev-ref HEAD`

This is entirely unique in my history, a common part of my workflow, and trivially searchable.

I also enjoy being able to perform something along the lines of:

vim $(bundle show foo-gem)


Sorry, what do you think

    vim $(grep -lr foo | xargs)
is doing? Assuming a missing directory after the `foo', why can't it just be

    vim $(grep -lr foo .)
I don't see that xargs's default behaviour is adding anything.

For

    find . -name foo_fn -exec sed -i s/foo_fn/bar_fn/g '{}' \;
you may find -exec's + of use. The above has the fork/execve overhead for each file found.

    find -name '*.[ch]' -exec sed -i 's/\<foo\>/bar/g' {} +


You are correct on both counts regarding the vim and grep example - I guess I just assumed I would have to have all the files on a single line before handing them off to vim.

Thanks for the suggestion about -exec +; I will have to remember it in the future.


Backticks (``) and $() use IFS to parse the command's output into words, which then become the substitution's replacement, and IFS normally contains \n.
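A quick sketch of the difference (assuming some files under . contain "foo"):

    vim $(grep -lr foo .)      # unquoted: the output is IFS-split, one file per argument
    vim "$(grep -lr foo .)"    # quoted: a single argument with embedded newlines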


Would anyone mind doing me a favor by explaining xargs in more detail? I've tried learning it a couple times but I always seem to forget the primary situations in which it's useful. Thank you in advance!


Xargs takes a newline separated list and maps the list to a command.

    find ./ -name '*.log' | xargs rm
Finds all the log files and maps them to 'rm'. E.g. if it finds system.log and rails.log, it will run the command `rm system.log rails.log`.

xargs will automatically do things like break up very long lists into multiple command calls so that it doesn't exceed the maximum number of arguments a command can have.
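You can watch that splitting happen with something like this sketch (the count is arbitrary):

    seq 1 200000 | xargs echo | wc -l    # prints more than 1: one output line per echo invocation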

Another useful xargs flag is '-P <NUM>', which lets you run the same command in parallel. I use this with curl to do ghetto benchmarks.

The next major flag is `-n <NUM>` which changes the number of arguments per command call. e.g. `-n 1` will run the command per argument passed to xargs.

And the last flag I commonly use is `-I {}`. This sets `{}` as a variable which contains the arguments. (This also forces `-n 1`). This is useful for things like:

    find ./ -name '*.log' | xargs -I{} sh -c "if [ -f {}.gz ]; then rm {}; fi"


    find ./ -name '*.log' | xargs rm
Only do that if you know exactly what '*.log' will expand to (i.e. don't use it in scripts and avoid using it on the command line). This is because the delimiter for xargs is a newline character, but filenames can have a newline character in them. This can lead to unexpected results.

Almost everywhere I see xargs used, find ... -exec {} \; will work as well, and find ... -exec {} + may work even better.


    find ./ -name '*.log' -print0 | xargs -0 rm
Fixes that issue, and xargs is far more efficient: it doesn't launch a new process for each line like exec does. But far more importantly, xargs is generalizable to all commands, so you only have to learn it once; exec is just an ugly hack on find that you can't generalize across all commands. xargs is much more unixy.


> it doesn't launch a new process for each line like exec does

Exec doesn't either if you use "+" as a terminator:

    find . $(options) -exec command {} \;
executes one command per match,

    find . $(options) -exec command {} +
executes a single command with all matches

> exec is just an ugly hack on find

Obvious and complete nonsense, -exec is both an important predicate of find and a useful and efficient tool.

-print0/xargs -0, now that is a hack.


You apparently missed the part where I said "but far more importantly".


No, that's just your opinion and you're entitled to it so I don't care, the rest is factually incorrect.


Oh, I agree on the + thing; I wasn't aware of that option. However, it doesn't make exec any more generalizable across commands, which is what matters.


Wanting to remove the files was so common that some finds have it built-in. Avoids even the overhead of -exec rm {} +.

    find -name '*.log' -delete


True, but the issue isn't removing files; the issue is a general way to map the output of one command onto another command. rm was just a simple example. xargs is far more useful than simply deleting files.


Removing files is a common need coupled with find yet many readers don't know of -delete; I was pointing it out. That doesn't weaken xargs's valid uses. You seem a little defensive? :-)


Too much reddit perhaps. I use the delete switch myself regularly.


Note that xargs actually takes a whitespace-delimited list. This often leads to problems when a filename has a space in it. To fix it, you should either use:

  find ... -print0 | xargs -0 ...
or:

  ... | xargs -d'\n' ...


Its most common use is to take some stuff on stdin and then use it as arguments to a command. Here is a fancy way to do `ls *`[1] using xargs:

    find . -print | xargs ls
`find` dumps the results to stdout, the pipe shuttles `find`'s stdout to `xargs` stdin, `xargs` uses its stdin as arguments for `ls`.

(Don't run this in your home directory)

[1] Pedants will realize this is actually equivalent to `ls .* *` as hidden files are found by `find`.


Maybe you'll call me a pedant, but you should be aware that `find .` is not equivalent to `ls .* *`. The find command starts at the indicated directories (. in this case) and lists each file and directory within it, recursing into subdirectories. You can use things like -type, -[i]name, and -mtime to filter the results, as well as -mindepth and -maxdepth to constrain the traversal.

Note also that "-print" is the default command for find, so you can leave it off. Other commands include -print0 (NUL-delimited instead of newline-delimited) and -exec.
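For instance (a sketch; the depth, pattern, and age are arbitrary):

    # regular files at most two levels deep, named *.log, modified in the last 7 days
    find . -maxdepth 2 -type f -name '*.log' -mtime -7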


Why do that when a loop is clearer, safer, and shorter?

  find images -name '*.jpg' | while read jpg; do
      convert -geometry 200x "$jpg" "$(echo "$jpg" | sed 's/.jpg$/_thumb.jpg/')";
  done
This version works even when there are spaces in a filename, whereas yours will break.


false

    $ ls -Ql
    total 4
    -rw-r--r-- 1 zed users 33 Sep  6 07:30 "    spaces    "
    $
    $ find -name '*spaces*' | while read text; do
        cat "$text";
    done
    cat: ./    spaces: No such file or directory
    $
    $ find -name '*spaces*' -print0 | xargs -0 cat
    while read is broken with spaces
    $


IFS needs some care and attention and read should have -r.

    $ ls -Q
    "   spaces   "
    $ ls | while IFS= read -r f; do ls "$f"; done
       spaces   
    $
For lots of grim detail see David A. Wheeler's http://www.dwheeler.com/essays/fixing-unix-linux-filenames.h...


That is a good point, but it is a much more degenerate case than the more common internal space.


Only for file names starting or ending with spaces, as far as I can tell, which are pretty unusual.


You can parallelize this operation with xargs simply by adding a -P. You could add an & to your convert here, but that would run all the jobs at the same time; xargs allows you to run only n at a time. That would be a lot harder to replicate in bash.
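Something like this sketch (assuming GNU find/xargs and the same images directory as above) keeps at most four convert jobs running at once:

    find images -name '*.jpg' -print0 |
        xargs -0 -P4 -I{} sh -c 'convert -geometry 200x "$1" "${1%.jpg}_thumb.jpg"' _ {}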


Why use sed here?

    find images -name "*.jpg" | while read -r jpg; do
        convert -geometry 200x "$jpg" "${jpg%.jpg}_thumb.jpg"
    done
This correctly handles spaces in file names and uses built in shell string replacement.


Even shorter, avoiding the echo sed business:

  find images -name '*.jpg' | while read jpg; do
      convert -geometry 200x "$jpg" "${jpg%%.jpg}_thumb.jpg";
  done


I have a tendency to use xargs too, but often you can go with find's -exec, especially the "command {} +" construct, which passes many files at once to the given command. E.g.

    find . -iname '*.pdf' -exec pdfgrep -i keyword {} +


Upvoted just for mentioning that there is such a thing as pdfgrep, thanks :)


If you like xargs, but want jobs to execute in parallel, you may enjoy GNU parallel.

Your trivial example:

find images -name "*.jpg" | parallel convert -geometry 200x {} {.}_thumb.jpg


xargs --max-procs=#


However, unlike xargs, GNU Parallel gives you a guarantee on the order of the output. From the man page:

       GNU parallel makes sure output from the commands is the same output as
       you would get had you run the commands sequentially. This makes it
       possible to use output from GNU parallel as input for other programs.


Want to comment just to stress that this feature is EXTREMELY useful and has saved me from all sorts of tricks with intermediate output files with filenames that match process ids, etc.


The nice thing about parallel is that it can actually run some of the instances on remote machines using SSH, transparently (as long as they have the commands). You just need a couple of parameters and it takes care of uploading the files the command needs and then downloading the results. It's quite awesome.
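A rough sketch of what that looks like (host1 and host2 are placeholders; --sshlogin, --transfer, --return and --cleanup are the GNU parallel options involved, as best I recall):

    find images -name '*.jpg' |
        parallel --sshlogin host1,host2 --transfer --return {.}_thumb.jpg --cleanup \
            convert -geometry 200x {} {.}_thumb.jpg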



I too like xargs, but is there any need for the two xargs calls here, given that sed is already used?

    find images -name '*.jpg' |
    sed 's/\.jpg$//
        s/.*/convert -geometry 200x &.jpg &_thumb.jpg/'


Is there a way to use an alias with xargs like in the command below ?

  xargs -a cmd_list.txt -I % alias %
This has been bothering me for a while.


I like this but I can't decide if it's technically abuse or not. The paste command will happily read - (meaning stdin) multiple times, so for transposing a list into a table:

  file.txt:
  line1
  line2
  line3
  line4
  line5
  line6

  $ paste - - - < file.txt

  line1 line2 line3
  line4 line5 line6
Combine with the column command for pretty printing. I seem to find a use for this pretty frequently.
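For example, with the same file.txt as above (column -t aligns the columns):

    paste - - - < file.txt | column -t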

I like the simplicity of this one but it's not very useful day to day:

  $ echo *
As a replacement for "ls".


Back in the crusty old days, FreeBSD used to take forever to install over the network, but would start an "emergency holographic shell" on pty4. The 'echo *' trick and various other shell built-ins were very useful for exploring the system before /bin and /usr/bin were populated.


Random note. The most commonly "abused" Unix command is cat. The name is short for "concatenate", and its intended purpose was originally to combine 2 or more files at once.

Therefore every time you use it to spool one file into a pipeline, that is technically an abuse!


It may be an abuse, but at least it has no influence on the correctness of the result, as pipes are a monad: http://okmij.org/ftp/Computation/monadic-shell.html


I've sometimes heard it's "catenate" not "concatenate".

Do they mean the same thing?

As for cat, the utility, I'm afraid we'll never stop seeing people doing

   cat file|prog1|prog2
even when it makes absolutely no sense whatsoever.

If it did, I might as well do

   cat file|cat|prog|cat|prog2|cat -
I mean, why not? It "looks nicer" than

   prog file|prog2

or

   prog < file|prog2
Programmers love their cats.


It's most definitely catenate. I understand catenate to mean to chain and concatenate to mean to chain together. Since "cat foo bar xyzzy" doesn't modify the files to join them in any way, I don't think they're chained together.

Besides, ken & Co. aren't daft. con would be short for concatenate. :-)


According to the 1st Edition and 6th Edition manual pages: NAME cat -- concatenate and print

See: http://man.cat-v.org/unix-1st/1/cat http://man.cat-v.org/unix-6th/1/cat

I'm pretty sure based on the timeline of pipe (~1973, roughly V4) that the cat command (~1971 V1) predates pipes.


Fortunately they fixed that errant man page. :-) http://man.cat-v.org/plan_9/1/cat


From the man page:

    cat doc1 doc2 > doc.all

concatenates the files doc1 and doc2 and writes the result to doc.all.


I think I'm missing your point. doc[12] aren't changed.


Doc 1 & 2 are not changed but chained together to create doc.all


I have long thought that some sort of zsh completion that detects that abuse of cat and converts it into the more appropriate `< file` might be a good idea. If it did it silently it probably wouldn't be worth it, but if it actually performed the substitution in front of you then it might help users get more comfortable with the carrot syntax.


The ` < file` has to appear at/near the end of the line, right? Using cat has the advantage of being able to read the line from left to right along with the data flow. I often add more piped commands to the end of a line as I refine it, while the source data remains the same. (To be fair, sometimes the opposite is true.)


Nope,

    <file ./command --args
and even

    ./command <file --args
works fine in Bash and Zsh.


Interestingly, there are some circumstances where you actually want "cat file | program" and not "program < file". The case I have in mind is when file is actually a named FIFO which was not opened for writing. If you use cat, program will still run and only reads to stdin will block (but it can perform other things, possibly in different threads). If you use '<', opening stdin will block and program will probably block altogether.
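A sketch of the difference, assuming a FIFO named p that nothing has opened for writing yet (prog is a placeholder):

    mkfifo p
    prog < p         # the shell blocks opening p before prog even starts
    cat p | prog     # prog starts right away; cat (and prog's reads from stdin) block instead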


With < on a FIFO it's the shell that blocks on opening before the command, e.g. cat, is run.


Does it really matter that you're starting an extra process?


It doesn't matter for a file of size 1 KB. For a file of size 10 GB, every process matters.

For the downvoters: please time how long it takes to do something like `cat $file | awk '{print $1}' ` and `awk <$file '{print $1}'`


Not exactly convincing:

    ~/desktop$ du -h c.dat
    11G     c.dat
    ~/desktop$ time cat c.dat | awk '{ print $1 }' > /dev/null
    
    real    0m53.997s
    user    0m52.930s
    sys     0m7.986s
    ~/desktop$ time < c.dat awk '{ print $1 }' > /dev/null
    
    real    0m53.898s
    user    0m51.074s
    sys     0m2.807s
cat CPU usage didn't exceed 1.6% at any time. The biggest cost is in redundant copying, so the more actual work you're doing on the data, the less and less it matters.


I was curious, so here goes; 'foo' was a file of ~1G containing lines made up of 999 'x's and one '\n'.

    $ ls -lh foo
    -rw-r--r-- 1 ori ori 954M Sep  5 22:57 foo

    $ time cat foo | awk '{print $1}' > /dev/null

    real	0m1.631s
    user	0m1.452s
    sys 	0m0.540s

    $ time awk <foo '{print $1}' > /dev/null 

    real	0m1.541s
    user	0m1.376s
    sys 	0m0.160s
This was run from a warm cache, so that the overhead of the extra IO from a pipe would dominate.


Both invocations take similar amounts of "real" time because the task is IO-bound and it takes roughly 1.5s on your machine to read the file.

But if you add up the "user" and "sys" time in the cat example, you see that it took 1.992s of actual CPU time, which is about a 30% increase over the 1.536s spent without cat.

The perf decrease wasn't visible because you have multiple cores parallelizing the extra cpu-time, but it was there.


So the two are different because awk's call to read() is effectively the same as a read directly from a file, whereas copying is taking place through the pipe with the pipeline approach?


Basically you see a linear increase in time. If it was going to take a coffee break's worth of time one way, it will take a slightly longer coffee break worth of time the other. It is fairly rare that the additional time involved matters and there isn't something else that you should be doing anyway.


The difference between

    cat file | foo
    foo <file
assuming foo only reads stdin so `foo file' isn't possible, is that with the latter the shell will open file for reading on file descriptor 0 (stdin) before execing foo and the only cost is the read(2)s that foo does directly from file.

With the needless cat we have cat having to read the bytes and then write(2) them whereupon foo reads them as before. So the number of system calls goes from R to R+W+R assuming all reads and writes use the same block size and more byte copying may be required.


No.

It is pretty much just a matter of principle.


> carrot syntax

Heh. Be careful with this, though: ^ is the caret (note spelling) according to most sources of information about these things.

Random Fun Geekery Time: Back in the Before-Before, the grapheme in ASCII at the codepoint where ^ is now was an up-arrow character, which is why BASIC uses ^ for exponentiation even though FORTRAN, which came first and which early BASIC dialects greatly copied, uses **.

These days, ↑ is U+2191, or &uarr; in HTML.

http://www.alanwood.net/unicode/arrows.html

http://www.fileformat.info/info/unicode/char/2191/index.htm

http://en.wikipedia.org/wiki/Caret


> Back in the Before-Before, the grapheme in ASCII at the codepoint where ^ is now was an up-arrow character, which is why BASIC uses ^ for exponentiation even though FORTRAN, which came first and which early BASIC dialects greatly copied, uses **.

That is, two asterisks in a row.


> Therefore every time you use it to spool one file into a pipeline, that is technically an abuse!

Since everything's a file* in UNIX (and its ilk), it's actually not an abuse.

* Pedantic variations such as "everything's a bytestream" or "everything's a file descriptor" notwithstanding.


No, cat(1) stands for catenate, not concatenate. http://en.wiktionary.org/wiki/catenate


hmmm, cat(1) mentions interesting flags, and tac, which comes in handy at times.


Useless use of cat award! http://partmaps.org/era/unix/award.html


Shell input redirection doesn't work inside eshell; the only way I know to pipe into stdin from eshell is to use cat.


Stop watch:

    time read
Press enter to read the elapsed time. If you write your activity in the prompt and repeat it for multiple activities, you have a nice time log. You can then just copy & paste it from the terminal.


I have the time (H:m:s) in my prompt. That way I can easily time commands and I can also find things more easily in my scroll buffer.


For all the pipe lovers in this thread, here is a Perl utility I wrote to help debug shell pipelines. I call it `echoin`, and whatever it takes on stdin, it prints to stdout (presumably the terminal) while also treating its arguments as a command (sort of like xargs) and repeating its input for that command's stdin. So I can do:

    foo | echoin bar
This is like `foo | bar`, but I can see what's passing between them. It's a bit like `tee`, but reversed. It's what I irrationally want `foo | tee - | bar` to do.

    #!/usr/bin/perl
    # echoin: copy stdin to stdout while also piping it into the command given as arguments
    my $args = join ' ', @ARGV;

    open OUT, "|$args"  or die "Can't run $args: $!\n";
    while (<STDIN>) {
      print $_;        # echo the line to the terminal
      print OUT $_;    # and feed the same line to the command's stdin
    }


I use the "grep with color for lines plus the empty string" so frequently that I have a function for it:

  function highlight() {
    local args=( "$@" )
    for (( i=0; i<${#args[@]}; i++ )); do
      if [ "${args[$i]:0:1}" != "-" ]; then
        args[$i]="(${args[$i]})|$"
        break
      fi
    done
    grep --color -E "${args[@]}"
  }
This is only to be used as a filter, since it mangles filenames.

I'm curious, does ack support a highlight-only mode?


Yes, ack --passthru does what you want. I have highlighting turned on in my .profile (export ACK_COLOR_MATCH="bold red").


I've come to use the following command so often that I've written a script for it in my ~/bin - 'narrow':

    #!/bin/bash
    xargs -n 1 grep -l "$@"
This takes a list of files on stdin, then greps for the argument in all the files and spits out the matching files.

The perk is that it can be chained:

    find *.txt | narrow dog | narrow cat | narrow rabbit
This will find all the files that contain dog, cat, and rabbit.


But why run so many greps? I don't think there's a need for -n1 here.

    xargs -rd'\n' grep -l "$@"


Witless uninstall #1 (perfect for those crappy tarfiles that extract everything into the current directory):

    tar -tf tarfile.tar | xargs rm
Witless uninstall #2: find a file that you changed just before running "make install" into the wrong spot (usually config.log is a good candidate), then remove everything newer than it.

    find /bogus/installdir -newer config.log  | xargs rm
Yes, this is totally unsafe. But it's an abuse... so, there you go


Another 'less' abuse: using it as an interactive grep via '&' line filtering.

It's a newish feature of less (those of you with stale RHEL installs won't find it). Type '&<pattern><return>' and you'll filter down a listing to match pattern. Regexes supported. Prefixed '!' negates filter.

Wishlist: interactive filter editing (similar to mutt's mail index view filters), so you don't have to re-type full expressions.


Simpler than using grep to highlight output is using ack with the --passthru option. (If you have ack anyway)


I use a custom function: function hl() { local R=$1; shift && egrep --color=always "|$R" $@; }


egregious use of color (uses 256-color term support):

  function hl() {
    local R=''
    while [ $# -gt 0 ]; do R="$R|$1"; shift; done
    env GREP_COLORS="mt=38;5;$((RANDOM%256))" egrep --color=always $R
  }

then do e.g.

whatever pipeline | hl foo bar baz | hl quux | hl '^.* frob.*$' | less -R

results in foo/bar/baz highlighted in one color, quux in another, and whole lines containing frob in another. hopefully the colors aren't indistinguishable from each other or from the terminal background :\

I use a somewhat similar setup in emacs, where a key binding adds the word under point to a syntax highlighting table, but the color is computed as the first 6 characters of the md5sum of the word.


It should be noted that grep-dot only prints filenames if you give it more than one file.

Also, it skips blank lines. But of course that might be in the feature-not-a-bug category; and if you really want to see every line, there's always grep-quote-quote:

  grep '' *.txt
In any case, fun article. :-)


I don't think I could live without ack:

http://betterthangrep.com/


fmt 1 < file seems like it might be handy

I don't understand his first one though. I thought both Linux and BSD greps had the -H option (show filename).

here's some more:

   1. sed t file instead of cat file

   2. echo dir/*|tr '\040' '\012' instead of find dir

   3. echo dir/*|tr '\040' '\012' |sed 's/.*/program & |program2 /' > file; . file instead of xargs or find
(of course this assumes you keep filenames with spaces or other dangerous characters off your system.)

   4. same as 3. but use split to split file into parts.  then execute each part on a different cpu.


The point of the first one is he's abusing grep as cat; what he really wants is a cat that shows filenames, but since there's no such thing he uses "grep ." as a substitute.


My number one used command: grep -Hnri <text> <file or *>


This works really well with the suggested --color=always, because it highlights the line number, the filename and the separators in different colours.


I sense there is some cat abuse going on!


cat foo | less

Normally I'd have typed something like "grep -i 'something' foo | less", and wanted to just up-arrow to the previous line and change the grep stuff to cat. I don't know why; it doesn't really save me anything. Maybe it's the hackerish "because I can, that's why" instinct at work.


`cat foo | less` is gratuitous. `less foo` is all you need.


It'll save you a few more keystrokes if you do

    cat !$ | less
!$ is "the last argument to the previous command".


If you do

    less <ESC>.
it will save you even more keystrokes.


Do you remap ESC to another key?


No, but for me <ESC> acts like <META> and the actual keybinding is M-. or M-_.

The command you are looking for is yank-last-arg for bash and insert-last-word for zsh.


You have that backwards--Meta actually means "prepend Escape" (or 8th bit on, but that's another discussion).


Isn't that what hacking is all about? Using a thing in a different way than it was intended?


    You can do "head -n 0" on Linux to mean "all lines". 
No you can't.


`head -n 99999` seems like a weird way to do it anyway. Wouldn't it make more sense to do `tail -n +1`? The output is the same from both commands, but `tail` doesn't require you to assume arbitrary limits.

Honest question, btw. I'm relatively inexperienced with Linux, and I certainly haven't used BSD. I'd appreciate any critiques you may have to offer.


> Wouldn't it make more sense to do `tail -n +1`

Yes (or perhaps tail -n +0 as that is idiomatic, which makes it clear to anyone what you are intending).


Should probably be "head -n-0".

The '-' between -n and 0 means "all but the last 0 lines".


Nice unix commands



