Top Unix Command Line Utilities (coldflake.com)
109 points by coldgrnd on Dec 30, 2012 | hide | past | favorite | 63 comments

These obviously aren't related to 2012 at all.

Some issues:

- Don't forget that /dev/random blocks

- It's easier to use dd_rescue to track progress than to signal dd

- Using dd to zero out a hard drive repeatedly doesn't increase security[1]. Using ATA secure erase does[2]

- An alternative for summing file sizes is

    du -ch **/*.png
[1] http://en.wikipedia.org/wiki/Data_erasure#Number_of_overwrit...

[2] https://ata.wiki.kernel.org/index.php/ATA_Secure_Erase
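One caveat on the du one-liner above: in bash, ** only recurses into subdirectories when the globstar option is on (zsh has it by default). A quick sanity check, with file names made up for illustration:

```shell
# Without globstar, bash treats ** like a single * and misses nested files.
shopt -s globstar
du -ch **/*.png | tail -1   # the last line is the grand total
```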

I think the only "secure" way to erase the contents of a hard drive is to repeatedly overwrite the disk surface with a mix of random/patterned data (like Darik's Boot & Nuke does).

Also, for those wondering about the blocking of /dev/random, it will restrict the number of bits you can copy using dd, but this won't be apparent unless you attempt to copy more bits from the entropy pools than there are available for random number generation. For more information, see this question on Super User: http://superuser.com/questions/520601/why-does-dd-only-copy-...

The ATA secure erase command is faster, and should be better than overwriting: overwriting can miss sectors that have been remapped as bad, but secure erase will get those too.

Multiple overwrites are pointless. There's the Gutmann paper, but that's ancient, and its 35 passes were designed to cover a range of old drive encodings for when you didn't know which one a drive used.

But sometimes you don't get to do what works; you do what other people tell you. If you're working to a standard, it doesn't matter whether DOD specifications are actually more secure than a single secure erase: you do what the spec calls for. And if you have to persuade other people that the data is provably gone, it's easiest to just grind the drives.

Secure erase depends on the technology of the drive and the prowess of your attacker.

If you fear the NSA seizing your disks, consider the tradeoffs of explosive disposal.

If you fear a technically-savvy reporter going through your trash bin, overwriting your disk three or four times with patterns will be fine. But it's probably faster to take a drill and make a couple of holes. Make sure you hit the platters.

If you are selling your old hardware and just don't want your unencrypted stuff to be recovered by a sixteen year old with no budget but lots of time, overwrite the disk once.

If you're trashing an SSD, make sure any patterns you use for overwriting are not compressed out of existence by the controller. Or pull off the controller and crunch it.

The whole "overwrite xx times" is a myth. Overwriting it once with random data makes it completely unreadable. Overwriting it more than that is a complete waste of time.



Anyone wiping disks in any of these ways in 2013 is doing it wrong.

Zeroize the keys. Then you don't need to worry about the disk.

If your data is something that would be a problem if the physical disks were stolen, don't store it on-disk unencrypted.

> It's easier to use dd_rescue to track progress than to signal dd

Why? I just use: watch pkill -USR1 dd
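The mechanism behind that: GNU dd prints its transfer statistics to stderr when it receives SIGUSR1, so you can poke a running copy without stopping it (BSD dd listens for SIGINFO instead, which is what Ctrl+T sends). A minimal sketch, writing to a throwaway file:

```shell
# Start a copy in the background, then ask it for a progress report.
dd if=/dev/zero of=/tmp/big.img bs=1M count=64 &
sleep 1
kill -USR1 $!    # dd reports records/bytes copied so far to stderr, and keeps going
wait
```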

I'm a heavy command line user (I don't have a graphical file explorer/manager for instance). Here is my top 42, in order of usage:

    ls, cd, git, ssh, make, e, cat, veille, rm,
    wpa_supplicant, grep, evince, mv, x, dhclient,
    cp, echo, todo, mplayer, scp, man, mkdir, ack,
    pdflatex, apt-get, apt-cache, sed, less, feh,
    racket, gcc, wget, xrandr, bg, svn, pmount,
    for, gpg, halt, ping, tail, top.
"e" is an alias for emacsclient ; "veille" is a script which toggles between "xset s 5" and "xset s default" ; "x" is an alias for "xinit" ; "todo" is a script which manage a text file which I use as a todo list.

"ls", "cd", and "git" are far more used than any other commands : 14808, 13256 and 10078 times respectively, against 3919 times for "ssh" which is just behind.

I obtained these data from my .bash_history. Here are the ranks of the commands listed in the article:

    "tr" is 76th
    "sort" is 66th
    "uniq" is 120th
    "split" is there only one time like many other command so its rank is not relevant
    Substitutions operations are what most "for" do so it is in my top 42. However see (1) below about the article.
    Files size are a mix of "ls" for individual file and "du" for multiple files, "du" is 65th
    "df" is 63rd
    "dd" is 473rd
    "zip" is 123rd (and funnily "gzip" is 122nd)
    I didn't use "hexdump"
(1) About the following line

    for i in *.mp4; do ffmpeg -i "$i" "${i/.mp4}.mp3"; done
I have two remarks. First, using ffmpeg's "-vn" flag would speed up the conversion by making ffmpeg ignore the video stream entirely. Second, substituting with '/' in the Bash expansion is not the right way to do this: "${i/.mp4}" is "$i" with the first occurrence of ".mp4" removed, wherever it appears. It is '%' that you want here, to strip the suffix.
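The difference is easy to see with an adversarial file name (made up for illustration):

```shell
i="clip.mp4.old.mp4"
echo "${i/.mp4}"    # substitution: removes the FIRST ".mp4" -> clip.old.mp4
echo "${i%.mp4}"    # suffix removal: strips ".mp4" from the END -> clip.mp4.old
```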

I wonder how large your HISTFILESIZE is, to get such accurate statistics?

My .bash_history weighs 1.7 MB (almost 100k lines). I set a very high HISTSIZE since I don't see any reason to lose this data.

One reason might be that bash reads this whole file into memory on interactive startup and rewrites it completely when shutting down.

I second this; I think it is good practice to rotate the history file manually when it reaches several MBs. (Using zsh, the main symptoms of a large history file are sluggishness when using ^R and a lag of a few tenths of a second when closing a terminal.)
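For anyone wanting to rotate by hand, a small helper along these lines works (a sketch only; the 20000-line cutoff and the archive naming scheme are arbitrary choices, not anyone's actual setup):

```shell
# Archive the full history file, then keep only the newest 20000 lines.
rotate_history() {
  histfile=$1
  cp "$histfile" "$histfile.$(date +%Y%m%d)" &&
  tail -n 20000 "$histfile" > "$histfile.tmp" &&
  mv "$histfile.tmp" "$histfile"
}
# e.g. rotate_history ~/.bash_history  (then start a new shell, or `history -r`)
```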

For now I don't feel any annoyance; the shell starts and closes instantaneously. If I start to feel a slowdown I might rotate the log, but I don't expect that to happen soon.

wow...that's quite some statistical data you gathered! I tried this myself using this command:

    cat ~/.bash_history | cut -f1 -d' ' | sort | uniq -c | sort -n -r
turns out my .bash_history is clipped at its default size limit (500 lines). So I'll change that to gather more data for next year. My results started with:

    147 clang++
     77 ls
     54 cd
     15 gs
     14 rake
     13 vim
     ...

where gs is short for "git status" ...and thanks for the hint about the substitution! I updated it on my page.

In a bash session (bash 2.05 or later):

  $ hash | sort -nr | less
shows usage counts and full paths of executed commands.

My 2012 top, in order of usage:

joe, ls, cd, time, cdbdump, tail, more, cat, rm, grep, wc, apachectl restart, find, curl, chmod, history, mv, locate, cpan, apt-get, pwd

But the most useful one is a command line Perl utility I call "flt" that executes a block of Perl code for each line of stdin.

    cat file.txt | flt ' $line =~ s|\s+| |gsi; print $line."\n"; '

That collapses runs of whitespace into single spaces.

    find . | flt ' if (-f $line) { print((-s $line), "\n"); } '

This prints the size of every file in the current folder and its subfolders.

So it works like awk, but with full Perl: no need to learn awk syntax, and you can do conditionals, loops and whatnot. I write 30% of my one-time throwaway scripts directly on the command line.

Are you familiar with perl's -n and -p switches?


Those "flt" lines could be written

  perl -lpe 's|\s+| |gsi' file.txt
  find . | perl -ne 'if (-f) { print -s }'
(-l chomps the incoming newlines, and puts them back on the output)

Of course, "perl -ne" is longer than "flt", and I appreciate all this implicit use of $_ is not to everyone's tastes.

Yes, but I bundled slightly better syntactic sugar in my tool. It's the same, in essence.

A more interesting set of commandline utilities to know - http://www.cyberciti.biz/open-source/best-terminal-applicati...

This list is definitely more interesting to me. I discovered a few of these already this year and have been using them a lot (mtr, pv, curl for inspecting headers) and several others that I know I'm going to start messing with immediately (siege, multitail).

Another VERY useful tool I didn't see on this list is iperf. From the Debian package description:

Iperf is a modern alternative for measuring TCP and UDP bandwidth performance, allowing the tuning of various parameters and characteristics.


* Measure bandwidth, packet loss, delay jitter

* Report MSS/MTU size and observed read sizes.

* Support for TCP window size via socket buffers.

* Multi-threaded. Client and server can have multiple simultaneous connections.

* Client can create UDP streams of specified bandwidth.

* Multicast and IPv6 capable.

* Options can be specified with K (kilo-) and M (mega-) suffices.

* Can run for specified time, rather than a set amount of data to transfer.

* Picks the best units for the size of data being reported.

* Server handles multiple connections.

* Print periodic, intermediate bandwidth, jitter, and loss reports at specified intervals.

* Server can be run as a daemon.

* Use representative streams to test out how link layer compression affects your achievable bandwidth.

I use iperf initially when I'm troubleshooting poor file server transfer speeds, for example. There's a pretty Java GUI too if you want that.

Not really a bash utility, but delegating to this script from Vim has been a huge timesaver for long-running commands. For instance, I can trigger my Ruby specs from Vim without Vim getting blocked. I wrote a blog post about it here: http://minhajuddin.com/2012/12/25/run-specs-tests-from-withi...

    #Author: Khaja Minhajuddin
    #Script to run a command in the background, redirecting
    #STDERR and STDOUT to /tmp/runinbg.log

    cmd=$1
    shift
    rawcmd="$cmd $*"

    echo "$(date +%Y-%m-%d:%H:%M:%S): started running $rawcmd" >> /tmp/runinbg.log
    $cmd "$@" 1>> /tmp/runinbg.log 2>&1 &
    #comment out the above line and use the line below to get a notification
    #when the command is complete
    #($cmd "$@" 1>> /tmp/runinbg.log 2>&1; notify-send --urgency=low -i "$([ $? = 0 ] && echo terminal || echo error)" "$rawcmd")&>/dev/null &

So basically the same as in 1985?

Basically. And that's a good thing.

You can also use Ctrl+T to get the status of a running command, instead of faffing about with pids and kill.

That doesn't work for me, but sending USR1 does.

Linux still hasn't picked up 'stty kerninfo'.

"for i in *.mp3" doesn't work with spaces and some other "special" characters in filenames, see here:


The for loop works fine. Performing word splitting/wildcard expansion/etc on a variable will, well, split it into words/expand wildcards/etc, whether it was introduced by a for loop or not.

I am not sure I understand what you mean. Word-splitting is performed by the "for x in y" construct, using the IFS variable, so by default if you have a file called "foo bar.mp4" the command line from the article:

  for i in *.mp4; do ffmpeg -i "$i" "${i%.mp4}.mp3"; done
will result in executing:

  ffmpeg -i foo foo.mp3
  ffmpeg -i bar.mp4 bar.mp3
Which is obviously not what was meant. So it's a good habit to learn to loop over files in a directory in a different way.

Maybe you are confusing this with what happens when you do

    for i in `find -name '*.mp4'`; do # ...
or similar. In that case, the output of `find` is indeed split first, and `for` sees "foo", "bar.mp4", and so on.

Ah yes, you are right, thanks!

Perhaps you have a very broken shell.

  $ function echon { echo $# $*; }                    
  $ >'foo bar.mp4'
  $ for i in *.mp4; do echon "$i" "${i%mp4}mp3"; done 
  2 foo bar.mp4 foo bar.mp3

Combining sort and uniq is pointless, since sort already has a -u flag.

Except for those cases where you want counts (-c), only-unique or only-duplicate lines (-u/-d), skipping fields (-f), case-insensitive duplicate detection (-i), or comparing only a chunk of each line (-w). All but the last two I've used, and now that I've found the others in the man page (GNU uniq) I can replace some silly shell hacks I've relied on over the years. It's like saying that tail is pointless unless you're doing tail -f: you're missing the actual useful functionality by presuming the tool only does what you've done with it in the past.
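A quick tour of those flags on sorted input (GNU uniq):

```shell
printf 'a\na\nb\nc\nc\n' | uniq -c   # prefix each distinct line with its count
printf 'a\na\nb\nc\nc\n' | uniq -d   # only the duplicated lines: a, c
printf 'a\na\nb\nc\nc\n' | uniq -u   # only the unduplicated line: b
printf 'A\na\nb\n'       | uniq -i   # case-insensitive: A and a collapse to one
```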

sort | uniq -c | sort -n

is something I use all the time to get sorted frequency tables.

And, by the way, the two commands are something that could be done in O(n), rather than O(n*log(n)) - but this little procedure is so damn easy to write, that on relatively thin inputs that are less than 20m lines long, I usually just do this.
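The O(n) counting pass is a one-liner in awk, using a hash table instead of sorting the whole input; only the distinct keys (usually far fewer than n lines) still need sorting at the end:

```shell
# Equivalent of `sort | uniq -c | sort -n`, but counting in a single pass.
printf 'b\na\nb\nb\na\nc\n' |
  awk '{count[$0]++} END {for (k in count) print count[k], k}' |
  sort -n
```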

Does anybody know of great guides to master the majority of the useful CLI tools available on Unix and Linux? For me the challenge is that I don't even know I have some of these fantastic solutions readily available. I'd really benefit of knowing they're there in the first place instead of hacking a homebrew solution every time.

Learn Linux the Hard Way talks about them quite a bit; is that the go-to guide in 2012, or can I do better?

http://www.commandlinefu.com/commands/browse is a good resource to at least browse through for inspiration.

I also learned a lot from "The Linux Cookbook" (Second Edition) by Michael Stutz (this might be the first edition online: http://dsl.org/cookbook/cookbook_toc.html).

Got a feeling they ran out of utilities to mention by the end. Even very new users of *nix will likely have heard of and used `find` and `zip`.

Probably dd and df too. But he's listing the commands he found most useful, not just the esoteric ones.

xxd is a pretty nice hexdump substitute that has a reverse mode of operation (-r), turning a hexdump into binary.

From what I recall xxd is actually part of the vim distribution, so it's common, but not everywhere. An approximate standard (coreutils) alternative is `od -x' (although it doesn't include the ASCII readable char column at the right, which can be annoying)
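For instance (plain-hex mode shown here; -r also understands xxd's default dump format):

```shell
printf 'hello' | xxd -p           # plain hex dump: 68656c6c6f
printf '68656c6c6f' | xxd -r -p   # reverse: back to the bytes "hello"
printf 'hello' | od -An -tx1      # coreutils near-equivalent (no reverse mode)
```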

Where's the love for awk? It has been tucked away in a sub item but doesn't deserve first class status?

That whole "find -ls | awk" is wicked slow anyway; try wc and xargs...

  $ time find -ls | awk '{s += $7} END {print s}'

  real	0m27.721s
  user	0m1.256s
  sys	0m1.780s

  $ time find | xargs wc -c 2> /dev/null | tail -1
  604260969 total

  real	0m0.332s
  user	0m0.068s
  sys	0m0.204s

The standard disclaimer on find | xargs: you should use -print0 and -0 to avoid problems with files with whitespace in their names, i.e.

   $ find -print0 | xargs -0 wc -c 2> /dev/null | tail -1
(Also, many uses of find | xargs can be replaced with -exec cmd {} \; or -exec cmd {} +, e.g.

  $ find -exec wc -c {} + 2> /dev/null | tail -1
although this isn't much faster in this case.)

You sure that's not because of memory swapping? Once warmed up the awk command is much faster for me.

Also, the results are different - though I'm too lazy to figure out why right now :)

You need to filter out directory entries.

    find -type f -ls|awk '{s += $7} END {print s}'
    find -type f -print0 | xargs -0 wc -c | tail -1
    find -type f -exec wc -c {} + | tail -1

I am not familiar with awk, what's the 's+= $7'? What is the argument 2 passed to wc? Why is the produced output different? What am I missing here?

"s += $7" means "add the 7th whitespace-separated column (the size, in find -ls output) to the running total s"; the END block then prints the total. And the "2" isn't passed to wc: "2> /dev/null" is shell syntax redirecting stderr (file descriptor 2) to /dev/null.
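A tiny illustration of the same pattern, summing column 3 of some made-up input:

```shell
# awk splits each line on whitespace; $3 is the third field.
printf '1 a 10\n2 b 32\n' | awk '{s += $3} END {print s}'   # -> 42
```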

Makes sense, thanks!

2013 will be the year of awk on the desktop.

Actually there's heaps of love for awk... so much, in fact, that I'd rather spend a whole post on it than "just" make it one item :)

Motivated by your list, I went through my history to pull some awk snowclones http://news.ycombinator.com/item?id=4989524

The second #6 example, the one with xargs, is wrong: xargs(1) doesn't necessarily create a single du process; it may create several.

I was wondering about that, but is it really wrong? From what I found in the "BSD General Commands Manual":

Any arguments specified on the command line are given to utility upon each invocation, followed by some number of the arguments read from the standard input of xargs. The utility is repeatedly executed until standard input is exhausted.


-P maxprocs Parallel mode: run at most maxprocs invocations of utility at once.

The way I interpret that is that you could run xargs in parallel mode, but by default the "utility is repeatedly executed" in the same process.

Even in sequential mode, for a sufficiently long input xargs will invoke the command multiple times to comply with kernel limits on command line length.
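You can make that batching visible by shrinking the batch size with -n (normally xargs packs as many arguments per invocation as ARG_MAX allows):

```shell
# Four arguments, two per invocation -> echo runs twice, printing two lines.
printf 'a\nb\nc\nd\n' | xargs -n 2 echo
```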

If the utility is repeatedly executed then by definition it can't be a single process.

    cat ~/.bash_history | cut -f1 -d' ' | sort | uniq -c | sort -rn | head -10 | cut -b9-

That's not enough: you need to split lines on "|" and ";", and before that on for's "do" and if's "then".

It's not too hard to do (a very basic) implementation of splitting on | and ;, e.g.

  sed 's/ *| */\n/g' ~/.bash_history | cut -f1 -d' ' | ...
(This fails on something like "echo 'a | b'" and doesn't split correctly on |& and || and doesn't split at all on &&.)

sure, and search through your file system to find all scripts written this year, to include those too....

give me a break.

Hey, calm down. I didn't mean to be aggressive; sorry if my comment came across that way.

You can count scripts as commands (I did it in my other comment elsewhere on this page), but the way you do it you will miss a lot. For instance you won't count any "uniq", "sort", … which are almost exclusively used as filters and not as the first command; you will also miss a lot of "less" and "grep", for instance.

