
Linux utils that you might not know - nreece
http://shiroyasha.io/coreutils-that-you-might-not-know.html
======
brettproctor
Anyone here will probably enjoy checking out commandlinefu:
[http://commandlinefu.com](http://commandlinefu.com)

Especially looking down the list of all time greats:
[http://www.commandlinefu.com/commands/browse/sort-by-votes](http://www.commandlinefu.com/commands/browse/sort-by-votes)

~~~
philjr
Going off on a small tangent, but I've never been comfortable with 'sudo !!'
and I can't really articulate why, aside from wanting to be as explicit as
possible.

<up>, <ctrl-a>, type sudo and enter is nearly as quick and much more explicit
for me.
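
For reference, the expansion in question (bash echoes the expanded line
before running it; the user name is a stand-in):

      $ whoami
      user
      $ sudo !!
      sudo whoami
      root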

~~~
scardine
If you are a long-time "vi" user, put `set -o vi` in "~/.bashrc": the sequence
to issue the previous command with sudo prepended is <esc>k0isudo <enter>,
probably something that is in your muscle memory already.

~~~
sevensor
While I'm a very long-time vi user, and I'm completely at home with the
keybindings, I could never get into using "set -o vi". For me it falls into
the uncanny valley of being quite like vi while not actually being vi.

------
Aissen
Just look at anything in the "moreutils" package. Just great stuff. Excerpt:

- ifdata: don't parse the output of ifconfig/ip anymore. Just use this tool.

- sponge: for when you need to overwrite an input file at the end of a pipe.
sponge waits for the pipe to end before overwriting, preventing any data loss.

- vidir: edit a given dir with your EDITOR. Awesome for mass renames/deletes.

- ts: add timestamps to a command's output.

- parallel: a C implementation of GNU parallel (which is written in Perl).
Very small, very fast, just does the core (running commands in parallel), and
does not try to take over xargs.

There are others of course.
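
A quick sketch of sponge and ts in use (the build command in the second
pipeline is a stand-in):

      # sponge: "sort file.txt > file.txt" would truncate the input before
      # sort could read it; sponge soaks up all of stdin, then writes the file
      sort file.txt | sponge file.txt

      # ts: prefix each line of output with a timestamp
      some_long_build 2>&1 | ts '%H:%M:%S'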

~~~
vog
Although these tools are great, I'm afraid they're of limited use, as they
aren't preinstalled but must be installed explicitly.

For personal use these are great. But if you write a shell script to be
executed on different machines where you can't install anything, these tools
won't be available to you.

And if you don't have that requirement, i.e. you can install everything, just
install appropriate Perl/Python/Ruby libraries and use a proper scripting
language.

(BTW, I tend to assume that Python is installed on almost all modern Unix
systems by default, hence I prefer writing portable tools in Python instead of
portable Shell code.)

~~~
angry_octet
Why assume Python? It's huge compared to bash or perl. Lots of minimally
configured operational systems won't have Python. It's more likely to be
present than go/ruby/node, though.

~~~
chrisper
You don't need to install Go. The runtime is built into the compiled
executable.

~~~
vog
Same with C. However, the binaries are then architecture dependent, so not
really portable either.

Nevertheless, statically linked binaries are indeed a good alternative, and
provide good-enough portability in a wide range of cases.

------
MichaelBurge
* 'comm -3' is a quick way to do a set difference from the command line (both inputs must be sorted; see the sketch after this list).

* Newer versions of sort have an option for running sorts in parallel. If you're using an older sort, you can split the files, sort them individually with GNU parallel, and use --merge to combine them.

* If you have scripts that read data files, process them, and output more files, consider using a Makefile.

* tmux is a good way to leave a development session on a server that you come back to at the start of the day, or that you run long-running processes in.

* Setting a soft ulimit system-wide is a good way to avoid accidentally running the machine out of memory, which causes your C libraries to be paged out to disk and usually requires a reboot. Since it's soft, you can override it at any time.

* strace and gdb can be run on just about anything to see what it's doing. If it's an interpreted language, you can often attach to a live process, call a function to invoke some interpreted code, and not have to kill & restart the process to fix a bug or oversight. Or you can attach to a Postgres worker process to get a stack trace, which often tells you why your query is slow if it runs so long you can't get an explain analyze.

* Look in /proc to find tons of useful information about running processes and settings. For example, you can tell how many filehandles a process has open, and often which files they correspond to. Just yesterday, a grub update caused a server to boot the kernel with its command line replaced by 2 random characters and a newline, which would have been harder to debug without /proc/cmdline.

* pushd and popd are useful in shell scripts to temporarily change directories without forgetting where you were.
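
A minimal sketch of the comm and /proc tips (a.txt and b.txt are stand-ins):

      # set difference: column 1 = lines only in a.txt, column 2 = only in b.txt
      comm -3 <(sort a.txt) <(sort b.txt)

      # one-sided difference: lines in a.txt but not in b.txt
      comm -23 <(sort a.txt) <(sort b.txt)

      # /proc: which file descriptors does the current shell have open?
      ls -l /proc/$$/fd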

~~~
falsedan
> _If you have scripts that read data files, process them, and output more
> files, consider using a Makefile._

Dear sweet hypnotoad, don't use Makefiles for this. Scripts in a pipeline are
perfectly well suited for ETL. They have the advantage of using the same
language as the command language (Makefile is not shell, and when it differs
it's a significant surprise). Plus, you can drop a script into a dir in your
PATH and it will work in any project.

Use make if you have a project which expensively computes assets and reusing
partial outputs is possible.

> _pushd and popd are useful in shell scripts to temporarily change
> directories without forgetting where you were._

I would start with

    
    
      cd - # go back to where I just came from
    

Which handles most of the pushd/popd use cases (outside of writing a script).

Really, there's a whole separate set of skills for effective interactive shell
use & writing great bash scripts (e.g. parameter expansion and history
manipulation).

~~~
kingosticks
I'd stick with pushd and popd every time; they're considerably more expressive.

~~~
falsedan
> _considerably more expressive_

is that an advantage? Do you have the time to explain this a bit more?

I feel that new users need less expressiveness, to avoid decision overload,
and keeping one automatic directory save point is easier to mentally manage
than a stack of them.

I do recommend using pushd/popd in shell scripts (always) and interactively
(if you must), but I think 'cd -' should be the first thing you introduce to
newcomers w.r.t. tracking working directory changes.

~~~
godd2
'cd -' only saves one previous directory, which is held in $OLDPWD. pushd can
store an arbitrary history in its stack. And of course, $OLDPWD will change if
you hop around after a cd, but the pushd/popd stack will persist.
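
A sketch of the difference:

      cd /etc; cd /var/log
      cd -            # back to /etc ($OLDPWD)
      cd -            # ...and back to /var/log: only one level of history
      pushd /tmp      # save /var/log on the stack and go to /tmp
      pushd /usr      # save /tmp as well
      dirs -v         # inspect the whole stack
      popd; popd      # unwind via /tmp back to /var/log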

~~~
falsedan
That's true, but I would rather have heard how the expressiveness of
pushd/popd is superior to cd - (especially with respect to a newcomer
internalizing all the weird coreutils & bash things).

------
majewsky
Since the article mentions numfmt(1), I'm surprised that no one mentioned
units(1) yet. I use it all the time to convert units.

    
    
      $ units '4123412312312 bytes' 'tebibytes'
              4123412312312 bytes = 3.7502217 tebibytes
              4123412312312 bytes = (1 / 0.26665091) tebibytes
      $ units "50 miles per gallon" "liters per 100 kilometers"
              reciprocal conversion
        1 / 50 miles per gallon = 4.7042917 liters per 100 kilometers
              1 / 50 miles per gallon = (1 / 0.21257185) liters per 100 kilometers

~~~
spleeder
I use it as a general purpose calculator.

------
giis
There is also another command, 'paste'.

Example: create two files (1.txt and 2.txt):

      [efg@fedori ~]$ cat 1.txt
      file1: line1
      file1: line2
      [efg@fedori ~]$ cat 2.txt
      file2: line1
      file2: line2

To merge the lines of these two files we can use paste:

      [efg@fedori ~]$ paste 1.txt 2.txt
      file1: line1    file2: line1
      file1: line2    file2: line2

Also check out 'man fold'.

~~~
mynewtb
Also super handy for turning lines into columns: 'paste - - -', for example,
will take every three lines as a single row.
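
For example:

      $ seq 6 | paste - - -
      1       2       3
      4       5       6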

------
j_s
Tools like rdfind and rmlint breathed new life into my SSD by hardlinking
duplicate files after video editing software helpfully duplicated huge videos
on import.

[https://rdfind.pauldreik.se/](https://rdfind.pauldreik.se/)

[https://github.com/sahib/rmlint](https://github.com/sahib/rmlint)

There are great GUI visualizers for disk space, like WizTree/Baobab; for the
command line I've had success with ncdu.

[https://dev.yorhel.nl/ncdu](https://dev.yorhel.nl/ncdu)

------
cybersol
Even as a decades-long 'nix user, I had never used join, which joins lines of
two files on a common field, until recently. I used to turn to fgrep or a
Python script to solve a whole class of problems that join can help solve
right on the command line.
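
For anyone who hasn't met it, a minimal example (both files must be sorted on
the join field; the file names are made up):

      $ cat ids.txt
      1 alice
      2 bob
      $ cat scores.txt
      1 90
      2 85
      $ join ids.txt scores.txt
      1 alice 90
      2 bob 85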

~~~
minhajuddin
I recently got to use `join`. However, without reading the documentation on
outer joins, I went ahead and wrote my own version:

    
    
      join -t$'\t' <(cat c <(comm -13 <(cut -f1 c)  <(cut -f1 d) | sed -e 's/$/\t/') | sort -k1,1) <(cat d <(comm -23 <(cut -f1 c)  <(cut -f1 d) | sed -e 's/$/\t/') | sort -k1,1)
    

Then I stumbled on an easier version using just join:

    
    
      join -a1 -a2 -o auto f1 f2
    

There are really too many unexplored things on the Linux command line for a
typical dev.

------
squarefoot
Mgen and Drec. No idea if they have been included in any modern distro, or
whether there are more modern equivalents. Anyway, over 15 years ago I used
them on Alpha hardware under Linux, BSD and Tru64 to test a local network for
occasional missing UDP packets that recurred in a strangely regular pattern.
By missing packets I mean figures like 5 packets every 3 million at full
speed; it wasn't something one could hope to reproduce just by pinging around,
so we needed something we could leave running during night hours, writing
huge logs. It turned out to be an interrupt triggered by the video subsystem
on the Alpha machines, only under Tru64, that grabbed 100% of the CPU for a
very short time, but long enough to make the receiving task miss a few
packets. Once I had identified the problem I was able to reproduce and predict
it with 100% accuracy.

[https://www.nrl.navy.mil/itd/ncs/products/mgen](https://www.nrl.navy.mil/itd/ncs/products/mgen)

ps. ...I know, I know, but sadly no way to make them change the protocol used.

------
Tepix
"cal -3" is quite useful, I use it all the time. To show the week-of-year use
"ncal -w". Add "-S" if you want ncal to start weeks on Sunday (crazy!).

~~~
bartl
Use `cal -m` if you want a week to start on a Monday. AFAIK Sunday is the (US
centric) default.

~~~
bratch
`cal` does try to use the expected first day of the week based on the locale.
It seems to at least use LC_TIME, judging from some quick local testing:

    
    
      bratch@serenity ~ $ LC_TIME=en_GB cal | head -n2
            May 2017      
      Mo Tu We Th Fr Sa Su
      
      bratch@serenity ~ $ LC_TIME=en_US cal | head -n2
            May 2017      
      Su Mo Tu We Th Fr Sa

------
db48x
for extra fun try watch -t -n 1 "factor \$(date +%s)"

~~~
qznc

        alias primetime='watch -t -n 1 "factor \$(date +%s)"'

~~~
db48x
Now that's a good alias

------
smitherfield
One note: I believe `shred` won't work as expected on a journaling file system
like ext4.

~~~
Scaevolus
No. Ext4 doesn't do data journaling by default. Even when it's enabled, what's
written to the journal is the _new data_ that's about to be written (e.g.
zeros), not the current file's contents.

    
    
        1) write to journal "I'm going to overwrite this file with this data (zeros)"
        2) commit journal
        3) write data to file
    

This is typical for journaling filesystems-- step 3 can be interrupted by a
crash and replayed later (by re-reading the journal).

For filesystems with CoW data (ZFS, btrfs), the in-place data will probably
not be overwritten.

~~~
LukeShu
Before evaluating the claims myself: The shred manual specifically claims that
it is "not guaranteed to be effective" on "log-structured or journaled file
systems", and specifically calls out ext3 in data=journal mode.

I would assume that the concern with ext3/4 in data=journal mode is that shred
does not guarantee that the records of previous writes are evicted from the
journal.

~~~
dom0
In data=journal mode, data to be written is first written into the journal.
Only after the journal is flushed will it be written out to the correct
location. Therefore, a crash at any time is fixed by replaying the journal
forwards.

Note that the ext3/4 journal is a redo log, not an undo log. Old file contents
are not copied into the journal on a write.

Thus, I don't see why shred should be less effective in data=journal mode
compared to the other journaling modes.

CoW file systems are a different story. They don't allow you to overwrite
physical file contents. You have to set the +C (FS_NOCOW_FL) flag, which is,
by principle, _only effective for a file that does not have any contents
yet_. Thus, you can't set +C on an existing file and overwrite its contents.

~~~
angry_octet
There is no reason for blocks that have been fully rewritten to be written
back to the same location. In fact, it is faster to write them somewhere
convenient near the write head and update the indirect block. So even though
only the metadata goes through the log, block locations can change.

~~~
adrianmonk
I don't know that I'd say there's no reason at all.

For one thing, if you have a contiguous file and you update some (but not all)
bytes, putting them back in the original location allows the file to stay
contiguous.

Also, if you write the data back into the original location, you don't have to
update metadata such as inodes. Now, you may say that's less data, but on a
spinning disc, there is some threshold below which the amount of data written
doesn't matter much at all and it's the number of seeks that matters more.
That is, if it's a choice between a single 50k continuous write and two
separate 1k writes in different locations, the single write is probably
quicker. (But this falls apart eventually, of course.)

Whether these reasons are enough to prefer updating in place is another
question, of course. But it's not like there isn't any benefit at all.

------
jeroenjanssens

        $ curl -s http://datascienceatthecommandline.com | pup '.sect3 > h3 text{}' | head
        alias
        awk
        aws
        bash
        bc
        bigmler
        body
        cat
        cd
        chmod

~~~
aktau
TIL about pup(1) [1], what a fantastic tool to grab some data from an HTML
page. Written in Go too, so easy to deploy. That one's going in my toolchest.

[1]: [https://github.com/ericchiang/pup](https://github.com/ericchiang/pup)

------
dkns
My favorite: basename

Returns just the filename plus extension, e.g.

basename a/b/c/d/foo.php returns foo.php

Very useful for shell scripting.

~~~
fpoling
With shell one can just use "${path_var##*/}" to get the base name from the
string stored in $path_var. This also works if $path_var is just a base name
already without any slashes.
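
For reference, the neighbouring expansions (a quick sketch):

      path_var=a/b/c/d/foo.php
      echo "${path_var##*/}"   # foo.php     (like basename)
      echo "${path_var%/*}"    # a/b/c/d     (like dirname)
      echo "${path_var%.*}"    # a/b/c/d/foo (strip the extension)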

EDIT: fixed the substitution.

~~~
josefx
> With shell one can just use "${path_var##*/}"

This is the reason I use Python whenever I need something "shell"-like:
cryptic symbol salad that makes code golf fanatics green with envy.

------
RijilV
I was really happy about tac when I found out about it, same with paste and
comm as already mentioned.

I've gotten a lot of mileage out of the various seldom-used options of uniq,
diff and cut too.

~~~
VSpike
I have a friend who uses `tac | tac` in pipelines to make the pipeline wait
for all the data before continuing, which is an interesting hack: tac can't
emit its first output line until it has read its last input line, so the pipe
blocks until the upstream command finishes.

~~~
cnvogel
If you install "moreutils", there's sponge, which does the same.

[https://joeyh.name/code/moreutils/](https://joeyh.name/code/moreutils/)
[https://linux.die.net/man/1/sponge](https://linux.die.net/man/1/sponge)

------
tduk
Don't forget the often overlooked "apropos"!

~~~
dwe3000
My problem with apropos (and man, for that matter) is related to how at least
one major distribution handles man pages. Rather than bundling each man page
with its application, the man pages for many applications are bundled together
in one base package. Since this is done regardless of whether or not the
application is actually on the system, apropos and man continually return
information on applications which are not on the system.

------
nbanks
Although I frequently use cal, the article mentioned cal -3 which will be
useful near the end of each month. Previously I always displayed the calendar
for the whole year or next month instead.

Going through the coreutils manual [1], I found a couple of other
useful-looking commands I didn't know about. Previously I used awk in bash
scripts primarily for printf, but apparently printf is part of coreutils too.
The timeout command also looks useful.

1\.
[https://www.gnu.org/software/coreutils/manual/coreutils.html](https://www.gnu.org/software/coreutils/manual/coreutils.html)

------
AstralStorm
And the column example has the fan favourite useless use of cat. [0]

[0] [http://porkmail.org/era/unix/award.html](http://porkmail.org/era/unix/award.html)

------
hellomynameis12
GNU utils*

------
xkxx
I like how this article's title doesn't claim that it's about utils "that you
don't know". Yet I was expecting utils that have something to do with the
Linux kernel, not generic *nix command-line tools.

------
throwaway2048
It should be noted that the factor program on most systems is not accurate!

[https://marc.info/?l=openbsd-cvs&m=147272695616766&w=2](https://marc.info/?l=openbsd-cvs&m=147272695616766&w=2)

[https://marc.info/?l=openbsd-cvs&m=146826185625800&w=2](https://marc.info/?l=openbsd-cvs&m=146826185625800&w=2)

Hey, it's part of bsdGAMES, not bsdGETSHITDONE :)

------
everybodyknows
Which xyz-file buried in the code base has that "abc" function I saw last
month?

locate /\*xyzGlobPattern | xargs grep -E '\<abc[(]'

~~~
vdm
ag -G [file pattern] [text pattern]

------
_pmf_
My favorite unexpected tool: tsort for performing topological sorting (i.e.
sorting a dependency graph by order of dependencies).
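
A tiny example; each input pair means "left must come before right":

      $ printf 'libc app\nkernel libc\n' | tsort
      kernel
      libc
      app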

------
hprotagonist
afaik, `shred` doesn't do anything useful on SSDs. Beneath the OS, the drive's
wear leveling introduces intentional indeterminacy in sector access, so
instructions to write over the same sectors multiple times with garbage aren't
guaranteed to actually do that.

~~~
sherincall
The reason shred writes "multiple times with garbage" is due to magnetic disks
where even when overwritten, old bit values could still be extracted with
advanced tools. This is not the case for SSDs, so a single pass of shred
(either random or zero) is enough here. Multiple passes will just kill your
SSD faster.

The problem with sector access still stands, so you're not sure it's actually
overwritten. You should use sfill(1) in addition to shred. sdmem(1) also, if
you don't want to physically power down your machine.

~~~
DanBC
> _The reason shred writes "multiple times with garbage" is due to magnetic
> disks where even when overwritten, old bit values could still be extracted
> with advanced tools._

People _thought_ that you needed multiple overwrites for traditional drives,
but in reality the data was gone after a single overwrite of 0. There were no
"advanced techniques" that could recover the data. Some specs suggested
multiple passes, but this was because they were applying the precautionary
principle.

> This is not the case for SSDs, so a single pass of shred (either random or
> zero) is enough here. Multiple passes will just kill your SSD faster.

SSDs do weird things with the data, so if your data is important enough you
should destroy the drives. There's some data tucked away in odd places.

~~~
marcosdumay
Hum, no. Some time ago you needed multiple uncorrelated overwrites to hide
your data. Then, as HDDs shrank, you started needing fewer and fewer passes,
until today, when you just need to clear it once.

It's not about people believing unrealistic things. The command was created
to defend against a demonstrated attack.

~~~
hprotagonist
Yeah. You'd be amazed what you can do with a scanning electron microscope.
(i.e., read individual bit states off after a rewrite.)

~~~
DanBC
Someone would have done it by now if it was possible, so where are the
published papers?

It's not possible, and it's never been possible unless we're talking about
1970s 24" platters, and we're not talking about those.

~~~
hprotagonist
[http://escholarship.org/uc/item/26g4p84b#page-5](http://escholarship.org/uc/item/26g4p84b#page-5)

------
ivanstojic
It always pains me when I see people use "cat" left and right, even when they
don't need it.

This makes for good reading:
[http://porkmail.org/era/unix/award.html](http://porkmail.org/era/unix/award.html)

~~~
noja
It shouldn't. It's 2017 now, and the speed penalty of a "misplaced" cat is
normally negligible. The speed of the terminal user is more important.

~~~
AstralStorm
It is slower to type too. Just write the command with the file name already.

~~~
anc84
If you are just doing one line and you write it perfectly, then maybe. For
many people cat is simply the start of excessive piping. I like having the
filename as close to the beginning as possible so I can Ctrl-W with ease.

------
sigi45
Show a sorted list with human-readable file/folder sizes:

du -hs * | sort -h

~~~
effie
That shows only some files. To list all files, including dotfiles, use:

du -hs * .* | sort -h

~~~
cyphar
Or if you (like me) hate depending on globbing semantics:

$ find . -mindepth 1 -maxdepth 2 -print0 | xargs -0 -- du -hs

Not that I would write it in a shell. ;)

------
alex_duf
I really like the command "join" as well

------
lathiat
I needed column yesterday. TIL

Also paste
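
e.g. 'column -t' to align delimited data:

      $ printf 'name,qty\napples,12\nbananas,7\n' | column -t -s,
      name     qty
      apples   12
      bananas  7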

------
umanwizard
As much as people make fun of him for caring so much about a seemingly minor
point, I've come around to understanding RMS's frustration about people
calling all of the GNU project "Linux". This sort of title drives home why:
neither Linus nor anyone else on the Linux project had anything to do with any
of this.

~~~
kemitche
If someone wrote an article titled "Windows utils that you might not know," I
would not expect its content to be solely applications written by Microsoft.

~~~
flukus
With WSL (and cygwin) this could be windows utils you may not know.

------
vertebrate
Why does one use `font-weight: 200` for code?

~~~
pacaro
It's to keep the page weight balanced. Prose is considered to have neutral
buoyancy, but code is naturally heavier and will tend to drag the page down if
that isn't compensated by a lighter font. FWIW poetry can be so light, that it
can require a very heavy font weight to keep it tethered to the page.

~~~
vdm
<tips hat>

------
jurgemaister
To shreds, you say?

------
sabujp
factor is buggy:

      $ factor 100000000000000000000000000000000000000
      100000000000000000000000000000000000000: 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
      2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
      5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5

[http://www.wolframalpha.com/input/?i=factor+1000000000000000...](http://www.wolframalpha.com/input/?i=factor+100000000000000000000000000000000000000&dataset=)

also a numfmt bug:

      $ numfmt --to=iec 999999999999999999939999999
      828Y
      $ numfmt --to=iec 999999999999999999940000000
      numfmt: value too large to be printed: '1e+27' (cannot handle values > 999Y)

but 999999999999999999939999999 is ~1000 YB:
[http://www.wolframalpha.com/input/?dataset=&i=99999999999999...](http://www.wolframalpha.com/input/?dataset=&i=999999999999999999939999999+bytes+to+yottabytes)

~~~
barrkel
You got the same answer from factor and Wolfram alpha; where is the bug?

~~~
sabujp
wa shows 76 prime factors, which is not the same as printing out a bunch of
2's and 5's

~~~
thehobgoblin
WolframAlpha lists the factors as 2^38 * 5^38, which is just a compact
notation for 38 2s and 38 5s, totalling 76 factors. If you count the 2s and 5s
in the output of factor, you get the same result of 38 each for a total of 76
factors.

