
The hard part in becoming a command line wizard - ColinWright
https://www.johndcook.com/blog/2019/02/18/command-line-wizard/
======
l_t
I like the "everything is text" mentality, but really, the "hard part" of the
GNU/Linux CLI is getting things done _when you don't already know how_.

If you've memorized the hundreds of different invocation arguments for find,
sed, grep, tr, sort, uniq, cat, xargs, etc. -- then yes, you can be very fast.
If not -- be prepared to RTFM for about 4 different programs to get your job
done.

That's an incredible hurdle and it really only makes sense if you are using
these tools daily/weekly. I'm still not completely convinced that it's "worth
it" to master these tools, when general-purpose scripting languages are
_almost_ as succinct and obviously way more powerful. That's speaking as
someone who's written a fair amount of Bash.

In contrast, I think tools with higher discoverability and self-documenting
nature (arguably, Excel) usually offer a more user-friendly way of doing the
same thing, unless you are doing it _a lot_.

~~~
pm215
Yes, most of us would probably have to check the manpage for three or four of
these utility programs to put together something like Doug McIlroy's one-liner
from the article. I think this is clearly still more efficient than writing 17
pages worth of code in a general purpose language, which is what it's being
pitted against. I also don't think it's really a very high hurdle -- if I were
writing the same thing in a Python or Perl script I bet I would end up also
consulting the documentation to check exactly how to do things or what some
detail of syntax is. If you happen to write Python all day every day then
sure, you'll find it much faster to just write a quick Python script for this
sort of thing. If your day job is C or Java or some other non-scripting
language and this is a once-a-month-or-so task then I think the time required
to do this sort of small utility task in shell-and-utility-tools is not really
any different to remembering how to write Perl scripts.

~~~
jamesgeck0
The 17 pages comment gets me. Literate programming is verbose by design. I'd
be surprised if the Ruby version of that script was much longer than the
command line version.

------
lisper
The Lisp philosophy is very similar to the unix philosophy, but with one
crucial modification: instead of "everything is text" the mantra becomes
"everything is an S-expression".

The reason s-expressions are better than text is that they are more
expressive. They allow arbitrary levels of nesting whereas text doesn't. Text
naturally develops a record structure with only two levels of hierarchy:
records separated by newlines, containing fields separated by some other
delimiter (usually spaces, tabs or commas, sometimes pipes, rarely anything
else). Going any deeper than that is not "native" for text. That's why, say,
parsing HTML is "hard" for the unix mindset. But a Lisper naturally sees HTML
(and XML and JSON and, well, just about everything) as just an S-expression
with a more cumbersome syntax.

~~~
hjk05
> The reason s-expressions are better than text is that they are more
> expressive

That has got to be a joke, right? S-expressions are a subset of text; it’s
axiomatically impossible for them to be more expressive.

~~~
kbutler
Sugar is better than sooty water. They are both C, H, O, but the sugar has
useful structure.

s-expressions are structured text. Therefore, for some set of problems,
s-expressions are "better" (contain more information, are more easily
manipulated, etc.) than unstructured text.

~~~
hjk05
Please provide an example of an s-expression that provides more information
than can be encoded through a text string that is the same length, and in fact
identical to said s-expression.

I mean, I get it. Lispers like Lisp, and think it’s “better”. But it’s not
more expressive than text, and in fact operating on text gives you natural
access not just to s-expressions but to any other programming language or
structured and unstructured examples of text.

If your shell operated only on s-expressions, that would either reduce
expressiveness, or require tricks to encode the text stream as an
s-expression, which hardly qualifies as making it more expressive; that's just
adding extra redundant information around the original information.

~~~
Twisol
> Please provide an example of an s-expression that provides more information
> than can be encoded though a text string that is the same length, and in
> fact identical to said s-string.

We're talking about a class of strings, not a single individual string.
S-expressions are more constrained than arbitrary text, and therefore contain
more structure than arbitrary text. Constraints _are_ information.

You can't simply impose a tree structure on any arbitrary piece of text.

------
wingerlang
For me, writing clever one-liners is very much an iterative process. In the
example given I'd find (from memory or Google) the program to spit out each
word, then find how to count them, somehow sort it. In the end you might have
a multi-line sequence of commands, then you gradually move them to pipes. It
is rarely written in one swoop anyway.
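For example, building up McIlroy-style word counting might go something like
this (a sketch against a throwaway sample file, assuming GNU coreutils):

```shell
# a tiny sample file to iterate against
printf 'the cat and the dog and the bird\n' > /tmp/sample.txt

# step 1: spit out each word on its own line
tr -s ' ' '\n' < /tmp/sample.txt

# step 2: count them (uniq -c needs sorted input)
tr -s ' ' '\n' < /tmp/sample.txt | sort | uniq -c

# step 3: somehow sort it -- by count, descending, keeping the top entry
tr -s ' ' '\n' < /tmp/sample.txt | sort | uniq -c | sort -rn | head -n 1
```

Each step gets checked on its own before being bolted onto the pipe.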

------
Tepix
If learning all that stuff is too hard for you, follow this advice from
Dilbert:
[https://dilbert.com/strip/2010-05-16](https://dilbert.com/strip/2010-05-16)

:-)

~~~
ghostbrainalpha
I used to LOVE Dilbert. His comics are so funny.

I didn't mind the Trump stuff. I think Scott Adams's insights were valuable
even if his comments were extremely frustrating and contradictory.

But his somewhat new Climate Change denial stuff finally made me unfollow him.

[https://twitter.com/ScottAdamsSays/status/109968558214017024...](https://twitter.com/ScottAdamsSays/status/1099685582140170240)

~~~
merlincorey
> But his somewhat new Climate Change denial stuff finally made me unfollow
> him.

I don't follow Scott Adams, but my understanding from when he started talking
about that subject is that he is not denying climate change, but he is acting
as a climate skeptic in order to bring about debate and enlighten people on
all sides.

For a (possibly bad and certainly trite) metaphor, it's somewhat like saying
"Gravity is settled science" to a gravity denialist. You could instead provide
some facts and experiments they could try such as holding their hands above
their head to fight the force[0] of gravity. They could decide they don't
think your facts are correct and perhaps theorize other reasons why they
cannot hold their hands above their head indefinitely.

That's science, where even gravity is "just" a theory[1].

[0] Apparently in the Theory of Relativity, gravity is not actually a force,
but a consequence of the curvature of space-time and the uneven distribution
of mass

[1]
[https://en.wikipedia.org/wiki/Gravity#Modern_alternative_the...](https://en.wikipedia.org/wiki/Gravity#Modern_alternative_theories)

~~~
dontbenebby
> _I don't follow Scott Adams, but my understanding from when he started
> talking about that subject is that he is not denying climate change_

We don't need debate on climate change. There are not "sides". (Unless you
count reality and magical thinking as "sides".)

~~~
edmundsauto
We absolutely need debate and skeptics on climate change. Debate doesn't mean
adversarial; it means questioning the evidence and possible interventions.

If we aren't debating on climate change, how do we decide where to put
resources to reduce emissions and clean up existing pollution?

~~~
dontbenebby
>Debate doesn't mean adversarial; it means questioning the evidence

Questioning the evidence of climate change is de facto adversarial. The
evidence is overwhelming and beyond question. No intelligent, sane person
arguing in good faith would argue otherwise.

~~~
edmundsauto
Your derision and perspective are more appropriate for a religious zealot than
a scientific thinker. Skepticism is the heart of science.

I have not invested enough time in understanding the currently best climate
models. I must, therefore, take a skeptical approach and question conclusions
drawn from that evidence. Especially so for a projection.

For the level of assurance you seem to have, I assume that you have rerun
models yourself, have a deep understanding of complex systems and statistics,
and done other climatic work? Even that is skepticism applied and questions
asked.

------
redka
I never really got to be a command line wizard even after 7 years of using it
daily. After some time it struck me that I am somewhat of a wizard in Ruby and
most of the things that I need to do in the terminal would be really easy for
me if I could make use of my Ruby skills in a more natural way. I came up with
rb [1], which made this possible, and I no longer read the man pages of
simple unix tools.

[1] [https://github.com/thisredone/rb](https://github.com/thisredone/rb)

~~~
commandlinefan
I'm about the closest to a command-line wizard that I know personally, but the
only reason is that I grew up on the command-line - I had already graduated
college before GUIs were common. I'm not entirely convinced that, in today's
era, spending the time it takes to really master the command line is honestly
worth it; you need to know how to do some things, but there are probably
better tools for most of these jobs. A lot of what I do naturally with sed,
awk, cut, paste, uniq, sort and tr could be done just as quickly and naturally
in Excel; I just use the command line because I already know it.

------
xg15
> _The output of a shell command is text. Programs are text. Once you get into
> the necessary mindset, everything is text._

This mindset is also how we got the C preprocessor...

~~~
gowld
Programs are _structured_ text. The juxtaposition is against binary structures
(Windows style) that require an intermediary to perform translation for
humans.

The C-preprocessor travesty is text, but hygienic macros are also text. The
difference is structure.

------
jodrellblank
And the best bit about PowerShell is that you don't have to force everything
into text, and then work out how to parse it back out of text.

You are in a scripting language REPL where you can have a hashtable of words
and their counts, and keep it structured:

        gc words.txt |% { -split $_ } | group | sort count -desc | select -first 5

and when I saw the first one was spaces:

        gc words.txt |% { -split $_ -match '^\w+$' } | group | sort count -desc | select -first 5

But the point isn't that you can do it, or any comment on the length of it,
the point is that the output of group (group-object) is structured - the
counts are [int] types, the sort is able to pick those out and sort on them
without needing to parse them out of text and convert them to numbers, the
select (select-object) isn't outputting individual lines of text, so the
structure isn't lost when shown on screen, and the output can be formatted for
display as columns, lists, tables, from its structure.

------
gerbilly
I tried it on the text of the article itself.

Here's my cmdline (each stage annotated):

    # cat file
    #   | preserve only spaces and letters (remove punctuation)
    #   | convert to lowercase and put each word on one line
    #   | use awk maps to keep a count for each word and print map in two columns, count followed by word
    #   | sort in reverse numerically by count column, then by word ascending
    #   | print top 10 words by frequency

    cat input.txt | sed -e 's/[^a-zA-Z ]//g' | tr -s ' A-Z' '\na-z' | awk -F ' ' '{ if($1 in count) { count[$1]=count[$1]+1 } else { count[$1] = 1 } } END { for(k in count) { printf("%d\t%s\n", count[k],k) } }' | sort -k1,1nr -k2,2 | sed -n -e '1,10 p'

Output:

    27 the
    21 to
    15 a
    11 is
    10 of
    10 that
    9 but
    8 text
    7 was
    6 and

~~~
tln
Here's my version using a different "command line": Chrome devtools.

    Object.entries(document.body.innerText.replace(/[^a-zA-Z\s]/g, '').toLowerCase().split(/\s+/).reduce((o, word) => {o[word] = (o[word] || 0) + 1; return o}, {})).sort((a,b) => b[1] - a[1]).slice(0, 10)

Plain JS is missing some niceties, but being able to crunch some data from a
website using the same iterative approach as a shell command line has its
place!

~~~
pcnix
This is pretty cool. I haven't really used the console for something like this
in a while, and it's a good reminder of its power.

------
blunte
To me it's a similar principle to REPL-based development: you compose or chain
tools, testing each tool (or deciding between tools) at each step.

As with most programming, there are usually many ways to solve a problem.
Sometimes the different approaches are equally good, but sometimes the
approaches are more or less ideal given the constraints.

There's no real magic, and it's really not difficult. To me, problem solving
like this, and ending up with a "script" (just a chain of piped commands with
some args) is often the easiest way to solve problems.

~~~
busterarm
I tend to agree here, as someone who is both a command line wizard and a
REPL-based developer.

In fact, the majority of my Ruby scripts for systems work are combinations of
the language and %x'd unix commands. I find it's incredibly powerful.

------
pjc50
For more on Knuth, see [https://franklinchen.com/blog/2011/12/08/revisiting-
knuth-an...](https://franklinchen.com/blog/2011/12/08/revisiting-knuth-and-
mcilroys-word-count-programs/) and also a C version in
[http://www.cs.upc.edu/~eipec/pdf/p583-van_wyk.pdf](http://www.cs.upc.edu/~eipec/pdf/p583-van_wyk.pdf)

I think the main thing that stands out to me here is just how long ago it was;
Knuth was writing just past the assembler era. None of the modern
infrastructure was available, such as languages with builtin dictionaries like
Perl and Python.

However, software compositionality is one of those things that's been tried in
so many not-quite-successful ways. Pipes? Objects? Interprocess object systems
(COM/OLE) or object brokers (CORBA)? RPC? Microservices? Packages? Object
pipes (powershell)?

------
scrooched_moose
One of my more enjoyable side projects was improving parsing of some truly
horrendous log files - massive and poorly formatted - from ~2 hours to ~2
minutes.

The code I was given was several hundred lines of python. All I did was
replace all but the final parsing function with a single "grep | sed | awk |
grep | uniq | sort | grep" line and called it a day. I'm sure I could have
taken it farther, but for the amount of effort it was a fun little exercise.
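For flavour, a pipeline of that shape might look like this -- purely a
hypothetical sketch, since the real log format isn't shown; the fields and
file names here are invented:

```shell
# stand-in for the real (massive, poorly formatted) logs
cat > /tmp/app.log <<'EOF'
2019-02-18 12:00:01 INFO  worker=7 job=resize status=ok
2019-02-18 12:00:02 ERROR worker=3 job=upload status=timeout
2019-02-18 12:00:03 INFO  worker=7 job=resize status=ok
2019-02-18 12:00:04 ERROR worker=3 job=upload status=timeout
EOF

# keep errors | strip timestamps | pull the job field | count | rank
grep ERROR /tmp/app.log \
  | sed -E 's/^[0-9-]+ [0-9:]+ //' \
  | awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^job=/) print $i }' \
  | sort | uniq -c | sort -rn
```

The point being that each stage is one tested idea, not hundreds of lines of
parsing code.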

------
projectramo
I can't beat this for terseness, but I think I could whip up something close
(say 15 lines) in Python (readable) or Perl (shorter than Python, but still
more readable than the shell version).

I guess I am more curious about what Knuth did! Did he just comment it a lot
for "literate" code?

As always I feel the Python (or Ruby if I could write it) is the sweet spot
between being easy to read and easy to write.

~~~
svat
Knuth was writing in Pascal, a language that had many limitations (e.g. it
didn't even have a proper string type; "array of 15 characters" and "array of
16 characters" were distinct types) -- see Kernighan's article "Why Pascal is
not my favourite programming language" from about the same time. So he had to
implement a lot of the primitives in the program, and also more complicated
data structures like a hash trie in this case, and the point was to
demonstrate how to do that without the program becoming unreadable. I've been
reading some of Knuth's Pascal programs recently and writing down notes for
myself; I ended up discussing some of these points here:
[https://shreevatsa.github.io/tex/program/pooltype/#other-
pro...](https://shreevatsa.github.io/tex/program/pooltype/#other-problems-
with-pascal)

(A slightly better comparison may be with the source code of the Unix programs
(tr, wc, etc), rather than with shell pipelines composing those programs.)

Read the original sources for more:

[https://www.cs.tufts.edu/~nr/cs257/archive/don-
knuth/pearls-...](https://www.cs.tufts.edu/~nr/cs257/archive/don-
knuth/pearls-1.pdf)

[https://www.cs.tufts.edu/~nr/cs257/archive/don-
knuth/pearls-...](https://www.cs.tufts.edu/~nr/cs257/archive/don-
knuth/pearls-2.pdf)

[http://www.cs.upc.edu/~eipec/pdf/p583-van_wyk.pdf](http://www.cs.upc.edu/~eipec/pdf/p583-van_wyk.pdf)

~~~
sedachv
Thank you for those links and your post!

------
dahart
> The hard part on the path to becoming a command line wizard, or any kind of
> wizard, is thinking about how to apply existing tools to your particular
> problems. You could memorize McIlroy’s script and be prepared next time you
> need to report word frequencies, but applying the spirit of his script to
> your particular problems takes work.

It definitely takes a bit of work, but hopefully it's still a lot less work
than writing the verbose solution.

The hard part of mastering unix one-liners for me are two things:

\- Figuring out which tool can pick apart or reconstruct a specific syntax
robustly. I think I spend more time figuring out input parsing and output
presentation than anything else.

\- Knowing when the right time is to ditch the clever one-liner and upgrade to
a real program. There's a benefit to prototyping as a one-liner, but there are
cases when replacing the result with a long program is appropriate, especially
in a production environment. But, then again, only if the one-liner isn't
robust or scalable enough, and usually it probably is.

~~~
dmitriid
A verbose solution is repeatable, debuggable, and understandable six months or
even six years from now.

A verbose solution can be deconstructed and understood by a newbie.

A one-liner? Not so much, if at all.

~~~
dahart
That is occasionally true, but in 30 years I've absolutely definitely seen a
lot more of the other way around, personally speaking.

Good POSIX one-liners usually don't need to change and aren't hard to
understand. A verbose program, on the other hand, only lasts if it's like K&R C
with zero dependencies. All the languages and libraries have changed
considerably, scripting languages and versions come and go, C++ is quite
different and might not compile later, build systems have backward
incompatibilities.

Plus, a one liner is, like, one line. It's really not that hard to understand
unless it's intentionally crazy, and even then, even if it takes a half hour
to work through, that's still less work than something like Knuth's 15 pages
of code.

We could be imagining pretty different scenarios, though, so feel free to
share examples. Normally a pipe consisting of a combination of sort, cut,
head, and uniq is trivial to grok, and it would be very wasteful to
re-implement that kind of thing on your own. Here are a few one-liners I've
made use of recently that I love:

      # count the function calls in log.txt
      grep -Po '^\K.+(?=\()' log.txt | sort | uniq -c | tee callcounts.txt

      # sum the timings
      # input: 2 columns, time & name. output: 3 columns: time_sum, name count, name
      # It's basically a pivot table in one line of code
      < time.txt perl -lane '$k{$F[1]}+=$F[0]; $c{$F[1]}+=1; END{print "time\tcount\tcall"; print "$k{$_}\t$c{$_}\t$_" for keys(%k) }' | sort -n | tee time_grouped.txt

In my own personal repo, the bash scripts and one liner aliases are outlasting
all my python and C++ code. The verbose programs that I wrote myself are
harder for me to maintain.

------
3xblah
The question I have for Dr. Cook is, "What is this wizardry worth in today's
world of computing?"

It seems for every concise one-liner that requires no internet access there is
a complementary website someone has set up to perform this same task "for
free" on a remote computer and return the results in a web page.

I never use those because I still enjoy the command line. People who design
software and websites today, and those who review their work in the media,
often refer to the command line as some sort of purgatory. God help the user
who is faced with a command prompt. For me, this is more like a haven from the
world of gratuitously graphical software and web applications that are too
often either bloated, overly complex, inflexible, opaque, slow, or some
combination thereof.

It is not that the command line is pleasing in an absolute sense. It is only
pleasing in a relative sense, when contrasted with the alternatives. If the
alternatives are painful enough, then moving to a command prompt is a relief.

~~~
Nullabillity
It's also worth quite a bit (IMO) that the data doesn't have to leave your
computer.

------
cortesoft
I think this is just the hard part of programming in general - mapping your
problem into something that the tools you know how to use can solve.

I don’t think there is anything fundamentally different between when your
tools are Unix cli tools or a programming language. It is the same skill.

------
lqet
Isn't

      tr A-Z a-z | tr -cs a-z '\n' | sort | uniq -c | sort -rn | sed ${1}q

equivalent to

      tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c | sort -rn | sed ${1}q

but 4 characters shorter? Or am I missing something?
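A quick spot check (on one sample string, at least) says the two orderings
agree:

```shell
s='The Quick brown fox; THE quick FOX!'
a=$(printf '%s' "$s" | tr A-Z a-z | tr -cs a-z '\n')
b=$(printf '%s' "$s" | tr -cs A-Za-z '\n' | tr A-Z a-z)

# both lowercase-then-split and split-then-lowercase yield the same word list
[ "$a" = "$b" ] && echo identical
```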

~~~
ColinWright
Yes, it's shorter, but we're not playing golf. It's conceptually pretty much
the same, and then you just go with whichever one you came up with first.

Yours says:

        Convert to lower case
        Convert non-letters to newlines

The original says:

        Convert non-letters to newlines
        Convert to lower case

Conceptually the same. The point is, most people don't seem to be able to do
this at all, although there are many who can tinker with a given solution and
improve it around the edges. It's the ability to come up with the initial
version that seems to be rare.

~~~
darrenf
My first reaction was to wonder why `tr` was required at all - neither
conversion struck me as necessary. These days both `sort` and `uniq` can do
case-insensitivity:

        sort -f | uniq -ic

So I just allow the shell to do word splitting:

        for x in `cat foo.txt` ; do echo $x ; done | sort -f | uniq -ic | sort -rn | head -<num>

~~~
ColinWright
Doesn't that have different behaviour if there is punctuation in the file? On
my phone and not in a position to test it, so I'm not sure, but I think it's
different.

~~~
darrenf
It does indeed. Good catch :)
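For anyone curious, here's roughly what goes wrong (a quick sketch):

```shell
printf 'Stop! stop, stop\n' > /tmp/punct.txt

# shell word splitting leaves punctuation attached, so these count separately
for x in $(cat /tmp/punct.txt) ; do echo $x ; done | sort -f | uniq -ic

# McIlroy's tr stage strips non-letters first, so they collapse into one count
tr -cs A-Za-z '\n' < /tmp/punct.txt | sort -f | uniq -ic
```

The first pipeline reports three distinct "words" ("Stop!", "stop,", "stop");
the second reports a single word with a count of 3.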

------
peterwwillis
jq has made it easier to munge complex data from many non-unixy apps, but then
you have to learn how to write complex jq expressions. Something like SQL
applied to JSON might make this easier.
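Agreed that the jq language is its own learning curve, though simple
selections stay fairly tame. A sketch (the field names here are made up):

```shell
# filter an array of objects, emitting the raw (-r) name of each big-enough one
echo '[{"name":"a","size":3},{"name":"b","size":5},{"name":"c","size":7}]' \
  | jq -r '.[] | select(.size > 3) | .name'
```

Read left to right: for each array element, keep the ones whose size exceeds
3, then print the name without JSON quoting.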

~~~
JasonFruit
jq is an amazing tool, and I wish I used it frequently enough to remember how
without looking it up.

~~~
andrewla
I've mostly switched to using gron [1] for these applications -- the operation
is much simpler and doesn't depend on learning a new DSL for manipulating
json. I have to manipulate JSON data just enough to want to have a tool better
than just firing up node and running something, but not a complete new
language. With gron I can just wire it up to a bunch of standard line-based
utilities.

[1] [https://github.com/tomnomnom/gron](https://github.com/tomnomnom/gron)

~~~
FraaJad
While I get what gron is trying to do, I don't have an intuitive grasp of how
to use it. Do you have an example that you wrote/learnt from?

------
mancerayder
Is the entire question becoming obsolete?

I'm something of a command-line wizard (well, given enough time, if I'm
honest! I always have to re-remember things like awk syntax), but in
interviews now, no one asks core Linux shell-type questions (or even
theoretical kernel-type questions). They primarily ask:

\- coding (70+%)

\- diagramming infra type questions

\- experience with keywords ("CI/CD" and "containerization" being the biggest
keywords).

Recently I interviewed with a 'startup' that's in the real estate space and I
was told by a manager, after touting my Linux infrastructure skills, "Oh, I
don't even care about that anymore. I can teach all there is to know in a
couple of days. It's all AWS and tooling here."

This is in the DevOps space.

Sounds like the new generation can give 0 craps about the command-line?

~~~
majewsky
Just because suits don't ask for it in interviews doesn't mean it's not
relevant for your day job.

Also works the other way around: How many binary trees are you balancing each
day?

------
rocqua
I love that quoted new-line. That's the exact amount of crazy idiosyncrasy I
expect from shell one-liners.

~~~
Tepix
It's nice, alas it has some problems with apostrophes and dashes.

        perl -pe 's/[\s\.,]+/\n/g'

works better. Nowadays you could also get rid of the extra "tr" to convert to
lower case and use the new-ish "-i" option to make uniq ignore case. That
wasn't available in 1986, however.

------
darkerside
Are we just going to ignore the (rough guess) thousands of lines of code that
make up `tr`, `sed`, `uniq`, and other CLI tools? I enjoy hacking together
nifty solutions on the shell, but the tone of this article irked me.

~~~
jonahx
That was not implied.

A key skill in good programming is choosing tools at the correct level of
abstraction for the problem. Here, the higher-level unix tools are clearly the
better choice.

Saying "that's cheating" because the high-level tools are _themselves_ written
with lower-level tools with many LOC is missing the point.

~~~
darkerside
> Knuth and McIlroy had very different objectives and placed different
> constraints on themselves, and so their solutions are not directly
> comparable. But McIlroy’s solution has become famous. Knuth’s solution is
> remembered, if at all, as the verbose program that McIlroy responded to.

He admits they don't compare, but does it anyway. I shouldn't take offense...
but for some reason I do!

------
vram22
Just posted this to HN some days ago, in another thread, but reposting since
it's relevant:

[https://jugad2.blogspot.com/2012/07/the-bentley-knuth-
proble...](https://jugad2.blogspot.com/2012/07/the-bentley-knuth-problem-and-
solutions.html)

Post has rough Python and shell solutions to the problem Bentley proposed.

The solutions can be useful for beginners to Python or shell.

~~~
CarolineW
Your "shell" solution isn't really a shell solution, it's an awk solution.

It's really valuable to know awk, but it's a bit misleading to claim that
"it's shell".

~~~
gerbilly
This is just pedantry.

I then read your other replies and I can't see them adding much value to the
discussion.

You are just being rigid about semantics.

What action are we supposed to take having read your comments?

To be careful not to call a pipeline with awk in it 'shell'?

To be ashamed of using awk because it's a programming language, and that is
'cheating'?

~~~
CarolineW
Wow, I would never have expected my comments to have been interpreted that
way. Genuinely shocked.

Thank you for the feedback.

------
dontbenebby
I wish there were some tool that could, given a one-liner, expand it out into
a series of steps to better understand it.

(Or maybe someone wrote a one-liner for that too?)

Personally I prefer to write small bash scripts over one liners so future
colleagues can understand the program's logic.
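For instance, the article's pipeline as a small commented script (a sketch;
the function and file names are mine):

```shell
#!/bin/bash
# word_freq FILE N: print the N most frequent words in FILE, one count per line
word_freq() {
  tr -cs A-Za-z '\n' < "$1" |  # non-letters become newlines: one word per line
    tr A-Z a-z |               # normalize case so "The" and "the" match
    sort |                     # group identical words together for uniq
    uniq -c |                  # collapse each group to "count word"
    sort -rn |                 # most frequent first
    head -n "$2"               # keep the top N
}

# demo on a throwaway file
printf 'a a a b b c\n' > /tmp/demo.txt
word_freq /tmp/demo.txt 2
```

Same six stages, but each one carries its reasoning with it.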

~~~
oweiler
[https://explainshell.com/explain?cmd=tr+-cs+A-Za-z+%27+++++%...](https://explainshell.com/explain?cmd=tr+-cs+A-Za-z+%27+++++%27+%7C+++++tr+A-Z+a-z+%7C+++++sort+%7C+++++uniq+-c+%7C+++++sort+-rn+%7C+++++sed+%24%7B1%7Dq)

~~~
zwieback
Perfect, this is why, after almost 10 years, I still come back to HN
regularly!

------
MPSimmons
Hrm.

    [msimmons@msimmons-lin ~]$ time cat moby10b.txt | tr " " "\n" | grep -v '[[:space:]]' | sort | uniq -c | sort -g | tail -n 10
       1592 I
       2122 his
       2446 that
       3462 in
       3906 a
       4051 to
       5225 and
       5514
       5851 of
      11997 the

    real    0m0.768s
    user    0m0.805s
    sys     0m0.021s
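That bare 5514 is the empty string: blank lines (from the file's own empty
lines, and from every run of doubled spaces that `tr " " "\n"` turns into
empty lines) contain no whitespace character, so they sail straight through
the grep. Squeezing repeats avoids the second source:

```shell
# plain tr emits a blank line for each extra space, and the grep keeps it
printf 'a  b\n' | tr ' ' '\n' | grep -v '[[:space:]]' | wc -l

# tr -s squeezes runs of spaces into a single newline, so no blank line appears
printf 'a  b\n' | tr -s ' ' '\n' | wc -l
```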

------
jackallis
My question, now, is: where does a complete beginner start?

~~~
wodenokoto
Google “enough command line to be dangerous”

It lives up to its name and is very much a place to start and nothing more.

Your real problem is going to be finding enough problems to solve to build up
muscle memory.

------
mondo9000
It's 2019; learning to program like it's 1975 might not be a good use of time.

~~~
arendtio
Sometimes it is wiser to learn what has been tried, before reinventing what
didn't work in the past. And tools that survived for 40 years in our industry
can't be that bad.

~~~
mondo9000
You can use vi and vim, but I'll just use nano.

