The hard part in becoming a command line wizard (johndcook.com)
187 points by ColinWright on Feb 28, 2019 | hide | past | favorite | 169 comments

I like the "everything is text" mentality, but really, the "hard part" of the GNU/Linux CLI is getting things done when you don't already know how.

If you've memorized the hundreds of different invocation arguments for find, sed, grep, tr, sort, uniq, cat, xargs, etc. -- then yes, you can be very fast. If not -- be prepared to RTFM for about 4 different programs to get your job done.

That's an incredible hurdle and it really only makes sense if you are using these tools daily/weekly. I'm still not completely convinced that it's "worth it" to master these tools, when general-purpose scripting languages are almost as succinct and obviously way more powerful. That's speaking as someone who's written a fair amount of Bash.

In contrast, I think tools with higher discoverability and self-documenting nature (arguably, Excel) usually offer a more user-friendly way of doing the same thing, unless you are doing it a lot.

Yes, most of us would probably have to check the manpage for three or four of these utility programs to put together something like Doug McIlroy's one-liner from the article. I think this is clearly still more efficient than writing 17 pages worth of code in a general purpose language, which is what it's being pitted against. I also don't think it's really a very high hurdle -- if I were writing the same thing in a Python or Perl script I bet I would end up also consulting the documentation to check exactly how to do things or what some detail of syntax is. If you happen to write Python all day every day then sure, you'll find it much faster to just write a quick Python script for this sort of thing. If your day job is C or Java or some other non-scripting language and this is a once-a-month-or-so task then I think the time required to do this sort of small utility task in shell-and-utility-tools is not really any different from remembering how to write Perl scripts.
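For reference, the word-frequency one-liner being discussed looks roughly like this (a sketch, not McIlroy's exact text; the flags shown are GNU-ish and BSD variants differ slightly):

```shell
# A toy input; in practice this would be your real text file.
printf 'The cat and the dog and the bird\n' > input.txt

# 1. split into one word per line, 2. lowercase, 3. sort so duplicates
# are adjacent, 4. count each run, 5. sort by count, 6. keep the top 10.
tr -cs 'A-Za-z' '\n' < input.txt \
  | tr 'A-Z' 'a-z' \
  | sort \
  | uniq -c \
  | sort -rn \
  | head -n 10
```

Each stage does one small, text-shaped job, which is exactly the "everything is text" point.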

The 17 pages comment gets me. Literate programming is verbose by design. I'd be surprised if the Ruby version of that script was much longer than the command line version.

I do see what they're saying though. I've used Linux for the most part of 20 years. My primary job has hopped between Perl, Python, and a bunch of other languages, but bash/csh has been fairly consistent. I took a job using Windows for a year or two and tried really hard to get familiar and proficient in PowerShell. Since my job was primarily in Python, learning PowerShell had a steep learning curve with a lot of time between uses. I clung to Python. I jumped to WSL/Ubuntu (and honestly sshed into Linux vms before that) because I was so familiar with it. If I stayed at that job (or another Windows job) I probably would have eventually picked up PowerShell, but that would probably have taken 5 years.

And, it's also faster/easier to copy and paste a command from the internet than to download, install, and run a standard program. Probably not the most security-conscious thing, but it gets the job done.

Furthermore, it's often quite easy to slightly tweak a command I found online, even if I don't understand much of it.

As a Windows user (embedded programming and gaming keeps me stuck to Windows) who has dabbled a bit in Linux over the last 10 years, the fact that GNU/Linux "assumes you know what you're doing" is constantly frustrating. I've overcome this somewhat by stopping my current Linux install from automatically starting the X window system. That way I have to use the terminal for some things. This has worked somewhat, but seems only to highlight how terrible the interface is for those of us who aren't wizards.

I like to phrase it as "Linux assumes you know what you're doing. Windows assumes you don't know what you're doing. Mac assumes you don't want to know what you're doing." Working in an interface that doesn't hold your hand can be liberating when you understand how that system works, but there must be a way we can step back.

Maybe it would be worth adding something similar to the drop-down auto-completion boxes I see in so many modern IDEs. That way instead of reading the man page of every single command and program, I can (for example) get a list of possible arguments, and each of those arguments provides a summary of its function when highlighted. This could all be done in text, so that it would be available outside of the window system. Maybe it has already been done.

Obviously such a system would be laborious to implement and obnoxious to seasoned bash users (though it should be easy to disable), but it would be extremely handy to those who (like me) only occasionally use Linux. In any case, something should be done that makes the system a little more welcoming to casual users.

FWIW I was in the same boat for many years, first on Windows then OS X. I'd learn a Linux trick or two then lose momentum and more importantly lose confidence that I could get around the CLI (and just as important, the maze of config files).

What made a huge difference for me was setting up Arch Linux on my XPS 15 laptop. Following the install guide[1] forced me to go through the process of standing up a functioning Linux desktop from a much lower level starting point than I'd ever done with Ubuntu or CentOS or even Slackware. The whole experience reminded me of the first few times I built PCs as a kid: At first they were intimidating boxes full of complex circuit boards and imposing nests of ribbon cable, but putting a few together demystified all that in a hurry. Setting up my Arch laptop had the same result as it relates to Linux configuration in general and command line operations in particular.


Both Windows and OSX are "GUI first" OSes, and Linux simply is not. The majority of Linux systems out there are headless - and it shows. I drop down to a terminal immediately on any Linux system for everything - GUI or not. On both Windows and OSX, I will most likely do things "the GUI way" first, especially file management with the Finder or Explorer - although on both these platforms, there are very viable or even excellent CLI environments.

But I get what you're saying - even when looking at the PowerShell design, script documentation is a first class citizen there. As a scripting language I'm pretty impressed with it, but as a shell I find it an annoying place to be in. OSX on the other hand simply has a unix shell, with a respectable amount of CLI tools available to interact with the desktop. Sadly, Apple scripting has been a bit neglected in the last few releases and is not really supported anymore by all applications.

disclaimer: I do not like Linux on desktop. My daily driver is a MBP with OSX, and I used to work on Windows (but still have a gaming box), and all my linux boxes are servers.

> As a scripting language I'm pretty impressed with it, but as a shell I find it an annoying place to be in.

Couldn't agree more. The syntax is fantastic for scripting as it has good support for objects, but it is not good as a shell due to the different return types and polymorphic behavior of some functions based on the type they are operating on. In *nix land this isn't a problem as everything is text.

There are friendly shells out there, such as https://fishshell.com/, with auto-completion and nifty features.

I agree. I think a REPL (which I consider the command line to be) and something like Excel are immensely useful no matter the language or proficiency level. Code is much sturdier, and I'm more likely to experiment if I can iterate on it after typing every few words instead of (even in a language I'm proficient at) writing a whole page of code and then running it to correct my typos and errors.

I work in visual effects. Most of our software has a built-in Python interpreter. I was talking to a tech-savvy junior who was interested in becoming a full-time modeler/artist. He was asking if he should learn programming/Python and I advised against it (unless it scratched his particular itch). I felt his efforts were best put towards getting better at modeling. At most large places I've worked, the artists can get by with a sticky note of common commands. There's usually someone they can ask for advice or small scripts when things get hairy. At smaller places you're more on your own. But it does come up: say you have 100+ items and need to be sure they all have 2 sets of UVs. The manual way is to click on each one and pull a drop-down menu; with Python you can write a one-line foreach to test and print. It's all about knowing which screw to turn[1]
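That one-line foreach might look something like the sketch below. The `scene` dict and `uv_sets()` helper here are hypothetical stand-ins for whatever query the package actually exposes (in Maya it would be something like `cmds.polyUVSet(item, query=True, allUVSets=True)`):

```python
# Hypothetical scene data: each item maps to its list of UV set names.
scene = {
    "houseA": ["map1", "lightmap"],
    "houseB": ["map1"],              # missing its second UV set
    "lampPost": ["map1", "lightmap"],
}

def uv_sets(item):
    """Stand-in for the real per-item UV set query in the DCC app."""
    return scene[item]

# The one-liner: flag every item that doesn't have exactly 2 UV sets.
bad = [item for item in scene if len(uv_sets(item)) != 2]
print(bad)  # → ['houseB']
```

Against 100+ items, that beats clicking through a drop-down menu on each one.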

I often challenge myself in doing something in a single line just to scratch my own itch. Often when tackling daunting tasks (do this thing 20-50x) I weigh the benefit of doing it manually, but bias a tad toward automating even if I'm throwing the code away. But I don't think it's worth most people's time.

The biggest benefit of working the same place a long time is knowing who to ask about things. Second is knowing the history of things. Third is having a junk drawer of scripts that don't quite work out of the box, but they have idioms I can grab when I need to hammer something out.

[1] https://calvincorreli.com/blog/1397-knowing-which-screw-to-t...

I do a lot of 3d stuff (not a pro) and I've found a passing knowledge of programming is quite useful when you're doing something a little bit more unusual or complex. I did some pretty big cityscapes made out of modular components, and it was nice to have python for stuff like twisting 300 TV aerials so they're all a little bit differently wonky, or making lines of washing.

I'm also a big believer in mastering your tools. An artist is limited by their imagination - and imagination is largely structured by your knowledge of materials and the capabilities of tools. If you're working with 3d models, and you plan on mastering them, having an idea of what you can do with them really matters. It spurs you on to try new and ambitious stuff, new styles, new forms.

A passing knowledge is great, but focus and mastery just require too much time (and time to maintain). With something as large as Maya or computer graphics, you have to pick your battles. If you're completely independent or at a very small studio you have to be scrappy, but I think that means having a wide toolset, a lot of copy/paste, and experimenting. You just don't have the time to master and maintain that mastery.

My career has let me focus on the backend and internals. I know people way more proficient than I am with Maya, but they know little to nothing about Python or the DG (or maybe MEL). I know a few vfx sups who can cobble together a basic shell command, but aren't masters and don't know a bit of Python. It's much easier to interact with them (especially over chat/email--I can send them a command and trust them to modify paths and run it), but it's not worth their time to invest in honing those things. It's probably cheaper to fly someone from a different state to do these things for them than it is for them to spend half a day writing something to parse an EDL to rename some Quicktime files or troubleshoot a bad power supply in their workstation. I've also noticed that the more technical they are, the deeper the hole they're in before they ask for help. Hopefully that means they're more productive (and are more ambitious), but it usually means I have to ask them what their original problem was, then what was the first thing they did, then the second, and unwind the situation.

For the situation you described of turning buildings there are probably 5-10 completely different approaches you could take (apart from manually placing them), only a few use Python. At a moderately sized studio (on a moderately sized project), I'd reach for something like Houdini to do what you're asking and you would be hard-pressed to find someone who would use Python for this kind of thing in Houdini. On the flip side I was handed a project at a small place that was well suited for Houdini. We didn't have a license so I cobbled something together in Maya. The producers were telling me in their experience they'd hire someone who would do a project in Houdini, but later when they needed changes and that person wasn't available they'd have to hire someone to redo it in Maya. Reality is hard.

My thoughts about this really clicked with photography. A lot of photographers are just technical enough. Most of the process is a black box, but they've had their camera for years, tested enough inputs and outputs that they can trust their tools to do what they want. Other photographers dig into the tech. They can't, by definition, have as much time in the field. They can tell you a lot more "why" and might be equally good or better in many respects, but it can be really easy to get lost in the tech and miss out on the practice.

I guess I'm interested in 3D stuff because I feel like I'm living in the birth of a new medium, like when people discovered painting, or dance. Most people doing 3d are doing stuff you could do in another medium, like film but cheaper, or interactive.

On the one hand, you're absolutely right when you say that it's largely unproductive to work outside of your speciality. On the other, I think it's a bit like painters learning to make their own frames and mix their own paints. It's not strictly necessary, but it deepens the understanding, and it opens the door to loads of stylistic discoveries.

I think the thing I like about python, or programming in general as a creative technique, is the combination of amazing fine-grain control, with absurd expressiveness. The kind of techniques you learn as an artist are usually strong on one or the other. Crosshatch gives you a lot of control over light and shade, but it's a time-consuming, boilerplate kinda technique. Most traditional techniques are like that, or, like throwing paint at a wall - very expressive, but no control.

Programming the creation of sculptures and images is the kind of amazing opportunity that it feels sort of crazy to pass up. I've spent an inordinate amount of time refining my ability to draw nice gradients with a set of pencils - and now I have a machine that can give me a perfect gradient with a couple of lines of code!

The Lisp philosophy is very similar to the unix philosophy, but with one crucial modification: instead of "everything is text" the mantra becomes "everything is an S-expression".

The reason s-expressions are better than text is that they are more expressive. They allow arbitrary levels of nesting whereas text doesn't. Text naturally develops a record structure with only two levels of hierarchy: records separated by newlines, containing fields separated by some other delimiter (usually spaces, tabs or commas, sometimes pipes, rarely anything else). Going any deeper than that is not "native" for text. That's why, say, parsing HTML is "hard" for the unix mindset. But a Lisper naturally sees HTML (and XML and JSON and, well, just about everything) as just an S-expression with a more cumbersome syntax.
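As a sketch of why nesting is "native" to s-expressions: a complete reader for them fits in a dozen lines, whereas each ad-hoc line-and-field text format needs its own parser. This is illustrative Python (assuming well-formed input), not any particular Lisp's reader:

```python
def parse(tokens):
    """Recursively read one s-expression from a token list."""
    tok = tokens.pop(0)
    if tok == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # discard the closing ")"
        return node
    return tok

def read(text):
    # Tokenize by padding parens with spaces, then splitting on whitespace.
    return parse(text.replace("(", " ( ").replace(")", " ) ").split())

# An HTML-ish document as an s-expression parses to an arbitrarily deep tree:
tree = read("(html (body (p hello) (p world)))")
print(tree)  # → ['html', ['body', ['p', 'hello'], ['p', 'world']]]
```

Nothing in `parse` cares how deep the nesting goes, which is exactly what the records-and-fields convention can't offer.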

you obviously know lisp and might be biased but i have a serious question, if i was going to waste a bunch of time learning a functional language in 2019 and wanted to use it in production (I am in consulting, so I would be recommending this to my clients) would you learn

1. LISP
2. Haskell
3. Erlang
4. Something else

No wrong answer, just a question I always like to hear answers to.

Not GP, but: I recommend Standard ML. I can't see a better way to do this than to power through Dan Grossman's three course sequence, which goes back to the drawing board and teaches Standard ML [0] to introduce modern programming language features such as scope, closures, types, pattern matching, modules, and so on. Then, he uses the concepts introduced in Standard ML as a springboard to introduce Scheme in the second course, and Ruby in the third course. And in the end, you will see how each of the three languages implement in different ways the same powerful techniques.

But with just that first Standard ML course, you can already branch out to more popular languages such as Scala [1], which gives you the closest thing I've seen to "Standard ML on the JVM". But Standard ML (or Ocaml, which is very close) might even be your preferred language for prototyping [2], or compiler writing [3].

P.S.: To make Coursera tolerable, I usually pull the entire course with this nice little script [4]. Works great with ranger and mplayer!

[0] https://www.coursera.org/learn/programming-languages

[1] https://www.coursera.org/specializations/scala

[2] https://sanderspies.github.io/slides/dawn-of-reason.pdf

[3] http://flint.cs.yale.edu/cs421/case-for-ml.html

[4] https://pypi.org/project/coursera-dl/

I almost forgot to mention: Matt Might has an excellent blog post about what all of these "advanced" programming languages offer: http://matt.might.net/articles/best-programming-languages/

> Uses script to copy programming course.

That's a little meta, don't you think?

thank you!

Caveat emptor: Standard ML is not widely used in industry and does not have a large library ecosystem, so if you do recommend it to clients, make sure it's clear that they'd want to follow up with something like Scala or Ocaml, which have far more users/libraries. It's just that for really understanding functional programming, Standard ML is fantastically clear. (Although actually, that Scala coursera course I linked to above is by Martin Odersky, and is also pretty darn good, especially if you're already committed to the JVM.)

Others have mentioned Elixir, which would also be an interesting option to explore. The main difference being that with Elixir you're leveraging Erlang and BEAM (which gives you a fantastic platform for actors and lightweight threads), versus Java and the JVM in the case of Scala (which gives you access to all of Java).

> if i was going to waste a bunch of time learning

Your problem is not what language to learn, your problem is that you think that learning is "wasting time."

> a functional language in 2019

Lisp is not a functional language, so your question does not even make sense.

If you actually want to help your clients, instead of selling them on bogus, inappropriate solutions (which, with your attitude and lack of knowledge, I guarantee is what you are doing to your clients today), you need to start actually learning the fundamentals of your field. If you think that learning is "wasting time," then you are not learning, you are, indeed, only "wasting time."

I am also a consultant, have worked with all the languages you listed, and I say the above to help you, and other people who consult, and me. Consulting is all about having a very solid knowledge base, continuous learning, and professionalism (which, BTW, includes basic writing skills, like capitalization). Attitudes like yours are why consulting engagements fail, and why consultants get a bad reputation. That only makes it harder for all of us to get work.

Erlang as the most pragmatic functional programming language.

Clojure as the most useful functional programming language.

Ditch both if/when you run into their limitations. You’ll then discover various degrees of FP in most languages.

I enjoyed Clojure and wrote a little production code using it but I'm finding F# much more fun and safer. F# is certainly far easier to refactor so that's another nice effect of having a static type system. Since .NET is so prevalent in business, it's easy to use almost anywhere.

I like a lot of F# but there's a lot of F# Person stigma to it in most shops that I've seen--condescending, shitty, and in many ways perceived, validly or invalidly, "different-to-be-different" (the Paket brouhaha comes to mind).

F# is a great language but I've found that it's usable when you can 1) convince C# people it's not that scary, and 2) convince them that you're not An F# Person.

Probably Erlang, but potentially Ocaml.

I'll probably catch some flak for this (from both sides): You can (and should) use Ocaml as a Lisp where you lose the easy macros but gain strict typing. It's brilliant. The only downside is Emacs; it's the only sensible IDE.

Haskell (and friends) are fantastic, but in my experience you have to be reasonably smart and work full-time with it, since every medium sized project gets turned into its own DSL (more so than with other languages). Maybe it's not ideal for consulting.

I will always have a soft spot in my heart for Common Lisp, and Clozure Common Lisp (CCL) in particular. But all the cool kids seem to be using Clojure (with a J) nowadays.

Erlang has Elixir which is used throughout industry. Clojure is out there, too. I've also seen OCaml and Haskell here and there. F# is gaining ground.

+1 for F# (though I have to admit, Elixir seems much preferred as the functional weapon of choice in industry).

Amazing ML-derived language with all the benefits of the .NET ecosystem. Highly suggest everyone dip their toe in.

> The reason s-expressions are better than text is that they are more expressive

That has got to be a joke, right? S-expressions are a subset of text; it's axiomatically impossible for them to be more expressive.

As others have mentioned, "text" is rarely unstructured in practice: it's just that the "structure" is ad-hoc, relying on things like whitespace, newlines, various punctuation, etc.

I mean, there's a reason why regular expressions were one of the first things that went into Unix. But with lisp, you need neither regular expressions nor separate parsers for each format, because s-expressions enforce a uniform encoding of all structure in a data structure (a tree), which is parsed automatically by the interpreter!

Edit: And, as lisper clarifies, "S-expressions are more expressive than text that adheres to the typical unix conventions" [0], meaning that s-expressions make it trivial to arbitrarily extend that structure (whereas doing the same beginning from Unix conventions gets messy very quickly; see, e.g., lisper's example of parsing HTML).

[0] https://news.ycombinator.com/item?id=19274518

Sugar is better than sooty water. They are both C, H, O, but the sugar has useful structure.

s-expressions are structured text. Therefore, for some set of problems, s-expressions are "better" (contain more information, are more easily manipulated, etc.) than unstructured text.

Please provide an example of an s-expression that provides more information than can be encoded through a text string that is the same length, and in fact identical to said s-expression.

I mean, I get it. Lispers like Lisp, and think it's “better”. But it's no more expressive than text, and in fact operating on text gives you natural access not just to s-expressions but to any other programming language or structured and unstructured examples of text.

If your shell operated only on s-expressions, that would either reduce expressiveness or require tricks to encode the text stream as an s-expression, which hardly qualifies as making it more expressive; that's just adding extra redundant information around the original information.

> Please provide an example of an s-expression that provides more information than can be encoded though a text string that is the same length, and in fact identical to said s-string.

We're talking about a class of strings, not a single individual string. S-expressions are more constrained than arbitrary text, and therefore contain more structure than arbitrary text. Constraints are information.

You can't simply impose a tree structure on any arbitrary piece of text.

The lisp interpreter can operate on s-expressions, but it cannot operate on arbitrary text. The specialization of encoding as s-expressions allows the development of the lisp interpreter.

Similarly, a C compiler operates on well-formed C programs.

Neither one can do its work on arbitrary text or on the other's text.

You can argue whether one is better than the other, but while each is a subset of "text" the limitations of the subset enable the expressiveness.

"I work on matter" vs "I sculpt" or "I paint" or "I cook".

s/text/unstructured text/ in the OP. S-expressions are serialized as text, but the Lisp approach is more powerful than the UNIX approach because it works with structured text by default, while the UNIX approach assumes unstructured text. The UNIX approach forces every program to write its own arbitrary parsing and unparsing, and makes almost every shell one-liner half about converting ad-hoc shapes of output text into ad-hoc shapes of input text.

The only popular modern incarnation of Lisp philosophy in shells is... PowerShell - where instead of s-expressions, the tools exchange objects. That's AFAIK binary blobs, but if you wanted to retain a text-serializable form, then the most obvious way to do it would be... S-expressions.

OK, we get it. You can embed any language in plain text, so it's "expressive". lisper should have emphasized that "how Unix programmers default to using command-lines text" is less expressive than S-expressions, as history shows that programmers tend to fall into using a language of "records of fields".

> S-expressions are a subset of text it’s axiomatically impossible for them to be more expressive.

This reasoning is backwards. Structure creates expressiveness. A novel in correct, readable English is more expressive than an arbitrary pile of letters, because it follows a form. A painting, even when represented as a JPEG, is more expressive than a large number, because it is considered as a valid JPEG and a painting. A Bach chorale with no parallel fifths, and a mid-20th-century experimental atonal piece, are both more expressive than banging on a piano.

I think the GP was talking about what you might call "expressive range". Anything that can be expressed in a string of characters of a given length uses at most the full expressive possibilities of that string. I think it's a silly point, because the meaning of lisper's comment was clearly that the added structure makes s-expressions more powerful than the Unix convention of "plain text".

The Unix standard of lines within which there are fields delimited by some character is, itself, a very primitively constrained/structured form of text, so it too is less expressive than raw text could be. But this is a daft discussion.

This has to be intended as a joke in order to pretend to be dense right?

First of all, the notion that a subset being "more expressive" than its superset is "axiomatically impossible" is wrong. If I form a 20-letter string that's a sentence (e.g., "Welcome to my house"), it would be more expressive than the totality of 20-letter strings (even though it's a subset of them). That's because the totality of 20-letter strings includes all possible valid sentences plus close to 26^20 (or more if we add spaces, numbers, etc.) arbitrary BS strings, and thus can't really express anything concrete.

Parent obviously speaks about common text treated with regular string/text manipulation functions (what 99% of Unix shell pipe work is) vs text formatted and interpreted as an S-expression structure.

The same way a structured english sentence is more expressive than a bunch of vocalizations (even if a sentence is just a bunch of vocalizations too -- it's the structure that makes it expressive).

It's not about the merits of plain text as a general format vs a specific plain text variant. It's about the merits of a specific structure, vs the kind of text files tools like sort/cut/tr etc were made to operate on.

In this context, think of "text" as shorthand for unstructured text. Which is another subset of text. The sort of text that Unix tools traditionally handle. Colon-separated files like /etc/passwd, line-oriented files like /usr/share/dict/words, etc.

So, now we have two subsets of text, and it makes sense they can have different levels of expressiveness.

(Actually, unstructured is too extreme since there is a little structure. Something like "isomorphic to CSV" isn't quite right, but closer.)

“Colon-separated files” are not unstructured though. And arguing that s-expressions are more expressive than text because “text” excludes all structured examples of text seems like an exercise in definitional acrobatics.

But Unix didn't adopt everything-is-colon-separated-values. Everything being unstructured text leads to a lot of different, incompatible ways of structuring that text. S-expressions themselves aren't important, I think, it's just having some standard structure that's used everywhere and you don't have to parse yourself. Trees are a good choice for a universal structure, hence things like XML. S-expressions are just a very lightweight and flexible format.

Obviously I mean degrees of structure, not an absolute lack of structure.

The goal of the definition isn't to give S-expressions an advantage so they can be proven superior and Lisp people can win the argument. It's to describe the sorts of files that actual existing command-line utilities (like those in Unix/Linux) can conveniently operate on.

The "head" command deals in line-oriented fashion and can print the first N lines of a file. The "cut" command can select certain columns based on a line-oriented and character-delimited structure. There is no exact definition, but this gives you a general idea of what text files means in the context of "becoming a command line wizard".
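Concretely, with a toy file in /etc/passwd format (the sample rows here are made up so the commands run anywhere):

```shell
# Sample rows in /etc/passwd format: name:passwd:uid:gid:gecos:home:shell
printf 'root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n' > passwd.sample

# "head" deals in lines; "cut" deals in delimited fields.
# Print fields 1 and 7 (login name and shell) of the first 2 lines:
head -n 2 passwd.sample | cut -d: -f1,7
```

Both tools lean on exactly the line-and-delimiter structure being described, and neither could do anything useful with an arbitrarily nested format.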

Text is a subset of binary; it surely doesn't hold that a CLI that only took protocol-buffer-based binary input would be more expressive than a text-oriented one?

Not at all. Less is more.

By using specific constraints (rules for sexprs), one creates a wider array of possibilities.

By having no rules (text), one has to provide a rubric on how to parse the text. And when you have to parse, you have to write a parser, and define the parsed format.

Let me rephrase that: S-expressions are more expressive than text that adheres to the typical unix conventions.

S-expressions are text with some constraints in its construction.

However, what would you expect from someone whose username is literally "lisper"?


Lisp people don't like being reminded about the theory of computation. Which is ironic, because they like Lisp because it is closer to raw mathematical syntax.

For me, building clever one-liners is very much an iterative process. In the example given I'd find (from memory or Google) the program to spit out each word, then find how to count them, then somehow sort it. In the end you might have a multi-line sequence of commands that you gradually move into pipes. It is rarely written in one swoop anyway.
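As a sketch of that workflow on a different toy problem (made-up log data; each stage gets inspected before the next pipe is bolted on):

```shell
# Goal: which status codes appear most in a (toy) access log?
printf 'GET /a 200\nGET /b 404\nGET /c 404\nGET /d 500\n' > access.log

# Stage 1: pull out just the status code column and eyeball it.
awk '{print $3}' access.log

# Stage 2: group duplicates together and count them.
awk '{print $3}' access.log | sort | uniq -c

# Stage 3: rank, most frequent first.
awk '{print $3}' access.log | sort | uniq -c | sort -rn
```

Each stage is a runnable checkpoint, so you never have to hold the whole pipeline in your head at once.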

If learning all that stuff is too hard for you, follow this advice from Dilbert: https://dilbert.com/strip/2010-05-16


Off topic: I loved the Top Gear homage in that one.

I used to LOVE Dilbert. His comics are so funny.

I didn't mind the Trump stuff. I think Scott Adams's insights were valuable even if his comments were extremely frustrating and contradictory.

But his somewhat new Climate Change denial stuff finally made me unfollow him.


> But his somewhat new Climate Change denial stuff finally made me unfollow him.

I don't follow Scott Adams, but my understanding from when he started talking about that subject is that he is not denying climate change, but he is acting as a climate skeptic in order to bring about debate and enlighten people on all sides.

For a (possibly bad and certainly trite) metaphor, it's somewhat like saying "Gravity is settled science" to a gravity denialist. You could instead provide some facts and experiments they could try such as holding their hands above their head to fight the force[0] of gravity. They could decide they don't think your facts are correct and perhaps theorize other reasons why they cannot hold their hands above their head indefinitely.

That's science, where even gravity is "just" a theory[1].

[0] Apparently in the Theory of Relativity, gravity is not actually a force, but a consequence of the curvature of space-time and the uneven distribution of mass

[1] https://en.wikipedia.org/wiki/Gravity#Modern_alternative_the...

>I don't follow Scott Adams, but my understanding from when he started talking about that subject is that he is not denying climate change

We don't need debate on climate change. There are not "sides". (Unless you count reality and magical thinking as "sides".)

We absolutely need debate and skeptics on climate change. Debate doesn't mean adversarial; it means questioning the evidence and possible interventions.

If we aren't debating on climate change, how do we decide where to put resources to reduce emissions and clean up existing pollution?

>Debate doesn't mean adversarial; it means questioning the evidence

Questioning the evidence of climate change is de facto adversarial. The evidence is overwhelming and beyond question. No intelligent, sane person arguing in good faith would argue otherwise.

Your derision and perspective are more appropriate for a religious zealot than a scientific thinker. Skepticism is the heart of science.

I have not invested enough time in understanding the currently best climate models. I must, therefore, take a skeptical approach and question conclusions drawn from that evidence. Especially so for a projection.

For the level of assurance you seem to have, I assume that you have rerun models yourself, have a deep understanding of complex systems and statistics, and have done other climatic work? Even that is skepticism applied and questions asked.

>I don't follow Scott Adams, but my understanding from when he started talking about that subject is that he is not denying climate change, but he is acting as a climate skeptic in order to bring about debate and enlighten people on all sides.

Apparently this is not allowed in polite company.

Of course it's allowed. And I would love Adams's attempt to "enlighten people" if that were really his goal.

But I'm saying this as a long time super fan... that isn't what it feels like to follow him anymore. I think he is less passionate about entertaining people or enlightening people, and more interested in bolstering his conservative credibility.

I don't know why his career arced in that direction, and it's his right to do whatever he likes. But for me, I have had to jump off the train because it no longer seems tied to what I liked about his humor.

I'm the conservative version of an Al Franken fan. We came for the comedy, but at some point it isn't about that anymore.

I agree with you, he doesn't write/speak about that topic like somebody who's just uninformed and is ready to learn. He uses invalid arguments and false claims there, which is not contributing to anything but, like you say, his "credibility" among those that need to identify with such public personas, for whatever reason they have. And he is getting "attention". It would not be different if he were a flat earther, and it's equally worthless.

>But his somewhat new Climate Change denial stuff finally made me unfollow him.

To avoid an echo bubble, don't unfollow people for such reasons. In fact find more people with opposite views to follow.

One does not have to subject oneself to nonsense in order to keep an open mind about the world.

One's mind might not be as open as one thinks unless they're regularly evaluating opposing perspectives, nonsense included.

It's not necessary to completely cover yourself in crap, but you should have an exposure to it once in a while.

Opposing opinions are not necessarily nonsense. Not that you said that, but many imply it or assume it. It's as though everything is either their worldview or nonsense. At best, they allow small diversions from their worldview.

I say, we should allow huge diversions from our worldview.

In fact, even if it's nonsense, if it's influential we should study it and understand it (that is, we don't have to suffer any random nonsense, but if something is both nonsense and big, the truly active citizen should study it at least, even if just to grasp why their fellow people fell for it and what consequences it might have).

Of course if they start with "they fell for it because they're bad/stupid/crazy/lazy etc" that's just something to make themselves feel better, and doesn't really explain everything.

First of all, this or that person can be bad/stupid/crazy/lazy: those are characteristics of an individual. Millions of people can't be said to be that; there are other forces at play when masses make decisions or adopt things, not personal pathologies.

E.g. it's surely not because all the "stupid/crazy" people are in the fly-over states that Trump was popular there. And vice versa, it would be improbable that most of the smart/sane people are on the coasts. (You could reverse this with how MAGA supporters view Hillary supporters, and it would be the same argument.)

Perhaps it's a cultural thing that made these people vote this way and the others that way. Or a historical thing. Or an economic shift that's felt in one area and not in another. People should strive to understand those things, then, instead of assuming the others are into "nonsense".

Of course people conveniently call their fellow people good/sane/etc when they agree with them (e.g. a vote goes how they liked) and then explain the opposite result at some later point (e.g. people voting for something they don't like) as "people being bad/crazy/duped/etc". They're not bothered to see the contradiction in calling the same masses both things at different times.

I'm sorry, maybe this is arrogant, but I can't learn something from everyone who has a different worldview. There are topics like climate change where most people's stances follow stereotypes. If I encounter such a stereotype, I'm 99.9% sure I won't find anything new. In these cases, I'd rather spend some time with more interesting diversions from my worldview than the one I already know so well.

This is not about people with another opinion being bad/stupid/crazy/lazy. Maybe they are right and I am wrong. It is about repetition, and myself focusing on stuff I don't know yet.

>I'm sorry, maybe this is arrogant, but I can't learn something from everyone who has a different worldview. There are topics like climate change where most people's stances follow stereotypes. If I encounter such a stereotype, I'm 99.9% sure I won't find anything new.

You don't need (and I didn't ask) to learn "something from everyone who has a different worldview".

But you (or, at least, any active citizen) need to learn why tons of people share the same different worldview.

E.g. why is there dissent about "climate change" being real? And the circular non-answers to those questions, like "because they follow stereotypes" or "because they're dumb", are not insightful.

It is very true that one should try to find out why so many people share a different worldview.

However, reading the 1000th climate change proponent or opponent won't help in this goal, "because they follow stereotypes".

And I never claimed to explain the different world views with "because they follow stereotypes", I explained my own behavior with this statement.

Finally, "because they're dumb" is not insightful. But "because they were lured by simplifying propaganda, and this is because they haven't learned to analyze complex situations" can be a valid explanation.

On the other hand, I read Dilbert and when I noticed his political rants a few years ago, I just adblocked the css for his blog posts.

I never really got to be a command line wizard even after 7 years of using it daily. After some time it struck me that I am somewhat of a wizard in Ruby, and most of the things that I need to do in the terminal would be really easy for me if I could make use of my Ruby skills in a more natural way. I came up with rb [1], which made this possible, and I no longer read man pages of simple unix tools.

[1] https://github.com/thisredone/rb
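In the same spirit, even without rb installed, ruby's own one-liner flags get close (a sketch; `tally` needs Ruby 2.7+, and the `[a-z]+` word definition is my own assumption):

```shell
# Top 5 words by frequency using only stock ruby, no extra gems or tools:
printf 'the cat sat on the mat with the cat\n' \
  | ruby -e 'STDIN.read.downcase.scan(/[a-z]+/).tally
               .sort_by { |_, n| -n }.first(5)
               .each { |w, n| puts "#{n} #{w}" }'
```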

I'm about the closest to a command-line wizard that I know personally, but the only reason is that I grew up on the command-line - I had already graduated college before GUIs were common. I'm not entirely convinced that, in today's era, spending the time it takes to really master the command line is honestly worth it; you need to know how to do some things, but there are probably better tools for most of these jobs. A lot of what I do naturally with sed, awk, cut, paste, uniq, sort and tr could be done just as quickly and naturally in Excel; I just use the command line because I already know it.

That...is actually quite clever. Convenience of the command line, with the expressive power of Ruby. Nice.

> The output of a shell command is text. Programs are text. Once you get into the necessary mindset, everything is text.

This mindset is also how we got the C preprocessor...

Programs are structured text. The juxtaposition is against binary structures (Windows style) that require an intermediary to perform translation for humans.

The C-preprocessor travesty is text, but hygienic macros are also text. The difference is structure.

And the best bit about PowerShell is that you don't have to force everything into text, and then work out how to parse it back out of text.

You are in a scripting language REPL where you can have a hashtable of words and their counts, and keep it structured

    gc words.txt |% { -split $_ } | group | sort count -desc | select -first 5
and when I saw the first one was spaces:

    gc words.txt |% { -split $_ -match '^\w+$' } | group | sort count -desc | select -first 5
But the point isn't that you can do it, or any comment on the length of it, the point is that the output of group (group-object) is structured - the counts are [int] types, the sort is able to pick those out and sort on them without needing to parse them out of text and convert them to numbers, the select (select-object) isn't outputting individual lines of text, so the structure isn't lost when shown on screen, and the output can be formatted for display as columns, lists, tables, from its structure.

I tried it on the text of the article itself.

Here's my cmdline:

# cat file | preserve only spaces and letters (remove punctuation) | convert to lowercase and put each word on one line | use awk maps to keep a count for each word and print map in two columns, count followed by word | sort in reverse numerically by count column, then by word ascending | print top 10 words by frequency

cat input.txt | sed -e 's/[^a-zA-Z ]//g' | tr -s ' A-Z' '\na-z' | awk -F ' ' '{ if($1 in count) { count[$1]=count[$1]+1 } else { count [$1] = 1 } } END { for(k in count) { printf("%d\t%s\n", count[k],k) } }' | sort -k1,1nr -k2,2 | sed -n -e '1,10 p'


27 the

21 to

15 a

11 is

10 of

10 that

9 but

8 text

7 was

6 and

Here's my version using a different "command line": chrome devtools

Object.entries(document.body.innerText.replace(/[^a-zA-Z\s]/g, '').toLowerCase().split(/\s+/).reduce((o, word) => {o[word] = (o[word] || 0) + 1; return o}, {})).sort((a,b) => b[1] - a[1]).slice(0, 10)

Plain JS is missing some niceties, but being able to crunch some data from a website using the same iterative approach as a shell command line has its place!

This is pretty cool, I haven't really used the console for something like this in a while, and it's a good reminder for its power.

Instead of

  { if($1 in count) { count[$1]=count[$1]+1 } else { count[$1] = 1 } }

this does and means exactly the same thing:

  awk '{count[$1]++} END {for(k in count) print count[k],k}'

Why bother 'changing' the input field separator? No need. By default it's already space.


I wasn't sure what awk would do if there was no entry when it first encountered a new word.

And yeah, I left the -F in there by mistake.

But I don't get overly fussy with pipelines like this. I usually just hack them up as a one off then forget about them.

Some more simplification suggestions:

  first sed    -->   tr -dc 'a-zA-Z \n'
  second sed   -->   head
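Putting both suggestions together, the pipeline might collapse to something like this (a sketch; `input.txt` stands in for the article text, untested against the original):

```shell
# Hypothetical sample input
printf 'The quick brown fox -- the fox!\nThe dog.\n' > input.txt

# sed 's/[^a-zA-Z ]//g'  ->  tr -dc 'a-zA-Z \n'  (delete instead of substitute)
# sed -n -e '1,10 p'     ->  head                (10 lines is head's default)
tr -dc 'a-zA-Z \n' < input.txt \
  | tr -s ' A-Z' '\na-z' \
  | awk '{count[$1]++} END {for (k in count) printf "%d\t%s\n", count[k], k}' \
  | sort -k1,1nr -k2,2 \
  | head
```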

To me it's a similar principle to REPL-based development: you compose or chain tools, testing each tool (or deciding between tools) at each step.

As with most programming, there are usually many ways to solve a problem. Sometimes the different approaches are equally good, but sometimes the approaches are more or less ideal given the constraints.

There's no real magic, and it's really not difficult. To me, problem solving like this, and ending up with a "script" (just a chain of piped commands with some args) is often the easiest way to solve problems.

I tend to agree here, as someone who is both command line wizard and REPL-based developer.

In fact, the majority of my Ruby scripts for systems work are combinations of the language and %x'd unix commands. I find it's incredibly powerful.

For more on Knuth, see https://franklinchen.com/blog/2011/12/08/revisiting-knuth-an... and also a C version in http://www.cs.upc.edu/~eipec/pdf/p583-van_wyk.pdf

I think the main thing that stands out to me here is just how long ago it was; Knuth was writing just past the assembler era. None of the modern infrastructure was available, such as languages with builtin dictionaries like Perl and Python.

However, software compositionality is one of those things that's been tried in so many not-quite-successful ways. Pipes? Objects? Interprocess object systems (COM/OLE) or object brokers (CORBA)? RPC? Microservices? Packages? Object pipes (powershell)?

One of my more enjoyable side projects was improving parsing of some truly horrendous log files - massive and poorly formatted - from ~2 hours to ~2 minutes.

The code I was given was several hundred lines of python. All I did was replace all but the final parsing function with a single "grep | sed | awk | grep | uniq | sort | grep" line and called it a day. I'm sure I could have taken it farther, but for the amount of effort it was a fun little exercise.

I can't beat this for terseness, but I think I could whip up something close (say 15 lines) in Python (readable) or Perl (which would be shorter than the Python but more readable than the Unix pipeline).

I guess I am more curious about what Knuth did! Did he just comment it a lot for "literate" code?

As always I feel the Python (or Ruby if I could write it) is the sweet spot between being easy to read and easy to write.

Knuth was writing in Pascal, a language that had many limitations (e.g. it didn't even have a proper string type; "array of 15 characters" and "array of 16 characters" were distinct types) -- see Kernighan's article "Why Pascal is not my favourite programming language" from about the same time. So he had to implement a lot of the primitives in the program, and also more complicated data structures like a hash trie in this case, and the point was to demonstrate how to do that without the program becoming unreadable. I've been reading some of Knuth's Pascal programs recently and writing down notes for myself; I ended up discussing some of these points here: https://shreevatsa.github.io/tex/program/pooltype/#other-pro...

(A slightly better comparison may be with the source code of the Unix programs (tr, wc, etc), rather than with shell pipelines composing those programs.)

Read the original sources for more:




Thank you for those links and your post!

I always kind of duck when I see "readable" and "Perl" in the same sentence. You never know when jokes are gonna start flying.

Perl would also be a lot faster than the shell. It just does less work. So if the file is big, you would probably prefer Perl.

  perl -lne 'map { $words{lc $_}++ } split /\W+/,$_ }{ print "$words{$_} $_" for (sort {$words{$a}<=>$words{$b}} keys %words)'

15 lines? You must be forgetting about the Counter class from collections.
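For the record, the Counter version really is tiny (a sketch; the `[a-z]+` word definition and the sample text are my own assumptions):

```python
# The word-frequency task via collections.Counter: count words, print the
# ten most common as "count word" pairs.
import re
from collections import Counter

text = "The quick brown fox -- the fox! The dog."

words = re.findall(r"[a-z]+", text.lower())
for count, word in ((n, w) for w, n in Counter(words).most_common(10)):
    print(count, word)  # first line printed is "3 the"
```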

I think there's an XKCD about importing your solution from the standard library.

"import antigravity" - https://xkcd.com/353/

> The hard part on the path to becoming a command line wizard, or any kind of wizard, is thinking about how to apply existing tools to your particular problems. You could memorize McIlroy’s script and be prepared next time you need to report word frequencies, but applying the spirit of his script to your particular problems takes work.

It definitely takes a bit of work, but hopefully it's still a lot less work than writing the verbose solution.

The hard part of mastering unix one-liners for me are two things:

- Figuring out which tool can pick apart or reconstruct a specific syntax robustly. I think I spend more time figuring out input parsing and output presentation than anything else.

- Knowing when the right time is to ditch the clever one-liner and upgrade to a real program. There's a benefit to prototyping as a one-liner, but there are cases when replacing the result with a long program is appropriate, especially in a production environment. But, then again, only if the one-liner isn't robust or scalable enough, and usually it probably is.

A verbose solution is repeatable, debuggable, and understandable six months or even six years from now.

A verbose solution can be deconstructed and understood by a newbie.

A oneliner? Not so much, if at all.

That is occasionally true, but in 30 years I've absolutely definitely seen a lot more the other way around, personally speaking.

Good POSIX one-liners usually don't need to change and aren't hard to understand. A verbose program, on the other hand, only lasts if it's like K&R C with zero dependencies. All the languages and libraries have changed considerably, scripting languages and versions come and go, C++ is quite different and might not compile later, and build systems have backward incompatibilities.

Plus, a one liner is, like, one line. It's really not that hard to understand unless it's intentionally crazy, and even then, even if it takes a half hour to work through, that's still less work than something like Knuth's 15 pages of code.

We could be imagining pretty different scenarios, though, so feel free to share examples. Normally a pipe consisting of a combination of sort, cut, head, and uniq is trivial to grok, and it would be very wasteful to re-implement that kind of thing on your own. Here are a few one-liners I've made use of recently that I love:

  # count the function calls in log.txt

  grep -Po '^\K.+(?=\()' log.txt | sort | uniq -c | tee callcounts.txt

  # sum the timings 
  # input: 2 columns, time & name. output: 3 columns: time_sum, name count, name
  # It's basically a pivot table in one line of code

  < time.txt perl -lane '$k{$F[1]}+=$F[0]; $c{$F[1]}+=1; END{print "time\tcount\tcall"; print "$k{$_}\t$c{$_}\t$_" for keys(%k) }' | sort -n | tee time_grouped.txt
In my own personal repo, the bash scripts and one-liner aliases are outlasting all my Python and C++ code. The verbose programs that I wrote myself are harder for me to maintain.

The question I have for Dr. Cook is, "What is this wizardry worth in today's world of computing?"

It seems for every concise one-liner that requires no internet access there is a complimentary website someone has set up to perform this same task "for free" on a remote computer and return the results in a web page.

I never use those because I still enjoy the command line. People who design software and websites today, and those who review their work in the media, often refer to the command line as some sort of purgatory. God help the user who is faced with a command prompt. For me, this is more like a haven from the world of gratuitously graphical software and web applications that are too often either bloated, overly complex, inflexible, opaque, slow, or some combination thereof.

It is not that the command line is pleasing in an absolute sense. It is only pleasing in a relative sense, when contrasted with the alternatives. If the alternatives are painful enough, then moving to a command prompt is a relief.

It's also worth quite a bit (IMO) that the data doesn't have to leave your computer.

I think this is just the hard part of programming in general - mapping your problem into something that the tools you know how to use can solve.

I don’t think there is anything fundamentally different between when your tools are Unix cli tools or a programming language. It is the same skill.


Isn't

    tr A-Z a-z | tr -cs a-z '\n' | sort | uniq -c | sort -rn | sed ${1}q
equivalent to

  tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c | sort -rn | sed ${1}q
but 4 characters shorter? Or am I missing something?

McIlroy's claim was not that his script is the shortest possible way to implement the task, but rather that the Unix philosophy of combining general-purpose tools operating on text streams is a better engineering approach than Knuth's special-purpose program, because the general-purpose tools can be reused in many ways.

McIlroy: "A first engineering question to ask is: how often is one likely to have to do this exact task’? Not at all often, I contend. It is plausible, though, that similar, but not identical, problems might arise. A wise engineering solution would produce—or better, exploit—reusable parts." [1]

[1] https://www.cs.tufts.edu/~nr/cs257/archive/don-knuth/pearls-...

My initial comment was easy to misunderstand, sorry: I was just wondering whether the additional A-Z check was doing something I was not aware of.

It's slightly more efficient to squeeze non-alphabetic characters first and then convert to lowercase, rather than vice versa, because in the former case most of the non-alphabetic characters only get processed once, compared to twice in the latter case.

Yes, it's shorter, but we're not playing golf. It's conceptually pretty much the same, and then you just go with whichever one you came up with first.

Yours says:

    Convert to lower case
    Convert non-letters to newlines
The original says:

    Convert non-letters to newlines
    Convert to lower case
Conceptually the same. The point is, most people don't seem to be able to do this at all, although there are many who can tinker with a given solution and improve it around the edges. It's the ability to come up with the initial version that seems to be rare.

Is it really that rare, though, and if so, what makes it rare? I find that when I am doing this sort of thing, the issues seem to be the same as when writing a conventional loop - in this case:

    Split the stream of text into words.
    Normalize with respect to case.
    Accumulate a count of each distinct word.
    Sort by the counts.
    Print the number needed.
Furthermore, I would guess that the corner cases (such as hyphenated words) are as likely to trip someone up in either approach.

Perhaps what makes it rare is not having seen an example before - but one example is enough to demonstrate the principle, thanks to its elegant simplicity.

The hard part used to be in remembering which utility has an option that performs the transformation you are looking for, but practice helped with that, especially once one knew enough to guess which utility probably had that feature, at which point one brought up its man page. These days, of course, it is usually easy to search for the solution you want.

My first reaction was to wonder why `tr` was required at all - neither conversion struck me as necessary. These days both `sort` and `uniq` can do case-insensitivity:

    sort -f | uniq -ic
So I just allow the shell to do word splitting:

    for x in `cat foo.txt` ; do echo $x ; done | sort -f | uniq -ic | sort -rn | head -<num>

Don't abuse command line parameters just to do word splitting. There are at least a dozen obvious ways to do that without leaving your reader guessing what that loop does. Avoid being clever. If you want to change word separators to line separators then just do that (with tr, sed, or some other way) and the purpose will be obvious.

And please be mindful of looping over user supplied data like that. Those things are typically the ones that works fine in test but blow up in production. The environment has a maximum size, and you shouldn't store large amounts of data there.
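One way to get the parent's word splitting without pushing the whole file through the argument list (a sketch; the sample `foo.txt` here is hypothetical):

```shell
# Hypothetical sample file
printf 'Foo foo\tBAR\nbar baz\n' > foo.txt

# Translate runs of whitespace to newlines in a stream, then count
# case-insensitively as in the parent comment; no shell word splitting needed.
tr -s '[:space:]' '\n' < foo.txt | sort -f | uniq -ic | sort -rn | head
```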

Doesn't that have different behaviour if there is punctuation in the file? On my phone and not in a position to test it, so I'm not sure, but I think it's different.

It does indeed. Good catch :)

$ cat test.file | tr -cs A-Za-z '\n' | tr A-Z a-z > diffFirst

$ cat test.file | tr A-Z a-z | tr -cs a-z '\n' > diffSecond

$ diff diffFirst diffSecond


No differences with my test file.

Wondering if there's a utility for finding shorter command strings with the same output.

jq has made it easier to munge complex data from many non-unixy apps, but then you have to learn how to write complex jq expressions. Something like SQL applied to JSON might make this easier.
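For a sense of what the jq expression for a SQL-style GROUP BY looks like (a sketch; the input shape is hypothetical):

```shell
# Count a stream of JSON objects by a field, roughly
# SELECT word, COUNT(*) ... GROUP BY word ORDER BY count DESC:
printf '{"word":"the"}\n{"word":"cat"}\n{"word":"the"}\n' \
  | jq -s 'group_by(.word)
           | map({word: .[0].word, n: length})
           | sort_by(-.n)'
```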

I find myself frequently reaching for 'gron' when I need to deal with JSON in a script. It's a bit simpler than 'jq', both in usage and in capability.


I managed to write a Telegram bot in ash (in OpenWRT) because it had a little program called jsonfilter. It's not as powerful as jq, but quite useful.

jq is an amazing tool, and I wish I used it frequently enough to remember how without looking it up.

I've mostly switched to using gron [1] for these applications -- the operation is much simpler and doesn't depend on learning a new DSL for manipulating json. I have to manipulate JSON data just enough to want to have a tool better than just firing up node and running something, but not a complete new language. With gron I can just wire it up to a bunch of standard line-based utilities.

[1] https://github.com/tomnomnom/gron

While I get what gron is trying to do, I do not have an intuitive grasp of how to use it. Do you have an example that you wrote/learnt from?

Is the entire question becoming obsolete?

I'm something of a command-line wizard (well, given enough time, if I'm honest! I always have to re-remember things like awk syntax), but in interviews now, no one asks core Linux shell-type questions (or even theoretical kernel-type questions). They primarily ask:

- coding (70+%)

- diagramming infra type questions

- experience with keywords ("CI/CD" and "containerization" being the biggest keywords).

Recently I interviewed with a 'startup' that's in the real estate space and I was told by a manager, after touting my Linux infrastructure skills, "Oh, I don't even care about that anymore. I can teach all there is to know in a couple of days. It's all AWS and tooling here."

This is in the DevOps space.

Sounds like the new generation can give 0 craps about the command-line?

Just because suits don't ask for it in interviews doesn't mean it's not relevant for your day job.

Also works the other way around: How many binary trees are you balancing each day?

I love that quoted new-line. That's exactly the amount of crazy idiosyncrasy I expect from shell one-liners.

It's nice, alas it has some problems with apostrophes and dashes.

    perl -pe 's/[\s\.,]+/\n/g'
works better. Nowadays you could also get rid of the extra "tr" to convert to lower case and use the new-ish "-i" option to make uniq ignore case. That wasn't available in 1986, however.

You'll probably enjoy https://unix.stackexchange.com/a/503371/5132 then. (-:

Of course this raises the philosophical question of whether a "one-liner" is in fact a one-liner if it has two lines.

Are we just going to ignore the (rough guess) thousands of lines of code that make up `tr`, `sed`, `uniq`, and other CLI tools? I enjoy hacking together nifty solutions on the shell, but the tone of this article irked me.

That was not implied.

A key skill in good programming is choosing tools at the correct level of abstraction for the problem. Here, the higher-level unix tools are clearly the better choice.

Saying "that's cheating" because the high-level tools are themselves written with lower-level tools with many LOC is missing the point.

> Knuth and McIlroy had very different objectives and placed different constraints on themselves, and so their solutions are not directly comparable. But McIlroy’s solution has become famous. Knuth’s solution is remembered, if at all, as the verbose program that McIlroy responded to.

He admits they don't compare, but does it anyway. I shouldn't take offense... but for some reason I do!

Even if you wrote the program in C, you're still using someone else's work (libc), not to mention the kernel.

Unless you're planning on writing raw assembly, you're using someone else's program/work anyway.

If you're writing raw assembly, you're still using the chip designer's work. Better start taping out your own chips, and god help you if you use a standard toolchain for chip design. ;)

"If you wish to make an apple pie from scratch you must first invent the universe". --Carl Sagan

Just posted this to HN some days ago, in another thread, but it's relevant here:


Post has rough Python and shell solutions to the problem Bentley proposed.

The solutions can be useful for beginners to Python or shell.

Your "shell" solution isn't really a shell solution, it's an awk solution.

It's really valuable to know awk, but it's a bit misleading to claim that "it's shell".

This is just pedantry.

I read your other replies and I can't see them adding much value to the discussion.

You are just being rigid about semantics.

What action are we supposed to take having read your comments?

To be careful of not calling a pipeline with awk in it 'shell?'

To be ashamed of using awk because it's a programming language, and that is 'cheating'?

Wow, I would never have expected my comments to have been interpreted that way. Genuinely shocked.

Thank you for the feedback.

The awk language and shell are now in one bundle called POSIX. A POSIX shell environment is not conforming if it doesn't have an awk command. A conforming POSIX implementation could make awk a shell builtin.

That I did not know - thank you. We here still think of awk as fundamentally different from other pipeline facilities such as tr, sed, sort, uniq, and so on, but I can see why it could, perhaps should, be thought of as being "shell".

I guess I was triggered by the fact that the proposed shell solution is:

* not on a command line (although it could be),

* is significantly longer than the original command line solution, and

* gives a different result.

But you're right, it's shell. I might, however, given my background, and remembering as I do its first introduction, always have trouble thinking of it as such.

> We here still think of awk as fundamentally different from other pipeline facilities such as [...] sed

If you consider awk "not-shell" because it's an entire language, then it's really inconsistent to consider sed "shell". sed is a stream programming language. For example, this is a Sudoku solver written in sed: http://sed.sourceforge.net/local/games/sedoku.sed.html

Actually, we are pretty marginal on sed, but point taken. It feels like there's a difference between "stream mode" and "program mode".

I remember when awk was first implemented. sed was already standard, and awk was this new thing. I love it, and for some things it's my "go to" language. That colors how I think of it - I think of it as a language.

But this has been done to death, everyone is jumping on me, so there seems little to add.

>But this has been done to death, everyone is jumping on me, so there seems little to add.

No one is "jumping on you", as you put it, neither the other people who replied to your comments, nor me. It's absolutely normal in a tech forum (more so in such ones, because they are mostly fact-based) and in fact even in any online or real-life forum, for that matter, for people to point it out if they think some statement a person has made, is wrong.

In fact, you did exactly that (point out that you thought I was wrong - see your comments that I've quoted just below), in your top-level reply to my original comment about my having created solutions in Python and shell, which is what started this whole sub-thread. Going by your logic, I should have complained about you jumping on me :)

Here's where you pointed out that (you thought) I was wrong:

>Your "shell" solution isn't really a shell solution, it's an awk solution.

>It's really valuable to know awk, but it's a bit misleading to claim that "it's shell".

Not only that, you claimed that it was misleading, without having any way to know whether I had any intention to mislead or not. Come off it. You cannot read my mind. It's mine, not yours. Come to think of it, that was a poor judgement call on your part, too, because even if I had some intention of misleading people, which I did/do not, what could I possibly gain by misleading them about whether some code is a shell solution or an awk solution? The whole "misleading" idea is a figment of your imagination, or of your unclear thinking, I'm sorry to have to say.

Anyway, this thread has gone on for too long, with barely any benefit to anyone. I'll just briefly touch on that fundamental flaw in one of your points about my work, that I mentioned in another comment, and then be done with this whole thing. Doing that in a separate comment.

>I remember when awk was first implemented. sed was already standard, and awk was this new thing. I love it, and for some things it's my "go to" language. That colors how I think of it - I think of it as a language.

Google for "sed is Turing complete" and see the results, including the post at catonmat.net - Peteris Krumins' blog :) There's even an HN thread about it.

I can pull off some ok stuff with the shell toolkit, including awk and sed, but that's a whole different level.

Whoever authored that, mad props.

If you ever have a low moment, worrying about the marketability of the skills you have developed in your side interests, you can always think of these people who think it's a good use of their time to learn how to develop a Nintendo Game Boy emulator in sed.

>Your "shell" solution isn't really a shell solution, it's an awk solution.

>It's really valuable to know awk, but it's a bit misleading to claim that "it's shell".

It's more than a bit misleading to accuse someone of being misleading, without checking your facts first.

Maybe you spoke too soon, without reading the full post, as I find people sometimes (often?) do (not just in reply to my comments, but to those of others too), not just on HN, but on many forums.

Read both the header comment and the last line of the script you called "an awk solution":

Header comment:

# bentley_knuth.sh

Of course, the filename extension does not make it a shell solution instead of an awk solution, but I used .sh because I knew what I was doing [1]. See next point.

Last line of that script:

    ' < $2 | sort -nr +1 | sed $1q
So the script uses all of awk, sort, sed and shell - which you seem to have missed, maybe because you only skimmed the first few lines before replying here. That makes it a shell script in my book; plus, see below.

The $2 and the $1 - in the $1q bit - are shell command line parameters, because this code is invoked from the shell as a script, with arguments passed. Also see the pipe symbols (|). All these are part of shell syntax, not awk syntax (although awk has $1, $2, etc. too, they have a different, though related meaning - in fact, I use a $i in my awk script within this shell script too). The whole script is a pipeline, that pipes the awk script's output to sort and sort's to sed.
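To illustrate the distinction above, here is a minimal sketch (the `set -- hello` line just simulates invoking a script with an argument, in place of a hypothetical `./demo.sh hello`): the shell expands its own positional parameter `$1` before awk ever runs, while awk's `$1` (the first input field) survives untouched because it sits inside single quotes.

```shell
# simulate a script invoked as:  ./demo.sh hello
set -- hello                  # the shell's $1 is now "hello"
echo "shell saw: $1"          # shell expands its own positional parameter
echo "a b c" | awk '{ print "awk saw: " $1 }'   # awk's $1 is the first input field
```

This prints "shell saw: hello" followed by "awk saw: a" - same symbol, two different expansions, done by two different programs.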

I don't know if you are a shell/awk newbie or not, but regardless, you missed those points above, like I said, probably due to haste.

[1] Check out this recent article by me in Linux Pro Magazine:


for a bit of shell quoting magic along with some use of awk.

And this post:

UNIX one-liner to kill a hanging Firefox process:


and also the interesting comments on that post, from which I learned some things.

After having worked for years on Unix platforms (as both a dev and system engineer), from even before Linux was created, I think I know the difference between shell and awk (and a bit more, although still do not claim to know everything, or even close).

On a more positive note, I'll use this as an opportunity to put in a plug for my Python and Linux training offerings :) Course outlines and a couple of testimonials here:


I read your entire post, and I didn't miss any of the points you make. I simply disagree with you.

Quoting from your post:

    And here is my initial solution in UNIX shell:

    # bentley_knuth.sh

    # Usage:
    # ./bentley_knuth.sh n file
    # where "n" is the number of most frequent words 
    # you want to find in "file".
    awk '
            {
                for (i = 1; i <= NF; i++)
                    word_freq[$i]++
            }
    END     {
                for (i in word_freq)
                    print i, word_freq[i]
            }
    ' < $2 | sort -nr +1 | sed $1q
So you invoke awk, and then run the output of awk through sort and sed.

You're doing all the word counting in awk.

Yes, you're invoking awk from a shell script, but that's really not the same thing as "using shell." McIlroy’s solution is genuinely shell:

    tr -cs A-Za-z '
    ' |
    tr A-Z a-z |
    sort |
    uniq -c |
    sort -rn |
    sed ${1}q
"awk" is generally accepted as a full programming language, whereas "tr", "sort", "uniq", and "sed" are command line utilities. I don't think "awk" classes as a command line utility, so I don't class your solution as "shell".

Perhaps you don't agree, perhaps you think "awk" is a command line utility. If so, then we'll agree to disagree.

Wow. Multiple misunderstandings on your part in one single fairly short comment. I'll of course reply to it, substantiating what I said as best as I can, but it's late here, and when replying to an argument I prefer to be thorough, so I'll do it, hopefully, by tomorrow night my time, or a day later if I'm too busy. I think the reply link should be alive until then.

Meanwhile, you might want to scrutinize your own reply (the one to which I am replying here) and think a bit more deeply about what might be wrong with it. And until my full reply to come later, here are a couple of hints:

Hint 1:

>"awk" is generally accepted as a full programming language, whereas "tr", "sort", "uniq", and "sed" are command line utilities.

- a tool can very much be both a full programming language as well as a command line utility at the same time. awk falls into that category [1], as do many other Unix commands. Who made up a rule that it cannot be both at the same time? You?
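As a concrete illustration of that point, here are two full programming languages being used as ordinary command-line utilities in a pipeline (awk and python3; perl would work the same way):

```shell
# both awk and python3 acting as plain pipeline stages,
# even though each is also a full programming language
printf '3\n1\n2\n' | awk '{ s += $1 } END { print s }'                                # prints 6
printf '3\n1\n2\n' | python3 -c 'import sys; print(sum(int(l) for l in sys.stdin))'   # prints 6
```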

Hint 2:

Check out your line:

>So you invoke awk, and then run the output of awk through sort and sed.

and compare and contrast its meaning with the meaning of your few lines immediately below it, including the one that says "McIlroy’s solution is genuinely shell:". Try to see the similarity/difference/contradiction.

[1] Finally, read the book The Unix Programming Environment, a classic, by Kernighan and Pike (Unix pioneers). I cut my Unix teeth on it, years ago, although, of course, years do not mean I am right and you are wrong. Facts do. There are chapters in the book on awk and sed. And IIRC they come under the topic of filters (maybe even the chapter name is that) a.k.a. command-line utilities, although not every such utility needs to be, or is, a filter. I think you have some confusion about terms and their meanings, and/or are assigning your own meaning, even though you use words like "generally accepted".

Also skim this article (published by me, years ago, on IBM developerWorks) to get your fundamentals more clear:

Developing a Linux command-line utility:


Enough for today - will do follow-up comment as I said, if needed, in a day or two.

Then we'll agree to disagree. I think you are wrong on so many points here, it's clear we're not going to agree, and probably won't find common ground.

Thank you, by the way, for your references to various published material. FWIW, I've worked with BCPL, C, AWK, C++, Unix, Linux, GNU, and much, much more, for the last four decades or so, so I'm not inexperienced, and I have read most of the classics. That also doesn't mean I'm right, but it does mean that I have a basis for my opinions.

So thank you for your offer to school me, but I'll decline, and, as I say, accept that we disagree.

>Then we'll agree to disagree. I think you are wrong on so many points here

Thanks for casting aspersions without even so much as a mention of what the "so many points" are that I am supposedly wrong on.

When I said upthread that you have misunderstandings, I at least mentioned some and hinted at or gave a clue to what the others were.

Also interesting that when kazinator said to you that awk is part of shell, you meekly accepted that he was right, thereby contradicting your earlier claim that my shell solution was not a shell but an awk solution. And in that same comment ( https://news.ycombinator.com/item?id=19279030 ) accepting it, you still seem to be neither here nor there, by your own words, where you say things like you "see why it could, perhaps should, be though (sic) of as being "shell", but "always have trouble thinking of it as such".

Just happened to see your reply here before I went off to sleep:


Interesting and maybe significant that you say: "Then we'll agree to disagree. I think you are wrong on so many points here, it's clear we're not going to agree, and probably won't find common ground."

First, interesting that you say "I think you are wrong on so many points here ..." but do not deign to offer any points to back up your statement. Kind of a cop-out, looks like. Anyone can say someone else is wrong; such statements do not carry any weight unless backed up with something more substantial.

And about your "four decades", like I said in a previous comment, years or age do not matter, facts do. I care not a whit if the person I am arguing with has 4 years or 4 decades or 4 centuries of experience. They (or I) can still be wrong (or right) about any specific topic we happen to be arguing about. I've been known to acknowledge that I was wrong, in arguments with people less experienced than me, many times, and vice versa has happened too.

Nor is finding "common ground" the goal (this is not some sort of compromise between political parties, it's a technical argument). Getting things right is the goal. For which, sometimes one party or the other may have to admit they are wrong - including me. Just that I do not think I am wrong in this case.

Will still write my fuller reply as I said earlier, to keep my word, and to make the picture more clear for other readers, since you have made these statements, even if you have hastily left the conversation.

OK, so as kazinator has pointed out, awk is now a mandatory part of Posix, and so is genuinely a part of "shell". My reply there says that I and my colleagues still think of awk as fundamentally different from other pipeline facilities such as tr, sed, sort, uniq, and so on, but I can see why it could, perhaps should, be though of as being "shell".

So it's shell. I might, however, given my background, and remembering as I do its first introduction, always have trouble thinking of it as such.

It's not a cop-out, we disagree.

> Nor is finding "common ground" the goal (this is not some sort of compromise between political parties, it's a technical argument).

We disagree. When there is a disagreement, finding what you agree on is the first step in finding where the lines of reasoning diverge. Finding common ground is the first step in resolving differences.

> Getting things right is the goal.

Sometimes in software there are judgement calls. Maybe this is one of them, maybe our definitions differ. Sometimes definitions differ because of context or experience. In each case, the terms used are not right or wrong, they are definitions that are useful in the context.

> For which, sometimes one party or the other may have to admit they are wrong - including me.

This is not an "I'm right, you're wrong" situation. By my experience, in my context, what you wrote would be called a "shell solution" in the same sense as the original command-line solution would be called a "shell solution."

You think that invoking AWK from the command line means that it's still a command-line script. Your definition of the terms means that you accept that invoking AWK still lets you call it a "shell solution."

I think that is fundamentally and structurally different from using command line utilities such as tr, sed, sort, and uniq.

So my position is clear - your solution that you call "shell" is not, in my opinion, just "shell". To me, your solution is an AWK solution, and you feed the output from your AWK program through shell utilities.

You are using the terms in a manner that is different from how I'm using them, that much is now clear.

Do you agree that you have written a shell script that invokes a program written in AWK?

Would it be different if you wrote a shell script that invoked a C program by calling a C interpreter? Would you still call it a "shell solution to the problem?"

Does it matter? Really? I've made clear why I've said that I don't class your solution as being shell, why do you care?

Well, I'm a few days late in writing my final point, due to being busy with other work. I know you've probably left this thread by now, since I didn't see any replies to my other challenges to you (about your misconceptions, about your calling some of my points "wrong" without substantiating why, and about your outright waffling, using terms like "could", "should", "maybe", etc., that I referred to elsewhere in the thread). But as I said, I'm not making these replies just for you, but for other readers, and also because you made accusations against me, so as to vindicate myself (although I do not need to, and the choice to do it or not is solely mine - it's just that I choose to do so this time). So here goes - my last comment in this largely futile thread:

I said I would point out a "fundamental flaw" in your points. The flaw is this:

You thought (and said) that my shell solution was an awk solution. That is wrong. It is a shell solution (and not an awk solution) for multiple reasons, which anyone even slightly past beginner level in awk and shell should easily have known if they had their fundamentals clear - which implies that you do not. It is a shell solution because:

1) the entire script is a pipeline (which is obvious to see from the pipe signs used, if you knew your stuff and had paid attention to the code, before writing your first reply). awk does not have the pipeline operator (as meaning send the output of the previous command to the input of the next command). That itself should have told you that it is a shell script, not an awk script. There is an awk command embedded in the shell script, but that is very different from saying that it is an awk script.

2) You said elsewhere in this thread, in reply to kazinator:

>That I did not know - thank you. We here still think of awk as fundamentally different from other pipeline facilities such as tr, sed, sort, uniq, and so on, but I can see why it could, perhaps should, be though of as being "shell".

That statement of yours above is wrong on two counts:

a) awk is not fundamentally different from tr, sed, sort, uniq, etc. It is a Unix command-line command like any other. The fact that it happens to be what you and some others may call a full programming language (not a well-defined term, anyway) does not make it any less of a command-line command. A tool can be both of those at the same time, and awk is. So is Perl. So is Python. So are many other languages. In fact, as someone else said and I hinted at, sed may be a Turing-complete language. So does that suddenly make my script a sed script, just because I used sed in it? But I used awk in it too. So should it be called an awk-sed script? But I used sort too. So now should I call it an awk-sort-sed script? See what I am getting at? No, it should just be called a shell script, because that is what it is.

The shell is a high-level language that orchestrates other programs via its syntax and operators. (See below about the shell's operands being whole programs.) You claimed that the main work of my script (the word counting) was done in awk, and the results piped to other commands, and that it was therefore an awk solution. But it is the shell that is doing the piping, not awk! awk cannot do such orchestration, at least not easily, not without resorting to the system() library function it has, and even that is implemented using the shell (and other things, like the fork and exec system calls - I'm simplifying here).

All shell scripts can consist of any command or combination (not just pipelines [1]) of commands, irrespective of the type of the command, whether it is a programming language or not, what language it is written in, etc. In fact, there is not even a requirement that the commands used in shell scripts all be filters; that requirement is only for shell pipelines. [2]

[1] A shell script can: consist of just a sequence of (one or more) command(s), terminated either by semicolons or newlines, or both; consist of one or more pipelines only; consist of any combo of the preceding. And also other variations, including at least an ampersand (&) terminating the command (or pipeline), which makes the preceding command or pipeline run asynchronously from the rest of the overall command/pipeline/script, if any, i.e. in the "background", as we say in Unix.

[2] Here is a shell script that demonstrates many of the above points:

  # a_script.sh
  foo1 # run foo1
  foo2; foo3 # run foo2, then foo3
  foo4 & # run foo4 in the background
  foo5 > f1 # run foo5, redirect its stdout to f1
  foo6 < f1 | foo7 arg1 arg2 | foo8 arg3 arg4 arg5
Any of those foo* commands in the script could be any command at all, without any restrictions. Only the commands in the pipeline on the last line of the script even need to obey the conventions of filters, which I describe below; the commands on the preceding lines do not.

All this is part of the flexibility, beauty and power of the shell, whether used in scripts or on the command line. Which brings me to my next key point: there is essentially (almost) no difference between typing commands interactively at the shell prompt and invoking the same commands from within shell scripts that are run by the shell. The exact same syntax, with the exact same semantics, can be used in both modes, interactive or script (for all practical purposes, with maybe a few exceptions).

In fact, you can even type for and while loops [3] (including with redirection of their input and output) at the shell prompt. (You can even type if statements at the shell! Same for case statements.) I do it all the time, for throwaway "scripts" such as ones to monitor the execution of some processes, and so on. And many standard Unix books - like the classic, The Unix Programming Environment (UPE), that I mentioned in this thread - show that in examples.

Another thing that UPE says and shows is something to the effect that "the shell is a very high level language - its operands are whole programs (emphasis mine)". That is why we can do things like the example in [3] below, but first, another example:

  while :    # : is a built-in that evaluates to true,
             # saving having to run the true command from disk each time
  do
    ps -aef | grep foo
    sleep 10
  done
This is a script (but can equally well be typed directly at the shell prompt, for the reason I gave above) that monitors the execution of the foo command. Better versions of it, using while and until commands of the shell, are shown in the book, you can look them up. One version may start like:


  while ps -aef | grep foo
  do
     # do something here, or just sleep a bit
     sleep 10
  done
which shows the point about the shell's operands being whole programs - the "ps -aef | grep foo" part is used as an operand in the while condition - and it is a pipeline, bigger even than "a whole program"! This works because the exit code of the pipeline is the exit code of the last command in it, which is grep, so the while condition is true if grep finds a match of foo in the ps output.
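The exit-status behavior that makes the loop condition above work can be seen directly with a two-line sketch:

```shell
# a pipeline's exit status is that of its last command, which is exactly
# why "while ps -aef | grep foo" works as a loop condition
true | false; echo $?    # prints 1 (false is the last command)
false | true; echo $?    # prints 0 (true is the last command)
```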

b) You called tr, sed, sort, uniq and so on "pipeline facilities". They are that, but not just that. "Pipeline facilities" is a non-standard term anyway; a better, more standard term would be just "Unix commands" or "filters". Filters is a standard term for programs that read either filename arguments or their standard input, process the input in some way, and write the results to standard output, thereby enabling the whole Unix pipeline paradigm. Before and apart from being filters, they are also simply normal commands, or programs. Any of those commands can be used either standalone or in a pipeline. In fact, there are other ways of using them too: for example, you can invoke any of those commands (as well as any other executable) as a child process from some program you write in C, Python or another language. You are creating distinctions where none exist, for who knows what reason.
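As a small demonstration of the standalone-vs-pipeline point, here is the same filter used both ways (the `/tmp/notes.txt` path is just an arbitrary scratch file for the demo):

```shell
# tr is a filter: it reads standard input and writes standard output, so the
# same command works standalone (with redirection) or as a pipeline stage
printf 'hello\n' > /tmp/notes.txt
tr a-z A-Z < /tmp/notes.txt      # standalone: input redirected from a file
printf 'hello\n' | tr a-z A-Z    # the same filter used as a pipeline stage
```

Both invocations print HELLO; tr neither knows nor cares where its input comes from.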

I wish there was some tool that could, given a one liner, expand it out into a series of steps to better understand it.

(Or maybe someone wrote a 1 liner for that too?)

Personally I prefer to write small bash scripts over one liners so future colleagues can understand the program's logic.

Perfect, this is why, after almost 10 years, I still come back to HN regularly!


Thanks for this!

In trying to understand one-liners like this, I find that it's valuable to break it down into each step. Run the first step to see what it outputs. Then, pipe that into the second. Repeat for each step of the process. Google or use the man pages, as needed.
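For example, the McIlroy-style word-count pipeline can be grown one stage at a time, checking the output after each step (input is inlined here with printf instead of coming from a real file):

```shell
# step 1: one word per line
printf 'The quick the fox\n' | tr -cs A-Za-z '\n'
# step 2: lowercase, then group duplicates
printf 'The quick the fox\n' | tr -cs A-Za-z '\n' | tr A-Z a-z | sort
# step 3: count and rank
printf 'The quick the fox\n' | tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c | sort -rn
```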


    [msimmons@msimmons-lin ~]$ time cat moby10b.txt | tr " " "\n" | grep -v '[[:space:]]' | sort | uniq -c | sort -g | tail -n 10
       1592 I
       2122 his
       2446 that
       3462 in
       3906 a
       4051 to
       5225 and
       5851 of
      11997 the

    real    0m0.768s
    user    0m0.805s
    sys     0m0.021s

My question now is: where does a complete beginner start?

Google “enough command line to be dangerous”

It lives up to its name and is very much a place to start and nothing more.

Your real problem is going to find enough problems to solve to build up muscle memory.

My advice to you... jump in the water and you'll learn to swim faster. My Linux skills were always lacking a little (I used to dual-boot Windows/Linux). But then I just said fuck it; I was getting tired of Windows. So I removed Windows, and then my Linux skills got better much faster. I wanted to reduce how much I used the mouse. Unplug it and move it to another room. Boom. It's quite a pain in the beginning, but in the end you'll be faster.

Don't rush through tasks, use them as an exercise to learn new things.

When I was starting out, I deliberately took the time to truly master as many of the UNIX tools as I could.

When I would run into a problem like this, I would use it as an exercise to get better at them.

In most cases, especially the first time I used each tool, it took much longer, but it paid off in the long run.

man intro

It's 2019; learning to program like it's 1975 might not be a good use of time.

The Software Carpentry team has stressed the usefulness of shell scripting for years [1]. One nice feature is that the scripts are easily tested as they are developed. For example, biologists find this approach helpful for constructing large data processing pipelines [2]. Because the scripts are plain text, they play well with version control (e.g. git). The tools are free, well tested, and will work on most hardware.

[1] http://swcarpentry.github.io/shell-novice/

[2] https://computingskillsforbiologists.com/

Go read "The Mythical Man-Month", written in your selected year of 1975, and see how relevant it remains today. It's pretty startling, isn't it? That a book written over 40 years ago remains relevant today. Dismissing things because they're "old" is unwise; you just need to be able to separate the wheat from the chaff. Now excuse me while I get back to my "Speedcoding in 21 days" book... https://en.wikipedia.org/wiki/Speedcoding

Sometimes it is wiser to learn what has been tried, before reinventing what didn't work in the past. And tools that survived for 40 years in our industry can't be that bad.

You can use vi and vim, but I'll just use nano.

Are you equating usage of a shell with “1975” programming? How do you think the tools and environments you are using for your work have been created?

A lot of the "modern tools" people use these days like git GUIs are just complicated fronts for their shell equivalents. Shells certainly are not outdated by any stretch of the imagination.

Deutsch limit: "The problem with visual programming is that you can’t have more than 50 visual primitives on the screen at the same time."
