Maybe I'm missing something, but it has always seemed to me that people who complain about shell scripts do not approach shell scripting the way they approach any other programming language. By this I mean that they seem to actively resist the idea that they need to learn the language to get the most out of it.
One of my early jobs involved maintaining a pile of shell scripts that were part of automating a publishing system pipeline. So I picked up a couple of suggested books and learned everything I could about shell scripting. I quickly grew to enjoy working with shell scripts for automation.
Yes, there are quirks and oddities in shell scripting, but the language is very well documented, and good practices and style guides have been around for ages.
Would you expect to understand any language without studying it first?
>By this I mean that they seem to actively resist the idea that they need to learn the language to get the most out of it.
That wouldn't be the case if shell itself weren't so actively resistant to being learned.
First due to the varieties (different shells, POSIX vs additions, GNU vs UNIX/BSD versions, etc).
Second due to the huge inconsistency between app flags, switches, and other aspects, which are an essential part of the shell experience. The whole set (POSIX command-line userland + shell) was never designed as a coherent whole.
Third due to the incredibly bad design of certain shell language aspects and historical cruft, which could very well be improved upon, but never will be, for compatibility reasons (or will, but in a new shell).
Add to that the fact that, unlike "any other programming language", shell is mostly useful for quick one-offs, and any of the myriad syntax details/gotchas/options you've learned from writing a quick command line or script you'll need to look up time and again (because you've forgotten them weeks or months later when you need them again).
From what I can see, everything you listed applies to most long lived programming languages. I can certainly say the same things about javascript and python.
Your last paragraph really highlights it. Why take this approach? When I need to use something time and again, I take a little time to write a clean, clear, and robust shell script to automate that task. Instead of the one liner in my history, it will usually be around 20 lines in a file.
A shell script should be written in a verbose manner and include in comments whatever notes and links to docs are useful. A good one will serve you well for a long time. I have some I'm still using unmodified for >20 years.
Every complaint on HN about Bash is from one of two people: 1) someone who has never read the man page, 2) someone who wishes Bash were Python.
Bash is my favorite programming language, because it's not really a programming language. Rather than being some academic philosophical high-minded solution, it was written by systems practitioners for backwards compatibility and usability. Once you learn it, it replaces 85% of other languages for systems work.
>Every complaint on HN about Bash is from one of two people: 1) someone who has never read the man page, 2) someone who wishes Bash were Python.
Suitcase designer before 1970 [1]:
"Every complaint about luggage is from one of two people 1) someone who has no strength to carry them, 2) someone who wishes suitcases were like carts - as if it makes sense to add four small wheels and a telescopic handle to a suitcase!"
>Rather than being some academic philosophical high-minded solution, it was written by systems practitioners for backwards compatibility and usability.
Rather than written it was accumulated and piled upon.
I am admittedly someone who wishes Bash were Python (ish). What's wrong with that though? Of course I want the REPL features of Bash, its terseness, its builtin commands for all sorts of OS and filesystem operations. But is it wrong to ask for a more modern syntax on the language side of things (i.e. the for loop for example)?
The number one use case of `for` in shell scripting is over the output of `$(ls .)`. And I immediately run into a problem with filenames that have spaces, and then I discover that a stringly typed language has lots of sharp edges.
I'm sure the full list of shellcheck rules runs to many dozens.. I was just trying to address one apparent misperception. And here I thought someone would complain about not putting $f in double quotes, not about * instead of ./* ;-)
Anyway, I did mention find-xargs, but it's also true that your point about leading-dash filenames being mistaken for options can cause trouble in naive invocations if find roots begin with dashes. Doubtless, a lotta gotchas..
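For reference, a minimal sketch of the pattern shellcheck usually pushes you toward (glob instead of parsing ls, quote everything, and use ./ to guard against leading-dash names):

for f in ./*; do
    [ -e "$f" ] || continue            # skip the literal ./* when nothing matches
    printf 'processing %s\n' "$f"      # quote "$f" so spaces and newlines survive
done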
I knew `seq` was available on BSDs. For some reason I thought it was on SVr4-derived Unices too (and therefore figured it was either actually Posix, or ubiquitous enough to be reliable anyway), but I guess I got that one wrong.
Maybe because GNU tends to be a bit more heavily influenced by SVr4 than BSD, historically?
If you're ok with integer math and no switches, you can easily make your own seq.
The next time that you are on an OSF/1 operating system, you can use this portable version. It's also probably faster than the fork() overhead, assuming dash.
seq () {
    if [ -n "$3" ]
    then FIRST=$(($1 + 0)) INCREMENT=$(($2 + 0)) LAST=$(($3 + 0))
    else if [ -n "$2" ]
         then FIRST=$(($1 + 0)) INCREMENT=1 LAST=$(($2 + 0))
         else FIRST=1 INCREMENT=1 LAST=$(($1 + 0))
         fi
    fi
    c=$FIRST
    until [ $c -gt $LAST ]
    do printf %d\\n $c
       c=$((c + INCREMENT))
    done
}
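A quick check of the argument order, which mirrors seq(1):

seq 5         # 1 2 3 4 5 (one per line)
seq 2 4       # 2 3 4
seq 1 3 10    # 1 4 7 10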
The point I was making is that because (to my recollection) GNU tends to mimic SVr4, the fact that `seq` is in GNU is probably why I thought it was also available in SVr4-derived Unices.
But also, I know that `seq` is in the BSDs.
Therefore, if `seq` were in GNU, SVr4 and the BSDs, then a) the chances that it would have been included in Posix are very high, and b) even if it wasn't in Posix, the fact that it's in all 3 of those families of shell tools would make it ubiquitous enough that I'd be happy to rely on it for a personal project that I intended to be widely portable anyway.
(Posix tends to codify existing common practice, rather than designing new features for existing systems to implement.)
Ah, that's kind of like refugees fleeing all kinds of failing states for better places that have rule of law, democracy, human rights, prosperity, and then actively trying to install the norms of the old country in the new place they fled to.
The simplest thing you can do should be the correct thing to do.
It is impossible to do this 100% in a programming language, absolutely.
But shell violates this all over the place for programming purposes. And it's not like the bash man page has a nice, clear "hey, here's how to write bash as a safer programming language" section.
I like bash as an interactive shell. Don't love it but like it enough to not go looking around for "better". As a programming language it's not great. It's convenient, but in terms of discussing which is a bigger minefield of problems, C or bash, I'm not sure. C is worse in practical impact because more people put it on the network, but bash is right up there in how much like walking through a minefield it is. I still use it but I definitely have it linted through shellcheck to do anything serious, and shellcheck always has things to say.
(And yes, bash is far from the only programming language to have this problem.)
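For what it's worth, the closest thing I know of to a "safer bash" recipe is a defensive preamble plus shellcheck; a sketch (each of these options has its own caveats, so it's mitigation, not a fix):

#!/usr/bin/env bash
set -euo pipefail    # abort on errors, unset variables, and failed pipeline stages
IFS=$'\n\t'          # tame word-splitting on spaces
shopt -s nullglob    # unmatched globs expand to nothing instead of themselves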
> C is worse in practical impact because more people put it on the network
Unfortunately, the one thing that I learned from the whole Shellshock vulnerability thing is that a surprising number of things [still?] run on shell/CGI.
From the perspective of writing a parser, advanced knowledge is required.
The lex and yacc parsing tools will be woefully insufficient to write a POSIX shell, as the language is not LR-parsable.
It would have been much cleaner and more consistent if awk had been chosen as the scripting language of the shell, as that had a yacc grammar for most of its life.
As far as I remember that talk, the parsing problems come down to two points:
- Shell aliases are grammar-defying macros in basically the same way C preprocessor macros are, but unlike those they are not separable from the actual execution; so yes, they're a misfeature that defies static analysis, and I don't know why people seem to prefer aliases to shell functions so much (a minimal contrast is sketched after this comment);
- The "modern" $(... $(...) ...) syntax, unlike the "legacy" (per shellcheck) `... \`...\` ...` syntax, requires invoking the parser from the lexer, which, also yes, but as far as I know that’s a problem that every language with general expression interpolation must face—Python’s f-strings tried to sidestep the problem, but I’m not sure the resulting restricted language with weird escaping rules was worth it.
For a talk that was supposed to be about problems encountered while building a parser, I was surprised by how much of a non-event both of the problems were.
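To make the alias point concrete, the contrast is roughly:

alias ll='ls -l'        # textual macro, substituted as the line is read; ignored by default in non-interactive bash
ll() { ls -l "$@"; }    # a real function: parsed normally, takes "$@", visible to static analysis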
"${*##*[ ]}" is a valid POSIX approach to getting the last parameter but it doesn't work in ksh or bash. Those shells apply the pattern to each parameter in turn when used on '*' or '@'.
That text is from a developer who has profound depth.
A portable method to extract the last argument is:
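One such idiom, as a sketch: let a bare loop walk the positional parameters and keep whatever it saw last.

for last in "$@"; do :; done     # ":" is a no-op; after the loop, $last holds the final argument
printf '%s\n' "$last"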
Counting backslashes is unpleasant in any language, and Bourne shell excluding `...` does not seem worse than any other. Pascal-style quote doubling is the only reasonable alternative I know of, and except for occupying one character instead of two it’s not all that much better. (Also, trick question, because the first line, with its odd number of backslashes, is a syntax error if given on its own.)
More importantly, the difficulties humans have with the language are different from the difficulties computers do, and I was mostly thinking about the latter, what with the mention of LR parsing and everything.
Two reasons, one questionably relevant and one difficult to formalize.
The questionably relevant reason is that all well-known classes of declarative syntax specifications that are easy to parse don’t really compose very well, in any direction; the exception being PEGs, which do compose, but in nonintuitive ways, and are also kind of difficult to parse. So if you want a generated parser, chaining a regular-language-based lexer and an LL- or LR-based parser lets you tackle a much wider class of languages than the second step alone.
Of course, hardly any serious language implementations use generated parsers these days, and hand-rolling a scannerless parser is not that much more difficult as far as the mechanics go.
But then, the difficult-to-formalize reason is that people do actually think of languages that way, and separating phases gets you more localized and more understandable errors. This might be an artifact of how language specs are written, but I don’t think so: good frontends will occasionally split the process into even more phases, first parsing a looser syntax than specified and then tightening it up with semantic checks. For example, ISO C prohibits (a + b = c) at the grammar level, by restricting which expression productions are allowed to the left of an assignment, but a good compiler will parse this successfully and only afterwards tell you that (a + b) is not a thing you can assign to. In this connection, the "skeleton syntax tree" idea as used in Dylan[1] seems promising, but I don’t think I’ve seen it developed further.
> Once you learn it, it replaces 85% of other languages for systems work.
It's funny you say no one learns it, then say you could use it for 85% of systems work. Then the next dude comes along who has to maintain it and we're back to square one.
I'm not sure that's quite true. I, like many others, am by no means a Bash expert, and will generally reach for Python if I have to build anything of reasonable complexity.
I (and many, many like me) have however had enough Bash exposure to do day to day maintenance on Bash scripts.
Yes: whitespace sensitivity, magic characters (WITH whitespace sensitivity), no standard library, no Unicode, horrible math expressions, awful string handling, a mishmash of acceptable good practices and outmoded stuff, printlns to debug, subpar error messages, no particularly good IDE, and dynamic typing but really none of the convenience.
I honestly have no idea how anyone got anything done with bash before Stack Overflow. It's all copy-paste, half the things you need to do in modern scripting aren't in the typical UNIX command set (curl, etc.), and there's no dependency resolution mechanism.
But yeah, read the manpage to fix everything. Sure dude.
The article is right though. Command line and pipes is a flow type model, but programming is a different ordered thing in the natural sense:
I’d recommend installing Shellcheck and an associated plugin for your editor of choice. I have the plugin installed for VS Code and open the link presented to me when I do anything “wrong” as it’ll give a good explanation for the “better/more correct” way of doing things.
Bash's primary problem is that everything is a string. PowerShell is a shell that makes for a decent programming language primarily because it passes objects between pipes instead of just strings. (Well, and the fact that 90% of what it's used for is made by Microsoft so there's a degree of consistency everywhere)
ahh, the myth of the pragmatist who outsmarts the academic. one day you will grow up. bash is a terrible language, and no academic background is needed to come to this conclusion
I think what you may be missing is the article's first point: shells are also a high-frequency REPL. The programming language is mostly popular because you can capture/record interactively/incrementally developed logic into "a script" (from typescript(1), script(1), command history, or whatever). What leads people to think they understand the language without detailed study is their many high freq success cases of running commands. Success repetition breeds (over-)confidence.
An alternate system design would be a "translator" or "translation assistant" from the high-frequency REPL/interactive/incremental domain into a more static domain (probably with human edits to correct automated mistakes). Part of what even this design would miss (and is also not elaborated upon in the article) is that the nature of interactive/incremental work is very much that of "prototyping".
Prototyping often involves many simplifying assumptions, not just about newlines in filenames (a bug in the article's xargs usage) or permissions, but really a constellation of them -- whatever simplifies things. The biggest problems with scripts as a final product are more A) identifying and B) walking back all the various simplifying assumptions of the prototype.
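For reference, the usual way to walk back the newline assumption is the null-delimited variant of find/xargs:

find . -type f -print0 | xargs -0 grep -l -- lasagna    # NUL separators survive any filename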
yes, because the mere act of comparing two variables in bash is about 100x harder than in any other language (not to mention it literally straight up adds undocumented RCE vectors to your code, for example when you use [[...]])
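A hedged sketch of the kind of thing meant here, as I understand bash's behavior: with -eq inside [[ ]], the operands are evaluated as arithmetic expressions, and an array subscript inside an attacker-controlled value can smuggle in a command substitution.

x='a[$(date >&2)]'    # imagine this value arriving from untrusted input
[[ $x -eq 0 ]]        # -eq arithmetically evaluates $x; the $(...) inside the subscript runs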
I think that viewpoint comes from how shell scripts are used - many people write shell scripts, but once they reach a certain level of complexity, they reach for python or perl or some other "more real" language. So, people don't want to create complex shell scripts. So they don't put in the time to learn the complexities of the language.
for minutes spent versus knowledge gained, the best bang for buck in my experience is google's shell style guide. i don't agree with all of their rules, but their explanations for each give a ton of insight about the language.
This is so true.
Torvalds wrote git in shell.
You can do great things.
Provided...
a) agreed, RTFM, (man bash is not long)
b) you don't try to do things shells shouldn't do.
c) for things you shouldn't do, you write a clean stable cli interface in some other language.
Often you can do c) in bash temporarily and port only if/when it gets complicated.
(That's what git did)
Did he? Because that’s definitely not what the initial revision of git that’s in the repo shows. It has 1kloc of C, a readme, and a makefile.
I struggle to think how you’d even get to the wonky tree format if you used the shell as your implementation language.
The first shell scripts landed a few weeks later, in 839a7a06f35bf8cd563a41d6db97f453ab108129 and were convenience wrappers to facilitate interacting with contributors (a wrapper around `merge` to merge a single file, a trivial wrapper around `fsck-cache` to prune, and finally a somewhat more involved original `git pull` which used `rsync` to retrieve all of the remote’s object store then would merge its HEAD with yours).
The next two scripts were literally called “example scripts” in their commit messages, and respectively for creating a signed tag and applying a patch.
If I remember correctly, git was originally distributed as several small programs like git-checkout, git-reset, etc., some written in C and others in shell.
Shells need to be good at composing programs together, languages compose libraries.
Pipes and output redirection, running things in parallel and waiting till they end - for some reasons these things suck in some languages compared to what the most basic shell can provide.
Shells let you do unexpected, unintended integrations.
I think languages could do better without breaking
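A rough illustration of how little ceremony the shell needs for this (build_docs and run_tests are placeholder commands):

build_docs & run_tests &       # launch two jobs in parallel
wait                           # block until both have finished
make 2>&1 | tee build.log      # compose programs and capture output in one line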
"Bash is like the neighbor who seems nice and sweet, but maybe they don’t have a life you’d want to have personally. They have some pretty serious character flaws and obviously make mistakes, but in the end they are a nice person who tries hard to make sure the people that depend on them are doing ok.
"Batch [cmd.exe] is the guy who looks like that neighbor, but then turns out to be a serial killer."
...aaand if you actually think this through you will just come to the conclusion
- piping isn't anything special, its just an operator
- piping has like 10 different things you need to account for, just a bunch of stuff is chosen FOR you (heuristically) in the shell. for example IFS. whether its correct for a given pipeline is an entirely different question
- any form of error handling in a pipeline is a syntactic disaster mostly because shell syntax completely sucks
- every second program in the pipeline will literally just execute code from the input until you figure out the secret option to make it not do that anymore
- data has a structure, whether you think it does or not. whenever you fail to account for all possible cases (doesn't help that these tools mostly don't have any specified format) of that (such as in most shell scripts), you are opening holes for bugs. both sides of the pipe must use the same data structure (and since its shell that means encoding it to text which changes depending on your locale and encoding).
- for most cases you don't even need pipes:
a = f()
g(a)
or just g(f())
piping is just premature optimization like lazy evaluation. composing functions is actually simpler, just shells have no (good) way to do it.
Though to be honest, e.g. Bash isn't all that great at composing programs either. Especially when you want to handle failure.
Mostly shells are good at scaling downward, ie having relatively little syntactic (and brain) overhead for very simple things. Especially when you are ok to handle any failures manually.
Stderr output does not indicate that a program has failed. It's just a communication channel to the user that's not affected when you pipe or redirect stdout. So it's useful for programs where you want to make sure that data doesn't get mixed up with messages to the user.
Exit codes are the only true way to tell if some process failed, and programs should set them properly.
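A minimal sketch of that distinction (some_command is a placeholder):

some_command 2>warnings.log    # stderr is just a side channel; it says nothing about success
status=$?
if [ "$status" -ne 0 ]; then   # the exit status is the actual verdict
    echo "some_command failed with status $status" >&2
fi
# in bash, set -o pipefail makes a pipeline report any failing stage, not just the last one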
This is a well-made case - but I'm not sure I buy the central argument. Within some basic limits, I don't think terseness and readability have the contradiction made out here, because in programming we have abstraction, which gives us both.
To take the example command that's given:
cat beef.txt | grep "lasagna" | sort -n | uniq
Sure, writing the logic out for this in something like python straight out the bat with only the standard library might look messy, but with one basic convenience function it could quickly be:
Obviously you have to write the function in the first place, but I'd say if you're doing something like this often, it's easily worth spending that 2 minutes. And if you're not doing this often, you'll have a faster time writing more code, but keeping less heavy lifting of "how does bash pipe together" in your head.
I shared a project here a few weeks ago experimenting with what my dream shell might look like. What surprised me more than anything else was how easy writing a REPL environment actually is. I put a scrappy one together as one person in a few hours, so I don't understand why as developers we've reached general language models before being able to make a powerful but new-user-friendly shell.
Also, a completely unrelated note, but POSIX only allows passing back strings - isn't this true of web APIs too, which we use all the time? How come there's no JSON as a standard passback from programs?
Of course, all of those things would need to be generators. Python accepts that kind of code, but most "Python developers" wouldn't think on it. Haskell is a much more natural fit for replacing Bash. (By the way, Haskell has at least one Bash replacement library that I remember, but it's hard to find).
you could also have the console on our own machine with 1+GHz that passes code to the embedded box to execute, or one of a million other variations on this theme. but nooo we gotta LARP about how well we support legacy systems, which is why our systems are so broken after doing this for 50 years. oh yeah and embedded is dead, it's just high end boxes with ubuntu on them now. and last but not least i hope to god i never have unix on something that actually needs to be embedded, for performance, stability and security reasons.
I've looked into Oil a bit, and I think Andy has the right idea for how to go about updating the legacy scripts. I've written a few local scripts in Oil and it's a definite upgrade from Bash.
A very hard thing to do though is the requirement of generics.
If everything is typed, you may have functions that take an argument and produce another value of the same type for instance. And generics are really, really hard to implement correctly.
Oil is first thing that came to mind reading article. Interestingly the submitted article and the one you linked open in similar way with two bullet points saying shell is made of two things.
I've used click [1] a lot to build Python tooling scripts the past few years. Click usage is "sort of" similar to the author's proposed solution. There's also a small section here [2] that describes some of the issues covered in the article (in context of argparse).
I have had great success creating DSLs using ipython, where you simply define a set of classes and functions on them, then use ipython.embed() (Before it was terminal.embed, so you can see where the experience is dated).
PowerShell fully disregards the POSIX paradigm (good), and has forged its own path in the world of shells.
- one shell does almost everything, rather than having a million different utilities with their own input/output formats
- everything is not a bag of bytes, or a bunch of text to be parsed; outputs are objects with queryable structure
- comes with a massive standard library, even without .NET
- straightforward, (mostly) sane syntax, thoroughly different from `sh` and descendants
I must say, for all the criticism levelled against it, Microsoft has a penchant for creating really useful and unique programming languages and tools, or at least building on existing tools and improving them. C#, F#, .NET, PowerShell, TypeScript, VS Code, VS. And six out of these seven are now cross-platform and open-source.
Why do you think people are still entrenched in this mode of thinking these days? Better tooling has switched over to Node (npm) and Python (pip) for anything more complicated than 50 lines.
> Why do you think people are still entrenched in this mode of thinking these days?
I dunno. Familiarity? I mainly use Windows because it's what I've used for 23 years, even though I have a dedicated SSD to Linux. Maybe for many people, it's not worth unlearning decades of muscle memory and re-learning PS just so they can get object-oriented shell scripting.
I get the feeling Powershell is exactly what the author has in mind when they say
>Historically, operating systems that have fanciful structured interfaces between programs have been left in the dust by Unix, because Unix maximizes flexibility by favoring the lowest-common-denominator interface, which is the string.
at the end, there.
I adore what Posh gives me over bash. I've also seen many people over the years try to use it as Windows's "shittier Bash" without noticing that doing complicated things with strings here is so painful because _you're not supposed to do complicated things with strings._
The readability of Posh is a direct consequence of people learning to work with its stricter, but more composable interfaces.
That's how I tried to use it in the first year of using Windows (after switching back). Once I let go of my entrenched habits after years of Linux, I was able to do much more with PowerShell.
I mean, there's some beauty to having all output as string. It's more predictable. But soon you're jumping through so many hoops to get what you want, it's terrible.
I've been learning PowerShell ever since it was launched, but it took me many years to realize it's actually better than bash. Sometimes you have to learn to stop fighting the tool and embrace it instead.
kichererbsen in german means chick peas, but if you separate the two parts.. and maybe fudge the spelling to kichern, it would literally translate to giggle peas.
in dutch it's kikkererwten which literally translates to frog peas. I always found that funny... makes me want to kichern.
Story time: I've been a diehard bash/zsh fan for the better part of the last two decades (I've been a regular user since 1999, so that's 24 years now). I daily-drove some form of Linux from 1999 up until ~2010, when I switched to macOS (zsh). I only switched to using Windows as my daily driver ~2 1/2 years ago.
Around 2 years ago, I had a script in python that I wrote to process my directory of media files based on the directory name and looked for a subtitle file and pulled that down. For fun, it was a simple enough script that I re-wrote it in bash (WSL2). Six months later, I needed to make a change and was surprised at how much I couldn't read it. I just forgot.
That was OK, I had a boring weekend and rewrote the whole thing in PowerShell. A few things I noticed:
- It was more LOC owing mostly to brackets, etc, but I'm over the whole "less LOC is better" mentality.
- It was faster - but this is probably the result of WSL2, but also, I didn't have to call out to other CLI utilities as much. You can do just about everything in PowerShell and call out to .NET for anything more complex.
- It was far more readable. Having to parse while [ ]; do ... done; is harder than ForEach-Object ( ) { }
- Passing things around as an object is a very pleasant way of scripting (Python could do the same, but it feels more "Programming Language" than "Scripting" at this point)
- Calling a CLI utility is just the same as in Bash (unless it has spaces, then you just do & "Path to CLI.exe")
I had to make a change recently and was pleasantly surprised at how much I could remember the layout, functions, etc.
I also tried to adhere to the MS-isms like using a Verb-Noun for my functions where it makes sense, and breaking things up in modules.
I don't absolutely love PowerShell, but it is pleasant to use and pulling in .NET stuff is just a line of code away.
Now I use it directly as a daily driver, with Windows Terminal - which is also surprisingly pleasant to use.
That is just a case of people trying to use PowerShell as a bash shell combined with the unfortunate decision by whomever at Microsoft to alias PowerShell cmdlets to well known UNIX tools.
It doesn't "silently mangle" binary data - it assumes you're piping PS objects or text streams through. If it is a text stream, then it will pipe through as Unicode by default.
Invoke-WebRequest (curl) pipes out an object, with the content in (object).Content. Using | uses Unicode by default which "mangles" binary output. So it tries to convert the object into a Unicode text stream then into the file.
The correct way to save an image would be to simply set -OutFile (-o) in Invoke-WebRequest.
If you MUST pipe it through, then you need to access the object's .Content, then use Set-Content -AsByteStream (for PS7+) or use Set-Content -Encoding byte (PS5)
This is useful if you wanted to put the image inside a variable (rather than create a temporary file for the image) and pass it around until you figure out what you want to do with it.
>> That is just a case of people trying to use PowerShell as a bash shell combined with the unfortunate decision by whomever at Microsoft to alias PowerShell cmdlets to well known UNIX tools.
>> It doesn't "silently mangle" binary data - it assumes you're piping PS objects or text streams through. If it is a text stream, then it will pipe through as Unicode by default.
The problem is that PowerShell uses aliases that match UNIX command-line utility names and the same pipe syntax, but exhibits subtly different behavior without any indication of the difference.
If someone is not aware of the difference and they use PowerShell to process production data, their data could be corrupted without them realizing it.
> The problem is that PowerShell uses aliases that match UNIX command line utility names
Agreed.
> If someone is not aware of the difference and they use PowerShell to process production data, their data could be corrupted without them realizing it.
Someone coming from UNIX-land shouldn't be making PowerShell scripts day1. They really should be diving into documentation. Microsoft's documentation on PS is _excellent_ and here is one about pipelining: https://learn.microsoft.com/en-us/powershell/module/microsof...
Notably:
> To support pipelining, the receiving cmdlet must have a parameter that accepts pipeline input. Use the Get-Help command with the Full or Parameter options to determine which parameters of a cmdlet accept pipeline input.
then further down:
---
Using native commands in the pipeline
PowerShell allows you to include native external commands in the pipeline. However, it is important to note that PowerShell's pipeline is object-oriented and does not support raw byte data.
Piping or redirecting output from a native program that outputs raw byte data converts the output to .NET strings. This conversion can cause corruption of the raw data output.
As a workaround, call the native commands using cmd.exe /c or sh -c and use of the | and > operators provided by the native shell.
> Piping or redirecting output from a native program that outputs raw byte data converts the output to .NET strings. This conversion can cause corruption of the raw data output.
That Python example is broken in all kinds of ways, and is overly verbose. I would go with something like the following, which does spark some joy in me:
matches = {l for l in open("beef.txt") if "lasagna" in l}
print(*sorted(matches, reverse=True), sep="")
And on the same note, the shell command is weird too, and should probably be:
grep "lasagna" beef.txt | sort -nur
So the shell command is still definitely simpler for this case, but for anything more complex, I'd go with Python.
EDIT: I put back the `n` flag, in case the lines start with numbers.
I noticed the contrived Python example too, but yours will explode if the input is too big, so, it's not equivalent to Shell, which will buffer the input.
Still, even with that in mind, the Python variant could've been greatly simplified. To give the author the benefit of the doubt: they probably made such a contrived example on purpose to underscore their point, but it's hard to find small-scale examples which illustrate verbosity... On the other hand, I might be overthinking on behalf of the author. I doubt they explicitly set out to make a contrived example. They probably wrote what they wrote and were happy with the result because it helped them make a compelling argument, and they didn't think to analyze their own illustrations too deeply to see if they might disagree with their conclusions.
>yours will explode if the input is too big, so, it's not equivalent to Shell, which will buffer the input.
I agree that mine would explode on a massive file, but how would the shell version avoid this, given the need to sort? As far as I know, it sorts entirely in memory.
When the input is large enough, GNU `sort` writes the input in chunks to multiple temporary files, sorts the individual files, and then merges the result for output.
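The relevant GNU sort knobs, as far as I know:

sort -S 64M -T /tmp big.txt    # cap the in-memory buffer and choose where the spill files go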
Your Python version is def neater and more compact. However there's something really elegant about using the | operator to compose commands (or using |> in OCaml/F# or iirc $ in Haskell) that I can't quite articulate :)
My example is a bit stupid but I hope it's clear what I mean. Maybe it's just that as someone who only knows left-to-right languages, the flow of the data matches the order the steps appear? Actually I'm now curious if Arabic or Hebrew speakers feel the same way. Or maybe I'm overthinking this :)
In Forth, you have to reason about how many things each command takes off the stack and leaves on the stack, whereas with pipes, it's always exactly one thing.
Yeah I think it’s one of the less-hard things about Forth. For what it’s worth I quite enjoyed fiddling with Forth, and went through a bit of a “stack computing” phase. I never made the connection with the the pipe operator tho
For me, the last example feels messy for two reasons. First, I have to worry about matching parens. That can feel mentally exhausting even for simple cases. Second, it's written all on one line, so it's harder to mentally separate all the steps. I usually break expressions like that apart into separate lines, but that means finding concise yet meaningful names for temporary variables.
That's much better, but like the original implementation it's still missing the -n flag for sort, which extracts numbers at the beginning of lines and sorts them numerically instead of lexicographically.
Oh, I always thought it only works on numbers, and am very pleasantly surprised to learn it works on numeric prefixes, thank you!
EDIT: After playing around with the python approach, I arrived at this, which is definitely more verbose, but arguably, more maintainable:
import re

NUMERIC_PREFIX_RE = re.compile(r"^(\d+\.?\d*)")

def numeric_descending(line):
    if m := NUMERIC_PREFIX_RE.match(line):
        return -float(m.group())
    return float("inf")

matches = {l for l in open("beef.txt") if "lasagna" in l}
print(*sorted(matches, key=numeric_descending), sep="")
Not sure it is more maintainable than a shell solution. Look at this in two years from now thinking there might be a bug somewhere in there (say you want to include negatives and scientific notation). In the shell, a quick inspection suffices. In this case you have to deal with regular expressions within Python, their own little special language… and it may not spark joy.
Well, it seems you nerd-sniped me a bit, so here's a new version which deals well with scientific notation (negatives were already supported) and is actually simpler, using a split and "it's better to ask for forgiveness..." instead of the regex:
def numeric_descending(line):
    possible_number = line.split()[0]
    try:
        return (-float(possible_number), line)
    except ValueError:
        return (float("inf"), line)

matches = {l for l in open("beef.txt") if "lasagna" in l}
print(*sorted(matches, key=numeric_descending), sep="")
Thanks, and apologies---I didn't mean to come off as negative. Yes, this code is cleaner and certainly maintainable a couple years down the road. It will work in a slightly different way than the shell script (the lack of space in "10oz" of lasagna vs "2.0e1 oz" of lasagna), however I'd consider this a feature rather than a bug, as it could help identify bad formatting in the input data.
I've learned from experience that it is hard to replace certain simple space and memory efficient programs that have been tuned over the course of 40 or more years. If one wanted to print only a single instance of a line in a file, then python's set construct in your snippet is certainly sparking more joy compared to the awk equivalent of printing only the first instance of a line in a file:
awk '!x[$0] {print; x[$0]=1}'
Tuning Python to use ordered dicts to keep the line order looks a little less pleasant (and might appear confusing to the uninitiated), but would still be rather neat. There are many instances of day-to-day data inspection tasks where bash, sort, shuf, sed, grep (or rg), and awk will solve a problem near-optimally out of the box and can combine nicely on modest-memory machines to tackle large data, when some Python-native code (or a poorly designed Python library) might accidentally cause a performance mess. Interactive shells are an awesome superpower that is hopefully going to stay with us for a while. I'd have personally been happy if some type of Lisp REPL had won the shell fights, but we don't live in that world, so Python + POSIX tools via a shell will have to work in practice for most people.
Well it might work as a joke, but I have to say that I had more than enough experiences with Powershell that drastically disprove that notion in my opinion. That is not to say that bash and consorts are much better, but e.g. a shell that decides to arbitrarily violate its normal error handling logic based on internal voodoo heuristics about if it's running in a console or not [1] is pretty much the total antithesis of sanely architected in my book.
As in: Architecture must be the reason someone thought this insane scheme has any merit. And arbitrarily starting to throw exceptions whenever a subprocess dares to write to stderr, especially if the shell doesn't do this in normal operation, doesn't get any other label than "insane" from me.
Also the terrible terrible idea of using common unix shell command names as aliases for PowerShell commands that are similar but don't quite work the same way and take entirely different parameters.
...and that time period where Invoke-WebRequest used IE for parsing by default.
But other than those flaws I agree that it is really great.
> using common unix shell command names as aliases for PowerShell commands that are similar but don't quite work the same way
On any Linux/Unix, when `ls` is called, PS runs the native `ls` and not `Get-ChildItem`. These aliases only exist on Windows, where unless one is explicitly running a Unix-ish shell (WSL, Cygwin, MinGW, MSYS2, etc), PS will use these aliases. Plus, `ls`, `mv`, `cp`, and `rm` never really existed on Windows; the equivalent cmd binaries were `dir`, `move`, `copy`/`robocopy`, `del`.
Finally, this is a fairly tired argument against PowerShell...
It may be a 'tired' argument, but a valid one. I keep using 'dosisms' in PowerShell, but they do not work because those commands are actually something different and you need the params for those items. One that gets me every time is the 'dir /s' command, which does not work in PowerShell because you should use -Recurse, which is the parameter for Get-ChildItem, which is what it is really calling.
I think most people have this issue when jumping between shells. Each one has its own way of working. But adding aliases, I can see why they did it. The problem is it was poorly done up front. Now they are kind of stuck with it and we get to deal with it. Also, you may be expecting all of the sorting options from dir/ls, but you do not get those. You get to pipe it and sort it in a secondary step. Which is fine, but it is a bit of a pain when you are used to something like 'dir /os' vs 'Get-ChildItem . | Sort-Object -Property Length'. It just takes some getting used to, but it seems longer and more verbose to get the same effect.
It is nicer when you want to act on those items (which is where PowerShell shines), as everything is in an object and you can slice it down. But using it for 'shell' things, like where a file is and what is the biggest in a folder, it falls a bit flat and feels like the object model is in the way.
> where is a file and what is the biggest in a folder
You don't need to use the full commands and parameters. The commands have aliases, and parameters can be arbitrarily short (and case-insensitive) as long as they are unambiguous:
gci . | sort Length
gci . -r -inc <filename>
Furthermore, the PowerShell interpreter has an extremely powerful autocomplete, so these should be straightforward to look up.
If you are thinking I am a sarcastic detractor, I am not. I whole heartedly believe PowerShell is superior in every way that matters to more common POSIX shells.
I was unaware of the difference between Windows and Linux in this regard as I don't tend to use PowerShell on the few Linux systems I run. However, I maintain that it is still a mistake to alias them in Windows because anyone coming from a POSIX shell background will be misled.
> the equivalent cmd binaries were `dir`, `move`, `copy`/`robocopy`, `del`.
Nit: only robocopy is a binary in the list. The rest are built in to the command shell.
My sarcasm detector led me into thinking you were a sarcastic detractor (heh).
I get your point though; perhaps PowerShell should've stuck with the initialism aliases only (`gci`, `cpi`, `ri`, etc) and people wouldn't have nitpicked on this as though it was some powerful argument against PS.
Perhaps there could've been a banner saying `hey, this is Windows, use `gci` instead of `ls` with these xyz options`.
I felt like the author did realize that, and the "written by the devil" comment was a direct reference to autoconf (which certainly qualifies as a devil). Who knows, though.
zetalyrae also correctly pointed out that much of the text is "handwritten but transcluded". Most of the code snippets that appeared in autoconf were originally written by hand and stored in reusable modules, including 100+ tests in autoconf. So overall the volume of code is still huge.
I guess it at least demonstrated the power of modularity. Almost nobody can understand it, but it still works...
The whole point of autotools is to make sure that your project can run on the largest possible set of unix-like operating systems.
`sh` is actually the very best compilation target for that. Looking back, whatever other language could have been chosen (most likely Perl) would have been a huge mistake.
I don't think you understand the role of autoconf... The whole point is that you _don't know_ yet what compiler is available, which version/flavor of C is supported, which API or syscall are available, etc.
Autoconf was literally made so that you can write C code that is somewhat portable. Suggesting that it should have been written in C and built locally is nonsensical.
Well first because autoconf/automake were built in the early 90s. It's not like there were a ton of choices back then.
Basically, if you wanted something that 1) is available on all Unix-like flavors and 2) is strictly well defined and standardized, you could count the options on the fingers of one hand.
I guess the candidates were sh, awk, make, m4 and perl.
sh had the advantage of being ubiquitous, somewhat well understood by everyone, and strictly defined (an sh-compatible shell at /bin/sh was mandated by POSIX/SUS, which had (and still has) quite good traction even for non-strictly-UNIX/POSIX OSes).
Note that even by today's standards, "sh" is probably the only thing you can be 100% sure is available whatever the Unix flavor, even on the most minimal build imaginable.
I cannot say I can think of anything as well defined and supported even today.
Python? Ok but which specific version?
Most distribs struggled to migrate to Python 3 for the very reason that system-wide Python 2 usage was pervasive in base utilities (yum, etc). If Python 2 had been chosen as the autoconf target, imagine the nightmare we would be in today.
Perl? I guess it could have been the strongest candidate after sh. Present me is happy that it was not chosen though, as we seldom see any Perl nowadays, while sh is still quite widely used and understood.
sh is also quite well suited IMHO for the kind of things you do with autoconf/automake/make.
Some examples that come to my mind:
- Cleanup some directories / intermediary files
- grep / find / sed some configuration from config files to .h/.pc
- Lookup for ad-hoc libraries on the filesystem
- Archive / build your dists
- etc
Now don't get me wrong, I won't say autotools are a breeze to work with. In particular, I have a profound hatred of m4.
Pragmatically though, once you tame the beast, it does the job it was made to do pretty darn well.
If I recall correctly, the original compilation target is not even /bin/bash but /bin/sh. Originally, the ./configure script generated by autoconf had to run on several incompatible flavors of Unix at that time, including AIX, SunOS, and BSD, powered by several different CPU architectures, such as x86, Alpha, SPARC, and MIPS, so shipping binaries or using Bash syntax was not acceptable. It had to strictly use the original sh syntax. Though I suspect lots of Bashisms have crept in over the years, since nobody tests that anymore.
Libtool.. yeah, that's evil. Was it ever really helpful?
At a former job we supported Linux, AIX, FreeBSD, Solaris, Mac OS, and Windows. One of the best things I did was rip out libtool, as it was totally unneeded. And if we didn't need it when supporting all those disparate platforms, I really cannot figure out what purpose it serves.
It's the premier porting tool for making cross-compilation harder.
I think the main raison d'etre was automatically linking transitive dependencies for static libraries, so that builds on systems with or without dynamic libraries could work more or less the same. E.g. if you statically link Freetype, then whether you also need to link zlib, libbz2 and a bunch of other stuff depends on how Freetype was built, but the .a static library contains no information about those dependencies since it is just an archive of object files with a symbol index. libtool records these dependencies in .la files so you don't have to guess.
It does have many questionable decisions like silently dropping (some) compiler flags it doesn't know about. I wish I never had to deal with it.
POSIX shells do indeed have many problems, including the lack of data types and various security limitations. I like the point OP is making that shells should focus on being interactive rather than on being scriptable.
For scripting, it's better to resort to a simpler solution that doesn't even need complexities like parsing or data types. In a Unix environment all you need is the ability to compose commands together. This is what execline is for:
There are a few modern shells that do include data types and fix a number of security issues that more traditional POSIX-like shells (Bash isn't technically POSIX) suffer from.
It's also not performing the numeric sort ('-n'), which is arguably the hardest part because it's not a Python built-in. A bad example all around.
Here's my attempt[1]:
import re
lines = list(set(line for line in open('beef.txt') if 'lasagna' in line))
lines.sort(key=lambda line: int(re.match(r'\d*', line)[0] or 0))
for line in lines: print(line, end='')
Python is especially bad at this task for a number of reasons:
- Iterating over the lines yields strings ending with newlines. If you naively loop and print the sorted lines, `print` will add a second newline, resulting in every other line being empty. Hence the `end=''`.
- Python's str has many helper functions, but none for numeric sort, so you need an import, a regex, and a fallback value.
- It's very imperative with a big mutation in the middle. Using `sorted(lines, key=...)` and `print(''.join(lines))` would be simpler and safer, but each would double the memory usage for large inputs.
- Filters using list comprehensions ('x for x in y if z in x') are idiomatic and readable but very verbose.
- The concepts are a lot more complicated. Here we have file handles, magic iterators, first-class-functions, sets, generators, lists, keyword arguments, regexes, overloaded indexes, etc. A beginner would blindly copy and run, and have no chance to fix any issues that appear.
- Because Python has almost no static analysis (undefined names are runtime errors!), you'll have to find most errors by running your code. For a quick shell script this usually mean running over the real inputs, giving a horribly slow edit-compile-run loop and be potentially dangerous.
In the author's defense, the code is apparently meant to exemplify the differences between shell and general programming languages mentioned in the bullet points. So whether it works doesn't matter, and the fact that it's Python isn't even mentioned.
A number of command-line programs are actually just wrappers around a more strongly-typed library API. For example, curl and libcurl, or ffmpeg and a whole collection of audio/video libraries. [1]
Of course, in the POSIX world, "library API" means "C API", so if you want to write nice-looking Python code like in the article, there is still some more work to do.
Where does the requirement for it being a generic library come from? The point is that a lot of things that are available as command-line utilities for shell scripts already have a C API available as well.
And libav* is used by a lot more than /usr/bin/ffmpeg
OP's Python is... exaggerated? (there's too much noise in it, was that deliberate, as an illustration, or just bad Python?), but so is their use of nomenclature... Shell is not and doesn't need to be a REPL, but, I think, OP is using that word the same way some use REST to mean any kind of HTTP interface, i.e. just a common misconception about what that word means.
Also, OP makes claims about "readable syntax" and "static types", both of which are sorts of common misconceptions. Like... English is readable to you, right? You had little problem reading this after all. All types in Shell are static when you look at the script, and, actually are much more well-behaved, because all of them are string. Both at definition time and at run time.
----
In general, I just don't agree with OP's analysis because, I think, OP interprets status quo as the only possible way things may be. But the reality is that we have a bunch of popular but atrocious shells and a handful of popular but atrocious languages that aren't usually used in shells. The fact that the two groups don't seem to have a lot of overlap is just an accident.
Now, take for example SQL. It's fine to use it interactively, in a way more similar to how system shells are used, but also to write programs in it. It doesn't have modules or some other organizational features other languages have, but do you think it's somehow impossible in principle? -- Of course not. Had there been a will, a new SQL standard would add modules to SQL etc. There's nothing impossible about it. Similarly, had there been a will, SQL could've been used in the same capacity Shell is used today. No reason it couldn't. We use a decrepit UNIX Shell because of standard and tradition. Or we use the circus of PowerShell because that's the only reliably available tool on that OS.
It doesn't have to be that way. It's just that we have a sample size of two, and both are defective beyond belief.
It's my opinion that the author is wrong. A lot of the same qualities that make a good REPL for interacting with the system make a good programming language for interacting with the system.
C shell was designed solely for interactive use and is a worse REPL experience for it. Not being able to e.g. redirect compound expressions matters for quick one-liners at least as much as it does for writing scripts.
I think the author has an underlying assumption that untyped languages are worse for writing programs, and that's what they mean by "terseness" versus "verbosity". If that's the case then they should probably state that outright.
I'm only unsure about this assumption because their pandoc example is non-sensical if that's what they are getting at.
In reality there isn't much contradiction between the two goals, you "just" need a bit more design thought instead of getting stuck in the ugliness of POSIX.
Author's first example
cat beef.txt | grep "lasagna" | sort -n | uniq
is already possible literally as written in Xonsh (a Python-based shell) with a pipeliner plugin
give me functions that output typed data. none of this string parsing shit. the sole fact that they have syntax to call bash: $() makes me not want to use this ever. this is why every single effort like this is wasted. i used to get excited by these projects and after about 2 of them i realized it's always a bunch of crap duct taped together, and also bends over to be compatible with un*x and so everything is ruined.
they're also often so poorly implemented, like with those early fancy python shells: they'd go berserk if one command you execute goes into a time consuming process (or infinite loop), whereas why the hell can i not just run another command below while waiting for the one above to complete? this is the thought i had in 2008. i could have made something better out of my ass. really, there's no other way to put it.
here's another thing: a language console should operate on semantic objects, not text. as in, you move objects around to program in it. editing code by typing one character at a time is too slow. no amount of vim or emacs fixes this. in fact, we can just scrap those un*x garbage (along with the rest of un*x) while we're at it, when implementing an OS that works on actual meaningful data instead of text, and semantic objects instead of a language represented by text.
I wrote code with networkx to parse this graph dot file and then follow the plan topologically, spinning up packer, terraform, bash scripts, compilation and so on. It also executes graph nodes in parallel where it can.
My problem with bash pipelines is passing data along and referring to previous steps' data. You kind of need to enrich the data that goes along the pipeline. In my use case, I need to pass along outputs from previous instantiations.
> The fundamental problem of shells is they are required to be two things.
> A high-frequency REPL, which requires terseness, short command names, little to no syntax, implicit rather than explicit, so as to minimize the duration of REPL cycles.
> A programming language, which requires readable and maintainable syntax, static types, modules, visibility, declarations, explicit configuration rather than implicit conventions.
> And you can’t do both. You can’t be explicit and implicit, you can’t be terse and readable, you can’t be flexible and robust.
Of course you can. Lisp is an existence proof. You get the best of both worlds by starting with a simple dynamic language and adding declarations and type inference on top in a backwards-compatible transparent way.
Just because no one does this doesn't mean it can't be done.
Forth is the first "shell" that ever ends up on a new piece of computing hardware. You get a REPL + programming environment from scratch by implementing just 28 words in hand-written assembly.[0]
Most programmers can get it running in just a few days, even with no prior knowledge. (Lisp is also famously small to implement.)
> Just because no one does this doesn't mean it can't be done.
The correct way would seem to be to write four python programs, named pcat, pgrep, psort, and puniq, and then pipe the output of one program to the other program. The only advantage would be that it should be able to work in any operating system with a python environment.
Ideally each program would have the same functionality and resilience as the original bash programs. The argparse module (for accepting the same options at the command line as the originals) and the pathlib module (for portability to Windows as well as Linux and Mac) would be useful.
Now, is it really worth the trouble of doing this? Possibly, at least as part of an educational Python program. Implementing all of grep's functionality would be a big job, however.
You can fork and pipe between python functions perfectly well. It just needs a lot of boilerplate (that you can easily abstract away). Also, you will get most of the problems of piping in Bash imported into Python.
But the real question is why are you piping those? Almost always, piping is not the end-goal, and this is certainly one of those cases.
I’ve written about this problem previously and it is a problem I’m attempting to solve with my own shell (along with the added problem that most shells still have a UI from the 80s).
I do think it is possible to find a sane middle ground however it’s not an easy task negotiating the right compromises to make. Plus any shell that does achieve this wouldn’t be POSIX compliant, which then automatically breaks support with sh/Bash etc.
The conclusion is eerily close to using a programming language, Python say, having a natural library API for that, and then a small `if __name__ == '__main__'` that parses args (with argparse or docopt or whatever) and calls it.
The problem of course is so now you want your script to 'pipe' to pandoc, say, and pandoc even if implemented like this might not have chosen Python but Ruby, say. So you have to (or ~have to, most reasonable way is to) use the CLI to interface with it anyway.
1. Shells are a universal interface from orchestration layers in higher level languages to “do stuff on a Unix node”. Your puppet scripts, dockerfiles, ansible modules, all will benefit from understanding shell.
2. The bash chapter in the Architecture of Open Source Applications is really interesting, one finishes it feeling sympathy for the devs. http://www.aosabook.org/en/bash.html
sort also has -u, so there's two unnecessary pipes. The whole thing could be written:
grep lasagna beef.txt | sort -u
Some people add extra pipes and cat for what they claim is readability. I don't agree that cat and extra pipes is more readable, but to each their own.
I was just thinking the same. There's not really any reason a shell can't have terse syntax for quick command things and explicit, careful syntax for things that require it. And if python had a pipe operator, you could basically do
open('file.txt') |> grep(?,"lasagna") |> set |> sorted
Which is really pretty good. It's not at all hard to imagine a Python that was designed to be good in the shell. They just didn't try.
> Historically, operating systems that have fanciful structured interfaces between programs have been left in the dust by Unix, because Unix maximizes flexibility by favoring the lowest-common-denominator interface, which is the string.
I am curious about examples of such operating systems and their program interfaces
just like every component of a fully un*x braindamaged OS, the shell only gives you speed where it increase the likelihood of errors (and security vulnerabilities, lots of them). every single thing you can do in the shell is a vuln by default until you write a verbose workaround. since nobody does that, in practice the shell is just insecure and fast(ish) user experience. pl design is a big field and some C/java/sh programmer's opinion doesn't nearly represent it. even python is better than sh in every way; merely showing that it doesn't have a bunch of useful functions in scope by default (duh, it wasn't geared for that use case) doesn't prove that sh is fundamentally a good language for a console (its not).
F# is readable and terse. F#'s REPL is really fun to work in, too. The significant whitespace makes it a bit awkward in shell mode, but why do that when editing a script in VS Code, Emacs, or even screen + nano is far more convenient.
Maybe I'm missing something, but in the example at the bottom, you can do just as much type checking with the terse command-line syntax as with what he calls the 'typed interface'.
Certainly factor/retroforth, Forth is too low level to be useful, but with support for quotations and streams, factor/(retro?) might be good, dealing with series a la rebol is also a good idea. Unfortunately all of those languages are kind of dead.
The interesting thing is that pipes behave in an RPN-like fashion, but standard behavior is prefix-based, so a Rebol-like language should be the normal operation, and then it changes to RPN when you pipe. The only language I see implementing this is Rye, which is highly experimental.
Powershell is also a really good option, at least for what OP wants, I like terseness. I can read bash code fine enough, until someone starts golfing, but that's true of all languages. python also has xonsh.
One of the reasons why PoverShell approach cannot succeed now is that the system interface produces exit codes and unstructured text output. So, no matter how hard PoverShell tries to pretend that there's structure or that there's anything but strings and exit codes, it's all fake, and it starts breaking up very quickly, especially if you are used to work with programs whose interface is just that: exit codes and unstructured / randomly / uniquely structured output.
Had the system offered programs an interface for outputting structured information, had it worked better at standardizing and unifying the way programs process structured input... PoverShell as a concept would have more ground to stand on.
Right now, in the UNIX world, it is an even worse option than Python. So, there's no motivation to accept it. At least I haven't met a single system programmer who would have wanted to have PowerShell on UNIX. Only people who do that are Windows users who have some prior experience and who simply don't know much about the system they have for whatever reason ended up using.