> PPS: In theory GNU Coreutils is portable and you might find it on any Unix. In practice I believe it's only really used on Linux.
Oh how times have changed:) But even with the demise of most commercial unixen, I was under the impression that Darwin preserved the ancient tradition of installing the system and then immediately replacing its coreutils with GNU?
That is indeed an incredibly frustrating difference, because there is no version of it that works on both GNU and macOS. In practice I often end up using perl to do an in-place search-replace instead.
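A minimal sketch of the perl approach (the pattern and file name here are just placeholders):

perl -pi -e 's/foo/bar/g' file.txt

Since -i is handled by perl itself rather than by platform-specific option parsing, it behaves the same on GNU/Linux and macOS.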
I much prefer Linux but having used Macs a few times for work, I absolutely cannot recommend making GNU coreutils the default option on Macs.
If you make GNU coreutils the first binaries in PATH, you can expect subtle, _very nasty_ issues.
The last one I encountered was `hostname` taking something like 5 seconds to run. Before that, there was some bug with GNU’s stty which I don’t remember the specifics of.
There's far less variance between Perls on different systems than there are for sed/awk/shell. So if you want performant portable code, Perl does better than all of those.
I'd never use Perl for a 'big' program these days but it still beats the crap out of the mess of sed/awk/bash/ksh/zsh.
Oops yes, that should have been -i '' with a space. But I see now that GNU doesn't accept -i '', or even -i .orig. It MUST be without a space: "-i.orig". The reason for that, I assume, is that there's no way to disambiguate:
echo foo | sed -i pattern
echo foo | sed -i .orig
But you can still use "-i.orig"; that should work on both. That will leave you with a .orig file to clean up, but arguably that's not a bad thing as -i can clobber files.
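So a portable sketch looks something like this, with the backup cleaned up afterwards (file name and pattern are placeholders):

sed -i.orig 's/foo/bar/g' file.txt && rm file.txt.orig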
That was the case for a while, but when they were updated to use GPLv3, Apple stopped updating them, probably to avoid licensing problems on iOS. Nowadays you can install them from Homebrew, MacPorts, straight from source, or other methods.
An interesting factoid is that FreeBSD/macOS sort(1) was using GNU code until recently, since this is quite tricky to implement. Eventually it was reimplemented for GPL avoidance reasons.
We do consider macOS though, and ensure all tests pass on macOS for each release.
I tend to install coreutils on FreeBSD. Besides some minor annoyance with every command being prefixed with "g", some of the programs work a bit nicer than the FreeBSD-shipped versions (or some Linux-centric programs just want the coreutils versions...).
In GNU utilities, option arguments can come after (or between) positional arguments. Personally I find this small convenience invaluable, because I'm used to it.
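For example, GNU ls permutes its arguments, so both of these work (BSD/macOS ls stops option parsing at the first operand and would typically complain about a file literally named "-l"):

ls -l /tmp
ls /tmp -l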
As a Unix graybeard I always place options first. Options last feels like the Windows command prompt, which is nothing I want to see...
I always tell younger colleagues who place options at the end that it might work with some commands, but just don't do it. I did not know that "some" includes all of GNU coreutils. A single common code style is a virtue, even in interactive use if there are onlookers. So I guess I will continue to point it out.
Even many command line parsing libraries support it and scan the entire argv for options. You should always terminate the options with "--" if it's in a script and any of the positional arguments are variables that might or might not start with a dash.
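A sketch, where $file and $pattern are hypothetical variables whose values might start with a dash:

rm -- "$file"
grep -- "$pattern" "$file"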
Unfortunately not. macOS comes with violently outdated FreeBSD userland tools, because of the GPLv3 situation. Though the default shell was recently changed from the 2007-vintage Bash 3.2 to zsh.
Right, which is why I thought it was common for people to install the operating system, then immediately use macports or homebrew to go get the GNU coreutils and a modern version of bash because the BSD versions are less friendly.
(Thus continuing an extremely long tradition of layering GNU over the vendor tools; e.g. Sun had its own tools, but everyone liked the GNU versions better.)
Oh sorry, I interpreted your comment as asking whether Darwin installed GNU coreutils itself in some roundabout, very-expensive-lawyer sanctioned manner :)
What's wrong with bash though? Targeting bash specifically has massively improved my productivity and decreased the incidence of easily avoidable mistakes. Portable POSIX shell scripting is hell on earth but bash scripting with shellcheck can be surprisingly pleasant. I had the same experience with portable makefiles and GNU Make.
For example, I managed to create a surprisingly good test suite with bash and the rest of the GNU coreutils:
It even runs in parallel. Submitted a patch to coreutils to implement the one thing it couldn't test, namely the argv[0] of programs. I should probably go check if they merged it...
> Portable POSIX shell scripting is hell on earth but bash scripting with shellsheck can be surprisingly pleasant.
Assuming you meant `shellcheck`: you know it works for POSIX compatible shell too right?
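It infers the dialect from the shebang line, and you can also force it (script.sh is a placeholder):

shellcheck --shell=sh script.sh
shellcheck --shell=bash script.sh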
What's wrong with bash is that people invariably end up requiring features that aren't available in some deployed version, and then you've lost a lot of the benefit of writing in shell in the first place.
Similar to shellcheck, shunit2 works just fine for running unit tests in posix compatible shell.
I did. Edited my comment, thanks. I don't know why I'm so prone to that particular typo. My shell history is full of it.
> What's wrong with bash is that people invariably end up requiring features that aren't available in some deployed version, and then you've lost a lot of the benefit of writing in shell in the first place.
But I've gained quite a lot too. Bash has associative arrays. I just can't go back to a shell that doesn't have that.
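A minimal sketch, assuming bash 4+ (so not the stock bash 3.2 that macOS ships):

declare -A port
port[http]=80
port[https]=443
for name in "${!port[@]}"; do
  echo "$name -> ${port[$name]}"
done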
Shell scripting makes it simple to manage processes and the flow of data between them. It's the best tool for the job in these cases. So there are still reasons for scripting the shell even if one is willing to sacrifice portability.
I just can't believe it's 2023 and we're actually praising a shell for having arrays and dictionaries. Or that there are still multiple shells in use that don't. Or that people still ask questions like "What's wrong with bash though?" with a straight face, as if they don't know.
Now how long are we going to have to wait until somebody invents a way to do named parameters? That will revolutionize the computer industry! I guess it's way too much to ask for a built-in json or yaml parser; all we can hope for is maybe a stringly typed sax-callback-based xml parser after another 20 years from now, because dom objects in a shell would be heretical and just so unthinkably complicated.
Why are people so afraid to just use Python? Shell scripting and cobbling together ridiculously inefficient incantations of sed, awk, tr, test, expr, grep, curl, and cat with that incoherently punctuated toenail, thumbtack, and asbestos chewing gum syntax that inspired perl isn't ever any easier than using Python, especially when you actually need to use data structures, named function parameters, modules, libraries, web apis, xml, json, or yaml.
Thank you! That's a great analysis, and I love the design of PowerShell, which addresses most of my arguments against bash. The Unix community (especially Sun Microsystems) has traditionally had this arrogant self-inflicted blind spot of proudly and purposefully cultivated ignorance in its refusal to look at and learn from anything that Microsoft has ever done, while Microsoft is humble and practical enough to look at Java and fix it with C#, look at bash and fix it with PowerShell, etc.
Here's the summary of a discussion I had with ChatGPT about "The Ultimate Shell Scripting Language", in which I had it consider, summarize, and draw "zen of" goals from some discussions that I feel are very important (although I forgot about and left out PowerShell, that would be a good thing to consider too -- when I get a chance I'll feed it the discussion you linked to, which made a lot of important points, and ask it to update its "zen of" with PowerShell's design in mind):
Summarization of the important points in those discussions that apply to The Ultimate Shell Scripting Language, resynthesized into a "zen of" list.
Discussion of Support for Declarative and Procedural Paradigms and how it applies to supporting standard declarative syntaxes including json, yaml, and xml (which bash still doesn't support and probably never will, and which PowerShell of course does).
Suggestions for some more "zen of" and design goals, specifically focused on addressing weaknesses or design flaws in popular languages like sh, bash, tcl, python, perl, etc.
Discussion of how "Simplified Debugging and Error Handling" can be balanced with "Macro Processing and Syntax Flexibility", better than debugging C++ templates, Lisp macros, TypeScript code compiled to minified JavaScript, etc.
Discussion of how JavaScript / TypeScript fall short of those goals.
Discussion of visual programming languages for interactive shell scripting as well as general purpose programming.
Discussion of layering visual programming languages on top of textual programming languages.
Discussion of using Python to build a visual data flow node based shell scripting language on top of Blender (that just happens to support 3D, image processing, video editing, GPU programming, machine learning, Python module integration, and everything else that Blender is great at).
Discussion of how to make efficient use of token budgets when using LLMs with text and visual programming languages.
Here is the condensed discussion with the bulk text I had it analyze omitted, so you can more easily read the summaries and recommendations and "zen of" manifestos:
Here is the entire unexpurgated discussion including all the email messages and articles I had it analyze, if you want to see what it was considering (they're interesting discussions to read in their entirety if you're interested in that kind of stuff, but ChatGPT is excellent at summarizing them and highlighting the important and relevant points):
With the rise of LLMs I think now is a great time to reconsider the design of text and visual scripting and programming languages, with LLM interoperability and compatibility in mind.
ChatGPT:
Designing a text representation for a visual programming language that interoperates well with Language Models like GPT necessitates careful planning to ensure the representation is both meaningful and efficient. Here are several considerations and strategies to optimize the text representation:
Token Efficiency: The text representation should be concise to make efficient use of the token budget of the language model. Avoid verbose syntax and prefer shorthand notations where possible.
Syntax Simplicity: A simple and consistent syntax helps the language model predict and generate code more accurately. Overly complex syntax can lead to increased errors and confusion for both the model and the user.
Standardization: Using standardized syntax and well-known programming patterns can help the language model better understand and process the code since it has likely been trained on similar patterns.
Semantic Density: Each token should carry a significant amount of meaning. This can be achieved through the use of expressive keywords or operators that encapsulate complex actions.
Modular Design: Break down the text representation into modular components. This approach allows the language model to process each part in isolation, reducing the complexity and cognitive load.
Contextual Keywords: Use keywords that clearly indicate their function within the language, making it easier for the language model to provide contextually relevant suggestions.
Window Size Awareness: Design the text representation to be legible and understandable within a typical window size, avoiding lines that span too long and require horizontal scrolling.
Minimize Ambiguity: Ambiguity in text representations can lead to multiple interpretations by the language model, resulting in inaccurate predictions. Clear and unambiguous syntax helps maintain consistency.
Commenting and Documentation: Incorporate support for comments and documentation within the text representation. This not only aids human understanding but can also provide language models with additional context for generating code.
Escaping Visual Elements: For elements that are inherently visual and do not translate well into text, provide an escape mechanism that allows these elements to be described textually in a way that the language model can process.
Alignment with Model's Training Data: Align the text representation with the structures and syntax that the language model has been trained on. This can include following common coding standards and naming conventions.
Iterative Refinement: Use an iterative approach to refine the text representation based on feedback from testing with the language model, ensuring that the design evolves to better suit the model's capabilities.
By considering these aspects, you can create a text representation for a visual programming language that is both language model-friendly and conducive to an efficient programming experience.
Agreed. If you don't test your script in other shells and on other platforms, why advertise compatibility? And not just the script, but also the programs called.
I got burned by supposedly "portable" scripts during the whole Ubuntu dash disaster, and again when I started using Mac OS X, and then again once I used Cygwin and msys2 on Windows.
I do keep portability in mind when writing shell scripts to ease porting later, but without testing there's really no way to be sure "/bin/sh" is right. And some of the Bash features such as arrays are legitimately useful.
I favor POSIX sh myself, but BASH sits at a happy medium of portability and features; zsh might well win on features, but Darwin is the only OS I know of that installs it by default, whereas BASH is nearly universally installed by default on Linux distros and still has considerably more features than POSIX /bin/sh.
I use bash mostly out of habit and because it's installed by default, but also because zsh had some sort of incompatibility with some of my aliases that I never got around to debugging.
Installing GNU coreutils etc. was also common on Solaris. However, as it usually wasn't in your PATH ahead of the SysV utilities, it was only used via its full path.
Thanks hn for doing what you do! I'm so glad to see this here.
A few months back I noticed '[' under /bin on my Mac. I tried googling to understand what it was, but my google-fu came up short. It felt like one of those ungoogleable things. This link is an excellent starting point for me.
They look similar, but to your shell they are different: [ is the name of an executable, ( is a syntax symbol of bash. The man page for the syntax is the man page of bash.
What's not so funny is that /bin/[ has no way of enforcing this syntax rule, or any other. Just like sed can't enforce any syntax rules on its regular expressions. This is why the shell is still a crock of shit for scripting even with "set -e" on.
The invoked binary has no way of aborting script execution. All it can do is barf out on stderr and return an error code, which the shell interprets as false, so `if [ "x" = "x"` (without the closing ]) goes into the else branch.
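A sketch of what that failure mode looks like (the builtin [ behaves the same way as /bin/[ here):

if [ "x" = "x"; then
  echo "then branch"
else
  echo "else branch"   # [ complains about the missing ] on stderr, returns non-zero, and this runs
fi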
I like the cut of your jib. Finally a breath of fresh air from a sane person amongst all the crazy people making fanatically apologetic excuses and heaping evangelical praise on toxic burning dumpster fires.
It could have still been implemented like that in the filesystem, if the binary read the name it was executed under and modified its behavior based on that. As an extreme case, on a stock Alpine Linux system, both test and [ - along with most other core system programs - are all symlinks to a single busybox binary that reads argv[0] and acts like whatever program it's been called as. I'm actually somewhat surprised that GNU didn't do that in this case; I, too, would have expected test and [ to be some manner of link to the same program, either with identical behavior or using invocation name to decide how to behave.
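The shell equivalent of that argv[0] dispatch is easy to sketch (a toy illustration, not how busybox actually does it):

#!/bin/sh
# Behave according to the name we were invoked as, e.g. via
# symlinks named "test" and "[" pointing at this one file.
case "$(basename "$0")" in
  "["|test) echo "acting as test/[" ;;
  *)        echo "acting as $(basename "$0")" ;;
esac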
There must be reasons to use the standalone programs instead of the shell builtins. But the test builtin will be the same in, e.g., NetBSD sh as it is in Linux or Android dash. That is what I use 100% of the time. It is much faster to use builtins.
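You can check which one your shell will actually run (output from bash; the path to the external binary varies by system):

% type -a [
[ is a shell builtin
[ is /bin/[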
I vaguely remember that the program and the built-in could not always do the same things, e.g. the program might not have supported -e. But it's possible that ksh was used to get -e. Does anyone remember this better than I do?
You don't need to do "number crunching"; the performance differences are very real even for simple scripts on modern systems. Launching a process is comparatively expensive (read binary from disk, run linker, allocate new process space in kernel, etc.) whereas a builtin is just "if progname == "[" { .. }". And when you do something like "loop over a few thousand files" the difference really adds up.
Consider this example:
for i in $(seq 1 10); do
for f in /*; do
if [ $f = /etc ]; then
: # Do nothing.
fi
done
done
I have 23 entries in /, so this will run [ 230 times – not a crazy number.
% time sh test
sh test 0.00s user 0.01s system 91% cpu 0.009 total
bash, dash, zsh: they all have roughly the same performance: "fast enough to be practically instantaneous".
But if I replace [ with /bin/[:
% time sh test
sh test 0.11s user 0.38s system 96% cpu 0.509 total
Half a second! That's tons slower! And you can keep "fast enough to be practically instantaneous" for thousands of files with the built-in [, whereas it will take many seconds with /bin/[.
(Aside: if I statically link [ it's about 0.4 seconds).
I know what it costs to start a process. My point is just if that script is done in 0.5 seconds that's good enough for me as an occasional human caller. I don't care that it could be much faster.
Of course if the script were to handle all files of a filesystem with many small files it could get disturbingly slow. I don't deny that there are cases where it matters. But in over 90% of the scripts I write or use it doesn't.
230 files is hardly "all files of a filesystem", and if you run [ 2 or 3 times per file it becomes even slower. Many scripts are little more than glorified for-loops, and "run this on a bunch of files" in particular is a major use case of shell scripts, which is also why we have find -print0, find -exec {} +, etc.
It's up to you whether half a second is "fast enough" (just an example: it can easily also be 2 seconds, or 5), but it's definitely a lot slower than 9ms and not "only much faster in theory" or "unnoticeable in practice", and "number crunching" doesn't come into play regardless.
Since the cost of this optimisation is minimal (you can just use the source of /bin/[ as a builtin) I don't see why anyone would choose half a second over 9ms.