The Beauty of Unix Pipelines (prithu.xyz)
649 points by 0x4FFC8F 11 months ago | 374 comments

Pipes are wonderful! In my opinion you can’t extol them by themselves. One has to bask in a fuller set of features that are so much greater than the sum of their parts, to feel the warmth of Unix:

(1) everything is text

(2) everything (ish) is a file

(3) including pipes and fds

(4) every piece of software is accessible as a file, invoked at the command line

(5) ...with local arguments

(6) ...and persistent globals in the environment

A lot of understanding comes once you know what execve does, though such knowledge is of course not necessary. It just helps.

Unix is seriously uncool with young people at the moment. I intend to turn that around and articles like this offer good material.

> (1) everything is text

And lists are space-separated. Unless you want them to be newline-separated, or NUL-separated, which is controlled by an option that may or may not be present for the command you're invoking, and is spelled completely differently for each program. Or maybe you just quote spaces somehow, and good luck figuring out who is responsible for inserting quotes and who is responsible for removing them.

To criticize sh semantics without acknowledging that C was always there when you needed something serious is a bit short-sighted.

There are two uses of the Unix “api”:

[A] Long lived tools for other people to use.

[B] Short lived tools one throws together oneself.

The fact that most things work most of the time is why the shell works so well for B, and why it is indeed a poor choice for the sort of stable tools designed for others to use, in A.

The ubiquity of the C APIs of course solved [A] use cases in the past, when it was unconscionable to operate a system without cc(1). It’s part of why they get first class treatment in the Unix man pages, as old fashioned as that seems nowadays.

There's a certain irony in responding to criticism of that which you're extolling by saying not to use it.

And the only reason I might be pushed down that path is because the task I'm working on happens to involve filenames with spaces in them (without those spaces, the code would work fine!), because spaces are a reasonable thing to put in a filename unless you're on a Unix system.

> because spaces are a reasonable thing to put in a filename unless you're on a Unix system.

Putting spaces in a filename is atrocious and should be disallowed by modern filesystems. It is as if you could put spaces inside variable names in Python. Ridiculous.

At Goldman, the internal language Slang has spaces in variable names. It's insane at first glance and I got into an argument with my manager about why in the world this was acceptable, and he could give me no other answer than "this is the way it is".

But when you realize that space isn't a valid separator between tokens, seeing things like "Class to Pull Info From Database::Extractor Tool" actually becomes much easier to read and the language becomes highly expressive, helped somewhat by the insane integration into the firm's systems.

I was on your side until I tried it, but it can actually be quite useful, esp. if everything is consistent.

> I was on your side until I tried it, but it can actually be quite useful, esp. if everything is consistent.

On my side? You cannot imagine how extreme one can be... If I had my way, programming languages would only allow single-letter variables and space would be the multiplication operator.

What possible justification could you have for making this mandatory?

Math notation being dependent on single-character names is an artifact of "concatenation as multiplication" -- which in my opinion is a useful but sloppy historical convention with little merit or relevance in non-handwritten media.

Okay - I wasn't meaning to pick a fight here. Just saying that there are definite use cases where spaces in variable names can work and actually be useful. That's all.

I'm not going back to CamelCase or underscores for my normal day to day file naming. The problem with spaces only exists inside the IT world and it's something they should find a way around.

At this point I'd be happy if all unix tools had a `--json` option. Then the whole space issue goes away. So do a bunch of parsing issues.

Unix tools standardized a character representation (e.g. everything is ASCII). The next step is a unified standard for the syntactical representation.

ASCII isn’t just a “character representation” though. It includes code points for data serialisation and transmission control. You can represent tables in ASCII without resorting to printable characters nor white space as field separators.
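
A minimal sketch of what that could look like, assuming the ASCII unit separator (0x1F) between fields and the record separator (0x1E) between records. The names and cities are made-up sample data, and the variable names are only for illustration:

```shell
# Made-up sample data: two records, each with a name field and a city field.
unit_sep=$(printf '\037')     # ASCII US, separates fields
record_sep=$(printf '\036')   # ASCII RS, separates records
data=$(printf 'Ada Lovelace%sLondon%sGrace Hopper%sNew York%s' \
    "$unit_sep" "$record_sep" "$unit_sep" "$record_sep")

# awk splits on the control characters, so spaces inside fields are harmless.
cities=$(printf '%s' "$data" | awk 'BEGIN { FS = "\037"; RS = "\036" } { print $2 }')
printf '%s\n' "$cities"
```

Neither separator is printable, which is also why such data is awkward to type or inspect by hand.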

I don’t know why POSIX tools don’t already do this, but I suspect it’s because most tools evolved by accident rather than being designed from the ground up with edge cases in mind.

I agree. The file load/save dialogs of all GUI should work in such a way that spaces typed by the users in a filename field are always transparently changed to something different (for example, the unicode non-breaking space).

> (for example, the unicode non-breaking space)

That's potentially even worse in a file name than plain old space is. It looks like a normal space, but isn't.

If you are going to replace spaces in file names with something else, underscores are the best option I think.

Sounds like taking "I know better than the users" to another level.

You should take a stroll in the real world sometime, where spaces and Unicode exist :)

The parent comment is extreme, and the real world is indeed very diverse, but I would also be quite surprised to find a software project with spaces in internal filenames.

So would I, but the world is, thankfully, not entirely composed of software projects.

Yeah, any unicode character is OK in a filename, except maybe space and slash.

I don't think C0 or C1 controls should be allowed in filenames.

Why allow them? And they potentially pose security issues.

With some, admittedly somewhat obscure [1][2], terminal emulators, C0 and C1 controls can even be used to execute arbitrary code. You could say these terminal emulators are insecure, and you may well be right, but the fact is they exist.

[1] See for example page 211 of https://static.zumasys.com/zumasys/atfiles/manuals/at7/AccuT... which documents the ESC STX < command CR escape sequence which tells the AccuTerm terminal emulator to run an arbitrary command

[2] Also the APC C1 control can make Kermit run arbitrary commands, if you've executed "SET APC UNCHECKED" (which is not the default setting) – see http://www.kermitproject.org/onlinebooks/usingckermit3e.pdf page numbered 300 (page 310 of PDF)

I agree. Personally I think the Linux kernel should have a compilation-time option to disallow various special characters in filenames such as ASCII control characters and colons, so users who want to maintain a sane system can give themselves the option to do so.

So um, how would you work with files that came from a foreign file system? Would the kernel just crash when it sees them? Would they be effectively untouchable, like filenames with a '/' encoded in them?

Previously, I proposed addressing this with a mount option and a filesystem volume flag [1]. The mount option would prevent creating new files/directories with "bad names" but would still allow accessing ones which already exist. The filesystem volume flag would make bad names on that volume be considered fsck errors. (If the kernel found such a bad name on a volume with the flag enabled, it should log that as a filesystem error, and either return an IO error, or just pretend the file doesn't exist – whatever Linux does now when it finds a file on disk whose name contains a null byte or forward slash.)

Using a mount option and/or filesystem volume flag means it works fine with legacy volumes which might contain those bad names. On such legacy volumes, either that volume flag is not supported by the filesystem, or else it is but is disabled for that particular volume. If you mount it with the mount option, you can still access bad names on that volume, you just can't create new ones; if you need to create new bad names, you just (re)mount it with the mount option disabled.

[1] https://news.ycombinator.com/item?id=22873153

and spaces! ;)

I disagree with this immensely.

Also, some languages do allow whitespace in variable names like R and SQL, so long as the variable names are quoted or escaped properly.

Are your kids named NameSurname? Did you meet EuropeanBrownBear on your last trip to the mountains? :)

The shell works with spaces, you just need to be careful to quote every file name. The advantage of using file names without spaces is that you can avoid the quotes.
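
A quick way to see the difference, as a sketch. `count_args` is a hypothetical helper written for this example, not a standard tool, and the default `IFS` is assumed:

```shell
count_args() { echo "$#"; }   # hypothetical helper: prints how many arguments it received

name='my report.txt'
unquoted=$(count_args $name)     # word splitting yields "my" and "report.txt"
quoted=$(count_args "$name")     # one argument, space preserved
echo "unquoted: $unquoted, quoted: $quoted"
```

The unquoted form only looks correct as long as no name ever contains whitespace.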

Please don't ever assume you can avoid spaces and quotes. It's just a time bomb.

It's like people saying you don't need to escape SQL values because they come from constants. Yes, they do... today.

It's not just quoting either. It's setting the separator value and reverting it correctly. It's making sure you're still correct when you're in a function in an undefined state. It's a lot of overhead in larger projects.

Sure, but in a larger project you won’t be coding in shell script, right?

Larger projects will often include small shell scripts for deployment / maintenance / various tasks.

What I am saying is nothing new. This has always been the policy of UNIX developers. If you use only file names without space, then you can make it easier to use tools that depend on this. Of course, if you're writing applications in shell then you cannot rely just on convention.

The shell (bash, anyway) has a weird mix of support and lack of support for spaces/lists/etc.

For example, there is no way to store `a b "c d"` in a regular variable in a way where you can then call something similar to `ls $VAR` and get the equivalent of `ls a b "c d"`. You can either get the behavior of `ls a b c d` or `ls "a b c d"`, but if you need `ls a b "c d"` you must go for an array variable with new syntax. This isn't necessarily a big hurdle, but it indicates that the concepts are hard to grasp and possibly inconsistent.

Bash has arrays:

    var=( a b "c d" )
    ls "${var[@]}"
POSIX shell can get you what you want with eval:

    var='a b "c d"'
    eval "ls $var"
There is also one array variable in POSIX shells, the argument list:

    set -- a b "c d"
    ls "$@"

That has nothing to do with spaces, but more to do with the rather lackluster support for arrays in most Bourne-style shells.

Indeed, and the shell lives and breathes spaces. Arrays of arguments are created every time one types a command. Without spaces-as-magic we’d be typing things like this all the time:

  $ exec(‘ls’, ‘-l’, ‘A B C’)
Maybe that’s unrealistic? I mean, if the shell was like that, it probably wouldn’t have exec semantics and would be more like this with direct function calls:

  $ ls(ls::LONG, ‘A B C’)
Maybe we would drop the parentheses though — they can be reasonably implied given the first token is an unspaced identifier:

  $ ls ls::LONG, ‘A B C’
And really, given that unquoted identifiers don’t have spaces, we don’t really need the commas either. Could also use ‘-‘ instead of ‘ls::’ to indicate that an identifier is to be interpreted locally in the specific context of the function we are calling, rather than as a generic argument.

  $ ls -LONG ‘A B C’
If arguments didn’t have spaces, you could make the quotes optional too.


I don't think this is the problem people usually complain about.

The much bigger problem is that spaces make text-only commands compose badly.

    $ ls -l `find ./ -name *abc*`
Is going to work very nicely if file names have certain properties (no spaces, no special chars), and it's going to break badly if they don't. Quoting is also very simple in simple cases, but explodes in complexity when you add variables and other commands into the mix.
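
A sketch of how that breaks, using a throwaway directory with made-up file names. `count_args` is a hypothetical helper, and the temporary path is assumed to contain no whitespace itself:

```shell
count_args() { echo "$#"; }   # hypothetical helper: prints how many arguments it received

dir=$(mktemp -d)
touch "$dir/one abc.txt" "$dir/two_abc.txt"

# Unquoted command substitution: the name with a space is split in two.
split_count=$(count_args $(find "$dir" -name '*abc*'))

# Letting find run the command itself passes each path as exactly one argument.
exec_count=$(find "$dir" -name '*abc*' -exec sh -c 'echo "$#"' sh {} +)

echo "split: $split_count, exec: $exec_count"
rm -rf "$dir"
```

Two files turn into three arguments in the first form, which is exactly the kind of silent breakage being described.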

That's still just a shell problem since the shell is expanding `find` and placing the unquoted result into the command line. Some non-POSIX shells don't have this issue. For example with murex you could use either of these two syntaxes:

    ls -l ${find ./ -name *abc*}
    # return the result as a single string

    ls -l @{find ./ -name *abc*}
    # return the result as an array (like Bourne shell)
So in the first example, the command run would look something like:

    ls -l "foo abc bar"
whereas in the second it would behave more like Bourne shell:

    ls -l foo abc bar
As an aside, the example you provided also wouldn't work in Bourne shell because the asterisks would be expanded by the shell rather than `find`. So you'd need to quote that string:

    ls -l `find ./ -name "*abc*"`
This also isn't a problem in murex because you can specify whether to auto-expand globbing or not.

With POSIX shells you end up with quotation mark soup:

    ls -l "`find ./ -name '*abc*'`"
(and we're lucky in this example that we don't need to nest any of the same quotation marks. It often gets uglier than this!)

Murex also solves this problem by supporting open and close quote marks via S-Expression-style parentheses:

    ls -l (${find ./ -name (*abc*)})
Which is massively more convenient when you'd normally end up having to escape double quotes in POSIX shells.


Now I'm not trying to advocate murex as a Bash-killer; my point is that it's specifically the design of POSIX shells that causes these problems and not the way Unix pipelines work.

Ah, more problems with shell expansion and substitution.

Pipelines to the rescue. If you used a pipeline instead of shell expansion, it can deal with spaces just peachy.

    $ find . -name "*abc*" -print0 | xargs -0 ls -l
There's no argument that bash (and other shells) make dealing with spaces and quotes troublesome. That has nothing to do with pipelines.

Does every unix utility support -print0 or similar?

My point wasn't specifically about find, but about the problems of text and other unstructured input/output in general.

Update: reading a bit around the internet, it looks like it's not possible to do something like this :

    find / -print0 | \
        grep something | \
        xargs -0 cat
As grep will mangle the output from find -print0.
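
For what it's worth, GNU grep appears to cover this particular case with `-z` (`--null-data`), which treats NUL rather than newline as the line terminator on both input and output. A sketch, assuming GNU grep, plus `find -print0` and `xargs -0`; the file names and contents are made up:

```shell
dir=$(mktemp -d)
printf 'first\n' > "$dir/something one.txt"
printf 'second\n' > "$dir/other.txt"

# Every stage agrees on NUL as the delimiter, so the space in the name survives.
out=$(find "$dir" -type f -print0 | grep -z 'something' | xargs -0 cat)
echo "$out"
rm -rf "$dir"
```

This rather supports the complaint, though: that is three tools and three different spellings of "use NUL", and a tool without such an option still breaks the chain.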

And note that, all the way in my original argument, I complained that every command used a different way of spelling that changed delimiter character.

Some shells handle this quoting better than others.

That's when it's useful to use Perl as a substitute for a portion of or the entire pipeline.

Choosing space as a separator is a useful convention that saves you typing in additional code afterwards if you want to run a short command.

It's like there is a shortcut that some people use that you want to wall in because it doesn't look pretty to you.

The biggest pain comes from untarring some source code and trying to build it while in ~/My Tools Directory. Spaces in directories that scupper build code that isn’t expecting them is the fatal mixing of worlds.

In most other cases I’ve never really had a problem with “this is a place where spaces are ok” (e.g. notes, documents, photos) and “this is a place where they are not ok” — usually in parts of my filesystem where I’m developing code.

It’s fine to make simplifying assumptions if it’s your own code. Command history aside, most one liners we type at the shell are literally throwaways.

I think I was clear that they aren’t the only category of program one writes and that, traditionally on Unix systems, the counterpart to sh was C.

The problem is that the pipeline model is extremely fragile and breaks in unexpected ways in unexpected places when hit with the real world.

The need to handle spaces and quotes can take you from a 20 character pipeline to a 10 line script, or a C program. That is not a good model whichever way you look at it.

Pipelines are mainly for short-lived, one-off quick scripts. But they are also really useful for when you control the input data. For example, if you need a summary report based on the results of a sql query.

If you control the inputs and you need to support quotes, spaces, non-white space delimiters, etc... in shell script, then that’s on you.

If you don’t control the inputs, then shell scripts are generally a poor match. For example, if you need summary reports from a client, but they sometimes provide the table in xlsx or csv format — shell might not be a good idea.

Might be controversial, but I think you can tell who works with shell pipes the most by looking at who uses CSV vs tab-delimited text files. Tabs can still be a pain if you have spaces in data. But if you mix shell scripts with CSV, you’re just asking for trouble.

That's not the fault of pipes, that's the fault of shells like bash. There are shells that deal with spaces and quotes perfectly fine.

I've been scripting stuff on the pipeline for over a decade and haven't really run into this much.

You can define a field separator in the command line with the environment variable IFS - i.e. 'IFS=$(echo -en "\n\b");' for newlines - which takes care of the basic cases like spaces in filenames/directory names when doing a for loop, and if I have other highly structured data that is heavily quoted or has some other sort of structure to it, then I either normalize it in some fashion or, as you suggest, write a perl script.
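
A sketch of that approach with a plain newline as the separator, assuming made-up file names in a throwaway directory. It still breaks on names that contain newlines, and glob characters in names would still be expanded:

```shell
dir=$(mktemp -d)
touch "$dir/my notes.txt" "$dir/todo.txt"

old_ifs=$IFS    # remember the default separator so it can be restored
IFS='
'
seen=0
for f in $(find "$dir" -type f); do
    # With newline-only splitting, "my notes.txt" arrives as one word.
    if [ -f "$f" ]; then
        seen=$((seen + 1))
    fi
done
IFS=$old_ifs    # put the default back for the rest of the script

echo "files seen: $seen"
rm -rf "$dir"
```

Forgetting the restore step is the usual way this goes wrong in longer scripts.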

I haven't found it too much of a burden, even when dealing with exceptionally large files.

I agree that spaces, quotes, encodings, line-ends etc. are a problem, but this is not a problem of "the pipeline model" – this is a problem of ambiguously structured data. See https://relational-pipes.globalcode.info/v_0/classic-example...

Kind of / not really.

Also Zsh solves 99% of my "pipelines are annoying" and "shell scripts are annoying" problems.

Even on systems where I can't set my default shell to Zsh, I either use it anyway inside Tmux, or I just use it for scripting. I suppose I use Zsh the way most people use Perl.

> C was always there when you needed something serious

There is a world of stuff in between "I need relatively low-level memory management" and "I need a script to just glue some shit together".

For that we have Python and Perl and Ruby and Go, or even Rust.

Yes! I love 2020!

(My point about C was in the historical context of Unix, which is relevant when talking about its design principles.)

Going with the same theme, C itself was an innovation meant to fill the space between Assembler (in this context, "A") and B[0].

[0] https://en.wikipedia.org/wiki/B_(programming_language)

That isn't really supported by the article you linked -- it describes C as an extension of B multiple times.

I only linked that because most people don't know what B is, and I suppose the difference between "innovation meant to fill the space" and "extension" wasn't so important to me.

My source for the claim itself is from 14m30s of Bryan Cantrill's talk "Is It Time to Rewrite the Operating System in Rust" https://youtu.be/HgtRAbE1nBM

To be more precise: A->BCPL->B->C

This doesn't always hold in Linux though. Some proc entries are available only via text entries you have to parse whether that's in sh or C. There's simply no public structured interface.

This is one of the superior aspects of FreeBSD: no parsing of human-readable strings to get back machine-readable information about the process table. It's available directly in machine-readable form via sysctl().

[C] Long lived tools one throws together oneself

Stable tools designed for oneself

The unknown knowns?

That's a POSIX shell thing rather than a Unix pipeline thing. Some non-POSIX shells don't have this problem while still passing data along Unix pipes.

Source: I wrote a shell and it solves a great many of the space / quoting problems with POSIX shells.

Same, I've built toy shells [0] that used xml, s-expressions and ascii delimited text [1]. The last was closest to what unix pipes should be like. Of course it breaks ALL posix tools, but it felt like a shell that finally works as you'd expect it to work.

[0] Really just input + exec + pipes.

[1] https://en.wikipedia.org/wiki/Delimiter#ASCII_delimited_text

As long as you re-serialise that data when piping to coreutils (etc) you shouldn't have an issue.

This is what my shell (https://github.com/lmorg/murex) does. It defaults to using JSON as a serialisation format (that's how arrays, maps, etc are stored, how spaces are escaped, etc) but it can re-serialise that data when piping into executables which aren't JSON aware.

Other serialisation formats are also supported such as YAML, CSV and S-Expressions too. I had also considered adding support for ASCII records but it's not a use case I personally run into (unlike JSON, YAML, CSV, etc) so haven't written a marshaller to support it yet.

Ascii records have worked better than json, xml, s-expressions (better as in provide the same bang for a lot less buck) in every case I've used them for.

I have no idea why I am the only person I know who uses them regularly.

ASCII records have their own limitations though:

- They can't be edited by hand as easily as other serialisation formats, which use printable characters as their delimiters

- It's not clearly defined how you'd use them for non-tabulated data (such as JSON, XML, S-Expressions)

- There isn't any standard for escaping control characters

- They're harder to differentiate from unserialised binary formats

- They can't be used as a primitive like JSON is to Javascript and S-Expressions is to Lisp.

And if we're both honest, reducing the serialisation overhead doesn't gain you anything when working in the command line. It's a hard enough sell getting websites to support BSON and at least there, there is a tangible benefit of scale.

Not that I'm dismissing ASCII records; they would have been better than the whitespace mess we currently have in POSIX shells. However, I don't agree that ASCII records are better than current JSON or S-Expressions.

Here's one tool that it does not break. (-:

* http://jdebp.uk./Softwares/nosh/guide/commands/console-flat-...

>good luck figuring out ...

luck, or simply execute the steps you want to check by typing them on the command line. I find the pipes approach incredibly powerful and simple because you can compose and check each step, and assemble. That's really the point, and the power, of a simple and extensible approach.
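
A small sketch of that workflow, with a made-up input string. Each stage is kept in its own variable so it can be inspected on its own before the next one is added:

```shell
text='the cat and the hat and the bat'

step1=$(printf '%s\n' "$text" | tr ' ' '\n')          # one word per line
step2=$(printf '%s\n' "$step1" | sort | uniq -c)      # count each distinct word
top=$(printf '%s\n' "$step2" | sort -rn | head -n 1 | awk '{ print $2 }')

echo "most frequent word: $top"
```

At the prompt you would normally just re-run the pipeline with one more `|` stage each time; the variables only make the intermediate results explicit.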

God help you if your paths have spaces in them.

No need to invoke a deity to do that...

  echo 'Hi!' | tee "$(printf "foo\nbar\tqu ux.txt")"
  cat foo$'\n'bar$'\t'qu' 'ux.txt
IIRC the latter only works in ksh93+, but the former should be fine pretty much in any shell.

... except if one uses \c by mistake. \c has unspecified behaviour, and is one of the things that varies widely from one implementation of printf to another.

* https://unix.stackexchange.com/a/558665/5132

By “God” you may mean “libc”...

...if it was the 1970s. Nowadays we use Python etc for tools where it’s less acceptable for them to fall apart, though everything falls apart at some point, spaces or otherwise.


Don't you mean Raku? lol

Perl 5 was more suitable for these particular use cases than perl 6 ever tried to be.

Perl5 was and is amazing. Check the articles from https://linuxfocus.com and the old Linux Gazette magazines from TLDP: https://linuxgazette.net/archives.html.

Just quote paths? shrug

Try doing that in e.g. Make

In the early 2k's I discovered that it was impossible to quote or escape spaces in filepaths in a makefile. I naively reported it to the Make maintainer as a bug.

The maintainer responded by saying, basically, that the behavior was deeply baked in and was not going to change.

This is a nice feature of make. People who put spaces in filenames are justly punished by not being given the goodness of make.

I can't tell if you're being sarcastic but this can result in innocent users who don't even know this fact to suddenly overwrite or wipe their files when trying to build someone's project.

Nobody is innocent if they have filenames with spaces in their systems! /s

Try finding out which registry key you need to edit to get something to happen.

Some things across different systems have pitfalls as a result of their implementation, or some features have pitfalls. There are half a dozen things I can point to on Windows that are worse than "spaces in filenames". Hell, Windows doesn't even support ISO 8601 dates, or times to be properly placed in files. It restricts symbols much more than Linux / Unix.

who said anything about windows?

The parent thread referenced windows as something that handles spaces in filenames. I'm not sure why I'm getting downvoted when my comment was relevant to the debate occurring, and my point (that all features have downfalls caused by the necessity of implementation within the surrounding system, and in the case of command line arguments, linguistics and common keyboards themselves) is something that according to the guidelines should be debated against, not downvoted.

None of the parents of your comment mentioned Windows. Perhaps you have the wrong thread?

Or you just pipe through the program of your choice for which you know the syntax:

  echo "foo bar fnord" | cut --delimiter " " --output-delimiter ":" -f1-

  # foo:bar:fnord

I love learning new things about commands I've used for years.

    --output-delimiter ":"
Very cool!

However it does require the GNU version of cut, not the Mac OS X supplied BSD version. In zsh at least, you can do:

    alias cut='/usr/local/bin/gcut'

The fact that a filename can contain anything except NUL/slash is a real pain. I often write shell scripts that treat LF as the separator, but I know it's not good.

Why not use null as a separator?

I agree it's messy but not that hard. Just set IFS to null.
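
One hedged way to get NUL-separated handling without relying on shell-specific features, assuming `find -print0` and `xargs -0` are available (both are common but were not part of older POSIX). The file names are made up and include a space and a newline on purpose:

```shell
dir=$(mktemp -d)
touch "$dir/with space.txt"
touch "$dir/$(printf 'with\nnewline.txt')"

# xargs -0 splits only on NUL, so each path reaches the inner sh as one argument.
count=$(find "$dir" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "files: $count"
rm -rf "$dir"
```

In bash specifically, `while IFS= read -r -d '' f; do ...; done` fed from `find -print0` is a common alternative.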

filenames are variable names. Choosing good filenames is the first step in writing a nice shell script in a situation that you control.

Sometimes filenames are under your control and sometimes they aren't. Writing scripts defensively is better for security when filenames are out of your control.

No, they aren't -- they are data.

I think this was a failure on behalf of early terminal emulator developers. End-of-field and end-of-record should have been supported by the tty (either as visual symbols or via some sort of physical distancing, etc even if portrayed just as a space and a new line respectively) so that the semantic definition could have been preserved/distinguished.

TTYs in general are some of the most outdated and archaic pieces of technology still used today. They need a serious overhaul, but the problem is, any major change will break pretty much all programs.

Ah yes, perfect is the enemy of the good.

Works for me.

People struggle so much with this, but I don't see what the point is at all. The fundamental problem that playing with delimiters solves, is passing arbitrary strings through as single tokens.

Well, that's easy. Don't escape or quote the strings; encode them. Turn them into opaque tokens, and then do Unix things to the opaque tokens, before finally decoding them back to being strings.

There's a reason od(1) is in coreutils. It's a Unix fundamental when working with arbitrary data. Hex and base64 are your friends (and Unix tools are happy to deal with both.)

Never heard about this, would this affect performance?

Can you give some examples?
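
A minimal sketch of the idea using base64, assuming a `base64` command that accepts `-d` for decoding (as in GNU coreutils). `encode` and `decode` are hypothetical helper names, and the strings are made up:

```shell
encode() { printf '%s' "$1" | base64 | tr -d '\n'; }   # string -> opaque token
decode() { printf '%s' "$1" | base64 -d; }             # opaque token -> string

# Tokens contain no whitespace, so a plain space-separated list is safe.
tokens="$(encode 'my file.txt') $(encode "$(printf 'tab\tname')") $(encode 'plain')"

decoded_count=0
for t in $tokens; do
    name=$(decode "$t")
    decoded_count=$((decoded_count + 1))
    last_name=$name
done
roundtrip=$(decode "$(encode 'my file.txt')")

echo "decoded $decoded_count names, last one: $last_name"
```

On performance: each call forks a couple of processes, so this is fine for a handful of strings but would be slow in a tight loop over many thousands.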

> everything is text

Everything is a byte stream. Usually that means text but sometimes it doesn't. Which means you can do fun stuff like:

- copy file systems over a network: https://docs.oracle.com/cd/E18752_01/html/819-5461/gbchx.htm...

- stream a file into gzip

- backup or restore an SD card using `cat`
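
As a sketch of the byte-stream idea, here is a directory tree packed, compressed, decompressed, and unpacked in a single pipeline. The directories are throwaway temp dirs and the file is made up:

```shell
src=$(mktemp -d)
dest=$(mktemp -d)
printf 'hello\n' > "$src/a file.txt"

# Pack on one end of the pipe, compress and decompress in the middle, unpack on the other.
(cd "$src" && tar cf - .) | gzip -c | gunzip -c | (cd "$dest" && tar xf -)

restored=$(cat "$dest/a file.txt")
echo "$restored"
rm -rf "$src" "$dest"
```

None of the stages knows or cares what the bytes mean, which is why the middle of the pipe can be swapped for a network hop.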

Also dd, which may be the disk destroyer but is also a great tool for binary file miracles.

See one here: https://unix.stackexchange.com/questions/6852/best-way-to-re...

"data definition", from MVS JCL.


I recall hearing conjecture that cc was already a routine, and that "copy and convert" became dd as a result in a really old system that may have been pre-unix.

Researching, it appears that there are multiple attributions, because it's just that obvious and necessary a routine.




Yeah, disk destroyer or data deleter or whatever is the old joke, not its actual name. For those unaware, dd has often been used for stuff like overwriting a file with random noise as a homespun secure delete function, and it can do a lot of damage with small typos in the command.

Could have used a /s or something to clarify.

dd doesn’t exemplify byte stream pipelines though ;)

But yes, that’s the typical file / block management tool. Just remember to set a block size otherwise you’d suffer from worse performance than using cat
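
A sketch of setting the block size explicitly, using temp files filled with zeros as stand-in data. The 64 KiB figure is only an example, not a tuned recommendation:

```shell
src=$(mktemp)
dst=$(mktemp)

# 4 blocks of 64 KiB = 256 KiB of zeros as stand-in data.
dd if=/dev/zero of="$src" bs=65536 count=4 2>/dev/null

# Explicit block size; without bs= dd falls back to 512-byte blocks.
dd if="$src" of="$dst" bs=65536 2>/dev/null

copied_bytes=$(wc -c < "$dst" | tr -d ' ')
echo "copied $copied_bytes bytes"
rm -f "$src" "$dst"
```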

You're right! I was planning to tell a story about some bulk file changes using dd, but then I saw it was a giant "you had to be there" story so I posted with half a context.

> Unix is seriously uncool with young people at the moment.

Those damn kids with their loud rock 'n' roll music and their Windows machines. Back in my day we had the vocal stylings of Dean Martin and the verbal stylings of Linus Torvalds, let me tell ya.

Seriously though, I'm actually seeing younger engineers really taking the time to learn how to do shell magic, using vim, etc. It's like the generation of programmers who started up until the late 90s used those by default, people who like me started 15-20 years ago grew up on IDEs and GUIs, but I've seen a lot of 20-30 something devs who are really into the Unix way of doing things. The really cool kids are in fact doing it.

I guess I am one of the young kids who think the unix command line is wicked cool. It makes the user experience on my laptop feel so much more powerful.

Me too. Especially since I can regularly hit 100 WPM when typing, but I'm a terrible shot with the mouse - not to mention the fact that you can get really nice and satisfying keyboards (I use an IBM Model M at home and a CODE Cherry MX Clear at college) but mice are all kind of the same, and you have to move your hand a lot to get there. On that last point, my mouse hand gets wrist pain consistently way more than my other hand (which I only use for the keyboard) does.

Add to all that the fact that the command line allows for a lot of really easy composition and automation, and it's significantly better for me. I can hardly function on a Windows computer!


For projects where I must work in Windows my secret weapon is WSL + tmux + fish + vim for everything without a strict Visual Studio dependency.

"Millennials Have Ruined GUIs"

Hard to avoid using the mouse eventually when looking up docs or copy-pasting from the browser.

Yes, that’s a pain point I haven’t quite cracked.

Copy-paste within tmux works just fine with (a) set -g mouse on in ~/.tmux.conf (b) mouse select to copy and (c) Ctrl + B ] to paste.

Copy-paste from tmux to Windows or vice versa is a pain.

I haven’t tried all the terminals yet but what I need is an iTerm2 [1] for Windows.

[1]: https://iterm2.com

Unfortunately. But that's what VI keybindings for the browser are for!

I personally don't mind using the mouse here and there in vim.

I grew up an all-GUI Windows kid. I actually had a revulsion to the shell, mostly because it seemed unfriendly and scary. In my early 20s I tried to run Plex on a Windows 7 box and it was miserable. I forced myself to learn by switching to a headless Arch Linux box.

Giving up the idea that CLI = pain (i.e. figuring out how to navigate a file system, ssh keys, etc.) for sure was a learning curve, but now I can't imagine using computers without it.

I guess that's the issue with terminal - the learning curve. I did start with TUIs as a kid before Windows came along, and I remember that feeling when I started using GUIs - "oh, everything has a menu, no need to remember anything, no need to read any docs, it's all just there". That was a real revolution.

> (1) everything is text

Not at all, you can pipe around all the binary you want. Until GNU tar added the 'z' option, the way to extract all files in a tarball was:

`gunzip -c < foo.tar.gz | tar x`

However, "text files" in Unix do have a very specific definition, if you want them to work with standard Unix text manipulation utilities like awk, sed, and diff:

All lines of text are terminated by a line feed, even the last one.

I can't tell you how many "text files" I run across these days that don't have a trailing newline. It's as bad as mixing tabs and spaces, maybe worse.
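The definition above is easy to check mechanically. A minimal sketch in Python; the names `is_unix_text` and `missing_final_newline` are made up for illustration, and treating an empty file as valid text is my own assumption:

```python
def is_unix_text(data: bytes) -> bool:
    """True if every line, including the last one, ends with a line feed."""
    # An empty file has no lines, so there is no unterminated last line.
    return data == b"" or data.endswith(b"\n")


def missing_final_newline(path: str) -> bool:
    """Check a file on disk by reading it whole (fine for small files)."""
    with open(path, "rb") as f:
        return not is_unix_text(f.read())
```

Something like `missing_final_newline("notes.txt")` could then be run over a tree of files before handing them to awk, sed, or diff.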

Serialized bytestreams do compose better than graphical applications. But that is setting a very low bar. For example, allowing passing around dicts/maps/json (and possibly other data structures) would already be a massive improvement. You know what might be even better — passing around objects you could interact with by passing messages (gasp! cue, Alan Kay).

While posix and streams are nice (if you squint, files look like a poor man’s version of objects imposed on top of bytestreams), it’s about time we moved on to better things, and not lose sight of the bigger picture.

Yeah, but that makes every single file an object. Then you need plugins to deal with plain text objects, plugins to deal with every single type of object and their versions.

Objects aren't human readable, and if they get corrupted the object's format is very difficult to recover: you need intimate knowledge of the specific format used (of which there might be thousands of variations). Plaintext, however, stays human-readable even if it gets corrupted. The data might not be more compact (although TSV files tend to come out equal), but it's more robust against those kinds of changes.

Have you ever examined a protobuf file without knowing what the protobuf structure is? I have, as part of various reverse-engineering I did. It's a nightmare, and without documentation (That's frequently out of date or nonexistent) it's almost impossible to figure out what the data actually is, and you never know if you've got it right. Even having part of the recovered data, I can't figure out what the rest of it is.

But that seems like just ignoring the problem, like an ostrich shoving one’s head into the sand!

> Have you ever examined a protobuf file without knowing what the protobuf structure is?

Are you referring to an ASCII serialization or a binary serialization?

In a typical scenario, your application needed structured data (hence protobuf), and then you serialized your protobuf into an ASCII bytestream... so how are you ever worse off with a protobuf compared to a text file, provided you make a good choice of serialization scheme? Clearly text files cannot carry enough structure, so we have to build more layers of scaffolding on top?

Ultimately files are serialized on to disk as bits, and “ASCII” and “posix” are just “plugins” to be used when reading bitstreams (serialized file objects). Plaintext is not readable if the bits get corrupted or byte boundaries/endianness gets shifted!

To have the same amount of robustness with higher-order structures I imagine all one needs is a well-defined de/serialization protocol which would allow you to load new objects into memory? Am I missing something?

You could solve that problem with a system which mandated that structured data always come with a reference to a schema. (And where files as blobs disconnected from everything else only exist as an edge case for compatibility with other systems).

> For example, allowing passing around dicts/maps/json (and possibly other data structures) would already be a massive improvement.

It is allowed, you can pass around data in whatever encoding you desire. Not many do though because text is so useful for humans.

> You know what might be even better — passing around objects you could interact with by passing messages (gasp! cue, Alan Kay)

That's a daemon/service and basic unix utils make it easy to create, a shell script reading from a FIFO file fits this definition of object, the message passing is writing to the file and the object is passed around with the filename. Unix is basically a fulfillment of Alan Kay's idea of OO.
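A rough sketch of that idea, with a Python thread standing in for the shell script: a tiny "counter object" whose only interface is a FIFO. The message names `inc` and `dec`, the file name, and the function `run_counter` are all hypothetical.

```python
import os
import tempfile
import threading


def run_counter(messages):
    """Send messages to a FIFO-backed 'counter object'; return its final value."""
    with tempfile.TemporaryDirectory() as tmp:
        inbox = os.path.join(tmp, "counter.fifo")  # the filename is the object's handle
        os.mkfifo(inbox)
        state = {"count": 0}

        def serve():
            # Opening a FIFO for reading blocks until some writer opens it.
            with open(inbox, "r") as f:
                for line in f:  # one message per line, until the writer closes
                    if line.strip() == "inc":
                        state["count"] += 1
                    elif line.strip() == "dec":
                        state["count"] -= 1

        server = threading.Thread(target=serve)
        server.start()
        with open(inbox, "w") as f:  # "sending a message" is just writing to the file
            for message in messages:
                f.write(message + "\n")
        server.join()
        return state["count"]
```

Anything that can write a line to the FIFO path can talk to the object, which is the sense in which the object is "passed around with the filename".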

> For example, allowing passing around dicts/maps/json (and possibly other data structures) would already be a massive improvement.

Well, it is allowed. You can of course serialize some data into a structured format and pass it over a byte stream in POSIX to this end, and I think this is appropriate in some cases in terms of user convenience. Then you can use tools like xpath or jq to select subfields or mangle.
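As an illustration of passing structured data over a plain byte stream, here is a small sketch that sends JSON lines down a pipe to a child process, which picks out one field in the spirit of jq. The child script `SELECT` and the helper `select_field` are invented for this example.

```python
import json
import subprocess
import sys

# Child process: parse one JSON object per input line, print the requested field.
SELECT = (
    "import json, sys\n"
    "for line in sys.stdin:\n"
    "    print(json.loads(line)[sys.argv[1]])\n"
)


def select_field(records, field):
    """Serialize records as JSON lines, pipe them to the child, return its output lines."""
    payload = "".join(json.dumps(record) + "\n" for record in records)
    proc = subprocess.run(
        [sys.executable, "-c", SELECT, field],
        input=payload,
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout.splitlines()
```

Note that what comes back is text again: selecting a numeric field returns strings such as `"3"`, so the receiving side still has to parse.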

> (if you squint, files look like a poor man’s version of objects imposed on top of bytestreams)

If all you have is a hammer...

Try PowerShell; it's on Linux too.

Indeed, if it wasn’t for its reliance on .NET, it looks like PowerShell could be a massive improvement over Unix shell.

There are projects like NuShell that are similar to PowerShell but don't rely on .NET.

There is nothing like PowerShell.

I agree that all of them so far seem to be lacking in many ways compared to PowerShell, but saying they are "nothing like" PowerShell isn't a very accurate statement in my opinion.

No, that is totally accurate - there is nothing even close to PowerShell.

You were inaccurate on the other hand - what is similar to PowerShell ?

Structured object piping, for one. What is this thread about again? Oh right, piping.

"Structured object piping" is not a product.

That's some non-sequitur argument you got there.

Some consider not relying on C a massive improvement.

> Pipes are wonderful!

It's wonderful only if compared to worse things, pretending that PowerShell is not a thing, and that Python doesn't exist.

UNIX pipes are a stringly-typed legacy that we've inherited from the 1970s. The technical constraints of its past have been internalised by its proponents, and lauded as benefits.

To put it most succinctly, the "byte stream" nature of UNIX pipes means that any command that does anything high-level such as processing "structured" data must have a parsing step on its input, and then a serialization step after its internal processing.

The myth of UNIX pipes is that it works like this:

    process1 | process2 | process3 | ...

The reality is that physically, it actually works like this:

    (process1,serialize) | (parse,process2,serialize) | ...

Where each one of those "parse" and "serialise" steps is unique and special, inflexible, and poorly documented. This is forced by the use of byte streams to connect each step. It cannot be circumvented! It's an inherent limitation of UNIX style pipes.

This is not a limitation of PowerShell, which is object oriented, and passes strongly typed, structured objects between processing steps. It makes it much more elegant, flexible, and in effect "more UNIX than UNIX".

If you want to see this in action, check out my little "challenge" that I posed in a similar thread on YC recently:


The solution provided by "JoshuaDavid" really impressed me, because I was under the impression that that simple task is actually borderline impossible with UNIX pipes and GNU tools:


Compare that to the much simpler equivalent in PowerShell:


Especially take note that half of that script is sample data!

> Unix is seriously uncool with young people at the moment. I intend to turn that around and articles like this offer good material.

UNIX is entirely too cool with young people. They have conflated the legacy UNIX design of the 1970s with their preferred open source software, languages, and platforms.

There are better things out there, and UNIX isn't the be all and end all of system management.

> Where each one of those "parse" and "serialise" steps is unique and special, inflexible, and poorly documented.

This can also be a security concern. According to research, a great number of defects with security implications occur at the input handling layers:

http://langsec.org

Famously, "avoid parsing" was one of the qmail security principles (q.v.).

> http://langsec.org

The maintainer apparently has no transport security concerns. It takes literally two minutes to set up TLS these days

   The reality is that physically, it actually works like this:

      (process1,serialize) | (parse,process2,serialize) | ...

But as soon as you involve the network or persistent storage, you need to do all that anyway. And one of the beauties is that the tools are agnostic to where the data comes from, or goes.

So you're saying optional things should be forced on everything, because the rare optional case needs it anyway?

Look up what the Export-Csv, ConvertTo-Json, and Export-CliXml, Invoke-Command -ComputerName 'someserver', or Import-Sql commands do in PowerShell.

Actually, never mind, the clear and consistent naming convention already told you exactly what they do: they serialise the structured pipeline objects into a variety of formats of your choosing. They can do this with files, in-memory, to and from the network, or even involving databases.

This is the right way to do things. This is the UNIX philosophy. It's just that traditional shells typically used on UNIX do a bad job at implementing the philosophy.

If you have network or persistent storage, those parse and serialise steps are better served by a consistent, typed, and common serialize and parse step instead of a different ad-hoc step for each part of the chain (especially if there are then parts of the chain which are just munging the output of one serialisation step so it can work with the parse stage of another).

Typed streams sound great.

Most data being line oriented (with white space field separators) has historically worked well enough, but I get your point that the wheels will eventually fall off.

It’s important to remember that the shell hasn’t always been about high concept programming tasks.

  $ grep TODO xmas-gifts
  grandma TODO
  auntie Sian TODO
  jeanie MASTODON T shirt

The bug (bugs!) in the above example doesn’t really matter in the context of the task at hand.
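For the curious, one of those bugs is presumably that a plain substring search also hits the TODO inside MASTODON. A small sketch of the difference, assuming the gift list contains just the three lines shown; `LINES` and `grep_lines` are illustrative names:

```python
import re

# Assumed contents of the xmas-gifts file, taken from the grep output above.
LINES = [
    "grandma TODO",
    "auntie Sian TODO",
    "jeanie MASTODON T shirt",
]


def grep_lines(pattern, lines=LINES):
    """Return the lines that contain a match for the regular expression."""
    return [line for line in lines if re.search(pattern, line)]


substring_hits = grep_lines("TODO")       # also matches inside MASTODON
whole_word_hits = grep_lines(r"\bTODO\b")  # only TODO as a separate word
```

For a throwaway query the extra hit is harmless, which is the point being made; a long-lived tool would want the stricter match.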

More Lego rules:

(7) and common signalling facility

(8) also nice if some files have magic properties like /dev/random or /proc or /dev/null

(9) every program starts with 3 streams, stdin/stdout for work and stderr for out of band errors

> (9) every program starts with 3 streams, stdin/stdout for work and stderr for out of band errors

This is false - if a process closes stdin/out/err, its children won't have these files open.

That's correct, which is why I said _starts_ with the streams. The child can do what it wants with them after that.

His point was that if you close stdin/stdout/stderr and then fork a child, the new child will not have all of those streams when it starts - the three streams are a convention, not something enforced by the kernel when creating a new process.

That said, it's still a bit of a pedantic point, you can expect to always have those three fds in the same way you can expect `argv[0]` to represent the name of your executable (Which also isn't enforced by the kernel).

Actually, it's not a pedantic point. It's a security flaw waiting to happen (as it actually has) if one assumes that one always has three open file descriptors.

The privileged program opens a sensitive data file for writing, it gets assigned (say) file descriptor 2, because someone in a parent process took some advice to "close your standard I/O because you are a dæmon" to heart, other completely unrelated library code somewhere else blithely logs network-supplied input to standard error without sanitizing it, and (lo!) there's a remotely accessible file overwrite exploit of the sensitive data file.

Ironically, the correct thing to do as a dæmon is to not close standard I/O, but to leave setting it up to the service management subsystem, and use it as-is. That way, when the service management subsystem connects standard output and error to a pipe and arranges to log everything coming out the other end, daemontools-style, the idea actually works. (-:
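The descriptor reuse described above can be reproduced harmlessly. In this sketch a child process closes fd 2, opens a "sensitive" file, and then some unrelated code writes to standard error. The file name, the log line, and the helper `demo_fd_reuse` are made up for the demonstration; the only real behaviour relied on is that open() returns the lowest free descriptor.

```python
import os
import subprocess
import sys
import tempfile

CHILD = r"""
import os, sys
os.close(2)                            # the "close your standard I/O" advice
fd = os.open(sys.argv[1], os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
os.write(1, str(fd).encode())          # report which descriptor the file got
os.write(2, b"untrusted log line\n")   # unrelated code "logging to stderr"
"""


def demo_fd_reuse():
    """Return (fd assigned to the file, bytes that ended up in the file)."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "sensitive.dat")
        proc = subprocess.run(
            [sys.executable, "-c", CHILD, target],
            stdin=subprocess.DEVNULL,   # make sure fds 0, 1 and 2 all start open
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            check=True,
        )
        with open(target, "rb") as f:
            return int(proc.stdout), f.read()
```

The file lands on descriptor 2, so the "log message" is written straight into it.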

I defy anyone to build their error-handling logic off the back of stderr…

Logic, not in this case, just messages.

The point of separate error streams echoes the main Unix philosophy of doing small things, on the happy path, that work quietly or break early. Your happy results can be chained on to the next thing and your errors go somewhere for autopsy if necessary. Eg,

    p1 | p2 | p3 > result.txt 2> errors.txt

will not contaminate your result file with any stderr gripes. You may need to arrange error streams from p1 and p2 using () or their own redirection. If it dies, it will die early.
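A sketch of the same arrangement built by hand, with the error streams of both stages pointed at one shared file while only the last stage's stdout is kept as the result. The two stage scripts `P1` and `P2` are stand-ins invented for the example.

```python
import subprocess
import sys
import tempfile

# Stand-in stages: p1 emits two lines, p2 sorts them; both complain on stderr.
P1 = "import sys; print('b'); print('a'); print('p1 gripe', file=sys.stderr)"
P2 = ("import sys; sys.stdout.writelines(sorted(sys.stdin)); "
      "print('p2 gripe', file=sys.stderr)")


def run_pipeline():
    """Run p1 | p2; return (result text, sorted stderr lines from both stages)."""
    with tempfile.TemporaryFile() as errors:
        p1 = subprocess.Popen([sys.executable, "-c", P1],
                              stdin=subprocess.DEVNULL,
                              stdout=subprocess.PIPE, stderr=errors)
        p2 = subprocess.Popen([sys.executable, "-c", P2],
                              stdin=p1.stdout,
                              stdout=subprocess.PIPE, stderr=errors)
        p1.stdout.close()  # only p2 should hold the read end of the pipe
        result, _ = p2.communicate()
        p1.wait()
        errors.seek(0)
        gripes = sorted(errors.read().decode().splitlines())
        return result.decode(), gripes
```

The order in which the two gripes arrive is not deterministic, hence the sort; the result stream stays clean either way.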

BUT speaking of logic, the concept is close to Railway Oriented Programming, where you can avoid all the failure conditionals and just pass a monad out of a function, which lets you chain happy path neatly. Eg, nice talk here https://fsharpforfunandprofit.com/posts/recipe-part2/

1. All data is bytes

2. Everything is a file descriptor

3. File descriptors are things that support standard I/O system calls such as read and write

You are describing Plan 9.

> (2) everything (ish) is a file

I'm not all that knowledgeable about Unix history, but one thing that has always puzzled me was that for whatever reason network connections (generally) aren't files. While I can do:

  cat < /dev/ttyS0

to read from a serial device, I've always wondered why I can't do something like:

  bind /tmp/mysocket 80
  cat < /tmp/mysocket

It is weird that so many things in Unix are somehow twisted into files (/dev/fb0??), but network connections, to my knowledge, never managed to go that way. I know we have netcat but it's not the same.

The original authors of "UNIX" considered their creation to be fatally flawed, and created a successor OS called "Plan 9", partially to address this specific deficit, among others like the perceived tacked-on nature of GUIs on *nix.

That's a whole rabbit hole to go down, but Plan 9 networking is much more like your hypothetical example [0]. Additionally, things like the framebuffer and devices are also exposed as pure files, whereas on Linux and most unix-like systems such devices are just endpoints for making ioctl() calls.

[0] http://doc.cat-v.org/plan_9/4th_edition/papers/net/

I think you're just seeing the abstraction fall apart a bit - for weirder devices, I would say they are "accessible" via a file, but you likely can't interact with it without special tools/syscalls/ioctls. For example, cat'ing your serial connection won't always work, occasionally you'll need to configure the serial device or tty settings first and things can get pretty messy. That's why programs like `minicom` exist.

For networking, I don't think there's a particular reason it doesn't exist, but it's worth noting sockets are a little special and require a bit of extra care (which is why they have special syscalls like `shutdown()` and `send()` and `recv()`). If you did pass a TCP socket to `cat` or other programs (which you can do! just a bit of fork-exec magic), you'd discover it doesn't always work quite right. And while I don't know the history on why such a feature doesn't exist, given that tools like socat or curl do the job pretty well, I don't think it's seen as that necessary.
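For what it's worth, handing a connected TCP socket to `cat` as its standard input looks roughly like this. The sketch talks to a throwaway listener on 127.0.0.1 so it is self-contained; `cat_from_socket` is a made-up helper, and it only covers the simple case where the peer sends some data and closes.

```python
import socket
import subprocess
import threading


def cat_from_socket(payload: bytes) -> bytes:
    """Serve payload on a local TCP port, then let `cat` read it from the socket."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    server.listen(1)

    def serve():
        conn, _ = server.accept()
        with conn:
            conn.sendall(payload)  # closing the connection gives the reader EOF

    thread = threading.Thread(target=serve)
    thread.start()
    client = socket.create_connection(server.getsockname())
    try:
        # The socket's file descriptor becomes fd 0 of the child process.
        proc = subprocess.run(["cat"], stdin=client,
                              stdout=subprocess.PIPE, check=True)
    finally:
        client.close()
        thread.join()
        server.close()
    return proc.stdout
```

The cases where this stops working "quite right" are the ones that need socket-specific calls, such as half-closing the connection after sending a request.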

Theoretically it would work if the devices were exposed as directories, not files (which is basically what /proc actually does for most things).

/dev/ttyS0 should really be /dev/ttyS0/{data, ctr, baud, parity, stopbits, startbits} (probably a bunch I'm forgetting).

Congratulations, you've re-invented Plan9!

Berkeley sockets were kind of bolted on as an afterthought. If the AT&T guys developed them they would probably look a lot more like that.

There's no need for a subjunctive. Go and look at AT&T STREAMS and see what actually did result.

elsewhere in this thread somebody mentions socat, but you can do it entirely within bash. from https://www.linuxjournal.com/content/more-using-bashs-built-... :

    # open fd 3 for reading and writing on a TCP connection (a Bash feature)
    exec 3<>/dev/tcp/www.google.com/80
    # send the request; the Host header carries the bare hostname
    echo -e "GET / HTTP/1.1\r\nhost: www.google.com\r\nConnection: close\r\n\r\n" >&3
    # read the response from the same descriptor
    cat <&3

https://news.ycombinator.com/item?id=23422423 has a good point that "everything is a file" is maybe less useful than "everything is a file descriptor". the shell is a tool for setting up pipelines of processes and linking their file descriptors together.

Daniel J. Bernstein's little known http@ tool is just a shell script wrapped around tcpclient, and is fairly similar, except that it does not use the Bashisms of course:

    printf 'GET /%s HTTP/1.0\nHost: %s:%s\n\n' "${2-}" "${1-0}" "${3-80}" |
    /usr/local/bin/tcpclient -RHl0 -- "${1-0}" "${3-80}" sh -c '
    /usr/local/bin/addcr >&7
    exec /usr/local/bin/delcr <&6
    ' |
    awk '/^$/ { body=1; next } { if (body) print }'

In the C API, you can read/write to sockets. Wouldn't be difficult to implement something like your example with a named pipe.

Additionally, there is /dev/tcp in bash.

Sounds like you want netcat?

Or potentially even socat (http://www.dest-unreach.org/socat/), which could be used (among many other things) to model the use case of GP, forwarding a socket to a pipe.

>(1) everything is text

>(2) everything (ish) is a file

I'm having a moment. I have to get logs out of AWS, the thing is frustrating to no end. There's a stream, it goes somewhere, it can be written to S3, topics and queues, but it's not like I can jack it to a huge file or many little files and run text processing filters on it. Am I stupid, behind the times, tied to a dead model, just don't get it? There's no "landing" anywhere that I can examine things directly. I miss the accessibility of basic units which I can examine and do things to with simple tools.

I worked around that by writing a Python script to let me pick the groups of interest, download all of the data for a certain time period, and collapse them into a CSV file ordered by time stamp.

Yes, it’s very annoying I have to do all that, but I commend Bezos for his foresight in demanding everything be driven by APIs.

Can you give me any clue as to what execve does? I looked at the man page but am none the wiser. Sounds like magic from what I read there. I'm from a Windows background and not used to pipes.

It replaces the executable in the current process with a different executable. It's kind of like spawning a new process with an executable, except it's the same PID, and file descriptors without CLOEXEC remain open in their original state.

would `become` be a good alternate name for it?

That's loosely correct.

whats a simple example of using this?

It’s how all new processes in Unix are created. There is no “spawn new process running this code” operation. Instead:

(1) you duplicate yourself with fork()

(2) the parent process continues about its business

(3) the created process calls execve() to become something else

This is the mechanism by which all new processes are created, from init (process 1) to anything you run from your shell.
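The three steps above can be sketched in a few lines of Python, whose `os.fork` and `os.execve` are thin wrappers over the system calls. The helper `fork_exec` is invented for illustration; it also captures the child's output through a pipe so the result can be inspected.

```python
import os


def fork_exec(path, argv):
    """fork(), execve() in the child, wait in the parent.

    Returns (child pid, exit code, bytes the child wrote to stdout).
    """
    read_end, write_end = os.pipe()
    pid = os.fork()                      # (1) duplicate ourselves
    if pid == 0:
        # (3) the created process rewires stdout, then becomes something else
        try:
            os.close(read_end)
            os.dup2(write_end, 1)
            os.close(write_end)
            os.execve(path, argv, {})    # only returns if it fails
        finally:
            os._exit(127)
    # (2) the parent continues about its business
    os.close(write_end)
    chunks = []
    while True:
        chunk = os.read(read_end, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_end)
    _, status = os.waitpid(pid, 0)
    return pid, os.WEXITSTATUS(status), b"".join(chunks)
```

Calling `fork_exec("/bin/sh", ["sh", "-c", "echo $$"])` makes the shell print its own process ID, which is the same PID that `fork()` handed back to the parent: the process was created by fork and merely changed programs at execve.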

Real life examples here, in this example of a Unix shell:


Bash is probably the best example of this style of working. You have a shell process that forks and spawns child processes from itself that are turned into whatever program you've called from your terminal.

see here: https://pubs.opengroup.org/onlinepubs/9699919799/functions/e...

plenty of basic examples with execv, execve, execvp, etc.

if the execv() call succeeds, then it completely replaces the program running in the process that calls it

you can do plenty of tricks with that ;)

Well you have a Windows background, so you know BASIC, right? And you know that BASIC has a CHAIN statement, right? (-:

Yes, I know. You've probably never touched either one. But the point is that this is simply chain loading. It's a concept not limited to Unix. execve() does it at the level of processes, where one program can chain to another one, both running in a single process. But it is most definitely not magic, nor something entirely alien to the world of Windows.

* https://en.wikibooks.org/wiki/QBasic/Appendix#CHAIN

If you are not used to pipes on Windows, you haven't pushed Windows nearly hard enough. The complaint about Microsoft DOS was that it didn't have pipes. But Windows NT has had proper pipes all along since the early 1990s, as OS/2 did before it since 1987. Microsoft's command interpreter is capable of using them, and all of the received wisdom that people knew about pipes and Microsoft's command interpreters on DOS rather famously (it being much discussed at the time) went away with OS/2 and Windows NT.

And there are umpteen ways of improving on that, from JP Software's Take Command to various flavours of Unix-alike tools. And you should see some of the things that people do with FOR /F .

Unix was the first with the garden hosepipe metaphor, but it has been some 50 years since then. It hasn't been limited to Unix for over 30 of them. Your operating system has them, and it is very much worthwhile investigating them.

Thanks. I use pipes, occasionally ( I admit I prefer to code in a scripting language ) in Windows. It’s just execve that I was not familiar with.

It's how you launch (execute) a new program.

Such programs typically don’t have any interaction so the v and the e parts refer to the two ways you can control what the program does.

v: a vector (list) of input parameters (aka arguments)

e: an environment of key-value pairs

The executed program can interpret these two in any way it wishes to, though there are many conventions that are followed (and bucked by the contrarians!) like using “-” to prefix flags/options.

This is in addition to any data sent to the program’s standard input — hence the original discussion about using pipes to send the output of one command to the input of another.
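To make the three channels concrete, here is a sketch in which a child process simply reports what it received through each of them. The variable name `GREETING`, the child script `REPORT`, and the helper `describe_inputs` are all made up for the example.

```python
import json
import os
import subprocess
import sys

# Child: report its argument vector, one environment variable, and its stdin.
REPORT = (
    "import json, os, sys\n"
    "print(json.dumps({'argv': sys.argv[1:],\n"
    "                  'greeting': os.environ.get('GREETING'),\n"
    "                  'stdin': sys.stdin.read()}))\n"
)


def describe_inputs(args, greeting, stdin_text):
    """Run the child with the given arguments, environment value and stdin."""
    env = dict(os.environ, GREETING=greeting)  # e: environment of key-value pairs
    proc = subprocess.run(
        [sys.executable, "-c", REPORT, *args],  # v: vector of arguments
        env=env,
        input=stdin_text,                       # data on standard input
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)
```

What the child does with `-n 3` is entirely up to the child; the kernel just hands the strings over.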

“(1) everything is text”

LOL. What fantasy land are you living in?

Pipes are great. Untyped, untagged pipes are not. They’re a frigging disaster in every way imaginable.

“Unix is seriously uncool with young people at the moment.”

Unix is half a century old. It’s old enough to be your Dad. Hell, it’s old enough to be your Grandad! It’s had 50 years to better itself and it hasn’t done squat.

Linux? Please. That’s just a crazy cat lady living in Unix’s trash house.

Come back when you’ve a modern successor to Plan 9 that is easy, secure, and doesn’t blow huge UI/UX chunks everywhere.

>Unix is seriously uncool with young people...

Not all of us. I much prefer the Unix way of doing things, it just makes more sense than just trying to become another Windows.

Everything is a file until one needs to do UNIX IPC, networking or high performance graphics rendering.

> Unix is seriously uncool with young people at the moment. I intend to turn that around and articles like this offer good material.

I offer you my well wishes on that.

How’s it uncool? Just not “ mobile”?


Linux, ew!”

A big part of teaching computer science to children is breaking this obsession with approaching a computer from the top down — the old ICT ways, and the love of apps — and learning that it is a machine under your own control the understanding of which is entirely tractable from the bottom up.

Unlike the natural sciences, computer science (like math) is entirely man made, to its advantage. No microscopes, test tubes, rock hammers, or magnets required to investigate its phenomena. Just a keyboard.

>No microscopes, test tubes, rock hammers, or magnets required to investigate its phenomena.

They're full of magnets, and with electron microscope you can analyze what's going on in the hardware.

Yeah, but it's not required for a lot of investigation.

Until stuff goes wrong with the hardware...

My sense is that people do think Linux is "cool" (it's certainly distinguished in being free, logical, and powerful), but its age definitely shows. My biggest pain points are:

- Bash is awful (should be replaced with Python, and there are so many Python-based shells now that reify that opinion)

- C/C++ are awful

- Various crusty bits of Linux that haven't aged well besides the above items (/usr/bin AND /usr/local/bin? multiple users on one computer?? user groups??? entering sudo password all the time????)

- The design blunder of assuming that people will read insanely dense man pages (the amount of StackOverflow questions[0][1][2] that exist for anything that can be located in documentation are a testament to this)

- And of course no bideo gambes and other things (although hopefully webapps will liberate us from OS dependence soon[3][4])

0: https://stackoverflow.com/questions/3689838/whats-the-differ...

1: https://stackoverflow.com/questions/3528245/whats-the-differ...

2: https://stackoverflow.com/questions/3639342/whats-the-differ...

3: https://discord.com/

4: http://wasm.continuation-labs.com/d3demo/

- Python as a shell would be the worst crapware ever. Whitespace syntax, no proper file autocompletion, no good pipes, no nothing. Even Tclsh with readline would be better than a Python shell.

- By mixing C and C++ you look like a clueless youngster.

- /usr/local is to handle non-base/non-packages stuff so you don't trash your system. Under OpenBSD, /usr/local is for packages, everything else should be under /opt or maybe ~/src.

- OpenBSD's man pages slap hard any outdated StackOverflow crap. Have fun with a 3 yo unusable answer. Anything NON-documented on -release- goes wrong fast. Have a look at Linux HOWTOs. That's the future of StackOverflow's usefulness over the years: near none, because half of the answers will not apply.

You took that python shell too literally. Ipython included a shell profile (you can still use it yourself) and https://xon.sh/ works just fine.

The ISO C and C++ papers are full of C/C++ references, are the ISO masters clueless youngsters?

Or for that matter the likes from Apple, Google and Microsoft?

Even C99 has nothing to do with bastard children such as C++.

The last is convoluted crap, full of redundant functions.

Further C standards do exist, but C99 and heck, even ANSI C can be good enough for daily uses.

Perl, Ruby and PHP look close to each other, the first being the source of inspiration.

But don't try to write PHP/Ruby code as if it was something close to Perl. The same with C++.

> Even Tclsh with readline would be better than a Python shell

In fact, I think Tcl would make an excellent sh replacement.

So, wish(1)?

There were attempts to create an interactive shell from the Tcl interpreter, but the syntax was a bit off (upvar and a REPL can drive you mad). If you avoided that, you could use it practically as a normal shell. Heck, you can run nethack just fine under tclsh.

> multiple users on one computer??

Last I checked this is something all the mainstream desktop operating systems support, so for you to call this out as a linux weirdness is baffling.

> - Bash is awful (should be replaced with Python, and there are so many Python-based shells now that reify that opinion)

It has its pain points, but the things that (ba)sh is good at, it's really good at and python, in my experience, doesn't compete. Dumb example: `tar czf shorttexts.tar.gz $(find . -type f | grep ^./...\.txt) && rsync shorttexts.tar.gz laptop:`. I could probably make a python script that did the equivalent, but it would not be a one-liner, and my feeling is that it would be a lot uglier.
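For comparison, a rough Python counterpart of the tar half of that one-liner. This is a sketch under assumptions: it treats the grep pattern as the quoted regex `^\./...\.txt` applied to paths relative to the starting directory, it leaves out the rsync step, and `archive_short_texts` is a made-up name.

```python
import re
import tarfile
from pathlib import Path

# Assumed intent of the grep pattern: "./", any three characters, then ".txt".
PATTERN = re.compile(r"\./...\.txt")


def archive_short_texts(root, archive):
    """Add matching regular files under root to a gzipped tarball; return their names."""
    root = Path(root)
    names = []
    with tarfile.open(archive, "w:gz") as tar:
        for path in sorted(root.rglob("*")):
            rel = "./" + path.relative_to(root).as_posix()
            if path.is_file() and PATTERN.match(rel):
                tar.add(path, arcname=rel)
                names.append(rel)
    return names
```

It works, but it does take a dozen lines to say what the shell says in one, which rather supports the point.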

> - Various crusty bits of Linux that haven't aged well besides the above items (/usr/bin AND /usr/local/bin? multiple users on one computer?? user groups??? entering sudo password all the time????)

In order: /usr/bin is for packaged software; /usr/local is the sysadmin's local builds. Our servers at work have a great many users, and we're quite happy with it. I... have no idea why you would object to groups. If you're entering your sudo password all the time, you're probably doing something wrong, but you're welcome to tell it to not require a password or increase the time before reprompting.

> - The design blunder of assuming that people will read insanely dense man pages (the amount of StackOverflow questions[0][1][2] that exist for anything that can be located in documentation are a testament to this)

I'll concede that many GNU/Linux manpages are a bit on the long side (hence bro and tldr pages), but having an actual manual, and having it locally (works offline) is quite nice. Besides which, you can usually just search through it and find what you want.

> - And of course no bideo gambes and other things (although hopefully webapps will liberate us from OS dependence soon[3][4])

Webapps have certainly helped the application situation, but Steam and such have also gotten way better; it's moved from "barely any video games on Linux" to "a middling amount of games on Linux".

> - Bash is awful (should be replaced with Python, and there are so many Python-based shells now that reify that opinion)

You know that no one uses Bash these days right?

Fish shell, Zsh are the much newer and friendlier shells these days. All my shells run on Fish these days.

I have to tell you that my shell ergonomics have improved a lot since I started using fish shell.

But to your point, _why_ is Bash awful?

> You know that no one uses Bash these days right?

I strongly disagree although, just like you, I have no evidence to support my claim

> I strongly disagree although, just like you, I have no evidence to support my claim

Haha. Fair enough!

I don't get this. Linux is just the kernel, there's a variety of OS distributions which allow you to customise them infinitely to be what ever you want.

I was also skeptical of that statement. I did a google trends search and indeed there seems to be a slow decline in command-line related searches:


My complaints with unix (as someone running linux on every device, starting to dip my toe into freebsd on a vps); apologies for lack of editing:

> everything is text

I'd often like to send something structured between processes without needing both sides to have to roll their own de/serialization of the domain types; in practice I end up using sockets + some thrown-together HTTP+JSON or TCP+JSON thing instead of pipes for any data that's not CSV-friendly

> everything (ish) is a file

> including pipes and fds

To my mind, this is much less elegant when most of these things don't support seeking, and fcntls have to exist.

It'd be nicer if there were something to declare interfaces like Varlink [0, 1], and a shell that allowed composing pipelines out of them nicely.

> every piece of software is accessible as a file, invoked at the command line

> ...with local arguments

> ...and persistent global in the environment

Sure, mostly fine; serialization still a wart for arguments + env vars, but one I run into much less

> and common signalling facility

As in Unix signals? tbh those seem ugly too; sigwinch, sighup, etc ought to be connected to stdin in some way; it'd be nice if there were a more general way to send arbitrary data to processes as an event

> also nice if some files have magic properties like /dev/random or /proc or /dev/null

userspace programs can't really extend these though, unless they expose FUSE filesystems, which sounds terrible and nobody does

also, this results in things like [2]...

> every program starts with 3 streams, stdin/stdout for work and stderr for out of band errors

iow, things get trickier once I need more than one input or one output. :)

I also prefer something like syslog over stderr, but again this is an argument for more structured things.

[0]: https://varlink.org/

[1]: ideally with sum type support though, and maybe full dependent types like https://dhall-lang.org/

[2]: https://www.phoronix.com/scan.php?page=news_item&px=UEFI-rm-...

> As in Unix signals? tbh those seem ugly too; sigwinch, sighup, etc ought to be connected to stdin in some way; it'd be nice if there were a more general way to send arbitrary data to processes as an event

Well, as the process can arbitrarily change which stdin it is connected to, as stdin is a file, you need some way to still issue directions to that process.

However, for the general case, your terminal probably supports sending sigint via ctrl + c, sigquit via ctrl + backslash, and the suspend/unsuspend signals via ctrl + z or y. Some terminals may allow you to extend those to the various other signals you'd like to send. (But these signals are being handled by the shell - not by stdin).

Sure, I mean more like there'd be some interface ByteStream, and probably another interface TerminalInput extends ByteStream; TerminalInput would then contain the especially terminal-related signals (e.g. sigwinch). If a program doesn't declare that it wants a full TerminalInput for stdin, the sigwinches would be dropped. If it declares it wants one but the environment can't provide one (e.g. stdin is closed, stdin is a file, etc.) it would be an error from execve().

Other signals would be provided by other resources, ofc; e.g. sigint should probably be present in all programs.


In general, my ideal OS would look something like [0], but with isolation between the objects (since they're now processes), provisions for state changes (so probably interfaces would look more like session types), and with a type system that supports algebraic data types.

[0]: https://www.tedinski.com/2018/02/20/an-oo-language-without-i...

I have written something on this topic previously: http://blog.rfox.eu/en/Programmers_critique_of_missing_struc...

> iow, things get trickier once I need more than one input or one output. :)

you can pass as many FDs to a child as you want, 0/1/2 is just a convention
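A minimal sketch of what that can look like from the shell, assuming a made-up helper called `with_header` that expects an extra input on fd 3 in addition to stdin (the helper name and the choice of fd 3 are purely illustrative):

```shell
# with_header: hypothetical helper. Reads ONE line from fd 3 and prints it,
# then copies everything arriving on stdin (fd 0) after it.
with_header() {
  IFS= read -r header <&3
  printf '%s\n' "$header"
  cat
}

# The data arrives on stdin through the pipe; the header arrives on fd 3
# through a here-document attached with 3<<.
result=$(
  printf 'a\nb\n' | with_header 3<<'EOF'
name
EOF
)
printf '%s\n' "$result"
```

The caller has to know that fd 3 means "header" here; nothing in the interface declares it.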

At that point you're writing something for /usr/libexec, not /usr/bin though. (i.e., at that point it becomes sufficiently inconvenient to use from the shell that no user-facing program would do so.)

That's pretty much false on every point.

* https://unix.stackexchange.com/a/331104/5132

Unix is seriously cool and I consider myself still young :-) But it could be better. See https://relational-pipes.globalcode.info/

I use pipelines as much as the next guy but every time I see a post praising how awesome they are, I'm reminded of the Unix Hater's Handbook. Their take on pipelines is pretty spot on too.


I mostly like what they wrote about pipes. I think the example of bloating they talked about in ls at the start of the shell programming section is a good example: if pipelines are so great, why have so many unix utilities felt the need to bloat?

I think it is a result of there being just a bit too much friction in building a pipeline. A good portion tends to be massaging text formats. The standard unix commands for doing that tend to have infamously bad readability.

Fish Shell seems to be making this better by providing a string command whose syntax makes it clear what it is doing: http://fishshell.com/docs/current/cmds/string.html I use fish shell, and I can usually read and often write text manipulations with the string command without needing to consult the docs.

Nushell seems to take a different approach: add structure to command output. By doing that, it seems that a bunch of stuff that is super finicky in the more traditional shells ends up being simple and easy commands with one clear job in nushell. I have never tried it, but it does seem to be movement in the correct direction.

It's less that pipelines are friction, they're really not.

It's more that people like building features and people don't like saying no to features.

The original unix guys had a rare culture that was happy to knock off unnecessary features.

> Nushell seems to take a different approach: add structure to command output. By doing that, it seems that a bunch of stuff that is super finicky in the more traditional shells ends up being simple and easy commands with one clear job in nushell. I have never tried it, but it does seem to be movement in the correct direction.

I tried nushell a few times and the commands really compose better due to the structured approach. How would one sort the output of ls by size in bash without letting ls do the sorting? In nushell it is as simple as "ls | sort-by size".
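For comparison, a rough sketch of the bash side of that question, assuming the usual `ls -l` layout where the size is the fifth column and that file names contain no whitespace (both are assumptions, and exactly the kind that structured output makes unnecessary). `sort_by_size` is a made-up helper name:

```shell
# sort_by_size DIR: print the entries of DIR, smallest first,
# letting sort(1) do the ordering instead of ls -S.
sort_by_size() {
  ls -l "$1" |
    tail -n +2 |        # drop the "total N" summary line
    sort -k5,5n |       # numeric sort on column 5 (size in bytes)
    awk '{print $NF}'   # keep only the last field (the name)
}
```

It works, but the column number and the header line are both things you have to know about the text format up front.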

> The Macintosh model, on the other hand, is the exact opposite. The system doesn’t deal with character streams. Data files are extremely high level, usually assuming that they are specific to an application. When was the last time you piped the output of one program to another on a Mac?

And yet, at least it's possible to have more than one Macintosh application use the same data. Half the world has migrated to web apps, which are far worse. As a user, it's virtually impossible to connect two web apps at all, or access your data in any way except what the designers decided you should be able to do. Data doesn't get any more "specific to an application" than with web apps.

Command-line tools where you glue byte streams are, in spirit, very much like web scraping. Sure, tools can be good or bad, but by design you are going to have ad-hoc input/output formats, and for some tools this is interleaved with presentation/visual stuff (colors, alignment, headers, whitespaces, etc.).

In a way web apps are then more like standard unix stuff, where you parse whatever output you get, hoping that it has enough usable structure to do an acceptable job.

The most reusable web apps are those that offer an API, with JSON/XML data formats where you can easily automate your work, and connect them together.

> The most reusable web apps are those that offer an API, with JSON/XML data formats where you can easily automate your work, and connect them together.

Sadly, this kind is also the rarest.

I think of the Unix Hater's Handbook as a kind of loving roast to Unix, that hackers of the time understood to be humorous (you know, how people complain about the tools they use every day, much like people would later complain endlessly about Windows) and which was widely misunderstood later to be a real scathing attack.

It hasn't aged very well, either. "Even today, the X server turns fast computers into dumb terminals" hasn't been true for at least a couple of decades...

> It hasn't aged very well, either. "Even today, the X server turns fast computers into dumb terminals" hasn't been true for at least a couple of decades...

You're not wrong, but that's only because people wrote extensions for direct access to the graphics hardware... which obviously don't work remotely, and so aren't really in the spirit of X. It's great that that was possible, but OTOH it probably delayed the invention of something like Wayland for a decade+.

It's been ages since I've used X for a remote application, but I sometimes wonder how many of them actually really still work to any reasonable degree. I remember some of them working when I last tried it, albeit with abysmal performance compared to Remote desktop, for example.

Fair enough! I wasn't really thinking of the remote use case which was indeed X's main use case when it was conceived (and which is not very relevant today for the majority of users).

In any case, this was just an example. The Handbook is peppered with complaints which haven't been relevant for ages. It was written before Linux got user-friendly UIs and spread to almost every appliance on Earth. It was written before Linux could run AAA games. It was written before Docker. It was written before so many people knew how to program simple scripts. It was written before Windows and Apple embraced Unix.

If some of these people are still living, I wonder what they think of the (tech) world today. Maybe they are still bitter, or maybe they understand how much they missed the mark ;)

Oh, certainly. It was really just a nitpick :).

It's not great. On a local network I can use Firefox reasonably well, but I can entirely forget about that in the current remote working environment with VPN and SSH... And this is from a land-line connection with reasonably powerful machines at both ends. A response time of a few seconds makes the user experience useless...

No, I think it's mostly just as angry and bitter as it sounds, coming from people who backed the wrong horse (ITS, Lisp Machines, the various things Xerox was utterly determined to kill off... ) and got extremely pissy about Unix being the last OS concept standing outside of, like, VMS or IBM mainframe crap, neither of which they'd see as improvements.

Just because Unix was the last man standing, doesn't mean it's good. Bad products win in the marketplace all the time and for all kinds of reasons. In Unix's case I'd argue the damage is particularly severe because hackers have elevated Unix to a religion and insisted that all its flaws are actually virtues.

I am firmly in the "ugh" camp. I strongly suspect that the fawning that occurs over pipelines is because of sentimentality more than practicality. Extracting information from text using a regular expression is fragile. Writing fragile code brings me absolutely no joy as a programmer - unless I am trying to flex my regex skills.

If you really look at how pipelines are typically used, lines are analogous to objects and [whatever the executable happens to use] delimiters are analogous to fields. Piping bare objects ("plain old shell objects") makes far more sense, involves far less typing and is far less fragile.

If you're having to extract your data using a regex, then the data probably isn't well-formed enough for a shell pipeline. It's doable, but a bad idea.

Regex should not be the first hammer you reach for, because it's a scalpel.

I recently wanted cpu cores + 1. That could be a single regex. But this is more maintainable, and readable:

    echo '1 + '"$(grep 'cpu cores' /proc/cpuinfo | tail -n1 | awk -F ':' '{print $2}')" | bc
There's room for a regex there. Grep could have done it, and then I wouldn't need the others... But I wouldn't be able to come back in twelve months and instantly be able to tell you what it is doing.

Compare to using a real programming language, such as Go:

  return runtime.NumCPU() + 1
Sure, there needs to be os to program (and possibly occasionally program to program) communication, but well defined binary formats are easier to parse in a more reliable manner. Plus they can be faster, and moved to an appropriate function.

OP wants number of CORES, not CPUs (excluding HT 'cores'). I know this is old, but if someone finds their way here looking for a solution...

In BASH number of CPUs + 1 is as easy as:

    echo $(($(getconf _NPROCESSORS_ONLN)+1))

or:

    echo $(($(nproc)+1))

getconf even works on OS X!

Many of the other solutions confuse the two. As far as I can tell, using OP's solution is one of the better ways to get core count.

Could also: echo $(($(lscpu -p | tail -n 1 | cut -d',' -f 2)+2)) ...but OP's solution is probably easier to read...
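To make the CPUs-vs-cores distinction concrete, here is a sketch that counts both from /proc/cpuinfo-style text. It assumes the x86 Linux layout, where each processor block carries "physical id" and "core id" lines (other architectures and some VMs omit them, in which case the second number comes out as 0). `count_cores` is a made-up name:

```shell
# count_cores: reads /proc/cpuinfo-style text on stdin and prints
# "<logical processors> <distinct physical cores>".
count_cores() {
  awk -F ':[ \t]*' '
    /^processor/   { logical++ }                 # one per hardware thread
    /^physical id/ { pkg = $2 }                  # socket of the current block
    /^core id/     { seen[pkg "," $2] = 1 }      # unique (socket, core) pairs
    END { n = 0; for (k in seen) n++; print logical + 0, n }
  '
}

# Typical use on Linux:
#   count_cores < /proc/cpuinfo
```

With hyperthreading on, the first number is typically twice the second.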

Would anyone be able to approach this, without knowledge of /proc/cpuinfo, and understand what `$2` refers to?

> Regex should not be the first hammer you reach for, because it's a scalpel.

You used a regex tool: grep.

It's plaintext, you could just look at it. That's the beauty of text based systems.

This is where you need to use awk.

    awk -F ':' '/cpu cores/ {a = $2 + 1} END {print a}' /proc/cpuinfo

> This is where you need to use awk.

I think need is a strong statement in this context.

  echo $(( 1 + $(grep 'cpu cores' /proc/cpuinfo | head -1 | awk -F ':' '{print $2}') ))

Choosing shell vs. [insert modern] programming is a matter of trade-off of taste, time, $$$, collective consensus (in a team setting) and so forth.

[edit]: Forgot code syntax on HN.

That seems overly complicated to me. Can you not just:

     grep '^processor' /proc/cpuinfo | wc -l

I love UNIX but I think on this point I'm with the haters.

Practically speaking, as an individual, I don't have the needs of the DMV. For my personal use I'm combing through pip-squeaky data file downloads and CSV's.

So even though using Python or some other language causes a gigantic performance hit, it's just the difference between 0.002 and 0.02 seconds: a unit of time/inefficiency so small I can't ever perceive it. So I might as well use a language to do my processing, because at my level it's easier to understand and practically the same speed.

From the intro:

> As for me? I switched to the Mac.

It's amusing that Apple would go on to switch to a unix-based OS in '01.

Apple released their first UNIX based OS in 1988.

Exercise for the reader to find out its name.

Hum... Those other abstractions on that section are great and everything, but it really sucks to write them interactively.

Pipelines also lack some concept similar to exceptions, but it would also suck to handle those interactively.

  set -o pipefail

Let's be more hardcore

    set -euo pipefail

Why not a little bit more? ;)

  set -euf -o pipefail

Sadly set -e is busted, though I do use it.

Thanks for sharing! I'll have to give that a read sometime. It looks quite interesting.

I read the introduction. Sounds like Norman would have liked to hear about Plan 9.

I have a hard time believing he would have been unfamiliar with Plan 9. It wasn’t exactly obscure in the research community at the time. See the USENIX proceedings in the late 80s and 90s.

This is mere speculation, but I doubt he would have appreciated Plan 9.

I did notice that he wrote about problems in Unix that were solved in P9; that's the reason I made the comment.

Pipes are a great idea, but are severely hampered by the many edge cases around escaping, quoting, and, my pet peeve, error handling. By default, in modern shells, this will actually succeed with no error:

  $ alias fail='exit 1'
  $ find / | fail | wc -l; echo $?
You can turn on the "pipefail" option to remedy this:

  $ set -o pipefail
  $ find / | fail | wc -l; echo $?
Most scripts don't, because the option makes everything much stricter, and requires more error handling.

Of course, a lot of scripts also forget to enable the similarly strict "errexit" (-e) and "nounset" options (-u), which are also important in modern scripting.
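A small sketch of the difference, assuming bash (pipefail is not available in every sh) and using `false` as a stand-in for the failing middle stage:

```shell
# Default behaviour: the pipeline's status is that of the LAST command (wc),
# so the failure of the first stage is silently lost.
status_default=0
false | wc -l >/dev/null || status_default=$?

# With pipefail: the status is that of the last stage that failed.
set -o pipefail
status_pipefail=0
false | wc -l >/dev/null || status_pipefail=$?
set +o pipefail

echo "default=$status_default pipefail=$status_pipefail"
```

The `|| var=$?` form is used so the snippet also behaves under `set -e`.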

There's another error that hardly anyone bothers to handle correctly:

  x=$(find / | fail | wc -l)
This still sets x (here to the count wc printed for its empty input) even though the command failed. The only way to test if this succeeded is to check $?, or use an if statement around it:

  if ! x=$(find / | fail | wc -l); then
    echo "Fail!" >&2
    exit 1
  fi
I don't think I've seen a script ever bother to do this.

Of course, you may also want the error message from the command. If you want that, you have to start using named pipes or temporary files, with the attendant cleanup. Shell scripting is suddenly much more complicated, and the resulting scripts become much less fun to write.
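One way the temporary-file variant might look, as a sketch only. `run_capture` is a made-up helper, and it deliberately leaks its results into the global variables `out`, `err` and `status`:

```shell
# run_capture CMD [ARGS...]: run CMD, leaving its stdout in $out,
# its stderr in $err and its exit status in $status.
run_capture() {
  errfile=$(mktemp) || return 1
  if out=$("$@" 2>"$errfile"); then
    status=0
  else
    status=$?              # exit status of the command substitution
  fi
  err=$(cat "$errfile")
  rm -f "$errfile"         # the attendant cleanup
}

# Example: a command that writes to both streams and then fails.
run_capture sh -c 'echo data; echo oops >&2; exit 3'
echo "out=$out err=$err status=$status"
```

A script that can be interrupted would also want a trap to remove the temp file, which is more ceremony still.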

And that's why shell scripts are so brittle.

Just use a better shell. rc handles this wonderfully, $? is actually called $status, and it's an array, depending on the number of pipes.

set -e is also a pain for commands where a nonzero exit status doesn't mean failure (e.g. diff). It changes the semantics of the whole script.

If by pain you mean you have to handle errors, yes, that's what you have to do. It's no different from checking the return code of functions in C.

But there are tools that don't follow this paradigm. Famously, grep fails with err=1 if it doesn't find a match, even though 'no match' is a perfectly valid result of a search.
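A sketch of how a script can live with that under set -e, treating grep's exit status 1 as "no match" and anything higher as a real error. `count_matches` is a made-up helper name:

```shell
# count_matches PATTERN FILE: print the number of matching lines.
# grep -c exits 0 when there are matches, 1 when there are none
# (it still prints 0), and >1 on a real error such as a missing file.
count_matches() {
  n=$(grep -c -- "$1" "$2") || {
    rc=$?
    [ "$rc" -eq 1 ] || return "$rc"   # propagate real errors only
  }
  printf '%s\n' "$n"
}
```

Because the grep call sits on the left of `||`, set -e does not abort the script when there is simply no match.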

Yes. So I don't use set -e.

I love pipelines. I don't know the elaborate sublanguages of find, awk, and others, to exploit them adequately. I also love Python, and would rather use Python than those sublanguages.

I'm developing a shell based on these ideas: https://github.com/geophile/marcel.


Piping is great if you memorize the (often very different) syntax of every individual tool and memorize their flags, but in reality unless it's a task you're doing weekly, you'll have to go digging through man pages and documentation every time. It's just not intuitive. To this day, if I don't use `tar` for a few months, I need to look up the hodgepodge of letters needed to make it work.

Whenever possible, I just dump the data in Python and work from there. Yes some tasks will require a little more work, but it's work I'm very comfortable with since I write Python daily.

Your project looks interesting, but honestly IPython already lets me run shell commands like `ls` and pipe the results into real Python. That's mostly what I do these days. I just use IPython as my shell.

+1, but I use jupyter instead of IPython

The lispers/schemers in the audience may be interested in Rash https://docs.racket-lang.org/rash/index.html which lets you combine an sh-like language with any other Racket syntax.

also what I think is the 'original' in this domain, scsh

Your project looks really cool.

I am pretty sure I've seen a Python-based interactive shell a few years ago but I can't remember the name. Have you heard of it?

I imagine you are thinking of xonsh? https://xon.sh/

Do you mean IPython? My understanding is that IPython is more of a REPL for Python, less of a shell.

Perhaps you are thinking of xonsh https://xon.sh/

Edit: x1798DE beat me to it :D

Xonsh, that's it!

Thanks to you and x1798DE, and to geophile for the attempt :-)

I could not find the right keywords on two different search engines to find it. This never happens.
