A surprisingly arcane little Unix shell pipeline example (utoronto.ca)



There are some subtle points that the blog is not clear about:

- The use of `1>&2` redirects the stdout of the LHS process to stderr, so that `echo green` never writes to the pipe and therefore never gets SIGPIPE.

- `echo` only echoes its arguments and always ignores stdin, so putting `echo blue` on the RHS of the pipe only serves to run the two sides of the pipe in parallel.

- `bash -c '(echo green 1>&2) | echo blue' 1>stdout 2>stderr` will show you that green and blue actually write to different files
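Running that and then inspecting the files should show something like this (blue went down the intact stdout, green down stderr):

  $ cat stdout
  blue
  $ cat stderr
  green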

Another observation:

> if it was a separate command (or even if the shell forked to execute it internally), only the 'echo red' process would die from the SIGPIPE instead of the entire left side of the pipeline.

Most Linux distributions have /bin/echo as a separate program. Running `(/bin/echo red; echo green 1>&2) | echo blue` will always print both green and blue, because at worst only the external /bin/echo process dies from the SIGPIPE, not the subshell that goes on to run the builtin `echo green`.
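On such a system you'd expect something like this (the relative order of the two lines can vary, since the two sides run in parallel):

  $ (/bin/echo red; echo green 1>&2) | echo blue
  blue
  green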

EDIT: fix typo of stdin/stdout/stderr as suggested


Thanks, this is helpful and why I came to the comments. Another question I had was what happens to red? You helped me answer that -- unlike echo green, the output of echo red is still connected to the pipe. The right-hand side of that pipe does nothing with it, so it disappears.

Also regarding one of your examples: maybe I'm misunderstanding, and anyway this wouldn't change your overall point, but should it maybe read like this instead: bash -c '(echo green 1>&2) | echo blue' 1>stdout 2>stderr

since stdin is normally associated with file descriptor 0


Yes you're right. That's a typo. Thanks


I’ve been doing Linux stuff for ten years and I am apparently just learning that pipeline commands run in parallel, not serially. If I had put any thought into it I would have realized it, because otherwise tailing logs to grep wouldn’t work...


Pipes are commonly used when the right side is consuming the output of the left side. That introduces serialization at least insofar as the right side is forced to wait for input.

You could replace the | with a & and get the same parallel behavior, because nothing in the 2nd stage depends on the output of the first stage of the pipe (though red would then appear on the terminal, since it's no longer being written into a pipe).

In my old-man-grumbling-at-clouds voice: this is not arcane.


Perhaps not arcane, but at least somewhat counterintuitive if one's mind is too invested in the "pipe" analogy.


But physical pipes run in parallel too! If you have a pipe full of water, and you push some more water in one end, there is an instant effect at the other end. If it's empty, then sequentiality is introduced because it takes time for the water to flow through the pipe. That's how a Unix pipe works – (the processes at) both ends of the pipe come into existence at the same time, but if the right side is waiting for input (the pipe buffer is empty), it won't do anything until the left side has produced output.


That's parallel in time, but serial in "space". That might be the source of the confusion here.


> But physical pipes run in parallel too!

For normal people, pipes are as serial as can be. For the average person, parallel pipes = 2 pipes.


Twenty-five years here, and I still get confused about stdout/stderr redirection.


As I was just commenting in a different reply...

The shell ties the prior program's output (or, for the first program, the user's input) to the input of the next program. That's (old) stdout and (new) stdin taken care of.

stderr is sent to the terminal as output by default.

For reference, stdin, stdout, and stderr are normally numbered 0 through 2 respectively. When you're redirecting input and output in the shell, the default is usually to wire things up the way that makes sense, but a number placed right against the redirection operator (without any whitespace) tells the shell to grab a different input or output.

These will yield different results due to left to right parsing:

  echo test >/tmp/test 2>&1
  echo test 2>&1 >/tmp/test
In the first, any errors would end up in the file, because stderr is pointed at the file after stdout is; in the second, errors would still be printed to the shell as standard output, because stderr was duplicated from stdout before stdout was redirected to the file (this assumes you can create the temporary file, which normally you can).
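One way to read those two lines, step by step (my annotation; redirections are processed left to right):

  # echo test >/tmp/test 2>&1
  #   fd 1 (stdout) -> /tmp/test, then fd 2 (stderr) -> copy of fd 1, i.e. the file
  # echo test 2>&1 >/tmp/test
  #   fd 2 (stderr) -> copy of fd 1, i.e. the terminal, then fd 1 (stdout) -> /tmp/test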

Here's a more interesting example

  echo test >/dev/null/test 2>/dev/null
  echo test 2>/dev/null >/dev/null/test
The second line will silently fail (though it'll still return an exit status of 1) because the standard error output was already sent to /dev/null, while the first attempts to open a file that can't exist and prints the failure message before the error output is redirected.


Here's a cheat sheet which should cover the most common situations.

Send stdout to a file.

  ls myFileWhichExists > myStdLog

- or -

  ls myFileWhichExists 1> myStdLog

Send stderr to a file.

  ls myFileWhichDoesNotExist 2> myErrLog

Send stdout to one file and stderr to a different file.

  ls myFileWhichExists myFileWhichDoesNotExist 1> myStdLog 2> myErrLog

Send stdout and stderr to the same file

  ls myFileWhichExists myFileWhichDoesNotExist 1> myBothLog 2>&1

I read that last part "2>&1" as "Send stderr (2) to the same place as stdout (1) is already going to".

Notice that if you send stdout and stderr to the same file, the output from the two streams can interleave in unpredictable ways because of buffering.
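One addition: if you're in bash specifically (not plain POSIX sh), there's a shorthand for that last case:

  ls myFileWhichExists myFileWhichDoesNotExist &> myBothLog

which is equivalent to "> myBothLog 2>&1".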


Yeah, it's the ampersands and ordering and placement and numbers that always throw me, not the concepts. Thanks!


This "parallel execution" is one of the interesting things that distinguishes Unix pipelines from DOS pipelines; in DOS a temporary file is used, which was an old source of puzzlement for beginners doing a "dir | more" --- "what's that extra file I see?"

In retrospect, getting non-preemptive pipes (a type of coroutining) working in DOS would not have been all that difficult, if it weren't for the limited memory available to PCs of the time and the fact that most programs assumed they owned it all when they ran.


"Apart from the fact that DOS wasn't a multitasking operating system, concurrent execution of multiple tasks would have been easy"?

(Eventually people retrofitted a sort of multitasking with TSR programs, but that's not really the same thing.)


If you suspect a race condition between two operations, one way of testing it is to add a sleep command to one side or the other. For example:

  (sleep 1; echo red; echo green 1>&2) | echo blue
vs

  (echo red; echo green 1>&2) | (sleep 1; echo blue)
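If I have the semantics right, on a typical Linux system you'd expect roughly:

  $ (sleep 1; echo red; echo green 1>&2) | echo blue
  blue
  $ (echo red; echo green 1>&2) | (sleep 1; echo blue)
  green
  blue

In the first, the RHS has long since exited by the time the LHS writes, so the LHS subshell dies from SIGPIPE at `echo red` and never reaches `echo green`. In the second, red sits unread in the pipe buffer, green goes to stderr immediately, and blue appears a second later.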


Offhand I'm unsure of the semantics of sleep...

It appears that sleep (at least on a typical modern Linux desktop) behaves like echo in that it simply does nothing with the pipeline input rather than echoing it to the terminal.

It's curious because someone might assume the default behavior would be to forward all file descriptors unless something was done to the data streams. Clearly that isn't the case: the shell ties the prior standard output to the next program's standard input irrespective of whether anything is ever done with it.


I don't know what you mean by "forward" but I have the impression you don't understand how file descriptors (i.e. "open files or streams") work.

The shell sets up a pipe and connects the left side to the pipe's writing end, and the right side to the reading end. Now, the right side is actually a subshell (indicated by the parentheses "(...)"). And that subshell can spawn as many other processes, sequentially or in parallel, as it wants. All of them will get the open file descriptor (the pipe's reading end) inherited by the operating system.

If you had multiple processes in parallel reading from the pipe, the outcome would be totally nondeterministic (dependent on the kernel's scheduling behaviour). In the example case, none of the potential readers actually read (not the subshell itself, not the sleep, and not the echo).

Here's a perhaps illuminating example:

    (echo h; echo hello; echo HALLO; ) | ( read firstline; echo "Firstline is $firstline"; grep A; )
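If I have it right, that should print:

    Firstline is h
    HALLO

since `read` consumes only the first line and `grep A` matches just the uppercase one.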
Does that help?


Arcane would be mixing the output of multiple commands into a single text stream without any readily available means to determine their origin, then writing code based upon that output that relied upon a specific ordering of it without a preliminary explicit sort, i.e. code that was reliant upon this indeterminism to fail. In any event, diff would sort it out. ;)


thing | thing | thing > /file/I/didnt/mean/to/smash | thing

oops

thing | thing | thing > /another/file/I/didnt/mean/to/smash

oops

If you include > redirection, it's parsed and processed before execution of the pipeline. Even if the subsequent pipeline execution fails, you have probably already smashed /thing/you/didnt/mean/to/smash if you > into it.


Which is why setting "noclobber" is really useful and often time-saving.
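For anyone unfamiliar, a quick sketch of how that looks in bash:

  set -o noclobber       # or equivalently: set -C
  echo hi > somefile     # fails if somefile exists: "cannot overwrite existing file"
  echo hi >| somefile    # >| explicitly overrides noclobber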


sponge[1] is very useful in this scenario. The output file is only written once the `thing | thing | thing` pipeline has produced all of its output, rather than being truncated up front.

[1]: https://linux.die.net/man/1/sponge
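The classic use is rewriting a file in place, something like:

  grep -v noise logfile | sponge logfile

where a plain "> logfile" would truncate the file before grep got to read it.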


While testing around I found out that enclosing the commands in parentheses reverses the relative frequency of 'blue green' and 'green blue' output. Can anyone explain why this happens?


Disclaimer: I am an idiot on this subject

I recently found out that you can’t easily spawn a shell and then send commands to it. It’s doable with tmux commands, but you’d think it would be easier. I just wanted to write something that locates npm/virtualenv stuff in bash, nothing fancy.


A distinction you may be having trouble with, because it's kind of hidden from the user, is the difference between a shell and a "pty" (pseudo teletype). You certainly can spawn a shell and send it commands, but because it doesn't have a pty the input is treated very differently.

That's what gets set up for you by running tmux, screen, expect, xterm, ssh, etc.

(Bonus: https://askubuntu.com/questions/481906/what-does-tty-stand-f... )


"expect" was mentioned, but what are you actually trying to do that cannot be solved with a shell script, either executed the normal way or sourced in at the start of a new shell?


wow I feel like an idiot, thank you.


Just to join in the "I feel like an idiot" fun: yea, I've used tmux to do this too. First I've heard of `expect`. TIL.

Bash is pretty bad at helping you discover how to do things.


Why can't you run a shell script?


I think you want expect.


wow I feel like an idiot, thank you.


No, you are most likely not, but you are also not likely looking for expect.

It sounds like you are doing something simple in a very roundabout way. Explain instead the intended outcome and many people will be happy to help.


What do you mean by locates?


  echo ls | bash
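Or, for more than one command, a heredoc works (assuming bash or any POSIX shell):

  bash <<'EOF'
  ls
  pwd
  EOF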


It even seems to work with named pipes, although in my first test it exited after the first command (I suspect I'm accidentally sending an EOF when I echo the command in).

    mkfifo testpipe1
    <testpipe1 bash  # in separate window
    echo ls > testpipe1
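The EOF theory sounds right: the write end of the FIFO closes as soon as the echo finishes, so bash reads EOF. A sketch of a workaround is to hold a write descriptor open from the sending side (fd 3 is an arbitrary choice):

    mkfifo testpipe1
    <testpipe1 bash      # in separate window
    exec 3>testpipe1     # keep the write end open
    echo ls >&3
    echo pwd >&3         # close it later with: exec 3>&-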


If you have echo and a pipe, then you're most likely already in a shell, no?


Why does `red` never show?


1) If the RHS exits before `echo red` is executed, `echo red` receives a SIGPIPE and dies (and since echo is a shell builtin, the whole LHS subshell dies with it)

2) Even if the RHS does not exit quickly enough, `echo red` writes its output to the pipe, where it is simply ignored by the RHS `echo blue` (echo never reads stdin)
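You can see that red really does go into the pipe in case 2 by swapping the RHS for something that actually reads stdin, e.g.:

  $ (echo red; echo green 1>&2) | cat
  red
  green

(the relative order of the two lines may vary).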


Why would you expect it to show?

Would you also expect "echo something < file.txt" to show the contents of file.txt?

Perhaps you are thinking of cat or some other command, because piping things to echo is such a strange and unexpected thing to do that you normally won't encounter it.


Because the other output of the LHS (sometimes) shows.



