
Pipes and Filters - rbc
http://blog.petersobot.com/pipes-and-filters
======
js2
Some random thoughts:

• Though the first pipeline is didactic, it can be done entirely within awk:

    
    
        awk '
        BEGIN { l=0 }
        /purple/ {
            if(length($1) >= l) { word = $1; l = length($1) }
        }
        END { print word }' < /usr/share/dict/words
    

• Named pipes are neat, but you can also use subshells and additional FDs (I
am in no way arguing this is more clear):

    
    
        (
          (
            (
              echo out
              echo err >&2
            ) | while read out; do echo "out: $out"; done >&3
          ) 2>&1 | while read err; do echo "err: $err"; done
        ) 3>&1
    
    

• Bash has "set -o pipefail" for cases where you want any process in the
pipeline that exits non-zero to cause the entire pipeline to exit non-zero.

------
robertduncan
Error detection is much easier with pipefail:

[http://www.gnu.org/software/bash/manual/html_node/Pipelines....](http://www.gnu.org/software/bash/manual/html_node/Pipelines.html)

"If pipefail is enabled, the pipeline’s return status is the value of the last
(rightmost) command to exit with a non-zero status, or zero if all commands
exit successfully"
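
For anyone who wants to see the difference, here is a minimal sketch (Bash only; the variable names are just for illustration). `false | true` stands in for a pipeline whose first stage fails, and `PIPESTATUS` is shown alongside as the per-stage alternative:

```bash
#!/usr/bin/env bash
# Default behaviour: the pipeline's status is that of the last command only.
false | true
default_status=$?

# With pipefail: the status is that of the rightmost command that failed.
set -o pipefail
false | true
pipefail_status=$?

# PIPESTATUS holds the exit status of every stage of the most recent pipeline.
false | true
stage_statuses="${PIPESTATUS[*]}"

echo "default=$default_status pipefail=$pipefail_status stages=$stage_statuses"
```

Note that `PIPESTATUS` has to be read immediately, since the next command overwrites it.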

~~~
psobot
Pipefail is awesome - I had no idea this even existed. Thanks!

------
bkirwi
If you like pipes, you'll love Pipe Viewer:

[http://www.ivarch.com/programs/pv.shtml](http://www.ivarch.com/programs/pv.shtml)

> pv - Pipe Viewer - is a terminal-based tool for monitoring the progress of
> data through a pipeline. It can be inserted into any normal pipeline between
> two processes to give a visual indication of how quickly data is passing
> through, how long it has taken, how near to completion it is, and an
> estimate of how long it will be until completion.
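
A rough sketch of where it goes in a pipeline. pv passes its input through unchanged and draws the progress display on stderr, so the stages on either side don't notice it. The `meter` wrapper is my own invention so the snippet still runs where pv isn't installed:

```sh
#!/bin/sh
# Hypothetical helper: use pv when available, otherwise plain cat
# (same data flow, just no progress display).
meter() {
    if command -v pv >/dev/null 2>&1; then
        pv
    else
        cat
    fi
}

# Count the lines produced by seq while watching throughput on stderr.
line_count=$(seq 1 100000 | meter | wc -l | tr -d ' ')
echo "lines: $line_count"
```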

------
jstsch
Another fun trick: by piping through dd you can add a buffer between
processes.

Example: the raspberry pi has pretty slow SD performance and the USB bus can
get hogged. If you record audio and want to encode + write it to SD you can
easily get buffer overruns. Easily solved by a 10sec buffer between arecord
and flac in my case.

~~~
netghost
So what would that look like? Makes sense, but dd has a lot of options, and I
haven't fiddled with it much.

~~~
jstsch
Recording at 48 kHz, raw, piping through a five-second buffer, encoding to
flac and dumping it on a USB stick:

    
    
      arecord -D hw:1,0 -v --fatal-errors --buffer-size=192000 -f dat -t raw | dd bs=480000 | flac --endian=little --channels=2 --bps=16 --sample-rate=48000 --sign=signed -o /mnt/usbstick/`date '+%s'`.flac -
    

Gotta love Linux :)

------
cgh
Reminds me of the classic David Beazley course on coroutines:
[http://www.dabeaz.com/coroutines/](http://www.dabeaz.com/coroutines/)

It highlights a similar pipeline-oriented architecture and eventually ends up
being sort of mindblowing.

~~~
sitkack
The nice thing about the Beazley talk is that it shows how to move structured
records through the pipeline; with multiprocessing one could use more cores.

------
brianpgordon
I hate to be "that guy" but _someone_ has to say something about the Useless
Use of Cat.

[http://www.catb.org/jargon/html/U/UUOC.html](http://www.catb.org/jargon/html/U/UUOC.html)

~~~
djur
I've never agreed with the UUOC concept when applied to pipelines. Using cat
in a pipeline for a single file makes the flow clearly unidirectional,
prevents certain types of errors like switching > and <, allows the command to
be easily modified to handle multiple files or a glob, and frankly just seems
easier to read.

Considering the tiny overhead of an additional cat process, UUOC these days
feels like nitpicking.

~~~
unhammer
Also, the cat makes for faster testing. I tend to start out with e.g.

    
    
        head bigfile.1.txt | grep | awk | stuff
    

and refine things, and when output looks right, it's a simple "Ctrl+A Meta+D
cat RET" to run it on the full output. Or vice versa if I suddenly want to go
back to testing part of bigfile (or exchange the cat for "grep something").

If I want to change that to "< bigfile.1.txt", I have to "Ctrl+A Meta+D <
Meta+F Meta+F Meta+F Ctrl+D Ctrl+D" – the extra keypresses are to delete the
first "|" symbol. And if I suddenly want to change it back to head or grep, I
have to reinsert the | (also I often by habit do Meta+D instead of Ctrl+D at
the beginning of the line, which doesn't work as intended if the first token
is "<" instead of "cat").

Those useless cats are quite handy when doing a lot of shell work.

~~~
JetSpiegel
You can use <bigfile.txt head | grep | awk

~~~
unhammer
So if I want to change that to the whole file, I "just" have to Ctrl+A Meta+F
Meta+F Meta+D Ctrl+D Ctrl+D RET. That's not really an improvement – especially
since it depends on how many dots or similar are in the filename.

Also, that's a Useless Use of head, since grep has the option "-m10"

~~~
JetSpiegel
grep -m means "Stop reading a file after NUM matching lines", while head -10
takes the first 10 lines and grep then searches only those. Different things.
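
To make the difference concrete, a small sketch with made-up input (the `sample` function and its word list are just for illustration):

```sh
#!/bin/sh
# Five lines of input; the two matches are on lines 3 and 5.
sample() { printf '%s\n' alpha beta purple gamma empurple; }

# head first: only lines 1-2 ever reach grep, so nothing matches.
head_then_grep=$(sample | head -n 2 | grep purple || true)

# grep -m 2: grep scans until it has seen two matching lines.
grep_m=$(sample | grep -m 2 purple | tr '\n' ' ')

echo "head | grep: '$head_then_grep'"
echo "grep -m 2:   '$grep_m'"
```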

~~~
unhammer
doh! you're right, I wasn't thinking :-)

------
daveloyall
Off topic: Today I learned [http://dcurt.is/unkudo](http://dcurt.is/unkudo).
Peter Sobot, I want my kudos back. (Not that I didn't really appreciate
learning about ${PIPESTATUS[*]}.)

~~~
thristian
Yeah, I hate svbtle for exactly that reason. Despite kudos being a thoroughly
meaningless number, the kudos widget makes me feel like I'm nine years old and
I just fell for the "a sphincter says what?" trick.

~~~
daveloyall
What?

...But, yeah, exactly. :)

Various searches on the subject revealed plenty of people noting how
meaningless Internet points are, leading to an additional meta-rub: not only
did I fall for the hover trick, I also was childish enough to google for a
svbtle kudos undo. _sigh_

~~~
roryokane
I wouldn’t say that kudos are _totally_ meaningless, nor your quest to undo
yours. By displaying kudos on their article, the author falsely purports to
have more fans (implied by the name “kudos”) than they really do. By
attempting to correct that number, you were fighting against (slight) false
advertisement, which could cause other visitors to waste their time reading an
article that you don’t actually recommend. Though bumping the number by 1 has
a negligible effect, fixing _all_ bumps that people were tricked into making
would probably save many internet readers a small amount of time.

------
Anthony-G
I’ve used pipes for years and had a pretty good understanding of how the
system calls of each process in the pipeline interact with its own `stdin`
and `stdout` file descriptors, but this article puts it all together really
nicely with some good examples.

I don’t mind the useless use of `cat` as it can enhance readability for some
people. However, I would suggest replacing the Bash while loop with a for
loop:

    
    
        ls *.flac | 
        while read song
        do 
            flac -d "$song" --stdout | 
            lame -V2 - "$song".mp3
        done
    
    
        for song in *.flac 
        do 
            flac -d "$song" --stdout | 
            lame -V2 - "$song".mp3
        done
    
    

Using Bash’s file globbing avoids problems with enumerating the output of `ls`
[1]. It also avoids an unnecessary Bash sub-shell being spawned for the while
loop that follows the first pipe. More importantly, I think it’s a lot more
readable while still demonstrating how pipes can be efficiently used to
process any number of FLAC files.

[1]
[http://mywiki.wooledge.org/ParsingLs](http://mywiki.wooledge.org/ParsingLs)
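
A small sketch of the failure mode, assuming a scratch directory and made-up filenames (one of them contains a newline, which is legal on most filesystems). The `ls` output is read from a file rather than a pipe only so that the counter survives the loop:

```sh
#!/bin/sh
workdir=$(mktemp -d)
nl='
'
touch "$workdir/plain.flac" "$workdir/two${nl}lines.flac"

# Parsing ls: the newline in the second name splits it into two bogus entries.
ls "$workdir"/*.flac > "$workdir/listing.txt"
ls_found=0
while read -r song; do
    if [ -e "$song" ]; then ls_found=$((ls_found + 1)); fi
done < "$workdir/listing.txt"

# Globbing: each filename arrives as exactly one word, whatever it contains.
glob_found=0
for song in "$workdir"/*.flac; do
    if [ -e "$song" ]; then glob_found=$((glob_found + 1)); fi
done

echo "existing files seen via ls: $ls_found, via glob: $glob_found"
rm -rf "$workdir"
```

The loop over `ls` output sees three "names" and only one of them exists, while the glob sees both real files.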

------
whee
Neat. I have my own take on this concept[1] using Redis pub/sub instead of
queues. Tradeoffs involve being able to lose data if endpoints aren't
connected, but you do get the benefit of having multiple inputs and outputs on
one stream, which was important for my use case.

[1] [https://github.com/whee/rp](https://github.com/whee/rp)

------
daemonize
I recommend highland.js for node
[http://highlandjs.org](http://highlandjs.org)

------
runeks
I recently came across a language called Factor that works very similarly, if
not identically, to this.

Here's a video about it:
[https://www.youtube.com/watch?v=f_0QlhYlS8g](https://www.youtube.com/watch?v=f_0QlhYlS8g)

~~~
matoffk
I have been using Factor for some time. It has some drawbacks related to
readability, but it is a real time-saver.

~~~
runeks
Compared with what does Factor save time? Python, C, Java?

Also, I've been looking at Factor, but I have a hard time getting into the
mindset of the paradigm, since I'm used to Python (although I'm very much used
to using pipes in the terminal). Are there any types of programs that you
prefer to write in Factor, and others you prefer to write in -- for example --
Python?

------
deathanatos

       _____________ 
      < unimpurpled >
       ------------- 
         \
          \
    

Part way through my second viewing of the article, I thought, "what is
'unimpurpled'". Wiktionary didn't know. Google doesn't return useful results
for it, even. M-W, finally, clued me in: it's an obsolete term, with an "un"
prefix, for the verb "empurple", which means to make purple[1].

[1] And a few similar things.
[https://en.wiktionary.org/wiki/empurple](https://en.wiktionary.org/wiki/empurple)

------
tomgg
I've been playing around with julia[1] this week and discovered the inclusion
of a pipe-like operator that removes a lot of the parentheses from functional
programming; you can write,

    
    
        x |> a |> b |> c |> (s -> d(s, y)) |> e |> ...
    

in julia instead of

    
    
        e(d(c(b(a(x))),y)) or (e (d (c (b (a x))) y))
    

...or whatever is your flavour. I reckon it is impossible to make a serious
case against that readability gain.

[1] julialang.org

~~~
tel
In Haskell there are two varieties on this. "Apply" and "compose":

    
    
              f (g (h x)) == f $ g $ h $ x
                          == f . g . h $ x
        \x -> f (g (h x)) == f . g . h
    

The "compose" combinator, (.), is especially pertinent for making pipelines.
Idiomatic Haskell code uses it constantly---especially for its natural
mechanism of eliminating "points" like that `x` above. These are usually
better described by their type than any variable name given (especially since
the variable name cannot be machine checked for meaningfulness, unlike the
type) and so are best eliminated.

In many libraries there is also a reverse apply function defined, often as &

    
    
              f (g (h x)) == x & h & g & f
                          == x & f . g . h
    

which is more popular when using other operator chains to describe functions
as in lens.

------
pdknsk
> I’m calling /usr/bin/time here to avoid using my shell’s built-in time
> command [...].

I prefer to use command in this scenario.

$ command time

------
visarga
Pipes are also related to the IO monad, similar in a way to the jQuery
syntax which is another hugely popular case. I am utterly amazed how they
invented such a powerful concept of functional programming for the shell
(well, not 100% pure functional, if there are side effects)

~~~
ky3
_utterly amazed how they invented such a powerful concept of functional
programming for the shell_

You might be interested in the classic McIlroy-Knuth dialogue:

[http://www.leancrew.com/all-this/2011/12/more-shell-less-
egg...](http://www.leancrew.com/all-this/2011/12/more-shell-less-egg/)

The Unix "Software Tools" philosophy -- small kits composable into something
big -- is deeply shared by functional programming.

I don't think one begot the other. It was just obviously the natural thing to
do during that era.

~~~
vram22
>You might be interested in the classic McIlroy-Knuth dialogue:

Yes, that was a good article. The comments on the post were interesting too.
Saw it a while ago, and for fun, wrote two quick solutions for the problem, in
Python and shell:

[http://jugad2.blogspot.in/2012/07/the-bentley-knuth-
problem-...](http://jugad2.blogspot.in/2012/07/the-bentley-knuth-problem-and-
solutions.html)

A reader, Veky, wrote an interesting comment on my post too.

------
xiaq
The first few parts make a pretty good pipeline preach. Two interesting points
about pipelines are the concurrency and the difficulty of handling errors (at
least without cluttering the syntax), which are often missed by newcomers.
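
A quick way to see the concurrency point, as a sketch: both stages below sleep for two seconds, yet the whole pipeline finishes in about two seconds rather than four, because the shell starts every stage at once.

```sh
#!/bin/sh
start=$(date +%s)

# Each stage sleeps 2 seconds; they run at the same time, not one after the other.
result=$( { sleep 2; echo done; } | { sleep 2; cat; } )

elapsed=$(( $(date +%s) - start ))
echo "result=$result elapsed=${elapsed}s"
```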

------
yellowapple
The author has earned him/herself a Useless Use of cat (UUOC) award for not
realizing that grep can take a filename argument in the example pipeline.

Basically, the example can be shortened to the following:

    
    
        grep purple /usr/share/dict/words | # Find words containing 'purple' in the system's dictionary
        awk '{print length($1), $1}' |      # Count the letters in each word
        sort -n |                           # Sort lines ("${length} ${word}")
        tail -n 1 |                         # Take the last line of the input
        cut -d " " -f 2 |                   # Take the second part of each line
        cowsay -f tux                       # Put the resulting word into Tux's mouth

