
The Effect of Pipe Capacity on Unix Pipeline Performance - dongyx
http://dongyuxuan.me/posts/pipeline.html
======
anonsivalley652
Awhhh. :'(

I wanted to see:

\- What happens when piping an 8 TiB file through awk to be stored on another
physical volume (probably want to use xfs).

\- A graph of various block sizes, file sizes, and pipe sizes, with timing
using an HPET. I don't know how you make a 2+-way pipe like in the diagrams
except by process substitution, except that doesn't always work and you can't
create cycles without creating FIFOs (named pipes):

    
    
    cmd0 <(cmd1 here) <(cmd2 here)

    mkfifo -m600 ~joe/foo
    cmd0 <(cmd1 here) <(cmd2 here) | cmd3 > ~joe/foo &
    cmd4 <(cmd5 here) >(cmd6 here ~joe/foo)

    <>(...)  # outer can r/w
    <(...)   # outer can r
    >(...)   # outer can w

~~~
wnoise
Most shells don't have great support for making even non-linear graphs of
(anonymous) pipe connections (though note that wilder, even dynamic graphs are
possible with the "coproc" builtin of e.g. ksh, zsh, or modern bash).

However, arbitrary graphs are possible with the system calls available -- just
create all the pipes in the spawning process with pipe(), then after fork()
but before exec(), move the right ends onto file descriptors 0, 1, and 2 with
dup2().

------
duskwuff
> If the two processes exchange data in this pattern [one-to-one exchange], a
> small pipe doesn’t cause unnecessary blockings.

I don't think this is correct! Context switches are expensive -- a small pipe
size will force both processes to make more system calls to move the same
amount of data through the pipe.

In this light, the UNIX specification's requirement of 4 KB is definitely too
small; even the Linux 2.6+ default of 64 KB feels like it might be on the
small side.

~~~
dongyx
You're right. My wording was imprecise. I don't know how to express the
concept that the performance is not _essentially_ affected. Is there a
universally acknowledged term for this distinction, like time complexity in
the analysis of algorithms?

------
steerablesafe
I expected some examples and benchmarks with some common unix tools.

~~~
AdmiralAsshat
Ditto. This article feels like it was published before the author had finished
vetting everything he/she wanted to demonstrate.

------
lotwxyz
Believe it or not: I have to deal with issues just like this over in webdev-
land! (See the Terminal app in
[https://dev.lotw.xyz/desk.os](https://dev.lotw.xyz/desk.os) or skip right to
it, go to [https://dev.lotw.xyz/shell.os](https://dev.lotw.xyz/shell.os))

------
thedance
I wonder how much stuff breaks if you actually change the pipe size. I bet
there’s a lot of assumptions kicking around.

~~~
jandrese
The vast majority of shell script writers don't know or care what the pipe
size is. The tools they are using treat them as just one solid stream of data.
There's not much to break.

~~~
thedance
I meant in the kernel. Like if I hooked up a fuzzer and just started swinging
the pipe size around.

~~~
jschwartzi
Considering there's a semi-contiguous allocation in the kernel I bet you'd
find some interesting bugs that way. I don't think very many people mess with
the pipe sizes other than to tune to a multiple of the input/output block size
for the application. And if you're trying to maximize throughput you're likely
to reach your optimum block count long before you start blowing out buffers.

Actually, come to think of it named pipes would be another good place to fuzz.

