https://www.slideshare.net/divyekapoor/linux-kernel-implemen...

Long story short: pipes and FIFOs are implemented on a virtual pipefs, and an internal 64K buffer is used to hold the data in memory while it is transferred between processes. Locks on the VFS inodes on pipefs are used for synchronization across threads/processes.
(Full disclosure: I'm the author; this work was done as part of my Master's degree and discusses pipes and FIFOs as implemented on pipefs in a kernel from around 2011.)
Hope this is interesting to the people on the thread.
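(For anyone curious about that 64K figure: here's a minimal C sketch, assuming Linux, that asks the kernel for a pipe's capacity with F_GETPIPE_SZ; the constant is Linux-specific, available since 2.6.35, and needs _GNU_SOURCE.)

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int p[2];
        if (pipe(p) < 0) { perror("pipe"); return 1; }

        /* Ask the kernel how much this pipe can buffer before writers block. */
        int cap = fcntl(p[1], F_GETPIPE_SZ);
        if (cap < 0) { perror("fcntl"); return 1; }

        printf("pipe capacity: %d bytes\n", cap);  /* typically 65536 on Linux */
        return 0;
    }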
Interesting. What exactly did you do for your Master's project? Did you contribute code to the Linux kernel, or was it literature work (such as making a presentation about the code)?
Can you elaborate on how to identify the filename for a pipe in procfs or sysfs in a simple "echo hello | wc -c" example?
(There's an interesting note in Documentation/filesystems/vfs.txt about how pseudo-filesystems like pipefs can generate these names only when someone asks for them, since they're not used for anything otherwise. The only way I know of to ask for the name in this case is to call readlink() on the pipe fd under /proc/$pid/fd, as ls does.)
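(To make that concrete, here is a rough C sketch of the readlink() approach on Linux: the target of /proc/self/fd/N for a pipe fd is the generated pipefs name.)

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int p[2];
        char path[64], target[64];

        if (pipe(p) < 0) { perror("pipe"); return 1; }
        snprintf(path, sizeof path, "/proc/self/fd/%d", p[0]);

        /* ls and readlink(1) do the same thing on /proc/$pid/fd entries. */
        ssize_t n = readlink(path, target, sizeof target - 1);
        if (n < 0) { perror("readlink"); return 1; }
        target[n] = '\0';

        printf("%s -> %s\n", path, target);  /* e.g. "pipe:[123456]" */
        return 0;
    }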
If I remember correctly, this work was an end-of-semester presentation for an advanced OS/networking class. I picked pipes; others picked the filesystem, device drivers, CUDA, process scheduling, etc. For the master's thesis, my work was on active indoor localization and tracking (early work before indoor Google Maps was launched).
Re: your second question, I guess you're referring to identifying a FIFO on the filesystem. That's a simple ls -la: FIFOs show up with the 'p' file type in the mode string. For procfs or sysfs, just ls /proc or ls /sys.
You should see the 'p' type for FIFOs on the filesystem. Sysfs and procfs map onto internal (in-memory) kernel data structures, so their entries may just show up as regular files in the "virtual" filesystem. So cat, grep, etc. on these files are just reads/writes from/to the appropriate memory in the kernel, or, for read-only files, they may be "code-generated output" that allows inspection of some internal kernel state.
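(A quick sketch of the FIFO side of that, using a hypothetical /tmp path: mkfifo(3) creates the node, and stat() reports it as S_ISFIFO, which is what ls renders as the leading 'p'.)

    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    int main(void)
    {
        const char *path = "/tmp/demo_fifo";   /* hypothetical path for the example */
        struct stat st;

        /* Create a named pipe; "ls -l" shows it with 'p' as the file type. */
        if (mkfifo(path, 0644) < 0) { perror("mkfifo"); return 1; }

        if (stat(path, &st) == 0 && S_ISFIFO(st.st_mode))
            printf("%s is a FIFO\n", path);
        return 0;
    }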
Thanks for adding context, that wasn't clear at all to me; it seemed like the classic "read the headline and a few comments, then post 'smart' stuff the original article already covered" response to me.
Now I'm confused and curious as to what the ‘library filing scheme’ means. Does it mean shared libraries? IIRC Unix stuff was statically-linked at first. And then again, what are ‘indexing’ and ‘data path switching’?
+1 for Kernighan's book. I'm not one for impulse purchases, but when I saw a link on HN, I immediately bought it and read through it. I lent it to a professor of mine and he's using it as a text for his history course next semester.
Pipes are byte streams. Communication between actors, as well as over CSP channels, consists of object streams.
Pipes exercise backpressure on the writer - if the pipe is full the writer is blocked. Actor systems mostly use seemingly unbounded queues. The sender will not get blocked.
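(A small C sketch of that blocking point, assuming Linux: with O_NONBLOCK the writer gets EAGAIN instead of sleeping, which makes the capacity easy to observe.)

    #define _GNU_SOURCE
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int p[2];
        char chunk[4096] = {0};
        size_t queued = 0;

        /* Non-blocking, so the write loop returns EAGAIN instead of sleeping. */
        if (pipe2(p, O_NONBLOCK) < 0) { perror("pipe2"); return 1; }

        /* With a blocking descriptor this is the point where the writer would
         * be put to sleep until the reader drains the pipe. */
        for (;;) {
            ssize_t n = write(p[1], chunk, sizeof chunk);
            if (n < 0) break;
            queued += (size_t)n;
        }
        if (errno == EAGAIN)
            printf("pipe full after %zu bytes\n", queued);  /* 65536 by default */
        return 0;
    }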
> Pipes exercise backpressure on the writer - if the pipe is full the writer is blocked. Actor systems mostly use seemingly unbounded queues. The sender will not get blocked.
Erlang does suspend (block) processes that send to ports or nodes when the buffers for those get full, but not when sending to local processes. There used to be an optional reduction-count penalty for senders when sending to processes with large mailboxes, but it seems that may have been removed. I don't think it would be too hard to add a feature where sending to a local mailbox over a specified size caused the sender to be suspended, but tracking it might be a little difficult.
TL;DR: McIlroy was applying the concept of coroutines, which was described by Melvin Conway in 1963. Two processes communicating over a pipe are basically coroutines, except instead of passing structured data they're just passing bytes.
A Unix pipeline is a set of multiple coroutines. In fact, Tony Hoare's 1978 Communicating Sequential Processes paper cites the UNIX shell[1] for the concept of coroutines, and discussion of coroutines figures prominently in that paper. See https://www.cs.cmu.edu/~crary/819-f09/Hoare78.pdf CSP basically models the behavior of a large set of coroutines.
AFAIU, Erlang was partly inspired by CSP. You can draw a straight line from coroutines, through Unix pipelines and CSP, to Erlang's processes.
Modern pipes have to serve many lords. Besides bog-standard stdio redirection, they act as MPMC queues of semaphore tokens in build systems[0] and as handles to kernel-owned io-vecs for zero-copy DMA via sendfile, splice and friends.
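(Roughly what that zero-copy path looks like from user space, as a sketch: splice(2) requires a pipe on one side, so a file is spliced into a pipe and the pipe into stdout (or a socket) without the data passing through a user-space buffer.)

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int p[2], in;

        if (argc < 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
        in = open(argv[1], O_RDONLY);
        if (in < 0 || pipe(p) < 0) { perror("setup"); return 1; }

        for (;;) {
            /* file -> pipe: the pipe ends up referencing page-cache pages */
            ssize_t n = splice(in, NULL, p[1], NULL, 65536, SPLICE_F_MOVE);
            if (n <= 0) break;

            /* pipe -> stdout (or a socket): still no copy through user space */
            while (n > 0) {
                ssize_t m = splice(p[0], NULL, STDOUT_FILENO, NULL, n, SPLICE_F_MOVE);
                if (m <= 0) return 1;
                n -= m;
            }
        }
        return 0;
    }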
That commit is a fascinating illustration of how pipes allow a model of inter-process interaction that goes beyond what you can do with temporary files, as well as how much further things have evolved from the days of the "very conservative locking" in readp/writep from the 6E Unix kernel.
Nice write-up. The logic of the sleep and wakeup code, which explicitly passes control from writer to reader and back again, clearly shows how pipes led to the more refined Communicating Sequential Processes (CSP) concept.
> whose troff dialect still underlined words with a string of literal ^H backspaces followed by underscores!
... which is still how roff tools do it today. Manual formatters still send, even today, TTY Model 37 style input from 1969 to the manual pager: underlining with BS and the underscore character, boldface with BS and overprinting, and bullet points formed by printing a plus sign over the letter "o"; all of which less/more/pg/most have to recognize (but ironically actually do not).
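(The encoding itself is easy to reproduce; here is a small C sketch that emits the same overstrike sequences, which you can pipe into less to see the pager's interpretation, or into col -b to strip them.)

    #include <stdio.h>

    int main(void)
    {
        const char *word = "pipe";

        /* Underline: "_ BS char" for every character, as nroff emits it. */
        for (const char *c = word; *c; c++)
            printf("_\b%c", *c);
        putchar(' ');

        /* Boldface: "char BS char", i.e. overprinting the same character. */
        for (const char *c = word; *c; c++)
            printf("%c\b%c", *c, *c);
        putchar('\n');
        return 0;
    }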
The relatively modern (1976!) capabilities of GNU groff were deliberately turned off at the turn of the 21st century.
By the way: file-based pipes were later created on a specific "pipe device", whose device number was configured by the /etc/config program and was not necessarily the root device.
> Manual formatters still send, even today, TTY Model 37 style input from 1969 to the manual pager
Thanks, you are perfectly right. What I found charming was that the 3E pipe.2 manual page contains «word^H^H^H^H____» written out by hand in the roff _source_. The 4E one switched to using ".it word" instead.
So do modern Linux pipes work bi-directionally? Of course the shell doesn't use them like that, but that does not necessarily mean the kernel wouldn't support it. I vaguely remember that a colleague used a pipe bi-directionally between two C programs he wrote. To my surprise it mostly worked. IIRC there were some minor issues that made him give up the approach. The big surprise for me was that it worked at all. Or is it just racy beyond all control, in that you can read back your own data if the other end happens not to have emptied the buffer?
No, they do not (nor on the modern BSD kernels, as far as I can tell). The Linux pipe(7) manpage says (under "Portability notes"):
«On some systems (but not Linux), pipes are bidirectional: data can be transmitted in both directions between the pipe ends. POSIX.1 requires only unidirectional pipes. Portable applications should avoid reliance on bidirectional pipe semantics.»
I believe the systems that supported bidirectional pipes were SysV kernels that implemented pipes using STREAMS and 4(?)BSD kernels that implemented it using socketpair.
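(For comparison, a minimal sketch of the socketpair() flavour, which is what gave those BSD pipes their bidirectional behaviour: both ends can be written and read.)

    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int sv[2];
        char buf[8];

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) { perror("socketpair"); return 1; }

        write(sv[0], "ping", 4);
        read(sv[1], buf, sizeof buf);   /* data flows one way... */
        write(sv[1], "pong", 4);
        read(sv[0], buf, sizeof buf);   /* ...and back the other way on the same pair */

        printf("%.4s\n", buf);          /* prints "pong" */
        return 0;
    }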
That said, it is not based on socketpair any more. sys/kern/sys_pipe.c says:
/*
* This file contains a high-performance replacement for the socket-based
* pipes scheme originally used in FreeBSD/4.4Lite. It does not support
* all features of sockets, but does do everything that pipes normally
* do.
*/
Yes, xv6 is great. The original post here already has a brief section on the code from xv6/pipe.c, which made for pleasant reading after I had just finished working my way through the 6E code.
I also looked at the pipe implementation in Minix, which is a (non-trivial) variant of John S. Dyson's implementation that the BSDs share. It is implemented as a server (in the microkernel sense), so there's quite some added complexity there in handling "vmount"s and locking, but there are still some familiar elements of the code too, such as the "put it all together with flags" code in create_pipe().
For something even further along these lines, there's also the pipe implementation from Plan9, which at first glance felt so unfamiliar that I wasn't sure I was looking in the right place:
And it has a pipe implementation, which is saying something given how ascetic it is. It only has a couple dozen syscalls in total, and is smaller than seL4, at least in lines of code.
I'd greatly appreciate pipes in the next version of Python. Pandas is huge now, and the R programming language has implemented this for its dataframes. There are Python packages for this, but they were buggy for me.
What would that mean? Just extra syntax for passing the output of one iterator to a function that accepts an iterable as input? Like, syntactic sugar for nested calls such as qux(bar(foo(x)))?
The %>% notation might not be very short, but there are other short ways of avoiding the parenthesis mess in foo = qux(bar(foo(x))). For instance, the Mathematica/Wolfram language lets you write basically x // foo // bar // qux with its postfix // operator.