> Current performance and test counts on a 40 core system are:
>
>     $ time make -j $(nproc) check SUBDIRS=.
>     13s
>     $ # time make -j $(nproc) check RUN_EXPENSIVE_TESTS=yes
>     1m22.244s for 9 extra expensive tests
That's pretty respectable, given that coreutils includes 98 programs (some are simple, like yes(1) and true(1), but most of them are used millions of times a day to do real work: ls(1), kill(1), cat(1), wc(1)).
In fact, I used wc(1) to count the number of separate programs inside coreutils.
I have to wonder if this was either just someone being like "I want to make `yes` as fast as possible" or if there was an actual need to make such an elaborate program for something that spits out "y\n" repeatedly.
It also, frankly, feels like the wrong layer for such an optimization. I would have hoped there was a C "write to stdout" function that does all the buffering and performance tricks this thing does.
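For what it's worth, C's stdio layer does offer buffered output: `setvbuf` lets you hand a stream a buffer and request full buffering, so many small writes get coalesced before they reach the kernel. Here is a minimal sketch of that idea. The helper name `emit_lines` and its parameters are made up for illustration, and this is not how coreutils `yes` is implemented.

```c
#include <stdio.h>

/* Hypothetical helper: emit `count` copies of "y\n" through a fully
 * buffered stdio stream. `buf` must stay valid until the stream is
 * closed, and setvbuf must be called before any other I/O on `out`.
 * Returns 0 on success, -1 on error. */
int emit_lines(FILE *out, char *buf, size_t bufsize, long count)
{
    /* _IOFBF = full buffering: data is flushed when the buffer fills
     * or when fflush/fclose is called, not at every newline. */
    if (setvbuf(out, buf, _IOFBF, bufsize) != 0)
        return -1;

    for (long i = 0; i < count; i++) {
        if (fputs("y\n", out) == EOF)
            return -1;
    }

    /* Push whatever is still sitting in the stdio buffer. */
    return fflush(out) == EOF ? -1 : 0;
}
```

Calling `emit_lines(stdout, buf, sizeof buf, n)` with a `static char buf[BUFSIZ]` would be the obvious use. Each `fputs` call still has per-call overhead, which is presumably part of why a hand-rolled buffer can be faster.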
> If you have a vague recollection of the internals of a Unix program, this does not absolutely mean you can’t write an imitation of it, but do try to organize the imitation internally along different lines, because this is likely to make the details of the Unix version irrelevant and dissimilar to your results.
> For example, Unix utilities were generally optimized to minimize memory use; if you go for speed instead, your program will be very different.
So I think a lot of coreutils etc were written for extreme speed so there would obviously be no crossover with existing UNIX source.
I love using it as an example of how implementing something so simple can lead to learning about so many seemingly unrelated things like context switches and memory alignment.
It's a dumb tradeoff, IMO. The job of `yes` is to produce output only when it's being read, and this implementation copies the "y\n" until it fills up a BUFSIZ-sized buffer (8 KB on my system), then outputs that buffer until the write fails. This means you're paying the cost to fill the buffer even if you're only reading one line (which is a common use case for `yes`: responding with "y" to software that is asking you for confirmation, which generally only reads input once). So `yes` will always occupy at least 8 KB of RAM even though it doesn't need to, and you're spending thousands of CPU cycles copying into a buffer even though you don't need to.
That inefficiency is a far bigger sin than the "slowness" of yes needing a write() call for every line it emits, given the intended purpose of the command (which is not, as the code suggests, to saturate a unix pipe as fast as you can.)
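For readers who haven't looked at the source, the approach being criticized can be sketched roughly like this. This is a simplified illustration, not the actual coreutils code: the function names `fill_with_line` and `write_until_failure` are hypothetical, and the real program handles arguments and partial writes more carefully.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: pack as many whole copies of `line` into `buf`
 * as fit in `cap` bytes. Returns the number of bytes used. */
size_t fill_with_line(char *buf, size_t cap, const char *line)
{
    size_t len = strlen(line);
    size_t used = 0;

    if (len == 0)
        return 0;

    while (used + len <= cap) {
        memcpy(buf + used, line, len);
        used += len;
    }
    return used;
}

/* Hypothetical helper: keep writing the prefilled buffer to `fd` until
 * a write fails or comes up short. When the reader of a pipe goes away,
 * the process is normally killed by SIGPIPE before this returns. */
void write_until_failure(int fd, const char *buf, size_t used)
{
    if (used == 0)
        return;

    while (write(fd, buf, used) == (ssize_t)used)
        continue;
}
```

With a `BUFSIZ`-sized buffer of 8 KB and the line "y\n", `fill_with_line` packs 4096 copies, and each `write` call then hands all of them to the kernel at once. That up-front fill is the fixed cost the comment above objects to.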
Except that it does not display a single yes line by default, so its primary purpose is effectively to print them forever. Also, in this particular corner of the netsphere we should applaud the maddening research for performance that people put into that seemingly obvious command.
Edit: riquito is totally right here, I'm wrong. I'm leaving this for posterity, but this is all incorrect: stdout buffering by default will allow the "yes" process to write a lot of data even if you do a naive loop of printf calls. Writing to stdout will not block just because nothing's reading from it, the writes will be held in a decently-sized buffer by default, so even the naive approach will be wasting energy. I'll leave this for my own shame:
It prints them as long as the receiving process reads them. That’s how pipes work. When piping to a process that only reads from stdin once (you know, like the original purpose of the command, to respond “yes” to prompts from processes that are asking for confirmation), it will only write once (because stdout is line-buffered, so printing a string with a newline will block until something reads it.) Filling a buffer with 4,000 instances of “y\n” on the off chance that the receiving process will actually read all of those, is doing extra work that may not be needed.
“Maddening research for performance” in this case is coming at the expense of energy expenditure: you’re always allocating that memory and always filling a buffer with thousands of “y”s even when it’s just going to get thrown out. That should not be applauded. People should be just as concerned about energy usage as they are about wall clock time.
On the other hand, 8 KB is just 2 virtual memory pages; the overhead of running a process in the first place will be bigger than that. And the time spent filling the 8 KB buffer is likely not much more than the cost of a context switch to the reading process.
You're optimizing for the wrong thing. Sure, it's "inefficient" for a single prompt, but what about when you're doing a batch operation with millions of prompts that you actually need to be fast?