How is GNU `yes` so fast? (2017) (reddit.com)
142 points by zbentley on June 4, 2022 | 63 comments



Someone fixed the bottleneck in pv to hit 123 GB/s [1].

Someone else managed to double yes performance [2] via vmsplice.

[1] https://www.reddit.com/r/unix/comments/6gxduc/how_is_gnu_yes...

[2] https://www.reddit.com/r/unix/comments/6gxduc/how_is_gnu_yes...


Removing a bottleneck in pv to get it to 123 GB/s?! I thought: cool! This kind of work is right up my alley! I'm curious, let's see how this person did it. I read that reddit post and I see the guy is mrb_... My own work, from 4 years ago. I completely did not remember :)


Hah. That's so much better than the opposite - 'who wrote this terrible, buggy, badly formatted code??!' then finding it was you several years ago.


This made me chuckle. I've done this a handful of times in the form of SO answers. Always a fun feeling to discover you've helped yourself somehow.


I don't know this mrb_ guy you're referring to, but is he the same person as _mrb or mrb? Whoever he is, he's clearly a snake_case kind of guy... next I expect to see m_rb because that's the only other form of private member variables I've seen in C++.


I’m a little surprised there are no references to the GNU coreutils yes source. I love speculation as much as the next guy, but a lot of it could be settled by reading the source code. It’s fascinating that the BSD code is linked but not the primary subject!

Source for yes which does all the buffer malarkey: https://github.com/coreutils/coreutils/blob/master/src/yes.c

Source for the actual output: https://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob_pla...


Maybe the OP was edited in response to this, but right now it does link to exactly the same thing as you do. Search for "Let's see how yes does it".


Can 4 year old posts still be edited?


No idea, but the link was there back in 2020:

https://web.archive.org/web/20200122231049/https://old.reddi...


They recently-ish lifted the restriction on only replying/voting in threads 6 months and younger - I'm not sure if that also applies to editing, though, nor how far back it goes


No, it wasn’t edited. I just didn’t notice the link.


Somehow related. FizzBuzz at 40+ GiB/s https://codegolf.stackexchange.com/a/236630


For those wondering why they may have done this, there’s a relevant section in the GNU Coding Standards:

https://www.gnu.org/prep/standards/html_node/Reading-Non_002...

This appears to be a case where they went for speed rather than simplicity.


I did this because it runs significantly faster with relatively little extra complexity. It's useful for quickly generating test data, etc.


Thanks for the link. That was what I was thinking about when writing my response about "complex code to do trivial things is kind of a meme." but i couldn't find it.


Is this for legal or product reasons or both? It’s hard to tell from the link.


Legal reasons; it's in a section titled "Legal Issues". The idea is if you're cloning an existing program, it needs to be clear that you're not copying code from the original. This may be less of an issue with the other free implementations we have now, but a lot of these were clones of proprietary UNIX programs, and then it's really important that you can tell it's not derived from the proprietary version.


Got it, thanks for the explanation and context


The throughput of GNU yes may be higher, but the simpler unbuffered implementations probably have a shorter latency to the first line of output. Since in most cases the program on the other end of the pipe only reads a handful of lines, or more likely just one, they will actually be faster in practice.


> but the simpler unbuffered implementations probably have a shorter latency to first line output

I tested this by taking the freebsd (buffered) implementation, stripping all the iterations, and comparing it against a version which also strips all the buffering (so the latter would just `write(STDOUT_FILENO, "y\n", 2)`, and the former would first fill an 8k buffer then write that).

The unbuffered implementation has an edge of about 1%, for a variance of above 10%. The "shorter latency" is essentially nonexistent.


On the other hand, the program you are piping it to likely needs to do some setup as well. Add that shells probably spawn the pipeline in order and that setting up the buffer is incredibly cheap, and it would be rare for the first write not to be done before the program does its first read. ...Of course this can be more complicated if you are CPU-starved and these programs are competing.

Probably the more useful metric is total efficiency. How many interactions need to occur before the setup cost is made back? And how many lines are read on average from yes?


Using puts, fputs or fwrite (which is what the other implementations do) also buffers, so all the GNU implementation does is special case the buffer to remove the overhead of stdio.


Is the performance gain worth the less legible implementation?


I use "yes > file" to benchmark IO, knowing that this is pretty much the "speed of light" for writes.

I could use "cp /dev/urandom file" but then it's hard to tell if the filesystem or kernel optimizes a copy differently from generating your own writes.


I have never read the source of yes, only used it, so for all I know it could be 20 megabytes of source. As long as it is efficient, why would I care?


How does that efficiency reflect in actual usage? I’d imagine the time waiting for a “y” would be negligible for all but the most trivial tasks.


grepping for yes being piped in shell scripts yields a couple hundred matches in my computer - small things add up


How much time is spent generating a “y” compared to doing the job that yes is being piped into?

In a million years it’d have spared… about a 386 worth of compute ;-)


Tangentially related, but yes is one of my favorite apps. I use it quite often, as it’s easier to just spam “yes” than to dig through the man page to figure out how to make the command not ask questions.

I always feel like it is the wrong way of doing it, but at the same time I just love the idea of a core utility that does nothing but spam yes.


I use it to be REALLY sure I’ve cleared my scrollback buffer.


> It looks like we can't outdo C nor GNU in this case. Buffering is the secret, and all the overhead incurred by the kernel throttles our memory access, pipes, pv, and redirection is enough to negate 1.5 GiB/s.

On my computer reading from /dev/zero (16.5GiB/s) is still significantly faster than piping yes (7.11GiB/s), so I'd be interested in an exploration of why that is. BPF would probably be useful here. Maybe when I find a moment, I'll write a follow-up, doing just that... unless someone else does it faster. ;)


I always thought the device minor number of /dev/zero should control which number you get, so you can also make useful devices like /dev/seven with minor number 7 for an infinite source of beeps, or /dev/newline with a minor number 10 for an infinite source of newlines, and /dev/rubout with minor number 127 for an infinite source of rubouts.


I see a great need for /dev/yes.


Past discussion:

"How is GNU `yes` so fast?" https://news.ycombinator.com/item?id=14542938 (872 points | 5 years ago | 334 comments)


I have not heard of yes. It sounds like it just repeats a string infinitely. What would be a common use case? Or is it intended to be a lego brick for bigger composed commands?


You pipe it as standard input to a command that asks you lots of yes/no questions on the command line to save on typing yes repeatedly.

A lot of tools have their own command line flag to do this, e.g. apt-get has -y or --assume-yes


Wouldn't you normally need to use "expect" or equivalent for this?


If you wanted to vary based on the input prompt, or if the application got confused.

expect is basically the next step up from yes for this use.


In addition to actually answering interactive prompts, it can be useful to test software that reads and processes stdin.


GNU yes is not doing anything crazy, imo. Why are the BSDs not using GNU yes if it’s this much faster?


Because BSDs don't use the GNU userland?

As to why they don't use similar techniques, probably because they historically didn't really care enough about the performance of yes(1) to bother with it.

FreeBSD has since added output buffering to yes(1), you can see the difference between openbsd yes(1)[0] which remains utterly naïve and freebsd yes(1)[1] which uses an 8k internal buffer.

[0]: https://github.com/openbsd/src/blob/master/usr.bin/yes/yes.c

[1]: https://svnweb.freebsd.org/base/head/usr.bin/yes/yes.c?view=...


Amusingly, the OpenBSD one remains completely naïve, but still uses pledge for privilege dropping.


It's not naive, it's correct. GNU's focus is on features, kitchen sink style, performance tuning, etc. If that's what you want, it's for you. OpenBSD is focused on correctness. It's not correct to use hacks and buffering in tools to increase performance. That should be handled at the syscall interface or in the kernel or whatever.


Can you expand on why you think performance hacking degrades correctness? As I understand it, “yes” (with no arguments) is supposed to output “y\n” continuously until terminated. Throughput/latency are unspecified, admitting various implementations with various concerns, like GNU’s focus on throughput vs BSD’s focus on latency/simplicity. Even an implementation that took 10 seconds between each output could be desirable for some use cases! Is that version of “yes” less correct?


The case can be made that, given that kernel code tends to be more difficult to write and debug and is inherently security critical, there ought to be less of it. If the buffering can be generalized then it ought to be in a library to be shared, otherwise the application is the right place.


BSD devs tend to prioritise clean code over speed. I'm not certain of the implementation details here but their approach does make openbsd coreutils a more pleasant read than the GNU coreutils.


Also clean code usually results in compact code, which used to be effectively the same as fast code for the machines BSD was initially developed for.

I wouldn't be surprised if the BSD coreutils were more performant on older machines with small memory, buffers and prefetch queues (sometimes just a few bytes) than their GNU counterpart.


I would be very surprised. An 8k buffer is hardly a large imposition, and fewer syscalls has always been hugely beneficial.


Do the various BSD equivalents also have to work on other, esoteric OSen, or do they generally tend to fork a new version to run on just that flavour of BSD?


Not exactly an answer but OpenBSD sometimes has different portable versions of their software. e.g. https://www.openssh.com/portable.html


Idk about the stuff that's normally ‘included’ in the distribution (base install of the OS), but NetBSD's package manager supports a ton of platforms. I've used it on Linux and macOS.


I didn't try it but I think it would be relatively straightforward to compile GNU tools on BSD.

But generally GNU tends to focus on performance and features while BSD focuses on simplicity. GNU having ridiculously complex code to do trivial things is kind of a meme.

And also, most people don't need a ridiculously fast "yes". Usually, when you want a fast stream of bytes, for example to fill up some space, /dev/zero is a better option.


>GNU having ridiculously complex code to do trivial things is kind of a meme.

One of the reasons is that GNU had to reimplement UNIX tools as GPL code. How do you reimplement something trivial without it looking like the original, to avoid copyright claims? One of the solutions is to make the implementation as complicated as possible.


As a bit of trivia, GNU "true" is an abomination which even includes a corner case that returns "false".

Meanwhile, its BSD version essentially sums up to "return 0".

Same for much of the GNU code:

- echo: https://gist.github.com/fogus/1094067

- cat: http://9front.org/img/longcat.png


When does the speed of yes matter?


There's a whole best-selling book about it:

https://en.wikipedia.org/wiki/Getting_to_Yes


Well played. :) Interestingly, if I listed their negotiation principles here in a blind test, you might think they were about software development:

    - separate the people from the problem
    - focus on interests, not positions
    - invent options for mutual gain
    - insist on using objective criteria
I'd hire that engineer!


I've misused it as a load generator before: yes '{"some":"json"}' | kafkacat -P -t bleh (Though in my case, ~100 MB/s were enough.)


It doesn't, 99.9999% of the time. But there's probably someone, somewhere, who uses it in some crazy huge obscure script, who appreciates the work people put into optimizing it.


There is a program in the real world that consumes a line of input and processes it, faster than a naive yes can emit it?


ding ding ding

But then again, maybe the purpose is to swamp the input, maybe for testing, so the more the better, so, you never know and it's wrong to decide what all users don't need.


yes | not


When you’re really impatient with prompts.



