How is GNU `yes` so fast? (reddit.com)
872 points by ruleabidinguser on June 13, 2017 | 334 comments




One thing to keep in mind when looking at GNU programs is that they're often intentionally written in an odd style to remove all questions of Unix copyright infringement at the time that they were written.

The long-standing advice when writing GNU utilities used to be that if the program you were replacing was optimized for minimizing CPU use, write yours to minimize memory use, or vice-versa. Or in this case, if the program was optimized for simplicity, optimize for throughput.

It would have been very easy for the nascent GNU project to unintentionally produce a line-by-line equivalent of BSD yes.c, which could potentially have landed them in the '80s/'90s equivalent of the Google v. Oracle case.


Came here to post this, so instead I'll just back you up with a source:

https://www.gnu.org/prep/standards/standards.html#Reading-No...


"Add a programming language for extensibility and write part of the program in that language" wow, that's just asking for trouble...


Remember that "yes" is the bottom end of complexity here, not the top. As the utility grows larger that stops looking like "asking for trouble" and starts looking like "often the only sensible solution". Expecting people to extend programs in C is often "asking for trouble", after all!

(Lately I've been really tempted to pull a Cato and start terminating my every HN post with "C delenda est." https://en.wikipedia.org/wiki/Carthago_delenda_est)


Never go full Cato.


Yeah, I mean, who even uses emacs?


Youngster!


That's the LISP programming philosophy of the creator and host of this web forum


Does writing it in a different language suffice?


Depends how different.




And the motivation: it's used as test inputs. Not sure I agree with completely destroying the readability of a perfectly readable file - but ok.


The motivation, as far as I see, was only that it "may be used."

But see belorn's post, where he argues that the error handling seems to be good.


Is the date on that accurate, did that happen in 2016? Kinda hurts the theory that they were trying to avoid Unix similarity.


It says: Mar 9, 2015


One point in favor of this simple version is that it's immediately obvious that it doesn't do the same thing as the OpenBSD version. In OpenBSD `yes a b c` will only print "a" while in GNU it prints "a b c". I did not catch that when I was reading the more complicated modern version.


(joke) They copied the API of the main() function... :-)


At the risk of sounding like a copyright newbie, shouldn't that be covered by just doing a 'clean room' implementation? As long as you can verifiably prove that you didn't copy the source, it should fall under general use (as there's really only one way of doing such a thing), right? Much like Apple can't patent rectangles, although they tried.


Even if you eventually "win" you already lost when plausible litigation began.


Normally I think readability is more important than speed. But in this particular case, I think GNU is doing the right thing optimizing the code to the limit.

This is the beautiful part of Unix: small tools that do only one thing well. Programs following this philosophy are very good abstractions. They do one very well defined thing so you can use them without having to understand how they work. I have used Unix for years and I've never felt the need to read the source code for `yes`. And because they do a very small thing, even if you need to read them, the overhead of optimization is not that much; for example, the optimized GNU yes is just under 100 LOC if you remove comments and help boilerplate. Yes, it's longer than the BSD version, but it's just a matter of minutes to understand what it does.


I totally disagree. Nobody will ever want to use `yes` at 10 GB/s. They will want it to be reliable, and this sort of over-optimisation increases the risk of bugs.


I've used 'yes' many times to generate huge amounts of data quickly. Back then, it never had the small string optimisation, but you could always run 'yes InsertReallyLongStringHere' to spew out data much faster than /dev/urandom or even /dev/zero

I'm glad it runs fast, and I hope that all OS utilities are optimised (and tested, of course!) instead of making their source code pretty. The fact is, most people want to use programs, not read them.


> The fact is, most people want to use programs, not read them.

I want to use safe programs, and programs with readable code are more likely to be properly audited.


Audited UNIX tools - does such a thing exist?


openbsd.org


Sounds like you want to stick to something with a non-GNU userspace, apparently.


> you could always run 'yes InsertReallyLongStringHere' to spew out data much faster than /dev/zero

That really doesn't make any sense, /dev/zero should be at least as fast as yes.


> /dev/zero should be at least as fast as yes

I agree, all I remember is that when I tried it, /dev/zero sometimes sucked performance-wise. I can't recall the exact circumstances as it was some time ago, and could have been on any of Linux/FreeBSD/SunOS/HP-UX/IRIX - perhaps it was the fastest common way at the time?

On a recent x64 Linux, /dev/zero seems plenty fast enough now:

  $ dd bs=8k count=819200 if=/dev/zero of=/dev/null
  819200+0 records in
  819200+0 records out
  6710886400 bytes (6.7 GB, 6.2 GiB) copied, 0.331137 s, 20.3 GB/s

  $ yes | dd bs=8k count=819200 of=/dev/null
  819200+0 records in
  819200+0 records out
  6710886400 bytes (6.7 GB, 6.2 GiB) copied, 0.959551 s, 7.0 GB/s


No need for "dd", let "pv" get the data from /dev/zero directly:

$ pv < /dev/zero > /dev/null [ 16GiB/s]

But the version of yes using vmsplice() is even faster than that on my machine.


What's the line to test `yes` with `pv`?

    pv < /usr/bin/yes > /dev/null
doesn't seem to work properly. FWIW I get 330MiB/s vs 8.4GiB/s for /dev/zero.

[Incidentally first I've heard of pv but I've known about dd for a decade or two].


< and > are for file redirection, yes is a binary so you want to pipe its stdout into pv:

    yes | pv > /dev/null


So how did that redirect even work? Should we be doing a `mknod` to make a "yes" device to make the comparison work? (Can we, and would it help, other than in my naive imagination?)


That redirect worked because it was reading the contents of the yes binary itself instead of its output.


I know this works, but how come we can see the output of pv when it is redirected to /dev/null? Maybe I just don't understand how pipes and redirection work since I rarely use Linux :(


> I know this works, but how come we can see the output of pv when it is redirected to /dev/null?

From pv's man page:

> Its standard input will be passed through to its standard output and progress will be shown on standard error.

> Maybe I just don't understand how pipes and redirection works since I rarely use Linux :(

The Windows/DOS command line has the same concepts[0], though they're probably used less often: by default a process has 3 FDs for STDIN (0), STDOUT (1) and STDERR (2).

At a shell

* you can feed a file to STDIN via <$FILE (so `</dev/zero pv` will feed the data of /dev/zero to pv), the 0 is optional

* you can pipe an output to a command (other | command) or the output of a command to an other (command | other)

* you can redirect the STDOUT or STDERR to files via 1> and 2> (the "1" is optional so 1> and > are the same thing) (you can redirect both independently)

* you can "merge" either output stream to the other by redirecting them to &FD so 1>&2 will redirect STDOUT (1) to STDERR (2) and 2>&1 will redirect STDERR (2) to STDOUT (1), you can combine that with a regular redirection so e.g. `command >foo 2>&1 ` or with a pipe (`command 2>&1 | other`)

And you can actually create more FDs to use in your pipelines[1], though I don't remember ever seeing that done in practice.

[0] https://support.microsoft.com/en-us/help/110930/redirecting-...

[1] http://tldp.org/LDP/abs/html/io-redirection.html
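
To make the mechanics concrete, here is a rough C sketch of what the shell does for `yes > file` before exec'ing the command (simplified; real shells also save a copy of the old descriptor so they can restore it, as the annotated strace further down this thread shows):

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        /* what the shell does for:  yes > /tmp/test2 */
        int fd = open("/tmp/test2", O_WRONLY | O_CREAT | O_TRUNC, 0666);
        if (fd < 0)
            return 1;
        dup2(fd, STDOUT_FILENO);   /* FD 1 now points at the file */
        close(fd);                 /* drop the original descriptor */
        execlp("yes", "yes", (char *)NULL);  /* yes just write()s to FD 1 */
        return 1;                  /* only reached if exec failed */
    }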


You're redirecting stdout to /dev/null and pv is writing to stderr. If you use &> instead, stderr and stdout will both be redirected to /dev/null and you will see no output at all.


Maybe this?

    yes | pv > /dev/null


That's not quite apples-to-apples though:

cat /dev/zero | pv > /dev/null

uses a pipe like the 'yes' line above, but runs substantially slower on my computer than the redirect-only version.


'yes' to a file is how I sometimes benchmark disk speed. It should have fewer system calls than reading from /dev/zero and then writing.

I actually checked just now, and it looks like you'd be making twice as many system calls with /dev/zero compared to generating the data locally:

    strace dd count=1 bs=512 if=/dev/zero of=test2
    open("/dev/zero", O_RDONLY)             = 3
    dup2(3, 0)                              = 0
    close(3)                                = 0
    lseek(0, 0, SEEK_CUR)                   = 0
    open("test2", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
    dup2(3, 1)                              = 1
    close(3)                                = 0
    read(0,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
    write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
    close(0)                                = 0
    close(1)    
While 'strace yes > test2' is just a constant stream of write() calls.

The difference matters if you're benchmarking e.g. some new SSD compared to a tmpfs on a machine with 100+ GB of RAM. It's always better if the tools have less overhead, because the comparison is more meaningful.

Also consider that it can be faster to write to a local network than to disk. I've never done it, but I imagine that the kernel's not going to want to deal with your /dev/zero calls if it's spending all of its time writing to a 10Gb switch. I can imagine some very specialized storage servers that could spend most of their time writing from memory buffers to a network switch, or if you're troubleshooting a slowdown in the networking itself.


When I started this comment, I didn't think you were measuring what you thought you were measuring with those straces. 'strace yes > test2' only watches 'yes', not '> test2' (which is handled by the shell). Here's what your command outputs:

    $ strace yes > /tmp/test2
    [various initialization steps]
    write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = 8192
    write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = 8192
    write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = 8192
    write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = 8192
    [...]
To measure everything, I started up a whole new shell, expecting to see a 'write(1, ...)', a 'read(1)', and a 'write(f, ...)':

    $ strace -f sh -c 'yes > /tmp/test2'
    [various initialization steps]
    [pid 15839] write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = 8192
    [pid 15839] write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = 8192
    [pid 15839] write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = 8192
    [pid 15839] write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = 8192
    [...]
How does this possibly work? File descriptor 1 is supposed to be the terminal, not a file! Of course, the magic of file redirection:

    open("/tmp/test2", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3 # Open the file, get FD 3
    fcntl(1, F_DUPFD, 10)                   = 10 # Copy FD 1 (the terminal, STDOUT) to FD 10 temporarily
    close(1)                                = 0  # Close original FD 1
    fcntl(10, F_SETFD, FD_CLOEXEC)          = 0  # Close FD 10 (the terminal, copy of STDOUT) when exec() is called
    dup2(3, 1)                              = 1  # Copy FD 3 (the file) to FD 1 (STDOUT)
    close(3)                                = 0  # Close FD 3 (the file's original descriptor)


I had the impression that you thought "yes" pipes the output to shell, and then the shell writes to disk. That's incorrect. " > file" means redirecting to a file, and therefore all write system calls actually write to disk.


That's exactly what I thought. Now, had you asked me how redirection worked I would have said "the redirection operators cause the shell to attach a file to the descriptor," but I'd never actually thought through the implications of that in terms of what syscalls get made, and the output of strace presented a rather visceral demonstration of the implications of this clever bit of design.


The ability to get 10 GB/s of dummy data into a pipe from the command line could come in handy at some point, for stress testing or something. I am not sure it is over-optimized. (And even with the optimizations, the risk of bugs should be very small.)


Of course you could just pipe from /dev/zero which will easily do 10GB/s on every machine.


Just tried it,

    cat /dev/zero|pv >/dev/null 
gives me roughly 2/3 the speed of yes. (5 GB/s vs 7 GB/s) Plus yes gives you an arbitrary string, instead of only zeroes.


Tobik already wrote it

but here's today's Useless Use of Cat Award: http://porkmail.org/era/unix/award.html


Well, sometimes using cat is faster, as I discovered to my surprise recently:

https://news.ycombinator.com/item?id=14414610

But in that case one could probably say that Gnu awk is just not very good at input handling (as Mawk doesn't appear to benefit from an extra "cat").


That's because you're using cat and a pipe. Try this instead:

  pv > /dev/null < /dev/zero


For anyone who is like me and finds it uncomfortable that things are now out of order, note that you can still put the input redirection in front:

  </dev/zero pv >/dev/null


Or even this:

pv </dev/zero >/dev/null

which is a common way of doing it (for any command with any inputs and outputs, not just the above ones), i.e.:

command < input_source > output_dest

All three pv command invocation variants, the one here and the two above, work. And it becomes more clear why they are the same, when you know that the redirections are done by the shell + kernel, and the command (such as pv) does not even know about it. So in all three cases, it does not see the redirections, because they are already done by the time the command starts running (with its stdin and stdout redirected to / from those respective sources). And that is precisely why the command works the same whether it is reading from the keyboard or a file or pipe, and whether it is writing to the screen or a file or pipe.


I believe /dev/zero writes data one byte at a time; that's likely the reason why.

[edit] That's actually inaccurate (and badly expressed), see comments below.


/dev/zero doesn't "write" anything in the sense that yes writes, since it's a character device and not a program. The Linux kernel's implementation of /dev/zero does not write one byte at a time.


You're right of course; and actually I believe the kernel will simply provide as many bytes as the read() requested, so the speed should mostly depend on how you access /dev/zero. I.e., the user above was using cat, and I think with dd and a proper block size it'd be much faster.


I was under the impression that cat automatically used a sane size for reading. Now that I think of it I cannot think of a source, other than to point out my own anecdotal experience.

When I was writing Raspbian images to SD cards for use on a Raspberry Pi, cat and dd took within a few seconds of each other on an operation longer than a minute. Since then I have been using cat where I could, though I didn't think to write down the numbers.


For something somewhat related, see parts of this thread:

https://news.ycombinator.com/item?id=14414610

Note that cat+gnu awk was faster than just gnu awk - but mawk was faster still (reading a not entirely small file).

And in a similar vein of gp comparing Gnu and Openbsd, note that openbsd cat is a little more convoluted than the simplest possible implementation (at least to my eyes):

https://github.com/openbsd/src/blob/master/bin/cat/cat.c

https://github.com/coreutils/coreutils/blob/master/src/cat.c

(That is, Gnu "cat" and OpenBSD "cat" are less different than Gnu "yes" and OpenBSD "yes").


yes will give repeated data though, not just zeros, seems like it's more useful here - as others have pointed out, also uses less syscalls than /dev/zero.


If you want to maximize readability and simplicity then writing `yes` in C is a bad choice. It is much easier, cleaner and shorter to write it in python, or to just use the shell, which would normally be invoking `yes` in the first place. Since `yes` is used in shells and builtins can be considered very reliable, here is an implementation as a shell function:

    yes(){ while :; do echo "${1:-y}"; done; }
Python:

    import sys 
    while True:
      if len(sys.argv) > 1:
        print(sys.argv[1])
      else:
        print("y")
And if you don't need the features of `yes` and only need a loop that prints 'y', then it really is hard to beat the simplicity of:

  while :; do echo "y"; done


Is it really "easier, cleaner and shorter to write it in python"? Did you look at the OpenBSD implementation?

https://github.com/openbsd/src/blob/master/usr.bin/yes/yes.c

It's essentially line for line identical to your python code...


The C example has three includes, two conditionals, two loops, and one function definition. The python example has a single include, conditional, and loop.

For readability purposes it is easier to go through each line of the python program than the OpenBSD C code. It's not massively different, but it's distinguishable enough that I would choose the python version if I wanted to maximize readability, minimize syntax requirements and did not want to use shell script.

The shell function is in my view the superior choice if the audience is a programmer who knows shell script syntax. It is just a single loop and is written in the environment that the program is intended to be used in. The only drawback there is the speed.


Most of what makes the C program bigger comes from the fact that the C program does more. Your python example doesn't call pledge(). Remove that from the C program and it drops to one include, one conditional, and two loops. Further, counting the two loops against C doesn't make any sense: it's entirely up to the programmer whether to have a conditional containing two loops, or a loop containing a conditional. Both languages could naturally do it either way.
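
For context, a sketch of what that pledge() call adds on OpenBSD (my illustration, not the actual yes.c; pledge is OpenBSD-only):

    #include <err.h>
    #include <unistd.h>

    int main(void)
    {
        /* promise the kernel we will do nothing but stdio from here
           on; any other syscall aborts the process */
        if (pledge("stdio", NULL) == -1)
            err(1, "pledge");

        for (;;)
            write(STDOUT_FILENO, "y\n", 2);
    }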


> The python example has a single include, conditional, and loop.

... and python. Don't forget to count python.


Exactly. That's the reason I write stuff in C instead of my favorite interpreted language, Ruby. When you write something in C, that's it. No large interpreter plus runtime needed.


[flagged]


You've been on a tear of uncivil and unsubstantive commenting, and it has to stop. Often a good strategy for this is to slow down. High-quality reflective posts are what we're after instead of dismissive, snarky reflexive ones, and the former come more slowly.


This is inefficient. Why do the argv check in every iteration of the while loop? It's not going to change between iterations.


> Nobody will ever want to use `yes` at 10 GB/s

Just because you say so?


If the speed of yes is bounded by memory speed, doing anything useful will almost certainly consume that data at a far lower rate. Putting it on a disk, pushing it over a network, etc. will almost always be slower than yes is able to generate data.


The typical use case is piping it into another running program. Maybe someone wants to do that really quickly rather than putting it on disk or pushing it over a network.


It's not about running 'yes' at 10GB/s, it's about less overhead to do a simple job. If this version of yes is 100x faster, that implies it using 1% of a cpu to do the same work that would otherwise occupy 100% of a cpu. This leaves more of the machine to do what is likely to be the intended task.


Unix userland tools have evolved over decades to be as efficient as possible because they have historically underpinned everything the operating system does. The faster each tool works, the faster the other tools that depend on them work. If increased efficiency results in a bug, that bug can then be fixed, making it a net gain for system stability.


I'm having a hard time imagining a case where even a grossly complex 100,000 line implementation of yes(1) couldn't be trivially proven to be correct.


Easy! Write a program that does a brute force check of Goldbach's conjecture on all integers. For each integer that passes the check, print a line. If you can prove this correct (or incorrect) you'll probably win a Fields medal.


I agree with the general point that you could prove such a simple program correct relatively easily but that does still have a cost, which is always a concern in an open-source project. You still need someone to step up and do that work and continue to verify it periodically in the future – if that code is doing complicated things with buffering, that opens up possible odd bugs due to stdlib, gcc, maybe even kernel behaviour changes which might not affect simpler programs.

Not a huge bit of work to be sure but for a non-commercial project you might have trouble finding a volunteer who cares about that tedium.


Absolutely, I once trivially proved a 1,000,000-line implementation of return 0 to be correct. I don't know why all these comments are bothered by how much overkill this yes implementation is. Maybe they don't hold 100,000 PhDs like we do, am I right?


Hey, 640K is enough for anyone!


It's not really optimized to the limit – or perhaps it is, but then the limit is fairly easy to reach.

When I saw this item here I reached for the terminal and wrote a simple (and non-compliant) alternative that simply initialises 8150 bytes of "y\n" and then loops a write forever. I understand that it is not a fully standard-compliant yes, and that maybe GNU yes is indeed fast, but that awfully simple program, which takes all of 10 lines (everything included) and took me all of a minute to write, performs just as well as far as pv is concerned.

(I eventually completed a feature complete yes but I still think that simply not using `puts` is hardly optimising to the limit.)
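
For reference, a minimal sketch along those lines (not the parent's actual program; the 8192-byte buffer size here is my assumption):

    #include <unistd.h>

    int main(void)
    {
        static char buf[8192];

        /* fill the buffer with "y\n" once... */
        for (unsigned i = 0; i < sizeof buf; i += 2) {
            buf[i] = 'y';
            buf[i + 1] = '\n';
        }
        /* ...then hand the whole thing to write(2) forever */
        for (;;)
            if (write(STDOUT_FILENO, buf, sizeof buf) < 0)
                return 1;
    }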


Yeah I got a surprise when I wanted to see how strlen was implemented:

https://github.com/lattera/glibc/blob/master/string/strlen.c


If you're on amd64 that's the wrong file:

https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86...

The pcmpeqb instruction is from SSE2; it compares 16 bytes per op


A nitpick, but I notice that the BSD implementation does not catch errors when printing. In theory it could get EINTR and only write a partial amount of argv[1], especially if the argument string is really long and the program runs for an extended amount of time. The GNU version does catch EINTR.

Naturally it could be that the C function puts used by most people in OpenBSD is implemented with a built-in EINTR-catching loop, or that OpenBSD does not interrupt writes.


You'd have to check the various POSIX standards to be sure - and have to then verify that the OS/libc actually follows them, but you can pretty much rely on every libc's I/O wrapper functions to handle interrupted system calls or incomplete writes. I've never seen any code check the return value of a printf() to verify that all the characters were printed.

As you say, the GNU versions definitely handle EINTR - the linux man page for puts() just says it returns a non-negative value on success, it's not even specified whether it returns the number of bytes written or not.


`puts` is an "stdio" function, not a system call. It won't EINTR, it correctly resumes if the underlying write() EINTRs. If it does get an error, it will return EOF, then you'll have to call `ferror()` to find out which error.

But the point stands; there's no error handling in the OpenBSD version. But that could be considered a design decision; the OpenBSD version never gives up on writing data until you kill it; the GNU version bails at the first error.
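
For illustration, a robust retry loop over the raw syscall looks roughly like this (a sketch; the write_all helper is my own name, not from either codebase):

    #include <errno.h>
    #include <unistd.h>

    /* Write all of buf, retrying on EINTR and resuming after short writes. */
    static int write_all(int fd, const char *buf, size_t len)
    {
        while (len > 0) {
            ssize_t n = write(fd, buf, len);
            if (n < 0) {
                if (errno == EINTR)
                    continue;   /* interrupted before any bytes moved */
                return -1;      /* a real error: give up */
            }
            buf += n;           /* partial write: advance and retry */
            len -= n;
        }
        return 0;
    }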


I think the GNU version is surprisingly readable for what it does. I have seen code that is a lot less readable than this and doesn't have the performance benefits described in the article.


I wonder how much of BSD is written in this canonical style.

I think unix v6 was mostly as clear. Linux suffered the real-world penalty. But maybe BSD managed to keep its source poetic.


In general I love reading the code base of BSD-systems, NetBSD in particular is really beautiful and easy to follow.


I generally look at the BSD sources when I want to know how some unix tool works; they are always much, much more readable than the GNU equivalents.


I am wondering why the GNU version uses

atexit (close_stdout);

Aren't all streams closed at exit? And why close stdout, anyway?


It's to make sure the buffer is flushed. Streams are closed, but not explicitly flushed at exit. Stdout only because it's the only filehandle with a buffer in yes.


In normal use, GNU `yes` does unbuffered IO on stdout. However, it does use buffered IO for --help and --version messages; it sets atexit(close_stdout) to cover both of those cases at once, rather than handling them both separately.
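
A simplified sketch of that pattern (gnulib's real close_stdout reports the error properly; this stand-in just exits non-zero):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Flush stdout at exit and fail loudly if the flush (or any earlier
       buffered write) failed -- a stand-in for gnulib's close_stdout. */
    static void close_stdout(void)
    {
        if (fflush(stdout) != 0 || ferror(stdout))
            _exit(EXIT_FAILURE);
    }

    int main(void)
    {
        atexit(close_stdout);
        puts("yes (sketch) 1.0");   /* buffered --version style output */
        return 0;                   /* the atexit handler flushes here */
    }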


And for a third example, Busybox's implementation:

https://git.busybox.net/busybox/tree/coreutils/yes.c


Mah, how many times do you read a program and how many times do you execute it?


According to the article/experiment you don't have to bork up the code much, just copy your stuff into a big buffer before you start printing it (and print using write(2) instead of puts)


This works easily for the default case, which prints "y\n" (two bytes), which is likely to divide BUFSIZ. To handle the general case with the same efficiency, you have to have a buffer size that is a multiple of both BUFSIZ and the length of the string to be printed. It appears that GNU yes will not do that and simply does unaligned writes in this case, which is likely to be considerably slower (possibly slower than write would have been).
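
For illustration, a sketch of the whole-copies approach for an arbitrary string (my own code, not GNU's):

    #include <string.h>
    #include <unistd.h>

    /* Fill the buffer with as many whole copies of the string (plus '\n')
       as fit, so each write() covers an exact number of repetitions. */
    int main(int argc, char **argv)
    {
        static char buf[8192];
        const char *s = argc > 1 ? argv[1] : "y";
        size_t slen = strlen(s) + 1;        /* string plus newline */
        size_t used = 0;

        while (used + slen <= sizeof buf) {
            memcpy(buf + used, s, slen - 1);
            buf[used + slen - 1] = '\n';
            used += slen;
        }
        if (used == 0)                      /* string longer than the buffer */
            return 1;                       /* (real code would handle this) */
        for (;;)
            if (write(STDOUT_FILENO, buf, used) < 0)
                return 1;
    }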


> It appears that GNU yes will not do that and simply does unaligned writes in this case, which is likely to be considerably slower (possibly slower than write would have been).

Why would it be slower to do a single, say, 8190 bytes write instead of 2730 3-byte writes?


Generally writes not aligned to cache line are slightly slower on most common architectures, vastly slower on others. (Such as many MIPS)

Small write calls themselves incur a considerable syscall overhead.


This is why Linux and GNU have won.

It's just a trade-off. For utilities whose behaviour doesn't change, we're happy to improve the speed.


And it's also a source of many bugs (in the past and likely in the future as well). In most use cases today I'd rather have a slightly slower userland which is easily read (and audited) than one which compromises quality for speed in edge cases.


IOW, Can't leave anything alone.


You're always welcome to port old, slow utilities forward. That's the beauty of open source!


This work certainly has value, but it is frustrating to keep up with everything being changed. And if changes affect me, do the fashions and attitudes of those making the changes have any synchronicity with the way that I use computers?

In the old days I think we would have left yes in C because the compiler will build it on any platform.

My first experience with Linux was porting land.c to one of the commercial Unixes. Now, I first met TCP/IP on a 3B2 and later met systems sold by SMI and DEC and SCO, and we all mostly constructed packets the same way, and a guy called Stevens had written some nice books about this that everyone had. I think I recall the commercial and free BSDs also did things the usual way. But whoever figured out the interesting phenomenon land.c demonstrated happened to be a Linux user, and this platform had some different ideas about it. I saw rewriting the relevant parts as an annoying, menial task.


The /r/programming discussion of this is interesting [1].

Someone does a Go version and gets the same speed as GNU yes. Someone else tries several languages. This person got the same speed in luajit, and faster in m4 and php. Ruby and perl about 10% slower, python2 about 10% slower still, and python3 about half that. The code is given for all of these, and subsequent comments improved python3 about 50% from his results, but still not up to python2.

[1] https://www.reddit.com/r/programming/comments/6gxf02/how_is_...


I had to smile, as this thread is a microcosm of programming language stereotypes: the python programmers tweaking code to get that extra 10% (nobody mentioned pypy..), someone trying to get javascript to work in a long-running program, of course a rust implementation that isn't working well just yet (but we're all rooting for it.. set that compiler flag and rerun).. one comment about perl (turns into a one-liner..), nobody bothering to redo the Lua, Ruby and PHP, and for fun somebody throws down some fortran.

What is going on with the m4 code?


Within this thread someone pointed out https://github.com/cgati/yes/blob/master/src/main.rs. This rust version gives me 7.81GiB/s versus GNU's 7.54GiB/s.


Yeah. Rust came through (woot!), though that's some verbose low-level code fu with cows?

(use std::borrow::Cow;) ?!

(I don't know rust, it's on my list.. especially with cow borrowing! They clearly are my people.)

I noticed they added a comparison of Rust to gnu yes in the thread. They only got 2.17GiB/s, but on a slower machine.. (2.04 for the GNU)


Cow means "clone on write". It's a generic wrapper for types that have both a borrowed version and an owned version. For example, `Cow<'a, str>` can hold either an `&'a str` (a borrowed string) or a `String` (an owned string), and a `Cow<'a, [u8]>` can hold a `&'a [u8]` (a slice of bytes) or a `Vec<u8>` (an owned vector of bytes).

In that Rust program, Cow is being used here so if the user provides an argument it ends up with an owned vector that contains that argument plus a newline, otherwise it ends up with a borrowed byte slice that contains "y\n". That said, there's not really a whole lot of point to using Cow here since allocation is a one-time cost and nobody cares if yes starts up a few microseconds slower, but the usage of Cow is limited to just a couple of lines so it's not really complicating anything (the actual write() call just ends up taking a &[u8] byte slice, so the usage of Cow is strictly limited to main()).


Tiny, tiny note: it means Clone on write, not Copy, and it's String, not Str.


Oops, typo. I've updated my comment accordingly, thanks.

Also, I didn't realize it was Clone-on-write. Interesting, and the documentation does confirm this. I say "interesting" because the actual operation involved is `to_owned()`, not `clone()`, seeing as how `clone()` on a `&str` just gives you the same `&str` back.


Yeah, conceptually it's closer to Clone even if it's not literally Clone.


One night for fun, I wrote a fizzbuzz (without actual printing) in almost all of the languages mentioned here (except fortran), and benchmarked it with 10000 executions each.


Pretty sure none of your implementations will beat the FizzBuzzEnterpriseEdition [1].

[1]: https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris...


That link is always marked as already visited and yet I always forget it exists.


this is hilarious on so many levels...


And? Don't leave us hanging! Who wins at FizzBuzz? Also, surely 10k is just getting started?


10k was just sufficient to get actual differences. Of course I went crazy and did something like 10M, but waiting 10 minutes to get the same relative differences didn't bring anything.

Remember this is without any printing: C is fast; if I remember correctly it was something like 0.015 or 0.03s, while at the other end of the spectrum I had to wait 0.1s for JavaScript. C results varied by up to a factor of two, I guess because of the OS, since it was so quick.

It isn't a really significant benchmark though, for any language. It was one of those nights. Ah, and I remember a large difference between go run and go install.


> It isn't a really significant benchmark though, for any language

Sure, but it's reassuring to know that the universe is still sane, C is still faster than Javascript.

Thanks for expanding on your comment!


Found the old sources, reran the benchmark for 100 times each.

    c    0.062s
    f90  0.080s
    go   0.108s
    awk  0.116s
    lua  0.117s
    rs   0.140s
    sh   0.434s
    php  0.886s
    scm  1.323s
    py   2.965s
    rb   3.738s
    js   5.411s
My memory was bad, I also added fortran for the occasion.


Thanks for digging those up!


This[1] comment seems directly opposed to what you wrote, with Python 3.4 (8.76GiB/s) even outrunning Python 2.7 (8.55 GiB/s).

(haven't run the tests myself though)

edit: hadn't noticed the comment by dom0.

[1] https://www.reddit.com/r/programming/comments/6gxf02/comment...


Actually I got perl beating all others by far. Easy, because you can control the buffer size.

darwin with BSD + GNU yes, vs some scripting langs:

    $ /usr/bin/yes | pv > /dev/null
    ^C00MiB 0:00:04 [23.9MiB/s] [     <=>                                                              
    $ /opt/local/libexec/gnubin/yes | pv > /dev/null
    ^C77GiB 0:00:05 [ 644MiB/s] [      <=>
    
    $ bash -c 'while true; do echo "y"; done' | pv > /dev/null
    ^C38KiB 0:00:03 [ 233KiB/s] [
    $ node -e 'while (true) { console.log("y"); }' | pv > /dev/null
    ^C57MiB 0:00:03 [ 582KiB/s] [

    $ perl6 -e 'loop { print "y\n" x (1024*8) }' | pv > /dev/null
    ^C295MiB 0:00:02 [ 139MiB/s] [    <=>
    $ ruby -e 'while true do puts("y"*1024*8) end' | pv > /dev/null
    ^C14GiB 0:00:09 [ 827MiB/s] [
    $ python2.7 -c 'while True: print "y\n" * (1024*8),' | pv > /dev/null
    ^C.3GiB 0:00:10 [1.73GiB/s] [             <=>
    $ python3.6 -c 'while True: print("y\n" * (1024*8)),' | pv > /dev/null
    ^C73GiB 0:00:05 [1.79GiB/s] [      <=>
    $ perl -C0 -E 'print "y\n" x (1024*8) while 1' | pv > /dev/null
    ^C.6GiB 0:00:08 [1.84GiB/s] [          <=>


Actually the second python version is about on par; however, it's running on a slower system. He got 6.78GiB/s from GNU yes and 6.76GiB/s with python3. Someone later compares it with python2, which in fact is slower with this codebase.

    > $ python2.7 yes.py | pv > /dev/null
    > ... [7.22GiB/s] ...
    > $ python3.4 yes.py | pv >/dev/null
    > ... [8.76GiB/s] ..


m4 is orders of magnitude slower (notice the different unit).

If you write the Python script using bytes (not Unicode), then Python 3 is faster than Python 2, at least for me. Python 3 is 20% slower than GNU yes and Python 2 is 35% slower than GNU yes.

edit: I think in your last sentence you are comparing results from different computers.


I was under the impression Python was orders of magnitude slower than compiled native languages, even when you aren't abusing Pythonic features that are computationally expensive like list comprehensions. And this isn't even Pypy, where this loop would be JIT'd and then run natively anyway, unless cpython does some kind of JIT? Is it just that the buffer is so large that the kernel handler takes so much of a % of the runtime that the inefficiencies of the language stop mattering?


Assuming that `yes` is primarily limited by memory performance and kernel calls, the language runtime should have little influence.


The recent commit that sped up GNU yes has a summary of the perf measurements

https://github.com/coreutils/coreutils/commit/3521722


If anyone, like me, is wondering what "yes" is used for: you can use it to pipe "y" to commands that require interactivity, so if you just want to say "y" to all the prompts, you can use "yes" to do this:

  yes | rm -r large_directory

  yes | fsck /dev/foo


On my quad-core MacBook, I've used "yes" in four different terminals as a way to quickly test a 100% CPU load.


Yes (so to speak), you can also do 'yes &' multiple times to create lots of background load (and then 'pkill yes' before anything melts...)


I know it's only an example, but why

   yes | rm -r <>
and not

    rm -rf <>


I believe "yes" has become less useful over the years, with flags like force and quiet solves this now.


I think rm -rf would also suppress errors in addition to confirmation messages, so presumably if you want a "soft" force...


No it doesn't.

  > rm -rf foo/bar
  rm: foo/bar: Permission denied


If foo/bar does not exist, it should be silent. Via http://pubs.opengroup.org/onlinepubs/007904975/utilities/rm.... on the -f option:

> [...] Do not write diagnostic messages or modify the exit status in the case of nonexistent operands. [...]


Yes, but that's the only error it suppresses, and that's because "not existing" isn't really an error for a tool that's trying to delete something. In my example foo/bar existed but I didn't have permissions to delete it, and as you'll see it printed out that error.


I sometimes do

   yes "" | cat -n
to generate sequence numbers.


Try `seq 1 n`


euske may be using one of the operating systems (one such being mentioned on this very page) where there is no seq utility.

    JdeBP ~ $seq
    ksh: seq: not found
    JdeBP ~ $
Of course, on the one mentioned there is jot.


And on bash you can simply use {1..n}:

    $ echo {1..5}
    1 2 3 4 5
If you have Bash 4.0 or above, you can include a step size:

    $ echo {1..10..2}
    1 3 5 7 9


"yes d | mail" is an excellent way to get rid of failed cron job mails.


Why wouldn't it be `mail | yes d` if you're trying to answer "d" to all of the input requests of mail?


Because you want all the d's to be sent to mail...


Back when I worked at the Genius Bar at Apple Stores I saw a customer come in and talk to a 'Genius' about their MacBook being "slow". After a quick bit of troubleshooting, he just opened up 4 terminal windows and ran yes in all of them, and did some hand-wavy explanation about diagnostics.


I'm curious why he did it. To impress people who work at the apple store?

OK, maybe I misunderstood: who ran yes, the customer or the genius?


I just read this on Wikipedia (https://en.wikipedia.org/wiki/Yes_(Unix)#Uses):

> In 2006, the yes command received publicity for being a means to test whether or not a user's MacBook is affected by the Intermittent Shutdown Syndrome. By running the yes command twice via Terminal under Mac OS X, users were able to max out their computer's CPU, and thus see if the failure was heat related.


Ahh interesting, maybe he did have a legitimate reason for it.

I always assumed it was busywork to make the customer feel better.


Now if I could just prove to Apple that my computer randomly shuts down all the time... problem is, it doesn't appear to be heat related.


Mine just freezes, the last 0.3 second of sound is just repeated until it reboots by itself.


For reference, I had that exact same issue on my MBA - went on for years. Many times while watching youtube or doing something Garageband related.

At the time I had 10.11 on it, but that OS had itself been upgraded from 10.10.

When 10.12 came out, I decided to install fresh - so I backed everything up and installed onto a new SSD drive. The issue has now gone away completely....


Funny, I've seen that multiple times on my homebuilt Fedora system.


You know the system keeps logs, right?


Open console.app and look at the timestamp of log files when the crash occurred.


Hm, my old Dellbuntu laptop does the same. It's been dropped a few times in its 10 years of service, and now sometimes will shut down if I tap it too brusquely, regardless of the temperature. The BIOS will report it overheated, and I don't know what's going on.



> the limit isn't the processor, it's how fast memory is. With DDR3-1600, it should be 11.97 GiB/s (12.8 GB/s)

I don't understand this reasoning. Why is it being limited to main memory speed? Surely the yes program, the fragments of the OS being used, and the program reading the data, all fit within the L2 cache?


It is NOT limited to the external RAM speed and the best proof is that it actually uses over 40GB/s of memory bandwidth.

For each byte of data passing through pv:

1. the byte is read from yes memory to CPU registers by the write syscall handler in the kernel

2. the byte is written to kernel's internal buffer associated with the pipe

3. the byte is read back in the read syscall called by pv

4. the byte is written to a buffer in pv memory

5. and that's the end, because the write syscall executed by pv on /dev/null very likely doesn't bother reading the submitted buffer at all

edit: Actually it might only be 20GB/s because on Linux pv seems to use the splice syscall to transfer data from stdin to stdout without copying to userspace.

This is also the reason why further "optimization" of this program in assembly was a fool's errand: the bulk of CPU load is in the kernel.


All good valid points, but I'm still a bit surprised that the limit is not higher, I thought that the L2 cache was over an order of magnitude faster than main memory (plus, as someone pointed out in the reddit thread, the peak memory performance should really be double the quoted 12GB/s due to dual channel memory).

The actual throughput then, once you include OS copying, is either 2 or 4 times the quoted speed (depending on splice usage), so we're either at main memory theoretical speeds, or double main memory speeds. Intuitively, I'd still have expected that it should be a larger multiple.

(A quick search didn't turn up any reliable Intel L1/L2 cache speeds/multiples to quote, so I admit this comment is more speculation than it should be!)


L2 cache and copying "y" bytes have very little to do with this; I suspect if you could produce high-granularity timings it would almost all be in the syscall overhead.

See eg. https://stackoverflow.com/questions/23599074/system-calls-ov... who benchmarked it at ~638ns per "read" call.

(Many, many years ago I was working on the Zeus web server, and we went to surprising lengths to avoid syscalls for performance.)


Yes they do.

A read() syscall takes longer than a getpid() syscall because read() has more work to do, it actually does a data copy of len bytes, which takes some time (and will be faster/slower if data is cache hot)

What we call the "syscall overhead" is what happens before and after the actual data copy, switching between user and kernel mode.

You make that overhead negligible by calling read() with a large size.


> (Many, many years ago I was working on the Zeus web server, and we went to surprising lengths to avoid syscalls for performance.)

Snap!

IIRC, ZWS used shared memory to mirror the results of the time() syscall across processes, to save a few nanoseconds on some operating systems :) That was before Linux and other OSs used techniques like the vsyscall/VDSO mentioned in the stackoverflow discussion...


The yes and pv processes are not scheduled on the same CPU core, so they have different L2 caches.


I wonder if `taskset -c1 yes | taskset -c1 pv > /dev/null` would significantly change the throughput.


    $ yes |pv > /dev/null
    46.6GiB 0:00:05 [9.33GiB/s]

    $ taskset 1 yes |taskset 1 pv > /dev/null
    32.9GiB 0:00:05 [6.58GiB/s]

    $ taskset 1 yes |taskset 2 pv > /dev/null
    45.7GiB 0:00:05 [9.13GiB/s]

    $ taskset 1 yes |taskset 4 pv > /dev/null
    45.7GiB 0:00:05 [9.18GiB/s]
Very rough numbers - the 9.13/9.33 difference flip-flopped when I ran the commands again. Binding both processes to the same core is definitely a performance hit though. There might be some gain through a shared cache, but it's lost more through lack of parallelism.

I tried 2/4 as I'm not sure how 'real' cores vs 'hyperthread' cores are numbered. These numbers are from an i7-7700k.


How do you know that the dataset fits in L2?

Assuming pv uses splice(), there is only one copy in the workload: copy_from_user() from a fixed source buffer to some kernel-allocated page, then those pages are spliced to /dev/null.

If the pages are not "recycled" (through an LRU scheme for allocation), the destination changes every time and the L2 cache is constantly thrashed.


I only learned of pv from this article so I can't speak much about its buffering. I would guess that the kernel would try to re-use recent freed pages to minimise cache thrashing. But anyway, on the 'yes' side, the program isn't re-allocating its 8kb buffer after every write(), so there's a lot of data being re-read from the same memory location.


As another point in your favor, one of the commenters reached 123 GB/s modifying both yes and pv.

https://www.reddit.com/r/unix/comments/6gxduc/how_is_gnu_yes...


In general, only the CPU itself sees the L2 cache. Anything you see on another device (screen, disk, NIC etc) has been flushed out of cache.


Sure, but this is a pipe between two tiny processes, hopefully with very little else being run on the computer at the time (otherwise all bets are off for any benchmarking). There's no kind of 'real' I/O going on (in terms of stuff like screen, disk, NIC, and so on)

There's no reason that the L2 cache needed to be flushed at any point - the caches are all dealing with physical memory, rather than virtualised address space, so the fact that there are two processes here shouldn't stop the caching from working.


I am probably way behind the current state of the CPU, judging by the downvotes I got, so if you are saying there is no reason and the data can be written into a device without leaving the CPU, I will just concede my ignorance.


Don't fret about the downvotes, these magic internet points aren't redeemable anywhere :)

It's completely possible for the data to not (all) leave the CPU. If the caches are large enough, then the pages full of "y\n" will be still resident in the cache when the next iteration of the program overwrites the same pages again. Then the CPU has no need to send the original page out to main memory.


If you did for(;;) buffer[(i++)%size] = 'y'; then you'd be correct. However, you do i/o. And the 'y's appear at the i/o driver and have to be made visible to the device it's driving, which can be anything, including a process on another CPU core in another socket. If they remained in the issuing CPU's cache, I fail to see how the destination device could possibly see them. There are some devices which can snoop the cache (like SoC GPUs and other CPU sockets on some archs) but the snooping is much slower than the memory bus. Writing to memory is the only way which a) guarantees the data is available elsewhere and b) is the fastest.


You could make "yes" faster with the tee() syscall. Keep duplicating data from the same fdin (doesn't actually copy) and it becomes entirely zero-copy.


Actually yes should use vmsplice().

And pv on the other side of the pipe should use splice().

Now that would be a complete zero-copy I/O path, purely limited by the CPU, not by memory bandwidth. It would benchmark at hundreds of GB/s :)
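
A minimal sketch of the vmsplice() side (my illustration; it only works when stdout is actually a pipe):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <string.h>
    #include <sys/uio.h>
    #include <unistd.h>

    int main(void)
    {
        static char buf[65536];

        /* fill one big buffer with "y\n" */
        for (size_t i = 0; i < sizeof buf; i += 2)
            memcpy(buf + i, "y\n", 2);

        struct iovec iov = { .iov_base = buf, .iov_len = sizeof buf };

        /* map the same user pages into the pipe over and over,
           instead of copying them into a kernel buffer */
        for (;;)
            if (vmsplice(STDOUT_FILENO, &iov, 1, 0) < 0)
                return 1;
    }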


Someone posted an implementation which uses vmsplice on the reddit thread:

https://www.reddit.com/r/unix/comments/6gxduc/how_is_gnu_yes...


His code can further be optimized by having only 1 large element in the iovec (vs. many 2-byte "y\n" elements).


From what I can see it's already in a large iovec buffer.


Correct, I misread the code. However, bumping the pipe buffer size bumps the speed from 28 to 74 GB/s on my Skylake CPU (i5-6500).

Edit: I optimized further and now get 123 GB/s. See https://www.reddit.com/r/unix/comments/6gxduc/how_is_gnu_yes...


The next step is playing tricks with memory mapping so that the buffer is large in virtual memory but only takes a single page in physical memory, fitting in L1 (at least on the read side).


> And pv on the other side of the pipe should use splice().

Good catch, I just noticed that it does.


On the other hand the current code is perfectly portable; tee(2) is a Linux syscall.


It still gets piped and hits the same performance.


It's 2x faster in a quick test here. Only copying on the read side?


Out of interest: Could you please post the code?


https://pastebin.com/jrcJbjU4

I've realised straight tee() is actually wrong - it works fine for piping to something but "zcyes > /dev/null" won't work, it needs vmsplice() like mrb said.

  core2duo e8400, 4.8 kernel, coreutils 8.25
  > yes | dd bs=1M count=10000 of=/dev/null iflag=fullblock
  10485760000 bytes (10 GB, 9.8 GiB) copied, 4.1901 s, 2.5 GB/s
  > ./zcyes | dd bs=1M count=10000 of=/dev/null iflag=fullblock
  10485760000 bytes (10 GB, 9.8 GiB) copied, 1.8542 s, 5.7 GB/s


FreeBSD's yes has just been updated because of this.

https://github.com/freebsd/freebsd/commit/1d61762ca37c20ab6f...

It's about twice as fast as GNU yes now on my FreeBSD system here.


Looks like that may drop some data if you get a short write, possible when writing to pipes etc.

Update: They fixed that issue with this follow up https://github.com/freebsd/freebsd/commit/2592fbb8


Let's not forget the most crossplatformest, purest `yes` of them all: https://www.npmjs.com/package/yes

    # /usr/local/bin/yes | pv > /dev/null
    11.5MiB 0:00:09 [1.02MiB/s] [                             <=>]
    
    # /usr/bin/yes | pv > /dev/null
    1.07GiB 0:00:09 [ 142MiB/s] [                             <=>]
JavaScript wins again!!


It's come to my attention that lower numbers are not better here. I have filed a bug, we'll get to the bottom of this shortly. I want to apologize to all our users; this issue does not reflect the values and principles we at Pure JavaScript `yes` hold dear https://github.com/Sequoia/yes/issues/3


I was not going to post this because hacker news has this ethic (?) of downvoting anything that's seen as not positive. Perhaps we should have a discussion about that; I'm not sure it's a good thing, but I'm not in charge here.

The top comment is:

"It's a shame they didn't finish their kernel, but at least they got yes working at 10GiB/s."

which, as an OS guy, someone who has been working on Unix for 30+ years, as a guy who was friends with one of the QNX kernel guys (they had perhaps the only widely used microkernel that actually delivered), I find hugely amusing and spot on. The GNU guys never really stepped up to being kernel people. Bitch at me all you want, they didn't get there. It's a funny comment, especially coming from reddit.


> hacker news has this ethic (?) of down voting anything that seen as not positive

We must not be reading the same Hacker News...

Anyway, the comment you're quoting is just a shallow jab that belittles the GNU developers' work without contributing anything new or meaningful. It's telling that you had to spend two paragraphs to justify cross-posting it here.


You say that and then immediately become an example of what hes talking about. This "shallow jab that contributes nothing new or meaningful" is, in some circles, known as a "joke." I'm continually frustrated by people who think that misinterpreting comments as harmful is a useful activity.


Not really -- I didn't downvote that comment. And I still dispute the notion that negativity is rare or always shunned on this board: to the contrary, it's so commonplace that an actual rule [0] had to be added to try to sway things in the other direction.

Jokes have their place, but bringing up the failure of Hurd in every GNU-related post is banal. And saying they "never really stepped up" to your level as a mighty kernel developer, as if the people who brought us glibc and coreutils lack an understanding of OS internals, just seems rude and curiously out of touch.

[0] https://news.ycombinator.com/item?id=9317916


So negativity here is not rare; when people don't like something they are fast to jump on it.

What I was trying to get at is this, if you care about your upvotes, hacker news promotes a sort of hive mind. Which is somewhat like "say only nice things unless you are clearly swatting down something that is obviously wrong".

Which is mostly fine, fantastic in fact. I'm fine with it; hacker news is really pleasant because of the (more or less) quality of the posts and especially the quality of the comments. I'd much rather have it be this way than a free-for-all; those go bad pretty fast.

So I'm for the hive mind, I was just pointing out that you can't make jokes and be upvoted. The joke wasn't banal at all IMO. GNU has done a lot of good, I've been there since the beginning and paid attention along the way. They have also been pretty self serving with their choices, every project has to sign away their rights and then GNU takes full credit for the project even though they had nothing to do with it other than it being GNU troff for example. Given their tendency to take credit for stuff that they didn't do, and their claim that they can do an OS but clearly can't, that joke is funny as heck. If you don't get that, sorry, you haven't been paying attention.

() The quality of the posts in the area of programming, especially systems programming, is spotty. Some stuff is great, stuff I didn't know (there is a lot of that here and I'm very grateful for it, it's why I stick around), some stuff is meh, and then there is stuff like "wow, look at $OBVIOUS, isn't that cool?" that gets upvoted. That last one I just don't get, but whatever, the good stuff is good. The signal/noise ratio here is better than any other programmer oriented site I've found.


Years ago I read about a similar experiment on max CPU data flow. A guy was testing how much data his CPU could pass in a second. He was writing it in C, using some Linux optimizations, optimizing code for CPU caches, using some magical C vectors that are optimized for such a purpose. He got some help from someone working at Google. I tried to find that post but never succeeded. Does anyone here know it?


`yes` (with the backticks) is my favorite "bring the system to its knees right now" shell command.


Does not do that on modern Linux, especially with the -ck patch.


Good to know. I think the last time I tried it was on a rhel5 or rhel6 variant.

However, as avip found out below, it does still render OS X useless within less than a minute (at least on my 2015 MBP).


How? Only eats 2GiB and crashes bash with

    bash: xrealloc: cannot allocate 18446744071562067968 bytes


What does `yes` try to do?


This will attempt to open a child shell process with whatever the output of "yes" is, interpreted as a shell script.

But before that, the parent shell has to buffer until EOF. With "yes" output being unbounded, this means unbounded buffering / memory growth. That is, until the OOM killer takes notice and shuts it down.

The whole thing will probably take a few seconds to a minute (depending on how much free RAM there is vs. how fast it is), peg a single CPU core in the process, and will recover cleanly (except for the shell in question being terminated).

...unless the system in question has a single CPU and large+slow swap. Then yes, bring-it-to-its-knees.
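
To make that concrete, here is a rough C sketch of what the backtick substitution amounts to (my illustration; the buffer growth mirrors the bash xrealloc failure mentioned above):

    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    /* A sketch of command substitution: read the child's entire output
       into one growing buffer, stopping only at EOF. `yes` never reaches
       EOF, so the buffer grows until the allocator gives up. */
    int main(void)
    {
        int fds[2];
        pipe(fds);
        if (fork() == 0) {
            dup2(fds[1], STDOUT_FILENO);   /* child: stdout into the pipe */
            close(fds[0]);
            close(fds[1]);
            execlp("yes", "yes", (char *)NULL);
            _exit(127);
        }
        close(fds[1]);

        size_t cap = 8192, len = 0;
        char *out = malloc(cap);
        char chunk[8192];
        ssize_t n;
        while ((n = read(fds[0], chunk, sizeof chunk)) > 0) {
            if (len + n > cap) {           /* grow, like bash's xrealloc */
                cap *= 2;
                out = realloc(out, cap);
                if (!out)
                    return 1;              /* "cannot allocate ... bytes" */
            }
            memcpy(out + len, chunk, n);
            len += n;
        }
        /* never reached with `yes`: the shell would now run `out` */
        return 0;
    }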


A single CPU is fine nowadays. It will be slowish if you run it with no nice or priority.

RAM can be tweaked by preventing Linux heuristic overcommit via sysctl vm.overcommit_memory=2. (0 is only recommended if you do not run broken applications. It so happens that many JS and Java VMs are broken on memory pressure.)


It may also push everything in memory to swap in the process, which is the real speed killer.


Make sure to have a sane memory limit set in PAM and switch vm.swappiness to a value lower than 60.


I'm guessing it tries to read the output and return it. Since yes doesn't actually terminate, it's going to generate a massive temporary variable to store its continuous output.


Well thanks for killing my mac. People are trying to work here you know.


You took a command from a comment which said they use it to "bring their system to its knees", ran it, and then complained?


I thought that's what internet is for.


And the question is, do we need yes to be so optimized?

Not complaining, I like this kind of analysis.

But it seems you won't be limited, in a shell script, by the speed at which you can push y's.


That kind of question doesn't make sense for open source code.

Somebody wanted to optimize 'yes', so they did. There doesn't need to be a good reason, just like there doesn't need to be a good reason for a person to read a certain book or watch a certain movie, other than they want to do it.


The question was "do we need it? (for a specific use case)" rather than "should we do it?".

And it does make sense for open source code, because resources are limited; hence other features and/or bug fixes might be more important than pushing data at full speed.


> And it does make sense for open source code, because resources are limited; hence other features and/or bug fixes might be more important than pushing data at full speed

That's not how open source works, though. There's not a group of people obligated to work on the GNU utilities. There's not a central project manager ordering people around telling them what changes to make.

Declaring that 'yes' is generally fast enough already doesn't imply that it was fast enough for the person who spent their time optimizing it. Somebody needed (or just wanted) 'yes' to be really fast, so they did the work and submitted the changes back.


> There's not a central project manager ordering people around telling them what changes to make.

No, but there is review and approval of patches (also the most frequent contributors know where the project is going and there is bug tracking)

If one just makes it faster without major benefits, that patch is likely to be rejected.


> If one just makes it faster without major benefits, that patch is likely to be rejected

IF the project were overwhelmed with incoming patches, then I would agree that there's probably higher priority changes than optimizing 'yes'.

But I'm almost certain the GNU project is not being overwhelmed with changes to the core utilities. Everything else being equal (code quality, readability, test coverage, etc.) there's really no reason to reject a patch that demonstrably improves the project, even in some silly way like making 'yes' really fast.


It makes sense for any kind of code.

This is software engineering. Thinking about the tradeoffs and whether they are appropriate or warranted is bread and butter. It very much makes sense to ask whether all of this is merely optimizing for a benchmark at the expense of other factors that might turn out to be more important, such as long term maintainability, portability, behaviour in some common cases which are significantly unlike the benchmark, and so forth.

Several people have made some of these points here and in other discussions; and not only do they make sense, they are an important part of the discipline.

Is the optimized implementation documented well enough that other people coming to it anew can understand it in another 10 years' time? How many of the magic constants have to be tweaked as the operating system evolves? Is it clear to maintenance programmers how they need to be tweaked?

Does optimizing for the benchmark skew the implementation too far away from one common case, where one only wants a few "y"s (because, say, the program being told "y" only asks 17 questions), resulting in every invocation of the yes program rapidly generating KiBs or even MiBs of output that lives in pipe buffers in kernel space, unused and only to be thrown away? Does it make more sense to put buffer size optimizations in the C library, where all can benefit?

What is the benefit of optimizing for Linux-only system calls on GNU Hurd? Will we optimize yes for GNU Hurd, too, in the same codebase? How much conditional compilation will we end up with? If the C library is improved to do better in the future, is it a problem that the optimized-for-the-benchmark program no longer automatically gains from it? How much of a problem is it that a GNU program is now not portable to systems other than Linux, and is locked into one operating system?

And what about other benchmarks? Many have noted that this benchmark pretty much relies on the fact that the output of the yes program is simply thrown away, and no real work is being done by anything else on the system. What about a benchmark where it is?

What about a benchmark where it is important that yes issues a system call per line, yielding the CPU to other processes so that they can process that line before yes bothers to print the next? What about a benchmark that measures kernel memory usage and treats lower as better? What about benchmarks that measure how small the yes program is; not just in terms of code but in terms of memory, I/O, and CPU usage; because the memory, I/O, and CPU should actually be given to more important programs than yes on the system that are actually the system's main function? What about low impact?

Yes, it's fun to focus narrowly and optimize yes so that it generates as much output as it can for one specific toy usage. But in engineering one has to ask and think about so much more.


This can be extended to any input given to the program, since yes is defined to take an argument from the command line to print out instead of "y". See https://www.reddit.com/r/unix/comments/6gxduc/how_is_gnu_yes...


A detail you seem to be missing is that you are not limited to a shell script. The shell sets up the pipeline, but the members of the pipeline can be written in arbitrary languages, with each stdout linked to the next process's stdin.

As a result you can process very large volumes of data and consume (not waste, consume) significant system resources to perform your processing.


I would just pre-allocate a static array of "y\n" of size BUFSIZ, write it out in a loop, and call it a day, skipping the whole malloc-and-fill-loop business.

Make the static array BUFSIZ * 1024 to trim the syscalls by a factor of roughly a thousand.
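
For what it's worth, a minimal sketch of that idea (hardcoding "y\n", so it ignores the argument handling the real tool needs). The repeated string can even be baked in at compile time via string-literal concatenation, skipping the fill loop as well:

    #include <unistd.h>

    /* Build a 64 KiB "y\n..." buffer at compile time: 16 B -> 256 B -> 4 KiB -> 64 KiB */
    #define Y16  "y\ny\ny\ny\ny\ny\ny\ny\n"
    #define Y256 Y16 Y16 Y16 Y16 Y16 Y16 Y16 Y16 Y16 Y16 Y16 Y16 Y16 Y16 Y16 Y16
    #define Y4K  Y256 Y256 Y256 Y256 Y256 Y256 Y256 Y256 Y256 Y256 Y256 Y256 Y256 Y256 Y256 Y256
    #define Y64K Y4K Y4K Y4K Y4K Y4K Y4K Y4K Y4K Y4K Y4K Y4K Y4K Y4K Y4K Y4K Y4K

    static const char buf[] = Y64K;

    int main(void)
    {
        /* sizeof buf - 1 drops the string literal's trailing NUL */
        while (write(STDOUT_FILENO, buf, sizeof buf - 1) > 0)
            ;
        return 1; /* write failed, e.g. EPIPE with SIGPIPE ignored */
    }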


Real yes accepts a string to print (so you can have it spit out a full "yes" or "Y" rather than a hardcoded "y").


Have a few pre-written arrays for the common cases: "y", "Y", "n", "N", etc. Those are the fast cases (or the benchmark optimized cases, like, what Volkswagen did). Have another pre-allocated static array to fill in with other input.


> (or the benchmark optimized cases, like, what Volkswagen did)

This is amazing. Maybe we can turn this into a verb? "I Volkswagened the common cases with precalced buffers".


"To volkswagen" as in "optimize for a benchmark"? I'm so going to use this as soon as possible.

Incidentally, that word works even better in German, where every verb infinitive ends in "-en".


I am pretty sure software cheating on benchmarks precedes VW. Samsung a few years ago? Probably not the first either.


I think video card manufacturers were the first to cheat on benchmarks.


'Never trust a benchmark you didn't Volkswagen yourself'


That's how you get sued. Especially because now Ford and GM are also in court, for having done the same as VW.


Let me check that I have correctly understood the scenario you're anticipating.

1. Someone uses "volkswagen" as a verb meaning "cheat in benchmarks".

2. Volkswagen takes them to court, on the basis that it is slanderous or libellous to associate Volkswagen with cheating in benchmarks.

3. Counsel for the defence reminds the court that Volkswagen were in the news for a protracted period over a benchmark-rigging scandal that saw them hit with a multi-billion-dollar fine, slashed a third off their stock price, saw their CEO resign, led to there being a "Volkswagen emissions scandal" page on Wikipedia, etc., etc., etc.

4. [THIS IS THE BIT I'D LIKE YOU TO FILL IN FOR ME.]

5. Volkswagen wins the case.

How exactly does step 4 go?


By Volkswagen arguing that this is industry-standard behaviour, and that the entire media campaign is already libel and slander.

And they'd have a pretty good case with that, considering the original report this whole media campaign was based on called out six carmakers, but the media (and the poster) only call out Volkswagen.


In the same vein as Adobe suing me for photoshopping a picture, or Alphabet suing me for googling my ex-girlfriends' names?


No, but you don’t wanna get sued for libel and slander. If VW can realistically show that it is industry-standard behaviour (which is pretty obvious), then they might even have a chance to win.

Either way, it’d be expensive for you.


I'll call you when they sue.


It's been said, "Truth is the best defense against libel". Reasonably held opinion would then be the second.

Realistically, not mentioning anyone or anything is the only complete defense. The truth still comes close.


I do not think Volkswagen will try to sue you for adding a new phrase to urban dictionary ;-)


That's exactly what the OP did.


No, OP mallocs an 8k buffer then fills it as part of main.


See the "fifth iteration" done in assembly.


Why is it so slow (compared to the post) on the MacBook Air? Native yes runs at 26 MiB/s, and GNU yes at 620 MiB/s.


It isn't! yes runs at 7.2 GiB/s on my MacBook Air. Though I have Linux installed on it instead of macOS :)


Are you trolling? The commenter obviously meant the macOS version of "yes".


The VM subsystem is much slower. Also possibly a not-up-to-date GNU yes: on my (ancient) MBP I get 26 MB/s "native", 780 MB/s GNU, and 2.8 GB/s for TFA's C program (2.9 if I replace the malloc with a stack-allocated array, weirdly enough).


Probably alignment issues: stuff on the stack might be aligned by default, in contrast to malloc'd memory.


The CPU and RAM are slower and there is much more kernel overhead. Not to mention the pipe buffer size is small on OS X.
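
(On Linux, at least, the pipe buffer size is something you can inspect, and even grow, via fcntl. A quick check, assuming a Linux system:)

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fds[2];
        if (pipe(fds) != 0)
            return 1;
        /* F_GETPIPE_SZ is Linux-specific (2.6.35+); the default is usually 65536 */
        printf("pipe buffer: %d bytes\n", fcntl(fds[0], F_GETPIPE_SZ));
        return 0;
    }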


Come on people, I ran GNU yes just now; it isn't as slow as OP says, but it is at a quarter of the expected performance.


With that malloc overhead, I expect GNU yes to be slower when only a few bytes are read from it.

So, what's the distribution of #bytes read for runs of 'yes'? If we know that, is GNU 'yes' really faster than the simpler BSD versions?

Also, assuming this exercise still is somewhat worthwhile, could startup time be decreased by creating a static buffer with a few thousand copies of "y\n"? What effect does that have on the size of the binary? I suspect it wouldn't go up much, given that you can lose dynamic linking information (that may mean having to make a direct syscall, too).


Unless you are statically linking, one malloc doesn't significantly affect your startup time. When only a few y's are read, time is going to be dominated by ld.so, by a large margin.


> I expect GNU yes to be slower when only a few bytes are read from it

Wouldn't that be entirely negligible compared to starting the program in the first place?


Measurements are really noisy, but I seem to get significantly better numbers than that when I use splice() on a pre-generated few pages of file data instead.


Yes, splice can bypass the pipe buffer in some cases.


I thought this was a fascinating read but it left a serious question lingering in my mind, which is a little out-of-scope for the article, but I hope someone here can address.

Why did the GNU developers go to such lengths to optimize the yes program? It's a tiny, simple shell utility that is mostly used for allowing developers to lazily "y" their way through confirm prompts thrown out by other shell scripts.

Is this a case of optimization "horniness" (for lack of a better word) taken to its most absurd extreme, or is there some use case where making the yes program very fast is actually important?


The stated use case for the perf improvement was "yes(1) may be used to generate repeating patterns of text for test inputs etc., so adjust to be more efficient."

Source: https://github.com/coreutils/coreutils/commit/35217221c211f3...

I've personally used it for generating repeating text and filling disks in system testing, so I appreciate it being faster at those tasks. I also sometimes use it as a signal generator for a hacky load generator, like so:

  yes | xargs -L 1 -P NUM_PROCESSES -I {} curl SOME_TARGET_URL > /dev/null

This doesn't benefit from being faster per se, but I appreciate it using less CPU, since I want to give curl as much of the system's resources as possible.


But doesn't this make the typical use case (just a few "yes"s needed) slower, since first it has to fill a buffer?

I would write() the buffer each time it gets enlarged, in order to improve startup speed.

Also: The reddit program has a bug if the size of the buffer is not a multiple of the input text size.

And it grows the buffer one copy at a time, instead of copying the buffer onto itself to double it, which would reduce the number of loop iterations needed (at the cost of slightly more complicated math).
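
Combining both suggestions, a rough sketch (the 8 KiB size and hardcoded "y\n" are placeholders, not what coreutils does): write out the partially filled buffer as it doubles, so the first line appears immediately while throughput ramps up to full-buffer writes.

    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        static char buf[8192];
        const char *s = "y\n";   /* stand-in for the argument string */
        size_t len = strlen(s);
        size_t filled = len;

        memcpy(buf, s, len);
        /* Double the filled region by copying the buffer onto itself,
           writing as we go: 1 copy, then 2, 4, 8... so time-to-first-byte
           stays tiny and filled stays a multiple of the string length. */
        while (filled * 2 <= sizeof buf) {
            if (write(STDOUT_FILENO, buf, filled) < 0)
                return 1;
            memcpy(buf + filled, buf, filled);
            filled *= 2;
        }
        while (write(STDOUT_FILENO, buf, filled) > 0)
            ;
        return 1;
    }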


> But doesn't this make the typical use case (just a few "yes"s needed) slower, since first it has to fill a buffer?

If only a few yeses are needed, then the slowdown to produce them will be inconsequential, whether they still fill a buffer in this case or not.


If your only overhead is filling an 8K buffer, I don't think your user is going to care. Taking one microsecond instead of one nanosecond doesn't matter all that much when you're going to lose way more than that in pipes, the kernel, the program you're piping it to, etc.


But what's the use case for a large volume of continuous output? It feels like we're optimizing for the wrong use case


Maybe filling a disk or flooding a network connection as in

    yes | ssh server "cat > /dev/null" 

But yes takes arguments, so there might be more use cases:

    $ man yes
    NAME
           yes - output a string repeatedly until killed
    
    SYNOPSIS
           yes [STRING]...
           yes OPTION
    
    DESCRIPTION
           Repeatedly output a line with all specified STRING(s), or 'y'.


> But doesn't this make the typical use case (just a few "yes"s needed) slower, since first it has to fill a buffer?

One could put a write in the memcpy loop, so that it first writes one copy of the string, then two, then 4, 8 (etc.), meaning time to first byte is short, but it quickly ramps up to the eventual asymptotic speed with a full buffer.

( https://www.reddit.com/r/rust/comments/4wde08/optimising_yes... )


You could also have a static buffer full of yesses; it would increase the binary size by 8k, but GNU yes is already 40k on my system, so...


That is mentioned elsewhere in this very thread, where it is also pointed out that that isn't a general solution (`yes` can print more than just y).


> The reddit program has a bug if the size of the buffer is not a multiple of the input text size.

The author appears to be aware of that:

> Even with the function overheads and additional bounds checks of GNU's yes


GNU yes is fast because it is coded with the assumption that it's not answering any real question, such as "can I combine this free code with a proprietary program?" or "Would you accept the following monstrous patch to GNU Coreutils /bin/true without a copyright assignment?"


I think we can do better.

How about a /proc/bin/yes for this? Like most /proc files, it would appear to be empty. Executing it would involve a fs/binfmt_proc.c file in the kernel source, which would be a handler for this sort of executable. That would get the job done entirely in the kernel.


Man, I just spent like 8 minutes today writing a Python script to use up all the disk space on some servers (part of ops readiness testing) when I could have just used this trick.

`yes` will help me on the "see what happens when something uses all the CPU and memory" test case. Thanks Reddit/HN!


dd can also be useful for that kind of thing, since you can use a source like /dev/urandom to generate random bytes if you're trying to defeat compression, and the adjustable block sizes can be tuned for the underlying storage system.


I did actually use dd to begin with, but it didn't work as I wanted it to right away (can't remember why, probably something SUSE-related), and by then I'd used up 6 of the 15 minutes I gave myself for the job.

I also had to do the same test on Windows, so Python won.


Did you notice PHP outperforms the other scripting languages? Some report that it even beats the GNU yes implementation.

After reading so much unfair criticism and pedantic dislike of PHP here[1][2][3][4][5][6], I just want to say: STFU.

[1] https://news.ycombinator.com/item?id=12706136

[2] https://news.ycombinator.com/item?id=3825227

[3] https://news.ycombinator.com/item?id=3824881

[4] https://news.ycombinator.com/item?id=1823022

[5] https://news.ycombinator.com/item?id=1819517

[6] https://news.ycombinator.com/item?id=1819413

... Just to name a few.


It's almost funny that you try to refute, or rather, dismiss, criticism of PHP with "but it's fast", when none of what you cite even mentions that. I just want to say: STFU.


I don't dismiss criticism. I dismiss unfair criticism. What I cite is full of that. And, btw, [6] implies PHP is slow; read on:

[quote]

It's hard to find a worse example than php, because when many "bad" languages fail in one or the other category, php seems to fail in most categories.

* Java has a really repetitive syntax, but it's libraries might make up for that.

* C++ with all it's features has bazillion ways to shoot yourself, but it's arguably fast and the syntax is bearable. ...

[/quote]


It doesn't in fact imply that PHP is slow. What it says about PHP (besides "php seems to fail in most categories" which doesn't imply any single specific failure because most != all) is this: "But php has both an ugly syntax, horrendous stdlib, and fame of security issues." You will notice that none of those specific complaints is about speed.


None of those comments are complaining about PHP being slow at repeatedly writing a string whose length is a multiple of the page size to standard output. That is a pretty niche case, and hardly refutes even the complaint that PHP is slow, much less all the complaints about PHP that aren't related to its speed.

FWIW, when I run the tests there's no substantial difference between Perl, Lua, or PHP.


PHP here is just calling C functions. The actual non-benchmark PHP code people pay for is completely orthogonal to these results.


> Did you notice PHP outperforms the other scripting languages?

I am not sure when I last had to choose a language on a single dimension. If we are talking about performance, scripting languages rarely shine. On top of that, performance-critical things in scripting languages are often implemented in a compiled language (like Pandas, for example).


> If we are talking about performance, scripting languages rarely shine

Except that, in the yes example, PHP shines in performance.

Of course you don't choose a language based on a single dimension, but speed is an important factor; otherwise benchmarks would not exist.



while(1){write();} is about as far away as it gets from actual application code. This topic may be fun, but it remains an utterly pointless micro-benchmark.


I ran the command `yes | pv > /dev/null` on my MacBook Pro and it's only 37 MiB/s; is this normal? I am not familiar with the command.


I am getting 26 MiB/s using native yes, and 620 MiB/s using GNU yes, on a MacBook Air.


From the OP:

> OS X just uses an old NetBSD version similar to OpenBSD's

> NetBSD's is 139MiB/s, FreeBSD, OpenBSD, DragonFlyBSD have very similar code as NetBSD and are probably identical,

It really depends on how fast your memory and processor are. Everything there was benchmarked with an i7-4790 with DDR3-1600, and very very little running in the background.


A two-orders-of-magnitude difference can't easily be blamed on the CPU; in this case it depends on how optimized yes is. I have some older version of coreutils on one machine here, and performance is similarly abysmal.


Well, now I know what `yes` does (and pv).


Why is he using backticks to quote "yes" in the title?


It is common in markdown formatting to indicate a command line with backquotes. It tells the markdown compiler to apply special CSS and possibly syntax highlighting. So a lot of people who write about shell commands and setup tutorials, or use markdown on something like GitHub, will automatically identify it as a command.


clearly, we just need /dev/yes


Sounds like a reasonable feature request for systemd.


yes | write <USERNAME> "Don't you hate dialup connections?"


I think you mean yes "Don't you hate dialup connections?" | write username


The proprietary Oracle Solaris 11.2 yes really slowed down when they added DRM and Verified Boot support...



tl;dr: someone who doesn't understand how I/O works gets a small insight into how memory and a CPU work, and decides "Buffering is the secret" and "You can't out-optimize your hardware".

Can we have a new flag for posts by people who don't know what they're doing so I can skip them? I am serious.


[flagged]


We detached this subthread from https://news.ycombinator.com/item?id=14543757 and marked it off-topic.


Since you didn't add anything other than an unwarranted attack on the person you're replying to, this falls under “Avoid gratuitous negativity”:

https://news.ycombinator.com/newsguidelines.html


I think this subthread is a good example of what can happen if we quote the guidelines without context. It's a counter-intuitive trap I've also fallen into, so I wanted to mention it.

In this case, this subthread could've been nipped by mentioning that it's ok to be abrasive to an extent as long as it's followed by a substantive comment. It's not diplomatic, but it's not against the rules.

The guidelines are the framework the community is built around. They're not sufficient to explain why the community is built around them. (We probably wouldn't want the guidelines to be so complete, either. It's better to explain the motivations on a case-by-case basis.)


It's a valid question but looking at his posting history doesn't convince me that there's a way to word things which would result in a more productive response. Note that I lead with the observation that there wasn't anything to his post other than the attack — I don't think there's too much more to say about that before hitting diminishing returns.


It's a valid question but looking at his posting history doesn't convince me that there's a way to word things which would result in a more productive response.

You could be right. But if you were in their shoes, you'd want to be given the benefit of the doubt.


[flagged]


> Seriously, is HN fostering a culture where it's OK to spew pseudo-intellectual garbage as long as you're being nice?

Not at all. It's fine and encouraged to correct or contradict someone. It's not OK to be personally abusive when you do it.

The civility guideline has been in place since the very beginning of HN, and it's one of the biggest reasons why people keep coming back.

The commenter was just recounting their own recollection. There's no need to be combative. If they're wrong, just point it out politely. No big deal.


The commenter was not being combative or abusive in any way. They did point out the other person was wrong politely. They still got downvoted to hell.

People don't come to HN for civility. They come because they can be smug polite idiots around other smug polite idiots and "the rules" protect and enforce this behavior.

If you don't believe that disproving something stupid is hard, I give you Donald Trump. Sometimes you just need to speak out against stupidity without providing a groveling mathematical proof.


A cursory glance at that commenter's recent history shows little evidence of politeness or intention to be polite.

You don't have to like this community but surely you have better things to do than making it worse by trashing it like this?

Also, invoking US politics drama to score points in this discussion topic shows quite a lack of sense of proportion. We are talking about a single person's casually-communicated recollections of their past experiences with a computer operating system. It really isn't worth getting wound up about.


This is a great example of not reading charitably. You're reading a bunch of stuff that just isn't in the text you're replying to. Who said anything about big data? Who is trying to write journal publication quality comments? You seem to have an axe to grind that has nothing to do with this thread.

My reading was just that the parent commenter was telling a story by relating an interesting anecdote. That's exactly what I come here for. People won't do that as much if the common response is weird aggression like your comments in this thread. That would be a shame.

You could have just written a little comment saying, "in my experience, you're incorrect about the speed of reading from /dev/zero". The difference in your remembered experiences and your conversation about it would have been interesting to me. Instead you picked a fight, which is not interesting.


> in my experience, you're incorrect about the speed of reading from /dev/zero

> Instead you picked a fight

So I harshly pointed out that they're mistaken. Then it was pointed out to me that I wasn't being nice. So I responded by showing why I didn't feel the need to be nice and provided my reasoning for it.

Since it seems to be too much trouble, here is the post I'm getting the "imaginary" quotes you refer to from: https://news.ycombinator.com/item?id=14544002


You can point out how people are wrong and people won't have a problem with your posts. Be a jerk about it and people will. Hacker News, for its (many) faults, doesn't subscribe to "you can be a sneering asshole but if you're Right you're Right". It's very possible to be both right and a jerk. It's incumbent upon you to avoid being a jerk, or to not whine when people don't like jerks. This is basic-functioning-person stuff.


I don't know who is right or wrong. I am a software developer and familiar with Unix/Linux. I actively try to be well informed about other topics related to this. When engineering I actively try to remain emotionally dispassionate. I understand what the argument is about and I see how in a technical sense either side could be "correct" here.

Despite all that, the level of vitriol and defensiveness you have demonstrated makes me want to agree with everyone else. I am curious what the opinions are of people less informed on this topic and more emotionally in touch. You damage your side of the argument, not with your content, but your portrayal of the content.

Some of the things you said have merit, but they are ignored and derided because they are attached to you, and you are attached to vitriol, and vitriol rightly gets ignored and derided. The best thing you can do for anything resembling your side of the argument is to just back down. If you don't say anything for a day or two it will all just blow over, and you can try again later when people are more willing to listen. You can try approaching the conversation like an engineering problem and perhaps see that manners and politeness are to conversation as lubricant is to an engine (or perhaps high clock speeds to software). You can have a conversation without manners, run an engine without oil, or run software on low-clocked parts, but it won't turn out as well as it could have.


So it's not what I said, it's just how I said it.


I think "you don't know what you're talking about" is substantively different than "in my experience, you are wrong about this". Firstly, lots of people who do know what they're talking about are wrong about lots of things. Making it about the person instead of about the thing was (for me) the problem with your comment. Also, "in my experience" is not just polite fluff. I have come across diminishingly few things in computing that are not dependent on circumstances.

The interesting version of this conversation would have been you saying, "no that sounds wrong to me", followed by a discussion of (to the best of your memories) what kind of hardware, drivers, kernel and compiler versions, etc. you were each using. Maybe you would have convinced me that there is no combination of those things for which the original commenter's experience could possibly have been right. But your throwaway ad hominem comment didn't, and couldn't ever, accomplish that.


Mostly...

But notice how you still post and the downvote brigade still obliterates your score.

You can't say anything and get upvotes because you have angered so many with how you said some things.


> But notice how you still post and the downvote brigade still obliterates your score.

And even posts with actual data are being downvoted :)

This only goes to show just how subjective and emotional this community has become, where everyone just wants to share "but in my opinion" or "but in my experience" without having to back up any of their arguments, or even have any of their arguments or claims challenged and questioned and called out.

This essentially turns concrete, testable claims into happy dreaming about technology. And that's really what this has become: everyone sharing their idea of what something should be and presenting it as fact.

I guess I've made a very good point today!


It has literally nothing to do with people saying "in my experience" or even you choosing to challenge an argument. It has to do with you being a jerk about it. The people you interact with through your keyboard are real and how you treat them matters.

The only point you've made today is that self-righteousness exists and exhibits itself as rudeness. And you know it's rude; you're actively trying to downplay it by calling how you've chosen to approach other people as "harsh". You should use this awareness to be better, not to continue to be unkind. Because, ultimately, this is a you problem.


> And even posts with actual data are being downvoted

Of course they are; it's not your data that people don't like. They aren't downvoting the post, they are downvoting you. People don't like you, and anything you say will be downvoted at present. Remove yourself, or a mod might.

You have made no points today. Unless you are trolling, in which case you have angered many and that is your measure of success. I will not respond to you further.


> Of course they are; it's not your data that people don't like. They aren't downvoting the post, they are downvoting you.

Exactly my point.

> You have made no points today

Sure I did. One of them is right in your comment :)


You are a troll and deserve all the downvotes.


> Seriously, is HN fostering a culture where it's OK to spew pseudo-intellectual garbage as long as you're being nice?

The comment I called out had no technical content, argument, etc. and was only an attack on someone's credibility. That contributes nothing of value and in my book is worse than “pseudo-intellectual” because it's not even trying to contribute anything other than noise to the conversation.

You would not be getting such an intense negative reaction if your comment had been "Do you recall which operating system that was?" or "How long ago was that?". That's how an actual engineering conversation should go, especially in a field with such a long history of people uncovering unexpected performance issues caused by unintended consequences.

As a simple example: in the early 2000s, a colleague reported that tools like grep and wc were running slower than our storage could deliver data. That seemed odd, since those are usually considered fast and certainly hadn't had that problem years earlier, but a little testing confirmed that the switch to assuming Unicode by default had made a previously unoptimized code path the default; I'm sure the person who did that had assumed someone would have noticed if code which had been shipped for years was that slow. The fact that for many years before and after you could mentally put those Unix shell tools in the "fast" bucket didn't change the fact that there was a multi-year period where you had to check.

I have no reason to doubt that joosters' reported experience actually happened, especially given how rocky the 90s Unix world was. That was back when you couldn't even rely on a vendor's own C compiler not producing pathologically bad code. Someone having a /dev/zero implementation which triggered extra or slow kernel transitions, unnecessary access checks or audit logging, etc. would be disappointing but far from shocking[1].

More to the point, what do you hope to gain from such an unnecessarily combative style? You're not preventing any bad outcome, helping yourself or others improve, or otherwise doing anything other than lowering the level of discourse here.

1. Here's an example from 2012(!): https://groups.google.com/forum/#!topic/minix3/M3fjpvvOkuY


Oh dear. So sorry that my memory is crap enough to not provide you with the exact details you require.

...every comment is trying to look like a paper published in a highly acclaimed journal

It was an anecdote about my previous experience with 'yes' and /dev/zero, it seemed somewhat on-topic. I'm not trying to justify anything or win magic internet points.

If you're expecting journal-quality posts, perhaps you should go read a journal and not an internet forum chat.


[flagged]


You're free to spout your vitriol just like the people downvoting you into oblivion are free to use their downvote ability to express their dislike with your attitude. Free speech means you're free to express your ideas, it does not mean everyone else is forced to listen to them.

It's hard for me to view your comment as anything other than flamebait. You're calling out someone in an aggressive tone, "You don't even know what you're talking about! Bah-humbug!" and then crying crocodile tears that other people are "being mean" to you.


> You're free to spout your vitriol just like the people downvoting you into oblivion are free to use their downvote ability to express their dislike with your attitude. Free speech means you're free to express your ideas, it does not mean everyone else is forced to listen to them.

Did I disagree with that anywhere? Do you have a reference comment by me to point to?

> It's hard for me to view your comment as anything other than flamebait.

Ok.

> You're calling out someone in an aggressive tone, "You don't even know what you're talking about! Bah-humbug!"

I said I don't think they know what they're talking about. Without any of the shenanigans you portray here.

> and then crying crocodile tears that other people are "being mean" to you.

What? Where did I claim anyone is being mean to me? I made my argument to everyone who asked why I was being combative.

So again, do you have a reference comment by me to point to?


"obviously incorrect"

You admit that you don't have enough details to know exactly what they're referring to, yet you're confident enough to call their personal experience "obviously incorrect"?

It's not an argument, it's an explanation based on personal past experience. You can't find any evidence, and you don't need to, because the claim being made won't influence how you do things today anyway.

People's personal experiences can be interesting, and that adds to the conversation. Your overwrought criticism adds nothing. Kindly stop.


> You admit that you don't have enough details to know exactly what they're referring to, yet you're confident enough to call their personal experience "obviously incorrect"?

I didn't admit to not having enough details. You have two tools:

1. Reasoning about the abstractions and implementations of how yes and /dev/zero work: the claims become obviously incorrect, even their own tests later in the thread show that on a "modern system".

2. Actually running the tests they claim (which would prove them correct): no freaking way to do that. Even they do not offer any numbers to back their own claims because everything in the argument is a maybe or a remember or a might or a may have.

> People's personal experiences can be interesting, and that adds to the conversation.

People claiming their personal experiences are fact without proving anything is hugely problematic. Remember that woman claiming her child got autism due to vaccines? Remember that US president claiming a child got autism due to vaccines?

I remember seeing the earth as flat when I was working as an astronaut on the ISS. Go check my "personal experiences".

I hope you see what I'm trying to convey here.


I really don't see how it's "obviously incorrect" when yes only needs to make system calls for writing, whereas /dev/zero needs to make system calls for both reading and writing. Furthermore you're assuming that both are highly optimized, whereas it's entirely conceivable that /dev/zero is less optimized.

If what you're trying to convey is that abstract reasoning based on fundamental principles without any consideration of real-world complications is a superior approach to determining what's true, well, I reject that.
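
(To make the syscall arithmetic concrete, here is a sketch of the two inner loops being compared; the buffer size is arbitrary, and the real cat and yes are of course more elaborate than this:)

    #include <fcntl.h>
    #include <unistd.h>

    /* `cat /dev/zero`: two syscalls per block, and the kernel has to
       zero-fill buf on every read() */
    void zero_loop(void)
    {
        static char buf[65536];
        int fd = open("/dev/zero", O_RDONLY);
        ssize_t n;

        while ((n = read(fd, buf, sizeof buf)) > 0)
            if (write(STDOUT_FILENO, buf, (size_t)n) < 0)
                break;
    }

    /* `yes`: one syscall per block, and the buffer never changes */
    void yes_loop(const char *buf, size_t len)
    {
        while (write(STDOUT_FILENO, buf, len) > 0)
            ;
    }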


> I really don't see how it's "obviously incorrect" when yes only needs to make system calls for writing, whereas /dev/zero needs to make system calls for both reading and writing. Furthermore you're assuming that both are highly optimized, whereas it's entirely conceivable that /dev/zero is less optimized.

When you're "only writing", there is a counterpart that needs to read, and you have to wait for that, same as /dev/zero. It's more conceivable /dev/zero is more optimized than yes: you know, because of how pipes and streams are an essential part of a UNIX or UNIX-like system.

> If what you're trying to convey is that abstract reasoning based on fundamental principles without any consideration of real-world complications is a superior approach to determining what's true, well, I reject that.

I agree with that. But the original comment only gives us a few options for checking its validity, and some of those are not feasible.

1. Actually finding all those systems, version-specific, combinations of userspace tools, setting them up and running the tests on hardware that was at OP's hand "back then": Not feasible.

2. Reasoning abstractly would tell us quickly that the original comment is plain wrong.

3. Running the tests on a modern system (some default linux install or whatever) shows that (2) is even more probably correct.


Is it "more conceivable" or is it "obviously incorrect"? You switch freely between arguing that your conclusion is more probable and arguing that it's obviously correct.

I agree with you that your conclusion is more probable. But that's an entirely different thing. It is more probable that a Tesla P100D is faster than a Pinto. It is not obviously incorrect that one time a Pinto beat a Tesla, maybe because the Tesla was dragging a bunch of rocks and the Pinto was dropped off a cliff.

Make up your mind what your argument is: is the statement in question merely unlikely, or is it "obviously incorrect"? Once it is, make your argument without being a jackass.


I responded in the terms you were using, hoping that it would become clearer to you.

> It is not obviously incorrect that one time a Pinto beat a Tesla

Yes it is, because OP forgot to mention the Tesla was dragging a bunch of rocks and the Pinto was dropped off a cliff. So was I supposed to have the power to read OP's memories and go back in time to check whether they provided us with all the information we needed to check their claims? No, I worked with exactly what's in the comment.

So without catering to your language/terms you use: it is obviously incorrect. Even OP's own comment later in the thread (on a modern system though, mind you) shows that: https://news.ycombinator.com/item?id=14544002


You appear to be incapable of understanding any middle ground between "obviously incorrect" and "obviously correct." Is the concept of uncertainty that foreign to you?


Really? Because when it comes to something testable, like in this case either running a test of yes vs /dev/zero or looking at the source code of everything involved, there isn't any uncertainty that OP's statement is obviously incorrect.

The only uncertainty is OP's use of "back then", "from memory", "maybe", "could have", etc... without providing any evidence to support their statement.


They were talking about things in the distant past. (Distant in computer terms.) The only way there isn't uncertainty is if you tried all the systems that were available at the time in question.

The claim is testable, in theory. You didn't test it, nor did anyone else here, so the fact that it's testable doesn't remove any uncertainty.


> The claim is testable, in theory.

It's not feasible to test in any practical way.

You said it yourself:

> If what you're trying to convey is that abstract reasoning based on fundamental principles without any consideration of real-world complications is a superior approach to determining what's true, well, I reject that.

Enjoy :)


Other tests by other people show that /dev/zero is slower.

https://news.ycombinator.com/item?id=14544261

People are suggesting that it's because of the use of cat. So, if someone writes a program, and the only difference is between /dev/zero and yes, and one is slower than the other, it's fairly reasonable to make the assumption that /dev/zero is slower.


> People are suggesting that it's because of the use of cat.

It is:

    > cat /dev/zero | pv >/dev/null
    ^C97GiB 0:00:05 [1.79GiB/s]

    > < /dev/zero pv >/dev/null
    ^C.3GiB 0:00:06 [10.2GiB/s]


OP's own comment shows that /dev/zero is faster: https://news.ycombinator.com/item?id=14544002


[flagged]


I would think the measurement tool (pv) could be a significant part of the overhead.

EDIT: the post I'm replying to has changed in the interim. It was previously talking about overhead.


They are using pv in the original GNU 'yes' measurement too?


Right, but if that was the limiting factor on throughput with the original pipe, then there's nothing they could do to the bespoke 'yes' to beat it.


But the article was about matching it, and they weren't able to achieve that. I agree it would be a factor if trying to beat it.


Good job! You have accurately copied the top comment from the link you just clicked on.



It's possible that he or she wrote both comments, or someone from Reddit copied the comment from HN.


> It's possible that he or she wrote both comments,

Highly unlikely, as the comment posted on HN by fuckemem reads as a non sequitur.

> or someone from Reddit copied the comment from HN.

Highly unlikely, as the comment posted on Reddit was posted over 1 hour ago while the comment on HN by fuckemem was posted 8 minutes ago.


The second part is impossible, the HN comment is timestamped afterwards.


Pedantic correction: that just makes it unlikely, not impossible.

It is, after all, possible to tamper with comment post dates (either by using some exploit, or by having legitimate access to the site's administration).


The Reddit comment predates the HN comment (Reddit: "an hour ago"; HN: "8 minutes ago"). Same author is a possibility.


I don't see how that would fix kernel overhead. I guess if you have something heavy in the background, that might slow it down though.



