FWIW, GNU true.c is that way because the GNU policy is for programs to all have --help and --version arguments.
Other fun fact, in some UNIXes, true looked(looks?) like:
#!/bin/sh
# Some multiple line copyright notice from AT&T
Yup, that's a shell script with only comments, all of which are a copyright notice for what is, essentially, an empty file. I saw that on Solaris something like 15 years ago. I'm sure it made its way to other flavors of UNIX.
There’s a lovely story[1] from the early days of personal computing:
“GO.COM contained no program bytes at all – it was entirely empty. However, because GO.COM was empty, but still a valid program file as far as CP/M was concerned (it had a directory entry and file-name ending with .com), the CP/M loader, the part of the OS whose job it is to pull programs off disk and slap them into the TPA, would still load it!
So, how does this help? Well, using the scenario above:
• the user exited WordStar
• the user ran DIR (or whatever else they needed) and at some future point would be ready to re-run Wordstar
• the user now ‘loaded’ and ran GO.COM
• the loader would load zero bytes of the GO.COM program off disk into the TPA – starting at address 0100h – and then jump to 0100h – to run the program it just loaded [GO.COM]!
• result – it simply re-ran whatever was in the TPA when the user last exited to DOS – instantly [WordStar in this example]!
So, GO.COM, which consisted of zero bytes of code – and sold for £5 a copy is, I figure, the most profitable program ever written (as any other program will return mathematically fewer £s per byte than GO.COM did)!”
Empty files remind me of an episode where I worked. We were supposed to provide a set of files for a court case. But there was a missing zero byte file due to a glitch in processing. So just create one, right? NOPE. The client insisted we find and copy the original empty file from the original media.
You just know it's because, if you just made a new zero-byte file, the opposition in the case would claim that you forged the file and that the evidence was then tainted. Stupid, but your goal here isn't to reproduce the file, it's to stop a stupid argument from convincing stupid people to make a stupid decision.
But... to be sent, the file is recreated a bunch of times right? Because a computer only ever copies, and never “moves”, right? How can it then possibly matter whether the 0b file was recreated by a computer from the instruction of copying the file (which then just turns into an instruction to create the file), or if it was created directly through instructing the computer to create the file?
Computers do "move" files, in some cases, by updating the entry in the filesystem registry. In such a case, it might be that no change was made to the file on disk, so no copy was made at all.
I interpreted "provide a set of files for a court case" as some form of sending it either through the Internet or to some portable storage media. He also said "The client insisted we find and copy the original empty file from the original media", which also indicates that. I therefore intended "move" to mean "move from one storage media/computer to another". You're of course correct that if you "move" a file on one file system, the bits won't actually be copied (or moved) anywhere.
You wanted to provide a zero-byte file of the wrong colour. Unlike in physical reality, in legal world it strongly matters what colors are your bits (or in this case, what color is the lack of them).
Yeah, if they didn't find a copy on the backup/original media to confirm it's zero bytes, then really all they've provided is their assumption that it's zero bytes.
I'm no lawyer, but the testimony of the IT guy (and associated reasoning/etc) seems like a totally different kind of evidence than a true and accurate copy of a file from an original medium. And having the IT guy testifying seems like way more of a pain than just making him find the file.
"prior art" is a term concerning patents, whereas this is copyright.
There needs to be a certain level of creative work to qualify for copyright, and an empty program that does nothing is rather unlikely to qualify.
The definitive case on that subject must be John Cage's "4′33″", which is 4 minutes and 33 seconds of silence. Its copyright has been upheld in court. Yes, the estate of John Cage sued somebody who "quoted" too much of his work of silence.
That sounds somewhat silly, yet it's well-reasoned: the infringing artist actually acknowledged Cage in the CD insert. And 4′33″ isn't really only silence. Audio recordings feature the pianist sitting down, opening and closing the piano lid, and the audience making various noises. It's somewhat "out there" obviously, but just the fact that it's rather well-known kind of shows that it did what it was supposed to do.
I'm surprised that neither GNU yes nor GNU cat uses splice(2). Here's what I get. Note that ./rust is the final Rust program from the article modified to print stats and exit after a set byte count[1], ./splice is a simple C program that uses splice to copy the input string from a generated anonymous temporary file to stdout, and ./consume uses splice to move data from stdin to /dev/null. In all tests the default string of "y\n" is used.
splice is Linux-only. Most systems have sendfile(2), including Linux, but I didn't test it. The implementation and semantics of sendfile vary across platforms.
[1] I did this before I wrote ./consume (which takes an optional byte count limit), having assumed that GNU cat was using splice. As cat doesn't have a way to limit the number of bytes read/written, and other tools like head or tail definitely don't use splice, the simplest way to limit the benchmark without introducing a bottleneck was to have the producers themselves print stats and exit. The relevant diff is
+ let mut count = 0u64 as usize;
+ let bytes = (1u64 << 33) as usize;
  ...
- while locked.write_all(filled).is_ok() {}
+ while locked.write_all(filled).is_ok() {
+     count += filled.len();
+     if count >= bytes {
+         break;
+     }
+ }
> I'm surprised that neither GNU yes nor GNU cat uses splice(2). ... splice is Linux-only.
The fact that splice is Linux-only is most likely why. GNU programs are portable to many different OSes, many of which no longer exist in any real form. They try not to use any single-OS feature if at all possible.
splice(2) and tee(2) were basically tailor made for cat(1) and tee(1), and for some reason I was under the impression that GNU tee was using splice(2) and/or tee(2). Using these could be trivial--just a few extra lines of code that could fall-through to the existing methods, for a huge speed-up in performance. (Performance matters because the consumers could be CPU bound, and an inefficient cat or tee might be taking away resources that could be used by the consumer.)
Regarding portability, GNU tail uses the Linux-specific inotify(2) to respond faster to writes. Like a lot of OSS, coreutils uses the BSD .d_type member extension[1] of struct dirent to avoid unnecessary stat() calls. There are many other more intrusive OS-specific details baked into coreutils, but often it's the nature of the problem--in many situations you're dependent on platform-specific details or features. For the most part, these nitty-gritty platform-specific details are far more intrusive in terms of code complexity than the performance optimizations.
[1] Missing from Solaris, and probably most other SysV derivatives.
1) Because splice(2) requires either the source or sink to be a pipe. The source is a regular file so the sink has to be a pipe.
2) Because the example Rust program(s), emulating yes(1), had no [simple] way to measure throughput except by piping to another program. We can't fairly compare program A that writes directly to /dev/null with program B that writes to a pipe even if program A can measure its throughput.
2a) What jwilk said.
3) For some reason I thought that glibc had optimizations to elide fwrites to /dev/null, and some of my code was using stdio (e.g. for the final trailing bytes less than the pagesize). I could've sworn either glibc or bash did this, but I can't find any mention of it, now. I realize it would be crazy difficult and ugly for glibc to do this (because of dup2, etc), but glibc does a lot of crazy things, and in any event I didn't bother checking beforehand.
Mostly it comes down to fairly comparing benchmarks and kernel facilities. Otherwise, yes, those would be classic examples of Useless Use of Cat.
hmm going in the direction of [binary] instead of [shell script], why not just go further and ditch the whole C stdlib by passing -ffreestanding and then compiling
This topic comes up every now and then. I thought this post was particularly insightful,
"One thing to keep in mind when looking at GNU programs is that they're often intentionally written in an odd style to remove all questions of Unix copyright infringement at the time that they were written.
The long-standing advice when writing GNU utilities used to be that if the program you were replacing was optimized for minimizing CPU use, write yours to minimize memory use, or vice-versa. Or in this case, if the program was optimized for simplicity, optimize for throughput.
It would have been very easy for the nascent GNU project to unintentionally produce a line-by-line equivalent of BSD yes.c, which would have potentially landed them in the 80/90s equivalent of the Google v.s. Oracle case."
> There is something elegant about being able to beat it out on speed with about 30 lines of C
What is this referring to? If it's referring to GNU C, the linked C implementation is about 100 non-comment lines, and is three times slower on the author's machine than the Rust version.
As others have pointed out elsewhere, what matters here is not so much the performance of the language but how much you can avoid calling I/O code by making the path as straightforward as possible. You can get comparable performance (6.5 GB/s vs 8 GB/s on my machine) with 10 lines of Python code, which probably took me the same time to write as the original Rust version.
The code is below.
I have to admit that after a few more tests I realized that the performance greatly depends on the length of the input, in some cases going as low as 2 GB/s and in others reaching the performance of the GNU yes installed on my system. This is probably due to the fact that I build the buffer in a quite naive way.
With a bit more work maybe that can be fixed.
#include <sys/uio.h>
// One page (4096 bytes) of "y\n" repeated 2048 times, built with
// string-literal concatenation so the constant stays readable.
#define Y8   "y\ny\ny\ny\ny\ny\ny\ny\ny\n"
#define Y64  Y8 Y8 Y8 Y8 Y8 Y8 Y8 Y8
#define Y512 Y64 Y64 Y64 Y64 Y64 Y64 Y64 Y64
static char __attribute__((aligned(4096))) str[] = Y512 Y512 Y512 Y512;
static struct iovec v[] = {
    {str, 4096}, // adjust the number of these to taste
    {str, 4096},
    {str, 4096},
    {str, 4096},
    {str, 4096},
    {str, 4096},
    {str, 4096},
    {str, 4096},
    {str, 4096},
    {str, 4096},
    {str, 4096},
    {str, 4096},
    {str, 4096},
    {str, 4096},
    {str, 4096},
    {str, 4096},
};
int main(int argc, char **argv) {
    while (1) {
        // scatter-gather write (on Linux there is basically no way for write
        // to write an odd number of characters, except at EOF, so we need not
        // check for partial writes)
        (void)writev(1, v, sizeof(v) / sizeof(*v));
    }
    return 0;
}
As for why the naive rust version is slower, it's because without adding a BufWriter in rust, stdout is line-buffered, so each line emits a write system call, while with python, stdout is buffered. Python 2 emits writes of 4096 bytes, and python 3... 8193 bytes (edit: not a typo, this is 8KiB + 1). That's the likely cause for it being slower.
Edit:
A minimal version of the naive rust version would be:
fn main() {
    loop {
        println!("y");
    }
}
On the same machine as with the python tests above, I get:
$ ./yes | pv -r > /dev/null
[4.81MiB/s]
which is actually as slow as python 3, despite doing 4 thousand times more system calls.
A version with buffering would look like:
use std::io::{stdout, BufWriter, Write};

fn main() {
    let stdout = stdout();
    let mut out = BufWriter::new(stdout.lock());
    loop {
        writeln!(out, "y").unwrap();
    }
}
And produces 129MiB/s on the same machine. And that's strictly doing the same as what the python version does (with a default buffer size of 8KiB, apparently).
And fwiw, on the same machine, both GNU yes and the full rust solution from OP do 10.5GiB/s.
> As for why the naive rust version is slower, it's because without adding a BufWriter in rust, stdout is line-buffered, so each line emits a write system call, while with python, stdout is buffered. Python 2 emits writes of 4096 bytes, and python 3... 8193 bytes. That's the likely cause for it being slower.
Does it have nothing to do with the fact that string-of-bytes is the default in Python 2, whereas string-of-characters is the default in Python 3? Or is that perhaps related to the explanation you gave? Forcing the byte interpretation, Python 3 is slightly faster than Python 2 for me. Forcing the character interpretation, Python 2 wins, but not by as much as before.
Your bytes version outputs lines of, literally, `b'y'`.
The characters version is still a clear win for Python 2 on my machine (8.9 MiB/s vs. 5.6 MiB/s).
It's also worth noting that the buffering behavior of python is only happening because the output is a pipe to pv. If it were the terminal, it would be line buffered, like the naive rust version.
In both cases, a 4KiB buffer is used by python. That's still way slower than the equivalent rust code with a 4KiB buffer (use BufWriter::with_capacity(4096, stdout.lock()) instead of BufWriter::new(stdout.lock())).
For a program like this, execution is I/O bound rather than CPU bound, so even if Python is generally less CPU efficient than Rust the effect is overwhelmed by a different I/O strategy. It's just like how Node has greater max throughput for trivial server workloads than a naive C implementation.
Your post seems to demonstrate _exactly_ what the post you're replying to is saying: most of the speed has to do with I/O strategy (buffering and syscall usage), not the actual speed of the involved language. Maybe I'm missing a distinction you're making, but I'm not following how your previous comment leads to what you're saying here.
The rust version that does exactly what python does (buffered output) is an order of magnitude faster (even if I force the rust buffer size to be 4KiB like with python2).
To head off this comment chain, I've softened the language in my original comment to "overwhelmed by" rather than implying that non-I/O factors are wholly irrelevant. :)
That doesn't make a whole lot of sense when the guy above you posted that two versions of python (2 vs 3) have significantly different output throughput rates.
Seems like it might even be better if yes was rate limited to a couple of lines per second or less. It would be more than enough for its intended use and when users inevitably run it in a shell to see what it does it wouldn't generate tons of output.
It seems like a violation of "do one thing well" to use it for generating data for testing, isn't dd and /dev/zero a better way to do that?
Sure, but you're building strings in memory. It might not be a lot of memory, especially if you're able to run Python, but the native `yes` command can run on the smallest of embedded systems, which is why its speed is impressive.
The author says "no magic here" for the C version:
for (;;)
    printf("%s\n", argc > 1 ? argv[1] : "y");
but it's not totally obvious to me whether the argument to printf would be evaluated on every iteration of the for loop or not. Does the compiler know that those don't change, and is the answer to that question fairly basic C knowledge or not?
The compiler can assume that argc will not change within the loop so it can optimize it. I just looked up the assembly output from gcc and it pulls the argc>1 outside of the loop and replaced the printf with puts. So something like:
if (argc > 1)
    for (;;) puts(argv[1]);
else
    for (;;) puts("y");
The replacing of printf with puts is based on gcc having specific knowledge about the printf library function.
I would desire the yes command to be as slow as possible while still performing its basic function of automatically confirming questions that come up during an installation.
This is because I clusterssh into 40 machines and hey I haven't formally accepted them into my known_hosts file yet so I type "yes" to acknowledge them, but whoops I had already accepted two of them so now they are spitting out the letter 'y' as fast as they possibly can and now I have to wait for all of that output to transfer over the wire onto my machine despite pressing ctrl+c a minute ago.
Then just use yes | head -50? There's no need to artificially slow it down when there are more reasonable means of capping output than relying on SIGINT.
I'm not familiar with clusterssh, but assuming that it passes flags to ssh you can add `-o StrictHostKeyChecking=no`.
Of course, the parameter does what it says, but the security implications are really the same as for blindly yessing your way through. The best is to pre-populate your hosts file using ssh-keyscan.
Ah, so, this raises an interesting question: what do people use yes for?
One answer (a bad answer) is "generating artificial load".
This makes the I/O strategy oddly relevant -- different strategies result in different numbers of syscalls and (potentially) kernel lock acquisitions.
Couple this with someone trying to benchmark something's behavior with an antagonist load, and the story gets downright painful to contemplate in terms of confusing results.
Source: in my younger years, I did this. In more recent years, I've seen other engineers do it.
In this case it's probably just for fun and exercise, but generally, don't think of speedy code as expending more effort; think of it as making the computer waste less effort. Well, unless the program isn't blocking anything and is just waiting for the network, and other such cases. But any code that actually does run, especially in a loop, doesn't run in a vacuum, so I'd rather err on the side of speed. These tools don't get rewritten every 2 years in a new framework, which probably helps. Why not hone them if they're going to be that heavily used?
Because the yes command will repeat whatever string you give it. If you need to parse the incoming load in a certain way, /dev/{zero,urandom} won't be suitable.
I know right. The only time I have ever used it has required just 1 y/n and a new line. In that case, these "performance improvements" might make it slower for real world use cases.
That's a K&R (Kernighan and Ritchie) style function declaration. Compilers still support it but the version you'd be more familiar with (ANSI C style) has been standard since at least the late 80s IIRC. ANSI C was standardised in 1989 but that process had been in progress for something like 5 years beforehand.
Specifically, everything is an integer with the auto lifetime (gets discarded with the stack frame) unless otherwise specified.
I think most modern compilers will warn on encountering such short hand due to the common error of accidentally declaring integers when you meant to assign a value to a variable you forgot to declare.
main and argc are declared as int (technically auto int) by default. argv is declared and given its type between the function head and the function block, which is perfectly legal but quite unusual these days
This syntax is probably nice if you use a line oriented editor like ed
It's not that hard or unlikely to run into it in old C code. The game Nethack is still being developed and it's programmed this way too (anything I looked at in its code at least).
all the reddit users from the linked article missed the SIGPIPE trick!
you don't need to check the return value from write() as your process will be terminated with SIGPIPE if it tries writing to a closed pipe.
that said, none of them check the return value correctly: if write() ever returns a short count you could eventually get 'yyyyyyyyyyyyyyyyyy' (without any newlines)
quite impressive that so many implementations of "yes" have the same bug :)
on Linux there is no way to do this without your consumer being a kernel driver that purposefully accepts one byte at a time. All file I/O and pipes have basically no way to accept only one byte (except at EOF)
you wouldn't see it on Linux as pipes are implemented using complete pages (always powers of 2), but there's probably some OS out there with a different implementation where you can set the buffer size to be an odd number, and then you'd see the bug with plain simple cat
Not impressive at all.
Basically he had to write a lot of manual buffering code to reach GNU yes throughput.
I would suggest using an infrastructure that already provides proper IO.
I guess a stretch goal would be to make a "shouldi" command that can consume more y's per second than yes can produce. Of course at that point the shell itself would probably become the bottleneck.
The shell can't be a bottleneck. If you run `yes | shouldi`, all the shell does is set up the pipe, give one of its ends as stdin to shouldi, and the other end as stdout to yes.
The shell does not even enter into the picture. The shell sets up the pipe and starts your programs. The `yes` program writes its output to stdout which is buffered in kernel and then directly read by your hypothetical `shouldi` program.
The C version is beautiful, readable, and minimal. The "optimized" Rust version is complicated and over 50 lines of code. At what point does performance optimization go too far?
This is an unfair comparison. For starters, this version too is unoptimized, so you need to compare it with the first Rust code, which is about the same size.
Moreover, a lot of the "bloat" that comes with Rust code isn't there because it's more performant. Neither C++ nor Rust tries to be a faster language (after all, the optimization tricks you can do in your program are limited to what you can do with assembly and its cousin, C); they try to be safer by providing more abstractions and restrictions (compile-time checks).
Since it can be used to repeat arbitrary data, it is liable to be used in performance-sensitive tasks (mostly along the lines of pumping dummy data to a program or piece of hardware that's being benchmarked).
Funny, I just learned about this command a couple of days ago as a simple way to max out your CPU. I was trying to drain the battery on my Macbook Pro and running 4 of these at the same time did the trick nicely. Redirected to /dev/null and run in the background: "yes > /dev/null &"
Hah, nice! Somewhat related, I once worked on a project that used a high-reliability PC meant for extended use in "extreme" outdoor environments. One of the issues they (the manufacturer) worried about was the PCB and solder joints experiencing thermal fatigue from lots of seasonal and night/day temperature cycles.
Their ingenious solution was to always run the system towards the warmer end of its spec, and so it included a program that would monitor the temperature inside the case, and would spawn/kill a bunch of threads doing compute-intensive math in order to keep the temperature constant when the user's workload wasn't enough!
Passively cooled. The thing was originally meant for extended use in "extreme" industrial environments (eg at a power substation, inside a wind turbine, etc..), so it had no vents or moving parts at all. Everything was heatpiped to the metal case, which looked like a heatsink.
Reminds me of http://thedailywtf.com/articles/Just-a-WarmUp , a trick which I also discovered independently many years ago and used to keep my fingers warm while attending lectures in a nearly-unheated room.
That manual counting of cores is what I thought we would let `parallel` handle! I have never actually used parallel, though, so I don't know how to best do it.
Right. I was unclear. What I really meant was that I was thinking that `parallel` could automatically spin up more and more jobs until it sensed that there is no further performance to be gained. I'm not sure to what extent that is true, though.
A shell script wouldn't be enough (I've tried). However there are plenty of CLI based load testing tools, including one I've written myself. And if you need something more advanced then there is always Gatling, which is run via the command line and produces proper HTML reports and graphs plus is extended in code (eg in Scala) rather than GUI controls
Lol I defrost the chocolate like this by placing it near the air vent. But care should be taken to make sure that the wrapper doesn’t touch the chocolate while peeling off, because I don’t think the hot air from the vent is hot enough to kill all harmful microbes.
this ridiculously complicated syntax to perform such a simple thing is why I will never accept Rust. What a clumsy, ugly language. I'll just stick with learning ANSI Common Lisp, which pays immediate dividends.
It's fairly idiomatic so you get used to it but if you prefer something a little more C-ish you can write it like that (as a side note the code in TFA forgets to import std::env and BufWriter):
use std::env::args;

fn main() {
    let expletive = match args().nth(1) {
        Some(arg) => arg,
        None => "y".into(),
    };
    loop {
        println!("{}", expletive);
    }
}
This is the equivalent to the unoptimized C version. I would argue that it's a little more readable too, substituting "loop" instead of the idiomatic "for(;;)" and the more verbose match syntax instead of the ternary (the terser rust equivalent being the `unwrap_or` you seemed to dislike).
Personally I think I'd use the "unwrap_or" version, when you're familiar with the language it's completely transparent, easier to parse and expresses intent better I think. For an outsider I can see why it would look like a strange incantation though (but the same could be said about ternaries or CL's "do" construct for instance).
The only thing I'd deem inelegant here is the `"y".into()`. It's probably not obvious what it does and why it's necessary to somebody not familiar with Rust.
That example is indeed more readable, but it still reads like methods on objects which is a fundamental issue for me. Any non trivial program written in an object-oriented paradigm eventually becomes incomprehensible, whereas the functional approach doesn’t rely on the state machine model. Two different approaches. I didn’t care for objects back in 1990 and I certainly care even less for them now. I upvoted your example in appreciation for the effort you went through though.
I agree that for these types of constructs a more functional style could look better. That being said, the OOP approach has the merit of reading in the proper order: you get your args, pick the 1st and then use that or the default "y" value.
If you rewrite this to use function calls instead of methods you end up with something like:
(or (cadr (args)) "y")
Which is fine if you're used to lisp syntax but it's really a matter of taste at this point.
But what if one doesn’t care about memory safety because one has other languages at one’s disposal, like shell, AWK, Lisp? What benefit does Rust bring then? And at what price?
And who is to say that the programmers of Rust built a perfect language that always generates code which is bounds-safe? I question that as I know from experience that no human has ever written 100% correct code 100% of the time; even machines aren’t capable of achieving that. Therefore, I hold that the entire promise of Rust is flawed. I don’t like Rust one bit.
Problem is that Rust is not memory safe anymore, since they switched from GC to refcounting and added unsafe.
Pair that with there being no concurrency safety (deadlocks, races), and only the macros are left to make Rust attractive.
On the other side there are proper languages which do provide all the safeties and beat Rust or C++ in performance, such as Pony. A proper type system does help (compile-time guarantees), but you could also add such features to the runtime system (e.g. a GC, or safe threading with a single-writer system such as Parrot's). With Rust you don't get any of that; you have to manually add locks or mutexes to your threaded code, and try to avoid unsafe.
GNU true.c: https://github.com/coreutils/coreutils/blob/master/src/true....
OpenBSD true.c: http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/true/tr...