Without understanding what a program is doing, you don't understand what is affecting your results, and you have no real knowledge of how things will differ when you use them in the "real world". Is one process faster when single-threaded or at a low core count, while another is massively parallel and loses out until scaled higher? Are your commands testing the thing you think they're testing? What is your limiting factor? If you don't know why the results are what they are, rather than higher, you don't have a good benchmark.
http://www.brendangregg.com/activebenchmarking.html / http://www.brendangregg.com/ActiveBenchmarking/bonnie++.html
Just today I was playing with /proc/sys/vm/drop_caches. I'd never used it before, and it makes a massive difference when reading from a spinning disk!
For example to read tens of thousands of files (using 8 processes), it would take me
Yes, having a "cold" or a "warm" disk cache makes a massive difference for I/O-heavy programs. For one of my other programs, I differentiate between "cold-cache" and "warm-cache" benchmarks: https://github.com/sharkdp/fd-benchmarks
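For anyone who wants to try the cold-cache vs. warm-cache comparison by hand, here is a rough Python sketch. It is not how fd-benchmarks does it; the helper names `drop_caches` and `time_command` are made up for illustration. It assumes Linux, and writing to /proc/sys/vm/drop_caches needs root.

```python
import os
import subprocess
import time

DROP_CACHES = "/proc/sys/vm/drop_caches"  # Linux only, writable by root only


def drop_caches():
    """Flush dirty pages, then ask the kernel to drop page cache, dentries and inodes."""
    os.sync()
    with open(DROP_CACHES, "w") as f:
        f.write("3\n")


def time_command(cmd, runs=5, prepare=None):
    """Return the wall-clock time of each of `runs` executions of cmd (an argv list).

    If `prepare` is given, it is called before every run and is not timed.
    """
    times = []
    for _ in range(runs):
        if prepare is not None:
            prepare()
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
        times.append(time.perf_counter() - start)
    return times
```

Run as root, `time_command(cmd, prepare=drop_caches)` would give cold-cache numbers, since the cache is emptied before every run. Calling `time_command(cmd)` after one untimed warm-up run would give warm-cache numbers.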
If another user/process tries to access your SSH files directly, it'll go through the traditional file permissions to determine if it has access or not. If the disk block is in the page cache AND access is allowed to that inode, then the kernel will retrieve the page from the cache and give it to the process.
To read the whole page cache, you'd need code sitting in kernel space. If something manages to load itself in the kernel space (e.g. kernel module), you have bigger problems to worry about.
I'd be surprised if you can get access to the page/disk cache without root privileges though.
I'm gonna do some more digging on that.
Edit: Though I guess Meltdown might let you access that RAM?
This would be useful when comparing two similar commands, as interleaving them makes it less likely that, e.g., a load spike unfavorably affects only one of them, or that thermal throttling penalizes whichever command runs last.
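To make the interleaving idea concrete, here is a minimal Python sketch. It only illustrates the suggestion, it is not something hyperfine does, and the function names are hypothetical. Instead of running command A n times and then command B n times, each round runs every command once.

```python
import subprocess
import time


def run_once(cmd):
    """Run cmd (an argv list) once and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


def interleaved_benchmark(commands, rounds=10):
    """commands maps a label to an argv list.

    Runs A, B, A, B, ... so that a temporary load spike or gradual thermal
    throttling is spread over all commands instead of hitting just one.
    """
    results = {name: [] for name in commands}
    for _ in range(rounds):
        for name, cmd in commands.items():
            results[name].append(run_once(cmd))
    return results
```

Within a round the order is still fixed here. Shuffling the order in every round would be an obvious next step if order effects are a concern.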
The effect of the linux-rt patchset is dramatic.
Not yet, but I've just created a ticket here: https://github.com/sharkdp/hyperfine/issues/20
Should be easy to implement.
I personally use command-line benchmarking to compare different tools. You might want to compare grep, ack, ag and ripgrep. I currently use it to profile my find-alternative fd and to compare it with find itself (https://github.com/sharkdp/fd-benchmarks).
You could also use it to find an optimal parameter setting for a command-line tool (make -j2 vs. make -j8).
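The parameter search mentioned above (make -j2 vs. make -j8) could be sketched in Python roughly like this. The `{}` placeholder convention and the names `scan_parameter` and `fastest` are my own assumptions for the example, not a hyperfine feature.

```python
import statistics
import subprocess
import time


def scan_parameter(template, values, runs=3, prepare=None):
    """Time a command for each parameter value and return {value: mean seconds}.

    template is an argv list in which "{}" is replaced by the current value.
    prepare, if given, is called before every run and is not timed.
    """
    results = {}
    for value in values:
        cmd = [arg.replace("{}", str(value)) for arg in template]
        times = []
        for _ in range(runs):
            if prepare is not None:
                prepare()
            start = time.perf_counter()
            subprocess.run(cmd, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL, check=True)
            times.append(time.perf_counter() - start)
        results[value] = statistics.mean(times)
    return results


def fastest(results):
    """Return the parameter value with the lowest mean time."""
    return min(results, key=results.get)
```

For the make case this might be called as `scan_parameter(["make", "-j{}"], [2, 4, 8], prepare=clean)`, where `clean` is a function you supply that resets the build tree (for example by running the project's clean target, if it has one). Otherwise every run after the first would have nothing left to build.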
How could you do something like:
find . -iname "<asterisk>any<asterisk>thing<asterisk>"
HN won't let me use * so I had to tag them accordingly.
find . -iname '*any*thing*'
Could it use dtrace to measure other metrics besides time?
Hyperfine currently tracks real time (= wall-clock time), user time (= time spent in user mode) and system time (= time spent in kernel mode).
Unfortunately, I have never heard of dtrace. What kind of other metrics would you be interested in?
I believe I would like hyperfine to focus on timing aspects.
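For reference, the three timings mentioned above (real, user and system time) can be collected for a child process with nothing but the Python standard library. This is only a sketch of the idea on Unix-like systems, not hyperfine's actual implementation; the function name `measure` is hypothetical.

```python
import resource
import subprocess
import time


def measure(cmd):
    """Run cmd (an argv list) and return its real, user and system time in seconds."""
    # RUSAGE_CHILDREN accumulates CPU time of all waited-for child processes,
    # so the difference before/after belongs to the command we just ran.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=True)
    real = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "real": real,                               # wall-clock time
        "user": after.ru_utime - before.ru_utime,   # CPU time in user mode
        "sys": after.ru_stime - before.ru_stime,    # CPU time in kernel mode
    }
```

For a multi-threaded command, user + sys can exceed real, which is a quick hint that the program is using more than one core.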