Show HN: Rust test harness that measures energy consumption

thijsr · on April 5, 2022

Hey, I wanted to share a project that we've been working on! Coppers is a custom test harness for Rust that allows you to measure the energy consumption of your test suite. A use case for this could be to identify regressions in energy usage, or to do more targeted energy optimizations. Our goal was to make it as seamless as possible to integrate it with existing Rust projects. To make that work, we had to rely on some unstable and internal Rust compiler features that are only available in nightly. But the current implementation seems to be able to measure the energy consumption of almost every existing Rust crate we tested! (with the exception of embedded and system-specific crates, but that is a limitation we're looking into)

teitoklien · on April 5, 2022

It's a pretty cool project :D Thank you for making it! I'll definitely try it in my next project.

______-_-______ · on April 5, 2022

Counting instructions is very accurate and roughly approximates power usage. The CPU's self-reported power usage is comparatively pretty noisy. Unit tests will probably be done running before you can get meaningful data. I have to wonder if a test runner is the best point of integration for this. It might make more sense to expose it as a bench harness, like criterion.
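A minimal sketch of the "run it long enough" idea in Rust, assuming a fixed minimum measurement window. The `run_for_at_least` name and the window length are hypothetical, not criterion's or Coppers' API. The workload is repeated until the window has elapsed, so even a very fast unit test spans enough time for a coarse, noisy counter to register something.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Repeats `f` until at least `min_window` has elapsed, so a very short
/// workload still covers a measurement window long enough for a coarse
/// counter. Returns (iterations, total elapsed wall-clock time).
pub fn run_for_at_least<T>(min_window: Duration, mut f: impl FnMut() -> T) -> (u64, Duration) {
    let start = Instant::now();
    let mut iterations = 0u64;
    loop {
        // black_box stops the optimizer from discarding the result of `f`.
        black_box(f());
        iterations += 1;
        let elapsed = start.elapsed();
        if elapsed >= min_window {
            return (iterations, elapsed);
        }
    }
}
```

Dividing whatever was measured over the window (time, energy, instructions) by the returned iteration count gives a per-iteration figure.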

EDIT: another benefit of a criterion-like approach is that you wouldn't require nightly

thijsr · on April 5, 2022

Yes, the CPU self-reported power usage is indeed fairly noisy. We've tried to mitigate this by executing certain tests multiple times in a row and using the average power consumption across these executions. However, this data is still quite noisy and is influenced by external factors, like the operating system, your hardware, power management settings, and a lot more. We mention this and possible mitigating actions in the README.
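The repeat-and-average mitigation could look roughly like this. It is a sketch under assumptions, not Coppers' actual code: `read_counter_uj` is a hypothetical closure returning a cumulative energy counter in microjoules, and counter wraparound is ignored. Reporting the standard deviation next to the mean makes the remaining noise visible instead of hiding it.

```rust
/// Summary of repeated energy measurements of one workload.
#[derive(Debug)]
pub struct EnergySummary {
    pub runs: usize,
    pub mean_uj: f64,
    pub stddev_uj: f64,
}

/// Runs `workload` `runs` times, reading a cumulative energy counter
/// (microjoules) before and after each run, and summarizes the per-run deltas.
/// `read_counter_uj` is a hypothetical hook; counter wraparound is NOT handled.
pub fn measure_repeated(
    runs: usize,
    mut read_counter_uj: impl FnMut() -> u64,
    mut workload: impl FnMut(),
) -> EnergySummary {
    assert!(runs >= 2, "need at least two runs for a standard deviation");
    let mut samples = Vec::with_capacity(runs);
    for _ in 0..runs {
        let before = read_counter_uj();
        workload();
        let after = read_counter_uj();
        samples.push(after.saturating_sub(before) as f64);
    }
    let mean = samples.iter().sum::<f64>() / runs as f64;
    // Sample (n - 1) variance, so the spread isn't underestimated for small n.
    let var = samples.iter().map(|s| (s - mean).powi(2)).sum::<f64>() / (runs - 1) as f64;
    EnergySummary { runs, mean_uj: mean, stddev_uj: var.sqrt() }
}
```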

We chose a test harness because one of our goals was to make it as easy as possible to run on existing Rust projects. A lot of projects define tests, but benchmarks are often not present. But maybe a bench harness would be a better and/or cleaner approach; we'll look into it!

Shish2k · on April 5, 2022

> Counting instructions is very accurate and roughly approximates power usage

I’ve always assumed this to be true, but I see a lot of benchmarking tools / libraries measuring wall-clock time or iterations-per-second or something like that; I’ve never seen a benchmark tool that counts CPU instructions. Am I being blind, or is there some other reason I’m not seeing them? :S

______-_-______ · on April 5, 2022

At the end of the day, most people care about wall-clock time. It's a real physical value that's easy to understand and easy to compare between systems. Plus, if two functions execute, say, 1 billion instructions each, but one spends extra time stalled waiting on IO or on data fetches from RAM, you definitely want to account for that in normal benchmarking.

Instruction counting is more of a specialized tool but I like to use it whenever I can because it has low variance and makes comparing changes a lot easier. Compare how bumpy these graphs are for instruction count (first link) and wall clock time (second link):

https://perf.rust-lang.org/

https://perf.rust-lang.org/?start=&end=&kind=raw&stat=wall-t...

wooosh · on April 5, 2022

Counting instructions does not give information about time spent in syscalls/doing IO, which limits its use to CPU-bound software.

wmf · on April 5, 2022

Instructions correlate to energy but not to performance. If you're benchmarking performance you should use wall clock time.

mhh__ · on April 5, 2022

Counting instructions properly is hard and also results in a good amount of overhead if you don't use a bunch of tricks or a kernel module.

You also can't really count instructions in the cloud.

wooosh · on April 5, 2022

Counting (userspace) instructions is relatively easy regardless of language with perf stat, though it does require the kernel module. Generally speaking it should just work if perf is installed through the package manager for your distribution.
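A sketch of driving `perf stat` from Rust and parsing its CSV mode (`-x,`). It assumes perf's documented CSV field order (counter value, unit, event name, ...), that the counters go to stderr, and that `perf` is installed and permitted to count events. `count_user_instructions` and `parse_perf_csv_line` are hypothetical helper names.

```rust
use std::io;
use std::process::Command;

/// One counter record from `perf stat -x,` output.
#[derive(Debug, PartialEq)]
pub struct PerfCount {
    pub event: String,
    pub value: u64,
}

/// Parses one CSV line from `perf stat -x,`. Lines whose first field is not
/// a plain integer (e.g. "<not supported>", or program output) yield None.
pub fn parse_perf_csv_line(line: &str) -> Option<PerfCount> {
    let mut fields = line.split(',');
    // Field 1: counter value.
    let value = fields.next()?.trim().parse::<u64>().ok()?;
    // Field 2: unit (usually empty for instruction counts).
    let _unit = fields.next()?;
    // Field 3: event name, e.g. "instructions:u".
    let event = fields.next()?.trim();
    if event.is_empty() {
        return None;
    }
    Some(PerfCount { event: event.to_string(), value })
}

/// Runs `program` under `perf stat`, counting user-space instructions only.
pub fn count_user_instructions(program: &str, args: &[&str]) -> io::Result<Option<u64>> {
    let output = Command::new("perf")
        .args(["stat", "-x,", "-e", "instructions:u", "--", program])
        .args(args)
        .output()?;
    // perf writes counters to stderr, mixed with the program's own stderr,
    // so keep only lines that parse as counter records.
    let stderr = String::from_utf8_lossy(&output.stderr);
    Ok(stderr
        .lines()
        .filter_map(parse_perf_csv_line)
        .find(|c| c.event.contains("instructions"))
        .map(|c| c.value))
}
```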

edit: valgrind's callgrind utility can also produce exact instruction execution counts for a given block of code

mhh__ · on April 5, 2022

Callgrind can give you instruction counts, yes. It doesn't simulate any microarchitecture other than caches, which means it's only useful for comparing against itself.

Perf stat has very, very high overhead. The perf API is available and can be tuned a bit more nicely, but it's mostly a horrible mess. It also uses bitfields, which makes it somewhat hard to use from other languages unless you trust the shifts and masks.
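To illustrate the bitfield point: the single-bit options in `struct perf_event_attr` (`disabled`, `exclude_kernel`, `exclude_hv`, ...) are packed into one 64-bit word, so a binding in another language has to reproduce the C compiler's layout with shifts and masks. This sketch assumes the field order from `<linux/perf_event.h>` and the usual little-endian layout where the first field is bit 0; it only builds the flags word and does not call `perf_event_open`.

```rust
/// Bit positions of the first single-bit fields in perf_event_attr's packed
/// flags word (assumed order from <linux/perf_event.h>, first field = bit 0).
pub mod attr_bits {
    pub const DISABLED: u32 = 0;
    pub const INHERIT: u32 = 1;
    pub const PINNED: u32 = 2;
    pub const EXCLUSIVE: u32 = 3;
    pub const EXCLUDE_USER: u32 = 4;
    pub const EXCLUDE_KERNEL: u32 = 5;
    pub const EXCLUDE_HV: u32 = 6;
    pub const EXCLUDE_IDLE: u32 = 7;
}

/// Sets or clears one bit of the packed flags word.
pub fn set_flag(flags: u64, bit: u32, on: bool) -> u64 {
    let mask = 1u64 << bit;
    if on { flags | mask } else { flags & !mask }
}

/// Reads one bit of the packed flags word.
pub fn get_flag(flags: u64, bit: u32) -> bool {
    flags & (1u64 << bit) != 0
}

/// Start disabled and count user space only, the same combination used by
/// the instruction-counting example in the perf_event_open(2) man page.
pub fn userspace_counting_flags() -> u64 {
    let flags = set_flag(0, attr_bits::DISABLED, true);
    let flags = set_flag(flags, attr_bits::EXCLUDE_KERNEL, true);
    set_flag(flags, attr_bits::EXCLUDE_HV, true)
}
```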

namibj · on April 6, 2022

Counting instructions falls short when you're weighing vectorization trade-offs: AVX-512, for example, has famously high power consumption.

hd4 · on April 5, 2022

This reminds me of an idea I wanted to submit to the systemd team (or wherever it would be more appropriate): have a Linux service report the current power usage of the OS, and maybe even translate it into currency per hour to show people how much they are spending over time, with the aim of reducing wasted power. Seems like it would be more relevant than ever given the global energy situation these days.
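The currency-per-hour idea is simple arithmetic once you have an average power figure. A sketch, assuming a cumulative energy counter in microjoules (the unit Linux uses for RAPL) and a hypothetical electricity price: joules divided by seconds gives watts, and watts / 1000 × price per kWh gives the cost of one hour at that draw.

```rust
/// Average power in watts from two readings of a cumulative energy counter
/// (microjoules) taken `elapsed_secs` apart. Counter wraparound is ignored.
pub fn average_power_watts(start_uj: u64, end_uj: u64, elapsed_secs: f64) -> f64 {
    let joules = end_uj.saturating_sub(start_uj) as f64 / 1_000_000.0;
    joules / elapsed_secs
}

/// Cost of running at `power_watts` for one hour: W / 1000 = kW, and
/// kW × 1 h = kWh, the unit electricity is billed in.
pub fn cost_per_hour(power_watts: f64, price_per_kwh: f64) -> f64 {
    power_watts / 1000.0 * price_per_kwh
}
```

For example, a steady 50 W at a price of 0.30 per kWh works out to 0.05 kW × 0.30 = 0.015 per hour, or about 0.36 per day.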

yjftsjthsd-h · on April 5, 2022

So... powertop? Possibly run as a daemon logging to the journal, which I grant is somewhat different from how it works now (an ncurses tool, or something you run to log to CSV).

rictic · on April 5, 2022

Many benchmarking systems face measurement issues that make it difficult to produce solid results. Any given run might not be running on the same hardware, the same OS, built with the same compiler, running with the same runtime, with the same versions of dependencies, with the same system load, at the same temperature, and so on.

One robust solution is to instead do pairwise comparisons, many times in a round robin fashion. The results aren't quite as nice to plot, as you don't get a single consistent speed value, but they are much more reliable and true, and you still get useful information, like ">95% chance that this test is at least 20% faster at this commit than at the previous one".
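A rough sketch of the pairwise, round-robin idea in Rust. This is one simple way to do it, not necessarily what tachometer does: the two variants are measured in alternating order each round, and the per-pair improvement `1 - b/a` is summarized with a normal-approximation 95% confidence interval. The measurement closures are placeholders for whatever a real harness times.

```rust
/// Measures variants A and B in interleaved rounds (A then B, then B then A,
/// ...) so slow drifts such as thermal throttling or background load affect
/// both sides roughly equally. Returns one (a, b) measurement pair per round.
pub fn interleaved_pairs(
    rounds: usize,
    mut measure_a: impl FnMut() -> f64,
    mut measure_b: impl FnMut() -> f64,
) -> Vec<(f64, f64)> {
    let mut pairs = Vec::with_capacity(rounds);
    for round in 0..rounds {
        // Alternate which variant goes first to cancel out ordering effects.
        let (a, b) = if round % 2 == 0 {
            let a = measure_a();
            (a, measure_b())
        } else {
            let b = measure_b();
            (measure_a(), b)
        };
        pairs.push((a, b));
    }
    pairs
}

/// Mean fractional improvement of B over A (1 - b/a per pair) with a
/// normal-approximation 95% confidence interval: (mean, lower, upper).
pub fn improvement_ci(pairs: &[(f64, f64)]) -> (f64, f64, f64) {
    assert!(pairs.len() >= 2, "need at least two pairs");
    let n = pairs.len() as f64;
    let diffs: Vec<f64> = pairs.iter().map(|&(a, b)| 1.0 - b / a).collect();
    let mean = diffs.iter().sum::<f64>() / n;
    let var = diffs.iter().map(|d| (d - mean).powi(2)).sum::<f64>() / (n - 1.0);
    let half_width = 1.96 * (var / n).sqrt();
    (mean, mean - half_width, mean + half_width)
}
```

If the lower bound is above 0.20, that roughly corresponds to the "at least 20% faster" style of statement above (here meaning B takes at least 20% less time). With only a handful of rounds, a Student's t critical value would be more appropriate than 1.96.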

A project I contribute to uses this strategy: https://github.com/Polymer/tachometer, but I'd love it if more benchmarks took this approach.

ducktective · on April 5, 2022

Very nice!

Any ideas on how to measure the energy consumption of programs on GNU/Linux? I know of `powertop`, but it measures total power consumption (its per-program table is inaccurate).
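One common starting point is the Linux powercap interface, which on many Intel systems exposes a cumulative RAPL energy counter for the CPU package. It is package-wide rather than per-process, so it has the same attribution problem as powertop; you would read it before and after running a program on an otherwise quiet machine. A sketch, assuming a zone path like `/sys/class/powercap/intel-rapl:0`; on recent kernels `energy_uj` may only be readable by root. The helper names are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Reads the cumulative energy counter (microjoules) of a powercap zone,
/// e.g. /sys/class/powercap/intel-rapl:0.
pub fn read_energy_uj(zone: &Path) -> io::Result<u64> {
    let raw = fs::read_to_string(zone.join("energy_uj"))?;
    raw.trim()
        .parse::<u64>()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Approximate energy consumed between two readings, allowing for at most one
/// wraparound of the counter at `max_range_uj` (the zone's
/// max_energy_range_uj value).
pub fn energy_delta_uj(before: u64, after: u64, max_range_uj: u64) -> u64 {
    if after >= before {
        after - before
    } else {
        // The counter passed max_range_uj and restarted from 0.
        max_range_uj.saturating_sub(before) + after
    }
}
```

A typical use: read the counter, run the program (e.g. with `std::process::Command::new(...).status()`), read it again, and pass both readings plus the zone's `max_energy_range_uj` to `energy_delta_uj`.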