The benchmarking tool is very slick. It's easy to configure for a variety of scenarios, and once you figure out how to install R, it produces those pretty graphs.

The major weaknesses I've found:

- The compare script is fragile. It often refuses to compare two test runs that used the exact same config, where the only thing I changed was the code under test.

- It doesn't have a good mechanism for storing auxiliary information. We end up recording that information as fake errors, but it looks ugly and makes it hard to distinguish a correct run from a bad one.

