

Intel Architecture Code Analyzer - jcr
https://software.intel.com/en-us/articles/intel-architecture-code-analyzer/

======
Scaevolus
Here's an example of the output of IACA:
[https://gist.github.com/nkurz/0bea9db0cff60ead1686#file-broa...](https://gist.github.com/nkurz/0bea9db0cff60ead1686#file-broadcast-c-L259)

It's somewhat cumbersome to use since you need to recompile with special asm
instructions delimiting the region of interest, but the extensive analysis it
gives is amazing for extreme micro-optimization.
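The "special asm instructions" are just short marker sequences that IACA searches for in the compiled binary. Below is a minimal sketch, assuming x86-64 and GCC/Clang inline asm, modelled on the IACA_START / IACA_END macros from the iacaMarks.h header that ships with IACA. The `sum_array` function is a hypothetical region of interest, and the `ebx` clobber is an addition of mine (not from the header) so the marked code can still run correctly.

```c
#include <stddef.h>

/* Marker sketch modelled on IACA's iacaMarks.h: "movl $ID, %ebx" followed
   by the bytes 64 67 90 (an fs/addr32-prefixed nop). ID 111 opens the
   analyzed region and ID 222 closes it.
   Assumption: x86-64 with GCC/Clang. The "ebx" clobber is an addition so
   the compiler does not keep live values in ebx across the marker; on
   other targets the markers expand to nothing. */
#if defined(__x86_64__) && defined(__GNUC__)
#define IACA_SSC_MARK(id)                                   \
    __asm__ __volatile__("movl $" #id ", %%ebx\n\t"         \
                         ".byte 0x64, 0x67, 0x90"           \
                         ::: "ebx", "memory")
#else
#define IACA_SSC_MARK(id) ((void)0)
#endif

#define IACA_START IACA_SSC_MARK(111)
#define IACA_END   IACA_SSC_MARK(222)

/* Hypothetical region of interest: the body of a simple summation loop. */
long sum_array(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        IACA_START;   /* region starts at the top of the loop body */
        total += a[i];
    }
    IACA_END;         /* ...and ends right after the loop */
    return total;
}
```

You would then compile this and point the iaca binary at the result; the command-line options differ between IACA versions, so check the tool's documentation rather than relying on any particular flags. The "memory" clobber makes each marker a compiler barrier, which can change code generation, so markers like these belong in analysis builds only.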

------
iheartmemcache
The whole Intel compiler stack is really useful if you're doing any heavy
numerical analysis, from MKL for math to VTune to TBB. The professional edition
costs about as much as Visual Studio (under US$1k), which is more than
reasonable for what you get. We use Coverity for static analysis together with
KCachegrind and ICC for our C++ code, and it's a brilliant stack. Intel Pin (1)
is free for everyone and is amazing too. There are also a lot of open-source
dynamic analysis tools coming out of unis (the UCs, Pitt, and CMU in
particular); they're almost there, but not quite.

(1) [https://software.intel.com/en-us/articles/pin-a-dynamic-bina...](https://software.intel.com/en-us/articles/pin-a-dynamic-binary-instrumentation-tool)

------
nadav256
IACA is a very useful tool. When working on the LLVM compiler I've used it
extensively.

------
jcr
I _almost_ put (2012) in the title since it's shown on the page, but the
article and docs/binaries were updated in 2013, 2014, and possibly this year.

~~~
sctb
Thanks! We reverted our (2012) addition.

------
alnsn
I spent about an hour looking at the tool. They say that only a very few
instructions aren't supported (and are thus ignored by the tool), but even the
very common imul instruction can be ignored.

PS: I noticed that gcc with -march=corei7-avx generates code that takes more
cycles than a generic amd64 build for an unrolled integer(4) to char[4]
conversion.
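For anyone wanting to try a similar comparison, here is one guess at such a kernel. It assumes "integer(4) to char[4]" means writing one 32-bit integer out as four bytes in little-endian order, with the byte loop unrolled by hand; that reading and the function name `u32_to_bytes` are hypothetical, not taken from the comment.

```c
#include <stdint.h>

/* Hypothetical kernel: store a 32-bit value into a 4-byte buffer,
   least significant byte first, with the per-byte loop unrolled. */
void u32_to_bytes(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)(v & 0xFFu);
    out[1] = (unsigned char)((v >> 8) & 0xFFu);
    out[2] = (unsigned char)((v >> 16) & 0xFFu);
    out[3] = (unsigned char)((v >> 24) & 0xFFu);
}
```

Wrapping the body in IACA's start/end markers and building it once with the -march setting mentioned above and once for generic amd64 would let IACA report a cycle estimate for each build side by side.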

