>I wanted something that could give me a few seconds worth of samples of where the instruction register was spending its time

You could write this with Pin [1], but I'd be surprised if there weren't already a profiling tool that offers instruction-level analysis. If there truly isn't, the example tools that ship with Pin can be modified fairly quickly to do this.
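To give a rough idea of how little bookkeeping is involved, here is a sketch of just the counting part in plain C++. `IpHistogram` and its methods are hypothetical names I made up for illustration, not part of Pin. The Pin hookup itself is only described in a comment, so treat this as a starting point rather than a working tool.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Pin): counts how often each
// instruction address is observed.
class IpHistogram {
public:
    // In a Pin tool, this would be called from an analysis routine that
    // receives the instruction address, which Pin can pass when the call
    // is inserted with the IARG_INST_PTR argument.
    void record(std::uint64_t ip) { ++counts_[ip]; }

    // Returns up to n (address, count) pairs, most frequent first.
    // Ties are broken by lower address so the order is deterministic.
    std::vector<std::pair<std::uint64_t, std::uint64_t>>
    hottest(std::size_t n) const {
        std::vector<std::pair<std::uint64_t, std::uint64_t>> v(
            counts_.begin(), counts_.end());
        std::sort(v.begin(), v.end(), [](const auto& a, const auto& b) {
            if (a.second != b.second) return a.second > b.second;
            return a.first < b.first;
        });
        if (v.size() > n) v.resize(n);
        return v;
    }

private:
    std::unordered_map<std::uint64_t, std::uint64_t> counts_;
};
```

Calling `record()` for each executed instruction and dumping `hottest(20)` at exit would give the "where is the time going" list. Note that this sketch is not thread-safe, so a multithreaded target would need a lock or per-thread histograms.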

1. https://software.intel.com/en-us/articles/pin-a-dynamic-bina...

