
All you need is an assembler (a language with a built-in assembler helps with the learning curve) and the processor manual. Ideally the manual's assembler syntax will match your assembler's syntax; this is not the case for e.g. GNU as, which defaults to AT&T syntax, and the Intel x86 or x64 manuals. Build a timing loop (timing instructions like rdtsc are right there in the manual) and get to profiling instructions. Read up on the hidden performance models behind the various instructions, and rework your timing loop to reset that hidden state where possible. Read up on caches, branch prediction, register renaming etc. Then play around: try to reproduce the positive and negative sides of these optimizations and develop your intuitions.
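For what it's worth, here is a rough sketch of such a timing loop in C rather than raw assembly. It assumes GCC or Clang (for the `__rdtsc()` intrinsic from `<x86intrin.h>` and the empty inline-asm barrier) and falls back to a monotonic clock on non-x86 targets. The names `read_ticks`, `min_ticks` and `dependent_adds` are made up for illustration. Keep in mind that rdtsc is not a serializing instruction, so very short regions can be blurred by out-of-order execution, and that on recent processors the counter typically ticks at a constant reference rate rather than at the current core clock.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>   /* GCC/Clang header that provides __rdtsc() */
static inline uint64_t read_ticks(void) { return __rdtsc(); }
#else
/* Assumed fallback for non-x86 targets: nanoseconds from a monotonic clock. */
#include <time.h>
static inline uint64_t read_ticks(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/* Volatile so the compiler cannot discard the kernel's result. */
static volatile uint64_t sink;

/* Hypothetical kernel under test: 1000 dependent integer additions. */
static void dependent_adds(void) {
    uint64_t x = sink;
    for (int i = 0; i < 1000; i++) {
        x += (uint64_t)i;
        /* Empty asm barrier: keeps x live in a register each iteration so
           the loop is not folded into a closed-form sum. */
        __asm__ volatile("" : "+r"(x));
    }
    sink = x;
}

/* Run fn `runs` times and return the smallest tick count observed.
   Taking the minimum filters out interrupts and cold-cache first runs. */
static uint64_t min_ticks(void (*fn)(void), int runs) {
    uint64_t best = UINT64_MAX;
    for (int r = 0; r < runs; r++) {
        uint64_t start = read_ticks();
        fn();
        uint64_t stop = read_ticks();
        if (stop - start < best) {
            best = stop - start;
        }
    }
    return best;
}
```

Calling `min_ticks(dependent_adds, 100)` and dividing the result by 1000 gives a rough per-iteration cost on your machine. Swap in a different kernel (independent additions, multiplies, loads) and compare the numbers against what the manual leads you to expect.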

I don't think there's a better - or easier - way.



A timing loop will tell you how fast it runs on your particular configuration, but not how fast you can expect it to run in the general case. For that, you need to get very intimate with the processor manuals, especially the optimization guides. Also, a good profiler, like Intel's VTune, is great for getting low-level performance data such as cache misses.
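Cache misses of the kind a profiler reports can also be provoked on purpose, which is a cheap way to build intuition before reaching for a tool. A minimal sketch in standard C, with hypothetical helper names `walk` and `seconds_for_walk`: touch the same number of bytes with different strides and compare the wall time. The 64-byte line size mentioned below is an assumption, so check your processor's manual, and note that hardware prefetchers may hide part of the difference for regular strides.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Read `touches` bytes from buf, advancing by `stride` each time and
   wrapping around at `size`. Requires 0 < stride <= size.
   Returns the sum of the bytes read so the loads cannot be optimized away. */
static uint64_t walk(const uint8_t *buf, size_t size, size_t stride,
                     size_t touches) {
    uint64_t sum = 0;
    size_t idx = 0;
    for (size_t i = 0; i < touches; i++) {
        sum += buf[idx];
        idx += stride;
        if (idx >= size) {
            idx -= size;
        }
    }
    return sum;
}

/* Time one walk with the standard clock() function, which measures CPU
   time. Returns seconds and stores the checksum in *sum_out. */
static double seconds_for_walk(const uint8_t *buf, size_t size, size_t stride,
                               size_t touches, uint64_t *sum_out) {
    clock_t start = clock();
    *sum_out = walk(buf, size, stride, touches);
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

With a buffer much larger than your last-level cache, try stride 1 against stride 64 (one touch per assumed cache line) for the same `touches` count. The gap you see holds only for the machine you ran it on.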

Also, doing what you suggest on several different hardware configurations would be useful, so you can compare various processor architectures.



