
Ask HN: Quintessential readings for software optimization - kansai
What are some of the best resources for getting more familiar with stuff like playing nicely with branch prediction, instruction pipelines, and building data to fit well in with CPU cache?
======
Paul_Diraq
The best resource, in my opinion, is still "What every programmer should know
about memory." It is now over ten years old, but it is still the most
comprehensive and comprehensible article I know of on that topic.

[https://lwn.net/Articles/250967/](https://lwn.net/Articles/250967/)

AFAIK there have been no major breakthroughs in memory technology since then
(transactional memory could be one). While modern processors may be built a
bit differently (e.g. an added shared L3 cache), you will have no problem
understanding them.
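A tiny illustration of the cache-line idea the article covers at length (my own sketch, not code from the article; the function names are made up): the same row-major matrix summed in row order and in column order. Both return the same total, but for matrices too large for the cache the column-order loop typically runs much slower, because consecutive accesses land on different cache lines and most of each fetched line goes unused before it is evicted.

```c
#include <stddef.h>

/* Sum an n x n matrix stored row-major in a flat array.
   Row-order traversal touches consecutive addresses, so every
   element of each fetched cache line is used. */
double sum_row_order(const double *m, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            s += m[i * n + j];   /* stride of 1 element */
    return s;
}

/* Same total, but column-order traversal jumps n * sizeof(double)
   bytes between accesses; for large n, nearly every access needs
   a different cache line. */
double sum_col_order(const double *m, size_t n) {
    double s = 0.0;
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            s += m[i * n + j];   /* stride of n elements */
    return s;
}
```

Timing both on, say, a few thousand by a few thousand matrix is a quick way to see the effect on your own machine.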

There is also this GitHub repository:

[https://github.com/Kobzol/hardware-effects](https://github.com/Kobzol/hardware-effects)

for small programs that should demonstrate such effects. (I haven't used them
yet, but they may be interesting for you to play with.)
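As a taste of the kind of effect such programs show, here is the classic branch-prediction experiment (a sketch of my own, not taken from that repository; all names are hypothetical). `sum_if_large` branches on every element: on random values in 0..255 the branch is unpredictable, while after `sort_ints` it is taken in one long run and predicted almost perfectly. `sum_if_large_branchless` replaces the branch with a bit mask. Optimizing compilers may already turn the `if` into a conditional move or vectorize the loop, which hides the effect, so check the generated assembly before drawing conclusions from timings.

```c
#include <stddef.h>
#include <stdlib.h>

/* Branchy version: a data-dependent conditional jump per element
   (unless the compiler rewrites it). */
long long sum_if_large(const int *a, size_t n) {
    long long s = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] >= 128)
            s += a[i];
    return s;
}

/* Branchless version: the comparison yields 0 or 1, and negating it
   gives a mask of 0 or all ones, so there is no jump to mispredict. */
long long sum_if_large_branchless(const int *a, size_t n) {
    long long s = 0;
    for (size_t i = 0; i < n; i++) {
        int mask = -(a[i] >= 128);
        s += a[i] & mask;
    }
    return s;
}

static int cmp_int(const void *x, const void *y) {
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);   /* avoids overflow of a - b */
}

/* Sorting the input makes the branchy version predictable. */
void sort_ints(int *a, size_t n) {
    qsort(a, n, sizeof *a, cmp_int);
}
```

To see the difference, fill a large array with `rand() % 256`, time `sum_if_large`, then call `sort_ints` and time it again.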

IT Hare has published a nice infographic on the cost of common operations in
CPU clock cycles, including rule-of-thumb numbers.

[http://ithare.com/infographics-operation-costs-in-cpu-clock-...](http://ithare.com/infographics-operation-costs-in-cpu-clock-cycles/)
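One way to check rule-of-thumb memory costs like these on your own hardware is a pointer-chasing loop, sketched below (my own illustration, not from the IT Hare article; the names are hypothetical). `make_cycle` builds a random single cycle with Sattolo's algorithm so the prefetcher cannot guess the next address, and `chase` makes each load's address depend on the previous load, so the loads cannot overlap. Growing `n` past each cache size should make the per-load time step up; for large arrays TLB misses are included in the measurement too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Fill next[] with a random permutation of 0..n-1 that forms a single
   cycle (Sattolo's algorithm). rand() is used only for brevity: it is
   biased and may have a small range, but the single-cycle property
   holds for any j chosen in [0, i). */
void make_cycle(size_t *next, size_t n) {
    if (n == 0) return;
    for (size_t i = 0; i < n; i++) next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   /* j < i, never i itself */
        size_t t = next[i];
        next[i] = next[j];
        next[j] = t;
    }
}

/* Follow the chain `steps` times starting at slot 0. Each load depends
   on the previous one, exposing latency rather than throughput. */
size_t chase(const size_t *next, size_t steps) {
    size_t p = 0;
    for (size_t s = 0; s < steps; s++) p = next[p];
    return p;
}

/* Rough average nanoseconds per dependent load, using processor time
   from clock(); use enough steps that the total is well above the
   clock resolution. */
double ns_per_load(const size_t *next, size_t steps) {
    assert(steps > 0);
    clock_t c0 = clock();
    volatile size_t sink = chase(next, steps);   /* keeps the loop alive */
    clock_t c1 = clock();
    (void)sink;
    double secs = (double)(c1 - c0) / CLOCKS_PER_SEC;
    return secs * 1e9 / (double)steps;
}
```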

Then there are Agner Fog's instruction tables, which give latencies and
reciprocal throughputs for individual instructions.

[https://www.agner.org/optimize/instruction_tables.pdf](https://www.agner.org/optimize/instruction_tables.pdf)
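The difference between the two columns matters in practice. Latency is how long a dependent instruction has to wait for the result; reciprocal throughput is the average spacing between independent instructions of that kind. A sketch of my own (names hypothetical): a single-accumulator sum is a chain of dependent adds and is bound by add latency, while splitting it into independent accumulators lets several adds be in flight at once. Roughly latency divided by reciprocal throughput accumulators are needed to saturate the unit; four here is an arbitrary choice.

```c
#include <stddef.h>

/* One accumulator: each add waits for the previous one, so the loop
   runs at about one element per add latency. */
double sum1(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += a[i];
    return s;
}

/* Four independent accumulators: up to four adds can overlap, so the
   loop can approach the add throughput limit instead. This reorders
   the additions, so the floating-point result may differ slightly
   from sum1; compilers won't do this on their own unless allowed to
   reassociate (e.g. GCC/Clang -ffast-math). */
double sum4(const double *a, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++) s0 += a[i];   /* leftover elements */
    return (s0 + s1) + (s2 + s3);
}
```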

Once you are through these, read the Intel and AMD optimization guides. (ARM
may or may not have something similar.)

The paper by Kazushige Goto ("Anatomy of High-Performance Matrix
Multiplication") is an example of cache and TLB considerations applied to a
non-trivial problem.

[http://www.cs.utexas.edu/users/pingali/CS378/2008sp/papers/g...](http://www.cs.utexas.edu/users/pingali/CS378/2008sp/papers/gotoPaper.pdf)
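For a first feel of the idea, here is plain loop tiling (cache blocking) for C += A * B. This is a deliberately simplified sketch of my own, not the method from the paper, which goes much further (packing blocks of the operands into contiguous buffers and sizing them with the caches and TLB in mind). The tile size `bs` is a hypothetical tuning parameter: one starting point is to choose it so that three `bs` x `bs` tiles of doubles fit comfortably in the cache level you are targeting.

```c
#include <assert.h>
#include <stddef.h>

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* C += A * B for n x n row-major matrices, processed tile by tile so
   the data each tile touches can stay in cache while it is reused. */
void matmul_blocked(const double *A, const double *B, double *C,
                    size_t n, size_t bs) {
    assert(bs > 0);
    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t kk = 0; kk < n; kk += bs)
            for (size_t jj = 0; jj < n; jj += bs) {
                size_t i_end = min_sz(ii + bs, n);
                size_t k_end = min_sz(kk + bs, n);
                size_t j_end = min_sz(jj + bs, n);
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = A[i * n + k];
                        /* innermost loop walks rows of B and C contiguously */
                        for (size_t j = jj; j < j_end; j++)
                            C[i * n + j] += aik * B[k * n + j];
                    }
            }
}
```

Comparing it against the naive triple loop for a range of `bs` values on a large `n` shows how much the tile size matters.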

