
Putting Your Data and Code in Order: Optimization and Memory - ingve
https://software.intel.com/en-us/articles/putting-your-data-and-code-in-order-optimization-and-memory-part-1
======
marmaduke
This is 2016 and the article's author is using GCC 4.1 (released in 2006)? I
wonder if a modern GCC does the autovectorizations that the author says v4.1
does not do.
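For reference, this is the kind of loop we're talking about — a hypothetical SAXPY-style kernel of my own, not the article's exact code. Modern GCC at -O3 (or -O2 with -ftree-vectorize) will typically vectorize a loop like this, and -fopt-info-vec makes it report which loops it vectorized:

```c
#include <stddef.h>

/* A straightforwardly vectorizable kernel (my own example, not taken
 * from the article). The restrict qualifier tells the compiler that y
 * does not alias x, which is often what unlocks autovectorization. */
void saxpy(size_t n, float a, const float *x, float *restrict y) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Compiling with `gcc -O3 -fopt-info-vec -c saxpy.c` on a recent GCC prints a "loop vectorized" note for this loop.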

~~~
wallacoloo
Furthermore, the ICC version he is comparing it to (16.1) _hasn't even been
released yet_, as far as I can tell. Wikipedia lists 16.0 as the latest stable
version, and it's _really_ difficult to find out which version of ICC is
available for download on Intel's website without navigating through tens of
pages and submitting at least one form.

Also, the article brought up the idea of block sizing, which _sounds_
compelling. But the writer failed to produce a benchmark for it that did
better than the baseline on ICC, and then kept on writing as if block sizing
had merit without even commenting on this discrepancy.
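For anyone unfamiliar with the technique: "block sizing" here means loop blocking/tiling, restructuring a loop nest to work on cache-sized tiles. A minimal sketch of what that looks like for matrix multiply, assuming square N×N matrices and a block size BS that divides N evenly (the dimensions below are mine, not the article's 4000×4000):

```c
#include <stddef.h>

#define N  64   /* matrix dimension (small here; article uses up to 4000) */
#define BS 8    /* block size, as in the article's Table 1 */

/* Naive triple loop for reference: walks an entire row of b per output
 * element, so large matrices thrash the cache. */
static void matmul_naive(const double a[N][N], const double b[N][N],
                         double c[N][N]) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
}

/* Blocked version: iterate over BS x BS tiles so the working set of the
 * three innermost loops stays cache-resident. */
static void matmul_blocked(const double a[N][N], const double b[N][N],
                           double c[N][N]) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            c[i][j] = 0.0;
    for (size_t ii = 0; ii < N; ii += BS)
        for (size_t kk = 0; kk < N; kk += BS)
            for (size_t jj = 0; jj < N; jj += BS)
                for (size_t i = ii; i < ii + BS; i++)
                    for (size_t k = kk; k < kk + BS; k++)
                        for (size_t j = jj; j < jj + BS; j++)
                            c[i][j] += a[i][k] * b[k][j];
}
```

Both compute the same product; the blocked version only pays off once the matrices outgrow the cache, which is exactly why a benchmark that beats the baseline matters.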

~~~
pierrealexandre
If I understand Table 1 correctly, it shows that with ICC a block size of 8
with 4000x4000 matrices gives a speedup factor of roughly 1.65 (10.53 / 6.38).
The unit is not clearly stated, but it looks like a bigger number means
better "performance", i.e. a faster program.

~~~
wallacoloo
Oh, indeed. I believe I was not reading the table as intended.

