

The Cache and Multithreading - jlemoine
http://austingwalters.com/the-cache-and-multithreading/

======
replicant
The first two examples are not dividing the work between the threads, but having
each thread repeat the full work, which is not merely poor OpenMP use but
wrong use. I would also have used the collapse directive and played a little
with the schedule. Finally, iterating over the first index in the inner loop is
a bad idea, and not only when working with OpenMP.

------
deletes
Comment on the blog that got deleted:

I did a similar test in C and got very similar results. Around N = 4000 the
thrashing version starts to differ substantially; a 3x difference can already
be seen at N = 1000.

 _This means if your program is running on two threads over different parts of
the matrix, every single iteration requires a request to RAM._

I'm skeptical of this part; I tried to replicate this behavior but was
unsuccessful. Even though the cores share L3, I doubt that a thread will
overwrite the entire cache on every iteration.

~~~
lettergram
Author here. I have to manually approve comments (there is far too much spam
not to), and I did not delete this one, sorry. Different compilers and
architectures will produce different results; I explained that in a previous
post.

Either way, you should see a noticeable difference as the size increases,
which was the point.

------
pron
For all those interested in this subject, I'd like to recommend Nitsan
Wakart's blog, [http://psy-lob-saw.blogspot.com/](http://psy-lob-saw.blogspot.com/),
which is dedicated to mechanical sympathy relating to concurrency and the
memory system.

~~~
tjaerv
And, of course, the Mechanical Sympathy blog itself:

[http://mechanical-sympathy.blogspot.com](http://mechanical-sympathy.blogspot.com)

