
What Every Programmer Should Know About Memory (2007) [pdf] - jxub
https://akkadia.org/drepper/cpumemory.pdf
======
saberience
Is this title and content supposed to be ironic?

I quickly perused the article and I think this link should be renamed "What
99.9% of programmers don't need to know about memory."

I've managed to go from Associate to Principal without knowing 99% of what's
covered in this document, and I'm struggling to understand why the average
Java, C#, Python, Rust, <insert language here> programmer would need to know
about transistor configurations or voltages, pin configurations, etc. Let
alone 114 pages of low level hardware diagrams and jargon!

This document is for someone working on low level drivers for memory, or
working on the hardware side. For any normal software engineer, this
information is not helpful for doing your job.

~~~
dragontamer
I would argue that the overclocker needs to know more about these details than
most programmers, yes. (Overclockers actually tweak these values to maximize
the performance of their computer).

But any high-performance programmer needs to understand the RAS / CAS / PRE
cycle, if only to understand WHY the "streaming" of data is efficient, while
random-access is very inefficient.

If you are accessing RAM randomly, you'd better be sure it's within L3 cache
(or nearer). I've done some experiments, and "streaming" data from beginning
to end can be 2x to 3x faster than random access on modern DDR4 RAM.

Understanding the RAS / CAS / PRE cycle helps me understand why streaming data
to RAM is faster. And understanding that cells are simply capacitors helps me
understand why the RAS / CAS / PRE cycle is necessary in DRAM.
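
A rough way to try this kind of experiment yourself is sketched below. It is
only an illustration, not the benchmark behind the numbers above: the names
`make_data`, `sum_in_order` and `compare` are made up for this example, and
because it is plain Python, interpreter overhead will likely shrink and blur
the gap compared to the same test written in C.

```python
import random
import time
from array import array


def make_data(n):
    """Contiguous buffer of n 64-bit integers holding 0, 1, ..., n-1."""
    return array("q", range(n))


def sum_in_order(data, order):
    """Sum data[i] for each i in order; the access pattern is set by order."""
    total = 0
    for i in order:
        total += data[i]
    return total


def compare(n=1_000_000, seed=0):
    """Return (sequential_seconds, shuffled_seconds, sums_match)."""
    data = make_data(n)
    sequential = list(range(n))   # streaming: 0, 1, 2, ...
    shuffled = sequential[:]      # same indices, visited in random order
    random.Random(seed).shuffle(shuffled)

    t0 = time.perf_counter()
    s1 = sum_in_order(data, sequential)
    t1 = time.perf_counter()
    s2 = sum_in_order(data, shuffled)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, s1 == s2
```

Calling `compare()` with a buffer much larger than your last-level cache and
comparing the two timings is the interesting case. Both traversals touch
exactly the same elements, so any difference comes from the order of access.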

~~~
ummonk
You don’t need to know any of that to know that streaming access is more
efficient than random access. You just need to know that caches cache
localized blocks. Actually, you don’t even need to know that. You could even
just be told that streaming access is faster than random access.

~~~
fizwhiz
This is how cargo cult programming begins.

~~~
arvinsim
It's called working on the relevant level of abstraction.

------
kevstev
There are some salty comments here, but I think the context is important. This
paper passed across my desk in early 2008 when I was doing HFT stuff. It might
be a bit of a stretch to say that the reason people are taught about cache
lines in most CS programs is because of this paper, but at the time this paper
was written, this was really specialized knowledge and groundbreaking to most
software developers. This would go on to be a popular topic on C++ blogs from
Important People (Boost maintainers, STL devs, etc) at least for the next 5
years.

Also, if you know Ulrich Drepper at all, either from some of his talks or his
mailing list presence, this is just a very fitting title from him. Just pure
deadpan: you think it's funny, he probably does not, and the fact that you
think it's amusing just disappoints him, like a professor looking out at
freshman undergrads wondering how he got stuck teaching this class.

~~~
netmonk
I really do agree with you. Having done HFT for some years, I found this paper
crucial back when HFT was running on Linux systems. Now FPGAs have taken over
the field. Different kind of tech.

------
CalChris
I wish Ulrich Drepper (thank you, Mr. Drepper) would update this with a
section on _Row Hammer_ and also _Spectre_ and _Meltdown_. Programmers need
to know about memory _because_ of these exploits, more so with the latter two,
in order to avoid creating exploitable gadgets.

But then I also think that _What Every Computer Scientist Should Know About
Floating-Point Arithmetic_ should be updated to include UNUMs. I don't think
that will happen either. Also, thank you Mr. Goldberg.

~~~
131012
Thanks for bringing that up, I came to the comments section to see if some
other details were outdated. Any other insights?

~~~
CalChris
Well, I think hardware and software prefetch (speculation is a special form of
prefetch) might be revisited. Software prefetch was at best a hopeful
technology even in 2007 and is widely avoided today.

[https://yarchive.net/comp/linux/software_prefetching.html](https://yarchive.net/comp/linux/software_prefetching.html)

Something I don't think either Drepper or Hennessy + Patterson's books get
across is memory banks from a programmer's perspective. How cache organization
affects a program is explained well, but how banks affect it isn't. Their
construction is covered; their visibility to software is not.

------
louthy
What every programmer should know about memory shouldn't be 114 pages long.

~~~
saagarjha
"A bit more than what most programmers need to know about memory, but would be
nice if they read anyways"?

~~~
asaddhamani
A _lot_ more. I don't see how most programmers would benefit from reading 100+
pages of low-level details about memory.

~~~
mhh__
Because knowing your craft is important? Besides, if you don't find this
interesting, why become a programmer in the first place?

~~~
Chlorus
There's a lot to learn about this craft, and people have to prioritize -
knowing algorithms & data structures is more immediately useful compared to,
say, knowing what scratchpad memory is. If I spent my time learning every
detail about every system underpinning every abstraction, I would literally be
70 years old by the time I started writing code.

> Besides, if you don't find this interesting, why become a programmer in the
> first place?

Who is saying it's not interesting? We're arguing that, for most programmers,
knowing the difference between RAS and CAS latency for SDRAM is not
fundamentally vital.

~~~
drainyard
But learning algorithms and data structures literally requires you to know
about memory on a pretty low level. As I'm sure you know, a lot of algorithms
and data structures that are theoretically equal can be vastly different in
practice in no small part because of how they use memory.
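
As a small illustration of that point (a sketch with made-up helper names,
not taken from the paper): the two functions below do the same O(rows * cols)
work on the same row-major buffer and differ only in loop order. One walks
memory contiguously, the other jumps `cols` elements at a time. In plain
Python the interpreter overhead hides much of the effect, so treat any timing
you get as a hint rather than a measurement.

```python
import time
from array import array


def make_matrix(rows, cols):
    """Row-major matrix stored in one contiguous buffer of 64-bit integers."""
    return array("q", range(rows * cols))


def sum_row_major(m, rows, cols):
    """Visit elements in storage order: consecutive indices."""
    total = 0
    for r in range(rows):
        for c in range(cols):
            total += m[r * cols + c]
    return total


def sum_col_major(m, rows, cols):
    """Visit elements column by column: each step strides by cols elements."""
    total = 0
    for c in range(cols):
        for r in range(rows):
            total += m[r * cols + c]
    return total


def timed(fn, *args):
    """Return (result, elapsed_seconds) for a single call of fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

For example, build `m = make_matrix(2000, 2000)` and compare
`timed(sum_row_major, m, 2000, 2000)` with `timed(sum_col_major, m, 2000, 2000)`.
The sums are identical; only the access pattern differs.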

------
orzig
I was so excited to dive into this, but ended up with the same takeaway as
most other commenters. Aside: As a data scientist, I’ve been surprised how
much I’ve needed to learn about the finer points of optimizing GPU utilization
for training.

I've learned it all from more experienced coworkers, and I would much
appreciate any resources anybody could point me to (free or paid) so that I
could round out my knowledge.

~~~
dragontamer
Learn enough about GPUs to be able to read the profiler. That should be your
#1 goal: learning to use the profiler and performance counters.

The profiler not only tells you how fast your code is, but also _why_ your
code is fast or slow... at least to the best ability of the hardware
performance counters.

Is it RAM-bottlenecked? Is it Compute bound? Are your Warps highly utilized?
Etc. etc. If you don't know what the profiler is saying, then study some more.

[https://docs.nvidia.com/nsight-visual-studio-edition/Content...](https://docs.nvidia.com/nsight-visual-studio-edition/Content/Analysis/Report/CudaExperiments/KernelLevel/PerformanceCounters.htm)

------
mjw1007
Interesting that in 2007 he thought FB-DRAM was going to win. That seems to
have been about the time it dropped dead.

~~~
Const-me
Right, also about NUMA:

> It is expected that, from late 2008 on, every SMP machine will use NUMA.

Outside servers, that still hasn't happened.

NUMA machines are not that exotic anymore and are no longer exclusive to very
expensive servers, e.g. Threadripper 2920X is a $650 CPU, but market
penetration is still low.

------
blakehaswell
For an accessible talk about the real-world implications of this, I enjoy
watching Mike Acton's CppCon talk "Data-Oriented Design and C++":
[https://www.youtube.com/watch?v=rX0ItVEVjHc](https://www.youtube.com/watch?v=rX0ItVEVjHc)

~~~
drainyard
I routinely re-watch this talk. It always gets me back in the right mindset.

------
busfahrer
Ulrich Drepper used to be the glibc maintainer, IIRC

~~~
v_lisivka
memcpy != memmove , $#$@#@

[https://sourceware.org/bugzilla/show_bug.cgi?id=12518](https://sourceware.org/bugzilla/show_bug.cgi?id=12518)

~~~
okl
[https://sourceware.org/bugzilla/show_bug.cgi?id=3266](https://sourceware.org/bugzilla/show_bug.cgi?id=3266)

[https://sourceware.org/bugzilla/show_bug.cgi?id=12701](https://sourceware.org/bugzilla/show_bug.cgi?id=12701)

[https://sourceware.org/bugzilla/show_bug.cgi?id=386](https://sourceware.org/bugzilla/show_bug.cgi?id=386)

[https://sourceware.org/bugzilla/show_bug.cgi?id=10134](https://sourceware.org/bugzilla/show_bug.cgi?id=10134)

~~~
cls59
Also:
[https://sourceware.org/bugzilla/show_bug.cgi?id=10354](https://sourceware.org/bugzilla/show_bug.cgi?id=10354)

