
DDR5 Memory Specification Released - mikecarlton
https://www.anandtech.com/show/15912/ddr5-specification-released-setting-the-stage-for-ddr56400-and-beyond
======
TanjBennett
So many misconceptions here about DRAM. DRAM is miraculously cheap. The
process probably costs about $1.50 to $2 per GB, the rest is indeed profit.
That nets them maybe $4,000 per wafer - and that includes all the testing,
slicing, packaging etc. An average CPU chip in your laptop is about the same
size as maybe 3 DRAM chips which cost around $20.

DRAM runs on a separate process which is dominated by the difficulty of
building the capacitors. These are roughly the shape of a pencil (long narrow
hexagons) where the central structure which holds the capacitor needs to be
etched to perfection in a process that can take days. The transistors
underneath are, at that scale, about as large as the chad from a paper hole
punch. The capacitors are just about as narrow as material science (limit to
voltage arcing through the insulation layers) can make them so there is
glacially slow progress in shrinking DRAM further. Meanwhile the transistors
are at the extreme limits of resolution for liquid-immersion processing, as
are the lines needed to join the rows and columns. Getting those perfect
requires very specialized and competent processing.

They are not easy, second-rate circuits. They are a completely separate branch
of the silicon world. Unfortunately, since they don't scale much any more
(current design methods were mature 8 years ago), the only way you get more of
them is to build new factories. That means it is a seller's market in a game
where building another fab costs $10B and will only succeed if staffed by
really expert people. So, it is generally profitable. The 3 vendors cannot
easily undercut each other, since they all have roughly the same limits, and
any attempt to flood the market takes 4 years to build and everyone can see it
coming.

So there you are. DRAM is the pivotal technology of the current computer era.
Fixing that will most likely require breakthroughs in fundamental memory
technology - or a reason for demand to collapse.

------
ksec
>Combined with die stacking, which allows for up to 8 dies to be stacked as a
single chip, then a 40 element LRDIMM can reach an effective memory capacity
of 2TB. Or for the more humble unbuffered DIMM, this would mean we’ll
eventually see DIMM capacities reach 128GB for your typical dual rank
configuration.

So on 8 channels with 16 DIMMs per socket you could fit a theoretical 32 _TB_
of memory. That is an insane amount of memory, and great for in-memory
databases. (How is Intel Optane going to compete?)
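A quick sanity check on that figure (my own arithmetic, using the article's 2 TB LRDIMM number; the channel count affects bandwidth, not capacity):

```python
# Hypothetical per-socket capacity using the article's figures.
dimms_per_socket = 16  # 8 channels, 16 DIMM slots per socket
tb_per_lrdimm = 2      # 40-element LRDIMM with 8-high die stacks

total_tb = dimms_per_socket * tb_per_lrdimm
print(total_tb)  # 32 TB per socket
```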

This makes me wonder: what makes DRAM so expensive? It is still hovering at a
median price of around $3/GB, compared to NAND at less than $0.1/GB.

~~~
baybal2
> This makes me wonder, what makes DRAM so expensive?

Greed does. DRAM makers have been busted for antitrust violations at least 7
times that I can remember, in Taiwan, Korea, and the USA.

~~~
lend000
The processes to form NAND and DRAM are completely different. DRAM relies on
creating non-leaking capacitors which are highly difficult to manufacture at
such a small scale. NAND benefits from innovations in the CPU lithography
space since it's essentially all transistor based. Why would you expect them
to have the same price, unless you knew nothing about the technology? Also,
there are plenty of distinct competitors in the DRAM space. Do you have a
source suggesting Micron and Samsung are engaging in price fixing together?

~~~
vitus
If the difficulty of DRAM is in creating capacitors at that size, why haven't
we seen a shift toward SRAM (6T, for instance), which is purely transistor-
based?

Sure, you have to sacrifice more transistors for the same capacity, but newer
processes can fit more on the chip, right? I recall from computer architecture
classes that the benefit of DRAM is the ability to use fewer transistors, but
if transistors are cheap...

(I'm sure I'm missing something here. Power consumption / heat generation? I
also never really understood why SRAM continues to be so expensive, when it
seems like it would obviously benefit from smaller processes.)

~~~
formerly_proven
On Intel's 10 nm process you can make SRAM with a density of about 20
megabit/mm². [1] Older processes are much worse (<5 megabit/mm²). [2] A
current-gen _DRAM package_ achieves about 170 megabit/mm² (but that's two
dies, probably stacked). This article [3] cites 8 Gb on 77 mm² on a 21 nm
process, giving 105 megabit/mm², and 148 megabit/mm² for the DDR5 version with
a die size of 54 mm². The same article shows a Samsung part with around 200
megabit/mm² density.

So even if you were to manufacture SRAM on Intel's ultra-expensive 10 nm logic
process, you'd need a massive amount of silicon for the same capacity.

[1] https://fuse.wikichip.org/wp-content/uploads/2017/12/isscc-2018-intel-10-sram-testchip-density.png

[2] https://d3i71xaburhd42.cloudfront.net/f20203949a744276e338d6a4c647e9db3facfdf6/4-Figure13-1.png

[3] https://www.anandtech.com/show/13999/sk-hynix-details-its-ddr56400-dram-chip
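To put those densities side by side, here is a rough area comparison (my own back-of-envelope calculation, using the figures quoted above):

```python
# Silicon area needed for one 8 Gb die at the quoted densities.
sram_density = 20   # Mbit/mm^2, Intel 10 nm SRAM test chip
dram_density = 148  # Mbit/mm^2, SK Hynix DDR5 die (8 Gb on 54 mm^2)

capacity_mbit = 8 * 1000  # 8 Gb, decimal units as in the article

sram_area = capacity_mbit / sram_density  # 400 mm^2
dram_area = capacity_mbit / dram_density  # ~54 mm^2

print(round(sram_area / dram_area, 1))  # SRAM needs ~7.4x the silicon
```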

~~~
g8oz
Forgive my naivete but: 20 megabit/mm^2 for SRAM... a 1U rack is 600mm x 914mm
= 548,400 mm^2. Multiply that by 20 megabit/mm^2 and that is about 11
terabits, or roughly 1.4 terabytes. Does that mean in theory we could build a
rackmount server with an external L1 cache over a terabyte in size? The cost
would be horrendous but I'm sure there is a scenario where it could make
sense.
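The raw math, for what it's worth (ignoring packaging, power, and interconnect, which would dominate in practice):

```python
# Hypothetical 1U tray filled edge to edge with raw SRAM silicon.
area_mm2 = 600 * 914  # 548,400 mm^2
density_mbit = 20     # Mbit/mm^2 on Intel's 10 nm process

capacity_gb = area_mm2 * density_mbit / 8 / 1000  # Mbit -> MB -> GB
print(round(capacity_gb))  # ~1371 GB, i.e. about 1.4 TB
```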

~~~
opwieurposiu
This would require an impractical number of wires. An 8-core, 64-bit CPU with
differential signaling would need something like 8 x (64+64) x 2 = 2048
wires, and the length of the wires would mean the latency would be much worse
than an on-die cache.
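Spelling out that estimate (my reading of the numbers: separate 64-bit read and write paths per core, two wires per differential pair):

```python
cores = 8
data_bits = 64 + 64   # independent read and write buses per core
wires_per_signal = 2  # differential pair

total_wires = cores * data_bits * wires_per_signal
print(total_wires)  # 2048
```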

------
gruez
>All the while, there are several smaller changes [...], such as [...] on-die
ECC

This means we don't have to worry about ECC support by CPU/motherboard
anymore, right?

~~~
wmf
I don't assume anything. Intel may find a way to deliberately cripple ECC in
the memory controller.

~~~
pedrocx486
And the market may find a way to cripple Intel.

I'm looking forward to our ARM future. :-)

------
hedora
I wonder if this spec increases or decreases memory access latency. The
article doesn't say, which makes me suspicious.

After all, DDR4 has higher latency than DDR3 running at the same clock speed.

~~~
hvidgaard
That is generally the trade-off made to get better clock speeds. And the
absolute latency is better once the speed gets sufficiently high.

------
mjw1007
« The big change here is that the command and address bus is being shrunk and
partitioned, with the pins being reallocated to the data bus for the second
memory channel. Instead of a single 24-bit CA bus, DDR5 will have two 7-bit CA
busses, one for each channel »

If there are two 32-bit data busses rather than one 64-bit bus, arithmetic
suggests they shouldn't need to find extra pins from somewhere.

So maybe the rationale for shrinking the CA busses (to 7 rather than 12) is
something different?

~~~
tpxl
[https://images.anandtech.com/doci/15912/DDR5_12.png](https://images.anandtech.com/doci/15912/DDR5_12.png)

DDR4 appears to have had 40 and 32 bit data buses, while this one has 40/40.

~~~
Dylan16807
In other words, data bits stay at 64 but ECC bits go from 8 to 16.

------
luizfelberti
Does this finally address Rowhammer? Ctrl-F on the article yields nothing...

~~~
kube-system
That's more of a die-level issue rather than a module-level issue, isn't it?

~~~
dfox
DDRwhatever is primarily an definition of package level interconnect which has
possibility of being used as module level interconnect as one of design
constraints. And row hammer and similar things are completely irrelevant for
such specifications.

------
bullen
How is just splitting the memory in two separate channels going to make
anything faster?

How will this affect driver complexity and cache-misses?

~~~
wmf
This is sort of explained in the article. I think they had to use burst length
16 [1] to scale to 6400 MT/s, but 16 * 64 bits would be 128 bytes, or two
cache lines. The whole memory system works in cache lines, so it wouldn't be
good if the processor requested one cache line and got two. So they use BL16
with a narrower 32-bit channel to fetch one 64-byte cache line.

As long as multiple cores are accessing memory or prefetching is on (it's
almost always on), both channels will be utilized so software won't notice.

[1] When you do a read operation on DRAM you get a multi-cycle burst of data,
not just one word. This amortizes command/address overhead and presumably
matches the slow-but-wide internal DRAM array with the fast-but-narrow
channel. See
[https://people.freebsd.org/~lstewart/articles/cpumemory.pdf](https://people.freebsd.org/~lstewart/articles/cpumemory.pdf)
sec. 2.2.
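The cache-line arithmetic is easy to check (assuming 64-byte cache lines, as on x86):

```python
CACHE_LINE = 64  # bytes, typical on x86

def burst_bytes(burst_length, channel_bits):
    """Bytes delivered by one read burst."""
    return burst_length * channel_bits // 8

# BL16 on a DDR4-style 64-bit channel over-fetches:
print(burst_bytes(16, 64))  # 128 bytes = two cache lines
# BL16 on one of DDR5's 32-bit sub-channels fits exactly:
print(burst_bytes(16, 32))  # 64 bytes = one cache line
```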

~~~
LargoLasskhyfv
Reminds me a little bit of Virtual Channel Memory (VCM) SDRAM from NEC.

------
legulere
Are there any changes in there for bulk memory operations such as copying or
zeroing?

------
pavehawk2007
Wonder if this spec will make it easy for embedded systems to catch up. It
always seems like they lag behind what's cutting edge. Maybe that's a
cost/benefit analysis.

~~~
lnsru
I have a brand new design with DDR2. I can power memory from existing 1.8V
rail, no need for more voltage regulators. And 400 MHz is totally ok for me
since I can have whole memory bandwidth for myself, no operating system, etc.
And my application is very cutting edge for sure in its domain.

~~~
pavehawk2007
I'm assuming LPDDR? I think the goals of the two are a bit different. I think
embedded gets quite messy since each design is so specifically targeted.

Thanks for letting me know of your experience!

~~~
lnsru
MT47 family. Very normal 1.8V DDR2.

------
tlhunter
Both DDR4 and DDR5 DIMMs have the same 288-pin count. Hopefully nothing bad
happens if a stick is plugged into the wrong socket.

~~~
wtallis
Same pin _count_, but different keying provides a mechanical barrier against
inserting the wrong kind of module.

------
anticensor
Why not QDR2?

~~~
ATsch
Despite sounding related, QDR and DDR are mostly unrelated technologies. They
are also both poorly named.

The real purpose of DDR is not actually to double the data rate, but to halve
your clock speed and allow you to use the same frequency for your clock as
your data. This mostly benefits signal integrity.

QDR is better understood as memory with two ports, one for reading and one for
writing, which can be used at the same time. This is a lot more expensive and
really doesn't have huge benefits for PCs compared to just adding more
channels (as DDR5 does).

~~~
Dylan16807
Well there's "QDR" memory like you describe, and then there's "QDR" like
GDDR5X and GDDR6 have, which is a single port doing 4 transfers per clock.

That said DDR5 is solidly on two transfers per clock, so I don't understand
the suggestion to call it or use "QDR2" (what's QDR1 for contrast?).

~~~
ATsch
QDR/QDR2 are real interfaces, although for SRAM. AFAICT Micron and Cypress
make them for specialty applications.

------
fnord77
so, will we see mini-ITX mobos that support 128GB RAM?

~~~
wmf
Yes, although maybe not soon.

------
gswdh
This may be a stupid point, but, for personal use of computers in their
current form, how much memory do you really need? I’m still a little baffled
why chrome requires GBs of memory...? Can we have lean software please?

~~~
jiggawatts
In a word: No.

CPUs have become so fast that, relative to their "internal" speeds, RAM is the
new hard disk. Databases are becoming in-memory, and going out to fixed
storage, even SSD, is anathema.

New applications are not designed to work on data sets bigger than physical
memory. Disk-to-disk streaming algorithms are practically unheard of outside
of a few niche scenarios. Like I said, even database vendors are moving to in-
memory!

I love machines with huge amounts of memory. My laptop has 64 GB, and it's
great! I can run entire fleets of servers in a local hypervisor. I can load
huge blobs of CSV or JSON data into the shell and not have to worry about the
2-5x overhead of the in-memory representation. It'll fit just fine. I can run
every "bloated" app at once and still have 50 GB free for "whatever". I've
reindexed a database on my laptop in minutes that would have taken days(!) on
a production server because it didn't have enough RAM and was thrashing the
storage like crazy.

Another way to look at it is the "GB per CPU core". With existing AMD EPYC 2
CPUs having 64 cores and 128 threads, the typical 512 GB memory configuration
is "only" 8 GB per core, or 4 GB per thread! With a dual-socket server, halve
those numbers again. Similarly, mainstream desktop Ryzen CPUs have up to 16
cores, and that's not even talking about the not-so-mainstream Threadripper
line. For 4GB per core, you'd need 64 GB.

It's likely that AMD will release 24 or 32 core _mainstream_ CPUs in the near
future, maybe as soon as 2 years from now when their 5nm products start
shipping. I fully expect server CPUs to hit 96-128 cores per socket around the
same time frame, or up to 512 hardware threads in a standard two-socket
server. Terabytes of memory is going to become "standard" very soon now.
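The GB-per-core arithmetic above, written out (my numbers, mirroring the comment's figures):

```python
# EPYC 2 single socket with a typical 512 GB configuration.
total_gb, cores, threads = 512, 64, 128
print(total_gb / cores)    # 8.0 GB per core
print(total_gb / threads)  # 4.0 GB per thread

# Mainstream 16-core Ryzen at the same 4 GB/core target.
print(16 * 4)  # 64 GB
```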

~~~
labawi
It really is nice when you can afford to have the latest doubling of memory,
and do things you couldn't do easily before. Maybe run an entire DC on your
computer. Works very well while you're on the upper end.

However, that does not address the sheer wastefulness of our technological
trends to require more resources to do things slower, but displayed with
smaller and more colorful pixels. Should everyone have a 64-core 512GB memory
computer to view web pages, play minecraft or whatever? Will that be too small
to write a text document in 20 years time? Will every person on the planet be
expected to get a bigger computer because they can't run the (electron-in-
ethereum-on-browser-in-container)^n pancomputer?

