

AMD's high-bandwidth memory explained - geoffgasior
http://techreport.com/review/28294/amd-high-bandwidth-memory-explained

======
ChuckMcM
It will be interesting to see how this holds up longer term. I read a great
paper from ISSCC on making chips thin enough for stacking, and their
conclusion was that it takes very little silicon to build most chips: thin
enough that you could stack six old 130nm dies and get 22nm transistor
density in the same overall thickness. Heat is still an issue of course, as
is alignment (if you heat one sliver before the others warm up, it can
apparently push itself out of alignment).

That said, I'd expect this to become the memory for laptops in the not-too-
distant future. A Core i7 with 16GB of DDR5 stacked on top of the CPU, and
all that freed-up space goes to more battery. Look for it in a MacBook near
you :-)

~~~
Zenst
I'd have thought the central connecting holes that are used to link the
wafers together as a bus would also act as a heatsink, balancing heat across
the stack.

Also, the lower speeds used mean less heat; heat may well be the current
limit, since too much of it would warp individual wafers in the stack.

As for desktops etc., one avenue this opens up is a more standard socket: the
CPU sits on the interposer, which in effect acts as another socket, so
changes could be made at that level to stretch out socket lifetimes some
CPUs would normally never reach.

------
stephengillie
I had never looked at a GPU this way before, but it's essentially a "slocket".
The name comes from the Pentium II and some Celerons, which mounted onto an
add-on card that also carried the external L2 cache RAM. Mounting it on the
daughterboard granted physical proximity and thus low latency.
[http://en.wikipedia.org/wiki/Pentium_II](http://en.wikipedia.org/wiki/Pentium_II)

Does this count as a 3DIC?

Also, obligatory:
[http://i.imgur.com/v7loIkF.jpg](http://i.imgur.com/v7loIkF.jpg)

I've seen GPUs advertised as having 128-bit buses, but I didn't know GDDR5
chips were only 32-bit each. This feels a bit like the Rambus vs. DDR battle
all over again: a high clock combined with a narrow bus produced higher
latencies than a slower clock with a wider bus, which had the added benefit
of being cheaper overall.
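
A back-of-the-envelope sketch of the two approaches, in Python (the per-pin
rates below are ballpark figures for GDDR5 and first-gen HBM I'm assuming,
not exact specs):

    # Peak bandwidth = bytes per transfer * transfers per second.
    def bandwidth_gb_s(bus_width_bits, transfer_rate_mt_s):
        return bus_width_bits / 8 * transfer_rate_mt_s * 1e6 / 1e9

    # One GDDR5 chip: narrow 32-bit bus at a high per-pin rate (~7000 MT/s).
    gddr5_chip = bandwidth_gb_s(32, 7000)    # ~28 GB/s per chip

    # One HBM stack: wide 1024-bit bus at a low per-pin rate (~1000 MT/s).
    hbm_stack = bandwidth_gb_s(1024, 1000)   # ~128 GB/s per stack

    print(f"GDDR5 chip ~{gddr5_chip:.0f} GB/s, HBM stack ~{hbm_stack:.0f} GB/s")

The narrow bus has to run its pins roughly 7x faster to approach a fraction
of what the wide, slow bus delivers per device.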

~~~
bryanlarsen
A slocket is a PCB with an edge connector, just like current graphics cards.

This is a multi-chip module, or MCM
([http://en.wikipedia.org/wiki/Multi-chip_module](http://en.wikipedia.org/wiki/Multi-chip_module)).
Intel also uses them to put the northbridge on the same package as the CPU
for its mobile CPUs.

There are several innovative aspects to this MCM, but MCMs have been around
for a long time.

A single GDDR5 chip is 32-bit. GPUs use many chips in parallel to achieve
wider buses.
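
A minimal illustration (the chip and stack counts here are typical
configurations I'm assuming, not any particular card):

    # A "256-bit" GDDR5 card is typically eight 32-bit chips in parallel.
    gddr5_bus_bits = 8 * 32       # 256-bit aggregate bus

    # Four 1024-bit HBM stacks give a 4096-bit aggregate bus.
    hbm_bus_bits = 4 * 1024       # 4096-bit aggregate bus

    print(f"GDDR5: {gddr5_bus_bits}-bit, HBM: {hbm_bus_bits}-bit")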

~~~
spikels
I guess the difference here is that the substrate is another silicon chip
rather than a PCB or similar. Pretty cool. There must have been some
significant engineering challenges.

------
cptskippy
I see HBM evolving into an external cache, with off-substrate RAM eventually
working its way back onto cards.

~~~
awalton
I think you're wrong. Everything in semiconductor history to date has been
focused on integrating more on-chip or, at the very worst, on-package. This
puts the memory as close to the chip as it can get without driving the
chip's cost up astronomically. There's absolutely nothing to suggest they'll
de-integrate moving forward.

The biggest problem after this is how much the GPU and CPU have to fight over
main system memory, which really brings us to the endgame of discrete GPUs
altogether. Sooner or later there won't be room enough for both in the
picture; your single heterogeneous core or MCM will have both a CPU and a
GPU on it (and probably half a dozen or more application-specific
accelerators).

~~~
cptskippy
I wasn't suggesting there would be any de-integration, just that
off-substrate memory will be reintroduced in addition to HBM.

If you need a point of reference, take a look at L2 cache. It was originally
chips on the motherboard, then with the PII it moved onto a processor card,
and with the PIII it was eventually integrated into the die.

The exact same thing happened with L3 cache.

------
listic
Looks like 3D is going to be 'the next big thing' in DRAM, one way or the
other.

We are going to have AMD HBM [1], NVIDIA stacked DRAM [2] and Hybrid Memory
Cube [3]. But why do we need all three, when the latter is supposed to be a
standard? Or are some of these _actually_ duplicates?

[1]
[https://en.wikipedia.org/wiki/High_Bandwidth_Memory](https://en.wikipedia.org/wiki/High_Bandwidth_Memory)

[2]
[https://en.wikipedia.org/wiki/GeForce_1000_series](https://en.wikipedia.org/wiki/GeForce_1000_series)

[3]
[http://en.wikipedia.org/wiki/Hybrid_Memory_Cube](http://en.wikipedia.org/wiki/Hybrid_Memory_Cube)

~~~
Narishma
AMD and Nvidia are both using HBM AFAIK. AMD is just farther along.

------
unwind
This reminds me of the good old PlayStation 2's "Emotion Engine" hardware;
its GPU had a 2,560-bit-wide memory bus. Of course, that's three independent
buses: read (1,024), write (1,024), and read/write (512). Still, it was
pretty wild back in the year 2000.
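
Those figures check out, assuming the commonly cited ~150 MHz clock for the
embedded DRAM:

    # Three independent buses on the PS2's graphics chip, per the comment above.
    total_bits = 1024 + 1024 + 512          # 2560-bit combined bus

    # At ~150 MHz that works out to roughly 48 GB/s of peak eDRAM bandwidth.
    peak_gb_s = total_bits / 8 * 150e6 / 1e9
    print(f"{total_bits}-bit bus, ~{peak_gb_s:.0f} GB/s peak")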

------
minthd
To some extent, that's not really that impressive. AMD gets access to a
breakthrough technology like HBM and 2.5D silicon interposers, and all we
get is a measly 50% improvement in memory bandwidth?

A more interesting configuration would be attaching twelve 1GB HBM stacks to
the GPU, achieving a memory bandwidth of 128GB/s * 12 = 1.5TB/s (which would
increase power by only about 30W over the current model).

Maybe their GPU is too weak to exploit such massive memory bandwidth, and it
would be quite hard to do so?
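
The arithmetic itself is straightforward (taking the 128GB/s per-stack
figure as given):

    stacks = 12
    per_stack_gb_s = 128                  # first-gen HBM stack
    total_gb_s = stacks * per_stack_gb_s  # 1536 GB/s, i.e. ~1.5 TB/s
    print(f"{total_gb_s} GB/s (~{total_gb_s / 1000:.1f} TB/s)")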

~~~
kllrnohj
I have no idea how you could possibly choose "measly" as the adjective to pair
with "50% improvement in memory bandwidth" (with the part about "using half as
much power" being suspiciously missing from your summary).

