

Hybrid Memory Cube receives its finished spec, promises up to 320GB per second - ozantunca
http://www.engadget.com/2013/04/03/hybrid-memory-cube-receives-its-finished-spec/

======
jws
From the spec:

• External interface is multiple SerDes links running at 10-15 Gbps per lane,
each link with 16 full duplex lanes. (The 320GB/s number comes from 8 links at
10Gbps; the 4 link device is 240GB/s, at a higher clock rate.)

• Internal ECC for memory; packet-based interface with CRC and retry.

• Built-in self test; there can be spare resources which allow it to replace
failed sections.
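
The headline bandwidth figures in the first bullet can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes the quoted numbers count raw line rate in both directions of every full duplex lane and ignore packet overhead, and the function name is made up for illustration.

```python
def aggregate_bandwidth_gbytes(links, lanes_per_direction, gbps_per_lane):
    """Raw aggregate bandwidth in GB/s, counting both directions (assumption)."""
    per_direction_gbit = links * lanes_per_direction * gbps_per_lane
    both_directions_gbit = 2 * per_direction_gbit  # full duplex
    return both_directions_gbit / 8  # bits -> bytes


# 8 links x 16 lanes x 10 Gbps, both directions
print(aggregate_bandwidth_gbytes(8, 16, 10))  # 320.0
# 4 links x 16 lanes x 15 Gbps, both directions
print(aggregate_bandwidth_gbytes(4, 16, 15))  # 240.0
```

Under that assumption a single direction of the 8 link device is 160GB/s, so a read-heavy workload would see about half the headline number.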

Envision a city on a grid filled with skyscrapers. The ground floor of each
skyscraper is the logic, called a "vault controller"; each floor above is DRAM
storage. The city is constructed by laminating chips, one for each layer, and
the skyscrapers form their connections vertically through the chips.

There is a switching fabric that connects N serial links to M vault
controllers.

• 16 vaults in the 4 link version, 4GB. 32 vaults in the 8 link version, 8GB.

• A single vault controller can be servicing many serial links simultaneously.
It can prioritize. Within a single link, requests will always happen in order.

• There is a router system which allows up to 8 cubes to be on the same host
link to increase storage per host link. Link length is limited and power
demands are higher for longer links. I think the router will allow shorter
links to be used, especially in multiple cube modules.

• Atomic bit write and atomic add transactions. New options for the lock-free
algorithm folk.

• 31mm square BGA, 4mm tall, for the 4 link device. About 900 pins. About half
are grounds, 1/4 of the remainder are power, the rest signals.

• 7 different power supplies at 4 different voltages required. Get to work
board designers!

• READs and WRITEs are from 16 to 128 bytes wide.

• 4 link device can have up to 4GB, 8 link can have up to 8GB. (This seems
small to me, but I suppose it comes down to storage/bandwidth balancing, and
you can have 8 devices on the same link.) Oh, they see the problem too. They
are considering using the currently ignored lower order bits of blocks to
expand the addressing, and there are two bits reserved just above the address.
Quick, someone get the time machine, take them to visit the IDE disk block
addressing planners.

• The refresh logic checks ECC and rewrites if a soft error is found. Take
that, cosmic rays!
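
The capacity numbers in the list above can be combined into a rough sketch of per-vault size and the ceiling per host link. It assumes "GB" means 2^30 bytes and that all 8 chained cubes are the largest (8GB) part; the function names are hypothetical.

```python
def per_vault_capacity_mb(cube_capacity_gb, vaults):
    """Capacity of one vault in MB, assuming 1 GB = 1024 MB."""
    return cube_capacity_gb * 1024 / vaults


def max_capacity_per_host_link_gb(cube_capacity_gb, cubes_per_link=8):
    """Total capacity when the maximum number of cubes share one host link."""
    return cube_capacity_gb * cubes_per_link


print(per_vault_capacity_mb(4, 16))      # 4 link device: 256.0 MB per vault
print(per_vault_capacity_mb(8, 32))      # 8 link device: 256.0 MB per vault
print(max_capacity_per_host_link_gb(8))  # 8 cubes of 8GB: 64 GB per host link
```

Both device sizes come out at the same 256MB per vault, so the larger part is simply more vaults rather than taller ones.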

------
m_mueller
One should note that we already get 250GB/s peak on the GDDR5 used in NVIDIA
Tesla K20x. Intel claims 320GB/s peak for the fastest MIC. What is claimed in
this article is not that new then. From experience with Tesla, you can usually
expect about 70% of the peak bandwidth (and using Intel MIC with conventional
x86 codebases it tends to be less, but that's second-hand knowledge).

~~~
__alexs
That's 250 GigaBit/s, this is 320 GigaByte/s. No?

~~~
unwind
From the GDDR5 Wikipedia entry (<http://en.wikipedia.org/wiki/GDDR5>):

 _The newly developed GDDR5 is the fastest and highest density graphics memory
available in the market. It operates at 7 GHz effective clock-speed and
processes up to 28 GB/s with a 32-bit I/O.[4] 2 Gbit GDDR5 memory chips will
enable graphics cards with 2 GiB or more of onboard memory with 224 GB/s or
higher peak bandwidth._

So, no, it seems that's bytes.
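
The figures in that quote can be cross-checked with a short calculation. This sketch reads "7 GHz effective clock-speed" as 7 Gbit/s per data pin, which is an assumption about the quote's wording, and the function names are made up for illustration.

```python
import math


def chip_bandwidth_gbytes(gbps_per_pin, io_width_bits):
    """Peak bandwidth of one memory chip in GB/s."""
    return gbps_per_pin * io_width_bits / 8


def bus_width_bits_for(target_gbytes, gbps_per_pin, io_width_bits=32):
    """Chips and total bus width needed to reach a target peak bandwidth."""
    per_chip = chip_bandwidth_gbytes(gbps_per_pin, io_width_bits)
    chips = math.ceil(target_gbytes / per_chip)
    return chips, chips * io_width_bits


print(chip_bandwidth_gbytes(7, 32))    # 28.0 GB/s per 32-bit chip
print(bus_width_bits_for(224, 7, 32))  # (8, 256): 8 chips, 256-bit bus
```

At a lower per-pin rate the same target needs a proportionally wider bus.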

~~~
__alexs
Wow. That does seem to require a crazy 512-bit wide bus though. The info I can
find on HMC seems like it is achieving these speeds on a 16-bit bus and with
much lower power requirements than even DDR3 so far.

~~~
m_mueller
Ok now that's new to me. If you can go 320GB/s on a measly 16bit bus, then
it's really feasible to have this as your main memory. On the other hand
imagine what you can do with this memory on a GPU, you'd probably get over
1TB/s there. Actually, that's something NVIDIA already has on their roadmap as
far as I remember seeing it at GTC.

------
jcr
Blogspam is annoying. We're supposed to submit original sources whenever
possible (according to the HN "Guidelines").

Original hybridmemorycube.org press release:

<https://news.ycombinator.com/item?id=5485833>

Original computerworld.com article:

<https://news.ycombinator.com/item?id=5485823>

------
zacharyvoase
But what's the latency? I'm inclined to mention that a truck full of tapes
hurtling down a freeway has a 'high bandwidth'.
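
A minimal sketch of that point: total transfer time is latency plus size divided by bandwidth, so a high-bandwidth channel only wins once transfers are large. All the numbers below are hypothetical placeholders, not figures from the HMC spec or from any real truck.

```python
def transfer_time_s(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Time to move size_bytes over a channel with fixed startup latency."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


# Hypothetical memory link: 100 ns latency, 320 GB/s.
LINK = dict(latency_s=100e-9, bandwidth_bytes_per_s=320e9)
# Hypothetical truck of tapes: 1 hour on the road, 10 TB/s once it arrives.
TRUCK = dict(latency_s=3600.0, bandwidth_bytes_per_s=10e12)

for size in (64, 10e15):  # one small read vs. 10 PB
    link_t = transfer_time_s(size, **LINK)
    truck_t = transfer_time_s(size, **TRUCK)
    print(f"{size:.0f} bytes: link {link_t:.3g} s, truck {truck_t:.3g} s")
```

With these made-up numbers the truck loses badly on a 64 byte read and wins on the 10 PB transfer, which is why the bandwidth figure alone says little about memory performance.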

~~~
unwind
I tried searching the spec itself
([http://hybridmemorycube.org/files/SiteDownloads/HMC_Specific...](http://hybridmemorycube.org/files/SiteDownloads/HMC_Specification%201_0.pdf))
but it doesn't seem to contain any specifications about the latency.

Lots of _talk_ about latency-minimization though, but it seems this is
basically a packet-oriented interface (with CRC on packets, retries and stuff)
so I guess latency will be larger than with today's DDR interfaces.
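
To make the "CRC on packets, retries" idea concrete, here is a generic sketch of a CRC-checked packet with retransmission. It is not the HMC packet format: the 4 byte CRC trailer, the use of `zlib.crc32`, and every function name here are assumptions chosen for illustration.

```python
import zlib


def make_packet(payload: bytes) -> bytes:
    """Append a CRC-32 of the payload as a 4 byte trailer."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def check_packet(packet: bytes):
    """Return the payload if the CRC matches, otherwise None."""
    payload, trailer = packet[:-4], packet[-4:]
    if zlib.crc32(payload) == int.from_bytes(trailer, "little"):
        return payload
    return None


def send_with_retry(payload: bytes, channel, max_retries: int = 3):
    """Send until the receiver's CRC check passes; return (payload, attempts)."""
    for attempt in range(1, max_retries + 2):  # first try plus retries
        received = check_packet(channel(make_packet(payload)))
        if received is not None:
            return received, attempt
    raise RuntimeError("link failed after retries")


def flaky_channel(corrupt_first_n: int):
    """Hypothetical channel that flips one bit in the first N packets."""
    sent = {"count": 0}

    def channel(packet: bytes) -> bytes:
        sent["count"] += 1
        if sent["count"] <= corrupt_first_n:
            return bytes([packet[0] ^ 0x01]) + packet[1:]
        return packet

    return channel


data, attempts = send_with_retry(b"\xAB" * 32, flaky_channel(1))
print(attempts)  # 2: the first packet is corrupted, the retry gets through
```

Each retry costs at least one extra round trip, which is one way a packet interface can add latency compared with a plain parallel bus.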

Perhaps computer systems will have both DDR memory and HMC, letting the OS
and/or applications decide how to distribute access for maximum performance.

~~~
jcr
I haven't read the 1.0 spec yet, but if the marketing in their FAQ is to be
believed, they claim it "will provide a substantial system latency reduction".

<http://hybridmemorycube.org/faq.html>

------
ChuckMcM
This is fun stuff. That it can achieve these bandwidths without requiring an
extra wide bus is also pretty impressive. Running a quad serdes into an FPGA
or 64 bit CPU should be pretty straightforward (as opposed to an unwieldy 256
bit wide GDDR5 bus). I do wonder what the power dissipation is like though.
10gbit SERDES ports on my switches get pretty warm (there is a XAUI phy
connected to the 10gbit ports). Having 8 of them sitting under a chip seems
like a recipe for a hotplate.

------
Aardwolf
I must say I had never heard of this before and it looks like one of those
"too good to be true" things. I'll believe it when it's a consumer product.

------
revelation
Wow, they just skip all the pretense and add a "show full pr-text" button.

~~~
EvilTerran
I think that's meant to mean "press release text", but my first thought was
definitely "public relations text".

