

Intel’s Long Awaited Return to the Memory Business - qubitsam
http://www.realworldtech.com/intel-dram/

======
Hoff
For comparison purposes, Intel Itanium 9500 series Poulson-class processors
have ~40 MB of on-chip cache, split among the L1 I&D, L2 and L3 caches.

~~~
twoodfin
But that's all SRAM, no? With about 10x the bandwidth.

~~~
pbharrin
Yes all SRAM

------
gnoway
The link will not load.

This article is about the same thing, presumably:

http://www.anandtech.com/show/6911/intels-return-to-the-dram-business-haswell-gt3e-to-integrate-128mb-edram

~~~
pbharrin
The link takes over a minute to load, but it does load.

They must be serving this from a Raspberry Pi.

~~~
dkanter
We had some configuration problems with Apache; it should be quite readable
now : )

------
ippisl
Isn't this the wrong direction to go? With the price of memory chips at about
$1 per 256 MByte, and technologies for integrating multiple chips with tons of
connections and bandwidth (2.5D and 3D integration) closing in, what's the
point of offering 128 MB at a price hike of $50?
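A quick sketch of the arithmetic behind this objection, taking the quoted prices at face value (they're the commenter's figures, not measured data):

```python
# Back-of-the-envelope markup, using the figures quoted above:
# ~$1 per 256 MByte of commodity DRAM vs. a $50 premium for 128 MB of eDRAM.
commodity_price_per_mb = 1.0 / 256   # dollars per MByte (quoted assumption)
edram_capacity_mb = 128
edram_premium = 50.0                 # quoted price hike in dollars

commodity_cost = commodity_price_per_mb * edram_capacity_mb
markup = edram_premium / commodity_cost
print(f"commodity cost: ${commodity_cost:.2f}, markup: {markup:.0f}x")
# prints "commodity cost: $0.50, markup: 100x"
```

Of course, as the replies note, the premium buys latency and bandwidth, not capacity, so a capacity-only comparison understates what you're paying for.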

~~~
adsr
On-die cache is orders of magnitude faster than regular RAM. I've always
thought that more cache would be a sensible addition when a process shrink
gives more available real estate.

~~~
marshray
Sure but that's static RAM. DRAM has 50+ ns of latency no matter how close you
put it to the chip.

What they _can_ do though is achieve more aggregate bandwidth without
increasing package pin count. Perhaps this explains why Intel was willing to
allow Ivy Bridge to have only half the memory bandwidth of the previous Sandy
Bridge: just not that many (non-server) systems were ending up with the four
DIMMs required for full bandwidth.

So now they're integrating it into the package itself.

~~~
sparky
> DRAM has 50+ ns of latency no matter how close you put it to the chip.

This is not some fundamental law; it's entirely dependent on bitline and
wordline capacitance (array size). For external DRAM chips, huge arrays make
sense, and you end up with tens of ns of latency. In an eDRAM cache scenario,
you use much smaller arrays, and in the case of the POWER7 L3, latency is more
like 6ns (for row hits, of course).

~~~
marshray
It seems to me (you may know more about this than I do) that one can design a
memory system to make latency for sequential-address accesses arbitrarily
small, and this is mostly how (G)DDRn parts have improved overall performance.
But random (row-miss) access latency doesn't seem to have improved much.

From the early 1980s to the early 2000s, DRAM latency went from ~150 ns to ~50
ns and has stayed around there since. That's 1-2 doublings of performance in
that parameter over 30 years, compared to who-knows-how-many in transistor
size and speed.

Still, Intel knows what they're doing, and I can see a place for (yet another)
level in the memory hierarchy before the CPU resorts to off-package DRAM.

~~~
sparky
Yeah, given a predictable access pattern, latency can be completely hidden by
prefetching. For an unpredictable access pattern, there's a physically imposed
lower bound on random access latency, roughly proportional to the square root
of the number of bits stored.
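A rough sketch of that square-root scaling, under the simplifying assumption that latency is dominated by signal propagation across a square 2D array (real parts add fixed sense-amp and interface overheads, and the 40 ns reference point below is purely illustrative):

```python
import math

def scaled_latency(bits, reference_bits, reference_latency_ns):
    # If access latency is dominated by propagation across a square
    # array, it scales with the side length, i.e. with sqrt(bits).
    return reference_latency_ns * math.sqrt(bits / reference_bits)

# Hypothetical example: shrinking an array by 64x cuts the
# wire-limited part of the latency by 8x under this model.
big = scaled_latency(64 * 2**20, 64 * 2**20, 40.0)   # 64 Mbit array
small = scaled_latency(1 * 2**20, 64 * 2**20, 40.0)  # 1 Mbit array
print(big, small)  # prints "40.0 5.0"
```

This is consistent with the thread's observation that eDRAM caches built from many small arrays (like the POWER7 L3) can hit single-digit-nanosecond row-hit latency while commodity DRAM with huge arrays sits in the tens of nanoseconds.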

There's a huge tradeoff space between latency, bandwidth, capacity, and cost,
and market forces have forced a convergence around 2 design points,
colloquially DDR and GDDR. For yield reasons, die area (cost) has been mostly
fixed for a long time. Successive generations of DDR spend the dividends of
Moore's Law primarily on extra capacity, then on a bit of additional bandwidth
when possible. Successive generations of GDDR prioritize bandwidth, primarily
by dedicating tons of area to high-speed single-ended I/Os.

These two design points make sense for their most common use cases. In
traditional disk-based systems, avoiding hitting the disk is more important
than absolute DRAM latency, so increasing capacity is your best bet. On GPUs,
you need enough bandwidth to feed a quickly growing number of functional units
on the chip, and at least for graphics, the access pattern can be made
extremely predictable, so latency is not as important there.

The advent of faster-than-HDD persistent storage (SSDs) and the desire to run
more general-purpose workloads on highly parallel machines like GPUs point to
a need for a third DRAM design point.

~~~
wmf
There is RLDRAM, but I've heard that it's expensive.

~~~
sparky
Yeah, there are a few specialized parts for networking, telecom equipment,
defense, and other less cost-sensitive applications, but DDR and, to a lesser
extent, GDDR dominate the market and enjoy much greater economies of scale.

------
marshray
No matter how tightly integrated it is into the package, I'm skeptical that
Intel integrated graphics with 128MB of DRAM is going to replace an Nvidia GT
650M with 1-2 GB of GDDR5.

~~~
voidlogic
Intel started out targeting low-end discrete graphics solutions; now maybe
they're just moving their way up the food chain.

I'm actually more excited that the CPU has access to this cache too.

