

Intel-Micron Share Additional Details of Their 3D NAND - Justen
http://www.anandtech.com/show/9114/intelmicron-share-additional-details-of-their-3d-nand

======
jjoonathan
Speaking of the HDD->SSD transition, is anyone familiar enough with SRAM and
DRAM design/fab to comment on why we haven't seen a similar transition with
memory? 60ns latency on a 4GHz processor = ouch!

I'm aware that SRAM is 3-6x less dense, but it isn't uncommon these days to
see people with >3x the DRAM they need, so this doesn't strike me as a
terribly convincing justification.
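To put rough numbers on that tradeoff, here is a back-of-the-envelope sketch using the 3-6x density penalty mentioned above (the 16 GB module size is just an illustrative assumption):

```python
# Capacity you'd get if a DRAM module's die area were spent on SRAM
# instead, using the 3-6x density penalty cited above.
# The 16 GB starting point is an illustrative assumption.

def sram_equivalent_gb(dram_gb: float, density_penalty: float) -> float:
    """Capacity of SRAM occupying the same die area as dram_gb of DRAM."""
    return dram_gb / density_penalty

for penalty in (3, 6):
    print(f"{penalty}x penalty: 16 GB of DRAM area -> "
          f"{sram_equivalent_gb(16, penalty):.2f} GB of SRAM")
```

So someone running 16 GB today would land somewhere between ~2.7 and ~5.3 GB for the same silicon, which is exactly the "people with >3x the DRAM they need" scenario.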

I'm also aware that $/GB is insanely high for on-CPU SRAM, but that would also
be the case for on-CPU DRAM, which is why DRAM is typically put on a separate
die so that its process can be optimized independently. Does the SRAM process
just not optimize as well? Does it have insane power/heat requirements? What
goes wrong?

Or (puts on tinfoil hat) is JEDEC full of people who design DRAM memory
controllers for a living?

~~~
nordsieck
Products like Intel's Iris Pro 5200 do add what you are talking about. If you
think of main memory as a last-level cache, though, it makes a lot of sense
that, much like L3 cache on CPUs, most systems optimize for density instead of
speed.

Preventing a single virtual memory access (particularly one that hits a
spinning disk) is worth an enormous speedup in mean access time.

~~~
jjoonathan
No, I'm asking about off-die SRAM as a replacement for off-die DRAM, not on-
die DRAM as an alternative to {off die DRAM, cache, cores, etc}. There are a
bunch of tradeoffs to be made on-die, and I get the reasoning behind them even
if I don't know the specific numbers. x86 has to enforce permissions, handle
sharing, cross-reference a TLB, etc., and you can win significantly by
memoizing results that are subject only to statistically unlikely
invalidation. There would
be separate L1/L2/L3 even if all SRAM cells had identical latency and density.
Which they might, I don't know. L4 (eDRAM, what you were talking about) gives
you a huge density advantage, but it's still not competitive with SRAM for
speed, even though it's on the same process:

[http://www.sisoftware.co.uk/?d=qa&f=mem_hsw](http://www.sisoftware.co.uk/?d=qa&f=mem_hsw)

    
    
        L1:     4 clocks  <-- SRAM
        L2:    12 clocks  <-- SRAM
        L3:    36 clocks  <-- SRAM
        L4:   136 clocks (55ns) <-- eDRAM
        DRAM: 193 clocks (80ns) <-- off-die DRAM
        Clock: 2.5GHz (dynamic overclocking was disabled)
        5cm travel: 1 clock
    

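A quick sanity check that the clocks and nanoseconds in the table agree, assuming the stated 2.5GHz clock (0.4ns per cycle):

```python
# Convert the table's cycle counts to nanoseconds at the stated 2.5 GHz.

CLOCK_GHZ = 2.5
NS_PER_CLOCK = 1.0 / CLOCK_GHZ  # 0.4 ns per cycle

def clocks_to_ns(clocks: int) -> float:
    return clocks * NS_PER_CLOCK

print(f"L4:   {clocks_to_ns(136):.1f} ns")  # ~54.4 ns, matching the ~55 ns listed
print(f"DRAM: {clocks_to_ns(193):.1f} ns")  # ~77.2 ns, close to the 80 ns listed
```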
With SRAM you just have to open the right gate, whereas with DRAM you have to
precharge the bitlines, open the word line, wait for the tiny signal to be
amplified up to logic level, and only then do you get to read it out. Worse,
you need tons of logic to reorder memory accesses to take advantage of
multiple accesses on the same word line, or of accesses that can proceed
simultaneously in different banks. And you need to refresh each word line
periodically, which requires even more logic. There is a reason the memory
controller (not the cache, the controller) is a huge chunk of the die, roughly
the size of two cores!

If we assume that L3 and L4 have similar management overhead, then DRAM-style
access itself takes ~100 clock cycles in the comparison above, which dominates
the other costs even if we disregard savings from the simpler logic possible
in off-die SRAM (going off-die at all, travel time included, accounts for the
remaining ~60 cycles).
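The breakdown falls straight out of the table's numbers:

```python
# Cycle breakdown from the latency table above.

L3, L4, DRAM = 36, 136, 193

dram_style_overhead = L4 - L3   # eDRAM vs SRAM on the same die
off_die_cost = DRAM - L4        # travel time + off-die controller logic

print(dram_style_overhead)  # 100 cycles: the "~100 clock cycles" claim
print(off_die_cost)         # 57 cycles: the "~60 cycles" claim
```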

I still don't understand why off-die SRAM isn't sensible.

