
More on Apple’s A9X SoC: 147mm² @ TSMC, 12 GPU Cores, No L3 Cache - lelf
http://www.anandtech.com/show/9824/more-on-apples-a9x-soc
======
Analemma_
The complete lack of an L3 cache is fascinating if true. Not being an expert
on chip design, I had thought the "number of cache levels" trend was
monotonically increasing, especially since some of Intel's latest chips now
have L4 caches, but now this. Can someone explain how they can get away with
this? Is it enough to just have a really fast and wide bus, did they possibly
need to make changes on the OS level, or is it just more acceptable on a
device where you'll only be running one or two applications and very few
background daemons/services?

~~~
stygiansonic
The article had this to say:

" _One explanation may be that Apple deemed the L3 cache no longer necessary
with the A9X’s 128-bit LPDDR4 memory bus; that 51.2GB/sec of bandwidth meant
that they no longer needed the cache to avoid GPU stalls._"

And:

" _Our own Andrei Frumusanu suspects that it may be a power matter, and that
Apple was using the L3 cache to save on power-expensive memory operations on
the A9. With A9X however, it’s a tablet SoC that doesn’t face the same power
restrictions, and as a result doesn’t need a power-saving cache. This would be
coupled with the fact that with double the GPU cores, there would be a lot
more pressure on just a 4MB cache versus the pressure created by A9, which in
turn may drive the need for a larger cache and ultimately an even larger die
size._ "

So, basically, while the A9's 4 MiB L3 cache wouldn't have been much to add to
the A9X, it wouldn't have helped much and a larger, more costly L3 cache would
have been needed to help the A9X performance-wise. As the article points out,
this is all speculation.
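
For reference, the 51.2GB/sec figure in the quote follows from the bus width and the transfer rate. A minimal sketch of that arithmetic, assuming the LPDDR4 runs at 3200 MT/s (the quote does not state the rate, so that number is an assumption, and `peak_bandwidth_gb_s` is just an illustrative helper name):

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_per_sec: float) -> float:
    """Theoretical peak bandwidth in GB/s, with 1 GB = 1e9 bytes."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_sec / 1e9


# Assumption: 3200 MT/s LPDDR4; the 128-bit width is from the article quote.
a9x_bw = peak_bandwidth_gb_s(128, 3200e6)

# Hypothetical comparison: a bus half as wide at the same transfer rate.
half_width_bw = peak_bandwidth_gb_s(64, 3200e6)

print(f"128-bit bus: {a9x_bw:.1f} GB/s")
print(f" 64-bit bus: {half_width_bw:.1f} GB/s")
```

This is a theoretical peak; sustained bandwidth on real hardware would be lower.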

~~~
CyberDildonics
A cache isn't a substitute for bandwidth, it helps with latency. More
bandwidth doesn't impact that problem either way.

~~~
sliken
Actually it is. Every cache hit increases the available bandwidth for the CPU.
So generally, the larger the caches, the less memory bandwidth you need and the
less sensitive you are to memory latency.

~~~
CyberDildonics
A cache may free up bandwidth but it isn't a substitute for latency. Modern
processors are vastly more latency bound than bandwidth bound, and latency is
what cache most directly helps.
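
Both effects argued over in this sub-thread can be put in one toy model: a cache hit avoids a DRAM round trip (latency) and also keeps that request off the memory bus (bandwidth). The sketch below uses the textbook average-memory-access-time formula. Every number in it is made up for illustration and none of them are A9 or A9X measurements.

```python
def avg_latency_ns(cache_ns: float, hit_rate: float, dram_ns: float) -> float:
    """Average access time: every access pays the cache lookup,
    and misses additionally pay the DRAM latency."""
    return cache_ns + (1.0 - hit_rate) * dram_ns


def dram_traffic_gb_s(request_gb_s: float, hit_rate: float) -> float:
    """Only misses reach DRAM, so traffic scales with the miss rate."""
    return request_gb_s * (1.0 - hit_rate)


# Illustrative assumptions, not measured values.
CACHE_NS = 10.0       # hypothetical last-level cache hit time
DRAM_NS = 100.0       # hypothetical DRAM access time
REQUEST_GB_S = 20.0   # hypothetical traffic arriving at the last-level cache

for hit_rate in (0.0, 0.5, 0.9):
    lat = avg_latency_ns(CACHE_NS, hit_rate, DRAM_NS)
    bw = dram_traffic_gb_s(REQUEST_GB_S, hit_rate)
    print(f"hit rate {hit_rate:.0%}: {lat:.0f} ns average, {bw:.1f} GB/s to DRAM")
```

Note that at a 0% hit rate this model is slower than having no cache at all, since the lookup cost is paid on every access for nothing.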

------
BooneJS
I wonder if this was mostly a play for power savings? SRAM leakage power can
be pretty high (relative to the total power budget) if it can't be powered
down during sleep-like modes.

------
porsupah
How would one, practically, go about modeling the impact of an L3 cache in a
SoC design, assuming you were intending to produce something like the A9X?

(I speak as one who loved sim_g4 as a modeling tool for verifying compiler
efficiency, or rather, occasional lack thereof. So easy to see when a cache
miss would occur, or just execution unit contention)
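
One common starting point is trace-driven simulation: replay a stream of memory addresses through a model of the cache and count hits. Below is a minimal sketch, assuming a fully associative LRU cache with 64-byte lines and a synthetic streaming trace. Real SoC modeling would use recorded CPU and GPU traces plus set associativity and timing, and `lru_hit_rate` is an illustrative name, not part of any tool.

```python
from collections import OrderedDict

MIB = 1 << 20


def lru_hit_rate(trace, capacity_bytes, line_bytes=64):
    """Fraction of accesses that hit in a fully associative LRU cache."""
    max_lines = capacity_bytes // line_bytes
    cache = OrderedDict()  # keys are line numbers, ordered oldest to newest
    hits = 0
    for addr in trace:
        line = addr // line_bytes
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > max_lines:
                cache.popitem(last=False)  # evict least recently used line
    return hits / len(trace)


# Synthetic trace (an assumption, not a real workload): four sequential
# passes over a 6 MiB buffer, touching each 64-byte line once per pass.
trace = [addr for _ in range(4) for addr in range(0, 6 * MIB, 64)]

for size_mib in (4, 8):
    rate = lru_hit_rate(trace, size_mib * MIB)
    print(f"{size_mib} MiB cache: hit rate {rate:.2f}")
```

On this particular trace, the 4 MiB cache never hits because the 6 MiB working set cycles through it and evicts every line before reuse, while the 8 MiB cache hits on every pass after the first. That is only a property of this artificial access pattern, but it shows the kind of cliff such a model can expose.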

