> *But consider also that you can stick chiplets on top of each other vertically...*

paulmd · on April 4, 2022

AMD's 5800X3D and the upcoming generation of AMD/NVIDIA GPUs (both of which are rumored to feature stacked cache dies) are going to be real interesting. So far we haven't ever seen a stacked enthusiast die (MCM doesn't feature any active transistors on the interposer) and it will be interesting to see how the thermals work out.

This isn't even stacking compute dies, either. Stacking memory/cache is the low-hanging fruit, but in the long term what everyone really wants is to stack multiple compute dies on top of each other, and that's going to get spicy real quick.

M1 is the other example, but again, Apple's architecture is sort of unique in that they've designed it from the ground up to run at 3 GHz exactly; there's no overclocking etc. like enthusiasts generally expect. AMD is having to disable voltage control/overclocking on the 5800X3D as well (although that may be more about voltage control than thermals: it sounds like the cache die may run off one of the voltage rails from the CPU; potentially a FIVR could be used to drive that rail independently, or an additional V_mem rail could be added...)

And maybe that's the long-term future of things: overclocking goes away and you just design for a tighter envelope, one where you know the thermals work for the dies in the middle of the sandwich. Plus the Apple design of "crazy high IPC and moderately low ~3 GHz clocks" appears well-adapted for that reality.

ceeplusplus · on April 5, 2022

Wide and slow is expensive. Very expensive. That's why Apple can do it and nobody else is doing it (in mobile Qualcomm is _cutting_ cache from Arm reference designs and in servers Graviton and Ampere are also cutting cache). It is cheaper for a given performance level to clock your cores as far as they'll go and cheap out on the width of your core if you know your customers either won't or can't care about power efficiency (because they have no better alternatives).

GeekyBear · on April 4, 2022

The fact that the M1 Macbook Air operates without needing a fan is very unusual for that level of performance.

Iwan-Zotow · on April 4, 2022

the problem is signal propagation

for light to cross 1 foot takes ca. 1 ns
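A quick sanity check on that figure, as a rough sketch rather than anything authoritative: the helper names below are made up for illustration, and the 0.5 velocity factor is just an assumed placeholder (real on-chip wires are RC-limited and considerably slower than light in vacuum).

```python
# Back-of-the-envelope: how far can a signal get in one clock cycle?
C_VACUUM_M_PER_S = 299_792_458  # speed of light in vacuum (exact by SI definition)
FOOT_M = 0.3048                 # one international foot in metres (exact)


def travel_time_ns(distance_m, velocity_factor=1.0):
    """Nanoseconds for a signal to cover distance_m at velocity_factor * c."""
    return distance_m / (C_VACUUM_M_PER_S * velocity_factor) * 1e9


def distance_per_cycle_mm(clock_hz, velocity_factor=1.0):
    """Millimetres a signal covers in one clock period at velocity_factor * c."""
    return C_VACUUM_M_PER_S * velocity_factor / clock_hz * 1e3


foot_ns = travel_time_ns(FOOT_M)          # roughly 1.02 ns
per_cycle_mm = distance_per_cycle_mm(3e9)  # roughly 99.9 mm at 3 GHz

# Assumed, illustrative velocity factor of 0.5; not a measured value.
half_speed_mm = distance_per_cycle_mm(3e9, velocity_factor=0.5)

print(f"1 foot at c: {foot_ns:.3f} ns")
print(f"per 3 GHz cycle at c: {per_cycle_mm:.1f} mm")
print(f"per 3 GHz cycle at 0.5c (assumed): {half_speed_mm:.1f} mm")
```

So at 3 GHz even the vacuum best case is about 10 cm per cycle, which is why distance starts to matter once things are spread across a package or board.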

paulmd · on April 4, 2022

3D circuits would be denser (shorter propagation distances) than a planar circuit. In fact "computronium" is sort of an idea about how dense you can conceptually make computation.

You just can't really cool it that well with current technologies. Microfluidics are the current magic wand that everyone wishes existed but it's a ways away yet.
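To put a rough number on the "shorter propagation distances" point, here is an idealized sketch: the same count of elements laid out on a unit-pitch square grid versus a unit-pitch cube, comparing the worst-case corner-to-corner Manhattan distance. The function names are hypothetical, and the big assumption is that the vertical pitch equals the horizontal pitch, which real stacked dies don't achieve.

```python
# Idealized comparison: worst-case wire length in a planar vs. stacked layout.
# Assumes a unit-pitch grid in every direction (an assumption, not reality).


def worst_case_hops_2d(side):
    """Corner-to-corner Manhattan distance on a side x side grid."""
    return 2 * (side - 1)


def worst_case_hops_3d(side):
    """Corner-to-corner Manhattan distance in a side x side x side cube."""
    return 3 * (side - 1)


n_elements = 1_000_000
side_2d = round(n_elements ** 0.5)      # 1000 elements per edge of the square
side_3d = round(n_elements ** (1 / 3))  # 100 elements per edge of the cube

planar = worst_case_hops_2d(side_2d)
stacked = worst_case_hops_3d(side_3d)

print(f"planar worst case:  {planar} hops")
print(f"stacked worst case: {stacked} hops")
print(f"ratio: {planar / stacked:.1f}x")
```

The scaling is the interesting part: worst-case distance grows with the square root of the element count in the plane but only with the cube root in a full 3D block.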

adwn · on April 5, 2022

Signal propagation latency is only a problem if both endpoints need to be fully synchronous, which they don't have to be if they're independent units. Even the cores within single-chip multi-core CPUs aren't synchronous to each other, as far as I know.