
> But consider also that you can stick chiplets on top of each other vertically.

The problem there is heat dissipation. The performance constraint on consumer chips like the Apple M1 is already how well the chip can dissipate heat in the product it's placed in (compare the MacBook Air with the Mac mini). Stacking the chips just makes it worse.
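To put rough numbers on why stacking hurts, here is a minimal sketch. It assumes every layer has the same footprint and power draw and that all the heat leaves through one face of the stack. The 10 W and 1.0 cm² figures are made up for illustration, not measurements of any real chip, and `heat_flux_w_per_cm2` is a hypothetical helper.

```python
def heat_flux_w_per_cm2(power_per_die_w, footprint_cm2, layers=1):
    """Heat that must cross each cm^2 of the stack's footprint.

    Assumes identical dies stacked vertically, so the footprint stays
    the same while total power grows with the number of layers.
    """
    return power_per_die_w * layers / footprint_cm2


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the scaling.
    for layers in (1, 2, 4):
        flux = heat_flux_w_per_cm2(10.0, 1.0, layers)
        print(f"{layers} layer(s): {flux:.1f} W/cm^2")
```

The footprint stays constant while the power adds up, so under these assumptions the flux through the cooler scales linearly with the layer count. Dies in the middle of the stack are worse off still, since their heat has to pass through the neighbouring dies first.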




AMD's 5800X3D and the upcoming generation of AMD/NVIDIA GPUs (both of which are rumored to feature stacked cache dies) are going to be real interesting. So far we haven't ever seen a stacked enthusiast die (MCM doesn't feature any active transistors on the interposer) and it will be interesting to see how the thermals work out.

This isn't even stacking compute dies, either. Stacking memory/cache is the low-hanging fruit, but in the long term what everyone really wants is to stack multiple compute dies on top of each other, and that's going to get spicy real quick.

M1 is the other example, but again, Apple's architecture is sort of unique in that they've designed it from the ground up to run at a fixed clock of roughly 3.2 GHz; there's no overclocking/etc. like enthusiasts generally expect. AMD is having to disable voltage control/overclocking on the 5800X3D as well (although that may have more to do with power delivery than thermals: it sounds like the cache die may run off one of the voltage rails from the CPU. Potentially a FIVR could be used to drive that rail independently, or an additional V_mem rail could be added...)

And maybe that's the long-term future of things, that overclocking goes away and you just design for a tighter design envelope, one where you know the thermals work for the dies in the middle of the sandwich. Plus the Apple design of "crazy high IPC and moderately low ~3 GHz clocks" appears well-adapted for that reality.


Wide and slow is expensive. Very expensive. That's why Apple can do it and nobody else is doing it (in mobile Qualcomm is _cutting_ cache from Arm reference designs and in servers Graviton and Ampere are also cutting cache). It is cheaper for a given performance level to clock your cores as far as they'll go and cheap out on the width of your core if you know your customers either won't or can't care about power efficiency (because they have no better alternatives).


The fact that the M1 MacBook Air operates without needing a fan is very unusual for that level of performance.


The problem is signal propagation.

Light takes about 1 ns to travel 1 foot.
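As a back-of-envelope check on that figure, here is a small sketch that computes the vacuum light-travel time for one foot and how far a signal could get in one clock cycle. The 3 GHz clock and the 0.5 velocity factor are assumptions picked for illustration. Real on-chip wires are RC-limited and much slower than this, so treat the results as optimistic upper bounds.

```python
C_VACUUM_M_PER_S = 299_792_458  # speed of light in vacuum (exact by definition)
FOOT_M = 0.3048                 # one international foot in metres (exact)


def time_to_cross_ns(distance_m, velocity_factor=1.0):
    """Nanoseconds for a signal to cover distance_m.

    velocity_factor is the fraction of c at which the signal travels
    (1.0 means light in vacuum).
    """
    return distance_m / (C_VACUUM_M_PER_S * velocity_factor) * 1e9


def distance_per_cycle_mm(clock_hz, velocity_factor=1.0):
    """Millimetres a signal can cover during one clock period."""
    return C_VACUUM_M_PER_S * velocity_factor / clock_hz * 1e3


if __name__ == "__main__":
    print(f"1 foot in vacuum: {time_to_cross_ns(FOOT_M):.3f} ns")
    # Assumed values: 3 GHz clock, signal at half the speed of light.
    print(f"per cycle at 3 GHz, vacuum: {distance_per_cycle_mm(3e9):.1f} mm")
    print(f"per cycle at 3 GHz, 0.5c:   {distance_per_cycle_mm(3e9, 0.5):.1f} mm")
```

One foot works out to just over 1 ns, and at 3 GHz even light in vacuum covers only about 10 cm per cycle.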


3D circuits would be denser (shorter propagation distances) than a planar circuit. In fact "computronium" is sort of an idea about how dense you can conceptually make computation.

You just can't really cool it that well with current technologies. Microfluidics are the current magic wand that everyone wishes existed but it's a ways away yet.


Signal propagation latency is only a problem if both end points need to be fully synchronous, which they don't have to be if they're independent units. Even the cores within single-chip multi-core CPUs aren't synchronous to each other, as far as I know.



