I am assuming that having RAM and CPU (and other functions such as GPUs or other hardware accelerators) on the same die on different layers (physically close, i.e. nanometers or micrometers apart) would cut latency by orders of magnitude.
3D is already done with NAND flash, so I am assuming heat is the problem.
Just my 10c
Even worse, the surface area available for heat dissipation grows quadratically with linear dimension, while volume, and therefore heat generated, grows cubically. That's why 3D memory is a thing (most memory cells are not active at any given time), but serious 3D processors (tri-gate notwithstanding) are not.
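The square-cube argument above can be made concrete with a toy calculation. Assume a cube of side s with uniform volumetric power density (the numbers and units here are made up for illustration): heat generated scales with s^3, dissipation area with s^2, so the heat each unit of surface must shed grows linearly with s.

```python
def heat_per_unit_area(s, power_density=1.0):
    """Heat to dissipate per unit surface for a cube of side s,
    assuming uniform volumetric power density (arbitrary units)."""
    volume = s ** 3          # heat generated ~ volume
    surface = 6 * s ** 2     # heat dissipated ~ surface area
    return power_density * volume / surface  # simplifies to s / 6

# Doubling the linear dimension doubles the thermal load per unit
# of surface -- which is why stacking logic in 3D hits a heat wall.
for s in (1, 2, 4, 8):
    print(s, heat_per_unit_area(s))
```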
The biggest issue (IMO) will be heat dissipation and manufacturing time.
It could be that the physical complexity of adding more layers is just too expensive.
We already have RAM (cache) and GPUs (integrated graphics) on the same die. But that doesn't work as a replacement for professional/gaming graphics workloads.
The problem is that cache coherence does not scale, given (note emphasis) that a certain amount of data is shared among CPUs.
You cannot circumvent the problem by choosing to not share much data between cores, because then you have changed the problem!
"Using amortized analysis, we show that interconnection network traffic per miss need not grow with core count and that coherence uses at most 20% more traffic per miss than a system with caches but not coherence. Using hierarchical design, we show that storage overhead can be made to grow as the root of core count and stay small, e.g., 2% of total cache size for even 512 cores.
Consequently, we predict that on-chip coherence is here to stay. Computer systems tend not to abandon compatibility to eliminate small costs, such as the costs we find for scaling coherence."
And now the quote from TFA:
"Do we really need cache coherency across a hundred cores or a thousand cores on a die?"
The paper's 512 cores are in the middle of the range where TFA suggests you start to get enough problems to seriously consider breaking compatibility and foregoing cache coherence. Why is the paper which supports its claims with concrete analysis less credible than an unsupported offhand remark from someone at Intel, a company that made a fortune from all of its CPU designs except that infamous one where they abandoned compatibility?
BTW, a basic understanding of coherence might lead one to doubt the claims about lack of scalability. When you hit the cache, you know that you own the data, so it's fast. When you miss the cache, it's a very slow operation anyway, so the overhead of checking whether to fetch the data from external memory or from another cache can't be that bad. And if data travels a lot from one cache to another because of true or false sharing, that's the fault of the code doing that. Coherence is not supposed to make sharing blindingly fast, just to make it correct, and to make false sharing correct as well.
Of course, none of this will be relevant if there's no substrate on which to put a thousand x86 cores, which right now there isn't.
The problem isn't so much that missing the cache is slow, but rather how do you make a processor know that it has to look the data value up in another cache instead of main memory? The two main options are either to basically broadcast every request to every cache it might be in, or to store in main memory which cache it's in--you have a design tradeoff between high interconnect traffic or high memory metadata usage.
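A toy message-counting model makes that tradeoff visible (everything here is a made-up illustration, not how any real protocol is implemented): snooping pays a probe to every other cache on each miss, so traffic grows with core count; a directory pays extra memory metadata to record where each line lives, and then contacts only the actual sharers.

```python
def snoop_messages(n_cores, misses):
    """Broadcast design: every miss probes all other caches."""
    return misses * (n_cores - 1)

def directory_messages(misses, avg_sharers):
    """Directory design: one metadata lookup per miss, then one
    message per cache that actually holds the line."""
    return misses * (1 + avg_sharers)

# Hypothetical numbers: 512 cores, 1000 misses, lines shared by
# 2 caches on average. Snooping traffic scales with core count;
# directory traffic depends only on how widely data is shared.
print(snoop_messages(512, 1000))      # 511000 probes
print(directory_messages(1000, 2))    # 3000 messages
```

The directory's cost shows up elsewhere: main memory (or a dedicated structure) must track sharer information per cache line, which is the "high memory metadata usage" side of the tradeoff.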
The 2% number is also achieved only if you use a three-tier organization of processors, and (IIRC) you need to limit the cache broadcasts to within each tier--which puts some sharp constraints on how you lay out cores on a chip. Intel's many-core chips, IIRC, are laid out in a single ring connecting all the cores, which is completely the wrong topology for what the paper implies.
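A back-of-the-envelope sketch of why hierarchy shrinks the directory overhead (a rough caricature of the paper's argument, not its actual scheme): a flat bit-vector directory needs one sharer bit per core for each tracked line, while a two-level organization needs only a bit per cluster plus a bit per core within a cluster, roughly 2*sqrt(n) bits instead of n.

```python
import math

def flat_bits(n_cores):
    """Flat bit-vector directory: one sharer bit per core per line."""
    return n_cores

def hierarchical_bits(n_cores):
    """Two-level directory: track which cluster holds the line,
    then which core within the cluster (~ 2 * sqrt(n) bits)."""
    clusters = math.isqrt(n_cores)
    return clusters + n_cores // clusters

for n in (64, 512, 1024):
    print(n, flat_bits(n), hierarchical_bits(n))
```

At 512 cores the flat scheme needs 512 bits of sharer state per line versus about 45 for the two-level one, which is the "grows as the root of core count" shape the paper describes.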
Imagine an old star that is undergoing a collapse: the matter infalls under its own gravity, accelerating faster and faster. At some point, the matter runs out of density; the neutrons slam into each other, and the whole mass of the star, going inwards at a good fraction of c, tries to come to an abrupt stop.
The leisurely free fall turns into the violence of inertia. Then, either that violence is enough, and the collapse continues towards a black hole, or it's not enough, and all we get is a boring neutron star.
Will the economic inertia behind Moore's Law push it past each technology's clang of limits, all the way towards some sort of singularity, or will it one day come to an abrupt halt against an immovable wall in the structure of the universe?
The latter, obviously.
Would that wall be CMOS? We can't really know until we hit it.
Bismuth, gallium arsenide and other variations on that theme, and various superconductors used for processing elements all have the potential to surpass silicon, or have already surpassed it, for various parameters. But none have done so in a way that would allow massive adoption.
So for now the 'economic inertia', as you put it so eloquently, seems to be exactly where the problem is. Inertia is only useful to get you past a hump when you're still moving forward; when you're already stuck, it is a hindrance.
From the article - possible replacements for CMOS include:
Rapid single flux quantum devices
Carbon nanotube field-effect transistor
Silicon nanowire FET
Spintronics, various types
Tunnel junction devices, e.g. Tunnel field-effect transistor
Indium antimonide transistors
Photonics, optical computing
I have read somewhere that this last one actually integrates fairly well with CMOS.
> “There is nothing on the horizon today to replace CMOS,” he says. “Nothing is maturing, nothing is ready today. We will continue with CMOS. We have no choice. Not just Intel, but everybody. If you want this industry to grow, then we need to be doing things with CMOS.”
The trend since the mid-1980s, when the Intel 80386 was introduced, has been for specialized chips to be replaced due to commodity chips beating them comprehensively in single-core speed; problems which once required specialized hardware and core designs were either solved or obviated by massive improvements in scalar hardware.
Now that this brute force method is starting to peter out, we might see a rise of new, specialized designs once again, to solve specific problems which gain substantial speedups from very specific kinds of parallelism. Systolic arrays are one example. We're already doing something like this by pressing GPUs into service as computational hardware, but it can go farther.
Modern server CPUs have FPGAs in them. I was able to speed up blockchain transaction signing and verification by two orders of magnitude by using Intel's built-in FPGAs.
Mobile phone SoCs already have semi-specialized processors in them. Microsoft's HoloLens has something they call an "HPU", a Holographic Processing Unit. Google just made public that they have a TensorFlow ASIC. Google's phone radar thingy has its own ASIC, if I'm not mistaken.
Maybe biotech will lead the next tech revolution. I can see people doing all sorts of unholy stuff with CRISPR. It's a technology, though, that, at least in the U.S., I can't see the regulatory agencies really supporting the development of. It's almost too powerful a technology.
It's sort of like how not much has happened in the way of nuclear power design lately. So much can go wrong with that technology that the regulatory agencies never approve anything.
I wonder if we'll reach a point where all the socially acceptable technologies have been developed and technologies like advanced nuclear or radical CRISPR biotech that could take things to the next level are permanently forbidden because they are too powerful and dangerous.
Our most advanced computing systems fill the equivalent volume of a large snowflake. Scaling down is over; the future is scaling up and sideways. (Power consumption, price...)
The only unit that ever really mattered is amortized flops per inflation adjusted dollar.
Also, we mainly ignore parallelisation because it's hard to reason about, which makes it easy pickings for AI to improve on if we do need stupidly large amounts of computing power. Although if the problem is inherently serial, we're just SOL, I guess.
This. Almost certainly blending biological ideas with more traditional chip manufacturing: 3D chips, self-healing, cooling integrated throughout the chip.
Augmenting our brains with implants. Maybe even expanding parts of our brains.