
It's not as if Intel and others haven't looked at this and other III-V materials.

GaN seems to be more suited to amplifiers and as a replacement for the kinds of components GaAs is currently used for - not digital logic.

For a look at potential III-V materials for use in digital logic, see this article: http://www.extremetech.com/gaming/203926-analyst-intel-will-...

IMHO, Moore's Law will end in about 2019-2020. No exponential can go on forever. This will have huge ramifications for Silicon Valley. You're already seeing a slow-down. Ever wonder why Intel chips aren't getting all that much faster?



The majority of silicon on a chip is dedicated to masking the extremely high latency (50-100 processor cycles) and constrained bandwidth of DRAM (8 GB/s vs 1 TB/s with state-of-the-art GPUs). Adding more cores is complicated by the fact that you need to synchronise all those caches. Out-of-order execution also has a relatively high fixed power consumption, which is why modern x86 chips have a worse performance/watt ratio than ARM processors; ARM only introduced out-of-order capability in newer designs like the A9, which is the primary reason the iPad Pro approaches x86 performance.
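
To make that gap concrete, here is a minimal C sketch (my own illustration; every constant is arbitrary and the results are entirely machine-dependent): chasing dependent pointers through a buffer much larger than the last-level cache exposes raw DRAM latency, while a sequential sum over the same buffer runs near streaming bandwidth.

    /* Minimal sketch: dependent pointer-chasing (latency-bound) vs. a
       sequential sum (bandwidth-bound). Constants are arbitrary; results
       depend entirely on the machine. Build: cc -O2 memgap.c */
    #define _POSIX_C_SOURCE 199309L
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N ((size_t)(64 * 1024 * 1024) / sizeof(size_t))  /* ~64 MB, well past L3 */

    static double now(void) {
        struct timespec t;
        clock_gettime(CLOCK_MONOTONIC, &t);
        return t.tv_sec + t.tv_nsec * 1e-9;
    }

    int main(void) {
        size_t *buf = malloc(N * sizeof *buf);
        if (!buf) return 1;

        /* Sattolo's shuffle: one big cycle, so every load depends on the previous one. */
        for (size_t i = 0; i < N; i++) buf[i] = i;
        for (size_t i = N - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;              /* rand() is fine for a sketch */
            size_t tmp = buf[i]; buf[i] = buf[j]; buf[j] = tmp;
        }

        double t0 = now();
        size_t p = 0;
        for (size_t i = 0; i < N; i++) p = buf[p];      /* latency-bound: ~DRAM latency per access */
        double chase = now() - t0;

        t0 = now();
        size_t sum = 0;
        for (size_t i = 0; i < N; i++) sum += buf[i];   /* bandwidth-bound: streams the buffer */
        double stream = now() - t0;

        printf("chase: %.0f ns/access   stream: %.2f GB/s   (p=%zu sum=%zu)\n",
               chase * 1e9 / N, (double)(N * sizeof *buf) / stream / 1e9, p, sum);
        free(buf);
        return 0;
    }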

There are only a few solutions for this problem. Fast optical interconnects are needed or the DRAM has to be on the same package as the CPU.


What's that 8 GB/s referring to?

From context it sounds like you're referring to DRAM bandwidth. But no RAM interface is that slow anymore, except in embedded systems and some slower mobile devices. Single-channel DDR2 can be faster than that.

Phones have about 12-30 GB/s RAM bandwidth nowadays.

x86-based systems range from 25 GB/s (dual channel) to 130 GB/s (two-socket quad channel).

For example 2015 Macbook Pro 13" has about 29 GB/s memory bandwidth. Samsung Galaxy S7, 28 GB/s.

Also, which GPU you can actually buy has 1 TB/s bandwidth? You'd need HBM2 for that kind of number! The recently released Nvidia flagship GeForce GTX 1080 has 320 GB/s of bandwidth according to their own marketing material (http://www.geforce.com/hardware/10series/geforce-gtx-1080).
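
For reference, all of these peak figures fall out of the same back-of-the-envelope formula: channels x bus width x transfer rate. A quick sketch; the specific parts are illustrative picks on my part, nothing more:

    /* Theoretical peak memory bandwidth: channels x bus width x transfer rate.
       Part choices are illustrative assumptions, not a survey. */
    #include <stdio.h>

    /* bus_bits per channel, rate in MT/s; returns decimal GB/s */
    static double peak_gbs(int channels, int bus_bits, double mt_per_s) {
        return channels * (bus_bits / 8.0) * mt_per_s / 1000.0;
    }

    int main(void) {
        printf("dual-channel DDR3-1600:             %6.1f GB/s\n", peak_gbs(2, 64, 1600));    /* 25.6 */
        printf("2-socket quad-channel DDR4-2133:    %6.1f GB/s\n", peak_gbs(8, 64, 2133));    /* ~136 */
        printf("GTX 1080, 256-bit GDDR5X @ 10 GT/s: %6.1f GB/s\n", peak_gbs(1, 256, 10000));  /* 320 */
        return 0;
    }

(For 1 TB/s you really do need something like HBM2: four 1024-bit stacks at roughly 2 GT/s per pin works out to about 1 TB/s peak.)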

Unless, of course, you meant to compare 2018 GPUs to 2003 CPUs. Athlon 64, released in 2003, had up to 10.7 GB/s memory bandwidth according to http://www.cpu-world.com/CPUs/K8/AMD-Athlon%2064%202600%2B%2....


Architectures for deep learning and other AI approaches will be radically more parallel and require less synchronization.


I assume that off-chip SRAM would not improve things much, right?


We're near the end of scaling in silicon, but we're nowhere near the physical limits of computational efficiency in general. There are certainly hard limits and improvements will have to cease eventually, but not necessarily until the later parts of the 21st century.


The trouble is that various incarnations of silicon-based transistors are the best we know how to engineer right now and for the foreseeable future. There will probably be a stall for at least 5-10 years after silicon scaling ends in the late 2010s/early 2020s while we try to develop a suitable replacement that can continue to the ultimate physical limits.


> IMHO, Moore's Law will end in about 2019-2020. No exponential can go on forever. This will have huge ramifications for Silicon Valley.

In what way? One problem seems to be that most consumers don't need super powerful computers anymore. Previously, the speed jumps enabled you to do things you couldn't do before, but now the speed increases just make what you could already do a little snappier. We're seeing a lot more focus on miniaturisation (which does enable new use cases) and on reducing power consumption as well.


Are you kidding me? There are zillions of things our computers can't do today that would be absolutely fantastic if they were faster. You could have said the same thing in 1975 and it would have sounded just as silly.

Example: Render a model of an entire real city in 3d, including interiors, to sub-centimeter resolution. Now take that and think up a trillion variations on the theme.


Yeah, more powerful computers allow us to actually automate the things that currently still require a human being to manage.

Although @seanwilson is correct that almost no one needs a faster Excel, faster computers enable programs that automatically generate what Excel was being used to produce based on plain language specifications. That requires a big round trip to a monster server at the moment.


Well I did say "most people". Developers and graphic designers benefit much more but I'd say casual users and business users are seeing vanishing benefits.


If virtual reality works out (and a lot of money's riding on that), most consumers will need a lot of compute.


Yeah, VR has been a new and exciting development that will really promote a lot of upgrades.


My take on what it will mean:

(1) It will now make sense to invest in the development of higher-order compute architectures that take advantage of truly massive parallelism. In the past such efforts (connection machine, etc.) were killed by Moore's Law-- by the time they hit the street CPUs had already gotten so much faster their advantage had disappeared.

(2) It will also make sense to prioritize parallelism in software to a much greater degree than it has been historically.

(3) "Software Moore's Law" type efforts will get a huge boost. In the past we've almost had an "Eroom's Law" in software: it bloats in proportion to faster chips and more RAM. Expect to see performance in software become a real feature. (I think we're already seeing this.)

(4) Slow languages will become considerably less shiny. Fast languages like Rust, Go, and Swift that take good ideas from slower dynamic languages will become much shinier. C/C++ might also see a renaissance in mainstream interest. (This is already happening.)

(5) Once fabs all catch up to the state of the art endpoint fab tech for conventional chips, silicon fabrication will become a commodity service subject to commodity market forces. It will get cheap. This will accelerate even more as patents expire (or China ignores them).

(6) China (and specifically the Shenzhen ecosystem) has a very good chance of eating semiconductors due to #5. Nobody does manufacturing scale like the Chinese and once they catch up they'll own it unless players like Intel get serious about getting ahead of this.

(7) Also due to #5, the price per unit of compute may continue to fall for some time after Moore's Law hits the wall -- and partly because Moore's Law has hit the wall. Something akin to the current top-end Xeon could easily cost under $20 in 10-15 years. Smallish ARM chips could be priced by weight like a building material.

(8) As a consequence of #7, we might see what I call "Moore's Mushroom" -- a geometric fall in the price of massively parallel machines (also see #1). If the equivalent of a 48-core Xeon is $20, then a 1200-core monster might cost under $1000. For $10k you'd be able to buy your own equivalent of a decently sized data center.

(9) Power efficiency of chips may continue to improve for a while to the extent that this can be pursued without increasing density, and it might make sense to make this a huge priority once density is stable. Power/compute might still have a ways to go.

(10) As a consequence of #5, open source core designs will really hit the mainstream. You'll be able to take an open reference chip design and have it fabbed by the lowest bidder. "White label" OSS chips will appear.

(11) Finally, also due to #5 ASICs will get cheaper to make. Expect to see more of them, especially since it will now make sense to make an ASIC to speed up an algorithm as you can no longer just wait on Moore's Law to give you that speed for free. You'll see AI ASICs, crypto ASICs, graphics ASICs, data compression ASICs, even language specific 'accelerator' ASICs like JavaScript coprocessors, etc. With OSS core chip designs these could easily get bolted on, leading to weird custom cores like a "JavaScript optimized OpenRISC white label CPU."

(12) It might still be possible to push single-threaded performance by improving cooling, e.g. by mainstreaming liquid or active refrigerated cooling. The current overclocking record for cryogenically cooled Xeons is around 6.2 GHz. Will we see liquid nitrogen in data centers as a common thing?

(13) The PC and mobile device upgrade cycle will slow a lot, which will further kill profits in hardware all the way up the stack. Hardware's about to become a hard business.

TL;DR: price/compute and power/compute will continue to improve for a while; semiconductor margins will get squeezed; software efficiency will matter more; ASICs will get more common; open source chips might finally arrive.

All of this might actually be good for Silicon Valley, which mostly does software not hardware. Software's about to get even more important.


Thanks for the insightful list. Bunnie Huang has been arguing for some time that the end of Moore's Law also means that Open Hardware and maintainable hardware in general might become more common:

> "Also, as Moore’s law decelerates, there is a potential for greater standardization of platforms. While today it seems ridiculous to create a standard tablet or mobile phone chassis with interchangeable components, this becomes a reasonable proposition when components stop shrinking and changing so much."[1]

[1] http://www.bunniestudios.com/blog/?page_id=1927


Great list! To quote Bob Colwell (former Intel fellow and DARPA director), it's hard to replace an exponential. There will be non-uniform improvements and "one-off" type breakthroughs, but a sustained 2x every 2 years is just not going to happen ever.

As a thought experiment, what if Moore's Law scaling had ended in 2000? No smartphones.

I agree that the value of software engineering and computer architecture in general will increase as the transistor-scaling free ride ends. We have to make more efficient use of existing transistors that will no longer scale going forward.


> As a thought experiment, what if Moore's Law scaling had ended in 2000? No smartphones.

2000? No, we would still have had smartphones. We were not in the Stone Age anymore. Mass-market smartphones would eventually have been built with that era's process technology.

By the year 2000, we had full-color displays and StrongARM CPUs at 206 MHz+; a similar process could stretch to 400-500 MHz.

We'd have our beloved smartphones. Just with lower resolution screens (at most 640x480), slower CPUs (200-500 MHz single core, 8 kB L1 cache), 64-128 MB RAM and less storage (up to 1 GB). Packet data connections would have likely been limited way below 1 Mbit/s. (A)GPS would exist, but be much less sensitive (needs a lot of processing power) and take a lot longer to lock. Web browsers and other apps would have eventually been developed that work smoothly with such limited HW.

I think I could have lived with such a moore's-law-stopped-year-2000-device.

See what a circa-2000 StrongARM can do in a mobile device (PocketPC Tomb Raider, just like the original PlayStation version):

https://www.youtube.com/watch?v=TujJ4nf31rI

Or these demos (run nicely on 2000 PocketPC HW):

https://www.youtube.com/watch?v=bjZupXHhflA

https://www.youtube.com/watch?v=E2CDOtHswR0


"Web browsers and other apps would have eventually been developed that work smoothly with such limited HW"

and they would probably feel as fast as today's phones for many applications.


Absolutely, when it comes to CPU side of things.

Of course, network latency would likely be higher than the 13-20 ms we currently enjoy on 4G+ networks -- probably something like 50 ms. The main source of high latency, the circuit-switched network, would still eventually have been replaced with a purely packet-switched system.


Quibble: I disagree about 'no smartphones.' Look at what is possible on a 64K/8-bit machine in the 1980s:

https://en.wikipedia.org/wiki/GEOS_(8-bit_operating_system)

An iPhone could be done on a single-core 100 MHz system if the code is efficient. That would mean forfeiting some amount of eye candy and developing more efficient GUI layers.

I do agree that some smart phone functionality like super-high-end 3d maps apps might not exist, but a basic map with basic turn-by-turn directions could be built for the Commodore 64 by programmers who actually think about performance.

This is why I think a "Software Moore's Law" and performance is going to matter a lot. Premature optimization might no longer be the root of all evil. It might be a skill programmers need to have.


I agree here with the word "possible".

However, progress on this front would be slow due to the increased cost of software development. Software would be much more difficult to write leading to longer lead times for applications.

I don't think the app stores would have scaled at the same rate that they did for the iPhone or Android.

One analogy is parallel programming. This is generally very difficult, so people avoid it, even when hardware vendors give you a bounty of cores. Another example is GPGPU - very slow progress on mapping data-parallel apps to the GPU (image processing etc.).
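
As a toy illustration of the kind of thing that bites (a sketch, not anything specific to a real codebase): two threads incrementing a shared counter without synchronisation quietly lose updates -- and that's the easy end of the problem.

    /* Sketch of why naive parallelism bites: two threads bump a shared
       counter with no synchronisation, so increments get lost.
       Build: cc -pthread race.c */
    #include <pthread.h>
    #include <stdio.h>

    #define ITERS 1000000
    static long counter = 0;                 /* shared, unprotected: data race */

    static void *bump(void *arg) {
        (void)arg;
        for (int i = 0; i < ITERS; i++)
            counter++;                       /* read-modify-write, not atomic */
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, bump, NULL);
        pthread_create(&b, NULL, bump, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        /* Expected 2000000; a typical run prints something smaller. */
        printf("counter = %ld\n", counter);
        return 0;
    }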

We may never know which future devices and applications won't be possible because of Moore's Law's end. Another example is the recent explosion in deep learning - this was only possible due to the increase in FLOP density as a direct consequence of Moore's Law. These algorithms were around in the 80s.


"Software would be much more difficult to write leading to longer lead times for applications."

This is a problem we will need to solve. So far we've increased programmer productivity by sacrificing code efficiency because Moore's Law gives us more speed for free. We'll have to find ways to reduce this trade-off in the future by thinking harder and deeper about software since the hardware's not going to save us anymore.


Colwell's presentation from Hot Chips two or three years ago really nails it. There are a lot of potential paths to increased performance/efficiency but you don't just replace the sort of gains that came from CMOS process scaling.



Basically, with Moore's Law coming to the asymptote of its sigmoid, we're turning the Wheel of Reincarnation again towards specialized domain-specific hardware (http://www.retrologic.com/jargon/W/wheel-of-reincarnation.ht...) and software systems distributed over multiple specialized pieces of hardware.

Since I work somewhere that basically makes custom ASICs, this is very nice for us and our ecosystem.

>(11) Finally, also due to #5 ASICs will get cheaper to make. Expect to see more of them, especially since it will now make sense to make an ASIC to speed up an algorithm as you can no longer just wait on Moore's Law to give you that speed for free. You'll see AI ASICs, crypto ASICs, graphics ASICs, data compression ASICs, even language specific 'accelerator' ASICs like JavaScript coprocessors, etc. With OSS core chip designs these could easily get bolted on, leading to weird custom cores like a "JavaScript optimized OpenRISC white label CPU."

There are a couple of intermediate steps here:

9') FPGAs are pushed to get cheaper and more powerful.

10') Tooling improves drastically for FPGAs and ASIC development. Currently, that tooling is decades behind software development.

10'') A combination of 9' and 10' enables a thriving ecosystem of writing algorithms in hardware, sharing the hardware designs around, and everyone deploying to racks of FPGAs when they can't afford full-blown ASIC production.


I agree that fab commoditization is on the horizon, and I'm inclined to go a little further than you on the effect: I think we're going to see (in maybe 20 years) an ASIC design golden age where custom and semi-custom designs become the norm in performance-sensitive applications, and demand drives up the scale and down the costs of tooling, expertise, and deployment.


I don't think an ASIC golden age requires fab commoditization; in fact, it could benefit from more custom tuning to the individual process, as opposed to being developed for bulk silicon.


Neat write-up, I think you have some interesting insight here. I don't know the fab scene nearly as well, but I think you may be missing a stage (purposefully?). After exponential growth fades, the next stage will likely be more linear, which will have its own consequences. What you're describing here sounds more flat or logarithmic (think battery tech from 1980 to 2000). Not saying it won't happen, but I think linear growth will happen first and have its own separate impact. For example, we may see a stage where hardware tends to grow in price rather than decrease, because economies of scale will also follow the linear pace. You can no longer just "throw hardware" at your capacity problems, but companies will try, and the ones with deep pockets will continue to do so.


With linear shrinking factors, and keeping the current exponentially growing costs, I'd expect fabs to specialize: a few going the route of smallest possible features at a huge price, and most going the route of commodity, low-price chips.

In fact, that seems to be already happening.


>All of this might actually be good for Silicon Valley, which mostly does software not hardware.

I don't really disagree with either this or any of your other points. Though a potential return to specialized hardware is something of a counterpoint (even allowing for the fact that a lot of that specialized hardware will mostly be about effectively burning algorithms into chips.)

That said, there's an entire industry and set of consumer expectations built around some combination of faster/cheaper/smaller/more features. Take away the dominant knob that's made those trends possible, CMOS process scaling, and the consequences could be significant.


I don't think fab tech will become that commoditized. If anything, the end of differentiation on process node size alone will drive differentiation in other areas, such as new types of transistors (tri-gate has already become somewhat popular). More complicated manufacturing technologies that are currently too expensive relative to the performance gain, such as SOI, might take off as well. There is still a lot of room for innovation in the process beyond just shrinking the features; shrinking has just generally been the cheapest way to improve performance until now.


This is a step in the direction of the ASICs that you predict.

http://www.anandtech.com/show/10340/googles-tensor-processin...



