
ARM Pioneer Sophie Wilson Also Thinks Moore’s Law is Coming to an End - jcbeard
https://www.nextplatform.com/2017/04/13/arm-pioneer-sophie-wilson-also-thinks-moores-law-coming-end/
======
Animats
Some limits were hit a decade ago. The Pentium 4 (2004) clocked at 3.8GHz max.
Most Intel processors today are slower than that. Intel's fastest offering is
a little over 4GHz.

The article says that 28nm will dominate for another decade, even though 14nm
fabs exist. Having to use extreme ultraviolet (really soft X-rays) for
lithography drives costs way up. EUV "light sources" are insanely complex,
involving using lasers to heat falling droplets of metal into a plasma. It's
amazing that works as a production technology. The equipment looks like
something from a high energy physics lab.

It's interesting that we hit the limit of photons before the limits of atoms
or electrons.

Another problem with all this downsizing is electromigration. Every once in a
while, an atom gets knocked out of position by the momentum of the electrons
flowing past. Higher temperatures make it worse. Narrower wires make it more
of a problem.
This is now a major reason ICs wear out in use.

Getting rid of the heat is another problem. High performance CPUs are already
cooling-limited. This is also why 3D IC schemes aren't too useful for active
components like CPUs. Getting heat out of the middle of the stack is hard.
Memory can be stacked, if it's not used too hard.

There's no problem making lots of CPUs on a chip, if the application can use
them. Things look better server-side; you can use vast numbers of CPUs in a
server farm, but it's hard to see what 20 or 100 CPUs would do for a laptop.

Drastically different architectures may help on specialized problems. GPUs
have turned out to be more generally useful than expected. There will probably
be "deep learning" ICs; that's a problem where the basic operation is simple
and there's massive parallelism.

For ordinary CPU power per CPU, we're close to done.

~~~
petra
Maybe we should attack the memory wall problem first, since it promises a big
acceleration in some areas, and it's not strictly tied to the limits of
physics the way lithography is?

One way to do so is silicon photonics: it's possible to achieve 280 Tbps of
bandwidth at 168 W in a chip area of 4 mm². Or you could triple that, but
you'll need ~1.5 kW.
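
For scale, that's well under a picojoule per bit; my arithmetic, taking the
280 Tbps / 168 W figures above at face value:

```latex
\frac{168\ \text{W}}{280\ \text{Tb/s}}
  = \frac{168\ \text{J/s}}{2.8 \times 10^{14}\ \text{b/s}}
  = 0.6\ \text{pJ/bit}
```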

This of course will require better cooling, but DARPA has a project, using
microfluidic channels, that can cool 1 kW/cm² across the die, and up to
30 kW/cm² at localized hot spots.

And as for power delivery - with TSVs you can deliver 150 W/cm².

And sure, such chips may be more expensive in both cost and power than today's
are. But infinite memory bandwidth and photonics can make programmers' lives
so much easier, offer great acceleration in memory-constrained problems,
and make new architectures feasible (network computers with infinite
bandwidth?). So maybe there's an initial market?

And once we're there, and money starts flowing, maybe we'll see another type
of "Moore's Law" focused on reducing the costs of cooling, photonics, and
power delivery?

And BTW, some good news: ST (I think) is opening a photonics fab.

links:

[https://www.extremetech.com/extreme/224516-microfluidics-dar...](https://www.extremetech.com/extreme/224516-microfluidics-darpa-is-betting-embedded-water-droplets-could-cool-next-gen-chips)

[https://pdfs.semanticscholar.org/4f9b/d24ea869f64c4457f69ad6...](https://pdfs.semanticscholar.org/4f9b/d24ea869f64c4457f69ad6d9dc2b3ddb93ac.pdf)

[http://drum.lib.umd.edu/bitstream/handle/1903/17153/TR%20DRU...](http://drum.lib.umd.edu/bitstream/handle/1903/17153/TR%20DRUM%20revised%2023%20June%202016.pdf?sequence=5&isAllowed=y)

~~~
adrianN
Latency is more important than just bandwidth, and since RAM is pretty far
away from the ALU you'll quickly hit limits there.

~~~
barrkel
What if instead of passing a fetch across the bus, you passed a tiny little
program that included conditional indirections - if you want to get x.y.z,
you'd have a list of two indirections and an offset in the little program.

Push the latency problem of pointerful code closer to the data.
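
Something like this, as a rough sketch. The descriptor format is invented for
illustration; a real bus protocol would look different:

```c
/* Hypothetical "fetch program" shipped to the memory side instead of a
 * plain address. To load x.y.z the host sends one descriptor; the
 * memory-side engine chases the pointers locally and returns only the
 * final value, so each indirection doesn't cost a full bus round trip. */
#include <stdint.h>

#define MAX_HOPS 4

struct fetch_prog {
    uint64_t base;             /* address of x                         */
    uint8_t  nhops;            /* number of indirections (2 for x.y.z) */
    uint32_t offset[MAX_HOPS]; /* field offset applied at each hop     */
};

/* What the memory-side engine would do, written as plain C: */
static uint64_t run_fetch_prog(const struct fetch_prog *p)
{
    uintptr_t addr = (uintptr_t)p->base;
    for (uint8_t i = 0; i < p->nhops; i++)
        addr = (uintptr_t)*(const uint64_t *)(addr + p->offset[i]);
    return (uint64_t)addr; /* the value of x.y.z */
}
```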

~~~
hinkley
Here's an even simpler one: double lookup. Fetch a block of memory at the
address stored in this memory address (which I may not have fetched already).

Our data hierarchies create graphs that have lots of internal pointers in
them, and we are constantly warned about the cost of pointer indirections.
Memory already knows how to deal with addresses. Teaching it to chase pointers
should be easier than teaching it Boolean math.

[edit: it appears the real limitation to both our ideas is that virtual memory
prevents you from making any decisions on the wrong side of the MMU. You could
only ever make conditional fetches from the same page, which could be slightly
useful but would be so hard to use I don't know who would bother]
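
Concretely, the page check a memory-side engine would need is trivial; the
killer is that it has no way to consult the page tables when a pointer leaves
the page. A sketch, assuming 4 KiB pages:

```c
/* Minimal sketch of the MMU constraint from the edit above: the DRAM
 * side sees only physical addresses, so a memory-side pointer chase
 * must abort back to the CPU the moment a fetched pointer points
 * outside the page it already has. 4 KiB pages assumed. */
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SIZE 4096u

static bool same_page(uint64_t a, uint64_t b)
{
    /* Same page iff the addresses agree above the page-offset bits. */
    return (a / PAGE_SIZE) == (b / PAGE_SIZE);
}

/* In the chase loop, a hypothetical engine would do:
 *   if (!same_page(addr, next)) return CHASE_FAULT;  // punt to CPU
 */
```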

~~~
dom0
This does not work, because the latency of accessing memory is inherent to it.
It's simply the time required to open a DRAM row (or charge bit lines), which
stays more or less the same independent of process scaling. Moving the pointer
chasing closer to the memory does not change this larger delay.
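
You can see it in the datasheets. A back-of-envelope example with typical
DDR4 timings (my numbers, not from the article); the point is that this
figure has hovered around 25-30 ns for many DRAM generations:

```c
/* Rough DRAM access latency: row activate (tRCD) plus column read (tCL),
 * using common DDR4-3200 CL22 timings as an illustrative example. */
#include <stdio.h>

int main(void)
{
    double io_clock_mhz = 1600.0; /* DDR4-3200: I/O bus at 1600 MHz */
    double trcd_cycles  = 22.0;   /* row activate-to-column delay   */
    double tcl_cycles   = 22.0;   /* column access strobe latency   */

    double ns = (trcd_cycles + tcl_cycles) / io_clock_mhz * 1000.0;
    printf("row open + column read: %.1f ns\n", ns); /* ~27.5 ns */
    return 0;
}
```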

------
static_noise
Moore's Law has been the driving force of chip development?

Prophet Moore predicted the future and now engineers start breaking the law?

Isn't it the other way around, that Moore made an observation about an effect
that arose naturally? The formula was then called Moore's Law, and its
extrapolation had great predictive power for a long time.

Similar effects occur all through industries when you start scaling things up.
Quality will go up and cost per unit will go down. Often following a simple
mathematical formula which describes the learning curve.

In many technologies there is something called maturity where the straight
line in the diagram starts to bend and approaches a technical limit. Markets
overcome this a few times by changing the technological approach of solving a
problem to an approach that has a better limit. This makes the general trend
continue for decades... until the point where the next technology is so
expensive that no one can afford it anymore.

Thus far silicon has won every round, and chip manufacturing plants cost many
billions of dollars.

~~~
wtallis
Moore's law started out purely observational, but over the years it became a
driving force of its own. Chip manufacturers' performance is judged against
Moore's law. Roadmaps and timelines are drafted with Moore's law in mind.
Massive research investments are undertaken with the expectation that they
will enable a company to keep pace with Moore's law.

Even if you're not doing chip design, anyone planning more than one chip
product cycle into the future needs to take into account Moore's law. If
you're building a hardware system or even a software project, ignoring Moore's
law means that by the time you ship, your product might be cheaper than
expected but also missing features that are now cost-effective.

~~~
parrellel
Chip manufacturers aren't going to be able to keep it up, though. Look at all
the problems getting a 10nm chip working, and look at how those issues are
going to get much worse at 7 and 3nm scales. Advances now seem to be more
about putting back in all the optimizations that were ignored in the quest
for the physical bottom.

------
deepnotderp
I think a key point that's ignored is that data movement is the new problem.
For example, according to Lawrence Livermore National Laboratory, the cost
of moving a 64-bit word 1mm ON CHIP on the 10nm projection is approximately
equal to that of doing a 64-bit FLOP. And the cost of a DRAM access is
outrageous... It's what's holding back exascale and will hold back general
purpose compute as well.

Architectures MUST change radically to adapt to this or there can be no
progress.

~~~
wtallis
> the cost of moving a 64-bit word 1mm ON CHIP on the 10nm projection is
> approximately equal to doing a 64-bit FLOP

Is this the cost in Joules or nanoseconds?

~~~
deepnotderp
Joules, but the cost of memory access in terms of speed has an even worse
disparity.
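
To make the energy ratio concrete, a toy calculation; the picojoule figures
are stand-ins chosen only to match the word-per-mm ≈ FLOP claim, not LLNL's
actual numbers:

```c
/* If moving a 64-bit word 1 mm costs about the same energy as one
 * 64-bit FLOP (the claim above), then fetching an operand from across
 * a ~10 mm die costs ~10 FLOPs' worth of energy. Illustrative only. */
#include <stdio.h>

int main(void)
{
    double pj_per_flop    = 10.0; /* assumed energy of a 64-bit FLOP */
    double pj_per_word_mm = 10.0; /* assumed equal, per the claim    */
    double distance_mm    = 10.0; /* operand fetched across the die  */

    double movement_pj = pj_per_word_mm * distance_mm;
    printf("movement/compute energy ratio: %.0fx\n",
           movement_pj / pj_per_flop); /* -> 10x: the wires dominate */
    return 0;
}
```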

------
to3m
> In 1975, Wilson was part of the team that developed the 6502

You can get a better summary of her early career from her Computer History
Museum oral history interview:
[http://www.computerhistory.org/collections/catalog/102746190](http://www.computerhistory.org/collections/catalog/102746190)
\- worth your time.

~~~
lsllc
Fantastic read, thank you!

------
0xCMP
I imagine this will begin to put some pressure back to making things faster
again as speed-ups that were previously expected fail to appear (e.g. JS
performance on mobile).

These days it's not a big deal to most developers, but I think over the next
few years, if there aren't major advances in speed, we will want to get that
extra battery life and speed out of our applications and devices. Independent
developers will hopefully have a good financial reason to do that, unlike
today.

------
paulsutter
AI processor speedups will advance faster than Moore's Law in the next 2-3
years, mostly due to lower precision (12/8/4 bits instead of 64/32 bits),
massive parallelism, and a different programming paradigm. Google's TPUs, for
example, are close to hardwired matrix multiplication. Maybe speedups for
traditional scalar-oriented code matter less now.
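
To see why low precision buys so much, here's roughly the core op a TPU-like
part hardwires: an 8-bit integer dot product with a wide accumulator. An
illustrative sketch, not Google's implementation:

```c
/* TPU-style arithmetic: weights and activations live in int8, partial
 * sums accumulate in int32 so they don't overflow. An int8 multiplier
 * is far smaller and cheaper than a 32/64-bit FPU, which is where much
 * of the claimed speedup comes from. */
#include <stdint.h>
#include <stddef.h>

int32_t dot_i8(const int8_t *a, const int8_t *b, size_t n)
{
    int32_t acc = 0; /* wide accumulator */
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * (int32_t)b[i];
    return acc; /* rescaled back to real units by a quantization factor */
}
```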

Intel Lake Crest: "will enable training of neural networks at 100 times the
performance on today’s GPUs, said Diane Bryant, executive vice president and
general manager of Intel’s data center group"

[https://venturebeat.com/2016/11/17/intel-will-test-nervanas-...](https://venturebeat.com/2016/11/17/intel-will-test-nervanas-lake-crest-silicon-in-first-half-of-2017-knights-crest-also-coming/)

Google TPU: "The TPU...used 8-bit integer math...process 92 TOPS" (trillion
operations per second)

[https://www.nextplatform.com/2017/04/05/first-depth-look-goo...](https://www.nextplatform.com/2017/04/05/first-depth-look-googles-tpu-architecture/)

Generally:

[http://www.moorinsightsstrategy.com/what-to-expect-in-2017-f...](http://www.moorinsightsstrategy.com/what-to-expect-in-2017-from-amd-intel-nvidia-xilinx-and-others-for-machine-learning/)

~~~
zitterbewegung
I really don't think so. Current graphics cards are heading toward the same
issues Intel is facing. Improvements in GPUs are taking longer and longer to
arrive.

~~~
paulsutter
GPUs are more complex than what's needed for AI; take a close look at
Google's TPU, for example.

~~~
jacquesm
The TPU is used in inference, not in training; GPUs can be used for both.

[https://www.extremetech.com/computing/247199-googles-dedicat...](https://www.extremetech.com/computing/247199-googles-dedicated-tensorflow-processor-tpu-makes-hash-intel-nvidia-inference-workloads)

------
kurthr
The death of Moore's Law will have as much to do with CFOs deciding that the
investment isn't worth the return as it will with technological innovation.
When Intel decided to lay off 12k last year, it seemed like the writing was on
the wall. ITRS seemed to think so, anyway:

[https://www.hpcwire.com/2016/07/28/transistors-wont-shrink-b...](https://www.hpcwire.com/2016/07/28/transistors-wont-shrink-beyond-2021-says-final-itrs-report/)

Going from Tick-Tock to Tick-Tock-Tweak... and this year to Tick-Tock-Tweak-
Tuck, the fourth year of 14nm (still as compact as other companies' 10nm),
makes the slowdown palpable. Perhaps they will manage a 2.7x shrink at their
"10nm node" with or without EUV, but it's not the straight scaling of
yesteryear.

~~~
heisenbit
Yeah. There are plenty of distress signs in the semiconductor ecosystem:
Toshiba selling due to inability to invest, Mentor going to Siemens, and
Intel buying small tooling companies that are unable to sustain investment
because their customers have merged or gone out of business. It happens
across the whole value chain. A few vertically integrated businesses are able
to keep up for the time being. Whether for one, two or three generations
remains to be seen.

~~~
tarlinian
To be fair, Toshiba's sale has nothing to do with their semi business (they
lost so much money in their nuclear misadventure that they have to sell the
only profitable portion of their business to pay off remaining debts)...and
semiconductor manufacturing has been consolidating for 20 years. I
actually think that there is a chance for some newer customers to spring up in
China due to the massive investment being underwritten by the Chinese
government. The existing companies involved in chip manufacturing will be kept
from merging by regulatory fiat (see AMAT-TEL, Lam-KLA getting squashed on the
equipment side)...there's no way that TSMC, Intel or Samsung will be allowed
to buy any of their competitors, and they can't sell to China because of
CFIUS (or the Taiwan/Korea equivalent) interference. This means that if China
is
serious about getting into wafer manufacturing, they have to start from
scratch. Since NAND is likely supply constrained for at least another 5 years,
there is a chance they can pull it off.

~~~
heisenbit
The semiconductor business is becoming ever more capital intensive. Toshiba
needs capital to plug holes elsewhere, but they also need to invest;
otherwise their healthy chip business would fall behind quickly. You are
right, there is more than one reason for the sale.

------
api
I disagree about the limitations of software parallelism. The article is
correct that many existing algorithms like ray tracing or apps like web
rendering have inherent limits to parallelization, but there exist a large
number of "embarrassingly parallel" things that simply are not done on small
PCs and phones right now because they're too costly. This includes things like
neural networks, genetic algorithms, all kinds of optimization algorithms,
etc.

This article is from 2007 so it predates the AI renaissance. Lots of AI, ML,
and optimization stuff can happily eat as many cores as you want to throw at
it.

Then there's the multitasking angle. On a desktop at least I often run dozens
of applications, developer VMs, etc. I could definitely use 20 cores in a
desktop/laptop right now. We have tests that easily max out a 24 core server
that I'd love to run on my own box.

~~~
tonmoy
Sorry, which article from 2007 are you referring to? The OP is from 2017

~~~
api
Oops... misread. It is from 2017.

------
visarga
On the other hand, many computer functions have reached the "good enough"
level. A normal laptop can handle web browsing and document editing just fine.
Resolution beyond Retina level and digital cameras beyond 10 megapixels are
not necessary. Also, sound fidelity above 44 kHz is not useful. Video beyond
4K is likewise on a curve of diminishing returns. We have little extra
improvement to get from many domains. Where do you think more processing
power would add a large benefit?

~~~
Cyph0n
VR is the next step, and it obviously needs more computing power to truly
become immersive.

------
rhaps0dy
> Even for highly parallel workloads like ray tracing, the performance
> increase levels off at about 20 times. “No matter how many processors I
> apply, ray tracing ain’t going to go any faster than 20 times faster,”

What? That's just not true. Matrix multiplication is one such embarrassingly
parallel workload that can go much faster than 20 times. Ray tracing very
probably too.

~~~
randcraw
I suspect Wilson was saying "Since 5% of the runtime of the raytracing
algorithm cannot be sped up through parallelism, even if the time needed to
run the other 95% were reduced to zero due to parallelism (or some other
magic), the speedup of raytracing could never exceed 100/(100-95), or 20X."

In essence, Amdahl's Law trumps Moore's Law.
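
In formula form, with parallel fraction p = 0.95 on n processors:

```latex
S(n) = \frac{1}{(1-p) + p/n},
\qquad
\lim_{n \to \infty} S(n) = \frac{1}{1-p} = \frac{1}{1-0.95} = 20
```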

~~~
Dylan16807
And that's true with some workloads, and maybe some systems that use
raytracing, but not raytracing itself. The only overhead is combining the
final data from each processor, and that's log(n) in the number of processors
with a very small constant. A system with a million independent processors can
raytrace very nearly a million times faster.
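
The per-pixel independence is the whole point. A sketch of the parallel part
(trace_ray is a hypothetical stand-in for the real per-ray work):

```c
/* Every pixel is an independent task; the serial fraction is basically
 * just handing out tiles and assembling the final image. */
#include <stddef.h>

extern float trace_ray(size_t x, size_t y); /* hypothetical per-pixel work */

void render(float *img, size_t w, size_t h)
{
    #pragma omp parallel for collapse(2) /* each pixel on any free core */
    for (size_t y = 0; y < h; y++)
        for (size_t x = 0; x < w; x++)
            img[y * w + x] = trace_ray(x, y); /* no cross-pixel deps */
}
```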

------
buzzybee
If one believes Ray Kurzweil (among others), this is just a phase shift where
the focus of change moves away from this technology towards a new one. But
then the question is: which one? We do have some options floating around.

------
rini17
Memory did not get faster at nearly the same rate. You can cram bazillions of
transistors on a chip, and even do clever tricks to fix power
consumption/dissipation... but no trick will feed them data fast enough.

------
Symmetry
Yup, we won't be able to keep shrinking MOSFETs forever. There's likely to be
an interregnum of some sort before a new computing substrate is developed
that gives us substantially faster gates. And possibly fewer but
higher-frequency gates at first, which would be interesting.

In the mean time we might see a new golden age of computer architecture where
the only way to increase performance is to question assumptions about how we
design computers.

------
deepnotderp
We also always tend to neglect the equally important counterpart to Moore's
Law: Dennard scaling. Dennard scaling is on its deathbed, and has been
plateauing since around 40/28nm. Since power consumption is now the problem
for everyone, including supercomputers, this will compound the almost
impossible to solve data-movement wall.
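
For reference, the dynamic-power relation behind Dennard scaling:

```latex
% Dynamic power of a CMOS chip:
P = \alpha \, C \, V^{2} \, f
% Classic Dennard scaling: shrink dimensions by k, so C and V scale by
% 1/k and f by k, and power density P/A stays constant. V stopped
% scaling around 40/28nm, which is why the power problem is back.
```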

------
justinbaker84
Very sad to see this ending.

------
framebit
Interesting and relevant paper on the end of Moore's Law:
ftp://ftp.cs.utexas.edu/pub/dburger/papers/ISCA11.pdf

------
kutkloon7
Is this even news?

I have heard the dramatic "oh no, Moore's Law is coming to an end" a dozen
times during computer engineering courses. Professors are usually slow to
adapt to new information, and it's already been a couple of years since I
took those courses. I think transistor count growth has been slowing down for
about a decade already.

