
Moore's law hits the roof - nkurz
http://www.agner.org/optimize/blog/read.php?i=417
======
Animats
There are three main limits, and several secondary ones.

The big one is heat. As densities go up, so do power consumption and heat
dissipation for active cells. Inactive cells are OK, which is why the highest
density devices are now flash memories. That doesn't help CPUs.
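
To put a rough number on it, the standard first-order model for CMOS switching power (a textbook approximation, not specific to any vendor) is

    P_dyn ≈ α · C · V² · f

where α is the fraction of gates toggling per cycle, C the switched capacitance, V the supply voltage, and f the clock frequency. Dennard scaling used to drop V along with feature size, which kept power per unit area roughly constant; with V now near its floor, packing in more active gates means more heat.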

The next is fab cost. As someone else pointed out, state-of-the-art fabs cost
billions now, and there are fewer of them with each generation.

Finally, of course, sizes are getting close to the atomic level.

Secondary limits include electromigration (the atoms won't stay where you put
them as they're pulled on by electric fields) and even X-ray wavelengths not
being short enough for mask lithography.

Heat is the biggest problem. CPUs aren't that big; you can fit a thousand in a
shoebox if you can cool them. The Bitcoin ASIC business is a worst case. Those
devices need no external memory and very little I/O, but most of the gates
toggle on every cycle. Mining farms are now located in cold climates near
cheap power stations.

------
elorant
From another point of view, perhaps we'll finally have to start writing better
code instead of solving software issues with better hardware.

~~~
akerro
Impossible. The entry bar for CS and programming drops every year, and
languages that don't require any CS knowledge or much thinking become more and
more popular. During my undergraduate studies no one told me that things like
CPU caches or branch prediction exist; we were just told that C#, Java, JS and
PHP are used in the industry, so learn them and you will have "a job". Some
students did their MP in PHP or JS where you could log in via the red status
bar (HTML injection in the login form). We were not told about Vagrant or
static code analysis. I had to learn it myself from reddit and from
contributing to OSS. People who call themselves senior JS developers don't
know how to solve basic algorithmic problems.

~~~
CrLf
Maybe this is the source of the recurrent discussions here, and elsewhere,
about how formal CS education doesn't matter (something I wholeheartedly
disagree with).

Universities have always been under pressure to provide courses "better
suited" to market needs, and it seems to me that they've been slowly caving in
over the last decade or so. This, of course, probably varies a lot between
countries and even between universities in the same country.

The fact is: university courses that degenerate into professional training
courses end up providing little value. Timeless CS principles are abandoned
for marketable high-level skills that are obsolete even before students finish
their studies, and that help nothing when the market itself changes (or needs
to change).

I see Europe's Bologna process as a perfect example of this. It was supposed
to make higher education comparable between EU countries but instead now
provides an excuse for shorter and shallower courses, comparable only because
they don't go any further than the lowest common denominator.

What you say is really evident in some circles: a magical view of computing
platforms and a complete ignorance of hardware details critical for software
performance. There is blind faith in compilers/interpreters somehow extracting
maximum performance from the hardware in the face of wastefulness.

As anecdotal evidence, I've more than once had to convince people that
allocating new objects by the millions (on the JVM) has a real CPU cost. The
usual response is that the application isn't memory-limited and that the time
spent in the garbage collector is negligible, with no thought given to cache
locality or to the fact that memory access is now a major bottleneck for
modern CPUs.
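
To make that concrete, here's a minimal Java sketch of the two styles
(hypothetical example, not from any real codebase; note that the JIT can
sometimes eliminate allocations that don't escape):

    // Millions of small heap objects vs. flat primitive arrays.
    final class LocalityDemo {
        static final class Point {
            final double x, y;
            Point(double x, double y) { this.x = x; this.y = y; }
        }

        // Allocation-heavy: one object per element, scattered across the
        // heap, so each access is a pointer chase and the GC has millions
        // of objects to trace.
        static double sumBoxed(Point[] points) {
            double sum = 0;
            for (Point p : points) sum += p.x + p.y;
            return sum;
        }

        // Cache-friendly: two flat arrays read sequentially, so the
        // hardware prefetcher keeps the pipeline fed and there is no GC
        // pressure at all.
        static double sumFlat(double[] xs, double[] ys) {
            double sum = 0;
            for (int i = 0; i < xs.length; i++) sum += xs[i] + ys[i];
            return sum;
        }
    }

The second version isn't "limited by memory" in the sense people mean; it
simply never leaves the cache-friendly access pattern.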

~~~
tracker1
I don't know that I agree... people can learn in many different ways, and
don't always need to deal with every aspect. If all I want to do is build
birdhouses, do I need to learn everything a structural engineer has to learn
to build skyscrapers?

That said, I had a similar issue with a simulation where each virtual unit was
its own OO object, with its own threads for events... we could only run about
4 simultaneous simulations on a server, and it still bottlenecked (GC behavior
was horrible). Changing it to a single event loop that signaled each unit
(sketched below) made most of the concurrency issues go away. Streamlining the
message passing (a more functional approach) dropped memory usage a lot... I
couldn't convince the manager to switch away from the SQL backend, which was
the final bottleneck (normalized data structures), since by then it was fast
enough.
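
A rough sketch of that restructuring in Java, with hypothetical types (the
real system was more involved):

    import java.util.ArrayDeque;
    import java.util.Queue;

    final class EventLoopDemo {
        interface Unit { void handle(String event); }

        static final class Event {
            final Unit target;
            final String payload;
            Event(Unit target, String payload) {
                this.target = target;
                this.payload = payload;
            }
        }

        // One loop, one thread: units never race each other, so no locks,
        // no per-unit thread stacks, and far less garbage for the GC.
        static void run(Queue<Event> queue) {
            for (Event e = queue.poll(); e != null; e = queue.poll()) {
                e.target.handle(e.payload);
            }
        }

        public static void main(String[] args) {
            Queue<Event> queue = new ArrayDeque<>();
            Unit unit = event -> System.out.println("handled: " + event);
            queue.add(new Event(unit, "tick"));
            run(queue);  // prints "handled: tick"
        }
    }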

Had another instance where a configuration table from a database was loaded
into memory (for performance), but then kept in memory as a DataTable, with a
text-based query run for each access of each key, and that happened over a
thousand times in a single web request (logins were taking too long)...
Changed it to a Hashtable, and lo and behold, concurrency was no longer a
problem... There were similar issues with not understanding how static
variables work in a multithreaded application... OO ftw again (sarcasm).
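
The fix amounts to paying the parse-and-scan cost once at load time instead of
on every access. In Java terms (the original was .NET's DataTable; this is
just the shape of the fix):

    import java.util.HashMap;
    import java.util.Map;

    final class ConfigCache {
        private final Map<String, String> settings = new HashMap<>();

        // One pass over the rows at load time...
        ConfigCache(Map<String, String> rowsFromDb) {
            settings.putAll(rowsFromDb);
        }

        // ...then every lookup is constant-time, with no query string
        // to parse and no table to scan.
        String get(String key) {
            return settings.get(key);
        }
    }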

~~~
CrLf
"If all I want to do is build bird houses, do I need to learn everything a
structural engineer has to learn to build sky scrapers?"

Of course not. People should have a choice of how and what they want to learn.
I just disagree when choice is turned into equivalence (i.e. formal education
providing no value over self-learning, or knowledge for building birdhouses
being passed off as knowledge for building skyscrapers).

In some cases (for some people) it may not actually matter, but in the general
case it looks like it does.

~~~
tracker1
I will say that it depends on the person... I went from writing simple scripts
in the '80s/'90s to actually reading up on lower-level CS, and although I
don't write C/C++ day to day, I do understand the concepts. And while I don't
usually have to think about such things, understanding memory constraints and
how caching and process swaps work in the CPU has come in useful even for
higher-level language projects.

But isn't that why we're supposed to have "Junior" and "Senior" level
positions? I think what software development really needs is a loosely
structured guild system where you gain rank not by attrition or seniority but
by reputation, and where you gain/lose reputation based on who you back for
seniority and how well they do. It seems that beyond a CS education, it's a
constant learning process. You cannot make a career only with what you learned
getting a degree in CS.

~~~
CrLf
That's difficult to achieve without the reputation system being overrun by
politics sooner rather than later, I guess.

It's true that a degree in CS doesn't make a career, and that it's mostly a
constant learning process. After a while the difference between having a
degree or not having one is difficult to measure and junior/senior levels
become defined by experience. But everything else being equal, the person with
the degree will have forgotten a lot that the person without the degree wasn't
exposed to in the first place. Many people will be exceptions to this, but I'm
just talking about the general rule here.

Once you enter an organization, you build reputation there. This is the only
thing that matters. To transfer this reputation between organizations, the
organizations themselves must recognize the transfer (a form of trust). When
you have long and convoluted interview processes, it means there is no such
trust. The organization you are trying to join is basically starting from
scratch in evaluating your skills, disregarding information that says that
even if you can't balance a binary tree now, you once could, and so can do it
again if required.

University degrees are supposed to provide this information. They provide
knowledge, but they also provide a path that is known to be difficult, with a
known level of "reputation" once completed. Experience alone cannot possibly
provide this, because work at company A says little to company B if the inner
workings of A are an unknown quantity to B.

This is the theory, at least. If universities aren't doing their work
properly, this whole system collapses.

------
paulsutter
The most important new code is massively parallel. It runs on GPUs today, but
surely on new architectures soon. The human brain is highly parallel, so no
surprise there.

Look at how tiny the source code to DeepMind's general Atari player is. It's a
completely different model from traditional development:

[https://sites.google.com/a/deepmind.com/dqn/](https://sites.google.com/a/deepmind.com/dqn/)

~~~
sushirain
You are right. If deep learning takes over more and more of the processing, we
will need GPUs or more CPU cores, not necessarily higher clock speeds.

------
jp555
As far as where to innovate, how about engineering wifi/3G/LTE to use less
than 700mW of power?

The smartphone is the first universal technology for mankind, and the CPU is
not the bottleneck there; it's screens and radios, all batteries being equal
of course.

~~~
pjc50
Power per bit per km from the tower _is_ slowly decreasing there, but it
requires a new clever noise-floor hack for each generation.

For a given modulation scheme, the power required to reliably transmit one bit
a specific distance is effectively fixed by the laws of physics.
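
For reference, the bound in question is the Shannon limit (standard
information theory, not specific to any radio standard):

    C = B · log2(1 + S/N)

Capacity C grows only with bandwidth B and signal-to-noise ratio S/N, and the
energy per bit can never fall below E_b = N_0 · ln 2, where N_0 is the noise
power density (roughly kT for thermal noise). Better coding moves real systems
closer to that bound; nothing moves the bound itself.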

------
woodchuck64
> Another possible improvement is to include a programmable logic device on
> the CPU chip.

Given that, earlier this year, Intel plunked down $16 billion to buy the #2
FPGA maker Altera, this might turn out to be the most interesting improvement.

~~~
dennisgorelik
How come losing money is an improvement?

~~~
woodchuck64
> How come losing money is an improvement?

Purchasing a company is not necessarily "losing money" unless no synergy is
possible between core competences. In fact, Altera's and Intel's respective
core competences suggest that programmable logic devices next to high-speed
multi-core CPUs may be one way to keep Moore's law going (see
[http://www.eejournal.com/archives/articles/20140624-intel/](http://www.eejournal.com/archives/articles/20140624-intel/)
for example).

------
wyldfire
> Intel is also catering to the market for supercomputers for scientific use.
> Intel's Knight's Corner ... a small niche market ... RAM speed is now often
> a serious bottleneck.

Many scientific supercomputing applications are migrating to GPU workloads via
OpenCL and CUDA. One enormous benefit of making this migration is the
extremely fast memory available on GPUs.

------
yazaddaruvala
> A likely development will be to put the RAM memory on the same chip as the
> CPU (or at least in the same housing) in order to decrease the distances for
> data transmission.

I've been waiting for this for 5 years; I don't know why it hasn't happened
already.

- Optimized clock speeds and layouts, improved connectivity (i.e. an increased
number of RAM channels)

  - Reduced latency

  - Increased bandwidth

- Would there even be a need for an MMU?

  - CPUs wouldn't need to work with a variety of RAM types

  - i.e. less complexity, fewer points of failure

The only con:

- RAM size would not be configurable

Meanwhile, for better or worse, the majority of consumers can't reconfigure
their own RAM (laptops/smartphones), or on a desktop don't know how.
Businesses would either buy beefy hardware that suited their constraints or
spend money on custom configurations. So this con really only affects
hobbyists.

~~~
nordsieck
The reason this will never happen the way you're imagining it is yields: 16 GB
of RAM is just too much die area.

In a small way, this has already happened, though. The Intel Iris Pro has 1.5
GB of basically 4th level cache, shared between the GPU and the CPU.

I could see memory on the package, though, which might provide some of the
benefits you're looking for.

~~~
Narishma
> The Intel Iris Pro has 1.5 GB of basically 4th level cache, shared between
> the GPU and the CPU.

You mean 128 MB?

~~~
nordsieck
Whoops.

------
zappo2938
I read this on a $149 Chromebook, wondering why I need a faster processor.
Sure, the cloud servers this ARM-powered computer connects to run on Intel
processors, but at this point the optimizations are in the software.

~~~
TeMPOraL
Fortunately, not everything is on the Web yet, and not everything interesting
to do with computers is done on-line.

I read this wondering why CPU prices aren't dropping - my i5 costs as much
today as it did exactly 4 years ago, when I bought it - and I'm sad about
this, because I could use a faster CPU for Kerbal Space Program, but I'm not
going to pay a third of a new computer's worth for a marginal improvement in
clock speed.

~~~
szatkus
That's just the desktop i5 segment. Intel threw more cores at Xeons and much
higher clocks at low-TDP CPUs, but that means nothing for the desktop user.
IPC only improved by about 25% over the last few generations. Also, AMD has
nothing competitive in that segment.

------
ninjakeyboard
I think Akka is sort of heading in the right direction with its ubiquitous
language. Akka talks about itself in terms of solving the "distribution
problem" - not necessarily across machines, but across machines OR cores. It
treats "location transparency" as a first principle, meaning that an
actor/worker that processes a message can be on another CPU or on another
physical machine, and the location of that actor is transparent from the
perspective of the code - it's a deployment/configuration concern.

It's convenient that multicore utilization is a scale problem at the same time
that utilizing networked compute resources (i.e. in the cloud) is a scale
problem, because we might be able to treat them as similar problems. Yes, the
network introduces some other unique qualities
([https://en.wikipedia.org/wiki/Fallacies_of_distributed_compu...](https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing))
but, reliability assertions aside, it's an interesting observation that they
can be treated similarly if the message-passing mechanics are abstracted. E.g.
they both fit asynchronous, event-driven paradigms, and both need shared state
eliminated to avoid errors in your code (which the network requires anyway).

Multicore concurrency in code isn't too hard, but threads and locks are
probably the wrong abstraction. Similarly, functional approaches might be a
better "default" for a concurrent world.

~~~
paulddraper
Akka is sold as a concurrency abstraction that works great at any level: one
core, multicore, multi-host.

After using it for years, I can say it does work, but only with a high
complexity overhead. Failure conditions, retry logic, serialization, and
delivery guarantees are a concern of multi-host parallelism, but not really of
multicore (same-process) parallelism.

I'm not sure I've ever seen an abstraction over memory-separated concurrency
that doesn't add a lot of complexity. It may be that such a difference is
fundamental and isn't meant to be abstracted over.

In any case, unless you are sure that you want to design for distributed
parallelism, I recommend conventional thread pools, etc.
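
By "conventional" I mean the plain java.util.concurrent toolbox, where none of
the distributed concerns leak in (hypothetical task, just to show the shape):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    class PoolDemo {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(
                    Runtime.getRuntime().availableProcessors());
            try {
                // Submit independent tasks; no supervision, serialization,
                // or delivery semantics to reason about.
                List<Future<Long>> results = new ArrayList<>();
                for (int i = 1; i <= 8; i++) {
                    final long n = 1_000_000L * i;
                    results.add(pool.submit(() -> {
                        long sum = 0;
                        for (long k = 0; k < n; k++) sum += k;
                        return sum;
                    }));
                }
                for (Future<Long> f : results) System.out.println(f.get());
            } finally {
                pool.shutdown();
            }
        }
    }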

Also,
[https://www.chrisstucchio.com/blog/2013/actors_vs_futures.ht...](https://www.chrisstucchio.com/blog/2013/actors_vs_futures.html)

------
6d0debc071
Maybe we'll finally get some interesting processor architectures. The race for
ever more speed has its reasons, but it's not been built on a particularly
helpful abstraction layer.

------
eliben
Agner is legendary, but this piece is too Intel-focused. It operates under the
assumption that only Intel can push the industry forward. Yes, we're unlikely
to see many more process shrinks with silicon, and yes, power is a problem;
but this is a large market with many powerful players, and assuming that it
will stall because Intel stalls is extremely short-sighted.

~~~
scholia
The reality, at the moment, is that Intel may well be the only company that
can push the industry forward.

Each time you introduce a new process node, the cost of the fab goes up
dramatically, and now we're talking about $14 billion for the next Samsung
fab. (This is Rock's law.)

If you go back a few years, there were a couple of dozen chip manufacturers
around the leading edge, including Toshiba, Siemens, Sony, Fujitsu, NEC and
Philips.

With each new step, one or two of these companies can't find or justify the
money to build a very expensive fab for the next process, so they drop out.

Today, we're down to four or five: Intel, TSMC, Global Foundries (ex AMD+IBM),
Samsung and maybe Europe's STM. So Toshiba, Siemens, Sony, Fujitsu, NEC,
Philips and several others have all dropped out.

Which of the four or five leading companies has both the money and the need to
keep pushing on?

Looks to me like Intel and Samsung.

The only other option I can see is Apple putting a ton of money into TSMC,
which currently makes most of its money fabricating chips for mid-market
companies like MediaTek. That's a possibility because it would stop Apple from
being too reliant on Samsung.

Rock's law
[https://en.wikipedia.org/wiki/Rock's_law](https://en.wikipedia.org/wiki/Rock's_law)

~~~
eliben
But Intel is far from being the richest tech company out there. What's $14
billion to Apple, with hundreds of billions of cash in the bank? It's not that
much for Google either, or for other tech giants.

~~~
scholia
Which is why I suggested that Apple might put money into TSMC....

On the other hand, having pots of money doesn't actually mean a company can do
anything significant in a particular area. Look at all the knock-offs in the
Google Graveyard for examples.

------
JumpCrisscross
When Intel says 10nm, does that actually mean 10nm (as in twenty [EDIT: fifty]
Si-atoms wide)? Or is it an industrial term of art?

~~~
mchannon
Yes, it actually does mean 10nm wide, though it usually refers to the minimum
feature size (as in, not every part of the chip is 10nm wide, just some parts
of it).

Since there's more than just silicon in a chip, those features can often be
made of metal atoms, which can be smaller than Si, and hence more than 20
atoms wide.

~~~
thechao
The answer to the second, unasked question is: if cost were no barrier, the
lower bound for our current class of tech is almost certainly an effective
feature size of ~1.5nm, or about 5 or 6 generations out. When I'm feeling
bullish (usually after talking to my boys at Intel) I feel confident we'll
_eventually_ see 3nm. I suspect something more exotic will intercept by then
and carry on the effective feature-size reduction after that. Of course, by
then my hot-air-lifting-body-balloon unicorn will have materialized, so I
won't care =)

------
interdrift
That could be good for us. It means it's time for innovation.

~~~
emsy
You already see "innovation" from Intel in segmenting the market so they can
maximize profit without increasing processing speed (for example, the way they
artificially separate their server CPUs from their desktop CPUs). In the long
run I hope you're right (and that AMD is able to catch up).

~~~
frik
Let's hope AMD and the various ARM CPU vendors catch up. Intel Xeons are
overpriced i7s with ECC memory enabled. And we've been stuck at ~3GHz
single-core raw speed since 2004 (memory got faster, etc. - we know!). We
would benefit from 10GHz single-core performance.

~~~
emsy
Definitely!
[https://twitter.com/reubenbond/status/662061791497744384](https://twitter.com/reubenbond/status/662061791497744384)

------
anaip1
The time for Elixir is coming near! :)

------
cft
That probably means there will be no "web 3.0". Not sure how to benefit
monetarily from this prediction.

The next tech boom may be in biology, driven by cheap sequencing and its data.

~~~
yo-code-sucks
You mean the Semantic Web? It's here.

------
peter303
I remember reading this in 1980s.

~~~
whistlerbrk
In the 1980s we weren't hitting the physical limits of the size of individual
atoms, though. Those gates can only get so small, right?

~~~
yogthos
We're very far from hitting any theoretical limits. We're just hitting speed
limits for silicon. Meanwhile, we already know of several substrates that
outperform silicon by orders of magnitude. Here are a few off the top of my
head:

graphene [http://www.extremetech.com/extreme/175727-ibm-builds-
graphen...](http://www.extremetech.com/extreme/175727-ibm-builds-graphene-
chip-thats-10000-times-faster-using-standard-cmos-processes)

spintronics: [http://www.technologyreview.com/view/428883/a-spintronics-
br...](http://www.technologyreview.com/view/428883/a-spintronics-
breakthrough/)

photonics: [http://www.gizmag.com/photonic-quantum-computer-
chip/38928/](http://www.gizmag.com/photonic-quantum-computer-chip/38928/)

memristors:
[https://en.wikipedia.org/wiki/Memristor](https://en.wikipedia.org/wiki/Memristor)

There's no theoretical breakthrough needed to use these technologies; it's
simply a matter of moving them from the lab to mass production. So far there
has been little incentive, as companies are still squeezing the last bits of
juice out of silicon, but as we start hitting the limits it's only a matter of
time until these technologies get phased in.

------
whatssttsypu
If I had unlimited computing power, I'd have builds recompiling constantly,
and AI running, storing, and analyzing results.

It would be awesome!

------
chrishawn
But wouldn't quantum computing qualify? It seems short-sighted to just focus
on Intel.

~~~
tfgg
Given that Moore's law is about transistor densities, no, I don't think
quantum computers would qualify. It also seems premature to talk about a
Moore's law equivalent for the cost/density of quantum gates, given that we
can barely build more than a handful at a time (D-Wave's 'quantum' computers
aren't quantum-gate computers and don't scale in the same way).

