
Moore’s Law Not Dead – Intel’s Use of HPC to Keep It Alive - ingve
http://www.hpcwire.com/2016/01/11/moores-law-not-dead-and-intels-use-of-hpc-to-keep-it-that-way/
======
jondubois
One thing I'm really curious about is why CPU prices have seemingly stopped
dropping. Many years ago, I distinctly remember that as clock speeds doubled
(e.g. from 1GHz to 2GHz), the price per GHz would halve (keeping the price of
a CPU stable)... So why is it that prices of n-core CPUs seem to grow
linearly with n?

Is it because demand is higher for CPUs with low core-counts? I understand
this argument in the consumer PC space, but what about in the datacenter?

Maybe companies which operate datacenters are too focused on horizontal
scalability and have stopped caring about vertical scalability? Surely this
will change in the future? There are factors like power consumption and floor
real estate to consider: from an electricity and real estate perspective, it
is increasingly efficient to run your software on as few machines as possible
(each machine having more CPU cores).

Am I right to believe that eventually companies will start writing/using
software which leverages multiple CPU cores (when the cost benefits outweigh
the hassle)?

Why is it not popular to scale both horizontally AND vertically?

I would be really interested in hearing ideas.

I started thinking about this after noticing that Amazon EC2 instance prices
grow linearly as you double memory and CPU resources... I expected sub-linear
growth.
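
For illustration, here's a minimal C sketch of the kind of comparison I have
in mind; the instance names and hourly prices are made-up placeholders rather
than real EC2 pricing, so it only shows how you'd check whether price per
vCPU stays flat (linear pricing) or falls as instances get bigger (the
sub-linear discount I expected):

    #include <stdio.h>

    /* Hypothetical instance sizes and hourly prices (placeholders,
       not real EC2 numbers), just to illustrate the linearity check. */
    struct instance { const char *name; int vcpus; double price_per_hour; };

    int main(void) {
        struct instance sizes[] = {
            { "example.large",   2, 0.10 },
            { "example.xlarge",  4, 0.20 },
            { "example.2xlarge", 8, 0.40 },
        };
        /* A constant price per vCPU across sizes means pricing is linear;
           a falling value would be the sub-linear discount. */
        for (int i = 0; i < 3; i++)
            printf("%-16s %d vCPUs  $%.2f/h  $%.4f per vCPU-hour\n",
                   sizes[i].name, sizes[i].vcpus, sizes[i].price_per_hour,
                   sizes[i].price_per_hour / sizes[i].vcpus);
        return 0;
    }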

~~~
petra
Since the 28nm node, which arrived around 2009, transistor prices haven't
gone down, and the cheapest chips still use 28nm even today. So realistically
the main economic interpretation of Moore's law stopped holding some time
ago.

~~~
smaddox
Source? According to Intel, cost per transistor is still falling
exponentially: [http://www.anandtech.com/show/8367/intels-14nm-technology-
in...](http://www.anandtech.com/show/8367/intels-14nm-technology-in-detail)

~~~
petra
source: [http://electroiq.com/blog/2014/03/moores-law-has-stopped-
at-...](http://electroiq.com/blog/2014/03/moores-law-has-stopped-at-28nm/)

Look for "cost metrics" table, "cost per million transistors".

As for Intel: they claim they can make cheaper transistors, yet their CPU
prices don't reflect that. They've also offered manufacturing services for
some time, but nobody has taken them up on it, which is pretty odd if their
transistors really are that much cheaper than the rest of the industry's.

So many in the industry don't think the claim is true.

~~~
bashinator
Cost per transistor is still dropping, but transistor count is skyrocketing as
Intel adds cores. Their flagship Xeon CPU has 18 cores and something like 5.5
billion transistors.

New CPUs during the clock speed doubling era had moderate increases in
transistor count to accommodate new features. Now that clock speed increases
have stalled, adding on more stuff is the only way to differentiate new
products.

------
ZenoArrow
People misunderstand Moore's Law a lot. If it had been called Moore's Target,
its real-world usage would be much clearer. In other words, whilst it started
as an observation about the rate of improvement in semiconductor
manufacturing, it then became a target to work towards, which helped the
industry grow in lockstep.

However, Moore's Law is clearly dying, despite what Intel wants to claim. The
gaps between new process nodes are increasing, which is far more important
than transistor count, since transistor count is also affected by how large
you make the silicon that holds your transistors.

~~~
redcalx
Moore also specifically referred to the cost per transistor, a metric that is
complicated by the increasing costs of fabrication now that the low hanging
fruit has long gone.

~~~
scotchmi_st
> ...now that the low hanging fruit has long gone.

That raises the question of what we count as low-hanging fruit. An engineer
in 2030 may look back on today and see the 2016 advancements as 'low-hanging
fruit' relative to their time. Unless you think there's not much of a future
in chip fabrication (in which case, you may want to tell Intel).

As I see it, Moore's Law is a rough metric coined by someone almost half a
century ago. Just because the metric is no longer relevant, that doesn't mean
the increase in complexity is slowing.

~~~
EvanPlaice
> ...now that the low hanging fruit has long gone.

I'd postulate that the low hanging fruit came in two forms.

The first is advances in general-purpose CPUs achieved by increasing the
number of transistors; we're about to hit a hard limit in that regard.

The second is the design of software under the assumption that CPU speed will
keep increasing.

-----

Hardware:

Currently, the majority of system architectures are designed to be general
purpose. The exceptions are GPUs for parallel processing and ASICs for
specialized tasks like encoding/decoding media formats.

The next generation currently under development is the SoC (System on a
Chip), i.e. the complete hardware infrastructure contained on a single chip.
SoCs run cooler, are more power efficient, and are faster in terms of passing
data via the internal cache layers.

What may come after is a combination of SoC and dynamically configurable
hardware architecture, similar to how FPGAs work today. With the cloud
gaining mass appeal and IoT taking root (despite all the hype), a market will
grow for chips that combine very predictable performance characteristics with
specialized capabilities.

With dynamically configurable hardware you could bootstrap a system and tell
it to specialize as a graph database node, a media encoding/decoding cluster
node, a Smart TV platform, a smart car CPU, etc. There's already talk of
neuromorphic chips that can dynamically change/adapt to a specific
environment.

The other direction where I see room for growth is communication speed. I've
heard a bit about using optics as a communication bus on multi-layered chips.
Material manufacturing seems to be the issue; if somebody were to come up
with a solution, on-board optical communication buses could have a huge
impact on IO performance/latency.

-----

Software:

We're at the tail end of three decades of unrestricted software glut. Every
system commonly available today runs on a general-purpose monolithic
architecture, with a ton of legacy glut and a huge attack surface for
security vulnerabilities.

General purpose OSes won't go away for personal workstations but there's a lot
of room for improvement when it comes to special purpose platforms. I've
already been mocked on twitter for it but I think Unikernels will be the next
major advancement in software for non-general-purpose systems.

Instead of installing a huge, bloated software platform and configuring it
to run your code, the OS will be reduced in size to the point where it can be
versioned in source control and set up as a step in the build process.

NodeOS is one example of this. It uses a very tiny Linux kernel stripped of
everything except the absolutely bare minimum foundation. It has a scheduler,
dynamic memory management, enough of a HAL to run on a VM, and a basic network
stack. Everything else runs on the V8 VM, including POSIX utility equivalents
written in JS.

That's just Node. There are unikernels for a wide range of different
languages. There's even the Runtime.js project, which is working to implement
the underlying kernel from scratch in pure JS.

The Raspberry Pi platform has proven beyond a reasonable doubt that there's
a gaping middle ground (and potential market) between embedded programming
and general-purpose platforms.

I think the greatest potential for future performance gains will come from
cutting out the old cruft and optimizing on what we use today.

~~~
ZenoArrow
For what it's worth, you and I clearly have similar ideas on this subject.
Just thought you should know not everyone is against your views. Thanks for
the tips about NodeOS and Runtime.js; those are two interesting projects I
wasn't previously aware of.

~~~
EvanPlaice
Thanks for the vote of solidarity.

I tend to read anything and everything I can get my hands on. Occasionally I
stumble on something unique that's worth a deeper look. Very rarely that
something appears to have the potential to drastically improve upon the
current way of doing things.

At this point I don't really have any credibility in the tech community, so
I don't concern myself much with losing what I don't have. I assume people
perceive me as the tech-nerd equivalent of the crazy homeless guy on the
corner holding a cardboard sign that says 'they're coming'.

------
frik
I still want 10+ GHz single-core performance. Since 2004 we've had roughly
4 GHz CPUs. (We got faster memory, etc., I know.)

Intel needs a serious competitor, otherwise we won't get it by 2020. Is Intel
waiting for AMD/ARM/MIPS/etc. to catch up (and milking customers in the
meantime) before shipping the faster CPUs they probably already have in
various stages (prototypes)?

~~~
ZenoArrow
Tracking GHz isn't an accurate way to measure performance. Take a look at this
table, which looks at Instructions Per Second for a number of different
processors/devices:

[https://en.wikipedia.org/wiki/Instructions_per_second#Timeli...](https://en.wikipedia.org/wiki/Instructions_per_second#Timeline_of_instructions_per_second)

To see an example of why GHz isn't an accurate way to look at performance,
compare these two:

Pentium 4 Extreme Edition - 9,726 MIPS at 3.2 GHz

Intel Core i7 5960X - 238,310 MIPS at 3.0 GHz
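
A rough way to see the "work per clock" difference is to divide MIPS by clock
rate; here's a minimal sketch using the two figures above (note the result is
aggregate instructions per cycle across all cores, so it flatters the 8-core
i7):

    #include <stdio.h>

    int main(void) {
        /* Figures from the Wikipedia table linked above. */
        double p4_mips = 9726.0,   p4_mhz = 3200.0;  /* Pentium 4 EE @ 3.2 GHz  */
        double i7_mips = 238310.0, i7_mhz = 3000.0;  /* Core i7 5960X @ 3.0 GHz */

        /* MIPS / MHz = (millions of instructions per second) divided by
           (millions of cycles per second) = instructions per clock cycle,
           summed over all cores. */
        printf("Pentium 4 EE: %.1f instructions per cycle\n", p4_mips / p4_mhz);
        printf("i7 5960X:     %.1f instructions per cycle\n", i7_mips / i7_mhz);
        return 0;
    }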

That said, it is possible to have CPUs that exceed 4GHz. Heat is the issue you
need to overcome.

Here's an Intel Core i7-4770K overclocked to 8GHz:

[http://www.tomshardware.com/news/Intel-Haswell-
Overclock-i7-...](http://www.tomshardware.com/news/Intel-Haswell-
Overclock-i7-4770K,22454.html)

~~~
frik
Thanks a lot.

Comparing your Intel Core i7 5960X with the old 2004 Pentium 4, single thread
performance:

    
    
      Intel Core i7-4790K @ 4.00GHz  ... 2529 
      Intel Pentium G3258 @ 3.20GHz  ... 2172
      Intel Core i7-5960X @ 3.00GHz  ... 1991
      Intel Pentium 4 @ 3.80GHz      ...  824 
      Intel Atom 230 @ 1.60GHz       ...  238 
    

[https://www.cpubenchmark.net/singleThread.html](https://www.cpubenchmark.net/singleThread.html)

Vector processors like the Cray-2 were very awesome and had huge amounts of
main memory. We will see what nVidia does with their GPUs and what AMD does
with their APUs, which already work great in the PS4/XB1.

(MIPS and GFLOPS measurements depend on the benchmark implementation and the
language/compiler/etc. used, and they show unrealistic peak results rather
than real-world performance.) The LINPACK benchmark (Fortran based) is better
than MIPS and GFLOPS too: "The performance measured by the LINPACK benchmark
consists of the number of 64-bit floating-point operations, generally
additions and multiplications, a computer can perform per second, also known
as FLOPS. However, a computer's performance when running actual applications
is likely to be far behind the maximal performance it achieves running the
appropriate LINPACK benchmark"
[https://en.wikipedia.org/wiki/LINPACK_benchmarks](https://en.wikipedia.org/wiki/LINPACK_benchmarks)

------
yummybear
This might seem like a stupid question, but why does the number of transistors
relate to processing speed?

~~~
vidarh
You can speed up any given instruction basically in two main ways:

Increase clock frequency or do more per clock.

Increasing clock frequency is tricky because you end up spending more and
more of the chip area just keeping the clock in sync, and the chip will also
run hotter unless you manage to lower the voltage drastically (which has its
own problems). I don't know what it's like these days, but one of the DEC
Alpha CPUs spent _40%_ of its die area just keeping the clock stable (because
basically "everything" on the chip contributes to skewing the clock one way
or the other).

You may have noticed that clock speed increases have tapered off
drastically; they're hard to achieve without ridiculous power usage. IBM's
POWER CPUs still chase clock speed because they target very specific niches
and aren't bound by the same constraints.

The other alternative is to do more per cycle. That means trying to squeeze
out whatever parallelism you can from the instruction stream and executing as
much as you can in parallel. To be able to execute in parallel, you need
multiple copies of various parts of the logic so each copy can process part of
the work for one instruction.
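
As a toy illustration (a sketch, not anything from the article): the first
loop below is one long dependency chain, so the extra execution units in a
superscalar core sit idle; the second splits the work into independent
accumulators, giving the hardware additions it can issue in the same cycle.

    #include <stddef.h>

    /* Every addition depends on the previous one, so the additions
       cannot overlap: one long dependency chain. */
    double sum_serial(const double *a, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* Four independent chains: an out-of-order core can run several of
       these additions per cycle, then combine the partial sums at the end. */
    double sum_four_chains(const double *a, size_t n) {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        for (; i < n; i++)   /* leftover elements */
            s0 += a[i];
        return (s0 + s1) + (s2 + s3);
    }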

~~~
revelation
Why is it important to keep the clock stable? Or rather, what level of
stability are we talking about?

I would think that a customer doesn't particularly care if the clock
fluctuates by 100MHz.

~~~
nikdaheratik
The clock is used to move data along the processing pipeline (between one
set of transistors and another). If you don't keep everything in sync, a
stage can latch its input before the previous stage's output has settled,
and you get corrupted results, which is bad, obviously.
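
A crude software analogy (just an illustration, not how hardware is actually
described): on each clock tick every stage register captures its input at
the same moment, so data marches forward one stage per tick. If one register
latched at a different moment than the others, it would capture a
half-updated value.

    #include <stdio.h>

    int main(void) {
        /* Three pipeline stage registers. */
        int fetch = 0, decode = 0, execute = 0;

        for (int tick = 1; tick <= 4; tick++) {
            /* On the clock edge, every register captures its input at once. */
            execute = decode;
            decode  = fetch;
            fetch   = tick * 10;   /* pretend a new instruction arrives */
            printf("tick %d: fetch=%d decode=%d execute=%d\n",
                   tick, fetch, decode, execute);
        }
        return 0;
    }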

------
jnwrd
The scaling on the simulation of the nanowire was dismal. Having effective
quantum simulation would make such a tremendous impact on semiconductor
development.

~~~
danbruc
The scaling behaviour is actually surprisingly good. The time to simulate
quantum mechanical systems generally grows exponentially with the number of
states, i.e. the number of atoms in this case, but the listed running times
grow only somewhere between linearly and quadratically with the number of
atoms.
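
To see how big that difference is, compare what each scaling model predicts
when the number of atoms doubles; a small sketch with made-up sizes (the
article's actual timings aren't reproduced here):

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        /* Hypothetical system sizes (number of atoms); placeholders only. */
        double n1 = 1000.0, n2 = 2000.0;

        /* Quadratic model: time ~ n^2, so doubling n costs a factor of 4. */
        printf("quadratic  : time grows by a factor of %.0f\n", pow(n2 / n1, 2.0));

        /* Exponential model: time ~ 2^n, so going from 1000 to 2000 atoms
           multiplies the time by 2^1000, which is astronomically large. */
        printf("exponential: time grows by a factor of 2^%.0f\n", n2 - n1);
        return 0;
    }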

------
paulus_magnus2
Moore's law is a marketing "law". It was invented so that you'd want to buy
a CPU today (rather than waiting for the next one) and also want to upgrade
every 2-3 release cycles.

A sentiment of "good enough" can break the law unless we keep finding
compelling reasons to buy more horsepower. 8K VR, anyone?

