
Ask HN: Why are modern Intel CPUs not getting a lot more cores? - trapperkeeper79
As we move to smaller process nodes, I presume we have more area to put cores in. About a decade ago, people used to say we'd get CPUs with the same clock speeds but a lot more cores. When I look at Skylake and Kaby Lake (rumors), I don't see the number of cores increasing (we seem to be at 2 for laptops and 4 for desktops). I know there are some SKUs like the 6850 with more cores, and of course there are Xeon chips. But... why are desktop processors not getting beefier over the last 5-10 years?
======
valarauca1
1) Nobody writes multi-threaded code

For user applications, multi-threading is overkill. It's hard to write, and
there are very few benefits the average user will see. Good concurrency
isn't even that important on servers. You can trash your cache and interrupt
tables and you'll still handle 10 million clients per node, so nobody cares.
Great concurrency is really only practiced in HFT and HPC.

2) Modern hardware/OS's suck at multithreading:

IO interrupts are normally handled on one core, while you process data on any
core. This causes some non-trivial caching issues. Furthermore, putting two
10GbE NICs' interrupts on the same core can prevent you from actually using
all 20Gb/s of bandwidth you have, because that core literally isn't fast
enough. Then you need to set affinity for the epoll thread, etc. etc.
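
To make that concrete: below is a minimal sketch (Linux-only; the core number and port are illustrative assumptions) of pinning the process that runs an epoll loop to one core via the kernel's affinity API. In practice you'd also steer the NIC's IRQ affinity separately (e.g. via /proc/irq/*/smp_affinity) so the interrupts and the consuming thread share a cache.

    # Sketch: pin this process (and its epoll loop) to CPU 0 so the event
    # loop stays cache-local. CPU 0 is illustrative, not a recommendation.
    import os
    import select
    import socket

    os.sched_setaffinity(0, {0})          # pid 0 = the calling process

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", 8080))
    srv.listen(128)
    srv.setblocking(False)

    ep = select.epoll()
    ep.register(srv.fileno(), select.EPOLLIN)
    # ... the event loop itself is omitted; the point is that it now runs
    # on the same core the NIC's interrupts were steered to.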

3) Amdahl's Law:
[https://en.wikipedia.org/wiki/Amdahl%27s_law](https://en.wikipedia.org/wiki/Amdahl%27s_law)

Every time you double the number of execution units, even if the calculation
gets twice as fast, the absolute time saved keeps shrinking. Take for
example a 60-second calculation that can be fully parallelized:

    
    
       1 unit:   60s
       2 units:  30s
       4 units:  15s
       8 units:  7.5s
       16 units: 3.75s
       32 units: 1.875s
    

The transitions from 8 to 16 to 32 really show the effect coming into play.
Yes, the calculation is getting done faster, but going from 8 to 32 units
only saves you ~5.6 seconds, while using 4x the resources.
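
To put numbers on it, here is a throwaway Python sketch; the s = 0 case reproduces the table above, and any nonzero serial fraction (Amdahl's law proper) is strictly worse:

    # t(n) = T * (s + (1 - s) / n): runtime on n units with serial fraction s.
    # Watch how little each doubling saves once n is large.
    T = 60.0
    for s in (0.0, 0.05):
        prev = None
        for n in (1, 2, 4, 8, 16, 32):
            t = T * (s + (1 - s) / n)
            saved = f"({prev - t:.2f}s saved)" if prev is not None else ""
            print(f"s={s:.2f}  {n:2d} units: {t:6.2f}s {saved}")
            prev = t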

~~~
countryqt30
1) I think this argument is plain wrong. It is incorrect that EVERY APP should
use ALL CPUs. Rather, different apps use different CPUs --> fewer interrupts
--> speed-up.

2) I think they are actually very good at what can be done by now. Why do you
think they are so bad?

~~~
valarauca1
1)

Yes, but then you need to manage IO thread affinity, which isn't commonly
done. You also need to manage what runs on which NUMA node; on two-node
systems, one node often gets almost no interrupts except for cross-CPU
traffic.

Facebook moved to single-socket Xeon-Ds rather than bake task affinity into
their HHVM runtime. So yeah, people still screw this up a lot.

2)

The main answer to concurrency these days is to just swap thread stacks in
userland whenever something blocks. This works fine, but it's no different
from what the kernel does for you. The stack is just smaller, and the
swapping is baked into the userland program. So you end up breaking most
debuggers, and your C FFI gets very slow.

This isn't any faster than what the kernel is doing. It's just that the
language runtime limits stack size, so the swapping happens faster. Setting
smaller stack sizes and using thread groups yields similar performance.
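
As a rough illustration of the mechanism (a toy, not any particular runtime), the userland swap can be sketched with Python generators, where a yield stands in for the point at which a green thread "blocks" and the runtime swaps stacks:

    # Toy cooperative scheduler: each "thread" is a generator; a yield is
    # the block point where the runtime swaps in another stack.
    from collections import deque

    def worker(name, steps):
        for i in range(steps):
            print(f"{name}: step {i}")
            yield                      # "block": hand control back

    def run(tasks):
        ready = deque(tasks)
        while ready:
            task = ready.popleft()
            try:
                next(task)             # resume until the next "block"
                ready.append(task)
            except StopIteration:
                pass                   # task finished

    run([worker("a", 2), worker("b", 3)])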

------
onli
AMD tried and failed. Multiple cores is what the FX is all about. What's the
one type of application that drives processor sales on the desktop? Right,
games. What's one of many types of applications that profit more from fewer,
stronger cores than from more, slower ones? Right again, games.

That said, over the last 5 years processors did get beefier. They just did
not get more cores. The i5-2500K (from 2011) is slower than the i5-6600K,
though both have 4 cores. But if we go back 10 years, we are looking at the
Core 2 Duo, which had only two cores.

And games are also changing. Not too long ago you could still play games (like
Fallout 3) with a single core. New games (like Fallout 4) rely on the
processor having multiple cores; some don't even start on a dual-core
(Hyper-Threading is needed then and saves the i3). And games supporting DX12
(and probably Vulkan) run better on the FX processors than those using DX11.
More cores are becoming more useful now, and processors will have to add more
of them in the consumer desktop space soon as well.

~~~
geezerjay
> AMD tried and failed.

What's your personal definition of "failed"? AMD may not be the market leader
but for the same price, their multicore processors are far better than Intel's
offering.

Old games that weren't designed to take advantage of more than 2 or 3 cores
don't benefit from AMD's 8-core processors, but calling a marketing issue a
failure, particularly a technical failure, is simply disingenuous.

> Multiple cores is what the FX is all about. What's the one type of
> application that drives processor sales on the desktop? Right, games. What's
> one of many types of applications that profit more from fewer, stronger
> cores than from more, slower ones? Right again, games.

False dilemma. All processes benefit from faster cores, but more cores don't
mean faster cores. More cores offer the ability to run more processes
concurrently.

If an application isn't developed to take advantage of the available cores,
you end up with a system where over half of the computational resources sit
idle while only 3 or 4 cores are taxed.

Inefficient software isn't a hardware problem.

> And games are also changing. Not too long ago you could still play games
> (like Fallout 3) with a single core.

That's precisely the issue. Game developers need to target the computer
hardware that gamers use, and the computer hardware that the average gamer
uses is very old. For instance, let's look at Steam's hardware and software
survey:

[http://store.steampowered.com/hwsurvey](http://store.steampowered.com/hwsurvey)

Nearly half of the surveyed hardware has 2 or fewer cores, and around 95% has
at most 4 cores.

Yet, AMD offers 6 and 8-core processors, which appear to be used by less than
2% of Steam's gaming community.

Why would video game developers spend their resources designing software that
takes advantage of the available computational resources provided by AMD's 6
and 8-core line of processors if this only impacts less than 2% of the gaming
community?

Therefore, games keep being developed against constrained hardware
requirements.

But this is by no means a technical failure on AMD's part.

~~~
onli
> _What's your personal definition of "failed"? AMD may not be the market
> leader but for the same price, their multicore processors are far better
> than Intel's offering._

They are not. I'm talking mainly about gaming performance here, because it's
the one area where CPU performance actually matters for a relevant market.
Have a look at
[http://www.techspot.com/review/991-gta-5-pc-benchmarks/page6.html](http://www.techspot.com/review/991-gta-5-pc-benchmarks/page6.html)
as an example. The i3-4130 (which wasn't even their fastest i3 at the time)
is as fast as the FX-6350, and the FX-8350 is slower than the old i5-2500K.
AMD has not one processor that can compete with an i5-6600K in current games.
It's not that you can't play modern games with an FX-8370, but then you pay
as much as with Intel, you get worse FPS, and your processor uses more
energy. Ah, and you buy into a dead socket.

Another criterion for failure: market share. AMD has a little over 20% of
the PC gaming market,
[http://store.steampowered.com/hwsurvey/processormfg/](http://store.steampowered.com/hwsurvey/processormfg/),
and that share has kept shrinking since the disastrous FX line launched.

> _Inefficient software isn't a hardware problem._

If you want to sell your hardware, of course it is.

> _Why would video game developers spend their resources designing software
> that takes advantage of the available computational resources provided by
> AMD's 6 and 8-core line of processors if this only impacts less than 2% of
> the gaming community?_

Despite all that, the FX-6300 is a popular processor, so I highly doubt that
number. The FX line can't have sold that badly.

> _But this is by no means a technical failure on AMD's part._

I'm not talking about the technology. I'm talking about performance, both in
real existing applications and on the market. In both, the FX was a disaster
for AMD, and I really hope the coming Zen architecture can make AMD relevant
again. An Intel monoculture would be horrible, as is already visible in the
prices and customer-friendliness of the current offerings.

------
PaulHoule
Memory bandwidth is another issue. What people forget in the recent deep
learning pissing match between Intel and nvidia is that memory bandwidth
limits mean the difference between optimized CPU, GPU and FPGA systems is not
that much.

~~~
wyldfire
Memory bandwidth is _the_ issue for scaling CPU cores. Very, very few
workloads can keep 32 cores fed with N-channel DDR memory, so there's no
reason to design a Xeon or Opteron with more than that many. There may be
challenges related to N-way cache coherency too.

> memory bandwidth limits mean the difference between optimized CPU, GPU and
> FPGA systems is not that much.

Well, not really -- the enormous memory throughput of GDDRx is much of what's
fueled the growth in the GPGPU market over the last decade.
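
Some back-of-envelope arithmetic on the CPU side (all numbers here are illustrative assumptions: quad-channel DDR4-2400 and a purely streaming kernel):

    # How many cores can quad-channel DDR4-2400 keep fed on a streaming kernel?
    channels = 4
    chan_bw = 2400e6 * 8                  # bytes/s per channel (2400 MT/s * 8 B)
    total_bw = channels * chan_bw         # ~76.8 GB/s

    ops = 2.5e9                           # ~1 memory-touching op/cycle at 2.5 GHz
    bytes_per_op = 16                     # e.g. two 8-byte loads per op
    per_core = ops * bytes_per_op         # 40 GB/s demanded per core

    print(f"cores fed at full speed: {total_bw / per_core:.1f}")   # ~1.9
    # Past the second core, this kind of kernel just waits on memory;
    # compute-dense kernels scale much further.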

------
hyperpallium
They are, but in the GPU.

GPUs handle the tasks that can be easily parallelized better than CPUs. Many
supercomputers are now made from GPUs.

I have a theory that we will gradually favour business and societal forms that
_can_ be easily parallelized, i.e. on GPU.

BTW, Intel's growth focus is on ASICs (application-specific integrated
circuits) - effectively, code pushed down to silicon:
[https://news.ycombinator.com/item?id=11287511](https://news.ycombinator.com/item?id=11287511)
And they will integrate them into future Xeons:
[http://www.pcworld.com/article/2921832/intel-looking-to-boost-horsepower-on-server-chips-with-asic-integration.html](http://www.pcworld.com/article/2921832/intel-looking-to-boost-horsepower-on-server-chips-with-asic-integration.html)

------
detaro
Because common desktop applications still don't do much with multiple cores,
a few fast cores are better (and techniques like Turbo Boost become useful,
where some cores sleep to free up the energy budget and let a single core
clock even higher). And for many things, CPUs have been fast enough for quite
a while now.

So the common CPUs have only a few cores, people who really want more get to
pay the premium for the gamer models or Xeons, since they don't really have
alternatives, and Intel knows that ;)

For servers, you can get CPUs with 4-16 small(er), low-power cores instead of
a few large ones. (Atom C2xxx, Xeon-D)

~~~
melling
How many applications are actively running on a modern desktop? The kernel and
GUI... The modern browser with Javascript engine? An always listening Cortana
or Siri would be nice for many people. What's the current threading on modern
games? I'd think at least 8 cores could be utilized.

~~~
detaro
But could they be better utilized than 4 cores with the same energy budget and
cost?

Most of the time, most applications use barely any CPU at all. What you really
feel are the moments when a single application suddenly needs a lot of CPU,
and when it isn't parallelized it wants a single fast core, not several slower
ones. I almost never see my whole system maxed out, but I regularly see
processes maxing a single core.

~~~
melling
You'd think most apps could be built to utilize at least 2 cores, instead of
pinning just 1 core.
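
Even the minimal version of that - keep the main thread responsive and push the heavy work onto a second core - is a small pattern. A sketch in Python (using a process rather than a thread, so the work really lands on another core despite the GIL):

    # Offload CPU-heavy work to a child process; the OS can schedule it on
    # a second core while the main thread keeps servicing events.
    from concurrent.futures import ProcessPoolExecutor

    def heavy(n):
        return sum(i * i for i in range(n))    # stand-in for real work

    if __name__ == "__main__":
        with ProcessPoolExecutor(max_workers=1) as pool:
            fut = pool.submit(heavy, 10_000_000)
            # ... main thread stays free for UI/events here ...
            print(fut.result())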

------
grabcocque
The move over the last few years has been towards mobile, and decreasing power
consumption.

So over the last ten years or so, Intel has responded to this realisation by,
inter alia, using the extra transistor budget afforded by Moore's Law to move
more and more of what was once on the chipset onto the die.

In practice, this is a much better use of silicon than simply adding more
cores.

~~~
pjc50
Intel has responded to this realisation by .. giving up on mobile, for the
time being.

(The integrated graphics are getting quite acceptable and we're starting to
see PC-on-a-stick and cheap Intel tablets as a result, but I'm not sure how
big a market that is)

~~~
dr_zoidberg
Actually, the Iris-branded GPUs are quite impressive, considering that they
are essentially a co-processor inside the chip and pull over 800 GFLOPS of
compute.

[https://en.wikipedia.org/wiki/Intel_HD_and_Iris_Graphics#Sky...](https://en.wikipedia.org/wiki/Intel_HD_and_Iris_Graphics#Skylake)

------
elcct
You can put a Xeon in a desktop motherboard, e.g. one with the X99 chipset. I
recently got a 22-core (44-thread) one, but for day-to-day tasks it doesn't
make a difference, and it is quite expensive. If there were larger demand,
perhaps we would see more of these.

~~~
soulbadguy
What kind of workload did you intend to use this beast for? I really wish I
had a use case to buy one of those :).

How did you decide to go with the X99 chipset instead of the C230/C236
(traditionally used with Xeons)?

~~~
elcct
Mainly for making music; some plugins that do simulations are very
computationally expensive, so it isn't possible to run many instances in
realtime on a typical 4-core PC. A many-core CPU like this helps a lot with
that. Another thing is that I use a lot of virtual machines for my work, so
it is handy when launching many of them. You also get the benefit of having
more memory slots available. As for X99: I had an i7-5820K that I was maxing
out all the time, and since X99 supports Xeons as well, I decided to keep the
board to save money :) I could also keep the non-ECC memory, so that was a
plus.

------
filereaper
Caches.

Xeons and the like dedicate much of their transistor budgets towards larger
caches.

If you take that budget for caches and put it towards more execution units,
you get the equivalent of Xeon Phi which is a lot of Pentiums built on today's
fabrication processes.

~~~
frou_dh
With the normal Intel Core CPUs, a _huge_ share of the chip area / transistor
budget is often taken up by the integrated graphics, which is a bit
depressing if you buy one and don't use the IGP.

~~~
Vexs
It seems to me a lot more enthusiast desktops are using Xeon processors
instead of Cores, this being one of the reasons people are making the swap:
you gain a lot of performance/$$ because losing the integrated graphics saves
a fair amount of money. The average user still needs integrated graphics
though, so I don't see the Core line ever going away. Who knows, though; they
might come out with some enthusiast-grade Core processors that don't have
integrated graphics.

~~~
frou_dh
Hopefully said enthusiasts take the opportunity to use ECC RAM now! It's
rather hostile that the consumer lines are fenced off from
correctness-promoting technology.

------
theranos87
They actually are adding more cores... For now, they have turned the Phi into
a real standalone CPU. Give it another 2-3 years and this tech will very
likely come to consumers.

[http://www.anandtech.com/show/10553/asrock-rack-launches-2u4nf-x-200-knights-landing-xeon-phi-cpu](http://www.anandtech.com/show/10553/asrock-rack-launches-2u4nf-x-200-knights-landing-xeon-phi-cpu)

And I completely disagree that consumers don't need many cores... That would
be tantamount to saying that consumers don't need GPUs. One of Intel's core
arguments for bringing Phi to the masses is realtime raytracing (something
normal GPUs are bad at). Nvidia is also working on a Phi-kind-of-thing as
their next-generation GPU. So we are getting there, but it has been a long
journey since 2006.

~~~
hawski
Interesting. Could you provide some links about nVidia's next-generation GPU
being a Phi-kind-of-thing? Just curious.

------
minipci1321
A balance needs to be struck between the consumers of data (OoO execution,
multiple issue, HT threads, and multiple cores) and the providers of data,
the cache subsystem (and memory, etc.).

The bigger caches that would be required by more cores in order to keep the
design in balance would lead to higher power consumption, which a) is a
primary evaluation criterion for a product such as a laptop, even before
computing power, and b) is not justified given the typical "office+home"
workload, as already mentioned in other answers.

~~~
amelius
Also, could it be that bigger caches would require more coherence logic (or
even a more complicated/congested cache bus), making everything slower and
eventually negating the advantage of more cores?

~~~
imtringued
You can make cache coherence scale to multiple cores with precise tracking,
and the overhead of precise tracking can be reduced by arranging your cores
as clusters.

Exact tracking basically requires a bit field with one bit per core. The
problem is the storage overhead, since a 64-core CPU would need 64 bits per
cache line. However, if you had a 64-core CPU you could partition it into 8
clusters of 8 cores each, which means you only need 8 bits per cache line for
exact tracking within a cluster.

[http://www.cis.upenn.edu/acg/papers/cacm12_why_coherence.pdf](http://www.cis.upenn.edu/acg/papers/cacm12_why_coherence.pdf)
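
The storage arithmetic from that comment as a tiny sketch (the cache size is an illustrative assumption, and like the comment it ignores the extra cluster-level sharer vector):

    # Directory bits for exact sharer tracking: one bit per core per cache
    # line (flat) vs. one bit per cluster member (8 clusters of 8 cores).
    cores, clusters = 64, 8
    per_cluster = cores // clusters              # 8 cores per cluster
    lines = (32 * 1024 * 1024) // 64             # 32 MiB of cache, 64 B lines

    flat_kib = cores * lines // 8 // 1024        # 64 bits per line
    clustered_kib = per_cluster * lines // 8 // 1024   # 8 bits per line

    print(f"flat: {flat_kib} KiB, clustered: {clustered_kib} KiB")  # 4096 vs 512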

------
engr_student
Thoughts:

- Diminishing returns in performance per core, because some tasks are truly
serial, not parallel.

- It costs money: putting stuff on silicon costs money, and the only
justification for spending more is that you will make it back in revenue.

- The PC market imploded. Lenovo has killed the laptop and PC market, and
will soon be entering the server market. Expect the end of the server market
in the next 5 years or so. There is a lot less money to go around.

- Moore's first law (there are several) is hitting a wall, and that is what
drove the multi-core approach: they couldn't double the clock speed, so they
effectively added to the core count instead.

- There is on-die infrastructure, including routers and memory buses. These
have capacity limits and tend to be the performance limiters at maximum
temperature. They might be the limiting technology.

- Low power is important, as is low heat. More cores (i.e. more ALUs) mean
more heat and higher power use.

- The more components there are, the easier it is for something to go wrong.

- Gartner says the next big money is in IoT, not massively parallel PCs. (It
seems Google beat Intel to a machine-learning ASIC, and deep learning seems
to be a decent market right now - but because Gartner didn't say it, the
Intel leadership can't hear about it.)

------
gpderetta
All the answers about more cores not being that useful for desktops are true,
but there is also the fact that Intel doesn't want to cannibalize its
high-margin Xeon business by selling cheap, high-core-count desktop CPUs.

Finally, for desktops, and especially for mobile, the die real estate is
probably put to better use by incorporating a larger integrated GPU.

------
BatFastard
I agree that fewer but faster cores are what is most useful on desktops.

But I ask: has anyone, or any language, made the number of
processors/hyperthreads transparent? Ideally I would just create a thread and
some load-balancing system would distribute it to the correct processor. Am I
just behind the times?

~~~
virmundi
On a small number of cores, Erlang does this. That's why actors can number in
the thousands and still be performant.

~~~
akavel
Also Go language.
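
Outside Erlang and Go, the usual approximation is a pool that sizes itself to the machine. A Python sketch of the same idea (with no argument, multiprocessing.Pool creates os.cpu_count() workers and the OS load-balances them across cores):

    # The caller never names a core count: Pool() defaults to one worker
    # per core and fans the work out across them.
    from multiprocessing import Pool

    def square(x):
        return x * x

    if __name__ == "__main__":
        with Pool() as pool:
            print(pool.map(square, range(16)))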

------
brudgers
If the desktop has four cores now, then it has double the cores of a few
years ago, when two was standard, and that was double the number of cores
'anyone' had ten years ago: desktop Conroe chips started shipping in 2006.
Which means the number of cores has quadrupled in roughly ten years.

------
gargravarr
This is my current understanding:

Mostly it's down to poor support for multiple cores in software, a lot of
which has some kind of legacy heritage, and to some tasks not translating
well to running in parallel. Take video editing, for example: generally a
single task that requires straight-line speed on one core while the rest sit
idle. Games are the most demanding multi-core workloads, but gamers usually
buy top-end components, up to and including Xeons.

If anything, we tend to have a glut of CPU performance for day to day tasks.
Even low-grade CPUs like Celerons are capable daily-use machines; some tech
people (a friend included) see no reason to go beyond an i3. The multi-core
paradigm shift around 2005-6 also massively increased CPU efficiency over
anything that came before, reducing the need for existing applications to be
reworked to exploit the extra cores. Intensive Javascript on sites can really
drag your browser down, but even Chrome only runs one process per tab, and few
people work with more than one tab at once. Multi-cores therefore tend to take
care of background tasks, with only one core at a time doing all the heavy
lifting. For one of the best examples of multi-core not being the ideal
solution, one need only look at the Playstation 3. The 7-core (I think?) CPU
is extremely powerful, but very, very difficult to program for, creating real
headaches for developers trying to exploit its potential.

Multi-core might have drastically improved power efficiency, but it hasn't
completely solved the heat-generation problem. With the Prescott P4, Intel
realised the core was generating so much heat that cooling it was a serious
problem; cramming 10 of those onto a single die would probably cause a China
Syndrome. Adding cores allowed manufacturers to clock each core lower and
still crunch a similar amount of numbers while reducing the heat output, but
this sacrifices straight-line speed for single tasks. Ergo, AMD and Intel have
to compromise for their consumer-grade products - enough straight-line
performance that it doesn't feel slow for the current task, and enough cores
to keep the throughput high. Looking at the current Xeon E7 v4 series, for
example, increasing the core count above 4 results in a decrease in clock
speed, likely entirely down to keeping the heat under control.

Add to this that laptop sales overtook desktop sales sometime around 2010,
and laptops require low power consumption above all else. You could put a
16-core CPU in a laptop, but to keep power consumption realistic each core
would probably end up clocked slower than a Pentium II. Further to this, a
second shift occurred a year or two later, when tablet and phone sales
outstripped laptops. Mobile devices based on ARM CPUs are very
power-efficient for all but the most demanding tasks. I know several people
who've ditched laptops altogether and use Android devices or iPads
exclusively. Essentially, current consumers have little need for the
number-crunching capability of a desktop, which in turn means the traditional
market for Intel and AMD has shrunk noticeably. ARM CPUs are starting to come
standard with 8 cores, and it's difficult to find a current device with fewer
than 4, all helped by the ARM architecture's power efficiency. Because they
run OSes designed specifically for low power rather than general-purpose use,
the cores can be clocked lower.

A lot of factors, but as noted, Intel and AMD are still improving their CPUs,
just not by cramming more cores into the same die.

------
TempleOSV409
I am the proud owner of a 12-core/24-thread Xeon at 3.0GHz. It's the best one!

Some people like hot-rod cars.

I don't use a GPU like normal people do. Graphics are great for parallel
execution. (Obviously)

If everybody owned a Xeon 12/24, you could write real-time software that
requires it. There's not a chance such software will be written until most
people have the hardware.

