
Intel Discloses Lakefield CPUs Specifications - rbanffy
https://www.anandtech.com/show/15841/intel-discloses-lakefield-cpus-specifications-64-execution-units-up-to-30-ghz-7-w
======
CountSessine
Jeez - how is this supposed to work with OS scheduling? The charm of
big.LITTLE on ARM is that the instruction sets between the big and the little
cores are identical. Now the OS has to pin processes based on support for
different instructions? What a ridiculous nuisance. Intel really couldn’t
discipline themselves just this once and actually implement the same
instructions, even if in microcode, for both types of cores?

Are the little cores’ instructions at least a complete subset of the big
cores’? Are we going to have some ridiculous situations where the little cores
are completely pegged but the OS can’t migrate their processes off to the big
core?

Or do kernel programmers need to start chasing Intel and trap/software
implement every single future AVX8192AESNISSSE instruction Intel jams into
future instruction sets to provide Xeon market differentiation?

~~~
kllrnohj
> Now the OS has to pin processes based on support for different instructions?

The only complication here would be if they have differing extensions like
AVX512, but that's easily solved by the OS by just advertising the common
baseline. Nothing about this looks difficult to support?

~~~
CountSessine
_by the OS by just advertising the common baseline._

What does that mean? Advertise to whom? The process/process loader? Does it
mean that I can’t compile with -mavx2 anymore? What if I do?

The extensions are the whole problem.

~~~
kllrnohj
> What does that mean? Advertise to whom?

Runtime detection is the process querying what extensions are available, and
then selectively using those. You adjust what the query returns to only return
the common set.
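
In code, the pattern looks something like this (a minimal sketch using
GCC/Clang's builtins, which boil down to CPUID; the kernel names are made up):

    #include <stdio.h>

    /* Hypothetical kernels standing in for real optimized/baseline code. */
    static void kernel_avx2(void)   { puts("AVX2 path"); }
    static void kernel_scalar(void) { puts("baseline path"); }

    int main(void) {
        __builtin_cpu_init();
        /* Take the fast path only if the reported features include AVX2;
           otherwise fall back to the common baseline. */
        if (__builtin_cpu_supports("avx2"))
            kernel_avx2();
        else
            kernel_scalar();
        return 0;
    }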

Runtime detection has been a pretty standard thing for well over a decade now
- it's how we all manage to run the same compiled binaries over the years
despite variability in SSE & AVX support. You don't download different
versions of Chrome/Photoshop/Gimp/Premiere/Blender/Whatever compiled for
different CPU micro-architectures, do you? You might if you run Gentoo I
suppose, but that'd be about it.

> Does it mean that I can’t compile with -mavx2 anymore?

You already can't if you're shipping binaries to users unless you only support
Skylake & newer? There's a lot of CPUs currently in use that don't support
AVX2. So... you either already have this problem and you're familiar with it,
or you're not doing this and it's moot.

~~~
CountSessine
_Runtime detection is the process querying what extensions are available, and
then selectively using those. You adjust what the query returns to only return
the common set._

Except that they almost always do this runtime detection _once_, on startup,
and then choose/thunk codepaths accordingly. If the OS just happens to start
my avx2 process on a little core (and how is it going to know better?), that's
going to turn off all of my optimizations, regardless of where the process
subsequently gets migrated to.
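
The detect-once pattern looks roughly like this (a sketch with made-up kernel
names) - whatever answer the startup core gave is baked in for the life of the
process:

    #include <stdio.h>

    static void kernel_avx2(void)   { puts("AVX2 path"); }
    static void kernel_scalar(void) { puts("baseline path"); }

    /* Resolved once at startup and never re-queried after migration. */
    static void (*kernel)(void);

    __attribute__((constructor))
    static void pick_kernel(void) {
        __builtin_cpu_init();
        kernel = __builtin_cpu_supports("avx2") ? kernel_avx2
                                                : kernel_scalar;
    }

    int main(void) {
        kernel();  /* runs whichever path the startup core selected */
        return 0;
    }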

_You already can't if you're shipping binaries to users unless you only
support Skylake & newer? There's a lot of CPUs currently in use that don't
support AVX2. So... you either already have this problem and you're familiar
with it, or you're not doing this and it's moot._

Except nobody in 30 years of x86 dev expects to get a different answer from
CPUID at runtime.

~~~
wmf
If _all the cores_ are configured to advertise the lowest common denominator
instructions it will work.

~~~
CountSessine
But that defeats the purpose of supporting any extensions at all in the big
core that the little core doesn't support. Software will get the lowest common
denominator answer and just not use avx2. So why support it in the first
place? Why not just do the right thing and have uniform extension support like
big.LITTLE?

~~~
wmf
Chips aren't designed from scratch. They're assembled out of previously
designed components, and in this case the Core cores and Atom cores were never
designed to work together.

~~~
CountSessine
No - but that just means that Intel shouldn’t do this at all. Either don’t
support stuff like avx and avx2 in the big core by disconnecting those blocks,
or support a slow microcode version of avx and avx2 in the little cores.
Supporting different extensions on a CPU used with modern preemptive OSes
doesn’t make any sense.

~~~
kllrnohj
> Either don’t support stuff like avx and avx2 in the big core by
> disconnecting those blocks

That's partly what they did. From the article: "One thing we can confirm in
advance – the Sunny Cove does not appear to be AVX-512 enabled."

Maybe they fused off AVX & AVX2 support in the Sunny Cove core as well; we'll
see.

And disabling AVX in cores that otherwise support it is already a common thing
- see the Pentium & Celeron lineups that Intel currently sells. They don't
have AVX/AVX2, even though the cores inside them definitely could offer it.

------
kllrnohj
The ARK page is up:

[https://ark.intel.com/content/www/us/en/ark/products/202777/intel-core-i5-l16g7-processor-4m-cache-up-to-3-0ghz.html](https://ark.intel.com/content/www/us/en/ark/products/202777/intel-core-i5-l16g7-processor-4m-cache-up-to-3-0ghz.html)

There are no mixed instructions, as Anandtech had speculated there might be.
The Sunny Cove core was cut down to match what the Tremont cores support. So
no AVX at all, no weird extension mismatch, no OS headaches beyond the
expected big.LITTLE headaches.

~~~
shorts_theory
How much would the lack of AVX hurt performance apart from scientific
computing? It's interesting to see Intel offering a chip with a clear focus on
tablet computing, so the lack of AVX might not be so bad if the Tremont cores
improve battery life significantly.

~~~
dr_zoidberg
I can answer that: a lot. AVX can speed things up anywhere from a modest 20%
(if not very well implemented) to a massive 15-30x when carefully used
(usually on things that are very good fits for AVX/AVX2 instructions/use
cases).

Anecdotally, I have a "netbook" with a Goldmont+ Celeron N4000 that works very
respectably for everyday business (web browsing, office suites, watching
videos, etc.) but crawls when trying to run scientific code: 100x slowdowns at
worst, 10x slowdowns on the parts it handles "nice and easy" (relative to the
i5-7200U notebook I usually carry around).
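
To give a feel for where the big multipliers come from (a toy sketch, not my
actual scientific code; compile with -mavx): the AVX loop handles eight floats
per instruction versus one in the scalar loop.

    #include <immintrin.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Baseline: one float per iteration. */
    static float sum_scalar(const float *x, size_t n) {
        float s = 0.0f;
        for (size_t i = 0; i < n; i++) s += x[i];
        return s;
    }

    /* AVX: eight floats per iteration. */
    static float sum_avx(const float *x, size_t n) {
        __m256 acc = _mm256_setzero_ps();
        size_t i = 0;
        for (; i + 8 <= n; i += 8)
            acc = _mm256_add_ps(acc, _mm256_loadu_ps(x + i));
        float lane[8];
        _mm256_storeu_ps(lane, acc);
        float s = lane[0] + lane[1] + lane[2] + lane[3]
                + lane[4] + lane[5] + lane[6] + lane[7];
        for (; i < n; i++) s += x[i];  /* remainder */
        return s;
    }

    int main(void) {
        float x[16];
        for (int i = 0; i < 16; i++) x[i] = (float)i;
        printf("%g %g\n", sum_scalar(x, 16), sum_avx(x, 16));
        return 0;
    }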

~~~
kllrnohj
> I can answer that: a lot.

You mean not much. The question was how much it hurts _apart_ from scientific
computing. Per your own comment it's only worth maybe 20% there, and everyday
needs work fine.

It'll show up occasionally in some things that would be relevant to a device
with this SoC, like noise cancellation, but very little else. And even then
you can do noise cancellation even better on a GPU (RTX Voice says hi), and
this does still have a GPU, and a decent one at that (~500 GFLOPS), so it's
not even that simple.

~~~
dr_zoidberg
Read my second paragraph. Also, Intel, despite having theoretically amazing
GPUs, has for years been a lackluster player in that area. Ironic, too, since
they're one of the largest iGPU manufacturers in the world.

------
Twirrim
I'm really curious how they envision this working, given the different
instruction sets between the main core and the small cores. That seems like a
bit of a nightmare to handle on the OS side of things, unless you somehow turn
things into a two-tier environment and require software (users?) to explicitly
opt in to the small cores.

~~~
ww520
I would imagine the difference between the instruction sets of the two CPU
types is small, since the small CPU is an Atom, which is not too different
from the main CPU.

The program's machine code could be scanned when loaded, looking for specific
opcodes, to determine what CPU capability it requires. Code with instructions
the Atom can't execute would be sent to the main CPU only.

Edit: Just a thought. The OS can install an illegal-opcode exception handler.
When a process first runs on the small CPU, an unsupported opcode will raise
the exception. The exception handler can simply set the processor affinity of
the process to the main CPU and put it to sleep. The OS will then handle it
like it normally would - putting the process in the run queue of the affined
processor.
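
A user-space sketch of that idea on Linux (hypothetical - it assumes CPU 0 is
the big core, and a real OS would do this in its trap handler rather than via
signals):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <signal.h>
    #include <stdlib.h>

    /* On an illegal-opcode fault, re-pin the process to the big core
       (assumed to be CPU 0) and retry the instruction there. */
    static void on_sigill(int sig) {
        (void)sig;
        cpu_set_t big;
        CPU_ZERO(&big);
        CPU_SET(0, &big);
        if (sched_setaffinity(0, sizeof big, &big) != 0)
            _exit(1);  /* opcode genuinely unsupported everywhere */
        /* Returning re-executes the faulting instruction, now on CPU 0. */
    }

    int main(void) {
        struct sigaction sa;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        sa.sa_handler = on_sigill;
        sigaction(SIGILL, &sa, NULL);
        /* ... run code that may use big-core-only instructions ... */
        return 0;
    }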

~~~
jlebar
> The program machine code can be scanned when loaded to look for specific
> assembly opcodes to determine the required capability of the CPU to execute
> it on.

That works as a heuristic, but it's not perfect, since JITs and self-modifying
code are a thing.

I expect the chip will raise a fault and the OS will move the process.

~~~
aarongolliver
This is one of the things we tried when I worked (2011-12) at Intel on QuickIA
[0] (dual socket, Atom on one side, Xeon on the other). One of my first
projects was to write a vectorized matrix multiplication that could only run
on the Xeon, so we could demo the fault-to-big-core behavior.

[0]
[https://www.neotextus.net/hpca12.pdf](https://www.neotextus.net/hpca12.pdf)

(I think the "3.2 QuickIA Software Support" section is interesting, if nearly
a decade old by now)

------
philistine
With the heavy rumours of Apple moving forward with ARM chips in Macs, Intel
clearly showed the plans for this chip to Apple last year, and Apple said no
thank you.

~~~
pradn
IMHO, Apple's phones were considered fantastic for a few reasons, in order:
brand (perceived quality, luxury status, pricing, "Apple"), software lock-in
(iMessage, iCloud), and privacy (also a branding thing since they still use
Google for Safari search). The hardware and camera were usually matched or
exceeded by top-tier Android manufacturers. Now, Apple-designed CPUs are
leaving their competitors in the dust, and provide, perhaps for the first time
in a long time, hardware superiority. I don't think consumers quite understand
the actual performance gap yet. But once it gets out, it's probably going to
be second or third in the list for why people buy iPhones.

I'm going to switch from a Pixel to an iPhone this year. The CPU is just
clearly better, and seems to actually matter for taking photos and web
browsing. (The Pixel still takes like 5 seconds to post-process a photo.
iPhones do it instantly.)

~~~
oehtXRwMkIs
What are your sources for the claim that the hardware superiority actually
matters in real life? Every once in a while when I'm looking to buy a phone, I
like to check out videos that do real life speed tests (mainly opening apps
but other tasks as well) and iPhones have never been on top. Curious to see if
that has changed recently.

~~~
acdha
You're asking for sources and then citing unnamed “real life speed test”
videos without any way for people to see what they're measuring or how solid
their methodology was?

The main area where normal people notice this is in web usage, where Mobile
Safari has handily outpaced Android browsing for many years — see e.g.
[https://discuss.emberjs.com/t/why-was-ember-3x-5x-slower-on-android/6577](https://discuss.emberjs.com/t/why-was-ember-3x-5x-slower-on-android/6577)
from 2014. How much that matters depends on how much a particular website is
limited by single-core JavaScript performance — well-engineered sites probably
don't show a huge impact, but it's quite noticeable on anything with a bloated
SPA, and the web has been moving in the latter direction for years.

Gaming is the other area where this is fairly noticeable, but that varies in
where the bottlenecks are (CPU vs. GPU), how prominent the effect is, and the
relative quality of the ports, so it's harder to do a fair comparison.

~~~
oehtXRwMkIs
Burden of proof is on the one who makes the claim. I'm just asking for some
links, and I explained why I doubted the claim.

I appreciate your effort in giving more background, although I think 2014 is a
bit dated with Firefox Quantum becoming more of a thing on Android. Can't
remember if the Preview has it yet or not.

------
bigtones
This has a lot of parallels to how Intel incorporated some of the best ideas
from RISC processor designs into their x86 CPUs to produce a better product
under intense competition in the 90s - they're doing the same thing here with
ARM processor design ideas, like ARM's big.LITTLE core designs for low power,
translated to x86.

~~~
MrStonedOne
you lost me at "produce a better product". I don't think there are any
parallels to that in this offering.

------
acd
Very good TDP numbers. This sounds like an x86 version of the ARM big.LITTLE
architecture.

AMD can do lower TDP too, with underclocking and undervolting, but should
offer something out of the box.

I'm happy CPU makers are lowering power consumption, since we have global
warming. People often talk about CO2 with cars but less so with computers.

~~~
rbanffy
Just remember these are Intel Watts. They tend to be larger than ISO Watts
depending on the workload the processor is running.

------
chx
Why? I do not understand. It would make sense if the Atoms were
performance-per-watt champions, but in this chart
[https://www.cpubenchmark.net/power_performance.html#intel-cpu](https://www.cpubenchmark.net/power_performance.html#intel-cpu)
you basically only see various generations of ultra low power (Y) and some low
power (U) Core chips, going back to Broadwell when the Y chips were
introduced. Atoms make a very rare appearance. The top performing Sunny Cove
(which itself has a much poorer score than the latest Amber Lake Y, but I
digress) brings in 623 CPU Mark / Max TDP where the top performing Atom, a
Pentium Silver N5000, features 455. You can argue the finer details of this
particular benchmark, but we are talking about an absolutely brutal ~40%
difference. Core CPUs switch P-states extraordinarily quickly, so if less
performance is needed, they will consume less power just fine. What's the
point...?

Cheat sheet to reading Intel CPU model numbers: if the number starts with 10
and contains a G, it's Ice Lake. Otherwise, if there is a Y somewhere, it's
ultra low power. A final U means 15W, rarely 28W. A final T means a 35W
desktop chip. E means embedded. H means 45W mobile. All of those are Core
chips; a first letter of N, J, or Z means Atom. The first digit (or two digits
for 10) is the generation, which became an absolute mess past 7th gen; the
most important change is that the 8th-gen U parts are quad-core where the
7th-gen U parts were dual-core.
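
The cheat sheet fits in a few lines of C, for what it's worth (same rules,
same caveats; the sample model numbers are just illustrative):

    #include <stdio.h>
    #include <string.h>

    /* Decode an Intel CPU model number per the cheat sheet above. */
    static const char *decode(const char *m) {
        if (!m[0])                     return "unknown";
        if (strchr("NJZ", m[0]))       return "Atom";
        if (!strncmp(m, "10", 2) && strchr(m, 'G')) return "Ice Lake";
        if (strchr(m, 'Y'))            return "ultra low power";
        switch (m[strlen(m) - 1]) {
            case 'U': return "15W (rarely 28W) mobile";
            case 'T': return "35W desktop";
            case 'E': return "embedded";
            case 'H': return "45W mobile";
        }
        return "unknown";
    }

    int main(void) {
        const char *samples[] = { "1065G7", "8550U", "9700T", "N5000" };
        for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
            printf("%-6s -> %s\n", samples[i], decode(samples[i]));
        return 0;
    }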

------
londons_explore
And the real question... Will it have more performance per dollar than
whatever AMD has up its sleeve...

~~~
phonypc
AMD doesn't really have anything in this category, do they? Or even any
announced plans for it?

~~~
AnthonyMouse
Ryzen 4000 series has 8 cores down to 10W. This is 7W but with fewer and
slower cores.

But there _are_ things that need lower power more than multi-thread
performance.

~~~
ihattendorf
They really aren't comparable. For example, standby power for Lakefield
appears to be 2.5mW. Idle is probably significantly lower as well. They're
targeting different use cases.

~~~
AnthonyMouse
I can't find any figures for the standby power for Ryzen 4000 series, but
standby is basically off. The standby power consumption should be negligible
in both cases.

It's also not obvious what that use case would be. You can put most laptops
into standby and they'll run on battery like that for many weeks. What's the
thing that needs more than that?

It would be interesting to compare _idle_ power consumption, but for that we'd
have to know what it actually is.

~~~
londons_explore
Also, don't most x86 CPUs support being powered off entirely during standby?
I.e. flush the caches, write the registers back to RAM, set the RAM to
self-refresh, and then write some config registers in a power control IC to
cut power to the CPU.

Then it doesn't matter what the standby power consumption is - it only matters
how quickly it can get back into a working state from off.

------
tibbydudeza
It all sounds like a respin of the Cell processor concept from the PS3: a big
standard CPU core surrounded by little DSP-like cores that you have to farm
work out to, manage, and keep fed.

------
ineedasername
I'm unsure of how this is being positioned in terms of CPU "horsepower". Is
this supposed to be something like the next-gen version of their "U" chips?
More powerful? Less powerful but more portable, with a lower TDP?

Also the "stacking" approach is really interesting, I wonder how far that
approach may go though-- heat dissipation seems like it would quickly become a
problem with multiple layers.

------
Zenst
It'd be nice if they could shift some of the CPU cores towards
[https://en.wikipedia.org/wiki/Asynchronous_circuit](https://en.wikipedia.org/wiki/Asynchronous_circuit)
design and that way be able to run those parts at a variable clock rate,
instead of one part at this frequency and another at a slightly lower
frequency.

For power efficiency under dynamic loads, an asynchronous circuit design would
win over current solutions. However, designing asynchronous circuits is a
magnitude more complex than designing synchronous ones.

But for parts like AVX, extensions that will only be available on the main
core, that would pay dividends. They may also find they can offer such
extensions as a separate chiplet/stack and negate the does-this-core-support-
it issue, since all the cores could tap into it.

What I'd find interesting would be the actual design and instruction set under
the hood; x86 gets translated via microcode, and I do wonder how much the
underlying way things work has diverged from the original instruction set.

~~~
monocasa
Async design is basically snake oil AFAIK; it has only ever produced cores in
the 10Ks of gates.

And there are still benefits to lower-speed sections: transistors can be
physically tuned to favor power consumption over switching speed, and that
would still carry over into async designs.

On top of that, part of what we're seeing is dark silicon and the specifics of
Dennard scaling. You can't light up the whole chip without melting it, so
you're going to see mobile-TDP chips where you turn half the chip on or off at
a time either way.

~~~
Zenst
Thank you for that - I had somewhat overlooked that unused parts in effect act
as shifting heatsinks, and the whole big/little approach does in many ways
give you easily controlled areas, which can work well for multi-core designs.

But the biggest issue with any growth in async design adoption is the tools
and skills to do such work. It also proves hard to compare, and most of the
development in the async area has been driven by the EM advantages for
space-based usage.

------
MintelIE
Can we get this one without Intel ME? Or with the HAP bit twiddled in advance?

~~~
unnouinceput
Not in a million years. Having complete access to your device without you
being able to do anything about it is their wet dream.

------
moonchild
> new 3D stacking ‘Foveros’ technology

...wonder where they got that one from.

~~~
bondarchuk
Total crapshoot, but it sounds like "Foveon", which is a layered camera
sensor.

------
jokoon
So can it compete with a GPU, in a way?

I mean, if it can do minimalist 3D with OpenGL ES, with good performance or
the same performance as a PlayStation 2, it would still be interesting, since
high-end, bleeding-edge GPU graphics are not always interesting for everybody
(and developing on bleeding-edge GPUs seems to require a lot of work).

A dedicated GPU always made sense for high performance, but at some point,
having a console or a gaming PC that can run games with a single big chip
might not be a bad idea. I have been able to play WoW Classic on a laptop's i5
with integrated graphics, and it was just fine.

Although I'm not sure if this new CPU is quite that kind of "hybrid" design.

~~~
monocasa
The GPU is somewhere between the 360/PS3 generation and the XBone/PS4
generation, but probably gets throttled heavily on sustained workloads.

