a. The 10nm process fiasco
b. Missing the chiplet concept
c. ISA fragmentation - AVX-512, which was supposed to be the next big thing, was server-CPU-only until recently, and it downclocks the entire chip when used, making it extremely hard to reason about whether it makes sense for mixed workloads.
d. With the move to more distributed programming models, the network is often in the critical path now, making CPU performance less relevant for many workloads
e. ARM eating their lunch in mobile
f. Nvidia eating their lunch with GPGPU
g. Big players are increasingly building accelerators for critical workloads that offer better power / performance
h. Strong execution from AMD for the first time in over a decade
i. GUIs moving to browsers, and a lot of compute moving to runtime-based languages like Python/JS, mean that new ISA features are initially hard to exploit in killer apps on the client side. So if Intel ships a killer new set of instructions, there is a significant lag before it does anything useful for the user.
What Intel really should be worried about is the client side being their largest revenue segment. That’s a dead business, eventually. And the bets they made didn’t pan out: FPGAs aren’t popular because the people sophisticated enough to use them are also smart enough to tape out ASICs. IoT is not a thing.
Maybe you know something I don't, but that FPGA statement makes zero sense to me. The ASIC development cycle is measured in years - that's why FPGAs are valuable (and I thought they were relatively heavily used).
With the newest Ice Lake processors ("10th generation"), the all-cores-active, all-AVX-512 max clock speeds are the same as the max scalar clock speeds. You can try this out yourself with the avx-turbo program.
It will be interesting to test Ice Lake when they make it to the cloud, hopefully sometime late next year, but until we can actually use Ice Lake, Skylake is what AVX-512 will be judged on.
Codes that can do a lot of 512b FMAs back to back will benefit greatly, and pay a small penalty (up to 25%) in throughput for everything else.
Codes that use the non-multiplier instructions marketed under the AVX-512 umbrella, like VBMI2, also benefit greatly and without any penalty.
People with AMD CPUs don't get a choice. Hard to see how this accrues to Intel's mistakes column.
AMD isn't relevant in this space AFAIK.
For more reading, check out CPU pipelining, as well as how vectorized instructions actually work. The performance benefit of well-implemented vectorized instructions overcomes the clock-speed hit by leaps and bounds, which is why mobile systems make such heavy use of them, for example.
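To make the width-vs-clock tradeoff concrete, here's a hedged sketch (assumes an AVX-512F capable CPU and GCC/Clang with -mavx512f; the function names are mine, not from any library, and each vector lane's partial sum is assumed to fit in 32 bits):

    /* Sum 32-bit ints one element at a time vs. 16 at a time. */
    #include <immintrin.h>
    #include <stdint.h>
    #include <stddef.h>

    int64_t sum_scalar(const int32_t *a, size_t n) {
        int64_t s = 0;
        for (size_t i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    int64_t sum_avx512(const int32_t *a, size_t n) {
        __m512i acc = _mm512_setzero_si512();              /* 16 x int32 accumulator */
        size_t i = 0;
        for (; i + 16 <= n; i += 16)
            acc = _mm512_add_epi32(acc, _mm512_loadu_si512(a + i));
        int64_t s = _mm512_reduce_add_epi32(acc);          /* horizontal sum of the 16 lanes */
        for (; i < n; i++)                                 /* scalar tail */
            s += a[i];
        return s;
    }

Even if heavy AVX-512 use drops the clock by 15-25% on Skylake-era parts, retiring 16 lanes per instruction is a far bigger win when the loop actually vectorizes; the hard case, as noted upthread, is mixed workloads where the downclock also taxes the scalar code around it.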
We do use them at my employer, a multinational chip company - but only in very small numbers, like one $50k board gets shared around project groups who use it for a few weeks each. Most of the work is done in simulation.
Another of the big advantages for AMD is that their products aren't reticle-limited. The chiplet approach lets them take a single die design and bolt it into dozens of configurations that scale larger than what Intel can fit on a single die. Hence 64 "big" cores in a single socket.
There are likely other advantages too (cooling?) that partially make up for the longer, more complex core-to-core latencies.
Bottom line, I don't believe that Intel's product-line prices in any way reflect what the actual yield curves are.
It's hard to tell, but Intel still has a strong markup on 24-core parts sold from 28-core dies. Intel has often been "caught" down-selling parts to protect their higher-margin parts (i.e., they are selling parts with working features disabled).
In other words, it's a perfect part, but it's being sold with a couple of cores disabled or at a frequency below what it's capable of.
It debuted on workstation accelerator cards.
> It does not “downclock a whole chip”, it gates the core where it is active and there’s not even that penalty on the current generation parts.
It very much could thermally throttle more than the one core.
> “10nm” is marketing fluff which has little or nothing to do with actual semiconductor construction.
"10nm", even as a proper noun, is a very important component of Intel's woes right now. They aren't getting the yields they were expecting, a major competitor surpassed the for the first time ever (TSMC) and that's how AMD is killing them right now.
> “Chiplet” is also marketing-speak for “wow this memory topology is hard to program around “. Not sure they should feel too bad about missing that boat.
No, it's marketing-speak for "near-EUV process nodes have terrible yields compared to previous nodes, and need smaller dies combined on a multi-chip module to get anything worthwhile at an acceptable cost". Current EPYC chips are a single NUMA node again, but still chiplets. They are absolutely kicking themselves for not bucking the trend and going chiplet, because then they would have been competitive with TSMC on yield per area. A single big chip is putting all your eggs in one basket, while splitting the dies means you throw away far fewer chips. (Another way out is FPGAs and GPUs, which in practice can bin off much more of the chip.)
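To put rough numbers on the yield argument, here's a back-of-the-envelope sketch using the simple Poisson yield model; the defect density and die areas are made-up illustrative values, not real process data:

    /* Poisson yield model: yield = exp(-die_area * defect_density). */
    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double d0      = 0.2;   /* assumed defect density, defects per cm^2 */
        double mono    = 7.0;   /* ~700 mm^2 monolithic server die, in cm^2 */
        double chiplet = 0.8;   /* ~80 mm^2 chiplet, in cm^2 */

        /* A defect anywhere on a monolithic die scraps the whole die; chiplets
           are tested individually, so only the defective small dies are lost. */
        printf("monolithic die yield: %.0f%%\n", 100.0 * exp(-mono * d0));
        printf("chiplet die yield:    %.0f%%\n", 100.0 * exp(-chiplet * d0));
        return 0;
    }

The point is that a defect scraps whatever die it lands on, so small dies binned independently waste far less silicon than one reticle-sized die, at the cost of packaging complexity and cross-die latency.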
> And the bets they made didn’t pan out: FPGAs aren’t popular because the people sophisticated enough to use them are also smart enough to tape out ASICs.
FPGAs are very interesting in a post-Moore's-law world. Their ability to dynamically reconfigure makes them interesting in cases where ASICs don't make sense. High-level logic can be treated like code from a continuous-delivery perspective (like Alibaba does with their memcache-like FPGAs sitting on an RDMA fabric). Data can be encoded in combinatorial logic and treated like any other infrastructure deployment (like Azure does with the routing CAM-esque logic in their SDN FPGAs). ASICs don't give you anywhere near that flexibility, even in a world where they're a commodity. Don't confuse their tooling immaturity for a lack of usefulness.
> IoT is not a thing.
It's very much a thing; once again, just an extremely immature ecosystem. Once high-end CPUs are commodities that can be shopped around from each of the fabs, IoT external customer designs will almost certainly be a very important revenue stream for Intel. A modern fab is nothing to sneeze at; basically only countries with $20B to spend will have one, so we'll be seeing one or two per continent. It won't make sense for anyone else in the US to compete. As for how that affects IoT, tiny nodes will be amazing for little smart-dust chips once the capital investment in these nodes has been paid off.
FPGAs really need a tooling unlock to take off so they can be useful to people who haven't been on the ASIC design course.
> It won't make sense for anyone else in the US to compete
Apple have $245bn on hand, so they could have a dozen $20bn fabs if they felt the need. Bezos has $180bn and no idea what to do with it.
Fabs are both a commodity, and take a high capital investment. Unless you have a geopolitical reason to create one and you get .gov kickbacks to make it happen, then there's way better uses of your money.
Second comment on the page, and literally everything said in it was wrong.
If Intel recovers, it may need to jettison x86, or offload x86 interpretation to a secondary unit on desktop processors, with it gone from server processors entirely.
This would make me happy because it’s been crushing to think I might go my entire career/lifetime with little endian processors being the mainstream. :-/
x86 helped with that historically, certainly, but fundamentally endianness just doesn't matter most of the time, and when it does, little endian makes more sense from first principles.
ACKCHYUALLY, PowerPC (POWER9 included) supports both big and little endian. Software wise, last I checked the Linux kernel, Debian (PPC port), Gentoo, FreeBSD, and several other major projects supported both. I believe KVM also supports flipping VM endianness versus the hypervisor.
Acknowledged, of course, that the spirit of what you said is indeed correct - the majority of systems in active use are little endian.
> AArch64 GNU/Linux big-endian target (aarch64_be-linux-gnu)
Could you explain why little endian would make more sense? While I don't take a side in this debate, I have always thought big endian would make more sense from first principles.
With big integers, the logic is simple: if you use little endian, you can operate on the same memory representation of the bigints quite easily with machine integers of different sizes.
A similar phenomenon happens with bit encoding. Let's say you want to encode a sequence of 25-bit integers tightly packed in memory. How do you do that? With little endian, you get a somewhat more natural representation especially for seeking into the bitstream (especially for architectures that allow unaligned memory accesses).
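A minimal sketch of that seek-into-the-bitstream case, assuming a little-endian host and a buffer padded so the unaligned 8-byte read past the last field is safe (the helper name is made up):

    /* Extract the i-th 25-bit field from a tightly packed bitstream. */
    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    uint32_t get25(const uint8_t *buf, size_t i) {
        uint64_t bit = (uint64_t)i * 25;
        uint64_t word;
        memcpy(&word, buf + bit / 8, sizeof word);  /* unaligned little-endian load */
        return (uint32_t)((word >> (bit % 8)) & ((1u << 25) - 1));
    }

On a big-endian host the same 8-byte read assembles the bytes in the wrong order, so you either byte-swap after the load or lay the stream out MSB-first, which is exactly the asymmetry being described.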
That's true with big endian too, but you just have to store your integers in the opposite way as you would on little endian: with the MSB at the lowest address (which of course is the same as the distinction between little and big endian in the first place).
Basically the bigint layout has to be compatible with the endianness.
I don't know what bignum implementations do in practice on big endian systems though.
I think it's similar to scientific calculator vs RPN calculator.
The x86 architecture was also big on backward compatibility. You have registers that can be accessed as 8-bit, 16-bit, 32-bit (and, I believe, 64-bit); adding the extra bytes afterwards similarly makes things easier, because the least significant bits are always in the same place.
Again, I don't have extensive understanding of hardware side, so I might be wrong.
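For what it's worth, the C analogue of that AL/AX/EAX/RAX aliasing is easy to see on a little-endian machine (a tiny sketch, not x86-specific):

    /* Narrower views of the same storage always see the least significant bytes,
       mirroring how AL/AX/EAX alias the low part of RAX. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        union { uint64_t r64; uint32_t r32; uint16_t r16; uint8_t r8; } reg;
        reg.r64 = 0x1122334455667788ull;
        printf("%016llx %08x %04x %02x\n",
               (unsigned long long)reg.r64, (unsigned)reg.r32, reg.r16, reg.r8);
        /* little endian prints: 1122334455667788 55667788 7788 88 */
        return 0;
    }

On a big-endian machine the narrower members would alias the most significant bytes instead, which is why widening a register or a bigint in place is less tidy there.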
x86 won fair and square. The RISC people failed to foresee that instruction density would be extremely important to performance. Intel didn't beat them with physics. CISC is just fundamentally better.
I'm not sure that's exactly accurate; it's more accurate to say that there was a lack of market crossover that would have put them in similar power or perf envelopes.
That is because ARM did/does make much more efficient CPUs, they just aren't anywhere close to the perf of common x86 cores. That is, a low-clocked, in-order ARM with small caches, etc., is more efficient per op, but it can't touch even a medium-sized x86. Intel sort of was in that market for a bit and their cores were efficient too, but the main selling point for an architecture is the software around it, and a 50 MHz in-order x86 can't exactly run modern Windows in a reasonable way.
Now that ARM & friends are building higher-perf parts, the power efficiency keeps getting worse. When someone makes a 5 GHz ARM core, it will likely consume more than a couple of mW.
The perf/power ratios have more to do with culture and market than ISA.
A design where all instructions have one of just a few sizes and where the first byte unambiguously encodes the length would be nicer.
RISC-V is decent in this respect.
FWIW, x86’s legacy is a security problem, too. The ISA is so overcomplicated that nasty interactions cause all manner of security bugs. As a recent example, the sequence mov (ptr), %ss; syscall with a data breakpoint at ptr could be used to root most kernels. With virtualization, this type of thing is much worse. A hypervisor needs to handle all the nasty corner cases in a guest user program without crashing itself or the guest kernel, and it needs to handle all the nasty corner cases in guest kernels without dying. There are various ways that native kernels can literally put the microcode in an infinite loop, and hypervisors need complex mitigations because an infinite-looping microcode bug triggered by a guest can’t be preempted by the host, and it will take down the system.
So yes, x86 is not fantastic.
Also, I wonder if the low density of RISC could be countered by introducing execution of compressed/zipped machine code. Some compressors like brotli are highly tuned to the expected type of data to be compressed and are very compact. All entry points to basic blocks in the code are generally known at compile time, so it can be ensured that the jump destinations are decompressible without any context, avoiding the slow process of needing to scan backwards to start decompressing...
I’m not seeing why the higher level x86 instruction set needs to be jettisoned or killed off...
Where can I read more about that?
Now we see that Amazon makes enough money and has enough volume to invest in server processors.
Not just mobile. Obviously, that's the content of the OP. But ARM-based servers have been available for quite a while at dirt-cheap prices, and they are great for workloads where ARM libs are available.
Edge ML inference.
I'll add a caveat to (d) because distributed can also mean distributed across cores, and whoever handles that the best gets a strong differentiator.
Oddly, Intel ought to have an edge here, because they have been selling chips with embedded 100 Gbps Ethernet controllers on-die.
However, I've never seen such a beast in a data centre; I suspect they've "reserved" this feature for HPC workloads only.
So by that logic, I suppose they wanted to be? :-)
There are now also plenty of ways to build native code and squeeze out the best performance, especially for critical systems.
I’d argue that the deployment of new native code has never been easier, because it will be the browsers that first make use of new instructions, and they have large sophisticated dev teams who push out frequent releases with minimal user effort to adopt.
This is a much more important announcement than the press is giving it credit for.
I have argued in the past that Intel "lost" the smartphone CPU war because Apple decided not to wait for them to come up with some high-margin processor compromise and instead to re-invest profits into the development of bespoke processors that gave their products an edge over their rivals. Others like Samsung followed suit with their own processors for Android.
Doing this both takes away Intel's ability to 'gate' what is and isn't a cellphone processor, and their ability to set the margins they would like.
Amazon, Facebook, and Google have been designing and building their own server designs for years. This takes away Intel's ability to gate their choices at the server manufacturer. As a result, more AMD server processors were deployed by these three companies than by the rest of the market combined.
Now Amazon is taking profits from the AWS service and re-investing them in bespoke CPUs that are tuned to the workloads they can see customers running on their infrastructure. As a result they will not only enhance their cost/power edge over Google, Microsoft, and everyone else, their infrastructure can be better than anything you can buy in order to run your own workload, locking you into their service (a moat if you will for keeping you there).
If they succeed, Google and Facebook will follow suit. (I am guessing Google already is well down this path, knowing them but also knowing their secrecy about such things)
If you take 50% of the enterprise server market out of Intel's portfolio they are left fighting for enthusiast/gamer share and AMD is eating their lunch there.
It is going to be really interesting to watch this play out.
Kind of a weird way to phrase it given that iPhone 1 through 3GS ran on Samsung SoCs. It wasn't until the iPhone 4 that Apple used their own SoCs. Not sure how Samsung could be following suit in that case.
Edit: also, Apple was never going to wait around for Intel. Apple is actually one of the early investors in ARM, back in the early nineties, had used ARM in the whole iPod line (as well as the ill-fated Newton), and the iPhone lined up with ARM cores at mobile TDPs starting to have full MMUs.
There were two good paths to take, but one depended on Intel and they never provided it.
Edit: Like they had been using ARM since before Intel released the original Pentium.
I think if Intel had come out with a compelling product, Apple would have switched. I suspect they invested in ARM because it made sense at the iPod level, but once you start converging mobile and desktop experiences, there are major benefits to having the same architecture for both. Just look at how rampant the ARM MacBook rumors have been for the last few years.
I think the ARM investment by Apple was good regardless of whether they wanted to switch to Intel for higher end mobile, so I don't see it as evidence as to why they didn't want to.
Nobody was waiting for Intel to get into the cellphone market. The TDP of Intel chips just never made sense, and it was clear that this was because they just didn't institutionally care about that market segment in a real way.
Maybe pundits were waiting, but nobody serious.
Edit: and all these ARM desktop rumors are just that, rumors. They have the ability to ship a competitive low-to-mid-end laptop on their own ARM chips today, and aren't. Their wide OoO designs would run beautifully in a form factor that isn't thermally throttled so much. IMO they don't want to switch ISAs and are waiting out the x86_64 patents.
...I'm not sure it was before the Pentium though, because it appears they both came out in 1993.
This is such an understatement. From Wikipedia: The company was founded in November 1990 as Advanced RISC Machines Ltd and structured as a joint venture between Acorn Computers, Apple Computer (now Apple Inc.) and VLSI Technology.
Apple is a cofounder of ARM.
It was Steve's view that Intel had the best chips and fabs, and he wanted the best in the iPhone. He was willing to wait for Intel to give him what he demanded; the price he asked for was way lower than Intel's usual margin but very high for Apple's BOM cost.
Ultimately one of his top engineers put his name badge on the table and said that if Steve insisted on Intel he would quit. Steve Jobs ultimately backed off. They then bought P.A. Semi, and the rest is history.
Intel's failure to capture the iPhone SoC opportunity was described by Paul Otellini, Intel's CEO at the time, as the biggest missed opportunity / failure of his life. (And I am still quite pissed they forced out Pat Gelsinger.)
If Steve had persisted he would have repeated the same Apple Lisa mistake. And the world might not be quite the same.
*Makes me rather sentimental while typing this. I miss Steve Jobs.
Still not seeing how Samsung followed Apple in SoC design when Apple started by using Samsung SoCs.
But these are exciting times if this flips; competition is good for all.
Their success here is going to be nonlinear depending on how long they stay in the lead.
In part this may be because AMD BIOS stability has historically been poor - perhaps folks don't want the headaches? But Intel BIOS security with the ME has also been poor - so...
Maybe they finally figured out availability, at least for systems producers.
1. 40% better price performance over comparable current generation instances
At least for commercial availability. There's tons of enterprise SKUs just to serve customers like AWS, of course, but that's not to "squeeze the market"; that's because the customer wants custom IP cores in their chips.
These ARM CPUs are made with an ARM core design (Neoverse N1), which means there is no reason why Google or Microsoft, or even Apple and Facebook, can't have their own N1. And they will, because of the competitive advantage you mentioned.
This solves the chicken-and-egg problem where ARM CPUs from Qualcomm, Ampere, or other vendors could not get into the server market due to software compatibility, and no one was willing to invest in porting and making sure every piece of software ran well on these chips without the hardware (one of the reasons Qualcomm exited the server market). Now Amazon is doing both: all of the open-source web stack AWS offers will support ARM, assuming Amazon upstreams all that work.
And once that software is ARM-ready, other smaller cloud providers such as DigitalOcean, OVH, Linode, etc. will offer and switch to ARM once ARM CPUs are available.
Off the top of my head, the last estimate was that hyperscalers represent 50% of Intel's DC and enterprise revenue. Intel's yearly revenue from DC is roughly $20B, so we are talking about a potential loss of $10B+ over the next decade. And DC has been the most profitable segment for Intel.
Microsoft now also has new incentives to port Windows to ARM (a lot of Azure customers run Windows VMs). They are also on their way with Windows on Snapdragon, although I think it will be a few years before that becomes stable enough and mainstream.
So five years down the road, you have the servers on AWS running on ARM, and all mobiles and tablets on ARM. Intel is selling fewer units over which to amortise its R&D cost on each leading node, while AMD, on a better node, fights for market share. If Windows on ARM works, it is only a matter of time before Apple moves its whole Mac line across to ARM as well, maybe 2025+. (Pro application developers will have an incentive to port their work to ARM on both platforms.)
If Andy Grove were still alive, or his apprentice Pat Gelsinger were still around at Intel, they would be facing the same dilemma as half a century ago: should Intel invest more in protecting the memory business and fight the Japanese manufacturers, or should the company flee the memory market and create a new growth market?
And this time around: should Intel invest more in protecting the x86 business and fight ARM, or should the company flee from x86 and create a new growth market?
It was nearly five years ago that Intel announced Custom Foundry. They had a chance to take on and compete with TSMC, but they were far too worried that their best tech and IP would leak out; they would rather do it all by themselves. Five years later, I doubt that even if they reopen Custom Foundry there will be any customers wanting to sign up.
Sorry if this is a grim outlook for Intel. I am sure they will stick around; like other once-great companies (IBM, Sun, HP), they will have a long tail of decline until they become mostly irrelevant.
Most servers run Linux, and most software on Linux is distributed as source. The same reason that people can easily move to ARM - they can just recompile/download software for the correct architecture and everything mostly works - is the same reason they can leave easily.
All the other AWS proprietary APIs: sure. It'd take a stupendous amount of work for Netflix to migrate away from AWS. But running on ARM isn't really part of that.
Being distributed as source code does not mean the source code is not architecture-specific. A lot of software has SIMD-optimized code paths, and it's common for these SIMD code paths to target SSE2 as the least common denominator (since AMD included SSE2 as part of the baseline of its 64-bit evolution of Intel's 32-bit architecture, every 64-bit AMD or Intel CPU on the market will have at least SSE2), so they have to be ported to NEON. And that's before considering things like JIT or FFI code.
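To make that concrete, here's the shape of code that needs hand-porting: the same hot loop written once per ISA. The kernel (add_arrays) is a made-up example; the intrinsics themselves are standard SSE2 and NEON ones:

    /* A typical hand-vectorized hot loop: one path per ISA plus a portable fallback. */
    #include <stddef.h>
    #include <stdint.h>

    #if defined(__SSE2__)
    #include <emmintrin.h>
    void add_arrays(int32_t *dst, const int32_t *a, const int32_t *b, size_t n) {
        size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
            __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
            _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi32(va, vb));
        }
        for (; i < n; i++) dst[i] = a[i] + b[i];
    }
    #elif defined(__ARM_NEON)
    #include <arm_neon.h>
    void add_arrays(int32_t *dst, const int32_t *a, const int32_t *b, size_t n) {
        size_t i = 0;
        for (; i + 4 <= n; i += 4)
            vst1q_s32(dst + i, vaddq_s32(vld1q_s32(a + i), vld1q_s32(b + i)));
        for (; i < n; i++) dst[i] = a[i] + b[i];
    }
    #else
    void add_arrays(int32_t *dst, const int32_t *a, const int32_t *b, size_t n) {
        for (size_t i = 0; i < n; i++) dst[i] = a[i] + b[i];
    }
    #endif

Every one of these per-ISA paths is extra porting work, and anything that JITs x86 machine code or makes FFI-level assumptions needs more than a recompile.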
Clearly you aren't purchasing much software. If you're one of the (majority of) large software companies that use precompiled binaries from vendors for any of the components in your systems, you're at the mercy of which architectures your vendor(s) support.
From the software that runs for hours to compile a model, the software that calculates results, to the interactive analysis software that needs to load GBs of data.
The whole reason to run these things on a server farm is that you need large and fast machines that are better shared to make sure they get optimal use of the machine and of the license pool.
The whole point is that the hardware becomes as flexible as the software, spin up, spin down, blow it away and deploy a fresh instance if something goes wrong.
I can't see many people trying to do that with proprietary licensing?
Clouds are migrating to AMD and Amazon kicks back with home grown cores. Ouch. Chip vendors have to let go of their ISAs, no one cares, as long as you can run Debian on it.
This promises to be quite interesting. If Amazon can prove ARM as viable on their scale, that'll help companies like Qualcomm who were struggling to drum up attention from the more traditional markets.
If Amazon can prove ARM can cope with server scale operations, and even outperform x86-64, you can have a high degree of confidence that other companies will get in the game. There are many more companies manufacturing ARM chips than x86-64 (due to licensing agreements, in part)
It made sense for them to reuse their design expertise and hedge into other markets so that they didn't get stuck exclusively with trends in mobile devices. AAPL told their suppliers to withhold royalties, hurting revenue. And once AVGO took aim, they needed better focus. Centriq was a long term bet that could've worked out (cloudflare had a positive review) but it just wasn't in the cards.
Incorrect. Amazon is a public company, and their own report tells us that AWS margin is 25%, as of 2019 Q3.
I’m always fascinated when an incumbent gets beat by a company that views that core business as an ancillary means to their own core business.
Another example, I think, is Netflix creating its own content. It's a streaming company... but now they produce award-winning content.
Early on, book sales were Amazon's core business.
There's no Wikipedia page for it, and the regular google search results all loop back to this AWS announcement.
"AWS announced in late 2018 the EC2 A1 instances, featuring their own AWS-manufactured Arm silicon"
"AWS during its annual re:Invent conference announced the availability of their new class of Arm-based servers, the M6g and M6gd instances among others, based on the Graviton2 processor."
So the Graviton2 is made by Amazon as well? Under licence from ARM, I mean, like other ARM design chips.
_edit_ Looks like it, yes. Here is the context necessary to understand the story:
Every ARM-based PC I've seen has been disappointingly slow, which I don't really understand, because phones usually are amazingly quick.
DEC Alpha, Sun Sparc, MIPS, PowerPC couldn't keep up with Intel in the recent past either.
Intel CPUs aren't cheap, but they're a fraction of the cost of a server.
All adds up to interesting innovation but not necessarily the future.
Most recent ARM PCs were explicitly low-power low-performance. That was the goal, not a description of the architecture. The mobile part pretty much confirms that.
The ARM chips that Amazon is putting in these servers are nothing like what you'll find in a consumer ARM laptop.
Surely these are in development, but based on the ARM experience, it'll be at least 5 years from availability of reasonable parts to any decent volume.
Depending on what performance level you're looking at, they're a very large fraction of the server cost. Get a dual- or quad-socket board and buy the higher-end CPUs and you might be looking at well over half the cost.
It's possible to shift this argument different ways depending on the part of the market you're looking at.
Intel stumbled in fabrication, and its complicated architecture is having a bit of a growing pain. The CPU market doesn't make revenue like it used to; very few desktop users feel the need to upgrade every two years, and for many applications pretty much any CPU is fast enough. Growth in the cloud market could be good, but anything that threatens their market share of the cloud is a threat to Intel.
The only ARM based PCs I've ever seen are the Raspberry Pi and other products intended to compete with the Pi, and one of the core features of these PCs is being a complete computer for under $100. Keeping the price low was always the top priority, not performance.
Maybe it’s a latency vs throughput thing?
As far as I know, ARM is great at balancing latency and power consumption, but behind x64 in throughput.