Intel Has a Big Problem (bloomberg.com)
780 points by MilnerRoute on Jan 20, 2018 | 337 comments

Intel is a hot mess even without these security disasters.

Just look at their product release lifecycle: In years past, they'd get maybe one extra product release off each new arch (tick/tock); for example, Sandy Bridge bore Ivy Bridge and Haswell bore Broadwell.

Skylake has borne SIX new product lines: Goldmont, Goldmont Plus, Kaby Lake, Kaby Lake Refresh, Coffee Lake, and the upcoming Cannon Lake. Their failed 10nm shrink has forced product delays; remember, Cannon Lake (the 10nm shrink of Skylake) was supposed to be released in 2016, and it's not even out yet. Just at CES this week they said they've shipped mobile Cannon Lake CPUs.

They have zero presence in mobile. Their best efforts involve competent Y-series processors. Then Apple comes around and, seemingly without even trying, destroys them [1] with a product that's more thermally efficient and, in some ways, more powerful than Intel's best mobile processors, not just their thermally efficient ones.

They have little presence in HPC/AI, where Nvidia is slaughtering everyone and it's not even close.

It's completely inevitable that they're going to lose Apple as a customer for consumer products; it's just a matter of time. AMD is gaining traction with Zen, and they're moving in the direction enterprise cloud providers want (lots of cores, not much $$). How much longer can Intel keep holding on? Do they have an ace they've been hiding? Will people even trust their ace after Meltdown?

[1] https://9to5mac.com/2017/06/14/ipad-pro-versus-macbook-pro-s...

I agree with your general thrust though I think you are overstating a few points significantly.

> Skylake has borne SIX new product lines: Goldmont, Goldmont Plus

Goldmont, Goldmont Plus are Atom products and are not "Skylake-derived" in any really major sense.

> zero presence in mobile

This is true only if you disregard the traditional laptop and newer convertible and chromebook markets. More importantly, Intel has an admittedly-indirect but financially very significant presence in the mostly-ARM cell phone and tablet markets, in that the backend for pretty much every cellphone app runs on an x86_64 server core somewhere. Every cell phone and tablet sold helps sell a modest fraction of an x86_64 core too.

> little presence in HPC/AI

Again, every major HPC/AI deployment that I've heard of still has a huge number of Intel cores. Though - as with the cell phone and tablet markets - you can definitely fault Intel for not capturing these markets in their entirety; they certainly had the talent, IP and the capital to do so, but they screwed up strategically in a way that will almost certainly become a cliched business school study at some point, if it's not already.

Regarding HPC, I think people here are missing the real problem for Intel: Yes, they still have a strong showing in Top500 systems. But this could easily change soon because there is not that much keeping system developers from switching to AMD, potentially even ARM. At the same time, NVIDIA is essentially in a hard-to-replace position because an increasing percentage of applications is written to get their FLOPS from their GPUs instead of Intel's offerings.

The way Intel tries to fight back I find questionable. Both their MIC lineups and many-core Xeons suffer from a crucial flaw: almost laughably low memory bandwidth when compared to NVIDIA, even compared to Epyc. They rely so heavily on caching that it would require tremendous software effort to get even into the range of 50% of the performance of GPU ports - I'd argue it's significantly more difficult and less performance-portable to do that than to just do a basic GPU implementation, even with manually handled data transfers.

Add to this the increasingly successful efforts by GPU makers to automate data handling, and Intel's attractiveness only dwindles further.

And now people are told that they might lose 20% performance? There are applications that use these chips as if they were real-time systems (because it's the only thing financially possible for them). For these use cases, 20% will hurt, a lot.


> There are applications that use these chips as if they were real-time systems (because it's the only thing financially possible for them). For these use cases, 20% will hurt, a lot.

Applications that use these chips in a realtime capacity don’t share hardware with other apps, so it will be possible to deploy them on unpatched machines, even if inconvenient. Those workloads are also not big on context switching and so not affected much by the patch.

I think cloud hosting providers will hurt the most, because that tends to be a context-switch-heavy workload, and their pricing model assumes a certain level of performance per core. They’ll mix in AMD in significant quantities just to diversify and avoid getting burned in the same way. They’re probably also the biggest market for intel’s server line.

This incident will seriously hurt intel even if they handle it perfectly, which they’re not doing.

Yes, that's a good point. The sharing of hardware over the internet (aka the 'cloud') is the most critical market here. Though I think there's a lot of cross-pollination between the cloud and HPC markets - if AMD is financially successful with Zen, they'll be able to pour more R&D money into fortifying their position. The question is of course whether that's going to be enough - Intel came back and beat out all competition after their last big loss of market share (~2005, the 90nm-era process that spawned AMD64 / Opteron). I'd argue that the Intel of 2018 is not the Intel of 2005. From what I'm hearing/reading, they've had an ongoing brain drain from being too focused on short-term profits versus taking real risks in developing new technologies, as well as putting up the finances to actually secure and test their cash cows.

> But this could easily change soon because there is not that much keeping system developers from switching to AMD, potentially even ARM

Right, you can very easily imagine a machine whose CPU merely marshals and dispatches work to compute elements, be they GPUs, FPGAs, ASICs, TPUs, whatever. You don’t need an all-singing all-dancing Xeon for that...

… Or a Threadripper.

There might be other reasons Intel isn't doing it the nVidia way. Patents are good at blocking progress, for example.

Looking at the patent landscape of 3D memory [1], I don't really see the issue Intel would be facing. First of all, as a chip maker they seem to have completely withdrawn from the base-level science of memory in the first place, which probably hurts their chances of getting favourable patent deals - but this seems to be a management-level blunder, maybe an overreaction to the Pentium 4 proprietary RAM flop.

In any case, it would IMO be better to just license 3D memory at market cost now rather than sitting on their hands and letting the HPC markets fade away.

Remember, it has historically only taken 8-10 years for HPC developments to scale down to people's pockets - it could well be that the next iPads outperform what Intel can offer as x86 based laptops, which will not be a good look for software developers deciding which platform they should choose for the next potential killer app.

[1] http://www.wipo.int/edocs/plrdocs/en/lexinnova_plr_3d_stacke...

>Pentium 4 proprietary RAM flop

You mean RDRAM?

Yes. Even when googling it, everyone just talked about DDR back then; that's why I couldn't find it anymore.

> the backend for pretty much every cellphone app runs on an x86_64 server core somewhere

Intel is not the entirety of x86_64 servers, and as the GP notes, AMD is going in the better direction with Zen for serving all those requests than Intel is at the moment.

As an aside, because it doesn't really matter to the point, x86_64 is actually an AMD extension that Intel licenses from them.[1]

1: https://en.wikipedia.org/wiki/X86-64#Licensing

I agree there is a good chance they will lose material share over the coming 1-2 years to Epyc, but at the moment, the x86_64 server market is almost 100% Intel.

Old Opterons are still around too. Heck, cheap dedicated servers at Hetzner are repurposed desktop Athlon 64s :D

> "In years past, they'd get maybe one extra product release off each new arch (tick/tock)"

Intel has moved away from the tick/tock release cycle. They now work to what they call process/architecture/optimization, or in more familiar unofficial terms tick/tock/tweak.

Moore's law is essentially broken now, and this isn't just an Intel problem, this is a problem for all the major chip manufacturers. Furthermore, the challenges and cost of future node shrinks is also causing a slowdown in progress in integrated circuit manufacturing. We may get a couple more node shrinks, but we should prepare ourselves for the brick wall that we're likely to hit in the next decade.

«They now work to what they call process/architecture/optimization, or in more familiar unofficial terms tick/tock/tweak.»

For this cycle it's been tick/tock/tweak/tweak. Sounds like a broken clock.

As the bitkeeper guy, I'm watching this with amusement. Intel used BitKeeper for over a decade for their RTL, and when they used BK things seemed fine. We did everything that Intel wanted; at one point I realized we had taken $7M of revenue from other customers over the years and spent it on doing work for Intel. And that was still not enough for them.

They switched to git because free is better (we were a rounding error on a rounding error in terms of cost to Intel; the most they ever paid us in a year was .00000004 of their revenue).

But even that was too much, so they switched, and it's been downhill for them ever since.

I'm not an idiot, I don't think that the switch to git is the cause of their problems, their problems are self inflicted. I'm just one of many many vendors that Intel has fucked over. So I like seeing them squirm.

Karma is a bitch Intel.

Edit: yup, knew I'd get downvoted. Don't care. Try being an Intel vendor and get back to me about how much you like that.

I don't really get it. You seem to be complaining that a party chooses to no longer be your customer? Isn't it their full right to decide so? What makes discontinuing a business relationship more "fucking you over" than you "fucking over the supermarket" by choosing to go to another supermarket?

McFly's right, if a bit sensational (ehhh, it's his style.)

However, as poorly as they treat their vendors, it's nothing compared to what they do to their employees, and sadly large segments of their first-line and middle management not only buy into that but have spent time making it into an art form.

What do they do to their employees?

Ask them to sign NDAs.

@luckydude isn't just some random troll.


It seems unfair that you are taking it out on Intel for dropping your product-- when in fact, the entire industry dropped your product. Git is free, more widely supported, open-sourced, and doesn't get angry when a user stops using it. Technology evolves - best not to pick at old scabs.

I could go into details about how they screwed us but that is perhaps a thread of its own. We bitch slapped them at the end when I finally grew a pair (after years and years of just doing whatever they wanted).

If you are an Intel vendor you will like this tidbit, they were on our paper. Not their agreement, they agreed to our paper. I think we are the only small vendor that they ever did business with where they didn't force their terms into the deal.

Looking at CVs, everyone knows git. Using anything else would require not just licence costs but also training, and would potentially shrink the pool of possible hires.

BK predates Git by quite a while. The free alternatives then were sccs, cvs, rcs, etc.

That doesn't change anything. Older or not, people use git.

Since you brought up your disappointment, absolutely unrelated to today's topic, I'm tempted to give you another unrelated piece of news: the Linux project has stopped using BitKeeper too.

This is super uncalled for.

In some ways it's been closer to tick/tock/tweak/tweak/tweak, which is just another sign that improving process nodes isn't something that we should rely on anymore.

For this cycle the 'tick' (Cannon Lake) was pushed back a year (with two 'tweaks', Kaby Lake Refresh and Coffee Lake, taking its place). Cannon Lake devices are due to be released to the mass market this year, and according to Intel it was able to manufacture some Cannon Lake devices last year:


"At CES 2018 Intel announced that they had started shipping mobile Cannon Lake CPUs at the end of 2017 and that they would ramp up production in 2018."

The present evidence shows that one should not trust Intel announcements. They may ship CPUs based on their 10nm process, but it is not clear that they have managed to resolve all their yield issues.

If Moore's law can be broken, maybe continued economic growth is in danger too?

Unlikely to be linked... increasing cache or writing better algorithms will still make things faster...

Many industries are held back by lack of interoperability, rather than raw computer power.

> "increasing cache or writing better algorithms will make things faster"

There are performance ceilings on those as well. Best case scenario we come up with a new architecture that is more efficient than the current ones (something built around reconfigurable chips could be ideal). However, we will eventually reach a point where computers don't get significantly faster. Maybe it takes 20 years, maybe it takes 30 years, but there will come a point where that happens.

> However, we will eventually reach a point where computers don't get significantly faster.

Help me understand why. Even if individual computational units cannot get faster, can we not benefit from additional computational units? I know that these don't scale linearly but I suspect there remains work that can be done to improve scaling. Isn't that where GPUs get their computational power?

> "Isn't that where GPUs get their computational power?"

GPUs get their power from being able to execute multiple calculations in parallel. Rendering a 3D scene is something that lends itself to being processed in parallel, for example on a simple level you can have different cores each rendering a different section of the overall image (tile-based rendering is one example of an approach that benefits from this: https://en.wikipedia.org/wiki/Tiled_rendering ).

The GPGPU uses of GPUs (i.e. non-graphics uses) also take advantage of this parallel processing power.

The issue is, not all computing workloads are easy to split up into smaller workloads that can run in parallel. Some workloads are easiest to manage sequentially. For example, consider some code with a lot of conditional logic (such as "if" statements), where the path executed depends on whether each condition is met. What advantage would you gain from running this code in parallel?
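The contrast can be sketched in a few lines of Python (illustrative toy functions, not taken from any product discussed here):

```python
# Embarrassingly parallel: each output depends only on its own input,
# so the elements could be farmed out to as many cores as you have
# (this is how a GPU renders tiles of an image independently).
def scale_all(values, k):
    return [v * k for v in values]

# Inherently sequential: every step consumes the previous result, so
# the chain of dependent operations cannot be shortened by adding cores.
def iterate(x0, steps):
    x = x0
    for _ in range(steps):
        x = x * x % 1000003  # depends on the previous value of x
    return x
```

No amount of hardware parallelism helps `iterate`, because step N cannot start before step N-1 has finished.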

Tons of algorithms and software don't run any faster with more cores. Plus each core has an n+1 cost associated with it, whereas with how scaling worked previously you got better performance for cheaper because of size decreases.

The most significant cost in making an ASIC is the silicon wafer, which is extremely expensive, so anything that uses it more efficiently [less space] makes things cheaper, faster (easier to keep a faster clock synced over a small area), and lower-power (less power means you can go faster too, because you have more power/heat headroom for cranking clock speeds).

Scaling by going smaller is extremely synergistic, and has a super-exponential performance impact, whereas in the very best case adding more cores is linear, and in almost every real-world case, sublinear. It also costs more, not less.
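The sublinear-scaling point is usually formalized as Amdahl's law; a minimal sketch (hypothetical function name):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only `parallel_fraction` of the work
    can be spread over `cores`; the serial remainder caps the gain no
    matter how many cores you add."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)
```

Even a program that is 90% parallel tops out around 6.4x on 16 cores, and can never exceed 10x on any number of cores.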

The fact is, barring major breakthroughs, we are stuck with roughly today's level of performance for the foreseeable future.

> tons of algorithms and software don't run any faster with more cores.

> The fact is, barring major breakthroughs, we are stuck with roughly today's level of performance for the foreseeable future.

That's obviously false. Otherwise graphics wouldn't get any faster when you add more GPU cores, to name one common embarrassingly parallel problem.

Our software is just really bad at making use of those extra cores. Measure real-world use cases with a well-tuned work stealing engine and you'll see how much performance we're leaving on the table.

(Source: I wrote the first version of possibly the largest consumer deployment of a major software component backed by a work stealing engine.)
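For readers unfamiliar with the technique: the core idea of work stealing (pop your own deque LIFO, steal from the front of someone else's when idle) fits in a few lines. This is a single-threaded toy simulation of the scheduling policy, not anything resembling the engine described above:

```python
from collections import deque

def run_work_stealing(queues):
    """Each worker pops from the back of its own deque (LIFO, good for
    cache locality); an idle worker steals from the front of another
    worker's deque (FIFO, grabbing the oldest task). Returns the list
    of tasks each worker ended up executing."""
    executed = [[] for _ in queues]
    while any(queues):
        for i, q in enumerate(queues):
            if q:
                executed[i].append(q.pop())              # own work: LIFO
            else:
                victim = next((v for v in queues if v), None)
                if victim is not None:
                    executed[i].append(victim.popleft())  # steal: FIFO
    return executed
```

With one loaded worker and one idle one, the idle worker ends up doing part of the work instead of spinning, which is exactly the utilization our software usually leaves on the table.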

>The fact is baring major breakthroughs, we are stuck with roughly today's level of performance for the foreseeable future.

Maybe finally we stop wasting hardware and use the supercomputers we have in our pockets for greater things.

Another thing to consider is that memory access is still the most prevalent speed problem. If we can work on these bottlenecks we can still get significant performance increases without actually increasing CPU performance that much.

Intel having little presence in HPC is a laughable claim. There are plenty of Intel Xeons on the TOP 500 https://www.top500.org/list/2017/06/ and these Nvidia GPUs don't run themselves.

> these Nvidia GPUs don't run themselves

It's only a matter of time before they do. Nvidia's Drive PX platform is Intel-free and something derived from it might look interesting for HPC and/or high-end workstations in the next few years.

Serious question. How likely is for Nvidia to become relevant in the CPU space in the next 10-15 years? (talking specifically about their attempts in this market with things like Tegra)... OR is their goal to keep optimizing form factor, thermal efficiency, energy consumption, etc...to the point they can make a case for a stand-alone GPU effectively replacing a CPU?

They've already built their own ARM CPUs; they are in the TX1 and Nintendo Shield, for example. Their most recent Volta GPU made significant progress in terms of general-purpose programmability; it gained a much more flexible memory and concurrency model compared to previous chip generations. They also partner with IBM by supporting OpenCAPI for HPC.

*Nintendo Switch, Nvidia Shield

> https://riscv.org/wp-content/uploads/2016/07/Tue1100_Nvidia_...

It’s a controller, but if RISC-V becomes established, then the acquired knowledge might be applied to CPU development.

I've long felt that nVidia is going to rise from the ashes of Intel, not AMD.

They are also capable of integrating with POWER9 via OpenCAPI, which is the configuration in which many of Nvidia's Volta-based HPC cards will be used.

Correction: While Nvidia is a member of the OpenCAPI Consortium and thus doing work on OpenCAPI, at present the interface between the Nvidia Teslas and POWER9 chip is NVLink 2.0.

On POWER9, both NVLink 2.0 and OpenCAPI are implemented using the same PHY. However, above that level they're completely separate protocols.

(I work on OpenCAPI drivers and firmware at IBM)

Thanks for the info! Can you tell me how many nvlink Lanes are used? Do you have any benchmarks showing performance of copies compared to pcie?

Geekbench is a somewhat odd collection of benchmarks, I'd be careful reading too much into it.*

Furthermore, the site you link to, which is clearly rather gung-ho about macs, still shows that the ipad was not able to keep up with the intel chips, not even in multithreaded mode, even though the intel has 2 cores to the a10x's 3+3. The ipad beat the MBP on the GPU tests, and I don't think anybody is going to dispute that intel's gpu's are... not exactly record-breaking.

Still, you're totally right that apple demonstrated that intel's tech isn't the uniquely fast chip you might think it to be! It's not a small achievement to come this close, especially if the ipad is using less power (which isn't actually clear - battery capacity is roughly similar, at any rate).

* The reason I don't like Geekbench as a CPU test is that it includes a great many tests like FFTs, gaussian blurs, JPEG with DCTs, image processing, and crypto (even in the noncrypto segment!), etc. - i.e. workloads that are very, very vectorizable. The problem with that is that such workloads are really tricky to get right - small differences in code can make a considerable difference in perf, and what's right for one CPU isn't for another (notably, Geekbench is necessarily running completely different code on these processors). Secondly, those kinds of workloads are very amenable to special-purpose instructions, which leads me to the related point that I don't think they're representative of what is typically annoyingly slow today, and much less of what is likely to matter tomorrow: these kinds of workloads are either good enough on the CPU as-is, or they're going to be moved to the GPU (or even more specialized hardware), which happens to excel at them.

> Their failed 10nm shrink has forced product delays

This is the most important thing. AMD has already made the switch to Multi Chip Modules which makes it much much easier to produce chips for 10nm (what TSMC/GF/Samsung call 7nm).

Right now Intel can't even make a dual-core low-speed mobile chip on 10nm. How are they going to make a giant 30+ core server processor? This is extremely bad news for them that has not been fully realized by the markets, because they have faith that Intel will figure it out - but they may not.

AMD may ship 7/10nm server chips before Intel and Intel may never ship them before switching to MCM themselves.

MCM is as revolutionary as AMD64 was, but most people don't realize yet how important it is and how much of an advantage AMD has from leading with it.

AMD may be shipping server CPUs as MCMs while Intel isn't, but that doesn't mean AMD has a meaningful lead. Intel's Embedded Multi-Die Interconnect Bridge (EMIB) [0] technology is far more advanced than any MCM tech AMD has available, to the point that AMD is using Intel's EMIB to connect an AMD GPU to its DRAM in an upcoming Intel product [1].

When Intel gets around to making a multi-die CPU using EMIB, it'll blow away AMD's EPYC in terms of bandwidth between dies.

[0] https://www.intel.com/content/www/us/en/foundry/emib.html

[1] https://www.anandtech.com/show/12003/intel-to-create-new-8th...

> AMD is using Intel's EMIB to connect an AMD GPU to its DRAM in an upcoming Intel product

It is Intel's CPU using AMD's GPU. I am not sure how this is evidence that EMIB is superior tech. Of course Intel is going to use their own tech wherever possible.

> When Intel gets around to making a multi-die CPU using EMIB, it'll blow away AMD's EPYC in terms of bandwidth between dies.

Do you have any bandwidth numbers on EMIB to support this? There are other sources saying that EMIB is crap and doesn't interface with HBM without using a bridge provided by TSMC. [1]

[1] https://semiaccurate.com/2017/12/19/intels-claims-fpgas-hbm-...

Make sure to take SemiAccurate's name seriously.

That article doesn't say EMIB is broken, it just says that the FPGA product Intel/Altera announced wasn't actually natively designed to use EMIB to interface with HBM. This is not relevant to a discussion of whether two Intel CPU dies could be connected over EMIB.

We do know that the Intel CPU with an AMD GPU on-package will have a much faster interconnect between the dGPU and its HBM2 over EMIB than AMD's EPYC and Threadripper manage between CPU dies with a conventional MCM. It's also faster than Intel's old Crystalwell parts that used a conventional MCM to add an eDRAM L4 cache to a consumer CPU.

The limits of conventional MCM packaging are well-known across the industry and are a problem for everyone who's trying to use HBM-style DRAM or otherwise make high-speed inter-die links. Full-scale silicon interposers with TSVs are really expensive and would not have been adopted in high-end GPUs and FPGAs over the past several years if there was an alternative. Likewise, AMD wouldn't have adopted EMIB for their project with Intel if conventional MCM packaging were sufficient, because EMIB is still more expensive than not using two kinds of interconnects within the package.

> Likewise, AMD wouldn't have adopted EMIB for their project with Intel if conventional MCM packaging were sufficient

Once again, you are talking about an Intel CPU, so what you are saying makes no sense. It is not AMD's project. It is Intel's CPU that happens to use an AMD component.

> Once again, you are talking about an Intel CPU, so what you are saying makes no sense. It is not AMD's project. It is Intel's CPU that happens to use an AMD component.

The only EMIB in that product is between AMD's GPU and that GPU's DRAM; the Intel CPU die in that product is not using EMIB. The fact that the AMD GPU is using EMIB is significant—it's a somewhat external validation that EMIB has value as an alternative to conventional MCM packaging or large scale silicon interposers, both of which AMD uses for products of their own.

If EMIB wasn't saving money while offering sufficient performance, then Intel would have integrated the AMD GPU into this product using the off-the-shelf silicon interposers that AMD's own products use to connect the GPU to its HBM DRAM.

> If EMIB wasn't saving money while offering sufficient performance, then Intel would have integrated the AMD GPU into this product using the off-the-shelf silicon interposers that AMD's own products use to connect the GPU to its HBM DRAM.

That is a nice claim, based on no evidence.

Intel spends billions subsidizing non-competitive products in order to try to build market share all the time. (mobile, ultrabook, etc, etc)

Unless you can present any evidence to the contrary (still no bandwidth numbers) it is just as likely that this product has EMIB because they couldn't get anyone else to use it and they need some shipping products with the tech to try and sell it.

> Intel spends billions subsidizing non-competitive products in order to try to build market share all the time.

The concept of market share does not apply to EMIB. It's an internal implementation detail, and Intel gains nothing from using EMIB in one of their own products over a cheaper solution that would still result in the end product being an Intel-branded part.

The only motivation that Intel could possibly have to use EMIB as any kind of loss-leader would be to entice third-party foundry customers to use Intel for the sake of EMIB. But attracting foundry customers is clearly not a priority for Intel, and Intel's primary goal with EMIB is to produce Intel products that may incorporate some third-party silicon, not to produce silicon to be incorporated into third-party products.

I don't understand. How does the CPU talk to the GPU on that product? What bridge the GPU die to the CPU die? Sounds like you are saying the CPU doesn't talk to the GPU over EMIB.

> Sounds like you are saying the CPU doesn't talk to the GPU over EMIB.

Correct. There's a conventional link through the package substrate carrying the PCIe signals from the CPU to the GPU. The only EMIB on that product is between the GPU and its DRAM. That's why the GPU and its DRAM are adjacent, while the CPU is about 1cm away from them: https://images.anandtech.com/doci/12003/intel-8th-gen-cpu-di...

Intel would need to redesign the CPU die to accommodate an EMIB connection to the CPU. As it stands, an EMIB link would probably get in the way of lots of contacts that need to go to something other than the GPU, and EMIB wouldn't be worth the trouble for a relatively narrow and slow link like PCIe. The point of EMIB is to enable very wide and fast links between dies, so that inter-die communication is almost as fast as communication across a single die.

> Intel may never ship them before switching to MCM

Fantastic claims require fantastic evidence. History shows that there are many ways of fixing or designing around manufacturing yield problems. Do you have any actual evidence that their 10nm fabs will never yield a competitive monolithic server part in a financially viable way?

> MCM is as revolutionary as AMD64

I would disagree somewhat, as AMD64 clearly benefits every customer with very little downside, while it is fair to say that MCM CPUs can create or exacerbate irritating performance problems for the customer by being "much more NUMAed". [All else equal, most customers would prefer a monolithic part, or 2 NUMA domains instead of 8-16].

Not quite. 64-bit ran behind 32-bit for a long time after introduction. The rule of thumb was to ask whether more RAM or more speed was important. 64-bit felt slower to people for a long time. (I swear I can still see the difference 10 years later.) 64-bit didn't really pull away until 8 gigs became common. Something similar may hold for MCM; I don't know enough about MCM to speak to that.

We saw substantial performance improvements from the increased register count of AMD64 when I led the porting effort of a video editing/compositing suite back in the early days.

Because our engine took advantage of memory-mapped unnamed file handles to cache frames, we didn't really need the extra address space of larger pointers. I was able to exhaust system memory by manually managing whether or not buffers were mapped into our address space.

Though it had been made for something else, I lucked into being able to use that same engine for interprocess legacy plug-in support. AMD64 support also wasn't necessary to get more physical memory support out of the chips of the era. Operating systems supported 36 and 40 bit physical address spaces, and applications could use higher (would-be negative) addresses by indicating support for it. Those wanting to really cut loose could turn to manual mapping. (See: PAE, LAA)

In the end, we shipped AMD64 support because there were substantial performance wins. And, yes, I profiled every possible software/hardware combination to determine where gains and losses were coming from.

I'm not saying that things were universally faster/better. RIP-relative addressing, though generally quicker than absolute 64-bit, is a headache, and the loss of 80-bit floating point intermediates broke some rare code (which would have broken in strict mode, anyway). I'd love to see some evidence of slow-down now, but, until then, I'm unconvinced. I had a dev bring me "evidence" of huge performance differences between ARMv7 and ARMv8 a couple of years ago (and there are some if you dig). I asked him if he had compiled with or without thumb in the linked modules.

He let me know of his new results a few days later, a little red-faced.

I'm not saying that your detection of slowness is the same thing, but many aspects of performance are measurable (and measured). If it's slower, we can probably measure just how much.

Yes, if someone said anything about video editing I would suggest a 64 bit system.

To clarify my point: an AMD64 CPU could still run in 32-bit mode, if the customer preferred. Much later, the x86_32 ABI work also retained small pointer sizes while providing additional benefits. Whereas there is no way for a customer to fully "unNUMA" an MCM CPU.

When you say "MCM is as revolutionary as AMD64", does that mean Debian and RedHat and Windows and macOS, and all 3rd party applications, have to be recompiled for a new architecture? Or is it simply an incremental speed-up of existing systems?

I'm led to believe that if we're going to recompile The World as well as build new JIT javascript++ compilers, it will be for ARM64?

> does that mean Debian and RedHat and Windows and macOS, and all 3rd party applications, have to be recompiled for a new architecture? Or is it simply an incremental speed-up of existing systems?

Neither. Multi-chip modules simply make it more economical to manufacture large processors. When you can make a 32-core CPU by connecting two 16-core silicon dies together, you'll have higher yields than if you try to make a monolithic 32-core die. There isn't necessarily any consequence visible to software, with the possible exception of having multiple NUMA nodes per physical socket.

So not an architectural paradigm shift, then - a mostly "invisible" incremental CPU improvement instead?

It's a pretty big increment that Intel has been unable or unwilling to make. The message here is that if Intel drags their feet long enough, it'll take that much longer to catch up -- or, more to the point, they'll be forced to take more drastic (read: expensive) measures to catch up.

Intel has produced multi-die CPUs in the past. They aren't doing it now because their fabrication advantage allows them to make bigger monolithic CPU dies than AMD can afford, so Intel doesn't have to incur the performance penalties that AMD's multi-die CPUs suffer from. At the same time, Intel has been very publicly preparing for the day that it makes sense for them to once again make multi-die CPUs, by making sure they have the technology to do it without the performance hit that is currently necessary. There's no heel-dragging from Intel on this issue.

Once Intel's fab advantage is actually eliminated, they'll be forced to make multi-die CPUs to compete against AMD's multi-die CPUs. But every indication is that Intel will be ready to do multi-die better than AMD is doing multi-die. How much better is up for some debate, but it does look like Intel will have a leg up on EPYC's biggest weakness.

A multi-chip module (Epyc) is just 4 separate dies wired together. (As Intel would derisively point out, it is basically 4 Ryzen desktop chips glued together.)

The advantage of this architecture is that smaller chips are easier to manufacture because the probability of a random defect rendering the chip useless is lower and when a chip is bad it is not as great of a loss.

Epyc chips run all existing software. There is extra work involved with Non-Uniform Memory Access (NUMA) where if you want software to actually use 32 or 64 cores simultaneously it has to be purposefully designed to take into account the architecture but that is not really any different than developing for multi-processor systems in the past.

The point is that we have hit the ceiling on frequency scaling, which means more cores, and now we have hit the ceiling on core count scaling, so now it is going to be more CPUs, and the best way to do more CPUs is with MCMs.

MCM in itself is old: Xenos in the Xbox 360 was an MCM, IBM's POWER chips have been MCMs across many generations, Intel's Pentium Pro was an MCM in 1995, etc. It goes back to the '70s and '80s.

MCM has been relatively unpopular because of cost, and that's why most of its applications could only justify it to combine chips from different silicon processes in the same package (usually cache or eDRAM + CPU). Except for some high-end chips like the POWER stuff.

The interesting bit is how and why AMD put logic chips in low cost MCMs. Probably involves the time or $$ budget to make and validate additional high end variations of the silicon.

> 10nm (what TSMC/GF/Samsung call 7nm)

What’s the reason for this? From my understanding of numbers, 10 is not 7

"10nm" and "7nm" are marketing terms that are not directly comparable across foundries.

How? I thought the number was the size of the smallest transistor?

The numbers are almost pure marketing, these days.

From https://en.wikipedia.org/wiki/7_nanometer#7_nm_process_nodes

> The naming of process nodes by different major manufacturers (TSMC, Intel, Samsung, GlobalFoundries) is partially marketing driven and not directly related to any measurable distance on a chip – for example TSMC's 7 nm node is similar in some key dimensions to Intel's 10 nm node.

Well, yeah, not really. It's a bit of a long read, but it comes down to whom you ask. https://www.extremetech.com/computing/246902-intel-claims-th...

They are measuring different things and use different transistor designs.

> How much longer can Intel keep holding on? Do they have an ace they've been hiding?

They do have an ace. $17 billion in operating income the prior four quarters. And about $17 billion in cash on hand.

Who knows if they'll put it to appropriate use to recover from their present mess. They very clearly have the resources to do so. Intel's $17 billion in operating income is over 4x the revenue of AMD, and 2x the revenue of nVidia. It's a pretty fantastical premise to be already counting Intel out, they've recovered from dramatically worse business situations in their past.

They had $XX billions in income and were Nx the revenue of AMD for the last 5 years - and, frankly, nothing spectacular/interesting to show for it. Some arch tweaking, process improvements - that's about it.

Under your premise of prior situations predicting future actions, Microsoft should be collapsing about right now as Windows fades. IBM should not exist at all, the largest US corporate loss in history (at the time) that they produced in 1992 should have spelled their nearing end, with them fading into oblivion. Oracle should have died in the software wars, they survived several bet the ranch situations.

You know what was missing the last five years for Intel? Desperation. The need to fight back from a situation that threatens their well-being. They've done it before, usually when they were put in a desperate situation. That's also typical of human behavior in general (which is where the corporate behavior derives from).

> Microsoft should be collapsing about right now as Windows fades.

Microsoft has been reinventing themselves; partly successfully, partly failing (Windows Phone / Nokia acquisition was arguably a recent example of a failure in this regard). Windows and Office are still the primary income for Microsoft, but cloud has gained traction, percentage-wise. Xbox is profitable (it wasn't for a long time). The Surface line is another example of Microsoft reinventing themselves.

> IBM should not exist at all

IBM has been relatively marginalised, and could have been so much richer if the IBM PC hadn't backfired on them with the clones getting popular, plus them getting tricked by Bill Gates over MS-DOS and Windows (the failure to collaborate with Microsoft on OS/2). But as the B in IBM suggests, IBM never cared much about the consumer space.

Note, I'm not sure about POWER, how competitive that is with Intel's high-end products. Perhaps someone can comment on that.

I think what a company like Intel needs (or any company, really) is direct competition. I.e., they need AMD. The competition with ARM and such is far more vague, more opaque. To be fair, Intel did try to compete on the low end, with Atom. I remember they were busy with Moblin/MeeGo in the late '00s. It never took off, AFAIK.

well, that is true, but

- they need to capture/create product niche with volume of sales in XX billions per quarter, otherwise it is not worth the effort

- in their traditional niche they have 90+%, the only way is down, basically

We'll see, but signs are not good. E.g., this idiot (https://www.fool.com/investing/2018/01/19/intel-corps-upcomi...) is hyping some 6/12-core notebook CPU, while to any reasonable analyst this should be a sign of desperation - Intel does NOT KNOW WHAT TO DO WITH ITS SILICON, so it's just slapping more cores together.

> they've recovered from dramatically worse business situations in their past.

Such as?

They abandoned their primary business in memory chips, it was eroding out from under them rapidly (Japanese dumping, tanked their margins and was quickly slashing their share of the market). They bet the company on switching focus instead to the microprocessor business, per Moore & Grove (they both recount the story in various ways in interviews). Had they not done so, Intel would have likely gone out of business.


Here's a pretty good cover of it from 2012 by NPR, interviewing Grove, in a series they did:


Well, then they still had alternatives. There is not much in terms of alternatives for a broken reputation. Intel's management of this particular crisis hurts them precisely in the one place where they can not afford it.

Three prior attempts at better CPUs, plus one bug, if we just go by the numbers. BiiN alone lost at least $300 million. They had to do a straight-up recall over the FDIV bug at a cost of who knows how many millions. Then, they and HP spent at least half a billion on Itanium over time. So, yeah, they've had rough times with reactions or losses that are way worse than anything I've seen so far from this, which has amounted to some microcode updates and bad press. It certainly could get worse, but it's not as bad yet.

Aside from those, I added a link to Intel i960 that spun out of the BiiN project since it was a nice, little RISC that you might like. It had object-based security and fault-tolerance built into it. There was still potential for reviving that before RISC-V ecosystem, CHERI, and so on obsoleted it. Too bad.

It remains to be seen what the business impact of this latest fiasco is, but PR is easier to deal with than the physical recalls experienced with the FDIV bug or the Intel SSDs a few years back.

This situation may become worse for them over time due to the attitudes of their partners in response to this fiasco, but as it stands (AT THIS POINT IN TIME) I think the FDIV bug was probably worse financially.

That said : I hope it hurts them just to help turn the industry away from such single-vendor dependence.

> with the FDIV bug or the intel SSDs a few years back.

I lived through both of those; they did not come even close to the sense of brand damage I have right now in how I personally perceive Intel. They have mishandled this from day #1, and they are not doing much better now than when it started.

There was that time they went all-in on Rambus, and then their supplier couldn't keep up with demand. That gave AMD a big opening that they took full advantage of.

They were further behind AMD in the P3/P4 days, and especially in the early days of 64-bit.

Goldmont is the Atom line and not related to Skylake. Apple has been building CPUs for years now, and it's unfair to claim that the Apple chips blow the Intel ones out of the water based on some simple int/fp benchmarks. I agree that they will lose Apple in time, but "in time" could be on the order of 10 years.

10 years isn't that long in a fab-scale company.

I would be absolutely shocked if Apple did not already have MacBook-level ARM cores running macOS in their development labs. Just as Apple had OS X running on Intel hardware for years while still selling PPC machines.

I think there's a very good chance we see a MacBook form factor laptop running an ARM chip in the next few years. I'd buy one, having a laptop with multi-day battery life would be awesome. I don't need super performance on my MacBook.

You can get their MacBook Retina with low-TDP CPUs already (the A1534 model). Reality is, the usage experience is hardly acceptable - everything is way too slow.

Intel seems to be way above everyone else in pushing performance/TDP. Apple might have comparable CPUs in iPhones, but they are only comparable for the first 10 seconds - then they overheat and throttle. If we put good cooling on their CPUs, then you have TDP in the 15+ watt class, which means poor battery life regardless of who made the CPU (Intel or Apple).

The only reason all of this works with iOS in the first place is super insane power awareness at the OS level: as soon as an app is not in view it basically stops almost all work (except for background handlers), and their browser capabilities are also very limited. To achieve 20h+ battery life on laptops we either have to adopt the same principles (but hey, that will probably kill almost any background web apps, like SoundCloud) or invent better battery tech :(

The A10X in the iPad Pro is pretty damn good, and it doesn't throttle as much as the older Ax SoCs. So I think the A10X is close to, or already on par with, Intel's low-TDP CPUs.

Then you have to realize that on Intel's roadmap there are no substantial changes in the next two years at those TDP SKUs apart from 10nm, while Apple already has the A11X lined up and 7nm from TSMC. It's the first time Apple and Intel SoCs have had fab feature-size parity.

> multi-day battery life would be awesome. I don't need super performance on my Macbook.

I think the smartphone market shows that multi-day battery life is not something the average person will trade a decrease in device performance for. And that's on phones, where you also don't have much practical opportunity to use the device plugged in, unlike a macbook. I'd be really surprised to see this tradeoff made on macbooks.

Not to mention that Apple has been running their OSes on ARM for years now, so it wouldn't exactly be groundbreaking for them. Kind of like how, when they switched to Intel, they already had x86 builds running for years prior. It's only a matter of time really.

There are a fair number of people buying Apple Laptops that use them for virtualizing Windows. I'd expect that a transition to ARM would only be able to happen for some of the lineup. It couldn't be across the board as it wouldn't run Windows. That would present a new challenge for folks who don't understand much about tech and wouldn't be able to differentiate the product lines properly.

Isn't Windows ARM ready now? A bunch of ARM laptops w/ Windows 10 are coming out this year.

Yes, but the two things that keep people using Windows are legacy applications and games. They’d also have to be ported to ARM, and I don’t see that happening in a hurry.

Microsoft has been demoing x86 emulation on ARM for a while now (https://channel9.msdn.com/Events/Build/2017/P4171)

I think it requires special hardware (x86 emulation, which is 32 bit only), and many apps won't be ported to ARM for a while.

That being said Apple might find x86 emulation useful too.

So you run on ARM just to do x86 emulation? How does it work?

Windows might have been recompiled for ARM, but it will take a while before all apps have been. That's also why Windows RT was a failure: you couldn't run anything useful on it.

RT's big problem was the lockdown. They wanted everyone to use the Windows Store. You can't even run architecture-neutral .NET apps on the desktop in RT without a developer unlock.

> I'd buy one, having a laptop with multi-day battery life would be awesome

Apple wouldn't use the technology for a multi-day battery life, they'd use it to make the laptop thinner/lighter (they've already started making MBP batteries smaller than they need be, leaving empty space in the case)

There are windows laptops with ARM chips as well running full Windows:


That being said, Apple has been waiting for Cannonlake for LPDDR4 support.

Goldmont's arch is essentially just Skylake scaled down; in the words of the Wikipedia article "The Goldmont architecture borrows heavily from the Skylake Core processors"

> Their failed 10nm shrink has forced product delays

Previously [1], from 2012 (starting around t=49:52): "Lots of problems here [at 11nm]. Intel and IBM have publicly discussed solving the problems with 11nm by skipping it", i.e., going straight to 7nm with never-been-used-before EUV lithography.

[1]: https://news.ycombinator.com/item?id=16175949

Were people realistically expecting Intel to hit deadlines at 10nm? This sounded more like a research project than a product line.

I always got confused with how companies like Intel planned on getting past a 10nm transistor size. How does one get around the issue of electron tunneling at that threshold (or close to that threshold)? Granted, I only understand CPUs and quantum physics at a very abstract level.

Direct tunneling in the channel becomes a problem below ~4nm. The gate insulator can be even thinner, since they use high-dielectric-constant (high-k) materials, which limit the tunneling current.

As GP mentioned, the main problem with 10nm is the photolithographic process. Currently they're using 193nm ArF lasers. EUV "light" is very difficult to generate and handle. There aren't even lenses for it, only lossy mirrors, so each optics step loses a fraction of the input power and heats the mirrors.
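To put rough numbers on how lossy that is: Mo/Si multilayer mirrors reflect only about 70% at 13.5nm, and an EUV scanner's optical train involves on the order of ten reflections (both are rough public figures), so only a few percent of the source power ever reaches the wafer:

```python
# Rough EUV throughput estimate: ~70% reflectivity per Mo/Si mirror,
# ~10 mirrors between source and wafer (illuminator + projection optics).
# Both figures are approximate, for illustration only.
reflectivity = 0.70
mirrors = 10
throughput = reflectivity ** mirrors
print(f"{throughput:.1%} of source power reaches the wafer")  # ~2.8%
```

Which is why so much effort (and money) has gone into raising EUV source power, and why the rest gets dumped as heat into the mirrors.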

10nm doesn't actually mean that the transistors are 10nm-sized.

>> They have little presence in HPC/AI

Huh? The Xeon Phi is used regularly in the industry. Nvidia is dominant but to call Intel's presence small is very odd.

Did you miss that Intel has stopped developing the Xeon Phis?

I did, since that's not true. Knights Landing is discontinued and Knights Hill is suspended, but they are continuing development for DOE Aurora and other projects and targeting 2020-2021 - while existing stock is used regularly and the used market is plenty active. I get a lot of use out of the x200 line.

It is more or less true. Xeon Phi is being replaced by regular high core count Xeons with HBM MCM on one end, and a traditional GPU on the other. While the brand name lives on, it's not at all a continuation of the product line.

A lot of the Xeon phi usage in HPC is due to non technical reasons (e.g. Intel subsidies).

The memory bandwidth is just crushingly bad.

I guess? I mean it is not poorly represented in the Top500 and Top50 for "crushingly bad" memory bandwidth, I'll say that much.

>Do they have an ace they've been hiding?

They will fall back on their old tried and tested tactic of anti-competitive practices.

Come on. Just think little bit about what Intel (and their competitors) have given us. It is obvious that they are straining against the inevitable end of Moore’s prediction. I salute them. This despite of Spectre etc.

"They have zero presence in mobile." is false: Infineon.

It's only a matter of time before Intel makes a huge chip acquisition, possibly of AMD itself, to compete with Nvidia. I think they could now make the case that this wouldn't be an antitrust issue because there is more competition than ever in the space now; AMD is small potatoes compared to Nvidia, Qualcomm, etc. I am fairly certain that macOS will eventually be switched to ARM as they begin to merge macOS and iOS, though in a way that avoids what Microsoft has encountered in developing for both ARM and x86.

anti-trust makes this unlikely.

how unlikely is it that nvidia won't be the center of their own "meltdown" in the future though?

They could be saved, but I will not speak their rescue here. There is nobody who would ask or hear it.

They could have been saved, and still might.

I gave them more than they deserved, better than they could have asked and they turned it to crap.

Curious if you currently hold a position in AMD?

I'm skeptical that Intel really has that much of a problem, and the investors in the stock market seem to agree. Intel only has a problem if alternatives come to be seen as economically viable. And right now it doesn't seem like there's any particular danger of this happening.

Pundits want to say that there's a huge thing here, because pundits don't optimize for the truth. They optimize for clicks. So you really need to be careful looking to their writing for the truth.

Don’t be generous to Intel. I work in a vertical strategic to Intel, and get regularly called on by them and get NDA presentations and the ability to talk to engineers about strategic projects.

For this incident, I got an email a few hours after the embargo was lifted, that essentially said that it was no big deal and referenced public information. The purpose of the communication was to have people like me message up the chain that this was no big deal. That misdirection is inexcusable, particularly when they could have given meaningful guidance under NDA.

We had some follow up questions, which weren’t really answered. We were directed to hardware OEMs, as ETA for microcode updates are out of their control and according to Intel are the full responsibility of the OEM. In reality, Intel was struggling to deliver the code, and the OEMs we deal with issued patches in hours, and had to pull back updates due to Intel code revisions.

Personally, I do have alternatives for strategic parts of the business that drive high margin Intel sales. Many critical aspects of my business can run on Intel or Power platforms, and we can engineer solutions either way in similar cost footprints.

Less strategic aspects of the business, like end user compute now have niche competitors that can gobble up Intel business very quickly. Half of my desktop users run on VDI, mostly with AMD thin clients. 50% of my constituencies can run their core line of business functions on iOS. iPad with a keyboard could reduce my Intel desktop spend by 50-75% for 2-3 years.

I'm also under Intel NDA and it took them days to provide any meaningful guidance at all.

The only useful thing in these documents was a timeline/detailed list for the microcode patches, all of which should be public.

They also claim that Spectre/Meltdown are "not a bug or flaw in Intel products" and their slide deck has a whole slide dedicated to forward-looking statement disclaimers. Sigh.

Needless to say, we're not impressed.

They know what they're saying is bull hockey, but they have to say it, otherwise they'll lose their lawsuits. They have to be able to point to the public spec and say "See!? We're completely in spec! No bugs here." Nevermind that the bug is in the spec.

Thanks for sharing those details. Deferring to OEMs is such a copout given how they are most likely going to handle security issues:


At this point it’s up to users to scour motherboard manufacturers’ clunky forums to determine if their platforms will ever receive patches. Given the severity of the issue this really should be handled with more accountability and with a greater sense of urgency.

It’s also a cop out as they were leaning on the assessments of the vulnerability as a medium risk information disclosure by many security analysts. That is accurate in many respects, but not in the context of a multi tenant environment or in places where different compliance standards are in play.

It’s particularly obnoxious when you read about the heroic efforts that were put in place to put AWS, Azure and GCP right. If it was no big deal, why go through that?

> purpose of the communication was to have people like me message up the chain that this was no big deal. That misdirection is inexcusable

But perhaps this was no big deal. We've seen years of research suggesting that modern CPUs are full of issues like this. There's probably a good decade worth of papers on cache side channel attacks. See https://eprint.iacr.org/2013/448.pdf for example

Perhaps the big deal is that there are still people who think they can safely run multiple different things on a single machine?

The 'bombshell' of Meltdown isn't about cache side-channel attacks. Instead, it is about the use of out-of-order execution (Meltdown) and speculative execution (Spectre) to produce observable side effects. It doesn't matter much that these side effects happen to be cache state. For Meltdown, what matters is that page protection is essentially broken. For Spectre, what matters is that speculation on indirect branches is essentially attacker-controlled.

Both attacks, in their practical form, use cache timing as the side-channel to extract information. But the surprise is the control over (as the paper calls them) 'transient executions'.
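For anyone who wants the shape of the channel without the microarchitecture, here is a toy Python simulation of the Flush+Reload-style step where cache state left behind by a transient access is read back via timing. The speculative execution, the actual flush instructions, and cycle-accurate timers that make the real attacks work are all stubbed out; this only shows how a single cached line can encode a byte:

```python
# Toy model of the cache-timing covert channel used by Meltdown/Spectre PoCs.
# Real attacks rely on speculative execution, clflush, and rdtsc;
# here the "cache" is just a Python set, so only the channel itself is shown.

CACHE_LINE = 64            # bytes per cache line
HIT_NS, MISS_NS = 40, 300  # illustrative access latencies

def transient_leak(secret_byte, cache):
    """Victim side: a transient access that is squashed architecturally,
    but leaves probe_array[secret * CACHE_LINE] resident in the cache."""
    cache.add(secret_byte * CACHE_LINE)

def probe(cache):
    """Attacker side: 'time' a read of every candidate line; the one
    that hits reveals the secret byte."""
    timings = {b: (HIT_NS if b * CACHE_LINE in cache else MISS_NS)
               for b in range(256)}
    return min(timings, key=timings.get)

cache = set()              # flush: start with an empty cache
transient_leak(ord('S'), cache)
recovered = probe(cache)
print(chr(recovered))      # recovers 'S' without ever reading it directly
```

The real attacks replace the set membership test with actual load latencies measured via rdtsc, which is where all the engineering difficulty (and noise) lives.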

Yeah, I know. My point was that it is very well established that CPUs are full of issues similar enough to meltdown and spectre. Therefore it is foolish for anyone to assume that they can safely run untrusted code on a machine with secrets on it.

I think you might be misunderstanding what the phrase "side channel attack" means. It's very difficult to protect against side-channel attacks, because of their very nature. There are a lot of ways someone can try to exploit a system and of course not all of them can be accounted for in advance. But not all side-channel attacks are practical, some are done for academic reasons and can only be done in controlled environments with contrived setup.

like check this out: https://www.tau.ac.il/~tromer/acoustic/

> Here, we describe a new acoustic cryptanalysis key extraction attack, applicable to GnuPG's current implementation of RSA. The attack can extract full 4096-bit RSA decryption keys from laptop computers (of various models), within an hour, using the sound generated by the computer during the decryption of some chosen ciphertexts.

So the seriousness of a side-channel attack is determined as a function of the impact of the exploit as well as how easy it is to carry out the exploit.

Meltdown in particular is nasty because it is relatively easy to exploit, is undetectable, and affects a ridiculously large range of hardware. So it is actually a pretty big deal. Yes, there have been side-channel attacks against Intel CPUs before, but this isn't just any old side-channel attack.

I'm not misunderstanding anything. I'm simply of the opinion that secure multiuser systems aren't something we can do right now.

Here's just one of the more practical attacks http://palms.ee.princeton.edu/system/files/SP_vfinal.pdf

>but this isn't just any old side-channel attack.

It isn't, but you were already screwed. Now you're just slightly more screwed.

There’s a $250 billion cloud computing market that makes that assumption.

What about it? There's a huge market for cigarettes too.

And this is terrifying for me. I am giving away money so that people smarter than myself figure out how to secure my systems.

There is no such thing as a practical form of these exploits. We have yet to see a real-world demonstration, even though everybody keeps saying how theoretically easy it is.

PoC in the URL means proof of concept. Not very real-world at all.

Not some artificial PoC that already knows the address to attack and needs to be helped by continually pulling the data into the L1 cache.

If it is so easy to do then why has nobody written anything that can read a password from a browser or sudo?

A proof of concept is a real world demonstration. That's all it is.

You seem to think that exploit devs should work for free, why is that?

I have seen the way Intel takes care of its important customers first hand. They are as pro as it gets. You won't find anyone better.

I’m sure that’s true, and it’s great if you’re the army or Amazon.

I’ve read Andy Grove’s account of the thinking behind the response to the Pentium math bug. I expect better from Intel. As a customer somewhere between an individual PC builder and Amazon web services, I don’t think they handled this incident well at all.

EPYC is already economically viable. If it's not seen as viable, that's for business rather than technical reasons. It has more cores, better IO, and after these patches it's no longer clear that Intel has simply better performance per core.

Just a random rant: Our data team onprem infra is based on EPYC and boy it's wonderful. We've just placed an order for another 200 EPYC-based servers for our onprem setup that's set to arrive in Q1 but folks already can't wait.

AMD people should be proud, as customers we're really happy. I hope that GCP would have EPYC-based platform at some point too.

Where do you source EPYC servers?

Right now your favorite Supermicro seller is the best bet. Right now.

In two weeks, it's Dell. R6415, R7415, R7425.

What chx said, but in our case we have custom racks with higher density of servers due to spacing constraints so it's not standard Supermicros.

> I'm skeptical that Intel really has that much of a problem, and the investors in the stock market seem to agree.

It's way, way, way too soon to judge the long-term implications of Meltdown and Spectre on Intel. If their clients want to switch, it'll take months and years to do that. That doesn't mean they won't do it, but it means we won't really know the full extent for a while.

The stock price is a really crude metric. For judging the long-term implications of this, we can't look at how the stock has performed in the last two weeks alone and extract any meaningful information.

My prediction: nobody big is going to switch away from Intel entirely, but they will start to prioritize investments in technology built on its competitors, as a way to hedge their future risk. That's definitely bad for Intel, because over time, it'll reduce their lock-in.

Intel stock is down ~10% and AMD up ~10% since end of December. I'm not even sure why people are saying the stock market thinks it's not a problem. The stock market represents a weighted average of possible futures and it has priced in some damage. Is the idea that Intel should instantly drop 90%?

As I write this, Intel stock is around $44, up from $41 last October, and that $41 was already an almost 18-year high - it hadn't been that high since the dot-com bubble. Look at the longer-term graph[1]. Similar story for AMD[2], except it's at its highest since around 2007.

The reason people are saying Intel stock isn't down is because of that: when what HN considers the worst thing ever is indistinguishable from the fluctuations of the last 3 months, the market doesn't think it's a big deal.

1. https://www.google.com/search?q=intel+stock

2. https://www.google.com/search?q=amd+stock

This didn’t become public last October.

Well yes, that's the point.

The point is that it's up from an arbitrary historical value? It's down since the story broke - that's the point.

It's down from 47 to 45, and it's almost impossible to attribute this to these bugs with any reliability. Yes, there was some initial panic in the first days after the bug. You can see this in the volume the day after it was announced. But things have largely recovered since the low midway through this month.

Of course, the stock market by no means knows everything. But the aggregate prediction of traders is that this doesn't matter very much to the bottom line, and I tend to agree.

To be fair, AMD stock has a fairly volatile history. I have traded it over the last fifteen years, and my only conclusion is: Wall Street has a vendetta against AMD.

For example, a while back AMD announced in an earnings call that they were in the black and had reduced their debt substantially. The stock tanked by 15-20 percent.

That just means they were expecting a bigger profit.

The analogy I would use is that investors may know something is going to happen, but there is a lag before it really happens (less market share), and the stock price decline lags that. In school they teach you that a stock tracks the future predictions but it doesn't quite work like that. For big, widely held things, the stock often lags at predicting future troubles until the last group of investors realize that the trouble exists. That also explains why when stocks fall they often fall quickly rather than gradually.

Disclaimer: I know relatively little about differences between processor architectures, so this might be totally wrong because Intel might just have them over a barrel on this.

I think the major difference between this and, say, the Equifax blowup, is that Intel's institutional clients are affected by this.

I'm not sure what they're thinking internally, but it stands to reason that they're probably a bit upset at least: Their CapEx just went up to maintain the same level of computing power. I'd be surprised if internally Google is buying the "AMD is just as affected" line that Intel's been throwing out.

So, I wouldn't be surprised if they're at least evaluating AMD.

Or, again, maybe Intel just totally has them over a barrel and transitioning isn't feasible at all. It certainly doesn't paint a great picture of Intel's future if AMD does catch up, though.

I think the big institutional clients are always evaluating AMD, and even more exotic options. But I don't think this bug has materially affected the equation here. Savvy buyers understand that there will occasionally be bugs, and this one has already been mitigated. How Intel verbally responds to the issue matters much less than their actions, because the words do not affect the costs of using their products, and the actions do.

I am deeply skeptical of the commentary on Intel's attitude and press releases. I really doubt that matters much to most buyers.

I think Intel has two problems:

First, ARM is doing to Intel what Intel did to the Unix workstation vendors in the 80’s.

Second, given that they’re being cornered into the server business, they need to have products that are rock solid there until they can regroup. This is one of a long parade of recent screwups with their big bets in this space:

(1) A while back, all their server Atom chips (tons of crypto and I/O with piles of ECC DRAM and cores for < $1000 and < 20W) had a bug where they stopped booting after 18 months of uptime. These compete exactly in the space where server ARM has a chance, so many affected vendors were already dual sourcing.

(2) NVIDIA crushes them for AI, and Intel is a distant third for graphics in general

(3) Samsung SSDs generally trounce Intel ones.

(4) They’re rapidly losing client device share. Their big recent innovation there is AMT, which is increasingly considered an anti-feature.

That leaves conventional IT compute (web services, DBMS, etc.) for their core business, but even on-prem stuff is moving to private cloud, which needs multi-tenancy, and they’re looking pretty risky for that use case too (vs AMD?)

They’ll certainly be around for a long time, but it’s not clear how long they’ll keep their “no one gets fired for buying IBM”-level of dominance.

> (3) Samsung SSDs generally trounce Intel ones.

This is true only for the consumer SSD market, where Intel outsources large portions of the product development. It's also a market that Intel may abandon completely in the next few years as Intel and Micron start to pursue separate flash memory development. If Intel doesn't score a solid win with a consumer SSD in the next two generations, it would be reasonable for them to pull out and focus solely on enterprise SSDs, where they have no trouble winning.

> this one has already been mitigated

Correct me if I'm wrong: It's been mitigated by applying a patch that has fairly severe performance implications, no? How does this not affect the institutional clients' bottom line in that case?

> fairly severe performance implications

1 - 30% impact range for best / worst case. On the kernel mitigations. So really workload dependent.

There's the 15-20% hit for going to the database. Most servers out there do nothing but CRUD to some DB somewhere, because most businesses don't do HPC. It's definitely an issue for them.

Then there are companies like Epic Games who reported horrific numbers. It seems that if you do lots of simple communications (e.g. websockets or UDP), you can expect a huge slowdown.
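To make the workload dependence concrete: the KPTI mitigation adds a cost on every user/kernel transition, so a syscall-heavy loop pays the tax constantly while pure computation barely notices. A rough, illustrative Python sketch (the absolute numbers are machine- and patch-level-dependent, and this is nothing like Epic's real workload):

```python
import os
import time

def time_loop(fn, iters=100_000):
    """Return wall-clock seconds for `iters` calls of fn."""
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return time.perf_counter() - start

# Syscall-heavy: every iteration crosses the user/kernel boundary,
# paying the KPTI page-table-switch cost each time.
syscall_secs = time_loop(lambda: os.stat("."))

# Compute-only: never enters the kernel, so KPTI barely matters.
compute_secs = time_loop(lambda: sum(range(10)))

print(f"syscall loop: {syscall_secs:.3f}s, compute loop: {compute_secs:.3f}s")
```

Running the same two loops with the mitigation toggled (e.g. booting Linux with `pti=off` vs the default) would show the gap opening up on the syscall loop only, which is why a CRUD-heavy or packet-heavy server hurts far more than an HPC number-cruncher.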


>Or, again, maybe Intel just totally has them over a barrel and transitioning isn't feasible at all.

That was my hypothesis for why their stock didn't drop much. The problem is very bad, but Intel's quasi-monopoly and the very high switching costs involved will let them weather it.

Not only will they probably weather it (short term), it will probably result in a reasonable sales bump as affected datacenters replace capacity lost to the various patches and mitigations and also perhaps accelerate replacement schedules when hardware that has designed-in mitigations becomes available.

>I'm not sure what they're thinking internally, but it stands to reason that they're probably a bit upset at least: Their CapEx just went up to maintain the same level of computing power. I'd be surprised if internally Google is buying the "AMD is just as affected" line that Intel's been throwing out.

They don't have to buy it. Whether affected or not, AMD is a non-starter at this moment for those things.

Semi-related question: Why is that? I've been looking for resources on why AMD is a non-starter, but most searches just turn up comparisons for low-level consumers.

AFAIK, it’s the ecosystem and availability. Let’s say you’re a Dell shop and you’re using mostly PowerEdge R730s (one of their middle-of-the-road 2-socket options). Your choice right now is Xeon E5-2600 v4 family, because that’s what that platform supports. These aren’t motherboards that you just pick up at Fry’s, they’re designed as an entire system, which takes a while to get going. Dell is not going to invest the R&D money unless they have demand, but there won’t be much demand without stories of success in the field, so we’re at a bit of a chicken and egg problem right now.

Speaking of Dell, they are launching some EPYC stuff: https://blog.dellemc.com/en-us/poweredge-servers-amd-epyc-pr...

But again, getting the ball rolling might take a couple of years. Look at what happened with Opteron as an example.

Pundits and editorial staff sensationalize. Investors collectively behave like a "drunken psycho". Neither are of themselves bellwethers of a company's future success.

We might instead point to facts. Major tech companies are getting into chip design. Apple's foray into in-house chip design last year almost destroyed the value of Imagination and Dialog shares. If they and other companies are successful, we could see the same happen to Intel.

The article you’re mocking uses that very fact, and many more, to make its case:

The company has no competitor in server chips at the moment, but this episode could change that. Microsoft and Google have publicly praised Qualcomm Inc.’s first server chip, which went on sale in November, and Apple, Google, Microsoft, Amazon, and Facebook all have internal divisions working on chip designs.

While I don't underestimate the scope and possible ramifications, I still don't understand the immediate threat to the average person. The malicious hacker that is after secret personal information does not need to go to the trouble of writing a trojan that exploits Spectre or Meltdown when a simple keylogger and file scanner would be much simpler and probably more effective.

The article itself states: "So far, Meltdown and Spectre probably pose less risk to the average person than, say, a simple phishing attack in which a hacker tries to send you to a malicious website. They won’t lead to the kind of widespread panic that resulted from the 2017 hack of Equifax’s customer database.

But that could change. Hackers who hadn’t tried to break into Intel’s hardware, believing there was no way it would leave a side door open, are now seeking ways in."

"But that could change" is a vague statement that doesn't mean much to me. Again - not trivialising this, nor saying Intel shouldn't do some soul searching, but I'd like to better understand the justification for the apparent hysteria - unless, of course, some people more experienced in security would care to explain what it is I'm missing here.

P.S. One thing I wanted to check was whether Spectre/Meltdown breaches could somehow be caused by manipulating a web browser. Some searching revealed that this is indeed a possibility so at this point, everyone feel free to panic :-)


Intel has as much of a problem as VW had after diesel-gate: none. Same will be valid for Apple's throttling scandal.

Big corps like this may experience some little storms here and there, but there is no iceberg big enough for them.

Articles like this exist just for the sake of writing something and making some money.

It's not even close to being over -- https://skyfallattack.com/

There seems to be consensus that it's a hoax.

Is there an embargo? Is it a silly play on the James Bond reference that Spectre represented? Either way pretty funny since you can't know for sure after this fiasco.

Botched microcode updates, Intel engineers arguing with each other on the Linux kernel mailing list, etc. Intel's best and brightest have had 6+ months to work on proper mitigations in secret and this is the result.

https://marc.info/?l=linux-kernel&m=151559244214217&w=2 https://marc.info/?l=linux-kernel&m=151559367514704&w=2

Silly play is what I heard as most likely.

Yeah, handling of mitigations for Spectre has been awful during the embargo period, but I am very happy with the result we're getting now. It's taking less than three weeks to get everything sorted out.

Does the embargo have a hard deadline?

> the investors in the stock market seem to agree

It's down several percent since the announcement in a rising market.

More generally they have underperformed the SP500 over the past 2 years and AMD in particular is blowing them away.

Yes they have a problem. The current CEO seems more interested in politics than technology. http://www.breitbart.com/big-government/2015/09/10/intel-cut... (inb4 I don't like Breitbart)

Using the current stock market reaction as a bellwether for anything but short term (1-6 months) outlook is a bit silly.

In the longer term (1-3 years) Intel has a very large problem. Especially as they are being eaten alive in non-desktop class cpus right now.

These issues may not be a knockout punch, or even have them on the ropes, but it made them stumble and they look vulnerable.

Regarding the stock price and business revenue: it might be naive, and it certainly depends on whether lawsuits go forward, but these bugs could prompt customers to upgrade their chips when the next chip revisions are out, increasing sales over pre-disclosure expectations.

I remember the selling on insider information claims, and was skeptical that it would matter that much.

How do you factor into your thesis the possibility of a multinational class-action lawsuit?

The main problem with Intel is direction and process. Dan Luu pointed out in 2015 [1] that Intel chips had serious bugs and, given how Intel acted, it was only a matter of time before something like this popped up.

What I see happened to Intel is that once they consolidated their monopoly in the late 2000s, they lost the healthy management practices that tend to come from being in a competitive industry.

All this talk from upper management about velocity was about trying to find a way to make more money when you've mined out your current niche completely. It ended up instead opening the door for AMD to make a comeback on x86

[1] https://danluu.com/cpu-bugs/

I think that comparing the types of bugs in Luu's post (which is excellent) and Spectre/Meltdown is a mistake. The former are mistakes caused by insufficient testing. The latter are conceptual problems that are nearly fundamental to modern processor design. No amount of simple testing would have uncovered them.

Spectre is fundamental to processor design, but Meltdown is pretty much a bug.

It is only a bug once you have discovered that speculative execution can be used as an attack vector. Until then, it was a perfectly valid design implementation.

It was a bug right from the original design, because it sidestepped the privilege level separation guarantees of the CPU. That the bug could not be exploited for other reasons does not make it not-a-bug.

What actual data can be extracted by a Meltdown/Spectre attack? Still need to find an answer to that, nothing online says anything specific.

Datacenters should probably be worried, but what about the hundreds of millions of users out there? Doesn't seem like a big deal, tbh - until an actual exploit is out there, why should they worry?

Meltdown allows userland native code (the Javascript your browser loads from random websites is JIT'd down to native code) to dump kernel memory.

It is worth clarifying, when people talk about "kernel memory", for x86-64 it really means all of memory, because all of physical memory is mapped into the kernel's address space. So really, meltdown allows userland code to read anything in memory.

Incorrect. When people talk about kernel memory, they are talking about pages marked as supervisor in the page tables for a particular process. That is not "anything in memory."

Meltdown allows applications to read any mapped pages, regardless of the protection bits on those pages. That mainly means kernel memory, which is the only page set that's normally unreadable. The kernel mapping normally includes all of physical memory.

I think you missed the "all of physical memory is mapped into the kernel's address space"

For a typical kernel without Meltdown mitigations, the entire kernel, including that window into all of physical memory, is in the page tables of every 64 bit process at all times.
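For anyone wondering what "reading" that memory actually looks like: the transiently executed load never returns data to the program directly; it leaves one of 256 probe lines in the cache, and the attacker recovers the byte afterwards by timing each line. Below is a toy Python sketch of only that final decoding step, with made-up latencies (the real measurement is done with clflush/rdtsc in native code; the names and threshold here are illustrative, not from the actual exploit):

```python
CACHE_HIT_THRESHOLD = 100  # illustrative cycle count; tuned per machine in practice

def recover_byte(probe_times):
    """Given access latencies for 256 probe lines, return the byte value
    whose line was pulled into the cache by the transient load, or None
    if the measurement is ambiguous and should be retried."""
    hits = [value for value, cycles in enumerate(probe_times)
            if cycles < CACHE_HIT_THRESHOLD]
    return hits[0] if len(hits) == 1 else None

# Fake measurement: every line is a slow miss except 0x41 ('A'),
# which the transient access left cached.
times = [300] * 256
times[0x41] = 60
print(recover_byte(times))  # prints 65, i.e. 0x41
```

The exploit repeats this probe once per byte of the target address range, which is why it can dump memory at a respectable rate despite the indirection.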

How do people find the addresses to even start probing for stuff like this?

Theoretically, any application data stored in memory. An example might be a key used to encrypt user data server-side. The scary part of this attack is how it basically breaks the assumptions that most software was written under around memory isolation. Consider an AWS instance that shares a physical system with a different customer’s instance.

Amazon, Google, Microsoft, and basically everyone else is scrambling to fix these issues because of the potential. Basically, it’s going to take years to cover the long tail for this issue, and waiting until exploit kits are commonplace isn’t necessary to understand the potential impact.

The page from the researchers is your best source: https://meltdownattack.com/. I also recommend reading the actual papers (https://meltdownattack.com/meltdown.pdf and https://spectreattack.com/spectre.pdf). I haven't gotten to the Spectre paper yet, but the Meltdown paper is excellent, readable, and clear.

I don’t think it’s fair to call Meltdown a bug. It’s like LLVM aggressively taking advantage of undefined behavior in C. Processors, like compilers, are designed making assumptions about what they owe the user and what they don’t. The architecture manuals promise that a user space read from protected kernel memory will trigger a page fault. They don’t make any other promises.

Not calling it a bug is ridiculous; it totally breaks x86 memory protection.

What does “x86 memory protection” mean? What promises does the hardware make to the software?

That software in ring 3 cannot read memory in ring 0 unless the appropriate permission bits on the page table entries are set.

I don't think the spec actually makes that guarantee anywhere. It says that a page fault will be generated if a memory access violates page protection bits, but doesn't discuss any other potential side effects.

The closest thing I can find is Intel Architecture Reference Manual Volume 3, Section 5.1.1:

> With page-level protection (as with segment-level protection) each memory reference is checked to verify that protection checks are satisfied. All checks are made before the memory cycle is started, and any violation prevents the cycle from starting and results in a page-fault exception being generated. Because checks are performed in parallel with address translation, there is no performance penalty.

If you read section 11 on caching, the terminology "memory cycle" seems to exclude cache access. Indeed, Volume 3, Section 11.7 explicitly warns that implicit caching might happen that you would not expect:

> Implicit caching occurs when a memory element is made potentially cacheable, although the element may never have been accessed in the normal von Neumann sequence. Implicit caching occurs on the P6 and more recent processor families due to aggressive prefetching, branch prediction, and TLB miss handling.

The spec isn't really relevant to the point I'm making. I'm talking about what the purpose of the design is. Memory protection as a feature exists in order to (among other things) prevent code in ring 3 from reading data in ring 0 that ring 0 has not explicitly granted permission to read. If the implementation fails to do that, then it's failed to achieve its goal, and the whole exercise is pointless—why include the silicon at all? Documented bugs are still bugs.

Again to use the crypto analogy: An implementation of, say, RSA that uses timing-sensitive memcmp to compare signatures would follow the RSA specification. But everyone would agree that such software has a severe bug.

Meltdown is a bug in the same sense that a crypto routine that uses a timing-sensitive memcmp has a bug. Memory protection is designed to prevent ring 3 code from reading memory in ring 0 (assuming the appropriate page tables are set). The implementation fails to do that.

The fact that speculative reads don't check the permission bits is arguably a design bug, not an implementation bug, but I'd still call it a bug.
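To make the memcmp analogy above concrete: a naive comparison returns at the first mismatching byte, so its running time leaks how much of a guess is correct, even though it perfectly satisfies the functional spec "return true iff equal". A minimal Python illustration (`hmac.compare_digest` is the stdlib's constant-time comparison):

```python
import hmac

def naive_equal(a: bytes, b: bytes) -> bool:
    """Spec-compliant equality check, but the early exit leaks,
    via timing, the position of the first mismatching byte."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # returns sooner the earlier the mismatch
    return True

def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Same result, but the comparison examines every byte regardless."""
    return hmac.compare_digest(a, b)
```

Both versions meet the spec, just as Intel's silicon meets "a user-space read of kernel memory raises a page fault"; the bug in each case lives in the side channel, not in the documented behavior.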

Speculatively executing code without performing memory access checks seems like a big design/architecture mistake. Not really the kind of thing that testing or verification is aimed at.

That's one of the bugs, Meltdown. The other, more insidious bug doesn't involve memory checks not being done, but rather the more basic fact that results written to cache can't easily be backed out.

Packaging Meltdown/Spectre together was a brilliant PR coup for Intel, despite the fact that Meltdown is much more severe and really only affects Intel processors.

This way they can pretend it's an industry-wide problem and they are not to blame, great.

How was "packaging meltdown/spectre together was a brilliant PR coup from intel"?

It was Google who found the exploits, and Google who published the exploits in the same document & their press release.

Google is the one who packaged these exploits together. How does Intel PR take credit for this?

Google were quite explicit about the distinction between the two and the fact that the more severe of the two only affected Intel. Intel's press releases, on the other hand, used the existence of Spectre - before the full details had even left embargo - to convince people that there was no Intel-specific problem, indeed no problem at all, just a industry-wide design decision.

>Google were quite explicit about the distinction between the two

They weren't though. The technical publication by Project Zero [0] did distinguish between the two, but Google's PR-oriented article [1] didn't bother to do that. The PR-oriented article packages the two exploits together, saying "These vulnerabilities affect many CPUs, including those from AMD, ARM, and Intel, as well as the devices and operating systems running on them."

[0] https://googleprojectzero.blogspot.com/2018/01/reading-privi...

[1] https://security.googleblog.com/2018/01/todays-cpu-vulnerabi...

One might ask why Project Zero's 90-day disclosure rule was waived for Intel in this case, and why they were apparently given an indefinite amount of time to do nothing about it.

Why is it that if the disclosure (after it was found by someone else) was scheduled for early January that everyone was still not ready and the patches still aren't released and they are still buggy and broken?

It is clear from the fact that Google broke their own disclosure rules that they were colluding with Intel to a certain extent to manage the fallout from this.

Why do you assume Google waived their disclosure policies for Intel? I thought it was clear Google waived their disclosure policy for themselves.

Google had an enormous self-interest to avoid disclosure. At the 90 day disclosure date, their Google Cloud product was vulnerable to all three exploit variants. From their blog post, it looks like at that point they didn't have acceptable mitigations for variant 2. It took Google until December, or nearly 3 months after the projected disclosure date, to fully patch their Cloud product. Google only announced the exploits to the public after they had mitigations in the pipeline for all their products.

Reviewing the timeline, isn't it more convincing that Google tried to protect themselves first, rather than Intel?

Furthermore, if Google really cared so much about 'colluding with Intel' and having disclosures 'waived for Intel', why did they announce these exploits before Intel could release a microcode patch for their CPUs? It does not add up.

> Reviewing the timeline, isn't it more convincing that Google tried to protect themselves first, rather than Intel?

It is true that both would suffer from disclosure.

> why did they announce these exploits

Because they got outed and so they had to develop a cover story.

> It does not add up.

Many parts of the story do not add up, probably because the parties involved are lying.

> Because they got outed and so they had to develop a cover story.

Your claim is at odds with Google's explanation of why they announced the exploits prior to the coordinated disclosure date. In their press article, they state:

We are posting before an originally coordinated disclosure date of January 9, 2018 because of existing public reports and growing speculation in the press and security research community about the issue, which raises the risk of exploitation. The full Project Zero report is forthcoming (update: this has been published; see above).

Recall Google's announcement came the day after multiple reports speculating the existence of a CPU hardware bug. See https://news.ycombinator.com/item?id=16046636 and https://news.ycombinator.com/item?id=16052451 .

Google's statement on why they announced before the coordinated disclosure date is valid. There was an unprecedented amount of embargo violations prior to the coordinated disclosure date. There were multiple high-profile reports correctly speculating the nature of the vulnerabilities. Where is your evidence this is a 'cover story'? I can't see it.

> probably because the parties involved are lying.

It is trivial to accuse someone of lying. Specifically, what did Google lie about, and where did they lie about it? Most importantly, where is your evidence that Google is lying?

I also think it was a PR coup from the Intel side. I think of myself as pretty tech savvy, but even I hesitated for a moment and asked myself whether AMD was affected by Meltdown, although I knew that it wasn't. All the Google searches lead to sites mentioning "Meltdown/Spectre attacks" as if they were the same thing. You can hardly find sites that explicitly state: look, here is Meltdown, it only affects Intel CPUs, it can only really be patched at the software level, and that will result in HUGE performance losses. Basically, every time your CPU crosses into kernel space, the page tables have to be switched and TLB entries flushed.

Instead no one seems to grasp what meltdown is, intel is feeding the media with benchmarks and says there is no real impact, users believe intel.

I read somewhere that all the big companies, Google, Intel, etc., knew about these security vulnerabilities, especially Meltdown, months before. Intel made a deal with Microsoft to disclose the vulnerability on a Microsoft Update date and tell everyone that there was an issue and it was fixed. Microsoft was also supposed to make the slowdown apply to all processors, including AMD. However, Google reported it a week earlier, saying it couldn't keep this a secret in good conscience. But even though that plan failed, there is still a lot of misinformation caused partly by Intel.

Even if you visit sites such as meltdownattack.com (the first result on Google with some deeper info), it says we do not know to what extent Meltdown affects processors.

I'm on the verge of buying a new desktop, and I will go with Intel because I have no other cheap choice. Intel still delivers the cheapest option for me (i3-8100). But I would appreciate it if Intel played fairly. All this manipulation behind the scenes is corrupting the market, and Intel is responsible for it.

I don't know what we got ourselves into. The RAM folks built a cartel and are keeping the prices high. Microsoft is deliberately crippling third party antivirus software. Intel is shipping us CPUs with backdoors (Intel ME), there are serious vulnerabilities that get swept under the rug, yet we have to buy intel because we are locked into the x86 platform.

At least Microsoft is teaming up with Qualcomm and Apple is rolling out its own ARM processors. We need to have an alternative. IMHO Intel has been playing an unfair and unethical game.

But here's the other thing: there is no Meltdown exploit for AMD yet. The paper (https://meltdownattack.com/meltdown.pdf, Section 6.4) is clear that the race condition at the core of the exploit exists in AMD chips, as toy examples show some information leakage. It's just that they can't get a working exploit.

Is there no working AMD exploit by design, or did AMD just get lucky? I bet they just got lucky.

> The paper (https://meltdownattack.com/meltdown.pdf, Section 6.4) is clear that the race condition at the core of the exploit exists in AMD chips, as toy examples show some information leakage. It's just that they can't get a working exploit.

Actually not. The toy example shows that speculative execution occurs past a fault. Every big CPU does that and it's not news (and it's the basis for the Spectre stuff, except with "branch" replaced by "fault"). It should probably be in the Spectre paper too or instead of the Meltdown one. This also isn't news to anybody (if CPUs couldn't execute past faulting instructions, OoO would be almost useless).

The really specific interesting thing that makes Meltdown work is that supervisor-only flagged lines can be immediately used by a subsequent dependent load. That could entirely be specific to a particular architecture decision, and so it could be that some designs are immune while still being highly speculative in the general sense.

I'm not saying that AMD had some brilliant foresight to do it this way, but it might have been a design decision for unrelated reasons or just fallen naturally out of other constraints.

As I mentioned in another reply to the OP, there is some small advantage in delaying the squash of the first load (slightly reduced complexity on the L1-load-hit path), but maybe there are some small advantages to doing the early squash too (you don't use the "wrong" value in subsequent instructions and so avoid wrongly evicting useful lines from the cache, etc).

I think we can safely assume that AMD wasn't like "Oh, we should squash disallowed loads early to avoid cache timing side-channels" - not because they couldn't have thought of it (they should have), but because if they had, they would have already tested Intel chips for it and published for the huge PR boost it would give them.

It's by design in a literal sense. AMD enforces security boundaries even with speculative execution. So does ARM on all but a couple of chips.

Well we don't really know, because AMD hasn't said in enough detail, but it is entirely possible that they are totally immune to the Meltdown attack.

The crux of it comes down to how their TLB, L1 and out-of-order engine interact. Can a load Y whose address depends on an earlier load X whose entry exists in the TLB but only with the S-bit (supervisor access only) end up actually using the true value of X before it is squashed?

Clearly Intel allows it to occur, but it's not obvious that this has to be the way. The TLB, which presumably contains the S-bit, is already on the critical load-to-load dependency path (since the physical tag needs to be used to select the way) so it isn't obvious that the S-check can't happen at the same time and essentially squash the result of the load X before it ever shows up on the bypass network for consumption by Y. On the other hand, it might be slightly easier to let the X result show up on the bypass network and do the S-check in parallel and then only flag the ROB-entry as squashed after the fact[1] - perhaps it saves a MUX in the critical path.

This is very much unlike Spectre, where it is more or less obvious from the way that almost everyone does branch prediction and speculative execution that you can probably pull off the attacks on modern cores (perhaps with the exception of branch predictors that do a full address check to access the BTB). Here AMD has also tried to claim some invulnerability, but IMO these claims are much weaker, since it seems unlikely that the predictors cannot be trained. AMD is probably just relying on the fact that the predictors are harder to train.

[1] It's worth noting that everyone is saying that Intel only applies the security check at retirement - but we don't really know this: we only know they apply it "too late" in that the Y load can consume the result, but it could still be applied before retirement, as little as a few cycles later.

I'm talking about Meltdown, which is about a race condition with memory access checking and out-of-order execution, not speculative execution. The paper authors do not give a categorical explanation for why they can't exploit the AMD and ARM chips (Section 6.4), and indicate it may be possible to find an exploit in the future.

It's still speculative execution: Meltdown works because you can speculatively load a location based on an earlier load which will ultimately fault. That's basically the definition of speculative execution: the second load never even occurs in the non-speculative instruction flow.

It is also due to memory access checking and out-of-order execution, since those aren't orthogonal to "speculative execution" (and in fact are tightly related).

Now we're arguing the semantics of "speculative execution." When I use the term, I mean speculatively executing code across branch instructions. I use "out-of-order execution" when instructions inside of a basic block are executed in an order different from how they appear in the instruction stream. This is the terminology I learned in computer architecture courses, it's what I've read in the literature, and it's what the Meltdown and Spectre authors use.

Well long before these attacks that term had a widely accepted definition in CPU architecture that includes much more than only execution on the other side of a predicted branch.

I can't speak to your computer architecture course and I'm not sure what literature you've read (but it's easy to get the impression that it only relates to branches since a lot of literature might only be addressing that aspect), but the Meltdown authors are clear at least (quoting from the pdf):

In practice, CPUs supporting out-of-order execution support running operations speculatively to the extent that the processor’s out-of-order logic processes instructions before the CPU is certain whether the instruction will be needed and committed.

They go on to note that for the remainder of this paper their use of the term will refer to a more restricted definition related specifically to branch speculation, since that's what they care about. That's fine, and it's good they are clear about it - but it doesn't change the recognized meaning of the term (and indeed their narrowing of the term helps confirm the general definition).

They aren't as clear in the Spectre paper, and they focus on branch-related speculative execution since that's what they care about for the purposes of their description, but they never claim that speculative execution is limited to branch prediction.

Not that the Spectre/Meltdown authors are a particularly authoritative reference for CPU architecture terminology: these are, after all, software guys peeking into the hardware world for the purpose of putting together these attacks.

Modern "big" cores are, conceptually, executing most of their instructions speculatively: wide out-of-order execution windows mean there is a large degree of divergence from pure in-order execution, and any time an earlier instruction can fault, the remaining instructions are speculative (and the CPU mostly doesn't care: infrastructure such as the ROB is going to be used regardless of whether the current head of the instruction stream is speculative or not).

I think AMD does some checks that Intel doesn't.

Oh, but it IS an industry wide problem. Keep in mind Meltdown also affects some ARM processors, not only Intel.

And Intel did not have a say in packaging both vulnerabilities together. Researchers at Project Zero did, and with good reason, since both exploit the same mechanism through different means.

Exactly. It's why you need multiple assurance activities on projects to spot what a single one can't handle. I summarized here [1] those that produced results at various points in CompSci and INFOSEC history. Far as this, the first applications of covert-channel analysis to hardware found cache-based, timing channels in both VAX [2] and Intel CPU's [3]. That the caches leaked secrets meant any number of constructions built on top would leak secrets. So, the proper response would be either one cache/core/CPU per security domain or designing a new cache that didn't leak secrets. The former was high-security's fall-back, with the latter attempted by many in CompSci using partitioning or masking caches. The first was done in 2005 [4] after Percival's work showing secret leaks. That first attempt on partitioned caches also cited prior work in the real-time sector thinking similarly to boost determinism: proving determinism and covert-channel prevention were always closely related. There were many designs after that for both partitioning and masking before the Intel hit.

So, the root causes of shared, on-chip resources were identified by the mid-1990's, demonstrated again by later work like Percival's, mitigated from then onward, and ignored by CPU vendors. When I asked in the past, hardware people told me they didn't care about cache security because their sales were strictly tied to customers' benchmarks of performance per dollar and watt. Customers didn't care. Suppliers didn't care. That simple.

There were in fact (tiny) segments where customers were buying processors with more robustness or predictability. Those that come to mind were some PowerPC designs from Freescale that aerospace liked, Leon3-FT SPARC that was GPL, some smartcard components, and especially Rockwell-Collins' AAMP7G [5]. Designed with EAL7 methods from 1992, it had mathematical proof of separation at level, triplicated registers for fault-tolerance, ECC memory, and MILSPEC heat tolerance. It's used in guards to separate Top Secret/SCI info from other stuff.

So, these are old attacks with mitigations of various costs that were ignored for profit maximization by the big companies and performance maximization by most consumers/businesses who didn't buy security in general. Both old and new techniques were effective at assessing leaks and mitigations, though. They could've been used at any time, were by CompSci, were by security-critical suppliers (esp Rockwell), and even more techniques exist now for analysis [6]. Intel et al will just patch up until next attack since they and the market haven't changed. ;)

[1] https://pastebin.com/uyNfvqcp

[2] https://www.google.ch/patents/US5574912

(Note: Using the patent filing since the 1992 work is paywalled. It's the same person filing what they discovered on VAX VMM project.)

[3] https://pdfs.semanticscholar.org/2209/42809262c17b6631c0f653...

[4] https://eprint.iacr.org/2005/280.pdf

[5] http://www.ccs.neu.edu/home/pete/acl206/slides/hardin.pdf

[6] https://pastebin.com/ajqxDJ3J

This isn't just Intel. Every chip company is like this. Pre/Post silicon verification is given a narrow window in which to catch everything and if it's a non-trivial bug that requires a layer change it's probably shelved till the next revision.

I think this is less a thing that is specifically endemic to Intel as much as it's just a "big, old company" thing. Companies become massive, they eventually lose their visionary leadership, and then they spend the following few decades lumbering along on inertia.

These companies continue to employ a lot of smart people and have exciting things going on in some components, but the corporate culture makes it hard to get a consistent positive execution.

Intel has been sitting pretty for the last 20 years because chip design is not something that lends itself to modern patterns of disruption. I'm sure it will happen some day, but for now, Intel has kept its position due more to the difficulty of the niche than to any intrinsic competitive edge.

The point being that Intel itself is not uniquely terrible; they're routinely terrible. They're just positioned in a space that's much more hostile to less-terrible entrants, so capable people do something that is less hard.

Intel still has the advantages that allow them to screw up and survive.

1. Process knowledge and manufacturing capacity. You can buy from others only as much as they have manufacturing capacity. The only real threat to Intel comes from the combined volume of GlobalFoundries, TSMC, Samsung and UMC. Apple, NVIDIA, AMD, ARM and Qualcomm can get past Intel only through these companies.

2. Profit margins. Intel makes 60 percent profit margins, while AMD struggles from decade to decade. That's not a coincidence. It's the direct result of pricing decisions by Intel. Whenever AMD gets ahead of Intel in microprocessor technology, Intel always has the option of cutting profit margins to prevent AMD from gaining more market share.

The odd bit of anti-competitive behaviour (and billion-dollar lawsuits that come with it) is also something of an advantage...if an unfair one.

I think Spectre and Meltdown are a fantastic opportunity to rediversify the CPU market, and I don’t think it could have come at a better time.

- AMD has just had a great release with Ryzen, showing they can compete on a price/performance basis.

- Apple is moving core OS functionality on its newest desktops/laptops onto Apple-designed ARM chips.

- Mobile platforms are getting bigger, especially with things like ChromeOS that could be (are being?) easily run on ARM based hardware.

- Open Power has come a long way and could be poised to take some of the server market for customers who want more control than they got with Intel.

I’m excited for this. Obviously the vulns are an issue that needs to be solved, but we could get some real competition in terms of manufacturers, and even in terms of architecture. The industry will take some time to readjust to compiling for/running on multiple architectures, which I think much of the industry hasn’t needed to deal with for a while. The result though will be a market where customers can choose an architecture that makes sense for their use case, and can choose from a range of good options.

(I realise other chips are vulnerable, not just Intel, but the publicity has been Intel focused and I don’t think the technicalities of it matter too much)

I think Intel might get away with it.

For the last 5 years they've been slacking off, because economically there is no reason to go beyond the usual 10-15% yearly performance bump. But actually they were accumulating aces up their sleeves. Again, no reason to show your hand if you don't have to.

But the time has come. Right now Intel has 3 major problems: 1) Meltdown/Spectre situation 2) AMD is awoken from sleep with surprisingly good Ryzen lineup 3) Apple craves new powerful CPUs to satisfy unhappy MacBook Pro customers

Intel can fix all of this in one sweep, just by releasing a brand new CPU that will surprise everyone. Of course with a hardware Meltdown/Spectre fix. They were holding off, but it's time to drop all those hidden aces on the table. And I believe it's gonna happen. Not right now with Cannon Lake, but with the one after - Ice Lake on the 10nm process, by the end of 2018. It's going to be even bigger than NVIDIA's GTX 1080 success.

Doubtful. You don't just develop a new processor overnight, and if they truly had all these aces up their sleeves, they would have dropped them already in response to Zen last year.

Intel's process advantage is shrinking. They're struggling like everybody else because the physics is getting harder and harder. Apart from the fact that it would have been nice to get easy process shrinking forever, this is good news for almost everybody: it means competition for them is getting tougher.

> this is good news for almost everybody

I don't think CPU capacity failing to double every 18 months is good news for anybody. I'd rather have a monopolistic Intel churning out 2x powerful chips every 2 years than a competitive market giving 5% performance bump per year.
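For the sake of the arithmetic: 2x every 2 years is roughly a 41% annual improvement, and the gap versus 5%/year compounds fast:

```python
# Compare compound performance growth: 2x every 2 years vs 5% per year.
doubling = 2 ** 0.5  # per-year factor for doubling every 2 years (~1.414)
modest = 1.05        # 5% per year

for years in (2, 10):
    print(years, round(doubling ** years, 1), round(modest ** years, 2))
```

After a decade that's roughly 32x versus 1.63x, which is the whole argument in two numbers.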

It’s doubtful that they would drop all their aces in response to Zen.

Actually, I'd turn this on its head and ask: Why is there this claim that they had or have any aces in the first place, Zen or no Zen?

What you and the ggp are basically saying is that Intel slowed down the improvement in their processors on purpose over the last several years. Why on earth would they do that?

Besides, all the evidence points to the contrary, what with them being unable to compete in the mobile space.

I'd speculate that top management was aware of the physics limitations to their biggest market advantage, and they probably even had a timeline for when the competition is going to inevitably catch up. So they must have been spending their billions on something that's going to keep the company afloat in XXI century.

Maybe quantum computing, neuromorphic chips, GPGPU and 3D NAND are where it's at for them in the future, and traditional CPUs will be more or less commoditized.

> Why is there this claim that they had or have any aces in the first place, Zen or no Zen?

I'm not a big hardware person, but from what I've heard, the speed with which they released 6-core processors after Ryzen makes it likely they were capable of producing 6-core (consumer) designs earlier.

The original hexacore Xeon is almost eight years old (March 2010 release). Intel released a consumer hexacore in response to Ryzen. Intel's artificial market segmentation is ridiculous, but so is the typical AMD watcher's near total ignorance of what is happening in the Xeon line.

That may be overstating the AMD watchers' ignorance by quite a bit. The big marketing push with the Zen launch was that Intel had a chip with a lot of cores, but at 2x the price with slightly worse performance.

They produce Xeon chips with dozens of cores forked off the same architecture, so that wasn't too surprising. Sticking to four cores was probably just market segmentation, like not supporting ECC memory in the consumer line, to protect Xeon sales.

If you think they can redesign their cache geometry, indirect branch predictors, and return stack buffers in a matter of a few months and then tape out new processors by the end of the year, you're nuts.

It's gonna take 3 years minimum until even the easiest of those things is resolved and silicon hits the street, even if Intel begrudgingly admits they need to do this.

> But actually they were accumulating aces up their sleeves.


> And I believe it's gonna happen.

I don't notice anything in your post supporting those beliefs, aside from Intel having a motivation to make them true.

Intel is a huge company and it's hard for huge companies to make abrupt transitions. CPUs have a development cycle that spans years, and employees who were laid off during ACT a couple years ago when Intel decided their headcount was too high aren't going to suddenly come back now.

What Intel could do in the short term is reduce their prices drastically. They have the profit margins to afford it.

They had an ace up their sleeve to fix those security issues? If that is true, they should get ready for the lawsuits I guess...
