
Intel CPU Bugs of 2015 and Implications for the Future - tolien
http://danluu.com/cpu-bugs/
======
lumpypua
Awesome article! From footnote 1:

 _In the time that it takes a sophisticated attacker to find a hole in Azure
that will cause an hour of disruption across 1% of VMs, that same attacker
could probably completely take down ten unicorns for a much longer period of
time. And yet, these attackers are hyper focused on the most hardened targets.
Why is that?_

I can't figure it out; do any fellow posters have ideas?

Well-funded unicorns with iffy infrastructure seem like good candidates to pay
out DDoS ransoms, while Azure, AWS, etc. will never pay out a ransom. If
attackers want credit cards and email addresses, softer targets seem like a
better choice as well. It doesn't seem like an attacker can extract any value
from hardened targets unless they have state-sponsored or
corporate-espionage-level resources and skill.

~~~
RickS
Some people just really like hard problems.

------
joosters
Whenever I've glanced through the errata for processors, it makes me think
it's a miracle that the CPUs run as well as they do...

[https://www-ssl.intel.com/content/dam/www/public/us/en/docum...](https://www-
ssl.intel.com/content/dam/www/public/us/en/documents/specification-
updates/4th-gen-core-family-desktop-specification-update.pdf#21)

~~~
nickpsecurity
Take a look at the challenges at 28nm:

[http://electronicdesign.com/digital-
ics/understanding-28-nm-...](http://electronicdesign.com/digital-
ics/understanding-28-nm-soc-design-arm-based-cores)

You're right that it's damn near a miracle that they work. The miracle is the
investment of literally billions in tooling and projects to figure out how to
make it work. Even then, it's still so expensive that the cutting-edge nodes at
28nm and below where Intel often works are exclusive to elite engineers and
tools for anything of significant size.

~~~
_yosefk
I dunno, Adapteva taped out at 28nm with 5 people. Do you mean that full
custom design is particularly hard at 28nm or that it's hard for EDA vendors?
28nm is also a bulk node; I'd think it ought to have been easier for everyone
to understand relative to FinFET nodes.

~~~
gluggymug
Uh, tape out doesn't mean you've verified anything properly.

Looking at the various Intel bugs found, I can't help sniggering a little.
Verification mistakes happen all the time, though. It's just a matter of when
the bugs are found: the later they're found, the greater the impact.

On the 28nm issue, getting your design to synthesize at 28nm depends on the
architecture and your constraints. Trickier data paths make it harder to meet
timing. Where your IOs are placed, where your RAMs are placed, and how much
RAM you have: each thing can make it harder and harder to synthesize. Adapteva
doesn't have much RAM, I believe.

~~~
nickpsecurity
Are you joking? Intel uses (even invented) so much verification tech it's
crazy. I'm talking more than any small vendor could hope to use, as it would
slow them down or take too much brains/training. Any bugs they're having are
more likely due to the size of their project or how custom the optimizations
are. Here are some techniques Intel uses for verification:

[https://www7.in.tum.de/um/25/pdf/LimorFix-2.pdf](https://www7.in.tum.de/um/25/pdf/LimorFix-2.pdf)

[https://www.cl.cam.ac.uk/~jrh13/slides/nasa-14apr10/slides.p...](https://www.cl.cam.ac.uk/~jrh13/slides/nasa-14apr10/slides.pdf)

IBM's papers on their verification system for POWER processors mention all
kinds of optimizations for things like pipelines that make the logic
ridiculous. Then they jump through hoops in verification. Yet they don't hit
3+GHz on a good pipeline without that. Undoubtedly, Intel is using similar
tricks with similar issues.

Regular ASIC verification doesn't cut it at their level. What they're doing is
on another level. It's hard to say what exactly we should expect in terms of
errata given their operating constraints (esp. marketing). The only thing I
expect is to know _clearly_ the circumstances under which errata appear so I
can avoid them. They let me down...

~~~
gluggymug
"Are you joking? Intel uses (even invented) so much verification tech it's
crazy."

That's kinda why I am sniggering. Intel has a rep for letting bugs go to
silicon despite all that stuff. In terms of verification, they probably
dropped the ball (unless they found these bugs in verification and decided to
tape out anyway).

One area where they fell on their face seems to be AVX. From TFA: "Certain
Combinations of AVX Instructions May Cause Unpredictable System Behavior".
That's a huge bug. Remember how we discussed the torture testing of floating
point on RISC-V? Kinda similar issue. A customer wouldn't be happy with a huge
bug there.

The big 3 EDA vendors (Cadence, Synopsys, Mentor Graphics) each provide their
own tools for verifying this stuff, built around a standard called UVM. Like
anything, it still relies on the person using the tool. It takes a lot of
effort and planning to use this stuff.

Whenever the verification engineer has to create Verification IP to test the
IP, there's a chance they create bugs of their own. It's like a golden rule.

That's why I am not a fan of formal methods. Nothing is proven until you have
it working in silicon.

~~~
nickpsecurity
"That's why I am not a fan of formal methods. Nothing is proven until you have
it working in silicon."

The ones that did use formal methods all the way did what they were supposed
to, usually on the first pass. They were a mix of academic and defense-related
stuff most people can't buy. What I normally see when I look up formal
verification in industry is equivalence checking, with custom shops also doing
protocol verification and certain correctness angles. We've seen lots of what
Intel does in their docs. So that narrows the question down to "Why are these
errata in there anyway?"

"The big 3 of EDA tools (Cadence, Synopsis, Mentor Graphics) provide their own
tools for verifying this stuff called UVM. Like anything, it still relies on
the person using the tool. It takes a lot of effort and planning to use this
stuff."

Didn't know about that one. Thanks. The briefs I just Googled sound weaker
than Intel's stuff and especially IBM's, where presentations cover an
incredible amount of specific verifications. Like you said, what one puts in
determines what one gets out of it. So, are Intel just being lax on
verification or is their stuff just too complex + optimized to catch all the
corner cases?

"Intel has a rep for letting bugs go to silicon despite all that stuff. "

 _If_ it's intentional and avoidable, then I think it might be wise in another
light. (Or not, but worth considering.) The other light is the Lipner essay on
why shipping is more important than highest quality:

[https://blogs.microsoft.com/cybertrust/2007/08/23/the-
ethics...](https://blogs.microsoft.com/cybertrust/2007/08/23/the-ethics-of-
perfection/)

That comes from a background where he and Karger did high assurance systems
that aimed for perfection and got as close as they could in that period. They
kept slipping behind the competition in terms of features/speed/price, and
that would affect market share. So, his prior employer canceled that product,
with his next one following his recommendation to hit acceptable quality
levels, ship, and continuously improve the product. Wonder if Intel is doing
that to keep market dominance?

"Nothing is proven until you have it working in silicon."

This we agree on. The formally verified stuff usually works on the first try,
but that's thanks to the billions in R&D behind the tooling & fabs they used.
I like knowing a batch of chips performed exactly according to spec when
probed during operation. Funny I can't remember what you HW people call that
activity.

Anyway, I'd love to email or chat with you sometime to see an insider's view
on this topic and fill in some blanks. Reason being, experienced ASIC people
talk very little compared to software people. I'm collecting what tidbits of
reality I can for a variety of reasons. Two important ones are giving a head
start to people aiming for HW design, and boosting high assurance design by
determining where the weak points currently are. Really busy right now, but
maybe later on, eh?

~~~
gluggymug
"So, are Intel just being lax on verification or is their stuff just too
complex + optimized to catch all the corner cases?"

Lax! Of course it's complicated, but verification is about finding the corner
cases. Intel is driving all these extensions to the ISA. They have a pretty
captive CPU market, so they are slack. Qualcomm was the same with WiFi SoCs
when I was there. Freescale was better, but that may be because of the
particular projects.

"I like knowing a batch of chips performed exactly according to spec when
probed during operation. Funny I can't remember what you HW people call that
activity."

ATE (automatic test equipment)?

The thing is that Intel _does_ have the toughest job. They are 28nm with a
complicated design, lots of RAM, power is a big issue so clock gating probably
everywhere etc. You can't really compare that with a military or an academic
chip. The design constraints are much tougher for Intel.

Still, Intel supposedly has all the geniuses and the money. They should have
no excuses.

On the formal stuff, I have yet to be convinced. I never just trust the tools,
remember?

My email is now in my user profile if you want to discuss further.

~~~
nickpsecurity
"ATE (automatic test equipment)?"

Yeah. It's one of the only processes I have little data on. It must be
straightforward if I haven't stumbled on many academic papers on the subject.
If not, there's a siloing effect happening on the publishing side & the term
will be helpful.

"The thing is that Intel does have the toughest job. They are 28nm with a
complicated design, lots of RAM, power is a big issue so clock gating probably
everywhere etc. You can't really compare that with a military or an academic
chip. The design constraints are much tougher for Intel."

That's part of my point. Hitting perfection took making the problem a _lot_
simpler than what Intel faced. The same happened in high assurance security,
where everything in the TCB was verified down to every trace and state. Took
lots of geniuses...

"Still, Intel supposedly has all the geniuses and the money. They should have
no excuses."

...but still couldn't solve all the problems, keep up in feature parity, meet
profit requirements, etc. So, I'm not so harsh on Intel for now given the
complexity & business model. I might change my mind later. For now, we'll just
disagree. :)

"On the formal stuff, I have yet to be convinced. I never just trust the
tools, remember?"

Now, that I don't get. I've seen, in synthesis/verification work, one
9-transistor analog circuit take (IIRC) 55,000+ equations to represent all its
behaviors. Digital ones are easier but have tons of multi-layer cells wired
up. For custom, they often behave differently. DRCs on modern nodes are, I
read, in the 1,000-2,500 range. I'm ignoring OPC because you're handicapped
enough at this point. If you don't trust the tools, how are you getting
anything done in ASIC land?

You must write really fast plus have a discount card at Office Depot to do it
all on pencil and paper. :P

I think you trust tools more than you're letting on. You probably just cross-
check tools with tools in various ways like I did with high assurance SW to
catch tool-specific issues. That implies a lot of, but not total, trust in the
tools. If I'm wrong, I'll be surprised and probably learn something in the
process.

There's another method HW people might already use that comes from theorem
provers. They know the proving process is complex. It also breaks down into a
series of primitive actions in logic. So, they split the activity between a
complex, untrusted prover and a simple, easy-to-verify checker. I know state-
machine equivalence & even many physical phenomena can be modeled well in
software. I've seen as much with FEC systems. Trick for HW might be turning
all the tool outputs into a series of steps like in an audit log that such
tools can verify. That might take a _hell_ of a long time, though, but it
should also be easy to parallelize onto clusters, GPUs, FPGAs, etc. Do it one
macro-cell at a time, composing the results like in proof abstraction or
abstract interpretation for software.

What do you think?

~~~
gluggymug
"I think you trust tools more than you're letting on. You probably just cross-
check tools with tools in various ways like I did with high assurance SW to
catch tool-specific issues. That implies a lot of, but not total, trust in the
tools. If I'm wrong, I'll be surprised and probably learn something in the
process."

While it is true that we use tools to cross check each other, what I mean is
that we are regularly looking through waveforms manually. At every stage of
the flow, we are checking that our verification infrastructure is actually
doing what it should to find bugs. Because a lot of the time, either we've
stuffed up using the tool or the tool itself is stuffed.

So much tooling is provided for you. Bus functional models, protocol checkers,
etc. You are just cramming it all together and writing your own stuff over the
top. There is always a mistake in there somewhere.

"Trick for HW might be turning all the tool outputs into a series of steps
like in an audit log that such tools can verify."

This is what happens with the UVM. A checker is written with a SystemVerilog
interface by a third party or ourselves. It uses the UVM standard so you can
integrate it with other UVM stuff to make even more abstract checkers. If I am
writing the prover, I know I probably threw a few bugs in there!

If only it were all parallelised because it is slow as hell.
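
The regression level is the one place it parallelises trivially, since
independent test cases can be farmed out. A rough Python sketch, with an
invented "sim" command and test names:

    # Farm out independent simulations; each test is its own process.
    # The "sim" binary and the +UVM_TESTNAME plusarg here are stand-ins
    # for whatever your simulator/farm actually takes.
    import subprocess
    from concurrent.futures import ProcessPoolExecutor

    TESTS = ["smoke_basic", "axi_burst", "ram_backdoor", "irq_storm"]

    def run_test(name):
        proc = subprocess.run(["sim", "+UVM_TESTNAME=" + name],
                              capture_output=True)
        return name, proc.returncode

    if __name__ == "__main__":
        with ProcessPoolExecutor(max_workers=4) as pool:
            for name, rc in pool.map(run_test, TESTS):
                print(name, "PASS" if rc == 0 else "FAIL")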

~~~
nickpsecurity
"what I mean is that we regularly are manually looking through waveforms. At
every stage of the flow, we are checking that our verification infrastructure
is actually doing what it should to find bugs. Because a lot of the time,
either we've stuffed up using the tool or the tool itself is stuffed."

Waveform-based verification is something I know nothing about. I haven't seen
it in any paper I've looked at. Is that what people do in logic analyzers and
such? Do you have a link to a free reference discussing what people do with
that stuff and how it's used to verify digital designs? I really should have
this info in mind and on hand if you all rely on it more than verification
tools.

"This is what happens with the UVM. A checker is written with a SystemVerilog
interface by a third party or ourselves. "

That makes sense.

"If I am writing the prover, I know I probably threw a few bugs in there!"

I like that you're realistic. It's how I used to look at code on a complex
project. Even with Correct-by-Construction, I never got to feel safe with my
code: I could only wonder how obscure, or how unnecessarily simple, a problem
was left in it. I need to make a Philosoraptor meme along the lines of: "Do we
create coding schemes or does the code scheme against us?" Haha.

"If only it were all parallelised because it is slow as hell."

There's the opportunity. Whether it can be acted on, who knows. I do know so
far that hardware is many blocks strung together with all kinds of tests that
_should_ be parallelizable. Now you've reinforced this potential in my mind. I
have a trick for this, but I'm holding off publishing it for now. Let's just
say it's easier to parallelize stuff if one doesn't force their implementation
to be inherently sequential or even tied to CPUs. And there's a little-known,
albeit alpha-quality, way of doing both at once. :)

~~~
gluggymug
"Waveform-based verification is something I know nothing about. I haven't seen
it in any paper I've looked at. Is that what people do in logic analyzers and
such? Do you have a link to a free reference discussing what people do with
that stuff and how it's used to verify digital designs? I really should have
this info in mind and on hand if you all rely on it more than verification
tools."

Yeah, waveforms from a logic analyzer are mimicked by simulator tools.

Not sure about free references. Just googling around, I found this about using
logic analyzers:
[http://www.eetimes.com/document.asp?doc_id=1274572](http://www.eetimes.com/document.asp?doc_id=1274572)

For example, page 3 shows a RAM timing diagram. Like any good spec, the
interface from one module to another is defined via a timing diagram. We build
our UVM checkers and monitors to detect these memory transactions based on the
sequences specified. When a transaction occurs, it triggers a UVM event, which
in turn can be observed by other monitors/checkers, or it can create other
events, or record the event to a log file, etc.

We build our verification infrastructure to automatically check that
transactions behave as specified. However, knowing I can't trust my own work,
I manually check the waveforms to see whether the infrastructure is performing
correctly.
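
Boiled down to a Python sketch, the monitor/checker split looks something like
this (real UVM is SystemVerilog; the handshake and signal names here are
invented):

    # A monitor watches cycle-by-cycle samples, recognizes a transaction
    # from the handshake, and publishes it; checkers subscribe to it.
    class Monitor:
        def __init__(self):
            self.subscribers = []

        def sample(self, cyc):
            # A write happens on any cycle where valid & ready are high.
            if cyc["valid"] and cyc["ready"]:
                txn = {"addr": cyc["addr"], "data": cyc["data"]}
                for cb in self.subscribers:
                    cb(txn)

    class WriteChecker:
        """Compares observed writes against expected transactions."""
        def __init__(self, expected):
            self.expected = list(expected)

        def __call__(self, txn):
            want = self.expected.pop(0)
            assert txn == want, "mismatch: %s != %s" % (txn, want)

    mon = Monitor()
    mon.subscribers.append(WriteChecker([{"addr": 4, "data": 7}]))
    mon.sample({"valid": 0, "ready": 1, "addr": 0, "data": 0})  # idle
    mon.sample({"valid": 1, "ready": 1, "addr": 4, "data": 7})  # write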

"Let's just say it's easier to parallelize stuff if one doesn't force their
implementation to be inherently sequential or even tied to CPU's."

Sounds interesting. I don't know much about how it's all implemented in the
simulator.

------
nickpsecurity
Old news, says Kris Kaspersky:

[http://cs.dartmouth.edu/~sergey/cs258/2010/D2T1%20-%20Kris%2...](http://cs.dartmouth.edu/~sergey/cs258/2010/D2T1%20-%20Kris%20Kaspersky%20-%20Remote%20Code%20Execution%20Through%20Intel%20CPU%20Bugs.pdf)

It will likely continue to get worse in full-custom designs like Intel's,
since complexity keeps going up but ease of modeling and verification doesn't.

On the other end, look up VAMP from Verisoft, SSP from Sandia, or AAMP7G from
Rockwell-Collins if you want to see what high-assurance processors look like.
They ran error-free during testing, IIRC, with a ton of validation. Sandia's
SSP was first-pass. In any case, they're all kind of simple compared to
Intel's stuff. That's on purpose, given there's an upper limit to how much
complexity you can squeeze into a chip without significant errata. One can
expand such methods to larger SoCs, but that's not what big vendors are doing
[out of necessity]. And that has security implications that aren't going away.

------
tgb
I'm curious: what would a fix for these look like? Does it mean a new revision
to be bought, a recall, a software patch?

~~~
quanticle
It depends on the severity of the bug and the level of foresight of the
hardware designer. In the worst case (e.g. the FDIV bug on the Pentium) you
have to recall the CPU. Obviously, this is very bad. That's why modern CPUs
have what are called "chicken bits" or "kill bits", which the BIOS or OS can
set to disable specific features. The most recent use of kill bits I can
recall is Intel disabling the TSX instructions on the Haswell line of CPUs.
Finally, the least invasive option is to issue a microcode update, which
alters the way that x86 instructions get decoded to avoid the problematic
behavior. Microcode updates are issued as software patches, and they're
actually in the Linux kernel source tree.
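
As an aside, on Linux you can see which microcode revision your cores are
currently running via /proc/cpuinfo. A quick sketch (x86-specific; the field
layout may vary):

    # Read the loaded microcode revision(s) from /proc/cpuinfo (x86 Linux).
    def microcode_revisions(path="/proc/cpuinfo"):
        revs = set()
        with open(path) as f:
            for line in f:
                # lines look like "microcode : 0xb4", one per core
                if line.startswith("microcode"):
                    revs.add(line.split(":", 1)[1].strip())
        return revs

    print(microcode_revisions())  # e.g. {'0xb4'}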

If you want to learn more about all three of these fix techniques, in addition
to learning a bit about the steps that Intel and AMD take to prevent such bugs
in the first place, I highly recommend this CCC talk:
[https://media.ccc.de/v/32c3-7171-when_hardware_must_just_wor...](https://media.ccc.de/v/32c3-7171-when_hardware_must_just_work).

~~~
rincebrain
Don't forget the Segfault bug on the Intel Quark (which doesn't have kill bits
- oops) [1], or the stack pointer overflow bug that Matt Dillon found
compiling DragonFly BSD [2], or the Barcelona "you weren't really using your
TLB, were you?" cache coherency bug [3].

Microcode is also in multiple places: there's a flavor baked into your chip,
flavors baked into your BIOS to boot on your CPU if its rev is lower, and
flavors pushed into {Windows,Linux,...} that are also loaded if their revs are
newer.

[1] -
[https://en.wikipedia.org/wiki/Intel_Quark#Segfault_bug](https://en.wikipedia.org/wiki/Intel_Quark#Segfault_bug)

[2] -
[http://wiki.osdev.org/CPU_Bugs#DragonFly_BSD_Heavy_Load_Cras...](http://wiki.osdev.org/CPU_Bugs#DragonFly_BSD_Heavy_Load_Crash)

[3] -
[http://www.anandtech.com/show/2477/2](http://www.anandtech.com/show/2477/2)

------
nisa
There was a talk on 32C3 about CPU bugs and the insane work that goes into
preventing them:
[https://www.youtube.com/watch?v=eDmv0sDB1Ak](https://www.youtube.com/watch?v=eDmv0sDB1Ak)

------
Animats
That doesn't even include any CPU bugs deliberately installed as backdoors.[1]

[1] [http://www.eteknix.com/expert-says-nsa-have-backdoors-
built-...](http://www.eteknix.com/expert-says-nsa-have-backdoors-built-into-
intel-and-amd-processors/)

~~~
nickpsecurity
It's called AMT/vPro. It's in the brochure. People tell me it's even on when
the system is off. All that circuitry is probably in most of the family just
to reduce NRE costs. Couldn't be more ideal.

And to think some people here mock people for worrying about their random
number function while ignoring their official backdoor and its implications.
(sighs)

------
Andys
On the bright side: there are worse things than a processor lockup (which is
easy to spot when it happens). And the other bug was in the newest
architecture (Skylake) and did not affect Xeons.

Arguably the Row Hammer memory exploit was far worse, and it's a sign of how
bad things can get outside the CPU.

