Multiple OS Vendors Release Security Patches After Misinterpreting Intel Docs (bleepingcomputer.com)
300 points by ingve 70 days ago | 91 comments

At first I thought this was an error in implementing new docs related to Meltdown/Spectre. But in the original research paper, the researcher wrote:

> Somewhere around the release of the 8086, Intel decided to add a special caveat to instructions loading the SS register...where loading SS with [`pop ss` or `mov ss`] would force the processor to disable external interrupts, NMIs, and pending debug exceptions.

So it's a really, really old piece of documentation, dating from around 1980.

To call it a 'misinterpretation' rather than a vulnerability is extremely generous, given that most Intel engineers spent entire careers in the presence of code vulnerable to this 'misinterpretation' without calling the OS vendors out on their error.

On this particular thing I think the Intel docs are clear. This is where I implemented the same in JPC:


The interrupt shadow itself is clear.

The specific implication of that for a mov ss ; syscall pair with a hardware breakpoint set on the first instruction is a lot more subtle.

Agreed, but that's just saying that combining two or more simple things results in a more complex thing. All platforms I know of describe each individual instruction and its consequences, and leave you to deduce the consequences of combining them.

They might be clear today. Were they always? There are literally decades of timeline to examine here.

They were clear in 2006 when I was reading them. All the emulator implementors were very aware of it. Many old operating systems won't boot without the delay.

I'm trying to understand this one, even if most of my ASM knowledge is from the 8086 era. My guess:

* When an interrupt, debug exception, ... occurs, the CPU pushes stuff on the stack as part of the task switch.

* The stack is managed by 2 registers: SS and (e/r)SP. To change your stack, you have to change both registers. If an interrupt happens and you've changed only 1, stuff gets pushed on an invalid stack and you're toast.

* To fix this, the CPU has a wild card: When you change SS, you get exactly 1 instruction that will not be interrupted. The idea is you use that instruction to change (e/r)SP and make the stack valid again. If there is a need for an interrupt, it will be delayed for 1 instruction.

* Now, this being a security problem: what would happen if you used this second instruction to switch to kernel mode? It turns out the delayed interrupt happens before the first kernel mode instruction, but in the kernel.

* And you can trigger the right kind of interrupt with debug exceptions and single stepping.

* And if you do this, the kernel tells the debugger not about the debugged program but about the kernel. Oops.

So to fix this, I suppose the kernel checks the debug exception info from the CPU, and if it is debugging the kernel it fixes things up so you go back 1 instruction.

> To fix this, the CPU has a wild card: When you change SS, you get exactly 1 instruction that will not be interrupted. The idea is you use that instruction to change (e/r)SP and make the stack valid again. If there is a need for an interrupt, it will be delayed for 1 instruction.

I wonder, why couldn't they make a single instruction to change both SS and SP?

There is one. I suppose LSS SP,value is possible on i386+, and there is also the task state segment, which might help. But this was a hack in the original 8086, which wasn't the best processor design ever to start with. It stays there because of backward compatibility.

They could, but attackers would still use the approach that works for them.

And they can’t really remove the old instructions because of backwards compatibility.

Whitelisting a limited set of instructions that can follow setting SS and making all others trap might be an option, though. It still would break backwards compatibility, but if the effective impact would be negligible, they could deem it acceptable.

Follow-up: not only could they, they did. https://software.intel.com/sites/default/files/managed/7c/f1..., page 4-385:

“Loading the SS register with a POP instruction suppresses or inhibits some debug exceptions and inhibits interrupts on the following instruction boundary. (The inhibition ends after delivery of an exception or the execution of the next instruction.) This behavior allows a stack pointer to be loaded into the ESP register with the next instruction (POP ESP) before an event can be delivered. See Section 6.8.3, “Masking Exceptions and Interrupts When Switching Stacks,” in Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A. Intel recommends that software use the LSS instruction to load the SS register and ESP together.”

They could only do the interrupt disabling behaviour if the next instruction is in the whitelist, but not otherwise.

EDIT: Although I guess it's almost certain to be an error anyway if it's not a whitelisted instruction.

I bet whitelisting only instructions that can change SP/ESP would not break the majority of x86 code. All other instructions would behave as if MOV SS had not been executed.

They use the same scheme in other places as well.

An sti followed directly by hlt will hold off on enabling the interrupts until right after the hlt has started, so that you don't end up in a race condition when you get an interrupt between sti and hlt.

Albeit, they could have made an stiandhlt instruction here as well.

That one will take trace exceptions though.

Sure, that makes sense.

A trace exception shouldn't be creating work for a thread context that's ostensibly the one going to sleep with nothing to do, but an external interrupt very likely is.

In protected mode, SS is a descriptor offset into the GDT. It doesn’t make sense to change SS very often unless there are multiple stacks. Furthermore, it doesn’t make sense to overcomplicate an already over-complicated instruction set burdened with vestigial, legacy features. And loading SS and SP atomically would probably create a byte pattern that’s too long to encode.

You could, but it wouldn't allow them to relax the constraints here, since you'd have to recompile all the code in the world to use this new instruction before relaxing.

They probably could, but it would have had a large silicon cost for 1980 without performance benefit. That and ISA incompatibility.

The cost would not have been very big. The 8086 had microcode, and there were the very similar LDS and LES instructions. There were plenty of unused slots in the opcode table, so that wasn't the reason either. There are weird instructions aplenty like XLAT or AAA/AAS/AAD/AAM, so it wasn't as if they had a pressing silicon shortage.

Maybe they didn't think about it, and fixed it with a quick hack once they became aware of the problem? The whole segment register story was very hacky from the start.

Intel might have dumped the segment registers in i386 32 bit protected mode, as they cleaned up a lot of other troublesome corners around that time. But, well, they didn't, so we have to deal with it today.

The fun thing is the 286 didn't fix this either. This is why the MOV SS came first (in protected mode the selector may not be present for example). This was a problem for DOS where SS was often used to access the DOS data segment.

They also had a more recent chance to clean this up with the switch to 64 bit.

Adding an interrupt delay is also not free and requires changes at the hardware level. But if it was done in the 80s, then of course it was a safe hack, because there was no kernel and user mode isolation at that time.

As far as I understand, what's happening is:

* There's an old feature which causes POP SS/MOV SS instructions to delay all interrupts until the next instruction has executed, to safely allow changing both SS and SP without an interrupt firing in between on a bad stack.

* If such an instruction itself causes an interrupt (by triggering a memory breakpoint through the debug registers), it is delayed (as intended).

* The delayed interrupt will fire after the second instruction even if the second instruction disabled interrupts.

* By means of the above, a MOV SS instruction triggering a #DB followed by an INT n instruction will cause the #DB exception to fire before the first instruction of the interrupt handler, even though this should be impossible (as entering the handlers sets IF=0, disabling interrupts).

* The OS #DB handler assumes GS has been fixed up by the previous interrupt handler, which is now under user control.
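
The sequence in the list above can be sketched as a toy model. This is a deliberately simplified Python simulation, not a description of real silicon: the `ToyCPU` class, its fields, and the instruction strings are all made up for illustration, and it models only the one-instruction delay and the privilege level at the moment the #DB is delivered.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCPU:
    """Hypothetical model: tracks only privilege level and a pending #DB."""
    cpl: int = 3                  # 3 = user mode, 0 = kernel mode
    pending_db: bool = False      # #DB raised but not yet delivered
    delivered: list = field(default_factory=list)

    def step(self, insn: str, hits_breakpoint: bool = False) -> None:
        # A data breakpoint hit by this instruction raises a #DB.
        if hits_breakpoint:
            self.pending_db = True

        # MOV SS / POP SS inhibit delivery on the boundary right after them.
        inhibit = insn in ("mov ss", "pop ss")

        # INT n (or SYSCALL) transfers control to the kernel entry point.
        if insn in ("int n", "syscall"):
            self.cpl = 0

        # At the instruction boundary, deliver the #DB unless inhibited.
        if self.pending_db and not inhibit:
            self.delivered.append(("#DB", self.cpl))
            self.pending_db = False


def run(program):
    """Execute (instruction, hits_breakpoint) pairs on a fresh toy CPU."""
    cpu = ToyCPU()
    for insn, hits_breakpoint in program:
        cpu.step(insn, hits_breakpoint)
    return cpu.delivered


# Intended use: the delayed #DB arrives after the stack pointer is loaded.
benign = run([("mov ss", True), ("mov sp", False)])

# Abuse: the delayed #DB arrives only after the switch to kernel mode.
abused = run([("mov ss", True), ("int n", False)])

print(benign)  # [('#DB', 3)]
print(abused)  # [('#DB', 0)]
```

In the second run the exception that user code armed is reported with `cpl == 0`, which is the situation the OS handlers did not expect.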

The x86 ISA and its implementations are now in the spotlight of the whole security research community. There is probably a lot more to come since it accumulated a lot of cruft in the name of backwards compatibility.

I hope we learn a lot, and take the time to record the experience, for coming platforms like RISC-V and others.

Why is there no big CAVEATS document from Intel detailing weird quirks? I strongly assume the Intel arch engineers are well aware of many of those counter-intuitive behaviours in their products.

> I strongly assume the intel arch engineers are well aware of many of those counter-intuitive behaviours in their products.

But it is likely just kind of distributed, organic knowledge that is hard to condense into a single document. Writing and maintaining such a thing would be a significant project, and (I am speculating here) not the kind of thing that significantly burnishes anyone's performance review.

That said, the whole community of assembly-hackers has even broader knowledge of the topic, and could start such a document out in the open. And Intel engineers might likely contribute their own two cents. (Unless lawyers forbid it).

I stumbled on a blogpost by bunnie huang which describes a liability angle, which I found to be plausible: https://www.bunniestudios.com/blog/?p=5127 - worth a read.

Wow, the article shows that many vendors mis-read the docs: Apple, Microsoft, FreeBSD, Red Hat, Ubuntu, SUSE Linux, and other Linux distros...as well as VMware and Xen.

This is going to be a busy day!

At that point can it really be attributed to "mis-reading" the docs? If every single independent implementor understood it the same way, the docs were wrong.

To be fair, Intel docs are so consistently gibberish that it might as well be classified a separate language (similar to English, but only a quarter the information density).

In this case it seems they just didn't properly specify a piece of insane behaviour though. Hell, I'd consider it an outright CPU bug if I'm reading this right. Seemingly there's a "feature" where loading SS causes interrupts to be delayed until after the next instruction, even if the next instruction disables interrupts - so you can cause an interrupt to fire on the first instruction of the handler (where it should be impossible).

Two others that look like bugs:

1. The CPU does a buffer overflow when reading the array of bits used to determine IO permission for instructions like "in" and "out". Every OS which supports the feature has to add an extra byte of 0xff beyond the end of the array.

2. Returning from a 32-bit OS to a 16-bit process will only update the low 16 bits of the stack pointer. The upper 16 bits can still be read, leaking info about the kernel stack. Linux has a complicated work-around called espfix.
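
Both quirks can be illustrated with a small model. This is a rough Python sketch under a simplified view of the hardware: the function names `io_allowed` and `iret_to_16bit` and the sample values are hypothetical, chosen only for illustration. In the model an out-of-range bitmap read raises an exception; what real hardware does depends on what lies past the array.

```python
PORTS = 65536
BITMAP_BYTES = PORTS // 8  # one bit per I/O port


def io_allowed(bitmap: bytes, port: int) -> bool:
    """Model of the permission check: two bytes are fetched per lookup.

    A clear bit means the port is allowed. Fetching two bytes covers
    accesses that straddle a byte boundary, but the lookup for the last
    ports then reads one byte past the logical end of the array.
    """
    index = port // 8
    two_bytes = bitmap[index:index + 2]
    if len(two_bytes) < 2:
        raise IndexError("bitmap read ran past the end of the array")
    word = int.from_bytes(two_bytes, "little")
    return (word >> (port % 8)) & 1 == 0


def iret_to_16bit(kernel_esp: int, user_sp: int) -> int:
    """Only the low 16 bits of ESP are replaced; the high half is left over."""
    return (kernel_esp & 0xFFFF0000) | (user_sp & 0xFFFF)


unpadded = bytes(BITMAP_BYTES)          # all ports allowed, no extra byte
padded = unpadded + b"\xff"             # the extra 0xff byte the OS appends

try:
    io_allowed(unpadded, PORTS - 1)
    overran = False
except IndexError:
    overran = True

print(overran)                           # True
print(io_allowed(padded, PORTS - 1))     # True

# Sample (made-up) kernel stack pointer: its upper half survives the return.
leaked = iret_to_16bit(0xC12A3F80, 0x1000)
print(hex(leaked))                       # 0xc12a1000
```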

> Intel docs are so consistently gibberish that it might as well be classified a separate language

Mayhaps much like legalese.

Speaking as a technical writer, if that many vendors misinterpreted the documentation then it was the fault of the documentation.

It could also be the innate complexity of the thing being documented. Looking at https://software.intel.com/en-us/articles/intel-sdm, the combined PDF has 4,844 pages.

I have no doubt that the complexity adds to the difficulty of documenting it. But, I still think the documentation is failing when people across the industry who should be able to parse this complexity are unable to. Complexity just isn't an excuse for broadly misunderstood documentation in my opinion.

Maybe the docs were written badly? If everyone makes the same mistake, then there might be something wrong in the source.

> Apple, Microsoft, FreeBSD, Red Hat, Ubuntu, SUSE Linux, and other Linux distros.

Just to clarify, this is kernel code. Listing 3 different (+ "other") Linux distros as affected is kinda bogus, it's not that they all made the same mistake, they just all use the same kernel.

Also, Apple and Microsoft are not an “OS”. This is a poorly written article unfortunately.

Not necessarily, maybe they all contributed their own (incorrect) fixes?

Many of them use different versions of the same underlying Linux kernel, sometimes put together in different ways.

It seems improbable that there are multiple ways of putting together the kernel's handling of mov ss.

As an example, Red Hat doesn’t ship major kernel upgrades except with major releases. If you’re running RHEL 6, you’re still on a 2.6 kernel and the fact that someone patched 4.x probably doesn’t help you all that much unless you have the time to backport the change and confirm that it doesn’t break something else.

I was really sad to see Debian omitted from that list, they patched it too but were described as "other Linux distros" :(

And in Germany it's a national holiday. Really great.

Not just Germany, a very large chunk of Europe has a holiday today. Belgium, France, Netherlands, Portugal, the Nordics and Switzerland I'm sure about, there may be others.

> Both Peterson and the CERT/CC team blamed the "unclear and perhaps even incomplete documentation"

Yet the article's title makes it seem like it was the OS implementors' fault instead of Intel's.

I wonder for what reasons the website tries to shift/soften Intel's fault on this?

Seems like there's a trend in news articles to have incoherent titles in relation to content lately, it's really annoying...

I wonder what happens if you execute multiple POP SS instructions. In fact, you could set up a 64K v86 mode segment containing only copies of the POP SS instruction. jmp far into it. When IP reaches the last instruction it wraps around and starts again. Will it ever be interrupted by anything? If the stack usage bothers it, just do MOV SS,AX

Intel thought of that, but doesn’t seem to give us an answer. https://software.intel.com/sites/default/files/managed/7c/f1..., page 6-8:

”If a sequence of consecutive instructions each loads the SS register (using MOV or POP), only the first is guaranteed to inhibit or suppress events in this way.”

So, we still don’t know. For all I know, it may even depend on the exact cpu used or cpu state.

Tested it: you get an interrupt after only skipping one instruction

Thanks, both dooglius and Someone. One learns something every day.

> Fixing the bug and having synchronized patches out by yesterday was an industry-wide effort, one that deserves praises, compared to the jumbled Meltdown and Spectre patching process.

Is this a fair comparison? I feel like the patching techniques must have been easier to develop than for Meltdown/Spectre. Furthermore, if this affected the same kind of people in this community, maybe this time around benefitted from the communication channels of the previous exercises.

Maybe this isn't a comparison to try and badmouth the previous iteration, and instead just tried to show a general improvement in the industry—I just find it a bit unfair.

This is one of the first times that I know of that the Linux kernel and Windows kernel developers discussed a security issue together directly. So while the fix was much simpler than Meltdown/Spectre was (Linux was fixed with a patch that was written in 2015) overall, the communication between different OS kernel developers right now is very good.

And yes, it is all due to the horrible Meltdown/Spectre problem and how that was handled. We were not allowed to work together for that problem, and we do not want that to happen again.

> We were not allowed to work together for that problem

How do you mean?

It was covered by NDA.

Interesting that some of the BSDs (Dragonfly, FreeBSD) are listed as Affected and others (NetBSD and OpenBSD) are listed as Not Affected.

There isn't one BSD kernel like there is one Linux kernel, there are several different ones which might borrow from each other, but are developed independently. A comparison can be found on Wikipedia: https://en.wikipedia.org/wiki/Comparison_of_operating_system...

Dragonfly descends from FreeBSD so it makes sense that it's affected.

The other BSDs have different kernels and vastly different development histories as well.

Some BSDs never allowed debug register writes in the first place, so they were immune.

It's been so long I had to look it up: SS is the stack segment.

And furthermore, the current stack address is determined by the combination of two registers: ss for the segment and rsp/esp/sp for the stack pointer within the segment. I guess the strange behavior around modifying ss comes from the fact that you need to also modify sp immediately afterwards, because otherwise you are running with a wild stack address pointing to random memory. You also can't modify sp before modifying ss because then you are also running with a wild stack, and an interrupt could come in at any time and push things onto random memory.
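
To make the "wild stack" concrete, here is a small sketch using 8086 real-mode addressing, where the linear address is the segment shifted left by four bits plus the offset. The helper name `stack_address` and the sample register values are made up for illustration; protected mode and long mode compute addresses differently.

```python
def stack_address(ss: int, sp: int) -> int:
    """Real-mode linear address of the stack top: (SS << 4) + SP, 20-bit wrap."""
    return ((ss << 4) + sp) & 0xFFFFF


# Hypothetical old and new stacks.
old_ss, old_sp = 0x2000, 0xFFFE
new_ss, new_sp = 0x7000, 0x0100

before = stack_address(old_ss, old_sp)    # valid old stack
halfway = stack_address(new_ss, old_sp)   # SS changed, SP not yet: wild address
after = stack_address(new_ss, new_sp)     # valid new stack

print(hex(before))   # 0x2fffe
print(hex(halfway))  # 0x7fffe
print(hex(after))    # 0x70100
```

An interrupt taken in the `halfway` state would push its return frame at an address that belongs to neither stack, which is why the instruction after the SS load is shielded.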

I felt a great disturbance in the Force, as if thousands of voices suddenly cried out, "Oops..."

The vulnerability notes[1] say Apple patched this on May 8, but my last security update was May 3 and I don't currently show any available updates... I wonder if the May 3 patch fixed this, or if my computer might not be affected.


The May 3 patch fixed it. The nature of the fix was such that Linux and Mac OS were able to patch it early without revealing much.

In fact, the Linux fix was a patch I wrote in 2015 (for unrelated reasons) and just never got around to upstreaming.

I'm super happy to see that the BSDs were contacted (with the exception of HardenedBSD).

HardenedBSD would just receive it downstream from OpenBSD, wouldn't it? It's like contacting the Linux Mint group while contacting Ubuntu is sufficient.

HardenedBSD is a separate OS forked from FreeBSD with its own kernel development. While the BSDs may share some code, they're essentially all different OSes. There is no upstream like Linux has. As a side note, after Spectre/Meltdown, Shawn Webb complained in a NYCBUG thread about not being able to get access to these embargoed vulnerabilities.

HardenedBSD is based on FreeBSD, not OpenBSD.

Definitely, it appears this is thanks to the work/awareness of Nate Warfield at MSRC


Is there a PoC exploit source?

A bit tangentially related, but I've always wondered why the syscall instruction doesn't use the TSS for stack switching like int does. I guess it does give you more flexibility to load rsp from gs during a cpl 3 -> cpl 0 transition rather than consulting the TSS to switch it automatically. Can anyone weigh in on this?

Can someone explain the risk factor? It cannot be exploited remotely or through a browser, if I'm reading it right. But a malicious program with user-level access can get kernel access. So malware running on a limited account can get higher access?

Archived version: http://archive.is/DxUwA

Is this also affecting code in 64 bit mode?

As the code in the paper is written for amd64 I would assume that it is affecting it.

On the other hand, it is somewhat surprising, because loading anything into SS is a mostly pointless operation.

This doesn't even make sense. Why would they keep a behavior that could be considered a bug when creating a new instruction set?

Was illumos not affected?

I'm sure the initial reaction here is going to be lamentation about the state of documentation. People will correctly point out that, if multiple entities misread the documentation, it must have been unclear. And they are right. But that doesn't make this Intel's fault alone. Clear or unclear, the documentation described behavior that was understood at the Intel organization, and the shipped product worked as described.

Where was the security testing at the OS level? Why can't there be automated test suites that catch unauthorized access issues before ship (if not before merge commit)? If your vendor delivers an insecure product and you don't discover it, how much blame do you share?

> Why can't there be automated test suites that catch unauthorized access issues before ship (if not before merge commit)?

Usually the search space is too large.

Isn't that what fuzzing is for?

Concolic testing would probably catch it, but only if the person that implemented the hardware model for the theorem prover understood the Intel documentation, which seems unlikely.

Basic fuzzing probably wouldn’t catch this; as the other comments point out, the search space is probably too large, and the set of vulnerable executions is probably too small for an undirected random search.

I’d be truly amazed if a fuzzer could have caught this one. You need to invoke debug syscalls with the right parameters and then do a magic two-to-three instruction sequence.

It's what formal verification is for!

If you can't trust the CPU documentation, how can you test that addition works? Even if you could test all possible combinations of terms to add and verify the results, there may be a hidden flag somewhere that, when flipped, will change how addition works.

On that subject, I'm curious whether there is any CPU out there that sets the overflow flag incorrectly when computing (-1) - n when n is the most negative number (which negates to itself, so implementing subtraction by simply negating the RHS and adding will screw up the flags).

The ARM ARM documents sub(left, right, no carry) as add(left, ~right, carry set), which is also the most straightforward implementation if you have to account for carry anyway.
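
The difference between the two approaches can be checked exhaustively on a small word size. This is a minimal sketch on 8-bit values, not any particular CPU's implementation: `sub_naive` negates the right-hand side and adds, while `sub_flags` follows the add(left, ~right, carry set) formulation mentioned above. All function names are hypothetical.

```python
BITS = 8
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)


def to_signed(x: int) -> int:
    """Interpret an unsigned BITS-wide value as two's complement."""
    return x - (1 << BITS) if x & SIGN else x


def add_with_carry(a: int, b: int, carry_in: int):
    """Return (result, overflow) for a + b + carry_in on BITS-wide values."""
    a &= MASK
    b &= MASK
    result = (a + b + carry_in) & MASK
    # Signed overflow: operands share a sign and the result's sign differs.
    overflow = bool(~(a ^ b) & (a ^ result) & SIGN)
    return result, overflow


def sub_naive(a: int, b: int):
    """Negate the right-hand side first, then add."""
    return add_with_carry(a, (-b) & MASK, 0)


def sub_flags(a: int, b: int):
    """Subtract as a + ~b + 1 in a single add-with-carry."""
    return add_with_carry(a, ~b & MASK, 1)


def wrong_flags(sub) -> int:
    """Count input pairs where the overflow flag disagrees with exact math."""
    count = 0
    for a in range(1 << BITS):
        for b in range(1 << BITS):
            exact = to_signed(a) - to_signed(b)
            expected = not (-SIGN <= exact < SIGN)
            if sub(a, b)[1] != expected:
                count += 1
    return count


# (-1) - (most negative number): the exact answer 127 fits, so no overflow.
print(sub_naive(0xFF, 0x80))   # (127, True)  -> flag is wrong
print(sub_flags(0xFF, 0x80))   # (127, False) -> flag is right
print(wrong_flags(sub_naive), wrong_flags(sub_flags))  # 256 0
```

In this model every wrong flag from the naive version comes from the one right-hand value that negates to itself.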

Your idea is akin to searching the space of unknown unknowns. By definition, someone cannot even begin to quantify the space of what you don't know you don't know.

This mentality is why I like OpenBSD.

If they can’t sign off on the security/correctness of a pile of code, they delete it, even if it means losing functionality.

In this case, they simply didn’t invoke the incomprehensible instruction of doom.

It’s as simple as determining if a given program will halt.

You forgot to include the sarcasm flag. I suppose "tag" these days, right?
