
Finding a bug in Win10 BOOTMGR when chaining to NTLDR - yuhong
https://bugzilla.mozilla.org/show_bug.cgi?id=1225094#c52
======
acqq
The summary, approximately: there are some crashes detected by Firefox and it
seems Firefox (or whatever Firefox uses as its libraries) thinks it can use
AVX instructions even if the OS isn't aware of the AVX technology and
therefore can't preserve the content of the registers during its "normal
work."

And yuhong has found that the reason Firefox thinks that, specifically on
Windows XP booted with the Windows 10 loader, is that the Windows 10 loader
sets a bit meaning "I know about AVX" and doesn't clear it before handing
control to Windows XP.

~~~
yuhong
Set it then forgot to clear it. The "bits" include CR4.OSXSAVE and XCR0 (set
by XSETBV). BOOTMGR is used for booting Vista and later, and it chains to
NTLDR for booting XP/Server 2003 and older.

~~~
0x0
Windows has always had the worst dualboot story. It'll gladly overwrite grub
etc without asking (to "helpfully" fix boot problems?), but if your partitions
are ordered a little out of the ordinary it'll still throw a ridiculously
unhelpful error. They might as well just stop pretending to support it.

~~~
Ezhik
I legit just physically disconnect the drives with other operating systems
while installing Windows in dual boot situations. Can't break what isn't
there!

~~~
ivank
That worked with MBR boot, but with UEFI, Windows will notice you have Boot*
variables in motherboard NVRAM pointing to a Linux install that doesn't exist,
then clear them for you.

------
i336_
A cool way to fix this if Microsoft doesn't want to (although I can't see it
being _that_ hard) would be to make a small stub that simply kills the AVX
flags then chainloads through to NTLDR.

It would probably be quite a simple project to work on, and a fun way to learn
about bootloader-level software development. Chainloading NTLDR is well
understood (and will never change), and being booted by BOOTMGR is fairly
well understood too. If I were running Windows at the moment I'd be seriously
considering playing with this myself.
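
As a rough illustration of what such a stub would have to do to the registers, here is a small Python model of the bit manipulation only. The bit positions (CR4 bit 18 for OSXSAVE, XCR0 bits 0 to 2 for x87/SSE/AVX state) come from the Intel and AMD manuals. The function name `scrub_avx_state` and the choice to reset XCR0 to x87-only are assumptions for illustration, not something taken from BOOTMGR or NTLDR. A real stub would do this in assembly with XSETBV and a MOV to CR4 before jumping to NTLDR.

```python
# Bit positions as documented in the Intel/AMD architecture manuals.
CR4_OSXSAVE = 1 << 18  # CR4.OSXSAVE: OS has enabled XSAVE/XGETBV/XSETBV
XCR0_X87 = 1 << 0      # x87 state (this bit must always be set in XCR0)
XCR0_SSE = 1 << 1      # XMM register state
XCR0_AVX = 1 << 2      # upper halves of the YMM registers


def scrub_avx_state(cr4, xcr0):
    """Model the register cleanup a hypothetical pre-NTLDR stub would perform.

    Takes the CR4 and XCR0 values left by the previous loader and returns
    the (cr4, xcr0) pair the stub would hand to NTLDR.
    """
    if cr4 & CR4_OSXSAVE:
        # XSETBV is only usable while CR4.OSXSAVE is set, so XCR0 has to be
        # reset first. x87-only is the value XCR0 has after a CPU reset.
        xcr0 = XCR0_X87
        # Then clear OSXSAVE so CPUID stops advertising OS support for XSAVE.
        cr4 &= ~CR4_OSXSAVE
    return cr4, xcr0
```

If OSXSAVE is already clear the function leaves both values alone, since there is nothing to undo in that case.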

~~~
Jaruzel
Would this work with Secure Boot enabled? Doesn't everything have to be
signed by Microsoft?

~~~
i336_
Ooooh. Good point - and I don't actually know. [EDIT: See comment below, XP
doesn't do Secure Boot, the following is moot for this context.]

So... BOOTMGR (Win10) is chaining through to NTLDR to load WinXP. And either
Win10 comes with a copy of NTLDR, or pokes around to find the one on the XP
system.

By my reasoning, Secure Boot should say "okay" and happily start the machine
once it decides BOOTMGR is okay, on the basis that BOOTMGR will verify
whatever it loads. The question is whether BOOTMGR actually does that. If it
didn't, there wouldn't really be a boot trust chain, so it probably does
verify what it loads.

I fear this is something only Microsoft would be able to fix properly for
users who want/need Secure Boot. Slightly ironic. But thinking about it,
Secure Boot on XP is kind of like deadbolting your front door when your walls
have completely disappeared (picture a door sitting in the middle of nowhere),
because XP is officially EOL now.

~~~
ComputerGuru
Bootmgr won't chainload other bootloaders in UEFI mode, secure boot enabled or
not, unfortunately.

Not even other signed and verified uefi bootloaders.

~~~
i336_
Ah, that answers that question then.

------
satysin
God I bet _that_ was fun to debug!

~~~
yuhong
I even used a checked build of NTLDR with the boot debugger to confirm.

~~~
azinman2
What does "checked build" mean here?

~~~
krallja
https://msdn.microsoft.com/en-us/windows/hardware/drivers/devtest/checked-and-free-build-differences

A checked build basically has optimization off and assert() on.

~~~
RDeckard
Optimizations are still on in checked builds.

~~~
krallja
From the MSDN article I linked:

> Many compiler optimizations (such as stack frame elimination) are disabled
> in the checked build. This makes it easier to understand disassembled
> machine instructions, and therefore it is easier to trace the cause of
> problems in system software.

------
chris_wot
Umm... what exactly is happening?

~~~
geofft
If I'm understanding this right:

- Firefox is seeing a crash associated with use of the AVX vector-processing
(~= high-performance math) instructions

- The AVX instructions use more registers, and registers need to be
saved/restored during a context switch, so you can switch back to a program
and have it be transparent. So you can only use AVX if the OS supports it, and
promises to save/restore those registers in addition to regular registers when
it does a context switch. The OS reports to the CPU "Yes, it's okay to let
people use AVX" by setting a bit in a control register. Applications check
that bit before using AVX instructions.

- The crash is an illegal-operation exception, which should be impossible
because Firefox checks to see if that bit is set before using those
instructions.

The answer to the mystery: Some people are using an old version of Windows,
that does not support AVX, but with a bootloader from a new version of
Windows. For whatever reason, the bootloader sets the "Yeah, AVX is fine" bit,
and expects the new version of Windows to detect AVX and set support as
appropriate. Old versions of Windows don't know about that bit, though, and
never clear it. So Firefox proceeds to use AVX on a CPU that has no AVX
support.

This was discovered by someone mentioning that they were dual-booting Windows
versions, and that the crash went away when restoring the older bootloader.

~~~
aaronmdjones
"So Firefox proceeds to use AVX on a CPU that has no AVX support". To make it
clear; it uses AVX on a CPU that does support it (otherwise you'd run into an
illegal instruction error), but the OS doesn't. Firefox doesn't know this,
however, because it thinks the OS set the bit that says it does.
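
The distinction can be made concrete with a small Python model of the usual detection sequence documented in the Intel manuals: CPUID leaf 1 reports both whether the CPU has AVX (ECX bit 28) and whether the OS has set CR4.OSXSAVE (ECX bit 27), and XGETBV then reports which register state the OS has enabled in XCR0. The function name `avx_usable` and the raw integer inputs are hypothetical, and whether Firefox or its libraries run exactly this sequence is an assumption, not something stated in the thread.

```python
# CPUID leaf 1, ECX feature bits (Intel/AMD manuals).
CPUID_ECX_OSXSAVE = 1 << 27  # mirrors CR4.OSXSAVE: OS has enabled XSAVE
CPUID_ECX_AVX = 1 << 28      # the CPU itself implements AVX

# XCR0 state-component bits, as read by XGETBV with ECX = 0.
XCR0_SSE = 1 << 1            # XMM state is saved/restored
XCR0_AVX = 1 << 2            # upper YMM halves are saved/restored


def avx_usable(cpuid1_ecx, xcr0):
    """Return True if an application would conclude that AVX is safe to use.

    cpuid1_ecx: ECX value returned by CPUID leaf 1.
    xcr0: value of XCR0 (only meaningful when OSXSAVE is set).
    """
    if not cpuid1_ecx & CPUID_ECX_AVX:
        return False  # hardware has no AVX at all
    if not cpuid1_ecx & CPUID_ECX_OSXSAVE:
        return False  # XGETBV would fault, so OS support cannot be queried
    needed = XCR0_SSE | XCR0_AVX
    return (xcr0 & needed) == needed  # OS must have enabled XMM and YMM state
```

If both CR4.OSXSAVE and the AVX bits in XCR0 are left set, as yuhong describes, every step of this check passes on an AVX-capable CPU even though XP itself never enabled anything.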

~~~
geofft
Oh, does VZEROUPPER generate an illegal instruction if the OS has set the
control register bit but has not called XSETBV? OK, that makes more sense.

------
ikeboy
So ... is there a way to report it to Microsoft and will it get fixed?

~~~
overgryphon
XP support ended years ago.

~~~
ikeboy
What is the Win 10 bootmgr code for XP loading counted as?

~~~
asveikau
Lots of bootloaders (I would say most I have used) allow chainloading another
bootloader. That is what is going on here. Pointing bootmgr at another
partition and saying "boot that".

