
Linux page table isolation is not needed on AMD processors - fanf2
https://lkml.org/lkml/2017/12/27/2
======
caio1982
Did he know it would blow up in a few weeks?
[https://www.fool.com/investing/2017/12/19/intels-ceo-just-
so...](https://www.fool.com/investing/2017/12/19/intels-ceo-just-sold-a-lot-
of-stock.aspx)

~~~
calt
What's the connection?

~~~
amckenna
Here is some more context -
[http://pythonsweetness.tumblr.com/post/169166980422/the-
myst...](http://pythonsweetness.tumblr.com/post/169166980422/the-mysterious-
case-of-the-linux-page-table)

The connection between that article and this post is that a potentially huge bug will be made public soon, and it affects only Intel processors, not AMD; hence the large sale of stock by the Intel CEO.

~~~
yeukhon
Not sure it makes sense to compare this to how the market reacted when Intel's floating point bug was uncovered two decades ago. My bet is this current bug won't shake Intel's stock price much.

~~~
Certhas
If the workaround being deployed now causes a 30% performance hit in real
world usage, even just for some cases, it could hit Intel way harder than
fdiv.

A lot of people on Intel will suddenly lose a noticeable amount of
performance. And if your Intel-based VMs lose 20% of their performance, you
are now booting up and paying for 25% more VMs to handle the same load.

------
twotwotwo
Was the connection with speculative execution already being discussed openly?
I know about [https://cyber.wtf/2017/07/28/negative-result-reading-
kernel-...](https://cyber.wtf/2017/07/28/negative-result-reading-kernel-
memory-from-user-mode/), but not about anything between that and 28 Dec
suggesting someone made it work and that's the reason for KPTI.

If it wasn't in the open, that seems...not ideal embargo-wise for AMD to leak
it there. Though no one in that thread is complaining about the disclosure, so
maybe they either think that part is already known to anyone looking closely,
or just don't think it's a very big piece of the exploit puzzle (i.e. finding
a way to get info out of the side channel was the hard part).

~~~
daenney
It wasn't publicly acknowledged but people figured it out already. Take a look
at
[https://news.ycombinator.com/item?id=16046636](https://news.ycombinator.com/item?id=16046636)
(both the article and the comments) for example. This wasn't going to stay
secret much longer.

~~~
twotwotwo
That post is a couple of days after the 28 Dec AMD commit, though. Curious
whether it was _already_ being discussed, since that would mean what AMD said
can't be how people figured it out.

my123 does point out that the author of the speculative execution blog post is
first in the KAISER paper's acknowledgements, and it looks like the paper was
presented at a July conference, so that's an earlier clue out in public, for
what it's worth.

------
electic
This is going to have a dramatic effect on the cloud computing market. It
might make sense to make sure any VMs you run are on AMD processors; otherwise
this can really hurt your performance and effectively cost you more to do the
same workload.

It also seems, from early benchmarks, this can slaughter performance with
databases.

~~~
yeukhon
Why are people insisting this affects the cloud computing market? I am not
sure this bug is limited to cloud instances.

~~~
zlynx
The bug affects transitions into kernel mode, and virtual machines have one
extra transition: a read() call in the guest calls the guest OS, which calls
the host OS.
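
A rough way to see that per-transition cost is to time raw syscalls (a sketch, assuming a Unix-like system with /dev/zero; absolute numbers vary wildly by CPU and kernel):

```python
import os
import time

def ns_per_syscall(n=100_000):
    """Average cost of one read() syscall, in nanoseconds.

    Each os.read() is one user->kernel transition. With page table
    isolation enabled, every such transition also pays for a page
    table (CR3) switch, so this number goes up; inside a VM, a
    syscall that has to reach the host pays more than once."""
    fd = os.open("/dev/zero", os.O_RDONLY)
    try:
        start = time.perf_counter_ns()
        for _ in range(n):
            os.read(fd, 1)
        return (time.perf_counter_ns() - start) / n
    finally:
        os.close(fd)

if __name__ == "__main__":
    print(f"~{ns_per_syscall():.0f} ns per syscall")
```

Comparing this number on kernels booted with and without the PTI mitigation (the `nopti`/`pti=off` boot options) would show the fixed per-syscall surcharge.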

~~~
yeukhon
You are referring to the slowdown, and hence the extra slowdown from that
extra transition?

If so, isn't it technically correct that the bug hurts everyone regardless of
virtualization, just with a heavier penalty for VMs?

------
anonacct37
This feels like a big FU to Intel. I've heard this patch can slow down
programs like du by 50%. Does that mean AMD is going to find itself running
twice as fast as competitors?

~~~
jandrese
I think the du case was an outlier; normal workloads shouldn't be so heavily
affected. I am expecting a few percent loss on most programs, though. It's
basically a larger penalty for making a syscall, which was already a fairly
slow operation, so performance-minded people avoid them in tight loops. I
suspect it will be bad for people who need to do lots of fast I/O.
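
The "avoid syscalls in tight loops" point is easy to see by counting calls; in this sketch (file name and sizes are arbitrary), reading the same data in large chunks amortizes the per-syscall penalty, which matters more once each syscall carries the KPTI surcharge:

```python
import os
import tempfile

def count_read_calls(path, chunk_size):
    """Read a file to EOF, returning how many read() syscalls it took."""
    calls = 1  # count the final empty read that signals EOF
    fd = os.open(path, os.O_RDONLY)
    try:
        while os.read(fd, chunk_size):
            calls += 1
        return calls
    finally:
        os.close(fd)

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"x" * 1_000_000)
        path = f.name
    try:
        print("1-byte reads: ", count_read_calls(path, 1))          # 1,000,001 syscalls
        print("64 KiB reads:", count_read_calls(path, 64 * 1024))   # 17 syscalls
    finally:
        os.unlink(path)
```

Same megabyte of data, five orders of magnitude fewer kernel transitions; this is why buffered I/O (stdio, BufferedReader, etc.) largely hides the penalty for ordinary programs.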

~~~
contrarian_
Sounds like servers handling lots of small UDP packets would be hit pretty
hard.

~~~
revelation
Applications like this, where syscall overhead starts to be a significant
factor in processing time and latency, have moved to userland drivers anyway:

DPDK for 10-100 Gbps networking: [https://dpdk.org/](https://dpdk.org/)

SPDK for NVMe storage: [http://www.spdk.io/](http://www.spdk.io/)

The queuing and balancing the kernel does makes sense for spinning-rust hard
disks and residential networking, but when the underlying hardware is so fast
that _nothing is ever queued_, really, what are you doing? At 100 Gbps line
speed, a 1518-byte packet takes all of ~120 ns to transmit, or about 360 clock
cycles for a 3 GHz processor.
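
The arithmetic checks out (ignoring the Ethernet preamble and inter-frame gap):

```python
# Time on the wire for one maximum-size Ethernet frame at 100 Gbps.
frame_bytes = 1518
line_rate_bps = 100e9
cpu_hz = 3e9

seconds = frame_bytes * 8 / line_rate_bps
nanoseconds = seconds * 1e9   # ~121 ns per frame
cycles = seconds * cpu_hz     # ~364 cycles at 3 GHz

print(f"{nanoseconds:.1f} ns, {cycles:.0f} cycles")
```

A few hundred cycles is roughly the cost of a single syscall even before KPTI, which is why these stacks bypass the kernel entirely.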

~~~
DSMan195276
> Applications like this where the syscall overhead (and latency) starts to be
> a significant factor in processing time and latency have moved to userland
> drivers anyway:

I would personally think that is worse, though please correct me if I'm wrong.
The userland driver will run with an isolated page table like any other
userland process, won't it? If so, it will suffer the same slowdown every
other process now has every time it has to communicate with the kernel, which
I would think is a lot for a driver.

~~~
revelation
It's counterintuitive at first, but the key to understanding how this works is
that while you can use an MMU to assign chunks of physical memory to a
process, you can of course also use the MMU to assign the memory-mapped IO
registers of, say, a PCI Express peripheral to a process.

That, in a nutshell, is what a "userland driver" is. It's not too far removed
from poking the parallel port at 0x378 on your DOS computer :)
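
A sketch of the idea in Python, with an ordinary file standing in for device registers (a real userland driver would instead mmap a PCI BAR, e.g. the device's `resource0` file under `/sys/bus/pci/devices/` on Linux, which needs appropriate permissions and real hardware):

```python
import mmap
import os
import tempfile

# Stand-in for a device's register window; a real userland driver
# would open the device's PCI resource file instead of a temp file.
fd, path = tempfile.mkstemp()
os.write(fd, b"\x00" * 4096)

regs = mmap.mmap(fd, 4096)  # one syscall to set up the mapping...

# ...after which "register" access is plain memory access: no
# user/kernel transition, so no KPTI penalty on the data path.
regs[0:4] = (0xDEADBEEF).to_bytes(4, "little")   # write a register
value = int.from_bytes(regs[0:4], "little")      # read it back

regs.close()
os.close(fd)
os.unlink(path)
```

The point is that the kernel is only involved once, at setup; the hot path of a DPDK/SPDK-style driver never makes a syscall, so it never pays the new transition cost.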

------
bitwind
All Intel CPUs are affected, the mitigation increases syscall overhead by 50%,
and no AMD CPUs are affected? I would say this could be an indicator to short
INTC and long AMD...

~~~
IgorPartola
Short INTC maybe but I am not sure this means that AMD will increase in value
over the long run as a result of this one incident.

~~~
dboreham
I think it will, because this shows the downside of a monoculture; big
purchasers of CPUs will want to diversify. Also good for ARM vendors, I
suppose. Disclosure: I bought AMD this morning, before headlines saying "buy
AMD, short INTC" appeared.

~~~
IgorPartola
Has there ever been a precedent for this? When there were major bugs in Intel
CPUs (or drives, or RAM, or motherboards), did the likes of Amazon and Google
invest in diversification? And has it affected stock prices meaningfully? My
guess is that they'll see this as just another one-off issue that can be fixed
with software, then move on. For a large enterprise, a monoculture that works
is actually better than diversification.

When you think about your own workstation, it's not a big deal to build an
Intel or AMD system. But when you buy 100k motherboards and spend the time
adjusting your tooling to them, from packaging to power to cooling to support
to OS code, and then decide on a whim to get another 100k motherboards of a
different architecture, you spend a non-trivial amount of time and money
supporting those as well. Again, if AMD provides better hardware, it's
absolutely worth it. But I personally wouldn't do it based on this bug.

I don't own shares of either AMD or Intel.

~~~
revelation
I checked Intel's stock during the FDIV bug (1994/1995), where they had to go
as far as recalling the affected processors at a cost of $500M in January
1995, and there was basically zero effect. By the end of 1995 the stock had
actually pretty much doubled in value.

~~~
sitkack
I personally think FDIV made Intel money, it told the world how important
Intel was. It wasn't just the calculator sitting on some trader's desk. It
_ran_ the stock market and the stock market responded.

------
artellectual
Essentially looks like Intel compromised (whether intentional or not is a
different point) the design to get the speed boost that gave them the lead
over AMD for the past decade. Will be interesting to see how all this plays
out.

~~~
rootlocus
> Essentially looks like Intel compromised (whether intentional or not is a
> different point)

If it wasn't intentional, then it wasn't a compromise. So it's not a different
point.

------
mindcrash
So first they bring a DLC concept ("unlock features by spending money") to
their enthusiast platform, and now this?

Having a hunch Threadripper will sell extremely well amongst PC enthusiasts
this year...

~~~
rrdharan
I'm curious what you are referring to re: the DLC concept? Did you mean this
thing?

[https://en.wikipedia.org/wiki/Intel_Upgrade_Service](https://en.wikipedia.org/wiki/Intel_Upgrade_Service)

Seems like that was discontinued a long time ago (2011), so I was wondering if
something more recent had happened?

~~~
floatboth
Probably the paid hardware RAID unlock key.

~~~
amluto
Has Intel _ever_ had hardware RAID? They have firmware RAID, but that's quite
different.

~~~
keltor
They actually DID have hardware RAID controllers, but not like you're talking
about.

------
api
At the meta level this is just a special case of "complexity is evil" in
security. CPUs have been getting more and more complex, and the relationship
between complexity and bugs (of all types) is exponential: each new CPU
feature increases the likelihood of errata.

A major underlying cause is that we're doing things in hardware that ought to
be done in software. We really need to stop shipping software as native blobs
and start shipping it as pseudocode, allowing the OS to manage native
execution. This would allow the kernel and OS to do _tons and tons_ of stuff
the CPU currently does: process isolation, virtualization, much or perhaps
even all address remapping, handling virtual memory, etc. CPUs could just
present a flat 64-bit address space and run code in it.

These chips would be faster, simpler, cheaper, and more power efficient. It
would also make CPU architectures easier to change. Going from x64 to ARM or
RISC-V would be a matter of porting the kernel and core OS only.

Unfortunately nobody's ever really gone there. The major problem with Java and
.NET is that they try to do way too much at once and solve too many problems
in one layer. They're also too far abstracted from the hardware, imposing an
"impedance mismatch" performance penalty. (Though this penalty is minimal for
most apps.)

What we need is a binary format with a thin (not overly abstracted) pseudocode
that closely models the processor. OSes could lazily compile these binaries
and cache them, eliminating JIT program launch overhead except on first launch
or code change. If the pseudocode contained rich vectorization instructions,
etc., then there would not be much if any performance cost. In fact
performance might be better since the lazy AOT compiler could apply CPU model
specific optimizations and always use the latest CPU features for all
programs.
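
A toy sketch of that "lazy AOT with a cache" idea (everything here is hypothetical: the compiler is a stub and the cache layout is invented), keyed on both the pseudocode and the CPU model so every program could be rebuilt to use new CPU features:

```python
import hashlib
import pathlib
import tempfile

# Invented cache location for this sketch.
CACHE_DIR = pathlib.Path(tempfile.gettempdir()) / "aot-cache"

def compile_native(pseudocode: bytes, cpu_model: str) -> bytes:
    """Stub for the OS's AOT compiler; a real one would emit machine
    code tuned for cpu_model."""
    return b"NATIVE[" + cpu_model.encode() + b"]" + pseudocode

def load_program(pseudocode: bytes, cpu_model: str) -> bytes:
    """Compile on first launch (or after a code/CPU change), then reuse
    the cached native binary, so compile cost is paid only once."""
    key = hashlib.sha256(cpu_model.encode() + b"\0" + pseudocode).hexdigest()
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cached = CACHE_DIR / key
    if cached.exists():
        return cached.read_bytes()  # cache hit: no compilation
    native = compile_native(pseudocode, cpu_model)
    cached.write_bytes(native)
    return native
```

Swapping CPUs (or upgrading to one with new instructions) simply changes the cache key, so everything recompiles lazily for the new target.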

Instead we've bloated the processor to keep supporting 1970s operating systems
and program delivery paradigms.

It's such an obvious thing I'm really surprised nobody's done it. Maybe
there's a perverse hardware platform lock-in incentive at work.

~~~
kps
> It's such an obvious thing I'm really surprised nobody's done it.

IBM AS/400 for about 30 years now.

~~~
gecko
Tao/Intent/Elate (which I think is defunct nowadays) would also qualify, and
I'd argue .NET on Windows with the GAC would, too (although there'll be a
legitimate argument about whether that's "simple and closely models the
processor").

~~~
pm215
Tao is long defunct, yes (it went under a decade ago). It turns out that
people don't really want runtime-portable OSes/apps (IIRC the biggest uptake
it got was as a Java runtime for mobile, because the competition at that time
was all interpreted). There was no security model in VP, though: a single flat
address space, and bytecode could turn any integer into a pointer and
dereference it (loads just got translated into host CPU load instructions), so
there was no isolation between processes, or between processes and the OS.

~~~
puzzle
AS/400 and descendants have a security model, but they rely at least partially
on a trusted runtime code generator (and, transitively, trusted boot). The
systems have HW assist to tag real pointers, but that's mainly for performance
reasons. Pointer validity checks are performed in software (or they were until
ten years ago), automatically inserted by the bytecode translator. If you
subverted the code generator, your malicious code could get a bit further by
forging pointers.

------
zer00eyz
I have to wonder:

Can Intel release a drop-in CPU that will avoid or mitigate this issue?

The infrastructure investment in Intel cores is huge; if a drop-in replacement
lets me minimize downtime, regain performance, and is "cost effective"
compared to a cost-prohibitive platform replacement, does this result in Intel
having a sales INCREASE as it replaces bad silicon?

I don't know enough to speak to this either way, but I would love to hear
whether such a fix is possible/viable.

~~~
nine_k
I wonder if this can be fixed at firmware level. (I frankly have no idea how
deeply configurable Intel cores are.)

~~~
djsumdog
For years, Intel and AMD processors have supported patching via microcode
updates. Until the embargo is lifted and we know the full extent of this
vulnerability, we won't know whether that is possible here.

~~~
daenney
Based on the fact that kernel patches are going in, it's reasonable to assume
this can't be fixed with a microcode update. Otherwise Intel would issue one,
and the Linux kernel wouldn't be accepting this patch set as a mitigation
(which is all this patch set is; it has no benefit to the end user other than
working around this bug).

~~~
zimmerfrei
It depends on how long it takes for Intel to go through all regression tests
for all affected platforms. If it takes several months to complete, a
countermeasure in the kernel update may still be the better stopgap.

Or it could be that Intel has already privately disclosed that no fix will be
backported to the firmware of older CPUs, in which case the kernel update is
the stopgap for newer generations and the solution for older ones.

~~~
dx034
They've known about the issue since June. Now that the patches are out, it'll
be hard to regain the performance. I doubt they'll be able to issue a
microcode update within the next few months; otherwise large clients such as
AWS would have deployed that instead.

------
rbanffy
Wouldn't this kind of issue validate the ideas behind microkernel-based OSes,
where kernel and user spaces are already completely separated?

BTW, removing the kernel from the non-privileged address space seems like such
a great idea (and it's not a new one at all) that the whole thing should
probably have some hardware support to make it fast.

~~~
Unklejoe
> Wouldn't this kind of issue validate the ideas behind microkernel-based
> OSes, where kernel and user spaces are already completely separated?

I don't think so, but it depends what you mean.

Kernel space and user space being separated isn't specific to a microkernel.
The only reason the kernel is mapped into each process is to avoid the TLB
flush during syscalls. The pages themselves aren't actually accessible unless
you're running in kernel mode (well, unless you're using hardware affected by
this bug). So, in a non-broken system, kernel and user spaces are separated
even with a monolithic kernel like Linux.

> BTW, removing the kernel from the non-privileged address space seems like
> such a great idea (and it's not a new one at all) that the whole thing
> should probably have some hardware support to make it fast.

For the most part, I agree. However, it really shouldn't be necessary if the
virtual memory protection did what it was supposed to do. Mapping the kernel
into the process address space and using the page protection flags is an
optimization that is perfectly legal from an architectural standpoint.

If you can't rely on the page protection flags to work, then you really can't
rely on any other hardware feature to work either.

------
airesQ
Given Intel's dominance of the server market does this mean that datacenter
computational capacity will see an overnight ~5% drop?

Is there enough spare capacity to cope with this? Will spot-instance prices go
up? Will I need more instances of a given type to run the same workload?

~~~
kevin_thibedeau
Given what's been disclosed so far it seems an exploit using rowhammer
techniques would be unlikely to work with ECC RAM. Consumer systems will be
screwed unless a tolerable microcode update is released.

~~~
kentonv
I don't think this issue is related to rowhammer. I think people have been
speculating about rowhammer because it's a famous hardware bug, but none of
the details of page table isolation seem to align with a rowhammer-based
attack.

~~~
kevin_thibedeau
This enables the first step in a rowhammer attack: identify the privileged
address you want to target.

~~~
kentonv
Oh, are you thinking the KASLR bypass is actually the main problem, because it
allows targeted rowhammer? I'm not sure if that's really true, since a KASLR
bypass would give you a virtual address, and rowhammer would care more about
physical addresses.

But in any case, the KASLR bypass is not the main vulnerability here. KASLR is
widely seen as too leaky to be really useful. Linux would not rush out a >5%
performance hit just to fix one of the many leaks.

------
userbinator
All that I've read about this so far seems to indicate that it's only a way to
bypass KASLR... which is itself not really a problem, but there must be
something more to it. Given that it doesn't affect AMD, perhaps it's related
to Intel ME?

~~~
bitwind
The growing consensus is that someone managed to make this work:
[https://cyber.wtf/2017/07/28/negative-result-reading-
kernel-...](https://cyber.wtf/2017/07/28/negative-result-reading-kernel-
memory-from-user-mode/)

Reading kernel memory from user mode = reading cached disk blocks, cached
credentials, and anything else, simply by running JavaScript in a web browser.

KASLR bypass is just a small bonus.

~~~
my123
It is that.

The author of that blog post is mentioned in the acknowledgements of the
KAISER whitepaper. :)

~~~
userbinator
Interesting. I can see it being a concern in shared environments (hence all
the cloud providers being quite scared), but unless there's another part about
being able to _modify_ kernel memory, IMHO it's not such a big deal for the
typical single-user personal computer.

I wonder if other (non-x86) CPUs that do similar speculative execution are
affected... the general ideas behind it don't seem to be specific to x86.

~~~
Unklejoe
> IMHO it's not such a big deal for the typical single-user personal computer

IDK. If this means that some JavaScript from a website can read my kernel's
memory, then it seems like a big deal.

~~~
userbinator
All the more reason to keep JS off by default...

...but the blog post above shows that you need to execute instructions that
(try to) access kernel addresses, and have a handler in place to catch the
inevitable exception. That doesn't seem like code a JS JIT could generate.

You might be thinking of that JS RowHammer demonstration, but that was using
regular memory accesses, not the specific kernel addresses you need for this.

~~~
IgorPartola
Sorry, that train has left the station. JS is now part of the web. The advice
to keep JS off by default is a lot like saying "turn off your Wi-Fi by
default" or "don't use a computer." People who do it occasionally experience
an exaggerated sense of smugness when a particularly nasty bug is discovered,
but then they go back to leading a much more difficult online life than the
rest of the world.

~~~
baq
there's a subset of the web that still remains a hypertext document database
(the 'web 1.0' if you will) instead of becoming an application delivery
platform (web 2.0, i hear it's almost out of beta). going JS-less on wikipedia
is possible and not at all a bad experience.

~~~
IgorPartola
Sure, if you limit your life to Wikipedia, that's fine. Hell, you don't even
need an internet connection for it; just download it all once in a while. But
the rest of us like using places like Amazon, Slack, Google Maps, etc.

I fully support not making content delivery rely on JS. But disabling JS
because it can be used for intrusive ads is a lot like taking the wheels off
your car because it can take you to the mall, where you might see big "for
sale" signs and annoying salespeople. Effective, but stupid.

~~~
username223
> But disabling JS because it can be used for intrusive ads is a lot like
> taking the wheels off your car...

You should try it sometime. Selectively enabling JS will be annoying at first,
but as long as you save your preferences, the web will soon become a much less
terrible place, and you'll rarely have to tweak your config. This approach
won't work for non-techies, of course, but it's not much of a hardship for
someone vaguely familiar with how the web works. Amazon, for example, works
fine with a bit of JS not including amazon-adsystem.com.

------
pkaye
Is this issue only of concern for Intel with Linux? What about OSX, BSD or
Windows?

~~~
rst
Reports are that Windows is also getting an update for this; see refs to
recent NT kernels here:
[http://pythonsweetness.tumblr.com/post/169166980422/the-
myst...](http://pythonsweetness.tumblr.com/post/169166980422/the-mysterious-
case-of-the-linux-page-table)

and the original source for that report:
[https://twitter.com/aionescu/status/930412525111296000](https://twitter.com/aionescu/status/930412525111296000)

OSX and BSD variants are an interesting question...

------
zippie
Data structures stored in kernel space, such as llds [1], will not incur the
overhead of the TLB flush/load.

I suspect that storing data in kernel space in order to avoid maintaining a
large application page directory will become the norm, whereas in the past it
has been reserved for use cases like search engines with massive in-memory
trees.

[1] [https://github.com/johnj/llds](https://github.com/johnj/llds)

------
czeidler
Would it be possible to slow down segfault notifications to mitigate the
attack? For example, if the segfault was not in kernel space, halt the
application for the duration of a kernel read. That way all segfaults would be
reported at more or less the same time and the attack could be avoided.

Are there any sane apps that depend on timely segfault handling and thus might
be affected by such a workaround?

~~~
caf
It's not about timing the segfault delivery itself: the idea is to time
another read of your own address space after the fault, to see whether it's
been prefetched or not.

Maybe you could CLFLUSH on segfault delivery, though.

~~~
caf
Turns out "maybe" is "not": if you put the faulting read at a mispredicted
branch target, you don't take the fault.

------
jopsen
I sometimes wonder if verifying properties of the code we run wouldn't be
smarter than relying on hardware isolation. Or at least in addition to
hardware isolation, so that there are two layers.

By "verify" I'm thinking of NativeClient-like or JVM isolation.

Obviously it would entail a complete OS rewrite, or maybe a partial one...

~~~
cjbprime
You can't do secure computation on a CPU that is insecure in this way. The
insecure CPU will ignore any abstractions you create in software.

------
sandworm101
Lol. Was already very happy with my ryzen 1800 bought a couple months ago.
Even more happy today.

------
rdudek
How does all this affect everyday regular users?

~~~
kevin_thibedeau
Soon, malicious JavaScript will be able to own your computer.

~~~
rdudek
Don't most of them do that already?

~~~
dingo_bat
Right now, they just own you and your data. With this, they will own your
machine.

~~~
amckinlay
Pretty extraordinary that a user's most important files (their documents and
whatever else is in their home folder) are accessible to any app at any time.
Why are we still using this outdated security model? On Windows, I could
download an .exe and it could upload the entire contents of my Dropbox without
even prompting for elevation or anything. Kinda scary when you think about it.

~~~
cesarb
> Why are we still using this outdated security model?

Because it's convenient. The alternative would be something like Flatpak's
portals, which funnel everything through a few standardized dialogs; but how
would you, for instance, use them to implement a media player that scans for
mp3 files, reads their tags, and presents them in a list? A "select a
directory" portal dialog either would not allow a recursive scan, or would
risk a non-technical user selecting their home directory, and either way it
would be a strange interruption in the workflow. (I understand, however, that
Android has done precisely that for removable SD cards...)

------
rbanffy
Would it make sense to switch cores at the same time the context is switched
between user and kernel? The cache hit is already there, and if one could go
back and forth between already-primed caches on different cores, at least some
of the performance impact would be mitigated.

------
czardoz
Is there anything that an end user can do to mitigate the performance impact?

~~~
executesorder66
Switching to AMD or not upgrading your kernel are the only options I can think
of.

------
rpns
Slightly better link:
[https://patchwork.kernel.org/patch/10133447/](https://patchwork.kernel.org/patch/10133447/)

------
b1gtuna
I thought it was clear that this patch only applies to AMD. However, reading
the comments here confuses me. How does performance on Intel drop because of
this?

~~~
dralley
No, it's the other way around. The patch which decreases Intel performance has
already landed. This patch is AMD saying "we don't need this, so we're
disabling it for AMD CPUs."

------
534b44a
I can't even imagine how large the bug bounty can be for the researcher who
created the PoC.

------
sslalltheway
old news. [https://cyber.wtf/2017/07/28/negative-result-reading-
kernel-...](https://cyber.wtf/2017/07/28/negative-result-reading-kernel-
memory-from-user-mode/)

~~~
rootlocus
This isn't about the original vulnerability. It's about the Linux kernel
disabling the mitigation patch (and its performance hit) for AMD processors.

