
Things you must know before you can understand Meltdown as a developer - captn3m0
https://razorpay.com/blog/meltdown-paper-summary/
======
keldaris
As others have already said, if you're a programmer, please just read the
original papers:

[https://meltdownattack.com/meltdown.pdf](https://meltdownattack.com/meltdown.pdf)
(start with this one)

[https://spectreattack.com/spectre.pdf](https://spectreattack.com/spectre.pdf)

They are extremely well written, clear and to the point. Understanding them
will take you less time than trying to get rid of all the tortured analogies
and unnecessary simplifications people have been trying to make up over the
past week. It's bad enough that we face the daunting task of explaining this
stuff to people who don't care about computers; there's no need to perpetuate
misunderstanding among those who deal with computers for a living. Just read
the real thing.

And on the subject of explaining this to others, it might surprise you how far
you can get if you try to honestly explain how the attacks work. I refuse to
use the silly train station metaphors, so I tried to describe the basic idea
of how speculative execution works in out of order CPUs to my parents (who can
browse the Internet, with some effort, and were patient enough to listen to me
for 10 minutes or so). I don't think I got the notion of return-oriented
programming across very well, but the basic idea of Meltdown and side channel
timing attacks in general is actually very easy to convey on the basis of a
reasonably simplified picture of a CPU - you need to explain the basic role
of cache memory, virtual vs physical addressing, the TLB and the basic notion
of branch prediction. That's all you need to understand the principle of how
the attacks work, if not the details of the implementation.

~~~
croddin
I have a question: the attacks were apparently found by the authors of these
papers and project zero very close to each other. Why is it that a bug that
has existed so long was suddenly found by both groups so close to each other?
Are there other papers or ideas that this is based on, or was there some other
reason they both tried to look for these kinds of bugs?

~~~
tptacek
This happens all the time. In this particular case, speculative execution
timing attacks were "in the air" for the last year. A little while ago the
C.W. was that Anders Fogh had kicked this off with his blog post from last
summer, but Fogh actually posted a chronology taking this work all the way
back to December '16:

[https://cyber.wtf/2018/01/05/behind-the-scene-of-a-bug-collision/](https://cyber.wtf/2018/01/05/behind-the-scene-of-a-bug-collision/)

Once there's blood in the water, people race to find exploitable flaws (that's
the goal of the game), and so it's not surprising that you'd get multiple
teams disclosing, especially with something this egregious. Also: there's a
Nyquist Frequency thing happening here: remember that we're dealing with
months-long embargoes. So there's a lot of time for people to have found these
bugs "separately", and all we're really seeing is a colliding _disclosure_.

But having said all that: straight-up collisions happen a lot. We all have
favorite stories. My favorite is when Vitaly McLain (then at Matasano, now one
of my partners at Latacora) found an nginx bug that was identical to
Heartbleed, 2 years before Heartbleed was disclosed. A fantastic bug. We were
on a client engagement, so we had to coordinate with the client before
reporting it upstream, and in the _one hour_ it took to do that, someone else
reported the same bug.

------
scott_s
Unfortunately, this gets some big things wrong. Meltdown is _not_ about
speculative execution. (Spectre is.) Meltdown is about out-of-order execution
- no branches required. The authors are clear about this in the paper. From
Section 2.1:

 _"In practice, CPUs supporting out-of-order execution support running
operations speculatively to the extent that the processor’s out-of-order logic
processes instructions before the CPU is certain whether the instruction will
be needed and committed. In this paper, we refer to speculative execution in a
more restricted meaning, where it refers to an instruction sequence following
a branch, and use the term out-of-order execution to refer to any way of
getting an operation executed before the processor has committed the results
of all prior instructions."_

In this explanation, the author starts by showing two different code branches,
which is misleading. Meltdown does not require code branches - which is what
makes it so surprising. This is the C code example from the paper:

      raise_exception();
      // the line below is never reached
      access(probe_array[data * 4096]);

No branches: you have an exception, and then in the code following that
exception, you have some memory access. Despite the exception, the access
happens because of out-of-order execution. The actual exploit is, in assembly:

      ; rcx = kernel address
      ; rbx = probe array
      retry:
      mov al, byte [rcx]
      shl rax, 0xc
      jz retry
      mov rbx, qword [rbx + rax]

The exception is raised on the first mov instruction, as it loads a kernel address. This
exception will eventually cause the processor to abandon all of the current
code it is executing, and the program will terminate from a segmentation
fault. But. There is a race condition: before the processor deals with the
exception, but after the memory has been accessed, the second mov instruction
executes, which _uses the data which caused the exception_. This shouldn't
matter, as execution is abandoned, but data is brought into the cache based on
this value, and using side-channel attacks, we can figure out what this value
was. From the paper:

 _"To load data from the main memory into a register, the data in the main
memory is referenced using a virtual address. In parallel to translating a
virtual address into a physical address, the CPU also checks the permission
bits of the virtual address, i.e., whether this virtual address is user
accessible or only accessible by the kernel. As already discussed in Section
2.2, this hardware-based isolation through a permission bit is considered
secure and recommended by the hardware vendors. Hence, modern operating
systems always map the entire kernel into the virtual address space of every
user process.

As a consequence, all kernel addresses lead to a valid physical address when
translating them, and the CPU can access the content of such addresses. The
only difference to accessing a user space address is that the CPU raises an
exception as the current permission level does not allow to access such an
address. Hence, the user space cannot simply read the contents of such an
address. However, Meltdown exploits the out-of-order execution of modern CPUs,
which still executes instructions in the small time window between the illegal
memory access and the raising of the exception."_

I find the paper to be very readable. They give a good overview of modern
computer architecture, and then walk through all of the steps of their attack.
I highly recommend reading it:
[https://meltdownattack.com/meltdown.pdf](https://meltdownattack.com/meltdown.pdf)

~~~
captn3m0
Thanks for pointing this out. I'll be updating the post shortly with changes.
I've added a note about going to the source and reading the paper alongside.

For others on this thread: +1 on the above recommendation for reading the
paper itself. It is very well written and accessible. If you've read the blog
post, you know pretty much everything you need to understand the paper.

~~~
scott_s
Note that this includes your explanation with the check_function() - that is
not part of the exploit. The branches in their assembly are only about dealing
with the zeros.

And, to reiterate: any explanation of Meltdown that depends on branches is
incorrect. It's not enough to just use the phrase "out-of-order". All of your
examples with if-statements need to change.

------
tptacek
Things you must know before you can understand Meltdown:

* The memory hierarchy (registers, cache, memory); really all programmers always need to know the memory hierarchy and Meltdown just sort of reinforces that.

* The basics of kernel memory management (kernel memory is mapped into userland processes and protected by page table permissions checks).

* Very basic assembly language (basically what a variable assignment and an "if" statement compile down to).

* The idea of pipelined CPUs, the idea that on modern CPUs the registers you see in assembly instructions are actually renamed from a larger invisible register file, and the distinction between instruction execution and retirement.

If you've got this I think you can just read the paper:
[https://meltdownattack.com/meltdown.pdf](https://meltdownattack.com/meltdown.pdf).
It's really well written. In particular: I don't think you need to understand
much about timing attacks. The Flush+Reload paper (you can just Google it,
it'll be the first result) is _also_ really well written, but you'll be fine
in the Meltdown paper without having read it.

~~~
gautamb0
I tried reading the flush+reload paper several times, and watching a couple of
the author's talks on it as well. I still haven't come out with a halfway
decent understanding of how the timing attack works...which seems to be the
most difficult and interesting part to me. It seems like it's well understood
in the security community, so it gets glossed over when referenced. How they
actually manage to read data out of an evicted cache line remains a mystery to
me.

~~~
tptacek
Is it the timing _mechanism_ you have trouble with, or the timing _target_?
Flush+Reload is (to me) an unusually clear paper (it's an engineering paper,
which is probably why it wound up at Usenix). But even in the paper, the
actual _target_ (not just understanding square-and-multiply but also how that
gets translated into cache hits) is tricky.

The nice thing about Meltdown and Spectre is that the cache hits are less
tricky to understand; they're engineered specifically to make the exploit
work.

~~~
gautamb0
[I had to go back and reread it a couple of times...naturally :)]

I guess part of what bothered me is what makes it well written; there is so
much of the discussion spent on background, which felt like stating the
obvious to me. It wasn't clear to me how specific the conditions needed to be
for the attack. They use GnuPG as an example, and ostensibly rely on knowing
the algorithms that the decryption and encryption functions use beforehand. With
knowledge of the implementation, they're able to trace execution, and
subsequently infer each bit of the victim data that they want to probe. They
also need to know the victim's cache characteristics; hierarchy and timing.

It's a far cry from arbitrarily reading memory on an arbitrary victim.

------
phkahler
I don't understand one part. If you read from an arbitrary memory location
(during speculative execution, I get all that) how does that read pull data
from a different process? Aren't all addresses virtual until they go through
the MMU and get translated to a physical address depending on the process?

Or does this work only because the kernel exists in the same virtual address
space, hence KPTI as a mitigation?

~~~
tedunangst
It is typical for 64 bit machines to have a direct mapping of physical memory
at some virtual offset. Ie, virtual 0xfffff...700000 corresponds to 0x0
physical. This simplifies some things since you can allocate and access any
physical page without creating new mapping.

32 bit machines do not typically have sufficient kernel address space to do
the same.

(Oh, linux still uses direct map on 32 bit machines even today, but only maps
some memory? I thought that was abandoned, but wouldn't really know. Anyway, a
much better explanation of all things direct map is
[https://www.sceen.net/mapping-physical-memory-directly/](https://www.sceen.net/mapping-physical-memory-directly/))

~~~
bogomipz
>"This simplifies some things since you can allocate and access any physical
page without creating new mapping."

I am curious: what types of things does this simplify for the kernel? When is a
physical page allocation ever done that doesn't need to be entered into a page
table entry?

~~~
tedunangst
ptrace of another process, for instance.

~~~
bogomipz
Can you elaborate? If I strace a process which makes a ptrace system call that
is just another userland process and my userland process has a page table
entry just like any other userland process.

~~~
tedunangst
You call ptrace and ask to read another process's memory. The kernel has to
turn the requested VA in that address space into a physical page (walking
those page tables, not yours) then copy the memory (from some VA in kernel
space). This is quicker if you can turn a PA into a VA by simply adding an
offset.

~~~
bogomipz
I see. I misinterpreted your previous comment. Cheers.

------
captn3m0
Tried to write a meltdown explanation for "everyday" developers. There are
some loose analogies and inexact writing. (Please point out mistakes, they're
mine)

~~~
DiffEq
How likely is a Meltdown attack to be successful against a moderately protected
PC or, say, a VMware cluster? These kinds of things seem hard to pin down. And
if all one can do is READ, what exactly is there to gain? It seems that the
machine would already have to have been compromised in another way to get the
memory that has been read off and out of the computer system.

~~~
captn3m0
There are known PoCs for Meltdown using JS, which is what makes this so scary.
Heartbleed was far worse in comparison, since it was remotely exploitable, but
the JS vectors for Meltdown still make it frightening.

~~~
jnordwick
Where? We've been told it is possible, but I have yet to see a JavaScript
exploit that wasn't basically a canned demo.

~~~
unclepresent
I doubt it is possible to do this in JavaScript. Timing cache access is a
challenging task for such a high-level language. The key to the attack is
figuring out the latency of memory access.

A JavaScript app that is dealing with 100 layers of intermediate code before
it actually gets to the physical memory could not see a difference between
reading from actual memory and reading from cache. It is too slow to notice
any change. It would have to be pure assembler code to reliably estimate the
effect of caching.

~~~
jnordwick
If you are already running untrusted binaries, there are bigger issues.
Without a JS exploit, I'm not sure this is a big problem.

And we haven't seen a real world binary version either. The versions I've seen
all take running starts so to speak.

------
Animats
Some CPUs can fetch data from memory they are not currently entitled to fetch.
The permissions checking is done in parallel with the fetch, so a fetch and
even some use of the data can take place. The result can't propagate back to
the program, if the retirement unit is doing its job, but it can affect cache
loads.

So how early does that chain of events have to be stopped? If it's stopped
before the unwanted fetch, security is sound - the CPU never pulls in the data
it shouldn't see. Future CPU designs are probably going to have to do that,
even at some cost in performance (but look for complicated explanations from
Intel as to why this isn't really necessary). That may require more permission
info in the various tables and caches of the memory system.

Even if the memory interface looks at page permissions earlier, there's the
possibility of using this attack to peek at data in the same address space,
data protected only by checks in the code. This may allow snooping around
within application programs such as browsers.

It used to be that you only worried about timing issues for speculative
execution in crypto code. It's important that strong encryption code take
constant time regardless of the data. Otherwise, timing measurements of known-
plaintext attacks may yield info about the key. Now it's a broader problem.

Bleah. Fortunately, my CPU designer friends are all retired now and don't have
to deal with this.

------
jgrahamc
My explanation: [https://blog.cloudflare.com/meltdown-spectre-non-technical/](https://blog.cloudflare.com/meltdown-spectre-non-technical/)

~~~
33W
Great analogy, explanation, and illustrations. Thanks for sharing.

------
koolba
How many bits per second (or kbps, mbps, etc) of memory reading is possible
with Meltdown when run from JS vs running natively?

Somewhat related, is it possible to neuter the JS engines in Firefox or Chrome
so that they don't JIT JS and would doing so have any real world impact on
mitigating this attack? If it relies on speedy execution to be possible maybe
a solution would be to have a NeuterScript extension that deliberately slows
things down.

~~~
lykr0n
Doing it fast enough will peg the browser at 100% CPU for that thread. I saw a
demo on Twitter where they were not able to reliably extract information from
the browser. Native code, as in not running inside a browser, has been shown to
extract information reliably. My guess is that's due to the extreme amount of
overhead required to run JS in an isolated-ish way.

I think the point is that a JS-based attack, while possible, wouldn't be much
use outside a proof of concept.

------
unclepresent
Here is the same in 10 lines of pseudo-code

[http://zergo.me/meltdown.html](http://zergo.me/meltdown.html)

~~~
hafta
> var BASE_ADR=zzzz;//Make BASE points to a random 256 bit currently uncached
> memory block that appplication has access to;

That should be 256 bytes.

~~~
unclepresent
Yep, corrected. In the end it is a bit more complex than that, as the cache is
populated in 32-128 byte increments, but I leave that out for simplicity.

------
d33
Suggestion - this logo:

[https://razorpay.com/blog/content/images/2018/01/34873319-a5...](https://razorpay.com/blog/content/images/2018/01/34873319-a5d02762-f7ba-11e7-8eca-4e04f2bd4522.png)

Should probably say "Intel" somewhere ;)

~~~
als0
That doesn't seem completely fair. The ARM Cortex-A75 is affected.
Furthermore, the A72 and A57 also suffer from a different variant of Meltdown.

------
bogomipz
I have a couple of questions about Meltdown and the Intel chips. My
understanding is that a key part of this is that upon speculative execution the
page table permission checks only happen when the "transient" instruction is
retired.

Was this simply a performance engineering trade off made by Intel? Would
checking the PTE permissions on speculative execution result in giving up any
performance gained by the speculative execution?

------
konschubert
My naive expectation would have been that the CPU maintains some kind of
process level isolation.

My new understanding is now that the concept of a process and isolation of
processes is handled by the kernel.

This is probably a silly question, but maybe we _could_ handle process
isolation in the CPU somehow?

~~~
mr_toad
I doubt you’d want to implement the process scheduler in hardware, it would be
very inflexible.

------
Blazespinnaker
I am surprised this hasn’t been explained in terms of a vulnerability chain.
Ie, break it up into parts. As soon as you have an oracle providing cache
timing info you have a vulnerability.

Basic Bayesian analysis suggests that there is more fruit to fall off the
tree.

~~~
scott_s
I agree: as long as side channel attacks are possible, there's always the
possibility that someone else will find some other vulnerability that can be
combined to create an exploit. The paper
([https://meltdownattack.com/meltdown.pdf](https://meltdownattack.com/meltdown.pdf))
does present it in that way: they show you the parts, then show how they fit
together for a workable exploit.

------
xroche
Cache line is 64 bytes on x86-64 if I am not mistaken, not 4096 :)

Nice read anyway.

