
An introduction to virtual memory - signa11
https://www.internalpointers.com/post/introduction-virtual-memory
======
chc4
A few months ago one of my friends sent a picture of some disassembled x86
they didn't understand: they guessed it was something like `alloca` because it
was modifying the stack pointer based on the function parameters, but before
it did that it had a loop over the size in 0x1000 chunks. The body of the
loop, however, was just `test [ecx], eax; cmp eax, 0x1000; jae ...`, which
seems trivially like a no-op! `test` is purely to set flags, but `cmp` would
overwrite the flags immediately after, making the conditional jump independent
of whatever the `test` does.

But in reality, `test` has a side effect: it _dereferences_ ecx, which touches
a memory page and can potentially trap. Compilers insert these seemingly
pointless loops ("stack probes") for `alloca` because if you stack-allocate
multiple pages' worth of memory at once and then touch only the far end of it,
you can skip right over the stack guard page, which the kernel uses either to
page in stack memory on demand or to crash the program before the stack can
clash into the heap.
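
A rough C sketch of what such a probe loop does, in case it helps. This is not
the compiler's actual output: `probe_pages` and `PAGE_SIZE` are made-up names,
the 4 KiB page size is an assumption taken from the 0x1000 in the disassembly,
and it walks an ordinary buffer rather than the real stack.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size: 4 KiB, matching the 0x1000 step in the disassembly. */
#define PAGE_SIZE 0x1000u

/*
 * Hypothetical helper: read one byte every PAGE_SIZE bytes of
 * [base, base + size), walking from the highest offset down, the
 * direction a downward-growing stack is extended. The value read is
 * thrown away; the memory access itself is the point, just like the
 * `test [ecx], eax` in the loop body. Returns the number of probes.
 */
size_t probe_pages(volatile unsigned char *base, size_t size)
{
    size_t probes = 0;
    size_t remaining = size;

    assert(base != NULL);

    while (remaining >= PAGE_SIZE) {
        remaining -= PAGE_SIZE;
        (void)base[remaining];  /* volatile read: cannot be optimized away */
        probes++;
    }
    if (remaining > 0) {
        (void)base[0];          /* leftover partial chunk at the low end */
        probes++;
    }
    return probes;
}
```

Because every probe lands at most one page below the previous one, a guard
page sitting anywhere in the range gets hit before anything beyond it is
touched.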

It's a really nice example of the "there's always something further down" type
of thinking you need for low-level stuff imo. This came from reading x86
disassembly, which is an advanced topic that 90%(?) of programmers never have
to care about because it's so "low level"...until you have to start caring
about virtual memory, or your kernel's implementation details.

~~~
CountSessine
Holy-moley - you know I’d never thought very hard about how alloca would work
if it skipped a stack guard. Here’s a question though - wouldn’t you have the
same problem if you just had a really really big stack frame in your function?
If you enter the function and then call another function before touching any
locals, you could jump past stack guards pushing arguments on to the stack,
no? Presumably the compiler needs to anticipate this and grab those pages like
in the alloca case?

~~~
saagarjha
Depends on the language! In C, yes, you can write right past the guard page:
[https://godbolt.org/z/HnnNPU](https://godbolt.org/z/HnnNPU). Rust will insert
a stack probe when necessary:
[https://godbolt.org/z/N8mGrz](https://godbolt.org/z/N8mGrz)
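
For illustration only (this is not the code behind either link), a minimal C
sketch of the kind of function being discussed: one whose frame spans many
pages purely because of an automatic array. `touch_frame` and `FRAME_BYTES`
are hypothetical names, and the 64 KiB size is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame size: 64 KiB, i.e. 16 pages if pages are 4 KiB. */
#define FRAME_BYTES (64u * 1024u)

/*
 * The array makes this function's stack frame span many pages. On a
 * downward-growing stack, big[0] usually sits farthest from the
 * caller's frame, so writing it first touches memory many pages below
 * the previous stack top. Without a stack probe, nothing in between
 * is touched beforehand.
 */
unsigned char touch_frame(size_t index, unsigned char value)
{
    volatile unsigned char big[FRAME_BYTES];

    assert(index < FRAME_BYTES);
    big[index] = value;
    return big[index];
}
```

With plenty of stack left, `touch_frame(0, 7)` just returns 7. The hazard only
appears when the stack is nearly exhausted and the frame is larger than the
guard region.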

~~~
monocasa
In C the compiler is allowed to stack probe. The compilers just make you opt
into it with -fstack-check and the like to make sure they don't break some of
your terribly written code.

~~~
saagarjha
That code is standards-compliant, because the standard has no knowledge of the
stack. I actually had those flags enabled when I was messing around with it
and they did nothing useful so I chose to leave them out. (Yes, the compiler
will probe the stack–but only if you alloca, it seems. Nobody seems to have
extended this protection to stack frames bloated by automatic variables.)

~~~
monocasa
I couldn't get clang to, and gcc seems to be omitting the checks in some cases
with your recursive call to foo (maybe it realizes that 'a' isn't actually
accessed, but still emits the increase and decrease?). But gcc given a program
that passes the address of 'a' into another function outside the compilation
unit does emit stack probes with -fstack-check and -fstack-protector-all.

[https://godbolt.org/z/jHwZiG](https://godbolt.org/z/jHwZiG)

And I don't care if they didn't come out and say it in the spec. If your
program depends on overflowing the stack into other regions of memory you need
to be taken out into the street and publicly dealt with as an example to
others. That being said, I'm sure there's something about implementation-defined
behavior around exceeding the platform's max for automatic storage
duration objects. If not, I may try to sneak that into C2X, lol.

------
vincent-manis
This was a good article. For my money, one of the best conceptual
introductions to virtual memory was an article by Jeff Berryman, over 40 years
ago. It's been reprinted many times, including at
[https://en.m.wikisource.org/wiki/The_Paging_Game](https://en.m.wikisource.org/wiki/The_Paging_Game).

------
Koshkin
Curiously, you do not need memory virtualization if all you want is memory
protection and process isolation: this can be achieved by a simpler piece of
hardware that just ensures that a range of the upper bits of the memory
address used by the process matches a certain pattern, or tag. This
effectively partitions the physical RAM.
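
A minimal C sketch of the check such hardware would perform, assuming 32-bit
physical addresses whose top 8 bits are the tag. The names (`access_allowed`,
`TAG_SHIFT`, `TAG_MAX`) and the bit widths are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: the top 8 bits of a 32-bit address form the tag, so
 * each tag owns one 16 MiB partition of physical RAM. */
#define TAG_SHIFT 24u
#define TAG_MAX   0xFFu

/*
 * Hypothetical check: an access is allowed only if the upper bits of
 * the address equal the tag assigned to the running process. No
 * address translation happens; the address is used as-is.
 */
bool access_allowed(uint32_t addr, uint32_t process_tag)
{
    assert(process_tag <= TAG_MAX);
    return (addr >> TAG_SHIFT) == process_tag;
}
```

For a process with tag 0x05, address 0x05001234 passes the check and
0x06000000 does not.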

~~~
fuklief
That sounds like a capability machine, e.g., CHERI [1]. It seems those might
become relatively mainstream in a few years, as ARM seems to be jumping on
board [2].

[1]:
[https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/](https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/)
[2]:
[https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/cheri...](https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/cheri-morello.html)

