
Anatomy of a Program in Memory (2009) - chaitanyav
https://manybutfinite.com/post/anatomy-of-a-program-in-memory/
======
okket
Previous discussions:
[https://hn.algolia.com/?query=Anatomy%20of%20a%20Program%20i...](https://hn.algolia.com/?query=Anatomy%20of%20a%20Program%20in%20Memory&sort=byDate&dateRange=all&type=story&storyText=false&prefix&page=0)

(or click "past" under the title, also helpful to check when you submit a
link)

------
finchisko
After reading many articles about virtual memory and how kernel space is
mapped into every process, I still don't understand why it is necessary. Why
can't a process have only its user mode space mapped? It also seems to be the
case only for Unixes and Windows. I'm not sure exactly how it's done in OS X,
but "Mac OS X does not map the kernel into each user address space, and
therefore each user/kernel transition (in either direction) requires an
address space switch."
[https://flylib.com/books/en/3.126.1.91/1/](https://flylib.com/books/en/3.126.1.91/1/)

~~~
MarkSweep
On x86, it was presumably for performance, so that the TLB does not have to
be flushed when switching from user to kernel mode. x86 requires some kernel
memory to be mapped always, for example the stack for syscall and trap
handlers. So by keeping everything mapped into memory, the kernel did not have
to worry about which parts were needed to handle syscalls and which were not.
These kernel pages were marked as “supervisor only”, so only the kernel code
could actually read and write them.

I say all of this in the past tense, since Meltdown makes it possible to read
all that kernel memory. Kernels now keep most of the kernel memory unmapped
when user mode is executing.

~~~
bogomipz
>"x86 requires some kernel memeory to be mapped always, for example the stack
for syscall and trap handlers."

Can you elaborate on what you mean by saying x86 requires that the kernel
stack always be mapped into a process address space for system calls to work?

The kernel always knows where a process's kernel stack is located, as there is
a pointer to it in the user process's task_struct. It is only in kernel mode
that the kernel switches the CPU's stack pointer to use that process's
kernel stack.

~~~
monocasa
You can't unmap the kernel stack in Meltdown mitigation, because the syscall
instruction will want to push to the kernel stack before you, as the kernel,
have a chance to map the kernel stack.

~~~
bogomipz
It sounded like the OP's comment wasn't strictly about the post-Meltdown era
and that they were commenting on the general case. But maybe I misinterpreted
that?

~~~
monocasa
Oh, sure, my bad.

OK, in the context of 'why can't you cleanly have the kernel in a different
address space from user processes on x86', the same reasons apply. It's a
chicken/egg thing, as a syscall instruction executes and touches the kernel
stack before you have a chance to change mmu mappings.

There are versions of Darwin for x86 (but no released versions of full OSX
AFAIK) that separate the address spaces, but they reserve an (albeit much
smaller) piece of virtual address space at the top for the kernel in all
address spaces in order to facilitate the transition to the full kernel
address space.

~~~
bogomipz
Right, that would be a pretty awful place to segfault :)

Thanks for the clarification.

------
bogomipz
The author states:

>It is also possible to create an anonymous memory mapping that does not
correspond to any files, being used instead for program data.

This isn't strictly true though, is it? It was my understanding that even
mmap() with MAP_ANONYMOUS used a file interface, and that the way the kernel
creates anonymous maps is by creating an instance of /dev/zero in tmpfs,
although I believe the file descriptor might be ignored.

~~~
monocasa
It seems to just pass around a null struct file pointer and special cases
that.
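
For what it's worth, from user space the file descriptor really is just a
placeholder. A minimal sketch using Python's mmap module, where passing -1 as
the fileno requests an anonymous mapping. The helper name
make_anonymous_mapping is made up for illustration, and this only shows the
user-facing API, not what the kernel does internally:

```python
import mmap

def make_anonymous_mapping(num_pages=1):
    """Hypothetical helper: map memory that is not backed by any file."""
    # fileno=-1 asks for an anonymous mapping; length is in bytes.
    return mmap.mmap(-1, num_pages * mmap.PAGESIZE)

buf = make_anonymous_mapping()
first_byte = buf[0]   # fresh anonymous pages read back as zero
buf[:5] = b"hello"    # the region is ordinary read/write program data
data = buf[:5]
size = len(buf)
buf.close()

print(first_byte, data, size)
```

No file is ever opened here, which matches the "no real file behind it" reading
of the article.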

------
Myrmornis
If the process depicted in the diagram were to start a second thread, where
would that second thread’s stack go in the diagram? The two threads would share
the same heap.

~~~
monocasa
Another ~8MB (8MB plus guard pages) chunk of virtual memory that's free up
there near the shared mappings.

You're totally right though, that threads complicate the traditional "stack
grows down heap grows up" view of a Unix user address space.
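
If you want to see where things actually land, Linux exposes the layout in
/proc/self/maps. A rough sketch, assuming a Linux system (elsewhere it just
finds nothing); parse_maps_line and read_regions are hypothetical helper
names. As far as I know, on recent kernels thread stacks show up as unnamed
anonymous regions rather than with their own label:

```python
import os

def parse_maps_line(line):
    """Split one /proc/<pid>/maps line into (start, end, perms, name)."""
    parts = line.split(None, 5)  # address, perms, offset, dev, inode, [pathname]
    start, end = (int(x, 16) for x in parts[0].split("-"))
    name = parts[5].strip() if len(parts) > 5 else ""  # empty means anonymous
    return start, end, parts[1], name

def read_regions(path="/proc/self/maps"):
    """Return parsed regions, or an empty list where the file does not exist."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return [parse_maps_line(line) for line in f if line.strip()]

regions = read_regions()
for start, end, perms, name in regions:
    if name in ("[heap]", "[stack]"):
        print(f"{name:8} {start:#x}-{end:#x} {perms} {(end - start) // 1024} KiB")

anonymous = [r for r in regions if r[3] == ""]
print(f"{len(anonymous)} anonymous regions (thread stacks would be among these)")
```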

~~~
Myrmornis
Thanks. So can I check I'm understanding correctly:

- If a process has many threads, their stacks are all located within a single
virtual address space corresponding to the user process?

- If one thread grows down and is about to overwrite the top of another
thread's stack, does the OS detect this automatically and do some sort of
reallocation procedure?

~~~
monocasa
> If a process has many threads, their stacks are all located within a single
> virtual address space corresponding to the user process?

Yep!

> If one thread grows down and is about to overwrite the top of another
> thread's stack, does the OS detect this automatically and do some sort of
> reallocation procedure?

The kernel reserves an 8MB region for each stack and that's it (even the
initial stack). So you wouldn't get overlapping stacks per se; the regions are
preallocated. The kernel does try to detect stack overflow/underflow with
guard pages, but that's just a best-effort kind of thing, and if you
underflow by more than a page you can end up just corrupting memory.

And all of this is for C's sort of standard model. 'Split stacks' is a scheme
closer to your second question, but there's a lot of overhead of that model,
and not a lot of runtimes use it.

I've also even more rarely seen a model that allocates stack frames on the
heap and links them together in a linked list.

But like I said, these schemes are _very_ rare in practice.
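
A quick way to check the 8MB figure on your own machine: on many Linux
distributions it is simply the default soft RLIMIT_STACK, though that is an
assumption about your system and the limit may differ or be unlimited. A small
sketch (Unix only; stack_limits and describe are hypothetical helper names):

```python
import resource

def stack_limits():
    """Return the (soft, hard) stack size limits of this process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_STACK)

def describe(limit):
    """Render a limit in MiB, or 'unlimited' for RLIM_INFINITY."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit / (1024 * 1024):.1f} MiB"

soft, hard = stack_limits()
print("soft stack limit:", describe(soft))
print("hard stack limit:", describe(hard))
```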

~~~
Myrmornis
Thanks for this.

------
dsign
>> In Linux, kernel space is constantly present and maps the same physical
memory in all processes.

That's right there together with the city states of Greece and other ancient
memories. Meltdown and Spectre happened.

~~~
monocasa
You could still make an argument for that if you squint hard enough. The
virtual memory is still reserved, and a transition to kernel mode still has
user space mapped, and the kernel's view of memory as well.

~~~
dsign
Thanks for clarifying! Is whatever the kernel does now in transition to user
space expensive because it's somehow proportional to the amount of actual
memory that the kernel is using or has reserved?

~~~
monocasa
No, it's still more or less an O(1) operation, it's just expensive to fully
flush TLBs.

------
newscracker
The title needs 2009 in it, since this article is from that year.

~~~
coffeeacc
64-bit update would be great.

