I can highly recommend it to anyone who wants to learn more about the kernel. This book is particularly great because it does not assume any background in kernel programming; a superficial understanding of C should be enough.
Basically, if you want a kernel space and a user space, you have to ensure users can’t breach kernel space. But this is the part where my logic runs dry: could a malicious caller control the return address that’s pushed to the stack? If so, could you redirect the kernel’s execution to an arbitrary physical address? Or does the kernel switch back into user mode just before calling RET?
Sigh... time to re-read xv6. I think interrupts are involved.
It's probably for speed reasons. Marshaling from user space is expensive because of all the checks you have to make so that a user can't crash the kernel.
> Basically, if you want a kernel space and a user space, you have to ensure users can’t breach kernel space. But this is the part where my logic runs dry: could a malicious caller control the return address that’s pushed to the stack? If so, could you redirect the kernel’s execution to an arbitrary physical address? Or does the kernel switch back into user mode just before calling RET?
Return from interrupt uses the special iret instruction. It pops the saved ip, cs and flags (plus the stack pointer when returning to a lower privilege level) in a single instruction, so the switch back to a user context, if need be, happens atomically.
32-bit "fast" syscalls use sysenter/sysexit.
64-bit "fast" syscalls use syscall/sysret.
Haven't really looked but I suspect sysexit and sysret are somewhat special cased versions of iret.
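The caller does not have control over the return address. When an int n instruction is executed, it's the processor that pushes the current context (ss, esp, eflags, cs, eip) onto the kernel stack (pointed to by ss0:esp0 in the TSS), so when you run iret, everything goes back to normal. The syscall instruction is a bit different: it doesn't touch the stack at all, it saves the return rip in rcx and rflags in r11, and the kernel stashes those itself before returning with sysret.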
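To make the int n case concrete, here is a rough sketch of the frame a 32-bit x86 processor leaves on the kernel stack when an interrupt or trap crosses from ring 3 to ring 0. The struct name and the helper are made up for illustration; only the field order (lowest address first) comes from the architecture.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical name. Layout of what the CPU pushes for a ring 3 -> ring 0
 * interrupt on 32-bit x86, lowest address first. The CPU pushes ss first
 * and eip last, so eip ends up at the top of the kernel stack. */
struct int_frame32 {
    uint32_t eip;     /* user return address */
    uint32_t cs;      /* user code selector (upper 16 bits unused) */
    uint32_t eflags;  /* saved flags */
    uint32_t esp;     /* user stack pointer */
    uint32_t ss;      /* user stack selector (upper 16 bits unused) */
};

static_assert(sizeof(struct int_frame32) == 20, "five 32-bit slots");
static_assert(offsetof(struct int_frame32, eip) == 0, "eip is pushed last");
static_assert(offsetof(struct int_frame32, ss) == 16, "ss is pushed first");

/* Hypothetical helper: the low two bits of the saved cs are the privilege
 * level iret will return to; 3 means user mode. */
static int returns_to_user(const struct int_frame32 *f)
{
    return (f->cs & 3u) == 3u;
}
```

The kernel reads this frame, it never has to trust a return address that user code wrote, because user code cannot write to the kernel stack in the first place.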
Even if the caller had control over this return address, the CR3 does not change [without taking KPTI into consideration], so the memory mappings will still be the same, and everything would be handled with paging enabled, so there's no "arbitrary physical address". You would only be allowed to jump to anything that you have already mapped, and given that there's a privilege change, you would only be able to access userspace memory.
This has nothing to do with whether the syscall parameters are passed on the stack or not. In x86 and x86_64, when you make a syscall and the kernel handles it, the stacks change, so if you were to pass parameters via the stack, the kernel would need to read the userspace stack, which sounds like a mess (but is possible). The registers, on the other hand, are available for the syscall handler to use, so it's easier to just set the parameters there.
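For what it's worth, this is roughly what passing parameters in registers looks like from user space on Linux x86-64: the syscall number goes in rax and the first three arguments in rdi, rsi and rdx. The helper name raw_write is made up, and on other targets the sketch just falls back to libc's syscall() wrapper.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>  /* SYS_write */
#include <unistd.h>       /* syscall() fallback */

/* Hypothetical helper: issue write(2) without going through the libc wrapper. */
static long raw_write(int fd, const void *buf, size_t len)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    /* Linux x86-64 convention: number in rax, args in rdi, rsi, rdx.
     * The syscall instruction overwrites rcx (return rip) and r11
     * (saved rflags), so both are listed as clobbered. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"((long)SYS_write), "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
    return ret;  /* bytes written, or a negative errno value on failure */
#else
    /* Portable fallback: returns -1 and sets errno on failure. */
    return syscall(SYS_write, fd, buf, len);
#endif
}
```

Calling raw_write(1, "hi\n", 3) should behave like write(1, "hi\n", 3), except that the inline-asm path reports errors as a negative errno value instead of setting errno.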
This was good, but it leaves a lot out. No mention of kernel space.
Isn't the number of arguments already determined by the nature of the syscall? Are you talking about a situation where one creates new syscalls?