
Unix Syscalls - rspivak
https://john-millikin.com/unix-syscalls
======
vesinisa
There is a great deal more information about how syscalls are implemented on
Linux in this excellent free GitHub book called _linux-insides_:

[https://github.com/0xAX/linux-insides/tree/master/SysCall](https://github.com/0xAX/linux-insides/tree/master/SysCall)

I can highly recommend it to anyone who wants to learn more about the kernel.
This book is particularly great because it does not assume any kernel
programming background; a superficial understanding of C should be
enough.

~~~
rurban
This view is a bit better: [https://0xax.gitbooks.io/linux-insides/content/SysCall/](https://0xax.gitbooks.io/linux-insides/content/SysCall/)

------
eboyjr
It appears that Linux x86-64 does not support more than 6 arguments for
syscalls[0]. So if you want to pass more than 6 arguments, use a struct... or
modify nearly every step of the syscall process in the kernel.

[0]:
[https://elixir.bootlin.com/linux/v4.18-rc8/source/arch/x86/i...](https://elixir.bootlin.com/linux/v4.18-rc8/source/arch/x86/include/asm/syscall.h#L117)
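
For illustration, here is a minimal sketch of what the six-argument limit looks like from user space, assuming Linux x86-64 and glibc's `syscall()` wrapper. `mmap` happens to use all six slots. The helper name `map_anon_page` is made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: map anonymous memory through the raw syscall()
 * wrapper instead of the mmap() library function.
 * Assumes Linux x86-64, where the six arguments travel in
 * rdi, rsi, rdx, r10, r8 and r9. There is no seventh slot.
 */
static void *map_anon_page(size_t len)
{
    assert(len > 0);
    long ret = syscall(SYS_mmap,
                       (long)0,                             /* 1: addr hint  */
                       (long)len,                           /* 2: length     */
                       (long)(PROT_READ | PROT_WRITE),      /* 3: protection */
                       (long)(MAP_PRIVATE | MAP_ANONYMOUS), /* 4: flags      */
                       (long)-1,                            /* 5: fd         */
                       (long)0);                            /* 6: offset     */
    /* On failure syscall() returns -1, which is the same value as MAP_FAILED. */
    return (void *)ret;
}
```

Calls that need more than this take a pointer to a struct instead. `clone3`, for example, takes a `struct clone_args *` plus its size rather than a long argument list.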

~~~
shawn
It’s because args to syscalls are passed in registers rather than the stack.
This is a security mechanism I believe, but I’m mostly guessing based on xv6.

Basically, if you want a kernel space and a user space, you have to ensure
users can’t breach kernel space. But this is the part where my logic runs dry:
could a malicious caller control the return address that’s pushed to the
stack? If so, could you redirect the kernel’s execution to an arbitrary
physical address? Or does the kernel switch back into user mode just before
calling RET?

Sigh... time to re-read xv6. I think interrupts are involved.

~~~
monocasa
> It’s because args to syscalls are passed in registers rather than the stack.
> This is a security mechanism I believe, but I’m mostly guessing based on
> xv6.

It's probably for speed reasons. Marshaling from user space is expensive due
to all of the checks you have to make so that user code cannot crash the kernel.
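
To give a feel for one of those checks, here is a rough sketch of the range validation a kernel has to do before it dereferences a user-supplied pointer. This is not the actual kernel code. The name `user_range_ok` and the `USER_SPACE_END` boundary (a 47-bit user address space) are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed first address past user space (47-bit user half); illustrative only. */
#define USER_SPACE_END 0x0000800000000000ULL

/*
 * Hypothetical check: does [addr, addr + len) lie entirely in user space?
 * Written so that addr + len is never computed, since that sum could wrap.
 */
static bool user_range_ok(uint64_t addr, uint64_t len)
{
    if (len > USER_SPACE_END)
        return false;
    return addr <= USER_SPACE_END - len;
}
```

A real kernel also has to cope with the page being unmapped or swapped out, so every stack-passed argument costs a checked copy. Arguments in registers are already saved on entry and need none of this.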

> Basically, if you want a kernel space and a user space, you have to ensure
> users can’t breach kernel space. But this is the part where my logic runs
> dry: could a malicious caller control the return address that’s pushed to
> the stack? If so, could you redirect the kernel’s execution to an arbitrary
> physical address? Or does the kernel switch back into user mode just before
> calling RET?

Returning from an interrupt uses the special iret instruction. It makes sure
the return happens in a user context if need be by atomically restoring the
flags and ip registers, along with the code segment that carries the
privilege level.

~~~
Taniwha
Yes, exactly this. Once upon a time (V6/V7 on the PDP-11), when I was younger,
syscall parameters were on the stack. I worked for a company that ported Unix
to various CPUs/MMUs; we'd knock one out every 6 weeks or so. On some MMUs,
accessing user space from kernel space (safely) was extremely slow. We
discovered that passing syscall parameters on the stack was a real
performance hog, and benchmarking showed that passing in registers was far
faster on all systems. Our systems supported both sorts of system calls. When
I wrote the original 68k System V ABI, I included register passing as the
default.

~~~
cptnapalm
I'm learning PDP-11 assembly and would like to play around with some OS stuff.
This was very helpful to know. Thanks.

