
Ways of Implementing a System Call - lelf
https://x86.lol/generic/2019/07/04/kernel-entry.html
======
burfog
There are many more. It would be difficult to enumerate them all; nearly
anything that causes an exception will do. Examples:

The 0xf1 byte, an "icebp" or "int1" instruction

In 32-bit code, use an FPU stack underflow

In 64-bit mode, access a non-canonical address

In 32-bit code, use the "into" instruction

Write data using a segment selector that goes beyond the LDT limit

With the alignment checking enabled, do a misaligned memory operation

Of course, we can also do system calls without causing exceptions. Examples:

Write the system call number to a well-known memory location that the OS will
poll

Modulate the CPU temperature by running code. The OS decodes the signal. Use
message passing that involves OFDM, LDPC, ASN.1, and digital signatures.

~~~
etaioinshrdlu
> Modulate the CPU temperature by running code. The OS decodes the signal. Use
> message passing that involves OFDM, LDPC, ASN.1, and digital signatures.

I like how you mixed this in with real suggestions.

~~~
kragen
It's an excellent suggestion if the code you're trying to communicate with is
not actually the kernel, but rather other code on the machine that the kernel
is attempting to prevent you from communicating with. Except for the ASN.1
part. That's never a good idea.

~~~
waterhouse
Or code _not_ on the machine. I remember: "PowerHammer: Exfiltrating Data from
Air-Gapped Computers Through Power Lines"
[https://news.ycombinator.com/item?id=16821513](https://news.ycombinator.com/item?id=16821513)

~~~
EGreg
[https://www.themarysue.com/ibm-black-team/amp/](https://www.themarysue.com/ibm-black-team/amp/)

------
eloff
Before you look at these benchmarks and conclude that system calls are cheap,
please be aware the real cost is not 100 cycles: it's 100 cycles plus the cost
of clobbering your L1 (and often L2) cache, instruction cache, TLB, etc. This
is made worse on many current systems by the mitigations for Intel's buggy
processors. The real cost is often 10,000-30,000 cycles or more. Which is
why for anything very syscall-heavy, like a network server handling millions
of very small packets per second, you get more than an order of magnitude
better performance by bypassing the kernel and avoiding syscalls altogether.
It's also why threads are "faster" than processes even though they're
implemented the same way: there's no need to flush the TLB on a context switch.

~~~
cesarb
The current trend seems to be to use ring buffers between the kernel and
userspace (see: perf, io_uring, this proposal
[https://lwn.net/Articles/789603/](https://lwn.net/Articles/789603/)), which
reduce this overhead.

~~~
eloff
At the lowest level this is how it works, so using ring buffers as the
abstraction makes tremendous sense. Plus it can be done safely without
requiring a context switch on the fast path, with a little care.

It's also easy to batch things naturally. When the load is low the batch size
is 1, and it grows naturally as more work gets queued during processing of
the prior work.

I love ring buffers and queues.

------
MarkSweep
Fun fact, not only can you use iret to return to user-mode from kernel-mode
and kernel-mode from kernel-mode (as footnote 5 mentions), you can also use
iret to return from user-mode to user-mode. It is a handy way to restore RIP,
RSP, and RFLAGS at the same time. .NET CoreCLR uses it during exception
handling on Unix:

[https://github.com/dotnet/runtime/blob/4db72366e8e49c30d7aa6...](https://github.com/dotnet/runtime/blob/4db72366e8e49c30d7aa6d2cc920cf063fb911ea/src/coreclr/src/pal/src/arch/amd64/context2.S#L183)

------
why_only_15
I heard that, especially with Spectre, most of the cost of context switching
comes from messing up your caches + TLB
([https://blog.tsunanet.net/2010/11/how-long-does-it-take-to-m...](https://blog.tsunanet.net/2010/11/how-long-does-it-take-to-make-context.html)).
There's a graph on Dan Luu's blog showing that it takes 14,000 cycles to get
back to full performance after a syscall:
[https://danluu.com/images/new-cpu-features/flexsc_ipc_recove...](https://danluu.com/images/new-cpu-features/flexsc_ipc_recovery.png).

~~~
foota
I wonder if it would be possible (& worth doing, since it would require
cross-cutting work) to enable saving and loading cache-hit profiles

~~~
dfox
The problem is that on i386 the TLB entries are keyed only by linear address,
not by the page table the entry came from, and writing to the page table
register (even rewriting the same value) is documented as triggering a TLB
flush.

On typical RISC platforms that directly expose the TLB to the supervisor, TLB
entries usually have a few bits that record which process an entry is valid
for, so there is no need to flush the whole TLB on each task switch.

(Modern i386 CPUs actually have a somewhat similar mechanism, but it is only
usable by hypervisors, not by normal operating systems.)

------
anaisbetts
One of the ways that many operating systems (I'm fairly certain that Windows
did this at one point) implemented syscalls long ago was to execute an illegal
instruction, because at the time this was the fastest way to transition to
kernel mode. x86 even has a specific opcode that is guaranteed never to be
used, i.e. to _always_ be an illegal instruction

~~~
amluto
x86 has two: UD1 and UD2. Sadly, no one can agree on the _length_ of UD1, even
though everyone agrees that it’s illegal. UD2 is well behaved.

------
pcr910303
OK, I really don't know anything about assembly/OSes, but if the price of
going to kernel mode with sysenter/syscall is that small (the article says
it's less than a single 64-bit integer division), does that mean microkernels
can exploit that to improve performance?

What are the shortcomings of this method compared to interrupts?

~~~
dfox
On microkernels the issue is usually not syscall performance but the fact
that essentially every operation causes a task switch, which on i386 causes a
TLB flush (a concern that in post-Spectre days seems somewhat moot) and in
many cases involves interacting with the task scheduler.

(QNX uses an interesting hack to sidestep this: the message-passing interface
does not really pass messages, but is simply a cross-address-space call/return
that in essence temporarily moves a thread to a different process without
rescheduling anything)

------
renox
Saying that the cost of a system call is X cycles is IMHO a bit "short":
what's the effect on the various caches, the TLB, etc.?

Plus nearly all system calls must validate their user inputs, so the minimum
cost of a system call is far more than the sysenter cost itself.

~~~
monocasa
L4 is all about removing all of those concerns from at least a variant of IPC,
FWIW.

------
leoc
That domain name though!

------
DonHopkins
Another super-efficient way of implementing a system call is described in
Alexia Massalin's classic PhD thesis on the Synthesis kernel:

[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.29....](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.29.4871&rep=rep1&type=pdf)

>Synthesis: An Efficient Implementation of Fundamental Operating System
Services

>This dissertation shows that operating systems can provide fundamental
services an order of magnitude more efficiently than traditional
implementations. It describes the implementation of a new operating system
kernel, Synthesis, that achieves this level of performance.

>The Synthesis kernel combines several new techniques to provide high
performance without sacrificing the expressive power or security of the
system. The new ideas include:

>\- Run-time code synthesis -- a systematic way of creating executable machine
code at runtime to optimize frequently-used kernel routines -- queues, buffers,
context switchers, interrupt handlers, and system call dispatchers -- for
specific situations, greatly reducing their execution time.

>\- Fine-grain scheduling -- a new process-scheduling technique based on the
idea of feedback that performs frequent scheduling actions and policy
adjustments (at submillisecond intervals) resulting in an adaptive, self-
tuning system that can support real-time data streams.

>\- Lock-free optimistic synchronization is shown to be a practical, efficient
alternative to lock-based synchronization methods for the implementation of
multiprocessor operating system kernels.

>\- An extensible kernel design that provides for simple expansion to support
new kernel services and hardware devices while allowing a tight coupling
between the kernel and the applications, blurring the distinction between user
and kernel services.

>The result is a significant performance improvement over traditional
operating system implementations in addition to providing new services.

Previous discussion on HN:

The Synthesis Kernel (1988) [pdf] (usenix.org)

[https://news.ycombinator.com/item?id=15076642](https://news.ycombinator.com/item?id=15076642)

[https://www.usenix.org/legacy/publications/compsystems/1988/...](https://www.usenix.org/legacy/publications/compsystems/1988/win_pu.pdf)

It’s Time for a Modern Synthesis Kernel (regehr.org)

[https://news.ycombinator.com/item?id=20337231](https://news.ycombinator.com/item?id=20337231)

[https://blog.regehr.org/archives/1676](https://blog.regehr.org/archives/1676)

>Alexia Massalin’s 1992 PhD thesis has long been one of my favorites. It
promotes the view that operating systems can be much more efficient than then-
current operating systems via runtime code generation, lock-free
synchronization, and fine-grained scheduling. In this piece we’ll only look at
runtime code generation, which can be cleanly separated from the other aspects
of this work.

Valerie Henson's commentary on the Synthesis kernel (2008) (lwn.net)

[https://news.ycombinator.com/item?id=10441995](https://news.ycombinator.com/item?id=10441995)

[https://lwn.net/Articles/270081/](https://lwn.net/Articles/270081/)

KHB: Synthesis: An Efficient Implementation of Fundamental Operating Systems
Services

>When I was but a wee computer science student at New Mexico Tech, a graduate
student in OS handed me an inch-thick print-out and told me that if I was
really interested in operating systems, I had to read this. It was something
about a completely lock-free operating system optimized using run-time code
generation, written from scratch in assembly running on a homemade two-CPU SMP
with a two-word compare-and-swap instruction - you know, nothing fancy. The
print-out I was holding was Alexia (formerly Henry) Massalin's PhD thesis,
Synthesis: An Efficient Implementation of Fundamental Operating Systems
Services (html version here). Dutifully, I read the entire 158 pages. At the
end, I realized that I understood not a word of it, right up to and including
the cartoon of a koala saying "QUA!" at the end. Okay, I exaggerate - lock-
free algorithms had been a hobby of mine for the previous few months - but the
main point I came away with was that there was a lot of cool stuff in
operating systems that I had yet to learn.

>Every year or two after that, I'd pick up my now bedraggled copy of
"Synthesis" and reread it, and every time I would understand a little bit
more. First came the lock-free algorithms, then the run-time code generation,
then quajects. The individual techniques were not always new in and of
themselves, but in Synthesis they were developed, elaborated, and implemented
throughout a fully functioning UNIX-style operating system. I still don't
understand all of Synthesis, but I understand enough now to realize that my
grad student friend was right: anyone really interested in operating systems
should read this thesis.

Other interesting mentions:

[https://news.ycombinator.com/item?id=19598385](https://news.ycombinator.com/item?id=19598385)

>Massalin's Synthesis kernel [10] stands out to me as another example that for
me crosses the boundaries and is both outstanding engineering and art in
challenging the ideas of how systems could be built in ways that make me look
at it just as much because of the beauty of it as because of the practical
ideas. (The main thing it brought was the idea of making the kernel adapt to
its clients by generating custom code for system calls; to me it is art
because it turned the idea of a kernel as something static on its head; and
conceptually we're still just scraping the very surface of dynamic code
generation in kernels that Massalin's thesis started playing with). There are
many works like that, where the specific implementations are irrelevant -
nobody uses the Synthesis kernel - but where the ideas are as important as any
expressed in more overt or "intentional" art.

[https://news.ycombinator.com/item?id=20507155](https://news.ycombinator.com/item?id=20507155)

>ggm: It feels to me like if you can do runtime code call checks, and confirm
which actual calls you make, then stripping the bits of libc and associated
libraries out, being left with only the strictly required calls, and then by
extension the syscalls, and then by extension the kernel elements, is actually
possible a lot of the time. So, library -> reduced library -> reduced calls ->
reduced syscalls -> reduced kernel state is a sequence or set or something, of
applied minimisations which can be done, if you can predict all the call paths
in your code and their dependencies.

>NelsonMinar: That sounds a little like the ideas in the Synthesis kernel, the
idea of the kernel JIT-compiling itself to optimize your code path.

------
monocasa
> Although Call Gates are the somewhat official way of implementing system
> calls in the absence of the more modern alternatives discussed below, I’m
> aware of no use except by malware.

I think 386BSD and early FreeBSD used call gates for syscalls.

------
TheDesolate0
Was reading this article while waiting for my kernel to build.

bwrawp giving my LFS nightmares, and I'm not super keen on user_namespace, but
suid isn't working either, FML

------
Paperweight
I wonder if mainstream CPUs will ever get back to being truly programmable
with software instructions, instead of running off and speculating on what you
might do next.

~~~
Gladdyu
Intel tried that with Itanium. It didn't go well.

~~~
wolfgke
Also keep the delay slots of MIPS in the back of your mind; they became a
burden as MIPS evolved.

~~~
saagarjha
They’re a burden to work with regardless of whether they match how pipelines
work.

------
trumbitta2
Generate thermal element.

Form arrow shape.

Discharge!

~~~
trumbitta2
I have no regrets :D

