
C Runtime Overhead - lunixbochs
http://ryanhileman.info/posts/lib43
======
justincormack
Static linking against musl libc is pretty minimal; not sure you really need
assembly:

    strace -tt /tmp/test
    15:38:33.245549 execve("/tmp/test", ["/tmp/test"], [/* 16 vars */]) = 0
    15:38:33.246436 arch_prctl(ARCH_SET_FS, 0x601070) = 0
    15:38:33.246617 set_tid_address(0x6010a0) = 17227
    15:38:33.246875 exit_group(0) = ?
    15:38:33.247060 +++ exited with 0 +++

~~~
lunixbochs
My solution in the final trace was a minimal (300 sloc) libc with a total of
two lines of assembly (to bootstrap syscalls), and it didn't make any syscalls
besides read, write, and exit. I've also toyed with the idea of exiting via a
fault (SIGSEGV, SIGILL, etc.), because cleanup might be faster on a forced
exit.

------
falcolas
Now imagine the speed if you could get rid of the syscall overhead. You would
have to write a lot more code (and it probably wouldn't run on Linux, since it
would need direct access to protected memory locations or I/O ports), but it
would be significantly faster.

Aah, the promise of the Exokernel[1].

[1] [http://u.cs.biu.ac.il/~wiseman/2os/microkernels/exokernel.pdf](http://u.cs.biu.ac.il/~wiseman/2os/microkernels/exokernel.pdf)

~~~
pjmlp
Well, when I started programming, C and Pascal were looked down on for their
execution speed, much like managed runtimes still are nowadays.

Serious application development for consumer computers was done in straight
Assembly. Anything higher than that was "scripting". :)

~~~
scott_s
Back when digital, electronic computers were new, assembly was called
"automatic programming".

~~~
smhenderson
Good old Autocode. Finally you didn't need to enter all instructions by
number; it was considered a great leap forward at the time...

 _The first autocode and its compiler were developed by Alick Glennie in 1952
for the Mark 1 computer at the University of Manchester and is considered by
some to be the first compiled programming language. His main goal was to make
the programming of Mark 1 machine, known for its particularly abstruse machine
code, comprehensible. Although the resulting language was much clearer than
the machine code, it was still very machine dependent._

From:
[https://en.wikipedia.org/wiki/Autocode](https://en.wikipedia.org/wiki/Autocode)

------
eloff
It's interesting to note that Go doesn't depend on libc. However, I rather
doubt the Go runtime bootstraps any faster than libc, so it probably wouldn't
help here. One consequence of skipping libc: Linux and libc are tightly
coupled, and some of the "syscall" behavior you depend on is actually
implemented as a wrapper in libc. This has caused some behavioral differences
between Go and C that have surprised the language developers.

~~~
ufo
can you give an example?

~~~
justincormack
Linux has very non-POSIX ideas about threads, which are basically processes
that share memory. libc is supposed to present a POSIX-compatible interface on
top of that. See [1] for some examples. Go simply exposes the raw syscalls.

[1] [http://ewontfix.com/17/](http://ewontfix.com/17/)

------
neonscribe
Micro-benchmarks lead to optimizing at the wrong level. In the real world, if
process startup time is a bottleneck in your application then you are creating
too many processes too quickly.

~~~
lunixbochs
You're absolutely right about the majority of real-world cases. A CTF level
is a contrived case, but that doesn't make this any less interesting for me.
This was mostly a challenge of "how fast can I run my existing
program?" and I think I nailed it pretty well with lib43 [0] (which is a
working _extremely minimal_ libc and crt0 replacement in about 300 lines of
code).

My favorite part is my x86_64 Linux syscall interface, in which I define a
short assembly stub (syscall.c) and simply list syscall names (syscall.list)
to wire them up. The calling convention between user-space functions and the
kernel was close enough that I was able to reduce the inline asm to two
instructions (`mov rax, <syscall number>; syscall`).

[0] [https://github.com/lunixbochs/lib43](https://github.com/lunixbochs/lib43)

------
101914
"...huck the whole thing and rewrite it in assembly."

What are the functions in libc that you truly need and use?

How many of those routines are you incapable or unwilling to rewrite yourself?

I have asked myself these questions on many occasions.

The cruft-laden operating system I use (which I imagine by many people's
standards is "bare bones") is hopelessly dependent on "libc". But I find that
the userland programs I truly need can be written using only a minimal subset
of the many functions that are compiled into it.

Could I write any of these in assembly and create my own "libasm" to do the
syscalls? This is a project I have contemplated many times.

To motivate myself to keep writing more assembly in the age of scripting
language du jour madness, I write a small program in assembly language and
then write the same program in C and then step through each compiled program
(e.g., with ald). Seeing some of the overhead is an effective motivator.

~~~
tptacek
You for the most part don't need to rewrite anything. You can take the source
code for most libc functions and compile them independently.

~~~
justincormack
Musl libc has every function in a separate file, so if you statically link you
just get what you use, modulo a number of functions that use each other. So
building it yourself is not going to be much better.

~~~
cnvogel
Just a remark: If you find it confusing to scatter your code over too many
small files with only one function each, look up "-ffunction-sections" (used
together with the linker's "--gc-sections") in the GCC manual.

------
oso2k
This is cool. I've been working on something similar[0]. It only takes a few
more lines to add some of the other useful libc/POSIX/SysV things. My eventual
goal is to rewrite asmutils'[1] httpd [2] in C and get a binary about 2-3x its
size (2-3K).

[0] [https://github.com/lpsantil/rt0](https://github.com/lpsantil/rt0)

[1] [http://asm.sourceforge.net/asmutils.html](http://asm.sourceforge.net/asmutils.html)

[2] [https://github.com/leto/asmutils/blob/master/src/httpd.asm](https://github.com/leto/asmutils/blob/master/src/httpd.asm)

------
tedunangst
Something funny about the numbers here. The assembly stub took 8ms, but the
whole program only takes 1ms?

~~~
darshan
That was a typo -- the assembly stub took about 0.8 ms as shown in the trace.

~~~
lunixbochs
Thanks, fixed.

