
Show HN: A minimal C runtime for Linux i386 and x86_64 in 87 SLOC of C - oso2k
https://github.com/lpsantil/rt0
======
lunixbochs
For x86_64 Linux only, here's a single-file crt0 (with arbitrary syscalls
working from C-land):
[https://gist.github.com/lunixbochs/462ee21c3353c56b910f](https://gist.github.com/lunixbochs/462ee21c3353c56b910f)

Build with `gcc -std=c99 -ffreestanding -nostdlib`. After -Os and strip, a.out
is 1232 bytes on my system. I got it to 640 bytes with `strip -R .eh_frame -R
.eh_frame_hdr -R .comment a.out`.

Starting at ~640 bytes, maybe you could come close to asmutils' httpd in
binary size. Failing that, take a look at [1]

You can get pretty far without a real libc, keeping in mind:

\- You probably want fprintf, and things can be slower without buffered IO
(due to syscall overhead)

\- `mmap/munmap` is okay as a stand-in allocator, though it has more overhead
for many small allocations.

\- You don't get libm for math helper functions.

Of course, you can cherry-pick from diet or musl libc if you need individual
functions.
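
To illustrate the buffered-IO point, here is a minimal sketch of an output buffer that batches small writes into one syscall. It assumes a hosted build where `write` comes from `<unistd.h>` (in a freestanding build it would be your own syscall wrapper). The names `struct outbuf`, `ob_put`, and `ob_flush` are hypothetical and not taken from the gist.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical fixed-size output buffer; the 4096-byte size is an arbitrary choice. */
struct outbuf {
    int fd;
    size_t len;
    char data[4096];
};

/* Write n bytes, retrying after short writes. Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *p, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w <= 0)
            return -1;
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Push everything buffered so far to the file descriptor. */
int ob_flush(struct outbuf *ob)
{
    int rc = write_all(ob->fd, ob->data, ob->len);
    ob->len = 0;
    return rc;
}

/* Append n bytes, issuing a syscall only when the buffer fills up. */
int ob_put(struct outbuf *ob, const char *s, size_t n)
{
    while (n > 0) {
        size_t room = sizeof ob->data - ob->len;
        if (room == 0) {
            if (ob_flush(ob) < 0)
                return -1;
            room = sizeof ob->data;
        }
        size_t chunk = n < room ? n : room;
        for (size_t i = 0; i < chunk; i++)
            ob->data[ob->len + i] = s[i];
        ob->len += chunk;
        s += chunk;
        n -= chunk;
    }
    return 0;
}
```

Many small `ob_put` calls followed by one `ob_flush` cost a single `write` syscall instead of one per call.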

[1] "A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux"
[http://www.muppetlabs.com/~breadbox/software/tiny/teensy.htm...](http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html)
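
For the `mmap/munmap` stand-in allocator, a minimal sketch could look like the following. It assumes a hosted Linux build with `<sys/mman.h>` available; the names `mm_alloc`, `mm_free`, and `MM_HDR` are hypothetical. Each allocation gets its own mapping, so the smallest request still costs a whole page and two syscalls over its lifetime, which is the overhead mentioned above.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under -std=c99 with glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Bytes reserved in front of each block to remember the mapping length.
 * 16 keeps the returned pointer 16-byte aligned (mappings are page aligned). */
#define MM_HDR 16

/* Allocate n bytes in a fresh anonymous mapping; returns NULL on failure. */
void *mm_alloc(size_t n)
{
    size_t total = n + MM_HDR;
    if (total < n)
        return NULL; /* size overflow */

    void *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    *(size_t *)base = total; /* stash the length for mm_free */
    return (char *)base + MM_HDR;
}

/* Release a block obtained from mm_alloc; NULL is ignored. */
void mm_free(void *ptr)
{
    if (ptr == NULL)
        return;
    char *base = (char *)ptr - MM_HDR;
    munmap(base, *(size_t *)base);
}
```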

~~~
acqq
And the good thing about statically linked executables is that the binary
can work on every Linux, as the syscall interfaces remain the same. The
critical point is not depending on anything platform-specific other than
the syscalls, but for some kinds of utilities it's doable.

~~~
lunixbochs
Pros:

\- Portable (as long as you don't need dlopen/dlsym). If you do need to dlopen
other libraries, you can run into issues on ARM mobile with mixed soft/hardfp
calling conventions. Otherwise you're fine if the ABI is solid.

\- No need to atomically update your system, as applications stand alone.

\- Slightly faster program startup time.

Cons:

\- Slightly larger file size (assuming you linked glibc... doesn't apply so
much to things like musl or lib43).

\- You can't link in OpenGL client libraries, so games will probably still
need the dynamic linker.

~~~
mmozeiko
On x86 and x86_64 you should be fine to use OpenGL via dlopen/dlsym, right?

------
kentonv
I like this a lot more than I should. :)

Somehow I've always found system calls far more pleasant to use than "section
3" C library interfaces, and it makes me sad that I'm pulling in some 3MB of
library code (libc + libm + pthread; not even counting the dynamically-loaded
stuff like nsswitch) that I mostly don't want.

Sadly as a C++ programmer I do at least need libgcc to implement exceptions,
which in turn likely pulls in glibc anyway. Sigh. (And I haven't completely
cut ties with libstdc++ yet, though I'm close...)

(And yeah, on a typical system these libraries are already resident anyway since
other apps are using them, so wanting to avoid them is mostly silly, but it
feels nice!)

~~~
justin66
> And yeah, on a typical system these libraries are already resident anyway
> since other apps are using them, so wanting to avoid them is mostly silly,
> but it feels nice!

In principle, shouldn't avoiding those libraries make code friendlier to the
instruction cache, even if the libraries are already in memory? (yes, I'm also
trying to justify the inherent niftiness of this...)

~~~
acqq
The major niftiness of static linking (it doesn't have to be minimally
minimal) on Linux is that it's the only way I know of (1) to have a nice
portable executable for _every_ 32-bit x86 Linux or for every x86_64 Linux, no
matter who made the distro and what he put inside of it.

At the moment, Windows has an advantage: even if some factions in Microsoft
don't like to support that (2), Windows has a single C crt dynamic library
which is compatible through the different versions of Windows, which can give
you an extremely small portable binary which also links the C crt library (the
starting size of such a binary is typically around 3 KB on Win64, if I
remember).

\----

(1) If somebody knows anything else please say!

(2) Modern versions of Visual Studio are intentionally made to not link to
that C crt lib. You have to jump through hoops.

~~~
maxerickson
I've crashed into an issue where "crtdll.dll" did not manage to get installed
on a fresh XP, so software relying on its presence died mysteriously.

First level support for that software didn't understand what I was going on
about.

------
rian
00_start.c is too hacked on x86_64. it'll work but you're getting a less
efficient binary since gcc has to assume _start is called like a normal C
function (e.g. it creates a preamble). you should just implement it in
assembly.

__init() itself also needs some work. the argument list is weird, linux pushes
all of argc, argv, and environ on the stack. why special case argc? also your
method of deriving argv and environ from the function argument's address is
extremely brittle, and i don't think it actually works on x86_64 (if it does,
that's really lucky). you aren't calculating envp using argc, so it's probably
wrong. you could get more efficient code from using __attribute__((noreturn)).
this would be better:

    
    
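        /* called from _start with the initial stack pointer */
        void __init(void *initial_stack) __attribute__((noreturn));
        void __init(void *initial_stack) {
            /* the kernel places argc in the first stack slot */
            int argc = *(int *) initial_stack;
            /* the argv pointers start one pointer-sized slot later */
            char **argv = ((char **) initial_stack) + 1;
            /* assert(!argv[argc]); */
            /* envp begins right after argv's terminating NULL */
            char **envp = __environ = argv + argc + 1;
    
            _exit(main(argc, argv, envp));
        }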

~~~
oso2k
What do you mean 00_start.c is too hacked on x86_64? As for efficiency of
_start, `objdump -d lib/00_start.o` yields

```
0000000000000000 <_start>:
   0:   48 89 e5                mov    %rsp,%rbp
   3:   48 8b 3c 24             mov    (%rsp),%rdi
   7:   48 8b 74 24 08          mov    0x8(%rsp),%rsi
   c:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
  10:   e8 00 00 00 00          callq  15 <_start+0x15>
  15:   c3                      retq
```

I only see one byte of inefficiency, the `retq`. Otherwise, it's exactly as
I've specified it.

What I found (by using gdb) is that the stack contains, in order, `argc` at
`[RSP+0]`, `__argv` at `[RSP+8]`, and `__envp` at `[RSP+16]`. I verified this
using 'frame' in gdb using the source (RSP) and dest addresses. Honestly, I was
surprised since it matched exactly what was presented to the ELF image on
i386.

Most of the libc's I've surveyed did something like what you've specified for
__init. However, gcc generated different code for -O3 & -Os, often breaking
one or the other optimization args, by modifying what was stored/pointed to
for __envp and/or __*argv. While argc, argv, envp, and envpc are specified
the

~~~
rian
good point, the gcc optimizer is smart enough to omit the preamble. it's still
hacked though. inline assembly is one thing, messing with compiler-owned
registers is another _especially without a clobber list_. btw why do you
prefix your x86_64 start code with "mov %rsp,%rbp"? "xor %ebp, %ebp" is more
idiomatic and efficient.

according to the abi
([http://www.x86-64.org/documentation/abi.pdf](http://www.x86-64.org/documentation/abi.pdf)),
the stack frame is set up like this:

    
    
        argc = [RSP+0]
        argv = RSP+8
        envp = RSP+8+8*argc+8
    

the same for x86 but replace 8 with 4. your code mirrors this but retrieves
argv by taking the address of the second function argument (because you pass
[RSP+8] to __init(), which is actually argv[0]). C provides no guarantees on
the stability of addresses of passed argument values between caller and
callee, so this makes your code subject to non-standard behavior. i can see
this working for x86 since values are passed through the stack but not for
x86_64 where values are passed through registers.
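
a minimal sketch of that layout in plain C, run against a fake stack image instead of the real one so it can be tested in an ordinary program. `struct start_args` and `parse_initial_stack` are hypothetical names, and the sketch assumes argc occupies one pointer-sized slot, as it does on both i386 and x86_64:

```c
#include <stdint.h>

/* Hypothetical container for the three values handed to main(). */
struct start_args {
    long argc;
    char **argv;
    char **envp;
};

/* sp points at the first stack slot, i.e. the slot holding argc. */
struct start_args parse_initial_stack(char **sp)
{
    struct start_args a;

    a.argc = (long)(intptr_t)sp[0];  /* argc = [SP+0]                        */
    a.argv = sp + 1;                 /* argv = SP + one slot                 */
    a.envp = a.argv + a.argc + 1;    /* skip argc pointers plus argv's NULL  */
    return a;
}
```

with the image `{2, "prog", "arg1", NULL, "FOO1=FOO1", "FOO2=FOO2", NULL}` this yields argv pointing at "prog" and envp pointing at "FOO1=FOO1", independent of pointer size.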

if you're seeing bugs between gcc -O3 and -Os it's likely due to that, or it's
due to improper use of inline assembly/clobbering registers.

~~~
oso2k
Point taken. I'll fix up the clobbers and noreturns.

And you're right about me having issues with calculating envp. There are
multiple examples of envp=argv+argc+1. But when I examined the stack frames,
that led to miscalculations and with -Os envp was getting clobbered. Try
running t/test.exe, what I have now works.

~~~
rian
it only works because you're skipping over environment variables:

    
    
        char **argv = &stack;
        char **envp = __environ = argv + ( 1 * sizeof( void* ) );
    

let's say argv = 0x8, then according to this code, on x86_64, envp = 0x48 (0x8
+ sizeof(void *) * sizeof(char *)). here's a sample stack where argc = 1:

    
    
        stack:
        [0x0]  = 1 (argc)
        [0x8]  = "program path"
        [0x10] = 0
        [0x18] = "FOO1=FOO1"
        [0x20] = "FOO2=FOO2"
        [0x28] = "FOO3=FOO3"
        [0x30] = "FOO4=FOO4"
        [0x38] = "FOO5=FOO5"
        [0x40] = "FOO6=FOO6"
        [0x48] = "FOO7=FOO7"
        ...
    

since you set envp to 0x48 it now points to FOO7=FOO7, you've inadvertently
skipped FOO1-FOO6. if argc = 2, then envp would point to FOO6 and you skipped
FOO1-FOO5.

try this with your code, pass 8 arguments to your test. the environment will
point to the last element in argv and then terminate, completely missing the
actual environment. again, that's only the behavior on x86, on x86_64, passing
any argument will cause a segfault.
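
the scaling is easy to check in isolation. a minimal sketch, with hypothetical helper names, that returns how many bytes each expression moves past argv (the buggy one needs an array of at least `sizeof(void *) + 1` slots to stay in bounds):

```c
#include <stddef.h>

/* Byte distance from argv to the envp computed by the expression quoted above.
 * Pointer arithmetic on char ** already scales by sizeof(char *), so adding
 * sizeof(void *) advances sizeof(void *) slots, not one slot. */
long envp_offset_buggy(char **argv)
{
    char **envp = argv + (1 * sizeof(void *));
    return (long)((char *)envp - (char *)argv);
}

/* Byte distance when skipping argc pointers plus the terminating NULL. */
long envp_offset_fixed(char **argv, int argc)
{
    char **envp = argv + argc + 1;
    return (long)((char *)envp - (char *)argv);
}
```

on x86_64 the first returns 64 (0x40), which is how argv = 0x8 turns into envp = 0x48; the second returns 16 for argc = 1, i.e. envp = 0x18.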

~~~
oso2k
I think I've fixed it [0]. Ran into the issue where gcc was kindly re-aligning
the stack to 16 bytes on i386 for no good reason (stack was already
aligned?!?!). But that's fixed. Just hope gcc on i386 doesn't stop doing `sub
ESP, 0x1C` upon entry to _start.

[0]
[https://github.com/lpsantil/rt0/blob/master/lib/00_start.c](https://github.com/lpsantil/rt0/blob/master/lib/00_start.c)

------
mmastrac
Nice! As part of some work that I was doing ages ago, I had to build myself a
custom libc to statically link executables that would run on Android and WebOS
since they are both essentially ARM Linux under the hood.

You can learn a lot by writing yourself a libc. Even building a simple/stupid
malloc from scratch is a learning exercise.

~~~
jeffreyrogers
I was shocked how complicated the glibc malloc is. Here's the source for
anyone interested:
[http://repo.or.cz/w/glibc.git/blob/HEAD:/malloc/malloc.c](http://repo.or.cz/w/glibc.git/blob/HEAD:/malloc/malloc.c)

~~~
jacquesm
It's relatively easy to make an allocator, it is really hard to make an
efficient one that works well across a large variety of use cases. That's also
why it can pay off big time if you know more about your use case than malloc
does (and you usually do) to roll your own allocator.
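
As an illustration of how small a use-case-specific allocator can be, here is a minimal sketch of a bump (arena) allocator, assuming the program can free everything at once. The names `struct arena`, `arena_init`, `arena_alloc`, and `arena_reset` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical arena: hands out slices of one caller-provided buffer. */
struct arena {
    unsigned char *base;
    size_t cap;
    size_t used;
};

void arena_init(struct arena *a, void *buf, size_t cap)
{
    a->base = buf;
    a->cap = cap;
    a->used = 0;
}

/* Returns n bytes at the next 16-byte offset, or NULL when the arena is full.
 * Returned pointers are 16-byte aligned only if the buffer itself is. */
void *arena_alloc(struct arena *a, size_t n)
{
    size_t start = (a->used + 15u) & ~(size_t)15u;
    if (start > a->cap || n > a->cap - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}

/* Frees every allocation at once; there is no per-object free. */
void arena_reset(struct arena *a)
{
    a->used = 0;
}
```

The trade-off is the one described above: no bookkeeping and no fragmentation handling, which only works because the caller knows all objects die together.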

------
oso2k
My eventual goal is to rewrite asmutils'[0] httpd [1] in C using librt0 and
get a binary about 2-3x in size (2-3K). Malloc unnecessary.

[0]
[http://asm.sourceforge.net/asmutils.html](http://asm.sourceforge.net/asmutils.html)

[1]
[https://github.com/leto/asmutils/blob/master/src/httpd.asm](https://github.com/leto/asmutils/blob/master/src/httpd.asm)

------
__gcmurphy
Nice. I went down this rabbit hole a while ago. Didn't get very far though.
Have fun!

(my crappy code -
[https://bitbucket.org/gcmurphy/libc/](https://bitbucket.org/gcmurphy/libc/))

------
olalonde
_Somewhat_ related: a minimal OS that just prints "Hello world" to the screen
[https://github.com/olalonde/minios](https://github.com/olalonde/minios)
(interesting code is in kmain.c and loader.s). Wrote it while going through
[http://littleosbook.github.io/](http://littleosbook.github.io/) (which is
great by the way if you are interested in learning a bit about OS
development).

------
chappar
I am not familiar with embedded asm. Can someone explain what the following
line does?

"register long r10 __asm__( "r10" ) = a3"

~~~
tjgq
It declares a variable named r10 and instructs the compiler to store it in the
r10 CPU register. It's a GCC extension; the farthest you can get in standards-
compliant C is

    
    
        register long r10 = a3;
    

but the register keyword is advisory only (the compiler is free to ignore it)
and you cannot specify the exact register you want to be used.

Reference: [https://gcc.gnu.org/onlinedocs/gcc/Local-Reg-Vars.html](https://gcc.gnu.org/onlinedocs/gcc/Local-Reg-Vars.html)
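
For context, a minimal sketch of where such a line typically sits: a 4-argument syscall wrapper for x86_64 Linux, where the kernel expects the fourth argument in r10 and GCC has no single-letter constraint for that register. The names `raw_syscall4` and `current_sigmask` are hypothetical, and the non-x86_64 branch is only a fallback so the sketch builds elsewhere.

```c
#include <sys/syscall.h> /* SYS_* numbers only */

#if defined(__x86_64__) && defined(__linux__)
/* Returns the kernel's raw result: >= 0 on success, -errno on failure. */
long raw_syscall4(long n, long a0, long a1, long a2, long a3)
{
    long ret;
    /* Pin a3 to r10 so the "r" constraint below can only pick that register. */
    register long r10 __asm__("r10") = a3;

    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(n), "D"(a0), "S"(a1), "d"(a2), "r"(r10)
                      : "rcx", "r11", "memory"); /* syscall clobbers rcx and r11 */
    return ret;
}
#else
#include <unistd.h>
/* Other targets: defer to libc's syscall(), which reports errors as -1. */
long raw_syscall4(long n, long a0, long a1, long a2, long a3)
{
    return syscall(n, a0, a1, a2, a3);
}
#endif

/* Example use: read the current signal mask without changing it.
 * The last argument is the kernel's sigset size, 8 bytes on 64-bit Linux. */
long current_sigmask(unsigned long *out)
{
    return raw_syscall4(SYS_rt_sigprocmask, 0, 0, (long)out, 8);
}
```

If the fourth argument landed in any register other than r10, `rt_sigprocmask` would see a bogus size and fail, which makes it a handy check.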

------
pc2g4d
What does "SLOC" stand for in this context?

~~~
NhanH
Source lines of code

~~~
pc2g4d
Why not just "LOC" (Lines of Code)? To exclude Makefiles and the like?

~~~
colechristensen
Definitions aren't all that solid, but it's nice to exclude whitespace and
comment lines sometimes.

~~~
oso2k
Yes and it also usually excludes things like macros & define'd constants.

------
pjmlp
Why not just do

gcc -static -Os -fdata-sections -ffunction-sections -Wl,--gc-sections ...

or similar, in the compiler of choice?

------
brudgers
I'll admit my ignorance down at this level. Can someone explain what this does
and how it can be used?

~~~
ChuckMcM
libc is C's "standard" library. It has a lot of stuff in it that some
programs, especially small single purpose ones, don't need. So when a very
simple program is linked into an executable, it has a bunch of extra stuff
brought along for the ride. If you write helloworld.c (the canonical 4-line
program in C) and link it on Ubuntu 14.04 LTS it is 6240 bytes stripped, 8511
bytes unstripped. With this version of libc you can make it 1/5th that size.

Not a huge deal on large systems with giant disks and memory but a lot of ARM
Linux users are rediscovering the joys of small binaries, especially on
limited space eMMC storage for the kernel and all the programs and libraries.

~~~
oso2k
I'd be careful calling what I've done a libc. It's really just a little bit of
startup code, and syscall for i386 & x86_64.

~~~
hornd
It's pretty awesome regardless, nice job.

~~~
oso2k
Thank you.

------
101914
Ancient history question: What was the main problem the creators of the ELF
were trying to address at the time the ELF was adopted?

~~~
DougMerritt
There's a brief explanation here:

[http://en.wikipedia.org/wiki/COFF#History](http://en.wikipedia.org/wiki/COFF#History)

(ELF replaced COFF)

~~~
101914
Shared libraries?

Forwarding to the present day, with ample inexpensive memory and secondary
storage space available, what if for some strange reason I do not use shared
libraries in my project? Is there still a need for ELF? Why or why not?

~~~
DougMerritt
I'm not following your intent. ELF is the current modern standard file format
for specifying executables. It's just how you store executable machine code in
a way that Linux/Unix systems know how to load the program into RAM and
begin execution.

As such, generally speaking, yes, ELF is needed for all machine code
executables regardless of minor details like shared libraries.
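
As a small illustration of "a way that systems know how to load the program", here is a minimal sketch that recognizes an ELF file from its first bytes. It relies only on the identification bytes at the start of the file (the magic `0x7f 'E' 'L' 'F'`, then a class byte where 1 means 32-bit and 2 means 64-bit); the function name `elf_class` is hypothetical.

```c
#include <stddef.h>

/* Returns 32 or 64 for an ELF image of that class, 0 for an ELF image with
 * an unknown class byte, and -1 if the buffer does not start with the magic. */
int elf_class(const unsigned char *buf, size_t len)
{
    if (len < 5)
        return -1;
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return -1;
    if (buf[4] == 1)
        return 32;
    if (buf[4] == 2)
        return 64;
    return 0;
}
```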

Before ELF, COFF was used. Before COFF, Unix used multiple generations of so-
called "a.out" format.

Except as an exercise in retro-computing, one can no longer use those older
file formats because modern tools no longer support them.

If you are unhappy with the complexity of generating ELF, you might look into
whatever Fabrice Bellard did with his remarkably tiny "tcc" C compiler.

Incidentally, in recent years more and more people are starting to ask "do we
_really_ need shared libraries in today's era of huge amounts of RAM?"
Whatever the answer, it's a reasonable question.

~~~
101914
Your last line more or less captures my "intent".

The idea is that if someone can reconsider the need for shared libraries, then
could she also reconsider the need for ELF?

But maybe there are other pertinent reasons that ELF exists, besides shared
libraries?

~~~
DougMerritt
I already answered that very directly:

> ELF is the current modern standard file format for specifying executables.
> It's just how you store executable machine code in a way that Linux/Unix
> systems know how to load the program into RAM and begin execution.

There is no other way to implement executables. And if there were, it would
look just like ELF, from twenty thousand feet.

ELF has nothing in particular to do with shared libraries, and shared
libraries still exist if ELF does not, they just have some implementation
difficulties in some cases.

It's like saying, what if you didn't need any networking code, would you still
need ELF? Yes you would. What if you didn't need GUI code? Yes, you still need
executables.

You seem to have gotten a misconception about ELF really fixed in your mind.
It is the one and only way to store executables. Without it, you have no
tools, no commands, no apps, no way to run any software, no nothing.

Shared libraries are a small side feature and don't have very much to do with
ELF, despite the fact that ELF made it easier to implement them.

Also, I mentioned that people have questioned the need for shared libraries,
but I fail to see any motivation whatsoever for why anyone might want to get
rid of ELF.

~~~
101914

        What is ELF?
    
        ELF is a binary format designed to support dynamic objects
        and shared libraries. On older COFF and ECOFF systems,
        dynamic support and shared libraries were not naturally
        supported by the underlying format, leading to dynamic
        implementations that were at times complex, quirky, and
        slow.
    

source: netbsd.org

However you said "ELF has nothing in particular to do with shared
libraries..."

Perhaps the concept of a binary format has nothing in particular to do with
support for shared libraries. Perhaps that is a design choice. And perhaps ELF
is merely one approach to the design of a binary format.

If true, then this choice makes a lot of sense in a world where there is a
compelling need to use shared libraries, such as the computing world 20 years
ago. But I am not so sure the same holds true in a world where the need for
shared libraries is "questionable".

You said "Shared libraries are a small side feature..."

If that is true (cites to articles appreciated), then I would be curious what
are ELF's "main" features, and how these differ from the formats that preceded
ELF (e.g., a.out). Also I would be curious how frequently ELF's "minor"
features have historically been used in practice, as opposed to just adding
complexity and taking up space.

------
jestinjoy1
For the uninitiated what does this program do and what is its importance?

~~~
cnvogel
Attempt at an explanation:

To get a C-program to run, even an empty "int main(){ return 0;}" needs some
supplemental code to set things up the way your main() function expects.

This supplemental code is what the github repository provides. You can then
write very small, statically linked, programs without pulling in most of the
C-library and other conveniences, and you will only be able to use the "raw"
interfaces your kernel provides.

E.g. you'll first have to ponder what the "write()" system call actually is,
then use write(1,"Hello\n",6); instead of the wrappers and conveniences of the
C library such as printf() (which of course additionally gives you formatting,
buffered I/O, ...).
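
A minimal sketch of that style, assuming a hosted build where `write` comes from `<unistd.h>` and stands in for the raw syscall wrapper a runtime like this would provide. The helper names `str_len` and `put_str` are hypothetical.

```c
#include <stddef.h>
#include <unistd.h>

/* Without libc there is no strlen(), so count the bytes by hand. */
static size_t str_len(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Write a NUL-terminated string to fd with a single write() call.
 * Returns the number of bytes written, or a negative value on error. */
long put_str(int fd, const char *s)
{
    return (long)write(fd, s, str_len(s));
}
```

`put_str(1, "Hello\n")` is then equivalent to `write(1,"Hello\n",6);` but with no formatting and no buffering, which is exactly what printf() would otherwise add.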

