
Writing C software without the standard library - andreaorru
http://weeb.ddns.net/0/programming/c_without_standard_library_linux.txt
======
userbinator
_A value in the range between -4095 and -1 indicates an error, it is -errno._

The syscall/errno stuff has always seemed unusual, inelegant, and inefficient
--- instead of just returning a negative error code directly, the function
returns the vague "an error has occurred" -1, and you have to then check errno
separately after that. It only adds insult to injury when you realise that the
kernel itself isn't doing it, but the syscall wrappers. And thanks to POSIX
standardising this mechanism, the alternative will likely never get much
adoption; of course, if you write your own syscall wrappers like this article,
then you can skip that bloat.

 _For now this guide is linux-only, but I will be writing a windows version
when I feel like firing up a virtual machine._

Unfortunately the Windows syscalls are not officially documented and even less
stable than on Linux, changing even between service packs.

[http://j00ru.vexillium.org/ntapi/](http://j00ru.vexillium.org/ntapi/)

[http://j00ru.vexillium.org/ntapi_64/](http://j00ru.vexillium.org/ntapi_64/)

At least on Linux the first few (i.e. the oldest, most common and useful)
syscalls have not really moved around over the years:

[https://filippo.io/linux-syscall-table/](https://filippo.io/linux-syscall-table/)

~~~
masklinn
> At least on Linux the first few (i.e. the oldest, most common and useful)
> syscalls have not really moved around over the years

IIRC raw syscalls are an officially supported kernel API, that's why you can
have alternate libc implementations (e.g. musl), and Linux is an oddity in
that, on most systems even if the syscalls are fairly stable there are no
actual guarantees with respect to them, and the only officially supported
interface to the kernel is the standard library. OSX does not allow statically
linking libSystem for that reason for instance.

~~~
sebcat
golang also targets syscalls instead of the C standard library (or other
libraries except for windows, and maybe others), which is interesting on e.g.,
Darwin:

[https://github.com/golang/go/issues/17490](https://github.com/golang/go/issues/17490)

~~~
bogomipz
>"golang also targets syscalls instead of the C standard library"

What would have been the reasons for targeting syscalls directly instead of
the C standard library?

~~~
masklinn
> What would have been the reasons for targeting syscalls directly instead of
> the C standard library?

Since the standard library probably assumes (and requires) a C stack, linking
against the standard library would require cgo (or some other specific
workaround) on non-linux platforms.

------
vxNsr
He claims that your code will be easy to port but then goes straight to Linux
system calls.

Still, I like the idea. This is something that should be covered in a CS 102
type course. I know way too many CS guys who have no idea how to debug, let
alone how their code is implemented.

~~~
lolisamurai
Suppose you want to port to some architecture not supported by libc. If you
were using libc you would have to find a replacement that works and targets
that arch or port libc yourself. If you wrote everything from scratch,
instead, you just have to read the specification and add support to your code.
That's what I meant.

Of course, if your target archs are all supported by libc, porting is much
easier with libc.

~~~
time4tea
Well, not if you expect that you will have exactly the same time/space
behaviour and pre/post conditions.

This is why lots of embedded, secure and defense software doesn't use standard
libraries, and printf will instead be called sio04583 or something....

Also.. Don't forget that you can get gcc to get rid of unused code when
compiling static executables, and use sstrip (yes, two 's' - it's a different
program) to strip even more, if it's an ELF binary...

------
leeter
A few thoughts:

* The space savings are moot, as other processes such as daemons are going to load libc into virtual memory anyway, and the kernel shares libc's pages among all processes.

* This adds a lot of LOC you have to maintain instead of shoving it off on the compiler/libc vendor, which increases the chance of bugs.

* This will prevent the use of VDSOs to optimize high volume system calls like gettimeofday.

* It's still probably good to know how these happen, even if you're not doing them yourself.

* The only place this would really see benefit is in a single-process environment; however, in those cases I would suggest a unikernel anyway for simplicity's sake.

~~~
taeric
I'd be interested to see benchmarks to consider point one. At face value, I
fully agree with your point and doubt it has any benefit. However, it does
mean less has to be paged in for your program to run. I would be curious
whether this has an odd cache friendliness for some applications.

The tooling answer to this, it seems, would be to support statically linked
libraries. But again, I would want to see numbers before personally worrying
about this.

~~~
leeter
So a few things to consider in regards to point 1:

* Your executable is going to be loaded as a whole page regardless of how large it is; on most platforms this means you'll need at least 4k of user VM.

* You'll need a page table, which has its own overhead. If someone was going to push back on the libc assertion I'd expect it here, as the PTEs for libc and the VDSOs cannot be shared between processes (as far as I'm aware).

* I would expect it, in theory, to RUN faster, assuming it was a small toy program like the example, because there is less work to be done even with shared pages.

~~~
taeric
Right, my question is along the lines of avoiding the pages of libc. An easy
question here would be how many pages libc takes up. I'm assuming not many,
but more than one.

I think there is a strong argument that these pages are often already in
memory from everyone using them. However, if the functions you used from libc
would have fit in the pages you were already using for your application, I
could imagine some benefit.

I continue to stress, though, that this is just speculation. Numbers would be
the first thing I would have to collect before acting on this. (And I hope it
doesn't sound like I am tasking you or anyone else with this. That is not my
intent.)

~~~
leeter
At least two is the best answer I can give without specifying a specific libc
and architecture. Something to think about is that the binary size of libc is
only half the story. Even if the executable portion of libc fits in a single
page, a libc implementation has a lot of per-thread and per-process statics it
holds onto.

A good libc developer could actually put these into separate pages based on
how often they change. In other words, if a static is only ever set once, then
coalesce it into a page with other statics that are only set once and are not
process dependent.

Why that's important: because of fork. When fork creates a new process, it
marks the parent process's pages read-only and then performs copy-on-write
when they are modified. In theory you can share both the binary and some of
the statics between all the processes.

------
dvfjsdhgfv
The guy is definitely a fan of old-school minimalism:
[http://weeb.ddns.net/0/articles/modern_software_is_at_its_wo...](http://weeb.ddns.net/0/articles/modern_software_is_at_its_worst.txt)
I have to say I miss the old days of Gopher, too. It was so much easier to
focus on the content back then.

~~~
badsectoracula
I find myself agreeing with him 100%. Not just on Gopher (although i did write
a Gopher client some time ago -
[http://runtimeterror.com/tools/gopher/](http://runtimeterror.com/tools/gopher/))
but on the entire rant about wasting resources, UIs that waste screen real
estate and become unusable in smaller resolutions, fonts that only look good
with anti-aliasing and have weird misplaced pixels with anti-aliasing disabled
(which i also do), websites that make reading things harder and waste
unnecessary resources on bling and fluff with all those javascript frameworks
slowing them down, pagination often being replaced with "endless scrolling"
which makes it hard not only to skip large bits of content but also hard to
see how much content is there in the first place. And of course the newest
worst trend of all - using an entire web browser as a UI framework for a text
editor (i mean honestly, how did people decide that HTML and CSS are the best
technologies to use as the foundation for user interfaces?).

The only bit i'd disagree would be with games, at least on AAA games since i
have some experience there and -at least at the engine level- there is still a
lot of low level wizardry being done there.

~~~
lj3
> The only bit i'd disagree would be with games, at least on AAA games since i
> have some experience there and -at least at the engine level- there is still
> a lot of low level wizardry being done there.

He explicitly calls out people "writing poorly performing code on top of pre-
made engines". I'm pretty sure he's aiming most of his ire at non-indies using
Unity3D or Unreal.

~~~
omegaham
Personally, I don't mind this at all, mostly because they wouldn't be writing
_anything_ if it weren't for Unity and Unreal.

~~~
Retra
They could be decomposable libraries rather than singular massive engines.

------
nwmcsween
The comment section where gcc puts in ident info can be omitted with -fno-
ident and syscall(2) is usually a very thin wrapper[0]. If you follow the musl
syscall(2) it simply maps errors to errno[1] and uses the fancy count-args-in-
macro[2] to call off the respective $arch/syscall_arch.h[3] syscall$n numbered
functions.

[0] [https://git.musl-libc.org/cgit/musl/tree/src/misc/syscall.c](https://git.musl-libc.org/cgit/musl/tree/src/misc/syscall.c)

[1] [https://git.musl-libc.org/cgit/musl/tree/src/internal/syscal...](https://git.musl-libc.org/cgit/musl/tree/src/internal/syscall_ret.c)

[2] [https://git.musl-libc.org/cgit/musl/tree/src/internal/syscal...](https://git.musl-libc.org/cgit/musl/tree/src/internal/syscall.h)

[3] [https://git.musl-libc.org/cgit/musl/tree/arch/x86_64/syscall...](https://git.musl-libc.org/cgit/musl/tree/arch/x86_64/syscall_arch.h)

~~~
justincormack
Highly recommend reading the Musl source code if you want to find out how
things work. Don't bother trying to look at Glibc.

~~~
shakna
Cannot agree more.

glibc's source feels like archeology: there's so much history and remnants of
bygone eras.

musl feels like a structured, well-engineered and specified piece of
architecture.

This isn't a knock against glibc. But it was grown, not built.

musl and the team's great documentation have been incredibly handy when I've
been building tightly constrained applications.

------
beeforpork
If the asm was written a little more cleverly, the syscalls would avoid almost
all moves, because the compiler'd put everything in place:

    
    
      _syscall5:
        mov %r9, %r10
      _syscall3:
        mov %rcx, %rax
        syscall
        ret
    

And then:

    
    
      extern unsigned long _syscall3(
        unsigned long, unsigned long,
        unsigned long, unsigned long);
    
      extern unsigned long _syscall5(
        unsigned long, unsigned long, unsigned long,
        unsigned long, unsigned long, unsigned long);
    
      #define syscall0(NUM)             _syscall3(0,0,0,NUM)
      #define syscall1(NUM,A)           _syscall3(A,0,0,NUM)
      #define syscall2(NUM,A,B)         _syscall3(A,B,0,NUM)
      #define syscall3(NUM,A,B,C)       _syscall3(A,B,C,NUM)
      #define syscall4(NUM,A,B,C,D)     _syscall5(A,B,C,NUM,0,D)
      #define syscall5(NUM,A,B,C,D,E)   _syscall5(A,B,C,NUM,E,D)

~~~
ant6n
Is that going to be inlined properly?

------
oso2k
A little self promotion but mostly because it addresses some of the other
commenters concerns about malloc (or the lower-level api around sbrk): a
couple years ago I wrote rt0 [0], a small (mostly minimal) C runtime for i386
& amd64 that makes it easier to replace libc & crt0 (as long as you have the
kernel headers installed). Also, as part of the examples, I wrote wrappers
around the sbrk syscall. Pretty easy to do and all documented in the repo. I
expect to eventually port the lib to arm (raspberry pi) and aarch64. There's
also lots of references to other small C runtimes. I'll be adding this one as
well.

[0] [https://github.com/lpsantil/rt0](https://github.com/lpsantil/rt0)

------
pawadu
I think this is unnecessary when you've got <stdint.h>:

    
    
        typedef unsigned long int  u64;
        typedef unsigned int       u32;
        ...
    

If you define your own types like this, you may need to revise them when you
switch architectures or even compilers.

Now you could argue that this is part of the standard library, but I actually
see it as a part of the standard C language.

~~~
DSMan195276
You're getting down-voted (Which is unnecessary IMO) but you're really not
wrong. `gcc` itself provides `stdint.h`, it's not actually part of libc - or
rather, you can use it without actually having a libc in place. Generally this
is a good move, because you can always write a `stdint.h` replacement on
arch's that don't have one, but on ones that do you're guaranteed to get the
types correct.

~~~
sincerel
I use stdint.h too, but I'm honestly curious if there is any _common_ platform
around today where one of the following asserts fails:

    
    
        #include <assert.h>
    
        int main() {
    
            assert(sizeof(signed char) == 1);
            assert(sizeof(short)       == 2);
            assert(sizeof(int)         == 4);
            assert(sizeof(long long)   == 8);
    
            return 0;
        }
    

I'm not interested in the language lawyering, because yes I know the standard
provides more freedom to compilers. I just think those definitions are very
universal for any real computer that would otherwise run my software. Please
don't bring up Windows 3.1, that's about as relevant to most of us as a PDP-11.

And for what it's worth, using typedefs based on the above provides more
readable printf strings. This is hideous:

    
    
        #include <inttypes.h>
        #include <stdio.h>
    
        int main() {
    
            int64_t portable = 123;
            printf("Ugly: %" PRId64 "\n", portable);
            
            return 0;
        }
    

Where this is acceptable:

    
    
        #include <stdio.h>
    
        int main() {
    
            long long palatable = 123;
            printf("Better: %lld\n", palatable);
            
            return 0;
        }

~~~
jblow
I think int is 8 bytes on the PS4. (I just got bitten by this...)

~~~
I_deny_it
How does the PS4 declare a 4-byte int? If that's "short", is there a way to
get a 2-byte int?

~~~
DSMan195276
I haven't programmed on the PS4, but an 8-byte `int` sounds very suspect and
fairly unlikely (but not impossible). That said, it wouldn't be a _huge_ issue.
`short` could either be a 2-byte or 4-byte int (Either would be standards
compliant), and `char` would presumably still be byte-sized (Not doing so
would be a fairly big issue to deal with).

That leaves out either the 2-byte or 4-byte int from the standard data-types,
but you can gain that back by simply using a compiler attribute or compiler-
defined type to allow access to it. While that sounds non-standard, it really
wouldn't be that bad because it could simply be used in `stdint.h` to expose
the standard `int16_t` and `int32_t` types, which could be used like normal.

------
nathan_f77
> When we learn C, we are taught that main is the first function called in a C
> program. In reality, main is simply a convention of the standard library.

Well, I've already learned something new. I assumed that convention was from
the compiler. This is a great resource.

------
gibsjose
While this seems mainly useful as an academic exercise, the `printf "#include
<unistd.h>" | gcc -E - | grep size_t` bit to easily grep in header files was
worth the read.

~~~
112233
indeed. under bash, it can be shortened to

    cpp <<< "#include <stdio.h>" | grep size_t

which is super convenient

------
rikkus
It's interesting to read the sources[1] of lots of djb's[2] code, as he often
works around problems with (or perhaps dislikes the style of) standard
libraries by re-implementing parts.

[1]
[https://github.com/abh/djbdns/blob/master/str_len.c](https://github.com/abh/djbdns/blob/master/str_len.c)
[2] [http://cr.yp.to/djb.html](http://cr.yp.to/djb.html)

~~~
csl
On a side note, isn't the choice of exactly _four_ unrolls very architecture
specific? As in, it works, but may be sub-optimal for your specific machine.
I've done the exact same thing myself, and IIRC its performance varied a lot
between which ISA it was compiled for.

This is _almost_ what Duff's device solves, except then you need to know the
length beforehand.

~~~
rikkus
Absolutely. It's a (possible) optimisation that is either based on evidence
(seems likely, because DJB) or hope. Actual behaviour is impossible to predict
on untested platforms.

My assumption is that DJB tested this locally and found enough of a speedup
that it was worth it, considering the very low added complexity and risk of
major degradation / defects on untested platforms.

------
capnfantasic
Fantastic until you need to malloc. You're reimplementing libc, but at least
you know what's going on at every level.

~~~
std_throwaway
Unless you also need to free, it's pretty simple.

~~~
capnfantasic
Of course.

When I wrote that comment I asked myself should I have written "malloc" or
"malloc/free" \- surely one implies the other.

~~~
dom0
Oh well, with 16 GB RAM even in laptops, who needs free anymore? Just restart
the program. It's simpler anyway.

~~~
makapuf
Ironic for a minimalist/anti-bloat pamphlet to start with "with 16 GB ..."

------
coreyp_1
It's posts like this (and the accompanying comments) that make me realize how
much I still have left to learn!

One of the reasons that I love HN is how informative you all are!

------
lolisamurai
The server is getting hit pretty hard right now, did not expect this much
traffic. In the meantime, you can find a bbcode mirror of the guide here:
[https://ccplz.net/threads/writing-c-software-without-the-sta...](https://ccplz.net/threads/writing-c-software-without-the-standard-library-linux-edition.69623/)

------
kriro
Some of the reasons that he mentions for avoiding the standard library could
also be mitigated by using another library like dietlibc (I played around with
it back in the day, last release seems to be from 2013):
[https://www.fefe.de/dietlibc/](https://www.fefe.de/dietlibc/)

------
partycoder
This is required if you do systems programming (e.g: kernel development).

~~~
lokedhs
Or demo coding on old hardware. I sometimes write demos for the Atari ST
(68000-based home computer launched in 1985), and the modern way of doing that
is to develop on a modern computer, and cross-compile to a native ST
executable.

The main loops are all assembler, but the support code is in C, but the C code
is used as a more expressive assembler, and linking with libc requires way too
much memory.

All this means that I don't even have things like memcpy() available. In a way
it's a quite liberating way to program, since you are in full control of the
hardware.

I guess yesterday's computers is today's embedded hardware.

~~~
DaiPlusPlus
I'm curious what the use case of memcpy is in highly-optimised software. Are
there any scenarios where copying bytes is better than using a char*+length
tuple?

~~~
unwind
There can be hardware-driven requirements that force you to simply have data
in a particular place in memory, and if you want that data to hang around you
might need to manually move it somewhere else.

There's also the case where your API accepts a pointer and a size, but you
don't want to have lingering pointers into the caller's memory, so you have to
copy the data over to the "inside" of the API. This kind of design is perhaps
less common in demo software, but certainly plausible in embedded products
which at least _try_ to be somewhat optimized.

~~~
lokedhs
That is exactly it. For example, on the Atari ST you display graphics by
copying the bitmaps to the screen address.

Much of the C code is used during precomputation of data before the actual
time-critical code is run. This involves copying lots of data in order to set
it up so that as little computation as possible is performed in the actual
time-critical parts.

------
flukus
> Executables are incredibly small (the http mirror server for my gopherspace
> is powered by a 10kb executable).

Is this ever a real issue, even on an embedded system from the last 20 years?

~~~
shakna
AVR atmega based devices generally have about 2kb SRAM, and 32kb flash. Maybe
2kb EEPROM.

~~~
m_eiman
This is true, but the stdlib provided by the compilers aimed at these chips is
usually very bare bones and size-optimized to begin with. So it's likely hard
to save much space by reimplementing subsets of it.

~~~
shakna
True, but you often end up avoiding parts of the stdlib like malloc anyway,
because they tend to be heavy handed on the board.

(I've never needed to remove stdlib yet).

------
nitwit005
I've tried this myself. What you'll run into is that you tend to need a few
things that are non-trivial:

* An implementation of malloc/free

* Functions to parse and print floats (somewhat system dependent)

* Assembly implementations of any trigonometric functions used

While there is code that goes to that effort (The Go runtime comes to mind),
it's quite a pain for "normal" code.

------
bogomipz
I had a question about this sentence:

"It's often necessary to either push useless data or simply align the stack
pointer when the pushed values don't happen to be aligned."

That's kind of hand-wavy. How do we "simply align the stack pointer"?

~~~
JoeAltmaier
On most architectures, by decrementing it appropriately. E.g. subtract 4 to
align from a 4-byte to an 8-byte boundary.

~~~
bogomipz
Thanks for the responses.

------
sytelus
It would be nice to wrap this up in a lightweight libc. There is uSTL for C++:
[https://msharov.github.io/ustl](https://msharov.github.io/ustl)

------
jxy
It's a very good learning process. But once your project scales up, you are
essentially writing your own libc.

And there is no portability: it only works with the specific architecture's
calling convention and the specific C compiler.

------
dispose13432
> xor rbp,rbp /* xoring a value with itself = 0 */

Is this faster than a (const) mov ?

~~~
capnfantasic
Traditionally yes. On the latest CPUs - who knows.

~~~
SonOfLilit
Probably yes, because Intel knows this is the code every compiler outputs for
zeroing a register.

Also, the reason it is "faster" is that the encoding is 1 byte, vs. 9 bytes
(in 64-bit) for "mov rbp, 0" - roughly, 1 for "mov rbp,", 8 more for a 64-bit
"0".

~~~
bonzini
Technically you could get by with 5 bytes for "mov ebp, 0".

Another reason why it was faster was that the processor recognized it and
avoided partial flags stalls after an "inc". But in 64-bit code you rarely
have "inc" at all, so it matters less. On the other hand, a few years ago XOR
had a false dependency on the register you're clearing; I'm not sure it is
still that way on more recent processors.

~~~
SonOfLilit
I tip my hat to you, your analysis is far more interesting than mine.

~~~
bonzini
Wrong too, it's partial register stalls not partial flags stalls.

------
thewavelength
What is necessary to do this with C++? Is there a tutorial available on the
web?

~~~
posterboy
behold the demoscene
[https://in4k.github.io/wiki/c-cpp](https://in4k.github.io/wiki/c-cpp)

focused on windows, because grafix. also, linux guys are more likely to use C
anyway.

I only remember a comment about avoiding exceptions, which I might have read
first in the context of micro controllers.

[https://www.google.de/search?q=site%3Apouet.net+c%2B%2B+exce...](https://www.google.de/search?q=site%3Apouet.net+c%2B%2B+exception)

[https://www.google.de/search?q=site%3Apouet.net+c%2B%2B+stdl...](https://www.google.de/search?q=site%3Apouet.net+c%2B%2B+stdlib+or+standard+library)

------
DaiPlusPlus
Your first paragraph makes me wish this site supported Markdown.

~~~
BinaryIdiot
Yeah, off topic, but if HN supported markdown (and GitHub's flavor of markdown,
so we could tag code blocks with their language) it would be amazing. There
would be a _ton_ more coding examples and discussion, in my opinion, if this
were to happen.

How do we summon Dang? :)

~~~
posterboy
my netiquette says, more than about three lines of code should go in a
pastebin anyway

~~~
lsaferite
Then in a few years when someone is reading old posts the links to old
services are broken or the service is gone. If the code is central to the
comment, why would you put it some place other than the comment? If it's not
small enough you can collapse it inline and if it's so large that you don't
want it with the comment then perhaps you should be rethinking what you are
posting.

tldr; practicality and longevity should trump netiquette.

~~~
BinaryIdiot
I agree, and this was my original thought. For example Reddit, for most of its
life, relied on imgur for hosting images. Before that single place existed,
images constantly broke after services died. With imgur being third party,
even it has led to broken images at times (though with significantly less
frequency). Now reddit is doing their own hosting.

Hacker News is very technical and code heavy. Seems to make sense to me that
some may want to communicate / discuss code itself. I could even see it
opening up more conversations like "this is my implementation of X; thoughts?"
or "do this in any other language" challenges.

------
clifanatic
Interesting - my McAfee web washer blocked this site. Don't know why.

~~~
valarauca1
A lot of web/internet filters block ASM related content. My company uses
Barracuda Networks filter and most ASM references/content are blocked, Reason:
_Hacking_

~~~
mjevans
They're not technically wrong... it's just 'hacking' in the traditional black
magic/voodoo manipulation of actual systems components sense. That is,
literally taking a hacksaw to a circuit board and altering it or making a new
board.

------
taocipian
the C standard library is not perfect but good enough

------
00k
An essential function of the standard library is to wrap the syscalls. Beyond
that, you can live without the library. But why would you do that?

------
eliangidoni
I can't believe this post has 409 points. Are we in the 80's again ?

------
SFJulie
Myth busted: printf("Hello world") is simple and is a relevant C program for a
beginner.

The "hello world" example is just the first step in annihilating your capacity
to understand how things work, by making you rely on institutional black magic
that may be wrong.

(See all the scanf bugs that have lived in C code for so long, and all the
bugs that come from respecting the old man's wisdom.)

