
On programming without malloc - jvns
http://jvns.ca/blog/2013/12/03/day-36-programming-without-malloc/
======
kyork
I write C code for embedded devices and haven't been able to use malloc in
6-ish years. Everything is either statically or stack allocated. It changes
how you perceive memory usage and allocation.

For example, it used to bug me that I had to statically allocate memory to
manage an event that was used for maybe 0.1% of the product's life. It seemed
so inefficient. Yes, you could potentially co-opt the event and use it
elsewhere, but then you had to deal with a bunch of other considerations:
could they ever run at the same time?, would the code be maintainable?, etc.

Or the other day I accidentally made a function-scoped buffer's size too large
and it caused a stack overflow. That was a pain to debug, because the
exception happened "before" the function started running (in the function call
preamble). From the debugger, it looked like the return of the previous
function caused the issue.

~~~
acuozzo
> I write C code for embedded devices and haven't been able to use malloc in
> 6-ish years.

Where do you work?

~~~
kyork
I work for a home/building automation company. Primarily I write code for
Zigbee wireless devices.

~~~
leeoniya
a bit off-topic, but Zigbee came up in a discussion with a friend recently
when talking about adding wifi functionality to PLCs. he mentioned that Zigbee
devices are pretty much unusable in existing Wifi deployments because they run
over the same bands/frequencies. can you comment on this at all? thanks!

------
a-priori
The simplest allocator ever is this:

    
    
        void *malloc(size_t size) {
            static uintptr_t nextAddress = BASE_ADDRESS;
            void *ptr = (void *)nextAddress;
            nextAddress += size;
            return ptr;
        }
    
        void free(void *ptr) {
            return;
        }
        

Using that will last you a long time in writing a toy operating system kernel.
Of course, it will eventually run out of memory pages, but that shouldn't
happen until your system has been running for a long time.

~~~
IgorPartola
You probably want to align the memory you return as well instead of having
size be an arbitrary value.
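
To illustrate, here's what bolting alignment onto a bump allocator might
look like: round the offset up to a boundary before handing out the pointer.
(A sketch only; the static heap array, its 64 KiB size, and the 8-byte
alignment are my assumptions, not anything from the thread.)

```c
#include <stddef.h>
#include <stdint.h>

#define ALIGNMENT 8   /* assumption; a real allocator might use alignof(max_align_t) */

/* stand-in for BASE_ADDRESS in the example above */
static _Alignas(ALIGNMENT) unsigned char heap[1 << 16];
static size_t next_offset = 0;

void *aligned_bump_alloc(size_t size) {
    /* round the current offset up to the next ALIGNMENT boundary */
    size_t aligned = (next_offset + (ALIGNMENT - 1)) & ~(size_t)(ALIGNMENT - 1);
    if (aligned + size > sizeof heap)
        return NULL;              /* out of memory */
    next_offset = aligned + size;
    return &heap[aligned];
}
```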

~~~
gizmo686
What advantage does aligning it have, given that you are already satisfied
with using an allocator that cannot free memory?

~~~
ArbitraryLimits
None on Intel chips, but on virtually any other chip you'd get SIGBUS by
referencing e.g. an N-bit datum not aligned on an N/8-byte boundary.

------
exDM69
Very nice post! It takes special character to be working on your own OS kernel
project, even a very simple one. Using Rust makes this even more interesting,
because you have to know the language quite well to be able to avoid the parts
that require runtime support.

Programming without malloc is a good exercise and can also be valuable outside
of kernel space. Computer programs written in a naive manner may end up
spending most of their time doing memory allocations. This seldom happens in C
programs because usually you're aware of every malloc you do. But I've had a
few Python programs end up uselessly slow because of the time wasted in
allocating small objects. I was doing some physics simulation prototyping with
Python and the result was too slow to work with in (soft) realtime. (Although
you can avoid this, you'd be writing non-idiomatic Python code, and that
practically destroys any benefit of using it for prototyping.)

@jvns: do you have the source code of your mini-kernel project shared
anywhere? Any plans for the future?

[https://github.com/rikusalminen/danjeros](https://github.com/rikusalminen/danjeros)
this is my hobby kernel project from a few years ago.

~~~
jvns
Right now it's a fork of rustboot [1], a proof-of-concept repo for
implementing a kernel in Rust. I'm working on extending rustboot [2] to
support handling keyboard interrupts and to have a real print function. It's
not remotely in a working state yet, but I'll definitely update my blog when I
get keyboard interrupts working.

[1]:
[https://github.com/charliesome/rustboot](https://github.com/charliesome/rustboot)
[2]: [https://github.com/jvns/rustboot](https://github.com/jvns/rustboot)

~~~
jvns
> you have to know the language quite well to be able to avoid the parts that
> require runtime support.

This isn't really true, it turns out :) I've only spent about a week using
Rust. I spend a ton of time in #rust asking questions, and everyone's been
really helpful. The maintainer of rust-core (the library that lets you not
require runtime support) hangs out in #rust all the time and has been
wonderful about answering all my questions about it.

~~~
exDM69
> > you have to know the language quite well to be able to avoid the parts
> that require runtime support.

> This isn't really true, it turns out :) I've only spent about a week using
> Rust. I spend a ton of time in #rust asking questions, and everyone's been
> really helpful.

Rust is perhaps easier in this aspect than pretty much any other high level
language out there. The fact that Rust can operate without a fully featured
runtime system makes it one of the most interesting new languages in my
opinion. The different "pointer types" in Rust make this possible; they're a
bit confusing to start with but enable nice things in exchange.

It is nice to see someone exploring alternatives to memory management other
than malloc, reference counting and full GC. It's also nice that Rust hasn't
really decided which way to go and has been doing some outstanding
exploratory work in this field.

Oh yeah, and a good IRC channel with helpful people is a very valuable
resource.

------
sixbrx
I've found Rust to be great at avoiding allocations.

You can return a pre-allocated buffer to be shared by _all callers_ from a
function in a safe way that prevents interference between callers. The callers
only get to keep it for a limited time (region, actually), but usually that's
no issue.

In exchange, the callers are guaranteed that the returned buffer's contents
can't be modified by any other part of the program while it is borrowed.

Here's some code for multiplying some polynomial vectors which uses this
style, without allocating (except for a single initial allocation of the
buffer):

[https://github.com/scharris/WGFEM-Rust/blob/master/weak_gradient.rs#L275](https://github.com/scharris/WGFEM-Rust/blob/master/weak_gradient.rs#L275)

Line 275: The structure holding the buffer and on which the operations are
implemented.

Line 287: the buffer field

Line 297: The operation returning a structure (by value) which has an
immutable "borrow" of the shared buffer. Its lifetime is tagged with lifetime
'a, tied to the lifetime of the implementing object.

Line 319: The buffer being included in the return value.

I really like the pattern. You get the efficiency of not allocating, kind of
like old-school libraries such as LAPACK, without doing things like writing
over your inputs :)

~~~
blt
one buffer per thread, I hope?

~~~
sixbrx
Right, that's a natural outcome of Rust's task model. A borrowed pointer, or
any structure containing one, is never "sendable", and only sendable values
may cross task boundaries; the compiler enforces this.

------
adam-f
A neat trick (tm) for allocating a linked list: recursively call a function
which allocates a node on the stack, hooking the nodes up as you go.

At the exit condition (whatever that is), call something else with your new
linked list, then unwind the recursion.

(I used something similar to keep track of scope while recursively evaluating
in this toy language I made once.)
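
A minimal C sketch of the trick (the node type and the summing consumer are
invented for illustration):

```c
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

/* the "call something else" step: runs while every node is still live */
static int sum_list(const struct node *head) {
    int total = 0;
    for (; head; head = head->next)
        total += head->value;
    return total;
}

/* each recursive call allocates one node in its own stack frame */
static int build_and_sum(int n, struct node *head) {
    if (n == 0)
        return sum_list(head);            /* exit condition: consume the list */
    struct node this_node = { n, head };  /* lives only in this frame */
    return build_and_sum(n - 1, &this_node);
}
```

Note that the list is only valid while the recursion is live; that's why the
consumer has to be called from the base case rather than after the outermost
call returns.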

~~~
jvanenk
A friend of mine wrote something up on this technique a while ago:
[http://spin.atomicobject.com/2012/11/01/hey-c-is-a-functional-language-too/](http://spin.atomicobject.com/2012/11/01/hey-c-is-a-functional-language-too/)

~~~
justinhj
The book Modern Compiler Implementation in C, written in 1997, has a very
functional style.

~~~
silentbicycle
There are editions of the book using ML and Java. Since Andrew Appel is
involved with SML/NJ, the ML edition is most likely the original. (The ML one
is also quite good.)

------
sliverstorm
Considering how much memory modern machines have, and considering she is
running no other programs, I would be so tempted to write malloc like this
just for grins:

    
    
        void *malloc(size_t size) {
          return (void *)(rand() % TOTAL_MEMORY);
        }

~~~
munificent
You jest, but you'd actually get surprisingly far just doing:

    
    
        void* malloc(size_t size) {
          static char* everything = 0;
          void* result = everything;
          everything += size;
          return result;
        }
    

When people are hacking together VMs for garbage collected languages, the
first allocator often looks like this until they get around to actually
_collecting_ the garbage.

~~~
apgwoz
You can get `free` pretty simply too:
[https://news.ycombinator.com/item?id=6849471](https://news.ycombinator.com/item?id=6849471)

------
angersock
Ah, good ol' malloc.

You are wise to implement malloc once...you are a fool (or a masochist) to
implement it twice.

~~~
tomlu
I've implemented it thrice. Easy to get working, surprisingly tricky to
balance all the space/performance requirements. You do get better at it!

------
MrBuddyCasino
A bit OT, but this is the first time I've heard about Hacker School. Is it as
awesome as it sounds? If I wanted to join, how much money would I need to stay
in New York for 3 months?

~~~
jvns
> Is it as awesome as it sounds?

Yes.

> How much money would I need to stay in New York for 3 months?

About $1000/month for rent, give or take, and $112 for a metrocard. Plus
whatever you spend on food (depends how much you cook) and entertainment.

~~~
jamii
Depending on the level of comfort you are used to, you can go a lot cheaper. I
shared a room in Bed-Stuy for $350 / month and biked to Hacker School.

I did then blow the money I'd saved by eating out every day, but that's a
different matter.

------
ww520
Memory allocation in the kernel needs special care. Usually a plain malloc()
is not enough once interrupts and memory paging are in place. Since your
piece of kernel code can run at different interrupt levels, you really need
both a pageable memory allocator and a non-pageable one. At a high interrupt
level (like a hardware interrupt), the memory paging service won't get a
chance to run, and accessing pageable memory might crash the system. You want
to allocate non-pageable, pinned-down memory for code that runs at a high
interrupt level. For code at a low interrupt level, it's fine to use pageable
memory.

That's why kernel mode development is a lot more difficult and fun.
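
A toy userspace sketch of the split (the level and pool names here are made
up; real analogues are Windows' ExAllocatePool with PagedPool/NonPagedPool,
or Linux's GFP_KERNEL vs. GFP_ATOMIC):

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical interrupt levels and pool types, loosely modeled on
 * Windows IRQLs/pool types and Linux's GFP allocation flags. */
enum irq_level { LEVEL_PASSIVE, LEVEL_DISPATCH, LEVEL_DEVICE };
enum pool_type { POOL_PAGED, POOL_NONPAGED };

static enum irq_level current_level = LEVEL_PASSIVE;

/* Above LEVEL_PASSIVE the pager can't run, so touching paged-out memory
 * would fault with no way to bring the page back in. A real kernel would
 * panic/bugcheck on this misuse; this sketch just refuses. */
void *pool_alloc(enum pool_type type, size_t size) {
    if (type == POOL_PAGED && current_level > LEVEL_PASSIVE)
        return NULL;
    /* both pools come from the host allocator in this sketch */
    return malloc(size);
}
```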

------
jlongster
I've been working on LLJS and the current version (which targets asm.js)
doesn't implement dynamic allocation yet. I've been making all kinds of demos
without malloc and it has been surprisingly fun!

------
asveikau
When I was working on a hobby kernel, I'll confess I just took dlmalloc and
called it a day. There are loads of interesting problems in kernel development
(mapping pages, doing crazy stuff in page fault handlers, synchronization,
filesystems) but somehow I didn't find malloc to be one of them. I guess
anyone reasonably skilled at C can write a bad malloc pretty quickly, and once
you get to that point you don't really learn that much by putting more effort
into it.

~~~
joosters
There's a huge depth to a good malloc() implementation. While you're wise to
avoid it when your interests are focused on kernel development, I'd recommend
that every programmer have a go at writing malloc/free one day. Avoiding
terrible performance problems is pretty tricky. At the very least, you'll gain
a lot of respect for memory allocation and will discover that there's a lot of
work going on behind the scenes when your code tries to allocate only a few
bytes of memory.

------
gz09
If you don't want to use malloc in a kernel, that is fine. However, you should
at least consider having something similar. This can be as simple as finding
the available memory in your system and writing a frame or slab allocator. It
will make your life much easier. It would be interesting to see how easy it is
to integrate your memory manager directly with the Rust language.

------
majke
Interesting. The Golang _runtime_ requires malloc, so even if used carefully
I doubt you could use Go in this way.

~~~
apgwoz
But malloc doesn't have to be very sophisticated. In a situation where
you have a large chunk of unallocated memory, the simplest thing to do is just
to "bump" allocate it. Freeing it is a bit of pain of course, but you could
build a freelist out of the chunks that are free.

So, the suggestion looks like this:

    
    
        struct freemem {
          int len;
          char *addr;
          struct freemem *next; 
        };
        // globals
        int total_mem_size;
        char *start_of_memory;
        char *bump_pointer;
        struct freemem *freelist;
    

Allocate is simply:

    
    
        size = MAX(size_to_allocate, sizeof(struct freemem));
        if bump_pointer + sizeof(int) + size < start_of_memory + total_mem_size:
          int *b = (int *) bump_pointer;
          *b = size;
          bump_pointer += sizeof(int) + size;
          return (char *)b + sizeof(int)
        else:
          iterate over freelist, checking for size <= freelist->len
          unlink that entry and return freelist->addr (the *b header is already in place)

Deallocate simply turns the discarded memory into a struct freemem and
prepends it to the freelist, setting addr to the discarded address and len to
the header value stored at addr - sizeof(int). (It's for this reason that we
allocate a minimum of sizeof(struct freemem) bytes.)

That's a basic malloc / free.
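
For the curious, here's one compilable rendering of that scheme: a size
header before each payload, a bump pointer for fresh memory, and a freelist
for reuse. (The heap size, alignment, first-fit policy, checking the freelist
before bumping, and no chunk splitting are all my own choices, not anything
prescribed above.)

```c
#include <stddef.h>

#define HEAP_SIZE (1 << 16)
#define HDR sizeof(size_t)          /* per-allocation size header */

struct freemem {
    size_t len;
    struct freemem *next;
};

static _Alignas(max_align_t) unsigned char heap[HEAP_SIZE];
static size_t bump = 0;             /* bump pointer, as an offset into heap */
static struct freemem *freelist = NULL;

void *my_alloc(size_t size) {
    /* reserve at least enough room to hold a freelist node on free() */
    if (size < sizeof(struct freemem))
        size = sizeof(struct freemem);
    size = (size + HDR - 1) & ~(HDR - 1);   /* keep headers aligned */
    /* first-fit scan of previously freed chunks (no splitting) */
    for (struct freemem **p = &freelist; *p; p = &(*p)->next) {
        if ((*p)->len >= size) {
            struct freemem *chunk = *p;
            *p = chunk->next;               /* unlink */
            return chunk;                   /* its header still precedes it */
        }
    }
    /* otherwise bump-allocate: header, then payload */
    if (bump + HDR + size > HEAP_SIZE)
        return NULL;
    *(size_t *)&heap[bump] = size;
    void *payload = &heap[bump + HDR];
    bump += HDR + size;
    return payload;
}

void my_free(void *ptr) {
    if (!ptr)
        return;
    struct freemem *node = ptr;             /* reuse the payload as the node */
    node->len = *(size_t *)((unsigned char *)ptr - HDR);
    node->next = freelist;
    freelist = node;
}
```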

~~~
munificent
> Freeing it is a bit of pain of course, but you could build a freelist out of
> the chunks that are free.

If you want a much longer explanation of this technique, I wrote a chapter
about object pools[1] that discusses exactly this[2].

    
    
[1] [http://gameprogrammingpatterns.com/object-pool.html](http://gameprogrammingpatterns.com/object-pool.html)

[2] [http://gameprogrammingpatterns.com/object-pool.html#faster-particle-creation](http://gameprogrammingpatterns.com/object-pool.html#faster-particle-creation)

------
dradtke
I'm really looking forward to Rust 1.0. It seems like an awesome language in
many respects, not least of which is that you can write operating systems and
other low-level software in it, but the lack of stability will be kind of
annoying for a while.

------
goldenkey
Does Rust not have dynamic heap allocation (malloc)? An explanation to
accompany the article would be nice. Or at least a comprehensive response to
my comment ;-)

~~~
cespare
Rust uses malloc provided by the system. It's mentioned in the article:

"I actually do have a malloc function because my Rust standard library needs
to link against it"

He hasn't implemented malloc in his kernel yet (except for a stub).

~~~
kibwen
> Rust uses malloc provided by the system.

It doesn't lock you into the system allocator. You can LD_PRELOAD your own,
and in the past Rust has even shipped with and used jemalloc[1] (though I
believe it's not using it at the moment). Furthermore, as alluded to at the
end of the article, you can override the compiler's malloc implementation via
what Rust calls "lang items" (which are sadly ill-documented).

> He hasn't implemented malloc in his kernel yet (except for a stub).

Rather, Julia hasn't implemented malloc in her kernel yet.

[1] [http://www.canonware.com/jemalloc/](http://www.canonware.com/jemalloc/)

~~~
pekk
> Copyright © 2013 Jason Evans

I think you could forgive people for not knowing that Jason and Julia are the
same person? (that's what you are implying, right?)

~~~
kibwen
You appear to be falling victim to the vagaries of footnote notation. Blame HN
for not giving us Markdown with which to embed links!

------
e12e
Hm, is there a reason why you've not declared something like "const VGABUF:
int = 0xb8000;" (the way I assume VGA_WIDTH is defined)?

    
    
        pub unsafe fn putchar(x: u16, y: u16, c: u8) {
            let idx : uint = (y * VGA_WIDTH * 2 + x * 2) as uint;
            // 0xb8000 is the VGA buffer
            *((0xb8000 + idx) as *mut u16) = make_vgaentry(c, Black, Yellow);
        }

------
com2kid
As others have stated, embedded programming generally avoids malloc.

Having all one's memory usage laid out at compile time really does provide
some benefits. For one thing, it makes keeping track of who is wasting memory
really easy! It is far too common to over-allocate a buffer when initially
developing code and never tune it later. Indeed, having to justify every
allocated byte means developers will sit and do math and think about their
code before writing it.

Odd as it may sound, this takes a fraction of the time that tracking down a
memory leak does! Which is where the other huge savings comes in: no more
memory leaks!

That of course leads right up to the other reason why heap-free programming is
so common in embedded: Stability.

Memory leaks are a very common class of bug, and for deeply embedded devices
it is not reasonable to have the user power cycle. Avoiding memory leaks removes
one huge class of bugs. Add to that the lack of threads, and you knock out a
second large class of bugs. In the end, one's test surface is greatly reduced!

That said, it is an interesting experience; I went from C# to embedded C#
(yes, it exists! .NET Micro Framework), which generally avoids allocations
where possible, to straight embedded C and C++.

Some wise guy may now remark about smart pointers, to which I'll reply "my
stack is 4k, go away!" Allocating on the stack can still fail with an OOM!
Static allocation fails with an OOM at compile time! (Of course it is still
possible to blow one's stack, but it is a lot harder when you aren't
allocating structs or classes on it!)

Oh, and finally, a _great_ interview question is to ask someone where in
memory statics are stored. Heck, just ask what the types of memory allocation
in C or C++ are. The percentage of candidates who can pass this is quite low!
Bonus points if the candidate knows that static consts may be stored in a
separate read-only region! Super extra bonus points if they list uninitialized
and initialized data! That is very much dependent on the OS and linker
options, however, assuming one even has an OS!

80%+ of candidates will look at code such as

    
    
        static int16_t value = 5;
    

and, when asked where it is stored, will answer either heap or stack. (Not
quite sure why people say stack.)

Really, what you want is to look at a diagram like the one at
[http://www.geeksforgeeks.org/memory-layout-of-c-program/](http://www.geeksforgeeks.org/memory-layout-of-c-program/)

And again, extra credit if they can talk about the various segments on their
OS of choice. The layout linked above certainly varies by platform, but anyone
who has a true understanding of how one platform is laid out will be able to
easily comprehend how other platforms go about things.

On a more practical note, all this came in handy when using C++ on a platform
that didn't call constructors on statically declared classes! I opened up the
debugger, looked at my pointer, saw a bunch of 0's, then my class data, and
realized I had no v-table! (I was crashing whenever I called any virtual
functions.) Suffice it to say, understanding what a v-table is, how it works,
and how compilers create one helped a ton. I then went off to ask the
gentleman who wrote our runtime why I had no v-table; well, it turns out he
hadn't written the code to go about and call the constructors!

It is good to know that all of the "magic" that happens behind the scenes is
not magic at all, but perfectly comprehensible code written by a developer
just like you or I!

------
optymizer
Implementing malloc() was one of the first assignments in my OS class. A
linked list with some header information about each allocation will get you a
long way before you need a better algorithm.

