
Malloc Challenge - codr4life
http://vicsydev.blogspot.com/2016/11/the-malloc-challenge.html
======
greenshackle2
>[libc4life] is aiming for simplicity and leverage; and it makes a real effort
to get there by playing on C's strengths, rather than just inventing yet
another buggy Lisp.

Ouch, right in the feels, I've been working on
[https://buildyourownlisp.com](https://buildyourownlisp.com) in my spare time.
(EDIT: I was looking for a name for the repo, YABLisp it is.)

> coding in C is a welcome therapy after seemingly wasting years exploring
> various ways of pretending the hidden complexity in my stack was someone
> else's problem

I've noticed several older talented programmers express similar feelings. I
was watching Casey Muratori's Handmade Hero stream, where he writes a game in
C from scratch, and he said, "I don't know who would watch this except aging
C programmers."

I'm less than 30 but I already feel like an aging C programmer. Most OOP seems
like a morass; I've switched to writing my own projects in C and my prototypes
in C-like Python. But I wonder what hope there is for people like us in the
industry, which seems to be moving ever further away from this type of
programming.

~~~
userbinator
_But I wonder what hope there is for people like us in the industry, which
seems to be moving ever further away from this type of programming._

C is still very popular (along with Asm) in embedded systems. ARM cores and
large memories are certainly getting very cheap, but still not at the level of
many of the 8 and 16-bit microcontrollers.

I find OOP often overused too, but it can be genuinely useful in certain
situations where there is a very strong association between some data and the
operations on it. For this reason, I tend to use a subset of C++ in a style
that would probably enrage a lot of the "C++ is not C" advocates.

~~~
conjectures
I've noticed this, and also that the typical salaries for embedded systems
developers seem to be lower than for hotness.js developers :(

Am I wrong? Hope so.

~~~
mauvehaus
My experience, coming as somebody who has experience on the large end of
embedded (100's of MHz, 100's of MB RAM, MMU, running Linux), albeit a while
ago, is that it's hard to break back into it if you've been out of the
community for a while.

The last time I was looking for a job, I had been doing non-embedded,
including kernel and low-level for 8 years or so, and I had the darnedest time
finding embedded positions to apply for, let alone get calls back from.

Startups that do embedded seem to mostly get the embedded done in the MVP
stage, and by the time they're looking to grow, they've got a (small) team
already doing that work.

In fairness, this is partially self-inflicted; iRobot has been on the monthly
whoishiring threads for as long as I've been looking for jobs in the Boston
area, but I decided long ago that I'd never work in defense (or finance).

Recently, I've added to that list "companies whose business model is selling
their users' information" (many of them these days), and "companies whose
entire sales pitch plays on your fears" (a recruiter tried to put me in touch
with a company that does baby monitors that measure some basic health
parameters, and their promotional video was appalling to me).

I kind of stuck a toe across the first line, and tried talking to somebody
Fitbit-esque, but never got a call back. I also wound up with a connection
into an electronic paper company, but that went cold after submitting a
resume. Ditto a company that does automotive suspension stuff.

I realize that being out of the community for a while makes it hard to get
consideration if there are other candidates who have current experience, so I
started a personal project building a clock out of VFD tubes, and added a
github link to my resume before submitting several of those. That didn't seem
to help.

I got hired doing non-embedded before I finished the clock project. I'm at the
point of needing to learn EAGLE, lay out boards and get them made. That was a
year and a bit ago; right now I've got other projects that take up my weekend
time.

To respond directly to your point, I'm willing to bet that there are two
factors at work: 1. Supply and demand. If demand exceeded supply, I would
think that I'd at least be able to get a call back with my background. 2.
Hardware is a hard row to hoe. There's a lot less capital invested in a
hotness.js project that flops, and it's a lot easier to scale if it takes off.
No, it's not necessarily _easy_, but there's a wide gap between "provision
more instances on EC2 and fix the bottlenecks in the architectures" and "the
factory is already at capacity and the supplier for the key part is
backordered for 3 months". The money that doesn't need to go to covering those
risks can go into your pockets.

~~~
JoeAltmaier
I'm doing embedded contracting between startups. It's been pretty easy to find
work, but I've been doing this sort of thing for decades. In fact, 4 days
after quitting the last startup (new investor changed direction) I got a
panicky call from an old contract begging me to pick up a project. It had been
10 years; they found my number in an old rolodex.

Embedded has plenty of that 'large end' work now, since SOC/SOM prices
have come way down.

As for iRobot, I actually called them. They split in half; their whoishiring
entry is the Roomba people, not the military people. But they're strangely
still putting 5 tiny controllers in each vacuum instead of one hefty
processor. And they're an east-coast Boston company, quite a bit different
from a west-coast startup.

Re: Eagle. My college roommate wrote a board-layout package with some friends
called Eagle years ago. He's from Austria. I wonder if it's any relation to the
modern product?

------
userbinator
Before I read the article or the comments I thought it would be about
rewriting code to _not_ use dynamic allocation, which is IMHO a far more
interesting (and challenging to some) exercise. Contrary to common
expectations, it often doesn't mean e.g. restricting the lengths of inputs,
and can result in simpler, more efficient, and less buggy code. From my
experience it is usually those with a background in higher-level languages who
overuse malloc().

~~~
codr4life
You should have a look at the repository README. libc4life bends over
backwards to provide stack allocation and value semantics wherever possible.
Right now I'm working on an ordered map that allows user-defined key/value
sizes, which lets it allocate all the memory it needs in one block and provide
value semantics.

------
throwaway4891a
See also:

- [http://locklessinc.com/benchmarks_allocator.shtml](http://locklessinc.com/benchmarks_allocator.shtml)
  ($, use as a minimum performance target)

- [http://www.nedprod.com/programs/portable/nedmalloc/](http://www.nedprod.com/programs/portable/nedmalloc/)

- [http://phk.freebsd.dk/pubs/malloc.pdf](http://phk.freebsd.dk/pubs/malloc.pdf)
  [PDF] (phkmalloc)

- [https://github.com/gperftools/gperftools](https://github.com/gperftools/gperftools)
  (tcmalloc)

- [https://github.com/ivmai/bdwgc/blob/master/malloc.c](https://github.com/ivmai/bdwgc/blob/master/malloc.c)

- [https://github.com/jemalloc/jemalloc](https://github.com/jemalloc/jemalloc)

- [http://gee.cs.oswego.edu/dl/html/malloc.html](http://gee.cs.oswego.edu/dl/html/malloc.html)
  (dlmalloc)

~~~
codr4life
Thank you, but that's not the problem I'm solving. Any one of those could be
used to feed any implementation from the challenge. The idea I'm pushing is
using local knowledge to customize memory management, instead of trying to
find the perfect one-size-fits-all solution.

~~~
mathgenius
Aha, I see.. Very nice. Like the following implementation of free() as
commonly used in HFT:

    void free(void *mem) {}

~~~
codr4life
That's a totally valid solution to some problems :) Having access to a decent
allocator protocol lets you do that and more.

------
EdSharkey
I recall reading in the last year or two an account of how a game developer got
their PS3 engine to run at a 30 or 60 Hz frame rate by aggressive
triple-buffering of their scenes.

One of the interesting bits about the article was their memory allocation
scheme. Each game frame they'd allocate a single huge memory pool and then
allocate from it by simply incrementing a pointer into the pool. I think this
is what you describe as a slab allocator, because they never free()'d their
allocations, they just recycled the pool after each frame had been rendered.
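The per-frame scheme described above boils down to very little code; a rough
sketch (the names are mine, not from the article):

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    unsigned char *base;    /* the single huge pool */
    size_t cap;             /* pool capacity in bytes */
    size_t used;            /* the bump pointer */
} frame_arena;

int arena_init(frame_arena *a, size_t cap) {
    a->base = malloc(cap);
    a->cap = cap;
    a->used = 0;
    return a->base ? 0 : -1;
}

/* Allocation is just incrementing a pointer into the pool. */
void *arena_alloc(frame_arena *a, size_t size) {
    size = (size + 15) & ~(size_t)15;     /* keep 16-byte alignment */
    if (a->used + size > a->cap) return NULL;
    void *p = a->base + a->used;
    a->used += size;
    return p;
}

/* No per-allocation free(): recycle the whole pool after each frame. */
void arena_reset(frame_arena *a) { a->used = 0; }
```

Everything allocated during a frame dies together when the frame is done,
which is exactly why no individual free() is ever needed.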

I kind of see a slab allocator as a happy middle ground between allocating
temporary memory from the stack and a full-blown free-list allocator (or
whatever your classic malloc implementation is.)

Are there any high level languages that have the ability to provision fast
memory allocation pools like a slab where garbage collection occurs when the
slab is no longer accessible, for instance?

~~~
fulafel
You are describing an arena allocator. Slab is more like a set of free lists
per object size.
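For illustration, one size class of such a slab might look like this (a rough
sketch with made-up names, not the actual SunOS design, which also caches
constructed objects):

```c
#include <stddef.h>
#include <stdlib.h>

/* One size class: a block of `count` objects threaded onto a free list. */
typedef struct {
    void *free_list;        /* next free object, or NULL when exhausted */
    unsigned char *mem;     /* backing block */
    size_t obj_size;
} slab;

int slab_init(slab *s, size_t obj_size, size_t count) {
    if (obj_size < sizeof(void *)) obj_size = sizeof(void *);
    s->mem = malloc(obj_size * count);
    if (!s->mem) return -1;
    s->obj_size = obj_size;
    s->free_list = NULL;
    for (size_t i = 0; i < count; i++) {    /* thread every object */
        void *obj = s->mem + i * obj_size;
        *(void **)obj = s->free_list;       /* free object stores the link */
        s->free_list = obj;
    }
    return 0;
}

void *slab_alloc(slab *s) {
    void *obj = s->free_list;
    if (obj) s->free_list = *(void **)obj;  /* pop */
    return obj;
}

void slab_free(slab *s, void *obj) {
    *(void **)obj = s->free_list;           /* push back */
    s->free_list = obj;
}
```

A full slab allocator would keep one of these per object size and grow new
slabs on demand.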

~~~
wahern
IME arena doesn't really have a fixed definition. The term can and has been
used to refer to something like an object stack or bump allocator, but also to
something that supports deallocation and reallocation. The term goes back over
30 years and nothing specific has really stuck.

I think the most you can say of an arena is that it's usually a contiguous
region of memory from which smaller allocations are made, and which can be
efficiently freed as a whole. An arena may only support fixed-size
allocations, or a range of sizes; it may or may not support deallocation.
However, in many cases it's natural to require multiple regions to satisfy all
allocation requests for a particular context (task, generation, etc), so don't
be surprised if an implementation labels a collection of contiguous regions an
"arena".

The term pool is similarly ambiguous, but usually implies support for
deallocation and recycling of memory. It does not necessarily imply a
contiguous region, but that's a natural optimization in a language like C.

Slab is less ambiguous because it has a very specific origin in SunOS--
allocation and deallocation of fixed-size, often typed objects (to optimize
initialization).

~~~
codr4life
They're all out there floating around :) What I refer to as a pool allocator
is a set of separate allocations, possibly of the same size; while a slab is a
single block of memory that's dished out as separate pointers, which could
likewise be the same size.

------
gbarboza
All past 15-213 students go in search of their malloc lab solution.

~~~
anaccountwow
Current 15-213 student here in the middle of malloc lab, my current solution
is probably worse than anything in that link (left unclicked).

~~~
codr4life
Give it a spin, you might learn something. Your class is probably focusing on
general purpose, system level allocators; a much thornier problem without any
really good answers. As How to Solve It states: if you can't solve the given
problem, try to solve a simpler version of the problem.

------
vvanders
Awesome challenge.

Back when I worked in a C/C++ shop we'd use this as an in-person interview
question for senior positions. The candidate was never expected to finish but
more as a springboard to talk about the pro/cons and issues they'd seen with
performance/etc of various approaches.

~~~
codr4life
Good call, writing C/C++ without having a grip on memory allocation is a
recipe for exactly the kind of disaster we're in right now :)

~~~
MichaelMoser123
Depends on the domain - I suspect that this question would cut off qualified
application programmers who are not perfectly familiar with lower
level/infrastructure code.

~~~
codr4life
Not if they're writing C/C++; it doesn't really matter how hard you try to
hide memory management when the whole language is designed around memory and
pointers. What kind of programmer is that, anyway? One who can only program
against specific APIs, as long as they don't touch the wrong parts of the
language.

~~~
MichaelMoser123
There are different strategies for handling perf problems of memory allocation
- you can cache blocks of the same size in some linked list within the
application (free list) or you can reconsider the allocator/allocation
strategy. Both get the job done.
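The first strategy is only a few lines; a minimal single-size sketch (the
names and the fixed 64-byte block size are mine):

```c
#include <stddef.h>
#include <stdlib.h>

/* Cache freed blocks of one fixed size on a linked list inside the
   application; fall back to malloc() when the cache is empty. */
enum { BLOCK_SIZE = 64 };       /* must be >= sizeof(void *) */
static void *cache_head;        /* head of the free list */

void *cached_alloc(void) {
    if (cache_head) {
        void *p = cache_head;
        cache_head = *(void **)p;   /* pop from the free list */
        return p;
    }
    return malloc(BLOCK_SIZE);      /* cache miss: hit the real heap */
}

void cached_free(void *p) {
    *(void **)p = cache_head;       /* reuse the block to hold the link */
    cache_head = p;
}
```

Since the block itself stores the link while it's free, the cache costs no
extra memory beyond one head pointer.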

Preference for the first solution does not make you a worse programmer.

~~~
codr4life
What you're describing is essentially a pool, which could be thought of as a
kind of allocator since it provides an allocation strategy. There's even one
in the challenge that does just that. Nothing makes you a worse programmer
except refusing to learn.

------
jwatte
How about: mmap() a few terabytes of virtual space, let malloc() be pointer
addition, and let free() be a no-op?

~~~
sparky_
Somebody out there is doing this in production. I guarantee it.

~~~
codr4life
Why wouldn't they? If it turns out to be a good solution for the problem
they're trying to solve? There's nothing wrong with coming up with your own
solutions, that's why we got brains instead of answering machines.

~~~
naasking
It is a problem if it's not properly documented. This system will probably
outlast their time at that company, so the next poor shmuck is left to figure
out what that program is doing.

~~~
codr4life
Agreed, but that's an if; there's no reason to assume anything. And the next
schmuck also doesn't need to spend most of his time struggling to find a
combination of dependencies that still kind of works. There are advantages to
owning your code.

------
pheo
Let's all just agree to use Perl. It's just dynamic, functional, OOPy C, isn't
it?

J/K. I think this is an interesting problem in that it's a sandbox for
allocation and GC in pretty much any dynamic interpreter's implementation. My
qualm is that it would be "easy" to tune for the test. Consider the difference
between dynamic blocks of a small but fixed size, getting alloc'd/freed in an
asynchronous way (a network stack?) versus a pool of variable byte length
strings getting shuffled around (a key/value store?). Those are simple, but
drastically different, strategies for your heap. There won't be a "best"
answer besides the limits of your problem domain.

~~~
codr4life
Been there, done that. Even went to YAPC EU in Pisa and had a whiff of Larry.
Not for me, that's all I can say. It's too loose, too much shooting from the
hip. There's a lot of good ideas in there though.

Agreed. Which is why the 'one size fits all' approach might not be the best
way to go. That's the main reason I decided to launch the challenge and
encourage a more combinatory approach with local special-purpose allocators.

------
otabdeveloper
A malloc benchmark that doesn't measure multithreaded performance is worse
than useless.

~~~
codr4life
Only if you insist on clinging to a general purpose, system level perspective
on memory allocation. No amount of thinking and reasoning about these issues
is useless.

~~~
huhtenberg
While OP might've phrased it more politely, he's got a point - allocators that
aren't designed for multithreaded use are ultimately oversimplified toy
projects.

It's really not that much of a challenge to knock together a (slab + heap +
free list) allocator that will perform really well single-threadedly. However,
it will be nearly impossible to adapt it to the multithreaded context. That is
a considerably more complex task, and the end result _will_ end up looking
like a rocket ship compared to even the best single-threaded allocator.

~~~
codr4life
Not if they're meant for specific uses in limited parts of your application;
one allocator per thread, for instance. Why are you clinging so hard to the
system level, general purpose perspective? Given that the problems it comes
with don't really have any good answers. How is that supposed to lead us
forward?

------
tedkalaw
at UIUC, there's a project in the systems class (CS241) that is exactly this.
there's a leaderboard with the projects and how they compare to the system
malloc for a variety of metrics

this is definitely one of the best projects i ever did in school and a great
coming of age project. worst case, there's always an implementation at the
back of K&R ;)

~~~
kjdal2001
I agree. That assignment was one of my favorites. It was a lot of fun because
it was fairly easy to get something that functioned, and as you came up with
ideas for making it better (or just got ideas from reference implementations),
you could watch your metrics get better or worse.

~~~
codr4life
Exactly the experience I'm trying to provide with the challenge. For the price
of forking the repository you get a framework for trying out and comparing
your own allocation strategies.

------
morio123
What's the incentive? Credits? Give me a break.

It's fairly easy to beat the given examples but in the end heap management is
heavily dependent on application, client code, platform, hardware and many
other criteria. It's a very complex problem space, and what matters here is how
existing important code behaves and continues to behave, given that existing
code has most likely made assumptions about how the heap is managed.

glibc is a good example of a perfectly fine compromise not optimized for any
particular use case. Anyone who has had performance issues with it has most
likely already implemented their own solution for their problem set.

It might be much more worthwhile to develop a set of malloc-like
implementations a developer can choose from, instead of going for a
one-size-fits-all approach.

~~~
codr4life
Indeed, the complexity and the need for flexibility are exactly the problems
I'm trying to deal with here. That's why the challenge encourages splitting
the allocator up into Unix-like pieces and stacking them to get the desired
features.

------
wfunction
What if I want to write a malloc that requires the size to be stored
separately? (i.e. one that needs to be paired with free(ptr, length) rather
than just free(ptr) for good performance.) Wouldn't that provide more
flexibility in the challenge and be more useful (e.g. for C++'s
std::allocator)?

~~~
codr4life
The pool reference allocator does just that internally. It prefixes each
allocation with a block containing the size among other things. Either base
your implementation on top of that or take the idea and run with it.
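I haven't checked libc4life's exact header layout, but the general
prefix-header trick looks like this (names are illustrative):

```c
#include <stddef.h>
#include <stdlib.h>

/* Prefix every allocation with a header holding its size, so that
   free() and introspection only need the user pointer. A production
   version would pad the header to the platform's maximum alignment. */
typedef struct {
    size_t size;
} alloc_header;

void *sized_malloc(size_t size) {
    alloc_header *h = malloc(sizeof(alloc_header) + size);
    if (!h) return NULL;
    h->size = size;
    return h + 1;                  /* hand out the memory past the header */
}

/* Recover the recorded size from a user pointer. */
size_t alloc_size(void *p) {
    return ((alloc_header *)p - 1)->size;
}

void sized_free(void *p) {
    if (p) free((alloc_header *)p - 1);
}
```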

~~~
wfunction
You totally missed the point of my question. I was _not_ asking "I have an
allocator that doesn't store the buffer size; how can I use it?"

I was asking, "I don't think an allocator should need to store the buffer size
internally; why not formulate the challenge so that the block size doesn't
need to be stored?"

~~~
ben_bai
I would not do that. Userspace apps have way too many ways to screw up memory
management already. Allowing them to

        buf = malloc(128);
        free(buf, 256);

seems dangerous, if free can't check the size.

But kernels sometimes do just that ( free(ptr, size) ), for performance
reasons and because "kernel writers know what they are doing".
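A sketch of what the sized interface buys you: with the caller handing the
size back, free() can pick a size-class bin directly, with no per-block
header. Everything below is illustrative, not taken from any particular
kernel:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Four size classes: 16, 32, 64 and 128 bytes. */
#define NCLASSES 4
static void *bins[NCLASSES];            /* one free list per class */

static int size_class(size_t n) {
    if (n <= 16) return 0;
    if (n <= 32) return 1;
    if (n <= 64) return 2;
    assert(n <= 128);                   /* sketch: no large allocations */
    return 3;
}

void *alloc_sized(size_t n) {
    int c = size_class(n);
    if (bins[c]) {                      /* pop a cached block */
        void *p = bins[c];
        bins[c] = *(void **)p;
        return p;
    }
    return malloc((size_t)16 << c);     /* fall back to the system heap */
}

/* The caller passes the size back, so no header lookup is needed
   - and, as noted above, no sanity check is possible either. */
void free_sized(void *p, size_t n) {
    int c = size_class(n);
    *(void **)p = bins[c];              /* reuse the block as the link */
    bins[c] = p;
}
```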

------
nujabes
I'm trying to build it on MacOS but I'm inundated with errors. For example:

        malloc_perf.c:39:3: error: implicit declaration of function
          'clock_gettime' is invalid in C99
          [-Werror,-Wimplicit-function-declaration]
                BENCHMARK("basic", &c4malloc);
                ^

~~~
david-given
Looking at the implementation of libc4life, I think it's full of gcc-isms.
C4DEFER() is this:

    
    
        #define _C4DEFER(code, _def)				\
          void _def() code;					\
          bool _def_trigger __attribute__((cleanup(_def)))

        #define C4DEFER(code)				\
          _C4DEFER(code, C4GSYM(def))
    

So, nested functions and gcc attributes.

Nested functions are awesome, but they're a gcc extension, only supported on
some architectures anyway, and AFAIK they only work if you have an executable
stack, which is frowned on these days (because gcc has to create a callable
thunk to stash the nested function's context pointer).

[http://stackoverflow.com/questions/8179521/implementation-of-nested-functions](http://stackoverflow.com/questions/8179521/implementation-of-nested-functions)

My understanding is that clang doesn't support nested functions, but it does
have its _own_ non-standard extension, blocks. But of course that's still not
standard C.
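For reference, the cleanup attribute itself is supported by both gcc and clang
when it's paired with a file-scope function rather than a nested one; a
minimal sketch (names mine):

```c
#include <stdlib.h>

static int cleanups_run;    /* just to observe the cleanup firing */

/* File-scope cleanup function: works in both gcc and clang, unlike
   the nested functions that C4DEFER's inline bodies rely on. */
static void free_buf(char **p) {
    free(*p);
    cleanups_run++;
}

int scoped_demo(void) {
    char *buf __attribute__((cleanup(free_buf))) = malloc(64);
    if (!buf) return -1;
    buf[0] = 'x';       /* ... use buf ... */
    return 0;           /* free_buf(&buf) runs automatically on scope exit */
}
```

The price is that the cleanup body can't capture locals the way a nested
function can, which is presumably why libc4life went the gcc-only route.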

I'm currently writing clunky C89 code for an old compiler (and occasionally,
K&R C!), and I got really excited by this library for a moment, but... nope.
Non-standard. Can't use it.

~~~
codr4life
I had to draw a line somewhere, and C99 with GNU extensions is where it is.
Cleanup attributes and anonymous functions are just too useful to leave
behind. And since I'm using clang to develop this, I'm pretty sure it supports
nested functions just fine.

~~~
david-given
Hm. Sounds like they added them --- nested functions certainly weren't
supported the last time I looked (because, as you say, they're far too useful,
and I hated having to give them up).

Have you had any reports about problems on, e.g., OpenBSD, related to needing
an executable stack?

~~~
codr4life
Could be, it's moving fast. Once I realized I could use nested functions to
provide anonymous functions and deferred actions with decent syntax in a
semi-standardized way, I was hooked :)

Nothing, but I seldom hang around in the BSD crowd these days.

------
jheriko
it's an interesting exercise for learning but the code style is awful.

freel instead of freelist? ffs.

abbreviating memory to mem is enough of a mistake in the standard library
without going further to m like malloc does and some of the examples here.

still, much respect to the coder4life for making such a good effort and
having such an awesome name...

~~~
zzzcpan
Coding style never saved anyone's C code from being insecure. So it's not
something to be concerned about; C code is generally awful on its own.

~~~
codr4life
What's awful with C on its own? You can write all sorts of code in C.

------
esaym
Looks like fun. Wish I had time to join in :(

~~~
codr4life
Oh come on, there's always time for fun :)

------
102100101
GitHub apparently infests every code snippet in existence. People work for
free and GitHub gets the credit/branding.

~~~
codr4life
To be fair, they're also providing a decent service for free.

