Hacker News new | comments | show | ask | jobs | submit login
Malloc Challenge (vicsydev.blogspot.com)
147 points by codr4life 391 days ago | hide | past | web | favorite | 104 comments

>[libc4life] is aiming for simplicity and leverage; and it makes a real effort to get there by playing on C's strengths, rather than just inventing yet another buggy Lisp.

Ouch, right in the feels, I've been working on https://buildyourownlisp.com in my spare time. (EDIT: I was looking for a name for the repo, YABLisp it is.)

> coding in C is a welcome therapy after seemingly wasting years exploring various ways of pretending the hidden complexity in my stack was someone else's problem

I've noticed several older talented programmers express similar feelings. I was watching Casey Muratori's Handmade Hero stream, where he writes a game in C from scratch, and he said, I don't know who would watch this except aging C programmers.

I'm less than 30 but I already feel like an aging C programmer. Most OOP seems like a morass; I've switched to writing my own projects in C and my prototypes in C-like Python. But I wonder what hope there is for people like us in the industry, which seems to be moving ever further away from this type of programming.

But I wonder what hope there is for people like us in the industry, which seems to be moving ever further away from this type of programming.

C is still very popular (along with Asm) in embedded systems. ARM cores and large memories are certainly getting very cheap, but still not at the level of many of the 8 and 16-bit microcontrollers.

I find OOP often overused too, but it can be genuinely useful in certain situations where there is a very strong association between some data and the operations on it. For this reason, I tend to use a subset of C++ in a style that would probably enrage a lot of the "C++ is not C" advocates.

I work on a modular business application at work, and OOP is useful in places. I don't hate it, but it can be kinda clunky.

We have an embedded systems team at work but they barely have enough work to keep the current small dev team busy.

I've noticed this, and also that the typical salaries for embedded systems developers seems to be lower than for hotness.js developers :(

Am I wrong? Hope so.

My experience, coming as somebody who has experience on the large end of embedded (100's of MHz, 100's of MB RAM, MMU, running Linux), albeit a while ago, is that it's hard to break back into it if you've been out of the community for a while.

The last time I was looking for a job, I had been doing non-embedded, including kernel and low-level for 8 years or so, and I had the darnedest time finding embedded positions to apply for, let alone get calls back from.

Startups that do embedded seem to mostly get the embedded done in the MVP stage, and by the time they're looking to grow, they've got a (small) team already doing that work.

In fairness, this is partially self-inflicted; iRobot has been on the monthly whoishiring threads for as long as I've been looking for jobs in the Boston area, but I decided long ago that I'd never work in defense (or finance).

Recently, I've added to that list "companies whose business model is selling their users' information" (many of them these days), and "companies whose entire sales pitch plays on your fears" (a recruiter tried to put me in touch with a company that does baby monitors that measure some basic health parameters, and their promotional video was appalling to me).

I kind of stuck a toe across the first line, and tried talking to somebody Fitbit-esque, but never got a call back. I also wound up with a connection into an electronic paper company, but that went cold after submitting a resume. Ditto a company that does automotive suspension stuff.

I realize that being out of the community for a while makes it hard to get consideration if there are other candidates who have current experience, so I started a personal project building a clock out of VFD tubes, and added a github link to my resume before submitting several of those. That didn't seem to help.

I got hired doing non-embedded before I finished the clock project. I'm at the point of needing to learn EAGLE, lay out boards and get them made. That was a year and a bit ago; right now I've got other projects that take up my weekend time.

To respond directly to your point, I'm willing to bet that there are two factors at work: 1. Supply and demand. If demand exceeded supply, I would think that I'd at least be able to get a call back with my background. 2. Hardware is a hard row to hoe. There's a lot less capital invested in a hotness.js project that flops, and it's a lot easier to scale if it takes off. No, it's not necessarily easy, but there's a wide gap between "provision more instances on EC2 and fix the bottlenecks in the architectures" and "the factory is already at capacity and the supplier for the key part is backordered for 3 months". The money that doesn't need to go to covering those risks can go into your pockets.

I'm doing embedded contracting between startups. Its been pretty easy to find work, but I've been doing this sort of thing for decades. In fact, 4 days after quitting the last startup (new investor changed direction) I got a panicky call from an old contract begging me to pick up a project. It had been 10 years; they found my number in an old rolodex.

Embedded now has plenty of that 'large end' work now, since SOC/SOM prices have come way down.

As for iRobot, I actually called them. They split in half; their whoishiring entry is the Roomba people not the military people. But they're strangely still putting 5 tiny controllers in each vacuum instead of one hefty processor. And they're an east-coast Boston company, quite a bit different from a west-coast startup.

Re: Eagle. My college roommate wrote a board-layout package with some friends called Eagle years ago. He's from Austria. I wonder if its any relation to the modern product?

No worries :) I was mostly referring to the tendency to turn C into another language on the api-level. Building Lisps is one of my part time hobbies. Language abstractions are like training wheels. Once you know what you're doing they start getting in your way, and you start preferring languages with less baggage.

I'm sure it inflates one's ego but the chances that they've avoided writing any exploitable security vulnerabilities is almost zero.

Show me a software stack without those kinds of bugs; they're all written in C somewhere down the line; enormous amounts of complicated C, written by coders of varying competence. At least in my stack I can keep it simple and fix anything that needs fixing.

He/she meant: two programmers of equal skill and security conscientiousness will create less exploitable security vulnerabilities per "unit of functionality", using C and <memory safe language> respectively, ceteris paribus.

"Easier languages attract less skilled devs", "higher level languages allow you to create more complex programs", yada yada. Could all be true, but it's besides the point.

> "Easier languages attract less skilled devs", "higher level languages allow you to create more complex programs", yada yada. Could all be true, but it's besides the point.

For the armchair discourse it's beside the point, but for real-world development, it's what matters the most.

Not everything has to come down to C. OCaml is self-hosting I believe.

And even if it did, rates matter.

I didn't mean to make the point so C-centric. I'm not just talking about C programmers. I don't think we should all write C all the time.

I went back to C and Lisp in my spare time because I'm already invested in those languages, between C itself, and CPython.

The common feeling I've seen repeated is doing OOP for a while then getting tired of it, and casting it off.

> I'm sure it inflates one's ego

I'm sure that's true for some, but it would be shortsighted to think all programmers who dislike OOP are just full of themselves and don't have any valid insights.

What the hell is C-Like Python?

Python that looks like C? It's not a thing, I just made it up. A subset of Python that is not too painful to rewrite in C. No classes, I use namedtuples as structs, etc.

I'm not sure it's a sane thing to do, arguably I should just write the thing in C in the first place, but I have more experience with Python so it's still easier to me.

Python does not look like C. The first thing I can think of is using traditional loop counters instead of the for..in construct.

That's a minor detail. I'm talking about writing in a procedural style.

Rewriting a for..in as a for with counters is relatively trivial, compared to re-implementing a complex class hierarchy in C.

You can write procedural Python, and it will look kinda like C if you squint. Cython even lets you use C types directly, with a simple syntax. As I said, maybe not the sanest thing to do, everything is an object in Python so you can never really get away from it's object-oriented nature.

Python is not really object oriented. To get information hiding you have to wrap your class in a closure and that closure has to track the individual instances. It's not clean code and it's horribly inefficient.

Python is also not a functional language.

Python provides language structures that allow you to use both styles of programming OOP and functional as-if-it-was.

Right, it's not pure OOP. It's "only" missing strong encapsulation. Trying to implement really private properties in Python doesn't work too well, the language isn't designed for it and it's not considered 'Pythonic'.

All I meant is that everything is an object. It doesn't have strong encapsulation, but you can't get away from its weak encapsulation, since the basic types are objects.

Nobody stops one from writing Python like C. CLikeMemoryModel = []

Python written like Python doesn't look like C. Imagine writing in a subset of the language that translates especially cleanly into C, though. It certainly wouldn't be stylistically similar to most other Python code; it'd have more in common structurally to how the same program would be written in C.

Long term maintanance is easier with safer languages.

Everything is easier in safer languages, far from everything is possible.

What do you mean with everything? Because we have seen languages such as D (without GC) and Rust that offer the same possibilities and performance, yet much less opportunity to blow your hand off by accident.

Before I read the article or the comments I thought it would be about rewriting code to not use dynamic allocation, which is IMHO a far more interesting (and challenging to some) exercise. Contrary to common expectations, it often doesn't mean e.g. restricting the lengths of inputs, and can result in simpler, more efficient, and less buggy code. From my experience it is usually those with a background in higher-level languages who overuse malloc().

You should have a look at the repository README. libc4life bends over backwards to provide stack allocation and value semantics wherever possible. Right now I'm working on an ordered map that allows user defined key/value sizes which lets it allocate all the memory it needs in one block and provide value semantics.

Any tips or links on how to do this?

Thank you, but that's not the problem I'm solving. Any one of those could be used to feed any implementation from the challenge. The idea I'm pushing is using local knowledge to customize memory management, instead of trying to find the perfect one-size-fits-all solution.

Aha, I see.. Very nice. Like the following implementation of free() as commonly used in HFT:

void free(void *mem) {}

That's a totally valid solution to some problems :) Having access to a decent allocator protocol lets you do that and more.

I recall reading in the last year or two a recount of how a game developer got their PS3 engine to run at 30 or 60Hz framerate by aggressive triple-buffering of their scenes.

One of the interesting bits about the article was their memory allocation scheme. Each game frame they'd allocate a single huge memory pool and then allocate from it by simply incrementing a pointer into the pool. I think this is what you describe as a slab allocator, because they never free()'d their allocations, they just recycled the pool after each frame had been rendered.

I kindof see slab allocator as a happy middle ground between allocating temporary memory from the stack and full-blown free-list allocator (or whatever your classic malloc implementation is.)

Are there any high level languages that have the ability to provision fast memory allocation pools like a slab where garbage collection occurs when the slab is no longer accessible, for instance?

This is how the minor heap in many garbage collection schemes works. eg. In OCaml (the GC I'm too intimately familiar with) allocation is just decrementing a pointer and testing whether you've hit the beginning of the minor heap, so it's extremely fast.

When the minor heap is exhausted you then take the hit of scanning the heap for objects which are still live and moving those to the major heap. If you're doing it right then most objects on the minor heap will be dead by this point so only a few will need to be moved.

A fairly common trick for games or any code with little tolerance for GC pauses is to size the minor heap large enough that every allocation required in a single frame can be satisfied from the minor heap, and then do an explicit sweep of that heap at the end of the frame / before waiting for a network message / while waiting for user interaction.

> Are there any high level languages that have the ability to provision fast memory allocation pools like a slab where garbage collection occurs when the slab is no longer accessible, for instance?

Rust has several slab allocation libraries, such as the "slab" crate. They use lifetimes to ensure that the slab outlives the objects stored in it.

This is so rad.

> a single huge memory pool and then allocate from it by simply incrementing a pointer into the pool

Maybe a register could be used to keep track of the current pointer into this pool, and every time you called into a new function the call could increment the pointer enough for all the usage in a function, and when the function returns it could set it back, automatically deallocating the usage with almost no cost at all?

But not popping the last allocation when the function returns is kind of the whole point... it is (or certainly was...) common to double-buffer these, so that one frame's data is built up, and then consumed by another thread/the GPU/etc. while the corresponding data for the next frame is being filled in.

You are describing an arena allocator. Slab is more like a set of free lists per object size.

IME arena doesn't really have a fixed definition. The term can and has been used to refer to something like an object stack or bump allocator, but also to something that supports deallocation and reallocation. The term goes back over 30 years and nothing specific has really stuck.

I think the most you can say of an arena is that it's usually a contiguous region of memory from which smaller allocations are made, and which can be efficiently freed as a whole. An arena may only support fixed-size allocations, or a range of sizes; it may or may not support deallocation. However, in many cases it's natural to require multiple regions to satisfy all allocation requests for a particular context (task, generation, etc), so don't be surprised if an implementation labels a collection of contiguous regions an "arena".

The term pool is similarly ambiguous, but usually implies support for deallocation and recycling of memory. It does not necessarily imply a contiguous region, but that's a natural optimization in a language like C.

Slab is less ambiguous because it has a very specific origin in SunOS--allocation and deallocation of fixed-size, often typed objects (to optimize initialization).

They're all out there floating around :) What I refer to as a pool allocator is a set of separate allocations, could be same size. While a slab is a single block of memory that's dished out as separate pointers, that could likewise be same size.

The GNU people call this "obstacks"; they are in the "iberty" library. (So named because it spells -liberty on a Unixy linker command line).


Obstacks provide objects by drawing them from a linear piece of memory incrementally. Additionally, this is treated like a stack: you can free an object, but that also pops all objects allocated since that one.

names vary. It's also called a scratchpad or region.


This might be the talk you're referring to.

This is it! Great presentation, thought provoking.

The slab reference allocator from the challenge does that, but adds a segmented approach where the memory is allocated in N sized blocks. Go has decent support for doing these kinds of tricks; but nothing really runs faster since there's still too much magic behind the scenes. I suspect the story is the same in Rust land but I honestly never got passed the lifetime hump, too much ceremony for me.

There's not really much runtime magic in Rust (pretty comparable to C++), and arenas are definitely a performance benefit in many cases.

All past 15-213 students go in search of their malloc lab solution.

Current 15-213 student here in the middle of malloc lab, my current solution is probably worse than anything in that link (left unclicked).

Give it a spin, you might learn something. Your class is probably focusing on general purpose, system level allocators; a much thornier problem without any really good answers. As How to Solve It states; if you can't solve the given problem, try to solve a simpler version of the problem.

In retrospect, that class was amazing. At the time, I hated it.

Awesome challenge.

Back when I worked in a C/C++ shop we'd use this as an in-person interview question for senior positions. The candidate was never expected to finish but more as a springboard to talk about the pro/cons and issues they'd seen with performance/etc of various approaches.

Good call, writing C/C++ without having a grip on memory allocation is a recipe for exactly the kind of disaster we're in right now :)

Depends on the domain - i suspect that this question would cut off qualified application programmers who are not perfectly familiar with lower level/infrastructure code.

Not if they're writing C/C++; it doesn't really matter how hard you try to hide memory management when the whole language is designed around memory and pointers. What kind of programmer is that, anyway? Who can only program specific api's, as long as they don't touch the wrong parts of the language.

There are different strategies for handling perf problems of memory allocation - you can cache blocks of the same size in some linked list within the application (free list) or you can reconsider the allocator/allocation strategy. Both get the job done.

Preferance of the first solution does not make you a worse programmer.

What you're describing is essentially a pool, which could be thought of as a kind of allocator since it provides an allocation strategy. There's even one in the challenge that does just that. Nothing makes you a worse programmer except refusing to learn.

I would argue that except in the case of legacy software there's no reason to be using C/C++ for something that doesn't care about performance or memory overhead.

Even then it's useful to know how sentinel values and other features work if you don't dig into the performance aspect.

What kind of software doesn't care about performance or memory overhead? And why would anyone want to use it?

How about: mmap() a few terabytes of virtual space let malloc() be pointer addition and let free() be a no-op?

Sure, the idea is that you can have several allocators around for different purposes. One allocator per http handler that reserves enough memory and resets the pointer between request makes the idea more feasible than insisting on the centralized general purpose perspective.

Except if the program needs to run for a non trivial duration or you need many instances of it at the same time/run at the same time as other processes.

Still you have obstacks where free is a no op - here you allocate off the stack (assuming that you dont need to store pointers after you are done). Ngnix and apache allocate a chunk of memory per connection and free it all when the connection is finished - but this results in high memory fragmentation, also this is not so good if connections take a long time.

Somebody out there is doing this in production. I guarantee it.

Why wouldn't they? If it turns out to be a good solution for the problem they're trying to solve? There's nothing wrong with coming up with your own solutions, that's why we got brains instead of answering machines.

It is a problem if it's not properly documented. This system will probably outlast their time at that company, so the next poor shmuck is left to figure out what that program is doing.

Agreed, but that's an if; there's no reason to assume anything. And the next scmuck also doesn't need to spend most his time struggling to find a combination of dependencies that still kind of work. There are advantages to owning your code.

There was a console game (I want to say Sega CD? CD32? or Atari Jaguar?) that reset whole system between level loads because it was cheaper than freeing memory.

Or add a simple GC to that, if you need to recycle memory during longer runs, and your stack is too small to hold most allocs.

Given the evaluation criteria, this sounds like a winner to me.

Let's all just agree to use Perl. It's just dynamic, functional, OOPy C isn't it?

J/K. I think this is an interesting problem in that its a sandbox for allocation and GC in pretty much any dynamic interpreter's implementation. My qualm is that it would be "easy" to tune for the test. Consider the difference between dynamic blocks of a small but fixed size, getting alloc'd/freed in an asynchronous way (a network stack?) versus a pool of variable byte length strings getting shuffled around (a key/value store?). Those are simple, but drastically different, strategies for your heap. There won't be a "best" answer besides the limits of your problem domain.

Been there, done that. Even went to YAPC EU in Pisa and had a whiff of Larry. Not for me, that's all I can say. It's too loose, too much shooting from the hip. There's a lot of good ideas in there though.

Agreed. Which is why the 'one size fits all' approach might not be the best way to go. The main reason I decided to launch the challenge, and encourage a more combinatory approach with local special purpose allocators.

A malloc benchmark that doesn't measure multithreaded performance is worse than useless.

Only if you insist on clinging to a general purpose, system level perspective on memory allocation. No amount of thinking and reasoning about these issues is useless.

While OP might've phrased it more politely, he's got a point - allocators that aren't designed for multithreaded use are ultimately oversimplified toy projects.

It's really not that much of a challenge to knock together a (slab + heap + free list) allocator that will perform really well single-threadedly. However it will be nearly impossible to adapt it to the multithreaded context. It is a considerably more complex task and the end result will end up looking like a rocket ship compared to a simpleton that even the best single-threaded allocator will look like.

Not if they're meant for specific uses in limited parts of your application; one allocator per thread, for instance. Why are you clinging so hard to the system level, general purpose perspective? Given that the problems it comes with don't really have any good answers. How is that supposed to lead us forward?

Doesn't the benchmark also just allocate and then immediately free? I think I could specialise for that particular use pattern and make it very fast just for the benchmarks. Will that not work for some reason? A good benchmark might be a replayed set of allocate and free operations from a large real application.

It does, but with random sizes. If think you have a way to make that run faster than the provided implementations, then please show us.

Challenge accepted (if I can find the free time)!

at UIUC, there's a project in the systems class (CS241) that is exactly this. there's a leaderboard with projects and how it compares to the system malloc for a variety of metrics

this is definitely one of the best projects i ever did in school and a great coming of age project. worst case, there's always an implementation at the back of K&R ;)

I agree. That assignment was one of my favorites. It was a lot of fun because it was fairly easy to get something that functioned, and as you came up with ideas for making it better (or just got ideas from reference implementations), you could watch your metrics get better or worse.

Exactly the experience I'm trying to provide with the challenge. For the price of forking the repository you get a framework for trying out and comparing your own allocation strategies.

What's the incentive? Credits? Give me a break.

It's fairly easy to beat the given examples but in the end heap management is heavily dependent on application, client code, platform, hardware and many other criteria. It's a very complex problem space and what matters here is how existing important code behaves and continues to behave given that existing code has most likely made assumptions how the heap is managed.

glibc is a good example of a perfectly fine compromise not optimized for any particular use case. Anyone who has had performance issues with it has most likely already implemented their own solution for their problem set.

It might much more worthwhile to develop a set of malloc like implementations a developer can chose from instead of going for a fits all approach.

Further, the complexity and need for flexibility is exactly the problems that I'm trying to deal with here. That's why the challenge encourages splitting the allocator up in Unix-like pieces and stacking them to get the desired features.

Pushing the state of the art in application level memory management? Too lofty? Come on, it's fun :)

What if I want to write a malloc that requires the size to be stored separately? (i.e. one that needs to be paired with free(ptr, length) rather than just free(ptr) for good performance.) Wouldn't that provide more flexibility in the challenge and be more useful (e.g. for C++'s std::allocator)?

The pool reference allocator does just that internally. It prefixes each allocation with a block containing the size among other things. Either base your implementation on top of that or take the idea and run with it.

You totally missed the point of my question. I was NOT asking "I have an allocator that doesn't store the buffer size; how can I use it?"

I was asking, "I don't think an allocator should need to store the buffer size internally; why not formulate the challenge so that the block size doesn't need to be stored?"

I would not do that. Userspace Apps have way to many ways to screw up memory managment already. Allowing them to

    free(buf, 256);
seems dangerous, if free can't check the size.

But kernels sometimes do just that ( free(ptr, size) ), for performance reasons and because "kernel writers know what they are doing".

That's a pretty nasty requirement on top of an allocator protocol; I wouldn't want to use it, or debug it for that matter, yikes.

And you don't need to store the size. The slab allocator doesn't store any information except how far into the current slab it's already dished out memory.

I'm trying to build it in MacOS but I'm inundated with errors. For example malloc_perf.c:39:3: error: implicit declaration of function 'clock_gettime' is invalid in C99 [-Werror,-Wimplicit-function-declaration] BENCHMARK("basic", &c4malloc); ^

Looking at the implementation of libc4life, I think it's full of gcc-isms. C4DEFER() is this:

    #define _C4DEFER(code, _def)				\
      void _def() code;					\
      bool _def_trigger __attribute__((cleanup(_def)))	\
    #define C4DEFER(code)				\
      _C4DEFER(code, C4GSYM(def))			\
So, nested functions and gcc attributes.

Nested functions awesome, but they're a gcc extension, and only supported on some architectures anyway, and AFAIK only work if you have an executable stack, which is frowned on these days (because they have to create a callable thunk to stash the nested function's context pointer).


My understanding is that clang doesn't support nested functions, but it does have its own non-standard extension, blocks. But of course that's still not standard C.

I'm currently writing clunky C89 code for an old compiler (and occasionally, K&R C!), and I got really excited by this library for a moment, but... nope. Non-standard. Can't use it.

I had to draw a line somewhere, and C99 with GNU extensions is where it is. Cleanup attributes and anonymous functions are just too useful to leave behind. And since I'm using clang to develop this, I'm pretty sure it supports nested functions just fine.

Hm. Sounds like they added them --- nested functions certainly weren't supported the last time I looked (because, as you say, they're far too useful, and I hated having to give them up).

Have you had any reports about problems on, e.g., OpenBSD, related to needing an executable stack?

Could be, it's moving fast. Once I realized I could use nested functions to provide anonymous functions and deferred actions with decent syntax in a semi-standardized way, I was hooked :)

Nothing, but I seldom hang around in the BSD crowd these days.

MacOS doesn't provide clock_gettime in its libc. You have to use a Mach specific timer. There are some libraries that abstract this but obviously the author never built it on a Mac, which is their choice.

I just don't have the patience for anything but Linux these days, sorry about that.

I totally agree, when I'm working on my OS project I just assume Linux/GCC. These days you're only about 30 seconds away from having a brand new Linux machine up and running in AWS/Google. Porting across systems is just too time consuming. As this example shows you can't even count on basic POSIX stuff working everywhere.

its an interesting exercise for learning but the code style is awful.

freel instead of freelist? ffs.

abbreviating memory to mem is enough of a mistake in the standard library without going further to m like malloc does and some of the examples here.

still, much respect, to the coder4life for making such a good effort and having such an awesome name...

The naming discussion stops here; if you have nothing more constructive to add, I suggest silence. They are not mistakes, they are born out of 30 years of experience with naming things. We will never come to an agreement on this, and it's ok. Why not skip ahead to the more interesting discussions that actually lead anywhere?

Coding style never saved anyone's C code from being insecure. So it's not something to be concerned about, C code is generally awful on its own.

What's awful with C on it's own? You can write all sorts of code in C.

Looks like fun. Wish I had time to join in :(

Oh come on, there's always time for fun :)

GitHub apparently infests every code snippet in existence. People work for free and GitHub gets the credit/branding.

Too be fair, they're also providing a decent service for free.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact