PS. Also the block alignment is aligning the wrong thing. It should be aligning the start of the public (returned) part, not the block size; that's done by padding the Block struct as required (if required)... and not to a hardcoded 8 bytes. Otherwise you'll get SIGBUS on RISC boxes.
PPS. At the risk of stating the obvious, this is nothing more than a toy allocator, with lots of loose ends. A lab exercise for a second- or third-year CS course, maybe, but this is not something suitable for production. Perhaps it works for some specific cases, but these aren't specified. So seeing this massively upvoted and at the top of HN is a bit strange.
Basically you can fail memory allocation with a lot of small allocations that aren't using all of a block. This situation bit me when using some C++ code on a Cortex-M class processor where the C++ code ended up creating a bunch of objects with very small allocations in them.
That said, the code is pretty clean and easy to read, so it's a pretty understandable allocator. Some malloc implementations are so opaque you have to study them for a week to tease out how they work.
1) ta_free() is only O(n) if block compaction is enabled, else it's O(1), i.e. a simple cons of the freed block addr to the free list (https://github.com/thi-ng/tinyalloc/blob/master/tinyalloc.c#...)
2) there's no hardcoded alignment (it's configurable via the TA_ALIGN macro), and I'm confused about your other comment about not aligning the right address in general. That observation is wrong, and I think you misunderstood the diagram in the readme.
3) it's clearly stated in the readme that this is largely meant for memory-constrained situations (or where large numbers of blocks, realloc, or multithreading aren't needed/available), so it's not meant to be a replacement for dlmalloc/jemalloc-caliber allocators etc. I needed something small & lightweight and it did its job very well in several production projects on the ARM STM32 platform (where I have up to around 8MB managed). I also found it very useful for some ongoing JS/WASM interop projects (there's a JS version here: https://github.com/thi-ng/umbrella/blob/master/packages/mall...)
In the end we still had to add one layer of hack^H^H engineering tricks to workaround fragmentation issues in the worst case :-)
This seems like it has some nice features, though 3K of overhead could matter in such a constrained environment (I guess if it's only managing 128K, that overhead could shrink).
The library was implemented in C++. We also ported it to QNX in addition to the Linux kernel.
I remember it could do over 1 million message blocks per second between user-space programs, with similar performance between user and kernel space.
It did waste some amount of space; that could be configured at system startup.
Thread1 allocate (block index++)
Thread2 allocate (block index++)
Thread1 free (block index--)
Thread3 allocate (block index++)
Now thread 3 has a block that is actually still owned by thread 2.
When a thread, process, or kernel thread "allocated" a block, it actually acquired and owned the "index" via an atomic increment on the free list.
When it freed the block, it put the block's index back on the "free" list.
The "free" list is actually an array, so the list index can be allocated and freed with atomic increment/decrement. That lets it do SMP-safe alloc/free operations without any locks while guaranteeing correctness.
This was actually an IPC/DIPC library to support TCP session redundancy in a network router product. Other programmers called this library from user space, kernel space, the TCP stack, etc.
Since the blocks and indices all live in shared memory, there is huge potential for shared-memory corruption. In debug mode there is guard space before and after each block, and every allocate and free is checked for ownership, double free, and guard-space corruption. Regression tests covering every aspect of the API ran 24/7 against every commit. When not compiled in debug mode the operations were much faster, but quite a few assertion checks remained in place.
There was a debug trace system that tracked all system activity (including Linux, QNX, and vxWorks context switches). The log lived in a special region of memory that was preserved across warm boots. If the system hung completely, in 99% of cases I could recover the memory to do post-mortem analysis and know exactly which user- and kernel-space threads were running before the crash. It was more useful for vxWorks, less so for Linux, since the majority of the code was in user space and didn't take down the system.
The regression test coverage code and the debug-mode validations were 80% of the code. The actual implementation was very small.
256 blocks of memory: you use 8 words of 32 bits each to indicate used/free status. 1 means free, 0 means occupied. To quickly find a free block you compare the words with zero.
I’m fairly happy that MISRA C forbids malloc. I can’t use it even if I wanted to, so I need to be more careful about how memory is used. Annoying at first, but it makes a lot of sense later on.
Also if the goal is to get one chunk of memory with a single call, why use malloc at all? Why not use the direct memory mapping command?
Under that system malloc/new was just a pointer bump, and an assert if there wasn't enough room. Free wasn't linked, and delete asserted. That heap region (or mine had several heaps that lived in different memory with different characteristics that could be specified with a new overload) is way easier to define in your linker.
There are differences of course, but that memory can be used in the same way, and so memory pools are orthogonal to how the memory is acquired.
The unit of reuse that defines the type, example: http://c-faq.com/struct/sd1.html
We wrote the system I mentioned above (128K of RAM, 256K of flash) without it. I agree: you can do everything you need without malloc when you're careful.
Unfortunately we had a third party dependency that required a malloc implementation to be able to run, so we had to implement a simple one.
We ran into the same thing. Some open source compression code I wanted to use had malloc in it, so we dumped it for something else that didn't. Luckily we found a protobuf implementation that didn't use malloc, which saved some time on a different project.
First thing I do now when looking at example or source we might want to use is a quick scan for malloc.
Sometimes you don't get to choose or do it yourself, more's the pity!
Right: when coding a network protocol for embedded, you can often free in the same order, i.e. it's often a good idea to limit parallelism to a small number of FIFO streams (ideally a single one).
That’s only for small chips. When they aren’t small, e.g. you have a quad-core ARM with a gig of RAM, you usually have an OS anyway and the programming is totally different: much higher level, and you're guaranteed to have a fast malloc and lots more.
allocation a = m.allocate(512);
These 3 files should be a fairly self-contained, minimal way to allocate into existing buffers. No attempt is made at thread safety, though (we assume the user synchronizes that themselves).
A neat trick would be the ability to 'trim' the data used by a handle; then when it comes to defrag time, you only deal with the remaining part and free up the trimmed space.
getrandom(&addr, sizeof addr, 0);   /* random hint; MAP_FIXED needs page alignment */
addr &= ~(uintptr_t)0xFFF;
mem = mmap((void *)addr, len, PROT_READ | PROT_WRITE | PROT_EXEC,
           MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
Don't forget that there are memory-managed languages that compile to C. They still use malloc under the hood.