Hacker News new | past | comments | ask | show | jobs | submit login
A not-so-quick introduction to the C++ allocator model (quuxplusone.github.io)
141 points by jandeboevrie on June 4, 2023 | hide | past | favorite | 32 comments



How do you update pointers when relocating the object? I assume this must be combined with some rust-ish assertions about the number of references to the object? Or I suppose it's C++ so you just have to do it right or you get to keep the pieces.


> Or I suppose it's C++ so you just have to do it right or you get to keep the pieces.

Yep. One reason why giving out pointers/references all willynilly to items in something like a std::vector can be fraught with peril. If you push to that vector, it could trigger a realloc of all of the previous items, leaving you with dangling pointers to everything you previously took references of.


By the time someone gets to the point where they are keeping pointers while reallocating the data structure that contains the objects, they have made a bunch of wrong turns already.

It is much better to use indices or keys instead of pointers since that is the interface to the data structure.


I was recently faced with this (luckily I realized it before coming across any strange bugs), and realized std::deque works great for this if you don’t need random access (which was fine for my use-case because I just grabbed references immediately after insertion).


std::deque has O(1) access by index so it's fine for random access (although it has some extra indirection so it's still slower than vector)


It's O(1), for high values of 1.


Wow I was literally reading the docs last night and didn’t notice operator[]! Good to know


An occurrence of this bug caused a lot of pain for one of my previous employers.

One way we tried to isolate the bug was creating a replacement for std::vector. This vector used an array for the elements, but capitalized on the x86 page-protection system:

- Between every pair of adjacent elements, this vector allocated a dummy page, marked as un-readable and un-writeable. Any attempt to access it would immediately trigger a segfault. This was to detect erroneous attempts to access some vector element; e.g. punning it to a type larger than the element's actual type.

- Any time the vector needed to replace (or just retire) its internal array, it would not return the array to the free pool. Instead, it would leave it allocated, but mark all of its elements as un-readable and un-writeable. This was to detect usage of stale pointers into the vector.

I think the team discovered the bug by other means, but I still think this approach has some merit.

TL;DR:

- The basic idea was inspired by Electric Fence.

- This vector wasn't exactly a drop-in replacement for std::vector, because (at the time) it seemed like that would require a lot of additional code.

- Obviously this wastes a lot of memory. It's only something one would use in certain debugging situations.


At that point you might as well use a dynamic array which doesn't invalidate old pointers, when growing the internal buffer: <https://yakubin.com/notes/comp/reserve-and-commit.html>. It also exhibits much better space utilisation than std::vector, not worse.

(Sorry about the suboptimal formatting. It appears that web technologies aren't up-to-par when it comes to this type of content. I intend to rewrite this post in TeX and add some graphs in TikZ when I find a spare moment.)


Just to clarify, our approach was to help find the bug that was misusing an instance of std::vector.

It was a temporary measure, put in place only until we found the bug.


This looks like a really neat approach!

I'd be curious to see how this approach performs when integrated in a running system. In particular, there are some additional costs associated with this approach which might not appear in the listed microbenchmarks. Using mmap/mprotect/munmap like this will cause page faults and TLB shootdowns. After mprotect()-ing memory to allow writes to it, writes will still initially cause page faults. On page fault, the kernel will need to allocate a page of memory, map the page into the process, and then a TLB shootdown to clear the core-local caches of the page table. In addition to these costs, every page fault incurs a context switch to/from the kernel for each 4KB page.

The TLB shootdowns are especially costly, since they slow down all threads currently running in the process, not just the threads that are performing the memory access. As a result, this approach could slow down other parts of the program that aren't even using DynArray.

One potential approach to mitigate this is to use Large/Huge pages. Because all the costs that I mentioned are incurred on each page miss, you can make misses less frequent by using large pages. The issue is that, if you're going around allocating 1GB pages willy-nilly, then you're very quickly going to run out of system memory. You can also run into memory fragmentation issues, since large pages require the system to have 1GB of physically contiguous memory available (whereas with 4KB pages, you can slice and dice from anywhere in physical memory).

Another solution is to use caching/leverage a user-space memory allocator. One benefit of using the std::vector approach is that, if you're using a properly tuned memory allocator, once your program reaches a steady-state of memory usage, it should never need to ask the OS for more memory. Because the memory allocator is just recycling memory over and over again, it never needs to update its page tables (by using mmap() to ask for more memory from the OS), and so it never has page faults or incurs TLB shootdowns.

Regardless, there are lots of interesting trade-offs in this space. Especially between memory usage and compute. And for short-lived applications, for example, I could see DynArray being a clear win.

One of the things I really like about performance optimization is the amount that you can experiment--different solutions perform better in different scenarios, and it's fun to try them all out!


> Between every pair of adjacent elements, this vector allocated a dummy page, marked as un-readable and un-writeable. Any attempt to access it would immediately trigger a segfault.

I'm struggling to figure this one out. Essentially every two elements would be stored at the start and end of a page, respectively? Followed by a dummy page?

8KB per pair of elements I guess, hope your vectors weren't too large :p


Sorry, let me clarify. Suppose you have a vector with 3 elements, and each individual element fits within a single memory page.

Then the array layout is:

page[0] : dummy page

page[1] : element 0

page[2] : dummy page

page[3] : element 1

page[4] : dummy page

page[5] : element 2

page[6] : dummy page

IIRC. It's been a while :)


It's probably between blocks of elements. Like, when the vec needs to realloc they insert a guard page. This is a pretty common allocator trick


I love using this trick. It's extremely useful for having relatively high confidence you don't have memory bugs


In game dev, you sometimes avoid holding on to pointers for more than a single frame or operation because of this. Especially since many things get created or destroyed over time, and we want to keep our arrays densely packed to reduce cache misses.

We are trying a database like approach for retrieving references to things in our current project. It adds a small performance penalty, but it’s far from a bottleneck and it’s been nice to work with


Did you ever work on a game development project where you would have rather done what you did versus using Unity?

Would you want to work in a way where performance didn't matter?

How could you tell the difference between the following two scenarios:

- A specific kind of higher performance was important.

- Higher performance wasn't important: you'd get the same reviews, sell the same copies, whatever graphics improvements you got didn't matter, and maybe you even finish your game 2x sooner because you weren't writing an engine, all else being equal. But because it was intellectually stimulating to you to deal with performance, you committed code every day, which is better than nothing.


Yes. Sorry for the low brow comment, but fuck unity. I really dislike the fact that if someone wants to become a gamedev in 2023, their only practical options are to write everything themselves, learn unity, or learn unreal engine. I’ve been putting together an alternative. https://github.com/shawwn/noh

It’s frankly amazing to work with a production grade engine that compiles in three minutes. I wouldn’t trade it for all the complexity in unity, regardless of how many extra copies I’d sell. But I realize I’m in the minority.

I think it matters to love the tech stack you use, and gamedev has gone out of fashion mostly because no one loves any of the stacks. They deal with them, which is different.


Raw C or C++ pointers you can’t, at least not trivially. You either have a double pointer masquerading as a single pointer, or some other fanciness.

Come to think of it, std::shared_ptr has a control block that points to the actual data. I think implementations allocate the control block and the data contiguously for performance reasons, but you don’t really have to.


There is a std::make_shared( constructor arguments... ) function which will do one allocation for the control block + item data. I don't think compilers will do it automatically (it has other implications over a normal std::shared_ptr) so you need to opt into it.


std::make_shared exists to allocate the control block at the same time as the memory itself. This has a pretty major disadvantage, the memory for the shared pointer won't be deallocated until the last std::weak_ptr has been destroyed, even if the object has been destroyed.


And the major advantage that object data by being coupled with control block reduces fragmentation, and avoids the error of wrongly handling memory leaks when new obj() fails as parameter to shared pointer construction.


Yep, I realise I only talked about the negatives, not the advantages. In general I prefer to use make_shared and be careful with uses of weak pointers.


Ah good point. But in exchange you get cache locality of control block + data and single instruction deref.


Indeed. It's a tradeoff, and I personally prefer std::make_shared.


`boost::intrusive_ptr` or the proposed `std::retained_ptr` is required to guarantee that.


Indices, regular or generational.


Have to admit to skimming -- this is complicated, but neat! -- but I also have to say lack of pluggable per-container allocators in Rust is perhaps my chief complaints at this point; really miss this from the C++ STL, even if this article does point out how it can get snakey and foot-gunny.

I don't pay attention much the process or politics, but allocator_api feels like it is stuck in permanent limbo. https://github.com/rust-lang/rust/issues/32838 -- opened 6 years ago, still pending.


See also the proposal for language-support for scoped allocations:

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p26...

There are some issues with PMR usage that it hopes to resolve, for example:

   struct Aggregate {
      std::pmr::string data1;
      std::pmr::string data2;
      std::pmr::string data3;
   };

   std::pmr::polymorphic_allocator a1;
   Aggregate ag  = {{"Hello", a1}, {"World", a1}, {"!", a1}};

   std::pmr::vector<Aggregate> va(a1);
   va.emplace_back(std::move(ag));   // Correct allocator retained by moves
   va.emplace_back(ag);              // Error, copied lvalue has wrong allocator
   va.resize(5);                     // Error, new values have wrong allocator
   va.resize(1);                     // OK, remove all objects with bad allocators

The proposed syntax is via a "using" keyword:

   std::pmr::polymorphic_allocator a4;
   ScopeAggregate s1 using a4 {"Hello"};


> You must go far out of your way, at Bloomberg, to encounter a container that works the same way as in the ordinary C++ STL.

When I worked in the group writing the Bloomberg Enterprise backbone we would go out of our way to avoid using the BDE/BSL library. OP is right about it not working the same way as ordinary C++. :-(


> You must go far out of your way, at Bloomberg, to encounter a container that works the same way as in the ordinary C++ STL.

You need only disable the "bsl overrides std" build option. Whether you can do that is another question.


-DBSL_OVERRIDES_STD has been gone for a while now.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: