
Stack, Heap, Pool - ingve
http://bulldozer00.com/2015/09/14/stack-heap-pool/
======
bryanlarsen
Why is stack allocation so slow? Fundamentally, it should compile down to a
single machine instruction (adjusting the stack pointer), correct? Sure, the
first time it may cause a page fault so the OS can grow your stack, but that
should only happen once. What else is happening that I'm missing?

AFAICT, the big difference in the linked code is that the stack allocation
code uses the constructor whereas the pool allocator uses a special function
called reset(). Is this just a bogus benchmark that is measuring the speed
difference between the constructor and reset()?

~~~
Matheus28
It won't even modify the stack pointer during the loop.

Here's what happens: in between every loop iteration, for the correct code
(stack & heap), the BigObject has its destructor called, then its constructor
is called again, zero-filling the array.

For the pool, it calls "reset", which does nothing. What should be done is
calling the destructor in place, then the constructor in place. Otherwise
you're not really doing the same thing as the other allocations.
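A fair per-iteration "reset" would re-run the object's lifecycle inside the pool slot. A minimal sketch of what that looks like (hypothetical `BigObject`, standing in for the article's class):

```cpp
#include <new>

// Hypothetical stand-in for the article's BigObject.
struct BigObject {
    char data[4096] = {};  // the constructor zero-fills this array
};

// What a fair pool "reset" has to do between iterations:
// run the destructor in place, then the constructor in place,
// so the pool pays the same construction cost as stack/heap.
void reset_in_place(BigObject* obj) {
    obj->~BigObject();      // destroy in place
    new (obj) BigObject();  // reconstruct in place (zero-fills again)
}
```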

Object pools are better than heap allocations, but they are _never_ faster
than stack allocation (which is virtually free).

TL;DR: The author is cheating, the benchmark is plain wrong.

~~~
filthydumbidiot
Note also that the real cost of the pool allocation is up front in the "Pool
construction" field. At 368ms in this benchmark, it takes even longer than all
of the heap allocations!

With pools, you pay an up-front cost in exchange for faster allocation later.
A better way to arrange the benchmark would be to show frequent acquisition
and release of objects from a long-lived pool.

~~~
Matheus28
To be honest, I'd consider the entire benchmark void.

It really depends on how heap allocation is implemented, so it's hard to
compare. With a memory pool, you can allocate one huge block of memory from
the OS up front (a small initial cost) and lazily initialize the values. That
buys you: 1. cache locality. 2. A lot less fragmentation.

I'd rank it as:

1. Stack allocation (essentially free)

2. Memory pool (a small hit every once in a while)

3. Heap allocation (significant overhead compared to the other ones)
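The "huge block up front, cheap reuse afterwards" idea can be sketched like this (a hypothetical `FixedPool`, not the article's implementation):

```cpp
#include <cstddef>
#include <new>
#include <utility>

// One contiguous block carved into fixed-size slots; free slots are
// threaded through the block itself as an intrusive free list, so
// acquire/release are O(1) and live objects sit next to each other
// (cache locality, no fragmentation of the general heap).
template <typename T, std::size_t N>
class FixedPool {
    union Slot {
        Slot* next;                                  // while free
        alignas(T) unsigned char storage[sizeof(T)]; // while live
    };
    Slot slots_[N];
    Slot* free_ = nullptr;

public:
    FixedPool() {
        for (std::size_t i = N; i-- > 0;) {  // thread the free list
            slots_[i].next = free_;
            free_ = &slots_[i];
        }
    }

    template <typename... Args>
    T* acquire(Args&&... args) {
        if (!free_) return nullptr;          // pool exhausted
        Slot* s = free_;
        free_ = s->next;
        return new (s->storage) T(std::forward<Args>(args)...);
    }

    void release(T* p) {
        p->~T();                             // destroy in place
        Slot* s = reinterpret_cast<Slot*>(p);
        s->next = free_;                     // push slot back on the list
        free_ = s;
    }
};
```

Note that a full destructor/constructor cycle still runs on every acquire/release, so nothing is being skipped the way the article's reset() skips it.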

------
filthydumbidiot
By explicitly calling the destructor, you could do away with the requirement
that classes must implement a "reset()" member function (which, aside from
being an extra step for users, does kind of make RAII in their classes
somewhat confusing):

    
    
      void release(OBJ* obj) {
        // ...
        obj->OBJ::~OBJ(); // OBJ was the template type
        // ...
      }
    

acquire() could be improved by using placement new and forwarding the
constructor arguments via variadic template:

    
    
      template<typename... Args>
      OBJ* acquire(Args&&... args) {
        // ...
        return new(mem) OBJ(std::forward<Args>(args)...);
      }
    

Finally, you can avoid the potential up-front construction cost (and remove
the need for a default constructor in user classes) by making an inner dummy
struct that matches the size and alignment:

    
    
      template<typename OBJ, int32_t NUM_OBJS>
      class Pool {
        struct MockOBJ {
          alignas(OBJ) uint8_t data[sizeof(OBJ)];
        };

      public:
        Pool(){
          // ...
          std::for_each(std::begin(_available), std::end(_available),
            [](OBJ*& mem){mem = reinterpret_cast<OBJ*>(new MockOBJ);});
          // ...
        }
        // ...
      };

------
tkinom
Did similar profiling with Borland/Turbo C++ a while back. (Yes, I am that
old.)

When I needed to alloc() 1-100 million objects, the alloc()/delete() times
increased exponentially. The BC++ profiler tracked that down to free-list
traversal.

Implemented my own version of a pool in C++, and the new/delete time dropped
to constant. Later on, I implemented a version on Linux (both in kernel and
user space + SMP) on a faster CPU (1.5GHz) and got that API time down to
sub-microseconds, measured with rdtsc.

It makes a huge difference for any code that needs to alloc/delete millions of
objects as fast as possible.

------
mikeash
Why is the author testing with optimizations off? If the worry is that the
optimizer will delete the code you're trying to time (a valid worry, since it
actually does for me) then you should come up with better test code that the
optimizer won't remove, not just turn it off. The performance of unoptimized
code can be vastly different in all sorts of ways, and it's not usually very
informative.
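One common way to write test code the optimizer won't remove (the trick behind Google Benchmark's `DoNotOptimize`; GCC/Clang inline asm, a sketch rather than a portable solution):

```cpp
// Empty asm block that claims to read the pointer and touch memory:
// the compiler must assume *p is observed, so it cannot delete the
// allocation being timed. GCC/Clang only.
inline void do_not_optimize(void* p) {
    asm volatile("" : : "g"(p) : "memory");
}

// In the timing loop:
//   for (int i = 0; i < iterations; ++i) {
//       BigObject obj;
//       do_not_optimize(&obj);  // obj must now be materialized
//   }
```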

------
leovonl
This code is not only non-optimal, but also inefficient.

First, it does a linear search - O(n) time - because it keeps adding null
pointers for elements that are removed from each list. This makes literally no
sense. Just removing the null pointer insertion would already improve the
performance a lot.

Second, the multiset is also a poor choice for the available-objects list,
given its performance characteristics. A structure like a deque, with O(1)
insertion/removal at both ends, would be much better suited for the job.
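A sketch of that change (hypothetical names; `Obj` stands in for the pooled type):

```cpp
#include <deque>

struct Obj { int v = 0; };  // stand-in for the pooled type

// std::deque as the available-object list: push_back/pop_back are
// O(1), with no ordering overhead and no linear search for a free
// slot, unlike a multiset (O(log n) insert) holding null entries.
class AvailableList {
    std::deque<Obj*> free_;
public:
    void put(Obj* p) { free_.push_back(p); }  // O(1)
    Obj* take() {                             // O(1)
        if (free_.empty()) return nullptr;
        Obj* p = free_.back();
        free_.pop_back();
        return p;
    }
};
```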

There are a few papers/presentations on memory allocation in C++ from Andrei
Alexandrescu, like this one:
[http://nwcpp.org/talks/2008/memory-allocation.screen.pdf](http://nwcpp.org/talks/2008/memory-allocation.screen.pdf)

I'd suggest giving them a read; they offer good insights into simple yet very
powerful techniques for improving memory allocation with little code.

------
nine_k
TL;DR: the author compares object allocation times of heap, stack, and pool
methods, in C++. Heap is slowest, stack is much faster, pool is insanely fast.
A pool implementation is shown.

~~~
ska
Am I the only one who believes "TL;DR" adds negative value here (in general,
not just in your contribution)?

~~~
eximius
You are almost certainly not the only one, but I can't say I agree with you.
In an age where we're bombarded with information and it's a struggle (if even
possible) to stay current on technology trends, TL;DRs give a summary that
lets you make a more informed decision about whether an article is worth your
time - titles alone usually aren't sufficient to determine that.

~~~
philsnow
If you just blindly read the TL;DRs, though, you're ceding judgment of the
article's merits to whoever wrote the TL;DR.

In this case, the TL;DR parrots the (false) conclusion from the article, so if
you stopped here, you now have false facts in your brain.

~~~
eximius
As I have said, this is a problem with specific TL;DRs not the general concept
of them. (It is still a relevant concern, but it isn't sufficient reason to
disregard them completely.)

Also, I generally use TL;DRs to decide _WHAT_ to read; I don't use them as my
source of information. So I would have disregarded the TL;DR's incorrect
conclusion and read the article for myself (which I did).

