A Well Known but Forgotten Trick: Object Pooling (highscalability.com)
65 points by hepha1979 on July 29, 2015 | 55 comments



Object pooling was very common in Java last century. Today, it survives only for the most heavyweight objects: threads, database connections, web browsers, that sort of thing.
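
(For reference, a minimal sketch in Java of the kind of pool being discussed; the class and method names here are made up, not from the article:)

    import java.util.ArrayDeque;
    import java.util.function.Supplier;

    // Minimal object pool: borrow() reuses a previously released instance
    // when one is available, otherwise it allocates a fresh one.
    final class Pool<T> {
        private final ArrayDeque<T> free = new ArrayDeque<>();
        private final Supplier<T> factory;

        Pool(Supplier<T> factory) { this.factory = factory; }

        T borrow() {
            T obj = free.poll();              // reuse a pooled instance if any
            return obj != null ? obj : factory.get();
        }

        void release(T obj) {                 // caller must stop using obj here
            free.push(obj);
        }
    }

A real pool for heavyweight resources (connections, threads, buffers) would also bound its size, reset object state on release, and handle concurrent access.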

What killed it off was generational collection. Compacting generational collectors make allocation very cheap, and they make maintenance of short-lived objects very cheap. As a result, it's reliably cheaper to allocate an object and throw it away than it is to attempt to reuse it.

So it's surprising and interesting to read that the author of this article clearly thinks that there are still benefits to object pooling. Since Alex Petrov is a very sharp guy, I have to take him at his word. But it's a shame he didn't include benchmarks to compare it to idiomatic use of a modern GC.


It's very much alive in the Java low-latency space, where even young GCs are a problem. Allocating temps/short-lived objects also trashes your CPU caches, and generally there are other reasons to avoid any GC in this space.


> Java low-latency space where even young GCs are a problem ... there are other reasons to avoid any GC in this space

I'm no Java hater, but that sounds like the wrong place to use Java in the first place.


It has its pros/cons but you'd be surprised (perhaps) at the throughput and latency you can get if you're mindful of things to avoid.


Which is, IMO, part of the reason why Java isn't great for that. A serious pro can make it go like a racehorse, but beginners and even long-time users may fall into traps. If a language has fewer things to be mindful of, its users may be less mindful, with fewer dire consequences.

Of course, this totally depends on who might be touching the codebase...


I think this is more an article about garbage-collected VMs, namely the JVM, than it is about Java, the language.

The JVM is being used in Mt. Everest-sized workloads. Object pooling is a useful tool when architecting something huge and performance sensitive.


Any language used for producing high-performance software requires more than just beginner knowledge; each one has its own specific things to keep in mind, along with general cross-language concepts.


Possibly true, in the sense that any GCed language suffers in high-performance or soft real-time apps (e.g. games, audio software). However, other constraints might influence the choice of technology: existing experience among the available team members, vendor-mandated technology choices (e.g. Android apps, ignoring the NDK), or situations where speed to market (and therefore development speed) or, debatably, program correctness is more important.


You'd be surprised. With an allocator and collector that is aware of real time constraints, GC can actually be a pretty huge advantage for achieving low latency.


GC is essentially never an advantage for low latency, but it is not incompatible with it either. Things like Metronome can give you extremely well-defined latencies.

It's fairly moot for hard real-time programs though, as those typically completely eschew dynamic allocation (malloc can have unpredictable time too).


> GC is essentially never an advantage for low latency

I can't really agree with that statement. One way to get to lower latency is to avoid using locks and rely on lock-free algorithms.

Many of those are much easier to implement if you can rely on a GC, because the GC solves the problem that you can have objects that are still referenced in some thread but that aren't reachable from the lock-free data structure anymore. There are ways around this, e.g. using RCU or hazard pointers, but mostly it's easier with a GC.
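
(As an illustration, a standard Treiber-stack sketch, the textbook case of this point: a node popped by one thread may still be read by another thread holding a stale reference, and the GC simply keeps it alive until nobody does.)

    import java.util.concurrent.atomic.AtomicReference;

    // Lock-free (Treiber) stack. Safe memory reclamation is delegated to
    // the GC: a popped node is never recycled while another thread still
    // holds a reference to it, which is the problem hazard pointers or
    // RCU have to solve by hand in non-GC languages.
    final class TreiberStack<T> {
        private static final class Node<T> {
            final T value;
            Node<T> next;
            Node(T value) { this.value = value; }
        }

        private final AtomicReference<Node<T>> head = new AtomicReference<>();

        void push(T value) {
            Node<T> n = new Node<>(value);
            do {
                n.next = head.get();
            } while (!head.compareAndSet(n.next, n));
        }

        T pop() {
            Node<T> h;
            do {
                h = head.get();
                if (h == null) return null;   // stack is empty
            } while (!head.compareAndSet(h, h.next));
            return h.value;
        }
    }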


Do you have an example? I'm not super familiar with lock-free structures, since when I've worked on low-latency things there has been a need to quantify the worst-case timing which rules out most of the lock-free options.


It might make it easier, no? I'm working on a perf-sensitive program now. It's written in C (mainly for performance). It's spending about 25% of CPU time in free/malloc. Yikes.

This happened because it has an event dispatcher where each event has a bunch of associated name/value pairs. Even though most of the names are fixed ("SourceIP", "SourceProfile", "SessionUuid", etc.), the event system ends up strdup'ing all of them, each time. With GC we could simply ignore this: all the constant string names would just end up in a high generation, and the dynamic stuff would get cleaned in gen0, with no additional code. (As-is, I'm looking at a fairly heavy rewrite, affecting thousands of callsites.)


So what's the reason for strdup'ing vs. having const names that never get freed? Also, it sounds like you could use ints/enums to represent the keys and provide string-conversion util functions. Anyway, spending 25% in malloc/free is just poor code, but you already know that. This really isn't about GC :).

Gen0 or young GC still involves a safepoint, a few trips to the kernel scheduler, trashes the CPU instruction and data caches, possibly causes other promotions to tenured space (with knock-on effects later), etc. It's no panacea when those tens/hundreds of millis are important.


'Cause not all of the strings are const; some are created dynamically. Third parties add to these event names at runtime, so we don't know them ahead of time. An int-to-string registry would work at runtime, except for the dynamic names.

I was just pointing out that GC can "help", by reducing complexity and enabling a team that otherwise might get mired in details to deliver something OK.
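
(For illustration, a sketch, in Java rather than C, of the int-to-string registry idea from the parent comments, here extended to hand out ids for dynamic names on first use; whether that lookup cost is acceptable depends on the workload, and all names below are made up.)

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Name registry: each distinct name (fixed or dynamic) gets a small int
    // id the first time it is seen; events then carry ids instead of
    // per-event copies of the strings.
    final class NameRegistry {
        private final Map<String, Integer> ids = new HashMap<>();
        private final List<String> names = new ArrayList<>();

        synchronized int idOf(String name) {
            Integer id = ids.get(name);
            if (id == null) {
                id = names.size();
                names.add(name);              // the single retained copy
                ids.put(name, id);
            }
            return id;
        }

        synchronized String nameOf(int id) { return names.get(id); }
    }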


In a latency sensitive system, you want to minimize how much time you spend allocating and deallocating memory during performance critical moments. GC gives you a great way to leave those operations as trivial as possible (increment a pointer to allocate, noop to deallocate) during performance critical moments, and clean up/organize the memory later when outside the time critical window.
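
(Roughly what "increment a pointer to allocate" means, as a toy arena sketch in Java; real VM nurseries are far more involved, but the allocation fast path has this shape.)

    // Toy bump allocator over a preallocated region: allocation is a bounds
    // check plus a pointer increment, freeing individual objects is a no-op,
    // and the whole region is reclaimed later in one go (much as a copying
    // nursery recycles its space at collection time).
    final class Arena {
        private final byte[] region;
        private int top;                      // next free offset

        Arena(int size) { this.region = new byte[size]; }

        int allocate(int bytes) {
            if (top + bytes > region.length) {
                throw new OutOfMemoryError("arena exhausted");
            }
            int offset = top;
            top += bytes;                     // the "pointer bump"
            return offset;                    // caller uses region[offset..]
        }

        void reset() { top = 0; }             // reclaim everything at once
    }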

Similarly, it makes it easier to amortise costs across multiple allocations/deallocations.

GC does have a bad rep in the hard real-time world, because in the worst case scenario, a poorly timed GC creates all kinds of trouble, which is why I mentioned that it helps if the allocator/deallocator is aware of hard real-time commits.


> In a latency sensitive system, you want to minimize how much time you spend allocating and deallocating memory during performance critical moments. GC gives you a great way to leave those operations as trivial as possible (increment a pointer to allocate, noop to deallocate) during performance critical moments, and clean up/organize the memory later when outside the time critical window.

This only works if you enter a critical section with sufficient free heap. You could have just malloc()ed that space ahead of time if you weren't using a GC, so I don't see an improvement, just a convenience.

> Similarly, it makes it easier to amortise costs across multiple allocations/deallocations.

Amortizing costs is often the opposite of what you want to do to minimize latency; with hard real-time you care more about the worst-case than the average-case, and amortizing only helps the average-case (often at the expense of the worst-case)

> GC does have a bad rep in the hard real-time world, because in the worst case scenario, a poorly timed GC creates all kinds of trouble, which is why I mentioned that it helps if the allocator/deallocator is aware of hard real-time commits.

Yes, and GC can be made fully compatible with hard real-time systems; any incremental GC can be made fixed-cost with very little effort. It's somewhat moot since most hard real-time systems also want to never run out of heap, and the easiest way to do that is to never heap allocate after initialization, so most hard real-time systems don't use malloc() either.


I made a game for Android phones a few years back that would instantiate, at peak, a few thousand agents that would roam around an artificial environment, devouring renewable resources, before (maybe) reproducing and eventually dying off. So these agent instances were being continually created and destroyed, and the GC lag absolutely killed performance and made the game totally unplayable. It was unplayable even on a Galaxy S2 - a reasonably high-performance phone at the time. Once I implemented object pooling to handle the creation and destruction of these agents, the game ran very smoothly on an S2, and I could even run the smaller simulations on an old HTC Magic I had lying around. It made an enormous difference.


I'm sure twic was referring to the JVM. Android's VM doesn't use a generational collector.


I'm not from the JVM world, but indeed a generational collector can be a huge boon, particularly if it uses a Cheney-style copying collector for the nursery.


On the CLR, I've found that in tight processing loops (say, 50K msg/sec) even a few tiny (~32-byte) allocations are measurable. F# didn't lift lambdas (it would generate a new "function pointer" object each time even though it didn't need to), and just rewriting to force evaluation into a static var was a gain.


Well, there are still cases where object pooling might make sense. Some games on mobile devices still use it. I suppose I could also see the need for object pooling in a very busy server or a very large Hadoop job.


I can say that object pooling is critically important on certain mobile devices.


Medium-lifetime objects are good candidates for pooling, especially with a generational GC.


Object pools are certainly a very useful technique, but can they really be considered a "forgotten trick"? This kind of pooling is pretty basic CS knowledge, and thus a very popular technique in many applications, libraries, etc. I would have assumed most people working at scale in a GC language like Java would be familiar with this technique (dare I call it a "design pattern"?).

Perhaps I am wrong and the technique is less familiar than I had thought.


It's everywhere in Java EE. Still, I guess people writing frontend code may have never heard of it (except thread pools?).


People writing frontend learn how to pool objects very quickly in both iOS and Android for list/collection view cells.
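
(On Android, the classic shape of this is the old ListView convertView contract; a rough sketch, with made-up adapter and data names:)

    import android.content.Context;
    import android.view.View;
    import android.view.ViewGroup;
    import android.widget.BaseAdapter;
    import android.widget.TextView;
    import java.util.List;

    // ListView recycles row views: when a row scrolls off screen its View
    // is handed back as convertView, and the adapter rebinds it instead of
    // inflating/allocating a new one.
    final class NamesAdapter extends BaseAdapter {
        private final Context context;
        private final List<String> names;

        NamesAdapter(Context context, List<String> names) {
            this.context = context;
            this.names = names;
        }

        @Override public int getCount() { return names.size(); }
        @Override public Object getItem(int position) { return names.get(position); }
        @Override public long getItemId(int position) { return position; }

        @Override
        public View getView(int position, View convertView, ViewGroup parent) {
            TextView row = (TextView) convertView;
            if (row == null) {
                row = new TextView(context);  // only allocated for visible rows
            }
            row.setText(names.get(position)); // rebind the recycled view
            return row;
        }
    }

RecyclerView makes the same recycling explicit with view holders, and UITableView's reusable-cell dequeueing is the iOS counterpart.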


I know that in games and other highly concurrent applications, object pooling is a must; otherwise you inevitably run into the problem of attempting to update or render some game object that's died and been deleted. Maybe it's less common in other types of application development?


> update or render some game object that's died and been deleted

Woah, woah. No, object pooling is not for that case and definitely doesn't help with that. If you've re-used the object for something else and you've still got a reference to it - and you're updating it - you've got some seriously nasty bugs and randomly broken code.

Object pooling is not a 'must'. You need object pooling when you have a real-time app and your total create/delete bandwidth exceeds the garbage collector's ability to process it in the 'early generation', forcing eventual 'stop the world' GC.


In fact, object pooling makes this problem worse. You suddenly have to remember to return objects to the pool at the right time, resulting in memory leaks if you forget, or use-after-free (though not the exploitable kind) if you do it too early. It's better than unsafe manual memory management from a security perspective, but it's not any better from an ergonomic perspective.


Yes, the ergonomics suck. It's particularly frustrating since a lot of things that end up pooled are confined to the current thread's execution (e.g. a request) and could easily be stack-alloc'd if that facility were there. Along the same lines, sharing messages across threads, where one would simply like to memcpy into a ring buffer, becomes a pain in the rear.


In Java 8, escape analysis usually catches most of the objects that can be moved to the stack, and those don't touch the heap at all.

https://en.wikipedia.org/wiki/Escape_analysis http://www.burnison.ca/articles/escape-analysis-stack-alloca...

It would be nice to be able to control it though. (value objects?)
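
(A small example of the sort of allocation escape analysis can elide: the Point below never escapes the method, so the JIT may scalar-replace it and keep x and y in registers. Whether it actually does is up to the JIT and inlining, as the reply below explains.)

    // Nothing in the source guarantees the elision; it is purely a JIT
    // optimization applied when the object provably does not escape.
    final class EscapeDemo {
        private static final class Point {
            final double x, y;
            Point(double x, double y) { this.x = x; this.y = y; }
        }

        static double distanceFromOrigin(double x, double y) {
            Point p = new Point(x, y);        // candidate for scalar replacement
            return Math.sqrt(p.x * p.x + p.y * p.y);
        }

        public static void main(String[] args) {
            double sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += distanceFromOrigin(i, i);
            }
            System.out.println(sum);
        }
    }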


Escape analysis hasn't lived up to its promise/potential, IMO. Current HotSpot uses control-flow-insensitive escape detection, meaning an object that escapes on any path, even if that path is never taken, is considered escaping. This can be improved to be CF-sensitive, and Graal does that.

Secondly, beyond trivial examples, sufficient inlining needs to occur for the compiler to have the full horizon, and that can fail for a variety of reasons, even with run-to-run variance on the exact same code shape.

As you say, there's no control and you're fully at the mercy of the compiler's mood that day. This is well into Sufficiently Smart Compiler land ...

The bottom line is that anyone for whom those allocations can be problematic takes manual measures to ensure they don't happen and does not rely on EA.

Value types should help for a few reasons; however, they appear to be somewhat limited (e.g. immutable, cannot be passed by ref).


Another example of this: table views in Cocoa. Enough cells to fill the screen are allocated and then just updated instead of re-created as cells scroll off screen.


IIRC, LMAX Disruptor[0] uses the same concept with its ring buffer to avoid GC-related slowdown.

[0]: https://github.com/LMAX-Exchange/disruptor


While it does avoid GC, it's mostly a faster way to share data between threads than the builtin concurrent collections.


Yep, that's what Disruptor is for. Here's a blog post by one of the devs giving more details about the ring buffer, where she specifically mentions that objects are pooled to avoid the GC: http://mechanitis.blogspot.com/2011/06/dissecting-disruptor-...
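
(Not Disruptor's actual API, just the pooling idea in miniature: entries are mutable, preallocated once, and overwritten in place, so steady-state publishing creates no garbage.)

    // Preallocated ring of mutable entries. A producer claims a slot,
    // overwrites the existing entry in place, and publishes it; no
    // per-message allocation ever happens after construction.
    final class EntryRing {
        static final class Entry { long value; }  // reused for the ring's lifetime

        private final Entry[] entries;
        private final int mask;
        private long next;                        // single-producer cursor

        EntryRing(int size) {                     // size must be a power of two
            entries = new Entry[size];
            mask = size - 1;
            for (int i = 0; i < entries.length; i++) {
                entries[i] = new Entry();         // allocated once, up front
            }
        }

        void publish(long value) {
            Entry slot = entries[(int) (next++ & mask)];
            slot.value = value;                   // overwrite in place
            // a real implementation would also track consumer progress
            // before wrapping, and hand the claimed sequence to consumers
        }
    }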


If this is a forgotten trick, then why did we learn about it in school? It was in Java classes, though; I never heard of it being needed elsewhere, but the concept is generally applicable.


IIRC it's discussed in Bulka & Mayhew, Efficient C++: Performance Programming Techniques.

http://dl.acm.org/citation.cfm?id=320041


We used object pooling in C++ two decades ago in real-time applications.


Pooling and in-place reuse are two different things, to be sure.

This was a fun thing I remember writing: https://github.com/newobj/ffwd/blob/master/Src/Core/Std/Allo... When returned to the pool, each object used the first 4 bytes of itself to point to the next free one in a list, so there was no additional overhead.
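
(A rough Java analogue of that trick: the free-list link lives inside the pooled object itself, so the pool needs no separate node objects or side lists. The names below are made up.)

    // Intrusive free list: each pooled object carries its own "nextFree"
    // link, only meaningful while the object sits in the pool (the Java
    // equivalent of reusing the object's first bytes as a next pointer).
    final class Particle {
        double x, y, vx, vy;
        Particle nextFree;                    // used only while pooled
    }

    final class ParticlePool {
        private Particle head;                // top of the free list

        Particle acquire() {
            Particle p = head;
            if (p == null) return new Particle();
            head = p.nextFree;
            p.nextFree = null;
            return p;
        }

        void release(Particle p) {
            p.nextFree = head;                // thread the object into the list
            head = p;
        }
    }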


I can see a security nightmare lurking around the corner.


I don't know why this was downvoted. This vulnerability in Jetty was the result of sharing a buffer between requests: http://blog.gdssecurity.com/labs/2015/2/25/jetleak-vulnerabi...


This is a reasonable concern, and as the other response mentions, it has happened, even to well-respected projects. I will however say that there are some ways to combat this if you are in a situation where security is an absolute must. While it removes some of the performance benefits, a simple defense is to include a call to clear the state of the object before letting it out of the pool, or upon returning it to the pool.
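
(For example, something along these lines when pooling byte buffers; the names are illustrative:)

    import java.util.ArrayDeque;
    import java.util.Arrays;

    // Buffer pool that wipes each buffer on release, so one request's data
    // can never leak into the next request that borrows the same buffer.
    final class ClearingBufferPool {
        private final ArrayDeque<byte[]> free = new ArrayDeque<>();
        private final int bufferSize;

        ClearingBufferPool(int bufferSize) { this.bufferSize = bufferSize; }

        byte[] borrow() {
            byte[] buf = free.poll();
            return buf != null ? buf : new byte[bufferSize];
        }

        void release(byte[] buf) {
            Arrays.fill(buf, (byte) 0);       // clear state before reuse
            free.push(buf);
        }
    }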


It's been a while, but this was a common technique for agent-based modeling in MASON back in the day.

Having tens of thousands of agents constantly becoming garbage was the use case.


If it's well-known it's not forgotten.

(Not really sure it's forgotten, either; it's used regularly for heavy objects, or when speed is absolutely critical.)


Object pooling is essential for performance in Unity 3D if you're dealing with large numbers of game objects.


Object pooling is alive and well. Take a look at Go's sync.Pool.


Question: which is cheaper, stack object allocation/creation or object pooling, in languages like C/C++/Rust?


The stack is cheaper, but it is a limited size. If you need more objects than would fit on the stack, pooling might be a better choice.


The stack doesn't have to be limited. On a 64-bit system, you can give each thread a couple of terabytes of stack space with no problem.


For amd64 on Linux you only have 128 TB of address space; with green threads I regularly use a lot more than 64 threads.


Generally the stack, because the cache behavior is better and you can allocate an entire activation record at one time. The compiler is also often able to optimize the allocations better, for example by promoting stack objects to registers.


For the most part stack allocs would be faster, unless the ctor does something ridiculously expensive. Also, one typically doesn't pool objects that are confined to the current thread's execution - it's mostly for heap references that outlive requests/transactions/etc.


Stack for sure. But if you want any kind of persistence then stack is right out and an object pool would be faster.



