

The cost of dynamic vs. static dispatch in C++ (2013) - anacleto
http://eli.thegreenplace.net/2013/12/05/the-cost-of-dynamic-virtual-calls-vs-static-crtp-dispatch-in-c

======
twoodfin
As the article demonstrates, in anything other than trivial microbenchmarks
it's the inability to inline across a virtual call that will cost you.
Inlining is the single most fundamental optimization for a modern C++
compiler. It enables the 0-cost abstractions for which C++ is rightly famous.

~~~
MichaelGG
Why shouldn't it be possible to inline virtual calls? The CLR folks were
complaining about this, too. But it seems rather straightforward to figure out
the actual implementation that's commonly called and inline it, perhaps with a
guard for uncommon cases. And I wouldn't be surprised if many apps end up with
only a single actual implementation, so the entire virtual overhead can just
be dropped.
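
A rough sketch of what such a guard could look like if written by hand. Everything here (`Counter`, `FastCounter`, `SlowCounter`, `run`) is a made-up name for illustration, and a real compiler would more likely compare a vtable or function pointer than use `typeid`:

```cpp
#include <cassert>
#include <typeinfo>

struct Counter {
    virtual ~Counter() = default;
    virtual void tick(int n) = 0;
    virtual int value() const = 0;
};

// Hypothetical "commonly called" implementation. It is final, so a call
// made through a FastCounter reference can be resolved at compile time.
struct FastCounter final : Counter {
    int count = 0;
    void tick(int n) override { count += n; }
    int value() const override { return count; }
};

// Hypothetical uncommon implementation.
struct SlowCounter : Counter {
    int count = 0;
    void tick(int n) override { count += 2 * n; }
    int value() const override { return count; }
};

void run(Counter& c, int iterations) {
    if (typeid(c) == typeid(FastCounter)) {
        // Guard passed: the dynamic type is known, so the loop body is a
        // direct call that the compiler is free to inline.
        FastCounter& fast = static_cast<FastCounter&>(c);
        for (int i = 0; i < iterations; ++i) fast.tick(i);
    } else {
        // Uncommon case: fall back to ordinary virtual dispatch.
        for (int i = 0; i < iterations; ++i) c.tick(i);
    }
}

// Both paths must give the same answer as plain virtual dispatch would.
inline void example_usage() {
    FastCounter fast;
    run(fast, 4);               // 0 + 1 + 2 + 3
    assert(fast.value() == 6);
}
```

The guard is checked once outside the loop, which is where the payoff would come from: one type test instead of one indirect call per iteration.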

~~~
adrusi
Because C++'s type system doesn't make it possible to determine the actual
implementation in most cases. Concepts would allow Rust-style
monomorphization, but they aren't part of the standard yet.

~~~
kazinator
None of this matters if you create the "flexible joints" in the right places,
where you get the benefit of dispatch, without making hundreds of millions of
calls per second to it.

------
zsombor
In 2015 clang & gcc have a more or less working devirtualization feature that
eliminates the virtual overhead in a simple benchmark such as this. The
compiler sees all classes and knows that a particular interface is implemented
by only one class, so it simply elides the virtual table lookup. Add to that
features such as speculative devirtualization, which can remove a surprisingly
large number of virtual calls in an established code base such as Firefox:
[http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00007.html](http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00007.html)

If C++ still exists 10 years from now, I predict 'virtual' will be just
another legacy keyword.

~~~
ginko
>If C++ still exists 10 years from now, I predict 'virtual' will be just
another legacy keyword.

I really doubt that. Not so much because of the virtual call overhead, but
because there are plenty of cases where you really want control over how your
class objects and structs look in memory. Adding a vtable pointer to every
struct is something you really don't want in many cases.
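
A small sketch of the layout point, using two made-up structs (`PlainVec2`, `VirtualVec2`). The type-trait checks are guaranteed by the language; the size difference is what mainstream ABIs do in practice, not something the standard promises:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical plain struct: just two floats, laid out as declared.
struct PlainVec2 {
    float x;
    float y;
};

// Same data plus one virtual function. Typical implementations store a
// hidden vtable pointer in every object of this type.
struct VirtualVec2 {
    float x;
    float y;
    virtual ~VirtualVec2() = default;
};

static_assert(std::is_standard_layout<PlainVec2>::value,
              "plain struct keeps a predictable, C-compatible layout");
static_assert(std::is_trivially_copyable<PlainVec2>::value,
              "plain struct can be copied with memcpy");
static_assert(!std::is_standard_layout<VirtualVec2>::value,
              "a virtual function removes the standard-layout guarantee");
static_assert(!std::is_trivially_copyable<VirtualVec2>::value,
              "a virtual destructor makes the type non-trivially copyable");

// Not guaranteed by the standard, but true on the usual ABIs: the hidden
// pointer makes every object bigger.
inline bool vptr_costs_space() {
    return sizeof(VirtualVec2) > sizeof(PlainVec2);
}

inline void example_usage() {
    assert(vptr_costs_space());
}
```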

~~~
mbel
Also, from a semantic point of view, expressing which methods you expect to
be overridden is quite handy.

------
ambrop7
If you just want the number: dynamic is 6x slower.

~~~
npalli
Not a good summary.

1\. If the static method is trivial enough, the compiler can inline it
removing the need for a function call. In that case, a virtual function call
for a single increment op is 6 times slower.

2\. If you disable inlining for the static method, curiously it is 1.5 times
slower than the virtual function call.

3\. Newer compilers have devirtualization, which the OP could not test but
which should give much better results than the 6x degradation.

------
kspiteri
If you can use templates, the decision of which function to call can be made
at compile time and you don't need virtual calls. Virtual calls are useful
when the decision must be made at runtime.
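
For anyone who hasn't seen the two side by side, here is a minimal sketch. The class names loosely follow the ones mentioned in this thread, but the bodies and the `run_dynamic` / `run_crtp` helpers are assumptions for illustration, not the article's actual benchmark code:

```cpp
#include <cassert>
#include <cstdint>

// Runtime decision: the callee is looked up through the vtable.
struct DynamicInterface {
    virtual ~DynamicInterface() = default;
    virtual void tick(std::uint64_t n) = 0;
    virtual std::uint64_t getvalue() const = 0;
};

struct DynamicImplementation : DynamicInterface {
    std::uint64_t counter = 0;
    void tick(std::uint64_t n) override { counter += n; }
    std::uint64_t getvalue() const override { return counter; }
};

// Compile-time decision (CRTP): the base knows the derived type as a
// template parameter and forwards with a static_cast, no vtable involved.
template <typename Implementation>
struct CRTPInterface {
    void tick(std::uint64_t n) { impl().tick(n); }
    std::uint64_t getvalue() const { return impl().getvalue(); }

private:
    Implementation& impl() { return static_cast<Implementation&>(*this); }
    const Implementation& impl() const {
        return static_cast<const Implementation&>(*this);
    }
};

struct CRTPImplementation : CRTPInterface<CRTPImplementation> {
    std::uint64_t counter = 0;
    void tick(std::uint64_t n) { counter += n; }
    std::uint64_t getvalue() const { return counter; }
};

// Hypothetical drivers: same loop, different dispatch mechanism.
std::uint64_t run_dynamic(DynamicInterface& obj, unsigned iterations) {
    for (unsigned i = 0; i < iterations; ++i) obj.tick(i);
    return obj.getvalue();
}

template <typename T>
std::uint64_t run_crtp(CRTPInterface<T>& obj, unsigned iterations) {
    for (unsigned i = 0; i < iterations; ++i) obj.tick(i);
    return obj.getvalue();
}

inline void example_usage() {
    DynamicImplementation d;
    CRTPImplementation c;
    assert(run_dynamic(d, 5) == run_crtp(c, 5));
}
```

Note that `run_crtp` has to be a template itself, which is the trade-off: the concrete type must be visible at every call site.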

~~~
jjaredsimpson
Article just seems like "I misused some feature X and it was slower than
something else."

~~~
tezka
No, it's not. It's more like: the most common dispatch pattern used in C++ (and
most other OO languages) has a 7x performance penalty over the less
widely embraced alternative. The popularity of the former method is partly
due to programmers' ignorance of its performance implications. I have
seen huge code bases that had to be retired because of their over reliance on
OO, and dynamic dispatch.

~~~
kazinator
> _I have seen huge code bases that had to be retired because of their over
> reliance on OO, and dynamic dispatch._

What were they replaced with?

------
0x0
Why is the author referencing the Itanium C++ ABI? Isn't IA-64 an obsolete
architecture that nobody uses? And then he gives examples with x86_64 and
mentions his "i7-4771 CPU". Confusing.

~~~
pcwalton
The C++ ABI that everyone (well, OK, the open source compiler toolchain
ecosystem) uses nowadays on popular architectures was created for the Itanium
[1], and so it bears its name. Itanium ended up having an impact in a weird,
roundabout way :)

[1]: [https://mentorembedded.github.io/cxx-abi/abi.html](https://mentorembedded.github.io/cxx-abi/abi.html)

~~~
0x0
Learn something every day, I guess!

I remember early Linux used to have lots of trouble running binaries across
upgrades, because the C++ ABI kept changing and changing. Is that related?

------
bit_razor
How is this an apples to apples comparison? Instantiating CRTPInterface with
Implementation is no different than just calling DynamicImplementation::tick()
directly, so why not benchmark that?

~~~
Rusky
The entire point of the benchmark is virtual calls vs templated static calls.
Everything else is what should be controlled.

~~~
kazinator
If everything else can be controlled, then virtual calls are being used
unnecessarily. The fact that an indirect calling mechanism which has to first
use indirection to retrieve the address to be called (and possibly do more
work, like fixing up a pointer) is slower than a static call is completely
unsurprising.

If virtual calls are used essentially, then it's a canoe-versus-bicycle
comparison, because everything else cannot be controlled. The program based on
static calls has to be written quite differently to solve the same problem,
and the benchmark then measures the entire approach. Plus the benchmark
doesn't account for benefits that it doesn't measure, like maintainability and
extensibility of the code.

~~~
Rusky
Regardless, it's still useful to actually _test_ the performance of virtual vs
static calls: how much slower are they? In which situations can the compiler
devirtualize them?

Or should we only ever benchmark the entire software stack at once, even
though people do in fact sometimes use CRTP and virtual functions in the same
situations?

~~~
bit_razor
My beef with this article is that it misleads you into believing that CRTP is
the only way to eliminate vtable lookups on interface calls. It left me
wondering how CRTP compared to invoking DynamicImplementation::tick()
directly. The article is actually quite great otherwise.
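
To make the "call it directly" option concrete, here is a sketch with assumed class bodies and hypothetical helper names (`run_via_interface`, `run_direct`). Marking the class `final` is my addition; the explicitly qualified call alone is already enough to suppress virtual dispatch:

```cpp
#include <cassert>
#include <cstdint>

struct DynamicInterface {
    virtual ~DynamicInterface() = default;
    virtual void tick(std::uint64_t n) = 0;
    virtual std::uint64_t getvalue() const = 0;
};

// final (an assumption, not from the article) additionally lets the
// compiler devirtualize unqualified calls made through this type.
struct DynamicImplementation final : DynamicInterface {
    std::uint64_t counter = 0;
    void tick(std::uint64_t n) override { counter += n; }
    std::uint64_t getvalue() const override { return counter; }
};

// Through the interface: a virtual call unless the optimizer can prove
// the dynamic type.
std::uint64_t run_via_interface(DynamicInterface& obj, unsigned iterations) {
    for (unsigned i = 0; i < iterations; ++i) obj.tick(i);
    return obj.getvalue();
}

// Concrete type known statically: an explicitly qualified call never goes
// through the vtable, so it can be inlined like any other function.
std::uint64_t run_direct(DynamicImplementation& obj, unsigned iterations) {
    for (unsigned i = 0; i < iterations; ++i) obj.DynamicImplementation::tick(i);
    return obj.DynamicImplementation::getvalue();
}

inline void example_usage() {
    DynamicImplementation a;
    DynamicImplementation b;
    assert(run_via_interface(a, 5) == run_direct(b, 5));
}
```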

