Forced Inlining Might Be Slow

userbinator · on Oct 9, 2017

Note that this is about slow compilation, not execution. IMHO if the output also becomes much faster, the tradeoff is reasonable. It would be interesting to know how much runtime performance changed, and in what direction, by not inlining.

On the other hand, the huge difference in compile times suggests there may be a hidden quadratic or higher complexity algorithm somewhere in the code path of the compiler when inlining is performed.

blt · on Oct 9, 2017

The function call overhead for an SVD-based matrix inverse should be tiny. Maybe if they are inverting 4x4 matrices, it's measurable. Inlining can give huge speedups but it should not be applied to every function without thought.

Animats · on Oct 9, 2017

Right. Of course turning off forced inlining will speed up compilation. The article is completely silent about what it does to run time.

This is an unusual case, in that it involves code with SIMD instructions. Out-of-line calls usually mean the arguments have to be put on the stack, then loaded into the SIMD registers. Inlining opens up the optimization possibility of not doing that.

It's not clear what he's writing, but from the math it sounds like a physics engine for animation or games.

amagumori · on Oct 9, 2017

He's a senior developer at Unity.

amyjess · on Oct 9, 2017

> IMHO if the output also becomes much faster, the tradeoff is reasonable.

The output may be faster in terms of CPU cycles, but the executable will have a larger memory footprint, which can cause performance problems when it comes to swapping, eviction of file-backed pages, etc.

colanderman · on Oct 10, 2017

Don't forget instruction cache misses.

halayli · on Oct 9, 2017

Aside from compilation time, inlining doesn't mean your program will run faster: now you're filling your icache with the inlined instructions rather than a single call instruction, and those instructions might be flushed away on a missed branch prediction.

smitherfield · on Oct 9, 2017

True, but less so now than a decade ago; if it's a hot path (which anything `__forceinline` presumably is, assuming the author has a reasonable understanding of optimization) inlining is usually going to be a performance win, and often a cache/code size win as well (optimizing the inlined code, avoiding register spills).
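For readers who haven't met the keyword, here is a minimal sketch of how forced inlining is usually spelled across compilers. The `FORCE_INLINE` macro and the `dot3` / `sum_of_dots` functions are made up for illustration and are not taken from the article; whether forcing actually helps still has to be measured.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical portability macro; the name FORCE_INLINE is an assumption.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline  // fallback: only a hint to the compiler
#endif

// Small, hot helper: the kind of function where forcing inlining is plausible.
FORCE_INLINE float dot3(const float* a, const float* b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Caller with a loop; once dot3 is inlined there is no call per iteration.
float sum_of_dots(const float* a, const float* b, std::size_t count) {
    assert(a != nullptr && b != nullptr);
    float total = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        total += dot3(a + 3 * i, b + 3 * i);
    }
    return total;
}
```

Dropping the macro back to plain `inline` leaves the decision to the compiler's own heuristics, which is the alternative being debated in this thread.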

rurban · on Oct 9, 2017

No, still. If it's an unlikely branch and pretty big, better put it into an extra function call if possible. icache pollution is still an issue.
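A rough sketch of that pattern, assuming a bounds-checked accessor whose failure path is rare. The `NO_INLINE` macro and both function names are hypothetical; the point is only that the bulky error handling lives in its own function, so the hot path stays a compare, a branch, and a load.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical macro name; keeps the rarely taken path out of line.
#if defined(_MSC_VER)
#  define NO_INLINE __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#  define NO_INLINE __attribute__((noinline, cold))
#else
#  define NO_INLINE
#endif

// Cold path: string building and exception setup, kept in a separate
// function so these instructions do not sit inside the hot code.
NO_INLINE void throw_out_of_range(std::size_t index, std::size_t size) {
    throw std::out_of_range("index " + std::to_string(index) +
                            " out of range for size " + std::to_string(size));
}

// Hot path: small enough that inlining it into callers is cheap.
inline int checked_get(const int* data, std::size_t size, std::size_t index) {
    assert(data != nullptr);
    if (index >= size) {
        throw_out_of_range(index, size);  // unlikely branch, out of line
    }
    return data[index];
}
```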

Narishma · on Oct 9, 2017

The main benefit of inlining comes from the additional optimizations it allows the compiler to do.
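A small illustration of the kind of follow-on optimization meant here, as a sketch: `mean` and `mean4` are invented names, and what a given compiler actually emits depends on the compiler and its flags.

```cpp
#include <cassert>

// Generic helper: integer mean of `count` values.
inline int mean(const int* data, unsigned count) {
    assert(count > 0);
    int sum = 0;
    for (unsigned i = 0; i < count; ++i) {
        sum += data[i];
    }
    return sum / static_cast<int>(count);
}

// Here `count` is a compile-time constant. Once mean() is inlined into this
// caller, an optimizer can typically fold away the count check, unroll the
// loop into four adds, and replace the division by a cheaper shift sequence.
// An out-of-line call to mean() would have to run the generic loop and a
// real divide.
int mean4(const int* data) {
    return mean(data, 4);
}
```

Comparing the generated assembly for `mean4` with inlining enabled and disabled is one way to check whether the constant really propagates.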

loeg · on Oct 9, 2017

That is not a benefit of forced inlining (over allowing the compiler to decide what to inline).

maplant · on Oct 9, 2017

It's a trade-off. Sometimes simply calling a function will flush the icache.

CrystalGamma · on Oct 9, 2017

Also, there are function prologues/epilogues and sometimes additional instructions to satisfy the calling convention.

maplant · on Oct 9, 2017

True, and the functions in which inlining is most effective are small, non-recursive functions that don't lose any debug info from omitting the frame pointer.