Forced Inlining Might Be Slow (aras-p.info)
34 points by ingve on Oct 9, 2017 | 14 comments



Note that this is about slow compilation, not execution. IMHO if the output also becomes much faster, the tradeoff is reasonable. It would be interesting to know how much runtime performance changed, and in what direction, by not inlining.
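For anyone who wants to check the runtime side themselves, here is a rough timing sketch. Everything in it is an assumption for illustration: the `FORCE_INLINE` / `NO_INLINE` macro names, the toy FNV-style `mix_*` functions, and the `Timed` struct are hypothetical and have nothing to do with the article's code. It only shows the shape of such a comparison; a real measurement would need optimized builds, repeated runs, and the actual workload.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical portable wrappers around compiler-specific attributes.
#if defined(_MSC_VER)
  #define FORCE_INLINE __forceinline
  #define NO_INLINE __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
  #define FORCE_INLINE inline __attribute__((always_inline))
  #define NO_INLINE __attribute__((noinline))
#else
  #define FORCE_INLINE inline
  #define NO_INLINE
#endif

// Same toy computation twice: one copy forced inline, one kept out of line.
FORCE_INLINE std::uint32_t mix_inlined(std::uint32_t h, std::uint32_t v) {
    return (h ^ v) * 16777619u;
}

NO_INLINE std::uint32_t mix_outlined(std::uint32_t h, std::uint32_t v) {
    return (h ^ v) * 16777619u;
}

struct Timed {
    std::uint32_t value;  // returned so the loop has an observable result
    double seconds;       // wall-clock time spent in the loop
};

Timed run_inlined(std::uint32_t iterations) {
    const auto start = std::chrono::steady_clock::now();
    std::uint32_t h = 2166136261u;
    for (std::uint32_t i = 0; i < iterations; ++i) h = mix_inlined(h, i);
    const auto stop = std::chrono::steady_clock::now();
    return Timed{h, std::chrono::duration<double>(stop - start).count()};
}

Timed run_outlined(std::uint32_t iterations) {
    const auto start = std::chrono::steady_clock::now();
    std::uint32_t h = 2166136261u;
    for (std::uint32_t i = 0; i < iterations; ++i) h = mix_outlined(h, i);
    const auto stop = std::chrono::steady_clock::now();
    return Timed{h, std::chrono::duration<double>(stop - start).count()};
}
```

Both loops should produce the same `value`; only `seconds` is expected to differ, and by how much depends heavily on the compiler, flags, and hardware.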

On the other hand, the huge difference in compile times suggests there may be a hidden quadratic or higher complexity algorithm somewhere in the code path of the compiler when inlining is performed.


The function call overhead for an SVD-based matrix inverse should be tiny. Maybe if they are inverting 4x4 matrices, it's measurable. Inlining can give huge speedups but it should not be applied to every function without thought.


Right. Of course turning off forced inlining will speed up compilation. The article is completely silent about what it does to run time.

This is an unusual case, in that it involves code with SIMD instructions. Out-of-line calls usually mean the arguments have to be put on the stack, then loaded into the SIMD registers. Inlining opens up the optimization possibility of not doing that.
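A minimal sketch of what that looks like in code. The `FORCE_INLINE` macro name and the `float4` struct are hypothetical stand-ins (a real math library would likely wrap an intrinsic type such as `__m128`); the underlying `__forceinline` and `__attribute__((always_inline))` spellings are the real MSVC and GCC/Clang ones.

```cpp
// Hypothetical portable wrapper around the compiler-specific spelling.
#if defined(_MSC_VER)
  #define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
  #define FORCE_INLINE inline __attribute__((always_inline))
#else
  #define FORCE_INLINE inline
#endif

// Plain struct standing in for a 4-wide SIMD value (assumption for portability).
struct float4 {
    float x, y, z, w;
};

FORCE_INLINE float4 add(float4 a, float4 b) {
    return float4{a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
}

FORCE_INLINE float dot(float4 a, float4 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

// With add() and dot() inlined into this function, the compiler is free to
// keep the intermediate sum in registers rather than passing it through
// memory at a call boundary.
float sum_then_dot(float4 a, float4 b, float4 c) {
    return dot(add(a, b), c);
}
```

Whether the out-of-line version really goes through the stack depends on the platform's calling convention, so treat this as an illustration of the pattern rather than a statement about any particular ABI.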

It's not clear what he's writing, but from the math it sounds like a physics engine for animation or games.


He's a senior developer at Unity.


> IMHO if the output also becomes much faster, the tradeoff is reasonable.

The output may be faster in terms of CPU cycles, but the executable will have a larger memory footprint, which can cause performance problems when it comes to swapping, eviction of file-backed pages, etc.


Don't forget instruction cache misses.


Aside from compilation time, inlining doesn't necessarily make your program run faster, because now you're filling your icache with the inlined instructions rather than a single call instruction, and they might be flushed away on a missed branch prediction.


True, but less so now than a decade ago; if it's a hot path (which anything `__forceinline` presumably is, assuming the author has a reasonable understanding of optimization) inlining is usually going to be a performance win, and often a cache/code size win as well (optimizing the inlined code, avoiding register spills).


No, still. If it's an unlikely branch and pretty big, better put it into an extra function call if possible. icache pollution is still an issue.
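A small sketch of that idea, assuming a hypothetical `NO_INLINE` macro and made-up function names. The rarely taken, bulky error path is kept out of line so the hot caller stays compact; `__declspec(noinline)` and `__attribute__((noinline, cold))` are the real MSVC and GCC/Clang attributes.

```cpp
#include <cstdio>

// Hypothetical wrapper; on GCC/Clang the extra `cold` attribute also hints
// that the function is rarely executed.
#if defined(_MSC_VER)
  #define NO_INLINE __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
  #define NO_INLINE __attribute__((noinline, cold))
#else
  #define NO_INLINE
#endif

// Unlikely and comparatively large: formatting plus I/O. Kept out of line so
// its instructions do not sit in the middle of the hot path.
NO_INLINE int report_bad_index(int index, int size) {
    std::fprintf(stderr, "index %d out of range [0, %d)\n", index, size);
    return -1;  // sentinel chosen for this sketch only
}

// Hot path: a compare, a branch, and a load. Cheap to inline at every call site.
inline int checked_get(const int* data, int size, int index) {
    if (index < 0 || index >= size) {
        return report_bad_index(index, size);
    }
    return data[index];
}
```

The common case pays only for the bounds check, and the cold code is reached through a single call instruction.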


The main benefit of inlining comes from the additional optimizations it allows the compiler to do.


That is not a benefit of forced inlining (over allowing the compiler to decide what to inline).
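To make the "additional optimizations" point concrete, here is a toy example with hypothetical function names. Nothing is force-inlined; the helper is an ordinary `static` function that an optimizing compiler may choose to inline on its own.

```cpp
// Small helper with a runtime branch on `mode`.
static int apply(int x, int mode) {
    switch (mode) {
        case 0:  return x + 1;
        case 1:  return x * 2;
        default: return x;
    }
}

// If the compiler inlines apply() here, `mode` is a known constant at this
// call site, so the switch can be folded away and only `x * 2` needs to
// remain. An out-of-line call would have to evaluate the switch at run time.
int double_it(int x) {
    return apply(x, 1);
}
```

Whether the inlining actually happens is up to the optimizer and the build flags; the point is only that the follow-on simplification becomes possible once the call boundary is gone, regardless of whether the inlining was forced or chosen by the compiler.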


It's a trade-off. Sometimes simply calling a function will flush the icache.


Also, there are function prologues/epilogues and sometimes additional instructions to satisfy the calling convention.


True, and the functions in which inlining is most effective are small, non-recursive functions that don't lose any debug info from omitting the frame pointer.



