It's not just the allocation; switching between stack segments is really expensive in itself. A function call costs about 2 ns on most architectures, so any added overhead can easily make it 5x or 10x slower.


I know it's the "wrong language" for you, but after years of working in Perl and occasionally profiling it, it was a shock to switch to Go and profile anything in "nanoseconds". In Perl code, seeing even a "microsecond" appear is often a bit of a pleasant surprise. To some extent I'm still not used to worrying that adding 2 ns to a code path might have a noticeable performance impact...

