
Performance Tuning for .NET Core - benaadams
https://reubenbond.github.io/posts/dotnet-perf-tuning
======
rhinoceraptor
It would be nice if .NET Core profiling was a bit easier on Linux. Microsoft
has a shell script[1] to do profiling, but it requires Windows-only tools.

They don't ship Crossgen with the Linux packages, and you have to manually
generate the .NET runtime symbols.

I've gotten things like FlameGraphs working using BCC profile[2], but it took
quite a bit of work.

[1]: [https://raw.githubusercontent.com/dotnet/corefx-tools/master/src/performance/perfcollect/perfcollect](https://raw.githubusercontent.com/dotnet/corefx-tools/master/src/performance/perfcollect/perfcollect)

[2]: [https://github.com/iovisor/bcc/blob/master/tools/profile.py](https://github.com/iovisor/bcc/blob/master/tools/profile.py)

~~~
tluyben2
The perf script from MS is what we use to profile and fix issues on Linux. I
do not have Windows; no issues so far with just Linux. We managed to diagnose
and fix every perf issue so far. Not sure what you mean by Windows-only tools
or manually generating symbols?

~~~
rhinoceraptor
I haven't tried the script; the documentation says you need PerfView to view
the data, so I didn't bother running it.

For generating symbols (I misspoke a bit, I mean downloading them), I'm
talking about the native CLR runtime symbols. According to the docs, if you
want those you need to use dotnet-symbol to manually download the symbols
for the CLR alongside the CLR .so files.

~~~
NicoJuicy
Well, .NET was for a long time only available on Windows (released in 2002,
with development starting in the late 1990s).

I think making the tools cross-platform after such a long time is not that
easy/fast.

------
kevingadd
A tip related to the throw inlining tip: One way to get more
consistent/effective inlining is to split the complex 'slow paths' out of your
functions into helper functions. For example, let's say you have a cached
operation with cache hit and cache miss paths:

    
    
      void GetValue (string key, out SomeBigType result) {
        if (_cache.TryGetValue(key, out result))
          return;
      
        result = new SomeBigType(key, ...);
        _cache[key] = result;
      }
    

In most scenarios this function might not get inlined, because the cache miss
path makes the function bigger. If you use the aggressive inlining attribute
you might be able to convince the JIT to inline it, but once the function gets
bigger it doesn't inline anymore.

However, if you pull the cache miss out:

    
    
      void GetValue (string key, out SomeBigType result) {
        if (_cache.TryGetValue(key, out result))
          return;
      
        GetValue_Slow(key, out result);
      }
      
      void GetValue_Slow (string key, out SomeBigType result) {
        result = new SomeBigType(key, ...);
        _cache[key] = result;
      }
    

You will find that in most cases, GetValue is inlined and only GetValue_Slow
produces a function call. This is especially true in release builds and you
can observe it in the built-in Visual Studio profiler or by looking at method
disassembly.

(Keep in mind that many debuggers - including VS's - will disable JIT
optimization if you start an application under the debugger or attach to it.
You can disable this.)

This tip applies to both desktop .NET Framework and .NET Core in my testing
(.NET Core is generally better at inlining, though!). If you're writing any
performance-sensitive paths in a library, I highly recommend doing this. It can
make the code easier to read in some cases anyway.
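
For completeness, the attribute-annotated form of this pattern looks roughly like the following. The `Cache` class and its members are illustrative, not from the comment above; `AggressiveInlining` is only a hint to the JIT, and `NoInlining` on the slow path simply pins the rare code out of the caller:

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public class Cache
{
    private readonly Dictionary<string, string> _cache =
        new Dictionary<string, string>();

    // Tiny fast path: a good inlining candidate; the attribute
    // makes the intent explicit.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public string GetValue(string key)
    {
        if (_cache.TryGetValue(key, out var result))
            return result;
        return GetValueSlow(key);
    }

    // Keep the big, rare path out of the caller's inlined code entirely.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private string GetValueSlow(string key)
    {
        var result = key.ToUpperInvariant(); // stand-in for expensive construction
        _cache[key] = result;
        return result;
    }
}
```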

------
gameswithgo
One of the tips is to avoid Linq, which many .NET developers are hesitant to
do. I made a library that lets you use Linq style convenience functions
without a performance hit in many cases:

[https://github.com/jackmott/LinqFaster](https://github.com/jackmott/LinqFaster)
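
For context, this is the kind of hot-path rewrite the article's advice (and libraries like LinqFaster) amount to - an illustrative sketch, not LinqFaster's actual code:

```csharp
using System.Linq;

public static class SumExample
{
    // LINQ version: allocates an enumerator and a closure, and goes
    // through interface dispatch per element.
    public static int SumLinq(int[] values) =>
        values.Where(v => v > 0).Sum();

    // Hand-rolled version: a plain indexed loop over the array,
    // no allocations, trivially JIT-friendly.
    public static int SumLoop(int[] values)
    {
        int total = 0;
        for (int i = 0; i < values.Length; i++)
        {
            if (values[i] > 0)
                total += values[i];
        }
        return total;
    }
}
```

Both produce the same result; the difference only matters when the call sits on a hot path.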

~~~
tluyben2
People abuse LINQ a lot though: enormously complex queries over very large
datasets without really knowing what they are doing. When people need .Each,
some will just do .ToList().Each(...). Etc. I found an even bigger issue with
abuse/overuse (or any use at all, really) of dynamic. I wish there were a way
to ban it at compile time.

~~~
jenscow
I will challenge every use of dynamic (and var, for that matter), unless it's
used in the very few appropriate cases.

~~~
reubenbond
`var` is entirely syntactic sugar (compile-time type inference) and there's no
runtime cost associated with it.

I find it makes most (but not all) code more readable, particularly given that
we have great IDEs/tooling in the .NET world.

~~~
ralphael
I agree. By using var you need to name your variables better, therefore making
your code more readable, rather than relying on the interface/class definition
to explain to someone why you used "obj".

~~~
quickthrower2
To var or not to var is purely a "tabs 'n' spaces" debate; it doesn't affect
performance.

~~~
jenscow
It will affect the performance of someone trying to read your code.

------
zamalek
> Reduce branching & branch misprediction

I wrote a parser for a "formalized" URI (it looked somewhat like OData). This
parser was being invoked millions of times and was adding minutes to an
operation - it dominated the profile at something like 30% CPU time. It
started off something like this:

    
    
        int state = State_Start;
        for (var i = 0; i < str.Length; i++)
        {
            var c = str[i];
            switch (state)
            {
                case State_Start:
                    /* Handle c for this state. */
                    /* Update state if a new state is reached. */
                    break;
            }
        }
    

Hardly rocket science, a clear-as-day miniature state machine. VTune was
screaming about the switch, so I changed it to this:

    
    
        var i = 0;
        while (i < str.Length)
        {
            for (; i < str.Length; i++)
            {
                var c = str[i];
                /* Handle c for this state. */
                /* Break if a new state is reached. */
            }

            for (; i < str.Length; i++)
            {
                var c = str[i];
                /* Handle c for this state. */
                /* Break if a new state is reached. */
            }
        }
    

The new profile put the function at < 0.1% of CPU time. This is something that
the "premature optimization crowd" (who tend to partially quote Knuth
concerning optimization) get wrong: death by a thousand cuts. A _single_
branch in the source (it ends up being more in machine code) was costing 30%
performance.
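
To make the transformation concrete, here is a minimal two-state version of the same idea - a toy parser that counts characters inside double quotes. This is entirely illustrative, not the original code:

```csharp
public static class QuoteCounter
{
    // One loop per state instead of a switch on a state variable:
    // each loop's body is a single, highly predictable branch.
    public static int CountQuoted(string str)
    {
        int count = 0;
        int i = 0;
        while (i < str.Length)
        {
            // State 1: outside quotes - scan until an opening quote.
            for (; i < str.Length; i++)
            {
                if (str[i] == '"') { i++; break; }
            }

            // State 2: inside quotes - count until the closing quote.
            for (; i < str.Length; i++)
            {
                if (str[i] == '"') { i++; break; }
                count++;
            }
        }
        return count;
    }
}
```

The per-character work in each state is a single comparison, rather than an indirect jump through a switch on every iteration.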

~~~
chrisseaton
> This is something that the "premature optimization crowd" (who tend to
> partially quote Knuth concerning optimization) get wrong

But this wasn't premature optimisation.

- you found performance was actually a problem in practice

- you used a tool to profile the application for a real workload

- you isolated something to optimise

- you came up with a way to optimise based on the data and your tools

- you tested that the optimisation worked

That isn't premature optimisation. Premature optimisation would have been
writing this in the first place without checking anything first.

------
GordonS
> Mark classes as sealed by default

Please, no! This shouldn't be the _default_ - it's a constant bugbear of mine
where I want to extend a class from a library, and I can't because it's been
sealed for no good reason.

~~~
wvenable
I had the same thought when I saw that. There seems to be a trend toward
sealing and locking down classes, preventing any kind of extension. It sort of
misses one of the main benefits of OOP - and I know what I'm doing.

~~~
scarface74
Instead of rehashing the old arguments about inheritance....

[http://developer-interview.com/p/oop-ood/what-are-advantages-of-composition-and-aggregation-over-inheritance-14](http://developer-interview.com/p/oop-ood/what-are-advantages-of-composition-and-aggregation-over-inheritance-14)

~~~
wvenable
I understand the argument for composition over inheritance, but it rarely
applies when you actually _need_ to do it. Often the need is to reach in and
fix a bug or enhance the behaviour of an existing component. You simply can't
do that with composition.

Without inheritance (if the class is sealed) I often end up having to re-write
the entire component or simply accept my fate. So that is a lose-lose
situation.

~~~
scarface74
That's where delegation comes in. You wrap each public method of Foo in
another class called MyFoo and then fix the one method you care about. With C#
and R# (ReSharper) it's a simple matter of:

Create a new class

Create a private variable:

private Foo _foo;

Click on _foo

Resharper menu -> Generate Code -> create Delegating members.
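
Sketched out, the resulting wrapper looks something like this (`Foo`, `MyFoo`, and their members are hypothetical names for illustration):

```csharp
// A sealed library class we cannot inherit from.
public sealed class Foo
{
    public int Compute(int x) => x * 2;
    public string Describe() => "Foo";
}

// Delegating wrapper: every member forwards to the wrapped Foo,
// except the one whose behaviour we want to fix.
public class MyFoo
{
    private readonly Foo _foo = new Foo();

    public int Compute(int x) => _foo.Compute(x) + 1; // the "fixed" method
    public string Describe() => _foo.Describe();      // plain delegation
}
```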

~~~
wvenable
But you can't pass that into places that accept Foo, because MyFoo isn't of the
type Foo, so that's a non-starter in most cases.

Secondly this code generation solution is just re-implementing inheritance
again poorly and with the above mentioned limitation. I fail to see how code
generating a proxy is _any way_ better than (or significantly different from)
inheritance.

~~~
scarface74
Hopefully the code was written to depend on interfaces and not hard coded
types.

~~~
wvenable
So if you implement an interface and then use code generation to create proxy
from a "parent class", congratulations you just reinvented inheritance. What's
the difference?

------
blinkingled
> JIT won't inline functions that throw

Seriously? Never had to worry about that in Java land. What would be the
reason for this?

~~~
nlawalker
Stack trace accuracy, maybe?

~~~
fgonzag
Shouldn't matter: you're throwing at that particular location in the
executing function, so the runtime still has to have a way of knowing that the
code was part of an inlined function, along with its name.
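
Whatever the JIT's rationale, the standard workaround is the throw-helper pattern used throughout the BCL: hoist the `throw` into its own method so the method on the hot path stays small and inlineable. A minimal sketch (the `Guard` class and its names are illustrative):

```csharp
using System;
using System.Runtime.CompilerServices;

public static class Guard
{
    // Hot path: stays tiny and inlineable because the throw
    // statement itself lives in a separate method.
    public static int Checked(int[] values, int index)
    {
        // Unsigned compare handles negative indexes in one branch.
        if ((uint)index >= (uint)values.Length)
            ThrowOutOfRange();
        return values[index];
    }

    // The JIT won't inline a method that throws anyway;
    // NoInlining just documents the intent.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void ThrowOutOfRange() =>
        throw new ArgumentOutOfRangeException("index");
}
```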

------
jermaustin1
So in other words extreme tuning = do the opposite of what you probably did!

