Performance Tuning for .NET Core (reubenbond.github.io)
182 points by benaadams 39 days ago | 75 comments

It would be nice if .NET Core profiling were a bit easier on Linux. Microsoft has a shell script[1] to do profiling, but it requires Windows-only tools.

They don't ship Crossgen with the Linux packages, and you have to manually generate the .NET runtime symbols.

I've gotten things like FlameGraphs working using BCC profile[2], but it took quite a bit of work.

[1]: https://raw.githubusercontent.com/dotnet/corefx-tools/master...

[2]: https://github.com/iovisor/bcc/blob/master/tools/profile.py

The perf script from MS is what we use to profile and fix issues on Linux. I do not have Windows; no issues so far with just Linux. We have managed to diagnose and fix every perf issue so far. Not sure what you mean by Windows-only tools or manually generating symbols?

I haven't tried the script; the documentation says you need PerfView to view the data, so I didn't bother running it.

For generating symbols, (I misspoke a bit, I mean downloading them) I'm talking about the native CLR runtime symbols. According to the docs if you want those, you need to use dotnet-symbol and manually download the symbols for the CLR alongside the CLR .so files.

Well, .NET was available only on Windows for a long time (since 2002, with development starting in the late 1990s).

I think making the tools cross-platform after such a long time is not that easy/fast.

I view the data in the Linux CLI, where I can drill down, sort, etc. and see exactly what the issues are. No Windows or PerfView needed.

A tip related to the throw inlining tip: One way to get more consistent/effective inlining is to split the complex 'slow paths' out of your functions into helper functions. For example, let's say you have a cached operation with cache hit and cache miss paths:

  void GetValue (string key, out SomeBigType result) {
    if (_cache.TryGetValue(key, out result))
      return;

    result = new SomeBigType(key, ...);
    _cache[key] = result;
  }

In most scenarios this function might not get inlined, because the cache miss path makes the function bigger. If you use the aggressive inlining attribute you might be able to convince the JIT to inline it, but once the function gets bigger it doesn't inline anymore.

However, if you pull the cache miss out:

  void GetValue (string key, out SomeBigType result) {
    if (_cache.TryGetValue(key, out result))
      return;

    GetValue_Slow(key, out result);
  }

  void GetValue_Slow (string key, out SomeBigType result) {
    result = new SomeBigType(key, ...);
    _cache[key] = result;
  }

You will find that in most cases, GetValue is inlined and only GetValue_Slow produces a function call. This is especially true in release builds and you can observe it in the built-in Visual Studio profiler or by looking at method disassembly.

(Keep in mind that many debuggers - including VS's - will disable JIT optimization if you start an application under the debugger or attach to it. You can disable this.)

This tip applies to both desktop .NET Framework and .NET Core, in my testing (netcore is generally better at inlining, though!). If you're writing any performance-sensitive paths in a library, I highly recommend doing this. It can make the code easier to read in some cases anyway.
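For anyone who wants to try the fast-path/slow-path split, here is a minimal self-contained sketch. The SquareCache class, its Misses counter and the squaring "work" are hypothetical stand-ins for the elided SomeBigType code above, and whether the JIT actually inlines GetValue still depends on the runtime and build settings.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical cache used only to illustrate the split; not from the article.
public sealed class SquareCache
{
    private readonly Dictionary<int, long> _cache = new Dictionary<int, long>();

    // Counts how often the slow path ran, so the behaviour can be checked.
    public int Misses { get; private set; }

    // Fast path: a lookup and a call, small enough to be a good inlining candidate.
    public long GetValue(int key)
    {
        if (_cache.TryGetValue(key, out long result))
            return result;

        return GetValueSlow(key);
    }

    // Slow path: kept out of line so it does not add to the size of the fast path.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private long GetValueSlow(int key)
    {
        Misses++;
        long result = (long)key * key;
        _cache[key] = result;
        return result;
    }
}
```

Calling GetValue(7) twice on one instance runs the slow path once and the fast path once.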

One of the tips is to avoid Linq, which many .NET developers are hesitant to do. I made a library that lets you use Linq style convenience functions without a performance hit in many cases:


I feel I need to clarify this - my post is aimed specifically at optimizing code for high performance. Many of those tips will reduce readability of the code, which may make it harder to determine its correctness.

I'm a big fan of LINQ and I use it in my own code, just not in the high-perf bits. LINQ is great for writing code which has "obviously no deficiencies" in terms of correctness. Without it your code may end up having "no obvious deficiencies".

I also like LinqFaster, LinqAF and these other libraries/tools which can make LINQ usable in more domains.

So your library speeds up Array<T>/T[] and List<T> Linq operations? That's quite interesting, though in my experience with Linq performance optimization the problem is usually prematurely using the wrong data type for your Linq operation.

Given the immediate next bullet point in the article is about foreach operations over List<T>, that would be my top expectation for why they were seeing so much allocation in Linq. ToList() allocates a lot, and too many projects I've seen reify everything with ToList() way too often in Linq usage. I've argued before that List<T> is very rarely the appropriate data structure for a lot of Linq work, and personally consider ToList() harmful. A successful strategy I've used for cleaning up Linq performance in projects is to simply start by removing all ToList() calls entirely and work to move API signatures to use smarter, more appropriate data types than List<T> and IList<T> everywhere.

You just need to be careful not to allow any subsequent modifications of the original collection when doing that (assuming you're using the returned LINQ result), or you'll introduce runtime errors. Typically, my use of ToList or ToArray means I'm guarding against future runtime errors due to collection modification at the expense of memory allocation.

It sounds like you know this already, but I like to think I just saved someone from introducing a runtime error after following your advice without understanding the consequences.
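A small sketch of the failure mode described above, assuming a plain List<int> as the source. The class and method names are made up for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Enumerates a lazy query while modifying its source list.
    // Returns true if that raises InvalidOperationException.
    public static bool LazyQueryThrows()
    {
        var source = new List<int> { 1, 2, 3 };
        IEnumerable<int> evens = source.Where(x => x % 2 == 0); // nothing runs yet

        try
        {
            foreach (int x in evens)
                source.Add(x * 10); // changes the list while it is being enumerated
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // Same loop, but the query result is copied first, so the source can change.
    public static List<int> SnapshotIsSafe()
    {
        var source = new List<int> { 1, 2, 3 };
        List<int> evens = source.Where(x => x % 2 == 0).ToList(); // allocates a copy

        foreach (int x in evens)
            source.Add(x * 10);

        return source;
    }
}
```

The copy costs an allocation, which is the trade-off being discussed: the snapshot buys safety against later modification of the source.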

I just often find I need to remind people that there are other data types to consider reifying to beyond List/Array. I'd probably suggest removing ToList() from Linq, as List<T> really does seem to be the wrong data type more often than not, but for whatever reason so many people find it the "easiest" way to deal with Linq. I feel like ToLookup() and ToDictionary() should be more people's best friends when working with Linq. Too often I find bad searches and joins that become so much easier/more performant if some middle product was appropriately stored in a Dictionary<K, V> or ILookup<K, V>.
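As a rough sketch of the ToLookup()/ToDictionary() point, assume two hypothetical in-memory sequences of customers and orders (the tuple shapes and names below are invented for the example). Building the lookup once replaces a repeated scan of the orders for every customer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupDemo
{
    // Sums order totals per customer name using an ILookup as the middle product.
    public static Dictionary<string, decimal> TotalsByCustomer(
        IEnumerable<(int Id, string Name)> customers,
        IEnumerable<(int CustomerId, decimal Total)> orders)
    {
        // Index the orders once by customer id.
        ILookup<int, decimal> totalsByCustomerId =
            orders.ToLookup(o => o.CustomerId, o => o.Total);

        // Indexing an ILookup with a missing key yields an empty sequence,
        // so customers without orders simply sum to zero.
        return customers.ToDictionary(
            c => c.Name,
            c => totalsByCustomerId[c.Id].Sum());
    }
}
```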

Also with Linq against IQueryable sources, so often ToList/ToArray is used where AsEnumerable would be better. Understanding those monad boundaries is hard, and AsEnumerable doesn't always sound self-explanatory to people when they need to cross those boundaries. (I've thought before that some simple IDE highlighting of which bits of Linq are against IQueryable and which against IEnumerable might help some people think better in Linq.)

People abuse Linq a lot though: enormously complex queries over very large datasets without really knowing what they are doing. When people need .Each, some will just do .ToList().Each(...), etc. I found an even bigger issue with abuse/overuse (or use at all, really) of dynamic. I wish there was a way to ban it at compile time.

What's wrong with any "use at all" of dynamic? It's just another type like object that has runtime reflection built-in and works well for interop, and without much performance problems.

>without much performance problems.

Just depends on the kind of thing you are working on. For many people you could use dynamic all over, no problem. For many people that would be a disaster. (super high throughput server, things that need super low latency, games, rendering, libraries that could be used in any of those domains)

Exactly: I am usually writing backends for financial systems which need high throughput and low latency; dynamic is just a massive performance hit. And it is almost never needed; people usually use it out of laziness, in my experience.

It is just ‘faster’ for people to dump JSON into dynamic and write against that than it is to properly create structured classes. In the end, of course, the unit tests validate the JSON, the code is slow and harder to use and refactor; you would have saved time and stress by actually just writing classes with proper typing.
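To make the refactoring cost concrete, here is a small sketch that uses ExpandoObject as a stand-in for "JSON dumped into dynamic" (no JSON library is involved, and the Trade class is hypothetical). A misspelled member on a dynamic value compiles and only fails when the line runs, whereas the same typo on the typed class would be a compile error.

```csharp
using System;
using System.Dynamic;
using Microsoft.CSharp.RuntimeBinder;

// Hypothetical typed model; a typo such as trade.Pirce would not compile here.
public sealed class Trade
{
    public string Symbol { get; set; }
    public decimal Price { get; set; }
}

public static class DynamicDemo
{
    // Returns true if reading a misspelled member from a dynamic object
    // fails at run time with RuntimeBinderException.
    public static bool MisspelledMemberFailsAtRuntime()
    {
        dynamic trade = new ExpandoObject();
        trade.Symbol = "MSFT";
        trade.Price = 100m;

        try
        {
            decimal price = trade.Pirce; // typo: accepted by the compiler
            return false;
        }
        catch (RuntimeBinderException)
        {
            return true;
        }
    }

    // The typed equivalent: member names are checked at compile time.
    public static decimal TypedPrice()
    {
        var trade = new Trade { Symbol = "MSFT", Price = 100m };
        return trade.Price;
    }
}
```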

From personal experience it also gets abused by people looking for ways to shortcut writing actual classes

I will challenge every use of dynamic (and var, for that matter), unless it's used in the very few appropriate cases.

`var` is entirely syntactic sugar (compile-time type inference) and there's no runtime cost associated with it.

I find it makes most (but not all) code more readable, particularly given that we have great IDEs/tooling in the .NET world.
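A quick way to convince yourself that var is resolved at compile time: the hypothetical helper StaticTypeOf below reports the static type the compiler inferred for its argument, which is the full List<int> for a var local but only object for a local declared as object.

```csharp
using System;
using System.Collections.Generic;

public static class VarDemo
{
    // T is bound to the compile-time type of the argument expression.
    public static Type StaticTypeOf<T>(T value) => typeof(T);

    public static Type TypeInferredByVar()
    {
        var numbers = new List<int>(); // the compiler infers List<int>
        return StaticTypeOf(numbers);
    }

    public static Type TypeDeclaredAsObject()
    {
        object numbers = new List<int>(); // static type is object
        return StaticTypeOf(numbers);
    }
}
```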

I agree: by using var you need to name your variables better, which makes your code more readable, rather than relying on the interface/class definition to explain to someone why you used "obj".

To var or not to var is purely a "tabs 'n' spaces" debate, and doesn't affect performance.

It will affect the performance of someone trying to read your code.

People are still afraid of `var`?

Also worth mentioning: https://github.com/kevin-montrose/LinqAF

It focuses on reducing the allocations that make Linq heavy.

In that vein, Nessos has a Stream Fusion library: https://github.com/nessos/Streams

For lightweight threads I recommend https://github.com/Hopac which is an implementation of John Reppy's Concurrent ML (originally for SML) on .NET.

This looks really interesting - are the benchmarks in the GitHub repo against .NET Framework? With all the performance work that's been done in .NET Core, it might be interesting to compare it with .NET Core.

Also, I noticed there haven't been any commits for a while - is this more or less considered 'complete'?

Yes, it is more or less considered complete. I may do a pass over it to see if it can leverage or support Span<T> better sometime soon.

Performance deltas should be similar on .NET Core.

> Reduce branching & branch misprediction

I wrote a parser for a "formalized" URI (it looked somewhat like OData). This parser was being invoked millions of times and was adding minutes to an operation - it dominated the profile at something like 30% CPU time. It started off something like this:

    int state = State_Start;
    for (var i = 0; i < str.Length; i++)
    {
        var c = str[i];
        switch (state)
        {
            case State_Start:
                /* Handle c for this state. */
                /* Update state if a new state is reached. */
                break;
        }
    }

Hardly rocket science, a clear-as-day miniature state machine. VTune was screaming about the switch, so I changed it to this:

    for (var i = 0; i < str.Length; i++)
    {
        for (; i < str.Length; i++)
        {
            var c = str[i];
            /* Handle c for this state. */
            /* Break if a new state is reached. */
        }

        for (; i < str.Length; i++)
        {
            var c = str[i];
            /* Handle c for this state. */
            /* Break if a new state is reached. */
        }
    }

The new profile put the function at < 0.1% of CPU time. This is something that the "premature optimization crowd" (who tend to partially quote Knuth concerning optimization) get wrong: death by a thousand cuts. A single branch in the source (it ends up being more in machine code) was costing 30% performance.
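For readers who want to experiment with the transformation, here is a toy version of both shapes. It assumes a made-up two-state grammar ("key=value") rather than the URI format from the comment, and every name in it is hypothetical. It only shows that the two forms compute the same thing; any speedup would have to be measured on a real workload.

```csharp
using System;

public static class KeyValueScanner
{
    private const int State_Key = 0;
    private const int State_Value = 1;

    // Switch-per-character form: the state is re-dispatched for every character.
    public static (int KeyLength, int ValueLength) ScanWithSwitch(string str)
    {
        int state = State_Key, keyLength = 0, valueLength = 0;
        for (var i = 0; i < str.Length; i++)
        {
            var c = str[i];
            switch (state)
            {
                case State_Key:
                    if (c == '=') state = State_Value;
                    else keyLength++;
                    break;
                case State_Value:
                    valueLength++;
                    break;
            }
        }
        return (keyLength, valueLength);
    }

    // Loop-per-state form: each loop handles one state, so the current state
    // is implied by the position in the code instead of a variable.
    public static (int KeyLength, int ValueLength) ScanWithLoops(string str)
    {
        int i = 0, keyLength = 0, valueLength = 0;

        for (; i < str.Length; i++)
        {
            if (str[i] == '=') { i++; break; } // leave the key state
            keyLength++;
        }

        for (; i < str.Length; i++)
            valueLength++;

        return (keyLength, valueLength);
    }
}
```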

> This is something that the "premature optimization crowd" (who tend to partially quote Knuth concerning optimization) get wrong

But this wasn't premature optimisation.

- you found performance was actually a problem in practice

- you used a tool to profile the application for a real workload

- you isolated something to optimise

- you came up with a way to optimise based on the data and your tools

- you tested that the optimisation worked

That isn't premature optimisation. Premature optimisation would have been writing this in the first place without checking anything first.

I fail to see how your anecdote relates to your ax-grindy assertion that premature optimization is bad. You wrote your code, you profiled it, you found one function taking 30% of the time, you optimized that function.

We can always end up with a tautology interpretation of premature optimization, in which case it cannot be bad, if it was bad, it wasn't premature!

While this may not include you, a great many forums and advice areas on the internet, and elsewhere, would discourage you from even worrying about a detail like this. Asking questions about this will just get you a Knuth quote. I sometimes wonder if the reason so much software today is inexplicably slow is because for 30 years so many young kids curious about how computers and compilers work were told not to worry about it.

I think it is useful to build up some intuition about these things, so your default implementation can be the fast one when it isn't especially messy to do so. People don't always get the opportunity to come back and change things later, and some design choices don't just affect a single function that you can easily fix later in the development process.

> We can always end up with a tautology interpretation of premature optimization, in which case it cannot be bad, if it was bad, it wasn't premature!

The vast majority of "premature" optimisation happens before the code has been profiled to see which part is slow. In some cases, before the code has _ever been run_. i.e. it is speculative design.

In contrast, you're talking about a function using 30% of the CPU time in a profile, so that's clearly not you.

The longer and less common Knuth quote is below, and it's much more nuanced than the headline. Note that it specifically talks about code optimisation by identifying "critical code" with "measurement tools". That sounds like what you did. Does it not?

> "There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

> Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified. It is often a mistake to make a priori judgements about what parts of a program are really critical, since the universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail."

- Structured Programming with go to Statements, Donald Knuth, ACM Computing Surveys, Dec. 1974

It's because most of the time, people asking questions like these don't actually have a perf issue. This is very evident on StackOverflow.

So when you do have such an issue, and you want to make it clear that it's not a premature optimization, you should mention that it's a hot path as determined by profiling (or whatever other technique). Then you won't get the Knuth quote.

On the other hand, it’s highly annoying to have to anticipate and ward off a whole range of uncharitable and/or unimaginative assumptions about why one would want to do such a thing before there is even an attempt made at a straight answer. If nobody knows the answer, I’d rather have no replies than yet another soliloquy on why nobody actually needs to optimize anything.

It's a function of what's typical. You might be coming to it from the position of having just this one issue, but keep in mind that the people answering that are likely seeing more than one such question every day, and most of them are premature optimization. Thus people bring this up, because in the grand scheme of things, it saves a lot of time for everybody involved.

> most of them are premature optimization

The person asking may be prematurely optimizing, but can you make that assertion for every single person arriving at the same question/post from a search engine? If premature optimization is a bad thing, then prematurely calling out premature optimization is doubly so.

I have no doubt that fielding ignorant questions every day inculcates a certain attitude of presumptuous arrogance. How could it not?

However, as the sibling comment says, I'm most commonly annoyed by this when I arrive at an answer via search and have to wade through the whole interrogation process about what the OP was really trying to accomplish.

It's like, look... you're not being brought on as a consultant, where whole-system analysis and root-causing issues would be part of your professional responsibilities. Sometimes a person just wants to ask an obscure question about software to see if maybe someone else knows the answer. That person shouldn't have to provide a complete backstory and concept of operations for whatever it is they happen to be working on.

They're also not entitled to an answer, so if you don't know, for the love of god, just move along and do something else with your day. I think this kind of behavior wastes a lot of time for everybody involved.

> Mark classes as sealed by default

Please, no! This shouldn't be the default - it's a constant bugbear of mine where I want to extend a class from a library, and I can't because it's been sealed for no good reason.

It is a general design principle for .net libraries that you should seal a class unless you deliberately and carefully design it to be extensible.

If a class is unsealed you are basically not able to change it ever again, since someone might have subclassed it and depend on all kinds of internal behavior. So it should only be done very deliberately.

This isn't just a design principle for .NET libraries, it's a SOLID design principle. All classes should be inextensible unless they are explicitly designed to be extended.

The open/closed principle doesn’t say that. It says classes should be “open for extension” (usually meaning you are able to inherit from them and add functionality) and “closed for modification”, meaning you aren’t able to change internal implementations.

It does not say “everything should be closed by default unless explicitly marked otherwise”...

Composition is still an option, and arguably the default option. Sealing a class does not prevent anyone from extending the functionalities offered by a class.

> It is a general design principle for .net libraries

Is it? A more typical approach, IMO, is for internal classes to be marked as `internal`, and for `public` classes to be considered part of the public API.

You do both. If you don't need to expose a class you keep it internal. If you do need to expose it, you keep it sealed unless you deliberately design it for extension.

The overarching principle is to keep the API as tight as possible, because as soon as an API is public, you have committed to it and even subtle changes in observable behavior may break some client.

Usually you define an interface for anything that you want to be overridable and provide some way for consumers to provide their own implementations. Then seal your own implementations.

Then you only need to design your interfaces in such a way that they can easily be implemented by composition.

This is the approach taken in ASP.NET and in WCF, although the latter does a really bad job at making its interfaces user friendly (for even tiny changes you have to re-implement a lot of functionality, which could have been avoided by better interface design).
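A minimal sketch of that pattern, with invented names (ITextFormatter, DefaultFormatter, UpperCaseFormatter) that do not come from ASP.NET or WCF: the library exposes an interface and a sealed implementation, and the consumer customizes behaviour by wrapping it.

```csharp
using System;

// The extensibility point is the interface, not the class.
public interface ITextFormatter
{
    string Format(string value);
}

// Library implementation: sealed, so its internals can change without breaking subclasses.
public sealed class DefaultFormatter : ITextFormatter
{
    public string Format(string value) => value.Trim();
}

// Consumer-side customization by composition: reuse the inner behaviour and add to it.
public sealed class UpperCaseFormatter : ITextFormatter
{
    private readonly ITextFormatter _inner;

    public UpperCaseFormatter(ITextFormatter inner)
    {
        _inner = inner;
    }

    public string Format(string value) => _inner.Format(value).ToUpperInvariant();
}
```

Because callers depend on ITextFormatter, the wrapper can be passed anywhere the default implementation could be.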

Please yes!

Classes should be designed for extension on purpose, not as an oversight; that is one of the SOLID pillars.

Taking advantage of an oversight is an open door for fragile base class problems and hard to track down bugs.

I agree but in the context of the article, I believe the classes being sealed aren't meant to be overridden.

If the author has spent time optimising to that level, overriding probably isn't necessary or warranted in that area.

How do you know it's not been done for a good reason? Designing classes for extensibility is not easy. Furthermore, you can make classes extensible later on, but you can't do the reverse without breaking the API.

There are different kinds of extensibility.

Designing for extensibility is not easy in the context of virtual methods. Every virtual method can be overridden, which means that you effectively need to define contracts for all your methods that can be overridden, and only call them in ways that respect those contracts. In a language where everything is virtual by default, like Java, this means all non-private methods. Which is not good, because most of the time, the reason why you're making a method public is to allow calling it, not to allow overriding. That's why C# made "virtual" explicit opt-in.

But for inheritance, there's no such issue. If someone inherits from your class, they can't break your invariants - they still have to invoke your constructor, and they don't have access to any private state. In C#, they also can't override public and protected members, unless those are declared virtual. Thus, there's no safety or correctness reason to prohibit derivation.

It should also be noted that there's no perf gain from declaring a class "sealed", unless it has virtual methods (normally, inherited from a base class, because it doesn't make any sense to have virtual methods declared in a sealed class). Thus, the only time to do so is when you have a class hierarchy, for leaf nodes in that hierarchy.

Unfortunately, the "new" keyword can override functionality.

It cannot, really. It can only shadow the old declaration (and isn't even necessary). This means if you have Derived : Base with such a shadowed member, then code from the library that uses a Base cannot call your shadowing member. You can't really change behaviour that way, nor break anything, except for yourself.
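The difference between shadowing and overriding is easy to check. In this sketch (all names hypothetical), Describe stands in for library code that only knows about Base.

```csharp
using System;

public class Base
{
    public string Name() => "Base";         // non-virtual: cannot be overridden
    public virtual string Kind() => "Base"; // virtual: the author opted in
}

public class Derived : Base
{
    public new string Name() => "Derived";      // shadows Base.Name, does not override it
    public override string Kind() => "Derived"; // a real override
}

public static class ShadowingDemo
{
    // Library-style code that sees the object only through the Base type.
    public static string Describe(Base b) => b.Name() + "/" + b.Kind();
}
```

Describe(new Derived()) still calls Base.Name, so the result is "Base/Derived"; the shadowing member is only reached through a reference typed as Derived.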

Sometimes yes - what I'm saying is that it shouldn't be the default, and all too often is, for no good reason, even in classes that would appear to be natural extensibility points.

On the plus side, when I encounter this, I generally ask the author on GitHub if it can be 'unsealed', and they generally do so.

Why doesn't the JIT assume classes are effectively sealed until it sees a subclass, like the JVM does?

Because AFAIK .NET cannot deopt when another implementation is loaded dynamically.

I think Core 3.0 can?

Because it is bad design known as the fragile base class problem, which usually leads to hard-to-track-down bugs, because someone somewhere is accessing methods or internal class data structures that they shouldn't be accessing in the first place.

That sounds more like an argument to seal your classes, rather than for why the JIT doesn't, in addition to that, assume classes as sealed until proved otherwise.

You're talking about the programming model, where I'm talking about the implementation.

Well, JIT heuristics end up being designed to live with the typical code patterns of the programming language.

In Java's case, methods are virtual by default, so with open classes a large majority of the code has a possible indirection; hence such optimizations.

I have never inherited from a class from a library unless it was specifically designed that way. I think it's much better to aggregate.

Makes me wonder: Can you add extension methods to sealed classes?

Yes; extension methods just operate over the public interface
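For example, with a hypothetical sealed Temperature class, an extension method can add a conversion while only touching the public Celsius property:

```csharp
using System;

// Hypothetical sealed class standing in for a library type.
public sealed class Temperature
{
    public Temperature(double celsius)
    {
        Celsius = celsius;
    }

    public double Celsius { get; }
}

public static class TemperatureExtensions
{
    // Extension methods are static methods with call-site sugar;
    // they see only the public surface of the sealed type.
    public static double ToFahrenheit(this Temperature t) => t.Celsius * 9.0 / 5.0 + 32.0;
}
```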

I had the same thought when I saw that. There seems to be a trend to seal and lock down classes preventing any kind of extension. It sort of misses one of the main benefits of OOP and I know what I'm doing.

As a consumer of library software, yes, you know what you are doing and you know how you want to override things. That is true. But the advice is more about how to create stable libraries that can change over time and making sure the public surface area and extensibility points are created deliberately and can be maintained. It means the lib developer can release their software knowing that he didn't make any breaking changes to the public surface area of the library.

The problem is, some libs are not well maintained, and not that well designed, and you (the generic "you") are happy to deal with breaking changes if and when they come along on version upgrades. It can be frustrating if they essentially locked it all down without much thought. But in general, I think the sealed by default is the better advice. You can do cunning (evil) things to get around it.

Instead of my rehashing the old arguments about inheritance....


I understand the argument for composition-over-inheritance but it rarely applies when you actually need to do it. Often it's to reach in and fix a bug or enhance the behaviour of an existing component. You simply can't do that with composition.

Without inheritance (if the class is sealed) I often end up having to re-write the entire component or simply accept my fate. So that is a lose-lose situation.

That's where delegation comes in. You wrap each public method of Foo in another class called MyFoo and then fix the one method you care about. With C# and R# it's a simple matter of:

Create a new class

Create a private variable:

private Foo _foo

Click on _foo

Resharper menu -> Generate Code -> create Delegating members.

But you can't pass that into places that accept Foo because MyFoo isn't of the type Foo, so that's a non-starter in most cases.

Secondly this code generation solution is just re-implementing inheritance again poorly and with the above mentioned limitation. I fail to see how code generating a proxy is any way better than (or significantly different from) inheritance.

Hopefully the code was written to depend on interfaces and not hard coded types.

So if you implement an interface and then use code generation to create proxy from a "parent class", congratulations you just reinvented inheritance. What's the difference?

That's a terrible article.

> There is principle, which sounds like "favor object composition over class inheritance".

This is begging the question, isn't it? Why favor object composition over inheritance? The qualification here that's missing is "for code re-use". If your goal is simply to re-use code then you should favor composition. But if you're actually modeling an is-a relationship or trying to modify the behavior of a base class then that's not just for code re-use.

This subtlety seems to have been lost. It's much easier to religiously ban all inheritance than to simply use it appropriately.

> In most cases "HAS-A" relationship is more semantically correct than "IS-A" relationship between classes.

In most, but not all cases. You will find that most cases of inheritance in frameworks, for example, follow the is-a relationship perfectly. And you'll find that most of the time composition is used for has-a relationships. It's an error to mix this up but that's not the fault of inheritance.

For example, in my own software I inherit from the database context class to provide a host of features beyond what the framework provides. My context is a database context.

> Composition is more flexible than inheritance.

If you need that flexibility, great. But I don't see why having more flexibility is automatically good for the correctness of the program. If X is always implemented by Y, that's much easier to reason about.

> Inheritance breaks encapsulation.

I don't see how that's true. Your base class exposes what it wants to potential child classes so encapsulation is preserved. There are some languages that provide no limits (no protected/private/virtual/etc) but then those don't provide any encapsulation in any other situation either.

> A design based on object composition usually will have less classes.

Says who? What's the research behind this? I assume most projects have a mix of inheritance and composition as needed.

> It is possible to implement "multiple inheritance" in languages which do not support it by composing multiple objects into one.

That's not multiple inheritance -- it's just composing multiple objects using some kind of proxy. There is a difference.

> There is no conflict between methods/properties names, which might occur with inheritance.

Ok, that's a fair point.

Inheritance is also one of the tightest forms of coupling, which should be avoided. One is better off using composition than inheritance.

A workaround would be to use extension methods, depending on your reasons for extending.

Or, reflection if you're breaking the rules anyway.

> JIT won't inline functions that throw

Seriously? Never had to worry about that in Java land. What would be the reason for this?

Size generally; you can aggressively inline it, but you'll likely be pulling in some chunky stuff for the throw. Push the throw out of line and your inline will be more trim.
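A sketch of pushing the throw out of line, using a hypothetical IntBuffer class. The hot method contains only a range check and a call; the exception construction lives in a separate helper marked NoInlining. Whether the hot method then gets inlined is still up to the JIT.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical type used only for this illustration.
public sealed class IntBuffer
{
    private readonly int[] _items;

    public IntBuffer(int size)
    {
        _items = new int[size];
    }

    // Hot path: small body, no throw statement of its own.
    public int Get(int index)
    {
        // The unsigned comparison rejects negative and too-large indexes in one check.
        if ((uint)index >= (uint)_items.Length)
            ThrowIndexOutOfRange(index);

        return _items[index];
    }

    // Cold path: the exception setup is kept out of the caller's body.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void ThrowIndexOutOfRange(int index)
    {
        throw new ArgumentOutOfRangeException(nameof(index), index, "Index is outside the buffer.");
    }
}
```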

Stack trace accuracy, maybe?

Shouldn't matter, since you're throwing at that particular location of the executing function, so the runtime still has to have a way of knowing that the code was part of an inline function along with its name.

So in other words extreme tuning = do the opposite of what you probably did!
