This is good to know, but fairly unsurprising. The interesting case is List<T>. The .Count property is actually a function call, and the value could change during the loop. If you don’t mutate the list, is it smart enough to both inline the function call and hoist the value out as an invariant?
There's no mathematical way for it to know that you won't modify (especially remove). The developer would have to express that by, say, iterating over the internal array directly.
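One way to express that explicitly (an illustrative sketch, assuming .NET 5+ where CollectionsMarshal.AsSpan exists) is to iterate a span over the list's backing array:

    using System;
    using System.Collections.Generic;
    using System.Runtime.InteropServices;

    static int SumDirect(List<int> list)
    {
        // AsSpan hands back a span over the list's backing array; the caller
        // promises not to add or remove items while holding it, which is the
        // very guarantee the JIT can then exploit.
        Span<int> items = CollectionsMarshal.AsSpan(list);
        int sum = 0;
        for (int i = 0; i < items.Length; i++)
            sum += items[i];
        return sum;
    }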
A good compiler can in a large number of cases infer that it won't modify.
The compiler can explore the call graph of the loop body and find out which methods might be called on the list, and then check that none of those methods write to the list's internal length variable.
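Roughly, the distinction that analysis has to make looks like this (hypothetical methods, not from the article):

    using System;
    using System.Collections.Generic;

    static class Sketch
    {
        // Nothing in the body can touch the list, so a sufficiently smart
        // compiler could treat list.Count as loop-invariant and hoist it.
        static int SumOnly(List<int> list)
        {
            int sum = 0;
            for (int i = 0; i < list.Count; i++)
                sum += list[i];
            return sum;
        }

        // The delegate is opaque: unless the compiler can see its target and
        // prove it never calls Add/Remove/Clear, Count must be re-read on
        // every iteration.
        static int SumWithCallback(List<int> list, Action<List<int>> callback)
        {
            int sum = 0;
            for (int i = 0; i < list.Count; i++)
            {
                callback(list);
                sum += list[i];
            }
            return sum;
        }
    }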
How deeply and precisely the compiler (or JIT) analyzes the code in an attempt to prove that the length won't change involves a tradeoff in compilation speed, but I'd expect .NET to have found a pretty good sweet spot by now.
> A good compiler can in a large number of cases infer that it won't modify.
For a single thread that should be doable. But that’s not mathematically covering all bases is it?
And how can the compiler know if the relevant list is not accessed from multiple threads? I’m pretty sure that’s next to impossible and definitely not worth the complexity for such a small optimisation.
Compilers have been assuming single-threaded memory access by default since, forever? If you want your code to work with multiple threads, then you have to explicitly write code to do so. Otherwise, the compiler is liable to store things in registers, shuffle instructions around, and do whatever it feels like to optimize performance for the single-threaded case. If the compiler didn't do this, then every program would run many many times slower than it does.
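A minimal C# sketch of what "explicitly write code to do so" means here (class and field names are made up):

    class Worker
    {
        private bool _stop;             // plain field: the JIT may cache it in a register
        private volatile bool _stopNow; // volatile: every read really hits memory

        public void Spin()
        {
            // It is legal for the JIT to compile this as "read _stop once, loop forever",
            // so another thread's write may never be observed.
            while (!_stop) { }
        }

        public void SpinSafely()
        {
            // The volatile read forces the loop to re-read the flag each time.
            while (!_stopNow) { }
        }

        public void Stop()
        {
            _stop = true;
            _stopNow = true;
        }
    }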
It is often easy to know it isn’t accessed elsewhere (escape analysis can prove it for local variables), and compilers do check the simple cases (search term: “lock elision”).
This can also help when using Java’s StringBuffer, which is thread-safe but can often be ‘converted’ into a non-thread-safe variant by the compiler (though the existence of StringBuilder suggests that, at least in some versions of Java, ‘often’ wasn’t often enough).
>There's no mathematical way for it to know that you won't modify
That's an overstatement. That's currently impossible in C#, but Microsoft is working on it in F*: https://www.fstar-lang.org I hope some of that will trickle down to F# and C# some day.
This is veering into C and undefined behavior territory, but with List<T> there's a guarantee that you cannot modify the list while enumerating over it, so could the compiler theoretically assume that when using foreach?
I assume it does, if you try to do so it throws an exception. It's a rather common footgun with multi-threaded code until you've lost a few toes to it. People are always writing naive producer-consumer queues with an unlocked List<T> backing it until they learn better.
> with List<T> there's a guarantee that you cannot modify the list while enumerating over it, so could the compiler theoretically assume that when using foreach?
No such guarantee exists.
The only guarantee is that enumeration will throw on the next iteration after a modification.
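A minimal repro of that behaviour (my own snippet):

    var list = new List<int> { 1, 2, 3 };   // using System.Collections.Generic
    foreach (var item in list)
    {
        if (item == 2)
            list.Add(4);   // the enumerator notices the version change and the
                           // next MoveNext throws InvalidOperationException
    }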
Nice benchmarking on Microsoft .Net CLR. Looks like the JIT compiler is smart enough to recognize that array.Length is an invariant and hoist it out of the loop, which is awesome for the common use cases!
One nitpick about the title: C# runs on more runtimes than Microsoft .Net CLR, and those may behave very differently. For example: Mono CLR, or Unity's IL2CPP which is an ahead-of-time compiler.
Specifically, I'd expect IL2CPP would not hoist length out, because it would not recognize it as an invariant. (Some great examples of IL2CPP cross compilation are here: https://jacksondunstan.com/articles/4749 )
TLDR: the Microsoft JIT compiler makes the local variable unnecessary, but this is a property of the JIT, not of C#. Developers on non-MS platforms shouldn't assume this.
Always accessing array.Length is defensive coding. In the event your array is mutated, always accessing array.Length ensures you won't run into an IndexOutOfRangeException.
Even better is to just avoid accessing the array's length. I almost always use foreach or Linq.
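E.g., rough equivalents of the summing loop (my own sketch, assuming the article's int[] array field):

    int sum1 = 0;
    foreach (var value in array)   // no index variable, no Length in user code
        sum1 += value;

    int sum2 = array.Sum();        // LINQ; requires using System.Linq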
I write first for readability and performance second. Not to say that performance isn't important (and of course I pay attention to O notation), but throwing a little more memory and CPU power at a web page is far cheaper and easier than spending developer time reading code.
I'm not a C# developer, but this kind of thing seems to permeate almost all languages in one form or another.
Maybe it is just a style and readability thing; or maybe (as suggested elsewhere as well) it is meant to be reused elsewhere in the system, so it is cached in a variable for later use.
Or, it's possible that at one time - maybe in the early days of .NET - doing it this way was more optimized, and the habit stuck with developers (perhaps they all read the same article in the knowledge base about it?). If that's the case, it's a bit of "premature optimization", but one that apparently doesn't harm anything.
What I do wonder is if certain other changes could change the speed?
At least it might be interesting to see in these trivial cases; I admit that in more complex loops it might not be advisable.
But - for instance, what if rather than iterating thru the array from the 0th element to the length of the array, you instead started from the last element and iterated backwards, until you hit zero? That way, you wouldn't be checking the length of the array, but rather for zero?
The code for such a test might look like:
    public int WithoutVariable() {
        int sum = 0;
        for (int i = array.Length - 1; i > -1; i--) {
            sum += array[i];
        }
        return sum;
    }
I'm not sure that a "with variable" version would make much difference (or sense), but here it is for completeness' sake:
    public int WithVariable() {
        int sum = 0;
        int length = array.Length - 1;
        for (int i = length; i > -1; i--) {
            sum += array[i];
        }
        return sum;
    }
Again - I'm not a C# developer - maybe my code is wrong above, but hopefully it gets the idea across.
Would this work better? Would it be faster? What would the JIT compiler create? Maybe it wouldn't be any faster or better than the ForEach examples?
I honestly don't know - but if anybody wants to give it a shot, I'd be curious as to the results...
EDIT: I noticed that I said "checking for zero" - but I modified my code to use -1 as the boundary; the loop condition could be written as "i >= 0;" instead. I'm not sure whether "i >= 0;" or "i > -1;" is faster - another thing to check, I suppose...
> what if rather than iterating thru the array from the 0th element to the length of the array, you instead started from the last element and iterated backwards, until you hit zero?
That used to be common in assembly, as it leads to smaller and faster code on many systems.
Iterating from the tail has other advantages, such as when you want to remove items from the list: with reverse iteration you don't need to do any special index handling. But that is an optimisation for development time rather than run-time; I'd imagine multiple in-place removals are not so efficient.
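For example (a sketch, assuming a List<int> named list rather than the article's array):

    // Walking backwards: RemoveAt(i) only shifts the elements we've already
    // visited (those at higher indices), so the index needs no adjustment.
    for (int i = list.Count - 1; i >= 0; i--)
    {
        if (list[i] < 0)
            list.RemoveAt(i);   // each removal copies the tail, so many removals trend toward O(n^2)
    }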
I wouldn't naturally think about summing lists in reverse so it becomes more cognitive effort to understand the code. I've seen plenty of termination condition bugs on reverse iterations that might back that up.
A related thought is how modern code clean-up tools are doing things like reducing if-nesting e.g. turning this:
    if (open)
    {
        stuff();
        close();
    }
into this, with early returns:
    if (!open) return;
    stuff();
    close();
My feeling is that the first more naturally represents the idea I have of the behaviour and the second, like your reverse iteration, is an encoded version of that idea. I feel I have to make an extra cognitive step to decode and reassemble it to create an idea of the behaviour.
I suspect the further you stray from the natural idea, the harder the code is to read and validate by eye, and the more likely errors are to creep in. I don't know how subjective this is. I generally don't have a strong feeling about early returns; it's just that I have been noticing the slightly greater cognitive effort they cause me compared to the logical chunking that nested ifs provide.
In the above example I tend to agree with you, but I'd definitely prefer early return over multiple levels of nested if statements unless all of the nested statements are quite small.
There are definitely cases like pre-conditions and validation that are natural early exits. There are other situations where it would sound really off and confusing to phrase instructions for a human that way. Not an easy task for an automatic code-clean-up tool to discern!
I feel like I saw something about how the JIT only checks the bounds once when going in reverse, but I can't find anything concrete to back that up, so we'll have to go with the 10 year old write-up for now saying: no, it is not optimized compared to the Length optimization. https://blogs.msdn.microsoft.com/clrcodegeneration/2009/08/1...
`i > -1` isn't quite right. I'm not sure about how C# does implicit casting, but arrays should be indexed by `usize` and comparing signed w/ unsigned is a bug waiting to happen or UB in other languages.
Whilst I would personally use i >= 0 and find the use of -1 here slightly jarring, it is correct. .NET libraries usually use int even in places where uint would be best (like Array.Length) and indexers like this[int i] for Array. So, this will work fine. Well, it'd work fine anyway, because the negative value is never used, but yeah...
Yeah I meant it more as a bad habit, because when you move to other languages you'll have problems. In rust I don't think the style would even compile, and in C you're just playing with fire.
The difference is that the article is investigating using a property in the boundary check. The boundary check is evaluated with every iteration. Your example is only using it in the initialization expression which, by definition, is evaluated only once. This holds true for virtually any language with a for construct.
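In other words (a quick sketch of the two shapes, using the article's array field):

    // Boundary check: array.Length appears in the condition, which is
    // evaluated on every iteration (unless the JIT hoists it).
    for (int i = 0; i < array.Length; i++) { /* ... */ }

    // Initialization: array.Length is evaluated exactly once, by definition,
    // no matter what the compiler does.
    for (int i = array.Length - 1; i >= 0; i--) { /* ... */ }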
Avoiding unnecessary work is something all developers should strive for, experienced or not. Maybe in this case it won't make any difference. But, say, a method call in a loop should often be done outside the loop. Doing it here but not there leads to inconsistent style. Also, depending on the language and compiler, that might actually make a difference sometimes. I'm not going to write code that assumes a certain interpreter/compiler behavior.
It's a great default assumption when you don't know all the quirks of the specific language or compiler, because either it will help you or at least won't hurt you.
It might hurt if you add this extra variable to every loop and at some point the array length changes within the loop. It also makes the code more verbose. This optimisation should be done like all optimisations: first you benchmark and then see if it makes sense to make this change. Most of the time there's no point doing so.
If things are slow, instrument and investigate. Only bother fiddling with array.Length if profiling with production workloads exposes it as the bottleneck. More likely the problem will be elsewhere.
Yes. The total slowdown you get will depend on how much of the work being done is the actual iteration vs the work inside the loop. If you are doing a long running thing each time you go through the loop, then the overhead of the linq iteration is not so bad
But something like:
    var sum = values.Sum(x => x * x);

will take ~10x longer than a for or foreach loop equivalent.
(260ms vs 36ms on 32 million float32s on my machine, measured by benchmarkdotnet)
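For reference, the loop equivalent would be something like this (a sketch, assuming values is a float[]):

    float sum = 0f;
    for (int i = 0; i < values.Length; i++)
        sum += values[i] * values[i];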
You're doing more work though. You're converting to a List<T> (in order to coax the lazy enumerable to enumerate). You should use Aggregate for a more generalised way to reduce/fold collections into a value:
    var sum = arr.Aggregate(0f, (t, x) => t + (x * x));
On the whole though it's better to use Linq until it's not. It's more declarative which will lead to more reliable code. Optimise when you find performance issues, don't write bad code just because you may gain a few nanoseconds here and there.
Bear in mind that the operation is trivial and so the overhead cost of invoking the lambda per iteration becomes more significant. I'm not trying to dismiss the performance differences, but summing 32 million floats is not a common use-case and I think the performance differences are mostly irrelevant for those using it to (for example) sum a list of line-items for an invoice or whatever.
For anyone that needs maximum performance then they should clearly be wary of anything that requires invocation of lambdas, but the usefulness of Linq shouldn't be written off because of that - the declarative style leads to fewer bugs, more stable, maintainable and easier to read code - that in itself is a performance boost.
You're right, but for most people writing C#, the developer time saved by Linq (being easier to read and understand quickly) is much more valuable than the tiny amount of CPU time that's lost.
If I remember correctly, the runtime keeps the size of the array in the object header along with the sync block, etc. If you have VS, you can view the object in memory to see the sync block value, array size value, etc.
It is very possible the reason is not speed, but readability. If you simply named it 'length', then sure, there is no point. If it is given a better, more descriptive name, and then gets used in an equation elsewhere in the code, then it may be very useful because it is easier to read.
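E.g. (made-up example just to illustrate the naming point):

    record Player(double Score);   // hypothetical type for the example

    static double AverageScore(Player[] players)
    {
        int playerCount = players.Length;   // descriptive name instead of "length"
        double totalScore = 0;
        for (int i = 0; i < playerCount; i++)
            totalScore += players[i].Score;
        return totalScore / playerCount;    // the name reads better here than players.Length
    }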
The JIT will skip the array bounds check as long as myArray.Length is the terminal condition in the for loop, rather than a local variable with the length stored.
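I.e., the two shapes being compared look roughly like this (a sketch; whether the second form keeps the per-element checks depends on the JIT version):

    // The JIT can prove i stays within [0, myArray.Length) from the loop
    // condition, so the per-element bounds check can be elided:
    int sum1 = 0;
    for (int i = 0; i < myArray.Length; i++)
        sum1 += myArray[i];

    // With the length cached in a local, older JITs may not make the same
    // connection, and each myArray[i] keeps its bounds check:
    int sum2 = 0;
    int length = myArray.Length;
    for (int i = 0; i < length; i++)
        sum2 += myArray[i];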
It is odd that C# needs a foreach loop for bounds check elimination.
Java has supported this optimization for a wide range of loop types... since Java 7, I think. Normally I would argue that Java is just ahead of the curve, but Android also gained support for bounds check elimination in 2014-2015.
Either the article does not tell us the whole truth or the Microsoft JIT is subpar by modern standards.
A long time ago I decided that you shouldn't second-guess the compiler without a reason. So just avoid worrying about local optimizations and focus on your data structures and algorithms.