
Should array length be stored into a local variable in C#? - atomlib
https://habr.com/en/post/454582/
======
OskarS
This is good to know, but fairly unsurprising. The interesting case is
List<T>. The .Count property is actually a function call, and the value could
change during the loop. If you don’t mutate the list, is it smart enough to
both inline the function call and hoist the value out as an invariant?

~~~
zamalek
There's no mathematical way for it to know that you won't modify the list
(especially by removing items). The developer would have to express that by,
say, iterating over the internal array directly.
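On .NET 5 and later, one way to iterate the internal array directly is `CollectionsMarshal.AsSpan`, which returns a `Span<T>` over the list's backing storage. This is a minimal sketch, assuming a .NET 5+ target; the class and method names are hypothetical. It only helps if you really do not add or remove items while the span is alive.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical helper class, for illustration only.
public static class ListSpanSum
{
    // Sums a List<int> by walking its backing array through a Span<int>.
    // CollectionsMarshal.AsSpan (.NET 5 and later) returns a span over the
    // list's first Count elements; the span's Length cannot change, so the
    // loop bound is not re-read from the list on each iteration.
    // The list must not have items added or removed while the span is in use.
    public static int Sum(List<int> list)
    {
        Span<int> items = CollectionsMarshal.AsSpan(list);
        int sum = 0;
        for (int i = 0; i < items.Length; i++)
        {
            sum += items[i];
        }
        return sum;
    }
}
```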

~~~
ygra
This is veering into C and undefined behavior territory, but with List<T>
there's a guarantee that you cannot modify the list while enumerating over it,
so could the compiler theoretically assume that when using foreach?

~~~
thrower123
I assume it does, if you try to do so it throws an exception. It's a rather
common footgun with multi-threaded code until you've lost a few toes to it.
People are always writing naive producer-consumer queues with an unlocked
List<T> backing it until they learn better.
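The single-threaded version of that exception is easy to reproduce. This is a minimal sketch with hypothetical names: adding to a `List<T>` inside a `foreach` over the same list makes the next step of the enumeration throw `InvalidOperationException`. That check is not a substitute for locking when other threads mutate the list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
public static class EnumerationGuard
{
    // Returns true if mutating a List<T> inside a foreach over that same
    // list throws. The list's enumerator remembers the list's internal
    // version when enumeration starts; Add bumps that version, so the
    // following MoveNext throws InvalidOperationException.
    public static bool ModifyDuringForeachThrows()
    {
        var list = new List<int> { 1, 2, 3 };
        try
        {
            foreach (int item in list)
            {
                if (item == 1)
                {
                    list.Add(4); // mutates the list mid-enumeration
                }
            }
        }
        catch (InvalidOperationException)
        {
            return true;
        }
        return false;
    }
}
```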

------
60654
Nice benchmarking on Microsoft .Net CLR. Looks like the JIT compiler is smart
enough to recognize that array.Length is an invariant and hoist it out of the
loop, which is awesome for the common use cases!

One nitpick about the title: C# runs on more runtimes than Microsoft .Net CLR,
and those may behave very differently. For example: Mono CLR, or Unity's
IL2CPP which is an ahead-of-time compiler.

Specifically, I'd expect IL2CPP would _not_ hoist length out, because it would
not recognize it as an invariant. (Some great examples of IL2CPP cross
compilation are here:
[https://jacksondunstan.com/articles/4749](https://jacksondunstan.com/articles/4749)
)

TLDR: the Microsoft JIT compiler makes the local variable unnecessary, but
this is a property of the JIT, not of C#. Developers on non-MS platforms
shouldn't assume this.
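For reference, the two loop shapes being compared look roughly like this. This is a minimal sketch, assuming the array is passed in as a parameter; the class name is hypothetical. Both methods compute the same result; any difference between them is purely a question of what a given JIT or AOT compiler emits.

```csharp
// Hypothetical helper class, for illustration only.
public static class LengthLoops
{
    // Re-reads array.Length in the loop condition. An array's length is
    // fixed once it is created, so a compiler may treat this as
    // loop-invariant; whether it actually hoists it depends on the runtime
    // (reportedly Microsoft's JIT does; Mono or IL2CPP may differ).
    public static int WithoutVariable(int[] array)
    {
        int sum = 0;
        for (int i = 0; i < array.Length; i++)
        {
            sum += array[i];
        }
        return sum;
    }

    // Reads the length once into a local before the loop.
    public static int WithVariable(int[] array)
    {
        int sum = 0;
        int length = array.Length;
        for (int i = 0; i < length; i++)
        {
            sum += array[i];
        }
        return sum;
    }
}
```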

~~~
chrisseaton
> those may behave very differently

They shouldn’t behave any differently should they? There’s a single language
spec.

~~~
tomsmeding
"Behave very differently" is meant with regards to performance, here; or in
other words, things that are not observable in the language semantics.

------
ducttape12
Always accessing array.Length is defensive coding. In the event your array
is mutated, always accessing array.Length ensures you won't run into an
IndexOutOfRangeException.

Even better is to just avoid accessing the array's length. I almost always use
foreach or Linq.

~~~
gameswithgo
That is orders of magnitude slower and allocates. Of course you can use:
[https://github.com/jackmott/LinqFaster](https://github.com/jackmott/LinqFaster)

~~~
gwbas1c
Wrong

From the article:

> It also turned out that Foreach often walks through the array faster than
> For

~~~
merb
1. `ForEach` is slower.
2. `foreach` is faster.
3. Linq is slower.
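The three forms being distinguished are the `List<T>.ForEach` method, the `foreach` statement and Linq. This is a minimal sketch of each, assuming a `List<int>` input; the class and method names are hypothetical, and the comments describe mechanics, not measured timings.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class SumStyles
{
    // List<T>.ForEach method: invokes a delegate once per element; because
    // the lambda captures `sum`, the compiler moves it into a closure object.
    public static int WithForEachMethod(List<int> list)
    {
        int sum = 0;
        list.ForEach(x => sum += x);
        return sum;
    }

    // foreach statement: over a List<T> this uses the struct
    // List<T>.Enumerator directly, with no delegate calls.
    public static int WithForeachStatement(List<int> list)
    {
        int sum = 0;
        foreach (int x in list)
        {
            sum += x;
        }
        return sum;
    }

    // Linq: Enumerable.Sum.
    public static int WithLinq(List<int> list) => list.Sum();
}
```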

------
cr0sh
I'm not a C# developer, but this kind of thing seems to permeate almost all
languages in one form or another.

Maybe it is just a style and readability thing; or maybe (as suggested
elsewhere as well) it is meant to be reused elsewhere in the system, so it is
cached in a variable for later use.

Or, it's possible that at one time - maybe in the early days of .NET -
doing it this way was more optimized, and the habit stuck with developers
(perhaps they all read the same article in the knowledge base about it?). If
that's the case, it's a bit of "premature optimization", but one that doesn't
apparently harm anything.

What I do wonder is whether certain other changes could affect the speed.

At least it might be interesting to see in these trivial cases; I admit that
in more complex loops it might not be advisable.

But - for instance, what if rather than iterating thru the array from the 0th
element to the length of the array, you instead started from the last element
and iterated backwards, until you hit zero? That way, you wouldn't be checking
the length of the array, but rather for zero?

The code for such a test might look like:

    
    
        public int WithoutVariable() {
            int sum = 0;
            for (int i = array.Length - 1; i > -1; i--) {
                sum += array[i];
            }
            return sum;
        }
    

I'm not sure that a "with variable" version would make much difference (or
sense), but here it is for completeness' sake:

    
    
        public int WithVariable() {
            int sum = 0;
            int length = array.Length - 1;
            for (int i = length; i > -1; i--) {
                sum += array[i];
            }
            return sum;
        }
    

Again - I'm not a C# developer - maybe my code is wrong above, but hopefully
it gets the idea across.

Would this work better? Would it be faster? What would the JIT compiler
create? Maybe it wouldn't be any faster or better than the ForEach examples?

I honestly don't know - but if anybody wants to give it a shot, I'd be curious
as to the results...

EDIT: I noticed that I said "checking for zero" - but I modified my code to
check for -1 as the boundary; I suppose the check in the loops could be
modified to be "i == 0;" instead. I'm not sure whether "i >= 0;", "i == 0;"
or "i > -1;" is faster - another thing to check, I suppose...
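A `for` loop keeps running while its condition is true, so a countdown that stops at zero needs a condition such as `i >= 0` or `i > -1`. A condition like `i == 0` would end the loop before the first iteration whenever the array has more than one element. This is a minimal sketch of two correct reverse loops, assuming the array is passed as a parameter; the names are hypothetical and no claim is made about which is faster.

```csharp
// Hypothetical helper class, for illustration only.
public static class ReverseLoops
{
    // Counts down from the last index while i >= 0, i.e. the loop
    // condition is a comparison against zero.
    public static int CountDown(int[] array)
    {
        int sum = 0;
        for (int i = array.Length - 1; i >= 0; i--)
        {
            sum += array[i];
        }
        return sum;
    }

    // C-style variant: the condition tests i against zero and then
    // decrements it, so the body sees indices Length-1 down to 0 and the
    // loop stops once the value tested is 0.
    public static int PostDecrement(int[] array)
    {
        int sum = 0;
        for (int i = array.Length; i-- > 0; )
        {
            sum += array[i];
        }
        return sum;
    }
}
```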

~~~
duncanawoods
I wouldn't naturally think about summing lists in reverse so it becomes more
cognitive effort to understand the code. I've seen plenty of termination
condition bugs on reverse iterations that might back that up.

A related thought is how modern code clean-up tools are doing things like
reducing if-nesting e.g. turning this:

    
    
        if (open) 
        { 
            stuff();
            close();
        }
    

into this, with early returns:

    
    
        if (!open) return;
    
        stuff();
    
        close();
    

My feeling is that the first more naturally represents the idea I have of the
behaviour and the second, like your reverse iteration, is an encoded version
of that idea. I feel I have to make an extra cognitive step to decode and
reassemble it to create an idea of the behaviour.

I suspect the further you stray from the natural idea, the harder the code is
to read and validate by eye, and the more likely errors are to creep in. I
don't know how subjective this is. I generally don't have a strong feeling
about early returns; it's just that I have been noticing the slightly greater
cognitive effort they are causing me compared to the logical chunking that
nested ifs provide.

~~~
whoisthemachine
I personally find early returns to be easier to understand (I think of them as
exit conditions) than nested if statements but to each their own.

~~~
duncanawoods
There are definitely cases like pre-conditions and validation that are natural
early exits. There are other situations where it would sound really off and
confusing to phrase instructions for a human that way. Not an easy task for an
automatic code-clean-up tool to discern!

------
laurent123456
Saving the array length to a variable is one of those things that
inexperienced programmers love to do, thinking it will optimise something.

~~~
vips7L
It honestly depends on the compiler.

~~~
stult
It's a great default assumption when you don't know all the quirks of the
specific language or compiler, because either it will help you or at least
won't hurt you.

~~~
laurent123456
It might hurt if you add this extra variable to every loop and at some point
the array length changes within the loop. It also makes the code more verbose.
This optimisation should be done like all optimisations: first you benchmark
and then see if it makes sense to make this change. Most of the time there's
no point doing so.
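An array's Length cannot change once the array is created, but a `List<T>`'s Count can, and that is where a cached length goes stale. This is a minimal sketch with hypothetical names that removes even numbers by index: re-reading `Count` works, while caching it up front runs past the end of the shrunken list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
public static class CachedCountHazard
{
    // Removes every even number, re-reading list.Count on each iteration.
    // Stepping i back after a removal revisits the element that shifted
    // into slot i.
    public static List<int> RemoveEvensRereadingCount(List<int> list)
    {
        for (int i = 0; i < list.Count; i++)
        {
            if (list[i] % 2 == 0)
            {
                list.RemoveAt(i);
                i--;
            }
        }
        return list;
    }

    // Same loop with Count cached before the loop. Once anything has been
    // removed, i eventually reaches an index past the end of the shorter
    // list and the indexer throws ArgumentOutOfRangeException.
    public static bool CachedCountThrows(List<int> list)
    {
        int count = list.Count;
        try
        {
            for (int i = 0; i < count; i++)
            {
                if (list[i] % 2 == 0)
                {
                    list.RemoveAt(i);
                    i--;
                }
            }
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
        return false;
    }
}
```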

------
patsplat
Use Linq and forget about array length.

It's an interesting analysis and all, but why bother when the language has
such an elegant collections API?

~~~
gameswithgo
Linq is orders of magnitude slower and allocates. It is not always an
appropriate choice.

~~~
roetlich
Do you have data for that?

Whenever I tried to find actual data for this Linq seemed to be slightly
slower, but not too much.

Examples: [https://codereview.stackexchange.com/questions/14197/is-the-...](https://codereview.stackexchange.com/questions/14197/is-the-linq-version-faster-than-the-foreach-one)
[https://wheresmykeyboard.com/2015/06/linq-lambda-loop-perfor...](https://wheresmykeyboard.com/2015/06/linq-lambda-loop-performance-test/)

~~~
gameswithgo
Yes. The total slowdown you get will depend on how much of the work being done
is the actual iteration vs the work inside the loop. If you are doing a long
running thing each time you go through the loop, then the overhead of the linq
iteration is not so bad.

But something like:

    
    
        var sum = values.Sum(x => x * x);
    

will take ~10x longer than a for or foreach loop equivalent.

(260ms vs 36ms on 32 million float32s on my machine, measured by
BenchmarkDotNet)

plus a small allocation
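For comparison, this is roughly what a for or foreach loop equivalent of `values.Sum(x => x * x)` looks like, assuming `values` is a `float[]` as in the float32 measurement above. It is a minimal sketch with hypothetical names; it shows the shapes being compared, not the timings.

```csharp
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class SumOfSquares
{
    // Linq: Enumerable.Sum with a selector, one delegate call per element.
    public static float WithLinq(float[] values) => values.Sum(x => x * x);

    // Plain indexed for loop over the array.
    public static float WithFor(float[] values)
    {
        float sum = 0f;
        for (int i = 0; i < values.Length; i++)
        {
            sum += values[i] * values[i];
        }
        return sum;
    }

    // foreach over an array; the C# compiler lowers this to an indexed loop.
    public static float WithForeach(float[] values)
    {
        float sum = 0f;
        foreach (float x in values)
        {
            sum += x * x;
        }
        return sum;
    }
}
```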

~~~
roetlich
Thanks for your reply. I'm fairly new to C#, and was hoping that linq would
make my life easier. Oh well.

I made some benchmarks myself:
[https://pastebin.com/5vQNpbPC](https://pastebin.com/5vQNpbPC)

And esp. the Sum is a lot slower in linq. Not quite an order of magnitude, but
pretty bad.

Even worse:

    
    
            float sum = 0;
            arr.Select(x => (sum += x * x)).ToList();
            return sum;
    

This is somehow still a lot faster than the normal linq Sum. What does .Sum()
do to be this slow?

Edit: I just noticed you also wrote this blog post on the topic:
[https://jackmott.github.io/programming/2016/07/22/making-obv...](https://jackmott.github.io/programming/2016/07/22/making-obvious-fast.html)
I should have read that earlier!

~~~
louthy
You're doing more work though. You're converting to a List<T> (in order to
coax the lazy enumerable to enumerate). You should use Aggregate for a more
(generalised) way to reduce/fold collections into a value:

    
    
        var sum = arr.Aggregate(0f, (t, x) => t + (x * x));
    

On the whole though it's better to use Linq until it's not. It's more
declarative which will lead to more reliable code. Optimise when you find
performance issues, don't write bad code just because you may gain a few
nanoseconds here and there.

------
coinerone
At university, every time I put the array length into a local variable for
a loop, I got 2 points deducted on my homework.

~~~
Narishma
Why?

~~~
Gibbon1
Probably because his teacher's ratio of arrogance to life experience was too high.

------
jay_kyburz
Ahh, nice to see some C# without a new line before the braces.

~~~
thrower123
That is probably my least favorite thing about C# coding. The preeminent style
wastes so much vertical space. K&R braces for me.

------
artofcode
archive.org has a mirror in case the site is still hugged to death:
[https://web.archive.org/web/20190606120130/https://habr.com/...](https://web.archive.org/web/20190606120130/https://habr.com/en/post/454582/)

------
germanlee
If I remember correctly, the runtime keeps the size of the array in the object
header, along with the sync block, etc. If you have VS, you can view the object
in memory to see the sync block value, the array size value, etc.

------
suff
It is very possible the reason is not speed, but readability. If you simply
named it 'length', then sure, there is no point. If it is given a better, more
descriptive name, and then gets used in an equation elsewhere in the code,
then it may be very useful because it is easier to read.

~~~
JustSomeNobody
something like:

    
    
        var widgetLength = widget.Length;
    
?

~~~
SketchySeaBeast
I think that's confusing - it could be referencing a different object named
"widgetLength". How about widgetDotLength to be more clear?

~~~
JustSomeNobody
You're right. Let's go with yours. When can you push that?

~~~
SketchySeaBeast
Oh that'll be at least next week, I'm currently busy changing all the "int"
declarations to "wholeNumberDataType" for clarity.

------
potiuper
TL;DR: Yes & use foreach rather than for to skip array bound checks.

~~~
yc12340
It is odd that C# needs a foreach loop for bounds check elimination.

Java supports this optimization for a wide range of loop types... since Java
7, I think. Normally I would argue that Java is just ahead of the curve, but
Android also gained support for bounds check elimination in 2014-2015.

Either the article does not tell us the whole truth or the Microsoft JIT is
subpar by modern standards.

~~~
gameswithgo
C# does not need a foreach loop for bounds check elimination.

