
Why is it faster to process a sorted array than an unsorted array? (2012) - tosh
https://stackoverflow.com/questions/11227809/why-is-it-faster-to-process-a-sorted-array-than-an-unsorted-array?m=1
======
taeric
By far the most interesting part of this post is the update with newer
compilers. Intel's compiler, in particular, makes an awesome optimization.

Edit: The update is not new, either. Just the part that I found interesting.
Apologies for any confusion.

~~~
winstonewert
The update with the newer compilers was done back in 2012. There have only
been minor changes since then. (Not to say that it isn't interesting, but so
that people don't go to the post expecting new content and get disappointed)

~~~
taeric
I updated to make this clearer at the top level. Thanks!

------
corey_moncure
The question I have, which I do not see answered after a quick glance at the
top responses, is: What is the total time of sorting + good branch prediction
loop, versus not sorting and bad branch prediction loop? Does the good branch
prediction save more time than it cost to sort the array, on modern
processors? What about old processors with shorter pipelines?
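
One way to answer this empirically is to time both variants on your own machine. Below is a minimal sketch, assuming the same setup as the linked question (values in 0..255, threshold 128). The names `sum_large`, `time_ms`, `Timings` and `compare` are made up for illustration, and the numbers will depend heavily on compiler and flags, since some compilers emit branch-free code for this loop and erase the difference.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Sum every element >= 128; this is the branch under discussion.
std::int64_t sum_large(const std::vector<int>& data) {
    std::int64_t sum = 0;
    for (int v : data) {
        if (v >= 128) sum += v;
    }
    return sum;
}

// Hypothetical helper: wall-clock time of f() in milliseconds.
template <typename F>
double time_ms(F&& f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

struct Timings {
    double unsorted_loop_ms;
    double sort_ms;
    double sorted_loop_ms;
    std::int64_t unsorted_sum;
    std::int64_t sorted_sum;
};

// Times the loop on unsorted data, then the sort, then the loop on sorted data.
Timings compare(std::size_t n, int passes) {
    std::mt19937 rng(42);  // fixed seed so runs are repeatable
    std::uniform_int_distribution<int> dist(0, 255);
    std::vector<int> data(n);
    for (int& v : data) v = dist(rng);

    Timings t{};
    t.unsorted_loop_ms = time_ms([&] {
        for (int p = 0; p < passes; ++p) t.unsorted_sum += sum_large(data);
    });
    t.sort_ms = time_ms([&] { std::sort(data.begin(), data.end()); });
    t.sorted_loop_ms = time_ms([&] {
        for (int p = 0; p < passes; ++p) t.sorted_sum += sum_large(data);
    });
    return t;
}
```

Sorting pays for itself only when `sort_ms + sorted_loop_ms` comes out below `unsorted_loop_ms`, so the answer depends on how many passes reuse the sorted data.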

~~~
autokad
I feel like branch prediction is only a small part of the performance boost.
If I do the same in python 2500 times for an array of size 40k, I get
0.00183519964218 vs 0.00163079910278 or ~11% boost.

I imagine the 6x factor probably is because the optimizer unfolds the
array/loop structure.

edit: sorting the array increases the time by about 4x if you include the sort
time, so it's not worth it

~~~
hyperpape
You can't use the Python implementation to get insight into what's happening
with a C++ implementation, because the overhead of operations in Python may
mask a real effect that can be seen in C++.

~~~
autokad
Not directly, but information is information. I am also more skeptical that
branching alone would result in a 6x increase in run time.

We know optimizers unfold loops as well as arrays. It doesn't take a profound
leap in logic that the optimizer may have unfolded everything and realized
certain code was not going to change the state of the program, thus removed
it. Maybe it didn't, but then again maybe it did.

~~~
hyperpape
> information is information

It's information, but not worthwhile information.

> It doesn't take a profound leap in logic that the optimizer may have unfolded
> everything and realized certain code was not going to change the state of
> the program, thus removed it. Maybe it didn't, but then again maybe it did.

Prove it. This article has had a ton of traffic. If you see the thing that
everyone missed, a lot of people will be very impressed with you.

------
bluejekyll
The update at the very end is pretty awesome. Basically saying, some compilers
will optimize this for you now so there's no difference.

It would be interesting to see how Clang/LLVM do...

------
cimi_
I had a surprise with sorting in JavaScript a few days ago, details also on
SO.[0]

TL;DR: Chrome uses quick sort and we managed to hit its worst case by pre-
ordering the input alphabetically on the server side.

[0] [https://stackoverflow.com/questions/46228556/how-is-array-so...](https://stackoverflow.com/questions/46228556/how-is-array-sort-performance-affected-by-the-initial-ordering-of-the-input)
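
To see why pre-sorted input can hurt, here is a minimal sketch of a textbook quicksort that always picks the first element as its pivot, instrumented to count comparisons. It illustrates the general worst case only and is not Chrome's actual sort; `quicksort_count` is a made-up name.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Naive quicksort (first element as pivot, Lomuto-style partition).
// Sorts a[lo, hi) in place and returns the number of element comparisons.
std::size_t quicksort_count(std::vector<int>& a, std::size_t lo, std::size_t hi) {
    if (hi - lo < 2) return 0;
    const int pivot = a[lo];
    std::size_t store = lo + 1;  // a[lo+1, store) holds elements < pivot
    std::size_t comparisons = 0;
    for (std::size_t i = lo + 1; i < hi; ++i) {
        ++comparisons;
        if (a[i] < pivot) {
            std::swap(a[i], a[store]);
            ++store;
        }
    }
    std::swap(a[lo], a[store - 1]);  // move pivot to its final position
    comparisons += quicksort_count(a, lo, store - 1);
    comparisons += quicksort_count(a, store, hi);
    return comparisons;
}
```

On already sorted input of n distinct values every partition is maximally lopsided, so the count is n(n-1)/2; on shuffled input it is typically on the order of n log n.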

------
jnordwick
Needs a "(2012)" at the end of the title. Also I'm not sure when cmov changed
to 1 cycle latency, but it might affect some of the tangential topics here.

------
gene91
It appears to me that it is trivial for a compiler to use a conditional
instruction (instead of a branch) here. As a result, I'm very surprised that
it didn't. Any idea why this is the case?

~~~
mortehu
Correctly predicted conditional branches are faster than conditional
instructions, because they add fewer dependencies. The programmer usually has
better knowledge of whether the branch is going to be 50/50 or somewhat
predictable, and is thus better suited to make that choice.

[http://yarchive.net/comp/linux/cmov.html](http://yarchive.net/comp/linux/cmov.html)

~~~
wyager
While that post is technically correct (conditional moves have high latency
because you can't dispatch the instruction until the conditional variable has
been evaluated), they're still good in many cases because there's _also_ no
immediate dependency on the output of the conditional move. You might end up
with 10 conditional moves in your reservation stations, but that's fine if all
you're doing is summing up the results. You don't actually act on that sum
until the end of your loop, so it's OK if it takes a few cycles after the loop
to flush all those pending conditional moves out of the reservation stations.

~~~
jnordwick
Cmov is 1 cycle latency now. Can't remember when the change occurred but AF's
info should point the way.

~~~
Veedrac
The issue isn't the `cmov` itself, which is extremely forgiving nowadays (4
per cycle!), but the fact the `cmov` introduces a dependency on both inputs
and the condition. A predicted branch introduces a dependency on _only_ the
predicted input: the other input is never calculated and the condition is
checked after-the-fact.

------
minikites
This is a fabulous explanation that I think a novice programmer could readily
grasp.

~~~
ak39
Agreed. StackOverflow could make a lovely book with these kinds of high quality
answers. Stuff's gold.

------
xigency
I'm currently taking an OMS CS course in High Performance Computer
Architecture and recently completed the branch prediction lesson. It is a free
course on Udacity. Here's the source:
[https://www.udacity.com/course/high-performance-computer-arc...](https://www.udacity.com/course/high-performance-computer-architecture--ud007)

Lesson 3 covers pipelines and lesson 4 covers branches. Milos does a great job
of explaining "How It Works" for something that is really a hidden layer under
the CPU.

~~~
bogomipz
Thanks for sharing, this course looks interesting. Cheers.

------
matthberg
This is a recurring duplicate, last posted 3 months ago, then twice again a
year ago. Should be tagged [dupe].

------
ramshorns
Some of the suggested ways to avoid the branch are bit manipulation (which is
not necessarily portable) and the ternary operator (which seems hardly
different from an if-else, though maybe the compiler usually treats it
differently). It seems like another way would be sum += data[c] * (data[c] >=
128); which adds 0 when the condition is false.
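
The multiplication idea can be written out next to the plain branch for comparison. This is a minimal sketch, assuming elements are small non-negative ints as in the linked question. The function names are made up, and whether any of these compiles to branch-free machine code depends on the compiler and flags.

```cpp
#include <cstdint>
#include <vector>

// Baseline: conditional add.
std::int64_t sum_branch(const std::vector<int>& data) {
    std::int64_t sum = 0;
    for (int v : data) {
        if (v >= 128) sum += v;
    }
    return sum;
}

// The comparison yields a bool that converts to 0 or 1,
// so elements below 128 contribute 0 to the sum.
std::int64_t sum_multiply(const std::vector<int>& data) {
    std::int64_t sum = 0;
    for (int v : data) {
        sum += v * (v >= 128);
    }
    return sum;
}

// Mask variant: negating 0/1 gives 0 or -1 (all bits set), then AND.
// Assumes two's complement, which C++20 guarantees.
std::int64_t sum_mask(const std::vector<int>& data) {
    std::int64_t sum = 0;
    for (int v : data) {
        const int mask = -static_cast<int>(v >= 128);
        sum += v & mask;
    }
    return sum;
}
```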

~~~
mark-r
Many processors have an instruction that will select between two operands
depending on a condition. That can be faster than a branch. Although a
sufficiently intelligent compiler would be able to determine that the
instruction could be used in an if/else situation.
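
In source code, the "select between two operands" shape usually looks like a conditional expression where both candidate values are cheap to compute. Here is a minimal sketch with a made-up function name. Nothing in it forces the compiler to emit a select instruction such as x86 `cmov`, and it is free to compile this and the if/else form to the same code.

```cpp
#include <cstdint>
#include <vector>

// Select form: always perform the add, but first choose between v and 0.
// An if/else version would instead skip the add when the test fails.
std::int64_t sum_select(const std::vector<int>& data) {
    std::int64_t sum = 0;
    for (int v : data) {
        const int addend = (v >= 128) ? v : 0;
        sum += addend;
    }
    return sum;
}
```

Looking at the generated assembly (for example the `-S` output of GCC or Clang) is the only reliable way to tell which form the compiler actually chose.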

------
QuotesDante
My immediate reaction would be to think of the entropy of the information in
the array. Sorted sounds like energy was spent to put more information into
that array. Intuitively, the lower entropy of a sorted array should help us
predict and make better decisions along the way of searching for things in the
array. Completely unsorted arrays give us less information to work with: the
lack of order certainly can't help us make decisions!

------
localhost
Read a little further down to WiSaGaN's answer. There's an excellent
discussion of the optimizations afforded by the cmovge instruction that is
typically generated for the C ternary operator and how its implementation in
the CPU pipeline allows it to avoid the branch misprediction penalty.

He references a textbook as well "Computer Systems: A Programmer's Perspective
(2nd Edition)". Just bought a copy.

~~~
xigency
See this also for a case against CMOV predicates:
[http://yarchive.net/comp/linux/cmov.html](http://yarchive.net/comp/linux/cmov.html)

------
samfisher83
After it's sorted, you could add a break condition for when the array element
is less than 128; that way it would be faster.

~~~
Kesty
This is just a simple example for this kind of situation.

Of course, you could make it faster by removing everything under 128 from the
array before starting the loop, but it's not really the point here.
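
For what it's worth, once the array is sorted the comparison can leave the loop entirely: find the first element that is at least 128 and sum from there. This is a minimal sketch using `std::lower_bound`, assuming ascending order; `sum_sorted_tail` is a made-up name.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Precondition: data is sorted in ascending order.
std::int64_t sum_sorted_tail(const std::vector<int>& data) {
    // Binary search for the first element that is >= 128.
    auto first_large = std::lower_bound(data.begin(), data.end(), 128);
    // Everything from there to the end qualifies, so no per-element test.
    return std::accumulate(first_large, data.end(), std::int64_t{0});
}
```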

------
wiz21c
When I look at the code, I'm under the impression that it will always generate
the same output because the random seed is fixed (besides the timer). Given
current tech, could a compiler see that and just reduce the computation to a
simple value?

~~~
MichaelBurge
Sort of. If rand() wasn't built into the standard, it would be impossible
because you could override it at link-time with another function that connects
to your database or something. And the linker runs after the compiler.

But because it's built-in, the compiler might be able to infer more about it,
just like it reduces:

printf("x");

to

putchar('x');

------
wiredfool
(2012)

~~~
gbrown_
Does the time this was posted really have any relevant bearing on the content
here?

~~~
fenomas
Among other things, for people to whom the title sounds familiar it gives them
a hint that it's the same article they remember, not something brand new.

~~~
zeotroph
I was quite certain that this was the famous post with the railroad junction
picture before I followed the link. This was submitted 3 times already (but
only sort of gained traction once):

[https://news.ycombinator.com/item?id=12490893](https://news.ycombinator.com/item?id=12490893)

[https://news.ycombinator.com/item?id=14459549](https://news.ycombinator.com/item?id=14459549)

[https://news.ycombinator.com/item?id=12272428](https://news.ycombinator.com/item?id=12272428)

I am not against resubmissions, but if it is an older post the year should be
in the title.

------
the_evacuator
Technically the branch prediction benefit comes from partitioning the array
around the critical value 128. Sorting is not necessary. Partitioning may be
faster.
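
A minimal sketch of that idea with `std::partition`, which runs in linear time and only groups elements by the predicate instead of fully ordering them. The function name is made up, and whether this beats sorting in practice would need measuring.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Reorders data so all elements < 128 precede all elements >= 128,
// then sums with the same branchy loop as the original question.
std::int64_t partition_then_sum(std::vector<int>& data) {
    std::partition(data.begin(), data.end(), [](int v) { return v < 128; });

    std::int64_t sum = 0;
    for (int v : data) {
        if (v >= 128) sum += v;  // outcome flips only once across the array
    }
    return sum;
}
```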

------
oso2k
It's simple.

Cache, baby! Cache!

To fill in the blanks: Computers, CPU, RAM, and other Storage devices are well
optimized for sequential reading and writing.

------
tapatio
Imagine that as an interview question!

~~~
logicallee
Yep, the best interview questions are ones you yourself would fail before
2017.09.15 but would pass after 2017.09.15 based on having read a HN article.

This interview question is great for three reasons:

1\. It identifies that someone is you.

2\. It separates the bad-coder you (before 2017.09.15) from the good-coder you
(after 2017.09.15). This means that it is immune from generating a false
positive on a poor candidate due to time travel.

3\. It identifies cultural fit: the person reads the same news articles you
do. If you waste time reading random hacker news articles, you're going to
want to hire people who do the same. _Especially_ ones who were around and not
too busy on exactly 2017.09.15! It easily weeds out people who were on
vacation on that date, for example.

What I especially like about this is that it has nothing to do with anyone's
code. (After all, anyone who works at a level that low can answer it very
easily without having read this stack overflow question, so it's a strictly
orthogonal puzzle: it's only hard for people who don't need it!)

You should go ahead and add this to your list of interview questions! In fact,
why not make it the only one?

(It also avoids the fuss of having to come up with questions in any way
related to the work that a candidate will be doing, which, in case the above
sarcastic comment wasn't clear, is what you should actually be doing.)

~~~
mark-r
OK, we get your point. But consider that there are jobs where this stuff is
actually important, and it's useful to have working knowledge. If you were
actually going to ask this in an interview, you would obviously want to ask
something different but along the same lines, just in case the candidate was
familiar with this post. But that goes for many interview questions.

~~~
logicallee
For the jobs where this is actually important, this isn't a good interview
question - it's not subtle/deep enough. (There are many deeper questions for
those jobs - this one is a basic intro level question for a job such as that
one.) The reason I enumerated so many ways this is wrong isn't to be funnier:
it's because I really want people to stop doing this.

------
known
Compare with Hash

------
typon
Excellent answers

------
LucMomal
Imagine if words in your dictionary weren't sorted and you will have your
answer.

~~~
icebraining
Why would you care, if you were reading them sequentially, like the code here
is doing?

