
Branch prediction: fundamentals every programmer need not know - mycpuorg
http://www.mycpu.org/branch-prediction-basics/
======
vowelless
Very shallow. I prefer the popular Stack Overflow answer by Mysticial:

[https://stackoverflow.com/questions/11227809/why-is-processi...](https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-processing-an-unsorted-array)

~~~
lostmsu
I love the explanation, but why the author suggested a complicated bit trick
instead of just doing sum += arr[i] > 128 ? arr[i] : 0 is beyond me.

~~~
rabryan35
There's still a conditional branch in that statement.

~~~
lostmsu
That depends on the compiler, and I'd expect most to optimize this into a cmov.

------
CalChris
"This results in a loss of a single cycle at the time of instruction fetch."
Maybe on that paper CPU, but branch mispredicts on Skylake cost 16.5 cycles if
there's a μop cache hit and 19-20 cycles if there isn't.

[https://www.7-cpu.com/cpu/Skylake.html](https://www.7-cpu.com/cpu/Skylake.html)

That said, I didn't know about using BPM to access the PMC performance
registers.

------
stygiansonic
Related: Dan Luu's article on branch prediction is pretty good:
[https://danluu.com/branch-prediction/](https://danluu.com/branch-prediction/)

------
azinman2
Feels very shallow. I was surprised to reach the end of the article so
quickly.

~~~
ncmncm
Instead of complaining, post a follow-up. The author committed time to 1000
words on the topic. Without such a limit the article probably would not have
been written.

------
jbverschoor
What is the power consumption hit with prediction?

~~~
chrisseaton
You'd need two identical processors but one with branch prediction turned off
to find that out.

I don't think we have good models for power consumption to find out in a
simulation, either.

~~~
gchadwick
> I don't think we have good models for power consumption to find out in a
> simulation, either.

There are excellent models available to predict power consumption, but to use
them you'll need access to the processor RTL (the design of the processor,
typically written in a language such as Verilog or VHDL) and/or the outputs of
the implementation flow (like the netlist, which is the description of the
actual logic gates and their connectivity).

Such things are normally not available outside the company building the
processor and potentially their customers.

