
Estimating branch probability using Intel LBR feature - ot
https://easyperf.net/blog/2019/05/06/Estimating-branch-probability
======
AstralStorm
The thing is, the CPU's built-in branch prediction is already so good that
manual branch optimisations amount to code cache layout optimisations, which
are themselves becoming less important as caches keep growing.

The more important optimisations concern the cache layout of larger data
structures, and there branch tracing is of little use.

That said, this is very useful for tracing thread execution in particular.
(Inter-thread cache flushes are a thing.)

~~~
reitzensteinm
Are caches actually getting bigger? Per core, L1 through L3 have stayed
essentially constant from Nehalem through Kaby Lake. I have heard they're
doubling L1/L2 in an upcoming chip, but it's been ten years!

~~~
ivl
Skylake-X has 1024 KiB of L2 per core, a 4x increase over the 256 KiB of L2
per core that had been standard for as long as they've been making the Core
series.

~~~
tntn
But Skylake also has much less L3 per core, so the increased L2 is less like
cache growth and more like a rebalancing.

The change to a non-inclusive L3 is more likely to contribute to greater
effective capacity.

~~~
ivl
In the case I was thinking of, Skylake-X (the HEDT line), that wasn't
entirely true. I was mostly comparing Broadwell-E with Skylake-X, where the
difference is huge:

Core i7-6950X ($1,700 at launch, 05-2016): 10 × 256 KiB of L2, with 25 MiB
of L3.

Core i9-7960X ($1,700 at launch, 09-2017): 16 × 1024 KiB of L2, with 22 MiB
of L3.

~~~
tntn
i7-6950X: 256 KiB L2/core, 2.5 MiB L3/core

i9-7960X: 1024 KiB L2/core, 1.375 MiB L3/core

Seems to be exactly what I described? L2/core went up by 768 KiB, L3/core went
down by 1.125 MiB.

~~~
ivl
L3 went down per core, but total L3 only fell by 3 MiB. Meanwhile, total L2
shot up from 2.5 MiB to 16 MiB in a single generation, for two processors that
target the same segment at the same price.

Per core, you're correct, but I was looking at the total cache available,
since that had stayed static, usually gaining just 256 KiB of L2 per extra
core with little change in L3.
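Both readings of the same spec sheets can be reconciled with a little arithmetic. Below is a minimal Python sketch, assuming the core counts and cache sizes quoted above (10 cores with 25 MiB of L3, and 16 cores with 22 MiB of L3) and treating 1 MiB as 1024 KiB. The `chips` table and the `summarize` helper are hypothetical, just for illustration:

```python
# Cache figures as quoted in this thread; check them against Intel's spec
# pages before relying on them.
KIB_PER_MIB = 1024

chips = {
    "i7-6950X": {"cores": 10, "l2_per_core_kib": 256, "l3_total_mib": 25},
    "i9-7960X": {"cores": 16, "l2_per_core_kib": 1024, "l3_total_mib": 22},
}


def summarize(chip):
    """Return per-core and total L2/L3 sizes, all in MiB."""
    cores = chip["cores"]
    l2_total = cores * chip["l2_per_core_kib"] / KIB_PER_MIB
    l3_total = chip["l3_total_mib"]
    return {
        "l2_per_core": chip["l2_per_core_kib"] / KIB_PER_MIB,
        "l3_per_core": l3_total / cores,
        "l2_total": l2_total,
        "l3_total": l3_total,
        # Naive sum: ignores whether L3 duplicates L2 contents (inclusivity).
        "l2_plus_l3_total": l2_total + l3_total,
    }


if __name__ == "__main__":
    for name, chip in chips.items():
        s = summarize(chip)
        print(
            f"{name}: L2/core {s['l2_per_core']:.3f} MiB, "
            f"L3/core {s['l3_per_core']:.3f} MiB, "
            f"L2 total {s['l2_total']:.1f} MiB, "
            f"L3 total {s['l3_total']:.1f} MiB, "
            f"L2+L3 {s['l2_plus_l3_total']:.1f} MiB"
        )
```

Per core, L3 drops from 2.5 MiB to 1.375 MiB while L2 gains 768 KiB. In total, L3 falls by only 3 MiB while L2 grows from 2.5 MiB to 16 MiB, so both sides of the argument are arithmetically right. The naive L2+L3 sum overstates Broadwell-E's usable capacity somewhat, because its L3 is inclusive and holds copies of L2 lines. Skylake-X's non-inclusive L3 does not duplicate them in the same way.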

------
titzer
Nice work, and great writeup!

