
Binary search is a pathological case for caches - kinetik
http://www.pvk.ca/Blog/2012/07/30/binary-search-is-a-pathological-case-for-caches/
======
brooksbp
Here's a fun and relevant paper:

A Comparison of Cache Aware and Cache Oblivious Static Search Trees Using
Program Instrumentation

<http://www.cs.amherst.edu/~ccm/cs34/papers/ladnerbst.pdf>

------
koverstreet
...This is news? You've got a tight loop with a branch, and depending on which
way the branch goes, you're going to touch completely different locations in
memory.

The solution (at least for some problems) is just to lay your data out in
memory better - the optimal layout is a binary search tree stored implicitly
in an array, the way heaps are implemented.

That's what I did for bcache:
http://evilpiepirate.org/git/linux-bcache.git/tree/drivers/md/bcache/bset.h#n56
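
Roughly, the layout looks like this (a simplified sketch, not the actual
bcache code - the names here are made up):

    /* Implicit ("Eytzinger"/heap-order) search tree: root at index 1,
     * children of node i at 2*i and 2*i+1.  The top levels are packed
     * together in memory, so they share cache lines and stay hot
     * across searches. */
    #include <stddef.h>

    static size_t eytzinger_find(const int *tree, size_t n, int key)
    {
        size_t i = 1;

        while (i <= n) {
            if (tree[i] == key)
                return i;
            i = 2 * i + (tree[i] < key);  /* 2*i left, 2*i+1 right */
        }
        return 0;  /* not found */
    }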

~~~
dmpk2k
Good lord. You didn't read the article _at all_.

I have some really ugly things to say to a new poster who cannot be bothered
to read an article before trying to show off, but I'll refrain. In the future,
read first, think, then comment. You'll be doing everyone else a favor.

~~~
koverstreet
I did read it. And none of that stuff about cache behavior _matters_ when your
data isn't in cache and you're spending 70 ns on every loop iteration waiting
for the next key to be fetched from DRAM.

One of the first things to learn about caches is how dependent modern
processors are on prefetching, and a binary search just completely breaks
prefetching.
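
With the array layout I mentioned above you can claw some of that back: both
possible next nodes are adjacent, so a single prefetch covers them before the
compare resolves. Just a sketch of the idea - not claiming this is what the
article or bcache does:

    static size_t eytzinger_find_prefetch(const int *tree, size_t n,
                                          int key)
    {
        size_t i = 1;

        while (i <= n) {
            /* children 2*i and 2*i+1 are adjacent; start fetching
             * them before the branch direction is known */
            __builtin_prefetch(&tree[2 * i]);
            if (tree[i] == key)
                return i;
            i = 2 * i + (tree[i] < key);
        }
        return 0;
    }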

~~~
dmpk2k
No, you did not. If you had, there is absolutely no way you could have
written:

"optimal is a binary search tree in an array, like the way heaps are
implemented."

I leave it to other people to read the article first, then your comment, then
vote accordingly. For shame.

~~~
koverstreet
If I'm wrong, enlighten me.

Thus far all you've got are ad hominem arguments.

Anyways, I didn't come here to show off, but I did implement a B+ tree that
can do > 1 million random lookups/second on a single core, on a > 100 MB tree.
So believe it or not, I might know what I'm talking about.

~~~
to3m
TL;DR - If your element sizes are powers of two, and your cache is set-
associative, you run the risk of set aliasing. Caches use the lower bits of
the address as the set index, probably because it's really fast.

If you have more aliases than there are ways in each set, items will be
evicted from the cache. If you do it just right, you end up with the worst
case: 100% cache misses. So even something that on the face of it seems
perfectly cacheable (well... if you don't know quite how the cache works) and
has a small working set (note that the test repeats the same search over and
over) can end up suffering 100% cache misses.
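
To make that concrete (illustrative numbers - say a 32 KB, 8-way L1 with
64-byte lines, i.e. 64 sets):

    #define LINE_SIZE 64u
    #define NUM_SETS  64u

    /* the "lower bits of the address as the set index" part */
    static unsigned set_index(unsigned long addr)
    {
        return (addr / LINE_SIZE) % NUM_SETS;
    }

Any two addresses LINE_SIZE * NUM_SETS = 4096 bytes apart map to the same
set. A binary search over power-of-two-sized elements keeps probing at
exactly such strides, so nine or so probes are enough to overflow the 8 ways
and evict one another - 100% misses despite a tiny working set.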

The L2 and L3 caches use physical addresses rather than virtual ones, so you
can't fully control this effect from user mode.

(Fingers crossed for my terminology...)

~~~
koverstreet
Yeah, I get that. What I was getting at is that binary searches in general
are _terrible_ as soon as you're not hitting cache anymore - and for any
problem worth optimizing, that's what happens, if nothing else because your
data set doesn't fit in L2 anymore.

Worrying about stuff not being cached because of associativity is pointless
when the last 5-10 levels aren't going to be in cache anyway because they
don't fit; those last 5-10 cache misses are going to _utterly dominate_ your
search time.

IOW, even if associativity isn't a problem at all, binary searches suck w.r.t.
caches - I'm not at all saying the article is wrong, just that it's not
terribly relevant - if performance matters you need to be doing something
other than a binary search.
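
For example (a toy sketch, not bcache's actual code): pack a whole sorted
cache line of keys into each node, so every miss resolves several comparisons
at once.

    #define KEYS_PER_NODE 16  /* 16 x 4-byte keys = one 64-byte line */

    struct node {
        int keys[KEYS_PER_NODE];  /* sorted */
    } __attribute__((aligned(64)));

    /* One miss now buys log2(17) ~= 4 levels of binary search instead
     * of one; over ~25 million keys that's ~6 potential misses per
     * lookup instead of ~25. */
    static unsigned node_search(const struct node *n, int key)
    {
        unsigned i = 0;

        while (i < KEYS_PER_NODE && n->keys[i] < key)
            i++;
        return i;  /* which child to descend into */
    }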

