
An Introduction to Cache-Oblivious Data Structures - rusbus
https://rcoh.me/posts/cache-oblivious-datastructures/
======
SlySherZ
For everyone interested in this, I would like to recommend the MIT Advanced
Data Structures course, by prof. Erik Demaine (mentioned in the article). The
part of the course about cache-oblivious data structures starts with this
lecture:

[https://www.youtube.com/watch?v=V3omVLzI0WE](https://www.youtube.com/watch?v=V3omVLzI0WE)

~~~
quantum-qaem
I watched the lectures last week, but they were difficult to understand. Can
you recommend any beginner data structures course?

~~~
SlySherZ
As @Cursuviam said, Introduction to Algorithms is a pretty good introduction:

[https://www.youtube.com/playlist?list=PLUl4u3cNGP61Oq3tWYp6V...](https://www.youtube.com/playlist?list=PLUl4u3cNGP61Oq3tWYp6V_F-5jb5L2iHb)

There are two things that make the lectures difficult to follow, IMO:

The first is that you need to be comfortable with the standard terminology and
know the simpler data structures well: the different kinds of trees, linked
lists, graphs and so on, along with the running times of their operations. Go
watch Introduction to Algorithms and you'll be fine :)

The second is that the subject is _really hard_. For most of the course we're
learning about (close to) state-of-the-art data structures, so don't expect to
understand everything in one go.

If you want to understand it in depth, you'll probably have to spend more time
studying each lecture. Try thinking about how you'd solve the problem yourself,
try to understand the proposed solution, go through the video very slowly
making sure you understand the details, and try to implement at least some of
the structures.

I used these lectures as a way to get comfortable with the field and get a
broad overview of the kinds of solutions that usually work, so I still have to
do most of that in-depth studying myself.

------
fulafel
Trivia: why are they called "cache-oblivious" and not "cache-aware"?

The paper that coined the term is here:
[http://cacs.usc.edu/education/cs653/Frigo-CacheOblivious-FOC...](http://cacs.usc.edu/education/cs653/Frigo-CacheOblivious-FOCS99.pdf)

Basically, cache-aware algorithms came first; they assumed particular cache
sizes and other properties of the memory hierarchy. "Cache-oblivious"
algorithms were a refinement that perform well across many cache sizes without
knowing any of them in advance.

All in all, it's silly that the "cache-oblivious" term was the one that
survived, because now "cache-unaware" and "cache-oblivious" algorithms mean
opposite things, contradicting the dictionary definition of oblivious.
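
For a concrete sense of the distinction: a cache-aware transpose tiles the
matrix with a block size tuned to the target machine's cache, while the
cache-oblivious version simply recurses until the subproblem is small enough
to fit in a cache of any size. Here is a minimal Go sketch along the lines of
the transpose example in that paper (the code and names are illustrative, not
taken from it):

```go
package main

import "fmt"

// transpose writes the transpose of the r0..r1 x c0..c1 block of src
// (an n x n matrix in row-major order) into dst. It never needs to know
// the cache size: it splits the longer dimension until the block is
// tiny, at which point the block and its destination fit in cache at
// every level of the hierarchy.
func transpose(dst, src []float64, n, r0, r1, c0, c1 int) {
	if r1-r0 <= 4 && c1-c0 <= 4 { // base case: a small block
		for r := r0; r < r1; r++ {
			for c := c0; c < c1; c++ {
				dst[c*n+r] = src[r*n+c]
			}
		}
		return
	}
	if r1-r0 >= c1-c0 { // split the taller dimension
		mid := (r0 + r1) / 2
		transpose(dst, src, n, r0, mid, c0, c1)
		transpose(dst, src, n, mid, r1, c0, c1)
	} else {
		mid := (c0 + c1) / 2
		transpose(dst, src, n, r0, r1, c0, mid)
		transpose(dst, src, n, r0, r1, mid, c1)
	}
}

func main() {
	const n = 8
	src := make([]float64, n*n)
	dst := make([]float64, n*n)
	for i := range src {
		src[i] = float64(i)
	}
	transpose(dst, src, n, 0, n, 0, n)
	fmt.Println(dst[:n]) // first row of the transpose: [0 8 16 24 32 40 48 56]
}
```

Because the recursion halves the longer dimension each time, every level of
the memory hierarchy eventually sees blocks that fit, with no tuning parameter
anywhere.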

~~~
egnehots
It all becomes clearer if you replace it in your head with

"cache-size-oblivious" algorithms.

~~~
philipov
Nope. Still sounds like "having no awareness of cache sizes and lacking a
strategy for dealing with them."

No matter how you prefix it, 'oblivious' means you are missing something you
are expected to be responsible for. It is a word expressing negative
sentiment.

------
vvanders
Another fun one is radix sort [1]. We used to use it for sorting alpha'd
quads (think particle systems/UI/etc.) from front to back. It easily
annihilates anything else in terms of raw performance for datasets < 10,000
with int/f32 keys, and it's stable(!) for identical values.

This talk [2] from Herb Sutter (starting at 22:30) is one of my favorite talks
and does a great job of explaining why this is so important (and why dense
arrays are still king in many areas).

[1] [http://stereopsis.com/radix.html](http://stereopsis.com/radix.html)

[2]
[https://channel9.msdn.com/Events/Build/2014/2-661](https://channel9.msdn.com/Events/Build/2014/2-661)
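
For the curious, here is a minimal Go sketch of the idea: an LSD radix sort
driven by the float-to-uint bit flip from [1], so float keys sort in numeric
order. The names and structure here are illustrative, not from the linked
page:

```go
package main

import (
	"fmt"
	"math"
)

// floatFlip maps a float32's bits to a uint32 whose unsigned order
// matches the float's numeric order: flip all bits for negatives,
// flip only the sign bit for positives (the trick from [1]).
func floatFlip(f float32) uint32 {
	u := math.Float32bits(f)
	if u&0x80000000 != 0 {
		return ^u
	}
	return u | 0x80000000
}

// radixSort stably sorts float32s via four byte-wide counting-sort
// passes (LSD radix sort) over the flipped keys.
func radixSort(items []float32) {
	n := len(items)
	keys := make([]uint32, n)
	for i, f := range items {
		keys[i] = floatFlip(f)
	}
	tmpItems := make([]float32, n)
	tmpKeys := make([]uint32, n)
	for shift := uint(0); shift < 32; shift += 8 {
		var count [256]int
		for _, k := range keys {
			count[(k>>shift)&0xFF]++
		}
		sum := 0
		for b := 0; b < 256; b++ { // exclusive prefix sum: start offsets
			count[b], sum = sum, sum+count[b]
		}
		for i := 0; i < n; i++ { // stable scatter into the output
			b := (keys[i] >> shift) & 0xFF
			tmpItems[count[b]], tmpKeys[count[b]] = items[i], keys[i]
			count[b]++
		}
		copy(items, tmpItems)
		copy(keys, tmpKeys)
	}
}

func main() {
	depths := []float32{3.2, -1.5, 0, 2.7, -8.1}
	radixSort(depths)
	fmt.Println(depths) // [-8.1 -1.5 0 2.7 3.2]
}
```

Each of the four passes is a stable counting sort on one byte, which is why
equal keys keep their original order.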

------
joe_the_user
The article looks great, but a doubling of speed actually seems kind of
disappointing. (What I'd like to see is something like a "big-O" improvement.)

I have been investigating cache-oblivious data structures for a while. They're
impressive, complex and a bit difficult to fathom: for a cache-oblivious tree,
you have to consider both the relational structure created by pointers and the
layout of all the pointers in memory. (The article gives a nice overview
compared to the academic articles I've looked at.)

The thing is, in most of the large applications I've tried to optimize, you
could squeeze several speed doublings out of fairly easy inner-loop rewrites
as soon as you started profiling the app. After that, you got more marginal
results from optimizing this and that. Consider: if 25% of the time is spent
retrieving data, halving that time cuts total runtime by only 12.5%. Is that
12.5% worth the complexity of implementing a cache-oblivious structure? Even
if your application is a full database, I'm not sure the trade-offs are worth
it.
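
(That's just Amdahl's law: speeding up a fraction p of the work by a factor s
gives an overall speedup of 1 / ((1 - p) + p/s); with p = 0.25 and s = 2 that
is 1 / 0.875, i.e. 12.5% of the total runtime saved.)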

~~~
rusbus
It actually may be an asymptotic improvement, as you suggest: the difference
gets bigger as the tree gets bigger. I just stopped benchmarking at about
300MB of data, where the perf was almost 2x. If you ran it on a 1GB tree it
might be 4x.

~~~
joe_the_user
Sure, maybe. The problem is that doesn't seem enough to necessarily compensate
for the added complexity.

The thing is a self-balancing binary tree is _so_ much faster than the
alternatives for large chunks of data that you really have to use it. Here,
I'm not so sure.

Anyway, thinking about this led me to this discussion about fractal trees and
LSM trees.

[https://www.quora.com/What-are-the-major-differences-between...](https://www.quora.com/What-are-the-major-differences-between-LSM-trees-used-in-BigTable-etc-and-Fractal-Trees-used-in-TokuDB?share=1)

Another question here is how this will play out if high-performance DBs end up
moving to GPUs, where cache levels and scans would be quite different.

~~~
dspillett
_> Sure, maybe. The problem is that doesn't seem enough to necessarily
compensate for the added complexity._

A 50% improvement could be massively significant if processing a huge data-set
on thousands of nodes and it is still taking days/weeks/more of wall-clock
time. Given the choice between waiting longer, buying more computer power, or
using a more complex algorithm, the best of those choices will depend a lot on
the circumstances of the project. If it is a one-off then waiting or paying
for power will likely win out. Of course, in some circumstances paying for more
power won't help: you can only get so much out of each node, and for some
processes factors like interconnect throughput & latency might eat into the
ROI significantly as the number of nodes grows.

In some cases a 10% or smaller boost might be worth the extra complexity. And
if the algorithm can be generalised and packaged as a library, you aren't
actually having to deal with that complexity each time the method is useful.

------
1wu
Neat--I wrote a C++ implementation of a cache-oblivious (implicit) van Emde
Boas binary search tree a [long] while back:

[https://github.com/lwu/veb-tree/blob/master/veboas.cpp](https://github.com/lwu/veb-tree/blob/master/veboas.cpp)

------
elvinyung
Dumb question: is it appropriate to describe fractal trees as an "extreme"
version of a log-structured merge tree? At a glance (i.e. if you squint really
hard), the buffer flushing process in a fractal tree seems kind of similar to
the tablet merging process in LSM trees in that they both batch operations to
amortize secondary memory access costs, except fractal trees generalize it to
all operations.

~~~
AstralStorm
Yes. Additionally, fractal trees have a predictable and compact memory
structure, while LSM trees are more sparse. The average case for queries is
also better for the fractal structure, and the implementation is actually
easier.
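
To make the batching concrete, here is a toy Go sketch of that buffering idea;
the names and structure are illustrative, not from TokuDB's ft-index. Each
internal node queues incoming writes and flushes them down in batches, so many
updates share each (expensive) child access:

```go
package main

import "fmt"

// message is a pending write destined for some leaf.
type message struct {
	key   int
	value string
}

// node is either an internal node with a message buffer or a leaf.
type node struct {
	pivots   []int          // pivots[i] separates children[i] and children[i+1]
	children []*node
	buffer   []message      // writes not yet pushed down
	leaf     map[int]string // non-nil for leaves
}

const maxBuffer = 4 // tiny, to force flushes in the demo

// insert queues a write in this node's buffer; a plain B-tree would
// instead walk to the leaf immediately, paying a disk access per insert.
func (n *node) insert(key int, value string) {
	if n.leaf != nil {
		n.leaf[key] = value
		return
	}
	n.buffer = append(n.buffer, message{key, value})
	if len(n.buffer) >= maxBuffer {
		n.flush()
	}
}

// flush pushes all buffered writes one level down in a single batch.
func (n *node) flush() {
	for _, m := range n.buffer {
		n.child(m.key).insert(m.key, m.value)
	}
	n.buffer = n.buffer[:0]
}

// child picks the subtree whose key range contains key.
func (n *node) child(key int) *node {
	for i, p := range n.pivots {
		if key < p {
			return n.children[i]
		}
	}
	return n.children[len(n.pivots)]
}

func main() {
	leaves := []*node{{leaf: map[int]string{}}, {leaf: map[int]string{}}}
	root := &node{pivots: []int{10}, children: leaves}
	for i, v := range []string{"a", "b", "c", "d", "e"} {
		root.insert(i*5, v) // keys 0, 5, 10, 15, 20
	}
	// The first four inserts were flushed in one batch; the fifth waits.
	fmt.Println(leaves[0].leaf, leaves[1].leaf, root.buffer)
}
```

A real implementation also checks the buffers on the way down during queries,
which is why point reads stay correct even while writes are parked in upper
levels.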

------
djhworld
Really enjoyed this post, thanks.

I'm not sure I really understood how the cache-oblivious solution works in
this case, though; I can't get an intuition for how the code linked on GitHub
relates to the description in the article.

~~~
rusbus
OP here. It is admittedly a little hairy. The place to start is
[https://github.com/rcoh/treelayout/blob/8adb9bb9727bdaee671d...](https://github.com/rcoh/treelayout/blob/8adb9bb9727bdaee671d86e73eb7fbffee82889d/treelayout.go#L71)

The rough idea is that we lay out the top chunk, then the bottom chunks, and
then wire them together.
------
cryptonector
Do fractal tree indexes lend themselves to copy-on-write designs? (B-trees
do.)

EDIT: Also, are fractal tree indexes roughly like COW b-trees?

------
MichaelMoser123
Are fractal tree indexes in the public domain, or are they patented?

~~~
Intermernet
They are patented but there is a patent grant in place.

[https://github.com/Tokutek/ft-index/blob/master/PATENTS](https://github.com/Tokutek/ft-index/blob/master/PATENTS)

Disclaimer: IANAL. Consult proper legal professionals etc.

~~~
fourthark
One thing I've always wondered with patent grants in open source: it seems to
say only that it's okay to use _this library_ , nor necessarily to implement
the patented algorithm yourself.

Probably open sourcing and including a patent grant signal that the owner is
benevolent and isn't likely to sue you for re-implementing their algorithm,
but I don't think that's spelled out anywhere.

~~~
Intermernet
In the linked document they go out of the way to specify that the patent grant
only applies to "THIS IMPLEMENTATION" of the algorithm. I'm not one hundred
percent sure what that means in regards to reimplementing the algorithm.

