
Btrees are the new black - jjh42
http://www.me.net.nz/blog/btrees-are-the-new-black/
======
bdg
Someone should create a library where all your data is stored in a BTree.
Think of it! Huge data sets, small data sets, everything scales in this
system. And even better, they can create a language-agnostic API to a small
server that just stores the btrees for them and gives them back on request. We
could call it a database, but that would require we use indexes correctly for
once in our lives.

~~~
ebiester
And then we can stack every conceivable use case on top of it, and write that
language-agnostic API in a way that makes interfacing with most current
languages so onerous that people can build hundreds of thousands of lines of
code to make it slightly less painful!

Maybe we can make the configuration so convoluted that we can base entire
companies and revenue models around maintaining them.

~~~
coldtea
> _write that language-agnostic API in a way that makes interfacing with most
> current languages so onerous that people can build hundreds of thousands of
> lines of code to make it slightly less painful_

Those would be people who couldn't be bothered to learn a really simple
declarative language, and prefer ad-hoc OO ORMS and other shit like that...

~~~
dkersten
SQL is _not_ a simple language and expecting everyone to lean it properly is
crazy. Simple queries are easy, sure, but as with all things you quickly
outgrow simple things and then you're stuck. Especially if you want good
performance.

~~~
taeric
SQL is arguably at least as easy as any other language you can define around
this idea. Usually the complications come from trying to normalize the crap
out of your data and expecting a single relational model to work for all use
cases.

Oddly, moving this to another domain would not make things any easier in that
regard. If you tried to have a single data structure that stores all of your
data for all of your use cases, expect pain.

------
vinkelhake
Google has an open source implementation of b-trees in C++. You gain
performance and memory efficiency (less overhead per element stored). You lose
the iterator stability that std::map/std::set guarantees. Other than that the
interfaces are pretty much equal.

[https://code.google.com/p/cpp-btree/](https://code.google.com/p/cpp-btree/)

~~~
jjh42
(top article author). I wasn't aware of that, it looks like they come to
similar conclusions "B-trees are widely known as data structures for secondary
storage, because they keep disk seeks to a minimum. For an in-memory data
structure, the same property yields a performance boost by keeping cache-line
misses to a minimum. C++ B-tree containers make better use of the cache by
performing multiple key-comparisons per node when searching the tree. Although
B-tree algorithms are more complex, compared with the Red-Black tree
algorithms, the improvement in cache behavior may account for a significant
speedup in accessing large containers."

~~~
m_eiman
A sidenote: your blog post is from the future - unless we're already in May?
:)

~~~
jjh42
Fixed. I'd like to say it was a subtle implication that it contains wisdom
from the future ... but actually it was just me being boneheaded.

------
joe_the_user
Well,

If the hype is to be believed, Cache-oblivious b-trees are an even better
match for today's hierarchical memory systems.

Unfortunately, I don't know any simple free/open-source implementation of
these.

But here is a "home page" for them:
[http://supertech.csail.mit.edu/cacheObliviousBTree.html](http://supertech.csail.mit.edu/cacheObliviousBTree.html)

~~~
JulienSchmidt
There are even better fits for modern CPU architectures and hierarchical
memory systems (but not on HDDs) than B-Tree based structures, like the
adaptive radix tree: [http://wwwkemper.informatik.tu-
muenchen.de/~leis/papers/ART....](http://wwwkemper.informatik.tu-
muenchen.de/~leis/papers/ART.pdf)

I think B-Trees are mostly important for educational purposes, since this is a
very important general way of organizing data. For real world usage, there are
hundreds of different structures optimized for different use cases.

~~~
justin66
Do you know of any examples of good open source projects using the ART?

------
srean
Since there has been a resurgence of interest in OCaml on HN, some may be
curious to see how a production quality Btree implementation looks like in
OCaml. Algebraic data types, make it particularly convenient for these sorts
of things. For comparison one may contrast it with an implementation in a
language that does not have algebraic types, C++ for example. So glad Rust
chose to have them.

Here is a Btree implementation from the mirage project [1]
[https://github.com/cgreenhalgh/ocaml-
btree/blob/master/lib/b...](https://github.com/cgreenhalgh/ocaml-
btree/blob/master/lib/btree.ml)

EDIT: removed link to binary tree after kerneis pointed it out.

[1] [http://www.openmirage.org/](http://www.openmirage.org/)

~~~
kerneis
> Here is a Btree implementation from the mirage project

And in fact, this implementation does not show a lot because the actual B-tree
work is done by the baardskeerder library:
[https://github.com/Incubaid/baardskeerder](https://github.com/Incubaid/baardskeerder)

It looks very large and complex, it is most certainly possible to write a
simpler, pedagogical implementation.

~~~
srean
Found cpc [https://github.com/kerneis/cpc](https://github.com/kerneis/cpc)
from your comment stream. Looks super interesting. [http://felix-
lang.org/share/src/web/tut/fibres_index.fdoc](http://felix-
lang.org/share/src/web/tut/fibres_index.fdoc) might pique your interest. It
also compiles coroutines portably and efficiently (to a C++ runtime library).
Felix tries to do for C++ what F# does to C# (in case there is any confusion,
I am just a new user)

------
beagle3
The original B-Trees as described by Bayer were "intrusive" (like implemented
here), in the sense that every node contains keys, values, and pointers.

There are variants such as the B+Tree
[http://en.wikipedia.org/wiki/B%2B_tree](http://en.wikipedia.org/wiki/B%2B_tree)
, which stores only keys in nodes and chains blocks - which is more efficient
in range scans and less in general retrieval; And the
[http://en.wikipedia.org/wiki/B*-tree](http://en.wikipedia.org/wiki/B*-tree)
which is more densely packed.

------
cheepin
Interestingly, a red-black tree can be viewed as a BTree [0].

Another case where caching plays a huge role in determining the most efficient
data structure (See vector vs list [1]).

[0]:
[http://en.wikipedia.org/wiki/Red%E2%80%93black_tree#Analogy_...](http://en.wikipedia.org/wiki/Red%E2%80%93black_tree#Analogy_to_B-
trees_of_order_4)

[1]: [http://baptiste-wicht.com/posts/2012/12/cpp-benchmark-
vector...](http://baptiste-wicht.com/posts/2012/12/cpp-benchmark-vector-list-
deque.html)

~~~
nonsens3
Robert Sedgewick gives a very good analogy of a red-black tree to a 2-3 tree
on his Coursera course (Algorithms, Part I) [0] - a bit different to what is
discussed in Wikipedia, but very easy to understand and implement.

[0]:
[https://class.coursera.org/algs4partI-004](https://class.coursera.org/algs4partI-004)

------
arneeiri
This reminds me of The Ubiquitous B-Tree by Douglas Comer. That article was
written in 1978.

[http://people.cs.aau.dk/~simas/aalg06/UbiquitBtree.pdf](http://people.cs.aau.dk/~simas/aalg06/UbiquitBtree.pdf)

------
csch
Timo Bingmann presented a fast and robust implementation here
[https://panthema.net/2007/stx-btree/](https://panthema.net/2007/stx-btree/)

------
thezorg
Speaking of B-trees, I find CouchDB's implementation really interesting: the
backing file is append-only, meaning that modifying the tree implies appending
the modified nodes at the end of the file (for example, if I change a leaf,
than I rewrite it at the end of the file, and rewrite its parent so that it
points to the leaf's new position, and so on until we reach the root node).

Sounds like a waste of space, but it solves a bunch of problems, like being
crash-proof (since we never overwrite anything, it's always possible to just
find the last root) and allowing reads concurrent with a write without any
synchronizing necessary. Plus, it's actually optimized for SSD's due to its
log-like structure!

CouchDB B-Trees :
[http://guide.couchdb.org/draft/btree.html](http://guide.couchdb.org/draft/btree.html)
Log structures in SSDs :
[http://lwn.net/Articles/353411/](http://lwn.net/Articles/353411/)

~~~
hyc_symas
Massive waste of space though. LMDB is also copy-on-write and crash-proof but
doesn't require compaction like CouchDB does.

[http://symas.com/mdb/](http://symas.com/mdb/)

Read the LDAPCon 2011 paper (and slides) to see how it works and why it's
better than CouchDB.

------
ef47d35620c1
In C++, std::set and std::map are typically implemented as RB trees. To
compare the performance of those to a hash backed container, try
std::unordered_set and std::unordered_map.

------
anotherevan
I have fond memories of red/black binary trees. About 15 years ago I wrote a
reusable library for R/B trees in C. I had a series of programs I had to write
that were collating a lot of data in memory into several buckets.

(The nice thing about virtual memory is you can make a really big RAM disk.)

------
briandw
Under what conditions will a B-tree out perform a hash map? Seems like the
hash map was far and away the better choice in his test.

~~~
jjh42
For insertions and lookup it is probably hard to find a situation (for large n
hashes are asymptotically superior).

Trees provide some other nice properties such as maintaining an ordering (so
it's fast to get the smallest argument or interate of the elements in
numerical order) and always perform at the asymptotic time cost (a dynamic
hash table occasionally takes O(n) when the table is resized so it might not
be a good fit in a realtime situation)

------
checker659
What about false sharing?

~~~
jjh42
I only benchmarked and considered the single-threaded case. Multithreading is
a whole new can of worms.

~~~
warmfuzzykitten
Trivially, one could synchronize around the entire tree.

However, if modified nodes are always copied, then the only node that matters
is the root node (since there can never be a partially modified tree
accessible from the root), which boils down to atomic update of a root node
pointer. Should be very efficient.

There is, of course, a performance implication of node copying, but it only
affects add/delete/replace performance - read access runs at full speed - and
it is cache-friendly, as well.

~~~
jjh42
If you want different threads to see the same data structure then I think you
would also need a mechanism to prevent multiple threads modifying the same
node and clobbering each other's changes.

~~~
warmfuzzykitten
Yes, of course. But there is a difference between read, which only requires
synchronization at the root node, and write, which requires synchronization at
the modified leaf and potentially all the way back up the path from leaf to
root.

