

Scalable distributed b-tree - tim_sw
http://highscalability.com/paper-practical-scalable-distributed-b-tree
new distributed systems primitive?
======
lsb
This looks like it'd be promising for having SQL databases over a cluster of
machines; how far along is this research?

~~~
evgen
How does this provide any more promise than similar distributed data
structures? A distributed b-tree will not get around the CAP paradox and the
necessity of client code to understand that it is running across a distributed
service. Other than an efficiency gain, how will this be any different from
wrapping changes to the various internal nodes of a standard b-tree in a lot
of Paxos calls and sharing it across a cluster?

~~~
adatta02
Can you post a link to the CAP paradox?

~~~
neilc
<http://devblog.streamy.com/2009/08/24/cap-theorem/>

[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.20.1...](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.20.1495)

CAP is absolutely _not_ a paradox; it is just a conjecture or a theorem.

As far as CAP and distributed databases go, it is worth noting that a lot of
distributed DBs don't actually need to be partition-tolerant.

~~~
evgen
Not sure why I put paradox instead of conjecture there, but a quick answer to
the "what is it" query that does not involve reading papers is that for a
distributed system at any point in time you can only have two of consistency,
availability, and partition-tolerance. "At any point in time" is a key item
there, as you can change the two characteristics chosen as the DB goes through
its various states or at different layers in the transaction.

OTOH, saying partition-tolerance is unimportant is only partially true if your
db is only going to run on a single LAN. This is often the case (i.e.
basically saying that "distributed" is another word for "cluster"), but if your
db is going to run across a WAN then sacrificing partition-tolerance can have
unpleasant consequences. I do a lot of work in Erlang, and its built-in
distributed db (like the language itself) was designed to run in a phone
switch among SBCs and blades that shared a common backplane. In this case a
partition meant that something extremely bad was happening, and graceful
shutdown/failure in the face of a partition was a good choice. Now that a lot
of us are using these tools out in the wilds of the internet, we are
discovering that sacrificing partition-tolerance has some unpleasant
side-effects too: running mnesia across EC2 instances, for example, is not a
trivial undertaking because you need to deal with inevitable network problems
by shutting down auto-rejoin features of the db and effectively turning off
availability for a short period during the rejoin.
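
To make the trade-off concrete, here is a toy Python sketch of the two choices
a replica has when it loses contact with its peers. This is not how mnesia
works internally; `Replica`, `Mode`, `Unavailable` and `rejoin` are all
made-up names for illustration, and the "partition" is just a flag.

```python
from enum import Enum


class Mode(Enum):
    CP = "consistent"  # refuse requests while partitioned
    AP = "available"   # keep serving while partitioned, reconcile later


class Unavailable(Exception):
    """Raised when a CP replica cannot reach its peers."""


class Replica:
    """Toy single-node replica; every name here is hypothetical."""

    def __init__(self, mode):
        self.mode = mode
        self.data = {}
        self.partitioned = False
        self.pending = []  # writes accepted during a partition (AP only)

    def write(self, key, value):
        if self.partitioned:
            if self.mode is Mode.CP:
                # Give up availability to stay consistent.
                raise Unavailable("cannot reach peers; refusing write")
            # Stay available, but remember the write for reconciliation.
            self.pending.append((key, value))
        self.data[key] = value

    def read(self, key):
        if self.partitioned and self.mode is Mode.CP:
            raise Unavailable("cannot reach peers; refusing read")
        # In AP mode this value may be stale relative to other replicas.
        return self.data.get(key)

    def rejoin(self):
        """Heal the partition and return the writes that need reconciling."""
        self.partitioned = False
        pending, self.pending = self.pending, []
        return pending
```

A CP replica raises `Unavailable` as soon as `partitioned` is set, which is
roughly the "fail gracefully" choice that made sense on a shared backplane. An
AP replica keeps answering and hands you a list of writes to sort out at
rejoin time, which is where the real work is on a flaky network.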

------
uggedal
Tokyo Cabinet and Tyrant have B-tree support.

~~~
brianr
As far as I could tell, Tokyo uses B-trees within a single instance, not
across instances as this article is talking about. Has this changed?

~~~
jbellis
No.

