
Sysbench Benchmark for MongoDB – Performance Update - plasma
http://www.tokutek.com/2013/05/sysbench-benchmark-for-mongodb-v0-1-0-performance-update
======
plasma
Received an email from Tokutek about this new tech, which is available to try now.

Perf Blog: <http://www.tokutek.com/2013/03/sysbench-benchmark-for-mongodb/>

Download Source: <https://github.com/Tokutek/mongo>

Tokutek says to contact them if you want to use it.

Tokutek made TokuDB, a storage engine for MySQL that uses Fractal Tree indexes
(as opposed to B-trees). Read more at
<http://www.tokutek.com/products/tokudb-for-mysql/>

~~~
danielbarla
I'm curious about the fractal tree indexing referred to, but so far all the
information I'm running across seems to be fairly one-sided. Performance is
rarely so one-dimensional that I can accept statements like "When used in a
database, they can be used in any setting where a B-tree is used, with
improved performance" (from <http://en.wikipedia.org/wiki/TokuDB>) at face
value. The algorithm claims to have strong insert performance, as well as
higher compressibility when compared to B-trees, and all the performance
results play heavily on these.

What about read-heavy workloads, etc? Can someone with a bit more insight
provide an objective overview?

~~~
leif
I am an engineer at Tokutek. Just talking about TokuMX, I have only seen
MongoDB beat us on single-threaded, read-only workloads on extremely small
data sets (like under 8K, or one MongoDB B-tree bucket).

The statement is really pretty much true. The best B-tree indexes (InnoDB)
beat Fractal Tree indexes a little bit on in-memory reads, but we're not done
tuning our implementation. On out-of-memory reads though, we usually beat
B-trees because we compress so much better that we simply need to read less
off disk.

Benchmarking is a tricky business though. Workloads can be varied, and you can
probably find some corner cases where InnoDB beats TokuDB. I think we have
some outstanding issues where the MySQL optimizer doesn't plan our queries
properly, and we're still working on that. I'm a little out of touch with the
MySQL side these days though so that could be wrong. But algorithmically,
there are no cases where a B-tree index has a significant advantage over a
Fractal Tree index.

Generally though, the read performance advantage we see is that if your
indexes are Fractal Tree indexes, you can afford to maintain a richer set of
indexes than you could have on InnoDB or MongoDB, and these extra indexes make
your queries orders of magnitude faster. I think this is the most important
(non-obvious) point to understand. I gave a talk about it here:
<http://www.youtube.com/watch?v=q6BnG74FZMQ>

~~~
StefanKarpinski
This data structure sounds as mythical as a unicorn. Are there any peer-
reviewed publications about this data structure or is it a trade secret? The
name seems pretty vacuous/hype-laden since any well-balanced tree is fractal.

~~~
leif
The name is a marketing term, you're right. The structure is somewhere in the
realm of buffer trees, LSM-trees, $B^\epsilon$ trees, and COLAs. It's hard to
give it an academic name precisely because the implementation takes hints from
many places, but we hope to publish more soon. B-trees are on the optimal
tradeoff curve but there are many other points on that curve, and B-trees
_heavily_ favor reads so there's plenty of room to win on the write side. I
have a couple of blog posts about this, actually:
<http://www.tokutek.com/2011/09/write-optimization-myths-comparison-clarifications/>
<http://www.tokutek.com/2011/10/write-optimization-myths-comparison-clarifications-part-2/>
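
For reference, the read/write tradeoff curve mentioned above is usually stated
for $B^\epsilon$ trees in the external-memory model. These are the standard
textbook bounds, not a description of the Fractal Tree implementation; $N$
(number of keys), $B$ (keys per disk block), and $0 < \epsilon \le 1$ (the
tuning parameter) are the conventional symbols and do not come from the thread:

$$
\text{insert (amortized)}: \; O\!\left(\frac{\log_B N}{\epsilon\, B^{1-\epsilon}}\right) \text{ I/Os},
\qquad
\text{point query}: \; O\!\left(\frac{\log_B N}{\epsilon}\right) \text{ I/Os}.
$$

Setting $\epsilon = 1$ gives the B-tree point, $O(\log_B N)$ for both
operations. Setting $\epsilon = 1/2$ keeps queries within a constant factor of
that while making inserts cheaper by a factor of about $\sqrt{B}$.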

------
malkia
I'm working with PostgreSQL right now, and it supports various index types (I'm
sticking with btree for everything I have for now). It made me wonder whether
the folks behind TokuDB could pursue a PostgreSQL variant of their solution.

Never mind - found some good answers here:
<http://postgresql.1045698.n5.nabble.com/Fractal-tree-indexing-td5744987.html>

------
programminggeek
So, TokuMX is web scale?

In all seriousness, this looks super cool, but I'm easily fooled by benchmarks
and I don't have any projects right now that really push MongoDB that hard, so
it doesn't impact me other than giving us a potentially faster Mongo, which is
cool.

~~~
zardosht
I work for Tokutek. In addition to performance improvements and compression,
another cool thing about TokuMX: it's fully transactional. We hope this makes
development of applications simpler.

~~~
kapilvt
Could you clarify this statement? Native MongoDB is consistent and atomic with
respect to single documents. Are you saying that TokuMX does something
additional with respect to transactions (which would also imply client API
changes)?

~~~
leif
TokuMX offers multi-document transactional semantics without application
changes (snapshot reads), as well as protocol support for multi-statement
(read-modify-write style) transactions, within a single shard. We are still
designing how we want to present transactions in a sharded cluster.

------
toolslive
This is to be expected: you have a data structure where half of the inserts can
be done in O(1).
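
One way to read the "half of the inserts" remark is through a toy levelled
structure in the spirit of the COLAs mentioned earlier in the thread. The
sketch below is only an illustration under that assumption; the class name
`LevelledIndex` and its methods are hypothetical and this is not how TokuMX is
implemented. Level k is either empty or holds a sorted run of 2**k keys, so an
insert behaves like incrementing a binary counter.

```python
from bisect import bisect_left
from heapq import merge


class LevelledIndex:
    """Toy levelled index: level k is empty or a sorted run of 2**k keys."""

    def __init__(self):
        self.levels = []             # levels[k] is [] or a sorted list of 2**k keys
        self.merges_per_insert = []  # how many level merges each insert triggered

    def insert(self, key):
        carry = [key]
        merges = 0
        k = 0
        # Like a binary-counter increment: merge through full levels
        # until the first empty level is reached.
        while k < len(self.levels) and self.levels[k]:
            carry = list(merge(self.levels[k], carry))
            self.levels[k] = []
            merges += 1
            k += 1
        if k == len(self.levels):
            self.levels.append([])
        self.levels[k] = carry
        self.merges_per_insert.append(merges)

    def contains(self, key):
        # Binary-search every non-empty level; reads pay for the cheap writes.
        for run in self.levels:
            i = bisect_left(run, key)
            if i < len(run) and run[i] == key:
                return True
        return False


if __name__ == "__main__":
    idx = LevelledIndex()
    for i in range(16):
        idx.insert((i * 7) % 16)
    cheap = sum(1 for m in idx.merges_per_insert if m == 0)
    print(f"{cheap} of {len(idx.merges_per_insert)} inserts needed no merge")
```

Every other insert finds level 0 empty and writes a single key with no merge,
which is the constant-time half. The remaining inserts pay for merges, and a
lookup has to search each non-empty level.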

------
nasalgoat
I've signed up for this beta release - even without ACID transactions on a
cluster-wide basis, the other features alone are worth it, especially the
compression.

------
philsnow
From what I understand, what MongoDB needs is not more speed, but better ACID
guarantees by default.

~~~
zardosht
From Leif earlier, "TokuMX offers multi-document transactional semantics
without application changes (snapshot reads), as well as protocol support for
multi-statement (read-modify-write style) transactions, within a single shard.
We are still designing how we want to present transactions in a sharded
cluster."

~~~
philsnow
All built on the rock-solid webscale foundation of Mongo?

