

TokuDB claims to speed up MySQL by an order of magnitude - lisper
http://tokutek.com/

======
rcoder
I see one graduate thesis, extended into a sparse handful of conference
papers, then somehow converted into the holy grail of database technology: an
index algorithm which can be used as a drop-in replacement for B-trees,
offering a 2-order-of-magnitude increase in performance with _no_ downsides.

What's that, you say? There must be _some_ trade-offs involved? Well, sure,
maybe...but you'll never be able to find out, because you can't look at
anything other than said early research results without waiting for a sales
representative to contact you. No downloads, real-world benchmarks, or
testimonials from users == a pretty dubious set of claims, at least to these
cynical eyes.

I'm not disputing the fact that these guys may have an honest improvement over
the state of the art for certain search tasks. If they do have anything
_close_ to the 50x speedups they claim for general MySQL workloads, though,
why hasn't Google simply written them a check and stuffed their great
discovery away with the rest of their IP portfolio?

------
jteo
Fractal tree described in this paper:
<http://supertech.csail.mit.edu/cacheObliviousBTree.html>

~~~
jerf
How did you make that connection? "Fractal tree" doesn't naturally lead me to
"cache-oblivious tree".

(Looking at the descriptions it is at least plausible, I'm just curious how
you are so certain.)

~~~
jteo
I left out the part where I skimmed the blogs that mentioned Toku's software.
=)

------
smokinn
Percona recently released benchmark results comparing innodb to TokuDB:
[http://www.mysqlperformanceblog.com/2009/04/28/detailed-
revi...](http://www.mysqlperformanceblog.com/2009/04/28/detailed-review-of-
tokutek-storage-engine/)

The take-away is that TokuDB has a lot of promise but on CPU bound workloads a
lot of work still needs to be done.

~~~
jteo
Apparently it's not ACID compliant at this point in time.

~~~
jteo
My first impression: compressed indexes, plus a new index structure that lets
indexes be read from and written to disk with sequential I/O, which makes
indexes cheap. Cheap indexes = many indexes = improved query performance.
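
A toy sketch of why batching index writes can make them cheaper. This is only an illustration of the general buffering idea, not TokuDB's actual algorithm (which isn't public in detail). The block count, key space, and function names are all made up for the example. It counts how many index blocks get written when random inserts are applied one at a time versus flushed in batches:

```python
import random


def count_block_writes(keys, num_blocks, key_space, buffer_size):
    """Count index-block writes when inserts are flushed in batches.

    buffer_size=1 models updating the index in place on every insert.
    All parameters are hypothetical; this is a counting model, not a real index.
    """
    writes = 0
    for start in range(0, len(keys), buffer_size):
        batch = keys[start:start + buffer_size]
        # Map each key to the block that owns its key range. Every distinct
        # block touched by the batch is written once per flush.
        touched = {key * num_blocks // key_space for key in batch}
        writes += len(touched)
    return writes


def demo(seed=0):
    rng = random.Random(seed)
    key_space = 1_000_000
    num_blocks = 100
    keys = [rng.randrange(key_space) for _ in range(10_000)]
    unbuffered = count_block_writes(keys, num_blocks, key_space, buffer_size=1)
    buffered = count_block_writes(keys, num_blocks, key_space, buffer_size=1_000)
    return unbuffered, buffered


if __name__ == "__main__":
    unbuffered, buffered = demo()
    print("block writes, one insert at a time:", unbuffered)
    print("block writes, flushed in batches:  ", buffered)
```

In this model the unbuffered case pays one block write per insert, while the batched case pays at most one write per block per flush, and a flush can visit its blocks in key order, which is where the sequential I/O would come from.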

~~~
smanek
If that's the case, then this would be a much smaller improvement for DBs
that can be stored entirely in-memory or on SSD, right?

I think that most small DBs are mostly in memory, and most large ones will be
on SSD soon, so do you think that this will have a smaller impact on those use
cases?

~~~
jteo
For really large databases, SSDs aren't cost effective (think multi-terabyte).
For the smaller sized databases, most will prefer a hybrid SSD-RAID
solution (like Sun's storage appliances), again for cost.

Toku is targeting the niche of people who 1. Don't give a damn about ACID. 2.
Can't afford hybrid SSD solutions.

Toku's indexing technology would be worthwhile at the higher end also, and
people with multi-terabytes of information at that end usually use a database
as a persistence backend rather than an ACID RDBMS (think Facebook or
your-next-large-startup). Those with a need for ACID are already running Oracle
(which lacks this technology AFAIK). I sense an Oracle-buy-me exit strategy.

------
oliverkofoed
While it's important to squeeze as much performance out of a single box as
possible, it seems to me that the future lies in scaling horizontally by
adding more servers.

CouchDB is an obvious contender here, but as far as I know it still doesn't
partition data across multiple servers (yet; I saw a GSoC project about
it).

ScimoreDB is also interesting (disclaimer: I know them personally); a full
RDBMS with ACID properties and all that jazz that can partition data over a
set of servers. (www.scimore.com if you are interested)

I'm a bit tired of using SQL/Tables/Columns for storage; the perfect solution
for me is CouchDB w/ partitioning where you "just add more servers". It would
be cool to get to the point where we can eliminate memcached and the like
from our infrastructure because the database itself scales just as well.

Just some random thoughts.

~~~
vozoscuro
Does lisper work for TokuDB?

------
chaosmachine
The site claims a 10-50x improvement in performance, which is pretty amazing,
if true. I guess the big question is how much does it cost?

~~~
shib71
Sounds like my reaction. Until their improvements can actually be tested
and used outside the lab, why should I care? This announcement is very, very
close to being vaporware.

~~~
lisper
Kayak is running TokuDB. They're even paying for it :-)

~~~
schemmel
Link please...

~~~
lisper
None yet AFAIK; this is inside information. But there will be a press release.
That's part of the deal.

