

How TokuMX was Born - dataviz
http://www.tokutek.com/2014/02/how-tokumx-was-born/

======
rogerbinns
I'm looking forward to when TokuMX is "ready", and especially hope it gives
MongoDB the kick they deserve.

I did try TokuMX over a month ago and it was a dismal failure. It used
considerably less space (good), imported data quicker (good) but failed at
runtime after a few hours claiming issues with locking. Our code doesn't use
locking and was running exactly what runs against MongoDB just fine.

~~~
zardosht
Roger,

I work at Tokutek (and wrote the post above). I'm sorry you ran into issues
trying out TokuMX. I assure you, we are "ready", as we have users running in
production.

Nevertheless, you ran into problems and that is unfortunate. If you have
details, can you please share them with the tokumx-user google group? We might
be able to help. I suspect the transition to using a transactional system like
TokuMX where entire statements are transactional is resulting in some
"gotchas", but that is just an educated guess.

-Zardosht

~~~
rogerbinns
I mean ready in the sense that pointing code that worked flawlessly against
MongoDB to TokuMX then just works flawlessly too.

I uninstalled Toku and went back to MongoDB so I can't provide any further
testing. (The mongorestore takes days.)

I can tell you what code was running at the time. It reads events sorted by
user id and timestamp, and then discovers session boundaries in that. A new
session object (in a different collection) is written out with all the events
as a subdocument list. (In rarer cases an existing session object is updated.)
This was happening in 8 separate processes all in Python/pymongo. There are no
statements running that affect more than one document, nor any need for
transactions.
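
A minimal sketch of the session-boundary step described above, in plain Python with no database calls. The actual boundary rule is not stated in this thread, so the 30-minute idle gap, the field names `user_id` and `ts`, and the helpers `sessionize` and `make_session` are all illustrative assumptions rather than the real code:

```python
from itertools import groupby
from operator import itemgetter

# Assumed idle gap; the real boundary rule is not given in the thread.
SESSION_GAP_SECONDS = 30 * 60


def make_session(user_id, events):
    """Build one session document with its events as a subdocument list."""
    return {
        "user_id": user_id,
        "start": events[0]["ts"],
        "end": events[-1]["ts"],
        "events": list(events),
    }


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Yield one session document per run of a user's events in which no
    two consecutive events are more than `gap` seconds apart.

    `events` must already be sorted by (user_id, ts), as in the
    description above.
    """
    for user_id, user_events in groupby(events, key=itemgetter("user_id")):
        current = []
        for event in user_events:
            # A pause longer than the gap closes the current session.
            if current and event["ts"] - current[-1]["ts"] > gap:
                yield make_session(user_id, current)
                current = []
            current.append(event)
        if current:
            yield make_session(user_id, current)
```

Each yielded document would then be written to the sessions collection as its own single-document write, which is why no multi-document statement is involved.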

~~~
leif
If you were using upserts I expect you were having problems due to the
optimizer retrying all possible plans (including table scan) periodically.
This is reflected in
https://github.com/Tokutek/mongo/issues/796
and is fixed in 1.4.0. If you'd like to try another evaluation, get in touch
with us and we can help you track down whatever problems you see.

Not all mongodb code will optimally use tokumx without any changes.
Concurrency is hard and mongodb encourages some patterns that are bad for any
concurrent database. For example, count() for an entire collection is not, and
could never be, as cheap in a concurrent database like tokumx as it is in
mongodb.

~~~
rogerbinns
Thanks for the offer, but the mongorestore times (against MongoDB) being over
a week makes this too risky.

The code making changes was insert (mostly) with a few upserts, but the latter
was by _id. My hypothesis as to the cause is that tokumx adds implicit
transactions and then there are some arbitrary restrictions around those
transactions (e.g. how many outstanding at once, timeouts in lock acquisition)
and after a few hours one of those was hit. The error message was something
about being unable to start a transaction.

> Not all mongodb code will optimally use tokumx without any changes

The goal wasn't to be optimal or anything like that. It was initially about
space consumption (where you did _really_ well) and verifying the same client
code ran correctly. We have two setups so one would run toku and one mongodb
and data processing results compared.

~~~
leif
Ok. Well, you said you were waiting for it to be ready, and I think it is.
We'll be here when you get a week free to tinker.

------
jontobs
Very informative! Sounds like great technology!

