
RethinkDB 1.9: Indexing and query runtime performance - coffeemug
http://rethinkdb.com/blog/1.9-release/
======
jonpaul
I've been experimenting a bit with RethinkDB over the last few days. Its
stability is definitely not there yet (to be fair, they've made this clear), so
it has crashed when I needed it most. But here are the finer points:

1) Their team is extremely responsive. In fact, it's the most responsive team
I've dealt with on GitHub. I've posted at least four GitHub issues at various
times of the day and evening, and I've gotten a response in less than 10
minutes. Most of the time, more than one member responded.

2) They are not overselling their product like MongoDB did. They are very
clear about what their product does and doesn't do.

3) Clustering is stupidly simple. You can have one set up within minutes and
have your servers spread geographically throughout the world.

4) The web UI makes administering the databases easy and fun.

After they get the kinks (stability & docs) fixed, this database will
definitely have a bright future.

~~~
coffeemug
slava @ rethink here. Could you link to the crashing bug you're referring to
here? (just so other people have the benefit of learning about it)

EDIT: also, thanks for the kind words!

~~~
jonpaul
[https://github.com/rethinkdb/rethinkdb/issues/1419](https://github.com/rethinkdb/rethinkdb/issues/1419)
:)

~~~
coffeemug
Ah, thanks. These OOM crashes have been lurking for a while. @srh is working on
this now, and it should be gone once and for all after 1.10. Sorry you ran into
this!

~~~
StavrosK
I've asked this before and you replied, but it was lies, all lies! I will ask
again: What have you done with Marc and why is he never on Steam? We lost a
perfectly good DOTA2 player and all-around good guy!

~~~
mglukhovsky
Haha, I mentioned it to Marc the other day; he's out of the office now, but
I'll be sure to ask again for you. :)

~~~
StavrosK
Alright, thanks :)

------
andrewmunsell
It's great to see an amazing product like RethinkDB move so quickly and
improve so much. Since I've been experimenting with it, the performance has
improved dramatically.

RethinkDB + Docker is a great setup for me and has made upgrading between
versions of RethinkDB less painful, and when combined with tinc, clustering is
really easy as well (DigitalOcean, where I'm experimenting with RethinkDB,
doesn't support Amazon VPC-like functionality and only recently added support
for a private network in one region).

~~~
corresation
_It's great to see an amazing product like RethinkDB move so quickly and
improve so much_

The counter-argument is that they can move so quickly and improve so much
because so much is so poorly implemented. While it may seem cynical, I have
formed a habit of ruling out projects that still see order-of-magnitude
performance improvements from such rudimentary work. Once they get to the point
where a yearly release is a couple of syntax improvements and some minor
speedups, it is more likely to be production ready.

~~~
gruseom
You have a point in general, because that is how these things usually go, but
these guys spent like three years working on the foundation for what they're
doing now. This level of rapid improvement may be the visible payoff of a lot
of careful design and engineering.

~~~
andrewflnr
Speaking of which, for the Rethink people on the thread: how much of the old
"key-value store for SSDs" code is still active? Is the document store at all a
layer on top of that?

~~~
SamReidHughes
Most of that code still exists, extensively modified. Here are some things
that don't:

\- Talking to files with libaio -- we now only support use of blocking I/O
calls in a thread pool.

\- Use of block devices instead of files, and code optimized for older SSDs
that spread writes to different parts of the disk.

\- An fsck utility and a utility for extracting data from a corrupted file.
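The first item above (blocking I/O calls in a thread pool) can be illustrated with a small Python analogy. This is a sketch under assumptions, not RethinkDB's C++ code: `blocking_read`, `read_blocks`, and `demo` are hypothetical names, and Python's `asyncio` loop with `run_in_executor` stands in for RethinkDB's own event loop and I/O thread pool. The idea shown is that each blocking read runs on a worker thread while the event loop stays free to serve other work.

```python
import asyncio
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def blocking_read(path: str, offset: int, size: int) -> bytes:
    """Hypothetical helper: an ordinary blocking read that would stall an event loop."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)


async def read_blocks(path: str, block_size: int, count: int) -> list[bytes]:
    loop = asyncio.get_running_loop()
    # A small pool of worker threads absorbs the blocking calls.
    with ThreadPoolExecutor(max_workers=4) as pool:
        jobs = [
            loop.run_in_executor(pool, blocking_read, path, i * block_size, block_size)
            for i in range(count)
        ]
        # The coroutine suspends here; the loop can run other tasks meanwhile.
        # gather() returns results in the order the jobs were submitted.
        return await asyncio.gather(*jobs)


async def demo() -> list[bytes]:
    """Write a tiny temporary file, then read it back in 4-byte blocks."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"aaaabbbbcccc")
        os.close(fd)
        return await read_blocks(path, 4, 3)
    finally:
        os.remove(path)
```

Running `asyncio.run(demo())` reads the three blocks concurrently on worker threads and returns them in order.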

Most other stuff has been rewritten and refactored, though -- RethinkDB was
once implemented entirely using callbacks on an epoll loop. Then we introduced
coroutines (cooperative green threads that sit atop an epoll loop) into the
codebase. Nowadays almost everything is implemented in terms of coroutines --
there aren't many APIs left that take a callback and say it'll get called
sometime later. There's even still a secret memcache interface.
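To make the callback-versus-coroutine distinction concrete, here is a rough Python analogy (assumed, not taken from RethinkDB's codebase) in which `asyncio` plays the role of the event loop; on Linux its default selector is typically epoll-based. `fetch_callback`, `fetch_coro`, and the other names are hypothetical, and the "lookup" is simulated. Both versions perform the same two sequential lookups, but the coroutine version reads as straight-line code instead of nested callbacks.

```python
import asyncio


# Callback style: each step hands the next step to the loop to run "sometime later".
def fetch_callback(loop, key, on_done):
    # Simulated async lookup: the result is delivered on a later loop iteration.
    loop.call_soon(on_done, f"value-for-{key}")


def get_two_callback(loop, k1, k2, on_done):
    def after_first(v1):
        def after_second(v2):
            on_done([v1, v2])

        fetch_callback(loop, k2, after_second)

    fetch_callback(loop, k1, after_first)


def run_callback_version(k1, k2):
    loop = asyncio.new_event_loop()
    try:
        fut = loop.create_future()
        get_two_callback(loop, k1, k2, fut.set_result)
        return loop.run_until_complete(fut)
    finally:
        loop.close()


# Coroutine style: the same control flow, written sequentially.
async def fetch_coro(key):
    await asyncio.sleep(0)  # yield to the event loop, as if waiting on I/O
    return f"value-for-{key}"


async def get_two_coro(k1, k2):
    v1 = await fetch_coro(k1)
    v2 = await fetch_coro(k2)
    return [v1, v2]
```

The nesting in `get_two_callback` grows with every extra step, which is the readability cost the coroutine version avoids.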

The storage engine itself does a bit more now than it used to. Support for
secondary indexes, MVCC, efficiently bringing back out-of-date replicas, and
better on-disk storage are the main ones I can think of at this hour.

------
vosper
I'm currently using a columnar SQL database with a denormalised schema of
about 1000 columns that contains several billion rows. 99% of our queries are
SUM over some of these columns, with a few simple filters: WHERE order_id IN
[...] AND date > xyz

It performs really well (sub-second queries over millions of rows) but it's
licensed by data volume and (more importantly) it's a single-server solution -
at some point we're going to need a distributed database.

Is RethinkDB suitable for analytics workloads like this?

~~~
coffeemug
For your workload a columnar store will always outperform a row-based store
(like RethinkDB). Rethink does work for analytics, but it can never compete
with columnar stores (like Vertica, etc.) on the workloads those systems are
good at.
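A toy Python sketch of why the storage layout matters for vosper's query shape (a SUM over one column, filtered on `order_id` and `date`). The column names, the `sample_rows` data, and the helper functions are hypothetical, and real row or column stores work on disk pages rather than Python lists; the `fields_read` counter is only a rough stand-in for I/O. Both functions return the same total, but the row layout touches every field of every record while the column layout reads only the three columns the query names.

```python
from datetime import date


def sample_rows(n=6, extra_columns=10):
    """Hypothetical denormalised rows: three query columns plus filler columns."""
    return [
        {"order_id": i % 3, "date": date(2013, 1, 1 + i), "amount": i,
         **{f"c{j}": 0 for j in range(extra_columns)}}
        for i in range(n)
    ]


def sum_row_store(rows, column, order_ids, min_date):
    """Row layout: each record is stored together, so a scan reads whole rows."""
    total, fields_read = 0, 0
    for row in rows:
        fields_read += len(row)  # count every field of the record as read
        if row["order_id"] in order_ids and row["date"] > min_date:
            total += row[column]
    return total, fields_read


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def sum_column_store(cols, column, order_ids, min_date):
    """Column layout: only the columns the query names are read."""
    needed = (cols["order_id"], cols["date"], cols[column])
    fields_read = sum(len(c) for c in needed)
    total = sum(v for o, d, v in zip(*needed) if o in order_ids and d > min_date)
    return total, fields_read
```

With about 1000 columns, as in vosper's schema, this toy model would read roughly 1000 values per row in the row layout versus 3 in the column layout, which is the intuition behind coffeemug's answer.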

~~~
coolsunglasses
Your honesty is much appreciated by the rest of us. Don't ever lose that
please.

------
trebor
I have to say, RethinkDB has piqued my interest. And I'm a pretty staunch
MySQL/SQLite-based developer.

~~~
cbsmith
My condolences.

~~~
siddhant
Why?

~~~
cbsmith
Because... MySQL.

------
dkhenry
Awesome, no driver changes.

~~~
coffeemug
For all the driver devs out there, there should be no more driver protocol
changes until we hit an LTS release (unless we find some critical API bugs
that need to be fixed).

~~~
TylerE
With that being the case, are there plans to spend a bit more time on driver
support? At the very least, a high-quality first-party Java driver would go a
long way. Even better if you provide Scala/Clojure/Groovy bindings.

~~~
dabeeeenster
Can anyone comment on the quality of the 3rd party Java driver
([https://github.com/dkhenry/rethinkjava](https://github.com/dkhenry/rethinkjava))?

~~~
dkhenry
I am biased, but it is a very straightforward implementation built
directly on the protobuf interface. It lacks some of the niceties of the
interpreted drivers, but it is as fast as (or in some cases faster than) the
official drivers (as best I can tell).

The author stopped working on it for a while because he was tired of
refactoring his code every time the protocol changed, and figured he would
just add all the missing bits once they finished the protocol.

~~~
tigeba
I was eager to give these a whirl but from my perspective the AGPL licensing
is rather onerous considering that the official drivers are APL. Not sure if
this was intentional or accidental.

~~~
dkhenry
Well, color me surprised. You're right. I just used the same license that
RethinkDB is licensed under; I didn't see that they had the drivers under a
different license.

By tomorrow I will have them updated to APL (luckily I can do this, since no
one else has really supplied a pull request :-( ).

~~~
tigeba
Excellent! I will check them out and send off a pull request if I run into
anything.

------
hnnnnng
I really like the idea behind Rethink and the concept as well. I've also
looked at the documentation, and it seems very simple to use.

Also, I understand that 'benchmarking' databases is extremely situational and
generally inaccurate. However, without reasonable ways to measure an increase
in performance, it's really hard for me to make the decision to switch from
mongo to rethink.

What I'm trying to ask is: does anyone have any information that can help me
decide where, how, when, and why to switch from mongo to rethink? Not just for
me, but also so that I can show others on my team and reach a consensus to
switch.

~~~
coffeemug
Is performance the main motivation for your team? If so, we'll be publishing
some results soon. In the meantime, there are some non-performance related
comparisons with Mongo that you might want to take a look at:

* Technical comparison: [http://rethinkdb.com/docs/comparison-tables/](http://rethinkdb.com/docs/comparison-tables/)

* A slightly more biased comparison: [http://rethinkdb.com/docs/rethinkdb-vs-mongodb/](http://rethinkdb.com/docs/rethinkdb-vs-mongodb/)

~~~
hnnnnng
Performance would give us the best reason to switch, because otherwise we are
quite satisfied with mongo. I have seen the technical comparison. I will keep
an eye out for the performance results. Thanks for letting me know.

