
Our take on RethinkDB vs. MongoDB - coffeemug
http://www.rethinkdb.com/blog/mongodb-biased-comparison/
======
optimiz3
My company uses MongoDB. Our biggest pain points are:

1\. MongoDB has massive storage overhead per field due to the BSON format.
Even if you use single character field names, you're still looking at space
wasted on null terminators. 32bit fixed length int32s also bloat your storage
use. We solve this by serializing our objects as binary blobs into the DB, and
only using extra fields when we need an index.

2\. In Mongo, the entire DB eventually gets paged into memory and relies on
the OS paging system which murders performance. For a humongous DB, not so
much.

3\. #1 and #2 force #3, which is sharding. MongoDB requires deploying a
"config cluster" - 3 additional instances to manage sharding (annoying that
the nodes themselves cannot manage this, and expensive from an ops/cost
standpoint).
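
To put a number on point 1: the BSON spec stores each element as a one-byte
type tag, then the field name as a null-terminated string, then the value. The
sketch below hand-rolls that layout for int32 fields only. It is not a MongoDB
driver call, and `bson_int32_document` is a name made up for illustration.

```python
import struct

def bson_int32_document(fields):
    """Encode a flat {name: int32} mapping using the BSON element layout."""
    body = b""
    for name, value in fields.items():
        body += b"\x10"                         # element type tag: int32
        body += name.encode("utf-8") + b"\x00"  # field name as a null-terminated cstring
        body += struct.pack("<i", value)        # value: 4-byte little-endian int32
    # document = int32 total length + elements + trailing 0x00
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"

doc = bson_int32_document({"a": 1, "b": 2, "c": 3})
payload_bytes = 3 * 4  # bytes of actual integer data
print(len(doc), payload_bytes)
```

Here 12 bytes of integer data occupy 26 bytes on the wire: 3 bytes of tag plus
name per field, and 5 bytes of document framing.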

What I would like to know is:

1\. What is the storage overhead per field of a document in RethinkDB? If it's
greater than 1 byte, I'm wary.

2\. Where is the .Net driver?

~~~
coffeemug
1\. In the coming release we'll be storing documents on disk via protocol
buffers, which, unlike BSON has an extremely low overhead on fields. A few
releases after that we'll be able to do much better via compression of
attribute name information (though this feature isn't specced yet).

2\. No ETA yet, but we're about to publish an updated, better documented, better
architected client-driver to server API spec, so we'll be seeing many more
drivers soon.

~~~
optimiz3
If you use proto-bufs, it means you already have a system for internal auto-
schematization. Why not pack all the fields together and use a bit-vector
header to signify which fields are present and which fields have default
values? I'd LOVE to see a document DB with ~1 bit overhead per field.
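
A rough sketch of the bit-vector idea, assuming a fixed schema known to both
writer and reader. The field names, the int32-only values and the `pack` /
`unpack` helpers are all hypothetical, and nothing here reflects how RethinkDB
actually lays out documents. One bit per field says whether a non-default value
follows; defaults cost nothing beyond that bit.

```python
import struct

# Hypothetical schema: ordered (field name, default value) pairs.
SCHEMA = [("views", 0), ("likes", 0), ("shares", 0), ("flags", 0)]
HEADER_LEN = (len(SCHEMA) + 7) // 8  # one bit per field, rounded up to whole bytes

def pack(record):
    """Bitmap header followed by only the non-default values, in schema order."""
    bitmap = 0
    values = []
    for i, (name, default) in enumerate(SCHEMA):
        value = record.get(name, default)
        if value != default:
            bitmap |= 1 << i
            values.append(value)
    return bitmap.to_bytes(HEADER_LEN, "little") + struct.pack(f"<{len(values)}i", *values)

def unpack(blob):
    """Rebuild the full record, filling in defaults for fields whose bit is clear."""
    bitmap = int.from_bytes(blob[:HEADER_LEN], "little")
    count = bin(bitmap).count("1")
    values = iter(struct.unpack(f"<{count}i", blob[HEADER_LEN:]))
    return {
        name: next(values) if (bitmap >> i) & 1 else default
        for i, (name, default) in enumerate(SCHEMA)
    }

blob = pack({"views": 10, "shares": 3})
print(len(blob), unpack(blob))
```

With four fields the header is a single byte, so this record takes 9 bytes: 1
for the bitmap and 8 for the two stored integers.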

~~~
coffeemug
Yes, that's pretty much what we're going to do. It's a bit hard to guarantee
everything in a fully concurrent, sharded environment so it'll take a bit of
time, but that's basically the plan.

------
desireco42
I started using RethinkDB in one of my projects and am looking for excuses to
use it in more of them. So far things have been great and honestly my
impression is that RethinkDB doesn't get nearly the hype it deserves.

I used Mongo before and it is a fine db and I don't think I would be sad to use
it, however rethink really does so many things better.

Again I just started using it and things are really good, I didn't run into
any obvious limitations or annoyances.

There are several features that I really like, for example: web admin is
really well done, it is easy and obvious how you create a cluster, there are a
lot of small things that helped me jumpstart my development, as I can run
queries in admin to try them out and I also get data back to see what things
will look like.

The only thing I am somewhat missing is 'brew install rethinkdb'

~~~
coffeemug
Actually, 'brew install rethinkdb' is now available. I'll update the install
page with it in a moment.

EDIT: <https://github.com/rethinkdb/rethinkdb/issues/269>

~~~
desireco42
Thank you :), again it is not really a serious gripe as the regular installer does
the job just fine, but I tend to install all apps like that and this helps.

+1 for sure for this one

btw, when I was installing it, I tried that even before checking on homepage
:)

------
lucian1900
It does indeed look very much like MongoDB, but made by people that actually
know what they're doing. It's refreshing to see good database design for a
change.

~~~
dwnoble
Can you expand a little bit on this? What design decisions in MongoDB vs
RethinkDB are you referring to?

~~~
lucian1900
Rethink has durability, MVCC, joins, logical sharding, excellent admin tools,
etc. All things that serious databases tend to have and Mongo doesn't.

~~~
taligent
MongoDB IS durable now by default, has a third party MVCC implementation
(MongoMVCC) and has pretty decent admin tools.

And this idea that joins is a requirement for a "serious" database makes
absolutely no sense. Database level joins are toxic for scalability and IMHO
should always be done in the application layer.

~~~
lucian1900
Mongo in its most durable mode (which btw isn't what, say, Postgres would call
durable) is really slow. Why even bother with it anymore?

First party MVCC is the only one that matters. It affects vital things like
backups, analytical queries and transactions.

Joins are extremely useful. If a database does the sharding, it is almost
always better for it to do the joins as well. Performance can be good with the
right model, and Mongo is slow anyway.

~~~
taligent
So we are in agreement then. MongoDB IS durable but it will be slower doing
so. Hardly a surprise there. And still have to disagree about the joins but
hey agree to disagree.

As for MongoDB performance, well, making a blanket statement is pretty silly. On
a previous project I had queries that were upwards of 40x faster in MongoDB
than MySQL. Why? Because MongoDB allows the ability to embed documents within
other documents to the point where I could make a single query with zero joins
to fetch 20 entities worth of data.
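
A toy illustration of the embedding argument, using in-memory dicts as
stand-ins for tables or collections. `CountingStore` and the order/item data
are invented for the example, and it counts lookups in an application-layer
join; it makes no attempt to reproduce the 40x figure or MySQL's own join
behavior.

```python
class CountingStore:
    """Toy key-value store that counts lookups, standing in for a database round trip."""

    def __init__(self, rows):
        self.rows = rows
        self.lookups = 0

    def get(self, key):
        self.lookups += 1
        return self.rows[key]

# Normalized layout: the order references its line items by id.
orders = CountingStore({1: {"customer": "ann", "item_ids": [10, 11, 12]}})
items = CountingStore({10: {"sku": "A"}, 11: {"sku": "B"}, 12: {"sku": "C"}})

def load_normalized(order_id):
    """Application-layer join: one lookup for the order, one more per item."""
    order = orders.get(order_id)
    return {
        "customer": order["customer"],
        "items": [items.get(item_id) for item_id in order["item_ids"]],
    }

# Embedded layout: the line items live inside the order document.
embedded = CountingStore(
    {1: {"customer": "ann", "items": [{"sku": "A"}, {"sku": "B"}, {"sku": "C"}]}}
)

def load_embedded(order_id):
    """A single lookup returns the order together with its items."""
    return embedded.get(order_id)

normalized_result = load_normalized(1)
embedded_result = load_embedded(1)
print(orders.lookups + items.lookups, embedded.lookups)
```

Both calls return the same data, but the normalized path needs four lookups
against one for the embedded document. The cost is that embedded data is
duplicated wherever it is shared between parents.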

Every database is optimal for different use cases.

~~~
eurleif
I wonder what sort of performance you would have gotten using MySQL or
PostgreSQL, but denormalizing your data into JSON.

------
wiremine
Most of the posts I've seen about RethinkDB focus on "hey, we're a better
NoSQL solution than MongoDB." That could be true, but so far I see it mostly
coming from RethinkDB themselves, or people who like the design in theory.

However, does anyone have any practical real-world experience using it? It's
not production ready (from what I gather), but has anybody actually used it
for real world stuff?

For my own part, I tried it out, and got stuck trying to implement a many-to-
many style join. I did some searching, and it looks like that is not really
possible at this point. Not a big deal, but it might be handy to have some
example SQL-to-RethinkDB queries, just to help us newbies figure out the
ropes.

~~~
coffeemug
Hi, slava @ rethink here. People have been using Rethink for lots of projects,
but there are still some showstoppers we have to work out (better docs
notwithstanding) -- remember, the product has been in the wild for less than
90 days.

What did you try to do with a many-to-many join? We could help you with
writing the query, and could add syntax sugar to the language to make it
easier if it makes sense.

~~~
wiremine
> "remember, the product has been in the wild for less than 90 days."

I hear you, and, FWIW, I'm excited about Rethink. To rephrase my
question/observation: your article clearly lays out why you think it is better
than MongoDB, using some quotes from people who agree with you. However,
without some real-world data, it is still an argument rooted in theory. I like
theory, but I also like to take real-world data to my bosses. Do you have any
stats/examples that actually compare and demonstrate the performance? (I
understand that wasn't the purpose of your article, just asking as a follow
up).

Regarding the many-to-many joins: I was just playing around with a contrived
example: "a blog post has and belongs to many categories." Mainly I was just
curious how to do it, I didn't _need_ it for anything. But, I couldn't figure
out how to write it with the query language. I was using Python DSL, FWIW.
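
For reference, the shape of that contrived example in plain Python rather than
in the RethinkDB query language (the thread doesn't show how Rethink would
express it). The table contents and helper names are made up; the point is only
the link table that a many-to-many relation needs.

```python
# Two entity "tables" keyed by id, plus a link table of (post_id, category_id) pairs.
posts = {1: "Intro to sharding", 2: "Why MVCC matters"}
categories = {10: "databases", 11: "ops"}
post_categories = [(1, 10), (1, 11), (2, 10)]

def categories_for(post_id):
    """Join post -> link table -> categories."""
    return sorted(categories[c] for p, c in post_categories if p == post_id)

def posts_for(category_id):
    """Join category -> link table -> posts (the same relation, read the other way)."""
    return sorted(posts[p] for p, c in post_categories if c == category_id)

print(categories_for(1))
print(posts_for(10))
```

In SQL the same thing would be two joins through a `post_categories` table; a
document store can alternatively embed a list of category ids in each post,
which makes one direction cheap and the other a scan or an index lookup.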

~~~
coffeemug
I also hear you, there will be real data soon. We just want to be careful
about making sure everything is sound before publishing information.

As for many-to-many joins, I'll write something up about it, thanks!!

------
rubyrescue
Riak is NOT operations-oriented. It's nearly impossible to manage
operationally without dedicated staff at scale and the tools to introspect and
analyze and deal with failures aren't robust enough yet.

I know they're just trying to contrast Riak and Cassandra with Couch and
Mongo, and that Riak is designed to shard easily without the developer having
to think about it.

That philosophy actually is "developer-oriented" in that it SEEMS like an
operational savings because it was designed by developers.

~~~
argvzero
Chief Architect at Basho here:

Saying Riak is categorically non-operations-oriented is a bit hyperbolic, but
I will be the first to acknowledge that we need even more visibility into
failure-recovery / degraded mode situations. I've spoken to a few customers
who have "cheat sheets" of Erlang console commands they use to debug things
like handoff slowness or poor performance in general. This alone means we need
to do better.

On the other hand, Riak continues to function in scenarios where other
databases would be completely unavailable. I'll take immature visibility
during those situations over complete unavailability any day.

I appreciate your feedback - I can assure you that this is something we're
constantly working on and you'll see improvements with each release.

Finally, if you've been bitten by anything specific you'd like to see fixed,
we do all our development in the open at <http://github.com/basho>, so github
issues, pull requests, etc go right into our internal tools and workflows.

Cheers,

Andy Gross

~~~
rubyrescue
Would love to give some feedback. I like Riak and I believe in it - over the
long haul I'm 100% sure it will be an amazing product.

Let's grab beers at Erlang Factory!

Chad

~~~
argvzero
Sounds good - I'll see you there!

------
jcdavis
Not mentioned: RethinkDB doesn't yet support secondary and compound indexes,
which is a dealbreaker for a lot of setups

Definitely looks interesting though, and I look forward to playing around with
it at some point.

~~~
andrewljohnson
Read the article:

"Some key features like secondary indexes and live backup are still in
development"

~~~
coffeemug
Sorry, we added limitations a few hours after posting. I should have been more
clear about the fact that they were edited in. I'll try to get better at this
live blogging thing :)

~~~
jrochkind1
Please keep talking about limitations in your marketing, even as the product
gets more mature and has fewer and/or different ones. (There's _always_
limitations). Learn from the mongodb backlash.

------
mintplant
I'm eager to take RethinkDB for a spin, as soon as secondary and compound
indexes are fully implemented.

[1] <https://github.com/rethinkdb/rethinkdb/issues/88>

[2]
[https://github.com/rethinkdb/rethinkdb/tree/jd_secondary_ind...](https://github.com/rethinkdb/rethinkdb/tree/jd_secondary_indexes)

------
islon
Every NoSQL database is perfect and better than all the other options until
you start using it in the real world. I'm not saying rethink DB is not a good
solution, the point is, nosql dbs are about compromise and specific problems.

~~~
coffeemug
Hi, Rethink is my baby. It is beautiful, but it is _not_ perfect :)

~~~
TY
Off topic: Slava, I wish you could resume writing for your blog - you've
always had interesting things to say.

~~~
coffeemug
Thank you. I tell people I've stopped because I don't have the time. Really
it's because I've since realized that the world is much more nuanced than I
gave it credit for, but I have nowhere near the sufficient writing skill to
express it. I'll try to get back into it.

~~~
TY
I'll be looking forward to it! Thank you very much for the time you have
already put in your writings and the future.

------
bunkat
Looks very interesting, but this statement in their FAQ is a red flag for me:

How can I understand the performance of slow queries? Understanding query
performance currently requires a pretty deep understanding of the system. For
the moment, the easiest way to get an idea of why your query isn't performing
well is to ask us.

Wish RethinkDB was a little further along because it seems like it might be a
good fit for a new service I'm building.

~~~
neumino
Michel @ RethinkDB

We are building a tool to explain in a nice way how the query is executed,
what are the bottlenecks etc. It should make it for 1.5.

You can track progress here
<https://github.com/rethinkdb/rethinkdb/issues/175> (it's kind of empty for
now)

~~~
bunkat
Interesting, I'll keep an eye on that. When do you think RethinkDB will be
ready for production use?

~~~
neumino
We aim to be ready for production in 6 months.

------
bfirsh
This just reads like marketing speak. What are the _disadvantages_ of
RethinkDB?

~~~
coffeemug
[http://www.rethinkdb.com/docs/faq/#when-is-rethinkdb-not-a-g...](http://www.rethinkdb.com/docs/faq/#when-is-rethinkdb-not-a-good-choice)

[http://www.rethinkdb.com/docs/advanced-faq/#is-rethinkdb-imm...](http://www.rethinkdb.com/docs/advanced-faq/#is-rethinkdb-immediately-or-eventually-consis)

~~~
bcbrown
You mention comparing against Hadoop for computationally-intensive data
analysis. Would Rethink be suitable for a several-terabyte dataset with non-
computationally-intensive analytics?

Currently we're using Hive and Python over streaming Hadoop. There's no
significant ongoing data accumulation; we're just analyzing the data we have.

~~~
coffeemug
We haven't tested on workloads like that, but I can't think of anything that
would prevent this workload from working well. The idea behind Rethink's
architecture is to eventually allow people to run full analytics on the same
cluster as their live app. Currently we're optimizing for online-type queries,
but you can run analytics queries too, it's just that we haven't given the
optimizer enough love on that front yet.

------
mhd
How's the general performance and memory consumption on smaller machines, e.g.
entry-level VPS's or the lower spectrum of AWS VMs? Don't have any big
projects in the pipeline that immediately required sharding etc, but would
like to play with it on a few weekend-scale items.

~~~
coffeemug
We did some testing and it should be great, especially once
<https://github.com/rethinkdb/rethinkdb/issues/97> makes it in.

~~~
ukd1
When I last tested I maxed out at about 700 inserts / sec with nothing else
happening (MBP Retina, SSD, etc) - it's not bad, but not as fast as Mongo.

I'm going to benchmark it when I get some time!

~~~
coffeemug
This is a known limitation -- it will be resolved when we fix
<https://github.com/rethinkdb/rethinkdb/issues/207>. We just picked the safest
default (fsync per op) and shipped the product. #207 will make things a little
smarter.
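
The general trade-off behind "fsync per op" can be sketched as follows. This
says nothing about what #207 actually implements; `write_records` and its
`batch_size` parameter are hypothetical, and the batching shown is just the
generic group-commit idea.

```python
import os
import tempfile

def write_records(path, records, batch_size):
    """Append records, forcing them to disk once per `batch_size` records.

    Returns the number of fsync calls made. batch_size=1 is "fsync per op".
    """
    syncs = 0
    with open(path, "ab") as f:
        for i, record in enumerate(records, start=1):
            f.write(record)
            if i % batch_size == 0:
                f.flush()              # push Python's buffer to the OS
                os.fsync(f.fileno())   # ask the OS to push it to the device
                syncs += 1
        if len(records) % batch_size:  # sync the final partial batch
            f.flush()
            os.fsync(f.fileno())
            syncs += 1
    return syncs

records = [b"row-%d\n" % i for i in range(1000)]
with tempfile.TemporaryDirectory() as d:
    per_op = write_records(os.path.join(d, "per_op.log"), records, batch_size=1)
    batched = write_records(os.path.join(d, "batched.log"), records, batch_size=100)
print(per_op, batched)
```

The first call issues 1000 syncs and the second 10. Since each sync waits on
the storage device, fewer syncs usually means higher insert throughput, at the
price of losing up to a batch of unsynced writes in a crash.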

~~~
ukd1
700 inserts / sec is still quite a lot for a single laptop and I'm sure it
will get better over time.

:-)

------
arunoda
This comparison speaks better than this -
<http://www.rethinkdb.com/docs/comparisons/mongodb/>

------
dennis82
this is marketing cloaked in a developer portal. I think it's great that
rethinkdb is trying to distinguish themselves from Mongo, but what's the real
marginal utility of a rethinkdb over Mongo?

Mongo has been around for years, and it still has problems.

Rethinkdb is just launching a new product that essentially does the same thing
as Mongo, but is maybe just a little easier to use.

I think the Yet Another Database (YAD) question still hasn't been answered by
this post.

------
munimkazia
We have been looking at NewSQL(or even NoSQL) platforms for our databases at
my place of work, and we also stumbled upon RethinkDB. While everything these
guys say does sound amazing, we were looking for someone who has implemented
it, or any third party case study about it. Since we couldn't find any, we
decided not to go with RethinkDB for now.

Does anyone here know any big website/service which uses RethinkDB?

~~~
apendleton
It's brand-new, and not recommended for production use yet, so I highly doubt
that anything like that exists.

------
etanol

        «An asynchronous, event-driven architecture based on 
         highly optimized coroutine code scales across multiple
         cores and processors, network cards, and storage systems.»
    

It may be a dumb question, but isn't this statement a bit contradictory? As
far as I understand, event-driven design and coroutines (i.e. cooperative
multitasking, lightweight threads, etc.) are the techniques usually chosen to
AVOID concurrency.

How does such a design imply multicore scalability? Obviously, coroutines and
event loops don't prevent you from running in multiple cores. I just fail to
see the correlation.

~~~
coffeemug
This is a great question. We start a thread per core, and multiplex thousands
of coroutines/events on each thread. When coroutines on different threads need
to communicate, we send a message via a highly optimized message bus, so
cross-thread communication code is localized. This means each thread is lock-
free (i.e. when a coroutine needs to communicate with another coroutine, it
sends a message and yields, so the CPU core can process other pending tasks).
The code isn't wait-free -- a coroutine might have to wait, but it never ever
locks the CPU core itself. So, as long as there is more work to do, the CPU
will always be able to do it.
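
A loose Python analogy of that design, assuming two workers instead of one per
core. RethinkDB itself is not written this way: the `Worker` class and the
`lookup` / `handle_request` coroutines are invented for illustration, and
because of CPython's GIL this shows the structure rather than real multicore
speedup. Each worker is one OS thread with its own event loop, and a coroutine
that needs another worker sends it work and suspends until the reply arrives.

```python
import asyncio
import threading

class Worker:
    """One OS thread running its own event loop (stand-in for 'a thread per core')."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        self.thread.start()

    def submit(self, coro):
        """Thread-safe: hand a coroutine to this worker; returns a concurrent.futures.Future."""
        return asyncio.run_coroutine_threadsafe(coro, self.loop)

    async def call(self, coro):
        """From a coroutine on another worker: send work here and yield until it replies."""
        return await asyncio.wrap_future(self.submit(coro))

    def stop(self):
        self.loop.call_soon_threadsafe(self.loop.stop)
        self.thread.join()
        self.loop.close()

async def lookup(key):
    await asyncio.sleep(0.01)  # simulated I/O on the storage worker
    return key.upper()

async def handle_request(storage, key):
    # Runs on the front worker. Awaiting suspends only this coroutine, so the
    # front worker's thread stays free to run other pending requests.
    return await storage.call(lookup(key))

front, storage = Worker(), Worker()
futures = [front.submit(handle_request(storage, key)) for key in ("a", "b", "c")]
results = [f.result(timeout=5) for f in futures]
front.stop()
storage.stop()
print(results)
```

The application code takes no explicit locks: the only cross-thread interaction
is the message hand-off inside `submit`, which mirrors the point about keeping
cross-thread communication localized.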

If instead we used threads + locking like traditional systems, we'd have to
deal with "hot locks" that block out entire cores. Effectively we solved this
problem once and for all, while systems that use threads + locks (like the
linux kernel) have to continuously solve it by making sure locks are extremely
granular.

~~~
JulianMorrison
Sounds very Erlang-ish. Did you copy that deliberately?

~~~
coffeemug
We do effectively have an ad-hoc mini Erlang runtime that we wrote at the core
of the system. I'm not sure how deliberate that was -- we sort of borrowed
performance ideas from many places, tried a lot of different approaches, and
settled on this one. Lots of this was definitely inspired by ideas from
Erlang.

~~~
gruseom
There definitely seems to be a version of Greenspun's Tenth Rule for Erlang.
But I think Greenspunning has gotten too bad a name – sometimes implementing a
subset of a classical system is exactly what you ought to do, for example when
your problem allows you to exploit certain invariants that don't hold in the
general case, or for some reason using the classical system itself (Erlang in
this case) is not an option.

~~~
coffeemug
Right! Rethink has an adhoc Erlang runtime for message processing, and an
adhoc lisp for the query language. I'm both ashamed and proud of this at the
same time :)

------
primitur
Anyone know if there is a Lua driver for RethinkDB in the works? I suppose I
could use the C client and think about generating one for Lua, but maybe
someone has already done that?

~~~
coffeemug
There isn't one yet. We'll be publishing the new, much simpler spec for client
driver writers. It'd be pretty easy to do a native Lua driver (based on
protocol buffers).

------
smagch
Other than RethinkDB, BigCouch looks like a database that is both developer-
and operations-oriented, since it is a Dynamo-like CouchDB. Does anyone have
experience with BigCouch?

------
DonnyV
Where is the Windows port? and if there is one will it always be second rate
compared to the Unix and Linux port?

Also where is the .Net driver?

------
raxitsheth
Geo support? I think not!

