
RethinkDB 2.1.5 performance and scalability - knes
http://rethinkdb.com/blog/rethinkdb-performance-report/?hn
======
Cidan
We use RethinkDB for our homegrown monitoring system; it's incredible. We
have several hundred clients connected, inserting monitoring data and doing
some pretty complex queries. In the ~2 years we've been using RethinkDB, we
have never experienced a failure.

If I could go back in time, I would use RethinkDB for our actual product.

~~~
jtmarmon
Has the live query stuff performed well? i.e. subscribing to some arbitrary db
query for new records?

------
overcast
This is probably my favorite database to work with currently. Real time,
relational document storage. Best of three worlds.

~~~
marknadal
Relational documents? That sounds like graphs but Rethink is not a graph
database (compared to Neo4j, GUN, Orient, Arango, etc).

Do you have a link to explain what you mean?

~~~
rspeer
What would you say is the distinguishing feature of whether something "is a
graph database"?

To put it another way, what do the databases you listed have in common with
each other that they don't have in common with other NoSQL databases, besides
marketing?

~~~
niftich
A graph database exposes an object model that fits graph terminology: nodes
and edges. They (hopefully) are optimized for graph-traversal operations like
hopping along a series of edges, which would be painful in a relational model.

In this situation, NoSQL is the marketing term, while a graph database is a
clear concept with corresponding performance and capability expectations.

~~~
rspeer
I'm not asking you to explain to me what a graph is.

I've found that it's easier and faster to import and traverse edges in SQLite
or PostgreSQL than in some of the systems you listed. I haven't tried
RethinkDB but it sounds promising. You use words like "hopefully" and
"performance and capability expectations", but distinguishing databases based
on their hopes and expectations is not at all a clear concept.

The main difference seems to be whether you call things "nodes and edges" or
"documents and references" or "rows and joins" in the documentation.

(edit: removed some negativity, and I'm aware some remains)
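
For readers who want to see what "rows and joins" edge traversal can look like, here is a minimal sketch using SQLite's recursive common table expressions through Python's standard `sqlite3` module. The `edges` table, its column names, and the `reachable` helper are all made up for illustration; this is not code from anyone in the thread and it says nothing about relative performance.

```python
import sqlite3

# Hypothetical schema: one row per directed edge.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT NOT NULL, dst TEXT NOT NULL)")
conn.execute("CREATE INDEX edges_src ON edges (src)")
conn.executemany(
    "INSERT INTO edges (src, dst) VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e"), ("x", "y")],
)


def reachable(conn, start):
    """Return every node reachable from `start` by following edges."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT ?
            UNION  -- UNION (not UNION ALL) drops repeats, so cycles terminate
            SELECT edges.dst FROM edges JOIN reach ON edges.src = reach.node
        )
        SELECT node FROM reach WHERE node != ?
        """,
        (start, start),
    ).fetchall()
    return sorted(row[0] for row in rows)


print(reachable(conn, "a"))
```

The index on `src` is what keeps each hop from scanning the whole table; whether that is "fast enough" compared with a dedicated graph store depends on the workload.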

~~~
dingfeng_quek
The most common and relevant difference is that a graph database has a storage
engine that is optimized for graph traversal. For example, in a RDBMS, another
row is referenced via a foreign key which involves a lookup to an indexed
column, while in a graph database, a node is often referenced by its storage
location. This makes graph traversals much cheaper, and is also something that
an RDBMS is unable to optimise for because of conflicts with the relational
model.
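
A toy illustration of the distinction drawn above, assuming a dict stands in for an indexed key column and plain object references stand in for storage locations. The names (`rows`, `Node`, `walk_by_key`, `walk_by_reference`) are invented for this sketch, and it models the idea only, not how any particular engine is implemented.

```python
from dataclasses import dataclass
from typing import Optional

# Relational style: a row refers to another row by key, so every hop
# goes through an index (the dict here stands in for that index).
rows = {
    1: {"name": "a", "next_id": 2},
    2: {"name": "b", "next_id": 3},
    3: {"name": "c", "next_id": None},
}


def walk_by_key(rows, start_id):
    names, current = [], start_id
    while current is not None:
        row = rows[current]        # index lookup on every hop
        names.append(row["name"])
        current = row["next_id"]
    return names


# Graph style: a node holds a direct reference to its neighbour.
@dataclass
class Node:
    name: str
    next: Optional["Node"] = None


def walk_by_reference(node):
    names = []
    while node is not None:
        names.append(node.name)
        node = node.next           # follow the reference, no lookup
    return names


c = Node("c")
b = Node("b", c)
a = Node("a", b)
print(walk_by_key(rows, 1), walk_by_reference(a))
```

Both walks visit the same nodes; the difference is only in how each hop finds the next one.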

Similarly, the query language is optimised for graph traversal types of
queries, in a way that would not be possible in an RDBMS due to the relational
model constraints, and also because some of the query operations would be
extremely inefficient in an RDBMS storage engine.

> That sounds like graphs but Rethink is not a graph database (compared to
> Neo4j, GUN, Orient, Arango, etc).

With regards to the listed databases, some are, but the rest are not (and do
not claim to be) graph databases.

Note: It's different people replying to you.

~~~
rspeer
FWIW, I should mention that I _have_ tried one DB that seems to qualify as a
graph database, in that it actually does seem to import and traverse edges
efficiently, which is Blazegraph. You tend to have to search past several
other things calling themselves graph DBs to find that one.

(But I'm a bit afraid of using it. Any armchair lawyers want to tell me if
code that uses a GPL database has to be GPLed itself?)

~~~
jerven
Don't worry about the GPL part, e.g. see MySQL as a GPL database in wide use.

As long as you access blazegraph via the sparql or tinkerpop interfaces you
will clearly be fine.

------
vonklaus
I _really_ like using RethinkDB. Currently, I am using thinky for Node as an
ORM, which works quite well for my purposes. If anyone else works with Node.js
I would be interested to know:

* how you use rethinkdb

* what you don't use it for, and why

* good example resources like ORMs, repos, and data models.

Some things I like/use/reference:

- https://github.com/drhurdle/node-rethinkdb-auth-starter

- Yo Express, which provides a nicely structured MVC application with thinky,
RethinkDB, and gulp

- https://www.airpair.com/javascript/posts/using-rethinkdb-with-expressjs

~~~
cjhveal
While thinky is great and does a ton of heavy lifting for me, I've found it a
little bit confusing when I need to write more advanced queries. I would
personally recommend learning ReQL + RethinkDB with the
rethinkdbdash[0] driver first. It helps with managing connection pools and
working with Promises but still lets you play nice with the (very good)
rethinkdb docs while ramping up.

[0]: https://github.com/neumino/rethinkdbdash

~~~
chrisfosterelli
We had this experience as well. We started using Thinky for a few projects,
but found that as soon as we needed more complex queries, it became very
awkward to write them within Thinky.

Eventually, we discovered ReQL by itself is just great to work with
(especially with rethinkdbdash). Writing your own queries also has significant
atomicity benefits over Thinky.

~~~
overcast
You can mix and match your own queries with Thinky helper functions. I do it
all the time with .getJoin. Do some more advanced ReQL query, getJoin it into
your other Thinky defined models. Thinky is great for the
modeling/relationship aspect, setting up default values based on returns from
anonymous js functions, ensuring indexes, and dealing with referential
integrities when you want to recursively delete related documents. Managing
all of that manually, while it's doable, would be a giant pain in the ass.
Point is, Thinky and raw ReQL work together.

~~~
chrisfosterelli
It's really not that bad to manage that manually! The models we replaced with
Joi validators, the relationships we weren't using anyway anymore because
Thinky has some limitations to them, default values are easy in Rethink, index
management just became part of normal migrations, and referential integrities
aren't actually guaranteed in Thinky since it's not atomic so we found
explicit chained promises to be more intuitive :)

Thinky and raw ReQL definitely work together, don't get me wrong! But as our
project scaled we found 95% of our code had become raw ReQL for atomicity,
performance, and relational reasons, so it just didn't make sense to use
Thinky in future projects. Our original RethinkDB project still contains
Thinky, but it'd be cleaner if we just made a complete switch at this point
too.

To each their own though, of course!

------
colordrops
What are some contraindicated use cases for RethinkDB?

~~~
dingfeng_quek
Besides multi-document transactions, I've encountered these problematic use-
cases (something I discovered after scope creep of projects):

2. When I needed "more exotic" data types (e.g. but not limited to Decimals)
where you want query/analytics logic to happen on the database rather than the
app.

3. OLAP types of use cases. I really wanted collation, sort order, query
optimizers, ability to "explore" the data in relational ways, and so on.

~~~
arjunbajaj
What did you use to solve these two use-cases?

~~~
dingfeng_quek
Couldn't figure out a way to solve them reasonably (and within schedule) using
RethinkDB, so I did a "master-slave" with the RethinkDB as the master and
PostgreSQL as a slave replica. Then did all the queries and analytics on the
PostgreSQL database. I think this is far from ideal in terms of server costs,
but scope creep is never ideal.

I suppose using computed property indexes in RethinkDB might work as well, or
generating indexes, sort order, and doing query optimisation outside of the
DB, but I couldn't figure it out in time, and it also seemed way more
complicated than replicating the data to PostgreSQL.

EDIT: The above is for the OLAP use cases. Some types of OLAP just don't go
well with document databases. For the Decimal type, I stored it as a string,
and had a computed property that transformed the string to a float. The float
can then be indexed and queried like a number. It was good, until all the
other OLAP use-cases crept in.
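
The string-plus-computed-float approach described here can be sketched in plain Python, with a sorted list standing in for a secondary index. The documents, the `price` field, and the `price_as_float` helper are hypothetical; the point is only that ordering and range queries use the float view while exact arithmetic still reads the stored string.

```python
from decimal import Decimal

# Hypothetical documents: the exact decimal value is stored as a string.
docs = [
    {"id": 1, "price": "19.99"},
    {"id": 2, "price": "5.00"},
    {"id": 3, "price": "100.10"},
]


def price_as_float(doc):
    """Computed property: a numeric view of the string-encoded decimal."""
    return float(doc["price"])


# Sorting the raw strings gives lexicographic order, which is wrong here.
ids_by_string = [d["id"] for d in sorted(docs, key=lambda d: d["price"])]

# Stand-in for an index on the computed float property.
by_price = sorted(docs, key=price_as_float)
ids_by_number = [d["id"] for d in by_price]

# A range query goes through the numeric view.
ids_in_range = [d["id"] for d in by_price if 10 <= price_as_float(d) < 50]

# Exact totals should still come from the stored strings, not the floats.
total = sum(Decimal(d["price"]) for d in docs)

print(ids_by_string, ids_by_number, ids_in_range, total)
```

The float view is lossy, so it suits ordering and filtering; anything that needs exact decimal results has to go back to the string.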

Having said that, I found that building custom replicas with RethinkDB is easy
because it emits data change events. I never expected that feature to be an
escape hatch.
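
A rough sketch of the replica idea, assuming change events shaped like RethinkDB changefeed notifications (`old_val` / `new_val`). To keep it runnable without a server, a hard-coded list stands in for the live feed and an in-memory SQLite database stands in for the PostgreSQL replica; the `metrics` table and the `apply_change` helper are invented for illustration.

```python
import json
import sqlite3

# Stand-in for the relational replica.
replica = sqlite3.connect(":memory:")
replica.execute("CREATE TABLE metrics (id TEXT PRIMARY KEY, body TEXT NOT NULL)")


def apply_change(conn, change):
    """Apply one change event to the replica table."""
    old, new = change.get("old_val"), change.get("new_val")
    if new is None:
        # The document was deleted upstream.
        if old is not None:
            conn.execute("DELETE FROM metrics WHERE id = ?", (old["id"],))
    else:
        # Insert or update: keep the latest version of the document.
        conn.execute(
            "INSERT OR REPLACE INTO metrics (id, body) VALUES (?, ?)",
            (new["id"], json.dumps(new)),
        )
    conn.commit()


# Hard-coded events standing in for a live feed.
events = [
    {"old_val": None, "new_val": {"id": "a", "cpu": 0.2}},
    {"old_val": None, "new_val": {"id": "b", "cpu": 0.9}},
    {"old_val": {"id": "a", "cpu": 0.2}, "new_val": {"id": "a", "cpu": 0.5}},
    {"old_val": {"id": "b", "cpu": 0.9}, "new_val": None},
]
for event in events:
    apply_change(replica, event)

rows = replica.execute("SELECT id, body FROM metrics ORDER BY id").fetchall()
print(rows)
```

A real setup would also need an initial copy of existing data and a plan for events missed while the consumer is down; neither is covered here.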

------
hipsterrific
I love how the rethink team keeps improving an otherwise wonderful document
database. ReQL is also super nice. :)

------
gshx
Does anyone have real production-scale numbers? Even order-of-magnitude
concurrency figures would be helpful. We're obviously very wary of trying out
new and young data stores.

