
Building Better Node.js Applications with RethinkDB - CherryJimbo
https://nodecraft.com/blog/dev/building-better-node-js-apps-with-rethinkdb
======
falcolas
> over 70 million API requests a week.

Congratulations. However, if you're going to imply that traditional SQL
databases are non-performant in comparison, you may want to do a bit more
looking at the performance numbers both PostgreSQL and MySQL are capable of.

100+ million transactions per day is pretty easy to achieve with a traditional
SQL DB. Also, you can do a lot of filtering in software specialized for those
use cases, reducing the need to consume processing time in the Node.js V8
thread.

I don't want to knock RethinkDB here, just point out that there are stronger
performance cases to be made.

~~~
CherryJimbo
You're absolutely right - there's nothing stopping a standard MySQL setup from
pulling off the same stats; it just wasn't what we were looking for, given the
various other decision points covered in the post. RethinkDB's native ability
to shard and scale is the key performance case to be made, in my opinion.

As mentioned by another user though, the filtering here is done entirely
server-side (at RethinkDB's level), so the V8 thread performance isn't really
a concern.
[https://news.ycombinator.com/item?id=9411738](https://news.ycombinator.com/item?id=9411738)
for more info.

------
woah
So is rethinkdb what mongo was always meant to be?

~~~
girvo
In my experience, yes, pretty much! It's a really amazing database, and its
baked-in web UI for management and dead-simple clustering are just perfect for
the use cases I've slotted RethinkDB into.

------
btrombley
All of the benefits sound the same as MongoDB, which also does sharding and
JavaScript queries, including arbitrary code in "callbacks" in map-reduce or
$where clauses (though it's slow).

Can someone explain the benefit of RethinkDB over MongoDB?

~~~
coffeemug
Slava @ Rethink here.

RethinkDB is based on a fundamentally different architecture from MongoDB.
Instead of polling for changes, the developer can tell RethinkDB to
continuously push updated query results in realtime -- check out
[http://rethinkdb.com/faq/](http://rethinkdb.com/faq/) for more details on
this.

Just FYI, Rethink does sharding and JavaScript queries (including callbacks)
--
[http://rethinkdb.com/api/javascript/js/](http://rethinkdb.com/api/javascript/js/).
The `js` command interops with every other command in RethinkDB and works
really well.

------
hoggle
I'm probably forever spoiled by Rails' ActiveRecord - has anybody here used
[http://thinky.io](http://thinky.io) with Node/RethinkDB? I really like to
define my objects in an OOP fashion with class/instance methods included and
would prefer a convenience "ORM" library for that purpose.

~~~
habitue
Thinky is really top notch. Michel keeps it up to date and constantly adds new
features to it. I highly recommend checking it out.

------
ebbv
Maybe it's just me but:

    
    
        var signupFilter = new Date();
        signupFilter.setDate(signupFilter.getDate()-30); // get a timestamp from 30 days ago
        r.table('users').filter(function(user){
             return user('signup_date').gt(signupFilter);
        }).run(function(err, user){
             // ... a cursor to stream through the data which can also be converted
             //  to an array using the toArray method
        });
    

Is a lot nastier than:

    
    
        db.query('SELECT * FROM users WHERE signup_date > CURDATE() - INTERVAL 30 DAY;', {}, function (err, users) {
            // Do stuff
        });
    

No?

~~~
filearts
I would agree that the latter is less 'nasty' in that it is a query written in
a DSL designed to make querying tabular data more understandable. It would be
a major failing of SQL if that were not the case.

However, what the former does that the latter does not is provide a mechanism
to build up and modify queries. It can be very helpful to pass around and
affect partial queries when you are looking to eliminate repetitive code for
building multiple, similar queries.
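
The composability point can be sketched in plain JavaScript. This is a toy
query builder invented for illustration -- not ReQL or any real library's API
-- showing how a partial query can be built once, passed around, and extended:

```javascript
// Toy chainable builder: each method returns a new query object, so partial
// queries are plain values that can be shared and extended independently.
function query(rows) {
  return {
    filter(pred) { return query(rows.filter(pred)); },
    limit(n) { return query(rows.slice(0, n)); },
    toArray() { return rows; },
  };
}

const users = [
  { name: 'ana', admin: true },
  { name: 'bo', admin: false },
  { name: 'cy', admin: true },
];

// Build the common part once...
const admins = query(users).filter((u) => u.admin);

// ...then derive multiple similar queries from it without repeating code.
const firstAdmin = admins.limit(1).toArray();
const allAdmins = admins.toArray();

console.log(firstAdmin.length, allAdmins.length); // 1 2
```

Raw SQL strings, by contrast, have to be concatenated or templated to get the
same reuse, which is exactly what query builders exist to avoid.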

This is probably why many abstraction layers have query builders that look a
lot like the syntax that you call nasty to generate the 'less nasty' raw SQL.

~~~
ebbv
So, basically you're saying it's OK that the abstraction layer for RethinkDB
creates nasty code because it's native JS? I don't agree. I think nasty code
is never OK.

Why can't the RethinkDB abstraction layer just be better? Why isn't:

    
    
        var dateFilter = new Date();
        dateFilter.setDate(dateFilter.getDate() - 30);
        rdb.table('users').get([ { column: 'signup_date', gt: dateFilter } ]).each(function (err, user) {
            // Do stuff
        });
    

Possible? That's just off the top of my head how I would do it in a way that's
native JS and far better than what's presented in the article.

~~~
habitue
You can do this in RethinkDB:

    
    
        r.table('users').filter(r.row('signup_date').gt(signupFilter)).run()
    

Using an anonymous function like in the first example is optional. `r.row` can
be very convenient. In this case, you only need to use `r.row` because you're
using the `gt` comparison. If you're just doing a direct match, you can use
this syntax:

    
    
        r.table('users').filter({signup_date: dateFilter})

------
atarian
Seems like a great option for real-time on the backend. But what would you use
to push real-time data to a client-side JS app?

~~~
jkarneges
Lots of choices there: Faye, Socket.io, Pushpin, PubNub, etc.

I'm curious to see someone try integrating RethinkDB with Meteor.

~~~
tehbeard
Don't forget vanilla tech like WebSocket, EventSource or good old fashioned
ajax.

------
mrits
One of the best optimizations I have done was switch my caching from using a
key-value serialization to an array form. I'm caching fairly large data
reports about computer performance. It brought the average cache size from 100
megs to around 4 megs.

[{'first_name':'mike'},{'first_name':'bill'},...]

vs

{meta: {'header':['first_name']}, data: [['mike'],['bill'],...]}

This is why I can't consider databases like RethinkDB or Mongo. They are
taking such a big hit already for using something like BSON as a storage that
I don't even want to be involved in that train of thought anymore.
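
The compaction described above can be sketched like this. The field name and
record count are made up for the example; the real reports are obviously
larger:

```javascript
// Key-value -> columnar packing: store each field name once in a header
// instead of repeating it inside every record.
function toColumnar(rows) {
  const header = Object.keys(rows[0]);
  return { meta: { header }, data: rows.map((r) => header.map((k) => r[k])) };
}

// Inverse transform, so cached data can be unpacked back into plain objects.
function fromColumnar({ meta, data }) {
  return data.map((vals) =>
    Object.fromEntries(meta.header.map((k, i) => [k, vals[i]]))
  );
}

const rows = Array.from({ length: 1000 }, (_, i) => ({ first_name: 'user' + i }));
const plain = JSON.stringify(rows);
const packed = JSON.stringify(toColumnar(rows));

console.log(packed.length < plain.length); // true -- keys no longer repeat per record
```

The saving grows with the number of fields and records, since the per-record
overhead drops from one key string per field to zero.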

------
Jean-Philipe
> r.table('users').filter(function(user){
>     return user('signup_date').gt(signupFilter);
> }).run(function(err, user){

Whenever I see one of those "awesome" no-SQL queries, I can't help but think
about how ugly and bulky they are compared to SQL.

Or compared to knex.js for Node:

knex.select().from('users').where('signup_date', '>', signupFilter)

It supports promises. What's wrong with that?

------
ttty
I can easily switch "RethinkDB" with "MongoDB" and the article would still be
correct.

~~~
bovermyer
Except for the part about real time notifications, which is RethinkDB's killer
feature.

~~~
tfb
How would RethinkDB's real-time capabilities compare to doing something like
this with Mongo? (I'm genuinely curious about any shortcomings with the
following implementation, besides scaling.)

    
    
        // plugins/replicate.js
    
        var replicator = require('replicator');
    
        module.exports = exports = function replicatePlugin (schema, options) {
          schema.pre('save', function (next) {
            // notify subscribers of save
            replicator.notify('save', this);
            next();
          });
        }
    
    
    
        // schemas/game.js
    
        var replicatePlugin = require('../plugins/replicate.js');
        var GameSchema = new Schema({ ... });
        GameSchema.plugin(replicatePlugin);

~~~
habitue
I don't know what the replicator module does, but from RethinkDB's end, it
allows you to avoid polling (which is a scalability thing, lots of really nice
library APIs are built on top of dead-slow polling of the database).

RethinkDB also lets you write queries and subscribe to changes on the query
itself. So in mongo, you can tail the replication oplog to avoid polling, but
you still need to filter _every_ event happening on the database for the ones
you're interested in. On top of that, if you need to do transformations of the
data, you have to re-apply them to what you get from the oplog. With
RethinkDB, you write the same query you would have, and the database can be
very efficient in only sending changes you're actually interested in, and send
them with the transformations you asked to be applied.
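
The client-side work that oplog tailing implies can be shown with a toy
example. The event shapes here are invented for illustration, not Mongo's
actual oplog format:

```javascript
// Toy oplog: the client receives *every* write on the database...
const oplog = [
  { table: 'users', doc: { name: 'ana', signup_date: 100 } },
  { table: 'games', doc: { id: 7 } },
  { table: 'users', doc: { name: 'bo', signup_date: 5 } },
];

const signupFilter = 50;

// ...and must itself discard unrelated tables, re-apply the query's
// predicate, and re-apply any transformations the query performed.
const matches = oplog
  .filter((ev) => ev.table === 'users')
  .filter((ev) => ev.doc.signup_date > signupFilter)
  .map((ev) => ev.doc.name.toUpperCase());

console.log(matches); // [ 'ANA' ]
```

With a changefeed on the query itself, that filtering and transformation
happens on the server, and only the single matching change crosses the wire.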

Check out these slides if you're interested:
[http://deontologician.github.io/node_talk/#/](http://deontologician.github.io/node_talk/#/)

------
netcraft
so are we saying that

    
    
        r.table('users').filter(function(user){return user('signup_date').gt(signupFilter);}).run(...
    

is running this filter in the database, not in node?

~~~
andrewmunsell
It does run on the database server-- see the documentation for more
information:
[http://rethinkdb.com/api/javascript/filter/](http://rethinkdb.com/api/javascript/filter/)

The key lines are:

Predicates to filter are evaluated on the server, and must use ReQL
expressions. You cannot use standard JavaScript comparison operators such as
`==`, `<`/`>`, or `||`/`&&`.

It's essentially equivalent to:

    
    
        r.table('users').filter(r.row("signup_date").gt(signupFilter)).run(conn, callback);
    

~~~
korijn
Am I correct in assuming that the implementation is similar to that of LINQ
and ORMs like NHibernate or Entity Framework? Example:
[https://msdn.microsoft.com/en-us/data/jj573936.aspx](https://msdn.microsoft.com/en-us/data/jj573936.aspx)

~~~
coffeemug
Yes, that's right. If you want to know how it's implemented, check out this
post --
[http://rethinkdb.com/blog/lambda-functions/](http://rethinkdb.com/blog/lambda-functions/).
It goes into quite a bit of depth on how the drivers work.

------
__Joker
Just a shout-out: I'm not sure when they evaluated the different DBs, but
Postgres ships with hstore and a JSON store for people who want to go
schema-less.

~~~
woah
I'd rather use a database purpose built for what I'm using it for. I.e., if
I'm using Postgres, I'll suck it up and plan a schema. Their json stuff
strikes me as a second class citizen.

~~~
__Joker
I am curious: why do you think Postgres's JSON stuff is a second-class citizen?

