
A Comprehensive Guide to Moving from SQL to RethinkDB - TheMissingPiece
https://www.airpair.com/rethinkdb/posts/moving-from-sql-to-rethinkdb
======
NDizzle
After spending a decade working with Lotus Domino, I think all of these people
who are willingly going back to document style databases are insane.

Being able to fiddle with your schema on the fly isn't necessarily a good
thing. Brainstorm, plan, execute. Or just slap another field on the end of the
document. Whichever sounds better for the long term health of your data!

~~~
wereHamster
I use RethinkDB as the backing database for a API server written in Haskell.
So I have pretty strong guarantees that the data which enters the database is
sane, and when I read it out Haskell automatically validates it against a very
strict schema.

SQL schemas are not expressive enough to describe the data types which I want
to store in the database. The argument SQL vs NoSQL becomes moot when any
database is degraded to essentially pure blob storage. What matters to me is
ease of use, administration/maintenance burden, and of course speed. How
strictly the database verifies the schema isn't even on the picture.

~~~
jlarocco
> SQL schemas are not expressive enough to describe the data types which I
> want to store in the database.

I'm incredibly skeptical that SQL doesn't support your data types but JSON and
RethinkDB do. A quick glance at the docs for PostgreSQL and RethinkDB, and the
RethinkDB types are a small subset of the PostgreSQL types. It sounds to me
like you just don't want to break out nested objects into separate tables.

[http://www.postgresql.org/docs/current/static/datatype.html](http://www.postgresql.org/docs/current/static/datatype.html)

[http://www.rethinkdb.com/docs/data-
types/](http://www.rethinkdb.com/docs/data-types/)

~~~
wereHamster
Neither database natively supports the types I need. The decision which
database to use was not at all based on that criteria. I suspect either one
would be suitable (from a purely technical perspective).

It is not realistic to create a separate table for each type. You'd end up
with hundreds of tables (and would have to replicate the types in two places).
You can only break down the types so much, at some point you'll want to have
columns with your own special types which you want to treat as primitives.

------
mberning
I love me some document based databases. But I find it ridiculous how much FUD
there is out there about them. After being severely burned by some "It's not
fully ACID" types in my organization I am seriously reluctant to even use them
on projects where they are ideally suited. Sorry for the rant. I think they
are wonderful tools, but I also think people should know about the political
implications of using them in organizations where 'old fashioned' or
conservative thinking might come in and ruin their day.

~~~
matthewmacleod
I'm not sure the level of FUD is unjustified.

I've seen far, far more cases of document databases being used in situations
that are entirely inappropriate, than in situations where they are ideally
suited.

I'd argue that the situations in which it's a good choice to use a document
database are pretty rare, and that you're almost always better off using a
bog-standard relational database.

~~~
ppj606
With the rise of realtime applications I'd argue that the situations for using
NoSQL for the common project are growing.

I think the author of the post has made a good point, that many seem to forget
- it is always a trade-off whichever tool you use.

~~~
moe
_With the rise of realtime applications_

What does "realtime" have to do with the datastore?

~~~
ppj606
The usage and access of data in your data store when building a realtime
application. If you were building a chat-app these days you are given a number
of very easy to adopt (minimal complexity/edge cases) NoSQL solutions -
Firebase, Parse, RethinkDB.

Whereas with SQL you have the option of using cache layers (which you could
say is NoSQL anyway), message queues, naive timely polling and log monitoring.

With NoSQL the solutions are just simpler. For example Firebase has a tutorial
for creating a real-time chat [1] - it takes 5 minutes.

[1]
[https://www.firebase.com/tutorial/#example/chat](https://www.firebase.com/tutorial/#example/chat)

------
jimmytucson
Some of these examples are not equivalent. For example,

    
    
        r
          .db('dragonball')
          .table('characters')
          .filter(function(row){
            return row('maxStrength').gt(700000).and(row('species').contains('Saiyan'));
          })
          .orderBy(r.desc('maxStrength'));
    

presumably returns one row per character, whereas

    
    
        SELECT c.* 
        FROM characters c
        INNER JOIN character_species cs ON c.id = cs.character_id
        INNER JOIN species s ON cs.species_id = s.id
        WHERE max_strength > 700000
        AND s.name = 'Saiyan'
        ORDER BY max_strength DESC;
    

returns _at least_ one row per character. Rather, the equivalent SQL
(depending on what syntax is available) might be:

    
    
        SELECT c.* 
        FROM characters c
        WHERE max_strength > 700000
        AND EXISTS (SELECT NULL
                    FROM character_species cs
                    INNER JOIN species s ON cs.species_id = s.id
                    WHERE c.id = cs.character_id
                    AND s.name = 'Saiyan')
        ORDER BY max_strength DESC;
    

The above returns one row per character, no matter how many matching entries
there are in `character_species` or `species`.

Pursuing this a little further then, suppose you wanted to expand your search
to include humans and androids. In SQL, only a slight modification is needed:

    
    
        SELECT c.* 
        FROM characters c
        WHERE max_strength > 700000
        AND EXISTS (SELECT NULL
                    FROM character_species cs
                    INNER JOIN species s ON cs.species_id = s.id
                    WHERE c.id = cs.character_id
                    AND s.name IN ('Saiyan', 'Human', 'Android'))
        ORDER BY max_strength DESC;
    

I shudder to think what that would look like in functional form. Something
like this?

    
    
        r
          .db('dragonball')
          .table('characters')
          .filter(function(row){
            return row('maxStrength').gt(700000).and(row('species').contains('Saiyan')).or(row('species').contains('Human')).or(row('species').contains('Android'));
          })
          .orderBy(r.desc('maxStrength'));
    

But how would it know how to group the disjunctions?

In my very limited experience, examples like these always look so tantalizing
until you start taking them a little further and then you quickly realize
there's no such thing as a free lunch. In this case, you pay for the
flexibility of RethinkDB with the expressiveness SQL.

~~~
atnnn
The RethinkDB query language can be very expressive.

What you shudder to think of could be expressed as

    
    
        row('species').contains(function(s){ r.expr(['Saiyan', 'Human', 'Android']).contains(s) })
    

Or also as:

    
    
        row('species').setIntersection(['Saiyan', 'Human', 'Android']).isEmpty().not()
    

The succintness of ReQL depends on that of the host language. JavaScript's
bulky syntax for functions and lack of operator overloading make queries a lot
more cluttered.

~~~
dkersten
Untested, but also something like this should work:

    
    
        r.expr(['Saiyan', 'Human', 'Android']).contains(row('specias'))

------
baldfat
I still struggle for why someone would go RethinkDB over Hardoop. If you have
the "Big Data" issue for using a Non-SQL DB (Another struggle of mine to find
a clear case for its use).

Why go RethinkDB when you limit what you can do with your data at the start.
You can't do so things but the big one for me is you really can't do
statistical analysis (Author pointed it out in the article). Why have a Non-
SQL that really hinders the biggest selling point for the non-SQL DB? Wouldn't
it just be better to use a SQL and expand the scheme horizontally???

Serious question that I don't understand.

~~~
mglukhovsky
RethinkDB and Hadoop solve fundamentally different use cases.

RethinkDB is designed for building scalable, realtime apps -- the query
language, realtime push features (e.g. subscribing to queries), and clustering
are purpose-built to help developers build and scale realtime apps.

Hadoop is designed for distributed processing of large data sets, and is
generally used for analytics workloads and data processing.

NoSQL really just means "not using SQL". A lot of projects fit under that
umbrella, including graph databases, key-value stores, document databases, and
analytics databases, and they all address very different use cases.

------
robodale
...and why the hell would you want to do this.

