
RethinkDB 1.12: simplified map/reduce, ARM port, new caching infrastructure - coffeemug
http://rethinkdb.com/blog/1.12-release/
======
taf2
I don't understand why databases like RethinkDB and MongoDB don't just provide
a SQL interface. I get it - they are marketing themselves as NoSQL, but they
are in fact providing almost exactly the same set of features that, say, MySQL
provided back in the day: a simple, "fast" subset of SQL. Instead they use the
lack of an interface as a marketing gimmick... when in reality we have to
learn a new query language... :(

~~~
coffeemug
_> I don't understand why databases like RethinkDB and MongoDB don't just
provide a SQL interface._

Actually, it's a really good question. I'm one of the ReQL designers at
Rethink, and I was the one pushing for no SQL compatibility. Here is some of
my reasoning (we could talk about this for days, though):

    
    
      * Even SQL designers would tell you SQL isn't a very good
        programming language. It even looks like Cobol! Imagine if
        every language after Cobol decided to be backwards compatible
        -- what sort of world would we live in?
      
      * SQL is really bad for querying hierarchical data with lots of
        empty columns. The quality of experience you get from learning
        a new language designed for JSON, far outweighs the downsides
        of learning it. We're assuming people will be using ReQL
        fifteen years from now.
      
      * Designing a language that embeds into your programming language
        wasn't possible before -- but it is now. That means no more SQL
        injection, no more string manipulation, no more heavy
        ORMs. Well worth it in my opinion.
      
      * The chaining paradigm (largely pioneered and proven by jQuery)
        is magical for getting people to intuitively understand how to
        write complex queries. No more StackOverflow questions of "How
        do I do X in SQL?" because there is now an intuitive
        consistency to the language.
      
      * SQL compatibility is very, very difficult. The standard is hard
        to implement, and has lots of grey area around the edges. You
        can go for full bug-by-bug compatibility, which would take
        decades. Or you can go for basic compatibility, which confuses
        people. They try to port their application, it works for a
        while, and then breaks in some grey area in production, which
        results in a terrible user experience.
    

As an example, I'd ponder on why SQL has both the `where` keyword, and the
`having` keyword. This alone wouldn't be enough of a reason to design a new
language, of course, but this sort of thing permeates SQL. It's 2014. We can
do better.
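
To make the `where`/`having` point concrete, here is a minimal sketch in plain Python. It is not the ReQL driver: the `Query` class, its `filter`/`count_by`/`run` methods, and the sample data are all hypothetical. It only illustrates how a chained pipeline can use one filtering step both before and after grouping, where SQL needs two keywords.

```python
from collections import Counter


class Query:
    """Hypothetical chainable query over a list of dicts (not the ReQL API)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, predicate):
        # One filtering step, usable at any point in the chain.
        return Query(row for row in self.rows if predicate(row))

    def count_by(self, field):
        # Group rows by a field and emit one {field, "count"} row per group.
        counts = Counter(row[field] for row in self.rows)
        return Query({field: key, "count": n} for key, n in counts.items())

    def run(self):
        return self.rows


orders = [
    {"customer": "ann", "total": 30},
    {"customer": "ann", "total": 5},
    {"customer": "bob", "total": 50},
    {"customer": "ann", "total": 20},
]

result = (
    Query(orders)
    .filter(lambda row: row["total"] >= 10)  # SQL would spell this WHERE
    .count_by("customer")
    .filter(lambda row: row["count"] >= 2)   # SQL would spell this HAVING
    .run()
)
print(result)
```

Because each step returns a new `Query`, the filter after grouping is just another link in the chain rather than a separate language construct.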

~~~
dataminded
Background: I've done a fair bit of hiring for analysts and have worked in
most facets of analytics/DW including vendor evaluation/selection and
implementation.

I think that the lack of a SQL interface is a big negative. It's really hard
to find analysts (not programmers) with good technical skills or train them to
proficiency in new technologies. I would have a very hard time selling a
platform that existing analysts wouldn't be able to use right away and that
few people on the market could work with.

Also, designing a new language makes integration with other applications that
much harder.

~~~
coffeemug
That's a great point. Our hypothesis is that great querying capabilities for
analysts and great querying capabilities for programmers should look
completely different, and be two totally different technologies. We'll
eventually ship a solution for analysts, but I don't think a great database
product should shoot for a lowest common denominator.

------
slap_shot
RethinkDB is my favorite piece of my current tech stack. I tell everybody I
know about it, and I'm soon to release some blog posts and speak at a Meetup
showing off what it is good at and how to get started - I really just can't
say enough good stuff about these guys.

A high level overview from someone who has used it in production since 1.10
(about six months):

Pros:

* ReQL is a beautiful DSL that makes querying and using my data simpler than anything I've ever had before.

* An amazing UI lets you quickly do the things you do the most (verify a query, cherry pick some results, add/drop tables, indexes, shards, etc).

* Unparalleled support. During almost any reasonable hour, a RethinkDB employee will quickly field any question you have on #rethinkdb (freenode), their User Group, or email.

* Quick releases. These guys ship major releases every 60 to 90 days. Each release offers huge features, performance improvements, and bug fixes. They do a great job of listening to what people want and implementing them fast.

Cons:

* Database is still technically "beta" - great for side projects and prototyping, but be mindful if you intend to use it in production.

* Works great on tables of a few GB, but performance really degrades on the next order of magnitude. It looks like there were major changes in 1.12 addressing this.

* Only three officially supported clients (JS, Python, Ruby). That's a good start for their target market, but it is limiting for some.

RethinkDB is an archetype for startups - building what people want, shipping
fast, always talking to customers, and clearly passionate about what they do.
Even if you don't use their product, we all can learn from these guys. I hope
they do well.

~~~
weixiyen
Ditto. Also appreciated is the upfront honesty about what their tech can and
cannot do currently, and what they plan to do to address limitations / bugs.

I really love their ability to design sensible APIs paired with their honesty;
it gives me the confidence to begin using this software right now with belief
that it will eventually get to a very stable point.

Would not use it for any multi-million$ business just yet but it's perfect for
personal projects that require quick iteration which might in the future
become big.

------
farhanpatel
Has anyone used RethinkDB for a relatively large database (100GB+)? I can't
seem to find performance benchmarks anywhere.

~~~
egeozcan
That's something I've been searching for, too. I actually opened this page,
wishing that someone linked to a benchmark for large data sets. Large datasets
are the reason I'm trying to move away from mongodb completely, and I've been
very happy with postgres but I'm willing to give a try to others.

~~~
coffeemug
Hi, slava @ rethink here.

We'll be publishing benchmarks and case studies for large data sets in the
coming months. You might want to wait for those because there will be tons of
info, and lots of bugs that I'm sure are still lurking will be ironed out.
However, if you want to be one of the early adopters and give it a try, we'd
absolutely love your feedback and will work hard to incorporate it into the
product.

You can use our regular channels for feedback (rethinkdb.com/community), or
shoot me an e-mail any time -- slava@rethinkdb.com.

~~~
egeozcan
Hi, do you have any plans to support FreeBSD?

------
spyder
I like their API and the webadmin interface, but I tried 1.10 with the node.js
client (using the cpp protobuffer) and in my test it was slow compared to
MySQL (InnoDB). A simple select query with 100 items took around 70 ms, while
the same query was done in 3 ms with MySQL. I'm not sure whether I did
something wrong or it is just at such an early stage. Also, I cannot get
versions above 1.10 from the repo on CentOS.

~~~
coffeemug
Latency for range queries is a surprisingly tricky issue; there were a number
of problems around it that we solved. Check out this issue for example --
[https://github.com/rethinkdb/rethinkdb/issues/1766](https://github.com/rethinkdb/rethinkdb/issues/1766).
Most of the problems have been resolved in 1.12, so we'd encourage you to give
it a try again!

As far as the outdated CentOS build -- I haven't heard of this problem before,
but I'll check with @atnnn who's in charge of packaging -- see
[https://github.com/rethinkdb/rethinkdb/issues/2176](https://github.com/rethinkdb/rethinkdb/issues/2176).
We'll get this resolved ASAP.

EDIT: ok, @atnnn confirmed that 1.12 works on CentOS. If it doesn't for you,
could you post more details on the GitHub issue? It would help immensely!

------
dkhenry
I love the elegance of ReQL and how well it works in dynamic languages, but I
still can't figure out a good way to port that lambda syntax used in python
into the Java driver.

~~~
coffeemug
What about lambda expressions in Java 8? I haven't programmed in Java in
years, so unless there are limitations I'm not aware of, they seem like a
great fit.

Without those, doing an elegant driver would be pretty hard, which
unfortunately probably means we'll never get good ReQL support into legacy
Java.

~~~
dkhenry
So all the underlying communication is done through a protobuf. In Python, I
know they hijack the expression and can then construct the protobuf from the
passed-in Python lambda. In Java land we can't really do that, since lambdas
are desugared at compile time to just be functions. In Java I can't even
overload operators like you can in Scala or C++ to make it a little nicer. The
result is you're pretty much left with rolling the protobuf by hand.
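
A minimal sketch of the expression-hijacking idea described above, assuming nothing about the real driver's internals: the `Term` class, the `capture` helper, and the operation names are hypothetical. The lambda is called once with a placeholder object whose overloaded operators record what was done to it, so the result is a tree rather than a value.

```python
class Term:
    """Node in a hypothetical expression tree (names are illustrative only)."""

    def __init__(self, op, *args):
        self.op = op
        self.args = args

    def __getitem__(self, field):
        # row["age"] records a field access instead of performing a lookup.
        return Term("get_field", self, field)

    def __gt__(self, other):
        # row["age"] > 30 records a comparison instead of returning a bool.
        return Term("gt", self, other)

    def __add__(self, other):
        return Term("add", self, other)

    def to_tree(self):
        # Convert to nested tuples so the captured structure is easy to inspect.
        return (self.op,) + tuple(
            arg.to_tree() if isinstance(arg, Term) else arg for arg in self.args
        )


def capture(fn):
    # Call the user's lambda once with a placeholder standing in for the row.
    return fn(Term("var", "row"))


tree = capture(lambda row: row["age"] + 1 > 30).to_tree()
print(tree)
```

A driver built this way would then serialize the captured tree into whatever wire format the server expects. This relies on operator overloading, which is the part that has no direct equivalent in Java.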

~~~
coffeemug
Ah, got it!

The upcoming 1.13 release will introduce a pure JSON interface for queries
(which official drivers will switch to). You'll be able to construct JSON
directly, and send it to the server -- no protobufs. Would that make things
easier?

~~~
RyanZAG
Sounds like that would work well - it's the approach ElasticSearch has taken
with their APIs and they're generally very nice to use.

~~~
coffeemug
To clarify, in 1.13 JSON will be a communication protocol for queries. It's
not intended to be used by the end user, but by the client drivers. You _can_
use it as an end user, but it isn't nearly as nice as native language drivers.

The change is meant to simplify driver development, packaging, and improve
performance. Protobuf is worse than JSON in almost all of these categories.
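
As a rough illustration of what a driver might do with a JSON protocol, here is a sketch that encodes a nested query tree as JSON using only the standard library. The tuple representation, the operation names, and the `[op, [args]]` layout are assumptions made for this example, not the actual 1.13 wire format.

```python
import json


def to_wire(term):
    """Encode a nested (op, *args) tuple as JSON-ready lists (hypothetical layout)."""
    if isinstance(term, tuple):
        op, *args = term
        # Each node becomes [op, [encoded args...]]; plain values pass through.
        return [op, [to_wire(arg) for arg in args]]
    return term


# A hypothetical query tree: filter the "users" table on row["age"] > 30.
query = (
    "filter",
    ("table", "users"),
    ("gt", ("get_field", ("var", "row"), "age"), 30),
)

payload = json.dumps(to_wire(query))
print(payload)
```

In a language without operator overloading, a driver could build lists like these through ordinary method calls and hand the string to the connection, with no generated protobuf classes involved.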

------
knyt
> you no longer have to manually specify cache sizes for tables to prevent
> running over memory and into swap

I'm really glad you're addressing this. Does this auto-sizing apply to
RethinkDB's memory usage in general? The last time I tried using it on a small
VPS, I set my table's cache size really low, but I still ran out of memory
whenever I ran queries on a large table.

Definitely looking forward to secondary index export/migration and to upgrades
without the export-import cycle.

~~~
coffeemug
The autosizing only applies to the cache itself. There is a separate issue for
limiting the memory used by query data structures
([https://github.com/rethinkdb/rethinkdb/issues/1375](https://github.com/rethinkdb/rethinkdb/issues/1375)).
We'll try to address it soon, but the cache autosizing should take care of
most problems people have seen before.

I'm also looking forward to stable formats and seamless migration, but it's a
really hard problem. I think for the time being we'll introduce a stable
branch, and users will have to pick between new features or stability. It's
not ideal, but will give most people most of what they want.

------
transientbug
RethinkDB is pretty awesome. It's a blast working with it and the Python driver
is quite well rewritten, imo.

As the author of PyRethinkORM for Python, I found it ridiculously easy to write
compared to an ORM for, say, SQL, which was a major selling point for using
RethinkDB behind my last several projects.

I'm fairly excited about the ARM port as I've been wanting to use Rethink on a
few projects on my BeagleBone White/Black. The new map/reduce changes are
pretty cool too.

~~~
pandemicsyn
The python driver is cool (and PyRethinkORM was super handy!), but setting up
the driver to use the optimized protobuf backend was a PITA.

[http://www.rethinkdb.com/docs/driver-performance/](http://www.rethinkdb.com/docs/driver-performance/)

~~~
coffeemug
Hey, sorry you (and everyone else) ran into this! This will go away in the
next version (1.13), because we're removing the requirement for protobufs and
supporting a pure JSON transport interface. It turns out to be faster, easier
to set up, and, paradoxically, more space efficient.

See
[https://github.com/rethinkdb/rethinkdb/issues/1868](https://github.com/rethinkdb/rethinkdb/issues/1868)
for more details.

------
efge
Still no official Java driver? How unfortunate...

~~~
dkhenry
I know it's not an official driver, but I maintain the community Java driver
here[1]. If it's missing something you need, feel free to open an issue
and I can add it for you. I have been using RethinkDB with Java for a few
projects and it works well enough for me.

[1] [https://github.com/dkhenry/rethinkjava](https://github.com/dkhenry/rethinkjava)

------
RivieraKid
What are the three top practical reasons to choose RDB over a traditional
database such as Postgres?

~~~
coffeemug
There are two main reasons:

1. If you're dealing with JSON, a native JSON database provides a 100x better
programming experience than a relational system.

2. Scalability out of the box. You can just add a node, reshard, and keep
going. It's much harder in relational systems.

If you have lots of null columns, hierarchical relationships, or eventual need
for scale out, give RethinkDB a try.

If you have traditional rigid data or ACID requirements, stick with a
relational DB.

Hope this helps.

~~~
RivieraKid
Thanks, that definitely helps.

------
taybin
This is great. I love RethinkDB and had a lot of fun writing Erlang and Elixir
drivers for it.

------
dignati
I really like how they are making videos to announce news or advertise their
product.

------
jzelinskie
I liked the Evangelion reference in the example video.

------
leccine
Somebody pls fix the Clojure driver... :)

