
RethinkDB 2.0 is now production ready
http://rethinkdb.com/blog/2.0-release/
======
richardwigley
Is Rethink going to stay in the community? Or is there a chance that it could
be bought out? I don't want to spend time learning something and have it go
private like FoundationDB. I'm assuming GNU and Apache are a good thing?

How is RethinkDB licensed?

The RethinkDB server is licensed under the GNU Affero General Public License
v3.0. The client drivers are licensed under the Apache License v2.0.
[http://rethinkdb.com/faq/](http://rethinkdb.com/faq/)

~~~
danielmewes
Daniel @ RethinkDB here. As you mention, RethinkDB is fully open source, so
it's always going to remain freely available.

~~~
benatkin
I don't consider the AGPL to be _fully open source_. To me, it isn't in the
spirit of open source.

(BTW the Open Source Initiative was unable to get a trademark for "open
source", so it doesn't matter that they approved it.)

~~~
e12e
I hesitate to comment (we've had a few copyleft vs. BSD etc. discussions...) --
still, I think the best way to look at the AGPL is as the GPL patched to work
around the move from software distribution to software as a service: the end
user no longer gets a copy of the software, so the GPL doesn't protect the end
user any more. (The end user is who the GPL is for; that end user might also
be a developer, but that is incidental: the first freedom, Freedom 0, is the
freedom to _run_ code. You don't have that freedom with SaaS -- if the service
provider goes away, so does your ability to run the software.)

Now, one may or may not agree that the four freedoms are important, especially
as we increasingly live in a world where software is not only convenient but
necessary in our daily lives -- but the idea behind the AGPL, and why it is
needed for server software, is pretty clear.

------
notdonspaulding
Cool!

I've started to look into RethinkDB in the past, and I'm very interested in
the features it claims. However, I only have so much time to investigate new
primary storage solutions, and our team has been burned in the past by jumping
too quickly on a DB's bandwagon when the reliability, performance, or tooling
just wasn't there.

As of late, we've come to rely on Aphyr's wonderful Call Me Maybe series[0] as
a guide for which of a DB's claims are to be trusted and which aren't. But
even when Aphyr hasn't tested a particular DB himself, some projects choose to
use his tool Jepsen to verify their own claims. According to at least one
RethinkDB issue on GitHub, RethinkDB still hasn't done that[1].

Not to pooh-pooh the hard work of the RethinkDB team, but for me, the TL;DR
is NJ;DU (No Jepsen, Didn't Use)

[0] [https://aphyr.com/tags/jepsen](https://aphyr.com/tags/jepsen)

[1]
[https://github.com/rethinkdb/rethinkdb/issues/1493](https://github.com/rethinkdb/rethinkdb/issues/1493)

~~~
coffeemug
Slava @ Rethink here.

This is a great point, and we're on it! We have a Raft implementation that
unfortunately didn't make it into 2.0 (these things require an enormous amount
of patient testing). The implementation is designed explicitly to support
robust automatic failover, no interruptions during resharding, and all the
edge cases exposed in the Jepsen tests (and many issues that aren't).

This should be out in a few months as we finish testing and polish, and will
include the results of the Jepsen tests. (It's kind of unfortunate this didn't
make it into 2.0, but distributed systems demand conservative treatment).

~~~
sixdimensional
This conservative/consistent/responsible approach is one of the reasons I have
faith in RethinkDB. You always seem to be taking the time to build it right
and that is priceless.

------
cdnsteve
I'm going to give this a spin out of pure respect for the team that's
dedicated 5 years to a product without cashing out. Hats off. Your CEO has
some respectable... anatomy.

------
geddski
I've been using RethinkDB for a while now and I really enjoy working with it.
It's a great fit for React and Angular 2 apps with their one-way data flow
through the application. Hook up a store or a model to an event source
(server-sent events) that streams the RethinkDB changes feed and it's just
awesome and simple. Realtime shouldn't be this easy, totally feels like
cheating. Love it.

I also really like the ability to do joins, whereas before, with Mongo, I had
to handle joins at the app level.
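To sketch the pattern: each changefeed document just gets framed as a
server-sent event on its way to the browser. Pure Python, with a made-up
`changes` list standing in for a real changefeed cursor (real feeds yield
old_val/new_val pairs):

```python
import json

def sse_frame(change):
    """Frame one changefeed document as a server-sent event message."""
    return "data: " + json.dumps(change) + "\n\n"

# Stand-ins for documents a changefeed cursor would yield.
changes = [
    {"old_val": None, "new_val": {"id": 1, "score": 10}},
    {"old_val": {"id": 1, "score": 10}, "new_val": {"id": 1, "score": 12}},
]

for change in changes:
    print(sse_frame(change), end="")
```

On the browser side an `EventSource` just hands each `data:` payload to your
store or model, which is what makes the one-way data flow feel so natural.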

~~~
e12e
How do you deal with user authentication, authorization and data encryption?
Do you have a web server/application server or do you just combine static
js/html/css resources and RethinkDB?

I'm kind of enamoured with the idea of couchapps -- but I'm still not entirely
comfortable with having my db be my web and app server, as well as having it
manage passwords etc... as I'm reading up, I'm slowly convincing myself it's
possible to both make it work, be easy, support a sane level of TLS, load
balance _and_ be secure with proper ACL support... but very few
tutorials/books seem to really deal with that to a level that brings me
confidence.

~~~
jkarneges
By "an event source [...] that streams the RethinkDB changes feed", the parent
is implying a separate web service layer that consumes data from RethinkDB and
sends it out to clients. RethinkDB is not meant for direct access by clients.
More about RethinkDB access here:
[http://rethinkdb.com/docs/security/](http://rethinkdb.com/docs/security/)
(TL;DR: plaintext shared key or ssh tunnel)

------
evo_9
Now if only Meteor would support this all would be good in the world.

~~~
GordyMD
+1

RethinkDB's realtime capabilities would fit perfectly with Meteor.

~~~
nileshtrivedi
How? Meteor's server-side architecture is still oriented around polling the
DB, and I believe that's because many apps are still explicit request-response
oriented.

~~~
GordyMD
As @imslavko said, when using Meteor with MongoDB (which I believe is the only
production-ready DB driver) it observes the oplog [1] for changes. You can use
the polling observer too, though.

You can find out more about Meteor's Livequery core project on their site [2]
-- it basically says the implementation of live updates for each DB driver
depends on what the DB is capable of. RethinkDB and Firebase get specific
mention as DBs built to make realtime data something you get for relatively
little work.

[1]
[https://github.com/meteor/meteor/blob/devel/packages/mongo/o...](https://github.com/meteor/meteor/blob/devel/packages/mongo/oplog_observe_driver.js)

[2] [https://www.meteor.com/livequery](https://www.meteor.com/livequery)

------
Xorlev
Congrats on the 2.0! It's been interesting to watch as a project.

Do you expect that as you stabilize you'll officially support more drivers? Or
are you going to leave that as a community effort?

~~~
coffeemug
Slava @ Rethink here.

We're planning to take the most well-supported community drivers under the
RethinkDB umbrella (assuming the authors agree, of course). It will almost
certainly be a collaboration with the community, but we'll be contributing
much more to the community drivers, supporting the authors, and offering
commercial support for these drivers to our customers.

~~~
tshannon
That's good to hear, because the Go driver has been tirelessly maintained by
a single developer
([https://github.com/dancannon/gorethink](https://github.com/dancannon/gorethink))
Dan Cannon, and I'm sure he (as well as the Go community) would love to see
some support.

~~~
thethimble
Ditto regarding a Java driver. It seems crazy for a database not to provide
native Java support.

~~~
shockzzz
JAVA

------
mping
Anyone have some numbers on performance? I tried RethinkDB 1.x and the
performance wasn't quite there yet, especially bulk import and aggregations.

~~~
coffeemug
We'll be publishing a performance report soon (we didn't manage to get it out
today).

Rough numbers you can expect for 1KB size documents, 25M document database:
40K reads/sec/server, 5K writes/sec/server, roughly linear scalability across
nodes.

We should be able to get the report out in a couple of days.

~~~
lobster_johnson
Any work done in 2.0 for improving aggregation performance?

The last time I tried with 1.16, I gave up my testing when even the simplest
aggregation query (count + group by with what should be a sequential,
streaming scan) took literally minutes with RethinkDB, compared to <1s with
PostgreSQL. Rethink coredumped before I gave it enough RAM, after which it
blew up to around 7GB, whereas Postgres uses virtually no RAM, mostly OS
buffers.

~~~
danielmewes
We made a couple of scalability improvements in 2.0, but didn't optimize
groups and counts specifically.

Would you mind writing me an email with your query, or opening an issue at
[https://github.com/rethinkdb/rethinkdb/issues](https://github.com/rethinkdb/rethinkdb/issues)
(unless you already have)? I'd like to look into it to see how we can best
improve this.

We're planning to implement a faster count algorithm that might help with
this
([https://github.com/rethinkdb/rethinkdb/issues/3949](https://github.com/rethinkdb/rethinkdb/issues/3949)),
but it's not completely trivial and will take us slightly longer to implement.

~~~
lobster_johnson
What I was doing is so trivial, you don't really need this information. This
was my reference SQL query:

    select path, count(*) from posts group by path;

(I don't have the exact Rethink query written down, but it was analogous to
the SQL version.)
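The semantics are trivial to state in pure Python, which is why the slowdown
was so surprising -- in ReQL it would presumably have been something like
`r.table('posts').group('path').count()` (again, I don't have the exact
query):

```python
from collections import Counter

# Made-up sample documents; the real dataset had ~1M rows and 94 paths.
posts = [
    {"path": "/a", "body": "..."},
    {"path": "/a", "body": "..."},
    {"path": "/b", "body": "..."},
]

# A per-group count is all the query computes.
counts = Counter(doc["path"] for doc in posts)
print(dict(counts))  # → {'/a': 2, '/b': 1}
```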

You can demonstrate RethinkDB's performance issue with any largeish dataset by
trying to group on a single field.

The path column in this case has a cardinality of 94, and the whole dataset is
about 1 million documents. Some rows are big, some not; each has metadata plus
a JSON document. The Postgres table is around 3.1GB (1GB for the main table +
a 2.1GB TOAST table). Postgres does a seqscan + hash aggregate in about
1500ms.

It's been months since I did this, and I've since deleted RethinkDB and my
test dataset.

~~~
danielmewes
As a second data point: I tried

    table.groupBy(function (x) { ... }).count()

where the function maps the documents into one of 32 groups (so that's fewer
than your 94, but it shouldn't make a giant difference... I just had this
database around). I did that on both a 1 million and a 25 million document table,
and memory usage looked fine and very stable. This was on RethinkDB 2.0, and I
might retry that on 1.16 later to see if I can reproduce it there.

Do you remember if you had set an explicit cache size back when you were
testing RethinkDB?

~~~
lobster_johnson
Cool. Well, the process eventually crashed if I used the defaults. I had to
give it a 6GB cache (I think, maybe it was more) for it to return anything.
The process would actually allocate that much, too, so it's clear that it was
effectively loading everything into memory.

------
mberning
For the rubyists out there check out
[http://nobrainer.io/](http://nobrainer.io/)

~~~
sandstrom
Is anyone using NoBrainer in production?

We're currently using Mongoid (a MongoDB ORM), and an ActiveRecord-like ORM
for RethinkDB is the main thing holding us back.

I don't have great insight into NoBrainer, but last I checked it seemed like
joins weren't implemented (but were on the roadmap).

~~~
vdaniuk
I like RethinkDB and have been successfully using their official Ruby and JS
libraries for some time.

The NoBrainer ORM wasn't fun, though -- too many edge cases that interfere
with ActiveRecord and Rails conventions. Going a bit on a tangent: after many
experiments I've developed a strong conviction that Postgres is the best
database choice for Rails, especially with the jsonb datatype included in 9.4.
It is the best of both worlds: a reliable, proven SQL DB that plays really
well with Rails and has NoSQL capabilities, including indexing and querying.
So good. YMMV.

------
expando
Selling support is a great non-intrusive business model.

~~~
ThinkBeat
Except that it incentivises a company to build a product that requires
continuing support.

That can be a good thing or a bad thing.

~~~
coffeemug
_> Except that it incentivises a company to build a product that requires
continuing support._

People say this a lot, but in our case we really haven't seen this incentive
for a couple of reasons.

Large organizations are more than happy to pay for training and development
support to accelerate their time to market. It doesn't matter how polished
your product is -- databases are complex enough that people are willing to pay
for best practices, training, and support.

Similarly, databases are pretty critical pieces of the infrastructure. If
anything goes wrong, it can significantly impact the business, so people
always want operational/production support.

There are many enterprise services that can be built on top of the product
that can be very valuable. You don't have to build a crappy product -- there
are plenty of ways to monetize with a great product.

Finally, a bad product will significantly limit growth of the company in the
long term. There are lots of options now -- you can't get away with building a
crappy product and an artificial monopoly.

If you see a crappy product from a company that offers subscription support,
it's probably not because of misaligned incentives. Building databases is
_really_ hard, I don't think the business model has much to do with it.

------
xtrumanx
Lots of congratulating on this thread and a hell of a lot of points for a
software release. I've been on HN consistently for a long while and I didn't
realize there was so much love and hype for RethinkDB here.

Have I missed something?

~~~
andrewflnr
I guess you have. There are a lot of us into alternative databases that are
hoping for Rethink to fulfill the original promise of MongoDB. That said, I
can't blame you for not devoting a bunch of attention to it. :)

~~~
jasondc
Can you be more specific on the original promise MongoDB didn't fulfill?

~~~
shockzzz
It's a nightmare to scale and has performance quirks that are really
unexpected. Many, many companies have had to spend enormous amounts of
developer time to migrate off of MongoDB to something else.

~~~
andrewflnr
That and the unreliable-by-default write settings.

~~~
e12e
I think MongoDB is following a path similar to MySQL's. Be really good at one
thing, market it as something else -- and then slowly, slowly catch up to the
hype (sort of).

As I understand it, MongoDB has changed the default settings (probably why
someone downvoted you) -- but the fact that it _was_ off by default is still
something that is rightfully hard for the team to live down.

And while a lot of people are probably still happily using MySQL -- I
personally see little use for it when PostgreSQL is an option.

I may be wrong, but I think both MongoDB and MySQL appeal to the same groups:
people that don't know or care about normalization, databases and data
structures -- and really just want image-based (as in Smalltalk) development,
but have been tricked into using PHP/JavaScript etc.

It's kind of crazy that you have two mature (one Free, one free) object
databases that have seen some real-world usage -- and neither gets any love.

One is ZODB, the Z Object Database, developed for Zope/Plone -- one of the
first web application frameworks -- and a major contributor to Python (it
invented eggs, buildout...). It's ridiculously easy to use outside of
Zope/Plone/Pyramid[1] and now has a free replication service[2].

The other one is GemStone GLASS[3], which works with Smalltalk and has its
own Ruby runtime, MagLev[4].

[1]
[http://zodborg.readthedocs.org/en/latest/documentation/artic...](http://zodborg.readthedocs.org/en/latest/documentation/articles/ZODB1.html)

[2] [http://www.zope.com/products/x1752814276/Zope-Replication-
Se...](http://www.zope.com/products/x1752814276/Zope-Replication-Services)

[3] [http://seaside.gemtalksystems.com/](http://seaside.gemtalksystems.com/)

[4] [http://maglev.github.io/](http://maglev.github.io/)

------
dkhenry
Awesome news. I have used RethinkDB for a few internal projects, and while I
don't think it has that one "killer feature" that other DBs don't, it is such
a painless experience in development and deployment that it's just worlds
better than trying to set up and scale some of the other solutions.

BZ, RethinkDB team.

------
kolencherry
Congrats on the 2.0 release! Changefeeds are an _incredibly_ powerful feature.
We're looking forward to the next release with automagic failover!

------
_dancannon
Congratulations, been looking forward to this release for a while!

~~~
straik
I think this is a good place to say thank you for your work on the Go
RethinkDB driver. It's a clearly written, easy to follow, and effective piece
of code.

~~~
_dancannon
Thank you very much! I hope to have an update to the Go driver which supports
RethinkDB v2.0 within a couple of hours.

~~~
nulltype
I would like to thank you as well! I didn't really have any time to work on
rethinkgo after I made the first version; thanks for doing such a good job
with gorethink.

------
billclerico
congrats Slava, Mike & team. in an age of thin apps getting shipped in weeks
or months, the patience you showed in spending 5 years developing some pretty
hard-core technology is amazing. really excited for you guys!

------
gauravphoenix
Any plans for releasing an officially supported Java driver? For most
enterprise-oriented apps, having an officially supported Java driver would be
great.

~~~
coffeemug
Yes! No ETA yet, but we're on it.

------
dorfsmay
Does RethinkDB have a concept of transactions? My question is actually about
restoring a lost node... If a node is rebooted, is all the data for its shards
going to be sent again? Or just the delta?

Similarly if I have to rebuild a node from scratch, is there a way to prime it
so that a massive copy of all the data in the cluster gets copied to it from
the other nodes?

~~~
coffeemug
> If a node is rebooted, is all the data for its shards going to be sent
> again? Or just the delta?

Just the delta. We built an efficient, distributed BTree diff algorithm. When
a node goes offline and comes back up, the cluster only sends a diff that the
node missed.

> Similarly if I have to rebuild a node from scratch, is there a way to prime
> it so that a massive copy of all the data in the cluster gets copied to it
> from the other nodes?

You don't have to do that; it happens automatically. You have full visibility
into and control over what's happening in the cluster -- check out
[http://rethinkdb.com/docs/system-tables/](http://rethinkdb.com/docs/system-tables/)
for details on how this works.

~~~
dorfsmay
> You don't have to do that, it happens automatically

Well, in a past life I used another store that did that automatically. The
issue with that is that EITHER it kills the cluster because of read congestion
as it rebuilds the "new" node, OR, if you limit the bandwidth for node
rebuilding, it takes forever and a half to rebuild a node, which means that
you are exposed with one less copy of what was on that node.

What are the chances of a filesystem snapshot being consistent enough to be
used to prime a crashed node? What about restoring backup files from other
nodes?

~~~
coffeemug
Congestion vs. time is definitely a hard problem. We've done an enormous
amount of tuning to make this work, and the upcoming Raft release does even
more. This part has been quite solid for a while, so I think you might have a
better experience with RethinkDB than what you're used to.

There is currently no other way to prime the node -- I hope we don't have to
add it. This sort of functionality should work out of the box.

------
thoughtpolice
I've updated NixOS to include 2.0.0-1:
[https://github.com/NixOS/nixpkgs/commit/fe6ec3d13a1554458e64...](https://github.com/NixOS/nixpkgs/commit/fe6ec3d13a1554458e647511fb364b65572b363a)
\- any way we can get it mentioned on the website?

~~~
coffeemug
Could you submit a pull request to the docs?
([https://github.com/rethinkdb/docs](https://github.com/rethinkdb/docs))

------
cookiecat
Congrats guys, RethinkDB has been a joy to use so far, but the 3rd party .net
driver needs some help. I filed an issue here:
[https://github.com/rethinkdb/rethinkdb/issues/3931](https://github.com/rethinkdb/rethinkdb/issues/3931)

------
nickstinemates
Big fan of RethinkDB. Use it in all of my projects these days.

~~~
vonklaus
What were you using before? What are the pros and cons of the switch?

------
DAddYE
I'm very happy to see this milestone. Even though I haven't used it recently,
I remember that 2-3 years ago we tried it (adtech) for some heavy production
workloads. Even though we chose another product (Cassandra), I was genuinely
surprised by how well it performed! Congrats!

------
wilsonfiifi
Well done, guys! I'd been wanting to use RethinkDB for my project, but it
didn't have the "production ready" tag, so MongoDB was chosen instead. Now I
can confidently switch! It's a pity the Go driver isn't quite there yet,
though.

~~~
_dancannon
I hope to have a "production ready" version of the driver ready in about a
month. I know it's slow, but currently I am the only dev working on
maintaining this project, and all work is done in my free time.

If you have any further questions I would be more than happy to answer them on
[https://gitter.im/dancannon/gorethink](https://gitter.im/dancannon/gorethink).
Thanks!

~~~
wilsonfiifi
Hey, no worries, I absolutely understand. Apologies for lamenting the
pace/state of your contribution, and thanks for your time and effort.

------
aioprisan
the commercial services launch is critical and will speed adoption from large
players

------
dorfsmay
Why would I use RethinkDB instead of OrientDB?

~~~
coffeemug
Check out [http://rethinkdb.com/faq/](http://rethinkdb.com/faq/) for details
on when RethinkDB is a great choice. The short version is that if you're
building realtime apps, RethinkDB is an awesome choice because it pushes data
to the application (which makes building and scaling realtime apps
dramatically easier).

~~~
ScottBurson
Hi Slava, the FAQ has a typo in the second sentence: "architecutre".

~~~
coffeemug
Thanks -- fixed. Will take a little bit to push the site update live.

~~~
mmcclellan
But ... but I expected the push to be real time. Just kidding.

------
nviennot
Lots of hard work has been poured into this release :)

Congrats to the RethinkDB team!

------
jmtame
Congrats Slava, Mike and the rest of the folks at RethinkDB!

------
covi
Brilliant name (Yojimbo) and great cover photo there...

------
ataussig
Congrats to the RethinkDB team on this huge milestone!

------
babo
Looking forward to installing it from Homebrew, but it's not there yet. Good
to see that the Python driver on PyPI is already updated!

~~~
coffeemug
It should be out later today. We're working on it now.

------
Fauntleroy
Now that 2.0 is production ready, will we be seeing some RethinkDB providers?
A simple Heroku integration would be amazing for quickly prototyping apps with
a new database technology.

~~~
jkarneges
As Slava mentioned, you can use Compose.io. It requires using an SSH tunnel,
though, which is a little tricky in Heroku. Here's a tunnel script I made to
simplify this:

[https://github.com/fanout/leaderboard/blob/master/tunnel.py](https://github.com/fanout/leaderboard/blob/master/tunnel.py)

In particular, it reads the entire SSH private key as an environment variable,
so you don't need to commit the key to the git repository.
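The trick amounts to materializing the key into a private temp file at
startup so `ssh` can read it. A sketch (not the actual tunnel.py; the
`SSH_PRIVATE_KEY` variable name here is just an example):

```python
import os
import stat
import tempfile

# In a real deployment the key would be set in the Heroku config, not here.
os.environ.setdefault(
    "SSH_PRIVATE_KEY", "-----BEGIN KEY-----\n...\n-----END KEY-----\n"
)

# Write the key to a private temp file; it never lives in the git repo.
fd, key_path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write(os.environ["SSH_PRIVATE_KEY"])

# ssh refuses keys readable by other users, so force 0600 permissions.
os.chmod(key_path, stat.S_IRUSR | stat.S_IWUSR)
```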

------
jessejhernandez
Congrats Mike & Team!

------
jjsalamon
Congrats guys! I've been looking forward to using Rethink.

Is windows support coming anytime?

------
Ciantic
I wish they had official TypeScript definition files. I'm a bit wary of
relying on a huge DB API with only community definitions.

There are reasons to write TypeScript definitions for documentation
generation too, even if you don't write your code in TS.

~~~
coffeemug
There is an official spec here: [http://rethinkdb.com/docs/writing-
drivers/](http://rethinkdb.com/docs/writing-drivers/) Not quite TS, but it's
well defined and new releases of the spec are carefully managed.

~~~
Ciantic
For me, TS is a tool to ensure my code is not using deprecated APIs. This is
partly why Facebook is also pushing typing into JS with Flow.

Edit: And Guido is pushing it into Python with PEP 484:
[https://www.python.org/dev/peps/pep-0484/](https://www.python.org/dev/peps/pep-0484/)

It's an inherent problem with dynamic languages: you have to read all the new
release notes and migrate your code. With typed code I can at least be
somewhat sure I'm not using deprecated calls and such just by compiling.

------
jkot
Congratulations!

------
cachvico
Any thoughts about multi-doc transactions?

~~~
danielmewes
It's not currently on our road map.

Even though there are some well-researched algorithms for it, actually
implementing transactions in a distributed system is pretty hard. It also
comes at significant performance costs, which would interfere with our goal of
easy and efficient scalability.

~~~
cachvico
Thank you for the comment. I was wondering if something along the lines of
[http://blog.labix.org/2012/08/22/multi-doc-transactions-
for-...](http://blog.labix.org/2012/08/22/multi-doc-transactions-for-mongodb)
would be feasible.

------
weixiyen
Congratulations guys! Amazing update :D

------
hemantv
This is awesome :)

------
thomcrowe
Congrats guys!

