

RethinkDB 1.7: hot backup, atomic set/get, 10x insert performance improvement - coffeemug
http://rethinkdb.com/blog/1.7-release/

======
astral303
The cache growing out of control and the server getting OOM-killed needs to be
fixed before I'd consider the database production-ready for large data sets.

[https://github.com/rethinkdb/rethinkdb/issues/97](https://github.com/rethinkdb/rethinkdb/issues/97)

It's a little disappointing that it has not been resolved for several releases
now.

Is there a target milestone for a production-ready release of RethinkDB? Is it
1.8 or 1.9?

~~~
throwit1979
_Is there a target milestone for a production-ready release of RethinkDB? Is
it 1.8 or 1.9?_

1.0 is supposed to mean production-ready. Are you suggesting this should be a
0.7 release?

~~~
hexedpackets
If I remember correctly, the RethinkDB team opted to match their in-house
version numbers with public releases. So the 1.x releases would be more
comparable to 0.x releases for other projects, and 2.0 would be considered
stable and production ready.

~~~
coffeemug
slava @ rethink -- this is exactly right. Sorry for the confusion.

------
xfour
+1 for the ease of export and import now!

Not directly related to the new version, but on the subject of upgrading: make
sure you run the migration scripts beforehand if you have an existing server.

I made the mistake of upgrading from 1.4 to 1.6 without migrating beforehand,
and I couldn't find a version of 1.4 since all the archives were down, and
building from source on a VPS just wasn't happening. The Rethink team was
amazing in their support, building the old version for me specifically; if
this is indicative of their dedication to user support, I can't wait to see
this become a big success.

~~~
coffeemug
slava @ rethink here. Thanks for the vote of confidence, and sorry for the
poor experience. The product is improving very quickly, and we decided to
trade off format stability for speed of development until we hit 2.0 (at which
point we'll freeze the formats and put more resources into backwards
compatibility).

~~~
simcop2387
This latest release has actually prompted me to check it out. I had been
meaning to, but hadn't found an exact use case for it yet; now I'm just going
to try to build something with it, though I don't know what yet.

This kind of dedication is definitely what makes me want to check it out. You
guys are doing some really great work on it.

------
RyanZAG
10x insert performance improvement seems very small considering this simple
benchmark someone did:

[http://stackoverflow.com/questions/15151554/comparing-mongodb-and-rethinkdb-bulk-insert-performance](http://stackoverflow.com/questions/15151554/comparing-mongodb-and-rethinkdb-bulk-insert-performance)

    
    
      MongoDB: 0m0.618s
      RethinkDB: 2m2.502s
    

Although as stated in that post, this is likely because of the different
fsync() policies between the two databases.
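The effect of an fsync-per-batch policy is easy to reproduce outside of either database. Here is a minimal plain-Python sketch (illustrative only, not the benchmark from the link; file names and batch sizes are made up) comparing buffered writes against writes that are forced to stable storage after every batch:

```python
import os
import tempfile
import time

def write_batches(path, batches, sync_each):
    """Write a sequence of byte batches, optionally fsync'ing after each one."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        for batch in batches:
            f.write(batch)
            if sync_each:
                f.flush()
                os.fsync(f.fileno())  # force this batch to stable storage
    return time.perf_counter() - start

batches = [b"x" * 4096] * 100  # 100 batches of 4 KiB each

with tempfile.TemporaryDirectory() as d:
    buffered = write_batches(os.path.join(d, "buffered.dat"), batches, sync_each=False)
    synced = write_batches(os.path.join(d, "synced.dat"), batches, sync_each=True)
    print(f"buffered: {buffered:.4f}s  fsync-per-batch: {synced:.4f}s")
```

On a typical spinning disk the fsync'd run is orders of magnitude slower, so a benchmark that pits a database which syncs before acknowledging against one that acknowledges from memory is largely measuring this policy difference rather than the databases themselves.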

~~~
jamesaguilar

        /dev/null: 0m0.001s
    

/dev/null is clearly the best database.

(If your performance numbers are too good to be true, they might not be true.)

~~~
alexpopescu
One aspect of micro-benchmarks that is ignored most of the time is that they
mainly reveal the different default settings the various systems ship with.
Unfortunately, after seeing the results, not many people dig deeper to figure
those defaults out.

~~~
jamesaguilar
Indeed. And the default settings reveal something about a system's priorities.
If you care about your users' data, then Mongo's default settings should be a
giant red flag. The revealed priority is _seeming_ fast.

~~~
amenod
In the NoSQL world, _any_ DB's default settings should be a giant red flag.
Don't get me started about HBase... I lost 2 hours of data that way. :(

~~~
coffeemug
That was actually one of the explicit design goals of Rethink -- pick defaults
such that users _never_ have to wonder about the safety of their data. I know
the folks at Riak are also in this camp, so there are definitely NoSQL dbs
that do this well.

~~~
shin_lao
I'm not sure Riak is a good example as it's another example of an extremely
slow database.

There's also the angle that if it's to offer no performance benefit, perhaps a
classic relational database will do it?

I'm curious to hear your thoughts about it.

------
charlieyuan
Is there a plan to fix: "If the machine is acting as a master for any shards,
the corresponding tables lose read and write availability (out-of-date reads
remain possible as long as there are other replicas of the shards). ...
Currently, RethinkDB does not automatically declare machines dead after a
timeout. This can be done by the user, either in the web UI, manually via the
command line, or by scripting the command line tools."

[http://www.rethinkdb.com/docs/advanced-faq/#what-happens-when-a-machine-becomes-unreachab](http://www.rethinkdb.com/docs/advanced-faq/#what-happens-when-a-machine-becomes-unreachab)

------
dkhenry
I have been working on the Java driver for Rethink for a bit now, and I will
tell you the DB seems like an awesome mix of Mongo and Riak. Mapping objects
into the DB is super easy, and doing datacenter-aware operations is amazingly
simple.

~~~
enjo
What does the state of that driver look like? It's literally the only thing
holding me back from moving over today.

~~~
dkhenry
Right now you should have complete functionality, but it's not fully tested,
and there are some "convenience" methods that need to be added to some of the
ReQL classes to make it mimic the official API (for example, row is not a
member function of the connection yet, so r.update(r.row("foo")) doesn't work
yet).

------
mglukhovsky
FYI-- for those looking to install on Ubuntu, we have a build problem with one
of our dependencies for the web UI. We're rebuilding the packages now; this
should take about half an hour on Launchpad. I'll update this comment when the
new package is available.

Edit: Packages have been uploaded to Launchpad, waiting in queue to build.
(12PM PST)

~~~
mglukhovsky
@atnnn confirms that Launchpad builds are now working and available-- Ubuntu
packages are live.

------
InAnEmergency
This is a really nice update. Migrating data between releases with a giant
Ruby script feels like a hack each time there's an update. Insert speed has
been a real annoyance. Fetching multiple keys is really nice (instead of
map/filter the entire thing). And expanding pluck() to nested documents makes
so much sense (I was worried ReQL would be limited to manipulating top-level
documents).

Overall, an exciting release. Going to upgrade and see what insert speed is
like on my setup.

~~~
InAnEmergency
For those interested, my insert time for ~250 records (~9.2MB of JSON) went
from 340 seconds to 20 seconds.

Unfortunately when migrating from 1.6.2 to 1.7.1 I lost a table (not sure how)
and all secondary indexes :(

------
kclay
Scala driver for anyone interested:
[https://github.com/kclay/rethink-scala](https://github.com/kclay/rethink-scala).
It's been fun working with Rethink (it's not updated for the 1.7 changes yet).

------
leif
Out of curiosity, where is the insert bottleneck now?

~~~
coffeemug
We'd have to investigate to know for sure, but if I had to guess, it's
probably in the storage engine/disk IO.

~~~
leif
Surely your SSD has more than 800 IOPS though. Have you done any profiling?

~~~
coffeemug
Sorry, I misunderstood your question. In this specific benchmark the
bottleneck is network and disk latency: the benchmark sends out a batch of
writes, waits for a server acknowledgement (which in hard durability mode
means waiting on the disk), and then sends out the next batch.

When we use a benchmark that doesn't bottleneck on latency (by adding more
concurrent clients, or by using noreply) the ops throughput approaches
theoretical IOPS throughput of the SSD.
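The distinction between a latency-bound and a throughput-bound benchmark can be sketched with a little arithmetic. The latency and IOPS figures below are hypothetical, purely for illustration:

```python
def synchronous_throughput(batch_size, round_trip_s):
    """A single client sends a batch, waits for the acknowledged (fsync'd)
    round trip, then sends the next batch: throughput is capped by latency."""
    return batch_size / round_trip_s

def concurrent_throughput(clients, batch_size, round_trip_s, device_iops):
    """Enough concurrent clients (or noreply writes) keep the device busy,
    so throughput approaches the SSD's IOPS limit instead."""
    offered_load = clients * batch_size / round_trip_s
    return min(offered_load, device_iops)

# Hypothetical numbers: 50 ms network + fsync round trip, SSD good for 40k writes/s.
print(synchronous_throughput(batch_size=100, round_trip_s=0.05))  # 2000.0 ops/s
print(concurrent_throughput(50, batch_size=100, round_trip_s=0.05,
                            device_iops=40_000))                  # 40000 ops/s
```

In the single-client case the device sits idle most of the time, so the measured number says more about round-trip latency than about the storage engine; only the concurrent case exercises the device's actual throughput.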

------
blantonl
Does anyone know the status of PHP drivers for RethinkDB? I'm surprised there
aren't official drivers out yet.

~~~
coffeemug
There is a pretty well maintained community-supported PHP driver --
[https://github.com/danielmewes/php-rql](https://github.com/danielmewes/php-rql).

------
wildchild
It's cool to see atomic operations. Any plans to implement multi-document ACID
transactions? Something like all_or_nothing in pre-0.9 CouchDB, where the
update was rejected in case of a conflict.

------
KaoruAoiShiho
Are you guys aware of any rethinkdb cloud hosts that are starting up?

~~~
coffeemug
Check out [https://www.rethinkdbcloud.com](https://www.rethinkdbcloud.com). (I
don't know about their current state, but it seems like an interesting
service)

~~~
ConceitedCode
I'm the main developer for RethinkDB Cloud. It's pretty rough at the moment.
You can expect 1.7 instances to be available over the weekend, along with
smaller shared instances and various other updates. If there are any
questions, or if anyone wants to be a part of the Heroku add-on testing, you
can contact me at cam@rethinkdbcloud.com .

------
mey
Can anyone compare/contrast RethinkDB to CouchDB and MongoDB?

~~~
alexpopescu
While both CouchDB and RethinkDB store JSON, the differences between them are
more radical. I cannot post as extensive a comparison as the one with MongoDB,
but here are some aspects.

 _Please keep in mind that this is not an authoritative comparison and it may
contain mistakes. Plus, as with many such systems, the aspects covered are in
reality not that easy to describe in just a few words_.

Platforms:

\- RethinkDB: Linux, OS X

\- CouchDB: where Erlang VM is supported

Data model:

\- both: JSON

Data access:

\- RethinkDB: Unified chainable dynamic query language

\- CouchDB: key-value, incremental map/reduce

Javascript integration:

\- RethinkDB: V8 engine; JS expressions are allowed pretty much anywhere in
the RQL

\- CouchDB: SpiderMonkey (?); incremental map/reduce, views are JS-based

Access protocols:

\- RethinkDB: Protocol Buffers

\- CouchDB: HTTP

Indexing:

\- RethinkDB: Multiple types of indexes (primary key, compound, secondary,
arbitrarily computed)

\- CouchDB: incremental indexes based on view functions

Sharding:

\- RethinkDB: Guided range-based sharding (supervised by the user)

\- CouchDB: -

Replication:

\- RethinkDB - sync and async replication

\- CouchDB - bi-directional replication can be set between multiple CouchDB
servers

Multi-datacenter:

\- RethinkDB - Multiple DC support with per-datacenter replication and write
acknowledgements

\- CouchDB - (?)

MapReduce:

\- RethinkDB: Multiple MapReduce functions executing ReQL or Javascript
operations

\- CouchDB: views are map/reduce but they need to be pre-defined

Consistency model:

\- RethinkDB: Immediate/strong consistency with support for out of date reads

\- CouchDB:
[http://guide.couchdb.org/draft/consistency.html](http://guide.couchdb.org/draft/consistency.html)

Atomicity:

\- both document level

Durability:

\- both durable

Storage engine:

\- RethinkDB: Log-structured B-tree serialization with incremental, fully
concurrent garbage compactor

\- CouchDB: B-tree

Query distribution engine:

\- RethinkDB: Transparent routing, distributed and parallelized

\- CouchDB: none

Caching engine:

\- RethinkDB: Custom per-table configurable B-tree aware caching

\- CouchDB: none (?)

~~~
apendleton
Are there any plans for Couch-style incrementally-computed aggregates/views in
RethinkDB?

~~~
alexpopescu
Considering RethinkDB's secondary indexes can be defined around pretty complex
ReQL expressions [1], you could already get some of this.

[1] [http://rethinkdb.com/docs/pragmatic-faq/#how-do-i-take-advantage-of-secondary-indexes](http://rethinkdb.com/docs/pragmatic-faq/#how-do-i-take-advantage-of-secondary-indexes)

~~~
apendleton
Yeah, I thought about that, but it seems like you can only use them to get the
"map" part of map/reduce... no aggregation. Unless I'm missing something.

