
RethinkDB 1.14: binary data, seamless migration, and Python 3 support - coffeemug
http://www.rethinkdb.com/blog/1.14-release/
======
sigzero
Reading through some of their docs, I like their honesty about where RethinkDB
is and isn't. In the RethinkDB vs. MongoDB comparison, they say:

"RethinkDB's performance has degraded significantly after the addition of the
clustering layer, but we hope we'll be able to restore it over the next
several releases."

They could have tried to gloss over that. I like that they weren't afraid to
put that out there.

~~~
coffeemug
Thanks! Just FYI, this statement is now seriously out of date. We've done an
enormous amount of performance work since the comparison document was written,
and performance is now back to normal. Once we get some scientific graphs to
back up anecdotal evidence, we'll update this part of the doc.

~~~
sigzero
I pretty much knew that when I read it. I just posted it as an example of their
transparency.

------
jpgvm
I have been using RethinkDB in production with clustering.

One thing I would really love to see more time put into is integration of a
real consensus protocol like Paxos or Raft for properly recovering from
failure.
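For context on why a consensus protocol matters for failover: Raft and Paxos make progress only while a strict majority of voting replicas can talk to each other, so a cluster of n replicas survives floor((n - 1) / 2) failures. A minimal arithmetic sketch in Python follows; the function names are hypothetical and not part of RethinkDB or any driver.

```python
def majority(n: int) -> int:
    """Smallest number of replicas that forms a strict majority of n."""
    if n < 1:
        raise ValueError("a cluster needs at least one replica")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Replicas that can fail while a majority is still reachable."""
    return n - majority(n)


def can_elect_leader(alive: int, n: int) -> bool:
    """A Raft/Paxos-style election can only succeed if a majority is alive."""
    return alive >= majority(n)


if __name__ == "__main__":
    # Print cluster size, quorum size, and tolerated failures side by side.
    for n in (1, 2, 3, 4, 5):
        print(n, majority(n), tolerated_failures(n))
```

One consequence of this arithmetic: a 4-replica cluster tolerates no more failures than a 3-replica one, which is why odd replica counts are the usual choice.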

Currently tables will become unwritable after a single machine fails and you
need to update the blueprint/semilattice to use a new master manually. This is
error prone (due to race conditions in replication of blueprint updates that
can cause vector clock conflicts if you update it from more than one place)
and generally not all that fun.
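To illustrate the kind of conflict described above: with vector clocks, two updates made independently from different places end up "concurrent", since neither clock dominates the other, so the system cannot order them without a separate resolution rule. The following is a generic Python sketch of vector clock comparison, not RethinkDB's actual blueprint code; the node names `server-a` and `server-b` are hypothetical.

```python
from typing import Dict

# A vector clock maps a node ID to the number of updates that node has made.
VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` with `node`'s counter bumped by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify two clocks as 'equal', 'before', 'after', or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither update saw the other: a conflict


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Pointwise maximum: the clock a resolved value would carry after seeing both."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


if __name__ == "__main__":
    base: VectorClock = {}
    # Two admins edit the same blueprint from different servers.
    edit_1 = increment(base, "server-a")
    edit_2 = increment(base, "server-b")
    print(compare(edit_1, edit_2))  # concurrent
```

The sketch only detects the conflict; deciding which value wins (the part that was manual in 1.x) is a separate policy layered on top.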

Error handling in these cases could also be better. The initscript it ships
with doesn't help much (it would be nicer if it used a symlink hack or
something similar to have a separate initscript for each DB instance).
Currently it's hard to tell why a RethinkDB server crashed or failed to start
without manual inspection.

That being said, I love using RethinkDB. ReQL is great; in fact, it's so good
that it's generally justification enough for me to use it for new projects.

If clustering could be given more love, it would probably be my #1 datastore
for all projects. (See aphyr's Jepsen series for an idea of what I am looking
for here.)

~~~
coffeemug
I'm happy you brought up clustering. Internally we've been quite frustrated
with this part of the product, but until a few months ago we held off on
developing it for two reasons: we wanted to collect more information from
users on real use cases and behavior, and there were more immediate
bottlenecks in the product.

We restarted heavy development on the clustering infrastructure two months
ago, and just yesterday I played with the prototype of the first upcoming
upgrade. It's a WIP but is absolutely delightful (you can see my tongue-in-cheek
review of it at
[https://github.com/rethinkdb/rethinkdb/issues/2957](https://github.com/rethinkdb/rethinkdb/issues/2957)).

Here are the parts that are already done and will be shipped soon:

    
    
      - Vector clock conflicts are now resolved automatically, no
        more manual conflict resolution
      - There is now a ReQL API for clustering that's dramatically
        better than the current `rethinkdb admin` tool
      - Much love has been put into presenting the abstractions to
        users. Everything is cleaned up and simplified, it's easier
        to understand and change, and even in advanced cases you
        won't have to know anything about blueprints/semilattices.
      - Really, this is about to get dramatically better. I can't
        summarize it in a bullet point, we put an enormous amount of
        effort into this in a thousand different places.
    

Here's what's coming immediately after that:

    
    
      - Automatic failover
      - Always-on resharding (no more resharding downtime)
      

(These latter updates are coming after the API overhaul because they require a
lot of simplification/refactoring/redesign internally as well as externally,
and we wanted to do it piecemeal.)

Thanks for writing up your feedback and sticking with RethinkDB despite the
limitations of clustering 1.x. Multiple people are currently working very hard
on this, and things are about to get _a lot_ better.

~~~
dkersten
This makes me very happy to hear. I've been using RethinkDB for a few months
now and I love it, but the manual failover has made me a bit uneasy.

Any chance of a clue as to the kind of timeframe for this? Even a very rough
idea would be fine as I can appreciate you might not want to (or be able to)
commit to anything yet.

~~~
coffeemug
I think we'll be able to ship the new clustering API in ~two months (note,
it's a huge and massively delightful change). I'm hoping we'll be able to
get failover out two months after that, but it's hard to give precise
estimates looking that far out.

~~~
dkersten
Fantastic, thank you.

 _it's hard to give precise estimates looking that far out_

Absolutely! I just wanted some kind of ballpark idea, which you've given me -
thanks!

------
CitizenKane
Been using RethinkDB in "light" production since February. It's a joy to use,
and it has been running rock solid since that time.

Even though it's suggested to use RethinkDB on a 2GB+ machine
([http://rethinkdb.com/faq/#what-are-the-system-requirements](http://rethinkdb.com/faq/#what-are-the-system-requirements)),
it has run just fine on a machine with 1GB of memory.

Looking forward to all the new changes. I'm especially happy about the binary
support with r.http, going to make life easy!

------
coffeemug
Hey all. Slava here @ RethinkDB.

If you have any questions about the release, product roadmap, or anything
RethinkDB-related, please ask! I'll be around all day to answer questions and
incorporate your comments into the development roadmap.

~~~
DAddYE
Hi and thanks for this!

What are the plans for supporting Java/Clojure/Scala and Go?

~~~
kclay
Oh, don't forget the Scala one
[https://github.com/kclay/rethink-scala](https://github.com/kclay/rethink-scala) :D

~~~
coffeemug
Arghh, sorry. I haven't used Java in years, so all JVM-related
languages/drivers are kind of mashed together in a single bucket in my mind. I
really ought to play with scala/clojure to unmash them.

------
tnjm
Hearty congratulations to all. I'm using RethinkDB in production, and it's a
joy.

The only annoyance so far has been data migration between versions; the
elimination of this is excellent news.

If you're considering taking Rethink for a spin, now is a better time than any
before.

~~~
coffeemug
Thanks for being a user! If you run into any issues or have feature requests,
please post on
[https://github.com/rethinkdb/rethinkdb/issues](https://github.com/rethinkdb/rethinkdb/issues)
(we're planning 1.15 now and can still reshuffle priorities!)

------
simi_
Even with Docker, the upgrade process wasn't that painful. I just used the
Dockerfile from here [1] to create a new xxx/rethinkdb:1.14 image, then booted
it up and everything worked automagically. I did have to rebuild secondary
indexes, but no biggie: after I upgraded the brew rethinkdb package, it was a
simple `rethinkdb index-rebuild -c host:port -r db.table` away. Hopefully it
will get even better in the future.

1:
[https://registry.hub.docker.com/u/dockerfile/rethinkdb/docke...](https://registry.hub.docker.com/u/dockerfile/rethinkdb/dockerfile/)

~~~
coffeemug
FYI, you don't _need_ to rebuild the indexes. You can continue running your
app and the old index protocol will work seamlessly. It helps to rebuild when
you upgrade in case you start using new functionality (so the indexes get
computed correctly), but running an old system would work fine.

EDIT: note that you won't necessarily be able to run with indexes from
multiple versions back. We'll almost certainly prune the backwards
compatibility code to only a few releases to keep the codebase clean and
nimble, and to keep difficult-to-diagnose bugs to a minimum. But it's still a
very convenient feature release-to-release, as you can upgrade, and then
rebuild indexes at your convenience.

------
z3ugma
I love rethink and ReQL and especially the intuitive Python bindings - but I
can't wait for geospatial indexing. It's the last thing preventing me from
using RethinkDB in prod.

~~~
mglukhovsky
Good news on that front: geospatial support has already passed code review and
been merged into the codebase (check out this issue to learn about the
implementation and progress:
[https://github.com/rethinkdb/rethinkdb/issues/2571](https://github.com/rethinkdb/rethinkdb/issues/2571)).

There are a few limitations we'd like to work out (e.g. right now the
implementation doesn't support compound geo indexes), but we're well on track
to ship it in 1.15.

------
ukd1
If you want to come and learn more about RethinkDB, there is a meetup early
next month in San Francisco:
[http://www.meetup.com/RethinkDB-SF-Meetup-Group/events/20080...](http://www.meetup.com/RethinkDB-SF-Meetup-Group/events/200802562/)

~~~
amiraliakbari
I'm very interested in looking into the code as a sample of a real-world
distributed system implementation; I guess it would be much more informative
than our university's distributed systems course. Is there any guide available
for the source code besides src/README?

~~~
coffeemug
Unfortunately, there isn't a good guide yet (but hopefully there will be one
soon).

Studying RethinkDB source code may not be the best way to study distributed
systems, though. There is an enormous gap between a working system and a
production quality product, and most of the code in that gap has to do with
relatively mundane issues like error checking/error reporting, monitoring
tools/APIs, lots of polish, edge case handling, etc.

It's a lot of fun to get into the guts of the system, but it's fairly large,
so it's a non-trivial undertaking. If you do decide to do it, we'd love to
help you out on IRC (#rethinkdb on freenode), and would appreciate it if you
documented your experience so we could make the process easier for others in
the community!

------
ahoge
I'm really excited about binary data support. The timing is amazingly
convenient. It's exactly what I needed right now.

Coming from CouchDB, support for attachments (=binary data) was the only thing
I was missing.

------
zallarak
Wow - people have so much positivity in sharing their experience with
RethinkDB. Congrats to the team for building something great, that others find
utility in and enjoy working with.

------
orkj
Seamlessly migrating dev db as we speak!

Eh, update. After looking back in my terminal window when I wrote that
sentence, it seems this is already done. All I can say is: :o

I've been upgrading through versions since the early days back in 1.2, and this
has of course been one annoyance. Just want to say to the team: awesome job!

Thanks again

------
weixiyen
Congrats! Using Rethink in production, solid so far.

------
hardwaresofton
Great work guys! So glad to see seamless migration added!

Congratulations to the team

------
DAddYE
I'm a fan of RethinkDB, but I didn't have a chance to use the early releases.
I'm glad to see so much positivity. Can you share your thoughts and your
production workloads?

------
domrdy
The one thing I really came to appreciate is the polished UI that ships with
rethink, especially the query editor.

------
wamatt
Congrats guys. It's been fun following your progress. Love the technical
updates and overall direction :)

------
kclay
Congrats on the update

------
andy_ppp
It would be nice to hear about large clusters using RethinkDB in production.
Are there any startups using it for large cross-data-centre databases?

------
kfk
Is RethinkDB shifting focus? I remember that at the beginning it focused on
streaming data, but now it also does light analytics?

~~~
coffeemug
There isn't much of a shift in focus. The query language can do both
analytics and realtime queries. We optimize for realtime, but with most of the
performance work, optimizations tend to apply to both use cases. Everything is
getting better, but the focus is still on realtime.

------
illumen
python 3 forever!

