
RethinkDB 1.16: cluster management API, realtime push - coffeemug
http://rethinkdb.com/blog/1.16-release/
======
mglukhovsky
Hey everyone, we'll be hosting a live webcast and Q&A with our engineers this
afternoon to introduce the new changefeeds and cluster management API in
RethinkDB 1.16.

If you're interested in joining, you can RSVP here:
[http://www.meetup.com/RethinkDB-Bay-Area-Meetup-Group/events...](http://www.meetup.com/RethinkDB-Bay-Area-Meetup-Group/events/219703108/)

~~~
spiffytech
Was this recorded?

~~~
mglukhovsky
Yes, if you missed the webcast, you can watch it here:
[https://www.youtube.com/watch?v=bBqVjT2xz_8](https://www.youtube.com/watch?v=bBqVjT2xz_8)

------
egeozcan
It was very easy to set up, the web interface is a wonder, it has joins,
and now realtime push and cluster management? I'm giving this a try! Why don't
all databases come with such beautiful web interfaces?

~~~
meowface
RethinkDB is awesome, but to my understanding it's one of the slowest document
stores out there right now. That's one reason why it hasn't seen very
widespread adoption yet. For smaller projects it's not a problem, but for
large projects databases can often be a major bottleneck.

~~~
coffeemug
RethinkDB performance has improved dramatically in the last year. We'll be
posting benchmarks for the 2.0 release, which should dispel any performance
concerns.

~~~
meowface
That's great. I'd love to see you guys completely replace the MongoDBs of the
world. Once you eventually reach MongoDB's performance across the important
areas, switching would be a no-brainer for me, and probably for many others.

~~~
moe
Please just make sure that data safety in RethinkDB always takes precedence
over performance.

The last thing we need is _another_ MongoDB that cannot be trusted with
anything valuable.

~~~
vskarine
Try TokuMX (MongoDB on steroids with ACID transactions)
[http://www.tokutek.com/tokumx-for-mongodb/](http://www.tokutek.com/tokumx-for-mongodb/)

~~~
moe
You don't fix MongoDB and all its data-destroying habits (as documented by
aphyr and a dozen blog posts) by replacing the index.

It's rotten from the core, poorly designed at every layer, and has caused many
companies great grief.

I wouldn't touch it or any of its descendants with a 10-foot pole.

~~~
vskarine
You should do more research on TokuMX. They didn't just replace the index;
they rewrote a lot of the core.

------
coderzach
Wow, the realtime push feature will be a game changer. It removes so much of
the boilerplate required to build server push apps. I'm super excited to start
using this.

------
porker
I've been watching RethinkDB and it looks really cool, but...

One need I have in nearly every project is record versioning. Think of
implementing a modern wiki, or a CRM (what did X alter? Which phone numbers
have been assigned to this person?), or versioning updates to data records in
general. With the realtime web and multiple users editing records
simultaneously, it's even more important.

Versioning is a pain to implement in every system I've used, and I'm looking
for something to lower that pain.

CouchDB et al have built-in versioning, but cause pain in other places. I've
looked at the RethinkDB docs and found no mention - so how would RethinkDB
users handle versioning, and are there any helpers on the way?
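For concreteness, the pattern I usually end up hand-rolling is an append-only
revisions table: never update a record in place, always append a new immutable
revision. A minimal in-memory Python sketch of the shape (all names here are
hypothetical, nothing RethinkDB-specific):

```python
# Append-only revision store: the versioning pattern many apps hand-roll
# on databases without built-in history. All names are hypothetical.
import time

class VersionedStore:
    def __init__(self):
        self._revisions = {}  # record_id -> list of revision dicts

    def save(self, record_id, data):
        """Append a new immutable revision instead of updating in place."""
        revs = self._revisions.setdefault(record_id, [])
        revs.append({
            "version": len(revs) + 1,
            "data": dict(data),       # copy, so revisions stay immutable
            "saved_at": time.time(),
        })
        return revs[-1]["version"]

    def latest(self, record_id):
        return self._revisions[record_id][-1]["data"]

    def history(self, record_id):
        return [r["data"] for r in self._revisions[record_id]]

store = VersionedStore()
store.save("page:home", {"title": "Home", "body": "v1"})
store.save("page:home", {"title": "Home", "body": "v2"})
```

In a real database this would presumably map to a `revisions` table keyed by
(record id, version), with "latest" answered by an index rather than a list.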

~~~
UserRights
Yes, versioning please, for all the databases! I've been missing this for
years. Please, more talk about this!

~~~
dkersten
CouchDB and Datomic are the only databases that _I_ know about that have
built-in versioning. Would love to hear about others that do too!

~~~
juliangregorian
MVCC is not versioning, and you cannot rely on CouchDB's MVCC functionality to
provide versioning.

Datomic on the other hand does claim to provide "rewindability".

~~~
dkersten
Ah, Ok, I stand corrected. I only used CouchDB very very briefly some years
ago and was under the illusion that its document version numbering could be
used to keep track of document changes. I didn't realise it was purely for
MVCC purposes.

Yes, Datomic advertises that it keeps track of a document's history.

------
CitizenKane
Super excited for this! I have other comments to this effect, but try out
RethinkDB. It has been absolutely great for us and what we're doing and it's
been rock solid in its reliability.

We're implementing real time features in our product in the next couple of
months so this could not have come at a better time.

------
nkohari
I'm really excited about RethinkDB, and am using it in my (yet-to-be-launched)
startup. The new changes() support seems very interesting for my app, but for
my use case I'd need several queries open per user (probably on the order of
10-100).

I missed the Q&A this afternoon, but I see lots of RethinkDB engineers on
here... so, would there be severe performance implications of holding open
that many cursors?

~~~
danielmewes
How many users are you expecting?

If you go over a couple thousand active changefeeds in 1.16, I recommend setting
the `maxBatchSeconds` optional argument to `run` to something like 10 (the
default is 0.5).

That change should significantly reduce the CPU overhead on the server if you
have lots of idle changefeeds. Note that this does _not_ affect how quickly
changes get delivered - you will still get changes instantly.

The exact performance will of course depend on what queries you're going to
run exactly, the rate of writes etc.

~~~
nkohari
Thanks! I'll put together a test to see if I might run into any trouble.

(Related: the RethinkDB team is very responsive in the IRC channel. It's great
to see how interactive they are with the community!)

------
kylestlb
Exciting. Looks perfect for Meteor integration (streaming to a
Meteor.Collection), any news on that?

~~~
nicksergeant
Check out [http://rethinkdb.com/blog/realtime-web/](http://rethinkdb.com/blog/realtime-web/).

~~~
ndarilek
This blog post states that community members are working on an integration,
but is that work happening publicly? I'd be interested in following it even if
it isn't ready for use.

------
joseraul
From the FAQ: "For multiple document transactions, RethinkDB favors data
consistency over high write availability. While RethinkDB is always CID, it is
not always A."

What is the relation between write availability and A(tomicity)?

~~~
coffeemug
Slava @ rethink here.

I think this sentence is just phrased poorly. It means to convey that there is
no atomicity across multiple documents, which is unrelated to the previous
sentence. I've opened an issue to fix that:
[https://github.com/rethinkdb/docs/issues/633](https://github.com/rethinkdb/docs/issues/633).
Thanks for noticing!

------
mcms
Where can I find more about clustering in RethinkDB? E.g., what are the
durability and replication models?

I could not find enough info in the documentation, and the GitHub issues are
not clear about the status and the roadmap :(

~~~
danielmewes
This document explains our basic replication architecture and might be helpful:
[http://www.rethinkdb.com/docs/architecture/](http://www.rethinkdb.com/docs/architecture/)

Let me know if you want to know more about a specific aspect.

~~~
DanWaterworth
So, am I right in reading that there's no automatic failover? Also, when a new
master is chosen, do you pick the replica with the most up-to-date log, or do
you hand that decision off to the administrator?

Sounds like you need to implement Raft. Do you have any plans in that
direction?

------
FooBarWidget
Everything looks amazing, but what is their business model? How will they fund
all this development in the long term? I cannot find any commercial offerings
on their website.

~~~
shin_lao
Maybe they hope that one of the biggest NoSQL players, who raised a lot of
money, acquires them?

Also, from the website I can't tell what the advantages are compared to a
relational database.

------
kclay
I really need to add changefeed support to the Scala lib. Should be a fun
weekend.

~~~
orthecreedence
Ditto, just rewrote the CL lib for 1.15.x and the only thing missing is
changefeeds. Hoping to get that in soon.

------
scottmessinger
What are the performance characteristics of realtime push? Does the
performance of inserts slow down with the number of subscriptions to change
feeds? Or, is insert performance unrelated to subscriptions? Also, does the
change feed only show the before/after or does it also show the query that was
used to transform the data?

~~~
coffeemug
Slava @ rethink here.

The idea behind the architecture was that performance should be significantly
better than rolling your own infrastructure, because the database has a lot of
information that userland (from the database perspective) software doesn't.

The performance of inserts might slow down _slightly_ (matter of microseconds
in insert latency) if you create many feeds. The database has to look at each
insert and figure out if it applies to any of the feeds. This code is written
in optimized C++ and is very fast. We're still running benchmarks, but we're
shooting for performance levels where you (as a user) might not even be able
to measure the difference.

The same applies to inserts that don't affect any feeds (on a per-table
basis).

Same goes for throughput -- it might slow down _slightly_, but we're shooting
for making the slowdown barely measurable, if at all.

EDIT: in clustered environments, if you're subscribed to 1000 changefeeds on
machine A and a write happens on machine B, we do constant work on B to send
the changes to A and then A does all the work to figure out which changefeeds
need to see it. TL;DR: We don't block out other writes for time proportional
to the number of feeds.

~~~
taion
Are you doing anything clever to figure out e.g. which subset of changefeeds
subscribed to a table might be interested in a given update?

Let's say I have a table containing data for many users, while each
subscription only needs data for a single user. Instead of scanning through
all the changefeeds, you could put subscriptions in a hashmap and figure out
which ones to update in O(1) time rather than O(N) time in the number of
subscriptions, per update.

Obviously this is much harder in the general case, but do you do anything
along these lines?
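For concreteness, a minimal Python sketch of the hashmap idea (names are
hypothetical; this is my illustration, not RethinkDB's actual implementation):

```python
# Route each update to only the interested subscribers via one dict
# lookup on the filtered key (user_id), instead of scanning every feed.
from collections import defaultdict

subscriptions = defaultdict(list)  # user_id -> list of subscriber queues

def subscribe(user_id, queue):
    subscriptions[user_id].append(queue)

def publish(change):
    # O(1) in the number of subscriptions: one hash lookup per update.
    for queue in subscriptions.get(change["user_id"], []):
        queue.append(change)

inbox_a, inbox_b = [], []
subscribe("alice", inbox_a)
subscribe("bob", inbox_b)
publish({"user_id": "alice", "doc": {"msg": "hi"}})
```

The hard part, as you say, is the general case: once a feed filters on an
arbitrary predicate rather than an equality on one field, there is no obvious
key to hash on.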

------
dkersten
Does this help with auto-failover? Can I now do something like set up a
changefeed on table_status to monitor it and then call reconfigure? Or is
there more work still required?

PS: Absolutely loving RethinkDB. I've been using it as my main database since
April and it's a joy to work with!

~~~
timmaxw
Auto-failover is hard because of some fairly deep-rooted aspects of how
RethinkDB is implemented. When one of your servers dies, the RethinkDB cluster
won't allow any reconfigurations until it reconnects. If the server is
permanently dead, you can tell the RethinkDB cluster to go on without it; but
you shouldn't do that unless the server is _actually_ permanently dead,
because the server won't be allowed to rejoin the cluster later. This makes it
hard to write an automatic failover tool because you'd need to be able to tell
the difference between a server that's permanently dead and a server that's
just dropped its connection for a bit.

The solution is to re-architect RethinkDB so that it can reconfigure a table
even if there's a disconnected server. This is a pretty big project, but we're
working on it, and it will probably ship around April or May. We'll also
include server-side auto failover at that time, because it's easy once this
problem is solved.

------
chrisduesing
Looking at the docs it doesn't appear this is meant to stream to the client,
but to the server. From there you would still need to manage queues and
sockets, etc. I've had to write several exchanges that do something along
these lines (stream order data to a client) and I've always accomplished it by
having the code that writes to the db also push a client update to a queue.

I guess my question is: why would this be the preferred solution? It seems to
run afoul of the 'one job' design goal. What am I missing?

~~~
Rapzid
And what happens to the fact that the data has changed if the client becomes
disconnected for whatever reason? Is the fact that the data has changed
committed with the data atomically? Is the query persistent across
connections? Or does it end in tears?

~~~
danielmewes
Changefeeds are currently bound to a database connection, so they will get
terminated if for example the application server goes down or there's a
networking issue.

We feel like for many queries (especially the ones you find in web
applications) that's not a big deal, since you can efficiently re-run them
after reconnecting. In other cases it definitely matters, and we are going to
add what you describe in a future release. You can follow the progress (or
chime in if you like) at
[https://github.com/rethinkdb/rethinkdb/issues/3471](https://github.com/rethinkdb/rethinkdb/issues/3471).
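A minimal sketch of that re-run-on-reconnect pattern in plain Python
(`open_feed` is any callable returning an iterator of changes that may raise
`ConnectionError` on a dropped connection; all names are hypothetical
stand-ins, not driver API):

```python
# Re-run the query after a dropped connection, per the approach above.
import time

def consume_with_retry(open_feed, handle, max_attempts=5, backoff=0.01):
    attempts = 0
    while attempts < max_attempts:
        try:
            for change in open_feed():
                handle(change)
            return  # feed ended normally
        except ConnectionError:
            attempts += 1
            time.sleep(backoff * attempts)  # simple linear backoff
    raise RuntimeError("feed kept failing; giving up")

# Demo with a fake feed that drops once, then delivers a change.
calls = {"n": 0}

def fake_feed():
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("connection dropped")
    yield {"new_val": {"id": 1}}

received = []
consume_with_retry(fake_feed, received.append)
```

Note the caveat from the comment above still applies: re-running the query
refetches current state, so any changes that happened while disconnected show
up only as new initial results, not as individual diffs.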

------
jules
Realtime push seems like such an obviously useful feature. I wonder if there
are any other databases that support that?

~~~
platform
MarkLogic alerts is similar: [https://docs.marklogic.com/guide/search-dev/alerts](https://docs.marklogic.com/guide/search-dev/alerts)

"An alerting application is used to notify users when new content is
available that matches a predefined (and usually stored) query. MarkLogic
Server includes several infrastructure components that you can use to create
alerting applications that have very flexible features and perform and scale
to very large numbers of stored queries."

I am not sure how commits on multiple documents work in RethinkDB, but in
MarkLogic the reverse query is executed only after the commit (multi- or
single-document) has happened (since MarkLogic is ACID).

Another system that supports something similar is Elasticsearch, with its
query percolation feature:

[http://www.elasticsearch.org/blog/percolator-redesign-blog-p...](http://www.elasticsearch.org/blog/percolator-redesign-blog-post/)

------
siscia
Can we have realtime push for PostgreSQL too?

(I am asking whether there are any technical limitations...)

~~~
taion
You can get very basic realtime push in PostgreSQL with per-row triggers
and NOTIFY/LISTEN (e.g. via the `pg_notify()` function).

The hard part is building out a change feed that lets you synchronize on a
subset of a table in a way that is performant, and to make this work with
joins (i.e. when you're using the database as more than just a document
store).

------
evo_9
What are the plans for future official driver support? Is there a community
Swift driver in development, or will that be an official driver at some point?

Looks awesome really want to use this on my next couple of projects.

~~~
coffeemug
After the 2.0 release (this February) we're going to start bringing more
drivers under the official umbrella. I don't have specific plans to share, but
there will definitely be more officially supported drivers.

I'm not sure about Swift. If anyone wants to give building a Swift driver a
try, check out the "Contribute a driver" section at
[http://rethinkdb.com/docs/install-drivers/](http://rethinkdb.com/docs/install-drivers/). A driver is pretty easy to
build, and a lot of fun!

~~~
tracker1
Too bad you require relocation to SF... I would love the opportunity to work
on the drivers for Node.js and .NET.

------
vskarine
Any plans to work with
Tokutek ([http://www.tokutek.com](http://www.tokutek.com)) and use fractal
trees for indexing?

~~~
danielmewes
We're generally open to this. However, it's a bit of extra work, since
RethinkDB doesn't currently have a pluggable storage backend API like MongoDB
or MySQL do.

------
moatra
Regarding changefeeds - is there a way to tell when you've consumed all the
initial data and are now receiving update diffs?

~~~
coffeemug
Yes. When you're getting initial data you'll get a document of the form `{
new_val: data }`. When you're getting changes, the document is of the form `{
new_val: data, old_val: data }`. Note that in the former case, the `old_val`
field is missing.
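A consumer can branch on which fields are present. A quick Python sketch with
plain dicts standing in for changefeed documents (the insert/delete shapes,
where one side is `null`, are my reading of the documented format):

```python
# Classify changefeed documents by the fields present, per the formats
# described above: initial data has no old_val key at all, while
# inserts/deletes carry a None on one side.
def classify(doc):
    if "old_val" not in doc:
        return "initial"   # initial data: {new_val: ...} only
    if doc["new_val"] is None:
        return "delete"    # old_val set, new_val is null
    if doc["old_val"] is None:
        return "insert"    # row created after the feed started
    return "update"        # both values present
```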

~~~
moatra
Ah, thanks. It looks like only a few query types actually return an initial
result set: between, min, max, and order_by/limit
([http://rethinkdb.com/docs/changefeeds/python/](http://rethinkdb.com/docs/changefeeds/python/)).

Is there a way to get an entire table as the initial result set before getting
update diffs? Something like:

    r.table("users").between(-Infinity, Infinity).changes().run()  // Not actually valid

~~~
coffeemug
Not at the moment, but we're on it -- see
[https://github.com/rethinkdb/rethinkdb/issues/3579](https://github.com/rethinkdb/rethinkdb/issues/3579).

