I'll be around all day to answer questions about the release (along with a few other engineers on our team).
We're very excited about this release -- it makes the lives of RethinkDB users dramatically better, because they no longer have to wake up in the middle of the night for most hardware failures :) It also took over a year to build and test, and has been one of the most challenging engineering problems we've ever had to solve.
You can see where the memory usage resets when the rethinkdb process restarts, and then it slowly climbs back up to ~3 GB, despite no new data being written to the system (all our RethinkDB data is generated at deploy time).
Any pointers? I'd be happy to provide any diagnostics.
The image doesn't seem to load for me. Would you mind opening a GitHub issue (https://github.com/rethinkdb/rethinkdb/issues/new) with as many details as possible? We need to know a bit more about the infrastructure/workload to replicate this; once we get the info we'll get to the bottom of it ASAP.
Also feel free to e-mail me at firstname.lastname@example.org if you need urgent/critical support.
It helps if you include information on your setup in the issue (for a list of helpful details, read "How to submit a bug report" here: http://rethinkdb.com/docs/crashes/).
IMO picking a KV store makes the most sense if you have insane performance requirements (e.g. millions of ops/second) where you need to squeeze every last drop of performance and the query layer gets in the way.
(Also if you have any questions and don't want to share more info publicly, feel free to shoot me an email -- email@example.com)
If you wanted to stay closer to key/value with nice clustering and scaling options, I might suggest Cassandra. To be honest, though, that would be overkill, as a relatively small RethinkDB cluster would likely handle your load for several years without issue.
Do you use the Community Edition, and if so, is it generally stable and does it suit your needs? I haven't read your article yet, but I'll check it out tonight.
Unless you are a business where $5k/node looks reasonable to pay for peace of mind, you are probably fine with the Community Edition. A lot of the customers paying for Enterprise Edition are moving to Couchbase from Oracle, so they are paying less per node and running fewer nodes than they used to.
The Couchbase web site says: "Bug fixes and new features are eventually integrated with the Community Edition (CE), but this typically takes several weeks." But from what I can tell it's not "several weeks" -- it's at least 10 months.
Am I misunderstanding something?
I'd say it's pretty useless. I wouldn't run anything where a bug (which could be a data corruption issue, a crash...) may take weeks to be fixed.
Unless I'm missing something, my opinion is: either use the commercial version or use a different product. Please correct me if I'm wrong.
The paid version of the software funds the community edition for Couchbase. I can't say our system is perfect, and I am certainly happy to hear feedback on how you'd like to see the community vs. enterprise editions configured. Posting here is fine, or you can reach me at firstname.lastname@example.org.
I would naively assume that since you need to replicate your database to the slave servers, you need to send the WAL or the changes somehow anyway. Could you elaborate on what the problem was there? I'd be surprised if it's a fundamental limitation of Raft (but it could easily be a limitation of a Raft library!)
If we used Raft to replicate the document, in many cases it would be a lot more chatty. If you look at the Raft paper and track all the messages that would have to go back and forth, it would dramatically increase the latency for each write. This is inherent to Raft (and any distributed consensus protocol). Unfortunately distributed consensus isn't free; you have to pay pretty heavy latency costs which would be unacceptable in production, so we couldn't just uniformly apply that to every write.
> If we used Raft to replicate the document, in many cases it would be a lot more chatty.
I'm out of my depth here. The leader needs to send AppendEntries, and the slave needs to apply it to persistent storage and ACK; the leader needs to wait for a majority of ACKs before responding to the client. That's the same as your three-node replication scenario, so what am I missing here? Does RethinkDB relax consistency guarantees in some cases to achieve better latency?
It's not your job to educate me on Raft, I appreciate your being patient with me, but feel free to opt out of this conversation anytime you please.
A couple of things -- the payload in Raft tends to be much higher (though it could probably be fixed with sufficient engineering effort), and in some scenarios this process would have to happen multiple times during netsplits (which may or may not be ok).
RethinkDB doesn't relax consistency guarantees; we implement them in a different way. Check out http://rethinkdb.com/docs/consistency/ for more details.
I'm a bit busy today, but this is a really interesting question. I'll see if we can do a technical blog post on this and go into all the details in depth.
> Does RethinkDB relax consistency guarantees in some cases to achieve better latency?
> RethinkDB doesn't relax consistency guarantees; we implement them in a different way.
But the page you linked says:
> `single` returns values that are in memory (but not necessarily written to disk) on the primary replica. This is the default.
> `majority` will only return values that are safely committed on disk on a majority of replicas. This requires sending a message to every replica on each read, so it is the slowest but most consistent.
which seems to imply that the default settings sacrifice consistency for better latency. Can you (or someone else, if you're busy) clarify?
We implement a variety of modes for reading and writing that allow the user to select different trade-offs for consistency and performance. By default: writes are always safe; reads are safe when the cluster is healthy, but can sometimes have anomalies in failure scenarios. You can also do `majority` reads which are completely safe even during failure scenarios, but are slower.
Note, however, that the default read mode isn't an "anything can happen" implementation. The guarantees are precisely defined, and 2.1 passes all tests with respect to these guarantees in a large variety of failure scenarios.
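To make the difference concrete, here's a toy model of the two read modes (illustrative Python only -- the class and function names are made up, not RethinkDB internals):

```python
# Toy model of RethinkDB's `single` vs `majority` read modes
# (illustrative only; not real RethinkDB code).
from collections import Counter

class Replica:
    def __init__(self, value, version):
        self.value = value
        self.version = version

def read_single(primary):
    # 'single' mode: answer straight from the primary. Fast, but
    # during a failover the node we think is primary may hold
    # writes that will later be rolled back.
    return primary.value

def read_majority(replicas):
    # 'majority' mode: ask every replica and return the value a
    # majority reports, i.e. one that is safely committed.
    votes = Counter((rep.value, rep.version) for rep in replicas)
    (value, _), count = votes.most_common(1)[0]
    assert count > len(replicas) // 2, "no majority -- must fail/retry"
    return value

# Healthy cluster: all replicas agree, both modes return the same value.
replicas = [Replica("v1", 1), Replica("v1", 1), Replica("v1", 1)]
assert read_single(replicas[0]) == read_majority(replicas) == "v1"

# Netsplit: the old primary accepted "v2" but couldn't replicate it.
replicas = [Replica("v2", 2), Replica("v1", 1), Replica("v1", 1)]
print(read_single(replicas[0]))   # "v2" -- may be rolled back (anomaly)
print(read_majority(replicas))    # "v1" -- the safely committed value
```

The point is just that a majority read can never return a value only one node has seen, at the cost of a round trip to every replica.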
My comment about not relaxing consistency guarantees was meant in a slightly different context. The OP was talking about write transactions (and implementing individual writes with Raft vs. using a different approach), and I pointed out that we don't relax consistency guarantees for writes despite not using Raft.
I realize my comment was confusing -- sorry about that; I didn't mean to mislead.
In the Raft protocol, the leader sends AppendEntries; the follower writes the log entries to persistent storage, but doesn't apply them to the state machine yet; the leader sends another AppendEntries with a higher commit index; and then the follower applies the changes to the state machine. In RethinkDB's case, the "state machine" is the B-tree we store on disk.

One of the guarantees we provide is that if the server acknowledges a write to the client, then all future reads should see that write; and we perform reads by querying the B-tree. So we can't acknowledge writes until they've actually been written to the B-tree, and we can't start writing to the B-tree until after the write has been committed via Raft. So that's where the latency would come from if we were using Raft to manage individual documents.

We considered a couple of ways to work around this. One would be to make reads check both the B-tree and the Raft log; but that makes the read logic much more complicated. Another would be to start writing to the B-tree as soon as we put the write in the Raft log. The problem is that Raft expects to be able to roll back parts of its log at any time, and our storage engine's MVCC capabilities aren't good enough for that. The hybrid approach allows us to have good latency without major rewrites of the existing storage engine.
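The two-step flow described above can be sketched in a few lines (a toy model with made-up names; not RethinkDB or Raft library code):

```python
# Toy sketch of Raft's two-step commit (illustrative only).
# Step 1: leader sends AppendEntries; followers persist to their log.
# Step 2: leader advances the commit index; followers apply to the
# state machine (in RethinkDB's case, the on-disk B-tree).

class Follower:
    def __init__(self):
        self.log = []    # durable Raft log
        self.state = {}  # the "state machine" (stands in for the B-tree)

    def append_entries(self, entries):
        self.log.extend(entries)
        return True      # ACK: persisted, but NOT applied yet

    def apply_up_to(self, commit_index):
        for key, value in self.log[:commit_index]:
            self.state[key] = value

def replicate(followers, entry):
    # Round trip 1: persist the entry on a majority.
    acks = sum(f.append_entries([entry]) for f in followers)
    assert acks > len(followers) // 2
    commit_index = 1
    # Round trip 2: tell followers the entry is committed so they can
    # apply it. Only now can a read against the state machine (the
    # B-tree) see the write -- hence only now can we ACK the client.
    for f in followers:
        f.apply_up_to(commit_index)
    return "acknowledged to client"

followers = [Follower(), Follower(), Follower()]
print(replicate(followers, ("doc:1", {"name": "a"})))
assert all(f.state["doc:1"] == {"name": "a"} for f in followers)
```

The extra round trip before the apply step is the latency cost being discussed: the client ACK has to wait for the whole chain.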
The other reason is that RethinkDB allows tables to be split into shards, and the number of shards can be changed while the database is running and accepting queries. We considered having one Raft instance per shard, but we would have needed to modify the Raft algorithm to allow splitting and merging Raft instances while they are running and accepting queries. (This is approximately what CockroachDB is doing.) But the Raft algorithm is really tricky to get right even when you're not trying to modify it, and we wanted to stick as closely as possible to the official algorithm to minimize bugs. The approach we ended up going with allows us to have the performance and convenience of live resharding without having to modify the Raft algorithm.
In response to your question about consistency guarantees: RethinkDB gives users several options for trading off consistency and latency. The default is to acknowledge writes only once they're safely on disk on a majority of replicas, but to perform reads only on the primary replica (leader). This doesn't give perfect consistency; if the leader fails over, reads that hit the database around the time of the failover might see outdated data, or they might read writes that were rolled back as part of the failover. Unfortunately, the only way to get stronger consistency guarantees is to wait for a majority of replicas to acknowledge the read, which makes performance much worse. We offer a safe-but-slow mode for reads, but it's not the default because the performance is so bad. We also offer fast-but-unsafe modes for writes, for users that want better latency and are OK with losing the last few writes in the event of a failover. See the documentation for more information.
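A toy sketch (made-up names, not RethinkDB code) of the two write-acknowledgement policies described above:

```python
# Toy model of write acknowledgement policies (illustrative only).
# Default ("safe"): ACK the client only after a majority of replicas
# have the write on disk. Fast-but-unsafe: ACK as soon as the primary
# has it, accepting that a failover can lose the last few writes.

def write(replicas, doc, ack="majority"):
    replicas[0].append(doc)  # primary persists first
    if ack == "single":
        # Fast-but-unsafe: replication continues in the background;
        # if the primary dies before it finishes, the write is lost.
        return "acked (may be lost on failover)"
    # Default: replicate synchronously until a majority has the write.
    for replica in replicas[1:]:
        replica.append(doc)
        have_it = sum(doc in rep for rep in replicas)
        if have_it > len(replicas) // 2:
            return "acked (durable on a majority)"

replicas = [[], [], []]
print(write(replicas, {"id": 1}))            # durable before the ACK
print(write(replicas, {"id": 2}, "single"))  # fast, but can be lost
```

Note the asymmetry with reads: here the latency cost is paid at write time, which is why the safe write policy is a tolerable default while the safe (majority) read policy is not.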
Am I correct in understanding that the tests only pass in the "safe-but-slow" configuration that timmaxw described (https://news.ycombinator.com/item?id=10043746) and not in the default "fast-but-unsafe" configuration?
However, you're right that we don't meet the stricter guarantees in fast-but-unsafe mode (but I'm not aware of any products in the space that do).
Doing each thing independently is hard; doing both of them together in one system is really really hard. I'm particularly proud of how nicely everything clicks together, and how elegant the administration API is. It seems very simple and is extremely powerful; getting that right took many iterations over the course of a year.
See the faq for more details: http://rethinkdb.com/faq/
It looks like they try to follow http://www.defmacro.org/2013/04/03/issue-etiquette.html, it'd be great to see other companies adopt it too.
It's taken a while to get to this point, but the development has been methodical and incredibly well managed in terms of getting the appropriate groundwork in place for a feature (like automatic failover) instead of just hacking at it or bolting it on the side. The same goes for baking data-streaming solutions into the box, as opposed to less thoughtful options.
I'd probably reach for RethinkDB before Postgres or others simply for the better administrative experience, especially for small teams or start-ups that don't have a dedicated DBA role.
For anyone curious, the databases I would most likely reach for, depending on the situation, would be RethinkDB, ElasticSearch, and Cassandra. I really do like MongoDB a lot as well, but RethinkDB offers the features with far less friction, though the query interface takes a bit of getting used to.
That said, I also like more traditional RDBMS options. I REALLY like what PostgreSQL offers, but I have no desire to administer such a beast; failover isn't really baked in, and the best options are only available commercially, at significant cost. There are also hosted options on AWS and Azure for various SQL RDBMSes. Still, I find that being able to have hierarchical data structures in collections tends to be a better fit for MANY data needs.
Congratulations to Slava and everyone else at RethinkDB.
The AGPL also requires code and changes to be distributed to users if they directly consume your database over the network. This would be the case, for example, for a database-as-a-service.
Put simply: if you use an AGPL database internally, even for your SaaS, you're fine. If you either modify RethinkDB and ship it as a product, or provide a RethinkDB or RethinkDB-derived database-as-a-service, then you also have to provide the source code to your users.
Theoretically we could stop doing that in the future and keep changing the protocol, but that would alienate all of our users. It would be an insane decision, and we'll never, ever do it.
I was never able to get the referenced python script working, but using a shell script inside `.profile.d` did the trick.
If you like, you can launch a 1.16 AMI, ssh into the instance, and run `apt-get update && apt-get install rethinkdb` to update to RethinkDB 2.1.
Thanks for being patient -- we'll get the official images updated soon.
We have an aggressive release cycle and try to release every 2-4 months. Unfortunately, the process to get an image onto AWS is fairly slow (with the exception of security patches), so the AMI images are a little behind. We're working hard to update them as soon as possible, but most of the process is out of our hands.
...Doesn't seem available on homebrew yet though.
This has been a long-awaited feature for me. While I loved nearly every aspect of RethinkDB, this was the one thing that held me back from using it. Good to see RethinkDB keep improving!
Also, very much looking forward to trying this out!
If there's anything else you think is missing from the documentation that would be helpful, open an issue here: https://github.com/rethinkdb/docs
Honestly, if you want to use RethinkDB with Java, it may be worthwhile to write a domain-centered service in Node or Python and have that act as an intermediary for Java. I've actually used Node on several occasions as a translation service for requests to foreign systems, as sometimes there's a lot of disconnect between specific implementations of SOAP/WS-* services in Java/PHP/.Net etc. It tends to work very well for this use case.
I've actually been looking at GraphQL with some interest, and thinking it could be a pretty awesome option in front of RethinkDB; that would open things up to pretty much any client that supports GraphQL, which is limited right now...
Follow https://github.com/rethinkdb/rethinkdb/issues/3930 for progress updates.
The normal use case is to use the Teiid JDBC driver (or Postgres ODBC) to connect with the Teiid server and then that handles talking to your datasources via the translators.
I think there was some work done to make a standalone MongoDB JDBC driver using an embedded Teiid server, but I'm not sure how much progress was made there. Mapping document to relational can be tricky.
That seems like deceptive marketing to me. Once the pull request is merged, then you can make that claim.
Here's the open PR for Jepsen: https://github.com/aphyr/jepsen/pull/70
We've been working with Kyle on this, though, and I think he seemed happy to merge the PR (but he's pretty busy). Hopefully it will happen soon; sorry for the confusion.
EDIT: here is some context -- https://github.com/rethinkdb/rethinkdb/issues/1493#issuecomm....