RethinkDB 2.1 is out: high availability (rethinkdb.com)
264 points by coffeemug on Aug 11, 2015 | 105 comments



This is soooo awesome. I started rewriting SageMathCloud to use RethinkDB when I learned in May about your plans to support high availability. I've been rewriting everything and doing tests (building from source, then using the beta you kindly provided), and finally, after months of work, I was ready to release the new version of SageMathCloud last night, but RethinkDB 2.1 wasn't out yet. So I was torn about whether to go with the 2.1 beta and cross my fingers, or just wait. And now this! Thank you so much. RethinkDB is, for my use, the first database I've ever actually really loved (and React.js+flux the first web framework). Here's my client code in case anybody is curious: https://github.com/sagemathinc/smc/blob/rethinkdb/salvus/ret...


I used it because all our data comes from JS apps as JSON, and storing JSON in RethinkDB is incredibly easy. Plus hooking up changefeeds (an event stream for whenever a table changes) has tons of use cases (analytics, IRC bots, etc.).
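For anyone who hasn't tried changefeeds, a minimal sketch with the official Python driver (the table name and connection details here are just for illustration):

    import rethinkdb as r

    conn = r.connect('localhost', 28015, db='test')

    # changes() returns a cursor that blocks until the table changes;
    # each result carries 'old_val' and 'new_val'.
    for change in r.table('messages').changes().run(conn):
        print(change['old_val'], '->', change['new_val'])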


Slava @ RethinkDB here.

I'll be around all day to answer questions about the release (along with a few other engineers on our team).

We're very excited about this release -- it makes the lives of RethinkDB users dramatically better, because they won't have to wake up in the middle of the night anymore for most hardware failures :) It also took over a year to build and test, and has been one of the most challenging engineering problems we've ever had to solve.


We're seeing serious memory leaks in 2.0.4~trusty.

http://glui.me/?i=uu1gqb3son0sbnn/2015-08-11_at_11.33_AM.png...

You can see where memory resets during the rethinkdb process restart, and then it slowly climbs back up to ~3 GB, despite no new data being written to the system (all our RethinkDB data is generated at deploy time).

Any pointers? I'd be happy to provide any diagnostics.


Thanks for reporting this!

The image doesn't seem to load for me. Would you mind opening a GitHub issue (https://github.com/rethinkdb/rethinkdb/issues/new) with as many details as possible? We need to know a bit more about the infrastructure/workload to replicate this; once we get the info we'll get to the bottom of it ASAP.

Also feel free to e-mail me at slava@rethinkdb.com if you need urgent/critical support.


You should open an issue on GitHub, so we can track it down: https://github.com/rethinkdb/rethinkdb/issues

It helps if you include information on your setup in the issue (for a list of helpful details, read "How to submit a bug report" here: http://rethinkdb.com/docs/crashes/).



I'm evaluating a move from CouchDB to another system purely as an easily sharded/replicated KV document store. Do you recommend Rethink for that specific use-case or is it overkill? Your docs seem to convey that this isn't really in Rethink's wheelhouse (in that RDB is more suited for querying or real-time updating).


RethinkDB can definitely act as a distributed key-value store (and a pretty good one). But I think the choice is highly dependent on your use-case and requirements. Could you go into a little more depth wrt your workload? (e.g. number of records, average record size, read/write ratio, expected ops/second, etc.)


Yeah, of course. We're basically talking about a 10/90 write/read ratio on JSON documents ~100 KB in size with very low load, probably no more than 10-20 ops/second at peak. Records are at 200,000 now, projected to grow by 250k/year.


Thanks! I'd definitely give RethinkDB a try; I think you'd be pleasantly surprised at how nice it can be for this. You'll also find that you might want to run ad-hoc queries on your JSON documents (e.g. for analytics, exploratory analysis, etc.), and having ReQL at your disposal will be great for that.
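For instance, a quick sketch of what an ad-hoc ReQL query looks like in Python (the table and field names are made up):

    import rethinkdb as r

    conn = r.connect('localhost', 28015, db='test')

    # Group documents by a field and count them -- no schema,
    # secondary indexes, or predefined views required.
    counts = r.table('docs') \
        .filter(r.row['type'] == 'invoice') \
        .group('status') \
        .count() \
        .run(conn)
    print(counts)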

IMO picking a KV store makes the most sense if you have insane performance requirements (e.g. millions of ops/second) where you need to squeeze every last drop of performance and the query layer gets in the way.

(Also if you have any questions and don't want to share more info publicly, feel free to shoot me an email -- slava@rethinkdb.com)


Thanks, Slava. I appreciate you taking the time here.


RethinkDB could definitely handle your load. The advantage over CouchDB is that you can do a bit more than just fetch values, and you get indexing on other fields in your data structures. At your data volume and growth rate, I'd be surprised if a small cluster couldn't handle your load for 8-10 years without issue. Since you're using JSON structures, being able to query your data beyond keys is really nice; a document database is probably a more flexible option than a straight KV store.

If you wanted to stay closer to key/value with nice clustering and scaling options, I might suggest Cassandra. To be honest, though, that would be overkill, as a relatively small RethinkDB cluster would likely handle your load for several years without issue.


For traffic that low, you could use basically anything.


If you like the general style of CouchDB but want scale and availability that's been proven in mission-critical, user-facing deployments for years, we just did a blog post about moving to Couchbase: http://blog.couchbase.com/2015/august/moving-couch


That's funny, I just installed that over the weekend and was testing it out. It's really good, but what sort of threw me off was the high per-node licensing fee if you want the Enterprise edition. It basically starts at $5k/year/node, and that's just too oppressive.

Do you use the Community Edition and if so is it generally stable / suits your needs? I didn't read your article but I'll check it out tonight.


I'm a cofounder of Couchbase. Almost all our code is Apache 2.0, and we make the Community builds available for everyone. Enterprise builds are free to use for a small test cluster.

Unless you are a business where $5k/node looks reasonable to pay for peace of mind, you are probably fine with the Community Edition. A lot of the customers paying for Enterprise Edition are moving to Couchbase from Oracle, so they are paying less per node and running fewer nodes than they used to.


As far as I can tell, the Community Edition lags far behind the Enterprise Edition in stability and improvements. The latest available Community Edition is v3.0.1, released in October 2014. Since then, there has been a v3.0.2 edition released in December 2014, and a v3.0.3 released in March 2015, both of which had "critical bug fixes" according to the release notes, plus a new version, 3.1.0, released in July 2015. None of these post-3.0.1 versions are available as a Community Edition; they're only available to Enterprise customers -- is that correct?

The Couchbase web site says: "Bug fixes and new features are eventually integrated with the Community Edition (CE), but this typically takes several weeks." But from what I can tell it's not "several weeks" -- it's at least 10 months.

Am I misunderstanding something?


Hi, it seems we missed correcting this page. It takes months, not weeks, to release updates to CE. The releases after 3.0.1 (3.0.2 and others) are maintenance releases with bug fixes. 3.1 is a recent release, and it does contain features. We would normally push a community edition of 3.1 several months after the release, but we are close to pushing out 4.0, which incorporates the 3.1 enhancements and more. I expect we will get a 4.0 community edition out that will supersede the 3.1 release and deliver these features to you soon.


If you want to use the latest features without waiting for a CE build, it's getting easier to build Couchbase Server all the time. This readme is where our new hires start: https://github.com/couchbase/manifest


Thanks for pointing this out. I don't have any experience with Couchbase/CouchDB, but I do appreciate open-source database options, especially those where the open version closely matches the commercially supported one.


If bug fixes take weeks to get to the CE, as one post below quoted from Couchbase's website, how can you say that the CE is probably fine?

I'd rather say it's pretty useless. I wouldn't run anything where a bug fix (which could be for a data corruption issue, a crash...) may take weeks to arrive.

Unless I'm missing something, my opinion is: either use the commercial version or use a different product. Please correct me if I'm wrong.


Hi, if you are experiencing corruption of data with the community edition, please let us know. We will take a look at that immediately.

The paid version of the software funds the community edition at Couchbase. I cannot say our system is perfect, and I am certainly happy to hear feedback on how you'd like to see the community vs. enterprise editions configured. Posting here is fine, or you can reach me at cihan@couchbase.com. Thanks. -cihan


Ah, I apologize; 'oppressive' isn't the right word. I really meant 'outside of our budget'. I'll certainly be test-driving the CE this month. Thanks again.


I'm curious about "We chose to only store cluster metadata in Raft due to performance limitations imposed by distributed consensus"

I would naively assume that since you need to replicate your database to the slave servers, you need to send the WAL or changes somehow anyway. Could you elaborate on what was the problem there? I'd be surprised if it is because of a fundamental limitation of Raft (but could easily be a limitation of a Raft library!)


Suppose you configure RethinkDB to store three copies of the data. When you do a write, RethinkDB will replicate the data to three nodes; that operation is roughly a matter of sending three messages to the appropriate nodes, getting the acks, and sending the ack back to the client.

If we used Raft to replicate the document, in many cases it would be a lot more chatty. If you look at the Raft paper and track all the messages that would have to go back and forth, it would dramatically increase the latency for each write. This is inherent to Raft (and any distributed consensus protocol). Unfortunately distributed consensus isn't free; you have to pay pretty heavy latency costs which would be unacceptable in production, so we couldn't just uniformly apply that to every write.


OK, so it's a latency concern, which makes perfect sense.

> If we used Raft to replicate the document, in many cases it would be a lot more chatty.

I'm out of my depth here. Leader needs to send AppendEntries and slave needs to apply to persistent storage and ACK. Leader needs to wait for majority of ACKS before responding to the client. That's the same as your three-node replication scenario, so what am I missing here? Does RethinkDB relax consistency guarantees in some cases to achieve better latency?

It's not your job to educate me on Raft, I appreciate your being patient with me, but feel free to opt out of this conversation anytime you please.


> Leader needs to send AppendEntries and slave needs to apply to persistent storage and ACK. Leader needs to wait for majority of ACKS before responding to the client. That's the same as your three-node replication scenario, so what am I missing here?

A couple of things -- the payload in Raft tends to be much higher (though it could probably be fixed with sufficient engineering effort), and in some scenarios this process would have to happen multiple times during netsplits (which may or may not be ok).

RethinkDB doesn't relax consistency guarantees; we implement them in a different way. Check out http://rethinkdb.com/docs/consistency/ for more details.

I'm a bit busy today, but this is a really interesting question. I'll see if we can do a technical blog post on this and go into all the details in depth.


The parent comment asked:

> Does RethinkDB relax consistency guarantees in some cases to achieve better latency?

You responded:

> RethinkDB doesn't relax consistency guarantees; we implement them in a different way.

But the page you linked says:

> `single` returns values that are in memory (but not necessarily written to disk) on the primary replica. This is the default.

> `majority` will only return values that are safely committed on disk on a majority of replicas. This requires sending a message to every replica on each read, so it is the slowest but most consistent.

which seems to imply that the default settings sacrifice consistency for better latency. Can you (or someone else, if you're busy) clarify?


Sorry, this is a bit nuanced and my comment was unclear.

We implement a variety of modes for reading and writing that allow the user to select different trade-offs for consistency and performance. By default: writes are always safe; reads are safe when the cluster is healthy, but can sometimes have anomalies in failure scenarios. You can also do `majority` reads which are completely safe even during failure scenarios, but are slower.

Note, however, that the default read mode isn't an "anything can happen" implementation. The guarantees are precisely defined, and 2.1 passes all tests in a large variety of failure scenarios with respect to these guarantees.
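As a concrete illustration, here's roughly how you pick a read mode with the Python driver (the table name is made up; `read_mode` is a documented option to `run`):

    import rethinkdb as r

    conn = r.connect('localhost', 28015, db='test')

    # Default ('single'): fast reads served from the primary's memory.
    fast = r.table('users').get('some-id').run(conn, read_mode='single')

    # 'majority': slower, but safe even during failover scenarios.
    safe = r.table('users').get('some-id').run(conn, read_mode='majority')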

My comment about not relaxing consistency guarantees was meant in a slightly different context. The OP was talking about write transactions (implementing individual writes with Raft vs. a different mechanism), and I pointed out that we don't relax consistency guarantees for writes despite not using Raft.

I realize my comment was confusing -- sorry about that; I didn't mean to mislead.


tim@rethinkdb here. coffeemug's earlier comment is not quite right. There were two reasons why we went for this hybrid approach where Raft handles metadata but not documents: because of how Raft interacts with the storage system, and because of sharding.

In the Raft protocol, the leader sends AppendEntries; the follower writes the log entries to persistent storage, but doesn't apply them to the state machine yet; the leader sends another AppendEntries with a higher commit index; and then the follower applies the changes to the state machine. In RethinkDB's case, the "state machine" is the B-tree we store on disk. One of the guarantees we provide is that if the server acknowledges a write to the client, then all future reads should see that write; and we perform reads by querying the B-tree. So we can't acknowledge writes until they've been actually written to the B-tree, and we can't start writing to the B-tree until after the write has been committed via Raft. So that's where the latency would come from if we were using Raft to manage individual documents.

We considered a couple of ways to work around this. One would be to make reads check both the B-tree and the Raft log; but that makes the read logic much more complicated. Another would be to start writing to the B-tree as soon as we put the write in the Raft log. The problem is that Raft expects to be able to roll back parts of its log at any time, and our storage engine's MVCC capabilities aren't good enough for that. The hybrid approach allows us to have good latency without major rewrites of the existing storage engine.

The other reason is that RethinkDB allows tables to be split into shards, and the number of shards can be changed while the database is running and accepting queries. We considered having one Raft instance per shard, but we would have needed to modify the Raft algorithm to allow splitting and merging Raft instances while they are running and accepting queries. (This is approximately what CockroachDB is doing [1].) But the Raft algorithm is really tricky to get right even when you're not trying to modify it, and we wanted to stick as closely as possible to the official algorithm to minimize bugs. The approach we ended up going with allows us to have the performance and convenience of live resharding without having to modify the Raft algorithm.

In response to your question about consistency guarantees: RethinkDB gives users several options for trading off consistency and latency. The default is to acknowledge writes only once they're safely on disk on a majority of replicas, but to perform reads only on the primary replica (leader). This doesn't give perfect consistency; if the leader fails over, reads that hit the database around the time of the failover might see outdated data, or they might read writes that were rolled back as part of the failover. Unfortunately, the only way to get stronger consistency guarantees is to wait for a majority of replicas to acknowledge the read, which makes performance much worse. We offer a safe-but-slow mode for reads, but it's not the default because the performance is so bad. We also offer fast-but-unsafe modes for writes, for users that want better latency and are OK with losing the last few writes in the event of a failover. See the documentation [2] for more information.
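To make the write-side trade-offs concrete, a rough sketch with the Python driver (the table name is made up; `write_acks` lives in the table config, while `durability` is a per-query option):

    import rethinkdb as r

    conn = r.connect('localhost', 28015, db='test')

    # Default: acknowledge once a majority of replicas have the write on disk.
    r.table('events').insert({'type': 'click'}).run(conn)

    # Faster but unsafe: acknowledge after a single replica accepts the write.
    r.table('events').config().update({'write_acks': 'single'}).run(conn)

    # Also faster but unsafe: don't wait for the write to reach disk.
    r.table('events').insert({'type': 'click'}).run(conn, durability='soft')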

[1] http://www.cockroachlabs.com/blog/scaling-raft/

[2] http://rethinkdb.com/docs/consistency/


Thank you for the detailed and fascinating response! I read and re-read it to make sure I didn't miss any details. I understand now why you chose to handle only metadata with Raft. I really appreciate the time you took to explain not only your reasoning, but your reasoning through the alternatives as well. Databases and distributed systems are both fascinating, and very challenging, problem domains. Combining them is even more challenging.


Have you Jepsen-tested the new 2.1 release? If so, what were its results?


Yes. We did months of internal testing, and 2.1 passes the Jepsen tests. We'd love for Kyle to do his own analysis once he gets some free time. In the meantime, there is a bit more info on this in the blog post under the "testing" heading.


Based on this pull request: https://github.com/aphyr/jepsen/pull/70

Am I correct in understanding that the tests only pass in the "safe-but-slow" configuration that timmaxw described (https://news.ycombinator.com/item?id=10043746) and not in the default "fast-but-unsafe" configuration?


AFAIK the tests account for that and test different scenarios (i.e. not all unsafe modes are created equal; a product can still fail the relevant test cases in fast-but-unsafe mode). We pass the fast-but-unsafe tests (i.e. we fulfill the guarantees you'd expect in that mode), and also pass the stricter tests in safe-but-slow mode.

However you're right that we don't pass stricter guarantees in fast-but-unsafe mode (but I'm not aware of any products in the space that do).


Good to know. The test code in that pull request only tests the case where read_mode="majority", so I'm looking forward to seeing the other scenarios when they become available.


We only submitted the configuration that passed all scenarios, but tested some of the others internally. You generally don't get guaranteed linearizability of operations under network partitions in any of the faster configurations.


Can one commission Jepsen analyses from Kyle? Or is it more a matter of hoping he'll do it on his own?


I'm excited to work with RethinkDB on my pet projects. Glad to hear you guys beefed up failovers, that's a big deal in a lot of places. ;)


It's a joy to use for side projects! I'm using it as the datastore in a web app for learning that automatically generates quiz questions from user-provided data. Integrating with it using the Python API has thus far been dead simple.
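For instance, the kind of thing I mean (a minimal sketch; the names are made up):

    import rethinkdb as r

    conn = r.connect('localhost', 28015, db='quizapp')

    # Store a generated quiz question as a plain JSON document --
    # no schema migration step needed.
    r.table('questions').insert({
        'prompt': 'Which release of RethinkDB added automatic failover?',
        'choices': ['2.0', '2.1', '2.2'],
        'answer': 1,
    }).run(conn)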


Awesome work guys. Could you give some details on what you're most proud of and what was most challenging?


There are two parts to this that are really challenging -- designing a robust system (since there are a million different edge cases that can happen in a distributed system), and designing an intuitive interface so that the database does the right thing out of the box, is configurable if users want that, and is actually easy to configure in practice.

Doing each thing independently is hard; doing both of them together in one system is really really hard. I'm particularly proud of how nicely everything clicks together, and how elegant the administration API is. It seems very simple and is extremely powerful; getting that right took many iterations over the course of a year.


Are you interviewing the RethinkDB team here, Russell? ;)


Fantastic news, this looks great! Just a quick question, can I update individual nodes one at a time in my running cluster, or do they all need to be on 2.1 at the same time?


Unfortunately 2.0 nodes cannot connect to 2.1 nodes and vice versa, so depending on your table configuration you might need to update all of them at the same time.


No problem, was just curious. Thanks!


A bit off-topic, but can you recommend any RethinkDB SaaS providers?


Many of our users use compose.io, and it's pretty solid. I highly recommend them.


How is the WAN replication performance for a cluster in 3 DCs with ~100-200ms RTT? Or is WAN replication performance on the todo list?


WAN replication works out of the box, and RethinkDB was designed with multidatacenter setups in mind. Feel free to email me at slava@rethinkdb.com if you need to discuss a specific use case.


Congrats on the release! Could you update the docker hub image? I'm impatient to try it...


Would RethinkDB be a good choice for an invoicing web application, where you can't afford loss of data?


Yes. RethinkDB has very strict durability guarantees (the same as a traditional RDBMS). However, if you need transactions spanning multiple documents, RethinkDB isn't a good choice.

See the FAQ for more details: http://rethinkdb.com/faq/


Are there plans (heh) to write an optimizing planner that can automatically evaluate which indexes to use?


Yes. There are a couple of proposals floating around -- check out https://github.com/rethinkdb/rethinkdb/issues/4150 for details.


RethinkDB is great and has a lot of compelling features; however, the thing that has impressed me the most is the way they communicate with the community. They are incredibly responsive and friendly on GitHub and IRC. It's not uncommon to get a response to a bug report within an hour or two (not that they have any obligation to). They're incredibly nice.

It looks like they try to follow http://www.defmacro.org/2013/04/03/issue-etiquette.html, it'd be great to see other companies adopt it too.

Thanks folks!


I could not agree with this more. The development team is definitely active with the community via GitHub and IRC. I followed along with the support for geolocation indexes, and was really happy to see that come into play.

It's taken a while to get to this point, but the development has been methodical and incredibly well managed in terms of getting the appropriate groundwork in place for a feature (like automatic fail-over) instead of just trying to hack at it or bolt it on the side. For that matter, they've baked solutions for data streaming into the box, as opposed to less thoughtful options.


+1 to this. I visited the team once (I happened to be in California) on very short notice, and the office was a small, tight-knit, nice group of people.


I've said before how much I appreciate the approach the folks at RethinkDB have taken. With automatic failover support baked in, this would definitely be one of my go-to solutions. The management/admin interface is much nicer than any other NoSQL database out there, while offering a lot of the things that a traditional RDBMS offers.

I'd probably reach for RethinkDB before Postgres or others simply for the better administrative experience. Especially for small teams or start-ups that don't have a dedicated DBA role.

For anyone curious, the databases I would most likely reach for, depending on the situation, would be RethinkDB, Elasticsearch, and Cassandra. I really do like MongoDB a lot as well, but RethinkDB offers the features with far less friction, though the query interface takes a bit of getting used to.

That said, I also like more traditional RDBMS options as well. I REALLY like what PostgreSQL offers, but have no desire to administer such a beast; failover isn't really baked in, and the best options are only commercially available, at a significant cost. There are also hosted options on AWS and Azure for various SQL RDBMSes. Still, I find that being able to have hierarchical data structures in collections tends to be a better fit for MANY data needs.

Congratulations to Slava and everyone else at RethinkDB.


This looks awesome, great job guys. Just a question on licenses: the server is "GNU Affero General Public License v3.0" and the drivers are "Apache License v2.0". In simple English, does that mean I can make commercial products with RethinkDB as the backend? These things always confuse me, so apologies if I'm asking something stupid here.


Yes, you can build a commercial product with RethinkDB in the backend without paying us any money; that's explicitly allowed by this licensing scheme.


The AGPL confuses a lot of people on a lot of projects. It's actually really simple: If you modify the package (RethinkDB itself, in this case), you have to release your changes. That's it.


That's not precise. If you modify the package and distribute it to your users, they must have access to the source code (including the changes, which also become licensed under the same terms). Up to this point, it's the same as with GPL.

AGPL additionally requires the code and changes to be distributed to users who directly consume your database over the network. This would be the case, for example, for a database-as-a-service.

Put simply: if you use an AGPL database internally, even for your SaaS, you're fine. If you either modify RethinkDB and ship it as a product, or provide a RethinkDB or RethinkDB-derived database-as-a-service, then you also have to provide the source code to your users.


I'm also curious about this. It seems like the ASL drivers are a bit of a loophole. What happens if Rethink later decides to relicense their drivers as AGPL with a commercial option for businesses? Can third-party drivers be ASL, or is that a violation of the AGPL?


We guarantee that we'll grant anyone who wants to release a driver a more permissive license. We're also happy to do it in writing, both for driver developers and users.

Theoretically we can stop doing that in the future and keep changing the protocol, but that would alienate all of our users. That would be an insane decision, and we'll never ever do it.


AGPL is outright scary. Patch it and suddenly you're in the business of releasing code, checking dependencies' licenses, exposing the server code to anybody, etc. Deal breaker.


Then pay for it?


@coffeemug, do you have an ETA on when performance benchmarks will be released?


This is totally my fault. I've had a pretty comprehensive benchmark report pdf sitting in my inbox for a few months now that the performance team put together, but something else always took my attention away. I'll ask the documentation team to format it and add it to the official documentation. Sorry this slipped through the cracks.


Really looking forward to seeing this.


Great documentation, with some useful examples and tutorials to get you started. I just tried it and was very impressed with the performance and ease of use; the admin section in particular is very handy. I need to try it with a cluster. Any docs/videos on creating a cluster with machines in different locations across the globe?


Check out http://rethinkdb.com/docs/sharding-and-replication/ -- lots of information there.
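For a quick taste of the API side, sharding and replication settings can also be driven straight from ReQL (the table name here is made up):

    import rethinkdb as r

    conn = r.connect('localhost', 28015, db='test')

    # Spread the table across 2 shards with 3 replicas each;
    # the cluster rebalances live, without downtime.
    r.table('users').reconfigure(shards=2, replicas=3).run(conn)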


As a heavy Heroku user - I'm wondering - is there some hosted RethinkDB solution?


Utilizing RethinkDB on compose.io with Heroku does take some hackery, mainly around deploying to the same colo (amzn us-east-1), and then using Heroku env variables to store the ssh key that will be used to connect your api/svc to the Compose box. See this article for more info: https://www.compose.io/articles/tunneling-from-heroku-to-com...

I was never able to get the referenced python script working, but using a shell script inside `.profile.d` did the trick.


Compose.io offers hosted RethinkDB instances, I'd recommend them: https://www.compose.io/rethinkdb/


We've been using compose.io with good success.


Can I ask why you don't provide ready-to-use, fine-tuned Amazon images? This is preventing me from using it now, as I cannot find reliable configuration information. Also, the current image is out of date. Thanks.


Amazon changed the image format, and we've been working to update our images to use it. Here's some more detail: https://github.com/rethinkdb/marketplace-ami/issues/1

If you like, you can launch a 1.16 AMI, ssh into the instance, and run `apt-get update && apt-get install rethinkdb` to update to RethinkDB 2.1.

Thanks for being patient -- we'll get the official images updated soon.


This article describes deploying on AWS with a pre-built AMI -- http://rethinkdb.com/docs/paas/.

We have an aggressive release cycle and try to release every 2-4 months. Unfortunately, the process to get an image onto AWS is fairly slow (with the exception of security patches), so the AMI images are a little behind. We're working hard to update them as soon as possible, but most of the process is out of our hands.


Very cool, thanks for all the hard work that went into this. Will the docs [1][2] be updated at some point to reflect the Python 3.4.x asyncio support? Right now just Tornado is documented.

[1] http://rethinkdb.com/docs/async-connections/ [2] http://www.rethinkdb.com/api/python/set_loop_type/


Yes, we'll update the docs very soon.
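In the meantime, here's a minimal sketch of what asyncio usage should look like, mirroring the Tornado pattern in the current docs (the connection details and table name are illustrative):

    import asyncio
    import rethinkdb as r

    # Switch the driver from blocking sockets to asyncio coroutines.
    r.set_loop_type('asyncio')

    @asyncio.coroutine
    def main():
        conn = yield from r.connect('localhost', 28015, db='test')
        doc = yield from r.table('users').get('some-id').run(conn)
        print(doc)

    asyncio.get_event_loop().run_until_complete(main())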


Great news! Keep up the good work. It's getting harder and harder to justify not using rethinkdb in production :-)

...Doesn't seem available on homebrew yet though.


We should be able to make it available on brew very soon.


Thanks!


Finally, we can convince our management to start using it. All the beauty of ReQL, and now high availability. What more could I ask for?


> Always on – you can add and remove nodes from a live cluster without experiencing downtime.

This has been a long-awaited feature for me. While I loved nearly every aspect of RethinkDB, this was what made me hold back from using it. Good to see RethinkDB keep improving!


I couldn't really find any good docs on how to use the various async Python drivers. All I found were some references to Tornado under `set_loop_type`.

Also, very much looking forward to trying this out!


Besides the documentation article Slava linked to, you might find this blog post helpful: http://rethinkdb.com/blog/async-drivers/

If there's anything else you think is missing from the documentation that would be helpful, open an issue here: https://github.com/rethinkdb/docs


We're still working on the full documentation for the Twisted and asyncio backends. We'll put it up in the next few days. You can find the Tornado documentation under the link coffeemug mentioned.



So happy you've added in Math functions into ReQL. Thank you!
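For anyone who missed it, a quick sketch (I believe 2.1 added `floor`, `ceil`, and `round` to ReQL; the table and field names are made up):

    import rethinkdb as r

    conn = r.connect('localhost', 28015, db='test')

    print(r.round(12.345).run(conn))  # 12
    print(r.floor(12.9).run(conn))    # 12
    print(r.ceil(12.1).run(conn))     # 13

    # They also work as methods on fields, e.g. rounding invoice totals:
    totals = r.table('invoices').map(r.row['total'].round()).run(conn)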


Any support for Windows yet? I'm keen to move to RethinkDB from Redis, but my development is done on Windows at the moment.


A Windows port is in progress (https://github.com/rethinkdb/rethinkdb/labels/windows), but it will take some time to complete. Windows support is one of our most-requested features: https://github.com/rethinkdb/rethinkdb/issues/1100


The comics are awesome.


official JDBC drivers please :)


I know it's kind of a mixed bag. I'm slightly, but not completely, surprised that .NET and Java aren't officially supported. I wouldn't expect JDBC support, but I would expect to see a Java library for use with RethinkDB. Given the flexible nature of RethinkDB and other document databases, dynamic language environments tend to be much easier to support than static environments like .NET and Java.

Honestly, if you want to use RethinkDB with Java, it may be worthwhile to write a domain-centered service with Node or Python and have that as an intermediary for Java. I've actually used Node on several occasions as a translation service for requests to foreign systems, as sometimes there's a lot of disconnect in specific implementations of SOAP/WS-* services from Java/PHP/.NET etc. It tends to work very well for this use case.
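As a rough sketch of that intermediary pattern in Python (Flask and the endpoint shape are my own illustrative choices, not anything official):

    import rethinkdb as r
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route('/users/<user_id>')
    def get_user(user_id):
        # One connection per request keeps the sketch simple;
        # a real service would pool connections.
        conn = r.connect('localhost', 28015, db='test')
        try:
            doc = r.table('users').get(user_id).run(conn)
            return jsonify(doc or {})
        finally:
            conn.close()

    # A Java (or any other) client then just speaks plain HTTP/JSON.
    if __name__ == '__main__':
        app.run(port=8080)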

I've also been looking at GraphQL with some interest, and thinking it could be a pretty awesome option in front of RethinkDB; it would open things up to pretty much any client that speaks GraphQL, though client support is limited right now.


I'm actually writing the official Java driver right now. We should have a beta version out in a few weeks.

Follow https://github.com/rethinkdb/rethinkdb/issues/3930 for progress updates


While it's not JDBC, we are working on an official Java client driver: https://github.com/rethinkdb/rethinkdb/issues/3930


One way to get this would be to develop a Teiid translator (plugin) for Rethink.

https://issues.jboss.org/browse/TEIID-3303 http://teiid.jboss.org/

The normal use case is to use the Teiid JDBC driver (or Postgres ODBC) to connect with the Teiid server and then that handles talking to your datasources via the translators.

I think there was some work done to make a standalone MongoDB JDBC driver using an embedded Teiid server, but I'm not sure how much progress was made there. Mapping document to relational can be tricky.


It won't be JDBC (since we have our own language ReQL) but we're working on an official java driver now: https://github.com/rethinkdb/rethinkdb/issues/3930


It's a shame that they don't have Java support. Although RethinkDB was a better fit, the lack of an official Java driver is what prompted the company I work for to go with another doc DB. I can see they are working on it, but it is too late for us, unfortunately.


Great news! Does anyone know of / recommend a Python ORM for RethinkDB?


Uhhh, what? "Added support for Jepsen tests" shouldn't mean "we sent Kyle a pull request".

That seems like deceptive marketing to me. Once the pull request is merged, then you can make that claim.

Here's the open PR for Jepsen: https://github.com/aphyr/jepsen/pull/70


I wrote that paragraph of the blog post, and I didn't phrase it correctly. What I meant to say is that we hacked the Jepsen tests to support RethinkDB, and then ran the tests internally for months. I added that note late last night and definitely could have phrased it better.

We've been working with Kyle on this, though, and I think he seemed happy to merge the PR (but he's pretty busy). Hopefully that will happen soon; sorry for the confusion.

EDIT: here is some context -- https://github.com/rethinkdb/rethinkdb/issues/1493#issuecomm....



