RethinkDB 2.0 is now production ready (rethinkdb.com)
407 points by hopeless on Apr 14, 2015 | 152 comments

Is Rethink going to stay in the community? Or is there a chance that it could be bought out? I don't want to spend time learning something only to have it go private like FoundationDB. I'm assuming the GNU and Apache licenses are a good thing?

How is RethinkDB licensed?

The RethinkDB server is licensed under the GNU Affero General Public License v3.0. The client drivers are licensed under the Apache License v2.0. http://rethinkdb.com/faq/

Slava, CEO @ Rethink here. There are two aspects that you should consider.

Firstly, as Daniel pointed out, RethinkDB is licensed under AGPL. An acquirer wouldn't have the legal means to close the source code, and with over 700 forks on GitHub they also couldn't do it practically.

But beyond licensing, consider our personal motivations. We've been working on RethinkDB for five years, and had quite a few opportunities to sell the company. We turned them all down because we really believe in the product. The world is clearly moving towards realtime apps, and we feel it's extremely important for open realtime infrastructure to exist. It's easy for people to make promises about the future, but consider this from a game-theoretic point of view. If we wanted to sell, we could have done it long ago. I know it's not a guarantee, but hopefully it's a strong signal to help with your decision.

(Also, there are lots of really interesting companies building products on RethinkDB that we can't talk publicly about yet. It would be silly to sell given that momentum)

I hear you talk a lot about real-time — I guess that's a niche that you noticed. But let me add to that: being "distributed" without major pain is also big. There is a niche to be filled on the (loosely defined) "distributed" spectrum between, say, Redis and Cassandra, and so far you seem to be heading right for that place. I like that a lot and plan to use RethinkDB for a number of projects in the near future.

What about Couchbase? Specifically for the niche on the spectrum between Redis and Cassandra.

My understanding is that Couchbase is just a key-value store, while RethinkDB has many more features, including complex queries, sorting, joins, indexes...

Couchbase has indexing, map-reduce, and full-text search. Also a mobile sync connector.

Couchbase conflict resolution is really basic, assuming you mean distributed in a wan context (eg xdcr). Riak would represent the state of the art in this respect (random internet endorsement, I'm not affiliated with anything, just a user who has worked with both systems).

Our XDCR doesn't have quite the power of our mobile sync conflict management, but I'd wager to say what we do for mobile is unbeatable as far as delivering unsurprising behavior for offline / p2p applications.

Minimal hardware requirements are... huge.

In the unlikely chance that you can talk about future plans, is the idea to commercialize by offering "enterprise" support on top of Rethink, or an extended-feature closed source version, etc...?

At some point people need bread and butter, so I'm curious where that's going to come from :)

<3 Rethink.

Our business plans are all about a subscription support model (see http://rethinkdb.com/services/) and enterprise services on top of RethinkDB. The product will always be open source (hopefully the OSS community at large is past the world of closed source "extensions").

FoundationDB was a closed-source database. It was never open-source.

They had only open-sourced the SQL layer on top of their key/value store, and it's still available on GitHub. The reason: they built it on top of open-source code.

When someone deletes a public repository on Github, one fork remains as the new master. (Here's FoundationDB's SQL-Layer: https://github.com/louisrli/sql-layer)

So: RethinkDB will stay, even if someone tries to pull the plug. Just fork them on Github. :)

Daniel @ RethinkDB here. As you mention, RethinkDB is fully open source, so it is always going to remain freely available.

This does not rule out the possibility of a license change after a hypothetical acquisition, however. Though in that case you'd likely see a community fork branch off from the last open release, leaving the proprietary version upstream.

No, the AGPL uses strong copyleft, so any future derivative work must be released under the same terms (the same license, or later versions, if I'm not mistaken). The only alternative is to start a closed-source clone from scratch that doesn't use any of the original code.

The cases in which the community forks a project under a copyleft license (like LibreOffice) have to do with dissatisfaction with the direction in which the company that owns the original trademarks is taking the project. There's no risk of the source being closed.

> No, the AGPL uses strong copyleft, so any future derivative work must be released under the same terms (the same license, or later versions, if I'm not mistaken). The only alternative is to start a closed-source clone from scratch that doesn't use any of the original code.

Slava @ RethinkDB here.

This has one exception -- the copyright owner can choose to start releasing enhancements as closed source, and they wouldn't be legally obligated to open source them. Currently RethinkDB, Inc. owns the rights to the code -- if we wanted to continue developing closed-source enhancements, technically we could. If we were acquired and our acquirer chose to do that, they could too (since they'd end up owning the rights to the code). Copyright and licensing are different things -- essentially the copyright owner has the right to relicense future code in a different way.

We have extremely strong incentives not to engage in bad behavior (and it runs against our beliefs), but I thought I'd point out that there is no legal barrier.

One caveat with this: if your project is open source and has accepted submissions from non-affiliated entities, you would either need to get them to assign the rights to that code to RethinkDB or remove them from the source if you were to relicense under something that would break the AGPL. This is where things like code releases come in.

This is also why many open source startups don't accept code submissions from outsiders until they've determined they're going down the path of a consulting-focused business model. It's just too risky.

Yes, we ask all contributors to sign the CLA (http://rethinkdb.com/community/cla/).

Just a note: the comparison with foundation db isn't all that apt -- Sun/Solaris+ZFS/Oracle is probably a more apt comparison to what could hypothetically change if RethinkDB is bought up (by, say, Oracle..).

I'm not suggesting that is likely, just that it's more like what would/could happen -- there is already a real, working, full product that wouldn't disappear overnight. And a commercial fork would likely be pretty painful for everyone, just as closed Solaris looks less and less interesting as many of the minds behind the great parts of Solaris work on open Solaris in one form or another, and as fewer community resources go into closed Solaris.

I see the AGPL (as opposed to some ad-hoc license) as another benefit for RethinkDB. Many might not like the copyleft-part -- but at least it is a known and well-documented quantity -- no surprises likely to come from/with forking and/or when trying to merge with other Free software (be that BSD or GPL or...).

Slava, why do you consider that bad behavior?

- You and your team have put in 5 years of effort.

- You have generously shared your mind's product. The community has it.

I don't know about you, but over here in NYC we have to pay for food, rent, clothing, medical insurance, entertainment, etc. Not a single one of those transactions involves counterparties who would smile my way and say "it's on the house".

Well, for one thing, at this point it would be a bait-and-switch.

Authoritative source of information, thanks ;-) I will evaluate it on my next project.

Quick question: How would RethinkDB benefit an E-Commerce shop?

It depends on what you're trying to accomplish. Shoot me an email to slava@rethinkdb.com -- I'd be happy to help!

I don't consider the AGPL to be fully open source. To me, it isn't in the spirit of open source.

(BTW, the Open Source Initiative was unable to get a trademark for "open source", so it doesn't matter that they approved it.)

I hesitate to comment (we've had a few copyleft-vs-BSD discussions...) -- still, I think the best way to look at the AGPL is as the GPL patched to work around the move from software distribution to software as a service. The end user no longer gets a copy of the software, so the GPL no longer protects them -- and the end user is who the GPL is for. (The end user might also be a developer, but that is incidental.) The first freedom (freedom 0) is the freedom to run the code, and you don't have that freedom with SaaS: if the service provider goes away, so does your ability to run the software.

Now, one can agree with the idea that the four freedoms are important, especially as we increasingly live in a world where software is not only convenient but necessary in our daily lives. Either way, the idea behind the AGPL -- and why it is needed for server software -- is pretty clear.

What does "fully open source" mean?


I've started to look into RethinkDB in the past, and I'm very interested in the features it claims. However, I only have so much time to investigate new primary storage solutions, and our team has been burned in the past by jumping too quickly on a DB's bandwagon when the reliability, performance, or tooling just wasn't there.

As of late, we've come to rely on Aphyr's wonderful Call Me Maybe series[0] as a guide for which of a DB's claims are to be trusted and which aren't. But even when Aphyr hasn't tested a particular DB himself, some projects choose to use his tool Jepsen to verify their own claims. According to at least 1 RethinkDB issue on Github, RethinkDB still hasn't done that[1].

Not to poo poo on the hard work of the RethinkDB team, but for me, the TL;DR is NJ;DU (No Jepsen, Didn't Use)

[0] https://aphyr.com/tags/jepsen

[1] https://github.com/rethinkdb/rethinkdb/issues/1493

Slava @ Rethink here.

This is a great point, and we're on it! We have a Raft implementation that unfortunately didn't make it into 2.0 (these things require an enormous amount of patient testing). The implementation is designed explicitly to support robust automatic failover, no interruptions during resharding, and all the edge cases exposed in the Jepsen tests (and many issues that aren't).

This should be out in a few months as we finish testing and polish, and will include the results of the Jepsen tests. (It's kind of unfortunate this didn't make it into 2.0, but distributed systems demand conservative treatment).

This conservative/consistent/responsible approach is one of the reasons I have faith in RethinkDB. You always seem to be taking the time to build it right and that is priceless.

Another good one for testing distributed systems/databases is blockade: http://blockade.readthedocs.org/en/latest/

Understood. We're planning to test with Jepsen soon. This will happen once we have implemented fully automatic failover (at the moment it still requires manual intervention, even though it's usually straightforward). We have a first working implementation, but are still working on the details. It should be ready in the next ~2 months.

See the issue you mentioned https://github.com/rethinkdb/rethinkdb/issues/1493 for progress on this.

I'm going to give this a spin out of pure respect for the team that's dedicated 5 years to a product without cashing out. Hats off. Your CEO has some respectable... anatomy.

I've been using RethinkDB for a while now and I really enjoy working with it. It's a great fit for React and Angular 2 apps with their one-way data flow through the application. Hook up a store or a model to an event source (server-sent events) that streams the RethinkDB changefeed and it's just awesome and simple. Realtime shouldn't be this easy; it totally feels like cheating. Love it.

I also really like the ability to do joins, where before, in Mongo, I would have to handle data joins at the app level.
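To make the changefeed-to-SSE pattern concrete, here's a minimal sketch of the server-side glue being described. The driver call in the comment is an assumption based on the official Python driver's API, and `change_to_sse` is a hypothetical helper name:

```python
import json

# With the official Python driver, the feed would come from something like:
#   feed = r.table("messages").changes().run(conn)   # not run here; needs a live server
# Each change document looks like {"old_val": ..., "new_val": ...}.

def change_to_sse(change):
    """Format one changefeed document as a server-sent-events frame."""
    # An SSE frame is "data: <payload>" followed by a blank line; the
    # browser's EventSource hands the payload to the onmessage handler.
    return "data: " + json.dumps(change) + "\n\n"

frame = change_to_sse({"old_val": None, "new_val": {"id": 1, "text": "hi"}})
```

On the client side, a `new EventSource("/changes")` plus an `onmessage` handler that calls `JSON.parse` on `event.data` is enough to push each change into a React store.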

How do you deal with user authentication, authorization and data encryption? Do you have a web server/application server or do you just combine static js/html/css resources and RethinkDB?

I'm kind of enamoured with the idea of couchapps -- but I'm still not entirely comfortable with having my DB be my web and app server, as well as having it manage passwords, etc. As I read up, I'm slowly convincing myself it's possible to make it work, be easy, support a sane level of TLS, load-balance, and be secure with proper ACL support... but very few tutorials/books seem to deal with that to a level that gives me confidence.

By "an event source [...] that streams the RethinkDB changes feed", the parent is implying a separate web service layer that consumes data from RethinkDB and sends it out to clients. RethinkDB is not meant for direct access by clients. More about RethinkDB access here: http://rethinkdb.com/docs/security/ (TL;DR: plaintext shared key or ssh tunnel)

Do you have any project on github that works like that?

Nothing that's open source, but I can do a little write-up article and share a sample app that shows how to do it.

Would love to see this as well. I'm the creator of the Scala driver and always looking for ways to improve the API. I know you may not be using Scala, but having insight into how other devs would use it always helps. Plus I'm still trying to figure out the best way to do changefeeds, hehe.

I'd personally love to see this. Think it would be very valuable to the community.

I second this. I am getting happier by the day with my React client-side architecture and am now casting an eye at my server side (currently a Django REST Framework API) to determine what the best fit there would be.

That would be awesome, thank you for considering it!

Please do. :)

Now if only Meteor would support this, all would be good in the world.


RethinkDB's realtime capabilities would fit perfectly with Meteor.

How? Meteor's server-side architecture is still oriented around polling the DB, and I believe that's because many apps are still explicit request-response oriented.

As @imslavko said, when using Meteor with MongoDB (which I believe is the only production ready DB driver) it observes the oplog [1] for changes. You can use the polling observer too though.

You can find out more about the Livequery core project of Meteor on their site [2] -- it basically says the implementation of live updates for each DB driver is independent of what the DB is capable of. RethinkDB and Firebase get specific mention as DBs built to make realtime data something you get for relatively little work.

[1] https://github.com/meteor/meteor/blob/devel/packages/mongo/o...

[2] https://www.meteor.com/livequery

No, Meteor's server-side architecture uses MongoDB's replication log, which it analyzes to get updates.

Congrats on the 2.0! It's been interesting to watch as a project.

Do you expect that as you stabilize you'll officially support more drivers? Or are you going to leave that as a community effort?

Slava @ Rethink here.

We're planning to take the most well-supported community drivers under the RethinkDB umbrella (assuming the authors agree, of course). It will almost certainly be a collaboration with the community, but we'll be contributing much more to the community drivers, supporting the authors, and offering commercial support for these drivers to our customers.

That's good to hear, because the Go driver (https://github.com/dancannon/gorethink) has been tirelessly maintained by a single developer, Dan Cannon, and I'm sure he (as well as the Go community) would love to see some support.

Ditto regarding a Java driver. It seems crazy for a database not to provide native Java support.


Very glad to hear that. I tried using RethinkDB with Clojure recently, but there are two drivers. Both are mentioned on your pages. Figuring out which driver I should use isn't a great start — so even if you don't do a lot of development, just pointing to the drivers you consider "canonical" would help.

I had to create a small project for a programming class. I settled on Clojure and Revise (bitemyapp's driver). Avoid that one. I hit a driver bug today but didn't have the time to fix/report it. I just switched to the other one...

I've been using clj-rethinkdb in production for a few months and it's stable. It's also up to date on features AFAIK, and the codebase is fairly clean, so adding new ones as needed shouldn't be hard. The author is also reasonably responsive.

I've also used Revise a lot and it was great, but the author is busy and hasn't been able to keep up with RethinkDB releases, so it's now a few versions behind.

Does anyone have numbers on performance? I tried RethinkDB 1.x and the performance wasn't quite there yet, especially bulk import and aggregations.

We'll be publishing a performance report soon (we didn't manage to get it out today).

Rough numbers you can expect for 1KB size documents, 25M document database: 40K reads/sec/server, 5K writes/sec/server, roughly linear scalability across nodes.

We should be able to get the report out in a couple of days.

Any work done in 2.0 for improving aggregation performance?

The last time I tried with 1.16, I gave up my testing when even the simplest aggregation query (count + group by with what should be a sequential, streaming scan) took literally minutes with RethinkDB, compared to <1s with PostgreSQL. Rethink coredumped before I gave it enough RAM, after which it blew up to around 7GB, whereas Postgres uses virtually no RAM, mostly OS buffers.

We did a couple of scalability improvements in 2.0, but didn't optimize groups and counts specifically.

Would you mind writing me an email with your query, or opening an issue at https://github.com/rethinkdb/rethinkdb/issues (unless you already have)? I'd like to look into it to see how we can best improve this.

We're planning to implement a faster count algorithm that might help with this (https://github.com/rethinkdb/rethinkdb/issues/3949), but it's not completely trivial and will take us slightly longer to implement.

What I was doing is so trivial, you don't really need this information. This was my reference SQL query:

  select path, count(*) from posts group by path;
(I don't have the exact Rethink query written down, but it was analogous to the SQL version.)
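For reference, the ReQL analog would presumably be the query in the comment below (a reconstruction, not the poster's original). The pure-Python function underneath models what it computes, with `posts` as made-up sample data:

```python
from collections import Counter

# Reconstructed ReQL analog of the SQL above (not run here; needs a live server):
#   r.table("posts").group("path").count().run(conn)

def group_count(docs, key):
    """Count documents per value of `key`, streaming one doc at a time."""
    counts = Counter()
    for doc in docs:
        counts[doc[key]] += 1  # memory is O(#groups), not O(#docs)
    return dict(counts)

posts = [{"path": "/a"}, {"path": "/b"}, {"path": "/a"}]
# group_count(posts, "path") -> {"/a": 2, "/b": 1}
```

The point of the model: with 94 distinct paths, the working state is 94 counters, regardless of how many documents stream past.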

You can demonstrate RethinkDB's performance issue with any largeish dataset by trying to group on a single field.

The path column in this case has a cardinality of 94, and the whole dataset is about 1 million documents. Some rows are big, some not; each has metadata plus a JSON document. The Postgres table is around 3.1GB (1GB for the main table + a 2.1GB TOAST table). Postgres does a seqscan + hash aggregate in about 1500ms.

It's been months since I did this, and I've since deleted RethinkDB and my test dataset.

As a second data point: I tried

    table.groupBy(function (x) { ... }).count()
where the function maps the documents into one of 32 groups (so that's fewer than your 94, but it shouldn't make a giant difference... I just had this database around). I did that on both a 1 million and a 25 million document table, and memory usage looked fine and very stable. This was on RethinkDB 2.0, and I might retry it on 1.16 later to see if I can reproduce the problem there.

Do you remember if you had set an explicit cache size back when you were testing RethinkDB?

Cool. Well, the process eventually crashed if I used the defaults. I had to give it a 6GB cache (I think, maybe it was more) for it to return anything. The process would actually allocate that much, too, so it's clear that it was effectively loading everything into memory.

Are you sure the analogous RethinkDB query was using the index? IIRC it's not enough just to use the column name (or it wasn't; I don't keep up).

It wasn't using an index, but then Postgres wasn't, either. I don't think aggregating via B-tree index is a good idea; aggregation is inherently suited to sequential access. An index is useful only when the selectivity is very low.

If you wrote your query with group and count, with no index, then there would be problems with the performance. RethinkDB generally does not do query optimization, except in specific ways (mostly about distributing where the query is run), unless that's changed very recently. You can write that query so that it executes with appropriate memory usage with a map and reduce operation.
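As a sketch of the map/reduce shape being suggested: the ReQL line in the comment is an assumption based on the driver's documented `map`/`reduce` methods, shown for a plain count, with the same fold mirrored in plain Python:

```python
from functools import reduce

# With the Python driver, a streaming total count would look roughly like
# (not run here; needs a live server):
#   r.table("posts").map(lambda doc: 1).reduce(lambda a, b: a + b).run(conn)
# Each shard maps documents to 1 and sums pairwise, so no node ever has to
# materialize the whole table. The same fold in plain Python:

docs = [{"path": "/a"}, {"path": "/b"}, {"path": "/a"}]
total = reduce(lambda a, b: a + b, map(lambda doc: 1, docs), 0)
# total == 3
```

Accumulating a running sum rather than grouped buckets is what keeps memory bounded.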

Do you think map/reduce would result in performance near what I get from Postgres?

You would get the same behavior as Postgres in terms of how data is traversed and aggregated -- that is, not by building a bunch of groups and counting them after the fact. I do think RethinkDB ought to be able to apply aggregations to group queries on the fly, though... I'm not really up to date on that.

Postgres will still have better numbers, I'm sure. It has a schema for starters.

While I don't know how RethinkDB is structured internally, I don't see any technical reason why a non-mapreduce group-by needs to load the entire table into memory instead of streaming it, or why a mapreduce group-by needs to be slow. M/R only becomes a slow algorithm once you involve shards and network traffic; any classical relational aggregation plan uses a kind of M/R anyway.

Postgres has a schema, of course, but it still needs to look up the column map (the ItemIdData) in each page as it scans it, the main difference being that this map is of fixed length, whereas in a schemaless page it would be variable-length.

Anyway, I'm hoping RethinkDB will get better at this. I sure like a lot about it.

Generally speaking, RethinkDB doesn't do query optimization, except in deterministic ways, unless they've changed policy on this. I don't see any reason why a plain group/aggregate query couldn't be evaluated appropriately -- I know it is when the grouping is done using an index, and maybe it now is when the grouping is done otherwise (I don't know, but it would be sensible; I'm out of date).

(Also it would be nice if it did/does, because performance will still be terrible if you have too many groups, otherwise.)

I haven't used RethinkDB, but I would assume the answer is no. Choosing to use map/reduce is basically a declaration that performance is your lowest priority.

An optimally planned query in Postgres would effectively be mapping and reducing.

And the point is that the converse is definitely not true.

Postgres knows about the structure of your data and where it's located, and can do something reasonably optimal. A generic map/reduce algorithm will have to calculate the same thing as Postgres eventually, but it'll have tons of overhead.

(Also, what is with the fad for running map/reduce in the core of the database? Why would this be a good idea? It was a terrible, performance-killing idea on both Mongo and Riak. Is RethinkDB just participating in this fad to be buzzword-compliant?)

While there have been some truly misguided mapreduce implementations, mapreduce is just a computation model that isn't inherently slower than others: A relational aggregation of the type you get with SQL like:

  select foo, count(*) from bar group by foo
...is essentially a mapreduce, although most databases probably don't use a reduce buffer larger than 2. (But they would benefit from it if they could use hardware vectorization, I believe.)

Mapreduce works great if you are already sequentially churning through a large subset of a table, which is typically the case with aggregations such as "count" and "sum". Where mapreduce is foolish is when you try using mapreduce for real-time queries that only seek to extract a tiny subset of the dataset.

There is no relevant knowledge that Postgres has and RethinkDB lacks that lets it evaluate the query more efficiently (besides maybe a row layout with fixed offsets, so that it doesn't have to parse documents, but that's not relevant to the reported problem). A generic map/reduce certainly would have more overhead, obviously, but not the running-out-of-memory overhead reported above; just the overhead of merging big documents.

The reason you run queries in "the core" of a database is because copying all the data outside the database and doing computations there would be far worse.

Thanks for the info. I'll look into this. The fact that we are running out of memory suggests that we're doing something wrong for this query.

Rough numbers indeed -- you forgot to define what a "server" is: dedicated hardware with a 16-core Xeon and 4x SSD in hardware RAID 0, or a DigitalOcean VPS with 512MB of RAM? ;-)

Daniel @ RethinkDB here. We'll release the details shortly. This was running on 12 core Xeon servers with 2 SSDs each in software RAID 0. There were also additional read queries running at the same time as the write queries, and the read throughput that coffeemug posted is the sustainable increase in reads/s that you get when adding an additional server to a cluster. Single-server performance is much higher due to missing network / message encoding overhead.

I realize these numbers alone are still not very meaningful and there are many remaining questions (size and structure of the data set, exact queries performed etc). Rest assured that all of these details will be mentioned in the actual performance report that should be up soon.

10 gbps ethernet?

Two 1 Gbit Ethernet ports, one used for intra-cluster traffic and one for client connections.

Thank you for providing some quick back-of-the-envelope numbers here, which is exactly what most people are looking for at a first pass. One question though - do those numbers change considerably between disk vs SSD?

The numbers were measured on SSD. If the active/hot dataset fits into RAM, the numbers between SSD and rotational don't change much. If the active dataset doesn't fit into RAM, RethinkDB performs significantly worse on rotational.

Can you give a ballpark on how many nodes Rethink can scale up to, and any future roadmap in that direction?

Thank you for the fantastic product, by the way! :)

We often run tests scaling up to ~40 nodes without problems. You could probably push Rethink quite a bit further than that, but I think over 100 nodes would be hard. The goal is to keep pushing the boundary indefinitely.

Thank you Slava. Put all these numbers on the front of your webpage :)

Looking forward to seeing the report!

I contributed the benchmarks to Dan's gorethink driver. Dan is great to collaborate with so if you want to hack on Go and contribute to OSS, consider giving his project a look.

One way to improve writes is to batch them; an example is here.


I believe the RethinkDB docs state that 200 is the optimum batch size.

Another way is to enable the soft durability mode.


"In soft durability mode RethinkDB will acknowledge the write immediately after receiving and caching it, but before the write has been committed to disk."


Obviously your business requirements come into play. I prefer hard writes because my data is important to me, but I do insert debug messages using soft writes in one application I have.

*Edit: Heh I forgot to mention, on my Macbook Pro I was getting 20k w/s while batching and using soft writes.
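A rough sketch of the batching idea: the driver call in the comment is an assumption based on the Python driver's API, `chunks` is a hypothetical helper, and 200 is the batch size mentioned above:

```python
# Batched inserts with soft durability would look roughly like this with the
# Python driver (not run here; needs a live server):
#   for batch in chunks(documents, 200):
#       r.table("events").insert(batch).run(conn, durability="soft")

def chunks(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

batches = chunks(list(range(450)), 200)
# batch sizes: [200, 200, 50]
```

The trade-off is the one quoted above: with soft durability the server acknowledges after caching the write, so a crash can lose the most recent batches.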

Individual writes for me are hovering around 10k w/s on the 8-CPU, 24GB instance I have. But yeah, define your business reqs, then write your own benchmarks and see if the need is met.

Many devs write benchmarks in order to be the fastest and not the correctest. Super lame.

For the rubyists out there check out http://nobrainer.io/

Is anyone using nobrainer in production?

We're currently using Mongoid (a MongoDB ORM), and an Active Record-like ORM for RethinkDB is the main thing holding us back.

I don't have great insight into NoBrainer, but last I checked it seemed like joins weren't implemented (though they were on the roadmap).

I like rethinkdb and have been successfully using their official ruby and js libraries for some time.

The NoBrainer ORM wasn't fun, though; too many edge cases that interfere with Active Record and Rails conventions. Going a bit on a tangent: after many experiments I've developed a strong conviction that pg is the best database choice for Rails, especially with the jsonb datatype included in 9.4. It's the best of two worlds: a reliable, proven SQL DB that plays really well with Rails and has NoSQL capabilities, including indexing and querying. So good. YMMV.

Selling support is a great non-intrusive business model.

Except that it incentivises a company to build a product that requires continuing support.

That can be a good thing or a bad thing.

> Except that it incentivises a company to build a product that requires continuing support.

People say this a lot, but in our case we really haven't seen this incentive for a couple of reasons.

Large organizations are more than happy to pay for training and development support to accelerate their time to market. It doesn't matter how polished your product is -- databases are complex enough that people are willing to pay for best practices, training, and support.

Similarly, databases are pretty critical pieces of the infrastructure. If anything goes wrong, it can significantly impact the business, so people always want operational/production support.

There are many enterprise services that can be built on top of the product that can be very valuable. You don't have to build a crappy product -- there are plenty of ways to monetize with a great product.

Finally, a bad product will significantly limit growth of the company in the long term. There are lots of options now -- you can't get away with building a crappy product and an artificial monopoly.

If you see a crappy product from a company that offers subscription support, it's probably not because of misaligned incentives. Building databases is really hard, I don't think the business model has much to do with it.

Selling support for terrible (but free!) software is usually known as the "MongoDB model," so it's a proven path to riches in the database market.

Lots of congratulating on this thread and a hell of a lot of points for a software release. I've been on HN consistently for a long while and I didn't realize there was so much love and hype for RethinkDB here.

Have I missed something?

I guess you have. There are a lot of us into alternative databases that are hoping for Rethink to fulfill the original promise of MongoDB. That said, I can't blame you for not devoting a bunch of attention to it. :)

Can you be more specific on the original promise MongoDB didn't fulfill?

It's a nightmare to scale and has performance quirks that are really unexpected. Many, many companies have had to spend enormous amounts of developer time to migrate off of MongoDB to something else.

That and the unreliable-by-default write settings.

I think MongoDB is following a path similar to what MySQL did: be really good at one thing, market it as something else -- and then slowly, slowly catch up to the hype (sort of).

As I understand it, MongoDB has changed the default settings (probably why someone downvoted you) -- but the fact that they were off by default is still something that is rightfully hard for the team to live down.

And while a lot of people are probably still happily using MySQL -- I personally see little use for it, when PostgreSQL is an option.

I may be wrong, but I think both MongoDB and MySQL appeal to the same groups: people who don't know or care about normalization, databases, and data structures -- and really just want image-based development (as in Smalltalk), but have been tricked into using PHP/JavaScript, etc.

It's kind of crazy that you have two mature (one Free, one free) object databases that have seen some real-world usage -- and neither gets any love.

One is ZODB, the Zope Object Database, developed for Zope/Plone -- one of the first web application frameworks -- and a major contributor to Python (invented eggs, buildout...). It's ridiculously easy to use outside of Zope/Plone/Pyramid[1] and now has a free replication service[2].

The other one is GemStone's GLASS[3], which works with Smalltalk; they also have their own Ruby runtime, MagLev[4].

[1] http://zodborg.readthedocs.org/en/latest/documentation/artic...

[2] http://www.zope.com/products/x1752814276/Zope-Replication-Se...

[3] http://seaside.gemtalksystems.com/

[4] http://maglev.github.io/

I've been following RethinkDB on HN for quite a while now and have been eagerly awaiting them to make a production-ready statement. Everything I have read has sounded very promising and I am excited to try it out!

Awesome news. I have used Rethink for a few internal projects, and while I don't think it has that one "killer feature" that other DBs don't, it is such a painless experience in development and deployment that it's just worlds better than trying to set up and scale some of the other solutions.

BZ, RethinkDB team.

Congrats on the 2.0 release! Changefeeds are an incredibly powerful feature. We're looking forward to the next release with automagic failover!

Congratulations, been looking forward to this release for a while!

I think this is a good place to say thank you for your work on the Go Rethink driver. It's a clearly written, easy-to-follow, and effective piece of code.

Thank you very much! I hope to have an update to the Go driver which supports RethinkDB v2.0 within a couple of hours.

I would like to thank you as well! I didn't really have any time to work on rethinkgo after I made the first version, thanks for doing such a good job with gorethink.

Just released the latest update to the Go driver, it has some pretty big changes including the ability to connect to a RethinkDB cluster + automatic host discovery.

For more information check out https://github.com/dancannon/gorethink/releases/tag/v0.7.0.

Congrats Slava, Mike & team. In an age of thin apps getting shipped in weeks or months, the patience you showed in spending 5 years developing some pretty hard-core technology is amazing. Really excited for you guys!

Any plans for releasing an officially supported Java driver? For most enterprise-oriented apps, having an officially supported Java driver would be great.

Yes! No ETA yet, but we're on it.

Does RethinkDB have a concept of transactions? My question is actually about restoring a lost node... If a node is rebooted, will all the data for its shards be sent again? Or just the delta?

Similarly if I have to rebuild a node from scratch, is there a way to prime it so that a massive copy of all the data in the cluster gets copied to it from the other nodes?

> If a node is rebooted, will all the data for its shards going to be sent again? Or just the delta?

Just the delta. We built an efficient, distributed BTree diff algorithm. When a node goes offline and comes back up, the cluster only sends a diff that the node missed.
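The real algorithm diffs the BTrees themselves, but the general idea of shipping only what the replica missed can be sketched with a toy version-map model (all names here are hypothetical illustrations, not RethinkDB internals):

```python
def compute_delta(primary, replica):
    """Return the entries the replica is missing or has stale.

    Both arguments map key -> (version, value). A real system compares
    BTree subtrees instead of scanning every key, but the outcome is the
    same: only changed entries cross the network.
    """
    delta = {}
    for key, (version, value) in primary.items():
        rep = replica.get(key)
        if rep is None or rep[0] < version:
            delta[key] = (version, value)
    return delta


def apply_delta(replica, delta):
    """Bring the replica up to date with the computed delta."""
    replica.update(delta)
```

The payoff is that a node which was offline briefly receives a delta proportional to what changed, not to the size of its shards.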

> Similarly if I have to rebuild a node from scratch, is there a way to prime it so that a massive copy of all the data in the cluster gets copied to it from the other nodes?

You don't have to do that, it happens automatically. You can have full visibility and control into what's happening in the cluster -- check out http://rethinkdb.com/docs/system-tables/ for details on how this works.

> You don't have to do that, it happens automatically

Well, in a past life, I used another store that did that automatically. The issue with that is that EITHER it kills the cluster with read congestion as it rebuilds the "new" node, OR, if you limit the bandwidth for node rebuilding, it takes forever and a half to rebuild a node, which means you are exposed with one less shard of what was on that node.

What are the chances of a filesystem snapshot being consistent enough to prime a crashed node? What about restoring backup files from other nodes?

Congestion vs. time is definitely a hard problem. We've done an enormous amount of tuning to make this work, and the upcoming Raft release does even more. This part has been quite solid for a while, so I think you might have a better experience with RethinkDB than what you're used to.

There is currently no other way to prime the node -- I hope we don't have to add it. This sort of functionality should work out of the box.

I've updated NixOS to include 2.0.0-1: https://github.com/NixOS/nixpkgs/commit/fe6ec3d13a1554458e64... - any way we can get it mentioned on the website?

Could you submit a pull request to the docs? (https://github.com/rethinkdb/docs)

Congrats guys, RethinkDB has been a joy to use so far, but the 3rd party .net driver needs some help. I filed an issue here: https://github.com/rethinkdb/rethinkdb/issues/3931

Big fan of RethinkDB. Use it in all of my projects these days.

What were you using before? What are the pros and cons of the switch?

I'm very happy to see this milestone. Even though I haven't used it recently, I remember trying it 2-3 years ago (adtech) for some heavy production workload. Even though we chose another product (Cassandra), I was genuinely surprised by how well it performed! Congrats!

Well done guys! Have been wanting to use rethinkdb for my project but it didn't have the "production ready" tag, so Mongodb was chosen instead. Now I can confidently switch! It's a pity the Go driver isn't quite there yet though.

I hope to have a "production ready" version of the driver ready in about a month. I know it's slow, but currently I am the only dev maintaining this project and all work is done in my free time.

If you have any further questions I would be more than happy to answer them on https://gitter.im/dancannon/gorethink. Thanks!

Hey, no worries, I absolutely understand. Apologies for lamenting the pace/state of your contribution, and thanks for your time and effort.

The commercial services launch is critical and will speed adoption by large players.

Why would I use RethinkDB instead of OrientDB?

Check out http://rethinkdb.com/faq/ for details on when RethinkDB is a great choice. The short version is that if you're building realtime apps, RethinkDB is an awesome choice because it pushes data to the application (which makes building and scaling realtime apps dramatically easier).
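The push model the FAQ describes can be illustrated with a toy in-memory sketch (this is an analogy for how changefeeds behave from the app's perspective, not RethinkDB driver code; all names are hypothetical):

```python
class Table:
    """Minimal observable table: subscribers get changes pushed to them,
    instead of polling the table for updates."""

    def __init__(self):
        self.rows = {}
        self.subscribers = []

    def changes(self, callback):
        # Register a callback; it receives {'old_val': ..., 'new_val': ...}
        # for every subsequent write, mirroring the changefeed result shape.
        self.subscribers.append(callback)

    def insert(self, key, doc):
        old = self.rows.get(key)
        self.rows[key] = doc
        for cb in self.subscribers:
            cb({"old_val": old, "new_val": doc})
```

With the real database, the equivalent subscription is a `changes()` query on a table, and the server streams these old/new pairs to the client over the driver connection.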

Hi Slava, the FAQ has a typo in the second sentence: "architecutre".

Thanks -- fixed. Will take a little bit to push the site update live.

But ... but I expected the push to be real time. Just kidding.

Lots of hard work has been poured into this release :)

Congrats to the RethinkDB team!

Congrats Slava, Mike and the rest of the folks at RethinkDB!

Brilliant name (Yojimbo) and great cover photo there...

Congrats to the RethinkDB team on this huge milestone!

Looking forward to installing it from Homebrew, but it's not there yet. Good to see that the Python driver is already updated on pip!

It should be out later today. We're working on it now.

Now that 2.0 is production ready, will we be seeing some RethinkDB providers? A simple Heroku integration would be amazing for quickly prototyping apps with a new database technology.

As Slava mentioned, you can use Compose.io. It requires using an SSH tunnel, though, which is a little tricky in Heroku. Here's a tunnel script I made to simplify this:


In particular, it reads the entire SSH private key as an environment variable, so you don't need to commit the key to the git repository.
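The linked script itself isn't reproduced here, but the env-var part of the approach looks roughly like this (a sketch with hypothetical variable names; the actual script may differ):

```shell
#!/bin/sh
# Materialize the SSH key from an environment variable ($COMPOSE_SSH_KEY,
# a hypothetical name) into a temp file, so it never lives in the repo.
write_key_file() {
    keyfile="$(mktemp)"
    printf '%s\n' "$COMPOSE_SSH_KEY" > "$keyfile"
    chmod 600 "$keyfile"   # ssh refuses keys with loose permissions
    echo "$keyfile"
}

# The tunnel itself would then be something like (user/host are placeholders):
#   ssh -i "$(write_key_file)" -N -L 28015:localhost:28015 user@compose-host
```

On Heroku you'd set the variable with `heroku config:set`, so the key travels through the environment rather than the git history.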

You can spin up RethinkDB today with https://www.compose.io/ (it's surprisingly easy, and their support is awesome). It should be pretty easy to build a RethinkDB Heroku plugin based on Compose. If the community doesn't get around to it, we can probably do it internally pretty easily.

Congrats Mike & Team!

Congrats guys! I've been looking forward to using Rethink.

Is Windows support coming anytime soon?

I wish they did official TypeScript definition files. I'm a bit wary of relying on a huge DB API with community-maintained definitions only.

There are reasons to write TypeScript definitions for documentation generation too, even if you don't write your code in TS.

There is an official spec here: http://rethinkdb.com/docs/writing-drivers/ Not quite TS, but it's well defined and new releases of the spec are carefully managed.

For me, TS is a tool to ensure my code is not using deprecated APIs. This is partly why Facebook is also pushing typing to JS with Flow.

Edit: And Guido is pushing it to Python with PEP 484: https://www.python.org/dev/peps/pep-0484/

It's an inherent problem with dynamic languages: you have to read all the new release documents and migrate your code. With typed code I can at least be somewhat sure I'm not using deprecated calls and such just by compiling.
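In a dynamic language the closest analogue is a runtime deprecation warning rather than a compile-time error; a minimal sketch of the contrast, with hypothetical function names:

```python
import warnings


def insert(doc):
    """The current API call."""
    return {"inserted": 1}


def old_insert(doc):
    """Deprecated alias: callers get a DeprecationWarning at runtime,
    rather than the compile-time error a type checker would give."""
    warnings.warn(
        "old_insert is deprecated; use insert",
        DeprecationWarning,
        stacklevel=2,
    )
    return insert(doc)
```

Unlike a type checker or a TS `@deprecated` annotation, this only fires when the deprecated path actually executes, which is exactly the commenter's point about dynamic languages.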


Any thoughts about multi-doc transactions?

It's not currently on our road map.

Even though there are some well-researched algorithms for it, actually implementing transactions in a distributed system is pretty hard. It also comes at significant performance costs, which would interfere with our goal of easy and efficient scalability.

Thank you for the comment. I was wondering if something along the lines of http://blog.labix.org/2012/08/22/multi-doc-transactions-for-... would be feasible.

Congratulations guys! Amazing update :D

This is awesome :)

Congrats guys!
