The Future of RethinkDB (changelog.com)
128 points by williamstein on Nov 3, 2017 | 56 comments

I'm interested in the backchannel and business that happens when a company that raises multiple millions of dollars shuts down. Do the investors take over? Do the founders have to pay back investors? Is there any liability personally by the founders to investors? This is something that is never really discussed, probably because it gets messy with lawyers.

EDIT: Just heard Mike say that CNCF acquired the IP and assets of RethinkDB for $20K? That can't be right, can it? Tiny startups that generate $1K a month in recurring revenue sell for more than $20K. RethinkDB raised over $12M, right? What am I missing?

The transaction was more of a donation. Companies get bought because the technology is strategically important or because the business is profitable (either the business model scales up, or business functions overlap and you can reduce expenses). I'd guess that no companies with enough money to pay sufficiently more than $20K though thought either of those were the case?

> I'd guess that no companies with enough money to pay sufficiently more than $20K though thought either of those were the case?

It seems like an official hosted RethinkDB that included enterprise support could generate pretty nice MRR and take business from Compose.io (IBM). Shoot, I wish I'd known; $20K for the IP and assets was a steal. It probably would have cost exponentially more for a buyer who wanted to turn it commercial, though.

I don't know. Databases are a hard business (see Riak, FoundationDB). Costs money to add features and fix bugs (and really hard to find qualified folks that can do that) as well as hosting costs if you offer a service. We'll see what happens with some other entrants. I would read http://www.defmacro.org/2017/01/18/why-rethinkdb-failed.html regarding DBaaS, too.

I've read it. I completely agree about RethinkDB specifically, which raised venture capital on the order of $12M. A cloud-hosted solution is a very tough business, with thin margins when you have dozens of employees, $15K-$25K a month in Bay Area rent, and very high overhead. The numbers don't add up, and it's not the return investors are looking for.

Though, if you're bootstrapping with your own capital and grow it to something like $10K or $20K a month in MRR, that's a win. I'm all about bootstrapping SaaS companies and growing recurring revenue.

The company wasn't bought, it went out of business. This transaction was just for the rights to the IP so that the open source project could continue under the same name.

I really enjoy RethinkDB. Partly, that's because it's the first time I've really dug into a NoSQL database, and I'm loving the freedom of it... Being able to dump stuff from a websockets feed straight into the database is awesome! Also, I like the web interface. It's simple and shows me what I want to see with regard to performance and storage.

I haven't really gotten much into doing fancy queries or transforms or streams.

If someone could make a tool that let you use RethinkDB as a (more or less) direct back end for pandas... that would be killer.

I do a lot of work with RethinkDB + Pandas. We should talk.

I use RDB heavily with some Pandas hanging off the side because we haven't integrated them well but looking to improve that. You can get in touch with me if you'd like - I use this handle pretty much everywhere (except reddit ;) ).

my contact info is at wondering.xyz/contact


I assume they're doing some dataframe munging in Python.


yup, and I love pandas! it's so amazing! it makes me want to do data analysis just for fun

I've had to convert 500+ CSV files into graphs... Pandas + seaborn were the most effective solution!

Is there any update on horizon.io?

Its GitHub repo hasn't received any updates in a very long while.

I'm not sure if the pun was intended, but if not, it would've made an excellent joke.

It's almost hard not to make that joke, horizon.io just lends itself to that so readily.

Unfortunately, people don't always get jokes right away. Certain amounts of explanation can be necessary.

What is the primary use case for RethinkDB (vs. other databases)?

There isn't one. But there also isn't really a primary use case for Postgres, as compared to MySQL. It's just a pretty pleasant-to-use NoSQL database.

Easy clustering, first-class changefeeds, a somewhat confusing query system, and the fact that it runs in the current working directory by default make it dead simple to set up for development. No fire-and-forget writes.

Reminds me a bit of firebase, but free and open-source. If you're in a position where changefeeds or nosql are important, you should probably give rethinkdb a look.

That admin UI is the slickest in the biz tho

While not as powerful, I enjoy CockroachDB's UI too. And it's wire-protocol compatible with Postgres!

We use it here (ZeroTier).

Pros:
- Very easy and robust clustering (Raft-based, automatic fail-over). This is huge for us.

- Streaming change feeds. This one is also huge. Makes any kind of real-time, reactive, or event-driven programming very easy and IMHO is something that should exist in every database.

- It's kind of half SQL. It's a NoSQL document store but encourages a relational design and supports many relational queries.

- Rational and pretty easy to understand query language. It's much cleaner than Mongo.

- Easy to deploy and configure.

- It passed the Jepsen tests before Mongo did and overall has a solid history of not losing data.

Cons:
- It's a CPU hog, at least when compared with PostgreSQL.

- It's also an I/O hog, though we sponsored some improvements, being merged in the next version, that will reduce this and also make table commit behavior a configurable parameter. You'll be able to have fully in-memory and partially in-memory (long flush delay) tables for highly ephemeral data.

Hey Adam - I'm very interested in your sponsored contributions towards reducing IOPS in RethinkDB. It's our biggest challenge, even though we're on SSDs. Our backfills after outages are especially long and painful. Are these updates something you're running safely in production today?

How do you handle the lack of transactions / atomic updates affecting more than one document?

If you're going to use a database that does not support these features then you should come up with a data model that does not rely on them.

For example, instead of applying a bank account transfer as a database transaction that debits one account record and credits another, you create a new transaction record (account transaction, not database transaction.) Then account balances are a sum over these transaction records.
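A minimal sketch of that ledger pattern, with a plain Python list standing in for the document store; the `Entry` type and the `transfer`/`balance` helpers are hypothetical names for illustration. Each transfer is a single immutable record, so there is only one write that needs to be atomic, and balances are derived by summing over the records rather than stored.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass(frozen=True)
class Entry:
    """One account transaction record (not a database transaction)."""
    txn_id: str
    from_account: str
    to_account: str
    amount: Decimal


def transfer(ledger: List[Entry], txn_id: str, src: str, dst: str, amount: Decimal) -> None:
    # A single insert of a single document: no multi-document atomicity needed.
    if amount <= 0:
        raise ValueError("amount must be positive")
    ledger.append(Entry(txn_id, src, dst, amount))


def balance(ledger: List[Entry], account: str) -> Decimal:
    # The balance is derived from the records, so there is no stored
    # balance field that could drift out of sync with them.
    credits = sum((e.amount for e in ledger if e.to_account == account), Decimal(0))
    debits = sum((e.amount for e in ledger if e.from_account == account), Decimal(0))
    return credits - debits


ledger: List[Entry] = []
transfer(ledger, "t1", "alice", "bob", Decimal("25.00"))
transfer(ledger, "t2", "bob", "carol", Decimal("10.00"))
print(balance(ledger, "alice"), balance(ledger, "bob"), balance(ledger, "carol"))
```

In a real store you would likely use the transaction id as the primary key, so a retried insert cannot be applied twice; the list here does not enforce that.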

Our data model generally doesn't require this. We're actually okay with fewer guarantees than RethinkDB provides. AFAIK NoSQL stores in general are a bad choice if you need this. You should use a SQL database.

To be clear, NoSQL vs. SQL doesn't mean much; use the right type of database for the scenario: relational, document, graph, key/value, etc.

They all have varying support for transactions, with relational databases usually the most comprehensive.

Not OP, but I admit that's not a problem I've ever had. I've done a lot of webapps supporting data-science pipelines, and I've built some major components of those pipelines. It's not something I've felt the need for when I've used postgres.

What do you use that for?

Destructive data rollups (combining multiple rows and deleting the original rows) are made much simpler with transactions, especially if you do hybrid aggregations of short-term data and long-term data.

For example, you might store data with minute level granularity for the past 24 hours but only hourly for the past 30 days. If someone queries the past two days, you need to look at both those datasets. Then, every hour or so, you need to summarize an hour of minute level data, insert it into the hourly granularity table and then remove it from the minute granularity table. Meanwhile, you want to make sure any queries aren't going to double count that data after insertion but before removal.

This can be done without transactions in a few ways, but they require putting your replication and rollup logic and constraints into your reading code, rather than having it isolated in your rollup code. And your data model has to be tweaked to allow for some of these operations. And the complexity often results in double-counting bugs (or bugs where the data is not counted at all).

There are solutions though. They just require a lot more hoops than starting a transaction, moving the data, committing the transaction.
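For contrast with the workarounds above, here is a sketch of the transactional version using Python's built-in `sqlite3` module (chosen only because it ships with Python; the `minute_data`/`hourly_data` tables and their schema are hypothetical). The insert into the hourly table and the delete from the minute table commit together, so a reader never sees the same hour counted in both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE minute_data (ts INTEGER NOT NULL, value REAL NOT NULL);
    CREATE TABLE hourly_data (hour_start INTEGER PRIMARY KEY, total REAL NOT NULL);
""")

# Hypothetical sample: 120 one-minute readings (two hours), each with value 1.0.
conn.executemany(
    "INSERT INTO minute_data (ts, value) VALUES (?, ?)",
    [(minute * 60, 1.0) for minute in range(120)],
)
conn.commit()


def roll_up_hour(conn: sqlite3.Connection, hour_start: int) -> None:
    """Move one hour of minute-level rows into a single hourly row, atomically."""
    hour_end = hour_start + 3600
    # "with conn" commits on success and rolls back if anything raises, so the
    # INSERT and DELETE become visible together or not at all. Rolling up the
    # same hour twice hits the PRIMARY KEY and rolls back instead of double counting.
    with conn:
        conn.execute(
            "INSERT INTO hourly_data (hour_start, total) "
            "SELECT ?, COALESCE(SUM(value), 0) FROM minute_data WHERE ts >= ? AND ts < ?",
            (hour_start, hour_start, hour_end),
        )
        conn.execute(
            "DELETE FROM minute_data WHERE ts >= ? AND ts < ?",
            (hour_start, hour_end),
        )


def total_over_both_tables(conn: sqlite3.Connection) -> float:
    # A single statement, so both subqueries read the same consistent state.
    (total,) = conn.execute(
        "SELECT (SELECT COALESCE(SUM(value), 0) FROM minute_data)"
        " + (SELECT COALESCE(SUM(total), 0) FROM hourly_data)"
    ).fetchone()
    return total


roll_up_hour(conn, 0)
print(total_over_both_tables(conn))
```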

It's good for really easy changefeeds. E.g., in a multiplayer game scenario you might have several people touching rows in a table. Any client can construct a query filtering on their particular subset, simply call .Changes(), and get a fast, reasonably robust changefeed with an image-then-deltas type interface. It's not particularly low latency, and the latency is quite variable, so if your timing requirements are on the order of milliseconds, look elsewhere.
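The "image-then-deltas" interface can be illustrated without a server. The sketch below is a hypothetical in-memory table in plain Python, not the RethinkDB driver; it is modeled on the shape of RethinkDB change documents, where each change carries `old_val` and `new_val`, initial results carry only `new_val`, and a row leaving the filtered subset arrives with `new_val` set to null.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


class Table:
    """Hypothetical in-memory table that pushes changes to subscribers."""

    def __init__(self) -> None:
        self.rows: Dict[str, Row] = {}
        self.subscribers: List[Callable[[Optional[Row], Optional[Row]], None]] = []

    def changes(self, predicate: Callable[[Row], bool]) -> List[Dict[str, Any]]:
        # The returned list stands in for a cursor: the initial image of the
        # matching rows first, then deltas appended as writes happen.
        feed: List[Dict[str, Any]] = [
            {"new_val": r} for r in self.rows.values() if predicate(r)
        ]

        def on_write(old: Optional[Row], new: Optional[Row]) -> None:
            old_match = old if old is not None and predicate(old) else None
            new_match = new if new is not None and predicate(new) else None
            # Only report writes that touch the subscribed subset.
            if old_match is not None or new_match is not None:
                feed.append({"old_val": old_match, "new_val": new_match})

        self.subscribers.append(on_write)
        return feed

    def upsert(self, row: Row) -> None:
        old = self.rows.get(row["id"])
        self.rows[row["id"]] = row
        for notify in self.subscribers:
            notify(old, row)

    def delete(self, row_id: str) -> None:
        old = self.rows.pop(row_id)
        for notify in self.subscribers:
            notify(old, None)


players = Table()
players.upsert({"id": "p1", "room": "lobby", "hp": 100})
players.upsert({"id": "p2", "room": "arena", "hp": 80})

feed = players.changes(lambda r: r["room"] == "arena")   # image: p2 only
players.upsert({"id": "p1", "room": "arena", "hp": 100})  # p1 enters the subset
players.upsert({"id": "p2", "room": "lobby", "hp": 80})   # p2 leaves the subset
players.delete("p1")

for change in feed:
    print(change)
```

A real changefeed is pushed over the network to a cursor, with the variable latency described above; the in-process list here ignores all of that and only shows the shape of the stream.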

Games and chat are always popular examples, but I think it's really valuable for data-science.

You can also "fill in" missing data. If you were writing a web scraper, you could make a service that looks for URL objects without any content and scrapes them. Then you could make a service that looks for URL objects with content but missing the ML-filled-in details, and have it fill those in.
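A sketch of that "fill in missing data" pattern, with a Python list of dicts standing in for a table and placeholder functions (`fetch_content`, `classify`) standing in for the scraper and the ML model; all names here are hypothetical. Each service only selects documents that are missing the field it owns but already have the fields it depends on, so stages can be run independently by different teams.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]


def fetch_content(url: str) -> str:
    # Placeholder for a real HTTP fetch.
    return f"<html>page at {url}</html>"


def classify(content: str) -> str:
    # Placeholder for a real ML model.
    return "long" if len(content) > 30 else "short"


def fill_missing(table: List[Doc], needs: str, requires: List[str],
                 compute: Callable[[Doc], Any]) -> int:
    """Set field `needs` on every doc that lacks it but has all `requires` fields."""
    filled = 0
    for doc in table:
        if needs not in doc and all(field in doc for field in requires):
            doc[needs] = compute(doc)
            filled += 1
    return filled


urls: List[Doc] = [{"url": "https://example.com/a"}, {"url": "https://example.com/b"}]

# Stage 1 (scraper service): documents with a URL but no content yet.
fill_missing(urls, "content", ["url"], lambda d: fetch_content(d["url"]))
# Stage 2 (ML service): documents with content but no label yet.
fill_missing(urls, "label", ["content"], lambda d: classify(d["content"]))
```

Because each stage only looks at what is missing, re-running it is harmless; with changefeeds, each stage could react to new or updated documents instead of polling.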

It's pretty good for disparate teams with different sets of technology. One group doing document classification, another trying NLP, another using RNNs, etc.

There are a few times in my career where RethinkDB would have been a killer feature, especially with its well-documented language bindings.

Definitely valuable. As an aside, changefeeds are now in MongoDB too: https://emptysqua.re/blog/driver-features-for-mongodb-3-6/#c...

and postgres!

Do you mean using LISTEN/NOTIFY, or some other method?

Probably LISTEN/NOTIFY. I explain in detail how I migrated from RethinkDB to PostgreSQL using LISTEN/NOTIFY + tons of additional work here http://blog.sagemath.com/2017/02/09/rethinkdb-vs-postgres.ht...
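A minimal sketch of the trigger side of a LISTEN/NOTIFY changefeed, written as a Python helper that emits PostgreSQL DDL. This is not the code from the linked post; the table, channel, and key names are hypothetical. After every INSERT, UPDATE, or DELETE, the trigger function calls `pg_notify` with a small JSON payload, and any client that has run `LISTEN` on the channel receives it. Sending only the operation and primary key keeps the payload well under NOTIFY's default limit of 8000 bytes; listeners re-read the row if they need it.

```python
import re

# Identifiers are interpolated into SQL, so only simple lowercase names are allowed.
_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def changefeed_trigger_sql(table: str, channel: str, key: str = "id") -> str:
    """Return DDL for a trigger that NOTIFYs `channel` on every row change of `table`."""
    for name in (table, channel, key):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    func = f"{table}_notify_change"
    return f"""
CREATE OR REPLACE FUNCTION {func}() RETURNS trigger AS $$
DECLARE
    rec RECORD;
BEGIN
    -- NEW is null (unassigned before PostgreSQL 11) in DELETE triggers.
    IF TG_OP = 'DELETE' THEN
        rec := OLD;
    ELSE
        rec := NEW;
    END IF;
    PERFORM pg_notify('{channel}', json_build_object('op', TG_OP, '{key}', rec.{key})::text);
    RETURN NULL;  -- the return value of an AFTER trigger is ignored
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS {func} ON {table};
CREATE TRIGGER {func}
    AFTER INSERT OR UPDATE OR DELETE ON {table}
    FOR EACH ROW EXECUTE PROCEDURE {func}();
""".strip()


print(changefeed_trigger_sql("players", "players_changes"))
```

On the client side, psycopg2's documented approach is to run `LISTEN players_changes` on an autocommit connection, wait on the connection with `select.select`, then call `conn.poll()` and drain `conn.notifies`. Everything beyond that (reconnects, missed notifications, re-querying rows) is the "tons of additional work" mentioned above.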

I've continued working on this codebase, even pushing commits this week like this one: https://github.com/sagemathinc/cocalc/commit/c20a62446b6e43c...

Haha, I actually just finished skimming through that exact blog post because it was one of the first things that came up when I searched for "postgresql changefeed" to see if there was some other functionality for doing it that I didn't know about.

I'll definitely need to go back and read it more thoroughly later and take a look through your code, thanks for the links.

You could do it with pglogical, if you're feeling particularly adventurous.

I think it's very tough to describe the "use case" for RethinkDB (I used it in production for two years and have expressed that I don't think RethinkDB did exceedingly well at any particular use case to warrant using it over other solutions). It might be easier to list the differentiators, roughly in the order they were introduced:

JSON document storage

ReQL Query Language


Easy Deployment

Administrative UI for monitoring, sharding, querying data

Change feeds

High Availability

For me, the real draw was how easy it is to set up a cluster with high availability and automatic failover. Having used MySQL in production on a mission-critical web app for years, failing over between data centers and setting up replication again every time there was some kind of hiccup got very old very fast.

It's a place to store data.

- The data can be structured

- It can also have relations

- The query language is JS, and allows both 'shape of data' and functional queries

- There's live change feeds (which means the DB, being the source of truth, takes the role of initiating change messages)

- RethinkDB has an excellent reputation for being able to get the data back after you save it.

Basically it's like Mongo but not (insert adjective).

We've been using it in production for 2 years at CertSimple and have been very happy. Previous experience is Mongo, GAE data store, and various ORMs pointed at SQL. The docs are great, the defaults are safe, and doing new things is easy.

> RethinkDB has an excellent reputation for being able to get the data back

Uh. It's not always true, although not in the sense you meant it (no complaints about storage reliability).

I don't know if my setup is broken or I did something stupid, but for me rethinkdb-dump maxes out the CPU, and the only thing that keeps the machine from choking to death (with the 1-minute load average going over 100) is the resource limit on the database container. Trying to back things up results in random connection drops and timeouts. I gave up on trying to back up the database online.

And that's a very small database (75GB on disk, 12GB as uncompressed JSON, 2.5GB as a tarball), on a reasonably powerful machine. It's a single node, though - I thought I'd "upgrade" to a cluster at some point but it's way too early.

It's what MongoDB should have been. A relational document database, with changefeeds.

Could someone with the time to listen please provide a summary?

Listened to it last week. The gist is that they've found a new home with the CNCF and The Linux Foundation, which bought the IP so that they could continue working on it publicly. Besides the database (which was always open source), this is especially important for the parts of RethinkDB that were meant to be "enterprise-only", which the company was working on internally before it shut down. All in all, the community support sounds strong, and after listening I decided to take another look at Rethink for my next project :)

Small edit: CNCF funded the transaction (to free the IP by relicensing under Apache-2.0) but the project is hosted by CNCF's parent, The Linux Foundation.

Disclosure: I'm executive director of CNCF and did the transaction. And, in case you're wondering, I'm thrilled that the community of people able to take advantage of the code is growing.

Dude, really thank you for your hard work on this. Out of curiosity, how did you pull it off?

There's a transcript on the bottom of the page (on mobile).

From the transcript, in case anyone was confused:


Thanks for pointing that out! Our transcriber usually does a great job, but he doesn't get 'em all right.

The awesome thing is that our transcripts are open source and somebody must've read your comment, because I just merged a PR fixing this.


Our site auto-updates the transcripts after a merge, so your comment is now outdated. :)

That was me, and I've read the transcript. :)

Or bazaar, rather.

That's a lot of text...

Also https://xkcd.com/927/

I don't think RethinkDB ever intended to unify and replace all existing databases.
