EDIT: Just heard Mike say that CNCF acquired the IP and assets of RethinkDB for $20K? That can't be right, can it? Tiny startups that generate $1K a month in recurring revenue sell for more than $20K. RethinkDB raised over $12M, right? What am I missing?
It seems like an official hosted RethinkDB that included enterprise support could generate pretty nice MRR and take business from Compose.io (IBM). Shoot, wish I'd known, $20K for the IP and assets was a steal. It probably would have been exponentially more for a buyer who wanted to turn it commercial, though.
Though, if you're bootstrapping with your own capital and grow it to something like $10K or $20K in MRR, that's a win. I'm all about bootstrapping SaaS companies and growing recurring revenue.
I haven't really gotten much into doing fancy queries or transforms or streams.
If someone could make a tool that let you use RethinkDB as a (more or less) direct back end for pandas... That would be killer
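Not sure anything like that exists today, but the Python driver plus pandas already gets you partway there. A rough sketch, assuming the official rethinkdb Python package and a local server on the default port (the database and table names here are made up):

    import pandas as pd
    from rethinkdb import RethinkDB

    r = RethinkDB()
    conn = r.connect(host="localhost", port=28015, db="analytics")

    # run() on a table returns a cursor of JSON documents, which pandas
    # can flatten straight into a DataFrame.
    cursor = r.table("events").run(conn)
    df = pd.json_normalize(list(cursor))

    print(df.head())
    conn.close()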
its GitHub hasn't received any updates in a very long while
Easy clustering, first-class changefeeds, a somewhat confusing query system, and it runs in the current working dir by default, which makes it dead simple to set up for development. No fire-and-forget writes.
Reminds me a bit of Firebase, but free and open source. If you're in a position where changefeeds or NoSQL are important, you should probably give RethinkDB a look.
- Very easy and robust clustering (Raft-based, automatic fail-over). This is huge for us.
- Streaming change feeds. This one is also huge. Makes any kind of real-time, reactive, or event-driven programming very easy and IMHO is something that should exist in every database (a minimal changefeed sketch follows this list).
- It's kind of half SQL. It's a NoSQL document store but encourages a relational design and supports many relational queries.
- Rational and pretty easy to understand query language. It's much cleaner than Mongo.
- Easy to deploy and configure.
- It passed the Jepsen tests before Mongo did and overall has a solid history of not losing data.
- It's a CPU hog, at least when compared with PostgreSQL.
- It's also an I/O hog, though we sponsored some improvements that are getting merged in the next version; they will reduce this and also make table commit behavior a configurable parameter. You'll be able to have fully in-memory and partially in-memory (long flush delay) tables for highly ephemeral data.
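To make the changefeed point concrete, here's roughly what it looks like with the official Python driver (the table name and connection details are placeholders):

    from rethinkdb import RethinkDB

    r = RethinkDB()
    conn = r.connect(host="localhost", port=28015, db="app")

    # .changes() turns the query into a stream: the cursor blocks and yields
    # a document each time a row in "orders" is inserted, updated, or deleted.
    feed = r.table("orders").changes().run(conn)
    for change in feed:
        print("old:", change["old_val"], "-> new:", change["new_val"])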
For example, instead of applying a bank account transfer as a database transaction that debits one account record and credits another, you create a new transaction record (account transaction, not database transaction). Then account balances are a sum over these transaction records.
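A rough sketch of that append-only approach with the Python driver; the table and field names are invented for illustration:

    from rethinkdb import RethinkDB

    r = RethinkDB()
    conn = r.connect(host="localhost", port=28015, db="bank")

    def transfer(from_acct, to_acct, amount):
        # One insert of two ledger rows replaces the atomic
        # debit-one-account / credit-the-other update.
        r.table("ledger").insert([
            {"account": from_acct, "amount": -amount},
            {"account": to_acct, "amount": amount},
        ]).run(conn)

    def balance(acct):
        # The balance is derived, not stored: sum the account's ledger entries.
        return r.table("ledger").filter({"account": acct}).sum("amount").run(conn)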
They all have varying support for transactions, with relational databases usually being the most comprehensive.
What do you use that for?
For example, you might store data with minute-level granularity for the past 24 hours but only hourly granularity for the past 30 days. If someone queries the past two days, you need to look at both those datasets. Then, every hour or so, you need to summarize an hour of minute-level data, insert it into the hourly-granularity table, and then remove it from the minute-granularity table. Meanwhile, you want to make sure any queries aren't going to double count that data after insertion but before removal.
This can be done without transactions in a few ways, but they require putting your replication and rollup logic and constraints into your reading code, rather than having it isolated to your roll up code. And your data model has to be tweaked to allow for some of these operations. And the complexity often results in double counting bugs (or bugs where the data is not counted at all).
There are solutions though. They just require a lot more hoops than starting a transaction, moving the data, committing the transaction.
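For the curious, here's roughly what that roll-up looks like without a transaction (all table and field names made up); the three steps aren't atomic, and the gap between them is exactly where readers can double count or miss the hour:

    from rethinkdb import RethinkDB

    r = RethinkDB()
    conn = r.connect(host="localhost", port=28015, db="metrics")

    def roll_up_hour(hour_start, hour_end):
        # 1. Aggregate an hour of minute-level rows.
        total = (r.table("minute_counts")
                  .between(hour_start, hour_end, index="ts")
                  .sum("count")
                  .run(conn))

        # 2. Insert the hourly summary row.
        r.table("hourly_counts").insert({"ts": hour_start, "count": total}).run(conn)

        # 3. Delete the minute-level rows that were just summarized.
        #    Between step 2 and step 3, readers see the hour in both tables.
        (r.table("minute_counts")
          .between(hour_start, hour_end, index="ts")
          .delete()
          .run(conn))

Readers then have to union both tables and dedupe on the timestamp themselves, which is the "constraints in your reading code" part mentioned above.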
You can also "fill in" missing data. If you were writing a web scraper, you could make a service that looks for url objects without any content and scrapes them. Then make a service that looks for url objects with content but that are missing ML-filled-in details, and have it fill those in.
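A hedged sketch of that pattern with the Python driver; table and field names are invented, and fetch()/extract() are just stand-ins for a scraper and an ML model:

    from rethinkdb import RethinkDB

    r = RethinkDB()
    conn = r.connect(host="localhost", port=28015, db="crawler")

    def fetch(url):
        return "<html>placeholder page for %s</html>" % url  # stand-in scraper

    def extract(content):
        return {"summary": content[:100]}                    # stand-in ML step

    # Worker 1: url documents with no content yet get scraped.
    for doc in r.table("urls").filter(lambda d: d.has_fields("content").not_()).run(conn):
        r.table("urls").get(doc["id"]).update({"content": fetch(doc["url"])}).run(conn)

    # Worker 2: documents with content but no extracted details get enriched.
    missing_details = r.table("urls").filter(
        lambda d: d.has_fields("content") & d.has_fields("details").not_()
    )
    for doc in missing_details.run(conn):
        r.table("urls").get(doc["id"]).update({"details": extract(doc["content"])}).run(conn)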
It's pretty good for disparate teams with different sets of technology. One group doing document classification, another trying NLP, another using RNNs, etc.
There are a few times in my career where RethinkDB would have been killer, especially with its well-documented language bindings.
I've continued working on this codebase, even pushing commits this week like this one: https://github.com/sagemathinc/cocalc/commit/c20a62446b6e43c...
I'll definitely need to go back and read it more thoroughly later and take a look through your code, thanks for the links.
JSON document storage
ReQL Query Language
Administrative UI for monitoring, sharding, querying data
- The data can be structured
- It can also have relations
- The query language is JS, and allows both 'shape of data' and functional queries (see the short sketch below)
- There are live changefeeds (which means the DB, being the source of truth, takes the role of initiating change messages)
- RethinkDB has an excellent reputation for being able to get the data back after you save it.
Basically it's like Mongo but not (insert adjective).
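The 'shape of data' vs functional split looks roughly like this (Python driver here, same ReQL you'd type as JS in the data explorer; names made up):

    from rethinkdb import RethinkDB

    r = RethinkDB()
    conn = r.connect(host="localhost", port=28015, db="app")

    # "Shape of data" style: match documents against a partial document.
    active_admins = list(r.table("users").filter({"role": "admin", "active": True}).run(conn))

    # Functional style: pass predicates and mappings over each row.
    over_30_names = list(
        r.table("users")
         .filter(lambda u: u["age"] > 30)
         .map(lambda u: u["name"])
         .run(conn)
    )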
We've been using it in production for 2 years at CertSimple and have been very happy. Previous experience is Mongo, GAE data store, and various ORMs pointed at SQL. The docs are great, the defaults are safe, and doing new things is easy.
Uh, it's not always true, although not in the sense you meant it (no complaints about storage reliability).
I don't know if my setup is broken or I did something stupid, but for me rethinkdb-dump saturates the CPU, and the only thing that keeps the machine from choking to death, with the 1-minute load average going over 100, is the resource limit on the database container. Trying to back things up results in random connection drops and timeouts. I gave up on trying to back up the database online.
And that's a very small database (75GB on disk, 12GB as uncompressed JSON, 2.5GB as a tarball), on a reasonably powerful machine. It's a single node, though - I thought I'd "upgrade" to a cluster at some point but it's way too early.
Disclosure: I'm executive director of CNCF and did the transaction. And, in case you're wondering, I'm thrilled that the community of people able to take advantage of the code is growing.
The awesome thing is that our transcripts are open source and somebody must've read your comment, because I just merged a PR fixing this.
Our site auto-updates the transcripts after a merge, so your comment is now outdated. :)