RethinkDB: An open-source distributed database built with love over three years

mmorearty · on Nov 9, 2012

I don't know much about RethinkDB yet, but I will say that I have been a big fan (online) of one of its founders, Slava Akhmechet, for years. I've never met him, but he wrote some terrific articles on his website, http://www.defmacro.org/ , a few years ago. Start at the bottom of the list of articles, with "The Nature of Lisp."

Slava is a deep thinker, which makes me very excited to take a look at RethinkDB.

aberman · on Nov 9, 2012

I wish I could up-vote this a thousand times. Slava is one of the most genuine founders I've met, and I wish him and RethinkDB all the best.

jgw · on Nov 9, 2012

Indeed - he mentions in the article that he set himself a goal to convert 10 programmers into Lispers. Sounds like he probably has that many just in this thread! Kudos, sir!

samstokes · on Nov 9, 2012

He didn't make me a Lisper (I'm more a Haskell fan these days), but reading Slava's Lisp articles years ago was a significant part of what set me on my current career path.

He helped get me into functional programming, which got me a contract job [1], which is how I met one of my current co-founders.

[1] http://martin.kleppmann.com/2009/09/18/the-python-paradox-is...

reinhardt · on Nov 9, 2012

So is RethinkDB written in Lisp?

NathanKP · on Nov 9, 2012

No according the website it is written in C++.

erichocean · on Nov 10, 2012

And yet the Clojure guy did his distributed DB in Clojure (aka, Lisp).

Kind of makes me wonder why C++ was chosen...

alexpopescu · on Nov 10, 2012

I've joined RethinkDB just a couple of months ago, so I might not have all the historical facts right, but here is what I know.

In a previous incarnation RethinkDB was a highly optimized storage engine for SSDs implemented in C++ to be able to take full advantage of both low level SSD and kernel access.

The current distributed engine was built on top on this storage engine and I think it only made sense to continue with C++.

leif · on Nov 10, 2012

Originally, RethinkDB was to be a MySQL storage engine, which made C++ the natural choice.

They pivoted away from MySQL after my short stint in the beginning so I can't speak to why the storage code was kept (though I can't imagine it's because my code was so great they couldn't bear to throw it away).

Storage people tend to stick close to the metal, in general. This means C or C++ in most cases, for better or for worse.

dxbydt · on Nov 10, 2012

Thought I'll share this with you.

A yc company hired me. I showed up at their mountain view office. The founder said "This is the former office of RethinkDB! I hope we are as successful as them."

I didn't know who/what RethinkDB was, so I said ok, sure.

3 days later he asked me to clear my desk and leave. He said "You are the sort of person who should work in RethinkDB".

So I asked "What does that mean ?"

He said "RethinkDB is trying to solve very deep algorithm problems. They want somebody with CS knowledge to do deep research. That is what you are good at. But here we are just trying to run a business. You are not a good fit for that!".

So I left.

fragsworth · on Nov 10, 2012

I know lots of engineers who have trouble talking to people who don't share their knowledge. This problem is extremely pervasive - I'd say a good 25% or more have this problem to some extent. It's not a good thing when this happens - you need to be able to speak to laymen or you're gonna have a bad time.

I am going to go out on a limb here and suggest you try to work on being a bit more practical. Don't complicate things for the sake of solving difficult problems. Don't try to shower people with your engineering knowledge when it's not necessary, and don't expect everyone to know everything you do. And don't be an asshole about it either.

borplk · on Nov 10, 2012

I adjust the level of details and technical stuff pretty well according to who I am speaking to.

But sometimes someone with very limited knowledge of something asks me a detailed question about X.

What they ask is too difficult and complex to be described in a simple way. Either I have to overly simplify it which may insult them and will do no good or I have to go on and step by step give them digestible chunks of explanation which will inevitably be a bit technical even though I try to minimise that.

I think it's a two-way thing. The person should also consider their own level of knowledge before asking for an explanation of something and adjust their question based on that.

I don't ask my doctor to tell me why my heart does this and that because I simply don't have the knowledge to be able to understand his answer. I ask, my heart is doing ok? cool!

halfninety · on Nov 10, 2012

Looks like I'm one of the people you are referring to. Like anyone with a healthy dose of curiosity, I'm interested in anything that is, well, interesting. I'm excited to meet people with expertise in various areas and ask them questions. I don't expect to understand their answers in full but in most cases I can still grab part of them. Based on my partial understanding in the first answer I can ask a better question the next time, and after several cycles I can probably learn something valuable (at least in the sense of satisfying my own curiosity).

The point is, if you don't ask questions in areas you are not familiar with, you will never become familiar to these areas. Well, unless you learn everything from books and Wikipedia.

I'm not sure how many people see me as an annoyance, but at least I'm consistent, in that if other people ask me questions in my expertise, I'm happy to try my best and explain.

Oh, and if it's just impossible to reasonably answer my question in a way that makes any sense to me, I expect you to just say it, and I'm happy with this.

borplk · on Nov 10, 2012

You are not the person I'm referring to because you say you are happy with that.

I do what you described too, in parties or whenever I have an opportunity for a discussion with someone and I enjoy it and they ask me questions too and we try our best to teach each other something which is perfectly ok and fun.

What I was referring to was mostly employee/boss situations where the boss asks the employee about the details or internals of system X and then gets pissed when the engineer can't explain it to him and blames it on them because they were incapable of explaining complex things to non-technical people.

I mean they have to appreciate that there's a limit to how much you can explain to non-technical people in simple terms. At some point it just doesn't work, and either you have to use the big words, and concepts and assume knowledge, or drop back to dead simple insulting analogies. You see that server boss? That's like a train! choo-choo!

drivebyacct2 · on Nov 10, 2012

I honestly I can't think of a single example where I was unable to determine the right level of explanation for someone. I think it's a unique skill that even non-engineers are often only superficially proficient at; though on that note, I do think that engineers aren't exposed enough to communication topics in University - most CS classes don't require group work, don't require verbal explanation of complex systems, etc. [1]

Reading, writing, and any sort of public speaking (especially debate if there's an opportunity) are TREMENDOUS for teaching you how to get into other peoples' heads, figure out how they're perceiving what you're saying, and adjusting what you're saying. I will sit in most of my CS classes and be PAINED when the class spends 10 minutes on a question because there is a disconnect between the misassumptions the student is making and the instructor thinks something else is the culprit.

These skills are also incredibly helpful in being charming and getting what you want without being manipulative.

[1]: Which is why some, like mine, created programs to blend CS and business to start to solve that problem. Students have to interact with paying customers, have to be accountable for their own code releases, are responsible for ALL of the requirement soliciting and fulfillment, etc.

flogic · on Nov 10, 2012

Wow that's special. Sounds to me like they're tools. You can't really expect an employee to know they're way around the code base after only 3 days. Hell most places you find yourself sitting on your thumbs the first week due to everyone being too busy to spend much time orienting you.

tsiki · on Nov 10, 2012

My thoughts exactly. I sometimes read about these people getting fired from IT jobs after a couple days and keep wondering why their bosses even hired new people. Either the boss is too overconfident in their ability to gauge someone's skills based on a couple of days, or they never really needed new people.

jedberg · on Nov 9, 2012

Suggestion: It would be great to have a page on your website that explains why RethinkDB is better than the other prevailing options. Right now I don't know why I'd want to invest time setting up yet another database.

coffeemug · on Nov 9, 2012

Thanks -- will do in the next few days.

xutopia · on Nov 9, 2012

What's the elevator pitch? Maybe we can help you with those advantages if you can tell us right now.

jdoliner · on Nov 9, 2012

The elevator pitch is: "Mongo's ease of use without the gotchas." We have a nice simple to use query language and quick setup process. But things like analytic queries like map reduce don't lock up the entire database. Our product aims to not be a ticking time bomb of technical debt.

kinleyd · on Nov 10, 2012

Sounds like an excellent elevator pitch.

dorkitude · on Nov 10, 2012

Mongo has a pretty low scale ceiling and a nasty distribution story.

Can you talk a bit about how RethinkDB compares?

stock_toaster · on Nov 9, 2012

Agreed. It would be very useful as a potential user.

As an example, I like how Basho provides some comparisons[1] of Riak vs other popular options.

http://docs.basho.com/riak/latest/references/appendices/comp...

alexpopescu · on Nov 9, 2012

We're already working on something in this direction.

alex @ rethinkdb

eventdroid · on Nov 9, 2012

yeh, that would be interesting to know.

coffeemug · on Nov 10, 2012

Hey guys, Slava here. I've been up since yesterday, so I'm going to clock out (though some of the team members are still lurking here). I wanted to thank everyone for great feedback. We're working hard to improve Rethink over the next few months. FYI, you can always hop on IRC (#rethinkdb on freenode) or github tracker (https://github.com/rethinkdb/rethinkdb/issues) with questions and we'll help you out.

biturd · on Nov 10, 2012

Thanks for this work, it looks really nice.

I was looking at the github comments about a home brew recipe in which it was stated that aside from a recipe creating a VM, the Mac OS X port would take a bit longer.

Is that a full port from one language to another? Or just an issue of the different flavors of *nix that need dealing with and probably some of the dependency tree issues that come with it?

I'm curious what needs be done to get it building on Mac OS X — perhaps I could assist somehow.

I see a few dependencies that don't immediately sound familiar. You may have better luck with MacPorts, which uses tcl as the language for their portfiles.

Portfiles are just like homebrews recipes, but MacPorts always builds new, including the entire dependency tree ( and dependencies of dependencies etc., etc. ), for which they have thousands of working portfiles. Since those are completed and working, you wouldn't have to worry about those until you wanted to be able to make a binary outside of any package manager.

MacPorts can build binaries now ( new feature ), so you could just as easily instruct it to create a standard Mac OS X installer .pkg which makes sure everything goes in the right place, on the right platform, for the right architecture.

They are an exceedingly friendly and helpful group, I'm sure they would live to see this software in their package/portfiles list.

coffeemug · on Nov 10, 2012

Most of the issues come from kernel differences. Some are relatively big, like epoll vs. kqueue, and some are very subtle (some syscalls have subtly different behavior on strange edge-case scenarios). I don't think building this natively would be too hard, but in the context of everything else left to do, it isn't an absolutely trivial project.

i386 · on Nov 10, 2012

Seems to me that most of us who have used MacPorts have moved to Homebrew or that could just be the bubble I'm living in. Is there anyone still who still uses MacPorts who could chime in and say why they never made the switch?

j-kidd · on Nov 10, 2012

Come on, Homebrew doesn't even have gcc.

I am not a Mac user, but a designer using MacBook joined our team last week, and we struggled for half a day with Homebrew. The next day, we installed MacPorts instead, and with just:

$ sudo port install python27 py27-virtualenv gcc46

we were able to proceed and get the whole stack up and running. Not to mention everything from MacPorts is installed nicely under /opt/local.

MacPorts is just way ahead of Homebrew. OTOH, Portage is way ahead of MacPorts ;)

statictype · on Nov 10, 2012

Interesting. This has not been my experience at all.

I had to fight for days to get MacPorts to install anything properly. It gives me flashbacks to the horrors from 3-4 years ago of compiling open source software on Linux.

Homebrew has been fuzzy kittens in comparison.

benregn · on Nov 10, 2012

I'm a relatively new Mac user and never used MacPorts. When I search brew I get these results:

$ brew search gcc

apple-gcc42 gcc

homebrew/versions/gcc45 homebrew/versions/llvm-gcc28

I'm a bit confused since I thought that these are gcc. Can someone tell me what those results mean? :S

0x10c0fe11ce · on Nov 10, 2012

This is the way to go:

    $ brew tap homebrew/dupes
    $ brew install gcc --enable-all-languages

To use your new gcc-4.7.2 when installing new packages just add '--use-gcc' at the end of the command.

tbrock · on Nov 10, 2012

Hahah don't be naive. Use homebrew. Macports has been broken for years.

merlincorey · on Nov 10, 2012

clang is better than gcc, IMO

irahul · on Nov 10, 2012

> clang is better than gcc, IMO

Maybe. Maybe not. But I don't think that's the reason homebrew doesn't have gcc. The OP is pointing out homebrew isn't extensive, and misses some commonly used utilities.

hyperbovine · on Nov 10, 2012

clang either does not build or defectively builds certain things on OS X, for instance Ruby 1.9.3. I had to acquire vanilla GCC for this reason the other day, and was relieved to find it in HomeBrew.

getsat · on Nov 11, 2012

Old-ass versions of Clang would build Ruby and PostgreSQL (client) binaries which would segfault upon execution. Try grabbing the latest XCode/command line tools package and you should be fine. I've been running Ruby 1.9.3 with Clang for quite a while.

brew install rbenv ruby-build; /* rc file shenanigans */; rbenv install 1.9.3-p327; rbenv global 1.9.3-p327; ruby --version

i386 · on Nov 10, 2012

Ruby 1.9.3 working fine here and built with clang. What was defective?

dman · on Nov 10, 2012

By what metric?

johncoltrane · on Nov 10, 2012

I never did the switch because I never had to complain about MacPorts. Last time I looked at Homebrew it had a very small number of receipes compared to MacPorts' ports.

I ran a fairly convincing "Linux-like" alternative desktop using Awesome in XQuartz for about a year before I switched to Linux full time. That was before Homebrew but I'm quite certain that it would have been impossible with it.

adolfoabegg · on Nov 10, 2012

Why would I switch to homebrew? I've been using Macports for 2 years now and I never had any issues with it. It works. Also migrating from Macports to homebrew (in case there's a good reason to do it) it would be a painful experience. I would have to start from scratch right?

mnutt · on Nov 10, 2012

Personally, I migrated to homebrew due to the ease of writing packages compared to macports. But I happen to enjoy ruby more than custom DSLs. It's probably a division similar to Chef/Puppet.

datasink · on Nov 11, 2012

Having switched and then switched back, I will agree with you. If you're inclined to write your own package installers, Homebrew is worlds better than MacPorts. But otherwise, MacPorts has many more packages that-just-work.

malkia · on Nov 10, 2012

I've had only troubles with Homebrew. For example recent upgrade, and the system no longer works, I had to manually clean up folders.

MacPorts seems better to me, after years of fink in the past. I need to build for universal (386/x86_64) for testing purposes, so it fits well for me.

I actually rebuild stuff later myself, since I can't really package stuff and require people to have that in /opt/local/bin or anywhere else, but a local folder to the main app.

(I use the same way cygwin on windows, like macports - I love the tools, the stuff, I test a lot of things, but afterall for things I want to distribute I compile myself, and post binaries).

ikawe · on Nov 10, 2012

based on the sub comments, I'd have to say YMMV.

codewright · on Nov 9, 2012

I'm hoping this'll be a viable replacement for MongoDB. (Sparse/Schema-free is incredibly useful for me, as is JSON-centric modeling)

jedberg already asked for a compare/contrast, but let me provide some specifics I care about that you might be able to answer.

1. Is it fair to say that thanks to MVCC, running an aggregation or map-reduce job isn't going to lock the whole damn thing up like it does on MongoDB?

2. You've got a distributed system that is seemingly CP, do the availability/consistency semantics compare with HBase? Master-slave? Replication? Sharding?

3. Latency is a big one for us and is a large part of why we use ElasticSearch. How does the read-latency on RethinkDB compare with Mongo/MySQL/Redis/et al ?

coffeemug · on Nov 9, 2012

1. Yes -- that was the main motivation for MVCC. We wanted to allow people to use rethinkdb for analytics and map/reduce on top of the realtime system without dealing with having to replicate data into something else.

2. Short answer: we favor consistency (via master/slave under the hood). It allows for much easier API, much fewer issues in production, etc. The user experience is just better. If you're ok with out of date results, you can do that too without paying the price of consistency guarantees. The downsite of our design is that you might lose write availability in case of netsplits (if the client is on the wrong side of the split). Longer answer: checkout the FAQ at http://www.rethinkdb.com/docs/advanced-faq/

3. Read latency should be equivalent to other comparable master/slave systems. We don't do quorums, so latency will be much better than quorum/dynamo-based designs.

rbranson · on Nov 9, 2012

I want to preface my comment: this is impressive work, congratulations on shipping, and this is what MongoDB should have been from the start.

In reality, most transactional database deployments are heavily skewed towards read workload, so reading from hot slaves is basically a requirement for master/slave databases. So, in most real world applications at scale, apps already deal with inconsistencies between slaves and the master and are making the "difficult" choice of dealing with CAP trade-offs. Asynchronous replication also creates a potential for difficult or impossible to recover from data loss in the sense that masters & slaves always have a continuous possibility for split-brain.

RethinkDB does not provide multi-shard transaction atomicity and/or isolation, which in my experience is the biggest difficulty thrown up in front of developers coming from single-node databases. I feel like the difficulty of dealing with inconsistencies across multiple versions of a single object is far more familiar as most developers have at least dealt with cache invalidation in some form. It's really having to ensure and deal with potentially out of order operations (inconsistency in the ACID sense) across a "graph" of data that's more insidious.

coffeemug · on Nov 9, 2012

I mostly agree with what you're saying, but I also think there's enormous value in making easy things be really easy. Even with today's state of the art adding a shard, dealing with consistency issues, adding replicas, etc. is relatively hard. Perhaps not in a computer-sciency sense (all the problems are fairly well understood), but in an operational sense. Lots and lots of work needs to be done even with systems like MongoDB, let alone with MySQL. And once you're done with that, you can't really run complicated queries easily, so you have to solve that problem.

We set out to make these things be really easy (whether we succeed or not remains to be seen). We want the users not to have to deal with these issues at all whenever possible. You should be able to set up a cluster, add shards, and run cross-shard joins and aggregation in five minutes.

Of course once that problem is solved, there are tougher problems like high-performance cross-document distributed ACID, but I think the industry as a whole is relatively far away from that right now. (there are some solutions to this - e.g. Clustrix, but they require specialized hardware which makes it out of reach for most developers)

Nitramp · on Nov 10, 2012

there are tougher problems like high-performance cross-document distributed ACID, but I think the industry as a whole is relatively far away from that right now

Megastore and Spanner solve that problem, with varying tradeoffs:

http://research.google.com/pubs/pub36971.html

http://research.google.com/archive/spanner.html

erichocean · on Nov 10, 2012

Our internal database does too (with a different design than Spanner, but stuff still comes "online" atomically for everyone across the globe at the same time, with similar latency). Unlike FoundationDB, and like Spanner, we're doing it with complex object graphs, not just key-values, and we also do it with consistent secondary indexing (I'm not sure if Spanner supports this or not).

This isn't "the future", this is now. People are doing it, and have been for awhile. If you're going to "rethink the database", distributed global consistency should be at the top of your list today. RethinkDB seems like its merely "rethinking" Mongo.

The main benefit of global consistency, of course, is ease of use. Global consistency is so much easier to reason about and write code for!

nlavezzo · on Nov 16, 2012

Hi Erich - I'm curious... you refer to your "internal database" doing distributed ACID but state that you're not with Google. Can you say who you're with? It's interesting to us to know who is also working on this problem.

nlavezzo · on Nov 9, 2012

Congratulations on shipping - looks like a very well thought out product.

Regarding your last comment on high-performance distributed ACID, that's what we've built at FoundationDB, although FoundationDB is a key value store so transactions are multi/cross-key instead of cross-document.

stavros · on Nov 10, 2012

By the way, how safe is the JS interpreter? Can you get into trouble by running untrusted code in map/reduce queries?

coffeemug · on Nov 10, 2012

JS interpreter (V8 under the hood) runs in a process pool -- similar to a thread pool, but outside of the core rethinkdb process. The code running in the JS interpreter cannot corrupt memory or crash the rethinkdb process (if it crashes, rethinkdb will simply start another v8 process). You also can't write from js executed on the server, so the data is safe (though I think it's more of a limitation than a feature).

Currently if you write an infinite loop in js, or write code in a way where it starts eating up memory we don't do anything to restart the js process, but it would be relatively easy to implement.

stavros · on Nov 10, 2012

I see, thanks. I'm definitely interested in that, as I'm developing http://www.instahero.com and the current approach isn't very scalable, so I'm evaluating alternatives. RethinkDB looks like a good candidate so far.

jbellis · on Nov 9, 2012

I'll ask the obvious question not in the FAQ: How is this different from MongoDB?

coffeemug · on Nov 9, 2012

Hey, this is Slava, founder of rethinkdb. There are some obvious high level differences:

* A far more advanced query language -- distributed joins, subqueries, etc. -- almost anything you can do in SQL you can do in RethinkDB

* MVCC -- which means you can run analytics on your realtime system without locking up

* All queries are fully parallelized -- the compiler takes the query, breaks it up, distributes it, runs it in parallel, and gives you the results

But beyond that, details matter. Database system differ on what they make easy, not what they make possible. We spent an enormous amount of time on building the low-level architecture and working on a seamless user experience. If you play with the product, I think you'll see these differences right away.

Note: rethink is a new product, so it'll inevitably have quirks. We'll fix all the bugs as quickly as we can, but it'll take a few months to iron things out that didn't come up in testing.

mej10 · on Nov 9, 2012

What do you see as the potential areas where RethinkDB will shine?

Also, I am excited to try this out. I always enjoyed your writings and I am sure you + team have made something awesome.

jdoliner · on Nov 9, 2012

Joe Doliner - Engineer at RethinkDB here. RethinkDB is designed for small teams with big data challenges. When you're just starting up a new project ideally you want to just boot your database up and start throwing data at it without worrying about schema. However with other products on the market, most notably Mongo, there are a lot of features that stop working when you get to a large scale. We've been very careful in developing RethinkDB to make sure that small teams who use our product aren't going to need to rewrite code once their dataset starts growing. As coffeemug mentions above we support fully parallelized queries. This means that when your dataset grows you can add more servers to speed up analytic queries. We feel this a valuable feature for small teams.

mej10 · on Nov 9, 2012

Thanks Joe.

parhamn · on Nov 9, 2012

Hah that question is almost word for word in the faq: http://www.rethinkdb.com/docs/faq/

mej10 · on Nov 9, 2012

Yep, you got me. I hadn't read the FAQ. Thanks for that.

fooyc · on Nov 9, 2012

> All queries are fully parallelized

Does it means that every query touches all servers ? Or does it sends queries to only a subset of servers when possible ? (e.g. range queries on PK)

jdoliner · on Nov 9, 2012

Joe Doliner - RethinkDB engineer here. > Does it means that every query touches all servers ? No.

> Or does it sends queries to only a subset of servers when possible ? (e.g. range queries on PK)

The query planner distributes the query between the nodes that actually contain the relevant data. Here are a few examples:

In your example, a range get on the primary key, the query would touch one copy of each shard of the table. *

A more interesting example is a map reduce query. That query will also only touch one copy of each shard of the table but the mapping and reduction phases will also happen on those shards which makes the whole process a lot faster.

continuations · on Nov 9, 2012

> In your example, a range get on the primary key, the query would touch one copy of each shard of the table.

But shouldn't it be fewer than "each shard"?

Let's say the range is 3 < PK < 7. If all PKs in that range only lives in 2 shards (out of a total of say 10 shards) then the query should only be run in those 2 shards, no? Or will all 10 shards still be touched by the query?

coffeemug · on Nov 9, 2012

Correct, the query will only touch two shards in this case.

spf13 · on Nov 10, 2012

This is exactly how normal, map reduce and aggregation queries work in a sharded MongoDB cluster.

While it's true that on a single node MongoDB map reduce is single threaded, it is parallelized when running on a sharded cluster.

lacker · on Nov 9, 2012

One apparent difference between RethinkDB and MongoDB is that in RethinkDB, you can only index on the primary key. I imagine secondary indexes will be coming along soon.

coffeemug · on Nov 10, 2012

We're planning to add secondary indexes in the next few releases. Doing them well in distributed systems is really hard. It's relatively easy to check off a box and introduce it as a feature, to make it actually behave well and allow for good performance is incredibly tricky. This is why they didn't make it into this release -- since it's a core feature of every db, we wanted to take the time to do this right.

sutro · on Nov 9, 2012

How does RethinkDB perform when compared to open-source distributed databases built with hate?

kermatt · on Nov 10, 2012

Maybe not built with hate, but used in anger?

ww520 · on Nov 10, 2012

Congratulate on releasing. Well done!

A few questions:

1. Will secondary indices be ever supported? Range scan with a different order than the primary key is very welcomed. E.g. date range query.

2. Do you support conditional update? Or any kind of optimistic locking or versioning to coordinate concurrent updates from different clients?

3. Related to 2. How can loosely-sequential Id be generated using a table?

4. Will some transaction support be added? Don't need full ACID, just grouping updates (intra-table and/or inter-tables) in one shot would be nice. Should be feasible with MVCC already in place.

5. Do all the clients hit a central server to initiate queries which then farms out the requests to different shards? Or the client library knows how to get to different shards directly? First case has a single-point-of-failure, and bottleneck in scaling.

6. Do you support automatically re-balancing of shard data (data migration) when new shards are added or old ones retired?

7. How are authentication and authorization done? Or any clients can come in?

8. Internal detail. For out-of-date distributed query on the slave replicas, is there a cost-based (or load-based) decision process to pick the most idle replica to do the sub-query?

9. Internal detail. Do you use Bloom Filter to optimize distributed joins?

coffeemug · on Nov 10, 2012

1. Yes. It's a matter of doing this right, which will take some time.

2. Yes. There is no special command, you just combine update and branch (http://www.rethinkdb.com/api/#py:control_structures-branch) Here's an example in Python:

  r.table('foo').get(5).update({ 'bar': r.branch(r['baz'] == 0, 1, 2)})

This will set attribute bar to 1 if baz is 0, or to two 2 otherwise. Everything is atomic on that document.

3. Currently the server doesn't support a sequential (or even loosely sequential) id autogeneration. You'd have to do that on the clients, but using a timestamp for example.

4. I don't know yet how to do this really efficiently. It's relatively easy to do on a single shard, but cross-shard boundaries make this really hard.

5. Any client can connect to any server. The server will then parse and route the query. There is no central server, everything is peer-to-peer. The client library doesn't know about multiple servers now, so responsibility is on the user to hit a random server. Alternatively you can run "rethinkdb proxy" on localhost and connect the client to that. The proxy will then route queries to proper nodes in the cluster.

6. In the web UI, if you click on the table and reshard, everything will be rebalanced. You don't even have to add or remove shards, it'll just rebalance data for the number of shards you have. The UI has a bar graph with shard distribution, so you can see how balanced things are.

7. Currently there is no authentication support - we expect users to use proper firewall/ssh tunneling precautions.

8. Yes, that's how queries get routed. Currently this isn't very smart, but it will get much better over time. If something breaks for you performance-wise, just reach out and we'll fix it.

9. No, not yet. If you run eq_join on a small subset of the data (99% of OLTP workloads) it will be very fast. Other joins work ok, but there's A LOT of room for optimization.

Phew!

ww520 · on Nov 10, 2012

Thanks for your and jdoliner's detail answers! Hope I didn't ask too many questions. :) I'll respond to both here.

For 2 and 3, I think I didn't make it clear. Let me clarify. A common db problem with multiple clients is dealing with concurrent update on the same piece of data. E.g both client1 and client2 read D as D=15 at the same time. Client1 adds 1 to D as 16 and saves it. Then client2 adds 1 to D as 16 and save it as 16, which is wrong. It should be 17.

Conditional update is one feature db usually provides to let clients deal with this problem, i.e. the update would only go through if certain condition is met otherwise abort. Update D=16 if D==15. Client1 would succeed while client2 would fail, where it can retry the whole read-increment-update cycle again with the new read value.

The litmus test to see if a db system can handle this problem is to try to implement a sequential Id generation feature run by multiple clients at the same time.

For 8, if the query is parsed into a query execution plan, you can ship the plan to all equivalent replicas to ask them to estimate the execution cost based on their current load. After they reply, pick the lowest cost one and send the execute command. Even a simple approach of asking for machine load of all replicas and picking the lowest one could have adaptive utilization of all the servers.

For 9, Bloomer Filter is a relative simple technique that can dramatically reduce the amount of data to ship across peers to do join. You basically filter out the vast majority of the non-matching data before shipping.

It's a good start. Good luck going forward!

Guillaume86 · on Nov 10, 2012

Your exemple of conditional update can be addressed using atomic update:

r.table('tv_shows') .filter({ name: 'Star Trek TNG' }) .update({ episodes: r('episodes').add(1) }) .run()

http://www.rethinkdb.com/docs/advanced-faq/#atomic

ww520 · on Nov 11, 2012

I think the atomicity model here works like a transaction on the whole document, where all the changes to the attributes of a document are updated all at once.

The scenario I described has to do with read-consistency, where the value read by a client should not be changed during the time of the read and the time of the update. The usual way of handling it was to take a write lock for the duration to prevent update from others but that degrades concurrency. The other way is to do optimistic lock (or conditional update) to allow the client to detect change during the time and retry with the new value.

coffeemug · on Nov 11, 2012

My point was that you don't have to do that with rethink because the entire query gets executed on the server. You don't have to take the value down to the client, make the change, and then send it back. The entire update gets evaluated on the server and the server handles atomicity in various ways (depending on the query).

ww520 · on Nov 12, 2012

That approach would only work if all the logic to compute the update can be expressed in the update query. It will break down if the read-eval-update cycle involves the client. There are many scenarios involved the clients.

E.g. the client reads a value, displays to the user, gets input from the user which is based on the old value, and stores the updated value. If another user doing the same thing has already changed it, the client would like to know that and let the user retry, with the new current value.

muhqu · on Nov 12, 2012

I think you, ww520, have a very well point here and I'm also interesten in what RethinkDB can offer for this very usage scenario. From what I read from the ReQL command reference there it should be possible to do something like:

  r.table('foo').get(5).update({ 'bar': r.branch(r['baz'] == 0, "foo", r.error("invalid baz!"))})

have not tested it, but this is how I understand it...

muhqu · on Nov 12, 2012

Do you think something like the following should work with RethinkDB?

  r.table('foo')
   .get(5)
   .update({
     '_rev': r.branch(r['_rev'] == 5,
       r('_rev').add(1),
       r.error("invalid revision")
     ),
     'name': "awesome name"
   })

the basic idea is that `name` should be update to "awesome name" and `_rev` should be incremented by 1, but only if `_rev` is 5, otherwise an "invalid revision" error should be thrown.

ww520 · on Nov 13, 2012

That would work. I didn't realize you can raise error on the row. Good work!

coffeemug · on Nov 12, 2012

Yes, this will work.

muhqu · on Nov 12, 2012

awesome, thanks!

DanielRibeiro · on Nov 10, 2012

Did they not launch a while ago: http://techcrunch.com/2011/06/06/rethinkdb-expands-beyond-ss... ?

ww520 · on Nov 10, 2012

Well any product release is a huge effort, especially database product. Things got done and pushed out of door. Congratulation well deserved.

jdoliner · on Nov 10, 2012

> 1. Will secondary indices be ever supported? Range scan with a different order than the primary key is very welcomed. E.g. date range query.

Secondary indices are one of the most asked for features so they'll probably be added in the next release. No promises though secondary indices are tough to do right and we won't ship them if they're not great.

> 2. Do you support conditional update? Or any kind of optimistic locking or versioning to coordinate concurrent updates from different clients?

Updates can be done with conditions on the row. For example: table.filter(lambda x: x['age'] > 25).update(lambda x: {"salary" : x["salary"] + 25)

> 3. Related to 2. How can loosely-sequential Id be generated using a table?

Loosely-sequential IDs would have to be generated client side for now.

> 4. Will some transaction support be added? Don't need full ACID, just grouping updates (intra-table and/or inter-tables) in one shot would be nice. Should be feasible with MVCC already in place.

Eventually. No concrete timeline for this right now though.

> 5. Do all the clients hit a central server to initiate queries which then farms out the requests to different shards? Or the client library knows how to get to different shards directly? First case has a single-point-of-failure, and bottleneck in scaling.

A client makes a connection to a specific server and all queries go through that server. However every server can file this role so connections can be distributed and there's no single point of failure. An even better option is to run a proxy on the same machine as the client. For more info run:

rethinkdb --help proxy

> 6. Do you support automatically re-balancing of shard data (data migration) when new shards are added or old ones retired?

Right now sharding is a manual process. You tell the server how many shards you want and it handles figuring out how to evenly split the data, picking machines to host them and getting the data where it needs to go. What it doesn't do is readjust the split points when the data distribution changes. This will be a feature in RethinkDB 1.3.

> 7. How are authentication and authorization done? Or any clients can come in?

RethinkDB has no authentication built in to it. You should not allow people you don't trust to have access to it.

8. Internal detail. For out-of-date distributed query on the slave replicas, is there a cost-based (or load-based) decision process to pick the most idle replica to do the sub-query?

Right now we just select randomly. This is slated as a potential upgrade for 1.3. Especially if it proves to be a problem for people. Thus far it hasn't been for us in profiling runs but this is the type of problem that's more likely to show up in real world workloads.

9. Internal detail. Do you use Bloom Filter to optimize distributed joins?

We do not currently use bloom filters to optimize this.

continuations · on Nov 9, 2012

* In the previous incarnation of rethinkdb the focus was on maximizing performance on SSDs. Is this still the case - does rethinkDB perform better than other databases on SSDs? Do you have any benchmark numbers?

* How does rethinkdb compare to MySQL Cluster? Both are distributed, replicated databases with a sql-like query language.

* Any plan to offer a java client?

coffeemug · on Nov 9, 2012

* The SSD-optimized storage engine is running under the clustering engine. I'm wary of saying 'better' or 'worse' in case of benchmarks, because they're really tricky to do right. We'll be publishing well-researched benchmarks as soon as we can, but it will take time.

* RethinkDB has flexible schemas and a query language that integrates straight into the host programming language and doesn't require string interpolation. As far as clustering goes, RethinkDB is a) really really really easy to use, and b) does a lot of query parallelization and distribution that MySQL cluster doesn't do. The product feels totally different, I think in a good way. The downside, of course, is that rethink is new and it will take some time to work out all the kinks.

* I can't commit to a timeline yet, but yes, absolutely.

Xorlev · on Nov 9, 2012

I'm impressed that you're taking the time to do proper, well-researched benchmarks. They're really tough to get right. It really comes down to your own specific workload that matters anyways.

This feels new and refreshing, I hope things turn out positively for you.

coffeemug · on Nov 9, 2012

Thank you!

erichocean · on Nov 10, 2012

I find JSON-oriented databases to be a huge limitation for writing applications managing any kind of financial data, due to the lack of a decimal number type and a timestamp/date type, both of which SQL provides (and are used A LOT).

Sure, you can put that stuff in strings, but then you'll run into limitation with queries where you want to, e.g., aggregate a total, or do timestamp arithmetic.

I could do everything with strings, custom map-reduce, etc., if you're inclined to suggest that as a workaround. Still doesn't mean JSON's a good idea.

alexpopescu · on Nov 10, 2012

We thought of supporting data types that are not part of JSON (date/time/timestamp/deltas, etc.), but we wanted to take the time to do it right so these didn't make it to this version.

alex @ rethinkdb

erichocean · on Nov 10, 2012

The other thing that bothers me about all these new JSON databases is they aren't really novel anymore.

Clustered databases are essentially a solved problem, and have been for years. What's needed today are databases solving the problem that Google Spanner addresses – global consistency across distributed clusters in separate data centers. If you want a challenge in the DB world, that's where it is.

But another clustered, schema-less JSON database? Might as well open up Intro to Algorithms and run through the exercises -- it's no longer a challenge, algorithmically or otherwise.

Sorry to be a downer on this, and it does still take a strong coder to implement one, so well done on that front. :)

jchrisa · on Nov 10, 2012

Try keeping it running while growing to millions of users in weeks. The simple data model lets us focus on elasticity and performance. There's a lot more to production quality software than the algorithms, but there are only a few NoSQL databases that get the algorithms right.

erichocean · on Nov 10, 2012

Sure, and I'm not trying to imply the RethinkDB guys are writing shoddy code or anything. For all I know the thing is bug-free with fantastic performance, perfect linear scaling with both number of cores and number of nodes in the cluster, and really does let you run your analytic workload on the same cluster you're taking transactions on (though I really doubt this last one – running analytics on your transactional database tends to slow transaction latency to a crawl).

That said, with a name like RethinkDB, I guess I expect more than a feature list I could have reasonably put together three years ago and gone, yeah, that's straightforward to do.

I've written my own database (and continue to improve it), so I'm pretty familiar with the issues involved. You're absolutely right that many of these JSON database have serious problems under load with their clustering abilities (and it's always under load, they tend to work fine on simple workloads).

Perhaps RethinkDB can carve out a niche for reliability-under-load among the existing JSON DB field. That's got to be worth something.

gruseom · on Nov 10, 2012

You say "yeah, that's straightforward to do" and also you "really doubt" that their claims are true?

Reminds me of Freud's story about the peasant who says to another, "Hey, you broke that kettle I lent you", and the other says, "It was fine when I gave it back to you, it was already broken when you lent it to me, and I never borrowed it."

erichocean · on Nov 10, 2012

Running analytics on a database is both "really straightforward to do" and at the same time, I "really doubt" that anyone would actually do both analytics and transactions on the same database instance in production.

Why? Analytics are CPU hogs, tend to access tons of data in random fashion (blowing caches and hogging the SSD drive), and given that RethinkDB has no secondary indexing, are likely to be especially slow.

That's why people have separate machines dedicated to analytics. What I think a team would actually do with RethinkDB is the same thing people do with Cassandra: include a separate cluster (in the same or a remote datacenter) and replicate data to it from the transactional cluster(s). They would then run analytics on the analytics cluster.

This approach won't impact transactional latency, and also allows you to have different hardware altogether for running analytics (e.g. tons of cores and RAM that might go wasted on the transactional DB machines).

This is all Big Data 101; it's not controversial.

gruseom · on Nov 10, 2012

The RethinkDB guys made it clear in this thread that although they don't have secondary indexes in this release, they will definitely be adding them. They also explained why.

oh_sigh · on Nov 10, 2012

Just store it in a string. Or a special map structure: {"date": "2012-11-01", "format":"YYYY-MM-DD" }

cuu508 · on Nov 10, 2012

Or you could use fixed point math: 600 in database means $6.00 Then aggregates and comparison operators would work, but you would have to decide upfront how much precision you might ever need

erichocean · on Nov 10, 2012

That sounds good, but with only 53 bits of integer precision in JSON (51 if you move the decimal point to account for cents), there's just not enough digits for finance these days.

A similar problem exists if you use JSON numbers (aka doubles) for timestamps –- the numbers just aren't big enough to do it accurately.

ksec · on Nov 10, 2012

Interesting, that is something i not think about much, any solution to that?

haberman · on Nov 11, 2012

JSON is not by definition limited to 53 bits of precision -- the standard itself does not specify range or precision. In practice, most implementations represent a JSON number with a double though.

shrughes · on Nov 11, 2012

Maybe one that wants to keep working through high inflation scenarios.

(But what financial software is running on a NoSQL database?)

haberman · on Nov 11, 2012

I think perhaps you replied to the wrong comment?

I didn't pose the question "what financial app needs >53 bits of precision?"

shrughes · on Nov 11, 2012

Yes I did, thank you.

moe · on Nov 10, 2012

Excuse me, what financial application rounds to 51 digits?

leif · on Nov 10, 2012

And doesn't moving the decimal point chop off between 6 and 7 bits?

szopa · on Nov 9, 2012

Nice work! It seems that you are well aware of the tradeoffs that you are taking and communicating it openly in your documentation (and your choices seem to be very reasonable). I really like the tone of your communication – it seems essentially BS/koolaid free.

1. How much data can you put in one instance before seeing performance degradation? I know that you still working on good benchmarks – but do you have any ballpark figures?

2. How does replication work? Is it closer to row/document or statement based (or something completely different)? How fast is the replication?

3. What is your envisioned used of the replication? Are replicas supposed to serve read traffic, or their goal is to keep the data safe in case of a catastrophe?

4. Can you tell me something more about cluster configuration propagation? The Advanced FAQ answer doesn't get into much detail.

5. Am I correct to assume that you are using protocol buffers? What motivated your choice?

jdoliner · on Nov 9, 2012

Hi, here to answer question number 4.

Short answer: Our configuration data is most similar to git. Any machine can be used as an administrative node via the WebUI or the CLI. It will make changes to the metadata which then get pushed to the other nodes. If 2 nodes make conflicting changes you get a conflict which the system will help you to merge.

Long Answer Cluster configuration is stored in semilattices which are a neat mathematical structure with a few very desirable properties. Semilattices support have a join operator. For our cluster metadata joining is the means by which metadata is updated. When one server connects to another the two swap metadata and each joins the other's metadata into his own. In essence learning what the other knows.

There are two properties in particular of the joining that are nice. First off joining is commutative. This means machines can exchange data in whatever order they want and get the same result at the end. Secondly they're indempotent. That means machines can resend their data without fear. The value doesn't change if the same value is joined in twice. These help us with a lot of the worries of distributed systems.

szopa · on Nov 9, 2012

Interesting. Do you have any way of checking that the change has actually propagated through the system before starting to act on it? Is the system consistent at all times?

If I understand correctly, the client can connect to any instance and its request will get routed appropriately. Let's assume that you take a master offline and promote one of the replicas to be a new master. Won't that lead to a window in which (from the point of view of different instances) there are two masters at the same time and some writes are sent to the wrong instance?

EDIT:

One solution for such things is to use something like Zookeeper (or some other system whose documentation mentions "Paxos" ;)). Have you considered that? How does what you are doing compare with that?

coffeemug · on Nov 10, 2012

Joe may be responding to this soon, but in the meantime I'll chime in. There is no way to verify the propagation reliably without either introducing strong performance inefficiencies (e.g. two phase commit protocol), or divergence (paxos, semi lattices, etc.) In our implementation we're using immediately consistent algorithms for data, but eventually consistent algorithms for cluster metadata. This means that if there is a metadata conflict, the user is presented with an issue (via the web ui or CLI) that they have to resolve. We'll also be adding automated resolution soon.

We basically have something very similar to zookeper baked into rethinkdb. We wrote it internally from scratch to better suit the needs of our architecture.

yxhuvud · on Nov 10, 2012

Now that you mention it, it would be very nice to have a database suited for configuration that behaved like git in that branching, restoring old states, reverting selected commits was built in, while at the same time supporting ACID features and replication?

Does anyone know if something like that exist?

coffeemug · on Nov 9, 2012

1. As long as active dataset everything fits in RAM, performance will be great. E.g. you can have terabytes of data but as long as the actively accessed dataset is < ~80GB the system will perform well. Once things get out of RAM, everything will work well if you have an SSD. On a rotational disk, performance will degrade very very quickly. This is a bottleneck on all modern databases, but clustering makes this problem go away because you can effectively increase the amount of RAM at linear cost by just adding more nodes (e.g. two nodes at 100GB of RAM each cost about four times less than one node with 200GB of RAM).

2. We do do block-level replication. On each node of the btree we store replication timestamps. When a node asks for new data, we can cull away parts of the tree the node has almost instantly. So replication is very very efficient for most OLTP workloads. We don't have statement-level replication yet, so if you do a range update on a large table, we'll have to replicate data block by block. It'll take a while to add statement-based replication - we'd have to do a pretty significant refactoring to make it happen.

3. Either. Replicas are great for failover -- if the master dies, you just failover and a replica picks up where the master left off. If you're ok with out-of-date reads, you can also hit replicas directly (e.g. for reports, etc.) and spread out the read load across the cluster.

4. This is a really complex question - we didn't document this because doing it properly would take a lot of time. I'll ask jdoliner to chime in -- he designed the architecture and wrote most of the code, perhaps he can describe it succinctly while we write deeper docs on this :)

5. We use protocol buffers between the client drivers and the server. We picked that because there were libraries for the initial three languages we picked (Ruby, JS, Python), they were really easy to use, and very efficient. We could also have a single spec for the client/server API. Internally we use our own serialization scheme which allows us to dump arbitrary C++ objects on the network. It doesn't support other languages (which we didn't need), but is much more versatile for writing complex cross-machine code.

jdoliner · on Nov 9, 2012

Small correction to this. There actually wasn't a library for JS we had to write that for ourselves.

pc · on Nov 9, 2012

This took tenacity. Congrats on shipping.

jamesli · on Nov 9, 2012

Great work! One question: is there any manual that explains the implementation details of the internals? Some manual similar to those Oracle, MySQL, Postgres, etc. provide?

The only docs I found in the company website that goes deep into the internals are Advanced FAQ (http://www.rethinkdb.com/docs/advanced-faq/). It is more of an architecture view, though.

The reason I ask is that with a good understanding on the internals, the engineers who understand database internals and distributed systems will have an "more" accurate idea on the capabilities and the limits of the features. Thus, if they decide to adopt RethinkDB, the understanding will help them design their applications to take advantages of the benefits and avoid the potential issues (or surprises!). MongoDB was not very good at documentation. It claims this or that feature works smoothly. Then, people found out many potential issues and limitations. That is one reason it leaves a bad tastes to many engineers.

coffeemug · on Nov 9, 2012

There currently isn't, beyond the advanced FAQ. This isn't by design -- writing really good detailed architecture papers takes a lot of time, and we were 100% focused on getting the product out. We'll get much better at documenting the internals, but it will take some time.

harryh · on Nov 9, 2012

If I were you guys I'd strongly consider adding support for hashing of the shard key. There are many cases where you care about distributing your writes(1) a lot more than fast range queries on the PK.

-harryh

1. Yes, I know there are other ways to do this besides hashing the shard key, but this is often the best way.

coffeemug · on Nov 9, 2012

We actually support hash sharding underneath -- each range shard is further broken down into hash shards internally to support multicore scalability. This isn't exposed to the users currently, but it can be. Another option is to allow the user to provide a hash function on the PK (I think this is what you're suggesting).

We'll be addressing this at some point, we have to sort through the list of feature requests first. It's a long list :)

harryh · on Nov 9, 2012

I don't feel the need to provide the hash function, as long as it's something reasonable. I understand that it's a long list! Just adding a vote to one particular item. Good luck!

tjic · on Nov 9, 2012

What the heck does "built with love" even mean?

Is this just a hipster marketing term to tell us that it's small and cute and made by people who play ukuleles and ride unicycles in their spare time, and not by evil corporate people who commute to work and have mortgages?

I find a lot of advertising eyeroll inducing, and the current trend of more-hipster-than-thou posturing is right at the top.

gruseom · on Nov 9, 2012

You could not have gotten these guys more wrong. They are serious technologists who have been working day and night for years to build something that they deeply believe in. Every hacker's heart should be warmed by the fact that they kept at it.

When you have a vision of something great that ought to exist and set about bringing it into the world, you are in an isolated position: other people don't yet see what you see. This leads to a lot of doubt by others and by yourself too. The longer it takes, the more exposed you are. To make it through that you are going to need a deeper source of motivation – an underground spring. Love is a fine word for this, and it makes me happy that Slava put it in his title: it's a clue to this experience that rarely gets mentioned, especially in the land of pivots and MVPs and weekend hacks.

daslee1969 · on Nov 9, 2012

cannot agree more. saw firsthand how hard this team kept at it!

jacquesm · on Nov 9, 2012

Releasing a project like this means working very hard for a long time without anybody patting you on the back saying you're doing a good job. It's exhausting, a labour of love.

Get it now?

haberman · on Nov 9, 2012

As someone who's been doing that for several years on a side project that I believe in, this really hit home.

whyleyc · on Nov 10, 2012

I reckon $1.2 million in funding helped soften the pain :)

skeletonjelly · on Nov 12, 2012

True, but nothing beats positive reinforcement, for me at least. The whole happiness vs money thing.

VeejayRampay · on Nov 10, 2012

You look like an angry person man, chill the fuck out.

Rule of thumb is if you build something this nice and with that order of magnitude in complexity you can put My Little Poney stickers on your homepage and still get respect. Who cares about the "attitude" and the "language" for Christ's sake, they BUILT stuff with their own hands and are offering it to the world, they can do whatever they damn please.

benatkin · on Nov 9, 2012

Since it's under the AGPL it will mostly be built by people that have been vetted. By switching to this people are one step closer to having a machine that boots without assholes. http://rusty.ozlabs.org/?p=196

ketralnis · on Nov 10, 2012

What does hipster even mean here?

oh_sigh · on Nov 10, 2012

"People who I don't like and people who I think they think they're cooler than I am"

amazedsaint · on Nov 10, 2012

Made with love = Made not just for money, but Made for making the world a better place.

shykes · on Nov 9, 2012

I am very excited about this. The RethinkDB team is rock-solid and the market is only going to get bigger.

I particularly like the perspective of an easy onramp to get started, knowing that I will never have to leave because of scale or reliability.

Please, please give me a SQL adapter! My marketing team needs SQL. My business app developers need SQL. Give them an adapter and I will get them to use RethinkDB - knowing that 1) my data is safe and I'm not 6 months away from a painful re-architecture and migration, and 2) as my developers hit the limits of SQL they can gradually (gradually!) peel the paint off and start using your more powerful query language.

sandstrom · on Nov 11, 2012

On the contrary, I hope you [RethinkDb] don't spend time on an SQL adapter. I'd rather see time spent on improvements to the database itself.

haberman · on Nov 9, 2012

Is schemaless a win over an object schema like a JSON schema (or a Protocol Buffer .proto file)?

Schemaless is clearly a convenience win over SQL because SQL's way of modeling nested/repeated data doesn't map as easily onto programming languages. But for all the people who are using JSON-based databases these days, I'm curious how many of them couldn't easily write a JSON schema or a .proto file that describes their de facto schema.

I ask because a lot of things become easier to reason about (and optimize) if you know that a field won't be a string in one record and a number in another. And writing a .proto file (or equivalent JSON schema) would give you an authoritative place to document what all the fields actually mean.

I don't have any actual experience with JSON-based databases, so I was interested to hear the opinions of people who do.

coffeemug · on Nov 9, 2012

There is of course no fundamental reason why JSON-based db's has to be schemaless. This is one interesting direction that might be worth exploring.

m0th87 · on Nov 9, 2012

I would love a system that is schema-less by design, but has guards that can be enforced at insert/update. That way, the underlying data structures don't have to be locked up from complex migrations (as needed w/ ALTER TABLE), but you still get type safety. A migration instead would simply involve a change in guards and an asynchronous update of existing entries. Plus you'd get all the wins of something resembling optional types: you only enforce guards if you want.

jdoliner · on Nov 10, 2012

This is a feature we've talked a lot about. Another idea we think is interesting is having the database detect schema such that users could see a readout that said: 100% of your documents have a integer field named "foo" would you like to make this a schema constraint?

rgarcia · on Nov 10, 2012

At Clever we do this with mongo + mongoose. Mongoose is janky in places but it's great for type safety and migrations like you describe.

embwbam · on Nov 9, 2012

this. I don't like SQL columns. They make life hard. But I'm spending time learning TypeScript specifically so I can add some types/schemas to my JavaScript.

That doesn't mean I want to deal with the implementation detail of columns, but I definitely wouldn't mind some type safety.

ollysb · on Nov 10, 2012

I hear a lot of talk about how hard it is to maintain sql schemas and columns. I've used mongodb on projects previously and while it was interesting I didn't find that the lack of schema made life any easier. Writing a migration in rails is so easy, I just can't understand how managing a schema really makes life more difficult. What is it about managing a schema that makes people so eager to jump to schema-less?

m0th87 · on Nov 9, 2012

How do filters work? They seem pretty difficult implementation-wise since you can write them in any of the language bindings. My first guess is that you pipe all the data in a table to the client, and the client itself does the filtration. But this would be extraordinarily inefficient.

jdoliner · on Nov 9, 2012

Piping all the data to the client would be extremely inefficient. Fortunately we don't do that.

When a filter is written in the client language it gets compiled into a protocol buffer which is sent to the cluster. This gets compiled into a query which is sent to each of the relevant shards for the table. This query has the filter baked right into it. The shards then go through their local copy of the data and filter out the rows which do not meet the query predicate. This data gets returned to the coordinating node and eventually to the user. Thus only the data the will actually be returned is ever transferred over the network.

Furthermore this process is done lazily. On the client side rather than getting back a huge array with the results of your filter you get back an iterator. This iterator stores a buffer of data which will be refilled as it is incremented.

coffeemug · on Nov 10, 2012

To add to jdoliner's answer, the reason why you can write table('foo').filter(lambda x: x['bar'] > 5).run() is because we do some language trickery on the client side to compile the query to an AST. In this case, we overload greater than operator, call the lambda function once on the client with a special object, and return an AST. This AST is then sent to the server and executed there.

It's rather difficult to integrate into a host language like that smoothly from the driver implementation perspective, but once the driver is written the user experience is amazing because you can write queries that look exactly like Python, but they're executed entirely on the server.

lucian1900 · on Nov 11, 2012

I find I prefer SQLAlchemy's Table.query.filter(Table.bar > 5) to a lambda that gets compiled to an AST in an odd way.

coffeemug · on Nov 11, 2012

You can do that too: r.table('foo').filter(r['bar'] > 5)

The use of r in filter is getting the attribute bar of the row.

lucian1900 · on Nov 11, 2012

That's pretty cool.

Have you found that there are useful expressions that are awkward to express without the lambda trick?

coffeemug · on Nov 11, 2012

Lambda is necessary because when you do nested subqueries, saying r['x'] is ambiguous and can cause all sorts of unpleasantness. So, if you use nested queries, the server rejects the implicit syntax and requires the use of lambda.

Lambda syntax is really nice too, I actually prefer it for writing queries.

lucian1900 · on Nov 11, 2012

I find the lambda trick not explicit and obvious enough. I fear I would do something stupid like trigger a side-effect without realising.

Again taking an example from SQLAlchemy, you can explicitly make a subquery, and then reference it instead of the original Table. A binding more like SQLAlchemy can probably written for RethinkDB.

m0th87 · on Nov 10, 2012

Does the function have to be referentially transparent?

coffeemug · on Nov 10, 2012

No. E.g., you could write r.table('foo').update(lambda row: row.merge({'bar': row['bar'] + 1 })). A shortcut for this is r.table('foo').update({'bar': r['bar'] + 1 }). Neither is referentially transparent, and both work.

jdoliner · on Nov 10, 2012

I believe both of those functions are referential transparent because they're pure functions. An example of a non referential transparent function would be:

r.table('foo').update(lambda row: {'bar' : r.table('bar').get(row["bar_id"]))

This still works but gets evaluated in a different way to make sure every secondary winds up with the same value.

m0th87 · on Nov 10, 2012

This is beyond awesome, and thanks for the follow-ups!

old-gregg · on Nov 9, 2012

Nope, they build an AST for filter expressions and compile it on the server IIRC. The client gets filtered data from the server.

ch0wn · on Nov 9, 2012

This looks really interesting. I'm interested to see how their license choice works out. The server is AGPL-licensed while the drivers are under Apache 2.0. This should at least avoid the issues we all know from libmysqlclient.

tbrock · on Nov 10, 2012

Same licensing that 10gen uses for MongoDB (AGPL) and drivers (apache).

jedahan · on Nov 9, 2012

Last I heard RethinkDB was a tail-append style engine for MySQL that was optimized for SSDs. Interesting to see a drastic pivot like this. Looks good, and good luck.

javajosh · on Nov 10, 2012

What is the business model, if any? (This question is not addressed in the FAQ, and I believe has at least some relevance to the longevity/shape of the reDB community over time.)

Also, have you talked to the Meteor folks about swapping Mongo out for this? Or would this be 'newness overload'?

bjhoops1 · on Nov 9, 2012

The querying capabilities here look amazing. Having to manually figure out how do joins and group by in something like CouchDB is really a pain, but this looks really slick. Very impressed and I will be trying this out!

jdoliner · on Nov 10, 2012

We spent a long time trying to reimplement other people's protocols but with our engine underneath. Being able to have features like this is one of the things that eventually convinced us it would be worth it to control our own. Bear in mind that our joins are also distributed which we think is really cool.

pspeter3 · on Nov 9, 2012

Are there any performance tips or information? This looks really cool

coffeemug · on Nov 9, 2012

Performance analysis of complex systems is really tricky, and good benchmarks are even trickier. We'll be working through posting numbers, docs, tips, etc. over the next few months. Documenting this well is very hard work, so it'll take a little bit of time.

In the meantime, you can always chat with us and we'll help you work through any issues you might run into.

shawn-butler · on Nov 9, 2012

Can I provide an intelligent sharding algorithm in place of a naive partition by primary key? I always get into trouble encoding things into primary keys eventually.

coffeemug · on Nov 9, 2012

Not at the moment, unfortunately.

masukomi · on Nov 9, 2012

Yeah, I'm curious too as to how it stacks up against MongoDB performance-wise and why I'd want to use one over the other.

3amOpsGuy · on Nov 9, 2012

I'm sure they'll provide performance examples in due course but we should be conscious how difficult it is to do this in practice.

Sure they can take the shortcut and just produce some single case and claim 'faster than a.n. other db' and that will probably satisfy the 'my db is faster than your db' brigade.

However, based on the effort that appears to have gone into this product to get it to this stage, they probably wouldn't be happy with that.

More power to them!

hcarvalhoalves · on Nov 10, 2012

It seems very interesting, and having to deal with ORMs daily makes me appreciate the clean API.

I feel being based on JSON is a big con though. While it's popular, it was never meant to be a rich serialization format, just simple. How to implement more complex fields like dates, and query efficiently on RethinkDB?

shrughes · on Nov 11, 2012

A benefit to JSON is that people know immediately what it is. It certainly does have its deficiencies (binary data being one that outweighs dates in terms of immediate importance). Being limited to strict JSON is not a permanent decision (I'm saying this as a member of the RethinkDB engineering team), it's a conservative one in terms of API design, and in terms of limiting scope for the first release.

krob · on Nov 9, 2012

I'd really like to know why they don't have a PHP library, considering it powers half the web. The key shouldn't be to promote someone elses own language when using infrastructure products but rather to support everyone in using their tool. Mongo supports everything under the sun, so should this.

alexpopescu · on Nov 9, 2012

... and it will.

We had to start from somewhere and we also wanted to get a feel of what are the most requested libraries so we can focus our energy in those directions.

alex @ rethinkdb

d0m · on Nov 9, 2012

I absolutely love the website. Congrats on the public launch. In the FAQ, I would suggest a "How do you compare with Mongo?" I've read the intro, the faq and a couple of quick guides to find out what was different (read better). If I'm a happy mongo user, why would I switch to RethinkDB?

alexpopescu · on Nov 9, 2012

1. If you are a happy mongo user, we'd still be happy to hear your feedback about RethinkDB.

2. We're putting together some comparisons, hope to add them to the site soon.

3. There are already a couple of answers to this question on this thread:

https://news.ycombinator.com/item?id=4764137

https://news.ycombinator.com/item?id=4763939

alex @ rethinkdb

banachtarski · on Nov 9, 2012

Would love a comparison of this to couchbase which seems to have a similar sharded distributed setup.

Congrats on shipping!

Heff · on Nov 9, 2012

Congrats on the launch guys!

nnash · on Nov 10, 2012

  joe@alchemist~$ rethinkdb
  joe@clockwerk~$ rethinkdb -j alchemist:29015

Dota player?

coffeemug · on Nov 10, 2012

Oh yes!

stavros · on Nov 10, 2012

Wanna get a game going sometime?

I play DOTA 2, by the way, hopefully you do too.

coffeemug · on Nov 10, 2012

We have at least three dota players in the office. We should ge this going!

joevandyk · on Nov 9, 2012

Why would you use this over PostgreSQL, especially with pg's new json support?

tcwc · on Nov 9, 2012

JSON support in Postgres is currently limited to a validated plain text field, it doesn't let you efficiently query inside the json object.

joevandyk · on Nov 9, 2012

There's nothing stopping you from writing functions that can index json documents. plv8 makes this easy.

shawn-butler · on Nov 9, 2012

While technically correct I assume the parent was referring to hstore [0] which has both GIST and GIN indexes as well as btree and hash for equality.

With some simple formatting functions (its just json after all) it's sixes to me.

[0] http://www.postgresql.org/docs/9.2/static/hstore.html

dbuxton · on Nov 9, 2012

hstore is great, I won't hear a word against it, but it doesn't deal with complex nested objects like JSON can (I would love to be wrong about this) - it's just a key/value store with indexing of the objects inside.

arzvi · on Nov 9, 2012

Query language and ruby like function-chaining are what I feel the selling factors. I like the ease with which I added a node to the cluster. But naming the version as Rashomon scares me..

atombender · on Nov 10, 2012

A release named Rashomon should be the one where they introduce versioning support, in my opinion.

dkhenry · on Nov 9, 2012

I don't see much on the documentation on indexes. Also this looks awesome, I would love to see a option to let it be eventually consistent and still keep the great querying ability.

coffeemug · on Nov 9, 2012

We currently support primary key indexing, but no secondary indexes. This is definitely planned -- it will take a few months to get this out.

EDIT: also, you can run queries with an out_of_date_ok flag, which will give you what you want. This only works for read queries though, the architecture is pretty much set up in a way where this would be very very difficult to do for write queries.

perfunctory · on Nov 9, 2012

Please stop using json as a data model. I have no idea how to represent dates, or timestamps, or colors, or any other unsupported data type.

crazygringo · on Nov 9, 2012

I have nothing to do with RethinkDB, but what are you talking about? Just represent them as strings. (What databases support colors as native types, anyways?) If you format dates YYYY-MM-DD, then you can do string comparisons for ranges.

And JSON has the huge advantage of supporting hierarchical data -- arrays with objects inside, etc. It seems a like a huge step forward.

kemiller · on Nov 9, 2012

But there's no standard way of representing the dates as strings, or indicating that this field here is a date and not a string that happens to look like one, so nothing can rely on what you do. You can represent literally anything with a string, but you lose type information when you do.

cpeterso · on Nov 9, 2012

ISO 8601 is a standard way of representing dates as strings, e.g. "2012-11-09T08:08Z".

rdtsc · on Nov 10, 2012

There is a standard for representing string (see sibling comment) and your type information is the field name. So you know your document has a type if you give it a d_type field for example and then based on that you know your tstamp field is the date.

kemiller · on Nov 13, 2012

I'm not saying I can't imagine how to work around it, I'm saying that we're forced to, and likely will do so in a variety of competing ways, which degrades its usefulness as an interchange format. It means your tools, at the end points or in the middle, don't know what the data in that field is and so can't do anything useful for you that you haven't explicitly taught it to do.

inportb · on Nov 10, 2012

Hopefully, your application knows that this field here is a date encoded in a string.

lttlrck · on Nov 10, 2012

Or unix epoch if you want an easy life when it comes to querying.

optymizer · on Nov 9, 2012

So because you have no idea how to represent some arbitrary things with JSON, we should all stop using it? You, sir, are not making any sense.

Also, you seem to be confusing primitive data types with complex data types. Yes, JSON doesn't have a 'color' data type. But guess what? Neither does C, nor Java. If you want a 'color' type, you'll have to create one yourself! Mind blowing, I know.

So here, let me suggest a possible solution:

dates: string

timestamp: integer

colors: string (RGB,BGR,RGBA,...), integer, object

Part of 'data modeling' is to model your data out of basic types. Shocking! If JSON had types for every type of object under the Sun (like you seem to want), JSON parsers would be a lot more complicated and little thinking would be required in the process of modeling your data.