
RethinkDB: An open-source distributed database built with love over three years - coffeemug
http://www.rethinkdb.com
======
mmorearty
I don't know much about RethinkDB yet, but I will say that I have been a big
fan (online) of one of its founders, Slava Akhmechet, for years. I've never
met him, but he wrote some terrific articles on his website,
<http://www.defmacro.org/> , a few years ago. Start at the bottom of the list
of articles, with "The Nature of Lisp."

Slava is a deep thinker, which makes me very excited to take a look at
RethinkDB.

~~~
jgw
Indeed - he mentions in the article that he set himself a goal to convert 10
programmers into Lispers. Sounds like he probably has that many just in this
thread! Kudos, sir!

~~~
samstokes
He didn't make me a Lisper (I'm more a Haskell fan these days), but reading
Slava's Lisp articles years ago was a significant part of what set me on my
current career path.

He helped get me into functional programming, which got me a contract job [1],
which is how I met one of my current co-founders.

[1] http://martin.kleppmann.com/2009/09/18/the-python-paradox-is-now-the-scala-paradox.html

------
dxbydt
Thought I'd share this with you.

A yc company hired me. I showed up at their mountain view office. The founder
said "This is the former office of RethinkDB! I hope we are as successful as
them."

I didn't know who/what RethinkDB was, so I said ok, sure.

3 days later he asked me to clear my desk and leave. He said "You are the sort
of person who should work at RethinkDB".

So I asked "What does that mean?"

He said "RethinkDB is trying to solve very deep algorithm problems. They want
somebody with CS knowledge to do deep research. That is what you are good at.
But here we are just trying to run a business. You are not a good fit for
that!".

So I left.

~~~
fragsworth
I know lots of engineers who have trouble talking to people who don't share
their knowledge. This problem is extremely pervasive - I'd say a good 25% or
more have this problem to some extent. It's not a good thing when this happens
- you need to be able to speak to laymen or you're gonna have a bad time.

I am going to go out on a limb here and suggest you try to work on being a bit
more practical. Don't complicate things for the sake of solving difficult
problems. Don't try to shower people with your engineering knowledge when it's
not necessary, and don't expect everyone to know everything you do. And don't
be an asshole about it either.

~~~
borplk
I adjust the level of detail and technicality pretty well according to who I
am speaking to.

But sometimes someone with very limited knowledge of something asks me a
detailed question about X.

What they ask is too difficult and complex to describe in a simple way. Either
I have to oversimplify it, which may insult them and does no good, or I have
to step through digestible chunks of explanation, which will inevitably be a
bit technical even though I try to minimise that.

I think it's a two-way thing. The person should also consider their own level
of knowledge before asking for an explanation of something and adjust their
question based on that.

I don't ask my doctor to tell me why my heart does this and that, because I
simply don't have the knowledge to understand his answer. I ask: is my heart
doing OK? Cool!

~~~
halfninety
Looks like I'm one of the people you are referring to. Like anyone with a
healthy dose of curiosity, I'm interested in anything that is, well,
interesting. I'm excited to meet people with expertise in various areas and
ask them questions. I don't expect to understand their answers in full, but in
most cases I can still grasp part of them. Based on my partial understanding
of the first answer, I can ask a better question the next time, and after
several cycles I can probably learn something valuable (at least in the sense
of satisfying my own curiosity).

The point is, if you don't ask questions in areas you are not familiar with,
you will never become familiar with those areas. Well, unless you learn
everything from books and Wikipedia.

I'm not sure how many people see me as an annoyance, but at least I'm
consistent, in that if other people ask me questions in my area of expertise,
I'm happy to try my best to explain.

Oh, and if it's just impossible to reasonably answer my question in a way that
makes any sense to me, I expect you to just say so, and I'm happy with that.

~~~
borplk
You are not the person I'm referring to because you say you are happy with
that.

I do what you described too, at parties or whenever I have an opportunity for
a discussion with someone. I enjoy it, they ask me questions too, and we try
our best to teach each other something, which is perfectly fine and fun.

What I was referring to was mostly employee/boss situations, where the boss
asks the employee about the details or internals of system X and then gets
pissed when the engineer can't explain it to him, blaming the engineer for
being incapable of explaining complex things to non-technical people.

I mean, they have to appreciate that there's a limit to how much you can
explain to non-technical people in simple terms. At some point it just doesn't
work: either you have to use the big words and concepts and assume knowledge,
or drop back to dead simple _insulting_ analogies. You see that server, boss?
That's like a train! Choo-choo!

------
jedberg
Suggestion: It would be great to have a page on your website that explains why
RethinkDB is better than the other prevailing options. Right now I don't know
why I'd want to invest time setting up yet another database.

~~~
coffeemug
Thanks -- will do in the next few days.

~~~
xutopia
What's the elevator pitch? Maybe we can help you with those advantages if you
can tell us right now.

~~~
jdoliner
The elevator pitch is: "Mongo's ease of use without the gotchas." We have a
nice, simple-to-use query language and a quick setup process, but analytic
queries like map-reduce don't lock up the entire database. Our product aims
not to be a ticking time bomb of technical debt.

~~~
kinleyd
Sounds like an excellent elevator pitch.

------
coffeemug
Hey guys, Slava here. I've been up since yesterday, so I'm going to clock out
(though some of the team members are still lurking here). I wanted to thank
everyone for the great feedback. We're working hard to improve Rethink over
the next few months. FYI, you can always hop on IRC (#rethinkdb on freenode)
or the GitHub tracker (<https://github.com/rethinkdb/rethinkdb/issues>) with
questions and we'll help you out.

~~~
biturd
Thanks for this work, it looks really nice.

I was looking at the GitHub comments about a Homebrew recipe, in which it was
stated that aside from a recipe creating a VM, the Mac OS X port would take a
bit longer.

Is that a full port from one language to another? Or just an issue of the
different flavors of *nix that need dealing with and probably some of the
dependency tree issues that come with it?

I'm curious what needs be done to get it building on Mac OS X — perhaps I
could assist somehow.

I see a few dependencies that don't immediately sound familiar. You may have
better luck with MacPorts, which uses Tcl as the language for its portfiles.

Portfiles are just like Homebrew's recipes, but MacPorts always builds
everything fresh, including the entire dependency tree (and dependencies of
dependencies, etc.), for which they have thousands of working portfiles. Since
those are complete and working, you wouldn't have to worry about them until
you wanted to be able to make a binary outside of any package manager.

MacPorts can build binaries now (new feature), so you could just as easily
instruct it to create a standard Mac OS X installer .pkg which makes sure
everything goes in the right place, on the right platform, for the right
architecture.

They are an exceedingly friendly and helpful group; I'm sure they would love
to see this software in their package/portfile list.

~~~
i386
Seems to me that most of us who have used MacPorts have moved to Homebrew, or
that could just be the bubble I'm living in. Is there anyone who still uses
MacPorts who could chime in and say why they never made the switch?

~~~
j-kidd
Come on, Homebrew doesn't even have gcc.

I am not a Mac user, but a designer using a MacBook joined our team last week,
and we struggled for half a day with Homebrew. The next day, we installed
MacPorts instead, and with just:

$ sudo port install python27 py27-virtualenv gcc46

we were able to proceed and get the whole stack up and running. Not to mention
everything from MacPorts is installed nicely under /opt/local.

MacPorts is just way ahead of Homebrew. OTOH, Portage is way ahead of MacPorts
;)

~~~
merlincorey
clang is better than gcc, IMO

~~~
hyperbovine
clang either fails to build or mis-builds certain things on OS X, for instance
Ruby 1.9.3. I had to get vanilla GCC for this reason the other day, and was
relieved to find it in Homebrew.

~~~
getsat
Old-ass versions of Clang would build Ruby and PostgreSQL (client) binaries
which would segfault upon execution. Try grabbing the latest XCode/command
line tools package and you should be fine. I've been running Ruby 1.9.3 with
Clang for quite a while.

    brew install rbenv ruby-build
    # rc file shenanigans
    rbenv install 1.9.3-p327
    rbenv global 1.9.3-p327
    ruby --version

------
codewright
I'm hoping this'll be a viable replacement for MongoDB. (Sparse/Schema-free is
incredibly useful for me, as is JSON-centric modeling)

jedberg already asked for a compare/contrast, but let me provide some
specifics I care about that you might be able to answer.

1. Is it fair to say that, thanks to MVCC, running an aggregation or map-
reduce job isn't going to lock the whole damn thing up like it does on
MongoDB?

2. You've got a distributed system that is seemingly CP. How do the
availability/consistency semantics compare with HBase? Master-slave?
Replication? Sharding?

3. Latency is a big one for us and is a large part of why we use
ElasticSearch. How does the read latency on RethinkDB compare with
Mongo/MySQL/Redis/et al.?

~~~
coffeemug
1. Yes -- that was the main motivation for MVCC. We wanted to let people use
rethinkdb for analytics and map/reduce on top of the realtime system without
having to replicate data into something else.

2. Short answer: we favor consistency (via master/slave under the hood). It
allows for a much easier API, far fewer issues in production, etc. The user
experience is just better. If you're OK with out-of-date results, you can do
that too, without paying the price of the consistency guarantees. The downside
of our design is that you might lose write availability in case of netsplits
(if the client is on the wrong side of the split). Longer answer: check out
the FAQ at <http://www.rethinkdb.com/docs/advanced-faq/>

3. Read latency should be equivalent to other comparable master/slave
systems. We don't do quorums, so latency will be much better than
quorum/dynamo-based designs.

~~~
rbranson
I want to preface my comment: this is impressive work, congratulations on
shipping, and this is what MongoDB should have been from the start.

In reality, most transactional database deployments are heavily skewed towards
read workloads, so reading from hot slaves is basically a requirement for
master/slave databases. So, in most real-world applications at scale, apps
already deal with inconsistencies between slaves and the master and are
already making the "difficult" choice of dealing with CAP trade-offs.
Asynchronous replication also creates the potential for difficult- or
impossible-to-recover-from data loss, in the sense that masters & slaves
always have a continuous possibility of split-brain.

RethinkDB does not provide multi-shard transaction atomicity and/or isolation,
which in my experience is the biggest difficulty thrown up in front of
developers coming from single-node databases. I feel like the difficulty of
dealing with inconsistencies across multiple versions of a single object is
far more familiar as most developers have at least dealt with cache
invalidation in some form. It's really having to ensure and deal with
potentially out of order operations (inconsistency in the ACID sense) across a
"graph" of data that's more insidious.

~~~
coffeemug
I mostly agree with what you're saying, but I also think there's enormous
value in making easy things _really_ easy. Even with today's state of the art,
adding a shard, dealing with consistency issues, adding replicas, etc. is
relatively hard. Perhaps not in a computer-sciency sense (all the problems are
fairly well understood), but in an operational sense. Lots and lots of work
needs to be done even with systems like MongoDB, let alone with MySQL. And
once you're done with that, you can't really run complicated queries easily,
so you have to solve that problem too.

We set out to make these things _really easy_ (whether we've succeeded remains
to be seen). We want users not to have to deal with these issues at all
whenever possible. You should be able to set up a cluster, add shards, and run
cross-shard joins and aggregations in five minutes.

Of course, once that problem is solved, there are tougher problems like high-
performance cross-document distributed ACID, but I think the industry as a
whole is relatively far away from that right now. (There are some solutions to
this -- e.g. Clustrix -- but they require specialized hardware, which puts
them out of reach for most developers.)

~~~
Nitramp
_there are tougher problems like high-performance cross-document distributed
ACID, but I think the industry as a whole is relatively far away from that
right now_

Megastore and Spanner solve that problem, with varying tradeoffs:

<http://research.google.com/pubs/pub36971.html>

<http://research.google.com/archive/spanner.html>

~~~
erichocean
Our internal database does too (with a different design than Spanner, but
stuff still comes "online" atomically for everyone across the globe at the
same time, with similar latency). Unlike FoundationDB, and like Spanner, we're
doing it with complex object graphs, not just key-values, and we also do it
with consistent secondary indexing (I'm not sure if Spanner supports this or
not).

This isn't "the future"; this is now. People are doing it, and have been for a
while. If you're going to "rethink the database", distributed global
consistency should be at the top of your list today. RethinkDB seems like it's
merely "rethinking" Mongo.

The main benefit of global consistency, of course, is ease of use. Global
consistency is _so much easier_ to reason about and write code for!

~~~
nlavezzo
Hi Erich - I'm curious... you refer to your "internal database" doing
distributed ACID but state that you're not with Google. Can you say who you're
with? It's interesting to us to know who is also working on this problem.

------
jbellis
I'll ask the obvious question not in the FAQ: How is this different from
MongoDB?

~~~
coffeemug
Hey, this is Slava, founder of rethinkdb. There are some obvious high level
differences:

* A far more advanced query language -- distributed joins, subqueries, etc. -- almost anything you can do in SQL you can do in RethinkDB

* MVCC -- which means you can run analytics on your realtime system without locking up

* All queries are fully parallelized -- the compiler takes the query, breaks it up, distributes it, runs it in parallel, and gives you the results

But beyond that, details matter. Database systems differ in what they make
easy, not what they make possible. We spent an enormous amount of time
building the low-level architecture and working on a seamless user experience.
If you play with the product, I think you'll see these differences right away.
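
For a quick taste, here's the kind of query I mean, in the Python driver style
(a sketch -- exact method names may differ slightly from the shipped drivers):

    # filter users server-side, then join each one with their city
    # document; the whole thing runs in parallel across shards
    r.table('users')
     .filter(r['age'] > 25)
     .eq_join('city_id', r.table('cities'))
     .run()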

Note: rethink is a new product, so it'll inevitably have quirks. We'll fix all
the bugs as quickly as we can, but it'll take a few months to iron out the
things that didn't come up in testing.

~~~
mej10
What do you see as the potential areas where RethinkDB will shine?

Also, I am excited to try this out. I always enjoyed your writings and I am
sure you + team have made something awesome.

~~~
jdoliner
Joe Doliner -- engineer at RethinkDB here. RethinkDB is designed for small
teams with big data challenges. When you're starting a new project, ideally
you want to just boot your database up and start throwing data at it without
worrying about schemas. However, with other products on the market, most
notably Mongo, there are a lot of features that stop working when you get to a
large scale. We've been very careful in developing RethinkDB to make sure that
small teams who use our product aren't going to need to rewrite code once
their dataset starts growing. As coffeemug mentions above, we support fully
parallelized queries. This means that when your dataset grows you can add more
servers to speed up analytic queries. We feel this is a valuable feature for
small teams.

~~~
mej10
Thanks Joe.

------
sutro
How does RethinkDB perform when compared to open-source distributed databases
built with hate?

~~~
bsg75
Maybe not built with hate, but used in anger?

------
ww520
Congratulations on releasing. Well done!

A few questions:

1. Will secondary indices ever be supported? Range scans with a different
order than the primary key would be very welcome, e.g. date range queries.

2. Do you support conditional updates? Or any kind of optimistic locking or
versioning to coordinate concurrent updates from different clients?

3. Related to 2: how can loosely-sequential IDs be generated using a table?

4. Will some transaction support be added? It doesn't need to be full ACID;
just grouping updates (intra-table and/or inter-table) in one shot would be
nice. It should be feasible with MVCC already in place.

5. Do all the clients hit a central server to initiate queries, which then
farms out the requests to different shards? Or does the client library know
how to get to the different shards directly? The first case has a single
point of failure and a scaling bottleneck.

6. Do you support automatic re-balancing of shard data (data migration) when
new shards are added or old ones retired?

7. How are authentication and authorization done? Or can any client come in?

8. Internal detail: for out-of-date distributed queries on the slave
replicas, is there a cost-based (or load-based) decision process to pick the
most idle replica to do the sub-query?

9. Internal detail: do you use a Bloom filter to optimize distributed joins?

~~~
coffeemug
1. Yes. It's a matter of doing this right, which will take some time.

2. Yes. There is no special command; you just combine update and branch
(<http://www.rethinkdb.com/api/#py:control_structures-branch>). Here's an
example in Python:

    
    
      r.table('foo').get(5).update({ 'bar': r.branch(r['baz'] == 0, 1, 2)})
    

This will set attribute bar to 1 if baz is 0, or to 2 otherwise. Everything is
atomic on that document.

3. Currently the server doesn't support sequential (or even loosely
sequential) ID autogeneration. You'd have to do that on the client, e.g. by
using a timestamp.
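
For example, a rough client-side sketch (a hypothetical helper, not part of
the driver):

    import time, random

    def loose_id():
        # millisecond timestamp plus a random suffix: sortable and
        # loosely sequential, with collisions made unlikely by the suffix
        return '%013d-%06x' % (int(time.time() * 1000),
                               random.randrange(16**6))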

4. I don't know yet how to do this really efficiently. It's relatively easy
to do on a single shard, but cross-shard boundaries make it really hard.

5. Any client can connect to any server. The server will then parse and route
the query. There is no central server; everything is peer-to-peer. The client
library doesn't know about multiple servers right now, so the responsibility
is on the user to hit a random server. Alternatively, you can run "rethinkdb
proxy" on localhost and connect the client to that. The proxy will then route
queries to the proper nodes in the cluster.

6. In the web UI, if you click on the table and reshard, everything will be
rebalanced. You don't even have to add or remove shards; it'll just rebalance
data for the number of shards you have. The UI has a bar graph of the shard
distribution, so you can see how balanced things are.

7. Currently there is no authentication support -- we expect users to take
proper firewall/ssh-tunneling precautions.

8. Yes, that's how queries get routed. Currently this isn't very smart, but
it will get much better over time. If something breaks for you performance-
wise, just reach out and we'll fix it.

9. No, not yet. If you run eq_join on a small subset of the data (99% of OLTP
workloads) it will be very fast. Other joins work OK, but there's A LOT of
room for optimization.

Phew!

~~~
ww520
Thanks for your and jdoliner's detailed answers! Hope I didn't ask too many
questions. :) I'll respond to both here.

For 2 and 3, I don't think I made it clear, so let me clarify. A common db
problem with multiple clients is dealing with concurrent updates to the same
piece of data. E.g. both client1 and client2 read D as D=15 at the same time.
Client1 adds 1 to D, making 16, and saves it. Then client2 adds 1 to D, also
making 16, and saves it as 16, which is wrong. It should be 17.

Conditional update is one feature a db usually provides to let clients deal
with this problem, i.e. the update only goes through if a certain condition is
met, and otherwise aborts: update D=16 if D==15. Client1 would succeed while
client2 would fail, at which point it can retry the whole read-increment-
update cycle again with the new read value.

The litmus test of whether a db system can handle this problem is to try to
implement a sequential ID generation feature run by multiple clients at the
same time.

For 8, if the query is parsed into a query execution plan, you can ship the
plan to all equivalent replicas and ask them to estimate the execution cost
based on their current load. After they reply, pick the lowest-cost one and
send the execute command. Even a simple approach of asking for the machine
load of all replicas and picking the lowest one would adaptively utilize all
the servers.

For 9, a Bloom filter is a relatively simple technique that can dramatically
reduce the amount of data shipped between peers to do a join. You basically
filter out the vast majority of the non-matching data before shipping.
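
To illustrate the idea (a toy sketch in Python, nothing RethinkDB-specific):

    import hashlib

    class BloomFilter:
        # k hash positions over an m-bit array
        def __init__(self, m=8192, k=4):
            self.m, self.k, self.bits = m, k, bytearray(m // 8)
        def _positions(self, key):
            for i in range(self.k):
                h = hashlib.sha256(('%d:%s' % (i, key)).encode()).digest()
                yield int.from_bytes(h[:8], 'big') % self.m
        def add(self, key):
            for p in self._positions(key):
                self.bits[p // 8] |= 1 << (p % 8)
        def might_contain(self, key):
            return all(self.bits[p // 8] & (1 << (p % 8))
                       for p in self._positions(key))

    # node A builds a filter over its join keys and ships it (it's tiny)
    left = BloomFilter()
    for key in (101, 205, 999):
        left.add(key)

    # node B only ships rows that *might* match; the few false positives
    # are discarded during the real join, but most non-matching rows
    # never cross the network
    candidates = [row for row in ({'id': k} for k in range(1000))
                  if left.might_contain(row['id'])]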

It's a good start. Good luck going forward!

~~~
Guillaume86
Your example of a conditional update can be addressed using an atomic update:

    r.table('tv_shows')
     .filter({ name: 'Star Trek TNG' })
     .update({ episodes: r('episodes').add(1) })
     .run()

<http://www.rethinkdb.com/docs/advanced-faq/#atomic>

~~~
ww520
I think the atomicity model here works like a transaction on the whole
document, where all the changes to the attributes of a document are updated
all at once.

The scenario I described has to do with read consistency, where the value read
by a client should not change between the time of the read and the time of the
update. The usual way of handling it was to take a write lock for the duration
to prevent updates from others, but that degrades concurrency. The other way
is optimistic locking (or a conditional update), which lets the client detect
a change in the interim and retry with the new value.

~~~
coffeemug
My point was that you don't have to do that with rethink because the entire
query gets executed on the server. You don't have to take the value down to
the client, make the change, and then send it back. The entire update gets
evaluated on the server and the server handles atomicity in various ways
(depending on the query).

~~~
muhqu
Do you think something like the following should work with RethinkDB?

    
    
      r.table('foo')
       .get(5)
       .update({
         '_rev': r.branch(r['_rev'] == 5,
           r('_rev').add(1),
           r.error("invalid revision")
         ),
         'name': "awesome name"
       })
    

the basic idea is that `name` should be updated to "awesome name" and `_rev`
should be incremented by 1, but only if `_rev` is 5; otherwise an "invalid
revision" error should be thrown.

~~~
coffeemug
Yes, this will work.

~~~
muhqu
awesome, thanks!

------
continuations
* In the previous incarnation of RethinkDB the focus was on maximizing performance on SSDs. Is this still the case -- does RethinkDB perform better than other databases on SSDs? Do you have any benchmark numbers?

* How does RethinkDB compare to MySQL Cluster? Both are distributed, replicated databases with a SQL-like query language.

* Any plan to offer a Java client?

~~~
coffeemug
* The SSD-optimized storage engine runs under the clustering engine. I'm wary of saying 'better' or 'worse' when it comes to benchmarks, because they're really tricky to do right. We'll be publishing well-researched benchmarks as soon as we can, but it will take time.

* RethinkDB has flexible schemas and a query language that integrates straight into the host programming language and doesn't require string interpolation. As far as clustering goes, RethinkDB a) is really, really, really easy to use, and b) does a lot of query parallelization and distribution that MySQL Cluster doesn't do. The product feels totally different, I think in a good way. The downside, of course, is that rethink is new and it will take some time to work out all the kinks.

* I can't commit to a timeline yet, but yes, absolutely.

~~~
Xorlev
I'm impressed that you're taking the time to do proper, well-researched
benchmarks. They're really tough to get right. And in the end it's your own
specific workload that matters anyway.

This feels new and refreshing, I hope things turn out positively for you.

~~~
coffeemug
Thank you!

------
erichocean
I find JSON-oriented databases to be a huge limitation for writing
applications that manage any kind of financial data, due to the lack of a
decimal number type and a timestamp/date type, both of which SQL provides (and
which are used A LOT).

Sure, you can put that stuff in strings, but then you'll run into limitations
with queries where you want to, e.g., aggregate a total or do timestamp
arithmetic.

I _could_ do everything with strings, custom map-reduce, etc., if you're
inclined to suggest that as a workaround. That still doesn't mean JSON's a
good idea.

~~~
erichocean
The other thing that bothers me about all these new JSON databases is they
aren't really novel anymore.

Clustered databases are essentially a solved problem, and have been for years.
What's needed today are databases solving the problem that Google Spanner
addresses – global consistency across distributed clusters in separate data
centers. If you want a challenge in the DB world, that's where it is.

But another clustered, schema-less JSON database? Might as well open up Intro
to Algorithms and run through the exercises -- it's no longer a challenge,
algorithmically or otherwise.

Sorry to be a downer on this, and it does still take a strong coder to
implement one, so well done on that front. :)

~~~
jchrisa
Try keeping it running while growing to millions of users in weeks. The simple
data model lets us focus on elasticity and performance. There's a lot more to
production quality software than the algorithms, but there are only a few
NoSQL databases that get the algorithms right.

~~~
erichocean
Sure, and I'm not trying to imply the RethinkDB guys are writing shoddy code
or anything. For all I know the thing is bug-free with fantastic performance,
perfect linear scaling with both number of cores and number of nodes in the
cluster, and really does let you run your analytic workload on the same
cluster you're taking transactions on (though I really doubt this last one –
running analytics on your transactional database tends to slow transaction
latency to a crawl).

That said, with a name like RethinkDB, I guess I expect more than a feature
list I could have reasonably put together three years ago and gone, yeah,
that's straightforward to do.

I've written my own database (and continue to improve it), so I'm pretty
familiar with the issues involved. You're absolutely right that many of these
JSON databases have serious problems under load with their clustering
abilities (and it's always under load; they tend to work fine on simple
workloads).

Perhaps RethinkDB can carve out a niche for reliability-under-load among the
existing JSON DB field. That's got to be worth something.

~~~
gruseom
You say "yeah, that's straightforward to do" and also you "really doubt" that
their claims are true?

Reminds me of Freud's story about the peasant who says to another, "Hey, you
broke that kettle I lent you", and the other says, "It was fine when I gave it
back to you, it was already broken when you lent it to me, and I never
borrowed it."

~~~
erichocean
Running analytics on a database is both "really straightforward to do" and, at
the same time, I "really doubt" that anyone would actually run both analytics
and transactions on the same database instance in production.

Why? Analytics are CPU hogs, tend to access tons of data in random fashion
(blowing caches and hogging the SSD drive), and given that RethinkDB has no
secondary indexing, are likely to be especially slow.

That's why people have separate machines dedicated to analytics. What I think
a team would actually do with RethinkDB is the same thing people do with
Cassandra: include a separate cluster (in the same or a remote datacenter) and
replicate data to it from the transactional cluster(s). They would then run
analytics on the analytics cluster.

This approach won't impact transactional latency, and also allows you to have
different hardware altogether for running analytics (e.g. tons of cores and
RAM that might go wasted on the transactional DB machines).

This is all Big Data 101; it's not controversial.

~~~
gruseom
The RethinkDB guys made it clear in this thread that although they don't have
secondary indexes in this release, they will definitely be adding them. They
also explained why.

------
szopa
Nice work! It seems that you are well aware of the tradeoffs you are making
and communicate them openly in your documentation (and your choices seem very
reasonable). I really like the tone of your communication -- it seems
essentially BS/koolaid-free.

1. How much data can you put in one instance before seeing performance
degradation? I know you're still working on good benchmarks -- but do you have
any ballpark figures?

2. How does replication work? Is it closer to row/document-based or
statement-based (or something completely different)? How fast is the
replication?

3. What is your envisioned use of replication? Are replicas supposed to serve
read traffic, or is their goal to keep the data safe in case of a catastrophe?

4. Can you tell me more about cluster configuration propagation? The Advanced
FAQ answer doesn't go into much detail.

5. Am I correct to assume that you are using protocol buffers? What motivated
your choice?

~~~
jdoliner
Hi, here to answer question number 4.

Short answer: our configuration data works most like git. Any machine can be
used as an administrative node via the web UI or the CLI. It makes changes to
the metadata, which then get pushed to the other nodes. If two nodes make
conflicting changes, you get a conflict which the system will help you merge.

Long answer: cluster configuration is stored in semilattices, which are a neat
mathematical structure with a few very desirable properties. Semilattices have
a join operator, and for our cluster metadata, joining is the means by which
metadata is updated. When one server connects to another, the two swap
metadata and each joins the other's metadata into its own -- in essence,
learning what the other knows.

Two properties of the join in particular are nice. First off, joining is
commutative. This means machines can exchange data in whatever order they want
and get the same result at the end. Secondly, it's idempotent. That means
machines can resend their data without fear: the value doesn't change if the
same value is joined in twice. These properties relieve a lot of the usual
worries of distributed systems.
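
A toy illustration of those two properties in Python (not our actual metadata
structure -- just a map whose entries carry totally ordered timestamps):

    # join keeps the newer value per key
    def join(a, b):
        out = dict(a)
        for k, (ts, v) in b.items():
            if k not in out or ts > out[k][0]:
                out[k] = (ts, v)
        return out

    m1 = {'shards': (3, 'four shards')}
    m2 = {'shards': (5, 'eight shards'), 'name': (1, 'prod')}

    assert join(m1, m2) == join(m2, m1)   # commutative
    assert join(m1, m1) == m1             # idempotent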

~~~
szopa
Interesting. Do you have any way of checking that the change has actually
propagated through the system before starting to act on it? Is the system
consistent at all times?

If I understand correctly, the client can connect to any instance and its
request will get routed appropriately. Let's assume that you take a master
offline and promote one of the replicas to be a new master. Won't that lead to
a window in which (from the point of view of different instances) there are
two masters at the same time and some writes are sent to the wrong instance?

EDIT:

One solution for such things is to use something like ZooKeeper (or some other
system whose documentation mentions "Paxos" ;)). Have you considered that? How
does what you are doing compare with it?

~~~
coffeemug
Joe may be responding to this soon, but in the meantime I'll chime in. There
is no way to verify the propagation reliably without introducing either
significant performance inefficiencies (e.g. a two-phase commit protocol) or
divergence (Paxos, semilattices, etc.). In our implementation we're using
immediately consistent algorithms for data, but eventually consistent
algorithms for cluster metadata. This means that if there is a metadata
conflict, the user is presented with an issue (via the web UI or CLI) that
they have to resolve. We'll also be adding automated resolution soon.

We basically have something very similar to ZooKeeper baked into RethinkDB. We
wrote it internally from scratch to better suit the needs of our architecture.

------
pc
This took tenacity. Congrats on shipping.

------
jamesli
Great work! One question: is there any manual that explains the implementation
details of the internals? Something similar to what Oracle, MySQL, Postgres,
etc. provide?

The only doc I found on the company website that goes deep into the internals
is the Advanced FAQ (<http://www.rethinkdb.com/docs/advanced-faq/>). It is
more of an architecture overview, though.

The reason I ask is that, with a good understanding of the internals,
engineers who understand database internals and distributed systems will have
a "more" accurate idea of the capabilities and limits of the features. Thus,
if they decide to adopt RethinkDB, that understanding will help them design
their applications to take advantage of the benefits and avoid the potential
issues (or surprises!). MongoDB was not very good at documentation. It claimed
this or that feature worked smoothly; then people found out about many
potential issues and limitations. That is one reason it left a bad taste with
many engineers.

~~~
coffeemug
There currently isn't, beyond the advanced FAQ. This isn't by design --
writing really good detailed architecture papers takes _a lot_ of time, and we
were 100% focused on getting the product out. We'll get much better at
documenting the internals, but it will take some time.

------
harryh
If I were you guys, I'd strongly consider adding support for hashing the shard
key. There are many cases where you care about distributing your writes (1) a
lot more than about fast range queries on the PK.

-harryh

1. Yes, I know there are other ways to do this besides hashing the shard key,
but this is often the best way.

~~~
coffeemug
We actually support hash sharding underneath -- each range shard is further
broken down into hash shards internally to support multicore scalability. This
isn't exposed to users currently, but it can be. Another option is to allow
the user to provide a hash function on the PK (I think this is what you're
suggesting).
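
Conceptually, hash sharding is just this (a toy Python sketch, not our
internals):

    import hashlib

    def shard_for(pk, n_shards):
        # hash the primary key so that sequential keys spread evenly
        # across shards instead of piling onto the range shard that
        # owns the newest keys
        digest = hashlib.md5(str(pk).encode()).digest()
        return int.from_bytes(digest[:8], 'big') % n_shards

    print([shard_for(k, 4) for k in range(8)])  # scattered, not monotonic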

We'll be addressing this at some point, we have to sort through the list of
feature requests first. It's a long list :)

~~~
harryh
I don't feel the need to provide the hash function, as long as it's something
reasonable. I understand that it's a long list! Just adding a vote to one
particular item. Good luck!

------
tjic
What the heck does "built with love" even mean?

Is this just a hipster marketing term to tell us that it's small and cute and
made by people who play ukuleles and ride unicycles in their spare time, and
not by evil corporate people who commute to work and have mortgages?

I find a lot of advertising eyeroll inducing, and the current trend of more-
hipster-than-thou posturing is right at the top.

~~~
gruseom
You could not have gotten these guys more wrong. They are serious
technologists who have been working day and night for years to build something
that they deeply believe in. Every hacker's heart should be warmed by the fact
that they kept at it.

When you have a vision of something great that ought to exist and set about
bringing it into the world, you are in an isolated position: other people
don't yet see what you see. This leads to a lot of doubt by others and by
yourself too. The longer it takes, the more exposed you are. To make it
through that you are going to need a deeper source of motivation – an
underground spring. Love is a fine word for this, and it makes me happy that
Slava put it in his title: it's a clue to this experience that rarely gets
mentioned, especially in the land of pivots and MVPs and weekend hacks.

~~~
daslee1969
cannot agree more. saw firsthand how hard this team kept at it!

------
shykes
I am very excited about this. The RethinkDB team is rock-solid and the market
is only going to get bigger.

I particularly like the perspective of an easy onramp to get started, knowing
that I will never have to leave because of scale or reliability.

Please, please give me a SQL adapter! My marketing team needs SQL. My business
app developers need SQL. Give them an adapter and I will get them to use
RethinkDB - knowing that 1) my data is safe and I'm not 6 months away from a
painful re-architecture and migration, and 2) as my developers hit the limits
of SQL they can gradually ( _gradually_!) peel the paint off and start using
your more powerful query language.

~~~
sandstrom
On the contrary, I hope you [RethinkDb] don't spend time on an SQL adapter.
I'd rather see time spent on improvements to the database itself.

------
haberman
Is schemaless a win over an object schema like a JSON schema (or a Protocol
Buffer .proto file)?

Schemaless is clearly a convenience win over SQL because SQL's way of modeling
nested/repeated data doesn't map as easily onto programming languages. But for
all the people who are using JSON-based databases these days, I'm curious how
many of them couldn't easily write a JSON schema or a .proto file that
describes their _de facto_ schema.

I ask because a lot of things become easier to reason about (and optimize) if
you know that a field won't be a string in one record and a number in another.
And writing a .proto file (or equivalent JSON schema) would give you an
authoritative place to document what all the fields actually mean.

I don't have any actual experience with JSON-based databases, so I was
interested to hear the opinions of people who do.

~~~
coffeemug
There is of course no fundamental reason why JSON-based DBs have to be
schemaless. This is one interesting direction that might be worth exploring.

~~~
m0th87
I would _love_ a system that is schema-less by design but has guards that can
be enforced at insert/update. That way, the underlying data structures don't
have to be locked up by complex migrations (as needed with ALTER TABLE), but
you still get type safety. A migration would instead simply involve a change
in guards and an asynchronous update of existing entries. Plus you'd get all
the wins of something resembling optional typing: you only enforce guards if
you want.
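
Something like this, conceptually (a hypothetical guard layer sketched in
Python, not an existing RethinkDB feature):

    # per-table type guards, enforced at insert/update time
    GUARDS = {'users': {'age': int, 'name': str}}

    def check(table, doc):
        for field, typ in GUARDS.get(table, {}).items():
            if field in doc and not isinstance(doc[field], typ):
                raise TypeError('%s.%s must be %s'
                                % (table, field, typ.__name__))

    check('users', {'name': 'Slava', 'age': 29})   # passes
    # check('users', {'age': '29'})                # raises TypeError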

~~~
jdoliner
This is a feature we've talked a lot about. Another idea we think is
interesting is having the database detect the schema, such that users could
see a readout that said: 100% of your documents have an integer field named
"foo"; would you like to make this a schema constraint?

------
m0th87
How do filters work? They seem pretty difficult implementation-wise, since you
can write them in any of the language bindings. My first guess is that you
pipe all the data in a table to the client, and the client itself does the
filtering. But this would be extraordinarily inefficient.

~~~
jdoliner
Piping all the data to the client would be extremely inefficient. Fortunately
we don't do that.

When a filter is written in the client language it gets compiled into a
protocol buffer, which is sent to the cluster. There it gets compiled into a
query which is sent to each of the relevant shards for the table. This query
has the filter baked right into it. The shards then go through their local
copy of the data and filter out the rows which do not meet the query
predicate. This data gets returned to the coordinating node and eventually to
the user. Thus only the data that will actually be returned is ever
transferred over the network.

Furthermore, this process is done lazily. On the client side, rather than
getting back a huge array with the results of your filter, you get back an
iterator. This iterator stores a buffer of data which is refilled as the
iterator advances.

~~~
coffeemug
To add to jdoliner's answer, the reason you can write
table('foo').filter(lambda x: x['bar'] > 5).run() is that we do some language
trickery on the client side to compile the query to an AST. In this case, we
overload the greater-than operator, call the lambda function once on the
client with a special object, and get back an AST. This AST is then sent to
the server and executed there.

It's rather difficult to integrate into a host language that smoothly from the
driver-implementation perspective, but once the driver is written, the user
experience is amazing, because you can write queries that look exactly like
Python but are executed entirely on the server.
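
A stripped-down version of the trick (hypothetical names, much simpler than
the real driver internals):

    class Expr:
        def __init__(self, node):
            self.node = node
        def __getitem__(self, attr):            # x['bar']
            return Expr(('getattr', self.node, attr))
        def __gt__(self, other):                # ... > 5
            return Expr(('gt', self.node, other))

    def compile_predicate(fn):
        # call the lambda once with a placeholder row; the overloaded
        # operators record an AST instead of computing a value
        return fn(Expr('row')).node

    print(compile_predicate(lambda x: x['bar'] > 5))
    # ('gt', ('getattr', 'row', 'bar'), 5)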

~~~
lucian1900
I find I prefer SQLAlchemy's Table.query.filter(Table.bar > 5) to a lambda
that gets compiled to an AST in an odd way.

~~~
coffeemug
You can do that too: r.table('foo').filter(r['bar'] > 5)

The use of r in filter is getting the attribute bar of the row.

~~~
lucian1900
That's pretty cool.

Have you found that there are useful expressions that are awkward to express
without the lambda trick?

~~~
coffeemug
Lambda is necessary because when you do nested subqueries, saying r['x'] is
ambiguous and can cause all sorts of unpleasantness. So if you use nested
queries, the server rejects the implicit syntax and requires the use of
lambda.
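
Roughly (illustrative pseudocode only, not exact driver syntax):

    # ambiguous -- which row does r['id'] refer to, users or admins?
    #   r.table('users').filter(
    #       r.table('admins').filter(r['id'] == ...) ...)
    #
    # unambiguous -- each nesting level names its own row:
    #   r.table('users').filter(lambda u:
    #       r.table('admins').filter(lambda a: a['id'] == u['id']) ...)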

Lambda syntax is really nice too, I actually prefer it for writing queries.

~~~
lucian1900
I find the lambda trick not explicit and obvious enough. I fear I would do
something stupid like trigger a side effect without realising.

Again taking an example from SQLAlchemy: you can explicitly make a subquery
and then reference it instead of the original Table. A binding more like
SQLAlchemy could probably be written for RethinkDB.

------
ch0wn
This looks really interesting. I'm curious to see how their license choice
works out. The server is AGPL-licensed while the drivers are under Apache 2.0.
This should at least avoid the issues we all know from libmysqlclient.

~~~
tbrock
Same licensing that 10gen uses for MongoDB (AGPL) and its drivers (Apache).

------
jedahan
Last I heard RethinkDB was a tail-append style engine for MySQL that was
optimized for SSDs. Interesting to see a drastic pivot like this. Looks good,
and good luck.

------
javajosh
What is the business model, if any? (This question is not addressed in the
FAQ, and I believe has at least some relevance to the longevity/shape of the
reDB community over time.)

Also, have you talked to the Meteor folks about swapping Mongo out for this?
Or would this be 'newness overload'?

------
bjhoops1
The querying capabilities here look amazing. Having to manually figure out how
to do joins and group-by in something like CouchDB is a real pain, but this
looks really slick. Very impressed, and I will be trying this out!

~~~
jdoliner
We spent a long time trying to reimplement other people's protocols with our
engine underneath. Being able to have features like this is one of the things
that eventually convinced us it would be worth it to control our own. Bear in
mind that our joins are also distributed, which we think is really cool.

------
pspeter3
Are there any performance tips or information? This looks really cool.

~~~
coffeemug
Performance analysis of complex systems is really tricky, and good benchmarks
are even trickier. We'll be working through posting numbers, docs, tips, etc.
over the next few months. Documenting this well is very hard work, so it'll
take a little bit of time.

In the meantime, you can always chat with us and we'll help you work through
any issues you might run into.

~~~
shawn-butler
Can I provide an intelligent sharding algorithm in place of a naive partition
by primary key? I always get into trouble encoding things into primary keys
eventually.

~~~
coffeemug
Not at the moment, unfortunately.

------
hcarvalhoalves
It seems very interesting, and having to deal with ORMs daily makes me
appreciate the clean API.

I feel being based on JSON is a big con, though. While it's popular, it was
never meant to be a rich serialization format, just a simple one. How do you
implement more complex fields like dates, and query them efficiently, on
RethinkDB?

~~~
shrughes
A benefit of JSON is that people know immediately what it is. It certainly has
its deficiencies (binary data being one that outweighs dates in terms of
immediate importance). Being limited to strict JSON is not a permanent
decision (I'm saying this as a member of the RethinkDB engineering team); it's
a conservative one in terms of API design, and in terms of limiting scope for
the first release.

------
krob
I'd really like to know why they don't have a PHP library, considering it
powers half the web. The point shouldn't be to promote your own preferred
language when building infrastructure products, but rather to support everyone
in using your tool. Mongo supports everything under the sun; so should this.

~~~
alexpopescu
... and it will.

We had to start somewhere, and we also wanted to get a feel for which
libraries are the most requested, so we can focus our energy in those
directions.

alex @ rethinkdb

------
d0m
I absolutely love the website. Congrats on the public launch. In the FAQ, I
would suggest a "How do you compare with Mongo?" entry. I've read the intro,
the FAQ, and a couple of the quick guides to find out what was different
(read: better). If I'm a happy Mongo user, why would I switch to RethinkDB?

~~~
alexpopescu
1. If you are a happy Mongo user, we'd still be happy to hear your feedback
about RethinkDB.

2. We're putting together some comparisons and hope to add them to the site
soon.

3. There are already a couple of answers to this question in this thread:

<https://news.ycombinator.com/item?id=4764137>

<https://news.ycombinator.com/item?id=4763939>

alex @ rethinkdb

------
banachtarski
Would love a comparison of this to Couchbase, which seems to have a similar
sharded, distributed setup.

Congrats on shipping!

------
Heff
Congrats on the launch guys!

------
nnash

      joe@alchemist~$ rethinkdb
      joe@clockwerk~$ rethinkdb -j alchemist:29015
    

Dota player?

~~~
coffeemug
Oh yes!

~~~
StavrosK
Wanna get a game going sometime?

I play DOTA 2, by the way, hopefully you do too.

~~~
coffeemug
We have at least three Dota players in the office. We should get this going!

------
joevandyk
Why would you use this over PostgreSQL, especially with pg's new json support?

~~~
tcwc
JSON support in Postgres is currently limited to a validated plain-text field;
it doesn't let you efficiently query inside the JSON object.

~~~
shawn-butler
While technically correct, I assume the parent was referring to hstore [0],
which has both GiST and GIN indexes, as well as btree and hash for equality.

With some simple formatting functions (it's just JSON after all), it's sixes
to me.

[0] <http://www.postgresql.org/docs/9.2/static/hstore.html>

~~~
dbuxton
hstore is great, and I won't hear a word against it, but it doesn't deal with
complex nested objects like JSON can (I would love to be wrong about this) --
it's just a key/value store with indexing of the objects inside.
------
arzvi
The query language and the Ruby-like function chaining are what I feel are the
selling factors. I like the ease with which I added a node to the cluster. But
naming the version Rashomon scares me..

~~~
lobster_johnson
A release named Rashomon should be the one where they introduce versioning
support, in my opinion.

------
dkhenry
I don't see much in the documentation about indexes. Also, this looks awesome;
I would love to see an option to let it be eventually consistent and still
keep the great querying ability.

~~~
coffeemug
We currently support primary-key indexing, but no secondary indexes. They are
definitely planned -- it will take a few months to get them out.

EDIT: also, you can run queries with an out_of_date_ok flag, which will give
you what you want. This only works for read queries, though; the architecture
is pretty much set up in a way where this would be very, very difficult to do
for write queries.

------
perfunctory
Please stop using json as a data model. I have no idea how to represent dates,
or timestamps, or colors, or any other unsupported data type.

~~~
crazygringo
I have nothing to do with RethinkDB, but what are you talking about? Just
represent them as strings. (What databases support colors as native types
anyway?) If you format dates as YYYY-MM-DD, you can do string comparisons for
ranges.

And JSON has the huge advantage of supporting hierarchical data -- arrays with
objects inside, etc. It seems like a huge step forward.
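
For example, ISO-formatted dates compare chronologically as plain strings:

    # lexicographic order matches date order for YYYY-MM-DD
    assert '2012-11-09' < '2012-11-12' < '2013-01-01'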

~~~
kemiller
But there's no standard way of representing dates as strings, or of indicating
that this field here is a date and not a string that happens to look like one,
so nothing can rely on what you do. You can represent literally anything with
a string, but you lose type information when you do.

~~~
rdtsc
There is a standard for representing dates as strings (see the sibling
comment), and your type information is the field name. So you know your
document has a type if you give it a d_type field, for example, and then based
on that you know your tstamp field is the date.

~~~
kemiller
I'm not saying I can't imagine how to work around it; I'm saying that we're
forced to, and will likely do so in a variety of competing ways, which
degrades its usefulness as an interchange format. It means your tools, at the
end points or in the middle, don't know what the data in that field is, and so
can't do anything useful for you that you haven't explicitly taught them to
do.

------
aterreno
From the homepage it already looks more usable than Mongo; I like the query
syntax. Looking forward to trying out the performance. Well done.

------
shin_lao
Congratulations! How do you plan on making money?

------
wildmXranat
If the authors are lurking here: the link from GitHub to the quick start page
is broken: <http://www.rethinkdb.com/docs/guides/basic_quickstart.html>

GH issue <https://github.com/rethinkdb/rethinkdb/issues/2>

edit: fixed already!

~~~
coffeemug
Thanks -- fixed.

------
eknkc
So, no secondary indexes right?

Will doing a query like "age > 25" perform something equivalent to a full
table scan?

~~~
coffeemug
Yes, there are no secondary indexes right now (they're coming, but it will
take a little time). Currently "age > 25" will do a scan, but if the db is
sharded it will touch only the relevant shards and the query will be
completely parallelized. We also do internal sharding for multicore
performance, and the query will be parallelized across cores as well.

All of that is only good for relatively small amounts of data though, so we'll
be adding secondary indexes soon.

~~~
spicyj
How does it know which shards are relevant? I thought that data was sharded by
primary key.

~~~
coffeemug
Err, sorry, my mistake -- pulled an all-nighter to push the product out :)
You're right, the system will have to scan every shard. (The statement above
regarding parallelization is correct, though.)

~~~
spicyj
No problem, huge congrats on shipping!

------
ww520
I like their DSL-style query API. It's fantastic!

Disclosure: Ok. I'm biased. :) I've designed a similar DSL-style query API in
another project. <https://github.com/williamw520/jsoda>

------
jpettersson
Wow, looks great! I'm excited to try it out.

The Github graphs are really interesting too, that's a lot of love/work right
there!

<https://github.com/rethinkdb/rethinkdb/graphs/impact>

~~~
dbro
I wonder what happens in the RethinkDB office on Monday evenings?
<https://github.com/rethinkdb/rethinkdb/graphs/punch-card>

------
gruseom
The most exciting news in quite a while! It will be interesting to hear what
people think as they try it out and differences begin to emerge. In the
meantime, congratulations on both the release and the open-sourcing.

------
wildmXranat
Is the package up on the Ubuntu PPA already? It seems that the installation
instructions use the PPA, but apt-get doesn't find the package.

edit: Indeed, my architecture (i386) doesn't match the only available amd64
binaries. Thanks.

~~~
coffeemug
The package is up. Which Ubuntu version are you using? We support 11.04 and
above. Anything less than that is missing some kernel features we use.

EDIT: the main thing missing from earlier Ubuntu versions is TCP_USER_TIMEOUT.
We can work around it in the server, but we haven't done it yet.

------
willytobler
This looks very interesting, and a nice interface to deal with. But... I can't
find anything about authentication. Whoever wants to can fiddle with this 8080
port.

Did you rethinkAuth, or am I just too stupid to RTFM?

~~~
coffeemug
No authentication support yet, unfortunately -- you'd have to use an ssh
tunnel for the web admin. This is one of the things on the todo list.

------
l0stman
I was just wondering if you planned on open-sourcing the code from the very
beginning, or if the idea came much later. Anyway, congrats on launching.

~~~
coffeemug
We wanted to open-source all along, but there were concerns raised by
investors about IP, etc. It took us a bit of time to work through all the
issues people raised, which is why it took so long.

~~~
whyleyc
How do you plan on monetizing the business? Are you looking at some kind of
consultancy play, à la Red Hat / MySQL (pre-Oracle)?

~~~
coffeemug
We want everyone who has a reason to use the product to be able to get it and
use it for free (as in beer and speech). As companies grow and operations
become more important, we'll be there to provide paid support. It's actually a
great business model because you don't have to hire a really expensive
enterprise sales team (you still need a sales team, but it mostly has to
handle inbound requests).

We're also looking into launching services, which is a great revenue stream
for people who prefer to pay for convenience of not having to deal with
operations at all.

------
dested
This certainly seems interesting enough to look into!

One note: there's a typo in the code in the tutorial at the top:

r.table('users).insert({'name': 'Slava', 'age': 29 }).run()

users needs a closing quote.

~~~
coffeemug
Thanks -- fixed.

------
goranp
I am a bit late to the game but how does this compare to HyperDex in terms of
scalability and sharding?

------
sergius
No C++ API???

------
juzfoo
Congrats guys! Looks great from the specs. Any thoughts on architecting for
multitenancy?

------
tehansen
am I the only one that noticed the lambda at the bottom right of the webpage?
:)

------
Meai
Could you guys offer a C driver? (preferably not c++)

~~~
lambda
Taking a look at their API, a C++ driver would actually fit much better with
their style, which relies on operator overloading to make queries involving
arithmetic easy to write in the host language.

A C driver would definitely be possible, but would be a little bit clunky.

------
hagope
Great work, can't wait to try it out! :-)

------
DaemonXI
Will there ever be a 32-bit version?

~~~
coffeemug
It should be easy to do a port, but it's one of the many things to do. I can't
commit to a timeline, but it will probably happen.

------
mcartyem
Are there plans for an Arc driver?

~~~
jdoliner
PG said he'd post facto reject us from YC if we don't.

In all seriousness, while we'd eventually like to have support for every
language, Arc is farther down our list than others such as PHP and Java. We
will get to it eventually, though.

If Arc could be made to use protocol buffers, it wouldn't be too hard for a
contributor to write the driver themselves.

------
sidcool
Catchy title, I must say.

