
Hey, this is Slava, founder of RethinkDB. There are some obvious high-level differences:

* A far more advanced query language -- distributed joins, subqueries, etc. -- almost anything you can do in SQL you can do in RethinkDB

* MVCC -- which means you can run analytics on your realtime system without locking up

* All queries are fully parallelized -- the compiler takes the query, breaks it up, distributes it, runs it in parallel, and gives you the results

But beyond that, details matter. Database systems differ on what they make easy, not what they make possible. We spent an enormous amount of time building the low-level architecture and working on a seamless user experience. If you play with the product, I think you'll see these differences right away.

Note: RethinkDB is a new product, so it'll inevitably have quirks. We'll fix bugs as quickly as we can, but it'll take a few months to iron out the things that didn't come up in testing.




What do you see as the potential areas where RethinkDB will shine?

Also, I am excited to try this out. I always enjoyed your writings and I am sure you + team have made something awesome.


Joe Doliner - Engineer at RethinkDB here. RethinkDB is designed for small teams with big data challenges. When you're just starting a new project, ideally you want to boot your database up and start throwing data at it without worrying about schema. However, with other products on the market, most notably Mongo, there are a lot of features that stop working when you get to a large scale. We've been very careful in developing RethinkDB to make sure that small teams who use our product aren't going to need to rewrite code once their dataset starts growing. As coffeemug mentions above, we support fully parallelized queries. This means that when your dataset grows, you can add more servers to speed up analytic queries. We feel this is a valuable feature for small teams.


Thanks Joe.


Hah, that question is almost word for word in the FAQ: http://www.rethinkdb.com/docs/faq/


Yep, you got me. I hadn't read the FAQ. Thanks for that.


> All queries are fully parallelized

Does it mean that every query touches all servers? Or does it send queries to only a subset of servers when possible (e.g. range queries on PK)?


Joe Doliner - RethinkDB engineer here.

> Does it mean that every query touches all servers?

No.

> Or does it send queries to only a subset of servers when possible (e.g. range queries on PK)?

The query planner distributes the query between the nodes that actually contain the relevant data. Here are a few examples:

In your example, a range get on the primary key, the query would touch one copy of each shard of the table.

A more interesting example is a map reduce query. That query will also only touch one copy of each shard of the table, but the mapping and reduction phases will also happen on those shards, which makes the whole process a lot faster.
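
To make the map reduce case concrete, here is a toy sketch in Python. It is not RethinkDB's planner or API: the function names, the in-memory lists standing in for shards, and the thread pool standing in for separate servers are all assumptions for illustration. Each shard maps and reduces its own rows, and only the small partial results are combined at the end. This only works when the reduction function is associative.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_reduce_on_shard(rows, map_fn, reduce_fn):
    """Run the map and reduce phases locally on one (non-empty) shard."""
    return reduce(reduce_fn, (map_fn(row) for row in rows))


def distributed_map_reduce(shards, map_fn, reduce_fn):
    """Hypothetical coordinator: one task per shard, then a final combine.

    `shards` is a list of row lists; threads stand in for separate servers.
    `reduce_fn` is assumed to be associative.
    """
    non_empty = [rows for rows in shards if rows]
    if not non_empty:
        return None
    with ThreadPoolExecutor() as pool:
        partials = list(
            pool.map(lambda rows: map_reduce_on_shard(rows, map_fn, reduce_fn),
                     non_empty)
        )
    # Only one partial result per shard crosses the "network" here.
    return reduce(reduce_fn, partials)


# Toy data: three shards, one of them empty.
example_shards = [[{"n": 1}, {"n": 2}], [{"n": 3}], []]
total = distributed_map_reduce(example_shards,
                               lambda row: row["n"],
                               lambda a, b: a + b)
```

The point of the sketch is the data movement: the raw rows never leave their shard, only the reduced values do.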


> In your example, a range get on the primary key, the query would touch one copy of each shard of the table.

But shouldn't it be fewer than "each shard"?

Let's say the range is 3 < PK < 7. If all PKs in that range live in only 2 shards (out of a total of, say, 10 shards), then the query should only be run on those 2 shards, no? Or will all 10 shards still be touched by the query?


Correct, the query will only touch two shards in this case.
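
For anyone curious how that kind of pruning can work, here is a minimal sketch in Python. It assumes the table is range-sharded on the primary key with sorted split points; the function name and the split points are made up for the example and say nothing about how RethinkDB actually stores its shard boundaries.

```python
from bisect import bisect_left, bisect_right


def shards_for_range(split_points, lo, hi):
    """Return the indices of shards that can hold a key k with lo < k < hi.

    Assumed layout (hypothetical): `split_points` is sorted, and shard i
    holds keys in [split_points[i - 1], split_points[i]), with the first
    and last shards open-ended. The answer may be conservative for
    discrete keys (it can include a shard that holds no matching key).
    """
    if lo >= hi:
        return []
    first = bisect_right(split_points, lo)  # shard of keys just above lo
    last = bisect_left(split_points, hi)    # shard of keys just below hi
    return list(range(first, last + 1))


# 9 split points give 10 shards; the range 3 < PK < 7 straddles the split at 5.
splits = [5, 10, 15, 20, 25, 30, 35, 40, 45]
touched = shards_for_range(splits, 3, 7)
```

With these split points the query is routed to shards 0 and 1 only, and the other eight shards are never contacted.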


This is exactly how normal, map reduce, and aggregation queries work in a sharded MongoDB cluster.

While it's true that on a single node MongoDB map reduce is single threaded, it is parallelized when running on a sharded cluster.



