* A far more advanced query language -- distributed joins, subqueries, etc. -- almost anything you can do in SQL you can do in RethinkDB
* MVCC -- which means you can run analytics on your realtime system without locking up
* All queries are fully parallelized -- the compiler takes the query, breaks it up, distributes it, runs it in parallel, and gives you the results
But beyond that, details matter. Database systems differ on what they make easy, not what they make possible. We spent an enormous amount of time on building the low-level architecture and working on a seamless user experience. If you play with the product, I think you'll see these differences right away.
Note: Rethink is a new product, so it'll inevitably have quirks. We'll fix all the bugs as quickly as we can, but it'll take a few months to iron out the issues that didn't come up in testing.
Also, I am excited to try this out. I always enjoyed your writings and I am sure you + team have made something awesome.
Does it mean that every query touches all servers? Or does it send queries to only a subset of servers when possible? (e.g. range queries on PK)
> Or does it send queries to only a subset of servers when possible? (e.g. range queries on PK)
The query planner distributes the query between the nodes that actually contain the relevant data. Here are a few examples:
* In your example, a range get on the primary key, the query would touch one copy of each shard of the table.
* A more interesting example is a map reduce query. That query will also touch only one copy of each shard of the table, but the mapping and reduction phases will also happen on those shards, which makes the whole process a lot faster.
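To illustrate the idea of mapping and reducing on the shards themselves, here is a minimal Python sketch. It is not RethinkDB code: the in-memory `SHARDS` lists, `run_on_shard`, and `map_reduce` are hypothetical names, threads stand in for separate nodes, and the reduce function is assumed to be associative so that partial results can be combined in any grouping.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical in-memory "shards"; in a real cluster each would live on its own node.
SHARDS = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

def run_on_shard(rows, map_fn, reduce_fn):
    """Map and reduce locally, so only one partial result leaves the shard."""
    return reduce(reduce_fn, map(map_fn, rows))

def map_reduce(shards, map_fn, reduce_fn):
    """Run the map and reduce phases per shard in parallel, then combine the partials.

    Assumes reduce_fn is associative and at least one shard is non-empty.
    """
    non_empty = [rows for rows in shards if rows]
    with ThreadPoolExecutor() as pool:
        partials = list(
            pool.map(lambda rows: run_on_shard(rows, map_fn, reduce_fn), non_empty)
        )
    # Final reduction over one value per shard, done by the coordinating node.
    return reduce(reduce_fn, partials)

# Sum of squares across all shards.
total = map_reduce(SHARDS, lambda x: x * x, lambda a, b: a + b)
print(total)  # 285
```

The point of the sketch is that each shard ships back a single partial value rather than all of its rows, which is where the speedup described above would come from.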
But shouldn't it be fewer than "each shard"?
Let's say the range is 3 < PK < 7. If all PKs in that range live in only 2 shards (out of a total of, say, 10 shards), then the query should only be run on those 2 shards, no? Or will all 10 shards still be touched by the query?
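To make the question concrete, here is a minimal Python sketch of that kind of shard pruning. The split points and the `shards_for_range` helper are hypothetical, integer keys are assumed, and nothing here is a claim about how RethinkDB actually routes range queries.

```python
from bisect import bisect_right

# Hypothetical split points for 10 range-partitioned shards:
# shard 0 holds keys < 5, shard 1 holds 5 <= key < 10, ..., shard 9 holds keys >= 45.
SPLIT_POINTS = [5, 10, 15, 20, 25, 30, 35, 40, 45]

def shards_for_range(low, high, split_points=SPLIT_POINTS):
    """Return the shard indices that can hold integer keys k with low < k < high."""
    if high - low < 2:
        return []  # no integer lies strictly between low and high
    first = bisect_right(split_points, low + 1)   # shard of the smallest candidate key
    last = bisect_right(split_points, high - 1)   # shard of the largest candidate key
    return list(range(first, last + 1))

print(shards_for_range(3, 7))  # [0, 1]
```

With these made-up boundaries, keys 4 through 6 fall in shards 0 and 1, so a pruning planner would contact 2 of the 10 shards and skip the other 8.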
While it's true that on a single node MongoDB map reduce is single threaded, it is parallelized when running on a sharded cluster.