Only thing I am concerned here is that since the data is accessed over HTTP Stack, won't it become the performance bottleneck.
Re: HTTP Stack: "Performance" is such a slippery word here. If what an application is most sensitive to is the time it takes to fulfill one request, then yes, the HTTP stack may be an important component to keep track of, as it plays a role in defining the lower bound of request fulfillment time.
However, since all HTTP requests are handled in parallel, and since any node can handle any request, applications whose main sensitivity is not latency, but is instead throughput or availability, are less likely to find the HTTP stack to be a performance bottleneck.
There is also the option of doing away with the HTTP stack completely, and speaking Erlang straight to Riak.
the slides are embedded through the video to help with clarity.
Edit: Yes, I have been a bit lazy not to go and try Riak first hand. I think I will do it sooner then later but as of now, any light on the subject will be appreciated.
Technically, I found CouchDB more interesting as it's more "different" from other approaches.
anyway, mongodb and couchdb are more closely related to each other than they are to riak. a closer comparison to riak would be cassandra (also based off dynamo).
riak is based off the amazon dynamo paper and it is a pretty faithful adaptation at that. circular hashing, node chit chat protocol, eventual consistency. CAP theorem. uses n,w,r paradigm where n=number of replicas, w=number of replicas that must be available in cluster to write and r=number of replicas that must be available in cluster to read. so in this mechanism the balance of CAP is up to you, the programmer. disconnected writes written to two separate nodes due to a network flap are both returned to the client for conflict resolution. interestingly riak is written in erlang and uses rest heavily. because they do they were able to demo access libraries in virtually every language.
regarding mongo v couch:
mongo (written in c++) has sharding/replication and rebalancing (via addition to cluster) in production now. rebalancing via removal is on the drawing board. mongo also supports dynamic querying. mongo also supports bson (binary json). i also believe there is a 4MB max row limitation. multiple daemons for different functions. limited creature comforts like web gui, efforts underway to rectify.
couch is more complete in its restful implementation and general overall approach. i feel it is an easier setup and administration. one daemon pretty much handles everything i believe. couch is a never-delete database in that it always appends writes and internally versions them. this allows for very interesting replication topologies like master-master-master due to their unique versioning scheme. this will also lead to and "offline" couch version that you can run on your desktop. couch also has "fouton" which is their admin web gui.
the teams behind each of them are committed and talented.
That said, Bryan gave a fantastic presentation and really knows his stuff.
Of course, they don't support range scan efficiently either. A proper Bigtable implementation is much harder, I guess :)
1000 nodes is a _lot_. Facebook's inbox search Cassandra cluster was at 150 nodes 6m ago; probably around 200 now given their growth trend -- so unless you have more data than FB you probably don't need to worry too much. :)
IIRC the largest OSS bigtable-clone cluster is Baidu's at around 120. (maybe 150 now?) So same ballpark really.
> they don't support range scan efficiently either
> A proper Bigtable implementation is much harder
It's actually much easier to get something that works when all your hardware goes well with the bigtable single-master model; it's just that having those single points of failure is a bitch (as even google with their vast engineering resources and 3 year head start is still working out, as in the appengine outage a couple months ago -- http://groups.google.com/group/google-appengine/msg/ba95ded9...). Once you put the engineering in, the fully distributed model is much better.
(For those following along: I'm a Cassandra developer; vicaya is a developer of hypertable, a bigtable clone.)
>> they don't support range scan efficiently either
> Cassandra does.
Only if you call order preserving hashing efficient when all the keys can be easily hashed in to one bucket, no matter what hash function you choose :) Cassandra's range query is an occasionally useful hack, which is not comparable to Bigtable like implementations, in terms of efficiency, scalability and robustness.
The Bigtable's single-master with standbys is never a problem in practice. The AppEngine outage was a due to a bug in GFS master protocol decoding. If this bug is in every node, all nodes would be crashing instead of one with some service (reads and some writes) still available. The so called "fully" distributed model is only better when you assume that you don't have bugs in your code and that node failure is random. I argue that separating different responsibilities/functionalities into different components is good for fault isolation, a sound software engineering practice. Naive fully distributed model is much more brittle in practice due to code bugs.
BTW, although Hypertable is inspired by Bigtable in design, the implementation and features set (support a dozen languages via Thrift and full read isolation via MVCC) are different enough that I wouldn't call it a clone :)
In any case, a few of differences I found:
1. Configuration of N/R/W parameters. Dynomite appears to set these properties once, at startup time, for all values. Riak configures N per-bucket ("bucket" being like a 1-level directory structure), and R/W per-request. Specifying N/R/W so "late" means that applications can put different CAP requirements on different sets of data and tune those requirements at request time.
2. Dynomite's main external interface seems to be thrift-based, with also some RPC exposure. Riak's main external interface is JSON-based, over ReSTful HTTP.
3. Hinted handoff is a target for Dynomite's 1.0 milestone (according to http://wiki.github.com/cliffmoon/dynomite/roadmap). Riak already fully supports hinted handoff.