
Yet another Dynamo/DHT. Now I've seen implementations in Java (Dynamo, Cassandra, Voldemort), Perl (Blekko) and now Erlang. I wonder if any of these have been used on a >1000 node cluster? I haven't seen any mention of checksums (besides those in TCP) to ensure that a switch/router doesn't corrupt data (including cluster state), as Amazon learned the hard way.
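
For anyone wondering what the end-to-end defense looks like: TCP's 16-bit checksum is weak enough that a misbehaving switch can flip bits undetected, so the usual fix is an application-level checksum verified at the receiver before the data is applied. A minimal Java sketch (CRC32 chosen for illustration; the payload and names are hypothetical):

    import java.nio.charset.StandardCharsets;
    import java.util.zip.CRC32;

    // Minimal sketch: attach an application-level CRC32 to each payload so
    // corruption introduced in transit (which TCP's 16-bit checksum can
    // miss) is caught on the receiving node before the update is applied.
    public class ChecksummedPayload {
        static long checksum(byte[] data) {
            CRC32 crc = new CRC32();
            crc.update(data);
            return crc.getValue();
        }

        public static void main(String[] args) {
            // Hypothetical cluster-state payload
            byte[] payload = "node-state:ring-token=42".getBytes(StandardCharsets.UTF_8);
            long sent = checksum(payload);

            // ... payload crosses the network; a flaky switch may flip
            // bits that TCP's checksum fails to catch ...

            // Receiver recomputes and compares before trusting the data.
            if (checksum(payload) != sent) {
                throw new IllegalStateException("payload corrupted in transit, discarding");
            }
        }
    }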

Of course, they don't support range scan efficiently either. A proper Bigtable implementation is much harder, I guess :)




> I wonder if any of these has been used on >1000 node cluster?

1000 nodes is a _lot_. Facebook's inbox search Cassandra cluster was at 150 nodes six months ago; probably around 200 now given their growth trend -- so unless you have more data than FB you probably don't need to worry too much. :)

IIRC the largest OSS bigtable-clone cluster is Baidu's, at around 120 nodes (maybe 150 now?). So same ballpark, really.

> they don't support range scan efficiently either

Cassandra does.

> A proper Bigtable implementation is much harder

It's actually much easier to get something that works with the bigtable single-master model when all your hardware is behaving; it's just that having those single points of failure is a bitch (as even google, with their vast engineering resources and 3 year head start, is still working out -- see the appengine outage a couple months ago: http://groups.google.com/group/google-appengine/msg/ba95ded9...). Once you put the engineering in, the fully distributed model is much better.

(For those following along: I'm a Cassandra developer; vicaya is a developer of hypertable, a bigtable clone.)


1000 nodes is nothing if you crawl the entire web and keep a history. I'm sure someone already has a modified version of an OSS bigtable cluster at 1PB+ and close to 1k nodes :)

>> they don't support range scan efficiently either

> Cassandra does.

Only if you call order-preserving hashing efficient, when all the keys can easily hash into one bucket no matter what hash function you choose :) Cassandra's range query is an occasionally useful hack, but it's not comparable to Bigtable-like implementations in terms of efficiency, scalability and robustness.
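
To make the skew point concrete: with an order-preserving partitioner the key itself determines placement, so range scans work but sequential keys all pile onto one node's range. A rough Java sketch (the node names and ranges are made up for illustration):

    import java.util.TreeMap;

    // Illustrative sketch of the skew problem with order-preserving
    // partitioning: tokens keep key order (enabling range scans), but
    // clustered keys all land on the node owning that token range.
    public class OrderPreservingSkew {
        public static void main(String[] args) {
            // Three hypothetical nodes owning contiguous lexicographic ranges.
            TreeMap<String, String> ring = new TreeMap<>();
            ring.put("a", "node1");  // owns ["a", "j")
            ring.put("j", "node2");  // owns ["j", "s")
            ring.put("s", "node3");  // owns ["s", ...)

            String[] keys = {"log:001", "log:002", "log:003", "log:004"};
            for (String key : keys) {
                // Order-preserving: the key itself is the token, so
                // sequential keys map into a single node's range.
                System.out.println(key + " -> " + ring.floorEntry(key).getValue());
            }
            // All four keys print "node2": range scans are cheap,
            // but load balancing suffers.
        }
    }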

Bigtable's single master with standbys is never a problem in practice. The AppEngine outage was due to a bug in the GFS master protocol decoding. If that bug were in every node, all the nodes would have crashed, instead of one node crashing with some service (reads and some writes) still available. The so-called "fully" distributed model is only better if you assume that you have no bugs in your code and that node failures are random. I argue that separating different responsibilities/functionalities into different components is good for fault isolation -- a sound software engineering practice. A naive fully distributed model is much more brittle in practice due to code bugs.

BTW, although Hypertable is inspired by Bigtable in design, the implementation and feature set (support for a dozen languages via Thrift, and full read isolation via MVCC) are different enough that I wouldn't call it a clone :)
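
For the curious, here is roughly what MVCC read isolation buys you: writers append new versions under increasing timestamps, and a reader pins a snapshot timestamp at the start and only ever sees versions at or below it, so in-flight writes never disturb an ongoing scan. A toy Java sketch of the idea -- not Hypertable's actual implementation:

    import java.util.Map;
    import java.util.concurrent.ConcurrentSkipListMap;

    // Toy MVCC cell: each write is a new version keyed by timestamp;
    // reads resolve against a fixed snapshot, isolating them from
    // concurrent writers. Illustrative only.
    public class MvccCell {
        // version timestamp -> value
        private final ConcurrentSkipListMap<Long, String> versions =
                new ConcurrentSkipListMap<>();

        void write(long ts, String value) {
            versions.put(ts, value);
        }

        // Read as of a snapshot: newest version with timestamp <= snapshotTs.
        String readAt(long snapshotTs) {
            Map.Entry<Long, String> e = versions.floorEntry(snapshotTs);
            return e == null ? null : e.getValue();
        }

        public static void main(String[] args) {
            MvccCell cell = new MvccCell();
            cell.write(10, "v1");
            long snapshot = 10;       // reader pins its snapshot timestamp
            cell.write(20, "v2");     // concurrent write lands after the snapshot
            System.out.println(cell.readAt(snapshot)); // prints "v1"
        }
    }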



