

Replica Strategy in Hdfs Is Not Good Enough - garfee
http://notcode.github.io/blog/2013/09/07/replica-strategy-in-hdfs-is-not-good/

======
brugidou
Comparing it to MongoDB is a joke.

However, some more advanced strategies should be applied for very large HDFS
clusters. The rack-aware strategy is actually better than what is described,
because the probability distribution is not perfectly uniform. It all depends
on the hardware, the location, etc. But with a very large number of blocks,
the probability of losing data when 3 nodes fail at once is unfortunately
close to 1.
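
A rough way to check that claim, as a minimal sketch: assume every block puts its 3 replicas on a uniformly random set of 3 nodes. That ignores rack awareness, so it is only an approximation of what HDFS really does. The function name and the cluster sizes in the usage note are made up for illustration.

```python
from math import comb, expm1, log1p


def p_loss_random(nodes: int, blocks: int, replicas: int = 3) -> float:
    """Chance that `replicas` simultaneous node failures wipe out every
    copy of at least one block, ASSUMING uniformly random placement."""
    if nodes < replicas:
        raise ValueError("need at least as many nodes as replicas")
    replica_sets = comb(nodes, replicas)  # possible replica sets
    if replica_sets == 1:
        # Only one possible set: any block at all is lost.
        return 1.0 if blocks > 0 else 0.0
    # P = 1 - (1 - 1/replica_sets) ** blocks, computed in a stable way
    return -expm1(blocks * log1p(-1.0 / replica_sets))
```

Under that assumption, a hypothetical 1000-node cluster is fairly safe at a million blocks but almost certain to lose something at a billion blocks, e.g. compare `p_loss_random(1000, 10**6)` with `p_loss_random(1000, 10**9)`.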

We could try to imagine a better strategy that keeps replicas within cliques
of nodes to mitigate the risk. It's a tradeoff between losing more data with
low probability and losing less data with high probability, I guess? Haven't
done the math :)
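
A back-of-the-envelope sketch of that math, under assumptions that are mine and not HDFS behavior: nodes are split into disjoint cliques of equal size, each block keeps all its replicas inside one clique, there are enough blocks that every possible replica set is in use, and blocks are spread evenly over those sets. `clique_tradeoff` is a hypothetical helper name.

```python
from math import comb


def clique_tradeoff(nodes: int, group_size: int, replicas: int = 3):
    """Return (p_loss, fraction_lost) when `replicas` random nodes fail.

    p_loss: chance the failed nodes form a replica set that is in use.
    fraction_lost: share of all blocks lost when that happens.
    ASSUMES disjoint equal cliques and evenly spread blocks.
    """
    if group_size < replicas or nodes % group_size != 0:
        raise ValueError("group_size must be >= replicas and divide nodes")
    groups = nodes // group_size
    # Replica sets that can actually hold a block
    replica_sets = groups * comb(group_size, replicas)
    p_loss = replica_sets / comb(nodes, replicas)
    fraction_lost = 1.0 / replica_sets
    return p_loss, fraction_lost
```

In this model `p_loss * fraction_lost` is always `1 / comb(nodes, replicas)`, so the expected loss does not depend on the clique size. Small cliques make a loss rare but large; one big clique (`group_size == nodes`, i.e. fully random placement) makes it near certain but small.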

