As a pretty good rule of thumb, a system that fails 1/nth of the time and has n opportunities to fail has a ~0.63 probability of failing at least once, as long as n is more than ~10.
If a system has probability 1/n to fail, then it has probability 1 - 1/n to not fail. The probability it will not fail in any of n independent trials is (1 - 1/n) ^ n. The limit of this quantity as n->+inf is 1/e.
If you want to know the probability it will fail, just take 1 - probability_success = 1 - 1/e.
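If you'd rather check the numbers than trust the limit, here is a minimal Python sketch. It assumes the n trials are independent, and the names `p_fail_at_least_once` and `limit` are just made up for illustration.

```python
import math


def p_fail_at_least_once(n: int) -> float:
    """Chance of at least one failure in n independent trials,
    where each trial fails with probability 1/n (assumed independent)."""
    return 1.0 - (1.0 - 1.0 / n) ** n


# The value the expression approaches as n grows: 1 - 1/e.
limit = 1.0 - 1.0 / math.e

for n in (2, 10, 100, 1000):
    print(f"n={n:5d}  P(fail)={p_fail_at_least_once(n):.4f}  limit={limit:.4f}")
```

The probability comes down toward the limit from above, so small n gives a somewhat higher failure chance than 0.63.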
I'd say, hey, how do you calculate (1-h)^k? First take the natural log: ln((1-h)^k) = k ln(1-h) ≈ -kh. And then exponentiate back up: (1-h)^k ≈ e^(-kh). (For small values of h, ln(1-h) ≈ -h by linear approximation.) (Edit: Wiped out looong comment.)
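To see how good the linear approximation is, a rough sketch comparing (1-h)^k with e^(-kh) for a few values. The helper names `exact` and `approx` and the sample (h, k) pairs are my own picks, not anything from the comment above.

```python
import math


def exact(h: float, k: int) -> float:
    """(1 - h)^k computed directly."""
    return (1.0 - h) ** k


def approx(h: float, k: int) -> float:
    """e^(-k*h), which relies on ln(1 - h) being close to -h for small h."""
    return math.exp(-k * h)


for h, k in ((0.5, 2), (0.1, 10), (0.01, 100), (0.001, 1000)):
    e, a = exact(h, k), approx(h, k)
    print(f"h={h:<6} k={k:<5} exact={e:.5f} approx={a:.5f} error={a - e:.5f}")
```

Since ln(1-h) is always a bit below -h, the approximation overestimates slightly, and the gap shrinks as h gets smaller.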
It's always amusing when someone asks for a layman/non-math/intuitive reason why something works out and HN responds with a 3-paragraph long proof that seems to always require university-level math. And it seems those comments almost invariably start with "Oh, you just..."
Ultimately, it's hard to give a math-free explanation for something that comes out straight from math. If you break down an explanation into small enough steps, they should be comprehensible for anyone even if they have to take some steps on faith.
cperciva 3548 days ago
That is my startup idea. I don't want to take this
thread even more off-topic (if that's even possible),
but please feel free to contact me at the address in
that first post to explain why you think it is a bad
idea.
dhouston 3548 days ago
we're in a similar space -- http://www.getdropbox.com
(and part of the yc summer 07 program) basically,
sync and backup done right (but for windows and os
x). i had the same frustrations as you with existing
solutions. let me know if it's something you're
interested in, or if you want to chat about it
sometime.
drew (at getdropbox.com)
10 machines each with a 10% chance of failure are roughly equivalent to 100 machines each with a 1% chance of failure.
I think it's confusingly worded: as n increases, the reliability of each node has to increase correspondingly to get the convergence. I'm not sure what real system this reflects, but I suppose it's an indication of the point at which the problems of scale will bite (if you know your rough failure rate).
Graph: http://www.meta-calculator.com/online/?panel-102-graph&data-...
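For the "if you know your rough failure rate" case, here is a hedged sketch that holds the per-node failure rate p fixed and asks how many nodes it takes before a failure somewhere becomes likely. It assumes nodes fail independently with the same rate. `p_any_failure` and `nodes_until` are hypothetical helper names.

```python
import math


def p_any_failure(p: float, n: int) -> float:
    """Chance that at least one of n independent nodes fails,
    when each fails with probability p."""
    return 1.0 - (1.0 - p) ** n


def nodes_until(p: float, target: float) -> int:
    """Smallest n for which P(at least one failure) reaches target.
    Solves 1 - (1 - p)^n >= target for n."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


# The 10-machines-vs-100-machines comparison from the thread.
print(f"10 nodes at 10%: {p_any_failure(0.10, 10):.4f}")
print(f"100 nodes at 1%: {p_any_failure(0.01, 100):.4f}")

# With a fixed 1% per-node rate, where does a failure become a coin flip?
print(f"nodes for 50% at p=1%: {nodes_until(0.01, 0.5)}")
```

With p fixed, the failure probability keeps climbing toward 1 as n grows; the ~0.63 figure only shows up around n = 1/p.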