Backblaze of course, but we aren’t talking about the probability of seeing a failure; we’re talking about one of your disks failing and, more importantly, data loss. A binomial probability distribution is a simplified way to see the scenario.
Let’s pretend all disks have a failure rate of 2% in year one.
If you have 2 disks, each disk has a 2% probability of failing: the first disk in the array is 2%, and the second is 2%. If both disks fail in Z1, you lose data. That isn’t a 1% (half) chance, because the failure of one disk does not affect the other, but the risk is less than 2%.
So we use a binomial probability distribution to be more precise: a probability of .02 in year one, with 2 trials and 2 failures, gives a cumulative probability of .0004 for data loss.
If you have 6 disks, each disk also has a 2% probability of failing: the first disk in the array is 2%, the second is 2%, and so on. With a 6-disk Z2, three must fail to lose data, reducing your risk further (not to .08%, but lower than Z1).
So with a binomial probability distribution, this would be a probability of .02 with 6 trials and 3 or more failures, giving a cumulative probability of about .00015 for data loss.
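For anyone who wants to check the arithmetic, here is a rough Python sketch of the two calculations. It assumes the same simplified model as above: independent failures, a flat 2% rate in year one, and no disk replacement or rebuild during the year. The function name `prob_at_least` is just something I made up for illustration.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): at least k of n disks fail."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p = 0.02  # assumed year-one failure rate per disk

# 2-disk array: data is lost only if both disks fail.
two_disk_loss = prob_at_least(2, 2, p)

# 6-disk Z2: data is lost if 3 or more disks fail.
six_disk_z2_loss = prob_at_least(3, 6, p)

print(f"2 disks:   {two_disk_loss:.6f}")     # 0.000400
print(f"6-disk Z2: {six_disk_z2_loss:.6f}")  # 0.000153
```

Real arrays get a failed disk replaced and resilvered, so the actual risk depends on the rebuild window rather than a whole year; this only reproduces the simplified numbers.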
That’s a significantly smaller risk. The other interesting part is that the probability of any given disk failing in a 6-disk array compared with a 2-disk array is not 3x, but no different at all, because each disk’s 2% failure rate is independent. And this doesn’t even take into account that large disks have a greater failure rate to start.
I’m not saying mirroring two larger disks is a bad idea, just that there are tradeoffs and the risk is much greater.
Please update the title to the full title of the article. This is super misleading and makes it sound like the founder doesn’t want this.