
Backblaze Durability Is Eleven 9s – And Why It Doesn’t Matter - ingve
https://www.backblaze.com/blog/cloud-storage-durability/
======
londons_explore
This analysis is simplistic.

Correlated failures are common in drives. That could be a power surge taking
out a whole rack, a firmware bug in the drives making them stop working in the
year 2038, an errant software engineer reformatting the wrong thing, etc.

When calculating your chance of failure, you _have_ to include that, or your
result is bogus.

E.g. drive model A has a failure rate of 1% per year, _but_ the failure
symptom is that the drive won't spin up from cold; if it's already spinning,
it keeps working as normal.

3 years later, the datacenter goes down due to a grid power outage and a
dispute with diesel suppliers so the generators go down. It's a controlled
shutdown, so you believe no data is lost.

2 days later when grid power is back on, you boot everything back up, only to
find out that 3% of drives have failed.

Not a problem. Our 17 out of 20 redundancy can recover up to 15% failure!

However, each customers data is split into files around 8MB, which are in turn
split into the 20 redundancy chunks. Each customer stores say 1TB with you.
That means each customer has ~100k files.

The chance that a given file is left with only 16 good shards is about
0.97^16 * 0.03^4 * C(20,4) ≈ 0.24%, call it 0.3% once you count the cases
with even fewer surviving shards.

Yet your customer has 100k files! The chance they can recover all their data
is only (1 - 0.003)^100000, which is effectively zero... every customer
suffers data loss :-(
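The hypothetical above is easy to sanity-check with the binomial distribution; a quick sketch (all numbers are the commenter's assumptions, not Backblaze's):

```python
from math import comb

p_fail = 0.03           # hypothetical fraction of drives dead after the cold restart
shards, parity = 20, 3  # 17 data + 3 parity: a file survives up to 3 lost shards

# P(exactly j of a file's 20 shards sat on now-dead drives)
def p_shards_lost(j):
    return comb(shards, j) * p_fail**j * (1 - p_fail)**(shards - j)

# A file is unrecoverable once more than `parity` shards are gone
p_file_lost = sum(p_shards_lost(j) for j in range(parity + 1, shards + 1))

files = 100_000         # ~1 TB at ~10 MB per file
p_any_loss = 1 - (1 - p_file_lost) ** files
print(p_file_lost)      # ~0.0027 per file
print(p_any_loss)       # ~1.0: essentially every such customer loses something
```

So even a per-file loss probability of a fraction of a percent becomes near-certain loss once multiplied across 100k files, which is the commenter's point.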

~~~
Johnny555
_That could be a power surge taking out a whole rack_

This failure mode, at least, is already accounted for by sharding data across
cabinets:

 _Each file is stored as 20 shards: 17 data shards and 3 parity shards.
Because those shards are distributed across 20 storage pods in 20 cabinets,
the Vault is resilient to the failure of a storage pod, or even a power loss
to an entire cabinet._

[https://www.backblaze.com/blog/vault-cloud-storage-architect...](https://www.backblaze.com/blog/vault-cloud-storage-architecture/)

However, they don't seem to offer multi-datacenter (or multi-region)
redundancy so are still susceptible to a datacenter fire/failure.

In comparison, AWS S3 distributes data across 3 AZs (datacenters), and you can
further replicate across regions if you choose. Though you pay for that added
redundancy with 3-4x higher cost.

~~~
gm-conspiracy
I thought AZs were in the same physical location, just separate networks, no?

~~~
Johnny555
They are separate datacenters. I don't think they make any promises about how
far apart they are, but in at least some regions they are 10+ miles apart.

 _The AWS Cloud infrastructure is built around AWS Regions and Availability
Zones. An AWS Region is a physical location in the world where we have
multiple Availability Zones. Availability Zones consist of one or more
discrete data centers, each with redundant power, networking, and
connectivity, housed in separate facilities_

[https://docs.aws.amazon.com/aws-technical-content/latest/aws...](https://docs.aws.amazon.com/aws-technical-content/latest/aws-overview/global-infrastructure.html)

~~~
gm-conspiracy
So, a single AZ could span across multiple physical locations, as well?

Am I reading that correctly?

------
garettmd
This was an interesting read, both the points made about durability, as well
as the in-depth math. However, what stood out to me most was the line:

Because at these probability levels, it’s far more likely that:

\- An armed conflict takes out data center(s).

\- Earthquakes / floods / pests / or other events known as “Acts of God” destroy multiple data centers.

\- There’s a prolonged billing problem and your account data is deleted.

The point is that once you reach a certain level of durability (at least as
far as hardware/software is concerned), you're chasing diminishing returns on
any improvement. But the risks that remain (and that have been big issues for
people lately) are things like billing issues. I think it's an important point
that operational procedures (even in non-technical areas like billing and
support) are critical factors in data "durability".

~~~
klodolph
I've posted the math here before but if we assume that an asteroid hits the
earth every 65 million years and wipes out the dominant life forms, then this
fact alone puts your yearly durability at a maximum of ~8 nines.
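The arithmetic is easy to check; a one-liner sketch (assuming, as above, one extinction-level impact every 65 million years):

```python
from math import log10

p_impact_per_year = 1 / 65_000_000  # assumed: one extinction-level impact / 65 My
durability_cap = 1 - p_impact_per_year
nines = -log10(p_impact_per_year)   # nines of annual durability this alone allows
print(round(nines, 2))  # 7.81
```

So asteroid risk alone caps annual durability a bit below 8 nines, well under the advertised 11.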

The point about billing is better, though.

My other concern is that a software bug, operator error, or malicious operator
deletes your data.

~~~
tjoff
That's why no sane entity would use one earth. Use two and you can quickly
recover.

~~~
klodolph
Our goal is N+2, that way when one Earth is down for planned maintenance, you
can endure unplanned loss of a second Earth.

N, of course, is always 1.

------
Animats
Financial failure or service shutdown by the provider is the highest risk for
long term storage. The backup services CrashPlan, Dell DataSafe, Symantec,
Ubuntu One, and Nirvanix all shut down. Nirvanix only gave two weeks notice
for users to save their data.[1]

[1] [https://www.computerweekly.com/opinion/Nirvanix-failure-a-bl...](https://www.computerweekly.com/opinion/Nirvanix-failure-a-blow-to-the-cloud-storage-model)

~~~
brador
I would add a third risk of an account ban by some type of future automated
copyright content ID. Especially if it is silent/without warning.

~~~
stefan_
Forget account bans; there is a nonzero risk that a false positive from the
automated kiddie porn search all the cloud storage providers do gets your home
searched and puts you in handcuffs.

~~~
e12e
While I'm sceptical of content filters, even with a home search it seems
unlikely you'd end up in cuffs unless a) the filter caught actual illegal
content, or b) the search turned up something illegal.

You might get killed in the course of the initial police raid though..

~~~
stefan_
Well you take a picture of your kid in the bathtub, now who can tell the
difference?

~~~
e12e
If pictures of naked kids are illegal in your jurisdiction, you've got bigger
problems. I guess that's true in some locations, though. Still, nude people
!= pornography.

(kids below age of consent sexting each other is another, related, problem)

------
aikinai
I really want to like Backblaze and they seem to do a lot of good work, but
whenever this comes up, I also feel responsible to let people know the dark
side so they're informed at least.

I've written in more detail before[0], but just to share the gotchas in case
anyone here is thinking of switching to Backblaze:

1\. They back up almost no file metadata.

2\. The client is very slow (days or more) to add new files and there's no
transparency (it claims everything is backed up when it's not).

3\. There are still bugs in the client that can put your backup into an
invalid state where it gets deleted.

4\. Support is terrible, and won't be any help when you run into these bugs.

[0]
[https://news.ycombinator.com/item?id=16301626](https://news.ycombinator.com/item?id=16301626)

~~~
rendaw
As much as this is off topic, I'd like to continue this conversation. Do you
have any details/reference for point 1?

I've been using rclone (got the recommendation here) which has been reliable.

Also, does anyone know if Backblaze has any plans to offer U2F? I've switched
DNS and email providers to get U2F.

~~~
aikinai
Yeah, it's a bit old, but here's an article about Backblaze not supporting
metadata. [0] "It fails all but one of the Backup Bouncer tests, discarding
file permissions, symlinks, Finder flags and locks, creation dates (despite
claims), modification date (timezone-shifted), extended attributes (which
include Finder tags and the “where from” URL), and Finder comments."

And I don't know if it supports U2F, but it does support TOTP.

[0] [https://mjtsai.com/blog/2014/05/22/what-backblaze-doesnt-bac...](https://mjtsai.com/blog/2014/05/22/what-backblaze-doesnt-back-up/)

------
Pissompons
It's a really nice blog post but coming from Backblaze, it would have been
nice if they wrote it _after_ bringing the Phoenix DC fully online. When
Amazon or Google say 11 9s, I can believe it but Backblaze still only has a
single datacenter for most data. All it takes is an earthquake.

~~~
sneak
Or an overzealous prosecutor.

------
ujjain
It still wouldn't upload my 1TB of backups in an entire month. An Amazon Drive
backup completed in 3 days.

Their pricing is amazing, but saving money on a back-up solution that doesn't
seem as good as the other cloud storage providers is a dangerous game.

~~~
fencepost
Where are you geographically? Was your Amazon upload to a datacenter
physically much nearer to you than California? 30MBit/s sustained for 3 days
isn't unreasonable for a business connection, but seems high compared to most
of what I see available at least in the US.

~~~
brianwski
Disclaimer: I work at Backblaze and live in California.

> 30MBit/s sustained for 3 days isn't unreasonable for a business

We (Backblaze) are seeing more and more consumer internet connections in the
USA with 20 Mbit/sec upstreams, I thought they were available most everywhere
if you were willing to upgrade your internet package "just a bit". 30 Mbits is
a little unusual for "consumer", but not unheard of. Of course, there is a
"selection bias" when you look at online backup users. :-)

~~~
fencepost
Yes regarding the available speed, but on a consumer Comcast connection, at
least, if you push 30MBit for 3 days straight you may get a call, or at least
start seeing popups in your browser on any http (not https) traffic.

From Comcast: _The Terabyte Internet Data Usage Plan is a new data usage plan
for XFINITY Internet service that provides you with a terabyte (1 TB or 1024
GB) of Internet data usage each month as part of your monthly service. If you
choose to use more than 1 TB in a month, we will automatically add blocks of
50 GB to your account for an additional fee of $10 each. Your charges,
however, will not exceed $200 each month, no matter how much you use. And,
we're offering you two courtesy months, so you will not be billed the first
two times you exceed a terabyte._

Also, _All customers in locations with an Internet Data Usage Plan receive a
terabyte per month, regardless of their Internet tier of service._ and _The
data usage plan does not currently apply to XFINITY Internet customers on our
Gigabit Pro tier of service. The plan also does not apply to Business Internet
customers, customers on Bulk Internet agreements, and customers with Prepaid
Internet._

------
eloff
This was an interesting read from a technical point of view, but also well
written and refreshingly transparent.

I found the discussion about why it doesn't matter when you start talking
about 11 nines of reliability to be hilariously true.

At the end of the day we're still flawed humans living in a hostile universe,
and no matter how foolproof we make the technology, there are some weaknesses
that just can't be eliminated.

~~~
brianwski
Disclaimer: I'm the author of the blog post. :-)

> well written and refreshingly transparent

Thank you! In the interests of full transparency, the blog post was a
collaborative affair and was proofread and edited for clarity by several
people at Backblaze.

> discussion about why it doesn't matter

One of the philosophies Backblaze uses is to build a reliable component out of
several inexpensive and unrelated components. So combine 20 cheap drives into
an ultra-reliable vault. We have two or three inexpensive network connections
into each datacenter instead of buying one REALLY expensive connection for 8x
the price. Etc.

Personally, I recommend customers do the same. Instead of storing two copies
of your data in two regions in Amazon for 2x the price, store your data in one
region of Amazon and put one copy in Backblaze B2 for 1.25x the price. We
believe this will result in higher availability and higher durability than two
copies in Amazon, because Amazon S3 and Backblaze B2 don't share datacenters
in common (that we know of), don't share network links, don't share the
software stack, etc. For bonus points, use different credit cards to pay for
each, and have different IT people's credentials (alert email address) on
each. That way, if one IT person leaves your company and you don't get an
alert that the credit card has expired, hopefully your other copy will be OK.

~~~
k__
BB and S3 both claim eleven 9s of durability; how much does using both
increase this?

~~~
bhelkey
Depends what you are modeling. The probabilities of random disk failures are
probably independent.

However, there are risks that are not necessarily independent such as the US
Government ordering these two services to delete your data, or, as the article
mentions, an armed conflict destroying data centers.

~~~
e12e
Or your credit card being suspended and both services deny you access / delete
your data.

------
londons_explore
I'm very disappointed their recovery time is 6 days!

Recovery workload should be spread across the whole cluster, so that the
recovered data gets distributed evenly. In that case, assuming 10,000 drives,
to recover one dead 12TB drive and a recovery rate of even 10 MB/secs per
machine, recovery of one drive should be done in under a second. Maybe 10
seconds with some sluggish tail machines.

Why do you need it done in under a second? While the data is down one replica,
it is at dramatically higher risk. Also, drive failures can be dramatically
accelerated, for example in the case of a bad software release erasing data -
you need to be able to move data faster than bad software gets released. And
releasing software at a rate of one machine per second still means a release
takes 3 hours!

~~~
jsjohnst
> In that case, assuming 10,000 drives, to recover one dead 12TB drive and a
> recovery rate of even 10 MB/secs per machine, recovery of one drive should
> be done in under a second.

I want to know where you can find a drive that can write 12TB/sec of data!

(In other words, you clearly missed half the problem. To add a new replacement
drive, you have to be able to write to it the data from an original drive.
Also RS code calculation is fast these days, but it ain’t that fast)
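Back-of-envelope for the write bottleneck (the 200 MB/s sustained sequential write figure is an assumption for a modern 12 TB drive, not a quoted spec):

```python
drive_bytes = 12 * 10**12   # 12 TB replacement drive to be filled
write_rate = 200 * 10**6    # ~200 MB/s sustained sequential write (assumed)
hours = drive_bytes / write_rate / 3600
print(round(hours, 1))  # 16.7 -- a rebuild onto one drive takes hours at best,
                        # never seconds, regardless of cluster read bandwidth
```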

~~~
riobard
Additionally, it implies that data needs to be spread across 10,000 drives,
which is unrealistic anyway.

~~~
brianwski
Disclaimer: I wrote the blog post. :-)

> spread across 10,000 drives, which is unrealistic

I claim it is also undesirable. Backblaze specifically made the conscious
decision that the parts of any one single "large file" (these can be up to 10
TBytes each) are all stored within the same "vault". A vault is 20 computers
in 20 separate racks. This allows a single vault to check the consistency and
integrity of a large file periodically without communicating to other vaults
in the datacenter.

The vaults have been a really good unit of scaling for Backblaze. If the
vaults can maintain their performance, then we know we can just stamp out more
vaults because there is almost no communication between vaults.

------
riku_iki
> if you store 1 million objects in B2 for 10 million years, you would expect
> to lose 1 file.

Can this be reformulated as: if you store 10 trillion objects (e.g. 100TB of
10-byte records), you lose 1 record each year?

Also curious what are the stats from other providers.
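The reformulation checks out dimensionally; a quick sketch:

```python
# Quoted claim: 1 million objects stored for 10 million years => 1 expected loss,
# i.e. 1e13 object-years per expected lost object
object_years_per_loss = 1_000_000 * 10_000_000

objects = 10 * 10**12                          # 10 trillion 10-byte records
losses_per_year = objects / object_years_per_loss
terabytes = objects * 10 / 10**12              # total data stored
print(losses_per_year, terabytes)  # 1.0 100.0 -> one lost record/year in 100 TB
```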

~~~
jsjohnst
> Can this be reformulated

Roughly, yes.

> Also curious what are the stats from other providers.

As to other providers, most I've looked at are 6+ 9s, with many in the 8-9
range. Anything over 8 is (as they admitted) essentially marketing porn and
not a useful metric (for reasons they mentioned, as well as ones raised by
other comments here).

~~~
riku_iki
And yet, there aren't many complaints on the internet about Gmail losing mail
to disk failures. And Gmail likely stores much more than 10 trillion objects
and 100TB of data.

~~~
brianwski
A number of years ago a Gmail "insider" that I know admitted they had lost
customer emails and their policy (at that time at least) was to simply ignore
it and not tell anybody because most customers never notice if it is a small
number.

I think a much bigger scandal is that all major laptop Operating System
vendors (Microsoft and Apple) absolutely know when your laptop drive loses
files or even is starting to go bad in some cases, and they NEVER tell the
customer. I think an excellent product offering would be a 3rd party piece of
software and cloud service which was a "verification service". It wouldn't
store your files offsite, it would store the name, size, and SHA1 offsite and
periodically check that no bits have been flipped on your local drive unless
you intended it. For example, a week after I take a photo, I absolutely never
want the photo to change. Ever. Same with music I (legally) download.

------
duxup
I just wish Backblaze would not go "oh man, a lot of your data changed, you
should probably check for integrity and start your backup with us again
later...."

Oh gosh thanks Backblaze, I'll just dig through several TB of stuff....

------
toolslive
The best durability is probably achieved by Amplidata, but it does not matter.

You need to do the same calculation for your metadata, which is probably not
erasure coded. If you lose it, you don't lose your data, but you no longer
know where you put it.

So you probably add your metadata to your data as well, in some kind of
recoverable format. That's fine; it means you can harvest the metadata again.

But how long does that take?

------
simonebrunozzi
The Backblaze blog post points to a blog post by Amazon's CTO (Werner Vogels),
in which he states that "...These techniques allow us to design our service
for 99.999999999% durability."

(side note: Werner is a great person)

Unfortunately, there is a difference, a huge difference, between a system
"designed" for 11 9s of durability, and a system "offering" 11 9s of
durability.

I wish Backblaze, or Amazon, or anybody else, would clarify durability using
very honest terms.

An example?

"This system offers X 9s of durability over a period of one year, on average.
This is a technical paper that describes how we tested that durability",
followed by measurements and test specifics.

Any other claim has much less value to me.

------
HelloNurse
"Our nines go to eleven"

Well written, but there are other significant risks like losing access
credentials (e.g. a password stored only on one device that is destroyed in
the same accident in which its only user, who remembered the password by
heart, dies) or being hacked by someone who gains access to cloud storage and
intentionally erases or corrupts data.

Specialization is good, but if Backblaze is strictly in the business of
storing data on hard disks, who's going to help with designing and maintaining
the reliable complete system on top of their service that users actually need?

------
shub
Just one missing piece: the actual loss rate.

~~~
mikeryan
There’s another piece too. Practically speaking, they’re a backup service, so
you’d want to know how often someone needs to recover their data without
having had a chance to re-seed their backup from their current state. I’ve
used Backblaze for years and only needed it once. (I also back up to a Time
Capsule, since data recovery from that is more practical/easier.)

------
Jabbles
_to lose a file, we have to have four drives fail before we had a chance to
rebuild the first one._

I wonder how many times they've had 2 or 3 drives fail before they've rebuilt
and if that matches their predictions.

------
anderspitman
Off topic but I just want to say how happy I've been as a Backblaze customer.
B2 is a fantastic product for my backup needs, and their hard drive stats have
always been handy when selecting drives.

~~~
atYevP
Yev from Backblaze -> Thanks! Glad you're enjoying it :D

------
fencepost
One question since I know some of the Backblaze folks respond to these
threads:

In addition to calling (and possibly getting blocked/ignored), does your
customer service staff send text messages? I suspect that a big percentage of
the phone numbers you have are for cell phones these days, and I see a lot
less SMS spam than I do telemarketing. SMS would also allow you to get a bit
of info visible to recipients (e.g. "Backblaze CC Expired") with more detail
once a message is opened.

~~~
atYevP
Yev from Backblaze here -> I believe we do send SMSes when a cap or alert is
reached, so yes, that could be possible - though I'm not sure whether an SMS
is part of our billing-failure process. That's an interesting question!

~~~
u02sgb
Backblaze B2 customer here. My credit card stopped accepting your billing and
I got no SMS. It took me a month or so to notice the emails and update my
details. I do have SMS alerts active for caps. It would be worth adding an SMS
for billing failures too, as it was a bit scary when I noticed the mail (I
think it was the third one you'd sent!).

~~~
atYevP
VERY good to know! Thank you!

------
juancn
My experience is that it's bullshit. I had a backup (damn, I still have one)
with Backblaze; when attempting to restore it, maybe 30% of the files survived
the restore. The rest went up in smoke.

They don't have any way to detect corruption in the data or if they have, the
backup clients are oblivious to it.

I lost about 150GB of family photos and videos.

------
cascom
One simple question - what is the math if one data center goes down?

[https://www.nytimes.com/interactive/2018/05/24/us/disasters-...](https://www.nytimes.com/interactive/2018/05/24/us/disasters-hurricanes-wildfires-storms.html?hp&action=click&pgtype=Homepage&clickSource=g-artboard&module=second-column-region&region=top-news&WT.nav=top-news)

------
sneak
This article supposes that a meteor impact is more likely than a disaster that
renders northern California (their only data centers are in Oakland and
Sacramento, AFAIK) without power or civil order within ten million years.

As someone else pointed out, it’s overly simplistic. They’re a great low-cost
alternative to S3, sure. But keep a backup on another continent if you need
your data 100 years from now.

------
contravariant
>The sub result for 4 simultaneous drive failures in 156 hours =
1.89187284e-13. That means the probability of it NOT happening in 156 hours is
(1 – 1.89187284e-13) which equals 0.999999999999810812715 (12 nines).

Minor nitpick, this ignores the possibility of _more_ than 4 failures,
although this error only affects the fourth digit _after_ the nines. Much more
egregious is the following:

>there are 56 “156 hour intervals” in a given year

This is too simplistic, there are in fact infinitely many 156-hour intervals
in a year, some of them just happen to overlap. This overlap can't simply be
ignored because even if none of their 56 disjoint intervals contain 4 events
this does not rule out the possibility of there being 4 events in some 156
hour interval they didn't take into account. In fact failing to take into
account even one of the infinitely many intervals creates a blind spot
(consider what happens if the drives happen to fail precisely at the start and
end of a particular interval). You can still get a lower bound by e.g.
ensuring none of the 56 intervals contain more than 1 failure, or by adding
more intervals and ensuring none of them have more than 2 failures etc.
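The boundary effect is easy to demonstrate by simulation; a sketch with a deliberately inflated failure rate so that 4-failure clusters actually occur (all numbers here are illustrative, not Backblaze's):

```python
import random

HOURS_PER_YEAR = 8766.0
WINDOW = 156.0   # rebuild window (hours)
N_FIXED = 56     # the article's disjoint "156 hour intervals"

def failure_times(rate_per_hour, rng):
    """Failure times from a Poisson process over one year (already sorted)."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_hour)
        if t > HOURS_PER_YEAR:
            return times
        times.append(t)

def fixed_hit(times):
    """>= 4 failures inside one of the 56 disjoint intervals (article's method)."""
    counts = [0] * (N_FIXED + 1)
    for t in times:
        counts[min(int(t // WINDOW), N_FIXED)] += 1
    return any(c >= 4 for c in counts[:N_FIXED])

def sliding_hit(times):
    """>= 4 failures inside *some* 156-hour window (the correct criterion)."""
    return any(times[i + 3] - times[i] <= WINDOW for i in range(len(times) - 3))

rng = random.Random(42)
rate = 0.002  # failures/hour, inflated far beyond reality to make hits visible
fixed = sliding = 0
for _ in range(2000):
    ts = failure_times(rate, rng)
    fixed += fixed_hit(ts)
    sliding += sliding_hit(ts)
print(fixed, sliding)  # sliding windows catch more 4-failure clusters,
                       # because clusters can straddle the fixed boundaries
```

Every fixed-interval hit is also a sliding-window hit, but not vice versa, which is exactly the blind spot described above.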

Their binomial calculation contains the same mistake.

A quick improved lower bound can be obtained by calculating the probability
that any failure is followed by (at least) 3 other failures within 156 hours.
For one failure this probability is given by the Poisson distribution and is

    
    
        Pc = 1 - \sum_{k<3} e^{-λ} λ^k / k! = 5.18413e-10
    

Now we get into some trouble because the failures and the probability of a
'catastrophic' failure are dependent, however the probability that any
particular failure turns catastrophic is constant, so the expected number of
catastrophic failures can't be greater than the expected number of failures
times that constant, this gives a lower bound of

    
    
        Pc (365·24·λ) = 6.63154e-9
    

this is a lower bound, but that's still three fewer nines left than their
claim.

Anyway let's just hope their data centres are more reliable than their
statistics.

Edit: This last calculation can be justified by noting that the probability
that 1 critical failure starts in a particular time interval is Pc times the
probability of 1 failure in that interval, plus some constant times the
probability of more than one failure in that interval. Similarly, the
probability of more than one critical failure is at most the probability of
more than one failure.

Now the probability of more than one failure in a time interval is dominated
by the length of the interval, therefore if you calculate the density those
parts fall away and you're left with a density of Pc λ critical failures per
hour.

This seems to be an exact expression for the expected number of critical
failures, and not just a lower bound. Although it is still a lower bound for
the _probability_ of a critical failure, albeit a fairly tight one.
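For anyone wanting to reproduce the Pc figure above: the Poisson tail is straightforward to evaluate. The per-window rate λ isn't stated in the comment; λ ≈ 1.46e-3 is an assumed value chosen here because it reproduces the quoted result:

```python
from math import exp, factorial

def p_catastrophic(lam):
    """P(at least 3 further failures in the rebuild window): Poisson tail."""
    return 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(3))

lam = 1.46e-3   # assumed expected failures in the 156 h window (not given above)
pc = p_catastrophic(lam)
print(pc)       # ~5.2e-10, matching the comment's 5.18413e-10
# for small lam the tail is dominated by the first term, lam**3 / 6
print(abs(pc - lam**3 / 6) / pc < 0.01)  # True
```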

------
hartator
> The math on calculating all this is extremely complex.

Hum, I would be more reassured by past statistics than by a probability
evaluation. Do they have any actual loss data since their creation?

------
skybrian
The billing problem is mentioned, but just left dangling. I wonder if anyone
has done any interesting work at fixing this?

------
rossdavidh
Fun read, but the whole time I was reading it I could hear Nassim Nicholas
Taleb making exasperated noises of outrage.

------
zimbatm
answer: because all the files are in the same, single data center which has a
lot less nines

------
egonschiele
why calculate Poisson and binomial when both are ultimately Gaussian?

~~~
RA_Fisher
They're not ultimately Gaussian. :) The normal distribution is continuous and
has unbounded support, while the binomial and Poisson are neither continuous
nor unbounded.

------
rafaelgarrido
is the website down?

~~~
hartator
It was the 0.00000001%.

