
Backblaze Hard Drive Stats Q3 2019 - garaetjjte
https://www.backblaze.com/blog/backblaze-hard-drive-stats-q3-2019/
======
akersten
These stats are wonderful and make me really appreciate the culture of the
company. I've been considering becoming a customer because of these posts,
since they reflect a lot of pride in the craft and care for the community.

But I'm stuck on one thing. Does Backblaze offer a solution for Linux backup?
I've got an NFS server running that I use for home storage that I want to back
up - but looks like Backblaze is only offering a Windows or Mac client.

Maybe the business version would work, since it claims to support NAS backup.
But then the pricing seems lower than the personal edition ($60/computer/year
= $5/month < $6/month), unless that implies that every computer that
accesses the NAS counts toward the fee?

So I guess: is there a reasonable Linux offering for home users from
Backblaze? If not, what service do folks suggest?

~~~
seanlane
I've been using restic with the Backblaze B2 backend for a home server backup,
which seems to be as close as Backblaze will ever get to having a Linux
client.

My rough numbers are 450GB stored monthly, 4GB downloaded monthly, 90k stored
files, 385,000 individual transactions, which ends up costing about $2.25 in
storage fees, $0.25 for transactions, and $0.10 for download bandwidth.
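You can sanity-check numbers like these by redoing the arithmetic. The sketch below assumes B2's 2019 list prices of about $0.005 per GB-month for storage and $0.01 per GB downloaded (the download rate Yev quotes further down). Transaction charges depend on the API call class, so they are passed in as a lump sum rather than derived. The constant and function names are illustrative and are not part of any Backblaze API.

```python
# Monthly B2 cost estimate (sketch). The rates are assumptions based on
# 2019 list prices, not values fetched from Backblaze.
STORAGE_USD_PER_GB_MONTH = 0.005
DOWNLOAD_USD_PER_GB = 0.01


def monthly_cost(stored_gb: float, downloaded_gb: float, transactions_usd: float = 0.0) -> dict:
    """Return each line item and the total, rounded to cents."""
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH
    download = downloaded_gb * DOWNLOAD_USD_PER_GB
    return {
        "storage": round(storage, 2),
        "download": round(download, 2),
        "transactions": round(transactions_usd, 2),
        "total": round(storage + download + transactions_usd, 2),
    }


if __name__ == "__main__":
    # Figures from the comment: 450 GB stored, 4 GB downloaded, ~$0.25 of transactions.
    print(monthly_cost(450, 4, 0.25))
```

At these assumed rates the storage line reproduces the $2.25 above. The 4 GB of downloads comes to about $0.04, which is somewhat less than the quoted $0.10, so treat the download figure as approximate.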

~~~
favorited
One thing that has kept me from backing up more content to B2 is finding an
appropriate encryption strategy, and it looks like restic handles encryption
of the remote backup repository automatically?

If I'm reading this correctly, restic + B2 sounds like an absolute godsend!

~~~
seanlane
That's correct, restic operates under the assumption that you cannot trust
where it's storing the data.

Some more (informal) analysis on restic's crypto was done here:

[https://blog.filippo.io/restic-cryptography/](https://blog.filippo.io/restic-cryptography/)
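The sketch below illustrates the integrity half of "you cannot trust where it's storing the data." Each object is stored under the SHA-256 of its contents and carries a MAC keyed with a secret that never leaves the client, so anything the backend corrupts or alters is rejected on read. This is not restic's implementation. restic uses AES-256 in counter mode with a Poly1305-AES MAC. This sketch swaps in HMAC-SHA256 from Python's standard library and omits encryption entirely, so the payload is assumed to be already-encrypted bytes. `seal` and `unseal` are hypothetical names.

```python
import hashlib
import hmac
import os
from typing import Dict, Tuple

TAG_LEN = 32  # HMAC-SHA256 tag size in bytes


def seal(key: bytes, ciphertext: bytes) -> Tuple[str, bytes]:
    """Append a MAC, then name the object by the SHA-256 of its full contents."""
    tag = hmac.new(key, ciphertext, hashlib.sha256).digest()
    blob = ciphertext + tag
    return hashlib.sha256(blob).hexdigest(), blob


def unseal(key: bytes, name: str, blob: bytes) -> bytes:
    """Reject anything the untrusted backend changed before returning the payload."""
    if hashlib.sha256(blob).hexdigest() != name:
        raise ValueError("content does not match its name (corrupted or swapped)")
    ciphertext, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("MAC mismatch (tampered, or wrong key)")
    return ciphertext


if __name__ == "__main__":
    key = os.urandom(32)  # stays on the client; the backend never sees it
    backend: Dict[str, bytes] = {}  # stand-in for a remote bucket
    name, blob = seal(key, b"<already-encrypted chunk>")
    backend[name] = blob
    print(unseal(key, name, backend[name]))
```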

------
nolok
There was a post from Backblaze a year (?) or so back where they commented on
the Toshibas' low failure rate with something along the lines of "they seem
really reliable, but we buy in bulk and just don't get enough offers at a low
price for those, otherwise we would buy a lot of them".

Well, I run a couple dozen Synology NAS units in professional setups, as well
as two in personal setups (mine and my parents'), and ever since that post
I've run the experiment of making almost 50% of all drives Toshibas. I have to
say they do seem to be much more reliable (on the scale of "why does every
other drive from Seagate and WD keep dying first, and often its replacement
dies first too?").

It is still a scale of use where this is mostly anecdotal rather than
verifiable data, so don't take this fun comment for more than that. But I
suspect a lot of people reading these posts are not interested in some large
scale setup or anything like that, but rather want to know which drives to put
in their home computer or NAS, and honestly I can highly recommend the
Toshibas for that. They do tend to be a bit more expensive (around 10% more? I
buy them from ldlc.com and grosbill.com, French IT stores, no bulk buying or
anything like that).

Of course, no matter the brand, never expect zero failures: a Toshiba drive
may just as well die in the first ten minutes, so always plan for it.

~~~
atYevP
Yev from Backblaze here -> Yea, the Toshiba drives are great! If the price was
lower, they'd likely play a larger role in our hard drive mix!

~~~
jl6
If they need replacing less often, then presumably you’d be willing to spend
more on them, so do I infer that they are sufficiently more expensive to wipe
out the benefits of greater reliability?

~~~
atYevP
Yev here -> Yes, that's basically the trade-off for us. Is the drive available
and reliable vs. how much does it cost. One of the reasons we have so many
Seagate drives is because they are available in abundance, are affordable for
us, and have failure rates in-line with the other drives that we're testing.

~~~
myself248
I'm theorizing that one of the things Backblaze has optimized is the labor
cost of drive replacement, yes?

So the reliability may be "worth more" to someone who pays a lot for remote-
hands in a server farm somewhere, but not in this case.

~~~
atYevP
Yev here -> we do have remote hands available in some cases, but we try to
have our own teams in place where possible. So yes, it's one of the things
we optimize on our end!
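The trade-off in this subthread can be made concrete with a back-of-the-envelope model. Amortize the purchase price over the drive's service life, then add the expected yearly cost of failures: a replacement drive plus the labor to swap it. This is a simplification with stated assumptions. AFR is treated as constant, and every failure costs one new drive plus a flat labor charge. All prices, AFRs, and labor costs here are hypothetical. The 10% price premium is borrowed from nolok's estimate.

```python
from dataclasses import dataclass


@dataclass
class DriveOption:
    name: str
    price_usd: float
    afr: float  # annualized failure rate as a fraction, e.g. 0.02 for 2%


def cost_per_drive_year(d: DriveOption, life_years: float, labor_usd: float) -> float:
    """Amortized purchase price plus expected yearly failure cost.
    Simplification: each failure costs one new drive plus a flat labor charge."""
    return d.price_usd / life_years + d.afr * (d.price_usd + labor_usd)


def break_even_labor(cheap: DriveOption, reliable: DriveOption, life_years: float) -> float:
    """Labor cost per swap at which both options cost the same per drive-year.
    Assumes cheap.afr > reliable.afr."""
    gap = cost_per_drive_year(reliable, life_years, 0.0) - cost_per_drive_year(cheap, life_years, 0.0)
    return gap / (cheap.afr - reliable.afr)


if __name__ == "__main__":
    # Hypothetical drives: the reliable one costs 10% more and fails half as often.
    cheap = DriveOption("cheap", price_usd=200.0, afr=0.02)
    reliable = DriveOption("reliable", price_usd=220.0, afr=0.01)
    for labor in (0.0, 300.0):
        print(labor, cost_per_drive_year(cheap, 5, labor), cost_per_drive_year(reliable, 5, labor))
    print("break-even labor per swap:", break_even_labor(cheap, reliable, 5))
```

With these made-up numbers, the cheaper drive wins when swaps are nearly free. The pricier one wins once a swap costs more than about $220 in labor, which is the point myself248 raises about remote hands.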

------
DanCarvajal
This is my favorite kind of content marketing.

------
piepoter
Shout out to Seagate: every drive that my friends and I have bought from them
has eventually failed. Good to see that they fail in non-consumer use too, not
just for me! Stick to the Western Digitals.

~~~
atYevP
Yev here from Backblaze -> The Western Digital drives that you've purchased
will fail too. That's part of the whole point of these reports, all of the
drives eventually fail out or reach a state where we have to replace them -
it's not any one specific manufacturer. That's part of why having a backup is
so important, even the SSDs in newer machines will eventually go wonky.

~~~
piepoter
Then I stand corrected. Is there any way to avoid drive failures, or should we
accept the fact that they will break eventually? Also, the data is interesting
and appreciated.

~~~
protomyth
Buy a RAID enclosure or set up a home RAIDed server, and do backups. You
cannot avoid it, but you can mitigate the damage.

~~~
derekp7
I really wish that consumer OSs would mandate RAID (mirroring at the minimum),
requiring additional steps to install on unprotected storage. This should have
been implemented even back during the early days of consumer hard drives, or
at the minimum have a warning on boot that "Your data is stored on temporary
storage -- go [here] to configure redundancy". At least something that puts it
front and center that storing data on a single hard drive will eventually lead
to data loss.

Also would be nice to have a similar warning to "Your data has not been backed
up in [x] days".

~~~
AnIdiotOnTheNet
Ugh, god why? Why must people in our industry daydream about over-complicating
things all the time? Very few use-cases benefit significantly from mirrored
disks. The amount of data a user actually cares about having a backup of is
often significantly smaller than the OS and related garbage on a disk that
they can do without. Besides which, a local mirror isn't a good backup anyway!

~~~
derekp7
My thinking is that this would serve a similar purpose to the trend for web
browsers to warn users of insecure websites -- the more "in your face" the
warnings are, the more incentive there is for providers to be secure.

I've had friends / relatives experience drive failure a few times in the past,
and the look of horror they have when there is very little I can do to help
them recover their photos etc. is something that I hate seeing.

And having the OS give a simple warning (that can be dismissed, with a "don't
show me this any more" checkbox) would not overcomplicate things, and may end
up saving some people's data.

Really no different than the current warning that Windows has, when you don't
have antivirus installed. Or the fasten seatbelt signal that comes on the
dashboard of your car when you start it. Or the flashing red light that gets
added to some stop sign controlled intersections.

Also it would be nice if there was a standard way that backup software could
inform the OS of backup status (this way it would serve as a secondary check
in case the backup software's internal reporting fails to notify the user of
bad backups). Just a little nice-to-have.

(I'm not advocating for this to be a legal requirement, just a nice feature if
any OS vendor wants to add it).
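A "not backed up in [x] days" check is easy to sketch if backup software records its last successful run somewhere the OS can read. As the comment notes, there is no standard for this. The JSON status file and its `last_success` field below are therefore hypothetical. Timestamps without a timezone are assumed to be UTC.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional

MISSING = "No backup status found; your data may not be backed up."


def check_status(text: str, max_age: timedelta, now: datetime) -> Optional[str]:
    """Return a warning string if the last successful backup is too old, else None.
    `text` is a hypothetical JSON record like {"last_success": "<ISO 8601 time>"}."""
    try:
        last = datetime.fromisoformat(json.loads(text)["last_success"])
    except (ValueError, KeyError, TypeError):
        return MISSING
    if last.tzinfo is None:
        last = last.replace(tzinfo=timezone.utc)  # assumption: naive timestamps are UTC
    age = now - last
    if age > max_age:
        return f"Your data has not been backed up in {age.days} days."
    return None


def backup_warning(status_path: Path, max_age: timedelta = timedelta(days=7),
                   now: Optional[datetime] = None) -> Optional[str]:
    """Read the status file written by (hypothetical) backup software and check it."""
    now = now or datetime.now(timezone.utc)
    try:
        text = status_path.read_text()
    except OSError:
        return MISSING
    return check_status(text, max_age, now)


if __name__ == "__main__":
    print(backup_warning(Path.home() / ".backup-status.json") or "Backups look current.")
```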

~~~
AnIdiotOnTheNet
> My thinking is that this would serve a similar purpose to the trend for web
> browsers to warn users of insecure websites -- the more "in your face" the
> warnings are, the more incentive there is for providers to be secure.

Another thing I hate about the industry today: being hostile to the user and
trying to force them to use their computer how you want them to use it.

> I've had friends / relatives experience drive failure a few times in the
> past, and the look of horror they have when there is very little I can do to
> help them recover their photos etc. is something that I hate seeing.

Then teach them about proper backups. It doesn't take a rocket scientist to
understand "don't keep all your eggs in one basket". Or if you're going to
implement some stupid forced user-hostile scheme, at least use something that
actually qualifies as a backup.

> And having the OS give a simple warning (that can be dismissed, with a
> "don't show me this any more" checkbox) would not over complicate things,
> and may end up saving some people's data.

Here's what will happen: the user will dismiss the dialog without even reading
it. We have decades of experience showing us this. Users have learned that
"warning" dialogs are meaningless precisely because of crap like this. Oh
yeah, and they're super annoying.

> Really no different than the current warning that Windows has, when you
> don't have antivirus installed.

Exactly my point.

~~~
ghaff
I've looked into RAID at home but came to the conclusion that there are plenty
of easier/cheaper ways to do backup. I just use USB drives and have a couple
of rotating Time Machine backups plus Backblaze. (Plus, when I think of it
once a year or so, I have one more copy of my main data disk that I keep in a
fire box.)

I don't really use Windows, but you can do something similar there, though in
my experience it's not as simple.

I don't really care if I avoid _any_ downtime. So long as I have belt and
suspenders backups, I'm pretty comfortable.

------
kbutler
I love these articles. I'd also love an update to the cost curves:
[https://www.backblaze.com/blog/hard-drive-cost-per-gigabyte/](https://www.backblaze.com/blog/hard-drive-cost-per-gigabyte/)
(2017)

It looks like after stalling in 2017-2018, $/GB has dropped again
([https://jcmit.net/diskprice.htm](https://jcmit.net/diskprice.htm)), but JCM
doesn't have the large sample sizes Backblaze does.

------
briffle
Interesting how the checksumming process sounds very much like ZFS's
scrubbing process. One of the reasons I trust ZFS with my large data volumes
is that it proactively looks for problems and fixes them (most filesystems
have no way to look or check at all).
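At its core, a scrub records a checksum for every piece of data once, then periodically re-reads everything and compares. The minimal Python version below works per file and only detects silent corruption. ZFS checksums at the block level inside the filesystem and can also repair a bad block from a redundant copy. The function names are illustrative.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Record a checksum for every regular file under root (keys are POSIX-style relative paths)."""
    return {p.relative_to(root).as_posix(): file_digest(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def scrub(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Re-read every recorded file; return those that are missing or whose contents changed."""
    return [rel for rel, expected in manifest.items()
            if not (root / rel).is_file() or file_digest(root / rel) != expected]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "photo.jpg").write_bytes(b"original bytes")
        manifest = build_manifest(root)
        (root / "photo.jpg").write_bytes(b"bit-rotted bytes")  # simulate silent corruption
        print(scrub(root, manifest))  # the changed file is reported
```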

------
gwern
> By increasing the shard integrity check rate, we potentially moved failures
> that were going to be found in the future into Q3. While discovering
> potential problems earlier is a good thing, it is possible that the hard
> drive failures recorded in Q3 could then be artificially high as future
> failures were dragged forward into the quarter. Given that our Annualized
> Failure Rate calculation is based on Drive Days and Drive Failures,
> potentially moving up some number of failures into Q3 could cause an
> artificial spike in the Q3 Annualized Failure Rates. This is what we will be
> monitoring over the coming quarters.

Wouldn't survival analysis on interval-censored data handle this problem
automatically? All of your observations of failure presumably are actually
interval data, where all you know is that the drive failed sometime in between
the last good check and the first bad check. Then it doesn't matter if some
time periods have large intervals and others have small intervals, that just
affects the precision of estimates.
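The sketch below applies gwern's suggestion under a deliberately simple assumption: a constant failure rate λ, i.e. exponential lifetimes with S(t) = exp(-λt). A real analysis would more likely use a Weibull model or a nonparametric estimator such as Turnbull's. Each failed drive contributes the probability of dying between its last good check L and first bad check R, S(L) - S(R). Each surviving drive contributes S(C) at its last good check C. A failure is only ever placed somewhere inside its interval, so checking more often narrows the intervals rather than pulling failures into an earlier period. All data here is made up.

```python
import math
from typing import List, Tuple


def log_likelihood(rate: float, failures: List[Tuple[float, float]], survivors: List[float]) -> float:
    """Exponential-model log-likelihood; all times are drive ages in years.

    failures:  (last_good_check, first_bad_check) pairs, with first_bad > last_good;
               the drive died somewhere in between.
    survivors: age at the last good check for drives still running (right-censored).
    """
    ll = 0.0
    for left, right in failures:
        # log(S(left) - S(right)) with S(t) = exp(-rate * t), written stably via expm1.
        ll += -rate * left + math.log(-math.expm1(-rate * (right - left)))
    for age in survivors:
        ll += -rate * age  # log S(age)
    return ll


def fit_rate(failures, survivors, lo: float = 1e-6, hi: float = 10.0, iters: int = 200) -> float:
    """Maximum-likelihood failure rate (per drive-year) via golden-section search.
    Valid because this log-likelihood is concave in `rate`."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if log_likelihood(c, failures, survivors) > log_likelihood(d, failures, survivors):
            b = d
        else:
            a = c
    return (a + b) / 2


if __name__ == "__main__":
    survivors = [2.0, 2.0, 2.0]  # made-up: three drives still healthy at 2 years
    coarse = [(0.25, 0.75), (0.75, 1.25)]  # failures bracketed by infrequent checks
    fine = [(0.49, 0.51), (0.99, 1.01)]  # same failures, checked far more often
    print(fit_rate(coarse, survivors), fit_rate(fine, survivors))
```

Both schedules give roughly 0.27 failures per drive-year here. As the intervals shrink, the estimate approaches failures divided by drive-years, which is the quantity an annualized failure rate expresses. The interval width mainly affects how tightly the rate is pinned down.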

------
h1d
The only reason I use a storage provider other than Backblaze is a benchmark
done by the author of one of the more modern backup tools:

[https://github.com/gilbertchen/cloud-storage-comparison/blob/master/README.md](https://github.com/gilbertchen/cloud-storage-comparison/blob/master/README.md)

Can anyone from Backblaze say anything about their performance compared to
other vendors?

The pricing is certainly ahead of others, so I would use it if the performance
is comparable to the leading group tested there.

~~~
atYevP
Yev here -> well, that chart hasn't been updated in a while. For starters,
we're just $0.01/GB for downloads (we dropped the price last year). Our
performance is generally pretty good, and we're partnered with Cloudflare (free
egress) if you need more oomph. But most of the time folks don't have any
issues with just our regular service.

------
riobard
I must have missed it somehow, but what is the difference between boot drives
and data drives in a typical Backblaze server other than the boot drives store
the OS? Obviously you don’t need 8TB capacity solely for the OS, I’d assume
you also store user data on boot drives? In which case why is there a
distinction?

~~~
atYevP
Yev here from Backblaze -> Mainly it's the OS and log files - the reason we
make a distinction is that the boot drives typically do not have as much load
as the data drives, so it wouldn't be a real 1:1 comparison.

~~~
riobard
Thanks! What’s the capacity and utilization rate of those boot drives, then?

~~~
andy4blaze
Andy from Backblaze here: The capacity of the boot drives typically ranges
from 80 to 500 GB. They are mostly hard drives, with some SSDs added recently,
and we are switching over to SSDs for boot drives. The workload is reasonable,
but on the higher side, as they not only boot the systems but are also used to
store log files temporarily, so there are lots of reads, writes, and deletes.
Since we only have a little over 2,000 of them, any data we published would
not be very accurate. If you are really interested, the boot drive data is in
the data files we publish each quarter.
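The sketch below computes an annualized failure rate per model from those files, using the formula Backblaze uses in these reports: AFR = failures / (drive days / 365) * 100. It assumes the public data set's daily CSV layout, with one row per drive per day and `model` and `failure` columns, where `failure` is 1 on the day a drive failed. It does not try to separate boot drives from data drives. The sample rows are fabricated.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, TextIO


def afr_by_model(files: Iterable[TextIO]) -> Dict[str, float]:
    """AFR (%) per model = failures / (drive_days / 365) * 100.

    Assumes the daily-snapshot layout: each row is one drive on one day, and the
    `failure` column is "1" on the day that drive failed, "0" otherwise.
    """
    drive_days: Dict[str, int] = defaultdict(int)
    failures: Dict[str, int] = defaultdict(int)
    for f in files:
        for row in csv.DictReader(f):
            model = row["model"].strip()
            drive_days[model] += 1
            if row["failure"].strip() == "1":
                failures[model] += 1
    return {m: failures[m] / (drive_days[m] / 365) * 100 for m in drive_days}


if __name__ == "__main__":
    # Fabricated two-day sample; real use would open each day's CSV (newline="") instead.
    day1 = io.StringIO("date,serial_number,model,capacity_bytes,failure\n"
                       "2019-07-01,X1,MODEL_A,4000787030016,0\n"
                       "2019-07-01,X2,MODEL_A,4000787030016,1\n")
    day2 = io.StringIO("date,serial_number,model,capacity_bytes,failure\n"
                       "2019-07-02,X1,MODEL_A,4000787030016,0\n")
    print(afr_by_model([day1, day2]))
```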

~~~
riobard
Thanks for the pointer! :)

------
ksec
I wonder if Backblaze will eventually offer some sort of consumer backup
solution: an app on iOS, Android, Mac and Windows that simplifies backup to
your Backblaze NAS, which in turn backs up to B2.

------
roddux
I've been following these since maybe 2016? I don't remember when they
started. It's striking to note that for as long as I recall, HGST still holds
the crown of lowest annualised failure rate across the board.

------
kortilla
Off-topic but sort of related: is there a hard drive tower designed
specifically for a pool of SSDs? The one I have, with eight 3.5" bays, could
easily fit twice as many of the little SSDs...

~~~
myself248
The term "mobile rack" will find some of the densest ways to shove, for
instance, eight 2.5" drives into a single 5.25" bay. That can get you some
serious density in one of those "cdrom duplicator" tower cases that's all
drive-bays, but heaven help you on the controller side.

------
mikece
And is there _any_ reason to not trust WD Red drives in my home NAS?

~~~
nolok
Your question makes no sense. Any drive can and will fail, and no matter the
brand it may fail immediately or it may last years.

The rule is to avoid having all of your drives come from the same factory
date/batch: even if they're all the same brand, order them from different
stores or whatever, so that in case of a defect they're not all affected.

Also, for a home NAS, ensure you have redundancy (a proper RAID level; avoid
JBOD and RAID 0), and have backups. RAID is not a backup.
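To put rough numbers on why redundancy and the same-batch advice matter, the sketch below compares a single drive with a two-way mirror. It treats AFR as a constant rate per drive-year and assumes failures are independent. Under that model, the mirror loses data only if the second drive dies between the first failure and the end of the rebuild. Drives from the same batch can violate the independence assumption, which is what buying from different batches tries to avoid. None of this replaces backups, since a mirror faithfully copies deletions and corruption too. The 2% AFR and 2-day window are illustrative, not measured values.

```python
import math


def p_fail_single(afr: float, years: float = 1.0) -> float:
    """Chance a lone drive fails within `years`, treating AFR (failures per drive-year) as a constant rate."""
    return -math.expm1(-afr * years)


def p_loss_mirror(afr: float, window_days: float, years: float = 1.0) -> float:
    """Chance a two-way mirror loses data within `years`.

    Loss requires one drive to fail and the survivor to fail before the mirror is
    rebuilt `window_days` later. Assumes independent, constant-rate failures, which
    same-batch drives may violate.
    """
    window_years = window_days / 365.0
    first_failure_rate = 2 * afr  # either of the two drives
    p_second_in_window = -math.expm1(-afr * window_years)
    return -math.expm1(-first_failure_rate * p_second_in_window * years)


if __name__ == "__main__":
    afr = 0.02  # illustrative 2% AFR
    print(f"single drive: {p_fail_single(afr):.4%}")
    print(f"mirror, 2-day rebuild: {p_loss_mirror(afr, 2):.6%}")
```

With a 2% AFR, a single drive has roughly a 2% chance of failing in a year. Under these optimistic assumptions, the mirror's chance of losing data is a few in a million.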

~~~
hddherman
I would add data integrity to the list, in addition to redundancy and backups.

ZFS is great in this regard: redundancy, data scrubbing to ensure data
integrity, and built-in snapshotting and data replication features (zfs
send/receive). I can't believe how I got by without ZFS in my earlier years of
data hoarding.

~~~
nolok
Absolutely right. Most home NASes don't have ZFS but can use Btrfs, which
helps protect against bit rot and offers proper data scrubbing.

------
probo23
Is Backblaze used mainly as a backup solution? Can anyone give me a use case?

~~~
atYevP
Yev here from Backblaze -> the company started by providing unlimited online
backup, and that's a great industry for us. About 4 years ago we released
Backblaze B2 Cloud Storage, which allows developers or sysadmins or
enthusiasts to directly upload/retrieve data to/from our data centers. Our
core competency is data storage - so while most folks do use us for a backup
(either of their Mac or PC on the consumer side or servers/NAS devices with B2
Cloud storage) - what we really do is store and retrieve data.

~~~
bluedino
So you guys basically wrote your own Ceph?

~~~
atYevP
Sort of, but not really: we just wrote APIs that let people talk to our pods
directly. You can read about our architecture here
([https://www.backblaze.com/blog/vault-cloud-storage-architecture/](https://www.backblaze.com/blog/vault-cloud-storage-architecture/))
and check out the APIs and how we built them here (backblaze.com/b2/docs/).
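The linked post describes how a Vault spreads each file's shards across many pods using Reed-Solomon erasure coding, so data survives the loss of several pods. As a much simpler stand-in, the sketch below uses a single XOR parity shard rather than Reed-Solomon, and it is not Backblaze's code. It splits data into k shards plus one parity shard and rebuilds any one missing shard. Reed-Solomon generalizes this idea to several parity shards. The function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size data shards (zero-padded) plus one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]


def decode(shards: List[Optional[bytes]], k: int, length: int) -> bytes:
    """Rebuild the original data when at most one shard (data or parity) is None."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can only recover one lost shard")
    if missing:
        # XOR of all surviving shards equals the missing one.
        present = [s for s in shards if s is not None]
        shards = list(shards)
        shards[missing[0]] = reduce(xor_bytes, present)
    return b"".join(shards[:k])[:length]  # drop the zero padding


if __name__ == "__main__":
    data = b"file contents spread across pods"
    shards = encode(data, k=4)
    shards[1] = None  # pretend one pod is unavailable
    print(decode(shards, k=4, length=len(data)))
```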

