
Hard Drive Reliability Update – Sep 2014 - nuriaion
https://www.backblaze.com/blog/hard-drive-reliability-update-september-2014/
======
disordr
I really want to applaud Backblaze for publishing these reports and stats. Too
many companies closely guard this information that really helps the larger
community. Based on previous blogs from Backblaze, when I built out our
new Hadoop cluster, I purchased 1450 Hitachi drives. I plan to gather our
failure rates and publish them as Backblaze does. Thanks for blazing the path!

~~~
gioele
> I purchased 1450 Hitachi drives.

Isn't using very similar drives a problem because their failure rates are not
statistically independent, so there is a high probability that they will all
fail at the same time?

~~~
Sami_Lehtinen
I've seen it. A RAID5 array with 5 similar disks failed over a weekend. All but
one drive were dead. Of course the disk failure also prevented the daily backup
run, and all hell broke loose on Monday morning. The exact cause of death is
unknown. The important fact is that the RAID was configured on Thursday. So I
guess simultaneous death is most probable when drives are really new and from
the same batch.

~~~
runamok
I saw that once in an array that was on 24/7 for years. One drive was failing,
so they shut the array down to replace it (perhaps hot swap was not an option?)
and almost all of the rest of the drives did not come up. Basically, the heads
stuck to the platters, aka "stiction".

I would guess in your scenario something like that happened. Or perhaps trying
to migrate tons of data to a new disk caused an issue. Just seems unlikely
otherwise.

------
zaroth
Since annual failure rate is a function mostly of age, it would be interesting
to see a line chart of cumulative failure rate vs age. But since new drives
are continually being added to the population, there would be fewer drives in
the data set as you moved up each curve.

I guess you could calculate confidence intervals at quarterly intervals, and
so the error bars would get larger as age increases and 'n' decreases.

How would you calculate the CI for failure rate? It's not binomial or Poisson,
since the cumulative failure rate goes to 1 over time...
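
One way to make this concrete: the cumulative curve isn't binomial, but the per-quarter hazard nearly is. Conditional on a drive surviving to the start of a quarter, the number of failures during that quarter among the n drives still at risk is approximately Binomial(n, p), so an exact Clopper-Pearson interval applies quarter by quarter, and the intervals widen exactly as n shrinks. A minimal Python sketch (all numbers made up):

```python
# Exact binomial CIs on per-quarter hazard among drives still at risk.
# Hypothetical counts, NOT Backblaze's data.
from scipy.stats import beta

def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for a binomial k out of n."""
    lo = 0.0 if k == 0 else beta.ppf(alpha / 2, k, n - k + 1)
    hi = 1.0 if k == n else beta.ppf(1 - alpha / 2, k + 1, n - k)
    return lo, hi

# (drives at risk at the start of the quarter, failures during that quarter)
quarters = [(12000, 150), (9000, 140), (5000, 90), (1200, 30)]
for n, k in quarters:
    lo, hi = clopper_pearson(k, n)
    # Note how the interval widens as 'n' shrinks with age.
    print(f"n={n:6d}  failures={k:4d}  hazard={k/n:.3%}  CI=({lo:.3%}, {hi:.3%})")
```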

A little searching turns up
[http://rmod.ee.duke.edu/statistics.htm](http://rmod.ee.duke.edu/statistics.htm)
which I'm sure completely explains how to do this... (rolls eyes). I hate that
this is how statistics is commonly taught. Knowing which distribution to use
and applying it correctly can actually be intuitive if taught properly. It
doesn't always need to be an exercise in alphabet soup / deriving from base
principles.

~~~
dugmartin
It's been 20 years since I took a reliability engineering class, but I believe
the go-to curve is the negative exponential. Here is a link from a quick
Googling:

[http://www.quanterion.com/FAQ/Exponential_Dist.htm](http://www.quanterion.com/FAQ/Exponential_Dist.htm)

~~~
darkmighty
The exponential lifetime distribution is the model for ideal memoryless
failures: the probability of failure in an interval dt is independent of the
current lifetime and has value lambda*dt. Those are as "random" as failures
get. I suppose hard drives are better modelled by a variable failure rate
lambda(t), which should have a peak for the first few hours/days, settle down
and then start growing quickly after a few months.
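
The usual way to write such a lambda(t) down is the Weibull family, where the shape parameter picks the regime. A small illustrative sketch (the numbers are arbitrary):

```python
# Weibull hazard: h(t) = (k/lam) * (t/lam)**(k-1)
# k < 1: falling hazard (infant mortality); k = 1: constant hazard
# (the memoryless exponential); k > 1: rising hazard (wear-out).
import numpy as np

def weibull_hazard(t, k, lam):
    return (k / lam) * (t / lam) ** (k - 1)

ages = np.array([0.1, 0.5, 1.0, 2.0, 4.0])  # drive age in years
for k, regime in [(0.5, "infant mortality"), (1.0, "memoryless"), (3.0, "wear-out")]:
    print(f"{regime:>16}: {np.round(weibull_hazard(ages, k, lam=3.0), 3)}")
# A "bathtub" hazard is roughly the sum of a falling and a rising component.
```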

------
ChuckMcM
One of the challenges I have with this analysis is that a 'failure' isn't just
that your drive is no longer working, it is that your drive isn't working and
you have to go replace it. The operational cost of replacing a drive has
three parts: loss of production while the drive is offline, operator time to
physically replace the drive and prep it for re-entry into the system, and the
transactional costs of doing a warranty replacement (filling out the RMA form,
getting a valid RMA, shipping the failed drive and receiving the replacement).
We minimize the last by doing RMAs in batches of 20, but it's still a cost
across those 20 drives. (And the population of 40 drives which exist as spares
is effectively not available for production.) It isn't as simple as 'sure,
drives fail a bit more often, but we don't expect to use them that long.'

~~~
acdha
That's a good point, but it also calls for a more complicated analysis, since
you have to account for your ops capacity (e.g. the marginal cost increase of
doing 10 drives instead of 5 is probably a lot less than 2x unless your ops
team is almost completely booked) and do some guessing about whether there
actually is a better option. They addressed that second point with the
discussion of enterprise drives, which certainly matched my own experience.

------
michaelbuckbee
The biggest takeaway was at the end: the "enterprise" drives were slightly
less reliable than the consumer ones, which cost half as much.

~~~
VLM
And that's what's weird: who is the audience for the article? I know what
enterprise drive means and I know the author is smart and knows what it means,
so why the weird implications in the article that have nothing to do with
"enterprise"?

For those not "in the know": the hardware is the same, but desktop-firmware
drives will sit there for 10 seconds or whatever it is, beating on the drive
when there's a read (or write) failure, on the assumption that if your machine
only has one drive you're better off retrying as hard as possible until it
works, and possibly the slowness will motivate the user to replace it (god
forbid an end user have backups lol).

Enterprise firmware, when it has a soft fail, just croaks as fast as possible.
That lets the RAID array hurry up and do its thing, or maybe even higher-level
replication do its thing.

(Edited to add the old startup adage of "fail quickly". That's what enterprise
drives do to keep overall array latency low, which is counterproductive for
consumer non-array drives.)
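
To make that timeout concrete: many drives expose it via SCT Error Recovery Control, which smartctl can query and, on drives that allow it, set. A hedged sketch (it assumes smartctl is installed, /dev/sda is the target drive, and it runs as root):

```python
# Read and set SCT ERC (error recovery control) timeouts via smartctl.
# Values are in tenths of a second: 70 -> 7.0 s, a common RAID/"enterprise"
# setting. Desktop firmware often reports ERC disabled and retries
# internally for much longer.
import subprocess

def read_erc(device="/dev/sda"):
    # `smartctl -l scterc <dev>` prints the current read/write ERC timeouts
    out = subprocess.run(["smartctl", "-l", "scterc", device],
                         capture_output=True, text=True)
    return out.stdout

def set_erc(device="/dev/sda", read_ds=70, write_ds=70):
    # `smartctl -l scterc,70,70 <dev>` sets both timeouts to 7.0 seconds
    out = subprocess.run(["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device],
                         capture_output=True, text=True)
    return out.stdout

if __name__ == "__main__":
    print(read_erc())
```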

Aside from the firmware load, the prices are different because enterprise
drives usually come with a better guarantee and better service. Consumer drives
are statistically never replaced under guarantee, so the marketing department
can claim anything on paper and it won't cost the company anything. Enterprise
drives WILL get replaced, and there will be a paper trail, etc. So the
guarantee for an enterprise drive actually costs something.

Sometimes the firmware has other subtle differences, like how it handles
recalibrations and scrubs (consumer home drives are like "too bad, you get to
wait on my schedule", while again, enterprise drives will go to some effort to
eliminate array latency).

My guess is the article is subtle astroturf by the winning drive mfgr?

~~~
spindritf
_unlike consumer drives which statistically are never replaced under
guarantee_

They're never replaced because regular consumers statistically don't ask for
replacement and simply eat the loss?

~~~
larrys
"They're never replaced because regular consumers statistically don't ask for
replacement and simply eat the loss?"

There is another wrinkle as well. Some people won't ask for a replacement
because, assuming you have to send in the bricked drive, there is the chance
that someone might get at your data somehow.

What about that? (It's why I would never send in a drive that has failed.) [1]

[1] My assumption is that I would have to send in the bad drive (and there is
no way to reformat or for me to easily destroy what might be on there). Anyone
have experience with what happens here?

~~~
spindritf
I encrypt my drives anyway. If they can reanimate the drive and crack the
encryption, well, they deserve to see the data.

~~~
larrys
Question (not a statement). Doesn't encrypting the drive also prevent you from
recovering some of the information on the disk by way of drive recovery
utilities?

So for example you might want to encrypt something super sensitive (which I
do) but decide to not encrypt something less sensitive (say photos or perhaps
a wiki with notes or letters to your grandma or wife or sig other).

Point being that if the drive isn't encrypted you might be able to get at some
of that data. (If you need to). If you've encrypted the drive then can you
still do that?

~~~
spindritf
Yes, the data will probably be completely irrecoverable from an encrypted
drive. However, instead of hoping that you will be able to recover data from a
broken drive, back it up.

I strongly recommend Tarsnap[1] for that. All your data is encrypted before it
leaves your machine, it is run by our very own 'cperciva, and the key used for
encryption (which you need to store securely somewhere, like your parents'
house or a bank) can even be printed, so it is hard to destroy accidentally.

The key itself can be encrypted, too. You can use the same password you use to
encrypt the drive, and now you can safely and securely store all your data in
such a way that only you can ever access it, by remembering a single, longer
phrase.

(Although to be fair, another backup would probably be a good idea if the data
is really important. Maybe another encrypted hard drive kept at work.)

[1] [https://www.tarsnap.com/](https://www.tarsnap.com/)

~~~
larrys
I think that looks good but I have to tell you that the idea of committing
data to what appears to be one person who could get "hit by a bus" worries me.

I'd like to see on sites like that some kind of continuity plan.

Unless you are suggesting that this is just another "redundant array of
backups that you have to assume can fail".

But even in that case it would be a good idea if cperciva posted something on
the site showing that someone else has access and keeps on top of the system.

~~~
rajivm
[http://mail.tarsnap.com/tarsnap-users/msg00849.html](http://mail.tarsnap.com/tarsnap-users/msg00849.html)

------
emodendroket
How times have changed; Seagate used to be (or at least had the reputation of
being) the most reliable and Hitachi the least.

~~~
jcampbell1
At some point, Hitachi Deskstars were referred to as "Death Stars", and that
was all I knew about disk reliability. It is great to have some real
information.
information.

Dell, Google, and Amazon can never write reports like this because the vendor
relationship is important. Because these guys have no relationship and are
buying consumer disks, the world finally gets brand level reliability reports.
Kudos to Backblaze.

~~~
jonknee
> Dell, Google, and Amazon can never write reports like this because the
> vendor relationship is important. Because these guys have no relationship
> and are buying consumer disks, the world finally gets brand level
> reliability reports. Kudos to Backblaze.

Dell maybe, but Google and Amazon consider what goes on in their data centers
to be part of their secret sauce. It doesn't have to do with vendor
relationships, it has to do with a competitive edge. Google and Amazon will
drop a vendor without a second thought if it's even slightly more
advantageous.

------
shiftpgdn
I manage a computation cluster for an oil and gas exploration company. We have
a 50% failure rate (and rising!) on Seagate Constellation drives in 250GB, 1TB
and 2TB configurations. My sample size is fairly small at a few hundred drives,
but man does it keep me busy.

~~~
iSloth
Were they all purchased at the same time? For a rate that high, it sounds more
like a faulty batch or an issue with the environment.

~~~
shiftpgdn
They've been purchased over a span of a few years, so I doubt that's the case.
They did have a single exposure to 110F ambient temperature for a few hours
when the A/C to the server room went out, which may be a contributing factor.

~~~
pixl97
I have found that to be very hard on disks. We lost an AC unit one summer and
over the next month had a much higher rate of disk loss.

------
archgrove
Not that I use even 0.001% of the disks that Backblaze goes through, but my
anecdata suggests the same. The only dead hard disks I have on my desk at the
moment are Seagates, and they dominate the disks I've sent back in the last
few years.

However, they are cheap, and they do honour their warranties. It would just be
nice if they didn't have to quite so often.

~~~
atYevP
I think my new favorite phrase is going to be "anecdata". Love that.

------
saosebastiao
Tangential: When are you going to offer a linux client?

~~~
tachion
Even better, when will be FreeBSD client available?! :)

~~~
spindritf
You're pretty much morally obligated to use Tarsnap when running FreeBSD.

~~~
rakoo
You don't get the same space/price ratio, though...

~~~
mappu
Try attic-backup[1] with one of these[2] super-cheap VPSes. 2.5c/GB/month for
a 50GB block.

1. [https://attic-backup.org/](https://attic-backup.org/)

2. [https://crissic.net/openvz_vps](https://crissic.net/openvz_vps)

~~~
icebraining
That's only cheap because it's a small block. If you need larger ones, you can
get much cheaper rates. I pay 0.64c/GB/month on my 2TB storage with TransIP:
[https://www.transip.eu/vps/big-storage/](https://www.transip.eu/vps/big-storage/)

------
tambourine_man
I love reading these posts from Backblaze, but what I never understood is how
they are getting a cost of ~US$0.05/GB with their storage pods:

[https://www.backblaze.com/blog/why-now-is-the-time-for-backb...](https://www.backblaze.com/blog/why-now-is-the-time-for-backblaze-to-build-a-270-tb-storage-pod/)

At these rates, why not use S3? What am I missing?

~~~
frankchn
Amazon S3 is US$0.0275/GB _per month_ at high volumes, while Backblaze's
US$0.05/GB is a one-time build cost. I assume Backblaze's operational cost of
keeping each pod hooked up and online is much, much less than US$0.0275/GB per
month.
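
Back-of-the-envelope, using the figures in this thread (and ignoring Backblaze's own operating costs):

```python
# One-time pod build cost vs S3's recurring price, figures from the thread.
pod_build_per_gb = 0.05     # US$/GB, one-time (from the storage pod post)
s3_per_gb_month = 0.0275    # US$/GB/month, S3's high-volume tier (2014)

breakeven_months = pod_build_per_gb / s3_per_gb_month
print(f"S3 passes the pod's entire build cost after ~{breakeven_months:.1f} months")
# ~1.8 months: beyond that, S3 keeps charging while the pod hardware is paid for.
```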

~~~
tambourine_man
You're right, sorry. I'm so used to seeing US$/month that I didn't realize it
was the per-GB cost for _building_ the pod.

------
cake
I wish there was something similar for SSDs.

~~~
nl
There is, roughly. This is a lab test as opposed to production monitoring, but
it is still interesting. Obviously SSDs and HDDs are quite different
technically, so the appropriate test is different too. They write data
continuously to see how long the drives last.

_We've now written over a petabyte, and only half of the SSDs remain. Three
drives failed at different points—and in different ways—before reaching the
1PB milestone_[1]

[http://techreport.com/review/26523/the-ssd-endurance-experim...](http://techreport.com/review/26523/the-ssd-endurance-experiment-casualties-on-the-way-to-a-petabyte)

~~~
XzetaU8
Latest results: [http://techreport.com/review/27062/the-ssd-endurance-experim...](http://techreport.com/review/27062/the-ssd-endurance-experiment-only-two-remain-after-1-5pb)

~~~
techrat
Also worth looking at:

[http://us.hardware.info/reviews/4178/10/hardwareinfo-tests-l...](http://us.hardware.info/reviews/4178/10/hardwareinfo-tests-lifespan-of-samsung-ssd-840-250gb-tlc-ssd-updated-with-final-conclusion-final-update-20-6-2013)

Hardware.info did a "write until it dies" test on a couple of 250GB Samsung
SSDs last year. They found that the drives consistently exceeded the 1,000
writes-per-cell spec.

------
makmanalp
Anyone have any reliability information on Hitachi's new NAS drive series?
They're supposed to build on the 7K3000 etc., but are specifically tailored for
NAS/RAID situations, like the WD Reds. One major difference is that they're
7200 rpm instead of the 5400 rpm of most non-high-end NAS drives.

~~~
linker3000
[http://www.hgst.com/hard-drives/internal-drive-kits/nas-desk...](http://www.hgst.com/hard-drives/internal-drive-kits/nas-desktop-drive-kit)

Specs and data sheets

------
justcommenting
There are well-established methods for time-to-failure and time-to-event data
that are not used here. The author makes no effort to control for the multiple,
obvious biases created by the analytical approach employed. A few simple graphs
would give a much more telling view of these data.

~~~
sillysaurus3
Would you mind listing some of those time-to-failure and time-to-event methods,
how the author might control for those biases, and which graphs in particular
the author should have included?

~~~
justcommenting
First, I should've mentioned from the outset that these are really interesting
and useful data, and I'm glad you took the time to generate them. I really wish
consumers could systematically report these data in a way that was
reliable/trustworthy...!

Cox proportional hazards models and Kaplan-Meier (KM) survival curves are the
big kahuna with a data set like this. Basically, my impression is that you'd
want to pretend you're doing a cohort study and essentially analyze it like a
big clinical trial.
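
For a flavor of what that looks like in practice, here is a minimal Kaplan-Meier sketch using the `lifelines` Python library (an assumption; any survival-analysis package would do), with made-up observation times and failure flags:

```python
# Kaplan-Meier survival estimate on hypothetical drive data.
# `days` is how long each drive was observed; `failed` is True if it died
# during observation, False if it was still running at the end (censored).
import numpy as np
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(42)
days = rng.integers(30, 1500, size=500)
failed = rng.random(500) < 0.15

kmf = KaplanMeierFitter()
kmf.fit(durations=days, event_observed=failed, label="all drives")

print(kmf.survival_function_.tail())    # P(drive survives past t)
print(kmf.confidence_interval_.tail())  # Greenwood-based 95% CIs
```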

And re: graphing, since you have sub-samples of the different drive models and
relatively small numbers of models for each brand, the big take-away graph
comparing manufacturers leaves out a lot of useful nuances that could be pulled
out of the table, e.g. that most of the Hitachi drives are newer and you have
fewer of them. It also doesn't portray how consistent failure rates are across
models produced by the same brand, and it looks like there's a significant
range of failure rates across different Seagate models. So even just seeing
IQRs of pooled failure rates for each manufacturer in box plots might be
eye-opening.

Immortal time bias may be something to consider here as well. When you have
some sample groups that mostly include newer individual drives that you have
not yet had for a year on average, subtle differences in how the failure event
is defined can make a big difference in the conclusions you draw, especially in
terms of uncertainty. If I have 10,000 Hitachi drives and 5 of them fail in the
first 6 months, the robustness of the conclusions I can draw from those data is
different in some important ways from similar insights drawn from a sample of
1,000 Hitachi drives I've used for 5 years.

It's also not clear to me how you've dealt with replacement drives. Based on
what I gather from the post (and I could be totally wrong), if you have a bunch
of one model of drive failing and then get replacements for them, some might
argue that the refurbed drives should be analyzed almost like a separate drive
model, since they often have physical differences compared to those bought via
retail channels.

I'd be quite curious to dig into these data a bit further if you're willing to
post the original data set...

Thanks again for posting information that's quite useful and interesting.

~~~
atYevP
Yev from Backblaze here -> We've talked about posting the raw data, but
haven't quite decided on that yet. I'll make sure to forward this to Brian so
he can take a look and see whether or not any of the above would be feasible
for the next one!

------
TheLoneWolfling
I'd be interested to see what a graph of percentage remaining versus time
since installation looks like for these. Might give a better picture of what's
going on.

------
shadeless
I recently bought a 3TB Western Digital Red, following their advice from [1],
but now I see that it has a yearly failure rate of 8.8%. Bummer.

Off-topic, but it's a shame that Backblaze isn't available in some countries;
I'd love to use it. What would be the best alternative to it, Tarsnap?

[1] [https://www.backblaze.com/blog/what-hard-drive-should-i-buy/](https://www.backblaze.com/blog/what-hard-drive-should-i-buy/)

------
robomartin
Thanks for sharing such useful data. I just had a Seagate drive fail. I was
able to recover the data written since the last local backup with various
tools, but it took hours of repair work.

I've been procrastinating about getting off-site backup. This post on HN
reminded me that I've been meaning to get an account going with your company
for a while. I just signed up and will test on my machine before deploying to
other machines in my business. Thank you.

------
rancor
In a small file server, my net failure rate on Seagate's consumer 3TB drives
has been over 50% thus far. The pair of their SAS drives I currently have in
use have been fine, although both of them are still below a year of service
life... Edit: I just checked my drive status, and yet another one has dropped.
If I'm doing my math correctly, that's 75% of the drives that weren't DOA...

------
ars
No Toshiba hard disks, apparently.

HGST and Western Digital are the same company, but it seems they have separate
product lines? It's confusing.

~~~
ShinyObject
The Hitachi drives are Toshiba drives.

The 3.5" desktop and server Hitachi drives are all manufactured by Toshiba. WD
wanted to buy Hitachi's drive division, which would have created a WD/Seagate
duopoly, so European regulators said WD had to sell Hitachi's 3.5" business to
someone else before they would approve the merger. Toshiba stepped up and got
it.

[http://www.anandtech.com/show/5635/western-digital-to-sell-h...](http://www.anandtech.com/show/5635/western-digital-to-sell-hitachis-35-hard-drive-business-to-toshiba-complete-hitachi-buyout)

So yes it is somewhat confusing. You buy one of these drives and it will say
"HGST a Western Digital company" on the box, when in reality it's all made by
Toshiba, probably in an old Hitachi factory.

~~~
blahblah234
Toshiba drives are not exactly Hitachi drives.

First, Toshiba had their own 3.5" drives long before the merger with Hitachi.
For example, the Toshiba MK2002TSKB 2TB 3.5" hard drive had been on sale since
2011.

Toshiba's own designs and factories didn't magically disappear after acquiring
Hitachi's assets. So, after the merger, Toshiba sells some ex-Hitachi drives;
for example, the Toshiba DT01ACA300 3TB is obviously a relabeled Hitachi drive.
But I believe the Toshiba MD03ACA300 3TB and MD04ACA300 3TB drives are based on
Toshiba's own design, because they don't look like Hitachi drives or the
DT01ACA300. I suppose reliability and performance will differ between these
three Toshiba 3TB drives, and it would be interesting to get some info on this.

Also, HGST is wholly owned by Western Digital. Despite the regulators'
requirement, they didn't sell all the 3.5" assets to Toshiba, only some of
them. Most of the good stuff (that is, everything currently sold under the
HGST brand) stayed with WD. So even ex-Hitachi drives sold under the Toshiba
brand (by Toshiba) and under the HGST brand (by WD) might have different
reliability.

------
cgore
My main Linux box has quite a few hard drives in it, bought over a wide span
of time. About 4 weeks ago the oldest of them all died: it was from 2007, so
about 7 years old, which I think is pretty good for a consumer drive that's on
24/7. It was a Western Digital Caviar SE WD3200JB, 320GB. I replaced it with
a 2TB drive.

[No lost data, I do daily backups.]

~~~
luckyno13
I just replaced a WD Black 640 3 years into its life. Blacks have a 5-year
warranty, so they RMA'd it and sent me a 750 in its place.

I do, however, have a Seagate drive lying around somewhere that has almost 10
years on it and still functions flawlessly. But it is admittedly smaller given
its age, and that may contribute to lifespan. Either that or Seagate has
slipped in the last decade.

~~~
sounds
Seagate apparently has slipped in the last 2-3 years. (Could this be related
to the hard disk shortage due to the flooding in Thailand?)

Hard drive failures tend to happen more for drives that are power cycled a
lot, and for drives that undergo big swings in temperature (even when the
temps are all within the rated temp range).

------
BuckRogers
I keep seeing this stuff over the years, so I go buy something other than
Seagate, like WD... and it fails within a year. So I replace it with a Seagate
and have no problems for years. Then I see another report that says Seagate is
terrible; repeat process.

I'm just going to keep using Seagate until my anecdata refutes the reality I
live in.

~~~
atYevP
Yev from Backblaze here -> all of the drives in our datacenter will fail. We
just report on the timing of those failures. We still buy Seagates in droves
because they are a great value. As long as you have backups of the data, any
drive is great!

------
larrys
Question for the OP here (or for anyone else).

Do you burn in new drives before using them? I typically take any new drive
and run some type of stress test [1] on it for 18 to 24 hours to see if it
fails under that initial constant use.

[1] Constant reformatting, for example writing 0's to the entire disk 7 times,
etc.
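
For illustration, here's a hedged sketch of that kind of destructive write/verify pass in Python (the device name is hypothetical; on Linux, `badblocks -wsv` is the battle-tested equivalent):

```python
# Burn-in sketch: fill the whole block device with a pattern, then read it
# back and compare. DESTRUCTIVE -- /dev/sdX is a hypothetical disk you
# intend to wipe, and this must run as root.
CHUNK = 4 * 1024 * 1024  # 4 MiB per I/O

def burn_in_pass(device="/dev/sdX", pattern=b"\x00"):
    buf = pattern * CHUNK
    written = 0
    with open(device, "wb", buffering=0) as disk:   # write pass
        while True:
            try:
                n = disk.write(buf)
            except OSError:                          # hit the end of the device
                break
            written += n
            if n < len(buf):                         # partial final write
                break
    with open(device, "rb", buffering=0) as disk:   # verify pass
        offset = 0
        while offset < written:
            want = min(len(buf), written - offset)
            if disk.read(want) != buf[:want]:
                raise RuntimeError(f"read-back mismatch near byte {offset}")
            offset += want
    return written
```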

~~~
atYevP
Yev from Backblaze here -> Yes, we do burn in the drives before deploying
them. So the drives in the study are ones that have at least made it past that
stage.

~~~
pbhjpbhj
Doesn't that negate a lot of the usefulness of the data? If most Hitachis
failed in the burn-in but all Seagates passed it, then the expected failure
rate of a drive would be wildly different from that suggested by the charts.

Do you have the figures for raw losses, including at burn-in?

I mean, sure, you burn in and do warranty returns, so buying Hitachi would
still seem better - but if one needs a drive to just work, then it's key to
know the overall failure rate.

~~~
brianwski
Brian from Backblaze here. Let's call that the "DOA failure rate", where the
drive is either dead on arrival or does not survive the 2-day initial burn-in
test. I'll ask the boots on the ground for more data, but my impression is the
2-day burn-in doesn't catch many drive failures, let's say less than 1/2 of 1
percent. We have even proposed skipping it, but it gives us confidence that
the REST of the system is wired up correctly, and other components like power
supplies also have infant mortality.

~~~
larrys
Very interesting. It might be an idea to do some post-mortems on the failed
drives. [1]

Why not take apart some failed drives and see what you can find out about why
they failed vs. ones that did not? Perhaps info that might be of interest to
the manufacturer, but in any case it would make a good blog post. Or maybe you
can cannibalize them for parts and use those again.

[1] I did an "air crash investigation" on an RC chopper that had crashed. I
found out that a servo failed because a gear in it was plastic (as opposed to,
I think, brass). Consequently, a jerky move I made was enough to cause the
plastic gear to lose a tooth, and then I lost control.

------
sauere
Hard drive age is a bad parameter to use. It should be the hours the drive was
actually powered on.

~~~
jamroom
I have a feeling that at Backblaze ALL of the drives are powered on more or
less continuously until they fail (i.e. they might sit for a bit at the data
center waiting to go into one of the pods, but after that they pretty much run
until they fail).

~~~
brianwski
Backblaze Employee here - this is correct. Pods get powered up, then pummeled
with customer data until full. Then they stay spinning forever, but the rate
of writes drops to "churn" rates for the rest of the drive's life. This use
pattern is all we have experience with, so if your application is different
(maybe a database that gets heavy load forever) then your outcomes may vary.

------
GravityWell
This is extraordinarily useful and unique. My compliments to Backblaze for
making this available. This is the type of empirical data I would love to have
for as many things as possible: SSDs, monitors, TVs, kitchen appliances,
tablets, cars, etc.

------
BradRuderman
Let's get a blog post describing how they handled reimbursements for the drive
farming. I imagine it was incredibly complicated to cross-reference a receipt
with a product at that scale, especially since all the products were the same.

~~~
andy4blaze
Andy from Backblaze here: Here's the post we did about that if you're
interested: [https://www.backblaze.com/blog/crowdsourcing-hard-drives/](https://www.backblaze.com/blog/crowdsourcing-hard-drives/)

------
arb99
Very off topic, but their html is wrong:

"<a href='[https://www.backblaze.com/blog/hard-drive-reliability-
update...](https://www.backblaze.com/blog/hard-drive-reliability-update-
september-2014'><img) src='[https://www.backblaze.com/blog/wp-
content/uploads/2014/09/bl...](https://www.backblaze.com/blog/wp-
content/uploads/2014/09/blog-fail-drives-manufacture-report2.jpg') alt='Hard
Drive Failure Rates by Model' width='560px' border='0' /></a>"

should be "width='560'" not "width='560px'"

~~~
atYevP
Yev from Backblaze here -> It should work with the 560px, if you're referring
to the "share this" HTML code!

------
mercurialshark
All my WD and Seagate drives have failed within two years of use. Call me the
luckiest.

~~~
B5geek
My small datacenter's results mimic Backblaze's too: dead/dying Seagates all
over the place. So I notified management that we will only be purchasing
Hitachi drives from now on. I have a Backblaze-style server that I recently
converted to FreeBSD & ZFS. I love the drive density that the Backblaze design
offers, but I HATE the lack of physical notification when a drive dies.

Most other file servers have a front-facing drive caddy that usually has LEDs
on the front to indicate disk access or errors. This is great because you can
walk into the datacenter and SEE which disk has failed. With the Backblaze
system you can get /dev/DriveID but not know where in the array that
particular disk is.

~~~
sounds
You definitely don't want to use drives from the same manufacturing run
(batch) on the same array, since they are the most likely to fail all at the
same time.

Second to that, you probably don't want to go single-source for your drives --
maybe use Hitachi with a mix of WD.

~~~
rane
> Second to that, you probably don't want to go single-source for your drives
> -- maybe use Hitachi with a mix of WD.

Can you explain why not?

~~~
bronson
Because then a single firmware bug could wipe you out. Once the drives all hit
4500 hours or 700 bad blocks or some other trigger, they die. It's happened
before.

Could also be caused by bad grease, or shoddy bearings, or pretty much
anything.

There is security in diversity.

