
Hard disc test surprises Google - pg
http://news.bbc.co.uk/2/hi/technology/6376021.stm
======
dankelley
In case you're interested in reading the report, it's at
http://216.239.37.132/papers/disk_failures.pdf at least as of 20070220. The
report has quite a few graphs, but it has a surprisingly unstatistical flavour
about it.

------
rtm
I wonder what this means for practical system design. Do people currently
build assumptions about hard drive failure patterns into their systems, in a
way that they should now change? I suppose independent failure (i.e. that
copying data to two drives is safer than storing it on just one) is the main
assumption behind e.g. RAID; I wonder whether Google has any new insight
there.

~~~
jbert
You should be able to improve over naive RAID by pairing a relatively
high-probability-of-failure drive with a low-probability one. In other words,
what you *shouldn't* do is the common practice of putting two new drives in a
mirror, since they are both in the infant-mortality part of the failure curve.
What this data suggests is that you'll get a smaller chance of losing data
(via simultaneous failure) if you pair a new drive with an older "proven" one
(but not one so old that it is nearing end of life).
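
A back-of-the-envelope sketch of that argument (the AFR figures are invented
for illustration, failures are assumed independent, and none of it comes from
the Google paper):

    # Yearly chance of losing a two-way mirror: one drive fails, and its
    # partner also fails before the rebuild completes. The chance of a
    # failure inside a short rebuild window is approximated as
    # AFR * (window_days / 365).

    def p_mirror_loss(afr_a, afr_b, window_days=3.0):
        w = window_days / 365.0
        return afr_a * (afr_b * w) + afr_b * (afr_a * w)

    AFR_NEW = 0.06      # hypothetical: drive still in the infant-mortality bump
    AFR_PROVEN = 0.02   # hypothetical: mid-life drive past that bump

    print("two new drives: %.2e" % p_mirror_loss(AFR_NEW, AFR_NEW))
    print("new + proven:   %.2e" % p_mirror_loss(AFR_NEW, AFR_PROVEN))

The numbers themselves only reflect the assumed rates, but the comparison
makes the shape of the argument concrete: the mixed pair loses data roughly in
proportion to the ratio of the two failure rates.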

~~~
charliehotel
infant mortality is practically non-existent for enterprise-class drives and
rare for consumer-class drives.

there is something to gain by using drives from different manufacturers (or
different lots from the same manufacturer) within an array.

------
charliehotel
the analysis in this paper is problematic.

the main problem is that the authors didn't look at the data by disk model and
manufacturing lot. ideally you should remove drives with known problems from
the population.

known problems? yes. there aren't any truly horrible drives out there, but
there is the occasional bad bunch. a three-point (or more) difference in afr
(annualized failure rate) between "good" drives and a bad bunch is typical.

disclosure: i reviewed this paper for the FAST program committee.

