
Predicting Disk Replacement Towards Reliable Data Centers [pdf] - zxv
http://www.kdd.org/kdd2016/papers/files/adf0849-botezatuA.pdf
======
zxv
This paper [1] analyzes the Backblaze open-source hard drive reliability data
set [2].

In this paper, researchers at IBM Research Zurich analyze Backblaze's raw
data. They use machine learning to formulate drive replacement rules with
confidence intervals [1,3].

In addition to the factors identified by Backblaze, they identify certain
additional SMART stats that enhance the predictive capability [1, Table 6].

For Hitachi drives they factor in the average spindle spin-up time (SMART 3
raw). For Seagate they factor in the count of operations aborted due to HDD
timeout (SMART 188 raw).
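As a rough illustration of the per-vendor attributes above, here is a minimal
Python sketch that picks the extra SMART column for a drive-day row. It assumes
rows shaped like the Backblaze CSV files (a `model` string plus
`smart_3_raw` / `smart_188_raw` columns). The prefix-based vendor guess,
including treating HGST models as Hitachi, and the names `EXTRA_FEATURES`,
`vendor_of` and `extra_features` are my own assumptions, not something taken
from the paper.

```python
# Extra SMART columns per vendor, as described in the comment above.
# Column names are assumed to follow the Backblaze CSV layout.
EXTRA_FEATURES = {
    "hitachi": ["smart_3_raw"],    # spin-up time
    "seagate": ["smart_188_raw"],  # command timeouts
}


def vendor_of(model):
    """Guess the vendor from the model string (hypothetical prefix heuristic)."""
    name = model.strip().lower()
    if name.startswith(("hitachi", "hgst")):
        return "hitachi"
    if name.startswith("st"):
        return "seagate"
    return None


def extra_features(row):
    """Return the vendor-specific extra SMART values for one drive-day row."""
    columns = EXTRA_FEATURES.get(vendor_of(row["model"]), [])
    # Skip values that are missing or empty, which is common in raw CSV rows.
    return {c: int(row[c]) for c in columns if row.get(c) not in (None, "")}


seagate_row = {"model": "ST4000DM000", "smart_3_raw": "0", "smart_188_raw": "12"}
hitachi_row = {"model": "Hitachi HDS722020ALA330", "smart_3_raw": "566", "smart_188_raw": ""}
other_row = {"model": "WDC WD30EFRX", "smart_3_raw": "5", "smart_188_raw": "1"}
```

Rows from vendors not listed simply yield an empty dict, so the rest of a
feature pipeline can treat the extra attributes as optional.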

[1] Botezatu et al, KDD 2016, Predicting Disk Replacement towards Reliable
Data Centers,
[http://www.kdd.org/kdd2016/papers/files/adf0849-botezatuA.pd...](http://www.kdd.org/kdd2016/papers/files/adf0849-botezatuA.pdf)

[2] What SMART Stats Tell Us About Hard Drives,
[https://www.backblaze.com/blog/what-smart-stats-indicate-
har...](https://www.backblaze.com/blog/what-smart-stats-indicate-hard-drive-
failures/)

[3] Predicting disk failures for reliable clouds, IBM Research Blog,
[https://www.ibm.com/blogs/research/2016/08/predicting-
disk-f...](https://www.ibm.com/blogs/research/2016/08/predicting-disk-
failures-reliable-clouds/)

~~~
andy4blaze
The Botezatu et al paper is quite good. The recommendations they make are well
worth further consideration in our environment. -- Andy at Backblaze.

------
hbogert
Eager to know if their model would still be accurate when new/future hard disk
models are used, i.e., would their model need to be retrained every time? If
so, you would first have to wait until the new disks are old enough to show
signs of wear and tear before the machine learning techniques can say
something meaningful. So in practice I'm afraid this approach gives you
pretty high confidence only for last generation's hard disks. But companies
like Backblaze probably buy the newest type of hardware every new
"hardware-season".

~~~
zxv
Good question: will newer firmware continue to behave the same?

The Backblaze dataset itself covers a span of time. One could train a model
on an early portion of the data and test its predictions on the later
portion. That could be one way to approach the question.
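To make the early/late idea concrete, here is a minimal sketch in Python. The
records are synthetic, the field names (`date`, `failure`, `smart_188_raw`)
are assumed to mirror the Backblaze CSV layout, and the "model" is just a toy
threshold rule standing in for whatever the paper actually uses. The point is
only the time-ordered split: fit on data before a cutoff, score on data after
it.

```python
from datetime import date


def temporal_split(records, cutoff):
    """Split records into (early, late); the cutoff date itself goes to late."""
    early = [r for r in records if r["date"] < cutoff]
    late = [r for r in records if r["date"] >= cutoff]
    return early, late


def fit_threshold(records, feature):
    """Toy model: the smallest feature value seen on a failed drive-day."""
    failed = [r[feature] for r in records if r["failure"] == 1]
    return min(failed) if failed else None


def evaluate(records, feature, threshold):
    """Return (precision, recall) of the rule `feature >= threshold`."""
    tp = fp = fn = 0
    for r in records:
        predicted = threshold is not None and r[feature] >= threshold
        actual = r["failure"] == 1
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Synthetic drive-day records, for illustration only.
records = [
    {"date": date(2015, 1, 10), "smart_188_raw": 0, "failure": 0},
    {"date": date(2015, 2, 10), "smart_188_raw": 5, "failure": 1},
    {"date": date(2015, 3, 10), "smart_188_raw": 1, "failure": 0},
    {"date": date(2015, 7, 10), "smart_188_raw": 7, "failure": 1},
    {"date": date(2015, 8, 10), "smart_188_raw": 6, "failure": 0},
    {"date": date(2015, 9, 10), "smart_188_raw": 2, "failure": 1},
]

early, late = temporal_split(records, cutoff=date(2015, 7, 1))
threshold = fit_threshold(early, "smart_188_raw")
precision, recall = evaluate(late, "smart_188_raw", threshold)
```

Splitting by date rather than at random matters here: a random split would
let the model see drive-days from the same period it is later tested on, which
hides exactly the drift the question is about.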

One unstated assumption here is that the hard drives in the data set are
running the hard drive manufacturer's retail firmware, not another storage
vendor's (Dell, HP, EMC, etc.) modified firmware. I believe this is the case,
and it may also contribute to the consistency.

------
mkj
So who's going to make us all a nice fork of smartmontools that does all the
prediction?

~~~
zxv
The code is in SVN, but there's also a mirror on GitHub:
[https://github.com/mirror/smartmontools](https://github.com/mirror/smartmontools)

