But I have a question: why do they share this info? Is it to show they're reliable, out of curiosity, or for some other reason?
Plus, along the way some folks find out about us and sign up for the services we offer (B2 Cloud Storage and Computer Backup), which is nice too! We also like these conversations, and at the end of the day, it's fun!
I'd like to encourage your organization to keep publishing these reports, and work like your Storage Pod designs. It's really spurred on a lot of innovation and sharing.
Thanks for a great product. I've signed up so many people.
I've loved the service so far, too. There have been a few times over the years when I needed to recover data from the backup, and it worked perfectly every time.
It has many facets. It's attractive for us nerds. It also helps their suppliers spot problematic models, and it shows off their technical prowess. Lastly, I always take a look at the most failure-prone models and try to avoid them in the data center, if I can.
In case someone from Backblaze reads this :)
That's because a "worn out" flash sector is never fully worn out: it can still store some data, just with more errors than the error correction can fix. It is possible to pair two sectors and use the extra space for additional error-correction data, still recovering the contents. Now you have less than half the performance.
Worn-out flash also doesn't hold data long, perhaps only a few hours before too many bits have flipped and it becomes unreadable. To fix that, you need to periodically rewrite the data, which slows everything down further.
And once you have a bunch of unreliable sectors, you also need "super-sectors" that do sector-level hierarchical erasure coding to recover data from sectors where even the methods above couldn't prevent loss. This slows down writes even more.
In the worst case, reading a single sector requires reading every other sector on the drive to reconstruct it (see the toy sketch below). Clearly that's going to be slow enough that the drive will have stopped being used long before it gets there.
Sadly, some drive firmware doesn't implement some or all of the above, so such drives appear to have "failed" and become unreadable, which IMO is inexcusable when it's easy to design firmware so that worn-out drives merely become slow instead.
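Here's a minimal toy sketch of the super-sector idea, assuming plain XOR parity over fixed-size 512-byte sectors; real firmware (where it exists) would use stronger codes like Reed-Solomon and do all of this internally. It shows the trade-off: one lost sector is recoverable, but only by reading every other sector in its group.

    from functools import reduce

    SECTOR_SIZE = 512  # bytes; an illustrative assumption, not a firmware spec

    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    def parity_sector(data_sectors: list[bytes]) -> bytes:
        """XOR of all data sectors in a 'super-sector' group."""
        return reduce(xor, data_sectors, bytes(SECTOR_SIZE))

    def recover_group(sectors: list[bytes | None], parity: bytes) -> list[bytes]:
        """Rebuild at most one unreadable (None) sector from the parity."""
        missing = [i for i, s in enumerate(sectors) if s is None]
        if len(missing) > 1:
            raise ValueError("XOR parity can only rebuild a single lost sector")
        if missing:
            # XORing the parity with every surviving sector yields the lost
            # one, which is why worst-case recovery touches the whole group.
            known = (s for s in sectors if s is not None)
            sectors[missing[0]] = reduce(xor, known, parity)
        return sectors

Reads stay fast while everything is healthy; the cost only shows up once a sector in the group is actually lost.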
Do you have any examples of, or references to, drives that implement your suggested algorithms?
I wouldn't expect drives to have sector-splicing and super-sectors (though multi-level cells do regularly store fewer bits per cell), to indefinitely shrink their capacity as writes fail, or to rewrite data every few hours. Frequent rewriting in particular would wear the drive out completely, if that weren't already preceded by catastrophic data loss.
SSD firmware is also a spectrum between "correct" and "fast", and I know of no SSDs that, for example, guarantee all acknowledged data survives a power failure. Sure, many SSDs typically manage that, but it isn't a guarantee when the power fails under worst-case conditions.
Old Apple SSDs, for example, have a special extra wire on the connector specifically to signal "impending power failure" to help them do that. PC SSDs don't even have a standard message meaning "power failure expected in 250 milliseconds".
If there's one trend I've seen across generations of HDDs, it's that, excluding some problematic generations like the first SATA Seagate Barracudas and early WD Caviars which died for no reason at all, each new generation is more reliable than the previous one, regardless of class (datacenter / consumer).
For the last 10 years or so (starting with the introduction of the first WD Green / Blue / Black series), HDDs have been exceptionally reliable unless you abuse them on purpose (e.g., continuous random read/write benchmarking).
This year I replaced two 11-year-old WD Blacks, which had never had any problems, with two IronWolf Pro NAS drives, because I wanted something dense and PMR. At the office, I swapped out an old Seagate Constellation ES.2 (a.k.a. Barracuda enterprise) drive that I had pulled from an old disk storage unit, since it started to develop bad sectors. IIRC, it was around 10 years old too, with a much heavier workload history.
It looks like the biggest differentiators between enterprise and consumer drives are the command sets they support and the features they bundle. NAS and other enterprise drives have features that make them more reliable in harsher conditions (heat, vibration, the operational knocks induced by hot swapping, etc.).
If you're getting enterprise disks as part of a storage unit, they probably ship with special firmware developed for that brand anyway, so they're not off-the-shelf enterprise drives.
At the end of the day, for normal operating conditions, device class doesn't matter for the home user, but for density and speed, you might need to get an enterprise drive anyway.
You can. I've been getting the 16 TB drives and they are great, but with caveats. The size makes volume creation and expansion insanely long (like a week per added drive; see the rough math below). Not Seagate's fault, I know.
The noise. They are loud. They are either chirping away or grinding away.
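For what it's worth, the "week per added drive" figure is plausible with some back-of-envelope math, assuming (hypothetically) that an expansion has to rewrite parity across all 16 TB at an effective throughput far below the drive's raw sequential speed:

    capacity_bytes = 16e12        # a 16 TB drive
    effective_mb_per_s = 30       # assumed effective expansion throughput
    seconds = capacity_bytes / (effective_mb_per_s * 1e6)
    print(f"{seconds / 86400:.1f} days")  # ~6.2 days, i.e. roughly a week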
The NAS has been replaced, but I still have the drives, just for labbing / testing purposes.
I never had a drive failure during the lifetime of the NAS, probably because it was off most of the time and only powered on with wake-on-LAN when needed.
So those drives don’t have many hours on them. But recently they started dying. I lost 3 of them this year during some tests.
Mind you, these drives are probably 10+ years old.
Age does seem to matter.
Obviously this is a small, uncontrolled sample, but it seems you really should keep this in mind when you run a NAS at home. Keep an eye on the SMART parameters, as Backblaze suggests, and seriously consider replacing drives at some point. I would be afraid of drives starting to die at the same time due to age.
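For anyone wanting to automate that check, here's a minimal sketch against smartmontools' real `smartctl -A` output. Backblaze's SMART posts single out attributes 5, 187, 188, 197, and 198 as the ones most correlated with failure; the device path and the line parsing here are simplified assumptions (raw values for some attributes are multi-field):

    import subprocess

    # Attributes Backblaze highlights: reallocated sectors (5), reported
    # uncorrectable errors (187), command timeouts (188), current pending
    # sectors (197), offline uncorrectable (198).
    WATCHED = {"5", "187", "188", "197", "198"}

    def check(device: str = "/dev/sda") -> None:
        out = subprocess.run(
            ["smartctl", "-A", device],
            capture_output=True, text=True, check=True,
        ).stdout
        for line in out.splitlines():
            fields = line.split()
            # attribute-table rows start with the numeric attribute ID
            if fields and fields[0] in WATCHED and fields[-1] != "0":
                print(f"{device}: SMART {fields[0]} ({fields[1]}) raw={fields[-1]}")

    check()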
If so, it would be interesting to know how many drives failed on the day of a power cycle vs days with no power cycle.
I know other providers have found that "power cycle days" can be 100x more deadly for drives than "non-power-cycle days". That can have a massive impact when estimating data-loss probabilities, since unforeseen power-cycle days tend to affect more than one drive at a time...
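Backblaze's raw per-drive-day CSVs would let someone approximate this: treat a day where SMART 12 (power cycle count) ticked up as a "power-cycle day" and compare failure rates. A rough pandas sketch, with the single combined file as an assumption (the published data is actually split across daily files):

    import pandas as pd

    df = pd.read_csv("drive_stats.csv", parse_dates=["date"])
    df = df.sort_values(["serial_number", "date"])

    # Did this drive's power-cycle count increase since its previous report?
    prev = df.groupby("serial_number")["smart_12_raw"].shift()
    df["power_cycled"] = df["smart_12_raw"] > prev

    # Failure rate per drive-day, on power-cycle days vs all other days.
    rates = df.groupby("power_cycled")["failure"].mean()
    print(rates)
    print(f"ratio: {rates[True] / rates[False]:.0f}x")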
Hard drives are quite fragile: if they're dropped hard by the delivery person (e.g., onto your concrete steps), they could break.
That curve doesn't seem to match the data here. Or if it does, it says the "old" increase in failure rate happens at over 5 years.
I would guess Backblaze will replace these old drives because they are too small / too slow / use too much power before they replace them for being too unreliable.
I've been involved in server farms with thousands of (mostly Intel) SSDs and (mostly WD) spinning drives. The spinning drives tended to show pre-failure indicators, but we couldn't find any indicators preceding SSD failure; generally they would just completely disappear from the bus when they failed. The SSD failure rate was significantly lower, though. Our write rate wasn't very high and tended to be small writes; for busy disks, more than we could do with a spinning disk, but usually nowhere near the capability of the drives.
This "show your work" strategy helps me trust them at the end of the day. With this kind of storage being a commodity, a high level of openness could be a competitive advantage.
And then we could compute the life expectancy of a given model, conditional on how long you've had it (just bought, a few years old, etc.).
It'd be fun to compare those across vendors and drive models. There's maybe even enough data at this point that some of the numbers might be meaningful! =)
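That conditional life expectancy is a survival-analysis question, and Backblaze's public per-day CSVs (date, serial_number, model, failure, ...) have enough to estimate it. Here's a sketch using the real `lifelines` library's Kaplan-Meier estimator, which correctly treats still-alive drives as censored; the file path and the age bookkeeping are simplified assumptions, and per-model numbers would just mean grouping by model first:

    import pandas as pd
    from lifelines import KaplanMeierFitter

    df = pd.read_csv("drive_stats.csv", parse_dates=["date"])

    # Collapse to one row per drive: days observed, and whether it failed.
    per_drive = df.groupby("serial_number").agg(
        days=("date", lambda d: (d.max() - d.min()).days + 1),
        failed=("failure", "max"),
    )

    kmf = KaplanMeierFitter().fit(per_drive["days"], per_drive["failed"])

    # Chance a drive that's already survived 3 years makes it to 5.
    age, horizon = 365 * 3, 365 * 5
    p = kmf.predict(horizon) / kmf.predict(age)
    print(f"P(reach {horizon} days | alive at {age} days) = {p:.2f}")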
Backblaze Hard Drive Stats - https://news.ycombinator.com/item?id=26696876 - Apr 2021 (108 comments)
Backblaze Hard Drive Stats for 2020 - https://news.ycombinator.com/item?id=25917466 - Jan 2021 (175 comments)
Backblaze Hard Drive Stats Q3 2020 - https://news.ycombinator.com/item?id=24838106 - Oct 2020 (112 comments)
Backblaze Hard Drive Stats Q2 2020 - https://news.ycombinator.com/item?id=24200048 - Aug 2020 (119 comments)
Backblaze Hard Drive Stats Q1 2020 - https://news.ycombinator.com/item?id=23156815 - May 2020 (95 comments)
Hard Drive Stats for 2019 - https://news.ycombinator.com/item?id=22299251 - Feb 2020 (163 comments)
Backblaze Hard Drive Stats Q3 2019 - https://news.ycombinator.com/item?id=21515084 - Nov 2019 (122 comments)
What S.M.A.R.T Stats Can Tell You About a Business - https://news.ycombinator.com/item?id=21331995 - Oct 2019 (3 comments)
Backblaze Hard Drive Stats Q2 2019 - https://news.ycombinator.com/item?id=20624464 - Aug 2019 (136 comments)
Backblaze Hard Drive Stats Q1 2019 - https://news.ycombinator.com/item?id=19788157 - Apr 2019 (21 comments)
Backblaze Hard Drive Stats for 2018 - https://news.ycombinator.com/item?id=18969264 - Jan 2019 (158 comments)
Backblaze Drive Stats: 2018 Q3 Hard Drive Failure Rates - https://news.ycombinator.com/item?id=18229776 - Oct 2018 (49 comments)
Hard Drive Stats for Q2 2018 - https://news.ycombinator.com/item?id=17601687 - Jul 2018 (16 comments)
Backblaze's Hard Drive Stats for Q1 2018 - https://news.ycombinator.com/item?id=16967146 - May 2018 (30 comments)
Backblaze Hard Drive Stats for 2017 - https://news.ycombinator.com/item?id=16283844 - Feb 2018 (31 comments)
Hard Drive Stats for Q3 2017 - https://news.ycombinator.com/item?id=15559679 - Oct 2017 (47 comments)
Applying medical statistics to Backblaze Hard Drive stats - https://news.ycombinator.com/item?id=15438052 - Oct 2017 (3 comments)
Backblaze's Hard Drive Stats for Q2 2017 - https://news.ycombinator.com/item?id=15125340 - Aug 2017 (143 comments)
Hard Drive Stats for Q1 2017 - https://news.ycombinator.com/item?id=14300634 - May 2017 (4 comments)
Backblaze Hard Drive Stats for 2016 - https://news.ycombinator.com/item?id=13531768 - Jan 2017 (132 comments)
Backblaze hard drive reliability stats for Q3 2016 - https://news.ycombinator.com/item?id=12959457 - Nov 2016 (110 comments)
What SMART Stats Tell Us About Hard Drives - https://news.ycombinator.com/item?id=12655445 - Oct 2016 (68 comments)
Hard Drive Stats for Q2 2016 - https://news.ycombinator.com/item?id=12210509 - Aug 2016 (57 comments)
One Billion Drive Hours and Counting: Q1 2016 Hard Drive Stats - https://news.ycombinator.com/item?id=11712989 - May 2016 (135 comments)
Backblaze Hard Drive Reliability Stats for Q3 2015 - https://news.ycombinator.com/item?id=10387962 - Oct 2015 (19 comments)
Hard Drive Reliability Stats for Q1 2015 - https://news.ycombinator.com/item?id=9589234 - May 2015 (26 comments)
Predicting Hard Drive Failures with SMART Stats - https://news.ycombinator.com/item?id=8745062 - Dec 2014 (20 comments)
I actually thought there had been more of these.