Something important that I don't think many people bring up is the cost of these drives in a RAID, taking reliability into account.
When you're using this as the sole drive in a desktop machine or something like that, paying extra for an HGST is pretty straightforward, but I see too many people passing over Seagate or WD when they intend to put them in a RAID, for the sake of "reliability", without (it seems to me) much thought about whether it's really worth it.
Sorting by $/GB for Hitachi, Seagate and WD on drives >= 4TB (http://pcpartpicker.com/products/internal-hard-drive/#m=19,3...), you get:
- 4TB WD: $129
- 4TB Seagate: $114
- 4TB Hitachi (HGST): $158
So in order to get the reliability of an HGST you're paying a ~38% premium. Is the extra reliability actually helpful in a RAID?
If you look at, say, a RAID6 array aiming for 16TB of usable storage (six 4TB drives: four data plus two parity), an array of HGST's longest-running drive (0.11% annual failure rate) would have roughly a 0.000016% chance of losing data in a given year (0.0011^3 * 6 * 5 * 4 -- chance of the first drive failing = 6x the single-drive rate, the second = 5x, and so on), assuming nobody replaces a failed disk, since three drives need to fail.
Now an array of Seagate's longest running drive (2.66% failure rate) of the same size would have a chance of failure of ~0.225853% in a given year.
But, the cost of the disks in the Hitachi array would be 6x158 = $948 while the price of the disks in the Seagate array would be 6x114 = $684. For the price of the Hitachi disks you can get 8 Seagate disks.
So what happens if you just add the extra Seagate disks to the array as additional parity? Now you need 5 drive failures, giving an equation that looks like (8 * 7 * 6 * 5 * 4 * 0.0266^5) * 100, i.e. a chance of failure of about 0.008949% in a given year -- still way higher than the Hitachi array.
In the end buying Hitachi/HGST seems like the right choice anyway but I thought it interesting since I hadn't seen anyone else look at things this way.
If anyone has any problems with my math, please feel free to point it out, my stats background is pretty limited.
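For anyone who wants to sanity-check it, here's a rough Python sketch of the same calculation (assuming independent drives and a constant annual failure rate, which are both shaky assumptions). It also computes the exact binomial "at least k of n fail" probability for comparison:

    from math import comb  # Python 3.8+

    def rough_estimate(n, k, p):
        # the permutation-style estimate used above, e.g. 6 * 5 * 4 * p^3;
        # it counts orderings, so it comes out about k! times the exact answer
        est = p ** k
        for i in range(k):
            est *= (n - i)
        return est

    def p_at_least(n, k, p):
        # exact chance that at least k of n independent drives fail in a year,
        # each with a constant annual failure rate p
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

    # 6x HGST at 0.11%/yr, RAID6: data loss needs 3 failures
    print(rough_estimate(6, 3, 0.0011) * 100, p_at_least(6, 3, 0.0011) * 100)
    # 8x Seagate at 2.66%/yr, 4 data + 4 parity: data loss needs 5 failures
    print(rough_estimate(8, 5, 0.0266) * 100, p_at_least(8, 5, 0.0266) * 100)

The exact numbers come out smaller than my rough ones (since the rough version counts orderings), but the ranking between the two arrays is the same.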
I think a better way to look at it is not the chance of total data loss, but the expected running cost over time. Even with just RAID6 it's hard to imagine a scenario in which you would be unable to replace one of the failed drives and resilver before getting two further failures, so the naive reliability calculations are extremely biased.
So if we assume that we can safely operate at some fixed level of redundancy (say 3 copies) across all hard drive vendors, the only question is (how many drives you need to replace per year) × (the price of those drives).
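As a very rough sketch of that, using the failure rates and prices quoted above (and ignoring warranty replacements, resilver time, power and so on):

    # expected drive replacements per year x unit price,
    # using the annual failure rates and prices quoted above
    configs = {
        "6x HGST ($158 each, 0.11%/yr)":    (6, 0.0011, 158),
        "8x Seagate ($114 each, 2.66%/yr)": (8, 0.0266, 114),
    }
    for name, (drives, afr, price) in configs.items():
        print("%s -> expected replacement cost: $%.2f/yr" % (name, drives * afr * price))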
The problem for me at least is that most of the drives I use come with 3ish year warranties, so they're RMA'ed for free. By the time I'd have to buy a replacement I'd be intending to upgrade anyway.
RAID6 is unusable for many types of "always on" data, in particular databases backing high volume sites; the performance drops so much during a rebuild that for many uses it might as well be offline.
Then you use mirrors; you can apply a similar calculation there. Lower price per disk means you can put more disks in each mirror and compensate for lower per-disk reliability.
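For example, with the same numbers (and the same independence / constant-rate assumptions) as the RAID6 comparison above:

    # annual chance that every disk in a single mirror fails, plus disk cost,
    # using the per-drive failure rates and prices quoted earlier in the thread
    mirrors = {
        "2-way HGST mirror ($158/disk)":    (2, 0.0011, 158),
        "2-way Seagate mirror ($114/disk)": (2, 0.0266, 114),
        "3-way Seagate mirror ($114/disk)": (3, 0.0266, 114),
    }
    for name, (width, afr, price) in mirrors.items():
        print("%s: %.6f%% per year, $%d in disks" % (name, (afr ** width) * 100, width * price))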
Obviously the type of redundancy used will depend on your specific application requirements, including performance.
Good analysis and I've wondered this myself, though this assumes failure rate is linear rather than accelerating over time, that failures are independent (vs a design flaw that causes them to fail around the same time, which happens enough that people recommend buying drives from separate batches / date of mfg), and you have to factor in the cost of a larger RAID array + power to support the same level of reliability. It may make sense in data centers but the smaller you go, the bigger the jump in cost from drive arrays (e.g. a Synology 2-bay, 4-bay, 5-bay, and 8-bay are each pretty significant jumps). I too am on team HGST because of that.
>though this assumes failure rate is linear rather than accelerating over time, that failures are independent (vs a design flaw that causes them to fail around the same time, which happens enough that people recommend buying drives from separate batches / date of mfg)
I saw this happen once. A drive died, it was replaced and while the array was rebuilding several more failed. They ended up losing the entire SAN.
This is why I try not to buy many disks at once, just buy one at a time, and if I need more than one I'll go to different stores. Too much chance of getting a bad batch of disks and having the whole array go down (I've had that happen too).
Of course, that doesn't work at a large scale, but then you have other methods of increasing redundancy.
Yeah, it's by no means perfect. Another problem that comes up is that when one drive fails, the load on the others increases, which can lead to more failures. It'd be really interesting to see whether anyone has looked into the stats on this, but I haven't come across anything.
As for paying for the power + drive bays, that's true of the consumer NAS devices, but if you buy used servers instead it isn't such a problem. For the price of a 12-bay Synology DiskStation you can buy a used 24-bay 4U Supermicro 846E16-R1200B with a couple of CPUs and 144GB of RAM.
Not a great choice if you're short on space but can be decent otherwise.
Looks like Hitachi (HGST) is still leading in terms of reliability. I actually have a 400GB Hitachi drive that came in my Dell Dimension 9100 a little over eleven years ago. It has a few bad sectors but it's still chugging along just fine.
I used to shy away from Seagate drives, and even though the stats here are for 3.5" drives, it doesn't look like WD drives are any more reliable than the Seagates. I was planning to get a 4TB 2.5" portable USB drive, and right now it's just Seagate that offers one without features that make me nervous. WD has one, but not in the Elements line, just the My Passport, which includes hardware AES encryption; I don't want to accidentally lock or encrypt the disk because of that and then need a Windows tool to get at it.
I do wish they'd spend more time on their client. I chose them specifically because of how they publish results like these, but when your install can get corrupted like this [1], it doesn't inspire confidence.
Heh. I switched my wife from CrashPlan to Backblaze about 18 months ago because I was tired of the CrashPlan client being buggy. Backblaze has just worked.
I don't like CrashPlan very much, but my backups are way faster to them than they ever were to Backblaze. My home connection has 200mbit upload, so I don't think the bottleneck is on my side.
If you want to use more bandwidth in Backblaze, after you install find the Backblaze Control Panel, open the "Settings..." dialog, go to the "Performance" tab and UNSELECT "Automatic Throttle". Slide the manual slider all the way to the far far right. Then dial up the number of threads to 4 or 6. Seriously, you don't need to go any higher on the threads, after we use 100% of your bandwidth you can't get it to backup any faster. :-)
Spoiler alert, the answer is a surprisingly firm no.
Not even an optimistic "we're working on it!".
Doesn't even have to be "Linux support", just open an API, we'll do the rest! Use quotas for API users if the concern is around abuse, but please help us Linux folks use your service!
Please!
We support Linux now with our command line tool found here: https://www.backblaze.com/b2/docs/quick_command_line.html
There is no GUI, but the command line tool can "sync" your files with an arbitrary rollback time (the Mac/Windows clients are hard-coded to 30 days).
For good or bad, syncing to B2 charges "per GByte". If you have less than 1 TByte this will be cheaper than the $5/month for the Mac or Windows client. But if you have more than 1 TByte on Linux, it will cost you about $5/TByte/Month to back it up.
Wait, wouldn't that be standard? I've never used online backup, but I would naively assume that Linux would just have a block device or FUSE driver for these things.
On second thought, block devices are sort of sketchy for this use case - but FUSE still seems like a perfectly natural way to export networked storage.
I'd expect a backup hierarchy to just be a directory of read-only snapshots of my files - I mean, this is how I organize backups locally either way (e.g. /snapshot/YYYY-MM-DD/etc/whatever.conf).
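To make that concrete, here's a minimal read-only FUSE passthrough in Python using the third-party fusepy package. It's only a sketch of the interface: it exposes an existing local snapshot tree, whereas a real backup client would pull data from the provider's API instead, but the "mount a directory of snapshots" shape is the same:

    import errno, os, sys
    from fuse import FUSE, FuseOSError, Operations  # pip install fusepy

    class ReadOnlySnapshots(Operations):
        # Expose an existing snapshot tree (e.g. /snapshot/YYYY-MM-DD/...) read-only.
        def __init__(self, root):
            self.root = root

        def _full(self, path):
            return os.path.join(self.root, path.lstrip('/'))

        def getattr(self, path, fh=None):
            st = os.lstat(self._full(path))
            keys = ('st_mode', 'st_size', 'st_nlink', 'st_uid', 'st_gid',
                    'st_atime', 'st_mtime', 'st_ctime')
            return {k: getattr(st, k) for k in keys}

        def readdir(self, path, fh):
            return ['.', '..'] + os.listdir(self._full(path))

        def open(self, path, flags):
            if flags & (os.O_WRONLY | os.O_RDWR):
                raise FuseOSError(errno.EROFS)  # refuse any write access
            return os.open(self._full(path), os.O_RDONLY)

        def read(self, path, size, offset, fh):
            os.lseek(fh, offset, os.SEEK_SET)
            return os.read(fh, size)

        def release(self, path, fh):
            return os.close(fh)

    if __name__ == '__main__':
        # usage: python ro_snapshots.py /snapshot /mnt/backups
        FUSE(ReadOnlySnapshots(sys.argv[1]), sys.argv[2], foreground=True, ro=True)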
The advantage of using a standard API like filesystems and FUSE is that you can use it easily and from the terminal without needing to deal with bloated GUIs and JVMs that are no doubt hard to install, hard to use and hard to rely on.
A JVM is also a pretty hefty requirement to have on a machine. (I personally avoid installing it at all costs, mostly because everything that targets the JVM seems to be crud to begin with.)
I have yet to use a single backup client that didn't suck (i.e. churn my CPU, get in my way, etc.). That being said, Druva InSync has been OK-ish so far.
I had to stop using crashplan because there is no way to roll back revisions within a file. If a file gets corrupted it'll just back up the corrupted version on top of what was previously a good backup.
? That's not what I see in crashplan. Expand the tree view until you get to the file level, and then expand the files: it shows the times of the backup revisions. You can also specify a target time to restore to at the bottom of the restore tab.
CrashPlan has versioning that is set (by default) to back up changed files in 15 minute increments. If a file is corrupted, CrashPlan will attempt to back up the changes as it is not a corruption detection utility.
However, its versioning allows a user to easily restore the uncorrupted file by selecting a date prior to the corruption occurring.
The default versioning can be adjusted, so ultimately it's the responsibility of the individual user to confirm the health of their backup and verify the versioning frequency that best fits their needs.
The failure in that picture is that somebody deleted the file found at C:\Program Files (x86)\Backblaze\bzbui_interface.xml
That extremely simple file contains all the strings for the Backblaze GUI in all 11 languages. What your screenshot shows is what happens when that file is missing.
If you run a Backblaze installer over the top of that installation it should repair that by placing the latest version in place as it always does. If you STILL cannot get that file installed, something is going entirely, totally hog-wild wrong with your system. Hundreds of people install every day and that file is always present.
If you were running along and suddenly that file disappeared from your computer, I would stop and prepare a full restore from 15 days ago from Backblaze. This is totally free. When files randomly disappear from the hard drive, one cause might be your hard drive is starting to die.
Yev from Backblaze here - looking at those screens, that's what happens when the installer doesn't install correctly (obviously). It can happen for a bunch of reasons, usually the easy stuff like anti-virus and all that jazz, but it usually clears up if you reboot and retry.
That doesn't look that busted to me, it appears that something prevented localized strings from loading, so we're just seeing the translation tags. It's not surprising that they don't fit in the UI, cause they're kind of long -- probably much larger than the localized strings are.
Brian from Backblaze here, you are absolutely correct. I'll copy and paste my answer from above to here:
The failure in that picture is that somebody deleted the file found at C:\Program Files (x86)\Backblaze\bzbui_interface.xml
That extremely simple file contains all the strings for the Backblaze GUI in all 11 languages. What your screenshot shows is what happens when that file is missing.
If you run a Backblaze installer over the top of that installation it should repair that by placing the latest version in place as it always does. If you STILL cannot get that file installed, something is going entirely, totally hog-wild wrong with your system. Hundreds of people install every day and that file is always present.
If you were running along and suddenly that file disappeared from your computer, I would stop and prepare a full restore from 15 days ago from Backblaze. This is totally free. When files randomly disappear from the hard drive, one cause might be your hard drive is starting to die.
It looks to me like this analysis is ignoring the age of the drives. Specifically, it might be comparing the current failure rate of years-old drives against the failure rate of months-old drives, which is likely to make newer disks look much better.
I'd love to see some attempts at calculating mean time to failure (which is admittedly difficult to estimate until you have had a disk long enough to see a whole generation of them fail).
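For what it's worth, the usual way around the age problem is an annualized failure rate per age bucket rather than one blended number. A sketch, assuming you had (hypothetical) per-drive records of (model, days in service, failed or not):

    from collections import defaultdict

    # Hypothetical input: one record per drive, (model, days_in_service, failed).
    # AFR for an age bucket = failures in that bucket / drive-years of exposure in it.
    def afr_by_age(records, bucket_days=365):
        exposure = defaultdict(float)  # drive-days per (model, age-bucket)
        failures = defaultdict(int)
        for model, days, failed in records:
            # spread each drive's service time across the age buckets it lived through
            age = 0
            while age < days:
                chunk = min(bucket_days - age % bucket_days, days - age)
                exposure[(model, age // bucket_days)] += chunk
                age += chunk
            if failed:
                # attribute the failure to the age bucket the drive died in
                failures[(model, (days - 1) // bucket_days)] += 1
        return {key: 100.0 * failures[key] / (exposure[key] / 365.0) for key in exposure}

    # e.g. afr_by_age([("modelA", 400, True), ("modelA", 900, False)])

You still can't say much about old age for models that haven't been in service long enough, which is basically the point about MTTF being hard to estimate.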
The problem with the Google information is that their rig is full-custom everything: silicon, firmware, and interface, so it can't really inform your buying decisions.
'sshd' usually refers to the SSH daemon command.
To dknoll: SSHD, for Solid State Hybrid Disk, means that it has components of both drive systems: spinning platters plus a functional amount of NAND for frequently used applications.
You could use it purely for its SSD, assuming the drive controller continues to function. Though that is a ridiculous proposal.
I understand what the device does and why they named it as they did. It's an awkward use of language, that's all I'm pointing out. I think 'Solid State Hybrid Disk' is a syntactically incorrect name, as it suggests the entire device is solid state when it is not. Hybrid State Disk, perhaps?
I'm being super pedantic here, but see how you said hybrid before solid state in your third sentence when explaining it? That's basically what I just suggested would be a clearer name.
They are very different domains though; reading the whole sentence mentioning SSHD is probably enough to determine which of the two is being talked about...