Something important that I don't think many people bring up is the cost of these drives in a RAID, taking reliability into account.
When you're using this as the sole drive in a desktop machine or something like that, paying extra for an HGST is pretty straightforward, but I see too many people passing over Seagate or WD when they intend to put them in a RAID, for the sake of "reliability", without (it seems to me) much thought about whether it's really worth it.
Sorting by $/GB for Hitachi, Seagate and WD on drives >= 4TB (http://pcpartpicker.com/products/internal-hard-drive/#m=19,3...), you get:
- 4TB WD: $129
- 4TB Seagate: $114
- 4TB Hitachi (HGST): $158
So in order to get the reliability of an HGST you're paying a ~38% premium. Is the extra reliability actually helpful in a RAID?
If you look at, say, a RAID6 array aiming for 16TB of usable storage (six 4TB drives: four data plus two parity), an array of HGST's longest-running drive (0.11% annual failure rate) would have roughly a 0.000016% chance of losing data in a given year (0.0011^3 * 6 * 5 * 4 -- chance of the first drive failing = 6x the single-drive rate, the second = 5x, and so on), assuming nobody replaces a failed disk, since three drives need to fail.
Now an array of Seagate's longest running drive (2.66% failure rate) of the same size would have a chance of failure of ~0.225853% in a given year.
But, the cost of the disks in the Hitachi array would be 6x158 = $948 while the price of the disks in the Seagate array would be 6x114 = $684. For the price of the Hitachi disks you can get 8 Seagate disks.
So what happens if you just add the extra Seagate disks to the array as additional parity? Now you need 5 drive failures, giving an equation that looks like (8 * 7 * 6 * 5 * 4 * 0.0266^5) * 100, i.e. a chance of failure of about 0.008949% in a given year -- still way higher than the Hitachi array.
In the end buying Hitachi/HGST seems like the right choice anyway but I thought it interesting since I hadn't seen anyone else look at things this way.
If anyone has any problems with my math, please feel free to point it out, my stats background is pretty limited.
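For anyone who wants to sanity-check it, here's a rough Python sketch of the same calculation (assuming independent drives and a constant annual failure rate, which are both shaky assumptions). It also computes the exact binomial "at least k of n fail" probability for comparison:

    from math import comb  # Python 3.8+

    def rough_estimate(n, k, p):
        # the permutation-style estimate used above, e.g. 6 * 5 * 4 * p^3;
        # it counts orderings, so it comes out about k! times the exact answer
        est = p ** k
        for i in range(k):
            est *= (n - i)
        return est

    def p_at_least(n, k, p):
        # exact chance that at least k of n independent drives fail in a year,
        # each with a constant annual failure rate p
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

    # 6x HGST at 0.11%/yr, RAID6: data loss needs 3 failures
    print(rough_estimate(6, 3, 0.0011) * 100, p_at_least(6, 3, 0.0011) * 100)
    # 8x Seagate at 2.66%/yr, 4 data + 4 parity: data loss needs 5 failures
    print(rough_estimate(8, 5, 0.0266) * 100, p_at_least(8, 5, 0.0266) * 100)

The exact numbers come out smaller than my rough ones (since the rough version counts orderings), but the ranking between the two arrays is the same.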
I think a better way to look at it is not the chance of total data loss, but the expected running cost over time. Even with just RAID6 it's hard to imagine a scenario in which you would be unable to replace one of the failed drives and resilver before getting two further failures, so the naive reliability calculations are extremely biased.
So if we assume that we can safely operate at some fixed level of redundancy (say 3 copies) across all hard drive vendors, the only question is (how many drives you need to replace per year) × (the price of those drives).
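As a very rough sketch of that, using the failure rates and prices quoted above (and ignoring warranty replacements, resilver time, power and so on):

    # expected drive replacements per year x unit price,
    # using the annual failure rates and prices quoted above
    configs = {
        "6x HGST ($158 each, 0.11%/yr)":    (6, 0.0011, 158),
        "8x Seagate ($114 each, 2.66%/yr)": (8, 0.0266, 114),
    }
    for name, (drives, afr, price) in configs.items():
        print("%s -> expected replacement cost: $%.2f/yr" % (name, drives * afr * price))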
The problem for me at least is that most of the drives I use come with 3ish year warranties, so they're RMA'ed for free. By the time I'd have to buy a replacement I'd be intending to upgrade anyway.
RAID6 is unusable for many types of "always on" data, in particular databases backing high volume sites; the performance drops so much during a rebuild that for many uses it might as well be offline.
Then you use mirrors; you can apply a similar calculation there. Lower price per disk means you can put more disks in each mirror and compensate for lower per-disk reliability.
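For example, with the same numbers (and the same independence / constant-rate assumptions) as the RAID6 comparison above:

    # annual chance that every disk in a single mirror fails, plus disk cost,
    # using the per-drive failure rates and prices quoted earlier in the thread
    mirrors = {
        "2-way HGST mirror ($158/disk)":    (2, 0.0011, 158),
        "2-way Seagate mirror ($114/disk)": (2, 0.0266, 114),
        "3-way Seagate mirror ($114/disk)": (3, 0.0266, 114),
    }
    for name, (width, afr, price) in mirrors.items():
        print("%s: %.6f%% per year, $%d in disks" % (name, (afr ** width) * 100, width * price))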
Obviously the type of redundancy used will depend on your specific application requirements, including performance.
Good analysis and I've wondered this myself, though this assumes failure rate is linear rather than accelerating over time, that failures are independent (vs a design flaw that causes them to fail around the same time, which happens enough that people recommend buying drives from separate batches / date of mfg), and you have to factor in the cost of a larger RAID array + power to support the same level of reliability. It may make sense in data centers but the smaller you go, the bigger the jump in cost from drive arrays (e.g. a Synology 2-bay, 4-bay, 5-bay, and 8-bay are each pretty significant jumps). I too am on team HGST because of that.
>though this assumes failure rate is linear rather than accelerating over time, that failures are independent (vs a design flaw that causes them to fail around the same time, which happens enough that people recommend buying drives from separate batches / date of mfg)
I saw this happen once. A drive died, it was replaced and while the array was rebuilding several more failed. They ended up losing the entire SAN.
This is why I try not to buy many disks at once, just buy one at a time, and if I need more than one I'll go to different stores. Too much chance of getting a bad batch of disks and having the whole array go down (I've had that happen too).
Of course, that doesn't work at a large scale, but then you have other methods of increasing redundancy.
Yeah, it's by no means perfect. Another problem that comes up is that when one drive fails, the load on the others increases, which can lead to more failures. It'd be really interesting to see whether anyone has looked into the stats on this, but I haven't come across anything.
As for paying for the power + drive bays, that's true of the consumer NAS devices, but if you buy used servers instead it isn't such a problem. For the price of a 12-bay Synology DiskStation you can buy a used 24-bay 4U Supermicro 846E16-R1200B with a couple of CPUs and 144GB of RAM.
Not a great choice if you're short on space but can be decent otherwise.
Looks like Hitachi (HGST) is still leading in terms of reliability. I actually have a 400GB Hitachi drive that came in my Dell Dimension 9100 a little over eleven years ago. It has a few bad sectors but it's still chugging along just fine.
I used to shy away from Seagate drives, and even though the stats here are for 3.5" drives, it doesn't look like WD drives are any more reliable than the Seagates. I was planning to get a 4TB 2.5" portable USB drive, and right now it's just Seagate that offers one without features that make me nervous. WD has one, but not in the Elements line, just the My Passport, which includes hardware AES encryption; I don't want to accidentally lock or encrypt the disk because of that and then need a Windows tool to get at it.
I do wish they'd spend more time on their client. I chose them specifically because of how they publish results like these, but when your install can get corrupted like this [1], it doesn't inspire confidence.
Heh. I switched my wife from CrashPlan to Backblaze about 18 months ago because I was tired of the CrashPlan client being buggy. Backblaze has just worked.
I don't like CrashPlan very much, but my backups are way faster to them than they ever were to Backblaze. My home connection has 200mbit upload, so I don't think the bottleneck is on my side.
If you want to use more bandwidth in Backblaze, after you install find the Backblaze Control Panel, open the "Settings..." dialog, go to the "Performance" tab and UNSELECT "Automatic Throttle". Slide the manual slider all the way to the far far right. Then dial up the number of threads to 4 or 6. Seriously, you don't need to go any higher on the threads, after we use 100% of your bandwidth you can't get it to backup any faster. :-)
Spoiler alert, the answer is a surprisingly firm no.
Not even an optimistic "we're working on it!".
Doesn't even have to be "Linux support", just open an API, we'll do the rest! Use quotas for API users if the concern is around abuse, but please help us Linux folks use your service!
Please!
We support Linux now with our command line tool found here: https://www.backblaze.com/b2/docs/quick_command_line.html
There is no GUI, but the command line tool can "sync" your files with an arbitrary rollback time (the Mac/Windows clients are hard-coded to 30 days).
For good or bad, syncing to B2 charges "per GByte". If you have less than 1 TByte this will be cheaper than the $5/month for the Mac or Windows client. But if you have more than 1 TByte on Linux, it will cost you about $5/TByte/Month to back it up.
Wait, wouldn't that be standard? I've never used online backup, but I would naively assume that Linux would just have a block device or FUSE driver for these things.
On second thought, block devices are sort of sketchy for this use case - but FUSE still seems like a perfectly natural way to export networked storage.
I'd expect a backup hierarchy to just be a directory of read-only snapshots of my files - I mean, this is how I organize backups locally either way (e.g. /snapshot/YYYY-MM-DD/etc/whatever.conf).
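To make that concrete, here's a minimal read-only FUSE passthrough in Python using the third-party fusepy package. It's only a sketch of the interface: it exposes an existing local snapshot tree, whereas a real backup client would pull data from the provider's API instead, but the "mount a directory of snapshots" shape is the same:

    import errno, os, sys
    from fuse import FUSE, FuseOSError, Operations  # pip install fusepy

    class ReadOnlySnapshots(Operations):
        # Expose an existing snapshot tree (e.g. /snapshot/YYYY-MM-DD/...) read-only.
        def __init__(self, root):
            self.root = root

        def _full(self, path):
            return os.path.join(self.root, path.lstrip('/'))

        def getattr(self, path, fh=None):
            st = os.lstat(self._full(path))
            keys = ('st_mode', 'st_size', 'st_nlink', 'st_uid', 'st_gid',
                    'st_atime', 'st_mtime', 'st_ctime')
            return {k: getattr(st, k) for k in keys}

        def readdir(self, path, fh):
            return ['.', '..'] + os.listdir(self._full(path))

        def open(self, path, flags):
            if flags & (os.O_WRONLY | os.O_RDWR):
                raise FuseOSError(errno.EROFS)  # refuse any write access
            return os.open(self._full(path), os.O_RDONLY)

        def read(self, path, size, offset, fh):
            os.lseek(fh, offset, os.SEEK_SET)
            return os.read(fh, size)

        def release(self, path, fh):
            return os.close(fh)

    if __name__ == '__main__':
        # usage: python ro_snapshots.py /snapshot /mnt/backups
        FUSE(ReadOnlySnapshots(sys.argv[1]), sys.argv[2], foreground=True, ro=True)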
The advantage of using a standard API like filesystems and FUSE is that you can use it easily and from the terminal without needing to deal with bloated GUIs and JVMs that are no doubt hard to install, hard to use and hard to rely on.
A JVM is also a pretty hefty requirement to have on a machine. (I personally avoid installing it at all costs, mostly because everything that targets the JVM seems to be crud to begin with.)
I have yet to use a single backup client that didn't suck (i.e. churn my CPU, get in my way, etc.). That being said, Druva InSync has been OK-ish so far.
I had to stop using crashplan because there is no way to roll back revisions within a file. If a file gets corrupted it'll just back up the corrupted version on top of what was previously a good backup.
? That's not what I see in crashplan. Expand the tree view until you get to the file level, and then expand the files: it shows the times of the backup revisions. You can also specify a target time to restore to at the bottom of the restore tab.
CrashPlan has versioning that is set (by default) to back up changed files in 15 minute increments. If a file is corrupted, CrashPlan will attempt to back up the changes as it is not a corruption detection utility.
However, its versioning allows a user to easily restore the uncorrupted file by selecting a date prior to the corruption occurring.
The default versioning can be adjusted, so ultimately it's the responsibility of the individual user to confirm the health of their backup and verify the versioning frequency that best fits their needs.
The failure in that picture is that somebody deleted the file found at C:\Program Files (x86)\Backblaze\bzbui_interface.xml
That extremely simple file contains all the strings for the Backblaze GUI in all 11 languages. What your screenshot shows is what happens when that file is missing.
If you run a Backblaze installer over the top of that installation it should repair that by placing the latest version in place as it always does. If you STILL cannot get that file installed, something is going entirely, totally hog-wild wrong with your system. Hundreds of people install every day and that file is always present.
If you were running along and suddenly that file disappeared from your computer, I would stop and prepare a full restore from 15 days ago from Backblaze. This is totally free. When files randomly disappear from the hard drive, one cause might be your hard drive is starting to die.
Yev from Backblaze here - looking at those screens, that's what happens when the installer doesn't install correctly (obviously). It can happen for a bunch of reasons, usually the easy stuff like anti-virus and all that jazz, but it usually clears up if you reboot and retry.
That doesn't look that busted to me, it appears that something prevented localized strings from loading, so we're just seeing the translation tags. It's not surprising that they don't fit in the UI, cause they're kind of long -- probably much larger than the localized strings are.
Brian from Backblaze here, you are absolutely correct. I'll copy and paste my answer from above to here:
The failure in that picture is that somebody deleted the file found at C:\Program Files (x86)\Backblaze\bzbui_interface.xml
That extremely simple file contains all the strings for the Backblaze GUI in all 11 languages. What your screenshot shows is what happens when that file is missing.
If you run a Backblaze installer over the top of that installation it should repair that by placing the latest version in place as it always does. If you STILL cannot get that file installed, something is going entirely, totally hog-wild wrong with your system. Hundreds of people install every day and that file is always present.
If you were running along and suddenly that file disappeared from your computer, I would stop and prepare a full restore from 15 days ago from Backblaze. This is totally free. When files randomly disappear from the hard drive, one cause might be your hard drive is starting to die.
It looks to me like this analysis is ignoring the age of the drives. Specifically, it might be comparing the current failure rate of years-old drives against the failure rate of months-old drives, which is likely to make newer disks look much better.
I'd love to see some attempts at calculating mean time to failure (which is admittedly difficult to estimate until you have had a disk long enough to see a whole generation of them fail).
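For what it's worth, the usual way around the age problem is an annualized failure rate per age bucket rather than one blended number. A sketch, assuming you had (hypothetical) per-drive records of (model, days in service, failed or not):

    from collections import defaultdict

    # Hypothetical input: one record per drive, (model, days_in_service, failed).
    # AFR for an age bucket = failures in that bucket / drive-years of exposure in it.
    def afr_by_age(records, bucket_days=365):
        exposure = defaultdict(float)  # drive-days per (model, age-bucket)
        failures = defaultdict(int)
        for model, days, failed in records:
            # spread each drive's service time across the age buckets it lived through
            age = 0
            while age < days:
                chunk = min(bucket_days - age % bucket_days, days - age)
                exposure[(model, age // bucket_days)] += chunk
                age += chunk
            if failed:
                # attribute the failure to the age bucket the drive died in
                failures[(model, (days - 1) // bucket_days)] += 1
        return {key: 100.0 * failures[key] / (exposure[key] / 365.0) for key in exposure}

    # e.g. afr_by_age([("modelA", 400, True), ("modelA", 900, False)])

You still can't say much about old age for models that haven't been in service long enough, which is basically the point about MTTF being hard to estimate.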
The problem with the Google information is that their rig is full-custom everything: silicon, firmware, and interface, so it can't really inform your buying decisions.
'sshd' usually refers to the SSH daemon command.
To dknoll: SSHD, for Solid State Hybrid Disk, means that it has components of both drive systems: spinning platters plus a functional amount of NAND for frequently used applications.
You could use it purely for its SSD, assuming the drive controller continues to function. Though that is a ridiculous proposal.
I understand what the device does and why they named it as they did. It's an awkward use of language, that's all I'm pointing out. I think 'Solid State Hybrid Disk' is a syntactically incorrect name, as it suggests the entire device is solid state when it is not. Hybrid State Disk, perhaps?
I'm being super pedantic here, but see how you said hybrid before solid state in your third sentence when explaining it? That's basically what I just suggested would be a clearer name.
They are very different domains though; reading the whole sentence mentioning SSHD is probably enough to determine which of the two is being talked about...