They also open sourced their server chassis which is awesome!
It's nice to see a reduction of the failure rates as a whole. It looks like the next few years will be some interesting times for the growth in storage capacities for HDDs.
Is there any real justification for doing this kind of test on a new drive?
Doing it takes quite a while, so I've been wondering lately if it's even worth it. I've never found anything with it.
They have another good idea in there: recording the SMART data before and after the test. Their theory is that even if SMART attributes are still OK after the test, looking for a trend in the attributes may point to eventual failure later. Nice.
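A minimal sketch of that before/after comparison, assuming you have already pulled the raw attribute values into two dictionaries (for example by hand from `smartctl -A` output). The values and the `WATCH` list here are made up for illustration:

```python
def smart_trend(before, after):
    """Return {attribute: change} for raw values that differ between snapshots."""
    changes = {}
    for name, old in before.items():
        new = after.get(name)
        if new is not None and new != old:
            changes[name] = new - old
    return changes

# Hypothetical raw values recorded before and after a burn-in run.
before = {"Reallocated_Sector_Ct": 0, "Current_Pending_Sector": 0, "Power_On_Hours": 2}
after = {"Reallocated_Sector_Ct": 8, "Current_Pending_Sector": 0, "Power_On_Hours": 170}

# Attributes where any increase is worth a second look (an assumption, not a standard).
WATCH = ("Reallocated_Sector_Ct", "Current_Pending_Sector")

delta = smart_trend(before, after)
suspicious = {k: v for k, v in delta.items() if k in WATCH and v > 0}
print(suspicious)
```

The drive can still report "PASSED" overall while a counter like this creeps up, which is the trend the comment is talking about.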
$ cd /Volumes/your_new_SD_card
$ dd if=/dev/urandom of=test.bin bs=10000000 count=1000
$ for i in $(seq 1 16); do cp test.bin test$i.bin; done
1. It's well known that hard drives have a higher failure rate at the beginning of their lives than in the middle (see the bathtub curve). So it's not absurd to test them hard early on, before writing any useful data, and to do an early RMA if something turns up.
2. The failure rate on drives is low enough that his methodology may be right even if he never sees a single failure in his life. That doesn't make it insane.
Next day the HDD didn’t turn on. Completely dead. :( I’ve never had a failure since, but I back up everything now.
Apparently, quality assurance should never be a thing.
So then I started doing math on MTBF and lots of drives and things looked bleak. Just a couple years later MTBF had gone way up and continued to climb for a while after, but at that particular moment it was something like eight months between failures if you had any more drives than we had, and it just accelerated from there.
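For anyone who wants to redo that math, here is a rough sketch. It assumes a constant failure rate, so the expected time between failures across a fleet is just the per-drive MTBF divided by the number of drives. The MTBF and drive count below are hypothetical, not the numbers from the story:

```python
HOURS_PER_MONTH = 730  # rough average, 8760 / 12

def hours_between_failures(mtbf_hours, drive_count):
    """Expected hours between failures anywhere in the fleet (constant-rate model)."""
    return mtbf_hours / drive_count

# Hypothetical: 500,000 hour MTBF drives, 100 of them spinning.
fleet_hours = hours_between_failures(500_000, 100)
print(fleet_hours, "hours, about", round(fleet_hours / HOURS_PER_MONTH, 1), "months")
```

Double the drive count and the interval halves, which is the "accelerated from there" part.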
Because if it's the latter, it could've been caused by some other part of IO stack and not necessarily the drive itself.
insanity | inˈsanədē |
the state of being seriously mentally ill; madness
I'm also curious why those companies wouldn't directly talk to backblaze. I read somewhere a blog post on how they bought specific drives online at a sale.
> I read somewhere a blog post on how they bought specific drives online at a sale.
Yea, we used to buy drives wherever we could, but that was years ago. We're larger now so we go through more established channels.
What is your procedure/policy on which disks to use in the pods? Do you try and maybe control the risk by using different harddisk brands in a single storage pod? Or do you just not care, because there have never been 3 pods dead at the same time? :)
Do you still use 17 data plus 3 parity shards?
> In Q3 we added 79 HGST 12TB drives (model: HUH721212ALN604) to the farm. While 79 may seem like an unusual number of drives to add, it represents “stage 2” of our drive testing process. Stage 1 uses 20 drives, the number of hard drives in one Backblaze Vault tome. That is, there are 20 Storage Pods in a Backblaze Vault, and there is one “test” drive in each Storage Pod. This allows us to compare the performance, etc., of the test tome to the remaining 59 production tomes (which are running already-qualified drives). There are 60 tomes in each Backblaze Vault. In stage 2, we fill an entire Storage Pod with the test drives, adding 59 test drives to the one currently being tested in one of the 20 Storage Pods in a Backblaze Vault.
Seagates, in general, could always be expected to fail in a 3 year span. Often multiple times. IBM/Hitachi/HGST, especially the UltraStar enterprise line, would basically never fail. Over 20 years and hundreds of discs concurrently running (switching out as opportunity arose), we probably RMAed single digits of HGST drives.
In comparison, while I had a few pockets of Seagate drives that didn't fail (I had 6 in a storage server at home that never gave me problems), we could generally expect 5% of Seagate drives that we had running to fail in a given year. Enterprise or regular didn't really seem to matter.
For us, replacing a drive was fairly expensive, using a limited resource (our time). So we gravitated to HGST almost exclusively.
But: We also did an extensive burn-in process before a drive went into production. Basically: "badblocks -svw" for a week. He noticed that we had some drives that would fall out of RAID arrays, but if we ran badblocks on them they would never report an error. My theory was that there were some marginal sectors that would bit-rot. Running a week of badblocks would exercise those and allow bad-block remapping to remove them from use.
Remember the IBM "Deathstar" 75GXP? We even had good luck with those. I had one of them start freaking out, and I was aware of the "Deathstar" name, so I went to replace it with another drive I had on hand. When I pulled it out to replace it I realized it was HOT. Not in the grand scheme of things, but definitely hot to the touch. I looked up the temp specs and it was clearly above that. The box it was in had 2 5-inch bays that didn't have the covers on them. I covered those bays, turned the machine back on after re-installing the Deathstar, and the drive continued operating for another 3-5 years with no problem. Made me wonder if improper cooling was the source of those reports.
You've had far more experience with this stuff than I have. I wonder what your view is on all these NAS RAID boxes and HDD longevity and heat.
I personally never saw so many HDD failures until I started running them inside a small NAS I bought some years ago. I nicknamed it my "drive killer" because I was replacing the things so often. Other HDDs I have in other (non-NAS) machines sure seemed to last much longer.
As for the Deathstar story: I had a similar experience with a Seagate. Inside the NAS, one of the SMART error counts started to rise in a foreboding way, so I replaced it (with an HGST) before it had a chance to die. Then, out of curiosity, I tried the old Seagate in a different non-critical setting (non-NAS) and it's continued to work for two more years. IDK. Too much heat?
If you see this repeatedly with one machine, I would definitely consider that it is the machine which is bad rather than the drives. A power supply may be insufficient to support the peak requirements of the drives, for example. A poor mounting structure might resonate or transfer vibration or shock into one or more drive slots from the environment.
It might seem minor, but just watch this rather old now but infamous video of Brendan Gregg shouting at HDDs causing significant drops in throughput:
In the past I've used a lot of those Supermicro 5x3" drive carriers and they have really good airflow. This was for storage in servers I built for home use. I wrote up something in 2008 about one I built here: www.tummy.com/articles/ultimatestorage2008/
Faster spinning disks generate more heat and vibrations than slower spinning ones. 15K drives are notorious for needing good environmentals.
My current storage needs are pretty limited, which probably would make SSD attractive. Or we are getting there at least. My storage server right now has 6TB used, and around 1TB of that is backups. If I could get by with 4x 2TB SSDs, I might consider it, just for longevity's sake. I'm using ZFS BTW.
However, I can't for the life of me find out whether or not those bays will work with 6 Gb/s SATA. I've been considering putting my Ryzen 7 board into the NAS case to consolidate two systems into one, but I'm not going to go to that effort if I have to get new hotswap bays.
Perhaps heat could have accelerated the process by loosening the material somehow, but it was definitely a design/manufacturing flaw.
THE ONLY WAY YOU CAN VERIFY THAT YOU DID INDEED RECEIVE THE FULL 5 YEAR HGST WARRANTY IS TO CALL HGST TECHNICAL SERVICE, GIVE THEM THE SERIAL NUMBER AND ASK IF THE DRIVE IS NON-OEM, OEM OR REMANUFACTURED.
I have ordered expecting the full 5 year warranty and, after verifying with technical service, have discovered: one drive with over 18,000 hours on it (from CrystalDiskInfo), several OEM drives (no manufacturer warranty), and a remanufactured one. And once in a while I get lucky and receive what was advertised: a full 5 year manufacturer's warranty.
I can generally get the same or relatively comparable prices at local computer stores that I trust significantly more than Amazon.
This includes drives that were shrink-wrapped! I've heard bestbuy re-wraps returns and doesn't check them very well.
I now have at hetzner.de a host that has 3 4TB Seagate drives and I ended up building a 3-disks-raid1 instead of a raid5 because I'm too scared that when one drive fails, an additional one will fail as well during the rebuild-process when the remaining ones are stressed (a raid5 would then implode in this situation).
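A back-of-the-envelope sketch of that worry, assuming each remaining drive independently has the same chance `p` of dying before the rebuild finishes. The value of `p` is made up, and real drives from the same batch are probably not independent:

```python
def p_loss_raid5_rebuild(p, remaining=2):
    """RAID5 is lost if ANY of the remaining drives fails during the rebuild."""
    return 1 - (1 - p) ** remaining

def p_loss_mirror_rebuild(p, remaining=2):
    """A 3-way mirror is lost only if ALL remaining copies fail during the rebuild."""
    return p ** remaining

p = 0.05  # hypothetical per-drive failure chance during one rebuild
print("raid5 :", p_loss_raid5_rebuild(p))
print("mirror:", p_loss_mirror_rebuild(p))
```

The price is capacity: three 4TB drives give roughly 8TB usable as raid5 and 4TB as a 3-way raid1.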
The larger factor is the amount of labor involved in dealing with a bad drive.
Their (large) operation seems automated so I assume if they have a single drive failure in machine #1234 (which has 50+ drives) they have an automated way of switching off said drive. Then they leave it there until the entire rack is replaced many years later.
Other operations have to send a human to manually replace every single failed drive as soon as it fails, which is very expensive as a % of the cost of the bare drive.
They can get large numbers of them cheap, and their system is good at detecting and replacing bad drives, so why not use them?
Maybe it's just my social/business groups, but "because reasons" doesn't have to have a negative meaning. I use it as more a statement of fact: There are reasons for this.
They have over 20 thousand HGST drives now, according to this year's report, which seems significant to me.
They continue to have lower failure rates than the Seagates.
If you're an individual person with a 12-drive ZFS RAIDZ2 with hotspare, in a file server in your garage or something, you might prefer to pay the premium to buy HGST, because what if you have a double drive failure during a three week period while you're on vacation out of the country?
That Seagate 10 TB was the singular standout in terms of reliability. Maybe it is worth checking out.
You can't argue with results.
Sometimes, personality dynamics cause certain approaches to work better than one might prefer. :)
I would be surprised if the number of drives they have is actually "statistically insignificant" * but I haven't crunched the numbers.
* By this I guess you mean that the null hypothesis of "they have the same failure rate" is not rejected.
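If someone does want to crunch the numbers, one rough way is a normal-approximation test on two failure rates, treating failures as Poisson counts over drive-years of exposure. This is a sketch of a generic textbook test, not anything Backblaze publishes, and the counts below are hypothetical:

```python
from math import erfc, sqrt

def rate_difference_p_value(fail_a, years_a, fail_b, years_b):
    """Two-sided p-value for H0: both groups share one failure rate.

    Uses a pooled-rate normal approximation, so it is only reasonable
    when both groups have a decent number of failures.
    """
    pooled = (fail_a + fail_b) / (years_a + years_b)
    se = sqrt(pooled * (1 / years_a + 1 / years_b))
    if se == 0:
        return 1.0  # no failures at all, nothing to distinguish
    z = (fail_a / years_a - fail_b / years_b) / se
    return erfc(abs(z) / sqrt(2))

# Hypothetical: 20 failures vs 150 failures, each over 10,000 drive-years.
print(rate_difference_p_value(20, 10_000, 150, 10_000))
```

A tiny p-value means "same failure rate" is hard to defend. With only a handful of failures in one group, an exact test would be the safer choice.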
Granted, I'm not a hardware guy so my takeaway could've been wrong, but that's what it looked like to me when reading them.
I'd expect on next year's report, those particular drives will show up as 49 drives with around 42000 drive days, assuming they aren't replaced by then.
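For reference, the annualized failure rate in those reports is, as far as I understand it, computed from drive days rather than drive count. A small sketch, with a hypothetical failure count:

```python
def annualized_failure_rate(failures, drive_days):
    """AFR in percent: failures per drive-year of operation."""
    drive_years = drive_days / 365
    return failures / drive_years * 100

# Hypothetical: 1 failure over the roughly 42,000 drive days mentioned above.
print(round(annualized_failure_rate(1, 42_000), 3), "%")
```

With that few drive days, one failure more or less moves the percentage a lot, so small groups are best read with wide error bars.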
If you have terabytes to back up, are there still any backup services left that'll let you ship them a drive for a faster initial backup?
> If you have terabytes to back up, are there still any backup services left that'll let you ship them a drive for a faster initial backup?
Backblaze offers a "Backblaze Rapid Ingest Fireball" to allow you to ship us 60 TBytes of data on an appliance.
If you only have 2 - 10 TBytes, I suggest you get a faster network connection, or carry your laptop to a location (like your work place, or a library, or your neighbor's house) with a fast connection and just upload it. You might be surprised how easy and fast it is to upload a couple of TBytes nowadays. Using the Backblaze Personal Backup with 30 threads, I can upload about 1 TByte every 12 hours or so. So if you can leave your laptop at your workplace for 4 days you can upload 8 TBytes, then bring your laptop back home for the incrementals.
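A quick check of that arithmetic, assuming 1 TByte means 10^12 bytes and the upload rate stays constant:

```python
def tbytes_uploaded(hours, tbytes_per_12h=1.0):
    """TBytes uploaded after `hours` at the quoted rate of 1 TByte per 12 hours."""
    return hours / 12 * tbytes_per_12h

def implied_mbits_per_sec(tbytes, hours):
    """Sustained line rate needed to move `tbytes` in `hours`."""
    bits = tbytes * 1e12 * 8
    return bits / (hours * 3600) / 1e6

print(tbytes_uploaded(4 * 24))                      # TBytes in 4 days
print(round(implied_mbits_per_sec(1, 12), 1))       # Mbit/s for 1 TByte per 12 hours
```

So "8 TBytes in 4 days" needs a bit under 200 Mbit/s of sustained upload, which is why the location of the laptop matters more than the client.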
> please add ability to comment on snapshots
Actually, very very quietly as part of the 6.0 release (4 days ago), we now allow you to "name your snapshot".
While this is not a comment, and you can't change the name LATER (so useless to old snapshots), at least going forward you can put up to about 1,000 characters of description in the snapshot name.
Backblaze "Best Practices" recommends you get fully backed up within 30 days, but honestly we won't be bothered and cut you off even if it takes 6 months to upload your whole backup. As long as you are aware of the exposure for the first two months (where only half your data is backed up), I still think this would be FINE for somebody with 10 TBytes.
If you do something like this, the only thing you have to know is Backblaze backs up files in "size order". Small files first. So maybe if your digital movies are more replaceable than your pictures, you might be through 5 TBytes of photos in the first month and be "protected enough" to live with?
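To see what "size order" buys you, here is a toy sketch. The file names, sizes and monthly budget are made up, and the real client's scheduling is surely more involved than a plain sort:

```python
def upload_order(files):
    """Sort (name, size_bytes) pairs smallest first."""
    return sorted(files, key=lambda f: f[1])

def backed_up_after(files, budget_bytes):
    """Names of the files fully uploaded once `budget_bytes` have been sent."""
    done, used = [], 0
    for name, size in upload_order(files):
        if used + size > budget_bytes:
            break
        done.append(name)
        used += size
    return done

files = [("movie.mkv", 4_000_000_000), ("photo1.jpg", 5_000_000), ("photo2.jpg", 8_000_000)]
print(backed_up_after(files, 1_000_000_000))
```

The photos are safe long before the movie finishes, which is the "protected enough" situation described above.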
Funny story: I'm kind of unusual that I run an open WiFi hotspot in my home, because I have plenty of unused bandwidth and guests in my house and even neighbors are welcome to borrow some of it anytime. But one of my neighbors one time downloaded illegal content on my WiFi (the exact name of the "True Blood" episode was included, with a date and time), and I got a "cease and desist" letter from the ISP. (sigh) I felt it was pretty rude of my neighbor. I mean, just because I leave my car door open in my driveway doesn't mean it isn't rude of you to use my car as a getaway car in a robbery, you know?
Unless they have data caps in their contract and you go over them. Or they monitor for anomalies and data leaks, and IT will come asking questions or reporting an incident. If you don't know everything about the setup and don't have approval, don't do it.
Thank you for reaching out. I love it when company folks will get into the conversation on HN.
As far as connection, I've got 300/300 from Frontier. It's good, and I can help out my friends. But more questions below...
I had been using CrashPlan for years. Converted to their business plan when they decided to ditch the Consumer stuff.
My confidence in their viability/user experience has eroded. I have personally ditched their service. My concern is for some family members who I've also had on my plan. I think they have roughly ~3TB of photos backed up. I want to find them a new home.
Color me skeptical that folks outside of the public cloud providers are going to be around in 10 years.
Convince me why Backblaze is a good option to send folks to, and please understand, I'm not trying to be a jerk. I'm just wary after having "the CrashPlan experience." Know what I mean?
> Color me skeptical that folks outside of the public cloud providers are going to be around in 10 years.
Backblaze is now 12 years old, and we're actually kind of unique in that we have never raised any significant VC funding and we're (slightly) profitable. We have run the business entirely as a business (not on unsustainable VC dollars), and we aren't planning on going anywhere. Backblaze is employee owned and run, the only voting board members are the original five founders. We have a couple of "board observers" from outside the company for "adult supervision and experienced advice", but they cannot control us, and they cannot even vote.
Side Note: The Backblaze founders and a good portion of the staff all came from the same previous startup/company, and in that case the VCs forced us to sell it, which murdered it. The whole reason we self funded Backblaze and ran a sustainable (profitable) business was a reaction to how horrible that situation was.
Backblaze currently has 803 PBytes of storage in our three datacenters, and business is really going well and we are growing quite healthily. We also understand (and talk about among ourselves) the large responsibility here, which is realistically that amount of data cannot ever be moved. If Backblaze decided on a whim to shut down, we would seriously, SERIOUSLY hurt or even manage to destroy thousands of businesses which depend on us. So we are not going to do that.
Though what I probably want is just S3... Maybe I should just build it and sell that.
In such a case, could the HDD itself be a bottleneck as it's trying to slice and dice a lot of files large and small? It's running at 70-80% active time.
Probably doesn't help that I'm located abroad either..
Yes. Backblaze makes a copy locally of all files larger than 30 MBytes, broken into 10 MByte chunks. They’re stored on your “Temporary Scratch Disk” (which you can specify). One hint would be to put your temporary scratch disk on a fast SSD.
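A sketch of what that chunking looks like in terms of byte ranges. It assumes 1 MByte = 10^6 bytes and that files at or under the threshold go up as a single piece. The function and constant names are mine, not the client's:

```python
CHUNK_BYTES = 10 * 1000 * 1000      # 10 MByte chunks
THRESHOLD_BYTES = 30 * 1000 * 1000  # only files larger than 30 MBytes are chunked

def chunk_ranges(file_size):
    """Return (start, end) byte ranges that would be staged on the scratch disk."""
    if file_size <= THRESHOLD_BYTES:
        return [(0, file_size)]
    return [(start, min(start + CHUNK_BYTES, file_size))
            for start in range(0, file_size, CHUNK_BYTES)]

print(chunk_ranges(35_000_000))
```

Every large file gets read once and written once to the scratch disk before upload, which is why a slow spinning scratch disk can become the bottleneck.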
Personally I can get over 150 Mbits/sec upload, but I am on a PCIe SSD and have excellent latency to the datacenter. The worst case is a 5400 RPM drive powered only by USB, located in New Zealand. They would have trouble hitting even 20 Mbits/sec using the newest 6.0 client with the max of 30 threads.
You might be surprised how slow it is in many areas of the US, let alone the world.
I know there are digital deserts. (I just made that name up.)
But one thing I'm curious about -> often when the only CONSUMER ISP in one area (like your DSL company) is slow, there are quietly companies in your area with Gigabits of connection. If you really live out in a remote area of Colorado this may be 30 miles away from your current location, but I would LOVE to see a real "heat map" of high speed connections in the USA.
I think an absolutely killer feature for a company like Kinkos would be to have a "rent a super fast internet connection for a couple days" so you could drive someplace, download a movie or upload your backup, then come home.
The Backblaze office in San Mateo, California originally only had a very sad DSL line available for consumers. We requested a 10 Gbit/sec symmetric fiber connection to our office, and it took a VERY ANNOYING 3 month wait, but eventually a commercial provider (AboveNet or whatever it is called now) brought it to us as long as we signed a 3 year contract. Totally without asking, AboveNet put a fiber line in where we can light it up at up to 100 Gbits on a single strand of fiber, and they put 40 fiber strands into our office!! This costs $1,500/month so not really viable for an individual home, but maybe for a Kinkos or local ISP or shared among 10 - 100 houses this may be viable.
I have three suggestions, but I have only tried the second suggestion, so please do your own research before my bad advice costs you a lot of money. :-)
1) Comcast (at least in many places) allows you to exceed your bandwidth cap for two months before clamping down on you. I think they are trying to prevent serious long term abuse, not a one time overage. So if you have 3 TBytes and can get it uploaded in one or two months, just do it, apologize, and it won't cost you anything. Backblaze only does "incrementals" after the initial upload.
2) Personally I have Comcast and I pay them an extra $30 or so per month for "unlimited" (remove the cap). Now when I look at my usage, my family stays just under the 1 TByte limit ANYWAY, so this is wasted money, but I don't want to stress about it, and I run like 5 Nestcams CONSTANTLY streaming video, plus my family loves Netflix, so I just drop the $30 and relax. So you could call up Comcast and change over to "unlimited" if you can afford it.
3) A modification of #2 that I have NOT TRIED is to raise it to "unlimited" for the duration of the initial backup. I don't know if you have to commit to a year of unlimited bandwidth, or six months, or if you can change at any time?
Haven't used it personally, but did a fair amount of digging ~3 years ago. It was a potential backup solution, but we ended up shipping a 4U server full of 4TB drives to a partner for off-site backup, since they needed routine access anyways.
Check out the AWS page: https://aws.amazon.com/snowball/
> Good thing they charge me $0.94 a month for my B2 storage.
We thank you for your business! :-) The absolute beauty of Backblaze B2 (or Amazon S3 or Azure) is that we can build a storage system at scale, and sell off all the little pieces of that. You win because you get a fair price on a sliver, and we win because we add up 100,000 customers like you and make about $1 million per year.
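The arithmetic checks out, assuming every one of those 100,000 customers pays exactly the $0.94 a month from the quote (working in cents to avoid float noise):

```python
def yearly_revenue_dollars(customers, cents_per_month):
    """Total yearly revenue in dollars for identical small customers."""
    return customers * cents_per_month * 12 / 100

print(yearly_revenue_dollars(100_000, 94))
```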
The very best business is where the customer and provider are happy with the relationship.
I was just curious when I saw the # of drives as the largest chunk in this writeup - to see how much that chunk cost them.
> When a drive fails, it's effectively a brick with no terabytes.
Interesting factoid: that isn't always true. What you describe is actually the CLEANEST type of failure, the drive suddenly becomes a brick. We replace the drive and rebuild it from parity.
A way more interesting failure is when disk blocks start going bad at an unacceptable rate. Backblaze splits your data across 20 different hard drives in 20 different machines in our datacenter. The sub-parts we call "shards", a shard sits on one disk. Each shard has a SHA-1 checksum, so we know if each shard has been corrupted. If an individual shard is missing or corrupted, we know it needs to be rebuilt from parity.
So when a drive is HALF-FAILED, we even have a procedure to pull the drive out, and then opportunistically copy whatever files we can recover onto a new drive, then put the new drive back into production. Any files we recover where they are in the correct filesystem location and their SHA1 says they have not been corrupted speeds up the rebuild.
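A minimal sketch of that salvage step, assuming you have the expected SHA-1 for each shard and whatever bytes could be read off the half-failed drive. The shard IDs and data are made up, and this is my illustration, not Backblaze's code:

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

def shards_to_rebuild(expected, recovered):
    """Return shard IDs that are missing or whose SHA-1 no longer matches.

    expected:  shard_id -> expected SHA-1 hex digest
    recovered: shard_id -> bytes copied off the half-failed drive
    """
    bad = []
    for shard_id, digest in expected.items():
        data = recovered.get(shard_id)
        if data is None or sha1_hex(data) != digest:
            bad.append(shard_id)
    return bad

expected = {"a": sha1_hex(b"hello"), "b": sha1_hex(b"world"), "c": sha1_hex(b"again")}
recovered = {"a": b"hello", "b": b"w0rld"}  # "b" is corrupted, "c" was unreadable
print(shards_to_rebuild(expected, recovered))
```

Only the shards in that list need the expensive rebuild from parity. Everything else is copied straight across.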
The reason the speed of rebuild is important is the whole concept of 11 or 12 "nines" of durability. We can't have more than 3 drives fail in any one group of 20 drives, and the faster the rebuild time, the less likely for 4 simultaneous failures. It plugs into the formulas in this blog post we did about durability: https://www.backblaze.com/blog/cloud-storage-durability/
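A simplified sketch of why rebuild time matters. It assumes independent failures and a made-up annual failure rate, and it is much cruder than the formulas in the linked post:

```python
from math import comb

def p_fail_in_window(afr, window_days):
    """Approximate chance one drive fails within the window (afr as a fraction)."""
    return afr * window_days / 365

def p_at_least_k_fail(n, k, p):
    """Binomial probability that at least k of n drives fail."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

AFR = 0.01  # hypothetical 1% annual failure rate
slow = p_at_least_k_fail(20, 4, p_fail_in_window(AFR, 7))  # week-long rebuild
fast = p_at_least_k_fail(20, 4, p_fail_in_window(AFR, 1))  # one-day rebuild
print(slow, fast, slow / fast)
```

Because four failures have to overlap, the risk scales roughly with the fourth power of the window length, so a 7x faster rebuild buys a lot more than 7x.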
Do I understand correctly, that when the drive is half-failed, you don't just say "it will probably completely stop working in the near future" and discard/replace it but keep using it?
To provide more color, if a 20 drive "tome" (as we call it) is 1 drive down, we don't even wake people up in the middle of the night, but Backblaze datacenter employees replace it when they arrive at the datacenter the next day at 8am. All drives having problems are replaced by 5pm when the employees go home. This is completely business as usual, about 5 - 10 drives fail every day.
However, if 2 drives fail out of 20 (or 1.5 in our example above), pagers go off, people wake up and get out of bed at 3am and start driving towards the datacenter. Or we employ "remote hands" to swap the drives immediately; it depends on the capabilities of the night crew in the datacenter, which varies by datacenter. "Remote hands" is a contract service where semi-skilled technicians work for the datacenter and we can pay them $80/incident or thereabouts to do things you can only do "in person" like replace drives. All the pods (where data is stored) have "baseboard management", which means as long as they are powered up and online we can log in remotely from home or office to figure out what is going on and fix a variety of problems. AUTOMATICALLY if 2 drives fail we stop sending any data into that "tome" of 20 drives. We have found that writing to drives causes more failures, so not writing to them is safer.
If 3 drives fail, it is instantly a "Red Alert" at Backblaze and a whole lot of official procedures kick in. An "incident manager" is assigned and the whole company's number one concern is to drop EVERYTHING and never sleep again until the Red Alert is lowered to Yellow. We light up a "situation room" (in Slack - our internal chat tool) and information and status is relayed through that.
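The escalation ladder described above can be summarized as a small lookup. This is just my reading of the comments, with made-up function names and labels, not Backblaze's actual tooling:

```python
def tome_response(failed_drives):
    """Map the number of failed drives in a 20-drive tome to the described response."""
    if failed_drives <= 0:
        return "ok"
    if failed_drives == 1:
        return "replace during the next business day"
    if failed_drives == 2:
        return "page on-call or remote hands, stop writes"
    return "red alert"

def tome_accepts_writes(failed_drives):
    """Writes are cut off automatically once 2 drives in the tome are down."""
    return failed_drives < 2

for n in range(4):
    print(n, tome_response(n), tome_accepts_writes(n))
```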
SIDE NOTE: Backblaze has a relationship with an excellent company named "DriveSavers" who can recover SOME data off of failed drives. This is very expensive (thousands of dollars per drive) so we only do it to test the procedure and then in extreme situations. Three drives down is an extreme situation and extremely rare, so ALL OF THE THREE FAILED DRIVES would be immediately hand carried to DriveSavers even while we rebuild the customer data from parity. Notice Backblaze STILL has a complete copy of the customer data on 17 drives -> But if a 4th drive dies, the hope is we can recover at least one of the drives via DriveSavers thus saving the customer data. (We need at least 17 out of 20 drives in a "tome" to reconstruct the data.) In our experiments, DriveSavers seems to recover about half the drives, or in some situations half the data from a drive (imagine if 1 platter on a drive has a head crash and is destroyed, but the other platters are fine). We have made the decision that it is less expensive (for the same durability) to pay DriveSavers the thousands of dollars rarely instead of increasing parity to allow reconstructing data from 16 out of 20 drives instead of the current 17 out of 20 drives.
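To put a number on that parity trade-off: a sketch of the raw disk needed per byte of customer data under each scheme. It ignores everything except the shard counts:

```python
def raw_bytes_per_stored_byte(total_shards, data_shards):
    """Raw capacity consumed for each byte of customer data."""
    return total_shards / data_shards

current = raw_bytes_per_stored_byte(20, 17)      # 17 data + 3 parity
more_parity = raw_bytes_per_stored_byte(20, 16)  # 16 data + 4 parity
print(round(current, 3), more_parity, round(more_parity / current - 1, 4))
```

Going from 17-of-20 to 16-of-20 means about 6% more drives for the same customer data, forever, versus an occasional recovery bill in the thousands.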
You guys should think about writing a short eBook about e.g. general recommendations about setups/analysis/projections & stories about past failures/chain-of-events/etc - I might buy it :)
>> We can't have more than 3 drives fail in any one group of 20 drives...
Wow, for me, subjectively, that's a low threshold. And I understand that each drive being hosted on a different machine also protects you from a machine/controller failure (happened to me twice with the controller; both times it was very hard to diagnose and the experience in general was terrible).
Do you also have "backups"? Or is that in the hands of the customers/users?
If you store data in Backblaze, there is no "backup" of that data. If Backblaze ever lost 4 drives simultaneously and could not recover the data, the customer would lose data. This is much like Amazon S3.
In general, we recommend a 3-2-1 backup strategy where there are 3 copies of the data, at least 2 copies on your site, and 1 copy in the cloud. You can read about that philosophy in our blog post here: https://www.backblaze.com/blog/the-3-2-1-backup-strategy/
To summarize I understand: A) the local working copy (locally replicated in your case), B) the local backup and C) the cloud/very remote backup. B & C cover each other if any datacenter is completely wiped out.
> if any datacenter is completely wiped out
Correct. When all of our datacenters were in Sacramento, California, some customers told us they were concerned because they were ALSO in Sacramento and a meteor could wipe out both their computer, the local backup, and Backblaze's cloud backup, all in one meteor strike.
While by default we put your data where it is convenient for Backblaze, we CAN work with customers (and have done so) to place their data in our Phoenix Arizona datacenter or one of our Sacramento datacenters if it is important. As we add our European region (coming soon) this will become a pull down menu for all customers. For now, we only work with larger customers to make sure the customer data lands in the correct location for them.
Interesting about the new European region, for sure at least from the point of view of "locality". (I assume that from the point of view of "data ownership" the US will still consider itself "owner" of the data, as the holding/legal entity has its headquarters in the US. I don't know what kind of company it is, but your website mentions San Mateo, US.)
Well, I work most days in San Mateo, California, but 15% of the data we store for customers ALREADY comes from the EU, and more from other countries. Backblaze fully complies with all EU laws already, such as collecting VAT and passing that money through to EU countries.
Philosophically, we feel the data belongs to the customer, but we comply with all laws in that customer’s country. For the Backblaze Personal Backup product this was easiest, since it is encrypted on the customer machine before being sent. For B2 (our object storage product like Amazon S3) it got much more complicated because for the first time customers can configure it to be a publicly accessible web host, so Backblaze sometimes gets served with takedown notices due to illegal content hosting.
We ABSOLUTELY comply with standard procedures the same as Amazon S3 must. Backblaze is not some crazy safe harbor for criminals hosting stolen movies. With that said, if you encrypt the data before it leaves your computer and store it in a private bucket, Backblaze has no possible way to know your file contents and we do not want to know. And we would have no way of handing that over to the US government (or the EU) even if they demanded it.
Thanks for mentioning this! I’ve always (begrudgingly) had to tell people that while I love Backblaze you have to understand the geographic risk. I always suspected you had a “yeah, we can put you here if we edit this config file” but the drop down will be much better for everyone. Looking forward to it!
1 10TB drive
10 1TB drives
For large deployments, the concern is between failure rates and the amount of time it takes to rebuild data from a failed disk.
For small deployments, my main concerns are whether disk failure takes a machine or volume out, causing availability problems.
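A small sketch of the two layouts, assuming every drive has the same made-up annual failure rate regardless of capacity, failures are independent, and there is no redundancy:

```python
def p_any_failure(drive_count, afr):
    """Chance that at least one drive fails in a year."""
    return 1 - (1 - afr) ** drive_count

def expected_tb_lost(drive_count, tb_per_drive, afr):
    """Expected TB sitting on drives that fail in a year."""
    return drive_count * afr * tb_per_drive

AFR = 0.02  # hypothetical 2% per drive per year
for count, size in [(1, 10), (10, 1)]:
    print(count, "x", size, "TB:",
          round(p_any_failure(count, AFR), 4),
          round(expected_tb_lost(count, size, AFR), 3))
```

Under those assumptions the expected capacity lost is identical, but with ten small drives you deal with a failure far more often. That is the availability concern for small setups, and the rebuild-time concern for large ones.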
I’m trying to figure out where failures per GB would be how you would choose. What scenario were you thinking of?