It has been our experience that it is no longer possible to buy new, non-fraudulent drives from Amazon if the model is fairly recent but not brand new.
So, for instance, in early 2016 if you want to buy 4TB enterprise drives from Amazon, you will find them, they will be classified as brand new, and they will be sold by some big Amazon parts seller.
When you receive them they will be nice and shiny brand new wrapped - perfectly sealed - and when you spin them up, SMART stats will show 4000-6000 hours of use and that they are 2-3 years old.
This is almost universal and has been happening for at least 3-4 years. These sellers are selling the drives as brand new and they are anything but. When you complain, they will immediately exchange or refund - there's never a hassle there - and once in a while the seller will spout some bullshit about the drives being "new pulls" ... that is, drives they stripped from unsold servers/desktops.
Hope that helps.
All of that to say I'm wondering who runs a drive for 250 days and then sells it. I could easily see a thousand hours being run as burn in to catch infant mortality failures but those are pretty much all gone after the first thousand hours.
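For anyone who wants to check the arithmetic behind the "250 days" figure, here is a minimal sketch. The function names and the 1,000-hour burn-in allowance are my own assumptions for illustration (the allowance is taken from the burn-in guess above, not from any vendor policy), and the hours value is whatever your SMART tool reports as power-on hours.

```python
HOURS_PER_DAY = 24


def power_on_days(power_on_hours: float) -> float:
    """Convert a SMART power-on-hours reading into days of continuous use."""
    return power_on_hours / HOURS_PER_DAY


def looks_used(power_on_hours: float, burn_in_allowance_hours: float = 1000) -> bool:
    """Flag a drive sold as new whose hours exceed the assumed burn-in allowance.

    The 1000-hour default is an assumption for illustration only.
    """
    return power_on_hours > burn_in_allowance_hours


# The 4000-6000 hour range reported for the "brand new" 4TB drives.
for hours in (4000, 6000):
    print(f"{hours} h = {power_on_days(hours):.0f} days, looks used: {looks_used(hours)}")
```

So a reading at the top of that range works out to 250 days of spinning, far beyond anything a burn-in would explain.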
Another problem we see a lot with 'refurb' purchases (especially SAS models) is that the drives can be OEM types made to order for use in a vendor's SAN/NAS enclosure; the OEMs often load up their own firmware and tweak the hardware design, so generic code may be incompatible and can't be installed. Buyers try these drives with their own hardware and when they don't work (at all or partially) they come to us for a firmware 'fix'.
Warranty and support issues for OEM drives are handled by the company for whom the drives are made, and they may not cover second-hand drives purchased through certain channels. A customer the other day had around $8K of 'refurb' drives that wouldn't work in their storage enclosure and we had to refer them back to their auction-site based supplier.
Agreed, Fulfillment by Amazon opened the floodgates for dishonest sellers and reviewers. It's slowly turning into Alibaba.
I no longer trust anything I see on Amazon unless it's a well-known, name brand product being sold by Amazon themselves.
Saddest part of this trend is it's not just Amazon. Newegg, BestBuy, and even Walmart, now have these "marketplaces" full of spurious sellers. Drop-shipped rejects, so-called refurbs and mis-represented descriptions abound.
edit: sorry if not from US, I and my experiences are from the US and tend to be US-centric.
Newegg really needs to drop that shit, it's bad for business.
On another note: THANK YOU NEWEGG! For adding an apply button to their side-bar search function so my page doesn't have to refresh EVERY! TIME! I! MAKE! A! SELECTION!
There's no differentiation in the warehouse between "item as sold by Amazon" and "item as sold by Joe's Hard Drives and Fulfilled By Amazon".
People have commented on this before: after returns and exchanges, unique identifiers on something that they bought back from themselves (can't remember the reasoning) were not present, and Amazon said "well, all items with a given SKU get grouped together".
Some more info I wrote a while ago: http://lucb1e.com/rp/batstat/en/
Maybe if they have tens of thousands of perfect ratings and it is fulfilled by Amazon, worth the risk for non-mechanical stuff.
But even then, there is tons of counterfeit stuff on Amazon and eBay now - if it is a brand name and the price is too good, it is fake. Even if the price is okay it still might be fake.
China is getting amazing at making fakes. Even medication for humans and pets.
What happens once they are able to wipe SMART data? Is that even possible?
Edit: Here's the thread: https://forums.servethehome.com/index.php?threads/goharddriv...
That forum also has some in depth reviews of white label hard drives.
we've had good luck but something about your username tells me you buy more than we do...
the worst luck we've had is the large 2.5" drives. those are basically just disposable.
If they are very new, they haven't had time to percolate through the supply chain and come out the other end as "drives we can pretend are new".
So again, as of May 2016, it is 4TB drives that we are seeing at great prices, sold as brand new (no confusion about this at all) on Amazon, and ... turns out they are not new.
I keep meaning to write a blog post about this with specific vendor names ...
Good to see Seagate has improved the quality of its product significantly, but a 3.5% failure rate still seems rather high. How old are the drives in question?
When the FCC started doing reports on internet speeds, ISPs took notice and in the second report, an ISP's actual speeds better matched advertised speeds. It would be awesome if these Backblaze reports were likewise improving hard drives.
DISCLAIMER: I have not run my brilliant idea past any lawyers, so it may not be legal. We have not done this (yet).
Are you talking about some kind of burn-in at Backblaze for units sold to get past the infant-mortality failure region? Or just Backblaze-as-drive-retailer but they specifically endorse this model because they have good experience with it?
I've actually been backing up to Backblaze for about a year now but (knock on wood) haven't had to restore any data. That said, one of my drives has been acting up in the last few days so that moment may be at hand (though I also have it backed up to a second local drive).
You don't have backups, unless you can do restores. So you have to practice doing restores.
For my servers at work, we're using btrfs snapshots and send/receive to the backup host. So restoring files is just going into the appropriate snapshot directory, and copying out the files of interest.
If your backup scheme is any more complicated than that, you need to practice it at least a few times per year so that it is completely familiar.
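One way to make a practice restore more than "it looked fine" is to compare checksums between the originals and the restored copy. This is only a sketch under assumptions: the function names are mine, it presumes both trees are reachable as local directories, and it says nothing about how the restore itself was produced (btrfs, Backblaze, tape, whatever).

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> dict:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_restore(original: Path, restored: Path) -> dict:
    """Report files that are missing from, or differ in, the restored tree."""
    src, dst = tree_digests(original), tree_digests(restored)
    return {
        "missing": sorted(src.keys() - dst.keys()),
        "corrupted": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }
```

Calling compare_restore(Path("/data"), Path("/tmp/restore-test")) (both paths hypothetical) returns empty lists when the restore matches. Files that changed since the backup was taken will show up as "corrupted", so this works best against a snapshot or otherwise quiet data.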
Hilarious story from the old days...
We were doing backups to QIC tape drives. At one point, there was a lightning storm. The servers were plugged into UPSs with power protection though.
However, when running a backup, I noticed that the tape drive sounded a little different. So I check one of the backup tapes... the tape drive would no longer switch over the tracks on the tape. So it was just overwriting the same track again and again. Corrupted backups. Worse yet: silently corrupted backups. No messages from the OS about a hardware problem.
That could have been bad news if it wasn't caught quickly.
100% - I worked hard to make sure that was in the Best Practices sent out to every person that signs up with Backblaze. Restoring is the most important part. So far we have over 200PB of backups, but the stat that I like even more is that we have restored over 10 Billion files.
I was a 'terminal room consultant' in college... back when we had serial terminals hooked to Unix systems. Part of the job was the care and feeding of a couple printers, a big ol' line printer (green bar paper) and a Printronix graphics printer (dot matrix, for printing out fancy lab reports you wrote up using troff).
So over time, from loading paper and clearing jams, I had accumulated hours and hours of hearing these two guys chatter as they went about their business.
At one point, I noticed that the Printronix printer sounded funny. Just off, in some way. So I call it in for maintenance, but they don't seem to care what an undergraduate punk thought about printer sounds.
Sure enough, a week later, I see it is down and taken apart for repairs.
Your ears, your nose, all your senses should be used for debugging and general investigation.
Bingo, I was just explaining this to someone yesterday. Testing the restores MUST be part of the backup strategy. If your db data is small enough to have it all in your test environment, I often try to test the restores by restoring to the test db and then using that db for the test environment until the next test restore.
Yev from Backblaze. Please test a restore. We have that as part of our best practices. Why? Hopefully you won't but if you DO need a restore, it's likely that you'll be in a heightened state of panic, so familiarizing yourself with the process beforehand makes it go more smoothly! It's pretty easy, though we are currently working on ways to streamline it even further :)
When customers are doing a restore, almost by definition they are not having a good day. For example, this could be somebody who just had a $1,200 laptop stolen, and now they might lose every photo they have of their child who died last year. Real example. :-( So they show up to the Backblaze website freaked out of their minds, and they FORGET THEIR PASSWORD or something silly and minor like that, and after guessing a few times our support guys get a flaming hot chat session with a person using more four letter words than not accusing Backblaze of not having their restore.
When we resolve it all (help them with the "recover password" feature) then we usually get a happy customer for life who apologizes for losing their temper earlier. I always find it amusing when they think they are the FIRST customer to ever lose their temper under such a stressful situation. Usually it isn't even the first one THAT DAY.
TL;DR - you only restore when you are freaking out. And that's Ok - your worst day is the day when Backblaze has to be the best.
A lot of our customers also restore just for fun or out of convenience. But yea, if you're one of the ones that's doing it out of necessity after a crash or theft, knowing how it works makes everything go a lot smoother.
I have one at work, and another at my parents' house, more than 100 miles away. In case of a literal blaze and/or high water, even the local drive should be fine. If all drives are gone from a single catastrophe, I figure I (and everyone else) have more serious things to worry about, like what's for dinner.
How often can you swap them?
type your username, password, and click "Sign In", then....
Look on the left nav for "View/Restore Files", browse to one file you know you changed recently, and check the checkbox by it (and maybe a few other files) and click "Continue With Restore". Done!
Within a few seconds you can go to "My Restores" (on the left side of the web browser) to download the restore!
Brian from Backblaze here. Our new B2 product is priced at half a penny per GByte per month which accurately reflects how much it costs for us to store your data including a small profit for us.
So the $5/month is profitable for us up to a 1 TByte backup. We have about 25 customers with more than 50 TBytes in a $5/month backup, and yes, we lose money on them (which is FINE - they often recommend us to friends with less data). On the opposite end, we have about 20,000 customers with less than 20 GBytes backed up where we are massively profitable on those particular customers. Interestingly enough, my 84 year old father is in that demographic - no digital music, no digital movies, a few digital photos and a Quicken file. Last year he lost a hard drive, we restored his files from Backblaze. :-)
In between the 50 TByte and 10 GByte customers is a big bell curve with the bulk of our customers basically paying for their own backups.
A different way to think about it is the vast majority of people store files inside their laptops and are happy. The maximum size hard drive you can configure in a laptop today is probably about 1 TByte (the 2 TByte laptop size drives are just starting to appear), so by definition we're profitable on people like this. Technical people think everybody is like them and has a 16 TByte RAID array filled with all the Linux ISOs and movies. :-) But really most people have less than a TByte of personal data.
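The break-even point described above is easy to reproduce. A rough sketch follows, with the caveats that I am treating the B2 price as if it were the pure storage cost (the comment says it includes a small profit), that I am using decimal units (1 TByte = 1000 GBytes), and that the function names are made up for illustration.

```python
B2_PRICE_PER_GB_MONTH = 0.005  # "half a penny per GByte per month"
FLAT_PLAN_PER_MONTH = 5.00     # the $5/month unlimited backup plan


def break_even_gb(flat_price: float = FLAT_PLAN_PER_MONTH,
                  cost_per_gb: float = B2_PRICE_PER_GB_MONTH) -> float:
    """Backup size in GB at which the flat plan stops covering storage cost."""
    return flat_price / cost_per_gb


def monthly_margin(stored_gb: float) -> float:
    """Approximate dollars per month left over (negative means a loss)."""
    return FLAT_PLAN_PER_MONTH - stored_gb * B2_PRICE_PER_GB_MONTH


print(f"break-even: {break_even_gb():.0f} GB")
for label, gb in (("20 GB customer", 20), ("1 TB customer", 1000), ("50 TB customer", 50_000)):
    print(f"{label}: {monthly_margin(gb):+.2f} USD/month")
```

Under those assumptions the 20 GByte customer leaves almost the whole $5 as margin, while a 50 TByte customer costs a couple of hundred dollars a month to carry.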
I also keep a periodic backup in my office as well as various less systematic backups in various ways. I'd recommend anyone keep at least a couple of backups made with different methods. You never feel more exposed than when you need to use a backup and realize you only have one copy of all your data. Hope nothing else fails or you do something boneheaded with the restore.
You logon or use their app, choose the folders/files you want to restore and await an email from them letting you know when the folders/files are available for download.
One thing that bothers me is that the data presented doesn't really take into account the age of the HDDs. For example, if a batch of HDDs of a particular model is 6 years old and has a failure rate of 12%, that really doesn't tell me much except that it's an old HDD.
What I'd like to know, for a given model, what the blended failure rate is after 3mo, 6mo, 12mo, 24mo etc of operational time. That would be a real apples-to-apples comparison.
tldr; It clearly shows that HGST has an overall superior survival rate over time. WD is a distant second and Seagate in third (although the Seagate ST4000DM000 model is exceptional and fares very well).
You might actually be able to pull that from the raw data. We post it all here -> https://www.backblaze.com/b2/hard-drive-test-data.html in case you wanted to play around with it!
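Here is a rough sketch of how the age-bucketed failure rate asked for above could be pulled out of daily drive records. I am assuming each row is one drive on one day with "model", "failure" (0 or 1) and "smart_9_raw" (power-on hours) fields, which is how I understand the published CSVs to be laid out; check the column names against the actual files before trusting it. The bucket edges and function names are mine.

```python
from collections import defaultdict

HOURS_PER_MONTH = 730  # 8,760 hours per year / 12
# Upper edges (in months) of the age buckets; the last bucket is open-ended.
BUCKET_EDGES_MONTHS = [3, 6, 12, 24]


def age_bucket(power_on_hours: float) -> str:
    """Label the age bucket a power-on-hours reading falls into."""
    months = power_on_hours / HOURS_PER_MONTH
    lower = 0
    for upper in BUCKET_EDGES_MONTHS:
        if months < upper:
            return f"{lower}-{upper}mo"
        lower = upper
    return f"{lower}mo+"


def afr_by_age(rows) -> dict:
    """Annualized failure rate (%) per (model, age bucket).

    Each row counts as one drive-day; AFR = failures / (drive_days / 365) * 100.
    Rows without a power-on-hours value are skipped.
    """
    drive_days = defaultdict(int)
    failures = defaultdict(int)
    for row in rows:
        hours = row.get("smart_9_raw")
        if hours in ("", None):
            continue
        key = (row["model"], age_bucket(float(hours)))
        drive_days[key] += 1
        failures[key] += int(row["failure"])
    return {key: 100.0 * failures[key] / (drive_days[key] / 365.0) for key in drive_days}
```

Feeding it rows from csv.DictReader over the daily files should work since the values are converted from strings. Buckets with few drive-days will give noisy numbers, so it is worth printing the drive-day counts alongside the rates.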
That's about right ;)
First of all, it is a great way for people to hear about our company, which sometimes leads to people buying our products and services.
Second, even just for the good of all humanity, why wouldn't you do this to promote the best drive manufacturers and heck, even help them learn how their drives are working in the field?
It boggles my mind that Facebook, Yahoo, Google, and Amazon (Amazon Web Services include S3) don't publish their drive failure stats. What could possibly be the reason they keep silent?
The boardroom is so divorced from the showroom that it's not even funny how big the disconnects can get.
It's still an awesome study and well worth reading and useful to a ton of people (like us at Backblaze when we were starting out).
I would have expected more absorption+rationalisation from the takeover, yet they still clearly look more like part of the IBM family than anything in WD's ranges.
IBM always had a pretty good rep for failure rates, aside from a couple of horror drives in the 90s. I wonder how they've managed to keep their rates markedly better than other makes, even under changes of ownership.
More to the point I wonder why WD haven't been able to improve rates as a result of taking over IBM/HGST.
Sometimes heat causes a machine room meltdown, where machines start failing left and right. This is not uncommon. What usually happens is this - the company knows enough to put their web server etc. at a controlled colocation facility (which isn't always ideally controlled, but that's another story). But the developers would like a local file server, development source control server etc. There is a spare, windowless room in the office, and without much planning, a machine or two goes in. Then a machine goes in attached to a tape drive, which backs up those machines and the local desktops. Then more machines go in, then more. All these machines generate heat, so a room cooler is bought. But it drops condensation in a cup, and shuts off when the cup is full of water. So someone's job becomes to empty that cup every morning. But then summer comes, and on a particularly hot Sunday afternoon the condensation cup fills, the cooler shuts down, the outside temperature combines with the temperature of a closed, windowless room full of machines for hour after hour. Finally one machine has a component fail. Then another. Then e-mails and phone calls and panic start flying.
I have seen this happen more than once, and have heard about it more. It always starts as an ad-hoc, unplanned, "temporary" solution for "unimportant" machines. But as time goes on, and machines are added, and business dependencies are formed, you have to start supporting a machine room that was never planned as a machine room.
To me, that's an astoundingly large number of hard drives. But I realise there are probably much bigger deployments. Does anybody happen to know just how many hard drives Amazon or Microsoft have for AWS/Azure?
1,000 TB (1PB) can be easily handled across ~150 (6-7TB)HDDs for one copy, but 300-450 HDDs would be required for additional mirroring.
Largest tape cartridges out there are between 6 and 8.5 TBs, and cost around $22 per TB. That's only $22,000 per PB, and this is for high throughput cartridges like LTO7 or StorageTek Titanium. LTO5 is much cheaper.
Considering that the largest tech companies and major organizations routinely cut POs for several $100ks and are dealing with 100s of PBs of data across disk, tape, DVDs etc, it isn't outside the realm of possibility to have 300,000+ individual disks and tapes floating out there :)
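The drive and tape counts above can be checked with a few lines. This is just a sketch of the arithmetic: the 6.5 TB drive size is my midpoint of the 6-7TB range, "copies" stands in for whatever redundancy scheme is actually used, and the function names are invented for illustration.

```python
import math


def drives_needed(capacity_tb: float, drive_tb: float, copies: int = 1) -> int:
    """Whole drives needed to hold `copies` full copies of `capacity_tb`."""
    return math.ceil(capacity_tb * copies / drive_tb)


def tape_cost_usd(capacity_tb: float, usd_per_tb: float = 22.0) -> float:
    """Cartridge cost at the roughly $22 per TB quoted above."""
    return capacity_tb * usd_per_tb


PETABYTE_TB = 1000
for copies in (1, 2, 3):
    print(f"{copies} copy/copies of 1 PB on 6.5 TB drives: {drives_needed(PETABYTE_TB, 6.5, copies)}")
print(f"1 PB on tape: ${tape_cost_usd(PETABYTE_TB):,.0f}")
```

The single-copy count lands near the ~150 quoted, and two to three copies land in the same few-hundred-drive range as the 300-450 figure, depending on which drive size you assume.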
And I remember reading a tweet from a Google engineer that they would be paged if their free storage dropped below 5PB.
At that scale, it's just a function of (number of ethernet cables) x (avg size of ethernet cables), rather than disk space in their data center, I'd imagine!
Mirroring is not used at that scale
How do cloud storage vendors guarantee triple-mirroring and uptime then? Lots of 2TB drives? Lies? :)
Yea, our latest storage pod (https://www.backblaze.com/blog/open-source-data-storage-serv...) has 60 drives at about 8TB a piece so we're pushing 480TB. Two pods are about a Petabyte, if you go up to the 16TB Hard Drives some of the manufacturers are testing, you can hit pretty close to 1PB in an enclosure, and Dropbox is actually doing that already with their "Diskotech" boxes (HN link -> https://news.ycombinator.com/item?id=11282948) - so folks are already getting more and more dense :D
We've just started heading down the direct paths, but yea it seems like the minimum order number to work direct with the manufacturers is about 10,000 hard drives per order. We aren't quite at that capacity and we don't like to keep inventory since we try to run pretty lean. Also the price for hard drives tends to drop by a small percentage monthly, so for every month we have excess inventory we purchased earlier, that could potentially be money left on the table. So our orders tend to be smaller than their minimums. We're inching towards it though :)
I'm asking myself, even if I'm a WDC rep that is selling hundreds of thousands of PC hard drives and having an excellent quarter/year, why would I turn away the business of a growing company?
Tape cartridges can be purchased in packs of 20 from any corporate vendor and no order is too small to attract a rep's attention. I've seen deals (25th to 75th percentile) from $10,000 to $200,000, and the min/max deal range is from $2,000 to $500,000
I'd like to know YouTube's totals with 500 hours of video being uploaded every minute...
Then being conservative, we can say an hour of video is 1-2GB of data, so between 500 and 1,000 GB of data uploaded (just to YT) every minute.
Accounting for mirroring and compression, 525,949 minutes per year * 1 TB per minute = ~526 PB per year
Ironically this would be about 65,743 8TB HDDs, which is close to the number in this article :)
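The back-of-envelope estimate above, written out so the inputs are easy to change. Everything here is the commenter's assumption rather than real YouTube data: 500 hours uploaded per minute, 1-2 GB per hour of video, decimal units, and 8TB drives. The function names are mine.

```python
import math

MINUTES_PER_YEAR = 525_949  # figure used above (365.2425 days * 1440 minutes)


def yearly_upload_pb(hours_per_minute: float = 500, gb_per_hour: float = 2.0) -> float:
    """Estimated petabytes uploaded per year, before any mirroring or compression."""
    gb_per_minute = hours_per_minute * gb_per_hour
    tb_per_year = gb_per_minute / 1000 * MINUTES_PER_YEAR
    return tb_per_year / 1000


def drives_for(petabytes: float, drive_tb: float = 8) -> int:
    """Whole drives needed to hold that much data in a single copy."""
    return math.ceil(petabytes * 1000 / drive_tb)


pb = yearly_upload_pb()
print(f"{pb:.0f} PB per year, about {drives_for(pb):,} drives of 8 TB")
```

Rounding up to whole drives gives one more than the 65,743 quoted above, which truncates the fraction. At the low end of the range (1 GB per hour) both numbers halve.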
- Price of buying a drive and a pod to house it: ~$0.036/GB 
- 1.84% of drives need replacing each year (warranty aside)
- They use 17:3 parity for redundancy (15% of storage)
So the hardware price of a GB should be something like $0.036 * (1/0.85) * 1.0184^(years). For 10 years, the hardware would cost $0.05/GB, or $0.0004/GB/month.
Power costs and DC space of course need to be taken into account but I still find it interesting that the hardware itself costs only ~10% of what they charge for B2.
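Here is the same estimate as a small function, mostly so the inputs can be varied. It follows the formula in the comment as written, including the simplification of compounding the replacement rate yearly; the parameter names are mine, and the $0.005/GB/month B2 price is the one quoted elsewhere in this thread.

```python
def hardware_cost_per_gb(raw_cost_per_gb: float = 0.036,
                         parity_fraction: float = 0.15,
                         annual_replacement_rate: float = 0.0184,
                         years: int = 10) -> float:
    """Hardware cost per usable GB over `years`, following the formula above."""
    # 17 data shards out of 20 means only 85% of raw capacity is usable.
    usable_cost = raw_cost_per_gb / (1 - parity_fraction)
    # Replacement drives modelled as yearly compounding growth in cost.
    return usable_cost * (1 + annual_replacement_rate) ** years


B2_PRICE_PER_GB_MONTH = 0.005

total = hardware_cost_per_gb()
per_month = total / (10 * 12)
print(f"10-year hardware cost: ${total:.4f}/GB")
print(f"per month:             ${per_month:.5f}/GB")
print(f"share of B2 price:     {per_month / B2_PRICE_PER_GB_MONTH:.1%}")
```

The share comes out a little under the ~10% mentioned above, which is close enough given how rough the inputs are.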
> I wasn't sure how the "weird" sourcing they had (cracking external drives open during the HDD drought) would affect things, but that was over in 2015
That was actually in 2011, all those hard drives have been out of our system for a while!
Are you buying bare drives, or shucking them from enclosures?
We haven't had to shuck drives since 2012. And we only started doing that because regular drive prices went up so much due to the Thailand flooding. I believe all those drives have now been replaced as well. If you want a good write-up (along with links to past posts) this is a good "Hard Drive Farming" write-up -> https://www.backblaze.com/blog/farming-hard-drives-2-years-a...
People on /r/buildapc don't seem to believe it's a problem though: https://www.reddit.com/r/buildapc/comments/4jkpor/avoid_the_...
I personally wouldn't use one for anything more critical than a door stop. Been there, done that, lost some data.
Thank god for RAID, because every single one of them and every single warranty replacement has now failed.
I bought HGST after that and have not seen a single bad sector.
I'm regretting the last batch of Seagates I bought to replace HGSTs now.
That's an interesting idea in other circumstances, however. Thanks for sharing!
edit: just found this:
Thanks Yev and everyone else there, a shining example of how to build a company and a reputation at the same time. Keep it up.
I guess I should just buy another one. I'm going to lose all my data any day now.
It is true that hard drives often give a surprising amount of warning. I don't think I've ever had a drive totally spontaneously fail on me. But it's still best not to count on that.
That may help explain how your drive's firmware got corrupted.
Yev from Backblaze here -> good hustle! That sounds a lot like our "anti-vibration sleeves" from our earlier pod versions: https://www.backblaze.com/blog/backblaze-storage-pod-vendors...
Someone else must be making the drive for them or they have a new factory doing something different.
For us the drives are constantly spinning, we don't power them down. One of the reasons is we never know when a customer will want a restore, so we have the data available 24/7. That said with our Vaults (https://www.backblaze.com/blog/vault-cloud-storage-architect...) it's theoretically possible to power down entire cabinets and still have the data available, but we don't currently see a need to do that.
I think he's right - energy is too cheap for this to be a commercially relevant factor to you, but if the true cost of energy were factored in (climate change, the human costs of the relatively dangerous mining and oil drilling industries, local pollution, etc) then I think that would tip the balance.
Oh well. OK, final suggestion, maybe do it and spin it as a public relations win? :)
Sounds pretty interesting. What's really going on?
Anyway I'm going to be a backblaze customer for $5 a month!
They use ext4:
Each of the drives in a Vault has a standard Linux file system, ext4, on it. This is where the shards are stored. There are fancier file systems out there, but we don’t need them for Vaults. All that is needed is a way to write files to disk, and read them back. Ext4 is good at handling power failure on a single drive cleanly, without losing any files. It’s also good at storing lots of files on a single drive, and providing efficient access to them.
Compared to a conventional RAID, we have swapped the layers here by putting the file systems under the replication. Usually, RAID puts the file system on top of the replication, which means that a file system corruption can lose data. With the file system below the replication, a Vault can recover from a file system corruption, because it can lose at most one shard of each file.
"Failure rates with a small number of failures can be misleading. For example, the 8.65% failure rate of the Toshiba 3TB drives is based on one failure. That’s not enough data to make a decision."