Seriously, you can find reports of this all over the internet. It's on reddit, twitter, and elsewhere: these guys do not back files up properly, and their systems are so shoddy that they'll report they have files even when they don't.
My girlfriend brought her computer in for repair and they replaced the hard drive. She specifically checked before she left that the backups were set, and the dashboard said they were. When we went to restore, however, none of the files could be recovered.
We contacted support and they kept insisting we did something wrong. Finally they admitted that they had erased the files (yes, actually erased them). Due to a "bug" their dashboard reported we still had the files, even though they were gone.
They did everything they could to hide their mistake. They lied to people on twitter about what I was saying (since the only way to get real support was to hound their twitter account). They lied about all sorts of things. I had to take screenshots of everything and post them online before they'd even admit to screwing up. Even at that point their dashboard was still saying it had all of these files, when it in fact had nothing.
Backblaze is a completely untrustworthy company. If they spent half as much effort on backing up files as they do on marketing, it might be a different story. Anyone who uses them for backups might as well not have backups at all.
We've since fixed that visual glitch (if you have a trial that expires and is not paid for, the overview page will now reflect it once the data has been deleted), AND we're currently working on some other notification systems to help people avoid losing data. As always, I apologize that you had this problem.
Also, to be clear, the customer service experience was horrible. The support team repeatedly ignored what we were saying and sent us form responses back that were just not relevant. The twitter team claimed we misread the retention policy. The whole thing was kind of ridiculous.
Even more to the point, and the main reason I tell this story: people should know how this company operates. People should know that they don't have backups of backups, that there is no margin for error should your account lapse, and that their systems in general are not robustly programmed, in the sense that there are ways your data could be lost without you ever knowing it.
From my perspective Backblaze only has one thing going for it over competing companies, and that's its marketing department.
You also bear some fault for not testing the Backblaze backup. Never, ever trust a backup, even if it's right in front of you on your own disk, unless you can successfully restore from it.
I am not claiming to be without fault, but I do think that Backblaze is on the lower end of the reliability spectrum when it comes to companies that do this, which is why I bring up these issues. I've switched over to Crashplan and have external hard drives doing local backups.
I should also point out that this isn't the only issue that's come up with this company. Apparently their client-side GUI is also not completely truthful about when it has backed things up: it will claim to have backed up files when it has really only backed up the indexes, then spread the real backups out over time to improve performance. While this may be a cool feature, it's one people should be aware of, not one that's hidden. I have less knowledge of this, though, as I've only seen it in a few blog posts (google around instead of taking my word for it).
I'm a big fan of using a combination of Arq (love it), Dropbox, Backblaze, and SuperDuper, but CrashPlan and Backblaze are both pretty similar: both have their pros and cons and, likewise, hundreds of anecdotes from each user community.
For what it's worth, I'm in my fifth year of Backblaze, and I've always been able to recover all my files - but I've also kept my subscription paid up.
And I don't believe I've ever seen the "real backups spread out over time" behavior you describe. I am frequently in the field, so I am very aware of exactly when Backblaze is sending data (often over my iPhone using roaming data), and when Backblaze says it's done backing up, that's pretty much exactly when it stops sending data.
> If you boot up real quickly and want to do a scan, one thing you can do is open up the control panel and hit "Alt" + "Restore Options"; we'll do a small-file rescan immediately and schedule a large-file scan.
> "Small" is files under 30MB or so; it's a quick index. The larger files take a bit longer, and we try to spread those out over the course of a few hours so as not to be too heavy on your system.
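To make the mechanism concrete, here's a minimal sketch of what such a two-tier scan could look like (my own illustration, not Backblaze's code; the 30MB cutoff comes from the quote above, everything else is assumed):

    import os
    import threading
    import time

    SMALL_FILE_LIMIT = 30 * 1024 * 1024  # ~30MB cutoff, per the quote above

    def scan(root, keep):
        """Walk the tree and index every file whose size passes `keep`."""
        index = []
        for dirpath, _dirs, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    size = os.path.getsize(path)
                except OSError:
                    continue  # file vanished or is unreadable; skip it
                if keep(size):
                    index.append((path, size))
        return index

    def rescan(root, large_scan_delay=3600):
        # Small files get indexed immediately, which is cheap...
        small = scan(root, lambda s: s <= SMALL_FILE_LIMIT)
        # ...while the heavier large-file pass is deferred so it
        # doesn't hammer the disk right after boot.
        def deferred():
            time.sleep(large_scan_delay)
            scan(root, lambda s: s > SMALL_FILE_LIMIT)
        threading.Thread(target=deferred, daemon=True).start()
        return small

Run against a home directory, rescan() returns quickly with the small-file index while the expensive pass is still an hour or more away.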
All of this to make it look like your application is performing better than it actually is.
On top of that, you keep doing that infuriating thing where you ignore what people are saying and respond as if they're idiots, or try to deflect the issue. No one expects you to run when the computer is off, and pretending that's what's being said is downright insulting. What they do expect is that hitting the "backup now" button, and having your application respond that their files have been backed up, should actually mean that the files are backed up.
The more you talk, the more you try to twist this into something it's not (a visual glitch, misreading your policies, or people turning their computers off when they shouldn't) rather than trying to understand why your customers are upset and actually dealing with it. That's why this issue has grown, and why we keep having these conversations: you guys have zero concept of responsibility and would rather insult your customers than make a viable product.
If your complaint is that the "backup now" button does not do a full-system scan, that is totally valid, and after the reddit thread our engineering folks are looking into changing the button's behavior. One of the reasons it doesn't do that now is that a full scan will hang your system, whereas uploading the remaining files is very light and unnoticeable. As with any functionality decision, it's tough to say which is the better answer: hanging someone's system each time they press a button, or kicking off an upload of the remaining files and gradually scanning the drive over the next hour.
I sincerely apologize if I am not communicating well, though; I am not trying to talk down to you or assume you or anyone else is an idiot in any way. I've tried to address everything that you bring up here and on reddit. As far as expecting Backblaze to run when the computer is turned off, you'd be surprised how many of our support tickets ask, "If my computer is off, are you still working?", so we do see that quite a bit. I don't bring it up to dig at anybody. I also think we've taken responsibility for the bug that misled you into believing you had data on our system when it had already been removed. Once we realized what had happened in your case, we offered a refund and have since fixed it so that it does not happen to anyone else.
We're in the business of backing up data. When customers lose data, whether it's something they did or something that occurred on our end, we feel bad about it and try to make it right. We do have a viable product: we've restored over 4 billion files for the customers that have accounts with us. We take it very seriously.
Our CSR didn't understand the issue that was occurring, because he had not experienced this particular problem before. The CSR was not deliberately trying to mislead anyone, and in truth, once the issue was escalated and we realized what was going on, we learned that Tedivm had a legitimate complaint: we had a visual bug that showed the data as still being there when it was not. Once we realized this, we acknowledged the issue, and we've since fixed that glitch as well, so expired trials with their data deleted now show 0 files available online.
I can go into further detail, but we try to keep all customer information and support queries private. We did do a bit more training with the CSRs on identifying atypical issues like this one, so hopefully they can be caught earlier in the support process instead of after a bit of back and forth, as was the case this time.
1) Answering basic general questions like "how much does it cost?"
2) Debugging live customer issues. For example, if crappy anti-virus quarantines one of the Backblaze executables, what the heck is happening and why? Or if a customer's client cannot contact Backblaze's datacenter, which firewall is blocking it (maybe the software firewall on the computer, or maybe the router not allowing HTTPS through)? Etc....
In the process of debugging problems, the customer service reps deal with a truly insane array of possible issues and symptoms. They go back and forth with the customer, asking for log files, asking whether specific symptoms are happening, and in general COLLABORATE with the customer to get to the true bottom of the issue. Nobody lied. These are good, honest people who are really looking out for our customers and take their responsibilities seriously.
This is the kind of thing that happens when dealing with Backblaze: constant misrepresentation on the public side of things. Their twitter people twist things around, and their employees come into discussions like this one and defend the company by claiming I said things I didn't say.
Once we realized what was going on, we told the social folks to defer to the support folks. It's hard to have a long conversation on twitter; it's not a great medium for full-on support, but it's great for sending links to FAQs and things like that.
I do honestly think that the conversations you had with our team on multiple fronts were misinterpreted on both sides which led to a lot of confusion. We've also gotten better at making sure that the social team asks our support folks for guidance if they see a tweet or post that they are not familiar with.
But computers always lie. Your disk says the file is there, you double click it, and it starts making noises and the file doesn't open. Backup systems are the same. It says the files are safe, but unless you test it, they may not be.
In the end, all I want to say is, I wish ZFS was free software :(
What I found is that I had backups of backups of the things I actually cared about, and the things I lost I hadn't truly looked at in a long time. A lot of the music I hadn't listened to in years.
I have found that everything I care about fits on small SSDs, and because of the things I do wish I hadn't lost, I'm keeping up the practice of maintaining multiple backups.
TL;DR: I have made an effort to keep a smaller data footprint instead of hoarding.
Likewise: data that isn't backed up off site isn't really backed up.
An IMO more realistic statement: Data that is not backed up is implicitly classified as "nice to have" - in many cases a perfectly rational choice, if done consciously.
> Likewise: data that isn't backed up off site isn't really backed up.
You may as well argue that no data is ever "really" backed up because there is always the possibility of simultaneous failure. It's always a choice of how much risk you're willing to accept vs. how much money and effort it would cost to reduce it further.
This proves by induction that no matter how many backups you have, it's not enough.
There are several horror stories of people going to their offsite tape backups to restore some lost data only to find that their backups didn't work due to a malfunctioning tape drive.
There are other stories about encrypted backups where the decryption key was lost (the backup data is encrypted with a randomly chosen "session" key, the session key is encrypted with a public key, and no one could find the corresponding private key to decrypt the session key).
In the event of the failure that revealed the problem, they lost about seven months' worth of data (which was how long it had been since the computer had last been connected to the network and synced via Active Directory, or whatever they call it).
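To make that lost-key failure mode concrete, here's a minimal sketch of the envelope-encryption scheme described above, using Python's cryptography package (the names and payload are mine, not from any particular backup product):

    from cryptography.fernet import Fernet
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding

    # One-off setup: the operator's long-term keypair. Lose the private
    # key and every backup below becomes permanently unreadable.
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public_key = private_key.public_key()

    # Per backup: a random session key encrypts the actual data...
    session_key = Fernet.generate_key()
    ciphertext = Fernet(session_key).encrypt(b"backup payload")

    # ...and only the session key is wrapped with the public key and
    # stored alongside the ciphertext.
    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)
    wrapped = public_key.encrypt(session_key, oaep)

    # Restore path: without the private key, this first step is impossible.
    recovered = private_key.decrypt(wrapped, oaep)
    assert Fernet(recovered).decrypt(ciphertext) == b"backup payload"

Every backup in that scheme is only as recoverable as that one private key, which is exactly what bit them.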
I currently have two external drives, an archive drive and a backup drive, both 2TB. (Most things on the backup drive are compressed and deduped, which is why I can fit everything.)
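For anyone curious, here's a rough sketch of the file-level dedup idea (hash the content, store each unique file once; real backup tools dedup at the block level, so treat this as an illustration only, with hypothetical paths):

    import hashlib
    import os
    import shutil

    def dedup_copy(src_root, dst_root):
        """Copy each unique file exactly once, keyed by its content hash."""
        os.makedirs(dst_root, exist_ok=True)
        seen = set()
        for dirpath, _dirs, filenames in os.walk(src_root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                h = hashlib.sha256()
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        h.update(chunk)  # hash in 1MB chunks
                digest = h.hexdigest()
                if digest in seen:
                    continue  # identical content already on the backup drive
                seen.add(digest)
                shutil.copy2(path, os.path.join(dst_root, digest))

Duplicate photo folders and copied music libraries collapse to one copy each, which is how 2TB ends up being enough.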
I browsed the comments with two things in mind: was anyone uncomfortable with shucking and farming on this scale, and were the drives of equivalent quality?
Regarding the latter, user rsync of rsync.net claimed that "the drives inside these enclosures are the worst spec'd, highest failure rate drives." User budmang of Backblaze, however, said, "The drives were literally the same inside the external enclosures." User atYevP of Backblaze posted a similar comment.
Failure of external drives has mostly to do with their handling.
They get bumped around while running.
People trip over the cords and knock them over.
They stack them like pancakes and they overheat.
They block the fans and cause overheating.
Many enclosures are designed to look stylish - but don't cool well.
Heads fly nanometers over the surface; they don't tolerate bumps well.
The drives we recover from external enclosures appear to be the same as bare drives that we recover.
I encourage people to buy their enclosures and drives separately - it's a little more expensive, but this way you can RMA just the enclosure if it fails. With an all-in-one, you either void your warranty to get the data back, or ship the whole unit back for a replacement.
So, technically, I could RMA the drive myself if it fails... or would they notice it and not process the return?
I mean, of course they are standard form factors, standard interfaces, etc- they are standard 3.5" hard drives- but they are always some odd model number I've never heard of with a 32kB cache, 1 month warranty, and 3600RPM. (slight exaggeration).
Actually, garbage and messed-up ecosystems are a different type of problem from obtaining resources. It is somewhat related to the second law of thermodynamics: once you introduce enough entropy into a system, the genie is out of the bottle. Of course this isn't exactly thermodynamics, but I can put it this way: putting Humpty Dumpty together again is a different problem from figuring out how to make Humpty give you more food for decades.
What am I arguing? That while we do consume more resources, progress is exponential as well. For example, if we seriously began to run out of fossil fuels, the amount of effort put into other energy sources would increase by an order of magnitude, and a solution would be found. It might not come right away, and it might not be a silver bullet, but when push comes to shove and there are no alternatives, human ingenuity is frightening.
In the end, progress stacks multiplicatively: add great battery technology to great energy-generation technology and you get more than the sum of the parts. So the more people there are doing research, the faster it goes.
This point is only true from a certain point of view. Consider a genius starting from nil: how far could he take technology within his lifetime? Electricity, probably. Microprocessors and genetics, probably not. Now consider a million people working, where each discovery builds on an earlier one and each tool makes other researchers more efficient or enables new discoveries. Then think about multiple generations of people building discoveries on top of each other while sharing information.
But that leads to a big problem as soon as the oil reserves diminish.
How should one not be fatalistic then?
"Switching hard drive manufacturers from Hitachi to Seagate and Western Digital. The assumption here is that all drives are the same and as we’ve learned that’s not really the case."
Where is the footnote? Why aren't they the same?
From their article on v2 they said: "The Western Digital and Seagate equivalents we tested saw much higher rates of popping out of RAID arrays and drive failure. Even the Western Digital Enterprise Hard Drives had the same high failure rates. The Hitachi drives, on the other hand, perform wonderfully."
There are other articles where they mention that they do testing and some brands perform better than others.
Having said that, my main little server at home hasn't had a power cycle in somewhat longer than a month, as there have been no kernel updates or anything like that requiring one...
I mean, when you look at the dotted trendline in this graph: http://blog.backblaze.com/wp-content/uploads/2013/11/blog-co... you'd think by 2016 drives would cost $0 per gigabyte and by 2020, drive companies would be paying you to take drives off them.
Ultimately, after traversing enough benchmarks, the scale would go logarithmic, and measurements would have to take orders of magnitude into account.
I'll stick with a single blade.
So who's going to figure out what to do with our future profusion of razor blades?
With all that said, EVERY MONTH we bid 20 different suppliers against each other and choose the lowest price. For goodness' sake, anybody on earth who wants free money just has to undercut the other suppliers by one penny to win our business. We're always open to new suppliers; please, take our money!!
I know a few Backblaze people are reading/posting in these comments, so I'll ask here:
Do you guys know the approximate price when you can purchase them direct, in blocks of 10,000?
It's a cool story but something just doesn't add up here.
We're building and deploying 17 pods this month, each pod has 45 drives, so we purchased 765 drives this month (plus replacements for failed drives that are out of warranty, etc). So if we would buy 13 months of drives all in one purchase, we might be able to go direct.
THE TRADEOFF (gamble?) is that drive prices drop every month, so stockpiling a year's worth of drives (even at a 10 percent discount) would lose us money. Before the Thailand drive crisis, we had a rule of thumb that drive prices dropped 4 percent each month. But this is just a judgement call; your guess is as good as mine whether this makes sense. We're figuring this out like everybody else.
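To put rough numbers on that gamble, here's a quick back-of-the-envelope comparison (765 drives/month and the 4 percent rule of thumb are from above; the $150 starting price and 10 percent bulk discount are assumptions of mine):

    # Back-of-the-envelope: buy 13 months of drives up front at a bulk
    # discount, or buy month by month while prices decline?
    DRIVES_PER_MONTH = 765    # 17 pods x 45 drives, from above
    START_PRICE = 150.0       # assumed $/drive today; only the ratio matters
    BULK_DISCOUNT = 0.10      # assumed discount for going direct
    MONTHLY_DROP = 0.04       # pre-Thailand rule of thumb: -4%/month

    MONTHS = 13
    bulk = MONTHS * DRIVES_PER_MONTH * START_PRICE * (1 - BULK_DISCOUNT)
    monthly = sum(DRIVES_PER_MONTH * START_PRICE * (1 - MONTHLY_DROP) ** m
                  for m in range(MONTHS))

    print(f"13 months up front: ${bulk:,.0f}")    # ~ $1.34M
    print(f"month by month:     ${monthly:,.0f}") # ~ $1.18M

Under those assumptions, buying month by month comes out roughly 12 percent cheaper, which is why a 10 percent discount isn't enough to justify the stockpile.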
What did Andy Grove say? "Only the paranoid survive." Now we run a larger buffer of drives.
He does note that his cost per drive didn't go up dramatically after the Thailand floods, and it was pretty close to what Backblaze was paying for their "shucked" drives, though it didn't go down significantly either.
I'm guessing Backblaze is still below the volume level where they can negotiate directly with drive vendors. I'm not sure where that level is: 10 petabytes/quarter? 100 petabytes/quarter?
Regardless, I continue to be very impressed by how backblaze manages to circumvent what others might consider insurmountable hurdles. Truly hacking the system in a good way.
Part of it is price sensitivity. External drives are geared toward consumers, who simply will not buy the drives if they go up too much in cost. The internal drives we were buying are targeted at businesses, which have a choice to make: buy at the higher price, go out of business, or get creative. We try to go the creative route because we really don't want to go out of business, and paying the higher prices for drives, especially in this case, would have gotten us there.
As for the negotiation, BrianWski mentioned it in another reply on here, but we keep getting weird answers from vendors and manufacturers. Typically they want a minimum order of about 10,000 hard drives in order to work directly with them and we're simply not willing to buy that many at a time, especially since normally the price goes down monthly, meaning carrying inventory is leaving money on the table.
Love that you enjoy our blog posts; stay tuned ;-)
This is a typical response in all markets that are commodities (or approaching commodity status). Once makers have had an excuse to move prices up, they want to hold onto the extra profit for as long as they can, so prices are fast to rise, but slow to fall.
The most direct effect can be witnessed in gasoline prices. Notice how a news report that a camel farted in the desert of Saudi Arabia results in the price of gasoline at your local station rising by 25 cents a gallon (or more) nearly instantly. Even though the gas in the underground tanks was the same gas (at the same cost to the station) that it was 15 minutes ago.
But then, it takes weeks before that 25 cent increase goes away when no more camel farts occur over those same weeks. Same effect here, just with hard drives rather than gasoline.
Does anyone know why external drives are typically cheaper than internal drives?
Backblaze tries to avoid the "enterprise tax" as often as we can, and internal vs. external drives are just another example of how "enterprise-grade" items get marked up because the target market will pay for it.
PS: The NSA could be recording every phone call ever made and it's still not all that much actual data.
How about render farms & their associated data stores? How about Google Maps & the associated scads of satellite imagery?
You could be right, of course, but I'm not sure we can say with certainty.
A 4TB external drive is something an end-user consumer shops for based on price.
A 4TB bare internal, meanwhile, is shopped for by techies building systems.