I'm not really a big expert on your file system. How easy is it for you to replace a faulty drive? Can you just pop it out and put in a brand-new one, even if it's a different capacity and/or brand and/or model? Are firmware upgrades of hard disks supported?
Sometimes I wonder if it's possible to make a sort of small consumer (nas4free?) edition of the Storage Pod. It must be awesome to use almost any drive and still have a reliable, big NAS at home.
There are still some rough edges, but overall it's a pretty nice setup for me. I set up the virtual drive to require at least 2 copies of the data on the underlying disks (there are options for 3 copies and more, I believe), and then you can add and remove disks from the array pretty much whenever you want. They can be different speeds, sizes, whatever.
Of the drives installed in 2012, by 2015 32% had failed, 62% were removed out of caution for failing some diagnostics, and 6% were still in place.
Not worth the risk to save $20. I have no way of knowing if Seagate fixed the issues and kept producing them or if they're still trying to dump old stock of bad drives.
EDIT: Ah, I see that you meant you're turned off by 3TB Seagates in general. You could just look at the model number.
Got a decent deal on an HGST a few weeks later.
Home users should just back up their important stuff and buy whatever they want. A super-reliable drive won't save them from an accidental drop, a fire, or theft.
I imagine most of their drives don't even see much traffic if they're used for backups. Write once, read rarely. I wonder how B2 changes that equation.
That being said, I will never ever buy a Seagate HDD again. They joined Maxtor in my blacklist a long time ago. I lost every single one I ever bought (and RMAed) within a year.
I basically just have a cron job set up to run b2 sync at 2am and then kill it at 8am. It generally doesn't take that long to run now, but when I was first syncing everything (a couple hundred GB) this let the upload take place over several days without affecting my daytime bandwidth.
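A crontab sketch of that schedule (the paths and bucket name are placeholders; since b2 sync only uploads files the destination is missing, re-running it each night effectively resumes where the kill left off):

```shell
# m h dom mon dow  command
# Start the sync at 2:00 AM; log output for troubleshooting.
0 2 * * *  b2 sync /home/me/data b2://my-backup-bucket/data >> /var/log/b2sync.log 2>&1
# Kill any still-running sync at 8:00 AM so it stays off daytime bandwidth.
0 8 * * *  pkill -f "b2 sync" || true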
Btw, they should include Amazon Glacier on the pricing page comparison, not just S3.
What is it missing that Crashplan and Backblaze Personal Backup have?
$5/mo of B2 is about 1TB of storage, so if you're above that then maybe cost is a factor?
It's cheaper than Backblaze, slightly. It's not really comparable, because it's not distributed and there is a real risk of data loss. However, for my needs, it works as part of a comprehensive backup plan. I get a bit more bang for my buck and more flexibility.
Seems to work OK so far. Won't be as cheap as their unlimited plan but I'm OK with that.
I'd have to say the 30-day retention policy is one of the main issues. I'd also add that the lack of restore functionality in their client is a big issue.
> It's like a gift.
We're glad you like it! It isn't hard for us, and we get a little press out of it, so it is TOTALLY worth it for us to do. Once we set up all the analysis scripts to pull the data, it's mostly just an automated system.
What baffles me is why nobody else reports drive failure stats? I mean, people hear about Backblaze and buy our product because we're providing this data, why don't any OTHER companies want this free stream of customers?
I have no idea why and am wondering too. Maybe a tradition of secrecy, or they didn't think it would matter to anyone. Also, I think hard drives are a special case because drive reviews are rare if not nonexistent nowadays, and you need large numbers of drives to do analysis. Until companies like yours came along, that was probably the preserve of big business; unlike today, where cloud storage is mainstream, so the average Joe may be involved...
I think more sectors should have automated statistics like these; consumers struggle to assess the quality of goods, relying on ad-hoc comments or tiny (paid) reviews...
Amazon, Microsoft, and Google probably have too much to lose from calling out the vendors of poorly performing drives, plus there's the ever-present risk of a lawsuit - which is probably the overriding concern: if Contoso Storage Ltd had a single bad batch that Azure happened to use for its storage operations, Azure would report an on-the-whole inaccurate failure rate, and Contoso's revenue and stock price would dip accordingly.
Given the size, scale, and marketing of AWS, Azure, and Google Cloud respectively, I don't think publishing their hardware failure rates would boost their cloud services revenues by any detectable amount - and it would all be more work to analyse and publish the findings, plus the subsequent liability.
My wild-ass guess is that organisations with enough hard drives to make reliable estimates give performance/failure data to whoever installed/designed the arrays (who I imagine gain some competitive advantage from knowing which drives fail), so there's probably a pretty serious culture of secrecy around this stuff.
Where you do see it is when relations are strained, see for example YouTube bandwidth reports.
I'm curious if your guys' view on NAS options is evolving at all?
My interest is this:
Here at Pixar we have several folks whom I'd call "lazy power users" at home. Folks like us are familiar with computers, and we want a strong home network, but we want to spend as little time as possible sysadmin-ing the thing. That generally means powerful, easy-to-manage wifi, proper firewalls, etc. ... and networked storage/sharing & backup of all the family computers, from personal machines to spouse and kid setups.
For the circles I run in, this is a fairly common case, and no single service seems to fit the bill.
Backblaze seems so close (especially WRT "it just works"). If it could offer a "Home backup solution" as a service...oh man, I know of at least a hundred people who would sign up in a heartbeat.
- MikroTik firewall, centrally monitored by "The Dude"
- Unifi wireless on a hosted controller if the size justifies it; otherwise do MikroTik CAPsMAN or just a straight integrated wifi AP on the bridge.
- VPN tunnels on the MikroTik to HQ (or not).
- Synology NAS on-site in 2-5 bay config (hot-spare).
- Time Capsule the Macs.
- Windows File History the PCs
- rsync the lunix.
- Use the cloud connector to back it all up to a central Backblaze B2 bucket (straight from the Synology).
- Do more with The Dude, like alerting to order toner when a printer's SNMP trap fires.
Multiply ad nauseam.
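For the "rsync the Linux boxes" step above, a minimal sketch (hostname, user, and paths are placeholders; assumes SSH access to the Synology):

```shell
# Mirror /home into a per-host folder on the NAS; -a preserves permissions
# and timestamps, --delete propagates deletions so the mirror stays exact.
rsync -a --delete /home/ backupuser@synology.local:/volume1/backups/$(hostname)/home/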
Our Backblaze "B2" product line was designed so that you get the exact same cost of storage as the online backup product line, but you are free to implement ANY policy you like (such as backing up NAS boxes). Developers can use these APIs: https://www.backblaze.com/b2/docs/ And if you are a "lazy power user" who wants something that just works, maybe check out one of these 3rd-party tools: https://www.backblaze.com/b2/integrations.html
It's simple: Backblaze "home" doesn't work on NAS boxes, but Backblaze B2 does, and Synology NAS supports it natively via the Cloud Sync package.
Backblaze home is $5 flat rate for a single machine.
Backblaze B2 has granular pricing, but it's cheap - under $20 a year for several of my clients.
For what I want (a fairly static 2 TB backup), it would cost around $130 / year. If my QNAP box supports it, I think I'm going to sign up.
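That figure roughly checks out against B2's published rates at the time ($0.005 per GB-month for storage, with downloads billed separately at $0.02/GB); a quick sanity check:

```shell
# B2 storage cost for a static 2 TB backup, at $0.005 per GB-month.
gb=2000
awk -v gb="$gb" 'BEGIN {
  yearly = gb * 0.005 * 12          # storage only
  printf "storage: $%.0f/year\n", yearly
}'
# Prints "storage: $120/year"; occasional restores at $0.02/GB push the
# real-world total toward the ~$130/year estimate above.
```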
Do you mean some kind of _managed_ on-prem NAS or a NAS hooked up to Backblaze online storage?
In other words, it either fails early or it lasts for a while.
Can you plot the time-to-failure distribution for various models to confirm/deny this rule of thumb? I think it'd be a good addition to the current charts you have, since it's a bit more meaningful than the overall failure rate.
'While files are expunged from the servers after 30 days if they're removed from a computer, your most recent backup snapshot will be retained for 6 months if your computer is completely unable to contact our servers (either it's shut off, or no internet connection). As long as your computer can contact Backblaze at least once every 6 months and perform a full Backblaze file scan operation, you don't delete or transfer the backup and you retain active billing, your most recent snapshot will be retained.'
I use it strictly for "personal" and critical data that I cannot recover or get from anywhere else once lost - personal pics, personal videos, my notes, diary, some mails. No, not all mails - most of them are left with Google, MS, and my VPS, and if they are gone, I am not gonna miss them terribly. I don't even store my code on CrashPlan servers - for that there's GitLab and Bitbucket (mirrored there) and my external hard disk.
In fact I have a ~3GB folder in Dropbox that I have named "Emergency Backup" and if all is lost I might be happy with just that.
It's not that I am trying to be a model "low storage" customer so that CrashPlan can function. I just want to keep my backup habit disciplined (in my own way, of course - many would find my backup strategy stupid for their own use cases, and that's fine).
So please, for the love of proper backup, bring unlimited versioning and file retention for customers like me :-) Or float a cheaper plan where you limit the storage. Or hell, bring something like cold storage and dump my backup there
(I know about B2, but that's not what I am looking for; I want something baked into your main backup service.) All I would do is periodically check whether my backup is there and otherwise just leave it be. (Okay, if not truly unlimited, then something close to it.)
We need a CrashPlan alternative. I am willing to stick around and wait for almost a year, but after that I would like to move to a better and more trustworthy alternative - which you are, except, in all honesty, for that glaringly missing (or omitted) critical backup feature. Also, my backup is something I want to pay for and let someone else handle in a very solid way.
Here's a discussion I had with Brian a few days back, where I raised some points. I am not saying they are brilliant ideas - it's really a wish-list - but please have a look if you can: https://news.ycombinator.com/item?id=15074647
I understand the price difference but I want the ease, peace etc, not something where I need to hook two or few things together.
Keep your focus on the storage angle; there are ways to accomplish this. I use a "glue box" (a Synology NAS) to collect the data and fire it off to B2 - it could easily be a Linux machine running the b2 CLI tools.
For offsite backup, I use a time4vps storage server, but I might transition to B2 in the future. I like what I'm hearing.
> Part of it might be because they don't support Linux
Backblaze DOES support Linux through the B2 product line and third party applications. There is a list of applications that support Linux here: https://www.backblaze.com/b2/integrations.html (scroll down and look for pictures of Penguins).
Supported applications include Duplicity, which ships inside several Linux distributions such as Ubuntu and Debian. So you may already have your local client application pre-installed on your Linux computer, ready to back up to Backblaze!
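A minimal Duplicity invocation against B2 via its b2:// backend (the account ID, application key, and bucket name below are placeholders):

```shell
# Encrypted, incremental backup of /home/me to a B2 bucket.
duplicity /home/me "b2://ACCOUNT_ID:APPLICATION_KEY@my-backup-bucket/home-me"

# Restore later by swapping source and destination:
# duplicity "b2://ACCOUNT_ID:APPLICATION_KEY@my-backup-bucket/home-me" /home/me/restore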
> I've heard stories of people using Amazon to store over 1PB of data, which would be thousands of dollars monthly even at Glacier prices. Do the "ultra" users weigh down on your margins?
(Obviously with a price that scales with the amount of data stored, "ultra" users wouldn't be a problem on B2.)
If their building catches fire, you're SOL.
The second consequence is that you can't choose a data center that is geographically close to you.
That makes B2 unsuitable as off-site primary storage for critical data, and I wouldn't use it for more than a backup.
That being said, I'm a happy backblaze customer and if I ever get a NAS, I would definitely use B2 as the backend for my backup solution.
> If their building catches fire, you're SOL.
We call this the "meteor hits our datacenter" scenario. With the Backblaze Online Backup product, the hope is your laptop isn't hit by the same meteor so you still have a primary copy of the data.
But I'm a HUGE believer in having two lower-redundancy backups stored with completely different technology, backed up by two separate companies that don't share a single line of code, in two separate locations. For example, make a local copy onto an external USB hard drive, and use Backblaze for a remote copy in case your house burns down. It would also be OK to put one copy in Amazon S3 and a separate copy in Microsoft Azure, and to use two separate pieces of software to do each of those backups.
The main reason to use two different companies is in case a bug exists in the backup software. The same bug won't hit both backups at the same time.
Just reading this right now I realized: usually for online services you want the server closest to you, but for backup I guess it's the opposite - you want the server that's furthest away from you!
Source: I emailed Jeff Barr and asked.
AZs are usually within something like a 50-mile radius, which doesn't get you meteor level separation but does get you fire level separation.
The TL;DR from AWS documentation :
An Availability Zone is represented by a region code followed by a letter identifier; for example, us-east-1a. To ensure that resources are distributed across the Availability Zones for a region, we independently map Availability Zones to identifiers for each account. For example, your Availability Zone us-east-1a might not be the same location as us-east-1a for another account. There's no way for you to coordinate Availability Zones between accounts.
The long and confusing explanation:
At least not in us-east-1 and us-west-1/us-west-2, but I am pretty sure many of the large regions are also run across multiple physical facilities.
The so-called availability zone is an abstract and virtual concept. Let us use us-east-1 as an example.
Assume the following:
* Physical DC buildings: Queens, Brooklyn, Manhattan, Staten Island
* AWS accounts: Joe, Alice, Bob
* AZ: us-east-1a, us-east-1b, us-east-1c, and us-east-1d
Every AWS account in the us-east-1 region is assigned three AZs, but for the sake of this explanation, we'll assume only two.
* Joe: 1a, 1b
* Alice: 1a, 1b
* Bob: 1a, 1c
You now ask, "WTF?" but you let it go, thinking this is done for capacity reasons. So do we actually have four different physical facilities, one per AZ? Nope.
So is 1a and 1b in the same facility? Not necessarily, but very possible.
So 1a and 1b in Queens, 1c in Brooklyn, and 1d in Manhattan? Nope.
So what the fuck is AZ? What is the relationship between AZ and physical facility?
Think about virtual memory address space.
Joe's 1a and Bob's 1a are in Queens, but Alice's 1a is in Manhattan. Joe's 1a and Bob's 1a are on different floors and different racks, while Joe's 1b and Bob's 1c are in Brooklyn on the same floor. This is why certain customers run out of m3.xlarge capacity in 1a while others don't in their 1a.
In essence, an AZ is a label, unique per account. An AZ works much like a virtual memory address in an OS.
We learned this because our EMR failed due to low capacity in one account.
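The per-account aliasing above can be sketched like this (the building names are of course hypothetical, carried over from the example):

```shell
# Each account gets its own AZ-label -> physical-facility mapping,
# much like each process gets its own virtual address space.
az_to_building() {  # usage: az_to_building <account> <az-label>
  case "$1:$2" in
    joe:us-east-1a)   echo "Queens" ;;
    joe:us-east-1b)   echo "Brooklyn" ;;
    alice:us-east-1a) echo "Manhattan" ;;
    bob:us-east-1a)   echo "Queens" ;;
    bob:us-east-1c)   echo "Brooklyn" ;;
    *)                echo "unknown" ;;
  esac
}

# The same label points at different buildings for different accounts:
echo "Joe's us-east-1a:   $(az_to_building joe us-east-1a)"     # Queens
echo "Alice's us-east-1a: $(az_to_building alice us-east-1a)"   # Manhattan
```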
"Amazon initially said little about the physical layout of AZs, leaving some customers to whether they might be different data halls within the same facility. The company has since clarified that each availability zone resides in a different building."
“It’s a different data center,” said Hamilton. “We don’t want to run out of AZs, so we add data centers.”
To make this work, the Availability Zones need to be isolated from one another, but close enough for low-latency network connections. Amazon says its zones are typically 1 to 2 milliseconds apart, compared to the 70 milliseconds required to move traffic from New York to Los Angeles.
“We’ve decided to place AZs relatively close together,” said Vogels. “However, they need to be in a different flood zone and a different geographical area, connected to different power grids, to make sure they are truly isolated from one another.”
So, distance of availability zones from each other is limited by speed of light in fiber optics (which is slower than through a vacuum or microwave wireless).
Based on this calculator: http://wintelguy.com/wanlat.html, and light traveling through fiber at roughly 200 km per millisecond, a 1-2 ms round trip caps the separation between availability zones at something like 100-200 km; in practice, switching and equipment overhead mean they're much closer, typically separate buildings in the same metro area. We could confirm this by pulling permits (public record) in Amazon's (or their subcontractor's) name.
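The back-of-the-envelope bound, assuming ~200,000 km/s propagation in fiber and that the quoted 1-2 ms latency is a round trip:

```shell
# Max one-way separation for a given round-trip time over fiber.
awk 'BEGIN {
  v   = 200000      # km/s, roughly 2/3 the speed of light in vacuum
  rtt = 0.001       # seconds (1 ms round trip)
  printf "max separation: %.0f km\n", v * rtt / 2
}'
# Prints "max separation: 100 km"; real switching and equipment latency
# shrink the practical distance considerably.
```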
That is not a guarantee. AWS doesn't actually publish more than what I cited (well, there are photos of the DC floating around the Internet). But there are different physical facilities, and they are some miles apart. Like I said above, 1a for Joe and 1a for another customer don't have to be in the same building, or even on the same floor.
Can you have multiple API "keys" set up for different buckets within one account? And can these keys be permissioned to constrain an API key to bucket(s), and to control the operations possible within that bucket?
My use-case here would be for the remote server to have a B2 API key available to it which only permits data to be added to the bucket. That way, if the server were ever compromised, an attacker couldn't tamper with or erase the backups.
I'd use another (more privileged) API key to do any maintenance of deleting from buckets as-and-when it was needed.
The reason for asking about having separate keys for buckets was purely for "isolation", so a laptop backup bucket API key wouldn't be able to do anything to the desktop bucket (and vice versa etc.)
B2 user here. No, and it's the one thing keeping me from migrating more data from S3 to B2.
One key at a time only, and there is no permission granularity.
Since the backblaze folks seem to read these threads, please make this happen! I'm ready to give you more money per month.
That's a real shame - I am sure I could segregate the different devices by making multiple accounts under the "groups" system, but that gets a bit messy!
I'd love to see this happen! Even if it was just a list of the API functions from the docs with a checkbox against each, hidden behind a "danger" screen, it would make me feel more confident using this! Even if it wasn't granular per-bucket, it would make it a little safer when leaving credentials in cronjobs or bash scripts.
> two separate credentials
It is literally at the very top of our list to do. I'm staring at my task list and helping out on that is my number one task. You should see it coming soon!
However, can you provide me with a backup client integrated with B2 that makes backups as simple as the regular home service? I don't want to deal with CLI or anything complicated.
I get that you don't want 150TB uploads, but why do you insult your potential customers by calling them hoarders just by the OS they use on their home devices?
30 drives gets like 600 MB/s on ZFS, compared to 1200 MB/s with Adaptec hardware RAID.
Also, I have a lot of bad stuff to say about backuppods. Mostly that they don't sell the correct number of wire harnesses. (They need to sell two but they're only selling one; I assume I'm the only person stupid enough to buy from them...)
You can still use a hardware RAID controller with ZFS if you export each disk as its own device and assemble the array with ZFS. It's not the parity calculations that are slow; it's the additional I/O operations that are required, and ZFS writes these directly to disk, while a hardware RAID controller writes them to its cache memory. Another advantage of ZFS is that if your data compresses well, you get an additional speed benefit from that.
I've got what I think is a correctly configured ZFS system, and it's getting the same speeds most people report online.
Now, the thing is that the ZFS system does about 2x worse on dense sequential write workloads compared to the RAID6 HW solution. Not to mention eating a dozen or so GB of RAM.
Software RAID has a larger write penalty; that is why you see slower write performance versus hardware RAID. As I said earlier, writing to the controller's cache memory helps. Recommended read: http://rickardnobel.se/raid-5-write-penalty/
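The linked article's write-penalty arithmetic, sketched for the 30-drive setup mentioned upthread (the 150 IOPS-per-drive figure is an assumption, typical for 7200 rpm disks):

```shell
# Effective small-write IOPS = (drives * per-drive IOPS) / write penalty.
# Penalty: RAID-10 = 2, RAID-5 = 4 (2 reads + 2 writes), RAID-6 = 6.
awk 'BEGIN {
  drives = 30; iops = 150
  raw = drives * iops
  printf "raw:     %d IOPS\n", raw
  printf "RAID-10: %d write IOPS\n", raw / 2
  printf "RAID-5:  %d write IOPS\n", raw / 4
  printf "RAID-6:  %d write IOPS\n", raw / 6
}'
```

Note this penalty model applies to small random writes; a controller's write-back cache is exactly what hides it, which fits the hardware-RAID numbers reported above.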
There are providers like Spanning or Backupify, but I'd prefer to buy it from you because I've been a Backblaze customer since the early days and I trust you.
Notice the rows where the number of failures is higher than the number of drives. That means the replacements failed too.
Edit: the 3TB Seagate is apparently infamous enough to have its own Wiki page: https://en.wikipedia.org/wiki/ST3000DM001
Only the "large" Seagates (6TB and above) seem to be doing OK.