Hard Drive Stats for 2019 (backblaze.com)
355 points by sashk on Feb 11, 2020 | 163 comments

Interesting how the numbers carry over year-to-year in


Some models are dwindling. Some are being tested. Others (like the Seagate and HGST 12 TB) are increasing. Only thing that's really perplexing is why they keep buying more and more of the high-failure-rate Seagate 12 TB drives. It must be more than 3% cheaper to buy (and service!) a Seagate with a 3% chance of failure than to buy an equivalent HGST with a 0.4% chance of failure. I guess when you have 120,000 drives, easy hot-swap enclosures, and software to handle it all that makes good sense! But as an individual consumer, even with a Backblaze backup, it's definitely worth my time to spend a bit more on a drive that's far more reliable than to save a few dollars on a Seagate.
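A rough way to put numbers on that break-even point, as a sketch only: it assumes a failed drive is replaced at full purchase price (no warranty or RMA credit), ignores service labor, and treats the 3% and 0.4% figures as annual failure rates. The function name and the model itself are illustrative assumptions, not anything Backblaze has published.

```python
def break_even_discount(afr_cheap, afr_reliable, years=1.0):
    """Fraction by which the less reliable drive must be cheaper to tie.

    Assumed model: total cost = price * (1 + afr * years), i.e. each
    expected failure costs one more drive at the same price.
    """
    ratio = (1 + afr_reliable * years) / (1 + afr_cheap * years)
    return 1 - ratio


# Seagate at 3% vs HGST at 0.4%, figures taken from the comment above.
one_year = break_even_discount(0.03, 0.004)
five_years = break_even_discount(0.03, 0.004, years=5)
print(f"1 year: {one_year:.1%}, 5 years: {five_years:.1%}")
```

Under these assumptions the cheaper drive only needs a discount of a few percent over one year, and more over a longer service life; adding a per-failure service cost would push the required discount higher.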

If I make a hard drive, and sales are crappy, in part because BackBlaze told the world how shitty they are, I'm going to have to drop the prices to move product.

I suppose there's a movie plot in there where BackBlaze negs their favorite drive so they can buy them cheaper.

> Only thing that's really perplexing is why they keep buying more and more of the high-failure-rate Seagate 12 TB drives.

I am guessing they RMA the drives and get replacements.

Your comment just sparked an interesting question in my mind: If a drive has failed, until now I always imagined the drive was just trashed. But now that you mention they are probably RMA'ing them, do you think that BackBlaze send the RMA drives through a magnetic tunnel of some sort before they ship the drives back to the manufacturer? Because otherwise, how do they ensure potentially unencrypted customer files are not accessed during the repair/refurbishment process?

I work at a large B2B SaaS that stores customer data, we pay extra for the option not to return failed drives that can't be wiped for RMA. We still get a replacement but the original is physically destroyed with a shredder.

I'd hope that their data is all encrypted at rest. Compared to the bandwidth of spinning disks, the cost of doing hardware assisted AES isn't big.

Yeah, I would expect any data reaching the drives to be encrypted by Backblaze, with the key never reaching the disk.

You could even have keys per disk and wipe them when a disk fails.

Either way, you should be fine to RMA the drives as for an external observer without the keys they just contain random noise.

When I back things up with BackBlaze, they leave my computer encrypted, so they're encrypted at rest with them.

They once wrote somewhere that they have contracts with Seagate that allow them to get the drives much cheaper if they buy certain quantities.

At least in Europe, HGST is much more expensive than Seagate. Almost double the price, usually.

Very anecdotal evidence, but 3 of the 3 Seagate drives I ever used (all external 2,5” USB 3 HDD’s, in Seagate’s own enclosures) failed within 2 years, under very modest workloads (just used to store video files for my tv to play).

Meanwhile all WD’s have been rock solid.

FWIW, the consensus on /r/datahoarder seems to be that Seagates should be the absolute last choice for long-term storage.

Seagate do an 'archive' HDD, but with only a 3-year warranty I wouldn't expect it to work as a long term solution.

I have the same experience with 5x 3 GB Seagates. None lasted 2 years. Replaced those with Toshibas and HGSTs in my Synology and it's been 3+ years without bad sectors. Will never buy Seagate again.

I wonder if that's a characteristic of streaming a single file, consecutive blocks. That'd be really strange behaviour though. Perhaps a thermal issue if the TV keeps it powered and spinning all the time? Certainly Xbox seems to keep the disk spinning - I had a WD Ext HDD attached and the light was always flickering even with the console off for whatever reason.

Personally I also find Seagate the loudest and 'clickiest' of all the drive brands. I can hear the mechanicals, making me think they will fail, so I trust them less than other brands.

Why do people use Amazon S3 when Backblaze B2 is 1/4 the cost of S3 and also includes a CDN for free. You also get way faster access speeds with Backblaze vs Amazon since they tier their IO speeds.


Unless you're a bootstrapped startup with just a couple people, paying the AWS bill is not something the engineer probably thinks about too much. Setting up a new billing account with another company is just enough friction to just use whatever AWS offers and call it a day.

Also, most employees aren't really incentivized to reduce or minimize infrastructure expenses.

I think a big reason is that people are using the rest of the amazon ecosystem. If your costs aren't primarily storage, you might be willing to pay a premium to use something that integrates nicely with other services you're using. Here's an article[0] that does some other comparisons between providers and mentions things like upload speed and security features.

[0] https://www.cloudwards.net/azure-vs-amazon-s3-vs-google-vs-b...

Last I checked, Backblaze still stores most data in 1 location, no?

So, durability of data (which to be fair doesn't matter for most s3 use cases), and interop with literally everything else in AWS

Intelligent data tiering

Actual access control

Pre signed URLs

I've combined cloudflare workers with backblaze to implement etags, signed URLs, etc. Backblaze is part of CF's bandwidth alliance so your bandwidth fee is zero. This makes for a very low monthly cost

Can you elaborate further about this setup? Is there an article or a FAQ topic about it?

Hi, I haven't had time to write up about this, however, I have dumped the majority of the related code here for you and others who are interested in this solution: https://gist.github.com/chocolatkey/a7ef0364e357629e9875521d.... That should help you get started. It includes HMACSHA256 shared secret URL signatures based on IP, expiry, and optional path scope restriction, caching, ETAGs, sentry error reports, access to non-B2 data from a server w/ basic auth, and more... URLs look like this: https://example.com/delivery/UNIQUE_ID/p-001.jpg?token=16fb4... . My B2 bucket is public, however the requested path is also hmac'd with a secret known only to the CF worker to derive the path of the resources in the bucket. It is optimized for my use case of serving EPUB data. I do not guarantee it to be free of flaws, but it's worked well so far.
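For readers who want the general shape of an HMAC-SHA256 signed URL bound to path, client IP, and expiry, here is a minimal Python sketch. It is not the code from the gist (which runs in a Cloudflare Worker); the secret, the `expires`/`token` query parameters, and the message layout are all hypothetical choices made for illustration.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"replace-with-a-real-secret"  # hypothetical shared secret


def sign(path: str, client_ip: str, expires: int, secret: bytes = SECRET) -> str:
    """Return a hex HMAC-SHA256 over path, client IP and expiry timestamp."""
    message = f"{path}|{client_ip}|{expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def make_url(base: str, path: str, client_ip: str, ttl: int = 3600,
             now: Optional[float] = None) -> str:
    """Build a URL that is valid for `ttl` seconds for one client IP."""
    issued = time.time() if now is None else now
    expires = int(issued + ttl)
    token = sign(path, client_ip, expires)
    return f"{base}{path}?expires={expires}&token={token}"


def verify(path: str, client_ip: str, expires: int, token: str,
           now: Optional[float] = None, secret: bytes = SECRET) -> bool:
    """Reject expired links, then compare signatures in constant time."""
    current = time.time() if now is None else now
    if current > expires:
        return False
    expected = sign(path, client_ip, expires, secret)
    return hmac.compare_digest(expected, token)
```

Binding the token to the IP and expiry means a leaked link stops working for other clients and after the deadline; `hmac.compare_digest` avoids leaking information through comparison timing.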

Thanks a lot!

Yev from Backblaze here -> the Nodecraft folks did a great job with that project -> https://jross.me/free-personal-image-hosting-with-backblaze-...

Hi, Yev! Thanks a lot.

I'm toying with the idea of moving our CDN from the AWS stack to B2 + CF, thanks to the Bandwidth Alliance. There's at least one thing stopping me: even for the simple scheme of hosting static content out of a bucket, we'd have to deploy Workers just for URL rewriting. The CF folks recommend that approach rather than URL rewrites via simple rules[1]. But it puts us in a weak position where costs rise twice: for increased edge traffic AND for an increased number of requests.

Can anything be done on BackBlaze side to address the problem, like custom domains for buckets? Like https://f001.backblazeb2.com/file/bucket-name/file.jpg => https://bucket-name.f001.backblazeb2.com/file.jpg ?

[1] https://community.cloudflare.com/t/page-rule-setting-request...

Disclaimer: I work at Backblaze.

> Last I checked, Backblaze still stores most data in 1 location, no?

Backblaze now has multiple regions! One in Europe (Netherlands) and one is called "US-West". Quietly the US-West is actually three separate data centers, but your data will only really land in 1 datacenter somewhere in US-West based on a few internal factors.

To be absolutely clear, if you only upload and store and pay for 1 copy of your Backblaze B2 data, it is living in one region. To get a copy in two locations you have to pay twice as much and take some actions. So if this kind of redundancy is important to you for mission critical reasons Backblaze B2 would only be half as expensive as one copy in Amazon S3, not 1/4 as expensive.

In the one copy in one region in Backblaze B2, any file is "sharded" across 20 different servers in 20 different racks in 20 different locations inside that datacenter. This helps insulate against failures like if one rack loses power (like if a power strip goes bad or a circuit breaker blows). But if a meteor hits that 1 datacenter and wipes out all of the equipment in a 1 mile blast radius, you won't be getting that data back unless you have a backup somewhere else.

S3 is single zone unless you specifically request multi zone durability (most don't.)

I don’t think this is accurate. Within a single region, S3 stores in multiple availability zones by default.


You're completely right. I guess I was working on old info.

Paying for outbound bandwidth is a big one.

In the same way, why do people use Backblaze when they can use Wasabi and not pay for bandwidth? https://wasabi.com/cloud-storage-pricing/

I looked at Wasabi some time ago, but their pricing is a LOT less simple than their headline says it is.

The major caveats are hidden away in their pricing FAQ: they charge a 1TB minimum if you use less, and there's a 90 days minimum retention period, meaning if you update a file a few times you will pay for the full 90 days of every intermediate version. Additionally, they reserve the right to make you pay for egress if it looks like you transfer more than you have stored.

So all in all, Wasabi might be the right fit for you if you store >1TB of files that are infrequently updated and get less than 1 download/month on average. If you fit that use case, I think their free egress pricing is awesome, but it's definitely not for everyone.

Wasabi does not allow you to use unlimited bandwidth. Your egress is supposed to stay close to your total ingress. So if you are uploading assets that will be accessed more than a few times in the first month, I think you will be out of spec for Wasabi.

If I understand right, if you put CloudFlare in front of Backblaze, you get free bandwidth thanks to Bandwidth Alliance: https://www.cloudflare.com/bandwidth-alliance/backblaze/

Not unlimited, and not for all use cases. Check out Cloudflare ToS.

Wasabi charges a minimum of 3 months of storage on anything uploaded.

Because of vendor lock in. When you move a lot of data between S3 and EC2 it costs nothing (or very cheap). When you move data outside of AWS, there is extra cost, so it might not even be cheaper overall.

I am strongly considering B2 as an option for a dropbox-style system. Something where I run 8-16TB of hot tier on my local LAN, with B2 serving as the slower mass storage tier behind it. It seems that the average B2 access latencies would be ~100-200ms, which is very tolerable for a cache miss on such a massive tier of storage. With this amount of space available you could have pre-fetch rules that do things like pull down entire directories as files within them are accessed.

There are usually many reasons:

1. Scale - S3 is big - really really big! You don’t need to care if you store one KB or several petabytes.

2. Tiers: the default on S3 is several way replicated storage with 11 9s of durability with high availability. However you can select from cheaper options with the trade off you are happy with.

3. Cost: S3 has reduced prices several times, you can be reasonably sure your costs will go down over time on per unit basis.

Here's one reason I would need something like S3. Sure, I could hack all that together into something barely functional myself, but it's not worth it. Pretty handy. https://aws.amazon.com/solutions/serverless-image-handler/

I too use the serverless image handler but it's not perfect. The documentation is really crappy and over the summer they transitioned the whole system from thumbor to sharp and didn't provide great backwards compatibility.

Where's the info about the free CDN?

Check Cloudflare bandwidth alliance

This bandwidth alliance? https://www.cloudflare.com/bandwidth-alliance/

Cloudflare is a featured integration that only mentions that transfer fees are free, not that CDN hosting is free: https://www.backblaze.com/b2/solutions/content-delivery.html

Cloudflare does have a free CDN tier "For individuals with a personal website and anyone who wants to explore Cloudflare." but it's not the same as B2 including a CDN for free; even Azure is a part of the bandwidth alliance.

Right, but it means you can basically (ab?)use Cloudflare to get free egress from B2 storage. Cloudflare won't get too mad until you start hitting terabytes per month; even the free tier doesn't have restrictions.

You can also turn on an extremely aggressive caching policy with a page rule that will keep everything under a given subdomain for a month. This makes the "free CDN" part easy, though again, people who do this run the risk of getting their accounts terminated.

You just get a discount on data egress, it's not free.

It depends on the partner. For Backblaze specifically it is indeed free.

I use B2 as “cold storage” of large-ish files. It’s incredible how low the monthly bills are.

If you're with another cloud provider, you still have to pay egress fees to Backblaze. That cancels out the cost savings.

B2 does not implement the S3 API. Also the B2 API is much slower than S3.

Disclaimer: I work for Backblaze so I'm biased. :-)

> the B2 API is much slower than S3.

This is "generally true" for 1 upload thread. We aren't even sure what Amazon is doing differently, but they can be a little faster in general for 1 thread (some people only see 20% faster, some see as high as 50% faster, might be latency to the datacenter and where you are located).

As long as you use multiple threads, I make the radical claim that B2 can be faster than Amazon S3. The B2 API is slightly better in that we don't go through any load balancers like S3 does, so there is no choke point. What this means is that in B2 40 threads are actually uploading to 40 separate servers in 40 separate "vaults" and none of the threads could possibly know the other threads are uploading and it does not "choke" through a load balancer. This was all designed originally so that 1 million individual laptops could upload backups all at the same time with no issues and no load balancers. And it works great every day.

Practically speaking, for most people in most applications, this means both Amazon S3 and Backblaze B2 are essentially free of any limitations. If you aren't using enough of your bandwidth, spawn a few more threads (on either platform) and soak your upload capacity. But in full disclosure, if your application is only single threaded, yes, B2 tends to be 20% slower for that 1 thread.
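The "spawn a few more threads" advice can be sketched in a few lines of Python. This is only an illustration of the pattern: `upload_one` is a hypothetical stand-in for whatever function uploads a single file with your client library, and nothing here calls a real B2 or S3 API.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_all(paths, upload_one, threads=8):
    """Call upload_one(path) for every path using a pool of worker threads.

    Results come back in the same order as `paths`; an exception raised
    by any upload is re-raised here when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(upload_one, paths))


if __name__ == "__main__":
    # Dummy uploader so the sketch runs without network access.
    def fake_upload(path):
        return f"uploaded {path}"

    print(upload_all(["a.bin", "b.bin", "c.bin"], fake_upload, threads=3))
```

Threads suit this workload because each upload spends most of its time waiting on the network, so raising `threads` until the uplink is saturated is usually the only tuning needed.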

IAM access control, ease of use, reliability, speed.

Because the rest of my infrastructure currently runs on AWS and aws egress charges are far more expensive than the b2 savings.

Genuinely curious ... do you not assign any value to having a backup outside of Amazon ?

AWS can certainly provide geographical diversity, but on the organizational abstraction layer, all eggs are in one basket, yes ?

Is having organizational redundancy something you assign zero value to, or something whose value conflicts with the egress costs so as to make it a difficult decision ?

Again, genuinely very interested ...

Not OP, but of all the things that could kill the startup I work at, AWS shutting down is about on spot 63864664 on the list.

I mean we have like 2 million lines of Python code written for Lambda, S3, SQS, SNS, Kinesis, Redshift etc. using boto3. So if AWS dies, it's not like a data backup will save my startup. We're dead.

That sounds troublesome, no?

Not the parent, but they mentioned that they are a startup. AWS "dying" has killed zero startups so far. Time to market has killed many more, same for "not-invented-here" syndrome, and prematurely building for the future.

Maybe? I'm not an influential enough engineer to change something that fundamental. Seniors say it's troubling but they're already married to AWS so it's very expensive to have a plan B. I don't think AWS dying is high on the list of why the startup can die. There are bigger dangers and they can only be solved by writing code that works.

I see the bigger danger as not AWS falling over, but AWS deciding to charge more money once you're locked-in.

We attempted to be cloud agnostic (using terraform instead of CloudFormation for example) and then later multi-cloud. The amount of complexity and cost around it was just too much.

If AWS goes down, more or less a good portion of the internet goes dark. It's an acceptable risk at this point unless you are truly massive and entirely self contained- if you are using any 3rd party services, IE for auth, payment, whatever- they may be using AWS as well and you are still exposed.

We backup data that's not on S3 outside of AWS (code, operational databases), but most of our S3 data is effectively stuck due to the insane export prices. It's not the end of the world if we were to lose everything in S3 anyway.

To anyone reading this: Don't store lots of small files on S3. It's a terrible idea.

Colocation providers now have options to put your physical environment on the same network as your Amazon account to avoid egress fees.

I live for these reports. Always insightful and professional. Thank-you SO MUCH for publishing this data.

Yev here -> You're welcome! The conversation's always fun :D

I barely had time to skim it, but I'm not sure I like how the ST12000NM0008 shows up in the table. I find it really hard to reason about what the real failure rate could end up being on those drives. For example, you've got an average of about 45 days on each drive, so the failure rate is multiplied by roughly 8 to extrapolate the annualized failure rate. Doesn't that overstate the estimated rate of failure since drives will tend to fail more often at the start of their life?

I only guesstimated out of the table and didn't have time to look at the actual data, so it's possible I misread something.
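To make the extrapolation concrete, here is a small sketch of the annualized failure rate calculation as Backblaze describes it in these reports: failures divided by drive-years, where drive-years is drive-days over 365. The drive count and failure count below are made-up illustration values, not figures from the 2019 table.

```python
def annualized_failure_rate(failures, drive_days):
    """AFR in percent: failures per drive-year of operation."""
    drive_years = drive_days / 365
    return 100 * failures / drive_years


# Hypothetical fleet: 1,000 drives with 45 days of service each.
drive_days = 1000 * 45
print(365 / 45)                                 # scaling factor, about 8.1
print(annualized_failure_rate(5, drive_days))   # about 4.06 (% per year)
```

With only 45 days per drive, every observed failure is scaled up by about 8, so a cluster of early-life failures would inflate the annualized figure exactly as the comment suspects.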

Does anyone remember what their definition of "drive failure" is? Is it a SMART "failure imminent" report, a single uncorrectable read error, or complete data loss for a whole disk? I recall reading about it in one of their previous reports, but can't find it again.

EDIT: nevermind, found it.

"Backblaze counts a drive as failed when it is removed from a Storage Pod and replaced because it has 1) totally stopped working, or 2) because it has shown evidence of failing soon.

A drive is considered to have stopped working when the drive appears physically dead (e.g. won’t power up), doesn’t respond to console commands or the RAID system tells us that the drive can’t be read or written."


Ah yes, the reliable BackBlaze folks. That they've out-Googled Google in a niche using mostly commodity infrastructure and kept their business alive for so long is a testament to their ingenuity (I wonder how their operating costs compare with AWS Glacier which has a theoretical advantage of unpowered disks.). And the releasing of this proprietary operational business data is a testament to their coolness factor.

It's a timely article as I'm looking at HC530's (WUH721414ALE6L4 / WUH721414ALN6L4 (wiredzone carries it)) for a home FreeNAS box:

- any relatively-modern enterprise 4U 3.5" storage box with Xeon 4 cores or so

- quieter, high-volume fan mod

- RAM: 64-128 GiB, beyond that isn't useful unless deduping

- NIC: X710-T4L 4x 10GbE copper NIC

- ZIL: mirrored pair of high-endurance, write-intensive, reliable SSD like Optane 900p/905p 280-480GB

- L2ARC: striped pair of read-intensive/larger SSDs like the Gigabyte Aorus Gen4 1 TB

This will fit nicely as my home NAS for a water-cooled dual EPYC virtualized server/workstation build underway. I managed to get a single water block with (3) G1/4 connections that will cool both CPUs and the VRM chokes/converters.

If anyone has better suggestions, please chime in.

Why does someone need something like this? I ran a home Synology NAS at one point but it wasn’t worth the trouble. Let others run and maintain those hot, loud, power hungry disks.

Then you already made the mistakes of:

- conflating trouble for you with trouble for me, which it clearly isn't

- not owning your own data

- paying more to store it

- paying to access it

- giving up the ability to keep things that aren't worth storing on paid clouds but cost next to nothing to keep on cheap drives

Furthermore, there are additional network costs such as AWS network charges AND home ISP data limits.

And there are other uses, such as:

- backing-up VMs

- backing-up computers

- caching package and source code repos

- backing-up CCTV footage

- and whatever else comes along

Speed. Streaming files (movies, tv shows, youtube channels, linux ISO's, large file collections) is a lot faster; I can reliably hit 100MB/s on my home connection, my DSL caps out at 20MB/s if I do nothing else.

I’m a Backblaze customer of many years & respect their team a lot. But seriously, “out-Googling Google” because they have cheaper storage is a meme that needs to die.

GCP and AWS both store full copies of your data in multiple locations by default (Availability Zones in AWS-speak). So it’s not an apples to apples comparison. The reduced redundancy is priced in, for people who can tolerate it.

Woah, chillax the dramatic rhetoric, your majesty. ;-P

The original scrappy Google was founded on commodity hardware held together by LEGO. The point was not to do it the enterprise way, with redundant everything, which was wasteful for web-serving use-cases that were solved with better high-availability in software. These days, if you're a giant company like FAANG, you can easily afford to go to Quanta and say: give me 10k racks worth of compute nodes to this specification. If you're starting out and broke, you gotta use what's on the shelf, cobble together a custom solution optimized for the purpose and/or kit out a test lab with a mishmash of used servers from eBay.

Slightly off topic: is anyone using B2 (which seems cheaper if you have more than one computer for a certain amount of data) for personal data backups with strong client side encryption across multiple platforms (Linux, Mac, Windows)? If so, how do you handle it?

I sync all my device files to a local FreeNAS server which runs duplicacy in a jail and syncs it every night to Backblaze B2. I looked at duplicity, restic, attic, borg and in the end settled on duplicacy. Pay attention to the duplicacy license; for some people it could be a problem.

I do this, though not from Windows, just Mac and Linux. I use restic, which has B2 support and handles all the encryption. It also does diffing for backups. There's a Windows build, so I assume it would work for you there as well.

You can view and download builds at https://github.com/restic/restic/releases/

I don't automate this though, I just use it for occasional backups. Not sure what the automation story around restic is.

Similar to the other two sibling comments, been using restic to sync to B2 over the past 6-7 months. Stored amount has been 450-475 GB, and total costs tend to be about $2.50-$2.75 per month.

Yes. I use restic same as the sibling comment.

Have >8TB of data from multiple machines with a lot of deduplication (source is somewhere around 10 to 12TB).

I use Arq on two macs and it works very well with B2.

I use and like Arq also, but the OP asked for something that covers Linux, which I believe Arq does not.

Looking at the data,

It seems they will soon reach 1000 PB / 1 EB.

The top 5 annualized hard drive failure rates are all from Seagate. All drives from Hitachi and Toshiba have an AFR lower than 1%.

So basically dont buy Seagate.

It's pretty much been this way for a few years, with only a few model lines of Seagate being the outlier. As always, thanks to the BackBlaze team for publishing these numbers.

My math says they're already over 1000 PB/1 EB:

1,089,318 = 4 * 2852 + 4 * 12746 + 8 * 1000 + 12 * 1560 + 12 * 10859 + 4 * 19211 + 6 * 886 + 8 * 9809 + 8 * 14447 + 10 * 1200 + 12 * 37004 + 12 * 7215 + 4 * 99 + 14 * 3619

Don't think I made a typo there, but please check my work. Even counting as 1024 TB = 1 PB and 1024 PB = 1 EB, that leaves 1,048,576 TB = 1 EB and they're over that threshold.
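For anyone who wants to check the work, the sum above is easy to reproduce. The pairs below are copied from the comment, read as (drive size in TB, drive count); that reading is an assumption based on how the terms are written.

```python
# (size_tb, drive_count) pairs, in the order given in the comment.
fleet = [
    (4, 2852), (4, 12746), (8, 1000), (12, 1560), (12, 10859),
    (4, 19211), (6, 886), (8, 9809), (8, 14447), (10, 1200),
    (12, 37004), (12, 7215), (4, 99), (14, 3619),
]

total_tb = sum(size * count for size, count in fleet)
binary_exabyte_tb = 1024 * 1024  # 1 EB counted as 1024 PB of 1024 TB

print(total_tb)                      # 1089318
print(total_tb > 1_000_000)          # True: over 1 EB in decimal units
print(total_tb > binary_exabyte_tb)  # True: over 1 EB in binary units
```

The total matches the 1,089,318 TB figure, and it clears both the decimal and the 1024-based threshold for raw capacity.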

The February 5, 2018 "500 Petabytes and Counting" blog post should soon be eclipsed by a 1 EB post - though it appears they're counting actual data stored, not capacity. Nonetheless, with some redundancy, extra capacity, and overhead, we'll likely see that number soon.

> So basically dont buy Seagate.

Or do, because they're cheaper than the competition and modern systems can handle failures.

True with enough redundancy it's fine, and if they have special terms with SG such as free replacements and heavy discounts then it's little wonder they use so many.

Myself though, for SoHo use, I'm willing to pay more for less stress because I don't have the sheer volume of devices, and the time to replace is time spent doing something useful instead of shuffling HDDs and rebuilding RAID arrays. A 5% saving on a handful of drives is not worth it, but a 40% saving on thousands makes them competitive.

Every time I look at HDD price it seems Seagate is always a little more expensive. But the difference is less than 5% to the point one may argue they are all priced similarly.

I don't want to save a few dollars for potentially 4x the chance of failure and hassle.

And even if we ignore the outlier to 2%+ from two models, Seagate is still on average 2-3x more likely to fail.

Does anyone here have experiences with BackBlaze's B2 service for hosting files? I'm considering switching to it from S3 because it is much cheaper. (I need to transfer 2-3TB / month, usually in 2-3 bursts of worldwide distribution).

Yev from Backblaze here -> We're definitely more affordable and our integrations (https://www.backblaze.com/b2/integrations.html) make it easy to get your data to us. We even have partnerships with companies who can help transfer data from S3 into Backblaze B2!

How is Backblaze able to be so much cheaper than the other, larger competitors? I assume Amazon/Google/Microsoft has squeezed every last cent from suppliers and also has highly cost-optimized staffing costs.

Yev here -> great question! We are a bootstrapped company and we focus on inexpensive storage (https://www.backblaze.com/blog/vault-cloud-storage-architect...). Because we've built a robust system that doesn't use a ton of expensive components we can provide hot cloud storage (B2 Cloud Storage) and computer backup at an affordable rate while still making decent margins. To learn more about our business and decision making, we have a pretty cool series of entrepreneurship blog posts that might be interesting to some: https://www.backblaze.com/blog/category/entrepreneurship/

Reading about b2 pricing it says, you get "10GB of free storage, unlimited free uploads, and 1GB of downloads each day". Doesn't that amount to essentially free backups for (reasonable) personal use? Or am I missing something?

You aren't missing anything. I use B2 along with Restic to backup my Linux machines since their standard backup solution doesn't support Linux. It costs me around $1/month to backup my primary desktop and two laptops.

They had a blog post about doing this a while back, so they are definitely aware of the use case: https://www.backblaze.com/blog/backing-linux-backblaze-b2-du...

I still use their standard backup service for my family's Windows machines since it's more "batteries included".

I think even casual users tend to have more than 10GB of data these days.

I don't. Although I can easily fill up a terabyte drive, little of that is my own personal files that I need to keep if the drive blows up. Most of my stuff is source code, documents/notes and some photos (with photos being the only thing that takes up significant space). Almost everything else I can re-download or rebuild from the original source as and when I need it.

In total, sure, but at least for myself the really important stuff would fit in 10MB and I think I could fit all of the medium importance stuff in 1GB. The remaining terabytes are nice-to-have but I wouldn't be too upset if I lost it.

I'm over the 10GB free limit. It costs me about $1.50 a month to back up "irreplaceable" data from my NAS.

Is there any consensus among Backblaze employees (or even just your personal opinion if applicable) for what brand/series of drives to use for home NAS devices?

I ask because the online favorite appears to be WD Reds, which you have phased out since 2018.

Yev here - it's interesting, we don't really chat about that often - what I would do is get the least expensive drive that has the most capacity and make sure the NAS is backed up somewhere in case of failure or theft. Personally I think the Toshiba drives are pretty good, but Seagates are affordable and do a good job. Plus there's always HGST which are rock-solid, but tend to run a bit more expensive.

Thank you Yev. I'm wondering about the bandwidth, especially internationally. Do you have any numbers on that? Say split by Europe/US/Other.

We do have a datacenter in the EU (Amsterdam) - and if you set up your account there you'll be able to transfer data to it. That's a popular destination for folks living close to it, but even before that one went "live" we had lots of people using our West Coast Data Centers without much issue. If you have a ton of data you can take a look at the Fireball (https://www.backblaze.com/b2/solutions/datatransfer/fireball...) which allows you to rapidly ingest data to us.

What are you using as TCP congestion controller? BBR should provide better utilization on long pipes (e.g. transoceanic transfers if stuff isn't geo-replicated). Totally anecdotal, but it helped me FTPing data from the US to europe.

Yev here -> This question's beyond me, lemme see if I can get a dev on the line :D

*Edit - sounds like BBR is used in some of the environment!

(I need to quickly ship a 50mb file to 50,000 clients worldwide.)

Hey Michael, I host RAW photos I want to share inside B2 (48mb each), and then put CloudFlare in front of it using their tutorial [1]. It gets edge caching, and achieves 200-500mbps. It's great, and I have absolutely no complaints.

1: https://help.backblaze.com/hc/en-us/articles/217666928-Using...

@mherrmann - Only about 10-20GB, so not the TB levels you are dealing with, but backblaze isn't actually doing the transfer, it is CloudFlare.

@toomuchtodo - Yes, and on top of that, both B2 and CloudFlare are completely free since I'm under the 10gb storage limit (for now), and I'm a personal user of CloudFlare (for now).

Is your outbound data free because of the bandwidth alliance deal Cloudflare has with Backblaze?

Thank you; and how much data are you transferring each month?

I used them for both backup (B2 storage with restic on linux servers) and also for serving static content for my homepage, together with Cloudflare ( https://www.backblaze.com/blog/backblaze-and-cloudflare-part... ) Works like a charm

I use it for personal backups with rclone. Works great.

I have made all my hard drive purchasing decisions based almost entirely on these reports for the last couple years and have not been disappointed with the results.

I use Backblaze's massive infrastructure to store pictures of my keyboard.

Is it a cool keyboard?

Are the keys very large?

I still can't believe BackBlaze gives this data away for free. Seems like something they should be selling to other cloud providers

Maybe they consider this report to be an ad for their services? The name recognition this report gives them is probably quite valuable.

It also pressures HDD companies to make better products and appear higher on these lists, which is good for Backblaze.

To quote them: "Transparency breeds trust. We’re in the business of asking customers to trust us with their data. It seems reasonable to demonstrate why we’re worthy of your trust."

I’m sure Amazon, Google, Facebook, etc collect their own stats on drive failure. It would be almost negligent if they were just guessing in the dark every time they buy drives.

Main difference is probably Backblaze is small enough to publish these stats without hurting their supplier relationships. (pure speculation)

I love Backblaze, but their log package in my Library folder has grown to something like 10 gigs. Wish there was a way around that.

Signed up for this a week ago. 45 days remaining to upload.

Hurray for Canadian internet.

This may not apply to you, but at least 2 of the underdogs in the Canadian ISP world (MNSi & TekSavvy) have been rolling out gigabit fiber.

I've got a 1Gb fiber pipe for 1/10th the cost that Cogeco was charging.

I have ~10 Mbps upload here in the US, and my backup was looking to take about a month for about 3ish TB of data. One thing that helped: with the default setting of only 1 backup thread, the Windows client was unable to saturate my upload bandwidth. Upping it to 4-6 threads allowed it to keep enough data moving to actually saturate my upload bandwidth and brought my backup down to like a week.
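
For anyone who wants to sanity-check an estimate like this, here is a rough back-of-the-envelope sketch in Python. It assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits/s), and the function name and the `efficiency` knob (the fraction of the link the client actually manages to fill) are made up for illustration. It knows nothing about compression or deduplication the client may apply.

```python
def upload_days(size_tb: float, upload_mbps: float, efficiency: float = 1.0) -> float:
    """Days needed to push size_tb terabytes through an upload_mbps link.

    efficiency is a hypothetical fudge factor: 1.0 means the link is
    fully saturated, 0.25 means the client only fills a quarter of it.
    """
    total_bits = size_tb * 1e12 * 8              # decimal terabytes to bits
    usable_bits_per_second = upload_mbps * 1e6 * efficiency
    seconds = total_bits / usable_bits_per_second
    return seconds / 86400                       # seconds per day


full_rate = upload_days(3, 10)                   # link fully saturated
quarter_rate = upload_days(3, 10, efficiency=0.25)
print(f"saturated: {full_rate:.1f} days, quarter speed: {quarter_rate:.1f} days")
```

At raw line rate, 3 TB over 10 Mbps comes out to roughly 28 days, and a client that only fills a quarter of the link needs about four times as long, which is why the thread count matters so much.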

Not a really related remark about this handy study, but is anybody else still in real awe of how spoiled we are with regard to the sizes and speeds of HDs nowadays? I mean, the smallest capacity drive on their chart is 4 terabytes.

Not feeling spoiled at all, not at all. Especially not with a 2 to 3 percent failure rate. The failure rate I experienced in my workstation makes me worry about not having RAID 1 or 10. HDs for 9 TB in RAID 10 are not that cheap.

But the bigger issue is that the warranty terms for HDs nowadays are down to 2 or 3 years, so this investment is short-lived. It also tells you something about the manufacturers' own reliability estimates for their products.

Can't say I agree with that sentiment. The fact that I can quite reasonably have a 30TB usable RAID5 NAS array makes me feel pretty spoiled. Then again, I'm old enough that my first HDD was 10MB.

Mine was 10MB as well, with a dedicated controller. Quantum if I'm not mistaken. And it lasted much much longer than the averages I get from 4TB disks. I believe I managed to take files out of it in 2000, about 13 years after it was installed.

Edit: nope, probably was a ST506 or 412.

I'd be wary of making a RAID5 array with drives that big; you could easily lose another drive from the I/O caused by a rebuild; though if you have backups (you should) then it's probably an acceptable risk for non-critical data.

I'd agree with that. Even 2-disk redundancy these days is a bit dangerous when you're talking about 14TB drives and 100+TB arrays. As is often stated: RAID is not backup.

Since HDDs have, for the most part, been relegated to being external drives on laptops, I'm still looking forward to SSDs becoming way cheaper and reaching current HDD prices per GB. Internal storage on laptops has reduced or stayed the same while our datasets have grown exponentially over the years (with photos and videos). Since SSDs also perform much better when there's always a good amount of free space (for wear leveling and maintenance), it's all the more painful to live with lower capacity SSDs on laptops.

Sizes, yes, speeds, no. 600 MB/s of data transferred, and only for linear accesses.

I have about 10 TB of video files. I use BackBlaze for Windows but I would like the files to be available on other computers and my phone in my local network.

What can I use to do this and still keep offsite backups?

I think their more premium plans offer sharing.


I have two ST12000VN0007 (VN) Seagate drives. The report shows the ST12000NM0007 (NM) has a 3.32% failure rate. I wonder how closely related the VN and NM models are.

If you look, that drive model is also the most highly used by far. I think it's just a matter of the larger sample size / use time.

Surely it doesn’t matter when you have 10,000s of drives? Aren’t you already at a large enough sample size? If it isn’t, what is the point of them publishing this every year? I don’t know the math of the matter though.

> Surely it doesn’t matter when you have 10,000s of drives? Aren’t you already at a large enough sample size? If it isn’t, what is the point of them publishing this every year? I don’t know the math of the matter though.

I think drive age matters? I'm not clear if they cycle drives out at a certain age or just run them until they fail.

Also, if a drive is low enough in cost, then the additional cost of replacing an incremental 1% may be lower than the cost of acquisition of a more reliable drive.
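
To put rough numbers on that trade-off, here is a minimal sketch in Python. Every figure in it is a made-up assumption for illustration (the $300 all-in cost per failure, the 5-year service life, the failure rates), and it ignores warranty replacements, which would shrink the cost per failure further.

```python
def break_even_premium(afr_cheap: float, afr_reliable: float,
                       cost_per_failure: float, years: float) -> float:
    """Extra purchase price per drive that the more reliable model can justify.

    afr_* are annual failure rates as fractions (0.03 means 3% per year).
    cost_per_failure is the assumed all-in cost of one failure
    (replacement drive, labor, rebuild), in dollars.
    """
    extra_failures_per_drive = (afr_cheap - afr_reliable) * years
    return extra_failures_per_drive * cost_per_failure


# Hypothetical inputs: an incremental 1% annual failure rate,
# $300 per failure, 5 years in service.
premium = break_even_premium(afr_cheap=0.03, afr_reliable=0.02,
                             cost_per_failure=300.0, years=5)
print(f"break-even premium per drive: ${premium:.2f}")
```

Under those assumptions the more reliable drive is only worth about $15 extra per unit, so a bigger price gap than that favors the cheaper drive. For a home user the cost of a failure (lost time, lost data) is usually far higher, which pushes the answer the other way.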

Yea I don't know. I'm not big on statistics either. I just noticed that the drives that did the worst were the ones that had the most usage overall.

That would probably be price ~ failure rate correlation.

Looks like the Seagate 0007 are 1y old on average, where the 0008 are 44 days old on average.

The 12TB HGST are 220 days old on average. The Seagate 12TB failure rates seem high, quite unfortunate as I own 6 of them.
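
On the sample size and drive age questions above: as I understand it, the reports normalize by drive days rather than by drive count, so a model that has only been in service for 44 days contributes far less exposure than one that has run for a year. Here is a small Python sketch of that calculation plus a crude uncertainty band. The failure counts and drive days are hypothetical, not taken from the report, and the interval is a simple normal approximation to a Poisson count, not anything Backblaze publishes.

```python
import math


def annualized_failure_rate(failures: float, drive_days: float) -> float:
    """Failures per drive-year, expressed as a percentage."""
    drive_years = drive_days / 365
    return 100.0 * failures / drive_years


def rough_interval(failures: int, drive_days: float, z: float = 1.96):
    """Crude ~95% band: treat the failure count as Poisson, use +/- z*sqrt(n)."""
    spread = z * math.sqrt(failures)
    low = annualized_failure_rate(max(failures - spread, 0.0), drive_days)
    high = annualized_failure_rate(failures + spread, drive_days)
    return low, high


# Hypothetical fleets with the same 1% AFR but very different exposure.
big = (100, 3_650_000)    # 100 failures over 10,000 drive-years
small = (4, 146_000)      # 4 failures over 400 drive-years

for name, (failures, days) in (("big", big), ("small", small)):
    afr = annualized_failure_rate(failures, days)
    low, high = rough_interval(failures, days)
    print(f"{name}: AFR {afr:.2f}% (roughly {low:.2f}% to {high:.2f}%)")
```

Both fleets show the same headline rate, but the band around the small one is so wide that it tells you very little, which is the reason to look at drive days and not only at the percentage.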

I'm a very happy customer, but please do something about your mobile app (Android); it's really horrible.

I agree the mobile app (on iOS in my case) is at best an afterthought, and most likely not even a high ranking afterthought.

However, out of curiosity...what would you imagine a better Backblaze mobile app would do?

For sure it is not (and should not be) high priority, but releasing such an app in 2020 does not reflect the great skills of the Backblaze team. At least show me some basic stats, account settings and invoices. You can only download files from your buckets and that's it... really?

Semi-bummed my school partnered with another backup company, cause I'd love to support BackBlaze.

Yev here -> Thanks! Out of curiosity, does your school provide backup to all the students?

To all grad students and faculty:


Seagate always seems to have much higher failure rates compared to HGST/WD/Toshiba etc.

Does anyone here know the exact reason why? I assume there are enough people on this site who have worked for them or a competitor :)

30% profit margin, why change anything?

Does anyone have any opinions on or experience with using Backblaze as personal-only cloud storage and offsite backup for smaller amounts of data (under 30 TB)?

I use Backblaze's B2 service for both backup (via restic) and archival storage (via git-annex). I only maintain a distinction between the two in case I ever want to move to another service, and also because git-annex and restic have different strengths that make them more or less suitable for unchanging archives and often changing backups respectively. Between the two I have about 1 TB stored with them.

I have yet to need a full restore, but I do partial restores from time to time to double-check my backup procedures, and every time it's done what I wanted. My monthly costs are usually a bit under $5.

Note I essentially never use B2's API directly, and only use it as a backend through wrappers others have written, so I have no real experience with how good its API is. One of the few times I did try the API, I remember at one point I think I was getting Java exceptions back in the error messages, which was mildly concerning from a hygiene perspective and made for rather terrible error messages, but no sensitive data was being emitted. I also think that's been fixed.

The bottom line is that B2 has worked fine for me and at a good price point.

> for smaller amounts of data (under 30 TB)

Did you mean to say 30 GB or 30 TB? Calling 30 TB a "smaller amount" seems weird to me in 2020, especially for personal data. Perhaps it will be the norm in a couple of decades. :)

FWIW, I have way under 1 TB of personal data to back up to different locations, and I consider that to be relatively large.

I did mean 30 TB. I have approximately 12 TB of data currently between all of my storage for video, audio, books, and games. However, I have been avoiding doing a lot of conversions to digital media from my physical collections because I'm just unsure about running a full-blown archival server at home. I would estimate that if I converted my entire video library to 4K it would add somewhere over 10 TB. My comic books/manga and graphic novels, upgraded to archival resolution, would probably run over 10 TB as well. Then there is the soon-to-be-required ripping of PS2/PS3/Wii U ROMs when those hardware units become less reliable for actual playing. So I think that 30 TB of storage would do for the time being, but I think I will eventually need more than that.

TL;DR I am a digital hoarder, so I've convinced myself I do in fact need 30+ TB of storage.

Yeah I use B2 with rclone (https://rclone.org/) and it works great.

+1 I have the same question and would like to read replies.

I have wondered about system downtime or time operating in a degraded state.

My understanding is that RAID configurations other than mirroring may take a long time to rebuild on the larger drives, and that this is a contributing factor to why the highest sales volume of drives has been 'stuck' at 4TB (thus the lower $/GB price).

They don't use traditional RAID setups there. My understanding is they use a proprietary data encoding and distribution, which is more accepting of individual drive failures and reduces rebuild times. I believe I've heard they use something more like erasure coding rather than RAID-5.
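
For anyone unfamiliar with the idea, here is a toy Python sketch of the simplest possible erasure code: a single XOR parity shard, which is the same trick RAID-5 relies on and survives the loss of any one shard. This is only an illustration, and I'm not claiming it is what Backblaze runs. Real systems typically use Reed-Solomon style codes with several parity shards so they can survive multiple simultaneous losses. The function names here are made up.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(shards: List[bytes]) -> bytes:
    """Parity shard = XOR of all data shards (shards must be equal length)."""
    return reduce(xor_bytes, shards)


def recover_missing(shards: List[Optional[bytes]], parity: bytes) -> bytes:
    """Rebuild the single missing shard (marked None) from the rest plus parity."""
    if sum(s is None for s in shards) != 1:
        raise ValueError("single parity can rebuild exactly one missing shard")
    present = [s for s in shards if s is not None]
    return reduce(xor_bytes, present, parity)


data = [b"AAAA", b"BBBB", b"CCCC"]       # three data shards on three drives
parity = make_parity(data)               # stored on a fourth drive

damaged = [data[0], None, data[2]]       # the drive holding shard 1 died
rebuilt = recover_missing(damaged, parity)
print(rebuilt == data[1])
```

Spreading shards across many machines rather than the disks of one array is what lets this kind of scheme read from many drives in parallel during a rebuild, compared with a traditional array that hammers the few surviving disks.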


There are many open source libraries.

Any particular reason they don't use Western Digital drives?

I will point out that HGST is owned by Western Digital and all their products are being rebranded to WD.

Besides HGST being owned by Western Digital, they also said:

There were no Western Digital branded drives in the data center in 2019, but as WDC rebrands the newer large-capacity HGST drives, we’ll adjust our numbers accordingly.

So what do these mean:






Thank you!
