Some models are dwindling. Some are being tested. Others (like the Seagate and HGST 12 TB) are increasing. Only thing that's really perplexing is why they keep buying more and more of the high-failure-rate Seagate 12 TB drives. It must be more than 3% cheaper to buy (and service!) a Seagate with a 3% chance of failure than to buy an equivalent HGST with a 0.4% chance of failure. I guess when you have 120,000 drives, easy hot-swap enclosures, and software to handle it all that makes good sense! But as an individual consumer, even with a Backblaze backup, it's definitely worth my time to spend a bit more on a drive that's far more reliable than to save a few dollars on a Seagate.
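The tradeoff can be sanity-checked with back-of-the-envelope math. All the prices and the per-failure service cost below are made-up illustrative numbers, not Backblaze's actual figures:

```python
# Rough expected-cost comparison of two drives over one year.
# All dollar amounts and rates here are illustrative assumptions.
seagate_price, seagate_afr = 300.0, 0.03   # ~3% annualized failure rate
hgst_price, hgst_afr = 340.0, 0.004        # ~0.4% annualized failure rate
service_cost = 100.0  # assumed labor + RMA overhead per failure

def expected_cost(price, afr, service):
    """Purchase price plus the expected servicing cost from failures."""
    return price + afr * service

seagate = expected_cost(seagate_price, seagate_afr, service_cost)
hgst = expected_cost(hgst_price, hgst_afr, service_cost)
print(f"Seagate: ${seagate:.2f}, HGST: ${hgst:.2f}")
```

At these assumed numbers the cheaper drive still wins despite a 7.5x higher failure rate, which would explain the purchasing behavior at fleet scale; for an individual, the hassle of even one failure dominates the math.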
I suppose there's a movie plot in there where BackBlaze negs their favorite drive so they can buy them cheaper.
I am guessing they RMA the drives and get replacements.
You could even have keys per disk and wipe them when a disk fails.
Either way, you should be fine to RMA the drives, since to an external observer without the keys they just contain random noise.
Meanwhile all WD’s have been rock solid.
Personally, I also find Seagate the loudest and 'clickiest' of all the drive brands. I can hear the mechanicals working, which makes me think they will fail, so I trust them less than other brands.
Also, most employees aren't really incentivized to reduce or minimize infrastructure expenses.
So, durability of data (which to be fair doesn't matter for most s3 use cases), and interop with literally everything else in AWS
Intelligent data tiering
Actual access control
Pre signed URLs
I'm toying with the idea of moving our CDN from the AWS stack to B2 + CF, thanks to the Bandwidth Alliance. There's at least one thing stopping me: for the simple scheme of hosting static content out of a bucket, we'd have to deploy Workers just for URL rewriting. The folks at CF recommend that approach rather than URL rewrites via simple rules. But it puts us in the weak position of paying twice: for increased edge traffic AND for an increased number of requests.
Can anything be done on BackBlaze side to address the problem, like custom domains for buckets? Like https://f001.backblazeb2.com/file/bucket-name/file.jpg => https://bucket-name.f001.backblazeb2.com/file.jpg ?
> Last I checked, Backblaze still stores most data in 1 location, no?
Backblaze now has multiple regions! One in Europe (Netherlands) and one called "US-West". Quietly, "US-West" is actually three separate data centers, but your data will only land in one datacenter somewhere in US-West based on a few internal factors.
To be absolutely clear, if you only upload and store and pay for 1 copy of your Backblaze B2 data, it is living in one region. To get a copy in two locations you have to pay twice as much and take some actions. So if this kind of redundancy is important to you for mission critical reasons Backblaze B2 would only be half as expensive as one copy in Amazon S3, not 1/4 as expensive.
In the one copy in one region in Backblaze B2, any file is "sharded" across 20 different servers in 20 different racks in 20 different locations inside that datacenter. This helps insulate against failures like if one rack loses power (like if a power strip goes bad or a circuit breaker blows). But if a meteor hits that 1 datacenter and wipes out all of the equipment in a 1 mile blast radius, you won't be getting that data back unless you have a backup somewhere else.
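Backblaze has publicly described this scheme as 17 data + 3 parity shards of Reed-Solomon erasure coding, so a file survives the loss of any 3 of its 20 shards. A quick sketch of the resulting loss probability, where the per-shard annual failure probability is an illustrative guess, not a measured rate:

```python
from math import comb

def p_data_loss(n=20, parity=3, p_shard=0.01):
    """Probability that more than `parity` of `n` shards fail, assuming
    independent shard failures with probability p_shard (an assumed
    illustrative value, not Backblaze's real number)."""
    return sum(comb(n, k) * p_shard**k * (1 - p_shard)**(n - k)
               for k in range(parity + 1, n + 1))

print(f"{p_data_loss():.2e}")  # on the order of 1e-5 at these assumptions
```

The independence assumption is exactly what the meteor scenario breaks: correlated failure of a whole datacenter takes out all 20 shards at once, which is why a second copy in another region is a separate line item.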
The major caveats are hidden away in their pricing FAQ: they charge a 1 TB minimum if you use less, and there's a 90-day minimum retention period, meaning if you update a file a few times you will pay for the full 90 days of every intermediate version. Additionally, they reserve the right to make you pay for egress if it looks like you transfer more than you have stored.
So all in all, Wasabi might be the right fit for you if you store >1TB of files that are infrequently updated and get less than 1 download/month on average. If you fit that use case, I think their free egress pricing is awesome, but it's definitely not for everyone.
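That 90-day minimum means every deleted or overwritten version is billed as if it lived the full 90 days. A sketch of the effect, assuming a roughly $6/TB/month list price (check Wasabi's current pricing; this is not an official number):

```python
def wasabi_version_cost(size_tb, days_lived, price_per_tb_month=5.99):
    """Cost of one object version under a 90-day minimum retention.
    price_per_tb_month is an assumed list price, not a quoted one."""
    billed_days = max(days_lived, 90)  # minimum retention period
    return size_tb * price_per_tb_month * billed_days / 30

# A 1 TB file overwritten after 10 days is still billed for 90 days:
print(f"${wasabi_version_cost(1.0, 10):.2f}")
```

So a frequently rewritten 1 TB dataset can cost several times what the headline per-month rate suggests.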
1. Scale - S3 is big - really really big! You don’t need to care if you store one KB or several petabytes.
2. Tiers: the default on S3 is several-way replicated storage with 11 9s of durability and high availability. However, you can select from cheaper options with whatever trade-off you are happy with.
3. Cost: S3 has reduced prices several times; you can be reasonably sure your costs will go down over time on a per-unit basis.
Cloudflare is a featured integration that only mentions that transfer fees are free, not that CDN hosting is free:
Cloudflare does have a free CDN tier "For individuals with a personal website and anyone who wants to explore Cloudflare," but it's not the same as B2 including a CDN for free; even Azure is a part of the Bandwidth Alliance.
You can also turn on an extremely aggressive caching policy with a page rule that will keep everything under a given subdomain for a month. This makes the "free CDN" part easy, though again, people who do this run the risk of getting their accounts terminated.
> the B2 API is much slower than S3.
This is "generally true" for 1 upload thread. We aren't even sure what Amazon is doing differently, but they can be a little faster in general for 1 thread (some people only see 20% faster, some see as high as 50% faster, might be latency to the datacenter and where you are located).
As long as you use multiple threads, I make the radical claim that B2 can be faster than Amazon S3. The B2 API is slightly better in that we don't go through any load balancers like S3 does, so there is no choke point. What this means is that in B2 40 threads are actually uploading to 40 separate servers in 40 separate "vaults" and none of the threads could possibly know the other threads are uploading and it does not "choke" through a load balancer. This was all designed originally so that 1 million individual laptops could upload backups all at the same time with no issues and no load balancers. And it works great every day.
Practically speaking, for most people in most applications, this means both Amazon S3 and Backblaze B2 are essentially free of any limitations. If you aren't using enough of your bandwidth, spawn a few more threads (on either platform) and soak your upload capacity. But in full disclosure, if your application is only single threaded, yes, B2 tends to be 20% slower for that 1 thread.
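The "spawn a few more threads" advice looks roughly like this. `upload_one` here is a stand-in for whatever real client call you'd use (boto3 for S3, b2sdk for B2), not an actual API:

```python
from concurrent.futures import ThreadPoolExecutor

def upload_one(path):
    # Stand-in for a real S3/B2 upload call. With B2, each thread ends up
    # talking to its own vault server, so there's no shared choke point.
    return f"uploaded {path}"

def upload_all(paths, threads=40):
    """Upload files concurrently; aggregate throughput scales with the
    thread count until your uplink (or the remote end) saturates."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(upload_one, p) for p in paths]
        return [f.result() for f in futures]

print(upload_all(["a.bin", "b.bin", "c.bin"], threads=3))
```

Results come back in submission order here; for very large batches you'd typically also add retries and a bounded queue, omitted for brevity.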
AWS can certainly provide geographical diversity, but on the organizational abstraction layer, all eggs are in one basket, yes ?
Is having organizational redundancy something you assign zero value to, or something whose value conflicts with the egress costs so as to make it a difficult decision ?
Again, genuinely very interested ...
If AWS goes down, more or less a good portion of the internet goes dark. It's an acceptable risk at this point unless you are truly massive and entirely self-contained: if you are using any 3rd-party services, e.g. for auth or payment, they may be using AWS as well and you are still exposed.
To anyone reading this: Don't store lots of small files on S3. It's a terrible idea.
I only guesstimated out of the table and didn't have time to look at the actual data, so it's possible I misread something.
EDIT: nevermind, found it.
"Backblaze counts a drive as failed when it is removed from a Storage Pod and replaced because it has 1) totally stopped working, or 2) because it has shown evidence of failing soon.
A drive is considered to have stopped working when the drive appears physically dead (e.g. won’t power up), doesn’t respond to console commands or the RAID system tells us that the drive can’t be read or written."
It's a timely article as I'm looking at HC530's (WUH721414ALE6L4 / WUH721414ALN6L4 (wiredzone carries it)) for a home FreeNAS box:
- any relatively-modern enterprise 4U 3.5" storage box with Xeon 4 cores or so
- quieter, high-volume fan mod
- RAM: 64-128 GiB, beyond that isn't useful unless deduping
- NIC: X710-T4L 4x 10GbE copper NIC
- ZIL: mirrored pair of high-endurance, write-intensive, reliable SSD like Optane 900p/905p 280-480GB
- L2ARC: striped pair of read-intensive/larger SSDs like the Gigabyte Aorus Gen4 1 TB
This will fit nicely as my home NAS for a water-cooled dual EPYC virtualized server/workstation build underway. I managed to get a single water block with (3) G1/4 connections that will cool both CPUs and the VRM chokes/converters.
If anyone has better suggestions, please chime in.
- conflating trouble for you with trouble for me, which it clearly isn't
- not owning your own data
- paying more to store it
- paying to access it
- ability to keep things that aren't worth storing on paid clouds but cost next to nothing when kept on cheap drives
Furthermore, there are additional network costs such as AWS network charges AND home ISP data limits.
And there are other uses, such as:
- backing-up VMs
- backing-up computers
- caching package and source code repos
- backing-up CCTV footage
- and whatever else comes along
GCP and AWS both store full copies of your data in multiple locations by default (Availability Zones in AWS-speak). So it’s not an apples to apples comparison. The reduced redundancy is priced in, for people who can tolerate it.
The original scrappy Google was founded on commodity hardware held together by LEGO. The point was to not do as enterprises did with redundant everything, which was wasteful for web-serving use cases that were solved with better high availability in software. These days, if you're a giant company like FAANG, you can easily afford to go to Quanta and say: give me 10k racks' worth of compute nodes to this specification. If you're starting out and broke, you've got to use what's on the shelf, cobble together a custom solution optimized for the purpose, and/or kit out a test lab with a mish-mash of used servers from eBay.
You can view and download builds at https://github.com/restic/restic/releases/
I don't automate this though, I just use it for occasional backups. Not sure what the automation story around restic is.
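The automation story is mostly "put it in cron or a systemd timer". A minimal sketch, assuming restic's documented B2 repository syntax and credential environment variables (B2_ACCOUNT_ID, B2_ACCOUNT_KEY, RESTIC_PASSWORD); the bucket name and paths are hypothetical:

```python
import subprocess

REPO = "b2:my-bucket:machines/laptop"  # hypothetical bucket/path

def restic_backup(paths, run=False):
    """Build (and optionally execute) restic backup + retention commands.
    restic reads credentials from the B2_ACCOUNT_ID / B2_ACCOUNT_KEY /
    RESTIC_PASSWORD environment variables."""
    cmds = [
        ["restic", "-r", REPO, "backup", *paths],
        ["restic", "-r", REPO, "forget",
         "--keep-daily", "7", "--keep-weekly", "4", "--prune"],
    ]
    if run:  # set run=True when invoking from cron / a systemd timer
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds

print(restic_backup(["/home/me/docs"]))
```

The `forget --prune` pass is what keeps the repository from growing without bound as snapshots accumulate.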
Have >8TB of data from multiple machines with a lot of deduplication (source is somewhere around 10 to 12TB).
It seems they will soon reach 1000 PB / 1EB.
The top 5 annualized hard drive failure rates are all from Seagate. All drives from Hitachi and Toshiba have an AFR lower than 1%.
So basically don't buy Seagate.
1,089,318 = 4 * 2852 + 4 * 12746 + 8 * 1000 + 12 * 1560 + 12 * 10859 + 4 * 19211 + 6 * 886 + 8 * 9809 + 8 * 14447 + 10 * 1200 + 12 * 37004 + 12 * 7215 + 4 * 99 + 14 * 3619
Don't think I made a typo there, but please check my work. Even counting as 1024 TB = 1 PB and 1024 PB = 1 EB, that leaves 1,048,576 TB = 1 EB and they're over that threshold.
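The arithmetic does check out; a quick script to verify, with the (size, count) pairs taken straight from the sum above:

```python
# (size_tb, drive_count) pairs from the sum above
fleet = [(4, 2852), (4, 12746), (8, 1000), (12, 1560), (12, 10859),
         (4, 19211), (6, 886), (8, 9809), (8, 14447), (10, 1200),
         (12, 37004), (12, 7215), (4, 99), (14, 3619)]

total_tb = sum(size * count for size, count in fleet)
print(total_tb)                # 1089318 TB
print(total_tb > 1024 * 1024)  # past 1 EB even at 1024 TB/PB: True
```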
The February 5, 2018 "500 Petabytes and Counting" blog post should soon be eclipsed by a 1 EB post - though it appears they're counting actual data stored, not capacity. Nonetheless, with some redundancy, extra capacity, and overhead, we'll likely see that number soon.
Or do, because they're cheaper than the competition and modern systems can handle failures.
Myself though, for SoHo use, I'm willing to pay more for less stress because I don't have the sheer volume of devices, and the time to replace is time spent doing something useful instead of shuffling HDDs and rebuilding RAID arrays. A 5% saving on a handful of drives is not worth it, but a 40% saving on thousands makes them competitive.
I don't want to save a few dollars for potentially 4x the chance of failure and hassle.
And even if we ignore the two models that are outliers at 2%+, Seagate is still on average 2-3x more likely to fail.
They had a blog post about doing this a while back, so they are definitely aware of the use case: https://www.backblaze.com/blog/backing-linux-backblaze-b2-du...
I still use their standard backup service for my family's Windows machines since it's more "batteries included".
I ask because the online favorite appears to be WD Reds, which you have phased out since 2018.
*Edit - sounds like BBR is used in some of the environment!
@toomuchtodo - Yes, and on top of that, both B2 and Cloudflare are completely free since I'm under the 10 GB storage limit (for now), and I'm a personal user of Cloudflare (for now).
Main difference is probably Backblaze is small enough to publish these stats without hurting their supplier relationships. (pure speculation)
Hurray for Canadian internet.
I've got a 1Gb fiber pipe for 1/10th the cost that Cogeco was charging.
But the bigger issue is that the warranty terms for HDs nowadays are down to 2 or 3 years, so this investment is short-lived. It also tells you something about the manufacturers' own estimation of their products' reliability.
Edit: nope, probably was a ST506 or 412.
What can I use to do this and still keep offsite backups?
I think drive age matters? I'm not clear if they cycle drives out at a certain age or just run them until they fail.
Also, if a drive is low enough in cost, then the additional cost of replacing an incremental 1% may be lower than the cost of acquisition of a more reliable drive.
The 12TB HGST are 220 days old on average.
The Seagate 12TB failure rates seem high, quite unfortunate as I own 6 of them.
However, out of curiosity...what would you imagine a better Backblaze mobile app would do?
Does anyone here know the exact reason why? I assume there are enough people on this site who have worked for them or a competitor :)
I haven't yet needed to do a full restore, but I do partial restores from time to time to double-check my backup procedures, and every time it's done what I wanted. My monthly costs are usually a bit under $5.
Note I essentially never use B2's API directly, and only use it as a backend through wrappers others have written, so I have no real experience with how good its API is. One of the few times I did try the API, I remember at one point I think I was getting Java exceptions back in the error messages, which was mildly concerning from a hygiene perspective and made for rather terrible error messages, but no sensitive data was being emitted. I also think that's been fixed.
The bottom line is that B2 has worked fine for me and at a good price point.
Did you mean to say 30 GB or 30 TB? Calling 30 TB as "smaller amount" seems weird to me in 2020, especially for personal data. Perhaps it would be the norm in a couple of decades. :)
FWIW, I have way under 1 TB of personal data to backup to different locations, and I consider that to be relatively large.
TL;DR I am a digital hoarder, so I've convinced myself I do in fact need 30+ TB of storage.
My understanding is that, other than mirrored setups, RAID configurations may take a long time to rebuild on the larger drives, and this is a contributing factor to why the highest sales volume of drives has been 'stuck' at 4TB (thus the lower $/GB price).
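Rebuild time scales roughly linearly with capacity: the whole replacement drive has to be rewritten. A lower-bound estimate, where the ~150 MB/s sustained write rate is an assumption for a modern 7200 rpm drive:

```python
def rebuild_hours(capacity_tb, mb_per_s=150):
    """Best-case rebuild time: the entire replacement drive must be
    rewritten sequentially. Real parity rebuilds under live load
    are considerably slower than this floor."""
    seconds = capacity_tb * 1e12 / (mb_per_s * 1e6)
    return seconds / 3600

for tb in (4, 14):
    print(f"{tb} TB: ~{rebuild_hours(tb):.0f} h")
```

A 4 TB drive rebuilds in well under a day, while a 14 TB drive is over a day even in the best case, and the array runs degraded (with reduced redundancy) the whole time.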
There are many open source libraries.
There were no Western Digital branded drives in the data center in 2019, but as WDC rebrands the newer large-capacity HGST drives, we’ll adjust our numbers accordingly.