I'm always amazed by how open they are. It's great to see people like this succeed.
My one BB question is how many data centers do they have? I know they have great sharding tech to keep data online if a pod or two goes down, but how many physical locations do they run?
Two separate datacenters in the Sacramento (California) region, and one in Phoenix (Arizona). We are trying to open a European (Netherlands) datacenter this month or next month.
However, unless you take explicit action to copy your data to two datacenters, any one file (or piece of a file) is in exactly one datacenter. We believe the data to be extremely DURABLE (it will survive), but if your strategy is to "host content" in a highly available fashion where people will die if your content is offline for an hour, we recommend you use two different providers with some sort of failover. Another alternative is to use a CDN (Content Delivery Network) in conjunction with Backblaze. You can find out more info here: https://www.backblaze.com/b2/solutions/content-delivery.html
The exact system for how we achieve high durability is described in this blog post: https://www.backblaze.com/blog/vault-cloud-storage-architect... where any one file is striped across 20 separate computers in 20 separate locations within the one datacenter; we can entirely lose any 3 of those computers and the data is completely fine and available.
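To see why striping a file across 20 machines with tolerance for 3 losses buys so much durability, here's a back-of-the-envelope sketch. The 20/3 numbers come from the comment above; the per-shard failure probability is a made-up assumption, not a Backblaze figure, and real shard failures are neither independent nor annualized this simply.

```python
from math import comb

def loss_probability(n=20, max_failures=3, p_shard=0.01):
    """Chance of losing the file: more than `max_failures` of the
    `n` shards become unrecoverable, assuming each shard fails
    independently with probability `p_shard`."""
    survive = sum(
        comb(n, k) * p_shard**k * (1 - p_shard)**(n - k)
        for k in range(max_failures + 1)
    )
    return 1 - survive

# Even with a (hypothetical) 1% chance of losing any given shard,
# the file only dies if 4 or more of its 20 shards fail at once.
print(loss_probability())
```

With those toy numbers the loss probability drops to a few in a hundred thousand, versus 1% for a single unreplicated copy; the real system's correlated-failure behavior is of course more complicated.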
> We believe the data to be extremely DURABLE (survive), but if your strategy is to "host content" in a highly available fashion where people will die if your content is offline for an hour, we recommend you use two different providers with some sort of fail over.
Do your durability metrics take datacenter failure or human error into account?
Datacenter failures are rare, but they do happen and can cause data loss if all of your data is in that data center.
Likewise, human error can cause cascading failures across a datacenter (or beyond) if there are no firewalls between zones that prevent a single person/command/software update from affecting all copies of the data.
That depends on what you mean by failure. Are you talking about a data center failing because every single machine inside blew up? Otherwise, common failures like a DC power outage or network drops are about availability rather than durability. The data is still safe on multiple drives.
"...but in this instance, temperatures increased so quickly in parts of the data center that some hardware was damaged before it could shut down" ... "A significant number of storage servers were damaged, as well as a small number of network devices and power units."
If a lightning strike is close enough the surge protector won’t save your equipment. When I read the overview of this I expected a roomful of fried servers, not an orderly shutdown triggered by a cooling failure.
That's why I asked if that 99.999999999% durability number includes datacenter loss. It's an unlikely failure mode, but is it 0.000000001% unlikely? I don't know.
Given the fact that Azure lost a datacenter with this failure mode, I don't think it's in the "likelihood of an asteroid destroying Earth within a million years" ballpark.
Their durability page doesn't really clear it up. They say "Because at these probability levels, it's far more likely that ... Earthquakes / floods / pests / or other events known as 'Acts of God' destroy multiple data centers." But from the post above: "any one file (or piece of file) is in exactly one datacenter."
So it doesn't take multiple datacenter failures to lose data, just one, unless you explicitly copy your data to multiple datacenters.
To be a little more specific, the "2" in 3-2-1 is for two different media types. Hard drive + tape, for example. [Edit: ok it doesn't have to be "types". But two different media - don't put all your backups on one disk!]
Last I checked, the backup service exclusively used the Sacramento DC. Has this changed? Being in the Sacramento area myself, I'd be much more comfortable if my offsite backup was more than a couple miles away.
All clouds have options to do multi-regional storage. GCS has a multi-regional storage class. Azure has the GRS (geo-redundant storage) class. AWS has cross-region replication that can be added to a bucket.
I thought basic products like S3 provided cross-region replication, which gives georedundancy?
But anyway why should I have to build it on top of the provider's offering - why wouldn't they provide georedundancy for me? It seems like a truly basic thing to expect for a backup solution?
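For what it's worth, the S3 cross-region replication mentioned above is a per-bucket setting, not automatic. A minimal sketch of the configuration you would pass to boto3's `put_bucket_replication` (the bucket names, role ARN, and rule ID below are made-up placeholders; both buckets must have versioning enabled):

```python
# Hypothetical names throughout; the structure follows S3's
# ReplicationConfiguration schema.
replication_config = {
    "Role": "arn:aws:iam::123456789012:role/replication-role",
    "Rules": [
        {
            "ID": "replicate-everything",
            "Priority": 1,
            "Status": "Enabled",
            "Filter": {},  # empty filter = replicate the whole bucket
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {
                # Destination bucket should live in a different region.
                "Bucket": "arn:aws:s3:::my-backup-bucket-eu",
            },
        }
    ],
}

# Applying it would look roughly like:
#   s3 = boto3.client("s3")
#   s3.put_bucket_replication(
#       Bucket="my-source-bucket",
#       ReplicationConfiguration=replication_config,
#   )
```

Which rather supports the point: georedundancy exists, but you have to build and pay for it yourself on every major provider.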
> I'm always amazed by how open they are. It's great to see people like this succeed.
You don't have to be amazed. Watch and learn. They do it because it's good marketing and results in business. The catchy headline got me to click and read. It got the name 'backblaze' into my head one more time. It makes them appear relevant and enhances the brand.
Since Yev is probably reading the comments I will offer another topic that would be interesting.
Do a post on erasing data from SSDs and then being able to recover that data. There was a well-known paper by some academics years ago about this. The result was YMMV depending on the drive, controller, etc. That would make an interesting blog post.
Yev here as prophesied! Interesting subject! I'll have to toss it to the group. My assumption is that it would be a bit tough for us to write since we aren't necessarily experts in SSDs (yet) - but might be something to consider for the future!
My main concern with this type of service is: are they just reselling S3 and hoping people will never use their full quota? (That's something I don't see as a sustainable business model.)
It's crazy to see how fast storage prices have fallen. Just a couple days ago I saw that you could get a WD White (a shucked easystore, pretty much a relabeled WD Red) 10TB for just $169, or 1.69 cents/GB. And you can get 8TB for $129 (1.61c/GB).
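Sanity-checking the cents-per-GB math (using the decimal convention of 1 TB = 1000 GB, which is how drive capacities are marketed):

```python
def cents_per_gb(price_dollars, capacity_tb):
    """Price in US cents per decimal gigabyte."""
    return price_dollars / (capacity_tb * 1000) * 100

print(round(cents_per_gb(169, 10), 2))  # 10TB for $169
print(round(cents_per_gb(129, 8), 2))   # 8TB for $129
```

Both figures in the comment check out: 1.69c/GB for the 10TB and about 1.61c/GB for the 8TB, so the larger drive is actually the slightly worse deal per GB here.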
I always enjoy reading about Backblaze updates. New Pod designs, drive statistics, operations, etc... One thing I am particularly excited about is the future Pod designs with 2.5 inch drives. There may be enough miniaturization with magnetic drives to make this feasible, but I expect that the real transition will come with a Pod full of SSD. Any idea when that might happen? What is the expectation for that timeline? Do you have additional products or price/performance improvements planned in that transition? The SSD endurance experiment from 4 years ago indicates that, reliability wise, they are more than ready. I guess the only limitation now is price and maybe processing?
The only thing missing from the museum is the people behind the Storage Pod. Other than that, it's always really cool to see the evolution and history of the product.