An Inside Look at the Backblaze Storage Pod Museum (backblaze.com)
165 points by sp8 on Feb 15, 2019 | 31 comments



I'm always amazed by how open they are. It's great to see people like this succeed.

My one BB question is how many data centers do they have? I know they have great sharding tech to keep data online if a pod or two goes down, but how many physical locations do they run?


I work at Backblaze.

> how many physical locations do they run?

Two separate datacenters in the Sacramento (California) region, and one in Phoenix (Arizona). We are trying to open a European (Netherlands) datacenter this month or next month.

However, unless you take explicit action to copy your data to two datacenters, any one file (or piece of file) is in exactly one datacenter. We believe the data to be extremely DURABLE (it will survive), but if your strategy is to "host content" in a highly available fashion where people will die if your content is offline for an hour, we recommend you use two different providers with some sort of failover. Another alternative is to use a CDN (Content Delivery Network) in conjunction with Backblaze. You can find out more info here: https://www.backblaze.com/b2/solutions/content-delivery.html

For backups, Backblaze advocates for a 3-2-1 backup strategy. https://www.backblaze.com/blog/the-3-2-1-backup-strategy/ This is where you keep 3 copies of your data, 2 on site, and 1 in the cloud.

The exact system for how we achieve high durability is described in this blog post: https://www.backblaze.com/blog/vault-cloud-storage-architect... Any one file is striped across 20 separate computers in 20 separate locations in the one datacenter, and we can entirely lose any 3 computers and the data is completely fine and available.
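For a rough feel of why tolerating 3 losses out of 20 matters, here is a minimal sketch. It assumes each of the 20 machines fails independently with the same probability before a repair completes. The function name and the value `p_fail=0.01` are made up for illustration, and this is not the calculation from the Backblaze durability post.

```python
from math import comb


def prob_data_loss(total_shards: int = 20, tolerated: int = 3, p_fail: float = 0.01) -> float:
    """Chance that MORE than `tolerated` of `total_shards` machines fail.

    Assumes independent failures with identical probability `p_fail`
    (a simplifying assumption, not Backblaze's published model).
    """
    # Sum the binomial tail directly to avoid computing 1 - (almost 1).
    return sum(
        comb(total_shards, k) * p_fail**k * (1 - p_fail) ** (total_shards - k)
        for k in range(tolerated + 1, total_shards + 1)
    )


if __name__ == "__main__":
    # Compare "no redundancy" (any single failure loses data) with the
    # "any 3 of 20 can be lost" layout described in the comment.
    print("tolerate 0:", prob_data_loss(20, 0, 0.01))
    print("tolerate 3:", prob_data_loss(20, 3, 0.01))
```

With the same hypothetical failure rate, tolerating 3 failures makes data loss several orders of magnitude less likely than tolerating none, which is the intuition behind the striping.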

We are COMPLETELY transparent on how we calculate the durability: we do the math (including the assumptions) in this blog post: https://www.backblaze.com/blog/cloud-storage-durability/


> We believe the data to be extremely DURABLE (survive), but if your strategy is to "host content" in a highly available fashion where people will die if your content is offline for an hour, we recommend you use two different providers with some sort of fail over

Do your durability metrics take datacenter failure or human error into account?

Datacenter failures are rare, but they do happen and can cause data loss if all of your data is in that data center.

Likewise, human error can cause cascading failures across a datacenter (or beyond) if there are no firewalls between zones that prevent a single person/command/software update from affecting all copies of the data.


That depends on what you mean by failure. Are you talking about a data center failing because every single machine inside blew up? Otherwise the common failures like DC power outage or network drops are about availability rather than durability. The data is still safe on multiple drives.


I'm talking about the kind of failure that hit a Microsoft Azure data center:

https://www.datacenterknowledge.com/microsoft/azure-outage-p...

“but in this instance, temperatures increased so quickly in parts of the data center that some hardware was damaged before it could shut down” ... “A significant number of storage servers were damaged, as well as a small number of network devices and power units.”


If a lightning strike is close enough the surge protector won’t save your equipment. When I read the overview of this I expected a roomful of fried servers, not an orderly shutdown triggered by a cooling failure.


Yeah, if that damaged all the storage servers containing your data then there would be data loss.


That's why I asked if that 99.999999999% durability number includes datacenter loss. It's an unlikely failure mode, but is it 0.000000001% unlikely? I don't know.
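The complement of an eleven-nines percentage is easy to miscount by hand, so here is a minimal sketch of the arithmetic using exact decimals. The function names are hypothetical, and reading the figure as a per-object annual probability is an assumption, not something stated in this thread.

```python
from decimal import Decimal


def loss_percent(durability_percent: str) -> Decimal:
    """Complement of a durability figure, both expressed in percent."""
    return Decimal("100") - Decimal(durability_percent)


def loss_fraction(durability_percent: str) -> Decimal:
    """Same complement expressed as a plain probability (0..1)."""
    return loss_percent(durability_percent) / Decimal("100")


if __name__ == "__main__":
    pct = loss_percent("99.999999999")
    frac = loss_fraction("99.999999999")
    # format(..., "f") prints fixed-point instead of scientific notation.
    print("loss percent: ", format(pct, "f"))
    print("loss fraction:", format(frac, "f"))
    # Expected losses among a hypothetical 100 billion stored objects.
    print("expected losses per 1e11 objects:", frac * Decimal(10**11))
```

Under that reading, eleven nines corresponds to an expected loss of about one object per hundred billion stored, which is why a single-datacenter event matters so much to the question.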

Given the fact that Azure lost a datacenter with this failure mode, I don't think it's in the "likelihood of an asteroid destroying Earth within a million years" ballpark.

Their durability page doesn't really clear it up. They say: "Because at these probability levels, it’s far more likely that ... Earthquakes / floods / pests / or other events known as 'Acts of God' destroy multiple data centers". But from the post above: "any one file (or piece of file) is in exactly one datacenter."

So it doesn't take multiple datacenter failures to lose data, just one unless you explicitly copy your data to multiple datacenters.


To be a little more specific, the "2" in 3-2-1 is for two different media types. Hard drive + tape, for example. [Edit: ok it doesn't have to be "types". But two different media - don't put all your backups on one disk!]


Yev from Backblaze here -> Or Hard Drive (Internal) + Hard Drive (External) - that's what we typically see!


Last I checked, the backup service exclusively used the Sacramento DC. Has this changed? Being in the Sacramento area myself, I'd be much more comfortable if my offsite backup was more than a couple miles away.


No disaster recovery? :/


Same as every other storage provider's default/basic storage offering. If you want georedundancy, you will need to build it.

EDIT: Apparently GCS has this feature built in. Did not know, very cool!


All clouds have options to do multi-regional storage. GCS has multiregional class. Azure has GRS class. AWS has cross-region replication that can be added to a bucket.


I thought basic products like S3 provided cross-region replication, which gives georedundancy?

But anyway why should I have to build it on top of the provider's offering - why wouldn't they provide georedundancy for me? It seems like a truly basic thing to expect for a backup solution?

But I'm not an expert in this area.


S3 replicates across availability zones, meaning copies in multiple DCs but in the same general area.

You can set up a bucket to replicate to a bucket in another region, at double the storage costs plus bandwidth charges.


Google Cloud Storage (GCS) has this functionality out of the box.

https://cloud.google.com/storage/docs/locations

/I work for Google/


> I'm always amazed by how open they are. It's great to see people like this succeed.

You don't have to be amazed. Watch and learn. They do it because it's good marketing and results in business. The catchy headline got me to click and read. It got the name 'backblaze' into my head one more time. It makes them appear relevant and enhances the brand.

Since Yev is probably reading the comments I will offer another topic that would be interesting.

Do a post on erasing data from SSDs and then being able to recover that data. There was a well-known paper by some academics years ago about this. The result was YMMV depending on the drive, controller, etc. That would make an interesting blog post.


Yev here as prophesied! Interesting subject! I'll have to toss it to the group. My assumption is that it would be a bit tough for us to write since we aren't necessarily experts in SSDs (yet) - but might be something to consider for the future!



I'm anxiously awaiting the SSD/HDD $/GB crossover day.


Their openness is the reason why I’m a customer.

My main concern with this type of service is whether they are just reselling S3 and hoping people will never use their quota (something I don’t see as a sustainable business model).


It's crazy to see how fast storage prices have fallen. Just a couple days ago I saw that you could get a WD White (Shucked easystore, pretty much a relabeled WD Red) 10TB for just $169. 1.69 cents/GB. And you can get 8TBs for $129 (1.61c/GB)
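The per-gigabyte figures above are easy to check. A minimal sketch, assuming decimal units (1 TB = 1000 GB) as drive vendors use; the helper name is made up for illustration.

```python
def cents_per_gb(price_usd: float, capacity_tb: float) -> float:
    """Price in US cents per gigabyte, assuming 1 TB = 1000 GB."""
    return price_usd * 100 / (capacity_tb * 1000)


if __name__ == "__main__":
    # Prices and capacities are the ones quoted in the comment.
    for price, size in [(169, 10), (129, 8)]:
        print(f"{size} TB at ${price}: {cents_per_gb(price, size):.2f} cents/GB")
```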


I hope it has a pile of the dead 3TB Seagates. Perhaps in an interactive exhibit where there are some implements of destruction.


Yev from Backblaze -> We do have a drive crusher in the office, but mostly use that for small externals. It's quite satisfying!


Do you have to treat the output of the drive crusher as hazardous waste?


If you happen to grab a video of that, I'd love to see it :-)


I always enjoy reading about Backblaze updates: new Pod designs, drive statistics, operations, etc. One thing I am particularly excited about is the future Pod designs with 2.5 inch drives. There may be enough miniaturization with magnetic drives to make this feasible, but I expect that the real transition will come with a Pod full of SSDs. Any idea when that might happen? What is the expectation for that timeline? Do you have additional products or price/performance improvements planned in that transition? The SSD endurance experiment from 4 years ago indicates that, reliability-wise, they are more than ready. I guess the only limitation now is price and maybe processing?

https://techreport.com/review/27909/the-ssd-endurance-experi...


The only thing missing from the museum is the people behind Storage Pod. Other than that it's always really cool to see the evolution and history of the product.


It's always fun to read stuff from the backblaze guys. The stuff they do is just neat.


I love this kind of stuff.

I like the domain I program in but some of the problems in areas like this are straight up nerd sniping[1].

[1] https://xkcd.com/356/



