An Inside Look at the Backblaze Storage Pod Museum

berbec · on Feb 15, 2019

I'm always amazed by how open they are. It's great to see people like this succeed.

My one BB question is: how many data centers do they have? I know they have great sharding tech to keep data online if a pod or two goes down, but how many physical locations do they run?

brianwski · on Feb 15, 2019

I work at Backblaze.

> how many physical locations do they run?

Two separate datacenters in the Sacramento (California) region, and one in Phoenix (Arizona). We are trying to open a European (Netherlands) datacenter this month or next month.

However, unless you take explicit action to copy your data to two datacenters, any one file (or piece of file) is in exactly one datacenter. We believe the data to be extremely DURABLE (survive), but if your strategy is to "host content" in a highly available fashion where people will die if your content is offline for an hour, we recommend you use two different providers with some sort of fail over. Another alternative is to use a CDN (Content Delivery Network) in conjunction with Backblaze. You can find out more info here: https://www.backblaze.com/b2/solutions/content-delivery.html

For backups, Backblaze advocates for a 3-2-1 backup strategy. https://www.backblaze.com/blog/the-3-2-1-backup-strategy/ This is where you keep 3 copies of your data, 2 on site, and 1 in the cloud.
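As a rough illustration, the 3-2-1 rule can be written as a simple check over a backup plan. This is a minimal sketch: the `Copy` model and `satisfies_3_2_1` helper are hypothetical names, and it encodes the common reading of the rule (at least 3 copies, on at least 2 distinct media, with at least 1 offsite, where a cloud copy counts as offsite). Media are just labels here, so "internal HDD" and "external HDD" count as two different media.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of the data set (hypothetical model for illustration)."""
    media: str      # a label such as "internal HDD", "external HDD", "tape", "cloud"
    offsite: bool   # True if the copy lives away from the primary location


def satisfies_3_2_1(copies):
    """True if the plan has >= 3 copies, on >= 2 distinct media labels,
    with at least one copy stored offsite."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )
```

For example, `[Copy("internal HDD", False), Copy("external HDD", False), Copy("cloud", True)]` passes, while three copies that all stay on site fail the "1" part of the rule.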

The exact system for how we achieve high durability is described in this blog post: https://www.backblaze.com/blog/vault-cloud-storage-architect... In short, any one file is striped across 20 separate computers in 20 separate locations within that one datacenter, and we can entirely lose any 3 of those computers while the data stays completely fine and available.
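If any 3 of 20 shards can be lost, then any 17 surviving shards must be enough to rebuild the data, which is what an erasure code with 17 data shards and 3 parity shards provides. The sketch below shows that property with a Reed-Solomon-style code built from polynomial interpolation over the prime field GF(257). It is not Backblaze's implementation: the field, the `encode`/`decode` helpers, and the one-stripe-of-17-bytes framing are simplifying assumptions chosen to keep the example short.

```python
P = 257  # prime modulus: GF(257) can hold every byte value 0..255


def _interpolate_at(points, x):
    """Evaluate, at x, the unique polynomial of degree < len(points) that
    passes through the (xi, yi) points, with all arithmetic mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is den's modular inverse (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(stripe, n=20):
    """Turn k data bytes into n shards keyed by x = 1..n.

    Shards 1..k hold the data unchanged; shards k+1..n are parity values
    (ints in 0..256, so not always a single byte in this simplified field)."""
    k = len(stripe)
    data_points = list(zip(range(1, k + 1), stripe))
    shards = dict(data_points)
    for x in range(k + 1, n + 1):
        shards[x] = _interpolate_at(data_points, x)
    return shards


def decode(shards, k=17):
    """Rebuild the k data bytes from any k surviving shards."""
    if len(shards) < k:
        raise ValueError(f"need {k} shards, only {len(shards)} survived")
    points = list(shards.items())[:k]
    return bytes(_interpolate_at(points, x) for x in range(1, k + 1))
```

Usage: `shards = encode(bytes(range(100, 117)))` produces 20 shards; deleting any 3 keys (say 1, 9 and 20) and calling `decode(shards)` returns the original 17 bytes, while losing a fourth shard makes `decode` raise. A production system would use GF(2^8) so every shard stays byte-sized.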

We are COMPLETELY transparent on how we calculate the durability, we do the math (including the assumptions) in this blog post: https://www.backblaze.com/blog/cloud-storage-durability/
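To show the general shape of such a calculation, here is a deliberately simplified model; it is not the calculation from the linked post. It assumes drives fail independently at a constant rate, and that a stripe is lost only if more than 3 of its 20 drives fail inside the same rebuild window. The annual failure rate and rebuild window are placeholder inputs, not Backblaze figures, and the helper names are hypothetical.

```python
from math import comb, expm1, log1p, log10


def window_failure_prob(afr, window_days):
    """Chance one drive fails during a rebuild window, assuming a constant
    failure rate that adds up to `afr` over 365 days."""
    return -expm1((window_days / 365) * log1p(-afr))


def annual_stripe_loss(afr, window_days, shards=20, tolerated=3):
    """Rough yearly chance of losing a stripe: more than `tolerated` of its
    `shards` drives fail inside one rebuild window (binomial model)."""
    p = window_failure_prob(afr, window_days)
    p_window = sum(
        comb(shards, i) * p**i * (1 - p) ** (shards - i)
        for i in range(tolerated + 1, shards + 1)
    )
    windows_per_year = 365 / window_days
    # 1 - (1 - p_window) ** windows_per_year, computed without cancellation
    return -expm1(windows_per_year * log1p(-p_window))


def nines(loss_probability):
    """Durability expressed as a count of nines (1e-11 gives about 11)."""
    return -log10(loss_probability)
```

For example, `nines(annual_stripe_loss(0.01, 7))` evaluates the model for a hypothetical 1% annual failure rate and 7-day rebuild. The model only counts drive failures inside one datacenter; it says nothing about losing the datacenter itself.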

Johnny555 · on Feb 15, 2019

We believe the data to be extremely DURABLE (survive), but if your strategy is to "host content" in a highly available fashion where people will die if your content is offline for an hour, we recommend you use two different providers with some sort of fail over

Do your durability metrics take datacenter failure or human error into account?

Datacenter failures are rare, but they do happen and can cause data loss if all of your data is in that data center.

Likewise, human error can cause cascading failures across a datacenter (or beyond) if there are no firewalls between zones that prevent a single person/command/software update from affecting all copies of the data.

manigandham · on Feb 15, 2019

That depends on what you mean by failure. Are you talking about a data center failing because every single machine inside blew up? Otherwise, the common failures like DC power outages or network drops are about availability rather than durability. The data is still safe on multiple drives.

Johnny555 · on Feb 16, 2019

I'm talking about the kind of failure that hit a Microsoft Azure data center:

https://www.datacenterknowledge.com/microsoft/azure-outage-p...

“...but in this instance, temperatures increased so quickly in parts of the data center that some hardware was damaged before it could shut down” ... “A significant number of storage servers were damaged, as well as a small number of network devices and power units.”

hinkley · on Feb 16, 2019

If a lightning strike is close enough the surge protector won’t save your equipment. When I read the overview of this I expected a roomful of fried servers, not an orderly shutdown triggered by a cooling failure.

manigandham · on Feb 16, 2019

Yeah, if that damaged all the storage servers containing your data, then there would be data loss.

Johnny555 · on Feb 16, 2019

That's why I asked whether that 99.999999999% durability number includes datacenter loss. It's an unlikely failure mode, but is it 0.000000001% unlikely (a 1-in-100-billion chance)? I don't know.

Given the fact that Azure lost a datacenter with this failure mode, I don't think it's in the "likelihood of an asteroid destroying Earth within a million years" ballpark.

Their durability page doesn't really clear it up. They say: “Because at these probability levels, it’s far more likely that ... Earthquakes / floods / pests / or other events known as ‘Acts of God’ destroy multiple data centers.” But from the post above: “any one file (or piece of file) is in exactly one datacenter.”

So it doesn't take multiple datacenter failures to lose data, just one unless you explicitly copy your data to multiple datacenters.
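A minimal sketch of the arithmetic behind this point. It first checks the percentage: 100% minus 99.999999999% is 0.000000001%, i.e. a probability of 1e-11. It then assumes a hypothetical annual datacenter-loss probability (`p_dc = 1e-6`, picked only for illustration, not an estimate for Backblaze or anyone else) and treats datacenters as failing independently, which sites sharing a region, a vendor, or a software rollout may not. `annual_loss` is a hypothetical helper name.

```python
from decimal import Decimal


def annual_loss(p_within_dc, p_dc_loss, datacenters=1):
    """Chance of losing a file in a year when a full copy is kept in each of
    `datacenters` sites, assuming every site fails independently."""
    p_copy_lost = 1 - (1 - p_within_dc) * (1 - p_dc_loss)
    return p_copy_lost ** datacenters


# "11 nines" as a percentage, and its complement as a probability
eleven_nines = Decimal("99.999999999")
complement_pct = Decimal(100) - eleven_nines   # 0.000000001 (percent)
p_stripe = complement_pct / 100                 # 1E-11 as a probability

p_dc = Decimal("1e-6")  # hypothetical datacenter-loss probability, illustrative only
one_site = annual_loss(p_stripe, p_dc)
two_sites = annual_loss(p_stripe, p_dc, datacenters=2)
```

With one copy, `one_site` is slightly above `p_dc`, so the datacenter term swamps the 1e-11 figure; with explicit copies in two independent datacenters the loss probability is squared and drops below 1e-11.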

sp332 · on Feb 15, 2019

To be a little more specific, the "2" in 3-2-1 is for two different media types. Hard drive + tape, for example. [Edit: ok it doesn't have to be "types". But two different media - don't put all your backups on one disk!]

atYevP · on Feb 15, 2019

Yev from Backblaze here -> Or Hard Drive (Internal) + Hard Drive (External) - that's what we typically see!

luhn · on Feb 15, 2019

Last I checked, the backup service exclusively used the Sacramento DC. Has this changed? Being in the Sacramento area myself, I'd be much more comfortable if my offsite backup was more than a couple miles away.

chillaxtian · on Feb 15, 2019

No disaster recovery? :/

toomuchtodo · on Feb 15, 2019

Same as every other storage provider's default/basic storage offering. If you want georedundancy, you will need to build it.

EDIT: Apparently GCS has this feature built in. Did not know, very cool!

manigandham · on Feb 15, 2019

All clouds have options to do multi-regional storage. GCS has multiregional class. Azure has GRS class. AWS has cross-region replication that can be added to a bucket.

chrisseaton · on Feb 15, 2019

I thought basic products like S3 provided cross-region replication, which gives georedundancy?

But anyway why should I have to build it on top of the provider's offering - why wouldn't they provide georedundancy for me? It seems like a truly basic thing to expect for a backup solution?

But I'm not an expert in this area.

luhn · on Feb 15, 2019

S3 replicates across availability zones, meaning copies in multiple DCs but in the same general area.

You can set up a bucket to replicate to a bucket in another region, at double the storage costs plus bandwidth charges.
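For reference, a minimal sketch of what that setup can look like from Python. It builds a replication configuration for S3's `PutBucketReplication` API and passes it to a boto3 S3 client's `put_bucket_replication` method. Assumptions: both buckets already have versioning enabled (S3 requires this for replication), an IAM role that S3 can assume for replication already exists, and the helper names plus any bucket names or ARNs you pass in are hypothetical placeholders. By default only objects written after the rule is in place are replicated.

```python
def build_replication_config(role_arn, destination_bucket_arn, rule_id="replicate-all"):
    """Replication configuration that copies every new object in the
    source bucket to the destination bucket (hypothetical helper)."""
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": rule_id,
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix matches all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": destination_bucket_arn},
            }
        ],
    }


def enable_replication(s3_client, source_bucket, role_arn, destination_bucket_arn):
    """Attach the configuration to the source bucket; `s3_client` is
    expected to be a boto3 S3 client (created by the caller)."""
    s3_client.put_bucket_replication(
        Bucket=source_bucket,
        ReplicationConfiguration=build_replication_config(role_arn, destination_bucket_arn),
    )
```

Usage would look like `enable_replication(boto3.client("s3"), "my-source-bucket", "arn:aws:iam::123456789012:role/my-replication-role", "arn:aws:s3:::my-destination-bucket")`, with all names being placeholders. The replicated copy is billed as a second full copy of storage, plus inter-region transfer.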

siculars · on Feb 15, 2019

Google Cloud Storage (GCS) has this functionality out of the box.

https://cloud.google.com/storage/docs/locations

/I work for Google/

gist · on Feb 15, 2019

> I'm always amazed by how open they are. Its great to see people like this succeed.

You don't have to be amazed. Watch and learn. They do it because it's good marketing and results in business. The catchy headline got me to click and read. It got the name 'backblaze' into my head one more time. It makes them appear relevant and enhances the brand.

Since Yev is probably reading the comments I will offer another topic that would be interesting.

Do a post on erasing data from SSDs and then being able to recover that data. There was a well-known paper by some academics years ago about this. The result was YMMV depending on the drive, controller, etc. That would make an interesting blog post.

atYevP · on Feb 16, 2019

Yev here, as prophesied! Interesting subject! I'll have to toss it to the group. My assumption is that it would be a bit tough for us to write since we aren't necessarily experts in SSDs (yet), but it might be something to consider for the future!

berbec · on Feb 21, 2019

Not too far in the future!

https://news.ycombinator.com/item?id=19220210

berbec · on Feb 20, 2019

I'm anxiously awaiting the SSD/HDD $/GB crossover day.

jpalomaki · on Feb 16, 2019

Their openness is the reason why I’m a customer.

My main concern with this type of service is whether they are just reselling S3 and hoping people will never use their quota (something I don’t see as a sustainable business model).

skunkworker · on Feb 15, 2019

It's crazy to see how fast storage prices have fallen. Just a couple of days ago I saw that you could get a 10TB WD White (a shucked easystore, pretty much a relabeled WD Red) for just $169, or 1.69 cents/GB. And you can get 8TB drives for $129 (1.61 cents/GB).
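The per-GB figures follow from dividing price by capacity. A tiny helper makes the unit convention explicit; it assumes decimal units (1 TB = 1000 GB, as drive makers label capacity), and the function name is hypothetical.

```python
def cents_per_gb(price_usd, capacity_tb):
    """Price per gigabyte in US cents, using decimal units (1 TB = 1000 GB)."""
    return price_usd * 100 / (capacity_tb * 1000)
```

`cents_per_gb(169, 10)` gives 1.69 and `cents_per_gb(129, 8)` gives 1.6125, which rounds to the 1.61 cents/GB quoted above.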

wiredfool · on Feb 15, 2019

I hope it has a pile of the dead 3TB Seagates. Perhaps in an interactive exhibit with some implements of destruction.

atYevP · on Feb 15, 2019

Yev from Backblaze -> We do have a drive crusher in the office, but mostly use that for small externals. It's quite satisfying!

russh · on Feb 15, 2019

Do you have to treat the output of the drive crusher as hazardous waste?

freedomben · on Feb 16, 2019

If you happen to grab a video of that, I'd love to see it :-)

daveguy · on Feb 16, 2019

I always enjoy reading about Backblaze updates: new Pod designs, drive statistics, operations, etc. One thing I am particularly excited about is future Pod designs with 2.5-inch drives. There may be enough miniaturization in magnetic drives to make this feasible, but I expect the real transition will come with a Pod full of SSDs. Any idea when that might happen, and what the expected timeline is? Do you have additional products or price/performance improvements planned for that transition? The SSD endurance experiment from 4 years ago indicates that, reliability-wise, they are more than ready. I guess the only limitations now are price and maybe processing?

https://techreport.com/review/27909/the-ssd-endurance-experi...

chaostheory · on Feb 16, 2019

The only thing missing from the museum is the people behind Storage Pod. Other than that, it's always really cool to see the evolution and history of the product.

DFXLuna · on Feb 15, 2019

It's always fun to read stuff from the backblaze guys. The stuff they do is just neat.

noir_lord · on Feb 15, 2019

I love this kind of stuff.

I like the domain I program in but some of the problems in areas like this are straight up nerd sniping[1].

https://xkcd.com/356/