
Backblaze is building a 270 TB storage pod - LukeLambert
https://www.backblaze.com/blog/why-now-is-the-time-for-backblaze-to-build-a-270-tb-storage-pod/
======
vamega
Man, I keep seeing awesome posts about Backblaze, and really would like to use
them. However I run exclusively on linux these days, and they don't seem to
have a Linux client.

Public pledge, if backblaze releases a linux client, even if it's command line
only and requires that I manually edit an XML file, I'll purchase a 1 year
subscription.

~~~
toomuchtodo
> Public pledge, if backblaze releases a linux client, even if it's command
> line only and requires that I manually edit an XML file, I'll purchase a 1
> year subscription.

Consider how many Linux users would need to do this to a) pay for the
developer time to write/maintain the client and b) pay for the support time
for Linux issues

~~~
pyre
There are other things to consider too: developer community goodwill, people
that may not be on Linux now, but only use systems that are cross-platform (to
curb lock-in to a particular OS), etc.

How much revenue is Atom for Linux expected to generate for Github?

~~~
toomuchtodo
Developer/Linux community goodwill for a user-friendly backup service isn't
as valuable as you'd think.

IT support staff? Those are the people you want to enamor yourself with. And
they're either on Apple or Windows gear.

~~~
ars
Yes it is. Suppose your IT guy at work runs Linux and you ask him what backup
you should get. If his answer is Backblaze, that brings a lot of business to
them.

Personally I assume they don't release a Linux client because then people
would back up their servers to it.

~~~
toomuchtodo
Those IT guys don't need to use Backblaze on Linux to recommend it; they're
going to recommend whatever "just works" so they don't have to "go down the
rabbit hole".

When I still used Android, I would recommend iPhones all the time to people.
Why? Because it just works, and I don't want to be stuck troubleshooting their
Android issues for the life of their phone.

~~~
StavrosK
What about server backups?

------
ansible
Just put a 6TB drive in my media server. I was going to put in another 4TB,
but the cost per TB was not that much worse, so I decided to give it a go.
There's also the issue that I only had room for one more drive anyway, and I
wanted to RAID-1 the OS and important data partitions too.

I'm still learning the ins and outs of using btrfs too. I recently found this
btrfs snapshot based backup utility that I'm going to give a try:

[https://github.com/ruediste1/btrbck](https://github.com/ruediste1/btrbck)

This should be a lot, lot faster and more efficient than using rsync and
rsnapshot for disk-to-disk backups with my current scheme.

~~~
ruediste
How is it going?

~~~
ansible
I haven't tried that backup utility yet. I'm frankly a little nervous to trust
it with valuable data. I'll run it and something based on rsnapshot for the
near term. Now I just have to figure out if I've got enough external drives
for that, or do I need to buy more.

I'll likely first try to run things so that the btrfs snapshots are running
over the Internet, which will really cut down on the time and bandwidth
needed. And as a backup use rsnapshot locally.

------
jtbigwoo
I always look forward to backblaze's posts.

Does anybody know if drive manufacturers pay attention to their posts about
reliability? I know when I worked with mainframes we sent every bad drive back
to IBM for failure analysis, but would Backblaze's aggregate data be useful to
somebody like Seagate?

~~~
atYevP
Yev from Backblaze here -> We've since spoken to some of the device
manufacturers after some of our more recent posts. :-p

------
KaiserPro
What is interesting is that Backblaze still use the 45-drives-in-4U config.

We use Dell/LSI/NetApp/Engenio MD3260s, which pack 60 drives into the same
space. We use RAID 6, as we want the storage to manage the redundancy, not the
software. With four hot spares we achieve a density of 170 TB usable per 4U
(technically it's 5U as we have a fileserver sitting on top; however, assuming
48U, it works out to the same density).

With Backblaze, assuming RAID 6 (unless you are using FEC there is no other
way), you'll only get 120 TB per 4U (but you get a file server as well).

The big advantage of the Dell/netapp/etc raid is that the drives are easily
hot swappable. Ironically at the scale that backblaze are doing it, they are
probably cheaper as well.
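
A rough way to sanity-check density figures like these is to set aside the hot
spares, subtract two parity drives per RAID 6 group, and multiply what is left
by the drive size. The sketch below does only that. The group sizes and the
4 TB drive capacity are assumptions picked for illustration (the comment above
states neither), and filesystem overhead and TB/TiB differences are ignored, so
it is not expected to reproduce the 170 TB or 120 TB numbers exactly.

```python
def raid6_usable_tb(total_drives, hot_spares, group_size, drive_tb):
    """Usable capacity of an enclosure split into equal RAID 6 groups.

    Every RAID 6 group gives up two drives' worth of space to parity.
    group_size and drive_tb are illustrative assumptions, not vendor figures.
    """
    if group_size < 4:
        raise ValueError("RAID 6 needs at least 4 drives per group")
    active = total_drives - hot_spares
    groups, leftover = divmod(active, group_size)
    if leftover:
        raise ValueError("active drives must divide evenly into groups")
    data_drives = groups * (group_size - 2)
    return data_drives * drive_tb


# Hypothetical layouts: a 60-bay shelf with 4 spares, a 45-bay pod with none.
print(raid6_usable_tb(60, 4, 14, 4))  # 4 groups of 14 -> 48 data drives
print(raid6_usable_tb(45, 0, 15, 4))  # 3 groups of 15 -> 39 data drives
```

Changing `group_size` or `drive_tb` shows how sensitive the comparison is to
the layout: fewer, larger groups lose less space to parity but take longer to
rebuild.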

~~~
wmf
We also have some of those Engenio JBODs at my office for "big data", but I
assume they're much more expensive than the Storage Pod. My general experience
with enterprisey hot-swap JBODs is that the enclosure costs more than the
drives.

Supermicro also has a 4x12 drive in 4U box that looks a little less "homemade"
than the Backblaze Storage Pod.

~~~
KaiserPro
Yes and no, ours are pretty cheap, and they come with 24x7 4 hour response,
which means that we don't have to worry about keeping lots of spares.

------
lucb1e
I don't understand this: 6TB drives are more expensive per gigabyte and they
even mention it in their post. They say that they expect the drives to become
cheaper over time and that's why they are switching now. Wait, what? Switching
now when they are, I quote, "at the top of the curve"[1] does not make sense?

It might be that the physical space reduction compensates for it, but they
don't mention that so I don't get it.

[1] [https://www.backblaze.com/blog/wp-content/uploads/2014/08/bl...](https://www.backblaze.com/blog/wp-content/uploads/2014/08/blog-cost-per-gb-generic2.jpg)

~~~
ghshephard
My read was that they aren't switching over everything - they are testing new
drives, to figure out what reliability looks like.

Once the cost curve crosses over, they will then know which vendor's drives to
go all in on.

~~~
lucb1e
Oh, that explains it, thanks :)

~~~
delucain
Yeah, I don't think they replace drives in pods until they fail. Even if
they're old 1TB models. The drive costs just outstrip their infrastructure
costs by so much that 4 racks of 1TB drives is still cheaper than retrofitting
even 1 rack with new 4TB drives.

------
bumbledraven
I stopped using backblaze when I learned that they require you to TRANSMIT
YOUR PRIVATE KEY TO THEIR SERVER in order to restore your files from backup.

~~~
brianwski
Brian from Backblaze here. To be clear, there are two levels of
security/encryption at Backblaze:

1) The friendliest way we could design for people to restore their files was
to allow customers to sign into a website with a username/password and recover
one or more files. This is the default situation.

2) You can optionally turn on a "private encryption key" but if you do that,
understand you MUST write down that key because if you lose it, you can never
recover it, and neither Backblaze nor any government organization will EVER be able
to recover your files. NEVER. LOSE THAT PASSWORD AND THEY ARE GONE GONE GONE!

In the case of #2, as long as you don't need to recover from a crash, you
don't enter your private encryption key and nobody will ever have access to
your files, period. However, if you lose a file, you have to sign into the
Backblaze website and provide your passphrase which is ONLY STORED IN RAM for
a few seconds and your file is decrypted. Yes, you are now in a "vulnerable
state" until you download then "delete" the restore at which point you are
back to a secure state.

If you are even more worried about the privacy of your data, we highly
recommend you encrypt it EVEN BEFORE BACKBLAZE READS IT on your laptop! Use
TrueCrypt. Backblaze backs up the TrueCrypt encrypted bundle having no idea at
all what is in it (thank goodness) and you restore the TrueCrypted bundle to
yourself later.

------
ksec
Seagate promised 8 / 10TB HDD within the next 10 months. Can't wait to see
that.

~~~
atYevP
Yev from Backblaze here -> We've seen an 8TB drive, can't wait to get enough
of them to drop in to a pod and test :)

------
revelation
I wonder if the power savings going from 4TB -> 6TB could make up for the
difference/GB.

Spinning disks still operate at relatively high power (~8W), and I don't think
this value would change with capacity (they don't add extra disk heads, I
think it's just more platters/higher-capacity platters).

Pretty much the major cost running servers nowadays is simply the cost of the
necessary electricity to run them and the cooling systems.

(Note that this doesn't apply to Backblaze's use case; they fill hard disks up,
then pretty much power them down. That is why you can't instantly access your
data stored with them)
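
The reasoning above is that if a drive draws about the same power regardless
of capacity, watts per TB fall as drives get bigger. A back-of-the-envelope
sketch, taking the ~8 W figure from this comment as given. The electricity
price is a made-up placeholder, and cooling overhead and spun-down drives are
ignored.

```python
HOURS_PER_YEAR = 24 * 365

# Hypothetical electricity price per kWh; substitute a real rate.
PRICE_PER_KWH = 0.10


def watts_per_tb(drive_watts, drive_tb):
    """Power draw attributed to each TB of raw capacity."""
    return drive_watts / drive_tb


def yearly_power_cost_per_tb(drive_watts, drive_tb, price_per_kwh):
    """Yearly electricity cost per TB for a drive that is always spinning."""
    kwh_per_year = drive_watts * HOURS_PER_YEAR / 1000
    return kwh_per_year / drive_tb * price_per_kwh


for tb in (4, 6):
    print(
        f"{tb} TB drive:",
        round(watts_per_tb(8, tb), 2), "W/TB,",
        round(yearly_power_cost_per_tb(8, tb, PRICE_PER_KWH), 2), "per TB per year",
    )
```

Whether the yearly saving per TB closes the gap depends on the actual price
difference per GB and on how long the drive stays in service.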

------
themoonbus
I really do like this level of communicativeness from my backup service.

------
pezh0re
I was always a fan of Backblaze sharing details about how they built out their
storage - it made me feel that if I wanted to, I could also build a
redundant/massive storage array.

~~~
toomuchtodo
I have Backblaze-based storage pods in production at an old employer for a
broadcast video automation system (plays out over the air). Quite the sturdy
design.

------
jewel
For certain workloads these storage pods are much, much cheaper than S3.
Anything where you are storing files that rapidly become stale, but still need
to be instantly accessible for the rare random request.

I've only looked at it from the perspective of video files, though. Where I
work we add a gigabyte of data per user per month. Eventually our S3 storage
bill is going to be our largest cost due to compounding growth.
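
To make the growth concrete: if nothing is ever deleted, each month's bill
covers everything uploaded so far, so the bill keeps climbing even with a flat
user count. A minimal sketch, taking the 1 GB per user per month figure from
above; the user count and the price per GB-month are hypothetical placeholders,
not real S3 pricing.

```python
def monthly_storage_bills(users, months, gb_per_user_month=1.0,
                          price_per_gb_month=0.03):
    """Return the storage bill for each month, assuming data is never deleted.

    price_per_gb_month is a placeholder value, not a quoted price.
    """
    bills = []
    stored_gb = 0.0
    for _ in range(months):
        stored_gb += users * gb_per_user_month  # this month's new uploads
        bills.append(stored_gb * price_per_gb_month)
    return bills


bills = monthly_storage_bills(users=1000, months=12)
print("first month:", round(bills[0], 2))
print("twelfth month:", round(bills[-1], 2))
print("total for the year:", round(sum(bills), 2))
```

With a constant user count the monthly bill grows linearly and the cumulative
spend grows quadratically; a growing user base makes both steeper.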

~~~
res0nat0r
Any idea what the durability of Backblaze is? S3 is 99.999999999%, is
Backblaze anywhere near that? I can't seem to find it in their FAQ.

~~~
jewel
S3's number is just marketing. It's going to be that durable until a black
swan event wipes out 0.1% of all files. I imagine they keep multiple copies of
every file, spread across multiple data centers. It's durable against hardware
failure of all varieties, but it's still vulnerable to a software bug. It's
also possibly vulnerable to certain types of natural disasters.

IIRC backblaze employs a similar strategy. They have the advantage that your
home computer is still storing a copy, so even if they have a minor
catastrophe they can recover by simply having their client re-upload those
files.

~~~
mikeash
Human disasters too. I'd wager the odds of a catastrophic massive nuclear war
are higher than one in ten million per year. Not that I expect Amazon (or much
of anyone) to protect against _that_, nor would the integrity of my S3 data
be a priority afterwards, but it makes the claim kind of absurd.

Edit: reading more closely, their durability number is per object. It's one
object lost per 100 billion objects per year. The 10 million number comes from
a hypothetical situation where you store 10,000 objects.

I don't know how many objects S3 stores altogether, but if we say it's a
billion (presumably a vast underestimate) then that would imply that the
probability of losing all objects in the system over the coming year is one in
100 quadrillion. I don't think this planet is that safe.
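
The per-object arithmetic from the edit above can be checked in a few lines.
This sketch assumes the eleven-nines figure means each object independently
has a 1-in-100-billion chance of being lost in a year, which is a
simplification: it says nothing about correlated failures that take out many
objects at once.

```python
from fractions import Fraction

# Eleven nines of durability -> 1 in 10**11 chance of losing an object per year.
# Treating objects as independent is an assumption made for this sketch.
ANNUAL_LOSS_PER_OBJECT = Fraction(1, 10**11)


def expected_losses_per_year(num_objects):
    """Expected number of objects lost per year."""
    return num_objects * ANNUAL_LOSS_PER_OBJECT


def years_per_expected_loss(num_objects):
    """Average number of years between single-object losses."""
    return 1 / expected_losses_per_year(num_objects)


print(years_per_expected_loss(10_000))  # 10000000
```

Exact fractions are used instead of floats so the result comes out as a clean
integer rather than something like 9999999.999.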

~~~
res0nat0r
Not sure why it is absurd due to the redundancy they've implemented + data
checksumming. Plus AWS hit two trillion objects a bit over a year ago.

[http://aws.amazon.com/blogs/aws/amazon-s3-two-trillion-objec...](http://aws.amazon.com/blogs/aws/amazon-s3-two-trillion-objects-11-million-requests-second/)

~~~
mikeash
As great as S3 is, it's still confined to Earth. Seems to me that the odds of
a planet-wide disaster taking out all Amazon infrastructure (as well as
certain less important things like human civilization) are higher than the
odds they're giving of a catastrophic S3 failure.

~~~
res0nat0r
Sure. The AFR of your Western Digital desktop HD doesn't factor in and lower
its reliability slightly because of the slim chance that a 747 is going to
crash into your house and destroy your home.

Barring unforeseen acts of God, the 9's listed above apply and you just have
to personally weigh if you think the risk of S3 losing multiple datacenters is
high enough for you to risk storing your data there.

~~~
mikeash
Amazon pretty explicitly includes unforeseen catastrophic events in their
durability estimate. "In addition, Amazon S3 is designed to sustain the
concurrent loss of data in two facilities." I sure hope the loss of two
facilities doesn't fall into the "foreseen" category!

~~~
res0nat0r
Sure, it says they account for that right there in their FAQ, so I guess I
don't understand your point.

If you think events like the world being destroyed by a meteorite, the Sun
dying, or a zombie apocalypse should factor in to their 9's reliability
percentage, it shouldn't.

~~~
mikeash
OK, why not?

Serious question, here. Things like gigantic hurricanes flooding their data
centers should factor into it, right? Risk of war destroying the data center
should factor into it, right? (I mean, would you trust S3 to the same degree
if all of their data centers were located in Gaza?) So why _shouldn't_ a
scenario like "all of our data centers are simultaneously destroyed as part of
a worldwide nuclear conflict" factor into it?

~~~
res0nat0r
Extreme weather events I'm sure are calculated into their factors based on
location. IE: No hurricanes are going to happen in Indiana, but how are you
going to predict a worldwide nuclear conflict?

Should your house insurance be higher because the world might be destroyed
tomorrow by aliens? Something like this isn't quantifiable and if it happens
you have way bigger things to worry about than your mp3's in S3, so minuscule
events like this aren't relevant in the grand scheme of things.

~~~
mikeash
My house insurance calls out certain extreme circumstances as being ineligible
for coverage. Yes, including nuclear war.

I agree, it's not really quantifiable. However, Amazon lists their durability
to a number of significant figures that implies they are able to quantify the
risk down to that level. Yet these unquantifiable risks give every appearance
of being considerably larger than Amazon's figure.

Does Amazon's figure come with an "excluding loss due to ..." clause? If so,
what do they exclude?

~~~
res0nat0r
Standard exclusions:
[http://aws.amazon.com/s3/sla/](http://aws.amazon.com/s3/sla/)

> The Service Commitment does not apply to any unavailability, suspension or
> termination of Amazon S3, or any other Amazon S3 performance issues: (i)
> that result from a suspension described in Section 6.1 of the AWS Agreement;
> (ii) caused by factors outside of our reasonable control, including any
> force majeure event or Internet access or related problems beyond the
> demarcation point of Amazon S3; (iii) that result from any actions or
> inactions of you or any third party; (iv) that result from your equipment,
> software or other technology and/or third party equipment, software or other
> technology (other than third party equipment within our direct control); or
> (v) arising from our suspension and termination of your right to use Amazon
> S3 in accordance with the AWS Agreement (collectively, the “Amazon S3 SLA
> Exclusions”). If availability is impacted by factors other than those used
> in our calculation of the Error Rate, then we may issue a Service Credit
> considering such factors at our discretion.

------
dchest
Backblaze still shows a country block due to sanctions that were dropped by
the US many years ago, for a country which no longer exists.

~~~
dublinben
Can you elaborate on this? Where are they blocking their website?

~~~
brianwski
BrianW from Backblaze -> yes, this is a silly situation we need to find time
to fix. We blocked a list of countries something like 6 years ago and
geopolitical boundaries and alliances have since changed. (sigh) Did I mention
we have two open reqs for datacenter employees right now? Anybody want to come
join us to help? We're in San Mateo, California. Hit up our "jobs" page...

------
brokentone
I'm wondering why Backblaze hasn't been moving toward cold storage? With a
wait time on recovering/downloading files, I don't see an obvious reason why
not. But I can see some cost and energy savings to be had.

~~~
jonknee
Likely because they keep deleting stuff, which is difficult in cold storage.
They keep stuff for 30 days after you delete it.

[https://www.backblaze.com/remote-backup-everything.html](https://www.backblaze.com/remote-backup-everything.html)

> Backblaze will keep versions of a file that changes for up to 30 days.
> However, Backblaze is not designed as an additional storage system when you
> run out of space. Backblaze mirrors your drive. If you delete your data, it
> will be deleted from Backblaze after 30 days.

------
Keyframe
This is really great! Does anyone have any idea what kind of RAID they are
running on their pods and what kind of read and write speed one can get from
one of these?

------
insertion
I'm a happy Backblaze customer, but I've noticed it typically uses around
240MB of memory. Is this normal? Why is it so high?

~~~
darkr
The current desktop client runs atop a JVM

~~~
brianwski
Backblaze engineer here -> we absolutely DO NOT use a JVM on the client
running on laptops. We love Java and use it in the datacenter on every web
server and every pod. The reason we don't use it on the client running on
laptops is twofold:

1) Java doesn't deploy super smoothly - The initial download might be 30
MBytes to include the JVM instead of 1 or 2 MBytes for a 'C' executable. Also,
you have to keep updating the JVM separately, etc. It's friction to customers.

2) Java is hard to make look "native". Macintosh/Apple customers especially
are sensitive to the look and feel of applications and like them to feel
extremely "native".

------
pbreit
So straightforward but still so fascinating.

