
Backblaze has ordered 100 petabytes of hard drives - _jcwu
https://www.backblaze.com/blog/400-petabytes-cloud-storage/
======
godzillabrennus
I’ve been a customer for years and recently had a catastrophic failure of a
computer and its direct-attached backup drive. I have spent the last four
days waiting for Backblaze to build a restore of that computer's backup, and
last I checked it was at 9%.

I chatted with support and they said this is normal: progress may jump to
completed at any time because the restore only tracks file count.
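Support's explanation can be illustrated with a small sketch. This is a hypothetical model, not Backblaze's restore code: it assumes the progress figure is simply completed files divided by total files, and compares that with a size-weighted figure. If a few very large files finish early, the count-based percentage can sit near 9% while most of the data is already done, then jump as the many small remaining files complete.

```python
from typing import List, Tuple


def restore_progress(files: List[Tuple[int, bool]]) -> Tuple[float, float]:
    """Return (percent_by_file_count, percent_by_size).

    `files` is a list of (size, is_done) pairs; this structure is a
    hypothetical stand-in for whatever a real restore job tracks.
    """
    if not files:
        return 100.0, 100.0
    total_size = sum(size for size, _ in files)
    done_files = sum(1 for _, done in files if done)
    done_size = sum(size for size, done in files if done)
    by_count = 100.0 * done_files / len(files)
    by_size = 100.0 * done_size / total_size if total_size else 100.0
    return by_count, by_size


if __name__ == "__main__":
    # One large file already restored, ten small ones still pending
    # (sizes in arbitrary units).
    job = [(90, True)] + [(1, False)] * 10
    by_count, by_size = restore_progress(job)
    print(f"by count: {by_count:.1f}%  by size: {by_size:.1f}%")
```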

I wish I could say I’m happy about this but I’m four days into trying to pull
down my data and I still have no idea when I’ll be able to start the download.

This sucks.

~~~
atYevP
Yev from Backblaze here -> That's definitely not normal. How much data and how
many files do you have (are you doing a .zip or a USB restore)? We're currently
QAing a dramatic increase to the restores, but right now it's only available
internally until we are confident in it. In the meantime, though, you can
download mission-critical files in smaller restores; that might help you get up
and running faster while the mega-restore completes.

~~~
biomene
Do you reckon the increase in restores has anything to do with APFS? I also
lost my computer and Time Machine this week after upgrading to High Sierra, and
had to restore from Backblaze.

Btw, unlike the parent comment, I had a great experience with the restore and
couldn't be happier with the decision to go with Backblaze.

~~~
burnte
He means they're QA testing a new method of restoring that will dramatically
increase restore speed, not that there's a sudden uptick in restores. It's
unclearly written; I had to read it twice before I got it.

~~~
atYevP
Sorry - yes, you're correct!

------
ericfrederich
I love the articles from Backblaze. I enjoy them publishing the failure rates
of drives and anything else on the blog.

Two things are missing from them, in my opinion: 1) a Linux client and 2) a
second location.

I guess I could get away without a Linux client if there was an unRAID plugin
for B2.

~~~
jstrassburg
Seconding the Linux client. I have a Linux machine with a ZFS volume that I've
been backing up to CrashPlan. Since they've announced they're dumping their
consumer market, I have no cost-effective place to go. Most likely, I'll be
more picky about my backups and use S3 with a lifecycle rule that moves data
to Glacier.
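For the S3-plus-Glacier approach, the usual mechanism is a bucket lifecycle rule with a transition to the GLACIER storage class. A minimal sketch that builds such a rule as JSON; the `zfs-backups/` prefix, the rule ID and the 30-day threshold are placeholders, not recommendations.

```python
import json


def glacier_lifecycle(prefix: str, days: int,
                      rule_id: str = "backups-to-glacier") -> dict:
    """S3 lifecycle configuration moving objects under `prefix` to Glacier
    `days` days after creation. Prefix and rule ID are placeholders."""
    if days < 0:
        raise ValueError("days must be non-negative")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(glacier_lifecycle("zfs-backups/", 30), indent=2))
```

The printed JSON can be saved as `lifecycle.json` and applied with `aws s3api put-bucket-lifecycle-configuration --bucket <bucket> --lifecycle-configuration file://lifecycle.json`, or the same dict can be passed as `LifecycleConfiguration` to boto3's `put_bucket_lifecycle_configuration`. Glacier retrievals are slow and billed separately, so this suits backups you rarely expect to restore.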

~~~
jlgaddis
I would really like to use rsync.net for my "external third-party" backups
specifically because of their "ZFS Platform" [0] but (from my off-the-top-of-
my-head calculations) their pricing seems to come out a little on the high
side.

With my luck, the first time I need to be able to quickly restore a full
dataset or something, it will probably have been well worth it and I'll likely
be kicking myself for not doing it.

[0]:
[http://www.rsync.net/products/platform.html#zfs](http://www.rsync.net/products/platform.html#zfs)

~~~
metalliqaz
Be careful with rsync.net if you're on a Comcast cable modem. They can't seem
to handle "Xfinity PowerBoost", and you end up being throttled to 160 kB/sec.
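To put that throttled rate in perspective, a back-of-the-envelope sketch assuming decimal units (1 kB = 1,000 bytes, 1 TB = 10^12 bytes) and a perfectly steady 160 kB/s with no protocol overhead:

```python
SECONDS_PER_DAY = 86_400


def days_to_transfer(total_bytes: float, bytes_per_second: float) -> float:
    """Days needed to move `total_bytes` at a constant rate."""
    if bytes_per_second <= 0:
        raise ValueError("rate must be positive")
    return total_bytes / bytes_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    throttled = 160_000   # 160 kB/s, decimal kilobytes
    one_tb = 10**12       # 1 TB, decimal
    print(f"{days_to_transfer(one_tb, throttled):.1f} days per TB")  # 72.3 days per TB
```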

------
bacoboy
I guess all the CrashPlan personal users are jumping over there en masse. Good
for business...

~~~
atYevP
Yev from Backblaze here -> An influx of Computer Backup customers has
certainly been a factor, but Backblaze B2 is also growing quickly!

~~~
no1youknowz
[Resolved]

~~~
corobo
I have a hunch the page you're after is
[https://www.backblaze.com/company/partners.html](https://www.backblaze.com/company/partners.html)

~~~
no1youknowz
Contacting via HN was the last option. I tried there and sales on numerous
occasions.

------
linsomniac
I've watched Backblaze closely, and I love their detailed reports of drive
stats. I've always been curious about their choices of manufacturers though.
In my past I found that Hitachi/HGST drives were far more reliable than
others. We had a couple of people doing hardware and software support on ~120
machines, and hard drive replacements were expensive (in man-hours) for us, so
whenever possible we got Hitachi drives.

The Backblaze numbers seem to support this as well: far fewer failures of HGST
drives vs. Seagate, yet they seem to buy mostly Seagate. I guess when you have
tens of thousands of drives, you're always going to need to be replacing them.
So maybe a 10x reduction in failures isn't worth it? Or maybe they just
couldn't get HGST drives in quantity?
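One way to frame the question is cost per terabyte per year, counting both the purchase price and expected replacements. The sketch below uses made-up prices and annualized failure rates (AFR), not Backblaze's figures, and assumes a failed drive is replaced by an identical one plus a fixed labor cost per swap.

```python
def cost_per_tb_year(price_per_gb: float, capacity_tb: float, afr: float,
                     labor_per_swap: float = 0.0,
                     service_years: float = 4.0) -> float:
    """Approximate yearly cost per TB: purchase price amortized over the
    service life, plus expected replacements (drive + labor) at rate `afr`."""
    drive_price = price_per_gb * capacity_tb * 1000  # decimal TB -> GB
    amortized = drive_price / service_years
    expected_replacements = afr * (drive_price + labor_per_swap)
    return (amortized + expected_replacements) / capacity_tb


if __name__ == "__main__":
    # Hypothetical numbers: a cheaper drive with 4x the failure rate
    # versus a pricier, more reliable one, both 12 TB.
    cheap = cost_per_tb_year(0.025, 12, afr=0.02, labor_per_swap=100)
    premium = cost_per_tb_year(0.035, 12, afr=0.005, labor_per_swap=100)
    print(f"cheap: ${cheap:.2f}/TB-year  premium: ${premium:.2f}/TB-year")
```

With these inputs the cheaper drive comes out ahead; raising the labor cost per swap far enough (for example to several thousand dollars) flips the result, which is the trade-off the question is getting at.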

~~~
atYevP
Yev from Backblaze here -> We have relationships with manufacturers but don't
directly buy from them (yet - fingers-crossed). It really is a combination of
price/reliability. At our scale we can afford slightly higher failure rates if
it means less expensive drives, so our #1 data point for purchases is the cost
per gigabyte. After that we do look at failure rates, but unless it's
something wildly out of "normal" we tend to live with the occasional failures.
Thanks to Vaults the back-end is designed in such a way that if a drive fails
we don't have to run and replace it - so individual failures tend not to
stress us out too much anymore!
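Why a single failure is not urgent can be sketched with a binomial model of erasure-coded storage. The parameters here are illustrative assumptions, not a description of Vaults: each piece of data is split into n = 20 shards on different drives, any k = 17 of which can rebuild it, drives fail independently, and p is the chance that a given drive dies before a failed shard is rebuilt.

```python
from math import comb


def p_unrecoverable(n: int, k: int, p: float) -> float:
    """Probability that fewer than k of n shards survive, i.e. more than
    n - k independent losses, each occurring with probability p."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(n - k + 1, n + 1))


if __name__ == "__main__":
    # Assumed: 2% annual failure rate and a one-week rebuild window.
    p_week = 0.02 * 7 / 365
    print(f"no redundancy (20 of 20 needed): {p_unrecoverable(20, 20, p_week):.2e}")
    print(f"20 shards, any 17 suffice:       {p_unrecoverable(20, 17, p_week):.2e}")
```

Under these assumptions the redundant layout is many orders of magnitude less likely to lose data, so a dead drive can wait for a routine swap. Real systems also see correlated failures (such as a vibrating fan affecting a whole rack), which this independence model ignores.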

~~~
dzdt
If tens of petabytes isn't enough scale to talk to manufacturers directly, I
wonder what is!

~~~
MattSteelblade
After the order mentioned in the article, I wonder if it will be. Since they
mention a mix of 10 and 12 TB hard drives, we can assume the order was for
around 9,000 drives. At a cost of $0.03/GB (not taking into account shipping or
taxes), you're looking at a $3 million order.
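The estimate above is easy to reproduce. A sketch assuming decimal units (1 PB = 1,000 TB = 1,000,000 GB) and an average drive size of 11 TB, since the actual 10 TB / 12 TB split isn't given:

```python
def order_estimate(total_pb: float, avg_drive_tb: float, price_per_gb: float):
    """Return (approximate drive count, cost in dollars) for an order of
    `total_pb` petabytes, using decimal units throughout."""
    drives = total_pb * 1000 / avg_drive_tb
    cost = total_pb * 1_000_000 * price_per_gb
    return drives, cost


if __name__ == "__main__":
    drives, cost = order_estimate(100, avg_drive_tb=11, price_per_gb=0.03)
    print(f"~{drives:,.0f} drives, ~${cost:,.0f}")  # ~9,091 drives, ~$3,000,000
```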

------
amelius
Do you keep them powered on at all times? Does a storage pod include a
software-switch to turn them off selectively?

~~~
atYevP
Yev from Backblaze here -> Yes, unless the storage pod is in maintenance, it
remains on so that backup and B2 data can be uploaded and accessed.

~~~
amelius
Interesting. Have you ever computed how much power you would save if you
turned them off when not needed?

Or is every pod constantly in use? Would it help if smart algorithms
repartitioned the data across storage pods to optimize power use?

~~~
brianwski
Brian from Backblaze here.

> Have you ever computed how much power you would save if you turned them off
> when not needed?

Electrical power is a gigantic part of our datacenter bill (like 60%).
However, we cannot really ever turn off the hard drives.

If they were powered down, it would take too long to retrieve data. It would
be like Amazon Glacier where it can take 15 minutes to get a single image
back.

> is every pod constantly in use?

The way we built it, yes. If one user deletes a backup or unsubscribes from
the Backblaze service, it frees a little bit of space on pods all over the
datacenter. So we allow some other customer to use that space up as soon as we
can.

We also use ALL the pods in the datacenter to give us "breadth" to accept data
faster. This is particularly helpful on surges of new customers due to a press
event, or if a competitor like CrashPlan decides to exit the market.
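A toy illustration of spreading incoming data across every pod so that space freed by deletions is reused quickly. This is not Backblaze's allocator (the thread doesn't describe it); `pick_pods` and `place` are hypothetical helpers that send each upload to the pods with the most free space.

```python
import heapq
from typing import Dict, List


def pick_pods(free_bytes: Dict[str, int], count: int) -> List[str]:
    """Names of the `count` pods with the most free space (ties broken by name)."""
    if count > len(free_bytes):
        raise ValueError("not enough pods")
    best = heapq.nsmallest(count, free_bytes.items(),
                           key=lambda item: (-item[1], item[0]))
    return [name for name, _ in best]


def place(free_bytes: Dict[str, int], size: int, count: int) -> List[str]:
    """Write `size` units to each of `count` pods, updating free space in place."""
    chosen = pick_pods(free_bytes, count)
    if any(free_bytes[name] < size for name in chosen):
        raise ValueError("not enough free space on the chosen pods")
    for name in chosen:
        free_bytes[name] -= size
    return chosen


if __name__ == "__main__":
    pods = {"pod-a": 5, "pod-b": 9, "pod-c": 7}   # free space, arbitrary units
    print(place(pods, 3, 2), pods)
    pods["pod-a"] += 4                             # a customer deletes data on pod-a
    print(place(pods, 1, 2), pods)                 # freed space is used right away
```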

~~~
amelius
> If they were powered down, it would take too long to retrieve data. It would
> be like Amazon Glacier where it can take 15 minutes to get a single image
> back.

Well, speaking for myself, I don't mind if I have to restore a backup, and
have to wait 1 minute extra for a system to come online (that's about the
typical boot time of a Linux system, so I suppose for a storage pod it would
be about the same).

By the way, it seems possible to power down an individual SATA drive if it is
not in use: [1].

[1] [https://unix.stackexchange.com/questions/112117/shutdown-my-...](https://unix.stackexchange.com/questions/112117/shutdown-my-backup-hard-disk-on-linux-when-i-dont-use-it)
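On Linux, a common tool for this is `hdparm`: `hdparm -y` puts a drive into standby immediately, and `hdparm -S` sets an automatic spin-down timeout using the compact encoding described in hdparm(8). A sketch that converts a desired timeout in seconds into an `-S` value, rounding up to the next representable step; the device path in the example is a placeholder.

```python
def hdparm_standby_value(seconds: int) -> int:
    """Convert a spin-down timeout in seconds to an `hdparm -S` value.

    Encoding per hdparm(8): 0 disables the timeout; 1-240 are multiples of
    5 seconds (up to 20 minutes); 241-251 are 1-11 units of 30 minutes
    (30 minutes to 5.5 hours). Other special values are not used here.
    """
    if seconds < 0:
        raise ValueError("timeout must be non-negative")
    if seconds == 0:
        return 0
    if seconds <= 240 * 5:
        return -(-seconds // 5)            # ceil(seconds / 5)
    if seconds <= 11 * 1800:
        return 240 + -(-seconds // 1800)   # ceil(seconds / 1800) units of 30 min
    raise ValueError("longer than 5.5 hours; not representable in this range")


if __name__ == "__main__":
    value = hdparm_standby_value(10 * 60)
    # Placeholder device; actually running this requires root.
    print(f"hdparm -S {value} /dev/sdX")
```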

~~~
sml156
Interesting that you consider your setup somehow comparable to Backblaze's.

Also, if he says it takes about 15 minutes for it to come online and be usable
in their massive data center, I would believe him.

~~~
amelius
Not really, he was talking about Amazon's solution.

------
philfrasty
If anyone from the Backblaze team reads this: I love your service BUT please
provide proper invoices.

The ones provided on the "My Account" page are an absolute nightmare.

Look at services like Stripe or DigitalOcean and make them downloadable as
PDFs. Best case: send them out via email or auto-upload them to a specific
Dropbox folder.

------
buro9
@atYevP how much does the Backblaze business plan cost?

Or rather, if I have 10TB on a NAS and a Linux machine, what would it cost to
back that up to Backblaze as a consumer given that you don't support this and
I'd need to purchase the business thing?

~~~
atYevP
Hey there! The business and consumer computer backup services have the same
feature set, so our business service would also not work with a NAS or Linux
box. I'd recommend looking into Backblaze B2. It'll be a bit more expensive
as it's $0.005/GB per month, but you'll have more control over your backups,
and lots of integrators make it easy.
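For the 10 TB case, a rough storage-only estimate at the $0.005/GB rate quoted above (a monthly storage price), assuming decimal units (10 TB = 10,000 GB). Download and transaction charges are not included.

```python
def b2_storage_cost(stored_gb: float, months: int = 1,
                    price_per_gb_month: float = 0.005) -> float:
    """Storage-only cost in dollars for `stored_gb` kept for `months` months."""
    return stored_gb * price_per_gb_month * months


if __name__ == "__main__":
    print(f"per month: ${b2_storage_cost(10_000):,.2f}")             # per month: $50.00
    print(f"per year:  ${b2_storage_cost(10_000, months=12):,.2f}")  # per year:  $600.00
```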

~~~
corobo
Is Linux backup support anywhere on the radar at all?

I recently trimmed down my Backblaze licenses after switching to Linux
installs, just went with s3 as s3cmd had a path of least resistance to storing
backups. Would love to switch back if it's ever possible :)

~~~
ac29
They've repeatedly said no. It breaks their business model to allow Linux
machines (often servers) to back up "unlimited" data for $5/month. People with
multi-hundred-TB NAS machines would end that business model quickly (see
Amazon Cloud Drive, CrashPlan, and others).

But if you're using S3, just switch to B2: it has per-GB pricing, is much
cheaper than S3, and supports lots of Linux tools. I use rclone.

~~~
x4d66a5ce
I took the GP's remark about how he "went with s3 as s3cmd had a path of least
resistance to storing backups" to mean that he considered B2 but opted for S3
because the latter offers a backup utility. B2 supports neither a "native"
(e.g., git or rsync over SSH) backup interface nor a Linux client; the approach
for backup on B2 from desktop Linux is to roll your own backup solution or use
somebody else's client that already supports the B2 API.

I'm pretty sure the right Linux backup solution is rsync.net, by the way,
which _does_ support a native (SSH) file access interface. For big backups,
storage is (slightly less than) twice as expensive as S3, but unlike S3, there
are no bandwidth costs.

~~~
rsync
"For big backups, storage is (slightly less than) twice as expensive as S3,
but unlike S3, there are no bandwidth costs."

Just to clarify ... we (rsync.net) support "borg backup" now (with borg
deployed on the server side ...) which is the "holy grail" of efficient, zero
knowledge, remote backups:

[https://www.stavros.io/posts/holy-grail-backups/](https://www.stavros.io/posts/holy-grail-backups/)

... and if you are willing to give up our ZFS snapshots on the account (you
manage retention yourself with borg) _and_ you are willing to handle your own
technical support of borg, we offer a 3-cent rate. This is _about_ the same
price as S3:

[http://rsync.net/products/attic.html](http://rsync.net/products/attic.html)
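Managing retention yourself with borg usually means running `borg prune` with options such as `--keep-daily`. The sketch below is a simplified approximation of that one rule, not borg's implementation: it keeps the newest archive from each of the N most recent days that have any archives, and everything else becomes a candidate for deletion.

```python
from datetime import datetime
from typing import List


def keep_daily(archives: List[datetime], n_days: int) -> List[datetime]:
    """Newest archive per calendar day, for the `n_days` most recent days
    that have at least one archive (days without archives don't count)."""
    if n_days <= 0:
        return []
    kept: List[datetime] = []
    seen_days = set()
    for ts in sorted(archives, reverse=True):  # newest first
        if ts.date() not in seen_days:
            seen_days.add(ts.date())
            kept.append(ts)
            if len(kept) == n_days:
                break
    return kept


if __name__ == "__main__":
    archives = [datetime(2017, 10, 1, 9), datetime(2017, 10, 1, 21),
                datetime(2017, 10, 2, 9), datetime(2017, 10, 4, 9)]
    keep = keep_daily(archives, 3)
    prune = [a for a in archives if a not in keep]
    print("keep:", keep)
    print("prune:", prune)
```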

~~~
corobo
Sorry guys, the path of least resistance also took price into account. I'll
admit I haven't checked your pricing recently, but the last time I used your
service I was paying something like $70/yr for <100G of storage.

------
chrisper
@atYevP Do you guys do anything special regarding HDD vibrations? I am
assuming your special case is built with this in mind. It is crazy how many
things one has to consider at such scale. Things people normally do not care
about!

~~~
brianwski
Not Yev but I also work at Backblaze.

Most of the anti-vibration tech is straightforward: the fans aren't bolted
to the chassis but held by rubber rivets, and we have slowly but steadily
improved the "pressure" with which drives are held in place to limit
failures, etc.

From time to time, a fan will "half fail" causing large vibrations which can
massively increase drive failures in the rack that bad fan is in.

~~~
chrisper
Thanks. Another question I got for you is:

What is louder? The fans or all the disks together? I had a workstation with 6
spinning disks and that was already quite loud.

~~~
atYevP
Now it's Yev - you'd have to ask our DC techs. Every time I go to the
datacenter it "feels" like the loudest part are the fans, but my ears aren't
attuned to what is causing which ruckus :P

------
xhrpost
I imagine you'll have to reduce pricing on Backblaze B2 eventually in order to
stay competitive. What's it like having to flip such a massive switch where
your revenue is X one minute and then X - 20% the next?

~~~
teach
I don't understand. B2 is currently about half the cost of S3 for outbound
data and 1/4 the cost for data at rest.

What do you mean "stay competitive?"

~~~
plantain
They're not equivalent. B2 is singly homed.

~~~
liquidgecka
So is S3. You have to pick the region (home) for the data when you create the
bucket.

~~~
gamegoblin
From [https://aws.amazon.com/s3/](https://aws.amazon.com/s3/) and
[https://aws.amazon.com/s3/details/](https://aws.amazon.com/s3/details/)

    
    
        Data is automatically distributed across a minimum 
        of three physical facilities that are geographically
        separated by at least 10 kilometers within an AWS Region
        
        Designed for 99.999999999% durability and 99.99%
        availability of objects over a given year.
        
        Designed to sustain the concurrent loss of data in two facilities.
    

From [https://www.backblaze.com/b2/cloud-storage.html](https://www.backblaze.com/b2/cloud-storage.html)

    
    
        Uptime: 99.9% SLA
    
        Reliability: 99.999999% durability
    

So S3 is 10x more available, 1000x more durable, and datacenter redundant.
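The 10x and 1000x figures follow from counting nines. A small sketch, treating each percentage as an annual rate (for durability, the chance a given object is lost in a year), which is an assumption when comparing an SLA with a design target:

```python
HOURS_PER_YEAR = 24 * 365


def miss_rate(nines: int) -> float:
    """Fraction not achieved for a figure with `nines` nines: 3 -> 0.001 (99.9%)."""
    return 10.0 ** -nines


def downtime_hours_per_year(availability_nines: int) -> float:
    return miss_rate(availability_nines) * HOURS_PER_YEAR


def expected_losses_per_year(objects: int, durability_nines: int) -> float:
    return objects * miss_rate(durability_nines)


if __name__ == "__main__":
    # B2: 99.9% uptime (3 nines), 99.999999% durability (8 nines)
    # S3: 99.99% availability (4 nines), 99.999999999% durability (11 nines)
    print(f"B2 downtime: {downtime_hours_per_year(3):.2f} h/yr, "
          f"S3 downtime: {downtime_hours_per_year(4):.2f} h/yr")
    print(f"objects lost per year out of 10 million: "
          f"B2 {expected_losses_per_year(10_000_000, 8):.3g}, "
          f"S3 {expected_losses_per_year(10_000_000, 11):.3g}")
```

The arithmetic only shows what the published figures imply, not whether either provider actually meets them.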

~~~
shawn-furyan
Eleven 9s of durability is an utterly farcical guarantee from a decade-old
service. A single big durability event in the next century would irrevocably
decimate that record. They really might as well claim 100% durability, because
it is no less plausible.

I mean, what are the odds that S3 will even be operating in 100 years? I'll
certainly put $1 against $1,000,000,000 that it won't, and that's only one of
many ways it could fail the standard.

------
thebiglebrewski
Love these articles! Thanks for writing it. Must feel so good to buy that many
HDDs at once...something really gratifying about that.

~~~
brianwski
As the guy who signs the check, "gratifying" isn't quite the right word. :-)
Some stress, some amazement that everything got this large.

When we started 10 years ago, we could only afford to rent a quarter of one
cabinet at the datacenter at 365 Main Street, San Francisco. We had "right of
first refusal" on the rest of the cabinet we occupied. We could pile all of
the hard drives ever deployed in that cabinet on my kitchen table.

Now it is semi trucks filled with forklift pallets of drives. But it just
"snuck up on us". Each day it was a little larger, and each drive order was
just slightly larger than the previous one.

~~~
atYevP
Yes, something taking 10 years to sneak up on you is very sneaky.

------
ape4
How about security? If Backblaze can recover files for users then they can get
the same files for any TLA.

~~~
grincho
That's the advantage of B2: straight storage, bring your own (encrypting)
client. If Backblaze proper had a decent restore story (rather than typing
your key into a web page and getting a zip), I'd recommend it wholeheartedly.

------
tooltalk
I wonder whether Backblaze buys directly from HDD manufacturers or from
wholesalers, and how much of a volume discount they get.

~~~
chiph
They've talked about this before: despite orders like this, they're still
not a big enough customer to be on the manufacturers' radar. So they're
ordering from a middleman.

Back during the hard-drive crisis after the 2011 Thailand floods disrupted
production, they were buying from anyone that had drives to sell, including
people going into their local big-box store to source drives.

~~~
niftich
Link to their blog post on mad dashes to Costco after the floods [1]. It's an
engaging read.

[1]
[https://www.backblaze.com/blog/backblaze_drive_farming/](https://www.backblaze.com/blog/backblaze_drive_farming/)

~~~
atYevP
Yev here -> it was a pain to drive around the Bay, let me tell you :P

------
pcunite
Could you let us know about failure rates over time? Thanks for sharing.

~~~
atYevP
Yev here -> These drives will be part of our ongoing "Hard Drive Stats" posts
on our blog!

------
chr4004
Curious where one would buy those, directly from the manufacturer?

~~~
atYevP
Yev from Backblaze here -> Most of our purchasing is done through
distributors. We'd love to buy direct at some point, but for now we have good
manufacturer/distributor relationships! We're getting there though!

~~~
rasz
There is a simple fix - become a distributor. I worked for local European
distributors that moved less merchandise per year and had direct relationships
with Samsung/Fujitsu/IBM/WD/Seagate (you can tell it was a long time ago :P).

~~~
atYevP
We've batted the idea around :P

