
Our 6 TB Hard Drive Face-Off - ehPReth
https://www.backblaze.com/blog/6-tb-hard-drive-face-off/
======
benguild
It’s really great that Backblaze does these real world, mass comparisons.
Otherwise it’s just limited to what a single test says, often at a single
time, with a single batch of drives from various manufacturers.

These guys are probably one of the only qualified places to do this at scale
and report real world results. Glad they do!

~~~
emsy
I often think of it as open sourcing a company. They give up a competitive
advantage but probably gain as much or more in return (reactions on their
article as "pull requests" for instance). I don't want to sound hyperbolic but
I think it also advances humanity (The information and the spirit behind it).

Other companies do talks and papers too, but the Backblaze info seems really
unfiltered and close to the real world.

~~~
Ironchefpython
> the Backblaze info seems really unfiltered and close to the real world.

Cloudflare seems to be a similar company. I recommend them both to anyone
somewhat technical who needs a CDN.

My only complaint about Backblaze is no Unix client. I thought about setting
up a Windows VM to manage backups, but I was terrified that Gninjas would
sneak into my house at night and shave my beard.

~~~
akgerber
They have a Unix client— probably just not the Unix you use (OS X).

------
Bluestrike2
The article mentions having to buy 10k+ new drives in the next few months? Is
that number just your average increase for new drives and/or replacements for
failed drives? Or does it include planned replacement of older pods with
smaller drives? I'm curious if there's a point where you've found it less
expensive (or less problematic with drive failure rates per pod) to replace
and upgrade older pods compared to just letting them continue to run.

~~~
bluedino
They used to say they don't spend any money on drive replacements (except for
labor, obviously) because of the drives having 3-year warranties. Those were
using 1.5TB drives. However, the original 67TB pods came out in 2009 and
wouldn't be warrantied anymore.

According to their blog they took until 2011 to hit 10PB, which would be
roughly 150 of the original 67TB pods. The 6TB drives hold 4 times as much as
the 1.5TB ones, so they'd be able to replace all of those with just 40 of the
new pods. I'm sure the savings in electricity and rack space would be too huge
to pass up.
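
A quick back-of-the-envelope check of those numbers, as a sketch only. It
assumes decimal units (1 PB = 1000 TB) and that a new pod holds the same number
of drives as the original, so pod capacity scales with drive size. The function
name is made up for illustration.

```python
import math

def pods_needed(total_tb: float, pod_tb: float) -> int:
    """Number of whole pods required to hold total_tb of data."""
    return math.ceil(total_tb / pod_tb)

TOTAL_TB = 10 * 1000          # 10 PB, assuming 1 PB = 1000 TB
OLD_POD_TB = 67               # original pod built from 1.5 TB drives
NEW_POD_TB = OLD_POD_TB * 4   # assumption: same drive count, 6 TB drives

old_pods = pods_needed(TOTAL_TB, OLD_POD_TB)
new_pods = pods_needed(TOTAL_TB, NEW_POD_TB)
print(old_pods, new_pods)
```

Under these assumptions the counts come out close to the "roughly 150" and
"just 40" quoted above.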

Then again, they have their own datacenter space now, so space probably isn't
an issue but I'm sure electricity is. They said something around $2,100 a rack
(which holds 10 pods) for electricity, space, and data. They were incredibly
stoked to cut that in half by moving to 3TB drives with the 2.0 version of the
pod so I'm sure cutting that in half excited them as well :)

~~~
brianwski
Backblaze employee here. Some of the drives are "voided warranty" now, like
the ones we "shuck" out of USB enclosures. We just include that into our cost
analysis - for example if 5 percent of the drives fail, the warranty saves you
5 percent of cost - so if purchasing the warranty costs 3 percent we do it,
but if purchasing the warranty costs 7 percent we do without.
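
The rule of thumb above amounts to comparing the expected savings from free
replacements against the warranty price. A minimal sketch, assuming both are
expressed as a fraction of the drive's purchase price and that every failure
under warranty saves one full drive price (the function name is hypothetical,
not anything Backblaze has published):

```python
def warranty_is_worth_it(failure_rate: float, warranty_cost: float) -> bool:
    """Return True when the expected savings exceed the warranty price.

    Both arguments are fractions of the drive's purchase price, e.g. 0.05
    for 5 percent. Assumes each failure under warranty saves one full
    drive price and ignores labor and shipping.
    """
    expected_savings = failure_rate  # fraction of drives replaced for free
    return expected_savings > warranty_cost

print(warranty_is_worth_it(0.05, 0.03))  # warranty priced at 3 percent
print(warranty_is_worth_it(0.05, 0.07))  # warranty priced at 7 percent
```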

------
java-man
Firefox: Unable to Connect Securely Firefox cannot guarantee the safety of
your data on www.backblaze.com because it uses SSLv3, a broken security
protocol. Advanced info: ssl_error_no_cypher_overlap

can we trust them with our data?

~~~
theandrewbailey
I thought it was just me (I've disabled some older cipher suites in
about:config). Turns out this site only supports broken ciphers (RC4 and
3DES). Ewwww.

[https://www.ssllabs.com/ssltest/analyze.html?d=backblaze.com](https://www.ssllabs.com/ssltest/analyze.html?d=backblaze.com)

~~~
yuhong
3DES is not broken but it is legacy.

------
fubarred
Backblaze is pretty awesome for open-sourcing, effectively, their costs in a
high-competition segment. It would be difficult (read: pointless) to compete
in that market without massive vertical integration, greater scale, deep
pockets and compelling, differentiated value (eg something different enough
besides all the other entrants).

For anything critical in business (customer list, billing, finance, etc.), I
would recommend also using Tarsnap in addition to whatever other (non-AWS)
vendor/s are used for "offsite"/"offcloud" backup.

------
tzz
Backblaze should expand their service to a cheaper cloud storage service
similar to Amazon S3. They already have the infrastructure and the know-how.

~~~
dsl
The business model is based on only rarely having to retrieve old data. I
suspect they spin down the drives (or possibly hibernate the whole box) when
not in active "load" mode.

You would need to compare the pricing here to Amazon Glacier, if anything.

~~~
brianwski
Backblaze employee here. Our drives never spin down, although we have
considered this because electricity is a large monthly bill to us.

Because it is "backup" we are pretty relaxed about shutting down our pods to
do maintenance, like replacing a drive. But that only takes 10 minutes, we try
not to have machines offline more than that if possible, or it causes support
issues from people trying to restore files.

~~~
rbanffy
Don't the writing patterns make old data tend to remain on older drives and
newer data end up on the newer devices? If that's the case, would reads to
those devices become so infrequent that it would be worth spinning them down?
Are you worried about the stresses induced by spin up/down and thermal
variations?

~~~
brianwski
In general that's my theory also (that after the initial fill, and maybe after
an initial "data churn" we might power them down). However, a full pod often
has at least one file from several hundred thousand individual customers. If
any one of those customers prepares a restore, those drives have to be
spinning. Also, we now have an iOS and Android app that gives you access to all
the files on your laptop from the convenience of your smartphone - which
again means the drives have to be spinning.

> Are you worried about the stresses induced by spin up/down and thermal
> variations?

Absolutely. Our datacenter techs are convinced that if they swap a drive
QUICKLY the pod has a higher chance of coming back up without any problems. If
they let the pod get entirely cool it statistically seems to have more
problems when it comes back up - a full cool down and heat up seems to cause
issues.

------
SideburnsOfDoom
No failures recorded at all. This is pretty impressive for the drive
manufacturers:

"Let’s review the Seagate and Western Digital drives so far:

Initial reliability (how many drives failed) – No failures.

Running reliability (3 months) – No failures

SMART Stats (3 months) – No error conditions recorded for the 5 stats that we
utilize."

~~~
Cthulhu_
Of note is that since Backblaze is a backup company, their hard drives will
probably not see a lot of churn / activity - write once, wait until something
happens. Maybe refresh the data every once in a while.

I'd like to see similar tests from companies like Amazon and Dropbox (although
the latter probably uses the former), in various use cases.

~~~
chrisan
Wouldn't they see the same amount of churn a typical user would see?

I do not use them but they say "That’s why Backblaze backs up your data
automatically and constantly looks for new and changed files to backup.
Install Backblaze and never worry about losing a file again."

Unless you are running a database it seems like the most common use case for
files is write once, update every once in a while. Document creation (code for
us or office docs for non-coders) would be a bunch of writes up front, then
follow the same pattern of update every once in a while.

I suppose their reads are much less than a typical user's, but I didn't think
reads contributed much to eventual drive failure.

~~~
prawks
Hypothesizing about how Backblaze stores data: because they likely keep a
series of backups, it's more likely they write each delta out separately. I'd
imagine it's likely they do not have whole copies of your latest backup on
their drives, but they could reconstruct one at will if needed. If that were
the case, it would indeed be close to write-once.

~~~
brianwski
Backblaze engineer here. If a file is less than 30 MBytes and it changes even
1 byte we push an entire copy to a new location in our datacenter. We preserve
the old copies for 30 days so you can "roll back time" up to 30 days (like if
you edit a document but want to revert).

For files larger than 30 MBytes we break them into 10 MByte chunks and only
transmit the chunks that have changed. So the worst case is you insert 1 byte
at the very start of the large file - this effectively changes EVERY 10 MByte
chunk and we transmit it all. The best case is you append 1 byte to the end of
the large file, because then we only have to transmit that one chunk.
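
To see why the position of an edit matters, here is a small sketch of
fixed-size chunk comparison. It is an illustration, not Backblaze's code: the
SHA-256 hashing, the function names, and the tiny 4-byte chunk size (standing
in for 10 MBytes) are all assumptions.

```python
import hashlib

def chunk_hashes(data: bytes, chunk_size: int) -> list[str]:
    """Hash each fixed-size chunk of data (the last chunk may be shorter)."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]

def changed_chunks(old: bytes, new: bytes, chunk_size: int) -> list[int]:
    """Indexes of chunks in new that differ from old or did not exist in old."""
    old_h = chunk_hashes(old, chunk_size)
    new_h = chunk_hashes(new, chunk_size)
    return [
        i for i, h in enumerate(new_h)
        if i >= len(old_h) or old_h[i] != h
    ]

CHUNK = 4  # stand-in for 10 MBytes
original = b"AAAABBBBCCCCDD"

appended = original + b"!"    # best case: one byte added at the end
prepended = b"!" + original   # worst case: one byte added at the start

print(changed_chunks(original, appended, CHUNK))
print(changed_chunks(original, prepended, CHUNK))
```

Appending touches only the final chunk, while prepending shifts every chunk
boundary by one byte, so every chunk hashes differently and has to be sent.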

~~~
existencebox
I'm very curious; has a scheme that attempts to optimize against that worst
case been considered? (Assuming you guys have, was it not used because the net
tradeoff wasn't worth it?)

It seems like if you're already going through the effort to see if chunks have
changed, you can readily do some heuristic where you check the end chunk of
both sides, and scan inwards until you find a change; something to that
respect, further minimize replicated data? I'd assume you might have CPU
cycles to spare, but I'm too far into assuming already to feel comfortable so
I'll just hope you answer :P (thanks in advance, it's been great to read what
you've written thus far.)
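
One reading of the scan-inwards idea is to trim the common prefix and suffix
of the two versions and send only the middle. The sketch below is an
assumption about what such a heuristic could look like, not anything Backblaze
has described; it compares whole byte strings in memory, which a real client
would not do for multi-gigabyte files.

```python
def changed_region(old: bytes, new: bytes) -> tuple[int, int, bytes]:
    """Return (start, old_end, replacement) such that
    old[:start] + replacement + old[old_end:] == new.
    """
    # Scan forward from the start while bytes match.
    limit = min(len(old), len(new))
    start = 0
    while start < limit and old[start] == new[start]:
        start += 1

    # Scan backward from the end, without crossing the matched prefix.
    suffix = 0
    while suffix < limit - start and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1

    return start, len(old) - suffix, new[start:len(new) - suffix]

old = b"AAAABBBBCCCCDD"
new = b"!" + old  # one byte inserted at the very start
start, old_end, replacement = changed_region(old, new)
print(start, old_end, replacement)
```

For the one-byte insert at the start, only that single byte needs to be
described, instead of every fixed-size chunk.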

~~~
brianwski
> have you tried a better binary diff algorithm

We just never got around to it. In practice, it turns out most well written
programs with large data try not to insert one byte at the start of the file.
For example, take your large Outlook "pst" file (commonly 1 - 4 GBytes). When
you get an email, it seems to append it to the END of the pst file, plus
update some internal tables. So a large amount of that won't change.

Also, the worst case is it wastes space in our datacenter (and some bandwidth)
for 30 days and then is cleaned up anyway, so you can measure the theoretical
amount of money to save and it won't profoundly change our business so we put
it off. Not to say ANYTHING is off the table, we're just always swamped with
some project or other. :-)

------
plus
Unfortunately not the most direct apples-to-apples comparison. The WD Red
drives are designed for use in NAS systems, whereas the Seagates used are
desktop drives. Seagate does have a brand of NAS drives (called simply enough
Seagate NAS), though not apparently at 6TB.

~~~
sp332
It's not saying that the WD drives are better in all cases. Backblaze has a
specific use-case and the WD drives are the best fit.

------
rwmj
I wonder if anyone has real world experience of Shingled Magnetic Recording
(SMR) drives? I saw a talk by Red Hat's Ric Wheeler about a year ago, and they
seem both interesting and pretty strange - as in, we may need new filesystems
to cope with them.

~~~
brianwski
Backblaze employee here: we will run our first experiment with SMR soon! We're
SUPER curious about whether they will work in our application.

------
Tepix
It's unfortunate that Backblaze feels the need to transfer the private key to
their servers.

Their knowledge base entry at
[https://help.backblaze.com/entries/20203731-Can-you-tell-me-...](https://help.backblaze.com/entries/20203731-Can-you-tell-me-more-about-the-encryption-Backblaze-uses-)
doesn't clarify whether or not the secret private password is transmitted to
their servers to decrypt the private keys there, which would make the scheme
rather weak.

Ideally, the private keys would remain on the user's PCs.

~~~
brianwski
Backblaze engineer here. The private encryption key is a completely standard
PEM file from OpenSSL. We have two modes: by default the PEM file is secured
by a "pass phrase" known by Backblaze. In this lower security mode, the user
data is essentially only secured by their email address and their chosen
password on the Backblaze website, and the user can recover their password via
email. That is a great mode if you are just backing up stuff you don't
terribly mind somebody getting a copy - like photos of your cat.

If you are concerned about security, the second mode is you set your own "pass
phrase" on the PEM file. But oh my lord, make sure you remember that pass
phrase because Backblaze never writes it to disk and if you forget it, NOBODY
ON EARTH is getting your files. Not you, not the USA government, not
Backblaze, that data is GONE. You cannot recover that password. This is a good
security mode if you would be arrested if the NSA got a copy of your data,
because we don't think it can be broken.

~~~
bumbledraven
Even in "secure mode", your passphrase must be sent to Backblaze's servers
whenever you do a restore. They then have possession of a decrypted version of
your files for a while. This is unacceptable.

As brianwski wrote in
[https://news.ycombinator.com/item?id=8169040](https://news.ycombinator.com/item?id=8169040):

 _...[I]f you lose a file, you have to sign into the Backblaze website and
provide your passphrase which is ONLY STORED IN RAM for a few seconds and your
file is decrypted. Yes, you are now in a "vulnerable state" until you download
then "delete" the restore at which point you are back to a secure state._

------
EwanG
For those not willing to read a full page, in the end it is more an energy
decision than a reliability decision. And they mention at the end that they
are already looking into the 8TB Helium and 8TB SMR drives.

~~~
pbreit
My sense was that the 20% speed advantage was as important or more than the
small energy savings. Is that common for slower drives to handle data faster?

~~~
pfg
Maybe it's a side-effect of increased vibration for 7.2k drives? Hard drive
performance drops in environments with high vibration. I could imagine that
being a factor when you put 45 drives in a single server.

------
yc1010
Does anyone know if there is a set of command line tools (linux) or an api for
uploading, downloading and listing files on your account or do they
specifically forbid this type of use and obfuscate?

~~~
e12e
I've been hoping for Backblaze to launch a "pro"/"enterprise" tier that
allowed for this kind of use. Unfortunately (for me) it would appear they
prefer to stay within their chosen niche: home user backups.

The service I'm aware of that is closest to what you describe is spideroak[1].
Last I used them (years ago) they were a little too slow, and had some other
minor kinks -- but might be worth checking out again now.

It looks like they've also finally made good on delivering Nimbus.io -- both
as source and as a service (as I said, it's been years since I looked at
Spideroak...):

[https://nimbus.io/](https://nimbus.io/)

[https://github.com/SpiderOak/nimbus.io](https://github.com/SpiderOak/nimbus.io)

[1] [https://spideroak.com/](https://spideroak.com/)

~~~
yc1010
So it's not really "unlimited" storage they offer; there is a ceiling that
matches the size of a user's primary hard drive (max 6-8 TB at present?), with
most users probably using laptops with small drives. For example, I would love
to back up 3-4TB on my NAS, and be able to occasionally access+download and
check integrity.

~~~
brianwski
Backblaze employee here. We don't back up NAS, but if you direct attach a Drobo
we include that in "unlimited" for $5/month.

FYI - our largest customer backs up about 60 TBytes for $5/month. So is it
"unlimited"? If you have less than that it will work, we have proof! :-) But
we lose money on that customer, we survive on the "average" amount of data
people have, and on average a laptop of your average home user has only a few
hundred GBytes.

------
unicornporn
I'm so looking forward to the Seagate 8TB Archive HDD. Finally a drive that
will hold all my photos and quite a bit more. It will make my backup procedure
a whole lot easier.

~~~
pjc50
Related ask HN to you and others: what software are you using to backup/sync
to that drive? (What would you use on Windows?)

~~~
flyinghamster
I can't speak for Windows, but the setup where I work involves four drives
(currently 2 TB each), with three running as a triple-mirror ZFS pool. For
backup, I split a drive off the pool, pull it, slide in the previous backup,
and resilver the newly-inserted drive. Each time, I use a different drive bay.

The ZFS pool itself provides backup for several other systems, and snapshots
are made periodically by a cron job to provide a further line of defense.

------
justcommenting
the critical assumption in this post seems to be that drive speed was the only
rate-limiting factor when they were collecting data... but as i understood the
post, it seemed like perimeter load-balancing, sata controller/nic load,
and/or customer usage patterns could be responsible for some of the
differences they observed.

~~~
lsc
You have to screw up a storage system pretty good to have something other than
the hard drives be the limiting factor.

~~~
justcommenting
this is generally true, but even small variations in other factors could
propagate through a complex architecture to account for some of the
differences presented here, especially over time. these data are still useful,
but some of the caveats could be made more explicit.

similarly, most drive benchmarks are 'artificial' in any number of ways and
come with another set of caveats. since drive testing is already part of
backblaze's process for putting drives into production, it could be especially
interesting to compare and contrast performance data from testing with data
from production use.

i pick these nits because real-world storage performance info seems like a
valuable public good (particularly for enterprise and enterprise-like usage of
consumer drives), and i'm grateful to backblaze for offering it up.

------
lsc
hm. I wonder if their data access patterns are largely sequential; I imagine
the 7200rpm drives would beat the lower-RPM drives in an environment where
random access matters.

~~~
justcommenting
i have similarly wondered how important drive cache size might be for these
types of use cases. multivariate regressions including brand, cache size, rpm
etc. as predictors of various performance metrics would be super interesting
in a place like Backblaze.

~~~
lsc
Cache gets you very little if you can feed in sequential data at a constant
rate and at the blocksize of the device.

Cache becomes more important as the input data becomes more messy; cache is
one tool for turning bursty and random data into constant, sequential data.

------
ck2
Helium filled drives are literally a dead-end as their lifespan can be shorter
than SSD

~~~
ghshephard
Re: "Helium filled drives are a dead end" - Citation?

Also, SSDs come with 10 year warranties [1], which exceeds anything you
can get with a normal spinning HD. If anything, I'd have more confidence with
my SSD than I would with a spinning HD - particularly as you don't have to
worry about the drives failing to spin up over time.

[1] [http://www.amazon.com/SanDisk-Extreme-2-5-Inch-Warranty--SDS...](http://www.amazon.com/SanDisk-Extreme-2-5-Inch-Warranty--SDSSDXPS-240G-G25/dp/B00KHRYRNM/)

~~~
dsl
One legitimate argument against helium drives is they contribute to the
depletion of the relatively scarce helium supply.

[http://www.boston.com/news/science/articles/2010/10/17/scien...](http://www.boston.com/news/science/articles/2010/10/17/scientists_warn_worlds_supply_of_helium_close_to_depletion/)

~~~
ghshephard
That article reads a lot like "Peak Oil" concerns. There is a lot of Helium
available - [1] _The global reserves of helium are known to be approximately
41 billion cubic meters. Most of them lie in Qatar, Algeria, the USA and
Russia. Annual global production of helium is about 175 million cubic meters,
and the USA remains the largest producer._

Regardless, this is something the market can correct for very, very easily. As
helium supply becomes more scarce, the price will go up, resulting in greater
supply. Most of it comes from natural gas, and is so cheap [2] it's not worth
capturing. Oil trades for around $100/barrel. Helium trades at $100 for a
thousand cubic feet (albeit in gas form)

One of the problems with helium is that physics experiments use a _LOT_, and
it was previously so cheap that it wasn't worth trying to conserve. That's
changing - [3] A recycling system can recapture about $12,000/year of lost
helium for a single scientist.

From reading articles - apparently the problem isn't so much that the cost of
helium is increasing - but that it's been so cheap because of the US Natural
Reserves making it completely non-competitive to capture - they are basically
giving the stuff away for next to nothing.

And, putting it in a $100+ Hard Drive that will last a half decade is a _FAR_
better use than putting 10x that much in a $0.50/balloon that will last 30
minutes. (Or Macy's parade which uses 400,000 cubic feet of helium) In fact,
it may turn out to be the best conceivable use of Helium in terms of a value
per m^3 equation.

[1]
[http://www.gazprominfo.com/articles/helium/](http://www.gazprominfo.com/articles/helium/)

[2] [http://finance.yahoo.com/news/airgas-increase-prices-helium-...](http://finance.yahoo.com/news/airgas-increase-prices-helium-20-164550580.html)
_And on Friday, the bureau announced that it was raising the price for a
thousand cubic feet for crude helium from $84 to $95_

[3] [http://www.nature.com/news/united-states-extends-life-of-hel...](http://www.nature.com/news/united-states-extends-life-of-helium-reserve-1.13819)

~~~
Cthulhu_
Thanks for that post - helium scarcity is a load of FUD IMO, since if it
becomes scarce and thus more expensive, companies will jump into the market
and start producing it. Similar to the production of hydrogen for the car
market. Rare earth materials are a much bigger issue, since those are much
harder to chemically manufacture and aren't a by-product of anything (except
recycling)

~~~
rsynnott
> and start producing it.

How? Through transmutation? I can't see that being economically viable.

Helium is a noble gas (generally does not combine into molecules) and tends to
leave Earth's atmosphere when released. There's no viable way to produce more
of it.

~~~
zo1
> "_There's no viable way to produce more of it._"

If you throw enough money at a problem, it starts becoming viable/feasible.

Have a look at the Diamond production industry for comparison:
[http://en.wikipedia.org/wiki/Synthetic_diamond](http://en.wikipedia.org/wiki/Synthetic_diamond)

Note, don't let the "synthetic" part fool you into thinking they're not real
diamonds. They are real in every imaginable way except they're not "a girl's
best friend".

~~~
teraflop
Diamonds are made out of carbon, which is incredibly abundant on earth.
There's no comparable way to manufacture helium out of anything that doesn't
already contain helium; you need nuclear reactions, which are much more
difficult to do at large scale.

~~~
zo1
Helium is made up of electrons, which are also incredibly abundant on earth.
Either way, sarcasm aside, I guess you didn't really understand what I was
trying to say if you're taking my diamond comparison as perfectly analogous,
and then shooting down the idea because it doesn't extrapolate the same way
you expect it to.

Up until some point, it was also difficult to manufacture synthetic diamonds
at large scale. When the price of the thing desired is right / high enough,
technology will be found and whatever costs necessary will become feasible.

~~~
tomwuttke
Then where is my time machine? I will pay 1 trillion dollars for it! (payment
after a few successful trials)

