
135TB for $7,384 - Backblaze Pod 2.0 - hemancuso
http://www.zdnet.com/blog/storage/build-a-135tb-array-for-7384/1453
======
mrb
Direct link to Backblaze's blog post:
[http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v...](http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets/)

They got rid of the PCI bottleneck by switching to PCIe. That bottleneck
surprised me when they designed version 1.0 of their pod, because they could
have gone PCIe at the time. I maintain <http://blog.zorinaq.com/?e=10> and
there were SATA controllers at the time that met their technical requirements
(number of ports, Linux support, etc.).

~~~
SoftwareMaven
According to the post above, they were using three PCIe and one PCI.

"In the first generation storage pod, we ran out of the faster PCIe slots and
had to use one slower PCI slot, creating a bottleneck."

~~~
mrb
I know. I meant to say they could have managed without using that PCI slot at
the time (i.e., using only 3 PCIe SATA HBAs, or using a mobo with 4 PCIe
slots).

------
cpt_yesterday
FYI: Protocase (<http://www.protocase.com/>), the company that will build you
the Backblaze case, sent me an email yesterday announcing that they are now
selling _completed_ Backblaze storage pods without the hard drives for
$5395.00. I'm not sure if this is the new design or not, but I've set up 3 of
these, and finding some of the parts (SATA backplanes) took weeks of searching
and shady dealings.

~~~
SoftwareMaven
I'm considering putting a few of these together. Do you have any sources you'd
mind sharing (preferably not shady :) for the harder-to-find parts?

~~~
cpt_yesterday
The 5-port SATA backplanes were the hardest part to find when I built these.
You can get them through <http://www.chyangfun.com/>, which is where Backblaze
gets them, and we did too. They are hard to get ahold of sometimes, which is
why I was looking around and found some shady companies selling them.
Protocase, mentioned above, sells these and would probably be the easiest
company to get them through.

Other than this, the other parts are pretty standard and are easy to find with
some simple searching.

------
aw3c2
please submit the actual source next time:
[http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v...](http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets/)

~~~
ChuckMcM
Excellent article.

Running LVM over RAID 6 volumes is the 'standard' approach of many enterprise
storage deployments. The 'magic' in a good RAID 6 implementation is what it
does when things go wrong, and lots of things do go wrong. The checksumming is
great too.

At some point you can dump the sheet metal on the 'pods' and just build your
own rack unit. If you look at the 'big iron' systems from NetApp, EMC, and
others you will see they make one big enclosure and then install simpler
systems within that enclosure. What this buys you is that you can move
enclosure services into the big box and take them off the individual boxes.
That gives you better cost efficiencies. And as you point out, your system can
run with fewer fans, so you can put a couple of largish three-phase fans in the
big enclosure (dead simple and very reliable) and use them to push the air
through all the other boxes (or pull it). Then add an Android tablet for the
'door' of the enclosure that tells you drive status, etc., and you're
practically a player in the storage game :-)

------
alfet
Does anyone know if Backblaze will ever support Linux? I've wanted to use
their service for a while now, but their lack of Linux support has been a big
turnoff, and I don't think they have made any change in their statements
regarding this 'issue'.

~~~
brianwski
We would like to; we just haven't had time to get it done yet. It runs
internally, but is lacking an installer and a GUI, and we would need to
prioritize and choose one or more Linux distributions to launch with. Ubuntu
is an obvious choice (we focus more on desktop backup than on servers). But
some people also ask for CentOS and a few others. It bums me out that the
Linux community has not solved binary compatibility anywhere NEAR the same
level that Microsoft or Apple have, and few in the Linux community seem
interested in solving this issue, which massively, MASSIVELY hinders
development and deployment, but that is a side tangent...

Explanation about the "GUI" comment above -> the Backblaze backup client was
simultaneously written from the ground up compiling on Mac OS, Windows, and
Linux. The same tree and the same source compiles on all three on EVERY SVN
CHECKIN. There is one exception, which is the GUI is an extremely simple stand
alone process entirely natively written to match the host OS. On Mac OS it is
in Objective C in the beautiful Apple GUI layout editor, on Windows we use
Visual Studio and C++ and Win32. The firm rule is these GUIs are _ONLY_
allowed to edit one or two simple XML files, and all the real encryption,
compression, transmission is done by other cross platform processes. On Linux
we configure the XML file with "vi". :-) The X-Windows GUI has not even been
started.

~~~
shapoopy
I don't know how many other folks feel this way, but I would _kill_ for a GUI-
less version of your client. I'd _love_ to configure XML files, or some (any!)
analog.

I've been looking for an off-site backup solution for a (nerdy and technically
competent) home user for years, and yours is the _only_ one that I could
afford (poor recent BA here).

Please?! I would pay $10 a month (probably more, really) for that. Even if you
don't want to support it, could you do it just for me? It'll be our little
secret!

~~~
cakeface
It sounds like those XML files are already there when you install the windows
or mac version. Why don't you just open them up and edit them by hand?

~~~
brianwski
Honestly, the XML files are pretty simple. The one main one the GUI writes out
is called bzinfo.xml and is found on any Mac system at
/Library/Backblaze/bzdata/bzinfo.xml and on any Windows Vista or later system
at C:\ProgramData\Backblaze\bzdata\bzinfo.xml

Backblaze is designed to be used with absolutely no configuration (many users
have no idea where their Outlook.pst file is, and we don't think they should
have to know), and the only way we could figure out how to make this work was
to back up EVERYTHING on your system unless you explicitly exclude it. So
bzinfo.xml is basically a flat list of excluded folders you do not want backed
up. There is also a throttle in there if you don't want Backblaze to utterly
destroy your network uplink, and a few other small settings. It's pretty
straightforward.

With that said, we really pride ourselves on easy to use software, so it goes
against everything in our DNA to release software with _NO_ GUI at all, but
maybe we'll give that a serious thought. If you are using Linux, you probably
aren't the average Mom & Pop user. :-)

~~~
ebiester
To a Linux user, editing an XML file _is_ easy to use. And since it's a pretty
reasonable default to back up /home, you're likely to be pretty safe with
defaults anyway.

Call it alpha, see how much demand there is, and the extra money you bring in
might be the motivation to finish up the pretty. :)

------
ck2
0S03086 can be had for $100 in volume so that's another $900 in savings right
there.

Recently was as low as $107 from amazon in retail package:

[http://camelcamelcamel.com/Hitachi-Deskstar-CoolSpin-Interna...](http://camelcamelcamel.com/Hitachi-Deskstar-CoolSpin-Internal-0S03228/product/B004QMA882)

------
neelm
The diagram of the _Cost of a Petabyte_ is very interesting if it is true. It
demonstrates the profitability of some SaaS models for selling what is a
straight commodity.

However it seems in contradiction to the AWS, Rackspace model, which is a race
to the bottom in that there are many competitors and they are selling a
commodity (independent of the other high value services they are selling).
There is some threshold of volume that is key in order to make money in that
space.

------
kylec
Backblaze should really consider selling their pods - it would basically be
free money for them because whatever inventory they don't sell they can use
for their service, and I'm sure there are lots of businesses that would pay
good money for a cheaper alternative to storage servers from Dell or HP.

~~~
SystemOut
It'd be a major distraction, though. Setting up a separate support/services
organization for that is not trivial.

Also, most businesses buy from the Dells or the HPs because they don't have
the in-house expertise to manage a more bare-bones box (or more likely just
don't want to). The companies that have the need/capability to manage more raw
storage could just take the plans and build them out anyways I would imagine.

~~~
stcredzero
Sounds like there's a niche and a partnership opportunity. They certainly have
enough mindshare to consider this.

------
ghoul2
The blog post provides some fascinating data, thanks Backblaze!

$2100 per month buys space, power, and connectivity for an entire rack worth
of Pods. $74,000 is the cost to build 10 Pods to fill that rack. If the Pods
are assumed to have a lifetime of 3 years (most will last longer, but let's
depreciate at this rate), and if the cost of capital is 20%/year, this equates
to a monthly "payment/amortization" cost of $2750. That leads to a total cost
per rack of $4850 per month ~ let's say $5000.

1350TB of _raw_ storage is provided by this rack, which can be scaled by 13/15
to account for RAID6 (as revealed by brianwski here) - thus leaving 1170TB
available for use. After FS overhead etc., let's take this to provide 1PB of
storage.

So, essentially, storage _costs_ Backblaze about 0.5 _cents_ ($0.005)
/GB-month. There are other costs of course: Sean (amongst others) needs to get
paid, etc. To go by a common rule of thumb for a minimum sale price, one third
of the sale price should be profit, another third should be org
expenses/marketing/everything else, and the remaining third should be the
actual cash cost of building/providing the product/service.

So very roughly, Backblaze could provide storage at about 1.5 cents /GB-month.
Factor in 3-way software-level redundancy of data, and you are now up to 5
cents/GB-month for a very high quality storage service.
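
For anyone who wants to re-run those numbers, here is a rough sketch in
Python. The function names are made up for illustration, and the loan-style
annuity formula is only an assumption about how the $2750 figure was derived
(it happens to reproduce it). The $5000 and 1PB inputs are the rounded figures
from the paragraphs above.

```python
def monthly_payment(principal, annual_rate, months):
    """Fixed monthly payment on a loan (standard annuity formula).

    Assumption: this is how the $2750/month figure was derived.
    """
    r = annual_rate / 12.0  # monthly interest rate
    return principal * r / (1.0 - (1.0 + r) ** -months)


def usable_tb(raw_tb, data_drives=13, total_drives=15):
    """Scale raw capacity by the RAID 6 data fraction (13 of 15 drives)."""
    return raw_tb * data_drives / total_drives


def dollars_per_gb_month(monthly_cost, capacity_tb):
    """Monthly cost divided by capacity, in decimal units (1 TB = 1000 GB)."""
    return monthly_cost / (capacity_tb * 1000.0)


if __name__ == "__main__":
    payment = monthly_payment(74000.0, 0.20, 36)  # 10 pods, 3 years, 20%/year
    print(f"amortization:        ${payment:,.0f}/month")
    print(f"total per rack:      ${2100.0 + payment:,.0f}/month")
    print(f"usable after RAID 6: {usable_tb(10 * 135):.0f} TB")

    # Rounded inputs from the comment: $5000/month for 1 PB (1000 TB).
    cost = dollars_per_gb_month(5000.0, 1000.0)
    print(f"raw cost:            ${cost:.4f}/GB-month")
    print(f"x3 pricing rule:     ${cost * 3:.4f}/GB-month")
    print(f"x3 replication:      ${cost * 9:.4f}/GB-month")
```

Changing the lifetime or the cost of capital in `monthly_payment` is the
quickest way to see how sensitive the per-GB figure is to those two guesses.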

Contrasting this with 15c/GB-month that Amazon charges (in addition to
transfer charges), I do have to wonder why Backblaze wants to stick to the
"unlimited desktop backups" business. Even Google storage charges 17c/GB-
month, in addition to per-request charges.

It's quite possible I have some factors wrong here, and if anyone can spot
anything wrong, I'd like to know. Nevertheless, it seems from the numbers
provided that Backblaze could make a killing in this market. I know I would be
interested in using a pure storage backend - equivalent to S3. I use tarsnap
and if tarsnap could reduce its backend costs by two-thirds, I know I'd be
very happy.

What am I missing?

edit: wrote PB instead of TB. Numbers remain correct, though

~~~
ghshephard
I pay Amazon.com $100/year for 100 Gigabytes, or $8.33/month for 100
gigabytes, or $0.083/month/gigabyte. No transfer charges.

[http://phx.corporate-ir.net/phoenix.zhtml?c=176060&p=iro...](http://phx.corporate-ir.net/phoenix.zhtml?c=176060&p=irol-newsArticle&ID=1582734&highlight=)

~~~
ghoul2
True. Though that's not a generic storage service in the sense of S3 or Google
Storage. Some also think it's a "trojan" to detect pirated media :-)

On that note: I could not find any real detail about this - do they encrypt
your files? what kind of privacy/security do they promise? I am assuming the
media files do get de-duped across all users, but what about other, personal
docs, spreadsheets, pdf etc?

~~~
ghshephard
I actually haven't found any restrictions on what I can store - I've just been
tossing stuff up like crazy.

Tools like:
[http://lifehacker.com/5788550/mount-your-amazon-cloud-drive-...](http://lifehacker.com/5788550/mount-your-amazon-cloud-drive-space-like-a-network-folder-in-windows)
will let you mount the drive in Windows - I have to believe it will only be a
little while before Amazon lets me mount this drive with a utility they
provide.

------
Valien
I switched from JungleDisk to Backblaze a year ago and haven't looked back. BB
is amazingly fast and painless to use. On top of that the cost is insanely
cheap in comparison to using AWS (which Jungle Disk uses).

~~~
unkoman
I too am a very happy switcher, from Carbonite to Backblaze. It really is
fast, as Valien says. The cost? Well, it has gotten cheaper since the dollah
is dying.

------
rdoherty
It's 135TB worth of drives, but with RAID don't you see a far smaller usable
amount?

Also, considering saving money on hardware costs is a key factor in Backblaze
staying competitive, they must be saving money elsewhere and/or have other
competitive advantages. Otherwise releasing this information would be akin to
publishing a restaurant's 'secret sauce'.

~~~
brianwski
Anybody who builds their own pod is welcome to use RAID or not, and which RAID
you choose will affect your final numbers. At Backblaze, we configure it as
RAID 6 groups, each group is 15 drives which includes 2 parity drives. So you
are down to 13/15 = 86.67 percent of the raw unformatted space BEFORE the
overhead of ext4 which adds a little tiny bit extra. So after formatting in
our datacenter we are left with about 116 TBytes of storage space for customer
files. On the other hand, we use lossless compression on the customer's files
before transmitting to the pod, so it can sometimes appear like we are fitting
more than 116 TBytes of customer data on a pod, if that makes sense.

------
ComputerGuru
I wanted to use Backblaze after hearing people rave about it, but on Lion it
completely breaks the network stack somehow during upload/backup..

~~~
brianwski
Yikes, what? We all run Lion here at Backblaze and it works flawlessly for us,
and our internal stats show THOUSANDS of Lion customers happily backing up. If
you can provide more details of your setup we would LOVE to fix that for you.
Since Mac customers make up more than half our customer base, we're
completely, 100 percent dedicated to Lion. Send us info or contact us at our
company name at twitter or facebook or use the help links off the homepage.

As a side note -> Backblaze only does one thing -> HTTPS POST. In user space.
We do _NOT_ extend the kernel, we do not have drivers (at all), we simply read
files (we don't even write to your files), compress and encrypt them in RAM,
and push them through the completely common HTTPS. It is unusual for Backblaze
to cause any problems except a sluggish network or a hiccup in a skype phone
call.

~~~
ComputerGuru
Hi Brian!

I sent an email to sales support explaining my issue on OS X Lion when I was
trialing your product, but did not get a response.

Please feel free to contact me at mqudsi@neosmart.net. What I was seeing
seemed to be that the software was flooding the network card (wi-fi on a slow
WiMax connection) with requests, causing it to time out entirely such that I
could not even ping my router. This doesn't require drivers, kexts, or
anything and can be replicated in user mode. It's pretty much akin to DDoSing
your own connection.

While I ended up purchasing a Mozy home license for a year, I do not think
I'll be staying with them as I am not satisfied with either their client or
their backend.

------
mrbill
I would mirror my rsnapshot backups to Backblaze if they had Linux support.

------
bgentry
What's the RAID config of these boxes? 45 drives, 4 controllers, and a 16TB
volume limitation from ext4. Please share if you know!

_--edit--_

I know it's RAID6 from the article. What I'm wondering is, how many drives in
an array? How many arrays per box?

~~~
brianwski
At the lowest level there are three RAID groups in each pod. Each RAID group
is made of 15 drives configured in software RAID 6 with 2 parity drives. This
means you can lose 2 drives and the data is entirely safe and intact. If 3 or
more drives completely fail simultaneously (not just pop out of the RAID group
or power down, but where that drive is lost forever, like it will never power
up again) you will lose at least some of the data on that RAID group. Layered
on top of the 15 drive RAID group is LVM. The specifics of the LVM config are:
there is one PV (Physical Volume) spanning the 15 drives, then on top of that
is one VG (Volume Group) spanning the same exact 15 drives. Then on top of
that are as many LVs (Logical Volumes) as it takes to keep each logical volume
under the ext4 limit of 16 TB. With 3 TB Hitachi drives, there are 3 separate
LVs on top of the same exact 15 drives. Finally, there is one ext4 file system
sitting on top of each of the LVs (one to one with the LV). Disclaimer: I work
at Backblaze, but datacenter and pods aren't my main area of focus.
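
The "3 separate LV" figure falls out of simple arithmetic, sketched below in
Python. The function names are hypothetical, and TB is treated as a loose unit
(no decimal vs binary distinction), which does not change the result here. The
sketch only counts volumes; how the space is actually split between the LVs is
not stated above and is not modeled.

```python
import math

EXT4_LIMIT_TB = 16.0  # per-filesystem limit quoted in the comment above


def group_usable_tb(drives=15, parity=2, drive_tb=3.0):
    """Usable space of one RAID 6 group: data drives times drive size."""
    return (drives - parity) * drive_tb


def logical_volumes_needed(usable_tb, limit_tb=EXT4_LIMIT_TB):
    """Fewest LVs that keep every LV at or under the filesystem limit."""
    return math.ceil(usable_tb / limit_tb)


if __name__ == "__main__":
    per_group = group_usable_tb()
    print(f"usable per RAID group: {per_group:.0f} TB")
    print(f"LVs per group:         {logical_volumes_needed(per_group)}")
    print(f"usable per pod:        {3 * per_group:.0f} TB before ext4 overhead")
```

With 13 data drives of 3 TB each, a group holds 39 TB, which does not fit in
two 16 TB file systems, hence three.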

~~~
whakojacko
While you are here... Why only a single boot drive? It seems like an obvious
failure point, and at only $40 per drive I would think soft RAID 1 would be a
no-brainer for reliability.

Regardless though, super impressed by the work into rolling your own hardware,
hope you guys continue to do well.

~~~
brianwski
I just asked, and found out we actually HAVE had a number of boot drives fail
in our fleet of 200 pods. Most decisions in the pod are around saving money,
so our initial thoughts were just that no customer data is on the boot drive
so it isn't all that important. But don't get me wrong, there are SO MANY GOOD
opportunities to improve the pod, Backblaze just stops working on the pod when
it does what is needed for us and we run off to focus on other things. Your
call on the boot drive is every bit as valid as ours. :-) I'm staring at an
open pod here and I see plenty of good spots to put a second boot drive, and
we'll probably be going to a 2.5" form factor (laptop) boot drive sometime
soon which would yield even more space.

~~~
thaumaturgy
(Just FYI, I'm sure you guys have already thought of or experimented with
this...)

We've had good luck so far with using small USB flash drives for booting big
file servers. We keep the drive image pretty generic and if there's a problem
with one, we just replace it with a cloned USB flash drive and reboot, no
problem.

It doesn't seem to hurt performance at all for these kinds of uses -- although
we do set it up without swap to keep the life of the USB flash device
reasonable, which might or might not work in your case.

------
jbooth
No ECC?

~~~
brianwski
Yes, the RAM is ECC. Here is a link:
[http://www.crucial.com/store/partspecs.aspx?imodule=CT25672B...](http://www.crucial.com/store/partspecs.aspx?imodule=CT25672BA1339)

The ECC RAM absolutely does find and correct problems (we see them in the
logs). However, just to be absolutely clear, we would not need ECC RAM ->
Backblaze checksums _EVERYTHING_ on an end-to-end basis (mostly we use SHA-1).
This is so important I cannot stress it strongly enough: each and every file
and portion of a file we store has our own checksum on the end, and we use
this all over the place. For example, we pass over the data every week or so
reading it and recalculating the checksums, and if a single bit has been
thrown we heal it up, either from our own copies of the data or by asking the
client to re-transmit that file or part of that file.

At the large amount of data we store, our checksums catch errors at _EVERY_
level - RAM, hard drive, network transmission, everywhere. I suppose consumers
just do not notice when a single bit in one of their JPEG photos has been
flipped -> one pixel gets ever so slightly more red or something. Only one
photo changes out of their collection of thousands. But at our crazy numbers
of files stored we see it (and fix it) daily.
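
A toy version of the "checksum on the end" idea, for the curious. This is only
a sketch: the function names and the 20-byte trailer layout are assumptions
made for illustration, not Backblaze's actual on-disk format, and SHA-1 is
used only because it is the hash mentioned above.

```python
import hashlib

DIGEST_LEN = hashlib.sha1().digest_size  # 20 bytes for SHA-1


def seal(payload: bytes) -> bytes:
    """Return the payload with its SHA-1 digest appended (hypothetical layout)."""
    return payload + hashlib.sha1(payload).digest()


def verify(blob: bytes) -> bool:
    """Recompute the digest over the payload and compare with the trailer."""
    if len(blob) < DIGEST_LEN:
        return False
    payload, stored = blob[:-DIGEST_LEN], blob[-DIGEST_LEN:]
    return hashlib.sha1(payload).digest() == stored


def scrub(blobs: dict) -> list:
    """Periodic pass: return the names of blobs whose checksum no longer matches."""
    return [name for name, blob in blobs.items() if not verify(blob)]


if __name__ == "__main__":
    good = seal(b"pretend this is part of a JPEG")
    bad = bytearray(good)
    bad[3] ^= 0x01  # flip a single bit in the payload
    store = {"chunk-a": good, "chunk-b": bytes(bad)}
    print("needs healing:", scrub(store))
```

Anything `scrub` reports would then be healed from another copy or
re-requested from the client, as described above; that part is not modeled.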

~~~
db48x
Merkle Trees FTW, but it sounds like you've basically reimplemented ZFS.

Have you tried using ZFS on one of these?

~~~
brianwski
Earlier we were totally interested in ZFS, as it would replace RAID & LVM as
well (and ZFS gets great reviews). But (to my understanding) native ZFS is not
available on Linux and we're not really looking to switch to OpenSolaris.

ANOTHER option down this line of thinking is switching to btrfs, but we
haven't played with it yet.

By the way, at Backblaze I've felt like we have had to implement several
things that I would have guessed would be standard "off the shelf". One
example: when a customer wants to download a restore file with a web browser,
are you aware there are no checksums for over-the-network transfers other than
the built-in (completely unacceptable) 16-bit TCP checksum? You are virtually
guaranteed to have an undetected corruption within 30 Gbytes of download,
which is basically what we like to call "a totally average customer restore".
So Backblaze had to write our own custom reliable, restartable "downloader".
It boggles my mind that the whole internet is throwing undetected errors on
HTTP downloads and nobody cares to fix the protocol?!! Where the heck are
Google, Apple, Facebook, Microsoft, or <insert standards body> defining a
standard for web browser downloads larger than a few GBytes?
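
To make the 16-bit checksum complaint concrete: the TCP checksum is a
ones'-complement sum of 16-bit words (RFC 1071), so some corruptions are
invisible to it, for example two words arriving swapped. The Python sketch
below is not from Backblaze's downloader. It only demonstrates that one blind
spot and shows an end-to-end hash catching it, and it says nothing about how
often such errors happen in practice.

```python
import hashlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


if __name__ == "__main__":
    original = bytes.fromhex("1234abcd")
    swapped = bytes.fromhex("abcd1234")  # same words, different order

    same_sum = internet_checksum(original) == internet_checksum(swapped)
    same_sha = hashlib.sha1(original).digest() == hashlib.sha1(swapped).digest()
    print("16-bit checksums match:", same_sum)
    print("SHA-1 digests match:   ", same_sha)
```

Because addition is commutative, any reordering of 16-bit words leaves the sum
unchanged, which is why a restartable downloader would want its own per-chunk
hash on top.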

~~~
zerosanity
We're using FreeBSD's ZFS support and haven't had any hiccups to complain
about.

~~~
thaumaturgy
We've had a backup server running for a client with FreeBSD's ZFS for about a
year now, also with zero hiccups. We first tried to get this kind of stuff
working about three or four years ago, and went the OpenSolaris/ZFS route --
what a pain that was. OpenSolaris somehow corrupted its own boot partition one
day and just completely refused to start with a totally cryptic error code.

The last time we tried Linux for this, it couldn't do it. Maybe that's gotten
better though.

We also tried DragonflyBSD, but it had hardware support issues, and OpenBSD,
but unfortunately OpenBSD just simply cannot do large filesystems. At all.

ZFS-on-FreeBSD is the way to go at the moment, I think.

