
135 Terabytes for $12,000 - gourneau
http://bioteam.net/2011/08/why-you-should-never-build-a-backblaze-pod/
======
dmk23
The real lesson here is that if your service relies on "Big Data" processing
it pays off to build your own storage.

If you were to buy 135TB on Amazon EC2/EBS the same capacity would cost you
~$19,000 _per month_, not counting charges for I/O and bandwidth.
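
A quick back-of-the-envelope sketch (the per-GB rate here is just what the
$19,000 figure implies for 135TB, not a quoted price; I/O and bandwidth
charges ignored):

    # Implied EBS rate vs. the one-time pod cost (figures from the article and above).
    capacity_gb = 135 * 1000        # 135 TB in GB
    ebs_monthly = 19_000            # estimated monthly EBS storage bill
    pod_build_cost = 12_000         # one-time hardware cost from the article

    implied_rate = ebs_monthly / capacity_gb          # ~$0.14 per GB-month
    payback_months = pod_build_cost / ebs_monthly     # ~0.6 months
    print(f"~${implied_rate:.2f}/GB-month; the pod pays for itself in {payback_months:.1f} months")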

~~~
beagle3
Well, space, air conditioning, electricity, and redundancy comparable to what
Amazon offers would add a few thousand per month (I'd guess $2,000-$5,000,
depending on where you are).

Still, way cheaper than Amazon, but there are nontrivial running costs beyond
the purchase and build price.

~~~
moe
You should be able to rent half a rack with sufficient power to house this for
around $1000/mo in just about any civilized datacenter.

~~~
lsc
I'd love to sell you a half rack and a 20a circuit for a grand a month. That's
high even here in the sf bay area.

------
driverdan
I've been following the online storage space for years. Every time something
like this comes out I redo my back-of-the-napkin math on potential
profitability, get excited, then find excuses to not move into the space.

Prices have gotten so low that there's no excuse for lack of quality online
storage providers. There are many, but there is only a handful I would
consider good and their pricing isn't following the market. I see this leaving
a huge opening for competition.

If anyone else is seriously interested in online storage contact me (see
profile).

~~~
gourneau
I think there would be a big market to provide $100,000ish per Petabyte
collocation, when the hardware gets cheap enough for that.

~~~
lsc
Am I missing a zero? That's about $0.10 per gigabyte, no? Which is dang close
to what Amazon charges. Unless you mean one hundred grand up front rather than
one hundred grand a month.

~~~
gourneau
I mean an upfront cost, and then maybe some monthly hosting fee. It might be a
while before hardware gets that cheap.

~~~
lsc
hm. I think the problem is mostly the profit margin. Cheap 3TB drives are what,
$130 each? And you need 333 or so of those to get to 1,000TB raw. If you go
with, say, 12-disk raidz2 sets, that'd be around 400 disks. Of course, these
are 'seagate petabytes' I'm counting, but that's only $52,000 for the disks;
then add, say, 10 of the Supermicro chassis.
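
As a sketch (ballpark prices as above; 12-wide raidz2 vdevs and 45-bay chassis
assumed):

    import math

    # One "vendor" petabyte of usable space from 3TB drives in 12-disk raidz2 vdevs.
    target_tb = 1000
    drive_tb, drive_cost = 3, 130
    vdev_width = 12                                    # 10 data + 2 parity per vdev
    usable_per_vdev_tb = (vdev_width - 2) * drive_tb   # 30 TB per vdev

    vdevs = math.ceil(target_tb / usable_per_vdev_tb)  # 34 vdevs
    disks = vdevs * vdev_width                         # 408 disks, i.e. the "around 400" above
    chassis = math.ceil(disks / 45)                    # ten 45-bay chassis
    print(f"{disks} disks = ${disks * drive_cost:,}, spread over {chassis} chassis")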

I think you could do it, I mean, at scale where labor costs are mostly
amortized out, if you were willing to accept really low margins, or if someone
would design and test it for free.

But that's the thing, someone like me? even a cut-rate dedicated server/VPS
provider is used to charging around 1/4th to 1/6th the cost of the hardware
/every month/ - obviously, this results in margins that are pretty nice by the
standards of companies that sell goods with significant marginal costs, at
least if you have enough scale that you don't blow it all on labor.

Really, the "big fee up front" model is interesting and warrants a discussion
all its own.

------
johngalt
There may be a reason to build 20 backblaze pods, but I doubt there is a
reason to build 1. The whole point of a backblaze build is to move the
redundancy out of the individual box.

~~~
asto
If you're handling large amounts of data on your server(s) and want a single
backup just in case? I think that's what they wanted.

~~~
johngalt
By itself this device is poorly suited to backup, especially for their use
case. A huge pile of desktop Hitachi disks isn't something you can stick in a
rack and forget about, and people like to forget about backups.

Best case: you have someone who's constantly checking/maintaining this box and
replacing the drives as they fail.

Worst case: an expectation of a backup, but every time you refer to it, it
doesn't work.

Most likely case: a mix of both of the above, where the box takes a lot of
attention and occasionally works as intended.

The only redeeming factor is that they proclaim all the reasons why this is a
poor choice. So it seems someone is at least thinking about it.

------
blhack
Okay, so maybe I'm missing the point here, but, uh...

No shit?

Is there anybody who is seriously tossing 135TB at a _single_ JBOD?

I'm not a storage guy, not even close, and even I know that that is a
frighteningly ridiculous, get-you-fired idea.

These are meant as individual parts in a larger, highly redundant setup.
That's _the entire point_.

Is the next post going to be: "Why you won't be able to download music at
1Gbps even with a GIGABIT SWITCH! Cisco lies!"?

~~~
tommi
Read the post. In particular "Why the folks at Backblaze don’t care about the
“risks”" section.

~~~
blhack
I know, my point is that anybody with enough knowledge about this topic to be
interested in this article would already know.

------
mrb
"The backblaze 2.0 pod has _exceeded expectations_ when it comes to data
movement and throughput. We get near wire-speed performance across a single
Gigabit Ethernet link." - <http://bioteam.net/2011/08/backblaze-performance/>

Oh my. These guys' expectations were very, very wrong. Many single-drive
configurations can sustain 120MB/s+ at the beginning of the platter. A single
drive! It should be no surprise that a 45-drive beast can saturate a GbE link!
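
For context (the ~6% protocol overhead is a rough assumption):

    # Gigabit Ethernet ceiling vs. one drive's sequential throughput.
    gbe_raw_mb_s = 1_000_000_000 / 8 / 1e6     # 125 MB/s on the wire
    gbe_practical_mb_s = gbe_raw_mb_s * 0.94   # minus rough TCP/IP + framing overhead
    single_drive_mb_s = 120                    # outer-track sequential, as above

    print(f"GbE tops out around {gbe_practical_mb_s:.0f} MB/s; "
          f"one drive already does ~{single_drive_mb_s} MB/s")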

~~~
icefox
It is too bad they didn't bond the two ethernet links. Reading through I
didn't see a mention of what the other ethernet link was used for. That should
have given them a jump in speed.

~~~
mmetzger
We're considering building one of these, but I used to work for a clustered
storage vendor. The important thing to notice here is that this is single-client
performance; aggregate performance is often considerably better. I.e., you
might be able to push 1GB/s with 10 clients, but only 100MB/s with one...

------
rbanffy
Well... You can make all 45 drives into a zpool with raid-z and boot FreeBSD
or *Solaris from it.

BTW, I can't see the vibration sleeves around the disks in the pictures.
Vibration must be a problem when you pack this much rotating media into such a
limited space.

~~~
cbs
Backblaze's blog talks about the vibration problem and their solution.

~~~
moe
Just a datapoint, yes, vibration is a real issue:
<http://www.youtube.com/watch?v=tDacjrSCeq4>

------
tomkinstinch
I am mildly surprised they can dissipate enough heat from the pods--those
drives are packed so close together.

I can't tell from the photos: Is there any airflow between them?

(For that matter, do higher temperatures decrease MTTF for hard drives?)

~~~
drv
A Google study about hard drive failure indicated that they couldn't find any
conclusive relationship between temperature and failure rate. However, they
did mention it had some effect at the extreme high end of the temperature
spectrum (based on the graph, above 45 °C or so).

[http://static.googleusercontent.com/external_content/untrust...](http://static.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en/us/papers/disk_failures.pdf)

~~~
barrkel
It doesn't take a huge amount of work to drive a disk over 45 degrees. Using
5K drives helps a lot, since many 7K drives will idle at 35 degrees in a
room-temperature environment with only passive airflow.

I've seen _actively_ cooled 7K drives in my ZFS NAS start giving checksum
errors while resilvering (i.e. rebuilding parity onto the replacement disk)
after replacing a bad drive; smartmontools reported temperatures of about 60,
IIRC. Using an LSI 8-port SATA controller, I was resilvering a 4-disk raidz at
a rate of perhaps 300MB/sec, and it was making the drives too hot. I had to
fall back to the motherboard SATA connectors (and ~150MB/sec array throughput)
to keep things cool enough to complete.

~~~
derobert
If your actively cooled drives are getting that hot, then something is very
wrong with either your ambient room temperature (is it 40°C?) or your active
cooling.

Active cooling in a reasonable-temperature room keeps even 15K drives well
under 40°C at full duty (closer to 30°C, really, according to the monitoring
data I just looked at; the room is around 20°C). Keeping drives within 15°C of
ambient shouldn't be a huge deal (though it can be noisy).

------
gourneau
What do you guys think about using this with Swift from OpenStack? Could I buy
100 of these pods, drop OpenStack on them, and get close to 13,500 terabytes
with some level of redundancy?
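
Rough math, assuming Swift's default 3x replication:

    # Raw vs. usable capacity for 100 pods under 3x replication.
    pods, tb_per_pod, replicas = 100, 135, 3

    raw_tb = pods * tb_per_pod            # 13,500 TB raw
    usable_tb = raw_tb / replicas         # ~4,500 TB of unique data
    print(f"{raw_tb:,} TB raw, ~{usable_tb:,.0f} TB usable at {replicas}x replication")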

~~~
notmyname
Some of the hardware choices that Backblaze has made in their pod design are
interesting. Most of the concerns I have (like the difficulty of getting to
drives to replace them) can be addressed by operational practices (update the
swift ring rather than immediately replace the drive). Other concerns (like
2gbe for 135TB) are more nuanced. Lack of redundancy in the Backblaze pods is
addressed by swift itself--swift will ensure that no two copies are on the
same pod.

I would love to see someone run swift on some Backblaze pods. If you'd like to
talk further, my contact info is in my profile (or drop by #openstack on
freenode).

------
ww520
What would the electric bill and cooling cost be? Power is quite expensive in
a data center.

~~~
mrb
Very low. I estimate the power consumption at the wall to be ~400W. That's
$29/month at $0.10/kWh. Host this in a datacenter with a non-remarkable PUE of
1.5, and that's $43.5/month including cooling.

* 45 x Hitachi HDS5C3030ALA630 drives, 4.8W each (average when idle), a tiny bit more when seeking so let's round to 5W. Source: [http://www.hitachigst.com/tech/techlib.nsf/techdocs/02D91977...](http://www.hitachigst.com/tech/techlib.nsf/techdocs/02D9197756A273D0862577D50024EC1D/$file/DS5K3000_ds.pdf) and my measurements that HDDs seeking only consume a tiny bit more current than when idle <http://blog.zorinaq.com/?e=50>

* 100 Watt for the mobo, CPU (73W TDP), SATA controllers, and all the fans. Source: my own clamp-meter measurements on dozens of PCs <http://blog.zorinaq.com/?e=42>

* 85% average efficiency of the Zippy PSM-5760 power supply. Source: its 80PLUS certification report [http://www.plugloadsolutions.com/psu_reports/SP105_EMACS_HU2...](http://www.plugloadsolutions.com/psu_reports/SP105_EMACS_HU2-5760V_760W_Report.pdf)

* Hence: (5.0*45+100)/.85 = 382 Watt
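
The same arithmetic as a quick sketch (the monthly figure above rounds 382W up
to 400W):

    # Wall power and monthly cost for one pod, using the numbers above.
    drives, watts_per_drive = 45, 5.0
    platform_watts = 100          # motherboard, CPU, SATA controllers, fans
    psu_efficiency = 0.85
    pue = 1.5                     # datacenter overhead (cooling etc.)
    usd_per_kwh = 0.10

    wall_watts = (drives * watts_per_drive + platform_watts) / psu_efficiency  # ~382 W
    kwh_per_month = wall_watts * 24 * 30 / 1000
    print(f"{wall_watts:.0f} W at the wall, "
          f"~${kwh_per_month * usd_per_kwh * pue:.2f}/month including cooling")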

------
iamelgringo
Wow. The cost of 1 Petabyte of spinny disk storage is $100k.

A friend of mine who raised money in 1999 said that to get their MVP off the
ground, it took 20 engineers, and $7M in venture funding. I doubt $100k would
have paid for the Oracle licenses back then.

------
wmf
Given that an empty pod costs $5,395, Supermicro might actually be cheaper
(although a little less dense) and it's a real product instead of a prototype.

~~~
lsc
I'm looking at

[http://www.supermicro.com/products/chassis/4U/847/SC847E26-R...](http://www.supermicro.com/products/chassis/4U/847/SC847E26-R1400U.cfm)
(36 drives, and room for a motherboard)

or

[http://www.supermicro.com/products/chassis/4U/847/SC847E16-R...](http://www.supermicro.com/products/chassis/4U/847/SC847E16-RJBOD1.cfm)
(45 drives, DAS. I have had serious outages caused by DAS back in the SCSI-3
VHDCI days, so I'm likely to start with the 36-bay system.)

for my mass storage project. They are both well under $1,500 with power
supplies, backplanes, and expanders taken care of.

For me, the big thing is that with my storage model, I'm going to be replacing
disks as they fail and rebuilding the RAID, so having easily accessible and
easily swapped disks is worth paying a premium. (I am planning to have some
cross-chassis redundancy by using zfs snapshots, but I'd rather just keep the
nodes going as is.)

Also, rack density? doesn't save you that much money. Most of what you are
paying for in a data center is power. At the cheapest co-lo I'm in, here is
the cost breakdown:

* 1 full cabinet (44U) with two twenty-amp 120V circuits: $875

* 1 full cabinet (44U) with one twenty-amp 120V circuit: $530

So, if I can double my density, I save $185 a month; and even at the disk
density for my compute nodes (close to 100 disks in a rack) I get one disk
failure maybe every two months per rack; so if I have to slide out the whole
goddam computer, causing some chance of the power getting disconnected and
downtime? yeah, with my model? it's probably worth paying the premium.
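
Spelled out, the density math looks like this:

    # Saving from doubling density: one dual-circuit cabinet vs. two single-circuit cabinets.
    cab_two_circuits = 875    # $/month, 44U, 2x 20A 120V
    cab_one_circuit = 530     # $/month, 44U, 1x 20A 120V

    spread_out = 2 * cab_one_circuit    # same gear across two cabinets
    dense = cab_two_circuits            # same gear packed into one
    print(f"density saves ${spread_out - dense}/month")   # $185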

I'm just saying: if you are small enough that five grand for a single
Backblaze pod even registers, the cheaper Supermicro chassis is probably the
better deal.

Sure, at scale, it's best to design your systems so that you can get zero
downtime even when hardware fails. But that's really difficult to do without
introducing new failure modes; even Amazon has trouble with it. My strategy is
to accept that hardware failures mean a truck roll and downtime for customers
on the hardware in question. As long as you don't have any one server go down
more than once a year (and a particular server failing once a year is pretty
pathetic), and as long as you don't have a system where any one server brings
down everyone, you are going to see pretty good reliability with this
strategy.

~~~
e1ven
Who would you buy through? Googling for the 45 drive version gives me slightly
higher pricing.

Also, the big advantage of the Backblaze version is that people have done it
before, and it's mapped out. With the SM case, there's less community... but
it does look like a good solution.

It looks like the drive bays are pre-wired, so you wouldn't need to worry
about that? What else would you need? Motherboard/CPU/RAM, RAID cards, HDs,
and sleeves?

~~~
lsc
Like most Supermicro chassis, it comes with the drive caddies, backplane, and
power supplies. All you need is the motherboard, CPU, RAM, and a SAS card,
well, and the drives. It's even got an expander built into the backplane. If
you want h/w RAID, you need to bring that as well. (I plan on using raidz2;
most of the RAID cards that cost less per port than the drives are not better
than software RAID.)

I buy most of my supermicro stuff through kingstarusa.com - I know the site
looks a little shady, and you have to email for quotes for almost everything,
but they are good people. My office is actually above their warehouse; I'm
unit C. Their price is usually a few dollars less than provantage, which is
usually the next best retailer for supermicro chassis, and they don't do shady
tax dodge bullshit, and I don't have to pay shipping. I could be
misremembering on the exact price on the 45 bay.

Most of the 'mapping out' that exists for the Backblaze version has already
been done and tested on the Supermicro.

~~~
wmf
_most of the raid cards that cost less per port than the drives are not better
than software raid_

Let's not get carried away; isn't a "gold standard" LSI controller only
~$1,000 ($27/drive)? But you certainly can buy a lot of Intel cores for that
price.

~~~
lsc
yeah, I am exaggerating on the port cost some, but the big advantage of
hardware raid over software isn't the hardware calculation of parity. A CPU
can calculate parity so much faster than you can write to a drive that it
doesn't matter at all that special hardware can do so even faster still.

The advantage of the hardware raid card is the battery backed cache. If it
doesn't have a BBU and a fair amount of cache, as far as I am concerned, you
might as well be using MD.

Hardware RAID cards have improved quite a lot recently; some of the stuff now
has reasonably sized caches, so perhaps I should revisit my assumptions in
this area. Of course, I'm planning on using ZFS on my storage servers, so even
if hardware raid cards are now a reasonably good deal, they won't do me a
whole lot of good.

~~~
wmf
As much as I like ZFS, I do feel a little misled by the rhetoric about
replacing "expensive" BBUs with slog SSDs... that are actually much more
expensive.

~~~
lsc
Until quite recently, I would not have understood what you meant. The cost per
gigabyte for even really fast SSDs is lower than the cost per gigabyte of RAID
cache ram, so I'd have said "what are you on about?"

But, I think I understand what you are on about now.

Most of us (well, speaking for myself, but I think this is true of most
SysAdmins) have very strong experience telling us "more read cache is better"
- I mean, more read cache, up until you can cache everything the server
commonly reads, makes an absolutely huge difference in performance.

So we look for big caches.

The problem is that most of us don't have the same intuitive grasp of where
the benefits stop coming when adding more space to the write cache, as most of
us don't have a whole lot of experience with large write-cache systems
(outside of netapp/emc type boxes, and I personally attribute their superior
performance in part to their gigabytes of ram that can be safely used as
write-back cache.)

So if write cache works the same way as read cache? yes I will pay the premium
for the fastest 32GiB SSD I can find, if I can use it all as write cache.

The thing is, I'm told, that after a few gigabytes, the returns to adding more
write cache fall off sharply; and if that's true, then yeah, you are right,
'cause you are wasting most of the SSD.

I mean, the real question here is "how much write-cache do I need before I
stop seeing significant benefit to adding more write-cache?" and if that
number is much above what you can get in a RAID card, then the zfs/ssd setup
starts looking pretty good.
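
For what it's worth, the usual rule of thumb makes this concrete: the slog
only has to absorb the synchronous writes that accumulate between
transaction-group commits. A sketch with illustrative numbers, not
measurements:

    # Rough ZFS slog sizing: a couple of transaction groups' worth of peak
    # synchronous writes. Throughput and interval below are illustrative assumptions.
    peak_sync_write_mb_s = 200    # assumed peak synchronous write rate
    txg_interval_s = 10           # assumed transaction-group commit interval
    safety_factor = 2             # keep roughly two txgs' worth of headroom

    slog_gb = peak_sync_write_mb_s * txg_interval_s * safety_factor / 1024
    print(f"~{slog_gb:.1f} GB of slog is plenty")   # ~3.9 GB, far below a 32 GiB SSD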

------
serverascode
I like the Backblaze pod just as a point of comparison, since it's about the
cheapest possible storage, really. (Sure, you could probably go cheaper.)

I think they've done something that makes sense for them. It doesn't make
sense for me to use it for storage where I work, but it works for them in this
instance. And it seems to work for Backblaze.

------
mckoss
The application here is write once, read rarely. Would it be possible to
power down drives once they are full (until needed)?

------
charlesap
Grown defects. Graaah.

------
rorrr
Seems expensive.

2TB drives are now around $70 (Maybe even cheaper if you buy in bulk).

135TB = 68 drives.

68 * $70 = $4,760

Is that rack so expensive?

~~~
wazoox
The box has 45 slots, so they must use 3TB drives. They may not be completely
crazy, and may use professional rather than desktop drives. A 3TB pro drive
costs about $200.

~~~
duskwuff
If you're sufficiently crazy to trust your data to a Backblaze pod, you're
probably going to go full retard and use desktop disks anyway. If you cared
about your data you probably wouldn't be building one of these things in the
first place. :)

Also: depends what you're considering "pro drives", but enterprise SATA 3TB
disks are in the $250 - $300 range -- about 2x the price of a similar
consumer-grade disk.

~~~
imaginator
Has anyone done any research to find a difference between "desktop" drives and
"enterprise/pro" drives? I'm inclined to believe it's marketing speak,
targeted at the same people that buy electron-aligned speaker cable.

~~~
wazoox
Actually the difference is only in firmware; each drive is tested at the end
of the factory line, and the test results decide whether it'll be sold as
desktop or enterprise. So physically the only difference is the label.

Different brands make the separation more or less clear: WD desktop drive
firmware is explicitly crippled so as to be almost unusable in RAID arrays (on
the web and in forums you'll find countless horror stories of lost arrays);
Seagate and Hitachi desktop drives work about OK in RAID arrays, but you may
have surprises at times.

So what's the difference? First, a desktop drive assumes it is alone. In case
of a read or write error, it will retry accessing the data for several long
minutes (a long timeout). Pro drives assume they are in a RAID array; in case
of an error they fail almost immediately, so as not to block any outstanding
I/Os to the array.

Another difference is vibration compensation. Desktop drives don't use their
motion sensors to compensate for drive-induced chassis vibration, which under
heavy I/O will significantly reduce throughput by increasing the error rate.
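
As a practical aside, the long-timeout behavior described above is the drive's
SCT ERC setting, which smartctl can sometimes inspect or shorten. A hedged
sketch (illustrative only; many desktop drives refuse the command or forget it
after a power cycle):

    # Read and (optionally) shorten a drive's error-recovery timeouts via smartctl.
    # Values are in tenths of a second; 70 = 7.0s is a common RAID-friendly setting.
    import subprocess

    def read_erc(device):
        """Show current SCT ERC read/write timeouts, if the drive supports them."""
        return subprocess.run(["smartctl", "-l", "scterc", device],
                              capture_output=True, text=True).stdout

    def set_erc(device, read_ds=70, write_ds=70):
        """Attempt to set read/write recovery timeouts (in deciseconds)."""
        subprocess.run(["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device],
                       check=True)

    if __name__ == "__main__":
        print(read_erc("/dev/sda"))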

~~~
bradfa
If you (or someone you know) have a write-up detailing these statements and
tests showing them to be true, I'd be very interested in reading more about
this. Please do share.

It sounds like what you're saying makes sense, from a drive business and
manufacturing perspective.

~~~
wazoox
> _If you (or someone you know) have a write-up detailing these statements and
> tests showing them to be true, I'd be very interested in reading more about
> this. Please do share._

This is NDA information :) I'm repeating what I've heard from drive makers and
my experience after setting up several thousand RAID systems. I haven't run
tests recently, but vibration can kill an array's performance. I've seen a
chassis where the central drive slot (among 24) wasn't usable because it
vibrated more than the others :)

