
Sun engineer responds to the Backblaze "Petabytes on a budget" design - bensummers
http://www.c0t0d0s0.org/archives/5899-Some-perspective-to-this-DIY-storage-server-mentioned-at-Storagemojo.html
======
sophacles
Ok, I must be missing something. Everyone is knocking this Backblaze thing
because it doesn't do ZFS, or because it is not a "super duper high end SAN
replacement", or because the components are not rated for enterprise-level
work. It seems to me that a company whose product is personal and small
business backups does not need any of those things. They have software to do
mirroring. Why would they need any fast network filesystems if all data comes
and goes via the internet? Will the hard drives see that much data I/O, or
will they mostly just fill up and sit idle with the very occasional read for
restore? (I strongly suspect the latter.)[1]

Of course, on the other hand, there are the people touting it as the be-all
and end-all, a solution to kill NetApp.

I guess what I'm wondering is: how did so many people get the idea that this
article about a specific solution to a specific problem was actually some sort
of general-purpose solution attacking all the big-name vendors? What am I
missing?

[1] A huge chunk of this article is about the hard drives and PSUs not being
up to an enterprise load, but I just don't see it. I bet a lot of these boxes
run idle a large chunk of the time. I have 10-year-old desktop hard drives
running just fine in a file server, because it has a similar load: mostly
idle, most of the time.

~~~
rbanffy
"how did so many people get the idea that this article about a specific
solution to a specific problem was actually some sort of general purpose
solution attacking all the big name people?"

Backblaze did. They did it when they decided to put this graph in the
article:

[http://blog.backblaze.com/wp-content/uploads/2009/08/cost-of...](http://blog.backblaze.com/wp-content/uploads/2009/08/cost-of-a-petabyte-chart.jpg)

It's an apples-to-oranges comparison.

~~~
russss
Well no, it's not an apples-to-oranges comparison, because no hardware vendor
offers a solution comparable to the one Backblaze have devised.

That graph is simply comparing their solution to the closest commercial
equivalents. It just so happens that these are all way off, because every
hardware manufacturer wants to design their hardware for the highest
throughput.

(That said, our experience with the Sun X4500s hasn't been great from an I/O
point of view.)

~~~
Periodic
It is an apples-to-oranges comparison because Backblaze is only including the
cost of components in that graph for their solution, while including all of
the research, development, assembly, and support costs in the other vendors'
solutions.

What are the labor costs for testing all their hardware (with 10 SATA
controllers, no less)? What are the labor costs for assembling all those
systems? What is the labor cost of developing the system architecture? What
is the labor cost of creating their storage application, which reduces their
need for in-box redundancy? And if you want to talk about Amazon, there is
data center rental, power, cooling, and ongoing administration.

In short, they're comparing the costs of their components to the costs of a
ready-to-go solution from other vendors. Sounds like apples and oranges to me.

~~~
easp
They did invite these criticisms by drawing comparisons to S3, EMC, etc.
However, they already tried to factor out the cost of operations so that S3
was on an equal footing with the non-service offerings. We can quibble about
the specific numbers they used, but I think the bigger quibble is that by
their own estimates, those costs are substantial, seemingly far outstripping
their hardware costs by something like an order of magnitude. Even so, at the
scale of a petabyte or more, the savings are substantial enough to be worth
addressing.

As for the missing costs that you cite, think this through. What do you really
think the per-unit costs for assembly and testing are? Even if each unit
required a couple of man-days, the costs would probably only add ~10% or so
per unit. The other costs you cite, for the initial hardware and software
engineering, are fixed costs, the same whether they are storing 1 PB or 1,000.
Also, if they did their job right, much of the per-unit testing cost should be
mitigated by the overall systems design. The system management automation can
do a test cycle on new nodes before promoting them to production use, and of
course, failures should be dealt with automatically.
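
A quick back-of-envelope check of that ~10% figure (the labor rate and the
per-pod hardware cost below are my own guesses, not numbers from the
Backblaze post):

    # Hypothetical numbers: neither the labor rate nor the pod cost comes
    # from Backblaze's article; they are placeholders for the estimate above.
    hardware_cost_per_pod = 8000.0    # assumed USD of components per pod
    labor_rate = 50.0                 # assumed loaded USD per hour
    assembly_and_test_hours = 16      # "a couple of man-days"

    labor_cost = labor_rate * assembly_and_test_hours   # 800.0
    overhead = labor_cost / hardware_cost_per_pod        # 0.1
    print(f"per-unit labor adds about {overhead:.0%} to the hardware cost")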

------
ShabbyDoo
I used to work for a large consumer photo site, and we wished we could trade
speed for lower cost or lower power consumption. Forget SCSI drives; SATA
disks were 10x faster than what we needed for long-tail storage. If a photo
became popular, we could cache it forward. The Pareto-ization of the photos
was probably 99.9/0.1. I'm actually surprised that drive manufacturers haven't
come out with a slower disk that is cheaper, lower power, and/or more
reliable. We could find a zillion places to hide latency from our users.

~~~
wmf
Seagate Barracuda LP and WD RE-GP are ~6000 RPM disks that are lower power
than normal 7200 RPM SATA. Maybe next year they'll come out with ~5000 RPM
disks.

~~~
ShabbyDoo
Cool. I wonder what sort of density could be achieved with slow solid state
storage. The line between disk and tape continues to blur, I guess.

------
jacquesm
That's a pretty weird analysis, comparing a special-purpose system with a
general-purpose one.

He's concentrating on disk-to-board throughput, but since this is
network-attached storage on a LAN using the motherboard's single ethernet
port, the amount of data read from or written to those drives per second will
never exceed 1 Gbit/second anyway.
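
To put rough numbers on that (using the ~120 MB/s per-drive figure quoted
elsewhere in the thread and the 45 drives per pod from the Backblaze design):

    # Gigabit ethernet caps out at 1 Gbit/s, i.e. 125 MB/s before overhead,
    # which is roughly what a single SATA drive can stream sequentially.
    gige_MBps = 1000 / 8          # 125 MB/s (decimal units)
    per_drive_MBps = 120          # per-drive figure quoted in the post
    drives_per_pod = 45
    aggregate_MBps = per_drive_MBps * drives_per_pod   # 5400 MB/s
    print(f"network cap: {gige_MBps:.0f} MB/s vs raw disk bandwidth: {aggregate_MBps} MB/s")
    # The single ethernet port, not the SATA chain, is the bottleneck.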

I agree with the reliability analysis, though; Backblaze seems to be on fairly
thin ice there.

Still, it seems to work for Backblaze. I'm taking it as read that they've
lived through a bunch of drive failures by now, and they seem to still be in
business.

~~~
anigbrowl
I think his 'DC-3' analogy is valid - BB has a decent system if you don't mind
the risks; Sun is selling peace of mind.

~~~
Hoff
FUD and RAS being two of the basic tenets of marketing most anything in the
enterprise IT space, after all.

~~~
rbanffy
There is a lot of good engineering in the Thumper. There is a lot of clever
engineering in the Backblaze pod. The comparison makes a lot of valid points
on power distribution, drive reliability and RAID throughput.

I expect Backblaze will learn a lot from building their storage systems, and
future versions may be much better, but this one cuts too many corners for me
to feel confident it will have the kind of reliability I would recommend.

What amazes me most is the use of desktop-grade hard drives. Not because they
are sloppily built, but because their performance requirements are so
different from those of a server environment.

This makes for some nice reading:
[http://cacm.acm.org/magazines/2009/6/28493-hard-disk-drives-...](http://cacm.acm.org/magazines/2009/6/28493-hard-disk-drives-the-good-the-bad-and-the-ugly/fulltext)

------
tlrobinson
"It's the same with this storage, this hw needs the parachute in form of the
software in front of the device."

Yes, _software_ on (relatively) unreliable hardware is exactly how Google has
been able to scale so immensely.

------
smanek
I think everyone agrees the Backblaze pod (bb) has no performance - and that's
fine.

But a lot of his complaints are about reliability, and that's kind of a moot
point, since Backblaze (and anyone sane) stores every piece of data on at
least two (and ideally 3 or more) separate bbs. I would argue that if you
don't need performance (i.e., no huge database) but you do need a lot of
space, you're better off just setting up two 'mirrored' bbs (so that when one
goes down, the other takes over) than with almost any 'enterprise' solution.

He does have a point about ZFS though - I'm sure at that kind of scale
eventually the RAID 5/6 write-hole is going to bite you in the ass.

~~~
abyssknight
I read that as "every piece of data on at least two (and ideally 3 or more)
separate bbs", meaning BBS as in a bulletin board system. Wow, that caught me
off guard.

------
skolor
Fascinating read, and it's nice to see he brought up a lot of the concerns
that were mentioned here. A few things that I noticed:

He mentioned that one disk does 120 MB/s, so 5 of them is 600 MB/s. He then
says that converts to 6 GB/s. Now, I'm not all that familiar with speeds, but
I know 1000 MB is 1 GB, so shouldn't 600 MB/s be 0.6 GB/s?

Other than that, he raised a number of valid points. The Backblaze system is
NOT general purpose. It's designed to take a bunch of data (backups) and hold
on to it for as long as possible. In that situation you don't need high
throughput, or even all that reliable hardware. Once the drives are full,
their usage will drop, and as long as the RAID keeps the data secure and
costs stay down, they're fine to keep replacing hardware.

One thing of note, and something I never saw in the Backblaze article: their
service promises to hold your data for 50 years. Assuming that one of their
pods is filled, what is the estimated cost to keep that data over the course
of 50 years? I would assume they calculated that out, and that the data
pointed to going with the consumer-grade parts, but I would really like to
see the numbers.

~~~
jacquesm
You got confused about the GB/s because he uses a capital "B" at a point where
it should have been a lowercase "b" for bits, not bytes. So 600 MB/s is
600*8 = 4800 Mbps.
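
Spelled out, using the numbers from the post:

    # 5 drives at 120 MB/s each, converted from megabytes to megabits per second
    drives = 5
    per_drive_MB_per_s = 120
    total_MB_per_s = drives * per_drive_MB_per_s    # 600 MB/s
    total_Mbit_per_s = total_MB_per_s * 8           # 4800 Mbit/s
    print(total_MB_per_s / 1000, "GB/s =", total_Mbit_per_s / 1000, "Gbit/s")
    # prints: 0.6 GB/s = 4.8 Gbit/s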

I hope that clears it up.

Please note that this is the guy's private blog; he definitely does not speak
for Sun there, and labeling the article "Sun engineer responds" makes it seem
like the response is an official one, but it very clearly isn't. The text is
full of other mistakes as well.

------
psadauskas
He also leaves out that you could buy 10 of these Backblaze pods for the price
of one Sun X4540. So with a distributed load, that's 10x the performance, 10x
the reliability, etc. I'm not an expert on storage, but I'd be curious to see
how that stacks up, never mind the comet-proof redundancy you get by placing
your storage in datacenters across the world.

~~~
yangyang
10x the rack space, 10x the cooling, 10x the power ...

~~~
abyssknight
People often forget that the rack space, the actual physical space, is very
costly. Depending on the datacenter, the power might run you just as much,
too. The last company I worked at was essentially forced out for using too
much power per U.

~~~
sho
Would you really buy a hackish low-spec solution like this and then place it
in a premium rack?

Given the idea (buy 10 and place them at geographically diverse points), you
may as well put them in the back room at branch offices or something. Who
cares if one goes offline for a while? Hell, put them in countries with good
consumer internet connectivity (Japan, Korea, Sweden, etc.) and just use that.

If you're going ghetto, go all the way!

------
barryrandall
_At first don't compare the mentioned list prices with the street prices for
components._

This is why small companies, startups, and people new to the storage world
hate working with storage vendors. I understand the economics, I get why they
sell storage the way they do, and I like my 70% discount off list price, but
it makes for a horrible user experience.

If I were a cleverer person, I'd apply an everyday-low-price model to
"enterprise" storage.

~~~
lsc
I went with the reseller I did at SVTIX primarily because they were upfront
enough to list prices on their website: <http://egihosting.com> - I don't
want to spend time fucking around over price, and I'm sure you don't want me
to waste your time. Post the price, and I'll buy it or I won't.

Seriously, if you say you want six figures for your product, I'm walking. I
don't have that kind of scratch, and I'm not going to waste your time or my
time lowballing you. If you really mean $30K, well, I'd have to scrimp and
save for a few months, but it's doable if the product really does solve more
than $30K worth of problems for me.

------
rythie
Even though Sun are at the cheaper end of the SAN market, I struggle to see
why adding a network interface to a hard drive still ends up costing 10x the
price of the drive.

------
anarcticpuffin
I agree that what the article misses is that Backblaze is spending the money
on software rather than hardware, since the price difference is so huge (and
the software solution is a fixed cost).

What I'm more interested in is whether the lower MTBF of the cheap drives and
home-brewed chassis ends up with a higher cost per year due to higher failure
rates. If a desktop drive costs $100 and fails three times a year, but a
server drive costs $200 and fails once every 2 years, the initial cost savings
are moot.
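
Running those hypothetical numbers (they are just the figures from the
paragraph above, not measured failure rates):

    # Annualized replacement cost under the made-up failure rates above
    desktop_cost, desktop_failures_per_year = 100, 3.0   # $100 drive, fails 3x/year
    server_cost, server_failures_per_year = 200, 0.5     # $200 drive, fails once every 2 years

    desktop_yearly = desktop_cost * desktop_failures_per_year   # $300/year
    server_yearly = server_cost * server_failures_per_year      # $100/year
    print(f"desktop: ${desktop_yearly:.0f}/yr, server: ${server_yearly:.0f}/yr")
    # Under these assumptions the cheap drive costs more per year, before
    # even counting the labor of swapping it out.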

~~~
sho
Well, remember that if you were seriously on a budget you'd just return all
the failed drives for warranty replacement (Seagate's is 5 years). The
hardware costs would never add up to more than the enterprise option's; it's
that the labour costs of dealing with it would be higher, and, worse, you'd be
more likely to lose data. Enough more likely to justify the additional cost?
Hm, I doubt it, not for this use case.

------
brisance
How is MTBF derived? He claims "1.2 million hours normalized on 40 degrees";
that's almost 140 years. It seems to me that it's just some extrapolated test
statistic being bandied about with no relevance to real-world conditions.
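
For reference, the arithmetic behind the 140-year figure, plus the annualized
failure rate that an MTBF like that actually implies under the usual
exponential model (the model is a standard assumption, not something from the
blog post):

    import math

    mtbf_hours = 1_200_000
    hours_per_year = 24 * 365                 # 8760
    print(mtbf_hours / hours_per_year)        # ~137 years

    # MTBF is a fleet statistic, not a lifetime: under an exponential
    # failure model it translates to an annualized failure rate of
    afr = 1 - math.exp(-hours_per_year / mtbf_hours)
    print(f"AFR ~ {afr:.2%}")                 # roughly 0.73% of drives per year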

------
thegoleffect
Seems like he's just making it easier for Backblaze to match feature to
feature by listing out the differences O_o.

~~~
docmach
I'm pretty sure Sun isn't worried about Backblaze as competition in the
enterprise storage market. Building cheap storage boxes is much different than
building what Sun provides.

~~~
thegoleffect
That's usually how disruptive technologies start out, though; the fact that
Sun doesn't care makes Backblaze that much more dangerous.

------
c00p3r
ZFS with OpenSolaris is a good tip. 2009.06 is much less of a mess (never mind
that we-will-rewrite-everything-in-Java svc:/* horror).

~~~
kdw
ZFS performance in 2009.06 has some _very_ serious problems, particularly with
zvols.

~~~
jrg
Do you have references and pointers? Is this going to hurt me when I put two
non-storage servers live running 2009.06?

~~~
kdw
You can find a number of them on the OpenSolaris forums if you search on terms
like "0906 iscsi", "comstar 0906", etc. Here's an example:
[http://opensolaris.org/jive/thread.jspa?threadID=104593&...](http://opensolaris.org/jive/thread.jspa?threadID=104593&tstart=105)

Our experience was that 0906 was utterly unusable if you're using COMSTAR,
with performance being mildly degraded for straight ZFS usage.

