
Backblaze Storage Pod 4.5 – Tweaking a Proven Design - slyall
https://www.backblaze.com/blog/storage-pod-4-5-tweaking-a-proven-design/?hn=1
======
ricardobeat
I really enjoy these posts, and would be a happy Backblaze customer if they
didn't have this (unpublicized) policy of deleting external hard-drive backups
after 30 days:

    
    
        Backblaze works best if you leave the external hard drive attached to
        your computer all the time. [...] If the drive is detached for more
        than 30 days, Backblaze interprets this as data that has been
        permanently deleted and securely deletes the copy from the
        Backblaze datacenter.
    

This probably reduces the total amount of storage used, making the $5/month
economically viable. I'd happily pay more if it weren't for this though,
currently paying ~$12/month for Bitcasa.

~~~
atYevP
Yev from Backblaze here -> we do have that on our website
([https://www.backblaze.com/cloud-backup.html](https://www.backblaze.com/cloud-backup.html)
and
[https://www.backblaze.com/remote-backup-everything.html](https://www.backblaze.com/remote-backup-everything.html)) and
in some help articles but I hear you. It's a topic that comes up, but when
we've done the calculations in the past, keeping a copy forever as an archive
raises our costs to the point where we'd have to raise our prices and we're
not too keen on that! We do send notification emails to let you know ahead of
time before data is removed, though, so it's rarely become an issue in terms
of "accidental deletions".

~~~
ricardobeat
Hi Yev. It is indeed on the website, but only in the second link you
mentioned. I only found out after signing up for the service last year, and
then decided to cancel. I wanted to back up photo collections that I might only
access once a year - not very keen on having to do a maintenance scanning
session once a month. Maybe you could come up with a separate plan, with
limited storage, that solves this use? In any case, thanks for taking the time
to reply.

~~~
atYevP
Yea, there are lots of ways to address it, and we talk about it every now and
again, though we haven't come up with an elegant solution yet! On the first
link it states that deleted files will be removed after 30 days. A removed
hard drive acts the same as a deleted file, since it's no longer on your computer.
I'll chat with our team and see if we can make it clearer.

------
grandinj
Aaaargh. Yet another idiotic website that disables zoom gestures, making it
unreadable on my iPhone.

------
k1t

      Secure Connection Failed
    
      An error occurred during a connection to www.backblaze.com.
      Cannot communicate securely with peer: no common encryption algorithm(s).
      (Error code: ssl_error_no_cypher_overlap)
    
    

Oh dear, please upgrade your SSL certs.

~~~
duskwuff
This has nothing to do with the SSL certificate. The site is configured to use
RC4, which _used_ to be the recommended cipher to avoid certain attacks
(particularly BEAST). However, RC4 has other weaknesses, and this is no longer
recommended. Current versions of Firefox will actually refuse to negotiate an
RC4 connection.

~~~
sp332
Doesn't that mean that the certificate is using the outdated RC4 cipher, and
should be updated?

~~~
skuhn
No, the SSL cert does not use RC4. RC4 is a block cipher which is used to
encrypt data transferred between your browser and the server, akin to AES.

Backblaze does need to step it up in terms of their SSL configuration [1],
particularly if this is indicative of their configuration for actual file
transfer (no idea though). If I had to guess, they're using their OS's OpenSSL
and thus trapped on 0.9.8 or 1.0.0 (both of which do not support TLS v1.2) and
maybe haven't spent much time tuning the config lately.

A lot of places are afraid of diverging from their OS vendor's version of
OpenSSL, but I personally think it is a mistake to get stuck in time with such
a crucial component. Likewise if your service is built around nginx -- are you
going to stick with whatever RHEL 5 shipped for 8 years?

[1]
[https://www.ssllabs.com/ssltest/analyze.html?d=www.backblaze...](https://www.ssllabs.com/ssltest/analyze.html?d=www.backblaze.com)

~~~
WatchDog
RC4 is a stream cipher, not a block cipher.

------
Scarbutt
So they don't offer a linux client to keep people away from running it on
servers? ;)

------
fraXis
Duplicate:
[https://news.ycombinator.com/item?id=9154273](https://news.ycombinator.com/item?id=9154273)

~~~
slyall
I submitted and got that one. The problem is it had already vanished from the
"new" page, so it was never going to make the front page.

Resubmitted with "Backblaze" in the title to get better traction. I got lucky
this time; sometimes I submit something and get nowhere while another
submission of the same URL is the one that takes off.

Maybe a tweak to the new page so that if somebody resubmits a URL that has
fallen off the new page it gets pushed back on there. The current system is a
bit of a lottery, just one chance for a URL to make it and then it's gone.

------
omarforgotpwd
Why design your own storage pods when services like Amazon S3 offer similar
prices, even with bandwidth?

~~~
skuhn
It's not trivial, but it is fairly easy to beat S3 on storage costs alone, let
alone the other factors.

I personally wouldn't use Backblaze's storage pod, so my numbers are based on
equivalent components (and don't reflect what's possible if you negotiate
prices and optimize for your particular needs -- Backblaze's costs are surely
lower than this):

    
    
      11 Supermicro 36 drive 4U server : $4000 each
      1 48-port top-of-rack switch : $5000 each
      374 4TB drive : $400 each
      22 500GB drive : $200 each
      2 CDU : $3000 each
      Rack + integration : $10000 each
    

Total rack cost is $220,000 for 1.4PB raw, or 700TB with basic 2-copy
redundancy. Power draw is around 8kW, so 1 year of power is around $1100.
drive failure rate of 5%, memory 2%, PSU 3% and you might spend another
$18,000 to stock spares. So besides your core datacenter costs, this solution
will cost $239,100 to operate for a year. These costs only improve when you
amortize the hardware over a 3 year period, which is standard.

Now how much does S3 cost to store 700TB for a year? $248,530.94.
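For what it's worth, the back-of-the-napkin arithmetic above can be reproduced in a few lines (all prices are the figures quoted in this comment, not current list prices, and the S3 number is taken as given rather than derived from a price sheet):

```python
# Rough first-year cost model for one rack, using the figures quoted above.
components = {
    "Supermicro 36-drive 4U server": (11, 4000),
    "48-port top-of-rack switch":    (1, 5000),
    "4TB data drive":                (374, 400),
    "500GB boot drive":              (22, 200),
    "CDU":                           (2, 3000),
    "Rack + integration":            (1, 10000),
}

hardware = sum(qty * price for qty, price in components.values())

raw_tb = 374 * 4          # 1496 TB raw, ~1.4 PB
usable_tb = raw_tb // 2   # 2-copy redundancy -> 748 TB, i.e. "~700TB"

power_per_year = 1100     # figure quoted above for ~8 kW draw
spares = 18000            # drives ~5%, memory ~2%, PSUs ~3%

first_year = hardware + power_per_year + spares

print(f"hardware:  ${hardware:,}")    # ~$219,000 (rounded to $220k above)
print(f"usable:    {usable_tb} TB")
print(f"year one:  ${first_year:,}")  # vs. the $248,530.94 quoted for S3
```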

Finally, consider bandwidth costs. I'll crib from a recent comment of mine
[1]:

    
    
      1 gbit/s in a datacenter: $30,000 / year
      500 mbit/s in S3: $142,540 / year
    

And that in a nutshell is why AWS doesn't make financial sense at any
significant scale. S3 makes sense if you aren't really going to use all of
that space: you can scale your spending up as your needs scale up. That isn't
the kind of problem Backblaze or other storage businesses have, there isn't
much chance of that storage going unused for any extended period of time. And
if they architect their datacenter and service correctly, the down period
between rolling in hardware and getting it utilized sufficiently to begin
seeing a return on the investment should be minimal.

[1]
[https://news.ycombinator.com/item?id=9139244](https://news.ycombinator.com/item?id=9139244)

~~~
toomuchtodo
Thanks for doing the math. I've repeatedly shown people (I do Infrastructure,
sometimes AWS sometimes physical) that AWS is _awesome_ for proof of concept.
All opex, no commitments. As soon as you start getting to a certain scale
though, time to move onto physical equipment when you've got predictable load
patterns.

If Stack Overflow can still run colo'd hardware, so can you.

~~~
skuhn
I completely agree. I sometimes take flak for "hating" AWS (and I _do_ hate
some aspects of it), but I think it's a great platform for incubating your
service or proving a concept. Every company I've been at in the last five
years has used AWS to an extent, despite operating their own datacenters. It's
great for random analytics jobs or other tasks that aren't steady work tied to
your core business.

It quickly starts to not make sense for services that have usage-driven growth
patterns. AWS's billing model just doesn't work for this -- even though higher
usage is discounted, it isn't enough to overcome the skew of the model.
Companies that have deployed on AWS often realize this too late: their
business starts to take off, and they get crushed under giant AWS bills that
would be disproportionate even at 50% off list. It's much harder to move off
once you've hit scale.

~~~
acdha
A lot of this also comes down to ops & the features you really need – e.g.
your example above didn't include the costs of the geographic redundancy or
strong data integrity features which are standard with S3. A lot of places
simply don't have geographically redundant datacenters and either lack the
support team to run something like a replicated cluster filesystem in heavy
production usage or, more commonly, don't have enough usage to really push it
into a clear win.

In the case above, the cost differential at 700TB is roughly the cost of a
single sysadmin and so the running operation would probably be a wash versus
S3 unless it had heavy data churn to maximize the S3 expenses or a good way to
amortize the staffing needed for 24-hour-a-day support across other projects.
Many places have budget processes which favor spending more on known upfront
costs than getting permission to add full-time employees in the hopes of being
able to beat those costs down the road.

I'm hoping that something like OpenStack Swift matures to the point where some
of the complexity at the software + ops level starts making the DIY option
routine. You'll never get rid of the minimum sysadmin commitment but that
becomes a lot easier if it can be 5% of the night-shift because the software
doesn't require a lot of care and feeding.

~~~
skuhn
Yeah, my solution is a gross oversimplification -- my math is just back-of-
the-napkin stuff, you can actually do much better on component pricing (I
certainly don't pay $5000 for a 1G top-of-rack switch) and there are other
variables to consider.

To quickly address some of your points:

Geographic redundancy: not included. From a hardware perspective it's a pretty
easy upgrade (build two datacenters in CA / VA, buy two racks, put one in
each). From a service perspective, it's a bit more challenging. If storage is
your business, these are the core components you should own though.

The US Standard region in S3 does sort of provide geographic redundancy, but
it's a question of whether it's good enough for your use case. S3 is a
complete black box, and you have no way of knowing with certainty that every
data blob is actually in both geographic regions. Besides that, it's
eventually consistent, so if you need absolute assurances that data is not
lost after a write succeeds, you have to implement logic on top of S3 -- and
that logic must run in EC2 instances, or you'll pay a fortune in bandwidth to
execute it.
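That "logic on top of S3" for read-after-write assurance mostly boils down to polling until the object becomes visible. A minimal sketch of the polling half (the actual probe -- e.g. a boto3 `head_object` call on the key you just PUT -- is supplied by the caller; `FlakyStore` below is just a stand-in for an eventually consistent store):

```python
import time

def wait_until_visible(check, timeout=30.0, interval=0.5):
    """Poll `check()` until it returns True or `timeout` seconds elapse.

    `check` is any callable performing the read-after-write probe.
    Returns True if the object became visible in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False

# Toy usage: a store that only becomes consistent after a few reads.
class FlakyStore:
    def __init__(self, visible_after):
        self.reads = 0
        self.visible_after = visible_after

    def head(self):
        self.reads += 1
        return self.reads >= self.visible_after

store = FlakyStore(visible_after=3)
assert wait_until_visible(store.head, timeout=5.0, interval=0.01)
```

Running this verification from outside EC2 is exactly where the bandwidth bill comes in: every probe is a billable request crossing AWS's network edge.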

No other region in S3 provides geographic redundancy (or eventual consistency;
the two are related), so if you want to deploy storage in Europe or APAC, you
have to handle the geographic distribution yourself -- and it will cost
double to store that data.

The small cost difference: 1 rack over 1 year has a pretty small differential
between the two. But you can optimize the datacenter numbers, and the S3
numbers are what they are unless you get massive. For example, over 3 years
the datacenter cost goes to around $275,000 whereas S3 is $745,000. That's a
serious difference now. Ownership is a powerful advantage.
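Spelled out, that three-year comparison is just the one-time hardware cost amortizing against S3's flat yearly rate (figures are the rounded numbers from this thread):

```python
# Three-year totals using the rough figures from the thread.
rack_hw = 220_000              # one-time rack build, amortized over 3 years
yearly_opex = 1_100 + 18_000   # power + spares stock, recurring
s3_yearly = 248_530.94         # S3 price for ~700TB, as quoted earlier

dc_3yr = rack_hw + 3 * yearly_opex
s3_3yr = 3 * s3_yearly

print(f"datacenter, 3 yr: ${dc_3yr:,.0f}")  # ~$277,300 ("around $275,000")
print(f"S3, 3 yr:         ${s3_3yr:,.0f}")  # ~$745,593 ("$745,000")
```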

OpenStack Swift: I don't think it will ever be a good solution, based on the
project's history. It's unfortunate, but hopefully something else will come
along that does blob storage in a sane and scalable way. This is not rocket
science, but there are some devils in those details. If a mature open source
solution came along with immutable objects, erasure coding, background
scrubbing and sane ops tools, it would blow Swift out of the water.

Finally, I think it's a (common) mistake to think that you can get by without
ops personnel if you deploy on AWS. The actual site / dc ops component is
pretty minor if you build right. Either way, someone still has to run the
service, do capacity / project / budget plans, and other things. And with AWS
you need people who are capable of winning arguments with their support --
where you will burn substantial time proving where fault lies.

S3 does a lot for you, but it's still a service that you build yours on top of
and it's far from a perfect solution that never breaks. Buckets have to be
primed before heavy usage. Your file naming scheme will impact your
performance at scale. And so on. You need people to look at this, whether you
call them ops engineers or sysadmins or developers.
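The file-naming point refers to S3's (historical) guidance that sequential key names concentrate load on a single index partition, and that a short hash-derived prefix spreads keys across the keyspace. A sketch of that trick (`spread_key` and its 4-character prefix length are illustrative choices, not an S3 API):

```python
import hashlib

def spread_key(key: str, prefix_len: int = 4) -> str:
    """Prepend a short hash-derived prefix to an object key.

    Sequential names like photos/2015/03/img_0001.jpg all share one
    prefix and can hotspot a single index partition; a hashed prefix
    distributes them. The mapping is deterministic, so the original
    key is recoverable by stripping the first path component.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()[:prefix_len]
    return f"{digest}/{key}"

print(spread_key("photos/2015/03/img_0001.jpg"))
print(spread_key("photos/2015/03/img_0002.jpg"))
```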

~~~
viraptor
I'm curious - since you went through trying out Swift, when was that? Pre- or
post-Diablo? That was the time of a pretty big change.

Also, have you tried ceph at the same scale?

~~~
skuhn
Post-Diablo, and it never got to a point where I deployed it at scale and
found issues: it flunked out before that.

Haven't tried Ceph, although I look at it every few years. It sounds promising
but it also sounds really complicated to set up and administer.

