Storage Pod 4.0: Direct Wire Drives – Faster, Simpler and Less Expensive (backblaze.com)
170 points by nuriaion on March 19, 2014 | 56 comments



Backblaze is great-- the product is fast and works well. But there's one big issue that prevents me from using it-- they delete your backups of your external hard drives if they aren't plugged in for 30 days.

I have a drive that my MacBook Air may be away from for weeks at a time, and I don't want to continually lose version history or have to re-upload some of the larger files (home videos and my photos).

CrashPlan doesn't delete your backups after 30 days (they keep everything as long as you're subscribed), but the upload speed is so horrendously slow I couldn't upload the entire photo library plus videos (~300 GB or so) in the month I was subscribed. I'm on a gigabit network over Ethernet, so it's not my connection that's the issue (Backblaze, on the other hand, uploads this data very quickly).

I realize that they're trying to reduce costs by deleting data that doesn't seem to "be in use", but I'm sure a lot of people have "archival" data that they don't look at often, but really value.

Instead, I'm currently using Arq 4 (which is fantastic, by the way) with Amazon Glacier for the photos. It may cost something to retrieve them if my drive dies or is physically destroyed, but I'm not planning for that to happen. I have Arq set to use S3 for my other documents and such so I can restore versions whenever I need to, so it's really the best of both worlds.


Agreed, this was one of the issues I had with Backblaze: I'd go on a business trip, come back, and all my external drive backups would be gone, and I'd have to start over again.

I finally ended up just purchasing multiple hard drives and using SuperDuper to clone/backup.

CrashPlan wasn't an option: it would take 60 days to do a 1 TB backup, and I don't leave my laptop connected to the externals that long.

I'd be willing to pay Backblaze more money to have them retain my external hard drive backups for a longer period of time.


I have to agree that this is one of the biggest downsides of Backblaze, as I'm right now dealing with recovering data from an external drive that has failed. According to their support staff, there's no way for them to extend that period, which makes me super nervous about getting the data and verifying it's all good before they delete it (I didn't notice the drive had failed until I got an email from Backblaze telling me they will delete the backup in 15 days). Their 30-day retention policy on external drives defeats the purpose of having the backup.

The second problem with Backblaze is the recovery process: they decrypt the files on their servers and store an unencrypted zip archive for 7 days or until you download it. I would much rather see them create a restore process where the encrypted data is downloaded and then decrypted locally.
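
Roughly what I have in mind, as a sketch in Python (the endpoint, key handling, and AES-GCM framing here are made up for illustration and are not Backblaze's actual restore format):

    # Hypothetical client-side restore: download the still-encrypted blob,
    # then decrypt it locally so the provider never handles plaintext.
    import urllib.request
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

    RESTORE_URL = "https://example.invalid/restore/blob"  # placeholder, not a real endpoint
    key = open("local.key", "rb").read()                  # 32-byte key that never leaves the client

    blob = urllib.request.urlopen(RESTORE_URL).read()
    nonce, ciphertext = blob[:12], blob[12:]              # assumes a 12-byte nonce prefix
    plaintext = AESGCM(key).decrypt(nonce, ciphertext, None)
    open("restored.zip", "wb").write(plaintext)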

I’m evaluating Arq, and so far it looks very good. Upload to S3/Glacier is super fast.


To everybody considering CrashPlan: you need to be careful with them too. During the two years I used them, they twice deleted the backup of an external HDD I don't mount too often, even though I had it set to never delete files. All I got from their support was excuses.

This was over a year ago, not sure if it's still the case, just be sure not to rely solely on them.


With CrashPlan, I use their 'cloud' storage as just one destination, which files get to eventually. Backing up to a few devices I control is cheap and easy - and much faster.


"CrashPlan doesn't delete your backups after 30 days (they keep everything as long as you're subscribed), but the upload speed is so horrendously slow I couldn't upload the entire photo library plus videos (~300 GB or so) in the month I was subscribed."

That problem, at least, seems to have improved. I backed up ~250GB a couple of months back and while I don't recall the exact time - might have been several days - it was certainly well under a month.


I love all the engineering Backblaze puts into this and their willingness to share their experience.

I've noticed Supermicro offers a 45-drive 4U chassis* that costs more than the Storage Pod's raw parts, but less than buying one preassembled by 45Drives. Does anyone have any experience with Supermicro's solution?

* Part: CSE-847E26-RJBOD1 (http://www.supermicro.com/products/chassis/4U/847/SC847E26-R...)


Supermicro makes good systems that are very widely used and we love that companies are continuing to work toward more dense and less costly storage systems. When we started Backblaze in 2007, all the options were astoundingly expensive.

One quick note on this particular Supermicro system - it's slightly apples/oranges as the Backblaze Storage Pod is a complete server and this Supermicro system is a JBOD, meaning it still needs to be plugged into a server to work.

Gleb, Backblaze co-founder


As a note to this: while the above-linked chassis is indeed a JBOD, and one of the parent posts also mentions the 90-disk JBOD chassis, there's an intermediate option which, thanks to some sort of server-geometry tetris magic, is actually a proper machine as well as supporting 72 drives.

http://www.supermicro.com/products/chassis/4U/?chs=417

(Disclaimer/answer to the parent post: we use the 24 drive unit for the GPU compute nodes in one of our clusters, and the 45 drive JBOD units for storage nodes in the same cluster. We have had a very positive experience with both (to the point that I got the 24 drive one as my home fileserver), as well as the Supermicro customer support for such.)


At KeepVault we've been using Supermicro since day 1 (pre-2007) and they've worked out very well. Supermicro is more expensive (on multiple metrics), but you're also getting different features like entry-level cost and power redundancy.

What happens when a Backblaze pod power-supply fails? I'd love to see a post about that. :)

David, KeepVault CEO


> What happens when a Backblaze pod power-supply fails?

From what I understand, the whole pod becomes unavailable. Which is why you would use a front-end system to have redundancy across pods.

The way I would do it myself is to set up network connections between two pods and use DRBD, along with clustering software for the iSCSI or NAS (NFS/Samba) daemons.


In an older blog post, they hint at their architecture: basically storage over HTTPS, with a little effort to make sure pods don't drop dead too easily (RAID6):

http://www.backblaze.com/petabytes-on-a-budget-how-to-build-...

For replicating a similar architecture, I'd probably look at HekaFS/GlusterFS and/or CEPH.


Thanks for the reply! Yeah, this is a JBOD of course. I was thinking of their 36 drive server cases, thought one was 45 and didn't look too closely when I found the link I thought I was looking for. :)

Now that prices have come down so much on the 3rd party stuff, have you evaluated / considered using any of it?


Supermicro is a very popular "white box" server manufacturer. The chassis you linked is much more of a "real" server: hot-swap drives, hot-swap cooling, and a redundant PSU. Also, Supermicro even has a 90 (!) drive 4U chassis, and for what it does, it's quite cheap.


I'm also inspired by how open Backblaze has been, one might even say they're "blazing" a trail. :)

I have a lot of experience with using the Supermicro chassis for storage. The question to ask yourself when deciding between "Supermicro or Backblaze pod" is: "Is storage density important to me?"

The final costs of a loaded Supermicro chassis vs. a loaded Backblaze chassis are pretty close. In fact, the Supermicro may be a better option if storage density is not important to you (e.g. a home/office or an inexpensive city - http://imgur.com/gallery/QfD6qIw). But if your server is in a location where square footage is expensive, storage density is the primary metric you're looking to optimize for, since rent (and electricity) is your primary ongoing cost.

To run a scalable storage solution with the Backblaze pods you need GREAT SOFTWARE. Maybe a really well-configured ZFS pool, custom software, and great sysops? As far as I can tell, that's Backblaze's secret sauce.

For a company that wants to run a server that's going to be "pretty good" for what they need, using a Supermicro chassis with hardware RAID is definitely going to get you by for up to 96 TB.


This looks like a JBOD chassis. Am I missing something?


Side story: A company I worked for built a 60-drive unit similar to this. On some of their early prototypes, they could hit power issues when all the drives spun up at once (someone forgot to do their math), so all the development on it had to be done with 3/4 of the drives it actually supported. Story #2: A 4U enclosure fully loaded with 60 drives goes beyond the shipping weight limit for FedEx and UPS. You need to either freight ship it or not ship it loaded with drives.
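
For a rough sense of the math they forgot, here's the back-of-the-envelope in Python (all figures are illustrative assumptions; I don't remember the actual numbers):

    # Spin-up power check with made-up but plausible figures.
    drives = 60
    spinup_watts_per_drive = 25    # a 3.5" drive can briefly draw ~20-30 W while spinning up
    idle_watts_per_drive = 8       # steady-state draw is much lower
    psu_budget_watts = 1200        # hypothetical power budget left for the drives

    print(drives * spinup_watts_per_drive)  # 1500 W -> blows the budget if all spin up at once
    print(drives * idle_watts_per_drive)    # 480 W  -> fine once they're running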


Did their SATA interface not support staggered spinups? Almost all do.


I believe they did. The issue may have also shown up when all the drives were running at max speed... I honestly forget at this point (it's been 7 years or so). It was just the hardware guys screwing up on power distribution and the calculated power draw from the drives. Luckily they were prototypes.


Reading this, I realized more than ever that while SATA is a good hardware standard, they messed up the naming. Throughout the post, references are made to "SATA 3 throughput", which made me think they meant SATA 3 Gbit/s, the one with 300 MB/s transfers. Further down, it became clear it referred to the third SATA revision, a.k.a. SATA 6.0 Gbit/s.
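
For anyone else keeping the names straight, the revisions line up like this (a Python snippet just to restate the well-known figures):

    # SATA revision -> (marketing name, line rate, usable throughput after 8b/10b encoding)
    SATA_REVISIONS = {
        "SATA 1.0": ("SATA 1.5 Gbit/s", "1.5 Gbit/s", "150 MB/s"),
        "SATA 2.0": ("SATA 3 Gbit/s",   "3 Gbit/s",   "300 MB/s"),
        "SATA 3.0": ("SATA 6 Gbit/s",   "6 Gbit/s",   "600 MB/s"),  # what the post means by "SATA 3"
    }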

At least we don't have to deal with master and slave jumpers anymore.


Could be worse. USB speeds go, in order, Low Speed, Full Speed, High Speed(!), SuperSpeed(??).


Remember "Ultra High Frequency"? What will USB27.0 be called, "AbnormalSpeed"? ;)


IIRC that one's going to be called "Ludicrous speed".


Looking at the parts list, I see the largest cost (aside from the drives) is the two 40-port SATA cards. I wonder why they didn't reduce the number of drives to 40 so they could use just one SATA card and save $688?


Yev from Backblaze here -> Density is key! Our biggest need is to fit as much data as possible on each pod for as little cost as possible. The cost difference between removing 1 SATA card and removing 5 drives is negligible, but the added drives put another 20TB in the pod.


Why not use a smaller card (or the motherboard's controller) for the last 5 drives?


We're constantly tweaking the design, that might pop up in the future!


If density is your biggest issue, why don't you boot from USB and fit a 46th drive in the boot drive's spot? Does that put you over heat or power budgets?


or network/PXE boot


That's an interesting idea. We considered trying to get rid of the extra card by mounting 5 drives either on the motherboard or on another card (and hope to experiment with these paths in the future.)


Rack space is an ongoing cost while the hardware is a fixed cost. Presumably they've decided it's worth having a slightly higher fixed cost per GB in exchange for a reduced ongoing cost per GB. (There's probably other benefits as well, such as reduced workload and just getting the space online now.)

Let's take a look...

They do have an oldish post on hosting costs: http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v...

A rack in 2011 cost them $2,100 per month (it's more now, guaranteed). Their racks look to hold about 11 machines. Each pod is then about $191/mo.

Each rack's worth of storage thus needs 1.375 extra pods at "low density". That's $262.50 a month. Since you now need about 10% more pods, you're paying about $29 more in hosting costs per month per pod.

Your savings are not $688, since you also need to build 1/8 of a pod later. A low-density pod will cost about $2,719, which is $340 split 8 ways. So the real hardware savings are about $348. Hosting costs eat that in about a year, and they are certainly expecting a longer pod life than that. (That's at 2011 hosting costs, which are certainly lower than today's.)

So it's worthwhile to spend a little extra for the additional drives. (Of course, if they could get the extra five in with more modest hardware, as seems quite possible, that would be a sure win. But sometimes ease of development/assembly/spare parts wins over hardware cost.)
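
Spelled out in Python, using the same 2011 figures (the $2,719 pod cost is my estimate from above, and the rounding is approximate):

    # Back-of-the-envelope: 40-drive "low density" pods vs. the 45-drive design.
    card_savings = 688                    # drop the second 40-port SATA card
    low_density_pod = 2719                # estimated cost of a 40-drive pod
    deferred_pod = low_density_pod / 8    # ~$340: the extra 1/8 pod you build later
    real_savings = card_savings - deferred_pod           # ~$348 net up-front saving

    rack_per_month = 2100                 # 2011 hosting cost per rack
    pods_per_rack = 11
    slot_per_month = rack_per_month / pods_per_rack      # ~$191/mo per pod slot
    extra_slots = 45 / 40 - 1             # 0.125 extra rack slots per pod's worth of capacity
    extra_hosting = extra_slots * slot_per_month          # ~$24/mo per pod's worth of capacity

    print(real_savings / extra_hosting)   # ~15 months: hosting eats the saving in roughly a year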


That was my first thought. But the math backs them up. You lose 12% capacity per chassis. But you only save 7% per chassis ($688 off of $9305). So they would end up paying more for extra pods just to get the same storage. This doesn't include the rack space cost.


Nitpick: In addition to the savings of $688 for the SATA card, you also have to include a savings of about $1400 for the five 4TB drives that won't be included. That bumps the savings on the pod to just over 22%.

Still, at least 12% extra rack and floor space to get the same storage capacity would probably cost them more in the long run.


In the article, they say the average cost of their 4TB drives is $160, so the savings from removing the five drives would only be $900, not $1400.


[deleted]


It's worth noting they only put 45 drives in the box, so that extra $700 only gets you a 12.5% storage gain. However, I suspect that splitting the drives between the two cards increases read/write performance and reliability as well.


"When a 5-drive backplane went down, we had 5 drives go offline. Now if a connector goes down, only 1 drive will go offline."

Wouldn't a backplane be more analogous to a SATA card than to a single connector? In which case, 40 drives would go down on failure.

Neat stuff though. Would be fun to build one.


The former model still had SATA cards that the backplanes connected to. I think the biggest win for them is the fact that the backplanes were one of the more error prone pieces in the box.

On a side note, each of the new SATA cards can support up to 40 drives, but there are only 45 drives in the pod, so one failing would presumably take out 22-23 drives instead of 40. Still worse than losing 5 drives though.


They still had 3 PCI cards before, as well as the multipliers, so card failure is not much different. But the port multipliers are all on the same bus, so they're more error prone.


One of the things I immediately noticed was the price/GB over the years. There has been literally no decrease since the flood in 2011, so three years later we are still paying about the same price per GB.

But that is finally going to change. As Google lowers their prices, I see this as a sign that HAMR is finally coming. With both Seagate and WD promising 20TB HDDs by 2020, and a 60TB version after that, we have a clear roadmap of what is coming, which means Google can now price their storage around it.


I'm sure that the move to direct wiring was done purely because the former backplanes were error prone, but the fact that they realize finding the parts is difficult for others, and mention it in the blog post, shows that the guys at Backblaze care about more than just making money.


Yev from Backblaze -> We do! In fact, we make $0 off of the pod design, which is why we open sourced it. We want other folks to tinker, so we try to stay as open as possible about the design and building of the pods!


Question: Have any of your design improvements come from the community as a result of being open? And do you end up getting better pricing from your suppliers because there is a bigger market for this design?


Yes, actually! One of the first things the community came back with after Version 1 was that the hard drives vibrated a bit too much for most use cases. They recommended we use "hard drive vibration dampening sleeves" (rubber bands) to make the drives fit a bit more snugly. It worked like a charm! We don't really get much better pricing, though sometimes we get a small discount because we buy some pods/parts in bulk, though not in extreme quantities.


I really love these technical blog posts (and the designs, of course). Did you ever calculate something like a mean failure rate for pods? Such data might be helpful when evaluating whether the architecture is viable for a certain use case. I'm thinking of both disk loss (which you've blogged about before) and aggregate failure rates (PSU, controller cards, etc.).

I'm thinking that for small-to-medium deployments (with archiving in mind), three pods would be a reasonable minimum, possibly starting them out sparse (a single RAID6 per pod, say)?

I'm aware there aren't any silver bullets :-)


Not the failure rate of the pods themselves, but we do keep track of the failed drives within them. We did publish some blog posts about those stats; you can find them on blog.backblaze.com! The pods as a whole are pretty stable; the moving parts within them... not so much! Which is one of the reasons we went to direct-wired connections!


Always love the practical hardware designs from these guys. I'm also interested in some software stories: Linux kernel/distro used, issues encountered/solved etc.


I love the look of everything Backblaze does, but I don't use them because I need Linux support. I currently use Crashplan because of their Linux support, but I find the service mediocre overall.


This is great; I might actually build one. I love to see a company emerge that doesn't accept the status quo and hacks together a unique solution. This is innovation that I love to see.


Small error in Appendix A: they list 2 Zippy PSUs (they only use one), and the total also only counts it once.


Fixed. Thanks!


This is awesome! I like how they document everything. Tomorrow, I'm interviewing at a different place, which is trying to deal with their storage issues, and I was going to mention Storage Pod 3.0. So this is very timely for me!


Just make sure they know they still need to manage data replication themselves. You have to plan on entire machines going dark, not just drives.
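
As a toy illustration of the point (an entirely made-up placement scheme in Python, nothing Backblaze-specific): spread the copies across pods, not just across drives, so one whole machine going dark doesn't take every copy with it.

    import hashlib

    PODS = ["pod-01", "pod-02", "pod-03", "pod-04", "pod-05"]
    REPLICAS = 3

    def place(blob_id: str) -> list[str]:
        """Pick REPLICAS distinct pods for a blob, deterministically from its id."""
        start = int(hashlib.sha256(blob_id.encode()).hexdigest(), 16) % len(PODS)
        return [PODS[(start + i) % len(PODS)] for i in range(REPLICAS)]

    print(place("home-videos-2014.tar"))  # e.g. ['pod-04', 'pod-05', 'pod-01']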


I'm so happy they give these plans away for free. I have been trying to get involved with Open Compute for some time and just get nowhere - with these plans, anyone can get a head start.

Amazing and I hope they continue for a long time!


Not sure I'm okay with the single power supply. I guess if you can do redundant storage across pods it'd be okay. But if that single PSU goes under, you're in for a fun couple of hours.


This is better than their previous pod though, which had two PSUs but no redundancy, i.e., if a single PSU failed, the entire machine would go down as it wouldn't have enough power to run all of the drives.

Obviously, for them it is cheaper to live without PSU redundancy: losing disks and pods is a normal day when you've got so many of them, and they designed for this scenario.


What an amazing degree of disclosure. Tim Nufire is an engineer's engineer. He gets me thinking about how I might be able to justify the cost of one. :-)



