Facebook Opens Up Hardware World With Magic Hinge

notatoad · on Dec 31, 2012

The overuse of "open source" is really starting to annoy me. It's not an 'open source specification', it's an open specification. Facebook didn't open-source a data centre; they published a couple of white papers. Open source actually means something relatively important: unless you are publishing your source code under a permissive license, it isn't open source.

gizmo686 · on Dec 31, 2012

Open source can have as restrictive a license as you want; the point is that the source can be reviewed and improved by anyone, which leads to a higher quality product. Free software requires a permissive license, as the point is to protect the user's freedoms. Unfortunately, in practice both types of open source software look the same, so the labels get confused.

notatoad · on Dec 31, 2012

>the point is that the source can be reviewed and improved by anyone.

Exactly. And therefore it needs a license permissive enough to allow that.

gizmo686 · on Jan 1, 2013

The license could include provisions forbidding the distribution of the source and object code, and provisions saying that the original owner maintains all rights to their code and your code, and the source code may be used for the sole purpose of contributing improvements back to the original owners. In practice, no one uses that type of license, hence the confusion between free and open source.

frabcus · on Dec 31, 2012

gizmo686 - strange definition of "open source", sounds like one Microsoft would use. The definition I've been using for over a decade is the one from the Open Source Initiative http://opensource.org/osd

gizmo686 · on Jan 1, 2013

I have a feeling we just walked into a political minefield. Anyway, my understanding of the words is based largely on GNU [1][2]. The main point brought up in [1] and [2] is that "free software" is about protecting user freedoms, while "open source" software is about quality. In all cases, free software is open source. However, as I explained in my response to notatoad, open source software is not necessarily free, although it almost always is.

[1]http://www.gnu.org/philosophy/open-source-misses-the-point.h... [2]http://www.gnu.org/philosophy/free-sw.html

jpitz · on Dec 31, 2012

Actually, the OSI has approved two of Microsoft's licenses.

http://opensource.org/node/207

sp332 · on Dec 31, 2012

The license must allow modifications and derived works, and must allow them to be distributed under the same terms as the license of the original software.

So the terms of the original software might restrict creation or distribution of binaries, or anything else.

bcoates · on Dec 31, 2012

I thought the state of the art for large storage arrays like this was still install-and-abandon, where once the units entered production they were never touched, because physical maintenance had a tendency to cause more failures than it fixed. Dead drives get left in place until the whole thing is either too obsolete or has too many failures to justify its floorspace, then the whole thing gets decommissioned at once.

On a system like that, what would be the point of a fancy hot-swap rack? Do modern large storage arrays do constant maintenance of failed or failing drives? Or does Facebook use hotswapping for something else?

EwanToo · on Dec 31, 2012

The only storage array I know of that comes as a sealed unit where you don't do drive swaps is the Xiostorage ISE [1]. It's a very fast array for the money, but it's not particularly high-end.

All the high end arrays, both in terms of performance and availability (IBM DS8000, EMC Symmetrix / V-Max, etc) have done drive swaps for as long as I've used them.

Without drive swaps, you either have to abandon traditional RAID concepts, which might not be a bad thing, or ship with enough standby disks to cope with 5 years of failures.

1 - http://xiostorage.com/products/hyper-ise/
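To put rough numbers on "enough standby disks to cope with 5 years of failures", here is a back-of-the-envelope sketch. It assumes drives fail independently at a constant annual rate and ignores spares that fail after being pressed into service; the 5% annual failure rate and the 32-drive array are placeholder inputs, not measured figures, and `spares_suffice_probability` is a name made up for this example.

```python
from math import comb

def spares_suffice_probability(active, spares, annual_failure_rate, years):
    """Probability that no more than `spares` of the `active` drives fail
    within `years`, under an independent, constant-rate failure model."""
    # Chance that one given drive fails at some point during the period.
    p_fail = 1 - (1 - annual_failure_rate) ** years
    # Binomial CDF: sum the probability of exactly k failures for k = 0..spares.
    return sum(
        comb(active, k) * p_fail**k * (1 - p_fail) ** (active - k)
        for k in range(spares + 1)
    )

if __name__ == "__main__":
    # Assumed inputs: 32 data drives, 5% annual failure rate, 5-year service life.
    for spares in (0, 2, 4, 8, 12):
        p = spares_suffice_probability(32, spares, 0.05, 5)
        print(f"{spares:2d} spares: {p:.3f} chance of lasting 5 years sealed")
```

Plugging in your own failure rate shows how quickly the spare count has to grow before a sealed unit becomes a safe bet.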

regularfry · on Dec 31, 2012

The place I heard first not doing drive swaps was Google. I also heard that they didn't use RAID, but rather relied on data just being replicated over 3 separate machines. Both of these may have been apocryphal. I'm sure someone around here can correct me if so :-)

justincormack · on Dec 31, 2012

I first came across them in IBM's ice cube project around 2002 http://www.eetimes.com/electronics-news/4165609/IBM-stacks-h... which was later spun out into a startup and then canned. I think this was before the Google fail-in-place stuff. Because it was a 3D mesh you actually couldn't get to the hardware to replace it. They had blocks of 12 drives to offset some of the drive-to-CPU cost issues. There are a bunch of papers on it if you dig around, and it's still an interesting model, I think.

EwanToo · on Dec 31, 2012

Google (and others like AWS) don't use RAID, that's absolutely right - but they also don't use arrays from the regular manufacturers like EMC, NetApp, etc.

Whether they eventually get around to swapping out failed drives or not, I don't know. I assume they do, since a CPU and memory cost a lot more than a drive replacement, so you want to keep them up and running.

TillE · on Dec 31, 2012

I imagine they'd swap out the whole blade as a quick fix, but then have it diagnosed, repaired, and put back into service.

Makes more sense than trying to fiddle around with a defective server while it's still in the rack.

bcoates · on Dec 31, 2012

Right, that's the kind of system I was thinking of: Not off the shelf enterprise RAID enclosures, but big piles of commodity drives run by companies that do storage as their business, like Google and Backblaze and Facebook.

lsc · on Dec 31, 2012

>where once the units entered into production they were never touched because physical maintenance had a tendency to cause more failures than it fixed.

Well, to make this happen, you can't use conventional RAID. (Well, you could use conventional RAID and just set, say, five spares, but once you are out of spares, the array would be binned.)

The thing is, it's very rare to take out an array when swapping a drive. (I've... actually done it quite recently, but that was a combination of me making an extraordinarily unwise choice (Oh, I can get by with a used chassis/backplane, even though it's a brand I don't normally use, and a design I'm probably not qualified to re-qualify, no problem!) and me being an idiot when it came time to force-reassemble the RAID. I still don't have good docs on this, but one of my people now understands very well the problem of what data is on what drive, and what order to list drives in during the force. I just need to get them to document it. Or, really, I should sit down with them, really understand it, simulate a similar failure, and then document it myself. For now we've instituted a policy that two people need to sign off before using force with any mdadm command.)

I mean, if you are using conventional RAID, you need to keep spares. Heck, you could run a 36 bay RAID, and just designate, say, 4 spares, and 90% of the time, you'd want to replace the whole chassis before you ran out of spares. But now we're paying what, another 10% for disks? (The disks completely dominate the cost of cheap storage arrays; really nice 36 bay chassis[1] with room for a motherboard cost well under $1500. For a few bones more, you can get a similar chassis with 45 disks and no slot for a motherboard... And, of course, you can go way cheaper than that if you are willing to resort to disks that can't be swapped.) But my point is that the cost of the chassis, even those nice Supermicro chassis where all disks can be swapped without de-racking the thing, is already way below the cost of the disks you are throwing in it, so buying more disks in exchange for getting a cheaper chassis is likely a false economy.
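The arithmetic behind that paragraph can be sketched in a few lines. The 36 bays, 4 spares and $1500 chassis ceiling come from the comment; the $150 per-disk price is a made-up placeholder, and both function names are hypothetical.

```python
def spare_overhead(bays, spares):
    """Extra disk spend on spares, relative to the disks that hold data."""
    return spares / (bays - spares)

def chassis_share(chassis_cost, bays, disk_cost):
    """Fraction of the total build cost that goes to the chassis."""
    total = chassis_cost + bays * disk_cost
    return chassis_cost / total

if __name__ == "__main__":
    # 4 spares in a 36 bay chassis: 4 / 32 extra disks.
    print(f"spare overhead: {spare_overhead(36, 4):.1%}")
    # Assumed $150 per disk; swap in whatever you actually pay.
    print(f"chassis share of build: {chassis_share(1500, 36, 150):.1%}")
```

Measured against the 32 data disks the overhead is 12.5%, a little above the "another 10%" ballpark, and the chassis share shrinks further as the per-disk price goes up.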

I mean, you do have a point when it comes to labour... it is a big deal to get someone to swap a drive (and if you don't have a spare, the swap needs to happen right quick)

and yeah, swapping drives isn't without danger.

Now, the economics of this change if you have some Ceph-like system where you can fail an arbitrary number of drives in one chassis and still have the good drives function. But those systems are all relatively new, and have quite a lot of complexity overhead. I mean, in theory, if you have an array half full of good disks, with such a system you could migrate that data to a new array completely full of good disks, then remove and refurb the array that is half bad, but you want to talk about complexity and chances to screw it up? Yeah.

Also note, most drives I buy come with a warranty, meaning a bad disk is a token that is good for one completely free disk. 'Enterprise' disks don't fall in price nearly as fast as consumer grade disks (yeah, go find me a 500GB 'enterprise' 3.5" 7200rpm disk for under $70 that isn't used or refurbished. Yeah, that's what I thought.) And often, if you warranty a really obsolete drive, you get back one that is fairly newish. I've warrantied a bunch of WD RE3 drives and gotten back RE4 kit (difference is that RE4 has larger cache and fewer platters, meaning fewer/lighter r/w heads, meaning better seeks and, of course, fewer platters means less spinning weight and thus less power consumption.)

Of course, that's not a factor if you buy your drives without warranty. I don't know any way to extract even the shipping cost out of bad drives that have no warranty. (If you do, lemme know; I'm giving 'em away right now. Hell, half of 'em could be resold by unscrupulous folks; I discard drives once they start showing uncorrectable read errors, even if there is enough space to remap the bad sector, while some people don't replace the drive until they run out of space to remap bad sectors, e.g. you will then start seeing /consistent/ bad sectors across badblocks runs.)

[1]http://www.supermicro.com/products/chassis/4U/847/SC847A-R14...

regularfry · on Dec 31, 2012

In my experience the most common way to kill an array (barring dodgy hardware RAID) is yanking the wrong drive. Yes, it happens.

lsc · on Dec 31, 2012

Sure, but if you don't screw up the force rebuild, the data is still there.

I mean, if I've got a RAID5 with one bad drive, and I pull one of the good drives, the thing is going to immediately hang hard. You can boot the thing into single user mode and force-reassemble it, and you get the data back (well, mostly; you would certainly get all the data loss you'd get when yanking the power).
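A toy illustration of why that works, using plain XOR parity over one stripe. This is a simplified model, not how md lays data out (real RAID5 rotates parity across members and tracks metadata), and `xor_blocks` is just a helper written for this example.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: three data blocks plus their parity block, one per drive.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]

# One member missing (the bad drive): XOR of the survivors rebuilds it.
survivors = stripe[:1] + stripe[2:]
assert xor_blocks(survivors) == stripe[1]

# Two members missing (bad drive plus a wrongly pulled good one): the survivors
# only give the XOR of the two missing blocks, so the array cannot serve data.
two_left = stripe[2:]
assert xor_blocks(two_left) == xor_blocks(stripe[:2])

# Nothing on the pulled drive was overwritten, so once it is back in the set
# the stripe is one member short again and the bad drive's block is recoverable.
assert xor_blocks([stripe[0]] + two_left) == stripe[1]
```

Only writes that were in flight when the array hung are at risk, which is the same exposure as yanking the power.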

That said, all my arrays are lit all the time, meaning that even without the red 'bad drive' lights (which I haven't gotten working yet with md) if you just avoid pulling drives with active activity lights, you are good.

Another trick I've tried is labeling the face of the hot swap caddy with the last 4 digits of the serial number of the hard drive. When I put in the ticket to pull the drive, I mention the last 4 of the serial.

(but that hasn't really gotten off the ground, mostly due to a lack of level 1 lackeys.)
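The last-4-of-the-serial check is easy to script on the ticketing side. A minimal sketch, assuming you already have a slot-to-serial inventory from somewhere; the serials below are invented and `slot_for_label` is a hypothetical helper, not part of any existing tool.

```python
def slot_for_label(serial_by_slot, last4):
    """Return the single slot whose drive serial ends with `last4`.

    Raises LookupError when zero or several drives match, since either case
    means the label is not enough to pick a drive safely.
    """
    matches = [
        slot
        for slot, serial in serial_by_slot.items()
        if serial.upper().endswith(last4.upper())
    ]
    if len(matches) != 1:
        raise LookupError(
            f"{len(matches)} drives match label {last4!r}; do not pull anything"
        )
    return matches[0]

# Hypothetical inventory: slot number -> full drive serial.
inventory = {
    0: "WD-EXAMPLE0001A7F3",
    1: "WD-EXAMPLE0002B9C1",
    2: "WD-EXAMPLE0003A7F3",
}

if __name__ == "__main__":
    print(slot_for_label(inventory, "B9C1"))
```

Refusing to answer on an ambiguous label matters here: with enough drives, two serials sharing the same last four characters is not far-fetched.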

Right now I do most of the hard drive swaps myself, so it isn't a huge deal, but it is something I devote quite a lot of thought to; if I could use remote hands, I'd be ahead of the game, but most datacenter remote hands folks... well, let's just say that they seem to see 'foolproof' as a challenge.

jcase · on Dec 31, 2012

Short demo video http://www.datacenterknowledge.com/archives/2012/06/27/video...

neonkiwi · on Dec 31, 2012

Single-page link: http://www.wired.com/wiredenterprise/2012/05/facebook_storag...

ck2 · on Dec 31, 2012

I hope those hard drives aren't spinning when you move them on the hinge or the heads will bounce off the platters.

jacquesm · on Dec 31, 2012

That was exactly what I thought. bump. $6K down the drain. Presumably the whole thing powers down before you can do the slide out for a swap.

Firehed · on Dec 31, 2012

Never used a laptop? Hard drives seem to be pretty good at parking heads when a shock is detected these days.

nwh · on Dec 31, 2012

I'm not sure that desktop/server drives would even have accelerometers in them.