How a Norco case killed 13TB of our data (wsyntax.com)
84 points by cyanoacry on Nov 3, 2012 | 47 comments

Specs like "6A MOSFET" don't exist in isolation. The current rating of that MOSFET follows from its on-resistance (Ron), the resistance between source and drain. As current flows through the FET, this resistance causes power to be dissipated as heat. The 6A number comes from how much heat the junction can handle without frying, and it is measured under a set of assumptions.

Usually testing is done at room temperature (25C is often used) with the best possible heat-sinking and ideal Vgs. Most MOSFETs of this type actually tie the internal thermal components to the drain pins. It's intended that the PCB designer use a large copper pour for the drain pads; this copper area acts as the heatsink for the chip, transferring heat away from the junction.

In other words, the MOSFET can only handle 6A if ambient temperatures are kept at room temp (which usually means fans on it or it's wide open to a room), there is adequate heat-sinking provided by the PCB design, and Vgs is high (higher Vgs reduces Ron). Further, it's important to remember that FETs have a "run away" characteristic: as junction temperature rises so does Ron, which creates more heat.

So, there is a lot more to the functional rating of a MOSFET than the 6A on the datasheet. In reality you would never want to nor expect that FET to hold 6A. With just a cursory look at the PCB layout and previous experience with what the ambient temperatures tend to be on back-planes of server chassis loaded with many drives, 1A would be pushing it, so I'm not at all surprised that these things blew.
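For anyone who wants to play with the feedback loop described above, here is a rough Python sketch. It iterates T_j = T_ambient + theta * I^2 * Ron(T_j) until it settles or passes a limit. The 80 mOhm and 62 C/W defaults are the figures quoted further down this thread; the linear temperature coefficient (0.5% per degree C) and the 150 C limit are illustrative assumptions, not datasheet values.

```python
def junction_temp_c(current_a, ambient_c=25.0, r_on_25_ohm=0.080,
                    theta_c_per_w=62.0, tempco_per_c=0.005,
                    limit_c=150.0, max_iter=1000):
    """Settle T_j = ambient + theta * I^2 * Ron(T_j) by fixed-point iteration.

    Returns the steady junction temperature in degrees C, or None as soon
    as an iterate exceeds limit_c (overheating or runaway).
    tempco_per_c and limit_c are assumed values for illustration only.
    """
    temp_c = ambient_c
    for _ in range(max_iter):
        # Assumed linear model: Ron rises with junction temperature.
        r_on = r_on_25_ohm * (1.0 + tempco_per_c * (temp_c - 25.0))
        next_temp_c = ambient_c + theta_c_per_w * current_a ** 2 * r_on
        if next_temp_c > limit_c:
            return None
        if abs(next_temp_c - temp_c) < 1e-6:
            return next_temp_c
        temp_c = next_temp_c
    return temp_c  # not converged within max_iter; return the last iterate


for amps in (0.5, 1.0, 3.0, 5.0, 6.0):
    result = junction_temp_c(amps)
    if result is None:
        print(f"{amps:.1f} A: exceeds the assumed limit")
    else:
        print(f"{amps:.1f} A: settles near {result:.1f} C")
```

With these assumed coefficients, 1A settles a few degrees above ambient while 5A and 6A go past the limit. Raising ambient_c or theta_c_per_w (a hot backplane, little copper) moves the crossover to lower currents.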

Perhaps the most WTFery aspect of that PCB is the lack of any suitable capacitance near the SATA connectors. If you aren't using staggered spin-up and many drives spin up at the same time, the rail voltages can dip, which in this setup causes Vgs to drop, increasing Ron at the worst possible time, as mechanical drives draw a lot more current at spin-up. This could trigger thermal runaway.

Came here to say essentially the same thing. In designing a 400A speed controller for my Battlebot I got to learn all about MOSFETs and their current ratings.

In this case the MOSFET is rated with a max current of 5A @ 25 degrees C if the gate is driven with a 5v signal. It has an 80 mOhm resistance when fully enhanced. So at 5A that is 2 Watts of power dissipation (p = i^2 r), and since the case has a thermal constant of 62 degrees C/W, that means the case will be at 149 degrees C (25 + 62 * 2W), which causes Rds(on) to double, and that 'explodes' the MOSFET. If the device was successfully carrying .5A loads then the case is only good for about 20mW of power dissipation (which looks reasonable given the package size). So gluing a copper heat sink to the FETs with at least 2 sq inches of surface area would probably keep them alive with a .75A load. If you combined that with about 150 CFM of air flow at sea level (225 CFM at 8,000 ft) you'd stay pretty solidly inside the 'not to be exploding' parts of the parameter graphs.
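The arithmetic above, written out as a small Python sketch so the numbers can be rechecked or swapped. It uses only the 80 mOhm, 62 C/W and 25 C figures from the comment. The helper names are made up for illustration, and the model deliberately ignores Rds(on) rising with temperature, so it is optimistic at high current.

```python
import math

R_ON_OHM = 0.080       # on-resistance quoted above (80 mOhm)
THETA_C_PER_W = 62.0   # thermal resistance quoted above (62 C/W)
AMBIENT_C = 25.0       # ambient temperature used for the rating


def dissipation_w(current_a, r_on_ohm=R_ON_OHM):
    """Conduction loss in the FET: P = I^2 * R."""
    return current_a ** 2 * r_on_ohm


def steady_temp_c(current_a, ambient_c=AMBIENT_C):
    """Temperature estimate: ambient plus thermal resistance times power."""
    return ambient_c + THETA_C_PER_W * dissipation_w(current_a)


def max_current_a(temp_limit_c, ambient_c=AMBIENT_C):
    """Largest current that keeps the estimate at or below temp_limit_c."""
    allowed_power_w = (temp_limit_c - ambient_c) / THETA_C_PER_W
    return math.sqrt(allowed_power_w / R_ON_OHM)


for amps in (0.5, 0.75, 5.0):
    print(f"{amps:4.2f} A -> {dissipation_w(amps) * 1000:7.1f} mW, "
          f"{steady_temp_c(amps):6.1f} C")
```

Calling max_current_a with a lower temperature limit, or with a warmer ambient_c, shows how quickly the usable current shrinks compared with the headline rating.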

> as junction temperature rises so does Ron, which creates more heat

What's interesting about this is that a bipolar transistor has the opposite property -- as it heats up, resistance decreases -- and yet this opposite property causes the same problems with thermal runaway. (Okay, when talking about bipolar transistors we don't really talk about "resistance", but if you know that then you probably know what I'm going to say anyway...)

When hooked up to a load with a relatively constant voltage, there will also be a relatively constant voltage on the transistor. As the bipolar transistor heats up, the resistance decreases and more current flows through the transistor, and the transistor will dissipate power according to the law P=V^2/R. So once a transistor gets hot, it gets hotter until it blows.

The question is, "what kind of load looks like a voltage source?" There's actually a quite common load -- any amplifier with parallel output transistors will look like it's driving a constant voltage load, from the perspective of one of the parallel transistors. Basically, hooking up bipolar transistors in parallel does NOT multiply the power rating as you would think, because thermal runaway might cause one of the transistors to dissipate all the power.

The FETs blew as soon as we turned on the case, oddly enough, and even if we had a single drive plugged in, that FET would blow by itself. This is even the case when we took the backplanes out and started testing them individually in free air in the server room, where ambient is more like 18C instead of 25.

Is the design solution to use a FET with a lower Rds at 5V Vds for smaller heat dissipation?

But more importantly, is there a standard current that most chassis are tested to? I would expect all SATA hot-swap bays would support all SATA drives on the market, since nobody ever gives power dissipation or consumption figures.

> Is the design solution to use a FET with a lower Rds at 5V Vds for smaller heat dissipation?

Honestly, it's to not cheap out on the design and use a proper hot-swap controller.

Examples (many companies make these): http://www.ti.com/ww/en/analog/power_management/system-prote...

You can rig up something slightly better by designing a current sense circuit with feedback into the FET gate if you're adventurous. In either case the goal is to limit and control inrush current during power on or disk insertion.

Based on what you've said I doubt the steady state current of the disks is the problem at all. It just sounds like the 3TB drives have a higher inrush current during power up and it's either high enough or lasts long enough to blow the FET junction.

The funny thing is, I'm not sure that the hot-swap circuitry is even necessary in the first place due to the connector design.

The SATA spec defines "pre-charge" voltage pins that are connected after ground, but before any of the other voltage pins are connected. The idea is that by inserting a small (10 ohm) resistor, you can limit inrush current to a tolerable value while capacitors charge and regulators start up, and then when the other pins connect a couple of milliseconds later, the drive gets the proper low-impedance power connection.
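A quick back-of-envelope sketch of what the pre-charge resistor does, in Python. The 10 ohm value is the one mentioned above. The 100 uF input capacitance is a made-up illustrative number, not something measured on a real drive, and the drive is modelled as a plain RC load, which is a simplification.

```python
import math


def peak_inrush_a(rail_v, precharge_ohm=10.0):
    """Worst-case current into fully discharged capacitors: I = V / R."""
    return rail_v / precharge_ohm


def charged_fraction(elapsed_s, precharge_ohm=10.0, bulk_cap_f=100e-6):
    """Fraction of the rail voltage reached on the drive's input capacitance.

    bulk_cap_f is a hypothetical value chosen only for illustration.
    """
    tau_s = precharge_ohm * bulk_cap_f  # RC time constant
    return 1.0 - math.exp(-elapsed_s / tau_s)


for rail_v in (5.0, 12.0):
    print(f"{rail_v:4.1f} V rail: peak {peak_inrush_a(rail_v):.2f} A "
          f"through the pre-charge resistor")

for elapsed_ms in (0.5, 2.0, 5.0):
    pct = charged_fraction(elapsed_ms / 1000.0) * 100.0
    print(f"after {elapsed_ms:.1f} ms: about {pct:.0f}% charged")
```

The second loop is the interesting part: whatever charge is still missing when the main power pins make contact has to come through the low-impedance path, so a shorter pre-charge interval leaves a bigger step for the backplane to absorb.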

Do you know if implementing pre-charge via connector mechanics obviates hot-swap protection circuitry? Supermicro seems to have hot-swap protection on their backplanes as well, but I haven't had a chance to closely inspect it.

It can partially solve the inrush current issue. There are still timing issues: if you slam a drive in, you can shorten the timing of pin contact so much that it's not effective. With a hot-swap controller you also gain a real benefit in that one drive failing by something major, like a straight short, won't take out your entire system, as the controller will detect the current spike and cut off the drive. Most hot-swap controllers also provide additional protection against things like pin reversal, ESD and accidental shorts on insert that you can't solve with just the connector. It all comes down to how robust you want the design really, how many failure modes you wish the system to survive.

EDIT: Those connector pins aren't normally solid gold. They are deposited metal (copper or tin) with electroplating of a few microns of gold on the surface.

Another reason to control insertion spikes is that when the first power pin hits you'll get a little arc (spark). This can cause small damage to the pins in the form of small chipping of the coating and/or carbon deposits (or oxidation of some metals). This contributes to a reduction in the number of insertion cycles the connectors will survive.

I've done some work with high current motor control boards - one of the problems we've had is '100 Amp' boards with capacitors on - when first plugged in, they draw more than 100 Amps as the capacitors charge.

If you have spare boards and a hankering for destruction, you could repeat your test with resistors instead of hard disks, to verify the current drawn by the hard disk is as advertised.

My thought was that even though it was the Norco hardware that was blowing out, it could be the 3T drives are the thing that are somehow exceptional. Their rated current is 0.75A, but what do they draw and for how long in the initial power on surge?

Well, there are a couple of surges not dealt with here. The first and likely highest current spike is the charging of all the capacitance in the hard drive. I've never looked but I would imagine the motor driver circuit and the electronics have fairly large capacitors on the input power rails. This should be relatively short lived but higher Rds from a crappy MOSFET would increase the charging time. The second is everything powering on, there are likely secondary regulators for the electronics (probably needs 1.8 or something not 5 or 3.3) and they could be regulating the 12V rail to something more controlled for the spindle motor as well. Those switchers coming online and charging capacitors on the output side will cause a bit of a spike also. Then there is the spike from firing up the spindle motor and the head servo, likely not insignificant. I have no idea if there is an inrush current / duration limit in the SATA/SAS spec or not, I would think so but I've never seen a drive manufacturer quote these numbers.

The thermal runaway is the "fun" part of the Tesla motor controller, right? It's got 3 MOSFETs, since there isn't a single one with enough power handling, and thus if one of them sucks a little more, it'll end up with more power, which will make it suck more, which will end up with more power...

Bipolar transistors suffer from the thermal runaway of parallel devices you describe, but not MOSFETs.

MOSFET on resistance increases with temperature, so with multiple devices in parallel the hottest one will flow the least current. This is an important effect even within one device, which is actually an array of thousands of junctions.
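A tiny Python sketch of that sharing effect, treating each fully-on FET as a plain resistor (a simplification) and splitting a fixed total current between parallel devices. The resistance values are invented for illustration.

```python
def share_current(total_a, r_on_ohms):
    """Split total_a between parallel devices in proportion to conductance."""
    conductances = [1.0 / r for r in r_on_ohms]
    total_conductance = sum(conductances)
    return [total_a * g / total_conductance for g in conductances]


# Hypothetical pair: one FET at nominal Ron, one running hotter with higher Ron.
cool_ron_ohm = 0.080
hot_ron_ohm = 0.100
cool_a, hot_a = share_current(10.0, [cool_ron_ohm, hot_ron_ohm])
print(f"cool FET carries {cool_a:.2f} A, hot FET carries {hot_a:.2f} A")
```

The hotter device ends up with the smaller share, which is the self-balancing behaviour described above. A device whose resistance falls with temperature would take a larger share instead.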

A related "gotcha" is not driving the gate hard enough. The 6A rating is also predicated on driving the gate hard enough to drive the on Rds to its minimum value.

I would be suspicious of this, in conjunction with inadequate heatsinking (i.e. heavy copper pads under the FETs) especially if the FETs are being driven by 3.3v. Looking at the specs, the FET is rated at 50mOhm given 4.5v gate drive - very acceptable - but if the FET is driven by 3.3v, it will have much higher Rds (may be running in the linear region which would be very bad). Note that the gate threshold voltage is 1.5v typical but 3v max so driving with a 3.3v logic signal would be marginal in the worst case situation.

In this application you typically use p-ch mosfets with gate pulled down to turn on, a resistor will provide pull-up of the gate to turn off.

So Ugs (that is, Vgs) is the same as the rail voltage (3.3, 5 and 12v).

Those failures don't really look like moderately too much power dissipation. I'd be worried about static at the drain or counterfeit FETs. I suppose the switcher in the drives could just be drawing several amps as its input voltage decreased and maybe those FETs don't have the best thermal resistance, but sheesh.

Oh I see, 62 K/W junction-to-ambient for an SO-8. Whereas the D2PAKs I've had desolder themselves are 1.5 K/W junction-to-case.

I'd honestly never heard of Norco, but that's because Supermicro tends to be awesome.

They're essentially the OEM for a lot of Intel developer stuff now, too. Over the past 10 years, they went from "yet another commodity motherboard manufacturer" up to being the big white box option. They have basically replaced Dell and HP as the go to vendor for a lot of big deployments, although Dell and HP do better financing, Dell and HP have a few higher end products, and Dell/HP have better desktop/workstation/laptop products if you want a full suite from one vendor.

You'd hear of Norco if you're on a budget, they're one of the many cut-rate brands. Supermicro charges roughly three times the price for enclosures, which do get the job done if you aren't bitten by firmware issues, but are by far not a budget brand compared to the competition out there.

I've actually considered some Norco enclosures in the past for a hobbyist project, but I had concerns over their hot swap backplanes -- mainly, a lot of reliability issues reported. Catching on fire pretty much settles it!

I've used http://www.aicipc.com/ for low-end. Or Chenbro, or Antec. (and eBay, or craigslist, or otherwise sourcing used stuff getting thrown out at a datacenter)

Thanks for the AIC pointer, it's hard to find much useful info when you're essentially ordering equipment from the finest brands DealExtreme has to offer.

Besides Supermicro and Chenbro and Dell and HP and IBM and (what's left of) Sun, who else is there to buy from? There don't seem to be any other brands anymore.

ASUS makes server mobos, but not cases (I'm not sure whose cases they're using for barebones units), and Antec makes low end server cases, but no barebones, and Intel makes pretty horrid server mobos but uses someone else's cases for barebones.

Intel's someone else's cases = Supermicro, usually.

There's Penguin (Linux/HPC), iXsystems (FreeBSD/storage), and Tyan for "server barebones."

Usually I'd just pay these guys to build something: http://www.computerlink.net/

I'm pretty sure iXsystems uses Supermicro cases as well, they just put their branding on them.

Apparently, Cisco also makes servers these days. No clue how good they are in any department (performance/reliability/costs), but they exist.

This may be a newbie question.

Article states: "... it should work fine in any metal box, right?"

But then says the case had some electronics which failed. Which means it wasn't just a metal case. Anyway TIL that server cases aren't just metal boxes.

And in case somebody wants to respond, how important is vibration dampening to ensure hard disk reliability? What's the best way to damp vibrations in a consumer tower desktop case?

So 10 years and a few employers ago, we had a case of a few "haunted" server chassis. Hard drives would fail on these chassis very frequently, and when a fresh drive was swapped in, it would take many days to rebuild the RAID, if it ever rebuilt at all.

Putting the RAID set in a new machine, it would rebuild fine. But in the original machine, we swapped out the raid controller, CPU's, even the whole motherboard, and the RAID sets still would not rebuild.

Long story short, each of these "haunted" servers had a bad fan that was causing a lot of vibration within the chassis - enough physical vibration happening that the hard drives were essentially rendered inoperable.

The moral of the story is to make sure you have good vibration dampening on your fans, and to use the sensors monitoring to alert you if the fans are going bad. (Even this is not perfect, since sometimes the fan gets off-kilter but is still happily spinning at 10K RPM. The first thing we did if we got an alert for a disk failure was to replace the fans and attempt a RAID rebuild before touching the "bad" disk)

This wasn't a Sun E450 was it? We had one (of a "matched" pair) that was "haunted" as well. Drives died, Sun replaced drives. Drives died again. Sun replaced SCSI controller and drives. Drives died again. Sun replaced motherboard, SCSI controller, memory, and drives. Drives died again, and we made the (at the time) scary move to Pentium III app servers, which were inexpensive enough to triple up compared to SPARC, but even better, drives didn't die.

We swapped out the E450s for 440s for Oracle when we moved to InterNAP, and all seemed to be well.

Hearing your story, I wouldn't be surprised if we had just enough/wrong vibration in the case to make it go Tacoma Narrows on us.

These haunted servers were actually Supermicro barebones chassis.

It has been a (long) while since I have seen the inside of an E450 but iirc there were a bunch of fans in trays in there. So it is certainly possible that the vibration did bad things. I still carry one of the E450 era keys on my keychain as a memento.

Vibration dampening is this important --> http://www.youtube.com/watch?v=tDacjrSCeq4

I think it's more 5-platter 7200rpm drives vs. 3T drives, inherently (I think there may be a 4-platter 3T now). An old 5-platter 75GXP might kill it, too.

This is why I avoid 4 and 5 platter drives like the plague. They tend to overheat and die faster no matter how you try to cool them, and I'm not sure why: I'm pretty sure the drives aren't intentionally defective, and I doubt the manufacturers haven't compensated for the additional load on the spindle motor and the power bus.

The enclosure isn't usually something you think about that much as long as it has the requisite number of bays. The prospect of hooking everything up only to see the magic smoke get released is terrifying.

Everyone learns the cheap hardware lesson at least once.

"1% less downtime means 87 hours per year. Do you think Lenovo is 1% better than Acer?" <- what I say when I encounter a client that hasn't learned this lesson, and insists on low end gear.

It's not clear what you mean. If we have two components, one of which is 99% available and one of which is 100% available, the second is not 1% better than the first. It is infinitely better than the first. It's well understood in reliability engineering that, at least in the long term, the cost of availability increases exponentially as the goal approaches 100%.
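To put numbers on the availability figures being argued about, here is a small Python sketch that converts an availability fraction into expected downtime per year. It assumes a 365-day year; the function name is just for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(availability):
    """Expected hours of downtime per year for a given availability fraction."""
    return (1.0 - availability) * HOURS_PER_YEAR


for availability in (0.99, 0.999, 0.9999, 0.99999):
    hours = downtime_hours_per_year(availability)
    print(f"{availability * 100:.3f}% available -> {hours:8.2f} hours down per year")
```

Each extra nine cuts the downtime by a factor of ten, which is why the first 1% (roughly 87 hours) is so much cheaper to remove than the last fraction of a percent.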

Buying more expensive hardware (with better per-component availability) is one approach to a full availability picture. However, it's one with significant shortcomings. At the top end, it's very expensive. Across the board, it only addresses some possible threats, and completely doesn't address others (like natural disasters). The alternate approach is redundancy. That approach is not without its shortcomings, including increased system complexity and the problems of failure detection and failover. However, it's generally lower cost and is more robust to multiple types of threats.

This is why RAID is so successful - because it is a reliable set of techniques for building highly-reliable systems out of unreliable components. Again, it only protects against one threat (drive failure) and not others (fire), but it's proven superior to the alternative "buy a single gold-plated drive" approach.
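The redundancy argument can be sketched in a couple of lines of Python. This assumes the redundant units fail independently and that any single surviving unit is enough, which is the textbook simplification rather than a model of any real RAID level.

```python
def combined_availability(unit_availability, copies):
    """Availability of `copies` redundant units when any one is sufficient.

    Assumes failures are statistically independent, which is an idealisation.
    """
    return 1.0 - (1.0 - unit_availability) ** copies


for copies in (1, 2, 3):
    combined = combined_availability(0.99, copies)
    print(f"{copies} x 99% units -> {combined * 100:.4f}% combined")
```

The independence assumption is the weak point: anything shared between the copies (one backplane, one power supply, one building) can take them all out together, and the formula above does not capture that.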

Low-end gear isn't for everybody. Cutting corners in the wrong places is dumb. On the other hand, high-end gear only saves you in some ways, and those ways may not matter to your business. If you want high-availability or high-durability you need to know two things: how much your data is worth, and what you are protecting it from. Until you know these answers, you're going to be making poor system design decisions.

I understand the math lesson and how it costs more to add each 9 to your uptime. All that will not help you convince someone who doesn't know these things. It's the difference between a sales pitch and an engineering discussion.

Redundancy is a great tool but it's not a panacea. You mentioned many of the drawbacks yourself. I see too many people building gigantic RAID arrays with consumer Hitachi drives and questionable controllers. It's exceedingly easy to think you're adding redundancy when you are just creating a house of cards.

We have a Norco 24-E which is not a system case, but an external drive enclosure with a SAS expander. I don't know if it's the same backplane but I don't know why it wouldn't be. This article scared me because we have 24 SAS 15k rpm drives on it.

Then I checked the drive spec and saw that we are pulling 0.8A on the +5V rail and 1.2A on the +12V rail, which is even more than these 3TB SATA drives draw, and we have gone months without backplane issues.

We did have to upsize the case's PSU from the shipped 500W though as beyond 15 drives we got voltage warnings from the expander.
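A rough power budget for that setup, as a Python sketch. It uses only the per-drive figures quoted above (0.8A at 5V, 1.2A at 12V) and treats them as steady-state draw; spin-up current is not given in the thread and is not modelled. The function names are made up for illustration.

```python
def drive_power_w(amps_5v, amps_12v):
    """Power drawn by one drive from its 5 V and 12 V rails."""
    return 5.0 * amps_5v + 12.0 * amps_12v


def enclosure_power_w(drive_count, amps_5v=0.8, amps_12v=1.2):
    """Total drive power for the enclosure, excluding fans and the expander."""
    return drive_count * drive_power_w(amps_5v, amps_12v)


def rail_current_a(drive_count, amps_per_drive):
    """Total current a single rail must supply for all drives."""
    return drive_count * amps_per_drive


for count in (15, 24):
    print(f"{count} drives: {enclosure_power_w(count):6.1f} W total, "
          f"{rail_current_a(count, 0.8):5.1f} A on 5 V, "
          f"{rail_current_a(count, 1.2):5.1f} A on 12 V")
```

Even on rated figures alone, 24 drives land in the same range as a 500W supply's label, and a supply's per-rail limits are usually lower than its headline wattage, so trouble appearing before the enclosure is full is plausible.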

So I think YMMV.

I set up a Norco 4224 for backup purposes and plugged in a couple of 3TB WD green drives, and have not had any issues at all. Maybe the WD green drives don't require as much current as these ones?

Temperature can be just as important as current draw. The amount of current those MOSFETs can handle will reduce fairly rapidly as ambient temperature increases and green drives tend to run much cooler than high speed 7200rpm drives.

That's the main point of the green drives: less power, less noise, less heat.

The main point of green drives is to make people feel better about the fact that they just bought a sloooow 5k RPM drive.

We actually used to use Samsung EcoGreen drives in order to get more storage for cheaper, since our main use was backup: write once, read maybe. Unfortunately there don't exist 5400RPM 3TB drives; if they were as cheap as the 3TB Seagates, we'd buy them for sure. The 5400 rpm Ecogreens used to be the cheapest 2TB drives on the market by a large margin.

I never expected that paying more for a slower drive (but better power and less power-related issues) would actually make sense!

This is terrifying to me because I have the 3U version of that exact chassis and I was just thinking about upgrading from 2TB to 4TB drives.

Penny wise and pound foolish, to be sure, but does the backplane work with 2T drives? There seems to be a lack of conclusion in this post.

Yes, we've had 2T drives in our cases for about a year now, with no issues. Our 2T drives are rated at 0.5A on +5/+12, while the 3T drives are rated at 0.75A, so we're suspicious of this difference.

(For what it's worth, the MOSFETs used are rated at 6A according to their datasheet; it's possible the manufacturer got it mixed up with the 0.6A batch of MOSFETs. I don't have any hard evidence though, so I'd rather not post speculation.)

> 0.6A batch of MOSFETs

lold hard. transistors are not resistors. peak current is not just a variable parameter.

I suppose this was due to bad contact somewhere (socket or HD) and transistors working in linear mode for more time than they can withstand, overheating and boom...

EDIT: or maybe they used a crappy hotswap controller (if they actually used one) which cannot pump enough current into the gates of those mosfets.

Er, I should say "tested at 0.6A" MOSFETs. If they have thin bond wires or something, it's plausible that excess current just vaporized them. If they never tested them at 6A, and skimped on the gauge...

I also saw a couple instances where the trace to the gate blew up (like, black board, no copper, looks like a burnt fuse), so it's possible that static during manufacturing killed their FETs even before they shipped. (We're pretty good about static, and if it was our fault I think maybe one drive/channel would die, not several)

> Er, I should say "tested at 0.6A" MOSFETs

yes, that would be better. in reality parameters of transistors vary. but when it is off by an order of magnitude these parts (and actually the whole batch) go to trash.

> trace to the gate blew up

if it really was trace to the _gate_ then it might be blown after mosfet itself melted. Because even really fat FETs (with high gate capacity, D-S rated to hundreds of amperes) cannot do this to copper trace.
