Something is burning in the server room; how can I quickly identify what it is? (serverfault.com)
200 points by usea on Apr 5, 2013 | 120 comments



You can tell from the responses who has a real DR plan and who is just winging it. The DR-backed commenters can switch to site B with nary a worry. Everyone else is trying to justify keeping the server room running while something is burning. To them a misplaced backhoe is a bigger problem than, you know, a server burning.

I love the sanity check part. But really, you're keeping someone on hand to drag your ass out after you pass out while you sniff up toxic fumes.


A million times this. If turning a whole datacenter off is catastrophic for your business, you've not done your risk management properly. Single events that take down datacenters happen. Not being prepared for them is unforgivable.


Even with DR, it has a cost. Especially if you're a colo provider, powering off your whole facility (or, ideally, at least a room) to find smoke is going to have a cost, even if all your customers have DR plans.

In most datacenters I've seen, I'd probably be willing to do a run through with IR cam/temp probe, or just visual inspection, with a handheld 1211, especially if I had a respirator, if it were just "smell of smoke". Clear view and path to two exits, someone at the EPO switch, etc.

The "big scary things" are battery plant and generator plant, and any kind of subfloor or ductwork. As long as the fire isn't in any of those, it's far less of a big deal. I probably wouldn't EPO a room for a server on fire, either -- just kill the rack, which takes slightly longer.

I've been in places where "smell of smoke" was a fucknozzle smoking a cigar or burning leaves outdoors outside an air intake, and another where it was a smoker's coat being put on an air handler.


The great thing about DR plans is... when implemented correctly, no one should have to risk their life to avoid using it. He didn't say that one of his servers was running a little hot (which happens, a lot). He said there was smoke and the acrid smell of something burning. Which means that one of his components actually got hot enough to ignite.

If you're not ready to use your DR plan it probably means your DR plan is inadequate to begin with. Why the hell do fire drills? Even cruise ships do drills. God forbid they pull their passengers away from that very important game of Texas Hold'em.

> I probably wouldn't EPO a room for a server on fire, either -- just kill the rack, which takes slightly longer.

You fail to understand how fires start or why they spread. I mean, why the hell do datacenters spend millions of dollars on fire suppression when an IR cam and a handheld extinguisher are just as good, right?


Essentially no one does "EPO drills" on their datacenters. Particularly in multi-tenant environments like commercial colocation centers. It's quite reasonable for your DR plan to involve a $200k+ cost per EPO pull or DR failover. Your business should have DR provisions, and you should test the DR plan, but it's probably not reasonable (or legal) to do a full test involving dumping agent, rapid power off, etc.

The fire suppression exists for two reasons. One, is to get code exemptions to be allowed to run wiring in ways which would otherwise require licensed electricians to do every wiring job, and prohibit people from being in the facility. Two is to detect small fires early, and to prevent their spread, as well as to protect facilities from catastrophic facility-wide fire.

Servers are just not that high a fire risk, particularly when de-energized. Generally inside a self-contained metal chassis, less than 100 pounds each, metal/plastic, etc. The power supply is the most likely component to start a fire, and contains a max of maybe 250g of capacitors and other components. The risk of one server catching on fire is low, and the risk of it rapidly spreading to anything else is low, so yes, I'd be comfortable pulling a single burning server out of a de-energized rack.

Also, in big or purpose-built facilities, those components most likely to be fire risks (batteries/power handlers, and generators) are in separate rooms, separated by firewalls from the datacenter. A fire in the battery room is going to be dealt with by sealing that room and powering it off, dumping suppression agent, and bringing out the FD immediately.

Life safety is much more important than business continuity, but a lot of people have jobs where they accept a non-zero risk of physical harm to do their jobs. It's certainly not reasonable to demand a datacenter tech go into a burning building to rescue a database server or something, but approximately zero datacenter staff I know would have a problem with assuming the level of risk I would to find problems. (it's probably a bigger deal for employers to actually discourage risk-taking by employees, particularly when it's risk-taking to save themselves effort, like single-person racking large UPSes or very large servers, etc.)


I think we are aiming at the same thing here. A proper, multi-tenant datacenter will have separate zones for generators, UPS, electrical, and climatisation. The actual chance of a fire starting and spreading in this type of configuration is low and this is the environment I prefer to work in. I've also worked in server rooms in 100+ year old buildings which did double duty as storage/broom closet. The original post was closer to this since they had racked UPSes next to their servers and network equipment. It apparently caused enough smoke to fill a server room and make the poster nauseous, which makes me wonder what air handling capacity they have. It's this type of "datacenter" where you have to worry about your life.


> Even cruise ships do drills. God forbid they pull their passengers away from that very important game of Texas Hold'em.

Well, US flag passenger ships (among others) are required to hold Fire and Emergency drills at least once every week. But your point stands: having a plan, and executing that plan even when it's prefaced by "this is a drill...", is crucial if you want to have a hope of things going the right way in an actual emergency.

Here's the thing, though. If your people are properly trained in what to do, and how to use the equipment, then shutting down power to the rack and extinguishing fire on a single server with a fire extinguisher may be a reasonable course of action. But that contingency should have been considered ahead of time and be part of the emergency plan.

The time to decide what to do is not during the emergency.

As for walking around a smoky room looking for the source, that's nuts. I spent one long day (way too short, though) at Military Sealift Command Firefighting School. One of the first things they do is put you in a room full of smoke and make you count out loud. After about 30 seconds you feel like you're going to pass out -- that gets the point across much better than lectures ever will.


But it happens.

Our colo went down. Fire system triggered it (no actual fire was harmed in the triggering, however). On Thanksgiving. When I'd volunteered for on-call seeing as my co-worker had family to attend to and I didn't.

He thanked me afterward.

Our cage was reasonably straightforward to bring up, once power was restored. The colo facility as a whole took a few days to bring all systems up, apparently some large storage devices really don't like being Molly-switched.


Yeah. So estimate the cost there at something like $100-250k? I'm willing to accept a pretty low risk to my life for ~15 minutes of searching to save my company $250k. It's a risk on the order of riding a motorcycle from Oakland to San Jose in rush hour, I'd roughly estimate.


"I can totally reach into the back of this gigantic, flesh eating machine to move that widget a little bit to the right. Management might give me a raise for saving them money!" -famous last words of a former factory worker.


"Do not look into laser with remaining eye" is the classic, though.


Enjoy that cancer from burning and/or airborne plastic/arsenic/mystery material in 20 years.

And which part of Oakland? That can be a pretty broad range of risk. :-)


"You wouldn't download a new lung, would you?" (RIAA ad in 2033). I lived in a tent 200m downwind of a 24x7 burning trash dump on a former Iraqi and then USAF military base, for some time, so I think I've got particulate risk checked already.


Ugh, yeah, I'd say you're, umm, covered.

Edit: It helps that you're the CEO of your company. The risk profile changes a little bit. :-)


My condolences... I highly recommend eating healthy, otherwise your odds of getting cancer in your 40s-50s are more than 30%.


I think I actually hope my odds of eventually getting cancer are ~100% over my lifespan, because it seems to be a natural consequence of living long enough. I also hope that by the time I have cancer of any size, it is something you can treat fairly successfully.


As long as they treat cancer as a profit center, a cure will never surface in the US of A. "Treatment" is a multi-billion (trillion?) business; a cure would reduce that to dust.

Also, you should look up the agony, people would rather shoot themselves than take the "treatment"


Citation, please.


I believe there was some talk of a lawsuit: http://www.armytimes.com/news/2008/12/military_kbr_lawsuit_1...

http://www.lawyersandsettlements.com/lawsuit/open-pit-burnin...

I don't think the situation was that bad. The one really unforgivable thing was shoddy electrical work in shower trailers (I think ~10 contractors and soldiers were fatally electrocuted while showering while in Iraq! I certainly got 230v a couple times and went through the reporting process, and actually got MPs and a friend from Contracting to turn it into a bigger issue.)


Electrocuted while showering: WTF!


I do that more days than not and I haven't been scraped off the pavement yet. I think most commenters here are overly concerned with the risk because they haven't properly equipped themselves to deal with it. It's much easier to keep yourself out of a bodybag when you are aware of your surroundings.


Spot on. We were rudely awoken at 3AM by our alert system after one of our DCs caught fire (Host Europe/123-reg in Nottingham - utter fucking cowboys, now moved on from there). UPS blew and took out the entire power system and generator.

It definitely happens.


The number of colo issues I've seen triggered by various backup/redundant systems is pretty impressive.

Whether it was a redundant mains power system blowing (taking down the main PDU), spoiled diesel, failed generator cutover, UPS fire, smoke detector-triggered shutdown (associated with power management), a really bizarre IPv6 ping / router switch flapping issue, load balancer failures based on an SSL cipher-implementation bug (triggered an LB reboot and ~15s outage at random intervals), etc., etc., etc.

Just piling redundancy on your stack doesn't make it more reliable. You've got to engineer it properly, understand the implications, and actually monitor and come to know the actual outcomes. Oh, and cascade failures.


> Just piling redundancy on your stack doesn't make it more reliable.

Yeah, in a sense it actually makes it less reliable as far as mean-time-between-failures go. As an example, the rate of engine failure in twin-engine planes is greater than for single-engine planes. It's obvious if you think about it: there are now two points of failure instead of one. Why have two-engined planes? Because you can still fly on one engine (pilots: no nitpicking!).

What redundancy does do is let you recover from failure without catastrophe (provided you've set it up properly as per the parent).
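The twin-engine point can be put in numbers. This is only a rough sketch: the per-flight failure probability `p` is a made-up illustrative figure, and the engines are assumed to fail independently, which real engines do not strictly do.

```python
# Hypothetical per-flight failure probability for ONE engine (illustrative only).
p = 1e-4

def any_failure(p: float, n: int) -> float:
    """Probability that at least one of n independent engines fails."""
    return 1 - (1 - p) ** n

def total_failure(p: float, n: int) -> float:
    """Probability that all n independent engines fail on the same flight."""
    return p ** n

single_any = any_failure(p, 1)   # single-engine plane: any failure is total failure
twin_any = any_failure(p, 2)     # twin: some engine fails (more frequent)
twin_all = total_failure(p, 2)   # twin: both engines fail (far rarer)

print(f"single engine, engine failure: {single_any:.2e}")
print(f"twin engine, at least one failure: {twin_any:.2e}")
print(f"twin engine, both fail: {twin_all:.2e}")
```

Under these assumptions the chance of seeing some engine failure roughly doubles, while the chance of losing all thrust drops from p to p squared.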


> Yeah, in a sense it actually makes it less reliable as far as mean-time-between-failures go.

It depends on what you're protecting against, how you're protecting against it, and how you've deployed those defenses.

Chained defenses, generally, decrease reliability. Parallel defenses generally increase it.

E.g.: Putting a router, an LB, a caching proxy, an app server tier, and a database back-end tier (typical Web infrastructure) in series (a chain) introduces complexity and SPOFs to a service. You can duplicate elements at each stage of the infrastructure, but might well consider a multi-DC deployment, as you're still subject to DC-wide outages (I've encountered several of these) and a great deal of complexity and cost.

Going multi-DC doesn't increase capital requirements by much, and may or may not be more expensive than 2x the build-out in a single DC. It does, though, raise issues of code and architecture complexity.

In several cases, we were experiencing issues that would have persisted despite redundant equipment. E.g.: the load balancer SSL bug we encountered was present on all instances of multiple generations of the product. Providing two LBs would simply have ensured that as the triggering cipher was requested, both LBs would have failed and rebooted. Something of an end-run around our Maginot line, as it were.
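A back-of-the-envelope sketch of the chained vs. parallel distinction, and of why a shared bug defeats duplication. The 99.9% per-tier availability and the `beta` shared-cause fraction are hypothetical numbers chosen for illustration, and the shared-cause model is a deliberate simplification, not a measured result.

```python
from math import prod

def series(avails):
    """Availability of a chain: every stage must be up at once."""
    return prod(avails)

def parallel(a, n=2):
    """Availability of n independent replicas: at least one must be up."""
    return 1 - (1 - a) ** n

def parallel_common_mode(a, beta, n=2):
    """Replicas where a fraction beta of downtime has a shared cause
    (e.g. the same firmware bug) that takes out every replica together."""
    u = 1 - a                    # unavailability of one replica
    shared = beta * u            # downtime that hits all replicas at once
    independent = (1 - beta) * u # downtime that hits replicas separately
    return (1 - shared) * (1 - independent ** n)

tiers = [0.999] * 5  # router, LB, cache, app tier, database (assumed values)

print(f"chain, no redundancy:       {series(tiers):.6f}")
print(f"chain, each tier doubled:   {series([parallel(a) for a in tiers]):.6f}")
print(f"LB pair, independent:       {parallel(0.999):.6f}")
print(f"LB pair, all faults shared: {parallel_common_mode(0.999, beta=1.0):.6f}")
```

With these assumed figures, five tiers at 99.9% each give a chain of about 99.5%. When every fault is shared (`beta=1.0`), the second load balancer adds nothing over a single one.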


Curious: what's a handheld 1211?


I searched and this was in the top 5, I bet that's what they're talking about: http://www.h3raviation.com/halon_1211.htm


You might want to note that it's illegal to own or use Halon extinguishers in many countries.

http://en.wikipedia.org/wiki/Fire_extinguisher


In the US, it's ok IFF you use recycled Halon. Halon 1211 is still ~2x as effective as the nearest "friendly" alternatives (Halotron). For a facility-scale installed system, using an alternative agent is worth it, because you can just use 2x as much chemical. For a handheld, 5-10 pound is the biggest someone will realistically carry, so having more power is worth it. My goal is to not expend this agent in the next 5-10 years, and to lose maybe 5% during that period, so there's really no downside to the environment in having it in my 3 x 5lb extinguishers vs. in the presumably older extinguishers someone else had. I hope in a decade there is a better alternative, or I'll get them topped up (you're supposed to inspect them every year or 5 years depending on where they're used, but generally a 5-10y lifespan is reasonable).


Counterpoint: As a customer, I don't want to pay for the kind of redundancy that encourages employees to shut down the whole data center when a single tantalum cap pops in a power supply somewhere.

The fact is, there aren't that many flammable things in a data center. Nobody is going to die because they wandered up and down the aisles after they "thought they smelled something burning," in the absence of any visible smoke.

The guidelines in the highest-voted answer on the Server Fault page make sense to me. 1: If you actually see smoke or fire in any significant amount, evacuate. 2: Make someone else aware of what's going on before doing anything else. 3: Keep your escape options open. 4: Think about how much time you can safely spend "guessing", and don't exceed it. 5: Don't second-guess your own common sense. You aren't paid to be a fireman or a hero.


Almost nobody (for any reasonable definition of nobody) has immediate redundancy for their data center. I've worked for three $1B+ companies whose entire business was based on their data center being up and running - and none of them would have returned to service in less than 48 hours if a data center had gone down.

"Not being prepared for them is unforgivable" - would mean that 99% of businesses do not deserve forgiveness.

It just doesn't make sense to have that kind of redundancy for such a rare event for all but a very, very small minority of businesses. (Telecoms, Google, Stock Exchange, 911, etc...)


I was working at a place as a contractor and they had an amazing backup power system (expensive, diesel with batteries). Semi hits the power main outside the server room and for some reason the diesel never starts. Whole datacenter loses power in under 15 minutes.

Always plan for the single event because no amount of money will keep a single site running.


The biggest Danish ISP lost service for quite some time because the truck that was to deliver their new emergency power crashed into the mains transformer station.

Of course they are idiots; they should never have powered their own mains off before they got the new one installed.

But it was still pretty funny.


I'm not sure immediately pressing the big red button because you mistook the smell of a lead-acid battery outgassing for something burning (with no smoke) is necessarily a measure of preparedness.


You ask the fire marshal what was burning, about an hour or so after you've Big Red Button'ed the server room.

This story is giving the Japanese engineer in me apoplexy.


Acting quick and getting a fire under control often makes the difference between an emergency and a disaster. But only after you've made sure the building is being evacuated, and the fire department has been alerted.

Don't be afraid to call the emergency number. They'll know what to do and walk you through it.

Under no circumstance should you enter a room filled with smoke. Smoke inhalation is incredibly dangerous.


> Under no circumstance should you enter a room filled with smoke. Smoke inhalation is incredibly dangerous.

To reiterate, a lot (most?) of the people who die in fires actually die from smoke inhalation rather than from getting burned by the fire/flames.


Hi. I have worked in a burn unit. Inhalational injury is usually not the cause of death. In fact, of the people I saw, only one died of bronchoscopy-confirmed inhalational injury. And he was an obese smoker with minimal residual lung volume to begin with.


Also, in the Navy, where halon is also in heavy distribution, for fuel fires (nothing scarier than a fuel fire on a ship at sea), the training still includes that the person who smells smoke needs to find the fire, or at least some legit smoke before exiting the space. Most watch standees now have radios, which makes the decision of when to call a lot easier.


This seems to imply that most people who die are burn victims? (It feels like this is a stupid question, but I'm not sure of the answer so eh. Asking anyways.)

Would people at risk of an inhalation injury actually pass through you often? You're in a burn unit, so I'd assume that means you mostly get burn victims, and inhalation injuries would be pointed somewhere else?


> imply that most people who die are burn victims

Inhalation injury is a subset of burn trauma. The flow control is "Ambulance inbound from fire" -> ER calls trauma alert -> trauma team meets the ambulance(s) at the ER door. Those with inhalation injury are sorted from there.


Sorry, but this is selection bias. You won't even see all the people who die of smoke inhalation without any burns.


In this case, a mid-size city, the burn unit was also the trauma ICU. On the trauma team, we responded to all trauma calls at the ER door. It just so happened that all burns in the area also came to us, and thus passed through our trauma/burn unit. So, I doubt there was much selection bias in this case.


> Under no circumstance should you enter a room filled with smoke

The article says "smell", not "smoke", but I don't think inhaling it was a good idea in any event.


The post also says it was making them light headed...


Most likely it's just hyperventilation caused by walking around sniffing the air looking for the source of the smell. Although I wouldn't doubt the smell was toxic.


I've never seen hyperventilation caused by 'sniffing'. I wouldn't go so far as to say it's impossible, but you'd have to be sniffing far more aggressively than I've ever seen someone sniff before...


I've experienced light-headedness from aggressive sniffing. And it wasn't glue ;-).


Smoke from electronics, especially.


Speaking of consumer electronics:

When I went to the Midwest Reprap festival, for some reason power was being a bit flakey. The guy who was sitting across from me had 2 printers using ATX power supplies.

Suddenly, the PSU sparks out and then outgasses a pillar of smoke 4 feet in diameter for the next 2-3 minutes. Near the top of the building, perhaps 3 stories up, it looks like a mini-Hiroshima with a mushroom cloud going.

And that was from a dinky ATX PSU.


Speaking of consumer electrics: around here universities have obsolete views of how students work, limited budget, and old buildings. This results in a shortage of wall plugs, and the smartest folks bring powerstrips, plug them into wall sockets, then into other power strips, and recursively until there's enough plugs for everyone's laptop. Cables turn seriously damn hot, and a burnt plastic smell crosses the room. Authority reaction: ban powerstrips because someone might trip on them.

Luckily energy management outpaced student equipment rate, and laptops now last a good part of the day without needing a plug.


Heh, they once shipped 110v PSUs to us 220v users... I had to build systems at the time, the first one I plugged in made an awesome explosion :P


My first (indeed, only) experience with the smell of burnt human flesh and hair was when a friend of mine flipped the switch on a PSU to 110v when it was plugged in to 220v. He was fine, but the PSU didn't make it. RIP.


Waste of money.

What you really do is you evacuate the room and then release the CO2/other fire suppression system which shouldn't require you to shut anything down.

Then you go into the room and check whatever equipment isn't functioning -- if you smell a fire again you might have to press the big red button, but there is no reason to panic just because of a fire, so long as you can put it out.


You want to be able to EPO without agent (e.g. if someone's being electrocuted; especially with something like Halon 1301, you want to be able to remain in the room to give medical aid), but I've never heard of room-scale agent dump without EPO (either interlocked or as a checklist procedure).

The problem with that is that you only get one shot with the agent (although everywhere I've seen has a gas/clean agent with dry-pipe water as the true final protective measure). If there's electricity still going in, and you use your one shot with the facility clean agent, you might not actually put out the fire, or it could re-ignite, and then you either have no fire protection or only water. The cost of filling a datacenter with water, especially one important to have gas and water backup in the first place, is huge, even relative to an EPO pull.


You may want to call the fire department anyway.

If it's an "OK situation" they will be able to talk you through what needs to be checked, and they'll likely know it better than you if you don't have a proper in-case-of checklist prepared before hand.

If the smell turns into something worse, then you'll get them faster on-site.


Is that a Fukushima reference or something?


No. I previously worked at a Japanese megacorp and have a deep appreciation for the engineering culture in this neck of the woods (in a way I do not for, say, the culture of working at megacorps). I had a badge which would get me into the server room. I had to pass a 10 question test to be given the badge. Question #1 was, essentially, "What do you do if you think there is a fire?" The answer is beyond dispute: you execute the prepared emergency plan. (Rough order without spilling any beans: Big Red Button, evacuate, call fire department, call the numbers listed for emergency contacts.)

You do not try to debug the fire. There are people who are good at not dying while trying to do that. You are not one of them.

You do not try to avoid Big Red Buttoning because your bosses are idiots and they might come down on you hard for it: while your bosses are probably idiots, the first thing they'll tell you about Big Red Buttons is that nobody has ever gotten fired for pressing the Big Red Button, because everyone agrees that Big Red Buttons exist to get pushed and you never want to not have it pushed because someone was worried about getting fired. Big Red Buttons are costly affairs, sure. That's why we have redundant systems, insurance, and other various things that suggest we're responsible professionals.


> the first thing they'll tell you about Big Red Buttons is that nobody has ever gotten fired for pressing the Big Red Button, because everyone agrees that Big Red Buttons exist to get pushed and you never want to not have it pushed because someone was worried about getting fired.

I'm pretty sure that axiom is not present in all companies and cultures. Hence, the debate.


I was a bit appalled by the ludicrous approach toward safety. I would rather get fired over a crusted fuse than live with being responsible for a death. This whole "debug the fire" thing really smacks of 80s era business dick wagging.


What happens if someone tries to prosecute you for "intentionally" causing millions of dollars of damage?


Not quite the same, but I know of an employment dispute that was brought to a head by a staff member quenching an MRI scanner by pressing the big red button.


While I'm not a lawyer, I'd be surprised if trying to report a possible fire couldn't be considered a "just cause", the lack of which is a key element in tort law everywhere.


Where did 'fire' come from? The smell of heat-damaged electronics, with no smoke, nothing visibly wrong, is most certainly not a strong indication of fire.


IR/thermal imaging cameras are SO USEFUL. I had a fire (bathroom fan caught fire due to being 45y old, knocked it down and extinguished it myself, but was worried about extension in the ceiling/duct).

Oakland FD came out and used their IR camera to check the heat from the ceilings nearby. Hilariously they found a hot water pipe (running between bathroom and kitchen) and almost axed the ceiling open (turning $1500 in damage into $3k+), but their captain was smart and figured it out from another angle.

Really tempted to hack an EOS 5Dm3 into an IR camera next. Not so much for fires as night vision, but it would be useful for fires too. I'm not sure how useful an IR camera is at detecting heat, since things which aren't yet on fire are not quite so infrared, though.

I usually use a Fluke IR temp meter when cooking and to find hot wires/etc. in the datacenter, though.


They are useful but they do have limitations. I'm not sure it would have necessarily detected the faulty battery in this case. Last year, there was a fire at my house and the FD searched for the source for three hours. With thermal imaging and everything. It was inside the walls, no open flame, just a lot of smoke and no clear readings on the imager. That was pretty frightening. (However when they finally did find it, they put it out in a couple of minutes.)


I'm glad it motivated me to get ABC Dry and Halon 1211 extinguishers for both rooms and the car, at least.

In a "real" datacenter, you should have smoke sensors which would map where heat/smoke is coming from (since you have controlled airflow, it should be obvious which rack or small group of racks was the source -- it doesn't just exhaust into the whole room). But it's pretty clear this wasn't a "real" datacenter by their lack of protocols for handling fire, it was some office server thing.


I'm just saying these things can quickly get more complicated than expected. I too had extinguishers handy at the time but of course I didn't know what to use them on (and neither did the FD for three hours). These electrical fires can be tricky to debug, especially when different kinds of barriers come into play.

> But it's pretty clear this wasn't a "real" datacenter by their lack of protocols for handling fire, it was some office server thing.

Probably. What's the right protocol though? In this case, it was apparently clear that something minor was amiss, nothing that would justify shutting down the whole thing. In any case, flooding the room with inert gas would probably not have made much of a difference, as it looks like the battery was never actually burning.


House construction is insane, anyway -- they're full of random stuff, and there are plenty of non-accessible void spaces. Datacenters are at least generally nice and open, so finding a weird residential hidden fire should be a lot easier. (which is why datacenters get permission to run their wiring the way they do, etc., because they have so much other safety)

The right thing to do in a real datacenter is to check which of your ~hundreds of laser VESDA sensors first tripped, and investigate in that area :) Presumably you have floor air supply, ceiling air return, so the first thing to trip should be a ceiling sensor near your fire. If no floor sensors trip, I wouldn't be super afraid to go in there, and if it's only a small number of them it's not a big fire.
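As a rough sketch of that triage, assuming the alarm console can export trip events as simple records: the `SensorTrip` fields and sensor names below are hypothetical, not any vendor's actual API or log format.

```python
from dataclasses import dataclass

@dataclass
class SensorTrip:
    sensor_id: str     # hypothetical naming scheme, e.g. "ceil-row3-a"
    level: str         # "ceiling" or "floor"
    tripped_at: float  # seconds since the first alarm was logged

def triage(trips):
    """Return (first sensor to trip, floor sensors tripped, total tripped)."""
    if not trips:
        return None, 0, 0
    first = min(trips, key=lambda t: t.tripped_at)
    floor_count = sum(1 for t in trips if t.level == "floor")
    return first, floor_count, len(trips)

# Made-up events for illustration.
events = [
    SensorTrip("ceil-row3-b", "ceiling", 12.0),
    SensorTrip("ceil-row3-a", "ceiling", 4.5),
    SensorTrip("ceil-row4-a", "ceiling", 20.0),
]

first, floor_count, total = triage(events)
print(f"start looking near {first.sensor_id}; "
      f"floor sensors tripped: {floor_count}; total tripped: {total}")
```

The earliest trip says where to look first. The floor-sensor count and the total are the two numbers the rule of thumb above leans on when deciding whether it is a small event.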

You don't want the dry pipe to go off for sure, and you don't want the FM-200 either, but the consoles should be reporting the smoke alarm to you way before a human would smell it "filling the whole room", and they don't generally discharge either for very small events (at least everywhere I've seen).

In an office (some open plan, some cubicles, some conference rooms and offices, etc.), with a few racks of equipment, and maybe some lab space, it's a lot more similar to the scary hidden residential fire problem. :( Your risks in trying to uncover the problem are actually higher than in the datacenter because then you don't have the amazing gas system and a dry pipe backup to save you if it turns into a big fire while you're there, and it's not as designed for easy egress, and probably doesn't even have real EPO. I wonder if there's a firefighter on HN who would know the real answer to this case.


FM200 systems do go off for small events at times. I managed a team of datacenter facility specialists up until last year, and we'd seen issues like: FM200 dumps because underfloor smoke detectors notice smoke from a CRAC condensate pump (pretty low risk) smoking its winding, FM200 dumps when a quick refrigerant discharge (technician error) looks like smoke to the detector, and false positives at smoke heads due to a dirty area under the raised floor, combined with air flow irregularities.

I definitely agree that I'd be more concerned about a house fire, but the rule that we enforced to our people and the vendors, as well as the vendors working for us (not to mention the guidance that we received from our customers) was that nothing in that datacenter is worth potentially losing anyone's life. That having been said, I have Toucan Sam'd in a datacenter to try and find the source of an odd odor before, but never alone, and only to find out what to secure power to. I wouldn't sit there and try to fight it with a fire extinguisher.


The only "accidental discharges" I've seen were related to construction dust in an underfloor. And yes, suck :(

In general the purpose of a handheld extinguisher is to fight tiny fires as well as to help you escape a bigger fire. The thing I'd be most afraid of would be someone walking around trying to find a small fire, only to discover a big fire, have egress blocked, and need to figure out a solution. Or, coming across an actual person who is on fire or otherwise in danger (even if you'd expect virtually no personal risk for property, I think most people would accept substantial personal risk to save a person, particularly a coworker).


I have had a halon dump when the aircon went out and when it restarted the dust that burnt off the elements caused a dump.


The wavelengths used by heat detection equipment are way longer than you'll pick up with a DSLR, even after a conversion to remove its IR-blocking filter.


Yes, silicon sensors are not photosensitive past ~900nm. The IR filter is only there to block "near infrared" 700-900nm, which is not human visible but is silicon sensitive.
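To see how far apart those bands are, Wien's displacement law gives the wavelength at which a warm object's thermal emission peaks. A small sketch follows; the example temperatures are arbitrary, the objects are treated as ideal black bodies, and the 0.9 µm cutoff is just the figure quoted above.

```python
WIEN_B_UM_K = 2897.77     # Wien displacement constant in micrometre-kelvins
SILICON_CUTOFF_UM = 0.9   # practical silicon sensitivity limit quoted above

def peak_wavelength_um(temp_kelvin: float) -> float:
    """Wavelength of peak black-body emission, in micrometres."""
    return WIEN_B_UM_K / temp_kelvin

# Arbitrary example temperatures (assumptions for illustration).
examples = [
    ("room-temperature wall", 293.0),
    ("overheating wire, 100 C", 373.0),
    ("smouldering component, 300 C", 573.0),
]

peaks = {label: peak_wavelength_um(t) for label, t in examples}
for label, peak in peaks.items():
    print(f"{label}: peak at {peak:.1f} um "
          f"({peak / SILICON_CUTOFF_UM:.0f}x the silicon cutoff)")
```

For these assumed temperatures the peak lands between roughly 5 and 10 micrometres, several times longer than anything a converted DSLR sensor responds to. Only objects hot enough to visibly glow emit much in the band silicon can see.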


How sad. I guess I'll be buying one of the $200 IR-Blue.


> I usually use a Fluke IR temp meter when cooking...

Please elaborate.


I got one of these (Fluke 62, http://www.amazon.com/Fluke-62-Mini-Infrared-Thermometer/dp/...) for $35 or so on sale.

You can get the same guts for $15 (http://www.amazon.com/Accuracy-Non-Contact-Infared-Temperatu...)

It tells you the surface temperature of whatever you point it at, and has a convenient laser for aiming. When cooking, I use it to see if a pan is hot enough yet (e.g. to sear meat), rather than relying on the "smoke point of various oils" test. You can also use it to see how close water is to a boil, although I'm not sure if it is measuring surface temperature, some slight penetration into the water, or the pan bottom (although, arguably, these should be fairly close in water).

It's also useful in something like a fusebox to find hot/overloaded circuits. It's essentially a 1x1 pixel thermal imager, while a 100x100 thermal imager costs much more.

You still want a probe thermometer (for measuring meat internal temperature) such as http://amzn.com/B0000CF5MT, and if you have kids/sick people/etc., probably the internal-temperature kind (the ear/IR kind are the least gross).


I have a couple of the Amazon cheapies that I use for checking temperatures when brewing -- it's totally awesome to check the temperature of heating mash water from across the room.

The laser also came in pretty handy for entertaining the most recent foster puppy.


Thank you very much. I have a cheap non-contact thermometer, but I never thought of using it in the kitchen. I will try it out.



Wow, I wonder what the range of that is. I.e. if I can use it to analyse the thermal loss of houses.


I believe that's what the inventor originally made it for, if you look at his postings.


Why do you need that? From my thermofluids classes, you should be able to calculate the heat loss for a given delta-T by taking a ballpark figure for the materials, i.e. so many square meters of brick, glass, etc.
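
The back-of-envelope estimate described above is steady-state conduction: sum U * A * delta-T over each surface. A minimal sketch, assuming rough U-values that are illustrative placeholders rather than figures from this thread (the function and dictionary names are made up too):

```python
# Rough steady-state heat loss: Q = sum(U * A * delta_T) over each surface.
# The U-values below (W per square meter per kelvin) are illustrative
# assumptions, not measured figures; look up real ones for your materials.
ASSUMED_U_VALUES = {
    "brick_wall": 2.0,
    "single_glazing": 5.0,
}

def heat_loss_watts(areas_m2, delta_t_kelvin, u_values=ASSUMED_U_VALUES):
    """Return total conductive heat loss in watts for the given surface areas."""
    return sum(u_values[name] * area * delta_t_kelvin
               for name, area in areas_m2.items())

# Example: 80 m^2 of brick and 10 m^2 of glass, 20 K warmer inside than out.
print(heat_loss_watts({"brick_wall": 80.0, "single_glazing": 10.0}, 20.0))
```

This only works if you already know the materials and the temperature difference, which is exactly the input the calculation needs.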


You use it when you don't know the material (because you're surveying an old house) or you don't know the delta-T (because you're looking for a hidden fire, grow op, etc).


Or you're trying to track down heat loss caused by construction flaws, especially leaks and drafts caused by cracks or gaps. Also leaks caused by improperly installed power points. Missing or settled insulation is also common.


I couldn't answer the question there because of the reputation requirement (I just signed up).

Here is a slightly different approach:

If you are in the room and smell burning, something has already happened and you are dealing with its result and possible side effects. That probably gives you enough time before you need to shut everything down or get out of the room. Your chance of not being harmed is high for at least 5-10 minutes.

In this case, a thermal check would not help much, since burned hardware is most probably no longer functioning and might be colder than the regular servers. The other possibility is that it is still working but not yet on fire, so its heat is not much different from usual.

Now, smell is your only evidence.

I am hoping you have air conditioning in the server room. Put it on the max level so that the smell will not be so strong everywhere. This can help you identify where the smell is coming from. Before you check the smell, get out of the server room and breathe as much fresh air as you can, so that your sense of smell will be sharper when you get back. Having a colleague with you makes the process faster.

This would be my first reaction to this kind of situation.

It is of course costly to turn off the whole system, but don't forget that it is not more important than you!


I'm not sure if everyone responding really read this all that carefully. There was absolutely no mention of smoke in this question. There was a "smell". If you drop an entire datacenter, you are easily looking at $100K+ in damages just to reset the room. So, getting a buddy and taking some time to look for the problem until it is found or until you actually are seeing smoke or other specific danger seems like a pretty reasonable course of action.


X times out of Y that's going to be a reasonable course of action. But one time someone will die, and at that point (because it's rare and we freak out at rare dangers) people will be up in arms about it, and about how stupid and irresponsible it is to not hit the button.


Panic can be dangerous as well. A halon dump can asphyxiate someone who doesn't reach the oxygen mask or an exit in time.

Dropping a whole server room without seeing any smoke or fire is silly. Do you pull a fire alarm if someone smokes a cigarette indoors?


...and that someone could be you.


For some more context, this was the OP's previous question on Server Fault.

http://serverfault.com/questions/420877/ive-inherited-a-rats...

Doesn't that change the entire question!


If there was a burning smell in that, yikes, hit the Big Red Button and turn in your resignation.


Seeing that rat's nest, I would have let it burn!


It is pretty simple. You call the fire department.

They have a TIC (thermal imaging camera) that can detect heat/overheating sources pretty quickly. Plus, it's kind of nice to have them on hand in case a smell progresses to a fire.


I agree. Even if you call just to alert them that something is not right. Let them roll one truck just to have someone on hand in case someone gets burnt or shocked.


I agree. People seem to imagine that all they do is put out fires with a hose. They are professionals who know how to manage potentially hazardous situations.


It's been a while since I've been in IT, but don't failing UPS batteries put off fumes that destroy your lungs?

A friend of a friend was a hero and shut down his datacenter cleanly/recovered some hard drives during a situation like this. He got severe lung damage (not from fire).


Lead acid batteries will have a rotten smell after shorting out - more so in wet ones, but SLA will smell the same.

If you wait to the point that an SLA smells, it has probably expanded and caused internal damage to your UPS/server rack, albeit minor if you can manage to get it out without dismantling anything.


Just an idea: the server room should be segmented into smaller areas with isolated power circuits. Using the sniff test, if you are truly concerned that you are about to have a fire, then it's only responsible to start an orderly powerdown to prevent equipment loss, and more importantly, prevent injuries to people.

If you start shutting down areas of the datacentre that appear to be closest to the smoke, then you will have a better chance of locating the issue in the fastest possible timeframe, with minimum disruption.

On top of this, if you then have critical infrastructure that you must keep running, then you keep your failover servers in different areas and failover to that equipment.

I'm not a server or datacentre guy in any way, but doesn't this seem sensible?
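
One way to sketch the zone-by-zone idea above: rank power zones by distance from where the smell is strongest and shut them down in that order, leaving the zones that host failover equipment for last. The zone names, coordinates, and the `shutdown_order` helper are all hypothetical.

```python
import math

def shutdown_order(zones, smell_xy, keep_running=()):
    """Return zone names sorted nearest-first to the suspected source.

    zones: mapping of zone name -> (x, y) position in meters (hypothetical layout).
    keep_running: zones hosting failover equipment, listed after all others.
    """
    def distance(name):
        return math.dist(zones[name], smell_xy)

    normal = sorted((z for z in zones if z not in keep_running), key=distance)
    protected = sorted((z for z in zones if z in keep_running), key=distance)
    return normal + protected

# Four zones on a 10 m grid; the smell is strongest near zone B.
zones = {"A": (0, 0), "B": (10, 0), "C": (0, 10), "D": (10, 10)}
print(shutdown_order(zones, smell_xy=(9, 1), keep_running={"D"}))
```

If the smell clears after the first zone goes dark, you stop there and the rest of the room never loses power.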


'hmmm... probably UPS battery venting' click 'yup'

When UPS batteries vent, there is a distinctive odor. It's very pungent and sulfuric, but it doesn't smell like a fire or melting silicon. Any experienced operations guy has smelled it before.

Additionally in most fire suppression systems the Big Red Button is the abort button. A well designed room will dump itself when it detects smoke after a short evacuation alarm. It's precisely designed to keep people from screwing around with a real fire. They must make the active decision to stop fire suppression rather than start fire suppression.
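
The "dump by default, abort by choice" behavior described above can be sketched as a tiny state function. This is a toy model, not the logic of any real suppression panel; the 30-second evacuation delay and all names are assumptions.

```python
def suppression_state(smoke_detected, seconds_since_alarm, abort_held,
                      evac_delay_s=30):
    """Toy model: the agent dumps by default; a person must act to stop it."""
    if not smoke_detected:
        return "idle"
    if abort_held:
        # Someone is actively holding the abort button.
        return "held"
    if seconds_since_alarm < evac_delay_s:
        # Evacuation alarm is sounding; the dump is pending.
        return "evacuation_alarm"
    # Nobody intervened before the delay ran out.
    return "dump"

print(suppression_state(True, 5, abort_held=False))
print(suppression_state(True, 45, abort_held=False))
print(suppression_state(True, 45, abort_held=True))
```

The point of the design is that doing nothing leads to "dump"; only a deliberate action keeps the agent in the tanks.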


HVAC. Has no noticeable smoke (it'd probably be outside the building anyway) but pumps a burning smell into the room when the motors start dying or aren't oiled right.

Don't hit the big red button just because you smell burning.


I've had this happen to me once. No alerts and all boxes were up, but there was a smell in the room. I went machine by machine and UPS by UPS and nothing was wrong or burning(1).

Next day we find out the breaker panel next door had a short that blew out several breakers. Smell was vented into the server room.

So, not always your room, could be something else just as or more dangerous.

1) shut down all machines, unplug all UPSes, open every case


What kind of server room is this which is not equipped with smoke detectors?


If it's just the "melting plastic" kind of smoke, it won't trigger smoke detectors. And I believe that his battery wasn't actually on fire - if it was, then yes, smoke detectors would have triggered.


Oh yes, it will.

I made my 12 year-old read the story (along with pictures) of a girl his age who was trapped in a burning house after he set off the smoke alarms at midnight by melting bits of plastic in his room.

Hopefully he learned something that night.


How did setting off smoke detectors cause a fire?


Poor sentence structure on my part.

My son set off the smoke detectors by melting plastic in his bedroom with a cigarette lighter.

To demonstrate the danger of what could come from this, I had him read a news report of a girl who was screwing around with fire and ended up being badly burned over most of her body.


Depending on how effective the ventilation system is / how diffuse the smoke is, the detectors may not do much to pinpoint where the smoke originated. Many DCs only have so many smoke detectors, at fairly wide intervals. There can be a lot of devices inside a 30' square.


Do smoke detectors also detect smells? The question doesn't report any smoke at all.


I spent a couple of summers interning at IBM and one of the things they taught you in orientation was the sound of "imminent halon dump" (the alarm that said Halon was about to be used in the machine room). The instructions were, hold your breath and make for an exit immediately. Failure to do so would lead to asphyxiation.


Lead based batteries often have a usable life of 3-5 years. Chances are, the others of that vintage are already dry and have already failed. Then they will rupture, often with smoke as their series connected brothers try to push electrons.


I have seen plenty of UPS batteries swell up so big they can't be removed without disassembly. The only indication was the failed self test. OP did say it was the UPS in the rack next to his production DB.

Don't forget Capacitor Plague. I still see it regularly.

https://en.wikipedia.org/wiki/Capacitor_plague

Have a plan, be safe.


If it's that important, buy a TIC (thermal imaging camera). They can be had for under $10,000 and will show you actual hotspots. Walk through, sweeping every item.


Temperature indicator stickers.


At $5+ each, that quickly adds up, and they're difficult to read in large numbers under duress. An IR camera, by contrast, is rather effective and perhaps cheaper.
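
As a quick sanity check on that comparison, using only prices quoted in this thread ($5 per sticker as a lower bound, a thermal camera at just under $10,000, and the $200 IR-Blue), the break-even sticker counts can be worked out like this. The function name is made up for illustration.

```python
import math

def break_even_stickers(camera_price, sticker_price=5.0):
    """Smallest number of stickers whose total cost reaches the camera price."""
    return math.ceil(camera_price / sticker_price)

print(break_even_stickers(10_000))  # high-end TIC price quoted in the thread
print(break_even_stickers(200))     # IR-Blue price quoted in the thread
```

Since the stickers are "$5+", the real break-even counts would be at or below these numbers.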


Thanks for this story. I am ordering a fire extinguisher for our server room now.


It helps to prioritize what you look for. 9 times out of 10 it's power related.


Get in an electronics "expert"; they know what component smells like what :)


Infrared camera!


Can of deodorant.



