
Boiling the machines when they needed to just chill - weinzierl
https://rachelbythebay.com/w/2020/04/28/boil/
======
dmonagha
Once upon a time I worked a contract gig at a credit union in the US. My first
day in the datacenter (ground floor), I noticed the grill from a late
90s/early 2000 Ford F-150 pickup truck hanging on the wall above the door
frame. I pointed it out and asked "why is that on the wall?"

The building was like many bank buildings, 4-5 stories, all glass exterior,
off a busy main road. In particular, this building was parallel to the busy
main road.

Turns out, the busy main road was perpendicular to another road, which lined
up almost perfectly with the datacenter on the first floor. One Saturday
night, a drunk driver ran the stop sign on the side street, blasted through
the field leading up to the datacenter, and drove their truck straight into the
building, right through the offices surrounding the main server room. The
force of the crash blew bricks across the room in all directions and missed
the raised floor by maybe 3 meters.

Surprisingly, nothing went down. The datacenter now has "bulletproof" glass on
that portion of the exterior, poles installed into the ground in front of the
glass, and an earth berm raised above the road.

The sysadmins found the truck grill inside one of the offices under a desk.
They kept it and hung it on the wall as a reminder to plan for any sort of
disaster.

------
prewett
At one of my early jobs I was sysadmin for a local office of a small company
that wrote simulation software. We had procured some rack-mounted servers that
sounded like jet engines, so when we expanded to the office across the street,
the boss took the opportunity to knock out two closets and make a server room.
I remember having a conversation with him where he asked my advice on the size
of A/C to buy. Despite having no knowledge on the subject, I had opinions, and
I recommended getting the biggest A/C.

That all got installed, and it was great, we no longer had a jet engine in the
room next door, and I felt awesome because not only was I managing 10 Linux
machines but now I had a compute cluster. In a rack. With a little slide-out
KVM thing that switched between the machines!

Well, several months later I started noticing mold on the side of the closet,
and a thin condensation on the rack sometimes. So I got a temperature monitor
and wrote a script to create pretty graphs, and between the low temperature
and the central Texas humidity, we had a problem.

So the boss (who was in the New York office) had me talk to someone who
actually knew something about A/C (because I had no idea why this was
happening). Apparently I had spec'ed out an A/C large enough to cool the
entire building, and the problem I was experiencing was exactly why one
matches the A/C to the heat load. The solution was to add a heater to the
A/C. So we ended
up cooling a building's worth of air, then heating it up, and then sending it
to cool 2U's of servers.
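For scale, the mismatch is easy to quantify: electrical draw converts to heat at roughly 3.412 BTU/hr per watt, so a couple of servers are a tiny load next to a building-sized unit. A sketch with purely illustrative numbers (none of these figures are from the story):

```python
# Rough A/C sizing arithmetic: convert an IT heat load in watts to
# BTU/hr and compare it against a unit's rated capacity.
# 1 W of electrical draw ~= 3.412 BTU/hr of heat output.

def heat_load_btu_per_hr(watts: float) -> float:
    """Electrical draw in watts is (almost all) dissipated as heat."""
    return watts * 3.412

def oversize_ratio(unit_btu_per_hr: float, load_watts: float) -> float:
    """How many times larger the unit is than the actual load."""
    return unit_btu_per_hr / heat_load_btu_per_hr(load_watts)

# Illustrative: 2U of servers drawing ~700 W vs. a whole-building
# 60,000 BTU/hr unit.
load = heat_load_btu_per_hr(700)      # ~2,388 BTU/hr of heat
ratio = oversize_ratio(60_000, 700)   # unit is ~25x the load
print(f"load: {load:.0f} BTU/hr, oversize: {ratio:.0f}x")
```

An oversized unit like that short-cycles: it hits the setpoint before it has run long enough to pull moisture out of the air, which is how you end up with a cold, damp, moldy closet.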

~~~
m463
That reminds me of a time I was working at a site delivering a big project.

They had put up a big temporary building next to the project building to hold
a lot of additional engineers. The temporary building seemed to be the
commercial equivalent of 10 double-wide mobile homes side by side with the
side walls removed. There's probably a name for it. The end result was a ramp
from the main building, leading up and turning into a long room maybe 200'
long and 40' wide filled with lots of desks with engineers.

There were also columns every so often for network and power drops.

There was also a thermostat every 20 feet or so. (see where this is going?)

I happened to be working late one night and entered this room after everyone
was gone. And the funny thing was, even though it was cool outside, all the AC
units seemed to be running and making a lot of noise.

Walking down the middle of the room and looking at the thermostats I realized
what had happened. Everybody had their own "comfortable" and they were all set
to different temperatures. The result was at night you had heating and ac
running, sometimes right next to each other, in an epic battle.

I went to each thermostat and set it to the same value, and within a short
time equilibrium was reached and the entire room quieted down as the
compressors and fans wound down.

Thinking back, I wonder how long my fix lasted.

~~~
mtnGoat
Based on personal experience: if you didn't install a lock box over the
controls... about 10 minutes into the next work day. ;)

------
sixdimensional
I have a great story...

I worked for a startup in the back of someone's house. They had built an
extension on the back of their house which was basically a greenhouse (it was
like a steel and glass construction). The glass was supposed to be thermally
coated but apparently it was installed backwards, with the thermal coating on
the inside not the outside.

So the story goes, we had our server room and workstations all in this
addition. During the summer it was routinely at or above 100 deg air
temperature. I had a small portable AC unit we ran 24/7 and eventually they
got a dedicated AC unit for the room. We also ended up covering the glass roof
with tarp to try to block out the sun.

It was a pretty insane setup, but it was a cash-strapped thing and the owners
didn't want to do more than what they did. I was a naive kid fresh out of
school, and thought it was so cool to work at a startup - even if it meant
sweating constantly in a noise-filled environment, working nights and weekends.

That said, the startup succeeded and grew into a company.

There was also that one time there was a rain storm and the "greenhouse server
room" was not sealed against the side of the house, and the rain poured in
through a crack. I will never forget when my colleague called at 1AM to let me
know the server room was flooded, and that one server, which had its case
open, had the case fill with rainwater - while it was still turned on and
running. And it kept running as most of the water collected at the base of the
server chassis...

If you think this sounds crazy, you're right, but I'm sure somebody else has a
story just as crazy or even more so!

Ah, the early days, before the cloud... when we did it all ourselves. Yup, I
had to wear the HVAC hat too.

------
ogre_codes
A few years ago, I worked for a hospital system which had their datacenter in
Phoenix. We had a large (tens of thousands of square feet) datacenter
supporting a few dozen hospitals, and its HVAC system was marginal. On a hot
day it could barely keep the datacenter below 80 degrees. This was a facility
with hundreds of millions of dollars in equipment. I don't think there were
any systems that directly affected patient care, but pretty much every other
type of hospital system was in that building.

When the AC did fail, the building temperature spiked up over 90 degrees
within half an hour and eventually climbed up over 100. We spent most of the
afternoon shutting down the least critical systems so the most important ones
would stay online, then slowly brought them back after the HVAC was restored
and the building started to cool again in the evening.

I was stunned that they didn't install some kind of redundant HVAC setup after
that, or at least one that kept the room at 70 degrees on a hot day.

The other huge facilities failure at that facility was when electricians were
working on the (ironically enough) backup power system. They accidentally cut
power to the entire datacenter for about 30 seconds.

I've seen more server down-time due to facilities errors than computer
hardware by a good margin.

------
dx87
I had an instructor for a class who had a similar story. He had requested a
day off of work to take a short vacation, but because it was a small company,
it meant that no IT personnel would be available for that day. He came in on
Monday and was immediately called into his manager's office because apparently
all the servers went down on the day he wasn't there, and they had to call in
the other IT guy to fix everything. Turns out, one of the other managers in
the office had turned off the dedicated AC to the server room because he
thought IT was wasting money by running the AC when nobody was there.

Luckily they sided with my instructor when he said that it wasn't his fault
and ended up firing the other manager for trying to cover up his own mistake.

~~~
KineticLensman
30 years ago I worked at a place where we had a server room full of Sun boxes and a
Symbolics Lisp Machine. The server room was a repurposed office to which some
A/C had been retrofitted. On one occasion the machines started randomly
glitching. The first clue that the A/C had failed was the Symbolics printing
'help I'm getting hot' (I paraphrase) to a console I was monitoring out in the
main open-plan office.

~~~
Symbiote
That reminds me of the "lp0 on fire" messages which some Unix/Linux systems
can generate:

[https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux...](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/char/lp.c?h=v5.6.7#n239)

------
drewg123
I worked as research staff for an OS research group at a university in the
90s. We had a new building, with a nice new machine room. However, the machine
room had tightly controlled access which was limited to department IT staff
only, and we wanted our grad students to have physical access to our machines.
So we took over a "lab" that was never intended to host machines.

By the time I left, we had 2 racks with networking gear (Myrinet and 1Gb/s
ethernet, which was a big deal in the 90s), and probably 40+ 1U servers, plus
several tables full of larger pedestal storage servers (one of the projects
was a cluster filesystem).

Just like in Rachel's story, we would have times where the HVAC would go out.
I remember dragging giant 4' fans up from the basement to put in the doors of
the room to cool it off during HVAC failures.

Oh, and we had a fume hood too!

------
JoeAltmaier
My last client had their main nationwide server, that their company depended
upon, in a stuffy closet under a fire sprinkler. No it never went off, but
dang.

Anyway folks in Enterprise can forget, any organization under a certain size
doesn't have _any_ special consideration for their computer systems. They just
grew from some PC on a desk to whatever server/router/firewall situation they
are in now.

~~~
pjc50
My favourite of that was when we grew from "some PCs plugged into extension
leads from a floorbox" to four full racks of 2U servers, without ever
upgrading the power distribution. Until someone noticed that one of the
consumer-grade extension leads powering the whole setup had gone brown ...

Oh, and if there was a power cut you'd have to disconnect the racks and bring
them back one at a time or the inrush current would trip the building breaker
again.

~~~
hinkley
I recall hearing years ago of someone modifying, I believe, the BIOS on a
fileserver to spin up the hard drives one at a time so that the inrush current
didn't cause brownouts (on the PSU or the building circuit, I don't recall).

~~~
buzer
Facebook went a bit farther.

[https://engineering.fb.com/core-data/under-the-hood-facebook...](https://engineering.fb.com/core-data/under-the-hood-facebook-s-cold-storage-system/)

> The biggest change was allowing only one drive per tray to be powered on at
> a time. In fact, to ensure that a software bug doesn’t power all drives on
> by mistake and blow fuses in a data center, we updated the firmware in the
> drive controller to enforce this constraint. The machines could power up
> with no drives receiving any power whatsoever, leaving our software to then
> control their duty cycle.
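The invariant the quote describes - at most one drive per tray powered at any moment - can be modeled as a tiny controller that serializes spin-up. This is a toy sketch of the idea, not Facebook's actual firmware, and every name in it is made up:

```python
import time

class TrayPowerController:
    """Toy model of a per-tray power controller that enforces
    'at most one drive powered on at a time', so the combined
    inrush current of simultaneous spin-ups can't blow a fuse."""

    def __init__(self, num_drives: int, spin_up_s: float = 0.0):
        self.powered = [False] * num_drives
        self.spin_up_s = spin_up_s  # settle time after each spin-up

    def power_on(self, drive: int) -> None:
        # The hardware-enforced constraint: refuse a second drive.
        if any(self.powered):
            raise RuntimeError("another drive is already powered on this tray")
        self.powered[drive] = True
        time.sleep(self.spin_up_s)  # wait out the inrush surge

    def power_off(self, drive: int) -> None:
        self.powered[drive] = False

    def duty_cycle(self, drives, work) -> None:
        """Visit each drive in turn: power on, do the work, power off."""
        for d in drives:
            self.power_on(d)
            work(d)
            self.power_off(d)
```

The point of putting the check in `power_on` rather than in the scheduling loop mirrors the quote: even a buggy caller that tries to power everything at once gets stopped at the lowest layer.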

------
icedchai
In the mid-90's, I worked at an early ISP that had most of its network and
dialup equipment in a small retail / office plaza. It was in the basement of a
unit that they had rented to someone else.

There were about 100 individual phone lines coming off the wall, each going to
its own modem. Basically, a river of phone cables. Each modem had its own
power supply and serial cable. "Power distribution" consisted of power strips
chained 2 or 3 layers deep.

There was no cooling to speak of. I remember going down there in July or
August and it felt like the place was going to melt down. Some modems, quite
literally, had melted: the plastic was warped and discolored.

Fun times...

------
SteveGerencser
I had a client that didn't like to use passwords on their server because he
could never remember them when he wanted to check something. His security
solution was to put the server in a steel locker with 1 hole just large enough
to pass a cord from a power strip and a CAT5 cable out of the side. Then lock
the locker so no one had physical access to the server.

He did all this on his own on a Friday and then he left the CRT on all
weekend. I got a call on Monday letting me know that "my" server had failed
and I needed to get out there immediately. And of course, when I got there he
was gone with the only key to the locker.

~~~
cwkoss
I wonder how much situations like this contribute to the high frequency of
lockpicking being a hobby in IT/sec circles.

~~~
hinkley
We were going to move a server one evening and we told the head of IT ahead of
time we needed to move it at 6. He agrees.

At 5:45 we roll into the server room area and he has gone home for the
evening. Server room is locked.

As a Hail Mary, someone asks building security, and they surprise us by
saying of course they have keys. So they come in and try the doors (the
server room had
been expanded to fill several offices, so there were 3 doors). No luck with
any of them. Which upset the security person because they were supposed to
have keys.

So security wanders off before we start trying to jimmy the locks, and it
turns out that the middle door accepts Mastercard. It was never mentioned
again and the IT dude never asked (which he should have, really).

I don't know that I'd play it the same today. Back doors are put in or
tolerated primarily because of fear of, or the reality that, someone isn't
doing their job or some group is making a power grab (or typically, both). It
costs political capital to call someone out on that but the alternative is to
have people lying (even by omission) about known security issues with the
system or facilities.

Personally, I'd rather have a couple people who have credentials or keys they
are sworn 'never to use', except in extraordinary circumstances.

------
jrochkind1
> How do you deal with a tech who leaves a system in a state where it will do
> nothing but dump hot air at full speed all night long?

Well, this is why organizations no longer keep server fleets in random on-
premise used-to-be-copier repair rooms. (Or do they? I wonder if school
districts might still be doing that...)

~~~
snazz
Speaking from my friend's experience, school districts _totally_ still do
that. Their uptime only has to be good, not amazing. I don't think he's ever
run into any HVAC issues, though.

~~~
abakker
My experience backs this up as well. Schools don’t have the budget to do it
right or to care, so they make do.

~~~
0_gravitas
yet they always seem to have enough to casually drop several grand
AstroTurfing their football field

~~~
abakker
Yes, because kids care about football, but nobody cares about a school's IT
department. For schools (and many businesses) IT is just a utility.

~~~
0_gravitas
I didn't care about football, I cared about computers. My entire school
district didn't have a hint of CS more advanced than learning how to use
Microsoft Office. Additionally, there are ways to play football outside of a
school team, I played soccer for many years outside of school and enjoyed
myself well enough.

------
chiph
"Quick fix" by a sysadmin at a previous job:
[https://imgur.com/a/REwK1HP](https://imgur.com/a/REwK1HP)

Yes, that's a Walmart window A/C sitting on a filing cabinet with a box fan
teetering on top. No, it's not vented to the outside. So the temperature in
the room continued to rise.

------
kyuudou
One of my early jobs, I worked for a group of remote sensing scientists that
analyzed weather satellite data and the equipment I was responsible for was in
an improvised raised floor room with probably the smallest unit Liebert
produced. I found out one weekend why there were huge fans scattered about the
office - the Liebert sometimes failed, for various reasons and at various
times, and the 5x 52U racks of Dell PowerEdge and Sun E3800 servers and
storage got super hot super quick.

I had to rush into the office with my boss, a brilliant remote-sensing PhD
who flew into hurricanes aboard P-3 Orions to verify the data they were
gathering against what they were getting from the microwave scatterometer on
the satellite, and help him set up fans and shut down equipment that wasn't
absolutely necessary.

After that, I resolved to be alerted proactively and scared up a setup with
some Sensatronics temp sensors, Cacti, and Nagios so I could get data over
time as well as alerts when over a threshold. I also enabled the OpenManage
alerting and found you could get the ambient temp at the air intake of a
PowerEdge, so I got that into SNMP and set up alerts on that as well. Fun times.
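A minimal version of that kind of threshold alerting can be sketched by shelling out to net-snmp's snmpget and parsing the reading. The OID below is an assumption modeled on Dell's OpenManage temperature-probe MIB (which reports tenths of a degree Celsius) - verify it against your own hardware's MIB before relying on it:

```python
import re
import subprocess

# Assumed OID, in the style of Dell's 10892 MIB: ambient temperature
# probe reading, reported in tenths of a degree Celsius.
AMBIENT_OID = "1.3.6.1.4.1.674.10892.1.700.20.1.6.1.1"

def parse_tenths_c(snmp_output: str) -> float:
    """Extract the INTEGER value from snmpget output and convert
    tenths of a degree to degrees C, e.g. 'INTEGER: 312' -> 31.2."""
    m = re.search(r"INTEGER:\s*(\d+)", snmp_output)
    if not m:
        raise ValueError(f"no integer reading in: {snmp_output!r}")
    return int(m.group(1)) / 10.0

def read_ambient(host: str, community: str = "public") -> float:
    """Query one host's intake temperature via net-snmp's snmpget."""
    out = subprocess.check_output(
        ["snmpget", "-v2c", "-c", community, host, AMBIENT_OID],
        text=True)
    return parse_tenths_c(out)

def over_threshold(host: str, threshold_c: float = 30.0) -> bool:
    """True if the intake temperature says it's time to page someone."""
    return read_ambient(host) > threshold_c
```

A cron job calling `over_threshold()` and mailing on True is the crude version; feeding the same reading into Nagios as a check plugin gets you the history and escalation for free.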

Side note: when I first walked in to the server room, I saw some Apple Xserve
RAIDs and thought that was odd next to some Dell stuff and he said, those are
the only storage units that can handle the G-forces we experience in
hurricanes. "The Powervaults just fall right over"

They still had Sun equipment since RHEL was still at v4 and was not fully
64-bit, while Solaris + SPARC had always been 64-bit, and the dataset sizes
they were crunching required being able to access RAM-per-process of more than
4GB. MUCH more than.

------
larrydag
Is it common for sysadmins to be ad-hoc HVAC service admins too?

~~~
PeterisP
Yes, because the ordinary HVAC service runs on a "best effort" basis where
some downtime or reduced capacity is acceptable and you can try and fix stuff
if someone complains; but for a datacenter you suddenly need 24/7 99%+ uptime
with rapid response in case of failure, because losing AC is only just a bit
worse than losing power, and it's easier to have redundant power than truly
redundant AC.

~~~
asdff
You mean having two windows with AC units isn't redundant enough? I kid.

------
nieve
Many years ago, a brand new server room with its own dedicated HVAC system,
used for nationwide electronic home arrest, started having all the machines
fail every weekend night. As you can imagine this was a panic-mode situation,
and they had some of the sysadmins stay overnight. It turns out that the HVAC
people, despite knowing what the room was used for (having in fact been in it
setting up the underfloor), had programmed the A/C to shut off over the
weekend to save electricity. When drama flows downhill from the Feds no one
is happy, and the resulting meeting was almost as uncomfortable as the one
after Qwest deliberately cut both sides of a fibre ring and then tried to
sell us service.

(We later found out Qwest's field people didn't want to cut a different
carrier's fibre so they called back to confirm and were told to cut it or not
come back in the morning.)

------
OrangeMango
In the 90s, I worked at a little place that was crammed full of PCs and other
electronic equipment. It got uncomfortably hot in the summer. A couple of us
went around to all the PCs and changed the settings - instead of a
screensaver, turn off the monitor after 10 minutes of inactivity. Problem
solved. The age of CRTs...

------
annoyingnoob
I used to call it spidey-sense. In reality it was being always on and never
off, not really a healthy way to live.

------
geocrasher
I have lived this scenario. I should write it up sometime. Also fun:
Sprinklers and failed UPS batteries.

