
Moved a server from one building to another with zero downtime - huhtenberg
https://www.reddit.com/r/sysadmin/comments/i3xbjb/rant_sorta_physically_moved_a_server_today/
======
digitalsushi
I had to search the reddit comments for 'vmotion'. They have it covered.

This anecdote is an amazingly good story for telling at the pub over a few
beers. It's a terrible story for a strategy.

If this is a mountain, my molehill is that one night in the late 90s, I got
paged because the SMTP outbound server was overheating. At midnight I drove
across sleepy NH backroads, and stopped at a Wendy's to get a chicken sandwich
and an iced tea, for the caffeine.

When I got to the server room, I pulled the 2U Dell server out of the rack and
discovered the CPU cooling fan had seized up. Mind you, this is a New
Hampshire data center in 1999, and it has a filing cabinet with manila
folders, and carpeted floors. This thing was never prepared for any disasters.

A half hour later, the SMTP server was up and running cool again.

I greased the fan with the mayonnaise from my sandwich.

~~~
dsr_
The (probably soybean) oil is a fine lubricant, but the constant motion should
cause the egg proteins to coagulate. How long did it operate before you
replaced the fan properly?

~~~
kijin
The egg proteins are already quite coagulated. I'd be more worried about the
vinegar component. You need to neutralize that acid with something.

~~~
tzs
It's also got corn syrup. Would that cause any problems for this application?

Here's the ingredient list for Wendy's mayo: Soybean Oil, Water, Egg Yolks,
Corn Syrup, Distilled Vinegar, Salt, Mustard Seed, Calcium Disodium EDTA (To
Protect Flavor)

~~~
cnasc
> It's also got corn syrup. Would that cause any problems for this
> application?

Over time, the server would expand to be 4U rather than 2U

~~~
noir_lord
That's the problem with the FAT filesystem, it grows over time.

------
walrus01
Sort of on the subject, I've seen a brochure for a specialty product marketed
to law enforcement. It's meant for use in the seizure of live, powered-on
desktop PCs and similar machines that have a high likelihood of full disk
encryption.

Essentially it's a medium-sized double-conversion UPS with a really high
quality sine wave inverter and some electronics that can match phase with a
live 120 VAC 60 Hz circuit, plus a tool kit consisting of the insulated
electrical hand tools needed to do a midspan removal of the cable jacket and
splice into the wires of an ordinary PC power cable. The person using it is of
course supposed to be trained in advance, and competent at the process of
attaching the UPS to the live circuit.

~~~
huhtenberg
In a similar vein, there are USB gadgets that emulate a mouse that jiggles
constantly, to prevent the machine from locking on user inactivity.

However, there are anti-jigglers too, which lock the machine when any new
human input device is plugged in.

[http://codefromthe70s.org/antijiggler.aspx](http://codefromthe70s.org/antijiggler.aspx)

~~~
kawsper
That's interesting.

You could have a list of known USB device IDs you trust, and if a newly
plugged in USB device wasn't on that list you could lock or power down.
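
On Linux, a minimal sketch of that idea with pyudev - the trusted IDs and the
lock command here are placeholders, not a vetted policy:

    # Lock the session whenever an unrecognized USB device is added.
    import subprocess
    import pyudev

    TRUSTED = {("046d", "c52b")}  # (vendor_id, model_id) pairs you trust

    context = pyudev.Context()
    monitor = pyudev.Monitor.from_netlink(context)
    monitor.filter_by(subsystem="usb")
    monitor.start()

    for device in iter(monitor.poll, None):
        if device.action != "add":
            continue
        vid, mid = device.get("ID_VENDOR_ID"), device.get("ID_MODEL_ID")
        if vid and (vid, mid) not in TRUSTED:
            subprocess.run(["loginctl", "lock-sessions"])  # or power down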

~~~
ThePadawan
That is a policy I've heard of being used in not-extremely-secure
environments, like software development at a bank (completely isolated from
the production environment).

They didn't go so far as to raise alarms on unknown device IDs, but devices
would simply not be mounted if they weren't whitelisted.

~~~
walrus01
About 13-14 years ago some parts of the US DoD resorted to filling all the
USB ports on desktop PCs with hot glue, except for the two ports required for
the keyboard and mouse.

This was during the Windows XP era, when it seemed there were an endless
number of security problems related to USB devices, no matter how good the
group policy and registry settings pushed via Active Directory membership
were.

~~~
swalsh
What does that solve though? I don't NEED a mouse to copy data.

~~~
dhosek
It solves the "I found this USB stick in the parking lot—let me plug it in to
see what's on it" problem.

~~~
icedchai
Sure, if they don't have a USB hub sitting around.

~~~
dhosek
The closest thing to a USB hub I've got is an external drive for my Mac Mini
that has a built-in USB hub, so I can plug stuff into that as well as directly
into the computer. The last time I worried about such things was back when
desktop computers only had one or two USB ports. Plus, in a DoD situation, I'd
imagine that having your own USB hub plugged into a DoD computer would be the
kind of thing that could put your job at risk. A friend who teaches at the
Naval War College often laments the unusability of DoD IT because of the level
of locking down, but any "Why don't you do X?" suggestions get a response of
"I'd get fired."

The safeguard doesn't need to be perfect; it just has to be good enough.

------
jcrawfordor
I once basically spent a summer doing this, not across a parking lot but to
consolidate the remaining equipment in a large number of racks into a few new
ones. This was a former sales office of a megacorporation that had been built
to have its 1970s-era computer room proudly displayed through windows into
the main conference room - a very weird setup without the context that, in
said '70s, that conference room was used to pitch prospective customers on
business automation.

Anyway, by the time I was there it was still a '70s-vintage large computer
room but now massively overprovisioned on space, cooling, etc., particularly
with most IT functions having moved to corporate. A decision was made to
repurpose part of it as a test lab and move all the actual remaining equipment
to three racks in the corner.

I'd do about two servers a day in between other things, taking advantage of
redundant power supplies to transfer the PSUs one at a time to extension
cords, swap to a long network cable fast enough that TCP sessions probably
didn't time out, and then unrack onto a hydraulic lift cart and do the same
procedure the other way.

I presented this at the start as far from a guaranteed strategy - it would
minimize downtime, but there would inevitably be some due to mistakes. None of
this was really that critical. There were a few devices that were pretty old
and poorly maintained; we agreed up front that if these lost power for some
reason and then failed to boot, we would just say they'd lived long lives and
purchase replacements.

I guess the point is that this whole situation was kind of unusual and I
would generally _not_ recommend doing this. We were lucky that all the
equipment left had stakeholders who acknowledged it was legacy stuff and could
tolerate losing it.

The irony is, of course, that it went perfectly. So far as I know there was
not a single problem experienced through the whole thing. I even managed to
swap the phone lines to the (surprisingly busy!) legacy fax server when each
was out of use.

~~~
groby_b
> There were a few devices that were pretty old and poorly maintained [...]
> if these lost power for some reason and then failed [... we'd] purchase
> replacements.

And the sysadmin let that opportunity pass??

~~~
jcrawfordor
Heh, I know of a couple of people who would have been far happier if the fax
server specifically had somehow been accidentally tipped down a stairwell...
By this time there were enough dirt-cheap "cloud" fax services that it really
didn't make sense to keep the on-prem system, particularly since the office
had been almost entirely migrated to VoIP except for a few oddball devices
like that fax setup. But that whole thing just becomes a story of the internal
woes of that particular megacorporation; from the computer room to the front
office, there was a whole lot of stuff just being kept over from the '90s.

------
devchix
I recall sometime in the mid 2000s there was a fever for achieving five-9s
(99.999% uptime -- it became fodder for a few episodes of Mr. Robot). Not
that the metric ever went away, but back then a lot of BigIron(TM) vendors
advertised achieving five-9s by replacing hardware while the OS remained
running and continued serving. The Sun 15K and 25K series (Gilfoyle had a used
one in the garage running his network) were behemoths whose mem/cpu boards you
could swap out wholesale while the entire frame and backplane were powered on,
and while the OS the board came out of remained running. There were many
caveats around the procedure, but it worked. Execs and sales guys loved those
demos. These monsters were expensive, and banks and energy conglomerates were
buying them by the dozens. There was also a big to-do about hot-swappable
drives. The idea that you could be doing hardware maintenance while the
machine was still running was a novelty, something like brain surgery while
the patient was not only awake, but awake and eating, driving his car, talking
on the phone, etc.
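
For scale, five-9s works out to roughly five minutes of downtime per year:

    # 99.999% availability, expressed as allowed downtime per year
    minutes_per_year = 365.25 * 24 * 60           # 525960
    print((1 - 0.99999) * minutes_per_year)       # ~5.26 minutes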

A decade later I look back with deep surprise that we didn't think to abstract
out the service instead of the hardware. I don't know how many of those
behemoths are still being bought; now I work almost exclusively with small
server instances that can come and go on the fly. Microservices and AWS have
taken five-9s in a different direction. I frequently think of Sun as a failed
Hephaestus: in a Christopher Nolan film he would be brilliant but could only
turn out clumsy tools because of his deformity; he hates the things he makes,
so he throws them away before completion. Men find these cast-offs and temper
and refine them.

~~~
larrik
> a lot of BigIron(TM) vendors advertised achieving five-9s by replacing
> hardware while the OS remained running and continuing service.

AS/400s were capable of that in the '90s (possibly the '80s as well). Heck,
they'd call IBM for replacement parts on their own. You'd show up for work and
there'd be an IBM guy waiting to be let in. He'd swap out a part with no
downtime, and be gone. I've seen machines with uptimes of over a decade and
zero on-site IT.

~~~
blhack
We had one of these at an old office of mine. I actually think it's really
cool.

------
Milank
No downtime is acceptable, but they have only one server?

What if a technical failure happens? What if there's a fire in the server
room? What if there is an earthquake and the building collapses? What if...
many things can happen that can result in a long, long downtime with this
tactic.

If uptime is so crucial, the system should be set up in such a way that moving
one server is a piece of cake, not a spec-ops mission.

~~~
momokoko
You’d be shocked how rare downtime is with modern hardware. A redundant power
supply and SSDs in the right RAID configuration typically will not have any
issues for years, until the machine can be replaced by a newer model. Hardware
monitoring has also improved significantly, to the point where you’ll
typically know when something is about to fail and can schedule the
maintenance.

In the past, power supplies and spinning-disk hard drives failed much more
often.

It’s basically a solved problem outside of extremely mission-critical,
five-nines kind of stuff, which we all forgot because of AWS.

HN ran, and may still run, on a single bare metal server.

~~~
paulie_a
Quality hardware has existed for years. At a Ford motor plant they were doing
an inventory and couldn't locate a 10-ton mainframe. It had been working so
well for 15 or so years that the tribal knowledge of where it was physically
located had been lost.

~~~
Milank
This is all true, but you still can't rely on increased hardware quality if
you can't afford any downtime when moving a server (a one-time event).

Also, that doesn't cover the other problems mentioned here, like natural
disasters, ISP problems, etc.

~~~
hnlmorg
Often these kinds of SLAs are decided based on blame rather than on what the
customers of that system reasonably require. In this case, moving offices
means the downtime is due to internal reasons. But if an ISP goes down or
there is a natural disaster, then that isn't in their control.

Cost comes into play as well. Multiple physical links in would be very
expensive for what sounds like internal services. Likewise, a natural disaster
might cause bigger issues for the company than those internal services going
down. They might still have offsite backups (I'd hope they would!), so at
least they can recover the services, but the cost of a live redundant system
off site might not justify those risk factors.

The customer's requirements are definitely unreasonable though. I'd hope those
systems are regularly patched, in which case: when is downtime for that
scheduled, and why is that acceptable but not when you're physically moving
the server? It doesn't really make much sense; but then "not making much
sense" is also quite a common problem when providing IT services for others.

~~~
Milank
You are right; their SLA might be a bit different from what we're talking
about here (and what we expect).

In general, we don't know much about this case. It's a post on Reddit; it
might not even be true. As is, it doesn't make much sense, but we don't know
all the details, so maybe we jumped to conclusions.

------
user5994461
I am so scared to imagine what would happen if there was any issue during the
move (very likely when dragging live network and power cables over hundreds
of meters).

The client would immediately refuse to pay anything, because he was very clear
he wouldn't pay a thing if there was downtime.

Then the next contractor would be super quick to judge you and the situation,
reinforcing that you were an incompetent idiot and that the client was right
to kick you out on the spot and not pay a dime.

Glad it went well in the end. There is so much to lose for the person trying
to help.

~~~
ThePowerOfFuet
Also, moving a server with spinning disks? What could possibly go wrong.

~~~
daemin
Wasn't there a story about Sun (or HP or someone like that) where they moved a
bunch of disk servers across a parking lot to another building and found that
many of them had died from the vibrations of the trolley cart used to
transport them?

~~~
znpy
It was Yahoo, IIRC.

------
nobrains
I don't know. If the "boss" was charged "4.5 hours of work, 2 hours of
consultancy, and 4.5 hours of consultant", and assuming he would have been
charged half of that with downtime, maybe the boss did get a good deal. We
don't know the cost of downtime for him.

I mean, if he had access to technical resources who were willing and able to
do this for him, he chose to do it.

~~~
moduspol
It's also possible that "downtime" has different meanings to different people.
The client may be seeing "downtime" as the net result of what happened the
last few times the server was "down," which could have been for any number of
reasons (potentially even unrelated to the server itself).

When you get clients describing things like this, it's possible they've been
promised things about this server before by other consultants that didn't pan
out. They don't want to give you the full details because then you'll
recommend a different route that they don't want to take (justifiably or not).

It's easier for them to frame the problem to a consultant in a way that allows
for only one potential solution, even if perhaps better ones exist, because
the guy in charge of making the decision isn't technically skilled enough to
assess whether others proposed by consultants are as viable.

And, of course, one might read a little into why there exists a "boss" with
such a highly critical IT need who is hiring a consultant to do work like
this, and who thinks that threatening not to pay at all if there is any
downtime is the best way to do it.

I mean, what if they opened the door to this closet and it grazed a power
cable on the floor and the machine just shut off? Why even bother staying
around to bring things back up? It wasn't your fault and there's already
downtime: you're not getting paid.

~~~
closetohome
Someone upthread was talking about how, as a salesman, you have to read the
room and know how to talk to clients. I did that for a while, and always got a
lot of mileage out of asking the customer what they ultimately wanted to
_accomplish_, which usually revealed that what they were asking for was a
solution to a self-made problem, and that there was a better alternative
altogether.

------
fooblat
> Stupidest thing I've ever had to do.

I don't really understand the "ranty" tone. The client had very specific
requirements and the author came up with an effective solution and was fully
paid to deliver it. Sounds like a win for everyone.

~~~
throwaway0a5e
Reddit (for reasons related to user demographics and feedback loops) rewards
certain types of writing and implied viewpoints. Following best practices and
rules is one of those things. This server migration clearly runs counter to
established wisdom so OP using a writing style of "look how terrible and
asinine this was" will be rewarded and gain traction much more than a "look
how interesting this was" writing style.

~~~
user5994461
It's reddit's /r/sysadmin; the subreddit is dedicated to rants and horrible
experiences from sysadmin and helpdesk folks.

It's quite sad IMO. I don't recommend going there unless you want to have a
bad day reading about the most horrific work environments and bad practices
in the world.

~~~
DoreenMichele
Perhaps somewhat similarly, r/TalesFromRetail is devoted to kvetching about
your job in the retail sector, but it's really not a depressing place. There
are a lot of rules and expectations about how you tell your story. You aren't
supposed to outright dox anyone or veer into genuine trash talk.

It's not supposed to be negative per se. It's supposed to be entertaining.

It's an art form. It's not everyone's cup of tea, just like horror isn't
everyone's cup of tea. But people often watch horror movies for catharsis, not
because they want to be depressed and wallow in self-pity.

Storytelling is often about educating people about things you can't speak
about more directly. It's often a way of sharing wisdom in an inoffensive
manner and one that will stick because people will actually pay attention,
unlike when you are giving them some dry lecture about some problem they
haven't yet had and don't yet care about.

But if you entertain them, they will read it anyway and that story may stick
with them. And then six months or a year later when they have the same
problem, they will actually remember how someone else handled the same issue
and it will turn a potentially nightmarish scenario into "Meh, I just did the
same thing that guy on Reddit did to his shitty boss/customer/coworker. Worked
like a charm. Moving on."

------
anfractuosity
Reminds me of this -
[https://www.youtube.com/watch?v=vQ5MA685ApE](https://www.youtube.com/watch?v=vQ5MA685ApE)

'Moving online webserver using public transport'

~~~
tyingq
The Indiana Bell building move is pretty impressive.
[http://www.paul-f.com/ibmove.html](http://www.paul-f.com/ibmove.html)

~~~
sschueller
[https://www.youtube.com/watch?v=CNqul9TfJwI](https://www.youtube.com/watch?v=CNqul9TfJwI)

------
Reedx
That reminds me of the Pixar incident where Toy Story 2 was accidentally
deleted during production and there were no working backups.

Luckily one employee was working from home (rare at the time!) and had a copy
of the entire movie on her desktop computer, which they _very carefully_
moved back to the office and were able to restore from.

[https://www.youtube.com/watch?v=7MAedEXri7c](https://www.youtube.com/watch?v=7MAedEXri7c)

------
PinguTS
It was done 7 years ago, even using public transport.

[https://www.reddit.com/r/uptimeporn/comments/1kf26r/moving_a...](https://www.reddit.com/r/uptimeporn/comments/1kf26r/moving_a_server_without_turning_it_off_to/)

~~~
jagermo
The most dangerous part is them expecting the 3G to be available during the
subway ride.

~~~
mercora
In Germany mobile networks work just fine in the subway, as ISPs have
deployed hardware there. I actually have more issues with the network when
using classical railroad transport...

------
neilv
Good thing the server had two power supplies. There was a YouTube video (which
I can't immediately find) of people moving a server across town, on the train,
without powering it off, and, IIRC, they had to splice the UPS into the power
cable.

When it's done for pay rather than for fun, and payment is conditioned on zero
downtime, I hope they charged a premium to make up for the risk of no pay.
Offhand, I don't know what's a good way to do that -- I've never had a
consulting client demand terms like that for billed-by-the-hour work.

~~~
kijin
Effective hourly rate = base hourly rate * risk.

Risk = client risk * task risk.

Client risk is based on your past experience with the same client. If they're
prone to demand last-minute changes or stupid stuff, they get charged a higher
rate on every project afterward. Jacking up the client risk factor is also a
nice way to fire a client you don't want.
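
As a toy example (the multipliers here are invented, not a standard):

    # Effective rate per the formula above; numbers are made up.
    def effective_hourly_rate(base_rate, client_risk=1.0, task_risk=1.0):
        return base_rate * (client_risk * task_risk)

    # A $150/h base with a demanding client (1.5x) on a zero-downtime
    # physical move (2.0x) prices out at $450/h.
    print(effective_hourly_rate(150, client_risk=1.5, task_risk=2.0))  # 450.0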

------
macintux
At my first job we were starting up the company and didn’t really know what we
were doing; one early server was sitting on a folding table with its power
cord wrapped around a leg, so just replacing the table with something more
robust involved downtime.

~~~
em-bee
the careful application of a saw or an angle grinder would have made it
possible to remove the folding table without unplugging the power cord. :-)

------
D895n9o33436N42
This reminds me of a famously obtuse and obdurate boss who asked for things
that were utterly impossible. He had delusions of grandeur which left him
convinced that he and only he was qualified to challenge the “cheap, fast,
good - pick any two” triangle.

Naturally, I did my best to explain the laws of physics to him, but he
wouldn’t hear it. In a spectacular display of Stockholm syndrome I did my best
to appease him for four years, but, as many of you can surely predict by this
point in the story, I failed in every possible way and eventually gave up.
Just wish I could have my four years back.

I was glad to read that OP at least got paid well for his efforts.

~~~
Tade0
I applaud you for being able to stand four years of this.

I usually get fired from such positions in less than two.

~~~
sleepybrett
I usually walk out about six months in if not sooner. Maybe it's just because
I spent so much time freelancing that I had enough experience to recognize a
no-win situation.

------
JeroenKnoops1
Reminds me of the OpenVMS clusters. In 2007 the police in Amsterdam
celebrated an uptime of 10 years on their cluster. In that period, all
hardware was replaced, and half of it was moved to another location 7 km away.
All data was moved from DAS disks to a SAN without a single application
needing to be stopped. VMS was also upgraded from 6.2 to 7.3-2. The cluster
did not go down during any of these changes. I <3 OpenVMS

~~~
JeroenKnoops1
During Y2K I also had to shut down various OpenVMS servers with uptimes of
over 10 years... Only because of company policy, not because OpenVMS required
the reboot.

------
cpuguy83
I'm picturing that Seinfeld episode where George, not wanting to lose his
high score, tries to move the Frogger arcade machine out of a restaurant that
is shutting down.

~~~
will_pseudonym
HOLES! I need HOLES! :)

------
codingdave
I'm surprised that part of the story wasn't to drill down into the
requirements. No downtime ever? Not even at 3 AM on a Saturday?

I've found that when people are being unreasonable it is because they haven't
split out their true needs from their first idea of how to meet those needs.
In this case the true need is zero impact to users. The owner translated that
to "zero downtime", and then didn't accept alternative solutions that still
would have met his true business need.

------
tzs
I needed to restart a server where I worked. My boss was complaining about the
revenue loss during the down time. I knew the revenue loss (if there even was
any, as opposed to a couple of minutes of revenue simply shifting to a few
minutes later...) would be well under a dollar.

So I listened to him whine for a couple minutes, then tossed a dollar on his
desk, told him that would cover it so he could shut up now, and rebooted the
server.

Warning: you should probably only try this if you are good friends with your
boss. That boss had been my best friend for years before I came to work for
his company.

------
chisleu
I didn't want to lose my many months of uptime for a LAN party back in
1999/2000, so we used the UPS to migrate my Linux box across town for some
Quake 3 Arena action.

Things were so much simpler back then.

------
linsomniac
Lower stakes, but ~15 years ago a friend had a Linux box in the corner that
had huge uptime. I want to say the uptime started shortly after the kernel
patch that fixed the 400-ish day overflow of the uptime counter. He moved to a
new home and very carefully moved the running server on its UPS. He didn't
have to worry about keeping networking up, though.

I used to be all about long uptimes. I eventually started seeing long uptimes
as a negative though. A long uptime probably means patches have not been
applied.

~~~
jccooper
I also did that once, around the same timeframe, specifically to preserve an
uptime.

I think the cult of uptime came about simply because it was impressive that a
personal computer could stay running for more than a few days when most of
the world ran Win95. And because development cycles were longer and there
weren't a lot of network threats.

------
Humphrey
I haven't read the article, but I'm reminded of that episode of Seinfeld and
the Frogger arcade game.

~~~
dfsegoat
I have pondered this exact scenario (server move w/ 0 downtime) because of
watching that episode - I wouldn't have thought about it otherwise.

It's interesting how pop culture and your chosen profession intersect at
times.

------
jfcorbett
Reminds me of the time when IT at a previous employer told us that, due to a
"new IT strategy", our production cluster that had been sitting comfortably in
the basement for years had to be moved to an "approved IT hub facility"... in
another office 500 km away, across the North Sea.

There was downtime.

Promptly after our cluster settled into this wonderful new facility, a cooling
pipe in the ceiling leaked on it, frying 1/3 of our nodes.

~~~
yjftsjthsd-h
On a personal selfish level I was quite happy to see our workloads moving to
datacenters that we couldn't (reasonably) physically access, because it
replaced "can you go drive to the DC and replace a failing disk" with "we put
in the request for smart hands to replace the failing disk". Of course,
there's some notable tradeoffs, but it makes me feel better when the business
decides to do such things...

------
kuon
When I was younger (read: 20 years ago), I did crazy things like that - not
over that long a distance, but moving live servers between racks.

Now that I am older, I don't think I would do it anymore; too much stress for
a small reward. Also, today I am usually able to talk customers out of crazy
requirements, whereas I would just have said "OK, let's do it" in my younger
years.

------
imglorp
The moving-the-server-on-a-cart part made me nervous. If there was any
rotating rust in there, bouncing across the parking lot would make things
difficult for the flying heads. I'd have hand-carried it from stage to stage,
setting it on a padded cart at each stop, treating it like sweating TNT.

------
social_quotient
Slight topic drift - any thoughts on how the pandemic might materially change
the assumption that onsite/on-prem is better than cloud or a managed data
center, now that the code people are actually remote from the “local”
infrastructure? Something specific to the reality of the pandemic strikes me
as likely to make the die-hard local-only folks start rethinking their
position.

(Not to suggest it’s bad, just different now that a primary assumption - that
people work in the office - is less true.)

~~~
AnIdiotOnTheNet
As someone who works in a very anti-cloud company culture (which I happen to
agree with), the pandemic has had no effect whatsoever on that mindset. We
don't dislike cloud because it is accessed remotely; we dislike cloud because
of the lack of control we have over everything running there. If something
happens and our local systems have a problem, there are people here, like
myself, whose highest priority will be fixing it and whose second highest
priority will be communicating the status of that. Your problems are _never_
a priority to a cloud vendor, and communicating with you is even less of one.
That's before we even get into the absurd expenses and the reliance on big
fat pipes.

------
noobermin
Sorry, but this is ridiculous. It's a great story of a feat of sysadminery,
but the client should have just accepted some downtime, even a few hours. The
level of entitlement some clients display is just infuriating - even down to
calling him back after he declined to help. What an infuriating person.

That was my main takeaway from this. Endeavor to be the sort of person who can
refuse clients; the entire idea that "the customer is always right" enables
so much ridiculous behavior.

------
pengaru
Decades ago, working in a sysadmin role at a hosting company, I had a similar
situation.

The solution I came up with was to fashion a custom male<->male power cord,
like a gender changer, from some broken ATX PSU scraps we had lying around. By
rearranging the power sockets from multiple donors, two male power cords could
be attached to a single enclosure. Internally the sockets were simply bridged;
otherwise the PSU was basically gutted.

With this goofy metal box having two male power cords dangling from it in
hand, I just used a very long extension cord plugged into an outlet on the
same AC phase as the existing server's power source. The extension cord
powered one of the bridge cords. The other bridge cord plugged into the
server's existing - and hot - power strip, forming a redundant power source.
Now the power strip could be unplugged from the primary power source without
losing power, and we just moved the server to the new location with the bridge
box and power strip in tow.

If memory serves the only tricky part was determining which outlet at the new
home was on a compatible circuit. We didn't have much in the way of
electronics tools, no oscilloscopes or anything. Even the soldering involved
to make the bridge box was done using my personal soldering iron, which just
happened to be in the office because some of us raced RC cars there after
hours.

I think I just used an incandescent desk lamp to verify a normal brightness on
the bridged circuit before proceeding with the server, but it's been a while.

I wonder how many people have fashioned AC power cord gender changers
throughout history... :)

------
robin_reala
I always remember this post by the Amsterdam Police who managed to maintain
their uptime on a VMS cluster despite moving data centres in the middle:
[http://web.archive.org/web/20120229042903/http://www.openvms...](http://web.archive.org/web/20120229042903/http://www.openvms.org/stories.php?story=03/11/28/7758863)

------
umarniz
Interesting read. As a thought experiment, it makes me wonder: does it count
as downtime if the latency of commands on the machine rises to 5 minutes?

You could clone the VM to another instance, record the commands going to VM1,
and replay them to VM2 after 5 minutes.

This whole brain fart of mine doesn't make much sense, but if you play along
with it, does it still count as downtime or just very high latency?

~~~
pc86
Wouldn't requests time out on the client side long before five minutes?

~~~
heavenlyblue
I don’t know whether it’s the software in general, but ever since I started
using Three 4G broadband in the UK, lots of software has behaved really
weirdly (lots of lockups, hangs, etc). Apps often need to be restarted.

If you do a ping during “bad weather”, you can see that they buffer up to 5
minutes of packets (i.e. there will be no communication for some time, then
you’ll receive a bunch of packets with huge latency, with sequence numbers
intact).

So I would assume a lot of software could even work that way. I suspect a lot
of software doesn’t set any (TCP) timeouts at all.
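
For illustration, a Python socket does block forever by default unless you
opt in to a timeout - a small sketch (host and limits are arbitrary):

    import socket

    # The timeout set here applies to the connect and to later reads;
    # without it, recv() could hang as long as the carrier keeps buffering.
    sock = socket.create_connection(("example.com", 80), timeout=5)
    sock.sendall(b"HEAD / HTTP/1.0\r\nHost: example.com\r\n\r\n")
    try:
        print(sock.recv(1024))
    except socket.timeout:
        print("no data within 5s; fail fast instead of hanging")
    finally:
        sock.close()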

------
BrianB
It's been done.
[https://i.cdn.turner.com/v5cache/TBS/Images/Dynamic/i439/sei...](https://i.cdn.turner.com/v5cache/TBS/Images/Dynamic/i439/seinfeld-s9e18-1600x900-800x450_071620141129.jpg)

------
Johnny555
Decades ago an ISP I was colocated at did the same thing. I don't remember the
exact details, but it was a DNS server and they either couldn't log in or were
relying on the zone files cached in memory or something but for some reason
they couldn't power it off.

It was already plugged into a UPS, but they had to cut one of the posts off
the rack to get the server out without unplugging it, then they plugged that
UPS into a bigger UPS on a cart and wheeled it to the new data center they
built out in the building next door.

The world was much different at the time -- this coloc provider had a good
reputation, yet.... they had a keg of beer in the corner of the server room
and a stack of adult magazines in the men's room.

------
geocrasher
I once shut down a PC, moved it to another desk, and it wouldn't power back
on. Another time I moved a server to another rack. It had 2 years of uptime. I
had to power it down, and it wouldn't power back on. Both required PSU
replacements. Had I moved them _while powered on_, I can only imagine the fun
times.

Perhaps they should have just told the customer they couldn't find it:
[https://www.theregister.com/2001/04/12/missing_novell_server...](https://www.theregister.com/2001/04/12/missing_novell_server_discovered_after/)

------
fredley
This is the kind of content I've only ever seen previously in TDWTF (which is
entirely this sort of content...)

[https://thedailywtf.com/](https://thedailywtf.com/)

------
panpanna
Really disappointed they didn't use a wireless network of some kind.

~~~
dbalatero
I wouldn't; the risk of disconnects is high.

~~~
panpanna
But the risk of someone tripping over your looong Cat6 and breaking the
network is not negligible either...

------
arethuza
This reminds me of a small company I joined many years ago that did
deployments by RAID - find a working server (possibly at a customer site),
swap in a blank HD, wait for it to rebuild, then take it, put it in a new
server, and repeat the process.

Like finding people who argue against revision control systems, it's really
quite a challenge convincing people why things like this are a bad idea -
after all, "it works!"

~~~
yjftsjthsd-h
That's... actually fascinating, if in a slightly insane way. There's pets,
there's cattle... and apparently there's a herd of cloned pets, which I'd
somehow never considered before:)

------
akssri
Was it George Costanza?

[https://m.youtube.com/watch?v=a-FbktgqCqY](https://m.youtube.com/watch?v=a-FbktgqCqY)

------
tzury
I was once called in to export data from a DOS program that had no export
option. The single author had died of heart issues, and the company needed
the data for a migration.

After several attempts to understand the binary format, I gave up and ended up
printing tabular reports to LPT1, which I had connected my laptop to,
capturing the output and rebuilding CSV files.

Luckily, printing was the most important feature of a business app in those
days.

------
Zenst
Interesting story, and one that has played out a few times; I'm aware of a
couple that went exactly like this. Another variant used power extension
leads to cover power. The key is having systems with dual power supplies
(most servers do) and networking set up so you can switch from one run to
another.

I have known some large companies that have, in their history, done things
like this and other creative solutions to impossible problems.

------
ericyan
The consultants really should have told the client that if all you have is a
single server, then there is no such thing as "zero downtime".

------
xyst
Why wouldn’t cloning the VMs to a second server, then splitting the traffic
between the primary and secondary servers, work? Once traffic to the second
server is confirmed, you could shut off the first server and haul it off to
the new location.

I would probably still charge a much higher rate since the owner was an arse,
but at least you would get back your 7-8 hours.

~~~
viraptor
You're making assumptions about what's running on the servers. Let's say it's
a VoIP conference server with a shared dedicated room - effectively you have
an ongoing session shared between multiple connection and you cannot stop it.
Or you have stateful local processing so you can't "split the traffic". Or a
number of other limitations...

------
neya
Dude has ONE server and talks about having 0 downtime for his clients? What
the hell?!

In a way, this is Darwinism for the IT industry, and I'm happy the people
involved got paid well. The dude probably paid as much as a new server would
have cost him. I bet he'll never forget this lesson.

------
growt
Setting up a new server at the new location and moving the VMs one by one to
the new server as they become idle should be possible without downtime. But
maybe there were other requirements (like no new/additional hardware) that
weren't mentioned in the article.
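
If the VMs were on something like libvirt/KVM - the post doesn't say - live
migration is roughly a one-liner per guest. A hypothetical sketch, assuming
shared storage and made-up host and guest names:

    import libvirt

    src = libvirt.open("qemu:///system")
    dst = libvirt.open("qemu+ssh://new-server/system")

    for name in ("vm1", "vm2"):
        dom = src.lookupByName(name)
        # VIR_MIGRATE_LIVE copies memory while the guest keeps running;
        # without shared storage you'd also need VIR_MIGRATE_NON_SHARED_DISK.
        dom.migrate(dst, libvirt.VIR_MIGRATE_LIVE, None, None, 0)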

------
aeno
A real classic comes to mind:
[https://www.youtube.com/watch?v=vQ5MA685ApE](https://www.youtube.com/watch?v=vQ5MA685ApE)

Moving a running server about 7km through public transport without downtime.

------
aleem
Sometimes it's better to seek forgiveness than to seek permission.

A Saturday 3 AM shift with 5 minutes of downtime would work just as well.
Unless this server has historically had 100% uptime, this would go unnoticed.

------
tobyhinloopen
A 10-hour investment for no downtime seems like a good deal for the owner.

~~~
topkai22
Depends on whether he really has customers accessing the system "all the
time."

Besides, as pretty much everyone has noted, running a zero-downtime system on
a single physical machine in what sounds like a normal cable room is kind of
nuts. Those 10 hours would have been much better spent moving that puppy to
someone else's data center and getting some redundancy.

Although, reading between the lines, maybe the lease was up and they were
waiting until the last minute to move it.

------
johnklos
I've done something like this - server running off of UPS moved from one
building in Manhattan to another about 1/4 mile away, in snow... Not for
someone with weak arms.

------
shireboy
You realize the client is now condescendingly mocking the guy for saying it
couldn't be done, and will expect this next time they run updates on the
server - which is to say never.

------
jtbayly
Seems very risky. Not something I’d want to do if minimum downtime was the
goal. One wrong piece of gravel and you end up with catastrophic failure
instead of 5 minutes of downtime.

~~~
dmurray
But the goal was zero downtime, not minimum downtime. The client made it clear
that 5 minutes of downtime was equivalent to catastrophic failure. So they
correctly found a solution that reduced the chance of "5 minutes of downtime",
at the expense of an increased risk of catastrophic failure.

~~~
jtbayly
I understand that. I just doubt that the risk was worth it, if downtime is
such a big deal.

------
mercora
I wonder if it's really possible to do the initial setup of the Ethernet
failover without interruption. I have never done this, but I would expect the
interfaces themselves to become unavailable for direct use, and you get a
completely fresh virtual Ethernet interface that represents whatever physical
interface is currently active... at least this is what happens when you add
an Ethernet interface to a bridge in Linux...

------
g051051
I guess servers have gotten a lot more robust in the last decade... there's
no way any server I ever managed would survive something like that.

~~~
maeln
A lot of servers are SSD-only these days, which makes them less fragile.
Still, I really wouldn't see myself pushing a running server around on a cart.

~~~
tyingq
Yeah, there's certainly still things like riser cards and connectors that
could come unseated due to vibration.

~~~
marcosdumay
That's probably a problem for the next guy that takes an ops job there. Loose
pieces often don't disconnect right at the same instant, and even when they
do, memory caches usually postpone the failures.

------
neycoda
I wonder if there was any legit reason to require no downtime. Otherwise the
owner doesn't understand what downtime means for his business.

------
partiallypro
This reminds me of the Seinfeld episode with Frogger

------
lightlyused
I seem to remember a similar story from another site. I'm thinking it was
thedailywtf.com, but I can't find it.

------
mercora
When I was younger I was super proud that I could replace my disk while I
kept working on the device. I would add the new disk to my LVM volume group,
move all extents to the new disk, and drop the old disk out of the VG
afterwards. When done, I could just unplug it and be done, never halting work
except to kick off the process.
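
The sequence, wrapped in Python for illustration (device and VG names are
hypothetical, and each step is destructive, so treat it as a sketch):

    import subprocess

    def run(*cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    run("pvcreate", "/dev/sdb")         # label the new disk as an LVM PV
    run("vgextend", "vg0", "/dev/sdb")  # add it to the volume group
    run("pvmove", "/dev/sda")           # migrate all extents off the old disk, online
    run("vgreduce", "vg0", "/dev/sda")  # drop the old disk from the VG
    run("pvremove", "/dev/sda")         # clear its label; safe to unplug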

------
rafaelturk
Fun reading. But my advice is to never accept a job like this. It could
easily become 2 weeks of downtime.

------
merb
Meanwhile in Germany, Telekom has had its Connect IP lines (leased lines...,
company internet...) down since Tuesday morning. That's a downtime of over 48
hours, despite an SLA saying no downtime will be longer than 8 hours, with
99.9% availability.

What a crazy world.

------
exabrial
Reminds me of the "hot slide" technique used for old telephone switches

~~~
ansible
They did some crazy stuff in the old days. Like when they moved a telephone
exchange live... the whole building.

------
webscalist
It could've been cheaper to buy/rent another server, put it in the new
location, set up redundancy/replication, power off the old server, move it to
the new location, and return the new server. Or just keep it, for sanity.

------
snvzz
Incredible. Five minutes of VM downtime was not acceptable, and yet they had
a single VM host. Should it catch fire (hardware FAILS!!!), what then?

------
kyuudou
It's called vMotion

------
rafaelturk
Pictures please!

