

Fictional Plumbing Problems As A Tortured Analogy For Software Engineering - antubbs
http://latentcontent.net/2012/05/12/plumbing

======
GavinB
This may be a tortured analogy, but it boils down to a basic problem:

1\. You know there's a bug

2\. You can't reproduce it

Several next steps come to mind:

1\. Hire an outside expert who's dealt with this sort of thing before. They
may be able to theorize what's going on and come up with a solution.

2\. Install measures that don't prevent the problem but prevent the damage.
For example, an emergency failsafe that shuts down the system or relieves the
pressure when the incident occurs. This is why electrical systems have fuse
boxes! Error management is sometimes the only option, because 100% error
prevention is impossible.

3\. Install monitoring that tracks a lot more detail than you are currently
getting. When the next error occurs, you will know a lot more and may have the
information needed.

Edit: What's the name of the theory in networking that 100% error prevention
is not possible, so error handling is the only option? There was a great
article on HN about it a few years back.
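
A minimal sketch of the failsafe idea in option 2, assuming a software "fuse" that wraps a risky call. It is one way to do it rather than a definitive implementation, and the names `CircuitBreaker`, `max_failures`, and `reset_after` are hypothetical:

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses a call because it has tripped."""


class CircuitBreaker:
    """Stop calling a failing operation for a while instead of letting it do damage."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before tripping
        self.reset_after = reset_after    # seconds to stay open before a retry
        self.clock = clock                # injectable so it can be tested
        self.failures = 0
        self.opened_at = None             # None means the breaker is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("breaker is open; call skipped")
            # Cool-down elapsed: allow one trial call. A single failure
            # during the trial trips the breaker again.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success resets the count
        return result
```

Like a fuse, this does nothing to explain the underlying bug. It only limits how much harm repeated failures can cause while you investigate.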

~~~
drone
My experience has almost always led to #3 being the most workable solution,
but not a perfect one.* #2 should be incorporated into any project, but it
presumes that you know all possible ramifications of incorrect operation. An
electrical breaker works because complete non-operation is generally better
than death. For many software companies, complete non-operation is a precursor
to death.

#1 is almost never a good solution. The time it would take for them to become
familiar enough with the codebase to not aggravate your existing engineers
would exceed several iterations of #3. I've also rarely met an outside expert
whose solutions didn't involve re-writing everything to meet their
expectations of "correct implementation." This could be a sample selection
problem on my part, however.

* - How do you know that you are monitoring the correct component? This path usually leads to multiple monitoring development tasks as you find that what you thought was the source of the problem was in fact a symptom, and you continue adding more monitoring options as you get closer to the source. This is why I almost always add an insane level of logging to any application, and control the verbosity through runtime controls.
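
A rough sketch of "log everything, control verbosity at runtime" using Python's standard `logging` module. The logger name `plumbing`, the `set_verbosity` helper, and the `handle_reading` function are made up for illustration. How the level change gets triggered (admin endpoint, signal, config reload) is left open:

```python
import logging

logger = logging.getLogger("plumbing")  # hypothetical logger name

# Names an operator may pass in at runtime.
LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
}


def set_verbosity(level_name):
    """Change the log level of the running process without a restart."""
    try:
        level = LEVELS[level_name.lower()]
    except KeyError:
        raise ValueError(f"unknown log level: {level_name!r}") from None
    logger.setLevel(level)
    return level


def handle_reading(pressure, threshold=100):
    # Verbose detail is always written in the code, but only emitted
    # when the level has been turned down to DEBUG.
    logger.debug("raw pressure reading: %s", pressure)
    if pressure > threshold:
        logger.warning("pressure %s above threshold %s", pressure, threshold)
    return pressure
```

The debug calls stay in the code permanently. In normal operation they cost little, and when the problem shows up you call `set_verbosity("debug")` instead of shipping a new build.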

~~~
dredmorbius
Brief non-operation (reboot / service restart) is often better than a
prolonged outage. Particularly where SLAs are set to create an expectation and
acceptance of this, and where redundancy exists.

I'm thinking too that there's a feedback process at work here, and some sort
of damping mechanism would help with that.

~~~
drone
Agreed, and many architectures are designed to have components "transparently
fail" without impact to overall operation. When you have forced failures,
feedback/damping is absolutely required. However, in my experience most such
failures are unplanned and unknowable at the outset, and you can only damp
conditions which are predictable.
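
One common way to damp a restart loop is exponential backoff with a cap. This sketch is only an illustration of the damping idea discussed above, not something from the article, and `backoff_delays` with its parameters is a hypothetical helper:

```python
import random


def backoff_delays(base=1.0, factor=2.0, cap=60.0, jitter=0.0, rng=random.random):
    """Yield restart delays that grow geometrically from `base` up to `cap`.

    `jitter` is a fraction (0.0 disables it) that stretches each delay by a
    random amount so redundant instances do not all restart in lockstep.
    """
    delay = base
    while True:
        yield min(cap, delay) * (1.0 + jitter * rng())
        delay = min(cap, delay * factor)
```

A supervisor would sleep for `next(delays)` seconds before each restart and create a fresh generator once the service has stayed healthy for a while. This only damps the failure modes you anticipated, which is drone's point.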

------
stcredzero
Maybe programming is tortured analogies all the way down? (Not really, but
there are some over-engineered code bases that feel like it.)

~~~
dredmorbius
No, it's turtles.

 _Tortured_ turtles.

~~~
stcredzero
Actually, it's tortured turtle analogies all the way down.

~~~
dredmorbius
Well played, sir.

------
dsr_
In the apartment building, we have a known problem, a high severity attached
to it, an unacceptably high incident rate, and no idea of the exact conditions
necessary to replicate it.

At this point I would do two things:

1\. log all the things.

2\. find me my top QA person, the one who can find bugs that nobody has yet
reported. Put her on it.

OK, everybody knows that logging is good. And everyone knows that QA is good.

What I have found, though, is a number of companies who think that QA is best
done by the developer who wrote the feature... and I think they are absolutely
wrong in every sense, except possibly short-term economics. Having someone do
QA who has none of their ego invested in the code is essential.

------
tomjen3
The problem here is that there is very little, if anything, as complicated as
software. Preventing leaks like in the example is not that difficult -- you
put in pipes that can handle a lot more than the required load, because it is
unacceptably expensive to have them burst and the better pipes are not that
much more expensive (putting them in is).

~~~
jaylevitt
Nope. I actually live in a luxury high-rise, and while the pipes don't leak,
the pressure and temperature are about as bad as the OP describes.

No bidets, though.

------
duwease
I was hoping for an analogy to explain how difficult it is to estimate long-
term programming work due to unexpected "black swan" details popping up as you
get into the work that add considerable effort to the project. It's a
situation I find I need to explain often, and in layman's terms, so a perfect
analogy would be great...

~~~
antubbs
I think that's been beaten to death with
[http://www.quora.com/Engineering-Management/Why-are-software...](http://www.quora.com/Engineering-Management/Why-are-software-development-task-estimations-regularly-off-by-a-factor-of-2-3)

------
scotty79
Redo the bathrooms using a different layout and components.

~~~
alainbryden
"If at first you don't succeed, refactor."

~~~
drone
Or, you can pivot... Turn the bathrooms into fishtanks!

