

Infrastructure Debt - etaty
http://www.littlehart.net/atthekeyboard/2011/11/03/infrastructure-debt/

======
gizmo
You have to walk a fine line. If you neglect your infrastructure for too long
you get in a situation where you get bogged down by all the circumstantial
complexity in your infrastructure. If you are too zealous about debt you
automate things that don't need automating and you end up with an over-
engineered infrastructure that is completely automated but never works quite
right.

As with the design of software projects, the design of the infrastructure takes
time. People often go through phases where they take it far too seriously
(after they got burned) and phases where they cowboy-code their way through.

The problem here is really mundane. It's difficult to set up an architecture
right and you don't get there by throwing buzzwords at the wall (and seeing
which ones stick). Even if you use multiplexed dictionary stores, Bazoop
Clusters, BunnyHQ, and tricircular backups you may still end up with a mess of
an architecture if you don't know exactly which problems you're trying to fix.
And you can easily waste a few weeks evaluating different deployment tools and
automation software and still end up with something that only barely works.
The opportunity cost is immense.

~~~
Retric
It's not a static tradeoff either. Two world-class developers debating some
tradeoff is about as relevant to the average coder as two Olympic-level
swimmers discussing their diets is to someone trying to lose some weight. In
short, avoid holes too big for your team to get themselves out of.

------
raganwald
I like the article, but am thinking about a bike shed. For each thing, like
manual deployments or failure to use source control, I ask myself: Is it
"debt"? Or "friction"?

Debt and friction both accumulate, but debt must be "repaid" in a lump sum. If
"technical debt" or "infrastructure debt" means your velocity is falling over
time, it's friction, not debt.

Infrastructure debt would be something that is eventually going to cause
everything to halt while you sort it out, and the longer you wait, the worse
it will be. Not using source control is debt, because with near certainty you
are going to have at least one major SNAFU requiring spelunking through
backups to recover a lost file or to restore some prior release.

Developing without deploying at all is definitely debt: sooner or later you
are going to have to deploy.

OTOH, a lot of "technical debt" isn't. It's just friction, and it takes some
experience to know when it's a bad tradeoff.

~~~
grumpycanuck
(Blog post author here)

Upon reflection, your use of the term "friction" is a good one. I tend to look
at "debt" as something you pay back as quickly as possible, although sometimes
you can only pay it back in installments.

I guess you could say having too much infrastructure friction eventually leads
to one humungous infrastructure bonfire. :)

~~~
raganwald
Well, “friction” is an ongoing cost but doesn’t result in some catastrophic
event down the road, whereas debt may or may not have an ongoing cost but
carries with it some non-trivial probability of disaster. Like real debt, the
longer you wait to eliminate it, the more expensive it is to fix and the worse
the catastrophe if you don’t fix it.

So... I tend to think of source control problems as being infrastructure debt,
because you are definitely going to crash and burn eventually, and the longer
you wait, the worse the problem will be. I am open to rethinking this, but I
would classify automated deployment as being friction. If you can deploy by
hand, and everybody knows how to deploy by hand... It seems that deploying by
hand is probably friction while you are in development and then debt once the
product is in “actual” production with end users. In development, you might
make a mistake, forget a library, and fixing it is work but not catastrophic.
But once you have actual users, making a deployment mistake could produce
irrevocable disaster.

Anyways, just to be clear, I’m only bringing up the distinction for the sake
of discussion. I like the post just the way it is.

~~~
karlmdavis
Keep in mind, though, that even "friction" will eventually cause a team's
productivity to drop down to 0. The amount of friction that a given team can
tolerate depends on the team members, of course, but especially on the size of
the team (see: Mythical Man Month).

There comes a point that even a tiny bit of friction, spread out over enough
developers, eventually causes the marginal productivity boost of adding new
devs to reach 0. That can happen even without friction, but the friction
exacerbates the trend.
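
To put rough numbers on that claim, here is a toy model, not a measurement. It assumes every developer produces one unit of work and loses a fixed fraction `k` of a unit for each teammate, with coordination overhead and friction lumped together into `k`. The model and the names `team_output` and `first_unproductive_hire` are made up for illustration.

```python
from fractions import Fraction


def team_output(n: int, k: Fraction) -> Fraction:
    """Total output of n developers in the toy model.

    Each developer produces 1 unit and loses k units per teammate.
    """
    return n * (1 - k * (n - 1))


def first_unproductive_hire(k: Fraction, limit: int = 10_000) -> int:
    """Smallest team size n at which the n-th developer adds nothing (or less)."""
    for n in range(2, limit + 1):
        if team_output(n, k) - team_output(n - 1, k) <= 0:
            return n
    raise ValueError("marginal gain stays positive up to the limit")
```

In this model the marginal gain of the n-th developer works out to 1 - 2k(n - 1), so doubling the per-teammate loss from 1/100 to 1/50 moves the point of zero marginal gain from a team of 51 to a team of 26.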

------
narcissus
I get the benefits of having environments identical to production (and that's
what I aim for, at least, with our staging server) but to be honest, I like
having a small amount of diversity across our development servers. It's the
small differences that have helped us to work out that bizarre bug (turns out,
it was actually a problem specific to a particular version of PHP) or to find
those small assumptions about the environment (e.g. assumed file paths and so
on).

The latter aren't _that_ much of an issue, but I find that being able to deal
with path changes, for example, makes the code that little bit more flexible,
making it a little easier to make changes to the production environments going
forward with fewer issues.
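
One way to avoid baking in an assumed file path is to read it from the environment and fall back to a default. The sketch below is only an illustration: the variable name `APP_UPLOAD_DIR`, the default path, and the function `upload_dir` are hypothetical, not taken from any real project.

```python
import os
from pathlib import Path
from typing import Mapping

# Hypothetical fallback used only when the environment does not say otherwise.
DEFAULT_UPLOAD_DIR = Path("/var/www/app/uploads")


def upload_dir(env: Mapping[str, str] = os.environ) -> Path:
    """Return the upload directory, preferring the APP_UPLOAD_DIR variable."""
    configured = env.get("APP_UPLOAD_DIR")
    return Path(configured) if configured else DEFAULT_UPLOAD_DIR
```

Passing the environment in as a parameter keeps the lookup easy to exercise with different values, which is roughly what a set of slightly different development servers does for you by accident.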

~~~
gbog
I just changed jobs and went from a very unified setup (Linux-only, everyone
working on one server) to a much more dispersed one (all OSes, working
locally). I agree that some diversity helps to keep flexibility and to have a
code base and data model that are more forgiving of little discrepancies.

But I guess there must be some checkpoints, or rules, where you don't allow any
flexibility. You build a much more flexible project on rocks. For instance, no
deployment if one test doesn't pass, and tests should include "X"-lint checks.
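
A rule like that can be enforced mechanically. The following is a minimal sketch, assuming each check (test suite, linter) is a command that exits with status 0 on success; `run_checks` and `may_deploy` are hypothetical helper names, and the actual commands are left to the project.

```python
import subprocess
from typing import List, Sequence, Tuple


def run_checks(checks: Sequence[Sequence[str]]) -> List[Tuple[str, bool]]:
    """Run every check command; a check passes only if it exits with status 0."""
    results = []
    for command in checks:
        try:
            passed = subprocess.run(command, capture_output=True).returncode == 0
        except OSError:
            # A check that cannot even be started counts as a failure.
            passed = False
        results.append((" ".join(command), passed))
    return results


def may_deploy(results: Sequence[Tuple[str, bool]]) -> bool:
    """Allow a deployment only if at least one check ran and none failed."""
    return bool(results) and all(passed for _, passed in results)
```

A deploy script would call `run_checks` with its test and lint commands and refuse to continue unless `may_deploy` returns `True`. Treating an empty check list as a failure is a deliberate choice here, so that a misconfigured pipeline cannot deploy unchecked code.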

------
dirtyaura
I think the author has a different interpretation of "technical debt" than how
it was originally intended.

Technical debt is not accidental and it doesn't consist of small mistakes;
it's due to a deliberate decision to cut corners when trying to get a product
or feature launched, just like financial debt is not accidental but a
deliberate decision (although you could argue that credit cards etc. can
create accidental debt too).

The idea is that it's okay and usually a good business decision to accumulate
some technical debt to speed up your product development, but you need to keep
an eye on it and reduce it regularly, because if it grows too large, it can
totally halt your product development.

~~~
grumpycanuck
My experience is that it is a very rare and enlightened management team that
allows developers to go back and clean up the technical debt that was created
during the mad dash to get your application out the door.

Most of the time you are stuck with technical debt because there is no room in
your timeline to go back and fix stuff that you know is broken.

YMMV, but as the blog post author I'm keenly aware of this situation.

~~~
feralchimp
Sometimes it's up to engineers to say "I'm fixing this," and for QA people to
say "I'm not signing off on this until it's fixed."

Maybe I've just been lucky, but all the places I've worked had developers and
QA people that were willing to force quality into a product.

------
feralchimp
1\. I'm confused about who the audience of this article is supposed to be. If
you're a software development team that is not using version control in a
fairly deep way, you're not "in technical debt" so much as "wasting someone's
time and money with your monumental incompetence." I suppose it's still good
to note that you should be using it, but you're not at the point of worrying
about subtleties.

2\. What the author calls "infrastructure debt" is just technical debt outside
the application source code.

3\. Technical debt doesn't (just) happen because people are lazy, or take
shortcuts, or "plan to do it the right way later." It happens because you
generally don't start an engineering effort knowing everything relevant up
front. Indeed, it's "engineering" precisely because you're learning important
things and uncovering subtlety as you progress. Technical debt is the
inevitable outcome of the fixed past rubbing up against the newly-discovered
present or anticipated future.

Update: So I guess I'm in the 'friction' analogy camp. :)

------
jader201
What about database design debt? Or is this still considered technical debt?

In our company, this is 90% responsible for the paralysis that is keeping us
from migrating from a 10 year old enterprise architecture. It is designed in
such a way that makes it hard for us to extend and scale, but the whole
foundation of our business rests on this outdated model. It's to the point
that the only way we can move forward is to start over.

I've read about technical debt, but it's usually in reference to code design,
lack of proper testing, and tightly coupled dependencies.

And we definitely suffer from "infrastructure debt" as this article describes,
but I feel this is the least of our problems.

To me, possibly the most expensive kind of debt to be in is database design
debt, as everything rests on this. At least this is the case where I work.

~~~
grumpycanuck
(I'm the blog post author)

I would place database design debt firmly in the technical debt side of the
ledger (as it were). To me, infrastructure debt deals more with consistency
across environments and consistency in moving code from one environment to
another.

I also agree with you that database design debt is very expensive and, as you
pointed out, leads to paralysis over fixing problems with an established
application.

------
arethuza
"Version control is a 20 year old, well-understood concept."

Closer to a 40 year old concept:

<http://code.google.com/p/pysync/wiki/VCSHistory>

The first, at least according to that article, being SCCS in 1972:

<http://en.wikipedia.org/wiki/Source_Code_Control_System>

~~~
grumpycanuck
Blog post author here. Yeah, I figured it has been around a lot longer than 20
years but I thought the 20 year thing would get the point across.

~~~
arethuza
I was nitpicking, but it got me wondering about the history of revision
control systems as I was pretty sure I had used RCS in the 1980s and I had to
check to see if my memory was playing up. :-)

------
valjavec
Best quote to conclude article:

"Don’t be scared of change, be scared of the debt growing in your code base
and in your infrastructure. It won’t go away and there is no government
bailout on the way to fix it."

------
TwistedWeasel
Some good points, but very narrow focus on web development. Not everyone can
work in a VM for performance reasons, and many projects target multiple
different environments making it useful to have your devs on different
versions to get more eyes on problems that might arise there.

In the video game industry, we usually split our developers up between each
game console, so there are a half dozen different dev environments in use and
it works better that way.

I like the concept of infrastructure debt, and it's clear he's talking from
his own personal experience. I'm sure many examples could be made for other
software industries that are also valid.

The hard part about solving these kinds of problems isn't entirely a technical
problem, because many of them are caused by bad habits and stubborn
programmers. To fix these issues you need to change people's daily working
practices, not an easy task.

------
jwatte
I work with 40,000+ unit tests. I deploy across a live cluster with a single
command. My co-workers and I deploy to production dozens of times a day. Every
new hire gets a VM image sandbox to develop in. We still have infrastructure
debt. The truth is, debt leverages the future into the present, and as long as
your net productivity keeps increasing, you're winning. You have to pay off
enough debts along the way to not grind to a halt, but you will always be in
technology debt. If you're not, you paid too much for what you have.

------
endlessvoid94
I must advocate reading James Hamilton's paper: On Designing and Deploying
Internet-Scale Services.

[http://www.usenix.org/event/lisa07/tech/full_papers/hamilton...](http://www.usenix.org/event/lisa07/tech/full_papers/hamilton/hamilton_html/)

~~~
grumpycanuck
Just took a look, that is an awesome paper.

~~~
endlessvoid94
It's invaluable. I can't believe it's not more well-known.

------
dcolish
I liked the article because it made me think, again, about the debt we have at
work. However, the few tacit solutions are not going to get me anywhere. This
article needs a part 2 for debt that cannot be fixed with trivial solutions.

------
gbog
To the author: would a bad choice of language, framework, persistence layer
count as infrastructure debt? If so, then it should be noted as the most
dangerous of all.

~~~
grumpycanuck
Author here:

With the understanding that I do most of my work in the most-bullied, down-
trodden programming language here on Hacker News (PHP) I think that your
choice of language, framework and persistence layer leads to _technical_ debt
or friction as raganwald commented above.

Like I said before, I view infrastructure debt as the cost of consistency in
your environments and the cost of moving your code from one environment to
another. Very rarely is the choice of language going to be a problem, unless
you are trying to use a language in a way it's not intended to be used (to
throw out a completely random idea, like trying to use PHP as a functional
language).

Frameworks are a sore point for people, but mainly because they choose to
fight them instead of trying to do everything the framework's way. Picking the
wrong framework is a technical debt situation, not a problem of moving code
from dev to production.

Persistence layer stuff is also a technical debt / friction issue. Chances are
that you could use that particular data store without the persistence layer
you chose. For example, I struggled to learn Doctrine1/2 but once I learned
how DQL worked it became a lot easier to break out of the object-only
constraints and create custom queries.

Hope that makes sense and answers your question.

