
Configuration (mis)management or why I hate puppet, ansible, salt, etc. - jtrtoo
http://www.scriptcrafty.com/configuration-mismanagement-or-why-i-hate-puppet-ansible-salt-etc/
======
bostik
I think a couple of our engineers have said it really well, so I'm going to
quote them both.

1\. All config management systems suck in equal measure, but in different
fashion. This is why you pick the tool that sucks the least _for your
particular usecase_ , and hope that two years down the line you made the right
choice.

2\. People who don't know how to write proper code end up creating two kinds
of things. Either they write a config management system, because they believe
it doesn't require deep diving; or they try to write a monitoring system,
because they don't understand what they are getting into. One way or the
other, the hidden and emerging complexities will eat them all.

Config management is hard, and the more stuff you have to massage on various
hosts, the harder it gets. (For the record: I'm a big fan of slim hosts and
the immutable infrastructure movement.)

~~~
_qc3o
I've started using terraform and packer since I wrote that post. I'm going to
stick with those tools for the time being. Configuration management is solving
a problem that no longer exists. Pre-bake your infrastructure and then
download the rest at runtime if you can't do it at bake time.

Coupled with docker, hashicorp vault, and fpm, I think the problem of
deploying and configuring anything is pretty much solved. The remaining issue
is basically how you go about doing orchestration for things like rabbitmq
clustering, which I don't think any of the existing configuration management
tools try to address anyway. You either need to configure the topology ahead
of time, which kinda defeats the purpose of having a reactive infrastructure,
or you have to write some kind of custom thing on top of what you are already
using, which again kinda defeats the purpose, since most of these
configuration management tools try to be declarative and you often need
imperative steps to configure something like a rabbitmq cluster.

~~~
bostik
Terraform is a neat tool. From my limited experience so far, it manages to
hide most of the complexity when dealing with EC2 network setup. We haven't
tested it with VPC peerings yet, but we do expect to get to that in the near
future too.

I don't quite agree with you about config management being useless, but its
role certainly can be reduced. A real life example: database hosts with
dedicated high performance volumes, their read replicas, and due to regulatory
compliance, always up-to-date off site replicas on physical hardware.

Same holds for load balancers, btw.

And as for docker, I have mixed feelings. I disagree with quite a few of their
design and engineering decisions, let alone with their QA so far. However - I
do think that they _nailed_ the basics of the devs' workflow UX. Anyone can
write a dockerfile, and the deployment story is pretty well thought out. (I
happen to disagree with the registry address being part of image namespace,
tbh.) Things start to get a bit iffy when you need to consider exported
volumes, and host interaction in general; why can't you declare per-image
default export paths on the host configuration? Why is restart policy
controlled by arguments to dockerd, instead of metadata inside the built
docker image?

I know the engineer in us all screams "that's a job for the orchestration
engine, stupid!" but the fact that you need to accept and integrate with a lot
of external complexity tells me that a lot of questions in this space are
still open.

------
smacktoward
To the author: I know that TFA includes Ansible in its list of crazy tools,
but after reading the list of complaints I think Ansible is actually more or
less what you're looking for.

\- Ansible needs no centralized server.

\- Ansible does everything over SSH, so you don't need to install client
software on all your machines.

\- Ansible can be configured to fail loudly when it fails ("ansible -vvvv"),
so if a task you specify doesn't work it's generally not hard to figure out
why.

\- YAML as used in Ansible isn't really a DSL; it's just used to produce a
very human-readable text recipe for bootstrapping a system. You can complicate
things beyond that if you want to, but by default you're just specifying a
list of tasks to be run on the target machine serially. You don't have to
learn a programming language like Ruby to be productive with it.

\- Ansible has a bunch of modules to make things like creating users, setting
up popular applications, etc. easy, but if you want to you can ignore all that
and just specify each task in your "playbook" as a shell command. So if you
can provision a machine by hand from the shell, all you need to do to write a
functioning Ansible playbook is learn a little YAML syntax and Ansible's
"shell" and "command" modules.

\- Vagrant supports Ansible as a provisioner, so if you want to do local
testing you can pick up your Ansible playbook and feed it to Vagrant to
produce a local VirtualBox VM configured to your specifications. Then once
it's working you can use the same playbook to configure your remote
machine(s).
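
To make that last approach concrete, a shell-only playbook can be as small as
this sketch (the host group, package, and commands are hypothetical, not from
the comment above):

```yaml
# site.yml -- uses only the "shell" and "command" modules,
# no Ansible-specific abstractions beyond the task list itself
- hosts: webservers
  tasks:
    - name: install nginx
      shell: apt-get install -y nginx
    - name: drop a marker file so reruns are detectable
      command: touch /var/tmp/provisioned
```

Run it with `ansible-playbook -i inventory site.yml`; Vagrant's `ansible`
provisioner accepts the same file unchanged.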

I resisted configuration management when the choices were Chef and Puppet for
many of the same reasons articulated in the post; they were heavyweight,
complicated solutions designed for large-scale use. Ansible removed all my
complaints. It's easy to pick up, requires very little software, and when
making design decisions generally goes with the simplest thing that could
possibly work.

~~~
dozzie
> \- Ansible needs no centralized server.

No, but it requires the same centralization on a different level: SSH key.
Good luck migrating to another one once the need arises (e.g. one of the
sysadmins leaves).

> \- Ansible does everything over SSH, so you don't need to install client
> software on all your machines.

Once you start configuring SSH, you risk cutting off your debugging channel.

Also, you wouldn't put your database connection management in user session
handling code in your application, would you? So why mix two totally
separate things in one service?

> \- YAML as used in Ansible isn't really a DSL [...] You don't have to learn
> a programming language like Ruby to be productive with it.

We've all seen how this was claimed for XSLT: "it's not code! it's human-
readable! you don't need to learn anything besides XML!". It didn't end pretty.
Why would a YAML-encoded programming language be any different?

Ansible is not a configuration management tool. Its very architecture (push-
based SSH+sudo shell commands) sucks heavily for this task, even if you only
have a dozen servers. It was never designed to keep OS state as encoded in
intents. Ansible is just a tool that somebody hacked for their specific
deployment case and then advertised as a general-purpose tool.

------
Spooky23
This summarizes the post:

> Why the fuck are build/ops folks making decisions about how the application
> should be deployed and configured?

The answer is really easy -- the build/ops team is configuring, deploying and
running the application.

Config tools aren't perfect, but they are better than the old-school manual
configuration, or letting individual developers cook up special snowflakes to
do things the "right" way.

You should be using as close to one configuration management tool as possible,
and as a developer, you should learn how that tool works.

~~~
havetocharge
This is exactly right. Ops are making decisions about how the app runs because
they are the ones who get paged at 4 am when it suddenly does not.

~~~
CaptSpify
One of the best strategies I've seen with this is: put devs on call. As soon
as they are woken up at 4 am, you can see entire teams turn around their
attitude on how things are run.

~~~
seanp2k2
Yep, this. PagerDuty and Slack can help. Many alerting and monitoring services
now work with these. Info / debug stuff goes to slack, urgent and important
stuff goes to PD. Make a quadrant of how important and how time-sensitive each
alert is to respond to and make them go to the respective mechanism. Email can
stand in for Slack, but people tend to ignore emails and filter them if they
start getting spammed with important-but-not-their-problem things, then
they'll miss the important-and-their-problem alerts later because their filter
matches those too.

Suddenly, everyone is writing tests, testing locally, deploying canaries,
getting demos ready 2 days ahead of time, and not deploying at 6PM on Friday.

~~~
CaptSpify
Sorry, but I fail to see how

A) That is exclusive/special to things like slack or pd. Seems like a lot of
tools could handle this.

B) How those tools are related to being woken up. I exclusively use email for
work notifications. Chat is purely for BS and chit-chat. The last thing I
would do when woken up at 3 is check chat.

------
PaulHoule
The Unix equivalent of your Powershell scripts is bash scripts. You can go
pretty far that way.

I have looked at the many configuration management tools out there and I think
they do add more complexity than they remove.

~~~
placebo
Agreed. Occam's razor isn't always adhered to in the software world. Often a
more hyped tool will be used instead of a more suitable one, but I'd be
careful about blaming the tool - the problem is usually the choice to use it
in a scenario that doesn't require it at present (and probably won't in the
future).

~~~
daenney
This, a million times this. Ansible, Chef, Puppet, cfengine and a bunch of
others serve certain use cases. Unfortunately, once you give someone a tool it
quickly turns into a hammer, then a jackhammer, a crowbar, sonic screwdriver,
fork lift and a lot more.

I've seen Puppet used as a replacement for simple bash scripts, for complete
management of a host (which is what I would consider config management), as a
deployment tool, and for distributed consensus.

------
ivan_ah
Fabric is a good example of config management as code. While all the app-
specific configs are much easier to do this way, it feels like many other
tasks (install packages, create users, etc) could be "outsourced" to some sort
of shared recipe. So I guess it depends on how "custom" your deploy and
configs are. If you're doing something standard, ansible/chef could be a good
fit, less so if there are many tweaks etc.

If anyone is interested in pursuing fabric for their config+deploy, check out
these libraries of higher-level system functions:
[https://github.com/sebastien/cuisine](https://github.com/sebastien/cuisine)
or [https://github.com/ronnix/fabtools](https://github.com/ronnix/fabtools)
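
For readers who haven't seen the style: Fabric tasks are plain Python
functions that shell out, and cuisine/fabtools just package common recipes as
helpers. A stdlib-only sketch of the shape (the real libraries execute
commands over SSH through Fabric's API; the package names and the use of
`echo` here are illustrative only):

```python
import subprocess

def run(cmd):
    """Stand-in for Fabric's run(): execute a shell command, return stdout."""
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout

def install_packages(pkgs):
    # A cuisine/fabtools-style "recipe" built on run(). Echoed here so the
    # sketch is side-effect-free; a real recipe would run apt-get itself.
    return run("echo apt-get install -y " + " ".join(pkgs))

print(install_packages(["nginx", "git"]).strip())
```

Because the "DSL" is just Python, composing recipes is ordinary function
composition, which is the appeal davidgerard describes below compared to
XML-based build tools.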

~~~
davidgerard
Config management? We use it as a build tool. It's _lovely_ because your DSL
is Python and the Fabric bits are libs. Certainly lovely having come to it
from Ant, whose DSL is Brainfuck in XML. I just wish Fabric had a decent Java
story (it totally doesn't).

------
convolvatron
it pains me to see this divide between operations and development perpetuate.
the developers (whether internal or external) ship some nonsense. it needs
excessive configuration. of course it needs to be keyed. it doesn't manage its
own resource usage properly. it needs to be protected from malicious agents.
it has a ton of ever-shifting external dependencies.

so the ops (or devops) people, not being developers, and more importantly not
having a mandate to spend months or years trying to address the fundamental
issues, try to wrap the fragile broken thing in a blanket. they deploy it, and
build up an equally messy infrastructure around it.

this goes on to such a degree that the developer, probably suffering from
his/her own limits on time, and scope limitations, finds themselves unable to
deploy or even test the system in question without becoming a full time ops
person.

so the process gets worse, and we have 'canaries', which is basically an
admission that developers can't even test simple changes to make sure they work
before they are deployed.

ops has to deal with poorer and poorer software

developers have the scope of change cranked down so low that basically nothing
can be done

we hire more and more people to try to get things to happen, and none of them
can get anything done because it's all so gridlocked.

the answer is clearly repeatably and trivially deployable builds, without any
secret commands and passwords that only billy really knows, including all the
configuration and state required to bring up an instance of the service from
both a development and operational context. this, apparently, is akin to

------
ajamesm
I'll definitely echo the sentiment about layers of incomprehensible crap in CM
tools.

I wrote my first set of Ansible scripts recently, and found myself deeply
pondering, "who on Earth thought YAML was the best option for a Turing-
complete language?"

CM tools should be saving me time. Writing under an isomorphism from BASH to
YAML does not save me time.

~~~
Terretta
Don't program in Ansible, declare desired state.

Use the Turing-complete language under the hood, Python, to program. Use
Ansible to compose: declare and configure your stack, invoking the things done
by the real language.

~~~
ajamesm
Okay, except I was already writing Python scripts to deploy, so now I'm where
I started off except I have one additional engineer yelling at me for not
doing things The Ansible Way.

This is the exact problem the article was describing -- CM tools promise ease-
of-use and simplicity, then that promise is inverted entirely when the
developer ends up doing additional work to satisfy the operations team's
standards.

~~~
Terretta
Separating config from code, or separating declarative environments from
orchestration of components, isn't about satisfying your ops team's standards;
these are things you probably want to do as a developer thinking of infra as
code.

Also, DevOps: It doesn't mean same person doing all things, but it does mean
you don't think in us vs. them.

Think about your app/service cradle to grave, do the right thing.

~~~
ajamesm
As a developer, I'm trying to meet deadlines with good-enough code.

I'm not stupid and I certainly understand why someone would develop a general
deployment strategy, and then each case can be described as a set of
parameters. Again: the use case is clear to me.

All of this makes sense in a world where the ops team is world-class and
responsive, and you have the schedule and resources to delay deployment for
another 2 weeks while you lovingly craft an Ansible module instead of
hardcoding a BASH script.

That does not describe business environments.

------
niftich
The author's experiences are fairly horrific, and they stem from their
workplace's misunderstanding of what config management tools are for and how
to deploy them properly.

The key quote is this:

"Doesn't take a genius to see that process is broken and that there is some
kind of impedance mismatch and the tools are not helping."

------
zeveb
> Umm, why can’t I just have a token in the code base I can use to query a
> service to give me that data?

Wait, is he arguing for including privileged tokens in source code‽

~~~
brazzledazzle
I got that impression as well.

I think he's on the right track with some frustrations, but the whole piece
kind of sounds like me a few years ago, which can basically be summed up as:
"fuck this is fucking stupid. why did they do it this way? fuck this." I'm not
saying he's wrong, but it does smell a bit like a lack of experience. That can
result in some great new ways of thinking and doing things, but it also has
the downside of not realizing that sometimes decisions were made because they
were the best ones given the circumstances, or that there are serious blind
spots in one's experience that keep one from seeing the true value of a
decision.

~~~
dkarapetyan
Since decisions are often made under the pressure of business goals, from a
technical standpoint they are almost always the wrong decision. In instances
like that it takes a certain amount of experience and backbone to either say
no to horrible hacks or put in the right kind of mitigations and fixes as soon
as the business goals have been met. If it intuitively feels wrong then I
don't really care what justifications people had at the time for writing a
horrible mess. Objectively it is a horrible mess still even if I forgive the
people that made the decisions that led to that mess.

Since I wrote that post I've seen the same mistakes repeated over and over
again. My tools of choice at the current time are terraform, packer, some kind
of secure storage mechanism like hashicorp vault, and plain old shell scripts
or rake files.

I'm also not advocating checking in secret tokens into the code base but you
need something that gives you access to the place that holds the secure
tokens. It is important to draw that distinction since the extra layer of
indirection actually provides you enough control to revoke and recycle tokens
much more quickly than if you had checked everything into the code directly.
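
As a sketch of that indirection: the code base carries only a short-lived
token (here, an environment variable), and secrets are fetched at runtime.
Vault's KV read endpoint (`/v1/<path>` with an `X-Vault-Token` header) is
real; the address, secret path, and token handling below are assumptions for
illustration:

```python
import os
import urllib.request

VAULT_ADDR = "https://vault.example.com:8200"  # hypothetical address

def vault_read_request(secret_path, token):
    """Build a request against Vault's KV read endpoint.

    Only the revocable token lives with the app; rotating or revoking it
    recycles access without touching the code base at all.
    """
    req = urllib.request.Request(f"{VAULT_ADDR}/v1/{secret_path}")
    req.add_header("X-Vault-Token", token)
    return req

# urlopen(req) would return JSON containing the secret material; only the
# request construction is shown, since the call needs a running Vault.
req = vault_read_request("secret/data/db-creds",
                         os.environ.get("VAULT_TOKEN", "dev-token"))
print(req.full_url)
```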

~~~
brazzledazzle
I agree, it does take experience and backbone and even then it may not be
enough. It's basically technical debt and unfortunately that's still something
many organizations struggle with today.

While I think you're right about many things being objectively bad from a
technical perspective there are a lot of decisions that aren't just technical.
You may end up using the lesser of two options if the lesser one is more
compatible with the expertise present in your organization. Sometimes it's not
clear cut at all and only time and real world usage reveals the pitfalls. It's
easy to point at that and say "that's stupid and this is terrible" a few
months or years down the line. The hard part is trying to make those
decisions.

Let's look at Puppet as an example. Let's say you have an organization that
has a couple people out of a dozen that can write shell scripts that aren't
spaghetti. But if everything depends on them they become a bottleneck. You
know there's a bunch of halfway decent modules written out there for a lot of
stuff you need and your two people can close the gap. The rest of the team can
build knowledge and skill over time but can start putting modules together to
build infrastructure today. So you can choose shell scripts and fixing your
personnel/skill issues or you can choose Puppet and deal with the warts.
Training or hiring new people would be obvious but given the reality of budget
and politics at $bigcorp it's a no-go or you're looking at years of effort.
Even if it's cut and dry technically it's really not always a simple decision.

~~~
dkarapetyan
I have seen this argument

> Let's look at Puppet as an example. Let's say you have an organization that
> has a couple people out of a dozen that can write shell scripts that aren't
> spaghetti. But if everything depends on them they become a bottleneck. You
> know there's a bunch of halfway decent modules written out there for a lot
> of stuff you need and your two people can close the gap. The rest of the
> team can build knowledge and skill over time but can start putting modules
> together to build infrastructure today. So you can choose shell scripts and
> fixing your personnel/skill issues or you can choose Puppet and deal with
> the warts. Training or hiring new people would be obvious but given the
> reality of budget and politics at $bigcorp it's a no-go or you're looking at
> years of effort. Even if it's cut and dry technically it's really not always
> a simple decision.

at least 5 different times and each time whoever argued that point ended up
being wrong 3 months down the line. What I don't get is why it keeps being
propagated. You should indeed take those 2 people, make them a bottleneck, and
let them build the organizational tools and expertise around what they
consider correct, instead of doing what you suggested.

I think this argument is propagated because it provides the illusion that
programmers are interchangeable. If somehow you can leverage all the Puppet
knowledge in the world you will reduce your risk and dependence on single
points of failure like one programmer that understands how the tower of shell
scripts fit together. I think this is based on a false premise. Programmers
are not interchangeable, and no matter how many frameworks and tools you layer
on, it will not be possible to get rid of the dependence on good programmers
doing what is right from a technical standpoint. The alternative just doesn't
make sense really. You can't take a bunch of idiots and make them brilliant
through tooling.

------
helloiamaperson
You should take a look at [https://bosh.io/](https://bosh.io/) . It does what
you want wrt devs being responsible for defining how the software runs,
environment reproducibility, etc. Unfortunately, the learning curve is pretty
steep.

------
jwatte
Use a script to build a container; once deployed, never touch it again. If you
need a new config, build a new container and roll the old version out.

------
X0nic
I feel this sentiment all the time from devs. "Why are you making me work
harder?"

As apps get larger, we as devs have to stop and think about how things are
going to run in production. We have too often made the ops team's life worse,
because we would rather just hardcode stuff.

Yes, these tools make it harder FOR DEVS, but now the Ops team has a hope of
actually managing the platform.

~~~
dkarapetyan
My point is the distinction is artificial. I've worked on both sides of the
fence and the more you know about the other side the better programmer you can
be. Being an application developer who is completely oblivious to deployment,
build, and configuration pipelines means you won't be able to build the best
application you possibly could if you knew about those things. Similarly, being a
generic UNIX sysadmin that worships at the altar of Puppet et al means you
will constantly be fighting with application developers about how to do things
because you'll be unaware of their needs.

------
contingencies
Years ago I made a post analyzing the failures of this class of tool, which I
entitled _Post-facto Configuration Tinkerers_.

Read: [http://stani.sh/walter/pfcts/](http://stani.sh/walter/pfcts/)

I actually built a solution:
[http://stani.sh/walter/cims/](http://stani.sh/walter/cims/)

------
dozzie
The same clueless developer as seven months ago:
[https://news.ycombinator.com/item?id=10857173](https://news.ycombinator.com/item?id=10857173)

------
dmourati
Configuration mismanagement, or why I hate puppet, ansible, salt... and chose
chef?

Right.

------
rboyd
This post makes me want to meet the ops team at his company and find a better
programmer than the author.

