A very valid complaint would be that Ansible isn't idempotent, which I think is what the author was trying to get at. NixOS would be a great example of an idempotent system, and you might be able to convince me that Docker would qualify, too. As much as I like Ansible, its lack of true idempotence is a weakness.
But that has nothing to do with yaml v. XML v. anything else. You could trivially write an idempotent description language in XML, and you can trivially write a horrible state-based system in a Haskell DSL or Prolog. As far as I can tell, the author is complaining that people think that Ansible is idempotent because it has yaml. That is incorrect, but I don't think that it'd be better or worse if they'd pick something else; merely more verbose. (Look at Maven and Gradle for an example.)
Yes, the post is somewhat ridiculous. I can take the blame for that. Still, I think the choice of language for these tools matters a lot. With so many experiments in configuration tools, my goal was to raise the point in a light-hearted way.
I'm really interested to see if or how a community with diverse preferences (developers and system administrators) might agree on a small number of configuration management tools. Or are we destined to have these tools splinter across languages?
I doubt it, unfortunately. I think a NixOS-like approach is the closest you can get to having a single tool, simply because it's a fairly logical extension of apt/rpm/etc.-style systems, but its very operation makes it too alien for me to get it adopted anywhere. I currently suspect that a movement to more PaaS-like designs is actually going to be our saving grace, simply because it ought to take so much off the table. Then we can look at tools like Clojure, Haskell, Prolog, and other similar language concepts to try to tackle the other half of things.
What I've watched and read, mostly from ansible.com, but also http://en.wikipedia.org/wiki/Ansible, suggests that Ansible is idempotent. What should I read that makes the opposite claim?
That's unfortunate. The presence of the command module alone ought to make it clear that's not true.
I think Ansible aims to be more idempotenty than other environments, and I think it succeeds there. The lack of something like Chef databags helps, as does its insistence that $SCM is the source of truth. But it doesn't take a holistic enough stance to really get all the way there. It's sometimes very difficult to handle things like the transition from one version of a playbook to another; while running a given playbook on a given blank system may be idempotent, it can't actually reliably model state transitions. In other words, it's only got half of the puzzle.
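To make the distinction concrete, here is a minimal Python sketch (not Ansible code; the function names and the in-memory list standing in for a file are made up for illustration). It contrasts a command-style step, which changes state on every run, with an "ensure" step, which converges:

```python
def append_line(lines, line):
    """Command-style step: blindly appends, like `echo ... >> file`."""
    lines.append(line)
    return True  # always reports a change


def ensure_line(lines, line):
    """Idempotent step: only acts when the desired state is not yet met."""
    if line in lines:
        return False  # already in the desired state, nothing to do
    lines.append(line)
    return True


# Running the command-style step twice duplicates the line.
blind = []
append_line(blind, "nameserver 10.0.0.1")
append_line(blind, "nameserver 10.0.0.1")

# Running the idempotent step twice leaves a single line.
converged = []
first_run_changed = ensure_line(converged, "nameserver 10.0.0.1")
second_run_changed = ensure_line(converged, "nameserver 10.0.0.1")
```

The second call to `ensure_line` reports no change, which is the property a raw shell or command task cannot promise on its own.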
Ansible playbooks are not always declarative. There is often stuff that looks like this:
---
---
# Launch Job to count occurrence of a word.
- hosts: $server
  user: root
  tasks:
    - name: copy the file
      copy: src=inputfile dest=/tmp/inputfile
    - name: upload the file
      shell: ...
    <more names and shell pairs ...>
---
That is just specifying a set of steps. So then it calls for for loops, if conditions, and so on.
I noticed that pattern in examples when looking at Ansible and kind of shook my head.
But oh well, I guess it depends on how you arrived at Ansible. If you view it as "better than SSH-ing and running shell scripts", OK, it looks nice. But if you want a more mature, declarative configuration system, then maybe you'll start to see various flaws or design issues with it.
Chef seems more powerful. I've also played with SaltStack and like it better so far, but aside from playing with it I still use shell scripts and OS packaging pre-/post-install scripts to handle tasks like these.
You say that they're not always declarative like it's a bad thing. Personally, I find it nice that we have the playbooks to build out our infrastructure and the playbooks that perform a rolling upgrade of our application using the same syntax, variables, and inventories. If you keep your declarative playbooks separate from your procedural ones, you get more power than a declarative system alone can provide. (FWIW, we chose it not just for this, but because of its lack of an agent - we can use our configuration management to deploy software to customers' infrastructure without colliding with their own CM.)
Further, this (using a data language as a programming language) isn't without precedent. Both XSL and Ant have standard conditionals, loops, etc available for use.
Salt implements its declarative language as a separate thing (States) from its more imperative parts (Modules, Runners, and the like). That's probably a better abstraction.
Things that take serious logic and state should be rewritten as modules (or ActionModules, which are awesome & underdocumented). I feel like people are far too scared of writing their own modules; it's not very hard.
People don't necessarily hate XML, but man do we hate tags. One huge feature of YAML is its clever use of whitespace to eliminate a lot of cruft from the syntax.
That said, I'm starting to warm to the fact that Django uses straight Python for its settings. If you're going to use a Turing-complete language to write your config file, might as well use a familiar one.
I hate XML. It's unnecessarily complicated, it breaks easily and it's verbose.
YAML is kind of sucky as a data transfer format (chopped up YAML is still valid which leads to all kinds of screwed up behavior), but its mirror equivalent JSON is good for that. Equally, JSON sucks for declarative configuration but YAML really shines.
Django's settings file is also one of the thornier bits of Django. I'm not sure if it should necessarily have become declarative, but making it a monolithic Python file causes a lot of headaches with integrating modules, state changes, etc. The fact that you can often move a bit of config from one end of the file to the other and change the behavior completely is a bit screwed up. I think it would work better if it were a bit more ansible-y.
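A small Python sketch of that ordering problem, assuming a toy loader that executes settings source top to bottom the way Python executes a module (`load_settings` and `LOG_LEVEL` are hypothetical names; this is not how Django itself loads settings):

```python
def load_settings(source):
    """Execute settings source top to bottom and keep the UPPERCASE names."""
    namespace = {}
    exec(source, namespace)
    return {k: v for k, v in namespace.items() if k.isupper()}


# The override comes after the derived value has already been computed.
override_last = """
DEBUG = False
LOG_LEVEL = "DEBUG" if DEBUG else "INFO"
DEBUG = True
"""

# Same three statements, with the override moved up one line.
override_first = """
DEBUG = False
DEBUG = True
LOG_LEVEL = "DEBUG" if DEBUG else "INFO"
"""

late = load_settings(override_last)
early = load_settings(override_first)
```

Both versions end with `DEBUG` set to `True`, but `LOG_LEVEL` differs between them, because the derived value was computed from whatever `DEBUG` held at that point in the file.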
I never did like using JSON as a configuration format (I'm looking at you, Sublime Text). But it's still miles ahead of using XML as a config format.
I can see where you're coming from regarding Django's settings file. Lord knows how many hours I've wasted hunting down stray configurations thanks to frameworks which abuse inheritance for their settings config (I'm looking at you, Mezzanine). Still, the fact that it is Python, and can therefore be debugged like any other Python script instead of obeying its own special syntax, is a plus in my books.
I was actually thinking of mezzanine when I wrote that (and django-cms is worse!). Both packages gave me a real headache while setting up settings.py. There was no clean separation between code and declarative configuration.
Yes! One of my little frustrations with Rails is that most configuration is done in Ruby, but then every now and again you're supposed to configure something with YAML, and then even weirder, some YAML files are actually implicitly interpreted as erb. It's weird and I'm not sure what the point is.
The idea is like what crdoconnor mentioned - it's good design to strictly separate 'data' from 'code', and most configurations are 'data'. Unfortunately, in practice there are plenty of times when a small loop or a conditional would greatly enhance DRY in config files.
I actually like the idea behind XSD - I have encountered situations where a "strongly-typed" configuration file would have made a lot of sense. Only problem is that the kinda-like-XML-but-not-really format of XSD is so clunky it becomes a second source of frustration.
The author's complaint isn't really related to YAML or XML, it's his disdain that Ansible uses a data serialization language, instead of a full-blown programming language.
Another way to look at it though is the author wouldn't mind using either YAML or XML if Ansible scripts were fully declarative. Once the need for if conditions and for loops comes about, any configuration file language (or data transport) language is going to be a little awkward.
"Once the need for if conditions and for loops comes about, any configuration file language (or data transport) language is going to be a little awkward."
This wraps up my feelings about CM tools very well. They all seem to follow this "language agnostic" / "it's not really programming!" model, whether it's chef's ruby DSL or ansible's YAML. I don't understand what's so special about this field that it requires a totally different paradigm.
In chef's case, I would rather write simple classes & functions in plain ruby than use the DSL and write "Lightweight Resource Providers".
In ansible's case, I even did a little experiment to show what it would look like to use & call ansible modules as normal python code:
I would love to see a CM tool that embraced plain python or plain ruby and not try to go the pseudo-"declarative" route, which ends up being IMO too constraining for not much benefit and needlessly reinventing a bunch of wheels (e.g. looping & conditionals, as the article mentions).
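As a rough idea of what that could look like, here is a Python sketch of a "plain functions" style. Everything in it is hypothetical: `FakeHost`, `package`, `write_file`, and `configure_web` are illustrative names operating on an in-memory stand-in, not an existing tool's API and not Ansible's module interface.

```python
class FakeHost:
    """In-memory stand-in for a managed machine (a real tool would touch the OS)."""

    def __init__(self):
        self.packages = set()
        self.files = {}


def package(host, name):
    """Ensure a package is present; return True if anything changed."""
    if name in host.packages:
        return False
    host.packages.add(name)
    return True


def write_file(host, path, content):
    """Ensure a file has the given content; return True if anything changed."""
    if host.files.get(path) == content:
        return False
    host.files[path] = content
    return True


def configure_web(host, sites, use_tls=False):
    """Ordinary Python conditionals and loops instead of DSL keywords."""
    changed = [package(host, "nginx")]
    if use_tls:
        changed.append(package(host, "openssl"))
    for site in sites:
        path = f"/etc/nginx/sites-enabled/{site}"
        changed.append(write_file(host, path, f"server_name {site};\n"))
    return any(changed)


host = FakeHost()
first = configure_web(host, ["example.org", "example.net"], use_tls=True)
second = configure_web(host, ["example.org", "example.net"], use_tls=True)
```

The second run reports no changes, so the individual steps can stay idempotent even though the composition is ordinary imperative code.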
I think the intention is to appeal to traditional sysadmins with limited development experience, who don't know ruby, and more importantly don't want to know ruby.
I haven't worked with Chef so I can't comment, but I work extensively on the puppet-openstack project and my experience is that this aversion to actual languages is part of a pipe dream. The only people who are able to work with the modules in any meaningful way are the ones that understand the Puppet DSL and runtime in its entirety, and it's not much simpler than the Ruby language it's based on.
Honestly I don't know what the solution is: part of the DevOps movement is introducing development practices to the management of systems, but that brings with it an inherent requirement that the operators actually embrace the development practices. A quick example of where this can easily fall down is if you maintain a puppet environment of any complexity, you'll end up keeping both the data inputs and the modules in git repos, but if your sysadmins aren't comfortable with git, the adoption simply won't happen.
> A quick example of where this can easily fall down is if you maintain a puppet environment of any complexity, you'll end up keeping both the data inputs and the modules in git repos, but if your sysadmins aren't comfortable with git, the adoption simply won't happen.
As someone who is traditionally more dev side, but has done a lot of casual system administration, this part of things like chef and puppet drives me nuts. I don't want to version my infrastructure in git and then throw it, with no versioning context, up to a server and have it version it again in its own unique, quirky, and honestly haphazard way.
Chef's DSL is particularly egregious, since if you step out of their slightly bizarre subset your code runs on the machine running knife. Newbies tend to get really confused.
Of course, Ansible doesn't really use YAML. It uses a YAML parser, but each individual command goes through quite a bit of parsing inside ansible as well. An ansible command like this:
- command: chdir=/foo echo {{ bar }}
  when: bar is defined
Really parses down to something like:
- action:
  - module: command
  - arguments:
    - chdir: /foo
    - command: echo {{ bar }}
  - when: bar is defined
More parsing is done by ansible than is done by YAML. I'm not necessarily saying it's bad, but it definitely leads to some gotchas and weirdnesses.
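To give a feel for that second layer of parsing, here is a toy Python sketch that splits a `key=value ... free-form text` string into options and a command. It is an illustration only and not Ansible's actual parser; `split_action_args` and the `KNOWN_OPTIONS` set are assumptions made for the example.

```python
# Illustrative option names; a real module would define its own.
KNOWN_OPTIONS = {"chdir", "creates", "removes"}


def split_action_args(raw):
    """Split 'key=value ... free-form text' into (options, free_form)."""
    options = {}
    rest = []
    for token in raw.split():
        key, sep, value = token.partition("=")
        # Only leading key=value tokens with a known key count as options.
        if sep and key in KNOWN_OPTIONS and not rest:
            options[key] = value
        else:
            rest.append(token)
    # Note: joining on single spaces collapses runs of whitespace.
    return options, " ".join(rest)


options, command = split_action_args("chdir=/foo echo {{ bar }}")
```

Here `options` ends up holding `chdir` and `command` holds the `echo {{ bar }}` text, still untemplated. None of that structure is visible to the YAML parser, which only ever sees one string.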
How it mixes in Jinja2 is also strange. In the above example, there are two expressions: {{ bar }} and 'bar is defined'. Normally if you paired a templating language with a text format, you would expect the templating language to apply to the whole file. Ansible applies it afterwards, and selectively. That's the right thing to do, but it does make it confusing.
Many people choose ansible because they don't want to learn Ruby as well as a CM tool; ansible isn't much simpler. I'm not a huge fan of ansible, it's just better than everything else I've encountered so far.
I've been using Ansible a bit recently (and _really_ enjoying it) and agree that YAML can be a little wonky at times, like having to escape any template line with :{ in quotes because YAML interprets it as a dictionary. However I don't think Ansible wants you to code all your logic in YAML. You can easily write a plugin with Python and have all the proper syntax and capabilities of a programming language.
Where Ansible really shines is giving you a huge library of quality plugins/components and a simple YAML-based DSL for composing those plugins. Want to spin up an EC2 instance, copy over your code, and ensure a service is started? No problem, it's a 4 or 5 line Ansible script. When you want to do something more advanced, look at the extension points Ansible provides to plug in your own code: http://docs.ansible.com/developing_plugins.html
Right, if you need more complex logic, then you write a module in a normal programming language. The YAML is intended to keep the intention and orchestration logic simple and readable.
It works okay. One shortcoming, as mentioned at https://news.ycombinator.com/item?id=7831680, is that there is no clear model for state transitions. Let's say that on day 1, you want to use Apache:
- apt: name=apache2 state=present
Then, on day 2, you realize that Apache isn't hip, so you switch to nginx instead:
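A minimal Python sketch of the problem, assuming a toy model in which a playbook is just a list of packages with state=present and the "system" is a set of installed packages (`apply_playbook` and the variable names are illustrative, not Ansible internals):

```python
def apply_playbook(system, playbook):
    """Apply 'state=present' tasks: install what is listed, touch nothing else."""
    for pkg in playbook:
        system.add(pkg)
    return system


day1 = ["apache2"]
day2 = ["nginx"]

# A long-lived server that saw both versions of the playbook.
long_lived = apply_playbook(apply_playbook(set(), day1), day2)

# A freshly built server that only ever saw the day 2 playbook.
fresh = apply_playbook(set(), day2)
```

The long-lived server ends up with both apache2 and nginx, while the fresh one has only nginx. The day 2 playbook describes the desired end state for a blank machine, but it says nothing about undoing what day 1 did.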
It depends on your deployment model to a degree. In the case where you build a new image for your web server on every deployment, then this isn't an issue. For longer running systems, like databases, it would still be an annoyance.
I think that Ansible is still a transitional glue tech though, and projects like flynn.io may move things forward by making the CM part of the infrastructure a developer task vs. a devops one.
Works very well for me. I have some pretty complex configurations and they're all done declaratively with a bit of template logic. I think it's pretty rare to need a custom module (not that it's hard to write one).
Ansible is 100% intended for configuration management. Configuration is by its very nature declarative, so in any case if you were doing a lot of custom procedural code for configuration you're probably doing something a bit wrong.
One of the things I like about Saltstack is that you can build your formulas using any renderers you like. You can try to remain strictly declarative in YAML, or if you like build your states completely dynamically in Python or a Python-DSL. The default uses an in-between that takes YAML but lets you template that YAML with Jinja.
Build tools also had a similar evolution. Ant's XML syntax first started out as all declarative, but then later on had to add more control structures (ant-contrib, IIRC). The currently popular build tools - Gradle, Rake, Grunt etc - all use a dynamically typed full-blown programming language for specifying tasks and their dependencies.
As another example, Leiningen is largely declarative, but since Clojure code is also data, it is possible to insert code alongside. See line 211 in the sample project.clj.
I find Ansible playbooks to be code shoehorned into YAML and think it's ambiguous whether under the GPL (the license itself, not AnsibleWorks' interpretation of it) the YAML is subject to the GPL. It seems to be class declaration, iteration, and function calls (to GPLed functions) to me.
I'd say leave the programming languages to modules and compose them together with a data language. Forced simplicity for configuration is a good thing.
You'd prefer to allow composability in the middle and bottom layers (the modules) but have a different language on top?
I have to admit that I'm a fan of turtles -- the same kind of turtles -- all the way down. Many of the abstractions I see in configuration management tools seem to be fancy names for modules. I don't see why a role, a playbook, a play, and a task really need to be different things at all. Why is the hierarchy necessary?
> You'd prefer to allow composability in the middle and bottom layers (the modules) but have a different language on top?
Absolutely. IMO, the top layer should be a summary of what the configuration manager will be doing and should have the project-specific data in it. The rest should be reusable modules. When you let a full programming language into the top layer, you inevitably get people trying to put smaller routines in the top level itself mixed in with reusable modules, which results in bloated spaghetti.
Thank God IntelliJ treats Spring XML as Java code, because so much logic and conditionals and variables live in those files. The actual program can be very different than what you would expect only looking at the Java files.
It pains me to see other tools make the same foolish mistake, even if I don't happen to use them.
Please! Not that horrible, horrid poop called XML. Keep it away from my beloved deployment/management tool. The world has plenty of other bloated packages that will gladly not care if they are bogged down with XML, just leave ansible alone!
I have a visceral hatred, a la Erik Naggum, towards XML. His arguments against XML capture a lot of my feelings, so I won't rehash them here.