Command execution on Ansible controller from host (computest.nl)
112 points by robin_reala 283 days ago | 54 comments



If only we had some form of superluminal communication, it might be possible to patch all the affected systems before they get exploited.


Not sure what you are trying to say. Are you trying to say that the CVE publication happened too early? If so, the timeline tells a different story:

    2016-12-08	First contact with Ansible security team
    2016-12-09	First contact with Redhat security team (secalert@redhat.com)
    2016-12-09	Submitted PoC and description to security@ansible.com
    2016-12-13	Ansible confirms issue and severity
    2016-12-15	Ansible informs us of intent to disclose after holidays
    2017-01-05	Ansible informs us of disclosure date and fix versions
    2017-01-09	Ansible issues fixed version


It was a joke about the name: https://en.wikipedia.org/wiki/Ansible

(Well, that plus the fact that this seems like the sort of scary bug which will get exploited very quickly.)


I see. This is really astonishing.

I always preferred using plain SSH and Shell for automatically configuring servers.

That's why I had a look at Ansible, but despite this simple concept, it seemed too bloated to me: too many layers around this simple idea. But I never expected to be proven right, let alone that this complexity would actually translate into 6 scary security holes in a row.

To propose an alternative, this is how I'm currently doing it. I'd love to hear what others think about it, or whether others have tried similar things:

Note: This works best for a relatively small and/or heterogeneous set of servers/VMs. So it's not really the full-blown Ansible use case. But then, how many people really do have billions of users and a large set of homogeneous servers at their hands? I bet the "long tail" of web applications doesn't need more than a few servers. Or, if your hardware is strong enough, just two servers, where the first one takes all the load, and the second one is a failover system on stand-by.

Use a plain shell and plain shell commands, which has the advantage that you can quickly try these interactively in any VM. Use "set -eu" so it halts on error, instead of blindly executing all following commands. Use appropriate commands, e.g. "install" instead of "cat+chmod+chown". Use "--backup=numbered" to keep backups of previous versions of files, if you want. Use only idempotent commands, i.e. commands that you can run multiple times without changing anything of significance. Prefer portable commands, but use distro-specific code as needed. The time you switch to another distro you'll have a new script, anyway, because you'll have a different system with different requirements. Write everything down in the style of a "copy&paste" documentation (Does this count as "literate programming"?), but put all your remarks in shell commands. Then automate the "paste" part by making it executable.

Example file (executable file "myserver_config"):

    #!/bin/sh
    tail -n +3 "$0" | ssh -p 1234 root@myserver_ip ; exit
    set -eu

    # Install packages
    ...

    # Configure Foo
    ...

    # Configure Bar
    ...

    # Configure Nginx
    #
    # Configure Nginx so that the foo fits into the bar
    # and boggles the foobar.

    install --backup=numbered -o root -g root -m 600 /dev/stdin /etc/nginx/nginx.conf <<'EOF'
    error_log /var/log/nginx/error.log;
    events {
      ...
    }
    http {
      access_log /var/log/nginx/access.log;
      ...
    }
    EOF

    # Do more stuff
    ...

Edit this file and redeploy by simply executing it:

    ./myserver_config

The script may seem hard to read, but with syntax highlighting it is a breeze. And you can easily convert this to HTML, AsciiDoc or Markdown if you need "real" documentation for your customer or other project members. It's just a few lines of code in Python, or whatever language you prefer.
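
To see what the first two lines of that example do without touching a real server, here is a minimal local sketch. It assumes `sh -s` as a stand-in for the `ssh -p 1234 root@myserver_ip` command, and the temporary file and echo text are made up for illustration.

```shell
#!/bin/sh
set -eu

# Write a tiny self-deploying script to a temporary file.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/sh
tail -n +3 "$0" | sh -s ; exit
set -eu
echo "deployed: line 3 onward ran in the receiving shell"
EOF

# Lines 1-2 run in the local shell. tail sends line 3 onward down the pipe,
# the receiving shell executes it from stdin, and "exit" stops the local
# shell before it would reach line 3 itself.
output=$(sh "$demo")
echo "$output"
rm -f "$demo"
```

Since `exit` without an argument returns the status of the last command, the local script ends with the status of the pipeline, which would be the status of ssh in the real version.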


A couple of reasons spring to mind as to why Ansible is an improvement on the eminently workable approach you outline here.

One is idempotency: it is not easy to run the above script on a bunch of hosts just to 'fix up' a config that was somehow broken, or to add or amend a directive in httpd.conf, that type of thing. With Ansible, you could re-run the playbook and it would only change what needed to be changed to bring everything in line with the playbook, i.e. just the directives in httpd.conf, and then do the reconfigure on httpd to bring in the changed config.

The other is inventory: a change like the one I outlined could, with a one-line playbook command, be run against all Dev hosts only, or Dev and UAT hosts only, or Red Hat hosts only, etc. When you're managing a lot of hosts this is invaluable, even just for checking some config or other by running a shell command against some set of hosts.
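
To make the inventory point concrete, here is a rough sketch of what a hand-rolled equivalent might look like in plain shell. The file format, host names and function names are all hypothetical, and the run is a dry run that only prints the ssh commands.

```shell
#!/bin/sh
set -eu

# Hypothetical inventory: one "group host" pair per line.
inventory=$(mktemp)
cat > "$inventory" <<'EOF'
dev  dev1.example.com
dev  dev2.example.com
uat  uat1.example.com
prod prod1.example.com
EOF

# hosts_in GROUP...: print the hosts that belong to any of the given groups.
hosts_in() {
    for group in "$@"; do
        awk -v g="$group" '$1 == g { print $2 }' "$inventory"
    done
}

# run_on CMD GROUP...: dry run that prints one ssh command per selected host.
# Drop the "echo" to actually execute over SSH.
run_on() {
    cmd=$1
    shift
    for host in $(hosts_in "$@"); do
        echo ssh "root@$host" "$cmd"
    done
}

# Check the nginx config on Dev and UAT hosts only.
run_on 'nginx -t' dev uat
```

This covers simple group selection, but not selection by discovered properties such as "Red Hat hosts only", which is where a real inventory with facts starts to pay off.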


Ansible's idempotency is an illusion. Removing an item from a playbook will practically cause it to be "forgotten" by Ansible, but left on machines where it was deployed previously. While there are mechanisms to remove stale items (state=absent), in practice it takes Spock-level discipline to do this properly. Playbooks are also not tied to a particular Ansible version, meaning just upgrading Ansible can cause changes even though your definitions are still the same. In my experience Ansible promises consistent environments, but consistently fails to deliver.


> One is idempotency - The fact that it is not easy to run the above script on a bunch of hosts just to 'fix up' a config that was somehow broken or to add or amend a directive in httpd.conf type of thing

In my case, I simply overwrite all config files and restart all services. This is not "efficient" (but still sub-second), but I believe this is idempotent by all means.
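
A small sketch of why overwriting is idempotent while appending is not. It assumes GNU `install` reading from /dev/stdin as in the example script; the file names and config lines are invented, and everything happens in a temporary directory.

```shell
#!/bin/sh
set -eu

target=$(mktemp -d)/foo.conf

# Overwrite style: the whole file is replaced, so the result does not
# depend on what was there before.
deploy_overwrite() {
    install -m 644 /dev/stdin "$target" <<'EOF'
listen 8080
workers 4
EOF
}

# Append style, shown for contrast: every run adds another copy of the line.
deploy_append() {
    echo "workers 4" >> "$target.append"
}

deploy_overwrite; first=$(cksum < "$target")
deploy_overwrite; second=$(cksum < "$target")
deploy_append; deploy_append

echo "overwrite checksums: $first / $second"
echo "lines after two appends: $(grep -c . "$target.append")"
```

The service restart is left out here; restarting after every deploy also reaches the same end state each time, at the cost of a short interruption.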


I prefer simple and portable shell scripts too. While I use here documents a bunch too, this is the first time I've seen this idiom:

    #!/bin/sh
    tail -n +3 <current-filename> | ssh -p 1234 root@myserver_ip  ; exit
    set -eu
    ...

Clever.

When you write `install`, is this another script that you have installed on the server beforehand or is it a function/alias or something? [EDIT: Duh, didn't realize this was actually a standalone program (standard but non-POSIX AFAICT) I haven't used.]


> While I use here documents a bunch too this is the first time I've seen this idiom

I came up with this trick to avoid nested here documents, which would make syntax highlighting unusable.

> When you write `install`, is this another script that you have installed on the server beforehand or is it a function/alias or something?

The "install" tool is very common on almost all Unix systems. It is usually called by a Makefile on "make install", but of course can be used by anyone.
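
For readers who have not used it, a minimal sketch of `install` next to the longer cp+chmod form. It runs in a temporary directory with made-up file names, and it leaves out `-o`/`-g` because setting the owner needs root.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
printf 'worker_processes 1;\n' > "$work/nginx.conf.new"

# One step: copy the file and set its mode.
# As root, -o and -g would set owner and group in the same call.
install -m 600 "$work/nginx.conf.new" "$work/nginx.conf"

# Roughly equivalent long form with separate commands.
cp "$work/nginx.conf.new" "$work/nginx.conf.long"
chmod 600 "$work/nginx.conf.long"

# -d creates a directory with the given mode.
install -d -m 755 "$work/conf.d"

ls -l "$work"
```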



In my experience with ansible, salt and chef, their key selling points compared to bash scripts are composability and abstracting away the underlying distribution. The latter is so leaky it's not very useful. Composability is possible, but the building blocks lack any opinionated structure, so everyone re-invents the wheel anyway.

Today I wouldn't use any of those anymore and, as you suggested, would use shell scripts in an immutable infrastructure fashion (aka, reinstall the system when changes are needed).


Doesn't reinstalling the system when changes are needed mean very slow deploys though?


Sure, but we ansible users are already accustomed to that. :/

More seriously, ansible is terrible and I have a long list of complaints, but it is still better than everything else at the moment. Being able to run playbooks from anywhere without a special controller host is very important, and seems to have been missed by most competitors.


If you deploy in containers, it's not. But yeah, that comes with its own challenges.


Seen that approach in production, had to work with it, was not very impressed.


> To provide an alternative. This is how I'm currently doing it. I'd love to hear what others think about it:

Different tools for different scales. Ansible exists as an improvement over ad-hoc shell scripts, simple/ad-hoc inventories, copy-pasting etc.

I think it's very unfair to all the contributors to highlight some security issues and say "I told you so", and propose a shell script as an alternative.

EDIT: I see your edits, acknowledging more complex environments...


Consider using "$0" instead of relying on the file name in the script.


Fixed. Thanks for the hint.


Am I the only one who finds it pretty unprofessional to release the exploits when the fixed version hasn't been released yet (and anyway was only scheduled to have been released 48 hours beforehand)?

I'm all for disclosure, but seriously - if RH want Ansible to be used in enterprise they can't expect patches at this rate. The researchers releasing the exact exploits so quickly is just irresponsible IMO.


No, you're not the only one, but this is one of the oldest debates in computer security --- possibly the oldest debate --- and at least as many people as agree with you vigorously disagree and think that delaying information to conform with enterprise patch cycles does harm to organizations with strong security teams who can handle and respond to reports like this; those organizations tend to be the ones with the most users and the most sensitive data to protect.

While I sympathize far more with the full disclosure people than with the patch choreography people, I'm really only pointing this out to demonstrate that you're not going to resolve this debate in the HN comments about an Ansible vulnerability.


But to be a victim of this vulnerability you need to have one of the hosts already compromised AFAIU, so I don't think it's that severe.


Any exploit that turns a 1 host hack into a hack of the entire data center with root access seems worth a patch....


The article says fixes have been released.

> Resolution: Ansible has released new versions that fix the vulnerabilities described in this advisory: version 2.1.4 for the 2.1 branch and 2.2.1 for the 2.2 branch.


The article is wrong. Pypi only shows 2.2.0 released in November. That's my point.


Pypi isn't the source-of-record for Ansible releases...

Latest can be found here: https://github.com/ansible/ansible/releases or https://releases.ansible.com/


There are only release candidates of the fixed versions on https://releases.ansible.com/


The fixed version does not seem to be released yet on pypi or their launchpad PPA. The only (official) place where a fixed release is available seems to be https://releases.ansible.com/ansible/


Can anyone explain in simple terms what is this about? Exploit found in Ansible? Should we all update asap?

I couldn't understand from article or discussion.


Ansible sends commands to servers, but asks them for certain data first (facts).

If one of your servers is compromised, this is a vulnerability in the ansible client that lets that bad server take over your local computer and your other servers when you connect to it by sending you bad facts.

So it's pretty serious.
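
The general shape of the problem can be shown with a shell analogy. This is not Ansible's code: it assumes a made-up format of one KEY=VALUE "fact" per line and hypothetical function names, and only illustrates the difference between evaluating a host's reply and parsing it as data.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
marker=$workdir/pwned   # should only ever exist if injected code ran

# Hypothetical reply from a compromised host, with a command smuggled
# into the second value.
bad_facts="os=linux
arch=x86_64; touch $marker"

# Unsafe: evaluating the reply as shell code runs whatever the host embedded.
load_facts_unsafe() {
    eval "$1"
}

# Safer: treat the reply strictly as data. Only known keys with plain
# values are accepted; unknown keys are skipped, odd values are reported.
load_facts_safe() {
    printf '%s\n' "$1" | while IFS='=' read -r key value; do
        case $key in
            os|arch) ;;
            *) continue ;;
        esac
        case $value in
            *[!A-Za-z0-9_.-]*) echo "rejected $key" >&2 ;;
            *) echo "$key=$value" ;;
        esac
    done
}

load_facts_safe "$bad_facts"
if [ ! -e "$marker" ]; then echo "safe parse: nothing executed"; fi

# Run the unsafe variant in a subshell so its variables do not leak.
( load_facts_unsafe "$bad_facts" )
if [ -e "$marker" ]; then echo "eval: the host ran a command on the controller"; fi
```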


Puppet uses Facter and Chef uses Ohai to achieve the same thing. Could they be exploited in a similar fashion?


Unfortunately versions 2.1.4 and 2.2.1 are still in release candidate phase so package managers such as Homebrew are still using vulnerable releases, as far as I can tell.


I'm just starting to use Ansible for a major deployment at work, so I am not an expert. Can anyone who knows more explain why "ansible_connection" exists as a host-definable fact? The controller already knows what host it is connecting to when it retrieves the fact, so why can the host change it?


It is not a host-definable fact, it is a 'host variable' which you normally define in inventory.


From the advisory:

When Ansible runs on a host, a JSON object with Facts is returned to the Controller. The Controller uses these facts for various housekeeping purposes. Some facts have special meaning, like the fact "ansible_python_interpreter" and "ansible_connection".

So while you may normally define it in the inventory, it sounds like, from the advisory, it can also be defined by the host.
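
One possible defence, sketched in shell purely as an illustration: drop the special names from whatever the host returns before merging it with inventory values. This is not a description of the actual patch. The KEY=VALUE format, the sample values and the `drop_reserved` helper are assumptions; only the two variable names come from the advisory.

```shell
#!/bin/sh
set -eu

# Variables the controller trusts from its own inventory (hypothetical values).
inventory_vars="ansible_connection=ssh
ansible_python_interpreter=/usr/bin/python"

# Reply from a compromised host trying to redefine the connection settings.
host_facts="ansible_distribution=Debian
ansible_connection=local
ansible_python_interpreter=/tmp/evil"

# drop_reserved: filter stdin, removing the two special names from
# host-supplied facts. "|| true" covers the case where nothing is left.
drop_reserved() {
    grep -v -E '^(ansible_connection|ansible_python_interpreter)=' || true
}

# Host facts are filtered first; inventory values are added afterwards.
merged=$(printf '%s\n' "$host_facts" | drop_reserved; printf '%s\n' "$inventory_vars")
printf '%s\n' "$merged"
```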


No fix for 1.9? We are stuck on 1.9 because 2.X does not support connecting to more than a couple hundred hosts. This ticket has been open for about a year now: https://github.com/ansible/ansible/issues/14143#issuecomment...


Have you considered working on it to fix that issue?


disclaimer: I'm one of the maintainers

I'll try to answer several questions so this might get long.

First, the procedure for disclosing the CVE is something we discussed internally (including security professionals), as all over the internet there are 2 or more views on this. The decision arrived at doesn't please everyone (I doubt one exists that would), but it is a recognized way of dealing with security issues, so it is what we followed.

The CVE can be dense explaining how the exploits work. The simple version: it is a rehash of an old problem which we had thought solved, the researchers proved us wrong by finding ways around our filtering. The vulnerability is pretty hard to trigger and requires both a compromised system to intercept the Ansible calls and return faulty data and intimate knowledge of the existing playbooks and systems used to force the arbitrary execution.

All software has flaws, this is not an excuse, it is a fact. Not all software or flaws have the same scope though. As an automation tool that is often used to manage things with high levels of privilege, we take these things very seriously and we do our best to prevent it in the first place or remediate it as soon as possible. As an OSS project we appreciate the eyes and efforts many put into finding these flaws which end up making the software better (and me a better programmer).

Ansible is not idempotent, it is declarative, which does help the user create idempotent plays. True idempotency depends on many things, the modules used, the problem addressed and how the playbooks are written, etc. Both of the following are valid ansible tasks, but the implications are very different even if the result ends up being the same.

    - shell: usermod -G user bcoca

    - user: state=present name=bcoca groups=user
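
The difference between the two tasks can be sketched in plain shell. This is a toy model under stated assumptions, not how the `user` module is implemented: a temporary file stands in for the user's group list, and both function names are hypothetical.

```shell
#!/bin/sh
set -eu

state=$(mktemp)   # stand-in for the user's current group list
: > "$state"

# Like the shell task: performs the change every time and cannot tell
# whether it was needed.
set_groups_always() {
    echo "$1" > "$state"
    echo "changed"
}

# Like a declarative task: compares desired and current state first and
# only acts on a difference.
set_groups_declared() {
    if [ "$(cat "$state")" = "$1" ]; then
        echo "ok"
    else
        echo "$1" > "$state"
        echo "changed"
    fi
}

echo "declared, run 1: $(set_groups_declared user)"
echo "declared, run 2: $(set_groups_declared user)"
echo "always,   run 1: $(set_groups_always user)"
echo "always,   run 2: $(set_groups_always user)"
```

Both variants leave the same final state; only the declared one can report that nothing needed to change.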

I hope this helps,


Odd. I reported this very same form of vulnerability to the Ansible team in the 1.5.4 series in 2014, where the code basically eval'd the "facts" discovered from a system under management.

There was this "safe_eval" function which filtered input in a way quite inconsistent with its name. The Ansible team was responsive and pleasant to work with!

https://groups.google.com/forum/#!topic/ansible-project/MUQx...

But I suspect lots of remote control and monitoring software products might have security bugs like this where they assume that the information returned from systems under management is trustworthy.

Edit to add: Here's the patch made to safe_eval in 2014. I had suggested using literal_eval instead but I guess a Python 2.6+ requirement wouldn't work. https://github.com/ansible/ansible/commit/998793fd0ab55705d5...

Edit again: Ansible is a pretty great product, and IMO one of the first of such tools to seriously improve the UX for sysadmins. Thanks for maintaining it!


https://www.computest.nl/advisories/CT-2017-0109_Ansible.txt has the bug details

Edit: LWN link was changed to this one


Thanks! Some of those are somewhat embarrassing, especially for something that's meant to be software used in secure environments. Why does a client need to specify an interpreter to run on the master host? Or changing the template brackets to escape quoting? I'm also thinking that maybe Python might be a bit too dynamic – allowing anything by design – bringing its own share of problems to developing security-conscious software.


> Why does a client need to specify an interpreter to run on the master host?

Some systems have Python installed in a rather uncommon location. For example, Python is not part of the FreeBSD base system, so Python is installed at /usr/local/bin/python instead of the expected /usr/bin/python, or Arch has Python 2 installed at /usr/bin/python2 rather than /usr/bin/python.

Note that Ansible doesn't require itself to be installed on the remote host (which IMHO is one of its biggest selling points) and executes tasks by sending a packed version of a task to the remote host and executing it using `ansible_python_interpreter` (e.g. `/usr/bin/python /tmp/ansible-tmp-a43bf412.py`)
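
A rough sketch of that lookup in shell, assuming a hypothetical host naming scheme and helper functions. The interpreter paths and the task file name are the ones mentioned above; nothing is executed, the commands are only printed.

```shell
#!/bin/sh
set -eu

# Hypothetical per-host setting, in the spirit of ansible_python_interpreter.
interpreter_for() {
    case $1 in
        freebsd-*) echo /usr/local/bin/python ;;
        arch-*)    echo /usr/bin/python2 ;;
        *)         echo /usr/bin/python ;;
    esac
}

# Build the command line that would run a copied task file on the host.
remote_command() {
    host=$1
    task_file=$2
    echo "$(interpreter_for "$host") $task_file"
}

for h in freebsd-db1 arch-build1 debian-web1; do
    echo "$h: $(remote_command "$h" /tmp/ansible-tmp-a43bf412.py)"
done
```

Because the looked-up value ends up as the program to execute, it matters a great deal that it comes from the controller's own configuration and not from data the host sends back.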


Even if you don't have to install a client, you at least have to authorize an SSH key and somehow express your intent to add an ansible client. When, in the case of some other solution, the installation amounts to one package to deploy (with no dependency) and one command line to issue, installation is a false problem.


> Why does a client need to specify an interpreter to run on the master host?

No, the client specifies the interpreter to use inside itself.


Actually, the 'controller' specifies the interpreter to be used at the client; there can be more than 1, and the '1st one in path' is not always the correct one.


But why? I think a dedicated interpreter at a fixed address would do.


Imagine you're in a large enterprise environment, and need to deploy something to lots of different servers that were created by lots of different people (your company bought 3 others and each used different software/linux versions/servers). Some have python in the path, some have different pythons in the path, etc. You can't immediately throw away all these servers and rebuild them - some probably have 10 year old bits of software on them where the original author has moved away. Ansible might be one of the tools you use to start fixing this chaotic situation, and having per-host configurations for things like python path is essential.


Step 1: Install the expected, dedicated Python interpreter as /usr/bin/ansible_python.


Python is not just a single executable. It takes a little more work than one would want. And for what gain? Would you be fine with other software also requiring a custom interpreter to function? It gets cumbersome.


If you rely on the system-wide Python you'll need to cover different versions and you need to make sure not to use any 3rd party modules.


Ansible ships its own libraries, generally.


Incredibly. But it seems to be The Ruby Way™…


Well, it's an agent-less system after all, the idea is to keep the system lean.


Because it uses existing interpreters on the target host, and fixed locations only happen in a homogeneous environment. Most IT shops commonly have to deal with different OSes/distributions/versions, so the same way you cannot have just 1 t-shirt size for everyone, you cannot have 1 interpreter path.

I'm saying 'interpreter' instead of Python because you can create modules in any language. Ansible only ships with Python ones, but Perl, Ruby, etc. modules also exist and are usable by Ansible.


Can mods change the link to this?



