
Ask HN: Which configuration management software would/should you use in 2020? - uaas
What is your team using at work? What should be used at scale (FAANG, or similar)? What are you planning to switch to?
======
caleblloyd
Not FAANG but for small to medium "cloud native" businesses I like to use this
approach with minimal dependencies:

Managed Kubernetes cluster, such as GKE, for each environment, set up in the
cloud provider UI since this is not done often. If you automate it with
Terraform, chances are that by the next time you run it, the cloud provider
has subtly changed some options and your automation is out-of-date.

Cluster services repository with Helm charts for ingress controller,
centralized logging and monitoring, etc. Use a values-${env}.yaml for
environment differences. Deploy with CI service such as Jenkins.

Configuration repository for each application, containing its Helm chart. If
it's an app with one service, or with all services in a single repo, this can
go in the same repo. If it's an app with services across multiple repos,
create a new repo. Use a values-${env}.yaml for environment differences.
Deploy with a CI service such as Jenkins.
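The per-environment deploy step described here can be sketched as a tiny shell helper. The release, chart, and namespace names below are made up for illustration, and the helm command is printed rather than run:

```shell
# Hypothetical CI deploy step: layer a values-${env}.yaml over the chart
# defaults. Nothing here talks to a real cluster; we only build the command.
deploy_cmd() {
  env="$1"
  echo "helm upgrade --install myapp ./chart -f values.yaml -f values-${env}.yaml --namespace myapp-${env}"
}
deploy_cmd staging
# → helm upgrade --install myapp ./chart -f values.yaml -f values-staging.yaml --namespace myapp-staging
```

In a real pipeline the CI job would pass the environment name in and execute the command instead of echoing it.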

Store secrets in the cloud secrets manager and interpolate them into
Kubernetes secrets at deploy time.
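A sketch of that interpolation; the gcloud/kubectl commands in the comments are illustrative only, and a stub stands in for the secrets-manager call so the flow is runnable:

```shell
# Stand-in for e.g.: gcloud secrets versions access latest --secret="$1"
fetch_secret() { printf 's3cr3t'; }

DB_PASSWORD=$(fetch_secret db-password)
# A real pipeline would now run (not echo) something like:
#   kubectl create secret generic db-creds --from-literal=password="$DB_PASSWORD"
echo "would create k8s secret with a password of length ${#DB_PASSWORD}"
```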

Cloud provider keeps the cluster and VMs up-to-date, CI pipelines do the
builds and deployments. No terraform/ansible/other required. Again, this only
works for "cloud native" models.

~~~
nickthemagicman
Yeah, in a decent architecture the only place state is located is in the
datastore layer.

The goal is to make servers disposable, able to be destroyed and created at
will, so configuration management becomes kind of a legacy technology at that
point.

~~~
polcia
Yes, but created from what? This can be called config mgmt too.

~~~
caleblloyd
For a traditional datastore, I usually do:

Dev/QA/similar: Either containerize it and back it with a persistent volume,
or use a managed DB service such as RDS or Cloud SQL and create a schema per
environment. Include a deployment pipeline argument to reset it to a known
state. The CI pipeline can be tuned to handle dynamic environments in either
case.

Stage/Prod: Use managed DB service such as RDS or Cloud SQL.

The time and cost to automate a DB upgrade with every edge case considered is
huge. It rarely makes sense for a small/medium business.

~~~
saber6
Nitpick: I really don't suggest a divergence in the DB/stack-of-choice between
Dev/QA/Stage/Prod. I've chased so many issues that were dismissed in the
planning process as "yeah, that's an edge case and most likely won't happen".

The reasons I've seen for doing so are usually penny-wise, pound-foolish:
saving a few (conceptual) dollars on a per-env/per-cycle spreadsheet, while
neglecting the long-tail consequence of your labor cost growing, potentially
forever, without regard for total cost of ownership.

Sorry didn't mean to rant. Hope this helps.

------
phaer
I still prefer the Open Source edition of
[https://puppet.com/](https://puppet.com/) to manage larger, diverse
environments - which may include not just servers, but workstations, network
appliances and so on. It's well established with lots of quite portable
modules. But it can also be a bit on the slower side and comes with a steeper
learning curve than some of the others.

[https://www.ansible.com/](https://www.ansible.com/) is surely a good solution
for bootstrapping Linux cloud machines and can be quite flexible. I personally
feel like its usage of YAML manifests instead of a domain-specific language
can make complex playbooks harder to read and to maintain.

If all you do is to deploy containers on a managed Kubernetes or a similar
platform, you might get away with some solution to YAML templating (jsonnet et
al) and some shell glue.
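A minimal version of that shell glue, with sed standing in for jsonnet; the template, placeholder, and tag are invented:

```shell
# Render a one-line manifest template; __TAG__ is an ad-hoc placeholder.
IMAGE_TAG=v1.2.3
cat > /tmp/deploy.tmpl <<'EOF'
image: registry.example.com/app:__TAG__
EOF
sed "s/__TAG__/${IMAGE_TAG}/" /tmp/deploy.tmpl
# → image: registry.example.com/app:v1.2.3
```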

I am keeping an eye on
[https://github.com/purpleidea/mgmt](https://github.com/purpleidea/mgmt),
a newer contender with many interesting features but a lack of more complex
examples.

Others like SaltStack and Chef still see some usage as far as I know, but I've
got no personal experience with them.

~~~
apple4ever
Ansible is amazing for configuration management, much better than Puppet.
Storing the config in YAML makes it super easy to read and maintain, also much
better than Puppet's method.

As you mention, Puppet has a steep learning curve, whereas Ansible has a very
shallow one. It's easy to get running in a few minutes!

We use both Puppet and Ansible at work, and it's constant complaints and
delays with Puppet, whereas with Ansible there are few complaints and no
delays.

~~~
notyourday
> We use both Puppet and Ansible at work, and it's constant complaints and
> delays with Puppet, whereas with Ansible there are few complaints and no
> delays.

That's probably because you are not running masterless, which means your
puppet master is a bottleneck.

~~~
apple4ever
The master is part of the bottleneck, but a lot of the complaints come from
trying to get it to do what it says it will. And a big benefit of Puppet is
the master feature, so if that's taken out, why Puppet?

~~~
notyourday
Puppetmaster took off because it was conceptually easy to understand for
people who were used to managing servers by hand.

I would argue that masterless puppet is a superior pattern, both for scaling
and for creating hierarchical structures.

------
brightball
I favor Ansible for 2 main reasons:

\- If you have SSH access, you can use it. No matter what environment or
company you work for, there's no agent to install and no need to get approval
to use the tool. It's easy to build up a reproducible library of your shell
habits that works locally or remotely, where each step can avoid being
repeated if things need to be rerun.

\- If you get into an environment where performance across many machines is
more important you can switch to pull based execution. Because of that, I see
very little advantage to any of the other tools that outweighs the advantages
of Ansible.
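The "avoid being repeated" property above is what Ansible's modules give you for free; hand-rolled in shell, it's a stamp-file guard along these lines (the stamp path and step are invented):

```shell
# Guard a step so reruns are no-ops, mimicking an idempotent Ansible task.
STAMP=/tmp/step-demo.done
run_step() {
  if [ -f "$STAMP" ]; then
    echo "skipped"
  else
    echo "did the work"   # stand-in for the real step
    touch "$STAMP"
  fi
}
rm -f "$STAMP"
run_step   # → did the work
run_step   # → skipped
```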

~~~
inshadows
> If you have SSH access, you can use it. No matter what environment or
> company you work for, there’s no agent to install

I don't get why this is always brought up as a major advantage when discussing
CM. Ansible actually installs its Python runtime to target systems. Once I had
a server whose root disk was full, and Ansible failed to work because there
was no space left to copy tons of its Python code.

~~~
vinaypai
Ansible doesn't install a runtime on the target machine, it temporarily copies
over the scripts that do the work and removes them after the run is complete.
These are a few kilobytes typically.

No configuration system is likely to work with a full root partition, though.

~~~
_frkl
Agree. Although I personally prefer Ansible to the alternatives, the one thing
I don't like is that it does require python to be installed on the target
hosts for most modules. That's not a problem usually, but every now and then
it is. Also, sometimes additional python packages are required, and managing
those in an automated manner is usually a hassle....

I've had this idea for a config management system that compiles all
provisioning code to POSIX shell. One of those years I'll finish it... :-)

~~~
mdaniel
> it does require python to be installed on the target hosts for most modules

I recently learned that ansible supports binary modules, in addition to the
OOtB support for modules written in any already-configured scripting language
(including shell):
[https://docs.ansible.com/ansible/2.9/dev_guide/developing_pr...](https://docs.ansible.com/ansible/2.9/dev_guide/developing_program_flow_modules.html#binary-modules)

However, while I know you meant "all the provided modules," it seems they are
headed toward a less "batteries included" style and more
(pypi|maven|npm|rubygems) style of "the community will sort it out" mechanism
of distribution:
[https://docs.ansible.com/ansible/2.9/user_guide/collections_...](https://docs.ansible.com/ansible/2.9/user_guide/collections_using.html#using-collections)

Which I welcome wholeheartedly, because landing even the simplest fixes to
ansible modules is currently a very laborious and time-intensive operation.

------
grrywlsn
I'm curious why people use configuration management software in 2020. All of
that seems like the old way of approaching problems to me.

What I prefer to do is use Terraform to create immutable infrastructure from
code. CoreOS and most Linux variants can be configured at boot time (cloud-
config, Ignition, etc) to start and run a certain workload. Ideally, all of
your workloads would be containerised, so there's no need for configuration
drift, or for any management software to be running on the box. If you need to
update something, create the next version of your immutable machine and
replace the existing ones.
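A minimal cloud-config of the kind described, applied once at boot so no management agent runs afterwards. The package and unit names are just an example, and here we only generate the file; a real launch would hand it to the instance:

```shell
# Write a tiny cloud-init user-data file that installs and enables Docker
# at first boot.
cat > /tmp/user-data.yaml <<'EOF'
#cloud-config
packages:
  - docker.io
runcmd:
  - [ systemctl, enable, --now, docker ]
EOF
cat /tmp/user-data.yaml
```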

~~~
notacoward
"Immutable infrastructure" what a laugh. In a large deployment, configuration
somewhere is _always_ changing - preferably without restarting tasks because
they're constantly loaded. We have (most) configuration under source control,
and during the west-coast work day it is practically impossible to commit a
change without hitting conflicts and having to rebase. Then there are machines
not running production workloads, such as development machines or employees'
laptops, which still need to have their configuration managed. Are you going
to "immutable infrastructure" everyone's laptops?

(Context: my team manages dozens of clusters, each with a score of services
across thousands of physical hosts. Every minute of every day, _multiple_
things are being scaled up or down, tuned, rearranged to deal with hardware
faults or upgrades, new features rolled out, etc. Far from being immutable,
this infrastructure is remarkably _fluid_ because that's the only way to run
things at such scale.)

Beware of Chesterton's Fence. Just because you haven't learned the reasons for
something doesn't mean it's wrong, and the new shiny often re-introduces
problems that were already solved (along with some of its own) because of that
attitude.

~~~
wpietri
Are you sure you two are talking about the same thing?

My understanding of immutable infrastructure is the same as immutable data
structures: once you create something, you don't mess with it. If you need a
different something, you create a new one and destroy the old one.

That doesn't mean that the whole picture isn't changing all the time. Indeed,
I think immutability makes systems overall more fluid, because it's easier to
reason about changes. Mutability adds a lot of complexity, and when mutable
things interact, the number of corner cases grows very quickly. In those
circumstances, people can easily learn to fear change, which drastically
reduces fluidity.

~~~
tibbon
Yup. We do this. When our servers need a change, we change the AMI for
example, and then re-deployment just replaces everything. Most servers survive
a day, or a few hours.

~~~
0xEFF
Configuration Management is still present in this process, it's just moved
from the live system to the image build step.

~~~
notacoward
Probably the most insightful comment in this entire thread. Thank you. In many
cases, an "image" is just a snapshot of what configuration management (perhaps
not called such but still) gives you. As with compiled programming languages,
though, doing it at build time makes future change significantly slower and
more expensive. Supposedly this is for the sake of consistency and
reproducibility, but since those are achievable by other means it's a false
tradeoff. In real deployments, this just turns configuration drift into
container sprawl.

------
b5n
Surprised more people here are not using Salt. Having used both Salt and
Ansible, I much prefer Salt, especially when working with larger teams.

When working solo I use Guix, both Guix and Nix are _seriously_ amazing.

~~~
skrebbel
What's salt? Any link? I found something called SaltStack but that appears to
be enterprise security software.

~~~
mroche
[https://github.com/saltstack/salt](https://github.com/saltstack/salt)

Salt (also known as SaltStack) is the right one.

> Salt is a new approach to infrastructure management built on a dynamic
> communication bus. Salt can be used for data-driven orchestration, remote
> execution for any infrastructure, configuration management for any app
> stack, and much more.

~~~
skrebbel
Thanks!

I'm ultra confused about their marketing btw. Their website doesn't even say
it's open source. You have to sign up to "try it now". It's like they don't
want customers? Or are people who want to understand what they're buying not
the target market, somehow?

For reference, this appears to be the Salt primer:
[https://docs.saltstack.com/en/getstarted/system/index.html](https://docs.saltstack.com/en/getstarted/system/index.html)

~~~
NikolaeVarius
I think there is only a small subset of users for whom open source is actually
part of the buying decision.

~~~
skrebbel
Sure, but understanding what the thing is, that's part of the buying decision
right? I have _no clue_ what
[https://www.saltstack.com/](https://www.saltstack.com/) is about.

How do I get from this:

> _Drive IT security into the 21st century. Amplify the impact of your entire
> SecOps team with global orchestration and automation that remediates
> security issues in minutes, not weeks._

to "it's a provisioning tool for servers, like ansible but faster"?

~~~
cdcarter
The docs landing page gave me an easier-to-grok picture of what it does,
purely based on seeing what the doc headers are:
[https://docs.saltstack.com/en/latest/](https://docs.saltstack.com/en/latest/)

------
perlgeek
I use Ansible, mostly because it works pretty well for deployments (on
traditional, non-dockerized applications), and then I can just gradually put
more configuration under management.

So it's a very good tool to gradually get a legacy system under configuration
management and thus source control.

------
rootforce
My default tends to be Ansible because it is really versatile and lightweight
on the systems being managed. That versatility can bite you though because
it's easy to use it as a good solution and miss a great one. Also, heaven help
you if you need to make a change on 1000s of hosts quickly.

I also use (in order of frequency): Terraform, Invoke (sometimes there is no
substitute for a full programming language like Python), and Saltstack (1000s
of machines in a heterogeneous environment).

If I were going to deploy a new app on k8s today, I would probably use
something like
[https://github.com/fluxcd/flux](https://github.com/fluxcd/flux).

I haven't really had a pleasant time with the tooling around serverless
ecosystem yet once you get beyond hello worlds and canned code examples.

~~~
yjftsjthsd-h
> Also, heaven help you if you need to make a change on 1000s of hosts
> quickly.

Why? I would have seen that as Ansible's strong point.

~~~
luto
It gets terribly slow and eats up literally tens of gigabytes of RAM.
Extensions like mitogen can help, though.

[https://mitogen.networkgenomics.com/ansible_detailed.html](https://mitogen.networkgenomics.com/ansible_detailed.html)
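Per mitogen's docs, enabling it is roughly a two-line ansible.cfg change. The strategy_plugins path below is an assumption (it depends on where mitogen is installed); here we only generate the file:

```shell
# Write an ansible.cfg that switches Ansible to mitogen's strategy plugin.
cat > /tmp/ansible.cfg <<'EOF'
[defaults]
strategy_plugins = /usr/local/lib/mitogen/ansible_mitogen/plugins/strategy
strategy = mitogen_linear
EOF
cat /tmp/ansible.cfg
```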

~~~
yjftsjthsd-h
Re: performance: That's fair. I didn't realize it scaled _that_ badly.

Re: mitogen: Thanks! I saw that once, a long time ago, but couldn't find it
again. I'll have to try it; vanilla ansible is fine for me so far, but I'm
hardly going to ignore a speed boost that looks basically free to implement.

------
witcher
I might be a fanboy of type safety and quick feedback loops, but I cannot
imagine a better configuration management system than just straight
configuration as code, e.g. in Go:
[https://github.com/bwplotka/mimic](https://github.com/bwplotka/mimic)

I really don't see why so many weird, unreadable languages like jsonnet or CUE
were created when there is already a type-safe, script-like language (Go
compiles in milliseconds, and there is even a go run command) with full-fledged
IDE autocompletion support, abstractions and templating capabilities, mature
dependency management and much more. Please tell me why we are inventing
thousands of weird things when we already have tools that help with
configuration! (:

~~~
beders
I agree. I wish we could just use EDN and Clojure, but your DevOps guy is not
writing Go or Clojure code.

They are also not doing code reviews to enforce security policies.

If you have DevOps guys who are also software developers, more power to you,
but if I approach my DevOps team with:

"Hey, just code your scripts in this Turing-complete language", they will ask
me "what's your username again?" BOFH-style ;)

~~~
smw
Holy gods yes! Please let me use a real programming language instead of an
unholy mixture of yaml and jinja. Clojure would be such a dream!

~~~
fulafel
Check out Spire,

[https://github.com/epiccastle/spire](https://github.com/epiccastle/spire)

------
aganame
Hashicorp tools are quite solid, and give you a lot for free. Ansible can
automate host-level changes in places where hashicorp cannot reach. There
shouldn't be many such places.

Alternatively, if you have the option of choosing the whole stack, Nix/NixOS
and their deployment tools.

I would recommend staying away from large systems like k8s.

------
maximilianburke
Here's what we're using which I'm pretty happy with:

0\. Self-hosted Gitlab and Gitlab CI.

1\. Chef. I'd hardly mention it because its use is so minimal, but we have it
set up for our base images for the nitpicky stuff like connecting to LDAP/AD.

2\. Terraform for setting up base resources (network, storage, allocating
infrastructure VMs for Grafana).

3\. Kubernetes. We use a bare minimum of manually maintained configuration
files; basically only for the long-lived services hosted in cluster plus the
resources they need (ie: databases + persistent volumes), ACL configuration.

4\. Spinnaker for managing deployments into Kubernetes. It really simplifies a
lot of the day-to-day headaches; we have it poll our Gitlab container
repository and deploy automatically when new containers are available. Works
tremendously well and is super responsive.

------
kalium_xyz
Nix (nixos, nixops) is worth looking into if you want a full solution and can
dedicate the time and energy.

~~~
danieldk
Also Morph, which is like NixOps, but stateless:

[https://github.com/DBCDK/morph](https://github.com/DBCDK/morph)

~~~
VoiceOfWisdom
Morph is lovely because it ends up being a very thin layer over the existing
Nix toolkit. All it does is deploy your NixOS config to a remote machine.

------
tilolebo
We use Ansible with Packer to create immutable OS images for VMs.

Or Dockerfile/compose for container images.

Cloud resources are managed by Terraform/Terragrunt.

~~~
mikepurvis
I think this is the ideal scenario for Ansible— one-time configuration of
throwaway environments, basically as a more hygienic and structured alternative
to shell scripts.

My experience trying to manage longer lived systems like robot computers over
time with Ansible has been that it quickly becomes a nightmare as your
playbook grows cruft to try to account for the various states the target may
be coming from.

~~~
aaronkaplan
Could you say more about why ansible is better than shell scripts for one-time
configuration? In my mind, ansible's big advantage over shell scripts is that
it has good support for making incremental changes to the configuration of
existing resources. In a situation like packer, where the configuration script
only gets run once, I prefer the conciseness of a shell script.

~~~
mikepurvis
I see the incremental piece as a dev-time bonus rather than something to try
to leverage much in production— it lets you iterate more quickly against an
already-there target, but that target is still basically pristine in that any
accumulated state is well understood. But that's very much not the case if
you're trying to do an Ansible-driven incremental change against a machine
that was deployed weeks or months earlier.

Even in the run-once case, though, I think there's a benefit to Ansible's
role-based approach to modularization. And again for the dev scenario, it's
much easier to run only portions of a playbook than it is to run portions of a
shell script.

And finally, the diagnostics and overall failure story are obviously way
better for Ansible, too.

Now, all this said, I do still go back and forth. For example, literally right
now in another window I'm working on a small wrapper that prepares clean
environments to build patched Ubuntu kernels in— and it's all just
debootstrap, systemd-nspawn, and a bunch of shell script glue.

------
Ixiaus
Dhall: [https://dhall-lang.org/](https://dhall-lang.org/)

~~~
carapace
I haven't used either (yet) but Dhall or Cue lang should be on your list of
candidates IMO.

[https://cuelang.org/](https://cuelang.org/)

(To me things like puppet or ansible seem like thin layers over shell and ssh,
whereas both Dhall and Cue seem to innovate in ways that are more, uh, _je ne
sais quoi_ ;-) YMMV)

~~~
verdverm
Just started using Cue, it is fantastic. It was built for this problem

------
uranium235
You can never go wrong with bash. You should not put secrets in
169.254.169.254 instance metadata, and you should not have IAM profiles with
overreaching privileges. For any IAM profile you use (or whatever equivalent
you use on Azure or GCP), always consider what somebody can do with it if they
get access to it.
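The reason metadata is a poor secret store: anything running on the box can read it with one unauthenticated request. The AWS paths below are printed as strings only, and the role name is hypothetical; nothing is fetched here.

```shell
# Build (but do not run) the requests any process on an EC2 instance could
# make against the metadata service.
MD=http://169.254.169.254/latest/meta-data
echo "curl -s ${MD}/iam/security-credentials/"          # lists attached role names
echo "curl -s ${MD}/iam/security-credentials/my-role"   # dumps temporary keys
```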

~~~
uranium235
Probably also just straight-up Docker and docker-compose is another good idea,
and Terraform and possibly HashiCorp Vault are real high on the list, too.
Ansible and Chef and Puppet are all pretty esoteric; I thought Chef was great
till I just got good with bash and GNU parallel.

------
ratiolat
Salt, because it's declarative and runs on Linux, Windows and macOS.

------
aprdm
I have been using Ansible for over four years now; my current use case has
around 1k VMs and a handful of bare-metal machines in a couple of different
datacenters running 100s of services.

No orchestration either, FWIW; we usually have ansible configuring Docker to
run and pulling the images...

As for the future I have been meaning to explore Terraform and some
Orchestration platforms (Nomad).

------
polcia
I would go with Ansible for side projects/smaller tasks, and use Puppet at
large.

~~~
ianai
Any reasons for those choices?

~~~
polcia
Ansible is just extremely easy to begin with, and comfortable to use since it
is an agentless solution using SSH. As for Puppet, well, it largely depends on
your team. Is it a devops one or a strictly dev one? Puppet seems to be the
perfect balance for us (devops mostly, but devs can touch it with confidence
too).

------
geofft
Shameless plug for a thing I maintain, which is in the config management space
but a little bit different from the usual tools:
[https://github.com/sipb/config-package-dev#config-package-de...](https://github.com/sipb/config-package-dev#config-package-dev)

config-package-dev is a tool for building site-specific Debian packages that
override the config files in other Debian packages. It's useful when you have
machines that are easy to reimage / you have some image-based infrastructure,
but you do want to do local development too, since it integrates with the dpkg
database properly and prevents upgraded distro packages from clobbering your
config.

My current team uses it - and started using it before I joined the company (I
didn't know we were using it when I joined, and they didn't know I was
applying, I discovered this after starting on another team and eventually
moved to this team). I take that as a sign that it's objectively useful and
I'm not biased :) We also use some amount of CFEngine, and we're generally
shifting towards config-package-dev for sitewide configuration / things that
apply to a group of machines (e.g. "all developer VMs") and CFEngine or
Ansible for machine-specific configuration. Our infrastructure is large but
not quite FAANG-scale, and includes a mix of bare metal, private cloud and
self-run Kubernetes, and public cloud.

I've previously used it for

\- configuring Kerberos, AFS, email, LDAP, etc. for a university, both for
university-run computer labs where we owned the machines and could reimage
them easily and for personal machines that we _didn't_ want to sysadmin and
only wanted to install some defaults

\- building an Ubuntu-based appliance where we shipped all updates to
customers as image-based updates (a la CrOS or Bottlerocket) but we'd tinker
with in-place changes and upgrades on our test machines to keep the
edit/deploy/test cycle fast

~~~
asguy
Thanks for posting this. I’ve rolled my own version of this in the past and
was very happy with the end results.

------
smotti
Ansible for dev boxes or smaller deployments. For large-scale deployments
CFEngine3. When deployed within a cloud environment one doesn't even need a
master node for CFE3 but the agents can just pull the latest config state from
some object storage.

------
cfgmaster
If you want massive parallel remote script execution, nothing beats gnu
parallel or xargs + "ssh user@host bash < yourscript.sh".
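The fan-out pattern, demonstrated locally with echo standing in for the ssh call (host names and script name are invented):

```shell
# Run one command per host in parallel; against real hosts, swap the echo
# for:  ssh "user@{}" bash < yourscript.sh
printf 'host1\nhost2\nhost3\n' > /tmp/hosts.txt
xargs -I {} -P 4 echo "ran yourscript.sh on {}" < /tmp/hosts.txt | sort
# → ran yourscript.sh on host1
# → ran yourscript.sh on host2
# → ran yourscript.sh on host3
```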

All of the configuration management tools (Ansible, Puppet, Chef, Salt, etc.)
are bloated.

We already have a perfectly fine shell. Why do we need a crappy, ugly DSL or
weird YAML?

These days, newbies write Ansible playbooks without even basic knowledge of
the Unix shell and its commands. What the hell?

I like ssh + pure posix shell approach like

Show HN: Posixcube, a shell script automation framework alternative to Ansible
[https://news.ycombinator.com/item?id=13378852](https://news.ycombinator.com/item?id=13378852)

------
jjmiv
I typically use terraform and ansible. tf creates/manages the infrastructure
and then ansible completes any configuration.

~~~
whatsmyusername
This is the approach we take. We don't track state or do continuous config
management either, as we're all in on cattle > pets (and we don't typically
have the time to maintain our Terraform properly enough to do anything but cut
new environments). Something gets sick? Shoot it and stand up another one.

------
madhadron
Funnily, I wrote my take on this not too long back:

[http://madhadron.com/posts/choosing_your_base_stack.html](http://madhadron.com/posts/choosing_your_base_stack.html)

Don't be distracted by FAANG scale. It's not relevant to most software, and
FAANG choices are usually dictated by what they started using and then spent
lots of engineering time making work.

My suggestion is to figure out how you will manage your database server and
monitoring for it. If you can do that, almost everything else can fall into
line as needed.

~~~
mleonhard
Why did you leave out DigitalOcean and Terraform?

~~~
madhadron
Terraform isn't equivalent to Puppet, Chef, or Salt. It's a tool for
specifying cloud deployments, not a configuration management system.

DigitalOcean might be fine. So is Arch Linux. But if someone just wants to get
on with what they're interested in with a minimum of fuss over time, it
wouldn't be the right recommendation.

------
e12e
I've prototyped ansible for rolling out ssl certs to a handful of
unfortunately rather heterogeneous Linux boxes - and it worked pretty well for
that.

I still think there's too much setup to get started - but I'm somewhat
convinced ansible does a better job than a bunch of bespoke shell would,
partly because ansible comes with some "primitives"/concepts such as "make
sure this version of this file is in this location on that server" - which is
quick to get wrong across heterogeneous distributions.

We're moving towards managed kubernetes (for applications currently largely
deployed with Docker and docker-compose on individual vms).

I do think the "make an appliance;run an appliance;replace the appliance" life
cycle makes a lot of sense - I'm not sure if k8s does yet.

I think we could be quite happy on a docker swarm style setup - but apparently
everything but k8s is being killed or at least left for dead by various
upstream.

And k8s might be expensive to run in the cloud (a VM per pod?) - but it comes
with abstractions we (everyone) need.

Trying to offload to SaaS that which makes sense as SaaS - primarily managed
db (we're trying out ElephantSQL) - and some file storage (100s-of-MB-large
PDF files).

For bespoke servers we lean a bit on etckeeper, in order to at least keep
track of changes. If we were to invest in something beyond k8s (it's such a
big hammer that one becomes a bit reluctant to put it down once picked up..)
I'd probably look at GNU Guix.

------
chrisgoman
Fabric [https://www.fabfile.org/](https://www.fabfile.org/) (just one step
above shell scripts, using Python). We're on 1.x, as the 2.x stuff is still
missing things. The key is having a structure almost like Ansible's, where you
kind of have "playbooks" and "roles" (I had this structure going before
Ansible)... I probably have to move off of this soon, though.

~~~
nepthar
Could you please explain why you think you'll have to move out of it soon?

~~~
chrisgoman
It works for smaller teams and a smaller # of hosts. I would say it would
start getting harder with 5-6 people and > 100 hosts. But for small stuff, it
is the most awesome thing in the world. I had a structure I used a long time
ago ([https://github.com/chrisgo/fabric-example](https://github.com/chrisgo/fabric-example))
but have broken it up differently in the last 2 years (it looks more like
Ansible now).

... and peer pressure (which is probably not a good reason)

------
eyberg
I use OPS [https://ops.city](https://ops.city) which uses the nanos unikernel
[https://github.com/nanovms/nanos](https://github.com/nanovms/nanos) and since
I work on it would appreciate any suggestions/comments/etc. on how to make it
better.

------
whatsmyusername
I'll tell you the one tool I DON'T use. Cloudformation. I've touched it a
grand total of once and it burned me so hard I set a company policy to never
use it again.

It's like terraform, except you can't review things for mistakes until it's
already in the process of nuking something. Which is terrible when you're
inheriting an environment.

~~~
emills
Isn't that what changesets are for?

~~~
gonzo41
And a set of environments along the lines of at least, Dev, Test, preview,
production.

------
nikivi
I enjoy using mage
([https://github.com/magefile/mage](https://github.com/magefile/mage)). I like
having a full language at my disposal for configuring things rather than yaml
or json or whatever else.

~~~
whatsmyusername
I like Chef for similar reasons. It's just Ruby code.

------
rhizome31
I operate a couple of Elixir apps and so far a simple Makefile with a couple
of shell scripts has been enough. This simplicity is due to the fact that the
only external dependency is a database server, everything else (language
runtime, web server, caching, job scheduling, etc.) is baked into the Elixir
release. One unfortunate annoyance though is that Elixir releases are not
portable and can't be cross-compiled (e.g. building on latest Ubuntu and
deploying to Debian stable won't work) so we have to build them in a container
matching the target OS version. So to be really honest I should mention that
Docker is also part of our deployment stack, although we don't run it on
production hosts.

~~~
skrebbel
How do you handle multiple servers? Eg for fallback, vertical scaling,
whatever

------
skinney6
Easy and flexible, but not super fast (SSH): Ansible. Still pretty easy, and
very fast (ZeroMQ): SaltStack.

------
oneplane
Terraform for everything 'outside' your runtime (VM, container), SaltStack for
everything 'inside' (VMs and containers) and for appliances (where Terraform
has no provider available) as well.

------
slayerjain
I think we've developed multiple layers in our infrastructure (Cloud Infra -
AWS, GCP.., Paas - Kubernetes, ECS.., Service mesh - Istio, linkerd..,
application containers..). So it depends on how many layers you have and how
you want to manage a particular layer. Companies at `any` scale can get away
with just using Google App Engine (Snap) or have 5+ layers in their
infrastructure.

I find Jenkins X really interesting for my applications. It seems to solve a
lot of issues related to CI/CD and automation in Kubernetes. However, it still
lacks multi-cluster support.

~~~
ejcb
Hey, there! I'm a product manager working on Jenkins X. We are at work right
now on multi-cluster support, actually.

I'd love to talk to you about it in more detail and get you involved in the
experiments around it - feel free to email me at ejones@cloudbees.com if you'd
like to be involved or chat more about it.

------
chousuke
I'm pretty happy using both Puppet and ansible. I use Puppet for configuring
hosts and rolling out configuration changes (because immutable infrastructure
isn't a thing you can just _do_; there's overhead and it does not fit all
problems) and ansible for orchestrating actions such as upgrades. They work
well together.

I very much dislike ansible's YAML-based language and would hate to use it for
configuration management beyond tiny systems, but it's pretty decent as a
replacement for clusterssh and custom scripts.

~~~
notyourday
I'm using Puppet for everything, including nearly immutable infrastructure (if
you can't mount your disks read-only and run that way, you don't have
immutable infrastructure).

Puppet maintains the base image with the core system.

Special systems are recreated by applying system specific classes to a base
image.

Application software is installed via packages with git commit-ids being
versions.

Nothing is upgraded; rather, new instances are rolled out and the old
instances are destroyed.

This also ensures that we always know that we can recreate our entire
infrastructure because we do that for rapidly changing systems several times a
day and for all systems at least monthly.

This makes our operational workflow match the disaster recovery workflow,
which is a godsend.

------
apple4ever
Ansible Ansible Ansible for me!

I’ve tried Puppet and SaltStack, and I constantly find they are harder and
more complex than Ansible. I can get something going in Ansible in short
order.

Ansible really is my hammer.

------
SirensOfTitan
We use terraform to describe cloud infrastructure, check all k8s configmaps
and secrets into source control (using sops to securely store secrets in git).

~~~
xref
Curious about why you're using SOPS [1] instead of say, hashicorp vault or
AWS/GCP's integrated keystores or git-crypt, etc?

[1] [https://github.com/mozilla/sops](https://github.com/mozilla/sops)

~~~
SirensOfTitan
I found Vault a pain to set up and maintain. I think security solutions should
be as simple as possible so people use them and understand them.

What I like about SOPS is that we still leverage the AWS keystore to store our
master key, but we store encrypted secrets in git. This is helpful as we have
a history of rotations (great for rollbacks, audits, etc.). Additionally, SOPS
doesn't encrypt YAML keys, so one can tell what a secret is but not its
value.
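To illustrate (a truncated, hypothetical example; the ARN and ciphertext are made up), a SOPS-encrypted file keeps keys readable while values become ciphertext:

```yaml
# secrets.enc.yaml after `sops -e` with an AWS KMS master key (values shortened)
db_password: ENC[AES256_GCM,data:4Yx9...,iv:K2c...,tag:Qm9...,type:str]
api_token: ENC[AES256_GCM,data:j8Pa...,iv:Trf...,tag:W1z...,type:str]
sops:
    kms:
        - arn: arn:aws:kms:us-east-1:111122223333:key/example-key-id
    version: 3.5.0
    # (remaining sops metadata block truncated)
```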

------
antoncohen
I won't talk much about FAANG scale, because that is hyper specialized.

A small startup shouldn't use any configuration management (assuming
configuration management means software like Puppet, Chef, Salt, and Ansible).
That is because small startups shouldn't be running anything on VMs (or bare
metal). There are so many fully managed solutions out there. There is no
reason to be running on VMs, SSHing to servers, etc. App Engine, Heroku, GKE,
Cloud Run, whatever.

Once you get to the point where you need to run VMs (or bare metal), there are
many options. A lot of systems are going to a more image + container based
solution. Think something like Container-Optimized OS[1] or Bottlerocket[2],
where most of the file system is read-only, it is updated by swapping images
(no package updates), and everything runs in containers.

If you are actually interested in config management, I'll give my opinions,
and a bit of history. I've used all four of the current major config
management systems (Puppet, Chef, Salt, and Ansible).

Puppet was the first of the bunch; it had its issues, but it was better than
previous config management systems. Twitter was one of the first big tech
companies to use Puppet, and AFAIK they still do.

Chef was next; it was created by people who did Puppet consulting for a living.
It follows a very similar model to Puppet, and solves most of the problems
with Puppet, while introducing some problems of its own (mainly complexity in
getting started). In my opinion Chef is a clear win over Puppet, and I don't
think there is a good reason to pick Puppet anymore. One of the biggest
advantages is that the config language is an actual programming language
(Ruby). All the other systems started with a language that was missing things
like loops, and they have slowly grafted on programming language features. It
is so much nicer to use an actual programming language. Facebook is a huge
Chef user.

Salt was next. It was created by someone who wanted to run commands on a bunch
of servers. It grew into a configuration management system. The underlying
architecture of Salt is very nice, it is basically a bunch of nodes
communicating over a message bus. Salt has different "renderers"[3], which are
the language you write the config in, including ones that use a real
programming language (Python). I'll come back to Salt in a minute.

Ansible... it is very popular. This is going to sound harsh, but I'm just
going to say it. I think it is popular with people who don't know how to use
configuration management systems. You know how the Flask framework started as
an April Fool's joke[4], where the author created something with what he
thought were obviously bad ideas, but people liked some of them? Ansible is so
obviously bad, at its core, that I actually went and read the first dozen Git
commits to see if there were any signs that it was an April Fool's joke.

There was a time a few years ago when Ansible's website said things like
"agentless", "masterless", "fast", "secure", "just YAML". They are all a joke.

Ansible isn't agentless. It has a network agent that you have to install and
configure (SSH). Yes, to do it correctly you have to actually configure SSH, a
user, keys, etc. It also has a runtime agent that you have to install
(Python). You have to install Python, and all the Python dependencies your
Ansible code needs. Then it has the actual code of the agent, which it copies
to the machine each time it runs, which is stupidly inefficient. It is
actually easier to install and configure the agents of all the other config
management systems than it is to properly install, configure, and secure
Ansible's agent(s).

Masterless isn't a good thing, and a proper Ansible setup wouldn't be
masterless. The way Ansible is designed is that developers run the Ansible
code from their laptops. That means anyone making code changes needs to be
able to SSH to every single server in production, with root permissions. It
also risks them running code that hasn't been committed to
Git or approved. Any reasonable Ansible setup will have a server from which it
runs, Tower, a CI system, etc.

Fast. Ha! I benchmarked it against Salt: I wrote the same code in both,
managing the exact same things, using local execution so Ansible wouldn't have
an SSH disadvantage. Ansible was 9 times slower for a run with no changes
(which is important because 99.9% of runs have no or few changes). It is even
slower in real life. Why is it so slow? Well, SSH is part of it. SSH is
wonderful, but it isn't a high performance RPC system. But an even bigger part
of the slowness is the insane code execution. You'd think that when you use
the `package` or `apt` modules to ensure a package is installed, that it would
internally call some `package.installed` function/method. And that the
arguments you pass are passed to the function. That is what all the other
configuration management systems do. But not Ansible. No, it execs a script,
passing the arguments as args to the script. That means every time you want to
ensure a package is still installed (it is, you just want to make sure it is),
Ansible execs a whole new Python VM to run the "function". It is incredibly
inefficient.

Secure. Having a network that allows anyone to SSH to any machine in
production and get root isn't the first step I'd take in making servers
secure.

It isn't just YAML. It is a programming language that happens to sort of look
like YAML. It has its own loop and variable syntax, _in YAML_. Then it has
Jinja templating on top of that. "Just YAML" isn't a feature. To do config
management correctly you need actual programming language features, so use an
actual programming language.
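The loop-and-variable-syntax-in-YAML style being described looks roughly like this (the task, variable, and fact names are hypothetical):

```yaml
# Ansible: control flow and variables expressed in YAML, with Jinja on top.
- name: ensure base packages are installed
  apt:
    name: "{{ item }}"
    state: present
  loop: "{{ base_packages }}"
  when: ansible_os_family == "Debian"
```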

If I had to pick one again, I'd pick Salt. Specifically I'd use Salt with
PyObjects[5] and PillarStack[6].
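With the PyObjects renderer, a Salt state file is ordinary Python instead of templated YAML. A minimal sketch (package and service names are arbitrary; `Pkg` and `Service` are injected by Salt at render time, so this only runs under a Salt master/minion):

```python
#!pyobjects
# Salt state via the PyObjects renderer: real Python control flow,
# no YAML/Jinja needed.
for pkg in ("nginx", "curl"):
    Pkg.installed(pkg)

Service.running("nginx", enable=True, require=Pkg("nginx"))
```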

But I'll reiterate, you shouldn't start with a config management system. Start
with something fully managed. Once you need a config management system, take
the time to do it correctly. Like it should be a six-week project, not a thing
you do in an hour. Chef and Salt will take more time to get started, but if
setup correctly they will be much better than any Ansible setup. If you don't
have the time or knowledge to do Chef or Salt correctly, then you don't have
the time or knowledge to manage VMs correctly, so don't.

[1] [https://cloud.google.com/container-optimized-
os](https://cloud.google.com/container-optimized-os)

[2]
[https://aws.amazon.com/bottlerocket/](https://aws.amazon.com/bottlerocket/)

[3]
[https://docs.saltstack.com/en/latest/ref/renderers/](https://docs.saltstack.com/en/latest/ref/renderers/)

[4]
[https://en.wikipedia.org/wiki/Flask_(web_framework)#History](https://en.wikipedia.org/wiki/Flask_\(web_framework\)#History)

[5]
[https://docs.saltstack.com/en/latest/ref/renderers/all/salt....](https://docs.saltstack.com/en/latest/ref/renderers/all/salt.renderers.pyobjects.html#module-
salt.renderers.pyobjects)

[6]
[https://docs.saltstack.com/en/master/ref/pillar/all/salt.pil...](https://docs.saltstack.com/en/master/ref/pillar/all/salt.pillar.stack.html)

~~~
smw
> One of the biggest advantages is that the config language is an actual
> programming language (Ruby). All the other systems started with language
> that was missing things like loops, and they have slowly grafted on
> programming language features. It is so much nicer to use an actual
> programming language.

This is the most important thing I've learned from using all of the mentioned
options, as well as older options like cfengine. Please make it easy to work
in an actual programming language instead of an unholy mix of YAML and Jinja!
I think Ruby excels for this, as well as Clojure or a Scheme, because it's so
easy to write "internal" DSLs.

------
dsr_
If you already know and/or use Ruby, use Chef.

It is silly to ask "what should be used at FAANG scale", because either you
are working at a FAANG and you are using what they use, or you are very
unlikely to ever be at that scale -- and somewhere along the journey to
getting there, you will either find or write the system that you need.

~~~
nickthemagicman
Some FAANGs I've heard about roll all their own config management tools.

~~~
cinquemb
>
> [https://news.ycombinator.com/item?id=18375662](https://news.ycombinator.com/item?id=18375662)

;)

~~~
nickthemagicman
Odd reply to my statement that FAANGs roll their own config management tools.
Are you smoking a lot of weed during this quarantine? You can google
LinkedIn's version of Docker, 'locker', Facebook's Hydra, etc. A lot of these
big companies roll their own tools because consumer tools don't scale well
enough. LinkedIn rolled Kafka for just this reason as well.

Also, odd and a little creepy that you're stalking my comments but I hope you
enjoyed my brain droppings. ;)

------
canterburry
For anyone here who isn't yet using an end-to-end setup like Terraform,
Ansible, Puppet, etc. and has more basic needs around managing environment
variables and application properties, I highly recommend
[https://configrd.io](https://configrd.io).

------
gentleman11
I used to use Chef, but I really didn’t like it. For small projects now, I
just use a set of shell scripts, where each installs and/or configures one
thing. Pair it with a Phoenix server pattern. It has treated me very well the
last two years

------
mmahut
What about Nix?

------
wickedOne
puppet is pretty good in my experience

------
nunez
For most teams: Docker or Ansible all the things.

For teams that have a large IaaS footprint: Chef (agentless actually adds
complexity in this environment).

------
bovermyer
Ansible where possible, Chef when I have to (for legacy reasons, usually), and
Terraform/Docker/Packer when given the option.

------
den-is
I've been working with both Ansible and Puppet on a daily basis for the last 6
years.

Ansible for:

- I absolutely love and adore Ansible

- extremely easy and much, much more pleasant to read. Sometimes Ansible feels
like poetry to me.

- ad-hoc sysadministration - I do not mean the "ansible" command, but the
actual style of work when you need to do something right here and right now.

- prototyping, dev and staging environment setup, experiments.

Puppet for polished production. Puppet has a robust and stable ecosystem and
infrastructure. It has been a client-server model from the beginning. It is
easier to create a library of all your Puppet modules and put it into
production. It has Hiera for central config values and secrets management. At
the same time, I hate Puppet's resource relations. Puppet's architecture feels
like something developed in 1991: an ugly monster monolith, and extremely
heavy.

Terraform for actual low-level infrastructure management. And I don't like to
put the whole high-level host configuration into IaC! IaC has minimal host
configuration capabilities: set the hostname, set the IP, register with Puppet
or call Ansible. Only a few lines in user-data or a bash script on boot, which
then calls the actual configuration management!

GitLab CI - switched from Jenkins. Concourse CI looks extremely interesting!
Also reviewing some GitOps frameworks. Kubernetes: bare metal runs a self-made
Puppet-based pure k8s; also kops and EKS for AWS. Applications in k8s are
managed via Helm.

Compared to Puppet, Ansible is less enterprisey; it is more like a hipster
tool. I would like to replace Puppet with Ansible. But maybe I need help from
all of YOU who have voted for Ansible. How do you achieve Puppet's level of
management with Ansible? How do you achieve a client-server setup with Ansible
(without using Tower!) - somehow I do not see lots of people using
ansible-pull. Do you create a cronjob with ansible-pull on node boot? :D Or is
your whole Ansible usage limited to running ansible-playbook from your console
manually? OK, maybe you sometimes put it in the last action of your CI/CD
pipeline ;) What about node classification and review? Central config value
management for everything?

I use HashiCorp Vault and lots of other things too. Some questions are
rhetorical. I've just expressed my mistrust in Ansible, which doesn't feel
complete. :(

How do you manage a fleet of 1000, 500, or even 200 hosts with Ansible? When
after provisioning you need to review your fleet, count groups, list groups,
check states. Ah, you want to suggest Consul for that role? :)

Kubernetes for the win. It will replace config management diversity. It gives
you node discovery, state review, and much much much more.

------
jujodi
We're using Terraform for infrastructure and Ansible for deployments with
great success.

------
jimbob45
Shameless self-plug: ChangeGear. We're the cheapest in class for medium-sized
companies.

------
thedance
At G scale you could never afford to run something as grossly wasteful as
chef. It would be cheaper to have several full-time engineers maintaining a
dedicated on-host config service daemon and associated tools, than it would be
for some ruby script to cron itself every 15 minutes.

~~~
antoncohen
That's strange, because the closest company to Google scale is Facebook, and
they actually use Chef in production[1][2][3] on hundreds of thousands of
servers.

[1]
[https://www.chef.io/customers/facebook/](https://www.chef.io/customers/facebook/)

[2] [https://engineering.fb.com/core-data/facebook-
configuration-...](https://engineering.fb.com/core-data/facebook-
configuration-management-community-and-open-source/)

[3] [https://github.com/facebook/chef-
cookbooks/](https://github.com/facebook/chef-cookbooks/)

~~~
thedance
100s of 1000s. Adorable.

------
pixiemaster
Salt + Serverless.

------
joe987654
I'm also really interested in what companies at scale are using. Anyone here
from FAANG?

~~~
cranekam
Facebook uses Chef to manage its base hosts [0] and its own container system
[1] to manage most workloads.

[0]
[https://www.chef.io/customers/facebook/](https://www.chef.io/customers/facebook/)
[1] [https://engineering.fb.com/data-center-
engineering/tupperwar...](https://engineering.fb.com/data-center-
engineering/tupperware/)

------
temptemptemp111
docker-compose + custom stuff + reduce all dependency on tooling

~~~
ehou
Here also docker-compose. Easy to separate tenants using same stack
(nginx+django+postgres+minio).

Question though: how do you manage the possible rebooting-containers loop
after a host reboot? I had to throw in more memory to prevent this, but it
feels like an (expensive|unnecessary) workaround. Has anyone figured out how
to let multiple containers start one after another (while not in one docker-
compose.yaml)?
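Within a single compose file, start ordering can be expressed with a healthcheck plus a `depends_on` condition (syntax from the v2.1+/Compose-spec format; the service names and images here are hypothetical). Across separate compose files I'm not aware of a built-in equivalent; a retry loop in the app's entrypoint is the usual workaround:

```yaml
# Sketch: let the app wait for a healthy database after a host reboot
# instead of crash-looping until the DB is up.
services:
  db:
    image: postgres:12
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      retries: 10
  web:
    image: my-django-app
    restart: unless-stopped
    depends_on:
      db:
        condition: service_healthy
```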

------
tayo42
Kind of surprised there isn't really a consistent answer for this. Just
skimming through these answers.

