
The architecture of declarative configuration management - zbentley
https://blog.nelhage.com/post/declarative-configuration-management/
======
gorgoiler
Building your own version of something is surely self-indulgent wheel
reinventing, but that’s what I’m currently doing with distributed
configuration management.

It’s certainly been helpful in terms of understanding the boundaries between
parts of the system, as this post also describes. The desire to auto configure
everything is strong — one day you’ll have a VLAN hard coded into the config,
but the next day you’ll be trying to programmatically distribute VLAN ids
based on function instead. The day after that, VLANs themselves are an artifact
generated from a higher level separation in your human readable config. What
was once a list of hosts with an attached VLAN id is now a group of hosts with
a declared function that _just happens_ to be programmatically assigned a VLAN
id, but only as an implementation detail.
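A minimal sketch of that last stage, assuming a made-up inventory where hosts declare only a function and the VLAN id is derived. The host names, function names, and the `VLAN_BASE` starting id are all hypothetical:

```python
# Hypothetical inventory: hosts declare a function, never a VLAN id.
HOSTS = {
    "web1": "frontend",
    "web2": "frontend",
    "db1": "database",
    "backup1": "storage",
}

VLAN_BASE = 100  # assumed starting id, purely illustrative


def assign_vlans(hosts, base=VLAN_BASE):
    """Derive one VLAN id per declared function.

    Functions are sorted so the same input always yields the same ids.
    """
    functions = sorted(set(hosts.values()))
    vlan_for_function = {fn: base + i for i, fn in enumerate(functions)}
    return {host: vlan_for_function[fn] for host, fn in hosts.items()}


vlans = assign_vlans(HOSTS)
for host, vlan in sorted(vlans.items()):
    print(f"{host}: VLAN {vlan}")
```

One caveat with this naive scheme: adding a new function can renumber existing VLANs, so a real engine would likely persist assignments rather than recompute them.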

The same happens with IP address management — your root configuration moves
closer and closer to being a document describing what you want to do, and less
about how to go about doing it (which is implemented in your custom
augmentations to the engine instead.)

When you can justify it as an exercise in understanding a system, and you have
time for it, building your own tool chain is incredibly rewarding.

------
paulddraper
> We could imagine resolving this tension if Terraform had two different
> convergence engines...The “create a new environment” engine, which always
> creates from scratch every resource it was given. This would excel at
> spinning up fresh environments as quickly as possible, since it would have
> to perform a minimum of introspection or logic and would just issue a series
> of “Create()” calls.

This just doesn't make sense; introspection usually allows you to apply changes
_more_ quickly. For example, it takes seconds to describe and update an
existing AWS ELB; it takes minutes to delete and create a new one.

If you really want to forgo analysis and reuse of existing infrastructure,
just do

    
    
    terraform destroy
    terraform apply
    

> Importantly, however, it by design will never issue a destructive operation,
> and will error out on changes that cannot be executed non-disruptively.

The notion of a "destructive operation" is not clear cut. Is it destructive to
remove a file from S3? To update a file in S3? To delete a tag on an S3
bucket? To update a tag on an S3 bucket?

You can just manage this with permissions; that way you can specify exactly
what is and isn't an allowable operation. In fact, this is best practice as it
protects against bugs or misuse of the tool. Since Terraform already defaults
to non-destructive, adding infrastructure-level permissions would cause it to
work exactly as described.

A better example of customizable convergence would be the lifecycle management
options Terraform already has, such as create_before_destroy which ensures the
new resource exists before the old one is deleted.

------
purpleidea
I'm working on something called mgmt:
[https://github.com/purpleidea/mgmt/](https://github.com/purpleidea/mgmt/)

It runs as a distributed system, and is reactive to events, both in the engine
and in the language (a FRP DSL) which allows you to build really fast, cool,
closed-loop systems.

Have a look!

------
bandrami
Seems odd to talk about declarative configuration management and not mention
Nix or Guix.

~~~
equalunique
I was going to comment the same thing. On the NixOS About Page[0], the first
main section is "Declarative system configuration model"

[0] [https://nixos.org/nixos/about.html](https://nixos.org/nixos/about.html)

------
di4na
It sounds like you want something closer to a prolog language in which you
could specify the rules for the engines to respect...

~~~
pjbk
Exactly, and structural and functional constraints over properties and rules.

I understand the need to reinvent the wheel, but most of these efforts feel to
me like customizations that most declarative languages can provide, albeit
possibly in a non-intuitive syntax.

~~~
di4na
Yeah. I think the syntax and the fit of the syntax with the domain matter.
Tooling too.

I want to spend time building something like that, but I doubt there is
really a market need/want.

------
ratiolat
Salt, for some reason, is not discussed in the article. It's declarative.

------
leg100
He's spot on about separating "configuration generation" from convergence.
There is no reason for the two to be the same system, the same tool. As he
says, Kubernetes is only concerned with the latter, whereas Puppet, Chef, and
Terraform conflate the two (insofar as it uses HCL).

And for all the talk of "declarative", there is no reason why the
configuration generation stage cannot be imperative, a la Pulumi. It is the
desired end state - the catalog that's being generated - that is declarative.
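As a rough illustration of that split (not Pulumi's API; every name here is hypothetical), imperative code with loops and conditionals can emit a catalog that is nothing but data:

```python
import json


def generate_catalog(environment, replicas):
    """Imperative generation step: returns a plain, declarative catalog."""
    resources = []
    for i in range(replicas):
        resources.append(
            {"type": "server", "name": f"{environment}-web-{i}", "size": "small"}
        )
    # Only prod gets a load balancer in this made-up example.
    if environment == "prod":
        resources.append(
            {
                "type": "load_balancer",
                "name": "prod-lb",
                "targets": [r["name"] for r in resources],
            }
        )
    return {"environment": environment, "resources": resources}


catalog = generate_catalog("prod", 2)
print(json.dumps(catalog, indent=2))
```

The convergence engine would only ever see the JSON, so it does not need to know or care how it was produced.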

~~~
_frkl
I mostly agree, with the caveat that in my experience, if the configuration
generation stage is entirely imperative it is harder to reason about it. That
might not be a problem for low-complexity setups, but can get quite important
(and bad) in some more involved cases.

~~~
NieDzejkob
I suppose that functional languages might be a good fit for this problem, then.
Nix and Guix come to mind.

~~~
_frkl
Yes, agreed. I don't think either Nix or Guix is there yet, in terms of usability
(not that most current alternatives are much better, mind). But I could see a
wrapping layer on top of either of them working quite well. It's difficult to
come up with abstractions for the kind of complexities we're dealing with
nowadays. I'm hopeful someone will eventually, though...

------
jcollins
"Pluggable convergence engines" are what we've built in Gyro[1] for this
very reason. We wanted to have more control over how changes are made in
production.

An example is doing blue/green deployments where you want to build a new
web/application layer, pause to validate it (or run some external validation),
then switch to that layer and delete the old layer. All while having the
ability to quickly roll back at any stage. In Gyro, we allow for this with
workflows[2].

There are many other areas we allow to be extended. The language itself can be
extended with directives[3]. In fact, some of the core features like loops[4]
and conditionals are just that, extensions.

It's also possible to implement the article's concept of "non-destructive prod"
by implementing a plugin that hooks into the convergence engine's (we call it
the diff engine) events and prevents deletions[5].

We envision folks using all these extension points to do creative things. For
example, it's possible to write a directive such as "@protect: true" that can
be applied to any resource and would prevent it from ever being destroyed
using the extension points described above.
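To show the general shape of the idea (this is a generic Python sketch, not Gyro's actual plugin API; `plan`, `enforce_protection`, and the `protect` flag are hypothetical), a hook can inspect the planned changes and refuse any delete that touches a protected resource:

```python
class ProtectedResourceError(Exception):
    """Raised when a plan would delete a resource marked as protected."""


def plan(current, desired):
    """Return (action, name) tuples that move `current` toward `desired`."""
    changes = []
    for name in desired:
        if name not in current:
            changes.append(("create", name))
        elif current[name] != desired[name]:
            changes.append(("update", name))
    for name in current:
        if name not in desired:
            changes.append(("delete", name))
    return changes


def enforce_protection(changes, current):
    """Hook run before execution: reject deletes of protected resources."""
    for action, name in changes:
        if action == "delete" and current[name].get("protect", False):
            raise ProtectedResourceError(f"refusing to delete protected resource {name!r}")
    return changes


current = {"db": {"protect": True}, "cache": {}}
desired = {"cache": {}}

try:
    enforce_protection(plan(current, desired), current)
    blocked = False
except ProtectedResourceError as err:
    blocked = True
    print(err)
```

Running the check before any change executes means a rejected plan leaves the environment untouched.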

[1] [https://github.com/perfectsense/gyro](https://github.com/perfectsense/gyro)
[2] [https://gyro.dev/guides/workflows](https://gyro.dev/guides/workflows)
[3] [https://gyro.dev/extending/directive/](https://gyro.dev/extending/directive/)
[4] [https://gyro.dev/guides/language/control-structures.html](https://gyro.dev/guides/language/control-structures.html)
[5] [https://github.com/perfectsense/gyro/blob/master/core/src/ma...](https://github.com/perfectsense/gyro/blob/master/core/src/main/java/gyro/core/diff/ChangeProcessor.java)

------
xmly
That is why immutable infra became popular. You can easily destroy and
rebuild the whole thing.

And for the prod env, what we are discussing sounds like update behavior to me.
In CloudFormation, you can choose between different update policies.

Compared with each cloud's provisioning engine (CloudFormation, gcloud
Deployment Manager, Azure Resource Manager), Terraform is lacking a lot of
features. So unless you are dealing with a private cloud, using the cloud's
default provisioning service is a no-brainer.

~~~
cosaquee
CloudFormation often lacks support for resources that are new or less
popular. Terraform is much better at this; it supports most resources from
the start, as far as I know.

~~~
xmly
In CloudFormation, you can define custom resource types with AWS Lambda. It
is like creating providers in Terraform.

Plus, CloudFormation is a free, managed service. You do not need to maintain
it, and if something goes wrong, you can yell at AWS. Unless you bought
Terraform Enterprise, it is still a pain to maintain another possible failure
point in your system.

------
billsmithaustin
My experience is that the production operations engine is hard to get right
because your target environment can drift from the desired configurations for
reasons that you did not anticipate.
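
A minimal sketch of what checking for that drift might look like, assuming the desired and observed states are both available as plain dictionaries (the resource and attribute names are made up):

```python
def detect_drift(desired, observed):
    """Return {resource: {attribute: (desired_value, observed_value)}}.

    Only attributes named in `desired` are compared; a resource that is
    absent from `observed` is reported under the "<missing>" key.
    """
    drift = {}
    for name, want in desired.items():
        have = observed.get(name)
        if have is None:
            drift[name] = {"<missing>": (want, None)}
            continue
        changed = {
            key: (value, have.get(key))
            for key, value in want.items()
            if have.get(key) != value
        }
        if changed:
            drift[name] = changed
    return drift


desired = {"web": {"size": "small", "port": 443}, "db": {"size": "large"}}
observed = {"web": {"size": "large", "port": 443}}  # someone resized web by hand

report = detect_drift(desired, observed)
print(report)
```

This only catches drift in attributes you thought to declare, which is part of why the unanticipated cases are the hard ones.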

