
The Configuration Complexity Curse - UkiahSmith
https://blog.cedriccharly.com/post/20191109-the-configuration-complexity-curse/
======
DenisM
It occurred to me that configuration hell is a consequence of the
microservice-heavy approach: we have reduced the complexity of cross-component
interactions by compartmentalizing each component behind a hard boundary, and
now we’re paying the price for this free lunch by trying to put those things
back together and keep them that way.

Turns out the complexity didn’t go anywhere, it was just biding its time,
waiting for the right moment to strike back.

Damn you, entropy!

~~~
derefr
If one component needs to know about the use-case it’s serving for another
service in order to do its job, you didn’t factor your components correctly
(i.e. either those components really _are_ one component, or the component
boundary should just be on a lower or higher abstraction layer than you put
it.)

S3 is an example of a correctly-factored service. Object storage is a
non-leaky abstraction. Object storage doesn’t need to know what an object “is” or
why something is storing one. There’s no use-case-specific policy that can be
applied to only specific “types” of objects. There’s just objects and buckets,
and policies you can apply arbitrarily to any object or bucket because they’re
policies _about objects and buckets_ , rather than policies about some higher-
level thing.

If your component’s API doesn’t create a clear, obvious, “self-contained”
abstraction like object storage’s “objects and buckets” abstraction, then you
don’t really have an extractable component; you just have a monolith that
talks to itself using an extra layer of indirection.

~~~
falcolas
Question then. If S3’s abstraction is so good, why do they keep deliberately
poking holes in the abstraction, including but not limited to SQL access to
the contents of the individual files? Not to mention that each object is a
file that can be accessed from the internet, requiring a lot of thought about
how to properly protect, limit, and bill access to it.

I’m not trying to be snarky, I’m just pointing out that even an
on-the-surface-ideal abstraction is leaky as hell. If S3, storing objects, can’t keep its
abstraction clean, how can we reasonably be expected to keep our own
abstractions clean?

~~~
derefr
Those features may look like they're part of the S3 "service", but they're
actually separate services built on top of S3, grouped into the S3 API
namespace but hitting separate higher-abstraction-layer microservices that
themselves make S3 API requests to accomplish their task.

The fact that all of those services that seem to be "a part of" S3 can be
implemented on top of S3, without touching the code of S3 at all, _and_
without the higher-abstraction-layer code having to reimplement any of the
S3-layer logic to do its job, is precisely what makes S3 a well-factored
service.

------
fake-name
If you're thinking about implementing this or something like this, _please
stop_.

Unless you're writing in a compiled language, use the same language your
application is written in for your configuration. If you have a python
application, have a python file for the configuration. Same for ruby or
whatnot.

You don't need to use the entire language, but at least use the language's
lexer/parser (cf. json/javascript). That way, all existing tooling for the
language will work for the config files (ask me about how saltstack happily
breaks their API because you're not "supposed" to use it, despite the fact
that they have public docs for it). Additionally, people won't need to figure
out all the stupid corner cases in your weird language that has no uses
outside of a few projects.

Additionally, by making your configuration language an actual language, you
also simplify a lot of the system design, because the configuration can act
directly against your API. This means using your tool from other tools becomes
much more straightforward, because the only interface you actually need is the
API.

The _existence_ of "configuration language" is, itself, a mistake.
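
A minimal sketch of what config-in-the-host-language can look like in Python
(the loading convention and setting names here are illustrative, not anything
the commenter specified). Because the config file is just Python, every
linter, formatter and IDE already understands it, and errors surface through
the normal toolchain:

```python
import importlib.util

def load_config(path):
    """Load a plain Python file as configuration.

    The file might contain, e.g.:
        WORKERS = 4
        CACHE_SIZE = WORKERS * 64  # config can compute values
    """
    spec = importlib.util.spec_from_file_location("config", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # Keep only UPPERCASE names -- a common convention (Flask's
    # from_pyfile works much the same way).
    return {name: getattr(module, name)
            for name in dir(module) if name.isupper()}
```

The trade-off is that an executable config is as powerful as the language
itself, which cuts both ways.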

~~~
jldugger
I have a thousand people working on the product I support. Do I have your
permission to think about using this tool?

------
cwp
Every time an article on configuration languages gets posted here there's
always a chorus of pooh-poohing. This is a solved problem, just use
YAML/JSON/XML/whatever. Just use the same language as your app. Just use bash.

I don't understand why people hate on new languages so much. Learning a
language is pretty easy, at most a week or two of concentrated effort. On the
other hand, we put years of effort into software projects. If learning a new
language would make us 10% more productive with a sizeable chunk of the work,
that's a HUGE WIN.

Configuration at scale is something new and we don't have good ways of doing
it yet. Inventing new languages and tools to make it easier will be hugely
rewarding; WAY more than a 10% improvement. Some of the stuff we invent won't
be perfect. So what? Let a thousand projects bloom, we'll see what works and
what doesn't. That's how our profession advances.

~~~
archibaldJ
"at most a week or two of concentrated effort" - that's actually becoming
increasingly costly due to the ever-increasing opportunity cost attached to
basically everything. And it consumes mental resources too. Things may end up
becoming more unproductive. And then there is the risk of a new
language/toolkit/library/framework becoming under-maintained over the long
run.

I believe these are the reasons why everyone ends up using ad hoc solutions
and doing things their own way on a case-by-case basis, even though those
solutions don't generalise as well as a new
language/toolkit/library/framework, i.e. an implementation built to
specifications that solve a more general, abstract problem in a particular
way, such as the one described in this article.

Nonetheless I'm glad this article has received a good amount of upvotes to
appear on the front page of HN. I'm just wondering what the conversion rate is
like, i.e. what percentage of the people who clicked in will read the entire
article and learn about CUE in detail, what percentage of those will end up
using CUE, and how many of them will stick with CUE over the long run (say,
over the course of a year).

------
GrantZvolsky
I just switched my pet project to Cue. It did take me about 5 days to get
familiar with the language, but the result is much more elegant than the
original Kustomize configuration. I'm looking forward to using Cue in future
k8s projects and discovering new use cases for the language.

------
jacquesm
95% of you reading this and thinking oh, that's neat let me use this for the
company are going to waste time and resources. Why? Because the kind of
complexity the article tries to address is not reached by the vast majority of
the deployments out there. Typically the 'superstructure' is larger than the
thing it supports. As soon as that's the case and there is no credible path
to a (near!) future where you will need that superstructure you are better off
with the simplest configuration that you can get away with. It will be more
robust, easier to modify and easier to troubleshoot than any of these
abstraction layers.

Question one for anything that you aim at production should be: "Do I _really_
need this?" and only if the answer is a very clear yes and you're not just
trying to implement $COOLTECH because you are distracted by its shininess or
because 'Google does it too' then you should go ahead and implement.

My #1 technique for improving installations is to rip out unnecessary
superstructure which is obscuring why things are going wrong, and more often
than not is actually part of the problem. Works every time.

The same goes by the way for modeling your development process on whatever
Spotify does with your 6-person development team, and in fact for any other
piece of tech that you bring on board. Each of those pieces has a cost of
implementation, a cost of maintenance and a cost of cognitive overhead
associated with it. The best shops out there use the _least_ number of
technologies they can get away with.

~~~
wodenokoto
We don’t have a huge system, but already we are seeing bash scripts that sed
values in Kubernetes manifests or configuration stored as a dictionary inside
a python script that can generate large yaml files as part of a deploy
pipeline.
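
The dict-that-generates-manifests pattern described above can be sketched
like this (the app name, tags and replica counts are made up; JSON output is
used because every Kubernetes manifest is also valid JSON, which keeps the
sketch dependency-free):

```python
import json

# One source of truth for per-environment values, instead of sed-ing
# placeholder strings inside checked-in manifest templates.
ENVIRONMENTS = {
    "staging":    {"replicas": 1, "image_tag": "latest"},
    "production": {"replicas": 3, "image_tag": "v1.4.2"},
}

def deployment_manifest(env: str) -> dict:
    cfg = ENVIRONMENTS[env]
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"myapp-{env}"},
        "spec": {
            "replicas": cfg["replicas"],
            "template": {"spec": {"containers": [
                {"name": "myapp", "image": f"myapp:{cfg['image_tag']}"}
            ]}},
        },
    }

if __name__ == "__main__":
    # kubectl apply -f - accepts JSON as well as YAML.
    print(json.dumps(deployment_manifest("production"), indent=2))
```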

I think it is an inherent error in basically all orchestration tools
(Kubernetes, cloud build/formation, etc), that they don’t support scripting.

~~~
octopoc
For a long time I was frustrated by the lack of scripting tools for
orchestration. My ideal situation would be to write the orchestration /
deployment config using the same programming language the app is written in.
Eventually I found Pulumi, which supports a limited number of programming
languages but is basically what I was looking for, except I would like C# to
be officially supported.

Pulumi has a high-priority issue for deciding how they're going to support
arbitrary programming languages:
[https://github.com/pulumi/pulumi/issues/2430](https://github.com/pulumi/pulumi/issues/2430)

I'm watching this with great interest.

Edit: looks like it's not high priority anymore :/

~~~
webo
How has your experience been with Pulumi? I tried it back in its alpha days to
compare to terraform but I didn’t make much progress due to lack of
documentation and limited ecosystem.

~~~
ianpurton
We're using Pulumi for a production system and couldn't be happier.

The ability to template YAML in TypeScript and create infrastructure is mind
blowing, all of it checked by the compiler. Well, it's not so much templating
as using TypeScript's built-in JSON syntax.

Using VSCode we can refactor our infrastructure code, i.e. create functions
for sub-levels of our Yaml.

So we effectively combine our Kubernetes YAML and infrastructure. It's great.
Try it.

~~~
levi_b
Glad to hear you’re enjoying it! We just released a bunch of new k8s
content/features this week that you might want to check out. [1]

(I maintain the k8s provider at Pulumi)

[1] [https://www.pulumi.com/blog/crosswalk-kubernetes/](https://www.pulumi.com/blog/crosswalk-kubernetes/)

------
tuldia
Sometimes there is no problem to be solved, all these tools just complicate
everything.

Do this instead:

1. Use flat yaml files. No loops nor conditionals, no complexity.

2. One (single) yaml file templated by ansible just for secret/sensitive
stuff.

3. Done.

Boring is better and is easy to diff.

~~~
q3k
What if you are using already existing tooling that takes very verbose yaml
files? I've seen Concourse CI pipelines that push past 5k lines of YAML, where
50% internally is repeated 10-line blocks, and there is tons of repetition
across different YAML files.

~~~
tuldia
Duplication is far cheaper than the wrong abstraction.

I do prefer to have lots of dumb files than having to deal with the cognitive
load of a tool that in the end will generate lots of files anyways.

~~~
combatentropy
> Duplication is far cheaper than the wrong abstraction.

I'm going to chew on this. For me, Don't Repeat Yourself has been one of my
highest values. My former colleagues would copy and paste code everywhere. It
was a pain to make changes to, and I relished refactoring it.

But I also regret some of the libraries I wrote in my early years. They aren't
designed how I would today, but now several applications depend on them.

One thing I will say is that it's okay for your first draft to be ugly. It
helps to see all the duplication before you design the abstraction.

~~~
api
I would second the parent. DRY was a false God. I would say RCL: reduce
cognitive load.

Obviously if there is a manageable way to reduce repetition, take it, but I
would not add a lot of complexity for the sake of brevity. That's turning your
dev team into a data compression algorithm made of meat.

~~~
q3k
Cognitive load is also present from vastly duplicated code and config: having
to remember to update or take into consideration 20 other places in your
codebase any time you make a change.

~~~
api
Yes, both extremes can increase cognitive load. The ideal is simple,
effective, parsimonious abstractions, but that's much easier said than done.

I was just arguing that adding a lot of complexity to reduce duplication is an
exercise in diminishing returns pretty fast.

------
chuhnk
This post is really timely. We're going through a wave of innovation in how we
ship software, but in the process we're having to reason far more about the
infrastructure. I think we might be reaching the end of that phase and the
realisation that none of us really want to touch containers or kubernetes, we
just want it to fade into the background. Because at the end of the day
software development is still software development and not much has changed
there despite the underlying platforms being completely rewritten.

I'd argue that we might once again be on the cusp of true serverless but in a
way that might become ubiquitous. If we could unlock a shared platform like
GitHub but for running software we'd be in a much better place.

~~~
ubu7737
Urges won't translate into practice so easily. The abstractions of servers
won't be so easily reduced to stateless method calls. The design of a cloud-
based service is still so central that no abstraction can meaningfully hide
it at the moment.

------
rauhl
I’ll have to take a look at CUE; it might be worth using.

But I have to ask: isn’t there something simpler which can handle taking a
declarative specification and adding imperative behaviour to it? I’m writing —
as anyone who knows me might suspect — of S-expressions & Lisp.

They have an advantage over CUE in that one might well choose to write one’s
entire program in Lisp & S-expressions. It doesn’t look like CUE is intended
to be the whole-program language.

I remember that Tcl used to be commonly used for config-files-which-need-a-
bit-of-scripting, but while it is an awesome language (really!) one probably
doesn’t want to write an _entire_ program with it, but rather use it to stitch
functions written in C together.

~~~
frumiousirc
Jsonnet (and so CUE) has a lot of overlap with LISP.

I find its syntax more suited to constructing data structures than LISP's,
if only because "{}" and "[]" are more concise ways to indicate "object" and
"array" than an S-expression.

~~~
sooheon
EDN is a lispy data notation with exactly those conventions:
[https://github.com/edn-format/edn](https://github.com/edn-format/edn)

------
alkonaut
Don’t use yaml or json for (infrastructure) configuration. Use code.
Checking configuration files in with the source doesn’t make them
“configuration as code”. If you don’t want to roll your own tools, there are
tools like Pulumi for this.

~~~
falcolas
Yaml, and every other configuration language, has gotten complex enough in its
interpretation to be its own programming language. Helm charts, for example.
Turing complete templating over the top of Turing complete (or damned close if
not) k8s configuration objects.

We’re already programming in code, just by shoehorning one domain-specific
language into another domain.

------
overgard
I don't know if it's the best way, but the way I've been managing kubernetes
at work is with a custom typescript library to interface with kubernetes. That
library has two frontends, a website and a CLI. Whenever I add some new
feature to the cloud I prototype it in YAML first and then when I get it
working I integrate it into the program.

It works mostly pretty well. It's a lot more organized and powerful than
having a bunch of YAML. It does feel a bit "heavy" though. There's also parts
of the kubernetes api that are hard to deal with (log streaming for
instance..)

I don't put everything in the tool though. Anything that's a one-off I just
check in YAML for. But anything that you might do repeatedly I add to the CLI,
and the website is for more general non-ops use

~~~
fhsgajajdgsj
Do you know about Pulumi (pulumi.io)? Their approach is similar.

------
nojvek
Totally agree. K8s using yaml as a config format is a shit show. They should
have used good ol’ JSON, which has a much smaller surface area. It has proper
schema definitions and a bazillion languages interoperate with it.

Then we can use whatever tools we want to generate and customize as needed.

YAML is ugly and doesn’t really solve a problem.

Jsonnet, on the other hand, is a far more elegant templating language that
solves a real need: generating json/yaml files.

Please, please don’t use a text templating language; you’re in for a world of hurt.

~~~
hinkley
Years ago, someone showed me a python tool for build automation that was
written in an imperative style although it was declarative underneath. Like a
builder pattern. That looked pretty promising, but I have misremembered the
name and so can’t find it again.

I’d rather have something like that for infrastructure management.

~~~
datashaman
Perhaps you mean cog?
[https://nedbatchelder.com/code/cog/](https://nedbatchelder.com/code/cog/)

------
cbushko
I have lived the first 1/2 of the article for the last 1.5+ years and have
gone through that mental cycle. We've been moving our product from AWS to GCP.
We have a custom deployment tool based off of Capistrano, Chef and
CloudFormation to set up our stack in AWS. There are a couple of things that have
helped me and may help others who have to do a similar thing (or any setup for
that matter).

Tooling: I looked at them all: yaml, kustomize, helm, deployment manager,
etc., and in the end I went with Terraform. The reasoning was simple. We deploy
Infrastructure, not kubernetes services and terraform lets you do the entire
thing with one tool. It has the ability to do loops and clunky if-like
statements but if you want real programming power then you can create your own
terraform provider which is what we did. The pattern of how to create a
provider already exists so you are not re-inventing the wheel. We even add our
own services to our provider so that we have one tool for setup of GCP
resources -> Kubernetes resources -> our own service(s) resources. 1 tool, 1
flow, 1 statefile and the same pattern when working with it all.

Establish patterns: "My service is a multi-tenant service and should do X".
Don't allow people to get away with special cases when it comes to deploying
infrastructure. Create patterns and stick to them. As an example, one pattern
I have is that I have 4 variables that all of my terraform modules use:
project, region, cluster and tenant. A combination of any of those variables
is enough to create unique resources for everything you deploy... dns names,
storage buckets, databases, service accounts, namespaces, clusters, etc. Those
variables allow me to know where your git repos are, what GCP project your
deployment lives in, what kubernetes cluster you are on and what namespace you
are in. Patterns.
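
As a hedged illustration only (the concrete naming scheme below is my guess,
not the commenter's actual one), the four-variable pattern might look like:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DeployContext:
    """The four variables every module takes; a combination of them is
    enough to derive a unique name for each deployed resource."""
    project: str
    region: str
    cluster: str
    tenant: str

    def resource_name(self, kind: str) -> str:
        # Globally unique names, e.g. for buckets or service accounts.
        return "-".join([self.project, self.region, self.cluster,
                         self.tenant, kind])

    def namespace(self) -> str:
        # Namespaces only need to be unique within a cluster.
        return f"{self.tenant}-{self.project}"
```

Because every module derives names the same way, knowing the four values
tells you where any resource lives.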

Keep it simple: There will be pressure to have custom settings and provide as
much flexibility as possible. Try and avoid this. Set defaults for values and
try and reduce the number of configuration options available to others. In our
case, there were hundreds of environment variables being set on a
deploy and most of them were the same. I took the list, standardized the
environment variable names (DATABASE_USER, MYSQL_USER, MYSQL_USERNAME,
DB_USERNAME... come on, guys!), and deployed them as a secret in kubernetes so
that all of our running services can access them. Reduce complexity.

------
bogomipz
The author states:

>"kustomize would be the most well-known tool now that it is integrated
into kubectl. This seems to work well enough and could be feasible for simple
use cases. In comparison to data configuration languages though, this strategy
breaks down when configurations grow in complexity and scale. Template writers
lack abstraction and type validation that matters when things get complicated,
as the semantics are locked into an opaque tool and not exposed as language
features."

How exactly does this strategy break down? This sounds a bit hand-wavey to me.
Isn't Kustomize essentially patching? And isn't the type validation done by
the underlying API objects in the yaml that Kustomize is patching? Am I
missing something obvious or a more subtle point?

------
d_burfoot
Why do people look upon the hideous face of complexity and brittleness and
configuration hell and decide that the solution is to introduce yet ANOTHER
piece of technology? Don't they understand that this is the exact instinct
that led us to hell in the first place?

~~~
wpietri
I think it's a combination of two things. One is looking for minimum change,
which is honestly a pretty good urge. People like to keep doing mostly what
they're doing. Having too much of the opposite instinct leads to eternal
thrash.

The other, less good piece is people not being up to the task of going down to
the fundamentals and building back up from there. And not just the technical
fundamentals, but also those of system purpose, user needs, and how value gets
delivered.

Which honestly, I get as well. Having tried using Kubernetes, I'm unimpressed.
But it has such enormous momentum that even if I were sure I had a better
solution, I doubt I'd bother going my own way. Instead I'd just try to
mitigate the pain. Ideally I'd find a way that might lead people out of the
technological cul de sac they ended up in.

------
costrouc
There is a quite mature configuration language, Nix, which is behind nixpkgs,
NixOS, and many other projects. It is a functional, lazily evaluated
language, worth checking out for anyone here interested in abstractable
configuration languages.

~~~
pblkt
Nix is dynamically typed. This shows badly in its discoverability and
validation stories.

The ecosystem is fantastic, but it's hard to make the case to migrate an
existing configuration to it for the sake of using nix.

Dhall is another option in this space, BTW. It has other faults that score
points for CUE's existence, though.

~~~
cwp
I've been wondering if CUE's graph unification idea could be used as a type
system for Nix. It might be hard to square with the way nix does overlays and
overrides. It's a very common thing to take some existing derivation and
produce a tweaked version. That might not work with graph unification.

------
samsquire
I'm working on devops-pipeline to make complicated infrastructure easier to
understand and deploy: [https://devops-pipeline.com](https://devops-pipeline.com).
It is configured by a dot graph file.

------
Nihilartikel
After having to configure Druid clusters and their many many service
configuration files, I've started just immediately reaching for templating &
code-generators to build all of the disparate configuration artifacts the
moment that I feel annoyed by redundant information in multiple places.

I love the idea of Terraform, but find that the language only goes part way,
and is itself pretty idiosyncratic. It's pretty nice, though, if you move up
by another step of abstraction and write code to generate your HCL. If you get
lispy about it, then you can have infrastructure defined as data, generated by
code, that is also data...

------
yahyaheee
This is interesting and I feel like it’s somewhat on the right path but may
not go far enough. I think the right solution is actually a new protocol.

------
tannhaeuser
I really wish CUE all the best, but every senior dev can tell you that the
problem of too many config languages isn't solved by yet another config
language to rule them all (insert obligatory xkcd 924 reference here).

~~~
garaetjjte
I think you mean 927?

------
fhsgajajdgsj
This is a really good idea. My take on this was to use Prolog but it turned
out people did not like Prolog:
[https://github.com/davidk01/cwacop](https://github.com/davidk01/cwacop). The
project is dead at the moment, but I think being able to work with
infrastructure using a hybrid model of logic + imperative drivers is
fundamentally a good idea.

