
Why are we templating YAML? - jaxxstorm
https://leebriggs.co.uk/blog/2019/02/07/why-are-we-templating-yaml.html
======
joeduffy
My belief is that we've been slowly building up to using general purpose
languages, one small step at a time, throughout the infrastructure as code,
DevOps, and SRE journeys this past 10 years. INI files, XML, JSON, and YAML
aren't sufficiently expressive -- lacking for loops, conditionals, variable
references, and any sort of abstraction -- so, of course, we add templates to
it. But as the author (IMHO rightfully) points out, we just end up with a
funky, poor approximation of a language.

I think this approach is a byproduct of thinking about infrastructure and
configuration -- and the cloud generally -- as an "afterthought," not a core
part of an application's infrastructure. Containers, Kubernetes, serverless,
and more hosted services all change this, and Chef, Puppet, and others laid
the groundwork to think differently about what the future looks like. More
developers today than ever before need to think about how to build and
configure cloud software.

We started the Pulumi project to solve this very problem, so I'm admittedly
biased, and I hope you forgive the plug -- I only mention it here because I
think it contributes to the discussion. Our approach is to simply use general
purpose languages like TypeScript, Python, and Go, while still having
infrastructure as code. An important thing to realize is that infrastructure
as code is based on the idea of a _goal state_. Using a full-blown language to
generate that goal state generally doesn't threaten the repeatability,
determinism, or robustness of the solution, provided you've got an engine
handling state management, diffing, resource CRUD, and so on. We've been able
to apply this universally across AWS, Azure, GCP, _and_ Kubernetes, often
mixing their configuration in the same program.
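Joe's point is that a general-purpose program only has to compute a goal state, and a separate engine handles the diffing and CRUD. Below is a minimal sketch of that split in Python, one of the languages he lists. It assumes a toy resource model in which names map to property dicts. `desired_state` and `plan` are hypothetical names chosen for illustration and are not Pulumi's API.

```python
from typing import Dict, List, Tuple

# Toy resource model (an assumption for illustration): name -> properties.
Resources = Dict[str, dict]


def desired_state(envs: List[str]) -> Resources:
    """The 'program' part: ordinary loops and conditionals compute data."""
    return {f"bucket-{env}": {"versioning": env == "prod"} for env in envs}


def plan(desired: Resources, actual: Resources) -> List[Tuple[str, str]]:
    """The 'engine' part: diff goal state against actual state into CRUD steps."""
    steps = []
    for name in sorted(desired):
        if name not in actual:
            steps.append(("create", name))
        elif desired[name] != actual[name]:
            steps.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            steps.append(("delete", name))
    return steps


if __name__ == "__main__":
    actual = {"bucket-dev": {"versioning": False}, "bucket-old": {"versioning": False}}
    for action, name in plan(desired_state(["dev", "prod"]), actual):
        print(action, name)  # create bucket-prod, then delete bucket-old
```

If you run the plan again after applying it, the result is empty, because the actual state now equals the goal state. That is the repeatability Joe describes: all the generality lives in `desired_state`, and determinism comes from diffing its output.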

Again, I'm biased and want to admit that; however, if you're sick of YAML, it's
definitely worth checking out. We'd love your feedback:

\- Project website: [https://pulumi.io/](https://pulumi.io/)

\- All open source on GitHub:
[https://github.com/pulumi/pulumi](https://github.com/pulumi/pulumi)

\- Example of abstractions:
[https://blog.pulumi.com/the-fastest-path-to-deploying-kubernetes-on-aws-with-eks-and-pulumi](https://blog.pulumi.com/the-fastest-path-to-deploying-kubernetes-on-aws-with-eks-and-pulumi)

\- Example of serverless as event handlers:
[https://blog.pulumi.com/lambdas-as-lambdas-the-magic-of-simple-serverless-functions](https://blog.pulumi.com/lambdas-as-lambdas-the-magic-of-simple-serverless-functions)

Pulumi may not be _the_ solution for everyone, but I'm fairly optimistic that
this is where we're all heading.

Joe

~~~
ff_
This is a great analysis, but it's missing a fundamental point: why do we have
a problem with these approximations of a programming language or just using a
programming language to template stuff?

Because your build then becomes an actual program (i.e. Turing complete) and
you have to refactor and maintain it! This is the common problem of using a
"programming language as configuration" (e.g. gulp?)

Dhall solves exactly this problem: [https://dhall-lang.org](https://dhall-lang.org)

It has the same premise as Pulumi, but without the Turing completeness (I
don't know if/how Pulumi avoids that, but if it does, it should be part of the
pitch), so you cannot shoot yourself in the foot by building an abstraction
castle in your build system/infrastructure config.

We use it at work to generate all the Infra-as-Code configurations from a
single Dhall config: Terraform, Kubernetes, SQL, etc.

And there is already an integration with Kubernetes:
[https://github.com/dhall-lang/dhall-kubernetes](https://github.com/dhall-lang/dhall-kubernetes)

~~~
leg100
I don't get the problem with using a Turing-complete language to generate
configuration. There's nothing wrong with maintaining and refactoring a
program, that's a natural process for any program. If you don't want an
infinite loop, don't write one, as you wouldn't in any other program. You can
choose as much or as little abstraction as you so wish.

Give me a real language any day over dhall or jsonnet.

~~~
ithkuil
FWIW jsonnet is a "real" language. It's a dynamically typed, lazily evaluated,
purely functional programming language.

~~~
leg100
Fair enough. I should have said "general purpose language" rather than "real",
which makes for flame-bait.

~~~
ithkuil
I once built a mandelbrot fractal renderer which emitted a data-URL encoded
PNG string to stdout in BCL (a spiritual predecessor of Jsonnet @ Google).

Yeah, I know what you mean. It lacks generic input/output: you cannot read or
write arbitrary files, perform arbitrary network requests, etc.

I do like that restriction in the context of managing configuration systems,
because it allows you to build hermetic evaluations.

With kubecfg we added the ability to import from URLs, which I wish was
available out of the box in jsonnet.

------
BossingAround
I know I'm in a minority, but I really dislike YAML... I recently did a lot of
Ansible and boy, at the beginning, I was just struggling a lot. Syntactic
whitespace kills me.

I don't like it in Python either, but for some reason, when I write Python,
it's a lot easier. Maybe YAML is just a bit more complex (and Python has
better IDE support..?)

~~~
ravenstine
> Syntactic whitespace kills me.

Okay, I'm gonna be the asshole in the room, but how hard is it to just use
consistent indentation? I can't count how many times I've heard people
complain about significant whitespace in languages.

Not only is it not difficult to begin with, but every code editor and IDE will
show you where there's a syntax error in your YAML. People are free to dislike
YAML, even for its significant whitespace, but how does it "kill you"?

Look at this example from the article:

```
something: nothing
  hello: goodbye
```

This is pure sloppiness, and anyone who carelessly adds pointless bytes to
code, no matter the language, is being sloppy. I don't understand why people
criticize YAML and Python because "whitespace is hard".

P.S.: There's a configuration language called ArchieML, which is similar to
YAML but doesn't have significant whitespace.

[http://archieml.org](http://archieml.org)

~~~
pjc50
Three big things that annoy me even though I'm happily writing Python:

\- "cut and paste and edit" is broken. You can't autoformat the pasted code
into the right place, you have to go back and fix the whitespace. Since
whitespace is semantically significant, this can introduce bugs.

\- visually identical whitespace may not be textually identical whitespace.
Unless you go around breaking the tab key off your colleagues' keyboards
you'll trip over this. Especially (again) if you paste. Occasionally seen in
merges too.

\- editors can no longer give you 100% correct indentation.

~~~
giancarlostoro
> \- "cut and paste and edit" is broken. You can't autoformat the pasted code
> into the right place, you have to go back and fix the whitespace. Since
> whitespace is semantically significant, this can introduce bugs.

It depends on how your editor is configured and on its feature set, which makes
me wonder how editorconfig would handle this when enabled. It seems like an
insignificant issue to me: you can auto-PEP8 the code before pasting it. You
should probably be following PEP8 anyway (as far as spacing is concerned, at
least).

> \- visually identical whitespace may not be textually identical whitespace.
> Unless you go around breaking the tab key off your colleague's keyboards
> you'll trip over this. Especially (again) if you paste. Occasionally seen in
> merges too.

I turn on "show all whitespace" in my editors regardless of programming
language. I've been burned by Sublime Text not inferring a file's existing
whitespace rules from what it already uses and just shoving in its own
defaults. I wish _all_ editors would base whitespace on what the file's
structure looks like, and warn me if there are mixed tabs and spaces.
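Here is a minimal Python sketch of the warning giancarlostoro wants. `indentation_warnings` is a hypothetical helper, not a feature of any particular editor. It looks only at leading whitespace, which is where "visually identical but textually different" indentation causes trouble in Python and YAML.

```python
from typing import List


def indentation_warnings(text: str) -> List[str]:
    """Warn about indentation that looks the same but isn't the same bytes."""
    warnings = []
    seen = set()  # which indent characters appear anywhere in the file
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.lstrip(" \t")
        if not stripped:
            continue  # skip blank or whitespace-only lines
        indent = line[: len(line) - len(stripped)]
        if " " in indent and "\t" in indent:
            warnings.append(f"line {lineno}: tabs and spaces mixed in indentation")
        seen.update(indent)
    if seen == {" ", "\t"}:
        warnings.append("file: both tabs and spaces used for indentation")
    return warnings


if __name__ == "__main__":
    source = "def f():\n    x = 1\n\tif x:\n\t    return x\n"
    for warning in indentation_warnings(source):
        print(warning)
```

The file-level check catches the merge case pjc50 mentions, where every line is internally consistent but different lines use different indent characters.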

> \- editors can no longer give you 100% correct indentation.

I don't understand this; it sounds like you've got your editor configured
poorly or something? But it goes back to how unintuitive the nice editors can
be. You can use editorconfig to define the indentation project-wide, and then
any editor should pick it up. Of course, if you adopt PEP8 at a minimum, it
pins down the spacing settings.

I'm not sure if PyCharm covers a few of those cases, since I use it so
seamlessly I don't usually have complaints.

------
aetherlord
I didn't see this mentioned anywhere else, so here's another alternative (that
I've seen and really like conceptually, but haven't used so far) to all this
wildness with YAML and JSON ->
[https://github.com/dhall-lang/dhall-lang](https://github.com/dhall-lang/dhall-lang),
and for kubernetes specifically ->
[https://github.com/dhall-lang/dhall-kubernetes](https://github.com/dhall-lang/dhall-kubernetes)

~~~
svnpenn
god those examples are ugly - commas at the beginning of a line? mismatched
brace styles?

~~~
necubi
It's a haskell thing. The main advantage is that each line is independent. You
can comment out a line or add a line at the end without modifying anything
else.

~~~
basil-rash
Each line is not independent: you cannot comment out the first line. A better
approach is to allow trailing commas. (I suppose you could allow leading
commas, does Haskell support this?)

~~~
nh2
Unfortunately no.

Allowing trailing commas, like Python does, would be really great.
Unfortunately trailing commas already mean something: (a,b,) is a function
that still takes 1 argument to make a triple. It's called "TupleSections".

------
miohtama
If you have been around long enough, you still remember the world that was
excited about XML and templating it using XSLT. In hindsight, it was a
horrible world.

Even though YAML is not optimal, it is a human-friendly compromise between
too-verbose XML and machine-only JSON. It lacks native templating, which leads
to funny constructs, e.g. in Ansible files. However, humankind has made
progress and will make further progress, so it is just a matter of time until
someone comes up with a sane "native templated YAML" and all projects adopt it.

~~~
chriswarbo
> If you have been around long enough you still remember the world that was
> excited about XML and templating it using XSLT. As a hindsight it was a
> horrible world.

I actually really like the _idea_ behind XSLT: machine-friendly, human-
tolerable, structured data + declarative rules for turning that data into a
display, or a report, or whatever else.

The execution was horrible though: incredibly verbose, lots of
overcomplication due to XML weirdness/asymmetries (e.g. attributes vs elements
vs text, namespaces, ...); mixtures of different languages hidden inside each
other (e.g. XPath hidden in attributes); etc.

I would really like to see what this could look like if done in a more
minimalist, lispy fashion (normal code-is-data stuff in Lisp is _similar_,
but I think term rewriting is a more appropriate evaluation mechanism for such
rules).
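To make chriswarbo's term-rewriting idea concrete, here is a minimal Python sketch. It assumes that terms are nested tuples `(tag, *children)` and that rules are plain functions returning either a replacement or `None`. The `item`/`report` tags and both rules are invented for illustration. The data stays plain, and the display logic lives only in the rules, which is the XSLT idea without the XML.

```python
from typing import Callable, List, Optional, Union

# A term is a leaf (str/int) or a tuple (tag, *children).
Term = Union[str, int, tuple]
Rule = Callable[[Term], Optional[Term]]


def rewrite(term: Term, rules: List[Rule]) -> Term:
    """Rewrite children first, then apply the first matching rule at this node,
    repeating until no rule matches (assumes the rules terminate)."""
    if isinstance(term, tuple):
        term = (term[0],) + tuple(rewrite(child, rules) for child in term[1:])
    for rule in rules:
        result = rule(term)
        if result is not None:
            return rewrite(result, rules)
    return term


# Hypothetical display rules: turn structured data into a plain-text report.
def item_rule(t: Term) -> Optional[Term]:
    if isinstance(t, tuple) and t[0] == "item":
        _, name, qty = t
        return f"{name}: {qty}"
    return None


def report_rule(t: Term) -> Optional[Term]:
    if isinstance(t, tuple) and t[0] == "report" and all(isinstance(c, str) for c in t[1:]):
        return "\n".join(t[1:])
    return None


if __name__ == "__main__":
    data = ("report", ("item", "apples", 3), ("item", "pears", 5))
    print(rewrite(data, [item_rule, report_rule]))
```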

~~~
pytester
XSLT was one in a litany of domain-specific languages (Ant, Apache rewrite
rules, LaTeX macros, etc.) that evolved towards Turing completeness because
that's what the problem space demanded.

In most if not all of these cases, an existing and well-designed
Turing-complete programming language would likely have served them better.

~~~
thanatropism
There was a post on HN a few weeks back to the effect that it's rather easy
for Turing completeness to emerge accidentally. I wish I remembered more
specifics so I could find it again.

~~~
chriswarbo
Was it
[http://beza1e1.tuxen.de/articles/accidentally_turing_complete.html](http://beza1e1.tuxen.de/articles/accidentally_turing_complete.html)
by any chance?

------
jrockway
I saw this title and immediately knew the article would be about Helm. I don't
think anyone wants to use Helm. People use it for a set-and-forget thing that
they don't care about (who cares that it's called impressive-leopard-
kubernetes-dashboard, after all.)

kustomize is much more sane for your own stuff:
[https://github.com/kubernetes-sigs/kustomize](https://github.com/kubernetes-sigs/kustomize)

It is actually a little bit too magical for my taste, but I continue to use it
because it hasn't done anything stupid. I have one file that maps logical
names to images in a container repository. If I create a service called "foo"
pointing to selector.app.label="foo" in the base, then in production it's
called foo-prd and the label magically updates to foo-prd for the selector. It
actually understands what it's generating, and while they might have taken it
a little bit too far, it's far better than just dumb text replacement.
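jrockway's point is that a tool which understands the objects can keep names and selectors consistent, while plain text substitution cannot. Below is a minimal Python sketch of that idea over plain dicts. `add_name_suffix` is hypothetical and much simpler than what kustomize actually does. In particular, it treats only the `app` label as a reference, matching the example above.

```python
import copy
from typing import Dict, Iterator, List


def _label_maps(res: dict) -> Iterator[Dict[str, str]]:
    """Yield the label/selector dicts this toy model treats as references."""
    spec = res.get("spec", {})
    if res["kind"] == "Service" and "selector" in spec:
        yield spec["selector"]
    elif res["kind"] == "Deployment":
        yield spec["selector"]["matchLabels"]
        yield spec["template"]["metadata"]["labels"]


def add_name_suffix(resources: List[dict], suffix: str) -> List[dict]:
    """Rename every resource and rewrite 'app' labels that pointed at an old
    name, so Service selectors keep matching the renamed pods."""
    renamed = {r["metadata"]["name"]: r["metadata"]["name"] + suffix for r in resources}
    out = copy.deepcopy(resources)  # never mutate the base
    for res in out:
        res["metadata"]["name"] = renamed[res["metadata"]["name"]]
        for labels in _label_maps(res):
            if labels.get("app") in renamed:
                labels["app"] = renamed[labels["app"]]
    return out
```

Since the rename operates on the parsed tree, a label value that merely contains "foo" somewhere in its text is left alone, whereas a `sed` pass would rewrite it too.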

~~~
alexk
> I don't think anyone wants to use Helm.

Here is why everyone should use Helm:

Helm 2.0 introduced the package as a first-class concept for Kubernetes and
created a standard way to distribute applications. Thanks to Helm, thousands
of people could discover and collaborate on cloud-native deployments of the
open source software
[https://github.com/helm/charts/tree/master/stable](https://github.com/helm/charts/tree/master/stable)
published and managed by organizations and contributors all over the world.

Helm 3.0 keeps innovating: it adopts the most forward-thinking approach to
package management and Kubernetes config management by using a higher-level
domain-specific language based on Lua to create an expressive package
management system:

[https://sweetcode.io/a-first-look-at-the-helm-3-plan/](https://sweetcode.io/a-first-look-at-the-helm-3-plan/)

Helm is also backed by CNCF[1] and is the best choice so far for organizations
to create a reproducible CI/CD pipeline in a Kubernetes cluster.

[1] [https://www.cncf.io/blog/2018/06/01/cncf-to-host-helm/](https://www.cncf.io/blog/2018/06/01/cncf-to-host-helm/)

~~~
jrockway
That is a lot of buzzwords.

I don't really trust Helm to do anything that's actually useful in the long
term. It will get something running very quickly, but whether or not it's
maintainable, I am yet to be sure of. For example, very early on, I installed
the helm chart for prometheus. Now I want it to live in the kube-system
namespace because I am tired of seeing its resources in the default namespace.
For some reason, I highly doubt that changing values.yaml to change the
namespace is going to do anything other than give me a fresh instance of
prometheus running in another namespace. It's not going to use the already
allocated storage volume to satisfy the persistent volume claim in the new
namespace. It's not going to update the other stuff in my cluster to refer to
prometheus-pushgateway.kube-system.svc.cluster.local. It's not going to update
my Grafana dashboards to refer to the new namespace, even though I installed
Grafana with Helm! So what did I really gain? Helm isn't giving me the ability
to manage the long-term lifecycle of third-party software. It just explodes
some API objects all over my cluster and lets me delete most of them
automatically. That's all it does.

I get why Helm is popular. You can get some piece of software running in
Kubernetes with minimal effort. I would have never successfully made some
random complex piece of software work correctly in Kubernetes on day 1,
especially using something that assumes you deeply understand the core API
objects like kustomize does. What that boils down to is that Helm doesn't go
far enough, and in its current state, just encourages people to make mistakes
early.

------
SamWhited
As others in this thread have said: I ask this question all the time, except
s/templating/using/.

YAML is insanely overcomplicated; it's as bad as or worse than XML for config
files, and it doesn't even have the nice streaming mode. Not to mention that
it's a bit of a security nightmare (seriously, who put pointers into the YAML
spec?).

And, on a more subjective note, YAML is just confusing: between all the
significant whitespace and the random single character symbols that no one
ever remembers what they do, I never get a YAML document right on the first
try.

Templating it really does add a whole new level of headache too.

~~~
macspoofing
>it's as bad or worse than XML for config files

XML works very well for config files. It's schema-optional (but schemas are
there if you want them), well-specified, human-readable, has a plethora of
supporting technologies (making things like templating easy), and is well
supported by every language.

At the very least it is way better than JSON.

~~~
Macha
It's missing one important part for config files. It's tedious to write by
hand.

~~~
titanix2
XML is not tedious at all with the right tooling. For example, a tool like
Visual Studio IntelliSense proposes only the elements and attributes valid in
the context, automatically closes tags, formats the file, and completes opening
tags too, so it makes editing XML files a breeze.

~~~
MrStonedOne
The tooling bar for config files is set firmly at notepad.exe

If your config file requires more tooling than that you fucked up.

~~~
heavenlyblue
I mean, with Visual Studio I can even change the code of the application I
am running on the fly, depending on what exactly I am configuring :)

That doesn’t make XML an easier format to maintain.

------
niftich
File-based configs are a troublesome abstraction: they package unrelated
concerns into a rigid document whose form must take a particular, application-
dependent shape, and the assembly and disassembly of that document essentially
becomes an API where key-value pairs are mixed with complex glue code. The
application has to do this internally, but anyone who generates their configs
is also doing parts of this externally.

Templates try to bandage over that by drilling down the abstraction to key-
value pairs themselves. And imperative constructs that sneak into templating
languages are an artifact of wanting to gain expressiveness without losing the
benefits of declarative form -- but really, the two are at odds.

YAML is a red herring -- we had the same headaches with XML a decade prior.
The problem is always that there's relationships among the data (or even
multiple instances of the config) that we care about, but that the structure
of a single config file at rest cannot model.

Databases -- let's say, an SQL one -- are actually among the better solutions,
because they allow the universe of config items to live in structured places
without overspecifying the exact form the data must take when serialized into
a file. Then, data can be normalized where it makes sense to avoid repetition
and introduce propagation. An SQL database gives all the tools needed to
accomplish this, using mostly declarative code.

Databases in a KV sense are often used for configuration, and SQLite's rise
has led to more richly structured configs that are specified at a higher level
than what's typically done with other serialization formats, but the full
approach has not caught on outside big enterprise systems and complex
applications. Which is a shame, because it's hardly more complex than the
current awkward pairing of a full serializer and a templating engine.
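niftich's suggestion can be sketched with Python's built-in `sqlite3` module. Defaults are stored once, per-environment overrides are normalized into their own table, and a view computes the effective config declaratively. The schema (`defaults`, `envs`, `overrides`, `effective`) and the settings are hypothetical and do not reflect any established convention.

```python
import sqlite3

SCHEMA = """
-- Defaults are stored once; environments only list what differs.
CREATE TABLE defaults (setting TEXT PRIMARY KEY, value TEXT NOT NULL);
CREATE TABLE envs (env TEXT PRIMARY KEY);
CREATE TABLE overrides (
    env TEXT NOT NULL REFERENCES envs(env),
    setting TEXT NOT NULL REFERENCES defaults(setting),
    value TEXT NOT NULL,
    PRIMARY KEY (env, setting)
);
-- Effective config: an override if one exists, otherwise the default.
CREATE VIEW effective AS
    SELECT e.env, d.setting, COALESCE(o.value, d.value) AS value
    FROM envs AS e
    CROSS JOIN defaults AS d
    LEFT JOIN overrides AS o ON o.env = e.env AND o.setting = d.setting;
"""


def build_config_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO defaults VALUES (?, ?)",
                   [("replicas", "1"), ("memory", "512Mi"), ("log_level", "info")])
    db.executemany("INSERT INTO envs VALUES (?)", [("dev",), ("prod",)])
    db.executemany("INSERT INTO overrides VALUES (?, ?, ?)",
                   [("prod", "replicas", "3"), ("prod", "memory", "2Gi")])
    return db


def config_for(db: sqlite3.Connection, env: str) -> dict:
    rows = db.execute("SELECT setting, value FROM effective WHERE env = ?", (env,))
    return dict(rows.fetchall())


if __name__ == "__main__":
    print(config_for(build_config_db(), "prod"))
```

With this layout, serializing `config_for(db, "prod")` to JSON or YAML is the last, simple step, and the logic no longer lives in a template.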

~~~
afiori
SQL as a configuration format is not that bad of an idea

------
TeMPOraL
Wait, what?

I feel this article is missing the _bigger_ problem - one that for some reason
just cannot die.

The problem is that of _gluing strings together_. YAML is not an unstructured
text file, it's a tree notation. Whatever "templating" or "generation"
mechanism you want to use, it needs to respect the tree nature of the language
it operates on. It needs to respect semantics.

Gluing strings together is literally what causes SQL injection to exist. It
caused countless defacements on the web, and countless broken websites. I
would have thought we'd learned our lesson, but for some reason I still see
these template languages alive and kicking.
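TeMPOraL's point can be shown with a short Python sketch. It assumes a hostile, or simply unlucky, value that contains a newline. The function names are hypothetical. In the glued version, the value injects a new key into the tree. In the structural version, it remains a single escaped string.

```python
import json


def render_by_gluing(name: str) -> str:
    # Text templating: the value is pasted into the YAML verbatim.
    return "metadata:\n  name: " + name + "\n  namespace: default\n"


def render_structurally(name: str) -> str:
    # Build the tree first; the serializer handles quoting and escaping.
    # JSON output is used because JSON is, for practical purposes, valid
    # YAML 1.2, which avoids needing a third-party YAML library here.
    return json.dumps({"metadata": {"name": name, "namespace": "default"}}, indent=2)


if __name__ == "__main__":
    unlucky = "web\n  namespace: kube-system"
    print(render_by_gluing(unlucky))     # a second 'namespace' key appears
    print(render_structurally(unlucky))  # the newline stays inside one string
```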

~~~
ekimekim
The article goes on to talk about Jsonnet, which takes the exact approach you
describe - it generates JSON by aiming to be a "templated JSON" where the
templating involves generating semantic objects, not strings.

Here's an example (adapted from some real-world code) where I specify the k8s
cpu limit in one place, and then look up that info in several other places to
avoid needing to change multiple values later:

    
    
        {
          local container = self,
          requests: {cpu: 5.5, memory: "2G"},
          limits: container.requests + {memory: "4G"},
          environment: [
            {
              name: "NUM_THREADS",
              value: std.toString(std.ceil(container.requests.cpu)),
            },
          ],
        }
    

Note how I can patch the container.requests object with an alternate memory
limit, and how I can calculate an expression for the NUM_THREADS value in
order to automatically set it to ceil() of the requested cpu.

(edited for nicer formatting of the code)
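For comparison, here is roughly the same object built in a general-purpose language. Python is used only for illustration. The sketch shows that ekimekim's deduplication trick amounts to ordinary variable reuse plus a dict merge.

```python
import math


def container() -> dict:
    requests = {"cpu": 5.5, "memory": "2G"}  # single source of truth
    return {
        "requests": requests,
        # Like `container.requests + {memory: "4G"}`: copy, then override one key.
        "limits": {**requests, "memory": "4G"},
        "environment": [
            # NUM_THREADS derived from the CPU request: ceil(5.5) == 6.
            {"name": "NUM_THREADS", "value": str(math.ceil(requests["cpu"]))},
        ],
    }
```

One difference is worth noting. Jsonnet's `self` is late-bound, so an object that extends this one and overrides `requests` would also change `limits` and `NUM_THREADS`. The Python function, by contrast, computes them once per call.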

------
epage
There have been times I've wanted to templatize my configuration, but I don't
want to do it with text-based templates; I want templates within the
configuration file's syntax (be it YAML, TOML, or something else). Not sure
what this is called; I've been calling it "structural templating".

So far the only things close to this are

\- Azure Pipelines' syntax:
[https://docs.microsoft.com/en-us/azure/devops/pipelines/process/templates?view=azure-devops](https://docs.microsoft.com/en-us/azure/devops/pipelines/process/templates?view=azure-devops)

\- Something called Jasonette:
[https://docs.jasonette.com/templates/](https://docs.jasonette.com/templates/)

\- Something called Jsonnet: [https://jsonnet.org/](https://jsonnet.org/)

Azure Pipeline's approach I think is closest to what I've been looking for.

Anything else in this space?
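To make "structural templating" concrete, here is a minimal Python sketch of a template engine whose operators are keys in the data tree. Because of that, the template is itself valid JSON/YAML, and substitution cannot break the structure. The `$var` and `$if` operator names are hypothetical, loosely in the spirit of json-e's `$if`/`then`/`else`. They do not reproduce the semantics of any real tool.

```python
from typing import Any, Dict

_DROP = object()  # sentinel meaning "omit this node from its parent"


def render(template: Any, context: Dict[str, Any]) -> Any:
    """Evaluate a JSON-shaped template whose operators are keys, not text.

    Hypothetical operators:
      {"$var": "name"}                        -> context["name"]
      {"$if": "flag", "then": x, "else": y}   -> x or y; dropped if no branch applies
    """
    if isinstance(template, list):
        rendered = (render(item, context) for item in template)
        return [value for value in rendered if value is not _DROP]
    if isinstance(template, dict):
        if "$var" in template:
            return context[template["$var"]]
        if "$if" in template:
            branch = "then" if context.get(template["$if"]) else "else"
            return render(template[branch], context) if branch in template else _DROP
        rendered_items = ((k, render(v, context)) for k, v in template.items())
        return {k: v for k, v in rendered_items if v is not _DROP}
    return template  # scalars pass through unchanged
```

Since the template is data, any YAML or JSON parser can load it before it is passed to `render`. A value that contains a newline or a colon can only ever end up in a single leaf.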

~~~
wryun
I prefer
[https://github.com/taskcluster/json-e](https://github.com/taskcluster/json-e)

Has the advantage/disadvantage that it's still valid json/yaml

I wrote a scary command line wrapper for it:

[https://wryun.github.io/rjsone/](https://wryun.github.io/rjsone/)

Has libs for Python, Go, and JS, and there's a bazel interface.

~~~
epage
This looks really nice. I'll have to give this a try at some point to see how
I feel about it vs Azure Pipelines. From my quick look, this looks more
general purpose at the cost of more verbosity.

------
fiddlerwoaroof
Dhall-lang ( [https://dhall-lang.org](https://dhall-lang.org) ) is another,
somewhat interesting, attempt to solve this problem: it comes with a
non-Turing-complete programming language, so you can bring some abstraction to
your configuration files without having to worry about things like infinite
loops.

------
MrStonedOne
The real question is why are we using yaml at all?

~~~
quickthrower2
Baffles me. I don't like any language or file format where whitespace matters.
Even Haskell bothers me in this regard. To me white space shouldn't add
cognitive load. I want to look at the symbols not the formatting of the
antisymbols to understand what is going on.

Write JSON and use your editor tools to format it with nice indentation, and
you are sweet!

That said, YAML makes an excellent format for reading, but not for writing.

~~~
amanzi
White space matters with Python but I rarely run into issues with indentation
in Python. But YAML on the other hand, I have nothing but nightmares.

~~~
kccqzy
I have the same experience. I just think it is possible to design
easy-to-write indentation-sensitive formats, but YAML is not one of them. For
example, it always baffles me that

    
    
        a:
        - b
        - c
    

has a list inside an object, but the list is not further indented. There is in
fact a hierarchical relationship, but absolutely no indentation.

~~~
Rapzid
You can indent it though, but the hyphen already means a list item.

That is a preferred syntax for many that are used to YAML though.

~~~
XorNot
The hyphen is associated with the "a", though. In what's posted, "a" is a
list, not a map: the type of the parent is being declared by that hyphen.

------
equalunique
Maybe we should be exploring using Dhall instead of YAML.

>One of the clearest signals I’ve gotten from users is that Dhall is “the YAML
killer”, for the following reasons:

>Dhall solves many of the problems that pervade enterprise YAML configuration,
including excessive repetition and templating errors

>Dhall still provides many of the good parts of YAML, such as multi-line
strings and comments, except with a sane standard

>Dhall can be converted to YAML using a tiny statically linked executable,
which provides a smooth migration path for “brownfield” deployment.

Source:
[http://www.haskellforall.com/2019/01/dhall-year-in-review-2018-2019.html?m=1](http://www.haskellforall.com/2019/01/dhall-year-in-review-2018-2019.html?m=1)

------
rhacker
LOL we are doing the same with k8s. We deploy to any environment, where each
environment is a different k8s namespace. We even have a namespace for each
developer. The variables are things like the image name (the tag is
effectively a git commit ID, or sometimes something different for
not-yet-committed stuff). Beyond that, just about nothing really needs to be a
variable, but we
do have different RAM amounts.

We have a couple ways that we template this out, but mostly we literally just
do this in bash:

    
    
        sed -e "s/\$CI_COMMIT_SHA/$CI_COMMIT_SHA/" kube-deploy.template.yaml | kubectl -n $ENV apply -f -
    

(CI_COMMIT_SHA comes from GitLab, and ENV comes from our GitLab CI file.)

That all being said, the extent of our k8s integration is lots of stuff like
that. We could write a JS file that creates a JSON k8s template, but honestly,
that would be more work and more learning than we needed for what we're doing.
Why would we do more just because we want to avoid templating in a YAML file?

------
CryoLogic
I think they are missing the real selling point of JSON. It's basically
interoperable with JavaScript objects.

That means you write it, send it, store it, operate on it, etc. with little or
no modification.

The author says "converting between the two is trivial" which may be true, but
the developer overhead is less trivial. And it will always be JSON in the
client - JS doesn't support YAML objects.

~~~
altmind
Except that you cannot encode perfectly valid values like Double.NaN or
Date(), which is a showstopper for many.

I remember MongoDB started with JSON, but switched to its in-house BSON pretty
early because of JSON's limitations.

~~~
svnpenn
new Date().toJSON()

~~~
altmind
When you parse back the result of this, you are not getting Date back.
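Both of altmind's points can be checked with Python's standard `json` module. The same limits apply to `JSON.stringify`/`JSON.parse` in JavaScript, although the details differ: JavaScript writes NaN as `null`. The function names below are illustrative only.

```python
import json
from datetime import datetime, timezone


def nan_round_trip():
    """Python's json emits the non-standard token NaN by default; strict mode refuses."""
    loose = json.dumps({"x": float("nan")})  # '{"x": NaN}', not valid per RFC 8259
    try:
        json.dumps({"x": float("nan")}, allow_nan=False)
        strict_ok = True
    except ValueError:
        strict_ok = False
    return loose, strict_ok


def date_round_trip():
    """A date survives only as a string; the type is lost on the way back."""
    when = datetime(2019, 2, 7, tzinfo=timezone.utc)
    parsed = json.loads(json.dumps({"when": when.isoformat()}))["when"]
    return when, parsed
```

Getting the `datetime` back requires out-of-band knowledge that `"when"` is a date field. Formats like BSON avoid this by carrying the type in the encoding itself.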

------
the8472
Helm reminds me of the 90s level of webtech. I.e. php cgi files mixing html,
logic and includes.

~~~
jrockway
I believe people are back to doing that for web stuff. Open up any modern web
stuff and it looks like "const widget = <div>foo</div>".

Not making this up!
[https://reactjs.org/docs/introducing-jsx.html](https://reactjs.org/docs/introducing-jsx.html)

~~~
dmix
The difference, at least for Vue single-file template components, is that it's
3 separate areas in one file:

    
    
        - <template /> (Templated HTML)
        - <script /> (JS OR Typescript)
        - <style /> (CSS OR SASS/etc)
    

Whereas in old PHP files you could mix it in anywhere, and the files were a big
mess, including inline SQL in your view templates, which is hardly a good
separation of concerns. While a Vue component can be split into separate
files, kept as one it still represents one isolated piece of the interface.

[https://flaviocopes.com/vue-single-file-components/](https://flaviocopes.com/vue-single-file-components/)

------
fasteo
Off-topic: The whole discussion is about deploy-time configuration management,
but our problem is more about run-time configuration management.

We have built the classic memcached+database custom solution, but I was
wondering if there is any accepted library/tool for changing application
behavior at run time. We have tried the Consul KV store [1], but it does not
quite fit our environment.

My ideal solution would be a webapp with some text editor (think codemirror).
Changes in this text file would push the configuration data to a running
application.

[1] [https://learn.hashicorp.com/consul/getting-started/kv](https://learn.hashicorp.com/consul/getting-started/kv)
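fasteo's ideal setup has two halves: an editor that writes the config, and an application that notices the change. Below is a minimal Python sketch of the application half. It assumes the editor ends up writing a JSON file that the app can read. `ReloadingConfig` is hypothetical, and the same polling idea applies if the text comes from a KV store instead of a file.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable, Dict, Optional


class ReloadingConfig:
    """Poll a JSON file and call back when its content changes.

    Hashing the content (rather than comparing mtimes) avoids missing edits
    that land within the filesystem's timestamp resolution."""

    def __init__(self, path: Path, on_change: Callable[[Dict[str, Any]], None]):
        self.path = path
        self.on_change = on_change
        self._digest: Optional[str] = None
        self.current: Dict[str, Any] = {}

    def poll(self) -> bool:
        """Return True if a new config was loaded and delivered."""
        raw = self.path.read_bytes()
        digest = hashlib.sha256(raw).hexdigest()
        if digest == self._digest:
            return False
        new = json.loads(raw)  # a bad edit raises here; the old config is kept
        self._digest, self.current = digest, new
        self.on_change(new)
        return True
```

In a running service, `poll()` would be called from a timer or background thread, and a parse error would be logged rather than propagated. That way a typo in the editor never takes down the application.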

------
nickjj
Who knows.

It makes me throw up a little in my mouth every time I see hundreds of lines
of YAML to configure something like Traefik with Kubernetes. The worst is when
people say they prefer that because "I don't have to write a config file for
my backend". That's true but instead now you have extremely verbose
configuration mixed in with other verbose configuration.

But in YAML's defense, I think it's more a problem with the tools that use it
than with YAML itself. Ansible is a great example of how amazing YAML can be
for managing complex configuration in a concise way.

~~~
gh02t
I agree that simple YAML can be nice as a quick and clean tool, but Ansible is
an example of everything _wrong_ with how YAML is used. Layering program flow
constructs like loops, variables, templates, references, etc. on top of it is
exactly the sort of abuse that makes YAML feel awkward.

------
peterwwillis
YAML is a data stream, not a program. Please do not shove programs into data.

Your data does not need to be "expressive", it just needs to provide input to
a program. If your data files need to be complex, you need a program to
generate them for you.

I've danced the dance of ini -> json -> yaml -> weird hybrid -> embedded
logic, and it ends with "program that asks for what the thing you want looks
like and generates data files". Industrial software design figured this out
ages ago.

~~~
TeMPOraL
And you end up with a program compiled to a config anyway, so with a proper
toolchain it means your real config _is_ the program.

People keep increasing complexity unintentionally precisely because they don't
realize that _code = data_. There's no real distinction. Code is data is code.

You will end up having Turing completeness _somewhere_; it's just a matter of
choosing (or blindly selecting, like most people do) where. For a popular
product, it eventually gets embedded in the configuration language, turning it
into half-assed programming language (see most web-related templating). For
less popular products / more enterprise'y settings, you can probably get away
with embedding the Turing-complete part _in your bureaucracy_. That is, I
can't code my config to make it do what I want, but I can pay you to get
developers to write some code and export it to the config language as a
keyword. There's a spectrum to this, and tradeoffs galore.

But ultimately, YAML is nothing but a tree notation. Tree notation is enough
to represent high-level programming languages: Lisp without parentheses, if
you will, or Python, if you squint.

~~~
peterwwillis
"Data" to me is the least complex input from a human, whereas "code" is more
complex and necessitates a lot more work to make sure it's correct and bug-
free.

If you embed "code" in "data", you made your thing way more complex and
subject to software design patterns. But in software operation, we already
have to contend with highly complex systems, so we want to remove as much time
and effort and complexity as possible from the instrumentation.

To put it another way: if you had to run a nuclear reactor, do you want to
instrument it by constantly writing new code, or turning a dial? I'd rather
turn a dial. That means I have to develop the code for that dial ahead of
time, but in the end, actually using it will be safer.

------
desc
My approach to this these days is:

* all configuration, without exception, is XML.

* all configuration may be generated from any other format imaginable, but it's sure as fuck going into the Big Main Godlike Application as XML.

Separation, interfaces, etc. Disclaimer: I work in .NET almost exclusively.
The .NET configuration APIs generally work, as long as you _only ever use them
for reading_; treating config as something the application itself can fiddle
with is a fast route to madness.
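On the generating side, "anything in, XML out" can be as small as a recursive walk over a parsed config tree. A minimal TypeScript sketch, assuming every key is already a valid XML element name and ignoring attributes and namespaces; `toXml` and `ConfigTree` are hypothetical names, not part of any .NET API.

```typescript
// Hypothetical config shape: nested objects of strings, numbers and booleans.
type ConfigValue = string | number | boolean | ConfigTree;
interface ConfigTree {
  [key: string]: ConfigValue;
}

// Escape the five characters that are special in XML text.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Render a tree as nested elements; keys become element names (assumed valid).
function toXml(name: string, value: ConfigValue): string {
  if (typeof value === "object") {
    const tree = value; // narrowed to ConfigTree
    const children = Object.keys(tree)
      .map((k) => toXml(k, tree[k]))
      .join("");
    return `<${name}>${children}</${name}>`;
  }
  return `<${name}>${escapeXml(String(value))}</${name}>`;
}

const xml = toXml("appSettings", { db: { host: "localhost", port: 5432 }, note: "a<b" });
console.log(xml);
// <appSettings><db><host>localhost</host><port>5432</port></db><note>a&lt;b</note></appSettings>
```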

------
louiskottmann
I feel like Ansible generating YAML with Jinja templates is really a sweet
spot, with idempotency and reusability.

I find it pushes me to write plain YAML files for variables and defaults
(Ansible), while allowing strong templating of generated files (Jinja) and
letting the result be readable (YAML). By readable I mean minimum programming
bloat (code spread on many lines just to write a for loop) and minimum extra
syntax that clutters the screen (brackets and quotes). It also lets me write
very little custom code (aside from variables, obviously).

If I had to use a more "powerful" YAML-like replacement, it would mix all
these into files, written differently by people with different styles, and it
would have bloat all over the place.

The main issue I have with Helm is that values.yaml is not templatable by
default, so you have to generate it if you want reusability.

YAML does one thing and does it well: it's readable and bloat-free. Maybe we
need more tools like "kubectl explain" to know the syntax, though.

------
mschaef
Reading the commentary on whitespace, my thoughts immediately jumped to the C
preprocessor. Even though it's built into the language, it has the same sorts
of problems as a template engine has with generating YAML: the preprocessor
just wasn't aware enough of the syntactic structure of the language to make it
easy to generate anything of significant complexity.

I'm not proud of this (and like to think I could come up with something better
these days), but this code was a bit of a nightmare for that reason:

[https://github.com/mschaef/vcsh/blob/master/vm/number.c#L256](https://github.com/mschaef/vcsh/blob/master/vm/number.c#L256)

Lisp macros do better, but they have the problem that the macros (and their
potentially unusual evaluation rules) can easily just blend in with ordinary
function calls.

------
sankyo
In the Clojure world we use EDN (extensible data notation), which is a subset
of Clojure: [https://github.com/edn-format/edn](https://github.com/edn-format/edn)

You can extend it, convert it to JSON if necessary, and it is easy to read.

------
ssspaju
I'd rephrase the question: "Why are we templating YAML with text-based
templating tools?"

@akx and I wrote a templating tool called Emrichen that's specifically
designed for producing YAML and JSON from YAML templates:

[https://github.com/con2/emrichen](https://github.com/con2/emrichen)

In contrast to other template systems, Emrichen templates are not just "based
on YAML", they _are_ YAML. YAML tags like "!Var varname" are used to perform
things like variable substitution, loops etc. Variables can be of any JSON
type, not just strings, and the template is evaluated top–down.

------
fredsted
Helm charts are great and accomplish the task, but things get really weird
and complex if you're not careful. Oh man, the indentation and `_helpers.tpl`
nightmares. It's not obvious at all. And there's still a lot of repetition for
each chart; writing a deployment or whatever is so verbose in K8s.

On the other hand, Ansible uses yaml and there it works great. I feel like
Ansible uses yaml in a way that's easier to understand and the way it was
meant to be written. With Ansible, you're writing configuration, not templates
of configuration. I don't think a layer on top of Ansible, like Helm is for
K8s, would make sense.

------
alxarch
Over the last week I created a tool for processing YAML and JSON files using
Jsonnet. It's called [ycat](https://github.com/alxarch/ycat) and
is inspired by `jq` but uses Jsonnet for processing. It can also be used just
like `cat` to concatenate JSON/YAML files. It's still young but very useful,
especially for handling complex Kubernetes configurations.

------
zaphar
Wholly agree with this which is why I've been experimenting with a project
similar in goals to jsonnet as a side project of my own. I didn't know about
jsonnet when I started or I'd probably have just used/contributed to that
instead.

Why template yaml when you could just generate it? Or generate json, or toml,
or xml, or environment variables...
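Generating instead of templating means building the config as an ordinary data structure and serialising it to whatever the target wants. A minimal TypeScript sketch; the `DB_HOST`-style naming for environment variables is an assumption, not a standard.

```typescript
type Scalar = string | number | boolean;
interface Tree {
  [key: string]: Scalar | Tree;
}

// Flatten nested keys into UPPER_SNAKE_CASE variables,
// e.g. { db: { host: "x" } } -> DB_HOST=x (naming scheme is an assumption).
function toEnv(tree: Tree, prefix: string[] = []): string[] {
  const lines: string[] = [];
  for (const key of Object.keys(tree)) {
    const value = tree[key];
    const path = [...prefix, key];
    if (typeof value === "object") {
      lines.push(...toEnv(value, path));
    } else {
      lines.push(`${path.join("_").toUpperCase()}=${value}`);
    }
  }
  return lines;
}

const config: Tree = { db: { host: "db.internal", port: 5432 }, debug: false };

// JSON is essentially a subset of YAML 1.2, so YAML consumers can read this too.
console.log(JSON.stringify(config, null, 2));
console.log(toEnv(config).join("\n"));
```

The same `config` object feeds both outputs, so there is no template whose whitespace has to line up.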

~~~
Aeolun
If you are generating it, why use it at all?

~~~
leg100
Because more often than not that is what the target system expects (k8s most
notably).

------
nikolay
Jsonnet is nice, but not actively developed, unfortunately. I highly recommend
supporting UCG [0], which is written in Rust (Jsonnet has two
implementations, which is one source of the slowness).

[0]: [https://github.com/zaphar/ucg](https://github.com/zaphar/ucg)

------
illumin8
His main argument is that you need to template differently for different
environments (dev/stage/prod) and cloud regions (us-west/us-east/emea/apac).

This is actually a solved problem, and you shouldn't be doing it in your
YAML/JSON templates. You should be using an external parameter store to do
this, and using a single template for everything.

See [https://aws.amazon.com/blogs/compute/query-for-the-latest-am...](https://aws.amazon.com/blogs/compute/query-for-the-latest-amazon-linux-ami-ids-using-aws-systems-manager-parameter-store/)

This is a simple use case: I want to deploy the latest AMI (Amazon Machine
Image) in any region, so I always get the latest patched Linux base image to
run my application on. I don't want to (and shouldn't have to) update my
YAML/JSON every time a new image is published.

So, why are people having to go to these crazy templating macro lengths? Just
store the changing bits in an external config/parameter store like etcd and
let your infrastructure as code templates remain unchanged.
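The idea is that the template holds references and the values are looked up at deploy time. A minimal TypeScript sketch, assuming a `{ param: ... }` marker convention invented for illustration; the store is an in-memory `Map` standing in for SSM Parameter Store or etcd, and the AMI IDs are placeholders. The parameter path is the one quoted further down the thread.

```typescript
// A template value is either a literal or a reference to a named parameter.
type Value = string | number | { param: string };
type Template = Record<string, Value>;

// Stand-in for an external parameter store; a real tool would call its API.
type ParamStore = Map<string, string>;

// Resolve every { param } reference at deploy time; the template never changes.
function resolve(template: Template, store: ParamStore): Record<string, string | number> {
  const out: Record<string, string | number> = {};
  for (const key of Object.keys(template)) {
    const value = template[key];
    if (typeof value === "object") {
      const resolved = store.get(value.param);
      if (resolved === undefined) throw new Error(`missing parameter: ${value.param}`);
      out[key] = resolved;
    } else {
      out[key] = value;
    }
  }
  return out;
}

const IMAGE_PARAM = "/aws/service/ecs/optimized-ami/amazon-linux/recommended/image_id";

const launchTemplate: Template = {
  instanceType: "t3.micro",
  imageId: { param: IMAGE_PARAM },
};

// One store per region; placeholder values, not real AMI IDs.
const usEast: ParamStore = new Map([[IMAGE_PARAM, "ami-placeholder-east"]]);
const euWest: ParamStore = new Map([[IMAGE_PARAM, "ami-placeholder-west"]]);

console.log(resolve(launchTemplate, usEast));
console.log(resolve(launchTemplate, euWest));
```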

~~~
jrockway
Based on the appearance of Helm, this is probably referring to Kubernetes,
which really does have per-environment things that can't be represented in any
form other than namespaces. For example, you have an app running (that's a
deployment in k8speak). You want network traffic to get to this app, so you
set up a load balancer. The load balancer config needs to know the name of
your app (or more precisely, a selector for "pods" that are created by the
deployment), which will change between environments, so now there is that
common variable that has to be updated in both places. That's a simple example
but is why people are templating their k8s configs. Yes, you could work around
it by giving every environment its own namespace and using the same set of
objects in every environment (differing only by namespace, which should
probably be in the file... but since you can't edit the files, you can pass it
in to kubectl probably), but there are other cases where even that doesn't
work.
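The selector problem goes away if one function emits both objects from the same labels, so the Service can't point at pods the Deployment doesn't create. A rough TypeScript sketch; `appManifests`, the image, and the ports are illustrative assumptions, and it prints a JSON `List` that `kubectl apply -f` should accept.

```typescript
// Build a Deployment and a Service from one set of labels so the Service
// selector always matches the pods the Deployment creates.
function appManifests(name: string, env: string, image: string, port: number) {
  const labels = { app: name, env };
  const deployment = {
    apiVersion: "apps/v1",
    kind: "Deployment",
    metadata: { name: `${name}-${env}`, labels },
    spec: {
      replicas: 1,
      selector: { matchLabels: labels },
      template: {
        metadata: { labels },
        spec: { containers: [{ name, image, ports: [{ containerPort: port }] }] },
      },
    },
  };
  const service = {
    apiVersion: "v1",
    kind: "Service",
    metadata: { name: `${name}-${env}` },
    spec: {
      type: "LoadBalancer",
      selector: labels, // same object as the Deployment's matchLabels
      ports: [{ port: 80, targetPort: port }],
    },
  };
  return { deployment, service };
}

const manifests = appManifests("web", "staging", "nginx:1.25", 8080);
const list = { apiVersion: "v1", kind: "List", items: [manifests.deployment, manifests.service] };
console.log(JSON.stringify(list, null, 2));
```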

You do need some way of saying "this is the base configuration and this is
what we change for staging and production". Helm is a way to do that, and a
popular one, but it's pretty ugly. Hence this article.
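"Base configuration plus what changes per environment" boils down to a deep merge of an overlay onto a base. A minimal TypeScript sketch; the merge rules (nested objects merge key by key, scalars and arrays replace) are an assumption, similar in spirit to layering a Helm values file over a chart's defaults, not Helm's actual implementation.

```typescript
type Plain = { [key: string]: unknown };

function isPlainObject(v: unknown): v is Plain {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Merge `overlay` onto `base` without mutating either: nested objects merge
// recursively, while scalars and arrays in the overlay replace the base value.
function deepMerge(base: Plain, overlay: Plain): Plain {
  const out: Plain = { ...base };
  for (const key of Object.keys(overlay)) {
    const b = base[key];
    const o = overlay[key];
    out[key] = isPlainObject(b) && isPlainObject(o) ? deepMerge(b, o) : o;
  }
  return out;
}

const base: Plain = {
  replicas: 1,
  image: "registry.example/app:1.0",
  resources: { cpu: "100m", memory: "128Mi" },
};
const production: Plain = { replicas: 3, resources: { memory: "512Mi" } };

console.log(JSON.stringify(deepMerge(base, production)));
```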

~~~
illumin8
This is how you do it in AWS CloudFormation:

    Parameters:
      AMI:
        Type: AWS::SSM::Parameter::Value<String>
        Default: /aws/service/ecs/optimized-ami/amazon-linux/recommended/image_id

That gives you the latest, fully patched base OS image no matter which of 19
regions you launch it in.

Even hardcoding that in a K/V store is going to get outdated unless you
manually update it. Parameters like this are great because you can simply
write your code once and never have to update unless you're adding new
functionality. All base parameters and external systems (APIs, etc.) are
parameterized and never need to be updated, except by your SaaS partners, who
update them for you.

------
l0b0
YAML is the bastard offspring of XML. A bunch of ways to write semantically
identical stuff is bullshit. Let data files be data files, and build them
using languages actually suited for the job. JSON is plenty complex for 105%
of the use cases of YAML, with far fewer downsides.

------
jeremysalwen
When I joined Google, I realized that it was kinda surprising I hadn't seen
something like GCL in the outside world (not to say it doesn't exist, just
that it isn't ubiquitous enough for me to have heard of it). This seems to be
an example of that hole being filled.

------
kyberias
Why is it so hard to write the meat of the argument in the first paragraph? So
much setup...

------
shusson
I've recently settled on typescript object literals for creating complex
config files.

Adv:

- It's still quite declarative.
- Supports expressions.
- Supports merging objects using the spread `...` operator, which enables breaking up large config files into smaller files.
- Supports type constraints.
- Serialises to JSON easily.
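Put together, those properties look something like this. A small TypeScript sketch with a hypothetical `ServerConfig` shape; in practice `defaults` would likely live in its own file and be imported.

```typescript
// Hypothetical config shape; the interface provides the type constraints.
interface ServerConfig {
  host: string;
  port: number;
  logLevel: "info" | "debug";
  features: string[];
}

// Could live in its own file (e.g. defaults.ts) and be imported.
const defaults: ServerConfig = {
  host: "0.0.0.0",
  port: 8080,
  logLevel: "info",
  features: [],
};

// Expressions work anywhere a value is expected.
const basePort = 8000;

// Spread merges objects and later properties win; the annotation still checks
// the result, so e.g. `port: "80"` or `logLevel: "verbose"` fails to compile.
const production: ServerConfig = {
  ...defaults,
  port: basePort + 443,
  features: ["metrics"],
};

console.log(JSON.stringify(production));
```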

~~~
tobyhinloopen
Same here... just using the same language as the rest of the application,
usually a file with constants or a class/struct/module/whatever.

The only reason config files in another language might be required is when
you need to configure the application after it has been compiled to native
binary code.

Obviously this doesn't apply to all these applications written in scripting
languages.

I do wonder how you create type-safe config files, though. I currently have a
`development.ts` and a `production.ts`, where `production.ts` is only
loaded when `NODE_ENV === "production"`. `production.ts` contains blank/default
values, and the file is overwritten on the server with production secrets.
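One way to keep both environments type-safe is to have them share an interface and read secrets from the environment at startup rather than overwriting a file on the server. A minimal TypeScript sketch; `AppConfig`, `loadConfig`, and the `DB_URL` variable are hypothetical names, and the environment is passed in as a plain object so it runs without Node typings.

```typescript
interface AppConfig {
  dbUrl: string;
  debug: boolean;
}

// Both environments must satisfy the same interface, so a missing or
// misspelled key in either one fails at compile time.
const development: AppConfig = { dbUrl: "postgres://localhost/dev", debug: true };

// Secrets come from the environment at startup instead of a file overwritten
// on the server. DB_URL is a hypothetical variable name.
function productionConfig(env: Record<string, string | undefined>): AppConfig {
  const dbUrl = env.DB_URL;
  if (!dbUrl) throw new Error("DB_URL must be set in production");
  return { dbUrl, debug: false };
}

// In Node this would be called as loadConfig(process.env).
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  return env.NODE_ENV === "production" ? productionConfig(env) : development;
}
```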

~~~
shusson
> I do wonder how you create type-safe config files though

The types help prevent erroneous config declarations.

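e.g.

```
enum LogLevel {
    Info = "info",
    Debug = "debug",
}

// Annotating the object with an interface means a misspelled key or a
// level that isn't a LogLevel member fails to compile.
interface Config {
    logOptions: {
        level: LogLevel;
    };
}

const MyConfig: Config = {
    logOptions: {
        level: LogLevel.Debug,
    },
};
```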

And since we serialise `MyConfig` from TypeScript, the config has to be
compiled by TypeScript, which is where the type errors get caught.

------
zoobab
I use j2cli to template my YAMLs; it's pretty powerful for templating stuff
without using Ansible. You just need to set the right environment variables,
and you can add if logic if you want.

------
gjvc
We cannot now be far off some ansible [1]-like tool using LISP s-expressions
for its configuration language.

[1] or any of that ilk

------
spacesuitman2
A consistent observation has been that all config languages tend to become
Turing-complete over time.

------
Animats
Next, inheritance for JSON! Look how well it worked for CSS.

