Why are we templating YAML? (leebriggs.co.uk)
263 points by jaxxstorm on Feb 7, 2019 | hide | past | favorite | 346 comments

My belief is that we've been slowly building up to using general purpose languages, one small step at a time, throughout the infrastructure as code, DevOps, and SRE journeys these past 10 years. INI files, XML, JSON, and YAML aren't sufficiently expressive -- lacking for loops, conditionals, variable references, and any sort of abstraction -- so, of course, we add templates to them. But as the author (IMHO rightfully) points out, we just end up with a funky, poor approximation of a language.
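The contrast can be sketched in a few lines of Python (a simplified Kubernetes Service shape, purely illustrative). Since JSON is valid YAML 1.2, json.dumps stands in as the serializer:

```python
import json

# Templating YAML as text bolts loops and escaping onto a data format.
# Generating the data structure in a real language and then serializing
# it sidesteps both problems.
def service_manifests(names):
    return [
        {
            "apiVersion": "v1",
            "kind": "Service",
            "metadata": {"name": name},
            "spec": {"ports": [{"port": 80}]},
        }
        for name in names  # a real for loop, not a {{ range }} template hack
    ]

manifests = service_manifests(["web", "api"])
print(json.dumps(manifests, indent=2))
```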

I think this approach is a byproduct of thinking about infrastructure and configuration -- and the cloud generally -- as an "afterthought," not a core part of an application's infrastructure. Containers, Kubernetes, serverless, and more hosted services all change this, and Chef, Puppet, and others laid the groundwork to think differently about what the future looks like. More developers today than ever before need to think about how to build and configure cloud software.

We started the Pulumi project to solve this very problem, so I'm admittedly biased, and I hope you forgive the plug -- I only mention it here because I think it contributes to the discussion. Our approach is to simply use general purpose languages like TypeScript, Python, and Go, while still having infrastructure as code. An important thing to realize is that infrastructure as code is based on the idea of a goal state. Using a full blown language to generate that goal state generally doesn't threaten the repeatability, determinism, or robustness of the solution, provided you've got an engine handling state management, diffing, resource CRUD, and so on. We've been able to apply this universally across AWS, Azure, GCP, and Kubernetes, often mixing their configuration in the same program.
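The "goal state plus engine" idea can be sketched roughly like this (illustrative only -- these names are not Pulumi's actual API). The program computes a desired state; a separate engine diffs it against the current state and plans the CRUD operations:

```python
# The program's only job is to produce `desired`; the engine owns state
# management, diffing, and resource CRUD.
def plan(current: dict, desired: dict) -> dict:
    return {
        "create": sorted(desired.keys() - current.keys()),
        "delete": sorted(current.keys() - desired.keys()),
        "update": sorted(k for k in current.keys() & desired.keys()
                         if current[k] != desired[k]),
    }

current = {"bucket": {"versioning": False}, "old-vm": {}}
desired = {"bucket": {"versioning": True}, "cluster": {"nodes": 3}}
print(plan(current, desired))
# {'create': ['cluster'], 'delete': ['old-vm'], 'update': ['bucket']}
```

Because the diff happens on data, not on the program that produced it, using a full language to build `desired` doesn't cost you repeatability.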

Again, I'm biased and want to admit that, however if you're sick of YAML, it's definitely worth checking out. We'd love your feedback:

- Project website: https://pulumi.io/

- All open source on GitHub: https://github.com/pulumi/pulumi

- Example of abstractions: https://blog.pulumi.com/the-fastest-path-to-deploying-kubern...

- Example of serverless as event handlers: https://blog.pulumi.com/lambdas-as-lambdas-the-magic-of-simp...

Pulumi may not be the solution for everyone, but I'm fairly optimistic that this is where we're all heading.


This is a great analysis, but it's missing a fundamental point: why do we have a problem with these approximations of a programming language or just using a programming language to template stuff?

Because your build then becomes an actual program (i.e. Turing complete) and you have to refactor and maintain it! This is the common problem of using a "programming language as configuration" (e.g. gulp?)

Dhall solves exactly this problem: https://dhall-lang.org

It has the same premises as Pulumi, but without the Turing completeness (I don't know if/how Pulumi avoids that, but if it does, it should be part of the pitch), so you cannot shoot yourself in the foot by building an abstraction castle in your build system/infrastructure config.

We use it at work to generate all the Infra-as-Code configurations from a single Dhall config: Terraform, Kubernetes, SQL, etc.

And there is already an integration with Kubernetes: https://github.com/dhall-lang/dhall-kubernetes

> We use it at work to generate all the Infra-as-Code configurations from a single Dhall config

This is the key bit and not something which is pitched well enough from the Dhall landing pages: using straight YAML forces you to repeat yourself in multiple areas for each individual tool being used, and these repetitions have to stay consistent across multiple tools. What Dhall does is allow you to write a single config and use it to derive the correct configurations for each tool that you use. So you can write a single configuration file from which, eventually, every single part of your system is derived - Terraform infrastructure, Kubernetes objects, application config, everything. When you pull it off, it's simply magical.
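The single-source-of-truth shape, sketched in plain Python (tool schemas heavily simplified; the field names are illustrative, not real Terraform/Kubernetes schemas):

```python
# One source of truth, per-tool projections. Change APP["replicas"] once
# and every derived config stays consistent.
APP = {"name": "billing", "replicas": 3, "image": "billing:1.4.2"}

def k8s_deployment(app):
    return {"kind": "Deployment",
            "metadata": {"name": app["name"]},
            "spec": {"replicas": app["replicas"],
                     "template": {"spec": {"containers": [
                         {"name": app["name"], "image": app["image"]}]}}}}

def terraform_asg(app):
    return {"resource": {"aws_autoscaling_group": {
        app["name"]: {"desired_capacity": app["replicas"]}}}}
```

Dhall adds what Python can't: a type system guaranteeing the projections are total and terminating.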

You can think of it like this: JavaScript is a horrible, no-good, very bad language, and yet all browser programming is done in JavaScript because every browser supports it - so too, are JSON and YAML horrible configuration languages. But JavaScript gave rise to abstractions like TypeScript which are much better languages which compile down to JavaScript for compatibility. TypeScript is to JavaScript what Dhall is to JSON and YAML - the fact is, pretty much everything is configured with JSON and YAML, and Dhall makes it much, much easier to live in that world, with no need for the systems being configured to support it.

Considering the relative obscurity of Dhall, it's basically the best-kept secret in the DevOps world right now, and it's a shame more people don't know about it.

Dhall appears to be expressive enough that I can't see why you wouldn't have to refactor and maintain the Dhall code?

Writing Dhall code looks exactly like programming to me, and the programmer must possess the necessary programming skills to produce good Dhall code. A random guy with a text editor will make just as much of a mess in Dhall as they would with a “real” programming language.

I don't see how the restrictions in Dhall really help much in this regard. Turing completeness feels like a red herring to me.

Not a user of Dhall, just a fan, but refactoring of Dhall configuration should be extremely easy. You make a change, and your configuration stays the same, which is easy to verify. (Thanks to https://en.wikipedia.org/wiki/Normalization_property_(abstra... )

For TC languages, comparing if two programs (original and refactored) do the same thing is not solvable in general. If the language is not TC then it is more feasible.

You can compare the outputs of two programs.

Sure, a TC program may not finish to produce output you can compare, but in my experience that's only a theoretical problem.
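That output-comparison approach, sketched in Python (the config functions are made up for illustration): if config generation is a pure function of its inputs, a refactoring can be checked by evaluating both versions over the inputs you care about.

```python
# Original config generator.
def config_v1(env):
    return {"replicas": 3 if env == "prod" else 1, "debug": env != "prod"}

# Refactored version -- should be observably identical.
def config_v2(env):
    prod = env == "prod"
    return {"replicas": 3 if prod else 1, "debug": not prod}

# Compare outputs over all inputs in use.
assert all(config_v1(e) == config_v2(e) for e in ["prod", "staging", "dev"])
```

Dhall's normalization goes further: it can prove equivalence for all inputs, not just the ones you enumerate.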

You can do more than just compare the output of two programs in Dhall. You can verify using a semantic integrity check that two programs are the same for all possible inputs. For example:

  $ dhall hash <<< 'λ(x : Natural) → x + 0'

  $ dhall hash <<< 'λ(x : Natural) → x'

  $ dhall hash <<< 'λ(y : Natural) → y'
The cryptographic hash is smart enough that many behavior-preserving changes don't perturb the hash.

Actually with Dhall, you should be able to compare the programs themselves, even without full "input" (there is even an example on the Dhall page; see "You can reduce functions to normal form, even when they haven't been applied to all of their arguments").

So you can for example leave some parameters out of your config and still validate the correctness of refactoring.

If you use a general purpose programming language, then even comparing just the output might be difficult - most languages allow you to do I/O, so it's possible that the configuration depends on some side channel.

I would say if you are only using general language "sensibly" for configuration then you are effectively restricting yourself in the same way that Dhall does.

This sounds like a render test?

This is how Lua started, as a config language, but it gradually added more features that people found useful in config, and became Turing complete.

Lua was TC from the start: it came with the procedural concepts from Modula - if/while/repeat - and functions.

I meant SOL, the predecessor, before Lua proper.

What's so bad about Turing completeness? I haven't had a decent look at Dhall, but I'm betting I could probably write an exponential Dhall program that won't terminate in the lifetime of the universe.

The real reason for giving up Turing equivalence was probably to get dependent types. This gives very powerful static guarantees, including the presence/absence of fields under non-trivial record operations such as merge. In using dependent types, they have also had to give up significantly on type inference, which is really going to annoy the average JavaScript/Ruby programmer.

I don't get the problem with using a turing complete language to generate configuration. There's nothing wrong with maintaining and refactoring a program, that's a natural process for any program. If you don't want an infinite loop, don't write one, as you wouldn't in any other program. You can choose as much or as little abstraction as you so wish.

Give me a real language any day over dhall or jsonnet.

This explains the disadvantages of using a general-purpose programming language as a configuration language:


FWIW jsonnet is a "real" language. It's a dynamically typed, lazily evaluated, purely functional programming language.

Fair enough. I should have said "general purpose language" rather than "real", which makes for flame-bait.

I once built a mandelbrot fractal renderer which emitted a data-URL encoded PNG string to stdout in BCL (a spiritual predecessor of Jsonnet @ Google).

Yeah, I know what you mean. It lacks generic input/output: you cannot read or write arbitrary files, perform arbitrary network requests, etc.

I do like that restriction in the context of managing configuration systems, because it allows you to build hermetic evaluations.

With kubecfg we added the ability to import from URLs, which I wish was available out of the box in jsonnet.

> you have to refactor and maintain it

You already have to do that, so why not do it in a reasonably powerful language?

Here's a nice explanation on why using "reasonably powerful languages" has many disadvantages: https://github.com/dhall-lang/dhall-lang/wiki/Safety-guarant...

Also you might be familiar with the Rule of Least Power: https://en.wikipedia.org/wiki/Rule_of_least_power

> My belief is that we've been slowly building up to using general purpose languages, one small step at a time, throughout the infrastructure as code, DevOps, and SRE journeys these past 10 years.

I think that you’re right, and I think it’s great, because we have a programming model in which code is data and data is code: Lisp & S-expressions.

It’d be downright awesome to have a Lisp-based system which used dynamic scoping to meld geographical & environmental (e.g. production/development) configuration items. But then, it’d be downright awesome if the world had seriously picked up Lisp in the 80s & 90s, and had spent the last twenty years innovating, rather than reïnventing the wheel, only this time square-shaped. But then, the same thing could be said about Plan 9 …

I’ve not yet had the time to take a look at Pulumi, but I hope to have time soon.

> I think that you’re right, and I think it’s great, because we have a programming model in which code is data and data is code: Lisp & S-expressions.

"Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp." -- Greenspun's Tenth Rule


Seriously, this has happened again and again and again. You have software, so you configure it via a clean and simple text syntax, then the configuration needs to be generated and the syntax becomes more complicated, then the next system you do has an "API" instead so you can configure it via programming, which is too complicated so the next time you Do it Right and go with a simple text file, which is then outgrown when the configuration it stores becomes too complicated...

It's like a circle of life thing.

And people are vehemently agreeing/disagreeing depending on their phase shift in the Turing complete vs declarative carousel.

Compare with: strongly vs weakly typed languages

That saying was very true of Fortran, reasonably true of C, and mostly doesn't happen with newer languages.

I think the parts of Lisp that tended to be rebuilt have mostly been incorporated into the newer languages. (At least, it's been a very long time since I've had to rewrite a fundamental data structure, etc.)

You don’t need code-is-data for what your parent is describing. All you need is code that outputs data. Or even better, code that initiates contact with other code.

The only requirement is a commitment to doing things imperatively in a real programming language. It’s hard to resist the temptation to do things declaratively (because it’s easier to imagine a declarative interface that describes your problem than an abstraction of the procedure which will solve it) but you are never forced to.

As the kids say: stop trying to make Lisp happen, it's not going to happen.

It has become yet another community that's fighting a struggle that everyone else ended years ago, like the few Japanese soldiers in jungles who refused to surrender. I'm not entirely sure why it's not been adopted, but I suspect it's because most people strongly prefer (a) visually semantically different scope delimiters and (b) function-outside-brackets syntax, i.e. f(a, b) rather than (f a b).

Or you could go the other way and say that JSON is s-exps with curly brackets so it should be made executable as such, and build that language.

> As the kids say: stop trying to make Lisp happen, it's not going to happen.

That's probably true, but I think it's useful to fight the good fight regardless. Even if Lisp & s-expressions don't, in fact, take over the world (and I think they will), arguing in their favour might help increase the chance that whatever inferior technology does end up getting adopted is better than it could have been.

> Or you could go the other way and say that JSON is s-exps with curly brackets so it should be made executable as such, and build that language.

The problem is that without symbols, that ends up being hideously ugly. This:

    ["if", ["<", 1, 2],
     "less than",
     "greater than or equal to"]
is appreciably worse than:

    (if (< 1 2)
        "less than"
        "greater than or equal to")
And alternatives like:

    {"if": [[1, "<", 2], "less than", "greater than or equal to"]}
are so much worse that I don't think anyone could seriously expect to use them.
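For what it's worth, the JSON-as-s-expressions idea is trivially executable. A toy evaluator in Python (my own sketch, supporting just the two forms from the example above): arrays whose head is an operator name are forms, everything else is literal data.

```python
# Minimal evaluator for JSON-encoded s-expressions.
def evaluate(expr):
    if not isinstance(expr, list):
        return expr  # literals evaluate to themselves
    op, *args = expr
    if op == "if":
        cond, then, alt = args
        return evaluate(then) if evaluate(cond) else evaluate(alt)
    if op == "<":
        a, b = args
        return evaluate(a) < evaluate(b)
    raise ValueError(f"unknown operator: {op}")

print(evaluate(["if", ["<", 1, 2], "less than", "greater than or equal to"]))
# less than
```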

> It has become yet another community that's fighting a struggle that everyone else ended years ago, ... like the few Japanese in jungles who refused to surrender.

Nice imagery, but the wrong point.

Except for the syntax, everybody else joined Lisp.

"We were not out to win over the Lisp programmers; we were after the C++ programmers. We managed to drag a lot of them about halfway to Lisp." --Guy Steele

Flash back to the mid-1980's (when the mainstream was C, Pascal, BASIC, FORTRAN, COBOL, etc.) and it's Lisp/Scheme (and Smalltalk) that have features like Garbage Collection, interactive development, lexical closures, decent built-in data structures, dynamic typing.

The fact that all of this is commonplace today, both justifies a lot what Lisp did in the first half of its existence and undermines its (technical) competitive advantages now.

> but I suspect it's because most people strongly prefer (a) visually semantically different scope delimiters and (b) function-outside-brackets syntax ie f(a, b) rather than (f a b).

It's not technical. I don't think it ever was. So much of it is around social concerns: a performance stigma dating back to the 1970's, fear of being able to hire people to do the work, fear of what VC's will think, worries that the language will still be available... And then at the end of the day, the problems whatever language will solve are a tiny fraction of the overall problem of doing something relevant and lasting and useful to others.

> As the kids say: stop trying to make Lisp happen, it's not going to happen.

Life is too short and the world is too big to try to confine other people's ideas of how they should think or work.

The point of the market economy and of the scientific process is that people get to try what they think is going to be useful and then let the world decide. The fact that Lisp is still in the conversation at all, when its contemporaries (Autocoder, Fortran) either aren't or are highly specialized, says a lot that we can learn from.

>As the kids say: stop trying to make Lisp happen, it's not going to happen.

Mean Girls came out in 2004; no kid knows that movie.

Oh my! So web-assembly is not 'happening' then ? May it REST in peace.

I think what you're doing with pulumi is the right answer and it's only a matter of time before this becomes the norm. The author's examples could easily be done with plain ol' JS/ES/TS with far more extensibility and customization when the need arises.

I also feel this is where JSX got it right. Instead of creating yet-another-templating-language (looking at you Angular!), they used JavaScript and did a great job of outlining how interpolation works. Any new templating language is always going to be missing some key feature you expect out of a general programming language and your customers will continue to ask for more features.

Take for example Terraform and HCL: they're continually adding more and more [templating features](https://github.com/hashicorp/terraform/blob/master/website/d...) and [functions](https://www.terraform.io/docs/configuration/interpolation.ht...) because there are so many different ways to skin configuration/infrastructure as code. What if TF just expected a "computed" JSON object and it were left up to the developer to figure out how to put it together?

I'm gonna keep an eye on Pulumi and hope to be able to use it in a real project soon.
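Terraform does in fact accept JSON alongside HCL (files named *.tf.json), so "just hand TF a computed JSON object" is possible today: generate the object in any language and write it out. A sketch (resource values are illustrative):

```python
import json

# Build the Terraform resource block as plain data -- loops, conditionals,
# and abstraction come from Python, not from HCL's templating features.
def instances(amis):
    return {"resource": {"aws_instance": {
        name: {"ami": ami, "instance_type": "t3.micro"}
        for name, ami in amis.items()}}}

tf = instances({"web": "ami-aaa", "api": "ami-bbb"})
print(json.dumps(tf, indent=2))  # write this to main.tf.json
```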

Crazy idea, but couldn't we use JSX for configuration?

    <AutoScalingGroup name='Main cluster'>
      <LaunchConfig imageId='ami-xx'>...</LaunchConfig>
    </AutoScalingGroup>

Paired with Typescript, we would have the clearness of a declarative language, with the power and flexibility of a real language that is also easy to extend and navigate.

As a bonus, most tooling already exists.
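What a JSX tree like that desugars to is just nested calls that return plain data, which you could sketch in any language (component names and fields here are hypothetical):

```python
# Each "component" is a function returning a dict, so the whole tree is
# inspectable data -- the declarative look with a real language underneath.
def AutoScalingGroup(name, *children):
    return {"type": "AutoScalingGroup", "name": name,
            "children": list(children)}

def LaunchConfig(image_id):
    return {"type": "LaunchConfig", "imageId": image_id}

cluster = AutoScalingGroup("Main cluster", LaunchConfig("ami-xx"))
print(cluster["children"][0]["imageId"])
# ami-xx
```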

You just invented XML!

This is sitting right on the genius/insanity border.

I had the exact same idea! Does something like that exist?

In .NET land, there's Razor, which was designed from the get-go to mesh well with C# syntax such that you need a minimal amount of control characters:


In ROS we have these XML launch files that are just awful. They have enough features to be a really bad programming language for configuring and launching (often conditionally) numerous robot software nodes.

In ROS2 the launchfile can now just be a Python script. Very much learned all this the hard way and the solution was to just support Python. I think it's brilliant.

AOLServer and our own Tcl based application server also used this idea.

Configuration files for each component were a DSL made of Tcl functions. Each module just sourced the respective file on load.

There are several possible situations:

- the django like situation: the configuration is pure code, and it's a mistake. It was not necessary, it brought plenty of problems. I wish they went with a templated toml file.

- the ansible like situation: the configuration is templated static text. But with something as complex as deployment, they ended up adding more and more constructs, until they created a monstrous DSL on top of their implementation language, with zero benefits compared to it and plenty of pitfalls. In that case, they should have made a library, with an API and documentation placing an emphasis on best practices.

- and of course a big spectrum between those

The thing is, we see configuration as one big problem, but it's not. Not every configuration scenario has the same constraints and goals. Maybe you need to accept several sources of data. Maybe you need validation. Maybe you need generation. Maybe you need to be able to change settings live. Maybe you need to enforce immutable settings. Maybe you need to pub sub your settings. Maybe you need to share them in a central place. Maybe they are just for you. Maybe you want them to be distributed. Maybe you need logic. Maybe you want to be protected from logic. Maybe the user can input settings. Maybe you just read conf. Maybe you generate it.

So many possibilities. And that's why there is not a single configuration tool.

What you would need is a configuration framework, dealing with things like merging conf, parsing files, getting conf from the network, expressing constraints, etc.

But if you recreate a DSL for your config, it's probably wrong.
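One of those framework concerns, merging configuration from several sources, can be sketched in a few lines (later layers override earlier ones; the example config is made up):

```python
# Recursively merge `override` into `base`, dicts merging key-by-key and
# everything else being replaced outright.
def deep_merge(base, override):
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = value
    return out

defaults = {"db": {"host": "localhost", "port": 5432}, "debug": True}
prod = {"db": {"host": "db.internal"}, "debug": False}
print(deep_merge(defaults, prod))
# {'db': {'host': 'db.internal', 'port': 5432}, 'debug': False}
```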

In defence of Django, the way settings.py works has been very stable for the entire lifetime of Django.

It may have its problems (I don't have many issues with it) but it doesn't seem to have this problem of attracting ever more layers of abstraction on top of it. It works.

Actually, I think settings.py is not a bad idea, but it's half-baked.

There should be a schema checking the settings file. There should be a better way to extend settings, and to make different settings according to context, such as prod, staging or dev.

There should be a linter avoiding stupid mistakes like missing a comma in a tuple, resulting in string concatenation.

There should be variables giving you basic stuff like current dir, log dir, var dir, etc. We all make them anyway.

And there should be a better way to debug the settings import problem.

But all in all, it's quick and easy to edit, and very powerful.
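The schema-check and linter ideas above can be sketched in plain Python (the SCHEMA mapping is hypothetical; Django ships no such built-in for settings.py):

```python
# A toy schema check over a settings mapping.
SETTINGS = {
    "DEBUG": False,
    "ALLOWED_HOSTS": ["example.com"],
}

SCHEMA = {"DEBUG": bool, "ALLOWED_HOSTS": list}

def check(settings, schema):
    return [f"{key}: expected {typ.__name__}"
            for key, typ in schema.items()
            if not isinstance(settings.get(key), typ)]

print(check(SETTINGS, SCHEMA))  # [] -- all good

# The missing-comma trap a linter should catch: adjacent string literals
# concatenate silently, so this "tuple" is actually a single string.
APPS = ("django.contrib.admin"
        "django.contrib.auth")
assert isinstance(APPS, str)
```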

There is already a mechanism to validate the settings.py file inside django.

The different context stuff can be handled by using env vars, and a nice python wrapper, like python-decouple.

> There is already a mechanism to validate the settings.py file inside django.

It's not exposed, and it's very limited.

> The different context stuff can be handled by using env vars, and a nice python wrapper, like python-decouple.

It's just one of the ways to do it. Go to a new project, and they use a different way. The main benefit of Django is that a Django project is well integrated, and you find similar conventions and structure from project to project, allowing you to reuse the skills you've learned and build an ecosystem of pluggable apps.

Just so we're on the same page, this is the validation I was referring to -


Standardization is always an issue, I guess. Env vars seem to be the norm in the community in my experience, whatever that's worth..

Ah the stuff used for the password ?

I would be more of a fan of something like marshmallow, checking the whole thing.

> it brought plenty of problems

Has anybody here personally suffered the problems that the Turing-complete Django configuration creates? (I mean, not the ones caused by the lack of completeness checks or good library support, but the ones caused by too much power.)

If so, what do those problems look like?

Now that you say it, it's true I didn't have problems with too much power.

I never had an untrusted party editing my config, nor did I use data from any.

Also, you can make the same mistakes in the settings file as in any code file, but that's no more or less of a problem there.

In fact, all the problems I had could have been solved by better integration: solving the import problem, making composition easy, adding checks, allowing data to be loaded from several sources and merged, presenting it all in a unified interface.

If I'm being honest, the problem with settings.py may not have been that it's Python, but that it's a flat file with no strong conventions, tooling or best practices.

I could raise the issue that you can't read the config from another language, but I never had to, and good tooling would allow a synced export or an API to consume the settings.

Same for writing, or live settings.

After years of working with cfengine then ansible I finally went to a bespoke BSD ports work-alike with optional client/server and JSON configuration components. Never looked back.

What does it look like ?

RCS stored directory based modules with tasks in subdirectories. Make or shell script style module execution as part of each task dir + variable files containing settings for the install task. Json configuration files that define all necessary module params (ex:log, task selection, stop on error, initialization, build command per task, etc...) remote scheduling of module/task execution via per agent sysv ipc command queue serviced by a JSON-RPC microsvc which allows both serialized and non blocking task scheduling by queue priority.

I owned the majority of the configuration system and ecosystem for Borg, Google's internal cluster management and application platform.

Unfortunately, what is described here is good on many levels, but not excellent at any.

If you're OK with describing the complexity of your infrastructure in a general purpose language, then a well-abstracted API built on the cloud providers' original APIs is more familiar to devs. And it will be more reliable, performant, and flexible.

If you want a config experience, something like kustomize is leaner and more compatible with the text config model.

I also cannot see how this interoperates with other tools, which will seriously limit its appeal to people using other tools.

The problem with code as configuration is that the config file is nondeterministic and it takes longer to extract information from the file.

This has long been a problem in the python/pip community, as it's basically impossible for the build tools to determine the dependencies of a package without fully downloading and running the setup.py file.

Static config files are static for a reason!

Unless you import rand(), your code should be deterministic. You're right about needing to run the thing to get the data (that's the point), but there is a middle ground between pure literals and fully side-effecting code. For example, you could impose pure functions (no side effects).

That's exactly what Dhall is doing.

That's what Haskell already does. Dhall is optimizing on different dimensions (making sure the script execution ends, making the scripts verifiable statically, making it convenient to merge files, making it convenient to centralize your configuration).

As a happy pulumi user, I have to say I am very impressed with the experience. An order of magnitude improvement on maintainability over our old terraform code base. Highly recommended.

This is my experience and it's clearly biased from maybe one bad example but ... SCons is an example of code over configuration, and from what I could tell I never met someone that truly understood it. Because it was code over configuration, every programmer added their own interpretation of what was supposed to happen; no programmer truly understood what was really going on, and it turned into one giant mess of trying to understand different programmers' hacks and code to get the build to work. I'm sure some SCons expert will tell me how I'm full of crap but I'm just saying, that's my experience.

So, what's my point? My point is configuration languages help in that they push "the one true way" and help to enforce it. Sure, there are times you end up having to work around the one true way, but handing everyone the very powerful tools of a full language for configuration leads to chaos, or at least that's my experience. Instead of being able to glance at the configuration and understand what's happening because it follows the one true way, you instead end up with a configuration language per programmer, since every programmer will code up stuff a different way.

For what it's worth--I've been using Pulumi on a couple of different projects and, today, I couldn't imagine starting a cloud-based project on anything else. The Pulumi team has spent more time than almost anybody I know on understanding how to attack these problems; I guess I have a bit of an understanding of just how much work that is, as I've tried to do the same thing and their solution is better.

I appreciate that their revenue model doesn't require making the open-source version frustrating or stupid and I appreciate that they're incredibly responsive. And some of the stuff you'll see around cloud functions/Lambdas and the deployment thereof will fucking blow your mind.

It's good. You should strongly consider it.

I have been using ksonnet but that is now officially dead. Working with jsonnet seemed unnecessarily painful when coming from coding typescript. This information is quite timely and welcome, I'll look further at the ts example.

We have ksonnet expats on the team (we're all in cloud city -- Seattle), and I've been keeping an eye on that project myself, since I think it got a lot of things right and frankly many of the ideas for Pulumi were inspired by early chats with the Heptio team. But, as you say, why create a new language when an existing one will do -- that was our original stance and it's working great in practice.

Joe Beda will be doing a deep dive on Pulumi on the TGIK videocast tomorrow, so it's a timely opportunity to check it out: https://twitter.com/jbeda/status/1092963296565587969

OP here. I actually wrote a post about Pulumi in this very space a while back


I do think this is more like what we should be doing, but I was dismayed to see Pulumi’s free tier get sunsetted

Our free tier is still there and here to stay. What did we do to make you think it's been sunsetted? :-(

Oh! I don’t know where I got that impression from then! perhaps I just thought that we couldn’t use the free tier because of the number of licenses we’d need, but you’re right, it’s still there!

Build files (e.g. makefiles and their various descendants like SCons, rake, etc.) seem to be in the same general boat, except very early on mixing "real languages" (or at least shell scripting) was obviously allowed, so they've always leaned far more towards the "yes, it is a general purpose language" end of the spectrum.

> My belief is that we've been slowly building up to using general purpose languages, one small step at a time, throughout the infrastructure as code, DevOps, and SRE journeys these past 10 years. INI files, XML, JSON, and YAML aren't sufficiently expressive -- lacking for loops, conditionals, variable references, and any sort of abstraction -- so, of course, we add templates to them. But as the author (IMHO rightfully) points out, we just end up with a funky, poor approximation of a language.

This is why I prefer to use a JS file for configuration instead of a native JSON or YAML file if those options are available.

Also see `webpack` as a successful example of code-as-configuration in the wild.

Not sure if it was successful when people call it hell to maintain and newer, simpler alternatives like Parcel are gaining popularity.

I still don't know how to get it to do exactly what I want. There is far too much magic involved, and experience has long demonstrated that magic is bad (Webpack confirms that belief).

That being said, the concept of defining a function in, essentially, a config file seems like a step in the right direction. I don't think I'd trust that functionality outside of builds or infra-as-code, though.

What's magic about webpack? The online documentation provides quite a lot of insight into how it all fits together.

It probably only seems like magic because you didn't build a fundamental understanding of how it works before using it. I use some massive webpack configurations and I understand them all quite thoroughly thanks to well-written, modularized configuration files.

For 10 years of Java/Android/Scala coding there was no need to understand how compilers combine everything into one JAR.

Javascript is a scripting language without native module support. That isn't Webpack's fault.

Webpack also handles much, much more than just Javascript. It handles CSS, HTML, images, files, pretty much any kind of asset. Java/Scala doesn't have anything like that. Asset management is completely different due to the nature of how assets are transferred to the client.

And Android? Give me a break. The moment you stray from the strict layout of an Android app you run into a wall and have to learn how Gradle operates. This strict layout is good for some but others hate when an environment forces particular constraints upon them.

Webpack is completely configurable at every stage, works with plugins (which compilers don't do) and again, isn't magic. Not knowing how something works doesn't make it magic. That's not what magic means with respect to code.

Besides... Maybe if you just like getting by, you can program in C/Java/etc without learning about compilers. Web dev is fucked and transpiler knowledge is basically required, but sure you can get by in other domains without it. But if you want to be a good programmer, an expert at what you do, someone who lives and breathes and understands computer science, someone who will excel in his career and not remain a code monkey forever... You have to learn about how your compilers work just like you should know how the silicon in your computer is doing its own "magic".

It was very successful. Complicated projects require complicated build config. Parcel does fine for simple projects, but lacks the raw power & configurability of webpack.

Webpack now does simple config as well with the 'mode: "production"' and 'mode: "development"' presets.

Hi, is Pulumi a generalized AWS CDK(https://github.com/awslabs/aws-cdk/blob/master/examples/cdk-...)? Looks pretty similar :D

Having dealt with puppet, cloudformation, ansible and other solutions that have gone in and out of fashion and also dealing regularly with Kotlin, Java, Javascript, and recently typescript, my view is that configuration files are essentially DSLs.

DSLs ought to be type safe and type checked since getting things wrong means all kinds of trouble. E.g. with cloudformation I've wasted countless hours googling for all sort of arcane weirdness that amazon people managed to come up with in terms of property names and their values. Getting that wrong means having to dig through tons of obscure errors and output. Debugging broken cloudformation templates is a great argument against how that particular system was designed. It basically requires you know everything listed ever in the vastness of its documentation hell and somehow be able to produce thousands of lines of json/yaml without making a single mistake, which is about as likely as it sounds. Don't get me started on puppet. Very pleased to not have that in my life anymore.

On a positive note, kotlin recently became a supported language for defining gradle build files in. Awesome stuff. Those used to be written in Groovy. The main difference: kotlin is statically compiled and tools like intellij can now tell you when your build file is obviously wrong and autocomplete both the standard stuff as well as any custom things you hooked up. Makes the whole thing much easier to customise and it just removes a whole lot of uncertainty around the "why doesn't this work" kind of stuff that I regularly experience with groovy based gradle files.

Not that I'm arguing for using Kotlin in place of JSON/YAML. But TypeScript seems like a sane choice. JSON is actually valid JavaScript, which in turn is valid TypeScript. Add some interfaces and boom, you suddenly have type safety. Now using a number instead of a boolean or string is obviously wrong. Also TypeScript can do multi-line strings, comments, etc. and it supports embedding expressions in strings. No need to reinvent all of that and template JSON when you could just be writing TypeScript.

I recently moved a yaml based localization file to typescript. Only took a few minutes. This resulted in zero extra verbosity (all the types are inferred) but I gained type safety. Any missing language strings are now errors that vs code will tell me about and I can now autocomplete language strings all over the code base which saves me from having to look them up and copy paste them around. So no pain, plenty of gain.
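The idea isn't TypeScript-specific, either. As a hedged illustration, here is the same config-as-typed-code move sketched in Python using dataclasses (all names below are invented for the example, not from any real project):

```python
from dataclasses import dataclass

# Config-as-code: the configuration is a typed object instead of an
# untyped YAML/JSON blob. A misspelled field name fails immediately at
# construction time (TypeError), and a type checker such as mypy flags
# wrong value types before anything is deployed.
@dataclass(frozen=True)
class ServiceConfig:
    name: str
    replicas: int
    debug: bool = False

# The "config file" is now plain code, so comments, constants, and
# expressions come for free:
BASE_REPLICAS = 2
config = ServiceConfig(name="web", replicas=BASE_REPLICAS * 2)
```

No extra verbosity over a YAML mapping, but typos and type mismatches surface in the editor instead of at deploy time.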

And yes, people are ahead of me and there are actually several projects out there offering typescript support for cloudformation as well.

To go with your general line of thought, see how many JS-based projects are increasingly moving towards a JS file with a default export as a config file.

This looks absolutely great! I’ll give this a thorough look over for our coming API development/deployment.

Thanks for plugging!

Are you going to add C# support?

Definitely. I was a part of C# in the early days, so little else would make me happier than awesome first-class .NET support. This'll be great for Azure folks -- who knows, PowerShell too?

We are actively working on https://github.com/pulumi/pulumi/issues/2430, which will make it easier for our small team to manage multiple languages. Once that lands, I would expect this to be high priority.

Some of our amazing community members have been prototyping this, and it's looking pretty promising: https://twitter.com/MikhailShilkov/status/109278757393889689....


> Definitely. I was a part of C# in the early days, so little else would make me happier than awesome first-class .NET support. This'll be great for Azure folks -- who knows, PowerShell too?

Powershell would be great, it has nice support for building DSLs.

I know I'm in a minority, but I really dislike YAML... I recently did a lot of Ansible and boy, at the beginning, I was just struggling a lot. Syntactic whitespace kills me.

I don't like it in Python either, but for some reason, when I write Python, it's a lot easier. Maybe YAML is just a bit more complex (and Python has better IDE support..?)

> Syntactic whitespace kills me.

Okay, I'm gonna be the asshole in the room, but how hard is it to just use consistent indentation? I can't count how many times I've heard people complain about significant whitespace in languages.

Not only is it not difficult to begin with, but every code editor and IDE will show you where there's a syntax error in your YAML. People are free to dislike YAML, even for its significant whitespace, but how does it "kill you"?

Look at this example from the article:


something: nothing

  hello: goodbye

This is pure sloppiness; anyone who carelessly adds pointless bytes to code, no matter the language, is sloppy. I don't understand why people criticize YAML and Python because "whitespace is hard".

P.S.: There's a configuration language called ArchieML, which is similar to YAML but doesn't have significant whitespace.


Three big things that annoy me even though I'm happily writing Python:

- "cut and paste and edit" is broken. You can't autoformat the pasted code into the right place, you have to go back and fix the whitespace. Since whitespace is semantically significant, this can introduce bugs.

- visually identical whitespace may not be textually identical whitespace. Unless you go around breaking the tab key off your colleagues' keyboards you'll trip over this. Especially (again) if you paste. Occasionally seen in merges too.

- editors can no longer give you 100% correct indentation.

> - "cut and paste and edit" is broken. You can't autoformat the pasted code into the right place, you have to go back and fix the whitespace. Since whitespace is semantically significant, this can introduce bugs.

Depends on how your editor is configured and its feature set. Which makes me wonder how editorconfig would handle this when enabled. It seems like an insignificant issue to me; you can auto-PEP8 the code before pasting it. You should probably be following PEP8 anyway (as far as spacing is concerned, at least).

> - visually identical whitespace may not be textually identical whitespace. Unless you go around breaking the tab key off your colleagues' keyboards you'll trip over this. Especially (again) if you paste. Occasionally seen in merges too.

I turn on show-all-whitespace in my editors regardless of programming language. I've been burned by Sublime Text not figuring out a file's already-established whitespace conventions and just shoving in its own defaults. I wish all editors would base whitespace on what the file's structure looks like; if there are mixed spaces, give me a warning.

> - editors can no longer give you 100% correct indentation.

I don't understand this, it sounds like you've got your editor configured poorly or something? But it goes back to how unintuitive the nice editors can be. You can use editorconfig to define the indentation project wide, then any editor should pick it up, of course if you define PEP8 at a minimum it guarantees spacing settings.

I'm not sure if PyCharm covers a few of those cases, since I use it so seamlessly I don't usually have complaints.

I’m on the opposite end. I just had to export a JSON based AWS CodePipeline configuration and had a hell of a time trying to edit it and paste things in the right place.

I ended up converting it to yaml, making the edits and converting it back to JSON.

Before anyone asks the obvious -- how do I handle deeply nested code in brackets? Simple: I don't. When things start getting nested deeply, I use my IDE to Extract Method.

In the YAML case: It's hard if you don't have editor support and good diagnostics. Not because you're unusually sloppy, but because you make human mistakes and because you don't know the syntax. (YAML syntax is surprisingly complex and poorly documented in the pedagogic sense). Also, the edit-debug cycle is slow with Ansible or YAML-using CI systems, so this is doubly painful.

In the Python case it's much better, because people less often casually edit .py files without editor support, and because Python has good diagnostics and it's much much harder to produce syntactically correct but semantically wrong Python by whitespace mixups.

Everything is hard when you don't have editor support and good diagnostics. Don't blame YAML because you prefer to use Notepad.exe

However, missing/extra whitespace is not "hard". You would be docked points in an English paper and you should be docked points as a programmer.

So, whitespace aside... Tell me what is easier to edit without built-in syntax support: JSON, or YAML?

If we define "easy" as "how long it takes to complete a task" or "how quickly you can grok the structure of a given block of code", then YAML beats out JSON every time.

I see you restate your argument for clarity, let me try the same :)

1) YAML is a configuration file format, and it targets user groups and environments where people use ad hoc terminal-based or OS-bundled editors such as nano or Notepad -- sysadmins, for example. 2) YAML implementations (i.e. parsers) have poor diagnostics compared to Python, separate from the editor issue. 3) YAML syntax is more prone than Python to parsing correctly but producing unwanted semantics when you make a mistake.

I think there is value in your English paper analogy: many/most people editing YAML files don't know YAML syntax very well compared to this scenario. If their knowledge of English was at the same level, misplaced whitespace would not be chief of their problems in a graded English paper.

It is of course a structurally valid (philosophically consistent) argument that people should not make mistakes and they should suffer when they do, but this goes generally against the consensus of configuration language usability thinking.

In my opinion no one should be using Notepad for programming work or configurations that are more than 1-2 dozen lines. Nano is about the same: It's a text editor with no inherent tooling for configuration files and syntax support.

A construction worker can't complain that nails are hard to use because they showed up to work with a baseball bat. Or that they're designed badly because they brought a soft aluminum hammer with a tiny head instead of one made with a stronger metal and large impact surface. Tooling is important. Vim and several graphical editors have syntax support. Notepad++ if you're on Windows.

> YAML syntax is more prone than Python to parsing correctly but producing unwanted semantics when you make a mistake.

If you made a mistake, you made a mistake. Why do you expect a program with a mistake to work correctly? Use tooling which prevents you from making mistakes. And the particulars of YAML semantics are orthogonal to how your editor handles it. Yes = True, No = False, etc., for better or worse, but that's got nothing to do with your editor.

> many/most people editing YAML files don't know YAML syntax very well compared to this scenario. If their knowledge of English was at the same level, misplaced whitespace would not be chief of their problems in a graded English paper.

I wholeheartedly agree. So if a programmer complained to me that they were having issues related to inconsistency with whitespace, I would be suspect of their general programming abilities and would start reading their code to determine if the problem lies deeper than just getting an extra space here or there: Incorrect tooling, linting, sloppiness, inattention to detail... All of these things get in the way of well-written software.

As for whitespace in general, and the fact that it's harder for linters to determine and highlight if a block is correctly scoped without enclosures... Python and others have this same issue.

> It is of course a structurally valid (philosophically consistent) argument that people should not make mistakes and they should suffer when they do, but this goes generally against the consensus of configuration language usability thinking.

True, and I agree. Everyone makes mistakes even with things as simple as rote data entry. This is why tooling is incredibly important.

Tightrope-walking at great heights is incredibly dangerous. Practitioners accept this danger. They typically wear harnesses to mitigate the danger of falling. Of course, some people like to live on the edge and set records involving no harnesses. If someone like Dean Potter fell while walking a tight-rope freeform with no harness and plunged to their death, their last thought wouldn't be, "Shit, I knew that tightrope was poorly designed and dangerous," it will be "Shit, I wish I'd been wearing a harness."

We can't remove our harnesses and then complain that mistakes are too frequent and costly.

Editing JSON is ok without specific format support; it just looks like any other C-like language. Editing YAML is basically impossible without specific support: your editor will almost certainly break any file you open and destroy relevant information in the process.

Care to elaborate? Your statement is hollow on its own.

How does your editor destroy information? What kind of information is destroyed? Why is your editor rearranging bits in validly encoded text files?

Most programming editors rearrange the white space of the files they open. Some do it more, some do it less.

Rearranging white space in a YAML file often destroys information.

Mine do no such thing. The only whitespace that gets stripped in /any/ editors I have are trailing whitespace and extra whitespace before the EOF, and that's only in certain IDEs where I have consciously enabled these options. They are disabled by default.

Removing trailing whitespace should never change the logic of a file in general, but as for YAML it certainly doesn't. And editors should never remove leading whitespace... who does that?

Can you name an offending editor as an example?

Press tab on a line in emacs, and the whitespace will get rearranged. It's more explicit in vi, but don't bother (un)indenting blocks there either.

Just writing characters anywhere in a file in the MS IDEs I've tried is enough to rearrange the line's whitespace, while the JetBrains ones I've tried are more conservative and won't break lines you haven't changed somehow.

Ok, now show me a single editor that doesn't make whitespace changes when you press tab.

I've only ever had issues with vim messing up whitespace on the line I'm typing specifically with regard to YAML, and yes that's an issue, but it has nothing to do with YAML. For example, adding another colon to a string, wrapped in whitespace or not, will often reduce indentation. That's just plain bad behavior, but it's not intended behavior.

I largely agree with you, but:

How hard is it to use HN formatting? I can’t count how many times people screw it up.

It’s not difficult to begin with, the documentation is free, yet here I am reading your comment with broken formatting.

    something: nothing
      hello: goodbye
Anyone who has trouble with this is just being sloppy. No useless backticks! You might think you’re doing it right, but unless you check, maybe you’re not.

lol Well played.

Have a look at the CodeDeploy appspec.yml specification for whitespace [1].

"AWS CodeDeploy will raise an error that might be difficult to debug if the locations and number of spaces in an AppSpec file are not correct."

Great. There couldn't possibly be an easier format to use, could there?

[1]: https://docs.aws.amazon.com/codedeploy/latest/userguide/refe...

In fairness, that documentation makes the process out to be far more complicated than it actually is in reality. Plus their point about errors being difficult to debug can be equally true with other data formats too (e.g. some JSON parsers throw really unhelpful errors if you accidentally include a comma at the end of a list).

Please consider simply believing and trusting those who tell you they hate significant whitespace and that it is a real impediment to work.

Another take, perhaps: Assigning deep semantic significance to invisible symbols is simply stupid. It is stupid to a much greater degree than wanting to be free from having to care about the amount of invisible symbols is “sloppy”.

YAML is a generic format which leaves the effects of formatting up to you. Ansible puts rules on top of it, which makes indentation not always trivial: it is easy to have a dangling key-value pair which doesn't cause an error, but only takes effect with the right indentation.

> how hard is it to just use consistent indentation

It's pretty annoying when you don't have access to an IDE or decent editor.

YAML is a bit bonkers in that it's a superset of JSON (all valid JSON is valid YAML), so if you don't like the whitespace sensitivity, you can write your YAML like this:

    {
      "a": 42,
      # But you can have comments!
      "b": "hello world",
      "c": "and
        strings!",  # and trailing commas!
    }

I wrote a pared down version of YAML because while I like the basic structure I hated the complicated bullshit like the "we also parse JSON" layered on top:


Worse than JSON, though, is the Norway problem.

If you remove this stuff and start validating it properly it becomes much easier to maintain.

> the Norway problem



I've always liked YAML, it's always seemed pretty intuitive to me coming from Python, and I like human-readable resource files, but those are some pretty damning counterexamples.
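Those counterexamples all come from YAML 1.1's implicit typing of unquoted scalars. Here's a toy sketch of those resolution rules in Python -- not a real parser, and deliberately simplified (real YAML 1.1 also treats bare y/Y/n/N, octals, sexagesimals, etc. specially) -- just to show how they bite:

```python
import re

# Toy re-implementation of YAML 1.1-style implicit typing for
# unquoted scalars. Illustrative sketch only, not a real YAML parser.
BOOLEANS = {}
for word, value in [("yes", True), ("no", False), ("true", True),
                    ("false", False), ("on", True), ("off", False)]:
    # YAML 1.1 recognizes exactly these three casings -- "YEs" stays a string.
    for variant in (word, word.capitalize(), word.upper()):
        BOOLEANS[variant] = value

def resolve_scalar(raw):
    """Guess what type an unquoted YAML 1.1 scalar resolves to."""
    if raw in BOOLEANS:
        return BOOLEANS[raw]
    if re.fullmatch(r"[-+]?\d+", raw):
        return int(raw)
    if re.fullmatch(r"[-+]?(\d+\.\d*|\.\d+)", raw):
        return float(raw)
    return raw  # anything else is a string

# The "Norway problem": a country code silently becomes a boolean.
print(resolve_scalar("NO"))    # False
# Trailing zeros vanish: versions 1.10 and 1.1 become the same value.
print(resolve_scalar("1.10"))  # 1.1
```

Quoting the values ("NO", "1.10") sidesteps all of this, which is exactly why "always quote your strings" became folk wisdom.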

JSON.NET has an insane default "smart deserialization" mode which checks if string values are valid ISO dates, and if so, deserializes them to DateTime. The result is that your typical unsuspecting app works fine for a long time, until the user just happens to throw data at it that has a date-like string in it somewhere - and so the app code gets a DateTime instance where it expected a string.

And depending on how exactly it was accessed, this can go two ways. The best case is that the app just gets the value via the untyped API, casts it to string, and blows up with an invalid cast - best because you actually know what went wrong.

The worst case is when the app specifically tells JSON.NET that it wants a string value (via generic type parameters), at which point it will helpfully implicitly convert the actual date value back to a string... except it can reformat it, and even helpfully adjust it from one timezone to another. Semantically it's the same date, of course, but it's not at all the same string, and sometimes that matters a lot. So this is the worst case because it's just silent data corruption.

For some mysterious reason, the author believes that this is acceptable default behavior - i.e. "it's a feature, not a bug". It's especially ironic to look at all the mentions in the GitHub ticket, as various projects that rely on the library run into this issue (one of them is mine):


I'll be honest, I didn't believe you at first so I went and tested some JSON against a few YAML parsers and it's completely true.

This is insane. I already knew writing a YAML unmarshaller was a needlessly complicated affair - and that was before I realised this could happen.

Oh hey, a Python dictionary that throws an error. Just kidding -- faced a JSON parsing issue today and this made me smile.

It's a step up from XML though.

For me lack of validation and comments are many steps down, not up.

Lack of validation?

JSON files don't have validation support like XML schema does.

Maybe not built in like DTD, closer to XML Schema in its laid-on-top manner, but I would argue that JSON Schema is fairly good, and on its way to becoming an IETF standard: http://json-schema.org/ and https://tools.ietf.org/html/draft-handrews-json-schema-01 (et al)
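For a feel of what a JSON Schema document looks like, here's a toy validator covering just `type`, `required`, and `properties` -- in practice you'd use a real library such as jsonschema, and all the names below are made up for illustration:

```python
# A toy check of the most common JSON Schema keywords, to show the
# shape of a schema document. Not a real validator.
schema = {
    "type": "object",
    "required": ["name", "replicas"],
    "properties": {
        "name": {"type": "string"},
        "replicas": {"type": "integer"},
    },
}

TYPES = {"object": dict, "string": str, "integer": int}

def toy_validate(doc, schema):
    """Return a list of error strings; empty list means the doc passes."""
    errors = []
    if not isinstance(doc, TYPES[schema["type"]]):
        return [f"expected {schema['type']}"]
    for key in schema.get("required", []):
        if key not in doc:
            errors.append(f"missing required key: {key}")
    for key, sub in schema.get("properties", {}).items():
        if key in doc and not isinstance(doc[key], TYPES[sub["type"]]):
            errors.append(f"{key}: expected {sub['type']}")
    return errors
```

The schema itself is just JSON, so it can be shipped next to the config it describes and picked up by editors for autocomplete and inline errors.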

The lack of obvious namespace management is suboptimal, but so far I thankfully haven't encountered a situation where it was a show-stopper.

IntelliJ and related tools also allow associating a json schema with a YAML file, which I have found infinitely handy

JSON Schema is actually pretty great.

I find YAML to be almost unusable. IMO it's just not intuitive. If I get to choose a format for my config files I would only use TOML, it's just better (again IMO).

All these comments amuse me, because I feel the opposite. YAML has always made immediate intuitive sense to me. Meanwhile TOML feels like a terrible hack.

Also, I'm guessing I'm in a tiny minority who loves YAML but hates Python's semantic indentation...

I like YAML but it has some minor quirks and it feels overused in domains in which it simply doesn't make sense to use YAML. I can think of ansible or complex dynamic configurations that depend on external values as mentioned in the article. If simple merging of a base file + dev, staging, prod files isn't enough for the task at hand then YAML is a bad fit.

I'm the same way. I think it's a difference in my expectations of a programming language versus a hierarchical data storage format. I'm fine with (and even prefer) enforced whitespace in data formats. That makes it easier to view and edit.

In programming languages, it makes me twitch. I don't have any problem with "accepted" formatting styles (i.e. linux kernel c style), but for the language itself to enforce that for some reason feels like it's adding perpetual cognitive overhead (like whenever I use python). I don't know why; it shouldn't be any different than using a particular formatting style voluntarily in a more flexible language, but somehow, it feels different.

This is strange. The thing I most like about YAML is how intuitive and human readable it is.

One thing that is rarely pointed out is that one of the advantages of TOML is that it allows you to write dictionaries/tables without nesting them as trees.

Agreed. From a readability perspective, I started out with INI, which I ditched partially due to having no standard format; skipped JSON because it can't have comments; skipped YAML for not looking intuitive enough; considered JSON5 but skipped it for not being popular enough; and landed on TOML.

And I skipped toml for an object-array syntax that makes baby Jesus cry.

None of these problems are hard, why are all the solutions so awful?

What are you using now?

XML, generally. I hate it but I've given up on its "successors".

Hocon for me, if I get to choose.

I don't mind YAML. I dislike that sometimes things become strings and sometimes they become other types:

   foo: bar   # {"foo":"bar"}
   foo: "bar" # {"foo":"bar"}
   foo: 42    # {"foo":42}
   foo: "42"  # {"foo":"42"} 
Other than that, no major complaints. My editor understands YAML and shows the indentation level in the background (highlight-indentation-mode) and auto-formats files so they all have consistent indentation (prettier-mode). As a result, it is not much of a nightmare to edit, despite the fact that semantic whitespace COULD cause you a lot of problems.

How about...

    foo: yes  # "foo": true
    bar: YEs  # "bar": "YEs"
    baz: YES  # "baz": true

Yeah, that's a little crazy. It's the classic case of in-band signalling. It never works. I wish quotes around strings were mandatory, then having 83 ways to say "true" would be OK. But when strings randomly get upgraded to other primitive types... it's a little weird.

I like looking at YAML when it doesn't use any of the insane YAML features. Even so I'm not convinced it should be used (at least not as widely as it is) for one big reason: it can be truncated almost anywhere and still be valid. This causes way more issues than you might think. JSON has no such issue - the only case I can think of where you can truncate JSON into valid JSON is if your JSON is just a number.
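The truncation point is easy to demonstrate with the standard library (the YAML half of the comparison is described in comments, since checking it needs a third-party parser):

```python
import json

# JSON fails loudly when truncated: every "{" and "[" must be closed
# before the document is complete.
doc = '{"replicas": 3, "image": "nginx:1.25"}'
try:
    json.loads(doc[:15])  # '{"replicas": 3,'
    truncated_parses = True
except json.JSONDecodeError:
    truncated_parses = False

# By contrast, cutting a block-style YAML document off after any
# complete line usually leaves another *valid* YAML document:
#     replicas: 3
#     image: nginx:1.25
# truncated to just "replicas: 3" still parses fine -- the loss is silent.
```

So a half-written file, a dropped connection, or a bad merge shows up as a parse error in JSON but as a quietly smaller document in YAML.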

The simple parts of YAML are great for simple applications. The complicated parts of YAML are not great for complicated applications.

>> I really dislike YAML

I wasn’t aware anyone liked it

I have a feeling that it is/was preferred in rails/ruby community.

Tools keep using it even where it's the wrong tool for the job, so someone must like it, surely?

I'm sure it will be just like XML where it's the trendy thing for a while in the early days, then everyone stops and hates it for a while. Except XML at least has a handful of applications where it's the right tool for the job (it has a nice streaming mode), YAML doesn't even have that.

YAML is great for simple human-editable configuration files. Its very easy to write, and can be picked up quite quickly.

Opting for YAML over XML/JSON/whatever doesn't make me a tool. It made life much easier for myself & my colleagues.

I think they meant tools like docker or npm and stuff. Not calling people tools.

heh, I didn't even consider this interpretation of the above; sorry about that. I meant "tools like Docker keep using it", not people. But I still don't know what you're talking about; YAML has an 83 page spec that includes pointers, and uses tons of random confusing symbols. I say "maybe it's just me" in some of these posts, but I know it's not: I've watched many of my coworkers get it wrong the first time for years and then have to be corrected. A quick common example I see in CI config all the time: If I write version: 1.10, that's a number, then I decide to move to latest so I write version: 1.10.x, that's a string. Oops, we were never using version "1.10" we were using "1.1". Everything about it is implicit and bad. Now, it's easy to say "always use quoted strings", and I agree, but then why the hell does it have bare strings in the first place? That seems like an easy enough oversight or typo to make, and it will be made.

I generally dislike languages like YAML or Python where whitespace matters and can break your code. However, YAML is way more easily human-readable than JSON, so I started to appreciate it for readability purposes.

I guess YMMV, but after you've used both YAML and JSON for a while, you might appreciate YAML a little bit more.

The killer feature for me over JSON is comment support, it's definitely useful to add todo comments etc.

Yep, so true. Any decent config file that will be seen/edited by a human needs comment support. And any file that will not be seen by a human could be json (or whatever).

Yeah this is a huge selling point of YAML. JSON should have comments added to the spec. The other benefit to YAML is human readability, which is usually better in YAML compared to JSON. A specific glaring example of this is when there are long string-literal snippets inside the document, in YAML this is massively more readable than in JSON.

JSON shouldn't have comments added to the spec, because people shouldn't be trying to read or write JSON. It's an application interchange language, meant to be written and read by machines. YAML is a markup language, meant to be written and read by people. Ever notice how most YAML libraries don't even have a "dump" function?

>It's an application interchange language, meant to be written and read by machines.

Not entirely true. JSON is based on Javascript objects; it was meant to be written and read by humans just like Javascript, INI or any other basic serialized data format, or text-based programming language. If JSON were truly never meant to be viewed or edited by human beings, it would have been published as bytecode.

>> Ever notice how most YAML libraries don't even have a "dump" function?

No. I guess unless you’re using something half-baked, I suppose. Machines need to be able to edit configs either way.

Hum... We have a bad track record of success when deciding that this or that stuff is for machine consumption only.

> JSON should have comments added to the spec.

See also ~5 years ago: https://news.ycombinator.com/item?id=7325735

You can just use JSON with comments though. If you have sufficient control over the technology in question to be able to completely change it to a YAML parser, surely you can change it to be a JSON+Comment parser too. See: VS Code's config files.
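A minimal sketch of that JSON+Comments approach in Python: strip full-line `//` comments, then hand the rest to the stock parser. (VS Code's real JSONC parser is more thorough -- it also handles trailing and block comments -- whereas this sketch only handles comments on their own line, which is safe because a JSON string can't contain a raw newline, so a line starting with `//` is never inside a string.)

```python
import json
import re

def loads_jsonc(text):
    """Parse JSON that may contain full-line // comments."""
    # Blank out any line whose first non-whitespace characters are //.
    stripped = re.sub(r"^\s*//.*$", "", text, flags=re.MULTILINE)
    return json.loads(stripped)

config = loads_jsonc("""
{
    // enable the new renderer
    "renderer": "gpu",
    "tabSize": 4
}
""")
```

The file stays one regex away from plain JSON, so every existing JSON tool still works on the stripped form.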

But you can restrict yourself to JSON+Comments and call it YAML; it isn't idiomatic YAML, but it's still valid YAML.

> You can just use JSON with comments though.

Then it's not valid JSON, and if you try to treat it as such bad things will happen.

My point is that if you're in sufficient control of the stack to be able to convert the whole thing over to YAML, you could just as easily convert the whole thing over to JSON+Comments. And of course bad things would happen if you treat JSON+Comments as JSON, but similar bad things would happen if you treat YAML as JSON, so I don't see your point. It's not like people are trying to send their tsconfig's on the wire as "application/json" and expecting arbitrary parsers to support it.

When you think about it, JSON lacks so much (check JSON5 for what it's missing) that it's hard to believe even comments are not allowed when pretty much everything else allows them -- which is a showstopper by itself.

> YAML is way more easily human readable than JSON, so I started to appreciate it for readability purposes.

> I guess YMMV, but after you've used both YAML and JSON for a while, you might appreciate YAML a little bit more.

I've used JSON a lot, and XML and s-expressions and MessagePack and ini and YAML and a whole bunch of other formats.

I usually have to fire up Google to read YAML. YAML is the only one where I routinely have to Google for a syntax cheatsheet and wade through tables of redundancy and edge-cases.

YAML made sense before JSON became a thing. Why people persist with it in new projects is baffling to me.

A raw YAML file is readable with your eyes. A JSON file needs to be prettified before you go through it. JSON is good for APIs but is not made for readability.

We must disagree with what "readable" means then. I find JSON readable (as long as it's nicely laid out, e.g. by piping through 'jq "."'), in the sense that I can skim over the structure looking for [/]/{/}/". If I want to read some of the content, like a string, I just need to read '\"' as '"' and '\\' as '\', which is a small constant cost per (usually rare) occurrence.

With YAML it's difficult to even know the structure of what I'm looking at, due to anchors and extensions. It's also hard to discern structure from skimming, since strings can appear unquoted, and may contain unescaped lexical tokens (depending on which particular symbols it started with); hence we must carefully consider each and every character, rather than just skimming for the next token.

If I know I'm looking at a perfect YAML file, then I should be able to guess the gist of what it says, since I can make assumptions about what the syntax means. If I want to be sure, I'd be Googling for cheatsheets. Yet as a programmer, I mostly look at files when they're buggy, meaning I can't just assume that, say, an unescaped quotation mark won't terminate the string; or that a certain piece of text is allowed to run across multiple lines; or that the indentation corresponds to the nesting; etc.

That's what syntax highlighting is for.

I use notepad++ for YAML. Besides coloring what it thinks I'm thinking, it displays vertical lines corresponding to the indentation levels.

(I prefer INI/TOML whenever I can help it; hierarchies in TOML are so counterintuitive that it incentivizes a simple flat structure. But then, some things are irremediably hierarchical)

Is it really though? The implicit typecasting gives rise to very unexpected results, that you'd never get with JSON. See: https://hitchdev.com/strictyaml/why/implicit-typing-removed/
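To make the implicit-typing complaint concrete, here is a small illustrative fragment (values invented, not taken from the linked article) showing how a YAML 1.1 parser reads innocent-looking scalars:

```yaml
# How a YAML 1.1 parser actually interprets these scalars:
country_code: NO      # boolean false, not the string "NO" (the "Norway problem")
version: 3.10         # float 3.1 -- the trailing zero is lost
port_mapping: 22:22   # sexagesimal integer 1342 in some 1.1 parsers
enabled: yes          # boolean true
```

None of these surprises are possible in JSON, where strings must be quoted and there is exactly one number syntax.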

Have you considered alternatives like TOML or JSON5?

Writing YAML is easily the part I hate most about writing/deploying software. It's unstructured, feedback cycles tend to be slow (e.g. when deploying k8s configs into prod), and you can't possibly write something useful without documentation pulled up. It's definitely easier to implement than purpose built DSLs, but it's not a good experience.

Python has better IDE support and, maybe more importantly, Python doesn't completely disallow any particular indentation style, so it handles unsupported IDEs better. But it's essentially an IDE-support problem.

My life improved a lot since I got a YAML mode for emacs. Now things would be just perfect if haskell's cabal migrated to dhall...

I know you only said Python is better, not great, but you might want to check out OpsMop: https://medium.com/@michaeldehaan/opsmop-building-the-next-g... . By the creator of Ansible, in pure Python, including the config.

Is there a reason http://opsmop.io/ and http://vespene.io/ are both down and their GitHub repos both say "DISCONTINUED"?



Michael DeHaan threw his toys out of the pram because he wasn’t getting the user numbers he wanted and discontinued them

Looks like it happened a few days ago: https://threadreaderapp.com/thread/1091710068234641408.html

> In this case, lots of discussions show everyone is busy, has no time, and also ... increasingly they have interest in low-code/no-code type solutions. This is not open source as whole, just the IT ops vertical

Looks like he doesn’t believe the code approach is viable as much as other people are claiming in this thread.

I... No. The OpsMop Twitter has a tweet from January 31, so it seems like if the project had died it would have to be really recent. That would be sad.

YAML is useless because it replaces JSON (a tree structure that is minimally verbose, so as not to be confusing) with something worse (a tree structure that is just enough less verbose than JSON to be slightly confusing).

I didn't see this mentioned anywhere else, so another alternative (that I've seen and really like conceptually, but haven't used so far) to all this wildness with YAML and JSON -> https://github.com/dhall-lang/dhall-lang, and for kubernetes specifically -> https://github.com/dhall-lang/dhall-kubernetes

Came here to say similar. In particular dhall does allow scripting (functions etc.) but is non-Turing-complete as a feature. This seems like a particular sweet spot to me as it allows for more dynamism than data formats like json/yaml while constraining the scope sensibly.

It also has very nice bindings with haskell and nix

also this is probably a nicer intro https://dhall-lang.org/

What is up with the strange comma positioning? I assume that’s just a stylistic choice?

It allows each line to be completely independent of its neighbors; you can comment and/or add lines without needing to touch neighboring lines. Also, it makes it visually easy to spot missing commas. Give it a try sometime, it's actually quite nice.

Wouldn't this also be solved by allowing trailing commas?

But they are not independent: first line doesn’t have one.

Also makes nice file diffs!

Wouldn't the same thing occur if it wasn't there at all?

Having prefixed commas is a rather common style in the Haskell community, because it ends up nicely matching open/closing brackets/braces and lining things up.

Since the author of Dhall comes from the Haskell community, he's kept this style.

Author here: This is correct. I'm just borrowing a Haskell convention. Also, I like this convention because it leads to vertical alignment of commas.
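For readers unfamiliar with the convention, a small hypothetical Dhall list in this leading-comma style looks like:

```dhall
-- Each element line carries its own comma, so lines (after the first)
-- can be reordered, commented out, or appended without touching neighbours,
-- and the separator column lines up under the brackets.
[ { name = "web",  replicas = 2 }
, { name = "api",  replicas = 3 }
, { name = "cron", replicas = 1 }
]
```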

god those examples are ugly - commas at the beginning of a line? mismatched brace styles?

Not sure what you mean by "mismatched brace styles". The convention of putting separators like commas at the start of the following line rather than the end of the preceding line is common in Haskell, which Dhall is built with.

The advantages are:

- All of the separators are in the same column, along with the opening and closing characters. This makes it trivial to check if we've missed a separator.

- Appending new lines to the end will not affect previous lines (i.e. we don't need to go and add a comma). This avoids making mistakes and polluting diffs.

Unfortunately the error-prone diff pollution we avoid at the last line instead occurs at the first line. It's still less error-prone than trailing commas, since we can look in the separator column and either spot that it's empty, or that it contains two opening braces (depending on whether we inserted or copy/pasted).

It's a haskell thing. The main advantage is that each line is independent. You can comment out a line or add a line at the end without modifying anything else.

Each line is not independent: you cannot comment out the first line. A better approach is to allow trailing commas. (I suppose you could allow leading commas, does Haskell support this?)

Unfortunately no.

Allowing trailing commas, like Python does, would be really great. Unfortunately trailing commas already mean something: (a,b,) is a function that still takes 1 argument to make a triple. It's called "TupleSections".

For the sake of this comment, let's define "templating" to be attempts to solve the problem "I need $FORMAT due to an existing constraint, but $FORMAT does not entirely meet my needs on its own" (in this article, $FORMAT is YAML). Additionally let's say that in order to be a "template" something must be a text file (e.g. exporting a database table as $FORMAT does not count as "templating" for the purposes of this comment).

I think there are three very different kinds of tools that people use for this:

1. Interpolation/preprocessor languages: This is what the author is talking about. There are delimiters/tags/sigils to distinguish "the templated parts" from "the rest" and the primary operation done by the template engine is substitution. "The rest" is literal content that's already in $FORMAT and it remains mostly/entirely unchanged during template rendering. Languages of this type are basically glorified sed. This can be nice because they're agnostic as to their embedding (any string will do) so they're very portable/flexible (you don't have to create "handlebars for YAML", "handlebars for HTML", "handlebars for CSV", etc; one implementation does it all). Languages of this kind can work in the small but don't scale well for all the reasons mentioned in the article/comments. The language doesn't know anything about the semantics of $FORMAT and that can cause all kinds of pain. Examples include golang templates, PHP, ERB, handlebars, the C preprocessor, Jinja, etc.

2. Compilers/code generators: These are "complete" languages that compile to $FORMAT. The difference between these and interpolation/preprocessor languages is that the entire input is the language, not just specific chunks/tags. This kind of language can be nice because you have complete control and can therefore guarantee valid output and do tricks like supporting multiple different output formats for the same input, but the downside is that you're working with an entirely new language so there's a learning curve, you need specialized syntax highlighters and other tools to work with templates, etc. Examples include HAML, Jsonnet, Dhall, etc.

3. Embedded DSLs: Templates of this kind are valid $FORMAT from the beginning, but have embedded ways to specify transformations to be applied to the parsed AST. These languages are homoiconic with respect to $FORMAT. First $FORMAT is parsed, then the template engine iterates through the AST to perform evaluations, then the result can either be used as-is in memory or serialized back to (a possibly different) $FORMAT. This is sort of like an interpolation/preprocessor language with the evaluation order swapped: preprocessing is "run the template engine, then parse $FORMAT" while this is "parse $FORMAT, then run the template engine". A downside of this approach is that it is less general, e.g. it only really makes sense when $FORMAT has a well-defined structure (you probably can't template plain english sentences with this approach), but these days most "data languages" have converged towards being semantically equivalent to JSON (lists, dictionaries, and primitives) and this approach works well for any of them. An upside is that like compilers/code generators you can guarantee that the output will be valid $FORMAT no matter what the template looks like. Examples include JSON-e, Lisp macros, CloudFormation templates, etc.

It's unfortunate that all of these get called "templating languages" because they're very different beasts from one another, and usually when I see conversations about this stuff these distinctions get blurred and you end up with apples-vs-oranges comparisons. If I had my druthers we'd reserve the word "templating" for the first one and use different terminology for the others, but that ship has sailed.
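The difference between kinds 1 and 3 can be made concrete. Below is a minimal, stdlib-only Python sketch (the `$var` placeholder convention is invented for illustration, loosely in the spirit of JSON-e; it is not any real engine's syntax):

```python
import json
from string import Template

# Kind 1: interpolation -- substitute text first, parse afterwards.
# The engine knows nothing about JSON, so a bad value can break the output,
# and we only find out when we try to parse the rendered string.
text_template = Template('{"replicas": $replicas, "image": "$image"}')
rendered = text_template.substitute(replicas=3, image="nginx:1.15")
kind1 = json.loads(rendered)

# Kind 3: embedded DSL -- parse valid $FORMAT first, then walk the tree
# replacing placeholder nodes, so the result is valid by construction.
doc = json.loads('{"replicas": {"$var": "replicas"}, "image": {"$var": "image"}}')

def render(node, env):
    """Recursively replace {"$var": name} nodes with values from env."""
    if isinstance(node, dict):
        if set(node) == {"$var"}:          # a placeholder node
            return env[node["$var"]]
        return {k: render(v, env) for k, v in node.items()}
    if isinstance(node, list):
        return [render(v, env) for v in node]
    return node

kind3 = render(doc, {"replicas": 3, "image": "nginx:1.15"})
print(kind1 == kind3)  # prints True: both approaches produce the same tree
```

Note that in the kind-3 version an attacker-controlled string value can never change the document's structure, because the structure was fixed before evaluation began.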

If you have been around long enough, you still remember the world that was excited about XML and templating it using XSLT. In hindsight, it was a horrible world.

Even though YAML is not optimal, it is a human-friendly compromise between too-verbose XML and machine-only JSON. It lacks native templating, leading to funny constructs, e.g. in Ansible files. However, humankind has made progress and will make more, so it is just a matter of time until someone comes up with a sane "natively templated YAML" and all projects adopt it.

> If you have been around long enough, you still remember the world that was excited about XML and templating it using XSLT. In hindsight, it was a horrible world.

I actually really like the idea behind XSLT: machine-friendly, human-tolerable, structured data + declarative rules for turning that data into a display, or a report, or whatever else.

The execution was horrible though: incredibly verbose, lots of overcomplication due to XML weirdness/asymmetries (e.g. attributes vs elements vs text, namespaces, ...); mixtures of different languages hidden inside each other (e.g. XPath hidden in attributes); etc.

I would really like to see what this could look like if done in a more minimalist, lispy fashion (normal code-is-data stuff in Lisp is similar, but I think term-rewriting is a more appropriate evaluation mechanism for such rules)

The syntactic mistake of XSLT was writing it in XML, XPath was a redeeming feature. Imagine if XPath was also written in XML...

jq occupies the same role as XSLT, but for JSON. It can be used for templating but it's not quite as declarative as XSLT (you must pipe things through).

> The syntactic mistake of XSLT was writing it in XML, XPath was a redeeming feature. Imagine if XPath was also written in XML...

Yes, I didn't mean to imply that XPath itself is bad (although it also has to handle XML quirks like element/attribute/text, etc.).

Rather that the reason to write XSLT as XML in the first place is that it's machine readable, we can mix and match elements from different vocabularies, etc. yet most of the heavy lifting ends up as opaque string attributes :(

PS: I've done a few projects which make heavy use of jq; it's really nice, but as you say it's more of a pipeline.

XQuery was halfway between XSLT and XPath in expressiveness - functions, loops, queries with joins etc, but no pattern matching. If it only had the latter, it'd be perfect.

I quite liked SXML + SXSLT back in the day (scheme syntax)

It is more concise. Similarly, I was a fan of using attributes instead of text elements (with their unnecessary closing tags), but eventually was won over by neatness, e.g. translating an example from https://www.gnu.org/software/guile/manual/html_node/SXML.htm...

  (parrot (@ (type "African Grey")) (name "Alfie"))

  <parrot type="African Grey">
    <type>African Grey</type>
    <name>Alfie</name>
  </parrot>
It's more the one-variable-per-line pretty-printing than the syntax as such, but still.

XSLT was one in a litany of domain specific languages (ant, apache rewrite rules, latex macros, etc.) that evolved towards turing completeness because that's what the problem space demanded.

In most if not all of these cases an existing and well designed turing complete programming language would likely have better served them.

I don't see a reason that a DSL can't be Turing complete and still a better option than an existing language. If you look at old-school make files, they are little more than shell script with top level rules that you can invoke from the command line. You could theoretically just use shell script but the make scripts still simplify the task quite a bit.
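As a sketch of that point, even a minimal Makefile (targets and files invented for illustration) gets incremental rebuilds and a command-line entry point for free, which the equivalent shell script would have to hand-roll with explicit timestamp checks:

```make
# 'make app' rebuilds only what changed since the last run;
# a plain shell script would need explicit mtime comparisons.
app: main.o util.o
	cc -o app main.o util.o

%.o: %.c
	cc -c -o $@ $<
```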

Less power means less to go wrong. Automatic checks can also be deeper in a simpler system.

Your point is valid though, the power seems to end up being needed, sometimes, in some parts, in some cases. Escaping to a full language when needed seems to retain the benefits of both worlds.

There was a post on HN a few weeks back to the effect that it's rather easy for Turing completeness to emerge accidentally. I wish I remembered more specifics so I could find it again.

I wrote a lot of XSLT in the early 2000s.

The idea behind pure functions is nice, but in practice you ended up with hundreds of mini plugins in Java or Python.

I find XSLT intolerable in practice. Thankfully I've only had to touch it a handful of times. I agree the idea behind it is neat but boy is it a headache.

I kinda miss it, actually. XML had many warts, but at least everybody spoke it, and it was the same everywhere. Occasionally you still had some overlapping but different things, like XSD and RELAX NG schemas (though even there, there was a big difference - one is a language for describing data types, and the other is a language for describing grammars). But it's better than several dialects of JSON, YAML, TOML etc.

I also rather liked thorough extensibility. Namespaces were the right idea, despite clunky syntax. Today you can see Clojure doing something similar in Spec.

And while we're on the subject of XML, XSLT and Clojure; I feel like this is the best solution for readable serialization of tree-like data, and an associated ecosystem of tools (to validate, transform etc). Note some nice features for humans, like the ability to comment out a specific node, in addition to the usual line-oriented comments.


As I think Douglas Crockford said, the best thing that XML delivered was UTF-8.

Then we have built a whole new domain of things on top of UTF-8.

I actually like XML templating, it's the only one that supports it in the file.

I export custom queries for data to xml, Json, CSV and HTML.

Where HTML is XML + XSLT. It works great. And clients can even theme it

I also like it, for a couple of years my web site was XML + XSLT.

What you say you want is XSLT though, just with a better syntax and for YAML. XSLT was just fine, had a lot of very good ideas, just horrible syntax.

Horrible syntax is kind of a forgivable offense, lots of things have horrible syntax and work fine.

Generating YAML with go templates though.. it is just horrible on so many levels.

I definitely feel the 90s can take the higher ground over 2010s on this one.

I saw this title and immediately knew the article would be about Helm. I don't think anyone wants to use Helm. People use it for a set-and-forget thing that they don't care about (who cares that it's called impressive-leopard-kubernetes-dashboard, after all.)

kustomize is much more sane for your own stuff: https://github.com/kubernetes-sigs/kustomize

It is actually a little bit too magical for my taste, but I continue to use it because it hasn't done anything stupid. I have one file that maps logical names to images in a container repository. If I create a service called "foo" pointing to selector.app.label="foo" in the base, then in production it's called foo-prd and the label magically updates to foo-prd for the selector. It actually understands what it's generating, and while they might have taken it a little bit too far, it's far better than just dumb text replacement.
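For reference, the behaviour described reads roughly like this in a kustomize overlay (the field names come from kustomize; the specific paths and values here are invented):

```yaml
# overlays/production/kustomization.yaml
bases:
- ../../base        # base defines a service "foo" with app=foo selectors
nameSuffix: -prd    # renames foo -> foo-prd and updates matching selectors
images:
- name: foo
  newTag: "1.2.3"   # single file mapping logical names to image tags
```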

I’m in agreement; it seems lots of projects use helm charts for hello world / standard deployment demos, but considerably fewer run Helm charts exclusively in production clusters.

> I don't think anyone wants to use Helm.

Here is why everyone should use Helm:

Helm 2.0 introduced the package as a first-class concept for Kubernetes and created the standard for distributing applications. Thanks to Helm, thousands of people could discover and collaborate on cloud-native deployments of the open source software https://github.com/helm/charts/tree/master/stable published and managed by organizations and contributors all over the world.

Helm 3.0 keeps innovating: it adopts the most forward-thinking approach to package management and Kubernetes config management by using a higher-level domain-specific language based on Lua to create an expressive package management system:


Helm is also backed by CNCF[1] and is the best choice so far for organizations to create a reproducible CI/CD pipeline in a Kubernetes cluster.

[1] https://www.cncf.io/blog/2018/06/01/cncf-to-host-helm/

That is a lot of buzzwords.

I don't really trust Helm to do anything that's actually useful in the long term. It will get something running very quickly, but whether or not it's maintainable, I am yet to be sure of. For example, very early on, I installed the helm chart for prometheus. Now I want it to live in the kube-system namespace because I am tired of seeing its resources in the default namespace. For some reason, I highly doubt that changing values.yaml to change the namespace is going to do anything other than give me a fresh instance of prometheus running in another namespace. It's not going to use the already allocated storage volume to satisfy the persistent volume claim in the new namespace. It's not going to update the other stuff in my cluster to refer to prometheus-pushgateway.kube-system.svc.cluster.local. It's not going to update my Grafana dashboards to refer to the new namespace, even though I installed Grafana with Helm! So what did I really gain? Helm isn't giving me the ability to manage the long-term lifecycle of third-party software. It just explodes some API objects all over my cluster and lets me delete most of them automatically. That's all it does.

I get why Helm is popular. You can get some piece of software running in Kubernetes with minimal effort. I would have never successfully made some random complex piece of software work correctly in Kubernetes on day 1, especially using something that assumes you deeply understand the core API objects like kustomize does. What that boils down to is that Helm doesn't go far enough, and in its current state, just encourages people to make mistakes early.

I'm not very fond of helm's seemingly imperative approach, though. Am I mistaken about that?

Since you seem to be in the know, what kind of timeline are we looking at for Helm 3?

I've been waiting for it for over a year now...

As others in this thread have said: I ask this question all the time, except s/templating/using/.

YAML is insanely over complicated; it's as bad or worse than XML for config files, and it doesn't even have the nice streaming mode. Not to mention that it's a bit of a security nightmare (seriously, who put pointers into the YAML spec?).

And, on a more subjective note, YAML is just confusing: between all the significant whitespace and the random single character symbols that no one ever remembers what they do, I never get a YAML document right on the first try.

Templating it really does add a whole new level of headache too.
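(The "pointers" being complained about are YAML's anchors and aliases, plus the merge key, which let one node reference another. A small illustrative example:)

```yaml
defaults: &defaults      # anchor: give this mapping a name
  adapter: postgres
  timeout: 30
production:
  <<: *defaults          # alias + merge key: pulls in the anchored keys
  host: db.example.com
```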

>it's as bad or worse than XML for config files

XML works very well for config files. It's schema-optional (but the option is there), well-specified, human-readable, has a plethora of supporting technologies (making things like templating easy), and is well supported by every language.

At the very least it is way better than JSON.

It's missing one important part for config files. It's tedious to write by hand.

XML is not tedious at all with the right tooling. For example, a tool like Visual Studio's IntelliSense proposes only the elements and attributes valid in the current context, automatically closes tags, formats the file, and completes opening tags too, so it makes editing an XML file a breeze.

The tooling bar for config files is set firmly at notepad.exe

If your config file requires more tooling than that you fucked up.

I mean, with Visual Studio I can even change the code of the application I am running on the fly, depending on what exactly I am configuring :)

That doesn’t make XML an easier format to maintain.

notepad is a terrible editor for anything other than simple property files. For example, editing JSON that has any sort of complexity would be just as painful.

Not really. For quick authoring, configs should have pre-authored snippets for common things, that are commented out, and have adjacent descriptive comments ("Uncomment the following to ...") - this is regardless of their syntax.

And for complicated stuff, you're going to spend a lot more time reading the manual than you will actually typing those closing tags. In fact, in most cases you'd be copy/pasting bits from the manual as well.

> human-readable

technically yes, practically maybe not so much, especially with e.g. CDATA sections

If we're talking about a human-editable configuration file, then yes, it will be quite human-readable.

Machine generated XML can be noisy but the target for those are other machines, and the extra context is there for a reason.

You can certainly make XML as obtuse and complex as you want.

Two examples from two IDE's.

Codelite's xml based project files can be easily read and modified by hand. Diffing them yields useful information about files added and moved, config values changed, etc.

Eclipse's project files, also written in XML, are an Eldritch Horror.

I think the failing of XML is also its strength. It doesn't do typing and schemas, doesn't even try. Which means it can be sane. Or not.

He means Cantor's XML you barbarian.

There's absolutely no way that this (copied from https://github.com/akeeba/fof/wiki/The-XML-configuration-fil...):

    <?xml version="1.0" encoding="UTF-8"?>
    <fof>
        <!-- Common settings -->
        <common>
            <!-- Container configuration -->
            <container>
                <option name="componentNamespace"><![CDATA[MyCompany\MyApplication]]></option>
            </container>
            <!-- Dispatcher configuration -->
            <dispatcher>
                <option name="defaultView">items</option>
            </dispatcher>
            <!-- Transparent authentication configuration -->
            <authentication>
                <option name="totpKey">ABCD123456</option>
                <option name="authenticationMethods">HTTPBasicAuth_TOTP,QueryString_TOTP</option>
            </authentication>
            <!-- Model configuration. One tag for each Model. -->
            <model name="orders">
                <!-- Model configuration -->
                <config>
                    <option name="tbl"><![CDATA[#__fakeapp_orders]]></option>
                </config>
                <!-- Field aliasing. One tag per aliased field -->
                <field name="enabled">published</field>
                <!-- Relation setup. One tag per relation -->
                <relation type="hasMany" name="items" />
                <relation type="belongsToMany" name="transactions"
                          localKey="foobar_order_id" foreignKey="foobar_transaction_id"
                          pivotLocalKey="foobar_order_id" pivotForeignKey="foobar_transaction_id"
                          pivotTable="#__foobar_orders_transactions" />
                <relation type="belongsTo" name="client" foreignModelClass="Users@com_fakeapp" />
                <!-- Behaviour setup. Use merge="1" to merge with already defined behaviours. -->
                <behaviors merge="1">foo,bar,baz</behaviors>
            </model>
            <!-- Controller, View and Toolbar setup. One tag per view. -->
            <view name="item">
                <!-- Controller task aliasing -->
                <taskmap>
                    <task name="list">browse</task>
                </taskmap>
                <!-- Controller ACL mapping -->
                <acl>
                    <task name="dosomething" />
                    <task name="somethingelse">core.manage</task>
                </acl>
                <!-- Controller and View options -->
                <config>
                    <option name="autoRouting">3</option>
                </config>
                <!-- Toolbar configuration -->
                <toolbar title="COM_FOOBAR_TOOLBAR_ITEM" task="edit">
                    <button type="save" />
                    <button type="saveclose" />
                    <button type="savenew" />
                    <button type="cancel" />
                </toolbar>
            </view>
        </common>
        <!-- Component backend options -->
        <backend>
            <!-- The same options as Common Settings apply here, too -->
        </backend>
        <!-- Component frontend options -->
        <frontend>
            <!-- The same options as Common Settings apply here, too -->
        </frontend>
    </fof>
is at all preferable to this:

      (fof
        (container (component-namespace "MyCompany\\MyApplication"))
        (dispatcher (default-view items))
        (authentication
         (totp-key ABCD123456)
         (authentication-methods (http-basic-auth-totp query-string-totp)))
        (model orders
               (config (tbl "#__fakeapp_orders"))
               ;; Field aliasing. One tag per aliased field
               (field enabled published)
               ;; Relation setup. One tag per relation
               (relation items (type has-many))
               (relation transactions (type belongs-to-many)
                         (local-key "foobar_order_id")
                         (foreign-key "foobar_transaction_id")
                         (pivot-local-key "foobar_order_id")
                         (pivot-foreign-key "foobar_transaction_id")
                         (pivot-table "#__foobar_orders_transactions"))
               (relation client (type belongs-to)
                         (foreign-model-class "Users@com_fakeapp"))
               ;; Behaviour setup. Use merge="1" to merge with already defined behaviours.
               (behaviors (merge 1) (foo bar baz)))
        ;; Controller, View and Toolbar setup. One tag per view.
        (view item
              (taskmap (list browse))
              ;; Controller ACL mapping
              (acl
               (task dosomething)
               (task somethingelse core.manage))
              ;; Controller and View options
              (config (auto-routing 3))
              ;; Toolbar configuration
              (toolbar "COM_FOOBAR_TOOLBAR_ITEM"
                       (task edit)
                       (button save)
                       (button saveclose)
                       (button savenew)
                       (button cancel))))

There's just no way.

You changed the model when you adapted the XML to lisp. You decided that some tags are unnecessary, dropped some attributes and assumed others are merely different types of child nodes - and now your sample doesn't actually have the same semantic meaning as the XML example. You also removed some comments. Was all this done to emphasize how much cleaner a lisp alternative would be? If we're playing this game, you can actually simplify the XML configuration file as well. If you attempted to capture everything that the XML does, it would make your lisp sample much more ugly.

Anyway, to each his own, but I think XML holds up very well and I do find it more readable and easier to work with than your lisp example.

I also never said XML is the best configuration format. For simple configurations a simple property file is by far the best option. For anything complicated (as in your example) XML does a great job. To contrast, JSON would fall flat on its face with this. Not to mention the fact that XML parsing is typically part of the standard library of most programming languages and most people are familiar with it.

> You changed the model when you adapted the xml to lisp.

That was a conscious decision, because the verbosity of XML prevents clear understanding of a data model, while the cleanness of S-expressions enables a clarity of vision which enables prudent judgement when laying out a data structure.

> You also removed some comments.

Yes, because they were akin to:

    // Add 1 & 2, assign to X
    x = 1 + 2
If you really want an S-expression version of the XML in that example, here is SXML[0]:

      (*comment* " Common settings ")
       (*comment* " Container configuration ")
        (option (@ (name "componentNamespace")) "MyCompany\\MyApplication"))
       (*comment* " Dispatcher configuration ")
        (option (@ (name "defaultView")) "items"))
       (*comment* " Transparent authentication configuration ")
        (option (@ (name "totpKey")) "ABCD123456")
        (option (@ (name "authenticationMethods"))
       (*comment* " Model configuration. One tag for each Model. ")
       (model (@ (name "orders"))
              (*comment* " Model configuration ")
               (option (@ (name "tbl")) "#__fakeapp_orders"))
              (*comment* " Field aliasing. One tag per aliased field ")
              (field (@ (name "enabled")) "published")
              (*comment* " Relation setup. One tag per relation ")
              (relation (@ (type "hasMany") (name "items")))
               (@ (type "belongsToMany") (name "transactions")
                  (localKey "foobar_order_id") (foreignKey "foobar_transaction_id")
                  (pivotLocalKey "foobar_order_id")
                  (pivotForeignKey "foobar_transaction_id")
                  (pivotTable "#__foobar_orders_transactions")))
               (@ (type "belongsTo") (name "client")
                  (foreignModelClass "Users@com_fakeapp")))
               " Behaviour setup. Use merge=\"1\" to merge with already defined behaviours. ")
              (behaviors (@ (merge "1")) "foo,bar,baz"))
       (*comment* " Controller, View and Toolbar setup. One tag per view. ")
       (view (@ (name "item"))
             (*comment* " Controller task aliasing ")
              (task (@ (name "list")) "browse"))
             (*comment* " Controller ACL mapping ")
              (task (@ (name "dosomething")))
              (task (@ (name "somethingelse")) "core.manage"))
             (*comment* " Controller and View options ")
              (option (@ (name "autoRouting")) "3"))
             (*comment* " Toolbar configuration ")
             (toolbar (@ (title "COM_FOOBAR_TOOLBAR_ITEM") (task "edit"))
                      (button (@ (type "save")))
                      (button (@ (type "saveclose")))
                      (button (@ (type "savenew")))
                      (button (@ (type "cancel"))))))
      (*comment* " Component backend options ")
       (*comment* " The same options as Common Settings apply here, too "))
      (*comment* " Component frontend options ")
       (*comment* " The same options as Common Settings apply here, too "))))
Which I think is still indubitably and inarguably clearer & cleaner than the XML version.

Technically, the XML spec requires whitespace preservation, so really it’s this:

     (fof "
          (*comment* " Common settings ") "
          (common "
                  (*comment* " Container configuration ") "
                  (container "
                             (option (@ (name "componentNamespace")) "MyCompany\\MyApplication") "
                  (*comment* " Dispatcher configuration ") "
                  (dispatcher "
                              (option (@ (name "defaultView")) "items") "
                  (*comment* " Transparent authentication configuration ") "
                  (authentication "
                                  (option (@ (name "totpKey")) "ABCD123456") "
                                  (option (@ (name "authenticationMethods"))
                  (*comment* " Model configuration. One tag for each Model. ") "
                  (model (@ (name "orders")) "
                         (*comment* " Model configuration ") "
                         (config "
                                 (option (@ (name "tbl")) "#__fakeapp_orders") "
                         (*comment* " Field aliasing. One tag per aliased field ") "
                         (field (@ (name "enabled")) "published") "
                         (*comment* " Relation setup. One tag per relation ") "
                         (relation (@ (type "hasMany") (name "items"))) "
                          (@ (type "belongsToMany") (name "transactions")
                             (localKey "foobar_order_id") (foreignKey "foobar_transaction_id")
                             (pivotLocalKey "foobar_order_id")
                             (pivotForeignKey "foobar_transaction_id")
                             (pivotTable "#__foobar_orders_transactions")))
                          (@ (type "belongsTo") (name "client")
                             (foreignModelClass "Users@com_fakeapp")))
                          " Behaviour setup. Use merge=\"1\" to merge with already defined behaviours. ")
                         (behaviors (@ (merge "1")) "foo,bar,baz") "
                  (*comment* " Controller, View and Toolbar setup. One tag per view. ") "
                  (view (@ (name "item")) "
                        (*comment* " Controller task aliasing ") "
                        (taskmap "
                                 (task (@ (name "list")) "browse") "
                        (*comment* " Controller ACL mapping ") "
                        (acl "
                             (task (@ (name "dosomething"))) "
                             (task (@ (name "somethingelse")) "core.manage") "
                        (*comment* " Controller and View options ") "
                        (config "
                                (option (@ (name "autoRouting")) "3") "
                        (*comment* " Toolbar configuration ") "
                        (toolbar (@ (title "COM_FOOBAR_TOOLBAR_ITEM") (task "edit")) "
                                 (button (@ (type "save"))) "
                                 (button (@ (type "saveclose"))) "
                                 (button (@ (type "savenew"))) "
                                 (button (@ (type "cancel"))) "
          (*comment* " Component backend options ") "
          (backend "
                   (*comment* " The same options as Common Settings apply here, too ") "
          (*comment* " Component frontend options ") "
          (frontend "
                    (*comment* " The same options as Common Settings apply here, too ") "
But I think that rather proves my point: XML obscures that which should be obvious.

(and apologies for these terribly vertical posts — I think that they go a long way towards demonstrating the need for a compact information representation).

0: http://okmij.org/ftp/Scheme/xml.html#SXML-spec

If you think the whitespace is bad, just wait till you see the implicit casting: https://hitchdev.com/strictyaml/why/implicit-typing-removed/

I end up hitting bare-string implicit-casting problems constantly. I also end up catching them in code review when coworkers do it constantly, and yet I still end up doing it too. This might be the best example of why YAML is overengineered garbage (that, and the fact that the spec is 83 pages long and has pointers… WTF?).

God, I hate programming sometimes.

Stick with a config format that isn't overengineered and too clever for its own good, like... anything but YAML?

Unfortunately, this isn't practical. For any of my own tools I will never use YAML, but I don't just use my own tools, and reinventing the wheel just to not use YAML has its own problems which are (in some cases) worse.

You can always use a subset of YAML (much like we do with JavaScript these days).

That's a nice thought, but it comes with its own problems. For example, XMPP uses a sane subset of XML, which is nice, except that people throw full XML parsers at it (because why wouldn't you? Your XML library parses XML, and limiting it is more work for the developer to do) and then end up with vulnerabilities they don't know about, like entity-expansion DoS attacks or SYSTEM directive stuff (and YAML has lots of tricky behavior that can be abused too, like pointers).

Using a subset creates more work for the developer, so many just won't bother (if they even know that it's using a subset and they have to do more work at all), which leads to issues.

We can agree to generate a subset - the things we won't add to a certain stream - even if we are more lax in what we accept by using full parsers.

We can agree, but that doesn't mean that others won't do it anyways; unless you operate in a silo, it's not likely that you're the only one writing software to use your system.

Standards get written and implemented by lots of people, and even tooling like Docker gets alternative implementations.

That might solve one problem but fighting your tools and occasionally hating the complex giant messes we've engineered is a fact of life for any programmer. I absolutely love programming, just not always the process.

We get better and better tools each year but it still seems unavoidable. We're ultimately building incredibly complex systems with each layer using multiple development approaches, style choices, language choices, degrees of quality/time investment by the creator, etc.

TL;DR: you can't help but bang your head against the wall in any real-world day-to-day programming.

This is my favorite collections of reasons that yaml is a bad config file format: https://github.com/cblp/yaml-sucks

> it's as bad or worse than XML for config files

Let's agree to disagree here. No human should ever write XML. No human should ever be forced to read it.

YAML is very readable and writable if you stay away from the corners. Templating allows you to stay clear of the corners (the 1 char operators that concatenate stuff, b64 stuff and so on).

> the 1 char operators that concatenate stuff

But they're super useful. Some examples from Ansible.

    # Folded scalar (>): newlines become spaces, handy for long command lines.
    # ("somecommand" is a placeholder here.)
    - name: Do something annoying.
      command: >
        somecommand
        -a yep
        -c it

    # Literal scalar (|): line breaks and indentation are preserved exactly.
    variable_that_I_need_to_preserve_whitespace: |
      line one
        indented line two

But they are less than readable.

File-based configs are a troublesome abstraction: they package unrelated concerns into a rigid document whose form must take a particular, application-dependent shape, and the assembly and disassembly of that document essentially becomes an API where key-value pairs are mixed with complex glue code. The application has to do this internally, but anyone who's generating their configs is also doing parts of this externally.

Templates try to bandage over that by drilling down the abstraction to key-value pairs themselves. And imperative constructs that sneak into templating languages are an artifact of wanting to gain expressiveness without losing the benefits of declarative form -- but really, the two are at odds.

YAML is a red herring -- we had the same headaches with XML a decade prior. The problem is always that there's relationships among the data (or even multiple instances of the config) that we care about, but that the structure of a single config file at rest cannot model.

Databases -- let's say, an SQL one -- are actually among the better solutions, because they allow the universe of config items to live in structured places without overspecifying the exact form the data must take when serialized into a file. Then, data can be normalized where it makes sense to avoid repetition and introduce propagation. An SQL database gives all the tools needed to accomplish this, using mostly declarative code.

Databases in a key-value sense are often used for configuration, and SQLite's rise has made richly structured configs (specified at a higher level than is typical with other serialization formats) more common, but the full approach has not caught on outside big enterprise systems and complex applications. Which is a shame, because it's hardly more complex than the current awkward pairing of a full serializer and a templating engine.
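A minimal sketch of the idea in stdlib Python with SQLite (the schema, table names, and keys are all invented for illustration): shared settings live in one normalized table, each environment stores only its deltas, and a declarative query produces the effective config; no templating involved.

```python
import sqlite3

# In-memory database standing in for a config store (schema is hypothetical).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE defaults (key TEXT PRIMARY KEY, value TEXT)")
db.execute("CREATE TABLE overrides (env TEXT, key TEXT, value TEXT)")

# Shared settings are defined exactly once...
db.executemany("INSERT INTO defaults VALUES (?, ?)",
               [("cpu_limit", "500m"), ("log_level", "info")])
# ...and each environment only records what differs from the defaults.
db.execute("INSERT INTO overrides VALUES ('prod', 'log_level', 'warn')")

def config_for(env):
    """Overlay per-environment overrides on the defaults with a plain join."""
    rows = db.execute("""
        SELECT d.key, COALESCE(o.value, d.value)
        FROM defaults d
        LEFT JOIN overrides o ON o.key = d.key AND o.env = ?
    """, (env,))
    return dict(rows)

print(config_for("prod"))  # cpu_limit inherited, log_level overridden
```

Normalization gives the de-duplication people reach for templating to get, and the "merge" semantics are an ordinary declarative query instead of string splicing.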

SQL as a configuration format is not that bad of an idea

Wait, what?

I feel this article is missing the bigger problem - one that for some reason just cannot die.

The problem is that of gluing strings together. YAML is not an unstructured text file, it's a tree notation. Whatever "templating" or "generation" mechanism you want to use, it needs to respect the tree nature of the language it operates on. It needs to respect semantics.

Gluing strings together is literally what causes SQL injection to exist. It caused countless defacements on the web, and countless broken websites. I would think we'd learned our lesson, but for some reason, I see these template languages still alive and kicking.

The article goes on to talk about Jsonnet, which takes the exact approach you describe - it generates JSON by aiming to be a "templated JSON" where the templating involves generating semantic objects, not strings.

Here's an example (adapted from some real-world code) where I specify the k8s cpu limit in one place, and then look up that info in several other places to avoid needing to change multiple values later:

      {
          local container = self,
          requests: {cpu: 5.5, memory: "2G"},
          limits: container.requests + {memory: "4G"},
          environment: [
              {
                  name: "NUM_THREADS",
                  value: std.toString(std.ceil(container.requests.cpu)),
              },
          ],
      }
Note how I can patch the container.requests object with an alternate memory limit, and how I can calculate an expression for the NUM_THREADS value in order to automatically set it to ceil() of the requested cpu.
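For instance (assuming the object above were bound to a name like webContainer, which is invented here), a caller can override one field and let every derived value follow:

```jsonnet
// Bump only the cpu request; limits.cpu and NUM_THREADS recompute
// automatically, because `self` is late-bound at evaluation time.
webContainer + { requests+: { cpu: 8 } }
```

The `+:` merges the patch into the existing requests object rather than replacing it, so memory stays at "2G".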

(edited for nicer formatting of the code)

It’s DRY run amok. People don’t want to see the same bit of anything in two places, forgetting that code will be read, so they remove a “redundancy” but create duplicated effort for everyone, every time they have to decipher the thing later.

I also don’t think we have a workable definition of “configuration”. 70% of the config at my work is hard coded service discovery. If we moved the service discovery anywhere else (say, consul, kubernetes, hell - Docker swarm), we’d need far fewer sets of config than we have deployment environments. When there are only two or three you don’t need templating.

How often do you really have the same service deployed twice in prod and legitimately want it to work differently? I can count the scenarios I know on one hand and none of them have occurred for me in almost ten years, except read replicas and that shouldn’t be more than a few lines of config.

There have been times I've wanted to templatize my configuration but I don't want to do it with text-based templates but templates within the configuration files syntax (be it yaml, toml, or something else). Not sure what this is called, I've been calling it "structural templating".

So far the only things close to this are

- Azure Pipelines' syntax: https://docs.microsoft.com/en-us/azure/devops/pipelines/proc...

- Something called Jasonette: https://docs.jasonette.com/templates/

- Something called Jsonnet: https://jsonnet.org/

Azure Pipeline's approach I think is closest to what I've been looking for.

Anything else in this space?

I prefer https://github.com/taskcluster/json-e

Has the advantage/disadvantage that it's still valid json/yaml
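As a rough illustration (field names and context values are made up here), a json-e template is ordinary YAML/JSON until a renderer substitutes `${...}` and `$eval` against a supplied context:

```yaml
# Rendered with a context like {environment: "prod", replicas: 2}:
service: webapp-${environment}
replicas: {$eval: 'replicas * 2'}
```

Because the template itself is still plain YAML/JSON, it parses and lints like any other document before rendering.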

I wrote a scary command line wrapper for it:


Has libs for Python, Go, and JS, and there's a bazel interface.

This looks really nice. I'll have to give this a try at some point to see how I feel about it vs Azure Pipelines. From my quick look, this looks more general purpose at the cost of more verbosity.

plug for json-e i found it much less painful-looking than jsonnet

Dhall-lang ( https://dhall-lang.org ) is another, somewhat interesting, attempt to solve this problem: it comes with a non-Turing-complete programming language, so you can bring some abstraction to your configuration files without having to worry about things like infinite loops.
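A tiny sketch of the flavor (all names invented): functions give you abstraction, and because the language is total, evaluation is guaranteed to terminate:

```dhall
-- A reusable constructor instead of copy-pasted stanzas:
let mkService =
      \(name : Text) ->
      \(replicas : Natural) ->
        { name = name, replicas = replicas }

in  { services = [ mkService "web" 3, mkService "worker" 1 ] }
```

The output normalizes to plain data, which Dhall can then export as JSON or YAML.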

The real question is why are we using yaml at all?

Easier to read and write than JSON.

JSON requires constant quoting, can't support multiline strings, has no comments, has no/little typing (e.g., no datetime type). It's not good if a human needs to encode data.

For configs, I think TOML beats YAML hands down; I think YAML's spot is at encoding data structures that humans need to read/write.

I do agree that YAML, the spec, is fairly complicated. But YAML, as used in most projects, by most people, is not, and can be picked up fairly quickly. It is easier to visually read as it removes much of the clutter that would exist in the comparable JSON. It isn't typically necessary to know the entirety of the YAML spec to be useful with YAML, and most of the parts you won't know will get introduced by an obvious-looking sigil, which can be used to figure out what you're dealing with.

When I've actually sat with folks struggling with YAML, it's almost always in configuration tools, and it's also always around the templating bits. Ansible, in particular, has a bizarre templating: it happens after YAML parsing, which is not the mental model most people use when approaching it. I've also found that most of the people I've spoken to intertwine YAML and Ansible's templating functions, thinking they're one in the same.

I do not think Ansible makes good use of YAML: I would rather write task files in an actual programming language, since they are — at their core — a program. (The tasks do have some metadata attached to them, but the core task itself is a program. A function in some real language can get metadata attached to it in a number of ways, and that would be a better solution.)

YAML quoting is only optional if you're obsessive enough to always remember to quote the country code for Norway.
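(For anyone who hasn't hit it: YAML 1.1 resolves unquoted NO, on, yes, and friends to booleans.)

```yaml
countries:
  - FR     # the string "FR"
  - DE     # the string "DE"
  - NO     # a YAML 1.1 parser reads this as the boolean false
  - "NO"   # quoted: actually Norway
```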

>For configs, I think TOML beats YAML hands down

If you compare a medium sized, 3 level deep TOML document with an equivalent YAML document you'll find that the TOML document is up to 50% longer.

None of that additional verbosity adds meaning - or readability.

Part of this is the extra syntactic noise, but most of it is because every key's full path has to be spelled out explicitly in the table headers above the values, whereas in YAML you just need to put the values below an indent.

TOML is tolerable for small, very lightly nested config files but even medium sized TOML configurations get ugly very fast.
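For example (a made-up config), every nested TOML table restates its full key path, which is exactly the verbosity being described:

```toml
# Each table header repeats the full path from the root:
[services.web]
image = "nginx"

[services.web.resources.limits]
cpu = "500m"

[services.api]
image = "api:1.2"

[services.api.resources.limits]
cpu = "250m"
```

The equivalent YAML carries the same structure in indentation alone.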

It's worth giving Ansible some credit here. Alternatives like Chef can and do contain arbitrary Ruby logic, which can make Chef "cookbooks" horrible to figure out and debug.

So while Ansible may not be great, other solutions also have drawbacks, and it isn't quite as easy as do <x>.

I've worked with both. The simple items are nicer in ansible but when there are a lot of custom roles and functions ansible isn't much better than chef from what I've seen in practice.

I really like TOML as a compromise between "looks pretty" and "parses well".

That being said, I find it funny that the author is crying foul, saying that YAML isn't as well suited to templating as JSON is, when XML had schemas sorted out ages ago.

People ran away from XML, saying it was this verbose, ugly, bandwidth-hungry (back when we were sending XML over AJAX) behemoth (which it totally is)... but I think when your use case is complicated enough to worry about templating, you should take a hard look at XML and ask whether it might suit you better.

Baffles me. I don't like any language or file format where whitespace matters. Even Haskell bothers me in this regard. To me white space shouldn't add cognitive load. I want to look at the symbols not the formatting of the antisymbols to understand what is going on.

Write JSON and use your editor tools to format it with nice indentation, and you are sweet!

That said, YAML makes an excellent format for reading, but not for writing.

I personally love Python where whitespace matters a lot too. But I regularly find myself fighting with the whitespace in yaml for some reason.

To me yaml (especially with templating) seems like a very contrived way of avoiding having to program, while still effectively programming. I much prefer json, or if more intelligence is required, actual javascript objects.

The only big downside of json where yaml does shine is the support for comments within your files.

White space matters with Python but I rarely run into issues with indentation in Python. But YAML on the other hand, I have nothing but nightmares.

I have the same experience. I just think it is possible to design easy-to-write indentation-sensitive formats, but YAML is not one of them. For example, it always baffles me that

    a:
    - b
    - c
has a list inside an object but the list is not further indented. There's in fact a hierarchy relationship but absolutely no indentation.

You can indent it though, but the hyphen already means a list item.

That is a preferred syntax for many that are used to YAML though.

The hyphen is associated with the a though. In what's posted, a is a list, not a map - the type of the parent is being declared by that hyphen.

That's kind of my point. Indentation is optional, so many don't indent, making the resulting document harder to understand.

FWIW, I think yaml / ansible linters and style checkers typically require indentation.

That's at your option, though. It's perfectly valid yaml to indent the list as many spaces as you want.
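For instance, these two documents parse to exactly the same structure:

```yaml
a:
- b
- c
---
a:
  - b
  - c
```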

I think it's because Python has very strict and very simple indentation rules (a nested block must be indented, and the file requires consistent indentation. Is there anything else?)

YAML gives you options, or varies necessity somewhat arbitrarily on the structures, which is (marginally) good for reading, but a lot of headache in writing

Also, the deeper a YAML file's nesting gets, the bigger your indentation gets. It gets progressively worse as you go.

In python that’s been a non-issue, but I’m not sure YAML can reset indentation as python does with functions

I never quite understood this attitude. Not treating whitespace as meaningful means it's necessary to have a longer, syntactically noisier file.

That's fine if you're primarily concerned with computers exchanging data (JSON) but where readability and writability matters, extra syntactic weight is a headache.

I can't imagine writing JSON by hand but I write plenty of YAML by hand every day without issues.

Because deleting a tab could entirely change the meaning of the file, breaking everything, with no compile time errors. IDE tooling or linting might go somewhat to make me more comfortable with YAML.

> I don't like any language or file format where whitespace matters. .... To me white space shouldn't add cognitive load.

1. there is no language where whitespace does not matter

2. the very purpose of indentation is to ease cognitive load.

Now, you may not want it to be inflexible or mandatory but your argument needs elucidating.

Ok: I don't like any language or file format where the regex s/\s+/ /g would change the meaning of the program, except within string delimiters, and except for the parts of strings which contain literal code, e.g. in ES6 `Hello ${world}`.

So you prefer languages where that regex only appears to change the meaning?

Non-semantic white space can be deceptive. Semantic whitespace isn't.

OK, but I'd restrict the set even more to exclude voluntary indentation. I mean, you'd want that caught by style checkers anyway, even if the language didn't require it.

So we're left with the cases where you'd like compact code / one-liners but are forced to use multiple lines for language syntax reasons.

I'd say that that's a pretty small set of cases and long way from the phrase "I don't like whitespace sensitive languages" which you (and many others!) use.

I conceded it can be annoying in Python and I do wish there was an escape route sometimes.

Experienced programmers use decent editors to indent their code anyway, but at least with Python you don't see code with indentation wandering left and right at random.

The best thing that ever happened to Cloud Formation was the new YAML syntax.

Writing YAML is easy with a good editor like VSCode. Install the YAML code outline extension for sugar; works great for OpenAPI specs. YAML flow style offers some good options for keeping the file compact.

I'm all for good editor support, but you shouldn't need it to write a simple document. Complicated editors should be supporting tools at best, because you won't always be in a position to use one. The things we write should be simple enough to be written and understood by hand with minimal mistakes (then we can make it even easier in some cases using nice editors and tooling).

I don't think I've ever used anything other than vanilla vim for editing YAML files... with next to zero issues.

The problem for me is that the multiple ways to do the same thing make it pretty hard to write clear specifications for anything beyond very simple examples / structures.

Me too, except it's just always wrong the first time… drives me nuts. Maybe it's just me.

It's a lot easier for end-users to read and write by hand. If that is not a requirement, a simpler machine-friendly format is likely better.

Writing YAML is not easy for end-users. The indentation--especially in large files--is really difficult to keep track of, and some really basic stuff like when to quote strings is not consistent at all.

It definitely is. You have no idea how intimidating an equivalent JSON file would be for a new user creating a pretty simple k8s setup.

YAML hides most of the scary brackets and quotes so that new users can focus on copy-and-pasting semantic tidbits.

That said... I don't like YAML...

Nope, indentation is not a problem with indentation guides. A misplaced comma in a JSON file however kills it, gets me every time I have to write it by hand.

Also, it is true YAML has too many features, though I find they are typically ignored or disabled.

A misplaced comma is also not a problem in an editor with syntax highlighting, but between the two, indentation could take longer to fix.

Syntax highlighting doesn't count commas. The problem is JSON doesn't allow trailing commas, so adding or reordering frequently results in a difficult to diagnose error.

Finding out if a set ends in a comma or not still makes it easier to debug than YAML.

Not my experience, 10x more trouble with long JSON.

But the simpler format of JSON should be easier to write and read.

No comments, constant need to quote every string, inane comma requirements, no multiline string, little typing support.

Many languages require strings to be quoted, and calling the comma requirements inane is an opinion, not an objective fact.

You're not saying anything about the readability of one versus another, you're just listing gripes you have about JSON.

> Many languages require strings to be quoted

The languages we're comparing here are JSON and YAML. The latter does not require quoted strings, except in ambiguous cases. (And even then, actually, it technically isn't required, though it is usually the easiest thing to do.) The absence of these quotes makes the syntax get out of the way of the reader, and makes YAML comparatively easier to read.

> comma requirements inane is an opinion, not an objective fact.

I boiled it down to a simple statement, but disallowing a comma at the end of repeated grammars means that adding something to that line causes unnecessary noise in the diff. For example, say I add a single item to the end of a JSON list. The diff I would love to see is:

  +   c,
which makes it rather straight-forward to the code reviewer / reader of the diff that we're simply adding a single item c. But JSON's grammar forces this diff:

  -   b
  +   b,
  +   c,
which makes it harder to see what the semantic change is, because simple syntactic changes are now clouding the picture. Also, I find that people's mental model of the task ("add item to end of list") causes them to forget that they need to add a comma to the item above it, resulting in syntax errors down the road.

(Better diff tooling can help here, but often I find we have to work with the most primitive of tooling; if the grammar can lend itself towards such simple tooling, s.t. that tooling is more effective despite its simplicity, why not? And here, we can: grammatically, whether the list ends or does not end in a comma is rather meaningless, and JSON (and for a long time, JS) were pretty alone in this opinion. Most other languages allow that trailing comma. The only other one I can think of is SQL.)

> You're not saying anything about the readability of one versus another, you're just listing gripes you have about JSON.

The gripes I have about JSON don't apply to YAML. Hence, that's why I prefer using YAML, which was the original question.

> The latter does not require quoted strings, except in ambiguous cases.

I believe it is the right choice to always quote strings; there should be various quoting formats since no single one is perfect (normal strings with reasonable escapes, raw strings, multiline-with-indentation, multiline raw strings...), and you should be able to use unquoted keys (TOML strikes a good balance on this problem), but unquoted strings as a default are unnecessary and problematic.

You post a comment describing one alternative as "simpler" and "easier to read" compared to another.

Respondent lists five specific reasons in support of the opposite conclusion.

You dismiss those as "opinions, not objective facts". Ignoring that the same could be said for your parent comment, to a much greater degree.

No comments is... fair. That does put a limit on the legibility of JSON files, unless you want to count dumb hacks like comment strings.

Having to quote strings in JSON, however, is still simpler than the multitude of ways strings can be declared in YAML. You know a string is a string in JSON because of the quotes... knowing whether something is a string or not in context is more difficult in YAML because of the more complex syntax.

You can learn the entire syntax of JSON in minutes. Objects, arrays, string keys, and a few primitive types... that's it. How long would it take to learn all of YAML? How explicit is its syntax versus JSON? Of course JSON is far simpler, and being simpler, it's easier to read.

I dismissed "inane comma requirements" as an opinion, not everything the commenter said. The only reason it's "inane" is because the commenter doesn't like it personally.

Not having multiline strings, to me, doesn't affect readability much at all, although it is unfortunate. Turn on text wrapping in your editor, it's the same thing.

Typing support doesn't affect readability either. I'd like to have a date type in JSON too, but something like

    "date": "2019-02-07"

or whatever is just as readable as

    date: 2019-02-07

Parent's points are exactly the reason I'd never consider JSON as config format.

Not when you take into account all the clutter in JSON: braces, brackets, quotes. Those are extra things to type and extra things to have to filter out when you read.

See sibling comment.

I think it is partly momentum - at this point most CI systems use YAML in spite of its insanity - even Azure Pipelines!

And partly it's because the syntax for multiline string literals is very minimal, which is kind of a nice feature for CI since you tend to have a lot of them.

It's still insane though. TOML is much more reasonable.

I've taken the strategy of emitting JSON, but accepting YAML input (maybe a restricted subset in cases where it's untrusted data). YAML can function as a super-set of JSON, so you can have comments in hand-written/modified data this way while emitting a simpler to parse data format.
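For example (made-up data), the machine-emitted form can stay plain JSON, which any YAML parser also accepts, while hand-maintained input gets YAML's niceties:

```yaml
# Emitted: plain JSON, which is also valid YAML.
{"service": "web", "replicas": 3}
---
# Accepted: the same data, hand-edited with comments.
service: web
replicas: 3   # bumped for launch week
```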

Because json is poorly suited for config files, and it is very convenient to have config files that closely mirror apis, like kubernetes does. I've seen too many teams who try to use json for config end up developing yet another custom superset of json to allow comments or multiline strings or such. If you are going to use a superset of json might as well use an existing one like yaml or json5.

That said, I agree with all the criticisms of templating yaml. We have to do the same (with helm and other tools), and I have pushed hard to adopt conventions that we only use flow-style and not block-style, to avoid all the whitespace problems when splicing together chunks of yaml. And on the plus side we get trailing commas and other such niceties which don't exist in json and make it harder to template.

Anyone working with CloudFormation would wonder why we ended up writing JSON.
