Configs suck? Try a real programming language (beepb00p.xyz)
289 points by gyre007 on April 5, 2020 | 343 comments



This just sounds like solving a problem at the wrong level and making it worse. If your config is so complicated that it warrants a full-blown language, the logic should go into the main program that reads the config and decides what to do, not left inside config.

Having a programmable config is how you end up with horrible interdependencies where function f() does foo, and you can't figure out why it's not blowing up, until you realize the only caller is dynamically configured in staging_env_4.json which is emitted by config_generator.py which always ensures the precondition for foo, but only if it's also generating prod_env.json at the same time, but it's OK because we always do that anyway, and you're not sure if you need another cup of coffee or whiskey.


> Having a programmable config is how you end up with horrible interdependencies where function f() does foo, and you can't figure out why it's not blowing up, until you realize the only caller is dynamically configured in staging_env_4.json which is emitted by config_generator.py which always ensures the precondition for foo, but only if it's also generating prod_env.json at the same time, but it's OK because we always do that anyway, and you're not sure if you need another cup of coffee or whiskey.

That sounds like the opposite of what the article describes. The point of having configuration as code is having all code related to config in a single file. There is no config generation, nor config loading.


> The point of having configuration as code is having all code related to config in a single file.

This makes no sense, because the program has to understand the config in order to use it. That means the code that defines what should be in the config and how it's structured has to be in the program, not the config.


No -- you're confusing generation and parsing. Your program is still (often, but not necessarily) going to consume a static data structure. Properly handled, using code to generate a config at runtime can reduce the number of `if` statements and weird logic you have inside of your main program.

Purely as an example, let's say I'm writing a program that wants to consume a list of files in a JSON config.

  {
    "files": ["/path/a", "/path/b"]
  }
Now, let's say I want to pass in a wildcard pattern, or I want to exclude a specific directory, or I want to dynamically load the file paths based on a REST request, or some kind of other complicated crap.

If I want to have a static JSON file as the config, I have a couple of options:

First, I can put logic inside of my program to handle wildcards, and to distinguish between filepaths and URLs, and to have exclude paths. This is not ideal because it makes my program much more complicated, and it's not even complication for a particular useful reason. 99% of the time I'm not going to need those options, so I'm writing all of this code to try and anticipate weird inputs that most people don't need. It also forces me to make opinionated decisions about stupid stuff like, "what should the glob format be? What Regex variant are we using?" None of these are questions I care about or want to answer.

The second option, which would be a lot better, is I could have a 3rd-party program generate a config on the fly, write it to disk, and then launch my actual program when it's done. This means I don't need to have a bunch of extra logic in my main program, and everything is a lot less opinionated.

BUT, it also means that the logic to launch my program is split up across multiple files, and now I'm adding some weird dependencies to my runtime. It's a hacky solution that's layering a bunch of complexity on top of what should be a simple, "launch the program."
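
For concreteness, a rough sketch of what that second option tends to look like as a Node wrapper script. Everything named here (`generateFileList`, `writeConfig`, `launch`, `my-program`, the `--config` flag) is hypothetical, and the generation step is stubbed with a hard-coded list rather than a real glob or REST call.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { spawnSync } = require('child_process');

// Hypothetical generator: real code might glob a directory or call a REST API.
function generateFileList() {
  return ['/path/a', '/path/b', '/tmp/scratch'].filter(
    (file) => !file.startsWith('/tmp/')
  );
}

// Write the generated config into its own temp directory, so that two
// launches running at the same time do not overwrite each other's file.
function writeConfig(files) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'myprog-'));
  const configPath = path.join(dir, 'config.json');
  fs.writeFileSync(configPath, JSON.stringify({ files }, null, 2));
  return configPath;
}

// Start the real program (hypothetical name and flag) once the file exists.
function launch(program, configPath) {
  return spawnSync(program, ['--config', configPath], { stdio: 'inherit' });
}

// Usage (not run here): launch('my-program', writeConfig(generateFileList()));
```

Even in this small form, the launch logic now spans a wrapper, a temp file, and the program itself.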

----

So, what's a third option? I write my config file with code, which my program will execute, and which will return the original static config that's nice to consume.

  module.exports = {
    files: [
      // some big list of files, maybe populated with a glob.
      // exclude any filepath that contains the letter 'c'
    ].filter((file) => !file.includes('c')),
  };
I still need the code to spit out an array of files at the end, but I can do that after making some network requests, or running custom filter functions on that array, or doing really whatever the heck I want.

In many cases this does not introduce any new dependencies to my runtime. It's a lot less error prone -- I can launch multiple instances of my app at the same time without worrying about files overwriting each other. It's also a lot cleaner and easier to refactor, because all of my filter functions and Regexes and generation logic live in one file.

And most importantly, for users who aren't taking advantage of the dynamic nature of the config, I'm still ultimately shipping a smaller, simpler program with a less complicated API. I don't need to ship a bunch of error-prone logic dedicated to reconciling 20 different flags or options from random pull requests.
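
On the consuming side, a minimal sketch of what "my program will execute the config" could look like in Node. `loadConfig` and the `my-config.js` filename are made up for illustration, and the temp file written here only stands in for a config a user would have written by hand.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical loader: run the user's config module and take whatever plain
// object it exports. The program never sees the generation logic itself.
function loadConfig(configPath) {
  return require(path.resolve(configPath));
}

// Stand-in for a user-written config file. Any code is allowed in it, as long
// as the export ends up being a static object.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cfg-'));
const configPath = path.join(dir, 'my-config.js');
fs.writeFileSync(
  configPath,
  "module.exports = { files: ['/path/a', '/path/b', '/path/c'].filter((f) => !f.includes('c')) };\n"
);

const config = loadConfig(configPath);
console.log(config.files);
```

From the program's point of view, the result is indistinguishable from having parsed a static JSON file.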

----

Now, taking this a step further, you don't necessarily need to return a static data object. If I'm building a static site generator, maybe instead of embedding a bunch of logic in my system to load/parse Pug templates, instead I just consume a function from the config that takes a string and returns a string.

And the reason that might be attractive (beyond what I list above) is that the programmatic APIs for a library like Pug are already just as short and easy to use as any set of YAML options you can come up with. So instead of forcing your users to learn your new APIs to slot in a custom template engine, they just use the APIs they already know and enjoy, and your documentation is shorter and easier to read, and everyone wins.

And again, importantly, getting rid of that extra logic has reduced the amount of code in your main project and made stuff like validation and loading easier to reason about and easier to debug.

The way you want to think about a config that returns a function is the same way you already think about functions-as-parameters in functional languages like Lisp or Javascript. The trick is realizing that there is no real difference between a config file on a filesystem and an API for a library you're importing at runtime.
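
A rough sketch of that idea, assuming a made-up `render` option. A toy line-wrapping function stands in for a real engine like Pug so the example runs without dependencies; `buildPage` is hypothetical too.

```javascript
// Stand-in for the user's config file. In practice `render` might be a
// one-line wrapper around a real template engine's own API.
const config = {
  // Takes a string, returns a string: wrap every line in a <p> tag.
  render: (source) =>
    source
      .split('\n')
      .map((line) => `<p>${line}</p>`)
      .join('\n'),
};

// The generator's side: check that `render` is callable, then just call it.
// No engine-specific loading or parsing logic lives in the main program.
function buildPage(cfg, source) {
  if (typeof cfg.render !== 'function') {
    throw new TypeError('config.render must be a function');
  }
  const html = cfg.render(source);
  if (typeof html !== 'string') {
    throw new TypeError('config.render must return a string');
  }
  return html;
}

console.log(buildPage(config, 'Hello\nWorld'));
```

Swapping in a different engine then means changing one line of the config, not adding an option to the generator.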


> you're confusing generation and parsing

I understand the distinction you are making, but I don't think it's what the article is illustrating in its examples.

> I write my config file with code, which my program will execute

If you, the developer, are writing this code (for example, to provide some pre-packaged standard configs for different use cases, and the user just selects which one they want to use), that's one thing.

If your users are writing this code, you have a whole new set of problems that you don't have if users can only provide text config files in a safe format.

I can see providing a separate config editor program users can use to generate configs for your main program; you can write the code for that program to ensure that the generated config files are safe. But such a program would be generating configs separate from the running of your main program.

If your configs are really complicated enough that they need to be generated at the time your main program runs, to me that means your code needs refactoring. Either that, or what you are calling a "config" is really a plugin/extension/scripting system, where your users can write their own code and your main program is more of an interpreter for it. That's a whole separate use case that I don't think is well described by the term "config". Certainly I would say that's the case if your "config" can return a function to your main program.


>Either that, or what you are calling a "config" is really a plugin/extension/scripting system

Based on Greenspun's 10th rule (Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.), perhaps any sufficiently complicated config system will contain a badly designed plugin/extension/scripting system inside of it.


> If your configs are really complicated enough that they need to be generated at the time your main program runs, to me that means your code needs refactoring.

I'm not sure that's true. In fact, I would go so far as to say for some programs, it's the opposite.

In the example I describe above (forgetting about lambda functions or anything like that) I'm shipping a program that consumes a small set of very rigid input options. I consume a single array of strings that need to resolve to file paths.

That's incredibly easy to validate: that's the type of limited input where I can be pretty certain that my program will not have any bugs or vulnerabilities around the input. It's also an incredibly simple API to teach to users, it's literally one option. And importantly, my program is still pretty powerful -- you can do whatever you need to do to get that file list without me needing to extend what input I accept.

I can't think of a good way to convert that to a pure JSON/YAML config without either making my software more error prone or making my software much less useful.

If returning lambda functions from a config file scares you, forcing the config to be a static object literal but allowing that object literal to be generated at runtime will still (often) result in a runtime that is less error prone, easier to validate, and easier to debug when something goes wrong. This is because at the end of the user's dynamic config, they'll still be returning a small, easy-to-understand object literal -- whatever logic was used to build it.
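
To make the validation claim concrete, here is roughly all the checking such a program would need. `validateConfig` is a hypothetical helper, and the "must be an absolute path" rule is only one assumption about what "resolve to file paths" might mean.

```javascript
const path = require('path');

// Hypothetical validator: the program only ever accepts { files: string[] },
// no matter how the user's config code produced that object.
function validateConfig(config) {
  if (config === null || typeof config !== 'object') {
    throw new TypeError('config must be an object');
  }
  const extraKeys = Object.keys(config).filter((key) => key !== 'files');
  if (extraKeys.length > 0) {
    throw new TypeError('unknown config keys: ' + extraKeys.join(', '));
  }
  if (!Array.isArray(config.files)) {
    throw new TypeError('config.files must be an array');
  }
  for (const file of config.files) {
    if (typeof file !== 'string' || !path.isAbsolute(file)) {
      throw new TypeError('not an absolute file path: ' + JSON.stringify(file));
    }
  }
  return config;
}

console.log(validateConfig({ files: ['/path/a', '/path/b'] }).files.length);
```

There is no glob syntax, exclude ordering, or URL handling to validate, because none of that ever reaches the program.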

If instead you put that logic in your main program, and you accept globs for files, and you start doing complicated things like allowing arrays of multiple globs and file excludes -- well, suddenly you have a big chunk of parsing logic in your main program that end-users can't debug when something goes wrong. It becomes harder to validate whether or not you have some bug around the order that patterns are passed to you, or whether there's an error if you try to use a globbed filepath as an excluded filepath, or whatever.

In other words, it is usually better to use procedural logic to generate declarative logic than it is to use declarative logic to generate procedural logic. Of course, your program might be simple enough that you can get away with a config file that's turned into a bunch of `if` statements when it's parsed. But it probably would be simpler if you got rid of as much parsing code as possible and focused on more rigidly validating a smaller, stricter set of options.

> I can see providing a separate config editor program users can use to generate configs for your main program

If you ignore the stuff about lambda functions, that's exactly what we're doing. We're just piggybacking on a real language to do it instead of wasting time writing our own graphical/command-line utility.

----

Where returning functions is concerned:

> Either that, or what you are calling a "config" is really a plugin/extension/scripting system

I think this is a kind of blurry line. I'm not going to say there is no difference between a plugin system and a config, but I have run into programs that have both configs and plugin systems, and where the plugin system exists mostly to deal with the fact that the config API is not very useful.

As a user, given the choice between learning one API and two, I usually prefer to learn one API. A lot of static site generators come to mind; I don't want to learn two different APIs for a templating engine just so I can change how links render. I don't want to learn both your plugin API and your config API. Just give me one, single API for templates that can be fully described in a single paragraph of text.

I dunno. I've been doing functional programming long enough that the idea of lambdas just doesn't scare me. Functional paradigms are wildly useful when building library APIs that get imported; I don't see why we should throw them away just because our API lives in a dotfile.


> I consume a single array of strings that need to resolve to file paths.

And I'm finding it very hard to imagine why your users would need to write code to give your program those paths, instead of just giving you a text file with a list of file paths in it. Of course I don't know anything about your program or its use cases other than what you've said here. But I do know that if your users are writing code, what you're describing doesn't look to me like a config. See further comments below.

> that's exactly what we're doing

Making your users write actual code is not the same as providing them with a config editor.

> I think this is a kind of blurry line.

I think the line is pretty clear: are your users expected to know how to write code, or not?

If they're not, then why would they need the ability to write code to generate your config, since they can't do it anyway?

If they are, then as I said, you don't have a static program with a config, you have a scripting language and an interpreter. Your program is just a customized version of the standard interpreter for whatever programming language your users are expected to use.

I don't have any problem with this if it's appropriate for your users. I just don't think the word "config" is the right one to use to describe it.

> I've been doing functional programming long enough that the idea of lambdas just doesn't scare me.

Whether it scares you or not is irrelevant unless you are the only one writing configs for your program. If your users are doing it, the relevant question is what scares them, not you.


Sure -- if your users can't write code, then you obviously can't tell them to use code to generate a config.

But take a step back and look at a lot of the examples that people are bringing up here -- Docker, Babel, Webpack, Github Actions, code linters, testing libraries, Ansible. These are all environments where it's pretty reasonable to expect users to know how to write code.

If you're mostly getting hung up about me calling this a config, let me see if I can rephrase the original article's point in a way that is more palatable to you:

A lot of people are building programs explicitly for programmers that use config files, when those programs would be much better suited by instead dropping the configs entirely and using something more like scripted options.

These program authors struggle building a config schema that is powerful enough for programmers yet simple enough that programmers won't get irritated learning yet another multi-page API. What the authors have missed is that they shouldn't be trying to control their program with a config at all. Their entire product is geared at programmers, so they should just let programmers generate the options that they pass in.

There's very little reason for Babel to be setting options with a JSON config file instead of a JS script, because nobody uses Babel other than Javascript programmers. For these specific programs that are struggling to express themselves with YAML/JSON, a non-config scripted setup would be less error-prone and easier for their users to learn.


Keep in mind that config files are not just about writing them. They are also about reading, and in my personal experience, when I encounter a heavily scripted config file in some javascript project, my inclination is to flatten it out to a plain JSON file just so I can see what it's actually doing.

This is the beginning of the logic that led to things like Automake, and completely impenetrable makefiles. The problem with programmable scripted config isn't the writing of it. It's the reading and maintenance. Programmed config is a layer of obfuscation that I, as a maintainer, need to spend time and mental effort decoding to get work done.


I'm sympathetic to this, and I suspect the original author is as well (see their comments on Emacs). I think you're making a very good point that people need to keep in mind when they write configs.

However, very specifically where Makefiles are concerned:

> This is the beginning of the logic that led to things like Automake, and completely impenetrable makefiles.

The problem with Make in specific isn't scripting the build, it's that Make is a completely separate domain-specific language that isn't particularly well structured or particularly elegant in the first place.

The comparison I would draw is with Grunt.

Grunt is also a programmable config that allows you to do powerful stuff. It's also IMO a giant disaster and most projects shouldn't use it. Grunt is using programming to control how a project builds, but it's also tying tons of layers of abstraction into this secondary API that's unreasonably difficult to debug, and that lends itself to really bad, unnecessarily verbose control flows.

I was teaching interns how our build process worked at a company, and I had an "aha" moment when I realized that in the time I could teach them Grunt's API, I could also teach them how to write a maybe 100 line pure-JS script that did literally everything Grunt was doing for us.

When I look at Makefiles and Gruntfiles and cringe, very often the thing that makes me cringe is these giant pillars of abstractions and control flows of "program X calls program Y and pipes its output to program Z, and good luck intercepting that specific output to see what's going wrong".

There's something to be said for getting rid of these giant build systems that are spanning multiple processes and languages in general, and instead doing something very direct in one language in one file. And I kind of think that's a concern that cuts across different config types. At my current job, we're in the process of dropping Team City, because we realized that for our needs, we can get by with a single 30-ish line bash script that gets copied to each build machine and auto-run every 5 minutes. And Team City is using a lot of declarative options in a very structured web UI, but the Bash script is still a lot easier to debug and reason about than Team City's whole... thing.

Which is not to say you're wrong, I do think you're making a completely valid point. But I don't think the obfuscation you're talking about is inevitable, I think that's just a side effect of programmers thinking they're clever and trying to make generalized build systems for specific tasks.


It comes down to what exactly the configuration is for. Is it a few settings (server URLs, ports, table names, S3 buckets, directory paths, usernames and passwords), or are you actually scripting the tool? Build systems almost straddle the two worlds - they're kind of somewhere between config and scripting. At some point you gotta make the call: if the configuration file is hard to write, check first if it's a problem with the tool the configuration is for. Then second, check if what you're actually doing is programming, not configuring. This is repeating what others have written, but I solidly agree - at some threshold of complexity you've got to stop calling it "configuration", and start calling it a Greenspun's 10th:

http://wiki.c2.com/?GreenspunsTenthRuleOfProgramming


Specific to your comments on Grunt: mostly agreed with your analysis; IIRC, I felt it was part of why Gulp kind of ate Grunt's lunch when it came on the scene. But that was a long time ago.


This is exactly why Gulp was such a refreshing change of pace from Grunt. Gulp was "just JS" APIs, and made it _so much easier_ to understand what the heck was going on during build.


> These are all environments where it's pretty reasonable to expect users to know how to write code.

For those kinds of environments, yes, I agree that it makes a lot more sense to just have users write code in the same language that the main program is in, than to try to hand-roll a config file format and structure.

I didn't get the sense from the article that its scope was limited to these kinds of environments, but I may have missed something.


move over templated yaml files, get ready for templated python files. I heard you like config files, so I set up a config file for your config file so you can config while you config.


You might think you are joking, but that's exactly what happened with sendmail. Its configuration file is so complex (and I think Turing complete) that people came up with a completely different configuration file to generate the configuration file.

Thank god I no longer have to deal with sendmail.


I was making a joke that you can either laugh at or cry at.


Real talk though, at the point where people are building templating engines for YAML files, doesn't it become kind of obvious that YAML isn't powerful enough or expressive enough for at least some of the projects that are using it?

We've got a project in Python, but Python is too hard to read, so instead we have YAML. But YAML is too hard to write, so now we have YTT.

We've now taken a program that was written in one language, and magically transformed it into 3 languages instead :)


> We've got a project in Python, but Python is too hard to read

You said the kind of scripting system you were describing was for cases where your users can write code.

If your users can write code, I don't see why Python would be too hard to read.

If your users can't write code, then, as you agreed, you shouldn't expect them to, which means the kind of system you were describing won't work. And that means you should rewrite your code so it doesn't need such a complicated configuration that it seems like writing code is necessary to express it.


Yeah, I was trying to joke -- if your users are OK with Python, then just let them use Python.

What I was getting at was that this is a scenario where concerns over Python users needing to actually use Python have morphed into those same users now needing to use 3 languages.


If you're templating YAML, it just means that you don't know that you can generate YAML. Dev ops is a very different world. Yes they can script, but that doesn't mean they will modify any part of your program, even a python config file directly. They will template it.

If you find yourself thinking that your config language isn't powerful enough, the solution is to either generate it, or to fix the program to not require a complex config file. There is a long term, 10 year cost to making config Turing complete that I don't believe is worth paying. It'll save you time up front, maybe, but I certainly am not going to be impressed with having to modify your program just to "configure" it.

Finally, given that's the environment we're dealing with, from now on the rational thing to do would be to design config file formats expecting them to eventually get templated. Because that's how they get automated.


> Finally, given that's the environment we're dealing with, from now on the rational thing to do would be to design config file formats expecting them to eventually get templated.

But... that's kind of what a dynamic config file is.

If you're writing a JS program that imports and runs a JS file to generate an object literal for its options, you have just made a config file format that you expect to be generated, or templated, or whatever you'd like to call it. The only difference is that you're using 1 language instead of 3, and your users don't need to have a separate step during build to generate the file.

People keep on circling around to, "but what if my user doesn't know the language or isn't comfortable with it?" And sure, if your user isn't comfortable programming, don't do this. But if your user does know how to program, don't force them to learn a new templating format, just let them write the code that they already know.

Or honestly, just play the joke out straight and build a template engine for your source language. I get that you're making a joke above, but a templating engine that spits out valid Python/JS code from valid JSON/YAML is trivial to write.

  var template_source = require('fs').readFileSync(process.argv[2], 'utf8'); /* argv[2] is the first CLI argument */
  var output = JSON.stringify(JSON.parse(template_source), null, 2); /* reject malformed input and prettify */
  require('fs').writeFileSync('config.js', `module.exports=${output};`);
So if you go this route and a few of your users end up wanting to template JSON/YAML anyway, then fine, they still can.


Great, now integrate that into all the automation tools that devops use, and train them how to use the integration instead of just templating the config file.

Or put another way, stop trying to reinvent the wheel and expect that OTHER people will script the generation of the config file, in their own way using their own tools and languages, and just make that easier for them.

When devops are involved, the question isn't whether they're comfortable with whatever language, the question is how easy it will be for them to template the file, because they simply will not learn your special configuration language, whether they possibly could or not.

The point isn't about whether we should make the automation people learn a new templating language. The point is they simply will not. They will find where the relevant parts of the config file are, and they'll stick a ${my_variable} in there, process it using their own tool, whatever it is, regardless of how cleverly (or not) you designed the scripting language around the config. And now whether your config file is Python, YAML, or JSON is just irrelevant. They will not use Python to loop through a file list, even if they could, or know how. They will template that. All you've done by switching your config to Python is create a breeding ground for script injection attacks.

So ultimately it's the wrong question. What I'm proposing is, design our config file expecting the ${substitution_variable} will get edited in at some point, and you guard against script injection assuming that is coming.
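
One way to read that proposal, sketched in JavaScript: treat `${name}` placeholders as part of the format and JSON-escape every substituted value, so a hostile or merely awkward value cannot break out of its string. `fillTemplate` and the convention that placeholders only appear inside JSON string literals are assumptions for illustration, not a description of any existing tool.

```javascript
// Replace ${name} placeholders inside a JSON template. Each value is escaped
// with JSON.stringify and the surrounding quotes are dropped, because this
// sketch assumes placeholders always sit inside an existing JSON string.
function fillTemplate(template, values) {
  return template.replace(/\$\{(\w+)\}/g, (match, name) => {
    if (!Object.prototype.hasOwnProperty.call(values, name)) {
      throw new Error('no value supplied for ${' + name + '}');
    }
    return JSON.stringify(String(values[name])).slice(1, -1);
  });
}

// Single-quoted on purpose: ${data_dir} is literal template text here.
const template = '{ "files": ["${data_dir}/a", "${data_dir}/b"] }';

// A value crafted to close the string and array and smuggle in a new key.
const hostile = 'x"], "admin": true, "y": ["';

const config = JSON.parse(fillTemplate(template, { data_dir: hostile }));
console.log(Object.keys(config)); // only "files"; the injected key never appears
```

With naive string substitution the same input would parse into an object with an extra `admin` key, which is the innocent-looking version of the injection problem.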


> great now integrate that in all the automation tools that devops use, and train them how to use the integration instead of just templating the config file.

I don't understand what you're getting at. This is just a 3 line executable you call when you're done with whatever templating process you want to use. Ship it with the program.

  node generate_config.js path/to/json/templated/with/literally/any/tool.json
But if you really think that a devops engineer will struggle calling that command, then fine. Put these 7 lines at the top of your program.

  var config = (()=>{
    try {
      return require('./config.js');
    } catch (err) {
      return JSON.parse(require('fs').readFileSync('./config.json', 'utf8'));
    }
  })();
Now your engineers can stick a `./config.json` in and template it with whatever automation tool they want in whatever integrated pipeline they want to work with. They never need to acknowledge that runtime option generation is possible. They never need to run even one extra shell command.

So I just don't see this as a real problem. If your config script is ultimately returning a static object literal then there is no special knowledge with this setup that anyone needs to learn. All we're doing is giving people the option to script a config on a lower level in a real programming language. We're not inventing a new scripting language or toolchain. And it's still trivial for engineers to ignore that if they want to compile a config with a 3rd-party templating tool.

> What I'm proposing is, design our config file expecting the ${substitution_variable} will get edited in at some point, and you guard against script injection assuming that is coming.

I would maintain that a config file isn't really where you should be handling security, that's something that should be ingrained into your program at a deeper, more fundamental layer.

But granted -- if for some reason your application can't be run in a secure unprivileged environment, and you need to accept untrusted config files within that environment, and you also can't run the config generation code in a secure unprivileged environment, then it would make sense to strictly limit a config file to a static format like JSON, where you know that no one else can inject actual code. But that's a very different concern than usability, and affects an even smaller subset of software than what we were talking about above. This is just not relevant for systems like Github Actions, Babel, Webpack, or Ansible, all of which are built on either the assumption that the config is trustworthy, or that the trustworthiness of the config doesn't matter because of sandboxing.

In most (not all, but most) cases, if you don't trust your user config, you're already at the point where you should be running your software in an unprivileged sandbox.


Yes of course you're right about all that. My only point is, it's a fact that devops add templating tags to YAML files. This is not theoretical. The templating tags are likely not escaped properly, so, aside from the worst case scenario of script injection, the more innocent situation of causing parsing problems is possible and likely.

It's worth very seriously asking the question of WHY devops are doing this instead of just generating the yaml file with code, which from our perspective as developers would be easier and more sane, and allow us to use the full power of a programming language. I think it's a more important problem to solve than ergonomics of config for the original developers, because end of the day, who will need to interact with this configuration file more? Developers or the people responsible for deploying the application?


+1; for me, this comment sums up my feelings on the entire (worthwhile) thread.


Here’s a real-world example: clangd, the language server, takes a file with the compiler flags it should use for completion. clangd works with C, C++, Objective-C, and Objective-C++; obviously, I would like to use different flags for each. Currently I get around this problem by running four “instances” of clangd, one for each language, each with its own configuration file. I wouldn’t need to do this if the configuration file was a script that could dynamically generate flags by looking at the source code and inferring what flags would be appropriate.
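
Purely hypothetically (clangd does not accept a script like this, and the `flagsFor` hook is invented), such a config could be as small as a function from a source path to flags. This sketch only looks at the file extension instead of inspecting the source, and the flags are ordinary clang options picked as examples.

```javascript
const path = require('path');

// Hypothetical per-language flag table keyed by file extension.
const FLAGS_BY_EXTENSION = {
  '.c': ['-x', 'c', '-std=c11'],
  '.cpp': ['-x', 'c++', '-std=c++17'],
  '.m': ['-x', 'objective-c', '-fobjc-arc'],
  '.mm': ['-x', 'objective-c++', '-std=c++17', '-fobjc-arc'],
};

// Invented hook: the tool would call this once per source file.
// Unknown extensions get no extra flags.
function flagsFor(sourcePath) {
  return FLAGS_BY_EXTENSION[path.extname(sourcePath)] || [];
}

console.log(flagsFor('src/view.mm'));
```

One function like this would replace the four separate instances and their four config files.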


> If your users are writing this code, you have a whole new set of problems that you don't have if users can only provide text config files in a safe format.

If your users are writing this code, you have a whole set of problems that aren't being solved if users can only provide text config files in a safe format.


> The second option, which would be a lot better, is I could have a 3rd-party program generate a config on the fly, write it to disk, and then launch my actual program when it's done.

The fraction of my dotfiles that is simply small scripts that “wrap” actual programs so I can do this is extremely high. In some cases I have resorted to code injection to accomplish this sort of dynamic configuration generation without putting a file on disk (yes, using fmemopen).


I have a nice counter example why "config files as source code" is a bad idea.

The emscripten SDK (compile C/C++ to WASM) writes a config file as python source file when it is installed. At first glance it looks like a fairly simple key/value config file, but it cannot be parsed by a 3rd-party-tool (for instance to extract the paths to the installed emscripten SDK) unless the file is evaluated with a full-blown python interpreter.

I don't understand how this can be considered a "good idea" under any circumstances ;)


The SDK could have a command line tool (or option) to dump the config in JSON format, if that config is meant to be consumed by other tools. I can come up with examples where config files are a bad idea all day long. Bottom line is neither is a silver bullet, and there are always trade-offs to be made.
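
For a config written as a JavaScript module, that escape hatch is only a few lines. A minimal sketch, where `dumpConfig`, `sdkRoot`, and `binDir` are hypothetical names and an inline object stands in for the evaluated config:

```javascript
// Stand-in for an evaluated code-based config that uses variables.
const sdkRoot = '/opt/sdk';
const config = {
  sdkRoot,
  binDir: sdkRoot + '/bin',
};

// Hypothetical "dump config" implementation: serialize the evaluated config
// so other tools can read it without running an interpreter. Functions and
// undefined values are rejected instead of being silently dropped.
function dumpConfig(cfg) {
  return JSON.stringify(
    cfg,
    (key, value) => {
      if (typeof value === 'function' || typeof value === 'undefined') {
        throw new TypeError('config value "' + key + '" is not JSON-serializable');
      }
      return value;
    },
    2
  );
}

console.log(dumpConfig(config));
```

This only works cleanly when the config evaluates to plain data, which is one more argument for keeping the final result a static object.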


I am very much of the 'just code it' crowd, with a few small allowances that basically add up to a subset of the 12 Factor App philosophy.

That said, the combination of SaaS and CI/CD makes that a bit of a challenge. Everyone wants to launch features darkly. You don't have to change your strategy to do that per se. You can move everything else into code or into a service discovery system, but the 'feature toggles' have to live somewhere, and they are essentially config.

If your CD system were ridiculously fast, you could just push a commit to turn something on and back off again. But I haven't seen many systems that are as fast as pushing a change into for instance Consul (even if you are using git2consul, that'd be faster than a typical CI/CD build)


Well, the idea is not to generate json files, the idea is to not use json files at all.

And even if you don't want to implement logic in config file, there are still advantages in having access to a full language.

For example: with a simple config file

  maxAttachmentSize: 33554432,
  maxTotalSize: 33816576
with a full language

  maxAttachmentSize = 32 * 1024 * 1024; // 32 MB covers 99% of use cases
  maxTotalSize = (256 * 1024) + maxAttachmentSize; // 256 kB of text + one big attachment
No logic here, just a more readable file. But that uses several things json doesn't support: comments, arithmetic and variables.


If you give a fully loaded programming language to a programmer to write configuration, it's almost a certainty that you will end up with a program in your configuration file. With great power comes great responsibility, but unfortunately not all of us are superheroes.


You don’t need a full blown programming language for that, just a configuration language with some basic expressions.

Your example is fine, it’s certainly more pleasant than not doing it that way, but that’s not where the trouble starts: the trouble begins when you have conditions and loops and any non-trivial logic.

If you need non-trivial logic then it probably isn’t “configuration” and should be part of the normal codebase (and go through the same review and testing procedures).

I would also argue that if your system is so complex that you need a programming language to configure it, there’s something very very wrong.


If that's the best example, I'd almost count that as an argument against.

    maxAttachmentSize: 33554432 # 32MB
    maxTotalSize: 33816576      # 256kB + maxAttachmentSize
All the "full language" version has done is save me from using the incredibly powerful calculator I'm sitting in front of to calculate some numbers for all of 5 seconds.

Programmatically solving something which changes very infrequently isn't really a good case for all the other complexity and potential issues something like this would drag along with it.


> maxAttachmentSize: 33554432 # 32MB

That works until person 1 updates the value but does not notice that the comment duplicates it. Then person 2 spends hours figuring out why he can't upload a 30MB file, when he can clearly see in the config that it allows up to 32MB and his debugger tells him the limit value actually matches that long number.


I really don't like JSON for configuration. I prefer TOML or YAML for config file definition. But I would also make sure my config item names are descriptive enough to where I know units from the name, e.g. maxAttachmentSizeBytes. Any other comments I usually prefer to put in my code that loads the config files into my program.


I’m a big fan of TOML. It’s not perfect, but as a configuration format, it does the job much better (imho) than the alternatives.


The other option is to pull in one of the many libs out there that can understand suffixes (or even write your own... Really not that hard).

As a bonus, your config is more readable.


With a configuration language one could use

maxTotalSize: 1MB

Or similar.
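
Reading such a value on the program side is a few lines. A rough Python sketch, assuming binary multiples (1MB = 1024 * 1024 bytes, which is what the numbers upthread use); `parse_size` is a made-up helper, not a library function:

```python
import re

# Assumption: binary multiples, so "32MB" equals 33554432 bytes.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}
_SIZE_RE = re.compile(r"^\s*(\d+)\s*([KMG]?B)\s*$", re.IGNORECASE)


def parse_size(text: str) -> int:
    """Turn a string such as '32MB' or '256 kB' into a number of bytes."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"not a size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.upper()]


print(parse_size("32MB"))
```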


And then have to parse and translate that to a workable value in their program...precisely (part of) the problem the article is pointing out


I don't agree. Configuration files tell a program what to do. You want expressive power there. Telling a program what to do merely through values only makes things more indirect.

To give an example: you can get into the situation that some configuration values are only valid when configuration value X is Y, and otherwise other configuration values are valid. What better way to model this than through an if-statement in a programming language? This makes it immediately apparent which settings have effect.


I am involuntarily howling internally in anguish, and if the feeling could speak, it would be screaming "parse, don't validate" (https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...).

If some config values are valid when X is true, and other config values are valid when X is false, then use a different schema for the two cases. This is what discriminated unions are for: for representing conditionals statically, inspectably, serialisably!

If some config values are valid when X is 0.339, and some are valid when X is 0.340, and some are valid when X is 0.341, and so on, then… I don't know how to help, and maybe I must just avert my eyes in shame as I implement the dynamic logic. (But in that case it seems a bit odd to say you're "telling the program what to do" with this configuration; I'd say you're bolting on a little extra program at the start.)
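
For the true/false case, a minimal Python sketch of what I mean. The `kind` tag and the two storage variants are invented for illustration; the point is only that each case gets its own type, and the raw data is parsed into one of them at the boundary.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class FileStorage:
    path: str


@dataclass(frozen=True)
class S3Storage:
    bucket: str
    region: str


# The discriminated union: a config is exactly one of these shapes.
Storage = Union[FileStorage, S3Storage]


def parse_storage(raw: dict) -> Storage:
    """Parse raw config data into one variant, selected by its 'kind' tag."""
    kind = raw.get("kind")
    fields = {key: value for key, value in raw.items() if key != "kind"}
    if kind == "file":
        return FileStorage(**fields)  # TypeError on missing or extra keys
    if kind == "s3":
        return S3Storage(**fields)
    raise ValueError(f"unknown storage kind: {kind!r}")


print(parse_storage({"kind": "s3", "bucket": "logs", "region": "eu-west-1"}))
```

A `bucket` key on a `file` config is rejected at parse time, so the rest of the program never sees a half-valid combination.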


Sometimes “valid” doesn’t mean “illegal to have in a configuration file”, it just means that “in this specific case, the configuration should have a certain value”. For example, on my Mac I should prefix my commands with “g” if I want to access GNU tools, while on most Linux systems I don’t need to do this. Trying to run gsed on Ubuntu would be “invalid” in this case but a parser can’t help me here.


Yes it can: if your config file is more declarative than simply "lists of command lines". The program can determine what is true of its environment, and can construct command lines appropriately given the data about intended outcome that is stored in the configuration file.


I can’t change the program.


I once wrote yet another JS static website generator. It incorporated exactly what you mention: powerful configuration. It was pretty awesome, each website could write their own extension functions and really enhance the way the generated HTML was formed.

Within 8 months I couldn't make heads or tails of the configuration, where and when certain functions were used, etc.

It was (and is) my greatest disaster in writing a static website generator. (And I'd written 3 pretty decent ones before that). FWIW here's the code: https://github.com/ergo-cms


I'm not sure why choosing a configuration file format like INI or YAML prevents configuration validation. This logic can be expressed within the program itself: parse the file, and validate inside the program that the configuration is consistent.

Also putting the validation logic directly inside the configuration file is not a good idea in my opinion for two main reasons:

* the validation logic is then "external" to the program and can be overridden (with potential catastrophic effects) by the operator installing the program.

* Configuration validation logic can be quite complex, and having it inside the configuration file would clutter it too much in many cases.

I tend to see configuration files as an API/interface (maybe because I'm an SRE), and as with any interface, I want it clearly defined, with known boundaries, and as explicit as possible. For these reasons I'm not a big fan of using a fully fledged programming language as configuration.
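
To illustrate parsing the file and validating inside the program, here is a small Python sketch using the standard `configparser` module. The `[server]` section and its keys (`port`, `tls`, `cert_file`) are hypothetical.

```python
import configparser


def load_config(text: str) -> dict:
    """Parse INI text, then check cross-field consistency inside the program."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    server = parser["server"]  # KeyError if the section is missing

    port = server.getint("port", fallback=0)
    tls = server.getboolean("tls", fallback=False)
    cert_file = server.get("cert_file", fallback=None)

    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    if tls and not cert_file:
        raise ValueError("tls is enabled but cert_file is not set")
    return {"port": port, "tls": tls, "cert_file": cert_file}


print(load_config("[server]\nport = 8443\ntls = yes\ncert_file = /etc/app/cert.pem\n"))
```

The rules live in the program, so the operator cannot override them by editing the file.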


> Configuration files tell a program what to do

The source code tells a program what to do.

Configuration tells the program about its environment.


It depends on the application. For things like build systems, CI systems, infrastructure as code, and even things like reverse proxies and web servers, the config definitely DOES tell the program what to do, and in those cases I think a full programming language makes more sense. For programs like, say, a word processor, where the config really is just the environment and is sufficiently simple, something like YAML is sufficient. But please use a format that supports comments if it will be read or written by humans.


I would need to know more about those use cases to say anything about it.

My hunch is that it would be good to have a different term than "config" for those things.


> Configuration files tell a program what to do.

Configuration may be an overloaded term, but in my terminology telling machinery what to do is the function of the code. Configuration gives users, who in general have neither interest nor skill to follow the algorithm paths, some high level options to control the program execution. Thus config files need to be small, clean and well commented. Just my 2c.


Until you see the alternative. Wanna take a look at some configurations written for big Kubernetes cluster in a not at all full-blown language a.k.a "mysteriously templated YAML"?


I do that on a regular basis as part of my work, and they're honestly not that bad. Complex, sure, but largely due to the essential complexity of the problem domain. For sure, I don't see how mixing in some occult magic would make them better.


I agree they are inherently complicated. The point here is that you lose the usual tools available for full-blown programming languages (e.g. tests, type checks or other static analysis), which makes it harder to assess the conf and leads to more pain.


It's shocking how little effort is put into developing static analysis compared to the insufferable amount of type system work that goes on

Here we're discussing destroying a data format and replacing it with a programming language because we can't be bothered to do static analysis on a data format?

LSP should have made this the golden era of static analysis, but instead everyone conflates type systems and static analysis so the industry gets myopic on typescript and friends

If you took static analysis out of typescript I wouldn't even give typescript a second look, meanwhile something like this where a given key should be one of a set we don't even consider how to do that without a type system

A data format doesn't have a type system and that's partly why they're great, type systems don't work across different programming languages but data does, data is a shared composable language

Data as a concept is more important than a type system and static analysis is more important than a type system


Sorry, but I don't get how you express the constraints you want your static analysis to enforce on a stringly typed "data format" without a type system that lives outside the format, e.g. JSON Schema. Mind giving some hints?

Static analysis is not magic; it needs more input than just your code. I can't say for sure, but Rust's success tells me that at least some people are happy to write annotations specifically targeted at a static analyzer (lifetimes etc.).


Can you define "static analysis" here? It sounds like you're talking about data schemas otherwise, but if you apply them statically, the end result is a type system, no? (XSLT/XQuery would be one example of that)


> e.g. tests, type checks or other static analysis

You can do all these things for YAML or JSON or TOML or XML or INI or s-expressions or custom bytecode or butterflies or however else you're defining configuration data, too. It might not be present as part of the configuration language itself, but that's true of a lot of full-blown programming languages, too (especially the dynamic and weakly-typed ones).

If anything, a static configuration file should be easier to statically verify than a full-blown programming language because it's, you know, actually static. This goes out the window somewhat with YAML or XML (which do somewhat funny things), but once you have that hash table or whatever you've deserialized the config data into, congrats, now you can statically analyze it.


We use a lot of Python and have actually been thinking of writing up some Python config middleware for the YAMLs, which are absolutely awful. I'm convinced no one, even core kube developers, could write one off the top of their head.


> the only caller is dynamically configured in staging_env_4.json which is emitted by config_generator.py which always ensures the precondition for foo, but only if it's also generating prod_env.json at the same time

That sounds specific enough to be based on a very bad day or week at the office. You have my sympathies.


Haha, thanks, it's not directly based on a particular bug, but the way configs are managed did push me several times to wonder "If I walk out, buy a bottle of whiskey, come back and start drinking it, will I get fired?"

To be fair, I didn't do it... yet. I have family, I need salary. -_-


I did devops at a Web 1.9 SaaS shop a couple of years back. One of the product teams asked me to host a retrospective for them one Friday afternoon. Little did I know it was a regular occurrence for them to supply the whiskey during said retrospective sessions.

Another project team leader from a company they acquired regularly paid me in bottles of whiskey when I helped his team hit integration milestones for their product, as the main dev teams weren't helping him.


When you work from home, the answer to that question is "No."


> If your config is so complicated that it warrants a full-blown language, the logic should go into the main program that reads the config

Exactly. Even the simple toy example in the article illustrates the problem: why is the Person class being defined in the config? If the program is going to use objects representing people, the code defining the structure of those objects should go in the program. If the program is not going to use objects representing people, why are you adding extra code to your config to define Person objects when you could just use a list of plain tuples?


I totally agree... also, this whole article makes it sound like this person is writing their config by hand. I don't know what their setup is, but most configs where I work are generated by one system and consumed by another. I am not writing these configs on my laptop. We have code that pulls data from a service and generates a config for other systems to use. Having that system output executable code seems to be asking for complexity, security vulnerabilities, and unexplained behavior.


Lots of things pretty much require dynamic config. Nontrivial docker compose files, Dockerfiles (if you want a static include mechanism, for example), Kubernetes configs, CloudFormation templates, Terraform templates, etc. They need to be dynamic, and most of those technologies have one or more popular and hacky solutions for introducing that dynamism (Helm charts use Go templates, CloudFormation adds an AST on top of YAML and drives users toward insane macro lambda functions, Terraform keeps extending its "config" language, etc). I'm fine with those platforms preferring a static view of the world so long as there is a sane way to generate the configs dynamically. I would actually prefer a strictly static interface, a la Kubernetes config YAML, with other tools generating the corresponding Kubernetes configs. The best time I've had has actually been using Python to generate this YAML. I even use type hints so I can statically type check my configs, which isn't something I can do with vanilla YAML (barring some grotesque jsonschema-esque monstrosity).
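
A stripped-down sketch of the approach. The `App` dataclass and `deployment` helper are invented for illustration, and it emits JSON rather than YAML only to stay within the standard library (JSON manifests are accepted by kubectl as well).

```python
import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class App:
    """Typed description of one service; a type checker can verify call sites."""
    name: str
    image: str
    replicas: int = 1


def deployment(app: App) -> Dict[str, Any]:
    """Render a minimal Kubernetes Deployment manifest as plain data."""
    labels = {"app": app.name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": app.name, "labels": labels},
        "spec": {
            "replicas": app.replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": app.name, "image": app.image}]},
            },
        },
    }


print(json.dumps(deployment(App("web", "nginx:1.25", replicas=3)), indent=2))
```

The output is still a fully static document, so whatever consumes it never has to run my code.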


Didn't read the article, but would like to add that there are already programs used and loved by many people, which are using a programming language for configuration. I know just the following, but there are probably many more:

- Emacs, which uses elisp, an embedded Lisp interpreter, both for configuration and for implementing new functionality.

- XMonad, a tiling window manager written in Haskell, which is configured inside the program itself, i.e. you edit the main program.

- Guix package manager (and GuixSD, the distribution), which uses Guile (a Scheme) to describe the programs to be installed and their configuration. This case is interesting because Guix uses the same concept/functionality to manage package dependencies that it took from the Nix package manager, but in contrast to Nix it uses the general-purpose Scheme programming language instead of a special configuration language.


Ennnnnnhh... I disagree. I've spent a lot of time with Django and its configuration files are just vanilla Python. It's really great. I especially like being able to add just a little bit of logic here and there.


Keep your configs small, but write them in real programming languages.

You're dismissing a viable solution that works extremely well. Take Elixir for example, which is configured using Elixir

https://hexdocs.pm/elixir/master/Config.html

It's simple, powerful, and lets you make it as complicated as you want, or not.


> the logic should go into the main program that reads the config and decides what to do, not left inside config.

SAP has some 3,000 database tables that are all configuration tables. Consultants are paid $200-$250 an hour to do nothing but tweak all these settings.

It sucks that SAP was designed this way but god damn it's an easy way to make bank.


was going to downvote as I use config.py everywhere -- but your second paragraph made me laugh (due to familiarity) so you get the upvote.

I suggest that all solutions to the config problem gravitate towards major suckage, but creating new micro-languages like yaml just seems like adding more complexity for zero benefit.

The advantage of config.py and config.js is that anyone claiming to be a programmer can read them and immediately grok exactly what's going on.

That said, starting now I would go with JSON as it is the lingua franca that every language has to understand, and it is not a full programming language; you can't do conditionals or loops etc. Put that shit in your code please! Logic should never be platform-dependent.


No it is usually not the case.

There is a real difference between declarative and imperative approach to configuration and such decisions can be traced in many projects.

Simplifying things: declarative configuration is usually the best approach if you manage to capture most of your component behaviour in some relatively simple language with limited expressivity.

Imperative is where you start mixing some code in. This is usually because of big amount of corner cases and where flexibility is a must. This is seen quite often in build tools: it’s hard to just configure everything, some projects need to be built using really custom pipelines.


I would take that a step further and say any config is itself an inherent form of complexity. Some are more complex than others. A simple approach is no config, as in a default working state that executes when called. If there are optional considerations to produce any working state the application is inherently complex, as in there are multiple working states of which none are the default.

To be clear, to complicate means to make many, while to simplify means to make fewer.


>the logic should go into the main program that reads the config and decides what to do, not left inside config.

If I follow your logic, I should replace the 5 if-debug statements in my config.js with 50 littered throughout my codebase. Also, I lose strong typing and inline documentation.

Edit: I'll add that I'm against emitting JSON from a config script; that seems like additional complexity. Just reference a file: you can use JSON if and only if you need program interop, and reference it from the config script.


If you really need to process the configuration in multiple places throughout the app, then you could put that processing code in a service class and put the debug statements there.


That's essentially what config.js does. Why split up configuration across multiple places?

With scripting languages there's no reason to use JSON for configuration unless the config is shared between programs. The main reason config files exist is so you can edit the program without recompiling; I feel like people forgot that and kept blindly using JSON when JS or TS is better. My config file is only about 20 lines, but putting the extremely simple conditional logic there (is it debug, is it client, is it server) reduces program complexity by a couple hundred SLOC.


>The main reason config files exist is so you can edit the program without recompiling

This isn't generally the reason. The main reason config files exist is so you can run the program in a different state easily. State differs across environments; code should not. This is why you have a config file for prod and test and dev, but you shouldn't have different code for prod and test and dev. You still have to rebuild the app for those environments. See: https://12factor.net/config.
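
As a rough Python sketch of the 12-factor flavor of this: the code is identical everywhere and only the environment differs. The variable names (`DATABASE_URL`, `DEBUG`, `WORKERS`) are assumptions chosen for the example.

```python
import os
from typing import Mapping


def load_settings(environ: Mapping[str, str] = os.environ) -> dict:
    """Build settings from environment variables; the code is the same in every env."""
    return {
        "database_url": environ["DATABASE_URL"],       # required, KeyError if unset
        "debug": environ.get("DEBUG", "0") == "1",     # optional, off by default
        "workers": int(environ.get("WORKERS", "4")),   # optional, with a default
    }


# The same function serves prod and dev; only the mapping passed in changes.
print(load_settings({"DATABASE_URL": "postgres://db.internal/app", "WORKERS": "16"}))
print(load_settings({"DATABASE_URL": "sqlite:///dev.db", "DEBUG": "1"}))
```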


>This isn't generally the reason. The main reason config files exist is so you can run the program in a different state easily

My ide lets me f12 to my config.js file and i can use editor hotkeys like comment toggle so by this metric my method is superior.

>You still have to rebuild the app for those environments.

As I said above, you don't have to rebuild for Python and JavaScript since they can load the config file on the fly. I use XML or JSON for compiled languages because being able to change the program without recompiling is of critical importance.

>you shouldn't have different code for prod and test and dev.

You don't have a different file for test/dev/prod; you have 1 config file that has a few inline if statements based off of runtime args. For me 1 file is less confusing than 2 or 3.
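
Roughly what that one file looks like, sketched in Python rather than JS. The setting names and URLs are made up; `debug` and `role` would come from runtime args.

```python
def build_config(debug: bool, role: str) -> dict:
    """One config for every environment; the few conditionals live here only."""
    if role not in ("client", "server"):
        raise ValueError(f"unknown role: {role!r}")
    return {
        "log_level": "debug" if debug else "warning",
        "api_url": "http://localhost:8080" if debug else "https://api.example.com",
        "listen_port": 8080 if role == "server" else None,
        "minify_assets": not debug,
    }


print(build_config(debug=True, role="server"))
```

The rest of the codebase just reads the resulting values and never checks the debug flag itself.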


Well, yes, admittedly it's true that scripting languages do have this unique advantage where you can basically hardcode the configuration in the program code itself and still have it be flexible enough to be called a configuration file. In a situation like that, I agree, it might not make sense to break into two files what can just be done in one.


Definitely this. Configs reflect the architecture; if there is a lot of complexity, that means the architecture is overly complex and can likely be cleaned up.


Hmmm... I agree with you, except...

I think there's a divide between developer and user that could be bridged with a config.

Many, if not every piece of software in the world is crippled by what the developer allows you to do.

One example is the systemd config files. they are so brain dead that everyone has to go write their own logic to handle anything remotely flexible.

On the other hand, they're a clear limited user interface to systemd. bleh. who knows.


Completely disagree here — that's how you get extreme program bloat.

Let's take this scenario: I want the current weather to be part of a config value. This could be pulled/scraped from an online source in 2 lines of Python. The alternative, as you imply, is that every program should have some complex abstraction for scraping now? That's just silly.


Why not just let the program get that value? If my config has side effects I wouldn't be confident that my program would run the same each time. To me it seems like this would lead to poor reproducibility.


How would the program get the value? It would have to pack in all of the scraping frameworks and abstract them enough to be driven from text configs. Then it would have more scraping code than whatever the program was intended to do.

> To me it seems like this would lead to poor reproducibility.

I'm not sure how reproducibility matters here. You're not writing tests; it's a program on your computer for your personal use.


> and you're not sure if you need another cup of coffee or whiskey.

The answer to that is always an Irish Coffee.


What if it's config for CI (i.e. Jenkins)? In that case your "config" needs to support branching ("build this if master else build that").


I don't know Jenkins, but my take is that "DSL is good, dynamic full-language config is bad."

If you need a fully programmable environment, then you need a fully programmable environment. The important thing is that the DSL is a well-defined contract between what you specify and what the underlying software does, so if you say "If X then do Y", then that's exactly what it does. You should be able to follow the DSL line-by-line and debug it.

(You've probably noticed that I'm not a fan of declarative languages a la Makefile: you can never figure out where things "start" and "end".)

A "full language config" (like Python) is much worse than DSL: usually there's no clear semantics about when any snippet is executed, yet it's powerful enough to bring in the whole outside environment, which is what's going to happen in a team with 5+ people.


CI systems tend to be more about defining a workflow than setting simple parameters, especially if they also handle CD.

For that reason, something looking more like a programming language feels more appropriate to me.

I kind of like the Jenkinsfile syntax for this use case: a CI workflow is not a static definition that fits in a configuration file (YAML, JSON or INI), but the syntax is also not a fully "open" programming language like Python, which would permit too many potential hacks.

A specialized DSL seems more appropriate in this case and strikes the right balance between flexibility and complexity, at least for most needs.


As the article mentions, there are 'real' programming languages made for configuration that solve at least some of the issues outlined, like Dhall, Cue, Jsonnet. After using both approaches a number of times (a general purpose language vs. one of the three), I'm imploring anyone trying to give this a shot to _not_ use a general purpose language.

For instance: I'm mostly familiar with Jsonnet, which has guarantees that make it much easier to use than a Real Language: no arbitrary file loads (paths must be static), no side effects, no ambient state (like env vars or Internet). Relative imports make it very easy to drop in the language somewhere in a repo and not have to worry about venvs/PYTHONPATH/system libraries... The interpreter being a single binary and a thin C library also simplifies integration quite a bit, compared to interfacing with full-blown interpreter/compiler installations.

The downside is, of course, that you must learn a new language. But we're all pretty good at this (unless you're one of those people obsessed with using JS/npm everywhere for some inexplicable reason). Also Jsonnet still has the pedigree of being a superset of JSON, which is painful (ie. magical int/float number type). But the alternatives (Dhall, Cue) fix that, and also provide a vastly better type system.

I've been using Jsonnet in production for a couple of years (after leaving Google where I Saw The Light with BCL), and couldn't be happier. The infrastructure of the Warsaw Hackerspace (a production k8s cluster) is all brought up using Nix and Jsonnet and is open for all to inspect [1].

And yes. Let's please stop using JSON and YAML for everything. Even Python is better than yet another YAML templating tool.

[1] - https://gerrit.hackerspace.pl/plugins/gitiles/hscloud/+/refs...


Very interesting, thanks for the links!

I have a different approach. When I'm deciding about a new technology that we could add to our tech stack, I ask myself if it's really worth it. Adding a new tech is very expensive because it adds to the list of things that the developers have to be good at. I would avoid adding a configuration language and prefer using the same general purpose language that the rest of the software is written in.

At work, we have a C# ASP.NET Core application we build.

- We use Pulumi's C# API to deploy to the cloud. This lets us avoid yet another YAML based config format, and it lets us edit the deployment in a familiar language with a pretty simple API in our favorite IDE. (Link: https://www.pulumi.com/)

- We're going to move to Nuke instead of having Azure Pipelines YAML files. One less YAML file format to remember. (Link: https://www.nuke.build/)

- We're going to do benchmarks with BenchmarkDotNet, to keep our code in C#. (Link: https://benchmarkdotnet.org/)

- We're going to be adding Spark-based analytics, and I'm undecided about whether we should use Python or the C#. Python is so ubiquitous in analytics and there are so many libraries that it might be worth adding another language to our system. (Link: https://dotnet.microsoft.com/apps/data/spark)


Yes. I don't like that it a) uses imperative, general purpose programming languages b) requires keeping an external statefile for deploying prod.

EDIT: parent originally asked 'we use Pulumi, have you heard of it?'


Yeah that is a pretty big downside to Pulumi. That's something it shares with Terraform.


Large parts of Pulumi actually use Terraform providers.


What is the downside of storing state for external services vs using configuration to attempt to recreate everything each time?


Same here, but stuck with JSON/YAML, etc. What are your thoughts on Starlark as a configuration language? Imperative, but with lots of functional/declarative bits, like frozen vars, no recursion, sandboxing, execution in parallel... I've also looked at Jsonnet, and even contributed one or two patches to speed it up in C++, but still felt the evaluation time was higher. The Go version is faster than the C++ one, but (last I checked) they used slightly different evaluation rules; the C++ version would also get slower in bigger apps due to overuse of dynamic_cast (at least that's what I found).

Yeah, I'm also trying to find the magical bcl/gcl, but what was so useful (when I worked at Google) was the diff tool (was it written by an intern? It was amazingly useful). E.g. your presubmit would tell you what the end (evaluated) result would look like, as one change in one place might have quite the ripple effect.
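
The core idea is simple enough to sketch in Python: evaluate the config before and after the change, serialize both deterministically, and diff the text. This is my own illustration, not how the internal tool worked; `config_diff` and the sample data are made up.

```python
import difflib
import json


def config_diff(old: dict, new: dict) -> str:
    """Unified diff between two *evaluated* configs, serialized with sorted keys."""
    old_lines = json.dumps(old, indent=2, sort_keys=True).splitlines()
    new_lines = json.dumps(new, indent=2, sort_keys=True).splitlines()
    return "\n".join(
        difflib.unified_diff(old_lines, new_lines, "before", "after", lineterm="")
    )


before = {"name": "web", "replicas": 2}
after = {"name": "web", "replicas": 3}
print(config_diff(before, after))
```

Sorting the keys matters: otherwise a reordering in the source shows up as noise in the diff.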


> What are your thoughts on Starlark as a configuration language?

I haven't tried that yet. It might work. I mean, I've used Starlark as _a_ configuration language (notably, in Bazel, or as a little Prolog stand-in for defining complex rules), but never really built an entire infrastructure on top of it. I would have to look into how it works for my use case (I can't lie: Jsonnet having an existing third-party Kubernetes library was a big selling point for me, even if I end up extending it). I do prefer, however, functional/lazily evaluated/declarative languages for configuration, while Starlark has a very imperative taste to it.

> [...] but still felt the evaluation time was higher [...]

Yeah, that can be painful, but has not fully bitten me - my configs rarely get large enough to notice, and even if they do there is a large bottleneck somewhere else (ie. actually applying the config). Definitely also have it on my backlog to look into sending some patches for that.

> was the diff tool

There exists something like this, kind of, for when you're using jsonnet to emit k8s resources: `kubecfg diff`. But that's for diffing against prod, not an evaluation against a previous VCS tip. Definitely a tool that should exist, though :). I do think Dhall is a bit further down the road of having this sort of tooling actually exist upstream.


skycfg is doing this: https://skycfg.fun

I'm not convinced - the ergonomics aren't great, it does not support CRDs, is mostly untyped, and you end up unnecessarily encapsulating everything with dozens of function parameters since you cannot easily merge nested data.

Starlark is great for configuring an imperative tool like Bazel which is only declarative at the top level, but is impractical for pure configuration data. Having the full power of a general purpose language isn't great for configuration - it quickly leads to lots of complexity at the wrong layer.

Cue has a radically different approach[1] that works very well in practice. The author of Cue, mplv, wrote the first version of bcl/gcl and Cue is the spiritual successor of sorts :-)

It's still early, but many people (including me) already use it in production. Their Slack is very friendly and helpful!

[1]: https://cuelang.org/docs/usecases/configuration/


Cue gets a lot of things right (strong builtin validation, strong typing). I haven't explored it much, but I feel that a language that bans inheritance and mixin-inheritance, and uses essentially typed structs that can have computed properties and late binding, gets most of what you want. Something like this could be put onto starlark/skycfg fairly easily, I think.


> no arbitrary file loads (paths must be static), no side effects, no ambient state (like env vars or Internet)

This all sounds like it should be a part of the sandbox, but it's not clear why it actually requires a whole new language. Many existing runtimes can be sandboxed in a similar manner. So why not just sandbox e.g. JS for JSON?


That is really cool - as chance would have it, I've just spent this weekend converting all my personal infra stuff to Jsonnet and I'm having a blast! This is some really nice inspiration.


BCL is something that's new to me. Anyone know what it stands for?


Borg Control Language, as others have said, Google's configuration language for "Borg", the precursor of Kubernetes. Both Jsonnet and Cue are descendants of BCL, developed by people who were previously involved in BCL development. Jsonnet is a direct descendant of BCL, "BCL done right". Cue is an indirect descendant of BCL, you might say "BCL rethought".


Imagine Jsonnet, but internal to Google, old and quirky. It's not public, but the name has been leaked for a while now.

Edit: here's a thesis that works with a related language, GCL, with code samples: https://pure.tue.nl/ws/portalfiles/portal/46927079/638953-1....


"Old and quirky" is an understatement. It's a god-awful mess, with semantics so murky that people end up just copying patterns and hoping that it'll do what they want it to do. And it can't be killed, because every bit of those implementation-defined semantics are used by someone.


up.up.up.up


Borg Control Language


borg config language, IIRC.


(author here). Thanks for the links, I'll add them!


Under no circumstance would I allow my projects to be configured by a full-blown programming language. There is no way to ensure that there are no regressions under different configurations. The amount of complexity introduced just for configuration is insane.

If you feel the need to put code in your runtime config, that means your design sucks.


But configuration is a complex task! At least from my experience. Limiting yourself to solving it with mediocre tools (for instance languages that disallow writing tests or even comments!) is a footgun, where you disregard good engineering tools in favor of stringly typed, duplicated configuration.


Configuration is complex if you use it for complex things. That's just one way of designing software, it's not necessary to make software so extremely configurable that you need a DSL to configure it. Even if you really feel you need that level of configurability, then the software could be designed as a library instead and users could consume it with whatever language they prefer (where bindings are available). That would basically be the same amount of complexity as a configuration DSL, except you could use all your standard project management tooling to manage it.


We might have different target audiences of our work, but in my line configs should not require tests. Because their task is to adjust the functioning of an application. If they all of a sudden require tests, that means (at least to me) that it's not part of the configuration anymore, but a part of the application proper and should be treated as such.


For me the target is mostly verbose YAML configuration of existing software (which I can now emit from an in-memory object instead of templating and having to remember all the YAML quirks) and k8s manifests. Especially when I have to stamp the same piece of software multiple times in multiple deployments (environments/regions/etc).

For instance: I just brought up a new Factorio game server yesterday, just by adding a single line to my jsonnet configuration machinery: https://gerrit.hackerspace.pl/c/hscloud/+/246 . This in turn brought up a bunch of new k8s resources, including some persistent volumes, a deployment, and a loadbalancer service, after running a single `kubecfg update` command.


I agree, that YAML is usually a mistake.

In your scenario, it's more like provisioning of infrastructure and in that case - configuration is the code, so I can see where you are coming from.

I mostly write financial software so the use case is usually closer to classical single machine applications. And in that case, I prefer simple TOML.


And what about "configs" like Kubernetes deployments?


You could use the Maven approach: for whatever the declarative configuration isn't powerful enough to specify, you write a plugin. That plugin can then be developed and managed just like any other software project, rather than having to bring those tools into the configuration realm.


Wait. In this case isn't the plugin itself "code in runtime config"?

edit: Now I understand that being a plugin it must have proper abstraction, but ad-hoc logic do exist.


I guess it depends on how you look at it, but if the plugin is a self-contained module that can be managed independently from the app it's configuring, I wouldn't really say it's code that's "in" the configuration.

> ... but ad-hoc logic do exist.

True. I don't disagree, no tool is right for every job.


Personally, I group these use cases under "infrastructure provisioning". The use case here is completely different - for example, you won't have a non-developer modifying the configs. So in this case, automation is a good idea.


> But configuration is a complex task!

Should it be? If an application doesn't have configuration, then the logical thing to do would be hard code all the values it needs in the code itself. This could be done with a number of variables/constants defined in the entrypoint method for each value needed, or defining a data structure that contains all the values needed in the same method.

Reading a config file and deserializing it into those variables or that data structure shouldn't be a hard problem. Formats like YAML or JSON allow for deserializing the string into a data structure (though you have to add your own validation). INI-format files can be handled the same way in a loop, but still require adding validation.

The validation could be based on data type or allowed values. While that's rather tedious, it's not complex.
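
A minimal sketch of that in Python, assuming a JSON file with three made-up settings (`host`, `port`, `log_level`); the field names, the allowed log levels, and the port range are illustrative assumptions, not taken from any real application:

```python
import json
from dataclasses import dataclass

# Hypothetical allowed values, for illustration only.
ALLOWED_LOG_LEVELS = {"debug", "info", "warning", "error"}


@dataclass(frozen=True)
class AppConfig:
    # Hypothetical settings; the application would otherwise hard-code these.
    host: str
    port: int
    log_level: str


def load_config(text: str) -> AppConfig:
    """Deserialize a JSON string, then validate types and allowed values."""
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise ValueError("top-level config must be an object")

    expected = {"host": str, "port": int, "log_level": str}
    unknown = set(raw) - set(expected)
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    for key, typ in expected.items():
        if key not in raw:
            raise ValueError(f"missing key: {key}")
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(raw[key], typ) or isinstance(raw[key], bool):
            raise ValueError(f"{key} must be of type {typ.__name__}")

    # Allowed-value checks come after the type checks.
    if not 1 <= raw["port"] <= 65535:
        raise ValueError("port out of range")
    if raw["log_level"] not in ALLOWED_LOG_LEVELS:
        raise ValueError(f"log_level must be one of {sorted(ALLOWED_LOG_LEVELS)}")
    return AppConfig(**raw)
```

Tedious, as said, but each check is a line or two, and a misspelled key fails at load time rather than being silently ignored.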


> If you feel the need to put code in your runtime config, that means your design sucks.

I would look at Terraform and beg to differ - they started with a simple stringly typed language and had to build a typed language. Still no support for conditionals, so everyone has to abuse the only bit of control flow on offer - ternary variables and counts.

CircleCI is just as bad - it's a nice yaml format until you want a conditional step in workflow that's conditional on something other than the branch name.

It's very risky to settle on a static config at design time since the users will find a need. How would you cater to that user without offering a programming interface to configure it?


> There is no way to ensure that there are no regressions under different configurations.

No, but static config files won't give you that proof either. Unless I'm misunderstanding your point?

At least if I write build scripts in e.g. ts-node I can use the type system to provide some modicum of safety, which seems like a plus.


Proof, no. But ensuring it in terms of manual labour is far easier with a static file, than debugging a convoluted script pretending to be a config file.


That’s why my personal preference is to keep the config file as code but to keep it as close to a static config as possible. The Turing completeness of the environment is there merely as an escape hatch, for the one branch you need to make.

If your tooling requires static configs, but you really need dynamism, you’re stuck with dynamically generated static configs, which are even worse than a dynamic config.


As I've mentioned elsewhere, I agree, but I think in an ideal world, there would be an explicit syntactic construct warning everyone where the program was using the escape hatch to leave the Turing-incomplete subset.


You could create and use an embedded DSL tailored specifically for your application. This is what Jenkins does with their "declarative pipeline": it's basically a Groovy-based DSL, and if you need anything not covered by it you need to use the `script { ... }` element.

The problem here is of course how not to make your DSL suck, which Jenkins' did not avoid. But if you're lucky, your language of choice already has a few libraries implementing embedded DSLs for configuration, and it's quite possible that at least one of them is actually usable.


If you just want to use the type system, why not just deserialize your JSON config into a Plain Old Object? At least in C#, you can mark a property as required and the serialization will fail if a property is missing or the wrong type.


That depends on the scale.

See ansible and the YAML config files.

In this case, a real language would have been a better choice.

Also, sometimes you need to generate config files. In that case, templating is not always a good choice.


Then again, we have Chef (ruby DSL), and from experience I've found that they tend to turn into a mess that very easily can become hard to reason about. Although, I'm not sure if this is because of the design of Chef itself, or just simply because it uses a real language.


> Under no circumstance would I allow my projects to be configured by a full-blown programming language.

Well, the choice is Java, or Java and yaml.

> The amount of complexity introduced just for configuration is insane.

Again, it's Java, vs. Java + Spring + env + application.yaml + application-$profile.yaml (in my case usually).

Once config constants enter your language, they're type-checked. Tree-structures suddenly need to be well-formed. No more trying to remember if you force https-only with 'enforceHttps', 'enforceHTTPS', 'enforceTLS', 'enforceSSL'. (= True,= true, :true, :1, :on, = "on")

> no regressions under different configurations.

It's great for that! Instantiate three copies of your service from Main - say, one backed by a hashmap, one connecting to a live db, one connecting to your docker-compose db, etc. Run the same set of commands (in the same language your tests are already written in) and assert the outputs.
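
The same idea sketched in Python rather than Java, as an illustration only: `GreetingService`, `DictStore` and `SqliteStore` are hypothetical names, and an in-memory SQLite database stands in for the "live db" case.

```python
import sqlite3


class DictStore:
    """Key-value store backed by an in-memory dict (the 'hashmap' variant)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class SqliteStore:
    """Key-value store backed by SQLite; ':memory:' stands in for a real database."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

    def put(self, key, value):
        self._conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))

    def get(self, key):
        row = self._conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None


class GreetingService:
    """Hypothetical service whose only 'configuration' is the store it is built with."""

    def __init__(self, store):
        self.store = store

    def register(self, user, greeting):
        self.store.put(user, greeting)

    def greet(self, user):
        greeting = self.store.get(user) or "Hello"
        return f"{greeting}, {user}!"


def run_scenario(service):
    # The same set of commands is run against every configuration.
    service.register("alice", "Howdy")
    return [service.greet("alice"), service.greet("bob")]


def check_all_configurations():
    configurations = {
        "dict": GreetingService(DictStore()),
        "sqlite": GreetingService(SqliteStore()),
    }
    results = {name: run_scenario(svc) for name, svc in configurations.items()}
    expected = ["Howdy, alice!", "Hello, bob!"]
    for name, outputs in results.items():
        assert outputs == expected, f"{name} configuration regressed: {outputs}"
    return results
```

Because the wiring is ordinary code, adding a third configuration is one more entry in the dict.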


So to change a configuration option I need to recompile the project?


My flow is: 1) I think about changing it 2) I change it 3) I figure out some way to verify it 4) I deploy it to prod

I don't want to do step 2 unless I can do step 3.

It just so happens that type-checking happens during recompilation, and testing happens after.


At least one large and successful tech company uses Python for configuration: https://muratbuffalo.blogspot.com/2016/02/holistic-configura...


As well as every company using Django (the most popular Python web framework)...


Just because Django is popular, doesn't mean that companies using it know what they are doing, or have weighed all pros and cons. I've seen some outright tragic Django setups with major companies.


> I've seen some outright tragic Django setups with major companies.

Well, the same thing can be said about pretty much anything. I've seen good and bad Django setups, so what? The point is configuration as code works well if you know what you are doing. A minimal config language (toml, yaml, etc) may also work well as long as you don't need something that's not supported by it, and as long as you don't need to debug it.


Frameworks that encourage convention over configuration provide a better tradeoff.

If you are willing to accept the defaults you aren't required to write any configuration.


Every company uses JavaScript too but let’s not pretend like it’s a top-tier programming language.


Glad it works for them. For me and my team it doesn't.


I find general-purposeness a great idea if your specific config depends on something complex, e.g. the username/host/path where the code is running. That's quite hard to do in a configuration-specific format, without anticipating this specific scenario in advance.
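
For instance, a rough sketch of such a config as a Python module; the setting names, the `/home/<user>/data` layout and the `build-` hostname prefix are made up for illustration:

```python
import getpass
import socket
from pathlib import Path


def build_config(user=None, host=None, cwd=None):
    """Return a plain dict of settings derived from where the code is running."""
    # Arguments default to the ambient environment; passing them explicitly
    # keeps the function easy to test.
    user = user or getpass.getuser()
    host = host or socket.gethostname()
    cwd = Path(cwd) if cwd else Path.cwd()

    config = {
        "data_dir": f"/home/{user}/data",
        "cache_dir": str(cwd / ".cache"),
        "workers": 2,
    }
    # The one branch: hypothetical build machines get more workers.
    if host.startswith("build-"):
        config["workers"] = 16
    return config
```

Everything except the single `if` is still effectively static data.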


> If you feel the need to put code in your runtime config, that means your design sucks.

Hello from every plugin system ever. True fact: not all configuration is basic.


I wouldn’t call it configuration.

And usually these things are implemented with way more forethought.


> I wouldn’t call it configuration.

I actually agree with you. Elsewhere in the comments I say that the OP article failed to name this properly: https://news.ycombinator.com/item?id=22788696

But it is what the article is really discussing.


Generate static configuration, effect only what's in the static configuration and persist it to version control. You get the ergonomics of a programming language while getting a diff if anything changes and get it code reviewed. Could even write unit tests if you start to get unintended changes.
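
A minimal sketch of that workflow in Python, assuming a made-up generator and a JSON output file; `generate`, the region names and the replica counts are hypothetical:

```python
import json
from pathlib import Path


def generate():
    """Hypothetical generator: the only place where logic is allowed to live."""
    regions = ["eu", "us"]
    return {f"web-{r}": {"region": r, "replicas": 3 if r == "us" else 2} for r in regions}


def render(config):
    # Sorted keys and fixed indentation keep the output byte-stable,
    # so a diff in version control only shows real changes.
    return json.dumps(config, indent=2, sort_keys=True) + "\n"


def write(path):
    """Regenerate the static file that gets committed and code reviewed."""
    Path(path).write_text(render(generate()))


def is_up_to_date(path):
    """True if the committed file matches what the generator would emit now."""
    p = Path(path)
    return p.exists() and p.read_text() == render(generate())
```

A CI job could call `is_up_to_date` on the committed file and fail when someone changed the generator without regenerating, and the application itself only ever reads the static output.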


That seems like an acceptable solution to this. I'd still prefer not to be in a situation, where configuration is so complex and dynamic, that I need config generation, but if one does land in such a situation, I think this is the most sane approach.


What about .bashrc, .vimrc or Makefiles?


I mean the "rc" in those names stands for "run command". To me they are initialization scripts.

Also, Bash is a scripting environment (sort of), and VimL is a solid part of vim. These are scenarios, where you don't add undue complexity or bugginess to the software. But personally, I wouldn't kid myself and pretend that I can write applications as solid as bash or vim.

Makefiles are just bash scripts with dependency resolutions. So again, scripts and not configuration.


Makefiles aren't executable - they are fine. .bashrc is badly designed. Bash should have a proper non-executable config system. But... it's Bash. If you're expecting sane robust design from Bash you haven't been paying attention!


Makefiles are executable, with their own crappy ad-hoc Turing-complete language. For example, here’s the full list of functions supported by GNU make:

https://www.gnu.org/software/make/manual/html_node/Functions...

Naturally, trying to do anything even slightly complex with it quickly leads to an unreadable cacophony of dollar signs, backslashes, and soft and hard tabs (you need both!). Just like the modern YAML-based systems, Make’s language wasn’t originally intended to be a full scripting language, but had scripting bolted on because people needed it. If only it had chosen to embed a real programming language instead, you could have the functionality without sacrificing readability.


It's not an ad-hoc turing complete language, it is just a programming language. Makefiles are just programs. There are some strange implicits, but that isn't unique to the Make programming language.


It's Turing complete because all standard programming languages are Turing complete (modulo some technicalities which are not relevant here).

It's ad-hoc because it was built gradually on top of the original Unix make, which supported rules and variables but not any of the more advanced features. This history explains many of its quirks and relatively extreme limitations as a programming language, such as:

- Expressions must be all on one line, except within `define` blocks or if you use backslashes to join lines together (which tends to result in a lot of backslashes).

- Hard tabs are required to introduce commands in rules but are (mostly) banned outside of them, so if you want any kind of indentation for an if block or user-defined function, you need to use spaces instead.

- User-defined functions. Calling them is merely more verbose than calling builtin functions (you need to invoke the `call` builtin function). Defining them is worse: function parameters are numbered rather than named, similar to a shell, but in a shell you can reassign them to named variables, whereas Make's expansion rules make that awkward to achieve.

- And of course, no types other than strings (a limitation partly shared by shells, but most shells do have arrays, even if they're awkward to use).

Admittedly, most of that gradual evolution happened a long time ago, and GNU Make has been relatively stable as a language since then. It's not "ad-hoc" in the sense that it's constantly changing or ill-defined. But that stability also means that it never outgrew the limitations of its original design.

Some research I did for fun:

The oldest version of GNU Make I can find [1], from 1988, already had a handful of functions, including `foreach`, `filter`, `patsubst`, etc., as well as support for multi-line definitions (`define`/`endef`). `call`, on the other hand, didn't appear until 1999. Amusingly, its initial implementation came with a comment [2] bemoaning the lack of "nested lists and quotes", and even semi-seriously proposing that GNU Guile (Lisp implementation) be integrated into Make, in order to let Makefile authors use a 'real' programming language. No fewer than 13 years later, that proposal actually became a reality; unfortunately, it was an optional feature which distributions tended to leave disabled, ergo Makefile authors could not rely on it being present, so the feature has seen approximately zero use.

[1] within https://gcc.gnu.org/pub/binutils/old-releases/binutils-1988-...

[2] https://github.com/mirror/make/blob/c4353af3f9b143b213644c09...


Glad to see Dhall (https://dhall-lang.org/) mentioned here.

As we seem to be heading towards immutable-'everything' though, sometimes I wonder how valuable dynamic (in the sense of picked up at start) configuration is for backend services. It seems preferable to build everything, config included, into one single artifact. Potentially this gives more optimization opportunities as a lot of branches could be completely eliminated (feature flags, etc.). No doubt the simplest way to build your config directly into your code is to just.. have it be code.

A potentially super annoying trade-off is that it's difficult to test minor configuration tweaks, and the annoyingness scales non-linearly with your build time.


Something that really annoys me about that intro:

No, I can’t identify what’s wrong with it. Why am I spending what feels like 5 minutes studying a code sample to find a flipped ‘I’ and ‘L’?

Every time I read about Dhall, I remember this and it sours my patience for the language.

It’s a good point, but there’s got to be a better way of communicating the value. Or, maybe just having a button that says “spoilers” and reveals the issue.


I couldn't figure it out either! But now that you tell me what's wrong, it looks completely obvious. And I agree that it turned me off on reading the rest of the page.


And even more annoying if you would rather not restart your services unless absolutely necessary (ie. you're handling long-lived TCP connections that cannot be handed off gracefully). Not to mention that some kinds of configuration tweaks (ie. ACL changes, quota changes, etc) might happen at a vastly different cadence than software rollouts.


> if you would rather not restart your services unless absolutely necessary (ie. you're handling long-lived TCP connections that cannot be handed off gracefully).

If you have no way to hand off your TCP connections gracefully, that is probably not a great position to be in because computers are unreliable and sometimes you need security updates. One approach I tried experimentally for a TCP server (but unfortunately did not have the opportunity to try in production) was to have a small server acting as the frontend that handled sessions and called into the real implementation which was essentially stateless. This small server could theoretically support self updating without dropping connections, if need be. (The main reason I was doing this was so that the rest of the stack could be dynamically-scaling based on Kubernetes, while this part remained relatively static.) I think anytime you have forced statefulness, it’s worth isolating as much as possible so it doesn’t constrain the design of the whole system.

(Even better, of course, is to just have a protocol where clients can gracefully handle draining and reconnect without side effects.)

> ACL changes, quota changes, etc

At a certain level I think these are runtime data and not configs. Aforementioned service also dealt with ACLs and configs and we were serving it through another service that persisted to a database layer. One cool thing about doing it as a service is that you can manage and scale it like any other service.


> At a certain level I think these are runtime data and not configs.

I think that's the gist of the issue - sometimes it's difficult to tell which one is which :).

And I agree with the 'TCP connection in thin server' approach. Thankfully, I only have to deal with two protocols where this is an issue: IRC and BGP.


>I think that's the gist of the issue - sometimes it's difficult to tell which one is which :).

Absolutely... in fact, there's definitely stuff that I've had start as configuration and gradually turn into runtime data, and there are things I would call configuration that need to be dynamic at runtime too... I'm never 100% satisfied with the split.


Ideally this isn't how you achieve HA - machines are unreliable and therefore service instances are too. Absolute statelessness is the goal, but of course that's exceptionally hard to achieve in the real world and bopping live connections will always cause annoyances.


This is how Lua got started - its predecessors started out as data description or configuration languages. The users felt it would be useful to have some forms of flow control, which led to the birth of Lua.

The whole story can be found here: https://www.lua.org/history.html


It's still very suitable for configurations. The statements can be used e.g. to configure rules.


Indeed, I wrote a console-based mail-client inspired by mutt but with Lua for the configuration file.

The Lua code made it easy to configure things such as the path to sendmail, and your email address. But also writing functions to filter, colour, and display messages was a joy.

mutt is awesome, but the configuration language it uses is a mess of history. No loops and conditionals being somewhat limiting. People end up writing "profiles" by essentially reading the configuration values from a different init-file. Horrid.


These days, using a statically typed language for configuration makes a lot of sense. You get stronger typing and with type inference not necessarily a lot more verbosity.

Kotlin and Kotlin script seem to work reasonably well for Gradle. Gradle unfortunately has a bit of Groovy legacy, which makes it a bit convoluted for some things. But Kotlin DSLs are quite nice for configuring things and it removes a lot of ambiguity when your editor can tell you a property doesn't exist or that a list instead of a string is expected, or help you autocomplete things that are allowed.

Basically the ultimate in configuration languages is something that has a minimal syntax but provides enough structure for tooling to help you validate things, provide autocomplete and other help, etc. There are now several languages emerging that have both static typing and some level of support for internal DSLs. Ruby sort of pioneered a lot of configuration DSLs back in the day but since it was dynamically typed, tool support for these DSLs was a lot harder. With languages like Kotlin, Swift, C#, Rust, etc. you get enough expressiveness that you can do similar internal DSLs but with the advantage of more robust tooling support.


Agreed, but the best combination is type-safe and non-turing complete, ideally with restricted effects. [Dhall](https://github.com/dhall-lang/dhall-lang) gets this so right that I'm increasingly shocked that people never seem to have heard of it.


This is a good example of a structured approach, with the caveat that there are multiple phases.

With gradle, there's three big phases: initialization, configuration, then execution. (See https://docs.gradle.org/current/userguide/build_lifecycle.ht...). I'd think you'd need to do something similar with any structured configuration DSL - have distinct phases where you may "init, plan, apply", etc.

Thus, it's a pretty complex problem to have. But that complexity may be worth it as part of a "developer productivity engineering" effort, where you're really looking to use data to minimize the time between a configuration change, and the time it's applied in a runtime environment. Doing that is a hard problem.


The only thing I don't like with Kotlin is the lack of a decent syntax for list and map literals. Unfortunately it seems to have resolved in a dead end as far as implementation into the language goes. It's still a nice language but quite awkward and ugly in the context of expressing a lot of literal data.

From that perspective Groovy is much nicer ...


> Bazel uses a subset of Python for describing build rules

Bazel's configuration language is called Starlark[0] (used to be Skylark). It's not a strict subset of Python. It has implementations in Java[1] and Go[2]. I haven't had a chance to use it yet, but it seems very useful as a general-purpose scripting language, especially for embedding into tools.

[0] https://docs.bazel.build/versions/master/skylark/language.ht...

[1] https://github.com/bazelbuild/bazel/tree/master/src/main/jav...

[2] https://github.com/google/starlark-go


I've had great success with the Go Starlark interpreter. Very easy to get started with. I map some of my internal structs to its dictionary type and let CLI callers filter my collections by providing predicates as Starlark expressions.

The mapping to and from Go types is a bit obtuse and could probably be smoothed over by reflection or an intermediate representation like JSON in many cases. It's not just data though: you can also send functions in both directions.

Embedding an interpreter for filtering vs. trying to compile all possible filtering needs into the binary greatly improved my program's power and shrunk the volume of application code. I recommend it.

Don't be alarmed by the Bazel connection: it's much easier to use and understand (and better documented) than Bazel itself. Although I am a lot better at Bazel now for having taken the time to understand Starlark.

All that said: if you find yourself needing to support logic and behavior in your configuration language, you may not actually have a good enough abstraction to justify a config-driven system. Take a hard look at whether it might be cleaner to write plain old code, or APIs for other people's code. A significant number of internal platforms at my company are unnecessarily obtuse because they try to put everything in boxes like this vs. being legitimately composable/programmable through ordinary software engineering tools and processes.


Bazel is convenient but also a full-time pain in my ass.


As someone already put it, 'Bazel is the worst system except all the others'. It has some warts (Python integration, non-hermetic-by-default rules, starlark structure javaesque complexity ...), but I still haven't found a good replacement for my use: a monorepo build system that has to touch a ton of different languages and can be run in CI without having to prebuild tons of fragile builder images.


It absolutely is, especially if you do C++ with it. So many bugs and missing features in the native rules and skylark api, it's not even close to a 1.0


Bazel is the thing that I've missed after leaving Google, (lol "bazel menu"). Back then I was using it on the surface, there were no .bzl files, but internal python rules. I actually hated it when I started, because i was like - WTF is this shit, then I've learned.

My best example would be a CI system, given a CL (changelist) it can tell you exactly what targets need to get built. In our current CI - we have put extreme whitelists - e.g. if you have touched this folder with C++ files, these are the .sln files to build. But can't move to bazel yet (mostly Windows shop). I actively use it for my home projects (linux + windows), but that's about it.


Thanks, I tend to forget it's actually Starlark, I'll amend.

I guess my point is it's so similar to Python that in most cases I don't even notice.


When you say "not a strict subset of Python", is that primarily the stdlib changes? Are there any syntactic differences that mean valid Starlark is not valid Python?


There aren't syntactic differences; syntactically, starlark is a strict subset. But there are some semantic differences/weirdnesses.

`load` does magic, so that `load('//path', 'name')` puts `name` into the file scope, which is possible in pure python but ewww.

Containers are immutable, so they can go in default values, which is good since a lot of bzl and BUILD relies on default arguments for brevity.
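
To illustrate why that matters, here is the plain-Python pitfall that immutable containers avoid. This only shows Python's behavior; the function names are made up, and it is not meant as a description of Starlark's implementation.

```python
def add_dep_shared(name, deps=[]):
    # In Python the default list is created once, at function definition time,
    # and the same object is reused by every call that omits `deps`.
    deps.append(name)
    return deps


def add_dep_safe(name, deps=()):
    # An immutable tuple default cannot be mutated, so calls cannot leak
    # state into each other.
    return tuple(deps) + (name,)


first = add_dep_shared("a")
second = add_dep_shared("b")
# `first` and `second` are the same list object, which now holds both names.
```

With immutable defaults, the short `deps=[...]` style that BUILD and bzl files lean on cannot leak state between calls like this.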


Yep yep, makes sense. I helped migrate a Python monorepo to Bazel, which involved an unfortunate amount of Starlark :-)

The mental lines between Starlark and Python are fully blurred at this point, but I've seen the "Starlark isn't Python" a few times this past week already.

Thanks for the elaboration :)


(ask me how I know they're source compatible: https://github.com/bazelbuild/bazel-skylib/pull/91)


One variant of this is to use Python syntax but parse it with the `ast` module rather than evaluating it.

This avoids the security problems and the risk that someone will start to write programs inside the configuration file, but gives you the reasonably nice and powerful syntax.

It turns out that teenagers in my country have often done some Python in school, so it's becoming one of the friendlier file formats for non-programmers.
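
A minimal sketch of this, assuming the config file is restricted to top-level `name = literal` lines (that restriction and the `parse_config` name are my own choices for illustration):

```python
import ast


def parse_config(source):
    """Read `name = <literal>` lines from Python-syntax text without executing it."""
    tree = ast.parse(source, mode="exec")
    config = {}
    for node in tree.body:
        simple_assignment = (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
        )
        if not simple_assignment:
            raise ValueError(
                f"line {node.lineno}: only simple `name = value` assignments are allowed"
            )
        try:
            # literal_eval accepts only literals (strings, numbers, tuples,
            # lists, dicts, sets, booleans, None); calls and names are rejected.
            config[node.targets[0].id] = ast.literal_eval(node.value)
        except (ValueError, TypeError):
            raise ValueError(f"line {node.lineno}: value must be a literal") from None
    return config
```

Comments in the file are dropped by the parser, and anything like `import os` or a function call fails with a line number instead of running.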


(author here). The ast suggestion sounds cool, never came to my mind. Thanks, I'll try it sometime!


I cannot disagree more. Configs are wonderful, as they follow the "rule of least power" with putting settings in a place without logic operations, variables, etc.

Yes, they are not general. Still, having one dedicated place for all constant switches is much better than constants being out in a dozen (or hundreds) of places all over code.


I agree with you. The idea of programmable config languages is really, really misguided. Whenever somebody wants to use something like that, it's a sign for me that they haven't thought out their design properly and are content with letting abstractions leak into the config.


But that's the point of this. You can have a single high-level config that emits the same high-level settings onto multiple config 'facades'. For instance, a single jsonnet/python program will let you tweak a list of domains, and that will in turn create a facade (or generate a file) for nginx (to configure vhosts) and for a letsencrypt tool (to acquire TLS certs). Less repeating across files, less chance of making a mistake, and since this is a programming language, you can have a single codebase that performs facade generation, without having to go through a CM system.
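
A toy Python version of that, to make it concrete. The domain list is made up, the certificate paths assume the usual `/etc/letsencrypt/live/<domain>/` layout, and the one-domain-per-line file is a hypothetical input format for whatever cert tool you use:

```python
# Hypothetical high-level setting: the only thing you edit by hand.
DOMAINS = ["example.com", "blog.example.com"]


def nginx_vhost(domain):
    """Render one nginx server block for a domain."""
    return (
        "server {\n"
        "    listen 443 ssl;\n"
        f"    server_name {domain};\n"
        f"    ssl_certificate /etc/letsencrypt/live/{domain}/fullchain.pem;\n"
        f"    ssl_certificate_key /etc/letsencrypt/live/{domain}/privkey.pem;\n"
        "}\n"
    )


def nginx_config(domains):
    """Facade 1: the vhost configuration for nginx."""
    return "\n".join(nginx_vhost(d) for d in domains)


def cert_requests(domains):
    """Facade 2: the list of domains to acquire certificates for, one per line."""
    return "".join(f"{d}\n" for d in domains)
```

Adding a domain to `DOMAINS` updates both outputs at once, so the two can't drift apart.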


But then your config files in Python also need tests. Are they still a config file then or config scripts? How do you limit what they can and can't do? Easy to do with JSON or XML, harder with Python.


This is why I'm not a huge fan of using Python for this (see my other comment) - and would much rather recommend Dhall/Cue/Jsonnet. And you can (and definitely should) write tests for these, or even just assertions for generated facades. It's much better than the alternative, which is just testing in production/qa/canary.


I cannot imagine how handing over a code base with configs that themselves require testing can be smooth.


I don't get it. If I get the choice between taking over a system where the configuration is self-tested, vs one that is not, I would much rather pick the first option?

You don't _have_ to test anything, just like in any software development. But now you can, and you can do it in a side-effect-free environment, on CI push, instead of realizing you made a typo when the CM kicks in, or worse, when your system starts crashlooping on startup because of a configuration issue.


If somebody hands over an application to me where configs have unit tests, this means to me that the application can fail catastrophically under certain (unwanted) configurations. And seeing how I might be the owner of said applications, chances are high that at some point I would need to change the configs.

I, personally, would rather prefer straightforward static configuration and very well written documentation on why certain configurations make more sense than others. That way, I know how and when to change something in the configs.

I absolutely do not share the view that "tests are documentation", because they do not communicate the most important thing in software engineering - the intent and reasoning behind a certain design.


> If somebody hands over an application to me where configs have unit tests, this means to me that the application can fail catastrophically under certain (unwanted) configurations

This would be a sign of a program that allowed input (configuration, in this case) that was unsafe. That is a completely different matter from whether a programming language is suitable for defining configuration. I suspect that many an application has been written that assumes that because it's consuming "static" configuration that is a simple text file, or a structured data file, the input is completely safe. (SQL injection can come from any direction.)


Not a popular answer, but XML is mature, widely supported by many programming languages, and addresses all of the frustration criteria mentioned in the article:

- doesn't have comments

XML uses the same <!-- stuff here --> SGML-style comments that HTML does

- bits of configs can't be reused

XML can include other XML via XInclude: https://www.w3.org/TR/xinclude/

- can't contain any logic

You can technically put logic in XML using XSLT (https://www.w3.org/TR/2017/REC-xslt-30-20170608/), but I would ask if one SHOULD put logic into your configuration. I personally would recommend against it if at all possible. Keep configuration as declarative as possible and put the logic in your program. It's better to have a configuration logic module or use dependency injection than put logic in your config.

- Programming language constructs are reinvented from scratch

Again, XSLT could be used, but should your configuration really need to be Turing-complete? Granted, XML is complicated and not without criticisms, but it is mature, standardized, and well-supported. Python has great XML support: https://docs.python.org/3/library/xml.html

- can't be validated

XML can be validated against XML Schema (https://www.w3.org/XML/Schema), Relax NG (https://relaxng.org/), and possibly other standards.

- implicit conversions and portability issues

XML can be used portably between different platforms and programming languages depending on how the data is represented. XML Schema defines a way to describe data types that is flexible and interoperable: https://www.w3.org/TR/xmlschema-2/

Again, I know it's not the popular answer, but I was surprised that it was not even mentioned in the article.
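
To make the comment and reuse points concrete, here is a small sketch using Python's standard `xml.etree.ElementTree` and `xml.etree.ElementInclude`. The file name `logging.xml`, the element names, and the in-memory `loader` are assumptions for illustration; a real setup would read the included file from disk.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical shared fragment, kept in memory instead of on disk.
FRAGMENTS = {"logging.xml": "<logging><level>info</level></logging>"}

MAIN = """
<config xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- shared logging settings live in their own file -->
  <xi:include href="logging.xml"/>
  <httpd><baseDir>/opt/server</baseDir></httpd>
</config>
"""


def loader(href, parse, encoding=None):
    """Resolve an xi:include href from the in-memory fragments."""
    if parse != "xml":
        raise ValueError("only XML includes are supported in this sketch")
    return ET.fromstring(FRAGMENTS[href])


root = ET.fromstring(MAIN.strip())
ElementInclude.include(root, loader=loader)  # expands xi:include in place

level = root.findtext("logging/level")
base_dir = root.findtext("httpd/baseDir")
```

The comment is simply ignored by the parser, and the included fragment ends up as a normal child element of the root.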


Merely for your consideration, I suspect a lot of people have opposition to XML because it is noisy when edited by a human:

    <httpd>
      <baseDir>/opt/server</baseDir>
    </httpd>
Of that snippet, "</baseDir>" and "</httpd>" are characters of my life that I will never get back, since they're just parser niceties and not _value_

Moving up the enlightenment chain:

    {"httpd": {"baseDir": "/opt/server"}}
or its brace-and-quoteless friend:

    httpd:
       baseDir: /opt/server
place a lot more emphasis on the payload and less on the packaging


It is a matter of preference and legibility.

With XML and other "fully qualified" syntaxes the named closing tag makes it easy to visually see where a block starts and stops. Humans parse the text too. This raises the question: Do you read config more or do you write config more?

Braces and parens all look the same, so it's harder to visually match them. "But wait," you may say, "my editor / IDE does brace and paren matching so I can easily see and jump to the corresponding brace or paren." True, but doesn't your editor / IDE also support or have plugins for automatic HTML / XML end tag insertion?

Personally, I am equally comfortable with XML and JSON style configuration, but I find the brace-and-quoteless style unsettling because I believe that whitespace is a poor choice for delimiting structure. Your opinion may vary and that is fine. Do what works best for you.


But you can do

     <httpd baseDir="/opt/server"/>
and it is shorter than the JSON.


In my opinion one of the weaknesses of XML is that it gives so many ways to express the same data: in attributes, as nested elements, with CDATA, and probably other ways. And often it is a matter of taste. So in the end you come up with so many approaches to express similar structures.


>> So in the end you come up with so many approaches to express similar structures.

Expressiveness and flexibility are good so that you can define configuration to meet the needs of the system or application you are building.

XML Schema, Relax NG, etc. can be used to specify how the XML configuration should be structured, limits on data types, required versus optional configuration items.

As I said before, XML gets a lot of flak for being verbose, ugly, and complicated, but it is mature, widely-supported, and might be worth considering depending on your needs.


Some parts are extremely ugly, and mostly pointless: namespaces, doctype, processing instructions.

This is well-formed XML and a good way to confuse people:

     <?xml version="1.0"?><!DOCTYPE abc[<?abc >]]<abc><abc/>]>>?>]><?x?><a/><?x <(x)>>?>


The main reason I'm not using XML seriously is that it only really has a single native, scalar type: string. Sure, you can schema validate away a large part of this problem, but it's still somewhat of a wart.
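
A quick illustration with Python's standard library; the `port` element and key are made up for the example. Without a schema, the XML value arrives as a string and the caller has to convert it, while JSON carries the number type in the syntax itself.

```python
import json
import xml.etree.ElementTree as ET

# The same setting expressed in both formats.
xml_port = ET.fromstring("<httpd><port>8080</port></httpd>").findtext("port")
json_port = json.loads('{"httpd": {"port": 8080}}')["httpd"]["port"]

# xml_port is the string "8080"; json_port is already an int.
port = int(xml_port)  # the conversion has to live in the consuming program
```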


> Again, XSLT could be used, but should your configuration really need to be Turing-complete?

A few of OP's suggestions are not Turing complete, but rather total programming languages, as is the case with Dhall.


> You can technically put logic in XML using XSLT (https://www.w3.org/TR/2017/REC-xslt-30-20170608/), but I would ask if one SHOULD put logic into your configuration.

Well, definitely not XSLT.


XML needs to die


I think that the security issue needs more attention than the author gives it. Config files are often shared and not rigorously checked, especially if they are very long. Arbitrary code execution is a real security risk that should not be minimized.

For example, years ago I was a frequent user of the chemical kinetics code called Cantera. It calculates the dynamics of combustion reactions, with the big application being for jet engines. One of the files that it needs to load is a mechanism file (called the CTI file). This contains all of the information about the gas properties and chemical reactions. Different situations might require different mechanisms (propane mechanism versus JP8 mechanism). Anyhow, Cantera's mechanism file format is literally a Python script. See the link below for the most commonly used mechanism file that comes with Cantera:

https://github.com/Cantera/cantera/blob/master/data/inputs/g...

This file is 2000 lines long, and many mechanism files are even longer. I told my colleagues that it is possible to execute arbitrary Python code using the files but I was unable to convince them that it was a security risk. I think that these kinds of config files are a big security risk for engineering firms, because they make it much easier to conduct industrial espionage. All that a bad actor has to do is put a few lines in one and get an engineer to run it once. Then they could steal designs, analyses, business plans, financial data, and many other things. It's a serious threat that should not be minimized.


I converted PyOxidizer's configuration files from TOML to Starlark because I found it effectively impossible to express complex primitives in a static configuration file and the static nature was constraining end-user utility.

A common solution to this problem is to invent some kind of templating or pre-evaluation of your static config file. But I find these solutions quickly externalize a lot of complexity and are frustrating because it is often difficult to debug their evaluation.

At the point you want to do programming-like things in a config file, you might as well use a "real" programming language. Yes, it is complex in its own way. But if your target audience is programmers, I think it is an easy decision to justify.

I'm extremely happy with Starlark and PyOxidizer's configuration files are vastly more powerful than the TOML ones were.

https://pyoxidizer.readthedocs.io/en/stable/config.html


> I found it effectively impossible to express complex primitives in a static configuration file

I don't quite understand this part. Any examples?


Compare https://pyoxidizer.readthedocs.io/en/v0.4.0/config.html#file... to https://pyoxidizer.readthedocs.io/en/stable/config.html#file....

In PyOxidizer's case, I wanted to create virtual pipelines of actions to perform. In TOML, we could create sections to express each stage in a logical pipeline. But if you wanted to share stages between pipelines, you were out of luck. With Starlark, you can define a stage as a function and have multiple functions reference it, because it is "just" calling a function.

I suppose I could have defined names for stages and made this work in TOML. So let's use a slightly more complicated example.

PyOxidizer config files need to allow filtering of resources. Essentially, call fn(x) to determine whether something is relevant. In the TOML world, we had to define explicit stages that applied filtering semantics: there were config primitives dedicated to applying filtering logic. PyOxidizer's config files had to expose primitives that could perform filtering logic desired by end-users. By contrast, Starlark exposes an iterable of objects and config files can examine attributes of each object and deploy their own logic for determining whether said object is relevant. This is far more powerful, as config files can define their own filtering rules in a programming language without being constrained by what the TOML-based config syntax supports.
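
To show the difference in shape, here is a plain-Python sketch (Starlark is a Python dialect). None of this is PyOxidizer's actual API: the `Resource` class, the `exclude_prefixes` rule, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    is_test: bool


RESOURCES = [
    Resource("app.main", False),
    Resource("app.tests.test_main", True),
    Resource("vendored.docs", False),
]

# Static style: the tool has to ship a dedicated primitive for each kind of rule.
STATIC_RULES = {"exclude_prefixes": ["vendored."]}


def apply_static_rules(resources, rules):
    """Apply the only filtering primitive this hypothetical tool supports."""
    prefixes = tuple(rules["exclude_prefixes"])
    return [r for r in resources if not r.name.startswith(prefixes)]


# Programmable style: the config file supplies its own predicate.
def is_relevant(resource):
    return not resource.is_test and not resource.name.startswith("vendored.")


static_result = [r.name for r in apply_static_rules(RESOURCES, STATIC_RULES)]
programmable_result = [r.name for r in RESOURCES if is_relevant(r)]
```

In the static version, filtering out tests would need a new config primitive in the tool. In the programmable version it is one more condition in the user's function.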


Clojure and Common Lisp (among other Lisps, I'm sure) are great at handling programmable configuration files.

I do not think I would use this author's suggestion though.


There are cool things somewhere in the middle now too, e.g. Cue [1], which is really cool.

I was recently thinking that it's a shame there isn't a cross language proto-lite kind of thing (I'd settle for json with structures and pointers) that has a great devx. The multi step transform that configs are right now (from the serialization representation to the code one) could be eliminated. IMO interop is one of the most important things for configs, so I'd be hesitant to use a thing that needs a preinstalled runtime.

[1] https://cuelang.org/


Related: This is the same reason I've always preferred doing server config management with Chef (which uses Ruby for everything) to Puppet (which has its own DSL). Ansible, which uses YAML config with Python behind the scenes, is sort of a middle ground, but often wins out because it's simpler to learn and operate than a full Chef setup. I still find myself missing Chef when I have to write stuff for Ansible though.

To me, from best experiences to worst: Real language (ruby, Python) > Yaml/Toml > ini config > JSON > DSLs


Check out https://cuelang.org

Created by people who have dealt with this problem at scale for many years. Picked it up a month ago, absolutely amazing.


Smug lisp weenies, as lisp fans sometimes call themselves, can be a bit annoying at times, but as I see all these new config file formats, data DSLs, hybrid scripts like Svelte, etc. emerge, you do have to admit that none of these are necessary in a language with a programmable syntax like Lisp.


Seems like the config language would benefit from being functional. I've actually sort of seen an example of this: in GCL/BCL (Google/Borg configuration language). I used to joke that it's essentially a FP language with all the good features removed. All the alternatives I have seen outside Google are much worse, however.

See here for more details: https://pure.tue.nl/ws/portalfiles/portal/46927079/638953-1....

It ticks off most of the author's boxes, but it's so woefully underspecified and so horrifyingly complex, that Google has been trying to replace it with something else for the past decade, unsuccessfully (at least as of a few years ago, don't know about now).

But you get to provide external parameters, compute values, use some rudimentary logic, inherit configs, reuse configs in larger configs, and so on and so forth. You can also spend a couple of weeks trying to understand the config structure of something like an ads backend.

You do get some niceties though: even fairly large services, consisting of dozens of different backends/mixers/frontends can be brought up/upgraded/shut down/reconfigured with a single CLI invocation, soup to nuts, including things like monitoring and load balancing.


I did this with a server-side TypeScript project I built last year. The biggest advantage I noticed (and it's mentioned in the article): validation (i.e. type-safety).

Specifically, the project uses TypeORM and it was nice to be able to import the types for the DB connection options to make sure I had all of the necessary properties.

I was using ts-node to run server-side without compilation, which made importing an `app.config.ts` file quite elegant, IMO.


I think not containing any logic is a useful feature for config files. Logic adds complexity and potential for bugs and vulnerabilities - I actually like that JSON forces you to use simple primitives as config variables. I hate it when config files start containing function definitions; you can't pass those around to different processes.

I'm generally against the idea of complex configs. Configs should be simple. Sometimes that requires extra planning or thinking from developers. Every time I've allowed logic in config files, I've regretted it.

The main annoyance about JSON is lack of commenting ability. This is a point I agree with but not a big deal IMO, it's still better than all other alternatives.

JSON is ideal in terms of being both machine-readable and human-readable. Some people would argue that YAML is more human-readable, but it's definitely less machine-readable. With YAML, there are too many situations where some random code somewhere will remove all the new lines and tabs (for whatever reason) and mess everything up. JSON is resilient to machine sanitization. JSON is simple, robust and readable.


This reminded me of the Configuration Complexity Clock:

https://mikehadlow.blogspot.com/2012/05/configuration-comple...

Hacker News discussion here:

https://news.ycombinator.com/item?id=14298715


I use schemy [0] for non trivial config tasks in .NET projects. LISPs in general are well suited for this. Very small implementations are easily done and data is code, it’s pretty great.

Of course, if one has already access to a Python runtime that is probably fine as well.

[0] https://github.com/microsoft/schemy


Lisps are great as embedded scripting and configuration languages! Lisp is XML done right.


This reminds me of Proxy Auto Configs (https://en.wikipedia.org/wiki/Proxy_auto-config). PAC files are proxy configs that are programmed in JavaScript. To do this, devices usually embed a JavaScript runtime in the operating system to parse proxy files. This introduces a lot more attack surface than a standard config file would and has resulted in remote code execution vulnerabilities in Android and Windows:

https://android.googlesource.com/platform/external/chromium-...

https://googleprojectzero.blogspot.com/2017/12/apacolypse-no...


One of my favorite projects that does this is Guix, which uses Guile for its configuration. See this page for an example: https://guix.gnu.org/manual/en/html_node/Using-the-Configura...

It's a nice balance, I think - most of the time you can treat it as a normal config file, but if you need to drop into some programmatic stuff you can. I've done this in my own dotfiles to add custom package definitions before contributing them upstream, for example: https://github.com/jfrederickson/dotfiles/blob/master/guix/g...


I somewhat agree and so does much of the JavaScript world.

eslint or maybe jslint started off with json and later ended up letting you make a JavaScript file. As a simple example of something you might want to do, read the version from the package.json and generate a banner or an installation version.

Grunt (not sure who still uses Grunt but I do) is JavaScript based. And yes, sharing parts is something I commonly do as in define common options and re-use them in dev, production, or minimized, un-minimized.

Webpack takes a JavaScript file.

For most build tasks I've had some place where I needed to use code to make the build better, but it can go both ways. The worst experience I've ever had is SCons (Python). No one I know ever really understood how it worked, so every programmer wrote wrong and hard-to-understand code trying to insert their special needs into the build.


I am quite turned off by the following passage near the top of this page:

> bits of configs can't be reused. For example, while YAML, in theory, supports reusing/including bits of the config (they call it anchors), some software like Github Actions doesn't support it.

I use YAML configuration for Home Assistant all the time, and the includes work just fine. This article is off to a really weak start by saying "config files can not have X, because well they can just fine but arbitrary software Y doesn't support it". Especially damning because the proposed solution is to embed a whole other programming language within your program! If you can change the code of the consuming program, then just add support for imports, instead of adding a Python dependency.


You can add eslint to the things that allow scripts as config files, just like webpack. I found this very useful because I just started a full-stack javascript project, and I need most of the linting config to be the same (tab-width, syntax, etc.) but then node and browsers don't have the same syntax for imports, don't have the same globals, etc. so I want to reuse a part of the config. And because you can write the config in js, it's trivial to import the result of the common config file (which is also used for isomorphic code, which should have no client/server specific code) and just add or remove what I need from that to return a different config for linting client and server files.


Lua is perfect for this. It has a simple syntax, and one of its design goals was to be a config language.


Still don't get the hang up about comments with JSON. I'd find it strange to encounter a config file with lots of internal documentation. They should be small with well named, descriptive settings.

These are config files. A comment is something that gets ignored, this "comment" will be ignored by whatever ingests this JSON. If not, there are other problems. If you have to write a treatise here, there are other problems.

   {
      "comment": "This is a comment",
      "dev": { ... },
      "stage": { ... },
      "prod": { ... }
   }
Oh no, that line doesn't start with a # or //, whatever will we do!


- sometimes the schema is strict (and you might not be able to control it)

- you might not be certain if you break something with such a comment (i.e. you have to double check that all the "comment" fields are filtered)

- you're not getting highlighting from your text editor, it's easier to miss a comment

- you can't have a comment for a list item

Basically, you can leave the comments, but you're really discouraged to do so. Most people would just give up and not bother.
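
For the second point, this is roughly the filtering step a consumer would need before handing the data to strict code. It is a sketch only: the `"comment"` key name comes from the example above, and `strip_comments` is a hypothetical helper, not part of any library.

```python
import json


def strip_comments(value, key="comment"):
    """Recursively drop every `key` entry from nested dicts and lists."""
    if isinstance(value, dict):
        return {k: strip_comments(v, key) for k, v in value.items() if k != key}
    if isinstance(value, list):
        return [strip_comments(v, key) for v in value]
    return value


raw = json.loads(
    '{"comment": "This is a comment",'
    ' "dev": {"comment": "local only", "port": 8080},'
    ' "hosts": ["a", "b"]}'
)
clean = strip_comments(raw)
```

Note that the entries in `hosts` still have nowhere to carry a comment of their own, which is the list-item problem.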


I think the real loss here is that with a # or // the person reading the file immediately knows this is a comment. This line won't be read by the interpreter, and it's purely here for me to read, so I probably should.

The following however

    {
      "comment": "Some comment",
      ...
    }
Suddenly I'm not sure if this is meant for me, or if it's actually a configurable state without going into the program itself and checking if it uses it.


Putting a comment into the payload is just a terrible hack. This shouldn't be necessary.


> Still don't get the hang up about comments with JSON. I'd find it strange to encounter a config file with lots of internal documentation.

A lot of mainstream applications come with documentation in the sample config file that explains what each setting does and what its default value is. For example, this is the default configuration for redis[1].

Having that information in the config file guides users in terms of what's possible to change and what values can be used.

[1] https://raw.githubusercontent.com/antirez/redis/5.0/redis.co...


Your "comment" is part of the data so it's not a real comment. I don't understand the resistance against adding comments to JSON. This seems pretty silly to me.


Commenting in complex configs is actually a highly compressed signal for communicating a lot of information without a lot of unwieldy external documentation. INI files support comments. apache.conf and any number of Linux config files support comments.

Also, executable configurations are not uncommon. Consider .bashrc or .vimrc. And they're usually loaded with multi-line comments.

Embedding comments in JSON fields gets unwieldy really quick for long configs. Compare:

    {
        "generalcomment": "the quick brown fox\njumps over the lazy dog",
        "setting1" : {},
        "setting1comment" : "this is setting1",
        "setting2tree": {
                "setting2a": {},
                "setting2acomment": "this is supposed to do this",
                "setting2b": {},
                "setting2bcomment": "this is supposed to do this"

                },
        "setting3": [1,2,3],
        "setting3comment": "to use this setting, do this"
    }
To:

    // The quick brown fox
    // jumps over the lazy dog.
    {
        "setting1" : {},  // this is setting1 
        "setting2tree": {
                "setting2a": {},  // this is supposed to do this
                "setting2b": {},  // this is supposed to do this
                },
        "setting3": [1,2,3] // to use this setting, do this
    }


> I'd find it strange to encounter a config file with lots of internal documentation.

This is hard to believe. Go ahead and open any Linux config file.


I do something a bit different - salt pillar or ansible vault contains my "settings" for the entire backend, so as a step in my deploy, I just write out all the settings as a config.json so it's accessible from any application programming language.

Then I'll have a Python module config.py that reads that and injects those settings, with perhaps a small amount of logic, into that namespace on import. So wherever you might need a setting you can just "import config; config.foo" to get a specific config value.

For small systems, I even do service discovery this way - where does the db live? config.postgres.host/port or something yada yada.
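
Roughly like this, as a sketch. The `load_config` helper, the key names, and the temp-file demo are stand-ins for illustration; in the real module the load would happen once at import time against the deployed config.json.

```python
import json
import tempfile
from pathlib import Path
from types import SimpleNamespace


def load_config(path):
    """Read a JSON file and expose nested objects as attributes."""
    with open(path, encoding="utf-8") as f:
        # object_hook runs for every JSON object, so nesting works too.
        return json.load(f, object_hook=lambda d: SimpleNamespace(**d))


# Demo with a throwaway file standing in for the deployed config.json.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"
    settings = {"postgres": {"host": "db.internal", "port": 5432}}
    path.write_text(json.dumps(settings), encoding="utf-8")
    config = load_config(path)

db_address = f"{config.postgres.host}:{config.postgres.port}"
```

This only works when the JSON keys are valid Python identifiers, which is easy to guarantee when the same deploy step writes the file.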


Note one kinda neat thing about json is you can use jq from scripts to get values out of it as well.

`psql -h $(jq -r '.services.postgres.host' config.json)`


Ansible might be a valid example of a system that should have been built using a programming language instead of a data language. Due to the template language mixed with interpretation it can be quite difficult to understand the source of a particular error, and the limitations of Jinja make it quite difficult to express some things that would be straightforward to write in Python. That said you can be quite productive with Ansible and the config is very easy to read and simple to write. I think the original Ansible author now tries a Python based approach with his new tool (opsmops), so we'll see how that works out!


Imo this article points out legitimate problems, but downplays the downsides. The real solution is to combine both:

1. Use a general purpose language (or ideally a very dumb subset like Starlark) to generate the configs. You get all the reusability, type safety, comments etc.

2. Have #1 output a deterministic config like json/yaml/etc, so you can always refer to what the config actually looks like without needing to debug the code in your head. Check that in and whoever uses the config only uses the static output, they shouldn’t care how it came to life.

This way a config is easy to reason about both on the author and consumer side.
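
A minimal sketch of the two steps in Python. The setting names, the environments, and the `build_config` / `render` helpers are invented for the example; sorted keys and fixed indentation are one way to make the output deterministic so diffs of the checked-in file stay readable.

```python
import json


def build_config(env):
    """Step 1: compute the config in code (shared defaults plus overrides)."""
    common = {"log_level": "info", "timeout_s": 30}
    overrides = {"dev": {"log_level": "debug"}, "prod": {"replicas": 3}}
    return {**common, **overrides.get(env, {})}


def render(config):
    """Step 2: emit a deterministic static artifact to check in."""
    return json.dumps(config, indent=2, sort_keys=True) + "\n"


prod_json = render(build_config("prod"))
```

The consumer only ever reads the rendered JSON, so it never needs to know the generator exists.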


I write a ton of config in a bespoke Python config language at a FAANG, and there are definitely downsides to having logic in your configs. Config is supposed to be explicit and when you start iterating over lists and parsing conditionals, those layers of abstraction can make reading a simple config quite cumbersome. Additionally, each team will often write their own libraries which diverge from the standard practices, furthering the stratification.

That being said, these configs can get very complex and I can’t imagine expressing them in something even relatively friendly like YAML.


https://queue.acm.org/detail.cfm?id=2898444

""" ... embrace the inevitability of programmatic configuration, and maintain a clean separation between computation and data. The language to represent the data should be a simple, data-only format such as JSON or YAML, and programmatic modification of this data should be done in a real programming language, where there are well-understood semantics, as well as good tooling ... """


I'm surprised the author didn't mention Suckless tools https://tools.suckless.org/.

Their philosophy is to use minimal software and all configuration is done in a single programming language file. But it's also written in C which means you need to recompile the app to change config values. They typically provide a bunch of patch files that you can optionally bake into your custom binary, or you can write your own modifications too.


> you need to recompile the app to change config values.

This works for a user using one of the suckless tools like dwm or surf. However, for server side applications there may be multiple configurations and it may not be feasible to compile the whole app for every environment or deployment.


Unless we're talking about some specific cases of dependency injection, adding logic to your configurations sounds like a living hell. Can you imagine the exponential explosion of complexity that you'd have to deal with? How do you even _test_ something like this?

If your program requires configurations so dynamic that the config needs to be its own program, there's something very smelly about your architecture. Next thing we'll need is a config config to config your config, etc. Infinite regression into a very stupid place.


Configs usually have a decent amount of logic hidden in the implementation for defaults and other implicit things. Having a language like Dhall puts a hard limit on the possible complexity of the configuration while simultaneously allowing you to make everything explicit without being redundant.


Configuration files are globals and they are lousy for the same reason globals are lousy when they are in a program.

If you use a programming language, you at least have a way to scope.


Using json or yaml for your languages config is nice for the application but it's not great for the developer, especially if they want to reuse values or settings across multiple applications. The solution then is to use a real programming language to generate that config. This is why languages like Dhall, JSonnet, and Cuelang exist. It's why I built my own toy language UCG to explore the space. But you could just as easily use python, javascript, or some other language too.


> json or yaml... Dhall, JSonnet, and Cuelang... UCG... python, javascript

God, what are we being punished for?


If you think that's punishment then you haven't suffered enough. I've seen enough custom written config formats in my life that choosing a widely supported format like YAML or JSON is always the right choice. YAML may have warts but you can learn them in 10 minutes and you're done. Those custom configuration formats don't even have a name or documented syntax.


This reminds me of makefiles. Most programmers wouldn't dream of checking in code with no comments. But makefiles with byzantine code in them, no comments whatsoever, nothing even saying what the makefile is for, are routine. The makefiles degenerate into such an awful mess that then people look for alternatives to make.

The sensible solution is to treat a makefile like code - document it and use sensible coding practices like organizing it instead of writing spaghetti soup.


Indeed; I've always wondered what is it that's keeping us from having a saner alternative to make in the form of a simple yet more powerful interpreted language with all needed procedures and functions built in.


You're describing Nix: https://nixos.org/nix/

It fills the same niche as make, but is a pure functional programming language, reproducible, and has a large package repository.

It's pretty much my favorite piece of software because it allows me to depend on third party packages with minimal risk of breaking the build.


My wishlist for a perfect declarative language (configs being one primary use) is something like:

* Not turing complete.

* Imports/exports

* Comments

* Variables

* Statically typed (amongst other benefits, allows you to use it for RPC, like protobuf)

I've started implementing such a language 2 or 3 times, but JSON is just barely good enough that I haven't been able to justify the effort to flesh it out. Taking JSON and adding types (inline in the grammar; schemas aren't good enough) and comments would get us 80% of the way there.
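
As a rough idea of what "JSON plus types" buys you, here is a sketch that approximates it in Python with a dataclass acting as the schema. This is not the language described above, just an illustration; `HttpdConfig` and `parse_typed` are hypothetical, and the check only handles flat objects with simple field types.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class HttpdConfig:
    base_dir: str
    port: int


def parse_typed(text, cls):
    """Parse a flat JSON object and check keys and value types against cls."""
    data = json.loads(text)
    expected = {f.name: f.type for f in fields(cls)}
    if set(data) != set(expected):
        raise TypeError(f"expected keys {sorted(expected)}, got {sorted(data)}")
    for name, typ in expected.items():
        if not isinstance(data[name], typ):
            raise TypeError(f"{name} should be {typ.__name__}")
    return cls(**data)


cfg = parse_typed('{"base_dir": "/opt/server", "port": 8080}', HttpdConfig)

# A string where an int is expected is caught at load time.
type_error_raised = False
try:
    parse_typed('{"base_dir": "/opt/server", "port": "8080"}', HttpdConfig)
except TypeError:
    type_error_raised = True
```

Having the types inline in the grammar would remove the need for the separate class, which is the part JSON cannot give you today.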


Looks like Dhall would be a perfect fit for you.

> Dhall is a programmable configuration language that you can think of as: JSON + functions + types + imports

https://dhall-lang.org


I've looked at Dhall before, and never come away with a desire to try it. It's interesting, and definitely has a lot of the right ideas. I'm not convinced functions are necessary.

Also, any language without an official browser JavaScript implementation obviously wasn't created to solve the same problems I'm interested in.


It's hard to take Dhall seriously because it seems like its authors don't take it seriously, and work on it for entertainment. They went through the trouble of creating bindings for five (!!) different languages, and the most popular of them, by a pretty wide margin, is Ruby. I assume Fortran, Delphi, and Idris are next on their list.


Dhall now has bindings to Rust and Go, and Java/Python bindings are currently in progress.

We also do take things seriously, including:

* Creating a language server (do any other configuration languages have this?)

* Soliciting donations to fund high priority work (https://opencollective.com/dhall)

* Working on a book (https://github.com/Gabriel439/dhall-manual)

* Maintaining a formal language semantics (https://github.com/dhall-lang/dhall-lang/tree/master/standar...)

If you think we're still missing something please let us know, as we are responsive to user feedback.


Rust and Go are even less popular than Ruby but glad to hear about Java and Python. Either one of them has at least 5x the market penetration of all currently supported languages combined, and at least 10x of all of them but Ruby combined.


Overall language popularity is only one input into how we prioritize programming languages.

The best way I can summarize our prioritization process is that we prioritize, in descending order:

* What people are willing to spend their free time to build (I can't order other people to build high-priority bindings and my own free time is already accounted for by improving Haskell bindings that power a lot of shared tooling such as the language server)

* Bindings specific to DevOps use cases (e.g. Go / Python / Ruby / Nix / JSON / YAML, since they are the dominant languages and formats in this space)

* Bindings that can be used to create derived bindings (e.g. Rust, which can then be used to create a binding in any language that can bind to C. In fact, this is how the upcoming Python bindings work. See: https://pypi.org/project/dhall/)

* Bindings that users request (We have a yearly survey where we ask users to inform the direction of the ecosystem. Python was the most requested language in the most recent survey)

* Overall language popularity (as the final tiebreaker)

So I hope this illustrates that there is a lot more that goes into these decisions beyond just which language is the most popular and we're not being obtuse or dilettantes just because we haven't gotten to a specific language, yet.


Has there been any work to get a native browser implementation, either in JS or Wasm?


Not native JavaScript, but the closest thing we have is PureScript (still in progress):

https://github.com/MonoidMusician/dhall-purescript

PureScript corresponds pretty closely to JavaScript in terms of the code it generates, so once we have a PureScript binding it shouldn't be hard to generate lean JavaScript bindings from that. Also, more people requested the PureScript bindings over the JavaScript bindings anyway in our most recent yearly survey:

http://www.haskellforall.com/2020/02/dhall-survey-results-20...


HOCON is what you want: https://github.com/lightbend/config


No official browser JS implementation is a non-starter for me.


I have long argued that teams should write configs in one of the languages they already know and are using in the system.

This has many obvious benefits, including testing!

A key aspect is for the owner of the "config system" (the stuff that takes the output of each config program and applies it) to standardize the API for config generators. Inputs, outputs, runtime env, etc. Then let teams integrate with that API however they want.
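
A minimal sketch of what such a standardized generator API could look like, assuming Python is the language the team already uses. The names `ConfigGenerator`, `apply_configs`, and the single `env` input are hypothetical, not taken from any real system: each team supplies a function from a declared input to a plain dict, and the config system owner only merges and checks the outputs.

```python
import json
from typing import Callable, Dict

# Hypothetical contract: a generator takes the target environment name
# and returns a JSON-serializable dict. This is an assumption, not a real API.
ConfigGenerator = Callable[[str], Dict[str, object]]


def web_team_config(env: str) -> Dict[str, object]:
    """Example generator owned by one team."""
    return {"web.workers": 8 if env == "prod" else 2, "web.debug": env != "prod"}


def db_team_config(env: str) -> Dict[str, object]:
    """Example generator owned by another team."""
    return {"db.pool_size": 20 if env == "prod" else 5}


def apply_configs(env: str, generators: Dict[str, ConfigGenerator]) -> str:
    """Run every generator, reject key collisions, and emit one JSON document."""
    merged: Dict[str, object] = {}
    for owner, generate in generators.items():
        for key, value in generate(env).items():
            if key in merged:
                raise ValueError(f"{owner} redefines {key!r}")
            merged[key] = value
    return json.dumps(merged, sort_keys=True)


if __name__ == "__main__":
    print(apply_configs("prod", {"web": web_team_config, "db": db_team_config}))
```

Because each generator is an ordinary function, it can be unit tested like any other code, which is the testing benefit mentioned above.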


Here's a philosophical question... when does one put configuration in files vs a GUI? When changes in the GUI are made, do you update the config files? This always bothered me. Even databases don't get this quite right. Take MySQL: you can SET GLOBAL time_zone after the database has started, or you can set it in my.cnf. To me, having multiple places to do the same thing leads to confusion and head-desking.


I disagree.

Those configs would be unmanageable. You would no longer be able to set a value; interfaces like `git config` or your-favorite-preferences-GUI-pane wouldn't be possible. And, generally speaking, I believe it's a bad and non-user-friendly thing.

However, it makes perfect sense to use a programming language, if your program is programmable and is meant to be programmable. Like a shell's rc files.

It's also probably okay to use programs as configs, if everything related to the program is built declaratively, so all the configuration files are generated (from some higher-order configuration) and not meant to be ever manipulated, only replaced with the newly generated versions. Programmatically generating programs is simple, programmatically manipulating programs is not.

It is also fine if you explicitly want to restrict an ability to manipulate the configuration or create management interfaces, requiring human programming (whether the tool is meant to be programmable or not). Feels like a weird idea, but I can see this being considered as a trade-off.


This is an area where Tcl can shine. Sandboxed sub-interpreters can be stripped of all unwanted commands so that you are no longer Turing complete. You are left with simple configurations that just set variables. If more power is needed you let some control structures back in or select custom commands to do what a dumb config file can't.


Yup. I just did a similar thing in Guile (using sandboxed environments). As a Scheme weenie I feel like I am looking at a world of people re-discovering things I have taken for granted since... forever?

I went from trying to embed Python in a C application to embedding Tcl and was blown away by how easy it was, and that it supported threads. This was back in 2003. I remember trying it with Lua, but the parallelism story wasn't really great until at least 5.1. Tcl did it right from 8.4 and things haven't really changed since, except that Lua caught up.


I've read a lot of "one true way" in this discussion. Do we need programmable configuration? Depends on the requirements really.

Django seems to work ok with Python configuration. I haven't seen much logic in the configs of the projects I've used. But the user is the developer, not an end-user, so security is not an issue from this angle.

I'm of the opinion that a simple config format like .ini is best for the end-user of a small application, while a validated schema (with code if needed) is best for a large one. To that end I'm currently experimenting with tconf to bring together those approaches under one package: https://github.com/mixmastamyk/tconf


I think Python is a terrible language for config files. It requires a full Python installation. Lua or a simple Lisp are better choices as they can trivially be embedded in every program at very low cost in terms of added code. They also avoid juggling different Python versions.


Exactly. And once one has something like that embedded, one can use that flexibility for many kinds of adaptations and scripting of the main program.



My users consistently forget a comma in a JSON config file, and then open an issue because "the program doesn't work."

This is indeed a problem, but something tells me the solution isn't "have them use Python."


Your users should not be writing JSON or Python to begin with, so using Python is fine.


So no configuration files at all?


As a sysadmin, my main requirements for configs are:

* Human-readable and writable, including on systems where you have only very basic text editing tools

* Easy to template (for example mass deployments via Ansible or other such tool)

* Easy to use with grep/sed/etc

The format I've found that is easiest to work with is what sysctl and OpenWRT use for their configs. You can have complex hierarchies, but every line stands on its own. This means you don't have to be careful about where in the config a particular line is, if it is in the proper block, etc. Also getting information on a particular sub-item is as easy as running a single grep command.
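
To illustrate the "every line stands on its own" property, here is a small sketch that flattens a nested structure into sysctl-style dotted lines. The function names are made up for this example, and the `key=value` rendering is an assumption about the general style rather than the exact syntax of any one tool.

```python
from typing import Dict, Iterator, Tuple


def flatten(tree: Dict[str, object], prefix: str = "") -> Iterator[Tuple[str, object]]:
    """Yield (dotted.key, value) pairs for every leaf of a nested dict."""
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def to_lines(tree: Dict[str, object]) -> str:
    """Render the tree as one self-contained 'key=value' line per leaf."""
    pairs = sorted(flatten(tree), key=lambda pair: pair[0])
    return "\n".join(f"{key}={value}" for key, value in pairs)


example = {
    "net": {"ipv4": {"ip_forward": 1, "tcp_syncookies": 1}},
    "vm": {"swappiness": 10},
}

if __name__ == "__main__":
    print(to_lines(example))
```

Since each output line carries its full path, a plain `grep net.ipv4` over the result finds every setting in that subtree without knowing anything about block structure.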


I do this all the time in projects at work and it's extremely useful. Most importantly it pleasantly separates core logic from configuration-specific niggling detail logic (now, for instance, you can generate that thousand entry list of similar things in a few concise lines of code without a bespoke preprocessor pipeline).

But the best way to sell it to people who don't understand or who aren't yet on board is to name it properly, which the article fails to do. What this is describing is no longer configuration. This is now a modular plugin architecture that only in the most basic usage case implements a configuration interface.


> I would argue that when you can't define temporary variables, helper functions, substitute strings or concatenate lists, it's a bit fucked up.

I must admit that's where I stopped taking it seriously. If you need helper functions and temp variables, you do need a good Turing-complete language, but we're no longer talking about config files. Maybe about a config system, but not a config file. And confusing the two means missing the point of why config files exist at all. I mean, not everybody has to buy into the code vs. data division, but if you miss entirely why it exists, maybe it's not thought through enough.


I agree with you that helper functions within configuration are a code smell, but I find (immutable) temporary variables helpful. It's pretty common to want named constants to avoid copy-pasting configuration and to make consistent modification easier. You could have an inclusion mechanism and put those constant parts in their own configuration file, but then you've effectively renamed "my_constant" to "include('my_constant.cfg')". Immutable temporary variables are just non-globally scoped constants.


If your config files contain anything that could not be expressed as an OS variable, that's a maintenance nightmare right there. Eventually some user somewhere is going to jam entire mini programs in there and then complain when his castle of cards implodes.

Consider, if possible, the chance that if your program needs a subprogram for configuring itself, then maybe a bootstrap event hook or some tap into the bootstrap process like a plugin or something is a better fit for what you are trying to do. Actual managed extension points instead of ad-hoc ones will make your program much more useful in the long run.


I’m surprised to see nobody brought up the 12 Factor App principle about config,

https://12factor.net/config

Anything that's changing from one run of your application to another is config. Static settings that don’t change from one run to another are not config, they are just static pieces of data and it does not matter how you store them other than that you ensure it meets the operating need of your application.

Configs, however, which can change from one run to the next, are different. They need to be factored out of code completely and only addressed by inspection of the runtime environment.

Whatever tool you use to ensure they are injected to the environment is also a totally uninteresting decision as long as it meets the operating needs of your deployment system.

That’s it.

1. store static constants and data items however you want as long as your program meets its operating requirements

2. config is not the same as static constants or static data, config can change from one run of the application to another.

3. factor config completely out of the code so it is solely referenced as part of the environment

4. store external configs however you want so long as the operating constraints of the deployment system are met.

Within items 1 and 4, debate over relative merits of different tools is almost always useless bikeshedding unless it boils down to a real operating constraint of either the app itself or the deployment system.

For example, a system that puts secret access tokens (config, not static settings) into the environment by storing them in plaintext environment variables might violate a security operating constraint of the deploy system and so a different system that manages encrypted secrets injected into the environment safely could be the winner for real operating reasons.

Meanwhile, whether to store “staging” vs “production” database connections in Python / YAML / JSON / TOML / etc. because of comments / whitespace / use of builtin library “extra” code / whatever is just pure bikeshedding waste of time.
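
A rough sketch of point 3, assuming a Python app. The variable names `APP_DATABASE_URL` and `APP_REQUEST_TIMEOUT_S` and the `Settings` class are invented for illustration; the only idea taken from the above is that per-run config is read from the environment and nowhere else.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    request_timeout_s: float


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Read per-deployment config from the environment only."""
    try:
        database_url = environ["APP_DATABASE_URL"]
    except KeyError:
        raise RuntimeError("APP_DATABASE_URL must be set in the environment") from None
    # The default is a static constant (item 1), not config, so it can live in code.
    timeout = float(environ.get("APP_REQUEST_TIMEOUT_S", "5"))
    return Settings(database_url=database_url, request_timeout_s=timeout)
```

How the variables get into the environment (a secrets manager, a YAML file rendered by the deploy tool, a shell script) is item 4 and invisible to this code.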


My takeaway from the comments is that software development is not monolithic and what's appropriate for your application may be anything from a couple lines in a text file to literal python.


What about writing YAML, parsing it in your preferred programming language, and documenting what's expected? We're doing it with thousands of lines of YAML, and it's working great.


There is something wrong with the idea to use a markup language for configurations. Sure, the idea is extremely popular, and there are reasons for that, but a homoiconic scripting language (a Lisp or, to an extent, Javascript/JSON) would serve the purpose much better.


YAML isn't a markup language and that's why the name is now "YAML ain't markup language".


We use json schema to counter a lot of the concerns listed around type checks, validation, back references to structures etc.

More importantly, this helps us have tight validation around the configuration ecosystem that defines experimentation and server-side overrides. See the config delta blog post: https://medium.com/crunchyroll/introducing-crunchyroll-confi...


I used to develop game engines when I was starting to learn programming. First C, then C++. I knew JSON and there's an excellent header-only library for C++. However, I quickly learned from the indie gamedev community that parsing y=x solves most of your problems, and it's quite trivial to do, especially if you're using Python. Certainly better than using a full-blown programming language for conf files.
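
For reference, a sketch of how little code the y=x approach needs in Python. Skipping blank lines and treating `#` as a comment marker are assumptions added here, and `parse_key_values` is a made-up name.

```python
from typing import Dict


def parse_key_values(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines; blank lines and '#' comments are skipped."""
    result: Dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"line {lineno}: expected key=value, got {raw!r}")
        result[key.strip()] = value.strip()
    return result
```

All values come back as strings; converting `width` to an int is left to the program, which is also where the knowledge of what `width` means already lives.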


I hope I'm not too late to the party, but I'm building a project doing just this. It's called Anyfig and allows you to create your configs during runtime in Python for pure-Python projects. Check it out :) https://github.com/OlofHarrysson/anyfig


This approach seems to have served emacs quite well; the config is just some elisp. It’s weird more programmers don’t copy this approach, given how many of them use emacs.

Of course, you start out just cargo culting emacs config, so many may not even realize emacs is configured via code. You can also get pretty far with setq functions, essentially just assigning values to variables.

But learning to grok and code elisp is when you start to really see emacs’ power.


Interesting. This is the opposite of modern design defunctionalization. I think the arguments made for that apply to why this is not a good idea.

The advantage of declarative configuration is that it provides a sync barrier to the human and a safe entry-point. I imagine in a pure language without global runtime state you could use this method but in the more mainstream languages it is likely to trip you up. I will refrain.


Ctrl+F search XML, no result.

I know the majority of developers hate XML config files, yet...

I try to make my apps depend on config files and manifests as little as possible, but DSL vs fully featured scripting languages has always been a big conundrum in software development. Is Python a solution? The problem is that at some point you might be tempted to add JSON/YAML config files to your Python config scripts...


about 5 minutes after you start using xml for configuration, you want to reference a bit of global information (environment variable, port, ....) and add variable substitution ${foo} as an extra feature.
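
That extra feature is roughly this much code, sketched with Python's standard `string.Template`, which happens to use the same `${foo}` syntax. The `expand` helper and the XML snippet are illustrative only.

```python
from string import Template
from typing import Mapping


def expand(raw: str, variables: Mapping[str, str]) -> str:
    """Replace $name / ${name} placeholders; unknown names raise KeyError."""
    return Template(raw).substitute(variables)


if __name__ == "__main__":
    import os

    # Substitute from the real environment; fails loudly if HOME is unset.
    print(expand('<cache dir="${HOME}/.cache"/>', os.environ))
```

The point stands either way: once substitution exists, the "static" format has grown its first language feature.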


There was a time when I thought you want a declarative language for your build system. After having used Rust, which allows you to use Rust in your build scripts, I think that that is the way forward. You can, say, generate game assets in your build script, access the network, use third-party crates, etc.

Declarative languages are a work around for bad APIs.


People often forget that formats such as JSON and XML are not meant to be hand written. They're serialization formats. They're meant to be parsed, while still being human readable. It's very nice to have them human readable to make it easier to debug and make quick changes. But you should probably use a tool to edit them.


Saltstack is an interesting compromise... states can be described in a serialization format or in Python. I used Saltstack a lot at my last company and really liked it because for most cases, you could just use YAML... but when that didn't work, you could bring in a "real" programming language (python).


Gradle should be cited: it allows you to configure build/automation rules with either Groovy or Kotlin script.


author here! Thanks, I've actually had it in my prompts but forgot to add. Will amend!


I occasionally do this for python-only projects or for Makefiles, but (especially in the latter case) it's quite fragile.

I usually use libconfig for configuration, which is better than json and yaml and has implementations in many languages (although I'm not aware of any javascript implementation).


setuptools doesn't deserve to be mentioned in the same capacity as bazel/nix/dhall.

Having the entire Python language (specifically the ability to inspect the local system) available BEFORE you declare your package (as a side effect!) means you cannot reliably learn anything about a package without having an entire operating system with Python and all package dependencies pre-installed. It can lead to scenarios where you cannot automatically query a package's dependencies without having those dependencies already pre-installed. Or where installing packages in a different order yields totally different results.

On the data <-> code spectrum, configuration should be closer to data than to code, perhaps a pure function that takes a high level config to a lower level config.
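
One way to picture the "pure function from high-level to low-level config" idea, as a sketch. The service/replica/port vocabulary is invented for the example; what matters is that the function reads nothing but its arguments, so the same input always yields the same output and nothing about the local system leaks in.

```python
from typing import Dict, List


def expand_service(name: str, replicas: int, base_port: int) -> List[Dict[str, object]]:
    """Pure function: no I/O, no globals, no inspection of the host machine."""
    return [
        {"instance": f"{name}-{i}", "port": base_port + i}
        for i in range(replicas)
    ]


if __name__ == "__main__":
    for entry in expand_service("api", replicas=3, base_port=8000):
        print(entry)
```

Unlike a setup script that probes the machine it runs on, the output here can be computed anywhere, including by a tool that only wants to query it.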


I hate when config is done using a real programming language. In the JS world, Grunt sucked, Gulp sucked, Webpack sucks, Parcel came with zero-config (ie. 3 lines of JSON max) and it rocks. Maybe it's me, but I love it when my conf is static and just easy to understand.


I really think configuration files should have a schema of some sort, something that an editor can read to know the complete set of valid options and highlight invalid ones. Something akin to an XSD for XML (but not XML, which sucks to edit with command-line tools).
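
A toy version of that idea using only the standard library, to show the kind of feedback a schema makes possible. `SCHEMA` and `validate` are hypothetical names, and a real setup would more likely use an established schema format; this only sketches the "complete set of valid options" check.

```python
from typing import Dict, List

# Hypothetical schema: option name -> required Python type.
SCHEMA: Dict[str, type] = {"host": str, "port": int, "debug": bool}


def validate(config: Dict[str, object], schema: Dict[str, type] = SCHEMA) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = [f"unknown option: {key}" for key in config if key not in schema]
    for key, expected in schema.items():
        if key not in config:
            problems.append(f"missing option: {key}")
        elif type(config[key]) is not expected:  # exact match, so True is not an int
            actual = type(config[key]).__name__
            problems.append(f"{key}: expected {expected.__name__}, got {actual}")
    return problems
```

The "unknown option" branch is what catches a misspelled key, the kind of mistake a schema-less format silently accepts.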



Config is code.

Replace "configuration" with "scripting layer", and you'll have a much better perspective on "config" "vs" "code". Especially if your "code" is also a scripting language or open source.


> exec(Path('config.py').read_text(), config)

I'm guessing that this is in place of just importing the config module, because of this line:

> You can even import the very package you're configuring.

Which sounds like a recipe for circular import disaster.
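
For anyone who hasn't seen the pattern, a minimal sketch of the exec-based loading quoted above, assuming a trusted `config.py`. `load_config` is a hypothetical helper name, not something from the article.

```python
from pathlib import Path
from typing import Dict


def load_config(path: Path) -> Dict[str, object]:
    """Execute a Python config file and return the names it defined."""
    namespace: Dict[str, object] = {}
    exec(path.read_text(), namespace)  # runs arbitrary code: trusted files only
    # exec() injects __builtins__ into the namespace; drop it and other dunders.
    return {k: v for k, v in namespace.items() if not k.startswith("__")}
```

Since the file is executed rather than imported, it never becomes a module that the package imports by name. The config importing the package is therefore a one-way dependency, although whether that fully avoids the circularity depends on when the package calls the loader.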


> This is considered as a positive by many, but I would argue that when you can't define temporary variables, helper functions, substitute strings or concatenate lists, it's a bit fucked up.

What's "a bit fucked up" is expecting the configuration file to be able to support this when this can and should be done by the application itself. If you feel the need to do these things in the configuration file itself, then the application is providing insufficient abstractions.

Like sure, there's definitely use in an application dynamically loading its own potentially-user-supplied code to modify its behavior, but most people don't call that code "configuration files"; we call that code "extensions" or "modules" or "plugins" or "scripts" or somesuch.


In terms of how this should be done, Clojure and edn shine here and prove the point rather strongly that solutions like YAML are unnecessary at best and deeply limiting at worst.

YAML is the poster boy for easy but not simple.


The neat thing about config files is that they’re easily serializable and transferrable over the network, the file system etc..

A programming language can’t do that (or if it could via eval, it would be a bad idea)


At one of my old jobs, we used XML for config files, but users could write C# in the XML which would get dynamically parsed and run at runtime. Needless to say it was an absolute nightmare.


My 2¢: 90% of the time config files are simple enough for something like json (+ comments)

The other 10% of the time, where you need some level of logic to modify some setting value, there's no substitute for something that people are already familiar with. Just let people have their 3-line JS function instead of 2-3 other dependencies that add more convoluted logic to the build system.

Trying to create a config specific language with more advanced features/logic is an absolute mess where you end up learning a second language, people are unhappy with how certain features aren't implemented, and others are unhappy with how too many features exist, and it's just all round chaos.

tl;dr There should be no inbetween: Either you have a simple key/value config file that someone can learn in 5 minutes, or you let people use a language they're already familiar with.


At work, we generally do configs via Python that builds to json. It's a good best of both worlds imo, as it avoids a bunch of the downsides in the article.
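
A sketch of that workflow, with invented region/shard names: ordinary Python produces the repetitive parts, and the program only ever sees plain JSON.

```python
import json
from typing import Dict, List


def build_config(regions: List[str], shards_per_region: int) -> Dict[str, object]:
    """Generate a repetitive config structure with ordinary Python."""
    return {
        "shards": [
            {"name": f"{region}-{n:02d}", "region": region}
            for region in regions
            for n in range(shards_per_region)
        ]
    }


def render(config: Dict[str, object]) -> str:
    """Serialize deterministically so the generated file diffs cleanly."""
    return json.dumps(config, indent=2, sort_keys=True) + "\n"


if __name__ == "__main__":
    print(render(build_config(["eu", "us"], shards_per_region=2)), end="")
```

The generated JSON can be checked in or diffed in review, so readers who never run the generator still see exactly what changed.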


A large number of configs are created/used by people that don't know any programming languages, it seems rough to exclude all those people.


It seems worse to me to treat them like they can't learn a programming language.


Not forcing them to is different from assuming they can't.


The trick is to get them programming without them even realizing they're doing it.

There's examples of this working effectively: millions of people can use excel to perform computations on groups of cells - the formulae they're entering include basic programming constructs like assignment, selection, loops, etc - yet it's not presented as a "programming system", which otherwise seems to put people off.

I recall reading an anecdote about the TECO editor (or some other Emacs predecessor), where the editor was being used by secretaries and other non-programmers, who had no problems configuring it with the documentation they had available. They were never told that they were actually programming when they were doing so.

Perhaps it's a bit much to give the user a full-blown programming language for their configuration and expect them to have no problems, but it seems like limited programming concepts can be learned by just about anyone if presented in the right way.


I agree. I have seen incredible things done in Excel in the most elegant ways, by people with zero programming, math or computer backgrounds.

And as you said, presentation is key. Presented as "use these tools to set this up" things could go well. Presented as "write this in python" it may not go at all.


Not forcing them to learn a programming language, but instead forcing them to learn a half-assed quirky ham-strung config language that's just as complex as a programming language's syntax, and as hard to learn as a programming language, but isn't useful for anything else, or widely supported by any other tools, or well documented, or well designed, certainly isn't doing them any favors, or respecting their time and effort.

Case in point, PHP's "Smarty" templating engine. Because lord forbid people learn to use PHP as a templating language (which it ALREADY IS), when instead they can learn to use an ad hoc, informally-specified, bug-ridden, slow implementation of half of PHP, which is implemented in PHP, and dynamically generates PHP code.

https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule

>Greenspun's tenth rule of programming is an aphorism in computer programming and especially programming language circles that states:

>Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.

Your templating engine sucks and everything you have ever written is spaghetti code (yes, you)

http://www.workingsoftware.com.au/page/Your_templating_engin...

>This is where folks will start to get technical about why their code isn't spaghetti code.

>Oh no! Not me because I use (Django|Rails|Haskell|Node.js|Symfony|CakePHP|Tornado|Plone) and the superior templating language (Django Template|ERB|Mustache|PHP|TAL|Smarty) doesn't allow business logic in templates and enforces a separation of concerns

>-You, the reader, December 2011

>The problem is that in every single one of these circumstances, the templating engine has 2 primary responsibilities:

>The fragmentification and inclusion of common interface assets (to avoid duplication and maintaining the same code in multiple files) - eg. headers, footers, sidebars and the like

>The output of dynamic content

>The output of dynamic content requires logic (if/else, loops being about the bare minimum).

>The creators of templating languages have, to varying degrees, availed themselves of a false dichotomy, namely that there exists "presentation" logic and "business" logic, the former of which belongs in templates and the latter of which has absolutely no place in templates and must be banished immediately.

>I say false dichotomy because this is an arbitrary separation - a choice made by the person who created the templating engine and is completely subjective.

Yet another reason smarty sucks

https://www.sitepoint.com/community/t/yet-another-reason-sma...

>Which brings me right back to my chief criticism of smarty which to this day it’s adherents can’t answer – what the Hell is the point? [...]

>I hated smarty from the day I tried it and I was always wondering why would anyone use it since you’re forced to learn some weird syntax that’s so similar to php - so why not use php from the start?

https://news.ycombinator.com/item?id=4605887

>PHP is itself a template engine. Bolting a template engine on top of that makes no sense whatsoever.

https://news.ycombinator.com/item?id=20736493

>+1 for Turing complete programming languages instead of half-assed config languages.

https://news.ycombinator.com/item?id=20735231

>I was suspicious of YAML from day one, when they announced "Yet Another Markup Language (YAML) 1.0", because it obviously WASN'T a markup language. Who did they think they were fooling?


> your program crashes because of something that would be trivial to catch with any simple type system

Then goes on to recommend python. What?


Sure, Python is dynamically typed, but it supports type hints and avoids implicit type conversions. I'm sure that quote is aimed squarely at YAML, it is an absolute nightmare by comparison.
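
Roughly what that looks like in practice, as a sketch. `ServerConfig` is an invented example class; the annotations are checked by an external tool such as mypy, not by Python itself at runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerConfig:
    host: str
    port: int
    tls: bool = True


# A static checker such as mypy would reject port="8080" here before the
# program runs; plain Python does not enforce the annotations at runtime.
config = ServerConfig(host="localhost", port=8080)


def next_port(cfg: ServerConfig) -> int:
    """Arithmetic on a value that is annotated as an int."""
    return cfg.port + 1
```

And if a string does slip through, `"8080" + 1` raises a `TypeError` instead of being quietly coerced, which is the "avoids implicit type conversions" half of the argument.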


I'm happy with json5 for now, it addresses my complaints with json and is supported by the main tools I use.


I'm gonna call this thermonuclear programming... Exploding your program with another program.


How about something like HCL?


Don't we have configs so we can change programs without programming?


> termination checking

> Anyone knows examples of conservative static analysis tools that check for termination in general purpose languages?

As this is provably impossible, this "solution" introduces exactly one of the problems the author was trying to avoid:

> can't be validated


(author here)

That's why 'conservative'. I.e. it's allowed to reject a valid, terminating program, but if it does pass the check, your program is guaranteed to terminate. This is something that's possible, the only question is the tradeoff between the subset of the syntax and how complicated is the static analysis.
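
To make 'conservative' concrete, here is a deliberately crude sketch using Python's `ast` module. The allow-list is my own assumption for illustration, not an existing tool: it accepts only straight-line assignments built from literals, names and `+`, so anything that passes has no loops or calls and must terminate, at the cost of rejecting plenty of harmless configs.

```python
import ast

# Node types allowed in a config: literals, names, assignment, and '+'.
# Anything else (loops, calls, comprehensions, imports, defs) is rejected.
ALLOWED = (
    ast.Module, ast.Assign, ast.Expr, ast.Name, ast.Load, ast.Store,
    ast.Constant, ast.List, ast.Tuple, ast.Dict, ast.BinOp, ast.Add,
)


def check_terminates(source: str) -> None:
    """Raise ValueError unless every AST node is on the allow-list."""
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ALLOWED):
            line = getattr(node, "lineno", "?")
            raise ValueError(f"line {line}: {type(node).__name__} is not allowed")


if __name__ == "__main__":
    check_terminates("base = '/srv'\npaths = [base + '/a', base + '/b']\n")
    print("accepted")
```

A config with `for i in range(3)` would terminate just fine and still gets rejected; that is the tradeoff. Note this only argues termination, not bounded memory or time.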


But if it's allowed to reject a valid terminating program it's allowed to reject arbitrary, otherwise valid, configuration programs. In other words, you can no longer trust the output of the validator - the weakened model significantly reduces the utility of the validator.


That's how most analysis tools work. Not necessarily 'dynamic' languages even, e.g.

- Clang (depending on warnings level) may reject a valid program -- doesn't make it less useful, you just suppress the check for the offending line and carry on

- Rust borrow checker may be seemingly picky and reject a perfectly valid program from your viewpoint. Does it make it less useful? I wouldn't say so.


Right, but if you've introduced the ability to ignore the validator then you've traded away the guarantee that your config program will be safe to execute.

My point is that the validator can't give you the safety property that is claimed as a defence against one of the inherent issues with this approach.


Ah, maybe I wasn't clear enough there -- the validator/analyser is supposed to be run by both parties: the party who writes the config and the party who loads it. So you can reject a malicious config before trying to execute it.

I mostly have a non-malicious user in mind though (i.e. end-user software, where the software and the config have same permissions)

If you do have such security concerns, you probably need a sandbox at some point. E.g. big source of my frustration are CI pipelines -- they run isolated anyway and execute arbitrary code. Having a YAML there does nothing for the security.


Even if I'm not concerned with a malicious actor, I ought to be concerned about silly future me that accidentally introduces an infinite loop into the config, which then makes it to production and is able to wreak havoc, because silly past me had to disable the validator for this config because of "that pesky validator bug that only shows up on Tuesdays"...


My gut is telling me NO NO NO NO NO NO! But like most things development I’m sure the answer is maybe. Some of the time. For a specific set of problems. In certain cases. With certain languages.


Arras uses a JavaScript file that generates JSON.


Agree. Config is hard-coding. So code it.


I came to a very similar conclusion with a big IF. If your project isn't concerned about security, then yes, Python is a way to go.


Many real programming languages are compiled. Does it mean I have to recompile the binary every time I change config?


Many such 'real' languages have support for some kind of interpreted mode.

For example if you set a path to Stack as a script shebang, you can write interpreted-mode shell scripts in Haskell.


What is the support for the "interpreted mode" in C, C++ or Rust?


The author mentions XMonad, which does precisely that. However, you can update XMonad without "restarting" the program. When you update the config, which is just a Haskell program, the config is recompiled and launched. The state of your current session is then passed over to the new config process, and the old config process is discarded.


dlopen? :)


Lua is great for this.


What about jsonc or JSON5?


Looks like the author does not know what it is :)


uhm... toml and yaml are fine.

You could also use JSON or environment vars...


Have fun writing a GUI to edit your config file...


If you think about it, configuration is a form of scripting.


I wouldn’t call Python a “real language” to begin with.

Troll aside, nix is mentioned and IMHO is the perfect tool for the task, but the author throws it away in few lines because it’s an “overkill”.


Turing completeness in config files is a proven bad idea.

If it is truly needed, use/write a config generator instead. We have so many tools in the build chain now that config generation can be easily automated into the deployment process.

See m4, BPF, PDF, etc...


There are plenty of examples where eschewing Turing-completeness has resulted in safer systems that are easier to reason about.

I haven't experienced cases where I've needed Turing-complete configuration, and regard Turing-complete configuration as a code smell. However, people at least feel they sometimes need Turing-completeness. When they do, I think we can all agree that ideally they'd not need to add another language to their project in order to do so.

I think the ideal case would be a declarative Turing-incomplete configuration language with an escape hatch to a Turing-complete superset, similar to Rust's unsafe blocks. Bonus points for forcing impurity to be similarly confined without scaring people with monads.

If done well, with both an AoT compiler generating highly optimized native code and an interpreter for loading configuration, one might even be able to bring the "configuration should be Turing-incomplete" and "configuration should be in the same language as the application" folks under the same big tent. In an ideal world, the AoT compiler could optionally partially evaluate your program with respect to a configuration file, basically conditional compilation on steroids, with guarantees that the conditional compilation hasn't altered program behavior.

Static analysis would be easier, both for developer tools and the compiler's optimizer. A quick code search would help you focus on the gnarly bits of code. A code reviewer would take pause before approving some code with an "unsafe impure loopy {}" block (assuming "loopy" is the keyword for the Turing-complete superset).



