> Having a programmable config is how you end up with horrible interdependencies...

pdonis · on April 5, 2020

> The point of having configuration as code is having all code related to config in a single file.

This makes no sense, because the program has to understand the config in order to use it. That means the code that defines what should be in the config and how it's structured has to be in the program, not the config.

danShumway · on April 6, 2020

No -- you're confusing generation and parsing. Your program is still (often, but not necessarily) going to consume a static data structure. Properly handled, using code to generate a config at runtime can reduce the number of `if` statements and weird logic you have inside of your main program.

Purely as an example, let's say I'm writing a program that wants to consume a list of files in a JSON config.

  {
    "files": ["/path/a", "/path/b"]
  }

Now, let's say I want to pass in a wildcard pattern, or I want to exclude a specific directory, or I want to dynamically load the file paths based on a REST request, or some kind of other complicated crap.

If I want to have a static JSON file as the config, I have a couple of options:

First, I can put logic inside of my program to handle wildcards, and to distinguish between filepaths and URLs, and to have exclude paths. This is not ideal because it makes my program much more complicated, and it's not even complication for a particularly useful reason. 99% of the time I'm not going to need those options, so I'm writing all of this code to try and anticipate weird inputs that most people don't need. It also forces me to make opinionated decisions about stupid stuff like, "what should the glob format be? What Regex variant are we using?" None of these are questions I care about or want to answer.

The second option, which would be a lot better, is I could have a 3rd-party program generate a config on the fly, write it to disk, and then launch my actual program when it's done. This means I don't need to have a bunch of extra logic in my main program, and everything is a lot less opinionated.

BUT, it also means that the logic to launch my program is split up across multiple files, and now I'm adding some weird dependencies to my runtime. It's a hacky solution that's layering a bunch of complexity on top of what should be a simple, "launch the program."

----

So, what's a third option? I write my config file with code, which my program will execute, and which will return the original static config that's nice to consume.

  module.exports = {
    files: [
      //some big list of files, maybe populated with a glob.
      //exclude any filepath that contains the letter 'c'
    ].filter((file) => !file.includes('c'))
  };

I still need the code to spit out an array of files at the end, but I can do that after making some network requests, or running custom filter functions on that array, or doing really whatever the heck I want.
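
To make that concrete, here is a minimal sketch of what such a config file could look like when the list really is built at load time. The names `listFiles`, `buildConfig` and the `APP_INPUT_DIR` environment variable are made up for illustration, and the filter checks the file name rather than the whole path so that the containing directory does not affect the result.

```javascript
// config.js -- hypothetical dynamic config; all names here are illustrative.
const fs = require('fs');
const path = require('path');

// List regular files directly inside `dir` (non-recursive), as full paths.
function listFiles(dir) {
  return fs.readdirSync(dir, { withFileTypes: true })
    .filter((entry) => entry.isFile())
    .map((entry) => path.join(dir, entry.name));
}

// Build the plain object the main program consumes.
function buildConfig(dir) {
  return {
    files: listFiles(dir)
      // exclude any file whose name contains the letter 'c'
      .filter((file) => !path.basename(file).includes('c'))
      .sort(),
  };
}

// APP_INPUT_DIR is an assumed environment variable; fall back to the current directory.
module.exports = buildConfig(process.env.APP_INPUT_DIR || process.cwd());
```

The main program only ever sees `{ files: [...] }`; how the list was produced stays in the config file.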

In many cases this does not introduce any new dependencies to my runtime. It's a lot less error prone -- I can launch multiple instances of my app at the same time without worrying about files overwriting each other. It's also a lot cleaner and easier to refactor, because all of my filter functions and Regexes and generation logic live in one file.

And most importantly, for users who aren't taking advantage of the dynamic nature of the config, I'm still ultimately shipping a smaller, simpler program with a less complicated API. I don't need to ship a bunch of error-prone logic dedicated to reconciling 20 different flags or options from random pull requests.

----

Now, taking this a step further, you don't necessarily need to return a static data object. If I'm building a static site generator, maybe instead of embedding a bunch of logic in my system to load/parse Pug templates, instead I just consume a function from the config that takes a string and returns a string.

And the reason that might be attractive (beyond what I list above) is that the programmatic APIs for a library like Pug are already just as short and easy to use as any set of YAML options you can come up with. So instead of forcing your users to learn your new APIs to slot in a custom template engine, they just use the APIs they already know and enjoy, and your documentation is shorter and easier to read, and everyone wins.

And again, importantly, getting rid of that extra logic has reduced the amount of code in your main project and made stuff like validation and loading easier to reason about and easier to debug.

The way you want to think about a config that returns a function is the same way you already think about functions-as-parameters in functional languages like Lisp or Javascript. The trick is realizing that there is no real difference between a config file on a filesystem and an API for a library you're importing at runtime.
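
As a rough sketch of the function-returning case: the config below exposes a single `render` option that takes a string and returns a string, and the main program checks only that contract. `userConfig`, `buildPage` and the `{{ title }}` placeholder are hypothetical; a real config would more likely call into a template library such as Pug here.

```javascript
// Hypothetical user config: the whole "template API" is string in, string out.
const userConfig = {
  render: (source) => source.replace(/\{\{\s*title\s*\}\}/g, 'My Site'),
};

// Hypothetical main program: validate the one option, then call it.
function buildPage(config, source) {
  if (typeof config.render !== 'function') {
    throw new TypeError('config.render must be a function (string) => string');
  }
  const html = config.render(source);
  if (typeof html !== 'string') {
    throw new TypeError('config.render must return a string');
  }
  return html;
}
```

Calling `buildPage(userConfig, '<h1>{{ title }}</h1>')` hands the template to the user's function; the generator itself contains no template-loading logic.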

pdonis · on April 6, 2020

> you're confusing generation and parsing

I understand the distinction you are making, but I don't think it's what the article is illustrating in its examples.

> I write my config file with code, which my program will execute

If you, the developer, are writing this code (for example, to provide some pre-packaged standard configs for different use cases, and the user just selects which one they want to use), that's one thing.

If your users are writing this code, you have a whole new set of problems that you don't have if users can only provide text config files in a safe format.

I can see providing a separate config editor program users can use to generate configs for your main program; you can write the code for that program to ensure that the generated config files are safe. But such a program would be generating configs separate from the running of your main program.

If your configs are really complicated enough that they need to be generated at the time your main program runs, to me that means your code needs refactoring. Either that, or what you are calling a "config" is really a plugin/extension/scripting system, where your users can write their own code and your main program is more of an interpreter for it. That's a whole separate use case that I don't think is well described by the term "config". Certainly I would say that's the case if your "config" can return a function to your main program.

bryanrasmussen · on April 6, 2020

>Either that, or what you are calling a "config" is really a plugin/extension/scripting system

based on Greenspun's 10th rule (Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.), perhaps any sufficiently complicated config system will contain a badly designed plugin/extension/scripting system inside of it.

danShumway · on April 6, 2020

> If your configs are really complicated enough that they need to be generated at the time your main program runs, to me that means your code needs refactoring.

I'm not sure that's true. In fact, I would go so far as to say for some programs, it's the opposite.

In the example I describe above (forgetting about lambda functions or anything like that) I'm shipping a program that consumes a small set of very rigid input options. I consume a single array of strings that need to resolve to file paths.

That's incredibly easy to validate: that's the type of limited input where I can be pretty certain that my program will not have any bugs or vulnerabilities around the input. It's also an incredibly simple API to teach to users, it's literally one option. And importantly, my program is still pretty powerful -- you can do whatever you need to do to get that file list without me needing to extend what input I accept.
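
A minimal sketch of how small that validation can be, assuming the config is an object with a single `files` option as in the example above. `validateConfig` is a hypothetical name, and checking that each path actually resolves on disk is left out here.

```javascript
// Hypothetical validator for the single option the main program accepts.
function validateConfig(config) {
  if (config === null || typeof config !== 'object') {
    throw new TypeError('config must be an object');
  }
  const { files } = config;
  if (!Array.isArray(files)) {
    throw new TypeError('config.files must be an array');
  }
  files.forEach((file, index) => {
    if (typeof file !== 'string' || file.length === 0) {
      throw new TypeError(`config.files[${index}] must be a non-empty string`);
    }
  });
  // Return a frozen copy so later code cannot mutate what was validated.
  return Object.freeze({ files: Object.freeze([...files]) });
}
```

However the user produced the list (glob, network request, hand-written array), this is the only check the main program has to run.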

I can't think of a good way to convert that to a pure JSON/YAML config without either making my software more error prone or making my software much less useful.

If returning lambda functions from a config file scares you, forcing the config to be a static object literal but allowing that object literal to be generated at runtime will still (often) result in a runtime that is less error prone, easier to validate, and easier to debug when something goes wrong. This is because at the end of the user's dynamic config, they'll still be returning a small, easy-to-understand object literal -- whatever logic was used to build it.

If instead you put that logic in your main program, and you accept globs for files, and you start doing complicated things like allowing arrays of multiple globs and file excludes -- well, suddenly you have a big chunk of parsing logic in your main program that end-users can't debug when something goes wrong. It becomes harder to validate whether or not you have some bug around the order that patterns are passed to you, or whether there's an error if you try to use a globbed filepath as an excluded filepath, or whatever.

In other words, it is usually better to use procedural logic to generate declarative logic than it is to use declarative logic to generate procedural logic. Of course, your program might be simple enough that you can get away with a config file that's turned into a bunch of `if` statements when it's parsed. But it probably would be simpler if you got rid of as much parsing code as possible and focused on more rigidly validating a smaller, stricter set of options.

> I can see providing a separate config editor program users can use to generate configs for your main program

If you ignore the stuff about lambda functions, that's exactly what we're doing. We're just piggybacking on a real language to do it instead of wasting time writing our own graphical/command-line utility.

----

Where returning functions is concerned:

> Either that, or what you are calling a "config" is really a plugin/extension/scripting system

I think this is a kind of blurry line. I'm not going to say there is no difference between a plugin system and a config, but I have run into programs that have both configs and plugin systems, and where the plugin system exists mostly to deal with the fact that the config API is not very useful.

As a user, given the choice between learning one API and two, I usually prefer to learn one API. A lot of static site generators come to mind; I don't want to learn two different APIs for a templating engine just so I can change how links render. I don't want to learn both your plugin API and your config API. Just give me one, single API for templates that can be fully described in a single paragraph of text.

I dunno. I've been doing functional programming long enough that the idea of lambdas just doesn't scare me. Functional paradigms are wildly useful when building library APIs that get imported; I don't see why we should throw them away just because our API lives in a dotfile.

pdonis · on April 6, 2020

> I consume a single array of strings that need to resolve to file paths.

And I'm finding it very hard to imagine why your users would need to write code to give your program those paths, instead of just giving you a text file with a list of file paths in it. Of course I don't know anything about your program or its use cases other than what you've said here. But I do know that if your users are writing code, what you're describing doesn't look to me like a config. See further comments below.

> that's exactly what we're doing

Making your users write actual code is not the same as providing them with a config editor.

> I think this is a kind of blurry line.

I think the line is pretty clear: are your users expected to know how to write code, or not?

If they're not, then why would they need the ability to write code to generate your config, since they can't do it anyway?

If they are, then as I said, you don't have a static program with a config, you have a scripting language and an interpreter. Your program is just a customized version of the standard interpreter for whatever programming language your users are expected to use.

I don't have any problem with this if it's appropriate for your users. I just don't think the word "config" is the right one to use to describe it.

> I've been doing functional programming long enough that the idea of lambdas just doesn't scare me.

Whether it scares you or not is irrelevant unless you are the only one writing configs for your program. If your users are doing it, the relevant question is what scares them, not you.

danShumway · on April 6, 2020

Sure -- if your users can't write code, then you obviously can't tell them to use code to generate a config.

But take a step back and look at a lot of the examples that people are bringing up here -- Docker, Babel, Webpack, Github Actions, code linters, testing libraries, Ansible. These are all environments where it's pretty reasonable to expect users to know how to write code.

If you're mostly getting hung up about me calling this a config, let me see if I can rephrase the original article's point in a way that is more palatable to you:

A lot of people are building programs explicitly for programmers that use config files, when those programs would be much better suited by instead dropping the configs entirely and using something more like scripted options.

These program authors struggle building a config schema that is powerful enough for programmers yet simple enough that programmers won't get irritated learning yet another multi-page API. What the authors have missed is that they shouldn't be trying to control their program with a config at all. Their entire product is geared at programmers, so they should just let programmers generate the options that they pass in.

There's very little reason for Babel to be setting options with a JSON config file instead of a JS script, because nobody uses Babel other than Javascript programmers. For these specific programs that are struggling to express themselves with YAML/JSON, a non-config scripted setup would be less error-prone and easier for their users to learn.

ZenPsycho · on April 6, 2020

Keep in mind that config files are not just about writing them. They are also about reading, and in my personal experience, when I encounter a heavily scripted config file in some javascript project, my inclination is to flatten it out to a plain JSON file just so I can see what it's actually doing.

This is the beginning of the logic that led to things like Automake, and completely impenetrable makefiles. The problem with programmable scripted config isn't the writing of it. It's the reading and maintenance. Programmed config is a layer of obfuscation that I, as a maintainer, need to spend time and mental effort decoding to get work done.

danShumway · on April 6, 2020

I'm sympathetic to this, and I suspect the original author is as well (see their comments on Emacs). I think you're making a very good point that people need to keep in mind when they write configs.

However, very specifically where Makefiles are concerned:

> This is the beginning of the logic that led to things like Automake, and completely impenetrable makefiles.

The problem with Make in specific isn't scripting the build, it's that Make is a completely separate domain-specific language that isn't particularly well structured or particularly elegant in the first place.

The comparison I would draw is with Grunt.

Grunt is also a programmable config that allows you to do powerful stuff. It's also IMO a giant disaster and most projects shouldn't use it. Grunt is using programming to control how a project builds, but it's also tying tons of layers of abstraction into this secondary API that's unreasonably difficult to debug, and that lends itself to really bad, unnecessarily verbose control flows.

I was teaching interns how our build process worked at a company, and I had an "aha" moment when I realized that in the time I could teach them Grunt's API, I could also teach them how to write a maybe 100 line pure-JS script that did literally everything Grunt was doing for us.

When I look at Makefiles and Gruntfiles and cringe, very often the thing that makes me cringe are these giant pillars of abstractions and control flows of "program X calls program Y and pipes its output to program Z, and good luck intercepting that specific output to see what's going wrong".

There's something to be said for getting rid of these giant build systems that are spanning multiple processes and languages in general, and instead doing something very direct in one language in one file. And I kind of think that's a concern that cuts across different config types. At my current job, we're in the process of dropping Team City, because we realized that for our needs, we can get by with a single 30-ish line bash script that gets copied to each build machine and auto-run every 5 minutes. And Team City is using a lot of declarative options in a very structured web UI, but the Bash script is still a lot easier to debug and reason about than Team City's whole... thing.

Which is not to say you're wrong, I do think you're making a completely valid point. But I don't think the obfuscation you're talking about is inevitable, I think that's just a side effect of programmers thinking they're clever and trying to make generalized build systems for specific tasks.

ZenPsycho · on April 6, 2020

it comes down to what exactly the configuration is for. Is it a few settings (server urls, ports, table names, s3 buckets, directory paths, usernames and passwords), or are you actually scripting the tool? Build systems almost straddle the two worlds - they're kind of somewhere between config and scripting. At some point you gotta make the call: if the configuration file is hard to write, check first if it's a problem with the tool the configuration is for. Then second, check if what you're actually doing is programming, not configuring. This is repeating what others have written, but I solidly agree: at some threshold of complexity you've got to stop calling it "configuration", and start calling it a Greenspun's 10th:

http://wiki.c2.com/?GreenspunsTenthRuleOfProgramming

chrisweekly · on April 6, 2020

Specific to your comments on Grunt: mostly agreed w your analysis; IIRC, I felt it was part of why Gulp kind of ate Grunt's lunch when it came on the scene. But that was a long time ago.

tomlagier · on April 6, 2020

This is exactly why Gulp was such a refreshing change of pace from Grunt. Gulp was "just JS" APIs, and made it _so much easier_ to understand what the heck was going on during build.

pdonis · on April 6, 2020

> These are all environments where it's pretty reasonable to expect users to know how to write code.

For those kinds of environments, yes, I agree that it makes a lot more sense to just have users write code in the same language that the main program is in, than to try to hand-roll a config file format and structure.

I didn't get the sense from the article that its scope was limited to these kinds of environments, but I may have missed something.

ZenPsycho · on April 6, 2020

move over templated yaml files, get ready for templated python files. I heard you like config files, so I set up a config file for your config file so you can config while you config.

spc476 · on April 6, 2020

You might think you are joking, but that's exactly what happened with sendmail. Its configuration file is so complex (and I think Turing complete) that people came up with a completely different configuration file to generate the configuration file.

Thank god I no longer have to deal with sendmail.

ZenPsycho · on April 6, 2020

I was making a joke that you can either laugh at or cry at.

danShumway · on April 6, 2020

Real talk though, at the point where people are building templating engines for YAML files, doesn't it become kind of obvious that YAML isn't powerful enough or expressive enough for at least some of the projects that are using it?

We've got a project in Python, but Python is too hard to read, so instead we have YAML. But YAML is too hard to write, so now we have YTT.

We've now taken a program that was written in one language, and magically transformed it into 3 languages instead :)

pdonis · on April 6, 2020

> We've got a project in Python, but Python is too hard to read

You said the kind of scripting system you were describing was for cases where your users can write code.

If your users can write code, I don't see why Python would be too hard to read.

If your users can't write code, then, as you agreed, you shouldn't expect them to, which means the kind of system you were describing won't work. And that means you should rewrite your code so it doesn't need such a complicated configuration that it seems like writing code is necessary to express it.

danShumway · on April 6, 2020

Yeah, I was trying to joke -- if your users are OK with Python, then just let them use Python.

What I was getting at was that this is a scenario where concerns over Python users needing to actually use Python have morphed into those same users now needing to use 3 languages.

ZenPsycho · on April 6, 2020

If you're templating YAML, it just means that you don't know that you can generate YAML. Dev ops is a very different world. Yes they can script, but that doesn't mean they will modify any part of your program, even a python config file directly. They will template it.

If you find yourself thinking that your config language isn't powerful enough, the solution is to either generate it, or to fix the program to not require a complex config file. There is a long term, 10 year cost to making config turing complete that I don't believe is worth paying. It'll save you time up front, maybe, but I certainly am not going to be impressed with having to modify your program just to "configure" it.

Finally, given that's the environment we're dealing with, from now on the rational thing to do would be to design config file formats expecting them to eventually get templated. Because that's how they get automated.

danShumway · on April 6, 2020

> Finally, given that's the environment we're dealing with, from now on the rational thing to do would be to design config file formats expecting them to eventually get templated.

But... that's kind of what a dynamic config file is.

If you're writing a JS program that imports and runs a JS file to generate an object literal for its options, you have just made a config file format that you expect to be generated, or templated, or whatever you'd like to call it. The only difference is that you're using 1 language instead of 3, and your users don't need to have a separate step during build to generate the file.

People keep on circling around to, "but what if my user doesn't know the language or isn't comfortable with it?" And sure, if your user isn't comfortable programming, don't do this. But if your user does know how to program, don't force them to learn a new templating format, just let them write the code that they already know.

Or honestly, just play the joke out straight and build a template engine for your source language. I get that you're making a joke above, but a templating engine that spits out valid Python/JS code from valid JSON/YAML is trivial to write.

  var template_source = require('fs').readFileSync(process.argv[2], 'utf8'); /* argv[2] is the first CLI argument */
  var output = JSON.stringify(JSON.parse(template_source), null, 2); /* check malformatted input and prettify */
  require('fs').writeFileSync('config.js', `module.exports=${output};`);

So if you go this route and a few of your users end up wanting to template JSON/YAML anyway, then fine, they still can.

ZenPsycho · on April 6, 2020

great now integrate that in all the automation tools that devops use, and train them how to use the integration instead of just templating the config file.

or put another way, stop trying to reinvent the wheel and expect that OTHER people will script the generation of the config file, in their own way using their own tools and languages, and just make that easier for them.

when devops are involved, the question isn't whether they're comfortable with whatever language, the question is how easy it will be for them to template the file, because they simply will not learn your special configuration language, whether they possibly could or not.

The point isn't about whether we should make the automation people learn a new templating language. the point is they simply will not. they will find where the relevant parts of the config file are, and they'll stick a ${my_variable} in there, process it using their own tool, whatever it is, regardless of how cleverly (or not) you designed the scripting language around the config. And now whether your config file is python yaml or json is just irrelevant. They will not use python to loop through a file list, even if they could, or know how. They will template that. all you've done by switching your config to python is create a breeding ground for script injection attacks.

So ultimately it's the wrong question. What I'm proposing is, design our config file expecting the ${substitution_variable} will get edited in at some point, and you guard against script injection assuming that is coming.

danShumway · on April 6, 2020

> great now integrate that in all the automation tools that devops use, and train them how to use the integration instead of just templating the config file.

I don't understand what you're getting at. This is just a 3 line executable you call when you're done with whatever templating process you want to use. Ship it with the program.

  node generate_config.js path/to/json/templated/with/literally/any/tool.json

But if you really think that a devops engineer will struggle calling that command, then fine. Put these 7 lines at the top of your program.

  var config = (()=>{
    try {
      return require('./config.js');
    } catch (err) {
      return JSON.parse(require('fs').readFileSync('./config.json', 'utf8'));
    }
  })();

Now your engineers can stick a `./config.json` in and template it with whatever automation tool they want in whatever integrated pipeline they want to work with. They never need to acknowledge that runtime option generation is possible. They never need to run even one extra shell command.

So I just don't see this as a real problem. If your config script is ultimately returning a static object literal then there is no special knowledge with this setup that anyone needs to learn. All we're doing is giving people the option to script a config on a lower level in a real programming language. We're not inventing a new scripting language or toolchain. And it's still trivial for engineers to ignore that if they want to compile a config with a 3rd-party templating tool.

> What I'm proposing is, design our config file expecting the ${substitution_variable} will get edited in at some point, and you guard against script injection assuming that is coming.

I would maintain that a config file isn't really where you should be handling security, that's something that should be ingrained into your program at a deeper, more fundamental layer.

But granted -- if for some reason your application can't be run in a secure unprivileged environment, and you need to accept untrusted config files within that environment, and you also can't run the config generation code in a secure unprivileged environment, then it would make sense to strictly limit a config file to a static format like JSON, where you know that no one else can inject actual code. But that's a very different concern than usability, and affects an even smaller subset of software than what we were talking about above. This is just not relevant for systems like Github Actions, Babel, Webpack, or Ansible, all of which are built on either the assumption that the config is trustworthy, or that the trustworthiness of the config doesn't matter because of sandboxing.

In most (not all, but most) cases, if you don't trust your user config, you're already at the point where you should be running your software in an unprivileged sandbox.

ZenPsycho · on April 7, 2020

Yes of course you're right about all that. My only point is, it's a fact that devops add templating tags to YAML files. This is not theoretical. The templating tags are likely not escaped properly, so, aside from the worst case scenario of script injection, the more innocent situation of causing parsing problems is possible and likely.

It's worth very seriously asking the question of WHY devops are doing this instead of just generating the yaml file with code, which from our perspective as developers would be easier and more sane, and allow us to use the full power of a programming language. I think it's a more important problem to solve than ergonomics of config for the original developers, because end of the day, who will need to interact with this configuration file more? Developers or the people responsible for deploying the application?

chrisweekly · on April 6, 2020

+1; for me, this comment sums up my feelings on the entire (worthwhile) thread.

saagarjha · on April 6, 2020

Here’s a real-world example: clangd, the language server, takes a file with the compiler flags it should use for completion. clangd works with C, C++, Objective-C, and Objective-C++; obviously, I would like to use different flags for each. Currently I get around this problem by running four “instances” of clangd, one for each language, each with its own configuration file. I wouldn’t need to do this if the configuration file was a script that could dynamically generate flags by looking at the source code and inferring what flags would be appropriate.
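
Purely as an illustration of the idea, and not something clangd supports today, a scripted config could pick flags per file. The sketch below infers the language from the file extension only, which is a simplification of "looking at the source code"; `flagsFor` and the extension table are hypothetical, while the `-x` language names are the ones clang accepts.

```javascript
const path = require('path');

// Assumed mapping from file extension to clang's `-x <language>` value.
const LANGUAGE_BY_EXTENSION = {
  '.c': 'c',
  '.cpp': 'c++',
  '.cc': 'c++',
  '.cxx': 'c++',
  '.m': 'objective-c',
  '.mm': 'objective-c++',
};

// Hypothetical config hook: return extra compiler flags for one source file.
function flagsFor(sourceFile) {
  const language = LANGUAGE_BY_EXTENSION[path.extname(sourceFile)];
  // Unknown extensions get no extra flags rather than a guess.
  return language ? ['-x', language] : [];
}
```

With a hook like this, one server instance could ask for `flagsFor('view.mm')` or `flagsFor('main.c')` instead of needing one static file per language.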

naniwaduni · on April 6, 2020

> If your users are writing this code, you have a whole new set of problems that you don't have if users can only provide text config files in a safe format.

If your users are writing this code, you have a whole set of problems that aren't being solved if users can only provide text config files in a safe format.

saagarjha · on April 6, 2020

> The second option, which would be a lot better, is I could have a 3rd-party program generate a config on the fly, write it to disk, and then launch my actual program when it's done.

The fraction of my dotfiles that is simply small scripts that “wrap” actual programs so I can do this is extremely high. In some cases I have resorted to code injection to accomplish this sort of dynamic configuration generation without putting a file on disk (yes, using fmemopen).

flohofwoe · on April 6, 2020

I have a nice counter example why "config files as source code" is a bad idea.

The emscripten SDK (compile C/C++ to WASM) writes a config file as python source file when it is installed. At first glance it looks like a fairly simple key/value config file, but it cannot be parsed by a 3rd-party-tool (for instance to extract the paths to the installed emscripten SDK) unless the file is evaluated with a full-blown python interpreter.

I don't understand how this can be considered a "good idea" under any circumstances ;)

nitely · on April 6, 2020

The SDK could have a command line tool (or option) to dump the config in JSON format, if that config is meant to be consumed by other tools. I can come up with examples where config files are a bad idea all day long. Bottom line is neither is a silver bullet, and there are always trade-offs to be made.
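
A sketch of what such a dump option might look like for a tool whose config is a script, written in JavaScript to match the rest of the thread rather than emscripten's Python. The `--dump-config` flag name and the `./config.js` path are assumptions for illustration.

```javascript
// Hypothetical helper behind a `--dump-config` style flag (the flag name is made up).
function dumpConfig(config) {
  // JSON.stringify drops function-valued properties, so only plain data is emitted.
  return JSON.stringify(config, null, 2);
}

if (process.argv.includes('--dump-config')) {
  // Evaluate the scripted config once, then print its static result for other tools.
  const config = require('./config.js');
  console.log(dumpConfig(config));
}
```

Third-party tools can then parse the printed JSON without needing an interpreter for the config language.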

hinkley · on April 6, 2020

I am very much of the 'just code it' crowd, with a few small allowances that basically add up to a subset of the 12 Factor App philosophy.

That said, the combination of SaaS and CI/CD makes that a bit of a challenge. Everyone wants to launch features darkly. You don't have to change your strategy to do that per se. You can move everything else into code or into a service discovery system, but the 'feature toggles' have to live somewhere, and they are essentially config.

If your CD system were ridiculously fast, you could just push a commit to turn something on and back off again. But I haven't seen many systems that are as fast as pushing a change into for instance Consul (even if you are using git2consul, that'd be faster than a typical CI/CD build)