
Configs suck? Try a real programming language - gyre007
https://beepb00p.xyz/configs-suck.html
======
yongjik
This just sounds like solving a problem at the wrong level and making it
worse. If your config is so complicated that it warrants a full-blown
language, the logic should go into the _main program_ that reads the config
and decides what to do, not left inside config.

Having a programmable config is how you end up with horrible interdependencies
where function f() does foo, and you can't figure out why it's not blowing up,
until you realize the only caller is dynamically configured in
staging_env_4.json which is emitted by config_generator.py which always
ensures the precondition for foo, but only if it's also generating
prod_env.json at the same time, but it's OK because we always do that anyway,
and you're not sure if you need another cup of coffee or whiskey.

~~~
amelius
I don't agree. Configuration files tell a program what to do. You want
expressive power there. Telling a program what to do merely through values
only makes things more indirect.

To give an example: you can get into the situation that some configuration
values are only valid when configuration value X is Y, and otherwise other
configuration values are valid. What better way to model this than through an
if-statement in a programming language? This makes it immediately apparent
which settings have effect.

~~~
Smaug123
I am involuntarily howling internally in anguish, and if the feeling could
speak, it would be screaming "parse, don't validate"
([https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...](https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/)).

If some config values are valid when X is true, and other config values are
valid when X is false, then use a different schema for the two cases. This is
what discriminated unions are for: for representing conditionals statically,
inspectably, serialisably!

If some config values are valid when X is 0.339, and some are valid when X is
0.340, and some are valid when X is 0.341, and so on, then… I don't know how
to help, and maybe I must just avert my eyes in shame as I implement the
dynamic logic. (But in that case it seems a bit odd to say you're "telling the
program what to do" with this configuration; I'd say you're bolting on a
little extra program at the start.)
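
A minimal Python sketch of the "different schema for the two cases" idea. The `kind` tag and the `LocalStorage`/`S3Storage` variants are made up for illustration; the point is only that each case gets its own type and the raw dict is parsed into exactly one of them. It checks which keys are present, not the types of their values.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class LocalStorage:
    """Settings that only make sense for the hypothetical "local" case."""
    path: str


@dataclass(frozen=True)
class S3Storage:
    """Settings that only make sense for the hypothetical "s3" case."""
    bucket: str
    region: str


Storage = Union[LocalStorage, S3Storage]
VARIANTS = {"local": LocalStorage, "s3": S3Storage}


def parse_storage(raw: dict) -> Storage:
    """Turn an untyped dict (e.g. from json.load) into exactly one variant."""
    fields = dict(raw)  # copy so the caller's dict is not mutated
    kind = fields.pop("kind", None)
    if kind not in VARIANTS:
        raise ValueError(f"unknown storage kind: {kind!r}")
    try:
        # Missing or unexpected keys fail here, not deep inside the program.
        return VARIANTS[kind](**fields)
    except TypeError as exc:
        raise ValueError(f"invalid {kind!r} storage config: {exc}") from None
```

After parsing, the rest of the program can branch on the variant type, and a config that mixes keys from both cases is rejected up front.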

~~~
saagarjha
Sometimes “valid” doesn’t mean “illegal to have in a configuration file”, it
just means that “in this specific case, the configuration should have a
certain value”. For example, on my Mac I should prefix my commands with “g” if
I want to access GNU tools, while on most Linux systems I don’t need to do
this. Trying to run gsed on Ubuntu would be “invalid” in this case but a
parser can’t help me here.

~~~
Smaug123
Yes it can: if your config file is more declarative than simply "lists of
command lines". The program can determine what is true of its environment, and
can construct command lines appropriately given the data about _intended
outcome_ that is stored in the configuration file.
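
A rough Python sketch of that split, assuming the program is yours to change. The `CONFIG` key and the `resolve_gnu_tool` helper are hypothetical; the "g" prefix on macOS is the convention described in the parent comment.

```python
import platform
import shutil

# Declarative config: it names the intended tool, not a command line.
CONFIG = {"stream_editor": "sed"}  # hypothetical key, for illustration only


def resolve_gnu_tool(name, system=None, which=shutil.which):
    """Pick the command expected to provide GNU `name` on this machine.

    `system` and `which` are injectable so the decision can be tested
    without actually running on macOS.
    """
    system = system or platform.system()
    if system == "Darwin":
        prefixed = "g" + name
        if which(prefixed):
            return prefixed
    # Elsewhere, or if no prefixed tool is installed, use the plain name.
    return name


if __name__ == "__main__":
    print(resolve_gnu_tool(CONFIG["stream_editor"]))
```

The config stays the same across machines; only the resolution step looks at the environment.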

~~~
saagarjha
I can’t change the program.

------
q3k
As the article mentions, there's 'real' programming languages made for
configuration, that solve at least some of the issues outlined, like Dhall,
Cue, Jsonnet. After using both approaches a number of times (a general purpose
language vs. one of the three), I'm imploring anyone trying to give this a
shot to _not_ use a general purpose language.

For instance: I'm mostly familiar with Jsonnet, which has guarantees that make
it much easier to use than a Real Language: no arbitrary file loads (paths
must be static), no side effects, no ambient state (like env vars or
Internet). Relative imports make it very easy to drop in the language
somewhere in a repo and not have to worry about venvs/PYTHONPATH/system
libraries... The interpreter being a single binary and a thin C library also
simplifies integration quite a bit, compared to interfacing with full-blown
interpreter/compiler installations.

The downside is, of course, that you must learn a new language. But we're all
pretty good at this (unless you're one of those people obsessed with using
JS/npm everywhere for some inexplicable reason). Also Jsonnet still has the
pedigree of being a superset of JSON, which is painful (ie. magical int/float
number type). But the alternatives (Dhall, Cue) fix that, and also provide a
vastly better type system.

I've been using Jsonnet in production for a couple of years (after leaving
Google where I Saw The Light with BCL), and couldn't be happier. The
infrastructure of the Warsaw Hackerspace (a production k8s cluster) is all
brought up using Nix and Jsonnet and is open for all to inspect [1].

And yes. Let's please stop using JSON and YAML for everything. Even Python is
better than yet another YAML templating tool.

[1] -
[https://gerrit.hackerspace.pl/plugins/gitiles/hscloud/+/refs...](https://gerrit.hackerspace.pl/plugins/gitiles/hscloud/+/refs/heads/master/cluster)

~~~
desert_boi
BCL is something that's new to me. Anyone know what it stands for?

~~~
q3k
Imagine Jsonnet, but internal to Google, old and quirky. It's not public, but
the name has been leaked for a while now.

Edit: here's a thesis that works with a related language, GCL, with code
samples:
[https://pure.tue.nl/ws/portalfiles/portal/46927079/638953-1....](https://pure.tue.nl/ws/portalfiles/portal/46927079/638953-1.pdf)

~~~
oddthink
"Old and quirky" is an understatement. It's a god-awful mess, with semantics
so murky that people end up just copying patterns and hoping that it'll do
what they want it to do. And it can't be killed, because every bit of those
implementation-defined semantics is used by someone.

~~~
q3k
up.up.up.up

------
jevgeni
Under no circumstance would I allow my projects to be configured by a full-
blown programming language. There is no way to ensure that there are no
regressions under different configurations. The amount of complexity
introduced just for configuration is insane.

If you feel the need to put code in your runtime config, that means your
design sucks.

~~~
q3k
But configuration is a complex task! At least from my experience. Limiting
yourself to solving it with mediocre tools (for instance languages that
disallow writing tests or even comments!) is a footgun, where you disregard
good engineering tools in favour of stringly typed, duplicated configuration.

~~~
jevgeni
We might have different target audiences for our work, but in my line of work
configs should not require tests, because their task is to adjust the
functioning of an application. If they all of a sudden require tests, that
means (at least to me) that it's not part of the configuration anymore, but a
part of the application proper and should be treated as such.

~~~
darkwater
And what about "configs" like Kubernetes deployments?

~~~
shawnz
You could use the Maven approach: for whatever the declarative configuration
isn't powerful enough to specify, you write a plugin. That plugin can then be
developed and managed just like any other software project, rather than having
to bring those tools into the configuration realm.

~~~
rfoo
Wait. In this case isn't the plugin itself "code in runtime config"?

edit: Now I understand that being a plugin it must have proper abstraction,
but ad-hoc logic do exist.

~~~
shawnz
I guess it depends on how you look at it, but if the plugin is a self-
contained module that can be managed independently from the app it's
configuring, I wouldn't really say it's code that's "in" the configuration.

> ... but ad-hoc logic do exist.

True. I don't disagree, no tool is right for every job.

------
klysm
Glad to see Dhall ([https://dhall-lang.org/](https://dhall-lang.org/))
mentioned here.

As we seem to be heading towards immutable-'everything' though, sometimes I
wonder how valuable dynamic (in the sense of picked up at start) configuration
is for backend services. It seems preferable to build everything, config
included, into one single artifact. Potentially this gives more optimization
opportunities as a lot of branches could be completely eliminated (feature flags,
etc.). No doubt the simplest way to build your config directly into your code
is to just.. have it be code.

A potentially super annoying trade-off is that it's difficult to test minor
configuration tweaks, and the annoyingness scales non-linearly with your build
time.

~~~
q3k
And even more annoying if you would rather not restart your services unless
absolutely necessary (ie. you're handling long-lived TCP connections that
cannot be handed off gracefully). Not to mention that some kinds of
configuration tweaks (e.g. ACL changes, quota changes, etc.) might happen at a
vastly different cadence than software rollouts.

~~~
jchw
> if you would rather not restart your services unless absolutely necessary
> (ie. you're handling long-lived TCP connections that cannot be handed off
> gracefully).

If you have no way to hand off your TCP connections gracefully, that is
probably not a great position to be in because computers are unreliable and
sometimes you need security updates. One approach I tried experimentally for a
TCP server (but unfortunately did not have the opportunity to try in
production) was to have a small server acting as the frontend that handled
sessions and called into the real implementation which was essentially
stateless. This small server could theoretically support self updating without
dropping connections, if need be. (The main reason I was doing this was so
that the rest of the stack could be dynamically-scaling based on Kubernetes,
while this part remained relatively static.) I think anytime you have forced
statefulness, it’s worth isolating as much as possible so it doesn’t constrain
the design of the whole system.

(Even better, of course, is to just have a protocol where clients can
gracefully handle draining and reconnect without side effects.)

> ACL changes, quota changes, etc

At a certain level I think these are runtime data and not configs.
Aforementioned service also dealt with ACLs and configs and we were serving it
through another service that persisted to a database layer. One cool thing
about doing it as a service is that you can manage and scale it like any other
service.

~~~
q3k
> At a certain level I think these are runtime data and not configs.

I think that's the gist of the issue - sometimes it's difficult to tell which
one is which :).

And I agree with the 'TCP connection in thin server' approach. Thankfully, I
only have to deal with two protocols where this is an issue: IRC and BGP.

~~~
jchw
> I think that's the gist of the issue - sometimes it's difficult to tell which
> one is which :).

Absolutely... in fact, there's definitely stuff that I've had start as
configuration and gradually turn into runtime data, and there are things I
would call configuration that need to be dynamic at runtime too... I'm never
100% satisfied with the split.

------
zevv
This is how Lua got started - its predecessors started out as data description
or configuration languages. The users felt it would be useful to have some
forms of flow control, which led to the birth of Lua.

The whole story can be found here:
[https://www.lua.org/history.html](https://www.lua.org/history.html)

~~~
Rochus
It's still very suitable for configurations. The statements can be used e.g.
to configure rules.

------
jillesvangurp
These days, using a statically typed language for configuration makes a lot of
sense. You get stronger typing and with type inference not necessarily a lot
more verbosity.

Kotlin and Kotlin script seem to work reasonably well for Gradle. Gradle
unfortunately has a bit of Groovy legacy, which makes it a bit convoluted for
some things. But Kotlin DSLs are quite nice for configuring things and it
removes a lot of ambiguity when your editor can tell you a property doesn't
exist or that a list instead of a string is expected, or help you autocomplete
things that are allowed.

Basically the ultimate in configuration languages is something that has a
minimal syntax but provides enough structure for tooling to help you validate
things, provide autocomplete and other help, etc. There are now several
languages emerging that have both static typing and some level of support for
internal DSLs. Ruby sort of pioneered a lot of configuration DSLs back in the
day but since it was dynamically typed, tool support for these DSLs was a lot
harder. With languages like Kotlin, Swift, C#, Rust, etc. you get enough
expressiveness that you can do similar internal DSLs but with the advantage of
more robust tooling support.

~~~
involans
Agreed, but the best combination is type-safe and non-turing complete, ideally
with restricted effects. Dhall
([https://github.com/dhall-lang/dhall-lang](https://github.com/dhall-lang/dhall-lang))
gets this so right that I'm increasingly shocked that people never seem to have
heard of it.

------
conroy
_Bazel uses a subset of Python for describing build rules_

Bazel's configuration language is called Starlark[0] (used to be Skylark).
It's not a strict subset of Python. It has implementations in Java[1] and
Go[2]. I haven't had a chance to use it yet, but it seems very useful as a
general-purpose scripting language, especially for embedding into tools.

[0]
[https://docs.bazel.build/versions/master/skylark/language.ht...](https://docs.bazel.build/versions/master/skylark/language.html)

[1]
[https://github.com/bazelbuild/bazel/tree/master/src/main/jav...](https://github.com/bazelbuild/bazel/tree/master/src/main/java/com/google/devtools/starlark)

[2] [https://github.com/google/starlark-go](https://github.com/google/starlark-go)

~~~
busterarm
Bazel is convenient but also a full-time pain in my ass.

~~~
q3k
As someone already put it, 'Bazel is the worst system except all the others'.
It has some warts (Python integration, non-hermetic-by-default rules, Starlark
structure javaesque complexity ...), but I still haven't found a good
replacement for my use: a monorepo build system that has to touch a ton of
different languages and can be run in CI without having to prebuild tons of
fragile builder images.

------
mjw1007
One variant of this is to use Python syntax but parse it with the `ast` module
rather than evaluating it.

This avoids the security problems and the risk that someone will start to
write programs inside the configuration file, but gives you the reasonably
nice and powerful syntax.

It turns out that teenagers in my country have often done some Python in
school, so it's becoming one of the friendlier file formats for non-
programmers.
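
A minimal sketch of this approach, assuming the config is restricted to simple `NAME = literal` lines. The `load_settings` name and the example settings are made up. `ast.parse` only builds a syntax tree, and `ast.literal_eval` accepts literals (numbers, strings, lists, dicts, booleans and so on) but rejects calls and name lookups.

```python
import ast


def load_settings(source: str) -> dict:
    """Read `NAME = literal` lines from Python-syntax text without running it."""
    settings = {}
    for node in ast.parse(source).body:
        is_simple_assignment = (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
        )
        if not is_simple_assignment:
            raise ValueError(f"line {node.lineno}: only NAME = value is allowed")
        # literal_eval raises ValueError for calls, names, attribute access, etc.
        settings[node.targets[0].id] = ast.literal_eval(node.value)
    return settings


EXAMPLE = '''
# comments are allowed
port = 8080
hosts = ["a.example", "b.example"]
debug = False
'''

if __name__ == "__main__":
    print(load_settings(EXAMPLE))
```

Imports, function calls and anything else that is not a plain assignment are rejected with a `ValueError` rather than executed.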

~~~
karlicoss
(author here). The ast suggestion sounds cool, never came to my mind. Thanks,
I'll try it sometime!

------
stared
I cannot disagree more. Configs are wonderful, as they follow the "rule of
least power" by putting settings in a place without logic operations,
variables, etc.

Yes, they are not general. Still, having one dedicated place for all constant
switches is much better than constants being out in a dozen (or hundreds) of
places all over code.

~~~
q3k
But that's the point of this. You can have a single high-level config that
emits the same high-level settings onto multiple config 'facades'. For
instance, a single jsonnet/python program will let you tweak a list of
domains, and that will in turn create a facade (or generate a file) for nginx
(to configure vhosts) and for a letsencrypt tool (to acquire TLS certs). Less
repeating across files, less chance of making a mistake, and since this is a
programming language, you can have a single codebase that performs facade
generation, without having to go through a CM system.
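
A toy Python version of that domain-list example. The nginx block is deliberately minimal and the certificate request format is hypothetical, not the input of any real tool. The last function shows the kind of assertion over generated output that can run in CI.

```python
from typing import Dict, List

DOMAINS = ["example.org", "www.example.org"]  # the single place to edit


def nginx_vhosts(domains: List[str]) -> str:
    """Render one minimal (illustrative, not production-ready) server block per domain."""
    blocks = [
        "server {\n"
        "    listen 80;\n"
        f"    server_name {domain};\n"
        "}\n"
        for domain in domains
    ]
    return "\n".join(blocks)


def cert_requests(domains: List[str], email: str) -> List[Dict[str, str]]:
    """Render input for a hypothetical certificate tool: one request per domain."""
    return [{"domain": domain, "email": email} for domain in domains]


def check_facades(domains: List[str]) -> None:
    """Assertions over the generated output, runnable before any deploy."""
    vhosts = nginx_vhosts(domains)
    requested = [r["domain"] for r in cert_requests(domains, "ops@example.org")]
    assert requested == domains
    for domain in domains:
        assert f"server_name {domain};" in vhosts


if __name__ == "__main__":
    check_facades(DOMAINS)
    print(nginx_vhosts(DOMAINS))
```

Adding a domain means editing one list; both outputs follow, and the check fails if they ever drift apart.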

~~~
throw_m239339
But then your config files in Python also need tests. Are they still a config
file then or config scripts? How do you limit what they can and can't do? Easy
to do with JSON or XML, harder with Python.

~~~
q3k
This is why I'm not a huge fan of using Python for this (see my other comment)
- and would much rather recommend Dhall/Cue/Jsonnet. And you can (and
definitely should) write tests for these, or even just assertions for
generated facades. It's much better than the alternative, which is just
testing in production/qa/canary.

~~~
jevgeni
I cannot imagine how handing over a code base with configs that themselves
require testing can be smooth.

~~~
q3k
I don't get it. If I get the choice between taking over a system where the
configuration is self-tested, vs one that is not, I would much rather pick the
first option?

You don't _have_ to test anything, just like in any software development. But
now you can, and you can do it in a side-effect-free environment, on CI push,
instead of realizing you made a typo when the CM kicks in, or worse, when your
system starts crashlooping on startup because of a configuration issue.

~~~
jevgeni
If somebody hands over an application to me where configs have unit tests,
this means to me that the application can fail catastrophically under certain
(unwanted) configurations. And seeing how I might be the owner of said
applications, chances are high that at some point I would need to change the
configs.

I, personally, would rather prefer straightforward static configuration and
very well written documentation on why certain configurations make more sense
than others. That way, I know how and when to change something in the configs.

I absolutely do not share the view that "tests are documentation", because
they do not communicate the most important thing in software engineering - the
intent and reasoning behind a certain design.

~~~
jt2190
> If somebody hands over an application to me where configs have unit tests,
> this means to me that the application can fail catastrophically under
> certain (unwanted) configurations

This would be a sign of a program that allowed input (configuration, in this
case) that was unsafe. That is a completely different matter from whether a
programming language is suitable for defining configuration. I suspect that
many an application has been written that assumes that because it's consuming
"static" configuration that is a simple text file, or a structured data file,
that the input is completely safe. (SQL injection can come from any
direction.)

------
thesuperbigfrog
Not a popular answer, but XML is mature, widely supported by many programming
languages, and addresses all of the frustration criteria mentioned in the
article:

- doesn't have comments

XML uses the same <!-- stuff here --> SGML-style comments that HTML does

- bits of configs can't be reused

XML can include other XML via XInclude:
[https://www.w3.org/TR/xinclude/](https://www.w3.org/TR/xinclude/)

- can't contain any logic

You can technically put logic in XML using XSLT
([https://www.w3.org/TR/2017/REC-xslt-30-20170608/](https://www.w3.org/TR/2017/REC-xslt-30-20170608/)),
but I would ask if one SHOULD put logic into your configuration. I personally would
recommend against it if at all possible. Keep configuration as declarative as
possible and put the logic in your program. It's better to have a
configuration logic module or use dependency injection than put logic in your
config.

- Programming language constructs are reinvented from scratch

Again, XSLT could be used, but should your configuration really need to be
Turing-complete? Granted, XML is complicated and not without criticisms, but
it is mature, standardized, and well-supported. Python has great XML support:
[https://docs.python.org/3/library/xml.html](https://docs.python.org/3/library/xml.html)

- can't be validated

XML can be validated against XML Schema
([https://www.w3.org/XML/Schema](https://www.w3.org/XML/Schema)), Relax NG
([https://relaxng.org/](https://relaxng.org/)), and possibly other standards.

- implicit conversions and portability issues

XML can be used portably between different platforms and programming languages
depending on how the data is represented. XML Schema defines a way to describe
data types that is flexible and interoperable:
[https://www.w3.org/TR/xmlschema-2/](https://www.w3.org/TR/xmlschema-2/)

Again, I know it's not the popular answer, but I was surprised that it was not
even mentioned in the article.

~~~
mdaniel
Merely for your consideration, I suspect a lot of people have opposition to
XML because it is noisy when edited by a human:

    
    
        <httpd>
          <baseDir>/opt/server</baseDir>
        </httpd>
    

Of that snippet, "</baseDir>" and "</httpd>" are characters of my life that I
will never get back, since they're just parser niceties and not _value_

Moving up the enlightenment chain:

    
    
        {"httpd": {"baseDir": "/opt/server"}}
    

or its brace-and-quoteless friend:

    
    
        httpd:
           baseDir: /opt/server
    

place a lot more emphasis on the payload and less on the packaging

~~~
benibela
But you can do

    
    
         <httpd baseDir="/opt/server"/>
    

and it is shorter than the JSON.

~~~
reirob
In my opinion one of the weaknesses of XML is that it gives so many ways to
express the same data, in attributes, as nested elements, with CDATA, and
probably different other ways. And often it is a matter of taste. So in the
end you come up with so many approaches to express similar structures.
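
To make that concrete, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. The three spellings reuse the `httpd`/`baseDir` example from upthread; the `base_dir` helper is made up, and it shows the extra branching a reader needs if it wants to accept more than one of them.

```python
import xml.etree.ElementTree as ET

# Three spellings of the same setting: attribute, nested element, CDATA.
VARIANTS = [
    '<httpd baseDir="/opt/server"/>',
    "<httpd><baseDir>/opt/server</baseDir></httpd>",
    "<httpd><baseDir><![CDATA[/opt/server]]></baseDir></httpd>",
]


def base_dir(xml_text: str) -> str:
    """Accept the value as an attribute or as a child element."""
    root = ET.fromstring(xml_text)
    if "baseDir" in root.attrib:
        return root.attrib["baseDir"]
    value = root.findtext("baseDir")  # CDATA arrives as ordinary text
    if value is None:
        raise ValueError("baseDir is missing")
    return value.strip()


if __name__ == "__main__":
    for variant in VARIANTS:
        print(base_dir(variant))
```

A schema can narrow this down to one accepted spelling, at which point the branching disappears.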

~~~
thesuperbigfrog
> So in the end you come up with so many approaches to express similar
> structures.

Expressiveness and flexibility are good so that you can define configuration
to meet the needs of the system or application you are building.

XML Schema, Relax NG, etc. can be used to specify how the XML configuration
should be structured, limits on data types, required versus optional
configuration items.

As I said before, XML gets a lot of flak for being verbose, ugly, and
complicated, but it is mature, widely-supported, and might be worth
considering depending on your needs.

~~~
benibela
Some parts are extremely ugly, and mostly pointless: namespaces, doctype,
processing instructions.

This is well-formed XML and a good way to confuse people:

    
    
         <?xml version="1.0"?><!DOCTYPE abc[<?abc >]]<abc><abc/>]>>?>]><?x?><a/><?x <(x)>>?>

------
atrettel
I think that the security issue needs more attention than the author gives it.
Config files are often shared and not rigorously checked, especially if they
are very long. Arbitrary code execution is a real security risk that should
not be minimized.

For example, years ago I was a frequent user of the chemical kinetics code
called Cantera. It calculates the dynamics of combustion reactions, with the
big application being for jet engines. One of the files that it needs to load
is a mechanism file (called the CTI file). This contains all of the
information about the gas properties and chemical reactions. Different
situations might require different mechanisms (propane mechanism versus JP8
mechanism). Anyhow, Cantera's mechanism file format is literally a Python
script. See the link below for the most commonly used mechanism file that
comes with Cantera:

[https://github.com/Cantera/cantera/blob/master/data/inputs/g...](https://github.com/Cantera/cantera/blob/master/data/inputs/gri30.cti)

This file is 2000 lines long, and many mechanism files are even longer. I told
my colleagues that it is possible to execute arbitrary Python code using the
files but I was unable to convince them that it was a security risk. I think
that these kinds of config files are a big security risk for engineering firms,
because they make it much easier to conduct industrial espionage. All that a
bad actor has to do is put a few lines in one and get an engineer to run it
once. Then they could steal designs, analyses, business plans, financial data,
and many other things. It's a serious threat that should not be minimized.
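
One possible mitigation for long Python-syntax input files, sketched with the standard `ast` module: parse the file without executing it and flag every statement that is not a plain call to an expected name. The allowlist below is illustrative and not taken from Cantera's documentation. This is a heuristic audit to direct a human reviewer, not a sandbox.

```python
import ast

# Illustrative allowlist; NOT taken from Cantera's documentation.
ALLOWED_CALLS = {"units", "ideal_gas", "species", "reaction"}


def suspicious_lines(source, allowed=ALLOWED_CALLS):
    """Return (line_number, reason) pairs worth a human look. Heuristic only."""
    findings = []
    for stmt in ast.parse(source).body:  # parsed, never executed
        if not (isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call)):
            # Imports, assignments, loops, function definitions, ...
            findings.append((stmt.lineno, type(stmt).__name__))
            continue
        for node in ast.walk(stmt):
            if isinstance(node, ast.Call) and not (
                isinstance(node.func, ast.Name) and node.func.id in allowed
            ):
                findings.append((node.lineno, "call to a name outside the allowlist"))
    return findings


if __name__ == "__main__":
    sample = 'units(length="cm")\nimport os\nspecies(name=os.getcwd())\n'
    for line, reason in suspicious_lines(sample):
        print(line, reason)
```

On a 2000-line file this reduces the review to the handful of lines that do something other than declare data.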

------
indygreg2
I converted PyOxidizer's configuration files from TOML to Starlark because I
found it effectively impossible to express complex primitives in a static
configuration file and the static nature was constraining end-user utility.

A common solution to this problem is to invent some kind of templating or pre-
evaluation of your static config file. But I find these solutions quickly
externalize a lot of complexity and are frustrating because it is often
difficult to debug their evaluation.

At the point you want to do programming-like things in a config file, you
might as well use a "real" programming language. Yes, it is complex in its own
way. But if your target audience is programmers, I think it is an easy
decision to justify.

I'm extremely happy with Starlark and PyOxidizer's configuration files are
vastly more powerful than the TOML ones were.

[https://pyoxidizer.readthedocs.io/en/stable/config.html](https://pyoxidizer.readthedocs.io/en/stable/config.html)

~~~
mixmastamyk
> I found it effectively impossible to express complex primitives in a static
> configuration file

I don't quite understand this part. Any examples?

~~~
indygreg2
Compare
[https://pyoxidizer.readthedocs.io/en/v0.4.0/config.html#file...](https://pyoxidizer.readthedocs.io/en/v0.4.0/config.html#file-processing-semantics)
to
[https://pyoxidizer.readthedocs.io/en/stable/config.html#file...](https://pyoxidizer.readthedocs.io/en/stable/config.html#file-processing-semantics).

In PyOxidizer's case, I wanted to create virtual pipelines of actions to
perform. In TOML, we could create sections to express each stage in a logical
pipeline. But if you wanted to share stages between pipelines, you were out of
luck. With Starlark, you can define a stage as a function and have multiple
functions reference it because it is "just" calling a function.

I suppose I could have defined names for stages and made this work in TOML. So
let's use a slightly more complicated example.

PyOxidizer config files need to allow filtering of resources. Essentially,
call fn(x) to determine whether something is relevant. In the TOML world, we
had to define explicit stages that applied filtering semantics: there were
config primitives dedicated to applying filtering logic. PyOxidizer's config
files had to expose primitives that could perform filtering logic desired by
end-users. By contrast, Starlark exposes an iterable of objects and config
files can examine attributes of each object and deploy their own logic for
determining whether said object is relevant. This is far more powerful, as
config files can define their own filtering rules in a programming language
without being constrained by what the TOML-based config syntax supports.
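
The contrast can be sketched in plain Python (Starlark's syntax is close, but this is not PyOxidizer's actual API; `Resource` and both filter functions are hypothetical). In the declarative style the tool has to anticipate every rule with a dedicated primitive; in the programmable style the config supplies the predicate.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Resource:
    """Hypothetical stand-in for an object the host tool exposes to config code."""
    name: str
    is_test: bool = False


def filter_declarative(resources: Iterable[Resource], rule: dict) -> List[Resource]:
    """Static-config style: only the rules the tool author thought of exist."""
    excluded = tuple(rule.get("exclude_prefixes", []))
    return [r for r in resources if not r.name.startswith(excluded)]


def filter_programmable(
    resources: Iterable[Resource], keep: Callable[[Resource], bool]
) -> List[Resource]:
    """Programmable style: the config file brings its own fn(x)."""
    return [r for r in resources if keep(r)]


RESOURCES = [
    Resource("app.main"),
    Resource("app.tests.test_main", is_test=True),
    Resource("vendor.lib"),
]

if __name__ == "__main__":
    print(filter_declarative(RESOURCES, {"exclude_prefixes": ["vendor."]}))
    print(filter_programmable(
        RESOURCES, lambda r: not r.is_test and not r.name.startswith("vendor.")
    ))
```

Filtering on `is_test` needs no change to the tool in the second style, whereas the first would need a new config key and new code behind it.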

------
SomeHacker44
Clojure and Common Lisp (among other Lisps, I'm sure) are great at handling
programmable configuration files.

I do not think I would use this author's suggestion though.

------
parhamn
There are cool things somewhere in the middle now too, e.g. Cue [1], which is
really cool.

I was recently thinking that it's a shame there isn't a cross-language
proto-lite kind of thing (I'd settle for JSON with structures and pointers)
that has a great devx. The multi-step transform that configs are right now
(from the serialization representation to the code one) could be eliminated.
IMO interop is one of the most important things for configs, so I'd be hesitant
to use a thing that needs a preinstalled runtime.

[1] [https://cuelang.org/](https://cuelang.org/)

------
sudosteph
Related: This is the same reason I've always preferred doing server config
management with Chef (which uses Ruby for everything) to Puppet (which has
its own DSL). Ansible, which uses YAML config with Python behind the scenes, is
sort of a middle ground, but often wins out because it's simpler to learn and
operate than a full Chef setup. I still find myself missing Chef when I have
to write stuff for Ansible though.

To me, from best experiences to worst: Real language (Ruby, Python) >
YAML/TOML > ini config > JSON > DSLs

------
verdverm
Check out [https://cuelang.org](https://cuelang.org)

Created by people who have dealt with this problem at scale for many years.
Picked it up a month ago, absolutely amazing.

------
cageface
Smug Lisp weenies, as Lisp fans sometimes call themselves, can be a bit
annoying at times, but as I see all these new config file formats, data DSLs,
hybrid scripts like Svelte etc. emerge, you do have to admit that none of these
are necessary in a language with a programmable syntax like Lisp.

------
m0zg
Seems like the config language would benefit from being functional. I've
actually sort of seen an example of this: in GCL/BCL (Google/Borg
configuration language). I used to joke that it's essentially a FP language
with all the good features removed. All the alternatives I have seen outside
Google are much worse, however.

See here for more details:
[https://pure.tue.nl/ws/portalfiles/portal/46927079/638953-1....](https://pure.tue.nl/ws/portalfiles/portal/46927079/638953-1.pdf)

It ticks off most of the author's boxes, but it's so woefully underspecified
and so horrifyingly complex, that Google has been trying to replace it with
something else for the past decade, unsuccessfully (at least as of a few years
ago, don't know about now).

But you get to provide external parameters, compute values, use some
rudimentary logic, inherit configs, reuse configs in larger configs, and so on
and so forth. You can also spend a couple of weeks trying to understand the
config structure of something like an ads backend.

You do get some niceties though: even fairly large services, consisting of
dozens of different backends/mixers/frontends can be brought up/upgraded/shut
down/reconfigured with a single CLI invocation, soup to nuts, including things
like monitoring and load balancing.

------
marpstar
I did this with a server-side TypeScript project I built last year. The
biggest advantage I noticed (and it's mentioned in the article): validation
(i.e. type-safety).

Specifically, the project uses TypeORM and it was nice to be able to import
the types for the DB connection options to make sure I had all of the
necessary properties.

I was using ts-node to run server-side without compilation, which made
importing an `app.config.ts` file quite elegant, IMO.

------
cryptica
I think not containing any logic is a useful feature for config files. Logic
adds complexity and potential for bugs and vulnerabilities - I actually like
that JSON forces you to use simple primitives as config variables. I hate it
when config files start containing function definitions; you can't pass those
around to different processes.

I'm generally against the idea of complex configs. Configs should be simple.
Sometimes that requires extra planning or thinking from developers. Every time
I've allowed logic in config files, I've regretted it.

The main annoyance about JSON is lack of commenting ability. This is a point I
agree with but not a big deal IMO, it's still better than all other
alternatives.

JSON is ideal in terms of being both machine-readable and human-readable. Some
people would argue that YAML is more human-readable, but it's definitely less
machine-readable. With YAML, there are too many situations where some random
code somewhere will remove all the new lines and tabs (for whatever reason)
and mess everything up. JSON is resilient to machine sanitization. JSON is
simple, robust and readable.
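
A small Python illustration of that resilience claim, with a made-up `collapse_whitespace` standing in for the "random code somewhere". Whitespace between JSON tokens is insignificant, so the parsed value survives; the YAML text loses the line break and indentation that carried the nesting. One caveat: whitespace inside JSON string values would be altered by such a sanitizer too.

```python
import json
import re

JSON_TEXT = """{
    "httpd": {
        "baseDir": "/opt/server"
    }
}"""

YAML_TEXT = """httpd:
  baseDir: /opt/server
"""


def collapse_whitespace(text):
    """Simulate an over-eager sanitizer: every run of whitespace becomes one space."""
    return re.sub(r"\s+", " ", text).strip()


# Whitespace between JSON tokens is ignored, so the parsed value is unchanged.
assert json.loads(collapse_whitespace(JSON_TEXT)) == json.loads(JSON_TEXT)

# The YAML text no longer has the newline and indentation that expressed nesting.
print(collapse_whitespace(YAML_TEXT))  # httpd: baseDir: /opt/server
```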

------
henrik_w
This reminded me of the Configuration Complexity Clock:

[https://mikehadlow.blogspot.com/2012/05/configuration-comple...](https://mikehadlow.blogspot.com/2012/05/configuration-complexity-clock.html)

Hacker News discussion here:

[https://news.ycombinator.com/item?id=14298715](https://news.ycombinator.com/item?id=14298715)

------
zer
I use schemy [0] for non-trivial config tasks in .NET projects. LISPs in
general are well suited for this. Very small implementations are easily done
and data is code, it’s pretty great.

Of course, if one has already access to a Python runtime that is probably fine
as well.

[0] [https://github.com/microsoft/schemy](https://github.com/microsoft/schemy)

~~~
Koshkin
Lisps are great as embedded scripting and configuration languages! Lisp is XML
done right.

------
rot25
This reminds me of Proxy Auto Configs
([https://en.wikipedia.org/wiki/Proxy_auto-
config](https://en.wikipedia.org/wiki/Proxy_auto-config)). PAC files are Proxy
configs that are programmed in JavaScript. To do this devices usually embed a
JavaScript runtime in the operating system to parse proxy files. This
introduces a lot more attack surface than a standard config file would and has
resulted in remote code execution vulnerabilities in Android and Windows:

[https://android.googlesource.com/platform/external/chromium-...](https://android.googlesource.com/platform/external/chromium-
libpac/+/af8de44fa4f7c228d812e8db86d4d16585ed1050)

[https://googleprojectzero.blogspot.com/2017/12/apacolypse-
no...](https://googleprojectzero.blogspot.com/2017/12/apacolypse-now-
exploiting-windows-10-in_18.html)

------
ryukafalz
One of my favorite projects that does this is Guix, which uses Guile for its
configuration. See this page for an example:
[https://guix.gnu.org/manual/en/html_node/Using-the-
Configura...](https://guix.gnu.org/manual/en/html_node/Using-the-
Configuration-System.html)

It's a nice balance, I think - most of the time you can treat it as a normal
config file, but if you need to drop into some programmatic stuff you can.
I've done this in my own dotfiles to add custom package definitions before
contributing them upstream, for example:
[https://github.com/jfrederickson/dotfiles/blob/master/guix/g...](https://github.com/jfrederickson/dotfiles/blob/master/guix/guix/manifest.scm)

------
greggman3
I somewhat agree and so does much of the JavaScript world.

eslint or maybe jslint started off with json and later ended up letting you
make a JavaScript file. As a simple example of something you might want to do,
read the version from the package.json and generate a banner or an
installation version.

Grunt (not sure who still uses Grunt but I do) is JavaScript based. And yes,
sharing parts is something I commonly do as in define common options and re-
use them in dev, production, or minimized, un-minimized.

WebPack takes a JavaScript file

For most build tasks I've had some place where I needed to use code to make
the build better but it can go both ways. The worst experience I've ever had
is SCons (Python). No one I know ever really understood how it worked, and so
every programmer wrote wrong and hard-to-understand code trying to insert
their special needs into the build.

------
kire456
I am quite turned off by the following passage near the top of this page:

> bits of configs can't be reused. For example, while YAML, in theory,
> supports reusing/including bits of the config (they call it anchors), some
> software like Github Actions doesn't support it.

I use YAML configuration for Home Assistant all the time, and the includes
work just fine. This article is off to a really weak start by saying "config
files can not have X, because well they can just fine but arbitrary software Y
doesn't support it". Especially damning because the proposed solution is to
embed a whole other programming language within your program! If you can
change the code of the consuming program, then just add support for imports,
instead of adding a Python dependency.

------
Kwantuum
You can add eslint to the things that allow scripts as config files, just like
webpack. I found this very useful because I just started a full-stack
javascript project, and I need most of the linting config to be the same (tab-
width, syntax, etc.) but then node and browsers don't have the same syntax for
imports, don't have the same globals, etc. so I want to reuse a part of the
config. And because you can write the config in js, it's trivial to import the
result of the common config file (which is also used for isomorphic code,
which should have no client/server specific code) and just add or remove what
I need from that to return a different config for linting client and server
files.
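
A minimal sketch of that reuse pattern, written in Python rather than JavaScript purely for illustration. The option names and the `derive` helper are assumptions, not actual eslint settings or API:

```python
import copy

# Hypothetical shared settings, standing in for a common lint config.
COMMON = {
    "indent": "tab",
    "rules": {"no-unused-vars": "error"},
    "globals": [],
}

def derive(base, **overrides):
    """Return a new config built from `base` without mutating it."""
    config = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(config.get(key), dict):
            config[key].update(value)   # merge nested option tables
        else:
            config[key] = value         # replace scalars and lists
    return config

# Client and server differ only in globals and module syntax.
CLIENT = derive(COMMON, globals=["window", "document"], module_syntax="esm")
SERVER = derive(COMMON, globals=["process"], module_syntax="commonjs")
```

The deep copy keeps the shared part untouched, so both variants can be derived from it in any order.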

------
s_y_n_t_a_x
Lua is perfect for this. Simple syntax, and one of its design goals was to be a
config language.

------
strictnein
Still don't get the hang up about comments with JSON. I'd find it strange to
encounter a config file with lots of internal documentation. They should be
small with well named, descriptive settings.

These are _config_ files. A comment is something that gets ignored, this
"comment" will be ignored by whatever ingests this JSON. If not, there are
other problems. If you have to write a treatise here, there are other
problems.

    
    
       {
          "comment": "This is a comment",
          "dev": { ... },
          "stage": { ... },
          "prod": { ... }
       }
    

Oh no, that line doesn't start with a # or //, whatever will we do!

~~~
karlicoss
\- sometimes the schema is strict (and you might not be able to control it)

\- you might not be certain if you break something with such a comment (i.e.
you have to double check that all the "comment" fields are filtered)

\- you're not getting highlighting from your text editor, so it's easier to miss a
comment

\- you can't have a comment for a list item

Basically, you can leave the comments, but you're really discouraged to do so.
Most people would just give up and not bother.
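
To make the second point concrete, here is a rough sketch of the filtering a consumer would need before using the data. It assumes the convention is a key literally named "comment"; the `strip_comments` helper is hypothetical:

```python
import json

def strip_comments(node, key="comment"):
    """Recursively drop every `key` entry from parsed JSON data."""
    if isinstance(node, dict):
        return {k: strip_comments(v, key) for k, v in node.items() if k != key}
    if isinstance(node, list):
        return [strip_comments(item, key) for item in node]
    return node

raw = """
{
  "comment": "top level note",
  "dev": {"comment": "nested note", "port": 8080},
  "hosts": ["a", "b"]
}
"""
config = strip_comments(json.loads(raw))
```

Note that the entries of "hosts" are plain strings, so there is nowhere to attach a comment to an individual list item.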

~~~
worble
I think the real loss here is that with a # or // the person reading the file
immediately knows this is a comment. This line won't be read by the
interpreter, and it's purely here for me to read, so I probably should.

The following however

    
    
        {
          "comment": "Some comment",
          ...
        }
    

Suddenly I'm not sure if this is meant for me, or if it's actually a
configurable state without going into the program itself and checking if it
uses it.

~~~
Ididntdothis
Putting a comment into the payload is just a terrible hack. This shouldn't be
necessary.

------
mattbillenstein
I do something a bit different - salt pillar or ansible vault contains my
"settings" for the entire backend, so as a step in my deploy, I just write out
all the settings as a config.json so it's accessible from any application
programming language.

Then I'll have a python module config.py that reads that and injects those
settings with perhaps a small amount of logic into that namespace on import.
So wherever you might need a setting, you can just "import config;
config.foo" to get a specific config value.

For small systems, I even do service discovery this way - where does the db
live? config.postgres.host/port or something yada yada.
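
A minimal sketch of what such a module might look like. The file contents and the `load_settings` name are assumptions; a temporary file stands in for the config.json written at deploy time:

```python
import json
import tempfile
from pathlib import Path
from types import SimpleNamespace

def load_settings(path):
    """Parse a JSON file into nested attribute-style namespaces."""
    text = Path(path).read_text()
    # object_hook converts every JSON object into a SimpleNamespace.
    return json.loads(text, object_hook=lambda d: SimpleNamespace(**d))

# Stand-in for the config.json produced by the deploy step.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"
    path.write_text(json.dumps({"postgres": {"host": "db.internal", "port": 5432}}))
    config = load_settings(path)

print(config.postgres.host, config.postgres.port)
```

In a real config.py the loaded values would be assigned at module level, so that "import config" exposes them as attributes.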

~~~
mattbillenstein
Note one kinda neat thing about json is you can use jq from scripts to get
values out of it as well.

`psql -h $(jq -r '.services.postgres.host' config.json)`

------
ThePhysicist
Ansible might be a valid example of a system that should have been built using
a programming language instead of a data language. Due to the template
language mixed with interpretation it can be quite difficult to understand the
source of a particular error, and the limitations of Jinja make it quite
difficult to express some things that would be straightforward to write in
Python. That said you can be quite productive with Ansible and the config is
very easy to read and simple to write. I think the original Ansible author now
tries a Python based approach with his new tool (opsmops), so we'll see how
that works out!

------
stepanhruda
Imo this article points out legitimate problems, but downplays the downsides.
The real solution is to combine both:

1\. Use a general purpose language (or ideally a very dumb subset like
Starlark) to generate the configs. You get all the reusability, type safety,
comments etc.

2\. Have #1 output a deterministic config like json/yaml/etc, so you can
always refer to what the config actually looks like without needing to debug
the code in your head. Check that in and whoever uses the config only uses the
static output, they shouldn’t care how it came to life.

This way a config is easy to reason about both on the author and consumer
side.
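
The two steps could be sketched roughly as follows. Plain Python is used here instead of Starlark, and the service names, ports, and helper functions are all made up for illustration:

```python
import json

def service(name, port, replicas=1):
    """Hypothetical helper that expands shared defaults for one service."""
    return {"name": name, "port": port, "replicas": replicas, "log_level": "info"}

def render(environment):
    """Step 1: build the full config for one environment as plain data."""
    replicas = 3 if environment == "prod" else 1
    return {
        "environment": environment,
        "services": [
            service("api", 8080, replicas),
            service("worker", 9090, replicas),
        ],
    }

def dump(config):
    """Step 2: serialize deterministically so the checked-in file is stable."""
    # sort_keys and fixed indentation keep the output byte-for-byte identical
    # between runs, so diffs only show real config changes.
    return json.dumps(config, indent=2, sort_keys=True) + "\n"

prod_json = dump(render("prod"))
```

The consumer would read only the emitted JSON and never import the generator.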

------
thethethethe
I write a ton of config in a bespoke python config language at a FAANG and
there are definitely downsides to having logic in your configs. Config is
supposed to be explicit and when you start iterating over lists and parsing
conditionals, those layers of abstraction can make reading a simple config
quite cumbersome. Additionally, each team will often write their own libraries
which diverge from the standard practices furthering the stratification.

That being said, these configs can get very complex and I can’t imagine
expressing them in something even relatively friendly like yaml.

------
phissenschaft
[https://queue.acm.org/detail.cfm?id=2898444](https://queue.acm.org/detail.cfm?id=2898444)

""" ... embrace the inevitability of programmatic configuration, and maintain
a clean separation between computation and data. The language to represent the
data should be a simple, data-only format such as JSON or YAML, and
programmatic modification of this data should be done in a real programming
language, where there are well-understood semantics, as well as good tooling
... """

------
nickjj
I'm surprised the author didn't mention Suckless tools
[https://tools.suckless.org/](https://tools.suckless.org/).

Their philosophy is to use minimal software and all configuration is done in a
single programming language file. But it's also written in C which means you
need to recompile the app to change config values. They typically provide a
bunch of patch files that you can optionally bake into your custom binary, or
you can write your own modifications too.

~~~
zabil
> you need to recompile the app to change config values.

This works for a user using one of the suckless tools like dwm or surf.
However, for server side applications there may be multiple configurations and
it may not be feasible to compile the whole app for every environment or
deployment.

------
warent
Unless we're talking about some specific cases of dependency injection, adding
logic to your configurations sounds like a living hell. Can you imagine the
exponential explosion of complexity that you'd have to deal with? How do you
even _test_ something like this?

If your program requires configurations so dynamic that it needs to be its own
program, there's something very smelly about your architecture. Next thing
we'll need is a config config to config your config, etc. Infinite regression
into a very stupid place.

~~~
klysm
Configs usually have a decent amount of logic hidden in the implementation for
defaults and other implicit things. Having a language like Dhall puts a hard
limit on the possible complexity of the configuration while simultaneously
allowing you to make everything explicit without being redundant.

------
michaelfeathers
Configuration files are globals and they are lousy for the same reason globals
are lousy when they are in a program.

If you use a programming language, you at least have a way to scope.

------
zaphar
Using json or yaml for your languages config is nice for the application but
it's not great for the developer, especially if they want to reuse values
or settings across multiple applications. The solution then is to use a real
programming language to generate that config. This is why languages like
Dhall, JSonnet, and Cuelang exist. It's why I built my own toy language UCG to
explore the space. But you could just as easily use python, javascript, or
some other language too.

~~~
Koshkin
> _json or yaml... Dhall, JSonnet, and Cuelang... UCG... python, javascript_

God, what are we being punished for?

~~~
imtringued
If you think that's punishment then you haven't suffered enough. I've seen
enough custom written config formats in my life that choosing a widely
supported format like YAML or JSON is always the right choice. YAML may have
warts but you can learn them in 10 minutes and you're done. Those custom
configuration formats don't even have a name or documented syntax.

------
WalterBright
This reminds me of makefiles. Most programmers wouldn't dream of checking in
code with no comments. But makefiles with byzantine code in them, no comments
whatsoever, nothing even saying what the makefile is for, are routine. The
makefiles degenerate into such an awful mess that then people look for
alternatives to make.

The sensible solution is to treat a makefile like code - document it and use
sensible coding practices like organizing it instead of writing spaghetti
soup.

~~~
Koshkin
Indeed; I've always wondered what is it that's keeping us from having a saner
alternative to make in the form of a simple yet more powerful interpreted
language with all needed procedures and functions built in.

~~~
smilliken
You're describing Nix: [https://nixos.org/nix/](https://nixos.org/nix/)

It fills the same niche as make, but is a pure functional programming
language, reproducible, and has a large package repository.

It's pretty much my favorite piece of software because it allows me to depend
on third party packages with minimal risk of breaking the build.

------
anderspitman
My wishlist for a perfect declarative language (configs being one primary use)
is something like:

* Not turing complete.

* Imports/exports

* Comments

* Variables

* Statically typed (amongst other benefits, allows you to use it for RPC, like protobuf)

I've started implementing such a language 2 or 3 times, but JSON is just
barely good enough that I haven't been able to justify the effort to flesh it
out. Taking JSON and adding types (inline in the grammar; schemas aren't good
enough) and comments would get us 80% of the way there.

~~~
karlicoss
Looks like Dhall would be a perfect fit for you.

> Dhall is a programmable configuration language that you can think of as:
> JSON + functions + types + imports

[https://dhall-lang.org](https://dhall-lang.org)

~~~
pron
It's hard to take Dhall seriously because it seems like its authors don't take
it seriously, and work on it for entertainment. They went through the trouble
of creating bindings for five (!!) different languages, and the most popular
of them, by a pretty wide margin, is Ruby. I assume Fortran, Delphi, and Idris
are next on their list.

~~~
Gabriel439
Dhall now has bindings to Rust and Go and Java/Python bindings are currently
in progress

We also do take things seriously, including:

* Creating a language server (do any other configuration languages have this?)

* Soliciting donations to fund high priority work ([https://opencollective.com/dhall](https://opencollective.com/dhall))

* Working on a book ([https://github.com/Gabriel439/dhall-manual](https://github.com/Gabriel439/dhall-manual))

* Maintaining a formal language semantics ([https://github.com/dhall-lang/dhall-lang/tree/master/standar...](https://github.com/dhall-lang/dhall-lang/tree/master/standard))

If you think we're still missing something please let us know as we are
responsive to user feedback

~~~
pron
Rust and Go are even less popular than Ruby but glad to hear about Java and
Python. Either one of them has at least 5x the market penetration of all
currently supported languages combined, and at least 10x of all of them but
Ruby combined.

~~~
Gabriel439
Overall language popularity is only one input into how we prioritize
programming languages.

The best way I can summarize our prioritization process is that we prioritize
in descending order:

* What people are willing to spend their free time to build (I can't order other people to build high-priority bindings and my own free time is already accounted for by improving Haskell bindings that power a lot of shared tooling such as the language server)

* Bindings specific to DevOps use cases (e.g. Go / Python / Ruby / Nix / JSON / YAML, since they are the dominant languages and formats in this space)

* Bindings that can be used to create derived bindings (e.g. Rust, which can then be used to create a binding in any language that can bind to C. In fact, this is how the upcoming Python bindings work. See: [https://pypi.org/project/dhall/](https://pypi.org/project/dhall/))

* Bindings that users request (We have a yearly survey where we ask users to inform the direction of the ecosystem. Python was the most requested language in the most recent survey)

* Overall language popularity (as the final tiebreaker)

So I hope this illustrates that there is a lot more that goes into these
decisions beyond just which language is the most popular and we're not being
obtuse or dilettantes just because we haven't gotten to a specific language,
yet.

------
lokar
I have long argued that teams should write configs in one of the languages
they already know and are using in the system.

This has many obvious benefits, including testing!

A key aspect is for the owner of the "config system" (the stuff that takes the
output of each config program and applies it) to standardize the API for
config generators. Inputs, outputs, runtime env, etc. Then let teams integrate
with that API however they want.
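
As a rough illustration of the testing benefit: if the agreed contract is "environment name in, plain dict out" (an assumption made for this sketch, along with every key name), ordinary unit tests can check invariants of the generated config:

```python
def generate(env):
    """Hypothetical generator contract: environment name in, plain dict out."""
    return {
        "env": env,
        "debug": env != "prod",
        "workers": 8 if env == "prod" else 2,
    }

def test_prod_never_enables_debug():
    assert generate("prod")["debug"] is False

def test_every_env_has_the_same_keys():
    key_sets = {frozenset(generate(env)) for env in ("dev", "staging", "prod")}
    assert len(key_sets) == 1

# pytest would collect the test_* functions; calling them directly also works.
test_prod_never_enables_debug()
test_every_env_has_the_same_keys()
```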

------
exabrial
Here's a philosophical question... when does one put configuration in files vs
a GUI? When changes in the GUI are made, do you update the config files? This
always bothered me. Even databases don't get this rightish. Take MySQL: you
can SET GLOBAL time_zone after the database has started, or you can set it in
my.cnf. To me, having multiple places to do things leads to confusion and
head-desking.

------
drdaeman
I disagree.

Those configs would be unmanageable. You would no longer be able to set a
value, interfaces like `git config` or your-favorite-preferences-GUI-pane
wouldn't be possible. And, generally speaking, I believe it's a bad and non-
user-friendly thing.

However, it makes perfect sense to use a programming language, if your program
is programmable and is meant to be programmable. Like a shell's rc files.

It's also probably okay to use programs as configs, if everything related to
the program is built declaratively, so all the configuration files are
generated (from some higher-order configuration) and not meant to be ever
manipulated, only replaced with the newly generated versions. Programmatically
generating programs is simple, programmatically manipulating programs is not.

It is also fine if you explicitly want to restrict an ability to manipulate
the configuration or create management interfaces, requiring human programming
(whether the tool is meant to be programmable or not). Feels like a weird
idea, but I can see this being considered as a trade-off.

------
kevin_thibedeau
This is an area where Tcl can shine. Sandboxed sub-interpreters can be
stripped of all unwanted commands so that you are no longer Turing complete.
You are left with simple configurations that just set variables. If more power
is needed you let some control structures back in or select custom commands to
do what a dumb config file can't.
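
For comparison, a loose Python analog of the "only set variables" end of that spectrum. This is not how Tcl sub-interpreters work; it is a sketch that parses Python syntax and evaluates only literals, and the `load_assignments` name is hypothetical:

```python
import ast

def load_assignments(source):
    """Read `name = literal` lines; reject anything else without executing it."""
    settings = {}
    for node in ast.parse(source).body:
        simple = (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
        )
        if not simple:
            raise ValueError(f"line {node.lineno}: only simple assignments are allowed")
        # literal_eval accepts only literals (numbers, strings, lists, dicts, ...),
        # so function calls raise ValueError instead of running.
        settings[node.targets[0].id] = ast.literal_eval(node.value)
    return settings

settings = load_assignments('port = 8080\nhosts = ["a", "b"]\n')

try:
    load_assignments("secret = open('/etc/passwd').read()")
except ValueError as err:
    rejected = str(err)   # the call was rejected, never executed
```

This limits what a config file can express, but it should not be treated as a hardened sandbox for hostile input.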

~~~
bjoli
Yup. I just did a similar thing in Guile (using sandboxed environments). As a
Scheme weenie I feel like I am looking at a world of people re-discovering
things I have taken for granted since... forever?

I went from trying to embed Python in a C application to embedding Tcl and was
blown away by how easy it was, and that it supported threads. This was back in
2003. I remember trying it with Lua, but the parallelism story wasn't really
great until at least 5.1. Tcl did it right from 8.4 and things haven't really
changed since, except that Lua caught up.

------
mixmastamyk
I've read a lot of "one true way" in this discussion. Do we need programmable
configuration? Depends on the requirements really.

Django seems to work ok with Python configuration. I haven't seen much
logic happen in the projects I've used. But the user is the developer not an
end-user. So security is not an issue from this angle.

I'm of the opinion that a simple config format like .ini is best for the end-
user of a small application, while a validated schema (with code if needed) is
best for a large one. To that end I'm currently experimenting with tconf to
bring together those approaches under one package:
[https://github.com/mixmastamyk/tconf](https://github.com/mixmastamyk/tconf)

------
socialdemocrat
I think Python is a terrible language for config files. It requires a full
python installation. Lua or a simple LISP are better choices as they can
trivially be embedded in every program at very low cost in terms of added
code. It further avoids juggling different Python versions.

~~~
acqq
Exactly. And once one has something like that embedded, one can use that
flexibility for many kinds of adaptations and scripting of the main program.

------
oweiler
Groovy solves this really nicely: [https://mrhaki.blogspot.com/2009/10/groovy-
goodness-using-co...](https://mrhaki.blogspot.com/2009/10/groovy-goodness-
using-configslurper.html?m=1)

------
sramsay
My users consistently forget a comma in a JSON config file, and then open an
issue because "the program doesn't work."

This is indeed a problem, but something tells me the solution isn't "have them
use Python."

~~~
jfkebwjsbx
Your users should not be writing JSON or Python to begin with, so using Python
is fine.

~~~
sramsay
So no configuration files at all?

------
peanut-walrus
As a sysadmin, my main requirements for configs are:

* Human-readable and writable, including on systems where you have only very basic text editing tools

* Easy to template (for example mass deployments via Ansible or other such tool)

* Easy to use with grep/sed/etc

The format I've found that is easiest to work with is what sysctl and OpenWRT
use for their configs. You can have complex hierarchies, but every line stands
on its own. This means you don't have to be careful about where in the config
a particular line is, if it is in the proper block, etc. Also getting
information on a particular sub-item is as easy as running a single grep
command.
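
A small sketch of why this format is easy to process: each `a.b.c = value` line carries its full path, so a parser needs no block state. The `parse_flat` helper is hypothetical and keeps all values as strings:

```python
def parse_flat(text):
    """Turn `a.b.c = value` lines into a nested dict; `#` starts a comment line."""
    tree = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        *parents, leaf = key.strip().split(".")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})   # create intermediate levels on demand
        node[leaf] = value.strip()
    return tree

sample = """
# every line stands on its own, so grep finds any setting directly
net.ipv4.ip_forward = 1
net.ipv4.tcp_syncookies = 1
vm.swappiness = 10
"""
tree = parse_flat(sample)
```

Line order does not matter, which is also what makes the format friendly to templating and sed.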

------
ebg13
I do this all the time in projects at work and it's extremely useful. Most
importantly it pleasantly separates core logic from configuration-specific
niggling detail logic (now, for instance, you can generate that thousand entry
list of similar things in a few concise lines of code without a bespoke
preprocessor pipeline).

But the best way to sell it to people who don't understand or who aren't yet
on board is to name it properly, which the article fails to do. What this is
describing is no longer configuration. This is now a modular plugin
architecture that only in the most basic usage case implements a configuration
interface.

------
smsm42
> I would argue that when you can't define temporary variables, helper
> functions, substitute strings or concatenate lists, it's a bit fucked up.

I must admit that's where I stopped taking it seriously. If you need helper
functions and temp variables, you do need a good Turing-complete language, but
we're no longer talking about config files. Maybe about config system, but not
a config file. And confusing the two means missing the point why config files
exist at all. I mean not everybody has to buy into code vs. data division, but
if you miss entirely why it exists maybe it's not thought through enough.

~~~
KMag
I agree with you that helper functions within configuration are a code smell,
but I find (immutable) temporary variables helpful. It's pretty common to want
named constants to avoid copy-pasting configuration and to make consistent
modification easier. You could have an inclusion mechanism and put those
constant parts in their own configuration file, but then you've effectively
renamed "my_constant" to "include('my_constant.cfg')". Immutable temporary
variables are just non-globally scoped constants.

------
cfv
If your config files contain anything that could not be expressed as an OS
variable, that's a maintenance nightmare right there. Eventually some user
somewhere is going to jam entire mini programs in there and then complain when
his castle of cards implodes.

Consider, if possible, the chance that if your program needs a subprogram for
configuring itself, then maybe a bootstrap event hook or some tap into the
bootstrap process like a plugin or something is a better fit for what you are
trying to do. Actual managed extension points instead of ad-hoc ones will make
your program much more useful in the long run.

------
mlthoughts2018
I’m surprised to see nobody brought up the 12 Factor App principle about
config,

[https://12factor.net/config](https://12factor.net/config)

Anything that's changing from one run of your application to another is
config. Static settings that don’t change from one run to another are not
config, they are just static pieces of data and it does not matter how you
store them other than that you ensure it meets the operating need of your
application.

Configs, however, which can change from one run to the next, are different.
They need to be factored out of code completely and only addressed by
inspection of the runtime environment.

Whatever tool you use to ensure they are injected to the environment is also a
totally uninteresting decision as long as it meets the operating needs of your
deployment system.

That’s it.

1\. store static constants and data items however you want as long as your
program meets its operating requirements

2\. config is not the same as static constants or static data, config can
change from one run of the application to another.

3\. factor config completely out of the code so it is solely referenced as
part of the environment

4\. store external configs however you want so long as the operating
constraints of the deployment system are met.

Within items 1 and 4, debate over relative merits of different tools is almost
always useless bikeshedding unless it boils down to a real operating
constraint of either the app itself or the deployment system.

For example, a system that puts secret access tokens (config, not static
settings) into the environment by storing them in plaintext environment
variables might violate a security operating constraint of the deploy system
and so a different system that manages encrypted secrets injected into the
environment safely could be the winner for real operating reasons.

Meanwhile, whether to store “staging” vs “production” database connections in
Python / YAML / JSON / TOML / etc. because of comments / whitespace / use of
builtin library “extra” code / whatever is just pure bikeshedding waste of
time.
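
Point 3 might look something like this in practice. The variable names (DATABASE_URL, DEBUG, WORKERS) and the `Settings` class are assumptions made for the sketch:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    database_url: str
    debug: bool
    workers: int

def from_env(env=os.environ):
    """Build settings from the environment only; all names are hypothetical."""
    return Settings(
        database_url=env["DATABASE_URL"],        # required: KeyError if missing
        debug=env.get("DEBUG", "0") == "1",
        workers=int(env.get("WORKERS", "4")),
    )

# A plain dict stands in for the process environment in this example.
settings = from_env({"DATABASE_URL": "postgres://staging-db/app", "DEBUG": "1"})
```

How those variables get into the environment is left to the deployment system, which is the point of item 4.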

------
twomoretime
My takeaway from the comments is that software development is not monolithic
and what's appropriate for your application may be anything from a couple
lines in a text file to literal python.

------
maelito
What about writing YAML, parsing it in your preferred programming language,
and documenting what's expected? We're doing it with thousands of lines of YAML,
it's working great.

~~~
Koshkin
There is something wrong with the idea to use a _markup_ language for
configurations. Sure, the idea is extremely popular, and there are reasons for
that, but a homoiconic scripting language (a Lisp or, to an extent,
Javascript/JSON) would serve the purpose much better.

~~~
imtringued
YAML isn't a markup language and that's why the name is now "YAML ain't markup
language".

------
mdale
We use json schema to counter a lot of the concerns listed around type checks,
validation, back references to structures etc.

More importantly this helps us have tight validation around configuration
ecosystem that defined experimentation and server side overrides. See config
delta blog post: [https://medium.com/crunchyroll/introducing-crunchyroll-
confi...](https://medium.com/crunchyroll/introducing-crunchyroll-config-
delta-90b212e61a15)

------
Pmop
I used to develop game engines when I was starting to learn programming. First
C then C++. I knew JSON and there's an excellent header-only library for C++.
However, I quickly learned from the indie gamedev community that parsing y=x
always solves most of your problems, and it's quite trivial to do it,
especially if you're using Python. Certainly better than using a full blown
programming language for conf files.

------
bonebutter
I hope I'm not too late to the party but I'm building a project doing just
this. It's called Anyfig and allows you to create your configs during runtime
in Python for pure-Python projects. Check it out :)
[https://github.com/OlofHarrysson/anyfig](https://github.com/OlofHarrysson/anyfig)

------
mapgrep
This approach seems to have served emacs quite well; the config is just some
elisp. It’s weird more programmers don’t copy this approach, given how many of
them use emacs.

Of course, you start out just cargo culting emacs config, so many may not even
realize emacs is configured via code. You can also get pretty far with setq
functions, essentially just assigning values to variables.

But learning to grok and code elisp is when you start to really see emacs’
power.

------
renewiltord
Interesting. This is the opposite of modern design defunctionalization. I
think the arguments made for that apply to why this is not a good idea.

The advantage of declarative configuration is that it provides a sync barrier
to the human and a safe entry-point. I imagine in a pure language without
global runtime state you could use this method but in the more mainstream
languages it is likely to trip you up. I will refrain.

------
throw_m239339
Ctrl+F search XML, no result.

I know the majority of developers hate XML config files, yet...

I try to make my apps depend on config files and manifests as little as
possible, but DSL vs fully featured scripting languages has always been a big
conundrum in software development. Is python a solution? the problem is that
at some point you might be tempted to add json/yaml config files to your
python config scripts...

~~~
toolslive
About 5 minutes after you start using xml for configuration, you want to
reference a bit of global information (environment variable, port, ....) and
add variable substitution ${foo} as an extra feature.
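
That extra feature tends to look roughly like the sketch below, regardless of the file format. Python's standard `string.Template` happens to use the same ${foo} syntax; the `expand` helper and the sample keys are made up:

```python
from string import Template

def expand(value, variables):
    """Recursively substitute ${name} placeholders in every string value."""
    if isinstance(value, str):
        return Template(value).substitute(variables)  # KeyError on unknown names
    if isinstance(value, dict):
        return {k: expand(v, variables) for k, v in value.items()}
    if isinstance(value, list):
        return [expand(v, variables) for v in value]
    return value

raw = {"url": "http://${host}:${port}/api", "retries": 3}
config = expand(raw, {"host": "localhost", "port": "8080"})
```

Once this exists, escaping rules, defaults, and nested references usually follow, which is how a data format drifts toward a language.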

------
adamnemecek
There was a time when I thought you want a declarative language for your build
system. After having used Rust, which allows you to use Rust in your build
scripts I think that that is the way forward. You can, say, generate game assets
in your build script, access the network, use third party crates etc etc.

Declarative languages are a work around for bad APIs.

------
z3t4
People often forget that formats such as JSON and XML are not meant to be hand
written. They are serialization formats, meant to be parsed while still being
human readable. It's very nice to have them human readable to make it easier
to debug and make quick changes. But you should probably use a tool to edit
them.

------
indymike
Saltstack is an interesting compromise... states can be described in a
serialization format or in Python. I used Saltstack a lot at my last company
and really liked it because for most cases, you could just use YAML... but
when that didn't work, you could bring in a "real" programming language
(python).

------
The_rationalist
Gradle should be cited: it allows you to configure build/automation rules with
either Groovy or Kotlin script.

~~~
karlicoss
author here! Thanks, I've actually had it in my prompts but forgot to add.
Will amend!

------
cozzyd
I occasionally do this for python-only projects or for Makefiles, but
(especially in the latter case) it's quite fragile.

I usually use libconfig for configuration, which is better than json and yaml
and has implementations in many languages (although I'm not aware of any
javascript implementation).

------
sly010
setuptools doesn't deserve to be mentioned in the same capacity as
bazel/nix/dhall.

Having the entire python language (specifically the ability to inspect the
local system) available BEFORE you declare your package (as a side effect!)
means you cannot reliably learn anything about a package without having an
entire operating system with python and all package dependencies
pre-installed. It can lead to scenarios where you cannot automatically query a
package's dependencies without having those dependencies already pre-installed.
Or when installing packages in different order yields totally different
results.

On the data <-> code spectrum, configuration should be closer to data than to
code, perhaps a pure function that takes a high level config to a lower level
config.
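A minimal sketch of the "pure function from a high-level config to a
lower-level config" idea. The field names (`service`, `replicas`, `base_port`)
and the `expand` helper are hypothetical, made up for illustration; the point
is only that the function does no I/O and never inspects the local system, so
the result can be computed anywhere.

```python
def expand(high_level: dict) -> dict:
    """Pure function: no I/O, no system inspection, same input -> same output."""
    name = high_level["service"]
    base_port = high_level["base_port"]
    return {
        "instances": [
            {"name": f"{name}-{i}", "port": base_port + i}
            for i in range(high_level["replicas"])
        ]
    }


# Hypothetical high-level config: a plain data structure, nothing executable.
high = {"service": "web", "replicas": 2, "base_port": 8000}
low = expand(high)
print(low)
```

Because `expand` depends only on its argument, a tool can learn everything
about the resulting config without installing or running anything else.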

------
thiht
I hate when config is done using a real programming language. In the JS world,
Grunt sucked, Gulp sucked, Webpack sucks, Parcel came with zero-config (ie. 3
lines of JSON max) and it rocks. Maybe it's me, but I love it when my conf is
static and just easy to understand.

------
exabrial
I really think configuration files should have a schema of some sort,
something that an editor can read and know the complete set of valid options
for, and highlight invalid options. Something akin to an XSD for XML (but not
XML, sucks to edit with command-line tools).
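A rough sketch of what such a schema check could look like, using only the
Python standard library. The `SCHEMA` table and the option names are
hypothetical; a real editor integration would more likely rely on an existing
schema format than on a hand-rolled table like this one.

```python
# Hypothetical schema: option name -> expected type.
SCHEMA = {"host": str, "port": int, "debug": bool}


def check(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for key, value in config.items():
        if key not in SCHEMA:
            problems.append(f"unknown option: {key}")
        # Exact type match, so that True is not silently accepted as an int.
        elif type(value) is not SCHEMA[key]:
            problems.append(f"{key}: expected {SCHEMA[key].__name__}")
    return problems


# A typo in an option name is reported instead of being silently ignored.
print(check({"host": "localhost", "prot": 8080}))
```

Since the set of valid options lives in one table, the same table could drive
completion and highlighting as well as validation.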

------
ehosca
[https://github.com/lightbend/config/blob/master/HOCON.md#hoc...](https://github.com/lightbend/config/blob/master/HOCON.md#hocon-human-optimized-config-object-notation)

------
lonelappde
Config is code.

Replace "configuration" with "scripting layer", and you'll have a much better
perspective on "config" "vs" "code". Especially if your "code" is also a
scripting language or open source.

------
jscholes
> exec(Path('config.py').read_text(), config)

I'm guessing that this is in place of just importing the config module,
because of this line:

> You can even import the very package you're configuring.

Which sounds like a recipe for circular import disaster.
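For reference, a small self-contained sketch of what the quoted `exec` line
appears to do. The config contents are hypothetical and written to a temporary
file only so the example runs on its own. The file is read as text and
executed with a plain dict as its global namespace, so it never goes through
the import system. Note that `exec` runs whatever code the file contains.

```python
import tempfile
from pathlib import Path

# Hypothetical config contents; a real project would ship its own config.py.
source = "name = 'demo'\nworkers = 2 * 4\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.py"
    path.write_text(source)

    config: dict = {}
    # exec runs the file with `config` as its global namespace, so the
    # file's top-level names become dictionary keys.
    exec(path.read_text(), config)

# exec also inserts __builtins__ into the namespace; filter it out.
settings = {k: v for k, v in config.items() if not k.startswith("__")}
print(settings)
```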

------
yellowapple
> This is considered as a positive by many, but I would argue that when you
> can't define temporary variables, helper functions, substitute strings or
> concatenate lists, it's a bit fucked up.

What's "a bit fucked up" is expecting the configuration file to be able to
support this when this can and should be done by the application itself. If
you feel the need to do these things in the configuration file itself, then
the application is providing insufficient abstractions.

Like sure, there's definitely use in an application dynamically loading its
own potentially-user-supplied code to modify its behavior, but most people
don't call that code "configuration files"; we call that code "extensions" or
"modules" or "plugins" or "scripts" or somesuch.

------
blain_the_train
In terms of how this should be done, Clojure and edn shine here and prove the
point rather strongly that solutions like YAML are unnecessary at best and
deeply limiting at worst.

YAML is the poster boy for easy but not simple.

------
ronyfadel
The neat thing about config files is that they’re easily serializable and
transferable over the network, the file system, etc.

A programming language can’t do that (or if it could via eval, it would be a
bad idea)

------
akyu
At one of my old jobs, we used XML for config files, but users could write C#
in the XML which would get dynamically parsed and run at runtime. Needless to
say it was an absolute nightmare.

------
preommr
My 2¢: 90% of the time config files are simple enough for something like json
(+ comments)

The other 10% of the time, where you need some level of logic to modify some
setting value, there's no substitute for something that people are already
familiar with. Just let people have their 3-line js function instead of 2-3
other dependencies that add more convoluted logic to the build system.

Trying to create a config-specific language with more advanced features/logic
is an absolute mess where you end up learning a second language, people are
unhappy with how certain features aren't implemented, and others are unhappy
with how too many features exist, and it's just all-round chaos.

tl;dr There should be no in-between: either you have a simple key/value config
file that someone can learn in 5 minutes, or you let people use a language
they're already familiar with.

------
gaogao
At work, we generally do configs via Python that builds to json. It's a good
best of both worlds imo, as it avoids a bunch of the downsides in the article.
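A small sketch of the "Python that builds to JSON" workflow, under assumed
names: the environments, settings, and `build` helper are hypothetical. The
logic (shared defaults plus per-environment overrides) lives in Python, while
the application only ever reads the static JSON that comes out.

```python
import json

# Hypothetical shared settings and per-environment overrides.
COMMON = {"log_level": "info", "timeout_s": 30}
OVERRIDES = {
    "staging": {"log_level": "debug"},
    "prod": {"timeout_s": 10},
}


def build(env: str) -> str:
    """Merge shared settings with per-environment overrides and emit JSON."""
    merged = {**COMMON, **OVERRIDES[env], "env": env}
    return json.dumps(merged, indent=2, sort_keys=True)


# In a real setup each result would be written to a file and checked in or
# deployed; here it is just printed.
for env in OVERRIDES:
    print(build(env))
```

The generated JSON can be diffed, reviewed, and loaded without executing any
code, which is where the "best of both worlds" claim comes from.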

------
SanchoPanda
A large number of configs are created/used by people that don't know any
programming languages, it seems rough to exclude all those people.

~~~
q3k
It seems worse to me to treat them like they can't learn a programming
language.

~~~
SanchoPanda
Not forcing them to is different from assuming they can't.

~~~
sparkie
The trick is to get them programming without them even realizing they're doing
it.

There are examples of this working effectively: millions of people can use
Excel to perform computations on groups of cells - the formulae they're
entering include basic programming constructs like assignment, selection,
loops, etc - yet it's not presented as a "programming system", which otherwise
seems to put people off.

I recall reading an anecdote about the TECO editor (or some other emacs
predecessor), where the editor was being used by secretaries and other
non-programmers, but they'd have no problems configuring it with the
documentation they had available - they were never told that they were
actually programming when they were doing so.

Perhaps it's a bit much to give the user a full-blown programming language for
their configuration and expect them to have no problems, but it seems like
limited programming concepts can be learned by just about anyone if presented
in the right way.

~~~
SanchoPanda
I agree. I have seen incredible things done in Excel in the most elegant ways,
by people with zero programming, math or computer backgrounds.

And as you said, presentation is key. Presented as "use these tools to set
this up" things could go well. Presented as "write this in python" it may not
go at all.

------
choward
> your program crashes because of something that would be trivial to catch
> with any simple type system

Then goes on to recommend python. What?

~~~
XelNika
Sure, Python is dynamically typed, but it supports type hints and avoids
implicit type conversions. I'm sure that quote is aimed squarely at YAML,
which is an absolute nightmare by comparison.

------
bschwindHN
I'm happy with json5 for now, it addresses my complaints with json and is
supported by the main tools I use.

------
PopeDotNinja
I'm gonna call this thermonuclear programming... Exploding your program with
another program.

------
AlphaSite
How about something like HCL?

------
jayd16
Don't we have configs so we can change programs without programming?

------
phab
> termination checking
>
> Anyone knows examples of conservative static analysis tools that check for
> termination in general purpose languages?

As this is provably impossible, this "solution" introduces exactly one of the
problems the author was trying to avoid:

> can't be validated

~~~
karlicoss
(author here)

That's why 'conservative'. I.e. it's allowed to reject a valid, terminating
program, but if it does pass the check, your program is guaranteed to
terminate. This is something that's possible; the only question is the
tradeoff between the subset of the syntax and how complicated the static
analysis is.
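To make "conservative" concrete, here is a deliberately crude sketch using
Python's `ast` module. It is not an existing tool. The whitelist is an
assumption chosen for illustration: assignments, literals, operators, and
`if` statements, with no loops, calls, or definitions. Anything that passes is
straight-line code and so finishes after a bounded number of steps, though a
single step such as a huge exponentiation could still be impractically slow.
Plenty of terminating programs are rejected.

```python
import ast

# Node types this sketch accepts: straight-line assignments, literals,
# operators, and if-statements. No loops, calls, imports, or definitions.
ALLOWED = (
    ast.Module, ast.Assign, ast.If, ast.Name, ast.Constant,
    ast.List, ast.Tuple, ast.Dict, ast.BinOp, ast.UnaryOp, ast.BoolOp,
    ast.Compare, ast.operator, ast.unaryop, ast.boolop, ast.cmpop,
    ast.expr_context,
)


def passes_check(source: str) -> bool:
    """True only if every AST node is on the whitelist.

    Raises SyntaxError if the source does not parse at all.
    """
    tree = ast.parse(source)
    return all(isinstance(node, ALLOWED) for node in ast.walk(tree))


print(passes_check("x = 1\nif x > 0:\n    y = [x, x + 1]\n"))  # True
print(passes_check("for i in range(3):\n    x = i\n"))          # False
print(passes_check("while True:\n    pass\n"))                  # False
```

The second call shows the tradeoff: the `for` loop obviously terminates, but
this checker rejects it rather than analyse what `range` does.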

~~~
phab
But if it's allowed to reject a valid terminating program it's allowed to
reject arbitrary, otherwise valid, configuration programs. In other words, you
can no longer trust the output of the validator - the weakened model
_significantly_ reduces the utility of the validator.

~~~
karlicoss
That's how most analysis tools work. Not necessarily 'dynamic' languages even,
e.g.

- Clang (depending on warning level) may reject a valid program -- doesn't
make it less useful, you just suppress the check for the offending line and
carry on

- The Rust borrow checker may seem picky and reject a program that is
perfectly valid from your viewpoint. Does it make it less useful? I wouldn't
say so.

~~~
phab
Right, but if you've introduced the ability to ignore the validator then
you've traded away the guarantee that your config program will be safe to
execute.

My point is that the validator can't give you the safety property that is
claimed as a defence against one of the inherent issues with this approach.

~~~
karlicoss
Ah, maybe I wasn't clear enough there -- the validator/analyser is supposed to
be run by both parties, the party who writes the config and the party who
loads it. So you can reject a malicious config before trying to execute it.

I mostly have a non-malicious user in mind though (i.e. end-user software,
where the software and the config have the same permissions).

If you do have such security concerns, you probably need a sandbox at some
point. E.g. a big source of my frustration is CI pipelines -- they run
isolated anyway and execute arbitrary code. Having YAML there does nothing for
security.

~~~
phab
Even if I'm not concerned with a malicious actor, I _ought_ to be concerned
about silly future me that accidentally introduces an infinite loop into the
config, which then makes it to production and is able to wreak havoc, because
silly past me had to disable the validator for this config because of "that
pesky validator bug that only shows up on Tuesdays"...

------
S_A_P
My gut is telling me NO NO NO NO NO NO! But like most things development I’m
sure the answer is maybe. Some of the time. For a specific set of problems. In
certain cases. With certain languages.

------
crazypython
Arras uses a JavaScript file that generates JSON.

------
arvindrajnaidu
Agree. Config is hard-coding. So code it.

------
option
I came to a very similar conclusion with a big IF. If your project isn't
concerned about security, then yes, Python is a way to go.

------
cft
Many real programming languages are compiled. Does it mean I have to recompile
the binary every time I change config?

~~~
jbreckmckye
Many such 'real' languages have support for some kind of interpreted mode.

For example if you set a path to Stack as a script shebang, you can write
interpreted-mode shell scripts in Haskell.

~~~
cft
What is the support for the "interpreted mode" in C, C++ or Rust?

------
otikik
Lua is great for this.

------
11235813213455
What about jsonc or JSON5?

~~~
Snelius
Looks like the author does not know what it is :)

------
robbyoconnor
uhm... toml and yaml are fine.

You could also use JSON or environment vars...

------
IshKebab
Have fun writing a GUI to edit your config file...

------
rq1
I wouldn’t call Python a “real language” to begin with.

Troll aside, nix is mentioned and IMHO is the perfect tool for the task, but
the author throws it away in a few lines because it’s “overkill”.

------
jstewartmobile
Turing completeness in config files is a proven bad idea.

If it is truly needed, use/write a config generator instead. We have so many
tools in the build chain now that config generation can be easily automated
into the deployment process.

See m4, BPF, PDF, etc...

~~~
KMag
There are plenty of examples where eschewing Turing-completeness has resulted
in safer systems that are easier to reason about.

I haven't experienced cases where I've needed Turing-complete configuration,
and regard Turing-complete configuration as a code smell. However, people at
least feel they sometimes need Turing-completeness. When they do, I think we
can all agree that ideally they'd not need to add another language to their
project in order to do so.

I think the ideal case would be a declarative Turing-incomplete configuration
language with an escape hatch to a Turing-complete superset, similar to Rust's
unsafe blocks. Bonus points for forcing impurity to be similarly confined
without scaring people with monads.

If done well, with both an AoT compiler generating highly optimized native
code and an interpreter for loading configuration, one might even be able to
bring the "configuration should be Turing-incomplete" and "configuration
should be in the same language as the application" folks under the same big
tent. In an ideal world, the AoT compiler could optionally partially evaluate
your program with respect to a configuration file, basically conditional
compilation on steroids, with guarantees that the conditional compilation
hasn't altered program behavior.

Static analysis would be easier, both for developer tools and the compiler's
optimizer. A quick code search would help you focus on the gnarly bits of
code. A code reviewer would take pause before approving some code with an
"unsafe impure loopy {}" block (assuming "loopy" is the keyword for the
Turing-complete superset).

