Looks like nobody has mentioned TOML yet, so I will. It's basically INI syntax with stricter rules and can be used as such if you wish, but also supports some extra features like arrays, dictionaries/tables, timestamp values, nested sections, and so on. It's a joy to use in the rare times that I'm able to use it.
TOML made an unfortunate decision by not enforcing lack of indentation. Now people who don't understand it but really should (gitlab is one prominent example) are doing horrors like this:
[foo]
    bar = baz
instead of
[foo]
bar = baz
Do not indent TOML!
Also, I don't think it's typed in the way the article wished for.
We have a little Dhall at work. We use it to generate JSON dynamically from env vars. The tooling is great, and the error messages are great. It’s a bit “functional”, so it will strike some as unfamiliar, but I have really liked using it so far. We will probably extend our use of it to generate JSON Schema documents, since it allows us much better code commenting and structure reuse, even across files.
I've increasingly found myself drawn to writing declarations as programs, not as config in any plain textual format. That might mean going all the way to general-purpose languages, as Pulumi does, or using a niche but still Turing-complete language like Nix's expression language, or even Dhall, which other commenters have mentioned.
That isn't to say there is no place for a simple, human-readable schema. I think these tools need a fallback to something simpler.
Nothing these tools do couldn't be emulated by hand-writing complex makefiles or YAML or whatever. What strikes me is the use of general-purpose languages, usually with tooling for type-checking and IDE assistance, which lowers the barrier to entry and empowers someone who "just knows (Python|JavaScript|Go|...)" to contribute in a familiar environment.
A couple more examples:
Envoy: I have recently had the displeasure of writing a configuration for the Envoy load balancer by hand. Envoy uses a "typed config language", which is often represented as YAML or JSON, but it's very painful to write by hand. On the other hand, if you have the protocol buffers/gRPC schema available in code, it's vastly less painful to use any programming language to build the typed objects and then export to plain text. The xDS protocol is designed to be interacted with via programs, not plaintext.
GitLab CI: GitLab supports writing a program, which runs as part of the CI/CD job, to generate the configuration for a subsequent pipeline. This makes writing complex jobs, or repetitive monorepo tasks much simpler. A 10 line Python program that effectively does "for each folder in `python`, emit this block of YAML" is incredibly powerful.
That last example is salient to me: markup languages are really easy to parse, but challenging to read when they're dynamic. Wouldn't it be nice to be able to mix and match? YAML/JSON where it makes sense and the intent and meaning is self-evident, and to write code where you need dynamism?
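To make the GitLab CI case concrete, here is a rough sketch of that kind of generator. The job names, the `stage` value, and the `pytest` command are made up for illustration, and it emits JSON on the assumption that the consumer accepts it as flow-style YAML; it is not taken from any real pipeline.

```python
import json
from pathlib import Path


def build_pipeline(root: Path) -> dict:
    """Return one hypothetical test job per immediate subfolder of `root`."""
    jobs = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        jobs[f"test-{folder.name}"] = {
            "stage": "test",
            "script": [f"cd {folder.as_posix()}", "python -m pytest"],
        }
    return jobs


def render(pipeline: dict) -> str:
    """Serialize with a real serializer instead of gluing strings together."""
    return json.dumps(pipeline, indent=2)


# Typical use inside the generating job (assumes a `python/` folder exists):
#   print(render(build_pipeline(Path("python"))))
```

The loop is the dynamic part; everything inside the dictionary is still plain, readable data.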
Full disclosure: I work for Pulumi, opinions are my own, etc.
Interesting. I've been using Guix for about two years now and really love its approach to configuration. Everything is written in a general programming language, but Guix provides a really nice DSL that covers your package declarations, operating system declarations, etc.
Guix uses Guile Scheme end-to-end, but you really don't have to know any lisp as a user. The DSL just reads in such an obvious way. However, since it's Just Lisp, you also have a full programming language and ecosystem available for the times you need it.
In fact, this has been how I have begun to absorb lisp! I started off writing simple package definitions and slowly started tackling ones that require more hand-holding. Now I find myself even able to quickly sketch up scheme scripts for common system tasks and the like.
In the past it was fairly common to write configuration files in Tcl and just run the script to parse them.
This was generally considered to be a mistake. It opens up a huge threat surface for your program, and few people ended up using the more advanced capabilities it created. If your config format supports simple globbing, that covers the 95% use case that would otherwise need a Turing-complete language for your configuration files.
Pity. Tcl allows you to create safe interpreters within which you can disable any commands you want in order to have a trustworthy environment for running configuration scripts.
Tcl itself uses them to build its internal list of available package modules.
YAML's parsing of `no` as `False` has not been part of the spec for 13 years now. It was changed in YAML 1.2 in 2009 to only be `true` and `false` (with variations in case allowed I think).
As has come up in this thread already, any discussion of typed config languages nowadays that doesn't mention Cue (https://cuelang.org/) seems incomplete. They really seem to be tackling the problem in a thorough way. I hope it catches on.
For anyone who knows more about Cue: right now you can go from Cue<->yaml (in fact, their docs on yaml also use the "no" case as an example: https://cuelang.org/docs/integrations/yaml/) to integrate with existing systems, but I suppose eventually the goal would be to have direct support in libraries like Serde?
Cue is a very cool language, but it is quite different from the "typed config language" that I have described here. Maybe I picked a poor title, but in the post I am talking about using the type information to "improve" parsing. IIUC Cue does not do this: it parses in a "dynamically typed" manner, then uses the type system to evaluate the Turing-complete (or close to it) expression language.
Yeah that's why I was asking about Cue's eventual goals with libraries like Serde. I assume eventually they'd like to be able to auto-generate type definitions for a target language, but I don't know.
> Cue does not do this: it parses in a "dynamically typed" manner, then uses the type system to evaluate the Turing-complete (or close to it) expression language.
As I understand it Cue would help in two ways currently. 1. It would be able to type-check existing yaml files to catch things like the "no" case. 2. if you write your config in Cue, it would output properly-typed yaml to avoid things like "no".
Yes. I agree. It would "prevent" the "no case" by returning an error on parse/evaluation. However, the solution described here can do better: it can correctly parse the no case. Basically, by knowing it is parsing a string, the grammar can be simpler; it doesn't have to decide whether the value is an int/bool/string anymore.
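A toy sketch of the difference, in Python rather than the post's Rust. Both function names and the set of "truthy" words are my own simplifications, not any real parser's rules.

```python
def parse_untyped(text: str):
    """Guess the type from the text alone, roughly in YAML 1.1 style."""
    lowered = text.lower()
    if lowered in {"yes", "true", "on"}:
        return True
    if lowered in {"no", "false", "off"}:
        return False
    try:
        return int(text)
    except ValueError:
        return text


def parse_typed(text: str, expected: type):
    """Let the expected type pick the grammar, so nothing is guessed."""
    if expected is str:
        return text  # any scalar is a valid string
    if expected is bool:
        if text in ("true", "false"):
            return text == "true"
        raise ValueError(f"expected bool, got {text!r}")
    if expected is int:
        return int(text)
    raise TypeError(f"unsupported type: {expected!r}")


untyped = [parse_untyped(t) for t in ["ca", "no", "us"]]
typed = [parse_typed(t, str) for t in ["ca", "no", "us"]]
```

With the guessing parser the middle entry turns into a boolean; with the type-directed one it stays the string `no`, and no error is needed.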
Was going to mention Cue. Works great in cases where you want to add a touch of structure to your existing yaml configuration, such as types and bounds checking.
The author is right that you gain syntax benefits when you define a schema. For those who say this adds cognitive overhead, it actually doesn't; the schema and compiler are able to reduce that overhead, because if you make a mistake, you get a nice, accurate error message.
I'm kind of shocked no one brought up protobufs yet. protobuf libraries are available in pretty much all mainstream languages and the textproto format is pretty mature.
Admittedly, it's clunkier and less freeform than YAML. And if you only ever plan on using Rust, the proposed solution here is probably cleaner.
Having portability over multiple languages maintained by large organizations can be useful in some cases though.
> Statically typed programming languages are catching on so why don’t we extend this typing to our config files?
What you really need is Cuelang. Cuelang does graph unification over a type-value lattice. This allows the user to do progressive type -> value refinement (e.g. type->range->value).
For configuration, this is both better than regular type systems, and better than inheritance.
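A very rough illustration of the type -> range -> value idea, as a toy in Python. This is not how Cue is implemented and it only covers integers; the `IntRange` class and `value` helper are invented for the sketch.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class IntRange:
    """A constraint on an integer: 'any int' is the widest possible range."""
    lo: float = -math.inf
    hi: float = math.inf

    def unify(self, other: "IntRange") -> "IntRange":
        """Combine two constraints by intersecting them."""
        lo = max(self.lo, other.lo)
        hi = min(self.hi, other.hi)
        if lo > hi:
            raise ValueError(f"conflicting constraints: {self} and {other}")
        return IntRange(lo, hi)

    @property
    def is_concrete(self) -> bool:
        return self.lo == self.hi


def value(n: int) -> IntRange:
    """A concrete value is just a range with a single member."""
    return IntRange(n, n)


ANY_INT = IntRange()
port = ANY_INT.unify(IntRange(1024, 65535)).unify(value(8080))
```

Because unification here is just intersection, the order in which the type, the range, and the value are supplied does not matter, and a value outside the range fails instead of silently overriding it.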
Indentation-based config languages really need to be retired.
Unless you put in extra work to validate your config, they are just error prone and impractical.
They don't compose (INI files do, btw), they need special editors to keep sane when copying and pasting.
If you now add an invisible type layer on top of them, it doesn't get better, it actually gets worse. Now the interpretation of a value in the file not only depends on the literal interpretation by the human brain but on some type definition one needs to be aware of, adding a semantic interpretation that can be non-obvious.
That is why we have micro-formats: they are obvious (mostly).
http:// 03-13-1980 123e4567-e89b-12d3-a456-426614174000 "I'm a string"
There's a delicate balance between readability and correctness and safety.
YAML et al. are not hitting that balance.
And if you really, constantly, need to edit config files and can't handle a format that prioritizes correctness over readability: Build a UI.
Yep. YAML especially is just horrifying to use unless you have internalized how lists and dicts work. After 5ish years of working with YAML files, I haven't.
TOML is an excellent contemporary choice that is not unpopular. HCL might be another, but unfortunately not very popular outside Hashicorp tools.
Why not put it in code? If your program is written in a strongly typed language, write the config in that language. You have a config function (analogous to the file) and a config "data" type (returned by the function, specified by the "configuree"). The function can only read env vars (keys, secrets, etc.) and return the data structure laid out by the app. With strong enough typing, you can easily prevent side effects by restricting the return type of the config function.
Your IDE can help you write only acceptable config files (functions) this way.
In case you do not want to recompile for conf file changes, many languages come with some kind of interpreter. You may even make it hot-reloadable for some properties.
If you need more, you probably need a config service, which you can build in a type-safe fashion as well.
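A minimal sketch of the config-function idea, in Python with type hints standing in for a stricter language. The `AppConfig` fields and the `APP_*` variable names are hypothetical, and Python itself won't stop the function from having side effects; a stricter type system would be needed for that part.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AppConfig:
    """The config "data" type, laid out by the application."""
    db_url: str
    workers: int
    debug: bool


def load_config(env: Mapping[str, str] = os.environ) -> AppConfig:
    """The config "file": reads env vars and returns the typed structure."""
    return AppConfig(
        db_url=env["APP_DB_URL"],                       # required
        workers=int(env.get("APP_WORKERS", "4")),       # optional, defaults to 4
        debug=env.get("APP_DEBUG", "false") == "true",  # optional flag
    )


config = load_config({"APP_DB_URL": "postgres://localhost/app", "APP_WORKERS": "8"})
```

Passing the environment in as a plain mapping keeps the function easy to test, and a type checker or IDE can flag a misspelled field before the program ever runs.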
Like it or not, YAML configuration files are everywhere. I've had a lot of luck using JSON Schema with YAML config files. Luckily, VS Code and possibly many other editors can be configured to provide type hints for completion. Using any available JSON Schema checker, you can validate your config files in CI and elsewhere.
Most of the time, what's missing is an accurate JSON Schema for a configuration. I usually encourage owners of those configurations to sit down and write it for everyone to benefit from.
YAML allows comments, I'll give it that. But what I really want is splitting configuration into multiple files that can be imported to others. And substitutions. Basically HOCON known on the JVM as https://github.com/lightbend/config
The white and orange/brown are hard to read for me. Both have less than 1.5 contrast in the chrome dev tools. To continue on the nitpicks, I think there's a typo, the procedural macros have "drive" instead of "derive". And the boxes of code taking all the page compared to the "terminal" text make it a bit hard to "get" the flow of the page.
Other than that it was a nice read, and the grey background is nice and easy on the eyes. I really like the breadcrumbs too, they act as a minimalist menu and easily clickable URL.
Thanks for the feedback. I think now that the bug with the syntax highlighting is fixed, the accessibility should be OK. I need to find a better way to test, though, because I am using the "invert" hack for the light theme and the built-in accessibility checkers in Chromium and Firefox don't appear to take this into consideration. I'll need to find a basic calculator and do the math manually when I have time.
Typo fixed.
Thanks for the flow feedback, I thought the full-width code was cool and it can help for wide code without making the page text too wide (it allows the code to grow to the right) but maybe it is more confusing than it is worth. And thanks for the feedback on the background and breadcrumbs, it is good to hear both positive and negative thoughts.
Thanks, it's way easier to read now. The flow feedback is very personal, feel free to ignore it. After rereading the article, I like how unique it is.
This is why, in the statically-typed programming language I’m working on, the project manifest is just a file written in the language itself which can export a special (typed) const to configure things like the linter. It gets to piggyback off of all the existing tooling for the language, particularly type checks, and can even be constructed using functions, etc if desired.
If we're on the topic of config languages, I'd like to plug Gura (https://github.com/gura-conf/gura). It's not too well-known, but it probably has the best design I've seen, and seems to have a good coverage of languages with an available library.
Re the first note in the post: a good serialization format is both easy for a machine to read and easy for a human to read and write. I think the text protobuf format is one such example. A (human read/write-able) config language needs to be consumed by a program anyway; in a sense, a config language is a human-to-computer serialization format.
Muphry's law is an adage that states: "If you write anything criticizing editing or proofreading, there will be a fault of some kind in what you have written." The name is a deliberate misspelling of "Murphy's law".
I am surprised that there is so little usage of decent scripting languages as configuration. Lua always seemed like it would have been perfect for that kind of thing, or Python, back a decade ago when Python was less... what it has turned into.
Yaml is just horrible; I've never had a good experience having to use it.
XMonad's config files are the text of Haskell Read instances, so they have to be valid members of the datatype in question. The type checking is very precise because of Haskell's fine grained type system, and it's all done automatically by the GHC runtime.
This works well if the program is written in a dynamic language like Lua, but in a compiled language it not only requires a recompile per change, but an entire development environment set up in order to modify the config at all.
> Statically typed programming languages are catching on so why don’t we extend this typing to our config files?
Because I simply don't want to expend the same amount of cognitive load to read config files as I do for code.
Yes, yaml has some minor ambiguities. These are easily solved. To use the example from the article:
countries:
- ca
- "no"
- us
There, done. The problem was solved with 2 extra characters and by remembering that `no` is special in YAML. Compare that to the amount of typing I have to do to define a schema, whose syntax I have to learn, and which I also have to read or remember and keep in mind every time I read the config. I'll take the 2 extra double quotes.
And, speaking of statically typed languages: This problem would be caught immediately anyway if the config is read into static types.
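A small sketch of what "caught immediately" could look like. The `parsed` dictionary below is hand-written to mimic what a YAML 1.1 style loader would hand back for an unquoted `no`; the `check_str_list` helper is made up for the example.

```python
def check_str_list(name: str, values: list) -> list:
    """Reject any element that did not come out of the parser as a string."""
    for index, item in enumerate(values):
        if not isinstance(item, str):
            raise TypeError(
                f"{name}[{index}]: expected str, "
                f"got {type(item).__name__} ({item!r})"
            )
    return values


# Stand-in for the loader's output when `no` is left unquoted.
parsed = {"countries": ["ca", False, "us"]}

try:
    check_str_list("countries", parsed["countries"])
    error = None
except TypeError as exc:
    error = str(exc)
```

The error names the offending entry, but by then the original text `no` is already gone, so the message can only report a boolean.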
YAML has multiple footguns like this, which I have to remember forever, as does anyone else who works with YAML. It's unintuitive and confusing, and costs space in my brain that I really should be using for more important things.
Not to mention that if you -don't- know about these ahead of time, debugging them can be confusing.
A type system is marginally more work for decreased cognitive load and eliminating stupid, idiotic bugs that nobody should have to waste their time tracking down.
With IDE integration, the cost is pretty much negligible other than learning the syntax, which, c'mon, is not difficult and we're being paid to do it.
There are even tools like Dhall [0] that auto-generate yaml for us.
So yes it’s a footgun but it makes some sense. Most people wouldn’t really complain about true not being equivalent to “true” or 100 not being equivalent to “100”. People just aren’t used to yes/no being reserved words. Ruby’s klass is a funny workaround to this.
enable_feature: yes
Is totally natural. Nobody reads the spec though. If you’re outputting YAML documents with string builders you’re headed for ruin no matter what. You don’t need Dhall, you need yaml.dump which handles the types too.
Yes, but now you are losing a lot of the clean syntax that causes most people to use YAML in the first place. There is a reason that most people don't write YAML like JSON with trailing commas and comments, it is nice to cut most of this noise.
You can also use a stricter subset of YAML that removes things like the "no" footgun. Plenty of such strict parsers exist across languages. Maybe it's no longer technically YAML at that point, but you get all the nice parts of YAML without having to revamp everything with static typing.
> Comparing that to the amount of typing I have to do to define a schema
In my use of config files, the code that reads them knows the types anyway. So for a setup like Rust+serde there is no overhead to set this up.
> This problem would be caught immediately anyway if the config is read into static types.
That is true, but it still breaks you out of your flow. You get a confusing error, it probably doesn't tell you the exact line number and you need to look over your changes. If you changed a lot of places in the file it may be easy to miss that adding `no` to a list was the mistake. Because problems like that are easy to understand in retrospect, but if you keep reading "no" as "Norway" it is easy to look straight at this mistake and think it is fine before hunting elsewhere in the file.
I think you are right. It is still unclear if the cognitive overhead when writing the file is worth it, but from my point of view the upsides are much more valuable than you make them appear to be.
I agree that the problem is one of quicker feedback - the dev cycle should involve a program checking the yaml correctness (using types or otherwise) straight away and giving a useful error message. Too often I've seen incorrect yaml checked into git that fails with a cryptic error when deploying the application.
The strength of types, in my opinion, is composability. Most config files I've seen have ultimately pulled in input from another source and used that to create their output. Types would allow the configuration to be checked for correctness even in the face of unknowns.
https://toml.io/en/