YAML: Probably not so great after all (arp242.net)
445 points by kylequest on Aug 18, 2019 | 444 comments

From my experience, while YAML itself is something one can learn to live with, the true horror starts when people start using text template engines to generate YAML. Like it's done in Helm charts, for example, https://github.com/helm/charts/blob/master/stable/grafana/te... Aren't these "indent" filters beautiful?

I developed Yet Another JSON Templating Language, whose main virtue was that it was extremely simple to use and implement, and it could be easily implemented in JavaScript or any other language supporting JSON.

We had joy, we had fun, we had seasons in the sun, but as I added more and more features and syntax to cover specific requirements and uncommon edge cases, I realized I was on an inevitable death-march towards my cute little program becoming sufficiently complicated to trigger Greenspun's tenth rule.


There is no need for Yet Another JSON Templating Language, because JavaScript is the ultimate JSON templating language. Why, it even supports comments and trailing commas!

Just use the real thing to generate JSON, instead of trying to build yet another ad-hoc, informally-specified, bug-ridden, slow implementation of half of JavaScript.
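To make the point concrete, here is a minimal sketch of the "JavaScript as JSON templating language" idea. The service names, image registry, and ports are made up for illustration:

```javascript
// Hypothetical example: generating config for several services with
// plain JavaScript instead of a text-template engine.
const services = ["web", "api", "worker"];

const config = {
  // Comments and trailing commas are fine here, because this is
  // JavaScript source, not JSON.
  replicas: 3,
  containers: services.map((name, i) => ({
    name,
    image: `example.com/${name}:latest`,
    port: 8080 + i,
  })),
};

// JSON.stringify handles all quoting, escaping, and indentation.
const output = JSON.stringify(config, null, 2);
console.log(output);
```

No "indent" filters, and the result is guaranteed to be syntactically valid.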

> the true horror starts when people start using text template engines to generate YAML

I just had a shiver recalling a Kubernetes wrapper wrapper wrapper wrapper at a former job. I think there were at least two layers of mystical YAML generation hell. I couldn't stop it, and it sapped much of the joy from my work. It was a factor in me moving on.

Oh my god! I'm working on the same wrapper wrapper wrapper

I called ours The Yamlith. Name yours!

What was the straw that broke the YAML's back?

get out.

The Yamdenburg would be apt since it gives you a pretty good indication of how that's going to end.

Oh the huyamlity!

try moving to something like http://skycfg.fun

Surely the right approach needs to be generating the desired data programmatically, rendering back to YAML if needed, rather than building these files with text macros.

Kubernetes works well but pretty it is not.

> Kubernetes wrapper wrapper wrapper wrapper at a former job

oh god why

Why would they have chosen to use template/text to generate YAML? That seems insane.

Surely using an encoder on an object/structure hierarchy (like people do with encoding/json) is the way to go?

On the other hand, the quality of the yaml libraries in Go wasn't great, last time I had to choose a configuration file format.

A lot of people working with YAML have an ops background and aren't familiar with basic data structures.

That’s disingenuous. Most “ops” folks I work with despise templating files but it’s the easiest way to parameterize things, especially when providing ways for “devs” to do “ops”.

Yes, we may not have a cleaner way of deploying k8s Deployment configs to different clusters but the desire to templatize YAML is easy for everyone to understand. The decision to abstract or templatize is one rooted in time and cost, not ability to understand data structures.

Personally, I'd prefer a templatized YAML file over an over-engineered, snowflake DSL created by a "real programmer" and not an ops person.

Ok, those aren't the only two choices though.

It probably starts with an existing YAML config file that you only need to pass one or two variables to. Then things get out of hand.

What is "an encoder"? Like a function that takes the same variables as the template would but does some work itself generating things?

An encoder is anything that serializes some data. Think `JSON.stringify()`.
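A quick sketch of why the encoder approach beats text templating: the moment a value contains a quote or a newline, naive string concatenation produces invalid output, while the encoder escapes it correctly. The value here is made up:

```javascript
// A value that happens to contain a quote and a newline.
const name = 'say "hello"\nworld';

// Naive text templating produces invalid JSON, because nothing
// escapes the embedded quotes or the raw newline.
const templated = '{"name": "' + name + '"}';
// JSON.parse(templated) throws here.

// An encoder serializes the data structure and escapes everything.
const encoded = JSON.stringify({ name });
console.log(encoded);
```

The same failure mode shows up in templated YAML as broken indentation or accidental multi-document output.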

YMMV, I believe most folks would call that "serialization," reserving "encoding" for turning a notionally written-down-ish representation into bytes; e.g. string -> utf8 bytes, or float -> IEEE-754 bytes.

Right, for some reason in Go (which Helm is written in), the standard library calls them encoders/decoders with marshal/unmarshal as the operations. Serialize is definitely the more common term generally.

I've seen it used even more generally for any `A -> B` (or `A -> Option<B>`) where B could be faithfully decoded back to A.

In my head, serialization is a special case of encoding when B is some type of string. In this case it's YAML.

I could probably have worded my previous comment more precisely.

At my old place we developed a small tool that wraps CloudFormation with a templating language (jinja2). This was actually great, as CloudFormation is extremely verbose and often unnecessarily complex. Templating it out and adding custom functions to jinja2 made the cfn templates much easier to understand.

I think it all depends. Most of the time I would agree that you shouldn't template yaml, but sometimes, it's the lesser of two evils.

Templating CFN is really good practice once you hit a certain scale. Suppose you have 5 DDB tables deployed to multiple regions, and on each of them you want to specify keys, attributes, throughput, and usage alarms, at a minimum; that's already 30-40 values that need to be specified, depending on table schemas. Add EC2, auto scaling, networking, load balancers, and SQS/SNS, and now untemplated CloudFormation is really unpleasant to work with.

Some of the values like DDB table attributes are common across all regions, other values like tags are common across all infra in the same region. Some values are a scalar multiple of others, or interpolated from multiple sources. For example, a DDB capacity alarm for a given region is a conjunction of the table name (defined at the application level), a scalar multiple of the table capacity (defined at the regional deployment level), and severity (owned by those that will be on-call).

To add insult to injury, a stack can only have 60 parameters, which you butt up against quickly if you try to naively parameterize your deployment.

Given all these gripes, auto-generating CFN templates was easiest for me. I used a hierarchical config (global > application > region > resource) so the deployment params could be easily manipulated, maintained, and where “exceptions to the rule” would be obvious instead of hidden in a bunch of CFN yaml. To generate CFN templates I used ERB instead of jinja, but to similar effect.

A side benefit of this is side-stepping additional vendor lock-in in the form of the weird and archaic CFN operators for math, string concatenation, etc. I don’t have a problem learning them, but it’s one of those things that one person learns, then everyone who comes after them has to re-learn. My shop already uses ruby, so templating in the same language is a no-brainer.
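The hierarchical-config idea above (global > application > region > resource) can be sketched in a few lines; the key names and values here are hypothetical, and the original used ERB in Ruby rather than JavaScript:

```javascript
// Layers in increasing specificity; later layers override earlier ones,
// so "exceptions to the rule" are visible right where they are defined.
const layers = [
  { tags: { team: "infra" }, readCapacity: 5 },   // global defaults
  { tableName: "orders" },                        // application level
  { readCapacity: 50, region: "us-west-2" },      // regional override
  { alarmThresholdPct: 80 },                      // resource level
];

// Shallow merge: each later layer wins on conflicting keys.
const params = Object.assign({}, ...layers);

// Derived values (e.g. an alarm as a scalar multiple of capacity)
// become ordinary expressions instead of CFN intrinsic functions.
const alarmCapacity =
  params.readCapacity * (params.alarmThresholdPct / 100);
```

The merged `params` object then feeds the template that emits the final CFN document.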

For CloudFormation: my team a few years ago got a lot of mileage out of using troposphere.


The basic type checking it did was quite helpful, and it avoided some of the dumb errors that we had run into when we attempted to do everything by hand.

> This was actually great, as CloudFormation is extremely verbose and often unnecessarily complex

I think it's the opposite: the leanest way to deploy AWS resources. Did you write it yourself, in a text editor? I've been doing that for 5 years now. You can omit values if you're fine with defaults; you only state what needs to be different. Another tip is to use Export and ImportValue to link stacks.

I kept on using JSON, even after all my buddies jumped on YAML. JSON is just more reliable: it's harder to miss syntax errors, and it can be made readable by not using linters and keeping long lines that belong on one line together. Also, the brackets are exactly what they are in Python :)

> wraps CloudFormation with a templating language (jinja2)

Not sure if it is a good idea. Everyone's use case is different, though. A well-written CFN template is like a rubber stamp: just change the Parameters. The template itself doesn't need to change.

Hell no, terraform is way better even though HCL isn’t the nicest DSL.

3x the LOC, 1/3 the speed, and weird state corruptions? Not to mention the dependence on a 3rd party.

k8s and helm is where I learned to dislike yaml. I now want a compiled and type safe language that generates whatever config a system needs.

I'm pretty much thinking I want Go as a pre-config where I can set variables, loops, and conditionals and that my editor can help with auto-complete. Maybe I can "import github.com/$org/helmconfig" and in the end write one or more files for config.

Helm 3 is moving to Lua, that may be better or worse.

Sounds like another short-sighted decision. Why don't they support an intermediary representation that many languages can target? Even YAML would be fine if other languages can generate it. If they absolutely had to pick something, why not something more mainstream and popular, like Python? Helm asks too much for the functionality it provides.

-1 for Turing complete config languages.

+1 for Turing complete programming languages instead of half-assed config languages.

Why are you putting logic in the config in the first place? Just let it be data.

That's like saying that God existed before creating the universe. Then you have to ask who created God? And if God was created only from data and not any logic, then the config file must have been really huge and unmaintainable.

Do you have an example of when logic would help a configuration file? Their value and use are entirely dependent on the context.

Also, I can't make head or tail of your statement, no offense intended; I can make out parts, but the whole just has no meaning to me.

Given any non-trivial data-only config file, it will always grow to the point that you'll end up needing to generate it automatically with logic. And that goes double for God's config file.

Do you have an example? I don't think I've ever needed to generate a config automatically (except in niche cases like generating configs for services in chef), and if I'm understanding correctly, nothing proposed in this thread would help with that scenario.

I suspect you're front-loading a lot of logic from the app bootup into the config file, and I'm not sure what you stand to gain conflating those two things.

Like—what's the execution order of a config file? Can you refer to a value later in the file? If so, how does it determine which value is executed before the next? If not, how do you set up circular dependencies—can you redefine config values halfway through a file? If not, you're gonna have to fall back on at-boot pre-processing anyway with sufficient complexity, so just treat a config like dumb data to begin with and do all your logic in the bootup and put all the values/whatever in the config file. And god knows, I would be strongly tempted to murder an engineer that introduced a config file capable of rewriting itself—that engineer has clearly never needed to debug another person's shitty code before.
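The "config as dumb data, logic at bootup" split can be sketched like this; the file contents, key names, and `bootConfig` helper are hypothetical:

```javascript
// The config file holds only literal values -- no logic, no ordering
// questions, no self-reference. (Inlined here as a string for brevity;
// in practice this would be read from disk.)
const rawConfig = JSON.parse(`{
  "workers": 4,
  "baseUrl": "https://example.com",
  "timeoutSeconds": 30
}`);

// Validation and derived values live in ordinary bootup code, where
// they can be debugged, instead of in a config file that rewrites itself.
function bootConfig(raw) {
  if (raw.workers < 1) throw new Error("need at least one worker");
  return {
    ...raw,
    timeoutMs: raw.timeoutSeconds * 1000,
    healthUrl: raw.baseUrl + "/health",
  };
}

const config = bootConfig(rawConfig);
```

The execution-order questions above simply don't arise: the data has no order, and the code has the ordinary order of a program.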

This thread hurts so bad.

I never used helm / kubernetes before 3 months ago.

Not 2 weeks ago I needed to loop in a helm config file in order to basically say "all this same config, libraries, etc., just run this other command instead" ... because someone who makes those decisions had ~100 lines of environment-injected configuration + boilerplate in the yaml that I couldn't get rid of, needed, and would have otherwise needed to copy / paste.

Since then, those environment variables have been pulled out into a different file (refactoring!), and now we replaced a loop over 100 lines of config, with 2x sets of 15-20 lines of config boilerplate. Better, but still a lot of bull. I don't know what the right answer is, because we've got less helm templating bullshit in there, but we still need boilerplate. Because it's not like I can tear down an entire kubernetes + helm infrastructure because I don't like how the config files are written.

Configs / config generation is hard, and generally awful. If you don't see it that way, congratulations; you're either a genius in your field, you don't have enough experience, and/or you're wrong. If you believe it's easy and we're all missing something, then please, by all means, write a book on how and why configurations aren't as hard as the rest of us say they are.

Best of luck to you.

> Configs / config generation is hard, and generally awful. If you don't see it that way, congratulations; you're either a genius in your field, you don't have enough experience, and/or you're wrong.

The point I'm trying to make is that you're describing broken frameworks, data flows, and work flows, and blaming it on config generation. If you have a counter example, I'd love to see it. Discussing these things in the abstract is pretty pointless and based in emotional language/semantic quibbling rather than meaningful things people can reason about and discuss, like code comparison or time tradeoffs.

Hell, because no specific GOOD examples of configuration-as-code have been brought up, literally everyone in this thread could be considering a different pet example of theirs. It's OBVIOUSLY a waste of everyone's time without examples. Why bother comment at all—to go out of your way to punch down without contributing to the discourse?

> punch down

You say this is easy. Seems to me that you're claiming to be elevated above us all with something we don't know, claiming that everyone else is doing it wrong, all the while hiding behind anonymity.

Stop clutching your pearls and playing the victim. No one is punching down; you're claiming knowledge you don't have and are being called out for it.

Look at any one of the references cited in the thread.

It's quite hard to claim anyone is wrong when nobody (in this thread...) has made substantial claims.

Were my claims insubstantial? Do you think I faked those videos, or didn't write the code I linked to?

Lots of examples in my other post, including links to some open source code (UnityJS).

I've developed programs with tens of thousands of lines of expanded JSON in their config files. No fucking way I'm maintaining all that by hand as pure data.

See: https://news.ycombinator.com/item?id=20735231

Also: https://en.wikipedia.org/wiki/Don%27t_repeat_yourself and https://en.wikipedia.org/wiki/Single_source_of_truth

The opposite of DRY (Don't Repeat Yourself) is WET (Write Everything Twice or We Enjoy Typing or Waste Everyone's Time) -- but twice could be ten times or more, there's no limit. Writing it all out again and again by hand as pure literal data, and hoping I didn't make any typos or omissions in each of the ten repetitions, without some sort of logical algorithmic compression and abstraction, would be idiotic.
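The "logical algorithmic compression" point can be made concrete with a tiny sketch: instead of hand-writing ten nearly identical config entries and hoping for no typos, state the pattern once. The region names and values here are made up:

```javascript
// One pattern, applied to every region. Changing the pattern updates
// every repetition at once -- a single source of truth.
const regions = ["us-east-1", "us-west-2", "eu-west-1"];

const tables = regions.map(region => ({
  region,
  table: `orders-${region}`,
  throughput: { read: 50, write: 25 },
}));

// Expand to literal data only at the last step.
const expanded = JSON.stringify(tables, null, 2);
```

The expanded output is still plain data that any consumer can read; only its production is programmatic.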

> I've developed programs with tens of thousands of lines of expanded JSON in their config files.

To what end? I don't get what you could possibly be putting in these files that could consume so much space without refactoring.

I posted some examples above. (Or maybe it's below. The long message with lots of links.)

FYI, here are some concrete examples, and some demos of a multi player cross platform networked AR system I developed that's based on shitloads of JSON config files describing objects, behaviors, catalog entries, user interface, a multi player networking protocol, etc.

Pantomime extensively (and extensibly) uses the simple JSON templating system I described in this other link, which I wrote in C#. Everything you see is a plug-in object, and they're all described and configured in JSON, and implemented in C#:


Pantomime Playground - Immersive Virtual Worlds for the Rest of Us


Pantomime Creatures - Real-Time Augmented Reality


Bug Farm – First Minute, First Person 3D


Consumer Augmented Reality Arrives – with Pantomime


If I were to rewrite it from scratch, I'd simply use JavaScript instead of rolling my own JSON templating system, because it would have been much more flexible and powerful.

Oh wait -- I DID rewrite at least some of that stuff from scratch! To illustrate that superior JavaScript-centric approach, here's an example of some other JSON based systems I developed with Unity3D and JavaScript, one for scripting ARKit on iOS, and the other for scripting financial visualization on WebGL, both using UnityJS (an extension I developed for scripting and configuring and debugging Unity3D in JavaScript).

One nice thing about it is that you can debug and live code your Unity3D apps running on the mobile device or in the web browser while it's running, using the standard JavaScript debugging tools!

UnityJS is a plugin for Unity 5 that integrates JavaScript and web browser components into Unity, including a JSON messaging system and a C# bridge, using JSON.net.


WovenAR Tools with ARKit and Pantomime (built with an early version of UnityJS):


A description of UnityJS, and how I use JavaScript and JSON to define, configure and control Unity3D objects:




Here's a demo of another more recent application using UnityJS that reads shitloads of JSON data from spreadsheets, including both financial data, configuration, parameters, object templates, etc.



Here is an article about how the JSON spreadsheet system works, which discusses some ideas about JSON definition, editing and templating with spreadsheets. It's about a year old, and I've developed the system a lot further since writing it.


>Representing and Editing JSON with Spreadsheets

>I’ve been developing a convenient way of representing and editing JSON in spreadsheets, that I’m very happy with, and would love to share!

>I‘ve been successfully synergizing JSON with spreadsheets, and have developed a general purpose approach and sample implementation that I’d like to share. So I’ll briefly describe how it works (and share the code and examples), in the hopes of receiving some feedback and criticism. Here is the question I’m trying to answer:

>How can you conveniently and compactly represent, view and edit JSON in spreadsheets, using the grid instead of so much punctuation?

>My goal is to be able to easily edit JSON data in any spreadsheet, conveniently copy and paste grids of JSON around as TSV files (the format that Google Sheets puts on your clipboard), and efficiently export and import those spreadsheets as JSON.

>So I’ve come up with a simple format and convenient conventions for representing and editing JSON in spreadsheets, without any sigils, tabs, quoting, escaping or trailing comma problems, but with comments, rich formatting, formulas, and leveraging the full power of the spreadsheet.

>It’s especially powerful with Google Sheets, since it can run JavaScript code to export, import and validate JSON, provide colorized syntax highlighting, error feedback, interactive wizard dialogs, and integrations with other services. Then other apps and services can easily retrieve those live spreadsheets as TSV files, which are super-easy to parse into 2D arrays of strings to convert to JSON.


>Philosophy: The goal is to leverage the spreadsheet grid format to reduce syntax and ambiguity, and eliminate problems with brackets, braces, quotes, colons, commas, missing commas, tabs versus spaces, etc.

>Instead, you enjoy important benefits missing from JSON, like comments, rich formatting, formulas, and the ability to leverage the spreadsheet's power, flexibility, programmability, ubiquity and familiarity.

More info:




I really appreciate the post, I have a much better understanding.

I don't use configs in this way, or if I did, I would not be inclined to call them configs. I can certainly appreciate the problem of processing of JSON objects in many different contexts. I was more referring to a UX concept of providing a configuration interface—short of something like emacs that gives full functionality, simpler and easily debuggable is emphatically better.

The topic of this discussion is YAML (and JSON as an alternative), and of this thread is "using text template engines to generate YAML", which covers a lot more than just config files. YAML and JSON and template engines are used for a hell of a lot more than just writing config files, but they're also very useful for that common task too. The issues that apply to config files also apply to many other uses of YAML and JSON. Dynamically generated YAML and JSON are very common and useful, and have many applications besides config files.

The fact that you've never done and can't imagine anything complicated enough to need more than a simple hand-written data-only config file doesn't mean other people don't do that all the time. It's simply a failure of your imagination.

What I can't understand is what you were getting at about "punching down". When you say things like "I would be strongly tempted to murder an engineer", that sounds like punching down to me. And why you were complaining nobody gave any examples, by saying "no specific GOOD examples of configuration-as-code have been brought up". Don't my examples count, or do you consider them bad?

So what was bad about my examples (or did you not read them or follow any of the links that you asked for)? Pantomime had many procedurally generated config files, using the JSON templating engine I described, one for every plug-in object (and everything was a plug-in so there were a lot of them), as well as some for the Unity project and the build deployment configuration itself. It also used dynamically generated JSON for many other purposes, but that doesn't cancel out its extensive use of JSON for config files.

Here are some concrete examples of some actual JavaScript code that dynamically generates a bunch of JSON, both to create, configure, send message to, and handle messages from Unity3D prefabs and objects, and also to represent higher level interactive user interface objects like pie menus.

What this illustrates should be blindingly obvious: that JavaScript is the ideal language for doing this kind of dynamic JSON generation and event handling, so there's no need for a special purpose JSON templating language.

Making a JSON templating language in JavaScript would be as silly as making an HTML templating language in PHP (cough i.e. "Smarty" cough).


JavaScript is already a JSON templating language, just as PHP is already an HTML templating language.

UnityJS applications create and configure objects by making lots and lots of parameterized JSON structures and sending them to Unity, to instantiate and parent prefabs, configure and query properties with path expressions, define event handlers that can drill down and cherry-pick exactly which parameters are sent back with events using path expressions (the handler functions themselves are filtered out of the JSON and kept and executed on the JavaScript side), etc.

At a higher level, they typically suck in a bunch of application specific JSON data (like company models and financial data), and transform it into a whole bunch of lower level UnityJS JSON object specifications (like balls and springs and special purpose components), or intermediate JSON user interface models like pie menus, to create and configure Unity3D prefabs and wire up their event handlers and user interfaces. Basically you're transforming JSON to JSON, and associating callback functions, and sending it back and forth in messages and events between JavaScript and Unity.

There are also a bunch of standard JSON formats for representing common Unity3D types (colors, vectors, quaternions, animation curves, material updates, etc), and a JSON/C# bridge that converts back and forth.


This is a straightforward function that creates a bunch of default objects (tweener, light, camera, ground) and sets up some event handlers, by creating and configuring a few Unity3D prefabs, and setting up a pie menu and camera mouse tracking handlers.

Notice how the "interests" for events include both a "query" template that says what parameters to send with the event (and can reach around anywhere to grab any accessible value with path expressions), and also a "handler" function that's kept locally and not sent to Unity, but is passed the result of the query that was executed in Unity just before sending the event. The point is that every "MouseDown" handler doesn't need to see the exact same parameters; it's a waste to send unneeded parameters, and some handlers need to see very specific parameters from elsewhere (shift keys, screen coordinates, 3d raycast hits, camera transform, other application state, etc). So each specific handler gets to declare exactly which if any query parameters are sent with the event, up front in the interest specification, to eliminate round trips and unnecessary parameters.


The following code is a more complex example that creates the Unity3D PieTracker object, which handles input and pie menu tracking, and sends JSON messages to the JavaScript world.pieTracker object and JSON pie menu specifications, which handle the messages, present and track pie menus (which it can draw with both the JavaScript canvas 2D api and Unity 3D objects), and execute JavaScript callbacks (both for dynamic tracking feedback, and final menu selection).


Pie menus are also represented by JSON of course. A pie can contain zero or more slices (which are selected by direction), and a slice can contain zero or more items (which are selected or parameterized by cursor distance). They support all kinds of real time tracking callbacks so you can provide custom feedback. And you can make JSON template functions for creating common types of slices and tracking interactions.

This is a JavaScript template function MakeParameterSlice(label, name, calculator, updater), which is a template for creating a parameterized pie menu "pull out" slice that tracks the cursor distance from the center of the pie menu, to control some parameter (i.e. you can pick a selection like a font by moving into a slice, and also "pull out" the font size parameter by moving further away from the menu center, and it can provide feedback showing that font in that size on the overlay, or by updating a 3d object in the world, to preview what you will get in real time). This template simply returns a blob of JSON with handlers (filtered out before being sent to Unity3D, and kept and executed locally) that does all that stuff automatically, so it's very easy to define your own "pull out" pie menu slices that do custom tracking.
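A heavily simplified sketch of the shape being described (this is NOT the actual UnityJS API; the function body, field names, and the stripping mechanism here are guesses for illustration):

```javascript
// Hypothetical sketch: a JSON template function whose "handler" stays
// on the JavaScript side, while the rest of the blob is serializable
// and can be sent across to the engine.
function MakeParameterSlice(label, name, calculator, updater) {
  return {
    label,
    name,
    items: [{
      // This function is kept and executed locally; JSON.stringify
      // silently drops function-valued properties, so it never
      // appears in the serialized spec.
      handler: (distance) => updater(name, calculator(distance)),
    }],
  };
}

// Build a "pull out" slice that maps cursor distance to a font size.
const slice = MakeParameterSlice(
  "Font Size", "fontSize",
  (d) => Math.round(d / 10),
  (key, value) => console.log(key, value));

// The wire spec is plain JSON; the callback stayed behind.
const wireSpec = JSON.stringify(slice);
```

The real system presumably does the function-stripping and event plumbing more carefully; the point is just that templates are ordinary functions returning data plus callbacks.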


It sounds like you're assuming that configurations are only created before running a program. But you can also create them while programs are running, to configure dynamically created objects or structures, too. And you can send those configurations as messages, to implement, for example, a distributed network object system for a multi player game. So you may be programmatically creating hundreds of dynamic parameterized "configuration files" per second.

How about normally-Turing-complete languages that can be stripped down to non-Turing-completeness to make a configuration DSL?

This is exactly what Tcl supports / was designed to do (and in turn is one of my motivations for developing OTPCL). This is also exactly what your average Lisp or Scheme supports.

Any reason why you think they're bad? It sounds enticing to me to be able to have a bit of logic in a configuration file.

Programming language design and implementation is a huge and hard problem. What you get is an incomplete frustrating language full of semantic oddities and confusions without any serious support tooling to help you out.

If you use it in anger, you quickly need all the usual language features: importing libraries, namespaces, functions, data structures, rich string manipulation, etc. But you rarely get these.

At run-time, you don’t have a debugger or anything leading to a maddening bug fix experience because config cycle times are really high.

Because it’s a niche language, only one poor soul in a team ends up the expert of all the plentiful traps.

Eventually... you give up and end up generating the config in a proper language and it feels like a breath of fresh air.

Why not use an actual language instead? Like Guix uses guile.

One of the most ridiculous examples of this was the Smarty templating language for PHP.

Somebody got the silly idea in their head of implementing a templating language in PHP, even though PHP is ALREADY a templating language. So they took out all the useful features of PHP, then stuck a few of them back in with even goofier inconsistent hard-to-learn syntax, in a way that required a code generation step, and made templates absolutely impossible to debug.

So in the end your template programmers need to know something just as difficult as PHP itself, yet even more esoteric and less well documented, and it doesn't even end up saving PHP programmers any time, either.


>Bad things you accomplish when using Smarty:

>Adding a second language to program in, and increasing the complexity. And the language is not well spread at all, although it isn't hard to learn.

>Not really making the code more readable for the designer.

>You include a lot of code which, in my eyes, is just overkill (more code to parse means slower sites).


>Most people would argue, that Smarty is a good solution for templating. I really can’t see any valid reasons, that that is so. Especially since “Templating” and “Language” should never be in the same statement. Let alone one word after another. People are telling me, that Smarty is “better for designers, since they don’t need to learn PHP!”. Wait. What? You’re not learning one programming language, but you’re learning some other? What’s the point in that, anyway? Do us all a favour, and just think the next time you issue that statement, okay?


>I think the Broken Windows theory applies here. PHP is such a load of crap, right down to the standard library, that it creates a culture where it's acceptable to write horrible code. The bugs and security holes are so common, it doesn't seem so important to keep everything in order and audited. Fixes get applied wholesale, with monstrosities like magic quotes. It's like a shoot-first-ask-questions-later policing policy -- sure some apps get messed up, but maybe you catch a few attacks in the process. It's what happened when the language designers gave up. Maybe with PHP 5 they are trying to clean up the neighborhood, but that doesn't change the fact when you program in PHP you are programming in a dump.

> One of the most ridiculous examples of this was the Smarty templating language for PHP.

Wow... Yuck!

Lua is at least a "standard" and rather sane language. Not ad hoc insanity.

You should check out dhall-lang

Are you looking for something like https://jsonnet.org/ ?

Some templating languages such as Jsonnet[0] add just enough programmability to cover basic operations like parameterization and iteration.

I originally felt it was overly complex, but after seeing some of the Go text/template and Ansible Jinja examples in the wild, it actually seems like a good idea.

Perhaps we should more strongly distinguish between “basic” data definition formats and ones that need to be templated. JSON5 for the former and Jsonnet for the latter, for example.

Agreed, text templating of YAML (or any structured content) does not make sense. Too much context (the actual config structure) is lost if plain text is used.

I've collaborated on ytt (https://get-ytt.io), a YAML templating tool. It works directly with the YAML structure to bind templating directives. For example, setting a value is associated with a specific YAML node, so you don't have to do any manual indenting etc. like you would with plain-text templating. Defining functions that return YAML structures becomes very easy as well, and common problems such as improperly escaped values are gone.

I'm also experimenting with a "strict" mode [1] that raises errors for questionable YAML features, for example, using NO to mean false.

I think that YAML is here to stay (at least for some time) and it's worth investing in making tools that make dealing with YAML and its common uses (templating) easier.

[1] https://github.com/k14s/ytt/blob/master/docs/strict.md
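The NO-means-false behavior is easy to reproduce with any YAML 1.1 parser; a minimal sketch, assuming PyYAML is installed:

```python
import yaml  # PyYAML, a YAML 1.1 parser

# Unquoted NO/no/off/on resolve to booleans under YAML 1.1 rules,
# so a list of country codes silently corrupts Norway's entry.
doc = yaml.safe_load("countries: [DE, FR, NO]")
print(doc["countries"])  # ['DE', 'FR', False]
```

Quoting the value ("NO"), or using a strict mode like the one above, avoids the surprise.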

Shoot, it will be hell to test such a monster. Say you have a single typo somewhere, lol

The issue is, I think most people (myself included) come to YAML as basically a JSON alternative with lighter syntax, without really realizing, or perhaps without internalizing, the rather ridiculous number of different ways to represent the same thing, the painful subtle syntax differences that lead to entirely different representations, and the sometimes difficult-to-believe number of features that the language has that are seldom used.

It's not just an alternate skin for JSON, and yet that's what most people use it for. Some users also want things like map keys that aren't strings, which is actually pretty useful.

I recall there being CoffeeScript Object Notation as well... perhaps that would've been better for many use cases, all things said.

I've never understood this. JSON is really not that difficult to work with manually. I tend to write my config files as JSON for utilities I write. What is it with peoples' innate aversion to braces?

I don't have an aversion to braces. Rather, my issues with JSON are that it doesn't have comments and that you cannot use an optional trailing comma.

Having spent a nontrivial amount of my life hand editing large JSON files now, I have to agree here. Lack of comments and trailing commas are a real QOL issue.

Also no multiline strings. Using \n or string arrays is painful.
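A side-by-side illustration of the multiline-string pain, using the stdlib json module (and PyYAML for the YAML half, which is an assumption):

```python
import json
import yaml  # PyYAML (assumed installed)

# JSON: newlines must be embedded as \n escapes inside one quoted string.
j = json.loads('{"script": "line one\\nline two\\n"}')

# YAML: a literal block scalar (|) keeps the newlines as written.
y = yaml.safe_load("script: |\n  line one\n  line two\n")

# Both decode to the same two-line string.
assert j["script"] == y["script"] == "line one\nline two\n"
```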

I don't get why TOML is so underrated, it's barely mentioned in the HN discussion

Thanks for mentioning it; I hadn't encountered it yet and it seems like a very sane config file format. There's several mentions in this discussion actually, nearly all positive.

And the required double quotes around strings. YAML’s string handling is a lot easier to deal with.

I think it is good to require quotation marks for strings, at least for values (although I could live with it if quotation marks for strings are allowed even if not required, since then, if you do not like the feature of not having quotation marks for strings, you can just not use that feature).

Maybe it would make sense if quotation marks were not required for keys that use only a restricted character set and are not the empty string, though.

No quotes around keys would be sufficient, honestly. I use YAML a lot for API documentation, and there are still some cases where wrapping your values in quotes is necessary. But requiring it for keys becomes very annoying.

It’s also a lot less obvious. What’s so difficult about wrapping a string in double quotes?

It gets annoying when you have to do what seems unnecessary.

It's not really unnecessary when someone can write an entire article on how unfathomable the output is without them.

It’s easier until you hit one of the cases where a particular value is interpreted as a different type, possibly in a very confusing context. I’ve seen that bite enough people that I end up quoting strings to avoid confusion.

It’s also extremely hard to learn as a beginner.

“Hey, I deleted a character in a string and now I am getting this weird schema validation exception”.

Or “Why did it break when I changed the version from 3.7 to 3.7.1?”
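That version-number trap comes from YAML's implicit typing; a quick demonstration, assuming a YAML 1.1 parser such as PyYAML:

```python
import yaml  # PyYAML (assumed installed)

# An unquoted 3.7 resolves to a float; adding a patch component makes it
# a plain string, so the value's type silently changes under you.
print(type(yaml.safe_load("version: 3.7")["version"]))    # <class 'float'>
print(type(yaml.safe_load("version: 3.7.1")["version"]))  # <class 'str'>
```

Quoting the value ("3.7") pins it as a string either way.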

JSON5 actually looks good. It has all of the added features that it should have.

They don't mention mismatched surrogates and otherwise invalid Unicode characters, but they should perhaps be implementation-dependent, like duplicate keys are. (It can either allow them or report an error.) There is also the possibility that some implementations may wish to disallow "\x00" or "\u0000" too, I think.

The one thing I disagree with is the part about white space. U+FEFF should be allowed only at the beginning of the document (optionally), and other than that only ASCII white space should be allowed. Unquoted keys should also be limited to ASCII characters.

Other than that, I think it is good.

JSON is serviceable as an intermediate format, machine-generated and machine-consumed.

It is outright bad as a human-operated format. It explicitly lacks comments, it does not allow trailing commas, it lacks namespaces, to name a few pain points.

YAML is much more human-friendly, with all its problems.

I often hear the “comments aren’t supported” argument against JSON, but as a daily consumer, creator, and maintainer of JSON, I honestly can’t recall ever _really_ needing comments in JSON. It tends to be somewhat self documenting in my experience.

A config file without comments can mean serious annoyance.

If JSON had been developed more recently, in the open on a place like GitHub, it would never have ended up with that many deficiencies.

When maintaining a JSON file, did you ever happen to wonder why a particular value is what it is?

This is where comments belong.

If it's that important and complex, have an accompanying README that lists line numbers and comments.

Good idea! And then put a comment into the configuration file that refers to where the documentation is.. ah, f__k!

Lesson to learn: Nobody reads the docs.

This is such a dumb aphorism. I read and create docs every single day.

If the comments are so critical that it is a problem, then an accompanying file with those comments would be used. Otherwise, it's just a bunch of crocodile tears.

The lack of comments is the real problem. When you need to explain why a particular parameter in the config file is set a certain way JSON becomes a real problem.

Comments, comments, comments.

Seriously, our batch jobs, for better or worse, have configs with a bunch of parameters that are passed around as JSON. While most variable names are intuitive, there is documentation on the wiki, and the config can most often be autogenerated by other tools, it would still be better if, when I manually open the config itself, I could easily see the difference between n_run_threads vs n_reg_threads, etc...

JSON's lack of integer types is what ruins it for me

A JSON alternative with lighter syntax and comments is basically what I tried to make with StrictYAML.

I made it largely because I saw a disconnect between what YAML was and what people - including me - thought it was (which is what it should be).

Don't agree with non-string map keys though... they're a complication I never saw a use for.

They’re fairly useful in applications that use numeric IDs. For example, if I’m using SQL, and I have a table with an AUTOINCREMENT primary key, I’m going to have a lot of numeric IDs. If I want to reference these in a config file of some kind, I don’t want to have to read them as strings and handle the parsing on my end.

Even if you’re of the opinion that IDs shouldn’t be numeric, there are a lot of cases where you’re stuck with integers—on Linux, user IDs, group IDs, and inodes are just a few examples.
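JSON's string-only keys force exactly that parse-on-your-end step; a quick stdlib illustration:

```python
import json

uids = {1001: "alice", 1002: "bob"}

# json.dumps silently coerces the int keys to strings...
encoded = json.dumps(uids)

# ...so after a round trip the caller has to convert them back by hand.
decoded = json.loads(encoded)
restored = {int(k): v for k, v in decoded.items()}

assert list(decoded) == ["1001", "1002"]  # keys came back as strings
assert restored == uids
```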

Ah I see, yes that makes perfect sense. I've used integer keys too. Sorry, I thought by non-string you meant non-scalar - i.e. the idea of using lists as keys (allowed in YAML).

I think they did mean nonscalar keys. Say I have a compound primary key in a database, over 3 columns. In YAML, representing that key as an array of the three columns' values (or a map from column name to value) makes sense, and so does using that as a key in other maps.

I was suspicious of YAML from day one, when they announced "Yet Another Markup Language (YAML) 1.0", because it obviously WASN'T a markup language. Who did they think they were fooling?


XML and HTML are markup languages. JSON and YAML are not markup languages. So when they finally realized their mistake, they had to retroactively do an about-face and rename it "YAML Ain’t Markup Language". That didn't inspire my confidence or look to me like they did their research and learned the lessons (and definitions) of other previous markup and non-markup languages, to avoid repeating old mistakes.

If YAML is defined by what it Ain't, instead of what it Is, then why is it so specifically obsessed with not being a Markup Language, when there are so many other more terrible kinds of languages it could focus on not being, like YATL Ain't Templating Language or YAPL Ain't Programming Language?


>YAML (/ˈjæməl/, rhymes with camel) was first proposed by Clark Evans in 2001, who designed it together with Ingy döt Net and Oren Ben-Kiki. Originally YAML was said to mean Yet Another Markup Language, referencing its purpose as a markup language with the yet another construct, but it was then repurposed as YAML Ain't Markup Language, a recursive acronym, to distinguish its purpose as data-oriented, rather than document markup.


>In computer text processing, a markup language is a system for annotating a document in a way that is syntactically distinguishable from the text. The idea and terminology evolved from the "marking up" of paper manuscripts (i.e., the revision instructions by editors), which is traditionally written with a red or blue pencil on authors' manuscripts. In digital media, this "blue pencil instruction text" was replaced by tags, which indicate what the parts of the document are, rather than details of how they might be shown on some display. This lets authors avoid formatting every instance of the same kind of thing redundantly (and possibly inconsistently). It also avoids the specification of fonts and dimensions which may not apply to many users (such as those with varying-size displays, impaired vision and screen-reading software).


YAML is bad.

Every YAML parser is a custom YAML parser.


The problem is with the parsers, how they are implemented or used. YAML actually has a way to specify the type of the data; alternatively, the application is supposed to suggest the desired type. What this article is showing is which types are assumed when they are not specified.

Oh Puppet, why did you use your own executable YAML.

I'll say it: I think YAML is great and a joy to use for configuration files. I can write it even with the dumbest editor, I can write comments, multi-line strings, I can get autocompletion and validation with JSON schema, I can share and reference other values. It allows tools to have config schemas that read like a natural domain specific language, but you already know the syntax. I haven't had problems with it at all.

This was me too - until yesterday, when I made a minor change to one of our YAML config files and everything broke. On investigation it turned out that all of our YAML files had longstanding errors but those errors happened to be valid syntax and also did not cause any bad side effects, so we had been getting away with it by pure luck until I made a change that happened to expose the problem.

So now no longer a YAML fan...

That would make me not a fan of the particular parsers/validators I've been using, rather than not a fan of YAML.

The big strike against YAML I see there is that it needs a good conformance test suite and implementations need to be tested against it. But that's not a problem with the format but a fairly easy to fix ecosystem problem.

> of the particular parsers/validators

But the syntax was valid, the parsers/validators would've been correct to accept it.

I agree. As long as you're using a strict parser, I've found YAML to be much nicer for configuration than JSON. I use Python's ruamel.yaml library, and have never had any weird type problems. Once the nesting gets too deep, it can be a pain, but that's the same for JSON.

I have found myself using TOML more and more for configuration, though. It helps a lot with keeping things flat and easy to read. I'll still prefer YAML over JSON for human-writable files, but I'm starting to prefer TOML over YAML.

I've got to say it is the most frustrating config format I've ever had to write. The only time I have to use it is for Docker Compose, and I am constantly fighting vim on indentation and trying to make sense of confusing errors about "unexpected block start." Do you have any suggested vimrc for YAML?

fish shell is looking for a new text serialization format for its history file (currently it uses an ad-hoc broken pseudo-YAML).

Boxes to check:

1. Self describing format

2. SAX-style parser available to C++

3. Easy for users to understand and ad-hoc parse using command-line tools

4. No document closing necessary, so appending is trivial

YAML looks pretty good:

    - cmd: git checkout file.txt
      when: 1565133286
      pwd: /home/me/dir/
      paths:
        - file.txt
protobuf is also an option:

    entry {
      cmd: "git checkout file.txt"
      when: 1565133286
      paths: "file.txt"
    }
though I am unsure of how well its text serialization is supported.

Any suggestions?

Disclaimer: I work on Tree Notation. (https://github.com/treenotation/jtree)

Here's a proposal: use a Tree Language.

I created a demo for you called "Fished": https://github.com/breck7/fished.

It took me just a few minutes, but I already get type checking, autocomplete, syntax highlighting, and more.

Tree Notation is early, and there will be kinks until the community is bigger, but I think it may be useful for you.


Wow. This looks really cool. Is there a sort of design defense on how this was designed (tree notation)?

Not sure if I’m familiar with the term “design defense”. Can you explain?

Stumbled into the idea. Basically just brute forced it. Tried thousands of things, built a huge database of languages, and tried to keep it simple.

I take the idea of “design defense” from Pyramid (Python Web Framework) [0], and have incorporated it into documentation on my projects.

Basically, it’s a narrative discussion of how this solution came to be, the trade offs involved, and perhaps its relationships with prior art.

[0] https://docs.pylonsproject.org/projects/pyramid/en/1.10-bran...

Do you know if this is called something else? (perhaps in other disciplines)

Googling "Design defen{c,s}e" just got me a whole lot of military contractors.

http://enwp.org/Design_document is the commonly used term of art.

Very cool. Thank you. Seems like it would be a useful exercise. Added to the todo list.

How about JSONL (JSON Lines)? http://jsonlines.org/

Ps. Thanks for (all the) fish, it's my daily driver shell and keeps me that much more sane c.f. the alternatives.

That's really close to [RFC 7464](https://tools.ietf.org/html/rfc7464), JSON Text Sequences. It uses U+001E RECORD SEPARATOR. The `jq` tool supports those if you pass a flag.

So just add a record separator at the end of each line?

jq can also handle newline-separated JSON objects, with the -s flag.

(suggestion) Drop the 4th requirement.

Having to close contexts is a VERY good 'sanity check' to see if something is malformed or not.

If appending is necessary, make the parser handle multiple copies of the namespace and merge them upon output. Unknown keys and sections should also always be copied from input to output (this is how you embed comments).

I'm interested, can you explain more about the merging idea?

To clarify the requirement, history could be a JSON array of objects:

        [
          {"cmd": "git checkout", "when": 1234 },
          {"cmd": "vagrant up", "when": 4567 }
        ]

To append an entry to this file and keep it valid, one must locate the closing square bracket and overwrite it. That work is what I hope to avoid.

Why not use a “stream” of objects?

  {"cmd": "git checkout", "when": 1234}
  {"cmd": "vagrant up", "when": 4567}
Not sure about other languages and libraries, but Go supports this out of the box[1]. And while we're at it, why not CSV? That can be processed with awk.

  "git checkout","1234"
  "vagrant up","4567"
[1]: https://play.golang.org/p/sTN9z4Kv3DB
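The stream-of-objects approach makes appends trivial; a minimal Python sketch of the idea (file name is just for illustration):

```python
import json
import os
import tempfile

# One JSON object per line: appending never touches earlier bytes,
# and there is no closing bracket to locate and overwrite.
path = os.path.join(tempfile.mkdtemp(), "fish_history.jsonl")

def append_entry(entry):
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

append_entry({"cmd": "git checkout", "when": 1234})
append_entry({"cmd": "vagrant up", "when": 4567})

# Reading back is one json.loads per line.
with open(path) as f:
    entries = [json.loads(line) for line in f]

assert [e["cmd"] for e in entries] == ["git checkout", "vagrant up"]
```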

CSV is fine for simple cases but has issues with versioning (adding new/optional fields) and nested data like arrays.

The object stream idea is apparently supported widely and seems pretty strong. Thanks for the suggestion!

I use JSON streams a lot with command line tools. Keep in mind that it’s limited in that each object must consume a single line. This allows you to recover from syntax errors in a single entry; each line is a fresh start.

Have you considered SQLite? I know it’s not a friendly text format, but it alleviates a lot of the issues with the “append to a text file” approach, such as concurrency. It’s great for this sort of thing.

Loosely running with your example:

A file already exists with this content:

[{"cmd": "git checkout", "when": 1234 }]

Another tool wants to add a setting/element/etc., and simply creates a new config object with only the change in question included, then appends it to the existing config file.

[{"cmd": "git checkout", "when": 1234 }][{"cmd": "vagrant up", "when": 4567},{"comment": "Comments, notes, etc are kept even if they don't validate to the recognized configuration options."}]

A configuration validator / etc. loads the config, merges these in order (overwriting existing values with the latest ones from the end of the stream of objects), and then determines whether the result is a valid configuration. (Maybe file references fail to resolve / don't open / there's some combination of settings that's not supported...)

      {"cmd": "git checkout", "when": 1234 },
      {"cmd": "vagrant up", "when": 4567},
      {"comment": "... actually kept as above"}
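The stdlib json module can already walk such back-to-back documents via json.JSONDecoder.raw_decode; a rough sketch of the merge idea (function name is hypothetical):

```python
import json

def load_concatenated(text):
    """Parse back-to-back JSON arrays and merge their objects in order."""
    decoder = json.JSONDecoder()
    merged, idx = [], 0
    while idx < len(text):
        # raw_decode parses one document starting at idx and
        # returns (object, index just past it).
        chunk, idx = decoder.raw_decode(text, idx)
        merged.extend(chunk)  # later entries come last, so they win downstream
        while idx < len(text) and text[idx].isspace():
            idx += 1          # raw_decode does not skip whitespace itself
    return merged

appended = '[{"cmd": "git checkout", "when": 1234}][{"cmd": "vagrant up", "when": 4567}]'
assert [e["cmd"] for e in load_concatenated(appended)] == ["git checkout", "vagrant up"]
```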

> Unknown keys and sections should also always be copied from input to output (this is how you embed comments).

Better than nothing I guess, but I'd say just use a syntax that supports comments.

S-Expressions are quite simple, there are some parsers floating around in well-known projects, although I'm not sure they're SAX-style: https://leon.bottou.org/projects/minilisp

I also wonder if you need a text format, or if SQLite or systemd's journal API would work.

I love proto, but the text format was an afterthought. The binary format is rigorously defined, portable, extensible, and optimized. The text format was reverse-engineered from the C++ implementation after the fact, when folks found textproto useful. Unfortunately there are discrepancies between languages around the corner cases of the text format, and that's the sad world we live in. Avoid letting textproto be part of your user-exposed interface.


TOML would be great, if not for an annoyingly obscure detail in the specification that makes it hard to use for my typical use cases (scientific computation) [1]. Moreover, I find it quite unintuitive how you are supposed to specify an array of tables [2]: this kind of thing is much easier in JSON (which is the format I am currently using, although it is far from perfect).

[1] https://github.com/toml-lang/toml/issues/356

[2] https://github.com/toml-lang/toml#user-content-array-of-tabl...

That is an annoyingly obscure detail; I think the wrong decision was made there. Hopefully it gets reversed.

Personally I really like the array of tables syntax. It is a little unintuitive but it's not difficult to remember. It's useful for fulfilling the OP's "No document closing necessary, so appending is trivial" requirement.

And if you don't want to use it, you can always use inline tables in an array, just like JSON.
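For reference, the two spellings look like this (the `entry` tables are hypothetical, loosely echoing the history-file example above):

```toml
# Array-of-tables form: each [[entry]] header appends one table to the
# "entry" array, so new records can simply be appended to the file.
[[entry]]
cmd = "git checkout file.txt"
when = 1565133286

[[entry]]
cmd = "vagrant up"
when = 1565133300

# Inline form, closer to JSON:
# entry = [ { cmd = "git checkout file.txt", when = 1565133286 } ]
```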

Can't recommend TOML enough. I use it for everything. Super simple and easy to edit.

It fulfills all of the requirements. There are several available C++ TOML parsers, including one from Boost.

Does the limitation that arrays must be of a single type ever come up as an issue for you?

I haven't run into that. I think in the contexts for which TOML was meant, it isn't that big of an issue though. Also, I'm not entirely sure how many parsers enforce that restriction.

TOML is great when your data are mostly flat.

It's great at representing nested data in a flat way, as well.

Obligatory "thanks for fish shell".

Try just using line-delimited JSON objects (http://jsonlines.org/). It ticks all of your boxes, especially 3: "jq '.cmd' fish_history | histogram".

Neither YAML nor Protobufs are quite as easy as that.

All in all, it's ridiculously simple, easy to parse in a variety of languages, and each row is a single line that's simple to parse iteratively without loading the whole thing into memory.

This seems like it makes JSON useful for logging, but not too useful as configuration. For instance, it doesn't support comments, and every object needs to have all its children compressed onto one line.

Indeed, it's not suitable for configuration. But we are talking about logging shell history, not configuring it.

Tcl with control structure commands disabled and infix assignment for convenience. Jim Tcl is a lightweight implementation if the main line isn't workable.

I'm tempted to suggest CSV.

I've used YAML as the format for a config file, and I certainly regret that choice. Trying to explain to someone that doesn't know YAML how to edit it without setting them up for failure is quite annoying. There are too many non-obvious ways to screw up, like forgetting the space after the colon or of course bad indentation.

YAML is easier to read and write. That's the benefit. It's also always going to be smaller than the equivalent JSON or XML. Maybe it's not as correct, maybe some people don't like it; I don't really mind it. I don't see it going anywhere soon either, considering Kubernetes and the lack of alternatives in widespread usage.

I've never had someone who needed extensive help understanding YAML, and that's despite reviewing work from people just coming up to speed. Find me an IDE or editor that doesn't have YAML support. Also, YAML supports comments, so if there are pitfalls people need to know about, you can document them inline.

Your argument is that people who don't know things might screw stuff up. Well, yeah! That applies to everything.

>”YAML is easier to read and write.”

You may be surprised to find that there’s significant disagreement on that point.

Not surprised at all, people on the internet complain about everything. When they build something better I'll be the first to jump ship.

> When they build something better I'll be the first to jump ship.

Will you though?

With learning a tool (almost any tool), its value increases with more use until you find a local maximum. The amount of effort to switch to something with a higher maximum at that point will definitely be considered by most people as part of the cost of that competitor. It is very hard to write off years of your life on anything.

When was the last time you did this?

It could take a few months at a minimum, just to see if you can understand it. Then, perhaps years of building things another way just to get good at it. Would you really give a few years of your life in making configuration suck less forever?

If you're serious, I have a potential solution: We don't have configuration files (or indention) or parsing problems, and users aren't the slightest bit confused on how to configure our applications (although they are often surprised if they've ever had to use a configuration file!). The downside is it's going to require a lot of re-learning on your part, and there's little I can do to make it any easier for you.

Will I? Personally, I enjoy tinkering with new tech. If we look at trends in technology, it's an absolute guarantee.

The last time I wrote off technology? In 30 years I've been doing this, I would say constantly, especially with tooling. I wouldn't consider the lessons learned using defunct tools to be lost either.

I'm not complaining about serialization formats, so why would I dedicate time to making it better?

Solution? I'm a quick learner so I'm not overly concerned. In a perfect world, you could just point me to your documentation.

Ok, I'll give it a shot.

Smalltalk environments don't typically have "configuration files" because it's so much easier to just directly manipulate the configuration parameter. Interactive development -- the kind impossible with anything but the most specialised of IDEs -- simply makes "config files" obsolete. Take a look at the seaside configuration guide[1] to get an idea of what this is like.

In k/q[2] we also don't typically use configuration files, but for a different reason: Every type can be serialised over network or onto the disk, and when it's written to the disk it's usually in a format we can mmap() and access transparently. This is how q is also a database -- the data types q supports also includes tables.

k/q also has a built-in event loop that's not dissimilar to what you get when you run nodejs with the debugger port open except it's fast, and it's the regular way k/q processes communicate with each other.

What typically happens is that a table is designed for configuration, and we just expose it to the UI. Production environments are usually locked down so the only UI is to edit existing configuration parameters (and those are permissioned accordingly). These UI are typically quite general for any k/q data type, so they're quite rich and easy for people to use.

Then parts of the application interested in configuration just query the appropriate configuration table - this is only about 1000x faster than you would expect connecting to a remote database, and in many ways it's similar to a python application just storing its config in an sqlite database, except SQLite doesn't let you have a table as a data type so you can't put a table into another table, and you don't have tooling around comments and advice like you do with k/q UIs.

There are other places if you look carefully: Environments people have used (even beyond thirty years ago) that didn't have configuration "files", often had interesting and useful solutions to storing configuration. People tended to build configuration into part of their application, and so the legacy of that has tended to be excellent tooling instead of novel file formats.

[1]: http://www.shaffer-consulting.com/david/Seaside/Configuratio...

[2]: https://kx.com/

You're trying to tell me a database or passing config options to your program are better than configuration files? What you're describing is the entire reason and purpose of why configuration files exist. YAML is also not limited to configuration files. If we look at the example you provided it's pretty clear:

"From the point of view of a component, the configuration can be thought of as a Dictionary-like collection"

This is exactly what YAML and JSON provide too. Using a configuration file is a choice, so are parameters, and so are databases. I'm not really understanding what you're trying to get at?

No. I'm saying when your application language is also your database or your operating environment [or for some other reasons], you don't need a configuration format.

The reason configuration file [formats] exist is because many programs are configurable and programmers are too lazy (or are not specified to, take your pick) to build a configuration tool [that has all their needs]. Configuration files are inferior in every way to an integrated and well-thought-out configuration process except that they may be easier to build and use in less ideal environments.

JSON is a fine format for interchange, and even persistence (i.e. to store configuration) but as a "configuration file" that people are expected to edit in their own way it is lacking, and that's why there are things like YAML and TOML and a million other things.

Giving meaning to whitespace causes so many headaches and yet people still embrace Python, for some reason. I don’t understand it.

Your editor makes a world of difference here. Since you shouldn't be writing brace-language code without indentation anyway, the biggest remaining issue is mixing tabs and spaces. Gedit makes this a big pain with its default config (it doesn't even auto-indent), but Atom and IDLE handle it well.

Code you write yourself is not usually the source of problems with significant whitespace; it's situations like posting code on websites, where code in a whitespace-significant language becomes next-to-useless when the leading whitespace is stripped, whereas code in any other language still survives and can easily be autoformatted without changing its meaning.

If I were to use such a shitty website, I'd rather make a pastebin and link to it, instead of forcing every reader to reformat it themselves.

Can’t remember the last time this has actually happened to me. In what websites are people posting code without code block formatting support? Like, instant messengers?

Fun fact: even Facebook, WhatsApp, and Telegram support preformatted text in triple backticks.

I’ve heard this argument in a lot of contexts, and it has always struck me as saying, “if hitting yourself with this bat hurts, try wrapping this towel around it and maybe it will hurt less.”

Python 3 rejects mixed whitespace so the problem will be caught quickly.

If I pinky promise to indent my code anyway, then why does it matter whether I also have braces or not? In fact, braces allow me to press one button in my editor and get the indentation absolutely perfect, without affecting semantics.

Braces also allow for easily copying and pasting blocks of code because the braces delimit the semantics of the copied text. Because your code is already indented, with white space indentation you have to check that a) you pasted the first line at the right indentation level b) every subsequent line is also at the right level relative to the first line. No small feat.

> If I pinky promise to indent my code anyway, then why does it matter whether I also have braces or not?

Without braces copy-pasting becomes a context-sensitive headache. That's poor usability.

Admittedly I’ve been writing code in Python for many years now but, even from the start I never had a problem with the significance of whitespace.

Quite the opposite.

I like to format my code nicely anyways (or rather, mostly my editor does it for me because I’ve asked it to do so).

I indent with two spaces usually, regardless of language. And have my editors configured to insert two spaces when I press tab.

JavaScript, Rust, Python, C. Same difference, in terms of how I use whitespace.

The main headaches are due to people either wanting to copy and paste code from various sites, or wanting to write really deeply nested code.

If you're writing well structured, original code in Python, it's generally cleaner and easier than other languages because the syntax avoids ambiguities that other languages have.

The difference in my experience is that once you know what's wrong with your whitespaces in Python, you're out of the woods. The interpreter is your friend from that point onward. YAML parsers, on the other hand, give you these really strange errors that are pretty difficult to understand, and it doesn't end with whitespaces.

There are quite a few comments saying they don't like python even from 10+ year users.

A language becomes popular largely through the library ecosystem and resources around it, not just how the language looks. I think Google embracing it played a big role in winning mind share.


YAML is so bad for human writing. Every time I write Ansible tasks, I get confused by indentation and how to write arrays, etc. JSON and YAML are frankly a generation behind TOML or JSON5.

I’m not keen on how so many tools and services opt for YAML by default, either. Both JSON and YAML are a nightmare to handle once you’ve got 3000 line files and several layers of nesting.

CI would be a lot nicer to use if it didn't rely on a single YAML file to work. And if you want to switch, suddenly you have a build step to convert back to YAML.

I keep my YAML CI files as minimal as possible by putting the logic into a Makefile and/or shell scripts and just have the YAML invoke that.

As an Ansible user, I hate YAML and its broken parsers with a passion, but the security objection does not make much sense. It applies verbatim to any parser of anything whose implementation decides that a given label means "eval this content right away". I fail to see how this can be a fault of the DDL rather than of the parser.

The reason this is a fault of the DDL and not the parser is that the DDL spec decides that it has label that evaluates a command. The parser then has two options, either implement it or not conform to the spec (and essentially implementing a different DDL). For programming languages it makes sense to have an eval label/command. For configuration/serialization DDLs I think it's a terrible choice.

And terrible it is indeed, but I cannot find it specified - the strings eval, exec, command, statement do not even occur in the official specs (shallow doc perusal, I know)

That's because there's nothing in the spec stating anything about execution. The parent is simply incorrect. That's why they haven't responded.

> DDL spec decides that it has label that evaluates a command

This is simply wrong. There is nothing in the spec stating that.

> As an ansible user, I hate YAML and its broken parsers with a passion

Could you elaborate on this? I use Ansible daily and I've never had a problem with YAML once I took some time to understand it. What do you mean by broken parsers? I'm assuming that's something Ansible specific you are referring to.

I intensely dislike YAML's whitespace-based syntax because whitespace is white: it gives very little visual context, especially in long, nested documents. Editors that expand/collapse branches do help some, but are no match for highlighting matching pairs of braces in other, saner formats/languages. (I am also not a fan of syntactic whitespace in Python, if you get my drift.)

And Ansible's parser is broken in more ways than I can remember (I haven't been writing playbooks and such for a couple of months now). If you like pointless pain, try embedding ":" in task names for a demo (or one of several other "meta" characters; the colon is just the one that tends to recur most).
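The colon problem looks roughly like this (a made-up playbook fragment):

```yaml
- name: Deploy: production        # invalid: the second ":" starts a new mapping value
  command: /usr/local/bin/deploy

- name: "Deploy: production"      # fine once the whole name is quoted
  command: /usr/local/bin/deploy
```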

I will give a passing mention to the smug, vague error message ("You have an error at position (somewhere in the middle of the file). It seems that you are missing... (something). We may be wrong (they almost always are), but it appears it begins at position (some position close to the first line)") that sets off a hunt for the missing brace/colon/space/whatever and makes me want to do stuff to the person who devised it.

This compounds with the confusion brought on by the weaving of YAML's and Jinja2's syntaxes, and Ansible's own flakiness in deciding what is evaluated when, which determines when and if a variable does indeed change, and when yes means "yes" rather than true or 1, but not '1' or "true" (try prompting the user for a boolean variable, and find yourself writing if ( (switch == "true") or (switch == "1")) in short order).
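A sketch of the boolean confusion, assuming YAML 1.1 semantics (which PyYAML, and therefore Ansible, largely follow):

```yaml
a: yes       # loads as boolean true
b: on        # also boolean true
c: No        # boolean false
d: "yes"     # the string "yes"
e: 1         # the integer 1
f: '1'       # the string "1"
# Values typed at a vars_prompt arrive as strings, hence tests like
# (switch == "true") or (switch == "1") in playbooks.
```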

Pity that Ansible is so damn convenient, or I would have ditched it a long time ago for anything, bash included (OK, maybe not bash).

Funny, as an Ansible user I love YAML. It works so well for me.

So what's the HN consensus on the best format for config files?

Is it TOML as the author seems to prefer at the end?

My vote is yes. Most configuration doesn’t need anything more sophisticated than key-value pairs, perhaps with namespaces. INI can manage that and TOML is basically a better-specified INI.
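"Key-value pairs with namespaces" in TOML looks like this (illustrative names):

```toml
[server]
host = "0.0.0.0"
port = 8080

[server.tls]                 # namespacing via dotted tables
cert = "/etc/ssl/app.pem"
```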

I can't tell if I've spent too much time on HN or if I came to this conclusion on my own, but TOML is my language of choice for configuration now. It's flexible in the right ways and sectioning of config is so important.

No reason to use INI over TOML. INI doesn't even have a standard specification.

That’s my point, yeah. INI is the right idea but TOML is the same thing but actually specified so use it.

If possible, prefer what tools in your vicinity use. My team uses Kubernetes and Concourse extensively, which both use YAML, so I tend to stick with YAML since people are already familiar with it.

(More recently, I've come around to prefer plain environment variables for configuration, but that only works nicely when the amount of configuration is fairly limited, say 20 values instead of 1000 values.)

For my own use, I do prefer TOML.

Perfect comment! I agree 100%.

.ini, followed by TOML, followed by an identical implementation of some other app's config format.

The biggest problem with config formats is they mislead users into thinking they understand the format. The user tries to edit it by hand, and chaos ensues. So only formats that are stupidly simple, or whose warts are already familiar and well documented, are good choices.

Apache had a great configuration format. Nothing else used it (that I knew of) but you could in theory implement "Apache configs" and then people'd just have to look up how to write those, which there's lots of examples of.

JSON and YAML and XML are data formats; they should only be written by machines, and read by humans. Same with protocols like HTTP, Telnet, FTP... You're not supposed to write it yourself, but it's readable to make troubleshooting easier.

Data formats are nice for expressing nested data structures, but then they don't (usually) support logical expressions; at that point you need a template/macro/programming language, and at that point you're writing code, which will need to be tested, and at that point you should just write modules and use a config format to give them arguments. Every complex tool goes through the same evolution.

If you care about your users, write a tool to generate configs based on a wizard. Good CLI tools do this, and it really makes life better. (It's also a great way to document all your config features in code, and test them)

In the scale world, HOCON is very nice. It’s a format designed explicitly for config files, and has a lot of niceties (like you can append files together and they merge correctly, so you don’t have to end up with giant config files)
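A sketch of that merging behaviour, per the HOCON spec (file names invented):

```hocon
# base.conf
database {
  host = "localhost"
  port = 5432
}

# prod.conf, which starts with `include "base.conf"`; object values
# under the same key merge, so port = 5432 is retained:
database {
  host = "db.prod.internal"
}
```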

What's the "scale" world?

I think they meant "Scala", the programming language.

Sorry, yes, Scala. Autocorrect changed it and I didn't notice.

I agree with HOCON being nice based on personal usage but I haven't seen an in depth analysis of it. This is the canonical parser for JVM based languages — https://github.com/lightbend/config, are there many other implementations that are widely used?

That's the one Akka uses, which probably comprises the majority of HOCON usage.

I've also used the Python port without issues.

I think it's horses for courses. JSON I guess is best for interchange, i.e. machine to machine, but I never want to edit it by hand. XML is relatively easy to read but can be quite painful to edit raw; on the other hand, it is quite easy to build a structured editor for it, so I'd favour it for document persistence. YAML is fine for configuration files, but I would be careful about how I apply it and would always provide it as a heavily documented template config file. YAML, when used correctly, is by far the easiest to edit in the clear with a plain text editor. That said, I would try to get away with basic namespaced properties files first before I'd go that far...

ini if needs are crazy simple; YAML if you need a structure like JSON's but for anything a human ever needs to interact with; JSON if humans aren't in the loop.

TOML, in my opinion, is like a weird mishmash of JSON, ini, and bashisms. Though I have worked with it a lot less than the other formats, so YMMV.

Since when is JSON not human readable/maintainable??

Try writing strings with backslashes, or adding a new line to your array and accidentally leaving a trailing comma (or accidentally forgetting it for the previous row). It's also just very visually noisy. I agree with a lot of the other comments in this thread that for human configuration: TOML > YAML > JSON > XML
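Concretely, with Python's strict json module standing in for any spec-compliant parser:

```python
import json

ok = '{"groups": ["wheel", "admin"]}'
oops = '{"groups": ["wheel", "admin",]}'   # trailing comma left behind while editing

json.loads(ok)                             # parses fine

try:
    json.loads(oops)
except json.JSONDecodeError as e:
    print("rejected:", e.msg)              # strict JSON forbids trailing commas
```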

You can edit JSON by hand but it’s not what it’s for. It’s not designed for that and it’s not really suitable for that. Theoretically you can milk anything that has nipples, but you might find the experience of milking a cat to be ... challenging

The main issue I had with TOML is how much more syntactically noisy it is. Equivalent files with 2-3 levels of nesting usually become at least 50% longer than equivalent YAML.

More here : https://hitchdev.com/strictyaml/why-not/toml/

This is a different use case, I think. This example is defining content, not configuration. In this case the content is user stories. I agree for creating sequences of documents/content in this way, YAML often is nicer. But for configuration, TOML is designed to specify it in a simple and flat way, and that can be very helpful.

I have some projects where I'm frequently writing and modifying content that resembles the example here, and I use YAML there and plan to keep using YAML. For most other things, I'm just doing configuration, so I use TOML. No reason you need to stick to one or the other.

Been working more with Dhall and have really enjoyed it so far.


Putting commas at the start of the line is the toe shoes of syntax.

You can put them at the end of the line.

There's no white knight here; they all suck in some way. Personally I've had decent success with YAML as simple configuration, but I would never use it as an interchange format. If you know its caveats and you're targeting one language, so you can become familiar with the parser, it's serviceable.

I say just use JSON. Everyone knows it already and it's good enough. Use a parser in your app that allows comments and trailing commas like vscode does.

That's not JSON anymore, that's some custom format that's JSON inspired.

If JSON did support this (and multiline strings), I think it’s unlikely anyone would reach for anything else.

Yep. Some kind of JSON++ is where we're headed. Hopefully we can agree on a new standard someday?

(No, not YAML.)

JSON5.org would be nice :)

Cool. It doesn't matter.

It's called JSON5. Don't bend JSON to confuse parsers.


I dislike json with comments or trailing commas as even if your parser can handle them, it surprises many text editors.

Aside from lack of comments, the other major thing that can sometime make json a bad config choice is lack of multi-line strings.

JSON is for data. Not documents. Not config files. I don't agree with any "add this to JSON" comments. It's fine just as it is....for data.

Config files are data about program configuration. So “for data” and “not config files” don't go well together.

I use JSON in the end. I prefer to write TOML, then parse that into JSON. This seems to strike a nice balance between human/machine write/read. It's simple enough to reason about TOML, even if it gets verbose. If I have to write YAML, then past 2 layers of nesting I usually write the value as inline JSON, embedded under the 2nd level of keys.
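Since YAML 1.2's flow syntax is a superset of JSON, that trick looks something like this (invented keys):

```yaml
service:
  name: api
  endpoints: [{"host": "10.0.0.1", "port": 8080}, {"host": "10.0.0.2", "port": 8080}]
```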

It's telling that the responses to this question are broad and varied. Still not a well solved problem, it seems.

It's also telling that even with every other possible answer being given by someone, there's still no one who wants XML.

nah, the XML people are just keeping quiet because no one likes hearing 'I told you so'

I'm still using INI files like it's 1999.

Everything being a string is a bit of a joke now.

My cursory survey of config / serialisation formats concluded that nothing is close to being good.

It's either overly verbose, hard-to-understand XML; no-comments JSON; the horrors of YAML; or some okay format that doesn't have parsers for the languages (plural) you are using on your project.

For an in-house, Python-only project, my go-to is to create a "config.py". Then I declare a bunch of module variables that can be overridden by environment variables as a bonus.
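A minimal sketch of that pattern (all variable and environment names invented):

```python
# config.py: module-level defaults, each overridable via an environment variable.
import os

DATABASE_URL = os.environ.get("APP_DATABASE_URL", "sqlite:///local.db")
WORKERS = int(os.environ.get("APP_WORKERS", "4"))
DEBUG = os.environ.get("APP_DEBUG", "0") == "1"
```

Elsewhere in the project it's just `import config`, and type errors surface at import time instead of at some later parse step.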

J S O N all the way.

YAML for me, then TOML

If you still aren't convinced YAML is terrible, try copying and pasting YAML fragments with a regular text editor.

You might end up with valid YAML, but you won't know until the YAML consumer barfs.

BTW, all of a sudden XML with DTDs is looking sane again :)

Use the right tool for the job. I use YAML extensively but never in a situation where someone would want to edit it with a regular text editor.

do you deliver a YAML editor with your software? Because people will use notepad or nano to edit that stuff.

Sort of, I deliver a GUI that exports into YAML for pretty much only reading, portability, and version control. People are expected to do the editing in the GUI, only using YAML for editing when doing complex regex operations that my GUI doesn't support.

If people aren't editing it by hand, why does the format matter? Why not just use JSON? Tools for too-complex-for-the-GUI manipulations are at least as good for JSON as they are for YAML, and the editing is less error-prone.

Internally it is JSON. It exports as YAML for readability when sharing on discord.

Regardless of the reasoning laid out in the OP, it's difficult to argue in YAML's favor comparing it with JSON. I'm not an ardent fan of JSON either -- both YAML and JSON have issues wrt inconsistencies:

- what draft of JSON Schema are you using: 4? 7? Neither?

- what version of Swagger or OpenAPI are you using?

- etc.

Sure, it's great to see ongoing development of schemas, but with each new development we have yet another dialect to consider/support.

In my view, perhaps an even greater problem with structured data formats in general is the void that separates them from programming languages esp. static languages such as Java where static type information is otherwise leveraged. The industry standard solution, code generation, is awful in almost every respect. The Manifold framework looks promising in this regard (http://manifold.systems/).

JSON Schema is not affiliated with JSON and should not be confused with it. JSON is a data format, like YAML, and there is only one version of it: the spec at http://json.org/.

XML is as pleasant to look at or touch as a nettle rash, but it seems it can join ALGOL 60 among the ranks of technologies which were a great improvement on their successors.

Umm, no. You can find cases where JSON sucks, but you have to look for them. You can find cases where XML doesn't suck, but you have to look for them.

Other than looking ugly and being a pain to type, does XML actually suck?

Is there an agreement on whether it’s

  <author name="pete" />

or rather

  <author>pete</author>

?


Yes there is. Markup languages are for representing rich text. Anything that gets rendered to the user is content and anything that's metadata (data about how to render) but not rendered as such goes into attributes. If there is no concept of "rendering to the user" in your app/data, then markup isn't the right choice for representing your data. You're blaming markup languages for not being designed to represent arbitrary data structures when you should be blaming yourself for misusing and misunderstanding markup. Though by skimming through this thread, XML appears to still do much better than YAML. And it's true that in the 00's, XML was improperly used and advocated as universal data format.
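A small example of that rule (a made-up document): rendered text goes in element content, rendering metadata in attributes:

```xml
<p align="center">
  The <em lang="fr">résultat</em> is rendered; align and lang are not.
</p>
```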

That is a reasonable approach - thanks.

Attributes are XML’s foot-gun.

Particularly annoying is that there's no way to do lists with attributes. Looks good:

  <user name="alice" group="wheel" />
Oh no:

  <user name="alice" groups="wheel admin sudoers" />
Or is it one of:

  <user name="alice" groups="wheel,admin,sudoers" />
  <user name="alice" groups="wheel:admin:sudoers" />
  <user name="alice" groups="wheel;admin;sudoers" />
Or give up and use elements:

  <user name="alice">
    <group>wheel</group>
    <group>admin</group>
    <group>sudoers</group>
  </user>
Hold on, should there be a container?

  <user name="alice">
    <groups>
      <group>wheel</group>
      <group>admin</group>
      <group>sudoers</group>
    </groups>
  </user>
XML really needs either richer attributes, or no attributes.

Replace "Element" with "Class" and "Attribute" with "Property" in your examples.

It's your job to decide how the data should be structured, in any language.

I would do the following :

  <user name="alice">
      <group uid="wheel"/>
      <group uid="admin"/>
      <group uid="sudoers"/>
  </user>

Attributes are XML’s foot-gun.

I disagree. Back in the day we used attributes for everything that was key value and inner tags for anything with structure. We also formatted for clarity:

Compared with what we used to do, I look at attribute-less maven pom.xml with horror.

The decision of when and how to use attributes is the thing that is most often done wrong or clumsily in an XML file. That is my point.

What if it's to be used by French-speaking software/people?


          <label type="env" language="fr">Dehors</label>
          <label type="env" language="de">Außenseite</label>
          <label type="env" language="en">Outside</label>
Quite curious about it.

I guess this way:

    <label xml:lang="fr">Dehors</label>
    <label xml:lang="de">Draußen</label>
    <label xml:lang="en">Outside</label>
says the W3C [0].


I see, thanks. (I also see you corrected my broken German ^^).

Putting l10n/i18n data inline is generally not a pleasant experience, and this problem persists regardless of what serialization format you are using. Either have separate data files per-locale that are merged with the defaults or store the localized strings in a central location (ala your usual gettext setup).

Namespaces are absolutely a bear and always unpleasant to work with. The libraries for using XML are equally frustrating when you're doing complicated things, unless you want to make a class for every single type of detail that the XML document wants; then it's fine, but some of us don't want to do that, or inherited a project that didn't.

XPath is a struggle with namespaces, as well. It’s ... trying.

How is that relevant for config files though? If you implement an app and want to use a config file, you don't have to use namespaces. I agree that namespaces are no fun, so I don't use them for my config files.

Namespaces are great when used wisely: they allow for embedding unrelated XML fragments in your document and your application will not see them. Or versioning your XML in namespaces. XPath and namespaces are trivial but it depends on the library you use on how to setup the namespace resolving. My pet peeve about namespaces: people that use HTTP URLs for their declaration and expecting them to be resolved for some reason. The other one being people that see a namespace prefix and think that that is the actual namespace.

Depends on the XML. When you start mixing in namespaces (like trying to parse Maven pom.xml files in Python), it quickly becomes a mess.

Namespaces are horrible in Python because the Python XML libraries are deficient. They're generally fine in e.g. Java. (Unless you're trying to represent QName typed data, but that's very niche.)

I used Python as an example, but handling namespaces in most languages besides Java (and even arguably in Java depending on your point of view) is rather painful.

Any time the semantics represented by the XML is non-trivial, the schema design is likely to be obtuse and/or broken, even when done by smart people. XML is just hard to get right, and hard to evolve gracefully.

When you add verbosity on top of that, working with XML over the long haul is an utter pain.

I'd say the same about JSON!

Numbers over 2^53. Unicode. Terminator vs. separator. Weak type system, no DTD. No XPath equivalent. No namespaces.

Date representations. No comments. Behavior when multiple instances of the same key occur.
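The number limit, concretely: many JSON consumers (JavaScript always) decode numbers as IEEE 754 doubles, which represent integers exactly only up to 2^53:

```python
import json

# Doubles cannot distinguish 2**53 from 2**53 + 1:
assert float(2**53) == float(2**53 + 1)   # precision silently lost
assert float(2**53) != float(2**53 + 2)   # 2**53 + 2 happens to be representable

# Python's json module keeps ints exact, but a JavaScript consumer
# of the same document would round-trip through doubles and would not:
assert json.loads(json.dumps(2**53 + 1)) == 2**53 + 1
```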

XML's "predecessor" is full-blown SGML, and since XML is just a proper subset of SGML, XML is no improvement. The one thing it brought was DTD-less, canonical angle-bracket markup, where documents can be parsed without grammar rules, e.g. DTDs (this was also incorporated into SGML via the WebSGML adaptations, so that XML could remain a proper subset of SGML). But this also means XML is useless for parsing HTML, the most important markup language (XML was started to supersede HTML as XHTML, but that failed).
