YAML: probably not so great after all (2017) (arp242.net)
673 points by tlb on June 20, 2018 | hide | past | favorite | 394 comments

One thing to remember is that YAML is about 20 years old. It was created when XML was at peak popularity. JSON didn't exist (YAML is a parallel, contemporary effort). Even articulating the problems with XML's approach was an uphill battle. What you would replace it with is also hard. What use cases matter? What is the core model? A simple hierarchy? Typed nodes? A graph? What sort of syntax is needed for it to be usable? These were all questions. Seen in context, we got quite a bit correct. And yes... it has a few embarrassing warts and a few deep problems. Ah well.

A second thing to consider... YAML was created before it was common for tech companies to actively contribute to open source development. There are lots of things we could have done differently if we had more than a few hours per week... even a tiny bit of financial support would have helped.

Finally, YAML isn't just a spec, it has multiple implementations. Getting consensus among the excellent contributors is a team effort, and particularly challenging when no one is getting paid for the work. Once you have a few implementations and dependent applications, you're kinda stuck in time.

It was a special pleasure for me to have had the opportunity to work with such amazing collaborators.

We did it gratis. We are so glad that so many have found it useful.

I am the author of this article. Apparently people read my website (how they get there, I don't know?)

At any rate, it's worth mentioning that in the conclusion I wrote:

> Don’t get me wrong, it’s not like YAML is absolutely terrible but it’s not exactly great either.

I still use YAML myself even when I have the freedom to use something else simply because – for better or worse – it's very widespread, and for many tasks it's "good enough". For other tasks, I prefer to avoid it.

I think that a stricter version of YAML (such as StrictYAML) would make a lot of people's lives easier though.

I think you're probably right there. I use YAML when something else I'm using calls for it, but mainly I tend to output things in it just because it's very readable.

Using it a lot more lately as I'm diving into Ansible, so I'll be interested to see if I run into problems.

It is particularly unfortunate that ansible uses yaml because if infrastructure is going to be code, some day you will surely want to refactor.

The trend of "stick together YAML and a template engine, and we have our DSL!" in CM systems is a bit horrible.

Ansible does make some efforts to limit Jinja templating to variable substitution, but it's still not that great; all kinds of weird stuff can happen, especially with colons.

The worst one is Saltstack; the resulting syntax is just atrocious and borderline unreadable. I'm not a big fan of map.jinja files[0], and on the YAML side, things can get ugly quite fast [1].

I know it's not a popular opinion, but I would rather use the Puppet DSL, even with its steep learning curve.

[0] https://github.com/saltstack-formulas/salt-formula/blob/mast...

[1] https://github.com/saltstack-formulas/mysql-formula/blob/mas...

The whole concept of templating language on top of YAML is suspect anyway, but I wish Salt had just gone with Mako as the default templating language. That way you could write plain Python in your templates and not have this horrible misuse of Jinja.

I also have come to agree that a DSL is the best solution, though Puppet's particular DSL is not a great example. Projects that re-implement the same thing from scratch like mgmt[1] are on the right track, but probably won't gain enough traction.

[1] https://github.com/purpleidea/mgmt/blob/master/docs/language...

I love the clean style of your website.

Thanks! Last time I checked my domain got penalized for having abnormal low markup or some such, which apparently makes it look like a spam site. I am proud of this.

> Last time I checked my domain got penalized for having abnormal low markup or some such

Do you have a link to the document you were pointed to when you got penalized? If it was Google who penalized you, they must have pointed you to a URL with documentation on why you got penalized and how to resolve it.

I ask this because I run a few websites with even lesser markup than your site but I have never got penalized. I once got penalized due to excessive number of spam comments on one of my websites and they pointed me to https://support.google.com/websearch/answer/190597 ("Remove this message from your site") to resolve the issue. This issue did not affect the search ranking much though (dropped by only about 2 or 3 places in the list of results). But never had an issue with abnormally low markup.

The markup in your website looks pretty reasonable to me, so I am surprised you could get penalized for that when I have had no issues with even lesser markup and they still appear at the top of the list of results for relevant search terms.

I think it was some tool at moz.com, but I don't recall off the top of my head. I don't think it was Google itself. I have no idea what effect that has; I'm not really into that world.

> I have had no issues with even lesser markup and they still appear at the top of the list of results for relevant search terms.

It seems people are finding my site, whether or not it's being penalized. I mean, someone other than me posted it here, right?

What a time to be alive

Penalised? By whom?

I would assume tehGoog. Just to make it hard to find low impact sites that aren't AMP.

Oh I see. Thanks.

The only search engine in town. You already know its name.

Oh you mean Ask Jeeves?

No, he means Hotbot.

Drupal 8 uses YAML* as its configuration language because JSON doesn't support comments. That simple. Thank you for YAML, it does deliver for us: it's human readable and it's easy to parse (see below).

* I mean, it uses an ill defined subset of YAML. The definition is "whatever the Symfony YAML parser supports".

You know what else is human readable, easy to parse if you're using PHP, and supports comments?


I understand why some languages rely on common configuration file formats.

I don't understand why the popular dynamic script-y languages don't more commonly use the natively-expressable associative/list data structures that they're famous for making convenient.

Using includes/imports is not the greatest idea ever.

Your configuration file is part of your program's interface. It's something that must be well defined. If your configuration file is a programming language, that interface is not well defined.

Also, you expose yourself to all kinds of weird bugs, because some (too smart for their own good) people will monkey-patch your software through it.

It also adds a lot of unnecessary stuff to the configuration file; things like ';' or '$' are not really useful.

Lastly, common configuration file formats are good because they are... common. You can have two pieces of software in two different languages accessing the same configuration file. A common example of that is configuration management: there are a lot of modules/formulas in Salt/Ansible/Puppet/Chef doing fine parsing of configuration files that permit fine-grained settings, and I'm not even mentioning Augeas. If your configuration is a PHP/Python/Perl/Ruby file, good luck with that.
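A minimal sketch of that cross-language point, in Python with a made-up `[db]` section: any language with an INI parser can read the same file that a tool in another language wrote, with no shared runtime required.

```python
import configparser

# Hypothetical INI config that a Chef recipe, a Go daemon, or this
# Python script can all read, because the format is language-neutral.
INI_TEXT = """
[db]
host = localhost
port = 5432
"""

cfg = configparser.ConfigParser()
cfg.read_string(INI_TEXT)

# configparser returns strings; convert where a typed value is needed.
assert cfg["db"]["host"] == "localhost"
assert cfg.getint("db", "port") == 5432
```

If the config were a `.php` or `.py` file instead, every other tool in the pipeline would need that language's interpreter just to read a hostname.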

I know it's really common for php applications to do configuration files in php, but frankly, it's a bit annoying.

> If your configuration file is a programing language this interface is not that well defined.

While I do agree with the rest of your comment I don't think they were advocating using the full language for configuration, just the maps/arrays/etc. (e.g. Python's `literal_eval`).
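A minimal sketch of that `literal_eval` approach (the config text here is made up): it parses only data literals, so the "config file" stays data even though it uses the language's own syntax.

```python
import ast

# literal_eval accepts only literals: strings, numbers, tuples,
# lists, dicts, sets, booleans, and None -- no calls, no names.
config = ast.literal_eval("{'key1': 'value1', 'retries': 3}")
assert config["retries"] == 3

# Anything executable is rejected with ValueError.
try:
    ast.literal_eval("__import__('os').getcwd()")
    raise AssertionError("should have been rejected")
except ValueError:
    pass
```

So you get the native-syntax convenience without handing users a full interpreter.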

True, but some users will use the full language.

Something like:

    config = {'key1': 'value1', 'key2': 'value2'}

could be written as:

    config = {}
    config['key1'] = 'value1'
    config['key2'] = 'value2'

with large chunks of code possible between the three lines.

It basically transforms the configuration file into an API like any library, which is not really what you want for an end user program.

If a key objection/perceived threat is that this might give someone an insertion point they're not meant to use for code ... well, let's consider that we're talking about applications distributed as interpreted language source here. Disallowing code-as-config isn't even closing the door of this particular barn after the horse has left, it's putting two strands of police line tape across the bottom half of the gap where the door was never installed and hoping any equines thinking of passage politely consider the message in case it hadn't already occurred to them which side of the entrance they preferred to be on.

Consider this: Design and optimize for the common case.

Why do we have config files? Because developers actually want a place dedicated to simple or structured application configuration data, for which PHP assignments with arrays + primitives can function at least as effectively as JSON. Most developers would prefer that config data get loaded quickly so the application can get on to doing actual app-y things. Using the language for this means you're parsing at least as fast as you can interpret and you can also take advantage of any code caching that's part of your deployment (especially nice in the PHP-likely event that config settings would be reloaded with every request).

Abuse isn't likely to be the common case. The end users you invoked certainly aren't going to be the ones looking for opportunities to insert code over data. Developers have other places to put code and, as mentioned, probably actually want a place dedicated to data. You're still right that of course someone will do it, just like someone will inevitably create astronaut architecture hierarchy monstrosities in any language with classical inheritance or make potentially hidden/scary changes to language function using metaprogramming facilities.

But potential for abuse doesn't automatically mean a feature should be disallowed.

A lot of the time it's better to let people who can be circumspect have the benefits of a potential approach, and if somebody thinks they need to solve a problem by using a technique that's arguably abuse, well, let them either find out why it's a bad idea or enjoy having solved their problem in an unusual way. Not the end of the world. Possibly even legit.

You can use arbitrary tools to programmatically generate YAML (or JSON, or XML, any of the other "data only" formats.) This allows for tools to drive other tools by generating a spec file and feeding it in. See e.g. Kubernetes for a good example of that.

There's no language that I'm aware of that can natively generate PHP syntax, and there's no common multi-language-platform library for generating PHP syntax. I think that's most of the reason.

To contradict myself, though: Ruby encodes Gemfiles and Rakefiles as Ruby syntax. And Elixir encodes Mixfiles, Mix.Config files, Distillery release-config files, and a bunch of other common data formats as Elixir syntax.

And, of course, pretty much every Lisp just serializes the sexpr representation of the live config for its config format (which means that, frequently, a lot of Lisp runs code at VM-bootstrap time, because people write Turing-complete config files.)

> There's no language that I'm aware of that can natively generate PHP syntax

This is a solid argument against using PHP (or any such language) as a cross-language data interchange format. There are others :) And I totally agree you want a language independent format for anything you might have to feed across an ecosystem of tools.

For a PHP-system generating/altering its own config files... PHP's `var_export` generates a PHP-parseable string representation of a variable (though it sadly doesn't use the short array syntax).

Turing-complete config files probably have some hazards, like Lisp itself does. YMMV regarding whether those hazards can be avoided by circumspect developers or need to be fenced off.

You don't know when you'll need to generate or parse your config files with something that either can't read, write or execute your language.

Django's settings.py sucks. I've used Django since the 0.9 days. It's extremely impractical and needs to be worked around constantly.

This, and the security problems of executable code as configuration, are why the OpenBSD people mandate that /etc/rc.conf is not general-purpose shell script, and why the systemd people mandate that /etc/os-release is similarly not. People want to be able to parse configuration files like this with something other than fully-fledged shell language interpreters; and they want these things to not be vectors for command injections.

* https://unix.stackexchange.com/a/433245/5132

Settings.py is uniquely bad, though, IMO because it tries to be a badly defined dict(), instead of exposing proper configuration interfaces. Ruby config files are common and usually fairly great, see for example the Vagrantfiles.

And you won't have to generate your config files (parsing, maaaaaybe), because those needs are covered by the fact that the files are programs. They are _already_ generating a configuration.

> And you won't have to generate your config files (parsing, maaaaaybe), because those needs are covered by the fact that the files are programs. They are _already_ generating a configuration.

Yes, theoretically, if settings.py was a "generator" format that you ran as a pre-step (like you do to get parser-generators like Bison to spit out source files for you to work with), and this generator actually spat out something like a settings.json, and all the rest of the infrastructure actually dealt with the settings.json rather than the generator, then, yes, it wouldn't matter. Tools in other languages could just generate the settings.json directly.

As it stands, none of those things are true, so tools in other languages actually need to do something that outputs settings.py files.

Galaxy brain: if your config is programmable, it can read whatever terrible configuration format you want. That means my settings.py (yes, I'm forced to use Django) is configured via environment, which is populated by k8s from - gasp - JSON files.

That means that if I wanted to configure Vagrant with JSON, there is no force in the universe that could stop me.

If the config file is actually a normal program, then it can do normal program things, so any benefit from using JSON instead is nullified by the fact that you can still use JSON. In turn, if your tool's primary configuration is a more limited settings format, you're stuck with it. Not even "generators in other languages" allow comparable runtime flexibility.

Yup, totally agree with you, settings.py has always been a pain in the ass. Not really an acute one but the kind that is uncomfortable but not enough to make you do something about it.

> There's no language that I'm aware of that can natively generate PHP syntax.

Actually, I've had to use PHP to output a PHP configuration array for a project that required config in PHP.

`var_export($foo)` will output valid PHP code for creating the array $foo. In my case I was doing horrible things to create the array in my pseudo-makefile, then using `var_export()` to output the result. Note that you can run php from the Bash CLI with the `-r` flag, which helps.

Tcl works well for configuration files. You can strip away the extraneous commands in a sub-interpreter to prevent Turing completeness and add infix assignment to remove the monotony of the set command and what you get is a nice config format. If you need more power in the future you just relax some of the restrictions and use it as a script without breaking existing files.

People get really upset when they have to type "array(" instead of "[" or "{" (pre-PHP 5.something), and quotes instead of no quotes (punting the character escape problem to something else), I guess.

Using code-as-data works really well in Lisp-like languages. Reading a Clojure project's project.clj file or a Lisp project's project.asdf file is pretty pleasant. A programming language's choice in how it decides to handle library config info for building and specifying dependencies (XML, makefiles, JSON, YAML, INI, nothing, etc...) will be a good indicator for the culture of the language around config files in general. Composer for PHP only came out in 2012.

Interestingly, the Lua programming language actually evolved from configuration files: https://www.lua.org/history.html (and is still officially deemed useful for writing them)

I use Lua for configuration files for both personal and work related projects [1]. You get comments and the ability to construct strings piecemeal (DRY and all that). It's easy to sandbox the environment, and while you can't protect against everything (basically, a configuration script can go into an infinite loop), if someone unauthorized does have access to the script, you have bigger things to worry about.

[1] An example: https://github.com/spc476/mod_blog/blob/master/journal/blog....

You can set a count hook to defend against infinite loops.

Lua is great.

That was also one of the rationales behind TCL's design.

John Ousterhout explained in one of his early TCL papers that, as a "Tool Command Language" like the shell but unlike Lisp, arguments were treated as quoted literals by default (presuming that to be the common case), so you don't have to put quotes around most strings, and you have to use punctuation like ${}[] to evaluate expressions.

TCL's syntax is optimized for calling functions with literal parameters to create and configure objects, like a declarative configuration file. And it's often used that way with Tk to create and configure a bunch of user interface widgets.

Oliver Steel has written some interesting stuff about "Instance-First Development" and how it applies to the XML/JavaScript based OpenLaszlo programming language, and other prototype based languages.

Instance-First Development: https://blog.osteele.com/2004/03/classes-and-prototypes/

>The equivalence between the two programs above supports a development strategy I call instance-first development. In instance-first development, one implements functionality for a single instance, and then refactors the instance into a class that supports multiple instances.

>[...] In defining the semantics of LZX class definitions, I found the following principle useful:

>Instance substitution principle: An instance of a class can be replaced by the definition of the instance, without changing the program semantics.

In OpenLaszlo, you can create trees of nested instances with XML tags, and when you define a class, its name becomes an XML tag you can use to create instances of that class.

That lets you create your own domain specific declarative XML languages for creating and configuring objects (using constraint expressions and XML data binding, which makes it very powerful).

The syntax for creating a bunch of objects is parallel to the syntax of declaring a class that creates the same objects.

So you can start by just creating a bunch of stuff in "instance space", then later on as you see the need, easily and incrementally convert only the parts of it you want to reuse and abstract into classes.

What is OpenLaszlo, and what's it good for? http://www.donhopkins.com/drupal/node/124

Constraints and Prototypes in Garnet and Laszlo: http://www.donhopkins.com/drupal/node/69

In our Tcl based application server (many eons ago), we followed exactly that approach.

All configuration files were Tcl data structures that were sourced on server start.

>I don't understand why the popular dynamic script-y languages don't more commonly use the natively-expressable associative/list data structures that they're famous for making convenient.

You picked the wrong language... PHP comes with its own JSON parser. And INI and XML and even CSV.

But, the reason is that, generally, you want config files to describe data or state only. Yes, you could just make your config native code, but then the temptation to add functions and methods and logic to that becomes irresistible and soon your config is an application that needs its own config.

Config formats need to be simple, and preferably not Turing complete.

Any configuration will eventually become a programming language.

See The Configuration Complexity Clock. https://mikehadlow.blogspot.com/2012/05/configuration-comple...

Just because it's common doesn't mean it should be encouraged by writing your config in native code to begin with.

INI is still simple, and JSON doesn't support logic, so the madness can be held at bay at least for a time.

XML and s-expressions are lost causes, though.

Because it's just in general incredibly short sighted to think that your config file is never going to be read by code written in another language.

There's also an argument about whether making configuration files able to execute arbitrary code is a good idea. You get straight into the JavaScript 'eval' problems which we've spent a decade escaping.

Arbitrary code execution in configuration files has caused a few vulnerabilities in Wordpress extensions already, so yes, it's a terrible idea.

I think some of it is PLOP (Principle of Least Power).

    $CFG = random() > 0.5 ? "yes" : "no";

...is likely "too powerful". It'd be nice if there were ways in certain programming languages to do something like "drop privileges" to avoid loops, function calls, external access, etc.

The makers of Drush, the cli for Drupal, subscribed to your line of thinking in the early versions and inventory items were defined in PHP files. Migrating from that will be interesting.

Because that forces the end user, who might not know anything about the programming language one’s application is written in to wrestle with the low level implementation details. In the words of Keith Wesolowski, the programmer assumes that the end user is a “Linux superengineer”, which is almost always a wrong assumption to make.

I have linked this https://groups.drupal.org/node/159044 elsewhere. Please note PHP was considered.

Fully agree with you, I have done so multiple times.

I totally agree that in the ideal world, JSON should support comments. I yearn for them, and none of the in-band work-arounds or post-processing tools are acceptable substitutes.

But to play the devil's advocate, how would JSON be able to support round-tripping comments like XML can, since <!-- comments --> are part of the DOM model that you can read and write, while JSON // and /* comments */ are invisible to JavaScript programs. There's nowhere to store the comments in the JSON object model, which you would need to be able to write them back out later!

One important feature of JSON is being able to read and write JSON files with full fidelity and not lose any information like comments. XML can do that, but JSON can't. To fix that you'd have to go back and redesign (and vastly complicate) fundamental JavaScript objects, arrays, and values to be as complex and byzantine as the DOM API.

The less-than-ideal situation we're in isn't JSON's fault or JavaScript's fault, because JSON is just a post-hoc formalization of something that was designed for a different purpose. But JSON is rightly more popular than XML, because it's extremely simple, and nicely impedance matched with many popular languages.

YAML suffers from the same problem as JSON that it can't round-trip comments like XML can, but it fails to be as simple as JSON, is almost as complex as XML, and doesn't even map directly to many popular languages (as the article points out, you can't use a list as a dict key in Python, PHP, JavaScript, or Go, etc).
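The article's dict-key point can be seen without a YAML parser at all. YAML permits a sequence as a mapping key (written `? [a, b]`), but the natural host-language mapping fails in Python, since dict keys must be hashable:

```python
# A list key fails where YAML would allow a sequence key.
try:
    d = {["a", "b"]: "value"}
except TypeError as e:
    print(e)  # unhashable type: 'list'

# A loader has to improvise, e.g. converting the key to a tuple.
d = {("a", "b"): "value"}
assert d[("a", "b")] == "value"
```

Any YAML loader targeting Python (or PHP, JavaScript, Go) has to paper over this mismatch somehow; the tuple conversion is just one common workaround.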

You can sidestep some of JSON's problems by representing JSON as outlines and tables in spreadsheets, without any need for syntax and sigils like brackets, braces, commas, no commas, quoting, escaping, tabs, spaces, etc, but in a way that supports rich formatted comments and content (you can even paste pictures and live charts into most spreadsheets if you like), and even dynamic transformations with spreadsheet expressions and JavaScript.

See my comments about that in this and another article: https://news.ycombinator.com/item?id=17360071 https://news.ycombinator.com/item?id=17309132

> "YAML ... is almost as complex as XML"

In fact YAML is probably more complex than XML; the specification of YAML, when I print it into PDF, is about three times as long as that of XML 1.0. (And XML 1.0 also describes DTD, which is kind of a simple type validation for XML and thus includes much more than just serialization syntax.)

> But to play the devil's advocate, how would JSON be able to support round-tripping comments like XML can, since <!-- comments --> are part of the DOM model that you can read and write, while JSON // and /* comments */ are invisible to JavaScript programs.

It doesn't support it for whitespace in general (if you deserialize into JS object model or equivalent), so why would it be any different for comments specifically? It's just not a design goal of the format.

Although, of course, it's quite possible to have a JSON parser that preserves representation. It'll just have a non-obvious mapping to the host language because of all the comment and whitespace nodes etc.

Zish https://github.com/tlocke/zish supports comments (but they're not round-trip comments) and also extra data types such as timestamp and bytes.

It's still in its early stages, so if anyone's got any comments I'm interested in hearing them :-)

> YAML suffers from the same problem as JSON that it can't round-trip comments like XML can, [...]

While not mandated by the YAML specification, it doesn't prevent creation of a parser that round-trips comments.

In fact, the ruamel.yaml project for Python provides one.

I'm kinda sad that JSON has been struggling for like 15 years to get comments. Is there like some kind of gestapo that's saying no or something? All it takes is for the maintainers of probably 15 popular libraries to start handling comments.

At the end of the day I'm sure the reason we don't have JSON comments is somewhere listed in this page: xkcd.com/927/

I'm aware of at least three JSON libraries that at least can accept comments (Gson in lenient mode, Json.NET, and json-cpp are the ones I've used personally that do)-- it's hard to convince everyone that JSON needs comments, though, and comments are of limited utility if it's not guaranteed that they'll parse everywhere.

But you really only need comments in JSON if you're doing stuff like storing configuration in JSON, and JSON's too fiddly in general to be a great config file format (too easy to do something like forget a comma; no support for types beyond object, array, (floating point) number, and string). Something more like YAML without the wonky type inference would be better, IMO.

I believe Douglas Crockford used to make the argument that JSON is not meant for human consumption and thus shouldn't be changed to better serve humans. I personally wish hjson (https://hjson.org) were to get more traction. I prefer it over both JSON and YAML.

It was Crockford. Directly from him:

> I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability. I know that the lack of comments makes some people sad, but it shouldn't.


Well, then why not allow a trailing comma in lists and objects? Computers don't care, and they would even be happier: they could then just emit every array and object member with a trailing comma, without concerning themselves with whether it is the last member or not. (Dijkstra's train toilet problem comes to mind.) Also compare with XML, where each element is self-contained.
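A quick illustration of the asymmetry with Python's standard `json` module: the parser rejects the trailing comma, so the emitter is the one stuck special-casing the last member.

```python
import json

# A strict JSON parser rejects a trailing comma after the last member.
try:
    json.loads('{"a": 1, "b": 2,}')
    raise AssertionError("should not parse")
except json.JSONDecodeError:
    pass

# So emitters can't simply append "," after every member; they have
# to join members instead (or otherwise treat the last one specially).
members = ['"a": 1', '"b": 2']
text = "{" + ", ".join(members) + "}"
assert json.loads(text) == {"a": 1, "b": 2}
```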

And why model JSON syntax so closely after JavaScript's object literal syntax (which is actually more convenient, by the way)? Being taken from mainstream programming languages, that syntax naturally evolved to be written by humans in small amounts, not by computers in large dumps. :)

VS Code uses JSON with comments for config files. [1]

Technically, this is not JSON. You won't be able to use a standard JSON parser without stripping comments first. But you can use a simple, JSON-like language with comments for config.

[1] https://code.visualstudio.com/docs/languages/json#_json-with...
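One way to support that "JSON with comments" style is to strip the comments before handing the text to a standard JSON parser. A naive sketch in Python (it only handles comment-only lines; a real implementation must also cope with `//` inside string values and with block comments):

```python
import json
import re

def loads_with_comments(text):
    # Drop whole lines that contain only a // comment,
    # then parse the remainder as standard JSON.
    stripped = re.sub(r"^\s*//.*$", "", text, flags=re.MULTILINE)
    return json.loads(stripped)

doc = """
{
    // editor settings
    "editor.tabSize": 4
}
"""
assert loads_with_comments(doc) == {"editor.tabSize": 4}
```

The cost, as noted, is that the result is no longer JSON that other tools can consume directly.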

> you can use a simple, JSON-like language with comments for config.

YAML can be employed as a simple JSON-like language with comments.

> simple

YAML is much, much more complicated than JSON.

    > simple
    YAML is much, much more complicated than JSON.

Quoting a single word from the parent's sentence is misleading. The sentence "YAML can be employed as a simple JSON-like language with comments." is true because JSON is YAML, so you can parse a JSON file with #-comments using a YAML parser.

The parser is not simple, though, and that's what counts.

Most YAML users don't need to look at the source for a YAML parser. I appreciate elegant simplicity, but I don't think parser complexity is the most important metric by which to judge a data interchange format.

If you use a YAML parser to parse JSON-with-comments, it will accept many inputs that don't correspond to JSON-with-comments, and furthermore is likely to report syntax errors that don't make sense to a user who only knows JSON.

So, this unnecessary parser complexity is a usability issue. You should use a parser for the config language you actually intend to support.

JSON5 is also a great alternative: https://json5.org/

Supports comments, trailing commas, single quotes, multi-line strings, and more number formats.

I really wish json5 would support optional commas as well. If you have a new line, no comma is needed. So you could do

    {
        a: 1
        b: 2
    }

Newlines are used by humans; computers should do a good job with them as well.

It looks very like an s-expression. Maybe we should go back to Lisp for our data encoding? (and to our code when we are at it? ;))

Well you could use CSON, which uses CoffeeScript notation that allows for constructs such as:

    required: [
        ...
    ]

which is the same as:

    "required": [
        ...
    ]

in JSON :)

That's not a subset of JavaScript though.

Sublime Text does too.

Try https://jsonnet.org/. Supports comments, plus a handful of additional useful features.

Jsonnet is awesome. We use it to generate our yaml files for kubernetes. YAML isn’t easy to parse, nor is it very flexible as a templating language. It gets cumbersome very quickly.

Jsonnet is a relief. Kubernetes should have been a dumb json config from the get go. JSON is ridiculously simple to parse and emit. It has huge interoperability as well with lots of programming languages.

Kubernetes never intended to get stuck on YAML. The CNCF is backing Ksonnet, which is Jsonnet for k8s, if you haven't seen it before.

Oh man, you just made my day. Comments, imports and mixins, I just had an 'evrika' moment.

P.S. You Romanian by any chance?

Well, there are also TOML and HOCON (JSON supersets), which are "YAML-like".

TOML is more like ini than YAML.

I don't like it because it uses the = symbol which seems imperative rather than declarative. (Same with HCL, it might be a nitpick but these are languages I'm going to be using all the time.)

HOCON is interesting, but at first glance it seems it might be too ambiguous for my tastes because, like YAML, it supports both js-style ("//") and shell-style ("#") comments.

JSON plus comments is beautiful because it adds minimally to an unambiguous language which lends itself to automatic formatting (stringification).

I'd argue that = only feels imperative if you're used to imperative languages. Prolog and Haskell, both of which focus on being declarative, also both use the equals sign.

Fair enough. It still seems overkill to me though. A ":" seems much more unassuming than an "=".

Don't forget HJSON if you want a json clone with comments. https://hjson.org/

TOML: Initial release 23 February 2013; 5 years ago

Drupal 8 file format discussion was in 2011, predating it by two years. https://groups.drupal.org/node/159044

Is there a yaml parser that preserves comments and a writer that manages to write them back though?

The parser linked from the article does: https://github.com/crdoconnor/strictyaml

> JSON doesn't support comments


    {
      "firstName": "John",
      "lastName": "Smith",
      "comment": "foo"
    }

I know it isn't the same as #comments, but who cares really.

The trouble there is that your comments come in-band. What if you're trying to serialise something and you don't have the power to insist that it's not a dictionary with "comment" as a key?

It seems the main difference is your comments are all parsed and loaded into memory with the file, while official comments aren't.

How do I do something like:

    {
      # comment with a note about the value of foo
      "foo": "bar",
      # comment with a note about the value of baz
      "baz": "qux"
    }

Without driving myself and future readers insane with fooComments and bazComments?

What if I need a multiline comment explaining a yak-shaving story for why a key is set to a certain value?

What if the object in question is a set of keyword arguments, and adding new fields changes the behavior of whaever is parsing the document?

Ok, I'll bite.

    "#": "A foo variable",
    "foo": true,

    "#": "A bar variable",
    "bar": false

    "# A foo variable": "",
    "foo": true,

    "# A multiline..": "",
    "# .. bar variable": "",
    "bar": false

Sigh. All I wanted to do is to say thanks for the YAML standard -- comments are important but not the only problem with JSON. And truly I can't be expected to remember all of this discussion from six-plus years ago. One thing I do remember is the trailing comma problem -- we upstreamed a grammar change to Doctrine annotations so "foo, bar," is OK, because PHP arrays accept that and it's bonkers trying to code a mostly-PHP system without trailing comma support. Also, JSON is no fun to write: you need to have all the [] {} correct, where YAML is much easier. The fewer sigils the better, and most of Drupal's YAMLs only use the dash, the colon and the quote. This is the grave mistake Doctrine committed as well: instead of simple arithmetic (>=1.0) they used mysterious sigils in version specification (~1.0). Drupal is in the business of constantly accepting new contributors, and (~R∊R∘.×R)/R←1↓ιR is not newbie friendly, no matter how you slice and dice it. There are certainly advantages to sigil-heavy languages like APL and Perl, but the scare factor is too high.

fair enough, but then you probably shouldn't have led off your earlier comment with "that simple".

That's just ugly, and you're mixing your comments with the data structure, which is potentially confusing. Also, JSON requires a lot more typing. I don't want to have to manually add all the brackets, quotes and commas when editing a config file.

Presto! You have a duplicate key in the first example.


    import json
    print(json.dumps(json.loads(js_data), indent=2))

    {
      "bar": false,
      "foo": true,
      "# A multiline..": "",
      "# .. bar variable": "",
      "# A foo variable": ""
    }
Presto! ;-)

> who cares really

the person who came up with HOCON, probably

> JSON didn't exist (YAML is a parallel, contemporary effort).

Interesting. How did it happen, then, that (quoting the YAML 1.2 spec) "every JSON file is also a valid YAML file"? The previous spec documents don't mention JSON, though.

Was that an intentional design decision for 1.2 or was it some kind of convergent design due to Javascript?

I have admired Douglas Crockford's excellent JSON from the moment I saw it; it is a model of simplicity. I also like TOML and wish it all the best. By contrast, YAML is complex and could use a haircut.

When I say "JSON didn't exist", what I mean is that it wasn't popular or known to us when we were working on YAML. So, please excuse my sloppy wording. For me, the work on what would become YAML started with a few of us in 1999 (from SML-DEV list). In January of 2001 we picked the name and had early releases. It took a few years of iteration before we had a specification the collaborators (Perl, Python, and Ruby) could all bless.

Anyway, with regard to Crockford's excellent work, JSON: it is a coincidence that YAML's in-line format happened to align, although that's probably because of a "C" ancestor, not JavaScript. The main influence on the YAML syntax was RFC0822 (e-mail), only that from my perspective, it needed to be a typed graph. In fact, we documented where we stole ideas from, to the best we could recall at that time: http://yaml.org/spec/1.0/#id2488920.

> YAML is complex and could use a haircut.

Out of curiosity, did you see the parser linked to at the end of the article? ( https://github.com/crdoconnor/strictyaml )

That was my attempt at giving YAML a haircut. I'd be curious to know what you thought.

Thank you for creating YAML, by the way. Even though part of that rant was quoted from me, I'm not negative on it like the author - I think the core was brilliantly designed. If you put two hierarchical documents side by side - one in TOML and another in YAML the YAML one is much, much clearer and cleaner.

Thank you for StrictYAML, I might just use it. It does look like a nice haircut. You might wish to give Ingy a ring. He has been itching to move forward on a reduced/secure YAML subset.

That said, StrictYAML seems to be a tad more of a haircut than I'd imagined. I'd keep nodes/anchors, since I think a graph storage model is underrated; I think that data processing techniques just haven't caught up with graph structures.

Further, I'm not sure everything can be easily typed based upon a schema. Hence, I'm not sure about completely dropping implicit types, perhaps you may want to provide a way for applications to resolve them if they wish. For example, an application may want to attempt to treat anything starting with "[" or "{" as JSON sub-tree. Perhaps keeping "!tag" but handing it off to the application to resolve might also be a good idea in this regard. Even so, typing should be done at the application level and default to something very boring.

> Thanks for StrictYAML, I might just use it.

Thanks, that's very flattering.

> I'd keep nodes/anchors, since I think a graph model is underrated

Well, you can create graph models without it (and I do) - you can just use string identifiers to identify nodes and let the application decide what that means.

I always thought the intent behind nodes/anchors was not so much graph models but rather to take repetitive YAML and make it DRY. That appears to be how it is used, e.g. in gitlab's ci YAML.

> I'm not sure about completely dropping implicit types, perhaps you may want to provide a way for applications to resolve them if they wish. For example, an application may want to attempt to treat anything starting with [ or { as JSON.

I think that would cause surprise type conversions. There will be plenty of times when you want something to start with a [ or { and you won't want it parsed as JSON.

I embed snippets of JSON in YAML multiline strings sometimes and I usually just parse it directly as a string. Then I run that string through a JSON parser elsewhere in the code.

> You might wish to give Ingy a ring.

I would like that.

> I think that would cause surprise type conversions.

YAML has traditionally been used as the basis of higher-level configuration files for particular applications. What I'm saying is that implicit typing should be permitted, but delegated to those applications.

Conversely, I'm not saying that StrictYAML should do anything by default with unquoted values, except reporting them to the application as being an unquoted value. This way the application could choose to process the value differently from those that are quoted.

An interesting idea, but it's not clear that this will be less confusing, or that application authors will be better at avoiding config-language gotchas than config-language designers such as yourself (and existing app-specific config languages suggest otherwise).

I think a reason this won't necessarily fix the problem of unmet expectations is that identical constructs in different but analogous YAML files would be likely to end up with very different semantics, and users would effectively have to remember which idiosyncratic YAML dialect choices each app makes. Say

   version: 1.3
means the string "1.3" in app a), the float 1.3 in app b), and a version number in app c). Furthermore, let's assume that app c) requires a version number, whereas a) and b) require strings.

Another, more subtle problem, is that such a scheme would make it more likely that applications would end up parsing raw string representations themselves (with ensuing subtle differences even for things which are nominally meant to be identical, say dates or numbers and possibly security problems as well).

> I always thought the intent behind nodes/anchors was not so much graph models but rather to take repetitive YAML and make it DRY. That appears to be how it is used, e.g. in gitlab's ci YAML.

That's how I use it too. When I read about competing formats, that's the first feature I check for. It's really key for readability and usability in some use cases.

Great to have you here elaborating on various design choices. Are you perhaps familiar with OGDL [1], and what's your opinion of it?

[1] http://ogdl.org/spec

I don't have much to suggest. For YAML, the use of whitespace, colons and dashes primarily emerged from usability testing with domain experts who are not programmers. In particular, testing was done in the context of an application that needed a configuration and data auditing interface, an accounting application. Even anchors/aliases worked in this context and supported the application's use by making the audit records less repetitive without introducing artificial handles.

Other use cases such as dumping any in-memory data structure from memory, perhaps out of a sense that we needed full completeness, actually didn't have any end-user usability testing. Round-tripping data seems in retrospect to be a diversion from the primary value that YAML provided.

Is there an implementation of strict yaml that you know of for Ruby?

If you are writing a new YAML implementation, then yeah, you want a simpler spec to follow.

If on the other hand you are using a YAML library... I've had pretty good success using YAML compatibly across Python, Ruby, C# and Go projects. Do you have a particular issue in mind that the existing Ruby implementation doesn't address?

It's an implementation of YAML, not StrictYAML which has different semantics.

Yes, strict YAML is different than YAML. If you take a look at the github page linked in the GP, it explains the differences.

"JSON didn't exist because Us and We"?

YAML is an invented serialization format; JSON is a discovered one. As Crockford points out, JSON has existed as long as JS has existed; he just called it out and put a name on it.

Anyway, XML is a strong anti-pattern (too much attack surface: even if you get it right on your end, the other party likely screwed something up). YAML seems to be going down that path too.

TOML seems to be "the JSON of *.ini" (ie: discovering old conventions, rather than inventing new ones), and I'm glad to have been exposed to it.

> "JSON didn't exist because Us and We"?

If you define JSON as the underlying practice that Crockford later named and documented, then sure, what I wrote reads completely wrong-headed. However, when we were working on YAML, JSON was not yet called out and given a name.

I believe the most important convention that YAML and JSON shared was a recognition of the typed map/list/scalar model used by modern languages. Further, as far as conventions go, I think there's quite a bit to be said about languages that use light-weight structural markers such as: indentation, colon and dash.

The answer is in version 1.2 of the spec:

> The primary objective of this revision is to bring YAML into compliance with JSON as an official subset.

It's not really a moral judgement, thanks for your contributions and your innovations, but I prefer not to use YAML if possible for the same reasons the author outlined.

I didn't know this bit of history. You're right, context explains a lot of the design choices made at YAML birth. Thanks for sharing.

"JSON" became popular in the 90s as HTTP requests which returned JavaScript that you would simply eval(). No need to write or import a parser, and it's the same syntax as the language you're using, because it is the same language. In technology many things become popular not because of how good (or bad) they are, but because of how easy they are to use.

Can't agree more. In tech, the prime mover often becomes the standard.

Clark, thanks so much for YAML. I love it and use it a lot. It actually increases the day-to-day joy of the work I do as a developer.

(While constructive criticism is fine, those rare people who trash it are... nonsensical to me. I'd like to see them do one-tenth as good under the same conditions!)

Unity3D uses YAML for its serialization engine. Thank you.

I love YAML, thanks for creating it, it's saved me a lot of time over the years.

I love json but despise the fact it doesn't support comments

JSON definitely did exist 20 years ago.

JavaScript objects, yes, but not JSON. Folks were deep into XML as a message format.

> Douglas Crockford originally specified the JSON format in the early 2000s

> I discovered JSON. I do not claim to have invented JSON because it already existed in nature. What I did was I found it, I named it, I described how it was useful. I don’t claim to be the first person to have discovered it. I know that there are other people who discovered it, at least, a year before I did. The earliest occurrence I found was there was someone at Netscape who was using JavaScript array literals for doing data communication as early as 1996, which was at least 5 years before I stumbled onto the idea.


I can independently confirm that people were using JSON before he named it JSON. I was dumping data in JSON in 2000 for dynamically displayed reports.

But then again I was already used to using Perl data structures as dumped by Data::Dumper for config, because I was taught a lot about Perl by a Lisp programmer who had used Lisp data structures for the same purpose since the 1980s. So using JSON didn't feel original or clever. It seemed like I was simply using a well-known technique in yet another dynamic language.

Then again our reaction to XML was the stupid thing other people were doing that you had to do to interact with the rest of the world. I got used to holding my tongue until I went to Google a decade later and found that my attitude was common wisdom there...

According to Platonism, JSON has no spatiotemporal or causal properties (like a datetime format) and thus has existed and will exist eternally. All hail JSON.

I have used all the principles of JSON and developed https://jsonformatter.org

I'd like to propose the "YAML-NOrway Law."

"Anyone who uses YAML long enough will eventually get burned when attempting to abbreviate Norway."


  NI: Nicaragua
  NL: Netherlands
  NO: Norway # boom!

`NO` is parsed as a boolean; under the YAML 1.1 spec, there are 22 ways to write "true" or "false."[1] For that example, you have to wrap "NO" in quotes to get the expected result.

This, along with many of the design decisions in YAML strike me as a simple vs. easy[2] tradeoff, where the authors opted for "easy," at the expense of simplicity. I (and I assume others) mostly use YAML for configuration. I need my config files to be dead simple, explicit, and predictable. Easy can take a back seat.

[1]: http://yaml.org/type/bool.html [2]: https://www.infoq.com/presentations/Simple-Made-Easy
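The 22 spellings come from the YAML 1.1 bool resolver; a quick sketch (regex transcribed from the yaml.org page linked above) shows why NO is the odd one out among country codes:

```python
import re

# The YAML 1.1 implicit boolean resolver, transcribed from
# http://yaml.org/type/bool.html -- 22 spellings of true/false.
YAML11_BOOL = re.compile(
    r"^(?:y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)

for code in ("NI", "NL", "NO"):
    print(code, "-> boolean!" if YAML11_BOOL.match(code) else "-> plain string")
```

Only NO matches, so a YAML 1.1 parser silently turns that key into the boolean false; quoting it ("NO") defeats the resolver.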

The implicit typing rules (ie, unquoted values) should have been application dependent. We debated this when we got started and I thought there was no "right" answer. Alas, Ingy was correct and I was wrong.

I appreciate your humility and professionalism in a discussion thread that holds a lot of criticism; suffice it to say, I should have practiced a bit more humility and a bit less "Monday morning quarterbacking" in my original post. And I should have read your comment on YAML's history. To right the record: you got _so_ much right with YAML, and it's unfair for me to cherry-pick this example 20 years later. Sincere apologies...

As the saying goes, "there are only two kinds of languages: the ones people complain about and the ones nobody uses." YAML, like any language, isn't perfect, but it has withstood the test of time and is used by software around the world; many have found it incredibly useful. Sincere thanks for your contribution and work.

As someone who doesn't really use YAML much, your comment provides a good introduction to the kinds of things one needs to know before choosing formats in the future.

This is a very good example of the problems of YAML, and it's one of those things that has really perplexed me about its design. (I suppose it's a sign of the times when YAML was designed.)

It's[1] just so blatantly unnecessary to support any file encoding other than UTF-8, supporting "extensible data types" which sometimes end up being attack vectors into a language runtime's serialization mechanism, autodetecting the types of values... the list goes on and on. Aside from the ergonomic issues of reading/writing YAML files, it's also absurdly complex to support all of YAML's features... which are used in <1% of YAML files.

A well-designed replacement for certain uses might be Dhall, but I'm not holding my breath for that to gain any widespread acceptance.

[1] Present tense. Things looked massively different at the time, so it's pretty unfair to second-guess the designers of YAML.

This was fixed in YAML 1.2 though? So, e.g., in Python you'd just use ruamel.yaml instead of pyyaml.

That doesn't help you, of course, when using a multitude of existing systems whose yaml parsers are based on 1.1...

I've been bitten by the string-made-of-digits-that-starts-with-0 thing a couple of times. In this case it gets interpreted as a number and drops the leading zeroes. I quickly learned to quote all my strings.

I'd still love for a better means to resolve ambiguities like this, but I've found always quoting to be a fairly reliable approach.
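The leading-zero behavior is actually YAML 1.1's octal integer rule (the 0[0-7_]+ pattern in yaml.org/type/int.html), so the zero isn't merely dropped; the base changes too. A sketch of what a 1.1 parser ends up storing for such a scalar:

```python
# An unquoted scalar like 01234 matches YAML 1.1's octal integer
# pattern, so the parser resolves it as a base-8 number.
scalar = "01234"
as_stored = int(scalar, 8)   # what a YAML 1.1 parser hands back
print(as_stored)             # 668, not 1234
# Quoting sidesteps the resolver entirely: "01234" stays a string.
```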

A thread hating on YAML without a mention of the bastardized YAML that ansible uses?

Ansible extends yaml so that:

cmd: a b c

is actually but not quite identical to:

cmd: ["a", "b", "c"]

It also embeds Jinja2 templating part-way (!) through the YAML parsing process.

The gotchas that these and other bastardizations cause is only partially documented at the bottom of this page: https://docs.ansible.com/ansible/latest/reference_appendices...

I like ansible, but its decision to use a bastardized YAML is a major pet peeve of mine.

I'm not an expert by any means, but I'm pretty sure that Ansible uses vanilla YAML (no 'bastardization').

Your first example is an Ansible convenience feature, it's not extending or changing the YAML syntax in any way. You can simply specify `cmd` values as lists or strings, since working with one or the other may be easier depending on the use case.

The templating is unfortunate in some areas, especially where the jinja2 syntax conflicts with what YAML expects (for example starting an object with '{'). That's due to a combination of templating engine choice and YAML, though, and not some custom implementation of YAML. Unless I'm misunderstanding?

I do think going with YAML was a trade-off for Ansible, but it's hard to see Ansible getting to where it is today if it had gone with a custom DSL (or JSON, thank god). I'd take Ansible's YAML over Chef's Ruby or CloudFormation's JSON any day.

Another example of YAML-but-not-quite is Travis CI configuration format:


Oh god, Ansible is exactly why I don't like YAML or Jinja2. I never know what needs to be quoted, what's inlined, what needs to be wrapped in "{{ }}", and what expressions are supported. But once you get the syntax right, it works great.

SaltStack also has the JINJA2 template embedding which can make it very difficult to understand which parts of the lifecycle run through templating. I'm still not certain I understand how it works.

The most recent offenders for bastardizing YAML I have seen are the different CI services:

* Circle CI using moustache-like templating and interpolation with things like {{ .Branch }} available in certain steps [1]

* GitLab CI adding an "include" type directive to declare YAML dependencies [2]

I've also experienced this professionally. At my last company, somebody decided to add a feature to enable interpolation in some parts of the YAML deployment data. It ended up being used by a handful of people who were confused about why interpolation worked in some places and not others. The weird trend of "extending YAML" seems to undercut whatever benefits you might get from trying to use it.

[1]: https://circleci.com/docs/2.0/configuration-reference/#save_...

[2]: https://docs.gitlab.com/ee/ci/yaml/#include

Ansible would be so, so much better if it just used plain JSON, or even JS with an implied context for variables, .eslintrc.js style.

You can usually use plain old JSON anywhere where YAML would be used (e.g. host vars, group vars, vars file includes, I think even playbooks). And internally, most everything in Ansible is JSON anyways.

YAML is for convenience for hand-editing configuration/task files; if you're doing anything that doesn't require hand editing/readability, use JSON.

You can, but you lose the possibility of comments, and writing JSON by hand is also a pain in the ass...

With YAML I can never remember what's an object versus a list, string, or number, nor am I ever able to add new stuff to a YAML file and get it to parse correctly without first looking up the spec. And it's impossible to see where large objects start and end.

In contrast, JSON is super intuitive and basically self documenting. The only real quirks are that you need to use double quotes, and objects can't have a trailing comma.

The only good thing I can see about YAML is that it's super easy to convert and re-export to JSON.

> In contrast, JSON is super intuitive and basically self documenting. The only real quirks are that you need to use double quotes, and objects can't have a trailing comma.

I'd expand the list of quirks... JSON lacks comments (both line-level and block level). Fine for data transport but super super bad for configuration files.

> lacks comments [..] super super bad for configuration files

Not that that matters when applications take it upon themselves to re-save the config file in some kind of normalisation effort. Bye-bye comments, hope they're checked in somewhere.

I'm lookin' at you, kubernetes...

This hits home. Every time I've ever had to make the decision I've chosen yaml, for exactly this reason. Funny how the seemingly small things can be absolute show stoppers when it comes to making decisions in production.

JSON5 supports comments, and is only slightly more complex than JSON. https://json5.org/

The barrier here would be whether there's support in enough implementations to feel safe using it in the wild, which I'm guessing will take a while at the very least.

What's incredible here is that we're not at the beginning of programming, when we built temporary languages that ended up being forgotten. The web may be the final form of IT, Angular may be the "right", final, perfect way to build applications even in 50 years, just like HTML has been the final way to build websites for the last 25 years; JSON may become legacy, and my grandson might struggle with parsers that still use JSON instead of this new tech called JSON5...

The popularity of transpilers might help overcome that barrier. IDE's and task runners can watch for file modifications and run a simple program to convert to the old format.

But I'm curious what you mean by "in the wild"? If you're using (producing) it, something needs to consume it, and you would probably have control over both in whatever project you were using it for.

If you're going to do this kind of thing, why would you not add a standard date format?

Just use a string and ISO 8601? “2018-03-25”

Wow, I can "just" use that, thanks. The problem is that JSON is an interchange format, meaning that I need to implement this serialization and deserialization quirk on every producer/consumer of my API (which, you know, avoiding is kind of the point of using a standard interchange format). Furthermore, because everything is a string, I can't unambiguously indicate something is meant to be a string in that format rather than a date.

What's the timezone?

Do you need a time zone for dates?

Consider if I'm storing a user's (local) birthday on my server:

    {..., "birthday": "2018-03-25"}
If my server is located in New York City, and the user is in Sydney, then my server isn't going to wish them happy birthday in time.

So maybe we could do:

    {..., "birthday": "2018-03-25", "location": "Australia/Sydney"}
But at this point we might as well use a standardized time format (UTC) with a timezone offset. Maybe I'm thinking too far into it?

ISO 8601 has you covered:

   Year:
      YYYY (eg 1997)
   Year and month:
      YYYY-MM (eg 1997-07)
   Complete date:
      YYYY-MM-DD (eg 1997-07-16)
   Complete date plus hours and minutes:
      YYYY-MM-DDThh:mmTZD (eg 1997-07-16T19:20+01:00)
   Complete date plus hours, minutes and seconds:
      YYYY-MM-DDThh:mm:ssTZD (eg 1997-07-16T19:20:30+01:00)
   Complete date plus hours, minutes, seconds and a decimal fraction of a second
      YYYY-MM-DDThh:mm:ss.sTZD (eg 1997-07-16T19:20:30.45+01:00)

     YYYY = four-digit year
     MM   = two-digit month (01=January, etc.)
     DD   = two-digit day of month (01 through 31)
     hh   = two digits of hour (00 through 23) (am/pm NOT allowed)
     mm   = two digits of minute (00 through 59)
     ss   = two digits of second (00 through 59)
     s    = one or more digits representing a decimal fraction of a second
     TZD  = time zone designator (Z or +hh:mm or -hh:mm)
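Round-tripping ISO 8601 strings through JSON needs nothing beyond the stdlib in most languages; a Python sketch (3.7+ for fromisoformat, which handles numeric offsets but not the trailing Z):

```python
import json
from datetime import date, datetime

# Dates travel as plain ISO 8601 strings inside the JSON payload.
payload = json.dumps({
    "birthday": date(2018, 3, 25).isoformat(),   # "2018-03-25"
    "seen_at": "1997-07-16T19:20:30+01:00",
})

decoded = json.loads(payload)
birthday = date.fromisoformat(decoded["birthday"])
seen_at = datetime.fromisoformat(decoded["seen_at"])  # tz-aware
print(birthday, seen_at.utcoffset())
```

The downside raised upthread stands, though: nothing in the payload distinguishes "a date" from "a string that happens to look like one".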

Most parsers will not support JSON5.

More like browsers can de-serialize JSON5 natively. writing a JSON5 parser is not difficult. It's just not part of most std libs in most languages, but I would argue that most std libs don't parse YAML either.

JSON5 is a good compromise.

Neither Chrome nor Firefox's JSON.parse accept JSON5. I'm not sure what browser API you mean.

For example,

throws a syntax error.
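Python's stdlib parser is equally strict, so the same rejections can be sketched there (the inputs are arbitrary JSON5-only constructs):

```python
import json

# Unquoted keys, comments, and trailing commas are all JSON5-only,
# so a strict JSON parser rejects each -- just like JSON.parse does.
rejected = []
for text in ('{unquoted: 1}', '{"a": 1 /* note */}', '{"a": 1,}'):
    try:
        json.loads(text)
    except json.JSONDecodeError as e:
        rejected.append(text)
        print(f"{text!r}: {e.msg}")
```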

> Neither Chrome nor Firefox's JSON.parse accept JSON5. I'm not sure what browser API you mean.

This is a typo, I meant "can't" not "can". Of course browsers don't support JSON5, or my message makes no sense whatsoever.

The lack of template string support is a weird part of that.

In our JSON config files we do:

    {
      "ConfigKeyComment": "This is for blah blah blah",
      "ConfigKey": "Foo"
    }

Obviously this wouldn't work in all cases (you're putting more work on your parser to interpret unused keys basically), but if we're talking config files specifically, I see this as an acceptable approach since there's little chance you'll be parsing such files more than once each (plus, writing a simple tool to strip the comments out would be very trivial).
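That trivial stripping tool could be a few lines; a sketch assuming the convention above, that any key ending in "Comment" is documentation (the function name is made up):

```python
import json

def strip_comment_keys(value):
    """Recursively drop any dict key that ends with 'Comment'."""
    if isinstance(value, dict):
        return {k: strip_comment_keys(v)
                for k, v in value.items()
                if not k.endswith("Comment")}
    if isinstance(value, list):
        return [strip_comment_keys(v) for v in value]
    return value

config = json.loads('{"ConfigKeyComment": "This is for blah blah blah", '
                    '"ConfigKey": "Foo"}')
print(strip_comment_keys(config))   # {'ConfigKey': 'Foo'}
```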

I've tried something similar but found it way too painful if the comments need to be long...

    # Never enable this config, because if you do the space-time
    # continuum will collapse into itself and the cloud servers
    # will disappear in a puff of steam. However, if you really
    # must enable it, remember that it's boolean and go read
    # TICKET-8675309 for the extensive list of side effects.
    TurboFactorRenoberation = false
...so people just end up writing stuff like:

        "ConfigKeyComment": "TICKET-8675309",
        "ConfigKey": "TurboFactorRenoberation",
        "ConfigVal": false
[edit]: formatting

That's not JSON anymore, that's pretty much enough to be considered a unique DSL. Scary that this is more or less required.

Almost all applications evolve their config files into unique DSLs over time. They may choose a generic serialization for the DSL's AST but it will end up being an underspecified application specific DSL regardless.

Interesting approach.

Maybe I'm lazy, but avoiding any increase in the cost of commenting is one of the few absolutes I abide by. Often I find myself tired after a long stretch of code, trying to convince myself that it's understandable on its own.

This is one of those systematic rules I have to enforce to shut down my lazy lizard brain.

edit: But I can see how highly structured comments could actually come in handy as well for viewing configs in a gui

>Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser.

-Douglas Crockford, creator of JSON

There is no issue using JSON with comments for a config file.

Blurg, now every tooling pipeline that uses JSON needs to include a JSMin step...

If you're making comments for yourself in config files you control:



"hello // syntax error"

You are 100% correct. For multi-line comments, "//" inside the data structure, and everything else, there are already well-tested solutions to the problem. https://github.com/sindresorhus/strip-json-comments

My point is only that this isn't a big issue. I don't understand why so many see it as a large issue. Instead, projects use non-standard YAML or other problematic solutions only because "JSON doesn't have comments".

Yes, except for the fact that ECMAScript has a syntax for comments, but not JSON. Standards matter.

Or pipe through json5 which has other conveniences one might want like trailing commas.

I love json5, but it's not always an option.

Yes, but ITT we're talking about someone apparently having the option to pick which format they're using for config, and they'd use JSON if it wasn't for one dealbreaker.

> JSON lacks comments (both line-level and block level)

    "_comment": "blah blah blah",

That doesn't work if you use a strict parser where superfluous fields are an error. It's quite rare, but there are good use cases for that kind of strictness.

Or when no fields are superfluous. For instance, when the code iterates over all the fields and does something with each, instead of just looking for known keys.

Or when the place you need to put the comment doesn't happen to be inside an object literal.

This is ugly.

> In contrast, JSON is super intuitive and basically self documenting.

Personally, I've found the exact opposite when dealing with 'normal' people. Most people can get basic YAML, but unless they're programmers (or at least know how to program), most people fail miserably at writing JSON by hand.

I agree with this. Our biz guys have to edit JSON config files regularly, and they're limited to basically just copying/pasting lines from existing files and editing the values. When they need to be able to do more, we end up building a UI for it and either storing the config elsewhere or writing the code to manage persisting their changes to the config.

See how much fun these people have trying to figure out why their application crashes because they put a tab instead of spaces into the YAML file.

We use a YAML linter before attempting to load the file. Provides detailed feedback to the user: "Tab found at line 17, column 23"

Also detects repeated hash keys. This is a very good compromise between user-friendliness and machine-friendly specification and serialization.

Thank you very much for YAML! It is a critical user-interceptable interchange mechanism at several companies that I have worked at.

JSON has too many rules regarding comma placement.

> objects can't have a trailing comma

This has caused me so much misery in the past, especially since none of the tools will tell you which line the offending comma is on. Great, somewhere in my thousand-plus-line JSON file is a tiny syntax error but you won't tell me where.

Ended up having to regex for them. Didn't do wonders for my trust in JS tooling.

From the command line:

    python -m json.tool < somefile.json
This will tell you where your file is messed up.
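The same position information is available programmatically: Python's standard json module raises JSONDecodeError, which carries msg, lineno, and colno attributes pointing at the offending spot.

```python
import json

bad = '{\n  "a": 1,\n  "b": 2,\n}\n'  # trailing comma before the closing brace

try:
    json.loads(bad)
except json.JSONDecodeError as e:
    # msg, lineno and colno pinpoint the problem.
    print(f"{e.msg} at line {e.lineno}, column {e.colno}")
    # → Expecting property name enclosed in double quotes at line 4, column 1
```

So the "somewhere in my thousand-plus-line file" situation is solvable with a few lines of stdlib, no JS tooling required.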

There is also https://github.com/zaach/jsonlint

Thanks for the tip :)

You use JavaScript a lot more than python, right?

Intuitive usually means "close to what I'm used to".

I actually use Python a lot more than Javascript. That's why the double quotes and the no trailing comma are always what get me.

Especially since leaving the trailing comma is considered the best practice in every other language.

JS added support for a trailing comma, and single-vs-double quote is a contentious topic in Python land!

Python dicts are a lot closer to JSON than YAML.

The real schism here, IMO, is Programmer Intuitive vs. Natural Language Intuitive.

Second this. I have used a Python dict to write the default config file and JSON Schema to validate a user-supplied config file, which worked quite well for that purpose.

Python objects and JSON are practically identical.

In what sense could this be true? Python objects support a whole host of behavior; JSON is a data format. Python dicts might be a closer analogy except Python keys can be anything that is hashable while JSON requires strings, and of course Python dict values can be any Python value; not just the JSON analogs.

I think you are purposely mis-interpreting me. Python dicts are practically, syntactically identical to JSON. Yes, python dict values can be any python value the same way JSON in JS can be any JS value. Point being, someone coming from python would see JSON as identical to a python dict. We can run around in semantic circles all day.

> I think you are purposely mis-interpreting me.

You misrepresented yourself by saying "object" when you meant "dict" and saying "practically identical" when you meant "vaguely syntactically similar". I wasn't trying to nitpick your semantics; I just had no idea that "Python objects are practically identical to JSON" meant "Python dicts are to Python what JSON objects are to JS, oh and also Python dicts have some syntactic similarities to JSON" or whatever.

You sure are difficult. There is a non-trivial set of text that is both valid JSON and a valid python dict. Many people would consider the two very similar.
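For what it's worth, both the overlap and its limits are easy to demonstrate with the standard library (the literal strings below are just illustrative examples):

```python
import ast
import json

# Valid as both JSON and a Python dict literal, with the same result:
both = '{"name": "spam", "count": 3, "tags": ["a", "b"]}'
assert json.loads(both) == ast.literal_eval(both)

# Valid Python but invalid JSON: single quotes, trailing comma, None.
py_only = "{'name': 'spam', 'extra': None,}"
ast.literal_eval(py_only)           # fine
try:
    json.loads(py_only)
except json.JSONDecodeError:
    pass                            # the JSON parser rejects it

# Valid JSON but invalid Python: the true/false/null keywords.
json_only = '{"flag": true, "missing": null}'
json.loads(json_only)               # fine
try:
    ast.literal_eval(json_only)
except ValueError:
    pass                            # Python spells these True/None
```

So "a non-trivial set of text is valid as both" and "the two are not identical" are both true, which is probably where this subthread should have landed.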

Years ago I had to support a tool that used YAML as a configuration language, and a transport between different applications. Holy. Hell.

First of all, don't ever try to edit a YAML file by hand. You will introduce whitespace or other characters that will break the file, and you will not know until you run it and it breaks something.

The reason you will not know? Not all YAML parsers are the same. Some will interpret it correctly, and some will break. You'll have to get reference implementations of every "supported" YAML parser and run every config you have through them all, and diff them all, before you can trust them.

YAML may be easier to read than JSON, but its added complexity (the parser is significantly more complicated) and obtuse "features" are just not worth the effort. Not to mention, have you ever tried to maintain a very large indented YAML file by hand? Pain in the ass. Just shove everything into JSON files. The fact that it's so limiting is freeing, and everything can parse it. But don't edit it by hand.

And IMNSHO, you shouldn't use either YAML or JSON as a configuration language. They are for data structures, not configuration. If you want a configuration language, go get something designed as a configuration language.

I've been using Python enums to store static, non-sensitive config lately. Lets me store my data in a dict/JSON-like format while being able to write comments. Plus no need to do any IO to access variables! However, not really sure if this was the intended use case for Python enums.
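A rough sketch of that pattern, with made-up setting names (a plain module of constants or a dataclass would arguably be a more conventional fit for the same job):

```python
from enum import Enum

class AppConfig(Enum):
    """Static, non-sensitive settings (names are hypothetical).
    Unlike a JSON file, this can carry comments and needs no I/O."""
    RETRY_LIMIT = 3              # give up after three attempts
    TIMEOUT_SECONDS = 30.0       # per-request budget
    ENDPOINTS = ("api", "ui")    # a tuple, so the value stays immutable

# Accessible from anywhere via a normal import; no file parsing involved.
print(AppConfig.RETRY_LIMIT.value)  # → 3
```

The trade-off is that values are wrapped (note the .value access) and can't be changed without a redeploy, which is fine for truly static config.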

Well your last sentence is the whole point: What is a sensible configuration language? For example what would have been decent for Ansible?

A lot of JS tools now will just take a js file that exports a configuration object (`.prettierrc.js`, `.eslintrc.js`, `.babelrc.js`). I find it very sensible.

- Allows code reuse.

- Allows configuration to be as dynamic as you want.

- Can use environment variables.

I suppose there are some cases where you can't trust the user in this way (running configuration code), but I think in a lot of cases you can, and it's generally more convenient.

Good question.

For a sensible language, I would model something after Apache's. Simple, direct, easy to read, easy to write, easy to extend. It's like a server admin who barely knew HTML 1.0 wrote a config format. Perfect for the things it should actually be doing.

Another option is to take a simple format and extend it with another format or language. For example, you could add SQL to a simple file format, and suddenly tons of people can extend the config with some complex logic. But I also think templating and macro languages should generally die in a fire.

INI files aren't bad. They aren't a language, but they are good for simple use cases and a flat structure. Yes, you can have hierarchical section names, but it's a pain. If you want to use INI, you should probably use TOML. But there's very little incentive to add a TOML parser to a simple app when they could just suck in a JSON file. (Personally, I use JSON files, but only because I'm lazy, not because it's a good idea)

The biggest problem with things like Ansible is they'll give you enough rope to hang yourself. First you get defeated by whitespace. Then you get defeated by the stupid YAML rules. Then you get defeated by complexity like inheritance, namespace conflicts, and the shittiest debugging output ever. Then you get Jinja madness embedded inside Ansible madness inside YAML madness, and nobody knows how it works and can even touch it for fear of breaking everything. And of course, there is nothing that can parse it other than Ansible.

I think if Ansible had been TOML+Jinja it would have worked. It would have been ugly and clunky, but it would have worked. (The engine itself and their stupid rules about structuring your project should also die in a fire, but that's a different subject)

Chef just uses Ruby.

Nobody talks about SDLang (Simple Declarative Language): https://sdlang.org/

An example :


    // This is a node with a single string value
    title "Hello, World"

    // Multiple values are supported, too
    bookmarks 12 15 188 1234

    // Nodes can have attributes
    author "Peter Parker" email="peter@example.org" active=true

    // Nodes can be arbitrarily nested
    contents {
        section "First section" {
            paragraph "This is the first paragraph"
            paragraph "This is the second paragraph"
        }
    }

    // Anonymous nodes are supported
    "This text is the value of an anonymous node!"

    // This makes things like matrix definitions very convenient
    matrix {
        1 0 0
        0 1 0
        0 0 1
    }

One thing I dislike about it at a glance is:

    author "Peter Parker" email="peter@example.org" active=true
This is like XML attributes, which I've always found annoying to deal with in programs. It doesn't really map to any native data structure in most (all?) programming languages, so you need a special class/struct which supports it.

Simply using something that maps directly to a hash map/object/associative array would be much better, IMHO.

Other than that, it looks like an interesting project.

Actually it is even a superset of XML, from the docs...

    SDL documents are made up of Tags. A Tag contains

    * a name (if not present, the name "content" is used)
    * a namespace (optional)
    * 0 or more values (optional)
    * 0 or more attributes (optional)
    * 0 or more children (optional)
So it's like an XML node, but the `0 or more values` means it has a list/array for a "body".

At least in the case of the DLang implementation, it uses a DOM API to access the values. https://github.com/Abscissa/SDLang-D/blob/master/HOWTO.md

I don't get the matrix example. Why is it different from key 1 with multiple values 0, 0, and key 0 with values either 1, 0 or 0, 1?

SDL identifiers start with a letter or underscore, and a key must be a valid identifier.

Interesting. They don't seem to have a python implementation surprisingly.

I have not seen this before, thanks for sharing.

The fact is that it's old. I discovered it after seeing that dub (the DLang build & dependency tool) files can use JSON or SDLang.

I'm gonna continue using YAML, like, even if each parser came with support for a halt-and-catch-fire directive that you couldn't turn off or whatever. It's just about the only markup language where you can embed multiline strings without the indentation being fucked either in the markup or in the resulting string, without requiring lots of escaping.

I am sad that EDN hasn’t achieved popularity as a format. It seems like a better specified, less verbose format. As a bonus it plays well with Paredit-like editor modes. Alas, the curse of being better, but later.

EDN is nice, but the curly brackets are a hassle when editing configs. That's why I prefer using yaml to configure my Clojure apps.

We've spent like 10 years trying to fill in gaps left when we all decided to hate XML. JSON is great as a lightweight DIF between trusted partners. If you care about maintenance and safety, XML with XSD is rock solid.

I don't know, XML is awfully verbose and the schemas are even more verbose. I've lost track of how many "XML" configuration files looked like this:

So that they could pass schema validation and still have some hope of extensibility.

A few weeks ago, I had about 100 config files (tomcat context.xml) which all needed fixes for common misconfigurations (if they had the misconfigurations in the first place). The kind of problem that is just a little bit too hard for search and replace. It was really easy with XSLT. The result had all comments preserved. I chose to reformat the files, but keeping whitespace was an option too. Now tell me if you can do that with JSON, YAML, or TOML. Most parsers simply forget about the comments to begin with.

In the same way, if we receive a data transfer in XML and there is a schema, simple validation catches a lot of problems quickly. You'd be surprised how many times a company gives you a schema and then sends you xml which doesn't validate. In JSON, you have to write a program to get even basic validation.

Don't get me wrong: XML has problems, some inherited from html/sgml (entities!), and even more after serious abuse by consultants, architecture astronauts and enterprise vendors (SOAP! namespace overuse! 10 XML parsers in 1 app!). But it was also miles better than what came before and I feel the XML hatred pendulum has swung too far.

Today, JSON is in vogue, and I've seen enough IT to not swim against the tide. It is a reasonable solution for problems caused by XML abuse. Besides, there is value in going with the majority, even if it only fixes 80% of your problem. But I can only weep for the miserable date, numeric and comment support, and their endless stream of incompatible workarounds.

For your parameter example: You can't both strictly validate and have full freedom at the same time. Something has to give a bit. Some less horrible alternatives I've seen:

  <parameter name="X" value="Y"/>

  <subsystem name1="value1" name2="value2" ... />, add a newline for each attribute


What problems wrt entities does XML have that it has inherited from SGML and HTML? Do you mean entity expansion attacks such as billion laughs? HTML has only character rather than general entity references, and SGML has had the ENTLVL capacity to bound entity reference nesting since the year 1986.

Edit: XML is just a proper subset of SGML by definition, hence it didn't introduce a single thing that wasn't there before. It only introduced XML-style empty elements and DTD-less markup, and SGML was extended in lockstep with XML to support these as well.

I'd consider user-definable entities a problem, as you can't read a file without knowing the DTD. Billion laughs is just a very ugly bonus.

XML is more than the part inherited from SGML, it's also the XML culture surrounding it. Namespaces are an example of something that created an XML dialect. And of course SOAP, which actually needs the WS-I standard to explain what parts of the WS-* standards to use or ignore, and how to interpret them. And even then 2 WS-I stacks will rarely interop without trouble. Let's not blame SGML for that monstrosity.

What's wrong with namespaces? These are simply globally unique identifiers that allow us to define and use our own globally unique names and make them passably human-readable. That is:

  <a:log /><b:log /><c:log /> 
can mean a math function, a text file that records what's happening, and a cut-off trunk of a tree and there will be no confusion whatsoever.

(I really don't get how SOAP is relevant here.)

I would definitely use RelaxNG for specifying schemas instead of XSD - it's simpler both to read and write in every case I've tried, and the resulting schema is smaller as well, often by a lot.

  <element name="addressBook">
    <element name="card">
      <attribute name="name"/>
      <attribute name="email"/>
    </element>
  </element>

Plus there's a non-XML "compressed" version as well:

  element addressBook {
    element card {
      element name { text },
      element email { text }
    }
  }
But I agree that XML like you posted is nasty; it's no harder to write

  <parameter name="ApplicationName">WhizBang</parameter>
instead if you need generic parameters, or

if not.

Why not just

<parameter name="ApplicationName" value="WhizBang"/> ?

That's the age-old question, isn't it?

Your option is better, but XML is very (maybe too) flexible and is bound to be made a mess of.

I've seen some pretty messed up schemas in JSON and YAML, too.

That's still worse than <ApplicationName>WhizBang</ApplicationName>. Or


XML is verbose, and overcomplicated. But most purported replacements trimmed too much.

To be quite honest I dislike using XML for human editable configuration files. Variations on Microsoft's .ini files (such as TOML) seem to work best for that, IMHO.

We use .ini files for all of our settings in our products, and they work great most of the time. The only weirdness creeps in when you try to store things with embedded CRLFs and need to escape/unescape them (not a big deal), and storing lists of things is a little difficult.
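The usual workaround for the list problem is a delimited string, sketched here with Python's stdlib configparser (the section and key names are made up):

```python
import configparser

cp = configparser.ConfigParser()
cp.read_string("""\
[server]
host = 0.0.0.0
port = 8080
# no native list type: a delimited string is the usual compromise
allowed_origins = a.example.com, b.example.com
""")

# Every value comes back as a string; lists need manual splitting.
origins = [o.strip() for o in cp["server"]["allowed_origins"].split(",")]
print(origins)  # → ['a.example.com', 'b.example.com']
```

That everything is a string until you convert it is the flip side of INI's simplicity, which is exactly the kind of thing TOML fixes.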

JSON is great in terms of flexibility, but .INI files are really easy to read because everything is on the left side of the screen/window at all times.

XML is in that odd middle-ground where it's usually human-readable, but also a huge pain in the ass to write. It's great at what it was intended for, as a data interchange format.

For a general-purpose human-writable structured data format, I guess the ugly nonstandard hack that is "JSON with comments" is probably good. It's certainly faster to parse than YAML.

XML wasn't intended as data interchange format, but for replacing SGML as serialization and markup meta-language on the Web (eg. for XHTML, SVG, MathML). It can't be said often enough that markup languages are for authoring and delivering semistructured text data, not for general-purpose data serialization. As in, editing plain text files and have your text treated as content unless marked up with markup and annotated with metadata attributes. Though this is much more pronounced in SGML which also contains the features for authoring (as opposed to delivery) omitted from XML such as tag omission/inference, custom Wiki syntaxes, and other short forms.

Well, what is the purpose of markup languages? Isn't the sole purpose of markup to be able to process the marked-up content with a computer? Why would you add markup to your favorite verse if it wasn't to somehow feed it to a machine for some purpose (analyze, typeset, etc.)?

So when we have text with markup the text part is meant to be there for humans and the markup part is solely for computers. Now let's remove all text; now there's no content for humans at all, only for computers. How is this different from general-purpose data serialization?

(Some of the samples you give, like SVG, may not have any text content at all; it's basically a drawing language.)

Given that XML ecosystem has quite a few tools (e.g. several type description languages or a declarative data transformation language just to name a few) it's a very good general-purpose data serialization format.

I've managed to teach non-programmers to successfully edit YAML files without too much trouble, but most non-programmers have a really hard time consistently producing valid JSON by hand.

As much as there is a lot to not like about YAML, it is the easiest one for humans to consistently write in my experience.

But why should they have to do it by hand?

We don't judge Excel and Librecalc by how easy it is to open their files and produce valid spreadsheets without good tooling.

If they're working with structured data, why can't they use tools/editors which work with the structure, reveal it, and enforce it?

> It's great at what it was intended for, as a data interchange format.

XML's incredible verbosity is a problem for computers too. I've spent time performance-tuning message parsing code that had no good reason to be slow except that our use of XML bloated the data and decoding time by an order of magnitude or more compared to a binary protocol with a schema.

In my experience, if you are using XML as a data interchange format and it's slow, it's probably because you are using a DOM parser instead of a SAX parser. DOM parsers build a tree that is well suited to describing a marked-up document and much less useful for describing a data structure you might want to serialize or deserialize.

I've gotten incredible speedups just by switching to SAX parsing in those cases.
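To illustrate the difference, here's a minimal event-driven parse with Python's stdlib xml.sax: the handler reacts to events and never materializes a tree (the element names are made up):

```python
import xml.sax

class ItemCounter(xml.sax.ContentHandler):
    """Streaming handler: reacts to parse events, builds no tree."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.count += 1

# A document a DOM parser would fully materialize in memory:
doc = "<items>" + "<item/>" * 1000 + "</items>"

handler = ItemCounter()
xml.sax.parseString(doc.encode(), handler)
print(handler.count)  # → 1000
```

Memory use stays flat no matter how large the document gets, which is where the speedups on big interchange payloads tend to come from.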

My biggest complaint with JSON is the lack of support for comments. For that reason, it's hard to take it seriously for human-maintained configurations.

Comments and trailing commas. If those two features were added, I would use JSON for configuring everything. Naked keys would be a distant third. My conclusion is to use TOML or the protocol buffer text format.

I've been slowly ripping out YAML support and converting configurations to TOML.

Not just trailing commas, but the need for commas at all when there is a newline right next to them has been a source of many stupid issues for me when less knowledgeable/experienced people edit JSON-based conf files.

Then there are floats without a leading zero. Missing colons after the key. And yea, naked keys. The need to wrap the entire file in { } or [ ] is just icing.

Honestly I feel the most bare simple conf format of [first-word] [rest-of-line] is enough for many programs that end up using but never taking advantage of more powerful formats.
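That format is small enough that a complete parser fits in a few lines. A hedged sketch in Python (details like '#' comments and repeated-key accumulation are my own choices, not part of any spec):

```python
def parse_simple_conf(text):
    """Parse a '[first-word] [rest-of-line]' config: one directive per
    line, '#' comments, repeated keys accumulate into a list."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(" ")
        conf.setdefault(key, []).append(value.strip())
    return conf

example = """\
# listen address and port
listen 0.0.0.0:8080
# repeated directives accumulate
module auth
module logging
"""
print(parse_simple_conf(example))
# → {'listen': ['0.0.0.0:8080'], 'module': ['auth', 'logging']}
```

There's no nesting, no typing, no escaping, and for a lot of programs that is genuinely enough.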

> but the need for commas at all when there is a newline right next to it

JSON is often minified though - you're going to need something to use as a delimiter

Newline and comma are both 1 byte. You could specify that either is acceptable.

Actually, I think you could leave out the delimiter altogether and still be syntactically unambiguous, since quotes are required around keys:

though this looks terrible and there's probably some edge case I've forgotten. (Also it misses the point of JSON in that it's no longer valid JS. I don't know whether that's important anymore since you should be calling JSON.parse() not eval() anyway.)

> you should be calling JSON.parse() not eval()

The one major reason I could see to use "JSON" as a conf file is in trusted node.js apps because you can then easily embed functions and logic in them if you need more advanced/customizable configurations. And you can do it with full syntax highlighting in your editor. And comments, and trailing commas, and naked keys.

Of course this is no longer JSON, it's straight up Javascript config files. But it has come in handy a few times when I want to override standard behavior on a per config basis, and most of the file is still just plain key: val

A newline character is just as minimal as any other ASCII delimiter.

I would hope minified JSON is not being used for conf files.

I got hit by lack of comment support in go-yaml recently.


"comment": "blah" is one hack.

it's a shame json doesn't support them though. oh well. would be nice to restart the universe and get all this right next time. :-)

I get a lot of flak for this, but there are definitely times I miss XML for certain things and find it way easier to work with than JSON or YAML. I definitely understand some of the backlash against XML that happened a decade or so ago and definitely don't want to return to the days of half of a Java application being XML code.

I think JSON is more efficient to write, but XML often ends up being more efficient to read due to comments and the fact that XML tags often give you better context. I think most programmers (myself included) tend to heavily optimize towards writability when we should think about readability a little more.

An example of this is ElasticSearch, where your queries are in JSON and often end up tons of levels deep - it is super easy to get lost in a sea of closing brackets, whereas XML would let you add comments in and the fact that closing tags have names in them would give you better context about what you were doing.

Well, if you go deeper down the rabbit hole, XML was a complete and utter waste of time. This problem was solved in the 60's with S-Expressions.

> This problem was solved in the 60's with S-Expressions.

Not so much. Sexps don't provide a place to hang "extra" information. It's been a pain point. While some lisps allowed decorating runtime things (eg objects with attributes, and symbols with property lists), their printed/readable representations were implementation dependent.

There's also a widespread misconception that Scheme is easy to parse. Numbers and all. It's actually very hard to get right. Real scheme parsers are quite large and hairy.

> XML was a complete and utter waste of time.

While XML was ghastly, there was an unmet need. There still is.

Yeah... I'm a diehard Common Lisp user, and when I saw YAML+go-template used for Kubernetes Helm templates, with some extra hacks to take care of indentation shifts... I felt almost physical pain.

XML has many of the same security problems that YAML does, and even some additional ones.

XML isn't bad, but everyone went crazy and decided to use it for everything which didn't work out well. Then the backlash happened.

Safety? Sounds like rosy retrospection to me. We're still dealing with XML vulnerabilities.

Safety? The OWASP Top 10 doesn't feature a JSON security issue.


DTD and SOAP are both awful.

Especially when technologies like E4X came out to make writing and reading XML in JS so much easier. Although I don't miss SOAP at all.

What is a DIF?

"data interchange format"

Data Interchange Format
