
One thing to remember is that YAML is about 20 years old. It was created when XML was at peak popularity. JSON didn't exist (YAML is a parallel, contemporary effort). Even articulating the problems with XML's approach was an uphill battle. Figuring out what you would replace it with was also hard. What use cases matter? What is the core model? A simple hierarchy? Typed nodes? A graph? What sort of syntax is needed for it to be usable? These were all open questions. Seen in context, we got quite a bit correct. And yes... it has a few embarrassing warts and a few deep problems. Ah well.

A second thing to consider... YAML was created before it was common for tech companies to actively contribute to open source development. There are lots of things we could have done differently if we had more than a few hours per week... even a tiny bit of financial support would have helped.

Finally, YAML isn't just a spec, it has multiple implementations. Getting consensus among the excellent contributors is a team effort, and particularly challenging when no one is getting paid for the work. Once you have a few implementations and dependent applications, you're kinda stuck in time.

It was a special pleasure for me to have had the opportunity to work with such amazing collaborators.

We did it gratis. We are so glad that so many have found it useful.




I am the author of this article. Apparently people read my website (how they get there, I don't know?)

At any rate, it's worth mentioning that in the conclusion I wrote:

> Don’t get me wrong, it’s not like YAML is absolutely terrible but it’s not exactly great either.

I still use YAML myself even when I have the freedom to use something else simply because – for better or worse – it's very widespread, and for many tasks it's "good enough". For other tasks, I prefer to avoid it.

I think that a stricter version of YAML (such as StrictYAML) would make a lot of people's lives easier though.


I think you're probably right there. I use YAML when something else I'm using calls for it, but mainly I tend to output things in it just because it's very readable.

Using it a lot more lately as I'm diving into Ansible, so I'll be interested to see if I run into problems.


It is particularly unfortunate that ansible uses yaml because if infrastructure is going to be code, some day you will surely want to refactor.


The trend of "stick YAML and a template engine together, we have our DSL!" in CM systems is a bit horrible.

Ansible does make some effort to limit Jinja templating to variable substitution, but it's still not that great; all kinds of weird stuff can happen, especially with colons.

The worst one is Saltstack: the resulting syntax is just atrocious and borderline unreadable. I'm not a big fan of map.jinja files [0], and on the YAML side things can get ugly quite fast [1].

I know it's not a popular opinion, but I would rather use the Puppet DSL, even with its steep learning curve.

[0] https://github.com/saltstack-formulas/salt-formula/blob/mast...

[1] https://github.com/saltstack-formulas/mysql-formula/blob/mas...


The whole concept of a templating language on top of YAML is suspect anyway, but I wish Salt had just gone with Mako as the default templating language. That way you could write plain Python in your templates and not have this horrible misuse of Jinja.

I also have come to agree that a DSL is the best solution, though Puppet's particular DSL is not a great example. Projects that re-implement the same thing from scratch like mgmt[1] are on the right track, but probably won't gain enough traction.

[1] https://github.com/purpleidea/mgmt/blob/master/docs/language...


I love the clean style of your website.


Thanks! Last time I checked, my domain got penalized for having abnormally low markup or some such, which apparently makes it look like a spam site. I am proud of this.


> Last time I checked, my domain got penalized for having abnormally low markup or some such

Do you have a link to the document you were pointed to when you got penalized? If it was Google who penalized you, they must have pointed you to a URL with documentation on why you got penalized and how to resolve it.

I ask this because I run a few websites with even lesser markup than your site but I have never got penalized. I once got penalized due to excessive number of spam comments on one of my websites and they pointed me to https://support.google.com/websearch/answer/190597 ("Remove this message from your site") to resolve the issue. This issue did not affect the search ranking much though (dropped by only about 2 or 3 places in the list of results). But never had an issue with abnormally low markup.

The markup in your website looks pretty reasonable to me, so I am surprised you could get penalized for that when I have had no issues with even lesser markup and they still appear at the top of the list of results for relevant search terms.


I think it was some tool at moz.com, but I don't recall off the top of my head. I don't think it was Google itself. I have no idea what effect that has; I'm not really into that world.

> I have had no issues with even lesser markup and they still appear at the top of the list of results for relevant search terms.

It seems people are finding my site, whether or not it's being penalized. I mean, someone other than me posted it here, right?


What a time to be alive


Penalised? By whom?


I would assume tehGoog. Just to make it hard to find low impact sites that aren't AMP.


Oh I see. Thanks.


The only search engine in town. You already know its name.


Oh you mean Ask Jeeves?


No, he means Hotbot.


Drupal 8 uses YAML* as its configuration language because JSON doesn't support comments. That simple. Thank you for YAML, it does deliver for us: it's human readable and it's easy to parse (see below).

* I mean, it uses an ill defined subset of YAML. The definition is "whatever the Symfony YAML parser supports".


You know what else is human readable, easy to parse if you're using PHP, and supports comments?

PHP.

I understand why some languages rely on common configuration file formats.

I don't understand why the popular dynamic script-y languages don't more commonly use the natively-expressable associative/list data structures that they're famous for making convenient.


Using includes/imports is not the greatest idea ever.

Your configuration file is one of your program's interfaces. It's something that must be well defined. If your configuration file is a programming language, this interface is not that well defined.

Also, you expose yourself to all kinds of weird bugs, because some (too smart for their own good) people will monkey patch your software using it.

It also adds a lot of unnecessary stuff to the configuration file; things like ';' or '$' are not really useful.

Lastly, common configuration file formats are good because they are... common. You can have two pieces of software in two different languages accessing the same configuration file. A common example of that is configuration management: there are a lot of modules/formulas in Salt/Ansible/Puppet/Chef that parse configuration files and permit fine-grained settings, and I'm not even mentioning Augeas. If your configuration is a PHP/Python/Perl/Ruby file, good luck with that.

I know it's really common for php applications to do configuration files in php, but frankly, it's a bit annoying.


> If your configuration file is a programming language, this interface is not that well defined.

While I do agree with the rest of your comment I don't think they were advocating using the full language for configuration, just the maps/arrays/etc. (e.g. Python's `literal_eval`).
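
For instance, something in the spirit of this minimal sketch (the filename and keys are made up for illustration):

  import ast

  # settings.cfg contains only Python literals, e.g.:
  #   {'key1': 'value1', 'retries': 3}
  with open('settings.cfg') as f:
      config = ast.literal_eval(f.read())

  # literal_eval accepts dicts, lists, strings, numbers, booleans and None,
  # but refuses function calls, imports and other executable code.
  print(config['retries'])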


True, but some users will use the full language.

Something like:

  config = {'key1': 'value1', 'key2': 'value2'}

could be written as:

  config = {}
  config['key1'] = 'value1'
  config['key2'] = 'value2'

With large chunks of code possible between the three lines.

It basically transforms the configuration file into an API like any library, which is not really what you want for an end user program.


If a key objection/perceived threat is that this might give someone an insertion point they're not meant to use for code ... well, let's consider that we're talking about applications distributed as interpreted language source here. Disallowing code-as-config isn't even closing the door of this particular barn after the horse has left, it's putting two strands of police line tape across the bottom half of the gap where the door was never installed and hoping any equines thinking of passage politely consider the message in case it hadn't already occurred to them which side of the entrance they preferred to be on.

Consider this: Design and optimize for the common case.

Why do we have config files? Because developers actually want a place dedicated to simple or structured application configuration data, for which PHP assignments with arrays + primitives can function at least as effectively as JSON. Most developers would prefer that config data get loaded quickly so the application can get on to doing actual app-y things. Using the language for this means you're parsing at least as fast as you can interpret and you can also take advantage of any code caching that's part of your deployment (especially nice in the PHP-likely event that config settings would be reloaded with every request).

Abuse isn't likely to be the common case. The end users you invoked certainly aren't going to be the ones looking for opportunities to insert code over data. Developers have other places to put code and, as mentioned, probably actually want a place dedicated to data. You're still right that of course someone will do it, just like someone will inevitably create astronaut architecture hierarchy monstrosities in any language with classical inheritance or make potentially hidden/scary changes to language function using metaprogramming facilities.

But potential for abuse doesn't automatically mean a feature should be disallowed.

A lot of the time it's better to let people who can be circumspect have the benefits of a potential approach, and if somebody thinks they need to solve a problem by using a technique that's arguably abuse, well, let them either find out why it's a bad idea or enjoy having solved their problem in an unusual way. Not the end of the world. Possibly even legit.


You can use arbitrary tools to programmatically generate YAML (or JSON, or XML, any of the other "data only" formats.) This allows for tools to drive other tools by generating a spec file and feeding it in. See e.g. Kubernetes for a good example of that.
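
A minimal sketch of that kind of generation with PyYAML (the resource name and values here are made up):

  import yaml

  spec = {
      "apiVersion": "v1",
      "kind": "ConfigMap",
      "metadata": {"name": "example-config"},
      "data": {"LOG_LEVEL": "debug"},
  }
  # Any YAML-consuming tool can now be fed this output.
  print(yaml.safe_dump(spec, default_flow_style=False))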

There's no language that I'm aware of that can natively generate PHP syntax, and there's no common multi-language-platform library for generating PHP syntax. I think that's most of the reason.

To contradict myself, though: Ruby encodes Gemfiles and Rakefiles as Ruby syntax. And Elixir encodes Mixfiles, Mix.Config files, Distillery release-config files, and a bunch of other common data formats as Elixir syntax.

And, of course, pretty much every Lisp just serializes the sexpr representation of the live config for its config format (which means that, frequently, a lot of Lisp runs code at VM-bootstrap time, because people write Turing-complete config files.)


> There's no language that I'm aware of that can natively generate PHP syntax

This is a solid argument against using PHP (or any such language) as a cross-language data interchange format. There are others :) And I totally agree you want a language independent format for anything you might have to feed across an ecosystem of tools.

For a PHP-system generating/altering its own config files... PHP's `var_export` generates a PHP-parseable string representation of a variable (though it sadly doesn't use the short array syntax).

Turing-complete config files probably have some hazards, like Lisp itself does. YMMV regarding whether those hazards can be avoided by circumspect developers or need to be fenced off.


You don't know when you'll need to generate or parse your config files with something that either can't read, write or execute your language.

Django's settings.py sucks. I've used Django since the 0.9 days. It's extremely impractical and needs to be worked around constantly.


This, and the security problems of executable code as configuration, are why the OpenBSD people mandate that /etc/rc.conf is not general-purpose shell script, and why the systemd people mandate that /etc/os-release is similarly not. People want to be able to parse configuration files like this with something other than fully-fledged shell language interpreters; and they want these things to not be vectors for command injections.

* https://unix.stackexchange.com/a/433245/5132


Settings.py is uniquely bad, though, IMO because it tries to be a badly defined dict(), instead of exposing proper configuration interfaces. Ruby config files are common and usually fairly great, see for example the Vagrantfiles.

And you won't have to generate your config files (parsing, maaaaaybe), because those needs are covered by the fact that the files are programs. They are _already_ generating a configuration.


> And you won't have to generate your config files (parsing, maaaaaybe), because those needs are covered by the fact that the files are programs. They are _already_ generating a configuration.

Yes, theoretically, if settings.py was a "generator" format that you ran as a pre-step (like you do to get parser-generators like Bison to spit out source files for you to work with), and this generator actually spat out something like a settings.json, and all the rest of the infrastructure actually dealt with the settings.json rather than the generator, then, yes, it wouldn't matter. Tools in other languages could just generate the settings.json directly.

As it stands, none of those things are true, so tools in other languages actually need to do something that outputs settings.py files.


Galaxy brain: if your config is programmable, it can read whatever terrible configuration format you want. That means my settings.py (yes, I'm forced to use Django) is configured via environment, which is populated by k8s from - gasp - JSON files.

That means that if I wanted to configure Vagrant with JSON, there is no force in the universe that could stop me.

If the config file is actually a normal program, then it can do normal program things, and any benefit from using JSON instead is nullified by the fact that you can still use JSON. In turn, if your tool's primary configuration is a more limited format, you're stuck with it. Not even "generators in other languages" allow comparable runtime flexibility.


Yup, totally agree with you, settings.py has always been a pain in the ass. Not really an acute one but the kind that is uncomfortable but not enough to make you do something about it.


> There's no language that I'm aware of that can natively generate PHP syntax.

Actually, I've had to use PHP to output a PHP configuration array for a project that required config in PHP.

`var_export($foo)` will output valid PHP code for creating the array $foo. In my case I was doing horrible things to create the array in my pseudo-makefile, then using `var_export()` to output the result. Note that you can run php from the Bash CLI with the `-r` flag, which helps.


Tcl works well for configuration files. You can strip away the extraneous commands in a sub-interpreter to prevent Turing completeness and add infix assignment to remove the monotony of the set command and what you get is a nice config format. If you need more power in the future you just relax some of the restrictions and use it as a script without breaking existing files.


People get really upset when they have to type "array(" instead of "[" or "{" (pre-PHP 5.something) and quotes instead of no quotes (and punting the character escape problem to something else) I guess.

Using code-as-data works really well in Lisp-like languages. Reading a Clojure project's project.clj file or a Lisp project's project.asdf file is pretty pleasant. A programming language's choice in how it decides to handle library config info for building and specifying dependencies (XML, makefiles, JSON, YAML, INI, nothing, etc...) will be a good indicator for the culture of the language around config files in general. Composer for PHP only came out in 2012.


Interestingly, the Lua programming language actually evolved from configuration files: https://www.lua.org/history.html (and is still officially deemed useful for writing them)


I use Lua for configuration files for both personal and work related projects [1]. You get comments and the ability to construct strings piecemeal (DRY and all that). It's easy to sandbox the environment, and while you can't protect against everything (basically, a configuration script can go into an infinite loop), if someone unauthorized does have access to the script, you have bigger things to worry about.

[1] An example: https://github.com/spc476/mod_blog/blob/master/journal/blog....


You can set a count hook to defend against infinite loops.

Lua is great.


That was also one of the rationales behind TCL's design.

John Ousterhout explained in one of his early TCL papers that, as a "Tool Command Language" like the shell but unlike Lisp, arguments were treated as quoted literals by default (presuming that to be the common case), so you don't have to put quotes around most strings, and you have to use punctuation like ${}[] to evaluate expressions.

TCL's syntax is optimized for calling functions with literal parameters to create and configure objects, like a declarative configuration file. And it's often used that way with Tk to create and configure a bunch of user interface widgets.

Oliver Steele has written some interesting stuff about "Instance-First Development" and how it applies to the XML/JavaScript based OpenLaszlo programming language, and other prototype based languages.

Instance-First Development: https://blog.osteele.com/2004/03/classes-and-prototypes/

>The equivalence between the two programs above supports a development strategy I call instance-first development. In instance-first development, one implements functionality for a single instance, and then refactors the instance into a class that supports multiple instances.

>[...] In defining the semantics of LZX class definitions, I found the following principle useful:

>Instance substitution principal: An instance of a class can be replaced by the definition of the instance, without changing the program semantics.

In OpenLaszlo, you can create trees of nested instances with XML tags, and when you define a class, its name becomes an XML tag you can use to create instances of that class.

That lets you create your own domain specific declarative XML languages for creating and configuring objects (using constraint expressions and XML data binding, which makes it very powerful).

The syntax for creating a bunch of objects is parallel to the syntax of declaring a class that creates the same objects.

So you can start by just creating a bunch of stuff in "instance space", then later on as you see the need, easily and incrementally convert only the parts of it you want to reuse and abstract into classes.

What is OpenLaszlo, and what's it good for? http://www.donhopkins.com/drupal/node/124

Constraints and Prototypes in Garnet and Laszlo: http://www.donhopkins.com/drupal/node/69


In our Tcl based application server (many eons ago), we followed exactly that approach.

All configuration files were Tcl data structures that were sourced on server start.


>I don't understand why the popular dynamic script-y languages don't more commonly use the natively-expressable associative/list data structures that they're famous for making convenient.

You picked the wrong language... PHP comes with its own JSON parser. And INI and XML and even CSV.

But, the reason is that, generally, you want config files to describe data or state only. Yes, you could just make your config native code, but then the temptation to add functions and methods and logic to that becomes irresistible and soon your config is an application that needs its own config.

Config formats need to be simple, and preferably not Turing complete.


Any configuration will eventually become a programming language.

See The Configuration Complexity Clock. https://mikehadlow.blogspot.com/2012/05/configuration-comple...


Just because it's common doesn't mean it should be encouraged by writing your config in native code to begin with.

INI is still simple, and JSON doesn't support logic, so the madness can be held at bay at least for a time.

XML and s-expressions are lost causes, though.


Because it's just in general incredibly short sighted to think that your config file is never going to be read by code written in another language.

There's also an argument about whether making configuration files able to execute arbitrary code is a good idea. You get straight into the JavaScript 'eval' problems which we've spent a decade escaping.


Arbitrary code execution in configuration files has caused a few vulnerabilities in Wordpress extensions already, so yes, it's a terrible idea.


I think some of it is PLOP (Principle of Least Power).

  $CFG = random() > 0.5 ? "yes" : "no";

...is likely "too powerful". It'd be nice if there were ways in certain programming languages to do something like "drop privileges" to avoid loops, function calls, external access, etc.


The makers of Drush, the cli for Drupal, subscribed to your line of thinking in the early versions and inventory items were defined in PHP files. Migrating from that will be interesting.


Because that forces the end user, who might not know anything about the programming language one’s application is written in to wrestle with the low level implementation details. In the words of Keith Wesolowski, the programmer assumes that the end user is a “Linux superengineer”, which is almost always a wrong assumption to make.


I have linked this https://groups.drupal.org/node/159044 elsewhere. Please note PHP was considered.


Fully agree with you, I have done so multiple times.


I totally agree that in the ideal world, JSON should support comments. I yearn for them, and none of the in-band work-arounds or post-processing tools are acceptable substitutes.

But to play the devil's advocate, how would JSON be able to support round-tripping comments like XML can, since <!-- comments --> are part of the DOM model that you can read and write, while JSON // and /* comments */ are invisible to JavaScript programs. There's nowhere to store the comments in the JSON object model, which you would need to be able to write them back out later!

One important feature of JSON is being able to read and write JSON files with full fidelity and not lose any information like comments. XML can do that, but JSON can't. To fix that you'd have to go back and redesign (and vastly complicate) fundamental JavaScript objects and arrays and values, to be as complex and byzantine as the DOM API.

The less-than-ideal situation we're in isn't JSON's fault or JavaScript's fault, because JSON is just a post-hoc formalization of something that was designed for a different purpose. But JSON is rightly more popular than XML, because it's extremely simple, and nicely impedance matched with many popular languages.

YAML suffers from the same problem as JSON that it can't round-trip comments like XML can, but it fails to be as simple as JSON, is almost as complex as XML, and doesn't even map directly to many popular languages (as the article points out, you can't use a list as a dict key in Python, PHP, JavaScript, or Go, etc).

You can sidestep some of JSON's problems by representing JSON as outlines and tables in spreadsheets, without any need for syntax and sigils like brackets, braces, commas, no commas, quoting, escaping, tabs, spaces, etc, but in a way that supports rich formatted comments and content (you can even paste pictures and live charts into most spreadsheets if you like), and even dynamic transformations with spreadsheet expressions and JavaScript.

See my comments about that in this and another article: https://news.ycombinator.com/item?id=17360071 https://news.ycombinator.com/item?id=17309132


> But to play the devil's advocate, how would JSON be able to support round-tripping comments like XML can, since <!-- comments --> are part of the DOM model that you can read and write, while JSON // and /* comments */ are invisible to JavaScript programs.

It doesn't support it for whitespace in general (if you deserialize into JS object model or equivalent), so why would it be any different for comments specifically? It's just not a design goal of the format.

Although, of course, it's quite possible to have a JSON parser that preserves representation. It'll just have a non-obvious mapping to the host language because of all the comment and whitespace nodes etc.


> "YAML ... is almost as complex as XML"

In fact YAML is probably more complex than XML; the specification of YAML, when I print it into PDF, is about three times as long as that of XML 1.0. (And XML 1.0 also describes DTD, which is kind of a simple type validation for XML and thus includes much more than just serialization syntax.)


Zish https://github.com/tlocke/zish supports comments (but they're not round-trip comments) and also extra data types such as timestamp and bytes.

It's still in its early stages, so if anyone's got any comments I'm interested in hearing them :-)


> YAML suffers from the same problem as JSON that it can't round-trip comments like XML can, [...]

While not mandated by the YAML specification, it doesn't prevent creation of a parser that round-trips comments.

In fact, the ruamel.yaml project for Python provides one.
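
Roughly like this (a sketch assuming ruamel.yaml is installed; the document is made up):

  import sys
  from ruamel.yaml import YAML

  text = '# deployment settings\nreplicas: 3  # bumped during an incident\nimage: nginx\n'

  yaml = YAML()                # round-trip mode is the default
  data = yaml.load(text)
  data['replicas'] = 5         # edit the data...
  yaml.dump(data, sys.stdout)  # ...and the comments come back out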


I'm kinda sad that JSON has been struggling for like 15 years to get comments. Is there like some kind of gestapo that's saying no or something? All it takes is for the maintainers of probably 15 popular libraries to start handling comments.

At the end of the day I'm sure the reason we don't have JSON comments is somewhere listed in this page: xkcd.com/927/


I'm aware of at least three JSON libraries that at least can accept comments (Gson in lenient mode, Json.NET, and json-cpp are the ones I've used personally that do)-- it's hard to convince everyone that JSON needs comments, though, and comments are of limited utility if it's not guaranteed that they'll parse everywhere.

But you really only need comments in JSON if you're doing stuff like storing configuration in JSON, and JSON's too fiddly in general to be a great config file format (too easy to do something like forget a comma; no support for types beyond object, array, (floating point) number, and string). Something more like YAML without the wonky type inference would be better, IMO.


I believe Douglas Crockford used to make the argument that JSON is not meant for human consumption and thus shouldn't be changed to better serve humans. I personally wish hjson (https://hjson.org) were to get more traction. I prefer it over both JSON and YAML.


It was Crockford. Directly from him:

> I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability. I know that the lack of comments makes some people sad, but it shouldn't.

https://plus.google.com/+DouglasCrockfordEsq/posts/RK8qyGVaG...


Well, then why not allow a trailing comma in lists and objects? Computers don't care, and they would even be happier, because they could then just emit array and object members with a trailing comma without concerning themselves with whether a given member is the last one or not. (Dijkstra's train toilet problem comes to mind.) Also compare with XML, where each element is self-contained.

And why model JSON syntax so closely after JavaScript literal object syntax (which is actually more convenient, by the way), which, being taken from mainstream programming languages, naturally evolved to be written by humans in small amounts, not by computers in large dumps? :)


VS Code uses JSON with comments for config files. [1]

Technically, this is not JSON. You won't be able to use a standard JSON parser without stripping comments first. But you can use a simple, JSON-like language with comments for config.

[1] https://code.visualstudio.com/docs/languages/json#_json-with...


> you can use a simple, JSON-like language with comments for config.

YAML can be employed as a simple JSON-like language with comments.


> simple

YAML is much, much more complicated than JSON.


    > simple
    YAML is much, much more complicated than JSON.
Quoting a single word from the parent’s sentence is misleading. The sentence "YAML can be employed as a simple JSON-like language with comments." is true because JSON is YAML, so you can parse a JSON file with #-comments using a YAML parser.
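
A quick check of that claim with PyYAML (assumed installed):

  import yaml

  doc = '# comments are legal in YAML, not in JSON\n{"name": "example", "retries": 3}\n'
  print(yaml.safe_load(doc))  # {'name': 'example', 'retries': 3}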


The parser is not simple, though, and that's what counts.


Most YAML users don't need to look at the source for a YAML parser. I appreciate elegant simplicity, but I don't think parser complexity is the most important metric by which to judge a data interchange format.


If you use a YAML parser to parse JSON-with-comments, it will accept many inputs that don't correspond to JSON-with-comments, and furthermore is likely to report syntax errors that don't make sense to a user who only knows JSON.

So, this unnecessary parser complexity is a usability issue. You should use a parser for the config language you actually intend to support.
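
For example (PyYAML assumed), a YAML loader happily accepts inputs a JSON-with-comments user never intended, with YAML's own type inference on top:

  import yaml

  print(yaml.safe_load('{"flag": no}'))       # {'flag': False} -- bare "no" becomes a boolean
  print(yaml.safe_load('{not json at all}'))  # {'not json at all': None} -- no syntax error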


JSON5 is also a great alternative: https://json5.org/

Supports comments, trailing commas, single quotes, multi-line strings, and more number formats.


I really wish json5 would support optional commas as well. If you have a newline, no comma needed. So you can do:

  [
    1
    2
    3
  ]

Newlines are what humans use anyway; computers should be able to handle them as well.


It looks very like an s-expression. Maybe we should go back to Lisp for our data encoding? (and to our code when we are at it? ;))



Well you could use CSON, which uses CoffeeScript notation that allows for constructs such as:

  required: [
    'firstName'
    'lastName'
  ]

which is the same as:

  {
    "required": [
      "firstName",
      "lastName"
    ]
  }

in JSON :)


That's not a subset of JavaScript though.


Sublime Text does too.


Try https://jsonnet.org/. Supports comments, plus a handful of additional useful features.


Jsonnet is awesome. We use it to generate our yaml files for kubernetes. YAML isn’t easy to parse, nor is it very flexible as a templating language. It gets cumbersome very quickly.

Jsonnet is a relief. Kubernetes should have been a dumb json config from the get go. JSON is ridiculously simple to parse and emit. It has huge interoperability as well with lots of programming languages.


Kubernetes never intended to get stuck on YAML. The CNCF is backing Ksonnet, which is Jsonnet for k8s, if you haven't seen it before.


Oh man, you just made my day. Comments, imports and mixins, I just had an 'evrika' moment.

P.S. You Romanian by any chance?


well there is also toml and hocon (json supersets) which are "yaml like"


TOML is more like ini than YAML.

I don't like it because it uses the = symbol which seems imperative rather than declarative. (Same with HCL, it might be a nitpick but these are languages I'm going to be using all the time.)

HOCON is interesting, but at first glance it seems it might be too ambiguous for my tastes, like YAML; for example, it supports both js-style ("//") and shell-style ("#") comments.

JSON plus comments is beautiful because it adds minimally to an unambiguous language which lends itself to automatic formatting (stringification).


I'd argue that = only feels imperative if you're used to imperative languages. Prolog and Haskell, both of which focus on being declarative, also both use the equals sign.


Fair enough. It still seems overkill to me though. A ":" seems much more unassuming than an "=".


Don't forget HJSON if you want a json clone with comments. https://hjson.org/


TOML: Initial release 23 February 2013; 5 years ago

Drupal 8 file format discussion was in 2011, predating it by two years. https://groups.drupal.org/node/159044


Is there a yaml parser that preserves comments and a writer that manages to write them back though?


The parser linked from the article does: https://github.com/crdoconnor/strictyaml


> JSON doesn't support comments

eh?

{ "firstName": "John", "lastName": "Smith", "comment": "foo", }

I know it isn't the same as #comments, but who cares really.


The trouble there is that your comments come in-band. What if you're trying to serialise something and you don't have the power to insist that it's not a dictionary with "comment" as a key?


It seems the main difference is your comments are all parsed and loaded into memory with the file, while official comments aren't.


How do I do something like:

  {
    # comment with a note about the value of foo
    "foo": "bar",
    # comment with a note about the value of baz
    "baz": "qux"
  }

Without driving myself and future readers insane with fooComments and bazComments?

What if I need a multiline comment explaining a yak-shaving story for why a key is set to a certain value?

What if the object in question is a set of keyword arguments, and adding new fields changes the behavior of whatever is parsing the document?


Ok, I'll bite.

  {
    "#": "A foo variable",
    "foo": true,

    "#": "A bar variable",
    "bar": false
  }
Alternatively.

  {
    "# A foo variable": "",
    "foo": true,

    "# A multiline..": "",
    "# .. bar variable": "",
    "bar": false
  }
Presto!


Sigh. All I wanted to do was say thanks for the YAML standard -- comments are important but not the only problem with JSON. And truly I can't be expected to remember all of this discussion from like six plus years ago. One thing I remember, though, is the trailing comma problem -- we upstreamed a grammar change to Doctrine annotations so "foo, bar," is OK, because PHP arrays accept that and it's bonkers trying to code a mostly-PHP system without trailing comma support. Also, JSON is no fun to write: you need to have [] {} all correct, where YAML is much easier. The fewer sigils the better, and most of Drupal's YAML files only use the dash, the colon and the quote. This is the grave mistake Doctrine committed as well: instead of simple arithmetic (>=1.0) they used mysterious sigils in version specifications (~1.0). Drupal is in the business of constantly accepting new contributors, and (~R∊R∘.×R)/R←1↓ιR is not newbie friendly, no matter how you slice and dice it. There are certainly advantages to sigil-heavy languages like APL and Perl, but the scare factor is too high.


fair enough, but then you probably shouldn't have led off your earlier comment with "that simple".


That's just ugly, and you're mixing your comments with the data structure, which is potentially confusing. Also, JSON requires a lot more typing. I don't want to have to manually add in all the brackets, quotes and commas when editing a config file.


Presto! You have a duplicate key in the first example.

Also...

  print (json.dumps(json.loads(js_data), indent=2))

  {
    "bar": false,
    "foo": true,
    "# A multiline..": "",
    "# .. bar variable": "",
    "# A foo variable": ""
  }
Presto! ;-)


> who cares really

the person who came up with HOCON, probably


> JSON didn't exist (YAML is a parallel, contemporary effort).

Interesting. How did it happen then that, quoting the YAML 1.2 spec, "every JSON file is also a valid YAML file"? Although the previous spec documents don't mention JSON.

Was that an intentional design decision for 1.2 or was it some kind of convergent design due to Javascript?


I have admired Douglas Crockford's excellent JSON from the moment I saw it; it is a model of simplicity. I also like TOML and wish it all the best. By contrast, YAML is complex and could use a hair cut.

When I say "JSON didn't exist", what I mean is that it wasn't popular or known to us when we were working on YAML. So, please excuse my sloppy wording. For me, the work on what would become YAML started with a few of us in 1999 (from SML-DEV list). In January of 2001 we picked the name and had early releases. It took a few years of iteration before we had a specification the collaborators (Perl, Python, and Ruby) could all bless.

Anyway, with regard to Crockford's excellent work, JSON. It is a coincidence that YAML's in-line format happened to align. Although, it's probably because of a "C" ancestor, not JavaScript. The main influence on the YAML syntax was RFC0822 (e-mail), only that from my perspective, it needed to be a typed graph. In fact, we documented where we stole ideas from, to the best we could recall at that time: http://yaml.org/spec/1.0/#id2488920.


>YAML is complex and could use a hair cut.

Out of curiosity, did you see the parser linked to at the end of the article? ( https://github.com/crdoconnor/strictyaml )

That was my attempt at giving YAML a haircut. I'd be curious to know what you thought.

Thank you for creating YAML, by the way. Even though part of that rant was quoted from me, I'm not negative on it like the author - I think the core was brilliantly designed. If you put two hierarchical documents side by side - one in TOML and another in YAML the YAML one is much, much clearer and cleaner.


Thank you for StrictYAML, I might just use it. It does look like a nice hair cut. You might wish to give Ingy a ring. He has been itching to move forward on a reduced/secure YAML subset.

That said, StrictYAML seems to be a tad bit more of a hair cut than I'd imagine. I'd keep nodes/anchors, since I think a graph storage model is underrated; I think that data processing techniques just haven't caught up with graph structures.

Further, I'm not sure everything can be easily typed based upon a schema. Hence, I'm not sure about completely dropping implicit types, perhaps you may want to provide a way for applications to resolve them if they wish. For example, an application may want to attempt to treat anything starting with "[" or "{" as JSON sub-tree. Perhaps keeping "!tag" but handing it off to the application to resolve might also be a good idea in this regard. Even so, typing should be done at the application level and default to something very boring.


>Thanks for StrictYAML, I might just use it.

Thanks, that's very flattering.

> I'd keep nodes/anchors, since I think a graph model is underrated

Well, you can create graph models without it (and I do) - you can just use string identifiers to identify nodes and let the application decide what that means.

I always thought the intent behind nodes/anchors was not so much graph models but rather to take repetitive YAML and make it DRY. That appears to be how it is used, e.g. in gitlab's ci YAML.
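
A small illustration of that DRY usage (PyYAML assumed; the job names are made up):

  import yaml

  doc = """\
  defaults: &defaults
    retries: 2
    timeout: 30
  build:
    <<: *defaults
    script: make
  test:
    <<: *defaults
    script: make test
  """
  print(yaml.safe_load(doc)["test"])  # {'retries': 2, 'timeout': 30, 'script': 'make test'}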

>I'm not sure about completely dropping implicit types, perhaps you may want to provide a way for applications to resolve them if they wish. For example, an application may want to attempt to treat anything starting with [ or { as JSON.

I think that would cause surprise type conversions. There will be plenty of times when you want something to start with a [ or { and you won't want it parsed as JSON.

I embed snippets of JSON in YAML multiline strings sometimes and I usually just parse it directly as a string. Then I run that string through a JSON parser elsewhere in the code.
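
Something like this (PyYAML assumed; the names are made up):

  import json
  import yaml

  doc = """\
  name: payload-example
  body: |
    {"user": "alice", "roles": ["admin"]}
  """
  cfg = yaml.safe_load(doc)
  payload = json.loads(cfg["body"])  # the JSON goes through a JSON parser, not YAML
  print(payload["roles"])            # ['admin']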

>You might wish to give Ingy a ring.

I would like that.


> I think that would cause surprise type conversions.

YAML has traditionally been used as the basis of higher-level configuration files for particular applications. What I'm saying is that implicit typing should be permitted, but delegated to those applications.

Conversely, I'm not saying that StrictYAML should do anything by default with unquoted values, except reporting them to the application as being an unquoted value. This way the application could choose to process the value differently from those that are quoted.


An interesting idea, but it's not clear that this will be less confusing or that application authors will be better at avoiding config language gotchas than config language designers such as yourself (and existing app-specific config languages suggest otherwise).

I think a reason this won't necessarily fix the problem with unmet expectations is that identical constructs in different but analogous yaml files would be likely to end up with very different semantics and users effectively have to remember which particular idiosyncratic YAML dialect choices various apps make. Say

   version: 1.3
means the string "1.3" in app a), the float 1.3 in app b) and a version number in app c). Furthermore, let's assume that app c) required a version number, whereas a) and b) required strings.

Another, more subtle problem, is that such a scheme would make it more likely that applications would end up parsing raw string representations themselves (with ensuing subtle differences even for things which are nominally meant to be identical, say dates or numbers and possibly security problems as well).
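
For reference, this is the kind of surprise already present with stock implicit typing (PyYAML assumed), before any app-specific dialect enters the picture:

  import yaml

  print(yaml.safe_load("version: 1.10"))    # {'version': 1.1}      -- silently a float
  print(yaml.safe_load("version: 1.10.1"))  # {'version': '1.10.1'} -- silently a string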


> I always thought the intent behind nodes/anchors was not so much graph models but rather to take repetitive YAML and make it DRY. That appears to be how it is used, e.g. in gitlab's ci YAML.

That's how I use it too. When I read about competing formats, that's the first feature I check for. It's really key for readability and usability in some use cases.


Great to have you here elaborating on various design choices. Are you perhaps familiar with OGDL [1] and what's your opinion?

[1] http://ogdl.org/spec


I don't have much to suggest. For YAML, the use of whitespace, colons and dashes primarily emerged from usability testing with domain experts who are not programmers. In particular, testing was done in the context of an application that needed a configuration and data auditing interface, an accounting application. Even anchors/aliases worked in this context and supported the application's use by making the audit records less repetitive without introducing artificial handles.

Other use cases such as dumping any in-memory data structure from memory, perhaps out of a sense that we needed full completeness, actually didn't have any end-user usability testing. Round-tripping data seems in retrospect to be a diversion from the primary value that YAML provided.


Is there an implementation of strict yaml that you know of for Ruby?


If you are writing a new YAML implementation, then yeah, you want a simpler spec to follow.

If on the other hand you are using a YAML library... I've had pretty good success using YAML compatibly across Python, Ruby, C# and Go projects. Do you have a particular issue in mind that the existing Ruby implementation doesn't address?


It's an implementation of YAML, not StrictYAML which has different semantics.


Yes, strict YAML is different than YAML. If you take a look at the github page linked in the GP, it explains the differences.


"JSON didn't exist because Us and We"?

YAML is an invented serialization format, JSON is a discovered one. As Crockford points out, JSON existed as long as JS existed; he just called it out and put a name on it.

Anyway, XML is a strong anti-pattern (too many security pitfalls: even if you get it right on your end, the other party likely screwed something up). YAML seems to be going down that path too.

TOML seems to be "the JSON of *.ini" (ie: discovering old conventions, rather than inventing new ones), and I'm glad to have been exposed to it.


> "JSON didn't exist because Us and We"?

If you define JSON as the underlying practice that Crawford later named and documented, then sure, what I wrote reads completely wrong headed. However, when we were working on YAML, JSON was not yet called out and given a name.

I believe the most important convention that YAML and JSON shared was a recognition of the typed map/list/scalar model used by modern languages. Further, as far as conventions go, I think there's quite a bit to be said about languages that use light-weight structural markers such as: indentation, colon and dash.


The answer is in version 1.2 of the spec:

> The primary objective of this revision is to bring YAML into compliance with JSON as an official subset.


It's not really a moral judgement, thanks for your contributions and your innovations, but I prefer not to use YAML if possible for the same reasons the author outlined.


I didn't know this bit of history. You're right, context explains a lot of the design choices made at YAML birth. Thanks for sharing.


"JSON" became popular in the 90s. They were http requests which returned javascript which you would simply eval(). No need to write or import a parser, and it is the same syntax as the language you're using, because it is the same language. In technology many things become popular not because how good (or bad) things are, but how easy to use something is.


Can't agree more. In tech, the prime mover often becomes the standard.


Clark, thanks so much for YAML. I love it and use it a lot. It actually increases the day-to-day joy of the work I do as a developer.

(While constructive criticism is fine, those rare people who trash it are... nonsensical to me. I'd like to see them do one-tenth as good under the same conditions!)


Unity3D uses YAML for its serialization engine. Thank you.


I love YAML, thanks for creating it, it's saved me a lot of time over the years.


I love json but despise the fact it doesn't support comments


JSON definitely did exist 20 years ago.


JavaScript objects, yes, but not JSON. Folks were deep into XML as a message format.


> Douglas Crockford originally specified the JSON format in the early 2000s


> I discovered JSON. I do not claim to have invented JSON because it already existed in nature. What I did was I found it, I named it, I described how it was useful. I don’t claim to be the first person to have discovered it. I know that there are other people who discovered it, at least, a year before I did. The earliest occurrence I found was there was someone at Netscape who was using JavaScript array literals for doing data communication as early as 1996, which was at least 5 years before I stumbled onto the idea.

https://www.youtube.com/watch?v=-C-JoyNuQJs


I can independently confirm that people were using JSON before he named it JSON. I was dumping data in JSON in 2000 for dynamically displayed reports.

But then again I was already used to using Perl data structures as dumped by Data::Dumper for config, because I was taught a lot about Perl by a Lisp programmer who had used Lisp data structures for the same purpose since the 1980s. So using JSON didn't feel original or clever. It seemed like I was simply using a well-known technique in yet another dynamic language.

Then again our reaction to XML was the stupid thing other people were doing that you had to do to interact with the rest of the world. I got used to holding my tongue until I went to Google a decade later and found that my attitude was common wisdom there...


According to Platonism, JSON has no spatiotemporal or causal properties (like a datetime format) and thus has existed and will exist eternally. All hail JSON.


I have used all the principles of JSON and developed https://jsonformatter.org



