
YAML: probably not so great after all (2017) - tlb
https://arp242.net/weblog/yaml_probably_not_so_great_after_all.html
======
clarkevans
One thing to remember is that YAML is about 20 years old. It was created when
XML was at peak popularity. JSON didn't exist (YAML is a parallel,
contemporary effort). Even articulating the problems with XML's approach was
an uphill battle. What you would replace it with is also hard. What use cases
matter? What is the core model? A simple hierarchy? Typed nodes? A graph? What
sort of syntax is needed for it to be usable? These were all questions. Seen
in context, we got quite a bit correct. And yes... it has a few embarrassing
warts and a few deep problems. Ah well.

A second thing to consider... YAML was created before it was common that tech
companies actively contributed to open source development. There are lots of
things we could have done differently if we had more than a few hours per
week... even a tiny bit of financial support would have helped.

Finally, YAML isn't just a spec, it has multiple implementations. Getting
consensus among the excellent contributors is a team effort, and particularly
challenging when no one is getting paid for the work. Once you have a few
implementations and dependent applications, you're kinda stuck in time.

It was a special pleasure for me to have had the opportunity to work with
such amazing collaborators.

We did it gratis. We are so glad that so many have found it useful.

~~~
chx
Drupal 8 uses YAML* as its configuration language because JSON doesn't support
comments. That simple. Thank you for YAML, it does deliver for us: it's human
readable and it's easy to parse (see below).

* I mean, it uses an ill-defined subset of YAML. The definition is "whatever the Symfony YAML parser supports".

~~~
wwweston
You know what else is human readable, easy to parse if you're using PHP, and
supports comments?

PHP.

I understand why _some_ languages rely on common configuration file formats.

I don't understand why the popular dynamic script-y languages don't more
commonly use the natively expressible associative/list data structures that
they're famous for making convenient.

~~~
derefr
You can use arbitrary tools to programmatically generate YAML (or JSON, or
XML, any of the other "data only" formats.) This allows for tools to drive
other tools by generating a spec file and feeding it in. See e.g. Kubernetes
for a good example of that.

There's no language that I'm aware of that can natively generate PHP syntax,
and there's no common multi-language-platform library for generating PHP
syntax. I think that's most of the reason.

To contradict myself, though: Ruby encodes Gemfiles and Rakefiles as Ruby
syntax. And Elixir encodes Mixfiles, Mix.Config files, Distillery release-
config files, and a bunch of other common data formats as Elixir syntax.

And, of course, pretty much every Lisp just serializes the sexpr
representation of the live config for its config format (which means that,
frequently, a lot of Lisp runs code at VM-bootstrap time, because people write
Turing-complete config files.)

~~~
wwweston
> There's no language that I'm aware of that can natively generate PHP syntax

This is a solid argument against using PHP (or any such language) as a cross-
language data interchange format. There are others :) And I totally agree you
want a language independent format for anything you might have to feed across
an ecosystem of tools.

For a PHP-system generating/altering its own config files... PHP's
`var_export` generates a PHP-parseable string representation of a variable
(though it sadly doesn't use the short array syntax).

Turing-complete config files probably have some hazards, like Lisp itself
does. YMMV regarding whether those hazards can be avoided by circumspect
developers or need to be fenced off.

~~~
scrollaway
You don't know when you'll need to generate or parse your config files with
something that either can't read, write or execute your language.

Django's settings.py sucks. I've used Django since the 0.9 days. It's
extremely impractical and needs to be worked around constantly.

~~~
LaGrange
Settings.py is uniquely bad, though, IMO because it tries to be a badly
defined dict(), instead of exposing proper configuration interfaces. Ruby
config files are common and usually fairly great, see for example the
Vagrantfiles.

And you won't have to generate your config files (parsing, maaaaaybe), because
those needs are covered by the fact that the files are programs. They are
_already_ generating a configuration.

~~~
derefr
> And you won't have to generate your config files (parsing, maaaaaybe),
> because those needs are covered by the fact that the files are programs.
> They are _already_ generating a configuration.

Yes, theoretically, if settings.py was a "generator" format that you ran as a
pre-step (like you do to get parser-generators like Bison to spit out source
files for you to work with), and this generator actually spat out something
like a settings.json, _and_ all the rest of the infrastructure actually dealt
with the settings.json rather than the generator, _then_ , yes, it wouldn't
matter. Tools in other languages could just generate the settings.json
directly.

As it stands, none of those things are true, so tools in other languages
actually need to do something that _outputs_ settings.py files.

~~~
LaGrange
Galaxy brain: if your config is programmable, it can read whatever terrible
configuration format you want. That means my settings.py (yes, I'm forced to
use Django) is configured via environment, which is populated by k8s from -
_gasp_ \- JSON files.

That means that if I wanted to configure Vagrant with JSON, there is no force
in the universe that could stop me.

If the config file is actually a normal program, it can do normal program
things, so any benefit from using JSON instead is nullified by the fact that
you can still use JSON. Conversely, if your tool's primary configuration uses a
more limited settings format, you're stuck with it. Not even "generators in
other languages" allow comparable runtime flexibility.
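
A minimal Python sketch of that idea, assuming the deployment tooling exports the configuration as a JSON document in an environment variable. The variable name `APP_SETTINGS_JSON` and the `load_settings` helper are hypothetical, not part of Django or Kubernetes:

```python
import json
import os

# Hard-coded defaults that the environment may override.
DEFAULTS = {"DEBUG": False, "ALLOWED_HOSTS": ["localhost"]}


def load_settings(environ=os.environ):
    """Merge JSON found in the environment over the defaults."""
    # APP_SETTINGS_JSON is a hypothetical variable name chosen for this sketch.
    raw = environ.get("APP_SETTINGS_JSON", "{}")
    overrides = json.loads(raw)
    return {**DEFAULTS, **overrides}


# Simulate what an orchestrator would inject, without touching os.environ.
fake_env = {"APP_SETTINGS_JSON": '{"DEBUG": true, "ALLOWED_HOSTS": ["example.org"]}'}
settings = load_settings(fake_env)
print(settings)
```

Because the settings file is a program, swapping JSON for any other source only means changing `load_settings`.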

------
allanbreyes
I'd like to propose the "YAML-NOrway Law."

"Anyone who uses YAML long enough will eventually get burned when attempting
to abbreviate Norway."

Example:

    
    
      NI: Nicaragua
      NL: Netherlands
      NO: Norway # boom!
    
    

`NO` is parsed as a boolean. Under the YAML 1.1 spec there are 22 ways to
write "true" or "false."[1] In this example, you have to wrap "NO" in quotes
to get the expected result.
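
To make the failure mode concrete, here is a stdlib-only Python sketch of the YAML 1.1 boolean rule. It is not a YAML parser: `resolve_scalar` is a hypothetical helper, and the word lists are transcribed from the bool type page linked as [1].

```python
# The 22 unquoted spellings that YAML 1.1 resolves to a boolean.
TRUE_WORDS = ("y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE",
              "on", "On", "ON")
FALSE_WORDS = ("n", "N", "no", "No", "NO", "false", "False", "FALSE",
               "off", "Off", "OFF")


def resolve_scalar(text, quoted=False):
    """Hypothetical helper: mimic YAML 1.1 implicit bool typing for one scalar."""
    if not quoted:
        if text in TRUE_WORDS:
            return True
        if text in FALSE_WORDS:
            return False
    # Quoted scalars, and anything not in the lists, stay strings here.
    return text


country_codes = ["NI", "NL", "NO"]
resolved = [resolve_scalar(code) for code in country_codes]
print(resolved)
print(resolve_scalar("NO", quoted=True))
```

Only the unquoted `NO` changes type; the quoted form survives as a string.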

This, along with many of the design decisions in YAML, strikes me as a simple
vs. easy[2] tradeoff, where the authors opted for "easy" at the expense of
simplicity. I (and I assume others) mostly use YAML for configuration. I need
my config files to be dead simple, explicit, and predictable. Easy can take a
back seat.

[1]: [http://yaml.org/type/bool.html](http://yaml.org/type/bool.html)

[2]: [https://www.infoq.com/presentations/Simple-Made-Easy](https://www.infoq.com/presentations/Simple-Made-Easy)

~~~
clarkevans
The implicit typing rules (ie, unquoted values) should have been application
dependent. We debated this when we got started and I thought there was no
"right" answer. Alas, Ingy was correct and I was wrong.

~~~
allanbreyes
I appreciate your humility and professionalism in a discussion thread that
holds a lot of criticism; suffice it to say, I should have practiced a bit
more humility and a bit less "Monday morning quarterbacking" in my original
post. And I should have read your comment on YAML's history. To right the
record: you got _so_ much right with YAML, and it's unfair for me to cherry-
pick this example 20 years later. Sincere apologies...

As the saying goes, "there are only two kinds of languages: the ones people
complain about and the ones nobody uses." YAML, like any language, isn't
perfect, but it's withstood the test of time and is used by software around the
world—many have found it incredibly useful. Sincere thanks for your
contribution and work.

~~~
Retra
As someone who doesn't really use YAML much, your comment provides a good
introduction to the kinds of things one needs to know before choosing formats
in the future.

------
bryanlarsen
A thread hating on YAML without a mention of the bastardized YAML that ansible
uses?

Ansible extends yaml so that:

    cmd: a b c

is almost, but not quite, identical to:

    cmd: ["a", "b", "c"]

It also embeds JINJA2 templating part-way (!) through the YAML parsing
process.

The gotchas that these and other bastardizations cause are only partially
documented at the bottom of this page:
[https://docs.ansible.com/ansible/latest/reference_appendices...](https://docs.ansible.com/ansible/latest/reference_appendices/YAMLSyntax.html)

I like ansible, but its decision to use a bastardized YAML is a major pet
peeve of mine.

~~~
crooked-v
Ansible would be so, so much better if it just used plain JSON, or even JS
with an implied context for variables, .eslintrc.js style.

~~~
geerlingguy
You can usually use plain old JSON anywhere where YAML would be used (e.g.
host vars, group vars, vars file includes, I think even playbooks). And
internally, most everything in Ansible is JSON anyways.

YAML is for convenience for hand-editing configuration/task files; if you're
doing anything that doesn't require hand editing/readability, use JSON.

------
Alex3917
With YAML I can never remember what's an object versus a list, string, or
number, nor am I ever able to add new stuff to a YAML file and get it to parse
correctly without first looking up the spec. And it's impossible to see where
large objects start and end.

In contrast, JSON is super intuitive and basically self documenting. The only
real quirks are that you need to use double quotes, and objects can't have a
trailing comma.
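
Both quirks are easy to check with a strict parser. A small sketch using Python's standard `json` module; the `parses` helper is just for illustration:

```python
import json


def parses(text):
    """Return True if the standard-library JSON parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


samples = {
    "double quotes": '{"name": "value"}',
    "single quotes": "{'name': 'value'}",
    "trailing comma": '{"name": "value",}',
}
results = {label: parses(text) for label, text in samples.items()}
print(results)
```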

The only good thing I can see about YAML is that it's super easy to convert
and re-export to JSON.

~~~
mschuster91
> In contrast, JSON is super intuitive and basically self documenting. The
> only real quirks are that you need to use double quotes, and objects can't
> have a trailing comma.

I'd expand the list of quirks... JSON lacks comments (both line-level and
block level). Fine for data transport but super super bad for configuration
files.

~~~
tlb
JSON5 supports comments, and is only slightly more complex than JSON.
[https://json5.org/](https://json5.org/)

~~~
saghm
The barrier here would be whether there's support in enough implementations to
feel safe using it in the wild, which I'm guessing will take a while at the
very least.

~~~
tajen
What’s incredible here is that we’re not at the beginning of programming, when
we built temporary languages that ended up becoming forgotten. Web _may be_ a
final form of IT, Angular may be the « right », the final, the perfect way to
build applications even in 50 years, just like HTML has become the final way
to build websites for the last 25 years. JSON may become legacy, and my grandson
might even struggle with parsers that still use JSON instead of this new tech
called JSON5...

------
peterwwillis
Years ago I had to support a tool that used YAML as a configuration language,
and a transport between different applications. Holy. Hell.

First of all, don't ever try to edit a YAML file by hand. You _will_ introduce
whitespace or other characters that will break the file, and you _will not
know_ until you run it and it breaks something.

The reason you will not know? _Not all YAML parsers are the same._ Some will
interpret it correctly, and some will break. You'll have to get reference
implementations of every "supported" YAML parser and run every config you have
through them all, and diff them all, before you can trust them.

YAML may be easier to read than JSON, but its added complexity (the parser is
significantly more complicated) and obtuse "features" are just not worth the
effort. Not to mention, have you ever tried to maintain a very large indented
YAML file by hand? Pain in the ass. Just shove everything into JSON files. The
fact that it's so limiting is freeing, and everything can parse it. But don't
edit it by hand.

And IMNSHO, you shouldn't use _either_ YAML or JSON as a configuration
language. They are for data structures, not configuration. If you want a
configuration language, go get something designed as a configuration language.

~~~
tajen
Well your last sentence is the whole point: What is a sensible configuration
language? For example what would have been decent for Ansible?

~~~
atom_arranger
A lot of JS tools now will just take a js file that exports a configuration
object (`.prettierrc.js`, `.eslintrc.js`, `.babelrc.js`). I find it very
sensible.

\- Allows code reuse.

\- Allows configuration to be as dynamic as you want.

\- Can use environment variables.

I suppose there are some cases where you can't trust the user in this way
(running configuration code), but I think in a lot of cases you can, and it's
generally more convenient.

------
Zardoz84
Nobody talks about SDLang (Simple Declarative Language):
[https://sdlang.org/](https://sdlang.org/)

An example :

```

    
    
        // This is a node with a single string value
        title "Hello, World"

        // Multiple values are supported, too
        bookmarks 12 15 188 1234

        // Nodes can have attributes
        author "Peter Parker" email="peter@example.org" active=true

        // Nodes can be arbitrarily nested
        contents {
            section "First section" {
                paragraph "This is the first paragraph"
                paragraph "This is the second paragraph"
            }
        }

        // Anonymous nodes are supported
        "This text is the value of an anonymous node!"

        // This makes things like matrix definitions very convenient
        matrix {
            1 0 0
            0 1 0
            0 0 1
        }
    

```

~~~
arp242
One thing I dislike about it at a glance is:

    
    
        author "Peter Parker" email="peter@example.org" active=true
    

This is like XML attributes, which I've always found annoying to deal with in
programs. It doesn't really map to any native data structure in most (all?)
programming languages, so you need a special class/struct which supports it.

Simply using something that maps directly to a hash map/object/associative
array would be much better, IMHO.

Other than that, it looks like an interesting project.

~~~
transfire
Actually it is even a superset of XML, from the docs...

    
    
        SDL documents are made up of Tags. A Tag contains
    
        * a name (if not present, the name "content" is used)
        * a namespace (optional)
        * 0 or more values (optional)
        * 0 or more attributes (optional)
        * 0 or more children (optional)
    

So it's like an XML node, but the `0 or more values` means it has a list/array
for a "body".
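
As a rough illustration of why this needs its own type rather than a plain dict or list, here is a Python sketch of the five parts listed above. The `Tag` class is hypothetical and not the API of any SDL library:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tag:
    """One SDL tag, following the five parts in the docs excerpt."""
    name: str = "content"            # default name for anonymous tags
    namespace: Optional[str] = None
    values: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


# author "Peter Parker" email="peter@example.org" active=true
author = Tag(
    name="author",
    values=["Peter Parker"],
    attributes={"email": "peter@example.org", "active": True},
)

# A matrix is a tag whose children are anonymous tags carrying only values.
matrix = Tag(name="matrix", children=[
    Tag(values=[1, 0, 0]),
    Tag(values=[0, 1, 0]),
    Tag(values=[0, 0, 1]),
])
rows = [row.values for row in matrix.children]
print(author)
print(rows)
```

A tag carries positional values and named attributes at the same time, which is the part that has no direct equivalent in a hash map or array.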

~~~
Zardoz84
At least in the case of the D implementation, it uses a DOM API to access the
values.
[https://github.com/Abscissa/SDLang-D/blob/master/HOWTO.md](https://github.com/Abscissa/SDLang-D/blob/master/HOWTO.md)

------
ben0x539
I'm gonna continue using YAML, like, even if each parser came with support for
a halt-and-catch-fire directive that you couldn't turn off or whatever. It's
just about the only markup language where you can embed multiline strings
without the indentation being fucked either in the markup or in the resulting
string, without requiring lots of escaping.

------
peatmoss
I am sad that EDN hasn’t achieved popularity as a format. It seems like a
better specified, less verbose format. As a bonus it plays well with Paredit-
like editor modes. Alas, the curse of being better, but later.

~~~
t3soro
EDN is nice, but the curly brackets are a hassle when editing configs. That's
why I prefer using yaml to configure my Clojure apps.

------
tootie
We've spent like 10 years trying to fill in gaps left when we all decided to
hate XML. JSON is great as a lightweight DIF between trusted partners. If you
care about maintenance and safety, XML with XSD is rock solid.

~~~
commandlinefan
I don't know, XML is awfully verbose and the schemas are even more verbose.
I've lost track of how many "XML" configuration files that looked like this:

    
    
       <parameter>
         <name>ApplicationName</name>
         <value>WhizBang</value>
       </parameter>
       ...
    

So that they could pass schema validation and still have some hope of
extensibility.

~~~
organsnyder
Why not just

<parameter name="ApplicationName" value="WhizBang"/> ?

~~~
rocmcd
That's the age-old question, isn't it?

Your option is better, but XML is very (maybe too) flexible and is bound to be
made a mess of.
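
For comparison, a short Python sketch that reads both spellings with the standard library. The `<config>` wrapper element and the two helper functions are assumptions added so the fragments parse on their own:

```python
import xml.etree.ElementTree as ET

element_style = """
<config>
  <parameter>
    <name>ApplicationName</name>
    <value>WhizBang</value>
  </parameter>
</config>
"""
attribute_style = """
<config>
  <parameter name="ApplicationName" value="WhizBang"/>
</config>
"""


def read_element_style(xml_text):
    """Collect name/value pairs stored as child elements."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("parameter")}


def read_attribute_style(xml_text):
    """Collect name/value pairs stored as attributes."""
    root = ET.fromstring(xml_text)
    return {p.get("name"): p.get("value") for p in root.iter("parameter")}


from_elements = read_element_style(element_style)
from_attributes = read_attribute_style(attribute_style)
print(from_elements, from_attributes)
```

Both forms carry the same information; the reader code differs only in whether it looks at child text or at attributes.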

~~~
organsnyder
I've seen some pretty messed up schemas in JSON and YAML, too.

------
jerrac
A couple thoughts.

If your configuration file is so long it's unreadable in YAML, then maybe you
need to break it up into more than one file? I can't imagine any syntax would
be easy to read once you reach more than 100 or so lines.

Do any configuration file languages support type hinting? Adding (int) in
front of a YAML key would be easy enough to read, and would keep some of the
confusion at bay.

~~~
erik_seaberg
YAML tags work this way. E.g., 2002-04-28 is a date (because it looks like
one) but !!str 2002-04-28 is a string.

~~~
jerrac
I had never heard of tags before. Thanks!

So, would this work?

    
    
        ports:
          https:
            enabled: yes
            !!int port: 443
    

Then if someone is copy/pasting and tries to use "blah" as the value, the
!!int tag would cause the yaml parser to throw an error. Right?

~~~
spiralx
No, because tags specify a type for the value, not the key, so you could use

    
    
      ports:
        https:
          port: !!str 443
    

to parse the same as

    
    
      {
        "ports": {
          "https": {
            "port": "443"
          }
        }
      }
    

in JSON

~~~
jerrac
Ah, I had my syntax wrong. Thanks!

------
DonHopkins
As a general rule of thumb: Never use yet another non-markup language designed
by people who claimed to be designing yet another markup language from the
very outset, then after somebody awkwardly pointed out that what they'd
designed wasn't actually a markup language, they invent a backronym to
contradict that embarrassing historical fact.

It just makes me wonder what the hell they thought they were doing all that
time...

It's like designing a tool called YACC, and ending up with Yet Another
Interpreter Interpreter!

It's like a standard for storing all your pornography in a folder called
"Definitely Not Pornography".

[https://en.wikipedia.org/wiki/YAML](https://en.wikipedia.org/wiki/YAML)

>Originally YAML was said to mean Yet Another Markup Language, referencing its
purpose as a markup language with the yet another construct, but it was then
repurposed as YAML Ain't Markup Language, a recursive acronym, to distinguish
its purpose as data-oriented, rather than document markup.

~~~
clarkevans
> It just makes me wonder what the hell they thought they were doing all that
> time...

The YAML project was a convergence of several different efforts at information
representation including people from Perl, Python and Ruby, each with our own
ideas. I happened to be involved in the outer ring of the XML community, in
particular a group SML-DEV where we were looking for a better information
model more suitable to data serialization that would use an XML compatible
syntax.

At that time, especially since serializing data with XML was all the rage,
"ML" or "Markup Language" was commonly associated with data serialization. In
fact XML is very inconvenient for actual markup, even though it derives from
SGML (of which HTML is an example).

The "YA" part did deliberately come from YACC, the reason why is that XML (and
we hoped YAML) would be the basis of domain specific serialization languages.
Hence, you can think of it as a meta-language for building sub-languages, like
the application I was working on, a serialization for accounting data.

Hence, that's the origin of YAML acronym, which happened to have a domain name
open. In fact, you can look at archive.org starting in 2001 and you'll see the
1st public pass of YAML more closely followed XML like bracket syntax. I
disliked it, but, it was what came out of the SML-DEV collaborations. Even so,
the important thing was the information model, which was a simple typed graph
and not an element tree.

The syntax evolved with many revisions after the model and goal were set, with
lots of feedback to address concerns and usability tests with domain experts.
The syntax got more and more lightweight, inspired from RFC0822 (email) while
adding dashes for list items. Testing the syntax with domain experts, e.g.
accountants and the like, was an exceptionally important part of the process.
Users liked this serialization syntax since it made their data "pop".

So. A year or so passes while we focus on getting things to work and helping
people. Then, the product differentiation question comes up. Since XML had the
dominant position in the data serialization mind share, how does YAML compare?
Well, the YAML model was a typed graph, while XML was an element tree. One
required no special libraries to manipulate, the other required a DOM to
translate. But perhaps more importantly, it's because XML borrowed its model
and tags from SGML and SGML was a true "markup language". So, XML was a
"markup language" where it was impractical to do actual markup. Then, it
dawned on us, well, of course a data serialization language isn't a markup
problem. I'm not sure any of this was obvious at the time.

Anyway, in a very fun chat, Oren pointed this out and then Ingy said: well
then, YAML Ain't Markup Language! So, the new name actually represented what
we had set out to do in the first place. Further, I would suggest that the
industry understanding of how serialization languages are poorly supported by
markup approaches (XML) is at least somewhat due to our name change and fun
filled articulation at conferences.

------
nonbirithm
For config formats I'm finding HCL[1] to be nice for my use cases. It has
comments, no requirement for double quoted identifiers, and is actually
simple. The main issue was the only implementation is in Go, so I had to write
a port to C++.

[1] [https://github.com/hashicorp/hcl](https://github.com/hashicorp/hcl)

~~~
LukeB42
Super happy with HCL for over a year in production now.

I've only got one FOSS project using HCL but I think its bundled HCL
config file is an attractive part of its UX:
[https://github.com/LukeB42/psyrcd](https://github.com/LukeB42/psyrcd)

------
crooked-v
I continue to hold a firm belief that the reason JSON is so popular is that it
covers most use cases without any of the dumb crap that hides in YAML and XML
behavior.

~~~
krapp
Yes. YAML and XML do way more than a config format needs. You need key value
pairs and basic structures (variables, arrays and maps) and types.

JSON is lacking in some respects but it's still really close to perfect for
its use case.

------
the_imp
Over the past few months, I've built up a somewhat masochistic relationship
with YAML, as I've been writing my own JS library for it [1]. Yes, the spec is
more complicated than it ought to be and yes, writing yet another
implementation might just mean more overall variance within the spec, but it's
still the only config language with decent usage that supports human-readable
multi-line strings and comments. And it would've been really nice if someone
else had supported editing comments, so I wouldn't have needed to do that
myself.

Still, I'm optimistic, especially now that Prettier is getting YAML support.
[2]

[1] [https://eemeli.org/yaml/](https://eemeli.org/yaml/)

[2]
[https://github.com/prettier/prettier/pull/4563](https://github.com/prettier/prettier/pull/4563)

------
strogonoff
For a serious take on configuration file format it might be worth checking out
Dhall.

Dhall has schemas, types, imports, and even functions (though it doesn’t let
you write code that would e.g. loop infinitely).

You can even use an executable[0] to compile Dhall to YAML and JSON, excluding
unsupported features such as functions.

[0] [https://github.com/dhall-lang/dhall-lang/blob/master/README....](https://github.com/dhall-lang/dhall-lang/blob/master/README.md#json-and-yaml)

------
ChrisSD
I agree with some of the author's points but the "surprising behaviour"
section is odd. For example, why would you expect `3.5.3` to be parsed as a
number? How could that be parsed as a number?

~~~
bhaak
The 013 to 11 issue is pretty obvious to any seasoned programmer. For example
C, Ruby, and yes, also Javascript have the same "problem".

Octal 13 is decimal 11.

JSON should actually have the same issue. When I enter { 013: "11" } in the
web console I get '{11: "11"}'. And YAML is backwards compatible to JSON.

That's IMO the actual problem of YAML. It could have supported a reasonable
subset of JSON and not the whole nine yards.

~~~
aepiepaey
The web console is not a JSON parser. To check if something parses as JSON in
the web console you should use JSON.parse, e.g:

    
    
      JSON.parse('{ 013: "11" }')
    

which should produce a syntax error (as already mentioned by siblings to this
comment).
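
The same check can be made outside the browser. A sketch using Python's standard `json` module, which is also a strict parser:

```python
import json

# Octal 13 is decimal 11, which is where the surprising value comes from.
octal_value = int("013", 8)
print(octal_value)

# A strict JSON parser rejects both the unquoted key and the leading zero.
outcomes = {}
for text in ('{ 013: "11" }', '{"n": 013}'):
    try:
        json.loads(text)
        outcomes[text] = "parsed"
    except json.JSONDecodeError as err:
        outcomes[text] = f"rejected: {err.msg}"
print(outcomes)
```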

~~~
bhaak
Thanks. Yeah, I had the mistaken belief that JSON is basically a JavaScript
hash, rather than that 'JSON is valid JavaScript', which means it's a subset of
all possible JavaScript hashes.

------
Zamicol
Every time this comes up, I don't understand how
[https://json5.org](https://json5.org) isn't superior to YAML in every way.

If there was some deficiency with JSON5, just simply use JSON with comments.
It's that simple.

JSON is one of the best things to ever come out of the CS disciplines.

For those that whine about comments in JSON, Douglas Crockford, the creator of
JSON, himself said to do it.

>Suppose you are using JSON to keep configuration files, which you would like
to annotate. Go ahead and insert all the comments you like. Then pipe it
through JSMin before handing it to your JSON parser.
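
A rough Python sketch of that workflow. Note that this is not JSMin: `strip_comments` is a hypothetical stand-in that only removes `//` and `/* */` comments outside of string literals before the text reaches a strict parser.

```python
import json


def strip_comments(text):
    """Remove // line comments and /* block */ comments outside of strings."""
    out = []
    i = 0
    in_string = False
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                # Copy the escaped character so \" does not end the string.
                out.append(text[i + 1])
                i += 1
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line, keeping the newline itself.
            end = text.find("\n", i)
            i = len(text) if end == -1 else end
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            i = len(text) if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


annotated = """
{
  // where the service listens
  "url": "https://example.org/api",  /* the // inside the string must survive */
  "port": 443
}
"""
config = json.loads(strip_comments(annotated))
print(config)
```

Tracking whether the scanner is inside a string matters because URLs contain `//`; a plain regex substitution would truncate them.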

------
jwr
I never understood what it is that people like about YAML.

I keep configuration in EDN, which avoids all of the problems described in the
article and has other advantages, too.

~~~
Rapzid
EDN is fantastic. It wasn't super portable last I looked though.

~~~
peatmoss
There is reasonably good library support:
[https://github.com/edn-format/edn/wiki/Implementations](https://github.com/edn-format/edn/wiki/Implementations)

------
markpapadakis
I personally dislike with a passion every language or grammar that depends on
whitespace indentation, especially if the designers were so opinionated
that you can only use spaces (or even a specific number of
spaces per indentation level) and not tabs.

It's not as big a deal in Python because, as others have mentioned, you
usually don't end up writing large functions to begin with, and tab indentation
is supported. It still means that if you try to send a code snippet to someone
over email, Slack, IM, etc., it may not work because whitespace may be trimmed.
With YAML, it's way worse for reasons the author outlined.

Something may look good on paper (or in screenshots), but practical
considerations need to be factored in when designing a grammar, and there are
myriad pretty-formatting utilities that could be used to that end if one
cares for that sort of thing (see also: clang-format).

------
49bc
I don't love YAML, but for configurations I always choose it for the simple
fact that it supports comments. Comments for json are almost always hacky (for
example, embedding a comment key inside the value).

------
pducks32
I write a lot of command line tools for work and need configuration files and
always go for YAML, but am never happy about it. I use it because it
serializes to Ruby objects so I can more easily do validation and check if
someone forgot an important key but I wish there were something that would
make that easier for the people using the config files. I looked into TOML
after this and liked what I saw.

------
Mister_Snuggles
I've been playing with Home Assistant[0] recently and, as a result, getting
exposed to YAML. I don't find it pleasant to work with at all. I'm sure a big
part of that is how Home Assistant uses YAML for automation stuff[1] that
would probably be better served by a real programming language.

I think a big part of the unpleasantness comes from how non-obvious some
things are. For example, why does the first automation example require a "-"
in front of "platform" (under "trigger"), but the second doesn't? The comments
explain _when_ it's required, but not _why_? Shouldn't the parser be able to
figure it out from the significant whitespace?

I found it so unpleasant that I'm using Node-RED[2] to do anything even
vaguely automation related and have relegated Home Assistant to being the UI
and communications abstraction layer.

In contrast, XML, while overly verbose, has a more reasonable structure. I
haven't played with JSON much, but it also seems pretty reasonable.

[0] [https://www.home-assistant.io/](https://www.home-assistant.io/)

[1] [https://www.home-assistant.io/docs/automation/examples/](https://www.home-assistant.io/docs/automation/examples/)

[2] [https://nodered.org/](https://nodered.org/)

~~~
balloob
(Founder Home Assistant here)

Everything that you can do in YAML + more, you can do in Home Assistant using
Python. Have a look at my PyCon talk [1] from 2 years ago or check the
available functions in the docs [2].

About the dashes. In Home Assistant when an option takes a list, you can omit
the list if you are just passing in a single entry.
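
The usual way to implement that kind of leniency is to normalize the value right after parsing. A minimal Python sketch; this `ensure_list` is a hypothetical helper written for illustration and the trigger data is made up:

```python
def ensure_list(value):
    """Wrap a single entry so callers always receive a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


# What a YAML parser would hand over for the two spellings of "trigger".
with_dash = {"trigger": [{"platform": "sun"}]}
without_dash = {"trigger": {"platform": "sun"}}

normalized_a = ensure_list(with_dash["trigger"])
normalized_b = ensure_list(without_dash["trigger"])
print(normalized_a == normalized_b)
```

The parser itself cannot guess this: a mapping and a one-element list of mappings are different YAML documents, so the application has to decide to treat them alike.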

[1]: [https://youtu.be/Cfasc9EgbMU?t=1038](https://youtu.be/Cfasc9EgbMU?t=1038)

[2]: [https://dev-docs.home-assistant.io/en/master/api/event.html](https://dev-docs.home-assistant.io/en/master/api/event.html)

------
jrochkind1
> Are you sure that every YAML parser will treat foo:bar as a string, or 0x42
> as the integer 42, etc.?

Definitely not. I'd expect 0x42 to be 66. (Not kidding, if 0x42 means hex
notation!).

Point taken.
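
For reference, the arithmetic in Python:

```python
# Base-16 digits: 4 * 16 + 2 = 66.
as_hex = int("0x42", 16)
as_decimal = int("42")
print(as_hex, as_decimal)
```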

~~~
Gibbon1
My friends who are smarter than me have these criticisms.

XML -> doesn't record what programs think of as 'data'. So a program needs to
convert its data representation to XML and back again. Usually there is no
formal spec. This is the exact same issue you have with databases. But at
least a database has a formal representation and data types.

Part of SOAP is Microsoft trying to bolt schemas onto XML. SOAP seems to
generate a lot of unhappy programmer noises. But my ex-roommate said, 'well,
when you get it working it works.'

JSON and YAML do record it, but they don't have schemas to parse against. So
programs need to do their own validation.
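
A sketch of what that hand-rolled validation tends to look like in Python, applied to a config that has already been parsed into a dict. The `REQUIRED` table and its keys are made up for illustration:

```python
# Hypothetical schema: required keys and the Python type each must have.
REQUIRED = {"host": str, "port": int}


def validate(config):
    """Return a list of problems; an empty list means the config is acceptable."""
    problems = []
    for key, expected in REQUIRED.items():
        if key not in config:
            problems.append(f"missing key: {key}")
            continue
        value = config[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{key} should be {expected.__name__}")
    return problems


ok = validate({"host": "example.org", "port": 443})
wrong_type = validate({"host": "example.org", "port": "443"})
missing = validate({"port": True})
print(ok, wrong_type, missing)
```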

------
ravenstine
> Loading a user-provided (untrusted) YAML string needs careful consideration.

Why would you ever use YAML for user-provided input? At that point, it's
better to just use JSON.

> Many other languages (including Ruby and PHP) are also unsafe by default.
> Searching for yaml.load on GitHub gives a whopping 2.8 million results.
> yaml.safe_load only gives 26,000 results.

Maybe that's because everyone's using JSON where it would be unsafe to use YAML.

> YAML files can be hard to edit, and this difficulty grows fast as the file
> gets larger.

And... this isn't the case for XML or JSON?

Ok, so reindenting a section might be a pain, but if your YAML is containing
large amounts of data, maybe that data doesn't belong in that format if you're
manually editing the YAML.

> especially since 2-space indentation is the norm and tab indentation is
> forbidden

Good. ;)

> And accidentally getting the indentation wrong often isn’t an error; it will
> often just deserialize to something you didn’t intend. Happy debugging!

Which is unlikely to happen if you're using a YAML library or only editing
small-ish config files by hand.

\---

As noted, YAML has a lot of quirks. As a configuration language, I love it and
am used to the little edge cases. Could it be better? Definitely. But I would
still consider YAML to be great in the domain where it excels: human-readable
configuration. Using it to store and transmit large amounts of data,
especially in ways where a human is manually editing the YAML, is a terrible
idea.

------
madhadron
Since we're all chiming in with configuration formats, here are two more nice
ones:

1\. If you're writing Lisp, just read S-expressions in.

2\. Use Python or Skylark and have a step that executes it into a
configuration format. Obviously this is not something you would want to use
for a data interchange format, but no one thinks that they can blindly run a
random hunk of untrusted Python. Right? ...Right?
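
A minimal sketch of option 2, assuming plain Python rather than Skylark: the
configuration is computed in code, and a separate step renders it to a static
format (JSON here). All names and values are made up for illustration.

```python
import json


def build_config(env: str) -> dict:
    """Compute a configuration in ordinary Python (all names are made up)."""
    base = {"service": "web", "workers": 4, "debug": False}
    overrides = {
        "dev": {"debug": True, "workers": 1},
        "prod": {"workers": 16},
    }
    # Later keys win, so environment overrides replace the base values.
    return {**base, **overrides.get(env, {})}


def render(env: str) -> str:
    """The 'execute it into a configuration format' step: emit plain JSON."""
    return json.dumps(build_config(env), indent=2, sort_keys=True)


print(render("prod"))
```

The application then only ever reads the rendered JSON, so the code runs at
build time on trusted input, not at load time.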

------
transfire
One of the important things implementations failed to do was properly support
Schemas
([http://yaml.org/spec/1.2/spec.html#Schema](http://yaml.org/spec/1.2/spec.html#Schema)).
And this is still the case today. Had they done so, the safe load issue would
never have arisen, and the loaders/dumpers would be properly configurable to
suit the needs of the application, e.g. don't support "Yes" and "No" as `true`
and `false`.

I once did about 98% of the work to support Schemas properly in Psych
([https://github.com/ruby/psych](https://github.com/ruby/psych)) but the
maintainer said he didn't want to "maintain it".

So, there you go. What else can one do? You can't blame the spec for decisions
of implementors.

(That's not to say the YAML spec couldn't use some improvements, but it's far
from "not so great".)
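
To make the "configurable schema" idea concrete, here is a toy Python sketch
in which the application chooses which unquoted spellings count as booleans.
The two tables and the function name are hypothetical and far simpler than the
schemas in the spec; they do not mirror any real parser.

```python
# Hypothetical schemas: each maps an unquoted scalar spelling to a boolean.
# Which spellings a real parser accepts depends on the parser and the YAML
# version; these tables are illustrative only.
STRICT_BOOLS = {"true": True, "false": False}
LENIENT_BOOLS = {**STRICT_BOOLS, "yes": True, "no": False, "on": True, "off": False}


def resolve_scalar(text, bools=STRICT_BOOLS):
    """Resolve an unquoted scalar using the application-chosen schema."""
    return bools.get(text.lower(), text)


print(resolve_scalar("No"))                 # stays a string under the strict table
print(resolve_scalar("No", LENIENT_BOOLS))  # becomes a boolean under the lenient one
```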

------
mixmastamyk
Yes, I love YAML in general. The complaints about big files can be alleviated
with a good editor that collapses blocks and draws indentation guides.

But the author is mostly right: when adding support for YAML to my code I
spend a lot of time disabling all of its nifty misfeatures. Wish it was simply
an indented JSON with comments and fewer quotes.

------
dragly
We recently published a paper suggesting an alternative to HDF5 [1] using
directories for objects, YAML for metadata and NumPy for data. Many of the
points in this article were raised by the reviewers or were worries we had
about choosing YAML as the metadata format. In the end, we decided to use a
subset of YAML with only basic tags, enforced quoted strings, no directives,
and no block scalar styles (fancy multiline strings). So far it has worked out
great. I hope it will make the format easier to understand for users and make
it possible to write faster parsers in the future.

[1] Shameless plug:
[https://www.frontiersin.org/articles/10.3389/fninf.2018.0001...](https://www.frontiersin.org/articles/10.3389/fninf.2018.00016/full)

------
blt
YAML and TOML both seem too complicated. Automatic date parsing? So many
different ways to specify the same nested hash table? I like json because
there's usually one obvious way to do what you want. It's a local minimum,
like C and Lisp. It's really too bad about the comments.

~~~
emodendroket
Way too much engineering time and energy has gone into handrolled, mutually-
incompatible solutions for passing dates in JSON.

------
nilsocket
What about TOML? I like it.

[https://github.com/toml-lang/toml](https://github.com/toml-lang/toml)

~~~
felideon
He literally recommended that as an alternative in the article.

------
jokoon
> Don’t get me wrong, it’s not like YAML is absolutely terrible – it’s
> certainly not as problematic as using JSON – but it’s not exactly great
> either.

Well at least it's the least worst then, does that make it the best?

Frankly I hope the future will be made with indented languages. Curly brace
languages often allow too much liberty, and it's annoying. The fact that Go
enforces the curly brace style is really the tipping point of curly brace
languages.

Granted, there needs to be some good compromise for ambiguous details when
parsing an indented syntax, but readability matters much more than anything to
me.

~~~
Carpetsmoker
I like the "Ruby approach" of:

    
    
        if foo
            ...
        else 
            ...
        end
    

It solves some of the issues that you can have with Python, while at the same
time also avoiding the whole nonsense with the braces. I find it's a good
trade-off between the strengths of both approaches.

------
iGoog
If the problem with YAML is size / formatting, that's something a good IDE can
simplify even more. For loading up configs that would otherwise be properties
files, I have found it to be quite clean. If you think of it as a domain language
for configuration, YAML is great. It can be understood pretty much
intuitively, and it does some things quite well. On the other hand, when it
comes to exchanging data, JSON > XML > X12 EDI... EDI? Yes, it's still a
thing... a terrible thing...

------
DonHopkins
I recently posted about a technique I've been developing (and am very happy
with) for representing and editing JSON in spreadsheets, without any sigils,
tabs, quoting, escaping or trailing comma problems, but with comments, rich
formatting, formulas, and leveraging the full power of the spreadsheet.

[https://news.ycombinator.com/item?id=17309132](https://news.ycombinator.com/item?id=17309132)

>Recently I've been working on kind of the converse of this problem with JSON
and spreadsheets, and I'll briefly describe it here (and I'll be glad to share
the code), in the hopes of getting some feedback and criticism:

>How can you conveniently and compactly represent, view and edit JSON in
spreadsheets, using the grid instead of so much punctuation?

>The goal is to be able to easily edit JSON data in any spreadsheet, copy and
paste grids of JSON around as TSV files (the format that Google Sheets puts on
your clipboard), and efficiently import and export those spreadsheets as JSON.

>[...]

Since I wrote that post, I've cleaned up and refactored the code into a
portable little library that will run in the browser, or inside of Google
Sheets:

[https://github.com/SimHacker/UnityJS/blob/master/UnityJS/Ass...](https://github.com/SimHacker/UnityJS/blob/master/UnityJS/Assets/StreamingAssets/sheet.js)

Here's an example spreadsheet (also check out the examples in the other sheet
tabs):

[https://docs.google.com/spreadsheets/d/1nh8tlnanRaTmY8amABgg...](https://docs.google.com/spreadsheets/d/1nh8tlnanRaTmY8amABggxc0emaXCukCYR18EGddiC4w/edit?usp=sharing)

I haven't come up with a trendy marketing name for it yet (except for the
source file name sheet.js), because I think it's important to first discover
what it is by using and refining it for a while, writing some documentation to
explain it, and getting feedback from other people (work in progress), before
trying to name it -- otherwise you might end up calling it "Yet Another
<something it's not>".

------
s3m4j
Not agreeing or disagreeing, but I read both this and his other post about
JSON as configuration file and I have not seen him propose and argue for an
alternative.

~~~
reaperducer
He doesn't have to propose an alternative to have an opinion on it.

If I don't enjoy a movie, I'm under no obligation to suggest another. It's an
opinion. It can stand alone.

~~~
0xffff2
In your analogy, the alternative is to not see the movie. The analogous
alternative would be to... what? Not use config files? Doesn't seem like much
of an alternative to me.

------
kevinmgranger
> Can be hard to edit, especially for large files

How does this entire section not also apply to JSON?

~~~
Carpetsmoker
JSON is slightly easier since you have actual start/stop marks in the form of
`{`, `}`, instead of relying on 2-space indentation. (are there 8 or 10 spaces
there? Hard to see).

------
woolvalley
The alternative of writing executable code as your config language also has
its own issues. It's why things like Skylark were invented for Buck & Bazel:

[https://docs.bazel.build/versions/master/skylark/language.ht...](https://docs.bazel.build/versions/master/skylark/language.html)

------
ngrilly
JSON + comments + trailing commas + naked keys + multiline strings would be a
great alternative. Maybe something like JSON5.

~~~
kodablah
You'd really like HOCON[0] then. Just a bit too complicated to see wide
adoption I'm afraid.

0 -
[https://github.com/lightbend/config/blob/master/HOCON.md](https://github.com/lightbend/config/blob/master/HOCON.md)

~~~
ngrilly
Interesting, but, as you wrote, it's a bit too complex, and there are too
many syntactic variations for my taste.

------
vortico
We need to standardize cson
([https://github.com/bevry/cson](https://github.com/bevry/cson)) and build a
compliant C parser. The format is simpler with fewer surprises than YAML, yet
able to handle more types of syntaxes in a straightforward form.

------
rusk
I came in here to rant that YAML _isn't for the same purpose as JSON and XML_
but then thought I'd better RTFA and realised that _oh yeah one or two good
points here_.

The first criticism, where he's embedding executable code in YAML I kind of
have to agree with - that seems crazy. I don't know why YAML would support
this.

The remaining criticisms seem to relate to (a) the spec overreaching in terms
of complexity and (b) differences in implementation, which I guess is some
kind of an extension of (a).

I maintain however that YAML, JSON and XML are different.

If you want to make me feel cross and insulted give me a JSON file to edit. I
think JSON is probably the best commonly used format for M2M and storage
serialisation.

I wouldn't want you to use YAML for that though. There are too many _different
ways to do it_ and any kind of ambiguity never makes for good M2M.

For configuration-files it's great though, as long as you stay away from some
of the more exotic features I suppose. It effectively provides a "user
interface" of sorts by which your users can specify non-trivial configuration
details.

The complaints about overlong and overcomplex yaml files could be extended to
other commonly used formats.

With regards to XML, I'd say that YAML provides all the features you'd want to
use from that format in a format that's easier to hand-edit as text. XML is
"okay" for M2M but probably better for document storage where you have some
kind of custom editor.

It's horses for courses. I wouldn't want to use YAML where I'd want to use
JSON, or use XML where I'd use either of them, and neither of these have the
semantic richness of XML either and so wouldn't be appropriate in whatever
spaces XML should be used (which is far more limited I suspect than its
current span of applications).

Ultimately when each is used to its strengths they're not interchangeable
formats.

------
_sdegutis
I haven't found a markup language that I really prefer for configuration
files, but JSON has been reasonably nice to me in NPM configurations and VS
Code settings and I haven't run into problems yet. I understand the motivation
for TOML but it has the same ambiguity problems that YAML does. There's
something to be said for not being _too_ flexible, or it'll fall into the same
trap as AppleScript did. Humans might prefer the convenience of ultra-
flexibility at first, but sooner or later our intuition will just not match up
with the actual rules, and we'll have to spend longer than we wanted to
looking up arcane syntax documentation. That's been my experience with YAML
anyway, like every single time.

------
fibo
A few considerations I do not agree with:

> YAML is insecure by default. Loading a user-provided (untrusted) YAML string
> needs careful consideration.

That is trivial. Everything you execute that comes from outside is potentially
dangerous.

> It’s pretty complex

I would say that YAML is more expressive and has more features. On the other
hand, it is true that TOML is a valid alternative in many use cases.

> 3.5.3 gets recognized as a string, but 9.3 gets recognized as a number
> instead of a string

That is correct, in my opinion, 9.3 is a float while 3.5.3 is a version. If
you want both to be strings, use quotes.
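
The behaviour falls out of pattern-based type inference. A simplified Python
sketch (the regex is an approximation written for this example, not the
pattern any particular YAML library uses):

```python
import re

# Simplified pattern for a decimal float: digits, one dot, digits.
# Real YAML resolvers use more elaborate patterns; this is an approximation.
FLOAT_RE = re.compile(r"[-+]?[0-9]+\.[0-9]+")


def resolve(text):
    """Return a float if the whole scalar looks like one, else the string."""
    if FLOAT_RE.fullmatch(text):
        return float(text)
    return text


print(repr(resolve("9.3")))    # one dot: matches the float pattern
print(repr(resolve("3.5.3")))  # two dots: no match, stays a string
print(repr(resolve("3.10")))   # also a float, so the trailing zero is lost
```

The last case is why quoting matters for version numbers: under this kind of
inference an unquoted 3.10 and 3.1 become the same value.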

------
anyzen
> One might also argue that fixing it is as easy as replacing load() with
> safe_load(), but many people are unaware of the problem, and even if you’re
> aware of it, it’s one of those things that can be easy to forget. It’s
> pretty bad API design.

It is. At API design time, it would have been trivial to replace them with
`load()` (which does the same as `safe_load()` now) and `unsafe_load()` (which
does the same as `load()` now) and probably avoid this pitfall altogether.
Now? Much more difficult to solve.
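
A sketch of that naming scheme in Python. This is an analogy built on the
standard library's `ast.literal_eval` and `eval`, not PyYAML's actual code;
the two function names simply follow the proposal above.

```python
import ast


def load(text):
    """Safe by default: accepts only literals (numbers, strings, lists, dicts)."""
    return ast.literal_eval(text)


def unsafe_load(text):
    """Opt-in and loudly named: evaluates arbitrary expressions.

    Never call this on untrusted input.
    """
    return eval(text)  # deliberately dangerous, for illustration only


print(load("{'port': 8080}"))
try:
    load("__import__('os').getcwd()")
except ValueError as exc:
    print("rejected by load():", exc)
```

With this layout the short, obvious name is the harmless one, and the
dangerous behaviour has to be requested by name.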

------
EamonnMR
For what it's meant to do (concisely express data in a human maintainable
way) it's still in a league of its own. For configuring your apps, it's the
way to go.

------
smsm42
I think this is a problem of trying to serve all use cases at once. I mean, if
you're making a data exchange format, do you _really_ need it to be able to
execute arbitrary code? Isn't that inviting trouble? It's like building a bank
vault and then cutting a large hole in it and putting a plywood door on it -
just in case somebody would want to convert it to a restaurant later. Maybe it
would be better not to serve that particular use case at all?

------
tomxor
I'm not saying YAML is perfect or terrible (just like the author), however
most of the examples the author gives are going to be the same for any
language attempting to implicitly type values.

I think maybe YAML just went a bit too far: because everyone hates defining
key names with quotes when it's unnecessary 99% of the time, but it would have
been enough to relax that rather than go all the way to suggesting relaxing
all quotes and then attempting to infer value types.

------
jgalt212
Our shop are heavy users of YAML, and we've sort of backed our way into a
restricted subset of YAML. Some of them are config files, but others function
closer to DSLs.

I have not yet taken a look at strictyaml, but after years of use the spec
definitely needs the "YAML: The Good Parts" treatment.

One thing the author did not mention was how slow the out-of-the-box Python
YAML parser can be. This can be sped up with a call to libyaml, but then you
lose the safe_load method.

~~~
gpoore
I suspect the speed issues are more a result of implementation details than of
being in Python. Last year, I created a config language for my own use that
supports some syntax very similar to YAML. My pure Python library can load
simple dict/list/string data 10x as fast as PyYAML, and nearly within 1.5x the
speed of libyaml
([https://bespon.org/#benchmarks](https://bespon.org/#benchmarks)). That's
while building an AST with source information to allow round-tripping and
supporting my own version of anchors and tags, so there's significant room for
improvement. I expect that a pure Python YAML library might be able to match
or beat the current performance of libyaml in at least some cases,
particularly for a restricted subset of YAML.

~~~
jgalt212
I did a few rudimentary benchmarking tests on in house real world data sets
and strictyaml was approx 10X slower than PyYAML's yaml.load (w/ out calling
out to libyaml) and yaml.safe_load.

I used the basic strictyaml.load function, without any schemas.

README page said speed is not a current priority, and that appears to be true.

[https://github.com/crdoconnor/strictyaml#strictyaml](https://github.com/crdoconnor/strictyaml#strictyaml)

------
minimaxir
Semirandom question: what's the proper file extension for a YAML file?

I've done informal polls on it and every time it's an _even split_ between
.yaml and .yml

~~~
kaslai
There are plenty of file formats that have multiple extensions in common use.
I can think of a few off the top of my head:

\- .jpg / .jpeg

\- .tif / .tiff

\- .htm / .html

\- .cpp / .cxx

It's frustrating at times, but not all file formats have One True Extension.

~~~
ythn
C++ is the worst offender: .cpp, .cxx, .cc, .h, .hpp, .hh, etc.

~~~
kaslai
I've even seen .c++/.h++ in the wild at least once.

~~~
erik_seaberg
Could be worse, could be uppercase .C/.H…

------
pippy
I got blindsided by another obscure yml parsing rule. Sometimes people
couldn't pay online using our service and I couldn't figure out why. I tested
the hell out of the payment module on our dev and UAT environments, but I just
couldn't reproduce the issue.

Eventually I tracked it down to an ID inside a yml file. Turns out the live
environment was running in 32 bit mode which interpreted the number as a
string.
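
One way this kind of environment-dependent typing can arise, sketched in
Python. The comment doesn't name the parser, so the fallback rule here (keep
the text as a string when the number doesn't fit the platform integer) is an
assumption, and the function is hypothetical:

```python
import re


def resolve_scalar(text: str, int_bits: int):
    """Resolve a digit string to int only if it fits a signed integer of
    `int_bits` bits; otherwise keep it as a string (assumed fallback)."""
    if re.fullmatch(r"-?[0-9]+", text):
        value = int(text)
        limit = 2 ** (int_bits - 1)
        if -limit <= value < limit:
            return value
    return text


account_id = "3000000000"  # made-up ID, larger than 2**31 - 1
print(repr(resolve_scalar(account_id, 64)))  # fits: resolved as an integer
print(repr(resolve_scalar(account_id, 32)))  # does not fit: stays a string
```

The same file then yields different types on dev and live, which matches the
symptom described.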

------
Pxtl
As somebody who heavily uses statically-typed languages and serializers, I'm
actually really enjoying reading about StrictYAML.

I love this bit explaining why schemas are essential to having a human-
friendly config file:

[http://hitchdev.com/strictyaml/why/syntax-typing-bad/](http://hitchdev.com/strictyaml/why/syntax-typing-bad/)

------
transfire
I hope YAML Implementers will read this and give thought to adding proper
support for Schemas
([http://yaml.org/spec/1.2/spec.html#Schema](http://yaml.org/spec/1.2/spec.html#Schema))
-- they should be customizable by the application. This would resolve a number
of complaints and improve interoperability.

------
kvark
We used YAML from the start in the Wrench tool for writing reftests for
WebRender[1]. Currently looking into the prospect of migrating all of them to
RON[2] as a better alternative, which has proven itself useful in WebRender
captures.

    
    
      [1] https://github.com/servo/webrender/
      [2] https://github.com/ron-rs/ron

------
geraldbauer
Nothing is perfect and critiques always welcome. Will add the article to
Awesome YAML - a Collection of Awesome YAML (Ain't Markup Language) Goodies
for Structured (Meta) Data in Text
[[https://github.com/datatxt/awseome-yaml](https://github.com/datatxt/awseome-yaml)]

------
shruubi
Honestly, I can't stand YAML. Every time I need to read a YAML file I also
need to bring up a reference for what the syntax is because to me, none of it
is immediately obvious.

Personally, I don't understand why all these projects have defaulted to using
such an esoteric markup language.

------
Waterluvian
Yaml has a lot of features that I think need to be used sparingly.

But wow do I wish JSON supported comments in some form.

------
asimpletune
I enjoy HOCON btw. Are there other configuration languages that any of y’all
like?

~~~
EamonnMR
There's this half baked one I built once because we wanted untyped config and
had never heard of JSON

[https://github.com/EamonnMR/Zond/blob/master/RedShift/src/co...](https://github.com/EamonnMR/Zond/blob/master/RedShift/src/core/StringTree.java)

It ends up looking like this:
[https://github.com/EamonnMR/Zond/tree/master/RedShift/assets...](https://github.com/EamonnMR/Zond/tree/master/RedShift/assets/text)

------
geoalchimista
Surprised to see that few people mentioned this one advantage that JSON has
over YAML: validation against a predefined Schema [1].

[1] [http://json-schema.org](http://json-schema.org)

------
TooBrokeToBeg
> it seems to be the case that the majority of libraries are unsafe by default
> (especially the dynamic languages), so de-facto it is a problem with YAML.

It may seem that way, but it's not. Re: cryptographic functions

------
tetron
Schemas and validation for YAML (and JSON):

[http://github.com/common-workflow-language/schema_salad](http://github.com/common-workflow-language/schema_salad)

------
andrei_says_
I hear the arguments, and I’d say YAML is unbeatable

\- when loaded safely

\- when used by humans to edit obviously structured data of hashes and arrays

\- and the humans are trained to double-quote potentially problematic entries.

This wins over XML or JSON hands down.

------
transfire
[https://zedshaw.com/archive/stackish-an-xml-alternative/](https://zedshaw.com/archive/stackish-an-xml-alternative/)

------
edibleEnergy
I've used YAML for years, and yeah, there's a lot of warts, but I love it for
small(ish) manually edited configuration files. I generally keep it simple and
it works.

------
earonesty
Library comparison:
[https://github.com/cblp/yaml-sucks](https://github.com/cblp/yaml-sucks)

------
k__
Is there something that looks like YAML, but with a smaller spec?

I know things like TOML and JSON, while they have good DX, for me YAML has the
better UX.

------
amelius
The most flexible configuration language is still a programming language. It
allows you to define e.g. callback functions.
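
For instance, a configuration written as plain Python can carry a function as
a value. A small sketch (the names `on_retry` and `CONFIG` are hypothetical):

```python
# A configuration written as plain Python (all names here are hypothetical).
def on_retry(attempt: int) -> float:
    """Callback: seconds to wait before retry `attempt` (exponential, capped)."""
    return min(2.0 ** attempt, 30.0)


CONFIG = {"max_retries": 5, "backoff": on_retry}

# The application just calls whatever the config supplied.
delays = [CONFIG["backoff"](n) for n in range(CONFIG["max_retries"])]
print(delays)
```

A static format can only name a strategy; here the config defines it.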

------
ne01
Those reasons are exactly why I created mset. It's not as flexible as Yaml but
it is dead simple and works for many use cases. It is also very easy to
implement in any language and more importantly super simple to learn for end
users.

[https://github.com/sed-seyedi/mset](https://github.com/sed-seyedi/mset)

------
scandox
> What About:

> python: 3.5.3

> postgres: 9.3

> {'python': '3.5.3', 'postgres': 9.3}

Surely that's reasonable?

------
programmarchy
Anyone here use Swagger? Do you write your definitions in JSON or YAML?

~~~
Carpetsmoker
I generate Swagger/OpenAPI files, and use YAML because it's easier to read.

If I had to write OpenAPI files manually I'd probably choose to jump out of
the window.

------
krick
> {'Clemenza': True},

That actually made me laugh out loud.

------
forrestthewoods
What should I use for config files?

------
ythn
From a reddit comment:
[https://www.reddit.com/r/programming/comments/8shzcu/yaml_pr...](https://www.reddit.com/r/programming/comments/8shzcu/yaml_probably_not_so_great_after_all/e0zwlx3)

Seems like YAML tried too hard to be predictive of intent. I never got into
YAML myself simply because it seemed like "JSON, but less ubiquitous and more
hassle to find libraries that support it"

------
jlebrech
there should be a json standard with a header to describe the content.

also what was wrong with ini files?

------
Edmond
Object graphs are the answer to the endless iteration on the right config
format:

[http://codesolvent.com/config-node/](http://codesolvent.com/config-node/)

it is however difficult to pull off and requires productization, in other
words not low-level tooling in a text file.

~~~
mason55
> _difficult to pull off and requires productization, in other words not low-
> level tooling in a text file._

Is this supposed to be a feature? One of the great things about simple config
files is that you can use standard GNU tools to view, edit, and diff them, you
can put them in source control, you can be sure that you can edit them on a
remote server no matter what's installed, etc.

Eliminating all those benefits would require an extraordinary jump in
functionality as a tradeoff, a jump in functionality that most things frankly
don't need.

~~~
Edmond
You're right regarding the value of text files. ConfigNode is really for
management of configurations for use cases where you can have huge JSON/YAML
files and you want to be able to manage them, facilitate dynamism (often using
templates), support collaboration, etc. It is a gold-plated solution and not
necessarily suitable for simpler needs.

Here's another example of ConfigNode used to manage Akamai configurations:

[https://youtu.be/gcPAmpKo9fs](https://youtu.be/gcPAmpKo9fs)

Akamai configurations can be very complex and need to be maintained and
managed, you can't really do that effectively using text files.

