
YAML: Probably not so great after all (2019) - wheresvic3
https://www.arp242.net/yaml-config.html
======
mxscho
Earlier discussions:

[https://news.ycombinator.com/item?id=17358103](https://news.ycombinator.com/item?id=17358103)

[https://news.ycombinator.com/item?id=20731160](https://news.ycombinator.com/item?id=20731160)

------
hn_throwaway_99
One of my favorite bugs of all time, which turns out to be a pretty common
yaml bug (the article touches on a similar example), was when we used yaml to
configure some localizable data. Everything worked fine until Norway came
online. Sweden worked fine, so did Denmark and every other country, but the
app crashed when it loaded the config for Norway.

Turns out the country code for Norway, "no", is interpreted as boolean false
when unquoted in yaml.
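A minimal sketch of why this happens. It assumes YAML 1.1 implicit typing as PyYAML implements it: unquoted yes/no/on/off/true/false, in lower, title, or upper case, resolve to booleans. The 1.1 spec also lists single-letter y/n, which PyYAML leaves out. `resolve_plain_scalar` is a hypothetical helper that models only this boolean rule. It is not a YAML parser.

```python
import re

# YAML 1.1 boolean words in the casings PyYAML's resolver accepts.
# (The 1.1 spec also lists y/Y/n/N; PyYAML deliberately leaves those out.)
_BOOL_1_1 = re.compile(
    r"^(?:yes|Yes|YES|no|No|NO|true|True|TRUE|false|False|FALSE|on|On|ON|off|Off|OFF)$"
)
_TRUE_WORDS = {"yes", "true", "on"}


def resolve_plain_scalar(text: str, quoted: bool = False):
    """Hypothetical helper: apply only the YAML 1.1 boolean rule.
    Quoted scalars always stay strings."""
    if not quoted and _BOOL_1_1.match(text):
        return text.lower() in _TRUE_WORDS
    return text


for code in ["se", "dk", "no", "fi"]:
    print(code, "->", repr(resolve_plain_scalar(code)))   # "no" comes back as False

# Quoting the value restores the country code.
print(repr(resolve_plain_scalar("no", quoted=True)))
```

YAML 1.2's core schema treats only true/false (in those casings) as booleans. Whether a given file trips over "no" therefore depends on which spec version the loader follows.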

~~~
azaras
But that's because you aren't using a yaml serializer.

If you use a template, you have to put quotes ("") around strings.
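Here is a sketch of what a serializer does that a hand-filled template does not. It quotes any string that a YAML 1.1 loader would otherwise read as a bool, null, or number. That includes keys, so a mapping keyed by country code is covered too. `emit_scalar`, `emit_mapping`, and the list of risky patterns are hypothetical and deliberately incomplete. A real YAML library's dumper handles many more cases, such as hex ints, `:` or `#` inside values, and multi-line strings.

```python
import json
import re

# Incomplete, illustrative set of plain scalars that YAML 1.1 loaders
# resolve to non-strings: booleans, nulls, ints and simple floats.
_AMBIGUOUS = re.compile(
    r"^(?:yes|no|true|false|on|off|null|~|[-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)$",
    re.IGNORECASE,
)


def emit_scalar(value: str) -> str:
    """Hypothetical emitter: quote a string if leaving it bare would change its type."""
    if value == "" or _AMBIGUOUS.match(value):
        # The escapes json.dumps produces are also valid in a YAML double-quoted scalar.
        return json.dumps(value)
    return value


def emit_mapping(data: dict) -> str:
    # Keys need the same treatment as values: a bare `no:` key is a boolean too.
    return "\n".join(f"{emit_scalar(k)}: {emit_scalar(v)}" for k, v in data.items())


print(emit_mapping({"se": "Sweden", "no": "Norway", "code": "no", "version": "1.10"}))
```

The `version` line shows a related trap: a bare `1.10` would load as the float 1.1.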

~~~
eyelidlessness
If your human-writable format requires a serializer to avoid unexpected types,
it's not human-writable. If your human-readable format requires special cases
for certain keywords but treats others as strings, it's not human-readable.

~~~
yjftsjthsd-h
> If your human-writable format requires a serializer to avoid unexpected
> types, it's not human-writable.

That sounds like there are no human-writable formats.

> If your human-readable format requires special cases for certain keywords
> but treats others as strings, it's not human-readable.

_That_ I can 100% support; _sometimes_ needing to quote strings is a recipe
for disaster.

~~~
eyelidlessness
> That sounds like there are no human-writable formats.

I'm not sure why it sounds that way. There are formats where types are
explicit, either by stating them or by their delimiters or by their contents.
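JSON is one example where the delimiters decide the type. A string is always quoted and a boolean is a bare keyword, so "no" and false cannot be confused, and a bare `no` is simply a syntax error. A small illustration with Python's standard `json` module:

```python
import json

doc = '{"country": "no", "enabled": false, "port": "8080", "retries": 3}'
config = json.loads(doc)

# The delimiters alone decide the types; no keyword table is consulted.
for key, value in config.items():
    print(f"{key}: {value!r} ({type(value).__name__})")

# An unquoted `no` is not silently reinterpreted; it is rejected.
try:
    json.loads('{"country": no}')
except json.JSONDecodeError as err:
    print("rejected:", err.msg)
```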

------
_bxg1
I've never understood the love for significant whitespace. I see where the
idea came from - "No more missing semicolon errors! Woohoo!" - but it
should've been clear after trying out the idea for five minutes that it was
not at all worth it. It constantly causes trouble and all just to save a
couple of keystrokes.

Though: now that I think about it, most of the problems happen at the block
level, not the line level. So maybe significant newlines are fine but not
indentation.

~~~
ludamad
Maybe after trying it out for five minutes you make up your mind - but mine
was decidedly for it.

~~~
throwaway_pdp09
Always liked the idea of whitespace-only indenting but always suspicious of
it. Problems I've found are that automatically generated code can't just be
dumped in with { and } around it.

A colleague spent a few hours tracking down a python bug caused by a bad merge
that messed up the indent without it outright causing the code to crash, just
produce wrong output. That's IMO too expensive a bug type to permit just to
save a few keystrokes. But YMMV as they say.

~~~
mixmastamyk
It is a great deal, millions of keystrokes and code reading impediments
avoided over _decades_ vs. one exception a colleague of someone on a
discussion board had once.

That’s a 1000x advantage in my book. Not to mention that exceptions are often
avoided with a professional editor with whitespace highlighting and indentation
guides.

~~~
viklove
> one exception a colleague of someone on a discussion board had once

You're not even trying to have a reasonable discussion about this, are you.

At least this guy has a datapoint. You just have a hunch that you can write
code faster if it's formatted slightly differently. I don't think it's true --
I don't think typing semicolons and curly braces slows down my coding enough
to warrant omitting them. Most of my time coding is spent thinking and
reading, probably less than 15% of my time is spent typing. So if I only type
semicolons and commas 5% of that time, that means I'm saving 0.75% of my time
by not typing them, which is about 14 hours a year if I work 40 hour weeks.
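Here is the arithmetic above, spelled out. The 14-hour figure lines up with roughly 48 working weeks a year. The comment only says 40-hour weeks, so the week count is an assumption; a full 52 weeks gives about 15.6 hours.

```python
typing_share = 0.15        # upper bound on the fraction of coding time spent typing
punctuation_share = 0.05   # fraction of typing spent on semicolons, braces, commas
saved = typing_share * punctuation_share   # 0.0075, i.e. 0.75% of working time

for weeks in (48, 52):
    hours = 40 * weeks * saved
    print(f"{weeks} weeks: {hours:.1f} hours/year")
```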

I've definitely debugged a yaml error or two due solely to formatting in the
past year, which probably cost me at least 4 hours of productivity. If my
entire codebase was formatted solely by indents and returns, I imagine that
number would balloon significantly, which is why I opt to use semicolons and
other formatting characters.

Also, they're not "reading impediments," at least not universally. Being able
to use my code-editor to jump to the end of a code block (by matching a { or
}) has been immensely useful in my experience.

~~~
mixmastamyk
The detriment to readability of punctuation chars is documented. I write code
all day in .py and .js and it is obvious which is easier to read. I won’t
bother to mention .pl.

If you need 4 hours to fix a yaml file, then what can I say? Perhaps you need
better tools, or to reduce tech debt.

~~~
viklove
> If you need 4 hours to fix a yaml file

Yes, recognizing a bug, investigating it, sourcing it to a yaml file, finding
the offending line, fixing it and testing it, and getting the fix merged up
probably took me a minimum of 2 hours, and I had to do it at least twice.

You just handwave away every argument that is brought against you, without
actually thinking about what you're saying or if it's even true. You're the
worst type of engineer.

~~~
mixmastamyk
Yeah, the worst type, who has never faced/complained about/belabored
whitespace issues in over twenty years. Because they are shown trivially in
any editor. As opposed to the hysterical type who needs hours to fix a text
file.

------
eyelidlessness
Nearly every time I have to make any changes to a CI environment (or set one
up for a new project), I end up with a stream of increasingly unintelligible
and often increasingly grumpy commit notes. It's almost always because YAML,
as a format, is confusing and frustrating to write.

Structured data with only vague, and often misleading, indication of its
structure is awful to work with. Sure, it's "human readable", but if your data
is of any real complexity you're almost certainly better off just machine-
generating it.

~~~
crispyambulance
It makes me grumpy. Even the self-ironic name "Yet ANOTHER mark-up language"
is a set-up for disappointment.

My personal belief is that XML was "good-enough" having been engineered for
many use-cases and easy to comprehend for what it provides.

It's a tragedy that XML was abused so vigorously in the early aughts. People
got so sick of it because they were compelled to edit XML manually in a text
editor with shitty-to-no schema support. It didn't help that the worst
monstrosities of that era (SOAP and WS-* crap) heavily invested in turgid,
badly designed XML.

I think that if there had been some more effort on tooling and good practices,
XML could have remained popular and we would not have had json, yaml, and
whatever awful thing is next.

~~~
thinkloop
> XML could have remained popular and we would not have had json

One of the selling points of json is file size. XML started dying when people
realized that more than half their bandwidth was going to xml structure rather
than actual data.

~~~
Thiez
Is the difference in size really significant? When the files are small it
doesn't matter much, and when they are large you can throw some compression in
the mix. I've never heard filesize as an argument for json before.

~~~
thinkloop
Imagine a table of numbers, the xml would be 90% repeated header names and
structure. Now multiply that by hundreds of little Ajax requests, it added up.
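A rough way to check the size argument with Python's standard library is to serialize the same small table of numbers as element-per-field XML and as compact JSON, then compare raw and zlib-compressed byte counts. The table contents and the XML layout are assumptions. Attribute-based XML or a different JSON shape would change the ratio, so no particular numbers are claimed here.

```python
import json
import zlib
import xml.etree.ElementTree as ET

# Hypothetical table: 200 rows of three numeric columns.
rows = [{"id": i, "price": i * 3, "qty": i % 7} for i in range(200)]

# Element-per-field XML: every value is wrapped in an opening and closing tag.
root = ET.Element("rows")
for row in rows:
    row_el = ET.SubElement(root, "row")
    for key, value in row.items():
        ET.SubElement(row_el, key).text = str(value)
xml_bytes = ET.tostring(root)

json_bytes = json.dumps(rows, separators=(",", ":")).encode()

for name, data in (("xml", xml_bytes), ("json", json_bytes)):
    print(f"{name}: {len(data)} bytes raw, {len(zlib.compress(data))} bytes compressed")
```

Compression typically narrows the gap a lot, which is the point made above. Whether the remainder matters depends on how many requests are involved.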

------
teknopaul
Heartily agree with this post.

One of the things I most dislike about yaml is that they persuaded JSON to
remove comments from the spec for the sake of some pseudo-compatibility
nonsense. Without comments, json is less useful for config files and for
working, documented examples.

I have written a pre-parser to permit comments in JSON and for nodejs apps I
use javascript as config. Naturally where security is not a concern.
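A pre-parser like that can be small. The sketch below is hypothetical and is not the commenter's code. It strips `//` line comments and `/* */` block comments that sit outside string literals, then hands the result to the standard `json` module. It does not handle trailing commas or other JSON5 extensions.

```python
import json


def strip_json_comments(text: str) -> str:
    """Remove // and /* */ comments that appear outside JSON string literals."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(text[i + 1])   # keep escaped char, e.g. \" or \\
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            end = text.find("\n", i)
            i = n if end == -1 else end   # drop the comment, keep the newline
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated block comment")
            i = end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def load_commented_json(text: str):
    return json.loads(strip_json_comments(text))


config = load_commented_json("""
{
  // where the service listens
  "port": 8080,
  /* URLs contain // but are left alone inside strings */
  "upstream": "http://example.com/api"
}
""")
print(config)
```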

Yaml always seemed like a mess to me.

Liking Toml, decent compromise.

And linux style: name space value, with hash and semicolon for comments,
unless I really need a hierarchy.
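For the flat "name value" style, the parser is a few lines. This sketch assumes the following rules:

- one setting per line
- the name is separated from the value by the first run of whitespace
- `#` or `;` starts a comment only at the beginning of a line, so values may still contain them

These rules are assumptions, since such files vary between programs.

```python
def parse_flat_config(text: str) -> dict[str, str]:
    """Parse 'name value' lines; blank lines and lines starting with # or ; are skipped."""
    settings = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        parts = line.split(None, 1)   # split on the first run of whitespace only
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'name value', got {raw!r}")
        name, value = parts
        settings[name] = value
    return settings


sample = """
# network
listen 0.0.0.0:8080
; logging
log_level debug
motd Welcome; enjoy your stay
"""
print(parse_flat_config(sample))
```

Every value comes back as a string, so the application decides the types. That sidesteps the "no"-becomes-false problem entirely.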

~~~
williamdclt
> I have written a pre-parser to permit comments in JSON and for nodejs apps I
> use javascript as config. Naturally where security is not a concern.

JSON5 allows comments and a few other niceties. I'd trust JSON5 parsers more
than my own hand-rolled one.

------
topkai22
I find YAML distressingly hard to work with. Just one example- because it’s
white space delimited, copying and pasting a block of code will often blow up
between docs based on nesting levels. This is intensely frustrating when the
setting is tied to a CI pipeline that takes 5 minutes to get to the error.

The human unreadability of JSON is greatly exaggerated. While you can get
horrible looking JSON, you can also then pretty print it into something much
better. If we could just all agree to allow comments into a spec it’d be fine.
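Pretty-printing is a one-liner in most languages. With Python's standard `json` module it looks like this; `python -m json.tool file.json` does the same from a shell.

```python
import json

compact = '{"jobs":{"build":{"steps":["checkout","make"],"timeout":300}}}'
pretty = json.dumps(json.loads(compact), indent=2, sort_keys=True)
print(pretty)
```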

~~~
morelisp
> This is intensely frustrating when the setting is tied to a CI pipeline that
> takes 5 minutes to get to the error.

I also detest this, but it's not really about YAML - it's ridiculous that so
many of our tools no longer allow any validation / linting of files before
attempting to use them; some that do require the "full service" running to
provide sufficient context; even those that don't don't always give good (i.e.
consistent, structured) errors. This problem would remain if the format was
JSON, TOML, or even XML when working without schemas for every namespace.

The worst is the tools that layer Jinja on top of the YAML
(Salt/Ansible/Puppet) which is basically impossible to validate statically. At
least with GitLab, Docker Compose, and Kubernetes I have some hope - but
integration into other tools is awful.
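Where a tool offers no validator, a small local pre-flight check can at least catch structural mistakes before a slow pipeline does. The sketch below is hypothetical. Its schema, a required `image` string and `steps` list per job plus an optional integer `timeout`, is invented for illustration and does not describe any real CI system.

```python
import json

# Hypothetical schema for each job: field -> (expected type, required?)
JOB_SCHEMA = {"image": (str, True), "steps": (list, True), "timeout": (int, False)}


def validate_config(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means the config passed."""
    jobs = config.get("jobs")
    if not isinstance(jobs, dict) or not jobs:
        return ["jobs: must be a non-empty mapping"]
    problems = []
    for job_name, job in jobs.items():
        if not isinstance(job, dict):
            problems.append(f"jobs.{job_name}: must be a mapping")
            continue
        for field, (expected, required) in JOB_SCHEMA.items():
            if field not in job:
                if required:
                    problems.append(f"jobs.{job_name}.{field}: missing")
                continue
            value = job[field]
            # bool is a subclass of int in Python; no field here is a boolean,
            # so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, expected):
                problems.append(
                    f"jobs.{job_name}.{field}: expected {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        for key in sorted(job.keys() - JOB_SCHEMA.keys()):
            problems.append(f"jobs.{job_name}.{key}: unknown field")
    return problems


# A typo ("step" instead of "steps") caught locally instead of minutes into a CI run.
config = json.loads('{"jobs": {"test": {"image": "python:3.12", "step": ["pytest"]}}}')
for problem in validate_config(config):
    print(problem)
```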

~~~
williamdclt
CircleCI has a CLI: you can run `circleci config validate` (or something like
that) and it will check that what you wrote is valid YAML and that it's a
valid CircleCI config (so syntactic and semantic validation). Very useful;
it's one of the many things that make me prefer Circle over the other solutions
I tried (Travis, Jenkins, and a couple of others I forget).

------
jrochkind1
Since he opens by referring to a similar argument against JSON for human-
editable configuration files, I wondered what format he did like for that.
Answer at the end appears to be TOML, if you must have one at all.

> Don’t get me wrong, it’s not like YAML is absolutely terrible – it’s
> probably better than using JSON – but it’s not exactly great either. There
> are some drawbacks and surprises that are not at all obvious at first, and
> there are a number of better alternatives such as TOML and other more
> specialized formats.

> One good alternative might be to just use commandline flags.

> If you must use YAML then I recommend you use StrictYAML, which removes some
> (though not all) of the more hairy parts.

(I do agree YAML is in retrospect a mistake. The reasons why remind me in some
ways of the problems with Markdown).

(Oddly, I can't seem to find an up to date TOML parser in ruby that supports
the 1.0 spec...)

------
crehn
Fun fact: YAML is a superset of JSON. That is, any JSON is also valid YAML.

YAML is a complex and unintuitive mess that allows doing everything in a
million ways. I’m surprised it ever got so much traction. TOML is a breath of
fresh air next to it.

~~~
bmn__
> Fun fact: YAML is a superset of JSON. That is, any JSON is also valid YAML.

That's false. [https://metacpan.org/pod/JSON::XS#JSON-and-YAML](https://metacpan.org/pod/JSON::XS#JSON-and-YAML)

------
moron4hire
Wait, what? It's built in to the format that it can execute arbitrary shell
commands?

"Who approved this?!"

I can't imagine the cluster fuck of ideas in a person's head to lead to
thinking that this was an ok design for a configuration file. The person who
designed this and I just don't live on the same planet.

And here I thought the reason I didn't use YAML was because the syntax looked
stupid.

~~~
ezrast
No, it's built into the format that implementations can extend it with their
own types. Then library authors decided it would be neat if _any_ object in
their language could be serialized as yaml, and so extended it with types that
happen to be able to execute arbitrary code, because that's what objects do in
dynamic languages.

A more conservative implementation wouldn't have that particular
vulnerability.
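The more conservative design can be sketched as a tag dispatcher with an explicit allowlist. Known tags map to plain-data constructors. Anything else, such as a tag naming a Python callable, is rejected instead of being resolved and called. The tag table and the `construct_tagged` helper are hypothetical; this illustrates the idea and is not how any particular YAML library is written.

```python
from datetime import date

# Only these tags are honoured; each builds plain data and never looks up
# or calls an object named inside the document.
SAFE_CONSTRUCTORS = {
    "!!str": str,
    "!!int": int,
    "!!float": float,
    "!!timestamp": date.fromisoformat,   # simplified: dates only
}


class UnsafeTagError(ValueError):
    pass


def construct_tagged(tag: str, value: str):
    """Hypothetical: build a value for an explicitly tagged scalar."""
    try:
        constructor = SAFE_CONSTRUCTORS[tag]
    except KeyError:
        raise UnsafeTagError(f"refusing to construct unknown tag {tag!r}") from None
    return constructor(value)


print(construct_tagged("!!int", "42"))
try:
    construct_tagged("!!python/object/apply:os.system", "ls /")
except UnsafeTagError as err:
    print(err)
```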

------
Animats
YAML allows escapes to executable code?

    !!python/object/apply:os.system
    args: ['ls /']

Who put that backdoor in?

~~~
bmn__
That's a leading question.

Authors of some libraries did not consider the security implications of the
object serialisation part of the spec. I assign the blame to them for not
looking beyond their own limited horizon, metaphorically speaking. When they
made their libraries, the reference implementation was already available and
it was secure by default.

~~~
Animats
A "call arbitrary external program" feature does not get in there by accident.

~~~
bmn__
It's not a "call arbitrary external program" feature, but an "object
serialisation" feature that has security implications which some implementers
did not handle correctly because of their language parochialism and lack of
experience.

That does make it closer to an unfortunate (but entirely avoidable) accident
rather than what you believe, that evil people maliciously and intentionally
added a backdoor. You can keep believing it, but that does not make it any
more true.

------
pmoriarty
My biggest gripe with YAML is its meaningful whitespace.

Debugging nearly invisible indentation problems can be such a pain.

~~~
meowface
I'd argue that's the defining feature, though. That's one of the main reasons
people use it over JSON.

~~~
karussell
It is not so much about indentation vs. brackets. For me the main reason over
JSON is that YAML works without quoting the entries and the possibility of
comments. And btw: every JSON file should be a valid YAML file in theory.

------
nojvek
When I first saw json, compared to XML, it was so simple. As a fairly new
programmer I could say to myself “hey I can write a parser and dumper for this
quite easily”.

When I first saw YAML, it was nice but it felt a bit too complicated.

All I really want is Indented JSON. New lines instead of commas. That’s it.
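One reading of "indented JSON" fits in a few dozen lines:

- nesting comes from indentation (spaces only)
- each line is either `key: value` or `key:`, which opens a nested block
- every value must be a JSON literal, so a bare `no` is an error rather than a boolean

The format and `parse_indented_json` are hypothetical, and so are its limits: keys cannot contain a colon, there are no lists of mappings, and a value cannot span several lines.

```python
import json


def parse_indented_json(text: str) -> dict:
    """Hypothetical 'indented JSON': indentation for nesting, JSON literals for values."""
    root: dict = {}
    # Each frame: [indent of the 'key:' header, its mapping, indent of its children].
    stack = [[-1, root, None]]
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip(" "))   # spaces only, no tabs
        key, sep, rest = raw.strip().partition(":")
        key = key.strip()
        if not sep or not key:
            raise ValueError(f"line {lineno}: expected 'key: value' or 'key:'")
        # Leave every block this line has dedented out of.
        while indent <= stack[-1][0]:
            stack.pop()
        frame = stack[-1]
        if frame[2] is None:
            frame[2] = indent            # first child fixes the block's indent
        elif indent != frame[2]:
            raise ValueError(f"line {lineno}: inconsistent indentation")
        parent = frame[1]
        if rest.strip():
            try:
                parent[key] = json.loads(rest)   # value must be a JSON literal
            except json.JSONDecodeError:
                raise ValueError(
                    f"line {lineno}: {rest.strip()!r} is not a JSON value"
                ) from None
        else:
            child: dict = {}
            parent[key] = child
            stack.append([indent, child, None])
    return root


doc = """
country: "no"
enabled: false
server:
  host: "example.com"
  ports: [80, 443]
"""
print(parse_indented_json(doc))
```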

Json is fast to parse. See simdjson doing it at > 2.5GB/s. One can’t do this
with YAML, where "no" could mean many things.

And god I hate k8s for their bajillion yaml configs. Thank god for jsonnet
that can dump to yaml. Jsonnet is truly nice and makes working with json like
configs a pleasure.

------
save_ferris
I see a lot of criticism of YAML, and I’ve looked at a few other minimal
configuration languages like TOML as well.

Serious question: why doesn’t this language space have a universally adopted
candidate like SQL?

~~~
stickfigure
For a brief period, it did: XML

But everyone thought "that's too complex" and so reinvented something simpler.
Then kept adding features, and more features, and now YAML is more complicated
than XML.

This process will repeat ad infinitum.

~~~
pjmlp
Thankfully in Java and .NET land it is still mostly about XML.

It is so ironic to see these fads come and go, and then one wonders why we are
so cynical regarding newcomers.

~~~
reallydontask
.net core uses json for config files

~~~
pjmlp
Still using .NET Core 1.x?

One can use the XML Configuration Provider just as well.

~~~
reallydontask
> Still using .NET Core 1.x?

We're on a mix of 2.2 and 3.1, where the default is still json, or have I got
this wrong?

~~~
pjmlp
Just because it is the default, it doesn't mean it cannot be changed.

I just thought that XmlConfigurationProvider wasn't in 1.x, as I avoided that
crash train with its multiple reboots of how projects and .NET Standard were
supposed to look.

So I stand corrected regarding when it was introduced.

------
karussell
It is so easy to complain about something, but seriously: what are the
alternatives for human readable config files?

~~~
m4r35n357
For a scripting language, how about the language itself?

~~~
baq
Turing complete configuration is a no-no.

~~~
gruez
Why?

~~~
baq
I want my configuration to be guaranteed to halt for one, I don’t want my
configuration to have the ability to open sockets, I don’t want configuration
to import business logic or 3rd party modules, etc.
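Python offers one middle ground between "use the language itself" and a pure data format. Write the config as Python literals and read it with the standard library's `ast.literal_eval`. It accepts only literal structures (strings, numbers, tuples, lists, dicts, sets, booleans, None) and rejects calls, names, and imports. A config read this way cannot loop, open sockets, or import modules, which addresses the concerns above. The cost is giving up computed values.

```python
import ast

config_text = """
{
    # comments are allowed, since this is Python syntax
    "country": "no",
    "enabled": False,
    "ports": [80, 443],
}
"""
config = ast.literal_eval(config_text.strip())
print(config)

# Anything beyond a literal, such as a function call, is refused, not executed.
malicious = "__import__('os').system('ls /')"
try:
    ast.literal_eval(malicious)
except ValueError as err:
    print("rejected:", err)
```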

------
Mikhail_Edoshin
The specification of YAML is about three times as long as XML 1.0, but while
XML includes a whole grammar-based validator with such niceties as referential
integrity of IDs and default values for omitted parameters, YAML spends all
this on syntactic sugar.

------
rs23296008n1
YAML isn't my idea of clarity, but that's fine. I tend to go with using a python
script to generate a json file. This supports comments, if-else decision
making, modules and all sorts of other smarts. Tell users not to edit the
resulting json file - it's read-only.

Python gives you all the power of scripting including validation and json
gives you the easily readable format for both humans and machines. Great for
knowing exactly what the settings evaluated as.

And when you don't need the power of full expressive python script, you can
just json.dump() a python dictionary and be done with it.
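Here is a minimal version of that workflow. The setting names and the `ENV` environment variable are hypothetical. The Python side holds the comments and logic and validates the result. It then writes plain JSON for the application to read, with a conventional `_comment` field since JSON itself has no comments.

```python
import json
import os
import tempfile


def build_config(env: str) -> dict:
    """Hypothetical generator: comments and logic live here, not in the JSON."""
    config = {
        "_comment": "GENERATED by a script; do not edit by hand",
        "env": env,
        # Production gets more workers; everything else stays small.
        "workers": 8 if env == "prod" else 2,
        "log_level": "warning" if env == "prod" else "debug",
    }
    # Validate next to the logic that produced the values.
    if config["workers"] <= 0:
        raise ValueError("workers must be positive")
    return config


def write_config(path: str, env: str) -> None:
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(build_config(env), fh, indent=2, sort_keys=True)
        fh.write("\n")


# A temporary directory keeps the example free of side effects.
out_path = os.path.join(tempfile.mkdtemp(), "config.json")
write_config(out_path, os.environ.get("ENV", "dev"))   # ENV is an assumed variable
with open(out_path, encoding="utf-8") as fh:
    print(fh.read())
```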

I've also used sqlite as a config file format and that is very polite and
easily read from anything I use.
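A minimal sketch of SQLite as a configuration store, using Python's built-in `sqlite3` module. The single `settings` key/value table is an assumed layout, and an in-memory database stands in for a real file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # in practice a file, e.g. a hypothetical "app-config.db"
conn.execute("CREATE TABLE settings (name TEXT PRIMARY KEY, value TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO settings (name, value) VALUES (?, ?)",
    [("country", "no"), ("workers", "4"), ("log_level", "debug")],
)
conn.commit()

# Each row is a (name, value) tuple, so the cursor feeds dict() directly.
settings = dict(conn.execute("SELECT name, value FROM settings"))
print(settings)

# Values stay strings until the application converts them deliberately.
workers = int(settings["workers"])
conn.close()
```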

------
rudolph9
[https://cuelang.org](https://cuelang.org) is a very nice alternative that has
nice import/export support for yaml among others

------
mrbonner
I have not seen anything that could come close to the robustness and
flexibility XML has to offer. When being used for configuration XML seems to
be the superior choice. People tend to give XML a bad name in RPC usage for
being verbose. But, for config it’s perfect for me.

There are tons of IDEs supporting XML (syntax highlight & collapsing brackets,
etc...). XML schema is also great to ensure the config is conforming. I keep
thinking we are trying to reinvent a worse wheel here.

------
beefbroccoli
I just recently replaced a bunch of YAML in a project with JSON. At face value
the YAML still _looks_ easier to grok, but I kept needing to periodically
think about YAML<->JSON. At one point in the middle of yet again running JSON
through a YAML conversion process to see how I would write something, it
dawned on me that YAML wasn't making my life easier; it was making it
harder.

------
cachestash
The opening sentence is incorrect.

yaml.load is now a wrapper around yaml.safe_load that negates the risks he
highlights

------
azaras
YAML in kubernetes works very well.

If yaml is not working for you, you have to search for a serializer format
that fits your project.

There is no silver bullet.

~~~
morelisp
> If yaml is not working for you, you have to search for a serializer format
> that fits your project.

If I need a serializer format, it's not human-readable/writable, and I might
as well have something like XML or s-exp that are easier for many tools to
work with, more easily composable, easier to generate automatically, etc.

> There is no silver bullet.

This phrase needs to be retired. It's true, but invoked far too often to
excuse shit bullets.

~~~
j0057
Moreover, a silver bullet is meant to be especially effective in very specific
scenarios, i.e. shooting werewolves or vampires, whereas in colloquial usage
'silver bullet' refers to a thing that works well under all circumstances.

------
mixmastamyk
Use strict yaml, solvable problems solved.

