i honestly have trouble understanding the love for yaml when decades ago everyth...

skywhopper · on Oct 17, 2019

I don't really understand what you're saying here. It's definitely not a matter of being "cool" or not.

There's always going to be a need for some structured static configuration file format. Be it XML, INI, TOML, JSON, YAML, properties files, or whatever.

From my perspective, YAML and JSON have been more successful and well-liked than XML because they map much more directly to the basic data types common to all programming languages. How do you represent a list or a map in XML? Well, it depends...

Besides missing straightforward ways to map common data structures, XML is also way more verbose and much harder to read and write by hand than YAML and JSON. And no, there really is no way to easily map between XML and these languages. Again, how do you specify a list in XML?

Add to that the fact that for most use cases, marshaling and unmarshaling YAML can be handled directly with common libraries. But to parse XML into your internal data structures? You're going to have to write code, or decide on some schema to encode your data in before converting to XML. So XML didn't actually solve your problem of "how do we serialize this data?" It just provided a framework within which it was possible to write further standards.
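
To make the "it depends" concrete, here is a rough Python sketch using only the standard library. The element names (`regions`, `region`) and the `xml_to_list` helper are made up for illustration. The point is only that the JSON side needs no mapping code, while the XML side needs a convention you picked yourself plus code that understands it.

```python
import json
import xml.etree.ElementTree as ET

# JSON: a list is a list, no design decisions required.
from_json = json.loads('{"regions": ["ca", "us", "mx"]}')

# XML: we first had to invent a convention (one <region> child per item),
# then write the code that understands that convention.
XML_DOC = "<regions><region>ca</region><region>us</region><region>mx</region></regions>"


def xml_to_list(doc: str) -> list:
    """Hypothetical mapping: each <region> child becomes one list item."""
    root = ET.fromstring(doc)
    return [child.text for child in root.findall("region")]


from_xml = {"regions": xml_to_list(XML_DOC)}

# Both routes end up with the same data; only one of them needed a mapping.
print(from_json == from_xml)
```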

Add onto all of this, that XML pretty early on started adding layers of confusing and contradictory standards and associated tools--XML Schema, XML Namespaces, XPath, XSLT. And still, none of those things solved the underlying problem. They just provided the framework.

And add to that the fact that XML is much, much more expensive to parse than JSON and YAML...

So I guess I don't get why you are confused. XML addresses a different set of problems than YAML, and it does so in an overly-complicated manner that's both human- and machine-unfriendly.

baq · on Oct 17, 2019

serialization is explicitly not the problem being discussed. JSON also sucks at serialization but much less than YAML or XML both. i don't care that much about serialization.

what i care about is writing yaml for the purpose of configuration and not have any feedback about whether the data i've prepared by hand is actually a valid configuration. not having to have a schema is a bug in the spec for this use case in all those nice acronyms and shortcuts you've listed above and they're all guilty of it. i'd like my configs strongly typed and well documented and none of the above helps developers do that - and that's where my confusion comes from.
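
a minimal sketch of the kind of feedback i mean, in python with only the standard library. `ServerConfig`, its fields and `load_config` are hypothetical names for illustration; `raw` stands for whatever dict your yaml/json/toml parser handed back. it is not a real schema language, just a typed check layered on top of the parsed data.

```python
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class ServerConfig:
    """Hypothetical config type; the field names are illustrative only."""
    host: str
    port: int
    debug: bool = False


def load_config(raw: dict) -> ServerConfig:
    """Reject unknown keys and wrong types instead of silently accepting them."""
    expected_types = get_type_hints(ServerConfig)
    unknown = set(raw) - set(expected_types)
    if unknown:
        # A typo such as "prot" would otherwise become a silent no-op.
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    cfg = ServerConfig(**raw)  # raises TypeError if a required key is missing
    for name, expected in expected_types.items():
        value = getattr(cfg, name)
        if type(value) is not expected:
            raise TypeError(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return cfg
```

with this, `load_config({"host": "localhost", "prot": 8080})` fails loudly on the misspelled key instead of producing a config that quietly ignores it.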

rkangel · on Oct 17, 2019

The fact that they're isomorphic to a machine partly misses the point. YAML is immensely friendlier for a human to write (and read). YAML is used when people need to write declarative instructions to machines, and it does a good job of that. XML is much more of a pain to read and write by hand.

klodolph · on Oct 17, 2019

I used to think YAML was friendly for humans to read. Then I wrote a parser for it, and discovered all the weird corners, edge cases, etc. I now consider it to be a fairly user-hostile format, which should be avoided in favor of just about everything else (XML, JSON, TOML, text protobuf, etc are all more friendly).

For example, consider this map of regions in YAML:

    regions:
      northamerica: [ca, us, mx]
      scandinavia: [dk, no, se, ax, fi, fo, gl, is, sj]

Spot the error!

Writing a parser is also a bit of a nightmare, because there are a bunch of features which can turn a bit dangerous if you’re not careful—things like cyclic graphs or declaring types of objects. These are complete non-issues for the other formats I listed above—they’re all trees, and it’s very unusual for parsers to let you instantiate unintended types with those formats.

bruth · on Oct 17, 2019

> Spot the error!

I know this is rhetorical, but I've been bitten by this enough times, so for those who don't know: `no` will translate to a boolean false.
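
Here is a small Python sketch of the rule that causes it, assuming a parser that follows the YAML 1.1 boolean words. The word list is a subset of that rule and `resolve_plain_scalar` is a made-up helper, not any library's actual resolver.

```python
import re

# Plain (unquoted) scalars matching one of these words resolve to booleans
# under the YAML 1.1 rules. Illustrative subset only, not a parser.
BOOL_RE = re.compile(
    r"yes|Yes|YES|no|No|NO|true|True|TRUE|false|False|FALSE|on|On|ON|off|Off|OFF"
)
TRUTHY = {"yes", "true", "on"}


def resolve_plain_scalar(text: str):
    """Return a bool for a YAML 1.1 boolean word, otherwise the string itself."""
    if BOOL_RE.fullmatch(text):
        return text.lower() in TRUTHY
    return text


scandinavia = ["dk", "no", "se", "ax", "fi", "fo", "gl", "is", "sj"]
resolved = [resolve_plain_scalar(code) for code in scandinavia]

# The second entry is no longer the country code for Norway.
print(resolved)
```

Quoting the value (`'no'` or `"no"`) keeps it a string, because the implicit typing only applies to plain scalars.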

aairey · on Oct 17, 2019

Am I the only one who likes single quotes around literal strings?

minusf · on Oct 17, 2019

yaml is not nice, but just quote every string that is a string and many corner cases go away.

marios · on Oct 17, 2019

Thanks. I was staring at the snippet wondering. I'm not all that familiar with YAML, so I thought perhaps all the values needed to be quoted rather than just written as is.

skywhopper · on Oct 17, 2019

Curious if you wrote parsers for the other languages you claim are easier. YAML has problem areas, particularly around implicit booleans, but languages without any comment syntax (i.e. JSON) cannot be considered human-friendly. And XML is not even the same sort of language as the rest of these.

I understand thinking YAML makes the wrong tradeoffs, but if you think it's less friendly than XML, then you haven't really worked with XML.

klodolph · on Oct 17, 2019

> Curious if you wrote parsers for the other languages you claim are easier.

Yes. YAML was a damn mess compared to the others. You can get a rough estimate of how much by looking at the size of the specs—the XML spec is a fair bit shorter than YAML’s, and if you drop the part about DTDs (which are used less these days) the difference is even bigger. The TOML spec is far, far shorter than either one and the JSON spec makes the TOML spec look big.

I write a lot of parsers. I think it’s fun.

> …but if you think it's less friendly than XML, then you haven't really worked with XML.

If you want to talk about formats, let’s talk about formats. If you make claims that I must be inexperienced because I disagree with you, then it’s just rude.

I have done a few reasonable size projects with heavy XML use. A build system, some work with RPCs, and a web app where I wrote a ton of data for it in XML format, by hand. I also wrote an XML pretty-printer and a YAML pretty-printer. I did a conversion of the XML build system to YAML. I thought it was a bad tradeoff, so I reverted it. Since then I’ve migrated to Bazel. All this experience is a mix of hobby projects and professional.

The bad for XML—it’s more verbose. You have to decide on your own mapping between XML and data. That’s it, as far as I’m concerned.

My personal sense of it is YAML is in a pretty awkward place—it only makes sense for human authoring, not data exchange. My experience with it is that people will naturally want to automatically generate things that they would otherwise have to write by hand. So if you draw a Venn diagram, the YAML use cases are “human authored but not machine generated”.

If we think of using these formats for configuration, then the BIG problem is the sliding scale between pure-data approaches to configuration and using code for configuration. As systems mature and get more complex, the configs often acquire features of programming languages, or parts of the config get rewritten in code. This is where YAML really suffers. XML is a bit easier, either to extend to add these kinds of features or to emit from code.

tannhaeuser · on Oct 17, 2019

XML is just a canonical form and proper subset of SGML always requiring quotes around attribute values, all start- and end-element tags explicitly specified, no short reference (Wiki syntaxes), nor other constructs which can be (unambiguously) omitted in SGML as directed by a DTD grammar. As such, XML is a machine format rather than a format intended for editing by humans, and it's odd to complain about XML being unfriendly to edit when that's what SGML is for.

baq · on Oct 17, 2019

my point is that yaml isn't easier to write at all; it isn't as verbose but it bites you all over the place with unexpected behavior and the fact that validating the schema isn't the same as checking the syntax is super frustrating as you can create a valid yaml file with a typo and it'll be an either invalid or noop configuration. i'd rather have a proper DSL, preferably strongly typed.

klodolph · on Oct 17, 2019

I didn’t understand the XML hate either. It was just a bit annoying to parse, depending on the language and ecosystem you used. It was a little verbose, but so what?

majewsky · on Oct 17, 2019

The biggest problem to me is that XML is not a data serialization language, it's a document markup language. In documents, the distinction between attributes and content makes sense. In data serialization, the choice of whether a given datum is an attribute or a text content appears rather arbitrary. Should I write this?

  <book>
    <title>XML Cookbook</title>
    <author>Jane Doe</author>
  </book>

Or this?

  <book title="XML Cookbook" author="Jane Doe" />

Now attributes don't work when there are multiple values, so I guess I should use attributes for single values and child nodes for lists:

  <book title="XML Cookbook">
    <author>Jane Doe</author>
    <author>Tim Pickens</author>
  </book>

But that rule also has problems. If I decide to include markup in the title, it suddenly needs to be a child node again:

  <book>
    <title>The <strong>Awesome</strong> <abbr>XML</abbr> Cookbook</title>
    <author>Jane Doe</author>
    <author>Tim Pickens</author>
  </book>

Also, "author" is a misleading name for a field that is actually an array, so should I actually use an "authors" node to make that clearer?

  <book>
    <title>XML Cookbook</title>
    <authors>
      <author>Jane Doe</author>
      <author>Tim Pickens</author>
    </authors>
  </book>

Or maybe:

  <book>
    <title>XML Cookbook</title>
    <authors>
      <person name="Jane Doe" />
      <person name="Tim Pickens" />
    </authors>
  </book>

Now compare this to YAML:

  book:
    title: XML cookbook
    authors:
      - name: Jane Doe
      - name: Tim Pickens

Or even just:

  book:
    title: XML cookbook
    authors: [ Jane Doe, Tim Pickens ]

I need to make way fewer design choices when writing that down. In fact, I probably don't need to design anything since that's already the data structure that I've written down as a type somewhere in my code. That's why it's a good idea to use a data serialization language for, well, data serialization.

klodolph · on Oct 17, 2019

Thank you for articulating this, but I’m familiar with these complaints. XML does give you a lot of freedom to format your data in different ways, which can get you into traps. I’ve run into those traps before, like the decision between attributes and child nodes.

This doesn’t add up to XML hate, for me. The way I would probably write the document is:

  <book>
    <title>XML Cookbook</title>
    <author>Jane Doe</author>
    <author>Tim Pickens</author>
  </book>

This is a fairly boring way to write out a document and while you can bikeshed all you want, I don’t see the possible bikeshedding as a major drawback. The above is concise and easy to understand.
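
For what it's worth, reading that layout back takes only a few lines. A sketch with Python's standard library; `parse_book` and the shape of the resulting dict are my own assumptions for illustration, not a standard mapping.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """
<book>
  <title>XML Cookbook</title>
  <author>Jane Doe</author>
  <author>Tim Pickens</author>
</book>
"""


def parse_book(doc: str) -> dict:
    """Map the <book> layout above onto a plain dict (one possible mapping)."""
    root = ET.fromstring(doc)
    return {
        "title": root.findtext("title"),
        # Repeated <author> children become a list, in document order.
        "authors": [author.text for author in root.findall("author")],
    }


book = parse_book(BOOK_XML)
print(book)
```

You do have to write `parse_book` yourself, which is the mapping cost mentioned above, but it is small and there is nothing surprising in it.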

I wouldn’t use YAML as a basis for comparison. YAML has a fair number of oddities and inconsistencies that led me to stay away from it. XML is at least consistent and simple, there are not really any surprises to speak of and there are plenty of tools for modifying XML documents even when you don’t have the schema. For YAML, although there’s a spec, it’s complicated enough that different implementations are inconsistent with each other and there seems to be some inertia at work here.

There’s also the downright bizarre set of regexes that YAML uses to recognize bare strings as other types, which means that '3.3.0' is a string, but '3.3' is a number. If I write 'ni' that’s a string but 'no' is a boolean. I personally find it harder to read or author YAML due to all these rules. You also have to be a bit more careful to sanitize YAML input due to things like the way !! is handled by various libraries, or the way YAML allows object cycles. It gives you too much rope to hang yourself, has too many surprises, and too many footguns. The fact that YAML is a bit more concise just isn’t enough of an advantage.

    # Quiz: What value does this give you when parsed?
    MAC Address: 11:02:03:04:05:06
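
A sketch of why the quiz is a trap, assuming a parser that implements the YAML 1.1 base-60 ("sexagesimal") integer rule. The YAML 1.2 core schema dropped that rule, so results differ between implementations. `resolve_sexagesimal` is a hypothetical helper written for this example, not a library function.

```python
import re

# YAML 1.1 pattern for base-60 integers written as plain scalars.
SEXAGESIMAL_INT = re.compile(r"[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+")


def resolve_sexagesimal(text: str):
    """Return the base-60 integer if text matches the pattern, else the string."""
    if not SEXAGESIMAL_INT.fullmatch(text):
        return text
    sign = -1 if text.startswith("-") else 1
    digits = text.lstrip("+-").replace("_", "")
    value = 0
    for part in digits.split(":"):
        value = value * 60 + int(part)
    return sign * value


# Every group is in the range 0-59, so this "MAC address" is read as a number.
print(resolve_sexagesimal("11:02:03:04:05:06"))  # 8580182706

# A hex letter breaks the pattern, so this one stays a string.
print(resolve_sexagesimal("1a:02:03:04:05:06"))
```

So whether the value comes back as a string or an integer depends on which digits the address happens to contain.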

For data serialization, I would stick to something like Protocol Buffers. You get a text and binary format, a schema, consistency across implementations, and good tooling.

XML is workable in a lot of situations and in some cases the verbosity makes it a bit more self-documenting than e.g. JSON.

TOML would be my choice for config files that I maintain.

jacques_chester · on Oct 17, 2019

I've grown to like Avro, mostly because of its ability to support schema evolution for reader and writer independently. You get the usual niceties around binary wire format, schema, dynamic parsing and/or code generators etc.

tracker1 · on Oct 18, 2019

Thank you... this pretty much sums up most of my disgust regarding XML in general. And while JSON is more universal, YAML is much more accessible for humans.