YAML: It's Time to Move On (nestedtext.org)
265 points by firearm-halter on Nov 14, 2021 | 392 comments

I don't like YAML and would like to move on, but I hope we don't move onto this.

I think it's crazy that when I add a string to an inline list, I may need to convert that inline list to a list because this string needs different handling. I think it's crazy that "convert an inline list to a list" is a coherent statement, but that is the nomenclature that they chose.

I don't like that a truncated document is a complete and valid document.

But what is most unappealing is their whitespace handling. I couldn't even figure out how to encode a string with CR line endings. So, I downloaded their python client to see how it did it. Turns out, they couldn't figure it out either:

    >>> nt.loads(nt.dumps("\r"), top="str")
    '\n'
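To see how this can happen, here is a minimal Python sketch of a line-oriented encoder that loses a bare CR. It is not NestedText's actual code; it only assumes the encoder splits text with `str.splitlines()`, which treats `\r`, `\n` and `\r\n` alike as line boundaries, and rejoins the lines with `\n`. `encode_lines` and `decode_lines` are hypothetical names.

```python
def encode_lines(text: str) -> list[str]:
    # Split on every kind of line boundary, as a line-oriented format must.
    return text.splitlines()


def decode_lines(lines: list[str]) -> str:
    # Rejoin with LF; which boundary character was there originally is already lost.
    return "\n".join(lines)


original = "first\rsecond"
restored = decode_lines(encode_lines(original))
print(repr(restored))  # prints 'first\nsecond'
```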

I wish people would stop trying to write programs for which there are no interpreters, compilers, or linters:

    name: Install dependencies
    run:
        > python -m pip install --upgrade pip
        > pip install pytest
        > if [ -f 'requirements.txt' ]; then pip install -r requirements.txt; fi
That is a program that is hiding in the bowels of a "nestedtext" document ... It is no better than a program that is hiding in the bowels of a JSON or YAML document.

We all have to deal with this, but it is beyond stupid.

    [Install Dependencies]
Then, write `install-script` in whatever language you want ... verify it works. It should have tests. etc etc etc.

I don't think it matters much if this is inline or in a separate file. If you want to test your tests, "yq -r .run input.yaml | sh -e" works as well.

In fact, if I really wanted to test my tests, I'd say that directly testing the corresponding clause is the more comprehensive approach. For example, what if someone accidentally changes the line to read:

? then your test of "install-script" will not catch anything. But if your test runs "yq -r .run | sh -e", then it will catch that error. And you can still forward to a script if you wanted to.
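Here is a sketch of testing the clause directly, in Python rather than as a shell pipeline. It assumes the workflow file has already been parsed into a dict (for example by a YAML library) and that a POSIX `sh` is on the PATH. `run_step` is a hypothetical helper. Feeding the script to `sh -e` on stdin mirrors `yq ... | sh -e`, so a typo in the `run` text makes the test fail.

```python
import subprocess


def run_step(step: dict) -> int:
    """Pipe a step's `run` script into `sh -e` and return its exit status."""
    # With no script argument, sh reads commands from stdin; -e stops at the first failure.
    result = subprocess.run(["sh", "-e"], input=step["run"], text=True)
    return result.returncode


if __name__ == "__main__":
    good = {"name": "Install dependencies", "run": "true\n"}
    # Deliberately misspelled script name, the kind of edit a test of install-script alone misses.
    typo = {"name": "Install dependencies", "run": "./install-scirpt\n"}
    print(run_step(good), run_step(typo))
```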

So let's keep inline scripts; they are a very reasonable approach for just a few commands.

Depending on the source control tool, you may lose syntax highlighting; you will most likely lose linters, and even copying those multi-line commands into a shell becomes cumbersome. I consider the inlining example from GP's comment awful.

V minor point: new yq versions use 'e' instead of '-r'.

It would be nice if YAML wasn't horrendously abused the way it is. You have CI pipelines that let you construct DAGs to represent your builds, but you need several thousand lines of YAML and a load of custom parsing to get programming constructs in the string types, for example. And then each provider has its own way of providing those.

I don't have to re-read manuals describing how to do if/else in Ruby or Java or Lisp, but as soon as yaml and some 'devops' tooling is involved, I have to constantly jump back and forth between the reference and my config.

The main point being that the problem isn't the file format but the products that continue to push it, presumably because hacking stuff on top of `YAML.parse` is less effort than designing something that fits the purpose.

Yeah. A lot of times I find myself thinking YAML is like a really awful programming language. You can sort of do conditional logic and loops, but usually I find it hard to follow what's going on.

For build systems, I always liked the idea of Gradle where the core functionality was simple and declarative, but with the option to use a real programming language for things that weren't simple. For example, integrating installers or form builders (pre-processing) into a build are things I would consider non-trivial if there aren't official plugins, but it was still relatively easy to do with Gradle.

The biggest problem I always had with Gradle was that I didn't like Groovy, and I always thought there was a missed opportunity to have a statically typed build system with a solid API/contract and all the fancy tooling like auto-complete that you get with statically typed languages.

I see JSON5 mentioned a lot in the comments. In terms of CI / build systems, I feel like something built with JSON5/TypeScript could be really good. I'd be really happy using TypeScript for configuring things like build systems where there shouldn't really be an argument for needing it to be usable by non-programmers.

Personally I feel like I've spent way too much of my life debugging YAML syntax issues.

If you're happy to go lispy, there's Babashka [1], a Clojure without the JVM. It has built-in support for 'tasks' designed to make writing build scripts easy.

[1] https://babashka.org/

Does the Kotlin support in Gradle solve your problem? https://docs.gradle.org/current/userguide/kotlin_dsl.html

My experience with Kotlin gradle scripts is worse than Groovy. For example, given the following valid groovy/kotlin gradle program:

    dependencies {
    }
What would you expect to see between the curly braces? IntelliJ IDEA, which supposedly has full support for the Gradle DSL in both Groovy and Kotlin, offers only generic suggestions. Common function calls such as "implementation()" or "testImplementation()" are not suggested. If you do use those functions, no suggestion is made for their parameters. Because Gradle's DSL is built on top of a general-purpose language, it loses the benefits of a DSL (constraining the set of possible configurations and guiding the user towards valid configurations).

The key benefit of the Kotlin DSL is that in this precise example, IDEA does suggest valid stuff: https://imgur.com/a/vFYNIU1

Kotlin DSL is miles ahead of Groovy in terms of discoverability and IDEA integration. With Groovy DSL, most of the build script is highlighted with various degrees of errors and warnings; with Kotlin DSL, if something is highlighted, it is a legitimate error, and vice versa - if no errors are detected by IDEA, then it is almost certain to work.

There were rough spots in the IDEA integration a couple of years ago, but now it is close to perfect, within Gradle's limits of course (due to its sheer dynamic nature, some things are just not possible to express statically, unfortunately). The biggest obstacle to Kotlin DSL use might be that some plugins use various Groovy-specific features which are hard to use from Kotlin, but thankfully most plugins either fix those or are rewritten in Java or Kotlin instead.

This was one thing I found difficult learning Gradle: its seeming complete lack of autodiscoverability.

I expected it to catch on, but I think a lot of people are sticking with Maven.

There's a huge gap in Java build tool space for a tool that is simple and easy to learn and can cover 90% of projects' requirements. I have this feeling that we're in the "subversion" days of java build tools and the day someone introduces "git" people will wonder why we suffered with Gradle and Maven for so long. If I had time I would be looking into building this.

Predating Gradle was a tool called Gant. It was simple, intuitive and did 90% of what every project could want. Ironically, it was Groovy-based as well. But instead of Gradle's arcane, magic-based configuration, it was literal and direct, a simple extension of the Ant that came before it. I liked it much better, but someone decided they could make a business out of Gradle, Gant got deprecated, and here we are.

I found it fairly simple to build Gradle plugins with Kotlin. If anything, the problem was just having the patience to actually find the right documentation in the first place, and understand what was being described. The main problem I faced there was that I wanted a plugin to configure dependencies for the project it would run against and the docs around dealing with dependencies and detached configurations were a bit confusing.

I do find it curious that a lot of these tools get seen as basic task runners despite offering much more potential.

It's always the same trajectory with declarative programming. It starts with "it's just configuration, we need something simple". Then users come with use cases which are more complex. Then you have a programming language on top of configuration language syntax.

* https://ilya-sher.org/2018/06/30/terraform-becomes-a-program...

* https://ilya-sher.org/2018/09/15/aws-cloudformation-became-a...

Very much so. A good few years ago I got annoyed that I couldn't change mutt's configuration the way I wanted, because it has a built-in configuration language which doesn't allow complicated conditionals etc.

(There are workarounds, and off-hand I can't think of a great example, but bear with me.)

In the end I wrote a simple console-based mail-client, which used a Lua configuration file. That would generally default to using hashes, and key=value settings, but over the time I used it things got really quite configurable via user-defined callbacks, and functions to return various settings.

For example I wrote a hook called `on_reply_to`, and if you defined that function in your configuration file it would be invoked when you triggered the Reply function. This kind of flexibility was very self-consistent, and easy to add using an embedded real language.
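The same idea translated into Python, as a rough sketch: the config file is ordinary code, and the client calls a hook only if the user defined one. `load_config`, `build_reply` and the `on_reply_to(message, reply)` signature are hypothetical stand-ins for the Lua original, and `exec` is only appropriate here because the config file is trusted, just like the Lua file.

```python
def load_config(source: str) -> dict:
    """Execute a trusted config file and return its top-level names."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace


def build_reply(config: dict, message: dict) -> dict:
    reply = {"To": message["From"], "Subject": "Re: " + message["Subject"]}
    hook = config.get("on_reply_to")
    if callable(hook):
        # The user's hook may adjust the draft reply in place.
        hook(message, reply)
    return reply


CONFIG = '''
def on_reply_to(message, reply):
    if message["From"].endswith("@work.example"):
        reply["Cc"] = "team@work.example"
'''

config = load_config(CONFIG)
reply = build_reply(config, {"From": "boss@work.example", "Subject": "Status"})
print(reply)
```

If the config file never defines `on_reply_to`, replies fall back to the built-in behaviour, so plain key=value style settings keep working.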

Later I added some hacks to a local fork of GNU Screen, there I just said:

* If the ~/.screenrc file is executable, then execute it, and parse the output.

That let me say "If hostname == foo; do this ; otherwise do this .." and get conditionals and some other things easily. Another example was unbinding all keys, and then only allowing some actions to be bound. (I later submitted "unbindall" upstream, to remove the need for that.)
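A minimal sketch of that "executable config" rule in Python. `read_rc` is a hypothetical helper, not GNU Screen code: if the file has an execute bit, it is run and its stdout is treated as the configuration; otherwise the file is read as plain text.

```python
import os
import subprocess


def read_rc(path: str) -> str:
    """Return config text: run the file if it is executable, otherwise read it."""
    if os.access(path, os.X_OK):
        # Executable config: whatever the program prints is the configuration.
        result = subprocess.run([path], capture_output=True, text=True, check=True)
        return result.stdout
    with open(path, encoding="utf-8") as f:
        return f.read()
```

An executable rc file could then be a small `#!/bin/sh` script that tests `"$(hostname)"` and echoes a different set of settings per host, which is where the conditionals come from.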

Some even start with a programming language and pretend it's declarative...

IMO, it's only declarative when it's a data model which is easily parsed by multiple languages/systems where it's needed.

What's really sad is that XML had a much better ecosystem around this for ages. I'd very much rather deal with XQuery or even XSLT to construct build trees, than the current crop of ad-hoc YAML preprocessors. At least the XML stuff had a consistent type system underneath!

XSLT is an absolute horror and not something I would want to deal with again. It feels like some weird academic experiment in an XML declarative programming language that should never have made it to print.

If something needs the flexibility of a programming language, why not use a real one that's been well tested for writing other programs? These various config file programming systems always end up creating something notorious that everyone tries to avoid having to work on.

XQuery is, in many ways, XSLT with better syntax. It doesn't have the pattern-matching transforms that are the T in XSLT - but for configs, I don't think it makes a big difference.

Also, I don't think many realize that the stack has evolved since early 00s. XSLT 1.0 was a very limiting language, requiring extensions for many advanced scenarios. But there's XSLT v3.0 these days, and XPath & XQuery v3.1, with some major new features - e.g. maps and lambdas. Granted, this doesn't fix the most basic complaint about XSLT - its insanely verbose syntax - but even then, I'd still take XSLT over ad-hoc YAML-based loops and conditionals.

I will take the verbosity of XML any day over YAML wrestling (with complex YAML configs, of course). There are simply too many "implicit rules" in YAML; it's why I prefer Python over Ruby and Perl. Generally, though, TOML has been good enough for me for lots of fairly large config files that are easy for humans and machines to parse.

XML died because too many configurations turned what should have been a 'prop' into an inner tag, and it doesn't help that XML gives no real guidance on when to use which. And, of course, when you deserialize XML, the innerText always ends up in an awkward place, so it's never clear what the right way to handle it is.

Honestly, I think using an embedded scripting language, like lua or even javascript, would be a much better fit for these use cases than trying to make yaml do something it wasn't designed for.

Ironically, having used cdk8s[1] for dealing with kubernetes infrastructure, that's the one thing where I've actually preferred yaml. That said, k8s resource definitions are pure config so there's no need to try and hack extra bits on top of a serialized data structure.


I really like the approach of Buildkite CI: they use YAML, but this YAML can be produced by an executable script.

So you write YAML by hand for trivial cases, but once it gets complex, you can just drop back to shell/Python/Ruby/Node/whatever, implement any complex logic, and serialize the results to plain YAML.
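A sketch of such a generator in Python. The `label`/`command` step keys are modeled loosely on Buildkite's pipeline format and should be checked against its documentation; the commands themselves are placeholders. Emitting JSON is a shortcut that assumes the consumer accepts YAML 1.2, since JSON is, for practical purposes, valid YAML 1.2.

```python
import json

# Versions kept as strings so "3.10" never turns into the float 3.1.
PYTHON_VERSIONS = ["3.8", "3.9", "3.10"]


def build_pipeline(run_slow_tests: bool) -> dict:
    # Ordinary loops and conditionals instead of templating inside YAML strings.
    steps = [
        {"label": f"test py{v}", "command": f"tox -e py{v.replace('.', '')}"}
        for v in PYTHON_VERSIONS
    ]
    if run_slow_tests:
        steps.append({"label": "slow tests", "command": "pytest -m slow"})
    return {"steps": steps}


if __name__ == "__main__":
    print(json.dumps(build_pipeline(run_slow_tests=False), indent=2))
```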

This is the crux of the issue right? YAML is fine for most projects, but THE project for using YAML is CI configuration, for which it doesn’t match.

I'm pretty sure the format is the issue.

I still don't know how to do arrays in yaml.

Is it new line, tab to same indent or same indent + one space, do I need a dash? Does a dash make it an array or an object?

It's just simply not that big of a deal to add a few quotes and braces to make everything make sense.

The only real issues with JSON are the lack of comments and the strictness about extra commas.

> I don't like that a truncated document is a complete and valid document.

Me either. If your documents have this property you're likely to tempt people to start trying to process partial documents.

When they do that, they violate Full Recognition Before Processing and likely there's a latent security bug as a result.
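A small Python illustration of why that matters, using a hypothetical line-based `key: value` format with no closing delimiter (not NestedText itself). Cutting the document off mid-line still yields a "valid" result with a wrong value, whereas truncated JSON is rejected because the closing brace never arrives, so the whole input is recognized (or refused) before anything acts on it.

```python
import json


def parse_kv(text: str) -> dict:
    """Parse a hypothetical line-based `key: value` format with no end marker."""
    result = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(":")
            result[key.strip()] = value.strip()
    return result


kv_doc = "name: app\nport: 8080\ndebug: no\n"
json_doc = '{"name": "app", "port": 8080, "debug": false}'

# Simulate a transfer that stops after 18 characters.
print(parse_kv(kv_doc[:18]))  # {'name': 'app', 'port': '80'}: accepted, silently wrong
try:
    json.loads(json_doc[:18])
except json.JSONDecodeError as err:
    print("truncated JSON rejected:", err.msg)
```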

> So, I downloaded their python client to see how it did it.

Who are you suggesting the python client belongs to, who is 'they' in 'their'?

... the people that made it and wrote the submission we are discussing? how is this even a question?

Author seems to use misfeatures of a particular implementation to tar all implementations with. The round-tripping issue is not a statement about YAML as a markup language, much in the way a rendering bug in Firefox is not a statement about the web.

Stepping back a bit, YAML is good enough, and this problem has been incrementally bikeshedded since at least the 1970s, it is time to move on. Human-convenient interfaces (like YAML, bash, perl) are fundamentally messy because we are messy. They're prone to opinion and style, as if replacing some part or other will make the high level problem (that's us) go away. Fretting over perfection in UI is an utterly pointless waste of time.

I don't know what NestedText is and find it very difficult to care; there are far more important problems in life to be concerned with than yet another incremental retake on serialization. I find it hard to consider contributions like this to be helpful or represent progress in any way.

I actually disagree it's bike shedding.

If you can write a bad YAML document because of those mis-features/edge cases, I'd say you've already lost.

Humans are messy, but at the end of the day the data has to go to a program, so a concise and super simple interface has a lot of power to it for humans.

Working at a typical software company with average skill level engineers (including myself), no one likes writing YAML. But everyone is fine with JSON.

I think it's a case of conceptual purity vs what an average engineer would actually want to use. And JSON wins that. If YAML was really better than JSON, we'd all be using that right now.

So does it really matter if YAML is superior if >80% of engineers pick JSON instead?

I would argue that you can write something poor and/or confusing in any markup language that is sufficiently powerful.

Conversely, if a markup language is strict enough to prevent every inconsistency, then it's not powerful enough or too cumbersome to use to be generally useful.

I'd say that YAML is anything but conceptually pure, with all the arbitrariness, the multitude of formatting options, and the parsing magic happening without warning.

If you want conceptual purity (and far fewer footguns), take Dhall.

> Stepping back a bit, YAML is good enough, and this problem has been incrementally bikeshedded since at least the 1970s, it is time to move on

Nah, in the 1970s we had Lisp S-expressions that completely solved the problem, and everything since then has been regressions on S-expressions due to parenthesis phobia.

After hearing that thing about the country code for Norway, I became convinced that YAML has to just die. Become an ex-markup language. Pine for the fjords. Be a syntax that wouldn't VOOM if you put 4 million volts through it. Join the choir invisible, etc.
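For context, a simplified Python sketch of the implicit typing behind the Norway problem. It mimics how a YAML 1.1 loader resolves unquoted scalars: the YAML 1.1 boolean type accepts spellings such as `NO`, `no` and `off`, and a plain `3.10` is read as a float. The float pattern here is deliberately much cruder than the spec's, and `resolve_plain_scalar` is a hypothetical name, not a library function. YAML 1.2's core schema narrows booleans to variants of `true`/`false`, but some widely used loaders, such as PyYAML, still implement YAML 1.1.

```python
import re

# Boolean spellings from the YAML 1.1 bool type.
YAML11_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO|true|True|TRUE|false|False|FALSE|on|On|ON|off|Off|OFF"
)
TRUE_WORDS = {"y", "yes", "true", "on"}
SIMPLE_FLOAT = re.compile(r"[-+]?[0-9]+\.[0-9]+")  # simplified stand-in for the real float rule


def resolve_plain_scalar(text: str):
    """Guess the type of an unquoted scalar roughly the way a YAML 1.1 loader would."""
    if YAML11_BOOL.fullmatch(text):
        return text.lower() in TRUE_WORDS
    if SIMPLE_FLOAT.fullmatch(text):
        return float(text)
    return text


countries = [resolve_plain_scalar(c) for c in ["GB", "NO", "SE"]]
versions = [resolve_plain_scalar(v) for v in ["3.9", "3.10"]]
print(countries, versions)  # ['GB', False, 'SE'] [3.9, 3.1]
```

Quoting the scalars (`"NO"`, `"3.10"`) sidesteps both problems, which is exactly the kind of rule people forget.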

This is good: https://noyaml.com/

Erik Naggum had a notoriously NSFW rant about XML (over the top even for him) that I better not link to here, but lots of it applies to YAML as well.

S-expressions don't solve the problem at all, you just get to fractally bikeshed all over again about what semantics they have and what transformations are or aren't equivalent. Does whitespace roundtrip through S-expressions? Who knows. Are numbers in S-expressions rounded to double precision on read/write? Umm, maybe. How do I escape a ) in one of my values? Hoo boy, pick any escape character you like and there's an implementation that does it.

EDN solves all these problems: https://github.com/edn-format/edn

I have to second that. Including the variant "canonical s-expressions" which is in fact a binary format.

Why of course, embedding a full-blown Lisp development environment for parsing a config file is totally sane and normal.

(Sarcasm, just in case.)

S-expressions don’t completely solve the problem: they don’t have a syntax for maps, and in practice there are at least two common incompatible conventions: alist or plist?

Obviously the application has to interpret the Lisp object resulting from reading the S-expression, just like it has to interpret any JSON, YAML, or anything else that it reads. So for maps you can, as you mention, use alists or plists. Regarding other stuff mentioned: none of the encodings are supposed to be bijective (the writer emits the exact input that the reader ingested). Otherwise, for example, they couldn't have comments, unless those ended up in the data somehow. There is ASN.1 DER if you want that, but ASN.1 is generally disastrous.

Stuff like escape chars were well specified in Lisps of the 1970s (at least the late 1970s), including in Scheme (1975). Floating point conversion is a different matter (it was even messier in the pre-IEEE 754 era than now) but I think the alternatives don't handle it well either. You probably have to use hexadecimal representation for binary floats. Maybe decimal floats will become more widely supported on future hardware.

A type-checked approach can be seen in XMonad, whose config files use Haskell's Read typeclass for the equivalent of typed S-expressions.

EDN has maps, sets, vectors and lists and is extendable.

Solutions for this problem that I've used in my own S-expression config files:

1. Use only alists for maps because they prevent off-by-one errors.

2. Allow plists because they're less verbose than alists and use reader macros to distinguish them, and allow the reader macro definitions to be in the same file.

Most of the time I use option 1 because it's simpler.
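To make the trade-off concrete, a Python sketch that turns S-expression maps into dicts after they have been parsed into nested lists (alist entries are shown as two-element lists rather than dotted pairs). The function names are hypothetical. With an alist, a missing value breaks only its own entry; with a plist, a dropped element shifts every later key/value pair, and the cheapest check a reader can do is reject an odd length.

```python
def alist_to_dict(alist: list) -> dict:
    """((name "web") (port 8080)) -> {'name': 'web', 'port': 8080}"""
    result = {}
    for entry in alist:
        # Each entry carries its own key and value, so a malformed one is caught locally.
        if len(entry) != 2:
            raise ValueError(f"malformed alist entry: {entry!r}")
        key, value = entry
        result[key] = value
    return result


def plist_to_dict(plist: list) -> dict:
    """(:name "web" :port 8080) -> {':name': 'web', ':port': 8080}"""
    if len(plist) % 2 != 0:
        raise ValueError("plist has an odd number of elements")
    # Keys sit at even positions and values at odd ones; one missing element shifts the rest.
    return dict(zip(plist[0::2], plist[1::2]))


print(alist_to_dict([["name", "web"], ["port", 8080]]))
print(plist_to_dict([":name", "web", ":port", 8080]))
```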

I would argue that, in a data markup language, there shouldn't be a syntax for maps. Whether a given sequence should be treated as key-value pairs, and whether keys in that sequence are ordered or unordered, is something that is best defined by the schema, just like all other value types.

It'd be bikeshedding if the status quo was good. But it isn't.

> Author seems to use misfeatures of a particular implementation to tar all implementations with.

There's no canonical YAML implementation, and the YAML spec is enormous (doubly so if you need to work with stuff like non-quoted strings, etc.).

> There's no canonical YAML implementation

The formal grammar counts as canonical and several implementations are derived from it: https://github.com/yaml/yaml-reference-parser

If you use YAML in situations where it may need hand editing, it means you actively hate your users.

YAML is patently unsuitable for any use case where the resulting output may require hand editing.

> YAML as a markup language

YAML ain't markup language.

>Human-convenient interfaces (like YAML, bash, perl) are fundamentally messy because we are messy

I don't know what to make of this statement; it has so much handwaving built in. The most charitable interpretation I can find is that by 'Human-convenient' you simply meant the quick-and-dirty ideology expressed in Worse Is Better: it does the job, makes users contemplate suicide only once per month, and isn't too boat-rocking for current infrastructure and tooling.

Taken at face value (without special charitable parsing), this statement is trivially false. Python is often used as a paragon of 'Human-convenience'; I sometimes find this trope tiring, but whatever Python's merits and vices, it's _definitely_ NOT messy in design.

Perl is the C++ of scripting languages, it's a very [badly|un] designed language widely mocked by both language designers and users. Lua and tcl instead are languages literally created for the sole exact purpose of (non-) programmers expressing configuration inside of a fixed kernel of code created by other programmers, and look at their design : the whole of tcl's syntax and semantics is a single human-readable sentence, while lua thought it would be funny if 70% of the language involved dictionaries for some reason. These are extremely elegant and minimal designs, and they are brutally efficient and successful at their niches : tcl is EDA's and Network Administration's darling, and lua is used by game artists utterly uninterested in programming to express level design.

'Humans are messy' isn't a satisfactory way to put it. 'Humans love simple rules that get the job done' is more like it. But because the world is very complex and exception-laden, though, simple rules don't hug its contours well. There are two responses to this:

- you can declare it a free-for-all and just have people make up simple rules on the fly as situations come up, that's the Worse Is Better approach. It doesn't work for long because very soon the sheer mountain of simple rules interact and create lovecraftian horrors more complex than anything the world would have thrown at you. Remember that the world itself is animated by extremely simple rules (Maxwell's equations, Evolution by Natural Selection, etc...), it's the multitude and interaction of those simple rules that give it its gargantuan complexity and variety.

- you stop and think about The One Simple Rule To Rule All Rules, a kernel of order that can be extended and added to gradually, consistently and beautifully.

The first approach can be called the 'raster ideology': it's a way of approximating reality by dividing it into a huge number of small, simple 'pixels' and describing each one separately by simple rules. I'm not sure it's 'easy' or 'convenient'; maybe seductive. It promises you can always come up with more rules to describe new patterns and situations, and never ever throw away the old rules. This doesn't work if your problem is the sheer multitude and inconsistency of rules. The second approach is the 'vector ideology': it promises you that there is a small basis of simple rules that will describe your pattern in its entirety, and that can always be tweaked or added to (consistently!) when new patterns arise; the only catch is that you have to think hard about it first.

>and lua is used by game artists utterly uninterested in programming to express level design

Rather short-sighted and dismissive of a successful programming language that's evolved over 20+ years. Lua is a great general-purpose programming language that specializes not in "game making for non-programmers" but in ease of embedding, extensibility, and data description (like a config language). There's a whole section in Programming in Lua[1] to that effect. The fact that it's frequently used in games is a credit to its speed, size and great C API for embedding, not to any particular catering to game designers.

[1]: https://www.lua.org/pil/10.1.html

You misunderstood me. I love Lua and I wasn't being dismissive of it; I was using the first example that came to my mind to counter the claim that a convenient language has to be messy. Just because that was the example I used doesn't mean I'm implying an implicit "and that's the only thing it's good for" clause: if someone said "Python is used by scientists utterly uninterested in programming to express numerical algorithms", would you understand that as a dismissive remark against Python?

Being used by non-programmers utterly uninterested in programming to solve problems is the highest honor any programming language can ever attain, because it means that the language is well-suited to the domain enough (or flexible enough to be made so) that describing problems in it is no different than writing thoughts or design documents in natural language. This is the single most flattering thing you can ever say about a language, not a dismissive remark.

It's really sad to see the pervasiveness of JSON. For one thing its usage as a config file is disturbing. Config files need to have comments. Second, even as a data transfer format the lack of schema is even more disturbing. I really wish JSON didn't happen and now these malpractices are so widespread that it's hurting everyone.

JSONC. JSON with comments. And even if your favorite parser does not support it natively it’s not so hard to add with a very simple pre-lexer step.
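A minimal sketch of such a pre-lexer in Python, assuming only `//` and `/* */` comments (trailing commas are not handled). The one subtle part is tracking string literals and their escapes, so that a `//` inside a URL is left alone. `strip_jsonc_comments` and `loads_jsonc` are hypothetical names.

```python
import json


def strip_jsonc_comments(text: str) -> str:
    """Remove // and /* */ comments that appear outside JSON string literals."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character too, so \" does not end the string.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line but keep the newline itself.
            end = text.find("\n", i)
            i = n if end == -1 else end
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated block comment")
            out.append(" ")  # keep the tokens on either side separated
            i = end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def loads_jsonc(text: str):
    return json.loads(strip_jsonc_comments(text))


doc = """{
  // which tool this config belongs to
  "name": "my-tool", /* trailing note */
  "homepage": "https://example.com//docs"
}"""
print(loads_jsonc(doc))
```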

JSON schemas exist and they’re ok for relatively simple things. For more complex cases I find myself wishing I could just turn Typescript into some kind of schema validation for JSON.

> For more complex cases I find myself wishing I could just turn Typescript into some kind of schema validation for JSON.

Not sure if this is what you're looking for, and whether it's powerful and expressive enough for your use case, but you can use typescript-json-schema¹ for this, and validate with eg ajv.


I like JSON5 for similar reasons. I specifically like the addition of comments, trailing commas, and keys without quotes.

I've struggled with this in Java recently and at first I used Jankson which supports the complete JSON5 spec, but later we figured out we could configure the standard Jackson JSON package to accept the things we actually need and actually use.

Also needed is string concatenation. One line strings are very limiting.

There's libraries that let you define a schema programmatically, and then infer the types.


Seems to me that YAML just needs type/schema support to be less of a hurdle.

As an alternative, round-tripping through the protobuf text format (encode, then decode) seems reasonable to me: it catches the footgun of floating-point version numbers (they become a parse error), makes whitespace/multiline concatenation more obvious, and, unlike JSON, allows comments:

  ( cat << EOF
  # yes, comments are allowed
  name: "Python package"
  on: "push"
  build {
    python_version: ["3.6", "3.7", "3.8", "3.9", "3.10"]
    steps: [
      {
        name: "Install dependencies"
        run:
            "python -m pip install --upgrade pip\n"
            "pip install pytest\n"
            "if [ -f 'requirements.txt' ]; then pip install -r requirements.txt; fi\n"
      },
      {
        name: "Test with pytest"
        run: "pytest\n"
      }
    ]
  }
  EOF
  ) | protoc --encode=Config config.proto | protoc --decode=Config config.proto
  name: "Python package"
  on: "push"
  build {
    python_version: "3.6"
    python_version: "3.7"
    python_version: "3.8"
    python_version: "3.9"
    python_version: "3.10"
    steps {
      name: "Install dependencies"
      run: "python -m pip install --upgrade pip\npip install pytest\nif [ -f \'requirements.txt\' ]; then pip install -r requirements.txt; fi\n"
    }
    steps {
      name: "Test with pytest"
      run: "pytest\n"
    }
  }
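The "parse error instead of a silent float" property comes from the schema, not from the text syntax. Here is a rough Python analogue, assuming a hypothetical `Build` dataclass whose `python_version` entries must be strings, as a proto `string` field would require. A list of floats, which is what a YAML 1.1 loader would produce for unquoted `[3.9, 3.10]`, is rejected outright instead of being coerced.

```python
from dataclasses import dataclass, field


@dataclass
class Build:
    """Hypothetical typed config: each python_version must be a string."""

    python_version: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject anything that is not already a string instead of coercing it.
        for version in self.python_version:
            if not isinstance(version, str):
                raise TypeError(f"python_version entries must be strings, got {version!r}")


ok = Build(python_version=["3.9", "3.10"])
try:
    Build(python_version=[3.9, 3.10])  # the float literal 3.10 is just 3.1
except TypeError as err:
    print(err)
```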

> Seems to me that YAML just needs type/schema support to be less of a hurdle.

JSON schemas exist and can be applied to yaml and this is supported by many editors. For example this vscode extension: https://marketplace.visualstudio.com/items?itemName=redhat.v...

It's strange to see so many complaints about "missing tooling" that actually exists and is well supported.

> Seems to me that YAML just needs type/schema support to be less of a hurdle.

Unfortunately YAML already got type support, which made it easier to roundtrip, but also insecure. Creating a type calls constructors with possible insecure side effects. Which was eg used to hack Movable Type.

JSON Schema is an official thing that exists and has implementations in all major languages. Personally I’m very glad that it’s an opt-in addition rather than a requirement.

(I agree with you about comments though)

For comments just use JSONC.

I agree, but I would recommend JSON5 as the solution. Not YAML or this abomination.

JSON5 has many advantages:

* Superset of JSON without being wildly different. I know YAML is a superset of JSON but it's completely different too. Insane.

* Unambiguous grammar. YAML has way too many big structure decisions that are made by unclear and minor formatting differences. My work's YAML data is full of single-element lists that shouldn't be lists for example.

* Comments, trailing commas

* It's a subset of Javascript so basically nothing new to learn.

* It has an unambiguous extension (.json5). I think JSONC would be a reasonable option but everyone uses the same extension as JSON (.json) so you can never be sure which you are using. E.g. `tsconfig.json` is JSONC but `package.json` is just JSON (to everyone's annoyance).

* Doesn't add too much of Javascript. I wouldn't recommend JSON6 because it's just making the format too complicated for little benefit.

I would rather recommend jsonc:

- it has good editor support (VsCode)
- it supports comments
- it supports JSON Schema

The only thing missing is trailing commas, but I would rather live without trailing commas than without tooling support.

JSONC supports trailing commas.

> - it has good editor support (VsCode)

Unfortunately it doesn't really because of the extension issue I mentioned. Certain file names (like `tsconfig.json`) are whitelisted to have JSONC support, but any random file `foo.json` will be treated as JSON and give you annoying lints if you put comments and trailing commas in.

That's a fairly recent change I think.

Tools that use JSON as configuration format could simply allow certain unused keys (e.g. all keys starting with #) and promise never to use them. Then author can write their comments with something like:

      "name": "my-tool",
      "#comment-1": "Don’t change the version!",
      "version": "42.1337.0"

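On the reading side, Python's `json` module can drop such keys while loading via `object_pairs_hook`, which is called for every JSON object with its key/value pairs in file order. The `#` prefix is the convention proposed above, not a standard, and `drop_comment_keys` is a hypothetical name.

```python
import json


def drop_comment_keys(pairs: list) -> dict:
    # json calls this for every object (nested ones too), passing its (key, value) pairs.
    return {key: value for key, value in pairs if not key.startswith("#")}


text = """{
  "name": "my-tool",
  "#comment-1": "Don't change the version!",
  "version": "42.1337.0"
}"""

config = json.loads(text, object_pairs_hook=drop_comment_keys)
print(config)
```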
There's a lot of JSON tooling, and it's liable to interact badly with this. For example, a formatter might re-order the fields of a dict, moving "#comment-1" away from "version". Or the software that this JSON is for might error upon receiving unexpected keys (which is actually useful behavior, as that would catch a typo in an optional field).

Also, this doesn't let you put comments at the top of the file, or before a list item, or at the end of a line.

If you're going to change your JSON tooling to handle comments of some kind, you might as well go all the way to JSONC.

I've heard and read this multiple times. Why are you trying so hard to fit into a format that doesn't support comments out of the box? What advantages is JSON offering you that you feel compelled to bend over backwards like this? It's exactly these kinds of workarounds that make it so difficult to stop such malpractice. It's just plain ugly. Please stop doing this.

In many cases, you're using a library or service that you don't maintain, so you don't have much of a choice.

You can't comment out a large section of config easily. For me, this is a relatively common use case for config files, so I take the position that JSON should be used for serialization only.

And I am just writing a JSON de/serializer to move my config from the current system to JSON. I worked on it today and yesterday and several days some time ago.

This situation makes me feel rather silly.

So you prefer the "good old" XML days? I'll take comment-less JSON over XML any day

(and it doesn't have to be comment-less... JSON with comments is a thing and VSCode has syntax highlighting for it - just strip out the comments before parsing).

> So you prefer the "good old" XML days? I'll take comment-less JSON over XML any day

Aren't we past basic false dichotomies?

Nope: basic false dichotomies and JSON are both pervasive.

There is a corporate- and government-approved standard for false dichotomies, but it works as a de-facto standard, not published.

    - foo
       'some bar'

XML is perfect. With all the fancy editors now, it's very easy to write. Easy schema checking, comments. Perfect.

Disclaimer: this is not a defense for YAML, I'm just trying to remove the rose tinted glasses some people view XML configs through.

As someone who has used XML configs they have a few problems:

- technical: JSON's missing comments are mentioned multiple times here, so I will mention that while XML has comments, they cannot be nested.

- socially: for some reason (maybe because XML is structured enough that this doesn't immediately collapse?) XML tends to just grow and grow. People start programming in XML too, and not only using XSLT or other standard approaches but also in completely proprietary ways.

At one project someone even wrote an authorization framework in Apache Tiles which allowed one to create roles using somewhere between 600 and 5000 lines of XML per role. The benefit, of course, was that you could update the roles without touching the Java code.

(In case it isn't immediately obvious: it would have been far simpler to do it in Java, and people who know enough Java to fix it are available at the right price, whereas the XML system had to be learned on the job.)

Personally I just want it to be kept simple:

- a settings.local.ini and default settings in settings.ini or something to that effect

- if necessary, just use a code file: config.ts works just as well, or config.js if it needs to be adjustable at runtime without transpilation.
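For the settings.ini / settings.local.ini split, Python's standard `configparser` already does the layering: `ConfigParser.read()` takes a list of paths, processes them in order, and silently skips files that don't exist, so later files override earlier ones. A minimal sketch (the section and key names are made up for illustration):

```python
import configparser


def load_settings(paths=("settings.ini", "settings.local.ini")):
    # read() processes the files in order and skips missing ones, so values in
    # settings.local.ini (if it exists) override those in settings.ini.
    config = configparser.ConfigParser()
    config.read(paths, encoding="utf-8")
    return config


def load_settings_from_strings(*texts):
    # Same layering from in-memory strings; handy for tests.
    config = configparser.ConfigParser()
    for text in texts:
        config.read_string(text)
    return config


DEFAULTS = """
[database]
host = localhost
port = 5432
"""

LOCAL = """
[database]
port = 6543
"""

cfg = load_settings_from_strings(DEFAULTS, LOCAL)
print(cfg["database"]["host"], cfg.getint("database", "port"))  # localhost 6543
```

Keeping settings.local.ini out of version control is the usual companion convention.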

Not easy to read; it's the Java of config: pages of code that express very little. By the time you find what you need, you've forgotten the context and what level of nesting you're on. It's also more wasteful as a transport.

> It's also more wasteful as a transport.

This is most certainly true, however with GZip thrown into the mix, it's not quite as bad as one might imagine: https://www.codeproject.com/articles/604720/json-vs-xml-some...

It compresses pretty decently and doesn't have too much overhead; in the example it was around 10% larger than JSON when compressed.
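The compression point is easy to check against your own payloads. This sketch serializes the same made-up records as JSON and as a naive element-per-field XML document, then reports raw and gzip-compressed sizes. The exact numbers depend entirely on the data and the XML shape, so don't expect it to reproduce the linked article's figure:

```python
import gzip
import json
import xml.etree.ElementTree as ET


def build_records(n):
    # Made-up sample data; real payloads will compress differently.
    return [{"id": i, "name": f"user{i}", "active": i % 2 == 0} for i in range(n)]


def to_json(records):
    return json.dumps(records).encode("utf-8")


def to_xml(records):
    # One element per record and one child element per field.
    root = ET.Element("users")
    for record in records:
        user = ET.SubElement(root, "user")
        for key, value in record.items():
            text = str(value).lower() if isinstance(value, bool) else str(value)
            ET.SubElement(user, key).text = text
    return ET.tostring(root, encoding="utf-8")


def sizes(records):
    j, x = to_json(records), to_xml(records)
    return {
        "json": len(j),
        "xml": len(x),
        "json_gz": len(gzip.compress(j)),
        "xml_gz": len(gzip.compress(x)),
    }


result = sizes(build_records(1000))
print(result)
print(f"XML/JSON raw: {result['xml'] / result['json']:.2f}, "
      f"gzipped: {result['xml_gz'] / result['json_gz']:.2f}")
```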

I'd argue that if one were to swap out JSON for XML within all the requests that an average webpage needs, for some unholy reason, the overall increase in page size would be much less than that, because so much of a modern site's weight is images, as well as bits of JS that won't be executed but also won't be removed because our tree shaking isn't perfect.

Edit: as someone who writes a good deal of Java in their day job, I feel like commenting about the verbosity of XML might be unwelcome. I'll only say that in some cases it can be useful to have elements that have been structured and described in verbose ways, especially when you don't have the slightest idea about what API or data you're looking at when seeing it for the first time (the same way WSDL files for SOAP could provide discoverability).

However, it all goes downhill due to everything looking like a nail once you have a hammer - most of the negative connotations with XML in my mind actually come from Java EE et al and how it tried doing dynamic code loading through XML configuration (e.g. web.xml, context.xml, server.xml and bean configuration), which was unpleasant.

On an unrelated note, XSD is the one truly redeeming factor of XML, the equivalent of which for JSON took a while to get there (JSON Schema). Similarly, WSDL was a good attempt, whereas for JSON there first was WADL which didn't gain popularity, though at least now OpenAPI seems to have a pretty stable place, even if the tooling will still take a while to get there (e.g. automatically generating method stubs for a web API with a language's HTTP client).

You mean something like https://pyotr.readthedocs.io

Thanks for the link, but not necessarily.

The way WSDL and the code generation around it worked was that you'd have a specification of the web API (much like OpenAPI attempts to do), which you could feed into any number of code generators to get output code with no runtime coupling to the actual generator, whereas Pyotr is geared more towards validation and goes in the opposite direction: https://pyotr.readthedocs.io/en/latest/client/

The best analogy that I can think of is schema-first application development: you write your SQL migrations (ideally applied in an automated way as well) and then just run a command locally to generate all of the data access classes and/or models for your database tables within your application. That way, you save your time on the 80% of boring and repetitive stuff while minimizing the risk of human error and inconsistencies, and nothing prevents you from altering the generated code if you have specific needs (other than needing to keep your changes somewhere the generator won't overwrite them, for example in a child class of a generated class). Of course, there's no reason why this can't be applied to server code either: write the spec first and generate stubs for endpoints that you'll just fill out.

Similarly, there shouldn't be a need for a special client to generate stubs for OpenAPI; the closest that Python in particular has for now is this: https://github.com/openapi-generators/openapi-python-client

However, for some reason, model driven development never really took off, outside of niche frameworks, like JHipster: https://www.jhipster.tech/

Furthermore, for whatever reason formal specs for REST APIs also never really got popular and aren't regarded as the standard, which to me seems silly: every bit of client code that you write will need a specific version to work against, which should be formalized.

> model driven development never really took off

Same reason REST is not a hot thing anymore: the idea that your API is just a dumb wrapper around a data model is poor API design.

API-driven development didn't really take off either, that is, writing your spec in gRPC/OpenAPI and having the plumbing code generated on both ends. It's technically already there with various tools, but because of dogma like "code generation is bad", the quality of code generators, or whatever reason, we're still writing "API code".

Well, in Python, code generation is an anti-pattern.

> Well, in Python, code generation is an anti-pattern.

Hmm, I don't think that I've ever heard of this. Would you care to provide any sources, since that sounds like an interesting stance to take?

So far, it seems like frameworks like Django don't have an issue with CLI tools to generate bits of code, i.e. https://docs.djangoproject.com/en/3.2/intro/tutorial01/

  If this is your first time using Django, you’ll have to take care of some initial setup. Namely, you’ll need to auto-generate some code that establishes a Django project – a collection of settings for an instance of Django, including database configuration, Django-specific options and application-specific settings.
  $ django-admin startproject mysite
Similarly, PyCharm doesn't seem to have an issue with offering to generate methods for classes (ALT + INSERT), such as override methods (__class__, __init__, __new__, __setattr__, __eq__, __ne__, __str__, __repr__, __hash__, __format__, __getattribute__, __delattr__, __sizeof__, __reduce__, __reduce_ex__, __dir__), implementing methods, generating tests and copyright information.

I don't see why CLI tools would be treated any differently or why code generation should be considered an anti-pattern since it's additive in nature and is entirely optional, hence asking to learn more.

First of all, just because a tool or project uses a pattern, it doesn't mean that it's a good idea. Second, code generation as part of IDE or one-time setup is something else.

I need to clarify: when I say that "code generation" is an anti-pattern, I'm talking about the traditional, two-step process where you generate some code in one process, and then execute it in another. But Python works really well with a different type of "code generation".

Someone once said that the only thing missing from Python is a macro language; but that is not true - Python has its own macro language, and it's called Python.

Python is dynamically evaluated and executed, so there is no reason why we need two separate steps when generating code dynamically; in Python, the right way is not to dynamically construct the textual representation of code, but rather to dynamically construct runtime entities (classes, functions etc), and then use them straight away, in the same process.

Unless you're dynamically building hundreds of such constructs (and if you do you have a bigger problem), any performance impact is negligible.
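Here's a small sketch of that runtime approach, using the standard library's `dataclasses.make_dataclass` to build model classes from a schema in the same process rather than writing generated source files. The `SCHEMA` dict is a hypothetical stand-in for whatever spec (database metadata, an API description) you'd actually load:

```python
from dataclasses import asdict, fields, make_dataclass

# Hypothetical schema; in practice this might come from database metadata
# or an API description loaded at startup.
SCHEMA = {
    "User": [("id", int), ("name", str), ("email", str)],
    "Order": [("id", int), ("user_id", int), ("total", float)],
}


def build_models(schema):
    # Build one dataclass per entry directly in this process: no source text
    # is written out, and the classes are usable immediately.
    return {name: make_dataclass(name, spec) for name, spec in schema.items()}


models = build_models(SCHEMA)
User = models["User"]

user = User(id=1, name="Ada", email="ada@example.com")
print(user)  # User(id=1, name='Ada', email='ada@example.com')
print([f.name for f in fields(models["Order"])])  # ['id', 'user_id', 'total']
```

The trade-off is that IDEs and type checkers can't see classes built this way, which is one reason some people still prefer generated source files.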

> Someone once said that the only thing missing from Python is a macro language

Ahh, then it feels like we're talking about different things here! The type of code generation that I was talking about was more along the lines of tools that allow you to automatically write some of the repetitive boilerplate code that's needed for one reason or another, such as objects that map to your DB structure and so on. Essentially things that a person would have to do manually otherwise, as opposed to introducing preprocessors and macros.

For a really nice example of this, have a look at the Ruby on Rails generators here: https://medium.com/@simone.catley/ruby-on-rails-generators-a...

>you forget the context

Wait, it's the opposite. XML is designed to indicate context, and JSON is designed to hide context: you have a bunch of braces in place of context, and no matter where you are, it's braces all the way down, like Lisp.

Not really; what lets you keep the context is shorter code. It's useless to have context reminders at the top and bottom of the thing but not in the middle, and it's too damn long.

For me XML and YAML are about the same. I think I'd also prefer comment-less JSON over both. However, XML wasn't that bad. With a decent editor and schema validation I would say there's a good chance I was more productive with XML than I am with YAML.

It's simple. For config files, choose the format that has the best tooling in your company and that supports comments. For data transfer, choose one that supports schemas, backwards compatibility and good tooling (protobuf is just the example I'm most familiar with).

Actually, yes, I do. XML syntax was far from stellar, and much of the ecosystem (e.g. XML Schema) was drastically overengineered... but even so, we had gems like RELAX NG to compensate. On the whole, it was better than the current mess.

> So you prefer the "good old" XML days? I'll take comment-less JSON over XML any day

Sure, why not? XML rocks. I'll take it over JSON for many purposes.

My opinion only: I love JSON because it lacks so many foot guns of yaml. If you’re doing lots of clever stuff with yaml you probably want a scripting language instead. Django using Python for configs made me fall in love with this. Spending years with the unmitigated disaster that is ROS xml launchfiles and rosparams makes me love it even more.

Yaml and toml are fine if you keep it simple. JSON direly needs comments support (but of course wasn’t designed to be used as a human config file format so that’s kind of on us). And not just “Jsonc that sometimes might work in places.”

Beyond that, I think we generally have all the things we need and I don’t personally think we need yet another yaml. =)

These aren't foot-guns per se, but I can think of another handful of grievances I have with JSON:

* JSON streaming is a bit of a mess. You can either do JSONL, or keep the entire document in memory at once. I usually end up going with JSONL.

* JSON itself doesn't permit trailing commas. I can measure the amount of time that I've wasted re-opening JSON files after accidentally adding a comma in days, not hours.

* JSON has weakly specified numbers. The specification defines the number type purely syntactically (essentially a run of digits, with an optional sign, fraction and exponent) and says nothing about range or precision. It's consequently possible (and common) for different parsers to behave differently on large numbers. YAML also, unfortunately, has this problem.

* Similarly: JSON doesn't clearly specify how parsers should behave in the presence of duplicate keys. More opportunity for confusion and bugs.
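In Python, at least, you can see both of the last two points with the standard `json` module: by default a duplicate key silently keeps the last value, and `object_pairs_hook` lets you reject duplicates instead. `parse_int=float` is used here only to imitate a parser that stores every number as an IEEE 754 double, the way JavaScript's `JSON.parse` does:

```python
import json


def reject_duplicate_keys(pairs):
    # object_pairs_hook receives the raw (key, value) pairs of each object,
    # so duplicates are still visible at this point.
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


doc = '{"a": 1, "a": 2}'
last_wins = json.loads(doc)  # {'a': 2}: the default keeps the last value

try:
    json.loads(doc, object_pairs_hook=reject_duplicate_keys)
    duplicate_rejected = False
except ValueError as err:
    print(err)  # duplicate key: 'a'
    duplicate_rejected = True

big = "12345678901234567890"
exact = json.loads(big)                       # Python ints are arbitrary precision
as_double = json.loads(big, parse_int=float)  # what a double-only parser sees
print(exact, int(as_double))                  # the second value has lost precision
```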

Running prettier (https://prettier.io) on each save will fix trailing commas for you. If you accidentally have one, it will just sneakily remove it and turn your document into one that is valid.

How someone could have decided on a subset of javascript and not include comments is beyond me.

It may have been a good or bad decision. But comments were intentionally left out of JSON to avoid an obvious way to sneak in parsing directives, and thus incompatibilities between different JSON parsers.

Yet incompatibilities have persisted from day 1: big integers, duplicate keys, key order.

On the other hand, XML allows comments, yet I've never seen XML parser incompatibilities.

Not exactly an incompatibility, but my mind jumped to issues like this: https://github.com/swisskyrepo/PayloadsAllTheThings/blob/mas...

Some parsers will take just the first text element ("user@user.com"), and others will concatenate the text elements ("user@user.com.evil.com").

> Some parsers will take just the first text element

Those are not in compliance with the relevant spec. We need to treat them as damage and confront them on the technical and social level.

If I had a penny for every time someone tried to parse XML using a regex (if that even counts as a parser)... Those are 100% incompatible with everything else.

Easiest way to demonstrate how wrong that is, is to throw in a comment in the example document ;)
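To make that concrete, here's a toy Python comparison (the document and the regex are made up for illustration): a regex that grabs whatever sits between the tags returns the comment as part of the data, while `xml.etree.ElementTree` recognizes it as a comment and drops it.

```python
import re
import xml.etree.ElementTree as ET

DOC = "<user><email>user@user.com<!-- primary address --></email></user>"


def email_by_regex(doc):
    # Naive "parser": take everything between the opening and closing tags.
    match = re.search(r"<email>(.*?)</email>", doc)
    return match.group(1) if match else None


def email_by_parser(doc):
    # A real XML parser knows <!-- ... --> is a comment, not character data.
    return ET.fromstring(doc).findtext("email")


print(email_by_regex(DOC))   # user@user.com<!-- primary address -->
print(email_by_parser(DOC))  # user@user.com
```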

The funny thing is that JSON doesn't even need commas; they essentially act as whitespace, and any amount or none at all would make no difference to the meaning of the document.

Arrays with holes are a JS-only feature.

> json doesn't even need commas

JSON is defined by the spec. The people who wrote the spec think otherwise[0].

[0]: https://www.json.org/json-en.html

> Arrays with holes are a JS-only feature.

There are other languages that allow arrays with missing elements.

JSON requires commas, but does not need them; semantically they are treated like whitespace.

The document

    {1:2 3:[4 5]}

can only be "commafied" to

    {1:2, 3:[4, 5]}
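The claim is easy to test with a toy tokenizer: drop every comma, then insert exactly one wherever a value ends and the next value or key begins. `commafy` is a hypothetical helper for illustration, not a real JSON parser (it is lenient about number formats and leaves validation to `json.loads`):

```python
import json
import re

# Strings, numbers, the three literals, and structural punctuation.
TOKEN = re.compile(
    r'\s*("(?:[^"\\]|\\.)*"|-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?|true|false|null|[{}\[\]:,])'
)
LITERALS = ("true", "false", "null")


def ends_value(tok):
    return tok[0] in '"-0123456789' or tok in LITERALS or tok in ("}", "]")


def starts_value(tok):
    return tok[0] in '"-0123456789{[' or tok in LITERALS


def commafy(text):
    # Treat existing commas as whitespace, then put exactly one comma wherever
    # a value ends and the next value (or key) begins.
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    out = []
    for tok in tokens:
        if tok == ",":
            continue
        if out and ends_value(out[-1]) and starts_value(tok):
            out.append(",")
        out.append(tok)
    return "".join(out)


fixed = commafy('{"a":1 "b":[2 3]}')
print(fixed)              # {"a":1,"b":[2,3]}
print(json.loads(fixed))  # {'a': 1, 'b': [2, 3]}
```

The numeric keys in the example above come out as `{1:2,3:[4,5]}`, which is still not valid JSON, since JSON object keys must be strings.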

> There are other langauges that allow arrays with missing elements.

But JavaScript is the only one that gave JSON the JS in its name.

You can parse JSON in a streaming fashion with many libraries. You just don't know at the beginning if it is going to be valid or not.
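For the related case of a stream of many top-level values (JSON Lines, or documents simply concatenated), Python's standard library can already decode value by value with `json.JSONDecoder.raw_decode`, which returns the parsed value plus the index where it stopped. This sketch works on an in-memory string; truly incremental parsing of a single huge document needs a dedicated streaming parser:

```python
import json

_DECODER = json.JSONDecoder()
_WHITESPACE = " \t\r\n"  # the four whitespace characters JSON allows


def iter_json_values(text):
    # Yield each top-level JSON value in text, one after another.
    idx, end = 0, len(text)
    while True:
        # raw_decode() does not skip leading whitespace itself.
        while idx < end and text[idx] in _WHITESPACE:
            idx += 1
        if idx == end:
            return
        value, idx = _DECODER.raw_decode(text, idx)
        yield value


stream = '{"event": "start"}\n{"event": "stop"}\n'
for item in iter_json_values(stream):
    print(item)
```

A value that is cut off mid-stream raises `json.JSONDecodeError`, which is the "you don't know if it's valid until the end" problem in miniature.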

And the flip side of that with YAML is you can stream it, but you don't know once you've gotten to the end if it was the whole document without some user defined checksum mechanism.

I ran into a great bug with the INI format, which has the same issue. The application would reread the config file on modification, but if you just wrote over the file, it would sometimes read the config before the file was fully written. You have to write a temp file and move it into place rather than just editing the file.
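The temp-file-and-move fix can be sketched like this in Python. It assumes a POSIX filesystem, where `os.replace()` is an atomic rename when source and destination are on the same filesystem, so a reader polling the config sees either the complete old file or the complete new one:

```python
import os
import tempfile


def atomic_write_text(path, text, encoding="utf-8"):
    # Write to a temp file in the same directory (so the rename stays on one
    # filesystem), flush it to disk, then swap it into place with os.replace().
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)  # readers see the old file or the new one
    except BaseException:
        # Don't leave a stray temp file behind if anything failed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


with tempfile.TemporaryDirectory() as demo_dir:
    target = os.path.join(demo_dir, "app.ini")
    atomic_write_text(target, "[server]\nport = 8080\n")
    with open(target, encoding="utf-8") as f:
        written = f.read()
    leftovers = sorted(os.listdir(demo_dir))

print(written, leftovers)
```

One caveat: `tempfile.mkstemp()` creates the file with owner-only permissions, so the replaced config won't keep its old mode unless you copy it over (e.g. with `shutil.copymode`) before the rename.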

It's possible to have document start and end markers in YAML:

    ---
    foo: 1
    ...

Your application can mandate usage of these. But yeah, not ideal.

> Your application can mandate usage of these

I believe that's only true if one were to load YAML via the "SAX"-style per-event stream, and not the "object materialization" that normal apps use (aka `yaml.load_all` or JAX-B objects) since in those more data-object centric views, where would one put the processing events for those markers?

I also originally expected `yaml.parse(...)` to eat them as it does for comments and extraneous whitespace, but no, it does in fact return dedicated stream events for them, so TIL

Items 2, 3 and 4 can be caught early with JSON Schema.

Not really: JSON Schema validation is applied after parsing, to JSON that has already been parsed.

> Django using Python for configs made me fall in love with this.

I also started advocating in-language configuration files (Python for Python, but also Lua for Lua, etc) a number of years ago because it lets you do really useful things (like functionally generating values, importing shared subsets of data, storing executable references, and ensuring that two keys return the same values without manual copy/paste) all without needing to spec and use Yet Another Thing™ that does only a fraction of what the programming language you're already using already does.
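A minimal sketch of what such an in-language config can look like in Python; every name in it (`APP_ENV`, the service list, the retry settings) is hypothetical. It shows generated values, a shared subset that two sections reference so they can't drift apart, and a callable stored directly in the config:

```python
import os

ENV = os.environ.get("APP_ENV", "dev")
BASE_URL = "https://example.com" if ENV == "prod" else "http://localhost:8000"

# Generated values instead of copy/paste.
SERVICES = {name: f"{BASE_URL}/{name}" for name in ("api", "auth", "billing")}

# One shared subset referenced by two sections; both keys point at the same
# object, so the values can never drift apart.
_RETRY = {"attempts": 3, "backoff_seconds": 0.5}
HTTP_CLIENT = {"timeout_seconds": 10, "retry": _RETRY}
QUEUE_CLIENT = {"prefetch": 20, "retry": _RETRY}


def user_agent():
    # An executable reference: code that consumes the config calls this directly.
    return f"my-app/{ENV}"


LOGGING = {"level": "DEBUG" if ENV == "dev" else "INFO", "user_agent": user_agent}
```

The application would then simply `import` this module and read its attributes.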

That also implies that you can't just test a foreign config file without first reading and understanding what it does, as just using one would imply arbitrary code execution.
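One middle ground in Python, if you want Python syntax (comments, trailing commas) without executing anything, is `ast.literal_eval`, which only accepts literals. This is a swapped-in technique, not something proposed in the thread, and the Python docs warn that it can still exhaust memory or the C stack on huge or deeply nested input, so it isn't a full sandbox:

```python
import ast


def load_literal_config(text):
    # ast.literal_eval only accepts literals (dicts, lists, tuples, sets,
    # strings, numbers, bools, None), so calls and names are rejected.
    value = ast.literal_eval(text)
    if not isinstance(value, dict):
        raise ValueError("top-level config must be a dict")
    return value


SAFE = """{
    # Comments are fine: this is plain Python syntax.
    "name": "my-tool",
    "version": "42.1337.0",
    "plugins": ["a", "b",],  # so are trailing commas
}"""

config = load_literal_config(SAFE)
print(config["plugins"])  # ['a', 'b']

try:
    load_literal_config("__import__('os').system('echo pwned')")
    rejected = False
except ValueError:
    rejected = True
print("code rejected:", rejected)
```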

This is a place where Tcl excels. You can easily create restricted sub-interpreters that can't do anything dangerous. If you need more power for trusted scripts you just reenable selected commands.

Same thing with Lua!

Using the programming language for configuration works only when using some scripting language.

Things that get compiled can't really use it without recompilation.

But you can embed Lua or Python using its C interface.

That is how our Tcl-based application server worked: the configuration files were a Tcl DSL.

> My opinion only: I love JSON because it lacks so many foot guns of yaml.

While true, parsing it is still a minefield because it's very underspecified: http://seriot.ch/projects/parsing_json.html

JSON5 is the way to go. It supports comments and trailing commas. Unfortunately it's going to be difficult to supplant legacy JSON, which is so pervasive.

Except parsing JSON5 in the browser is super slow. Native `JSON.parse` doesn't support it, non-native parsers are slow, and the only fast way to parse it is `eval()`.

Does the browser need JSON objects with comments?

The desire to use a single interchange format for all data is the problem. There are plenty of reasons to support comments and minor syntax issues that JSON itself dislikes for human consumable and interactive JSON. I'd think software JSON could be just that.

This shouldn't really matter for the JSON5 use case - config files - which are usually small enough.

For machine-to-machine generated payloads JSON is good enough.

I work with ROS extensively and have not heard of using Django for this use case. Do you know of any open source projects that do this?

Sorry. Two separate contexts. I use both in the big picture but the Django world doesn’t directly interact with ROS. there’s an HTTP api for that.

I’ve never liked YAML. For whatever reason, it always feels like working in a mine field. It comes from the same cargo cult of people who think the problem with human machine formats is that it needs to be “clean”.

Clean, of course, to them means some bizarre aesthetic notion of removing as much as possible. Only it’s taken to an extreme. I wonder if the same people also think books would be better with all punctuation removed to make them look “clean”?

It’s unhealthy minimalism, and it causes more problems than it solves. As soon as I see a project using YAML I cringe and try to find an alternative, because god knows what other poor choices the developer has made. In that sense, YAML can be considered a red flag, and I’m usually right. The last project I used that adopted an overly complex and build-breaking YAML configuration syntax had other problems hiding under the covers, and in some cases couldn’t parse its own syntax due to YAML’s overly broad, yet at the same time opinionated, syntax.

Just say no to YAML.

By its very name (and the fact that the MEANING of the name flip-flopped in mid-flight after launch) you can tell that the designers of YAML had no clue what they were doing, because originally they named it "YAML" for "Yet Another Markup Language", when it clearly was NOT a markup language.

Only AFTER YAML had been around and in use for a few years did those geniuses actually realize that they had made a mistake in naming it something that it's not, and retroactively changed the name "YAML" to mean "YAML Ain't Markup Language", which was a too clever by half way of whitewashing the fact that they originally CLAIMED it was "Yet Another Markup Language", since they had no idea what a markup language actually was.

I prefer to use markup languages and data definition languages that were designed by people who are situationally aware enough to know what the difference between a markup language and a data definition language is, please.

Hard pass on YAML, whatever it stands for this week.

I've often heard this argument about YAML being "clean", but over time I have realized that people are conflating minimalism with cleanliness, when they are two different things. That realization is what it took for me to understand why I didn't like it. I did _not_ find it clean; I found it "messy" by virtue of the increased cognitive overhead. But it is minimal, at least compared to other formats. Other formats appear cleaner to me.

I'll give my opinion as someone who had to choose among JSON, XML, TOML, and YAML about two years ago for a new project. Whatever I chose had to be easy to understand later for end users who don't know the specification.

Here were my thoughts on the options.

JSON - No comments -> impossible

XML - Unreadable

YAML - 2nd place. Meaningful indentation also made me worried someone was going to not understand why their file didn't work. The lack of quotes around strings was frustrating.

TOML - 1st place. Simpler than YAML to read & parse. It truly seems 'obvious' like the name says.

I haven't encountered any situations where I wish I had more than TOML offers.

I disagree. TOML is terrible at handling nested data.

Check this thread:


I don't see Kubernetes switching to TOML anytime soon!

There may be no nested data in his use case. There’s no single correct answer here.

Too YAGNI for me.

I have nesting up to three levels deep. I use inline tables^ for the many innermost (or other few-element) tables. It's never seemed excessively verbose.


A bit later you see [fruits.physical]

Even XML doesn't make you repeat the higher level keys!

Also agree, I find toml way less readable than yaml for lots of data structures

I feel like if you have data so nested that TOML is a problem then your schema is a problem/you should just be using a script

I think the "right" choice is HCL

It's plenty easy to convert YAML to JSON and use it in Terraform :)


Why convert when HCL is superior to YAML and JSON?

It isn't. YAML and JSON are much more proven than HCL. HCL is used for some relatively small products. Just making something more complicated doesn't make it better.

Proven in what sense? Several implementations are broken or incorrect. HCL is used in very large products as well. Just because it isn't the majority currently doesn't mean that it isn't a worthy choice. HCL isn't more complicated if used as an alternative to YAML or JSON; in fact, I would argue that it is simpler. It combines the pros of YAML and JSON, and addresses the nested complexity of TOML. It really is IMO the best, but you of course are free to hold a different opinion. However, I would encourage you to actually try it out and re-evaluate.

The unreadability of XML is grossly exaggerated.

I agree. I have never really had a problem reading XML myself.

There’s properties files too, but TOML is my “format of choice” as well for a bunch of use-cases where human readability is important.

More people should give it a try. Very reminiscent of old Windows INI files and Java properties.

TOML is pretty good but it gets too verbose when you add bunch of arrays.

All we need is to revise the official JSON standard (ECMA-404) to include comments.

And trailing commas. And unquoted keys.

Why are unquoted keys so critical? I feel like one of the strengths of a DDL like JSON or XML is that it's easy to tell what the data (key-value pair or otherwise) is, while with YAML and others, understanding data-vs-structure can be challenging.

Mostly so copy between JS and JSON isn’t such a PITA.

It’s not essential, but if we’re already changing the format we might as well?

> All we need is to revise the official JSON standard (ECMA-404) to include comments.

That would be a step back for GitLab CI, GitHub Actions, Kubernetes, Google App Engine, and a bunch of other projects which use YAML and seldom encounter the Norway problem. https://hitchdev.com/strictyaml/why/implicit-typing-removed/
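For anyone who hasn't hit the Norway problem: YAML 1.1's implicit typing turns some unquoted scalars into booleans. This toy resolver applies the bool pattern from the YAML 1.1 type repository (plus plain decimal integers) to show the effect; real parsers differ in the details, and YAML 1.2's core schema only treats true/false spellings as booleans:

```python
import re

# Bool pattern from the YAML 1.1 type repository; YAML 1.2's core schema
# only recognizes true/True/TRUE and false/False/FALSE.
YAML11_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)
YAML11_TRUE = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}


def resolve_plain_scalar(text):
    # Toy implicit-typing resolver for unquoted scalars: bools and decimal ints only.
    if YAML11_BOOL.fullmatch(text):
        return text in YAML11_TRUE
    if re.fullmatch(r"[-+]?[0-9]+", text):
        return int(text)
    return text


countries = ["GB", "IE", "FR", "NO"]
print([resolve_plain_scalar(c) for c in countries])  # ['GB', 'IE', 'FR', False]
```

Quoting the value ("NO") sidesteps the resolver entirely, which is the usual workaround.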

That's fine. They should not have used a format made for data to describe what essentially is code.

Why shouldn't they have?

TOML can't decide if it's a super INI file or a JSON cousin. You can represent the same information using two completely different representations and you can mix both styles in the same document. Manually navigating and editing values is error prone and hard to automate.

JSON with comments would be ideal.

Which is why many parsers support that. I'm positive you'll find one that does so in pretty much every environment.

In that case, you might want to have a look at JSON5: https://json5.org/

It is pretty niche, but attempts to improve upon JSON in a multitude of ways, one of which is the support for comments: https://spec.json5.org/#comments

The libconfig format is fairly close to that, and it's great!


I guess it's just matter of personal taste, but I don't see how XML is any more "unreadable" than any of the other options mention here.

TOML can handle nested data at the application level by using entity reference token semantics.

It does need an XPath traversal and search query format for application use and data references.


A lot of people have really strong opinions towards syntax things like YAML vs JSON vs XML, HTML, even programming languages. I think at some point we assign way too much importance to this kind of stuff.

I recently read a piece by Joel Spolsky that resonated with me (even though my career is not nearly as long as his).

> I took a few stupid years trying to be the CEO of a growing company during which I didn’t have time to code, and when I came back to web programming, after a break of about 10 years, I found Node, React, and other goodies, which are, don’t get me wrong, amazing? Really really great? But I also found that it took approximately the same amount of work to make a CRUD web app as it always has, and that there were some things (like handing a file upload, or centering) that were, shockingly, still just as randomly difficult as they were in VBScript twenty years ago. [0]

It makes me wonder if we're really focusing on the right stuff. Maybe there's lower hanging fruit somewhere that's more valuable than focusing on fundamentally subjective things like syntax.

[0]: https://www.joelonsoftware.com/2021/06/02/kinda-a-big-announ...

A radically different alternative with a lot going for it is Starlark: https://github.com/bazelbuild/starlark

It’s a deterministic subset of Python. This means that if you have complex or repetitive configurations, you can use loops and functions to structure them. But it’s impossible to write an infinite loop or recursion.

EDN [1] and Transit [2]... Elegant weapons for a more civilized system.

[1] https://github.com/edn-format/edn

[2] https://github.com/cognitect/transit-format

I really came here to see why EDN wasn't mentioned. It is used a lot in Clojure/ClojureScript/hylang projects. It can express everything JSON can, is in my opinion a lot more readable than JSON, but is familiar enough too. It has native sets, e.g. #{1 2 "three" '("four element list with a string inside")}, and keywords. Tagged elements can be used for extending it, e.g. with a timestamp (such as the built-in #inst) or #uuid. It also supports comments, and discards (#_) for things that should be omitted in evaluation.

As a sysadmin, YAML seems nice until you have actually done anything more advanced with it. See Julien Pivotto's presentation about some of its pitfalls: https://www.slideshare.net/roidelapluie/yaml-magic?next_slid... Btw. Jsonnet doesn't seem too bad either: https://www.youtube.com/watch?v=LiQnSZ4SOnw and here some examples: https://jsonnet.org/ but in my book, EDN still wins.

How about simply using pure full blown JavaScript or Python for config files, and not hiring people who you can't trust not to write infinite loops?

Or if you really must, then simply interrupt processes that loop infinitely, and fix the bugs that caused it.

You know, like you already do when you have an infinite loop.

Infinite loops are not the end of the world, you know. Processes can be interrupted, and computers have reset buttons.

IMO using code that generates (possibly binary/opaque) config data is the sweet spot. It's one more layer of indirection, but it means you're language-agnostic, you have a "safe" interface, and your "config-generating" process can be as expressive as you like -- comments, loops, whatever.

The underlying conundrum is:

- systems need to be configured,

- human-readability is obviously necessary at some level,

- configuration is often very "compressible" (needs loops, needs variables to be maintainable), but

- system-writers don't know the structure of your data, the axes on which you'd want to compress things, the best abstractions for you.

Templating languages are an obvious direction, but they're uniformly bad. If they have limited expressiveness you'll run into the limits. Maybe there are templating languages with good unit testing frameworks, but I haven't seen them. "Look at the expanded diff" doesn't scale. And generating gobs of human-readable "data" (in a format that supports comments!) is very wasteful.

It's not just a trust thing. Knowing that some snippet has bounded evaluation is super important, for mental models, processing, security, etc.

Detecting loops still costs resources, and it often involves introspection or privileged views; it's simply easier to prevent loops.

Determining whether arbitrary code will loop forever is actually impossible in general (the halting problem).

> Starlark is a dialect of Python. Like Python, it is a dynamically typed language with high-level data types, first-class functions with lexical scope, and garbage collection.

If it has first-class functions, how can you avoid infinite recursion? Like, what stops me from running the omega combinator in it? This is why Meson (a similar language) does not allow those kinds of shenanigans, to keep the language non-Turing-complete.

No recursion and no lambda.

So it doesn't have first class functions then?

Not a bad idea but only implemented in Rust, Go, and Java so far. Meanwhile, all sorts of languages can interpret JSON and YAML.

It's a cool idea to do configuration in a subset of Python but now you have to go implement that subset in every language.

Have you had any experience building on top of it directly outside of blaze/bazel?

How about just nudge json a couple more notches towards js? https://github.com/leontrolski/dnjs

Interesting! I started using jsonnet this year, but found that the language was needlessly quirky (e.g. the `::`, purely functional aspect, and no one wants to learn a new language to write configuration in the first place). More importantly, it is extremely slow (lazy evaluation without memoization...): rendering the Kubernetes YAML of my 5-container app taking over 10 seconds...

I will look into this further.

> It’s a deterministic subset of Python. This means that if you have complex or repetitive configurations, you can use loops and functions to structure them. But it’s impossible to write an infinite loop or recursion.

Starlark is indeed deterministic and guaranteed to terminate (the Go implementation has a flag that allows recursion, but it's off by default), but these are two orthogonal properties.

Plenty of tooling is missing from the Starlark ecosystem, e.g. generating Starlark files or machine-editing Starlark maps.

So one thing I wasn't sure of: if you have a Starlark program, how is its value decided? Is it simply the value of the last expression? And where does the print output end up? Is it just for diagnostics, with no influence on the value?

I like INI. It's simple, it's readable, and it leaves the data types up to the application to interpret. It's also really easy to parse: I can work out how to do it myself, whereas JSON is beyond me.
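To illustrate how little an INI reader needs, here is a minimal hand-written sketch. Since INI has no single standard, it assumes one particular dialect: `;` or `#` comment lines, `[section]` headers, `key = value` pairs, and no quoting or continuation lines. Python's built-in `configparser` covers many more cases.

```python
def parse_ini(text: str) -> dict[str, dict[str, str]]:
    """Parse a minimal INI dialect into {section: {key: value}}.

    Values stay strings; interpreting types is left to the application.
    Keys that appear before any [section] go into the "" section.
    """
    sections: dict[str, dict[str, str]] = {}
    current = sections.setdefault("", {})
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line[0] in ";#":
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1].strip(), {})
        elif "=" in line:
            key, value = line.split("=", 1)  # split on the first '=' only
            current[key.strip()] = value.strip()
        else:
            raise ValueError(f"line {lineno}: expected [section] or key = value")
    return sections


print(parse_ini("; demo\nname = demo\n[server]\nport = 8080\n"))
```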

I like CSV (and similar delimited files) it's less verbose than anything else for tabular data.
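As a rough illustration of the verbosity point, here is a sketch that serializes the same small table (hypothetical example data) as CSV and as JSON using only the standard library. CSV names each column once in the header row, while JSON repeats every key in every row.

```python
import csv
import io
import json

# Hypothetical example data: three rows with the same three columns.
rows = [
    {"host": "web1", "port": "8080", "role": "frontend"},
    {"host": "web2", "port": "8080", "role": "frontend"},
    {"host": "db1", "port": "5432", "role": "database"},
]


def to_csv(records: list[dict[str, str]]) -> str:
    """Serialize records as CSV; the header row names each column only once."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


csv_text = to_csv(rows)
json_text = json.dumps(rows)  # JSON repeats every key in every row
print(len(csv_text), "characters as CSV vs", len(json_text), "as JSON")
```

Note that the CSV round trip gives back strings only; like INI, typing is left to the application.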

I like JSON for data transfer, you know the data types, it's succinct, and readable.

I personally don't need anything else.

This is the right answer in my view. If you need something structured use XML, otherwise INI.

I'm more likely to Yacc my own config format than use YAML or JSON personally.

JSON is great as an output format for data though.

INI is my favorite. I don't understand why it isn't the automatic default for everything.

As far as I know, there is no standard for INI. There is TOML, which looks close enough, I guess?

TOML looks good. I'd rather it were called the INI standard, but still.

IIRC, it's hard to do any nested structure in INI; you'd have to adopt a convention like putting prefixes and dots in entry names to denote hierarchy.
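A minimal sketch of that dotted-name convention, using Python's standard `configparser` to read the file and then splitting section names on dots to build nested dicts. The dot convention is an application-level assumption, not part of any INI syntax, and a key that shares its name with a subsection would collide.

```python
import configparser

# Hypothetical config using dotted section names for hierarchy.
INI_TEXT = """
[server]
host = example.com

[server.tls]
cert = /etc/ssl/cert.pem
enabled = yes
"""


def ini_to_nested(text: str) -> dict:
    """Turn dotted section names into nested dicts; values remain strings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    root: dict = {}
    for section in parser.sections():
        node = root
        for part in section.split("."):
            node = node.setdefault(part, {})  # walk/create one level per dot
        node.update(parser[section])
    return root


print(ini_to_nested(INI_TEXT))
```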

Exactly what I think about the matter. Sometimes I use proprietary binary formats together with UDP where performance is critical (game servers for example).

I like NestedText, it's less verbose than anything else for nested data.

I have to say I hate the fact that I have low confidence when editing YAML that the result will be what I intend. It's kind of the number one job of such a format. And I routinely run into people using advanced features and then I have no idea at all how to safely edit it. It is interesting that it seems so difficult to pick a good tradeoff between flexibility and complexity with these kinds of languages.

I just stick to XML unless forced to use something else.

Schema validation, code completion on IDEs, endless amount of tooling including graphical visualisation, a language for data transformation and queries, and.... wait for it... comments!

If you're going to use XML, I would consider it mandatory to also use XSDs (W3C XML Schemas).

XSDs are something I think people need to pay more attention to when dealing with XML; the type system that the W3C XSD standard lays out (when used effectively) really does relieve much of the pain people experience with XML.

What is the obsession with removing braces? I will never find the lack of clear demarcations (relying on indent) easier than braces.

Visual clutter, familiarity to non-coders. Curly braces are almost never used outside of programming and are ugly to boot.

My benchmark for yaml/JSON alternatives is "how would I feel explaining it to a busy, sceptical client?"

If the intended audience is purely developers, then sure. JSON (with the addition of comments and trailing commas) is just fine.

White space has the additional advantage of agreeing with itself. Other demarcations can have issues where the indentation and the structure contradict each other.

Again the dreaded Cobol argument. We had to struggle with a lot of this in the past: Cobol, SQL, YAML, BDD. All this would be much easier without this nonsensical idea that nontechnical people will read code. They won’t. Making code a bit more like prose doesn’t make it readable for nontechnical people. Yet we again and again make our life harder - ugly syntax rules, no code completion, no auto-formatters.

Please stop making code easy for non-coders. They don’t want to read it. They never did. They just want this damn box to work.

As a counterargument: I work in robotics, where many operators look at and change settings in a YAML file during testing. They do not have software skills beyond this.

My educated guess then is you could have gotten them to change "settings" in C, Java or basically anything.

Just put the file in the root folder and keep it as simple as possible, and you should be fine? I mean, if they manage to write YAML correctly and consistently, C shouldn't be a problem for them.

Maybe? The dynamic loading of the configs kind of restricts it to a markup language.

The reason the situation is the way it is now is precisely because the code being made easy for non-coders increased the popularity and reach of the products. Probably because non-coders also found it easy to pick up and start working with it.

Wrong use case. I'm talking about asking them to write or edit these files.

I haven't found "this needs to be indented exactly the right amount or it won't work" to be much easier for non-programmers than "this needs to be enclosed in braces or it won't work." Most people have at least experienced parentheses in math (albeit maybe decades ago), so it's not an entirely foreign concept. Either one requires a bit of learning, but I think most people are capable of it, so any improvement in non-coder familiarity seems minor at best, vs. the very real costs.

Counter-argument - why do programmers insist on clear indentation if it doesn't aid readability? The indentation is there for humans and the braces are there for the compiler.

That's why you have both and lint against inconsistencies. This catches errors.

>My benchmark for yaml/JSON alternatives is "how would I feel explaining it to a busy, sceptical client?"

My benchmark is this: can an autoformatter do its job every time without breaking something that's technically working right now but possibly formatted wrong?

Every data format that cannot comply with this contains in it a huge waste of time. Even as a python programmer, I extend the same rule to programming languages.
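One way to make that benchmark concrete: in a braces-delimited format like JSON, whitespace carries no meaning, so a formatter can re-indent freely and you can check both that the data is unchanged and that a second run is a no-op. A minimal sketch using Python's `json` module as the "formatter" (an illustrative choice, not a recommendation of a specific tool):

```python
import json


def autoformat(text: str) -> str:
    """Re-indent a JSON document; braces, not whitespace, carry the structure."""
    return json.dumps(json.loads(text), indent=2, sort_keys=True) + "\n"


messy = '{"b": [1,\n 2],   "a":\n{"x": true}}'
pretty = autoformat(messy)
print(pretty)
```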

Significant whitespace is evil. It's just begging for copy/paste bugs.

Funny that's never been an issue for me after 2 decades of writing Python.

My Google skills didn't turn up anything, but are there any case studies showing it's more readable? I'm happy to accept that it is, but I can't help wondering whether research has been done or it's mostly gut feeling / anecdotes / aesthetics.

> that show it's more readable?

I'm not clear exactly what the "it" here refers to but as I mentioned in other comments it's fairly self-evident that indentation is easier to visually parse than braces. A simple thought experiment - would you find it easier to skim read code where the indentation was consistent with the bracing or where it was inconsistent? Your brain registers the indentation first and you only resort to counting braces if there's a reason to doubt the former.

Reality often runs counter to our expectations, though, which is why I wondered whether brace-free syntax has actually been shown to be easier to understand.

My point was that the debate is between "indentation alone is sufficient" and "braces plus indentation is better than just indentation".

Nobody advocates for "braces are better without indentation".

This surely implies that (practically) everyone agrees indentation is carrying most of the weight of visual indication of structure.

Of course "everyone" might be wrong - but that's a fairly tricky corner to defend.

> Nobody advocates for "braces are better without indentation".

^^^ THIS ^^^

That's just display though, if you have to show it to a skeptical client, why not run it through a browser that shows it without braces? It's the same as showing a webpage instead of the html.

I was specifically thinking about asking clients to edit or write these files. Isn't that a fairly common use case for config languages?

If programmers find getting YAML indentation correct difficult, how are non-programmers going to fare?

If the Home Assistant subreddit is anything to go by, it's their biggest complaint (HA configuration is in YAML).

That said, if they weren't complaining about whitespace, they'd be complaining about missing semicolons, missing/extra commas, missing equals signs, missing closing )]}, or whatever.

Are you arguing "braces are easier to get right than indentation", or is this a point specific to YAML's rules? Because I'm not defending the latter, but I find it hard to understand why I would need to argue against the former.

That's a good point someone brought up in a different comment. I haven't really dealt with those scenarios, so honestly I'll accept that as a reason.

I wish it could be displayed with braces though, I wonder if someone already has built that as an extension for editing / viewing yaml files.

YAML is a superset of JSON. In other words: any syntactically valid JSON file is a valid YAML file. If you want braces like JSON, but not quoted strings, YAML supports it.

> YAML is a superset of JSON.

That's false. http://p3rl.org/JSON::XS#JSON-and-YAML

You could run a formatter which adds braces if you like?

This makes zero sense to me. Why do config files out of all things have to be accessible to non-coders?

Because we are asking them to write these things in many cases.

I'm not buying that you genuinely have a target audience of "I trust this person with config files but their eyes are too gentle to see a curly brace".

Well. I've genuinely had clients editing YAML so there's that.

I can definitely think of a broad range of people where I'd be happy to recommend they use text files for config and data but I wouldn't be happy if those text files needed to follow the rules of JSON syntax.

I mean - to some extent I would rather not edit JSON. It's not a terribly ergonomic experience. If I had to design a format for my own use, it would be indentation based and probably look a little bit like YAML, Markdown or similar.

Because they're not code?

Isn't an indent a clear demarcation?

Not at all.

If it was code, an indentation error would often not compile or show errors or fail tests.

Configuration in YAML is much worse: most of the time an indentation error goes undetected until an application starts misbehaving.

Significant whitespace is perfectly ok for code but a huge footgun for YAML

Literally, yes.

But I find it incredibly annoying to estimate indentation when lines are wrapped in an editor (or webpage). Or, to a lesser extent but still throws me off, when multiple blocks end at the same line. Or when pasting blocks into another block, and having to double check to make sure the indentation was carried over correctly. I like editors that visually show indentation characters.

Only if you forbid tabs ...

If it must be created by non-programmers too delicate to match braces, how about BEGIN .. END and an autoformatter.

Or, NODE <name> .. END .
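To make the keyword idea concrete, here is a minimal sketch of a parser for the `NODE <name> .. END` variant. The `key value` line syntax inside a node is a hypothetical choice for illustration; the comment doesn't specify one. Because only the keywords define structure, indentation can be left entirely to an autoformatter, and an unclosed block is an error rather than silently lost entries.

```python
def parse_nodes(text: str) -> dict:
    """Parse 'NODE <name>' ... 'END' blocks containing 'key value' lines.

    Indentation is ignored; only the NODE/END keywords define structure.
    """
    root: dict = {}
    stack = [root]  # innermost open node is last
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith("NODE "):
            child: dict = {}
            stack[-1][line[5:].strip()] = child
            stack.append(child)
        elif line == "END":
            if len(stack) == 1:
                raise ValueError(f"line {lineno}: END without matching NODE")
            stack.pop()
        else:
            key, _, value = line.partition(" ")  # hypothetical 'key value' syntax
            stack[-1][key] = value.strip()
    if len(stack) != 1:
        raise ValueError("unclosed NODE block (truncated document?)")
    return root


# Deliberately sloppy indentation: it doesn't affect the result.
DOC = "NODE server\n  host example.com\n  NODE tls\n  cert /etc/ssl/cert.pem\n      END\nEND\n"
print(parse_nodes(DOC))
```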

Tabs or spaces for that indent?

It makes a difference to how it is parsed, so it's more than a dev's preference.

It really doesn't matter. Just force the leading indent to be exactly the same bytes. If indent moves between two values where one isn't a prefix of the other raise an error.

Is one \t the same as \s{4,} in bytes? That doesn't make any sense.

No it isn't. That is the point. If you have one line indented with 4 spaces and one line indented with a tab there is no correct answer what the difference of indent is. The only good option is to raise an error.
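A minimal sketch of the "exact same bytes" rule described above: indents are compared as raw strings, a deeper line must extend the current indent, a shallower line must land exactly on an enclosing indent, and if neither indent is a prefix of the other (e.g. a tab vs. four spaces) the parser raises instead of guessing a tab width. The function and exception names are hypothetical.

```python
class IndentError(ValueError):
    """Raised when indentation is ambiguous or inconsistent."""


def indent_depths(text: str) -> list[int]:
    """Return the nesting depth of each non-blank line, comparing indents as raw strings."""
    stack = [""]  # indent strings of the currently open blocks, outermost first
    depths: list[int] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines don't affect nesting
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        prev = stack[-1]
        if indent.startswith(prev):
            if indent != prev:
                stack.append(indent)  # deeper: open a new block
        elif prev.startswith(indent):
            # Shallower: close blocks until we reach this indent exactly.
            while len(stack[-1]) > len(indent):
                stack.pop()
            if stack[-1] != indent:
                raise IndentError(f"line {lineno}: dedent matches no enclosing block")
        else:
            raise IndentError(f"line {lineno}: indent {indent!r} is ambiguous after {prev!r}")
        depths.append(len(stack) - 1)
    return depths


print(indent_depths("a\n  b\n    c\n  d\ne"))
```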

Right, or encapsulate in some sort of braces ;-)

Sure, that is another option. But your code still looks confusing.

Personally I mostly use indent to read code, so requiring that the indent matches the semantic nesting makes it much easier for me to understand.

Looks confusing to who? Another coder? Then they have issues.

I get the original was talking about client facing config files. I'd rather see INI style config files personally.

If you're writing ugly code, braces or spaces won't save you. Just don't write ugly code.* Write it like the next person to view your code is an axe murderer who knows where you live, so don't make them mad. You can minify later.

*I'm ignoring Perl, as it's always ugly

An underrated property of braces in this case is that a truncated document is no longer valid (assuming your document only has one top-level item).

Truncate YAML and in most cases you still have valid YAML.
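A quick sketch of the braces side of this, using Python's standard `json` module on a hypothetical config: with a single top-level object, the closing `}` is the last character, so every truncation is rejected. The caveat in the comment matters, though: a bare top-level number such as `1234` truncated to `12` still parses. (There's no YAML parser in Python's standard library, so the YAML half isn't shown.)

```python
import json

# A hypothetical config with a single top-level object.
DOC = '{"servers": [{"host": "web1"}, {"host": "web2"}], "debug": false}'


def parses(text: str) -> bool:
    """Return True if `text` is a complete, valid JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Truncating anywhere drops the final "}", so no proper prefix should parse.
valid_truncations = [i for i in range(len(DOC)) if parses(DOC[:i])]
print("full document valid:", parses(DOC), "| valid truncations:", valid_truncations)
```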

I was surprised the first time I saw Daniel J. Bernstein's qmail configuration. Qmail uses separate configuration files for each parameter being set. The directory /var/qmail/control contains most of these files.

For example, to set the maximum message size to 10 MB and the timeout to 30 seconds:

    echo 10000000 > /var/qmail/control/databytes
    echo 30 > /var/qmail/control/timeoutsmtpd
There are many more files like this that hold simple values. /var/qmail/control/locals is a file that is a list of domain names, one per line.

Dictionaries are just subdirectories with one file per entry, for example this is how aliases are defined to qmail:

    echo fred > /var/qmail/alias/.qmail-postmaster
    echo fred > /var/qmail/alias/.qmail-mailer-daemon
See [1] for more about qmail.
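Reading such a directory back is about as simple as config loading gets. Here is a minimal sketch (the function name is hypothetical, and this is not how qmail itself is implemented) that returns each control file's lines, leaving it to the caller to decide whether a setting is a single value like `databytes` or a list like `locals`. The demo builds a throwaway directory rather than touching `/var/qmail/control`.

```python
import tempfile
from pathlib import Path


def read_control_dir(directory: Path) -> dict[str, list[str]]:
    """Read a qmail-style control directory: one file per setting, one value per line."""
    settings: dict[str, list[str]] = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            settings[path.name] = path.read_text().splitlines()
    return settings


with tempfile.TemporaryDirectory() as tmp:
    control = Path(tmp)
    (control / "databytes").write_text("10000000\n")
    (control / "locals").write_text("example.com\nexample.org\n")
    settings = read_control_dir(control)

max_bytes = int(settings["databytes"][0])  # the caller interprets the type
print(max_bytes, settings["locals"])
```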

DJB also created a simple, portable encoding for serializing data called netstrings, see [2]. XML, YAML, JSON, TOML, and INI files all have some advantages over netstrings, but netstrings are simple to understand and simple to parse correctly.

[1] https://www.oreilly.com/library/view/qmail/1565926285/ch04.h...

[2] https://en.wikipedia.org/wiki/Netstring
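To show how small netstrings are, here is a minimal encoder/decoder sketch. A netstring is the decimal byte length, a colon, the bytes, and a trailing comma, so `hello world!` becomes `12:hello world!,`. The decoder rejects leading zeros in the length and reports truncation explicitly, because the trailing comma must appear exactly where the length says it will.

```python
def encode_netstring(data: bytes) -> bytes:
    """Encode bytes as a netstring: <decimal length>:<data>,"""
    return str(len(data)).encode("ascii") + b":" + data + b","


def decode_netstrings(buf: bytes) -> list[bytes]:
    """Decode a concatenation of netstrings, rejecting malformed or truncated input."""
    items: list[bytes] = []
    pos = 0
    while pos < len(buf):
        colon = buf.find(b":", pos)
        if colon == -1:
            raise ValueError("missing ':' after length")
        length_field = buf[pos:colon]
        # The length must be plain decimal digits, with no leading zeros.
        if not length_field.isdigit() or (len(length_field) > 1 and length_field.startswith(b"0")):
            raise ValueError(f"bad length field {length_field!r}")
        end = colon + 1 + int(length_field)
        if buf[end:end + 1] != b",":
            raise ValueError("truncated netstring or missing trailing ','")
        items.append(buf[colon + 1:end])
        pos = end + 1
    return items


wire = encode_netstring(b"hello world!") + encode_netstring(b"")
print(wire, decode_netstrings(wire))
```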

I used this system myself for a project. It has some downsides, but overall it worked pretty dang well.

My opinion: I can live with YAML and JSON. TOML or TJSON if I have to. XML with a gun to my head. But I don't want yet another markup language (ironically, that's what YAML stands for).

What I want from YAML (or a competitor) is access to the concrete syntax tree.

For one of my art projects I make YAML files that describe the front side, back side, and web side of a "three sided card". I generate these out of several templates, currently using ordinary string templating.

I'd love to be able to load a YAML file, add something programmatically to the list, and have the list stay in the same format it was in. So if it was a

    [1,2,3]

list I get

    [1,2,3,4]

and if it was a

    - 1
    - 2
    - 3

list I want

    - 1
    - 2
    - 3
    - 4

Sadly I'm the only one who thinks this way.

In JavaScript, use https://www.npmjs.com/package/yaml for this:

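    import assert from 'assert'
    import { parseDocument } from 'yaml'

    // Flow-style lists stay flow-style after editing
    const flowDoc = parseDocument(`[1,2,3]`)
    flowDoc.add(4)
    assert.strictEqual(flowDoc.toString(), '[ 1, 2, 3, 4 ]\n')

    // Block-style lists stay block-style
    const blockDoc = parseDocument(`\
    - 1
    - 2
    - 3`)
    blockDoc.add(4)
    assert.strictEqual(blockDoc.toString(), `\
    - 1
    - 2
    - 3
    - 4
    `)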

You are not the only one. But even if you find a library for YAML AST transformations in your language, whatever other language consumes your YAML probably doesn't have one.

E.g. I tried exactly the same thing, and it was quite difficult in Rust, because the usual way to parse YAML there is with serde, which of course discards the AST.

In the end I gave up, and just used JSON for my use case.

YAML stands for "YAML Ain't Markup Language"

Which is more than a tad bit ironic, in retrospect.

Not really since it's true. It isn't markup, it's a configuration file format.

"<em>This</em> is a markup language" since there is text which is marked up.

YAML/JSON is a way to serialise fairly common data structures (arrays/lists, hashes/dictionaries, numbers, strings, bools, etc.)

Incidentally, if you can seamlessly replace XML with something like JSON, then you probably aren't using the 'markup' bit of XML.

Ah, yes, that’s totally correct. I was mentally glossing over the difference between markup and configuration languages.

There was a previous discussion about YAML:

YAML: Probably not so great after all (arp242.net)



To which I posted:


I was suspicious of YAML from day one, when they announced "Yet Another Markup Language (YAML) 1.0", because it obviously WASN'T a markup language. Who did they think they were fooling?


XML and HTML are markup languages. JSON and YAML are not markup languages. So when they finally realized their mistake, they had to retroactively do an about-face and rename it "YAML Ain’t Markup Language". That didn't inspire my confidence or look to me like they did their research and learned the lessons (and definitions) of other previous markup and non-markup languages, to avoid repeating old mistakes.

If YAML is defined by what it Ain't, instead of what it Is, then why is it so specifically obsessed with not being a Markup Language, when there are so many other more terrible kinds of languages it could focus on not being, like YATL Ain't Templating Language or YAPL Ain't Programming Language?


>YAML (/ˈjæməl/, rhymes with camel) was first proposed by Clark Evans in 2001, who designed it together with Ingy döt Net and Oren Ben-Kiki. Originally YAML was said to mean Yet Another Markup Language, referencing its purpose as a markup language with the yet another construct, but it was then repurposed as YAML Ain't Markup Language, a recursive acronym, to distinguish its purpose as data-oriented, rather than document markup.


>In computer text processing, a markup language is a system for annotating a document in a way that is syntactically distinguishable from the text. The idea and terminology evolved from the "marking up" of paper manuscripts (i.e., the revision instructions by editors), which is traditionally written with a red or blue pencil on authors' manuscripts. In digital media, this "blue pencil instruction text" was replaced by tags, which indicate what the parts of the document are, rather than details of how they might be shown on some display. This lets authors avoid formatting every instance of the same kind of thing redundantly (and possibly inconsistently). It also avoids the specification of fonts and dimensions which may not apply to many users (such as those with varying-size displays, impaired vision and screen-reading software).
