This assumption was discovered deep within some YAML generated by templating something else three levels down, and resulted in a complete k8s cluster failure for us, but only the 08 cluster! It worked in the first 7.
Damn! I’m guessing most developers today wouldn’t even know octal or how octal numeric literals are written with a “0” prefix. I actually laughed when I read this comment, but this behavior (and assumption) is ridiculous considering the fact that almost nobody would be using octal in some configuration file in 2023. Hexadecimal? Maybe. Decimal? Yes. Octal? C’mon!
Which, seeing it like that, raises the question: were Linux file permissions originally octal? Thinking about permissions and permutations... and old magic numbers, it seems like the bits might fit.
Exactly. I was hit by this so many times by not using quotes around file permissions, because 0644 is interpreted as 420 and Ansible sets the permissions to '-r---w----' instead of '-rw-r--r--'.
For those used to /usr/bin/chmod remember that modes are actually octal numbers. You must give Ansible enough information to parse them correctly. For consistent results, quote octal numbers (for example, '644' or '1777') so Ansible receives a string and can do its own conversion from string into number. Adding a leading zero (for example, 0755) works sometimes, but can fail in loops and some other circumstances.
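The failure mode can be sketched with plain stdlib Python, modelling the YAML 1.1 scalar rule rather than using an actual YAML parser (the `yaml11_int` helper is a rough simplification of the spec):

```python
import stat

def yaml11_int(scalar: str) -> int:
    """Rough model of how a YAML 1.1 parser resolves an unquoted integer."""
    if scalar.lower().startswith("0x"):
        return int(scalar, 16)
    if scalar.startswith("0") and len(scalar) > 1:
        return int(scalar, 8)          # leading zero means octal
    return int(scalar, 10)

# Unquoted `mode: 0644` reaches the tool as the decimal integer 420...
parsed = yaml11_int("0644")
assert parsed == 420

# ...and a tool that re-reads those digits as an octal mode gets 0420:
mode = int(str(parsed), 8)
assert stat.filemode(stat.S_IFREG | mode) == "-r---w----"

# Quoting ("0644") hands the tool the original string, which parses as intended:
assert stat.filemode(stat.S_IFREG | int("0644", 8)) == "-rw-r--r--"
```
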
Despite this, I still think YAML is a nice configuration language. The problem is not with YAML, it's with the whole complexity we're trying to represent there. Whatever configuration language you use, the outcome will be the same. Reduce the complexity, and it will shine.
> Whatever generated that clearly wasn't serializing its data using a library
Wow, you couldn't ask for a stronger condemnation than that for YAML as a configuration language. "You clearly weren't using a library to write your config file".
In the world I live in, config file formats exist to be written and read by humans. Serialization formats exist to be generated by libraries.
Yeah, Helm is... interesting. It's just a gnarly yaml -> template -> yaml cycle. It's better than nothing for the problem it solves... but god, there's gotta be a better way.
Just use JSON for everything that is supposed to accept YAML, and rely on the fact that the entirety of JSON is somehow a subset of YAML and that most YAML never actually needs the "features" of YAML.
At least with JSON, tooling can help you find invalid syntax in the middle of your file.
And JSON tooling/libraries (compared to YAML tooling/libraries) are usually faster, easier to use, have fewer bugs, and show fewer incompatibilities between implementations.
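For instance, Python's stdlib `json` module pinpoints the line and column of a syntax error, something a YAML parser choking on indentation is usually much vaguer about:

```python
import json

bad = """{
  "replicas": 3,
  "image" "nginx:1.25"
}"""  # missing colon after "image"

try:
    json.loads(bad)
except json.JSONDecodeError as err:
    # Reports the exact spot, e.g. "line 3, column ...: Expecting ':' delimiter"
    print(f"line {err.lineno}, column {err.colno}: {err.msg}")
```
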
There are a lot of problems in YAML, but I think the main real problem is trying to put logic inside configuration. YAML is still one of the nicest human-readable and -writeable data formats if it is only used for data and not logic.
CI/CD always has some logic, and it's almost never in pure YAML but also has some weird templating going on. Like, please, couldn't you just have a real API for some real programming language.
That's not really a problem with YAML, it's an abuse of YAML. Accidental Turing completeness is a problem all over, too. I remember using Ant 15 years ago - builds written in XML. It was the same problem, and it wasn't XML's fault.
I think a lack of type safety is the main problem with YAML - a wrong indentation, a typo in a key, a string that gets parsed as a boolean, etc.
Other than that I think it's a great format - terse and far less syntactic noise than other formats. This is why I wrote https://github.com/crdoconnor/strictyaml so people could write type safe yaml and get instantaneous, clear error messages on those things.
> That's not really a problem with YAML it's an abuse of YAML.
absolutely true, but the same happened to XML. XML is actually awesome with tooling that hasn't been matched to this day (XSD, XSLT, just to name a few).
But people abused it, it got a bad rap, browsers made JSON preferable, etc.
But the point is that companies are using YAML for these things and we're starting to see attitudes shift just like they did when the industry went nuts and abused XML.
Nickel (https://nickel-lang.org/) is a new configuration language which builds upon lessons learnt from YAML, TOML, JSON, JSONNet, Nix and many others. It has optional types and logic (functions, variables) for when you need it.
Every language website should have an example or 2 on the homepage. Show me the code/syntax, and add an explanation if you need to but stop with the feature lists. If it can't explain itself then... I don't know.
In my experience, the more a language is able to do, the more it will be used in ways that overcomplicate stuff.
People will just pull in abstractions because they think it is needed. This is why I always avoid using Turing complete programming languages as a configuration format.
I've spent hours using a JavaScript debugger to step through AWS CDK. A simple, dumb yaml/Json/whatever file wouldn't have that problem (and it was a small project, the complexity wasn't needed). Sure, there might be use-cases where you need to be able to do more. But until now, I've never seen that.
This is why I also prefer JSON over JS in JS tool configuration. This is why webpack configs get so messy. As soon as people can use a real language, their "DRY sensor" (or other badly applied paradigm sensor) triggers and they make things more complicated.
Being declarative also makes it easier to follow standard practice and supports tooling. Imagine the package.json would actually be like a build.gradle. That would make things much worse.
I like TOML even more than JSON, but the real issue to me is that we, or most of us, have this subtle drive to generalize, and the more things we can make configurable the better. But as it is said, when something is fully configurable, that is like saying it isn't done and assembly is required. Also, we end up designing declarative domain-specific languages. But I don't want to program in JSON, YAML, or TOML. They aren't made for that, and it hurts me. For Rust, the clap crate helps put the brakes on if you agree on a simple rule: anything that is a config file parameter is also a command line argument and an environment variable. It is one struct that gets populated by the combination of those things. If you wouldn't make it one of them, don't make it any of them.
Executable config can bring huge wins. Python is an obvious choice. Just say, I'm going to run your script in a Python interpreter (in heavily limited cgroup) and your script must result in a dictionary called CONFIG. Some wrapper logic could then serialise that in whatever way is convenient for the program under configuration.
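A minimal sketch of that wrapper, under the comment's own contract (a dict named `CONFIG`); note the sandboxing is omitted here, since `exec()` alone is not a security boundary and real isolation needs a separate resource-limited process, as the comment says:

```python
import json

def load_python_config(path: str) -> dict:
    """Execute a user config script and pull out its CONFIG dict.
    Sketch only: exec() is not a security boundary."""
    namespace: dict = {}
    with open(path) as f:
        exec(compile(f.read(), path, "exec"), namespace)
    config = namespace.get("CONFIG")
    if not isinstance(config, dict):
        raise ValueError(f"{path} must define a dict named CONFIG")
    return config

def serialise(config: dict) -> str:
    """The wrapper then hands the program whatever format it prefers."""
    return json.dumps(config, indent=2, sort_keys=True)
```

The payoff is that the user gets real loops and arithmetic while the program under configuration still only ever sees plain data.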
Declarative vs imperative. Languages have state and behavior.
I've had to convert 50k class-based schemas to a declarative DSL; the requirement is to be declarative, not rely on external state, and be portable as-is with as little interpretation as possible.
The flexibility of a language means some bug deep in a library will cause your infra to delete or misconfigure itself by accident.
An API requires transactions, or at the very least a read-modify-write cycle. The API needs to always be up so engineers can merge their modifications to infrastructure. Recovery and undo are also problems that need to be solved, so you need a system with point-in-time recovery.
My preference is to have an actual language generate what the system will do in a human-readable format, and you check that in. Git and JSON or YAML with a Proto DSL act as the API, and then anything that can write a flat file can be used to configure your system.
Yeah, but we're adults and just because we can do something doesn't mean we have to. You can write shit code and you can write shit configs. Python can be written in a declarative style with perhaps a little magic sprinkled when pragmatic to do so. Completely banning imperative programming is like throwing out the baby with bath water.
If the project could benefit from such a configuration, why not? It's not a super heavy dependency and most distros include it by default. Obviously doesn't make sense for every project, but something like a CI system or desktop window manager, sure.
Why is "bad" schema design YAML's fault and not the schema designer's? It's like complaining about the complexity or rigidness of Helm. The Helm charts do not write themselves. It seems like it's easier to complain and dismiss than to understand and implement.
I agree with this 100%. When people talk about hating YAML, I think a lot of the time they really mean "I hate describing pipelines in YAML". And I get that because I feel it too.
As a file-format, YAML has pros and cons. But the real problem is in trying to use a JSON-equivalent file format to describe things that have conditionals,
loops, functions, and classes/subclasses (by way of templates).
YAML is fine for small configurations. But it grows into a vendor-specific spaghetti mess very quickly, especially once you start to need any kind of control flow.
While I think you are asking in jest, there are some things you can do, which amount to using it as minimally as possible. Basically, make it as few steps as possible, using the minimal number of options, and just call your own scripts.
The secondary benefit of this is that you move towards a world where you can "run CI" locally. This is something I try to do with all my CI now. If you've ever pushed a stream of commits trying to fix a CI only bug, you'll really appreciate the green of the grass once you get to the other side :]
I think Jinja-in-YAML is very much an antipattern. It seems to come from not designing for enough programmability from the start (and then from people copying successful projects that took that route?).
The page mentions some alternatives like Dhall and Jsonnet. Two more to consider:
1. Write a configuration library for a real programming language. Have this library generate a JSON configuration file that is treated purely as an inspectable artifact, not something to edit by hand. The user will have their configuration as tooling-enabled code checked into version control (hopefully). That it will be harder to make an emergency fix directly on the server has its downsides and upsides.
(What was the first prominent piece of software to implement this idea? I kind of got it by osmosis.)
2. Starlark. It is a non-Turing-complete language derived from Python that was originally developed for the Bazel build system. It has several implementations (I am not sure how deeply compatible they are) and now Python bindings (https://github.com/caketop/python-starlark-go for starlark-go, https://github.com/inducer/starlark-pyo3 for starlark-rust).
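The first alternative (a config library emitting a JSON artifact) can be sketched in a few lines; the `ServiceConfig` schema here is made up for illustration, a real library would mirror the target application's options:

```python
import dataclasses
import json

@dataclasses.dataclass
class ServiceConfig:
    """Hypothetical schema for illustration only."""
    name: str
    replicas: int = 1
    env: dict = dataclasses.field(default_factory=dict)

def emit(cfg: ServiceConfig) -> str:
    """Render the JSON artifact. Humans edit the Python and review the
    diff of this output; nobody hand-edits the generated file."""
    return json.dumps(dataclasses.asdict(cfg), indent=2, sort_keys=True)

artifact = emit(ServiceConfig(name="api", replicas=3, env={"LOG_LEVEL": "info"}))
```

Both the Python source and the generated artifact go into version control, so the artifact doubles as an inspectable record of what the code actually produced.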
I also think Jinja in YAML is an antipattern, but I think that's due to it using the default block and expression delimiters of "{%" and "{{", both of which are YAML characters, so every single such occurrence needs to be quoted, versus the very sane "${{" used by GHA, or even "<%" and "<<" (running the risk that, yes, "<<:" is a yamlism but is not legal Jinja).
If you meant that having any executable inside yaml (e.g. not just picking on jinja, but "a yaml document is not code") then I fear that ship has sailed because people realized that inverting the literal parts and only sometimes having executable parts makes for a great way to ASP/JSP/PHP up some content
If I wanted to start a sibling flame war within this thread, we'd talk about HCL and for_each :rolling_eyes: (but, for clarity, please don't talk about it, or at least not here; submit your own noHCL.com link)
> 1. Write a configuration library for a real programming language. Have this library generate a JSON configuration file that is treated purely as an inspectable artifact, not something to edit by hand. The user will have their configuration as tooling-enabled code checked into version control (hopefully). That it will be harder to make an emergency fix directly on the server has its downsides and upsides.
this is the route Amazon chose with CDK. It kinda works... but it really feels like you're building a Rube Goldberg machine to do anything non-trivial. How much of it is the CDK and how much is CloudFormation plainly sucking, I don't know.
I agree. I find working with templated YAML so cumbersome that I ended up creating a tool (Cels - https://github.com/pacha/cels) because of it. I like Jsonnet and Starlark but in practice I don’t usually need a new programming language for most use cases. Most of the time I just want to create a base document and apply patches to do modifications. That simplifies everything a lot.
The experience of writing pure YAML is actually not that bad (a couple of things about the format are really questionable, but it is workable). I find that problems appear from the complexity of the solutions you have to add to adapt documents to different environments.
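A toy version of the base-plus-patches idea is just a recursive overlay (this is a plain deep merge for illustration, not the actual patch semantics of Cels or of JSON Merge Patch):

```python
def apply_patch(base: dict, patch: dict) -> dict:
    """Recursively overlay `patch` onto `base`, returning a new dict."""
    merged = dict(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_patch(merged[key], value)
        else:
            merged[key] = value
    return merged

base = {"image": "app:1.0", "resources": {"cpu": "100m", "mem": "128Mi"}}
prod = apply_patch(base, {"resources": {"mem": "512Mi"}})
# prod keeps the base cpu setting and only overrides mem
```
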
> I think Jinja-in-YAML is very much an antipattern.
The thought of working with Ansible again is my nightmare due to this. Apparently they had an actual Python DSL you could use instead of YAML for a while, but discontinued it? So now loops and if-thens are these awful drawn-out messes of YAML that interpret jinja in completely unpredictable ways.
I would also like to avoid writing Ansible playbooks when I need to manage servers again.
Something I want to try out is Pyinfra (https://pyinfra.com/).
It is like Ansible in that it is agentless and works over SSH, but with configuration in Python instead of YAML and Jinja.
There is also itamae (Ruby, works over SSH or on the local machine, https://github.com/itamae-kitchen/itamae) and mitamae (mruby, local-only, https://github.com/itamae-kitchen/mitamae).
YAML is great in my opinion. What isn’t great is how hard we have made the deployment part of CD. I’ll admit that our setup within Azure DevOps isn’t amazing, but the fact that you have organisations with multiple teams or 5-6 operators working on these tools is just mind-blowing to me. Maybe not for Google, but for us in regular boring enterprise, where we top out at 50,000 concurrent users, or typically way, way fewer.
It was easier for me to deploy enterprise web-applications to our on-prem IIS that had all the load balancing and networking and whatever else done by less than a quarter of a full time employee back in the early 00’s than it is for me to deploy the same damn thing into our “modern” setup. Yes, there are advantages to our modern pipeline. We’ve moved far beyond the whole “it works on my computer” thing and we’ve increased the quality control immensely by putting up much better acceptance gates, but the actual deployment? That’s a damn nightmare in 2023.
Again, this is likely not an issue for many HN programmers who work in actual tech companies, or in companies with good dedicated DevOps teams, but in the world of non-tech enterprise, CI/CD has frankly never felt worse in my career. I guess you can choose to blame YAML, the loads of it you need to do anything, and how hard templating seems to be. But in my opinion it’s far more of an organisational issue than a technical one. We need CD tools that are far more automated, so that it doesn’t become the job of the developer to describe the infrastructure as code. It’s fine that you can do that, but the reality is also that we are asking millions of developers to deploy infrastructure that they might not have the slightest clue how it works. I’ve never met a developer who didn’t just want to hand off their container and then have all the networking and “server stuff” just done for them. And when you don’t, you end up with a lot of VNETs and subnets that nobody really knows how they work, and that your organisation is losing a ton of money on, because your developers didn’t know you could do a /x and then start the next subnet where the last one ended.
You work offline on a "job definition" of some sort, submit it to the proprietary shared system, wait in a queue, then you receive a log file generated by a system you don't control. You can't run the proprietary system code locally on a workstation, and hence the inner loop is often tens of minutes long at best, hours or days at worst. There's no preview, no "what if" mode or "dry run". You work directly on production even if it's called "test" because there's only one system.
The real problem isn't YAML. It wouldn't matter if the pipelines were scripted in God's own programming language.
Software development on workstations instead of central timeshare mainframes became wildly popular because it allowed a dramatically faster inner loop, it allowed isolation from production environments, and it gave control back into the hands of the developers.
Current-generation CI/CD pipelines generally undo all of that.
Single-box Kubernetes reintroduces most of what made workstation-based development good, but it is still a very new system and has many teething issues.
PS: A related issue to yours is that there are great solutions for the solo dev doing click-ops for one app, and great solutions for megacorps doing automation at scale for thousands of devs, but in the middle, where you have a couple of enterprise devs managing a few dozen apps, it's just madness.
> We’ve moved far beyond the whole “it works on my computer” thing
I dunno. I've had things working on my docker image quite a few times while breaking at the deployed one.
That's only solved if you have complete transparency into the entire build and deployment pipeline, total access to the image repository, actual control over the build instructions, etc. That's about as restrictive as control over a local OS, and I expect it to fail in about as many organizations as the ones that had problems with code breaking when moved onto another machine.
Access to our container registry breaks all the time for no reason anyone has been able to figure out. Not even the third-party people we pay a lot of money to help. Basically we now both add a managed ID to the AD group with pull rights and set up rights the old-fashioned way, and now it works maybe 95% of the time. Sometimes it still doesn't, and usually the fix is to delete everything and deploy again.
As far as actually running the build images, however, if they run in Docker locally then we never have issues with running them in the cloud.
What's your problem with deployment? In everything I've set up I just tag a commit, push it, then that commit gets deployed. It seems really straightforward to set that up in any CI/CD system.
It's in the resource allocation/generation process of your deployment things get tricky.
The first thing you're going to have to do is the networking. You're not going to deploy things on the open internet in an enterprise organisation, so you're going to have something that makes sure your application gets on the right VNET and the right subnet, obtains the correct private endpoints, and gets a DNS address so people can go to https://yourapp.yourcompany.com from the internal on-prem network, which will need DNS routing as well because it's not going to ask the internet. Once in a while, when you actually want your app to be accessed from the internet, you're going to need some sort of gateway resource that can pass the traffic through whatever security your org has set up, in such a way that it blocks every IP you're not intending to let through.
The second thing you're going to want is a place to persist your data. You can have a centralized shared database, and most places use this, but the issue is that it'll eventually end up a mess. Not by bad intention, but because it's just very easy to build stored procedures, views, and whatever else on top of the data from all those 100+ applications sharing the same DB server, because it's a quick way to deliver business value (until it isn't). It also doesn't scale, which will eventually be an issue for you even if you aren't Netflix. The flip side is that you can't actually do a bunch of managed DBs, because that is ridiculously expensive, and you likely can't use CosmosDB (or whatever the AWS equivalent is called) because of company red tape, and also because your third-party DB backup vendor doesn't support it. So you either run them in containers, or share servers, or run up the bill. Two of those solutions require all the steps in the networking step, because they too need to be on the right private endpoints on the right subnets on the right VNETs, and all of them will need you to grant some sort of access rights through your IDM. Which is often as simple as giving your apps some sort of managed ID and then adding it to some AD group, but that's still something that needs to be done.
Deployment slots: now we're moving into the more ridiculous parts of it, but you're probably going to need to deploy your application at least twice. It'll probably be three or more times, but it'll be at least twice. Each of these slots requires that you go through the two previous steps. If you're wondering why, it can be as silly as because some Enterprise Architect lied in a meeting and convinced everyone it was required by law, similar to how you're also forced to put 4-6 eyes on every pull request (though to be fair, this part makes sense).
And that's if you're using serverless dockerized function apps in Azure in an enterprise organisation. If you're not doing serverless it gets way worse. Sure kubernetes, travis, openshift and so on can do it much smarter, but in non-tech enterprise you don't have anyone employed to manage it.
@jiggawatts really put it well when he said this in one of the other comments:
> there are great solutions for the solo dev doing click-ops for one app, there are great solutions for megacorps doing automation at scale for thousands of devs, but in the middle where you have a couple of enterprise devs managing a few dozen apps it's just madness.
In all seriousness, I've written and deployed business-critical serverless applications that are less code than the Bicep and YAML required to deploy them. And here is the biggest issue: I have no idea if my Bicep is actually good, because GPT wrote most of it.
Can't help thinking a lot of that is self-inflicted. The small company I work at is at "in between" size. We use Artifactory and Gitlab, both on-prem, and an on-prem k8s cluster. I'm not saying nobody needs more, but you can work your way up, it's not an all or nothing thing.
Sounds like you develop your own code, and/or get a choice of language, framework, or platform.
In enterprise settings the story goes more like: “We signed a contract with a vendor you’ve never heard of, now make it work. Their account manager assured me we don’t need all that fancy cloud stuff you keep talking about.”
Also: “We just signed a $1M contract for this cloud automation tool without talking to you first, the only person who will have to use it.”
(Both of the above are very lightly paraphrased from conversations I’ve had… this week.)
Yeah... I understand. I worked in an enterprise like that for a while. Got out as soon as I could. Can't stand the glacially slow pace of actually getting things (not bullshit) done. I don't think your problem is with tooling, it's with enterprise bullshit.
I think I have a solution to keep the peace, if we can all universally respect one rule:
Never use YAML outside of the Python ecosystem.
That way, people who love esoteric scripting formats that prioritize legibility over correctness, durability, and maintainability can keep all their tab characters, loose typing, and cryptic syntax. And the rest of us never have to! Those of us who prefer C-style syntax can keep our sanity.
Edit: OK, I think I've finally got my finger on the crux of the issue here. I think I can't explain this whole thing without being spicy. For C-syntax devs like myself, syntactic whitespace is pure madness. Whitespace is not information or an instruction -- it's formatting. Good formatting is helpful and useful, and good C-syntax devs care about legible formatting. In Python (and YAML), the formatting is instructional information. This has the benefit of making all functional code legible. But why the blazes does code have to be legible to be functional?

Imagine working with a YAML coworker. You send them a long message. They reply with, "What? This is nonsense." You suddenly realize your mistake. Without separating your paragraphs with an empty line, you've broken the meaning of what you sent. Adding the empty lines back in and re-sending the chat message, your co-worker can now read what you wrote. Without syntactically accurate formatting, the information you sent was meaningless.
Another purpose of code, as in software, is to be readable (aside from being executable). There are a million ways to do the formatting, and I would prefer people to use a linter when they write code (hopefully the same linter I'm using).
Those languages enforce a standard syntactic structure; there are fewer ways to write unreadable code, which is a good thing.
I prefer the middle ground taken by Go and others—have a standard formatter that is part of the language that everyone agrees to use, but don't make that syntactically relevant. That way you get readable code without individual coders ever having to manually fiddle with indentation levels, which is a source of endless problems when copy/pasting.
Agreed, the top comment smacks of gatekeeping. I work in both Python(ic) languages and in C/C++; both have their strengths and weaknesses. No need to start flame wars.
Generally it's assumed that significant whitespace is a philosophical or even religious issue: some people love it, others hate it, and while both camps rationalize their preferences, it seems that's all it is: just a strong preference.
However, I came to think that the difference is not philosophical. It is just that some tools (text editors, e-mail programs, etc.) support significant whitespace well, while others don't.
For example, the text editors I use are all set up to display space and tab characters, and to display them differently (usually as a faint dot and a faint mdash, or something). I am used to it, and this does not distract me at all.
From my point of view, code is not some random text. We use fixed-width fonts for it that we would not use in a book, we color-code it, I don't see why not make the white space visible.
I still have a "philosophical" preference for languages without significant whitespace, but I don't hate languages with it, they are not a problem for me at all.
But if the tools you use and love don't have good support for significant whitespace, or you won't make the whitespace visible, or, you know, you even use variable-width fonts for coding... Well, then you will hate significant whitespace with a passion and consider it pure madness!
Coming from a C background, I had the same views as you regarding whitespace not being information or instruction. I'll admit I looked down my nose a bit at Python in its early days because of it.
What changed my mind, of all things, was writing Coffeescript¹. I'm not a big fan of Javascript, and writing Coffeescript felt like a distillation of Crockford's The Good Parts book. You couldn't forget and generate the bad parts.
But additionally, I found the indentation-as-code quite pleasant. The only hassle is not being able to press % on an opening or closing curly brace in vi to find the other end of the block. OTOH, with the indentation, if a piece of code looked wrong, it probably was.
Mind you, I still haven't learnt much Python.
¹ These days I use Typescript, but if they ever came out with a Coffeetypescript…
I think TOML is a great format. Almost self-documenting, easy to understand, fewer footguns for the uninitiated, and best of all, it's based on the INI format, but smarter.
If you think you need to serialize something more complex, use XML. It might look ugly, but it’s very powerful, can be verified on many levels and has a mature ecosystem.
“Cooler” doesn’t always mean better, especially for bigger data. Neither YAML nor JSON scales to those sizes while staying readable and easy to maintain.
This. I've said it in one of the other threads re: YAML. TOML is great for simple user-editable configs. Once you get into data-exchange territory, use XML. Just the ability to define a schema and test that your XML document conforms to it is powerful, and can tell you something's wrong long before you end up with catastrophic failures.
So what if it's ugly? Is this a serialization format or paintbrush?
Also, I think TOML stdlib gets to the heart of what I've been arguing. People who like syntactic whitespace like YAML. Everyone else (like C devs) don't. It makes quite a bit of sense to me that C devs don't want to touch YAML with a 10-ft pole.
It will not immediately help in all cases of YAML usage, but it can at least be a cooler way of defining resources in a Terraform-like style. In fact, it has already proven helpful as a replacement for HCL in one internal project, which was the final motivation to hack it together.
In the bigger picture, I have no idea how to help with YAML's omnipresence in Kubernetes. More than half of my problems at $daily_job come from how crude consolidating a final Helm chart from different sources is. I am not saying that Helm is inherently a bad tool, or that my company has chosen a particularly bad way of using it - I guess everyone is doing their best considering the circumstances. But manipulating textual templates with semantic whitespace is just too error-prone, and the detection of errors happens too late. I dare say Kubernetes would do much better with a custom format based on a C-like syntax, instead of trying to prove how cool YAML is, especially when it isn't.
Is that an academic property, or are people serious about throwing JSON into running software that parses YAML and expecting good results?
(Should I add: on production, because I want my tools to be used on production, but that's just me)
Hm, what JSON documents have a different meaning in YAML? I'd have hoped that with all names & strings being quoted and leading zeros being disallowed, that wouldn't be the case.
Yeah, UCL looks syntactically similar. On the other hand, I am not sure if it has expressions, and the variable usage is limited - they need to be registered in the parser, so they exist outside of the config file; plus, they seem to represent just text.
My assumption was to use variables in the same file, to make the content more modular/DRY, and to evaluate expressions using these variables. Moreover, variables and expressions have basic types: int, float, string and bool. This is not an artificial need; I will give you concrete examples of what I want to have in the config file:
- port numbers increased from a base value (int arithmetic)
- fields enabled if some other condition is true (comparisons, bool arithmetic)
- names concatenated from text and numbers
What else... in BCL, variables are internal to the file; I also want to resolve information that exists external to the file, which would be: env variables (planned builtin function getenv) and, ideally, the output of executing a command (like shell's $(), also planned). With this I can express quite complex configurations easily and concisely.
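That wish list can be sketched in stdlib Python; everything here (the leading `=` marker, the tiny fixed evaluation scope) is hypothetical for illustration, not the project's actual syntax:

```python
import os

def resolve(config: dict) -> dict:
    """Evaluate values marked with a leading '=' as small typed expressions.
    The marker and the scope are made up for this sketch."""
    scope = {
        "base_port": config.get("base_port", 0),
        "getenv": lambda key, default="": os.environ.get(key, default),
        "str": str,
    }
    out = {}
    for key, value in config.items():
        if isinstance(value, str) and value.startswith("="):
            # Evaluate against the tiny fixed scope; no builtins available.
            out[key] = eval(value[1:], {"__builtins__": {}}, scope)
        else:
            out[key] = value
    return out

cfg = {
    "base_port": 8000,
    "metrics_port": "=base_port + 1",       # int arithmetic from a base value
    "debug": "=base_port < 1024",           # comparison yields a bool
    "name": "='svc-' + str(base_port)",     # text/number concatenation
}
```
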
Actually thanks a lot for your comment, because answering this makes me realize I should put such information in the README of my project, maybe in some more structured way.
This is the inner-platform effect. As an application grows, its configuration expands until it becomes a programming language, except bug-ridden, underspecified, and with terrible ergonomics. Configuration bankruptcy is declared and a new configuration format is picked. Rinse, repeat.
The format itself is not without blame, of course: the more flexible it is, the more easily it can be repurposed into a (bad) programming language.
Having committed this mistake over and over, these days, I would pick the simplest config format possible (even .ini might be too powerful) for basic configuration, and delegate more complex "configuration" to a real programming language (possibly the same language the application is written in).
The vast (VAST) majority of the "YAML sucks" examples are solved just by quoting all your weird literals. YAML is definitely annoying at times (lists of maps get weird in a hurry, for example, and significant whitespace almost always bites you sooner or later) but these kinds of articles seem disingenuous at best.
> The vast (VAST) majority of the "YAML sucks" examples are solved just by quoting all your weird literals.
But none of the examples you see do that, and your tools won't do that. The whole YAML ecosystem nudges you into writing your stuff unquoted, and it works most of the time and breaks just often enough to trip you up in production.
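For a concrete illustration of the quoting point, here is a quick sketch with PyYAML (a YAML 1.1 parser); the key and value are made up:

```python
import yaml  # PyYAML, the de facto standard Python YAML library

# Unquoted scalars get type-guessed under YAML 1.1 rules.
unquoted = yaml.safe_load("version: 1.10")
quoted = yaml.safe_load("version: '1.10'")

print(unquoted)  # {'version': 1.1}  -- parsed as a float, trailing zero lost
print(quoted)    # {'version': '1.10'} -- the string you actually meant
```

Nothing warns you about the first form; it loads fine and silently changes your data.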
Then there are docstrings (which have special language support), textwrap.dedent, type annotation strings, and several different interpolation systems, and most people don't even know some of them.
YAML's strings are overblown and are just the result of having a few modifiers. Of all the things to complain about, having ways to make chunks of text readable is the weirdest one. Don't even dare look at how many weird strings bash has.
2. I think bash is a bad tool and we just can't get rid of it because it's ubiquitous and people are used to it. Stockholm syndrome. I wish something like Oil shell or Nu shell would take over, ideally a shell which passes objects, not strings.
I love YSH! But that's an odd take, since it's heavily inspired by Python and also has a bunch of string types and 5 different types of string interpolation (I did have to look that up). Combine that with OSH and you have a language ecosystem that has more weirdness than Bash and Python combined.
* three string syntaxes (regular, formatted, and raw) plus byte arrays.
* single vs double quotes, which are insignificant, other than that you don't need to escape the containing literal quote type.
* inline vs multiline.
These all compose in a predictable way.
If you want to be a jerk, you can multiply 4 * 2 * 2 and say "ZOMG 16 string types!!", but that's about as fair as saying that having positive/negative, octal/hex/dec, int/float, and standard/scientific means that a language has 24 numerical types.
They are actually really useful when you want multi-line formatted strings without indentation and had to be made official when people wanted to take them away.
> If you want to be a jerk, you can multiply 4 * 2 * 2 and say "ZOMG 16 string types!!"
And now you understand how they calculated the number of string types in YAML.
This is the first data format I've seen that makes an explicit distinction between lists and sets, which is good. But I'm not sure what the semantic distinction is between lists and vectors: in my mind, arrays and linked lists are implementation details of the data structure in code, not the data format.
The distinction between lists and vectors manifests from the same two distinct Clojure collection types. They are syntactically distinct within the language and are present to provide homoiconicity -- code as data -- Clojure lisp expressions can be represented in EDN. Vectors, [], are typically the preferred collection literal.
Indeed, it translates into and out of JSON quite well, and is much more human readable. A much much saner format than YAML. However, EDN is less hospitable to non-Lisp code blobs and heredocs, which limits its adoption IMO.
Not precisely whitespace-insensitive, as whitespace is required for element demarcation/separation. However, semantic indentation shenanigans are non-existent. Commas are considered whitespace and unnecessary -- which is beautiful.
When students send in homework through our e-learning platform, we receive all the submissions in a large-ish XML file. We read the submissions, pass them on to some (static) analysis and example execution, and write out (per exercise) a YAML file with all the submissions, grading hints, and fields for our comments and grading, etc.
We then generate reports/statistics/feedback pdfs from the yaml files via markdown+pandoc.
For us YAML works great, as it's easy to add further feedback in Markdown syntax (just correctly indented "- you missed a `NOT` here"). Via the different escape methods for block text, we can print the SQL submissions nicely formatted without escape characters, etc., even when students use all the different SQL delimiters. It is all plain text, so we just use a text editor, and we store it all in git and have accountability over grading. We have everything archived machine-readable, so that we may test new static analysis stuff on past submissions.
But also having to write CI-pipelines and home-automation configurations in yaml, I understand the struggle.
Stripping indentation for embedded text is probably the nicest feature YAML has.
In TOML you either just forget about indenting multi-line strings (which makes them less readable) or you have to remember to put a backslash at the end of each line. Neither option is ideal.
This makes YAML pretty nice for DSLs or configurations that need to embed markdown or other textual format and give it an edge over something like TOML.
But I won't put the entire blame for "YAML fatigue" on all the CI and DevOps tools that chose to use it as the carrier format for their DSLs. YAML has major issues which the OP site outlined very well:
- The famous "Norway issue" (solved in YAML 1.2)
- Leading zeros parse as octal (solved in YAML 1.2)
- Overeager type-coercion for numbers, dates, times etc. can be confusing.
- Different modes for processing multi-line strings can be quite confusing.
- Unsafe serialization (not an issue for modern parsers, but you need to be careful when using an older YAML parser in a language with dynamic features like Ruby, Python or Java).
All of these are issues with the YAML spec itself.
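The first two are easy to reproduce with any YAML 1.1 parser; a sketch using PyYAML (which implements 1.1), with invented field names:

```python
import yaml  # PyYAML implements YAML 1.1, so both issues below reproduce

doc = """
country: NO    # the "Norway issue": NO is a YAML 1.1 boolean
mode: 0644     # leading zero: parsed as an octal integer
cluster: 08    # 8 is not an octal digit, so this falls back to a string
"""
data = yaml.safe_load(doc)
print(data)  # {'country': False, 'mode': 420, 'cluster': '08'}
```

The `08` case is the same trap described elsewhere in this thread: clusters 01 through 07 parse as integers, and 08 suddenly comes back as a string.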
There is another fundamental problem that amplifies the YAML loathing:
If you're on the dev team, you solve the problem by programming. You also leave and/or create other problems for the devops team to solve because of microservices, 3rd party integration, etc. Depending on your dev environment, this cycle can range from incredibly quick to incredibly slow.
If you're on the devops team, you're often left with solving the problem with a configuration file only. The dev cycle for that is usually horrendous both in time and feedback on each iteration. Anything that gets in your way becomes a victim of your wrath and anger.
I feel like we keep making the same mistake with these configuration formats.
Most of the problems with YAML are the problems that XML had in the early Spring Java days, and most of those problems stemmed from "programming" by gluing together software via configuration files. This isn't a good pattern regardless of how you format the files.
The moment you start to reach for schema definitions and validation for your configuration language, you should consider that a warning sign. We've been down this road before. It does not lead to a good place.
I really like what dhall-lang (https://dhall-lang.org) is doing in that sense. It manages to strike a nice balance between power and simplicity from the start.
I have seen this page more than a few times, but it took me until this viewing to realise that the document shown is not valid YAML, because it uses tabs, which YAML rejects.
Not that joins in SQL use a particularly intuitive notation. At the very least it doesn't make it obvious you can usually swap the order of joins around without affecting the end result.
Joins can certainly work in a data format like YAML. For an example, see Honey SQL from the Clojure community [0] (though without something to contrast strings like Clojure's keywords, you miss out on the automatic parameterization).
You mentioned moving JOINs around, so I'll mention that if represented as structured data, you can move any of the top level components around, so you could more closely follow the "true order of SQL" [1]. For example, I would love to be able to put FROM before SELECT in all or almost all cases. There's also being able to share and add to something like a complicated WHERE clause, where essentially all programming languages have built-in facilities for robustly manipulating ordered and associative data compared to string manipulation, which is not well-suited for the task.
Now don't get me wrong, I don't particularly care for YAML (though it doesn't bother me that much), but as someone who's done their fair share of programmatic SQL creation and manipulation in strings, not having a native way to represent SQL as data is a mistake in my opinion.
Yeah those examples aren't really fixing the problem I was referring to, if anything they're making it worse.
What I was referring to is that 'SELECT * FROM A JOIN B ON f(A) = g(B)' and 'SELECT * FROM B JOIN A ON f(A) = g(B)' mean exactly the same thing, but this is not obvious from the language. This is especially iffy when you start joining even more tables together.
The equivalence is clear when you write out the corresponding diagram and note that the join is its limit, but your examples seem to make the same mistake SQL did by lumping together the join condition with one of the two tables.
Does it seem clear to you that a + b is equivalent to b + a? If you know the semantics of join operations (as you should to at least a basic degree if you're going to use them), the SQL notation above is just as clear as the arithmetic notation for addition.
Hm, maybe it's because I'm so used to SQL, but I had to lol about the yaml example, I think it's quite nasty to read and I'm glad that SQL looks different.
It's not that different from formatted SQL. It's uglier, though, IMO, and more error prone. But to hate YAML for the fact that SQL looks less familiar in it, is a weird take.
I guess the broader point they're driving at is that attempting to port familiar programming constructs to YAML DSLs will result in this kind of ugliness. If that's true though, it's a bit too subtle for my tastes.
GenX: “Here’s XML, it can be used as a data exchange format and here’s a tool ecosystem to ensure structure and correctness”
Millennials: “That is way too much text, it’s like way longer than an SMS message. And everything’s on the internet. Here’s JSON, which is much shorter and more Web 2.0.”
GenZ: “Bra, all those squiggly lines are sus, try this YAML. It’s dank, no cap.”
On the topic of "shocking deficiencies in markup/configuration languages": I only recently discovered there are some characters in XML that are outright illegal, even when escaped[1]. Implementations vary on whether they actually honor these restrictions, making interoperability an additional issue.
As a result, user-supplied input can't just be escaped: it may also need to be base64 encoded or otherwise transformed into a safe alphabet.
I guess it's just me but that website is quite readable on desktop. I mean, all I'm seeing is perfectly legible prose and satire while still being a legitimate config file. I literally would share this website as proof of Yaml's readability.
As to your frustration with logic and code being stuffed into YAML, I totally get it. Everyone coming up with their own DSL; I'm guilty of it too. I think YAML unfairly takes the blame for unnecessary abstractions built on top of it, YAML's only fault being that you actually can write DSLs with it more easily than with anything else.
I wrote a YAML parser for fun a few years back (https://hex.pm/packages/mark_yamill). I haven't looked at it in years, and to be honest I can't even remember if I finished it. That being said, the thing that turned me off to using YAML in my own projects was there was subtle behavior I didn't want to have to understand or remember. JSON isn't perfect, but it's easier for me to reason about.
I recently discovered StrictYAML; it sounds like it makes a lot of sense, but there is only a Python implementation.
I like YAML: it is super easy to read, I like it for simple configuration, it works well for Ansible, and GitHub Actions uses it. I never ran into any issues, but reading about it I get why people do not like some parts of it. I think it may have too many features that nobody uses, but the features for cloning, injecting and merging content are kind of cool; I used them once for a config that had a lot of repeated stuff.
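The cloning/merging features mentioned are YAML anchors, aliases and merge keys; a small sketch with PyYAML (field names are made up):

```python
import yaml  # PyYAML; anchors (&), aliases (*) and merge keys (<<) are YAML 1.1 features

doc = """
defaults: &defaults
  retries: 3
  timeout: 30

service_a:
  <<: *defaults   # merge in the shared defaults...
  timeout: 60     # ...then override one field
"""
data = yaml.safe_load(doc)
print(data["service_a"])  # {'retries': 3, 'timeout': 60}
```

Handy for repeated config, though as noted elsewhere in the thread, not every "YAML" parser (e.g. some CI systems) supports anchors.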
Joe Beda makes the point of why configuration/DSL "languages" end up having issues in one of his early TGI Kubernetes streams, and it still resonates with me. YAML/JSON/XML/TOML have issues, but the main issue is that configuration DSLs evolve until they become an ad hoc "real" language. Joe describes the issue well here: https://youtu.be/ILMK65YVSKw?t=941
I like Cue and Dhall, which aren't Turing complete, and that is an important feature for a configuration language that can be run by someone else.
I think some of the problems come from different areas using similar languages. There are data serialization languages; JSON is good for this. Then there are configuration languages, which need to be human-editable; YAML is good for this. Then there are configuration languages that generate configurations. Shoehorning these into YAML/JSON makes a mess. Better to have a dedicated language and output JSON for the API.
Anyone have a suggestion for an alternative to YAML that's easier to write by hand than JSON and also supports nested declarations? ("XML with a fancy text editor" doesn't count)
I have a project where I'm using YAML just for that: nested tree-like (data) declarations and reasonably succinct and relaxed syntax. I have run into some of YAML's parsing weirdness occasionally, but overall it hasn't been that bad to deal with when it happens.
> Anyone have a suggestion for an alternative to YAML that's easier to write by hand than JSON and also supports nested declarations? ("XML with a fancy text editor" doesn't count)
S-expressions. A parser for s-expressions (with no evaluation) can be as small as 300 lines of C code.
It'd take a day to write, and can express everything in a way that:
1. Current IDEs and code-editors can semantic-highlight
2. Current IDEs and code-editors can auto-format
Add in builtins for setting/getting symbols, and you can also DRY your config with variables and functions.
[EDIT: The DRY is optional - if you don't want someone messing up configs down the line, then don't implement any builtins that modify the symbol table, just examine the tree in your program after you have read in the s-expressions.]
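The 300-lines-of-C estimate sounds plausible; roughly the same idea fits in a few lines of Python. This is a toy reader (no evaluation), deliberately missing quoted strings, escapes and error recovery:

```python
# Minimal s-expression reader: parse only, no evaluation.
def tokenize(text):
    # Pad parens with spaces so split() separates them from atoms.
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return node
    # Atoms: integers become ints, everything else stays a symbol string.
    return int(tok) if tok.lstrip("-").isdigit() else tok

config = parse(tokenize("(server (port 8080) (hosts (a b c)))"))
print(config)  # ['server', ['port', 8080], ['hosts', ['a', 'b', 'c']]]
```

The program then just walks the resulting tree; no symbol table, no builtins, nothing to evaluate.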
But then you have to evaluate things, which brings all kinds of problems.
Your parser is no longer 300 lines, and eventually you might snap and throw it all away for a full-blown interpreter which makes dealing with user provided input difficult.
I'm actually investigating using S-expressions for describing jobs and systems, but haven't figured out how to safely evaluate them short of running in isolated VMs.
Edit: thinking more about this, it is possible to represent all those things without evaluation. It's just awkward to create them... but that can be solved on a different level.
There is a discussion above regarding EDN, which supports S-expressions, has maps, sets, and vectors (JSON style collection syntax []). I think it could be an excellent replacement for YAML in some important use cases, but it is lacking for representing inline, non-Lisp code and heredocs.
I'm kinda curious what's so difficult about writing JSON. I imagine it's quite subjective, but I've found the explicit bit of JSON to be nice. My editor is going to spit out matching brackets, quotes, braces, etc.
I think, at least for me, when I'm writing JSON in a document I can just go "I need a map, type `{`" and "I need a list, type `[`" and I like that.
I think my biggest gripe with YAML, TOML, etc is that to save me a few keystrokes I often end up playing this game of "Ok now how will the library interpret this, I'm adding a line here, it's indented, let me scroll up, ok, the last logical thing that happened was X so it's going to interpret this as a list item..." and that feels so taxing to me.
Sorry they are all company resources so I cannot share a screenshot. But imagine writing dags using yaml, so you would have a field called "dag_id", one called "schedule", things like that.
If I need to grok a complex nested data structure then YAML is a 'mare, but JSON is not much better, even with nice IDE support. TBH I end up just running a script to flatten the whole lot into lines of key-value pairs, where each key is.the.full.path.from.root, with numbers for array indices. NOW I can see what's going on. I believe this is similar to how TOML is structured, albeit people criticise it for being too verbose as a result!
For anyone looking for such a script, there's some CLIs that make it easy. One is `yq -o props` [1], another way is to use `yq -j` or `yj` [2] to convert to JSON and pipe it to `gron` [3].
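The flattening itself is only a few lines once the document is parsed. A stdlib-only sketch (function name and sample data are mine) that emits the key.path.0.style lines described above:

```python
# Flatten a parsed YAML/JSON structure into dotted key paths,
# e.g. the output of yaml.safe_load() or json.load().
def flatten(node, prefix=""):
    if isinstance(node, dict):
        for k, v in node.items():
            yield from flatten(v, f"{prefix}.{k}" if prefix else str(k))
    elif isinstance(node, list):
        for i, v in enumerate(node):
            yield from flatten(v, f"{prefix}.{i}")
    else:
        yield prefix, node

doc = {"spec": {"containers": [{"name": "app", "ports": [{"port": 80}]}]}}
for path, value in flatten(doc):
    print(f"{path} = {value}")
# spec.containers.0.name = app
# spec.containers.0.ports.0.port = 80
```

The flat form also greps and diffs nicely, which is most of the point.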
This is made worse by the fact that you need to write for a specific structure/schema, but typically you have no validation for it, and no immediate feedback on errors.
You tweak a line, push, wait, check if it worked, rinse, repeat. It's even more infuriating when YAML-consuming tools silently ignore extra fields. You get some dash or colon wrong, and then it just doesn't work, and it won't tell you why.
Yeah, this is where it really gets painful. CI/CD systems like GitLab don't include linters for their YAML configs, so you're just shooting in the dark trying to get it to work, push after push. It's a terrible way to build a system.
I think what's missing is an easy way to describe schemas, field types and documenting the meaning of the field (of course you can use json schemas with yamls, but it's an external file).
I know of a bunch of new tools which all look interesting. I feel like most of the time they miss an easy way to parse using your programming language of choice, and you end up having to convert from cue -> json -> load in python.
The very last thing ("Don't forget the extra line break otherwise all shit goes to hell") is not a valid complaint. There is the ">" operator, which replaces a single newline with a space but keeps empty lines as newlines, and the "|" operator, which preserves newlines. So in the example they just need to use the right operator and all is fine.
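Both behaviors are easy to check; a quick PyYAML comparison of the two block-scalar styles:

```python
import yaml  # PyYAML, just to show the two block-scalar styles side by side

folded = yaml.safe_load("text: >\n  one\n  two\n")
literal = yaml.safe_load("text: |\n  one\n  two\n")
print(repr(folded["text"]))   # 'one two\n'  -- '>' folds newlines into spaces
print(repr(literal["text"]))  # 'one\ntwo\n' -- '|' keeps them
```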
I don't find YAML understandable. I find JSON very understandable.
I think it's another example of the different ways people understand (I don't want to call it "reading") code: "textually" or "visually".
I suspect the same subset of people find Lisp, CoffeeScript or YAML readable, driven by the same urge to remove everything that isn't "words" from the code (useless noise!). The punctuation doesn't let them read the code as text, I guess? And they only build the structure of the code in their heads after a "linear pass" through it?
While myself and others like me are very happy with all the punctuation that makes the structure jump out to us at literal first sight. I build the structure first in my head and fill in the "words" later.
I made it sound all neutral-like and all, but it's not, my way is certainly much better. Learn to grok code visually, it's better.
I would rather read K8s manifests in yaml than any of the alternatives. The important point here is that config/serialization languages are like programming languages in one key aspect - there is no single language/format that covers every need. The choice is based on the tradeoffs - there are cases where yaml is better than toml and cases where the inverse is true. There are also a lot of cases where people prefer a bit more features like Turing completeness (eg. jsonnet. Ansible playbook format is a counter-case where yaml gets retrofitted with ill fitting features). And there are cases where such features don't really make sense (like serialization of state in K8s).
It's not like anyone is forcing you to use one format. For example, a K8s spec can be in YAML, JSON or jsonnet. Choose what feels most natural, instead of sticking with the default and complaining. Conversion between formats is cheap anyway.
Formatting the web page as YAML is a false argument against the markup language. YAML isn't meant to be read like that. The web page would look terrible in any of the alternatives, JSON, XML etc.
And don't forget, "YAML Ain't Markup Language".
It's actually apparently a data serialization format, which I don't get, because it's obviously designed to look nice to humans and not computers.
Markup languages aren't supposed to look nice to humans? They are supposed to produce documents that do.
Markdown is a markup format that looks nice to humans, but neither XML nor HTML are particularly nice. Some XML-based formats like Microsoft Office documents are downright incomprehensible without specialized tools.
> It's actually apparently a data serialization format, which I don't get, because it's obviously designed to look nice to humans and not computers.
What's hard to get? It's a data interchange format whose primary usecase is to store data structures intended to be read and edited by humans, using plain old text editors.
Maybe I'm misunderstanding serialization here, but my understanding of it is that a program can serialize its internal data structure into an easily transmittable or readable format. And YAML is almost always a configuration format (so a deserialization format?).
Edit:
> In computing, serialization (or serialisation) is the process of translating a data structure or object state into a format that can be stored (e.g. files in secondary storage devices, data buffers in primary storage devices) or transmitted (e.g. data streams over computer networks) and reconstructed later (possibly in a different computer environment). [Wikipedia]
So a serialization format is something a machine generates to store or transmit some internal data structure or state. JSON is an obvious example. YAML, on the other hand, is a superset of JSON with a lot of different ways to represent the same data. This creates a problem: the serializer has to choose which way to use. But writing YAML is simple, way more so than JSON. Built into YAML there are many things that seem to me to be made for the humans writing the format, not for a program serializing to it. In fact, it's not often I see machine-generated YAML; it's almost always something made for humans to write.
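One upside of letting a library do the serializing is that it knows which scalars need quoting. A quick PyYAML illustration (default safe_dump settings assumed):

```python
import yaml  # PyYAML: the dumper quotes any scalar that would be misread

out = yaml.safe_dump({"country": "NO", "mode": "0644", "version": "1.10"})
print(out)
# country: 'NO'    <- quoted, else it would read back as False
# mode: '0644'     <- quoted, else it would read back as octal 420
# version: '1.10'  <- quoted, else it would read back as float 1.1

# The round trip is lossless, which hand-written unquoted YAML is not:
assert yaml.safe_load(out) == {"country": "NO", "mode": "0644", "version": "1.10"}
```

This is exactly why "whatever generated that clearly wasn't serializing its data using a library" is such a pointed remark.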
Saying the sole argument this webpage poses against YAML is that it's hard to format a webpage with is a straw man by you, though. There are 172 lines of arguments you're ignoring.
My point was that they didn't identify a false argument, as "Formatting the web page as YAML" doesn't look good so yaml is bad wasn't an argument put forward by the author of the web page. Hence the straw man.
Is there a yaml without all the bells, whistles and magic?
I like the syntax (it doesn't turn non-techies off from changing some config values), but I'm not a fan of all the magic whistles that can shoot you in the foot if you don't have near-encyclopedic knowledge of them.
The spec is too big for something that's supposed to just live in some of our configuration files.
YAML 1.2 removed some of the footguns like having five different ways of expressing `true`, but the main Python YAML package is permanently 1.1.
strictyaml is an attempt at tightening things further. But a lot of systems that use "YAML" don't implement any spec, because they write their own parser. For example, IIRC GitLab CI doesn't support anchors but has its own code-reuse construct.
And then there are all of the string templating extensions that projects like Ansible and Helm employ. These in my opinion are far worse than anchors and should be eschewed in favor of a non-static configuration like jsonnet.
I reckon the only thing it's missing to be truly accessible to non-techies is that string values still need to be quoted, i.e. you can't have:
key: this is my value
(I'm definitely not saying it would be a good idea to allow quotes to be dropped, just that that's the only potential stumbling block I see for non-techies.)
Edit to add: oh, hmm, I guess trailing commas are fiddly too. Possibly I want some even laxer variant of JSON that lets you use just newlines! :P
Hjson is very similar to JSON5 but allows quotes to be dropped and can use newlines instead of commas. There are implementations for a lot of different languages, I myself contributed the C++ implementation. I wanted something smaller than Yaml but more lax than JSON, found Hjson to suit my needs perfectly.
JSON5 is elegant in that it's purely a subset of JavaScript (exactly as the original JSON is). But in practice that doesn't really gain you much besides a familiar syntax, and Hjson looks simple enough that the slight differences from JS shouldn't be a problem.
The real problem is how to persuade people to use something like this. For the few cases where JSON isn't already entrenched, YAML and TOML and the like have already salted the earth.
1 is a number, 01 is a string, 0.1 is a number, .01 is a string.
I prefer explicitly marking strings. If I try to write a number, I want confidence that I'll get either a number or a parse error. (Though that doesn't necessarily mean quoting the string, other methods could work.)
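For what it's worth, Python's stdlib json module behaves exactly that way: JSON forbids leading zeros, so you get a number or a parse error, never a silent string:

```python
import json  # JSON's grammar disallows leading zeros in numbers

print(json.loads("1"))    # 1
print(json.loads("0.1"))  # 0.1
try:
    json.loads("01")      # leading zero: not a valid JSON number
except json.JSONDecodeError as e:
    print("parse error:", e)
```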
Nickel (https://nickel-lang.org/) is a new configuration language which builds upon lessons learnt from YAML, TOML, JSON, JSONNet, Nix and many others.
Not affiliated. Just want to see the project succeed.
Reading the responses to this makes me wonder how many people have read the Mythical Man Month and not made the connection that config files are modern JCL -- the original instance of "this config file format should have been a real programming language from the start".
This is my problem with writing YAML for Home Assistant. You have to do so much code gymnastics to do the simplest things. It tries so hard not to be a programming language that it really should just be a programming language.
It’s not that hard, YAML is disappointingly and unfortunately, unnecessarily, extremely ambiguous, which is exactly what you do not want in the contexts where it’s dominant.
The solution is uncannily simple tho, fuck YAML, just generate it from Dhall; enjoy the subsequent bliss!
YAML would be fine if parsers did not try to guess the type of values. 07 or 08 should not be parsed as integers or as strings, but stored as is until the application requests an integer or string.
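PyYAML actually ships something close to this: BaseLoader leaves every scalar as a string and defers interpretation to the application, in contrast to the type-guessing default loaders. A quick comparison:

```python
import yaml  # PyYAML: BaseLoader skips implicit type resolution entirely

doc = "a: 07\nb: 08\nc: no\n"
print(yaml.safe_load(doc))                     # {'a': 7, 'b': '08', 'c': False}
print(yaml.load(doc, Loader=yaml.BaseLoader))  # {'a': '07', 'b': '08', 'c': 'no'}
```

With BaseLoader, 07 and 08 come back symmetrically as strings, and the application decides what they mean.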
Then you run into the differences between implementations. No one implements the whole standard.
The company I work for had a bug driven by the underscores-in-numbers and scientific notation functionality (`20230101000000_e111111`, a timestamp underscore a git short hash, was parsed as a huge number rather than a string). The serialization library from one language did not implement those features and so did not quote the value in its output, but the deserialization library in the other did and would parse it as a number. This eventually resulted in it becoming null (not sure why, maybe it was too large for the int type in the target language?) but even if it hadn't been we then format it into a URL (to download an artifact) so the fact that our YAML libraries disagree would cause the string value to change and break the download.
The YAML standard is too big, resulting in only partial implementations, and the disagreements can result in unexpected value changes. With no handcrafting whatsoever.
I believe YAML has serious issues and I would not be using it as a configuration format.
But I think a lot of the criticism around YAML misses the mark. YAML is not excessively overused as configuration language. It is not dominant like XML was back in its heyday. TOML has a pretty strong mindshare, and many projects are still using JSON or XML. I would pick TOML over YAML any day for this usage, but I still think YAML beats XML and JSON (just think about comments and terrible merge resolution whenever you add an item to an array or an object).
But YAML does reign supreme in the world of DevOps-oriented DSLs. YAML is more suited for DSL compared to alternatives, like TOML, JSON, JSON5 or XML:
- YAML is the least verbose of all alternatives (especially XML, but even JSON5 and TOML).
- YAML requires the least of amount of "syntax noise" on top of your DSL. The "syntax noise" is mostly limited to indentation, colons, dashes (for arrays), indented text block markers and occasionally quotation marks. JSON, JSON5 and TOML require quoting all string values and explicitly marking arrays and object boundaries, which make the DSL seem less fluent. XML is even worse.
- YAML supports indenting multi-line strings, so you can easily embed other formats. This makes adding markdown descriptions (in Swagger/OpenAPI) or embedded shell scripts (in YAML-based CI/CD DSLs) a breeze.
- YAML syntax is hierarchical. The same is true for JSON and XML, but not for TOML. TOML syntax only has two fixed and asymmetrical tiers of hierarchy: sections and key-value pairs. Hierarchy can be achieved while parsing, by grouping sections hierarchically, but this is not reflected in the syntax, and this property makes TOML ill-suited for DSLs.
Of course, TOML was never meant to be used for DSLs: it was designed as a configuration language, just as JSON was designed to be a simple data exchange format and XML was designed as a document markup language. But if you had to choose any of the formats above to base your DSL on, YAML seems like the natural choice.
What is the best alternative to YAML then? Probably writing your own parser and removing all unnecessary cruft that you keep to make your DSL fit the YAML straitjacket. And that's the real problem, and the real reason why YAML is overused: parsing YAML is easy, but writing your own custom parser is hard.
Sure, most languages have at least one easy-to-use library that lets you implement a basic parser based on PEGs or parser combinators quite easily. But even then you have to deal with all the fun things like ambiguities (or lack thereof[1]), the limitations of whatever parsing technique you've chosen, tokenizing newlines and indents, and parsing a token tree that is still not as clean and easy to parse as YAML.
YAML is still two or three orders of magnitude easier than writing even the simplest parser, so most tool writers pick it up for writing their DSL. This is the age-old "worse is better" principle. It is faster and easier to release and iterate on YAML-based tooling, so most projects pick up YAML, and the ones that pick up YAML tend to both ship and iterate faster, and then go ahead to capture the market.
> IMO if there was something that was substantially better, we would see projects switching to it in a heartbeat.
Maybe not in a heartbeat, but I do think most of the "YAML alternatives" out there didn't even try to address DSLs. Maybe we should admit defeat on the make-your-own-parser front and design a proper DSL carrier language that can be at least somewhat better than YAML.
Funny that a site trashing YAML is actually a great resource for it. I, for one, am more than happy to use YAML, there isn't a better alternative today to write human-friendly, complex configuration.
I miss the days of not putting all the settings of a thing into a text file. I'd take checkboxes and radio buttons over esoteric text files that require extra research every time you edit one.
YAML is like Regex in the "Now you have two problems" philosophy.
It's like using an XML file for what a txt file could do.
It demands frankly unnecessary structure and formatting just to be ingested, when it's all serially-read parameters, anyway.
YAML isn't the only way to implement parameterization. We could use our imaginations and think of a ton of tried-and-true ways to abstract it.
Have you ever done a quick-deploy on a cloud platform? The alternatives just generate all the esoteric stuff for you. Config files generated by guided steps.
these complaints are misguided. yaml isn’t the problem, it’s whatever the tooling is around it. Yaml is pretty easily transcribed 1:1 to json, these same people will never complain about json because the tooling around it is better. Yet, yaml is usually easier for me as a devops/SRE person to work with because it is more human readable than json and super great tools like helm use it almost exclusively.
I'm happy to complain about JSON as a config format as well..
YAML is so hard to get right: a misplaced dash or indentation in a list and you're screwed, yet the file is still parseable, it just does something other than what you expected. It's not really human-readable either; it takes a lot of brainpower (at least for me) to understand what belongs together in a list or how things are grouped. Much more than, for instance, XML, where it's obvious.
Yaml only barely works, because many editors have added support to help you write k8s config. Otherwise it would be impossible.
It’s essentially the same data structure under the hood. It’s the interpreters/parsers that are the issue at least in this complaint. “Norway” being interpreted as “NO” boolean false is something that sounds very weird to me, almost to the point of being hyperbole, and I’ve worked with “yaml” (mostly helm and simple configs) for years and dealt with really ridiculous parsing issues. Usually though they’re really easily resolved.
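The "Norway" case is not hyperbole, though; it falls out of YAML 1.1's implicit typing rather than any one tool. Here is a toy sketch of those resolution rules in plain Python (hypothetical names, not a real parser), which also shows the leading-zero octal surprise:

```python
# Toy resolver mimicking YAML 1.1 implicit typing of unquoted scalars.
# This is a sketch for illustration, not a real YAML parser.
TRUTHY = ("y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON")
FALSY = ("n", "N", "no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF")

def resolve_scalar(token):
    """Guess the type of an unquoted scalar the way a YAML 1.1 loader would."""
    if token in TRUTHY:
        return True
    if token in FALSY:
        return False                  # "NO" (Norway!) lands here
    if token.startswith("0") and token.isdigit() and token != "0":
        return int(token, 8)          # leading zero means octal: "0644" -> 420
    try:
        return int(token)
    except ValueError:
        return token                  # everything else stays a string

print(resolve_scalar("NO"))      # False
print(resolve_scalar("Norway"))  # Norway
print(resolve_scalar("0644"))    # 420
```

Any unquoted scalar that happens to match one of these patterns silently changes type; quoting it keeps it a string, which is why "quote your values" is the standard fix.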
Yeah, you shouldn't put a lot of logic in your configs. YAML doesn't do that; the tooling does. Templates can be simple and easy to read, and DRY principles can lead you into traps that make your code harder to maintain than it should be.
> these complaints are misguided. yaml isn’t the problem, it’s whatever the tooling is around it. Yaml is pretty easily transcribed 1:1 to json
I disagree. YAML can't be transcribed 1:1 to JSON.
Here is an example:
  ? &map [Complex, key]
  : *map
Otherwise Yaml wouldn't be a superset.
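To make the incompatibility concrete: a YAML mapping key can itself be a sequence, but JSON object keys must be strings, so there is no direct 1:1 transcription. A quick stdlib sketch of the same situation in Python:

```python
import json

# A YAML complex key like `? [Complex, key]` corresponds to a non-string
# key. JSON object keys must be strings, so serialization fails outright.
try:
    json.dumps({("Complex", "key"): "value"})
    representable = True
except TypeError:
    representable = False

print(representable)  # False
```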
And the fact that it introduces context sensitivity (flow vs block, plain vs folded/literal) means its parsing performance will be kneecapped. I'm aware of high-performance JSON/XML parsers able to read GB/s. Not so for YAML.
The context of this conversation, I think, is in configuration languages - I can’t think of a single reason I’d ever consider performance in that spot. Obviously in data transfer you’re not gonna use gb/sec of yaml, which is why I’ve repeatedly stressed the term “human readable.”
Can you give me a practical example of the incompatible data structure you’ve described that isn’t reliant on tooling? I really can’t.
That's not obvious; and second, even for configuration it highly depends on the domain. I've seen game engines that used YAML for configuring in-game entities, where YAML brought its own set of performance misery.
Just because you use it for a one-page Docker config doesn't mean someone won't use it for 3000 YAML files that need to load in under ten seconds.
Not to mention that the complexity of the language is reflected in the complexity of the tooling. The more complex the tooling is to write, the less likely someone is to write it.
Also, I hate the term "human readable". JSON is human readable and editable. Sure, you might have problems editing it, but my work colleague can read binary blobs; that doesn't make binary blobs human readable.
YAML has a lot of quirks. I've written a fair share of YAML config files and I might still prefer it to JSON for that use case. However, JSON is as barebones as it gets and requires zero brain power to read. It's not just tooling, it's a lot less error prone.
to me, json is not easier to read at all, not by a long shot. one innocent missing brace can cause an hour-long rabbit hole in a lot of tools. and again, this is an issue of tooling. there's no way that json's enforcement of syntax like curly brackets (easy to confuse with square brackets) is easier on the human eyes than indentation-based organization like python/yaml. but people should use whatever's easiest for them; I just struggle to believe most people naturally find json more readable.
I think you have a dozen available JSON validation tools, and even dumb ones can find a missing curly brace (some will even suggest lines where you can add it).
Also, if the CD process is not too dumb, JSON validation will find the errors before the file enters production, which will save you. At my first company they rewrote our Ansible inventory files in JSON because an uncaught YAML indentation mistake silently left a firewall open by default.
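Part of why JSON validation is so easy to wire into CI is that even the stock parsers pinpoint the exact line and column of a syntax error. A minimal stdlib sketch:

```python
import json

# A trailing comma makes this invalid JSON; the parser reports where.
doc = '{"a": 1,}'
try:
    json.loads(doc)
    error = None
except json.JSONDecodeError as exc:
    error = exc

print(error.lineno, error.colno)  # points at the offending '}'
```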
Obviously when I show you this, it's clear that b is incorrectly indented, but in a real file, this can be a nightmare to figure out. Particularly if the entries are nested, or tab characters are involved.
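A small hedged sketch of the failure mode: both of the following documents are valid YAML, and only the indentation of `b` distinguishes them.

```yaml
---
a:
  x: 1
b: 2      # `b` is a sibling of `a`
---
a:
  x: 1
  b: 2    # two extra spaces, and `b` is now a child of `a`
```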
Not really. I think YAML has more capability: you can convert JSON to YAML 1:1, but the reverse isn't true.
But here is the issue: the errors caught by a JSON validator won't be caught by a YAML validator. If you have 15 inventory files to write per day, you will do a lot of copy-paste, sometimes with errors, and most of those times Ansible won't run because the YAML is obviously invalid. The one time it wasn't obviously invalid, it was sadly on a critical part of our infra, and it almost cost us a client (and definitely cost us trust).
YAML is way more permissive (because it has more capabilities). But we didn't care about complex mappings, code-like features, or whatnot. If you need them, more power to you; they just weren't part of our original use case. So they wrote a Perl JSON-to-YAML converter with restricted keywords, and after that, the misalignment risk was negated. Ultimately I arrived at the company, was told the story, then wrote stuff to convert Excel to JSON, then SQL to JSON, and in the end the extra conversion was unnecessary, and probably harder than just converting directly to YAML. I'm not saying YAML is worse than JSON; I think in 80% of cases it's better. I'm just saying that in the rare cases where being correct is way more important than being fast, I'd use JSON.
Makes me wonder if a kind of meta-format for config would be useful. Like: this is the internal data structure; you provide the mapping from whatever config format you like. People would then be free to open the config in whatever format pleases them. The project maintainer might have arbitrarily picked YAML, but your editor seamlessly opens it as TOML, for example.
The more I look at it, the more I think that Deno is actually a perfect fit for infrastructure as code.
The first reaction I get is usually "What, JavaScript!?". Here is why it's better than it looks.
Deno easily runs TypeScript without a compile step. TypeScript is a very mature, developer friendly language that was designed to model the super complicated types often seen in JS. This includes unions and intersections which help you model complex rules between optional properties, as well as template literal types which can help you restrict string constants. As it happens, those same types of objects are present in our configurations.
TypeScript is also flexible about how much you'd like to model. Do you want only properties with really nice autocomplete with docs from the language server? Sure we can do just that. Do you want to only allow strings that look like a time duration for that property? Can be done too with template literal types. You choose where to invest your energy.
Building abstractions on top of IaC is seen as painful and obscure. TypeScript has tools to help you avoid that, such as api-extractor: https://api-extractor.com/ - it can enforce that all your function abstractions are documented and it can generate that documentation for you.
The larger ecosystem also comes with a lot of ready-made codegen tools that can help, from small modules such as json-schema-to-typescript (https://www.npmjs.com/package/json-schema-to-typescript) to larger mature projects such as jsii and https://github.com/cdklabs/json2jsii . They can be used to build the tooling to import things like CRDs and other external schemas.
What about running arbitrary code, launching rockets, Turing completeness? This is where Deno comes in, with its permission model (https://docs.deno.com/runtime/manual/basics/permissions). You can allow only a subset of commands (let's say `helm` and `kustomize` while you're migrating away from them), a subset of (writable) directories, accessible network hosts - or none at all.
What about the pains of managing JS modules? Deno lets you import files directly from URLs. It also allows you to set private tokens in env vars to ensure the imports succeed: https://docs.deno.com/runtime/manual/basics/modules/private - as a result, you can manage your IaC libraries the way that makes sense to you, without dealing with the unwieldy package managers of the node ecosystem.
Did something happen recently to get the YAML hate train going again?
I get it. YAML is not perfect. Neither is JSON, TOML, XML or even code as configuration (Xmonad anyone?). They each have pros and cons and projects/technologies take those in consideration and pick one.
Not sure I see the point in hating on one specific configuration language. If it was that bad, nobody would use it. And if you still think it's bad anyway, you can always improve on it. But very few actually want to put in the enormous amount of work needed to improve YAML or create a new language.
IMO if there was something that was substantially better, we would see projects switching to it in a heartbeat. But the fact is that most times the difference between them is not substantial, so the effort to make any kind of switch so you can shorten Norway is simply not worth it.
Haha, if that logic held true ... we wouldn't be using lots of things.
Usually people use things, because "that's how we have done it before" or "that's all we know". Not many people look for better tools frequently. They try to get something done and the moment it is working, they are done with it. People are frequently punished for trying to improve the current state, by management that tells them: "But it already works! No business value in changing it now."
You forgot "that's the only viable option provided by the vendor." Some people are reliant on certain software as a requirement and forced into certain standards imposed on them. They don't exactly choose the standard, they have a functional requirement and the only way to achieve it in some budget/timeline may be with a third party solution that uses, say YAML.
Yeah I've been there personally too. But why aren't we complaining about the vendor then?
I find it weird that we love to complain about the YAML format instead of the projects that chose it. Given some of the emotions I saw in other responses in this thread, it looks more like a venting exercise.
Which is fine, I guess. I was simply curious on why suddenly it got propped up again.
If ten thousand vendors use one subpar language, it makes sense to try to swing opinion away from the language, rather than play whackamole with eleven thousand vendors.
For some reason that line triggered a lot of folks. That was not my intention, haha.
I agree that it's not always under our control and that can be extremely frustrating. But that's not YAML's fault, is it?
When I wrote "that bad" I truly meant the extreme version. Something so bad that it has no upsides. Which IMO is not the case here. YAML has pros and cons, just like all others, and for one reason or another many folks in several different projects decided that YAML was a good enough choice.
I have a very hard time assuming everyone that ever chose YAML is so incompetent that they never thought about the pros/cons of it.
There's always a tool that will work better for a specific use-case, especially when you either don't understand or have forgotten about some of the requirements.
Well the question about business value is a good one, no? After all, the question is not "Would improving this thing be a good thing?" it's "Would improving this thing be better than every single other thing we could be doing with those same resources?".
Besides there's an equal and opposite danger with too much change or change delivered without clear benefits. I'd prefer a 7/10 UI which stayed the same for a few years vs a 8/10 UI which changes substantially every month.
Here's the thing. We've basically been using TOML since the INI file format from MS-DOS. It works, it's usable, and we've all seen it. TOML is just an evolution of the INI format that fixes some of its shortcomings. YAML blasted onto the scene like "hey, what if we just rewrite JSON to make it legible?", which brings with it a million edge cases of problems.
YAML is the new kid on the block. And despite our already having several great formats for different use cases, some kind of mass hysteria caused big players to adopt YAML. I suspect everyone who adopted it is either a Python dev or was drawn in by the legibility of the format. It's easy to read. While that's true, did anyone stop to think about what it's like to actually use the format?
I think the reason everyone jumped ship from XML to JSON was that JSON is comparatively very dumb - and dumb is quick to grok.
Then some of us realised JSON is actually too dumb for many things, and instead of going back to XML we made JSON Schema, OpenAPI, etc.
Others of us thought that the main problem with JSON is that it's not human readable and writeable enough. So we came up with new formats like YAML. [EDIT: My timing is wrong here, sorry.] Unfortunately being human we could not resist making it much more complicated again, thus increasing cognitive load.
There have been many times when in order to really understand a YAML file (full of anchors and aliases, etc) I've had to turn it into JSON. This is ridiculous.
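The anchors-and-aliases point deserves spelling out: in YAML, `&name` / `*name` make two entries point at the same node, while JSON has no aliases, so converting expands every alias into an inline copy. A rough Python analogy (hypothetical names, just to show the sharing):

```python
import json

# Like `defaults: &defaults {...}` referenced via `*defaults` in YAML:
# both entries are the SAME object, not copies.
defaults = {"retries": 3, "timeout": 30}
config = {"dev": defaults, "prod": defaults}

assert config["dev"] is config["prod"]  # shared, exactly like a YAML alias
flat = json.dumps(config)               # JSON has no aliases, so the
print(flat)                             # shared node is written out twice
```

That duplication is exactly why the JSON version is easier to read at a glance, and also why it loses the "update once" property the aliases were providing.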
The timing doesn't quite work out for this explanation, unfortunately. YAML is about the same age as JSON and started finding a niche as a configuration language in parallel with JSON finding use as a serialization format. Ruby on Rails 1.0 was using YAML for configuration in 2005, and it didn't even have JSON support out of the box at that point.
Serves me right for not checking! I certainly became aware of YAML long after I started using JSON. But I do think people are choosing it over JSON for its alleged improved read/write friendliness.
Indeed, back then it was "Yet Another Markup Language" (https://yaml.org/spec/history/2001-12-10.html). I remember using it to write blog posts with static generators, like webgen, around 2004.
Interesting, I'm surprised the opposite way as the others replying -- I thought YAML was much older than JSON. We all encounter things at different times I guess.
This is lovely, I didn't know. I guess this is what Kuhn was talking about, we write history in retrospective, sorting it out preferring narrative over fact.
> Then some of us realised JSON is actually too dumb for many things, and instead of going back to XML we made JSON Schema, OpenAPI, etc.
This take doesn't make any sense at all.
JSON Schema is a validation tool for languages based on JSON. You build your subset language on top of JSON, and you want to validate input data to check whether it complies with your format. Instead of writing your own parser, you specify your language in a high-level schema language and run the validation as a post-parsing step. Nothing in this use case involves anything resembling "too dumb".
OpenAPI is a language for specifying your APIs. At most, it is orthogonal to JSON. Again, nothing in this use case involves anything resembling "too dumb".
JSON is just one of those tools that is a major success, and thus people come out of the woodwork projecting their frustrations and misconceptions onto a scapegoat. "JSON is too dumb because I developed a JSON-based format that I need to validate." "JSON is too dumb because I need to document how my RESTful API actually works." What does JSON have to do with anything?
You're making an interesting distinction between "JSON" and "languages based on JSON" there, which I don't. JSON and XML in isolation are just a bunch of punctuation and not useful. They're only useful when we know the structure of the data within them. XML already had schemas, and we were able to easily (YMMV!) validate the correctness of a document.
JSON was simpler because we would just say to each other "I'll send you a {"foo":{"bar":1234,"ok":true}}", "Cool thx" - and there wasn't a way to formalise that even if we wanted to. That doesn't scale well though. We needed a (machine-readable) way to define what the data should actually look like, thus OpenAPI etc.
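That formalisation looks roughly like this in JSON Schema (a sketch matching the `{"foo":{"bar":1234,"ok":true}}` example above; the field names come from it):

```json
{
  "type": "object",
  "properties": {
    "foo": {
      "type": "object",
      "properties": {
        "bar": { "type": "integer" },
        "ok":  { "type": "boolean" }
      },
      "required": ["bar", "ok"]
    }
  },
  "required": ["foo"]
}
```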
> You're making an interesting distinction between "JSON" and "languages based on JSON" there, which I don't.
That's basically the root cause of your misconceptions. A document format encoded in JSON is a separate format, which happens to be a subset of JSON. A byte stream can be a valid JSON document but it's broken with regards to the document format you specified. Tools like JSONPath bridge the gap between your custom document format and JSON. This is not a JSON shortcoming or design flaw.
> They're only useful when we know the structure of the data within them.
They are only useful to you because you're complaining that you still need to parse your custom format. This is not a problem with JSON or any other document format. This is only a misconception you have with regards to what you're doing.
>There have been many times when in order to really understand a YAML file (full of anchors and aliases, etc) I've had to turn it into JSON. This is ridiculous.
There's no reason why an editor couldn't inline those things in YAML to help you see what's going on locally. I can't code without code navigation and stuff like type hints for inferred types.
As YAML gets used for more complex stuff I think the tooling needs to catch up.
> There's no reason why editor couldn't inline those things in YAML to help see what's going on locally.
There's no reason why an editor couldn't present either YAML style (block style or the JSON-like flow style), whichever the user prefers, and save in whichever one is preferred for storage.
Referencing common data is not a weakness, but it does introduce tradeoffs - I'd take that over having to do a "have I updated every instance" whack a mole.
> There have been many times when in order to really understand a YAML file (full of anchors and aliases, etc) I've had to turn it into JSON. This is ridiculous.
Spitballing here. If underlying data is identical in JSON or YAML or whatever, why not introduce a view layer that is structure agnostic provided that the syntax can be translated without modifying the data?
I'm imagining a VSCode plugin or some view that parses the data into whatever format you'd like when you open it, then when you write it serializes it into the file format specified in the filename. You could do the same with your code review system.
Ultimately the specific syntax is for humans only, so as tooling improves why not add that next layer of abstraction? Is it because there are so many format-specific idiosyncrasies that can't translate well, due to the complex nature of a lot of these config files (gitlab-yaml, etc.)?
Just wondering, without having the time to think through the language specs properly, why we haven't seen this yet when it seems like such a huge quality of life improvement.
> Then some of us realised JSON is actually too dumb for many things, and instead of going back to XML we made JSON Schema
XML has numerous different schema languages made for it, outside of the XML standard, because XML itself is just as “dumb” as JSON in this regard, and apparently no one got schemas exactly right for it. The holy wars over XML schema languages only faded when XML’s dominance did.
> OpenAPI
OpenAPI uses JSON and JSON Schema much as SOAP uses XML, but it doesn't prove JSON is “too dumb” any more than SOAP proves that XML is.
I don't know if SOAP caused it, but ever since it became popular, every single tool just assumes your XML is specified by a DTD, even though that's the one schema language that nobody ever liked.
I never saw a war. AFAIK nobody ever wanted to use DTDs, but that's what everybody used because it was what everybody used.
One thing about YAML that I think conflicts with 'human readable' is the indents mattering. In long files, even with a small number of indent levels, it can be tough to tell whether something is in one section or another. For whatever reason, lining up braces/brackets/parentheses makes it easier for me to tell.
Encoding meaning in whitespace not only makes it difficult to verify correctness by eye (for nontrivial cases), but also is very fragile.
You're lucky if it survives a cut and paste across tools. Almost every tool ever written treats whitespace as formatting, not meaningful content, and many tools therefore assume that "fixing" whitespace will improve readability.
Python's use of whitespace is, dare I say, perfect.
It makes everything more readable. I've never seen cases where indentation is ambiguous.
I think this is because in a programming language indentation happens in a small set of specific situations like `for` loops and such. Either you indent right or you get an error. On the rare occasion the wrong indentation level can assign a variable to the wrong value, but that is a rookie mistake.
Whereas in YAML, everything starts out ambiguous. Everything is context-sensitive. Indents change the meanings of things in slight, unclear ways. It's a constant source of confusion.
Because Python only has one semantic meaning for whitespace. YAML’s whitespace is also mixed with other symbols that change what the whitespace means, and it differs further depending on how much whitespace follows the previous indent and precedes the following symbol, if any. Or if the previous line ends with a pipe, then the only semantic meaning is “none of this whitespace matters, so much so that it’s trimmed… but the indentation is preserved after the first line”. I’m probably wrong about some or all of this! It’s been a whole day since I had to write YAML, so the YAML gnomes have rightly reclaimed the part of my brain that was sure I knew what a given bit of whitespace actually means.
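For what it's worth, the pipe case can be pinned down with a small sketch: a literal block scalar keeps line breaks, preserves indentation beyond the block's base indent, and (with the default "clip" chomping) keeps a single trailing newline:

```yaml
message: |
  first line
    two extra spaces survive
  last line
# loads as: "first line\n  two extra spaces survive\nlast line\n"
```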
Amen. My current theory: data science. Data science has become much more prevalent in the last decade, and I suspect data scientists prefer English-like syntax because they're not engineers.
Engineers use language like mechanical components. It should be precise, neat, and functional. We're designing machines. I have a feeling that data scientists don't want any of that. They want to use language as a way to describe data transformations in a format that resembles a log rather than functional instructions. Much like SQL.
That should fail a lint check in your CI and your editor/IDE should autoformat that indentation away.
You also claim that it is a "problem with braces", but you're not using braces, and languages like Rust no longer allow brace-less single statements like that.
Yeah, if I had my druthers, brace-less shorthand syntax like that would never be allowed. I never use it in my own code.
Beautify takes care of all questions about what an indent means. C-syntax devs never assume an indent has syntactic meaning. We use it as a helpful hint. We innately understand that it's the braces that matter, and it's really easy to Beautify a file if the original author was sloppy.
C-syntax devs read differently. We're not reading a book. We're reading code.
And we generally strongly prefer correctness. Braces avoid all unintended bugs related to where the instructions are located on the screen. Pretty is nice. Structure and correctness are better.
  <source>:6:17: note: ...this statement, but the latter is
  misleadingly indented as if it were guarded by the 'if'
      6 |     launch_missile();
        |     ^~~~~~~~~~~~~~
> Others of us thought that the main problem with JSON is that it's not human readable and writeable enough. So we came up with new formats like YAML. [EDIT: My timing is wrong here, sorry.] Unfortunately being human we could not resist making it much more complicated again, thus increasing cognitive load.
One of the odd things about the progression is how user-hostile it is. JSON lacks support for comments, YAML has syntactically meaningful indentation and frequently deep nesting, etc.
XML has plenty of problems of its own which legitimately generated a lot of hate for the format. JSON, at least superficially, didn't have many of those because it lacked (and still lacks) a lot of features. So, for a reasonable person it wouldn't be a proper comparison, but... there's the reason number 2.
JSON's rise to prominence coincided with Flash dying and the JavaScript hype train gathering momentum. Flash made a bet on XML (think E4X in the latest AS3 spec, MXML, XML namespaces in the code, etc.). Those who hated Flash for reasons unrelated to its technological merits hated everything associated with it; in particular, that hate came from people doing JavaScript. HTML5, which was supposed to replace Flash but never did, fueled this hype train even more.
At the time, JavaScript programmers felt inferior to every other kind of programmer. Well, nobody even considered JavaScript programmers to be a thing. If you were doing something on the Web, you'd try hard to work with some other technology that compiled to JavaScript; god forbid you actually write JavaScript. But people like Steve Yegge and Douglas Crockford worked on popularizing the technology, backed by big companies who wanted to take Adobe out of the standardization game. And, gradually, the Web migrated from Flash to JavaScript as the technology for Web applications. JSON was a side effect of this change. JavaScript desperately needed tools to match Flash's capabilities, and XHR seemed like it wouldn't become part of JavaScript proper; it was in general a (browser-dependent) mess, especially when it came to parsing, but also security. JSON had the potential to exploit a hole in the Web security model by being immediately interpreted as JavaScript data, and this was yet another selling point.
To expand on the last claim: one of the common ways to work with JSON was to dynamically append a `script` element to the HTML document, then extract the data from that element, which side-stepped XHR. There was also a variant, JSONP (I think that's what it was called, but don't quote me, it was a long time ago), where the loaded JSON would be sent as this:
$callback({ ... some JSON ... })
Where the `$callback` was supplied by the caller. I'm not entirely sure what this was actually trying to accomplish beside dealing with the asynchronous nature of JavaScript networking, but I vaguely remember hearing about some security benefits of doing this.
Anyways, larger point being: JSON came to life in a race to dislodge one of the dominant forces on the Web. Speed of designing the language and the speed of onboarding of new language users was of paramount importance in this race, where quality, robustness and other desirable engineering qualities played a secondary role, if at all.
As someone else said, JSONP, the callback thing you showed, is/was a same-origin workaround: script tags are legacy and can be loaded cross-origin and executed, but that doesn't expose the content of the script to you.
So you pass along the callback that you want it to execute in a query string param, the script comes back as a call to your callback, and you can then get at the data even though it's coming from a different origin. The remote side has to opt in to looking at the callback query param and giving you back JSONP, so it's kind of a poor man's CORS where the remote side is declaring that this data is safe to expose like this. Of course on the flipside, you're just getting and executing whatever Javascript the remote chooses to send, so you're trusting them more than you have to with a modern fetch/xhr using CORS.
> There were some nasty things about Flash, but in retrospect mobile applications are so much worse.
Not really. Flash was terrible on phones that supported it. There were SDKs that turned Flash applications into native iOS ones, IIRC, but otherwise Flash was a dead end once mobile started to grow.
Not really... people made these claims without any testing, as per usual.
I was on Adobe's community advisory board at the time of the iPhone fight against Flash and I'd work with all sorts of things that were supposed to be for the phones.
Flex had problems on phones. Macromedia, and later Adobe, built this GUI framework on top of Flash with multiple problems: performance was one of them, but the other, perhaps more important, problem was that Flex was created by people who wanted to copy Swing. It was in no way any good for building a typical smartphone UI.
So, Adobe tried, but with very little commitment, to produce some sort of Flex-based "extension" for smartphones... and that thing never went beyond a prototype. Also, at that exact time, Flex was transitioning to the new text rendering engine, which, while it offered more typographically pleasing options, was really, really slow to render.
People behind Flash player had some good ideas. Eg. Flash Alchemy: a GCC-based backend for ActionScript that made Flash very competitive performance-wise (but never really went beyond prototype). Around that same time a new UI framework appeared in Flash aiming to utilize GPU for rendering, which was a big step in the right direction, especially considering how "native 3D" in Flash failed (it was all on CPU, operating very heavy display objects).
None of these ideas saw much traction, in particular because the Adobe management responsible for the product lived under an illusion of invincibility. They did a little bit of something, just enough to keep the lights on, but they didn't realize they were being side-tracked until it was too late. And even then, they made some really bad choices. Instead of open-sourcing the player, they started a feud with the people who wanted to maintain Apache Flex (the Adobe-abandoned Flex) over some irrelevant IP rights on the Flash player core API. They never officially recognized Haxe. And they generally undermined a bunch of other projects that targeted their platform (Digrafa comes to mind).
They didn't come clean with major users of their technology, repeating "Adobe is eternally committed to supporting Flex" until they left it in the ditch and forgot all about it. They made it very, very hard to support them in whatever they were doing.
----
Bottom line, Flash could've been made to perform well on smartphones. It ran OK on what would today be called "feature phones" before smartphones existed (eg. it was available on Symbian), if you knew what you were doing.
It died because of piss-poor management on one side and monopolistic desires of mega-corporations on the other side.
> Not really... people made these claims without any testing, as per usual.
Come on… I was around at the time and played with it on phones that supported it. From a user’s perspective it was very bad. Some people desperately wanted it to happen and they had their reasons, but saying that I am negative without having tested it is pointless speculation, and wrong.
I worked for a shop that made Flash games for Symbian phones (i.e. old Nokias). That's a lot more resource-constrained environment than any of iPhone or Android ever were. And it ran fine, if you knew what you were doing.
When Android just appeared on the market, I worked for a company that was making a video chat Facebook app. It was written in AS3 and one of the main features was to apply various effects to video. We tested it on Android, and it worked fine, even though that's a very memory and CPU intensive app.
Really, Flash player was not the problem. It couldn't go toe-to-toe with native code, but optimized AS3 code would beat unoptimized native code.
It was some form of code-golf to write Base64 encoding in AS3 and benchmark it. Usually comparing to the implementation in Flex. When Flash Alchemy came out, I wrote a version of Base64 encoding that beat it something like 100:1. A friend of mine who was known by his forum / Github user name "bloodhound" (here's some of his stuff: https://github.com/blooddy/blooddy_crypto/tree/master/src/by... ) wrote a bunch of encoding / decoding libraries for various formats (he also improved upon my Base64 code). And these were used all over the place for things like image uploads / image generation online. This stuff would beat similar Java libraries for example.
Not sure if you remember this, but at one point in the past Facebook had a Java applet that they used to manage image uploads to your "albums". Later they replaced it by Flash applet. It didn't work any worse that's for sure.
----
The performance problems were in Adobe AS3 code, not the player. Flex was a very inefficiently written framework. And so were AS components. But if you take AS3 3D engines, even those that were fully on CPU... you had plenty of proper 3D games. Eg. Tanki Online (a Russian game made with Alternativa3D Flash engine) was a huge hit. Even if the phone could handle a fraction of that, you'd still have plenty of room for less complex UI.
iPhone didn't replace Flash: the intent was to be a smartphone, not a data distribution format...
The iPhone browser, like the macOS browser, dropped support for Flash (and most plugins in fact, but Flash was the most noticeable). On the other hand, HTML5 was adopted quickly by Apple. So we can say that HTML5 replaced Flash (not the iPhone per se: first, it didn't come with a specific replacement, and second, an alternative was already there).
However, I wouldn't say that HTML5 was a drop-in replacement for Flash. It did help later with some common use cases, through the video and audio tags and the standardisation of formats (which also killed off the QuickTime/WindowsMedia/RealMedia plugins).
I dunno what it is. After a little searching, I think you wanted to point out that game developers lost interest in it and migrated to the smartphone stores?
There was an absolutely excellent essay by James Duncan Davidson, creator of the Apache Ant build tool for Java, about why XML was its configuration language, and why it was perfect at first but grew to be a huge problem. To summarize from memory:
- Ant’s config needs were modest: configs were static strings, numbers, booleans, and lists
- An XML parser existed, was well tested, and was supported by lots of tools, so it was a quick and easy way to parse configs
- As Ant became successful, the configs grew more complex and started to need to be dynamic and have “control flow”
- Once control flow was needed XML was no longer the best choice for config parsing, but it was too late. (The correct answer is that a scripting language is what you want here.)
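Davidson's progression is easy to picture. A genuinely declarative Ant build file looked something like this (the target and property names here are invented for illustration):

```xml
<project name="example" default="compile">
  <!-- Purely declarative: string properties and a dependency list. -->
  <property name="src.dir" value="src"/>
  <property name="build.dir" value="build"/>

  <target name="init">
    <mkdir dir="${build.dir}"/>
  </target>

  <!-- depends="init" is already implicit control flow: an ordering. -->
  <target name="compile" depends="init">
    <javac srcdir="${src.dir}" destdir="${build.dir}"/>
  </target>
</project>
```

Once people needed real conditionals and loops, extensions such as ant-contrib bolted `<if>` and `<for>` tasks onto that same XML, which is exactly the accidental scripting environment Davidson describes.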
Edit: I posted this quote from his essay on HN back in 2020 [1]:
> Now, I never intended for the file format to become a scripting language—after all, my original view of Ant was that there was a declaration of some properties that described the project and that the tasks written in Java performed all the logic. The current maintainers of Ant generally share the same feelings. But when I fused XML and task reflection in Ant, I put together something that is 70-80% of a scripting environment. I just didn't recognize it at the time. To deny that people will use it as a scripting language is equivalent to asking them to pretend that sugar isn't sweet.
Edit 2: The reason you’re not seeing an “improved yaml” is because the improved version is just Python/Ruby/JS/PHP,…
I was the first person to use Ant outside of Sun. I committed it to the Apache cvs repo after James donated it.
The initial use for it was to build the source code for what became Tomcat (also donated by Sun/James). At the time, Java didn't really have a build system. We were using Makefiles for Apache JServ, and it was horrid.
Ant was a breath of fresh air. At the time, XML was the hot "new" thing. We were instantly in love with it and trying to get it to do more. Nobody could have predicted what Ant was going to turn into. It was effectively an experiment and just a step forward from what we had previously. Iterative software development at its finest.
Similar to how Subversion was a better CVS. At the time Subversion was being developed (I was part of that as well), nobody could predict that a whole different model (git) would be so much better. We were all used to a centralized version control system.
It is entertaining watching everyone bork on about this topic, 24 years later.
By the way, Java still doesn't have a good build system. Maven and Gradle, lol.
I didn't always hold this opinion, but currently I believe general build (and deploy, integration, etc) engines should never be anything more than a logically controlled workflow engine, capable of transparently querying your system, but never directly changing it.
It should allow arbitrary code to be executed, but beyond that it shouldn't provide any further features.
Ant is not like that. In fact, the Ant script was always a programming language. It lacked some flow-control features, but if you have a sequence of declarations that run one after the other, it's an imperative programming language already; you don't need loops and conditionals to qualify.
It's not about being perfect. It's about using the wrong tool for the problem.
Ultimately, code as configuration is the solution. The code generates the actual configuration-data which can be of an arbitrary format (e.g. json) since no one is gonna see it anyways except for debugging.
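A minimal sketch of that approach, with invented field names: the "real" configuration is ordinary code, and only the serialized JSON output is handed to the consuming program:

```python
import json

def make_config(num_workers: int) -> dict:
    # A loop replaces the copy-pasted stanzas a hand-written
    # config file would need for N near-identical workers.
    return {
        "log_level": "info",
        "workers": [
            {"name": f"worker-{i}", "port": 8000 + i, "retries": 3}
            for i in range(num_workers)
        ],
    }

# Nobody edits this output by hand, except while debugging.
config_json = json.dumps(make_config(3), indent=2)
print(config_json)
```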
There are usually two arguments against that:
1.) code can be or become too complicated
It's a non-argument because while code can and will be complicated, the same configuration in non-code will be even worse. In the worst case, people will build things around the configuration because the configuration isn't sufficiently flexible.
What is true is that a single line of code is usually more complex than single line of yaml. The same can be said about assembly vs. any high level language. But it should be clear that what counts is the complexity of the overall solution, not a single line.
2.) code can be dangerous
This is a valid point. You will have to sandbox/restrict it and you need to have a timeout since it could run forever - and that creates some additional complexity.
I think this complexity is worth it in 90% of the cases. Even if you are not using code to generate the configuration, you probably still want to have a timeout, just in case of some malicious actor trying to use a config that takes forever to parse/execute etc.
But if you say that this is a no-go, then at least use a proper configuration language such as Dhall instead of YAML. It already exists; you don't need to invent a new configuration language to avoid YAML.
I think it's more of a progression thing. The config file starts out pretty simple, enough where one of the text file formats is clearly the right solution and code is over-complex. But then it grows and grows, continuing to accumulate more depth and use more features of the format, and maybe some hacky extensions. At some point, it gets complex enough that, if you designed it from the start with that feature set, being code would obviously be the right solution. But now the transition over is hard to pull off, so it tends to get put off for longer than it should be.
For instance looking at kubernetes, do you really think they didn't know in advance how complex things would get?
My explanation is different: configuration files are often used for things close to infrastructure/operations, and those are done by folks who are already used to YAML and not as used to high-level coding. It's probably not a conscious decision by them; it's just what they believe is best.
K8S maintains the internal state of the system in JSON which is easy to convert to YAML which is seen as more readable in comparison. I think their choice to support JSON and YAML as an interface for configuring K8S is because of this.
I'd also guess that the expectation was that abstractions would form around YAML to make it more powerful. Helm uses Go templating language which supports logic and loops (but is very unpleasant to write complex configs with) and the operator pattern is also popular for more advanced cases. Ultimately both end up exposing some JSON/YAML interface for configuring, though. It is up to you to decide how you create that JSON/YAML.
They could have chosen Dhall as well, which also converts to json (or YAML). So I have to wonder: why did they choose YAML in the end?
> I'd also guess that the expectation was that abstractions would form around YAML to make it more powerful.
Yeah, now you have 3 layers: you have an arbitrary abstraction which is likely different for every tool, then you have yaml and then you have the internal json. That seems like a loss to me in terms of complexity.
Dhall was first released in 2018. Helm was started in 2015. Helm did flirt with using Jsonnet which would have been better, but I think they already had charts using templates.
Code also requires a runtime, or needs to be compiled to something static like YAML or JSON in order to be consumed. That latter option is essentially how things are today, given that templating tools that generate configurations are plentiful.
I think that this requirement makes broad adoption a lot more difficult for code-based configuration.
> That latter option is essentially how things are today, given that templating tools that generate configurations are plentiful.
Yeah, that's what I mean partially by "building around the configuration". How is that better than writing the configuration directly in code?
Obviously, if the configuration is described with code then a runtime is needed. But since the configuration is usually consumed by a program... there should be a runtime already. It's not like I'm saying that just any code/language should be available.
Any configuration format that cannot be auto-formatted (i.e. has significant whitespace) goes straight in the trash. 90% of my time spent with YAML is usually trying to figure out if something is indented correctly.
> Any configuration format that cannot be auto-formatted (i.e. has significant whitespace) goes straight in the trash. 90% of my time spent with YAML is usually trying to figure out if something is indented correctly.
What are you talking about, exactly? If I hit alt+L in IntelliJ, my YAML file is auto-formatted.
Indentation changes the structure of the data. There is no deterministic way to auto format YAML, you have to make sure, as a human, that everything is at the right indentation level
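A concrete illustration (the field names are made up): these two documents differ only in the indentation of one line, and both are valid YAML, so a formatter cannot know which one you meant:

```yaml
# Document 1: "replicas" is a property of the service.
service:
  name: api
  replicas: 3
---
# Document 2: the same lines, but with "replicas" outdented
# two spaces it becomes a top-level key. Both parse without
# error; only the resulting structure differs.
service:
  name: api
replicas: 3
```

Compare JSON or XML, where a mis-indented line leaves the structure unchanged and a missing brace or closing tag is a hard parse error.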
It can be deterministically auto-formatted as long as you respect the format, like with everything. If you wrongly indent a line then it’s your fault and the auto-format can’t know if some indent is wrong or not. I don’t see what’s wrong here; it’s like adding a superfluous } in JSON or not closing a tag in XML.
Because braces delimit entire blocks of code, whereas leading whitespace only delimits a single line of code.
It is very easy to stick your cursor between two braces and start typing or paste the clipboard. It is easy to see exactly which depth the added content will exist in. If you'd like a little breathing room, you can put some empty lines between the braces and insert there. Either way, once you (auto)format it'll be fine.
But if whitespace is significant, it's not enough to place your cursor in the spot where the content — the whole block — belongs because there is no one spot where the content belongs. Rather, each line of the content you wish to insert must be placed individually. When pasting a block, placing the cursor at the correct spot before the paste is no guarantee that all of the content will be indented correctly; you can only guarantee that the first line is correct (and even then, doing so might require taking into account any leading whitespace of the copied content). Subsequent lines might need to be adjusted to retain the correct indentation relative to the first line — or not! It depends on your particular scenario.
But with braces, there really is no “it depends”. Place cursor, put text. It just works.
> And if you still think it's bad anyways, you can always improve on it. But very few actually want to put on the enormous amount of work needed to improve YAML or create a new language.
The issue is less about putting in the work to create a new language; it's about convincing a significant chunk of an ecosystem to use that new language.
> IMO if there was something that was substantially better, we would see projects switching to it in a heartbeat.
Switching costs are non-zero and can block a switch.
ie: Helm charts are built on YAML and gotmpl, even though JSON is also an option. I can make a fairly compelling argument that Helm would be better with JSON and JSonnet, but that ignores a huge amount of investment in existing Helm charts. Whatever benefits are there would be swamped by the cost in terms of additional complexity and potential porting costs.
This is spot on. My brief exposure to Jsonnet made me realize how superior it can be, especially for k8s manifests. Unfortunately the learning curve is steep, to the point that I've held off on attempting to introduce it anywhere new.
Recently I discovered kpt (https://kpt.dev/), which attempts to improve the k8s manifest situation and seems to have a lot of good ideas. Considering how long it's been under development though, it may also never catch on among mainstream k8s users.
Grafana recently switched their dashboard DSL to one based on Cue. I haven't yet dug into learning it, but it seems potentially even more powerful than Jsonnet. They also have the advantages of a much smaller audience (relative to k8s manifests) and of putting in the work up front to define the new language and a sample implementation (useful out of the box), not to mention making it a requirement going forward.
I suppose where I was going with all that is: hopefully, as people are exposed to superior solutions within a smaller context, they might be open to considering similar alternatives for other purposes.
> But very few actually want to put on the enormous amount of work needed to improve YAML or create a new language.
No need to create a new language: S-expressions have existed for longer than I’ve been alive.
> IMO if there was something that was substantially better, we would see projects switching to it in a heartbeat.
I disagree. Path-dependence is a real issue, as are local maxima.
> the effort to make any kind of switch so you can shorten Norway
It’s not so that one can shorten Norway; it’s to avoid silent errors (like the Excel-gene issue). Fortunately, this particular bug no longer exists in the current YAML spec.
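For anyone who hasn't been bitten by it: the "Norway problem" comes from YAML 1.1 resolving several bare words as booleans. A hypothetical list of country codes:

```yaml
# Under a YAML 1.1 parser, NO resolves to boolean false
# (as do yes/no/on/off), so this list silently becomes
# ["DE", "FR", false, "SE"]. YAML 1.2's core schema only
# treats true/false as booleans, which fixes this.
countries:
  - DE
  - FR
  - NO
  - SE

# The defensive habit everyone learns the hard way: quote it.
countries_quoted: ["DE", "FR", "NO", "SE"]
```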
You say "S-expressions" like it's a standard, but it's really hundreds of different standards. Maybe someone needs to create nice websites for SON (scheme object notation), RON (racket object notation), ION (IMAP object notation), CLON, CLJON, ELON, R7RON (large and small), and the rest.
> IMO if there was something that was substantially better, we would see projects switching to it in a heartbeat.
This weird line seems to govern a very persistent & substantial minority of thinking in tech that I've never been able to grok. It's like a sort of "natural selection" of quality - not only have I never seen it apply in real life, there's a reasonable argument that the opposite applies (a topic on which there's been plenty written). I guess it's based on some imagined model of each individual being an intentional & fully informed (& objectively correct) decision-maker in their own tech usage; it's certainly not based in experience.
I use YAML in work. A lot. It may be the filetype I interact with most. Does that make me an advocate?
I use it because the tools I interact with default to it - I have not actively selected each of these tools; of the subset I have chosen myself, YAML was one of the cons in that selection process.
JSON is imperfect. XML is imperfect. TOML is imperfect.
YAML is not imperfect - that implies something approaching quality.
I don't think using it makes you an advocate. I totally understand the point that most of the time we're not choosing it. It has been chosen for us in one way or another and that sucks.
My whole point is that we should focus on the projects using YAML instead of the format itself. Complaining about YAML at this point is like kicking a dead horse. We know it's not ideal, but saying it's crap is not gonna change anything.
That's true, & at the end of the day people use YAML for a reason, one that the alternatives fall down on: it's the same reason people use Markdown (which, despite having similar drawbacks to YAML, has become even more ubiquitous). Ultimately, YAML & Markdown are good for one common reason: human write-ability. A lot of people think it's about readability, but it isn't: readability may be a pro in terms of quality, but it's write-ability that actually drives adoption. YAML has pretty bad readability in reality (people think it's good because they confuse readability with aesthetics), in no small part due to its non-standardised indentation (& also, in smaller part, due to its ambiguous data types).
JSON is probably the most readable of all of them (readability isn't about being terse or "clean"), but the requirement for terminators makes it much less write-able. XML is worse again in this regard.
TOML is a kind of middle-ground, adopting many of YAML's flaws but fixing some of its most egregious faults. Personally though, I think it seems like an odd attempt to resurrect INI, & I'd prefer something like StrictYAML[0].
I would prefer almost any of the alternatives you mentioned to yaml for most places it is used. Except for json, due to lack of comments.
Although for things where you have ifs, loops, etc., like CI pipelines and Ansible, it really should be a fully featured language instead of defining a new mini-language inside of YAML.
Yes, none of those are perfect, but that doesn't mean none of them would have been a better choice.
Yeah... this is like saying that murdering all people in line to buy ice-cream and waiting for your turn to buy ice-cream have pros and cons: either you get to buy ice-cream faster, but some people die, or you have to wait, but nobody dies.
YAML is trash. There's no pros to using it. Not in any non-preposterous way.
> If it was that bad, nobody would use it.
If fossil fuels were so bad for the planet, nobody would use them.
If super-processed fast food was so bad, nobody would eat it.
If wars were so bad, nobody would fight in them.
Do you sense where the problem with your argument is, or do I need more examples?
I'm the first person to dislike YAML, but I also like it better than the other popular alternatives. JSON sucks to write and read, but is great for machines. TOML sucks beyond simple top-level key-value entries; INI is even worse. XML reminds me of a thorn bush (but I can't explain this). Nobody on the team would want to learn to read Dhall.
YAML has some "cute" features that are annoying (but anchors are nice).
Cue/Jsonnet are a nice compromise between Dhall and JSON/YAML, but I don't see them reaching wide adoption (I'm hopeful it happens, but alas).
Aside: I wonder how people develop such an extremist view regarding something as mundane as config languages to equate them with casual murder
Config languages and build systems are both in a domain of maximum annoyance and minimum intellectual satisfaction.
Within the realm of software.. casual murder is wrong and all but it would be config languages driving a build system that drove me to it if anything could.
That's an overblown reaction: comparing yaml use to murdering people? Come on.
And your examples are flawed.
> If fossil fuels were so bad for the planet, nobody would use them.
It's not the planet that's using fossil fuels.
> If super-processed fast food was so bad, nobody would eat it.
It's probably not as bad as you make it out to be, and another thing: food costs money. People choose with their wallets, whether you like it or not. Processed foods are usually cheaper. YAML, JSON, XML: they all cost the same; their indirect costs are really hard to measure.
You are very naive if you believe that. This can only work if people meet the following requirements:
* People are rational and will always, or at least often enough, make the best choice.
* People are egotistic and will only do what's best for them, they don't consider the benefits of others.
* People know everything there is to know about nutrition, they can deduce the short term and long term effects of consuming any quantity of any chemical in any combination with any other chemicals based on precise knowledge of their gut functionality, the bacteria that lives in it etc.
* People should be able to predict the future, at least enough for them to be able to make rational choices about the effects of their current actions. In particular, they should be able to predict famines, invention or hybridization of species of various agricultural plants, developments in pharmaceutics helping them to combat various eating-related health problems.
----
People don't choose. They happen to eat super-processed foods, drive energy-inefficient cars, take out mortgages they cannot pay etc.
Finally, eating from a dumpster is even cheaper than eating super-processed foods (whose price can very much depend on the scale of manufacturing rather than anything else). For some reason most people choose not to eat from a dumpster...
When people say "it's bad for the planet", we need to assume that "it's bad for the humanity". The planet does not really care about CO2 emissions, humans suffering/dying from them probably do.
> If fossil fuels were so bad for the humanity, nobody would use them.
Where do you see me comparing YAML to murdering people? Please read that place again. I compare murdering people to the inconvenience of having to wait in a queue.
You're not bringing any arguments yourself. "There are no pros to using it" has got to be the most lazy thing I've read today.
Are we to assume that when the Google folks working on Kubernetes chose YAML they had a very brief moment of insanity? I can clearly see you hate it, but that's not an argument against it. Same for every other project that ever used YAML? Come on, that's a leap if I've ever seen one.
Those examples of yours are, again, very lazy. I'll take one of those just to prove my point: fossil fuels have bad consequences, especially with overuse, but they of course had their pros, otherwise nobody would ever have used them.
Take a deep breath mate. This is a discussion about YAML. Nobody is getting murdered.
You'd need to search through my post history. The arguments are long and numerous. I'm not sure I want to repeat myself again. I promise to do the search part though. So, if you have patience, you can wait and I'll hopefully find something, but cannot promise to work fast.
PS.
> Are we to assume that when the Google folks working on Kubernetes chose YAML they had a very brief moment of insanity?
I am "Google folks", so what? I just came from a department unrelated to Kubernetes (and I don't work for Google anymore).
Kubernetes is an awful project inside and out. Poorly designed, poorly implemented... It owes its popularity to the lack of alternatives early on. (Docker Swarm never really took off; by the time they tried, Kubernetes advertised itself as having way more features and way more integrations, even though the quality was low.) Kubernetes fills a very important niche, and that's why it's used so much. It would've been used just as much if you had to write charts in any other similar language, no matter the quality or the popularity of that language.
For the purpose of full disclosure: I worked at Elastifile, some time before and after acquisition. In terms of development of Kubernetes, I have nothing to do with it. We might have been among the early adopters, but that's about it. At my first encounter with Kubernetes I knew about it just as much as any other ops / infra programmer who'd be tasked with using it outside of Google (we actually started using it long before the acquisition).
I do, however, greatly regret that Kubernetes exists. I cherish the idea that instead of writing non-distributed applications and then trying to stitch them using the immense bloat that is Kubernetes the industry will turn to a framework that enables making applications distributed "from the inside" (like Erlang's OTP). I see Kubernetes as a "lazy" way into the future, where we waste a lot of infrastructure to cover for lack of talent and desire to learn how to do things right.
For me, Kubernetes sits in the same spot with planned obsolescence, unnecessary plastic bags and cigarette butts in the ocean. It's the comfort of today at the expense of the much greater struggle tomorrow.
Interesting comment. I agree with pretty much all of it.
Although in fairness, I never claimed YAML was good in any way, shape or form. My comment was always about the pointlessness of it all. Either we get projects to not use YAML or we're yelling at the void.
Given how passionate you are about this though, I am curious what would you suggest instead of YAML?
I agree. That post about all of the pitfalls of YAML was very well written. My two cents: The best config format that I have used is JSON with C & C++-style comments allowed. Sure, dates and numbers (int vs float) are still a bit wobbly, but it is good enough for 99% of my needs.
> If fossil fuels were so bad for the planet, nobody would use them.
If super-processed fast food was so bad, nobody would eat it.
If wars were so bad, nobody would fight in them.
Do you sense where the problem with your argument is, or do I need more examples?
You mention these things as if they had only downsides. They do not. Fossil fuels have lots of advantages: high energy density, good storability, easy transport. Otherwise we would not have any problem getting rid of them.
Same for ultra processed food: it is cheap to make, addictive, has long shelf life so can be easily stored and shipped around the world, and all of that makes them very profitable.
Same for wars: some people or entities (companies, countries, etc) do profit from them (or at least hope or plan to profit).
So yeah, even things you find personally disgusting have some purpose, at least to some people. It’s also the case for YAML, as for all things. Comparing YAML to fossil fuels or wars is unhelpful hyperbole. We should be able to take a deep breath and discuss these things rationally like adults.
The argument was much more simple: popular doesn't mean good (for whatever metric you are using to measure it).
No need to look for reasons why something bad is also good. That's not the point. Hitler was a vegetarian and had some other commendable character traits. None of which make him a good person.
YAML is bloody terrible, and I've hated it since the first time I was forced to use it.
JSON or TOML is always preferable, depending on the use case. If you want humans to be able to change the file, use TOML. If only machines (and programmers while debugging) need to see the file, use JSON.
YAML is in this horrible in-between where it's extremely unintuitive at first glance and it's really easy to break. The only positive aspect to YAML is that it's legible. But legible is not the same as understandable. TOML is just as legible, but it's also easy to understand and change. JSON is hard to read and hard to understand, but the structure is very durable.
I'm going to take a guess here and assume that there's a strong correlation between people who like Python and people who like YAML.
I don't know if direct code would solve any problems of YAML, the problem is you're configuring a system that isn't runnable locally, and by the nature of that, it doesn't matter if that system is configured in js, ruby, python, brainfuck, html, or yaml, without a validator you're screwed, and if you have a js validator, no reason you can't have a yaml one.
I personally think YAML has a lot of bad flaws; for one, it doesn't have a schema definition the way JSON or XML do, so you can't just say "write a YAML file that conforms to this schema" and boom: autocomplete, self-documentation, etc.
That's about where I landed. "If only we had the ability to run a miniature, totally unscalable, but testable architecture locally, it'd solve all my challenges of administering a system."
This is just an indicator that what we have now isn't the best, it's just the one that functions. Kinda like humans, we're not the best biological organisms, we're just the ones that ended up winning.
Validation is pretty solvable though. Validate your configuration schema once it’s in memory. This is what we do for our yaml configs, it takes maybe an hour or two to write something from scratch. Sure it’d be nice to have it as a core feature, but it’s not a difficult add on.
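A from-scratch validator in that spirit really can be small. A sketch with invented field names (a real one would also report the config file's path and collect richer context):

```python
def validate_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems; empty means valid."""
    errors = []
    if not isinstance(cfg.get("name"), str) or not cfg.get("name"):
        errors.append("'name' must be a non-empty string")
    port = cfg.get("port")
    # bool is an int subclass in Python, so exclude it explicitly --
    # otherwise a stray YAML `true` would pass as a port number.
    if isinstance(port, bool) or not isinstance(port, int) \
            or not 1 <= port <= 65535:
        errors.append("'port' must be an integer in 1..65535")
    for i, host in enumerate(cfg.get("hosts", [])):
        if not isinstance(host, str):
            errors.append(f"hosts[{i}] must be a string, "
                          f"got {type(host).__name__}")
    return errors
```

Running it at startup, right after parsing, turns a latent misconfiguration into an immediate, readable failure.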
Probably because of things like Github Actions, which uses YAML. The correct thing to do is use the .yml (like 15 lines) to execute a script of some kind, that does the real work, and can more easily be tested, changed, and used on its own.
But many people instead start writing more and more in the yaml files to do what they want, and they're left with a mess that cannot easily be run outside of github actions, so then they test it less, or other teams build parallel processes to do the same things but locally, etc
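The thin-workflow pattern looks roughly like this (the workflow and script names are made up); all the real logic lives in a script you can run and test locally:

```yaml
# .github/workflows/ci.yml -- the workflow stays a dumb trigger.
name: CI
on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Everything interesting happens in an ordinary script,
      # so the same thing runs locally: ./scripts/ci.sh
      - run: ./scripts/ci.sh
```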
> Not sure I see the point in hating on one specific configuration language. If it was that bad, nobody would use it. And if you still think it's bad anyways, you can always improve on it. But very few actually want to put on the enormous amount of work needed to improve YAML or create a new language.
The point is that this format is like a virus. It doesn't need to be improved. It needs to go away. We already have a lot of great serialization formats that have all the use-cases covered. Whenever someone chooses to serialize configuration into YAML, they are unleashing hell upon all of the humans that have to use it.
Let me put it this way. If I was designing some software to sell to you that requires a config file, why would I skip over TOML (INI), JSON, or XML to pick YAML -- a brand new format that has a lot of funny rules that take training to understand? I would be subjecting you to confusion for no good reason. If I'm designing the software for a developer, I'd use JSON or XML. If I'm designing for a non-dev, I'd use TOML. If I'm intentionally trying to cause suffering, I'd choose YAML.
Edit: I wrote this comment in a way that makes it sound objective. Programmers like to think our arguments are all objective. Truthfully, I think the hate train is really about that fact that YAML seems to be very divisive -- some people really like it, and some people feel very frustrated with the format (like me). When those of us who don't like the format encounter some system, tool, or package that relies on YAML serialization, we lose our sh--. We feel that we're being subjected to a difficult and silly new format for reasons that seem entirely arbitrary.
YAML was first released in 2001, same year JSON was formalized, so if we're going to reject a language for being "new", we probably shouldn't use JSON either. It supports comments, and doesn't use commas for lists, so you don't have the trailing comma problem for diffs. It's far from perfect, but let's not lie and pretend it kills babies or something.
XML is perfect for everything in theory. Too bad the average programmer apparently has such an issue grokking a little complexity, adjusting their eyes to the sight of brackets, and configuring their parser correctly that we all had to throw our hands up and use dumbed down formats instead.
One of my big drivers away from XML is that processing used to be insanely expensive. Over the years this has improved some, but XML parsing is still significantly less performant than the other options, especially JSON or YAML. It used to be a few orders of magnitude more expensive on compute to read, and that's why many folks in the industry were happy to move on, especially for cases like message passing. I've pulled XML out of a few apps with a lot of message passing over the years, and the throughput and performance improvements were measurable and significant. For that use case Protobufs are much better than XML and have a schema, though it's a binary format, so not usable for config files.
Yes, that's very true. I actually have used JSON representation at previous gigs, I just wasn't up to adding the extra qualifiers. Whenever I've needed to use Protobufs, I've always found them very nice to work with, really. I certainly have a much warmer opinion of them than XML.
Actual angle brackets look like this: 〈〉 A bracket should visually surround the text it contains, otherwise it looks confusing. The less-than and greater-than signs are only used due to ASCII limitations, though I'm wondering why they didn't go for square brackets, which are available in ASCII.
> Not sure I see the point in hating on one specific configuration language.
Point is, most devs do not control what their employers mandate they use. Ranting a bit helps them release some frustration. And how can one improve on it? It is not an individual app where I can change or upgrade a bit and be happy.
> But very few actually want to put on the enormous amount of work needed to improve YAML or create a new language.
So some people did an awful amount of work to come up with an awful config language, and everyone is supposed to feel thankful for it. It doesn't seem right to me.
I wonder if its usage in Puppet is different enough to be a factor... But the introduction of Hiera to Puppet was a godsend (to me) and since then I've liked it (or at least put up with it). As long as it's only used as a way to hold data, and not the code (like Kubernetes), then I'm ok with it. If I have to use a DSL like Puppet or Terraform, I much prefer to use Hiera to perform context-aware data lookups so I can write environment-agnostic infra code (with zero hardcoded values) that iterates over data. I'm sure Hiera could have been implemented in another markup, but I wouldn't like to force my other choice on anyone. YAML'll do.
The worst part about that is not having the symbol type to use in JSON, so you must quote every symbol (or make your language incompatible with standard JSON).
We most certainly have a problem with our society: the internet has turned from being a place of information and sharing into a place for boldly whining about things that impact nobody. This is the 6th time over the course of 5 years someone has shared this link on Hacker News. We get it, tools aren't perfect.
Do we have mods on HN that can get rid of these dupe posts?
It got shoved down people's throats more. Probably because of kube.
The same thing happened with XML. First people liked it then it became a monster. JSON and YAML came to rescue us. As YAML grew in use, its warts become more evident, so it became the new monster. Time for a new language to come to the rescue. The cycle repeats...
>if there was something that was substantially better, we would see projects switching to it in a heartbeat.
Both of these takes are extremely naive, a person’s choice of tools has almost nothing to do with the tools and everything to do with the person. Psychology governs human behavior, not logic
As it's used in CI/CD systems, it's almost as if people are making an ad-hoc, buggy, non-portable version of Make.
I would actually love an Xmonad style configuration system for this use case. Use an actual programming language's much better grammar and resolve errors at compile time.
JSON is not perfect either but hey, you can write just the data no whitespace and everything is fine! In YAML you need to first write the data, then find the whitespace error. The next logical thing is to use handwriting for configuration files, no technology at all
> But the fact is that most times the difference between them is not substantial, so the effort to make any kind of switch so you can shorten Norway is simply not worth it.
Switching from YAML 1.1 (2005) to YAML 1.2 (2009) for that reason makes perfect sense, I think.
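The "shorten Norway" problem the quoted comment alludes to is easy to reproduce with PyYAML, which implements YAML 1.1 implicit typing; a YAML 1.2 parser (for example ruamel.yaml in pure 1.2 mode) keeps `NO` as a string instead:

```python
import yaml  # PyYAML follows the YAML 1.1 spec

# Under YAML 1.1, the bare scalar NO resolves to a boolean, not a country code.
data = yaml.safe_load("countries: [NO, SE, FI]")
print(data)  # -> {'countries': [False, 'SE', 'FI']}

# Quoting opts out of implicit typing, so this is the usual workaround.
quoted = yaml.safe_load("countries: ['NO', 'SE', 'FI']")
print(quoted)  # -> {'countries': ['NO', 'SE', 'FI']}
```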
> But very few actually want to put on the enormous amount of work needed to improve YAML or create a new language.
Actually a lot of people do work on creating new languages, but I think language nerds create languages for other language nerds, not really for the bulk of people who need something simple. They always want to make them functional and immutable or self-hosting or build high performance webservers with them. Things that nobody wants to do with YAML.
What we need is a configuration language with the simplicity of Logo or the C that was in the old ANSI C Book.
So YAML is the most flexible and bastardizable markup language and it gets used for configuration as well, even though it is terrible.
Well, maybe they think that it's worth "hating" on it because people are using it, and they want to make people aware of the problems? I know I do, though "hating" sounds like I have malicious intent, which I do not. I am sure the creator(s) are very bright people with nothing but good intentions. But I agree with the article that YAML is frustrating to work with. The "it's not perfect, but neither is anything" argument is a bit of a cop-out in my opinion, as that can be applied to anything and everything.
I feel (but don't know!) that YAML was inspired by markdown as an attempt to create a format that felt like the most natural and intuitive to read for humans, while still machine-consumable. A noble idea, but in my opinion, one that fails as soon as you have more than half a page of configuration. Then, it just becomes a pain to even figure out which parent a specific bullet belongs to. And that's not even getting into all the cleverness.
I don't want to create a new language because [XKCD standards comic]. I'd prefer people use JSON or TOML as I consider those better even if they have plenty of issues on their own.
YAML in its most basic usage is OK. You need to watch the indentation, and that's mostly it.
YAML using all the bells and whistles that you had no idea were even part of the spec (e.g. Anchors and Aliases) is terrible, hard to read and harder to edit.
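For readers who have never met those bells and whistles: anchors (`&`), aliases (`*`), and the merge key (`<<`) let one mapping inherit from another, and PyYAML's `safe_load` honors all three. A small illustrative document (names like `defaults` and `job_a` are just made up here):

```python
import yaml

doc = """
defaults: &defaults      # '&defaults' names this mapping as an anchor
  retries: 3
  timeout: 30
job_a:
  <<: *defaults          # merge key pulls in the anchored mapping
  timeout: 60            # local keys override merged ones
"""

data = yaml.safe_load(doc)
print(data["job_a"])  # -> {'retries': 3, 'timeout': 60}
```

Handy for deduplication, but as the comment says, it makes large files much harder to read and edit, since you have to chase anchors around the document to know what a node actually contains.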
Could just be one of those things where people have grumbled about something for a while, and when there's a sudden outcry that goes viral, brings the scale of the discontent into sharp relief.
Everything that is intensely loved will eventually be intensely hated. My theory is that it attracts a type of people with borderline personality, who intensely love something then after a period of disenchantment in which they fight, reject and ignore evidence that their passion is not flawless, they abruptly switch to hating it. They’re also typically the loudest voices in a community as they always feel intensely about everything, whether positive or negative.
As a dev at a company that almost exclusively uses HOCON for application configurations it's really sad that it doesn't have a bigger audience. I guess Lightbend is mostly to blame for that.
We used it in our python projects as well for a little bit until we hit a bug with how pyhocon handles certain references and we just switched to using the java implementation to serialize configs for python apps into JSON...
JSON and XML are not perfect but are simple formats with simple rules. YAML, on the other hand, is not simple. The specs are so baroque that JSON is part of it.
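"JSON is part of it" is literal: the YAML 1.2 spec is (with minor caveats) a superset of JSON, so a YAML parser will happily load a JSON document unchanged:

```python
import yaml

# A plain JSON document is also valid YAML.
data = yaml.safe_load('{"ports": [80, 443], "tls": true}')
print(data)  # -> {'ports': [80, 443], 'tls': True}
```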
That's my problem with YAML. Other people may have other problems, I understand that.
It's nice that you believe that people tend to use good things and abandon bad things. I wish the world worked like that. But it doesn't.
You wrote a lot of words but said nothing. My argument is that picking only on YAML is useless because we can find faults in all of them. There's no perfect choice, just trade-offs.
Something being suboptimal (JSON not allowing comments unless it's JSONC) is very different from the trash heap of poor YAML design decisions shown in this article.
Isn't this confirmation bias at work? These are poor decisions of different degrees, but they're still simply poor decisions.
I have no horse in this race. I suffered with the shortcomings of all of these formats. So I don't see a point in saying "this one is in a different category of bad". YAML was built with different ideas in mind. If we are so adamant on hating on something, we should hate the projects that chose YAML over something else.
I, the other poster, and the article writer feel a preponderance of poor decisions regarding YAML (mentioned in the story) make it much worse than a few poor decisions regarding JSON or similar.
No it's not confirmation bias. The linked site has TONS OF EXAMPLES of why. XML is verbose, that's pretty much the only problem. JSON is simple, that's its only problem. Those are TINY problems.
Not my comment but I would agree that "If it was that bad, nobody would use it" is a weak argument - there is a lot of bad things around us which we have to use. It should not stop us discussing that they are bad.
This is unquestionably hilarious, and there are for sure some questionable "features" of YAML. But all this YAML hate recently seems to be hate that would be better directed at the garbage CI/CD toolkits that come with things like GitHub, Azure DevOps, etc.
YAML is good for human-readable configuration, with some notable caveats. But it's terrible for being a secret programming language where the only way to execute it is to open a PR, commit to a live branch, etc.
> YAML is good for human-readable configuration, with some notable caveats. But it's terrible for being a secret programming language where the only way to execute it is to open a PR, commit to a live branch, etc.
Bingo, people keep implementing incredibly complex DSLs that are based on YAML, and users keep shoving more complexity into that YAML since it's the only tool they have to interface with $THING, and all of a sudden nobody can comprehend what is actually going on just by reading it.
It's not a problem of YAML per se (another format would just have its own footguns / pain points), it's a risk with any DSL: you can create something so complex that it's actually less clear and harder to manage than it would have been with a normal programming language from the start.
> users keep shoving more complexity into that YAML since it's the only tool they have to interface with $THING, and all of a sudden nobody can comprehend what is actually going on just by reading it.
code monkeys do that.
I like YAML for exactly what you describe but people abuse it then complain it sucks.
No 1000 ways to say true/false. No 1000 ways to have strings.
No significant whitespace in config files (also helps out with the 1000 types of YAML strings).
Etc.
A subset of YAML that drops all that garbage.
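One way to sketch that "subset of YAML" today, without writing a new parser: with PyYAML you can subclass `SafeLoader` and reroute the implicit bool/int/float/timestamp tags to the string constructor, so every plain scalar stays exactly as typed. This is a hypothetical restriction layered on top of PyYAML, not a standard feature (projects like StrictYAML take a similar stance):

```python
import yaml

class StringyLoader(yaml.SafeLoader):
    """Restricted loader: scalars are always plain strings, never
    silently coerced to bool/int/float/timestamp."""

# Redirect the implicitly resolved tags to the string constructor.
for tag in ("tag:yaml.org,2002:bool", "tag:yaml.org,2002:int",
            "tag:yaml.org,2002:float", "tag:yaml.org,2002:timestamp"):
    StringyLoader.add_constructor(tag, StringyLoader.construct_yaml_str)

config = yaml.load("answer: no\nmode: 0644\nratio: 1.10\n",
                   Loader=StringyLoader)
print(config)  # -> {'answer': 'no', 'mode': '0644', 'ratio': '1.10'}
```

The application then converts strings to numbers or booleans itself, where it knows what type each key should be.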
And yeah, the CI/CD stuff should also use... a sandboxed programming language.
I think Lua and Tcl can remove functions and language constructs in a sandboxed environment. Pick one of those, or improve it slightly (make Lua 0-indexed if you so care), and then use that.
How's this supposed to make any difference about YAML (or this Restricted YAML) being used for things it's not supposed to be used for?
It's not a problem that $CI YAML configs have multiple ways to encode strings and need indentation. Yes, the "no" issue sucks a bit, but it's not that huge, just a minor papercut. The giant root issue is that YAML is simply not an appropriate format for this kind of stuff. People are writing more and more complex programs (a modern CI pipeline is a program*) in it, when it's not a [suitable] programming language.
And it's not suitable not because of whitespace. Syntax doesn't matter that much (unless you're a junior developer first time seeing APL - just kidding, of course), it's all about the semantics. YAML was never meant for writing programs, it doesn't have necessary constructs to support those (but people are inventing DSLs on top of it, and it looks and feels awful).
Similar logic applies to some orchestration systems, some IaC stuff, and other areas where YAML is not a great fit, because the systems either "outgrew" simple configurations into something else (like programs, but not always) or YAML was a bad fit from the very beginning. CI is just the most obvious example.
*) It made sense when CIs were simple, and those YAML files were configuration snippets rather than programs. Just like Makefile makes a lot of sense, when it's able to do all you need. But CI grew more and more complicated.
But then I also have to make sure python and the dependencies are available in all the places. Not an ideal situation for git hooks. What do you do when you bump the version in one branch and then switch branches?
You missed the "/s" bit, I presume? But you certainly have a point.
If JSON still fits somewhere - that's nice, it means the issue is just about YAML syntax (which is a matter of personal preference, really; but I thought we left all the entertainment of putting Pythonistas and Perl Hackers in a virtual ring in the past decade, somewhere next to /.'s grave).
If JSON feels worse - it means YAML's syntax is not a problem, it's a symptom of YAML being misplaced to express something other than a simple data structure.
No /s, either JSON is OK for you or it's not, and if not a slightly nicer looking but way more error prone JSON doesn't solve your problem.
1. If you're making a user-facing program you give them a nice settings menu and store the config as whatever.
2. if it's developer-targeted software, and your config starts approaching a scripting language you don't encode arbitrary computations in YAML (or XML, Microsoft), but rather let the user use a scripting language and preferably you don't invent a new one.
We always cry about fragmentation and complexity in software and yet even in the rare cases when we have universally agreed upon formats/standards we will still go out of our way to make a new one for the amazing benefit of significant whitespace.
I agree, as the maker as a tool that lets you program your CI with a real language instead of a terrible DSL, our first instinct was to bash YAML itself for cheap marketing points, but what you’re saying is exactly right: YAML is not perfect but it gets the job done, as long as it’s the right job. Pseudo-programming a CI pipeline is decidedly not the right job.
Enter Lowdefy - build business web apps in YAML - and it's fantastic! Also, founder here, so... Honestly our team is writing more YAML than I think anyone, and we, including junior devs, are loving the experience. Check it out!
I'm curious: on average, how many hours does one typically spend directly interacting with YAML on a *yearly* basis?
I'm not sure my experience is typical, but the creator of this page probably spent more time dealing with YAML while building said page than I've spent working with YAML in MY ENTIRE LIFE.
I get it, the language has its flaws, and they are apparent even with my limited exposure to it, but I don't think YAML would constitute the major part of anybody's workday. People probably spend more time and energy hating on YAML than actually working with it.
You would be very surprised. Infra people can very easily spend the vast majority of their (non-ops) time in the REPL from hell, manually editing static configuration files, visually inspecting (because there is rarely any local let alone static validation), pushing, waiting for minutes, and then trying again.
As an infra team we had this same problem. We also had a lot of different configuration types that were required for each tool, often several types being mixed within the same repo. After a couple of issues with syntax errors making it further down the release than we would have liked we ended up building a single tool to validate all of our configs rather than using a bunch of separate tools. We added it as a quality gate in our pipelines which really helped. We open-sourced it for anyone else going through the same issue: https://github.com/Boeing/config-file-validator. It's written in go and runs on Windows, MacOS, and Linux.
I think that's precisely the problem with YAML. By being so unpredictable, it turns from "format I use without thinking much about" into "format that makes me spend a lot of time finding out what's wrong this time".
Maybe you and I haven't been bitten personally by YAML idiosyncrasies yet, but just learning about examples like the "Norway problem" and the "negative GPS coordinates bug"—and thinking how frustrating it must have been to figure those out in the first place—makes me shudder.
Not too many hours in the grand scheme of things, I guess, but I spend more time than I'd like mucking about with package.json files (JSON) and GitHub Actions workflows (YAML). The JSON files feel vastly less painful.
YAML is a data serialization format that was never intended to be written by hand. It was intended to be written by programs serializing their data, and read by a human.
This is the reason so many people don't like YAML. They don't know how to use it right, and they keep fucking it up, getting pissed off, and not learning their lesson.
Don't write YAML by hand and it works fine. And maybe read the spec...
While we're on the subject - Did you know you don't have to pick from 3 existing formats? You are allowed to make your own new configuration formats. And you are allowed to make them not be some perfect universal solution for all configuration needs for all programs.
Back in the day, every program invented its own configuration format, with features designed just for what that one program needed to do. They were ugly and not based on any kind of "principles" or "best practice" or trendy bullshit found on a social media site for nerds. It was just whatever the user found easiest that worked. No standards, just a wonky configuration that worked great for that one program. And guess what? All those weird formats, they are still around today, because the programs are still around, because they are useful and work.
Get off of HN and make something unique that you find useful. Don't ask people on here if it's good, just use it. If it works for you, it's good enough.
It looks easier than JSON to write by hand. JSON at least makes it known that you'll have a better time generating it than writing it.
It’s like doors. Some open both ways, some have a handle you grab and pull toward you, some have a plate for you to push on.
And some (like YAML) have a plate on the side you need to pull, and a handle on the side you need to push.
YAML was always intended to be written by hand. It was originally an alternative to XML for "markup language" although it isn't one. It was used for serialization, but JSON quickly won in that realm. But YAML was used for configuration files from the start. A good example is the quoting of strings, that wouldn't be optional in serialization language.
No, it was intended to be an alternative to XML. But it was the renamed to YAML Ain't a Markup Language, because it's really intended to be used for data, not human-centric markup.
YAML uses hard data typing, and types are inferred (though they can be explicit). When a program serializes data, it does not add the explicit types. Therefore, for a human to edit the file, they would need to have the entire YAML parser's semantics in their brain, or add the explicit types and hope the YAML parser they are using will re-serialize them (most are poorly implemented and won't).
Therefore it is virtually impossible for a human to write YAML correctly by hand. Its only reliable use is as a data language for programs to serialize data. It just happens to be easy to read.
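The "parser semantics in their brain" point is concrete: under YAML 1.1 implicit typing (as implemented by PyYAML), the inferred type depends entirely on how a scalar happens to be spelled, which is exactly what trips up hand editing:

```python
import yaml  # PyYAML, which follows YAML 1.1 implicit typing

# The type of each plain scalar is inferred from its spelling alone:
yaml.safe_load("version: 1.10")   # -> float 1.1 (trailing zero lost)
yaml.safe_load("mode: 0644")      # -> int 420 (leading zero means octal)
yaml.safe_load("answer: NO")      # -> bool False (the Norway problem)
yaml.safe_load("answer: 'NO'")    # -> str 'NO' (quoting forces a string)
```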