(I am an absolutist on this matter. To be a superset, all, that's A L L valid JSON strings must also be valid YAML to be a superset. A single failure makes it not a superset. At scale, any difference will eventually occur, which is why even small deviations matter.)
I’ve often heard this (YAML is a superset of JSON) but never looked into the details.
According to https://yaml.org/spec/1.2.2/, YAML 1.2 (from 2009) is a strict superset of JSON. Earlier versions were an _almost_ superset. Hence the confusion in this thread. It depends on the version…
CPAN link provided by the parent says 1.2 still isn't a superset:
> Addendum/2009: the YAML 1.2 spec is still incompatible with JSON, even though the incompatibilities have been documented (and are known to Brian) for many years and the spec makes explicit claims that YAML is a superset of JSON. It would be so easy to fix, but apparently, bullying people and corrupting userdata is so much easier.
"Please note that YAML has hardcoded limits on (simple) object key lengths that JSON doesn't have and also has different and incompatible unicode character escape syntax... YAML also does not allow \/ sequences in strings"
I just checked YAML 1.2 and it seems the 1024-character limit on keys is still in the spec (https://yaml.org/spec/1.2.2/, Ctrl+F "1024"). So any JSON with long keys is not compatible with YAML.
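For illustration, here's a sketch in Python using only the stdlib json module (the key length is arbitrary): it happily round-trips a key well past 1024 characters, while a YAML parser that enforces the spec's limit on simple keys could reject the same document:

```python
import json

# A key far beyond the 1024-character limit YAML places on simple keys.
long_key = "k" * 4096
doc = json.dumps({long_key: 1})

# The stdlib JSON parser round-trips it without complaint.
parsed = json.loads(doc)
assert len(next(iter(parsed))) == 4096
```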
Another reason to have a limit well below the computer's memory capacity is that one can find ill-formed documents in the wild, e.g., an unclosed quotation mark causing the "rest" of a potentially large file to be read as a key, which can quickly snowball (imagine if you need to store the keys in a database or a log, or if your algorithms need to copy the keys, etc.)
I assume JSON implementations have some limit on the key size (or on the whole document, which limits the key size), hopefully far below the available memory.
I assume and hope that they do not, if there is no rule stating that they are invalid. There are valid reasons for JSON to have massive keys. A simple one: depending on the programming language and libraries used, an unordered array ["a","b","c"] might be better mapped as a dictionary {"a":1,"b":1,"c":1}. Now all of your keys are semantically values, and any limit imposed on keys only makes sense if the same limit is also imposed on values.
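A minimal sketch of that pattern in Python (names are my own): the semantic values become dictionary keys purely for fast membership tests, so any key-length limit is effectively a value-length limit:

```python
import json

values = ["a", "b", "c"]
# Map an unordered collection to a dict for O(1) membership tests;
# the former values are now keys in the serialized JSON.
as_set = {v: 1 for v in values}
doc = json.dumps(as_set)

assert "b" in json.loads(doc)
```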
Yes, absolutely; in practice the limit seems to be on the document size rather than on keys specifically. That said, it still sets a limit on the key size (to something a bit less than the max full size), and some JSON documents valid for a given JSON implementation might not be parsable by others, in which case the YAML parsers are no exception ;)
I'm not even sure why I'm playing the devil's advocate, I hate Yaml actually :D
> Then we said it's too verbose. We named some subsets XML, HTML, XSLX
If anything, XML as an SGML subset is more verbose than SGML proper; in fact, getting rid of markup declarations to yield canonical markup without omitted/inferred tags, shortforms, etc. was the entire point of XML. Of course, XML suffered as an authoring format due to verbosity, which led to the Cambrian explosion of Wiki languages (MediaWiki, Markdown, etc.).
Also, HTML was conceived as an SGML vocabulary/application [1], and for the most part still is [2] (save for mechanisms to smuggle CSS and JavaScript into HTML without the installed base of browsers displaying these as content at the time, plus HTML5's ad-hoc error recovery).
While neither Markdown nor JSON syntax was intended as an SGML app, that doesn't stop SGML from parsing JSON, Markdown, and other custom Wiki syntax using SHORTREF [1] ;) In fact, the original Markdown language is specified as a mapping to HTML angle-bracket markup (with HTML also an SGML vocabulary), and thus it's quite natural to express that mapping using SGML SHORTREF, even though only a subset can be expressed.
I think you'll find that in the beginning were M-expressions, but they were evil, and were followed by S-expressions, which were and are and ever will be good.
SGML and its descendants are okay for document markup.
XML for data (as opposed to markup) is either evil or clown-shoes-for-a-hat insane — I can’t figure out which.
JSON is simultaneously under- and over-specified, leading to systems where everything works right up until it doesn't. It shares a lot with C and Unix in this respect.
If XML for data is bad, check out XML as a programming language. I think this has cropped up a few times; the one that stuck with me was as templating structures in the FutureTense app server, before it was acquired by OpenMarket and they switched to JSPs or something.
Lots of <for something> <other stuff> </for> sorts of evil.
Python's .netrc library also hasn't supported comments correctly for like 5 years. The bug was reported, it was never fixed. If I want to use my .netrc file with Python programs, I have to remove all comments (that work with every other .netrc-using program).
It's 2022 and we can't even get a plaintext configuration format from 1980 right.
> It's 2022 and we can't even get a plaintext configuration format from 1980 right.
To me, it's more depressing that we've been at this for 50-60 years and still seemingly don't have an unambiguously good plaintext configuration format at all.
I've been a Professional Config File Wrangler for two decades, and I can tell you that it's always nicer to have a config file that's built to task rather than being forced to tie yourself into knots when somebody didn't want to write a parser.
The difference between a data format and a configuration file is the use case. JSON and YAML were invented to serialize data. They only make sense if they're only ever written programmatically and expressing very specific data, as they're full of features specific to loading and transforming data types, and aren't designed to make it easy for humans to express application-specific logic. Editing them by hand is like walking a gauntlet blindfolded, and then there's the implementation differences due to all the subtle complexity.
Apache, Nginx, X11, RPM, SSHD, Terraform, and other programs have configuration files designed by humans for humans. They make it easy to accomplish tasks specific to those programs. You wouldn't use an INI file to configure Apache, and you wouldn't use an Apache config to build an RPM package. Terraform may need a ton of custom logic and functions, but X11 doesn't (Terraform actually has 2 configuration formats and a data serialization format, and Packer HCL is different than Terraform HCL). Config formats minimize footguns by being intuitive, matching application use case, and avoiding problematic syntax (if designed well). And you'd never use any of them to serialize data. Their design makes the programs more or less complex; they can avoid complexity by supporting totally random syntax for one weird edge case. Design decisions are just as important in keeping complexity down as in keeping good UX.
Somebody could take an inventory of every configuration format in existence, matrix their properties, come up with a couple categories of config files, and then plop down 3 or 4 standards. My guess is there's multiple levels of configuration complexity (INI -> "Unixy" (sudoers, logrotate) -> Apache -> HCL) depending on the app's uses. But that's a lot of work, and I'm not volunteering...
I quite like CUELang (https://cuelang.org/), although it's not yet widely supported.
It has a good balance between expressivity and readability; it has enough logic to be useful, but not so much that it begs for abuse. It can import/export YAML and JSON and features an elegant type system which lets you define both the schema and the data itself.
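A tiny sketch of what that looks like (the field names are my own invention, syntax per my reading of the CUE docs): a definition constrains the data it's unified with, so schema and values live side by side:

```cue
// #Server is a schema: constraints with no concrete values required.
#Server: {
    host: string
    port: int & >0 & <65536
}

// Unifying concrete data with the schema validates it in place.
server: #Server & {
    host: "example.com"
    port: 8080
}
```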
Although I do feel like there is a case to be made that if you need a Turing complete configuration language then in most cases you failed your users by pushing too many decisions on to them instead of deciding on sensible defaults.
And if you are dealing with one of the rare cases where Turing complete configuration is desirable then maybe use Lua or something like that instead.
I'm not defending YAML. YAML is terrible. It's even worse with logic and/or templates (looking at you, Ansible). Toml is certainly better but I'm still baffled as to why we don't have a "better YAML". YAML could almost be okay.
Followup to my own post: don't forget about Scheme! Same nice properties as Lua, but you get some extra conveniences from using s-expressions (which can represent objects somewhat more flexibly, like XML, than Lua, which is more or less 1:1 with JSON).
There's StrictYAML[1][2]. Can't say I've used it as let's face it, most projects bind themselves to a config language - whether that be YAML, JSON, HCL or whatever - but I'd like to.
Yeah, I think it's because nobody sat down and methodically created it.
People create config languages that work for their use case and then it is just a happy accident if it works for other things.
I don't think anyone has put serious effort into designing a configuration language. And by that I mean: collect use cases, study how other config languages do things, make drafts, and test them, etc.
I know a lot of people hate it but I find it to be the only configuration language that makes any sense for moderately large configs.
It’s short, readable, unambiguous, great IDE support. Got built in logic, variables, templates, functions and references to other resources - without being Turing complete imperative language, and without becoming a xml monstrosity.
Seriously there is nothing even close to it. Tell me one reasonable alternative in wide use that’s not just some preprocessor bolted onto yaml, like Helm charts or Ansible jinja templates.
There's a world of difference between "simple configuration needs" and "complex configuration needs".
I will take a Kubernetes deployment manifest as an example of something you would want to express in a hypothetically perfect configuration language. Now, eventually, you end up in the "containers" bit of the pod template inside the deployment spec.
And in that, you can (and arguably should) set resources. But, in an ideal world, when you set a CPU request (or, possibly, limit, but I will go with request for now) for an image that has a Go binary in it, you probably also want to have a "GOMAXPROCS" environment variable added that is the ceiling of your CPU allocation. And if you add a memory limit, and the image has a Java binary in it, you probably want to add a few of the Java memory-tuning flags.
And it is actually REALLY important that you don't repeat yourself here. In the small, it's fine, but if you end up in a position where you need to provide more, or less, RAM or CPU on short notice (because, after all, configuration files drive what you have in production, and mutating configuration is how you solve problems at speed when you have an outage), any "you have to carefully put the identical thing in multiple places" is exactly how you end up with shit not fixing itself.
So, yeah, as much hate as it gets, BCL may genuinely be better than every other configuration language I have had the misfortune to work with. And one of the things I looked forward to, when I left the G, was to never ever in my life have to see or think about BCL ever again. And then I saw what the world at large is content with. It is bloody depressing is what it is.
Yeah absolutely. I think there are four corners to the square: "meant to be written by humans/meant to be written by computers" and "meant to be read by humans/not meant to be read by humans". JSON is the king of meant to be written by computers read by humans, grpc and swift and protobuf and arrow can duke it out the written by computer/not read corner. We are missing good options in written by humans half.
And the sysadmin in me developed a dislike of both within 1 minute of looking at them.
Honestly, I think a good configuration library should be more than a spec, it should come with a library that handles parsing/validation.
See, there are two sides to configuration, the user and the program. Knowledge about the values, defaults and types should live on the program side and should be documented. Then the user side of configuration can be clean and easy to read/write and most important of all, allow the user to accomplish the most common configuration without having to learn a new config language on top of learning the application.
> Honestly, I think a good configuration library should be more than a spec, it should come with a library that handles parsing/validation
You just described CUELang.
The type system allows you to define a schema as well as the data, in the same file or in two separate ones. Then you can either call a CLI tool (that works on Linux, Windows, or Mac) or use the Go lib (or bind to it).
For compat, cue can import and export to yaml, json and protobuf, as well as validate them.
Exactly. So if I'm going to learn/use one of them, there's no clear winner, really. Both also seem to have about the same amount of adoption (zero?).
About Ansible, I think it gained its success partially due to YAML.
Ansible is worse than Puppet and CFEngine in many ways, but it is superior in the user interface.
It managed to not only be a config management solution, but to provide a universal config language that most apps could be configured with. So for a lot of use cases, if you know Ansible/YAML then you don't have to learn a new configuration language on top of learning a new application.
The problem with Ansible is that it's not universal, because most app playbooks are configured in the worst possible way. In my experience, typically you get handed an Ansible script, something you'd hoped was declarative but isn't (like the version that apt-get grabs isn't fixed, or even gets patched), then suddenly a downstream templated command fucks up, and the person who wrote the script isn't around anymore (or you don't trust their chops because they're a blowhard who worked at Google/Facebook and had a coddling ops team behind them in the past), or worse, it's from "community" and has a billion hidden settings that you can't be bothered to grok - and so you have to dig so many layers down that you're better off just fucking rewriting the Ansible script to do the one thing, which probably should have been four lines.
In any case, I found Ansible scripts to have like a 3 month half life. If we were lucky. I'm not bitter.
haha, I can go on lengthy rants about every single configuration management system that I have used.
My dream configuration system would revert to defaults when the config is removed (keeping data). Have a simple/easy user interface. Have maintained modules with sane defaults for the 500 most common server software packages. I would rather there be no module than an abandoned one with unsafe defaults; that way it is clear that I would have to maintain my own if I want to use that particular piece of software. And performant: it really shouldn't take more than a few minutes to apply a config change, and no more than 30 min for an initial run.
Ansible was agent-less from the start, which made it ridiculously easy to sneak into existing infrastructure and manual workflows. I probably would not have been able to stand up Puppet or Salt or whatever, but I could run Ansible all by myself with no one to stop me :).
I'm curious what your thoughts are on a config language I'm working on.
GitHub.com/vitiral/zoa
It has both binary and textual representation (with the first byte being able to distinguish them), and the syntax is clean enough I'm planning on extending it into a markup language as well.
This is why I like INI. It doesn't have these problems, because it doesn't try to wrangle the notion of nested objects (or lists) in the first place. The lack of a formal spec is a problem, sure, but it's such a basic format that it's kind of self-explanatory.
When the problem is TOML not supporting easy nesting, a solution of "don't nest" works just as well in TOML as it does in INI. It's not really an advantage of INI, especially when a big factor in TOML not making nesting easy is that TOML uses the same kind of [section]\nkey=value formatting that INI does!
I wrote an INI parser that has numerical, boolean, timestamp, MAC address, and IP address types ;) "advantages" of not having a spec!
Seriously: for application-specific config files, the lack of a formal spec can be kind of a nice thing. You can design your parser to the exact needs of your program, with data types that makes sense for your use case. Throw together a formal grammar for use in regression testing, and you're all set.
Obviously a formal spec is essential for data interchange, but that's why JSON exists. To me, YAML is in a gray area that doesn't need to exist. The same thing goes for TOML, but to a far lesser extent.
Everything gets serialized to a string of bytes. The point is that you can fail at parsing when the value doesn't make sense, rather than failing at some point in the future when you decide to use the value and it doesn't make sense. And if you have a defined schema, you can have your editor validate it against the schema when saving, so you don't accidentally have "FILENOTFOUND" in a Boolean.
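A minimal sketch of that idea in Python (the schema, field names, and error text are hypothetical): check types at load time so a stray string in a boolean field fails when the config is parsed, not when the value is finally used:

```python
import json

SCHEMA = {"debug": bool, "workers": int}  # hypothetical schema for a config file

def load_config(text):
    """Parse JSON and fail immediately if a field has the wrong type."""
    data = json.loads(text)
    for key, expected in SCHEMA.items():
        if key in data and not isinstance(data[key], expected):
            raise ValueError(f"{key}: expected {expected.__name__}, got {data[key]!r}")
    return data

ok = load_config('{"debug": true, "workers": 4}')
try:
    load_config('{"debug": "FILENOTFOUND"}')  # rejected at parse time
    failed = False
except ValueError:
    failed = True
```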
TOML sucks for lists of tables simply because they intentionally crippled inline tables to only occupy one line, for ideological reasons ("we don't need to add a pseudo-JSON"). Unless your table is small, it's going to look absolutely terrible crammed into one line.
I would still reach for TOML first if I only needed simple key-value configuration (never YAML), but for anything requiring list-of-tables I would seriously consider JSON with trailing commas instead.
I see the point and this is certainly a drawback of TOML but for me this is something of a boundary case between configuration and data.
When configuration gets so complicated that the configuration starts to resemble structured data I tend to prefer to switch to a real scripting language and generate JSON instead.
This is why there should be a way to automatically install software into a sandboxed location, e.g. a virtualenv.
Considering we have software driving cars today, it should be trivial, and I would say even arguably expected, that software should be able to autonomously "figure out" how to run itself and avoid conflicts with other software, since that's a trivial task compared to navigating city streets.
Tested on python what? I was curious to see what error that produced, figuring it would be some whitespace due to the difference between the list items, but using the yamlized python that I had lying around, it did the sane thing:
PATH=$HOMEBREW_PREFIX/opt/ansible/libexec/bin:$PATH
pip list | grep -i yaml
python -V
python <<'DOIT'
from io import StringIO
import yaml
print(yaml.safe_load(StringIO(
'''
{
"list": [
{},
{}
]
}
''')))
DOIT
$ sed 's/\t/--->/g' break-yaml.json
--->{
--->--->"list": [
--->--->--->{},
--->--->--->{}
--->--->]
--->}
$ jq -c . break-yaml.json
{"list":[{},{}]}
$ yaml-to-json.py break-yaml.json
ERROR: break-yaml.json could not be parsed
while scanning for the next token
found character '\t' that cannot start any token
in "break-yaml.json", line 1, column 1
$ sed 's/\t/ /g' break-yaml.json | yaml-to-json.py
{"list": [{}, {}]}
It would be great if instead of the histrionic message on CPAN (which amusingly accuses others of "mass hysteria"), the author would just say "YAML documents can't start with a tab while JSON documents can, making JSON not a strict subset of YAML".
The YAML spec should be updated to reflect this, but I wonder if a simple practical workaround in YAML parsers (like replacing each tab at the beginning of the document with two spaces before feeding it to the tokenizer) would be sufficient in the short term.
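The JSON side of that claim is easy to check with Python's stdlib (the YAML side varies by parser, as the transcript above shows for PyYAML):

```python
import json

# A JSON document whose very first character is a tab: valid JSON,
# since insignificant whitespace (space, tab, CR, LF) may precede any token.
doc = '\t{"list": [\t{}, {}]}'
parsed = json.loads(doc)
assert parsed == {"list": [{}, {}]}
```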
> "YAML documents can't start with a tab while JSON documents can, making JSON not a strict subset of YAML"
But YAML can start with tabs. Tabs are allowed as separating whitespace in most of the spec productions but are not allowed as indentation. Even though those tabs look like indentation, the spec productions don't interpret them as such.
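For example (as I read the spec), a tab after the `:` is separation whitespace and fine, while a tab used to indent a nested key is not:

```yaml
ok:	"a tab after the colon is separation whitespace, and allowed"
parent:
	child: 1   # INVALID: tab used as indentation is a syntax error
```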
Note: the YAML spec maintainers (I am one) have identified many issues with YAML which we are actively working on, but (somewhat surprisingly) we have yet to find a case where valid JSON is invalid YAML 1.2.
Thanks for the clarification. Let's fix it in PyYAML then :)
Speaking of PyYAML, I recently ran into an issue where I had to heavily patch PyYAML to prevent its parse result from being susceptible to entity expansion attacks. It would be nice to at least have a PyYAML mode to completely ignore anchors and aliases (as well as tags) using simple keyword arguments. Protection against entity expansion abuse would be nice too.
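For context, the attack relies on YAML anchors (`&`) and aliases (`*`): each level of aliasing multiplies the expanded size, so a few lines of input can balloon into gigabytes in memory. A truncated sketch of the pattern:

```yaml
a: &a ["lol", "lol", "lol"]
b: &b [*a, *a, *a]
c: &c [*b, *b, *b]
d: &d [*c, *c, *c]
# ...each additional level multiplies the expanded structure by the list length
```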
They should remove the phrase "every JSON file is also a valid YAML file" from the YAML spec. 1) it isn't true, and 2) it seems like it goes against the implication made here:
> This makes it easy to migrate from JSON to YAML if/when the additional features are required.
If JSON interop is provided solely as a short-term solution that eases the transition to YAML, then I applaud the YAML designers for making a great choice.
I'm not a fan of YAML either, but I think you should not generate YAML files if you can avoid it. All YAML you encounter should be hand-written, so this problem should not occur.
I read "YAML is a superset of JSON" not as a logical statement, but as instructions to humans writing YAML. If you know JSON, you can use that syntax to write YAML. Just like, if you know JavaScript or Python (or to some extent PHP) object syntax, you can write JSON.
If you get a parse error, no biggie, you Alt+Tab to the editor where you are editing the config file and correct it. It is not like you are serving this over the net to some other program.
As long as you tell the typescript compiler not to stop when it finds type problems, all JavaScript works and compiles, right? That sounds like a superset to me. Syntactically there are no problems, and the error messages are just messages.
> As long as you tell the typescript compiler not to stop when it finds type problems, all JavaScript works and compiles, right?
Does such code count as valid TypeScript though? It sounds more as if the compiler has an option to accept certain invalid programs.
You could build a C++ compiler with a flag to warn, rather than error, on encountering implicit conversions that are forbidden by the C++ standard. The language the compiler is accepting would then no longer be standard C++, but a superset. (Same for all compiler-specific extensions of course.)
Personally I'm inclined to agree with this StackOverflow comment. [0] It's an interesting edge-case though.
It's syntactically and functionally correct, so despite the error messages I think 'valid' is a better label.
> You could build a C++ compiler with a flag to warn, rather than error, on encountering implicit conversions that are forbidden by the C++ standard. The language the compiler is accepting would then no longer be standard C++, but a superset. (Same for all compiler-specific extensions of course.)
The way I see it, these errors are already on par with C++ warnings. C++ won't stop you if you make a pointer null or use the wrong string as a map key.
Are people not even reading about what they are using?
Always read documentation or you will get burned by something completely innocuous. For example, that’s how you get Norway missing in your config file with YAML:
NI: Nicaragua
NL: Netherlands
NO: Norway
Oops, “NO” evaluates to false, like 10 other reserved words.
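Quoting sidesteps the 1.1-era coercion (and is worth doing for keys too, since a bare `NO` key is equally affected):

```yaml
NI: "Nicaragua"
NL: "Netherlands"
"NO": "Norway"   # quoted, so it stays the string "NO" instead of boolean false
```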
Did you read the entire YAML spec before using it (https://yaml.org/spec/1.2-old/spec.html)? And all other specs? And docs for all your dependencies? What about docs for system services on all your servers?
Software/specs/formats have edge cases. These usually exist because of a tradeoff in usability. That's why there are YAML, JSON, TOML, etc. Choose the one that best fits your use case and the strictness you need.
This feels like an attempt at victim blaming. If you have to read an entire spec from top to bottom to avoid a pitfall in a relatively common operation, maybe something is wrong.
FWIW, I couldn't even find the relevant section in that spec from a quick glance. I probably would have to read a significant portion of that spec just to figure out where it went wrong.
I don't think I understand what this means in context of my comment. Are you referring to the parent of my comment?
Someone wrote an article about an interesting thing they discovered about a well-known spec and OP's response is "Are people not even reading about what they are using?" Did the author of the article do something wrong?
No, it's to your comment. One obvious reading of "did you read every relevant spec completely" is that if you didn't read every relevant spec 100% cover to cover then it's your fault that your software had a problem with that spec, and a reasonable person could view that as infeasible (those things get really long). Hence, it's easy to see your comment as victim blaming. (I'm not saying that I do or don't agree with that view, just trying to make sure everyone understands each other.)
Your tone suggests you think it's obviously infeasible to read the whole YAML spec before using it. But it is possible to read the whole JSON spec [1]. It takes less than a minute.
Saying that all formats have edge cases as an excuse for YAML's glaring faults is, frankly, a cop out. Like if a bridge collapses when a leaf lands on it and saying, well, all bridges have some maximum load. Yes, but in this case it's so bad it's just not useful for anything.
My comment was to point out how ridiculous the parent's comment is in response to the article.
What does the length of the JSON spec have to do with my comment? The parent comment says if you don’t read all your docs you will be bit by an innocuous bug. You linked to a short spec, but that doesn’t mean anything in this context.
> Like if a bridge collapses when a leaf lands on it and saying, well, all bridges have some maximum load.
Is that what you got from reading my comment or the article? Is that what yaml is like?
Sorry, I misread the flow of the conversation. If anything my comment made more sense as a reply to the one you replied to, rather than yours.
More specifically, I missed the first line of their comment: "Are people not even reading about what they are using?" If you leave off that line, then it sounds (to me) like they're arguing YAML is a terrible format. With that line, it turns out they think it's reasonable, so long as you read the (huge) spec first. Madness!
It's not addressing what you said. I don't think what you said is very relevant, as the "read the spec" doesn't add anything to a discussion about whether something is a good idea to use. Length of spec is relevant to that question.
Thanks, I haven't seen that before. It took me 10 minutes to go through and it is very clear (took a moment to realize whitespace also allows no characters).
A couple of years ago I looked at tutorials and found it very confusing, but the spec is just great.
While you’re right it’s a shorter spec (setting aside if you’re right about your broader point) this [0] is a more reasonable spec. Even JSON with its microscopic spec has implementation details, inconsistencies, and errata. Is this why we can’t have nice things?
I’m not sure what this comment is trying to say. I’ve been programming for a while and I know plenty of talented, amazing engineers. Nobody reads the whole spec top to bottom for like, anything.
Those are forgivable. I watched a team get burned by doing software HMAC and it turns out the underlying native function in the kernel is not thread safe. Would have caught me too.
JSON being a subset of YAML is a core feature. It was to help with adoption.
There's a difference between not reading all pages of every document and not reading anything at all. Not even a well-researched blog post.
If this individual is a junior, then awesome, they learned a lot of valuable ideas and solutions.
If they're a senior, yikes. Don't use technology you can't explain to the junior discovering core features and writing blog posts about it
Yep. I used to love Yaml back in the day. But two things burned me.
1. Significant whitespace in a data storage file doesn't scale. Yes, eventually someone wants to dump a giant graph of data and the library breaks.
2. Intermixed context on quoteless strings. Intermixing code and data doesn't work reliably; everywhere I see it tried, I see it break. If you don't want quotes on your strings, then you have to put something on your keywords. A simple @no would have stopped this and other situations.
As an aside, I find it stupid to have multiple names for true, false, and null.
YAML turns raw literals into strings, except when the string matches a certain format. Then it may turn it into other things, like int or float, and you better know all the rules by heart and be attentive. And not introduce any typo, which of course no human ever does.
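A few of the classic coercions, mostly under YAML 1.1 rules (exact behavior varies by parser and spec version, so treat these as illustrative):

```yaml
version: 1.20       # becomes the float 1.2, not the string "1.20"
time: 22:30         # 1.1 parsers may read colon-separated digits as a base-60 integer
country: NO         # boolean false under 1.1
commit: 685e2       # matches the float pattern (68500.0) in 1.2 core-schema parsers
```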
You just learn about the edge cases. This applies to everything. There's a number of languages where an empty list/array is truthy and a number where it's falsey. Learn the tool, do some defensive programming (always quote the strings) and you'll be fine.
(This issue was recognised, and since YAML 1.2 (2009) the spec says "no" is a string; packages like https://yaml.readthedocs.io/ migrated a while ago.)
YAML's parsing of `no` as `False` has not been part of the spec for 13 years now. It was changed in YAML 1.2 in 2009 to only be `true` and `false` (with variations in case allowed I think).
I always read documentation. I just have to run into issues because I haven’t read it before I do. I imagine most engineers are like this as we tend to prefer learning by doing
Reading often doesn't save you from gotchas. Usually it just makes you think "oh, right! I remember this bullshit from the spec" after you've been stung.
What is this strange concept you bring up? Googling saved me an entire day of reading dry documents. I may not know how the code works, but I go around telling everyone how easy coding is because of copy+paste.
- human-wise ambiguity of its syntax (if I understand correctly, you "can" indent array items, but you don't have to. And then one guy says "OK, I'm gonna indent", and another guy says "Nah, I'm not gonna indent")
- still no support for datetime as a first class citizen
- strings usually don't need quotes, until they do (I prefer to always quote)
Note that two of the above points are about allowing inconsistent styles, which is a thing I hate.
Love:
- it supports comments whereas JSON does not. If the IETF ever officially updated JSON to support C-style inline (and maybe block) comments, I would absolutely ditch YAML.
You might get a kick out of Concise Encoding then (shameless plug). It focuses on security and consistency of behavior. And it supports comments and has first class date types ;-)
The simplest solution here would be to use JSON5 (https://json5.org/) if you're after comments.
It still doesn't support / standardize dates, though.
But realistically, it's also all about the ecosystem. VSCode, for example, doesn't come with JSON5 support out of the box. GitHub and many other tools/renderers support it at least in syntax highlighting.
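For reference, the kind of thing JSON5 permits on top of plain JSON (each of these is a syntax error in standard JSON):

```json5
{
  // single-line comments
  /* block comments */
  unquoted: 'single-quoted strings and unquoted keys',
  trailing: [1, 2, 3,],  // trailing commas
}
```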
Maybe they should try to file an RFC update/new spec? I would be all for it, since it is backwards compatible and covers some essential new needs of a modern configuration syntax. It seems they already have some notable people involved in the project, making it more plausible to succeed as a successor to RFC 8259.
So use any of a dozen or so other configuration formats that have comments but don't have all the problems YAML does. TOML is probably the most popular of those right now.
TOML is good for config, but not for data exchange though. (Why would I need comments in a data exchange format? Comments are useful when you want to annotate some sample data for other developers who will consume or generate that data.)
Also, OpenAPI specs (also known colloquially as Swagger) can only be written in JSON or YAML.
Agreed, so use JSON for data exchange and TOML for configuration. YAML isn't all that great for data exchange either.
In the cases where you have a tool that requires JSON or YAML, you could use something like cue, dhall, or jsonnet, and convert it into JSON (or YAML). Unfortunately, that's a little tricky when your build configuration itself has to be in YAML, as is the case for GitHub Actions, Travis CI, etc.
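A pragmatic variant of the same idea: since a YAML 1.2 parser is supposed to accept JSON, you can describe the config as plain data in a real language and serialize it, instead of templating YAML text. A sketch using only Python's stdlib (the config shape below is made up for illustration):

```python
import json

# Hypothetical CI pipeline described as plain Python data structures.
config = {
    "jobs": {
        "build": {"steps": ["make", "make test"]},
    },
}

# json.dumps guarantees correct quoting and escaping; a tool whose YAML
# parser follows YAML 1.2 should accept this output directly.
print(json.dumps(config, indent=2))
```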
I have no hate for YAML, but I think this is the "human readable" format with the most gotchas you might ever encounter.
Every time I use YAML, I get bitten by some edge case. For this reason, I wouldn't count on JSON compatibility, unless the implementation also passes a comprehensive set of JSON tests (which the few I used in the past did not).
Even after writing YAML for years, I can never get the indentation, when to use a dash, and other details right without an editor that knows the spec of the YAML file I'm trying to write. It's bonkers.
Fun fact: Heroku's app.json actually uses a YAML parser so even though it isn't documented you could use YAML with it. (At least this was the case years ago, it's possible it may have changed)
This is at odds with the top comment here suggesting there are edge case bugs. Assuming Heroku wouldn't use something buggy like that, either the top comment is wrong or Heroku uses two different parsers after inferring the type.
Many parsers either default to YAML pre-1.2 or do not even expose a YAML 1.2 option. PyYAML has no 1.2 option, for example. So unless Ansible is using something other than PyYAML...
The top comment quotes an implementor of a YAML parser, an implementor who in an addendum specifically calls out YAML 1.2 as STILL not being a superset of JSON:
> Addendum/2009: the YAML 1.2 spec is still incompatible with JSON, even though the incompatibilities have been documented (and are known to Brian) for many years and the spec makes explicit claims that YAML is a superset of JSON. It would be so easy to fix, but apparently, bullying people and corrupting userdata is so much easier.
It is very convenient when you need to generate YAML via text templating, which unfortunately appears to be something our industry has decided is reasonable. You can do {{ someval | toJson }} and have reliable escaping. Way better than "{{ someval }}" or {{ someval | toJson | indent 13 }}
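The `toJson` trick works because a JSON scalar is, per YAML 1.2, also valid YAML flow content. A sketch in Python, using the stdlib `json` module to stand in for the template engine's filter (the `to_json` name here is illustrative):

```python
import json

def to_json(value):
    """Stand-in for a template engine's `toJson` filter (illustrative name)."""
    return json.dumps(value)

# Embedding a value into a YAML template line: quoting and escaping are
# handled by the JSON encoder, so quotes or newlines inside the value
# cannot break the surrounding YAML document.
someval = 'tricky: "quotes" and\nnewlines'
line = f"key: {to_json(someval)}"
print(line)
```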
Did you try the cited example? Because I did, and it parsed as expected.
I do believe there are likely some incompatibility bombs hiding in either the monster specification, or undoubtedly in the various implementations, but in my experience that example is not the one to bring to YAML's court case.
So there are six criticisms actually in this file, by my count.
Of those, "NO" is fixed, 0 causing octal is fixed, to me that SQL syntax doesn't look any worse than the low bar set by normal SQL, and the CI providers using different schemas doesn't really have anything to do with YAML.
So that leaves two complaints.
I'm not immediately offended by the clock thing, like I am by NO and octal, but I don't really have the right experience to say how bad it is.
And I was going to say it's bad that nesting escaped string is hard, and it's a shame when languages don't have better quoting mechanisms... then I remembered that YAML has block quotes with no need to escape inside, so that example is just wrong. And they even link later to the stackoverflow post talking about YAML block quotes.
There are problems with YAML but these examples are not good ones.
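For context, the block scalar feature mentioned above: inside a literal block, nothing needs escaping, which is exactly what the nested-string complaint overlooks. A minimal illustration:

```yaml
# A literal block scalar: everything indented below the | is taken verbatim,
# so quotes and backslashes need no escaping at all.
command: |
  echo "double quotes, 'single quotes', and \backslashes\ are all literal"
```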
((By the way, I've seen that "There are 63 different ways to write multi-line strings in YAML" link before but only took the time to fully understand it just now, and that's a gross exaggeration that makes me question the author more than it reinforces their point.
There's a reason the original link says -5- -6- NINE (or 63*, depending on how you count).
Block quotes have 1 or 2 characters to say what to do with newlines, then might have a digit to indicate indentation. That's "60" of the "63". I suppose if it allowed multi-digit numbers there it would be "billions" of ways to write multi-line strings in YAML? That number isn't a real criticism.))
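The header combinations being counted are just these (a style indicator, an optional chomping indicator, and an optional single-digit indentation indicator):

```yaml
# | = literal, > = folded; the optional - strips and + keeps trailing newlines.
clip: |     # default: exactly one trailing newline
  line
strip: |-   # no trailing newline
  line
keep: |+    # all trailing newlines kept
  line
folded: >-  # folded: line breaks become spaces, trailing newline stripped
  a long sentence
  wrapped for readability
```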
Okay, since this felt like common knowledge and apparently is also an unpopular opinion: emojis can and often do improve accessibility of documentation. Like any other symbolic reference in text, their use can be defined in a key and they can be used to concisely document a support matrix, or signal complexity levels of deeper links, or… even just thematically identify content in a friendly and inviting way.
Do they? Even if you use them perfectly consistently, people need to know what they mean. For some really common ones that might be fine (green check mark on CI probably means the build ran successfully), but the moment anyone gets confused or doesn't recognize a symbol I feel like accessibility drops off a cliff.
Considering that in some cultures the meanings of red and green are somewhat reversed from Western usage, I would not be so sure about that.
I remember being in the traffic management center in Tokyo for an arranged visit and thought the entire city had come to a halt, but red meant high throughput, not stoppage.
I frequently place emoji in code files and database scripts to ensure other people aren't using the wrong encoding in their editors when working on our projects.
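The mechanism behind that trick, sketched in Python: a non-ASCII marker turns a wrong-encoding read into a loud failure instead of silent mojibake (the marker string is just an example):

```python
# A non-ASCII marker placed near the top of a file makes encoding mistakes
# fail immediately instead of silently corrupting text.
marker = "✅ utf-8 marker"
encoded = marker.encode("utf-8")

def read_as(data: bytes, encoding: str) -> str:
    """Simulate reading the file bytes with a given editor/tool encoding."""
    return data.decode(encoding)

assert read_as(encoded, "utf-8") == marker
try:
    read_as(encoded, "ascii")  # what a mis-configured tool would do
except UnicodeDecodeError:
    print("encoding mismatch detected")
```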