Hacker News
YAML: Probably not so great after all (arp242.net)
445 points by kylequest on Aug 18, 2019 | 444 comments



From my experience, while YAML itself is something one can learn to live with, the true horror starts when people start using text template engines to generate YAML, as is done in Helm charts, for example: https://github.com/helm/charts/blob/master/stable/grafana/te... Aren't these "indent" filters beautiful?


I developed Yet Another JSON Templating Language, whose main virtue was that it was extremely simple to use and implement, and it could be easily implemented in JavaScript or any other language supporting JSON.

We had joy, we had fun, we had seasons in the sun, but as I added more and more features and syntax to cover specific requirements and uncommon edge cases, I realized I was on an inevitable death-march towards my cute little program becoming sufficiently complicated to trigger Greenspun's tenth rule.

https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule

There is no need for Yet Another JSON Templating Language, because JavaScript is the ultimate JSON templating language. Why, it even supports comments and trailing commas!

Just use the real thing to generate JSON, instead of trying to build yet another ad-hoc, informally-specified, bug-ridden, slow implementation of half of JavaScript.


> the true horror starts when people start using text template engines to generate YAML

I just had a shiver recalling a Kubernetes wrapper wrapper wrapper wrapper at a former job. I think there were at least two layers of mystical YAML generation hell. I couldn't stop it, and it tanked much joy in my work. It was a factor in me moving on.


Oh my god! I'm working on the same wrapper wrapper wrapper


I called ours The Yamlith. Name yours!


What was the straw that broke the YAML's back?


get out.


The Yamdenburg would be apt since it gives you a pretty good indication of how that's going to end.


Oh the huyamlity!


try moving to something like http://skycfg.fun


Surely the right approach needs to be generating the desired data programmatically, rendering back to YAML if needed, rather than building these files with text macros.
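
A minimal sketch of that approach, in Python purely for illustration: build the structure as plain data and let a serializer deal with nesting and quoting. The `deployment` function, its field values, and the image name are hypothetical, and the result is not a complete manifest. `json.dumps` stands in for a YAML emitter here only to stay within the standard library (JSON output is generally accepted by YAML 1.2 parsers); a YAML library's dump function would be the more usual choice.

```python
import json


def deployment(name, image, replicas=1, env=None):
    """Build a Kubernetes-style Deployment as plain data (illustrative, not a complete manifest)."""
    env = env or {}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            # Nesting comes from the data structure itself,
                            # so no "indent" filter is involved.
                            "env": [{"name": k, "value": v} for k, v in sorted(env.items())],
                        }
                    ]
                }
            },
        },
    }


# Hypothetical values; note the awkward string that a text template would have to escape by hand.
doc = deployment("web", "example/web:1.0", replicas=2, env={"MODE": "yes: no"})
rendered = json.dumps(doc, indent=2)
print(rendered)
```

The serializer quotes `"yes: no"` on its own, which is the kind of escaping a text macro leaves to the template author.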


Kubernetes works well but pretty it is not.


> Kubernetes wrapper wrapper wrapper wrapper at a former job

oh god why


Why would they have chosen to use template/text to generate YAML? That seems insane.

Surely using an encoder on an object/structure hierarchy (like people do with encoding/json) is the way to go?

On the other hand, the quality of the yaml libraries in Go wasn't great, last time I had to choose a configuration file format.


A lot of people working with YAML have an ops background and aren't familiar with basic data structures.


That’s disingenuous. Most “ops” folks I work with despise templating files but it’s the easiest way to parameterize things, especially when providing ways for “devs” to do “ops”.

Yes, we may not have a cleaner way of deploying k8s Deployment configs to different clusters but the desire to templatize YAML is easy for everyone to understand. The decision to abstract or templatize is one rooted in time and cost, not ability to understand data structures.


personally I’d prefer a templatized yaml file over an over-engineered, snowflake DSL created by a “real programmer” and not an ops person.


Ok, those aren't the only two choices though.


It probably starts with an existing YAML config file that you only need to pass one or two variables to. Then things get out of hand.


What is "an encoder"? Like a function that takes the same variables as the template would but does some work itself generating things?


An encoder is anything that serializes some data. Think `JSON.stringify()`.


YMMV, I believe most folks would call that "serialization," reserving "encoding" for turning a notionally written-down-ish representation into bytes; e.g. string -> utf8 bytes, or float -> IEEE-754 bytes.
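
To make the distinction concrete, here is a small sketch in Python (chosen only for illustration; the `record` value is made up). It treats "serialization" as data to text and "encoding" as text or numbers to bytes, which is one reading of the terminology above rather than a settled definition.

```python
import json
import struct

record = {"name": "café", "ratio": 0.5}

# "Serialization" in the sense above: structured data -> text.
text = json.dumps(record, ensure_ascii=False, sort_keys=True)

# "Encoding" in the narrower sense: text -> UTF-8 bytes.
utf8_bytes = text.encode("utf-8")

# Likewise, float -> IEEE-754 double-precision bytes (big-endian).
ieee_bytes = struct.pack(">d", record["ratio"])

print(text)
print(len(text), len(utf8_bytes))  # the byte count is one higher: "é" takes two bytes in UTF-8
print(ieee_bytes.hex())
```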


Right, for some reason in Go (which Helm is written in), the standard library calls them encoders/decoders with marshal/unmarshal as the operations. Serialize is definitely the more common term generally.


I've seen it used even more generally for any `A -> B` (or `A -> Option<B>`) where B could be faithfully decoded back to A.

In my head, serialization is a special case of encoding when B is some type of string. In this case it's YAML.

I could probably have worded my previous comment more precisely.


At my old place we developed a small tool that wraps CloudFormation with a templating language (jinja2). This was actually great as CloudFormation is extremely verbose and often unnecessarily complex. Templating it out and adding custom functions to jinja2 made the cfn templates much easier to understand.

I think it all depends. Most of the time I would agree that you shouldn't template yaml, but sometimes, it's the lesser of two evils.


Templating CFN is really good practice once you hit a certain scale. Say you have 5 DDB tables deployed to multiple regions, and on each of them you want to specify keys, attributes, throughput, and usage alarms, at a minimum. That’s already 30-40 values that need to be specified, depending on table schemas. Add EC2, auto scaling, networking, load balancer, and SQS/SNS, and untemplated CloudFormation is really unpleasant to work with.

Some of the values like DDB table attributes are common across all regions, other values like tags are common across all infra in the same region. Some values are a scalar multiple of others, or interpolated from multiple sources. For example, a DDB capacity alarm for a given region is a conjunction of the table name (defined at the application level), a scalar multiple of the table capacity (defined at the regional deployment level), and severity (owned by those that will be on-call).

To add insult to injury, a stack can only have 60 parameters, which you butt up against quickly if you try to naively parameterize your deployment.

Given all these gripes, auto-generating CFN templates was easiest for me. I used a hierarchical config (global > application > region > resource) so the deployment params could be easily manipulated and maintained, and so “exceptions to the rule” would be obvious instead of hidden in a bunch of CFN yaml. To generate CFN templates I used ERB instead of jinja, but to similar effect.

A side benefit of this is side-stepping additional vendor lock-in in the form of the weird and archaic CFN operators for math, string concatenation, etc. I don’t have a problem learning them, but it’s one of those things that one person learns, then everyone who comes after them has to re-learn. My shop already uses ruby, so templating in the same language is a no-brainer.
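
The hierarchical config described above (global > application > region > resource) could be sketched roughly like this. This is Python rather than the ERB/Ruby setup mentioned, and every name and value here (`merge_layers`, `alarm_percent`, the table and region) is hypothetical; it only illustrates layered overrides plus a derived alarm value.

```python
def merge_layers(*layers):
    """Merge dicts left to right; later (more specific) layers override earlier ones."""
    merged = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Nested sections are merged recursively instead of replaced wholesale.
                merged[key] = merge_layers(merged[key], value)
            else:
                merged[key] = value
    return merged


# Hypothetical layers, from least to most specific.
global_cfg = {"tags": {"team": "payments"}, "alarm_percent": 80}
app_cfg = {"table": {"name": "orders", "hash_key": "order_id"}}
region_cfg = {"region": "eu-west-1", "table": {"read_capacity": 100}, "tags": {"env": "prod"}}
resource_cfg = {"alarm_percent": 90}  # an explicit "exception to the rule"

cfg = merge_layers(global_cfg, app_cfg, region_cfg, resource_cfg)

# The alarm is derived from other values (name + region, capacity * percentage), not typed by hand.
alarm = {
    "name": f"{cfg['table']['name']}-{cfg['region']}-read-capacity",
    "threshold": cfg["table"]["read_capacity"] * cfg["alarm_percent"] // 100,
}
print(cfg)
print(alarm)
```

Because the override lives in its own small layer, the exception is visible in one place instead of being buried in the generated template.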


For CloudFormation, my team a few years ago got a lot of mileage out of using troposphere.

https://github.com/cloudtools/troposphere

The basic type checking done was quite helpful, and avoided some of the dumb errors that we had run into when we attempted to do everything by hand.


> This was actually great as CloudFormation is extremely verbose and often unnecessarily complex

I think it's the opposite: the leanest way to deploy AWS resources. Did you write it yourself, in a text editor? I have been doing it for 5 years now. You can omit values if you're fine with defaults; you only state what needs to be different. Another tip is to use Export and ImportValue to link stacks.

I kept on using JSON, even after all my buddies jumped on YAML. JSON is just more reliable, makes syntax errors harder to miss, and can be made readable by not using linters and keeping long lines that belong on one line. Also, the brackets are exactly what they are in Python :)

> wraps CloudFormation with a templating language (jinja2)

Not sure if it is a good idea. Everyone's use case is different, though. A well written CFN template is like a rubber stamp, just change the Parameters. The template itself doesn't need to change.


Hell no, terraform is way better even though HCL isn’t the nicest DSL.


3x the LOC, 1/3 the speed, and weird State corruptions? Not to mention dependence on a 3rd party.


k8s and helm is where I learned to dislike yaml. I now want a compiled and type safe language that generates whatever config a system needs.

I'm pretty much thinking I want Go as a pre-config where I can set variables, loops, and conditionals and that my editor can help with auto-complete. Maybe I can "import github.com/$org/helmconfig" and in the end write one or more files for config.


Helm 3 is moving to Lua, that may be better or worse.


Sounds like another short-sighted decision. Why don't they support an intermediate representation that many languages can support? Even yaml would be fine if other languages can generate it. If they absolutely had to use something, why not something more mainstream and popular like Python? Helm asks too much for the functionality it provides.


-1 for Turing complete config languages.


+1 for Turing complete programming languages instead of half-assed config languages.


Why are you putting logic in the config in the first place? Just let it be data.


That's like saying that God existed before creating the universe. Then you have to ask who created God? And if God was created only from data and not any logic, then the config file must have been really huge and unmaintainable.


Do you have an example of when logic would help a configuration file? Their value and use are entirely dependent on the context.

Also I can't make head or tail of your statement, no offense intended. I can make out parts, but the whole just has no meaning to me.


Given any non-trivial data-only config file, it will always grow to the point that you'll end up needing to generate it automatically with logic. And that goes double for God's config file.


Do you have an example? I don't think I've ever needed to generate a config automatically (except in niche cases like generating configs for services in chef), and if I'm understanding correctly, nothing proposed in this thread would help with that scenario.

I suspect you're front-loading a lot of logic from the app bootup into the config file, and I'm not sure what you stand to gain conflating those two things.

Like—what's the execution order of a config file? Can you refer to a value later in the file? If so, how does it determine which value is executed before the next? If not, how do you setup circular dependencies—can you redefine config values halfway through a file? If not, you're gonna have to fall back on at-boot pre-processing anyway with sufficient complexity, so just treat a config like dumb data to begin with and do all your logic in the bootup and put all the values/whatever in the config file. And god knows, I would be strongly tempted to murder an engineer that introduced a config file capable of rewriting itself—that engineer has clearly never needed to debug another person's shitty code before.


This thread hurts so bad.

I never used helm / kubernetes before 3 months ago.

Not 2 weeks ago I needed to loop in a helm config file in order to basically say "all this same config, libraries, etc., just run this other command instead" ... because someone who makes those decisions had ~100 lines of environment-injected configuration + boilerplate in the yaml that I couldn't get rid of, needed, and would have otherwise needed to copy / paste.

Since then, those environment variables have been pulled out into a different file (refactoring!), and now we replaced a loop over 100 lines of config, with 2x sets of 15-20 lines of config boilerplate. Better, but still a lot of bull. I don't know what the right answer is, because we've got less helm templating bullshit in there, but we still need boilerplate. Because it's not like I can tear down an entire kubernetes + helm infrastructure because I don't like how the config files are written.

Configs / config generation is hard, and generally awful. If you don't see it that way, congratulations; you're either a genius in your field, you've got not enough experience, and/or you're wrong. If you believe it's easy, and we're all missing something - please, by all means, write a book on how / why configurations aren't as hard as the rest of us say they are.

Best of luck to you.


> Configs / config generation is hard, and generally awful. If you don't see it that way, congratulations; you're either a genius in your field, you've got not enough experience, and/or you're wrong.

The point I'm trying to make is that you're describing broken frameworks, data flows, and work flows, and blaming it on config generation. If you have a counter example, I'd love to see it. Discussing these things in the abstract is pretty pointless and based in emotional language/semantic quibbling rather than meaningful things people can reason about and discuss, like code comparison or time tradeoffs.

Hell, because no specific GOOD examples of configuration-as-code have been brought up, literally everyone in this thread could be considering a different pet example of theirs. It's OBVIOUSLY a waste of everyone's time without examples. Why bother commenting at all? To go out of your way to punch down without contributing to the discourse?


> punch down

You say this is easy. Seems to me that you're claiming to be elevated above us all with something we don't know, claiming that everyone else is doing it wrong, all the while hiding behind anonymity.

Stop clutching your pearls and faking the victim. No one is punching down; you're claiming knowledge you don't have and are being called out for it.

Look at any one of the references cited in the thread.


It's quite hard to claim anyone is wrong when nobody (in this thread...) has made substantial claims.


Were my claims insubstantial? Do you think I faked those videos, or didn't write the code I linked to?


Lots of examples in my other post, including links to some open source code (UnityJS).


I've developed programs with tens of thousands of lines of expanded JSON in their config files. No fucking way I'm maintaining all that by hand as pure data.

See: https://news.ycombinator.com/item?id=20735231

Also: https://en.wikipedia.org/wiki/Don%27t_repeat_yourself and https://en.wikipedia.org/wiki/Single_source_of_truth

The opposite of DRY (Don't Repeat Yourself) is WET (Write Everything Twice or We Enjoy Typing or Waste Everyone's Time) -- but twice could be ten times or more, there's no limit. Writing it all out again and again by hand as pure literal data, and hoping I didn't make any typos or omissions in each of the ten repetitions, without some sort of logical algorithmic compression and abstraction, would be idiotic.
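
As a toy illustration of that kind of "algorithmic compression" (in Python, with entirely made-up service names, ports and fields), each fact is written once and the repetitive literal form is generated from it:

```python
import itertools
import json

# Hypothetical single source of truth: each fact is written exactly once.
environments = ["dev", "staging", "prod"]
services = {"api": 8080, "worker": 9090}  # service name -> port

# Expand into the fully literal, repetitive form that a tool might consume.
expanded = [
    {"name": f"{service}-{env}", "env": env, "port": port, "debug": env != "prod"}
    for env, (service, port) in itertools.product(environments, sorted(services.items()))
]

print(json.dumps(expanded, indent=2))
```

Changing a port or adding an environment is then a one-line edit, and the repeated entries cannot drift apart through typos.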


> I've developed programs with tens of thousands of lines of expanded JSON in their config files.

To what end? I don't get what you could possibly be putting in these files that could consume so much space without refactoring.


I posted some examples above. (Or maybe it's below. The long message with lots of links.)



FYI, here are some concrete examples, and some demos of a multi player cross platform networked AR system I developed that's based on shitloads of JSON config files describing objects, behaviors, catalog entries, user interface, a multi player networking protocol, etc.

Pantomime extensively (and extensibly) uses the simple JSON templating system I described in this other link, which I wrote in C#. Everything you see is a plug-in object, and they're all described and configured in JSON, and implemented in C#:

https://news.ycombinator.com/item?id=20735366

Pantomime Playground - Immersive Virtual Worlds for the Rest of Us

https://vimeo.com/137924152

Pantomime Creatures - Real-Time Augmented Reality

https://vimeo.com/183132061

Bug Farm – First Minute, First Person 3D

https://vimeo.com/148805981

Consumer Augmented Reality Arrives – with Pantomime

https://vimeo.com/149319403

If I were to rewrite it from scratch, I'd simply use JavaScript instead of rolling my own JSON templating system, because it would have been much more flexible and powerful.

Oh wait -- I DID rewrite at least some of that stuff from scratch! To illustrate that superior JavaScript-centric approach, here's an example of some other JSON based systems I developed with Unity3D and JavaScript, one for scripting ARKit on iOS, and the other for scripting financial visualization on WebGL, both using UnityJS (an extension I developed for scripting and configuring and debugging Unity3D in JavaScript).

One nice thing about it is that you can debug and live code your Unity3D apps running on the mobile device or in the web browser while it's running, using the standard JavaScript debugging tools!

UnityJS is a plugin for Unity 5 that integrates JavaScript and web browser components into Unity, including a JSON messaging system and a C# bridge, using JSON.net.

https://github.com/SimHacker/UnityJS

WovenAR Tools with ARKit and Pantomime (built with an early version of UnityJS):

https://vimeo.com/240612327

A description of UnityJS, and how I use JavaScript and JSON to define, configure and control Unity3D objects:

https://news.ycombinator.com/item?id=19804242

https://news.ycombinator.com/item?id=19748582

https://news.ycombinator.com/item?id=19745034

Here's a demo of another more recent application using UnityJS that reads shitloads of JSON data from spreadsheets, including both financial data, configuration, parameters, object templates, etc.

https://reasonstreet.co/2019/01/31/how-does-amazon-make-mone...

https://reasonstreet.co/2019/08/09/tech-giants-acquisitions/

Here is an article about how the JSON spreadsheet system works, which also discusses some ideas about JSON definition, editing and templating with spreadsheets. It's about a year old, and I've developed the system a lot further since writing it.

https://medium.com/@donhopkins/representing-and-editing-json...

>Representing and Editing JSON with Spreadsheets

>I’ve been developing a convenient way of representing and editing JSON in spreadsheets, that I’m very happy with, and would love to share!

>I‘ve been successfully synergizing JSON with spreadsheets, and have developed a general purpose approach and sample implementation that I’d like to share. So I’ll briefly describe how it works (and share the code and examples), in the hopes of receiving some feedback and criticism. Here is the question I’m trying to answer:

>How can you conveniently and compactly represent, view and edit JSON in spreadsheets, using the grid instead of so much punctuation?

>My goal is to be able to easily edit JSON data in any spreadsheet, conveniently copy and paste grids of JSON around as TSV files (the format that Google Sheets puts on your clipboard), and efficiently export and import those spreadsheets as JSON.

>So I’ve come up with a simple format and convenient conventions for representing and editing JSON in spreadsheets, without any sigils, tabs, quoting, escaping or trailing comma problems, but with comments, rich formatting, formulas, and leveraging the full power of the spreadsheet.

>It’s especially powerful with Google Sheets, since it can run JavaScript code to export, import and validate JSON, provide colorized syntax highlighting, error feedback, interactive wizard dialogs, and integrations with other services. Then other apps and services can easily retrieve those live spreadsheets as TSV files, which are super-easy to parse into 2D arrays of strings to convert to JSON.

[...]

>Philosophy: The goal is to leverage the spreadsheet grid format to reduce syntax and ambiguity, and eliminate problems with brackets, braces, quotes, colons, commas, missing commas, tabs versus spaces, etc.

>Instead, you enjoy important benefits missing from JSON like comments, rich formatting, formulas, and the ability to leverage the spreadsheet’s power, flexibility, programmability, ubiquity and familiarity.
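
The "TSV into a 2D array of strings, then into JSON" step can be sketched as follows, in Python for illustration. The layout used here (first column is the key, one remaining cell is a scalar, several cells are a list) is an assumed toy convention, not the format from the article, and `row_value` is a hypothetical helper.

```python
import csv
import io
import json

# A hypothetical sheet as it might arrive on the clipboard: tab-separated cells, one row per line.
tsv = "name\tBug Farm\nplayers\t4\ntags\tar\tmultiplayer\n"

# TSV -> 2D array of strings.
grid = list(csv.reader(io.StringIO(tsv), delimiter="\t"))


def row_value(cells):
    """One cell becomes a scalar, several cells become a list (an assumed convention)."""
    values = [int(c) if c.isdigit() else c for c in cells]
    return values[0] if len(values) == 1 else values


# First column is the key, the remaining columns are the value.
obj = {row[0]: row_value(row[1:]) for row in grid if row}
print(json.dumps(obj))
```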

More info:

https://news.ycombinator.com/item?id=17384078

https://news.ycombinator.com/item?id=18860467

https://news.ycombinator.com/item?id=18171571


I really appreciate the post, I have a much better understanding.

I don't use configs in this way, or if I did, I would not be inclined to call them configs. I can certainly appreciate the problem of processing of JSON objects in many different contexts. I was more referring to a UX concept of providing a configuration interface—short of something like emacs that gives full functionality, simpler and easily debuggable is emphatically better.


The topic of this discussion is YAML (and JSON as an alternative), and of this thread is "using text template engines to generate YAML", which covers a lot more than just config files. YAML and JSON and template engines are used for a hell of a lot more than just writing config files, but they're also very useful for that common task too. The issues that apply to config files also apply to many other uses of YAML and JSON. Dynamically generated YAML and JSON are very common and useful, and have many applications besides config files.

The fact that you've never done and can't imagine anything complicated enough to need more than a simple hand-written data-only config file doesn't mean other people don't do that all the time. It's simply a failure of your imagination.

What I can't understand is what you were getting at about "punching down". When you say things like "I would be strongly tempted to murder an engineer", that sounds like punching down to me. And I don't understand why you were complaining that nobody gave any examples, saying "no specific GOOD examples of configuration-as-code have been brought up". Don't my examples count, or do you consider them bad?

So what was bad about my examples (or did you not read them or follow any of the links that you asked for)? Pantomime had many procedurally generated config files, using the JSON templating engine I described, one for every plug-in object (and everything was a plug-in so there were a lot of them), as well as some for the Unity project and the build deployment configuration itself. It also used dynamically generated JSON for many other purposes, but that doesn't cancel out its extensive use of JSON for config files.


Here are some concrete examples of some actual JavaScript code that dynamically generates a bunch of JSON, both to create, configure, send messages to, and handle messages from Unity3D prefabs and objects, and also to represent higher level interactive user interface objects like pie menus.

What this illustrates should be blindingly obvious: that JavaScript is the ideal language for doing this kind of dynamic JSON generation and event handling, so there's no need for a special purpose JSON templating language.

Making a JSON templating language in JavaScript would be as silly as making a HTML templating language in PHP (cough i.e. "Smarty" cough).

https://news.ycombinator.com/item?id=20736574

JavaScript is already a JSON templating language, just as PHP is already an HTML templating language.

UnityJS applications create and configure objects by making lots and lots of parameterized JSON structures and sending them to Unity, to instantiate and parent prefabs, configure and query properties with path expressions, define event handlers that can drill down and cherry pick exactly which parameters are sent back with events using path expressions (the handler functions themselves are filtered out of the JSON and kept and executed on the JavaScript side), etc.

At a higher level, they typically suck in a bunch of application specific JSON data (like company models and financial data), and transform it into a whole bunch of lower level UnityJS JSON object specifications (like balls and springs and special purpose components), or intermediate JSON user interface models like pie menus, to create and configure Unity3D prefabs and wire up their event handlers and user interfaces. Basically you're transforming JSON to JSON, and associating callback functions, and sending it back and forth in messages and events between JavaScript and Unity.

There are also a bunch of standard JSON formats for representing common Unity3D types (colors, vectors, quaternions, animation curves, material updates, etc), and a JSON/C# bridge that converts back and forth.

https://github.com/SimHacker/UnityJS/blob/master/doc/Archite...

This is a straightforward function that creates a bunch of default objects (tweener, light, camera, ground) and sets up some event handlers, by creating and configuring a few Unity3D prefabs, and setting up a pie menu and camera mouse tracking handlers.

Notice how the "interests" for events include both a "query" template that says what parameters to send with the event (and can reach around anywhere to grab any accessible value with path expressions), and also a "handler" function that's kept locally and not sent to Unity, but is passed the result of the query that was executed in Unity just before sending the event. The point is that not every "MouseDown" handler needs to see the exact same parameters, it's a waste to send unneeded parameters, and some handlers need to see very specific parameters from elsewhere (shift keys, screen coordinates, 3d raycast hits, camera transform, other application state, etc). So each specific handler gets to declare exactly which if any query parameters are sent with the event, up front in the interest specification, to eliminate round trips and unnecessary parameters.

https://github.com/SimHacker/UnityJS/blob/master/Libraries/U...
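
For readers who don't want to dig through the linked code, the general idea of "handlers stay local, only data goes over the wire" can be sketched like this. It is written in Python rather than JavaScript, and it is not UnityJS's actual API or schema: `split_handlers`, the key names, the prefab path and the path-expression strings are all hypothetical stand-ins for the mechanism described above.

```python
import json


def split_handlers(spec, path=()):
    """Return (wire_spec, handlers): callables are removed from the data and kept locally by path."""
    wire, handlers = {}, {}
    for key, value in spec.items():
        here = path + (key,)
        if callable(value):
            handlers[here] = value
        elif isinstance(value, dict):
            wire[key], nested = split_handlers(value, here)
            handlers.update(nested)
        else:
            wire[key] = value
    return wire, handlers


received = []

# Hypothetical object spec: a "query" that is sent along, and a "handler" that never leaves this side.
spec = {
    "prefab": "Prefabs/Ball",
    "interests": {
        "MouseDown": {
            "query": {"screenPosition": "mousePosition", "shift": "shiftKey"},
            "handler": lambda results: received.append(results),
        }
    },
}

wire_spec, handlers = split_handlers(spec)
message = json.dumps(wire_spec)  # safe to send: no functions are left in it

# Later, an event comes back carrying only the values the query asked for.
event = {"name": "MouseDown", "results": {"screenPosition": [10, 20], "shift": False}}
handlers[("interests", event["name"], "handler")](event["results"])
print(message)
print(received)
```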

The following code is a more complex example that creates the Unity3D PieTracker object, which handles input and pie menu tracking, and sends JSON messages to the JavaScript world.pieTracker object and JSON pie menu specifications, which handle the messages, present and track pie menus (which it can draw with both the JavaScript canvas 2D api and Unity 3D objects), and execute JavaScript callbacks (both for dynamic tracking feedback, and final menu selection).

https://github.com/SimHacker/UnityJS/blob/master/Libraries/U...

Pie menus are also represented by JSON of course. A pie can contain zero or more slices (which are selected by direction), and a slice can contain zero or more items (which are selected or parameterized by cursor distance). They support all kinds of real time tracking callbacks so you can provide custom feedback. And you can make JSON template functions for creating common types of slices and tracking interactions.

This is a JavaScript template function MakeParameterSlice(label, name, calculator, updater), which is a template for creating a parameterized pie menu "pull out" slice that tracks the cursor distance from the center of the pie menu, to control some parameter (i.e. you can pick a selection like a font by moving into a slice, and also "pull out" the font size parameter by moving further away from the menu center, and it can provide feedback showing that font in that size on the overlay, or by updating a 3d object in the world, to preview what you will get in real time). This template simply returns a blob of JSON with handlers (filtered out before being sent to Unity3D, and kept and executed locally) that does all that stuff automatically, so it's very easy to define your own "pull out" pie menu slices that do custom tracking.

https://github.com/SimHacker/UnityJS/blob/master/Libraries/U...


It sounds like you're assuming that configurations are only created before running a program. But you can also create them while programs are running, to configure dynamically created objects or structures, too. And you can send those configurations as messages, to implement, for example, a distributed network object system for a multi player game. So you may be programmatically creating hundreds of dynamic parameterized "configuration files" per second.


How about normally-Turing-complete languages that can be stripped down to non-Turing-completeness to make a configuration DSL?

This is exactly what Tcl supports / was designed to do (and in turn is one of my motivations for developing OTPCL). This is also exactly what your average Lisp or Scheme supports.


Any reason why you think they're bad? Sounds enticing to me to be able to have a bit of logic in configuration file.


Programming language design and implementation is a huge and hard problem. What you get is an incomplete frustrating language full of semantic oddities and confusions without any serious support tooling to help you out.

If you use it in anger, you quickly need all language features e.g. importing libraries, namespaces, functions, data-structures, rich string manipulation etc. But you rarely get these.

At run-time, you don’t have a debugger or anything, leading to a maddening bug fix experience because config cycle times are really high.

Because it’s a niche language, only one poor soul in a team ends up the expert of all the plentiful traps.

Eventually... you give up and end up generating the config in a proper language and it feels like a breath of fresh air.


Why not use an actual language instead? Like Guix uses guile.


One of the most ridiculous examples of this was the Smarty templating language for PHP.

Somebody got the silly idea in their head of implementing a templating language in PHP, even though PHP is ALREADY a templating language. So they took out all the useful features of PHP, then stuck a few of them back in with even goofier inconsistent hard-to-learn syntax, in a way that required a code generation step, and made templates absolutely impossible to debug.

So in the end your template programmers need to know something just as difficult as PHP itself, yet even more esoteric and less well documented, and it doesn't even end up saving PHP programmers any time, either.

https://web.archive.org/web/20100226023855/http://lutt.se/bl...

>Bad things you accomplish when using Smarty:

>Adding a second language to program in, and increasing the complexity. And the language is not well spread at all, allthough it is’nt hard to learn.

>Not really making the code more readable for the designer.

>You include a lot of code which, in my eyes, is just overkill (more code to parse means slower sites).

https://web.archive.org/web/20090227001433/http://www.rantin...

>Most people would argue, that Smarty is a good solution for templating. I really can’t see any valid reasons, that that is so. Specially since “Templating” and “Language” should never be in the same statement. Let alone one word after another. People are telling me, that Smarty is “better for designers, since they don’t need to learn PHP!”. Wait. What? You’re not learning one programming language, but you’re learning some other? What’s the point in that, anyway? Do us all a favour, and just think the next time you issue that statement, okay?

http://www.ianbicking.org/php-ghetto.html

>I think the Broken Windows theory applies here. PHP is such a load of crap, right down to the standard library, that it creates a culture where it's acceptable to write horrible code. The bugs and security holes are so common, it doesn't seem so important to keep everything in order and audited. Fixes get applied wholesale, with monstrosities like magic quotes. It's like a shoot-first-ask-questions-later policing policy -- sure some apps get messed up, but maybe you catch a few attacks in the process. It's what happened when the language designers gave up. Maybe with PHP 5 they are trying to clean up the neighborhood, but that doesn't change the fact when you program in PHP you are programming in a dump.


> One of the most ridiculous examples of this was the Smarty templating language for PHP.

Wow... Yuck!

Lua is at least a "standard" and rather sane language. Not ad hoc insanity.



You should check out dhall-lang


Are you looking for something like https://jsonnet.org/ ?


Some templating languages such as Jsonnet[0] add built-in templating and just enough programmability to cover basic operations like templating and iteration.

I originally felt it was overly complex, but after seeing some of the Go text/template and Ansible Jinja examples in the wild, it actually seems like a good idea.

Perhaps we should more strongly distinguish between “basic” data definition formats and ones that need to be templated. JSON5 for the former and Jsonnet for the latter, for example.


agreed, text templating of yaml (or any structured content) does not make sense. too much context (actual config structure) is lost if plain text is used.

i've collaborated on ytt (https://get-ytt.io) - yaml templating tool. it works directly with yaml structure to bind templating directives. for example setting a value is associated with a specific yaml node so that you dont have to do any manual indenting etc. like you would with plain text templating. defining functions that return yaml structures becomes very easy as well. common problems such as improperly escaped values are gone.

i'm also experimenting with a "strict" mode [1] that raises error for questionable yaml features, for example, using NO to mean false.

i think that yaml is here to stay (at least for some time) and it's worth investing in making tools that make dealing with yaml and its common uses (templating) easier.

[1] https://github.com/k14s/ytt/blob/master/docs/strict.md


Shoot, it will be hell to test such a monster, say you have a single typo somewhere, lol


The issue is, I think most people (myself included) enter YAML into their lives as basically a JSON alternative with lighter syntax. Without really realizing, or perhaps without internalizing, the rather ridiculous number of different ways to represent the same thing, the painful subtle syntax differences that lead to entirely different representations, the sometimes difficult to believe number of features that the language has that are seldom used..

It's not just an alternate skin for JSON, and yet that's what most people use it for. Some users also want things like map keys that aren't strings, which is actually pretty useful.

I recall there being CoffeeScript Object Notation as well... perhaps that would've been better for many use cases, all things said.


I've never understood this. JSON is really not that difficult to work with manually. I tend to write my config files as JSON for utilities I write. What is it with peoples' innate aversion to braces?


I don't have an aversion to braces. Rather, my issue with JSON is that it doesn't have comments and that you cannot use an optional trailing comma.


Having spent a nontrivial amount of my life hand editing large JSON files now, I have to agree here. Lack of comments and trailing commas are a real QOL issue.


Also no multiline strings. Using \n or string arrays is painful.

I don't get why TOML is so underrated, it's barely mentioned in the HN discussion


Thanks for mentioning it; I hadn't encountered it yet and it seems like a very sane config file format. There's several mentions in this discussion actually, nearly all positive.


And the required double quotes around strings. YAML’s string handling is a lot easier to deal with.


I think it is good to require quotation marks for strings, at least for values (although I could live with it if quotation marks for strings are allowed even if not required, since then, if you do not like the feature of not having quotation marks for strings, you can just not use that feature).

Maybe it would make sense if quotation marks were not required for keys with only a restricted character set which are not an empty string, though.


No quotes around keys would be sufficient, honestly. I use YAML a lot for API documentation, and there are still some cases where wrapping your values in quotes is necessary. But requiring it for keys becomes very annoying.


It’s also a lot less obvious. What’s so difficult about wrapping a string in double quotes?


It gets annoying when you have to do what seems unnecessary.


It's not really unnecessary when someone can write an entire article on how unfathomable the output is without them.


It’s easier until you hit one of the cases where a particular value is interpreted as a different type, possibly in a very confusing context. I’ve seen that bite enough people that I end up quoting strings to avoid confusion.


It’s also extremely hard to learn as a beginner.

“Hey, I deleted a character in a string and now I am getting this weird schema validation exception”.


Or “Why did it break when I changed the version from 3.7 to 3.7.1?”
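
To make the 3.7 vs 3.7.1 surprise concrete, here is a toy sketch in Python of how implicit scalar typing behaves. It is not a real YAML parser: `resolve_scalar` and its rules are hypothetical and only loosely modelled on YAML 1.1-style resolution, so treat it as an illustration of the idea rather than of any particular library.

```python
# Toy illustration only: a hypothetical resolver, NOT a real YAML implementation.
BOOL_WORDS = {"yes": True, "no": False, "true": True, "false": False,
              "on": True, "off": False}

def resolve_scalar(text: str):
    """Guess a type for an unquoted scalar, the way implicit typing does."""
    lowered = text.lower()
    if lowered in BOOL_WORDS:
        return BOOL_WORDS[lowered]
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text  # anything that is not a bool or a number stays a string

for raw in ("3.7", "3.7.1", "NO", "norway"):
    value = resolve_scalar(raw)
    print(f"{raw!r} -> {type(value).__name__}")
```

In this sketch `3.7` comes back as a float while `3.7.1` stays a string, so adding one character changes the type and can trip a schema check. Quoting the value sidesteps the guess entirely.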



JSON5 looks good, actually. It has all of the added features that it should have.

They don't mention mismatched surrogates and otherwise invalid Unicode characters, but they should perhaps be implementation-dependent, like duplicate keys are. (It can either allow them or report an error.) There is also the possibility that some implementations may wish to disallow "\x00" or "\u0000" too, I think.

The thing I disagree with is the part about white space. U+FEFF should be allowed only at the beginning of the document (optionally), and other than that only ASCII white space should be allowed. Unquoted keys also should be limited to ASCII characters.

Other than that, I think it is good.


JSON is serviceable as an intermediate format, machine-generated and machine-consumed.

It is outright bad as a human-operated format. It explicitly lacks comments, it does not allow trailing commas, it lacks namespaces, to name a few pain points.

YAML is much more human-friendly, with all its problems.


I often hear the “comments aren’t supported” argument against JSON, but as a daily consumer, creator, and maintainer of JSON, I honestly can’t recall ever _really_ needing comments in JSON. It tends to be somewhat self documenting in my experience.


A config file without comments can mean serious annoyance.

If JSON was developed more recently on places like GitHub, it would never have ended up like that with that many deficiencies.


When maintaining a JSON file, did you ever happen to wonder why a particular value is what it is?

This is where comments belong.


If it's that important and complex, have an accompanying README that lists line numbers and comments.


Good idea! And then put a comment into the configuration file that refers to where the documentation is.. ah, f__k!


Lesson to learn: Nobody reads the docs.


This is such a dumb aphorism. I read and create docs every single day.

If the comments are so critical that it is a problem, then an accompanying file with those comments would be used. Otherwise, it's just a bunch of crocodile tears.


The lack of comments is the real problem. When you need to explain why a particular parameter in the config file is set a certain way JSON becomes a real problem.


Comments, comments, comments.

Seriously, our batch jobs for better or worse have configs with a bunch of parameters that are passed around as json, and while most variable names are intuitive and there is documentation on the wiki, and most often the config can be autogenerated by other tools it would still be better if when I manually open it in the config itself I would easily see the difference between n_run_threads vs n_reg_threads, etc...


json's lack of int types is what ruins it for me


JSON alternative with lighter syntax and comments is basically what I tried to make StrictYAML.

I made it largely because I saw a disconnect with what YAML was, and what people - including me - thought it was (which is what it should be).

Don't agree with non-string map keys though... they're a complication I never saw a use for.


They’re fairly useful in applications that use numeric IDs. For example, if I’m using SQL, and I have a table with an AUTOINCREMENT primary key, I’m going to have a lot of numeric IDs. If I want to reference these in a config file of some kind, I don’t want to have to read them as strings and handle the parsing on my end.

Even if you’re of the opinion that IDs shouldn’t be numeric, there are a lot of cases where you’re stuck with integers—on Linux, user IDs, group IDs, and inodes are just a few examples.
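
For comparison, a small Python sketch of what happens to integer keys in JSON, where object keys are always strings. The `uid_names` mapping is made up for illustration.

```python
import json

# Hypothetical mapping from numeric user IDs to names.
uid_names = {0: "root", 1000: "alice"}

encoded = json.dumps(uid_names)   # integer keys are written out as strings
decoded = json.loads(encoded)     # ...and they come back as strings

# The consumer has to convert the keys back by hand.
restored = {int(key): name for key, name in decoded.items()}

print(encoded)
print(list(decoded))
print(restored == uid_names)
```

The extra `int(key)` step is the "handle the parsing on my end" part; a format with typed keys would not need it.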


Ah I see, yes that makes perfect sense. I've used integer keys too. Sorry, I thought by non-string you meant non-scalar - i.e. the idea of using lists as keys (allowed in YAML).


I think they did mean nonscalar keys. Say I have a compound primary key in a database, over 3 columns. In YAML, representing that key as an array of the three columns' values (or a map from column name to value) makes sense, and so does using that as a key in other maps.


I was suspicious of YAML from day one, when they announced "Yet Another Markup Language (YAML) 1.0", because it obviously WASN'T a markup language. Who did they think they were fooling?

https://yaml.org/spec/history/2001-08-01.html

XML and HTML are markup languages. JSON and YAML are not markup languages. So when they finally realized their mistake, they had to retroactively do an about-face and rename it "YAML Ain’t Markup Language". That didn't inspire my confidence or look to me like they did their research and learned the lessons (and definitions) of other previous markup and non-markup languages, to avoid repeating old mistakes.

If YAML is defined by what it Ain't, instead of what it Is, then why is it so specifically obsessed with not being a Markup Language, when there are so many other more terrible kinds of languages it could focus on not being, like YATL Ain't Templating Language or YAPL Ain't Programming Language?

https://en.wikipedia.org/wiki/YAML#History_and_name

>YAML (/ˈjæməl/, rhymes with camel) was first proposed by Clark Evans in 2001, who designed it together with Ingy döt Net and Oren Ben-Kiki. Originally YAML was said to mean Yet Another Markup Language, referencing its purpose as a markup language with the yet another construct, but it was then repurposed as YAML Ain't Markup Language, a recursive acronym, to distinguish its purpose as data-oriented, rather than document markup.

https://en.wikipedia.org/wiki/Markup_language

>In computer text processing, a markup language is a system for annotating a document in a way that is syntactically distinguishable from the text. The idea and terminology evolved from the "marking up" of paper manuscripts (i.e., the revision instructions by editors), which is traditionally written with a red or blue pencil on authors' manuscripts. In digital media, this "blue pencil instruction text" was replaced by tags, which indicate what the parts of the document are, rather than details of how they might be shown on some display. This lets authors avoid formatting every instance of the same kind of thing redundantly (and possibly inconsistently). It also avoids the specification of fonts and dimensions which may not apply to many users (such as those with varying-size displays, impaired vision and screen-reading software).


https://noyaml.com/

YAML is bad.

Every YAML parser is a custom YAML parser.

https://matrix.yaml.io/valid.html


The problem is with parsers, and how they are implemented or used. YAML actually has a way to specify the type of the data; alternatively, the application is supposed to suggest the desired type. What this take is showing is which types are assumed when they are not specified.


Oh Puppet, why did you use your own executable YAML.


I'll say it: I think YAML is great and a joy to use for configuration files. I can write it even with the dumbest editor, I can write comments, multi-line strings, I can get autocompletion and validation with JSON schema, I can share and reference other values. It allows tools to have config schemas that read like a natural domain specific language, but you already know the syntax. I haven't had problems with it at all.


This was me too - until yesterday, when I made a minor change to one of our YAML config files and everything broke. On investigation it turned out that all of our YAML files had longstanding errors but those errors happened to be valid syntax and also did not cause any bad side effects, so we had been getting away with it by pure luck until I made a change that happened to expose the problem.

So now no longer a YAML fan...


That would make me not a fan of the particular parsers/validators I've been using, rather than not a fan of YAML.

The big strike against YAML I see there is that it needs a good conformance test suite and implementations need to be tested against it. But that's not a problem with the format but a fairly easy to fix ecosystem problem.


> of the particular parsers/validators

But the syntax was valid, the parsers/validators would've been correct to accept it.


I agree. As long as you're using a strict parser, I've found YAML to be much nicer for configuration than JSON. I use Python's ruamel.yaml library, and have never had any weird type problems. Once the nesting gets too deep, it can be a pain, but that's the same for JSON.

I have found myself using TOML more and more for configuration, though. It helps a lot with keeping things flat and easy to read. I'll still prefer YAML over JSON for human-writable files, but I'm starting to prefer TOML over YAML.


I've got to say it is the most frustrating config file ever to write. The only time I have to use it is for Docker Compose and I am constantly fighting vim on indentation and trying to make sense of confusing errors about "unexpected block start." Do you have any suggested vimrc for YAML?


fish shell is looking for a new text serialization format for its history file (currently it uses an ad-hoc broken pseudo-YAML).

Boxes to check:

1. Self describing format

2. SAX-style parser available to C++

3. Easy for users to understand and ad-hoc parse using command-line tools

4. No document closing necessary, so appending is trivial

YAML looks pretty good:

    - cmd: git checkout file.txt
      when: 1565133286
      pwd: /home/me/dir/
      paths:
      - file.txt

protobuf is also an option:

    entry {
      cmd: "git checkout file.txt"
      when: 1565133286
      paths: "file.txt"
    }

though I am unsure of how well its text serialization is supported.

Any suggestions?


Disclaimer: I work on Tree Notation. (https://github.com/treenotation/jtree)

Here's a proposal: use a Tree Language.

I created a demo for you called "Fished": https://github.com/breck7/fished.

Took me just a few minutes but already get type check, autocomplete, syntax highlighting, and more.

Tree Notation is early, and there will be kinks until the community is bigger, but I think it may be useful for you.

http://treenotation.org/designer/#grammar%0A%20fishedNode%0A...


Wow. This looks really cool. Is there a sort of design defense on how this was designed (tree notation)?


Not sure if I’m familiar with the term “design defense”. Can you explain?

Stumbled into the idea. Basically just brute forced it. Tried thousands of things, built a huge database of languages, and tried to keep it simple.


I take the idea of “design defense” from Pyramid (Python Web Framework) [0], and have incorporated it into documentation on my projects.

Basically, it’s a narrative discussion of how this solution came to be, the trade offs involved, and perhaps its relationships with prior art.

[0] https://docs.pylonsproject.org/projects/pyramid/en/1.10-bran...


Do you know if this is called something else? (perhaps in other disciplines)

Googling "Design defen{c,s}e" just got me a whole lot of military contractors.


http://enwp.org/Design_document is the commonly used term of art.


Very cool. Thank you. Seems like it would be a useful exercise. Added to the todo list.


How about JSONL (JSON Lines)? http://jsonlines.org/

Ps. Thanks for (all the) fish, it's my daily driver shell and keeps me that much more sane c.f. the alternatives.


That's really close to [RFC 7464](https://tools.ietf.org/html/rfc7464), JSON Text Sequences. It uses U+001E RECORD SEPARATOR. The `jq` tool supports those if you pass a flag.
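
A rough Python sketch of that framing, assuming only what RFC 7464 specifies: each record is written as RS, then the JSON text, then a line feed. The helper names `encode_seq` and `decode_seq` are made up for this example.

```python
import json

RS = "\x1e"  # U+001E RECORD SEPARATOR

def encode_seq(records):
    """Serialize each record as RS + JSON text + LF."""
    return "".join(f"{RS}{json.dumps(rec)}\n" for rec in records)

def decode_seq(text):
    """Split on RS and parse each chunk, skipping chunks that fail to parse."""
    parsed = []
    for chunk in text.split(RS):
        if not chunk.strip():
            continue
        try:
            parsed.append(json.loads(chunk))
        except json.JSONDecodeError:
            continue  # a truncated record does not poison its neighbours
    return parsed

history = [{"cmd": "git checkout", "when": 1234},
           {"cmd": "vagrant up", "when": 4567}]
encoded = encode_seq(history)
print(decode_seq(encoded) == history)
```

Note that the separator goes in front of each record rather than at the end of the line, which is what lets a reader resynchronise after a half-written entry.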


So just add a record separator at the end of each line?


jq can also handle newline-separated JSON objects; it reads them by default, and the -s flag slurps them into a single array.


(suggestion) Drop the 4th requirement.

Having to close contexts is a VERY good 'sanity check' to see if something is malformed or not.

If appending is necessary make the parser handle multiple copies of the namespace and merge them upon output. Unknown keys and sections should also always be copied from input to output (this is how you embed comments).


I'm interested, can you explain more about the merging idea?

To clarify the requirement, history could be a JSON array of objects:

    [
        {"cmd": "git checkout", "when": 1234 },
        {"cmd": "vagrant up", "when": 4567 }
    ]

To append an entry to this file and keep it valid, one must locate the closing square bracket and overwrite it. That work is what I hope to avoid.


Why not use a “stream” of objects?

    {"cmd": "git checkout", "when": 1234}
    {"cmd": "vagrant up", "when": 4567}

Not sure about other languages and libraries, but Go supports this out of the box[1]. And while we're at it, why not CSV? That can be processed with awk.

    #cmd,when
    "git checkout","1234"
    "vagrant up","4567"

[1]: https://play.golang.org/p/sTN9z4Kv3DB


CSV is fine for simple cases but has issues with versioning (adding new/optional fields) and nested data like arrays.

The object stream idea is apparently supported widely and seems pretty strong. Thanks for the suggestion!


I use JSON streams a lot with command line tools. Keep in mind that it’s limited in that each object must consume a single line. This allows you to recover from syntax errors in a single entry; each line is a fresh start.
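
A minimal Python sketch of that property, using only the standard library. The file name and the helpers `append_entry` and `read_entries` are hypothetical; the point is that appending never rewrites existing content and a bad line only costs that one entry.

```python
import json
import os
import tempfile

def append_entry(path, entry):
    """Append one record as a single line; nothing already in the file is rewritten."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")

def read_entries(path):
    """Yield parsed records, skipping lines that are not valid JSON."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                continue  # each line is a fresh start

history_path = os.path.join(tempfile.mkdtemp(), "history.jsonl")  # hypothetical name
append_entry(history_path, {"cmd": "git checkout", "when": 1234})
with open(history_path, "a", encoding="utf-8") as fh:
    fh.write('{"cmd": "half-written\n')  # simulate a corrupted entry
append_entry(history_path, {"cmd": "vagrant up", "when": 4567})

entries = list(read_entries(history_path))
print([entry["cmd"] for entry in entries])
```

Because the reader works line by line, it also never needs the whole file in memory.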

Have you considered SQLite? I know it’s not a friendly text format, but it alleviates a lot of the issues with the “append to a text file” approach, such as concurrency. It’s great for this sort of thing.


Loosely running with your example:

A file already exists with this content:

    [{"cmd": "git checkout", "when": 1234 }]

Another tool wants to add a setting/element/etc, and simply creates a new config object with only the change in question included and APPENDS it to the existing config.

    [{"cmd": "git checkout", "when": 1234 }][{"cmd": "vagrant up", "when": 4567},{"comment": "Comments, notes, etc are kept even if they don't validate to the recognized configuration options."}]

A configuration validator / etc loads the config and merges these in state-machine order, over-writing existing values with the latest ones from the end of the stream of objects and then determining if the result is a valid configuration. (Maybe file references fail to resolve / don't open / there's some combination of settings that's not supported...)

    [
      {"cmd": "git checkout", "when": 1234 },
      {"cmd": "vagrant up", "when": 4567},
      {"comment": "... actually kept as above"}
    ]
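
One way the merge step could be sketched in Python, assuming the file is simply several JSON arrays written back to back. `iter_documents` and `merge_history` are made-up names, and concatenating the arrays in file order is my reading of the example; for key/value settings the same loop could call `dict.update` so later values overwrite earlier ones.

```python
import json

def iter_documents(text):
    """Yield each top-level JSON value from a string of concatenated documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        yield value

def merge_history(text):
    """Concatenate the appended arrays, in file order, into one list."""
    merged = []
    for document in iter_documents(text):
        merged.extend(document)
    return merged

raw = ('[{"cmd": "git checkout", "when": 1234}]'
       '[{"cmd": "vagrant up", "when": 4567}, {"comment": "kept as-is"}]')
print(json.dumps(merge_history(raw), indent=2))
```

Unknown entries such as the `comment` object pass through untouched, which matches the "copied from input to output" behaviour described above.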


> Unknown keys and sections should also always be copied from input to output (this is how you embed comments).

Better than nothing I guess, but I'd say just use a syntax that supports comments.


S-Expressions are quite simple, there are some parsers floating around in well-known projects, although I'm not sure they're SAX-style: https://leon.bottou.org/projects/minilisp

I also wonder if you need a text format, or if SQLite or systemd's journal API would work.


I love proto, but the text format was an afterthought. The binary format is rigorously defined, portable, extensible and optimized. The text format was reverse engineered from the C++ implementation after the fact when folks found textproto useful. Unfortunately there are discrepancies between languages around the corner cases of the text format and that's the sad world we live in. Avoid letting textproto be part of your user exposed interface.


TOML?


TOML would be great, if not for an annoying obscure detail in the specification that makes it hard to use for my typical use cases (scientific computation) [1]. Moreover, I find it quite unintuitive how you are supposed to specify arrays of tables [2]: this kind of thing is much easier in JSON (which is the format I am currently using, although it is far from perfect).

[1] https://github.com/toml-lang/toml/issues/356

[2] https://github.com/toml-lang/toml#user-content-array-of-tabl...


That is an annoyingly obscure detail, I think the wrong decision was made there. Hopefully it gets reversed.

Personally I really like the array of tables syntax. It is a little unintuitive but it's not difficult to remember. It's useful for fulfilling the OP's "No document closing necessary, so appending is trivial" requirement.

And if you don't want to use it, you can always use inline tables in an array, just like JSON.


Can't recommend TOML enough. I use it for everything. Super simple and easy to edit.

It fulfills all of the requirements. There are several available C++ TOML parsers, including one from Boost.


Do you ever have the limitation that arrays must be of a single type come up as an issue?


I haven't run into that. I think in the contexts for which TOML was meant, it isn't that big of an issue though. Also, I'm not entirely sure how many parsers enforce that restriction.


TOML is great when your data are mostly flat.


It's great at representing nested data in a flat way, as well.


Obligatory "thanks for fish shell".

Try just using line-delimited JSON objects (http://jsonlines.org/). It ticks all of your boxes, especially 3: "jq '.cmd' fish_history | histogram".

Neither YAML nor Protobufs is quite as easy as that.

All in all it's ridiculously simple, easy to parse in a variety of languages and each row is a single line that's simple to iteratively parse without loading the whole thing into memory.


this seems like it makes json useful for logging, but not too useful as configuration. For instance, it doesn't support commenting, and it seems like every line needs to have all its children compressed onto one line?


Indeed, it's not suitable for configuration. But we are talking about logging shell history, not configuring it.


Tcl with control structure commands disabled and infix assignment for convenience. Jim Tcl is a lightweight implementation if the main line isn't workable.


I'm tempted to suggest CSV.


I've used YAML as the format for a config file, and I certainly regret that choice. Trying to explain to someone that doesn't know YAML how to edit it without setting them up for failure is quite annoying. There are too many non-obvious ways to screw up, like forgetting the space after the colon or of course bad indentation.


YAML is easier to read and write. That's the benefit. It's also always going to be smaller than the equivalent JSON or XML. Maybe it's not as correct, maybe some people don't like it, I don't really mind it. I don't see it really going anywhere soon either considering Kubernetes and the lack of alternatives in widespread usage.

I've never had someone that needed extensive help understanding YAML and that's besides reviewing work for people just coming up to speed. Find me an IDE or editor that doesn't have YAML support. Also, YAML supports comments so if you have pitfalls people need to know about you can document them inline.

Your argument is people who don't know things might screw stuff up. Well Yeah! This applies to everything.


>”YAML is easier to read and write.”

You may be surprised to find that there’s significant disagreement on that point.


Not surprised at all, people on the internet complain about everything. When they build something better I'll be the first to jump ship.


> When they build something better I'll be the first to jump ship.

Will you though?

With learning a tool (almost any tool), its value increases with more use until you find a local maximum. The amount of effort to switch to something with a higher maximum at that point will definitely be considered by most people as part of the cost of that competitor. It is very hard to write off years of your life on anything.

When was the last time you did this?

It could take a few months at a minimum, just to see if you can understand it. Then, perhaps years of building things another way just to get good at it. Would you really give a few years of your life in making configuration suck less forever?

If you're serious, I have a potential solution: We don't have configuration files (or indention) or parsing problems, and users aren't the slightest bit confused on how to configure our applications (although they are often surprised if they've ever had to use a configuration file!). The downside is it's going to require a lot of re-learning on your part, and there's little I can do to make it any easier for you.


Will I? Personally, I enjoy tinkering with new tech. If we look at trends in technology, it's an absolute guarantee.

The last time I wrote off technology? In 30 years I've been doing this, I would say constantly, especially with tooling. I wouldn't consider the lessons learned using defunct tools to be lost either.

I'm not complaining about serialization formats, so why would I dedicate time to making it better?

Solution? I'm a quick learner so I'm not overly concerned. In a perfect world, you could just point me to your documentation.


Ok, I'll give it a shot.

Smalltalk environments don't typically have "configuration files" because it's so much easier to just directly manipulate the configuration parameter. Interactive development (the kind that is impossible with anything but the most specialised of IDEs) simply makes "config files" obsolete. Take a look at the seaside configuration guide[1] to get an idea what this is like.

In k/q[2] we also don't typically use configuration files, but for a different reason: Every type can be serialised over network or onto the disk, and when it's written to the disk it's usually in a format we can mmap() and access transparently. This is how q is also a database -- the data types q supports also includes tables.

k/q also has a built-in event loop that's not dissimilar to what you get when you run nodejs with the debugger port open except it's fast, and it's the regular way k/q processes communicate with each other.

What typically happens is that a table is designed for configuration, and we just expose it to the UI. Production environments are usually locked down so the only UI is to edit existing configuration parameters (and those are permissioned accordingly). These UI are typically quite general for any k/q data type, so they're quite rich and easy for people to use.

Then parts of the application interested in configuration just query the appropriate configuration table - this is only about 1000x faster than you would expect connecting to a remote database, and in many ways it's similar to a python application just storing its config in an sqlite database, except SQLite doesn't let you have a table as a data type so you can't put a table into another table, and you don't have tooling around comments and advice like you do with k/q UIs.
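
For the Python-and-SQLite comparison, a minimal sketch of what "config in a table" can look like with the standard `sqlite3` module. The table layout, the `get_setting` helper, and the sample rows are all hypothetical; the `note` column stands in, crudely, for the comments and advice a richer UI would provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a real application would use a file path
conn.execute(
    "CREATE TABLE config (key TEXT PRIMARY KEY, value TEXT NOT NULL, note TEXT)"
)
conn.executemany(
    "INSERT INTO config (key, value, note) VALUES (?, ?, ?)",
    [
        ("n_run_threads", "8", "worker threads for the main run"),
        ("n_reg_threads", "2", "threads reserved for regression jobs"),
    ],
)

def get_setting(key, default=None):
    """Look up one setting, falling back to a default when the key is absent."""
    row = conn.execute(
        "SELECT value FROM config WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else default

print(int(get_setting("n_run_threads")))
print(get_setting("missing", "fallback"))
```

Values are stored as text here, so the caller still decides the type; that is one of the gaps relative to an environment where tables and typed values are native.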

There are other places if you look carefully: Environments people have used (even beyond thirty years ago) that didn't have configuration "files", often had interesting and useful solutions to storing configuration. People tended to build configuration into part of their application, and so the legacy of that has tended to be excellent tooling instead of novel file formats.

[1]: http://www.shaffer-consulting.com/david/Seaside/Configuratio...

[2]: https://kx.com/


You're trying to tell me a database or passing config options to your program are better than configuration files? What you're describing is the entire reason and purpose of why configuration files exist. YAML is also not limited to configuration files. If we look at the example you provided it's pretty clear:

"From the point of view of a component, the configuration can be thought of as a Dictionary-like collection"

This is exactly what YAML and JSON provide too. Using a configuration file is a choice, so are parameters, and so are databases. I'm not really understanding what you're trying to get at?


No. I'm saying when your application language is also your database or your operating environment [or for some other reasons], you don't need a configuration format.

The reason configuration file [formats] exist is because many programs are configurable and programmers are too lazy (or are not specified to, take your pick) to build a configuration tool [that has all their needs]. Configuration files are inferior in every way to an integrated and well-thought-out configuration process except that they may be easier to build and use in less ideal environments.

JSON is a fine format for interchange, and even persistence (i.e. to store configuration) but as a "configuration file" that people are expected to edit in their own way it is lacking, and that's why there are things like YAML and TOML and a million other things.


Giving meaning to whitespace causes so many headaches and yet people still embrace Python, for some reason. I don’t understand it.


Your editor makes a world of difference here. Since you shouldn't be writing brace-language code without indents anyways, the biggest issue remaining is mixing tabs and spaces. Gedit makes this a big pain with its default config (it doesn't even auto-indent) but Atom and IDLE handle it well.


Code you write yourself is not usually the source of problems with significant whitespace; it's situations like posting code on websites and discussing it where code in a whitespace-significant language becomes next-to-useless when leading whitespace is stripped, whereas code in any other language will still survive and then easily be autoformatted without changing its meaning.


If I were to use such a shitty website, I'd rather make a pastebin and link to it, instead of forcing every reader to reformat it themselves.


Can’t remember the last time this has actually happened to me. In what websites are people posting code without code block formatting support? Like, instant messengers?


Funfact, even Facebook, Whatsapp, and Telegram support preformatted text in triple backticks.


I’ve heard this argument in a lot of contexts, and it has always struck me as saying, “if hitting yourself with this bat hurts, try wrapping this towel around it and maybe it will hurt less.”


Python 3 rejects mixed whitespace so the problem will be caught quickly.
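
A quick way to see this for yourself, sketched with the built-in `compile`. The source string is a contrived example where one line in a block is indented with a tab and the next with spaces.

```python
# Line 2 is indented with a tab, line 3 with eight spaces.
source = "if True:\n\tx = 1\n        y = 2\n"

try:
    compile(source, "<mixed-indent>", "exec")
    outcome = "compiled"
except TabError as exc:
    outcome = f"TabError: {exc.msg}"

print(outcome)
```

The failure happens at compile time, before any of the code runs, which is why the problem tends to surface immediately rather than as a silent change in meaning.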


If I pinky promise to indent my code anyway, then why does it matter whether I also have braces or not? In fact, braces allow me to press one button in my editor and get the indentation absolutely perfect, without affecting semantics.

Braces also allow for easily copying and pasting blocks of code because the braces delimit the semantics of the copied text. Because your code is already indented, with white space indentation you have to check that a) you pasted the first line at the right indentation level b) every subsequent line is also at the right level relative to the first line. No small feat.


> If I pinky promise to indent my code anyway, then why does it matter whether I also have braces or not?

Without braces copy-pasting becomes a context-sensitive headache. That's poor usability.


Admittedly I’ve been writing code in Python for many years now but, even from the start I never had a problem with the significance of whitespace.

Quite the opposite.

I like to format my code nicely anyways (or rather, mostly my editor does it for me because I’ve asked it to do so).

I indent with two spaces usually, regardless of language. And have my editors configured to insert two spaces when I press tab.

JavaScript, Rust, Python, C. Same difference, in terms of how I use whitespace.


The main headaches are due to people either wanting to copy and paste code from various sites, or wanting to write really deeply nested code.

If you're writing well structured, original code in Python, it's generally cleaner and easier than other languages because the syntax avoids ambiguities that other languages have.


The difference in my experience is that once you know what's wrong with your whitespaces in Python, you're out of the woods. The interpreter is your friend from that point onward. YAML parsers, on the other hand, give you these really strange errors that are pretty difficult to understand, and it doesn't end with whitespaces.


There are quite a few comments saying they don't like python even from 10+ year users.

A language becomes popular largely through its library ecosystem and the resources around it, not just how the language looks. I think Google embracing it played a big role in gaining mind share.

https://news.ycombinator.com/item?id=20672051


YAML is so bad for human writing. Every time I write Ansible tasks, I get confused with indentation and how to do arrays etc. JSON and YAML are frankly a generation behind compared to TOML or JSON5.


I’m not keen on how so many tools and services opt for YAML by default, either. Both JSON and YAML are a nightmare to handle once you’ve got 3000 line files and several layers of nesting.

CI would be a lot nicer to use if it didn’t rely on a single YAML file to work. And if you want to switch, suddenly you have a build step to convert back to YAML.


I keep my YAML CI files as minimal as possible by putting the logic into a Makefile and/or shell scripts and just have the YAML invoke that.


As an ansible user, I hate YAML and its broken parsers with a passion, but the security objection does not make much sense. It does apply verbatim to any parser of anything if the implementation decides that a given label means "eval this content right away". I fail to see how this can be a fault of the DDL rather than the parser's.


The reason this is a fault of the DDL and not the parser is that the DDL spec decides that it has label that evaluates a command. The parser then has two options, either implement it or not conform to the spec (and essentially implementing a different DDL). For programming languages it makes sense to have an eval label/command. For configuration/serialization DDLs I think it's a terrible choice.


And terrible it is indeed, but I cannot find it specified - the strings eval, exec, command, statement do not even occur in the official specs (shallow doc perusal, I know)


That's because there's nothing in the spec stating anything about execution. The parent is simply incorrect. That's why they haven't responded.


> DDL spec decides that it has label that evaluates a command

This is simply wrong. There is nothing in the spec stating that.


> As an ansible user, I hate YAML and its broken parsers with a passion

Could you elaborate on this? I use Ansible daily and I've never had a problem with YAML once I took some time to understand it. What do you mean by broken parsers? I'm assuming that's something Ansible specific you are referring to.


I intensely dislike yaml's whitespace-based syntax because whitespace is white, and it gives very little visual context, especially in long, nested documents. Editors that expand/collapse branches do help some, but are no match for highlighting matching pairs of braces in other saner formats/languages (I am also not a fan of syntactic whitespace in python, if you get my drift.)

And ansible's parser is broken, in more ways than I can remember (haven't been writing playbooks and stuff for a couple of months now). If you like pointless pain, try embedding ":" in task names for a demo (or one of several other "meta" characters: the colon is just the one that tends to recur most).

I will give a passing mention to the smug, vague error message "You have an error at position (somewhere in the middle of the file) It seems that you are missing... (something) we may be wrong (they almost always are) but it appears it begins in position (some position close to the first line)" that sets off a hunt for the missing brace/colon/space/whatever and makes me want to do stuff to the person who devised it.

This compounds with the confusion brought on by the weaving of yaml's and Jinja2's syntaxes and ansible's own flakiness in deciding what is evaluated when - which decides when and if a variable does indeed change, and when yes means "yes" rather than true or 1, but not '1' or "true" (try prompting the user for a boolean variable, and find yourself writing if ( (switch == "true") or (switch == "1")) in short order).

Pity that ansible is so damn convenient, or I would have ditched it long time ago for anything - bash included (OK, maybe not bash).


Funny, as an Ansible user I love YAML. It works so well for me.


So what's the HN consensus on the best format for config files?

Is it TOML as the author seems to prefer at the end?


My vote is yes. Most configuration doesn’t need anything more sophisticated than key-value pairs, perhaps with namespaces. INI can manage that and TOML is basically a better-specified INI.


I can't tell if I've spent too much time on HN or if I came to this conclusion on my own, but TOML is my language of choice for configuration now. It's flexible in the right ways and sectioning of config is so important.


No reason to use INI over TOML. INI doesn't even have a standard specification.


That’s my point, yeah. INI is the right idea but TOML is the same thing but actually specified so use it.


.ini, followed by TOML, followed by an identical implementation of some other app's config format.

The biggest problem with config formats is they mislead users into thinking they understand the format. The user tries to edit it by hand, and chaos ensues. So only formats that are stupidly simple, or whose warts are already familiar and well documented, are good choices.

Apache had a great configuration format. Nothing else used it (that I knew of) but you could in theory implement "Apache configs" and then people'd just have to look up how to write those, which there's lots of examples of.

JSON and YAML and XML are data formats; they should only be written by machines, and read by humans. Same with protocols like HTTP, Telnet, FTP... You're not supposed to write it yourself, but it's readable to make troubleshooting easier.

Data formats are nice for expressing nested data structures, but then they don't (usually) support logical expressions; at that point you need a template/macro/programming language, and at that point you're writing code, which will need to be tested, and at that point you should just write modules and use a config format to give them arguments. Every complex tool goes through the same evolution.

If you care about your users, write a tool to generate configs based on a wizard. Good CLI tools do this, and it really makes life better. (It's also a great way to document all your config features in code, and test them)


If possible, prefer what tools in your vicinity use. My team uses Kubernetes and Concourse extensively, which both use YAML, so I tend to stick with YAML since people are already familiar with it.

(More recently, I've come around to prefer plain environment variables for configuration, but that only works nicely when the amount of configuration is fairly limited, say 20 values instead of 1000 values.)

For my own use, I do prefer TOML.


Perfect comment! I agree 100%.


In the scale world, HOCON is very nice. It’s a format designed explicitly for config files, and has a lot of niceties (like you can append files together and they merge correctly, so you don’t have to end up with giant config files)


What's the "scale" world?


I think they meant "Scala", the programming language.


Sorry, yes, Scala. Autocorrect changed it and I didn't notice.


I agree with HOCON being nice based on personal usage but I haven't seen an in depth analysis of it. This is the canonical parser for JVM based languages — https://github.com/lightbend/config, are there many other implementations that are widely used?


That's the one Akka uses, which probably comprises the majority of HOCON usage.

I've also used the Python port without issues.


I think it's horses for courses. JSON I guess is the best for interchange, i.e. machine to machine, but I never want to edit it by hand; XML is relatively easy to read but can be quite painful to edit raw, though it can be quite easy to develop a structured editor for it. I’d favour it for document persistence. YAML is fine for configuration files but I would be careful about how I apply it and would always provide it as a heavily documented templated config file. YAML when used correctly is by far the easiest to edit in the clear, with a plain text editor. With that said, I would try to get away with basic namespaced properties files first before I’d go that far ...


ini if needs are crazy simple, YAML if you need a structure like JSON's but with something any human ever needs to interact with. JSON if humans aren't in the loop.

TOML, in my opinion, is like a weird mishmash of JSON, ini, and bashisms. Though I have worked with it a lot less than the other formats, so YMMV.


Since when is JSON not human readable/maintainable??


Try writing strings with backslashes, or adding a new line to your array and accidentally leaving a trailing comma (or accidentally forgetting it for the previous row). It's also just very visually noisy. I agree with a lot of the other comments in this thread that for human configuration: TOML > YAML > JSON > XML
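
A small sketch of both complaints using Python's standard `json` module; the config keys and the Windows path are made up for illustration:

```python
import json

# A trailing comma after the last array element is not valid JSON.
try:
    json.loads('{"hosts": ["a", "b",]}')
    trailing_comma_ok = True
except json.JSONDecodeError:
    trailing_comma_ok = False

# Backslashes must be doubled inside JSON strings, so a Windows-style
# path picks up extra noise when it is serialized.
path = r"C:\temp\new"
encoded = json.dumps({"path": path})

print(trailing_comma_ok)
print(encoded)
```

The serialized form carries twice as many backslashes as the value itself, which is exactly the kind of thing that is easy to get wrong by hand.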


You can edit JSON by hand but it’s not what it’s for. It’s not designed for that and it’s not really suitable for that. Theoretically you can milk anything that has nipples, but you might find the experience of milking a cat to be ... challenging


The main issue I had with TOML is how much more syntactically noisy it is. Equivalent files with 2-3 levels of nesting usually become at least 50% longer than equivalent YAML.

More here : https://hitchdev.com/strictyaml/why-not/toml/


This is a different use case, I think. This example is defining content, not configuration. In this case the content is user stories. I agree for creating sequences of documents/content in this way, YAML often is nicer. But for configuration, TOML is designed to specify it in a simple and flat way, and that can be very helpful.

I have some projects where I'm frequently writing and modifying content that resembles the example here, and I use YAML there and plan to keep using YAML. For most other things, I'm just doing configuration, so I use TOML. No reason you need to stick to one or the other.


Been working more with Dhall and have really enjoyed it so far.

https://dhall-lang.org/


Putting commas at the start of the line is the toe shoes of syntax.


You can put them at the end of the line.


There's no white knight here, they all suck in some way. Personally I've had decent success with yaml as simple configuration, but I would never use it as an interchange format. If you know its caveats and you're targeting one language so you can become familiar with the parser, it's serviceable.


I say just use JSON. Everyone knows it already and it's good enough. Use a parser in your app that allows comments and trailing commas like vscode does.


That's not JSON anymore, that's some custom format that's JSON inspired.


If JSON did support this (and multiline strings), I think it’s unlikely anyone would reach for anything else.


Yep. Some kind of JSON++ is where we're headed. Hopefully we can agree on a new standard someday?

(No, not YAML.)


JSON5.org would be nice :)


Cool. It doesn't matter.


It's called JSON5. Don't bend JSON to confuse parsers.

https://json5.org/


I dislike json with comments or trailing commas as even if your parser can handle them, it surprises many text editors.

Aside from lack of comments, the other major thing that can sometime make json a bad config choice is lack of multi-line strings.


JSON is for data. Not documents. Not config files. I don't agree with any "add this to JSON" comments. It's fine just as it is....for data.


Config files are data about program configuration. So “for data” and “not config files” don't go well together.


I'm still using INI files like it's 1999.


Everything being a string is a bit of a joke now.
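
To illustrate with Python's `configparser` (section and key names here are hypothetical): every value comes back as a string unless you ask for a conversion explicitly.

```python
import configparser

# Hypothetical settings, purely for illustration.
INI_TEXT = """
[server]
port = 8080
debug = yes
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

raw_port = parser["server"]["port"]           # always a str
port = parser.getint("server", "port")        # explicit conversion to int
debug = parser.getboolean("server", "debug")  # accepts yes/no, on/off, true/false, 1/0

print(repr(raw_port), port, debug)
```

So the typing lives in the reading code rather than in the file, which is either a feature or a joke depending on taste.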


I use JSON in the end. I prefer to write TOML, then parse that into JSON. This seems to strike a nice balance between human/machine write/read. It's simple enough to reason about TOML, even if it gets verbose. If I have to write YAML after 2 layers I usually write it as JSON and include the JSON in the 2nd level of key.


It's telling that the responses to this question are broad and varied. Still not a well solved problem, it seems.


It's also telling that even with every other possible answer being given by someone, there's still no one who wants XML.


nah, the XML people are just keeping quiet because no one likes hearing 'I told you so'


My cursory survey of config / serialisation formats concluded that nothing is close to being good.

You get overly verbose, hard-to-understand XML; no-comments JSON; the horrors of yaml; or some okay format that doesn't have parsers for the languages (plural) you are using on your project.


For in-house, python-only project, my way to go is to create a "config.py". Then I declare a bunch of module variables that can be overridden by environment variables as a bonus.


J S O N all the way.


YAML for me, then TOML


If you still aren't convinced YAML is terrible, try copying and pasting YAML fragments with a regular text editor.

You might end up with valid YAML, but you won't know until the YAML consumer barfs.

BTW, all of a sudden XML with DTDs is looking sane again :)


Use the right tool for the job. I use yaml extensively but never in a situation where someone would want to edit it with a regular text processor.


do you deliver a YAML editor with your software? Because people will use notepad or nano to edit that stuff.


Sort of, I deliver a GUI that exports into YAML for pretty much only reading, portability, and version control. People are expected to do the editing in the GUI, only using YAML for editing when doing complex regex operations that my GUI doesn't support.


If people aren't editing it by hand, why does the format matter? Why not just use JSON? Tools for too-complex-for-the-GUI manipulations are at least as good for JSON as they are for YAML, and the editing is less error-prone.


Internally it is JSON. It exports as YAML for readability when sharing on discord.


Regardless of the reasoning laid out in the OP, it's difficult to argue in YAML's favor comparing it with JSON. I'm not an ardent fan of JSON either -- both YAML and JSON have issues wrt inconsistencies:

- what draft of JSON Schema are you using 4? 7? Neither?

- what version of Swagger or OpenAPI are you using?

- etc.

Sure, it's great to see ongoing development of schemas, but with each new development we have yet another dialect to consider/support.

In my view, perhaps an even greater problem with structured data formats in general is the void that separates them from programming languages esp. static languages such as Java where static type information is otherwise leveraged. The industry standard solution, code generation, is awful in almost every respect. The Manifold framework looks promising in this regard (http://manifold.systems/).


JSON Schema is not affiliated with JSON and should not be confused with it. JSON is a data format, like YAML, and there is only one version of it: the spec at http://json.org/.



XML is as pleasant to look at or touch as a nettle rash, but it seems it can join ALGOL 60 among the ranks of technologies which were a great improvement on their successors.


Umm, no. You can find cases where JSON sucks, but you have to look for them. You can find cases where XML doesn't suck, but you have to look for them.


Other than looking ugly and being a pain to type does xml actually suck?


Is there an agreement on whether it’s

  <author name="pete" />
or

  <author>
    <name>pete</name>
  </author>
yet?


Yes there is. Markup languages are for representing rich text. Anything that gets rendered to the user is content and anything that's metadata (data about how to render) but not rendered as such goes into attributes. If there is no concept of "rendering to the user" in your app/data, then markup isn't the right choice for representing your data. You're blaming markup languages for not being designed to represent arbitrary data structures when you should be blaming yourself for misusing and misunderstanding markup. Though by skimming through this thread, XML appears to still do much better than YAML. And it's true that in the 00's, XML was improperly used and advocated as universal data format.


That is a reasonable approach - thanks.


Attributes are XML’s foot-gun.


Particularly annoying is that there's no way to do lists with attributes. Looks good:

  <user name="alice" group="wheel" />
Oh no:

  <user name="alice" groups="wheel admin sudoers" />
Or is it one of:

  <user name="alice" groups="wheel,admin,sudoers" />
  <user name="alice" groups="wheel:admin:sudoers" />
  <user name="alice" groups="wheel;admin;sudoers" />
Or give up and use elements:

  <user name="alice">
    <group>wheel</group>
    <group>admin</group>
    <group>sudoers</group>
  </user>
Hold on, should there be a container?

  <user name="alice">
    <groups>
      <group>wheel</group>
      <group>admin</group>
      <group>sudoers</group>
    </groups>
  </user>
XML really needs either richer attributes, or no attributes.
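
The difference shows up on the consuming side too. A sketch with Python's `xml.etree.ElementTree`, reusing two of the variants above:

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring('<user name="alice" groups="wheel admin sudoers" />')
as_elements = ET.fromstring(
    '<user name="alice">'
    "<group>wheel</group><group>admin</group><group>sudoers</group>"
    "</user>"
)

# The attribute is one opaque string; the separator is a private convention
# that every consumer has to know about and split on.
groups_from_attribute = as_attribute.get("groups").split()

# Child elements are already a list as far as the parser is concerned.
groups_from_elements = [g.text for g in as_elements.findall("group")]

print(groups_from_attribute, groups_from_elements)
```

Both give the same three names, but only the element form encodes the list in XML itself rather than in an agreement about whitespace, commas, or colons.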


Replace "Element" with "Class" and "Attribute" with "Property" in your examples.

It's your job to decide how the data should be structured, in any language.

I would do the following :

  <user name="alice">
    <groups>
      <group uid="wheel"/>
      <group uid="admin"/>
      <group uid="sudoers"/>
    </groups>
  </user>


Attributes are XML’s foot-gun.

I disagree. Back in the day we used attributes for everything that was key value and inner tags for anything with structure. We also formatted for clarity:

    <lunch
      env="outside"
      food="sandwiches"
      drink="cola"
    />
Compared with what we used to do, I look at attribute-less maven pom.xml with horror.


The decision of when and how to use attributes is the thing that is most often done wrong or clumsily in an XML file. That is my point.


What if it's to be used by french speaking software/people ?

    <lunch
      env="outside"
      envfr="dehors"
      food="sandwiches"
      foodfr="sandwichs"
      drink="cola"
    />
Or

    <lunch
      env="outside"
      food="sandwiches"
      drink="cola">
          <label type="env" language="fr">Dehors</label>
          <label type="env" language="de">Außenseite</label>
          <label type="env" language="en">Outside</label>
    </lunch>
Quite curious about it.


I guess this way:

  <lunch
    env="outside"
    food="sandwiches"
    drink="cola">
    <label xml:lang="fr">Dehors</label>
    <label xml:lang="de">Draußen</label>
    <label xml:lang="en">Outside</label>
  </lunch>
says [0] the W3C.

[0]https://www.w3.org/TR/REC-xml/#sec-lang-tag
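
For what it's worth, reading those labels back is not too bad. A sketch with Python's `xml.etree.ElementTree`, where the only non-obvious part is that the predefined `xml:` prefix shows up as a namespace-qualified attribute name:

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predefined and bound to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

doc = ET.fromstring(
    '<lunch env="outside" food="sandwiches" drink="cola">'
    '<label xml:lang="fr">Dehors</label>'
    '<label xml:lang="de">Draußen</label>'
    '<label xml:lang="en">Outside</label>'
    "</lunch>"
)

# Map each language code to its label text.
labels = {label.get(XML_LANG): label.text for label in doc.findall("label")}

print(labels)
```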


I see, thanks. (I also see you corrected my broken German ^^).


Putting l10n/i18n data inline is generally not a pleasant experience, and this problem persists regardless of what serialization format you are using. Either have separate data files per-locale that are merged with the defaults or store the localized strings in a central location (ala your usual gettext setup).


namespaces are absolutely a bear and always unpleasant to work with. The libraries to use xml are equally frustrating when you’re doing complicated things, unless you want to make a class for every single type of detail that this xml document wants - then it’s fine, but some of us don’t want to do that, or inherited a project that didn’t do that.

XPath is a struggle with namespaces, as well. It’s ... trying.


How is that relevant for config files though? If you implement an app and want to use a config file, you don't have to use namespaces. I agree that namespaces are no fun, so I don't use them for my config files.


Namespaces are great when used wisely: they allow for embedding unrelated XML fragments in your document and your application will not see them. Or versioning your XML in namespaces. XPath and namespaces are trivial but it depends on the library you use on how to setup the namespace resolving. My pet peeve about namespaces: people that use HTTP URLs for their declaration and expecting them to be resolved for some reason. The other one being people that see a namespace prefix and think that that is the actual namespace.


Depends on the XML. When you start mixing in namespaces (like trying to parse Maven pom.xml files in Python), it quickly becomes a mess.
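
A sketch of the trap, using Python's `xml.etree.ElementTree` and a cut-down, made-up pom.xml. The `m` prefix is an arbitrary choice for the example:

```python
import xml.etree.ElementTree as ET

# A cut-down, made-up pom.xml; real ones declare the same default namespace.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <groupId>com.example</groupId>
  <artifactId>demo</artifactId>
</project>"""

root = ET.fromstring(POM)

# The obvious query finds nothing, because every tag is namespace-qualified.
naive = root.find("artifactId")

# Mapping a prefix of our choosing to the namespace URI makes it work.
ns = {"m": "http://maven.apache.org/POM/4.0.0"}
qualified = root.find("m:artifactId", ns)

print(naive, qualified.text)
```

Nothing fails loudly in the first query; it just returns `None`, which is a big part of why this feels messy.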


Namespaces are horrible in Python because the Python XML libraries are deficient. They're generally fine in e.g. Java. (Unless you're trying to represent QName typed data, but that's very niche.)


I used Python as an example, but handling namespaces in most languages besides Java (and even arguably in Java depending on your point of view) is rather painful.


Any time the semantics represented by the XML is non-trivial, the schema design is likely to be obtuse and/or broken, even when done by smart people. XML is just hard to get right, and hard to evolve gracefully.

When you add verbosity on top of that, working with XML over the long haul is an utter pain.


I'd say the same about JSON!


Numbers over 2^52. Unicode. Terminator vs. Separator. Weak type system no DTD. No Xpath equivalent. No namespaces.
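
On the big-number point: the JSON text can carry the digits, but any consumer that stores numbers as IEEE doubles only gets every integer exactly up to 2^53. A sketch in Python, where `parse_int=float` stands in for such a double-only parser:

```python
import json

big = 2**53 + 1  # 9007199254740993
text = json.dumps({"id": big})

# Python keeps integers exact.
exact = json.loads(text)["id"]

# Forcing integers through float mimics a parser that only has doubles.
as_double = json.loads(text, parse_int=float)["id"]

print(exact, int(as_double))
```

The two printed values differ by one, which is all it takes to corrupt an ID.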


Date representations. No comments. Behavior when multiple instances of the same key occur.
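
The duplicate-key case is easy to demonstrate. Python's `json` module quietly keeps the last value, and you have to opt in to anything stricter; `reject_duplicates` below is a hypothetical helper written for this example:

```python
import json

TEXT = '{"retries": 3, "retries": 5}'

# Python's json module silently keeps the last value.
last_wins = json.loads(TEXT)


def reject_duplicates(pairs):
    """object_pairs_hook that raises on a repeated key instead of overwriting."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


try:
    json.loads(TEXT, object_pairs_hook=reject_duplicates)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = str(exc)

print(last_wins, duplicate_error)
```

Other parsers may make a different choice for the same input, which is the real problem.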


XML's "predecessor" is full-blown SGML, and since XML is just a proper subset of SGML, XML is no improvement. The one thing it brought was DTD-less, canonical angle-bracket markup where documents can be parsed without grammar rules, e.g. DTDs (this was also incorporated into SGML via the WebSGML adaptations, so that XML could remain a proper subset of SGML). But this also means XML is useless for parsing HTML, the most important markup language (XML was started to supersede HTML as XHTML, but that failed).


I think the author confuses YAML's problems with his favorite languages' problems. I bet those problems (at least most of them) are nonexistent in Java, for example, only because Java programmers are usually more responsible. Same for Haskell or Rust, I think.

But in other languages with notoriously irresponsible coders (JS, PHP) I'd bet we see even more of these problems.

(I coded in all of them)


Exactly the aftertaste I had after the article. Why not just shift the focus to the point that the author doesn't like Ruby and Ruby frameworks anymore?

During my working life using Python I've had a few meh moments with YAML, and that's all. I never lost the real joy of using it.


The first argument, about YAML security, isn't valid; YAML is hardly the only format whose parsers have admitted deserialization vulnerabilities (they're endemic in Java; Rails had this problem with XML, and even before ROP-style deserialization was a thing, XML was getting applications owned up through external entity definitions).

Format aside, no matter which you choose, you have to pick library interfaces that don't deserialize to arbitrary, constructed objects.


It is a valid criticism when comparing to Json or TOML


I've been working on software that's heavily based on XML, and on a number of occasions I've been glad XML is strict and verbose.

You can quickly tell if an XML document is malformed (good parsers will typically point you to the unclosed tag).

Yaml on the other hand would probably load anyway, with the application receiving garbage data, potentially misdirecting the application behavior...
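
A quick sketch of the XML side with Python's `xml.etree.ElementTree`; the document is made up, with a `<port>` element that is never closed:

```python
import xml.etree.ElementTree as ET

# The <port> element is never closed.
BROKEN = "<config><port>8080</config>"

try:
    ET.fromstring(BROKEN)
    error_position = None
except ET.ParseError as exc:
    # position is a (line, column) tuple pointing at where parsing failed.
    error_position = exc.position

print(error_position)
```

The application never sees a half-built tree; it gets an exception and a location instead.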


Another one: parsing a partial YAML file doesn’t produce an error, even though the complete file was never loaded. We’ve had a production outage because large YAML files got cut off and not all settings were loaded into our server. JSON or XML typically will not parse.
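
The JSON half of that claim, sketched with Python's standard `json` module and a made-up settings object. A block-style YAML file cut at a line boundary, by contrast, is usually still a valid (shorter) document:

```python
import json

full = json.dumps({"listen": "0.0.0.0", "workers": 8, "timeout": 30}, indent=2)
truncated = full[: len(full) // 2]  # simulate a file cut off mid-write

try:
    json.loads(truncated)
    truncated_parses = True
except json.JSONDecodeError:
    truncated_parses = False

print(truncated_parses)
```

The missing closing brace is what saves you here: any cut before the end leaves the top-level object unterminated.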


YAML files should have an explicit start point and end point, respectively `---` and `...`. yamllint will enforce this, although I have never used these for integrity checks.

https://yamllint.readthedocs.io/en/stable/rules.html#module-...
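
If you did want to use the markers as a crude integrity check, it could look roughly like this. `has_document_markers` is a hypothetical helper, and it is deliberately strict: it assumes the markers sit alone on the first and last non-blank lines, which is narrower than what YAML itself allows:

```python
def has_document_markers(text):
    """Return True if the text starts with a '---' line and ends with a '...' line."""
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    return bool(lines) and lines[0] == "---" and lines[-1] == "..."


complete = "---\nname: web\nreplicas: 3\n...\n"
cut_off = "---\nname: web\nrepl"

print(has_document_markers(complete), has_document_markers(cut_off))
```

It only catches truncation, not corruption in the middle, so a checksum would still be the sturdier option.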


Is this not an issue with a parser rather than with YAML?


Unreliable parsers are an issue of yaml.


Why?


If the JSON and YAML folks can’t get along, I swear I’ll turn this car around and make you all use XML.


I'd prefer XML because of the stability of the tools available for it.

Recently I was writing a custom static site generator for my website. I started with python and yaml using the pyyaml lib. After two months (don't laugh, I wasn't writing this generator all this time; I had a break) I tested if everything I wrote previously was working. Pyyaml came at me screaming that they deprecated something and I shouldn't use it, otherwise the feds will get me.

Let's go several months earlier still. I was learning Python, using Debian Stretch which has Python 3.5 installed. The book I was learning from used Python 3.7. When I got to a point it became clear that 3.5 lacks a few things which are needed to continue learning Python according to the book. So I compiled Python 3.7, set up virtualenv with it and... The code I had previously written stopped working. With a version bump from 3.5 -> 3.7? That's a minor version change. And now 3.5 was expecting at one point to have a string path passed as an argument, and 3.7 was expecting a pathlib path. That was trivial to fix in case of a small example, but I would dread using such a thing for anything big and then having to debug what exactly broke between different (minor!) versions of Python or its libraries.

These new hip tools seem to have a backwards-compatibility issue.

I eventually settled on using XSLT and a couple really short shell scripts (which all fit on my screen at the same time) and I don't expect them to break in the next two decades.

However, XML is still a pain and I would prefer just using S-expressions and Lisp[1]. It's just that for now my only experience with them is writing things for Emacs and I would like to learn Scheme/CommonLisp to do anything outside of Emacs with Lisp.

[1] https://sites.google.com/site/steveyegge2/the-emacs-problem


> With a version bump from 3.5 -> 3.7? That's a minor version change.

Python doesn't do semantic versioning. (Just like most language runtimes) You can find deprecations and backwards incompatible changes in the release notes.

Although those breaking changes are not frequent and if https://docs.python.org/3.6/whatsnew/3.6.html#whatsnew36-pep... resulted in broken code, you likely ran into something that's reportable as a bug.
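
If the breakage was about str paths versus pathlib paths, one defensive pattern on Python 3.6+ is to normalise with `os.fspath` at the boundary. This is only a sketch under that assumption; `config_path` and the filename are made up:

```python
import os
import pathlib


def config_path(base):
    """Join 'settings.ini' (a made-up filename) onto base, given as str or Path."""
    # os.fspath returns a str unchanged and converts os.PathLike objects.
    return os.path.join(os.fspath(base), "settings.ini")


from_str = config_path("conf")
from_path = config_path(pathlib.Path("conf"))

print(from_str == from_path)
```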


What are the de facto tools for formatting and querying XML files through a CLI? They should be ridiculously simple to install and use.


If you have libxml2-utils installed (package name may vary depending on your distro, but it almost certainly has a package) you can probably do something like "xmllint --format". You can also use "xmllint --shell file.xml" to get an interactive shell, or execute an xpath query and return, eg. "xmllint --xpath //foo file.xml". Use "-" for the file to read from stdin, as you might expect.


xmlstarlet is a useful CLI XML manipulation tool


This sounds more like a Python and pyyaml problem than a YAML or JSON problem. Someone could come along and totally refactor XSLT even if XML stayed the same, and then you would be in the same boat.


Parent addressed your comment in his first sentence.

> I'd prefer XML because of the stability of the tools available for it.


To be fair, the article begins by linking to a previous article by the same author about the problems with JSON. He is an equal opportunities opponent :)


inb4 YAML adds "Yeah, nah." and "Nah, yeah." as boolean values. Interpretation is locale dependent.


YAML 1.2 fixed this making booleans just true/false and it's a real shame that a lot of things still use 1.1


I’m sure Christopher True and Robert False really appreciate that “fix”.


So true. Your comment made my day.


I don't like NO as a boolean literal (mostly because it's also an ISO country code), but there's only one right way to parse it in a YAML 1.1 document. It was a mistake to apply the 1.2 rules to 1.1 documents and issue a "warning" about incorrect output, rather than require rejecting the document if the 1.1 rules aren't implemented.
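
To make the difference concrete, here is a hand-written sketch of the two boolean rules. It is not a YAML parser; the patterns are my transcription of the YAML 1.1 bool type and the YAML 1.2 core schema, and `is_bool` is a hypothetical helper:

```python
import re

# Boolean patterns as given in the YAML 1.1 bool type and the YAML 1.2 core schema.
YAML_1_1_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)
YAML_1_2_BOOL = re.compile(r"true|True|TRUE|false|False|FALSE")


def is_bool(scalar, version):
    """Report whether an unquoted scalar would resolve to a boolean."""
    pattern = YAML_1_1_BOOL if version == "1.1" else YAML_1_2_BOOL
    return pattern.fullmatch(scalar) is not None


country_codes = ["NO", "SE", "DK"]
as_1_1 = {code: is_bool(code, "1.1") for code in country_codes}
as_1_2 = {code: is_bool(code, "1.2") for code in country_codes}

print(as_1_1, as_1_2)
```

Under the 1.1 rules only Norway turns into a boolean, which is the kind of bug that survives testing with every other country.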


I'd also like "Sweet" to be true and "Stink" to be false.


That's why StrictYAML always interprets as string unless there's a schema. Cuts out the surprise type conversions.


Aren't we just reinventing the wheel, though? Got your structured data format, now you need parsers (tons available for XML, incl SAX, DOM parsers, SimpleXML, Nokogiri...) a schema and validation tools (XSD), a templating mechanism (XSLT), a query language (XPath), ...

JSON was a reaction to the verbosity of XML, but a better reaction would have been to work harder on our text editors so that working with XML would be just as easy as working with JSON in terms of the numbers of keystrokes needed. Better parser interfaces that help you treat the dataformat more like it's part of the language would also help (i.e. SAX and DOMDocument suck to work with, but SimpleXML is almost idiomatic).


Verbosity certainly is an issue with XML, but far from the only one. IMO the main problem of XML is that it was designed as a markup language, but then misused as a data structure serialization language. When used as a markup language, the distinction between attributes and children is meaningful. When serializing data structures, the dichotomy breaks down. For most subfields of a larger data structure, it's not obvious whether to serialize that subfield as an attribute or as a child. Contrast with JSON, which only consists of obvious data structures.


> not obvious whether to serialize that subfield as an attribute or as a child.

Attributes are just strings, generally for metadata. I'd probably serialize an object from another language more verbosely. This is where an important distinction needs to be made: the XML format you use for config files or for data exchange from your app to others should not necessarily be just a serialized object from the most convenient form inside your application. If you care about the operators of the app, you'll allow them a more concise format for that kind of thing, and use XSD or your own internal mechanisms to turn that into an object you want to actually work with.

It's the sort of problem people have once, abstract away, and move on from.


It's also the sort of problem that just doesn't exist at all if you use a real data serialization format from the get-go.

I'm so frustrated with our collective attitude of building abstractions upon abstractions upon abstractions, without ever stepping back and realizing that we're using the wrong tool to begin with.


This difficulty is of your own making. An attribute is 'metadata' about the element. A child element is precisely that: another element.


The fact that you put "metadata" in quotes illustrates the problem.


> JSON was a reaction to the verbosity of XML, but a better reaction would have been to work harder on our text editors so that working with XML would be just as easy as working with JSON in terms of the numbers of keystrokes needed.

Isn't that only solving half the problem? XML is also pretty difficult to read


It's not even the important half.

If I have data that I need to send somewhere, and I can create the format for it, that's really easy to do.

The problem, every time, is the reverse: receiving some piece of data and trying to figure out what parts of it I care about. Both XML and JSON allow for schema definitions, but in both cases it fundamentally requires me, as a consumer, "grokking" what is being sent. And the verbosity of XML simply makes that harder. Working with either is not _that_ hard (though I have run into XML in the wild that is so large a payload, yet so poorly designed, that there is no good way to process it; I can't stream it via SAX without writing my own state handling mechanism, and I can't just deserialize it into an object without massive memory issues at scale); the difficulty really is in containing it in my mind, and JSON simply facilitates that better due to its simplicity and explicitness (yes, explicitness; in XML it's not clear if a child element should only exist once, or multiple times. In JSON it's obvious)

Per the OP; I cringe every time I see YAML. Pain to write, pain to read; have to have tooling every time or I get whitespace issues.


What schema definition is there for JSON?


JSON Schema (no points for imaginative names)

https://json-schema.org/


> XML is also pretty difficult to read

I’d say this is schema-dependent. If you’re talking about plist files, sure; those are ugly and unintuitive. But on the whole I find XML far easier to read than JSON. With closing tags, what you lose in terseness is made up for with scannability: it’s easier to understand the document hierarchy at a glance, and find your place again after editing. Whereas with JSON I often have to match curly braces in my head, or add comments like `// /foo` which isn’t even possible outside of JS proper or a lax-parser environment.


Look. I am just a web guy.

But why is XML so freaking great? We can’t even tell if whitespace is significant or not. If a schema says it’s insignificant then that’s that!

https://www.oracle.com/technetwork/articles/wang-whitespace-... That alone is TERRIBLE! (Same problem with YML.) Why should I bother with that? JSON can encode strings, hashes, arrays etc. in a way that’s instantly interoperable with JS and is far far more unambiguous.

What exactly is so great about XML that you can’t do with JSON in a better way? Schemas can be stored in JSON. XPATH can be specified for JSON. Seriously I never got the appeal of XML except that it was first.


Some of XML's biggest achievements lie in written documentation formats (DocBook, DITA) where fine-grained markup control is needed and the presentation of the content is secondary to semantic features like footnotes, indexing, etc. These are formats that professional technical writers turn to when Markdown, Word docs or PDF won't quite do the trick.

For a lot of data, XML isn't the right form and buries too much data in hierarchy and tag soups - but it's flexible enough to make it into whatever you want, and since XML was buzzworded and XML libs were some of the easiest things to reach for in the 90's, it got pushed into every role imaginable.


Two specific things about JSON: elements with the same name overwrite each other; and even though parsers are generally good about it, items are not required to be returned in the order they appear in the file.

Oh and there's no comments.
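
For what it's worth, the duplicate-name behaviour is easy to see in Python's standard `json` module, which keeps the last value. Other parsers may behave differently. The `reject_duplicates` hook below is a hypothetical helper for illustration, not part of any library.

```python
import json

doc = '{"port": 80, "port": 8080}'

# Python's json module silently keeps the last value for a repeated key.
last_wins = json.loads(doc)
print(last_wins)

def reject_duplicates(pairs):
    """object_pairs_hook that raises instead of silently overwriting."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen

try:
    json.loads(doc, object_pairs_hook=reject_duplicates)
except ValueError as err:
    print(err)
```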


One thing that bugs me about JSON is that it can't easily [0] represent general graph-structured data because its notion of identity is too limited. XML can represent general graphs trivially.

[0] https://realprogrammer.wordpress.com/2012/08/17/json-graph-s...
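
A sketch of the usual workaround, in Python: give nodes explicit ids and rebuild identity by hand after parsing. The `nodes`/`edges` layout is just one assumed encoding, not a standard.

```python
import json

# Hypothetical encoding: nodes keyed by id, edges as pairs of ids.
doc = json.loads("""
{
  "nodes": {"a": {"label": "A"}, "b": {"label": "B"}},
  "edges": [["a", "b"], ["b", "a"]]
}
""")

# JSON has no references, so the links have to be resolved after parsing.
neighbours = {node_id: [] for node_id in doc["nodes"]}
for src, dst in doc["edges"]:
    neighbours[src].append(doc["nodes"][dst])

# The same Python object is now reachable along the cycle a -> b -> a.
print(neighbours["b"][0] is doc["nodes"]["a"])
print(neighbours["a"][0]["label"])
```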


XML wasn't meant as replacement for JSON, but for HTML without vocabulary-specific parsing rules (eg. SGML DTDs).


As far as I can make it, JSON's popularity grew from it being JavaScript which is, as we know, the knees bees. There was no big thinking behind the whole thing and the role it plays today was certainly not the intended role (otherwise I cannot explain the non-standard data-formatting that is handled differently everywhere). JSON is more akin to Java RMI than to XML imho.


I think that XML is often used for stuff that it shouldn't be used for. It could be almost OK (there are still a few problems though) for stuff containing text with other stuff inside that may in turn contain text, and so on. For other stuff, JSON or RDF or TSV or other formats can be good.


> JSON was a reaction to the verbosity of XML,

JSON was a reaction to the simplicity of having a data format that JS, ubiquitous on the web, could load via eval (which, once JSON was established, was largely abandoned because it is ludicrously unsafe, but the momentum was already there.)


I have to wonder if these stories are written by people who simply hate to type, or aren't good at it. Beyond that I can only chalk it up to academic curiosity.


You say that like it'd be a bad thing. I like XML.


You say that like you’ve never really used XML...

(Mostly /s. Come at me:))


I've used it quite a bit. I even like the namespacing bits. I find that XML composes elegantly in a way that the JSON and friends don't.

My one request would be to bring back the SGML-like closing tag abbreviation:

That is, instead of

    <foo><bar>qux</bar></foo>
we should be able to write

    <foo><bar>qux</></>
I think this one change would make XML more "palatable" for the JSON/YAML/TOML crowd.


SGML also has tag omission to make this even less verbose if desired. Or short references, which basically let you define arbitrary tokens SGML recognizes and replaces into something else, depending on the element context. These techniques in combination can be used to parse s-expr, CSV, markdown, and even some JSON, for example. Though personally I agree with others here that SGML is first and foremost a markup rather than config language.


The problem with tag omissions is that it requires that the parser have tag-specific information to properly build the AST. You can still do generic parsing with balanced </>.


Or even better:

  <foo/<bar/qux>>
(Although standard SGML inexplicably specifies a null-end-tag character of '/' rather than '>', so this won't work in stock parsers.)


This looks absolutely hideous. I don't see why this would make me want to switch from YAML or JSON.


Interestingly I find SGML-esque to be quite unpalatable. I get the feeling doing code reviews would be nightmarish.


Why? It's no worse than S-expressions.


It’s significantly worse than single char s expression for my eye, anyway.


The verbosity makes it harder to parse. It is subjective, but I find ")))" is a lot easier to instantly parse as 3 than "</></></>"


As much as I am an old-school Unix zealot, I think it is time to move towards a well standardised binary config format with non-trivial types (i.e a schema). There still has to be a standard text format, but only for the source from which the live configs have to be built. Done right, this has several advantages:

1. Built-time validation (or at least type checking).

2. Built configs can be easy to parse but (potentially) rich enough to avoid confusing templating.

3. Separation of concerns between storing/maintaining configs and applying them. E.g. scoop text configs off a source repo, but send out binary configs over the network.

All this is a fantasy in my head. Right now the closest mainstream thing is protobufs. But they make trade-offs for non-config use cases, and thus don't really cut it in the "... rich enough to avoid confusing templating" department.


SQLite databases might fit the bill. Fairly lightweight. Can talk to them in basically any language. Instead of templates you copy the database file and issue some UPDATEs.
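
A minimal sketch of that workflow with Python's built-in `sqlite3`. The `settings` table and its keys are assumptions for the example, and the copy is done in memory with `backup()` rather than by copying a file.

```python
import sqlite3

# Hypothetical schema: a single key/value table named "settings".
base = sqlite3.connect(":memory:")
base.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT NOT NULL)")
base.executemany(
    "INSERT INTO settings VALUES (?, ?)",
    [("log_level", "info"), ("port", "8080")],
)
base.commit()

# "Copy the database file": here the copy happens in memory via backup().
staging = sqlite3.connect(":memory:")
base.backup(staging)

# "Issue some UPDATEs" instead of rendering a template.
staging.execute("UPDATE settings SET value = ? WHERE key = ?", ("debug", "log_level"))
staging.commit()

def load(conn):
    """Read the whole settings table into a dict."""
    return dict(conn.execute("SELECT key, value FROM settings"))

print(load(base))
print(load(staging))
```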


But that, and the parent's idea of binary formats in general, throws away the absolute golden property of text format configuration files: you can put those in git, and see with an accuracy of a single character what has changed. My impression was always that this was a huge reason for plain text files in the first place.

Someone mentioned protobufs, maybe with something like those one could have both?


> you can put those in git, and see with an accuracy of a single character what has changed.

You can use sqldiff[1]. Try adding it to your .gitattributes[2]. If you need TRIGGERs and VIEWs, consider dumping your database[3] instead.

[1]: https://www.sqlite.org/sqldiff.html

[2]: https://git-scm.com/docs/gitattributes

[3]: https://gist.github.com/peteristhegreat/a028bc3b588baaea09ff...
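
The dump route can be sketched with the standard library alone: `iterdump()` gives SQL text, and that text diffs line by line like any other file. The table layout below is an assumption for the example.

```python
import difflib
import sqlite3

def dump(settings):
    """Build a throwaway database and return its SQL dump as a list of lines."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT)")
    conn.executemany("INSERT INTO settings VALUES (?, ?)", settings)
    conn.commit()
    return list(conn.iterdump())

before = dump([("log_level", "info"), ("port", "8080")])
after = dump([("log_level", "debug"), ("port", "8080")])

# A plain-text diff of the two dumps shows exactly which row changed.
diff = list(difflib.unified_diff(before, after, "before.sql", "after.sql", lineterm=""))
for line in diff:
    print(line)
```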


They don't type check though; most constraints aren't enforced, and the underlying reality (slinging around mostly strings) leaks out often.
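
The typing half of this is easy to reproduce with Python's `sqlite3`. This is a sketch with made-up table names; the `CHECK` constraint at the end is one possible way to get rejection back, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE settings (key TEXT, port INTEGER)")

# The column is declared INTEGER, but a non-numeric string is stored as text.
conn.execute("INSERT INTO settings VALUES (?, ?)", ("web", "not-a-number"))
stored = conn.execute("SELECT port, typeof(port) FROM settings").fetchone()
print(stored)

# An explicit CHECK constraint makes the same insert fail.
conn.execute(
    "CREATE TABLE strict_settings ("
    " key TEXT,"
    " port INTEGER CHECK (typeof(port) = 'integer'))"
)
rejected = False
try:
    conn.execute("INSERT INTO strict_settings VALUES (?, ?)", ("web", "not-a-number"))
except sqlite3.IntegrityError as err:
    rejected = True
    print("rejected:", err)
```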


I agree this is a pretty good choice.


Property Lists already do tick some of those boxes.


These are all valid points. But I still find YAML to be the best format for storing my strings for localizations. I find it much easier on my eyes than JSON. I’m open to other suggestions though.


Per the article consider Toml


There's two types of formats: 1) those people complain about, and 2) those no one uses.


TOML seems widely used but I've never seen complaints about it. I'm sure there are some, but the only time I see it mentioned is when someone is recommending someone else switch to TOML.

Out of curiosity, is there anyone here who doesn't like TOML for configuration?


I've encountered toml a couple of times but I wouldn't call it widespread. It's alright for small configuration files. However, if you keep things simple, json and yaml are also not so bad and even properties files or good old ini files will work. Doing e.g. cloudformation stuff in toml is not a thing though and it supports both yaml and json. If your data is simple, use a simple format. I've always liked properties files with simple name value pairs separated by =. Still very common in the Java world though yaml has replaced a lot of that.

BTW. I've handled all of those formats using jackson on Java & Kotlin. It has a flexible parser framework originally intended for json. But it has lots of plugins for different tree like configuration files. Look for jackson-dataformat-yaml and jackson-dataformat-toml on github. There are loads more formats that you can support with jackson. Nice if you need to translate from one to the other or need to support multiple formats.

IMHO Json with some tweaks would be really nice. E.g. just supporting comments and multi line strings would make it a lot nicer. A lot of json becomes unreadable due to the need to escape strings. I've come across Hocon a couple of times (jackson-dataformat-hocon) and it's a strict superset of json, which means that if you accept hocon as input, you implicitly also accept json.
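
To illustrate the escaping complaint, here is what a small multi-line value turns into when encoded with Python's `json` module. The shell snippet is invented for the example.

```python
import json

# Hypothetical multi-line value containing quotes, newlines and a tab.
script = 'echo "hello"\nif [ -f "$HOME/.cfg" ]; then\n\tcat "$HOME/.cfg"\nfi'

encoded = json.dumps({"script": script})

# Everything ends up on one line, with every quote and newline escaped.
print(encoded)

# The round trip is lossless; only the readability suffers.
print(json.loads(encoded)["script"] == script)
```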


Despite spending time writing and reading YAML on a daily basis for years, it still trips me up once things get non-trivial. It's definitely my least-favorite non-proprietary config file format. XML might be overly verbose, but there are no surprises (unless you go bananas with schemas).


Yeah. I never got the hate for XML. I feel like it was always mismatched expectations: some people wanted something like the Markdown of configuration files, and other people wanted something extensible enough to encode any possible data structures.


It's verbose, illegible, redundant, and shares most of the problems YAML has.

Not to talk about the attribute/content duality and all the ill-defined parsers it leads to.


Not to mention; while pretty wordy, XSLT was incredibly powerful.


XML is great for documents too:

  <t>Some <b>text</b> is here </t>
And I found some things weird, like entities and (external?) DTD references.

But it's great for building your own formats. For data interchange it has its problems, like no types. Everything is kind of a string. With encoding problems and XML in XML everything ends up in CDATA...

I think that's why XML makes for heavy parsers. I remember it was better in Java, because of the strong libraries.


XML solved this problem a long time ago, but everyone hates it now because it's too enterprise and Javalike.


I think the author’s conclusion is in line with my own thought: If JSON is the problem, YAML isn’t the solution.

I recall the first time I saw YAML and all I could think to myself was that I have to learn yet another syntax. I find it far less readable than JSON or XML, and it made me pine for the latter.


YAML is bad. It's like markdown: there are too many parsers that behave differently. Unlike markdown, which is just for reading, it is used in configurations for critical systems. JSON is much better, it IS readable and writable; people use package.json all the time without problems.

Templating YAML is even worse. Templating is an ad-hoc abstraction, and very easy to run into issues. A minimal JavaScript runtime with JSON would be much better, JSON is JavaScript Object Notation after all.


Markdown's problem is no single standard. This is not the case with YAML, so no, it is not like markdown. And you can technically write assembler also.


You are right, but if we shrink the scope to CommonMark the problems still exist.

And what is assembler, may I ask? Is it for YAML or Markdown?


YAML is really so much more than JSON.

  * YAML can have several 'documents' in the same file, separated by ---
  * there are anchors and references
  * easy to read multi line texts
  * it's also a superset of JSON
I can see, how choosing YAML when you just wanted readable JSON might give you more headaches than expected.

And like someone else said, putting another template engine (or two) on top of YAML is when the real problems start.


It's really hard to maintain JSON without being allowed comments. JSON is fine for transmitting data across the web, but if you need complex configuration data in version control that many people work on, you need comments so new teammates can be easily onboarded onto a project. There's definitely a need for something that is programming-language-like, but entirely about structuring and formatting data for configuration. YAML is the closest thing to that. It's not perfect, but it works better for that purpose than JSON or XML.


I have a completely unrelated question with the topic but derived from the font-face used in the article.

https://arp242.net/yaml-config.html#can-be-hard-to-edit-espe...

In the heading, how was the sp ligature in `especially` written? Is there a name for this? How do you connect the beginning of an `s` to the beginning of a `p`?


These are discretionary ligatures [0].

They're turned on in html with the following two css lines (though on firefox, either one is enough to have them happen):

    font-variant-ligatures: common-ligatures discretionary-ligatures;
    font-feature-settings: 'liga' on, 'dlig' on;
[0]: https://www.fonts.com/content/learning/fontology/level-3/sig...


Note it won't work for every font, or some fonts may have a different flag to enable it (I think there's an historical-ligatures, as well).


1. General purpose serialization format is released.

2. Format is declared to have x and y problem, new format is invented that is "simpler and better"

Time passes

3. People slowly discover format in #2 has the same issues that led to creating #1.

Repeat.

(Just like attempting to super-generalize anything else)


The points in the article are pretty solid.

But here is a big question.

Imagine you can influence the switch of the configuration formats for projects like Ansible, Kubernetes, Docker Compose, AWS CloudFormation, Google Cloud Deployment Manager, et al.

You can take any project with a huge user base and all of those projects will have one thing in common: JSON-based configuration with an option to write this configuration in YAML.

Since basically anyone talking YAML in the context of JSON is talking about a JSON superset.

So here's the task: propose a JSON-compatible alternative to YAML.

Things to keep in mind:

- backwards compatibility

- easy migration from YAML to a new format

- full JSON compatibility

- relatively cheap to get supported by the project of interest.


Just parse the YAML and spit out JSON5 (https://json5.org/) and then keep that version.


I wish the INI file format was standardised. It's easy for computers and humans to read and write, and it has nice features like comments and non-destructive editing!
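
For reference, this is roughly what reading the common INI dialect looks like with Python's `configparser`. The section and key names are made up. Note that `configparser` drops comments when writing a file back out, so this sketch only covers the reading half, not non-destructive editing.

```python
import configparser

# Hypothetical config text; section and key names are invented.
text = """
; comments are allowed
[server]
host = example.org
port = 8080

[logging]
level = info
"""

parser = configparser.ConfigParser()
parser.read_string(text)

host = parser["server"]["host"]
port = parser.getint("server", "port")  # values are strings unless converted
sections = parser.sections()

print(host, port, sections)
```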


TOML is basically a standardized version of INI.


TOML is similar to INI, but it's not the file format I know and love.


The first example (i.e. `yaml.load()` in Python) doesn't work with the current version of PyYAML.

Function application was disabled some time ago, and `yaml.load()` logs a noisy deprecation warning telling users to use `yaml.safe_load()` instead [1].

[1]: https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-...


I learned some of the intricacies of yaml the other day when refactoring a docker-compose project. At first glance, it's brilliant... Until I started running into limitations, edge cases, and issues.

I like the _idea_ of yaml, but:

- it's overly complicated in the wrong ways

- common/simple use cases aren't supported and require post processing (i.e. merging block maps/arrays, string interpolation, etc.)


I've recently started using https://jsonnet.org/ to generate more complex config.

It's easier to write than JSON (no need to quote keys, allows trailing commas), has reusability through functions and objects, and can output JSON which is much easier to parse than YAML.

Downside: you need another build step for the config.


S-expressions rule. I use them for configuration everywhere.

I wrote a lovely library for parsing them in Go (along with a full lisp interpreter if you like) https://github.com/glycerine/zygomys

Provides comments, multiline strings, and automatic translation into Go structs using reflection.


It's best to consider YAML in its appropriate context as a better XML fragment. The ideas in YAML evolved into JSON which is preferable today.

However at the time, a tree data format that deserialized into native types was quite useful. The alternative was writing event based SAX parsers, or incredibly verbose XML object apis.


The security problem is common to many serialisation formats and similarly terrible bugs have happened in a large number of formats.

For instance, the recent iMessage bugs that project zero announced were because NSCodable serialization tells the deserializer what class should be instantiated. Followed by remote code execution (woo!)

Similar problems have occurred with java serialization over the years, the python serialisation thing (that silly name I can’t recall).

I was recently learning swift and was getting frustrated by the verbosity/work for deserialising abstract classes when I realized the clunkiness was due to a design that made deserialising attacker-specified objects basically impossible. Obviously you could engineer a solution that would be exploitable but there’s only so much a platform can do to stop developer mistakes.


Just go with Dhall and be done with that


I dislike whitespace sensitive languages or definition formats with a passion. Especially when tabs and spaces are not treated equally. I don’t mind python as much nowadays but YAML is borderline insulting to me. I hope we all move on to something more sane soon.



I find yaml to be the best for making cloudformation templates. On its own it isn’t much good but if you use the right plugins it really is better than json.


The problem with most configuration file formats is: you can't put functions in them realistically.

The best configuration file is simply source code that initializes whatever you want to run, and then runs it. That way, you can install hooks in the form of closures and make the program behave exactly like you want without the constraints that a simple "value-only" configuration file format has.
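
A toy Python sketch of what that looks like. Every name here (`Config`, `run`, the hooks) is hypothetical and only meant to show closures living in the "config file".

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Config:
    port: int = 8080
    # Hooks are plain callables that rewrite a request path.
    hooks: List[Callable[[str], str]] = field(default_factory=list)

def run(config: Config, request: str) -> str:
    """Stand-in for the program: apply each hook, then 'serve' the request."""
    for hook in config.hooks:
        request = hook(request)
    return f"port {config.port}: {request}"

# The "configuration file" is just code that builds a Config and starts the program.
prefix = "/api"
config = Config(port=9000)
config.hooks.append(lambda path: prefix + path)  # closure over prefix
config.hooks.append(str.lower)

result = run(config, "/Users")
print(result)
```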


Lua was originally designed as a configuration file format that supported functions. Description: https://www.netbsd.org/~mbalmer/lua/lua_config.pdf

Relative to other languages, it is very easy to embed Lua and it is also easy to cut out Lua’s entire standard library to restrict what the scripts are capable of doing within your program.


This may be application-specific, but I might worry about security if my configuration files support arbitrary closures.


If an attacker has write access to your configuration files, isn't all lost already?


Any more so than the rest of your code?


Config files are typically written and updated by non-developers and often go through a less rigorous release process, so having a less-complex and dangerous, even if less-capable, language can be desirable.


Ah, that's not been my experience - but I can understand that if non-developers are the ones making the changes


Heh, yaml haters unite! I wish yaml would go away.


I hate yaml but I have yet to find a better option for deeply nested config files. Toml is the closest thing I have seen. Toml support is also not that great.

Yaml despite its flaws works pretty well for Ansible playbooks and for storing localizations.


Even the creator of Ansible got frustrated by YAML - https://medium.com/@michaeldehaan/16-ways-opsmop-improves-on...


Would you say actual YAML does a better job in Ansible over just writing JSON and letting it be parsed as YAML?


Have you tried writing JSON by hand or diffing it in a pull request?


Err, yes? Is quoting your keys, delimiting members with a comma instead of whitespace, and putting brackets/braces around collections really that confusing that people struggle to edit it by hand or read it in a diff?

I think the syntax is actually what makes it more human readable, it's still 95% text/numbers just annotated with information that makes it clear what things actually are instead of hiding them behind confusing computer parsing rules nobody is going to think about while reading human-friendly text.


Or commenting JSON?


I can at least write comments in YAML.


That's probably the biggest thing YAML has going for it and probably the only feature I take advantage of when something takes YAML instead of JSON.


I don't care for occasional config file writing but for stuff like ansible where writing YAML is the main job, I just wish it had chosen some other format but I guess it's too late.


For what use cases, and what would you prefer?


ThoughtWorks has templating in yaml on their hold list. https://www.thoughtworks.com/radar/techniques/templating-in-...

How does one write Kubernetes specs and Ansible without yaml?


As YAML is a superset of JSON, use $favourite_language to generate it. I like JSONnet for that.
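
For example, in plain Python: build the manifest as ordinary data and serialize it as JSON. This is only a sketch; the helper names and the `web`/`nginx:1.25` values are made up, and the field layout follows the usual shape of a Deployment manifest.

```python
import json

def container(name, image, port):
    """Hypothetical helper: one container entry."""
    return {"name": name, "image": image, "ports": [{"containerPort": port}]}

def deployment(name, image, port, replicas=1):
    """Hypothetical helper: a Deployment built from ordinary dicts."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container(name, image, port)]},
            },
        },
    }

# Reuse comes from functions and arguments, not from text templates and indent filters.
manifest = deployment("web", "nginx:1.25", 80, replicas=3)
print(json.dumps(manifest, indent=2))
```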


Google uses a subset of python called Starlark for build configuring (available in Go and I think Java). Nice if you want to be able to compute things during config.

https://github.com/google/starlark-go


I still keep using XML as my favourite format.

Get to use parsers out of the box, validation tooling, support comments, IDE code completion and they are super easy to transform.

In a couple of years some trendy SV unicorn will make XML the best format of the world, as these cycles happen to be.


The irony is that the best format that takes over the world will be an XML representation of JSON, enabling comments and trailing commas.


Kubernetes supports JSON but overwhelmingly leans towards YAML. I've had to spend some time really grokking it to do basic dev ops, and now have my IDE pretty dialed to support it. That said, it's not my favorite by a long shot. Can Jsonnet save us?


I should clarify that kubectl converts YAML to JSON, so most examples I see are in YAML in a repo, then applied by a CI system via kubectl where it is transformed into JSON.


Kubernetes API accepts only JSON with a single exception of Server-Side Apply which is an alpha feature.


JSON is valid YAML.


I liked the indented style of YAML, asked how to properly parse an indented file, and wrote my own small "parser".

What's nice about YAML is the choice to use indentation, for the rest, I have a hard time following the language's choices.


S-Expression or TOML! I personally use TOML for all my projects' configuration file.


TOML is so much easier to read IMO


Not a fan of YAML, but it has its advantages over JSON, such as the ability to comment specific portions easily.

Haven't used TOML yet, but it seems promising given that for most use cases you would only use a portion of the YAML language.


FYI, PyYAML 5.1 partially fixes the security issues:

https://github.com/yaml/pyyaml/issues/265


I strongly recommend doing away with config files completely for sake of ease of use, maintainability and security.

Instead just declare all config variables within code itself in a separate config class/module file, along with initialization to default values, and provide a dynamic getter/setter interface over a debug API (which can be enabled/disabled via a command line flag).

If you want, you can also provide a friendly cli tool to interact with the debug api. This tool could output help messages, show current config values - differentiate between default vs overwritten etc.

Of course, this can be written once as a utility library and cli and used consistently across all your programs.


For the love of god, if anyone reads this, please don't do this!

Config files are fantastic. Trivial to read, write, copy, track in version control, diff, grep, generate with scripts, etc.

API-driven configuration has none of these properties.

Some Java application servers take this approach of API-driven configuration. It's an improvement over UI-driven configuration, which is what they had before. But it's still significantly worse than simple file-driven configuration.

If you want to provide a 'friendly' CLI tool, by all means do so - but provide a tool to interpret and generate config files, not something which replaces config files.


Config variables in code get version controlled and release-managed alongside code and take the same CI/CD route to production as the rest of your code does. Your code asserts, compilers and test cases can help you catch errors in config data. All of this happens without any special file format or parser concerns.

If you need runtime reconfiguration in production, then it requires a runtime config management system tailored for operations folks with proper authn/authz, audit logs etc. This is a product by itself. The connection between your running program and the runtime configurator has to be intentional, secure etc.

IMO, config variables should have the following bindings:

1. config variable in code.

2. program start env variable.

3. program start command argument flag.

4. runtime configuration.

All available configuration options are declared in the code. But not all configuration options should be accessible from #2 to #4. And the override preference order may not be same for all types of configuration variables.
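
A rough Python sketch of bindings 1 to 3. The variable names, the `APP_*` env names and the fixed override order (flag over env over code default) are all assumptions for the example; runtime configuration (#4) is left out.

```python
import argparse
import os

# 1. Every option is declared in code with a default.
DEFAULTS = {"port": 8080, "log_level": "info"}

# 2. Only these options may be overridden from the environment.
ENV_VARS = {"port": "APP_PORT", "log_level": "APP_LOG_LEVEL"}

def resolve(argv, environ):
    """Resolve config: code default, then env variable, then command-line flag."""
    config = dict(DEFAULTS)
    for name, var in ENV_VARS.items():
        if var in environ:
            # Convert to the type of the declared default.
            config[name] = type(DEFAULTS[name])(environ[var])
    # 3. Only the port is exposed as a command-line flag.
    parser = argparse.ArgumentParser()
    parser.add_argument("--port", type=int, default=None)
    args = parser.parse_args(argv)
    if args.port is not None:
        config["port"] = args.port
    return config

print(resolve([], {}))
print(resolve([], {"APP_PORT": "9000"}))
print(resolve(["--port", "7000"], {"APP_PORT": "9000", "APP_LOG_LEVEL": "debug"}))
```

In a real program the call would be something like `resolve(sys.argv[1:], os.environ)`.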

Btw, this isn't related to Java or any particular programming language. I've seen this done in large C++ projects 20 years ago.


> Config files are fantastic. Trivial to read, write, copy, track in version control, diff, grep, generate with scripts, etc.

Code based configuration offers most of those benefits, just look at some of the suckless tools, if you've never used dwm or written a line of C I bet you can still guess how to configure some stuff here: https://git.suckless.org/dwm/file/config.def.h.html .

It doesn't work for all software of course, but for anything developer centric and/or anything with complex configuration done by experts it's a good choice, particularly for any software going down the path of creating a custom language (tmux, vim, etc). You get the power and flexibility of a full programming language, you get compile time config checks, you get excellent performance, you don't have to learn yet another config language and it's easy to write patches.

It's not great if you're targeting Joe Average, but neither is any configuration format, or any configuration at all.


> If you want, you can also provide a friendly cli tool to interact with the debug api. This tool could output help messages, show current config values - differentiate between default vs overwritten etc.

And we load the config by curling a JSON payload :) ?


This breaks when you have multiple services in potentially different programming languages that need to read the same config values.


I follow this plan in many (maybe most) app config situations, but it has at least two noteworthy drawbacks:

1) it presents an obstacle to sharing config files in a multi-language environment.

2) writing user prefs in the host app’s native language is fraught with security issues

The first concern is admittedly niche, and both concerns are addressable with some care and thought, but they’re good to consider.


This is not going to make you many friends in the server community (think apache, mysql...)


I went from that to INI to TOML and back to that in 10 years.

It just makes things run smoothly in TypeScript because all the members and types can be statically analyzed, giving you errors, auto-completion and the ability to jump to the definition, which config files cannot do. It's probably not good for large projects with multiple languages, though.

For small projects I'm just taking the benefit over small concerns.


How did we get to this point?


The superior replacement for XML, JSON and YAML is the SQLite .db file. Easy to “parse”, easy to manipulate programmatically, what more could you want?


- Editable in vim/emacs

- Meaningful version control


- editable in Emacs easily, there’s a mode for it

- dump it to SQL and version control that, if you must use Git


- Not everyone uses emacs.

- Not everyone likes the extra step.


- that’s your own choice, if you don’t want the features you don’t have to have them

- not everyone likes the hoops you need to jump through with the alternatives either! That’s why we’re discussing this :-)


Annoying for diffing probably?


Not human readable...


> Not human readable...

This refrain just cheeses me right off every time. Nothing is human readable! Everything requires a program to read it, because no human being can read states of charge or states of magnetic polarization directly.

What makes something 'human readable' or not is a software tool. Underlying that tool is a data format that the tool can accept and display. What everyone means when they try to sound smart by saying 'human readable' is just 'plain text.' In other words, they know where to find the dumbest possible reader/editor for it. Text editors are the dumbest possible editors because they cannot constrain edits to conform with the grammar of the interface language; they allow bugs at a point in the development process where it is trivial to disallow bugs, especially considering that interface languages should probably be, at most, regular languages.

I'll step off my soapbox, now.


Perhaps you should stand on that soapbox a little more often. People are way too enamored with their little 1970's ed and ex derivatives.

Every veteran in the field knows that data and data structure are the primary enablers for almost any solution. Nevertheless, we have regressed in the last decades w.r.t. that. Had we had better structured (AST-driven) editors and context dependent representation, XML (or sth similar) might have taken off.

Personally, I blame the exponential increase in inflow of new developers. Nothing but reinventing the wheel without knowing history.


Human readable in that I can use TCPDump and make sense of it all. That's one of the reasons HTTPS everywhere sucks.


I've a certain sympathy for your position but I interpret "human readable" as meaning readable and somewhat comprehensible without using specialized tools.


But when is a tool specialized? Would a dedicated XML / JSON / YAML / ASN.1+DER / Avro / Protobuf / Parquet editor be specialized? And what if that hierarchical standard would be the de facto industry standard? Is a binary file editor specialized? What about a structured assembler?

Personally, I think most editors are rather specialized. They deduce the character sets, often add syntax highlighting and provide paging for very large files. High level strongly typed languages, such as modern C#, Java, Scala are designed to be used in an IDE. You could view and edit them in a plain editor, but it makes for a difficult situation (not unlike editing XML by hand).

"Human readable" is very subjective. It depends on the person, the task at hand, the intended recipients, etc. etc.


WTF is that weird ligature between s and p on the heading "Can be hard to edit, especially for large files"? That is an odd font.


YAML (/ˈjæməl/, rhymes with camel) was first proposed by Clark Evans in 2001, who designed it together with Ingy döt Net and Oren Ben-Kiki. Originally YAML was said to mean Yet Another Markup Language, referencing its purpose as a markup language with the "yet another" construct, but it was then repurposed as YAML Ain't Markup Language, a recursive acronym, to distinguish its purpose as data-oriented, rather than document markup.


I never understood how YAML is more human-readable than JSON. I find JSON much easier to read. What annoys me the most about YAML is that it's easy to misinterpret the indentation. You need a special IDE to know whether a property belongs to a specific object or to its parent.


> I never understood how YAML is more human-readable than JSON

Two things: comments and multi-line strings.


My personal JSON pet hate is:

```
x = [ "Foo", "Foo2", ]
```

is not valid, but the following is:

```
x = [ "Foo", "Foo2" ]
```

Makes dealing with packer configs feel like punching yourself in the face.

I still prefer it over YAML's awkward initial learning curve.


At first I found it really annoying but then the more I thought about it the more I came to value the "," semantics as proper validation for a "forgot to put the last element in the list" error which would otherwise be silently hidden via the parser.
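To make the strictness concrete, here is a minimal sketch using Python's standard `json` module, which follows the strict grammar and rejects the trailing comma. The helper name `try_parse` is made up for this illustration.

```python
import json

strict = '["Foo", "Foo2"]'
trailing = '["Foo", "Foo2",]'

def try_parse(text):
    """Return the parsed value, or a short 'rejected: ...' string on error."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # The exact message differs between Python versions.
        return f"rejected: {err.msg}"

print(try_parse(strict))
print(try_parse(trailing))
```

The second call is rejected rather than silently accepted, which is the validation behaviour described above.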


So your comment is valid. Having strict and not-strict validation would be a nice compromise though (:


What? No you don't. You don't need a special IDE any more than you need a special IDE to know if a Python statement is in an "if" block or a parent "if" block.


Configuration files are usually meant to be single purpose. Docker, Kubernetes and Helm all use YAML exceptionally well.


YAML is great at its core. It just has too many features, and those are the first things I disable.

A simplified subset, .syml would be a good idea.


StrictYAML[0] is a YAML subset that removes some of the problematic features.

The implementation is in Python.

[0] https://github.com/crdoconnor/strictyaml


This should be the post then, a solution rather than griping.


EDN


tl;dr: Another alternative to YAML (among many great others), this one designed and developed by me:

https://eno-lang.org/

I've been doing a lot of research and development on language design for file-based content (e.g. for static site generators). I've found that YAML, although established as the go-to format for statically generated blogs and the like, was never designed for these things: by its nature it does not support simple, essential features for this use case, such as unindented blocks of verbatim text (for which YAML frontmatter was invented as a very limited hack).

The result of all this R&D is a language called "eno notation" which is designed especially for file-based content usecases, and around which I've also built an entire ecosystem for many languages and editors - if you're working in that field, it might be worth taking a look!


I find it surprising that your format doesn’t distinguish strings and numbers, or other types of scalar values in general. For example, in your demo “eno's javascript benchmark suite data” on https://eno-lang.org/eno/demos/, both of these lines:

  iterations: 100000
  evaluated: Fri Jul 06 2018 09:46:48 GMT+0200 (Central European Summer Time)
are tagged below as just a “Field”. Do client programs that read an Eno file need to run `int()`/`float()` or `.to_i`/`.to_f` on the field values they know should be numbers? That seems unergonomic.


You are correct! The thinking behind this is that for the majority of file-based configuration and content usecases the expected types are fixed and known beforehand already - ergo it makes more sense that a developer has to specify once which type a field is (gaining in return 100% type safety, validation, localized validation messages, ...) than all users later having to e.g. explicitly write quotes a million times when writing configuration/content, just to tell the application something about the type it already knows anyway (and wouldn't expect/accept any other way too). I think this is really more ergonomic, even in the short run.
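A rough sketch of that idea in Python, assuming a flat `key: value` file. This is not eno's actual API or syntax; `parse_fields`, `load`, `SCHEMA` and the field names `ratio` and `released` are made up for illustration. The file carries only strings, and the application declares each expected type once.

```python
from datetime import date

def parse_fields(text):
    """Read 'key: value' lines into a dict of raw strings (no type guessing)."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields

# Hypothetical schema: the developer states the expected type of each field once.
SCHEMA = {"iterations": int, "ratio": float, "released": date.fromisoformat}

def load(text, schema):
    """Convert every field named in the schema, failing loudly on bad input."""
    raw = parse_fields(text)
    result = {}
    for key, convert in schema.items():
        if key not in raw:
            raise ValueError(f"missing field: {key}")
        try:
            result[key] = convert(raw[key])
        except ValueError as err:
            raise ValueError(f"field {key!r}: {err}") from err
    return result

doc = """
iterations: 100000
ratio: 0.75
released: 2018-07-06
"""
config = load(doc, SCHEMA)
print(config)
```

The author of the file never writes quotes or type tags; a value like `iterations: many` would fail in `load` with an error naming the field.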


What about HCL, would it make sense to use this as a config language?


I just want json with comments. Is that too much to ask?


Someone else already mentioned JSON5 (https://json5.org/), which is JSON with a few ergonomic improvements, including comments. Hjson (https://hjson.org/) is a similar, slightly more complex format with a few extra features such as unquoted strings for object values.


I just want mainstream/built-in JSON parsers to be able to ignore comments and trailing commas. What ever happened to lax parsing?
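Until a built-in parser offers this, a lax mode can be approximated by preprocessing the text before handing it to a strict parser. Below is a minimal sketch on top of Python's standard `json` module; `loads_lax` is a made-up name, and this is not a full JSON5 implementation. It only drops `//` and `/* */` comments and commas that sit directly before `]` or `}`, and it assumes the input is otherwise well-formed JSON.

```python
import json
import re

# String literals are matched first in both patterns, so anything that looks
# like a comment or a trailing comma inside a string is left untouched.
_STRING = r'"(?:\\.|[^"\\])*"'
_COMMENTS = re.compile(_STRING + r'|//[^\n]*|/\*.*?\*/', re.DOTALL)
_TRAILING_COMMA = re.compile(_STRING + r'|,(?=\s*[\]}])')

def _keep_strings(match):
    """Keep string literals as they are; delete every other match."""
    token = match.group(0)
    return token if token.startswith('"') else ""

def loads_lax(text):
    """Parse JSON that may contain comments and trailing commas."""
    without_comments = _COMMENTS.sub(_keep_strings, text)
    without_commas = _TRAILING_COMMA.sub(_keep_strings, without_comments)
    return json.loads(without_commas)

sample = """
{
  // packer-style config
  "builders": ["Foo", "Foo2",],
  "note": "keep // this, and this ,]",  /* block comment */
}
"""
print(loads_lax(sample))
```

Comments are removed in a first pass so that a comma followed by a comment and then a closing bracket is still recognised as trailing in the second pass.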



Disclosure: I work on Tree Notation. It’s the future of file formats, IMO.

The idea is to have 2 levels: a simple, minimal syntax/notation (think binary) called Tree Notation, and then have higher level grammars on top of that, called tree languages.

It works for encoding data and also for programming languages, regardless of paradigm.

https://github.com/treenotation/jtree


Your project looks interesting, but when I looked through the GitHub project and site I couldn't find a language specification, reference manual, BNF-like grammar, or anything else to indicate what the syntax is, beyond a very trivial example on GitHub. To be blunt, I think you need to start with a spec to get any traction. That allows people to understand your data model and the particular text encoding of it. If they like it, they might use your tool and perhaps port the system to other languages.


Good feedback, thank you. You may have seen the spec, it’s just quite minimal (here’s a more elaborate one: https://github.com/treenotation/jtree/blob/master/spec.txt)

Here’s a BNF:

https://github.com/treenotation/jtree/issues/1

There’s a FAQ as well.

The docs need work. In particular, I’m hoping people will create their own explanations of the ideas in external places, as that might be a better way to understand it. Happy to provide help to anyone who is interested in that.


From a quick glance, my issue with Tree Notation would be that it's not enough syntax, i.e. it does not provide enough structure for me to grasp the overall structure with a cursory glance. Maybe it would work better if GitHub had a syntax highlighting for it. (But requiring syntax highlighting to be readable is a large red flag on its own.) Or that's just a feeling that would be mitigated if I saw larger files.


GitHub syntax highlighting is coming. If anyone wants to help with that, that would be awesome; it's one of the top outstanding issues. Syntax highlighting is there for Sublime and CodeMirror, with one bug left in those implementations. I would love help getting syntax highlighting generation going for Monaco and Linguist, so we’d have it everywhere on GitHub.

Your impression is correct though: without highlighting it’s awful. Try the Tree Language designer app to see it with highlighting, type checking, autocomplete, etc. There are interesting ways to accomplish everything without syntax, often not obvious. But tooling is essential to make it better than existing options. Help wanted!


This sounds a lot like s-expressions.


Thinking of it as s-expressions without parens is a very good analogy. That ends up making a huge difference.
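As a rough illustration of the analogy, here is a small Python sketch that reads indentation-structured text and prints the equivalent s-expression. It assumes one space of indentation per nesting level and space-separated words on each line; it is not the jtree API, and `parse_tree` and `to_sexpr` are made-up names.

```python
def parse_tree(text):
    """Turn indented lines into nested [words, children] nodes."""
    root = []
    stack = [(-1, root)]  # (depth, children list) for each open ancestor
    for line in text.splitlines():
        if not line.strip():
            continue
        stripped = line.lstrip(" ")
        depth = len(line) - len(stripped)  # assumed: one space per level
        node = [stripped.split(" "), []]
        # Close every ancestor that is at the same depth or deeper.
        while stack[-1][0] >= depth:
            stack.pop()
        stack[-1][1].append(node)
        stack.append((depth, node[1]))
    return root

def to_sexpr(nodes):
    """Render the nested nodes with explicit parentheses."""
    parts = []
    for words, children in nodes:
        inner = " ".join(words)
        if children:
            inner += " " + to_sexpr(children)
        parts.append(f"({inner})")
    return " ".join(parts)

doc = """\
package demo
 version 1.0
 author
  name Alice
"""
print(to_sexpr(parse_tree(doc)))
# prints: (package demo (version 1.0) (author (name Alice)))
```

The indentation carries exactly the information the parentheses would, which is why the two forms convert back and forth mechanically.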


Reminds me of Stylus over CSS.


wow that is awful


"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

https://news.ycombinator.com/newsguidelines.html

Edit: we've had to ask you multiple times already not to be a jerk on HN. Would you please review the guidelines and take the spirit of this site more to heart?


It’s terrible! A waste of time! No one should use it! It’s definitely not the future!


YAML is horrible. TOML is much better. Even JSON is not that messy.



