JSON as configuration files: please don’t (2016) (arp242.net)
401 points by pcr910303 10 days ago | 426 comments





"Lack of programmability" is a feature. Declarative config lets tools statically analyze and transform the config. You can't even figure out what the dependencies are going to be for some Gradle projects without executing them. https://stackoverflow.com/questions/51153878/conditional-dep... -- The author's programmable example is a classic Bad Idea in my experience.

Also, check out JSON schema and its integration into tools like VSCode. VSCode gives you generalized autocomplete on its own config and known config formats like package.json by pulling in json schema definitions. It works well. Easily a better experience than the executable-program-as-config I learned to hate in the JVM world, and easily better than ad-hoc config like Caddy's.

I have a more practical pitch: make your tool consume json5 as well as json.

Also, it's telling that the author wasn't even willing to stick their neck out enough to pitch a single solution. I guess in the end they realized there are always trade-offs and any alternative would have its own downsides. :) But I'm getting a bit fatigued of these you're-doing-it-wrong posts. They are effortless to write and they're like crack to HN. I can complain about literally everything.


This totally misses the point of the article :(

JSON is a fine format for storing configuration data, and any reasonably structured data. So is XML, protocol buffers, what have you.

JSON is a poor format for humans. For a configuration file that an actual human would edit by hand, and read and try to understand later, JSON is a pain.

Comments are mostly useless in a machine-generated and machine-consumed file. They are indispensable for humans, because they are a link to what machines are not doing.

Nice formatting is useless for a machine-consumed file, because machines process a stream of bytes. It is indispensable for humans, who recognize text very differently.

This boils down to one thing: config files are source code, and need all the amenities that make source code comfortable for humans.


> This totally misses the point of the article :(

I didn't. There wasn't much to the blog post. In fact, the blog musings are the same criticism we've heard a thousand times. But please don't suggest that I missed some deep point in a trivial blog post when you really just mean "I disagree".

I even pointed out a real-world concrete example where JSON config is made pleasant to edit by hand for humans: an IDE like VS Code that pulls in JSON schema.

Though since you only doubled down on the blog post's opinion instead of responding to a single point I made, can I say that you missed the points of my comment? You didn't even address my criticism by pitching an alternative. How are we supposed to have a conversation?


> I even pointed out a real-world concrete example where JSON config is made pleasant to edit by hand for humans: an IDE like VS Code that pulls in JSON schema.

I think the debate here is what "editable by humans" means. If you're using a tool for editing JSON like VS Code, I don't believe you're really editing it by hand. You're using a program with JSON-specific features that give you a pleasant editing experience. I believe editing by hand means using a standard plain text editor, and there it's a lot worse.

Formats that use braces are relatively easy to parse by machines but can be harder to read by humans. Utilizing white space makes things easier for humans to read. A JSON file could have everything on one line. But something like YAML forces you to use white space and new lines, which helps humans to read it.
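To make the "everything on one line" point concrete, here is a minimal sketch using Python's standard json module. The config values are made up for illustration; the only claim is that the same data can be serialized compactly or with indentation, and both parse back identically.

```python
import json

# Hypothetical config used only for illustration.
config = {"server": {"host": "localhost", "port": 8080}, "debug": True}

# Default output: the whole document on a single line.
compact = json.dumps(config)

# Same data, indented for human readers.
pretty = json.dumps(config, indent=2)

print(compact)
print(pretty)
```

Whether a file ends up readable therefore depends on whoever (or whatever) wrote it, not on the format.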


I would 100% rather edit JSON (in vanilla vim) than ever touch YAML again in any editor I've used. It has weird rules for what whitespace is significant and what indent level I need for things, versus JSON where % will match quotes and parens and braces quite easily.

The only significant thing missing from JSON is comments. JSON5 or convention can solve that, so it's not as bad as you'd expect.


I'm sorry man, but it really doesn't. In over ten years of using yaml I haven't seen a single instance where the whitespace rules were in any way weird or counter intuitive.

Most programming languages don't have significant whitespace, but it doesn't prevent people from writing readable code in them. And I haven't ever seen anybody write a long JSON config any differently - it's indented etc, for the same reason why C code is indented.

Still, editing JSON directly by an IDE that still exposes its syntax is "editing by hand", just as editing C in Vim with ctags is still editing by hand.

I do have to note though that VSCode uses JSON with extensions. Most obviously, it allows comments. But there are other minor things that make it easier to write (and parse) - e.g. it allows trailing commas in literals.
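A quick way to see the difference between strict JSON and the relaxed dialect is to feed both to a strict parser. This is a minimal sketch with Python's standard json module; the config text is hypothetical, and the helper name parses is made up for this example.

```python
import json

# Hypothetical config text with a comment and a trailing comma,
# the two extensions mentioned above.
relaxed = """{
  // enable verbose logging
  "debug": true,
}"""

strict = '{"debug": true}'


def parses(text):
    """Return True if the strict standard-library parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(parses(strict))   # True
print(parses(relaxed))  # False
```

So a tool that wants hand-edited files with comments has to opt in to a relaxed parser; it does not come for free with a standard JSON library.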


> And I haven't ever seen anybody write a long JSON config any differently

It's not people you have to worry about. It's the machine-generated code. But now that I think about it more, YAML is a bad example since it's a superset of JSON. So a machine could generate really ugly YAML too.


Operating under the assumption that config files will be handled by a plain text editor and not an IDE seems unreasonable.

I've edited the VSCode config in JSON form by hand when adding snippets and overriding how the vim plugin handles some keys. Instead of the normal settings view, it was just the JSON in the text editing view.

> JSON is a poor format for humans

But do you know what sucks even harder? XML. If I can replace an XML configuration file with JSON then screw it.


How does it suck harder? Sure, the closing tags can be a bit annoying (but decent editors can insert those automatically), but attributes don't have that problem, and contrary to JSON it has comments.

Closing tags is not really an issue with XML.

In my experience, the biggest reason people hate it is because it has too many ways of structuring data, and it allows people who design the schemas to go overboard. Attributes or child nodes? Multiple namespaces or just extend the one you have? CDATA or embedding?

That said, simple XML is as good as simple JSON. XML is fine if you keep it simple when designing the schema. AND of course one can screw up with JSON too. But after almost 20 years of people over-engineering their XML schemas, I can't fault anyone for choosing a simpler data format.


The big difference is that mapping that XML into native data structures in your language is a mess in the general case because XML has so damn many ways of structuring the data. JSON maps more or less directly into JavaScript/Python/Perl or really any untyped language with arrays, hashes, and scalars.
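As a rough illustration of the mapping problem, here is a sketch using only Python's standard library. The data and the two XML layouts are invented for the example; the point is that the XML reader has to know which convention the schema designer picked, while the JSON result is already native data.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical data: the same server settings in three encodings.
as_json = '{"server": {"host": "localhost", "port": 8080}}'
xml_attrs = '<server host="localhost" port="8080"/>'
xml_children = '<server><host>localhost</host><port>8080</port></server>'

# JSON: one call, and the result is already dicts, strings, and ints.
server = json.loads(as_json)["server"]

# XML: extraction code differs depending on the layout that was chosen.
from_attrs = dict(ET.fromstring(xml_attrs).attrib)
from_children = {child.tag: child.text for child in ET.fromstring(xml_children)}

print(server)         # {'host': 'localhost', 'port': 8080}
print(from_attrs)     # {'host': 'localhost', 'port': '8080'}
print(from_children)  # {'host': 'localhost', 'port': '8080'}
```

Note that both XML variants hand back the port as a string; any typing is up to the consumer.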

Personally, the main reason I don't like XML is that it's too verbose.

I think part of it is the context that JSON and XML are used in, particularly the era of software that uses these formats for configuration and the design implications that has.

Recently I've had to work with XML config files for a tool called Oozie, which is a data pipeline scheduling tool within the hadoop/spark ecosystem, and it has been soul crushingly tedious for me. Everything feels verbose and opaque, the documentation seems to prioritize enumerating every possible configuration option over providing minimum viable configs for common use cases.

JSON configs often just feel more simple and developer friendly. I'd say this has less to do with technical differences between JSON and XML and more to do with how "modern" software systems have been designed to be more ergonomic for developers/administrators and these modern systems happen to be more likely to use JSON.

It's also fun to hear non-technical people at work talk about "Jason files".


> It's also fun to hear non-technical people at work talk about "Jason files".

Ha! That one always makes me smile. One of the best exchanges I’ve heard was something like:

“... and we’ll have JSON handle all the serialization”

“Jason? Oh we hired a new engineer?”


This is a great point: correlation between tool and era/ergonomics. My current role is swimming in XML and abides by the XML-era design philosophies.

> It's also fun to hear non-technical people at work talk about "Jason files".

I usually go the other way; Jasons I know have their names represented as "JSON" in my mind, and sometimes in written form.


There's a flip side to the common use case of documentation... How do you solve the uncommon use case when all the docs, it seems, are showing "how easy it is to do xyz"? A month ago, I started on a new web front end. It was set up with React, Redux and webpack. Since it was primarily going to be me solo on the project, I wanted to integrate TypeScript. There are tutorials everywhere for starting with TypeScript, but good luck finding something that says how to integrate it into an existing setup.

You might be able to guess my name by my comment.

I really wish everyone would think of JSON as a French word. "Jay-sonne" (or jay-SAWN). Not "Jase-in"... it would help keep the confusion to a minimum.

Ha - https://www.youtube.com/watch?v=zhVdWQWKRqM He starts by saying "Jason" but then suggests the French version. Love it!


I find it interesting you're being downvoted. I think your question is reasonable: why does XML suck "harder" than JSON? This just seems a bit faddish to me. Things like namespaces and attribute/child distinction in XML are super annoying ... until you really need them in your ML (JSON, in this case). Then, suddenly, you're reinventing a wheel, hoping all of your downstream tools 'agree' on the behavior.

The votes seem to be going the other way now :-)

I've heard people prefer JSON over XML because XML is too complex... and now we have json schemas, jsonpath... the namespaces can't be far away.

I'll admit that XML probably would be better without entities and the crazy whitespace handling.


Why does it suck harder? I’ll give one example: arrays.

In JSON they’re plain and simple. You even get to use good ‘ol [ ] delimiters.

In XML? It’s a sea of open and close tags.

Now do an array of arrays. Then hand-edit it.

Like I said, it’s no better. But it’s way easier to read.
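For a side-by-side view of the array-of-arrays case, here is a small sketch in Python (standard library only). The tag names "array" and "item" are arbitrary choices made for this example, not any standard XML encoding of lists.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical nested list used only for illustration.
matrix = [[1, 2], [3, 4]]

as_json = json.dumps(matrix)


def to_xml(value):
    """Encode lists as <array> and everything else as <item> (one of many options)."""
    if isinstance(value, list):
        node = ET.Element("array")
        for element in value:
            node.append(to_xml(element))
        return node
    node = ET.Element("item")
    node.text = str(value)
    return node


as_xml = ET.tostring(to_xml(matrix), encoding="unicode")

print(as_json)  # [[1, 2], [3, 4]]
print(as_xml)
```

The XML output is several times longer for the same four numbers, which is the hand-editing cost being described.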


I can easily import/export the JSON and share it within my codebase. The XML is much worse in that regard (unless there's some clean way to do the same thing that I'm not aware of).

Millions of developers use XML all the time, it's called HTML.

People can easily read and edit it with deeply nested structures, attributes, and even malformed data. Frontend frameworks using components and variations like JSX also maintain the same style because it's very natural to use.

Now try taking a typical HTML document and expressing it in JSON. Even an empty page would be completely unmanageable without tooling. People have a strange reaction when they hear "XML" but it's much more structured, usable, and widespread than you think.


I think using the popularity of HTML (a rich text format) to justify using XML as a configuration format is a bit disingenuous. The two use cases (structured data and rich text) are genuinely very different from each other. In rich text, the primitives are text and markup. In structured data, the primitives are structures (lists, maps) and data (strings, bools, numbers). The impedance mismatch goes both ways - try taking a typical JSON document and expressing it in XML. Which parts end up as attributes? Elements? Text nodes? How do you express the semantic difference between a list and a map? A string and a boolean value? It’s arbitrary, ugly and verbose no matter how you slice it. And, just like its inverse, basically unmanageable without tooling.

Also HTML is not XML. HTML is non-strict (the spec is its parser, not its format). HTML doesn’t have schema files. HTML doesn’t understand self-closing tags. HTML does have void elements (img, br, input, etc). HTML is designed for humans to write. And it’s a tool fit for its purpose, unlike the abomination that is XML.


Another commenter linked to this: http://www.jsonml.org/

Scroll down and I think it shows it perfectly how (X|HT)ML tags are much simpler than JSON syntax once things get complex and nested. It's not that hard to define basic primitives like we do with HTML (which has dozens of tags) but XML also lets you define your own schema if necessary to make things more compact.

I understand the technicalities between HTML vs XML but I don't see how it makes any practical difference when you're editing a bunch of tags in a text file. It's the same thing. The structure looks identical. What is the actual issue that makes HTML easy but XML hard?


This is such a contrived example: they take html and show the equivalent conversion in json with the same schema. The point is in a configuration file you can remove nearly all the cruft of xml. Instead of having <input key="key" value="value"> you don’t need any brackets or extra info. You just type { key:value } and you’re done. As many times as you want. I’ve had to hand edit complex msbuild configurations (xml based) in past projects and I can tell you lists and maps are hell. We’ve since converted them to yaml (another discussion), but the point is xml is terrible for human editing.

And yet you have literally done the same exact thing in reverse - in XML, the exact equivalent would be <key>value</key>; you don't need any extra attributes, either.

Well, until you do need some metadata for that key-value pair. Which is why even in many JSON schemas it's pretty common to get something like { "key": "key", "value": "value", ... } (where ... is usually empty in practice).

The problem with MSBuild, for the most part, isn't XML - it's its own data model (which is not XDM, by the way).


The nuance that differentiates my example is I’m not picking some random web page (where html might actually make sense to use) and trying to apply json to a domain it most certainly is not optimized for. I’m taking an extremely common case and showing why xml is way too verbose for very simple scenarios.

My main point is in xml the closing tags are unnecessary, and you need an opening tag for every value where it’s obvious from nested context what that value is supposed to be. Xml is very redundant, there’s just all this zero entropy text everywhere which conveys no information. I disagree with your metadata in json example, why not just add more data to the value? The data model of msbuild is also annoying, but I’ve worked with some dotnet core projects now and using json as the project format definitely saves typing and actually allows you to intuit what a project is doing rather than just bombarding you with useless text.


You're picking a particular way to handle that common case. A common way, I agree, but you have others if verbosity is what you're optimizing for.

And with respect to data and metadata, what you propose is not fundamentally different in terms of verbosity:

   <foo metadata="bar">baz</foo>

   {
     "foo": {"value": "baz", "metadata": "bar"}
   }
And it's even worse if you also need to preserve order of keys.

Ok, but what if your metadata is more complicated than just a single field? It’s ridiculous to have to include even more opening and closing tags: <map> <kvp> <key>k</key> <value>v</value> <metadata> <field1>m</field1> <field2>m2</field2> </metadata> </kvp> </map>

It’s just crazy to look at tbh, and it was annoying to type on my phone. I’m amazed people are still arguing in favor of it. In reality you just want the markup to transmit the most information in the least amount of bits. This is a measurable quantity and xml objectively sucks at it.


> In reality you just want the markup to transmit the most information in the least amount of bits.

If that were the case, we'd be using binary serialization everywhere. But you yourself are making an argument that ease of writing matters. So does ease of reading. Overly verbose markup is a tax on both, but so is extreme brevity.

Anyway, this particular thread was a discussion to address a very specific point made in the article that overstated XML verbosity over JSON. I'm not actually arguing that XML is perfect, or even "good enough". Its syntax is overly verbose, and its data model has pointless distinctions and arbitrary restrictions. But it also had many good ideas, and it's unfortunate that those get ignored in the quest of simplifying everything - and then later, when the issues that were the original motivation for those ideas are rediscovered, that wheel gets reinvented in dozens of flawed and mutually incompatible ways.


.NET Core uses .csproj project files which are XML.

Do you have difficulty in reading a large HTML document? That's as verbose and repetitive as XML but also usually filled with junk comments and malformed tags. If you don't find it hard, then why is XML different?


> Millions of developers use XML all the time, it's called HTML.

HTML is not XML and even then few developers today use HTML without some sort of processing involved. HTML is also not a language used for configuration files, so it is really off-topic.

The question is, do people prefer XML configuration over JSON configuration. The answer is that people generally prefer JSON. JSON just maps far better to data types we use in everyday programming. Have you seen some of the common ways of expressing key-value pairs in XML? It's horrible.

> Now try taking a typical HTML document and expressing it in JSON.

That's not the problem anybody is trying to solve here. We're talking about configuration files, not complex documents. But let's say I wanted to solve that problem, the representation would look like this:

http://www.jsonml.org/

Not that bad at all.


It's a technical distinction without a practical difference. They are both declarative languages to describe data.

Why is it so horrible? Have you ever typed a <ul><li> list? Or <input name="key" value="value">? What exactly is so difficult about this syntax in XML but completely fine in HTML?

The point isn't whether you can represent a document in JSON, it's about how easy it is for a human to manage it. The XML/HTML structure is far easier as things get more complex. The page you linked to shows this quite clearly even with the small "Bulleted List Example" at the bottom.


> It's a technical distinction without a practical difference.

There's a lot of practical differences when actually implementing configuration with either XML or JSON, due to technical distinctions.

> The point isn't whether you can represent a document in JSON, it's about how easy it is for a human to manage it.

If XML is easier to manage, why are people shifting from XML to JSON for configuration? You may have some experience with HTML, but have you actually used some of these XML abominations that are used for configuration?

> The XML/HTML structure is far easier as things get more complex.

Configuration files shouldn't be complex. They're mostly key-value pairs, perhaps a couple of lists here and there and a modest amount of nesting. XML is complex to begin with.

Perhaps at some stage, for some use-cases, XML starts becoming simpler to edit. Configuration files generally aren't one of them.

> The page you linked to shows this quite clearly even with the small "Bulleted List Example" at the bottom.

It's an example of representing XML-like structure with JSON, which is fairly easy. Try it the other way around, things get hairy. If you were to represent the data as JSON, you wouldn't write it like that.


If config is small, both formats are easy and it doesn't matter. When config is large and complex then both formats can be hard, however XML is much easier than JSON as complexity climbs. My evidence is how many people easily edit large complex document structures in HTML already without issue.

I'm just not sure what the argument against this is other than verbosity. Config files shouldn't be complex? Sure, but what if they are? Are people really moving to JSON everywhere or is it just tied to the rise of Javascript and web frameworks?


> If config is small, both formats are easy and it doesn't matter.

Even if I would agree with this (which I don't), if it doesn't matter then you should pick JSON because it's easier to work with as a developer and it doesn't require XML parsing as a dependency.

> My evidence is how many people easily edit large complex document structures in HTML already without issue.

This isn't evidence at all. An HTML document is not a configuration file. Even then, people rarely author large HTML documents by hand these days. It's more likely that they're using a simpler language like Markdown to author documents.

> Are people really moving to JSON everywhere or is it just tied to the rise of Javascript and web frameworks?

If Javascript and web frameworks are responsible for the rise of JSON, but XML is better, why didn't they stick with XML? Remember, SOAP was XML. AJAX stands for "asynchronous Javascript and XML". The way to do an HTTP request (before fetch) from Javascript was called (inappropriately) XMLHttpRequest. JSON on the other hand was just an informal spec with several flaws, yet it won out over XML.


1) What I mean by "it doesn't matter" is that at small scale, neither one is easier than the other.

2) JSON does need to be parsed and does add a dependency. Javascript is not the only language out there.

3) HTML not being config files is not the point. The document/tag structure is identical. If you can edit large complex HTML files (regardless of how they are generated) then you can edit large complex XML files. And it's rather easy to do so, against the claims that XML is so hard to work with.

4) There are plenty of XML configs out there, backing just about every piece of software you use. If you only focus on web/js projects then JSON "won out" because it is a simple dump of the in-memory representation rather than a more formal serialization like XML. The missing features like JsonPath and Json Schema are now being added back to turn it into a proper serialization format. When storing configs, these schema features are rather important.


1) Yes, if they're both easy to use, pick JSON - for simplicity's sake.

2) A JSON parser is a far smaller and simpler dependency than an XML parser, especially when we're adding XPath to the mix.

3) XML isn't necessarily hard, it's tedious.

4) There's plenty of legacy software out there using all kinds of stuff for config. You don't see a lot of people choosing XML these days. You see YAML or TOML or JSON, even though all of these have issues of their own. JSON happens to be the simplest and most commonly supported.


HTML is distinctly not XML. There is XHTML if you want to use a form of HTML that is processed like XML, but XML processing is a lot stricter than HTML processing in which you can take a lot more liberties in the markup and still end up with a readable result, maybe even what you intended, whereas an XML processor will usually refuse to render incorrect XML markup at all.

I'm comparing the data structure and editing ergonomics (which are identical) between XML and HTML, not how they are processed. It's good for config files to have precise parsing instead of tolerating errors.

>They are both declarative languages to describe data.

This still applies then


Well put. Thank you.

In my world, config files are machine-parsed but human-generated and modified. JSON is just easier to work with.


XML is a lot less pleasurable to parse from a programming point of view though. With a JSON file it's something like:

const config = require('./config.json');

console.log(config.apiKey);

With XML you often have to use things like XPath to extract the data, which is a lot more effort.
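For comparison outside JavaScript, here is roughly the same lookup in Python with only the standard library. The file contents and the credentials/apiKey layout are assumptions made for this sketch; apiKey simply mirrors the snippet above.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical config contents, inlined so the example is self-contained.
json_text = '{"apiKey": "abc123"}'
xml_text = "<config><credentials><apiKey>abc123</apiKey></credentials></config>"

# JSON: parse, then use ordinary dict access.
config = json.loads(json_text)
print(config["apiKey"])

# XML: parse, then navigate with a path expression.
# ElementTree supports only a limited subset of XPath in find/findtext.
root = ET.fromstring(xml_text)
print(root.findtext("./credentials/apiKey"))
```

Both are short here; the gap tends to show up once the XML mixes attributes, namespaces, and repeated elements.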


Javascript isn't the only language though. There are lots of XML and JSON parsers and serializers available to convert to an in-memory object. Most relational databases have great XML support too.

I don't think people realize that JSON doesn't actually have a querying system at all, you have to deserialize it to an object to use. There is the coming JsonPath standard but that's not well supported yet and is pretty much the same as XPath.


I think JSON is just universally more easy to use. Every language I’ve used with it is mainly the same in the way it works. JavaScript, Ruby, Python, Perl, PHP...

But you’re not wrong. Yes you have to deserialise it. Hence it being an Object Notation system.

While you say JSON is trying to catch up with XML in regards to XPath, JSON is trying to catch up to XML with things like JSON Schema.


> While you say JSON is trying to catch up with XML in regards to XPath, JSON is trying to catch up to XML with things like JSON Schema.

Absolutely. I would say XSD is the strongest advantage of XML over JSON right now. XSD still has many things to improve but JSON schema is even further behind.


> I don't think people realize that JSON doesn't actually have a querying system at all, you have to deserialize it to an object to use.

You don't have to deserialize it to an object, you just do that because it's convenient in Javascript (and other dynamic languages). It's a feature.

Now try using XML like an object, what do you get? A DOM. Which is fine if you wanted a DOM, but I don't want a DOM. I don't need XPath or any of that stuff. These are tools to deal with the complexity of XML, which I don't have, because I am using JSON.


A DOM is just a tree (and deserialized XML is not always W3C DOM; indeed, in most modern languages, it usually isn't). Deserialized JSON is also just a tree. And even if you deserialize JSON by eval'ing it from JS, it is still deserialization - you're just happening to be reusing the JS parser for that purpose. But any parser is fundamentally a deserializer from the language syntax to an AST.

So you're really saying that JSON deserializes to something that is a more natural fit for the language that you're using. And it's true in many cases; but also not so much in others, like when you're dealing with 64-bit integers, or dates, or all those other things that JSON doesn't spec because "complexity". In practice, it just means a proliferation of incompatible ways to represent these things, and utterly insane deserialization behavior in corner cases when implementations try to be "smart" to transparently compensate for JSON lacking something (e.g. https://github.com/JamesNK/Newtonsoft.Json/issues/862).
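The 64-bit integer problem is easy to reproduce. The sketch below uses Python's standard json module, which keeps integers exact, and then simulates a parser that stores every number as an IEEE 754 double (as JavaScript does) by routing integer literals through float. The "id" field is a made-up example.

```python
import json

# 2**53 + 1 fits in a signed 64-bit integer but has no exact double representation.
text = '{"id": 9007199254740993}'

# Python's parser keeps integers exact.
exact = json.loads(text)["id"]

# Simulate a double-only parser by converting integer literals with float.
as_double = json.loads(text, parse_int=float)["id"]

print(exact)           # 9007199254740993
print(int(as_double))  # 9007199254740992
```

The same document therefore yields different values depending on the consumer, which is the kind of incompatibility being described.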

Conversely, if you are writing in a language that has integral support for XML - say, XQuery, or even VB.NET (https://docs.microsoft.com/en-us/dotnet/visual-basic/program...), the complexity is mostly not there. At the very least, if you control the format - which you have to, if you're in a position to decide what to use - then you can certainly stick to the subset of XML that is not anymore complex than JSON.


Remember, we're comparing with XML. There's no dates or numeric types in XML at all. This kind of proliferation is far worse in XML.

Sure, there are corner cases and limitations with JSON. I've never experienced that as a significant issue.

> Conversely, if you are writing in a language that has integral support for XML - say, XQuery, or even VB.NET...

I am not using any of that stuff, nor is there any reason for me to start using it.

> At the very least, if you control the format - which you have to, if you're in a position to decide what to use - then you can certainly stick to the subset of XML that is not anymore complex than JSON.

...or I can just use JSON.


> Remember, we're comparing with XML. There's no dates or numeric types in XML at all. This kind of proliferation is far worse in XML.

XML (or rather, XDM, which is the appropriate level of abstraction to talk about this) has all those things:

https://www.w3.org/TR/xmlschema-2/#built-in-primitive-dataty...

Nor does it require an out-of-band schema - you can slap xsi:type on any element. And you can do that without breaking the data model, because namespaces keep data and metadata unambiguously separate, and code can easily deal with the former while being completely oblivious to the latter, unless it needs it.

JSON also has similar higher-level abstraction layers with more metadata. The problem is that nobody can agree on which one to use, or even whether to use one at all, and most code that's deserializing JSON in the wild is not going to be able to distinguish metadata from data.
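Here is a small sketch of what inline xsi:type annotations look like to a consumer, using Python's standard library. The element names and the tiny converter table are hypothetical, and the boolean handling is simplified (XML Schema also accepts "1" and "0"); a schema-aware XML library would do this interpretation for you.

```python
import json
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# JSON carries basic types in the syntax itself.
parsed_json = json.loads('{"port": 8080, "debug": true, "name": "api"}')

# The XML carries types as namespaced metadata next to the data.
doc = ET.fromstring(
    f'<config xmlns:xsi="{XSI}" xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<port xsi:type="xs:int">8080</port>'
    '<debug xsi:type="xs:boolean">true</debug>'
    "<name>api</name>"
    "</config>"
)

# Code that ignores the metadata just sees strings.
raw = {child.tag: child.text for child in doc}

# Hypothetical, deliberately tiny converter table for this sketch.
CONVERTERS = {"xs:int": int, "xs:boolean": lambda text: text == "true"}


def typed_value(element):
    """Apply the converter named by xsi:type, defaulting to a plain string."""
    convert = CONVERTERS.get(element.get(f"{{{XSI}}}type"), str)
    return convert(element.text)


typed = {child.tag: typed_value(child) for child in doc}

print(raw)    # {'port': '8080', 'debug': 'true', 'name': 'api'}
print(typed)  # {'port': 8080, 'debug': True, 'name': 'api'}
```

The data is readable either way; the type information only matters to code that chooses to look at it.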


XML has string attributes and text content.

Sure, you can add information until you arrive at a point where a string attribute or text content will be interpreted as a certain data type, but XML itself doesn't have it.

> JSON also has similar higher-level abstraction layers with more metadata.

JSON has all the basic data types built right in, there's no need for more metadata to do simple things. There's a reasonable mapping to basic data types and structures for almost any language.

> The problem is that nobody can agree on which one to use, or even whether to use one at all, and most code that's deserializing JSON in the wild is not going to be able to distinguish metadata from data.

...which is generally fine because of the aforementioned mapping. Your JSON library doesn't have to (and shouldn't) do any magic.


That's ridiculous - there are lots of pragmatic xml deserializers available. Not to mention, sometimes you want something like xpath; lack of xpath and lack of validation (schema, whatever) aren't features, they're bugs.

The only real advantage json has when it comes to deserialization is that it suggests to a human reader that keys are like object properties, leading to a very obvious deserialization strategy. But even that's a bit misleading, since json allows duplicate keys, just like xml, so a pragmatic deserialization library is going to make that feature impossible.

Make no mistake - the culture of straightforward deserialization is hugely valuable! But that's because of its history and other human factors more than the language itself.
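The duplicate-key behaviour can be seen directly in Python's standard json module: by default the last value wins silently, and a caller who cares has to opt in to detection. The "timeout" key and the reject_duplicates helper are made up for this sketch.

```python
import json

text = '{"timeout": 30, "timeout": 60}'

# The standard-library parser silently keeps the last value.
print(json.loads(text))  # {'timeout': 60}


def reject_duplicates(pairs):
    """object_pairs_hook that raises instead of dropping earlier values."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


try:
    json.loads(text, object_pairs_hook=reject_duplicates)
except ValueError as error:
    print(error)  # duplicate key: 'timeout'
```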


Fair, but many developers are intimately aware of the shape of that kind of XML. Ever tried making sense of an XML document that tries to express something as complex as a web page, but without the requisite domain knowledge? It's painful.

Slightly less so as a JSON document, IMHO.


I never write html directly. With jsx you can have variables and reuse code.

I think the stigma comes from issues using XQuery etc. Not that they are unusable but they require adjusting your way of thinking.

But XQuery (and XPath, being its subset) is pretty much just pure sequence comprehensions for the XML Data Model. It might have been unusual from a mainstream PL perspective back when it was introduced, but today, when C# has LINQ, and JS developers preach the miracles of map/filter/fold over immutable data structures, I don't think XQuery is all that exotic.
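To show how close a path query is to a comprehension, here is a sketch in Python with only the standard library. The server list is invented, and ElementTree implements just a subset of XPath 1.0, so this stands in for the idea rather than for full XPath or XQuery.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical data, in XML and in JSON.
xml_text = """<servers>
  <server env="prod"><host>a.example.com</host></server>
  <server env="dev"><host>b.example.com</host></server>
  <server env="prod"><host>c.example.com</host></server>
</servers>"""

json_text = """[
  {"env": "prod", "host": "a.example.com"},
  {"env": "dev", "host": "b.example.com"},
  {"env": "prod", "host": "c.example.com"}
]"""

# Path query with an attribute predicate.
root = ET.fromstring(xml_text)
xpath_hosts = [host.text for host in root.findall("./server[@env='prod']/host")]

# The same selection as a comprehension over deserialized JSON.
servers = json.loads(json_text)
comprehension_hosts = [s["host"] for s in servers if s["env"] == "prod"]

print(xpath_hosts)          # ['a.example.com', 'c.example.com']
print(comprehension_hosts)  # ['a.example.com', 'c.example.com']
```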

JSON has no querying system, it can only be accessed by deserializing to an in-memory object or using whatever custom APIs are available (like postgres json functions).

JsonPath is the proposed standard, and it looks pretty much like XPath. And that's before getting into Json Schema.


At least XML has comments and you can simply duplicate the last element in a list without fear of forgetting a comma.

XML has other problems though.


And the problem would not exist if we just used the good ol' s-expressions. Gets all the structural benefits of XML and JSON with few of the drawbacks.

The problem people have with XML is not the syntax. Also, how would you represent (optional) attributes in s-expression? I could think of a couple ideas but none that is nice.

Basic S-expressions also don't distinguish between lists and maps, which is something that turns out to be very convenient in practice. Sure, a map is just a list of pairs - but the deserializer needs to be aware of its meaning to parse it into the appropriate data structure. So you either need a schema even for the most trivial cases, or you need a distinct syntax.

EDN is a better Lisp-flavored candidate, IMO.


Basic S-exp syntax can easily be extended to denote dictionaries. Just like #(...) gives us vectors and #S structures, some #H can provide hash tables.

Of course, but then we're not talking about raw S-exps, but something built on top of them. And then you get something like EDN anyway.

> Of course, but then we're not talking about raw S-exps, but something built on top of them.

I don't understand what this means.


If you really needed, you could just define that the second list element is an attribute list/map, turning <foo bar="baz"><quux /></foo> into (foo ((bar "baz")) (quux)).
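A minimal sketch of that mapping, assuming Python's standard `xml.etree.ElementTree`. The helper name `to_sexp` is made up for illustration; it ignores text content and does no escaping, and it omits the attribute list when there are no attributes, as the `(quux)` in the example above does.

```python
import xml.etree.ElementTree as ET


def to_sexp(elem):
    """Render an element as (tag ((attr "value") ...) child ...)."""
    parts = [elem.tag]
    if elem.attrib:
        # The attribute list becomes the second element of the form.
        attrs = " ".join(f'({key} "{value}")' for key, value in elem.attrib.items())
        parts.append(f"({attrs})")
    # Child elements follow, rendered recursively.
    parts.extend(to_sexp(child) for child in elem)
    return "(" + " ".join(parts) + ")"


root = ET.fromstring('<foo bar="baz"><quux /></foo>')
print(to_sexp(root))
```

Because the attribute list is optional here, a reader of the output would still need a rule for telling an attribute list apart from a first child, which is the kind of ambiguity the parent comment is asking about.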

The problem with XML itself is being unnecessarily verbose (and thus difficult for both human and machine to read) for what's just a way to encode trees. Attributes are arguably XML's self-inflicted gunshot wound in the foot; you mainly need them because of visual noise caused by regular nodes.


I believe that more than being verbose (closing tags for example) it is that the whole spec is enormous, with entities and namespaces making it even more complex. Still we see that some of that is actually needed as various json path/schema projects show.

Mapping a 1-1 s-expression translation onto HTML/XML/JSON/YAML etc. solves nothing. YAML has (had) code execution security problems and parser incompatibility issues. HTML/XML actually need namespaces in a few cases. JSON is abused as a "compile target".

There can be no one format that works for everything. This is why I like TOML, it is really good at what it tries to do and stops there.


There is no need for optional attributes in the first place, because you can always just do this:

  <thing>
     <optattr>42</optattr>
     ...
  </thing>
instead of

  <thing optattr="42">
    ...
  </thing>
Attributes exist in order to create a more compact syntax within XML/HTML tags, in ironic recognition of these notations being horribly verbose.

Many implementations of JSON - especially the ones that parse human input - allow trailing commas for this exact reason.

You may be right, some JSON parsers even support comments. But you never know which parser is strict and which not, because it's not part of the specification.

Why would you limit yourself to those two options? Use TOML or something.

TOML kinda breaks down when you have nested data:

https://stackoverflow.com/questions/48998034/does-toml-suppo...


The answer to the question is "yes it is supported", it doesn't break down at all.

It's even in the example file https://github.com/toml-lang/toml/blob/master/examples/examp...


looks quite ugly. repeating keys... yaml looks better (while it has other problems)

Your link shows nested data structures working? But regardless TOML is a configuration file format (like INI), not a general purpose data structure format (like JSON is).

> general purpose data structure format

General purpose until you need to store dates/times, self-referential structures, enums, etc.
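To illustrate the date/time point with Python's standard `json` module: a `datetime` is rejected by the encoder, so both sides have to agree on a string convention. ISO 8601 is used below as one common choice, not as the only one.

```python
import json
from datetime import datetime, timezone

stamp = datetime(2020, 1, 2, 3, 4, 5, tzinfo=timezone.utc)

# The stock encoder has no JSON type to map a datetime onto.
try:
    json.dumps({"created": stamp})
    error = None
except TypeError as exc:
    error = str(exc)

# Workaround: serialize as a string and parse it back by convention.
encoded = json.dumps({"created": stamp.isoformat()})
decoded = json.loads(encoded)
restored = datetime.fromisoformat(decoded["created"])
```

After the round trip the value is just a string in the JSON document; only the application knows it should be read back as a timestamp.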

json and toml are both serializations of data. I don't think the line you're trying to draw between "configuration file format" and "data structure format" is a well-defined line.


Sure it's a blurry line but some formats are better than others for some specific tasks. JSON isn't the right tool for every job but it's a good enough lowest common denominator for data sharing.

I wouldn't want to use it for config files or passing complex data but then it's not designed for that.


TOML also has more "normal" notations for dicts and lists that are almost the same as JSON.

There is a distinction between markup languages and object notation languages. XML, HTML and Markdown are markup; JSON, YAML, TOML and most others are object notations.

(here my distinction, in general, is if they allow unquoted text)


Shameless plug for a notation I work on: http://treenotation.org

To me it’s an improvement over JSON which is an improvement over XML.


I mean, we're not going back to xml anyhow; and xml is considerably older anyhow, so this is a pretty hypothetical case.

But I'd posit that most - but not all - of the issues with xml config files have little to do with xml per se, and everything to do with the crazily detailed stuff people squeezed in it. You don't have to stick everything in a namespace, and nest even trivial things 3 deep with annotations for obvious types. And if you do those things in json it turns into a mess too.

SOAP is a perfect example of that. But: just because soap is messy doesn't imply xml everywhere has to be that.

But sure, xml has some downsides. Then again, so does json. Oh well!


At least XML has comments.

This is a false choice. These aren't the only two options.

XML does suck, but at least it supports comments.

It's easy to write multiline xml and add comments.

Can't do that with json.


Being able to statically analyze and transform configuration is a benefit for humans.

I don't understand why people think JSON is a bad format for humans. It is very hard to write invalid JSON without noticing, because doing so results in a parse error. Take YAML on the other hand -- simply omitting a dash can result in unintended consequences that will parse just fine:

  some_collection:
    - first_entry
vs

  some_collection:
      first_entry
When you're staring at a non-trivial YAML file, it's very hard to spot these errors, and you'll potentially get some very weird behavior in whatever is depending on the config.
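For contrast, a small sketch of the JSON side of this claim using Python's standard `json` module: a structural slip (here a missing comma) stops the parse and reports a position, rather than loading as a different shape of data. The sample document is made up for illustration.

```python
import json

# Missing comma between the two list items.
broken = '{"some_collection": ["first_entry" "second_entry"]}'

try:
    json.loads(broken)
    message = None
except json.JSONDecodeError as exc:
    # The exception carries the location of the problem.
    message = f"line {exc.lineno}, column {exc.colno}: {exc.msg}"

print(message)
```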

Not as bad as YAML isn’t much of an endorsement.

Usually when people complain about JSON they offer YAML as a better option. This article complains about JSON and offers no alternative at all -- what would you suggest?

I think for most cases, old school Unix style is perfectly adequate. Where it isn’t xml is the least bad alternative (and yaml is probably the worst).

Ever edit a 200 line yaml file? There are worse options than JSON.

I did, and it was mostly painless, both from IntelliJ and from Emacs.

I do know about a number of gotchas, and stepped on some briefly. Yaml is not ideal, but its set of tradeoffs is more suitable for human-editable configs, from my point of view.


I’ve had to debug 600+ line elasticsearch queries in JSON format, while also working with quite long YAML for Ansible playbooks.

It doesn’t matter the language or syntax, if the unit of structure doesn’t fit on the screen, it’s hard to work with.


I never did, what problems did you encounter?

Not the parent commenter, but it really gets tough to figure out _where_ you are in a bunch of nested lists, objects, etc deep down in the yml file. When the schema gets complicated enough you start having to refer back to the documentation to understand what syntax a particular item needs (I'm looking at you, docker compose!)

One example from a larger file:

  networks:
    macvlan51:
      driver: macvlan
      driver_opts:
        parent: eth0.51
      ipam:
        config:
          - gateway: 192.168.51.1
            subnet: 192.168.51.0/24
Maybe I'm just getting older and dumber, but that hyphen under config seems completely arbitrary to me (and the necessity of it is poorly documented in the manual). This gets magnified to the extreme as your config file grows. Maybe this is a list of objects under a key? Who knows.

Edit: OK, it looks like hyphens indicate arrays in the schema but for me it feels like a guess as to which one of these to use for each particular case.


Without the hyphen, config's value would be an associative array. With it, it’s a list.

I’ve been editing long sublime-syntax files recently and I do not like it much. But I’m also not sure if JSON would be better for such a thing.


What's the point of a list of tuples? Order? Json seems more immediately obvious and consistent in this case, i.e. [{a:b},...]

It's not a list of tuples, it's a list of dictionaries.

Ah, but my confusion just deepens my point, haha.

It's mostly a question of what you're familiar with. JSON using [] and {} to denote list vs dict isn't fundamentally easier than "-" and a prefix-less list of key:value pairs. YAML has a bunch of extra complexity, but IMHO this difference is really not it.

From a syntax standpoint, brackets are far clearer than prefixes+whitespace (maybe less readable).

Semantic whitespace is a nightmare. It's also sometimes ambiguous whether things are ints, bools, or strings.

> JSON is a poor format for humans

I have never understood this argument. I'm not saying it's the best, but I find it much much more readable than yml or xml.


I don't feel pain with vscode's settings.json, or .eslintrc for example, because they are autocompletable, allow comments (json5), allow trailing commas (json5), ...

Unfortunately, it's not actually JSON5, it just supports a few features from the spec.

But yes, if every implementation of JSON moved on to JSON5, I don't think we'd be having this discussion.


It would be interesting to try Markdown files for configuration data. Documentation and settings in the one file: structured data like arrays can be stored in md tables.

Interesting idea. I did a quick search and found literate-json on github.

I think it's possible to skip the .md to JSON step?

An example is below, where the values could either be extracted by the application at startup or extracted when the code is built and deployed?

  # My App

  ## Intro about My App

  Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

  ## Prod Server Settings

  | Host                     | Port          |Operating System| Type   |
  | -------------------------|:-------------:|:--------------:|--------|
  | webserver1@myapp.com     | 80            | Debian         |Web     |
  | webserver2@myapp.com     | 80            | Debian         |Web     |
  | db@cloud.com             | 1433          | aws            |Database|
  | webservice.com/api       | 80            | aws            |Database|
  ---------------------------------------------------------------------

  ## Support Information

  Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

  ## Application Roles

  |Name                   | Active Directory Group |Description                              |
  |-----------------------|------------------------|-----------------------------------------|
  |System Administrator   | MyApp.SysAdmins        | Details who should be assigned the role.|
  |Finance Administrator  | MyApp.FinAdmin         | "         "                             |
  |Finance Delegate       | MyApp.FinDelegate      | "         "                             |
  |Finance Analyst        | MyApp.FinAnalyst       | "         "                             |
  -------------------------------------------------------------------------------------------|
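A rough sketch of the "extract values at startup" half of that idea, in plain Python with no Markdown library. The function name `parse_md_table` is hypothetical, it assumes one pipe-delimited table per call, and every value comes back as a string.

```python
def parse_md_table(lines):
    """Turn pipe-delimited Markdown table lines into a list of dicts."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip prose, headings, and stray rule lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue  # skip the |---|:---:| separator row
        rows.append(cells)
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


doc = """
## Prod Server Settings

| Host                 | Port |Operating System| Type   |
| ---------------------|:----:|:--------------:|--------|
| webserver1@myapp.com | 80   | Debian         |Web     |
| db@cloud.com         | 1433 | aws            |Database|
"""

servers = parse_md_table(doc.splitlines())
for server in servers:
    print(server["Host"], server["Port"])
```

Typing (is `80` a number or a string?) and multiple tables per file are left open, which is roughly where a real format would have to start making decisions.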

> JSON is a poor format for humans

Depends on which kind of humans.

For the regular non tech types I agree JSON is poor. A couple of years ago I created a mini spec for a config file so that very low tech paper designers could configure an interactive book.

For programmers, JSON works fine in my experience.


> For programmers, JSON works fine in my experience.

The author mentions this, and I very strongly agree. What JSON significantly misses is the ability to have comments in your configuration files. I'm not going to hold up the Apache config files as an example of amazing config formats, but... here's an example: https://svn.apache.org/repos/infra/websites/cms/webgui/conf/...

That's awesome. It describes options that are available and not enabled, and what they do. Right. In. The. File. Comparing that to, say, webpack config... there's a pile of research needed to figure out what options are even available, let alone what they do.


Yeah, that's the #1 problem with JSON as a config file format, IMO, and the Apache config files, among others, are a perfect example of why.

All the other arguments for or against it boil down to personal preference, but not having comments is a real miss.


Scale is the important thing here. Even if you're a magical superior human programmer variant, json sucks once you cross over 50 lines.

Cloud Formation is a great example of the failures of JSON. Specifying just a single dynamo table with indexes and scaling roles in JSON is a hard-to-read mess. Even with a good IDE and JSON collapse/expand help, large JSON files are just insufferable to deal with as a human.


I'd say if you're crossing over 50 lines, it's going to suck no matter what format you're in. Other solutions are not really going to fare any better than JSON at that level.

XML + XInclude would allow you to break it all into multiple files, for example.

> For programmers, JSON works fine in my experience.

Programmers will maybe cope better but just being a programmer doesn't make you magically immune to poor readability and lack of comments.

Most of the points in that article apply to programmers as much as non-programmers.

(I agree there is a reasonable debate to be had about programmability - I'm on the fence about this one)


We've been using HOCON for some configuration files. JSON but with comments and a few other more human friendly bits.

I don't hate it, but I really dislike it. It's definitely better than JSON for this purpose, but it's still quirky and a bit of a pain.


I feel like JSON is a decent format for storing human-readable configuration data.

The use of brackets means if something is broken, you'll know explicitly because your IDE will throw a fit. Formats that are white-space dependent can have subtle errors that aren't easily recognizable on first glance.

The 2 problems with JSON are multiline string support and comment support, but JSON5 solves both of those problems. Sure JSON5 isn't JSON, but YAML isn't JSON either. No matter what you're going to need a library to parse a config file, so why not stick to something based on a widely used data-storage format?


While there are no explicit comments in JSON, you can just add a superfluous string:

{ "useFluxCapacitorComment": "Enables time travel. true or false", "useFluxCapacitor": true }

I don't mind the formatting, but I'm a professional developer, so my tolerance for complex UIs is abnormal.

Written on mobile, so that JSON may look like garbage.
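A sketch of how a consumer might tolerate that convention, assuming the application controls its own loader. The `Comment` suffix and the helper name are illustrative, not an established standard, and only top-level keys are filtered.

```python
import json

raw = (
    '{"useFluxCapacitorComment": "Enables time travel. true or false", '
    '"useFluxCapacitor": true}'
)


def load_without_comments(text, suffix="Comment"):
    """Parse JSON and drop top-level keys that end with the comment suffix."""
    data = json.loads(text)
    return {key: value for key, value in data.items() if not key.endswith(suffix)}


config = load_without_comments(raw)
print(config)
```

This only works when the program reading the file cooperates; a consumer that validates keys strictly will still see the extra entry.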


Often not an option:

E.g: npm/yarn will complain loudly if you add unexpected lines in package.json IIRC.


I have seen people using additional attributes for comments. That does influence the runtime, which comments should never do, but I thought of it as a practical solution. Not optimal of course.

Though not all JSON parsers accept random attributes. The AWS CloudFormation one, for example, is extreme in its refusal to accept "malformed" JSON. It really frustrates me sometimes.

> Comments are mostly useless in a machine-generated and machine-consumed file. They are indispensable for humans, because they are a link to what machines are not doing.

But comments are often a code smell: a crutch for poor identifier naming and design. If your names and design are well thought out you rarely need comments, and the consumer of your code should be able to understand it using the statically checked source alone, versus manually having to parse your unchecked human-language documentation.


Comments are a code smell when they describe things that should be described in code.

Comments are important when they express things that code cannot.

Examples:

// See ticket XYZ-123 for discussion and rationale.

// We are sidestepping a bug in Docker 3.x here.

// TODO: Allow queue size per service version. Currently we're using max size for all.


Great examples. I wonder though if all those could be better expressed using something like decorators in order to enforce some static checks. For example, instead of a free-form comment, have a structured todo notation that makes the todo parsable and checkable. Often I see a //todo comment that is still there by accident after the thing is done. Or a todo that has since been abandoned.

Edit: adding a link to the Clean Code bit about comments, which I think is great advice : https://gist.github.com/wojteklu/73c6914cc446146b8b533c0988c...
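One way such a checkable todo could look, sketched in Python. The `TODO(TICKET, until=YYYY-MM-DD): text` notation is invented here purely for illustration; the checker just reports entries whose date has passed.

```python
import re
from datetime import date

# Hypothetical notation: TODO(XYZ-123, until=2020-06-01): free text
TODO_RE = re.compile(
    r"TODO\((?P<ticket>[A-Z]+-\d+),\s*until=(?P<until>\d{4}-\d{2}-\d{2})\):\s*(?P<text>.*)"
)


def expired_todos(source, today):
    """Return (line number, ticket, text) for structured TODOs past their date."""
    found = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = TODO_RE.search(line)
        if match and date.fromisoformat(match["until"]) < today:
            found.append((lineno, match["ticket"], match["text"].strip()))
    return found


sample = """\
// TODO(XYZ-123, until=2019-01-31): Allow queue size per service version.
// TODO(XYZ-456, until=2030-01-01): Remove the Docker workaround.
// TODO: free-form note, ignored by the checker
"""

stale = expired_todos(sample, today=date(2020, 1, 1))
print(stale)
```

Run in CI, something like this could fail the build on stale entries, while free-form comments stay available for the things that cannot be checked.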

From the link, about comments:

  Use as explanation of intent.
  Use as clarification of code.
  Use as warning of consequences.
Here's a link to a well-known recent book, half of which is pretty much about why those points are a real and frequent thing, and why code should be much more thoroughly commented than is recommended by the usual philosophies.

https://www.amazon.com/Philosophy-Software-Design-John-Ouste...


In a JSON config comments can be very useful to "hide" disabled options

["DoStuff", "EvenMore"/*, "ops go back :("*/]


> If your names and design are well thought out

Sure, but how often does that happen?


It usually happens in good code if the language supports it. But I agree, not enough and comments are great as a stopgap until the thing you need to explain can be explained in the programming language. But sometimes that can’t be and so you do need to resort to comments. It’s like types in doc comments being used as a stopgap until a language added types. I hypothesize (I plan at some point to crunch some data here) the more comments the worse the language, and so you’d see a pattern like Python 3 code having fewer comments on average than Python 2 code.

JSON is data, not code.

Anybody that wants turing complete programmability in their config file is asking for pain. You can always embed such a language in a string as an escape hatch, so there's no loss of generality in any case, but it's sufficiently painful to encourage simplicity and security. But there exist computation models weaker than turing completeness that are still useful, some of those might be reasonable in a configuration file format. For example, you might want to specify a bunch of things iteratively, or perhaps with some fixed number of temporary variables; that's merely an FSM. Or even simply plain non-iterative logic with parameter passing, which is even easier to reason about and possibly still useful; i.e. re-usable templates without iteration or recursion.

I can at least imagine such a configuration file format could be both simple enough for human and machine to reason about, and flexible enough to add value over (say) json. But the devil is in the details; and it's not always immediately obvious what's too complex.

In any case, json5 looks like what json should have been from day 1; several of the really critical improvements were common JS practice when json was invented.

I'm a little worried about the IdentifierName extension, since that's more than what javascript itself allows (no reserved keywords!).

But json5 is also an interoperability problem. The nice thing about a (good) standard is that if it works someplace, it'll work anywhere. Add json5 into the mix, and that's not the case anymore.

Then again, there aren't a huge number of plausible successors to json, so maybe this is the best we can hope for. And it's trivial to down-convert to plain json, which means it's not too bad, I guess?
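To give a feel for the down-conversion, here is a sketch in Python that handles only two JSON5 features: comments and trailing commas. It is not a JSON5 parser (no single-quoted strings, unquoted keys, hex numbers, and so on), and the function name is made up for this example.

```python
import json


def strip_comments_and_trailing_commas(text):
    """Remove // and /* */ comments and trailing commas outside strings."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(text[i + 1])  # keep the escaped character verbatim
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif text.startswith("//", i):
            while i < n and text[i] != "\n":
                i += 1
            continue  # the newline itself is kept by the next iteration
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
            continue
        elif ch in "}]":
            # Drop a comma that was emitted just before this closer.
            j = len(out) - 1
            while j >= 0 and out[j].isspace():
                j -= 1
            if j >= 0 and out[j] == ",":
                del out[j]
            out.append(ch)
        else:
            out.append(ch)
        i += 1
    return "".join(out)


text = '''{
  // enable time travel
  "useFluxCapacitor": true,
  "url": "http://example.com/*not a comment*/",
  "modes": ["DoStuff", "EvenMore" /*, "ops go back" */,],
}'''

config = json.loads(strip_comments_and_trailing_commas(text))
print(config)
```

The string-awareness is the part that makes a regex-only approach risky: `//` and `/*` inside a quoted value must survive untouched.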

I wish json5 simply worked everywhere ;-).


Dhall is a good example of a "programmable" configuration file format. It's based on System F, which means you get no-nonsense abstraction (in the LC sense) without any pain. It really helps to be built on something conceptually robust instead of ad-hoc whims. Oh and it has a static type system. Quite nice!

I find HOCON to be a nice minimal extension to JSON - allows config values to reference each other, with some basic string and array functions included, and allows successive files to append to or override existing config values. Fills 90% of what I see people asking for in terms of programmability.

https://github.com/lightbend/config/blob/master/HOCON.md#goa...


It's hard to believe json was standardized without comments and everything else json5 fixes.

TOML is fine.


It's easy enough to get Turing complete config files with LISP

shivers


Author of the article here; I think this is a complex issue with no single "right" answer. It depends on the context and what you're doing with it.

I wrote this after getting annoyed with MediaWiki's new extension format which uses JSON files to set up the extension. What I wanted to do is not include some CSS if a certain plugin is loaded, and that's pretty much impossible to do in a declarative configuration file. Using a JSON file – or any other declarative config – to describe how code works is not a good idea IMHO (although it's probably fine to add basic metadata).

The worst case of declarative programming gone wrong that I've seen is probably k8s.

I can't really judge your Gradle example, as I've never worked with Gradle (or Java in general).

> I'm getting a bit fatigued of these you're-doing-it-wrong posts

I agree, and I would rename the article if I hadn't put the full title in the URL (I have since stopped doing that exactly because I wanted to rename this). In my defence, at the time I was annoyed with MediaWiki, PHP, this JSON stuff, and the reasons I even had to work on MediaWiki in the first place (which is a long story involving a lot of drama).

That being said, I think using JSON as a human-editable configuration format is, quite frankly, never a good application of the tool. I can't think of any cases where it's the best tool, chiefly due to the lack of comments, but also because it's kind of a pain to write. Variants which allow comments, such as JSON5, are okay, but that's no longer JSON.


You can have declarative configs that are programmable, these are not exclusive.

> You can't even figure out what the dependencies are going to be for some Gradle projects without executing them

This sounds like an issue of Gradle, not of programmable configs in general.


It's not an issue of Gradle, it's a feature of Gradle. Why would you want to figure out the dependencies without executing script? Programmatic configuration allows e.g. specify some library version once (e.g. springVersion) and use it for 10 dependencies as a variable. And that's the simplest use case. A full-featured programming language allows for a lot of flexibility.

Not every config file needs that, but a config file for a build system definitely needs that. Otherwise your build system becomes too restricted and you must either write your logic in plugins or use other scripts along with your main build tool. Both options are bad.

I'm not even sure that build.gradle should be called a config file. It's a build script.


> Why would you want to figure out the dependencies without executing script?

Originally, the whole point of using configuration files for builds was to eliminate the programmatic element where you used lots of little scripts to run builds. Config files are easy to read and reason about.

> Programmatic configuration allows e.g. specify some library version once (e.g. springVersion) and use it for 10 dependencies as a variable

Declarative syntax can also include variables which can't be reassigned to later.
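A sketch of what "declarative with variables" could mean in practice, using Python's standard `string.Template`. The config layout, the `org.example` coordinates, and the version number are all invented for illustration; the point is that the tool can resolve the references by plain substitution, without running user code.

```python
from string import Template

# Purely declarative data: variables are defined once and only referenced.
build = {
    "variables": {"springVersion": "1.2.3"},
    "dependencies": [
        "org.example:core:${springVersion}",
        "org.example:web:${springVersion}",
    ],
}


def resolve(config):
    """Expand ${name} references using the config's own variables table."""
    variables = config["variables"]
    return [Template(dep).substitute(variables) for dep in config["dependencies"]]


resolved = resolve(build)
print(resolved)
```

Since there are no loops, conditionals, or reassignment, a dependency scanner can compute the same list the build tool would.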

Gradle's original marketing message was based around its syntax, i.e. Apache Groovy being easier to read than XML, which is true. But providing the programmatic ability of Groovy as well is a step backwards for configuration best practice, ultimately resulting in an increase in technical debt. Thankfully, almost all the Gradle build scripts out there are simple 20-liners which don't use programmatic logic.

Gradle ought to follow the example of Jenkins pipelines, which later provided a Declarative Syntax as an alternative to Groovy to mitigate this very issue.


> Why would you want to figure out the dependencies without executing script?

For the same reason that I would want to figure out if my script type-checks without executing it - making it easier to avoid bugs and reason about the program.


Why does every piece of advice need a counter pitch of a better idea? If I tell you “don’t stick your arm in a wood chipper, even if it’s not running”, I’m hoping you’re not going to be like “well then what should I stick my arm into?!” Presumably he didn’t give prescriptive advice because configuration isn’t one size fits all.

Worth mentioning that VS Code also lets you map any filename glob to a JSON schema. [1]

1: https://code.visualstudio.com/docs/languages/json


> "Lack of programmability" is a feature

if you ever have a non-programmable config file, then at some point someone will make a turing-complete programming language which goes on top of your tool.


Or, if it's open source or an interpreted language, they could edit the code of the tool to implement whatever new features they want.

> executable-program-as-config I learned to hate in the JVM world

I don't understand how anyone who ever used gradle or even maven and went somewhere else can actually say that

Gradle is simply amazing and I literally don't know what could be better about it. The link posted is so trivial compared to its power, it's like criticizing a bullet train for the pattern on the seats.


JVM world had XML before JSON became the rage.

- parsing xml may cause random network requests (firewall issues)

- xml contains comments, sooner or later someone will try to add data to comments, but you already have the thing for data, the format itself, adding data to comments is stupid

- unlike js, in json someone finally said "Enough! There will be only one type of quotes", it makes parser simpler


> "- xml contains comments, sooner or later someone will try to add data to comments, but you already have the thing for data, the format itself, adding data to comments is stupid"

That's easily the worst argument against comments I've ever heard! Would you argue against comments in code because some jackass could abuse it to insert custom compiler directives or something? Ridiculous. The good that comments do easily outweighs any harm some moron is going to do to himself by parsing comments for data.

I don't like XML any more than the rest of you, but praising json for not having comments just smacks of sour grapes.


The intentional lack of comments cements json's position as a (human readable) data interchange format. It's not designed to be anything else. Which is fine. There's nothing wrong with a file format being focused on one goal.

Indeed.

And the point of the article is that JSON is fine for data, but configs are not data. Configs are code.


Lack of comments just makes it a data interchange format. It’s comments which make things human readable.

But you don't _need_ to abuse comments for extra (or meta-) data if you're using XML, you just use a separate namespace.

It's definitely the case that XML has features that aren't needed -- or at least, we've survived without them thus far -- but there definitely feels like more than just a smidge of only realising later why some of that complexity was present. See also: npm vs pretty much any other dependency management system.


> parsing xml may cause random network requests (firewall issues)

inability to properly use a library should not count against it


Java is a domain specific language for converting XML files into stack dumps.

Maybe it’s meant to be a feature, but instead what happens is you end up with declarative configs absolutely littered with custom metadata and huge and massively complex custom engines to decipher them, where it would have been simpler to include it in the config. They suck.

VSCode configs are a bad example to advocate JSON. They embed a custom DSL into JSON, as using JSON structures to express the same things would be too verbose, I presume.

JSON5 makes things better, but it still requires too much syntactical noise and multi line string values are still not there.


> "Lack of programmability" is a feature.

True... for end users, but absolutely not for other developers. I often see this mentality used as a crutch to baby developers from having to write original code.


Lack of comments was the reason I never liked JSON for configuration. JSON5 looks awesome.

So what's the author's proposed alternative?

IMHO JSON5 (JSON with comments basically) is fine for config files, VS Code is using this and they seem to be doing alright.

Addressing the article's points:

- Lack of comments: not an issue with JSON5.

- Readability: I think it's pretty readable. Are the quotes, semicolons and commas really an issue? Not for me.

- Strictness: just edit it with an editor that points at the errors quickly. Escaping quotes might be annoying, but I can't think of a possible alternative format where escaping isn't required at all.

- Lack of programmability: I've never had any problem with this, and again pretty complex and customizable applications like VS Code seem to be doing fine without this. This may even be considered a feature, as it forces you to keep the configuration well separated from the actual logic.


Personally I'm partial to the classic Windows-style INI file.

Because it was designed to be a human-editable configuration file format, it has a simple and intuitive format that is easy for both humans and machines:

  [Section]
  ; comment
  Key1=Value
  ; comment
  Key2=Value
  [Section2]
  Key1=Value
It is also easy for machines to non-destructively edit, maintaining existing comments and formatting by humans, and applications take advantage of that. That is a big plus in my opinion.
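For reference, the snippet above reads as-is with Python's standard `configparser`, which treats `;` lines as comments by default. This is only a reading sketch: `configparser` discards comments when writing a file back, so the non-destructive editing described here would need a different library or a hand-written editor.

```python
import configparser

text = """\
[Section]
; comment
Key1=Value
; comment
Key2=Value
[Section2]
Key1=Value
"""

parser = configparser.ConfigParser()
parser.optionxform = str  # keep key case; the default lowercases option names
parser.read_string(text)

sections = parser.sections()
value = parser["Section"]["Key2"]
print(sections, value)
```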

TOML is arguably an evolution of ini format.

The problem with ini is that it has no good story for arrays or associative arrays. TOML attempts to fix this, but it provides an ugly format for arrays and associative arrays.

I personally find the TOML solution quite nice. In part because you can always drop to JSON syntax to avoid excessive nesting of array tables.

It is pretty limited and the worst part is quoting. What if your value contains \n? What about UTF8?

Horrible for: UTF8, syntax checks, numbers (albeit JSON has this problem too for floats), deep nesting, arrays, strings as keys, consistency between different parsers, etc etc

Good for: comments, readability if keys limited, familiarity.


UTF-8 is only a problem if you're using the Windows implementation. I don't see what the problem with numbers is, either, pick a representation that suits you and use it.

I agree, but my second choice if INI doesn't suit someone is just the standard app.config in Windows (trivial to interface with in the .Net space), or storing the values in a table. I'd prefer any of these before JSON or the other alternatives.

No standard = deal breaker.

Comments might work in one parser but not in another. What's the point when you can do better with TOML?


> - Lack of comments: not an issue with JSON5.

Yeah, but now you have to find a tool that supports that. Easy on small projects, but try being at a company supporting many languages and frameworks.

> - Readability: I think it's pretty readable. Are the quotes, semicolons and commas really an issue? Not for me.

If you haven't tried TOML, you're missing out. Why on earth would you put up with syntax (hard to author, at that) that isn't necessary? Syntax distracts from the actual problem that this is trying to solve. JSON is fine for machine read/write, but horrible on humans.

> - Strictness: just edit it with an editor that points at the errors quickly. Escaping quotes might be annoying, but I can't think of a possible alternative format where escaping isn't required at all.

What if you have an issue in prod and need to fix something quick? You might only be able to open it up in a stripped-down vi, or if the machine is locked down you might only be able to materialize a plain text view. What if not everyone uses the same tool? This imposes a further constraint. Yet more concessions pointing to how horrible JSON is for humans.

> - Lack of programmability: I've never had any problem with this, and again pretty complex and customizable applications like VS Code seem to be doing fine without this.

Can you imagine projects where this might be an issue? These aren't the only codebases in the world. And just because VSCode appears to be doing fine doesn't mean they don't second guess that decision. They might just be locked in at this point.


Most languages have JSON5 libraries just as they have TOML libs.

TOML is decent but if you need nested data it gets super awkward.

> What if you have an issue in prod and need to fix something quick? You might only be able to open it up in a stripped-down vi, or if the machine is locked down you might only be able to materialize a plain text view. What if not everyone uses the same tool? This imposes a further constraint. Yet more concessions pointing to how horrible JSON is for humans.

JSON5 is pretty easy to edit correctly with any text editor. Not much harder than TOML.

> Can you imagine projects where [lack of programmability] might be an issue? These aren't the only codebases in the world. And just because VSCode appears to be doing fine doesn't mean they don't second guess that decision. They might just be locked in at this point.

Does TOML allow programmability that JSON5 doesn't?


This is the first time I've read about TOML. After checking out the syntax, I don't like the way you declare nested data, because if you need to move one block of data to a different parent you need to rewrite the full path of each table. If done wrong, this can lead to errors that are very difficult to find. OTOH, it can be easier to send partial information: just append this block to your config and you are done.

How do you tell if a file is json or json5? Feels better to forget about json for config and go with the obvious TOML, which has a single file extension and a single standard.

If you're trying to parse it, you don't need to tell. Just parse as JSON5, it's a strict superset.

And if you're editing a file, you can usually tell by the fact that there are comments in it. :)


The official json5 site does mention using the .json5 extension to be obvious, but I bet people will just use .json, which will be confusing if the file is passed to another party and they get a parse error until they realize it's json5. I think the only downside to json5 is that they inherited the name json instead of coming up with a different name.

While J4X would simply let you specify the version number in the DTD, of course! ;)

https://news.ycombinator.com/item?id=19656646


Generally, you swap out the config file at build time if that's a requirement. Just as a few examples - you can set up a dev/demo/prod(whatever you call these where you work) config file in .NET Core, Docker/K8S, Angular, etc. As far as I know, this is considered best practice.

There are many different needs.

  - static config
  - environment specification
  - runtime feature flagging 
  - semi-static control plane config 
  - rollout weights
Etc.

All of these have different characteristics.

Infrequently changed, a priori config that doesn't need to be changed in emergencies is suitable for baking in at compile time.


> What if you have an issue in prod and need to fix something quick?

Just fix it quick somewhere that’s not prod. Snowflaking has not been a good practice for a long time now, so arguing that a certain tool makes it easier is a bit of a moot point.


I'm not sure what the author would suggest but I'm increasingly of the opinion that config file formats should be treated as a compile target not a file that you edit.

There are formats that are easier to edit than others and formats that are easier to parse than others. But sooner or later all of them end up being created by an ill-suited template engine or embedding a programming language in the syntax (cloudformation, ansible, hcl2).

Why not treat the config file as a generated artifact that is readable but not written by a human? Tools like jsonnet (https://jsonnet.org/), dhall-lang (https://dhall-lang.org/), or my personal side project ucg (https://ucg.marzhillstudios.com/) allow you to reuse values and safely generate your configuration formats with comments and logic, all while bypassing the pitfalls of most formats.

They give you a purpose built language for creating configurations with comments and logic w/o putting that burden on your application later. And it allows you to ignore the parts of a particular format that cause you problems.


Create a good config file format and transpile that to a bad config file format for use in an application? I dunno I would rather just have the application use a decent config file format in the first place.

It's just a config file, no need to overthink this with translation layers IMO.


No good config file format will allow you to share common shared values across all of the formats. That database host everything uses? You'll have to copy paste it everywhere or use a template engine to generate all of the formats.

If you only have one application then it's overkill. When you have 10's of applications it's a life saver.
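
A rough sketch of the "one source of truth, many generated formats" idea, using only Python's standard library rather than jsonnet, Dhall or ucg. The `SHARED` dict, the host name and the two target applications are made up for illustration.

```python
import configparser
import io
import json

# Hypothetical shared values; in practice these would live in one reviewed file.
SHARED = {"db_host": "db.internal.example.com", "db_port": 5432}


def render_json(shared):
    """Emit the JSON config that a (hypothetical) web app expects."""
    doc = {"database": {"host": shared["db_host"], "port": shared["db_port"]}}
    return json.dumps(doc, indent=2)


def render_ini(shared):
    """Emit the INI config that a (hypothetical) batch job expects."""
    parser = configparser.ConfigParser()
    parser["database"] = {"host": shared["db_host"], "port": str(shared["db_port"])}
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


web_config = render_json(SHARED)
batch_config = render_ini(SHARED)
```

Changing the database host then means editing one value and regenerating, instead of hunting through every application's config.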


Good point. However hopefully whatever brilliant file format the source of truth is in will eventually end up being able to be consumed natively by the other applications and we can forgo the translation layer. Until then yes you have a good use case there.

> I'm increasingly of the opinion that config file formats should be treated as a compile target not a file that you edit.

I think this makes sense as far as it goes.

The first choice, no matter what the config file format, should be editing within a tool. The tool can check values, be accessible, assist in coordination, etc.

However, I think it's essential that the intermediate format be human readable and have the ability to contain comments. Because if something in the toolchain goes wrong, determining what the settings are right now on a particular system will be very useful. Being able to modify those settings by hand, though not your first choice, may save time and money and (depending on the system) lives.

Also, if the tool breaks, you can still change configs (though obviously this is not what you want).

Essentially, you should have a tool that edits config values with full explanations of their purpose. That tool should output a machine-readable format (JSON5, YAML, TOML) that supports comments with each value preceded with the comments containing their explanations and, ideally, a list of comments made when changing config values with previous values listed.

You get the best of "both worlds."


There are cases when JSON is really unreadable. Multi line strings, strings with lots of escaping, etc. There are a lot of other formats that solve these problems (eg YAML, which certainly has other unrelated problems).

There are trade offs with the various formats. Think about your use cases and pick the one that fits them the best.


> So what's the author's proposed alternative?

I am the author. It depends on your purpose; in general, I would say "the simplest solution that meets all needs", where "all needs" depends on what you're doing. In Python, this could be the configparser module, or parsing a Python file (again, depending on requirements).
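
For the configparser route, a minimal sketch could look like the following. The section and key names are invented for the example; the point is only that comments and typed getters come with the standard library.

```python
import configparser

SAMPLE = """
[server]
# 8081 rather than 80 because the load balancer owns port 80 (made-up reason)
host = example.com
port = 8081
debug = no
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

host = parser.get("server", "host")            # plain string
port = parser.getint("server", "port")         # converted to int
debug = parser.getboolean("server", "debug")   # "no" becomes False
```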

As a matter of taste, I prefer minimal syntax. I wrote a library with this syntax a while ago[1], although it's clearly not the best solution for every case.

I should probably write a separate article about this.

[1]: https://github.com/carpetsmoker/sconfig#what-does-it-look-li...

> Lack of comments: not an issue with JSON5.

JSON5 isn't JSON though. It's JSON5. None of the standard JSON parsers can deal with JSON5, you need a different JSON5 parser.
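
To make that concrete with one standard parser: Python's built-in `json` module rejects both comments and trailing commas, so a file that uses those JSON5 features needs a separate JSON5 library. A small check, where `accepts` is just a helper written for this example:

```python
import json


def accepts(text):
    """Return True if the stdlib JSON parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


plain = '{"port": 8081}'
with_comment = '{\n  // why 8081?\n  "port": 8081\n}'
with_trailing_comma = '{"port": 8081,}'

results = {
    "plain": accepts(plain),
    "comment": accepts(with_comment),
    "trailing_comma": accepts(with_trailing_comma),
}
```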

> I think it's pretty readable

A matter of taste.

> just edit it with an editor that points at the errors quickly

It seems to me that having to rely on such tools for issues such as this is not a great idea, especially not if it's easy to do better.

> I can't think of a possible alternative format where escaping isn't required at all.

There is a difference between "escaping isn't required at all" vs. "having to constantly and awkwardly escape a commonly used character".

> Lack of programmability [..] VS Code seem to be doing fine without this.

I've not used VSCode, as I'm a Vim guy. I can't imagine configuring Vim with just declarative configuration as so many useful parts come from being able to program Vim. This could be separated out to just an "extension" or "plugin" system, but I really like being able to just stick one or two lines of code in my vimrc and have it do something really useful, without writing a full plugin.


For my tastes, sconfig is the opposite of human readable. IMO, JSON is good enough for the average cases of configuration files.

I'm curious why you would consider it to be "the opposite of human readable"? It has a minimum of syntax to parse, so at least in my view that should make it easier, rather than harder?

So many options out there, just look for config formats on github.

TOML is my personal favorite.


Agreed. I feel like everyone uses YAML, I hate it.

YAML feels like 10 steps back; no parser I've ever used has been able to accurately give me a line number for an error, white space issues make it impossible to work with without special editor features, and some of the types are just cryptic. Most of my config editing happens in vim over SSH, and it can be challenging at times.

TOML resolves a lot of these issues in my experience, but I'd still argue that, by virtue of its capability, it is still quite complex when trying to define configs for new applications.


And TOML does not have the security code execution vulnerability that is in YAML.

I once tried to use YAML as my format of choice. Then I read the spec. YAML: Not even once.

> TOML resolves a lot of these issues

The only complaint is that the name TOML isn't as cool as other formats.


Came across a pain point with TOML yesterday in use.

When trying out a CLI I've been writing recently, this time on Windows... the TOML config file (generated on first run) wasn't usable.

The problem turned out to be TOML not handling single slashes "\" in strings, instead treating them as escape characters.

Annoying as on Windows the paths are (eg): "C:\Program Files\stuff". Which then needs changing to "C:\\Program Files\\stuff" to work ok.

I can see both good and bad points of TOML's string handling there. But it's annoying it needs a workaround for one of the signature aspects (slash vs backslash) of windows. Especially when paths are very common in config files, regardless of OS. :/


Just use 'literal strings' instead of "basic strings".

TOML has two types of strings, each of which can be single line or multi line [0]. Basic strings allow escape characters, while literal strings do not (so there's no way to represent an apostrophe in a single line literal string). Multi line strings are like Markdown's code blocks:

    """
    This is a multi-
    line basic string.
    C:\\Program Files\\stuff
    """
    
    '''
    This is a multi-
    line literal string.
    C:\Program Files\stuff
    '''
[0] https://github.com/toml-lang/toml/blob/master/README.md#stri...

Thanks heaps, that looks like exactly the right solution. :)

Definitely

In cases where the file is written and read by the program, without the expectation that the user will typically edit or read the file by hand, sqlite is something that should be considered.

A sqlite file isn't an ascii text file that you can edit with vi, so that suggestion is likely to get a lot of jeering, but it's a serious suggestion. It's already being used for configuration on all mobile platforms. There exists a wide range of applications, particularly for windows users, where the common user is not a developer and is not meant to be editing configuration by hand.

Furthermore, access to the sqlite3 binary is very nearly as widespread as access to a vi binary. And self-styled developers who don't know SQL aside, it actually is a rather convenient format to edit by hand. Machine transformation of the config is obviously straightforward, and sqlite databases tend to be self-documenting anyway. Sane table and row names go a long way (obfuscated tables and rows would be roughly equivalent to obfuscated keys in an ini file...) and if that's not enough, comments for tables and rows can be stored in the schema.

The one downside is strictness; sqlite doesn't enforce types. It does however have data constraints, which help.
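
A small sketch of both halves of that point with Python's `sqlite3` module: the loose typing, and a CHECK constraint that tightens it back up. Table and column names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Loose typing: an INTEGER column will happily store text that is not a number.
conn.execute("CREATE TABLE loose (name TEXT PRIMARY KEY, port INTEGER)")
conn.execute("INSERT INTO loose VALUES ('http', 'eighty')")
stored_type = conn.execute("SELECT typeof(port) FROM loose").fetchone()[0]

# A CHECK constraint restores the validation.
conn.execute(
    """
    CREATE TABLE checked (
        name TEXT PRIMARY KEY,
        port INTEGER CHECK (typeof(port) = 'integer' AND port BETWEEN 1 AND 65535)
    )
    """
)
conn.execute("INSERT INTO checked VALUES ('http', 8081)")
try:
    conn.execute("INSERT INTO checked VALUES ('bad', 'eighty')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```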


https://medium.com/@robmuh/yaml-has-won-ba5dae37e740

Seems like YAML won.

But TOML seems to be heavily used in Rust.

Also there is SANE.


I don't care if it has "won" in a few places, I'll be using TOML and mindshare can change who has "won" later.

In the Scala world, HOCON addresses a lot of these (except programmability, which I think is probably a bad idea).

The only config system I've worked on that ever felt even vaguely satisfying was built around the excellent Typesafe (now Lightbend) HOCON implementation. The fact that HOCON is a superset of both JSON and properties formats made adoption in a large org feasible since it could "just work" with the many already existing config files in those formats. While generalized programmability has obvious dangers, HOCON's ability to do variable substitution was super useful for the large amounts of config that was supposed to follow a convention, e.g. hostnames based on a template like

  ${service}.${env}.example.com
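
This is not HOCON itself, just the same substitution idea sketched with Python's `string.Template`, which happens to use the same `${name}` placeholder syntax. The service and environment values are made up.

```python
from string import Template

HOST_TEMPLATE = Template("${service}.${env}.example.com")


def hostname(service, env):
    """Fill in the conventional hostname for a service in an environment."""
    return HOST_TEMPLATE.substitute(service=service, env=env)


prod_api = hostname("api", "prod")
staging_db = hostname("db", "staging")
```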


Whole bunch of other ports into different languages too: https://github.com/lightbend/config#other-apis-wrappers-port...

Thanks for pointing me to that - I'm working in Scala on server-side and JS on browser-side for a couple of years now, and sharing config would be nice. Still need object merging and value concatenation, which hocon-js doesn't support, but I'll keep an eye on it!


It has a few features that feel like programmability but really aren't, and I think these address most of the use cases that people want a Turing complete language for.

I have a proposal that I'm sure will make EVERYBODY happy by combining the best of both worlds!

How about "J4X" (JSON for XML), that lets you embed XML in JSON the same way the ever-popular "E4X" (ECMAScript for XML) lets you embed XML in ECMAScript?

https://developer.mozilla.org/en-US/docs/Archive/Web/E4X

J4X Example:

    <!DOCTYPE j4x PUBLIC "-//W3C//DTD J4X 1.0 Transitional//EN" "http://www.w3.org/TR/j4x/DTD/j4x-1.0-transitional.dtd">
    {
        april: <fools late="13 days"/> <!-- sorry! -->
    }

>Escaping quotes might be annoying, but I can't think of a possible alternative format where escaping isn't required at all.

One nifty way of doing this is custom quote sequence.

In C++ it works like R"delimiter( raw_characters )delimiter"

As delimiter can be anything it can be chosen not to conflict with the payload.

Bash has had the same trick for here documents for a long while.


The alternative is javascript. Javascript gives you json with comments and programmability. And where possible .env.

A lightweight ini reader in js I wrote a few years back: https://github.com/cmroanirgo/jsinf

Yes - I have seen at some jobs what a horror "configuration files" can be if you let people add programmability into them.

Many years ago I worked at a job that made an XML tag-based language that was turing complete with the intent of it making their configuration "smarter". It ended up making the entire project much more difficult to debug, trace through, and even understand at all.


The lack of comments is a problem. The rest seems whiney to me, but sometimes I need to leave notes for other developers as reminders.

I formalize comments and keep them themselves as data. I don't use JSON for this, but it'd work out as a perfect solution here. There are also good reasons to do it this way. For instance the formalized comments can work as a tool tip for a user configuration UI. Another reason is to do something like automatically produce source code from the config file with the comments in a format such that they work with intellisense. The produced source code now also can trivially reproduce a config file with appropriate defaults. Lots of perks.

Wow, just looked at JSON5 and it's everything I could've ever hoped for! I remember distinctly looking for a JSON alternative that supports comments, trailing commas and single quotes. This does all that and more. Now how long until it becomes standard (supported by python's default json library, browser's JSON.parse, etc)?

Somehow not widely spread. I can't believe json was standardized with all those shortcomings.

Perhaps HCL[0] is worth a look. It has comments and no need to quote keys. I ported it to C++ and Lua for a project and it gets the job done for non-dynamic configs.

[0] https://github.com/hashicorp/hcl


Take a look at HJSON.

I agree that HJSON is better for config files than JSON5, because of its more relaxed syntax requirements that make adding and editing values quicker.

Link for HJSON: https://hjson.org/

Compare to JSON5: https://json5.org/


> a possible alternative format where escaping isn't required at all.

HEREDOC-like format:

#qweqwe123" string where everything is unquoted and is terminated by a " followed by qweqwe123. you can choose a short relevant sequence of alphanumeric chars as needed"qweqwe123#
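
A toy parser for that kind of custom-delimiter string could be sketched like this. The exact grammar (`#TAG"` to open, `"TAG#` to close, with an alphanumeric tag) is only my reading of the example above, not an existing format.

```python
import re

# #TAG" ... "TAG#  where TAG is any run of ASCII letters and digits.
_RAW_STRING = re.compile(r'#([A-Za-z0-9]+)"(.*)"\1#', re.DOTALL)


def parse_raw(text):
    """Return the payload of a tag-delimited string, with no unescaping at all."""
    match = _RAW_STRING.fullmatch(text)
    if match is None:
        raise ValueError("not a tag-delimited string")
    return match.group(2)


# Backslashes and double quotes pass through untouched.
payload = parse_raw(r'#x1"C:\Program Files\stuff says "hi""x1#')
```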


I thought he would suggest yml but he didn't. yml is good but, personally, I don't like it. I find JSON files easier to read than yml. It might be because of years of experience with json.

EDIT: moved this comment to the root of the discussion https://news.ycombinator.com/item?id=19655249

Of course, JSON5 didn’t come out until 2017, and this was written in 2016.

In my opinion, a statically typed config file would be preferable.


Yes I like Dhall, I also had a go at my own one a while ago: https://github.com/willtim/Expresso

It's like every other article I read these days takes a strong opinionated stance and then gives decent but not overwhelming support for it.

    Please don’t. Ever. Not even once. It’s a really bad idea.
Followed by a list of reasons of which lack of comments is the only one with any universal weight.

What's even worse is the lack of a proposed alternative. These days I'm getting sick and tired of "X is bad and you should not use it" articles with X being something popular that everybody uses/does, without proposing a decent alternative. Tell me what I should do, not what I shouldn't do. The first one is very valuable, the second is just giving yourself a pat on the back.

As the author, the reason I removed some proposed alternatives is because people kept emailing me with stuff like "but what about X?" where "X" was something I never heard of, or was only superficially familiar with.

You make it sound like it's easy to just "propose an alternative", but it's not really. First off, some problems aren't necessarily obvious from just reading a few examples. YAML is a good example of this, with a lot of subtle behaviour that can really trip people up. This isn't really obvious from just a glance, and requires an in-depth review. So doing a full in-depth review of all alternatives is much harder than it might seem.

Secondly, there is no single right answer. It depends on what you're doing, the language and environment you're using, social factors (e.g. if all tools in your company already use XML, then it's probably a good idea to stick with that), and probably a bunch of other factors.

Alternatives that are better are also so plentiful that quite frankly, listing all possible alternatives strikes me as rather redundant. YAML, TOML, JSON5, or even eval(), and many more, all are decent alternatives for many different scenarios, although obviously not all.

That JSON is popular doesn't mean it's a good idea. I think there is a simple reason that it's so popular: for quite a few programming languages it's the only file format that's supported in the stdlib, and people are already kinda familiar with it. This is not an unreasonable choice at face value, but at the same time a lot of people don't always see the downsides at face value.

"Tell me what I should do, not what I shouldn't do" sounds like a half-wisdom. Sure, it's probably more useful. But at the same time, pointing out possible errors people haven't thought of isn't useless. I've actually received quite a few emails over the years with the gist of "I wanted to use JSON for my program, but then I read your article and decided it was a bad idea so I used YAML instead, thanks!"; so clearly, it's useful for some.


That’s a pretty silly stance. If I tell you not to touch a hot stove do you expect me to give you instructions on what to touch instead? And if I don’t are you going to burn your hand to spite me? “Well if you don’t have a better idea I’m just going to keep burning my hand”

Well if you need to do X, then yes, you need an alternative. The alternative of not doing anything is worse in this case (or at least that premise seems to be accepted by the author). So if you say do X but don't use tool Y, then you need the alternative.

Your stove metaphor breaks because the alternative of doing nothing has no consequences.


If you’re a professional you get paid to make these decisions. You don’t need a blog to do it for you.

Yes, because doing any research on anything is not what professionals do, they just do everything themselves...

Blogs are meant to offer insight and arguments, and all the top level comment is saying is that this blog did a bad job in that respect.


Advice on what not to do IS an insight. And there's an argument for why not. The idea that you have to give an alternative in order to not be "negative" or whatever when there's not a one-size-fits-all piece of prescriptive advice is just dumb.

You don't have to offer alternatives but it does improve the piece. No one said it had to be one-size-fits-all, there's plenty of room for nuance.

Shows how people just want to stand out from the average with elitism without decent back up facts.

It's almost as if different config languages have their pros and cons, and each can be useful in different use cases.

I do wish though that JSON5 would become more widespread, as that basically solves every downside of JSON.


Agreed - really folks, it's not going to be the end of the world if you use JSON for your configuration. Or XML. Or comma delimited text files. As long as you agree to it and you have it documented somewhere, I promise your codebase won't suddenly set on fire because of it.

Now, if you add programmability to your configuration files... that is asking for someone to abuse it and turn a simple configuration file into an abomination. I've definitely seen that happen.


Maybe you should be reading peer reviewed research papers instead of blogs.

Website is really wigging out on iOS.

Back on topic, first we’re not supposed to use text files. Then, no XML files. Then, don’t use YAML. Now, we’re supposed to stop using JSON.

At some point you realize the problem isn’t with the tech but with the programmer. We have this need to constantly reinvent the wheel all the time. It’s silly.


Exactly. Also, OP does not explain the need for comments in a config file. I mean, do people consider Apache config files -- which extensively use comments -- to be a shining example of clarity? Hundreds of lines long, impossible to parse manually. Good luck finding those 3 lines you need.

OP's argument is silly. JSON files are fine for config.


> I mean, do people consider Apache config files -- which extensively use comments -- to be a shining example of clarity? Hundreds of lines long, impossible to parse manually.

You're conflating two things, the readability of very large config files, and the readability of config files that allow comments.

> Good luck finding those 3 lines you need.

Personally, since I live in a world where searching text in a file is commonplace and trivial, additional text to match, possibly even comments that note where the config differs from the stock one that ships, are useful.


The comments of these config files are part of what make them unreadable, these are not two unrelated issues.

Yes, searching Apache config is pretty much the only reasonable way of finding what you need. But have we really gotten to the point where we passively accept gargantuan, unreadable config files because it's possible to search for the exact pattern you need?

What if there's a problem in your config and you need to find the issue? Prepare to set aside a few hours of your time to read every block.


I just looked at some random configuration file on my system. Right at the top, in the comments was the following:

    # Please don't edit this file directly (it is updated with every Red Hat
    # Linux update, overwriting your changes). Instead, edit /etc/...
Nice. I now know to make changes in a file that won't be overwritten by an update, and if there are issues, it's either with the site local file, or the one provided by Red Hat, and it's probably (although not exclusively) the site local file that is in error.

Have you never piped a config file through grep to exclude comments to get a streamlined view? It's pretty easy, and quickly gives you the best of both worlds. Configure your editor of choice to collapse comments and that's an even more convenient interface.
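
For anyone without grep at hand, the same filter is a few lines of Python. This sketch assumes full-line `#` comments, as in most files under /etc; the sample text is made up.

```python
def strip_comments(text):
    """Drop blank lines and full-line '#' comments, keeping the settings."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            kept.append(line)
    return "\n".join(kept)


SAMPLE = """\
# Listen on 8081 because the load balancer owns port 80.
Listen 8081

# Uncomment to enable verbose logging.
#LogLevel debug
ServerName example.com
"""

streamlined = strip_comments(SAMPLE)
```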

For stock configs that ship with a lot of comments, I would sometimes replace the file with one where I've stripped the comments. Since I always save the stock config as $FILE.dist, I get maximum usability. Never would I want the configs to not support comments though. Even after I've stripped configs, I might add my own comment to each change from the stock config, or for some test change I'm trying out. Not having comments means any documentation would need to be at least one degree removed from the item it's documenting. Sometimes that's acceptable, sometimes it's not.


> OP does not explain the need for comments in a config file

The same reasons you'd want comments anywhere else. As one example, often in a package.json file I want to add a comment that a dependency is intentionally using an older version due to an incompatibility. The workaround is to use a key of "//", which is hideous: https://github.com/npm/npm/issues/4482
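
The workaround works only because "//" is an ordinary key as far as the parser is concerned. A sketch with Python's `json` module; the pinned version and the reason are invented.

```python
import json

PACKAGE_JSON = """
{
  "//": "left-pad is pinned to 1.1.x on purpose (made-up example)",
  "dependencies": {
    "left-pad": "1.1.3"
  }
}
"""

package = json.loads(PACKAGE_JSON)
note = package["//"]

# The "comment" is plain data, so every tool reading the file
# has to know that this key should be ignored.
keys = sorted(package)
```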


Agreed that the workaround is hideous, but no, it's not immediately obvious to me that we need configuration data (which should be self-explanatory) glossed with comments.

Username? Got it. Password? OK. Hostname? Sure. Why do these things need comments?

If there are subtleties to the config, put that in the README. Or give better names to variables.

Functions are not always self-explanatory and get complicated fast, so comments in programming make a lot of sense and are helpful. A config file in theory should just be a series of variable assignments-- it's not clear to me why these require detailed explanations.


Comments in config are useful in the same way that comments in code are useful --- to point out things that are non-obvious to other people who work on the same config.

Things like listen on port 8081 instead of port 80, because the loadbalancer does something weird. Or this strange config is important because of feature X and blah. Or if you change this here, be sure to change it there.


To add to what 'toast0 wrote - code is about what; comments are about why. Even if your program has config so trivial that they're clearly data and not code, data alone doesn't tell you why it looks the way it looks. "username=admin" doesn't need explanation about what it was, but it might need an explanation if "admin" isn't what's typically expected as user name here, and is e.g. your specific workaround for something.

Apache conf files in practice suffer from people editing from the 'example' file to what they want and leaving all the embedded documentation.

When you build a file from scratch, or delete the embedded documentation as you go, you end up with reasonably compact configuration.

Additionally, apache has like thousands of options because it's software built to adapt to the world, and not software built for the world to adapt to it, so that adds to length.


> Also, OP does not explain the need for comments in a config file.

He does explain, there is even an example.


No, he complains that comments are not possible, and the example he references is about the workaround some people use, but he accepts a priori the need for comments in a config file. Why do I need explainers in my config file?

How about putting instructions in the instruction file (README) and configuration data in the config file?


I just checked, and I have over 100 config files in /etc. Does that mean I need to have an additional 100 README files?

You can die from vitamin poisoning. Doesn't mean that vitamins are bad. Similarly, too many comments are obviously not a good idea, but that doesn't mean you never need them.

There are many cases where you need an occasional comment: to explain why a possibly surprising value is set to what it is, for example. Or to warn future editors about a mistake you made in the past, or even to just describe what a potentially confusing setting does.


Comments in most /etc files on Linux are fantastic - you can skim the config file and often find what you need, and you often know what is irrelevant, and the comments often warn you against common failures.

Just mmap C structs to files and give people the headers!


mmapping and protocol buffers are in no way comparable... You have the downsides of a predefined schema and the downsides of wasting CPU time on serializing/deserializing. What you're looking for is flatbuffers or cap'n proto which can be mmapped directly while staying independent from the ABI of the compiler.

Thanks for the links, though I stand by my snark

I still have nightmares from back when this was standard practice.

What if, as time goes on and memory and computation get cheaper, we find better ways to do things? It would be good if the industry thought about that at a technical level rather than a business level.

Huh? This “we” and the supposed “them” doesn’t seem well defined outside your mind. Nobody said don’t use JSON, he said don’t use it for config... for very specific reasons. Just use something designed for storing configuration data.
