instead, let me give you my unsolicited opinion:
if you need to represent both the structure of your data and characteristics within that structure, xml is great because attributes are a really good way to do that. there's a reason most UIs are represented as XML.
if your data is just - well, data - use json. or better yet use edn
As for why UIs are represented in XML? It seems like a simple case of people getting used to it before JSON was popular. XML is a default choice for Java and for C#. It also looks vaguely similar to HTML.
Regarding the legacy thing, I think it's both. If you look at a lot of new tooling coming out that has to store UI state, people are still using XML.
In effect it doesn't matter that the encoding is arbitrary EDN (e.g. a 150KB png could be a solitary bigint literal) because the tags prevent false-positive decodes and keep things "strongly typed".
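For illustration, an EDN tagged literal might look like this (the `myapp/png` tag and the value are invented; a reader needs a registered handler for the tag):

```clojure
;; Hypothetical tagged literal: the tag tells the reader how to
;; interpret the value, so a bare bigint can never be silently
;; decoded as an image by accident.
#myapp/png 36147986013218145N
```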
There are many, many real life situations where a developer needs to choose between using JSON, XML or YAML to store their configuration data, message formats etc. Simply stating that JSON is good only in one scenario and XML must be used in every other is over-simplification.
On the other hand, XML can make error handling much easier by schema-validating received documents and thus rejecting a large class of invalid inputs at an early stage. This is particularly helpful if you're making something interoperable, like a public API.
So even when you're just using a library, there are ramifications to the decision.
There's also a bitter taste in many of our mouths left by the "XML All The Things!" crowd that brought us such monstrosities as Maven config files and XSLT.
XSLT the syntax is an abomination. Trying to shoehorn a language syntax into XML was a stupid, stupid idea.
> JSON was not designed to have such features [as XPath, XML Schema, XSL, etc.]
But this claim is not justified, just stated. And the supposed inferiority of, e.g., JSONPath vs XPath is not justified either.
I would actually claim that JSONPath is superior to XPath. JSONPath is much simpler, easier to understand, easier to implement, and still fulfills the most common use cases. Also JSONPath can be evaluated on a streaming input, which is not possible for XPath in general (maybe some subset of XPath queries could support streaming).
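To make the comparison concrete, here is a minimal sketch (invented two-book data) of the same query done with ElementTree's limited XPath subset and with a plain comprehension over parsed JSON:

```python
import json
import xml.etree.ElementTree as ET

xml_doc = """
<library>
  <book><name>Dune</name></book>
  <book><name>Hyperion</name></book>
</library>
"""
json_doc = '{"library": [{"name": "Dune"}, {"name": "Hyperion"}]}'

# XPath (the 1.0 subset ElementTree supports): find every <name> anywhere.
root = ET.fromstring(xml_doc)
xml_names = [el.text for el in root.findall(".//name")]

# The JSONPath equivalent would be $..name; without a JSONPath library,
# a plain comprehension covers the common case just as declaratively.
data = json.loads(json_doc)
json_names = [book["name"] for book in data["library"]]

print(xml_names)   # ['Dune', 'Hyperion']
print(json_names)  # ['Dune', 'Hyperion']
```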
Unfortunately, a lot of programming languages have poor support for XML, and standard libraries usually only give you XPath 1.0 compatibility. XPath 2.0 and XQuery 3.0/3.1 are far more powerful and flexible, but you need Saxon or a good XML database to make proper use of them.
Right tool for the job. JSON is a great 80% encoding format, which handles most cases. The problem is that folks try to make it handle all cases (ironically, just like they did with XML).
XML is arguably too expressive: you can't tell how to deserialize XML even from a well-formed fragment, whereas with JSON you can (or at least you fare better)... and its query/expression syntax is even worse than learning/using a simple general-purpose programming language.
That said, you could very well argue its additional complexity is often not worth the gain in most applications.
Of course, you can also define your own extensibility/interoperability conventions with any data serialization format, but the point is XML has it baked in the standard and already provides an accepted way of doing things that everyone has implemented.
It's both very insightful and enjoyable to read.
But people who are serious about XML-related technologies understand that the syntax of XML is mostly fine; interoperability and existing tools matter way more. It would be hard for such an alternative syntax to catch on. People often forget that XML is a direct descendant of SGML, adopted, I imagine, for similar reasons: there were existing tools for it.
If that was entirely true there'd be no JSON.
What really happens there is called Impedance Mismatch
Similar to this, but with XML
Remember how Scala had XML as a first class citizen in the language? It didn't work out that well.
There are different programming paradigms and I would (entirely serious here) say that XSLT, XQuery, XProc and related XML technologies belong to a separate category. It's mostly declarative and pure, but there are some imperative elements here and there. It's just a different way of doing things, which has XML and XPath at the very center.
> Remember how Scala had XML as a first class citizen in the language? It didn't work out that well.
Funny how well having (sort-of) xml as first class citizen works in React ecosystem (in HTML too, to some degree). Easily composable bite sized chunks without over-engineered namespaces, doctypes and such. If only xml designers thought of their creation as pieces of flexible, composable hierarchical data not as documents.
> It's mostly declarative and pure, but there are some imperative elements here and there.
Yes. And despite all that it gets almost no love. My point is that this is mainly because of syntax (a thing many consider superficial and unimportant).
JSON is a first class citizen in JS, XML isn't and never was. E4X never was properly supported by browsers.
Namespaces and doctypes are not mandatory.
The merits of a language and/or framework have little to do with its popularity. There are much better languages/frameworks than Go or Angular, for example, but that doesn't make them popular.
I'm not overthinking it, it's not my idea in the first place.
> If only xml designers thought of their creation as pieces of flexible, composable hierarchical data not as documents.
I can't agree with that. Each JSON payload is a tree with three kinds of nodes: lists, associative arrays, and values (of a few flexible types).
I'm not sure what you mean by a set of objects, because JSON doesn't have a set type. You can have something like a set if you use an array but disregard order, or use an associative array but disregard the values. Also, the objects can contain objects. That's what makes it actually a tree.
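The three-node-kinds view is easy to make concrete; a small sketch (invented sample data) that walks an arbitrary parsed JSON value and tallies each kind:

```python
import json

def count_kinds(node, counts=None):
    """Walk a parsed JSON value, tallying the three node kinds:
    associative arrays (dicts), lists, and leaf values."""
    if counts is None:
        counts = {"assoc": 0, "list": 0, "value": 0}
    if isinstance(node, dict):
        counts["assoc"] += 1
        for child in node.values():
            count_kinds(child, counts)
    elif isinstance(node, list):
        counts["list"] += 1
        for child in node:
            count_kinds(child, counts)
    else:
        counts["value"] += 1  # string, number, bool, or null
    return counts

doc = json.loads('{"books": [{"title": "Dune", "year": 1965}], "open": true}')
print(count_kinds(doc))  # {'assoc': 2, 'list': 1, 'value': 3}
```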
> But what do you use to work with XML(or HTML) as a tree? You use DOM, OO API. And DOM sucks.
> Namespaces and doctypes are not mandatory.
Probably just my personal experience, but some implementations of XML interoperability didn't allow me to specify a default namespace for my XML tags, which made me add a bogus namespace identifier and prefix all my tags with it.
Namespaces are optional, but that optionality is not stated clearly enough in the spec, and some APIs don't let you forget about namespaces.
> Nothing was stopping you from using XML the way it used in React.
Just the lack of a good syntax/API.
> I'm not overthinking it, it's not my idea in the first place.
I was referring to your argument (as I understood it) that there are some deep mismatches between JSON and XML akin to the object-relational impedance mismatch.
It's not that JSON is objects and XML is data. Both are just trees of values.
My point is that the difference is just syntax but I'd like to backpedal from that now. JSON vs XML is not just syntax and lack of XML oddities. It's also a good deal of "less is more". Now I think JSON is XML but with one obvious way to express sequence and one obvious way to label the data for access.
Anyways. I think we can agree to disagree and finish at that.
JSON is easier to use, that's why it displaced XML in browsers. But if there was native or close to native (WebAssembly?) support of things like XSLT/XQuery 3.0 and XForms 2.0, then I would switch away from JS and its framework zoo in a heartbeat.
> deep mismatches between JSON and XML
not between JSON and XML, google "X/O Impedance Mismatch"
Attributes and namespaces: these can sort of be faked, but fair enough. But then you get discussions on what should be an attribute and what shouldn't...
Schema: pretty sure this exists for JSON
XSL: Ah, the poor man's Lisp macros... And, again, easier to do in code in a scripting language.
That's not exactly true. There are books in the library. Here's how I get some book's main character's name: library['mainCharacters']['name']. Except I want to know all the main character names in all the books. This is a clear, simple request, and there's no need for me to write two nested loops if I don't have to. So I would end up implementing my own XPath here anyway, to write declarative stuff declaratively.
But then again, I don't have to, because JsonPath already exists and is just fine.
> Here's how I get some book's main character's name: library['mainCharacters']['name']. Except I want to know all main character names in all books.
library.flatMap(x => x.mainCharacters).map(x => x.name);
Using higher-order functions (and a functional programming style) it's still declarative. Yes, there are still nested cycles but, just like in JSONPath, they're hidden in the abstraction.
Plus, not using a DSL but a fully-fledged programming/scripting language, you can store intermediate results, make more complex queries, etc. which is more often than not what you need.
I've used JSONPath mostly in Bash scripts to quickly hack some JSON manipulation, but I'm slowly transitioning into using more powerful scripting languages and not using it anymore.
I'm not really that familiar with JSONPath though. Are there any use cases where it's really convenient?
[character['name'] for book in library for character in book['mainCharacters']]
First of all, the syntax is quite different from what you use (Scala, I suppose?); when we were using JSONPath we could just copy one line from one project to another and that would be fine.
Moreover, both our implementations have the same problem: we assume every book has mainCharacters and every character has a name. Were it JSONPath, it wouldn't matter: no 'mainCharacters' means the path doesn't match the pattern, so it's just skipped. In our case this means exceptions.
And what if we want to get all 'name' fields from whatever object at whatever depth? Or 'name' of every object where 'color' is 'yellow'?
Now, if you consider a much more nested dictionary structure (say, some AST), processing that without errors would be quite painful. And you would also end up writing your own XPath (JSONPath), even with all your maps and reduces.
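That "whatever depth" query is exactly where you end up hand-rolling a mini-XPath; a sketch of one (field names and sample data taken from the thread's library example, the helper itself is invented):

```python
def find_fields(node, field, predicate=None):
    """Yield every value of `field` at any depth, optionally only from
    objects that also satisfy `predicate` -- roughly JSONPath's $..name."""
    if isinstance(node, dict):
        if field in node and (predicate is None or predicate(node)):
            yield node[field]
        for child in node.values():
            yield from find_fields(child, field, predicate)
    elif isinstance(node, list):
        for child in node:
            yield from find_fields(child, field, predicate)

library = [
    {"mainCharacters": [{"name": "Paul", "color": "yellow"}]},
    {"mainCharacters": [{"name": "Leto"}]},
    {"title": "no characters listed"},  # missing keys are simply skipped
]
print(list(find_fields(library, "name")))  # ['Paul', 'Leto']
print(list(find_fields(library, "name",
                       lambda o: o.get("color") == "yellow")))  # ['Paul']
```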
Of course, should it get complicated enough we would end up needing to write something custom anyway, but stuff like JSONPath just helps to keep things simple when possible. That's it.
Off the top of my head, without testing it:
[character.get('name')
 for characters in [book.get('mainCharacters', []) for book in library]
 for character in characters]
XML is a bit more complicated, and often needs dedicated libraries to manage. Every time I try to get the nth text element of node X with ElementTree, it's a bit of a hassle.
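For instance, grabbing "the nth text chunk" under a node with the standard library's ElementTree takes something like this (invented data; part of the hassle is that `itertext()` flattens element text and tail text into one stream):

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<library><book>first <name>Dune</name> tail</book></library>"
)

# itertext() yields book.text, then the <name> text, then its tail --
# so "the 2nd text element of <book>" needs manual filtering.
book = doc.find("book")
texts = [t.strip() for t in book.itertext() if t.strip()]
print(texts)     # ['first', 'Dune', 'tail']
print(texts[1])  # 'Dune'
```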
Rich metadata is therefore represented as child nodes, giving it the same child/sibling status as JSON children, and the semantic ambiguity of when to use XML attributes vs child nodes remains.
As for the tooling around XML, that's okay, but it's almost always overkill, and it almost always turns out the overhead of the tooling becomes a problem in-and-of-itself.
Anyway, JSON is actually a better data format.
In hindsight, the biggest advantage of XML over JSON was that it was painful enough to make schemas popular, a quality JSON is lacking: not the schema languages themselves, which do exist, but their widespread adoption.
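Those schema languages do exist; for illustration, a minimal JSON Schema (draft-07 syntax, fields borrowed from the thread's book example) might look like:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["title", "mainCharacters"],
  "properties": {
    "title": { "type": "string" },
    "mainCharacters": {
      "type": "array",
      "items": { "type": "object", "required": ["name"] }
    }
  }
}
```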
To me, XML tooling lost quite a bit of its appeal when I realized that all the typing available via the various schema languages is completely lost to the world of xsl/xpath/xquery. I understand the reasons for that, but that does not provide much consolation.
XML is designed towards a type of problem that is not an everyday programming problem. It is designed for a full-fledged schema - it builds on the lessons of SGML and its predecessors such as GML, for people who need those things, which has historically meant "documentation writers". DocBook and DITA do not really have equals at what they do, which is semantic, textual content with rich markup. (Yes, you like TeX. But it focuses on presentation for typesetting, not semantic meaning.)
This means that in practice XML is really useful for describing an abstract, pre-tokenized syntax. This is a useful tool from a language design perspective; it lets you take an intermediate position between human-friendly and machine-friendly formats, without going straight to binary data or writing a full string parser. When computer language tooling emits an XML AST, they give tool-writers who would like to manipulate or inspect the AST a major leg up.
Simpler forms like sexps or JSON exact additional overheads on that problem that can be nearly as bad as just writing a custom string parser, once you get beyond the "strings, numbers, and simple containers" cases that are basically about data serialization, not data parsing. You want to have nodes that have unique names or attributes once you get into the parsing problem, but they are superfluous if you have plain old data. And as soon as you get into mixing different types of documents validation becomes a major concern and XML has the right groundwork for that.
It's just that most people don't want or need to deal with data of that complexity, especially since XML as a plaintext document just looks like angle-bracket trash. For those needs they are better off writing in something that a string parser can work with and then using XML as an intermediate, if at all. Even for the documentation-writing situation, it's easier to write Markdown for 98% of the prose and then convert it to add the last 2%.
Basically, XML has been used for far too many things that it shouldn't, and the blame for that lies on some 90's-era hype machines that decided that XML would be the buzzword of the future and pushed it into every technology. We got some nice tooling out of it, but in the end, it's still most useful for a certain kind of document markup.
Re ASTs: somehow Lisp has managed to encode them in S-expressions since forever, and there was never a problem. In fact, writing Lisp code is mostly writing an AST directly. There is apparently no need for the additional syntactic sugar XML adds.
The one thing JSON misses that actually is important is symbol type. S-expressions have it, and at this point it's no longer "strings, numbers and basic containers" - per code=data, you have everything you need to encode an AST conveniently.
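A sketch of why the symbol type matters when encoding an AST (a hypothetical mini-representation, using Python tuples as stand-ins for sexps and an invented `Symbol` class):

```python
class Symbol(str):
    """A distinct identifier type, so the variable x is not confused
    with the string literal "x" -- a distinction JSON cannot express."""
    __slots__ = ()
    def __repr__(self):
        return str(self)

# The sexp (print x "x"): call print with the variable x and the string "x".
ast = (Symbol("print"), Symbol("x"), "x")

# The two "x" nodes compare equal as strings, but their types differ,
# which is exactly the information a JSON encoding would lose.
print([type(n).__name__ for n in ast])  # ['Symbol', 'Symbol', 'str']
```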
But XML has XPath, XML Schema, and XSLT, and JSON doesn't.
A similar document would look like this in XML
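A sketch, reusing the thread's library/book data (the element and attribute names are invented for illustration):

```xml
<library>
  <book title="Dune">
    <mainCharacters>
      <character name="Paul"/>
    </mainCharacters>
  </book>
</library>
```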
PS: XML is a markup language. If your data is a text document with markup, then XML is a good choice; otherwise leave it alone.