I think this article has done a great job enumerating trends that show JSON is beating XML for data serialization applications. These points are evidence of a shift in thinking, but not the reason for the shift itself.
Why JSON over XML? Because people need a data serialization format, and XML is a markup language. JSON is gaining widespread adoption for data serialization because it's the correct tool for the job. XML isn't.
In a markup language there is an underlying text that you're annotating with machine-readable tags. Most data interchange doesn't have an underlying text -- you can't strip away the tags and expect what remains to be understandable. If you're writing a web page that has to be read by humans and interpreted by machines, you need a markup language.
By contrast, data interchange is about moving arbitrary data structures between processes and/or languages. JSON's information model fits perfectly: its nested map/list/scalar structure is simple and powerful. As for typing, it found a sweet spot with text, numbers, and booleans.
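To make that information model concrete, here's a minimal sketch in Python (the data is invented): the whole JSON type system is just maps, lists, and text/numeric/boolean/null scalars, and a structure built from them round-trips cleanly.

```python
import json

# Every JSON value is built from just these pieces:
payload = {
    "user": "alice",             # text
    "age": 34,                   # number
    "active": True,              # boolean
    "nickname": None,            # null
    "tags": ["admin", "ops"],    # list
    "address": {"city": "Oslo"}, # nested map
}

# Round-trip: serialize and parse back, nothing lost.
decoded = json.loads(json.dumps(payload))
assert decoded == payload
```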
JSON is the right tool for the data serialization problem.
This makes sense, but from what I can tell, in virtually no major XML-based systems is the basis for XML files an underlying text extended with markup. Most XML systems, since the dawn of XML, have been top-to-bottom structured data.
You're correct that almost no XML formats, with the exception of XHTML, have an underlying text. This is the core problem, since the underlying information model, inherited from SGML, presumes one. XML brings significant overhead for dealing with textual data, and most data isn't textual. Why is there an attribute vs. element debate? Wrong information model -- it's a distinction without a difference when you're doing data serialization. This is why XML was the shoe that never quite fit, and why JSON will displace it so easily.
In '98/'99, the XML bandwagon was something no one wanted to miss; it was the Web 2.0 of its day, and everyone knew it was the future. It was Java/WORA ("Write Once, Run Anywhere") for data interchange, and it promised that you wouldn't be locked into a proprietary application. The marketing hype was simply outstanding. Even technical people who hated XML itself couldn't ignore the promise of open formats, and had to support it even if they had to hold their nose. Open formats have since won -- holding your nose isn't needed anymore.
Now that the marketing hype of XML doesn't shut down the technical debate... JSON will soon dominate for data serialization tasks.
But there's also the stack. XML has XSD for validation and documentation, XSLT and XQuery for transformation, and most people seem to like XPath. The overwhelming response to analogues for JSON is horror -- don't pollute our simplicity! -- plus an acknowledgement that, while some tasks do indeed need these features, the XML stack already has them. The corruption of XML is what keeps JSON clean.
In SGML land, you wouldn't have implemented SVG using tags; you would have created a NOTATION with a syntax specific to the problem at hand. In XML land, everything is XML: schemas are XML (in SGML they are DTDs, which are _not_ SGML), transforms are XML (in SGML they are DSSSL, a Lisp variant). The XML approach is one-size-fits-all, not use-the-best-tool-or-syntax-for-the-job.
For me, the single greatest selling point of JSON is that it's just so danged easy to go from a JSON string to a usable map/list/dictionary in every language. Most of the time you can get from A to B in one or two lines of code.
XML always seemed like such a struggle by comparison. Figuring out which parser(s) you've got installed, and then figuring out their respective APIs, felt like total overkill. The only way I could be productive with XML was Python's ElementTree API, because it was so simple.
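For what it's worth, the "one or two lines" claim holds up in Python; here's the JSON path next to the ElementTree equivalent (sample data invented):

```python
import json
import xml.etree.ElementTree as ET

# JSON: one line from text to a usable dict.
product = json.loads('{"name": "widget", "price": 9.99}')

# XML via ElementTree: still simple, but you get a tree to walk,
# not a ready-made map.
root = ET.fromstring("<product><name>widget</name><price>9.99</price></product>")
name = root.find("name").text
```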
Some day I'll need my data to be checked against a complicated schema. But until that day arrives, I'm sticking with JSON.
XML is a smooth fit on strongly typed languages. You can easily translate an exact type into a corresponding XML encoding and know the type of what you're getting out on the other end. JSON, on the other hand, is duck typing in web-service form. You can shove any data structure in at one end and get it back out the other, without writing any custom code and without actually knowing the type of the data you've sent. You could say that JSON itself is weakly typed.
The popularity of JSON is tied to the popularity of weak typing. You can iterate on your API design and codebase more rapidly without those bothersome types getting in the way. The flip side is that the end result isn't "done done": it lacks full validation of input and complete documentation. In short, it's more difficult to use and more prone to bugs and security issues. I suspect that if you compare "done done" APIs, JSON and SOAP are probably equally productive.
Having said that, I use JSON myself. It's just too easy to get going with.
"XML is a smooth fit on strongly typed languages. You can easily translate an exact type into a corresponding XML encoding and know the type of what you're getting out on the other end."
This is a characteristic of the encoding and decoding layer, not the data format. Haskell's aeson library is a JSON serialization library that is perfectly well strongly typed. And yes, that's strongly typed against your local domain datatypes, with a relatively easy-to-specify conversion back and forth, not merely strongly typed by virtue of having a "JSONString" type here and a "JSONNum" type there.
That's an impressively succinct way of mapping types to JSON, but it's still a mapping. There's one step for the developer between obtaining the JSON and using its data. In weakly typed languages there is no such step, the JSON data is the object you interact with in your business logic.
There's always a serialization step. The type of the resulting data is a consequence of the serialization technique, not the data format. I demonstrated the part you seemed most strongly to claim didn't exist, JSON <-> strong typing, but I can show you "weakly typed" XML too. In addition to the DOM, which is a standardized "weak type" XML representation, there are things like ElementTree (http://effbot.org/zone/element-index.htm).
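To make the point within a single language, the sketch below (all names mine) decodes the same JSON text twice in Python: once into a plain dict, once onto a declared local type, which is roughly what aeson automates in Haskell. The typing lives in the decode step, not in the format.

```python
import json
from dataclasses import dataclass

raw = '{"id": 7, "name": "alice"}'

# "Weakly typed" decode: whatever shape the sender used, you get a dict.
loose = json.loads(raw)

# "Strongly typed" decode: the same bytes mapped onto a declared type.
@dataclass
class User:
    id: int
    name: str

strict = User(**json.loads(raw))
```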
(Also, I scare-quote all my "weakly typed" because the term is basically ill-defined. I'm coming around to preferring "sloppy type": a language where all values are perfectly well strongly typed, but the language and/or library is shot through with automatic coercions and/or extensive duck typing. A sloppy-type language considers it a feature that a function may receive a value and not really know or care what it is.)
I think part of the reason "weakly typed" is ambiguous is that it's a bit pejorative, and "sloppy type" certainly isn't helping that. Maybe just "less typed"? It really is an engineering tradeoff over how many assumptions you want to make explicit.
Let me state that for the record, I believe JSON is a fantastic data interchange format, especially when compared with the current state of XML.
However, the point you've touched on is exactly my gripe with JSON. I just might not know enough, which is entirely possible, but AFAIK all the JSON schemas are either extremely complicated (I'm looking at you, json-schema) or way too simple (jschema).
When working with service oriented architectures and if you're following the principles of RESTful architecture, discoverability and HATEOAS become central to your service. That means that the API needs to be self-documenting.
How does one do this with JSON? Essentially, if you boil the problem down, what you're trying to accomplish is "marking up" your JSON responses/requests. The irony is hilarious, because this is exactly the job XML was designed for.
It's obvious that the XML ecosystem grew way out of control exogenously, but the core concept was very simple and was designed to solve exactly the problem that I think the JSON ecosystem currently lacks an answer to.
HATEOAS, discoverable APIs, and the overengineered gunk that has today hijacked the name of REST (ironic, when much of the point of the original REST was to not be SOAP) remain pipe dreams. You can't write an API that clients which don't know your API can use, and I suspect it will take at least a hundred years until that becomes possible.
Certainly XML schema doesn't let you accomplish this. All I've ever seen it accomplish is telling you that a document doesn't conform to the schema, functionality that you can trivially achieve in JSON any number of ways (e.g. an API version field in the data).
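The version-field approach mentioned above might look something like this (the field name and version numbers are hypothetical):

```python
import json

SUPPORTED = {1, 2}  # hypothetical: the API versions this consumer understands

def parse_message(raw: str) -> dict:
    """Reject payloads whose declared version we don't support."""
    msg = json.loads(raw)
    if msg.get("api_version") not in SUPPORTED:
        raise ValueError(f"unsupported api_version: {msg.get('api_version')!r}")
    return msg

ok = parse_message('{"api_version": 2, "data": [1, 2, 3]}')
```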
There's no point trying to write a schema system without a use case that it can solve, and I've never seen such a thing.
Could someone who knows a lot about these things tell me why JSON took such a long time to arrive?
JSON, at its core, is essentially a hierarchy of maps and lists -- which seems a very intuitive and useful way to store data.
XML on the other hand has always baffled me with its attributes and the redundant and verbose tags (why do I need <tag attr="data">data</tag>?). I'm sure there was a good reason at the time for this, so perhaps someone can enlighten me.
What took the longest time was for a language to come out with key-value maps as the main core data structure, and a specialized literal syntax. Once that happened, it was relatively quick for that syntax to become a standardized interchange format for K-V data.
Lisp had assoc-lists, but those were a convention, not a specialized structure. Many languages had K-V maps as libraries, but not as core structures, and most lacked literal syntax. Eventually most scripting languages started getting them natively, and even gained literal syntax, but they weren't the "go-to" data structure for doing things. In Python, for example, all objects are really just hash maps, but when you're working with them you pretend they're objects and not hash maps, and you use lists more than maps anyway.
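Python's dict/list literals illustrate the point: simple structures written in the native literal syntax come out as nearly (sometimes exactly) the same text as their JSON encoding.

```python
import json

# Native literal syntax for the "go-to" data structure...
config = {"servers": ["a", "b"], "retries": 3}

# ...and its JSON encoding is almost the same characters.
as_json = json.dumps(config)  # '{"servers": ["a", "b"], "retries": 3}'

assert json.loads(as_json) == config
```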
XML's popularity is an accident of history, due in part to the rise of HTML, which is also an accident of history.
> Lisp had assoc-lists, but those were a convention, not a specialized structure.
I'm not sure what you mean by this. What's the difference, syntactically, between a convention and a specialized structure?
XML is Lisp. In fact, the XML grammar and Lisp's grammar are (almost) homomorphic. SXML is a trivial mapping of XML to s-expressions which demonstrates this.
There's no point in comparing XML and S-expressions like that; they're essentially the same thing!
If you're talking about internal representation, well, that's up to the compiler. But since you have to declare the format either explicitly or by context, there's no 'advantage' of XML over s-expressions.
To be pedantic, XML is homomorphic to SXML, which is a subset of the Lisp grammar; that just means Lisp recognizes some strings that aren't in the XML grammar, so if anything Lisp is more powerful. But that's beside the point.
> I'm not sure what you mean by this. What's the difference, syntactically, between a convention and a specialized structure?
I agree that XML and Lisp grammars are basically interchangeable. My comment was answering a question about the emergence of JSON, and my comments about assoc-lists were only in relation to JSON, not XML.
Honestly, JSON isn't really much of an improvement over a technology that's been around since 1958, i.e. s-expressions. JSON is just the flavor of the month - I know people who dislike it because it loses some of the power of XML (XSLT, attributes, etc.)
In the end, I think it's just subjective. All of the above formats are equally capable of representing the same data.
XML looked like HTML at a time when the web was the "next big thing". Not some technology on the web, not social media: HTML itself was the big revelation. So a solution that looked like HTML had a leg up.
Then, once there were mature parsers in a lot of languages, server software configured by it, etc, XML had some inertia that takes time to displace.
I've actually always liked the idea of XML at its core: attributes and so on often make data structures easier to understand (just look at HTML). But namespacing and all that junk ruined the whole thing.
The only reason I still use XML every now and then is XPath. There are third-party alternatives for JSON, but XPath is ubiquitous.
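Even Python's standard-library ElementTree supports a useful subset of XPath out of the box, which is part of what makes it hard to give up (example data invented):

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<library>"
    "<book lang='en'><title>Dune</title></book>"
    "<book lang='fr'><title>Vendredi</title></book>"
    "</library>"
)

# A small but handy slice of XPath works without any third-party library:
titles = [t.text for t in doc.findall(".//book[@lang='en']/title")]
```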
From a big data perspective, I'm pretty sure people were making do with CSV files before JSON came along. I think most practitioners would not subject themselves to stupid, stupid XML unless they really had to.
Well-written XML that was designed for humans instead of machines is much, much easier to read than JSON. The primary reason is that, unlike s-expressions or XML, JSON has no block name: you lose valuable time figuring out the block context in a hierarchy, since it isn't labelled.
The only kind of JSON that is readable is flat JSON that is nested to a maximum of 1 level.
JSON is not a silver bullet. Actually, I think JSON-only APIs suck -- an API should offer an equivalent XML alternative as well. Let me explain.
There are established industries such as publishing that use complex XML workflows -- I don't think JSON will push them out.
The XML family so far has much better standard specifications and tool support. Some of the most useful pieces are XPath and XSLT. There are also advanced features -- too complex for some, useful for others -- like namespaces and schemas. If JSON is to expand its use, it will have to address the same interoperability issues XML addressed, and develop similar features with similar problems. That's why the idea of JSON schemas sounds funny to me.
Let me give an example. I've developed a semantic tool that lets me import 3rd-party API data as RDF. If the data is available in XML, I can apply a GRDDL (basically XSLT) transformation to get RDF/XML -- and boom, it's there. RDF/XML serves as the bridge format between XML and RDF.
Now if the data is JSON-only, what do I do? I could download an API client, try to write some Java or PHP code, but that would be much less generic and extensible than XSLT.
My point is, by offering JSON-only you cut off all the useful tools from the XML world, which is pretty well established.
One of my biggest issues with JSON is that it's a lot harder to generate valid JSON as a stream. Granted, this may be an esoteric use case, but the quoting rules and type representations seem to require some amount of look-ahead, which isn't fun when generating that stream.
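For what it's worth, the comma problem can be traded for a little state instead of look-ahead; a rough sketch (helper name mine):

```python
import json
from typing import Iterable, Iterator

def stream_json_array(items: Iterable[object]) -> Iterator[str]:
    """Yield chunks that concatenate into a valid JSON array,
    emitting each comma *before* the item that follows it."""
    yield "["
    first = True
    for item in items:
        if not first:
            yield ","
        yield json.dumps(item)
        first = False
    yield "]"

text = "".join(stream_json_array({"n": i} for i in range(3)))
```

No buffering of the whole sequence is needed; the only state carried between chunks is whether anything has been emitted yet.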
I think JSON is more popular than XML for a lot of things simply because it's so much simpler to interact with. No querying attributes, elements, elements inside elements, text inside elements, etc. You just look up the value attached to a key, or look up an index in an array, and that's it. It's simple every level down. And it's also simple to construct.