

Why JSON will continue to push XML out of the picture - lucperkins
http://blog.appfog.com/why-json-will-continue-to-push-xml-out-of-the-picture/

======
clarkevans
I think this article has done a great job enumerating trends that show JSON is
beating XML for data serialization applications. I think these points are
evidence of a shift in thinking, but not the reason for the shift itself.

Why JSON over XML? Because people need a data serialization format, and XML
is a Markup Language. JSON is gaining widespread adoption for data
serialization applications because it's the correct tool. XML isn't.

In a markup language there is an underlying text that you're annotating with
machine readable tags. Most data interchange doesn't have an underlying text
-- you can't strip away the tags and expect it to be understandable. If you're
writing a web page that has to be read by humans and interpreted by
machines... you need a markup language.

By contrast, data interchange is about moving arbitrary data structures
between processes and/or languages. JSON's information model fits this task
perfectly: its nested map/list/scalar structure is simple & powerful. As for
typing, it found a sweet spot with text/numeric/boolean.

JSON is the right tool for the data serialization problem.

~~~
tptacek
This makes sense, but from what I can tell, in virtually no major XML-based
systems is the basis for XML files an underlying text extended with markup.
Most XML systems, since the dawn of XML, have been top-to-bottom structured
data.

~~~
bartonfink
I may be talking out of my ass here, but isn't OO-XML and whatever Microsoft
calls their new Office format exactly that?

~~~
tptacek
You're right, I forgot about the "proprietary" document formats that got
transformed into "open" (heh) XML formats. Good point. Thanks!

------
portmanteaufu
For me the single greatest selling point of JSON is that it's just so danged
easy to go from a JSON string to a usable map/list/dictionary in every
language.
Most of the time you can get from A to B in one or two lines of code.

XML always seemed like such a struggle by comparison. Figuring out which
parser(s) you've got installed, figuring out their respective APIs -- it felt
like total overkill. The only way I could be productive with XML was using
Python's ElementTree API because it was so simple.
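
For instance, a rough sketch of that difference using only Python's standard
library (the data here is invented for illustration):

```python
import json
import xml.etree.ElementTree as ET

# JSON: one line from string to nested dicts/lists
config = json.loads('{"name": "appfog", "ports": [80, 443]}')
print(config["ports"][0])  # prints 80

# XML: ElementTree is the friendliest stdlib option, but you still
# walk elements and get text back, not native types
root = ET.fromstring("<config><name>appfog</name><port>80</port></config>")
print(root.find("port").text)  # prints 80 (as the string "80")
```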

Some day I'll need my data to be checked against a complicated schema. But
until that day arrives, I'm sticking with JSON.

~~~
Joeri
XML is a smooth fit on strongly typed languages. You can easily translate an
exact type into a corresponding XML encoding and know the type of what you're
getting out on the other end. JSON on the other hand is duck typing in web
service form. You can shove any data structure in on one end, and get it back
out the other end, without writing any custom code, and without actually
knowing the type of the data you've sent. You could say that JSON itself is
weakly typed.

The popularity of JSON is tied to the popularity of weak typing. You can more
rapidly iterate your API design and codebase without those bothersome types
getting in the way. The flip side of that is the end result isn't "done done".
It lacks full validation of input and it lacks complete documentation. In
short, it's more difficult to use and more prone to bugs and security issues.
I suspect that if you compare "done done" APIs, JSON and SOAP are probably
equally productive.

Having said that, I use JSON myself. It's too easy to get going in.

~~~
jerf
"XML is a smooth fit on strongly typed languages. You can easily translate an
exact type into a corresponding XML encoding and know the type of what you're
getting out on the other end."

This is a characteristic of the encoding and decoding layer, not the data
format. Haskell's aeson library [1] is a JSON serialization library that is
perfectly well strongly typed. And yes, that's strongly typed with your local
domain datatypes and a relatively-easy-to-specify conversion back and forth,
not merely strongly typed by virtue of having a "JSONString" type here and a
"JSONNum" type there.

[1]: http://hackage.haskell.org/packages/archive/aeson/0.6.0.2/doc/html/Data-Aeson.html#g:4
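
aeson itself is Haskell, but the same idea -- typed decoding as a property of
the serialization layer, not the format -- can be sketched in any language.
A loose Python analogue, with a hypothetical User type and hand-rolled checks
standing in for what an aeson FromJSON instance does:

```python
import json
from dataclasses import dataclass

# Hypothetical domain type for illustration; the decode step validates
# and converts, so downstream code works with a real typed object,
# never raw dicts.
@dataclass
class User:
    name: str
    age: int

def decode_user(raw: str) -> User:
    data = json.loads(raw)
    # Explicit checks play the role of aeson's typed parser
    if not isinstance(data.get("name"), str):
        raise TypeError("name must be a string")
    if not isinstance(data.get("age"), int):
        raise TypeError("age must be an integer")
    return User(name=data["name"], age=data["age"])

user = decode_user('{"name": "jerf", "age": 40}')
```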

~~~
Joeri
That's an impressively succinct way of mapping types to JSON, but it's still a
mapping. There's one step for the developer between obtaining the JSON and
using its data. In weakly typed languages there is no such step, the JSON data
_is_ the object you interact with in your business logic.

~~~
jerf
There's always a serialization step. The type of the resulting data is a
consequence of the serialization technique, not the data format. I
demonstrated the part you seemed to most strongly claim didn't exist, JSON <->
strong typing, but I can show you "weakly-typed" XML too. In addition to the
DOM, which is a standardized "weak type" XML representation, you also have
things like ElementTree <http://effbot.org/zone/element-index.htm> .

It is the case that JSON has a simple default weak serialization in many
popular languages, and that it is a great deal of the reason for its
popularity, but it is worth pointing out this is a local effect in the
Javascript/Python/Perl/Ruby space, and that it hasn't got anything to do with
strongly or weakly typed but rather what the target languages shipped with.
There is no natural mapping for JSON in C++, C#, Erlang, Haskell, Prolog, SQL,
or a wide variety of other languages (and Erlang and SQL are both fairly
"weakly typed"), and even in JS/Python/Perl/Ruby there are some edge cases
that can bite you if you aren't careful about exactly what the
"just_decode_some_json_man()" is _really doing_ with Unicode and numbers that
may not fit into 32 bits.
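
The number edge case is easy to demonstrate with a small Python sketch:

```python
import json

# Python ints are arbitrary precision, so this value survives a
# round trip through the json module intact...
big = 9007199254740993  # 2**53 + 1, not representable as a double
assert json.loads(json.dumps(big)) == big

# ...but a consumer that parses JSON numbers as IEEE-754 doubles
# (JavaScript's JSON.parse, for example) silently rounds it:
assert float(big) == 9007199254740992.0
```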

(Also, I scarequote all my "weakly typed" because the term is basically ill-
defined. I'm coming around to prefer "sloppy type", which is a language where
all values are perfectly well strongly typed but the language and/or library
is shot through with automatic coercions and/or extensive duck typing. A
sloppy type language considers it a feature that a function may have a value
and not really know or care what it is.)

~~~
mindslight
I think part of the reason "weakly typed" is ambiguous is because it's a bit
pejorative, and "sloppy type" certainly isn't helping that. Maybe just "less
typed" ? It really is an engineering tradeoff of how many assumptions you want
to make explicit.

------
Xcelerate
Could someone who knows a lot about these things tell me why JSON took such a
long time to arrive?

JSON, at its core, is essentially a hierarchy of maps and lists -- which seems
a very intuitive and useful way to store data.

XML on the other hand has always baffled me with its attributes and the
redundant and verbose tags (why do I need <tag attr="data">data</tag>?). I'm
sure there was a good reason at the time for this, so perhaps someone can
enlighten me.

~~~
lmkg
What took the longest time was for a language to come out with key-value maps
as the main core data structure, and a specialized literal syntax. Once that
happened, it was relatively quick for that syntax to become a standardized
interchange format for K-V data.

Lisp had assoc-lists, but those were a convention, not a specialized
structure. Many languages had K-V maps as libraries, but not core structures,
and most lacked literal syntax. Eventually most scripting languages started
getting them as native types, even with literal syntax, but they weren't the
"go-to" data structure for doing things. In Python, for example, all of its
objects are really just hash maps, but when you're working with them you
pretend that they're objects and not hash maps, and you use lists more than
maps anyways.
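
You can see that pretense directly in Python -- the attribute storage behind
an object really is a dict, even though idiomatic code never touches it (the
Point class here is just for illustration):

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
# Attribute storage is literally a dict under the hood...
assert vars(p) == {"x": 1, "y": 2}
# ...but idiomatic Python uses attribute access and pretends the
# hash map isn't there
assert p.x == 1
```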

JavaScript (and maybe Lua) was the first language to build itself around K-V
maps, so it was the first language where idiomatic usage included a lot of map
literals. Like Python, its objects were all really just maps, but unlike
Python it encouraged taking advantage of that fact. Also, because it was on
the web,
there was a lot of need to be serializing data structures and passing them
around. Eventually someone realized "this is much better than XML!" and gave
it a name, and that's how we got where we are today.

XML's popularity is an accident of history, due in part to the rise of HTML,
which is also an accident of history.

~~~
chimeracoder
> Lisp had assoc-lists, but those were a convention, not a specialized
> structure.

I'm not sure what you mean by this. What's the difference, syntactically,
between a convention and a specialized structure?

XML _is_ Lisp. In fact, the XML grammar and Lisp's grammar are (almost)
homomorphic[1]. SXML is a trivial mapping of XML to s-expressions which
demonstrates this.

There's no point in comparing XML and S-expressions like that; they're
essentially the same thing!

If you're talking about internal representation, well, that's up to the
compiler. But since you have to declare the format either explicitly or by
context, there's no 'advantage' of XML over s-expressions.

[1] To be pedantic, XML is homomorphic to SXML, which is a subset of the Lisp
grammar, but that just means that Lisp recognizes some strings that aren't in
the XML grammar, so if anything, Lisp is more powerful, but that's beside the
point.

~~~
lmkg
> _I'm not sure what you mean by this. What's the difference, syntactically,
> between a convention and a specialized structure?_

The big difference is how people look at it, not what it really _is_.
JavaScript has a built-in key-value map data structure. An assoc-list isn't a
built-in data structure, it's a way of using a more primitive data structure
(lists). In particular, assoc-lists don't really _look_ different than normal
lists, so it's a slightly larger mental leap to think in terms of them.
Furthermore, Lisp doesn't use assoc-lists as often as JavaScript uses key-
value maps, preferring flat lists instead, so even if there were a specialized
reader-macro for assoc-lists it wouldn't have been as ubiquitous.

I agree that XML and Lisp grammars are basically interchangeable. My comment
was answering a question about the emergence of JSON, and my comments about
assoc-lists were only in relation to JSON, not XML.

------
lenkite
Well-written XML that was designed for humans instead of machines is much,
much easier to read than JSON. The primary reason is that JSON, unlike
s-expressions or XML, has no block name. In JSON you lose valuable time
figuring out the block context in a hierarchy, since it isn't labelled.

The only kind of JSON that is readable is flat JSON that is nested to a
maximum of 1 level.

~~~
digisign
Either format can be pretty-printed. If you need signposts to figure out where
you are, it is a simple matter to add dictionary key names to things.
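
For example, in Python a couple of json.dumps keyword arguments gets you a
pretty-printed form where the dictionary keys serve as those signposts (the
data is invented for illustration):

```python
import json

data = {"server": {"host": "example.com", "ports": [80, 443]}}
# indent restores the visual hierarchy; keys like "server" and
# "host" label each block much as an XML tag name would
print(json.dumps(data, indent=2, sort_keys=True))
```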

------
nemetroid
I don't foresee JSON ever replacing XML as a "full-blown successor" in the
contexts where XML actually is useful: for marking up documents.

As a general data storage format, XML is certainly going away.

~~~
TazeTSchnitzel
It might for non-text document structures, perhaps.

------
pumba_lt
JSON is not a silver bullet. Actually, I think JSON-only APIs suck -- an API
should have an equivalent XML alternative as well. Let me explain.

Web APIs are not only consumed by client-side Javascript-based AJAX apps --
they are also used by server-side (web)apps where Javascript is much less
widespread. If the primary application language is not Javascript, for which
JSON is a native format, but PHP or Java for example, then its value is much
lower.

There are established industries such as publishing that use complex XML
workflows -- I don't think JSON will push them out.

The XML family so far has much better standard specifications and tool
support. Some of the most useful are XPath and XSLT. There are also advanced
features -- too complex for some, useful for others -- like namespaces and
schemas. If JSON is to expand its use, it will have to face the same
interoperability issues XML addressed, and develop similar features with
similar problems.
That's why the idea of JSON schemas sounds funny to me.

Let me give an example. I've developed a semantic tool that lets me import
3rd-party API data as RDF. If it is available in XML, I can apply a GRDDL
(basically XSLT) transformation to get RDF/XML -- and boom, it's there.
RDF/XML serves as the bridge format between XML and RDF.

Now if the data is JSON-only, what do I do? I could download an API client,
try to write some Java or PHP code, but that would be much less generic and
extensible than XSLT. I could probably try a pivotal conversion via JSON-LD
somehow, but oh, bummer -- there's no JSON transformation language? Or is
there... Javascript? Thanks, I would prefer XSLT any day since it is designed
specifically for this kind of task.

My point is, by offering JSON-only you cut off all the useful tools from the
XML world, which is pretty well established. I see JSON as an alternative
syntax to XML, which is easier to use with Javascript -- but by no means THE
"right tool" to all data serialization problems.

------
nirvdrum
One of my biggest issues with JSON is that it's a lot harder to generate valid
JSON as a stream. Granted, this may be an esoteric use case, but the quoting
rules and type representations seem to require some amount of lookahead, which
isn't fun when generating that stream.
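
One common workaround, sketched in Python: turn the lookahead into one
element of lookbehind by emitting each comma _before_ the element that
follows it, so you never need to know whether another element is coming:

```python
import json

def stream_json_array(items):
    """Yield chunks that concatenate to a valid JSON array,
    without ever buffering the whole document."""
    yield "["
    first = True
    for item in items:
        if not first:
            yield ","  # separator goes before each later element
        yield json.dumps(item)
        first = False
    yield "]"

# Works for any iterable, including ones too large to hold in memory
doc = "".join(stream_json_array(iter(range(3))))
assert json.loads(doc) == [0, 1, 2]
```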

------
antihero
For human-readable stuff, I don't know why we don't use YAML more often. The
serializer is utterly fantastic, though I don't think JavaScript can parse it
quickly.

~~~
tmcw
Poorly implemented parsers, especially in Javascript.

------
streptomycin
> [XML] enabled people to do previously unthinkable things, like exchange
> Microsoft Office documents across HTTP connections.

wat

------
TazeTSchnitzel
I think JSON is more popular than XML for a lot of things simply because it's
so much simpler to interact with. No querying attributes, elements, elements
inside elements, text inside elements, etc. You just look up the value
attached to a key, or look up an index in an array, and that's it. It's simple
at every level down. And it's also simple to construct.

------
Kilimanjaro
JSON is data. XML is markup. JSON won.

Move on.

