
Deprecating XML - adambyrtek
http://norman.walsh.name/2010/11/17/deprecatingXML
======
patio11
After three years working in Big Freaking Java Web Applications which needed
to frequently talk to other Big Freaking Legacy Applications, which is
pretty much XML's core use case, I struggle to think of a concrete example of
a time where
XML was a clear win over JSON. I spent a lot more time fighting XML (and, more
importantly, libraries built on libraries built on libraries which parsed XML
which failed to interact with libraries built on libraries built on libraries
which parsed XML in a subtly different and incompatible way) than I spent
enjoying the richness of my namespaced mixed content.

We were once, no lie, forced into JSON-over-XML because of technical and
organizational imperatives that had to be reconciled with the need to ship
functional code sometime this year. ("The web services parser is breaking
again!? Screw it. We only need three fields. JSON grab bag, serialize as
string, read string and interpret as JSON on other side.")
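
The workaround described can be sketched in a few lines of Python; the
element name and the three fields below are invented stand-ins, not from the
actual system:

```python
import json
import xml.etree.ElementTree as ET

def to_wire(success, message, record_id):
    """Pack the fields as a JSON string inside a single XML element."""
    blob = json.dumps({"success": success, "message": message, "id": record_id})
    root = ET.Element("payload")
    root.text = blob
    return ET.tostring(root, encoding="unicode")

def from_wire(xml_text):
    """Read the XML, then interpret the element text as JSON."""
    root = ET.fromstring(xml_text)
    return json.loads(root.text)

wire = to_wire(True, "Record deleted.", 42)
data = from_wire(wire)
```

The XML layer satisfies the organizational mandate; the JSON payload carries
the data the applications actually care about.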

~~~
ElbertF
Try saving a marked up document as JSON.

~~~
wvenable
You make a good point, yet outside of XHTML and perhaps XSLT, I've never seen a
document marked up as XML. I have, however, worked on hundreds of projects
that used XML as a structured data exchange format, for remote procedure
calls, or for configuration.

------
wccrawford
"XML deals remarkably well with the full richness of unstructured data."

What? No, it doesn't. XML has even more structure than JSON does. Unless
you're using it improperly, and then you're negating all the advantages that
XML was supposed to bring.

It wasn't for 'mixed content'. It was for system interoperability.

The advantage that XML brings over JSON is that you can have an external
document that you can use to validate any XML you're sending or receiving. You
can guarantee it's formatted correctly. JSON provides no such guarantee.

Not that that's necessarily a bad thing.

~~~
Groxx
It only provides that guarantee because the parsers have agreed to do so (and
not all do, nor do they do so identically). It's a guarantee by communal
agreement, nothing more.

There's no reason a JSON format can't do the exact same thing. JSON-RPC, for
instance, defines its own rules on top of JSON which anything using it must
conform to. How hard would it be to make a set of JSON interchange rules which
specify external schema documents?

edit: answer:

    
    
      {json_strict_version:1, schema:{external:[url,url], internal:[{schema},{schema}]}}
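
For illustration, a toy validator for that envelope might look like this; the
rule that at least one external or internal schema must be declared is an
invented example, not part of any real spec:

```python
import json

def check_envelope(text):
    """Validate the hypothetical 'json_strict' envelope sketched above."""
    doc = json.loads(text)
    if doc.get("json_strict_version") != 1:
        raise ValueError("unsupported json_strict version")
    schema = doc.get("schema", {})
    # Require schemas declared either externally (by URL) or inline,
    # mirroring the external/internal split in the example.
    if not (schema.get("external") or schema.get("internal")):
        raise ValueError("no schema declared")
    return doc

env = '{"json_strict_version": 1, "schema": {"external": ["http://example.com/s.json"]}}'
doc = check_envelope(env)
```

Which is the point: the "guarantee" is whatever the parsers agree to enforce.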

~~~
locopati
If I'm reading you right, you're saying that XML only provides a guarantee
because of an abstract agreement between parsers? Nonsense. XML has a
specification for what constitutes a well-formed document and an XML Schema
defines rules that assure validity. XML may be verbose and the libraries may
be a pain to work with (though there are some well-written ones out there) and
there may be other flaws you want to bring up, but this is not a defensible
point.

~~~
Groxx
XML spec: <http://www.w3.org/TR/REC-xml/#sec-conformance>

> _Conforming XML processors fall into two classes: validating and non-
> validating._

Yup, that's a guarantee alright. A guarantee consisting of the people who
wrote the parser agreeing to write DTD-parsing code. Or not. Depends on your
parser.

It has a specification for validation _because a specification for validating
it has been written_, based on agreement on the specification _and
implementation by the parsers_, not because XML has some inherent, magical
validatability quality.

edit: this is all on top of the fact that a spec is just a spec until someone
implements it. Plenty of specs have been chucked because others did something
else, or have attributes which nobody implements (how many email forms accept
the full set of RFC-compliant addresses?). A spec _is_ an abstract agreement.

~~~
dfox
I have never seen anything that implements a validating XML parser as intended
by the specification and is actually used in production code. Half of the XML
consumers I've seen in production implement namespaces in various broken ways
(like depending on the exact values of prefixes).

------
DanHulton
God, but I hate XML, and sad to say, for the same reason a lot of people hate
PHP.

I hate XML because it is so misused. People try to use it for EVERYTHING, and
you get these monstrous XML trees with legions of attributes, when what you're
trying to send could be comfortably expressed as "{success: true, message:
'Record deleted.'}".

I admit my ignorance, at least. I'm sure that were I properly shown well-used
XML and these wonderful and vast tools that the competent XML worker has at
their disposal, I'd be as happy to work with XML as I am to work with PHP.

That said, you won't ever catch me trying to write a 3D FPS in PHP, and I hope
that if you ever DID catch me doing so, you'd give me the exact same fish-
slapping that XML abusers deserve.

~~~
palish
At the game studio I work at, they use XML for all scripting tasks: gameplay
scripts and UI scripts.

As in, XML nodes are "game script commands" which are executed sequentially.

~~~
Udo
That's kind of like XSLT, a language with an XML syntax. I can see how that is
an appealing use of XML since it basically is a nice representation of a high-
level syntax tree.

I believe you have come up with a rare example where XML is decidedly better
suited than JSON!

However, the majority of XML use cases are for document-interchanging
applications and the way those are being handled requires massively bloated
toolchains of poorly inter-operating software.

~~~
eru
If you want a language with a JSON-like syntax, consider using JavaScript. (If
you want a language with a uniform syntax, that's easy to parse, choose some
form of Scheme or Forth.)

~~~
Udo
The point, though, was that I kept wondering what specifically XML does
better than JSON, since the article didn't say. It occurred to me that one
way to make use of XML's inherent complexity is indeed to represent an
instruction tree. But the question here is not which language is best, or
even whether JSON can be used for the same purpose; it was merely the
"discovery" of an inherent XML feature that would make reasonable developers
prefer it for a certain task. The whole point is moot, however, since XML is
actually intended for structured information interchange, regardless of any
recent efforts to portray it otherwise, and that's a task which has been
thoroughly taken over by JSON in the last few years.

> _If you want a language with a JSON-like syntax, consider using JavaScript._

You're probably right to make that joke, though: since JSON is based on
JavaScript, it certainly owes its popularity mostly to the browser environment.

~~~
simonw
"it certainly owes its popularity mostly to the browser environment"

I don't think that's a given. JSON is useful because it provides a clear,
compact syntax for the most commonly useful data structures - lists, dicts,
strings, numbers and booleans. That's certainly why I switched to it over XML
(and serialised PHP objects).

------
sigzero
These XML vs. JSON debates are tiring. The two address entirely different
problem domains: XML is for documents and JSON is for data. It is not the
fault of the XML spec that programmers go nutso with it outside of its problem
domain.

~~~
j_baker
I don't understand your point. Documents _are_ data, are they not?

~~~
alavrik
Documents are semi-structured data; that is, they are a mix of text and
structured data elements put into a structured hierarchy.

------
haberman
XML's bad, JSON is better.

But Protocol Buffers are best. Like JSON, the data model for Protocol Buffers
maps nicely onto simple data structures (unlike DOM). But with Protocol
Buffers you also get a schema for free, a wicked efficient binary format _if
you want it_, default values, etc.

 _And_ you can use JSON as a text format. In other words, you could take your
JSON that you have sitting around, whip up a Protocol Buffers .proto file for
it, and get nice generated C++ classes for it with full schema validation.
Your JSON file:

    
    
        {"field1": "foo", "field2": 5}
    

...could be accessed from a C++ object as:

    
    
        my_obj.field1();
    
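
The .proto file for that example might look something like this; the message
name and field numbers here are made up for illustration:

```proto
// Hypothetical proto2 schema for the JSON above.
message MyObj {
  required string field1 = 1;
  optional int32 field2 = 2;
}
```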

The only bummer about Protocol Buffers is that support for high-level
languages (PHP, Perl, Ruby, etc.) is not very good. I've been working on a
separate implementation of Protocol Buffers in C to address this (by making it
easy to write bindings for) but this project has unfortunately been stalled as
I've been busy with work and life. :(

~~~
prodigal_erik
The protobuf encoding carries a ton of redundant crap around (if not as much
as text formats). You can do a lot better by paying more attention to the
schema and omitting facts that aren't allowed to change, like tags on self-
delimiting required fields and arrays (ASN.1 PER does this well).

The protobuf tradeoffs only make any sense if you think you can somehow do
something useful with messages that are somewhat but not entirely corrupted,
because you still have to solve the problem of finding intact field boundaries
without being given any HDLC-style framing.

~~~
haberman
> The protobuf encoding carries a ton of redundant crap around (if not as much
> as text formats).

[citation needed]

One of the features of protocol buffers is that they're backwards and forwards
compatible. You can add and remove fields, change "required" to "optional" and
back again, and still make sense of what comes to you on the wire. I don't
think there's much of anything that can be eliminated from the protobuf binary
format.

Eliminating tags for a field (even if _you_ consider it required) wouldn't be
backward compatible with a previous version of the protocol that considered it
optional.

A message from a previous version of the schema is not "corrupted." Being able
to make wire-compatible changes to the protocol is an extremely important
feature.

~~~
prodigal_erik
You can't add or remove required fields, because outdated recipients will be
dangerously wrong about whether they understand the intent of a message in the
revised format. At some point meeting new requirements and keeping
interoperability calls for deprecating and replacing the old format, because
treating everything as optional (which some Googlers apparently do, avoiding
"required" completely) is almost as bad as having 2^n mostly-untested formats.

~~~
haberman
This is the kind of over-engineering analysis that leads to overly complicated
systems like XML schema.

Just because a field says "optional" doesn't mean it's logically optional. You
don't have to make your schema formalism complex enough that it can describe
every last rule of what it takes for a message to be valid. In fact you
definitely don't want to do that, because it's a horrible amount of complexity
in the schema for little gain.

Yes, it's true that some Googlers use "optional" instead of "required"
everywhere in their .proto files. That doesn't mean that you can omit any
field and expect your message to be processed by your peer without error. It
just means that you won't get an error _at the schema validation level._ But
the application could still throw an error. More complex rules about what
fields must be specified or what values they must have can be described in
comments, and enforced with custom validation if necessary.

Also, since protobufs support default values, you can define what value will
be returned for scalar fields if no value is explicitly sent. This can often
be used to define useful default behavior for the case where a field is
omitted.

~~~
prodigal_erik
It comes down to different design choices. Any constraints that are missing
from the schema have to be recreated separately in each implementation, and I
see them as unlikely to do that consistently enough to interoperate well. So I
prefer not to pay at runtime for expressing many variations of the protocol,
when I don't expect them to work anyway. Like HTML vs. XHTML—I shudder to
think how much work was wasted trying to handle the worst tag soup imaginable,
simply for lack of a well-formedness (or DTD validity) requirement.

~~~
haberman
> Any constraints that are missing from the schema have to be recreated
> separately in each implementation

Not true at all. The server can implement them -- once -- and any client who
makes an invalid request to the server will get an error message. These
constraints can be expressed in comments in the interface (.proto) file.

> Like HTML vs. XHTML—I shudder to think how much work was wasted trying to
> handle the worst tag soup imaginable, simply for lack of a well-formedness
> (or DTD validity) requirement.

HTML and XHTML are a completely different ball of wax. Insisting on even
well-formedness is simply unreasonable in practice, because it is so difficult
to ensure, and it is the user who pays the price when the software isn't
perfect.

If you're still convinced that the world would have been better if strict
XHTML had won, you should read:
<http://diveintomark.org/archives/2004/01/14/thought_experiment>

------
nradov
The lack of a standard for handling namespaces is a fatal flaw in JSON for
many applications. Some of the healthcare data exchange formats we deal with
require mixing multiple XML namespaces together; it's complex, but it works.
Trying to do the same in JSON would be a huge mess.
<http://blogs.sun.com/bblfish/entry/the_limitations_of_json>
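
As a sketch of what that mixing looks like (the namespace URIs and element
names here are invented, not from any real healthcare standard):

```xml
<record xmlns:pat="urn:example:patient" xmlns:rx="urn:example:pharmacy">
  <pat:name>Jane Doe</pat:name>
  <rx:prescription rx:dose="10mg">Example drug</rx:prescription>
</record>
```

Each vocabulary keeps its own element and attribute names without collisions,
which is exactly what JSON has no standard mechanism for.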

------
JonnieCache
IMO the biggest advantage of JSON on the web is a simple one: fewer bytes on
the wire.

EDIT: also the concise yet all-encompassing nature of its home at
<http://www.json.org>

~~~
kenjackson
Is this really true? It seems like they would be remarkably close in number
of bytes on the wire, except in cases where you generate really poor XML. In
fact, in some basic examples I quickly looked at, I think XML actually put
fewer bytes on the wire.

I do have to agree with the author, though. This feels like a really "meh"
discussion.

~~~
xiongchiamiov
If the actual data is small compared to the tag names, then XML will be
significantly larger.

~~~
JonnieCache
Also, if the data scheme is very simple and only for consumption by the
person who designed it, one can eschew tag names altogether and just have an
array, relying on the order of elements to define meaning.
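
As a rough illustration of both points (tag overhead on short payloads, and
the positional-array trick), here are byte counts for one small record; the
field names are invented:

```python
import json

# The same two-field record in three encodings.
as_xml = '<record><id>7</id><ok>true</ok></record>'               # hand-written XML
as_json = json.dumps({"id": 7, "ok": True}, separators=(",", ":"))  # {"id":7,"ok":true}
as_array = json.dumps([7, True], separators=(",", ":"))             # positional: [7,true]

print(len(as_xml), len(as_json), len(as_array))   # 40 18 8
```

When the data is this small, the tag names dominate the payload; the
positional array pushes the savings further at the cost of self-description.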

------
andybak
And I'm looking forward to a real world example of 'width' where JSON fails
and XML shines.

------
Goladus
I have the same reaction as the author, although for perhaps a slightly
different reason. XML has far less syntax than JSON, and strings are the
default input. So XML is better when you have large chunks of text, want to
add structure and meaning to abstract subcomponents, and want it to be
human-readable. JSON is better when you need to organize lots of little
pieces of data.

Unfortunately, neither use case is typical. Usually it's a mix of both, and
it's not worth using both formats so you have to pick just one.

------
seanalltogether
I spend all my time doing front-end development and I'm not sure what the
author means by "I look forward to seeing what the JSON folks do when they are
asked to develop richer APIs". The only deliberately unstructured XML I've
worked with is document formats. Where are these rich unstructured APIs that
wouldn't otherwise be classified as lazy?

------
grovulent
I'm no expert on these things...

But generally what I want out of a data stream is a set of name/value pairs.
That's it. I really don't understand why, but trying to get those name/value
pairs from an XML-based API is always an adventure. Well, I do know why: it's
because everyone structures their data differently when they create XML.
Sometimes the name I'm after will be a node name. Sometimes it will be the
value of a text node. Sometimes the value that properly maps to that name will
be a text node three layers deep, and sometimes it will be an attribute value.

I'm still a newb, but I've worked with about 8 different APIs now. For the
JSON APIs I wrote one recursive function that worked on all of them to get
the data I wanted. It was about 8 lines of code. For the XML APIs, I STILL
haven't figured out how to write a recursive function that works for even ONE
of them, let alone all of them (while relying on Python's minidom to do the
actual parsing for me).
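
The recursive function isn't shown, but something in this spirit (the name
`find` and its exact behavior are guesses) fits in about eight lines of
Python, because parsed JSON is just dicts and lists:

```python
def find(data, name):
    """Recursively collect every value stored under `name`, at any depth."""
    hits = []
    if isinstance(data, dict):
        for key, value in data.items():
            if key == name:
                hits.append(value)
            hits.extend(find(value, name))
    elif isinstance(data, list):
        for item in data:
            hits.extend(find(item, name))
    return hits

doc = {"user": {"name": "ada", "tags": [{"name": "admin"}]}}
names = find(doc, "name")   # ['ada', 'admin']
```

The XML equivalent has to decide, per API, whether the name lives in an
element name, an attribute, or a text node, which is why no single function
covers them all.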

------
wvenable
I remember when XML was still young and I was introduced to XML-RPC. It was
a truly beautiful protocol -- straightforward, human readable, human
writable, and easy to implement. I wrote desktop apps that connected to web
services
written in different languages on different platforms and it all worked.

Then SOAP happened. The same philosophy as XML-RPC gone horribly wrong: a
human readable format no longer readable by humans, insanely complex, and hard
to implement properly (even with the libraries). Frustrating all around.

Although JSON is much simpler than XML, there's no reason why someone couldn't
invent something as horrible as SOAP (or XSLT) on top of it. I can only hope
that JSON implementers see the value of keeping things simple. Perhaps the
existence of XML as an "enterprise" technology will help differentiate JSON in
that way.

~~~
roel_v
I used XML-RPC for this exact purpose (desktop to 'web service' -- now
there's a buzzword I haven't heard in 5 years!) early this year, and it was
awesome. On the server side I had PHP's SimpleXML API, which also made pretty
much all of the pain of parsing XML go away.

------
dstein
JSON falls flat when trying to define data types and serialize objects.

    
    
      var foo1 = new Foo1();
      foo1.bar1 = new Bar1();
      var foo2 = new Foo2();
      foo2.bar2 = new Bar2();
    

If you were to serialize that data using XML you might do this:

    
    
      <Foo1 name="foo1">
        <Bar1 name="bar1"/>
      </Foo1>
      <Foo2 name="foo2">
        <Bar2 name="bar2"/>
      </Foo2>
    

But now what if you use JSON? You'll probably end up with something like this:

    
    
      {
        "foo1" : {
          "class" : "Foo1",
           "@bar1" : {"class":"Bar1"}
         },
        "foo2" : {
          "class" : "Foo2",
          "@bar2" : {"class":"Bar2"}
        }
      }
    

Generally it makes sense to use XML for imperative data structures, and use
JSON for functional data structures.

~~~
j_baker
I don't see what you're getting at. First of all, how is example 1 better than
example 2? And how does that prove that JSON is better for functional data
while XML is better for imperative data?

~~~
dstein
It boils down to JSON being schemaless. In XML the data type is implicitly
defined by the tag, whereas in JSON you need to arbitrarily encode the
datatype as a property and somehow differentiate between properties and child
elements/objects.

~~~
murrayh
How about this?

    
    
        <Foo1 name="foo1">
            <Bar1 name="bar1"/>
        </Foo1>
        <Foo2 name="foo2">
            <Bar2 name="bar2"/>
        </Foo2>
    
    
        [["Foo1", {"name" : "foo1"}, [
            ["Bar1", {"name" : "bar1" }]
         ]],
         ["Foo2", {"name" : "foo2"}, [
            ["Bar2", {"name" : "bar2" }]
         ]]]
    

I personally wouldn't do this, but the arbitrary encodings between the two
samples are now equivalent.

~~~
dstein
No, I think you proved my point.

~~~
murrayh
I still don't understand. What makes the XML version better than the JSON
version? Why is creating XML like that a good solution, while creating JSON
like that is not?

~~~
dstein
The XML is better because when the data types directly relate to tag names
it is simple to understand, read, write, and process, and there are tons of
XML libraries in every language that do object-XML serialization
automatically. With JSON, since there are no types, you must manually write
your own data-object serialization and embed the type system right into the
data structure itself.

~~~
j_baker
It sounds like that might be better for a statically-typed language where you
might need specific types in the data file. But I would argue that not only is
it overkill in a dynamically-typed language, it's bad. We're talking about
data here. There's no reason to couple it to data types in your code.

Plus, serialization to JSON in a dynamically-typed language is pretty much
automagic compared to what you have to do with XML.
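
A sketch of that "automagic" round trip in Python; the `Point` class is just
an invented example:

```python
import json

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(3, 4)
wire = json.dumps(p.__dict__)       # serialize: '{"x": 3, "y": 4}'
q = Point(**json.loads(wire))       # deserialize back into a Point
```

No schema, no mapping layer: the object's attribute dict already is the JSON
structure, which is the dynamic-language advantage being claimed.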

------
sielskr
In various conversations on the internet, starting with comp.text.sgml where
XML was born in the 1990s, I have seen many signs that someone hopes or
expects some of the immense richness in meaning that humans assign to words
and other symbols to be carried along by the XML.

In other words, I've seen many special cases of the mistaken belief that
programmers of today can do things that only researchers of the future
wielding human-level AI will be able to do -- and that just _has_ to cause
grief if those hopes and expectations inform IT decisions.

------
chipsy
I am a huge JSON fan, but I have still managed to find a few cases where I can
make use of XML - the mixed-mode situations the author describes.

To generalize my viewpoint: XML is a good way to mix data and data processing,
or mash up multiple DSLs in polyglot fashion. When stream-parsed, one can
imagine great use-cases of XML, where an early node adds context, references
and functionality for other data later in the document. But it sucks for plain
old standalone-structured data, which is the kind of data we usually like to
_store_.

------
joseacta
I think they shouldn't be mixed. JSON has its strong attributes, as does
XML. I started working with XML in 1999 and was building AJAX-style web
applications at that time using XML/XSL and Internet Explorer's
XMLHttpRequest. So now we have JSON -- good, an excellent addition. It's fast
and simple to implement.

But deprecate XML? I don't think so. Just imagine an EDIFACT document in
JSON.

------
DjDarkman
I find XML to be too heavy and complicated for an interchange format.

Quote: "any damn fool could write a better data interchange format than XML"

~~~
prodigal_erik
There are hilariously terrible protocol designers out there, and too often
nobody with taste manages to stop them. The EDI standards are often cited as
examples of how much worse than XML it could get.

<http://en.wikipedia.org/wiki/EDIFACT#Example>

------
latch
an example would be useful

------
plq
one good property of xml is that it can double as a database: it can be
queried once it's inside, say, postgresql's xml type.

json lacks that level of support from mainstream relational databases, which
makes it a waste of time for certain use-cases.

~~~
simonw
<http://wiki.postgresql.org/wiki/JSON_datatype_GSoC_2010>

------
jtchang
XML just gives me a headache.

I'd just like to point out that there was a time before XML. Fixed-width
formats are still very common when working with mainframe systems.
Surprisingly, they
aren't actually that bad to work with.

------
ilovejson
I agree XML has its uses, but I would go for JSON first, XML somewhere after
JSON. Any day.

I still love JSON, <http://ilovejson.com>, for its readability and
simplicity.

------
est
I remember a while ago I read a submission on HN (or reddit?) saying that a
founder of the Jabber protocol was ditching XML for something new. Does
anyone remember?

------
b-man
How about SXML?

<http://okmij.org/ftp/Scheme/SXML.html>

------
johnm
FWIW, if you change "unstructured" to "semi-structured" then Norm's argument
works much better.

