
XML Can Give the Same Performance as JSON - blacktulip
http://www.infoq.com/news/2013/08/xml-json-performance
======
clarkevans
It's significantly harder to write a compliant and performant parser for XML,
even if one might be able to make it as fast. But, after parsing, the
application developer still has to transform XML data into something usable to
their program. This is much easier with JSON since its information model
(map/list/scalar) matches the information model of most modern programming
languages.

XML is a poor choice for most serialization since it has the wrong information
model. It's designed for extensible document markup and most application data
packages aren’t documents that require markup.

~~~
zdw
I'd argue the opposite, simply because XML has many well defined schema
formats and XPath. If you end up having versions of data formats, or are
dealing with data from an unknown source, XML is far superior, and allows for
easier interchange of data.

The correct way to load an XML document is as follows, IMO:

    
    
      * Open it
      * Run it through the schema to validate its
        structure 
      * Manipulate/access it via XPath, with the ability to 
        assume correct structure as defined by the schema. 
    

Compare to JSON, where the process is

    
    
      * Open it
      * Attempt to access it, even though its structure
        may deviate from the expected
      * Die or try to rollback work when an assumption about
        the data structure isn't met.
    

Thus, most JSON code either trusts the input data implicitly, or is a mess of
"is this where it should be?"/independently re-developed schema equivalent
every time. I am aware of attempts to create JSON schema languages -
unfortunately these are neither in common use, nor well defined enough to be
standards, and they're more verbose and less expressive than XML's RelaxNG
compact syntax.
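
To make the JSON side of that comparison concrete, here is a minimal Python sketch of the hand-rolled "is this where it should be?" check. The `validate_order` function and the `id` / `items` / `sku` field names are hypothetical, made up purely for illustration; only the standard-library `json` module is assumed.

```python
import json

def validate_order(doc):
    """Return a list of problems; an empty list means the shape is as expected.

    Hypothetical hand-written stand-in for what a schema would check.
    """
    if not isinstance(doc, dict):
        return ["top level is not an object"]
    problems = []
    if not isinstance(doc.get("id"), int):
        problems.append("id missing or not an integer")
    items = doc.get("items")
    if not isinstance(items, list):
        problems.append("items missing or not a list")
    else:
        for i, item in enumerate(items):
            if not isinstance(item, dict) or not isinstance(item.get("sku"), str):
                problems.append(f"items[{i}].sku missing or not a string")
    return problems

good = json.loads('{"id": 7, "items": [{"sku": "A1"}]}')
bad = json.loads('{"id": "7", "items": [{"name": "A1"}]}')

print(validate_order(good))
print(validate_order(bad))
```

Every field the program relies on needs its own check like this, which is the re-developed schema equivalent described above.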

JSON is fundamentally a dump of a data structure, not a document format. It
isn't designed for long term data storage, or for inter-compatibility, despite
being used those ways. Thus, it's great if you control both ends and your data
structures don't change, as is the case with most web programming. It
frequently falls down outside that realm.

Now, the above ignores runtime performance concerns. I'd argue (again) that
outside of web apps, the load/store into a data structure is done rarely, or
can be done in the background or in parallel if there's a lot of data
concerned. Also, we're probably talking about less than an order of magnitude
in the vast majority of cases.

~~~
Mindless2112
If you have untrusted input, XML has its own issues [1].

> JSON is fundamentally a dump of a data structure, not a document format.

Exactly! IMHO, it's unfortunate that mark-up documents can be used to
represent other data structures. It isn't what they're meant for, and there
are (or "should be" if you're not fond of JSON) better tools for human-
readable serialization.

[1]
[http://en.wikipedia.org/wiki/Billion_laughs](http://en.wikipedia.org/wiki/Billion_laughs)

~~~
zanny
You mean yaml?

~~~
Mindless2112
Indeed, YAML is a more powerful serialization format than JSON.

The original name (Yet Another Markup Language) threw me off, so I never
seriously considered using it. Even its Wikipedia article is in the "Markup
languages" category.

------
Wilya
Alternate title: "XML performs as badly as JSON".

JSON is used because JavaScript, Python and Ruby (among others) understand it
without needing external libraries, and because it's fairly readable for
humans.

Performance has nothing to do with it. If you want performance and have some
control on your stack, you're better off with something more specialized.

~~~
gioele
Python and Ruby needed external libraries to understand JSON until those
libraries were moved into the respective standard libraries.

~~~
droithomme
Do you use Python or Ruby?

JSON has been in Python's standard library since 2008.

[http://docs.python.org/2/library/json.html](http://docs.python.org/2/library/json.html)

It's also in the Ruby standard library.

[http://www.ruby-doc.org/stdlib-1.9.3/libdoc/json/rdoc/JSON.h...](http://www.ruby-doc.org/stdlib-1.9.3/libdoc/json/rdoc/JSON.html)

~~~
gioele
Your comment just reinforces my point: neither Python nor Ruby has supported
JSON since its inception.

JSON was launched around 2002.

Python incorporated one of the many existing libraries in 2008 after a
lengthy discussion, the last part of which is
[http://mail.python.org/pipermail/python-3000/2008-March/0125...](http://mail.python.org/pipermail/python-3000/2008-March/012583.html)

Ruby had no JSON support for years; it was added, in its current form, to
the stdlib only in 2007.

XML has been supported by Python since 2000 (minidom + sax) and by Ruby since
2003 (REXML).

The idea that JSON is used more than XML just because programming languages
support it "natively" clashes with the evidence.

------
jacques_chester
That's ... not why people use JSON.

~~~
steveklabnik
It is a factor people do bring up in discussions, though, even if it's not the
primary one.

------
zeroDivisible
That's just my opinion, but even if JSON parsing performance were a few
percent worse than XML parsing, I would still try to use it wherever it
made logical sense.

(web services - JSON; configuration files - JSON for simpler files, XML for
more complicated ones AND only if JSON couldn't handle it).

Also, I might be missing something but it appears that this benchmark doesn't
take into consideration libraries used server-side to serialize and
deserialize JSON / XML? I'd say that depending on those, there can be some big
differences.

Not to mention that the overall application design has a bigger impact on how
"snappy" an application feels than the choice of JSON / XML / YAML / anything
else to transmit the data.

~~~
drunkpotato
JSON is great for data interchange. I think better than XML because its
simpler structure encourages simpler interchanges than XML.

For me, a disadvantage in JSON for configuration files is the lack of
comments. My current project uses JSON for config and I sorely miss 1)
comments and 2) unquoted config var names. I'd rather use YAML or another
format with a config loader for whichever language I'm using.
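
A quick way to see both complaints with Python's standard `json` module is sketched below. The `parses` helper and the config snippets are hypothetical examples, not taken from any real project.

```python
import json

def parses(text):
    """Return True if the standard-library JSON parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

plain = '{"port": 8080}'

# Same config with a comment, which strict JSON does not allow.
with_comment = """{
    // port the service listens on
    "port": 8080
}"""

# Same config with an unquoted key, also not allowed.
unquoted_key = "{port: 8080}"

for label, text in [("plain", plain),
                    ("with comment", with_comment),
                    ("unquoted key", unquoted_key)]:
    print(label, "->", "ok" if parses(text) else "rejected")
```

Some parsers in other ecosystems accept comments as an extension, so this only shows what a strict parser does.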

------
jakozaur
The link to the actual article seems to be well hidden:
[http://balisage.net/Proceedings/vol10/html/Lee01/BalisageVol...](http://balisage.net/Proceedings/vol10/html/Lee01/BalisageVol10-Lee01.html)

------
buster
Warning! Highly unscientific, unrealistic test ahead! Running on a
bare-metal, idle server..

    
    
      python -m timeit -r 10 -n 100000 -s 'import json'  'j="""[{"t":"Hello"}]"""; json.loads(j)[0]["t"]'
      100000 loops, best of 10: 9.07 usec per loop
    
      python -m timeit -r 10 -n 100000 -s 'from xml.dom import minidom' 'xml = "<t>Hello</t>"; minidom.parseString(xml).getElementsByTagName("t")'
      100000 loops, best of 10: 74.9 usec per loop
    
      python -m timeit -r 10 -n 100000 -s 'from xml.dom import minidom' 'xml = "<t>Hello</t>"; minidom.parseString(xml).firstChild.firstChild.wholeText'
      100000 loops, best of 10: 76.6 usec per loop
    
    

Given that the "study" comes from a "markup conference" that has XML all over
its website, and given that XML documents (uncompressed, in memory) are much
larger and that XML is a much, much more complex standard, I seriously doubt
the conclusion that XML and JSON are "almost the same" in terms of speed and
memory usage.

Or simply put: How is it technically even feasible that an XML document that,
by its nature, is much more complex (and thus I imagine an XML parser to be
much more complex) can be parsed as fast as JSON with more or less the same
memory footprint?

Wouldn't that only suggest that the JSON parser that is tested is just not as
optimized as the XML parser?

I can imagine that a web browser's XML parser is much more optimized and mature
than its JSON parser, given that a browser mainly needs to parse HTML/XML?

Does it mean I should switch to using XML instead of JSON in command-line
tools?

Or to put it another way: The headline "XML Can Give the Same Performance as
JSON" is probably true for browsers, which have had years to improve their XML
parsers. But I don't think this can be a general conclusion.
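
For anyone who wants to rerun the timings above as a script instead of shell one-liners, here is a rough equivalent using the `timeit` module directly. The documents are the same ones as in the one-liners. The function names and the smaller loop count are choices made for this sketch, and the numbers will differ by machine and Python version, so no expected output is given.

```python
import json
import timeit
from xml.dom import minidom

JSON_DOC = '[{"t":"Hello"}]'
XML_DOC = "<t>Hello</t>"

def read_json():
    # Parse the JSON text and pull out the single string value.
    return json.loads(JSON_DOC)[0]["t"]

def read_xml():
    # Parse the XML text and pull out the text content of <t>.
    return minidom.parseString(XML_DOC).firstChild.firstChild.wholeText

LOOPS = 2000   # far fewer than the 100000 used above, to keep the run short
REPEATS = 5

json_best = min(timeit.repeat(read_json, repeat=REPEATS, number=LOOPS)) / LOOPS
xml_best = min(timeit.repeat(read_xml, repeat=REPEATS, number=LOOPS)) / LOOPS

print(f"json:    {json_best * 1e6:.2f} usec per loop")
print(f"minidom: {xml_best * 1e6:.2f} usec per loop")
```

Like the original, this compares one JSON library against one DOM-style XML library on a tiny document, so it says little about other parsers or realistic payloads.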

~~~
corresation
_Given that the "study" comes from a "markup conference" that has XML all over
its website_

So? The overwhelming majority of JSON advocacy comes from developers who
happen to know JavaScript, and thus JSON appeals to them. Virtually everyone
speaks from the position of self-interest.

~~~
jimfuller
Yes, a bit of a cheap shot, that ... and the guy works at a company that makes
MarkLogic server, which spits out XML ... but it also spits out JSON (all at
very large scale, I may add), so choose your poison.

Until someone replies with the same experimental rigor to refute the
findings, I think the observations this paper makes stand ... that's how
peer-reviewed journals work.

The paper is not saying 'XML is better' or even 'XML is faster' ... it's just
addressing the perception that XML is slow in certain scenarios, which has
become a default myth.

JSON has been accepted as a datatype in the markup conferences of the world;
it's great for data transfer. XML is a compromise on many different levels but
tends to be good for mixed content and documents. I think we've all moved on.

------
delinka
"Use HTTP Compression which most often is the single most important factor in
total performance."

How does this advice stack up against the recent security issues with
compressed HTTP traffic? Is this article's recommendation at the same place in
the transmission stack where this would cause trouble?

~~~
jerf
"How does this advice stack up against the recent security issues with
compressed HTTP traffic?"

It doesn't. The recent issues with HTTPS suggest there may be a fundamental
tension there between performance and security. In fact the recent issues
don't particularly care "where" in the HTTPS connection the compression
occurs, it just has to be inside of it. It won't matter whether you use
standard HTTP compression or roll your own (which on the TCP socket won't look
all that much different anyhow, you'll just be giving up browser support for
automatic decompression).

------
ebbv
Among my reasons for choosing JSON over XML, performance isn't in the top 3.

Whenever I see a modern API that uses XML I gnash my teeth and shake my fist
skyward.

------
Tloewald
Original article:

[http://balisage.net/Proceedings/vol10/html/Lee01/BalisageVol...](http://balisage.net/Proceedings/vol10/html/Lee01/BalisageVol10-Lee01.html)

So his two JSON test cases are eval and jQuery. He does not use JSON.parse.

So this shows that with a lot of hand waving around cases no-one likely cares
about, XML is almost as performant as JSON, even if the code is way, way
uglier.

In an actual real project where you're probably passing lots of fiddly objects
around we have no results.

------
TeeWEE
I'm going to say something bold here: JSON people often don't know what
schema-based interchange formats are, and why they are fast. Such as protocol
buffers, or Facebook/Apache Thrift.

speed: use thrift or protocol buffers

ease of implementation: json

XML has the best of both worlds, and therefore most of the time it is not well
suited. However, the fact that it can support schema (XMLSchema), document
translations (XSLT/XQuery) and query mechanisms (XPath/XQuery) makes it a
format very well suited for big enterprises, where specification is important.

This is why clunky protocols such as SOAP are built on top of XML. It is
future proof (extensibility is more difficult in json). It is schema based
(parsing, validation and language binding are easier).

JSON, for example, doesn't support references between nodes. You can build it
in, but it's not standard.
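
A small Python sketch of what "no references between nodes" means in practice, using only the standard `json` module. The variable names and sample data are made up for illustration.

```python
import json

# Two keys pointing at the very same in-memory object.
shared = {"name": "widget"}
doc = {"first": shared, "second": shared}
same_object_before = doc["first"] is doc["second"]

# After a round trip the sharing is gone: the data was simply written twice.
round_tripped = json.loads(json.dumps(doc))
same_object_after = round_tripped["first"] is round_tripped["second"]

# A node that refers back to itself cannot be serialized at all.
cycle = {"name": "node"}
cycle["self"] = cycle
try:
    json.dumps(cycle)
    cycle_error = None
except ValueError as err:
    cycle_error = str(err)

print(same_object_before, same_object_after, cycle_error)
```

Building it in yourself usually means inventing an id field plus a convention for pointing at it, which every consumer then has to know about.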

JSON is the easy peasy solution, the quick win, the fast enough one, and
therefore the winner. However, XML is the big beast that has it all, and
therefore it's complex. But that doesn't mean it sucks.

------
g8oz
Maybe the same performance but not the same programmer productivity. (Speaking
from experience here)

------
daigoba66
It really all depends on the use case and your target end-points. JSON is
great when the data is transmitted to and from web browsers because of
JavaScript. Nearly every major framework has a really good XML library.
Readability of the payload is usually not a concern, at least in my experience,
because only the software has to deal with it and not a human. Also in my
experience the size difference between a JSON payload and an XML payload is
not a big deal (except perhaps across slow network links where every byte
counts, but there are more succinct and faster binary serialization formats
available).

------
voidr
> David also tries to cover a range of devices, browsers, operating systems
> and networks in his test

This title is misleading, nowhere does it have the word "browser".

------
DamnYuppie
One thing I haven't seen mentioned in this is the size difference between XML
and JSON. At the end of the day that is the main reason I prefer JSON, it is
much more compact when storing it and sending it across the wire. Anyone who
has ever worked in a system where the canonical data representation was XML
will know the horror of seeing a nice 500k file balloon to over 10 MEG just
because it is now in XML...
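
Size claims like this are easy to check on your own data. Below is a rough Python sketch that serializes the same records both ways and compares byte counts. The records are invented, and the element-per-field XML layout is just one possible mapping (attributes would come out smaller), so the ratio here says nothing general.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample data: 100 small flat records.
records = [{"id": i, "name": f"item{i}", "price": i * 1.5} for i in range(100)]

# Compact JSON, no extra whitespace.
json_bytes = json.dumps(records, separators=(",", ":")).encode("utf-8")

# One <record> element per record, one child element per field.
root = ET.Element("records")
for rec in records:
    el = ET.SubElement(root, "record")
    for key, value in rec.items():
        ET.SubElement(el, key).text = str(value)
xml_bytes = ET.tostring(root, encoding="utf-8")

print("json bytes:", len(json_bytes))
print("xml bytes: ", len(xml_bytes))
print("ratio:      %.2f" % (len(xml_bytes) / len(json_bytes)))
```

With HTTP compression turned on the gap usually shrinks, since repeated tag names compress well, so it's worth measuring the compressed sizes too.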

~~~
ygra
Back in uni there was a contest for an efficient XML parser from some company
elsewhere in Europe. The problem was efficient parsing of XML files around 10
GiB and more – apparently the European standard data format for bank
transactions (or the log of them, I don't remember precisely) is XML and for
the Central Bank such large files were not uncommon.

Makes me _really_ wonder why they went with that format.

------
wprl
The difference in speed of XML and JSON is not going to be a bottleneck to
most applications. Of course, the small difference in parsing speed I'm sure
matters a lot to some people.

The major advantage of JSON is a readable syntax. XML tends to be overly
verbose, and therefore not as easy to read.

JSON is more compact too, which matters when sending 1000s of objects across
the wire.

~~~
mindcrime
_The major advantage of JSON is a readable syntax. XML tends to be overly
verbose, and therefore not as easy to read._

That's pretty subjective. For example, I find XML to be far more human
readable than JSON.

------
Millennium
It can give the same performance, but it's more work. For a lot of devs,
that's all that matters.

------
snarfy
If you care about performance, you would use neither format.

Developers choose json over xml because using it and dealing with it _is_ more
lightweight than dealing with the very baroque xml. The fact that the typical
json payload is 1/2 the size of xml is just a bonus.

------
fetbaffe
Next week:

CSV Can Give the Same Performance as JSON

------
jgalt212
this xml vs json debate is getting tiresome but as someone focused more on
getting things done over optimizing processes here are my rules of thumb:

for data transmitters: lean towards json unless your data structures are
deeply nested.

for data receivers: be prepared to handle both (even in cases where json
clearly makes more sense), as large orgs tend to lean towards xml. When an API
offers both serialization options, choose the better one so that, through log
analysis, you can nudge the producer towards the optimal solution.

