
The havoc the XML purists wrought - whyleyc
http://www.scripting.com/stories/2010/05/19/theHavocTheXmlPuristsWroug.html
======
petercooper
_They say that XML is too complicated, but that's wrong. Just ignore
everything but elements, attributes and namespaces. That's what RSS does, and
it's very simple. [..] I don't trust would-be platform vendors who won't
accomodate all possible developers, esp those who use a language that's so
deeply installed as XML is._

He talks as if JSON is somehow "hard" for developers to grok. It's about as
simple as you can get. I don't think it's noble to support a subset of XML,
where regular JSON would fit better, merely to placate some developers used to
doing things the hard way. If anything, API developers should be leading us by
the nose to do things the best way.

Even if you "ignore" much of XML, as Winer suggests, you can still stumble
into trouble. I've written (and used) a few RSS parsers in my time and dealing
with broken XML with a regular XML parser is a gigantic pain in the ass. Many
developers don't sanitize their input properly or dump source HTML or XHTML
verbatim into their <description> elements. Try throwing _that_ through Expat
without thinking about it.. You can screw up JSON too, of course, but it's
almost entirely down to quote marks alone.. not tagging, erroneous character
encodings, nonexistent namespaces, and more.
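
To make the point about raw HTML in <description> concrete, here is a minimal sketch in Python, whose standard xml.etree.ElementTree module is backed by Expat in CPython. The feed items are made-up samples, and `try_parse` is a hypothetical helper named for illustration only. An unclosed `<br>` is enough to make a strict parser reject the whole item, while the escaped version and the JSON equivalent both go through.

```python
import json
import xml.etree.ElementTree as ET  # backed by Expat in CPython

# Hypothetical feed item with raw HTML dumped verbatim into <description>.
BROKEN_ITEM = "<item><description><p>Hello<br>world</p></description></item>"

# The same item with the HTML escaped so it is plain character data.
ESCAPED_ITEM = (
    "<item><description>&lt;p&gt;Hello&lt;br&gt;world&lt;/p&gt;"
    "</description></item>"
)


def try_parse(xml_text):
    """Return the description text, or None if the XML is not well-formed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    return root.findtext("description")


broken_result = try_parse(BROKEN_ITEM)    # <br> is never closed, so parsing fails
escaped_result = try_parse(ESCAPED_ITEM)  # entities are decoded back to the HTML

# In JSON the same HTML is just a string; only quotes and backslashes need care.
json_result = json.loads('{"description": "<p>Hello<br>world</p>"}')["description"]
```

A real feed parser would need some recovery strategy for the broken case. This sketch only shows where the strict parser gives up.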

 _In contrast, Amazon supports XML in their web services, quietly and
competently. I can't imagine them saying one day "It's too much work for us to
keep supporting XML so you all have to rewrite your code now if you want to
keep paying us for the web services you use."_

Amazon's Web Services are good/unique enough for people to put up with the
bullshit of XML in order to use them.. though their command line and Web
interface tools are good enough that I suspect most users never need to get
down to dealing with XML anyway.

~~~
mattmillr
There are plenty of complicated things that aren't complicated anymore if you
ignore the complicated parts. (For example, the global financial system.) I
don't think that line is a valid argument.

------
jmillikin
XML is more complex than it needs to be, but this article is just ridiculous.
XMPP isn't complicated because it uses XML, it's complicated because it's a
modern IM protocol. If anything, using XML makes implementing an XMPP library
significantly easier than if it had its own wire format.

A better comparison than RSS v. XMPP is RSS v. Atom. And in this comparison,
RSS is _by far_ the loser. It's incredibly difficult to write an RSS parser
which can handle even a small fraction of the RSS published today, mostly
because the standard is absolute garbage. In contrast, an Atom parser can be
knocked out in an afternoon.

I'd be happy with a "reduced set" of XML, which excludes stuff like DTDs,
named entities, and references. I've never seen these features put to any
significant use in real life. But most of XML is quite sane (if a bit
verbose).

------
RyanMcGreal
If we follow Winer's advice and use just a simple subset of XML, what does XML
buy us that JSON doesn't also provide with less overhead and better
readability? (Yes, I know JSON doesn't have namespaces, but I'm not persuaded
that namespaces belong in any subset of a data format that can be called
"simple".)

~~~
prodigal_erik
XML Schema is hilariously painful but it at least exists and is in fairly
common use. JSON Schema is still an Internet-Draft that almost nobody seems to
be aware of, so JSON-based systems are still at the "garbage in, garbage out"
level and will only get worse with maintenance.

<http://tools.ietf.org/html/draft-zyp-json-schema-02>

~~~
rams
XML Schemas are well established in areas like insurance. I worked on an
insurance portal in 2001-2002 that made extensive use of the Origo schemas - I
realized that XML is really huge in some areas. To the non-enterprise / HN
startup / web dev types it seems that XML can be wished away, but they have no
idea of how well entrenched it is.

------
billpg
When I've designed XML files in the past, I use the "tags and attributes"
principle. The entire file is tags, and ignored whitespace between the tags.
All the data is in attributes. Looks a lot nicer IMO.

What I don't get, with XML in the real world, is why people prefer this...

<A><B>1</B><C>2</C></A>

instead of my preference...

<A B="1" C="2" />

With the first form, you need to write out each tag name twice, and there's
ambiguity when whitespace appears between tags. Yet that's what most XML out
there looks like. Maybe I'm just odd.
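
For what it's worth, here is a rough sketch of how the two forms look from the consuming side, using Python's standard xml.etree.ElementTree (my choice of parser, not something the examples above depend on). Both end up as the same mapping, but the element form has to be walked child by child.

```python
import xml.etree.ElementTree as ET

element_form = ET.fromstring("<A><B>1</B><C>2</C></A>")
attribute_form = ET.fromstring('<A B="1" C="2" />')

# Element form: each value is the text content of a child element.
from_elements = {child.tag: child.text for child in element_form}

# Attribute form: the values are already a flat mapping on the single element.
from_attributes = dict(attribute_form.attrib)
```

The trade-off is that an attribute name can appear only once per element and an attribute value cannot contain child elements, so repeated or nested data still needs the element form.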

~~~
cema
A choice between a node and an attribute is not always clear, but there are
helpful guidelines out there. A search for _xml nodes versus attributes_ will
give you a list of articles to ponder -- sorry for not being more detailed,
but I am away from my archives, or else I could have given you links to the
places that helped me to visualize it better. Here is one classic, though,
from as far back as 1992: <http://xml.coverpages.org/attrSperberg92.html>

EDIT: ok, the first link is to an IBM Research article which I recall having
been useful: <http://www.ibm.com/developerworks/xml/library/x-eleatt.html>

------
wendroid
I always use the same example as why I dislike XML so much

    
    
        <a>
            <b>hi</b>
        </a>
    

is not the same as

    
    
        <a><b>hi</b></a>
    

But Erik Naggum makes the argument against XML so much more fun

<http://harmful.cat-v.org/software/xml/s-exp_vs_XML>

~~~
elblanco
How are those two examples different? Every XML system I've used would see
those as the same.

I, however, really enjoyed this line from the link. _"I once believed that it
would be very beneficial for our long-term information needs to adorn the text
with as much meta-information as possible. I still believe that the world
would be far better off if it had evolved standardized syntactic notations for
time, location, proper names, language, etc, and that even prose text would be
written in such a way that precision in these matters would not be sacrificed,
but most people are so obsessively concerned with their immediate personal
needs that anything that could be beneficial on a much larger scale have no
chance of surviving."_

This wonderfully, succinctly explains why efforts like the semantic web are
doomed to failure.

~~~
DrJokepu
Technically, they are different. In the first example, there's whitespace
between <a> and <b> as well as between </b> and </a>. Most applications ignore
whitespace but they're not required to - whitespace is not ignored in XML.
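
A small sketch of the difference, assuming Python's standard xml.etree.ElementTree as the parser (other parsers expose it differently, and some can be configured to drop it). The indented document carries whitespace text nodes that the compact one does not have.

```python
import xml.etree.ElementTree as ET

indented = ET.fromstring("<a>\n    <b>hi</b>\n</a>")
compact = ET.fromstring("<a><b>hi</b></a>")

# Whitespace between <a> and <b> is kept as the text of <a>.
indented_text = indented.text

# Whitespace between </b> and </a> is kept as the tail of <b>.
indented_tail = indented[0].tail

# The compact form has no character data in either position.
compact_text = compact.text
compact_tail = compact[0].tail
```

Here `indented_text` is a newline plus four spaces and `indented_tail` is a newline, while both values are `None` for the compact document. It is up to the application to decide whether to throw them away.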

~~~
jpr
I just died a little inside. Knowing that there are people out there that
would inflict this kind of nonsense as a standard for others to use makes one
really lose faith in humanity.

~~~
prodigal_erik
That's the only way to distinguish

    
    
      an <strong>emphasized</strong> word
    

from

    
    
      an<strong>emphasized</strong>word

~~~
scott_s
Which makes perfect sense for documents. So why do we use a markup language
suitable for documents for general data? Whitespace matters in a document, but
does not matter for data. (I'm not accusing you, I'm actually curious if you
know the answer. I've never had to deal with XML.)

~~~
bsaunder
It may matter a lot for some data... When you send a binary file, would you
like all of the zeros stripped? Your insignificant data from one perspective
is possibly significant from another perspective.

~~~
jerf
It is not possible to ship binary over XML. Regardless of encoding used, XML
rigidly forbids the bytes corresponding to non-whitespace ASCII chars below
32: <http://www.w3.org/TR/2008/REC-xml-20081126/#charsets> This makes
arbitrary binary impossible. You have to base64 encode it or something.

Not entirely relevant to your point, but worth bringing up. (XML doesn't ever
mangle binaries, because it simply forbids them.)
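
A minimal sketch of the base64 workaround, assuming Python's standard base64 and xml.etree.ElementTree modules. The `<blob>` element name and the four-byte payload are made up for illustration.

```python
import base64
import xml.etree.ElementTree as ET

payload = bytes([0, 1, 2, 255])  # hypothetical sample of arbitrary binary data

# A raw control character such as U+0001 is not a legal XML 1.0 character,
# so a conforming parser rejects the document outright.
try:
    ET.fromstring("<blob>\x01</blob>")
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False

# Base64 maps the bytes onto plain ASCII letters, digits, '+', '/' and '=',
# all of which are legal character data.
blob = ET.Element("blob")
blob.text = base64.b64encode(payload).decode("ascii")
xml_bytes = ET.tostring(blob)

# The receiver parses the XML and decodes the text back to the original bytes.
round_tripped = base64.b64decode(ET.fromstring(xml_bytes).text)
```

The cost is size: base64 output is roughly a third larger than the input.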

