
Erik Naggum's wonderful rant about XML - uriel
http://harmful.cat-v.org/software/xml/s-exp_vs_XML
======
olavk
This is the worst kind of internet rant. It uses all kinds of elaborate
similes to make the author (and presumably the sympathetic reader) feel smug
and superior, but has very little technical content to justify it.

A reasoned criticism of XML could note that XML is a quite well-designed
syntax for its intended purpose (domain-specific structured document formats
for interchange on the internet), but that it has grown to be used outside of
this niche for purposes like (non-document) data interchange, RPCs,
configuration files and so on, where the advantages of the XML syntax for its
intended domain turn into disadvantages.

For example, the distinction between elements and attributes is very useful
when marking up documents, like in (X)HTML:

    
    
        <a href="http://harmful-cat">A <em>wonderful</em> rant</a>
    

However, if you want to mark up a data record, which is not intended as a
readable document, like:

    
    
        name: Justin
        address: Copenhagen
        phone: (12)34-56
    

Then the distinction between elements, attributes and content just becomes
superfluous, and the XML syntax needlessly verbose. This kind of data is much
more clearly marked up with YAML, JSON or s-expressions.
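
To make the verbosity point concrete, here is a minimal sketch that serializes the record above three ways and prints the character counts. The `record` wrapper element and the exact s-expression layout are assumptions made for illustration; they are not taken from the rant or from any standard.

```python
import json
import xml.etree.ElementTree as ET

# The data record from the comment above.
record = {"name": "Justin", "address": "Copenhagen", "phone": "(12)34-56"}


def to_xml(rec):
    """Encode each field as a child element of a hypothetical <record> root."""
    root = ET.Element("record")
    for key, value in rec.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def to_sexp(rec):
    """Encode the record as an s-expression of (key "value") pairs."""
    fields = " ".join('({} "{}")'.format(key, value) for key, value in rec.items())
    return "(record {})".format(fields)


xml_text = to_xml(record)
json_text = json.dumps(record)
sexp_text = to_sexp(record)

# Print each encoding with its length so the sizes can be compared directly.
for label, text in (("XML", xml_text), ("JSON", json_text), ("s-exp", sexp_text)):
    print("{:6} {:3d} chars  {}".format(label, len(text), text))
```

For this flat record the XML encoding comes out longest, because every field name is written twice. That says nothing about documents with mixed content, where the comparison goes the other way.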

On the other hand, the link markup above would become pretty convoluted and
error-prone to write using any of these formats.

The "verbose" end-tags like </p>, </body> are very helpful syntax when
manually editing large documents (which may have deeply nested structures
spanning several screenfuls). However, for simple and compact data structures
a simple ")" (or even "}") is easier and clearer. Of course, if the content is
never edited by hand anyway it doesn't make any difference, and you might as
well choose the format that is simplest to parse (or in the very rare
circumstance where bandwidth is the bottleneck, you could choose the format
with the highest content-to-markup ratio).

So if XML syntax is better for structured documents, YAML for configuration
files, and s-expressions for data structures and code, which format is "best"
in general? Should you always choose the optimal format, or does it make sense
to choose the same format everywhere for consistency? For example, a Lisp-based
system might choose to use s-expressions for documentation even if it is a
pain to edit, and conversely an XML-based publishing system might choose XML
for configuration also, even if YAML would be easier to edit. These are just
trade-off decisions.

But reasoned trade-off decisions are not glamorous and don't make you into an
internet hero. If you want to be an internet hero you should write rants that
provide the reader a conceptual framework which allows the reader to feel
smart and superior. In this case, technical details detract from the purpose,
since an informed reader might disagree with technical details, which might
undermine the ego-boost the reader is supposed to feel.

But Erik goes beyond the common smugness, and introduces the concept of the
stupid, moronic (XML-using?) masses which somehow reign over and suppress
the few intelligent (presumably s-expression using) persons. Thereby Erik taps
into the deeply rooted insecurities (and consequently delusions of grandeur
coupled with persecution complex) of many socially-challenged geeks.

~~~
jamesbritt
'The "verbose" end-tags like </p>, </body> are very helpful syntax when
manually editing large documents (which may have deeply nested structures
spanning several screenfuls). However for simple and compact data structures
a simple ")" (or even "}") is easier and clearer.'

The verbose end tags also make it easier to write consistent, robust parsers.
One complaint about SGML was that it was hard to find a tool that correctly
implemented the entire spec. The XML spec is 11 pages.

XML came from a desire to have SGML on the Web. As you've pointed out, people
have used XML where it likely didn't belong.

To be fair, though, once the world had a choice of decent XML parsers and
tools it made sense to use XML for many things, even where the syntax itself
was less than ideal for the given task. The proliferation of JSON parsers will
likely fix a lot of this abuse moving forward.

Still, berating XML for how people misuse it would be like saying Git is crap
because some people use it as a general purpose database, and there are better
ways to design relational databases.

~~~
fauigerzigerk
An 11 page XML spec? That would be surprising. But you're right, XML itself is
pretty simple and useful for document processing. Where things really went
completely awry is XML Schema.

I have read (and implemented) a lot of weird specs in my life, but XML Schema
has to be the worst. What makes it stand out is that it's incredibly
convoluted and completely unfit for purpose at the same time.

~~~
jamesbritt
" Where things really went completely awry is XML Schema."

I think XML worked out OK for its intended purpose because there was a lot of
experience with SGML, HTML, and ad-hoc attempts at "re-purposing" HTML. Folks
could say, well, we tried this and that, and this works and that is painful.
And since it was not assured to be a success, there were fewer major vendors
clamoring to get their fingerprints all over it.

But after XML caught on there was interest from tool vendors to beef things
up, largely with abstractions that had yet to see real-world testing, and with
things that just so happened to require massive IDE support.

The worst may have been the schema stuff, but there's a lot of competition.

BTW, this page <http://www.w3.org/TR/REC-xml/> gives me 40 pages of print
preview. A good chunk consists of appendices, but the main part runs more than
11 pages. I don't recall where I got that number from.

I'll just blame Tim Bray, for lack of a real excuse. :)

~~~
fauigerzigerk
What really surprises me in XML Schema is not so much all the half-baked stuff
they put in, and not even the crazy nesting of complex types, for instance. What
surprises me is what XML Schema cannot do.

One thing it cannot do is probably the most frequently used structure in all
structured documents I have seen. It is to specify that a particular set of
quantified elements can occur in any order.

The reason they gave for not supporting this is that validators would have to
be more than contextless state machines. That's insane. They have created a
schema language that doesn't support the most important schema constraint of
them all for performance reasons.
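
For readers unsure what kind of constraint is meant, here is a minimal sketch of an order-independent occurrence check written by hand. The `validate_unordered` function, the `RULES` table and the `person` documents are all hypothetical names invented for this illustration; this is not how any XML Schema validator works, only a demonstration of the constraint itself.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def validate_unordered(parent, rules):
    """Check child-element counts against (min, max) bounds, ignoring order.

    rules maps a tag name to (min_occurs, max_occurs); a max of None means
    unbounded. Returns a list of error strings, empty when the parent is valid.
    """
    counts = Counter(child.tag for child in parent)
    errors = []
    # Any child whose tag has no rule is reported as unexpected.
    for tag in counts:
        if tag not in rules:
            errors.append("unexpected element <{}>".format(tag))
    # Every rule is checked against how often its tag actually occurred.
    for tag, (min_occurs, max_occurs) in rules.items():
        seen = counts.get(tag, 0)
        if seen < min_occurs:
            errors.append("<{}> occurs {} time(s), need at least {}".format(tag, seen, min_occurs))
        elif max_occurs is not None and seen > max_occurs:
            errors.append("<{}> occurs {} time(s), at most {} allowed".format(tag, seen, max_occurs))
    return errors


# Hypothetical rules: exactly one name, any number of phones, one or two addresses.
RULES = {"name": (1, 1), "phone": (0, None), "address": (1, 2)}

good = ET.fromstring(
    "<person><phone>1</phone><name>J</name><phone>2</phone><address>C</address></person>"
)
bad = ET.fromstring("<person><phone>1</phone><email>x</email></person>")

print(validate_unordered(good, RULES))
print(validate_unordered(bad, RULES))
```

The check needs one counter per element name rather than a position in a fixed sequence, which is the extra bookkeeping the comment above is referring to.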

------
michael_dorfman
A classic rant.

 _"In many ways, the current [2002] American presidency and XML have much in
common. Both have clear lineages back to very intelligent people. Both
demonstrate what happens when you give retards the tools of the intelligent."_

Erik will be sorely missed.

------
CodeMage
_A brief summary, then: Remove the syntactic mess that is attributes. (You
will then find that you do not need them at all.) Enclose the /element/ in
matching delimiters, not the tag. These simple things makes people think
differently about how they use the language. Contrary to the foolish notion
that syntax is immaterial, people optimize the way they express themselves,
and so express themselves differently with different syntaxes. Next, introduce
macros that look exactly like elements, but that are expanded in place between
the reader and the "object model"._

Maybe I'm weird, but to me (this part) sounds like LISP.

~~~
wvenable
"Enclose the /element/ in matching delimiters"

I don't understand this part. What is he proposing this should look like?

~~~
mcav
I don't understand it either. You somehow have to differentiate the tag from
the content; maybe he wanted to reduce the redundancy of the open/close tags.
Something like CSS:

    
    
        h1 { this is a header }
        p  { this is the content. }
    

HTML5-style optional ending tags might also work:

    
    
        <h1>This is a header
        <p>This is some content.
    

(Though I remain confused about which closing tags are optional in the HTML5
spec. I think h1 tags have to be closed.)

~~~
CodeMage
Both of those are valid examples. A LISP-like syntax would also work:

{h1 This is a header}
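
As a rough sketch of how little machinery such a syntax needs, the following reads the brace form into a nested (tag, children) structure and prints the equivalent XML. The grammar is an assumption made for this example (a tag name runs up to the first space, and there is no escaping of literal braces); it is not something Erik specified.

```python
from xml.sax.saxutils import escape


def parse(text):
    """Parse '{tag content}' into (tag, children).

    children is a list of plain strings and nested (tag, children) tuples.
    Assumed grammar: the tag name ends at the first space or brace.
    """
    if not text.startswith("{"):
        raise ValueError("expected '{' at start of input")
    pos = 0

    def element():
        nonlocal pos
        pos += 1  # consume '{'
        start = pos
        while pos < len(text) and text[pos] not in " {}":
            pos += 1
        tag = text[start:pos]
        if pos < len(text) and text[pos] == " ":
            pos += 1  # skip the single space separating tag from content
        children = []
        buf = []
        while pos < len(text) and text[pos] != "}":
            if text[pos] == "{":
                if buf:
                    children.append("".join(buf))
                    buf = []
                children.append(element())  # nested element
            else:
                buf.append(text[pos])
                pos += 1
        if pos >= len(text):
            raise ValueError("missing closing '}'")
        pos += 1  # consume '}'
        if buf:
            children.append("".join(buf))
        return (tag, children)

    return element()


def to_xml(node):
    """Render a parsed node back as XML with explicit end tags."""
    if isinstance(node, str):
        return escape(node)
    tag, children = node
    return "<{0}>{1}</{0}>".format(tag, "".join(to_xml(child) for child in children))


print(parse("{h1 This is a header}"))
print(to_xml(parse("{p This is {em nested} content.}")))
```

Note that the closing delimiter carries no name, so a missing brace is only noticed at the end of the input. That is the editing trade-off discussed at the top of the thread.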

------
tybris
People severely underestimate the damage XML does. Example:

<number>9012853</number>

You just added a data overhead of 500% over using a 4-byte integer, and an
even bigger parsing overhead. Let me guess, now you need to "scale"?
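
The 500% figure can be checked with a few lines. This sketch assumes ASCII encoding for the XML text and a little-endian signed 32-bit layout for the integer; both are choices made for the example.

```python
import struct

value = 9012853

# The XML form from the comment above, as bytes on the wire.
xml_bytes = "<number>{}</number>".format(value).encode("ascii")

# The same value packed as a 4-byte little-endian signed integer.
binary_bytes = struct.pack("<i", value)

# Overhead is the extra size relative to the binary form, as a percentage.
overhead_pct = 100 * (len(xml_bytes) - len(binary_bytes)) / len(binary_bytes)

print(len(xml_bytes), len(binary_bytes), overhead_pct)
```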

~~~
jauco
Serialising a single number that way is indeed worthy of a dailywtf mention;
however, none of that has anything to do with XML:

    
    
        JSON:
          {"number": 9012853}
        YAML:
          number: 9012853
    

It's bad code, no matter the language.

------
antipax
Did he compare XML to rape?

------
diN0bot
wish he'd get to the point a little faster. the rant was pretty good for the
first 20 minutes.

------
trezor
Am I the only one who finds it ironic that his main complaints about XML are
1. that it is verbose and 2. that it requires too many resources to process,
and then he proceeds to write 10 pages trying to express this?

------
BerislavLopac
Boring. All these problems have already been solved by JSON.

