
Erik Naggum's XML Rant from 2002: Disturbingly prescient - batista
http://www.schnada.de/grapt/eriknaggum-xmlrant.html
======
dhconnelly
The title of this post should be "how not to disagree with others in a public
forum". The tone is alienating and disrespectful, so much so that the valid
criticisms of XML as a technology are completely buried by the arrogance and
off-topic political and social commentary: "the current American presidency
and XML have much in common", "XML is the drug-addicted gang member who had
committed his first murder before he had sex, which was rape."

If we want computing to be a friendly and welcoming field and contribute to
the betterment of humanity, we can't tolerate people who behave like this in
public.

~~~
batista
> _If we want computing to be a friendly and welcoming field and contribute to
> the betterment of humanity, we can't tolerate people who behave like this in
> public._

Other people can't tolerate an overly-polite, overly-cautious, overly-friendly,
overly-politically-aware, overly-PC tone.

It makes us drowsy and keeps the discussion at a level better suited to
enterprise brainstorming meetings and committees. Everybody loses.

Tech and science leaders were very frequently passionate and prone to rant.
Just think of Torvalds' reactions to (what he thinks are) stupid ideas on the
kernel list -- even if he's wrong, it helps keep everybody vigilant and
agitated for his contributions.

> _If we want computing to be a friendly and welcoming field and contribute to
> the betterment of humanity, we can't tolerate people who behave like this in
> public._

The aim is not to make computing a "friendly and welcoming field" (that would
be the Boy Scouts), it's to make it an effective, productive field that is
interesting to work in.

If you start judging contributions to a field by the politeness of "behavior in
public", then you're doing the field a disservice.

The biggest scientist in some field could just as well be a huge jerkoff -- for
example, Dijkstra was known for his passionate rants and snarky tone against
everything he considered a bad practice.

~~~
dhconnelly
You're setting up a false dichotomy. There is a middle ground between behaving
like an asshole and too much diplomacy. Do you really think respectful
disagreement is the same as being "overly-PC"? Torvalds is a little harsh
sometimes, but it's nothing compared to the disgusting rudeness of this rant.

~~~
jusben1369
Yes it's a great point. Too often we try and reduce things to black and white
when nearly everything is a shade of gray.

~~~
gruseom
I've never understood that "shades of gray" line. How about some color?

Edit: this seems to need clarifying. I understand what the line _means_. What
I don't understand is why people use it so much when it repeats the error it
purports to correct. In nearly every case where "reality is either 0 or 1" is
wrong, "reality is some coefficient of a single variable between 0 and 1" is
just as wrong. That is a poor way to champion the richness of reality. What's
interesting is how it corresponds to the emotional crimpedness of a world in
which everything is gray.

~~~
effigies
It's a gradient, not a binary.

Adding colors to the metaphor adds words with no increase of understanding.

~~~
gruseom
Understanding is not disjoint from emotion.

~~~
coderdude
What is this, arguing for the sake of arguing? He answered your question
perfectly. Adding color for "emotions" makes absolutely no sense in this
context.

------
dkarl
I can't believe the discussion about this article has devolved into a
discussion about whether a particular writing style, which has been a witty
and entertaining vehicle for argument for thousands of years, should be
expunged from the programming community as if it were a pathology. A
preference for civility is one thing, but an inability to distinguish between
different pieces of writing according to their intent, effect, and value is
another thing entirely. Naggum's target here is XML (a juggernaut) and the
entire community behind it (another juggernaut), not an individual poster. He
isn't bullying anyone or singling anyone out personally. He also writes with a
great deal of experience and insight into this particular problem, and his
tone is very well-suited to communicate his experience and insight.

The programming community has its share, and maybe more than its share, of
people who hurt the community more than help it because of how they treat
other people, and also people who could learn to participate in a much more
constructive way. The matter can't be oversimplified, though. There are no
simple rules that can be applied. Sure, you could describe the post as
aggressive, intolerant, and irritable. Most writing that can be described as
aggressive, intolerant, and irritable could be improved by making it less so.
Then again, most things described as "pungent" -- rotting garbage, my feet --
could be improved by making them less so, but not Camembert.

~~~
erikpukinskis
What you find "witty", I find boring. What is so interesting about dozens of
ad hominem attacks? They're not even clever insults... just long adverbs
(staggeringly) attached to playground names (idiot) and misogynistic,
nonsensical analogies (rape).

~~~
dkarl
Well, that's one way of reacting to it. Many people who found Christopher
Hitchens delightfully witty decided he was no longer so witty when he became a
neocon and started writing things they disagreed with. That only means they
never appreciated him as a writer, but only as a cheerleader for their
particular side.

 _misogynistic, nonsensical analogies (rape)_

Being offended at horrible things treated lightly in a piece like this is the
surest sign one has misplaced one's sense of humor.

------
sciurus
The heart of his suggestions:

Remove the syntactic mess that is attributes. (You will then find that you do
not need them at all.) Enclose the /element/ in matching delimiters, not the
tag. These simple things makes people think differently about how they use the
language. Contrary to the foolish notion that syntax is immaterial, people
optimize the way they express themselves, and so express themselves
differently with different syntaxes. Next, introduce macros that look exactly
like elements, but that are expanded in place between the reader and the
"object model". Then, remove the obnoxious character entities and escape
special characters with a single character, like \, and name other entities
with letters following the same character. If you need a rich set of
publishing symbols, discover Unicode. Finally, introduce a language for micro-
parsers than can take more convenient syntaxes for commonly used elements with
complex structure and make them /return/ element structures more suitable for
processing on the receiving end, and which would also make validation
something useful. The overly simple regular expression look-alike was a good
idea when processing was expensive and made all decisions at the start-tag,
but with a DOM and less stream-like processing, a much better language should
be specified that could also do serious computation before validating a
document -- so that once again processing could become cheaper because of the
"markup", not more expensive because of it. But the one thing I would change
the most from a markup language suitable for marking up the incidental
instruction to a type-setter to the data representation language suitable for
the "market" that XML wants, is to go for a binary representation.
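The escaping part of that suggestion is easy to make concrete. Below is a minimal sketch, assuming the only characters that need escaping are the delimiters `<` and `>` plus the escape character `\` itself, with everything else left to Unicode; the function names are hypothetical, and the other half of the idea (naming entities with letters after the backslash) is not implemented.

```python
from xml.sax.saxutils import escape as xml_escape

# Assumed set of special characters for this hypothetical scheme: the element
# delimiters and the single escape character itself.
SPECIALS = "\\<>"


def naggum_escape(text: str) -> str:
    """Prefix every special character with a single backslash."""
    return "".join("\\" + ch if ch in SPECIALS else ch for ch in text)


def naggum_unescape(text: str) -> str:
    """Drop each escaping backslash and keep the character that follows it."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, None)
            if nxt is None:
                raise ValueError("dangling escape character at end of input")
            ch = nxt
        out.append(ch)
    return "".join(out)


sample = 'a < b & "c" \\ d'
print(naggum_escape(sample))  # a \< b & "c" \\ d
print(xml_escape(sample))     # a &lt; b &amp; "c" \ d
```

Note that `&` needs no escaping at all once named character entities are gone, which is part of the point of the proposal.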

~~~
olavk
He basically suggest turning XML into s-expressions. Eg.

    
    
        <p>Hello, <a href="http://example.com">world</a></p>
    

Turns into something like:

    
    
       <p "Hello," <a <href "http://example.com">"world">>
    

Not an obvious win as far as I can tell; I wouldn't like to fix an unmatched
bracket in a document ending in >>>> rather than e.g.
</form></div></body></html>. But your mileage may vary.
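For anyone who wants to try the notation on real documents, here is a rough Python sketch that rewrites an ElementTree element into the element-delimited form above, turning attributes into child elements as Naggum suggests. `to_naggum` is a hypothetical name, and the sketch does no escaping, so it assumes text and attribute values without quote characters.

```python
import xml.etree.ElementTree as ET


def to_naggum(elem: ET.Element) -> str:
    """Render an element as <name child ...>: the delimiters enclose the whole
    element rather than the tags, and attributes become child elements."""
    parts = [elem.tag]
    # Attributes become ordinary child elements, e.g. href="x" -> <href "x">.
    for name, value in elem.attrib.items():
        parts.append(f'<{name} "{value}">')
    if elem.text:
        parts.append(f'"{elem.text}"')
    for child in elem:
        parts.append(to_naggum(child))
        if child.tail:
            # Text following a child element belongs to the parent, in order.
            parts.append(f'"{child.tail}"')
    return "<" + " ".join(parts) + ">"


doc = ET.fromstring('<p>Hello, <a href="http://example.com">world</a></p>')
print(to_naggum(doc))  # <p "Hello, " <a <href "http://example.com"> "world">>
```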

His other ideas have more or less been implemented, with varying success. XML
Schemas have been introduced, which are more powerful than DTDs. Some schema
languages can indeed do complex computations, for better or worse.

The macro-expansion approach is how XSL-formatting worked, and it turned out
it was not as powerful or elegant as the CSS approach. It made sense when
coming from a publishing background, but didn't predict dynamic interactive
web pages.

~~~
antidoh
"Not an obvious win as far as I can tell, wouldn't like to fix an unmatched
bracket in a document ending in >>>> rather than e.g.
</form></div></body></html>. But your mileage may vary."

HTML would then be an acronym for Lots of Irritating Single Angle Brackets.

------
cppsnob
Heh, Naggum. Why do we still talk about him? Did he build awesome things? Did
he change the world? Affect so many lives by being a great teacher?

One thing he did do was hate a lot. You can find him throwing hate at whatever
it is you hate, no problem. Perl, C++, XML... whatever (except Lisp), he
probably wrote an article flaming it. So when you want to find a well-written
article lambasting whatever it is you hate, you can look to Naggum.

And while he wrote those thousands of USENET posts, the rest of us were off
building great things with the stuff he hated on. He was barely on my radar
until he died, and looking through archives I realized he actually responded
to me a couple times. I guess I was too busy making stuff to pay much
attention.

~~~
aerique
I can't really be bothered to cite references since I'm typing this on an iPad,
and I doubt you'd care, but I'd like to point out for others that he built
great things and worked on influential projects (take SGML, for example).

I can only dream of attaining 10% of the technical prowess he achieved in his
short life.

------
gfodor
It's certainly a poor way to argue when you set yourself up so that unless the
opposing side agrees with you on politics, morality, the U.S. government,
education, human development, and oh yeah, portable document formats, they're
going to be forced to disagree with you.

~~~
Jach
Only if the opposing side is an idiot who can't separate beliefs about
separate things from one another. Naggum himself has a great rant on such
"one-bit" people:
<http://www.xach.com/naggum/articles/3225130472400367@naggum.net.html>

But then, if you're knowingly arguing with idiots, it's typically done more
for some weird sense of fun than for the purpose of actually trying to change
their mind. Even an argument between supposedly smart people (even those aware
of Bayes' Theorem) doesn't result in mind changes all that often. Actually
changing someone else's mind about something they already have a relatively
firm belief on is at least as hard as changing your own mind.

~~~
vidarh
So it's only a poor way to argue if the opposing side is a normal human, is
what you're saying.

If you think you can fully separate beliefs about separate things from
another, there's a whole slew of psychology research that says you're deluded.

~~~
kstenerud
This has piqued my interest. What specific research are you referring to?

------
conanite
It is common to correlate technology preferences with enterpriseyness - BigCo
developers, in general, prefer Java or C#, whereas startuppy devs, in general,
prefer Ruby or Python, and nobody likes PHP except the millions of devs who
use it.

Is there a similar correlation between technology preferences and political
leaning, or religion? If I said I was Christian, or opposed abortion, or
approved of gun ownership or of invading Iraq, would you assume, bayesianly,
that I probably prefer XML to JSON?

Does a preference for Lisp, for example, indicate a predisposition to atheism?
What kinds of political or religious leanings might be inferred from one's
position on the RDBMS-NoSQL spectrum?

~~~
fierarul
Programming is a way of modeling reality and a preference might reflect a bit
of your own mind. Then again, people aren't always consistent, so it might not
necessarily correlate strongly. Nice remark though!

------
raverbashing
Yes, it's a rant. Why? Probably because the author is tired of explaining the
same things over and over to the XML defenders.

Really.

JSON is better (and XML is crap). It's ridiculous, it's obvious, it's clear
cut, and when people "don't get it" you start resorting to irony.

Like this:

> Enclose the /element/ in matching delimiters, not the tag

Let's compare them:

    <blah>foo</blah>

to JSON:

    blah: foo

1 - size (JSON wins)

2 - matching. Here's an exercise: try splitting JSON with sed, and then try
with XML. Sed is a good exercise because it's limited to simple grammars, so
it gives an idea of how much trouble it is to parse each one.

3 - redundancy: what would it mean if the line were <blah>foo</lala>? This is a
mistake, right? But I can assume the tag name is blah and the data is foo.
This is a glaring redundancy in the grammar, bound to cause trouble.
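Points 1 and 3 are easy to check with Python's standard library. A small sketch, assuming `{"blah": "foo"}` as the nearest valid JSON to the `blah: foo` shorthand; it also shows that the repeated tag name is exactly what lets a strict XML parser reject the mismatched line instead of guessing.

```python
import json
import xml.etree.ElementTree as ET

xml_text = "<blah>foo</blah>"
# Nearest valid JSON to the "blah: foo" shorthand (an assumption).
json_text = json.dumps({"blah": "foo"})

# Point 1: size in characters.
print(len(xml_text), len(json_text))  # 16 15

# Point 3: the closing tag repeats the name, so a mismatch is detectable.
try:
    ET.fromstring("<blah>foo</lala>")
except ET.ParseError as err:
    print("rejected:", err)
```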

Not to mention that working with JSON data (in Python, for example) is _natural_ compared to
working with XML data.
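To illustrate the "natural" part, here is the same small record read both ways in Python; the data is made up for illustration. `json.loads` hands back plain dicts and lists, while ElementTree hands back a tree of elements and the mapping to native types is left to the caller.

```python
import json
import xml.etree.ElementTree as ET

# The same (illustrative) record in both formats.
json_doc = '{"user": {"name": "Ada", "langs": ["Lisp", "C"]}}'
xml_doc = ("<user><name>Ada</name>"
           "<langs><lang>Lisp</lang><lang>C</lang></langs></user>")

# JSON maps directly onto Python dicts and lists.
user = json.loads(json_doc)["user"]
print(user["name"], user["langs"])  # Ada ['Lisp', 'C']

# XML yields elements; pulling out native values is a separate step.
root = ET.fromstring(xml_doc)
name = root.findtext("name")
langs = [e.text for e in root.iter("lang")]
print(name, langs)  # Ada ['Lisp', 'C']
```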

~~~
olavk
Now try to apply this to the real world. The most popular and widely used
SGML-derived format _by far_ is (X)HTML. Show me e.g. how a paragraph with a
few embedded links is improved by translating to JSON.

It is not. Of course _some_ kinds of data are indeed expressed simpler with
JSON.
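To see the mixed-content problem concretely, here is a sketch that encodes a paragraph with an embedded link as nested JSON arrays, in the style of JsonML (`[tag, {attributes}?, children...]`; this is an illustration, not a full JsonML implementation). The text runs and the link survive in order, but the result is longer than the HTML it encodes.

```python
import json
import xml.etree.ElementTree as ET


def to_jsonml(elem: ET.Element) -> list:
    """Encode an element as [tag, {attrs}?, children...], JsonML-style."""
    node: list = [elem.tag]
    if elem.attrib:
        node.append(dict(elem.attrib))
    if elem.text:
        node.append(elem.text)
    for child in elem:
        node.append(to_jsonml(child))
        if child.tail:
            # Text after a child element belongs to the parent, in order.
            node.append(child.tail)
    return node


html = '<p>Read it on <a href="https://news.ycombinator.com">Hacker News</a>.</p>'
print(json.dumps(to_jsonml(ET.fromstring(html))))
```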

Another real world example is RSS. More data-y than HTML, but it may still contain
mixed content. Would it be better if it had been JSON from the beginning? The
answer is not clear cut.

So maybe JSON is not _always_ the right answer. Maybe it depends on what kind
of data you want to express. But then it suddenly is not a debate with "XML
defenders" who "don't get it". Then it is a discussion about which tool is
appropriate in a given situation rather than a war between tribes. And how do
you boost your ego and sense of superiority with a boring debate about tools?

~~~
raverbashing
Popular != Better

"Show me e.g. how a paragraph with a few embedded links is improved by
translating to JSON."

Even though HTML and XML are related, something like a web page is so
ingrained that it would be difficult to change.

And JSON was never a markup language.

The criticism of XML goes much more towards using it as a 'Key/Value'
serialization format.

But you could imagine something like this (commas and colons would have to be
escaped in a 'JSON' markup language):

    {p: Read it on {href:news.ycombinator.com, a:Hacker News}}

Size gains, parsing speed gains.

RSS is not as clear cut, really; it's not a complicated structure, so it could
be YAML, for example.

"So maybe JSON is not always the right answer. Maybe it depends on what kind
of data you want to express"

XML has attributes, which I'm not against (unlike the article), and it has a
more natural nesting of information. But for the majority of data, yes, JSON
is better.

This is about coder productivity (which is _increased_ with JSON so much it's not even
funny), whether in Python, Java (even with the whole Swiss Army knife of XML
libs it has), C#, or others.

Speed is also an issue. Encoding and decoding XML is slower, even with C
speedups (example: Python's lxml). And this is relevant for transformation of
data or even just sending it to the user.

------
mindstab
And that's how we seem to have transitioned to JSON, at least for some
information storage and transmission :) Remember when it was all the rage to use
XML for RPC? And even in AJAX?

~~~
TeeWEE
JSON and XML have two totally different application domains. For some, XML is
better; for others, JSON. People who think JSON is better have never had a problem with
changing data schemas, different parties and a common interface.

In short, XML is a lot more powerful than simple JSON. JSON is a subset of XML
with a different syntax.

~~~
DavidAbrams
How is XML any more immune to schema changes?

~~~
rickette
It's not immune, but you have an explicit contract in the form of an XML Schema
Definition (XSD).

~~~
haberman
XSD is not part of XML, it's a totally separate specification. JSON likewise
has schema specifications like JSON Schema.

Also, Protocol Buffers are _by far_ a better schema language than XSD.

------
thyrsus
My worst practical problem with XML is that it is kryptonite to doing conflict
resolution in version control systems. I've eased my pain with one such
document by translating the document elements to a directory full of files,
each of which has the name of its unique identifier attribute and the contents
of which looks something like

    
    
      === attributes ===
      a=b
      === contained elements ===
      id_1234
      id_4567
      === associated text ===
      This is the text associated
      with this element
    

Those one-line tags which are syntactically distinct from attributes or
identifiers serve as anchors for the diff program to "do the right thing". For
this application, I can omit whitespace text elements, and there is at most
one #text element per containing element, so I can conflate them. Were there
more than one, I could generate IDs for the contained text elements.
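A rough Python sketch of that transformation, assuming the identifying attribute is literally named `id` and that each element's text runs can be conflated into one block as described; `explode` is a hypothetical name, and the sketch makes no attempt at the reverse direction.

```python
import os
import xml.etree.ElementTree as ET


def explode(xml_text: str, outdir: str, id_attr: str = "id") -> None:
    """Write one file per element carrying an id attribute, using the
    attributes / contained elements / associated text section layout."""
    root = ET.fromstring(xml_text)
    os.makedirs(outdir, exist_ok=True)
    for elem in root.iter():
        ident = elem.get(id_attr)
        if ident is None:
            continue  # only identified elements get their own file
        lines = ["=== attributes ==="]
        lines += [f"{k}={v}" for k, v in elem.attrib.items() if k != id_attr]
        lines.append("=== contained elements ===")
        lines += [c.get(id_attr) for c in elem if c.get(id_attr) is not None]
        lines.append("=== associated text ===")
        # Conflate the element's own text runs, dropping whitespace-only ones.
        runs = [elem.text] + [c.tail for c in elem]
        lines += [r.strip() for r in runs if r and r.strip()]
        with open(os.path.join(outdir, ident), "w", encoding="utf-8") as f:
            f.write("\n".join(lines) + "\n")
```

Because each section header sits on its own line, a line-based diff can anchor on the headers, and one file per element keeps unrelated edits out of each other's way.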

Are there better ways of doing this? Tools?

------
haberman
Eh, as XML rants go I prefer this one from Graydon Hoare (lead designer of
Mozilla's Rust): <http://www.rdb.com/demo/XML/>

It colorfully beats around the bush a bit, but its primary points are:

1. any lossless coding of bits can be cajoled into representing anything, the
important question is whether it's a _convenient_ coding.

2. the fact that so many XML formats have sub-languages embedded into strings
means that nearly all XML processing requires another, higher level of
software to fully parse the document. (it also is evidence that XML wasn't
that convenient of an encoding to begin with).

3. just because you can write a document of XML expressing some logical idea
doesn't mean that the idea is implementable. For example, the "Spacecraft
Markup Language" (which is a real thing, or was).
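SVG path data is a familiar instance of point 2: the `d` attribute holds a whole drawing mini-language inside a string, so the XML parser's job ends with one opaque value. A simplified Python sketch (the tokenizer regex is an illustration, not the full SVG path grammar):

```python
import re
import xml.etree.ElementTree as ET

svg = '<path xmlns="http://www.w3.org/2000/svg" d="M 10 10 L 20 20 Z"/>'
elem = ET.fromstring(svg)

# The XML parser stops here: to it, the path data is just a string.
d = elem.get("d")

# A second, hand-written tokenizer is needed for the embedded language.
# Deliberately simplified: numbers like 1e3 or .5 are not handled.
tokens = re.findall(r"[A-Za-z]|-?\d+(?:\.\d+)?", d)
print(tokens)  # ['M', '10', '10', 'L', '20', '20', 'Z']
```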

------
jroseattle
Unfortunately, I lose interest in this rant amongst all the inane references
to other things in the world.

Kind of ironic that an argument in favor of less verbosity couldn't be more
verbose itself.

~~~
bestes
I feel the same way, except it has been quite a while since I've read a good
Usenet-style rant, so I indulged.

The correctness of an idea and the quality of its expression are not related. You
can be "right" and horrible at explaining, or "wrong" and very persuasive and
eloquent.

The comparisons with comments from Linus are good examples. He can be very
abrasive and condescending, but is usually right!

~~~
dhconnelly
The author's technical criticisms are indeed valid. But Linus Torvalds doesn't
compare things to murderous, rapist gang members, or offer offensive social
commentary on the culture of an entire nation, and I suspect he wouldn't find
your comparison of him to this author very flattering.

------
pwpwp
Naggu _m_ 's the name.

------
scranglis
This part felt particularly applicable to the intellectual (and emotional)
environment of young companies and the products they bring forth:

 _Many an idea or concept not only looks, but /is/ good in its infancy, yet
turns destructive later in life. Scaling and maturation are not the obvious
processes they appear to be because they take so much time that the
accumulated effort is easy to overlook. To be successful, they must also be
very carefully guided by people who can envision the end result, but that
makes it appear to many as if it merely "happens"._

The idea and the people that carry it out are intertwined because neither is
static or even has any "essential" material whatsoever.

Worth reading just to have that thought articulated so well (once again).

------
spinchange
This is kind of off-topic, but whenever I see a spate of "Is RSS dead/dying?"
posts, I wish they would focus more on the historical context around the W3C
and the XML standards, and on the developer/browser-maker reaction to them: the
split from the W3C to focus on HTML/CSS/JS.

Are there any other end-user facing specs/apps built on XML besides RSS?

It always seemed redundant to me that the most popular RSS reader is Google's,
which re-wraps stripped-down XML feeds in another
HTML/CSS/JavaScript presentation layer.

Maybe JSON readers would be more popular to implement.

~~~
icebraining
I don't know what you mean by "end-user facing specs/apps".

But if enterprise apps where the content is formatted in XML count, then there
are thousands (I'm working on OpenERP, for example, which uses XML both to
define views and to transmit content to the client).

Then there's SVG. And the Office formats, both MS' and LibreOffice's. And
obviously XHTML. And XMPP, which drives Google Talk and clients to FB Chat.

Just look at the list:
<https://en.wikipedia.org/wiki/List_of_XML_markup_languages>

------
crazygringo
> (I note in passing that the stereotypical American male longs for much
> larger than natural female breasts, presumably to maintain the proportion to
> his own size from his infancy, which has caused the stereotypical American
> female to feel a need for breasts that will give the next generation a
> demand for even more disproportionally large breasts.)

Is this how Norwegians see Americans?!

~~~
golden_apples
Given that breastfeeding has been declining in America over the past couple
generations, it reflects a misunderstanding of American sexual pathology, if,
in fact, it was meant as more than a throwaway comment.

------
6ren
> ...go for a binary representation. ... The question of what we humans need
> to read and write no longer has any bearing on what the computers need to
> work with.

We seem to be heading away from binary.

I think the bigger problem with XML is the XML stack. Many of the ideas are
insightful, powerful, useful - but the embodiments are tedious (e.g. XML
Schema, XSLT). OTOH, a problem is an opportunity.

------
grannyg00se
At first the hyperbole was somewhat amusing. But just as he criticizes XML for
being too heavy for what it is frequently used for, his diversions into
politics and human behaviour are too heavy for what they carry. They end up
obstructing the meat of the text.

------
silverlake
15 years ago I had Erik in my killfile. He's dead and I still can't escape
him.

------
evo_9
Wiki on Mr. Naggum for those curious about the man behind the rant:
<http://en.m.wikipedia.org/wiki/Erik_Naggum#_>

------
cdent
This really shouldn't be news. The complaint seems to be that XML is no good
for anything that isn't document/text heavy (i.e. doesn't fit in with the
intent of SGML).

If you weren't already thinking this in 2002 then you probably weren't paying
attention.

------
rch
I noticed the reference to binary representations with some enthusiasm. For a
variety of reasons, I find myself using HDF5 in instances where I might have
chosen XML (or JSON for that matter) a few years ago.

------
Aqueous
"If GML was an infant, SGML is the bright youngster far exceeds expectations
and made its parents too proud, but XML is the drug-addicted gang member who
had committed his first murder before he had sex, which was rape."

I think he could have just left it there.

------
stesch
R.I.P.

------
batista
Especially the part around 2/3rds of the text, about how to improve XML.

Also loved the jokes and metaphors. We need such people in tech discussions,
most community leaders have turned teletubbies-nice to each other, to the
detriment not only of a good flame-fight, but of the actual dismissal of brain
dead ideas.

Instead of violently ridiculing proponents of bad ideas/software/etc to shame
and (hopefully) seppuku, we tip-toe around them, or at best, just make light
fun of them, like in the "Mongo is webscale" video.

~~~
olavk
I think rants like these make us more stupid. They turn a discussion about the
pros and cons of various technologies into pissing matches. His colorful
metaphors do not actually illuminate the issue at hand, but only serve to
set up "stupid" versus "smart" and appeal to emotion, hoping to win
the reader over to the side of the smart ones, without actually arguing the
specific points of the technology. (E.g. why is backslash _obviously_ better
than ampersand-semicolon for character escapes? If he has a valid argument for
this, he certainly doesn't feel the need to disclose it to the reader.
He would rather make an elaborate infantilization analogy about breasts.)

I guess this kind of rant appeals to people who have a deep need to feel
part of a small elite surrounded by a sea of stupidity, but don't have the
capacity to understand how different tools may have various pros and cons in
different specific situations.

~~~
batista
> _His colorful metaphors does not actually illuminate the issue at hand_

I think quite the opposite, he makes a very compelling case, and even gets
into the details. Even in the part you mention, for example you missed some
stuff that answers your question (without hand-holding):

> Then, remove the obnoxious character entities and escape special
> characters with _a single character_, like \, and _name other entities with
> letters following the same character_. If you need a rich set of publishing
> symbols, _discover Unicode_.

> _I guess this kind of rant appeals to people who have a deep need to feel
> part of a small elite surrounded by a sea of stupidity, but don't have the
> capacity to understand how different tools have pros and cons in specific
> situations._

It's not about "different tools have pros and cons in specific situations",
it's about how some tools have defects that make them bad for EVERY SINGLE
situation. CORBA comes to mind as another example.

It's not that something XML-like cannot be good for certain situations.

It's rather that XML as-it-was-designed has several flaws that don't NEED to
be there, and don't benefit ANY use case.

For example, goto has some use cases in C that it is good for. NUL-terminated
strings, on the other hand, are a bad idea and have no place anywhere.
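The NUL-terminated string example can be demonstrated from Python via `ctypes`. A small sketch comparing a C-style buffer with a simple length-prefixed layout; the 4-byte little-endian prefix is an arbitrary choice for illustration, not any particular standard.

```python
import ctypes
import struct

data = b"ab\x00cd"

# A C-style buffer treats the first NUL byte as the end of the string.
buf = ctypes.create_string_buffer(data)
print(buf.value)  # b'ab'  (everything after the NUL is invisible)
print(buf.raw)    # b'ab\x00cd\x00'

# A length-prefixed layout stores the size up front instead.
packed = struct.pack("<I", len(data)) + data
(length,) = struct.unpack_from("<I", packed)
print(packed[4:4 + length])  # b'ab\x00cd'
```

The length prefix also makes finding the length O(1), where a terminator forces a scan.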

~~~
Estragon
> _he makes a very compelling case, and even gets into the details_

He always made a compelling case; I would go so far as to say that he was
almost always right. The problem with this style of communication is that it
hardens people's positions, because the criticism of the position comes
wrapped up with the implication that anyone who holds it must be an asshole.
That makes it much more difficult for most people to approach the question
from a purely technical perspective.

------
moron
The parts of this that aren't about XML are stupid as hell.

~~~
SoftwareMaven
While I generally dislike comments like this (and I wouldn't be surprised if
it got downvoted on HN as lacking substance), this time, it fits. The
"arguments" provided were inane, providing no justification for the very few
_actual_ suggestions contained therein.

What is particularly sad is that I completely agree with his comments on XML,
and I try to teach others about the dangers of not using XML intelligently,
but I would _never_ refer to this author to back up my feelings.

