

What's frustrating with XML? - julien
http://stackoverflow.com/questions/3536893/whats-frustrating-with-xml

======
crux_
Closed as subjective and argumentative. Darn, I wanted to pick a fight there.
;)

Many of the really crappy aspects of XML have been thoroughly abstracted away
by libraries. I'm of the opinion that you can only truly despise XML if you've
tried to write a parser for it yourself.

Examples of the issues: It requires arbitrary lookahead & backtracking. There
is no canonical document encoding. (Support arbitrary character encoding for
content: great idea! Support arbitrary encoding for the metadata/XML itself:
The opposite of a great idea.) Entity references: need I say more?
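
For anyone who hasn't been bitten by entity references, here is a minimal sketch in Python of why they are a parser author's headache. The document and entity names below are made up for illustration. Each entity expands to two copies of the previous one, so the expanded text doubles with every level the author adds.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: three nested internal entities declared in the DTD.
# Each level doubles the previous one, so &c; expands to four copies of "ha".
DOC = """<!DOCTYPE r [
  <!ENTITY a "ha">
  <!ENTITY b "&a;&a;">
  <!ENTITY c "&b;&b;">
]>
<r>&c;</r>"""

# The parser expands the references before the application sees any text.
expanded = ET.fromstring(DOC).text
```

Three levels are harmless. The trouble is that the size of the input says very little about the size of the output, and the parser has to do the expansion before the application gets a say.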

There's a reason this sort of thing keeps coming up:
[http://voices.washingtonpost.com/securityfix/2009/08/researc...](http://voices.washingtonpost.com/securityfix/2009/08/researchers_xml_security_flaw.html)

XML as it would be in an ideal world would be (a) simple and (b) unambiguous;
it fails both.

All that said, there's a huge benefit in the whole world arriving at a
somewhat standard way of doing things, and a lot of that benefit remains even
if the standard itself really sucks.

~~~
sofuture
XML isn't bad in and of itself. It's just a powerful, neutral format. The
problem with XML is that it allows so much abuse.

I had to interface with a government system at some point. They shipped us a
whole schema of custom elements such as "IsShipment" (for example) which
extended bool to allow extra options. All this was clearly documented in the
schema comments "You can use 'true', 'false', 'sortof', and 'mostly'." So we
go to validate the data.... and it doesn't validate. Against their schema.
Because they didn't extend bool, they just said they did in the comments. When
we got back in touch with them, we realized that they had no clue that the
stuff _not_ in comments mattered. As far as they knew, XML was just text, and
they had clearly specified how it was to be interpreted (in English).

~~~
crux_
> XML isn't bad in and of itself.

I'm arguing that it is. Symptoms of badness: version 1.0 of the basic standard
is on its 5th edition over ~8 years. Said standard is stupendously huge. Major
security/crash & other bugs in pretty much every parser
([https://www.cert.fi/en/reports/2009/vulnerability2009085.htm...](https://www.cert.fi/en/reports/2009/vulnerability2009085.html))
as of less than a year ago. XHTML's abandonment. In short, XML stinks and
there's plenty of evidence out there that it does.

Please note that doesn't mean I would wish XML away. Widespread adoption, in
and of itself, is a killer feature.

------
terra_t
Namespaces are both the genius and stupidity of XML.

The idea of being able to stitch different vocabularies together is genius.

That said, I haven't seen a single XML toolchain that doesn't have some bug
or, ahem, irregularity, in how namespaces are handled. XLinq comes pretty
close to being correct though.

~~~
dantheman
The biggest problem is that a namespace can be declared anywhere, so
technically it's impossible to use a streaming reader unless you stream
through the document twice. Also, XLink and other ways of dynamically
composing XML are too complicated.

~~~
masklinn
I don't really get why, since you can't use a namespace before it's defined
and namespaces are scoped. The only reason why you'd have to stream through
twice is if you wanted a list of all namespaces in the document at the start
of your parsing, and why would you want that (let alone care about it)?
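
A rough sketch of what I mean, using Python's stdlib `iterparse` (the document and the `urn:example:*` URIs are invented for the example). In a single streaming pass, each declaration is reported before the element that carries it, and it goes out of scope when that element closes.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document: a default namespace on the root and a second
# namespace declared halfway through, on the element that first uses it.
DOC = b"""<root xmlns="urn:example:a">
  <item/>
  <b:item xmlns:b="urn:example:b"><b:child/></b:item>
  <item/>
</root>"""

def stream_events(data):
    """Make one streaming pass and return the events in the order seen."""
    seen = []
    wanted = ("start-ns", "start", "end-ns")
    for event, payload in ET.iterparse(io.BytesIO(data), events=wanted):
        if event == "start-ns":
            seen.append((event, payload))      # payload is (prefix, uri)
        elif event == "start":
            seen.append((event, payload.tag))  # tag is already resolved
        else:
            seen.append((event, None))         # a declaration left scope
    return seen

events = stream_events(DOC)
```

By the time each `start` event arrives, the tag has already been resolved against the declarations in scope, so there is nothing that needs a second pass.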

~~~
jerf
... which reveals the real problem with namespaces, which is that "nobody"
actually understands them. There really isn't that much to them, in my
opinion, but every time I encounter them in the wild, they're never
implemented correctly, with the variance ranging from really, really wrong to
just sort of off. XMPP is the closest to correct, but they still screwed up in
that a user's connection to an XMPP server is under one namespace, and a
component is done under another, yet an <iq> packet with no namespace
qualification is supposed to be treated as the same packet in both, despite
being two different packets. (I would accept simply acknowledging that they
are the same somewhere, but I've never found it.)

Some people use the namespaces in what appears to be a decorative manner. Some
people mandate that the prefix be a certain thing, so <x:tag xmlns:x="thing">
works while <y:tag xmlns:y="thing"> doesn't, proving they're doing it wrong
under the hood. Some things get it right with xmlns:*, but don't understand
what the bare xmlns itself means, so they only trigger namespace logic if
there's a colon in the tag. I've seen cases where the namespace is declared,
then used out-of-scope like it's a global declaration or something. I'm still
waiting to encounter the standard or software that actually uses them
correctly. And despite this listing of wrong answers it really isn't that
complicated....
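
To make the "it really isn't that complicated" claim concrete, here is a small sketch with Python's stdlib ElementTree. The helper names are mine, not part of any library. The prefix is only a local alias, a bare xmlns does the same job without a colon, and a prefix used outside the element that declared it is an error.

```python
import xml.etree.ElementTree as ET

def qualified_name(xml_text):
    """Return the root tag as the parser sees it: {namespace-uri}local-name."""
    return ET.fromstring(xml_text).tag

def parses(xml_text):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Three spellings of the same element: the prefix (or its absence) is irrelevant.
same = [
    qualified_name('<x:tag xmlns:x="thing"/>'),
    qualified_name('<y:tag xmlns:y="thing"/>'),
    qualified_name('<tag xmlns="thing"/>'),
]

# No declaration at all: a different element, in no namespace.
bare = qualified_name('<tag/>')

# The prefix x is declared on <a> only, so using it on a sibling is out of scope.
out_of_scope = parses('<root><a xmlns:x="thing"/><x:b/></root>')
```

Software that compares the literal string `x:tag` instead of the resolved name gets the first two cases wrong. Software that only looks for a colon gets the third one wrong.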

~~~
masklinn
> There really isn't that much to them, in my opinion, but every time I
> encounter them in the wild, they're never implemented correctly, with the
> variance ranging from really, really wrong to just sort of off.

My biggest issue with namespaces is the dichotomy between namespace URLs and
namespace prefixes, and most people not understanding that prefixes are
actually aliases for the URLs. For that reason, I quite like Clark's notation
for namespaces (which is used by ElementTree and LXML), it makes the relation
between element and namespace much clearer. Shame you can't use it in XML
documents or XPath queries.

Also, that the default namespace only applies to elements, not attributes. I
kind-of understand why they did that, but it's still very annoying.
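
A quick sketch of both points in ElementTree (the document is invented for illustration). Element and attribute names come back in Clark's `{uri}local` form, and the unprefixed attribute stays outside the default namespace while the prefixed one does not.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the same URI is bound as the default namespace
# and as the prefix f, with one unprefixed and one prefixed attribute.
DOC = '<doc xmlns="http://foo.com" xmlns:f="http://foo.com" id="1" f:id="2"/>'

root = ET.fromstring(DOC)
element_name = root.tag        # the default namespace applies to the element
attributes = dict(root.attrib) # the two id attributes are distinct names
```

The two `id` attributes do not clash, because only the prefixed one is in `http://foo.com`.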

> Some people use the namespaces in what appears to be a decorative manner.
> Some people mandate that the prefix be a certain thing, so <x:tag
> xmlns:x="thing"> works while <y:tag xmlns:y="thing"> doesn't

Oh yeah. Isn't it maven or something, which does that? Or did? I know I
encountered it once or twice and I was using ElementTree 1.2 at the time (the
one that went into the Python stdlib... and still is) and it doesn't keep
track of XML namespace aliases (or defaults for that matter, and doesn't let
you set them short of hacking through the private and undocumented namespace
map) so everything comes out as `ns0:foo`, `ns1:bar`, ... Perfectly valid, and
then you have a broken tool which doesn't actually understand namespaces
(even though the example documents say you need a namespace spec) and wants an
element called `foo:bar` and not "the element `bar` in the namespace
<http://foo.com>".
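
A sketch of the round trip, assuming a newer ElementTree than the 1.2 I was stuck with: `register_namespace` arrived with ElementTree 1.3 (Python 2.7 / 3.2). The document is made up, and note that the registration is global to the process.

```python
import xml.etree.ElementTree as ET

# Hypothetical input using the prefix the picky tool insists on.
SRC = '<foo:bar xmlns:foo="http://foo.com"><foo:baz/></foo:bar>'
root = ET.fromstring(SRC)

# The original prefix is not remembered, so the serializer invents one.
auto = ET.tostring(root, encoding="unicode")

# Registering a preferred prefix affects every later serialization
# in this process, not just this tree.
ET.register_namespace("foo", "http://foo.com")
named = ET.tostring(root, encoding="unicode")
```

Both outputs describe exactly the same elements. Only a consumer that matches on the literal prefix can tell them apart.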

> Some things get it right with xmlns:* , but don't understand what the bare
> xmlns itself means, so they only trigger namespace logic if there's a colon
> in the tag.

When that happens, somebody ought to get shot. Default namespaces are one of
the most basic parts of namespaces (and it's not that hard to _parse_ , though
production might be a different issue).

> I'm still waiting to encounter the standard or software that actually uses
> them correctly.

libxml2 tended to work quite well in my experience (mostly through lxml),
though I don't doubt I just missed its bugs.

edit: damn it, is there no way to escape those damn asterisks in yc?

~~~
jerf
Well, to be fair, the best parsers do handle it correctly. Frequently the
binding logic in Perl or Python or whatever will then proceed to get it wrong!
And if you've got anything more complicated than a straight C binding and it
actually tries to do stuff for you, you can just forget about it working
correctly.

When I say I'm still waiting to see the software that does it correctly, I
mean like end-user-level software, not the parsers.

------
contextfree
It convinced a generation of framework designers not to bother designing a
decent concrete syntax for their domain-specific languages. When the framework
is small and/or its developers probably couldn't hire a good language person
anyway, this might be for the best, but it's a shame when a huge, well-funded
and otherwise fairly well designed beast like WPF/Silverlight/XAML is trapped
behind tasteless syntax.

On a deeper level, the element/attribute distinction unnecessarily mucks up
the data model, but I'm not sure how big a problem that is in practice.

------
dstein
The problem with JSON is that it isn't very useful by itself because you always
have to encode and decode it, and then analyze the structure to do something
with the data. It works better as part of a protocol, not as a native data
format.

If you convert your JSON data to XML (assuming it is structured in a way that
makes it lossless) you have a whole lot more useful tools at your disposal.
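
As a rough illustration only, here is one possible mapping in Python. The `to_xml` helper, the `item` element name for list entries, and the sample data are all my own assumptions, and the mapping only works when every JSON key is a valid XML name. Once the data is a tree, path queries work on it directly.

```python
import json
import xml.etree.ElementTree as ET

def to_xml(tag, value):
    """Hypothetical mapping: objects become child elements named after their
    keys, arrays become repeated <item> children, and scalars become text
    (kept in JSON syntax so that strings and numbers stay distinguishable)."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_xml(key, child))
    elif isinstance(value, list):
        for child in value:
            elem.append(to_xml("item", child))
    else:
        elem.text = json.dumps(value)
    return elem

# Made-up sample data.
data = json.loads('{"users": [{"name": "ann", "age": 30}, {"name": "bob", "age": 25}]}')
root = to_xml("root", data)

# A path query instead of hand-written loops over dicts and lists.
names = [json.loads(e.text) for e in root.findall("./users/item/name")]
```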

~~~
arethuza
People do seem to like abusing attributes... A surprising number of systems
seem to rely on embedding entire XML documents in attributes within other
documents.

------
wooby
"We're an XML shop": who says that?

------
joe_the_user
The most horrible XML encoding I've encountered recently was the GNOME/KDE XDG
desktop menu system.

Rather than encode what menu item is in what pseudo-folder, it encodes every
change made to the menu system as a diff and expects the applications will
piece these together -- and the libraries that parse this monstrosity all
huffily say "this code is NOT stable...".

<http://standards.freedesktop.org/desktop-entry-spec/latest/>

Of course, this isn't so much XML's fault as the fault of the folks who
kludged together this monstrosity. This shows, however, how XML is more or
less a tool for knitting together two or more generally poorly-specified
encodings. The good is that these might be somewhat better inside XML than
running about wild but the bad is it lets them continue to exist at all. See
Microsoft's Office XML "standard".

