
XML is almost always misused - mrzool
https://www.devever.net/~hl/xml
======
alkonaut
I’m going to just argue the exact opposite of the article: xml and json are
both structured data formats useful for tree like data graphs, such as
objects.

Whether that was the intended purpose when xml was designed is irrelevant.
It’s what xml is used for in almost every case.

The author also doesn’t suggest what should be used instead to encode
structured data, or perhaps more importantly what _should_ have been used to
encode graph like things such as map/lists/objects in the 2000’s. Json really
hasn’t been an alternative until quite recently (10 years ago?).

In fact reading the article carefully I fail to see the author argue _why_ xml
shouldn’t be used as a data format either.

~~~
Koshkin
Well, XML is a _markup_ language (and is really good at being that) while JSON
is not. Sure, XML can be used as a poor man's data storage, as a base for a
DSL, etc., but almost always there are better choices.

~~~
jeremyjh
You aren't responding to the comment here, you are just reasserting the
article's position. I'd argue there is not another format that is obviously
better for every data storage or exchange use case, or that surpasses all of
XML's benefits while minimizing all of its downsides. I don't want to look at
XML, but I do understand why it is being used.

~~~
gambler
Abuse of XML _killed_ it as a format. JSON is absolutely shit for semantic
markup, and yet developers today routinely use it for documents because "XML
is bad". They contrive ridiculous schemes for adding metadata and type
information. They use it to generate HTML even when HTML _takes less space_.
Finally, we regressed from XHTML to HTML5. Bye-bye namespaces and parsing
consistency.

~~~
myrryr
Right, but, if given a choice of what to use, between XML and JSON, I'll pick
JSON every time.

XML is a complete mess. Have you SEEN its spec?

You can put JSON's spec in a single page. XML's spec, not so much. Hell, most of
the XML parsers don't support the spec, and the ones which do, historically
have been riddled with security holes.

JSON over XML was simplicity over a crazy spec built by a bunch of companies
all wanting to shove their own craziness into it.

~~~
sedachv
[http://seriot.ch/parsing_json.php](http://seriot.ch/parsing_json.php)

[https://en.wikipedia.org/wiki/JSON#Data_portability_issues](https://en.wikipedia.org/wiki/JSON#Data_portability_issues)

XML has had XSD and RELAX NG for more than 15 years now.
[https://json-schema.org/](https://json-schema.org/) is still a draft.

------
kbenson
So, what _is_ XML good for? If it's not good for data as everyone says (and
I'm not inclined to argue), but it is good for documents, what kind of
documents are we referring to? A defined metadata on a text document? A
template used with data to generate something else? Is a configuration file a
document or data? Where would I want to use XML that something like JSON, a
text document, or some combination thereof wouldn't be better?

I'm not being facetious, this is an honest question. Where are the "right"
places to use XML?

~~~
Finnucane
Just the sort of thing you would think of as 'documents'--the texts of books,
manuscripts, and the like, where structure may be somewhat arbitrary. For
instance, I work with a few different text corpuses--one of which is an actual
dictionary, with entries, definitions, usage examples, etymological
information, and bibliographic references. Another is a collection of poetry
manuscripts, with annotations for line breaks and editorial emendations, both
from the author and other editors (i.e., places in the manuscript with
crossouts, interlineal notes, marginal notes, etc).

I mean, in theory, you could do this in JSON or some other data structure. But
you would go insane and be shooting yourself in the head before long.

~~~
jolmg
> you could do this in JSON or some other data structure

I'm not sure you could. For example, in another comment, I mentioned
DocBook[1]. How would you do the following sample document in JSON?

    
    
      <?xml version="1.0" encoding="UTF-8"?>
      <book xml:id="simple_book" xmlns="http://docbook.org/ns/docbook" version="5.0">
        <title>Very simple book</title>
        <chapter xml:id="chapter_1">
          <title>Chapter 1</title>
          <para>Hello world!</para>
          <img src="hello.jpg"/>
          <para>I hope that your day is proceeding <emphasis>splendidly</emphasis>!</para>
        </chapter>
        <chapter xml:id="chapter_2">
          <title>Chapter 2</title>
          <para>Hello again, world!</para>
        </chapter>
      </book>
    

Would you make each <chapter> into an object? But you have 2 <para> children
in there with an <img> in between. And one <para> has an additional <emphasis>
in the content. I can't think of a good JSON schema equivalent to this.

[1]
[https://en.wikipedia.org/wiki/DocBook#Sample_document](https://en.wikipedia.org/wiki/DocBook#Sample_document)

~~~
unilynx
See
[https://developers.google.com/docs/api/samples/output-json](https://developers.google.com/docs/api/samples/output-json)
for what Google Docs does - basically separating markup from the text by using
indices.

Which is probably the only way to properly deal with markup and especially
commented sections that can span over paragraph start/ends - neither JSON nor
XML seems to have a proper answer for such annotations and I wonder if there's
any standard format that can do that, especially if humans still want to
reasonably be able to view or edit it...

(OOXML and its binary equivalents more or less solve this by completely
separating paragraph and character formatting, both separately indexing the
spans of text they annotate)

~~~
dfox
That is what essentially every WYSIWYG text processor does. And also the
reason why getting sane HTML out of a text processor is somewhat non-trivial, as
the separately indexed spans can very well overlap, contradict each other or
contain completely unnecessary formatting information.

------
mojuba
That's a very useful and sobering view actually. Unfortunately for the XML
format it wasn't designed to prevent its own abuse. But anyway, _XML is for
documents_ sounds like a good and acceptable paradigm.

That being said, one subtle and important (and often overlooked) difference
between XML and, say, JSON is that you can stream XML while parsing it on the
application level, whereas JSON cannot be parsed that way by the application
due to arbitrary ordering of keys. (Of course lower level parsers use streaming
anyway, but that's not the point)

In fact you not only can but you should parse XML while streaming it. This is
another common abuse: wherever you look you see some high level function that
loads an entire parsed XML structure into memory at once. But once you start
asking yourself where the file may be coming from you realize that your system
may be open to denial of service attacks. E.g. is your system ready to receive
a 16GB XML file?
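
As a rough illustration of parsing while streaming, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree.iterparse`. The `count_items` helper, the `item` tag and the sample document are made up for the example; nothing here comes from the article.

```python
import io
import xml.etree.ElementTree as ET

def count_items(stream, tag="item"):
    """Count <tag> elements while reading the stream incrementally."""
    count = 0
    # iterparse yields each element once its end tag has been read,
    # so the full document never has to be loaded up front.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            count += 1
        # Release this element's text, attributes and children.
        # Note: the root still keeps a reference to each (now empty)
        # child, so a truly huge file would need those detached as well.
        elem.clear()
    return count

# Hypothetical sample input; in practice this would be a file or socket.
sample = io.BytesIO(b"<root><item>a</item><item>b</item><other/></root>")
print(count_items(sample))
```

Whether this is enough protection against an oversized upload depends on the application; it only avoids holding the parsed tree in memory.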

~~~
fanf2
There is no reason you can’t do a SAX-style parser for JSON. A quick bit of
searchengineering will find dozens, e.g.
[https://github.com/dgraham/json-stream](https://github.com/dgraham/json-stream)

~~~
mojuba
You can in principle, but because the order is not guaranteed, you may find
yourself accumulating things in memory. It depends on the task of course, but
neither standard parsers nor generators are required to send adjacent JSON
entities in a specific order.

I remember someone describing a trick where instead of sending a JSON array
they'd just send a stream of JSON objects, one per line, so that the receiving
end could parse the data in a streaming fashion. But that's not JSON anymore.

~~~
fanf2
JSONlines is a relatively common technique
[http://jsonlines.org/](http://jsonlines.org/)
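
For anyone who hasn't seen it, the idea can be sketched in a few lines of Python. The `iter_json_lines` helper and the sample records are invented for illustration; the only assumption about the format is one complete JSON value per line.

```python
import io
import json

def iter_json_lines(stream):
    """Yield one parsed record per non-empty line of the stream."""
    for line in stream:
        line = line.strip()
        if line:
            # Each line is a complete JSON value, so it can be parsed
            # and handed to the caller before the next line arrives.
            yield json.loads(line)

# Hypothetical sample input; in practice this would be a file or socket.
sample = io.StringIO('{"id": 1, "name": "John"}\n{"id": 2, "name": "Jack"}\n')
names = [record["name"] for record in iter_json_lines(sample)]
print(names)
```

Only one record is held in memory at a time, which is the property the one-object-per-line trick is after.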

------
unilynx
XML is actually a wonderful format for data and especially extensible
configuration if you combine it with XML Schema and CSS selectors...

- XML schemas give you a ready-to-use format to describe, restrict and
document available configuration settings. The unique keys help, and libxml2
gives ready-to-use validation, even if you may need to 'translate' its error
messages before showing them to end users

- XML schemas also support other annotations so you can further generalise
your configuration readers by recording the necessary bindings in the XML
schema itself, allowing you to use it e.g. to define application user
interfaces.

- Almost any text editor can do basic syntax validation, preventing most
typographic errors, and even better if they can read the schema

- XML schemas are extensible using <import>s, but namespaces still enforce
some separation. You can define explicit points where plugins extend your
configuration format using <any>

- Human editable - closing tags are noisy but more readable than }],{}] when
non-programmers may have to edit these files just to add a few extra
textfields to a UI.

- Better datatype support, e.g. datetimes, by using XML schemas. JSON's type
support is too limited

- Support for comments!

- And once you've verified the schema... CSS selectors and DOM APIs to
actually process the XML documents.

YAML fixed quite a few things, but still no date times or, as far as I know,
standardised approaches to defining schemas. And I've lost count of how many
attempts exist to add schema information or namespacing to JSON...

But for markup... we may be better off to just use markdown inside CDATA
blocks

~~~
sedachv
> XML schemas give you a ready-to-use format to describe, restrict and
> document available configuration settings.

As someone that likes and uses s-expressions, I never thought I would find
myself defending XML, but here we are in 2019, no one understands basic
parsing theory anymore, and file formats have "evolved" to hot garbage like
YAML and TOML.

XML has some great tools in comparison:

[http://xmlsoft.org/xmllint.html](http://xmlsoft.org/xmllint.html)
[https://relaxng.org/](https://relaxng.org/)

> But for markup... we may be better off to just use markdown inside CDATA
> blocks

SGML can still make a comeback:
[https://leancrew.com/all-this/2014/09/sgml-nostalgia/](https://leancrew.com/all-this/2014/09/sgml-nostalgia/)

~~~
novok
YAML and TOML were optimized for human reading and writing, like markdown. XML
& even JSON are not as human optimized, but still human usable.

~~~
gambler
Humans should not interact with computers by manually editing data structures
serialized to ASCII.

~~~
unilynx
But we can't always budget a nice configuration application.

And once we've just put a simple XML file there for configuration and we're past
the prototyping phase... "well this actually works good enough, let it be".

~~~
Const-me
You likely need data model classes for the config anyway, along with support
of serialization and deserialization (XML or not is not important). Use a
stock property grid control, pass the root object of the config, and you’ll
get a GUI that does the job much better than ASCII files.

------
zippergz
If a piece of technology is "almost always" misused, is that the fault of the
users, or the technology?

~~~
scarygliders
If you mean "users" as in "users of the XML format", then it's the fault of
developers as 'users' of their applications don't have any choice in the
matter.

~~~
zippergz
I mean "users" exactly the same way "used" was meant in the author's
statement. So yes, the developers, who are the people who are using XML.

------
CobrastanJorji
> Here are some very frequently occurring examples of bad schema design:

(4 lines, 75 characters)

> Here's the right way:

(10 lines, 133 characters)

I have a suspicion as to what went wrong.

~~~
SamBam
> "But if the people who made the strange decision to use XML as a data format
> [...] they might realise that what they're doing is unsuited to it and
> unergonomic

The author's point is that XML should not be a data format.

~~~
zwkrt
Is there logic to the author's assertion past “that isn’t what XML was intended
for”? It is a pretty nice data format if you want wide compatibility and
schema integration.

~~~
mrzool
The point is, XML is not a data format. It’s a markup language.

~~~
LudwigNagasena
I just invented XDF, it is exactly like XML, but it can be used to represent
arbitrary data structures. Do you have any objections now?

~~~
benibela
That is only valid if the XDF standard has been published by the W3C

~~~
gnulinux
This doesn't make any sense. W3C doesn't restrict/define use cases of XML, it
defines structure and semantics thereof; using it as-if it is an XDF document
is perfectly ok, just like you can use XML data for your DASH stream etc...
It's a structure on top of XML structure.

------
theamk
One thing this misses in the "dictionary" example is that tools (like xpath)
push you towards "key in attribute" selection. One of the most common
operations we do with dictionaries is lookup by known key, and storing the key
in attributes makes it much easier.

~~~
spiralx
"Key in attribute" is the correct way to do it, it's just that his examples
are absolutely terrible and make no sense at all. A completely unstructured
list of key-value pairs is overkill for any structured data format.

------
altmind
Can I have my gripe with apple plists?

      <key>CFBundleDisplayName</key>
      <string>TextEdit</string>
      <key>NSHumanReadableCopyright</key>
      <string>Copyright 2019</string>

This may be perfectly parsable by a SAX parser storing some state, but it's
totally not processable by XSLT.
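
To show why the layout is awkward: keys and values are siblings, so a reader has to pair them up by position. A minimal Python sketch of that pairing follows. The `dict_from_plist` helper is invented for illustration, and it assumes the pairs sit inside a `<dict>` element and that every value is a plain `<string>`. For real files, Python's standard library also ships a `plistlib` module.

```python
import xml.etree.ElementTree as ET

# The fragment from the comment, wrapped in the <dict> element that
# plist files use for dictionaries.
PLIST_DICT = """
<dict>
  <key>CFBundleDisplayName</key>
  <string>TextEdit</string>
  <key>NSHumanReadableCopyright</key>
  <string>Copyright 2019</string>
</dict>
"""

def dict_from_plist(dict_elem):
    """Pair up alternating <key>/<string> siblings by position."""
    children = list(dict_elem)
    # Even positions hold keys, odd positions hold the matching values.
    return {k.text: v.text for k, v in zip(children[0::2], children[1::2])}

info = dict_from_plist(ET.fromstring(PLIST_DICT))
print(info["CFBundleDisplayName"])
```

Nothing in the markup itself ties a key to its value; only document order does.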

~~~
mikl
Yeah, that's awful XML. Ironically, they used to have a decent readable format
in NeXTStep, it was only later it was XML-ified.

------
robofanatic
I guess the correct answer depends upon the requirement.

I like this way

    
    
      <root>
       <item key="name">John</item>
       <item key="city">London</item>
      </root>
    

So I can use this xpath to get the person's name:

    
    
      //root/item[@key="name"]/text()
    
    

Not sure what would be the xpath to get the name if the XML was

    
    
      <root>
       <item>
        <key>Name</key>
        <value>John</value>
       </item>
       <item>
        <key>City</key>
        <value>London</value>
       </item>
      </root>
    
    

This is a better example:

    
    
      <employees>
       <employee id="1">
         <field name="name">John</field>
         <field name="city">London</field>
       </employee>
       <employee id="2">
         <field name="name">Jack</field>
         <field name="city">Boston</field>
       </employee>
      </employees>

~~~
elFarto
The XPath for the second one would be:

//root/item/key[text()="Name"]/../value/text()

~~~
icebraining
Alternatively

    
    
      //root/item[key[text()="Name"]]/value/text()

~~~
benibela
You usually do not need to use text() in xpath, so this should work the same:

    
    
      //root/item[key="Name"]/value
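
For comparison, both layouts can be queried with the limited XPath subset in Python's standard `xml.etree.ElementTree`. This is only a sketch: the variable names are invented, and the paths are relative to the root element because ElementTree does not accept absolute paths on an element.

```python
import xml.etree.ElementTree as ET

# Key stored in an attribute.
ATTR_STYLE = '<root><item key="name">John</item><item key="city">London</item></root>'

# Key stored in a child element.
ELEM_STYLE = """<root>
  <item><key>Name</key><value>John</value></item>
  <item><key>City</key><value>London</value></item>
</root>"""

attr_root = ET.fromstring(ATTR_STYLE)
elem_root = ET.fromstring(ELEM_STYLE)

# [@key='name'] filters on an attribute value.
name_from_attr = attr_root.find("./item[@key='name']").text

# [key='Name'] filters on the text of a child element called <key>.
name_from_elem = elem_root.find("./item[key='Name']/value").text

# The comparison is case-sensitive, so 'name' does not match <key>Name</key>.
lowercase_match = elem_root.find("./item[key='name']/value")

print(name_from_attr, name_from_elem, lowercase_match)
```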

------
mickduprez
XML is/can be much more than a markup language and yes, it can be used very
badly but this is usually by inexperienced 'data wranglers' who don't
understand the difference between data attributes and data proper.

While XML can seem cumbersome (compared to JSON say) it is a very good 'data
transport' tool when used correctly with a sensible schema (XSD).

For example, we use XML as a 'vendor neutral' data format to export/import CAD
geometry and associated data for town utilities such as buildings, pipes,
roads etc. All this data has to be validated against the schema to ensure its
correctness. Using a schema like this enables the city council to import this
XML into the GIS system to be used for asset management, financial planning
etc.

A good schema can be key to sharing XML effectively between
departments/applications and being a markup language this data can also be
viewed independently using XSLT.

------
just_myles
I agree with this portion 100%

The correct way to express a dictionary in XML is something like this:

      <root>
        <item>
          <key>Name</key>
          <value>John</value>
        </item>
        <item>
          <key>City</key>
          <value>London</value>
        </item>
      </root>

In the past I used to create scripts that exported xml from relational data
but didn't really understand the right way to build and structure them.

------
sosuke
The larger your XML file is the more accurately you're using it. Less " and
more <>. I made these mistakes, using XML like I was writing an HTML doc.

------
tehjoker
It is difficult for me to see what the real issue is with examples given. It
seems to be more an aesthetic preference of the author rather than a technical
argument. People can use formats for whatever they want. :P

If you told me that the transmission and parsing rate is too slow for their
application, that's a real dig at it.

~~~
mehrdadn
I think it's a bit like trying to explain why a Python REPL isn't a substitute
for a calculator? Like it can of course do what you want, and you can't "see"
what's wrong if you just take what you see literally (you'll obviously get the
same answers regardless of what tool you use), but it's just... not meant for
that.

------
LameRubberDucky
For those wondering what you do with XML as a document markup language, see
the XML document that is the specification for XML. I had to look at the page
source to determine it really is an XML document. Looks like an HTML document.

[https://www.w3.org/TR/xml/REC-xml-20081126.xml](https://www.w3.org/TR/xml/REC-xml-20081126.xml)

~~~
spiralx
I've got a stylesheet I wrote that turns HTML or XHTML into a fully indented
and highlighted representation of itself which I was quite proud of :) I used
to spend a lot of time writing XSLT and XQuery lol.

------
deanCommie
I think that Software Engineers should take influence from Authors (after all,
are we not all craftsmen/artisans?) and incorporate the philosophies from
[https://en.wikipedia.org/wiki/The_Death_of_the_Author](https://en.wikipedia.org/wiki/The_Death_of_the_Author)

The idea, for those not familiar, is that once a work of art is published (a
novel, a poem, a song, a painting), it speaks for itself, and authorial intent
no longer matters.

That is, meaning and purpose are in the eye of the beholder/consumer. And
there is no right or wrong way to "interpret" art. If someone finds meaning
that the author did not intend, it is just as valid as a deeply hidden but
intentional allegory they intentionally placed in when they were writing.

The relevance to software is it applies to APIs, specifications, standards and
formats.

There is no such thing as users using your software or specification "wrong"
- if they insist on doing so, the meaning has evolved. Evolve with it or die.

~~~
rkagerer
That's a little extreme but you raise a good point. I think a talented spec
designer anticipates how their work might be interpreted / used / abused, and,
like an adroit villain, nudges their audience toward tenets their grand scheme
seeks to achieve.

------
tannhaeuser
> _In 1996, XML was invented._

XML wasn't an original invention; it is specified as a proper SGML subset.
From the XML spec:

> _The Extensible Markup Language (XML) is a subset of SGML that is completely
> described in this document._

Now I totally agree that SGML and XML aren't for service payloads and config
files. The sole purpose of markup languages is representing _structured text_.
And arguably, SGML fills this role much more adequately than XML today as it
can represent (via the SHORTREF mechanism) custom Wiki syntaxes such as
markdown and others, and in contrast to XML, can deal with the largest corpus
of markup out there, e.g. can parse HTML with all its minimization features such
as omitted tags, enumerated and unquoted attributes, etc. See [1] for a
practical introduction (disclaimer: link to a tutorial I held last month at
ACM DocEng).

[1]: [http://sgmljs.net/docs/sgml-html-tutorial.html](http://sgmljs.net/docs/sgml-html-tutorial.html)

~~~
catalogia
SGML's implied close tags are a pain in the ass though. End tags in XML are
overly verbose, but at least they're required.

~~~
tannhaeuser
You control whether an element requires start- and end-element tags in your
element declaration via "O" (letter O as in "omissible") in the respective tag
omission indicator position:

    
    
        <!ELEMENT e - - (f,g,h)   -- no tag omission -->
        <!ELEMENT f O - (#PCDATA) -- start-tag omission -->
        <!ELEMENT g - O (#PCDATA) -- end-tag omission -->
        <!ELEMENT h O O (#PCDATA) -- both start- and end-
                                     tag omission allowed-->
    

What's painful about end-element tag omission?

------
foolfoolz
turns out a well specified format that has a lot of parsers available is
useful for more than just a markup language. xml is great at data formatting,
a little more verbose than alternatives but also a lot more feature rich

~~~
kazinator
The format wasn't well-specified and didn't have a lot of parsers in 1996
though. The parsers came after the decision was made by a lot of people to use
a markup language inappropriately for structured data.

------
h2odragon
I present, in the spirit of 'worst XML ever', the docs for ScriptXML:

[https://www.egosoft.com:8444/confluence/display/XRWIKI/Missi...](https://www.egosoft.com:8444/confluence/display/XRWIKI/Mission+Director+Guide)

This is the language used in the game X4 Foundations (and others in the
series). An example of its use (mine, i claim no grace in it):

[https://github.com/h2odragon/dragoncommands/blob/master/aisc...](https://github.com/h2odragon/dragoncommands/blob/master/aiscripts/deployglobe.xml)

... XML is not a great format for an extension language, I have to say.

~~~
Gibbon1
Eclipse project files are also hot garbage.

------
bullen
I would love to know what people think of my XML node-graph/tree editor I made
before JSON became mainstream (my excuse):
[http://rupy.se/logic.jar](http://rupy.se/logic.jar)

It basically names the tag what you name the node. :S

- You link/unlink nodes (I called them entities! Xo) by right-click-dragging
between them.

- You copy stuff by right-click-dragging to an empty space.

- You delete by grabbing something by left-click-holding and pressing the
delete key.

- Oh, and nodes are completely tree structure expandable, just drag-drop
attributes on nodes and nodes inside nodes.

The editor uses lightweight rendering so you can have a ton of elements with
good performance.

(I know, not super intuitive; but very handy once you know about these.)

------
commandlinefan
The first rule of XML is: whatever you're doing with it, that's not what it
was for.

------
hashberry
> a simple test for determining if an XML schema is well designed: remove all
> tags and attributes from it ... If what you have left over does not make
> sense ... you shouldn't be using XML at all.

Magento 2 (acquired by Adobe for $1.68bn) uses XML to render its layouts.
Here's some fun XML for the checkout page:

[https://github.com/magento/magento2/blob/2.3-develop/app/cod...](https://github.com/magento/magento2/blob/2.3-develop/app/code/Magento/Checkout/view/frontend/layout/checkout_index_index.xml)
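
The quoted test is easy to try mechanically. Here is a small Python sketch that throws away all tags and attributes and keeps only the text. The `text_only` helper is invented for illustration; the two inputs reuse a DocBook-style paragraph and an attribute-only dictionary like the ones discussed elsewhere in this thread.

```python
import xml.etree.ElementTree as ET

def text_only(xml_text):
    """Return what is left once tags and attributes are removed."""
    root = ET.fromstring(xml_text)
    # itertext() walks every text node in document order;
    # the split/join collapses runs of whitespace.
    return " ".join("".join(root.itertext()).split())

document_style = (
    "<para>I hope that your day is proceeding "
    "<emphasis>splendidly</emphasis>!</para>"
)
data_style = (
    '<root><item name="name" value="John"/>'
    '<item name="city" value="London"/></root>'
)

print(repr(text_only(document_style)))
print(repr(text_only(data_style)))
```

The first input still reads as a sentence after stripping; the second leaves nothing at all, which is the situation the quoted test is meant to flag.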

------
benibela
The worst XML use I have ever seen is the lists generated by Lazarus. Every list.
For example in the project files you have:

    
    
        <RequiredPackages Count="5">
          <Item1>
            <PackageName Value="LazUtils"/>
          </Item1>
          <Item2>
            <PackageName Value="treelistviewpackage"/>
          </Item2>
          <Item3>
            <PackageName Value="internettools"/>
          </Item3>
          <Item4>
            <PackageName Value="LCLBase"/>
            <MinVersion Major="1" Release="1" Valid="True"/>
          </Item4>
          <Item5>
            <PackageName Value="LCL"/>
          </Item5>
        </RequiredPackages>

~~~
hyperman1
I once got a really nice one:

Software v1.0: Config is a binary blob and everybody curses the nasty editor
provided by vendors.

Software v2.0: Config is in XML. Thank god!

Oh wait.

<Binaryblob><Byte value="65"/><Byte value="99"/> ....

------
mpweiher
Hmm...somewhat disagree with the "correct" way to express a dictionary. I
prefer:

    
    
       <root>
          <Name>John</Name>
          <City>London</City>
       </root>
    

Removes one level of indirection, XML already has keys.

~~~
progval
That's not a good way to express a dictionary because it does not allow
arbitrary strings as key names.

It's also not a good example of XML, because XML schemas should have a fixed
list of tag names.

~~~
sorenjan
But the starting point in the example was

    
    
        <item name="name" value="John" />  
        <item name="city" value="London" />  

where the key names are used as attributes, so it wouldn't work with arbitrary
key names either, right?

~~~
progval
It would work (kind of); most XML parsers/generators would take care of
escaping and unescaping quotes; but there's no way in the XML spec to escape
characters in tag names.
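
A quick way to see the difference, sketched with Python's standard `xml.etree.ElementTree`. The awkward key and the `parses` helper are made up for the example.

```python
import xml.etree.ElementTree as ET

# A key containing characters that are special in XML.
awkward_key = 'first name "quoted" <&>'

# As an attribute value, the serializer escapes it and the parser
# restores it, so the key survives a round trip.
item = ET.Element("item", {"name": awkward_key, "value": "John"})
serialized = ET.tostring(item, encoding="unicode")
round_tripped = ET.fromstring(serialized).get("name")

def parses(xml_text):
    """Return True if xml_text is accepted by the parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# As a tag name, even a single space is enough to break the document.
tag_with_space_ok = parses("<first name>John</first name>")

print(serialized)
print(round_tripped == awkward_key, tag_with_space_ok)
```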

------
scarygliders
I have always thought that using XML as a format for storing and retrieving
configuration files, was complete insanity.

Which is why I still use the simple, effective, INI format for configuration
files for applications I write.

XML for config files is madness personified.

------
billsix
RIP Erik Naggum

[http://www.schnada.de/grapt/eriknaggum-xmlrant.html](http://www.schnada.de/grapt/eriknaggum-xmlrant.html)

------
davidw
I love the quote I originally saw on the Nokogiri (Ruby XML lib) site:

"XML is like violence – if it doesn’t solve your problems, you are not using
enough of it."

------
Quarrelsome
I once had to write a data layer in xml, in-situ with a lifespan for up to
hours, as more data was appended to it. An invalid xml document that you
couldn't load in many xml apis for 99.9% of its lifespan. I begged and pleaded
with the lead architect to use an sqlite db for the elements of the data until the
transaction was complete and then merely produce the xml file at the end, but
no.

It had to survive power outages too.

------
miggol
The worst example of this that I deal with regularly has to be the libvirt
domain XML.

[https://libvirt.org/formatdomain.html](https://libvirt.org/formatdomain.html)

It does occasionally put information outside of the tags, but because there's
no logic to when, it's nearly worse.

------
lmilcin
And JavaScript was never meant to be used to build applications...

There are many, many more inventions that are used for purposes other than the
ones they were meant for.

The Internet was created so that the US could withstand a nuclear attack and it
was never meant to be primarily used to spread advertisements.

Get over it.

------
micimize
I can't find much to corroborate this article's take. RDF is a stark
counterexample - a standard from the W3C. It has endorsement from Tim Bray, one of
XML's co-authors.

------
billpg
XML with only tags and attributes (nothing but ignored spaces between '>' and
'<') is a reasonable structured data format.

