
XML for databases: a dead idea - joeyespo
http://lemire.me/blog/archives/2013/01/14/xml-for-databases-a-dead-idea/
======
binarymax
I had a project where I used an xml column[1] in MSSQL2008, since we had
variable schema data and didn't want to store it vertically (one row per
value), and definitely didn't want to alter schema on the fly. While I
wouldn't necessarily recommend doing what I did, I don't see anything wrong
with it and it solved that specific problem rather eloquently. We got the data
out just fine with XQuery.

Believe it or not it was extremely fast...but only after playing with the xml
format for awhile and making sure the indexes all fit nicely. You can, almost
surprisingly, index specific XML fields inside of xml data-type columns[2].

I can almost hear you all cringing after reading that :)

[1] [http://msdn.microsoft.com/en-
us/library/ms190936(v=sql.90).a...](http://msdn.microsoft.com/en-
us/library/ms190936\(v=sql.90\).aspx) [2] [http://msdn.microsoft.com/en-
us/library/ms191497(v=sql.90).a...](http://msdn.microsoft.com/en-
us/library/ms191497\(v=sql.90\).aspx)

~~~
meaty
We had an entire system based on SQL Server's "FOR XML" output clause and XML
data islands in IE. It was an unmitigated fucking disaster zone that took 5
years to get rid of.

That put us off the above pretty sharpish.

~~~
DenisM
FOR XML and XML data type (e.g. XML column) are pretty much unrelated.

~~~
meaty
Yep but they share one feature: shitty idea.

------
itsnotdeadyet
For better or worse it's only dead in the same sense that the "relational
model" is dead - as a research topic. It's certainly alive in the real world
where it's more properly used as an interchange format. E.g. The IRS's MeF[1].
It's also finding interesting adoption in other industries[2].

1- [http://www.irs.gov/uac/Modernized-e-File-(MeF)-Program-
Infor...](http://www.irs.gov/uac/Modernized-e-File-\(MeF\)-Program-
Information)

2- [http://www.service-
architecture.com/xml/articles/oil_and_gas...](http://www.service-
architecture.com/xml/articles/oil_and_gas_xml.html)

------
MattRogish
Good. XML for data-storage was always a bad idea and I (and others
<http://www.dbdebunk.com/>) have been ranting about it since the beginning.
XML is verbose, hard to parse, and enforces a ridged hierarchy, even if your
data model didn't need it.

Presumably XML is good for "interchange" but I've never found that to be the
case. JSON seems to do just fine. :)

~~~
EugeneOZ
Actually XML is more "reach" format than JSON. XML nodes have attributes and
values, when in JSON - values only. And lot of databases are "document-
oriented": <http://en.wikipedia.org/wiki/Document-oriented_database> and what
is matters in comparison is parsing performance and size. JSON just more
compact, but not better in all aspects. I think XML is an excellent format,
especially when size matters less than "dimensions" of presented information.
I think it's perfect format for RSS and for REST API and for web-pages.

~~~
evincarofautumn
XML seen in the wild rarely has good separation of attributes and values. It
_can_ be useful to distinguish data and metadata, but it’s more expressivity
than perhaps most applications need.

------
fusiongyro
XML wasn't a good fit for databases or configuration files, but it was
basically inevitable that it would wind up used in those capacities. Playing
around with new technology and misusing it in fun ways like that is an
important part of the growth of a new technology--that's how we learn what the
strengths and weaknesses of a technology are.

~~~
bsg75
How did the trend develop, especially for things like configuration data?

Some warped sense of ease from using Java and .NET libraries to read / write
XML?

~~~
MichaelGG
XML is still a good solution for human-readable data. JSON, for instance,
doesn't allow comments because of some dogmatic idea of the author. The author
of JSON even goes as far as suggesting that you use a non-JSON parser to strip
out comments before parsing. In other words, not using JSON for configuration
files that need comments.[1]

XML is also very ubiquitous in library support, having been around for over 15
years.

I think the most negative reaction to XML is the extreme overuse for things
like component configuration, where there's no defaults, and every property
and class name must be specified. Also, the misguided tag closing
(<tag>content</> would have sufficed) adds to the verbosity.

1:
[https://plus.google.com/118095276221607585885/posts/RK8qyGVa...](https://plus.google.com/118095276221607585885/posts/RK8qyGVaGSr)

~~~
mnarayan01
God knows something to ease the closing tag verbosity would make XML at least
twice as useable. I wonder if there's anything in the grammar that would
conflict with </> as a closing tag? I'm almost tempted to muck with libxml and
add it for personal use.

~~~
fusiongyro
A personal dialect of XML? Wouldn't you be better off making your own grammar
from scratch?

~~~
mnarayan01
If I installed it at the system level, then anything that linked against
libxml/libxml2 would have the enhancement. Which is also the biggest reason
I'm not going to do it...I'd invariably end up distributing XML with the
sugarfied close tag.

------
tinco
What I am missing from this article is what replaces XML for databases.

An XML database is just a document store, and that idea has not died. You can
s/XML/JSON/g and you'll see that the ideas and research is still very relevant
today.

XML databases are decidedly 'NoSQL'. MongoDB is a good example of a straight
port of the document store idea to JSON.

MongoDB would be a lot richer if it were to support XQuery in some form,
rather than their awkward json querying system.

I think there is a discrepancy between the academic world which likes the
orderly nature of XML and the pragmatic programmers world who like the
terseness of JSON.

~~~
EugeneOZ
Couchbase also stores documents in JSON (and uses JavaScript for "views"). And
it's one of the best in performance.

------
No1
I used an XML DB in a project back in the early 2000s. The database was
populated by running an in-database XQuery that pulled data across the webs in
XML. When people visited a web site, XML was queried back out, then XSLT
translated it to (drum roll) XHTML. I even experimented with XForms to capture
data from visitors, which had the benefit of directly using the DB schema to
validate data. All in all, it was pretty nifty when you got all the pieces
working together properly. Sort-of XRX without the R.

As to reasons for their "death": DTDs and XSDs were a serious PITA. XML data
types didn't map closely to standard relational DB types; this wasted space
(which was costly at the time), and made it difficult to interact with a
relational DB when that was necessary. The XQuery query planner was braindead.
That was really the nail in the coffin.

All in all, they're just document stores, and those are more popular than ever
now with the whole NoSQL thing. As the author mentions, XML seems to have won
out as a document format... I wonder what sort of DB we could put all those
documents into. One that could support storing those documents without a bunch
of translating to other formats, and had some method of running queries over
them (preferably in a standardized way), and output the data in a similar
format. Too bad XML databases are dead, one of those would have worked pretty
well.

------
meaty
I will say I don't like it for online storage of any kind, but it's pretty
hard to beat XML as an offline interchange or archival format.

Case in point: we can dump an entire client's data from our system into a
single gzipped XML blob straight from the ORM. This can also be restored
straight back into the ORM as well with very little code.

This means we can switch database engines, do snapshots, backups, manage on
site deployments and all sorts of nice things very easily.

------
mwexler
Remember all the XML programming languages? I recall Water
<http://waterlanguage.org/>, for example, getting some buzz in the Boston area
during the buzzy time Daniel Lemire refers to...

~~~
meaty
Ugh I remember that. What an utter vomit-bag of a creation that was.

You can see how popular it was considering their web site was (c) 2001 -
2003...

------
mturmon
Nice observation:

"I initially wanted to write an actual research article to examine why XML for
databases failed [but the article would] be unpublishable because too many
people will want to argue against the failure itself. This is probably a great
defect of modern science: we are obsessed with success and we work to forget
failure."

------
runarb
Down for me, but Google has it cached:
[http://webcache.googleusercontent.com/search?q=cache:http://...](http://webcache.googleusercontent.com/search?q=cache:http://lemire.me/blog/archives/2013/01/14/xml-
for-databases-a-dead-idea/)

~~~
hxseven
The CoralCDN also has a cached version:

[http://lemire.me.nyud.net/blog/archives/2013/01/14/xml-
for-d...](http://lemire.me.nyud.net/blog/archives/2013/01/14/xml-for-
databases-a-dead-idea/)

------
SeanLuke
> Storing data in XML for long-term interoperability is an acceptable use of
> XML.

What?

~~~
eli
Not sure it would be my first choice either, but that's not a crazy statement.
In a couple of decades, I'm confident there will still be well-maintained XML
libraries for every popular platform.

------
tantaman
They just evolved into document databases like Mongo and CouchDB. Same thing
is happening in the transport world. XML started the shift, JSON is continuing
it.

------
snarfy
It's bad for data storage, but it's great for passing hierarchical parameters
to stored procedures. It's also great for retrieval of hierarchical data.

Storing the XML in a column, that's just dumb. If I did that I would only ever
treat it as a binary blob, same as any file. When you start using the db's XML
query support to query that column, that's when you get into trouble and need
to rethink your design.

------
ucee054
If we exclude stuff like OLAP and spatial databases, there are only 3 types of
database: relational, pointer and logical.

Relational means something that accepts Codd's constraints, more-or-less.
Pointer is something that doesn't, and which therefore easily allows
hierarchies.

Logical is like relational, expanded to allow recursive queries, prolog-style,
which is a high level way to have hierarchies and Codd's constraints, but you
pay for in terms of query performance.

The thing is, pointer databases have been reinvented again and again: First as
the mainframe cobol databases of the 70s, then as the filesystem, then as the
windows registry, then as OODBMSs, then as XML databases, and now as NoSQL.

The other thing is, Postgres is designed to cater for all 3, relational,
pointer and logical.

------
Zash
XML is fine if used where it fits.

Use the best tool for the job and all that.

~~~
calpaterson
> Use the best tool for the job and all that.

I don't know of anyone advocating that they something other than the best
tools for the job should ever be used. The question is whether a particular
tool is good for a particular job or not.

