
"What is XML good at?"

(For the purposes of this post, I'm including HTML in the XML family.)

XML/HTML is good when:

1. You have two dimensions of markup you want to do. That is, you have a clear distinction between what is a new "tag" and what is an attribute on that tag. If you can't almost instantly decide whether some feature you want to add works as an attribute or a tag, you probably shouldn't be in XML.

2. Almost every tag one way or another contains some text, the third dimension that XML supports. A proliferation of tags that never contain any text is a bad sign. A handful may not be a problem, e.g. "hr" in HTML, but they should be the exception.

3. You have a really good use case for XML namespacing, the fourth dimension of information that XML supports, in which case there's almost no competition for a well-standardized format, as long as you're also using the previous three dimensions.
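
To make the dimensions concrete, here's a minimal sketch using Python's standard `xml.etree.ElementTree`. The document and the `urn:example:*` namespace URIs are made up for illustration; the point is just that every element comes back with a namespace, a tag, attributes, and text, whether or not you wanted all four.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: both namespace URIs are invented for this example.
DOC = """<doc xmlns="urn:example:doc" xmlns:rev="urn:example:review">
  <p class="intro">XML is <em rev:by="alice">good</em> at marked-up text.</p>
</doc>"""


def split_name(qualified):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if qualified.startswith("{"):
        namespace, _, local = qualified[1:].partition("}")
        return namespace, local
    return "", qualified


def describe(doc_text):
    """Return one (namespace, tag, attributes, text) row per element."""
    rows = []
    for el in ET.fromstring(doc_text).iter():
        namespace, tag = split_name(el.tag)
        rows.append((namespace, tag, dict(el.attrib), (el.text or "").strip()))
    return rows


for row in describe(DOC):
    print(row)
```

Note that the namespaced attribute on `em` shows up under the key `{urn:example:review}by`, so the namespace dimension applies to attributes as well as tags.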

There's a popular myth that XML is useless. I don't think it persists because it's true, or because XML is bad; I think it persists because most of the time you want to dump out a data structure, #1 isn't true, let alone #2 or #3. In a lot of data sets, you've only got the two dimensions of "simple structure" and "text", not annotations on the structure itself. (Or, perhaps even more accurately, the annotations end up implicit in the format itself, and the format is constant enough for that to be just fine.) A lot of stuff in the 1990s and 2000s used XML "because XML" even though it clearly failed #1. XML is really clunky when you don't want that second dimension, because the XML APIs generally can't let you ignore it, or they wouldn't actually be XML APIs.

On the other hand, when you learn this distinction, you do come across the occasional JSON-based format that clearly really ought to be XML instead. You can embed anything you want into JSON, but when you're manually embedding a second structure dimension into your JSON document, it loses its advantages over XML fast. If you've ever seen any of the various attempts to fully embed HTML into JSON, without leaving any features behind, you can begin to see why XML or XML-esque standards like HTML aren't a bad idea. HTML is much easier to read for humans than HTML-in-JSON-with-no-compromises.
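
Here's a rough sketch of what "HTML-in-JSON-with-no-compromises" tends to look like. The `tag`/`attrs`/`children` layout is just one encoding I made up for illustration, not any standard, but any lossless encoding has to carry the same information: tag, attributes, and text interleaved with child elements in order.

```python
import json
import xml.etree.ElementTree as ET

MARKUP = '<p class="note">Read the <a href="/faq">FAQ</a> <em>first</em>.</p>'


def to_json_node(el):
    """Encode an element losslessly as a dict (hypothetical layout)."""
    children = []
    if el.text:
        children.append(el.text)
    for child in el:
        children.append(to_json_node(child))
        # Text after a child's closing tag belongs to the parent's content.
        if child.tail:
            children.append(child.tail)
    return {"tag": el.tag, "attrs": dict(el.attrib), "children": children}


as_json = json.dumps(to_json_node(ET.fromstring(MARKUP)))

print(MARKUP)
print(as_json)
print(len(MARKUP), "characters of markup vs", len(as_json), "of JSON")
```

The JSON version is longer, but the bigger cost is that the running text gets chopped into list items between dicts, which is the part humans actually wanted to read.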

And if you've truly got the four-dimensional use case, XML is really quite nice. When you need all the features, suddenly the libraries, completely standardized serialization, and XPath support and such are all actually convenient and surprisingly easy to use, for what you're getting.
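
As a small illustration of the library side, here's a sketch of namespace-aware queries using the limited XPath subset in Python's standard `xml.etree.ElementTree`. The feed document, the `urn:example:*` URIs, and the `f`/`geo` prefixes are all assumptions for the example; a full XPath engine would offer more than this subset does.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: a base vocabulary plus an extension in its own namespace.
FEED = """<feed xmlns="urn:example:feed" xmlns:geo="urn:example:geo">
  <entry id="a"><title>First</title><geo:point lat="1.5" lon="2.5"/></entry>
  <entry id="b"><title>Second</title></entry>
</feed>"""

# Prefixes here are local to the queries; only the URIs have to match.
NS = {"f": "urn:example:feed", "geo": "urn:example:geo"}

root = ET.fromstring(FEED)

# All entry titles, in document order.
titles = [t.text for t in root.findall("./f:entry/f:title", NS)]

# One entry selected by attribute value.
second = root.find("./f:entry[@id='b']/f:title", NS).text

# Extension elements from the other namespace, wherever they appear.
latitudes = [p.get("lat") for p in root.findall(".//geo:point", NS)]

print(titles, second, latitudes)
```

A consumer that doesn't know about the `geo` namespace can run the first two queries and never notice the extension is there, which is the extensibility argument for the fourth dimension.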

Some examples: HTML is generally a good idea. SVG is a middling idea; it passes #1 and #3 but fails #2. SOAP and XML-RPC are generally bad ideas; SOAP fails #1 and #2 but sort of uses #3, and XML-RPC fails all three. XMPP I actually think is pretty solid as an XML format (mere network verbosity problems can be solved with an alternate encoding, though admittedly that becomes non-standard), and in a lot of ways, the real problem with XMPP isn't so much the format itself as that people are not used to dealing with the four-dimensional data structures that result. People expecting IRC-esque flat text are not expecting such detail. Using the fourth dimension of namespaces for extensibility is neat, but few developers understand it, or want to.

This is perhaps the best (most terse and accurate) summary of XML tradeoffs I've seen in years.

I generally don't just comment "attaboy" but there you go.
