I remember teaching classes in '99 where I needed to explain XML. I could do it in five minutes -- it's like HTML, but you get to make up tags that are case-sensitive. Every start tag needs to have an end tag, and attributes need to be quoted. Put this magic tag at the start, so parsers know it's XML. Here's the special case for a tag that closes itself.
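In fact, the whole five-minute lesson fits in a handful of lines. This snippet (the `note` vocabulary is made up) exercises every rule:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- the "magic tag" above tells parsers it's XML -->
<note date="2011-05-26">      <!-- made-up tag; attribute is quoted -->
  <Body>Hello</Body>          <!-- case-sensitive: Body and body differ -->
  <attachment name="a.png"/>  <!-- the special self-closing form -->
</note>
```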
Then what happened? Validation, the gigantic stack of useless WS-* protocols, XSLT, SOAP, horrific parser interfaces, and a whole slew of enterprise-y technologies that cause more problems than they solve. Like CORBA, Ada, and every other "enterprise" technology that was specified first and implemented later, all of the XML add-ons are nice in theory but completely suck to use.
XML has changed from a mildly verbose but eminently human-readable and machine-parsable data representation into a "technology" that has become over-specialized and covered with encrustations to the point that it is approaching regex's on the "now you have two problems" scale. JSON just gets back to a format that both the machine and I can read reliably and with ease.
This is a re-purposed quote, originally referencing sed or awk; numerous people since have substituted in XML.
PS: Parsing is one of those cases where you need to have the program do the kinds of things people are used to the source code doing. In other words, simply nesting if statements does not get you to a solution, which often leads to people tossing out lots of buggy code. Creating a DSL (or yet another LISP) can be really useful, but regexps are a dead end for a wide range of problems.
Okay, that's a little bit flip. Regexes are fine when used in well-understood cases, and when they're properly documented for whoever comes after you and has to deal with them. The "now you have two problems" part, IMO, comes more from the way most people use them than from any particular flaw in regexes themselves.
And then you have to add a few utterly pointless lines of code to deal with that namespace.
And people defending XML namespaces will appear in 3, 2, 1...
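For what it's worth, the "utterly pointless lines" look about like this with Python's standard-library parser (the namespace URI and document here are made up for illustration):

```python
import xml.etree.ElementTree as ET

# A document with a default namespace -- the URI is hypothetical.
doc = """<feed xmlns="http://example.com/ns/feed">
  <entry><title>hello</title></entry>
</feed>"""

root = ET.fromstring(doc)

# A plain search silently finds nothing, because every element's
# real name is now '{http://example.com/ns/feed}entry':
assert root.find("entry") is None

# So you must thread a prefix-to-URI mapping through every query:
ns = {"f": "http://example.com/ns/feed"}
title = root.find("f:entry/f:title", ns)
assert title.text == "hello"
```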
TBH, at one point I thought the XML/XSLT combination for generating HTML server- and client-side would be the next big thing. It was amazing for me, much better than jQuery.tmpl or any other JS templating language, as it acted identically on server and client. Unfortunately, non-trivial XSLTs are a little much for many programmers to grok; you have to be able to think functionally. People want their loops. So I've given up on them as too high maintenance (even I had trouble reading XSLTs I'd written a month ago).
I do prefer json to xml now though.
0. I don't really like XML (and most stuff around XML), but I rather like XML namespaces. Not all of them, mind you, but my issues mostly have to do with default namespaces and the way they're treated on elements versus attributes.
All in all, XML namespaces solve real issues in a rather smart manner.
Now, tooling access to XML namespaces is a completely different bucket of filth; most APIs and XML-related stuff don't fare well there.
> Unfortunately, non-trivial XSLTs are a little much for many programmers to grok; you have to be able to think functionally.
To think shit functional. Functional without real functions, with broken scoping, with an implicit state throughout, ... A "functional" language where you need something like 6 lines to write a recursive function which does nothing but recurse is closer to dysfunctional.
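For a concrete sense of that verbosity, here is roughly what a do-nothing recursive countdown looks like as an XSLT named template (a sketch, not taken from any particular stylesheet):

```xml
<xsl:template name="countdown">
  <xsl:param name="n"/>
  <xsl:if test="$n &gt; 0">
    <xsl:value-of select="$n"/>
    <xsl:call-template name="countdown">
      <xsl:with-param name="n" select="$n - 1"/>
    </xsl:call-template>
  </xsl:if>
</xsl:template>
```

The equivalent in almost any general-purpose language is two or three lines.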
XSLT had a smart idea (only one) in XPath-selected templates, but the whole implementation is garbage.
> People want their loops.
XSLT has loops. It's not loops which throw people off, it's the unreadable verbosity and having to jump through hoops in order to do the simplest processing. It is also hard (as in, you need quite a bit of experience) to create anything even remotely reusable, and even then XSLT has no modularity worth speaking about. And the namespace handling is a mess.
I think several healthy businesses supplying a GUI for XSLT suggests two things: 1. there's a need for what XSLT does; and 2. XSLT sucks
Of course, it could be that GUIs indicate a kind of disruption - making a difficult technology simpler to use, and therefore accessible by less-skilled people (or skilled people with less time). The potential disruptees are the XSLT experts (who in turn displaced people who wrote manual transformers, before XSLT).
Having said all that: there isn't anything comparable for JSON. And yet, communication between different apps - or even communication between different versions of the same app - is pretty much a given.
*shrug* I don't actually think these problems are all that complex. XML made them complex.
The bottom line is that you need to treat any data format as a protocol. The only thing you need to know about the data is what protocol it is. A simple file extension pretty much takes care of that. Then each data format can pick whatever (in)formalism they want to describe the protocol.
JSON is popular precisely because it doesn't try to solve any non-problems.
This reminds me of Erik Naggum's XML rant: http://www.schnada.de/grapt/eriknaggum-xmlrant.html
"If a solution is much smarter than the problem and really stupid people notice it, they believe they have got their hands on something /great/, and so they destroy it . . ."
If you just need a lightweight API that is only ever going to be REST-oriented, then use JSON.
If you're building a very complex Enterprise SOA system, use XML as the associated technologies (xsd, xsl, soap+ws-* etc) will be very useful.
If people are starting to invent XML like technologies for JSON (namespaces or schemas etc), then why not just use XML? It's already been done and works well.
I disagree with that assertion. There is nothing the open Web can possibly gain by adopting JSON for its APIs.
In a heterogeneous ecosystem, much like the Web is, it'd be much more productive to have a typed, extendible and validatable format like XML.
Last but not least, what does make XML ill-suited for any ‘lightweight’ APIs? Atom fits nicely with REST and is quite lean. There's hardly anything JSON can offer, to my knowledge, that could be any lighter or more robust.
But then XML goes further and provides you with a contract. Now anyone can write their own independent implementation and test it (validate it) against that contract. This is what the Web was about in the first place. You can use any technology imaginable, any language imaginable, but in the end you are delivering to a common format, HTML+JS.
Now that this is insufficient, the natural evolution would be XML and REST. JSON, however, is a liability. I believe it is, in fact, the next IE6!
XML is not any more or less typed, extendible or validatable than JSON. They are both serialisation formats that make you put data into their own structure, a tree with attributes for XML and lists and dictionaries for JSON.
Exactly two types exist in XML: tree and text. And exactly six types exist in JSON: list, dictionary, string, number, boolean and null.
What XML has that JSON doesn't is a solution for automating validation called XML Schema.
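Those six JSON types are a big part of why deserialization feels effortless: they map one-to-one onto most languages' native values. In Python, for instance:

```python
import json

# One sample value of each JSON type: dictionary, list,
# string, number, boolean and null.
doc = '{"items": [1, 2.5, "three", true, null]}'
data = json.loads(doc)

# Everything arrives as a native Python object -- no tree-walking,
# no text nodes to re-parse.
assert data["items"] == [1, 2.5, "three", True, None]
```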
XML was, in fact, designed with all those things in mind. Here is the 2nd design goal and its rationale, from the 1996 draft specification:
> 2. XML shall support a wide variety of applications.
> No design elements shall be adopted which would impair the usability of XML documents in other contexts such as print or CD-ROM, nor in applications other than network browsing. Examples of such applications include:
> validating authoring tools
> batch validators
> simple filters which understand the XML document structure
> formatting engines
> translators to render XML documents into other languages (e.g. RTF, TeX, ...)
It should also be noted that XML included the DTD specification from the very beginning. Although much different from the current state of affairs—still no namespaces and not quite the flexibility of XSD—it's nevertheless an important distinction.
As you might imagine, this core concept was further developed, which resulted in the 1.0 spec of XML Namespaces (called xml-names at that time), with XSD soon following suit.
To say that JSON and XML are anything alike is unfair. As JSON's own RFC states, it was meant to be merely:
No concern is given to its interoperability between vastly different environments, nor did it try to position itself as a replacement for XML, which was surely known to its creators at the time. Subsequently, there's no mention of validation or extendibility.
Of course, the concept of XML Namespaces is unique to XML. Even JSON Schema does not introduce anything similar, but merely provides an optional meta-document describing—or should I say, hinting at—how the document may look:
> It is acknowledged that JSON documents come in a variety of structures, and JSON is unique in that the structure of stored data structures often prescribes a non-ambiguous definite JSON representation.
> Cumulatively JSON Schema acts as a meta-document….
The JSON schema is of course still a draft today, and has received almost no adoption from the community.
This should be obvious, however, given the fact that it took a whole four years for JSON Schema to emerge (such a long time in the noughties), suggesting just how non-essential it must be to its users.
What I'm honestly afraid of is that JSON Schema will ultimately get our attention: not because we will seek it for its technical brilliance, which it hasn't got, nor for its user-friendliness, which is unimpressive, but because we will be desperate.
After enough developers get seduced by the carelessness of JSON—“blimey, I don't have to think at all, and it's endorsed by some bloggers”—we will embrace JSON Schema and, surely, invent a whole novel ecosystem just to deal with the mess that such a ‘fuck the open Web, I'd rather just output garbage, because it is more convenient right now’ attitude can only bring.
What precisely are they useful for? Honest question, I have not had a job yet, let alone the "enterprise" one.
XSD (Schema) - lets you describe the format of your XML document. Useful if you need to tell other parties about your format. You can use XSDs to programmatically validate that a given XML document is "correct" as per your schema. IDEs can use XSD files to do intellisense on XML files as you type. You can also include documentation within XSDs and have an external tool auto-generate documentation (e.g. in HTML) to describe your schema in a more user-friendly way.
XSL - These are XML docs that define how to transform one XML format into another. You can combine this with XSDs to ensure that the result of the transformation is valid.
SOAP + WS-* - WS-* is a set of standards on top of SOAP (which just defines a web-service interface). E.g. WS-Security can require that your web-service encrypt the contents of the message, while WS-Addressing makes SOAP less dependent on HTTP.
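To make the XSD point concrete, here is a minimal, hypothetical schema fragment declaring a `customer` element whose children must be a string `name` followed by an integer `id`:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="id"   type="xs:integer"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A validating parser will reject any document where `id` is missing or non-numeric, which is the contract-checking in action.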
If you do not, then you are guaranteed to run into schema differences between common objects (i.e., SAP "Customer" <> Oracle "Customer" <> Salesforce "Customer", etc.). XML with XSL/XSD wins here.
XSL is an umbrella for a few things; the most interesting is XPath, which is a super-fast query language for XML documents.
SOAP is just worthless.
WSDL is nice though, it compiles a web service API to source code so you get a nice layer of abstraction around what you're writing to.
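On the XPath point: Python's standard library supports a useful subset, which is enough to show the style (the catalog document here is invented):

```python
import xml.etree.ElementTree as ET

doc = """<catalog>
  <book id="1"><title>First</title></book>
  <book id="2"><title>Second</title></book>
</catalog>"""

root = ET.fromstring(doc)

# select every title anywhere under the root
titles = [t.text for t in root.findall(".//title")]
assert titles == ["First", "Second"]

# select by attribute value with a predicate
second = root.find(".//book[@id='2']/title")
assert second.text == "Second"
```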
Before XML we had CORBA, and before that EDI, and before that ASN.1... Just reinventing the wheel and each time a little worse.
For example (AFAIK) XML webservices tend to be less tightly coupled than CORBA's IDLs (though you can tight-couple in any language):
- XML using its own format specification language (DTD; XML Schema) helps keep the external data format independent from the application's internal data structures (until some bright spark realizes they can avoid duplication! by deriving one from the other... thus re-coupling them. Because of this, SOAP is CORBA all over again).
CORBA's IDL produced an interface for every object. This is fine for methods of the top-level object (that represents the main actions of the application/service), but when applied to objects that form the internal data structures (the nested object tree/graph that is received/sent as an argument/return value), it forces the external format to have the same structure as the internal data structure, thus tightly-coupling them.
[And, CORBA tried to do "distributed objects", which just didn't work very well - though it sounds like a good idea.]
REST seems to be making a big difference. It explicitly uses "representational state", meaning that the external format is independent of the sender's and receiver's internal representations. By defining the format independently of an end-point, it seems you escape tight-coupling. In practice, devs will often derive internal object definitions from the external format - but then they do the loose-coupling themselves, internally, and so tight-coupling is still avoided. [It's seeming to me that deriving internal-->external is bad, but external-->internal is OK.]
In a way, I think this progression can be seen as realizing the concept of information hiding in greater and greater depth as time goes on.
Does that seem right? What do you think? Can you point me to any references that address these issues (esp about tight-coupling between internal/external representations)? Many thanks for any help!
I don't disagree with you about bloated middleware products, but you don't have to use them to use SOAP, XSD, XSL, XPath, etc.
It may be my lack of experience in the area, but I have never found big-iron solutions to be a good solution to anything I have dealt with, usually they seem to be more trouble than the actual problem.
If only it were so easy. A few years ago I had a boss who wanted to use XML everywhere. No matter what. The whole company had only Java+XML as a selling point.
If you have a typed format (i.e. XSD), there are enough tools out there to generate the code <-> XML interchange boilerplate. In C#, this is typically solved by creating a serializable class with attributes/annotations denoting what XML nodes the actual class-properties map to. Again, fairly painless. And declarative.
If your main beef with XML is that the libraries you have to work with are an exact replica of the W3C DOM model, then it is your tooling which sucks. XML is just fine, when used correctly of course.
Think how jQuery managed to make writing JS fun, as opposed to when you had to work with the plain DOM. HTML is not that far from XML and if tooling can make that kind of difference, judging a technology exclusively based on your lacking libraries (or lack of exposure to good ones) can really not be considered fair by any means.
Now with that said, I find JSON nice as well, but for different reasons and with different drawbacks.
Speaking of that, http://code.google.com/p/phpquery/ is an interesting project. I haven't used it, but it seems like it could make some XML parsing less strenuous.
In Python, I use the ijson bindings for yajl (http://softwaremaniacs.org/blog/2010/09/18/ijson/en/).
Here is a blog post demonstrating ijson: http://thechangelog.com/post/1169335384/ijson-parse-streams-...
For example, in financial time series/trading applications, see the following two charts: a data table and a visualization chart completely generated on the fly using JSON data inputs through ajax calls.
Makes the application very light and all rendering is done on the client side.
Market Returns Dashboard: http://bit.ly/iVxYoP
Daily Market Commentary: http://bit.ly/my5A3c
<series>1.0 2.0 1.2 15.6 3.1 4.5 3.2</series>
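One way to read a fragment like that: the markup only wraps the payload, so the consumer still has to tokenize and type the text content by hand. A quick sketch in Python:

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring("<series>1.0 2.0 1.2 15.6 3.1 4.5 3.2</series>")

# XML hands you one text node; splitting and converting is up to you.
values = [float(v) for v in elem.text.split()]
assert values == [1.0, 2.0, 1.2, 15.6, 3.1, 4.5, 3.2]
```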
- XML is for documents
- JSON is for data-structures
Both of them do a great job in their domain, but you shouldn't try to mix them up and store data-structures in XML or documents in JSON.
I would argue that data in general does not fit well into XML. Documents do fit well, as it's absolutely fantastic at markup, i.e. where there isn't repeated structure. As soon as you have repeated structure (i.e. data as opposed to datum), XML becomes a terrible format. CSV, tab-delimited, gzipped and separated by pipe characters, RDF triples, N3 - give me almost *anything* other than XML and I'll be much happier.
In fact, XML is the reason that RDF is dead in the water. RDF could have been a useful concept, but it's so encumbered by terrible formats and tools that it will almost never be useful; it cannot escape the existing body of XML-RDF. And such a waste, in the name of cargo-cult mimicking of HTML.
Edit: While underscore.js is wonderful, I doubt it performs nearly as well as XPath.
Streaming JSON APIs/parsers are not widespread (and definitely not as widespread as SAX-style streaming XML parsers), which makes JSON unusable for datasets which don't fit in memory.
And generally speaking, I think most people would be satisfied with a CSS-type query language. I'm always surprised by the dislike my colleagues have for XPath and their lack of knowledge of it, even though XPath is one of the very few things I like in the XML infosphere (the other being RelaxNG).
Around for four years. Stable. Language bindings for all major languages.
That's the only one I can find which would be relevant to milweed's assertion that:
> Large data sets should be published in XML.
Apart from the rarely used ability to mix data formats into a single semi-coherent document and some sort of backwards compatibility, I don't see much in the way of reason.
> As far as I know, there is nothing inherent to JSON to prevent anyone from writing a streaming parser.
Of course there is not, and there are several such parsers, hence my writing that they are not widespread. Were they impossible (or non-existent) or at the very least unknown to me I'd have written that they don't exist instead.
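For what it's worth, document-at-a-time reading of a concatenated JSON stream needs nothing beyond the standard library. Here's a sketch using Python's `json.JSONDecoder.raw_decode` (note this still buffers each whole value, so it's not a true SAX-style parser):

```python
import json

def iter_json_stream(text):
    """Yield successive JSON values from a string of concatenated documents."""
    decoder = json.JSONDecoder()
    idx = 0
    while idx < len(text):
        # skip whitespace between documents
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx >= len(text):
            break
        # raw_decode returns the value and the index just past it
        obj, idx = decoder.raw_decode(text, idx)
        yield obj

docs = list(iter_json_stream('{"a": 1}\n[2, 3]\ntrue'))
assert docs == [{"a": 1}, [2, 3], True]
```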