Hacker News
1 in 5 APIs Say “Bye XML” - JSON gaining ground (programmableweb.com)
138 points by ChrisArchitect on May 26, 2011 | hide | past | favorite | 63 comments



I think JSON's power comes from the fact that it is a subset of a programming language. I organize data in arrays and hash tables (or objects, which can be implemented as hash tables), and that is what JSON provides.
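The mapping the parent describes is direct; a sketch in Python (the data is made up):

```python
import json

# JSON arrays and objects map straight onto the host language's
# native lists and hash tables (dicts in Python).
doc = '{"users": [{"name": "ada", "admin": true}, {"name": "bob", "admin": false}]}'
data = json.loads(doc)

admins = [u["name"] for u in data["users"] if u["admin"]]
print(admins)  # -> ['ada']
```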

I remember teaching classes in '99 where I needed to explain XML. I could do it in five minutes -- it's like HTML, but you get to make up tags that are case-sensitive. Every start tag needs to have an end tag, and attributes need to be quoted. Put this magic tag at the start, so parsers know it's XML. Here's the special case for a tag that closes itself.
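A document obeying those five-minute rules, checked with a stock parser (the tags are invented for illustration):

```python
import xml.etree.ElementTree as ET

# The magic tag up front, made-up case-sensitive tags, a quoted
# attribute, every start tag closed, and one self-closing tag.
doc = """<?xml version="1.0"?>
<Course year="1999">
  <Title>Intro to XML</Title>
  <Break/>
</Course>"""

root = ET.fromstring(doc)
print(root.tag, root.attrib["year"])  # -> Course 1999
```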

Then what happened? Validation, the gigantic stack of useless WS-* protocols, XSLT, SOAP, horrific parser interfaces, and a whole slew of enterprise-y technologies that cause more problems than they solve. Like COBRA, Ada, and every other "enterprise" technology that was specified first and implemented later, all of the XML add-ons are nice in theory but completely suck to use.

XML has changed from a mildly verbose but eminently human-readable and machine-parsable data representation into a "technology" that has become over-specialized and covered with encrustations to the point that it is approaching regex's on the "now you have two problems" scale. JSON just gets back to a format that both the machine and I can read reliably and with ease.


Nitpick, but it's "CORBA": Common Object Request Broker Architecture -- from someone who suffered with Orbix and Visibroker back in '99.


What is the issue with regex?


A widely quoted aphorism, originally from a USENET post: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." —Jamie Zawinski

This is a re-purposed quote, originally referencing sed or awk, and numerous people have since substituted in XML.


While regexps can look intimidating try this: write something that does a regexp using just "normal" language features. Greenspun's law applies equally here; any sufficiently complex parser will include a DSL for regular expressions...


I think the root issue is deeper than that; regexps can't handle tree data structures of undefined depth. So while you can often get the common cases to work just fine, some of the edge cases are literally impossible to solve with regexps.

PS: Parsing is one of those cases where you need to have the program do the types of things people are used to the source code doing. AKA, simply nesting if statements does not get you to a solution, which often leads to people tossing out lots of buggy code. Creating a DSL (or Yet Another LISP) can be really useful, but regexps are a dead end for a wide range of problems.
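A quick illustration of the depth problem, using nested parentheses as a stand-in for any tree-shaped input:

```python
import re

# A fixed regex only matches nesting up to the depth it was written
# for (this one handles two levels); a tiny recursive/stateful check
# handles any depth.
two_deep = re.compile(r"^\((?:[^()]|\([^()]*\))*\)$")

def balanced(s):
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

print(bool(two_deep.match("(a(b)c)")))     # -> True
print(bool(two_deep.match("(a(b(c))d)")))  # -> False: one level too deep for the pattern
print(balanced("(a(b(c))d)"))              # -> True
```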


http://ex-parrot.com/~pdw/Mail-RFC822-Address.html

Okay, that's a little bit flip. Regexes are fine when used in well-understood cases when they're properly documented for whatever guy comes after you and has to deal with them. The "now you have two problems" part, IMO, comes more from the way most people use them instead of any particular flaw in regexes themselves.


Regex themselves are fine, but people try to do ridiculous things with them such as "parse" HTML. They're used when they shouldn't be, and they go unused when they should be used. :)


Of course, you can still use plain old XML.


Some languages are hostile to plain old XML too, though: mix one namespace declaration with an XML novice and you'll end up with much head-scratching as to how to actually get the data out.

And then you have to add a few utterly pointless lines of code to deal with that namespace.

And people defending XML namespaces will appear in 3, 2, 1...

TBH at one point I thought the XML/XSLT combination for generating HTML server- and client-side would be the next big thing. It was amazing for me, much, much better than jQuery.tmpl or any other JS templating language, as it acted identically on server and client. Unfortunately non-trivial XSLTs are a little much for many programmers to grok; you have to be able to think functionally. People want their loops. So I've given up on them as too high-maintenance (even I had trouble reading XSLTs I'd written a month ago).

I do prefer json to xml now though.


> And people defending XML namespaces will appear in 3, 2, 1...

0. I don't really like XML (and most stuff around XML), but I rather like XML namespaces. Not all of them, mind you, but my issues mostly have to do with default namespaces and the way they're treated on elements versus attributes.

All-in-all, XML namespaces solve real issues in rather smart a manner.

Now tooling access to XML namespaces is a completely different bucket of filth, most APIs and XML-related stuff don't fare well there.

> Unfortunately non-trivial XSLTs are a little much for many programmers to grok, you have to be able think functional.

To think shit functional. Functional without real functions, with broken scoping, with an implicit state throughout, ... A "functional" language where you need something like 6 lines to write a recursive function which does nothing but recurse is closer to dysfunctional.

XSLT had a smart idea (only one) in XPath-selected templates, but the whole implementation is garbage.

> People want their loops.

XSLT has loops. It's not loops which throw people off, it's the unreadable verbosity and having to jump through hoops in order to do the simplest processing. It is also hard (as in, you need quite a bit of experience) to create anything even remotely reusable, and even then XSLT has no modularity worth speaking about. And the namespace handling is a mess.


The key things about XSLT are that it exists and it works: it's better than nothing. I agree it's horrible to use, e.g. "if (i&lt;10)" - are you kidding me? Though many people use a graphical front end to generate it these days (MapForce, BizTalk Mapper, Stylus Studio; IBM and Oracle have their own - there's even a free one with an online demo: http://jamper.sourceforge.net/).

I think the existence of several healthy businesses supplying a GUI for XSLT suggests two things: 1. there's a need for what XSLT does; and 2. XSLT sucks.

Of course, it could be that GUIs indicate a kind of disruption - making a difficult technology simpler to use, and therefore accessible by less-skilled people (or skilled people with less time). The potential disruptees are the XSLT experts (who in turn displaced people who wrote manual transformers, before XSLT).

Having said all that: there isn't anything comparable for JSON. And yet, communication between different apps - or even communication between different versions of the same app - is pretty much a given.


I've written an XML to RTF converter using XSLT 1.0 (does CALS tables as well!) back in the day at the place I still work at.

The pain!


"XML is like violence, if it doesn't work, you aren't using enough."


Minor correction: JSON is not a subset of JS: http://timelessrepo.com/json-isnt-a-javascript-subset


Quit trolling, that's a technicality.


Which we are discussing here. People should keep this in mind because it can lead to bugs that are hard to detect.
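The technicality in question is small but concrete: U+2028 and U+2029 are legal unescaped inside JSON strings, yet were illegal inside JavaScript string literals (until ES2019 changed that). Checking from Python:

```python
import json

# LINE SEPARATOR (U+2028) may appear raw inside a JSON string...
raw = '"before\u2028after"'
parsed = json.loads(raw)  # parses fine as JSON

# ...so naively pasting that JSON into a <script> as a JS literal
# could produce a syntax error in older engines. One common
# workaround: escape non-ASCII on output, which json.dumps does
# by default (ensure_ascii=True).
safe = json.dumps("before\u2028after")
print("\u2028" not in safe)  # -> True: the raw character never reaches the output
```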


Well so long as we're being technical it's a bug in the JavaScript standard and implementations. At least that's my understanding.


> much of the reason XML is complex is that it’s trying to solve complex problems of data interchange by providing a meta language to describe the data. As JSON gains wider adoption, it will face some of the same issues XML tackled over the last decade.

*shrug* I don't actually think these problems are all that complex. XML made them complex.

The bottom line is that you need to treat any data format as a protocol. The only thing you need to know about the data is what protocol it is. A simple file extension pretty much takes care of that. Then each data format can pick whatever (in)formalism they want to describe the protocol.

JSON is popular precisely because it doesn't try to solve any non-problems.


> JSON is popular precisely because it doesn't try to solve any non-problems.

This reminds me of Erik Naggum's XML rant: http://www.schnada.de/grapt/eriknaggum-xmlrant.html

"If a solution is much smarter than the problem and really stupid people notice it, they believe they have got their hands on something /great/, and so they destroy it . . ."


Not only are they not complex, they were solved in the 80s.

http://en.wikipedia.org/wiki/Abstract_Syntax_Notation_One


JSON has a nice way of reinventing that wheel on their website. http://json.org/


I find the whole JSON vs XML debate pretty tedious. Just use whatever is best for the job.

If you just need a light-weight API that is only ever going to be REST orientated, then use JSON.

If you're building a very complex Enterprise SOA system, use XML as the associated technologies (xsd, xsl, soap+ws-* etc) will be very useful.

If people are starting to invent XML like technologies for JSON (namespaces or schemas etc), then why not just use XML? It's already been done and works well.


> If you just need a light-weight API that is only ever going to be REST orientated, then use JSON.

I disagree with that assertion. There is nothing the open Web can possibly gain by adopting JSON for its APIs.

In a heterogeneous ecosystem, much like the Web is, it'd be much more productive to have a typed, extendible and validatable format like XML.

Last but not least, what makes XML ill-suited for any ‘lightweight’ APIs? Atom fits nicely with REST and is quite lean. There's hardly anything JSON can offer, to my knowledge, that could be any lighter or more robust.


Not so heterogeneous if the only data exchange format available is XML.


Well that's the point, isn't it? XML unifies various independent ecosystems. At the very least, it defines such essential things as date and time representation formats, whereas JSON doesn't even attempt to do that.

But then XML goes further and provides you with a contract. Now anyone can write their independent implementation and test it (validate it) against that contract. This is what the Web was about in the first place. You can use any technology imaginable, any language imaginable, but in the end you are delivering to a common format, HTML+JS.

Now that this is insufficient, the natural evolution would be XML and REST.  JSON, however, is a liability. I believe it is, in fact, the next IE6!


> In a heterogeneous ecosystem, much like Web is, it'd be much more productive to have a typed, extendible and validatable format like XML.

XML is not any more or less typed, extendible or validatable than JSON. They are both serialisation formats that make you put data into their own structure, a tree with attributes for XML and lists and dictionaries for JSON. Exactly two types exist in XML: tree and text. And exactly six types exist in JSON: list, dictionary, string, number, boolean and null.

What XML has that JSON doesn't is a solution for automating validation called XML Schema.
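The six JSON types the parent lists map one-to-one onto what a parser hands back; a quick check in Python:

```python
import json

# The six JSON types: list, dictionary (object), string,
# number, boolean and null, as seen by a parser.
values = json.loads('[[1], {"a": 2}, "s", 3.5, true, null]')
print([type(v).__name__ for v in values])
# -> ['list', 'dict', 'str', 'float', 'bool', 'NoneType']
```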


That's not entirely apples-to-apples.

XML was, in fact, designed with all those things in mind. Here is the 2nd design goal and its rationale, from the 1996 draft specification[1]:

> 2. XML shall support a wide variety of applications.

> No design elements shall be adopted which would impair the usability of XML documents in other contexts such as print or CD-ROM, nor in applications other than network browsing. Examples of such applications include:

> validating authoring tools

> batch validators

> simple filters which understand the XML document structure

> normalizers

> formatting engines

> translators to render XML documents into other languages (e.g. RTF, TeX, ...)

It should also be noted that XML included the DTD specification from the very beginning. Although much different from the current state of affairs—still no namespaces and not quite the flexibility of XSD—it's nevertheless an important distinction.

As you might imagine, such core concepts were further developed, which resulted in the 1.0 spec of XML Namespaces[2] (called xml-names at that time), with XSD soon following suit.

To say that JSON and XML are anything alike is unfair. As JSON's own RFC states, it was meant to be merely:

> JSON's design goals were for it to be minimal, portable, textual, and a subset of JavaScript.

No concern is given to its interoperability between vastly different environments, nor did it try to position itself as a replacement for XML, which would surely have been known to its creators at the time. Subsequently, there's no mention of validation or extendibility.

Of course, the concept of XML Namespaces is unique to XML. Even JSON Schema[3] does not introduce anything similar, but merely provides an optional meta-document describing—or should I say hinting at—how the document may look:

> It is acknowledged that JSON documents come in a variety of structures, and JSON is unique in that the structure of stored data structures often prescribes a non-ambiguous definite JSON representation.

> Cumulatively JSON Schema acts as a meta-document….

JSON Schema is of course still a draft today, and has received almost no adoption[4] from the community.

This should, however, be obvious, given that it took a whole 4 years for JSON Schema to emerge (such a long time in the noughties), suggesting just how non-essential it must be to its users.

What I'm honestly afraid of is that JSON Schema will ultimately get our attention. Not because we will seek it for its technical brilliance, which it hasn't got, nor for its user-friendliness, which is unimpressive, but because we will be desperate.

After enough developers get seduced by the carelessness of JSON—“blimey, I don't have to think at all, and it's endorsed by some bloggers”—we will embrace JSON Schema and, surely, invent a whole novel ecosystem just to deal with the mess that such a ‘fuck the open Web, I'd rather just output garbage, because it is more convenient right now’ attitude could only bring.

[1] http://www.textuality.com/sgml-erb/dd-1996-0001.html

[2] http://www.w3.org/TR/1998/NOTE-xml-names-0119

[3] http://tools.ietf.org/id/draft-zyp-json-schema-03.txt

[4] http://json-schema.org/implementations.html


> If you're building a very complex Enterprise SOA system, use XML as the associated technologies (xsd, xsl, soap+ws-* etc) will be very useful.

What precisely are they useful for? Honest question, I have not had a job yet, let alone the "enterprise" one.


Some examples:

XSD (Schema) - lets you describe the format of your XML document. Useful if you need to tell other parties about your format. You can use XSDs to programmatically validate that a given XML document is "correct" as per your schema. IDEs can use XSD files to do intellisense on XML files as you type. You can also include documentation within XSDs and have an external tool auto-generate documentation (e.g. in HTML) to describe your schema in a more user-friendly way.

XSL - These are XML docs that define how to transform one XML format into another. You can combine this with XSDs to ensure that the result of the transformation is valid.

SOAP + WS-* - WS-* is a set of standards on top of SOAP (which just defines a web-service interface). E.g. WS-Security can define that your web-service must encrypt the contents of the message, WS-Addressing makes SOAP less dependent on HTTP.


To summarize: if you own the code for both endpoints of the data exchange (and always will), then JSON wins on sheer terseness, and the schema can stay implicit.

If you do not, then you are guaranteed to run into schema differences for common objects (i.e. SAP "Customer" <> Oracle "Customer" <> Salesforce "Customer", etc.). XML w/ XSL/XSD wins here.


XSD is used to validate that a document conforms to some standard. Say you run a legal document firm and receive Form Foo from 100 different clients: XSD will handle validating that they conform to the standard for you.

XSL is an umbrella for a few things; most interesting is XPath, which is a super-fast query language for XML documents.

SOAP is just worthless.

WSDL is nice though: it compiles a web service API to source code, so you get a nice layer of abstraction around what you're writing to.


I've never seen WSDL used without SOAP. Not saying it couldn't happen, just not seen it. Do you ever use WSDL without SOAP? Or SOAP without WSDL?


Haven't you overlooked the fact that WSDL-based proxies and SOAP-over-HTTP are almost always used in concert?


Well, same question here. I've interned on an enterprisey web service before, but I couldn't imagine what sort of mental gymnastics are required to justify XML being used the way it is.


They are good for creating jobs and selling expensive enterprisey software packages to PHBs.


Well, I am an enterprise developer and I can tell you exactly what XML is for: selling you expensive middleware. It costs a fortune and it starts with an X, so it must be eXtremely good, right!? I avoid it like the plague as much as I can... But everyone gets bogged down in SOAP/WSDL nonsense.

Before XML we had CORBA, and before that EDI, and before that ASN.1... Just reinventing the wheel and each time a little worse.


It's unfortunate you're downvoted, as there's truth amid the cynicism. It's true that consultants get paid more for hard-to-use software, and I get the impression even some open-source projects play this card... But, usually there's some progress in successive technologies - from your position, can you see any? I'm interested in understanding the real problems here.

For example (AFAIK), XML web services tend to be less tightly coupled than CORBA's IDLs (though you can tight-couple in any language):

- XML using its own format specification language (DTD; XML Schema) helps keep the external data format independent from the application's internal data structures (until some bright spark realizes they can avoid duplication! by deriving one from the other... thus re-coupling them. Because of this, SOAP is CORBA all over again).

CORBA's IDL produced an interface for every object. This is fine for methods of the top-level object (that represents the main actions of the application/service), but when applied to objects that form the internal data structures (the nested object tree/graph that is received/sent as an argument/return value), it forces the external format to have the same structure as the internal data structure, thus tightly-coupling them. [And, CORBA tried to do "distributed objects", which just didn't work very well - though it sounds like a good idea.]

REST seems to be making a big difference. It explicitly uses "representational state", meaning that the external format is independent of the sender and receiver's internal representation. By defining the format independently of an end-point, it seems you escape tight-coupling. In practice, devs will often derive internal object definitions from the external format - but then they do the loose-coupling themselves, internally, and so tight-coupling is still avoided. [It's seeming to me that deriving internal-->external is bad; but external-->internal is OK.]

In a way, I think this progression can be seen as realizing the concept of information hiding in greater and greater depth as time goes on.

Does that seem right? What do you think? Can you point me to any references that address these issues (esp about tight-coupling between internal/external representations)? Many thanks for any help!


So what do you use then? (Honest question)

I don't disagree with you about bloated middleware products, but you don't have to use them to use SOAP, XSD, XSL, XPath, etc.


JonoW, I'd like to see a definitive case study for the Enterprise SOA system's use of XML.

It may be my lack of experience in the area, but I have never found big-iron solutions to be a good solution to anything I have dealt with, usually they seem to be more trouble than the actual problem.


> Just use whatever is best for the job.

If only it were so easy. A few years ago I had a boss who wanted to use XML everywhere, no matter what. The whole company had only Java+XML as a selling point.


Aside from all that was said here: if you use JSON, you save yourself from the attributes-vs.-elements discussions. Which is a net profit.


I love APIs that offer JSON instead of XML. Especially when I'm programming in PHP, parsing XML is so tedious. On the other hand, using json_decode is a pleasure.


I can't stand PHP's DomDocument OR SimpleXML. I think it's mainly PHP's libraries at fault, though - I've made short work of tasks that were quite annoying in PHP using Beautiful Soup in Python.


PHP's DomDocument is pretty much the same as every other platform's DomDocument. Nearly the whole XML API universe is full of suck.


There are neat implementations out there. I particularly like C#'s LINQ to XML. Lets you shred that (untyped) XML into strongly typed objects faster than you can say "wow, that's declarative".

If you have a typed format (i.e. XSD), there are enough tools out there to generate the code <-> XML interchange boilerplate. In C#, this is typically solved by creating a serializable class with attributes/annotations denoting what XML nodes the actual class-properties map to. Again, fairly painless. And declarative.

If your main beef with XML is that the libraries you have to work with are an exact replica of the W3C DOM model, then it is your tooling which sucks. XML is just fine, when used correctly of course.

Think how jQuery managed to make writing JS fun, as opposed to when you had to work with the plain DOM. HTML is not that far from XML, and if tooling can make that kind of difference, judging a technology exclusively on your lacking libraries (or lack of exposure to good ones) can really not be considered fair by any means.

Now with that said, I find JSON nice as well, but for different reasons and with different drawbacks.


> Think how jQuery managed to make writing JS fun, as opposed to when you had to work with the plain DOM.

Speaking of that, http://code.google.com/p/phpquery/ is an interesting project. I haven't used it, but it seems like it could make some XML parsing less strenuous.


In case you are interested in fast, streaming parsing of JSON, check out the yajl library (https://github.com/lloyd/yajl).

In Python, I use the ijson bindings for yajl (http://softwaremaniacs.org/blog/2010/09/18/ijson/en/). Here is a blog post demonstrating ijson: http://thechangelog.com/post/1169335384/ijson-parse-streams-...


JSON is extremely powerful and very light for well-structured matrix/array-like data objects. This is particularly useful for financial time series objects (asset prices, economics, etc).

For example, in financial time series/ trading applications, see the following two charts, a data table and a visualization chart completely generated on the fly using JSON data inputs through ajax calls.

Makes the application very light and all rendering is done on the client side.

Market Returns Dashboard: http://bit.ly/iVxYoP

Daily Market Commentary: http://bit.ly/my5A3c


You can just as well use XML. The XSD list data-type makes a great building block for structured arrays. For example:

Schema:

    <xs:element name="series">
      <xs:simpleType>
        <xs:list itemType="xs:double"/>
      </xs:simpleType>
    </xs:element>
Document:

    <series>1.0 2.0 1.2 15.6 3.1 4.5 3.2</series>
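On the consuming side, that whitespace-separated xs:list value shreds easily; a sketch with Python's stdlib:

```python
import xml.etree.ElementTree as ET

doc = "<series>1.0 2.0 1.2 15.6 3.1 4.5 3.2</series>"

# xs:list items are whitespace-separated, so a single split()
# recovers the typed array.
values = [float(x) for x in ET.fromstring(doc).text.split()]
print(values)  # -> [1.0, 2.0, 1.2, 15.6, 3.1, 4.5, 3.2]
```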


Large data sets should be published in XML. Small transactional exchanges (already contextual) should be JSONP.


I agree, another way of phrasing this would be:

- XML is for documents

- JSON is for data-structures

Both of them do a great job in their domain, but you shouldn't try to mix them up and store data-structures in XML or documents in JSON.


I guess it depends on your definition of "large," but I would say that large is where XML is at its very worst. When data can't fit in main memory, then the XML overhead provides no benefit and incurs a great cost.

I would argue that data in general does not fit well in XML. Documents do fit well, as it's absolutely fantastic at markup, i.e. where there isn't repeated structure. As soon as you have repeated structure (i.e. data as opposed to a datum), XML becomes a terrible format. CSV, tab-delimited, gzipped and separated by pipe characters, RDF triples, N3: give me almost *anything* other than XML and I'll be much happier.

In fact, XML is the reason RDF is dead in the water. RDF could have been a useful concept, but it's so encumbered by terrible formats and tools that it will almost never be useful; it cannot escape the existing body of XML-RDF. And such a waste, in the name of cargo-cult mimicking of HTML.


If we had a JSON query language as powerful as XPath, would you feel the same way? I can't think of any other reason for agreeing with you (though XPath is reason enough).

Edit: While underscore.js is wonderful, I doubt it performs nearly as well as XPath.


> I can't think of any other reason for agreeing with you (though XPath is reason enough).

Streaming JSON APIs/parsers are not widespread (and definitely not as widespread as SAX-style streaming XML parsers), which makes JSON unusable for datasets which don't fit in memory.
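For what it's worth, even Python's stdlib can sketch the record-at-a-time interface, using `raw_decode` to pull one value at a time out of a concatenated stream. (This is only a sketch of the API shape; it still holds the text in memory, whereas yajl/ijson also consume the *input* incrementally.)

```python
import json

decoder = json.JSONDecoder()

def iter_records(stream_text):
    """Yield successive JSON values from a concatenated stream,
    one record at a time, without one giant enclosing document."""
    idx = 0
    while idx < len(stream_text):
        value, end = decoder.raw_decode(stream_text, idx)
        yield value
        idx = end
        # skip whitespace between records
        while idx < len(stream_text) and stream_text[idx].isspace():
            idx += 1

stream = '{"id": 1} {"id": 2}\n{"id": 3}'
print([r["id"] for r in iter_records(stream)])  # -> [1, 2, 3]
```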

And generally speaking, I think most people would be satisfied with a CSS-type query language. I'm always surprised by the dislike my colleagues have for XPath and their lack of knowledge of it, even though XPath is one of the very few things I like in the XML infosphere (the other being RelaxNG).
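Python's stdlib ElementTree supports a limited XPath subset, which is enough to show what people miss on the JSON side (the document is made up):

```python
import xml.etree.ElementTree as ET

doc = """<orders>
  <order status="paid"><total>10</total></order>
  <order status="open"><total>25</total></order>
  <order status="paid"><total>40</total></order>
</orders>"""

root = ET.fromstring(doc)
# One path expression selects every total under a paid order;
# the JSON equivalent is a hand-written loop.
totals = [int(e.text) for e in root.findall("./order[@status='paid']/total")]
print(totals)  # -> [10, 40]
```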


JSONSelect (http://jsonselect.org) provides a CSS-like query syntax for JSON. I haven't used it much yet, though, so I don't know how it performs.


As for streaming JSON parsers: https://github.com/lloyd/yajl

Around for four years. Stable. Language bindings for all major languages.


Is the real pro-XML argument, then, that it currently has support? As far as I know, there is nothing inherent to JSON to prevent anyone from writing a streaming parser.


> Is the real pro-XML argument, then, that it currently has support?

That's the only one I can find which would be relevant to milweed's assertion that:

> Large data sets should be published in XML.

Apart from the rarely used ability to mix data formats into a single semi-coherent document and some sort of backwards compatibility, I don't see much in the way of reason.

> As far as I know, there is nothing inherent to JSON to prevent anyone from writing a streaming parser.

Of course there is not, and there are several such parsers, hence my writing that they are not widespread. Were they impossible (or non-existent) or at the very least unknown to me I'd have written that they don't exist instead.


The JSON/XML debate is very similar to the static/dynamic typing debate. The costs of static typing or XML don't seem worth it until a project starts getting bigger and you want to try to impose more organization and structure.


But still we refer to the term AJAX when we are in fact using AJAJ (what a name; I actually understand why nobody uses it).


Have to say that I am surprised that only 20% of new APIs are JSON. Seems like a shoo-in to me. Although, I use Python and it's perfect for that.




