
KISS is not a magic formula, though. The precursor to SOAP was XML-RPC, which is very simple. So simple it didn't allow time zones in datetimes, and didn't support characters beyond ASCII in strings. There was no way to extend XML-RPC to support Unicode or unambiguous datetimes. This basically killed XML-RPC for most of the world. JSON already has problems because there is no datetime format. People need datetimes, so several incompatible hacks have been designed to represent datetimes as strings. There is no way to extend JSON to support proper datetimes.

The complexities in SOAP are there because someone needs them. Someone has already invented a schema language for JSON: http://json-schema.org/. If every feature from the SOAP stack is reinvented for JSON, JSON will end up as complex as SOAP. Then someone else will surely invent yet another standard (web-sexprs?) and claim it saves us from the needless complexity of the JSON stack.

The bottom line: Some systems have simple needs, some systems have complex needs. Making a format layered and extensible (like the SOAP stack) introduces additional complexity. But a non-extensible format requires complex hacks if your needs grow beyond what the format was designed for. There is no perfect one-size-fits-all data format.

Fundamentally, XML has the wrong information model for most applications -- it is a document markup language often used as a data serialization format. SOAP inherits quite a bit of this complexity, irrelevant to the task of data serialization. Even if all of the necessary features from SOAP are layered on top of JSON, the result would be less complex than SOAP.

Unless of course you actually want to transmit structured document content over JSON - which is actually pretty common in AJAX applications.

In that case, you can use the appropriate XML format for the structured document content, then store that in a JSON string. Most of the time the "structured document content" you're sending over AJAX is HTML or SVG, and the browser can still handle parsing and validating it.
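For what it's worth, here is a minimal Python sketch of that layering: the JSON layer carries the markup as an opaque string, and a separate XML parser handles it afterwards. The field names `id` and `body` and the sample fragment are made up for illustration, and a browser would use its own HTML/SVG parser rather than `xml.etree`.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payload: "id" and "body" are illustrative field names.
fragment = '<p class="note">Hello, <em>world</em>!</p>'
wire = json.dumps({"id": 42, "body": fragment})

# Layer 1: the JSON parser only sees an opaque string in "body".
message = json.loads(wire)

# Layer 2: a separate component parses the markup, knowing nothing about JSON.
root = ET.fromstring(message["body"])
emphasized = root.find("em").text

print(message["id"], root.tag, emphasized)
```

Neither parser needs to know anything about the other, which is the sense in which the two formats stay separate.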

Yeah, and it works fine. But it is hard to argue that JSON+XML is a simpler data interchange format than just XML.

It might be better though, because the JSON and the XML are handled by different layers in the application anyway.

Sure it’s simpler: they’re used in two separate layers – each with its own purpose – which are consumed by separate components (and actually more, because when you send this down you’re of course wrapping it in HTTP and TCP &c.). For a better understanding of why this kind of design is simpler, and therefore better, I recommend Rich Hickey’s talk: http://www.infoq.com/presentations/Simple-Made-Easy

Edit in re “more complex compound data exchange format”: No, the point is that this should be thought of as two simple protocols wrapped one inside the other, not one “complex” format. Watch Rich Hickey’s talk. It would be a complex format if the two layers reached across into each other, if the consumption of one depended on the details of the other, etc. But if they’re kept properly separate, that’s not complex – by Hickey’s definition anyhow, and I think it’s an excellent definition.

Agreed, the overall architecture may become simpler by choosing a more complex compound data exchange format.

The important thing to remember is that sometimes the solution to a problem adds so much overhead that it actually creates a bigger problem than the problem it solved. That has been the fate of SOAP, at least for the vast majority of uses.

That reminds me of that saying I heard about XML:

"Some people have a problem and think they can solve it with XML. Then they have two problems."

Sidenote: that's actually a reference to a Usenet post from 1997 by Jamie Zawinski, a Netscape engineer and the driving force behind the open-sourcing of the Mozilla codebase. In the original, Zawinski was talking about regex:

> Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

Zawinski's post, in turn, was a reference to a 1988 sigfile from John Myers, quoting D. Tilbrook:

> Whenever faced with a problem, some people say "Let's use AWK." Now, they have two problems.

You can find a detailed history here:


Why not just use Unix time (in milliseconds) for your timestamps? You'll have to convert it to whatever your desired container format is on the other end of the wire, but...you're going to have to do that for all of your other data as well.

Sure. Or you can use one of the ISO 8601 date formats encoded as a string. Or the special string "\/Date(...)\/" which some JSON libraries have chosen. The problem is not encoding a date; the problem is that there is no agreed way to do it, so you can't be sure that the other end actually receives a date.
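To make that concrete, here is a rough Python sketch that encodes one instant in the three ways mentioned above. The key name `when` and the sample instant are made up; the point is only that a stock JSON parser hands back a plain string or number in every case, never a date.

```python
import json
from datetime import datetime, timezone

# One instant, chosen arbitrarily for illustration.
moment = datetime(2012, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
millis = int(moment.timestamp() * 1000)

# Three incompatible conventions for the same value ("when" is a made-up key).
as_iso = json.dumps({"when": moment.isoformat()})      # ISO 8601 string
as_millis = json.dumps({"when": millis})               # Unix time in milliseconds
as_ms_date = '{"when": "\\/Date(%d)\\/"}' % millis     # the "\/Date(...)\/" string

# The receiver gets str, int, str. Nothing in the JSON itself says "this is a date".
decoded = [json.loads(s)["when"] for s in (as_iso, as_millis, as_ms_date)]
types = [type(v).__name__ for v in decoded]
print(decoded, types)
```

Note that `\/` is just a legal JSON escape for `/`, so after parsing the third form is an ordinary string; the receiver has to know the convention out of band.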

It would be ambiguous, since you need to know what timezone the timestamp is from. This is why most applications use ISO 8601.

Unix time is always UTC, isn't it?

Yeah, it is. I was thinking that since it is just a raw number of seconds with no TZ designation, there's too much room for incorrect implementations using localtime.
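A small Python sketch of the kind of mistake I mean, assuming a hypothetical sender whose clock is at UTC+2: if the local wall-clock reading is treated as if it were already UTC, the resulting Unix timestamp is silently off by the zone offset.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sender zone, fixed at UTC+2 so the example is deterministic.
sender_zone = timezone(timedelta(hours=2))

# Naive wall-clock reading on the sender: 05:04:05 local time.
wall_clock = datetime(2012, 1, 2, 5, 4, 5)

# Correct: attach the real zone, then convert to Unix time (always UTC-based).
right = int(wall_clock.replace(tzinfo=sender_zone).timestamp() * 1000)

# Buggy: pretend the local reading is already UTC.
wrong = int(wall_clock.replace(tzinfo=timezone.utc).timestamp() * 1000)

skew_hours = (wrong - right) / 3_600_000
print(right, wrong, skew_hours)
```

Both values are perfectly valid numbers on the wire, so the receiver has no way to notice which one it got.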

XML-RPC was a classic Winer format and like RSS it shared the same limited interest in cleaning up oversights. Not handling Unicode was simply lazy - the support was there in XML so it took effort to disable a native feature. Similarly, date encoding is a solved problem for almost all applications: use ISO 8601.

This is not to say that there aren't areas where complexity will creep in or that some people haven't learned from the failure of SOAP (or CORBA before it) but simply that for the vast majority of cases the problems just aren't that hard unless we choose to make them hard.

FWIW, I'm using json-schema in a project and find it both fantastic and lightweight, and it feels not at all like I'm tending towards SOAP. YMMV

To bring out your implicit point, use the right tool for the job. For a simple job use JSON, for a complex job use XML (and XML schema, XSLT etc). Moreover, the existence of XML keeps JSON simple.

Instead of complex needs motivating people to complicate JSON with hacks, they just use the XML stack. Complicating proposals for JSON (like json schema) don't get traction, because their would-be users already have their complicated needs met.

And... enough grey-beards are around to make this point. I will add that before XML, there was CORBA. XML was hailed as simpler, til people added all the bits that were missing. In the bigger picture, I think everyone is waiting for a replacement for the XML stack that really is genuinely simpler - not just reinventing the same hacks on a different base.

I am skeptical it is possible to invent a stack which solves the same set of problems as SOAP but is significantly simpler. (I would love to be proven wrong though!)

The other limitation of JSON is there doesn't appear to be a standard way of representing Unicode code points above 16 bits (such as some Emoji) in strings. One way I've seen it done is to put the two escaped halves of a UTF-16 surrogate pair together; another way I've seen it done is having the UTF-8 literal inside the string.

Please forgive my ignorance, but why not? Couldn't the entire JSON response be, for instance, UTF-8 encoded, thus allowing the representation of all possible Unicode code points?

Yes. In fact, JSON documents must be Unicode text, and the default encoding is UTF-8.

I'm still wet behind the ears when it comes to Unicode, but why can't the character just be transmitted "natively"? (Like you said, UTF-8 inside the string assuming the encoding is set to utf8) {"I'm unicode" : "←"}

In case anybody is nervous about this, the JSON RFC declares UTF-8 to be the default encoding, and points out a simple rule for reliably detecting the Unicode encoding (and endianness) of any JSON text encoded with UTF-8, UTF-16, or UTF-32. You can stick any Unicode characters in a JSON document, unescaped, with confidence that any halfway decent parser will decode it properly.
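Here is a rough Python sketch of that detection rule as I read it in RFC 4627, section 3: the first two characters of a JSON text are always ASCII, so the pattern of zero bytes in the first four octets gives away the encoding. The function name `detect_json_encoding` is my own, and falling back to UTF-8 for inputs shorter than four bytes is a simplification.

```python
import json

def detect_json_encoding(data: bytes) -> str:
    """Guess the Unicode encoding of a JSON text from its first four octets."""
    if len(data) < 4:
        return "utf-8"  # simplification: too short to apply the rule
    nulls = tuple(b == 0 for b in data[:4])
    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    return patterns.get(nulls, "utf-8")           # xx xx xx xx

text = '{"I\'m unicode": "\u2190"}'
for name in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
    raw = text.encode(name)
    guessed = detect_json_encoding(raw)
    print(name, guessed, json.loads(raw.decode(guessed)))
```

This ignores byte order marks entirely, so treat it as an illustration of the rule rather than a robust sniffer.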

Apparently the first solution is the correct one according to the RFC: http://www.ietf.org/rfc/rfc4627.txt

To escape an extended character that is not in the Basic Multilingual Plane, the character is represented as a twelve-character sequence, encoding the UTF-16 surrogate pair.

This is regardless of whether the JSON itself is UTF-8 or UTF-16 encoded.
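A quick Python sketch of both options, using U+1F600 as an arbitrary example of a character outside the Basic Multilingual Plane. The helper `surrogate_escape` is my own illustration of the arithmetic; Python's `json` module already does this when `ensure_ascii` is left at its default.

```python
import json

def surrogate_escape(code_point: int) -> str:
    """Build the twelve-character \\uXXXX\\uXXXX escape for a non-BMP code point."""
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate
    return f"\\u{high:04x}\\u{low:04x}"

char = "\U0001F600"

escaped = json.dumps(char)                      # escaped surrogate pair
literal = json.dumps(char, ensure_ascii=False)  # character left unescaped

print(escaped, surrogate_escape(0x1F600))
print(literal.encode("utf-8"))

# Both spellings decode to the same single character.
same = json.loads(escaped) == json.loads(literal) == char
```

So the two forms seen in the wild are both valid JSON for the same string; the escaped form is only required if you choose to escape the character at all.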

datetime is a data format problem, not a data structure format problem.

Agreed; defining that stuff is up to a higher layer - we just need to make a standard for that too.

JSON-LD, for example, has typed datetimes, links, etc: http://json-ld.org/
