One of the really important core aspects of XML is that, unlike JSON, no typing is inferred from the structure of the file. JSON is by nature tied to the JavaScript type system, which is sparse and inaccurate. For example, look at the following:
{ "name": "bob", "salary": 1e999 }
Ah crap! The deserializer blew up (in most cases silently converting the number to null).
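To make the failure mode concrete, here's a minimal sketch of what a typical double-backed parser effectively does with that literal (plain Java standing in for the parser's number handling; the 1e999 value is from the example above):

```java
import java.math.BigDecimal;

public class OverflowDemo {
    public static void main(String[] args) {
        // A double-backed JSON parser effectively does this with 1e999:
        double d = Double.parseDouble("1e999");
        System.out.println(d); // Infinity -- the actual value is silently lost

        // The textual form itself is fine; an arbitrary-precision type keeps it:
        System.out.println(new BigDecimal("1e999")); // 1E+999
    }
}
```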
I think it's refreshing to hear someone advocate XML instead of JSON, specifically because you bring up a good point.
The problem, I think, is that just because XML is human-readable doesn't mean it's adequate as a human-writable format (I'm looking at you, Maven!). I believe this is the root cause of much of the hatred for XML, even though it has a very sweet spot in application-to-application communication.
If you take the brackets and the closing tags out (use meaningful whitespace) it's a hell of an improvement[1]. A format I really like (OK, it's aimed at HTML, not XML) is the Slim templating language[2]. It manages to pack the same information in but is massively more readable.
Yeah, this is exactly where my hatred of Maven configuration comes from, but it's more a testament to a bad fit for configuration files than a critique of XML. Java enterprise application configuration tends to be very "expert-friendly", and this is where XML got its bad name.
> Ah crap! The deserializer blew up (in most cases silently converting the number to null).
Right -- the parser blew it. That many implementations do this is frustrating (and caused me so many problems that I ended up building my own validator for problems like this: http://mattfenwick.github.io/Miscue-js/).
JSON doesn't set limits on number size. From RFC 4627:
> An implementation may set limits on the range of numbers.
It's the implementation's fault if the number is silently converted to null.
I guess we need better implementations!
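For what it's worth, some implementations already let you opt out of float semantics. A sketch, assuming Jackson is on the classpath (its USE_BIG_DECIMAL_FOR_FLOATS feature is real, but check your version's behavior):

```java
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

public class BigDecimalJson {
    public static void main(String[] args) throws Exception {
        // Ask Jackson to back floating-point literals with BigDecimal
        // instead of double, so 1e999 survives parsing intact.
        ObjectMapper mapper = new ObjectMapper()
                .enable(DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS);
        System.out.println(
                mapper.readTree("{ \"salary\": 1e999 }").get("salary"));
        // prints 1E+999 rather than Infinity or null
    }
}
```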
> JSON is a popular format but it's awful.
If you're willing to take the time to share, I'd love to hear more examples of JSON's problems. I'm collecting examples of problems, which I will then check for in my validator!
If you're looking for examples of problems, RFC7159 (http://rfc7159.net/rfc7159) is a good place to start - just search for 'interop', as suggested by [1]. A quick look at Miscue-js suggests you already check for most of them, but you might still find something new.
Your example doesn't do anything but make XML look as bad as you're saying JSON is. Think about it again: do you think your first XML example doesn't ALSO have to be deserialized twice (once into an in-memory XML tree, once into a number)? It does. Also, both examples will fail if you try to deserialize either of them into native floats...
Regardless, JSON is so much more readable that I'm very glad it's pushed XML out of the picture for the most part.
XML can be read as a stream: at certain points, such as after reading an element or attribute, an object can be created on the fly or a property set on an object, with the type deserialised at the same time. The types don't have to be native types either; they can be complex or aggregate types, such as any numeric abstraction or date type you desire.
See javax.xml.stream (Java) and System.Xml (CLR) for example; a minimal sketch with the former follows.
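A sketch of that streaming style with StAX (the element names reuse the salary example from upthread; picking BigDecimal is my choice here, to show that the application controls the type):

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;
import java.math.BigDecimal;

public class SalaryStream {
    public static void main(String[] args) throws Exception {
        String xml = "<employee><name>bob</name><salary>1e999</salary></employee>";
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && "salary".equals(r.getLocalName())) {
                // getElementText() hands over the raw text of <salary>;
                // the application decides the type, so nothing overflows.
                BigDecimal salary = new BigDecimal(r.getElementText());
                System.out.println(salary); // 1E+999
            }
        }
    }
}
```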
As for readability, some XML is bad, which is probably what you've seen, but there's plenty that's well designed.
XML is afflicted with piles of criticism, which usually come from poor understanding or from looking at machine-targeted schemas that humans don't care about.
You'd complain the same if you looked at protobufs over the wire with a hex editor.
What is that massive semantic difference? If you want the number represented by 1e999 as the value for salary, at some point, something has to take "1e999", whether you call it a string or a something-with-no-type, and turn it into a number. Your deserializer has to know to do that in either case.
How does the [deserializer] step in the XML example know to call into [bignum], and why can't the [json reader] in the JSON example have that knowledge in the same fashion?
Because the XML document has a semantic meaning that is specifically designed for this application. It may even have a schema definition document which formally defines what types to expect. JSON, by contrast, has type definitions imposed on it by its origins as JavaScript literal syntax.
I've sort of lost track of what this debate is about... Assuming you don't have a schema definition, it seems to me that you can just as easily parse `{ "salary": "1e999" }` with application-encoded semantics as `<salary>1e999</salary>` with (again) application-encoded semantics. Maybe having a formal schema definition is a win, though.
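A sketch of that "application-encoded semantics" route on the JSON side (assuming Jackson as the reader; any parser that hands you the raw string would do):

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.math.BigDecimal;

public class AppSemantics {
    public static void main(String[] args) throws Exception {
        // The value travels as a string, so the generic parser never
        // touches it as a float; the application picks the type.
        JsonNode root = new ObjectMapper()
                .readTree("{ \"salary\": \"1e999\" }");
        BigDecimal salary = new BigDecimal(root.get("salary").asText());
        System.out.println(salary); // 1E+999
    }
}
```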
Iff you have a schema, and a parser that actually uses it. I've seen a few DTDs, but the vast majority of XML documents don't have a schema or even a DTD to follow.
And the vast majority of XML parsers won't convert anything into typed values for you, regardless of schema definitions.
Which effectively puts you in the same place as the JSON string.
Either the author of the serialized data realized that the numbers could overflow a float or didn't. This is independent of the serialization format.
In your contrived example, somehow, the user of JSON didn't realize the salary could overflow a float. (OTOH, he succeeded in serializing it, mysteriously.) All the while, the XML user was magically forward thinking and deserialized the value into a big decimal. Your argument simply hinges on making one programmer smarter than the other. If one knows that a value will not fit a float, the memory representation won't be a float and the serialization format won't use a float representation. It has nothing to do with JSON vs XML.
This. Types are a huge pain in JSON, particularly the lack of a good date-time type. BSON fixes this, but only if you're using MongoDB and are willing to give up the "human readable" requirement outside of Mongo.
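The usual workaround is to carry timestamps as ISO-8601 strings and re-derive the type application-side; a sketch with java.time (the "hiredAt" field name is made up for illustration):

```java
import java.time.Instant;

public class DateField {
    public static void main(String[] args) {
        // JSON has no date type, so by convention the wire format is an
        // ISO-8601 string, e.g. { "hiredAt": "2014-03-01T12:00:00Z" },
        // and each application parses it back into a real date type.
        Instant hiredAt = Instant.parse("2014-03-01T12:00:00Z");
        System.out.println(hiredAt); // 2014-03-01T12:00:00Z
    }
}
```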
OK, so the provided number format is not sufficient for the kind of numbers he is trying to deal with. So instead you would represent it as a string and handle the encoding/decoding of that number yourself. How is that different from the XML way where there is no provided number format to begin with, and everything is a string?
And the following is not acceptable, as it breaks the semantics of JSON and requires a secondary deserialisation step, since strings ain't numbers:
{ "name": "bob", "salary": "1e999" }
JSON is a popular format but it's awful.