One of the really important core aspects of XML is that, unlike JSON, no typing is inferred from the structure of the file. JSON is by nature tied to the JavaScript type system, which is sparse and inaccurate. For example, look at the following:
{ "name": "bob", "salary": 1e999 }
Ah crap! The deserializer blew up (in most cases silently converting the number to null).
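To make the failure mode concrete, here's a minimal sketch of what a typical double-backed parser effectively does with that literal (plain Java standing in for the parser's number handling; the 1e999 value is from the example above):

```java
import java.math.BigDecimal;

public class OverflowDemo {
    public static void main(String[] args) {
        // A double-backed JSON parser effectively does this with 1e999:
        double d = Double.parseDouble("1e999");
        System.out.println(d); // Infinity -- the actual value is silently lost

        // The textual form itself is fine; an arbitrary-precision type keeps it:
        System.out.println(new BigDecimal("1e999")); // 1E+999
    }
}
```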
I think it's refreshing to hear someone advocate XML instead of JSON, specifically because you bring up a good point.
The problem, I think, is that just because XML is human-readable doesn't mean it's adequate as a human-writable format (I'm looking at you, Maven!). I believe this is the root cause of much of the hatred for XML, even though it has a very sweet spot in application-to-application communication.
If you take the brackets and the closing tags out (use meaningful whitespace) it's a hell of an improvement[1]. A format I really like (OK, it's aimed at HTML, not XML) is the Slim templating language[2]. It manages to pack the same information in but is massively more readable.
Yeah, this is exactly where my hatred of Maven configuration comes from, but it's more a testament to a bad fit for configuration files than a critique of XML. Java enterprise application configuration tends to be very "expert-friendly", and this is where XML got its bad name.
> Ah crap! The deserializer blew up (in most cases silently converting the number to null).
Right -- the parser blew it. That many implementations do this is frustrating (and caused me so many problems that I ended up building my own validator for problems like this: http://mattfenwick.github.io/Miscue-js/).
JSON doesn't set limits on number size. From RFC 4627:
> An implementation may set limits on the range of numbers.
It's the implementation's fault if the number is silently converted to null.
I guess we need better implementations!
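For what it's worth, some implementations already let you opt out of float semantics. A sketch, assuming Jackson is on the classpath (its USE_BIG_DECIMAL_FOR_FLOATS feature is real, but check your version's behavior):

```java
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

public class BigDecimalJson {
    public static void main(String[] args) throws Exception {
        // Ask Jackson to back floating-point literals with BigDecimal
        // instead of double, so 1e999 survives parsing intact.
        ObjectMapper mapper = new ObjectMapper()
                .enable(DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS);
        System.out.println(
                mapper.readTree("{ \"salary\": 1e999 }").get("salary"));
        // prints 1E+999 rather than Infinity or null
    }
}
```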
> JSON is a popular format but it's awful.
If you're willing to take the time to share, I'd love to hear more examples of JSON's problems. I'm collecting examples of problems, which I will then check for in my validator!
If you're looking for examples of problems, RFC7159 (http://rfc7159.net/rfc7159) is a good place to start - just search for 'interop', as suggested by [1]. A quick look at Miscue-js suggests you already check for most of them, but you might still find something new.
Your example doesn't do anything but make XML look as bad as you're saying JSON is. Think about it again: do you think your first XML example doesn't ALSO have to be deserialized twice (once into an in-memory XML tree, once into a number)? It does. Also, both examples will fail if you try to deserialize either of them into native floats...
Regardless, JSON is so much more readable that I'm very glad it's pushed XML out of the picture for the most part.
XML can be read as a stream: at certain points, such as after reading an element or attribute, an object can be created on the fly or a property set on an object, with the type deserialised at the same time. The types don't have to be native types either; they can be complex or aggregate types, such as any numeric abstraction or date type you desire.
See javax.xml.stream (Java) and System.Xml (CLR) for example; a minimal sketch with the former follows.
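A sketch of that streaming style with StAX (the element names reuse the salary example from upthread; picking BigDecimal is my choice here, to show that the application controls the type):

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;
import java.math.BigDecimal;

public class SalaryStream {
    public static void main(String[] args) throws Exception {
        String xml = "<employee><name>bob</name><salary>1e999</salary></employee>";
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && "salary".equals(r.getLocalName())) {
                // getElementText() hands over the raw text of <salary>;
                // the application decides the type, so nothing overflows.
                BigDecimal salary = new BigDecimal(r.getElementText());
                System.out.println(salary); // 1E+999
            }
        }
    }
}
```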
As for readability, some XML is bad, which is probably what you've seen, but there's plenty that's well designed.
XML is afflicted with piles of criticism, which usually come from poor understanding or from looking at machine-targeted schemas that humans don't care about.
You'd complain the same if you looked at protobufs over the wire with a hex editor.
What is that massive semantic difference? If you want the number represented by 1e999 as the value for salary, at some point, something has to take "1e999", whether you call it a string or a something-with-no-type, and turn it into a number. Your deserializer has to know to do that in either case.
How does the [deserializer] step in the XML example know to call into [bignum], and why can't the [json reader] in the JSON example have that knowledge in the same fashion?
Because the XML document has a semantic meaning that is specifically designed for this application. It may even have a schema definition document which formally defines what types to expect. JSON, by contrast, has type definitions imposed on it by its origins as JavaScript literal syntax.
I've sort of lost track of what this debate is about... Assuming you don't have a schema definition, it seems to me that you can just as easily parse `{ "salary": "1e999" }` with application-encoded semantics as `<salary>1e999</salary>` with (again) application-encoded semantics. Maybe having a formal schema definition is a win, though.
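A sketch of that "application-encoded semantics" route on the JSON side (assuming Jackson as the reader; any parser that hands you the raw string would do):

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.math.BigDecimal;

public class AppSemantics {
    public static void main(String[] args) throws Exception {
        // The value travels as a string, so the generic parser never
        // touches it as a float; the application picks the type.
        JsonNode root = new ObjectMapper()
                .readTree("{ \"salary\": \"1e999\" }");
        BigDecimal salary = new BigDecimal(root.get("salary").asText());
        System.out.println(salary); // 1E+999
    }
}
```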
Iff you have a schema, and a parser that actually uses it. I've seen a few DTDs, but the vast majority of XML documents don't have a schema or even a DTD to follow.
And the vast majority of XML parsers won't convert anything into typed values for you, regardless of schema definitions.
Which effectively puts you in the same place as the JSON string.
Either the author of the serialized data realized that the numbers could overflow a float or didn't. This is independent of the serialization format.
In your contrived example, somehow, the user of JSON didn't realize the salary could overflow a float. (OTOH, he succeeded in serializing it, mysteriously.) All the while, the XML user was magically forward thinking and deserialized the value into a big decimal. Your argument simply hinges on making one programmer smarter than the other. If one knows that a value will not fit a float, the memory representation won't be a float and the serialization format won't use a float representation. It has nothing to do with JSON vs XML.
This. Types are a huge pain in JSON, particularly the lack of a good date-time type. BSON fixes this, but only if you're using MongoDB and are willing to give up the "human readable" requirement outside of Mongo.
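The usual workaround is to carry timestamps as ISO-8601 strings and re-derive the type application-side; a sketch with java.time (the "hiredAt" field name is made up for illustration):

```java
import java.time.Instant;

public class DateField {
    public static void main(String[] args) {
        // JSON has no date type, so by convention the wire format is an
        // ISO-8601 string, e.g. { "hiredAt": "2014-03-01T12:00:00Z" },
        // and each application parses it back into a real date type.
        Instant hiredAt = Instant.parse("2014-03-01T12:00:00Z");
        System.out.println(hiredAt); // 2014-03-01T12:00:00Z
    }
}
```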
OK, so the provided number format is not sufficient for the kind of numbers he is trying to deal with. So instead you would represent it as a string and handle the encoding/decoding of that number yourself. How is that different from the XML way where there is no provided number format to begin with, and everything is a string?
And the following is not acceptable, as it breaks the semantics of JSON and requires a secondary deserialisation step, since strings ain't numbers:
{ "name": "bob", "salary": "1e999" }
JSON is a popular format but it's awful.