
It is very likely that I am an idiot, but I've always found parsing XML too hard, especially compared to JSON, which is almost too easy.

Whether parsing XML is easy or hard, how often do you actually write an XML parser? If I'm digesting a JSON/XML document, I resort to a parser library for the language that I'm using at that point, so the complexity of writing such a parser is pretty much non-existent. Definitely not a compelling reason to switch to JSON.

Most XML parsers I've used are leaky abstractions. Even once the document is parsed, actually accessing the data can require a lot more complexity than accessing parsed JSON data.

IIRC, the popular C++ implementations were glorified tokenizers. It was up to you to figure out which tokens were data and how those tokens related to each other.

Ah, SAX. People built some true horrors with that API, just because it was "more performant" than DOM. Never mind that their hacked-together tree builders often leaked like sieves.
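For anyone who hasn't seen the style being described, here is a minimal sketch using Python's standard `xml.sax` module. The `CurrencyHandler` class and the sample document are made up for illustration. The parser only hands you events, so tracking which element you are inside and stitching the character data back together is your job.

```python
import xml.sax


class CurrencyHandler(xml.sax.ContentHandler):
    """Hypothetical handler: collects (unit, text) pairs from <currency> elements."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._inside = False   # True while between <currency> and </currency>
        self._unit = None
        self._chunks = []      # character data may arrive in several pieces

    def startElement(self, name, attrs):
        if name == "currency":
            self._inside = True
            self._unit = attrs.get("unit")
            self._chunks = []

    def characters(self, content):
        if self._inside:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "currency":
            self.results.append((self._unit, "".join(self._chunks)))
            self._inside = False


handler = CurrencyHandler()
xml.sax.parseString(
    b'<prices><currency unit="euro">10</currency>'
    b'<currency unit="usd">12</currency></prices>',
    handler,
)
print(handler.results)
```

Every extra level of nesting means more state flags like `_inside`, which is roughly how the hand-rolled tree builders got out of hand.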

If there was an `XML.parse` just like there's `JSON.parse`, I doubt you'd say the same. As it stands, the added complexity in JS-land is to import a library that provides this functionality for you. Fortunately there are many, but I agree a built-in would be nice. It's a bit of a shame that E4X never landed in JS.

It's more than JUST library support. It's also that JSON deserializes into common native data types naturally (dictionary, list, string, number, null).

You can deserialize XML into the same data types, but it's not anywhere near as clean because of how extensible XML is. That's a big part of what's made JSON successful.
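A rough illustration of the difference, using only the Python standard library. The sample record is invented for the example, and the XML layout (attributes for `name` and `price`, child elements for tags) is just one of several equally valid choices, which is sort of the point.

```python
import json
import xml.etree.ElementTree as ET

# JSON: one call, and the result is already dict / list / str / float / None.
item = json.loads('{"name": "widget", "tags": ["a", "b"], "price": 9.5, "stock": null}')

# XML: the parser returns elements; mapping them to native types is up to you.
root = ET.fromstring('<item name="widget" price="9.5"><tag>a</tag><tag>b</tag></item>')
xml_item = {
    "name": root.get("name"),                       # attributes are always strings
    "tags": [t.text for t in root.findall("tag")],  # repeated children become a list by convention
    "price": float(root.get("price")),              # numeric conversion is manual
}

print(item)
print(xml_item)
```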

Right, but you inevitably end up with boilerplate "massage" code around your data anyway. Case in point: dates, any number that isn't just a number e.g. currencies or big numbers, URLs, file paths, hex, hashes. Basically any type that carries any kind of semantics beyond array, object, string, number, or null will require this boilerplate, only that your data format has no way of describing them except for out-of-band specs, if you want to call them such.

At least XML has schemas, and even if all you're doing is deserializing everything into JsonML like objects you're still better off because you'll have in-band metadata to point you in the right direction.
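To make the "massage" step concrete, here is a small sketch in Python. The field names and the `decode_invoice` helper are hypothetical. Nothing in the JSON itself says that `invoice_date` is a date or that `total` is a decimal amount, so that knowledge has to live in code like this or in an out-of-band spec.

```python
import json
from datetime import date
from decimal import Decimal

raw = '{"invoice_date": "2017-06-05", "total": "1234.50", "currency": "EUR"}'


def decode_invoice(text):
    """Hypothetical helper: turn the plain JSON values into richer types."""
    data = json.loads(text)
    return {
        "invoice_date": date.fromisoformat(data["invoice_date"]),  # string -> date
        "total": Decimal(data["total"]),                           # string -> exact decimal
        "currency": data["currency"],                              # stays a plain string
    }


invoice = decode_invoice(raw)
print(invoice)
```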

CBOR [1] allows the semantic tagging of data values and makes a distinction between binary blobs (a collection of 8-bit values) and text (which is defined as UTF-8).

[1] RFC 7049. Also check out http://cbor.io/

IMHO the boilerplate code is much easier to read than understanding the nuances of XML if I have to read a document.

{"type":"currency", "unit":"euro", "amount": 10}

feels easier to understand than

<currency unit="euro">10</currency>

Maybe it's just conditioning, but I find the latter example easier to read and understand. In fact, I'd say that - in general - I find XML better in terms of human readability than JSON. I guess it just goes to show that we all see certain things differently. shrug

I think that's totally reasonable - because it was after all one of the goals of XML. That is, to be human readable.

There is a difference however between readable + parsable vs parsable + easily dealt with.

XML was not the latter. You have to do more work to traverse and handle XML inside your application than you do JSON, and most of the (reasonable) reasons for this are due to features that most cases don't need.

JSON makes the common case easy, XML doesn't.

How about:

<rec type="currency" unit="euro" amount="10" />

I don't think your problem is with the syntax, necessarily. It seems more like you prefer name/value pairs over semantic markup.

The biggest problem with XML is how easy it is to make a very bad schema, and how hard those can be to parse.

Also, for what it's worth, your point is exactly why I mentioned E4X. Sure, it wasn't a panacea, but it had some things going for it.

E4X was exceedingly great. Loved it.

This is really only true in dynamically typed languages. From personal experience: parsing json in Java or Go without just treating everything as a bag of Object or an interface{} requires a ton of menial boilerplate work.

Super nice in python/ruby/javascript, though.

Swift 4 will have JSON encoding/decoding built in, and I wouldn't be surprised to see such a feature spring up in other modern languages too. Once that boilerplate is eliminated, json is a pretty decent solution.

I am very stoked for this.

> parsing json in Java or Go without just treating everything as a bag of Object or an interface{} requires a ton of menial boilerplate work.

From my experience in Java it is pretty simple using a library like Jackson. You define the types you expect to read, maybe add some annotations, and then it's one function call to deserialize. IIRC Go has something similar in its json library.

Yes, it's arguably nicer in Go, because you specify exactly what types and field names you expect, and then it's just a simple json.Unmarshal call.
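The "declare the type, then decode in one call" pattern is not specific to Jackson or Go. Here is the same idea sketched in Python with a dataclass. This is not how Jackson or `json.Unmarshal` work internally, just an analogy; `Price` and `decode_price` are names made up for the example.

```python
import json
from dataclasses import dataclass


@dataclass
class Price:
    unit: str
    amount: int


def decode_price(text: str) -> Price:
    """Hypothetical helper: decode JSON into a Price, rejecting unexpected shapes."""
    data = json.loads(text)
    price = Price(**data)  # raises TypeError on missing or unexpected keys
    # Dataclasses do not check field types at runtime, so check them by hand.
    if not isinstance(price.unit, str) or not isinstance(price.amount, int):
        raise TypeError("unexpected field types")
    return price


price = decode_price('{"unit": "euro", "amount": 10}')
print(price)
```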

Sure--it's kind of a pain in Swift, too.

Wouldn't it just be worse with XML, though? I get that people don't realistically parse it themselves and libraries are smart enough to use schemas to deserialize, but there's nothing inherent about JSON that makes it unable to conform to a schema or be parsable by libraries into native strongly-typed objects the same way.

Except JSON doesn't have semantics to describe schemas, only arrays, objects, strings, numbers and null. You can say "but this key is special" but then it's not JSON anymore. And if you're ok with that, may as well just use JSON-LD or some other JSON-like format.

Idk, I think JSON parsing is pretty ergonomic in Rust, definitely nicer than your typical XML DOM.

Of course, JSON doesn't support complex numbers, bignums, rationals, cryptographic keys &c. And it'd be even worse than XML to try to represent programs in.

JSON is definitely easier to decompose into a simple-to-manipulate bag-of-values than is XML.

XML is fundamentally much more complex than JSON, so any XML parsing library will inevitably present a more complicated API. I kinda like XML (!), but there is no point pretending that using it is as simple as JSON.

I think that depends what you mean by "using it".

XML can convey a lot more semantic meaning than JSON ever will, and standardisation of things like XPath, DOM, XSLT, etc provides a lot of power when working with XML documents.

With JSON, essentially everything is unknown. You can't just get all child nodes of an object, or get all objects of a certain type, using standard methods. You need to know what object key 'child' nodes are referenced by, or loop through them all and hope that what you find is actually an array of child nodes, and not e.g. an array of property values. Finding all objects of a given type means knowing how the type is defined, AND the aforementioned "how do I get child nodes" to allow you to traverse the document.

Of course that assumes what you have is a document, and not just a string encoded as JSON. Or a bool/null.

My point is, the tooling around XML is very mature. "Use" of a data format is a very broad topic, and covers a lot more than just "I want to get this single property value".
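A quick sketch of that difference in Python. The documents are invented, and the `"type"` / `"children"` keys in the JSON version are exactly the kind of private convention being described: the `find_by_type` walker only works because it knows them. ElementTree supports just a subset of XPath, but it is enough for this query.

```python
import json
import xml.etree.ElementTree as ET

xml_doc = ET.fromstring(
    '<order><item><price unit="euro">10</price></item>'
    '<shipping><price unit="euro">4</price></shipping></order>'
)
# Standard query: every <price> element at any depth.
xml_prices = [p.text for p in xml_doc.findall(".//price")]

json_doc = json.loads(
    '{"type": "order", "children": ['
    '{"type": "item", "children": [{"type": "price", "unit": "euro", "amount": 10}]},'
    '{"type": "shipping", "children": [{"type": "price", "unit": "euro", "amount": 4}]}]}'
)


def find_by_type(node, wanted):
    """Yield every dict whose "type" key equals `wanted` (assumed convention)."""
    if isinstance(node, dict):
        if node.get("type") == wanted:
            yield node
        for value in node.values():
            yield from find_by_type(value, wanted)
    elif isinstance(node, list):
        for value in node:
            yield from find_by_type(value, wanted)


json_prices = [n["amount"] for n in find_by_type(json_doc, "price")]

print(xml_prices)
print(json_prices)
```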

Absolutely. Right tool for the right job. Mixed content (perhaps a paragraph with bold and italics) is absolutely horrible in JSON because it lacks the complexity that XML has to cope with this.
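A small sketch of what mixed content looks like from code, using Python's `xml.etree.ElementTree`. The paragraph is a made-up sample, and the list-of-pairs JSON encoding at the end is just one ad hoc convention, since JSON has no native notion of text interleaved with markup.

```python
import json
import xml.etree.ElementTree as ET

p = ET.fromstring("<p>Plain, <b>bold</b> and <i>italic</i> text.</p>")

# ElementTree keeps the text before the first child in .text and the text
# following each child in that child's .tail.
parts = [("text", p.text)]
for child in p:
    parts.append((child.tag, child.text))
    parts.append(("text", child.tail))

# The markup-free text is still one call away.
plain = "".join(p.itertext())

# A JSON version needs an invented structure, e.g. a list of [kind, value] pairs.
encoded = json.dumps(parts)

print(parts)
print(plain)
print(encoded)
```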
