Stop Comparing JSON and XML (yegor256.com)
66 points by padraic7a on Jan 5, 2016 | 57 comments

This article is pretty garbage. compares almost nothing about xml and instead compares the tooling around xml giving no thought to the tooling around json.

instead, let me give you my unsolicited opinion:

if you need to represent both the structure of your data and characteristics within that structure, xml is great because attributes are a really good way to do that. there's a reason most UIs are represented as XML.

if your data is just - well, data - use json. or better yet use edn

Except what if "characteristics within that structure" also have structure? Attributes are an ugly, crippled special case of a tree node; they make some sense in a markup language, but for serializing data, I see zero point in using them.

As for why UIs are represented in XML? It seems like a simple case of people getting used to it before JSON was popular. XML is a default choice for Java and for C#. It also looks vaguely similar to HTML.

Agreed, they're not convenient for arbitrary attributes. I think where they shine is in using a well-defined preset of attributes to define structure.

Regarding the legacy thing, I think it's both. If you look at a lot of new tooling coming out that has to store UI state, people are still using XML

It's funny to say json and edn represent data, because they are both strictly UTF-8 encoded - raw bytes have to be encoded somehow (with neither standard specifying a preferred method) to be represented. Why is this important? Sometimes we want to embed a binary formatted piece of data (e.g. an image) as part of our data.

Isn't that the point of the 'hashtags' in edn? If a #png arrives then your program is going to throw an exception unless you've registered a handler that reports success in decoding it?

In effect it doesn't matter that the encoding is arbitrary edn (e.g. a 150KB png could be a solitary bigint literal) because the tags prevent false-positive decodes and keep things "strongly typed".

If you control consumers and producers, and your data is just data, please just use a safe, common binary protocol (protobuf etc).

It is a great example of "Worse is better"[1]. XML has a lot more functionality than JSON but very few people can fully understand all the X* family of specifications, whereas JSON is a _lot_ easier and handles the majority of the use cases.

There are many, many real life situations where a developer needs to choose between using JSON, XML or YAML to store their configuration data, message formats etc. Simply stating that JSON is good only in one scenario and XML must be used in every other is over-simplification.

[1]: https://www.dreamsongs.com/RiseOfWorseIsBetter.html

In all fairness, 99% of the time you will use a library to serialise and deserialise. So I always found this xml vs json debate a bit sterile. As long as it is possible to inspect the file visually occasionally, they're both good enough. I don't know anyone who manually edits huge amounts of xml/json, and if they do, there is probably a better way.

One consideration that's worth thinking about is that XML serialization is larger (often not insignificantly) than the same data serialized as JSON.

On the other hand, XML can make error handling much easier by schema-validating received documents and thus rejecting a large class of invalid inputs at an early stage. This is particularly helpful if you're making something interoperable, like a public API.

So even when you're just using a library, there are ramifications to the decision.

Correct. But in the order of things I care the most about, using angle or curly brackets matters a lot less than problems like: can I serialize multi-dimensional arrays, dictionaries, arrays of bytes, etc.

And 99% of the time the JSON serialize/deserialize library is much simpler and more straightforward than the XML one, especially if your language has native hashes and arrays.

There's also a bitter taste in many of our mouths left by the "XML All The Things!" crowd that brought us such monstrosities as Maven config files and XSLT.

XSLT is useful when you want to change an Xml configuration file based on whether you are in debug or release mode, without having to duplicate the whole thing. But I wouldn't hold it against Xml, you only use it if you find it useful.

XSLT the language is great. I love the idea, it's wonderful and powerful in practice.

XSLT the syntax is an abomination. Trying to shoehorn a language syntax into XML was a stupid, stupid idea.

The entire article rests on this:

> JSON was not designed to have such features [as XPath, XML Schema, XSL, etc.]

But this claim is not justified, just stated. And the supposed inferiority of eg. JSONPath vs XPath is not justified either.

I would actually claim that JSONPath is superior to XPath. JSONPath is much simpler, easier to understand, easier to implement, and still fulfills the most common use cases. Also JSONPath can be evaluated on a streaming input, which is not possible for XPath in general (maybe some subset of XPath queries could support streaming).

The main problem with comparing XML and JSON is that their use cases don't perfectly overlap. I work with complex text documents, typically encoded in TEI schema, and it would be insane to try to do this work in JSON. It's possible, but the result would be an incomprehensible mess.

Unfortunately, a lot of programming languages have poor support for XML, and standard libraries usually only give you XPath 1.0 compatibility. XPath 2.0 and XQuery 3.0/3.1 are far more powerful and flexible, but you need Saxon or a good XML database to make proper use of them.

I am yet to see an example that makes me say "this JSON file does not represent the data very well, I wish they used XML instead". Representing HTML/XML does not count.

Encode a hash table where key order matters (like the Python OrderedDict). Or a hash table with non-string keys (like a sky chart with XY coordinates). Or any other data structure which doesn't conform to a simple list/hash table format.
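
To make the non-string-key case concrete, here is a minimal Python sketch. The `sky_chart` data and the list-of-pairs convention are assumptions for illustration only; JSON itself does not prescribe any such convention, which is rather the point.

```python
import json

# Hypothetical sky chart keyed by (x, y) coordinates.
sky_chart = {(10, 20): "Sirius", (35, 5): "Vega"}

# A plain dump fails because JSON object keys must be strings.
try:
    json.dumps(sky_chart)
    plain_dump_failed = False
except TypeError:
    plain_dump_failed = True


def encode_pairs(mapping):
    """Serialize a mapping as an ordered list of [key, value] pairs."""
    return json.dumps([[list(key), value] for key, value in mapping.items()])


def decode_pairs(text):
    """Rebuild the mapping, turning each key list back into a tuple."""
    return {tuple(key): value for key, value in json.loads(text)}


encoded = encode_pairs(sky_chart)
restored = decode_pairs(encoded)
```

The pair list also keeps key order explicit, but every consumer has to know the convention, so it is a workaround rather than something the format gives you.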

Right tool for the job. JSON is a great 80% encoding format, which handles most cases. The problem is that folks try to make it handle all cases (ironically, just like they did with XML).

If you ever do, https://www-01.ibm.com/support/knowledgecenter/SS9H2Y_7.1.0/... is ready to help you out...

I've done a bit of xml encoding myself, more custom schemas than TEI but I've dabbled in TEI a little too. I think xml is wonderful as an encoding language for text, and can't see anything else replacing it in the near future - unless someone comes up with a really extensible and easy-to-use wysiwyg editor. I think the lack of available open source processors is a pain. I remember doing a college assignment and getting really frustrated because my xslt wouldn't produce what I expected - it turns out the Home Edition of Saxon didn't implement this. I wonder whether Saxon's better implementations being closed source hurt the spread of XSLT and XQuery at a crucial time?

Sure, if you define "XML" to include algorithms that operate on it like XSLT and XPath, but define "JSON" to be just the data format, then the former is indeed more featureful.

That was my first thought, that there's tooling around JSON to do all of the above... for that matter straight python, js/node and other languages do very well processing JSON... if your encoding ensures no unescaped cr/lf, then a record per line in json + gz is awesome.
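
A rough sketch of the record-per-line + gz idea in Python, using only the standard library. The function names and the sample records are made up for illustration; an in-memory buffer stands in for a real file.

```python
import gzip
import io
import json


def write_records(stream, records):
    """Write one JSON document per line into a gzip-compressed stream."""
    with gzip.GzipFile(fileobj=stream, mode="wb") as gz:
        for record in records:
            # json.dumps escapes CR/LF inside strings, so a record never
            # spans more than one physical line.
            gz.write(json.dumps(record).encode("utf-8") + b"\n")


def read_records(stream):
    """Yield records one at a time instead of loading everything at once."""
    with gzip.GzipFile(fileobj=stream, mode="rb") as gz:
        for line in gz:
            yield json.loads(line)


buffer = io.BytesIO()
write_records(buffer, [{"id": 1, "note": "line\nbreak"}, {"id": 2, "note": "plain"}])
buffer.seek(0)
records = list(read_records(buffer))
```

Because each line is a complete document, a reader can stop, resume, or skip a bad record without parsing the rest.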

XML is literally too expressive, and you can't tell how to deserialize xml with a good fragment... json you can (at least better)... the query/expression syntax is even worse than learning/using a simple general purpose programming language.

Stop comparing JSON with XML... let me compare them to show why.. ;)

The article misses the mark, but XML--the language itself, not the tooling--does have an advantage: it is designed to be quite good at one thing that's often neglected despite being in its name: extensibility. XML provides enormous flexibility via namespaces and integrating many schemas that your application may or may not understand in one document. When done correctly, it can be a huge win in interoperability in heterogeneous environments and across versions.

That said, you could very well argue its additional complexity is often not worth the gain in most applications.

Of course, you can also define your own extensibility/interoperability conventions with any data serialization format, but the point is XML has it baked in the standard and already provides an accepted way of doing things that everyone has implemented.
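
As an illustration of the extensibility point, a small Python sketch using the standard library's ElementTree. Both namespace URIs and the document are invented for the example: the consumer understands the order vocabulary and simply sets aside elements from a namespace it does not know.

```python
import xml.etree.ElementTree as ET

ORDER_NS = "urn:example:order"  # hypothetical namespace this consumer understands

document = """
<order xmlns="urn:example:order" xmlns:ship="urn:example:shipping">
  <id>7</id>
  <ship:carrier>ACME</ship:carrier>
</order>
"""

root = ET.fromstring(document)
prefix = "{" + ORDER_NS + "}"

# ElementTree spells qualified names as "{namespace-uri}localname".
known = {
    child.tag[len(prefix):]: child.text
    for child in root
    if child.tag.startswith(prefix)
}
unknown = [child.tag for child in root if not child.tag.startswith(prefix)]
```

The shipping extension rides along in the same document without the order consumer having to change.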

It's worth linking to the famous XML rant of Erik Naggum:


It's both very insightful and enjoyable to read.

XML is an interesting case in point that syntax matters. XPath and XSL are so awesome but barely anyone cares because XML.

XSL is a case where even its authors found XML syntax too obnoxious, so they included special non-XML syntax in string attributes.

Are you referring to Attribute Value Templates?

     <div id="{@ID}"/>
It's simply a built-in template for "xsl:value-of" result fragments to be embedded directly in the literal result fragment's attribute. Call it "string interpolation", since it effectively serves the same purpose specifically for attributes in your literal markup for a given template.

      <xsl:attribute name="id">
        <xsl:value-of select="@ID"/>
      </xsl:attribute>
AVT is much nicer, to my eyes, though I will use the longer notation for more complex values. And, by definition, it's standard XSL.

It was practical, easy enough to do and didn't break any existing stuff. It's still XML, right? So why not? Nothing to do with obnoxiousness.

It's easy enough to create an alternative syntax for it, one-to-one translation. For HTML (XHTML?) there are plenty examples already like Jade.


But people who are serious about XML-related technologies understand that syntax of XML is mostly fine, interoperability and existing tools matter way more. It would be hard for such an alternative syntax to catch on. People often forget that it is a direct descendant of SGML, I imagine for similar reasons - there were existing tools for it.

The reason it's "easy enough to create an alternative syntax for it, one-to-one translation" is because an XML document is a tree. Just that. There's an argument to be had about tooling & specification ecosystem, but that can be replicated in other formats, and if all you want is hierarchical data representation, it's not worth it to poke your eyes out while working with the human-unreadable format that XML is.

What's important about XML isn't XML itself. It's specifications, standards, toolset. Just because it can be replicated doesn't mean it will be. That's probably man-centuries of work.

I agree. Though a big part of the XML ecosystem is made of dead ends and very domain-specific stuff. But either way, we need to be precise - let's evaluate XML and XML ecosystem separately. If you're not heavily exploiting the latter, XML is almost never a good choice - because alone by itself, it's just a tree notation that sucks.

> syntax of XML is mostly fine

If that was entirely true there'd be no JSON.

Are you even serious? Of course there would be different formats for every goddamn thing under the Sun.

What really happens there is called Impedance Mismatch

Similar to this, but with XML


Remember how Scala had XML as a first class citizen in the language? It didn't work out that well.

There are different programming paradigms and I would (entirely serious here) say that XSLT, XQuery, XProc and related XML technologies belong to a separate category. It's mostly declarative and pure, but there are some imperative elements here and there. It's just a different way of doing things, which has XML and XPath at the very center.

JSON doesn't have that Impedance Mismatch with Javascript and languages that support JSON well, that's why it's there.

I'm serious. We had AJAX not AJAJ. JSON mostly replaced XML in its only popular application. There would be no point of having JSON (as a thing not just JS notation for object literals) if XML wasn't that horrible to look at.

> Remember how Scala had XML as a first class citizen in the language? It didn't work out that well.

Funny how well having (sort-of) xml as first class citizen works in React ecosystem (in HTML too, to some degree). Easily composable bite sized chunks without over-engineered namespaces, doctypes and such. If only xml designers thought of their creation as pieces of flexible, composable hierarchical data not as documents.

> It's mostly declarative and pure, but there are some imperative elements here and there.

Yes. And despite all that it gets almost no love. My point is that this is mainly because of syntax (a thing that is considered by many, superficial and unimportant).

> JSON doesn't have that Impedance Mismatch with Javascript and languages that support JSON well, that's why it's there.

I believe you are overthinking this. JSON is just xml subset with slightly different syntax. If there is no mismatch between JSON and Javascript on any level beyond the syntax then there was no mismatch between subset of XML that was used for same purpose before JSON became popular.

With JSON you're not working with it as a tree, you're working with it as a set of objects. But what do you use to work with XML(or HTML) as a tree? You use DOM, OO API. And DOM sucks. There are wrappers around DOM, like jQuery, but even that leads to some spaghetti code.

JSON is a first class citizen in JS, XML isn't and never was. E4X never was properly supported by browsers.

Namespaces and doctypes are not mandatory.

The merits of a language and/or framework have little to do with its popularity. There are much better languages/frameworks than Go or Angular, for example, but that doesn't make them popular.

I'm not overthinking it, it's not my idea in the first place.

> If only xml designers thought of their creation as pieces of flexible, composable hierarchical data not as documents.

That's just plain nonsense, what are you talking about? Nothing was stopping you from using XML the way it is used in React. In fact these recent 'discoveries' in Javascript-land (React, Flux, Redux) are totally not new ideas.

> With JSON you're not working with it as a tree, you're working with it as a set of objects.

I can't agree with that. Each JSON payload is a tree with three kinds of nodes. Lists, associative arrays and values (of few flexible types).

I'm not sure what you mean by set of objects because JSON doesn't have set type. You can have something like set if you use an array but disregard order or use associative array but disregard types. Also the objects can contain objects. That's what makes it actually a tree.

> But what do you use to work with XML(or HTML) as a tree? You use DOM, OO API. And DOM sucks.

Yes but you don't have to. You could map XML (at least the subset commonly used with AJAX) directly to tree of plain old JavaScript objects (you'd need some conventions though because in xml you can invent various ways of representing lists and associative arrays).
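
A minimal Python sketch of such a mapping (the same idea applies to plain JavaScript objects). The conventions are assumptions chosen for the example: attributes are ignored, a leaf element becomes its text, and a repeated child tag becomes a list.

```python
import xml.etree.ElementTree as ET


def element_to_plain(element):
    """Convert an element into nested dicts, lists and strings."""
    children = list(element)
    if not children:
        return element.text or ""
    result = {}
    for child in children:
        value = element_to_plain(child)
        if child.tag in result:
            existing = result[child.tag]
            # Second occurrence of a tag: promote the stored value to a list.
            if not isinstance(existing, list):
                existing = [existing]
                result[child.tag] = existing
            existing.append(value)
        else:
            result[child.tag] = value
    return result


doc = ET.fromstring(
    "<book><title>Dune</title>"
    "<character>Paul</character><character>Jessica</character></book>"
)
plain = element_to_plain(doc)
```

Note the catch: a book with a single `character` would yield a string rather than a one-element list, which is exactly the kind of ambiguity the conventions have to settle.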

> Namespaces and doctypes are not mandatory.

Probably just my personal experience but some implementations of xml interoperability didn't allow me to specify default namespace for my xml tags which made me add bogus namespace identifier and prefix all my tags with that.

Namespaces are optional but that optionality is not stated clearly enough in the spec and some apis don't allow you to forget about namespaces.

> Nothing was stopping you from using XML the way it is used in React.

Just lack of good syntax/api.

> I'm not overthinking it, it's not my idea in the first place.

I was referring to your argument (as I understood it) that there are some deep mismatches between JSON and XML akin to object relational impedance mismatch. It's not that JSON is objects and XML is data. Both are just trees of values.

My point is that the difference is just syntax but I'd like to backpedal from that now. JSON vs XML is not just syntax and lack of XML oddities. It's also a good deal of "less is more". Now I think JSON is XML but with one obvious way to express sequence and one obvious way to label the data for access.

Anyways. I think we can agree to disagree and finish at that.

Bottom line is that you're not free to use what you want in the browser, that's what drove the popularity of JS (and JSON). If you think that XML was a popular choice, but then lost its popularity due to JSON having better syntax - that's not how it happened.

JSON is easier to use, that's why it displaced XML in browsers. But if there was native or close to native (WebAssembly?) support of things like XSLT/XQuery 3.0 and XForms 2.0, then I would switch away from JS and its framework zoo in a heartbeat.

> deep mismatches between JSON and XML

not between JSON and XML, google "X/O Impedance Mismatch"

Ok, let's take the points one by one.

XPath: JSON doesn't need this in, say, Python or Javascript. You write normal code once it's a Python/JS dict/array/list and you're done. I don't need yet another language, I have general purpose ones that will do just fine.

Attributes and namespaces: these can sort of be faked, but fair enough. But then you get discussions on what should be an attribute and what shouldn't...

Schema: pretty sure this exists for JSON

XSL: Ah, the poor man's Lisp macros... And, again, easier to do in code in a scripting language.

> XPath: JSON doesn't need this in, say, Python or Javascript

That's not exactly true. There are books in the library. Here's how I get some book's main character's name: library[5]['mainCharacters'][3]['name']. Except I want to know all main character names in all books. This is a clear, simple request, there's no need for me to put 2 cycles here if I don't have to. So I would end up implementing my own XPath here anyway, to write declarative stuff declaratively.

But then again, I don't have to, because JsonPath already exists and is just fine.

IMHO JSONPath is an unnecessary DSL when you have a sufficiently powerful and expressive language. It basically abstracts a less-powerful version of the essential higher-order functions.


> Here's how I get some book's main character's name: library[5]['mainCharacters'][3]['name']. Except I want to know all main character names in all books.

    library.flatMap(x => x.mainCharacters).map(x => x.name);
> This is a clear, simple request, there's no need for me to put 2 cycles here if I don't have to.

Using higher-order functions (and a functional programming style) it's still declarative. Yes, there are still nested cycles but, just like in JSONPath, they're hidden in the abstraction.

Plus, not using a DSL but a fully-fledged programming/scripting language, you can store intermediate results, make more complex queries, etc. which is more often than not what you need.


I've used JSONPath mostly in Bash scripts to quickly hack some JSON manipulation, but I'm slowly transitioning into using more powerful scripting languages and not using it anymore.

I'm not really that familiar with JSONPath though. Are there any use cases where it's really convenient?

I would say, that being a DSL is a benefit by itself. This is how the above example would look in Python:

    [character['name'] for book in library for character in book['mainCharacters']]
(Python also has maps and stuff, but it's considered non-idiomatic, plus flatten would be more awkward.)

First of all, the syntax is quite different from what you use (Scala, I suppose?). When we were using JSONPath we could just copy 1 line from one project to another and that would be fine.

Moreover, both our implementations have the same problem: we assume every book has mainCharacters and every character has name. With JSONPath it wouldn't matter: no 'mainCharacters' means the path doesn't match the pattern, so it is just skipped. In our cases this means exceptions.

And what if we want to get all 'name' fields from whatever object at whatever depth? Or 'name' of every object where 'color' is 'yellow'?

Now, if you consider dictionary structure much more nested (say, some AST) — processing that without errors would be quite painful. And you also would end up writing your own XPath (JSONPath), even with all your map's and reduce's.

Of course, should it get complicated enough we would end up needing to write something custom anyway, but stuff like JSONPath just helps to keep things simple when possible. That's it.
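
For comparison, a hedged Python sketch of what the hand-rolled version tends to look like once missing keys and arbitrary depth are taken into account. `find_key` and the sample `library` are hypothetical names and data made up for this example.

```python
def find_key(node, key):
    """Yield every value stored under `key`, at any depth of dicts and lists."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            yield from find_key(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


library = [
    {"title": "A", "mainCharacters": [{"name": "Ann", "color": "yellow"}, {"color": "red"}]},
    {"title": "B"},  # no mainCharacters at all
    {"title": "C", "mainCharacters": [{"name": "Cy", "color": "yellow"}]},
]

# 'name' at whatever depth.
names = list(find_key(library, "name"))

# Tolerant version of the comprehension: missing keys are skipped, not raised.
main_names = [
    character["name"]
    for book in library
    for character in book.get("mainCharacters", [])
    if "name" in character
]
```

It works, but the defensive `.get` and `in` checks are exactly the boilerplate a path language hides.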

> I want to know all main character names in all books

Off the top of my head, without testing it:

        [character['name']
         for characters in [book['mainCharacters']
                            for book in library]
         for character in characters]

Actually the reverse is true: you should compare XML (and related tech) vs JSON vs other options (say HTTP headers) and choose what's best for you. I've seen projects that are all about editing books and stubbornly use JSON, where XML would let them have schemas, and I've seen huge chunks of XML being exchanged where a simple key value pair would suffice. Things like JSON Schema indicate that people are using JSON for things not intended for it, likewise XML can be put to bad use.

The main strength of JSON is that it directly maps to simple data structures that are first class citizens in many programming languages like Ruby, Python, JavaScript, Golang, etc.

XML is a bit more complicated, and often needs dedicated libraries to manage. Every time I try to get the nth text element of node X with elementtree, it's a bit of a hassle.

The metadata argument is rarely applicable. Metadata tends to be rich, and it's horrible to represent rich data in opaque string attribute values.

Rich metadata is therefore represented as child nodes, giving them similar child/sibling status as JSON children, and the semantic ambiguity of when to use XML attributes vs child nodes remains.

Nonsense. They look very different, and that matters, at the very least. In particular I like that JSON has some sense of "type" built into the format, when you omit quotes you know it's a number or a boolean. You can get that (and better than that, really) with XML Schema (or old-school DTDs) but it's baked into JSON. Plus, dealing with JSON in a JavaScript interpreter is about as simple as it gets; only if you're programming in XML (e.g. with Ant) would you ever say the same about XML. It's quite nice to be able to "dig in" to JSON using normal, plain-vanilla JavaScript dereferencing tools and looping constructs.
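
A small Python sketch of the built-in typing point, using only the standard library. The two documents are invented for the example and are meant to carry the same data.

```python
import json
import xml.etree.ElementTree as ET

from_json = json.loads('{"count": 3, "active": true, "label": "3"}')

from_xml = ET.fromstring(
    "<item><count>3</count><active>true</active><label>3</label></item>"
)
# Without a schema, every XML leaf comes back as text.
xml_values = {child.tag: child.text for child in from_xml}

json_types = {key: type(value).__name__ for key, value in from_json.items()}
xml_types = {key: type(value).__name__ for key, value in xml_values.items()}
```

In the JSON case the number and the string "3" stay distinguishable after parsing; in the XML case that distinction has to come from a schema or from application code.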

As for the tooling around XML, that's okay, but it's almost always overkill, and it almost always turns out the overhead of the tooling becomes a problem in-and-of-itself.

Anyway, JSON is actually a better data format.

I think your point could have been made even better if instead of quotes/numbers/strings you would have illustrated the in-band typing using square brackets and arity. Two bytes for "there can be more", so much better than the plural/singular convention often seen in XML (rarely without some creative breaches closely nearby).

In hindsight, the biggest advantage of XML over JSON was that it was painful enough to make schemas popular, a quality JSON is lacking. Unlike schema languages, which do exist.

To me, XML tooling lost quite a bit of its appeal when I realized that all the typing available via the various schema languages is completely lost to the world of xsl/xpath/xquery. I understand the reasons for that, but that does not provide much consolation.

There is a point, but I think the wrong things are compared to indicate that it's "apples to oranges".

XML is designed towards a type of problem that is not an everyday programming problem. It is designed for a full-fledged schema - it builds on the lessons of SGML and its predecessors such as GML[0], for people who need those things, which has historically meant "documentation writers". DocBook and DITA do not really have equals at what they do, which is semantic, textual content with rich markup. (Yes, you like TeX. But it focuses on presentation for typesetting, not semantic meaning.)

This means that in practice XML is really useful for describing an abstract, pre-tokenized syntax. This is a useful tool from a language design perspective; it lets you take an intermediate position between human-friendly and machine-friendly formats, without going straight to binary data or writing a full string parser. When computer language tooling emits an XML AST, they give tool-writers who would like to manipulate or inspect the AST a major leg up.

Simpler forms like sexps or JSON exact additional overheads on that problem that can be nearly as bad as just writing a custom string parser, once you get beyond the "strings, numbers, and simple containers" cases that are basically about data serialization, not data parsing. You want to have nodes that have unique names or attributes once you get into the parsing problem, but they are superfluous if you have plain old data. And as soon as you get into mixing different types of documents validation becomes a major concern and XML has the right groundwork for that.

It's just that most people don't want or need to deal with data of that complexity, especially since XML as a plaintext document just looks like angle-bracket trash. For those needs they are better off writing in something that a string parser can work with and then using XML as an intermediate, if at all. Even for the documentation-writing situation, it's easier to write Markdown for 98% of the prose and then convert it to add the last 2%.

Basically, XML has been used for far too many things that it shouldn't, and the blame for that lies on some 90's-era hype machines that decided that XML would be the buzzword of the future and pushed it into every technology. We got some nice tooling out of it, but in the end, it's still most useful for a certain kind of document markup.

[0] https://en.wikipedia.org/wiki/IBM_Generalized_Markup_Languag...

XML works well for markup use - for documents with lots of text and sparse semantic tagging. Matching opening and closing tags is actually a pretty nice feature there.

RE AST, somehow Lisp managed to encode it in S-expressions since forever, and there was never a problem. In fact, writing Lisp code is mostly writing an AST directly. There is apparently no need for the additional syntactic sugar XML adds.

The one thing JSON misses that actually is important is symbol type. S-expressions have it, and at this point it's no longer "strings, numbers and basic containers" - per code=data, you have everything you need to encode an AST conveniently.

    But XML has XPath, XMLSchema and XSLT and JSON doesn't
XML has those things because its data model is hostile to every programming language in existence and you need tools designed specifically for it to manipulate it concisely.

JSON fits well with the list/map/primitive data model that is common to most scripting languages so those tools aren't needed, you can just use JavaScript or Python, something you use every day anyway and that doesn't look alien.

    A similar document would look like this in XML
... plus encoding and schema declaration and also all the garbage involved with dealing with namespaces. Or, at least, don't sing praise to XMLSchema and namespaces if you aren't going to put them in your example.

PS. XML is a markup language, if your data is a text document with markup then XML is a good choice, otherwise leave it alone.

So, what is the screenshot from "The Men Who Stare at Goats" there for?

This tool will help to convert json to xml http://jsonformatter.org

Not to mention that XML is generally more suitable for streamed i/o.

Except that it is generally less suitable for streamed I/O than linewise JSON. Remember that you don't need to stream a single big document. You can make a stream of several separate documents.

Provided your data can be chunked in that way, yes.

Looks like a troll...
