
Do we really need this? Atom is fine for feeds. Avoiding XML just for the sake of avoiding XML, because it isn't "cool" anymore, is just dumb groupthink.

If this industry has a problem, it's FDD - Fad Driven Development and IIICIS (If It Isn't Cool, It Sucks) thinking.

Part of me is with you. But even in established languages I've had trouble finding an appropriate xml parser and had to tweak them way more than I thought necessary. I haven't (yet) had that problem with JSON.

I think with something like feeds there's the possible benefit of becoming a 'hello world' for frameworks. Many frameworks have you write a simple blogging engine or twitter copycat. I don't think I've ever seen that for a feed reader/publisher. People have said that Twitter clients were an interesting playground for new UI concepts and paradigms because the basics were so simple (back when their API keys were less restrictive). Maybe this could be that?

> But even in established languages I've had trouble finding an appropriate xml parser and had to tweak them way more than I thought necessary. I haven't (yet) had that problem with JSON.

Maybe it's just that I work mostly with JVM languages (Java, Groovy, etc.) but I haven't had any problems with handling XML - including Atom - in years. But I admit that other platforms might not have the same degree of support.

Most of my experience is from Python. Each time I use it I have to look at the docs for etree (a library that ships with Python). We hit performance and feature-support issues with etree, and when we tried lxml we ran into binary compatibility issues between our environments.

The Hitchhiker's Guide to Python[1] (a popular reference for Python) recommends untangle[2] and xmltodict[3], neither of which I've used.

I feel like other languages I've used had similar brittleness when dealing with XML. I might be biased because, when working with XML in an editor, it's difficult to validate visually or to grok in general.

[1] http://python-guide-pt-br.readthedocs.io/en/latest/scenarios...

[2] https://untangle.readthedocs.io/en/latest/

[3] https://github.com/martinblech/xmltodict
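For what it's worth, the friction shows up immediately with the stdlib etree. A minimal sketch (toy Atom fragment; field names are just illustrative) next to the one-call JSON equivalent:

```python
import xml.etree.ElementTree as ET
import json

atom = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Hello</title></entry>
</feed>"""

root = ET.fromstring(atom)
# The default namespace means the obvious query finds nothing...
assert root.findall("entry") == []
# ...so you must thread a namespace map through every call.
ns = {"atom": "http://www.w3.org/2005/Atom"}
titles = [e.findtext("atom:title", namespaces=ns)
          for e in root.findall("atom:entry", ns)]
print(titles)  # ['Hello']

# The JSON equivalent is a single call with no extra ceremony.
data = json.loads('{"entries": [{"title": "Hello"}]}')
print([e["title"] for e in data["entries"]])  # ['Hello']
```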

Beautiful Soup is alright in most cases. JSON is handled much better than any XML library I've seen so far though.

Oh yes, I've used Beautiful Soup, too. If I remember correctly I had great luck with HTML, but issues with XML. It's also only a reader, not a writer.

> Maybe it's just that I work mostly with JVM languages (Java, Groovy, etc.) but I haven't had any problems with handling XML

Yeah, no surprise. XML may as well be a native data-type in most core JVM languages.

It's not the case everywhere else however.

What language are you using that doesn't have a working XML parser? REALLY?

He said *appropriate* XML parser.

All languages have XML parsers; it's more that a lot of them suck: they might have weird concepts you have to use, constantly trip you up with namespaces, or make it really hard to write XPath queries.

> or are constantly tripping you up with namespaces

You mean requires that you understand the XML format you are working with? Oh noes!

Namespaces exist, just about everywhere in the world of programming, and they do so for a reason.

<bar /> is not the same as <foo:bar /> just like http://bar.com is not the same as http://bar.foo.com.

If that's putting the bar high, I really think I may be suffering a huge disconnect from the rest of my peers in terms of expected capabilities.

Just because JSON doesn't have namespacing capabilities at all doesn't make it a worthless feature. It's actually what gives you the eXtensibility in XML. As a developer I expect you to understand that.

(And I wonder how long it will take before the JS world re-implements this XML wheel, again with a worse implementation)

The reason why many developers hate XML namespaces isn't the concept but the implementations which force you to repeat yourself everywhere. I think a significant amount of the grumbling would go away if XPath parsers were smart enough to assume that //tag was the same as //default-and-only-namespace:tag, or at least allowed you to use //name:tag instead of //{URI}tag because then you could write against the document as it exists rather than mentally having to translate names everywhere.

Yes, you can write code to add default namespaces when the document author didn't include them and pass in namespace maps everywhere but that's a lot of tedious boilerplate which requires regular updating as URLs change. Over time, people sour on that.

It really makes me wonder what it'd be like now if anyone had made an effort to invest in making the common XML tools more usable and other maintenance so e.g. you could actually rely on using XPath 2+.
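The repeat-yourself problem is easy to see in Python's stdlib etree, which offers exactly the two spellings complained about above; a small sketch:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<feed xmlns="http://www.w3.org/2005/Atom"><title>T</title></feed>')

# Option 1: Clark notation, spelling out the full URI in every query.
print(doc.find("{http://www.w3.org/2005/Atom}title").text)  # T

# Option 2: a prefix map you must pass to every single call.
ns = {"a": "http://www.w3.org/2005/Atom"}
print(doc.find("a:title", ns).text)  # T

# The bare name -- what the document literally shows -- finds nothing.
print(doc.find("title"))  # None
```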

> (And I wonder how long it will take before the JS world re-implements this XML wheel, again with a worse implementation)

I'm going to guess never. I'm also going to guess that there isn't a single flamewar in the entire history of JSON where someone was trying to figure out how to implement anything close to XML namespaces in JSON. And by "close", I mean something that would require changes to JSON parsers and/or downstream APIs to accommodate potentially bipartite keys.

You never know. This is what they said about schemas too, not many years back.

Have there been any discussions whatsoever about adding some sort of namespacing mechanism to JSON?

Well, there's JSON-LD (JSON Linked Data) already.

It's for making interoperable APIs, so there is a good motivation for namespaces. But the namespaces are much less intrusive than XML namespaces. Ordinary API consumers don't even have to see them.

One of the key design goals of JSON-LD was that -- unlike its dismal ancestor, RDF/XML -- it should produce APIs that people actually want to use.

Thanks, I haven't explored JSON-LD before.

But that's not a case of adding namespaces to JSON, is it?

What I mean is if one were to take the skeptical position that JSON is going to end up "re-inventing the XML wheel", that would mean JSON advocates would need to push namespaces into the JSON spec as a core feature of the format. I've never read a discussion of such an idea, but I'd like to if they exist.

edit: clarification

Well, yeah, perhaps the craziest thing about XML is that it has namespaces built into its syntax with no realistic model of how or why you would be mashing up different sources of XML tags.

Namespaces are about the semantics of what strings refer to. They belong in a layer with semantics, like JSON-LD, not in the definition of the data transfer format.

I am convinced that nobody would try to add namespaces to JSON itself. Just about everyone can tell how bad an idea that would be.

> Well, yeah, perhaps the craziest thing about XML is that it has namespaces built into its syntax with no realistic model of how or why you would be mashing up different sources of XML tags.

The thing that gets me is that they were added to XML, so the downstream APIs then got mirrored interfaces like createElementNS and setAttributeNS that cause all sorts of subtle problems. With SVG, for example, this generates at least two possible (and common) silent errors-- 1) the author creates the SVG in the wrong namespace, and/or 2) more likely, the author mistakenly sets the attribute in the SVG namespace when it should be created in the default namespace. These errors are made worse by the fact that there is no way to fetch that long SVG namespace string from the DOM window (aside from injecting HTML and querying the result)-- judging from Stack Exchange, users are manually typing it (often with typos) into their programs and generating errors that way, too.

Worse, as someone on this site pointed out, multiple inline SVGs can still have attributes that easily suffer from namespace clashes in the <defs> section. It's almost comical-- the underlying format has a way to prevent name clashes between multiple attributes inside a single tag that share the same name-- setAttributeNS-- but is no help at all in this area.

edit: typo and clarification

XML parsers have a pretty bad track record for security vulnerabilities. If I was writing code to distribute that was going to be parsing arbitrary data from third parties (which is the RSS/Atom use case), I would be more comfortable trusting the average JSON parser than the average XML parser.

Otherwise, I agree with the "if it ain't broke" principle. There are also cases where so much ad hoc complexity is built on top of JSON that you end up with the same problems XML has, except with less battle-tested implementations.

As terrible as XML parsers can be, they've never been as bad as "XMLdoc = eval(XMLString)". I'd be more likely to trust a JSON parser not written in JavaScript than an arbitrary XML parser, but that's only because of the XML specification itself, which includes such features as including arbitrary content as specified by URLs (including local (to the parser) files!). Great ideas when you can trust your XML document, not so great otherwise.
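The eval hazard isn't unique to JavaScript; a Python analogue of the same mistake (the "malicious" expression here is a harmless stand-in I made up, but it could just as easily delete files):

```python
import json

payload = '{"a": 1}'
# A real parser only ever produces data.
print(json.loads(payload))  # {'a': 1}

# An eval-style "parser" executes whatever it is handed.
malicious = "__import__('os').getpid()"
print(eval(malicious))  # runs arbitrary code; json.loads would reject it
```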

Modern browsers don't internally call eval(). See e.g. the definition of JSON.parse in V8: https://chromium.googlesource.com/v8/v8/+/4.3.65/src/json.js...

And modern XML parsers aren't full of vulnerabilities anymore. You're missing the point.

It is very likely that I am an idiot, but I've always found parsing XML too hard, especially compared to JSON, which is almost too easy.

Whether parsing XML is easy or hard, how often do you actually write an XML parser? If I'm digesting a JSON/XML document, I resort to a parser library for the language that I'm using at that point, so the complexity of writing such a parser is pretty much non-existent. Definitely not a compelling reason to switch to JSON.

Most XML parsers I've used are leaky abstractions. Even once the document is parsed, actually accessing the data can require a lot more complexity than accessing parsed JSON data.

IIRC, the popular C++ implementations were glorified tokenizers. It was up to you to figure out which tokens were data and how those tokens related to each other.

Ah, SAX. People built some true horrors with that API, just because it was "more performant" than DOM. Never mind that their hacked-together tree builders often leaked like sieves.

If there was an `XML.parse` just like there's `JSON.parse`, I doubt you'd say the same. As it stands, the added complexity in JS-land is to import a library that provides this functionality for you. Fortunately there are many, but I agree a built-in would be nice. It's a bit of a shame that E4X never landed in JS.

It's more than JUST library support. It's also that JSON deserializes into common native data types naturally (dictionary, list, string, number, null).

You can deserialize XML into the same data types, but it's not anywhere near as clean because of how extensible XML is. That's a big part of what's made JSON successful.
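In Python, for instance, a single call lands every JSON type on the obvious native type (the payload here is a made-up example):

```python
import json

data = json.loads('{"title": "Hi", "tags": ["a", "b"], "count": 2, "draft": null}')

# Object -> dict, array -> list, number -> int, null -> None.
print(type(data).__name__)           # dict
print(type(data["tags"]).__name__)   # list
print(type(data["count"]).__name__)  # int
print(data["draft"])                 # None
```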

Right, but you inevitably end up with boilerplate "massage" code around your data anyway. Case in point: dates, any number that isn't just a number (e.g. currencies or big numbers), URLs, file paths, hex, hashes. Basically any type that carries semantics beyond array, object, string, number, or null will require this boilerplate, except that your data format has no way of describing them other than out-of-band specs, if you want to call them such.

At least XML has schemas, and even if all you're doing is deserializing everything into JsonML like objects you're still better off because you'll have in-band metadata to point you in the right direction.

CBOR [1] allows the semantic tagging of data values and makes a distinction between binary blobs (a collection of 8-bit values) and text (which is defined as UTF-8).

[1] RFC 7049. Also check out http://cbor.io/

IMHO the boilerplate code is much easier to read than understanding the nuances of XML if I have to read a document.

{"type":"currency", "unit":"euro", "amount": 10}

feels easier to understand than

<currency unit="euro">10</currency>

Maybe it's just conditioning, but I find the latter example easier to read and understand. In fact, I'd say that - in general - I find XML better in terms of human readability than JSON. I guess it just goes to show that we all see certain things differently. shrug

I think that's totally reasonable - because it was after all one of the goals of XML. That is, to be human readable.

There is a difference however between readable + parsable vs parsable + easily dealt with.

XML was not the latter. You have to do more work to traverse and handle XML inside your application than you do JSON, and most of the (reasonable) reasons for this are due to features that most cases don't need.

JSON makes the common case easy, XML doesn't.

How about:

<rec type="currency" unit="euro" amount="10" />

I don't think your problem is with the syntax, necessarily. It seems more like you prefer name/value pairs over semantic markup.

The biggest problem with XML is how easy it is to make a very bad schema, and how hard those can be to parse.

Also, for what it's worth, your point is exactly why I mentioned E4X. Sure wasn't a panacea, but it had some things going for it.

E4X was exceedingly great. Loved it.

This is really only true in dynamically typed languages. From personal experience: parsing json in Java or Go without just treating everything as a bag of Object or an interface{} requires a ton of menial boilerplate work.

Super nice in python/ruby/javascript, though.

Swift 4 will have JSON encoding/decoding built in, and I wouldn't be surprised to see such a feature spring up in other modern languages too. Once that boilerplate is eliminated, json is a pretty decent solution.


I am very stoked for this.

> parsing json in Java or Go without just treating everything as a bag of Object or an interface{} requires a ton of menial boilerplate work.

From my experience in Java it is pretty simple using a library like Jackson. You define the types you expect to read, maybe add some annotations, and then it's one function call to deserialize. IIRC Go has something similar in its json library.

Yes, it's arguably nicer in Go, because you specify exactly what types and field names you expect, and then it's just a simple json.Unmarshal

Sure--it's kind of a pain in Swift, too.

Wouldn't it just be worse with XML, though? I get that people don't realistically parse it themselves and libraries are smart enough to use schemas to deserialize, but there's nothing inherent about JSON that makes it unable to conform to a schema or be parsable by libraries into native strongly-typed objects the same way.

Except JSON doesn't have semantics to describe schemas, only arrays, objects, strings, numbers and null. You can say "but this key is special" but then it's not JSON anymore. And if you're ok with that, may as well just use JSON-LD or some other JSON-like format.

Idk, I think JSON parsing is pretty ergonomic in Rust, definitely nicer than your typical XML DOM.

Of course, JSON doesn't support complex numbers, bignums, rationals, cryptographic keys &c. And it'd be even worse than XML to try to represent programs in.

JSON is definitely easier to decompose into a simple-to-manipulate bag-of-values than is XML.

XML is fundamentally much more complex than JSON, so any XML parsing library will inevitably present a more complicated API. I kinda like XML (!), but there is no point pretending that using it is as simple as JSON.

I think that depends what you mean by "using it".

XML can convey a lot more semantic meaning than JSON ever will, and standardisation of things like XPath, DOM, XSLT, etc provides a lot of power when working with XML documents.

With JSON, essentially everything is unknown. You can't just get all child nodes of an object, or get all objects of a certain type, using standard methods. You need to know what object key 'child' nodes are referenced by, or loop through them all and hope that what you find is actually an array of child nodes, and not e.g. an array of property values. Finding all objects of a given type means knowing how the type is defined, AND the aforementioned "how do i get child nodes" to allow you to traverse the document.

Of course that assumes what you have is a document, and not just a string encoded as JSON. Or a bool/null.

My point is, the tooling around XML is very mature. "Use" of a data format is a very broad topic, and covers a lot more than just "i want to get this single property value".

Absolutely. Right tool for the right job. Mixed content (perhaps a paragraph with bold and italics) is absolutely horrible in JSON because it lacks the complexity that XML has to cope with this.

You're basically saying that this isn't technically better, just more socially acceptable right now. I think you're right, but it seems to me that Atom's problem is primarily a social one. So even if this doesn't carry any technical advantages, a format with a strong social "in" is precisely what we need to make feeds a thing again.

To be honest, I'm really excited about the prospect of JSON based feeds. Right now, there's no easy way to work with Atom/RSS feeds on the command-line (that I know of anyway), which is something I often wish I could do. With a JSON feed, I can just throw the data at jq (https://stedolan.github.io/jq/) and have a bash script hacked together in 10 minutes to do whatever I want with the feed.
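A rough Python equivalent of that jq pipeline, assuming a hypothetical JSON Feed-shaped document (the field names here are illustrative, not from any spec):

```python
import json

# A toy document shaped like a JSON feed.
feed = json.loads("""{
  "title": "Example Blog",
  "items": [
    {"title": "First post", "url": "https://example.com/1"},
    {"title": "Second post", "url": "https://example.com/2"}
  ]
}""")

# The jq query '.items[].title' becomes a one-line comprehension.
titles = [item["title"] for item in feed["items"]]
print(titles)  # ['First post', 'Second post']
```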

I give you libxml:

    xmllint --xpath '//element/@attribute'

There's a good chance it's already installed on your Mac.

To avoid the hassle of handling xml namespaces (e.g. in an Atom feed...), just do:

    xmllint --xpath '//*[local-name()="element"]/@attribute'

Note: for consistency, namespaces are not needed for attribute names.


There are a few nice XML processing utilities. I tend to use xmlstarlet and/or xidel. This lets me use XPath, jQuery-style selectors, etc.

I agree that jq is really nice though. In particular, I still find JSON nicer than XML in the small-scale (e.g. scripts for transforming ATOM feeds) because:

- No DTDs means no unexpected network access or I/O failures during parsing

- No namespaces means names are WYSIWYG (no implicit prefixes which may/may not be needed, depending on the document)

- All text is in strings, rather than 'in between' elements

- No redundant element/attribute distinction

Even with tooling, these annoyances with XML leak through. As an example, xmlstarlet can find the authors in an ATOM file using an XPath query like '//author'; except if the document contains a default namespace, in which case it'll return no results since that XPath isn't namespaced.

This sort of silently-failing, document-dependent behaviour is really frustrating; requiring two branches (one for documents with a default-namespace, one for documents without) and text-based bash hackery to look for and dig out any default namespace prior to calling xmlstarlet :(



I have an RSS client written in Rust that builds as a command line program.[1] I wrote this in 2015, and it needs to be modernized and made a library crate, but it will build and run with the current Rust environment. It's not that hard to parse XML in Rust. Most of the code volume is error handling.

[1] https://github.com/John-Nagle/rust-rssclient

Surely there's an xml->json converter somewhere.

It's kind of tough to convert XML directly to other formats (including, but not limited to, JSON), because there are a lot of XML features that don't map cleanly onto JSON, such as:

• Text nodes (especially whitespace text nodes)

• Comments

• Attributes vs. child nodes

• Ordering of child nodes

As it happens, XSLT 3.0 and XPath 3.0 both have well documented and stable features for doing exactly this. Roundtripping XML to JSON and back is a solved problem - check it out some time; it may surprise you.

Are you talking about json-to-xml and xml-to-json?

From the XSLT spec [0]:

"Converts an XML tree, whose format corresponds to the XML representation of JSON defined in this specification, into a string conforming to the JSON grammar"

It can't take an arbitrary XML document and turn it into JSON, it can only take XML documents that conform to a specific format.

You can safely round-trip from JSON to XML and back to JSON. That's trivial because JSON's feature set is a subset of XML's.

What you can't safely do is round-trip from arbitrary XML to JSON and back to XML. That's because, as the parent said, there are features in XML that don't exist in JSON. That means you are forced to find a way to encode it using the features you do have, but then you can't tell your encoding apart from valid values.

[0] https://www.w3.org/TR/xslt-30/#func-xml-to-json
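The pigeonhole argument is easy to demonstrate with a toy converter (this is a deliberately naive sketch, not any real library's mapping):

```python
import xml.etree.ElementTree as ET

def naive_to_json(xml_str):
    # Toy converter: keeps element names and child text, but drops
    # attributes, comments, and mixed-content ordering.
    root = ET.fromstring(xml_str)
    return {root.tag: {child.tag: child.text for child in root}}

# Two different XML documents collapse to the same JSON value,
# so no converter of this shape can round-trip back to the XML.
a = naive_to_json('<item id="1"><title>Hi</title></item>')
b = naive_to_json('<item><title>Hi</title></item>')
print(a == b)  # True
```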

You could conceivably serialize the DOM as a JSON object, but the representation would be very difficult to work with:

    {
      "type": "element",
      "name": "blink",
      "attributes": {
        "foo": "bar"
      },
      "children": [
        {
          "type": "text",
          "content": "example text"
        }
      ]
    }

Once you've peeked at the complexity of some of the xml parsers (like xerces, oh god xerces) undoubtedly you'll want to avoid it like the plague. xml can get crazy-bananas very quickly. I fundamentally don't understand xml (just like I don't understand asn1) for anything beyond historical purposes.

The Atom spec is really easy to grasp. Your platform may even include a way to deal with it (.NET, for example: https://msdn.microsoft.com/en-us/library/system.servicemodel...)

There are definitely complexities in the XML ecosystem, like XLink, schemas, namespaces, etc. But in practice, not every application needs all that stuff, and when using the "common" parts of XML, I don't find it difficult to understand or work with. But that's just me.

Yep, we don't really need another syndication format that no reader is going to support or support well for years. All I see missing in RFC 4287 is the lack of a per-entry cover image/thumbnail, which you can solve with an extension (which no one supports, and that's kind of the point) anyway.

Yeah this is great, now instead of properly machine-readable and verifiable XSD files we have pseudo-RFC text on some shitty GitHub page.

JSON, given the same schema, will always be more efficient byte-for-byte than XML. In addition, JSON as a format is native to JavaScript, which itself is ubiquitous. That's not even mentioning raw readability/writability.

Basically, XML is to JSON as SOAP is to REST. It had its day, and it's obviously still useful, but we have better tools now. Frankly, I'm surprised we haven't seen a proposal like this sooner.

> XML is to JSON as SOAP is to REST

That's true. Both XML and SOAP are well defined, and well structured.

JSON and REST are both marginally defined, and thus we see constant incompatible/incomplete implementations, or weird hacks to overcome the shortcomings.

> we have better tools now

I think "the cool kids are cargo-culting something newer now" is probably more accurate.

Nitpick: REST is very well defined. It's not just a protocol, like some people insist.

Other than that, fully in agreement.

REST is effectively a concept, and it's up to developers to follow the rules it sets.

You can't take your codebase, add some glue code to a REST module, and know that it will be usable by any other REST consumer/client, because no one follows the guidelines exactly the same way.

Part of me is also with you - JSON is indeed smaller than XML, but we have gzip almost everywhere around the web, and with gzip they don't differ that much in size. Also, if people really care about size, why don't they use a binary format, such as protobuf?

And the other part of me is not with you - manipulating XML is not as easy as JSON in most of my development time, and sometimes I even need to write something by hand, for which JSON is much handier. Tons of other formats are more human-friendly than JSON, for example TOML, but they don't have the status JSON has. So I guess JSON is kind of the default choice in the current state of web development.

In practice, json is much easier to work with on the command line because of jq.

No, we don't. This doesn't do anything except break compatibility.

Yikes, you didn't even make it to the second sentence.

> JSON is simpler to read and write, and it’s less prone to bugs.

> JSON is simpler to read and write, and it’s less prone to bugs.

I don't actually find either of those things to be true.

I simply think you're lying to yourself. It's both literally and theoretically simpler to write and digest; let's start with the simplest case: {}. Proneness to bugs is a matter of debate, depending on a number of factors.

That's a fine opinion to have, but that doesn't mean that people (the authors or devs generally) use JSON out of vanity. As an aside, you're the first person I've heard suggest people put their identity in serialization format, which gave me a good laugh.

From the HN guidelines:

Please don't insinuate that someone hasn't read an article.

My mistake for phrasing my point in a manner that violates HN guidelines. I tried to edit, but I missed the window. At any rate, my point stands.

We don't prefer JSON to XML for any reason other than that XML is terrible by comparison.

It's funny to me that at the same time people are flocking to languages with strong, flexible type systems (often with compile-time checks), we are fleeing from a strongly typed data interchange format in favor of a dynamic bag of objects and arrays.

I think that's because even if the data interchange format is strongly typed, as a consumer you often still must expect _anything_.

I've yet to work on a project that handles XML where we have an XSD prevalidation step that makes reading some deeply nested XML tag feel safe.

Unless we count XML <-> data object binding back in the java days. Not sure that felt any better...

On the flip side, I've only ever not had an XSD when I was building something myself and actively didn't care.

The truth, I tend to suspect, lies somewhere in between. =)

That's not a reason.
