
It's true that this is essentially a markup language (in the original sense of markup: Text annotated with style and semantic information) in JSON form. But it wouldn't be sufficient to just pick a subset of HTML. You would need to extend it with metadata and custom element types. Since HTML elements, attributes and content are all just untyped strings, you will need to invent your own syntax to express other types. You're halfway into XML land at this point.

There's a million reasons why XML is a bad fit, but one stands out: JSON is a first-class citizen on the web, and XML isn't. Once you parse this with the browser's built-in JSON parser, you're good to go; you have a JS object you can plug into an editor, or manipulate and display in all sorts of ways. If this were a custom HTML subset, or XML, the path would be much longer. JSON has many issues, of course, but ease of use isn't one of them.




> Once you parse this with the browser's built-in JSON parser, you're good to go; you have a JS object you can plug into an editor, or manipulate and display in all sorts of ways. If this were a custom HTML subset, or XML, the path would be much longer.

Browsers parse XML too, y'know? It was, at one point, even more "first class" than JSON. Why do you think browsers had an "XMLHttpRequest" API, instead of just an "HTTPRequest" API?

> You're halfway into XML land at this point.

You mean... XHTML?

The ability to combine HTML tags (retaining their semantic meaning), with tags from your own, other XML namespaces (which you can define the semantic meaning of) was 99% of the point of XHTML.

People thought XHTML was supposed to "replace HTML" or "be the next version of HTML", and so hated it (because they'd have to fix all their existing HTML-authoring tools to make them generate valid XML.)

But no, XHTML was never supposed to entirely supplant HTML; XHTML was just supposed to be a syntax for including HTML-markup-ed text as part of the content of a larger XML document.

And XHTML is still a valid thing to use today. WHATWG is even developing XHTML5 to mirror HTML5.

Really, XHTML should not be considered "some past version of HTML, a bad diversionary experiment like New Coke that we've since gotten back on track from", like most people see it. Rather, XHTML was and is a separate tool, useful in particular domains for acting as a drop-in sub-schema to larger XML document formats that need to represent text markup within them. XHTML provides not only the DTD, but all the semantics for how those elements should translate to rendered results—while still allowing you to do whatever you want, XML-wise, inside and outside and around those elements.


XML is also a first-class citizen on the web. All major browsers have a built-in XML parser to access and manipulate XML. All major server-side programming languages have XML parsing and manipulation libraries.

XML is a far better format for this; it is designed specifically for mixing text content with markup. This format is incredibly hard to read and debug. Once you've parsed this JSON, you're still left with a complex stringy structure that you then have to parse again anyway. The value of being in JSON is incredibly low.


XML is not a native data structure to the web, and JSON arguably is. Browsers may have XML parsers built in, but XML requires APIs to work with in any meaningful way. XML's data model is complex. Where JSON allows you to build the entire data structure using pure JavaScript data types (objects, arrays, numbers, etc.), you can only build XML data reasonably via complex interactions of createElement(), appendChild() and so on. Compared to the thinness and transparency of JSON, the XML DOM has huge overhead. There's huge value in your document model also being your in-memory model.

I'm not sure why you think this particular JSON schema is a "complex stringy structure". XML is a complex stringy structure! There's conceptually and semantically zero difference between <node style="normal">text</node> and {style: "normal", text: "text"}, other than the fact that the former requires an API to read and manipulate, and the latter doesn't, from the point of view of JavaScript code. XML has a specific formalized data model of elements, attributes and so on, but this is just a graph representation: You can losslessly encode an XML DOM in JSON, and vice versa.
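A rough sketch of that equivalence, for illustration only. The {tag, attrs, children} object shape and the helper names toXml and escapeXml are assumptions made up here, not part of any specification discussed in this thread, and the sketch ignores namespaces, comments and processing instructions:

```javascript
// Hypothetical plain-object encoding of an XML element: {tag, attrs, children}.
// Children are either strings (text nodes) or nested element objects.
const node = { tag: "node", attrs: { style: "normal" }, children: ["text"] };

// Escape the characters that are special in XML text and attribute values.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Serialize the plain-object tree to an XML string.
function toXml(el) {
  if (typeof el === "string") return escapeXml(el);
  const attrs = Object.entries(el.attrs || {})
    .map(([k, v]) => ` ${k}="${escapeXml(v)}"`)
    .join("");
  const inner = (el.children || []).map(toXml).join("");
  return `<${el.tag}${attrs}>${inner}</${el.tag}>`;
}

// The same tree survives a trip through JSON text unchanged.
const roundTripped = JSON.parse(JSON.stringify(node));

const xml = toXml(roundTripped);
console.log(xml); // <node style="normal">text</node>
```

The object tree is the document here; the XML string is just one serialization of it.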

The argument in favour of XML should revolve around the ecosystem of tools (XPath, XML Schema, namespaces, etc.) that allow you to work with it, but so far the comments here haven't conclusively highlighted these features as benefits. They also seem to ignore that with the exception of legacy enterprise software, the web world has largely (not entirely, but largely) moved away from XML. There's a reason we're using JSON over REST instead of SOAP. XML isn't the lingua franca of the web that W3C seemed to assume it would be during the early 2000s; JavaScript/JSON is.


> Browsers may have XML parsers built in, but XML requires APIs to work with in any meaningful way.

But this format will require parsers to take the resulting objects, arrays, and numbers and reconstruct something very close to XML's data model anyway.

> XML's data model is complex.

So is this document format, with its special "_type" keys and specially named nodes. XML is simpler since attributes and children are a native part of the format and the API.

> There's huge value in your document model also being your in-memory model.

Are you suggesting this exact model be the in-memory model? Because that would be pretty non-optimal. You'd likely want your in-memory model to be more powerful than a tree of dictionaries, arrays, and strings. Therefore parsing and storing an XML document on load/save is not a big deal. In fact, given the existence of an API that already handles attributes and nested nodes, it might even be easier.

> XML has a specific formalized data model of elements, attributes and so on, but this is just a graph representation: You can losslessly encode an XML DOM in JSON, and vice versa.

Exactly. So why use a format that is obviously terrible for mixed text and markup? Any reasonably sized document in JSON will be impossible to follow, yet XML would be comparatively straightforward.

> There's a reason we're using JSON over REST instead of SOAP. XML isn't the lingua franca of the web that W3C seemed to assume it would be during the early 2000s; JavaScript/JSON is.

Basically SOAP is an over-engineered solution to web RPC. It started out as XML-RPC, which was far simpler and far more constrained, like JSON. You could easily encode SOAP in JSON if you wanted. You're blaming the underlying technology for being used inappropriately, but this is an example of JSON being used inappropriately! It's an over-engineered solution to a problem solved more simply by a different technology. "XML bad / JSON good" doesn't tell the whole story.


> Are you suggesting this exact model be the in-memory model? Because that would be pretty non-optimal. You'd likely want your in-memory model to be more powerful than a tree of dictionaries, arrays, and strings.

I'm pretty confident that this is actually the intended goal, particularly for web-based editors.

Most "modern" web-based rich text editors work with an internal schema similar to this one, even when most of them support html/xml import/export too. In today's web ecosystem, it is hard to do much better than "javascript structure that gets diff-rendered to the DOM by a patching algorithm on updates".

I think this is just a case of someone saying "hey, wouldn't it be great if all these editors used the same schema instead of very similar ones with minor differences?"


> Are you suggesting this exact model be the in-memory model?

The data model, yes, though not the visual model. Sanity [1], which uses this specification, uses it for its rich text content.

Sanity is a document store with a collaborative, real-time editing UI similar to Airtable, but self-hosted and open source. When you're editing some text in the rich-text editor (which uses ProseMirror internally, last I checked), the editor is working directly on this data model, and produces patches against this data model that get synced with other clients. Similar to OT, except it uses a simpler git-like approach to rebase patches and converge the state.

There's another, less obvious reason it's JSON. In Sanity's document store, every document is a structured object that is read and written as JSON. Inside each document, Portable Text fields are also stored as JSON, not as a string or a binary blob. This content also is indexed and queryable. For example, let's say you're creating a wiki app; each wiki page is a document, with a title, body, and so on. The body is a Portable Text field. With this, you can find all wiki pages that link to another wiki page by running a query such as *[references("richard-feynman")]. You can do things like extract images, get the first paragraph, etc., all using the query language.
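To make the "it's just queryable JSON" point concrete without the query language, here is a rough sketch in plain JavaScript. The document shape below (the _type, children, marks, markDefs and reference._ref fields, and the "internalLink" type name) is an assumption for illustration, not a quoted schema, and referencedIds and firstParagraph are hypothetical helpers:

```javascript
// Hypothetical wiki page. The field names (_type, children, marks, markDefs,
// reference._ref) are assumptions for illustration, not a quoted schema.
const page = {
  _id: "quantum-electrodynamics",
  title: "Quantum electrodynamics",
  body: [
    {
      _type: "block",
      markDefs: [
        { _key: "m1", _type: "internalLink", reference: { _ref: "richard-feynman" } },
      ],
      children: [
        { _type: "span", text: "Developed in part by ", marks: [] },
        { _type: "span", text: "Feynman", marks: ["m1"] },
        { _type: "span", text: ".", marks: [] },
      ],
    },
  ],
};

// Collect every document id referenced from the body's link annotations.
function referencedIds(doc) {
  return doc.body
    .flatMap((block) => block.markDefs || [])
    .filter((def) => def.reference && def.reference._ref)
    .map((def) => def.reference._ref);
}

// Plain-text rendering of the first paragraph: join the span texts.
function firstParagraph(doc) {
  const block = doc.body.find((b) => b._type === "block");
  return block ? block.children.map((span) => span.text).join("") : "";
}

const pages = [page];
const linking = pages.filter((p) => referencedIds(p).includes("richard-feynman"));
console.log(linking.map((p) => p._id)); // [ 'quantum-electrodynamics' ]
console.log(firstParagraph(page)); // Developed in part by Feynman.
```

Nothing here parses a string a second time; the links and the text are ordinary object fields, which is what makes them indexable.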

All technically possible with XML, of course, though I personally wouldn't want to go there.

(Disclosure: I work on Sanity's content store tech, but I don't work on Sanity itself.)

[1] https://www.sanity.io


> Browsers may have XML parsers built in, but XML requires APIs to work with in any meaningful way.

That's fine. APIs are good. The problem with XML is that the APIs never got any love from the end-user perspective. They are horrible to use, (seemingly) overcomplicated and the ones I've used, inconsistent.

I've seen one or two nice Python wrappers (forget names, been a long time since I worked with XML) that smoothed out the Developer Experience (DX), but that's it.

For me, this is the reason as a programmer I prefer to work with JSON, even though as a data author and maintainer I prefer XML at every level.


And as if dealing with JSON requires no APIs. Perhaps because I use libjq, I'm spoiled? But no, I can't imagine starting from scratch and not building an API for dealing with JSON.


> but XML requires APIs to work with in any meaningful way.

And JSON doesn't?! Sure it does. E.g., I regularly use the jv API in libjq. I wouldn't want to use JSON in any other way than via an API, honestly.

You could also argue that XSLT/XPath are part of the XML API / ecosystem, much the way I think of JSONPath / jq as part of the JSON ecosystem.

And yes, the world has moved away from XML... for non-document purposes. XML was a poor fit for encoding non-document data. SOAP's use of XML to encode non-document data sure sucked, though it died of other things too (e.g., REST becoming more popular and being more appropriate in some ways). That's why we have things like protocol buffers, flat buffers, etc. Some even remember XDR, NDR, ASN.1 and its many encoding rules. Some even remember those and are curious enough to have noticed that protocol buffers is DER reinvented.

JSON is much more appropriate for non-document data than XML, and unlike XDR/NDR/<various>buffers/DER/BER/CER/XER/PER/OER you don't have to agree on a schema a priori, which makes JSON very useful in REST APIs. But you could still define REST services in relation to schemas and then use OER or flat buffers to get very efficient encodings.

At the end of the day it's all the same data no matter how you encode it, provided the choice of encoding doesn't force you to give up some metadata or force you to express it in a convoluted way.

The point isn't that you can't do with JSON what you can with XML, or that you can't do with XML what you can with <various>buffers. The point is that XML has decades of evolution baked in already. You should not reinvent a document format without being very explicit about what's wrong with XML, what's missing that can't be added, and what you're willing to throw out, otherwise you'll find yourself replicating that evolution.


JSON doesn't need an API because it's a subset of JavaScript:

  const foo = {
    bar: n,
    children: users.map(({name, id}) => ({name, id})),
  };
  foo.x = 42;

DOM:

  const foo = document.createElement("root");
  foo.setAttribute("bar", `${n}`);
  for (const {name, id} of users) {
    const child = document.createElement("child");
    child.setAttribute("name", name);
    child.setAttribute("id", id);
    foo.appendChild(child);
  }
  foo.setAttribute("x", `${42}`);

Querying with jq isn't an "API".

JSON and XML are both encodings of two different formal data models. The difference is that JSON is native to JS. XML requires an API to build a safe and valid document. There are simplifying wrappers around the DOM, but they'd be yet another API on top of an API.

Also note how XML is string-based. In order to get any type safety at all you have to use XML Schema, which only works with a data consumer that also knows XML Schema. So given <foo x="1"/>, there's no way to tell arbitrary consumers that x is, in fact, a number.
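A small sketch of the difference. The regex below is only a stand-in for a real XML parser, used so the snippet runs anywhere; the variable names are made up for illustration:

```javascript
// JSON carries a small set of types in the syntax itself.
const fromJson = JSON.parse('{"x": 1, "y": "1", "z": true}');
console.log(typeof fromJson.x); // number
console.log(typeof fromJson.y); // string
console.log(typeof fromJson.z); // boolean

// An XML attribute value is always a string. The regex is a crude stand-in
// for a parser: it pulls the raw value of x out of <foo x="1"/>.
const attrValue = /x="([^"]*)"/.exec('<foo x="1"/>')[1];
console.log(typeof attrValue); // string

// Without a schema, the consumer has to decide on the conversion itself.
const asNumber = Number(attrValue);
console.log(asNumber + 1); // 2
```

Note that JSON's built-in types are limited too (no dates, no integers distinct from floats), so richer types still need a convention on top.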

There are undoubtedly toolchains that do higher-level manipulation of XML data so that you can serialize and deserialize rich, type-safe data, but the fact remains that you need such a toolchain, in every single language that is to deal with this data.


I'll add to the chorus if I may. XML is much better for text documents, and very much is a first-class citizen on the web.

More than anything, XML is trivially extensible, but anything you do with JSON for documents will require that you reinvent just about everything that XML already has (e.g., namespaces, schemas), and as with all reinventions, you'll probably do it badly.

I've said elsewhere here that jq is so much pithier than XSLT, and that's true. But once you have a very complex schema to represent in JSON what would have been a simpler schema in XML, jq will no longer yield short, simple, pithy programs. The advantage of simplicity that JSON seems to have here over XML is imaginary and ephemeral -- I wouldn't count on it.


Sorry, but this is nuts. Not only is JSON not capable of capturing essential concepts of text documents such as text macros/entities and content models, it's also not a "first-class citizen" compared to HTML, which can be styled and edited (via contenteditable) in browsers out of the box.


I think you're missing the point of what this specification is about. This has nothing to do with macros or entities.


> (in the original sense of markup: Text annotated with style and semantic information)

JSON is not annotated text, it is structured data that contains strings. It is a minor difference if you only use tools to manipulate it though.



