
> I think its bad reputation comes from anyone not using an enterprise language because the support just isn't there.

On the contrary, I think that XML's bad reputation comes from the fact that it is <adverbial-particle modifies="#123">so</adverbial-particle> <adverb id="123">incredibly</adverb> <adjective>verbose</adjective>.

Also, the whole child/attribute dichotomy is a huge, huge mistake. I've been recently dealing with the XDG Menu Specification, and it contains a child/attribute design failure, one which would have been far less likely in a less-arcane format.

XML is not bad at making markup languages (and indeed, in those languages attributes make sense); it is poor at making data-transfer languages.

JSON has become popular because a lot of bad programmers saw nothing wrong with calling eval on untrusted input (before JSON.parse was available). It's still more verbose than a data transfer format should be, and people default to using unordered hashes instead of ordered key-value pairs, so it's not ideal.

The best human-readable data transfer format is probably canonical S-expressions; the best binary format would probably be ASN.1, were it not so incredibly arcane. As it is, maybe protobufs are a good binary compromise?



I think the worst of this is what I call semantic incoherence.

I have a system that has things like <Task ID="6">Blah</Task>. Why is the ID, clearly always an integer in every sample of hundreds I see, represented as a string?

Another favorite: <ExecuteCommand><![CDATA[Batchfile.bat]]></ExecuteCommand>, while a binary or something else will be <ExecuteCommand>"program.exe /argument:f /argument2:x"</ExecuteCommand>.

By the way, this is as enterprise as it gets: a software tool from a four-letter hardware company, quite huge, trying to sell off its software division. I wonder why.

XML is like all the other "crap" tools (Java, PHP, SOAP): some people do not grok the spirit of the law, they do weird things that reflect their discomfort and hurried need to operate with the tool, and many write it off.

I agree with your points; this is just my corollary. The sad thing is that S-expressions and XML are not far removed (one is arguably a subset of the other), and notice how people lose their shit when you ask them to consider Lisp languages for daily work because "all those parens are stupid", and how the culture surrounding a potentially viable tool makes people close up without delving in with curiosity.

https://en.wikipedia.org/wiki/SXML

http://arclanguage.org/item?id=19453


> Why is the ID, clearly always an integer in every sample of hundreds I see, represented as a string

Because XML is a text-based markup language. If you truly want binary data you need to encode it and use CDATA sections.


That was not quite my point.

Why pretend it is a string at all?

<Task> <ID>3</ID> </Task>

I should have been more clear. Sometimes you have these attribute-style deals, <Task ID="3">, where I would at least hope for <Task ID=3> or the monstrosity above (I assume ID=3 is not valid, in hindsight; I am getting tired just writing this all on the second pass!). And I see all different variations in the same XML file! There is no logical consistency, not even in the same config for the same function of this multi-stage system.

I am not even a novice programmer, and I find the variation annoying, and sometimes hard to reason about when I want to know what the hell the programmer was thinking.

The valid part for the CDATA portion has changed several times in minor releases, so when our server team upgrades, I get to figure out the new syntax.

I thought XML was proposed to avoid these things! Haha. Again, tools in the hands of "wise men" like me are dangerous. I am probably as ignorant as them, I just think I know better!


Enclosing the attribute in quotes isn't pre-disposing the value to be of a particular type. It's part of the XML spec that attribute values must be contained within quotes (single or double) to be valid. The type isn't implied in the file.

An XML schema such as

    <xs:element name="Task">
      <xs:complexType>
        <xs:attribute name="ID" type="xs:int" use="required" />
      </xs:complexType>
    </xs:element>

could more explicitly declare the type of the value.
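Absent a validator that enforces such a schema, the consuming code ends up applying the type itself. A minimal sketch, assuming Python's stdlib ElementTree and the Task markup from the comment above (the `int()` call stands in for what the schema would promise):

```python
import xml.etree.ElementTree as ET

# To the parser, every attribute value is just a string; the consuming
# code applies the type the schema would have declared.
task = ET.fromstring('<Task ID="3">Blah</Task>')
task_id = int(task.get("ID"))  # raises ValueError if ID isn't an integer
```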


Thanks for the explanation. In this case I learned to be careful what you wish for. I guess this is why I prefer the

    <Item> <Parameter>Data</Parameter> </Item>

style. But this is my ignorance of XML and familiarity with HTML showing.


XML as a config file format was a disaster in every example I ever encountered. Config files are supposed to be editable by humans using editors, and most that I saw were too complex for that. In particular the NeXT/Apple property file formats are horrible abuses of XML.

As a format to represent structured data, it could be fine as long as you were pragmatic about it. In the case of <task id="3"> you either assumed that "id" was always an integer or you validated it with a schema declaration, which quickly got hairy.

In practice I never validated XML beyond it being well-formed (which was provided by default in any parser) and never had any real problems.


What takes fewer lines of code to parse?

    <element.name id.value="3.14">
Or accepting both:

    <element.name id.value="3.14">
    <element.name id.value=3.14>
How would you specify an empty value for mandatory attributes?


I've seen empty values written as

    <tag attr1 attr2="val">data</tag>
Whether that's legal or not, I don't know.


Not valid. Wondering if you've seen that within HTML, where it is valid.
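The difference is easy to check: a strict XML parser rejects a valueless attribute as a well-formedness error, while an HTML parser accepts it with a `None` value. A quick sketch with Python's stdlib:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

markup = '<tag attr1 attr2="val">data</tag>'

# XML: a bare attribute name is a well-formedness error.
try:
    ET.fromstring(markup)
    xml_ok = True
except ET.ParseError:
    xml_ok = False

# HTML: the same markup parses fine; attr1 simply has no value.
html_attrs = []
class Collector(HTMLParser):
    def handle_starttag(self, tag, attrs):
        html_attrs.extend(attrs)
Collector().feed(markup)
```

Here `xml_ok` ends up `False`, while `html_attrs` contains both `('attr1', None)` and `('attr2', 'val')`.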


Actually, now that you mention it, I think it's from Chrome's Inspect Element tool, but I can't check right now.

I think if you wrote something like

    <div class="">...</div>
it would display in the tool as

    <div class>...</div>


Chrome's Inspect Element shows you the non-serialized DOM structure, which means it's neither XML nor HTML at that point.


Oh, this is the difference between attribute-oriented XML, element-oriented XML, and whatever-the-hell-we-feel-like-oriented XML. Publishers should pick one of the first two and be consistent about it.


Agree. Practical/pragmatic use of XML as a data format requires consistency.


> I have a system that has things like <Task ID="6">Blah</Task>. Why is the ID, clearly always an integer in every sample of hundreds I see, represented as a string?

You're really asking a different question here: "Why should an integer be used as a task ID?" Storing the task ID as a string may give you options in the future that you wouldn't otherwise have, at a relatively small cost in parsing performance and validation overhead.

Most of the world's regrettable XML schemas were faulty at the specification stage, not the implementation stage. To minimize the likelihood of eventual regret, I usually prefer to store stuff in strings unless there's a very good reason not to. The fact that I'm using XML means that I'm not that concerned about performance, so... strings, it is.

A similar argument can be applied to the child/attribute dilemma. If there's even the slightest chance that a field isn't always going to be a leaf node, I'll do the extra typing and make it a child. Ideally the parser would be written to make them both work the same anyway.
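A sketch of what such an even-handed parser might look like, using Python's stdlib ElementTree (the `field` helper is hypothetical, not from any existing library):

```python
import xml.etree.ElementTree as ET

def field(elem, name):
    """Return the named field whether it's an attribute or a child element."""
    if name in elem.attrib:
        return elem.attrib[name]
    child = elem.find(name)
    return child.text if child is not None else None

attr_style = ET.fromstring('<Task ID="3"/>')
child_style = ET.fromstring('<Task><ID>3</ID></Task>')
# field(attr_style, "ID") and field(child_style, "ID") both return "3"
```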


I see that you were downvoted, but I happen to see merit in your comment. Again, a lot of people make technical decisions without stepping back, scanning their choices as a non-specialist (in the context of their programming domain) would, and asking: hey, does this make sense?


Technically all attributes are supposed to be surrounded by quotes regardless of how they're interpreted. That renders the premise of my whole comment invalid, to be "technically correct," so the people downvoting may have had that in mind.

Still, there are plenty of XML applications that leave out the quotes on numeric attributes. My point was really that they're not doing themselves any favors by abusing the spec that way. A text-based markup language is a great example of how premature optimization is unhelpful most of the time.


> JSON has become popular because a lot of bad programmers saw nothing wrong with calling eval on untrusted input (before JSON.parse was available).

Disagree. JSON became popular because it was extremely easy to implement (both for marshaling and consuming), and because it was extremely lightweight.

I think you could also make the argument that JSON was conceptually easier for programmers to wrap their minds around. You could just pretty-print it and quickly get an idea for the object's format, attributes, etc.
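For example, with Python's stdlib `json` module, one call both round-trips and pretty-prints an object for inspection:

```python
import json

doc = {"name": "John", "age": 42, "props": ["abc", 123, False]}
pretty = json.dumps(doc, indent=2)
print(pretty)  # two-space-indented view of the whole object
```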


I agree, especially with the easy to understand part.

Look how short the standard is: http://www.ecma-international.org/publications/files/ECMA-ST... It's small and perfect, like a 2x1 LEGO block.

Here's the XML spec: https://www.w3.org/TR/REC-xml/ <backs away slowly>


XML could be fairly lightweight also. It was all the enterprisey-standard formats that were hideous.

E.g.

    {"name":"John","age":42}
vs.

    <person name="John" age="42" />


Now do the nested objects in both. One line does not show much.


    <person id="123" name="John" age="42" sec:checksum="...">
      <family-member type="spouse" ref="456" />
      <family-member type="child" ref="789" />
      <fin:credit-rating score="A"
          last-change="2016-02-04T12:34:56Z" />
      <уфмс:статус значение="42" />
    </person>
Here we can describe `person/@id` as element ID and `family-member/@ref` as a reference to an ID so our XML tools can link these together.

Also note three more elements from different namespaces: `@sec:checksum` could be some kind of technical information about the record, and `fin:credit-rating` is added by the financial module. Its `@last-change` attribute is defined as a datetime, so when we read it with other XML tools we'll get it as a datetime type.

The next one is a tag in Russian that describes something related to Russia; XML can use the whole of Unicode in tag and attribute names.

Also, XML names are globally unique by design, so there's no clash between all the different pieces, and the tools can easily be configured to ignore parts they don't understand or to act as glue between different areas.

We can still efficiently validate the syntax of the whole piece or of parts of it as we see fit.


> Disagree. JSON became popular because it was extremely easy to implement (both for marshaling and consuming), and because it was extremely lightweight.

A canonical S-expression parser is strictly easier to implement, given that S-expressions consist only of lists and byte sequences (no numbers or objects), and is even more lightweight. JSON's big advantage was that it was familiar to a JavaScript programmer, that's all.
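To make "strictly easier" concrete, here is a sketch of a parser for the human-readable S-expression form (nested lists, bare atoms, quoted strings) in a few lines of Python; a canonical (binary) S-expression parser is simpler still, since byte strings there are length-prefixed:

```python
import re

def parse_sexp(text):
    """Parse the readable S-expression form into nested Python lists.
    A sketch only -- no escape sequences, no error reporting."""
    tokens = re.findall(r'\(|\)|"[^"]*"|[^\s()"]+', text)

    def read(i):
        tok = tokens[i]
        if tok == '(':
            items, i = [], i + 1
            while tokens[i] != ')':
                item, i = read(i)
                items.append(item)
            return items, i + 1
        if tok.startswith('"'):
            return tok[1:-1], i + 1  # quoted string: strip the quotes
        return tok, i + 1            # bare atom

    value, _ = read(0)
    return value
```

For instance, `parse_sexp('(object (id "1234"))')` returns `['object', ['id', '1234']]`.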


S-expressions are basically no syntax. Human-readability depends solely on the person who comes up with the schema. I mean, there are many reasons to love S-expressions, but human-readability is an unusual one. edn [0] is an interesting compromise (as is Clojure).

XML is actually IMO not that bad at human readability, it's pretty good. It's terrible at human writability. Conversely S-exps are lovely to work with.

[0] https://github.com/edn-format/edn


XML's bad rep for verbosity is almost entirely due to the nonsensical, terrible idea of requiring names in the end tag. Without that, it's about the same level of verbosity as JSON. And personally, after writing plenty of both by hand, XML is easier to get right. JSON, with its poor quoting rules (mandatory quotes on names??) and lack of comments, is very annoying to do by hand and seems visually noisier.


An advantage of names in end tags is human readability. Consider this XML fragment:

  <a>12<b>34<c>56<d>78<e>90</e></d></c></b></a>
Appending something to the end of the d element is easy, since one can just search for its end tag. In JSON and other formats that only have one single character at the end, one has to count brackets or parentheses for this purpose:

  (12(34(56(78(90)))))


If they're all <a> then you're back to square one.

JSON solves this with indentation, pretty printing, and paired symbols that most competent editors can automatically balance. This solves the homogeneous case too.

Incidentally, XML can benefit from the first two, and many editors balance tags, so you can get the same thing there.


It is rare in real-world XML that elements have children with the same type. Do you have a (non-divitis) example where the tags are all the same?


It happens with any tree structure. E.g. I used to work on a system that managed reinsurance contracts and represented them as trees of contracts.


Did the elements often have immediate child elements that had immediate child elements (and so on) of the same type? Like:

  <contract><contract><contract><contract> […]


No, there were a couple of layers in that case. But that doesn't actually help you add a child at the correct level, because the end of a contract would look something like:

                ...
                </contract>
              </subcontracts>
            </content>          
          </contract>
        </subcontracts>
      </content>
    </contract>


> I think that XML's bad reputation comes from the fact that it is <adverbial-particle modifies="#123">so</adverbial-particle> <adverb id="123">incredibly</adverb> <adjective>verbose</adjective>.

> Also, the whole child/attribute dichotomy is a huge, huge mistake.

Those two factors run counter to each other. Attributes decrease verbosity, compared to child elements.

I agree, though. A few changes would make XML closer to ideal: eliminate attributes and eliminate the name in closing tags (<tagname>value</>), which makes child elements much less verbose, and reduces the need for attributes.


> A few changes would make XML closer to ideal: eliminate attributes and eliminate the name in closing tags (<tagname>value</>), which makes child elements much less verbose, and reduces the need for attributes.

Then just change '<tagname>' to '(tagname,' and '</>' to ')' and you'll have S-expressions.

Consider this:

    (feed
     (version 1)
     (title "Example Feed")
     (link http://example.org/)
     (updated "2003-12-13T18:30:02Z")
     (author (name "John Doe"))
     (id urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6)
    
     (entry
      (title "Atom-Powered Robots Run Amok")
      (link http://example.org/2003/12/13/atom03)
      (id urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a)
      (updated "2003-12-13T18:30:02Z")
      (summary "Some text.")))
That is a canonical S-expression (for a Scheme or Common Lisp reader, just quote the URIs too) version of:

    <?xml version="1.0" encoding="utf-8"?>
    <feed xmlns="http://www.w3.org/2005/Atom">
    
    <title>Example Feed</title>
    <link href="http://example.org/"/>
    <updated>2003-12-13T18:30:02Z</updated>
    <author>
    <name>John Doe</name>
    </author>
    <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>
    
    <entry>
    <title>Atom-Powered Robots Run Amok</title>
    <link href="http://example.org/2003/12/13/atom03"/>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2003-12-13T18:30:02Z</updated>
    <summary>Some text.</summary>
    </entry>
    
    </feed>
I particularly like how URIs are sometimes encoded as attributes and sometimes as child text elements.

And compare to your proposed version:

    <feed>
    
    <title>Example Feed</>
    <link>http://example.org/</>
    <updated>2003-12-13T18:30:02Z</>
    <author>
    <name>John Doe</>
    </>
    <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</>
    
    <entry>
    <title>Atom-Powered Robots Run Amok</>
    <link>http://example.org/2003/12/13/atom03</>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</>
    <updated>2003-12-13T18:30:02Z</>
    <summary>Some text.</>
    </>
    
    </>
I think it's pretty clear which is the most readable and elegant.


If you're going to compare the two fairly, include appropriate indentation for both, not just the S-expression version. Also put the author and name tags on the same line, as you did with the S-expressions:

    <feed>
      <title>Example Feed</>
      <link>http://example.org/</>
      <updated>2003-12-13T18:30:02Z</>
      <author><name>John Doe</></>
      <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</>
    
      <entry>
        <title>Atom-Powered Robots Run Amok</>
        <link>http://example.org/2003/12/13/atom03</>
        <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</>
        <updated>2003-12-13T18:30:02Z</>
        <summary>Some text.</>
      </>
    </>
That said, I like S-expressions too, and I wish more parsers and tools existed for them, such as schemas, query tools, and simple transformation tools.


> If you're going to compare the two fairly, include appropriate indentation for both, not just the S-expression version.

When I pasted it in from https://validator.w3.org/feed/docs/atom.html#sampleFeed I guess I lost the indents. No idea why: they are clearly there in the original.


> I particularly like how URIs are sometimes encoded as attributes and sometimes as child text elements.

I think the distinction here is that the one is an identifier which is not intended to be dereferencable, and the other is a link to a resource which has to be retrievable. In the good old days the id would most likely have been a URN and the link a URL, but that distinction was being discouraged in favour of the more general URI term at the time the Atom spec was developed. [1]

So while they're syntactically both URIs (well technically IRIs), they're functionally quite different. It may be debatable whether that's a good enough reason for the one to be an element value and the other an attribute value, but I don't think that decision was obviously wrong.

[1] https://tools.ietf.org/html/rfc3986#section-1.1.3


The second and third examples do not have namespaces.

How would you include an HTML summary, for example?


> The second and third examples do not have namespaces.

> How would you include an HTML summary, for example?

As a text attribute, honestly — which would be necessary in XML as well (you could embed XHTML in XML, but not HTML). And in the general case, embedding one variant of XML inside another, rather than embedding a character-encoded variant of XML inside another, doesn't seem all that useful. How often do transforms need to reach all the way in like that?

I guess it's cool if it's possible, which is why I like S-expressions all the way down. But I don't think it's all that useful, as opposed to neat.


> How often do transforms need to reach all the way in like that?

In my experience, almost every time XSLT is used on real-world documents, those are documents with multiple namespaces. XSLT stylesheets themselves are also documents that have multiple namespaces. Example: Atom feeds often contain XHTML content. It is a common problem with RSS that it does not specify if the content of an element is HTML or plain text.

I have found that arguments that doubt a feature is necessary from people who cannot imagine use cases are almost invariably wrong, while arguments that doubt a feature is necessary from people who list use cases and explain why they think those are better solved otherwise, or even left unsolved, are often right. Your post seems like an example of the former; would you say that complex real-world content with namespaces could sway you in favor of them?


I would be convinced if I saw real-world examples where having namespaces gave an advantage over not having namespaces. I can see the value in specifying whether the content of a given node is XHTML or text. I can at least theoretically see value in allowing nesting XHTML without a layer of escaping. I can't see any non-theoretical way in which namespaces are necessary to accomplish these things.


Example: The XSLT stylesheet for this Atom feed generates a web page for each entry: http://news.dieweltistgarnichtso.net/notes/index.xml In this setup, the Atom XML for each entry is generated from XHTML with XSLT, which makes it possible to automatically include an Atom enclosure element for every XHTML media element. To publish a podcast episode, it is enough to add a post with an <audio> or <video> element, as an XSLT stylesheet can “reach into” the XHTML content.

Namespaces are also widely used in SVG, which uses the XLink specification for hyperlinks and can embed XHTML and MathML content. Since SVG can be embedded in (X)HTML, this means you can have an ATOM feed containing XHTML containing MathML and SVG that contains XHTML and all have it displayed correctly.


> Example: The XSLT stylesheet for this Atom feed generates a web page for each entry: http://news.dieweltistgarnichtso.net/notes/index.xml

> In this setup, the Atom XML for each entry is generated from XHTML with XSLT, which makes it possible to automatically include an Atom enclosure element for every XHTML media element. To publish a podcast episode, it is enough to add a post with an <audio> or <video> element.

Sure. Why do you need namespaces to do that? Why couldn't you do it in XML-without-namespaces (or even JSON and some theoretical JSON-transformation-language)?

> Namespaces are also widely used in SVG, which uses the XLink specification for hyperlinks and can embed XHTML and MathML content.

Again, why are namespaces necessary though? Why not just have a tag whose content is specified to be XHTML/MathML ? Wouldn't you want that anyway for the sake of human readability?


XML without namespaces does not exist. If it existed, how would you differentiate between title and link elements in Atom and title and link elements in XHTML? They have the same element names, but do not have the same meaning and therefore must be processed differently. Namespaces ensure that any XML processor can know the language of each part of the input.

Namespaces actually are the general mechanism with which you can specify that content is in another language: If you look at the feed source code, you can see that XHTML content is started with <div xmlns="http://www.w3.org/1999/xhtml"> and ends where that div element is closed.

Having an element with the semantics that “this content is in another language” is done out of necessity in HTML, as it has no namespacing: <style> elements contain CSS, <script> elements contain JavaScript, <svg> elements contain SVG … having an element in each language to embed each other language would become complicated very fast.


> XML without namespaces does not exist. If it existed, how would you differentiate between title and link elements in Atom and title and link elements in XHTML?

By where it is in the structure. The document is a tree where each element has well-defined context; there should never be confusion about whether a particular <title> is part of the feed or part of the content in the feed, because if it's in content it will be inside the content tag.

(Don't you need to do that anyway? I mean what if the XHTML had another Atom feed embedded in it? Or the content of one of the entries in the feed was another Atom feed? That's legitimate, but you wouldn't want to show titles from the "inner" feed as titles in the feed).

> Having an element with the semantics that “this content is in another language” is done out of necessity in HTML, as it has no namespacing: <style> elements contain CSS, <script> elements contain JavaScript, <svg> elements contain SVG … having an element in each language to embed each other language would become complicated very fast.

Only if you need the ability to embed an arbitrary other language. And if you do need that you can't possibly be validating or transforming based on what's embedded, so what value is the namespacing of it giving you?


You may have incomplete documents (e.g. documents with conditional sections, very much like XSLT):

    <code:if test="...">
      <!-- whatever -->
    </code:if>
    <code:else>
      <!-- whatever -->
    </code:else>
Here you'll first process your code part and copy the contents as they are, and then process the contents; but in the source document the two languages are interspersed.

Or you may want to extend your text format with, say, literate programming and add code fragments and files. In my homegrown system it's like that:

    <literate:fragment id="..." language="...">
      <text:caption>...</text:caption>
      <literate:code>...</literate:code>
    </literate:fragment>
My text system already has a notion of captions, so there's no need to add my own "literate:caption" here. Yet the other two "literate" elements are new and unique. Also, using a namespace here ensures that I won't have a clash if the base system adds its own "fragment" or "code" blocks.


OK, I guess that takes things a level up. I don't like that kind of interspersed style and I don't think incomplete documents should be the same kind of thing as complete ones (e.g. one can't meaningfully validate your first example, because what if the "whatever" is an element that has to be present exactly once). But I can see that if you want to write things this way then namespaces help.


“I don't like” seems to be an æsthetic argument, not a technical one.


> The document is a tree where each element has well-defined context; there should never be confusion about whether a particular <title> is part of the feed or part of the content in the feed, because if it's in content it will be inside the content tag.

In this specific case, maybe – but generally, it is not true that you can infer the namespace of an element from context. Also, elements can have multiple attributes with different namespaces (and often do).

> I mean what if the XHTML had another Atom feed embedded in it? Or the content of one of the entries in the feed was another Atom feed? That's legitimate, but you wouldn't want to show titles from the "inner" feed as titles in the feed

That actually appears to be a bug in my stylesheet. Thank you for bringing it to my attention!

Programs often use namespaces to provide metadata. Here is an SVG I created with Inkscape that uses six different namespaces for metadata: http://daten.dieweltistgarnichtso.net/pics/icons/minetest/mi... Thanks to namespacing, web browsers can display the picture while ignoring Inkscape-specific data.

> Only if you need the ability to embed an arbitrary other language. And if you do need that you can't possibly be validating or transforming based on what's embedded, so what value is the namespacing of it giving you?

It is very useful to embed any arbitrary language, as XML processors can preserve the content they do not understand without processing it. My XSLT stylesheet would have no issue with SVG embedded in XHTML, just as your web browser most likely ignores everything about the SVG linked above it can not understand.


> It is very useful to embed any arbitrary language, as XML processors can preserve the content they do not understand without processing it. My XSLT stylesheet would have no issue with SVG embedded in XHTML, just as your web browser most likely ignores everything about the SVG linked above it can not understand.

Sure, but you can ignore extra attributes in JSON or hypothetical XML-without-namespacing too. I feel like there's an excluded middle here: either the content of a given tag has to be, say, SVG, in which case the validation schema for the outer document could just say (in a structured way) "the content of this tag must be a valid SVG document according to the SVG schema", or the content is some opaque arbitrary XML document, in which case there's no meaningful validation to be done.

Even when working with something like XHTML-with-embedded-SVG, I found myself wishing there was a way to strip the namespaces, run my xpath queries / xslt transformations on the stripped version, and then put the namespaces back; I think I'd've got my actual business tasks done a lot quicker that way.
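That workaround is straightforward to sketch with Python's stdlib ElementTree, which spells a namespaced tag as `{uri}local`; this strips namespaces in place (the inverse, "putting them back," would need a recorded tag-to-URI mapping):

```python
import xml.etree.ElementTree as ET

def strip_namespaces(root):
    """Remove namespace URIs from tags and attributes, in place."""
    for elem in root.iter():
        if '}' in elem.tag:
            elem.tag = elem.tag.split('}', 1)[1]
        elem.attrib = {
            (k.split('}', 1)[1] if '}' in k else k): v
            for k, v in elem.attrib.items()
        }
    return root

doc = ET.fromstring('<svg xmlns="http://www.w3.org/2000/svg"><rect/></svg>')
strip_namespaces(doc)
# doc.find("rect") now works without a namespace map
```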


Ignoring other attributes in data formats without namespaces is not as easy. What if one language is embedded in another and each one has a title element?

I do not know why you “feel” that way about the middle you want to exclude. It has been proven to be very useful in practice for me. Also without it, XML would not have the “extensible” property.

The way you describe working with “XHTML-with-embedded-SVG” reads to me like there is something about namespaces or your toolchain that you have difficulties with. I found that with XML-based systems, especially XSLT, it is easy to make a task needlessly complicated if one does not understand the details.


The creators of XML were aware that it was verbose; they mention in their design goals that this was the least priority.

Child and attribute "dichotomy" is not a mistake. What you mean is that these two samples appear to be equivalent:

    <foo value="123" />
    <foo>123</foo>
But they are not equivalent. The first line (with an attribute) is there solely for the computer. When the document is rendered, the human user is not supposed to see anything there unless the computer adds it.

The second line (with text content) is there for both the computer and the human user. The text "123" is for the human user; the fact that this text is something called "foo" is for the computer. When the document is rendered, the human user will see "123" here. Maybe computer will enhance something or maybe it will just use it as index or reference, whatever.

Most people who don't like XML seem to only encounter it in config files. In config files there's normally no content that needs to be there for the end users, so all data can happily go into attributes. The text content starts to matter when we deal with natural language texts.


> The creators of XML were aware that it was verbose; they mention in their design goals that this was the least priority.

Which seems pretty wasteful.

> Child and attribute "dichotomy" is not a mistake.

It's not for a markup format — as I mentioned, it can make sense there — but, as you mentioned, it doesn't make sense in a config or data file format.


The problem is that XML maps badly to the data structures in common programming languages. JSON maps perfectly to structs and data structures such as lists/arrays/maps.

S-expressions are good if you work with Lisp-like languages, but I don't think they're very readable if you're not into Lisp. I also can't see how they map easily onto the data structures of imperative programming languages, or even of statically typed functional programming languages like Haskell.


> S-expressions are good if you work with Lisp like languages, but I don't think they're very readable if you're not into Lisp.

Take a look at https://news.ycombinator.com/item?id=12198581; I think it demonstrates how readable one dialect of S-expressions can be.

> I also can't see how they map easily into datastructures of imperative programming languages

JSON consists of numbers, strings, booleans, objects and arrays; canonical S-expressions consist of bytes and lists. I contend that one can easily encode strings, numbers and booleans alike as bytes, and both objects and arrays as lists. Consider:

    {
        "id": 1234,
        "isEnabled": true,
        "props": ["abc", 123, false]
    }
This could be encoded in canonical S-expressions as:

    (object
     (id "1234")
     (is-enabled "true")
     (props (abc "123" "false")))
Granted, one still must convert the strings "1234," "true," "123," and "false" into the expected types, but with JSON one still must check the expected types anyway; it's not that big a difference.

And I honestly think that the S-expression version is far more attractive.


You could make it more like S-expressions in JS if you really wanted.

    {object: [
      {id: "1234"},
      {isEnabled: "true"},
      {props: ["abc", "123", "false"]}]}
Not quite the same, but nothing keeps you from parsing an array of key/value pairs instead of a hash.
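Python's stdlib `json` module supports exactly this via `object_pairs_hook`, which hands the decoder an ordered list of key/value tuples instead of building a dict:

```python
import json

pairs = json.loads('{"id": "1234", "isEnabled": "true"}',
                   object_pairs_hook=list)
# pairs == [("id", "1234"), ("isEnabled", "true")]
```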


You may not leave JSON object properties unquoted, so it'd have to read:

    {"object": [
      {"id": "1234"},
      {"isEnabled": "true"},
      {"props": ["abc", "123", "false"]}]}
So you have extraneous quotes, extraneous colons, extraneous commas, plus the parsing code is complicated by having to handle all of that rather than atoms & lists (not that that's a strong reason, since parsing code is written once and used millions of times).

I really, really don't get the visceral opposition to S-expressions. From my perspective they're both better & simpler.


There is a very big difference - "with JSON one still must check the expected types anyway" is not really true, I can deserialize an arbitrary json and I will know the difference between 123 and "123" even if I don't know what's expected or, alternatively, mixed-type values are expected.


> There is a very big difference - "with JSON one still must check the expected types anyway" is not really true, I can deserialize an arbitrary json and I will know the difference between 123 and "123" even if I don't know what's expected or, alternatively, mixed-type values are expected.

You will still need, in your code, to handle both 123 & "123" (or handle one, and error on the other). That's really no different from, in your code, parsing "123" as an integer, or throwing an error.

In JSON one must check that every value is the type one expects, or throw an error. With canonical S-expressions, one must parse that every value is the type one expects, or throw an error. There's really no difference.

If one is willing to use a Scheme or Common Lisp reader, of course, then numbers &c. are natively supported, at the expense of more quoting of strings (unless one chooses to use symbols …).


> You will still need, in your code, to handle both 123 & "123" (or handle one, and error on the other). That's really no different from, in your code, parsing "123" as an integer, or throwing an error.

It is different because in the latter case you have to write your own code to do it, while in the former your library will handle it for you.

> If one is willing to use a Scheme or Common Lisp reader, of course, then numbers &c. are natively supported, at the expense of more quoting of strings (unless one chooses to use symbols …).

So this format comes in dozens of partially-incompatible variants? Lovely.


> "The best human-readable data transfer format is probably canonical S-expressions"

I personally think TOML is a bit more readable...

https://github.com/toml-lang/toml


For configuration files, not for data serialisation.


Let's put it like this... what can you express in JSON that you couldn't express in TOML?


I can cleanly parse JSON, serialize it, and be confident I haven't lost anything. That can't be done for a language that allows comments without complicating the AST.


YAML is more readable than TOML though.



