XML is an embarrassment. It solves no problem: not even the problem of agreeing on how to represent data. The only thing it does is give programmers something recognizable to fiddle with, however irrelevant to the problem it may be.
Most religious wars in computer science hinge on matters of taste. If you prefer emacs to vi, maybe that's just your style. If you prefer PHP to Ruby, there may be several good reasons why.
There is no such ambiguity in the case of XML. If you prefer XML to anything but XML, you don't know what you're talking about. You should have no say in anything that affects other programmers.
We're in this mess because of the unforeseen popularity of the web. When the web was created, its designers chose a simple and not particularly good markup language. Then the web grew, and instead of everybody recognizing the language as bad and replacing it, we turned a blind eye to its faults and kept it around.
The immense popularity of the web has glossed over all the deficiencies present in markup languages. People can't imagine that anything that built the internet might have something wrong with it. The internet is good, so anything that built the internet must be good as well.
Markup language was ill-conceived. Generalizing it into XML was folly.
How can you possibly take XML seriously? How do you squeeze an entire blog post out of it? Have you never bothered to look at the technology? The author is obviously capable of writing a coherent, well-thought-out essay. Did he never stop and look at what he was doing and go, "This is a whole lot of shit!"
Yes. The fact that XML parsers are pervasive is a good thing, and an advantage for XML as a technology. But it says absolutely nothing about XML as a metadata format. A standard parser suite for anything would have the same advantages.
Note also that there are other technologies with pervasively available parsers, like JSON, which don't share any of XML's warts.
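For example (a sketch in Python, not from the post): with JSON, the whole negotiation for sending a list is one stdlib call on each side.

```python
import json

# Sender: serialize the list. Receiver: one stdlib call gives the
# list back directly, with no DOM to walk and no schema to agree on.
wire = json.dumps(["a", "b", "c"])
received = json.loads(wire)
print(received)  # ['a', 'b', 'c']
```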
This is entirely the problem. The XML didn't solve anything. We still have to negotiate the terms of the transfer. We've agreed to use XML, true, but we are still at square one.
Of course you have to negotiate the terms of the transfer. You have to do that using any format (custom binary, XML, JSON, s-expressions, whatever). XML just defines a lot of common syntactic stuff which you would have to define anyway in any format you decide on.
I would argue that there's very little value in standardizing that syntactic stuff. Whatever tiny amount of value there might be is probably destroyed by picking a convention as almost universally inappropriate as XML.
So the fact that 96% of the characters you are sending are markup is not a downside?
S-expressions solved this problem a long time ago. (a b c) is only 57% markup. I don't think you can get much more succinct than that and still express the idea of a list.
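To make the arithmetic concrete, here's a quick sketch (the XML equivalent is my own invention, not from the article):

```python
def markup_ratio(text, payload):
    # Fraction of characters that are delimiters/markup rather than data.
    overhead = len(text) - sum(len(p) for p in payload)
    return overhead / len(text)

print(round(markup_ratio("(a b c)", "abc"), 2))  # 0.57
xml = "<list><item>a</item><item>b</item><item>c</item></list>"
print(round(markup_ratio(xml, "abc"), 2))  # 0.95
```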
XML has redundancy by design. It's a deliberate trade-off: easier to read and write by hand, at the cost of size.
If message size is an issue, gzip the data. Or if you have very specific processing-speed needs, look into something like Google's protocol buffers.
You are optimizing at the wrong level if you are concerned about a few extra characters in a human-readable data exchange format.
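A rough sketch of the gzip point (invented data, but the effect is typical): repetitive tags compress very well, so the wire-size argument against XML mostly evaporates.

```python
import gzip

# A deliberately tag-heavy document: 1000 <item> elements.
xml = b"<list>" + b"".join(b"<item>%d</item>" % i for i in range(1000)) + b"</list>"
packed = gzip.compress(xml)
# The redundant markup compresses away almost entirely.
print(len(xml), len(packed))
```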
But is it really easier to read and write? Properly-indented S-expressions are just as readable. Generating XML and then gzip-ing it is a lot more work (and requires a lot more libraries) than generating S-expressions.
Perhaps the real problem is that too many people use terrible text editors. Paren-matching and auto-indentation makes writing S-expressions orders of magnitude easier, and at least a constant factor easier than writing XML.
You are right about text editors. XML was designed to be reasonably easy to write and edit by humans without specialized software. The redundant end tag helps to catch errors and makes the structure more explicit.
Sure, everyone could just use a fancy specialized editor with paren-matching and auto-indentation. But one of the goals of XML was precisely that it should not rely on specialized software to be able to read and write.
Your example with the table is a lot clearer with sexpr syntax because you don't actually have any content in the table. Try again with a few sentences of mixed content, some bolded words, a link, and so on, and you will get my point.
Note that you would also need to gzip your s-expressions if you are concerned about size.
The HTML of your comment, with some formatting added:
<span class="comment">
<font color=#000000>
You are right about text editors. XML was designed to be reasonably
easy to write and edit by humans without specialized software. The
redundant end tag helps to catch errors and makes the structure more
explicit.
<p>
Sure, everyone could just use a fancy specialized editor with
paren-matching and auto-indentation. But one of the goals of XML was
precisely that it should not rely on specialized software to be
able to read and write.
<p>
Your example with the table is a lot clearer with sexpr syntax
<b>because you don't actually have any content in the table.</b>
Try again with a few sentences of mixed content, some bolded
words, a link, and so on, and you will get my point.
<p>
Note that you would also need to gzip your s-expressions if you are
concerned about size.
</font>
</span>
The same thing in S-expressions (an invented syntax):
(span (class . comment)
(font (color . #000000)
You are right about text editors. XML was designed to be reasonably
easy to write and edit by humans without specialized software. The
redundant end tag helps to catch errors and makes the structure more
explicit.
(p)
Sure, everyone could just use a fancy specialized editor with
paren-matching and auto-indentation. But one of the goals of XML was
precisely that it should not rely on specialized software to be
able to read and write.
(p)
Your example with the table is a lot clearer with sexpr syntax
(b because you don't actually have any content in the table). Try
again with a few sentences of mixed content, some bolded words, a
link, and so on, and you will get my point.
(p)
Note that you would also need to gzip your s-expressions if you are
concerned about size.))
It's really not a lot different. Of course, parens would need to be escaped, but this is no different from needing to escape < and >.
But one of the goals of XML was precisely that it should not rely on specialized software to be able to read and write.
But this is the problem with XML... it does rely on special libraries to validate and parse into reasonable data structures. It requires special heuristics to describe how to recover nicely in the event that markup isn't valid. It requires a document describing exactly what the XML needs to look like.
S-expressions are easier to parse, less verbose, and can accomplish all the same tasks and more, all while being more flexible in general.
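To put a number on "easier to parse": a toy s-expression reader (atoms and nested lists only, no strings or escapes; a sketch, not production code) fits in about a dozen lines of Python.

```python
def parse_sexp(src):
    # Tokenize by padding parens with spaces, then build the tree
    # with a simple recursive reader.
    tokens = src.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        if tokens[pos] == "(":
            out, pos = [], pos + 1
            while tokens[pos] != ")":
                node, pos = read(pos)
                out.append(node)
            return out, pos + 1
        return tokens[pos], pos + 1

    tree, _ = read(0)
    return tree

print(parse_sexp("(a (b c) d)"))  # ['a', ['b', 'c'], 'd']
```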
What you have done in your example is reinvent XML with round parentheses instead of pointy brackets. Why is this better? The only difference is that you leave out the redundant end tags, which are there for good reason.
Yes, XML requires a library to parse; so do s-expressions! The reason XML seems more complex than sexprs is that it defines a higher-level syntax, e.g. with the element/attribute distinction. You have reinvented that yourself in your example, so you need a spec for it and you need the parser to support it. Also, the rules for encodings and character sets have to be specified (e.g. how do you detect the encoding of a file? Which characters count as whitespace?). You will end up with a spec much like XML, except with round parentheses. (OK, XML is also complex because of DTDs, but that is an optional part. If you want something like DTDs for sexprs, again, you have to specify it, and you get something like XML.)
Btw, there are no heuristics for recovery in XML. XML parsers must fail when encountering malformed syntax. This is one of the major (and controversial) differences between XML and HTML.
I appreciate s-expressions as a syntax for a programming language. But code is a very different use case than documents. I wouldn't like to program in XML syntax either! E.g. programs (hopefully) don't have deeply nested structures covering several pages. That is common in documents, hence the importance of the redundant end tag.
I like sexprs for code and data, but for documents they are only simpler if you ignore a lot of real-world issues.
BTW, the HTML is not valid XML, so your example is a bit misleading. In valid XML, the P elements contain the paragraphs rather than delimit them. The XML would be more verbose, since it needs end tags for P:
<p>Note that you would also need to gzip your
s-expressions if you are concerned about size.</p>
The s-expr OTOH would be more confusing, because there isn't a clear distinction between element-name and content:
(p Note that you would also need to gzip your
s-expressions if you are concerned about size.)
You might want to choose a different syntax to make the distinction clearer:
(p "Note that you would also need to gzip your
s-expressions if you are concerned about size.")
or:
((p) Note that you would also need to gzip your
s-expressions if you are concerned about size.)
In the end, you have to make some of the same trade-off decisions that the designers of SGML and XML did. Just saying that s-expressions are simpler than XML is like saying ASCII is simpler than s-expressions: True, but kind of missing the point.
Well, in the case of using XML as markup (what it was designed for), it is clearer than the s-exp. The only time I like XML editing is DocBook, because when you end a tag you never have to bounce back up (which may be more than a screen away) to know what tag you are in.
That does not seem to me to be typical XML, and if it is, it's really being stretched to do something it's not intended to do, IMO. XML's strength is in representing tree-based structures, but that appears to be an attempt to represent an associative structure. With Sexps, this is just as easy:
(sizes (dress . 5) (pants . 7) (shoes . 11))
But in doing that, the structure really looks off, even though it's almost exactly mirroring the XML. I think this is a clue that the XML is a bit of a stretch. Much better (in Lisp code) is:
(let ((sizes '((dress . 5) (pants . 7) (shoes . 11))))
; do something with sizes
...)
But I guess the real question is what this is trying to represent. If it's the sizes of various people, then the Sexps are quite simple:
(sizes (alice (dress . 5) (pants . 7) (shoes . 11))
       (bob (dress . 6) (pants . 8) (shoes . 12)))
The obvious XML rendering is something like
<sizes><person name='alice'><dress>5</dress><pants>7</pants><shoes>11</shoes></person></sizes>
But now we're getting away from the structure we defined using S-expressions, and besides, name='...' seems to be distinctly different information from the sizes themselves, so something else is strange. Perhaps
<sizes><alice><dress>5</dress><pants>7</pants><shoes>11</shoes></alice></sizes>
Well, that looks nice, and closer to what we are trying to represent, but of course it's impossible to validate (at least from what I know of XML), since the person names are not part of our schema. We'll have to do something like:
<sizes>
  <person>
    <name>alice</name>
    <dress>5</dress>
    <pants>7</pants>
    <shoes>11</shoes>
  </person>
</sizes>
Great! Now we have something that matches our desired structure and is easy to validate. Of course, it's much more verbose, but that made it easier to read and write, right?
Part of the problem with XML is that it causes these huge debates about how to structure and name the data. Another problem is that attributes don't nest nicely; that was the main problem in this instance. In other words, XML can be used nicely to represent a tree structure and reasonably well for lists or simple associative structures. But as soon as those associative elements need to map to something more complicated, you start having issues with how best to structure everything.
With S-expressions easily able to express assoc-lists while also being trivially nestable, these issues don't come up.
"That does not seem to be to be typical XML, and if it is, it's really being stretched to do something it's not intended to do, IMO. XML's strength is in representing tree-based structures, but that appears to be an attempt to represent an associative structure. "
It's fairly typical of the XML I've used, and well within what XML was intended to do. That some (many?) people end up with needlessly verbose markup is not the fault of XML. Some people write verbose Scheme. Go figure.
A major point of XML is not just simple tree data, but meta-data. You first showed a basic, non-annotated list; I showed a list with meta-data. It seems that you didn't like how the s-exp version of that XML looked, so you changed the use case.
"But in doing that, the structure really looks off, even though it's almost exactly mirroring the XML. I think this is a clue that the XML is a bit of a stretch. "
Or just maybe it's an example where XML differs from s-exps.
"Part of the problem with XML is that it causes these huge debates about how to structure and name the data."
Not really. I mean, some people like that stuff (I see it as a bike-shed thing; it's a chance to show off how complex people can make something), but many other folks find a sparse, good-enough structure and move on. Quite honestly, the way you exaggerated the initial example is a pure strawman. And you can have the same arguments about representation using s-expressions.
Don't blame a syntax because it allows people to be dopey.
I can see how nicely s-exp can work for markup, but I'm still curious how name-spaces, schema, ID + IDREF, transclusion, and other XML features are handled in s-expressions.
I mostly get the feeling that the only real gripe about XML is the duplication in the closing tags. (The W3C has explained why they dropped the short-form of XML and went with explicit end tags.)
It's not just about the amount of markup, though, it's about the unnecessary complexity of the markup. The software that generates and parses S-expressions is much simpler than that which generates and parses XML. Of course, in Lisp, it's just
(let ((list (read data)))
...)
But even in Python, you could easily hack together (not recommended) something like
items = data.strip("()").split()
...
Of course, that's not robust, but a robust s-expression library is still much simpler than an XML library that requires a C SAX parser just to be usably fast.
If the argument is that XML is more human-readable, that is implying that it's being human-modified, and then XML creates more work since it's so verbose. If the verbosity is not an issue because it's auto-generated, that implies that it's not being read/modified by humans, and the whole point of using XML in the first place is lost. I just can't see any problem that XML solves that S-expressions didn't already solve in a simpler way.
S-expressions are nice but not superior to XML for all use cases. S-expression syntax is optimized for lists of names and numbers; XML syntax is optimized for structured documents. Since XML is used just as much for data as for documents these days, s-expressions would perhaps be just as good as XML for a common data-exchange meta-format. But that train left the station a decade ago.
I suspect one reason XML caught on and s-expressions didn't (outside of the Lisp niche) is that XML tackled difficult internationalization issues like different encodings and character sets head on.
He said:
With XML, we can easily agree and collaborate on a format and both of our languages have builtin libraries to extract the data we need.
You replied:
OK, I'm going to send you a list using XML.
The only thing you have proved here is that your blind hatred for XML makes you unable to read and parse what is posted.
When arguing against the evils of a standardised format which there are proper parsers for everywhere, failing to parse stuff yourself is probably not on the list of things you want to do.
You missed the point. After you've agreed on XML, you still have to agree on how to represent a list. You can use existing libraries to parse the XML, but you still have to write a parser to transform the resulting DOM into a list.
Plus, given that the availability of good XML parsers is one of the primary advantages claimed for XML, can anyone name some XML parsers that aren't ridiculously slow? Maybe I haven't looked hard enough, but I always seem to find that my own code to parse ad-hoc formats goes 10x faster than, e.g., expat parsing XML.
You missed the point. After you've agreed on XML, you still have to agree on how to represent a list.
No I didn't. Because that's exactly what the guy in the OP said. Agree on an XML schema for the data. After you've done that, writing some simple XPath to get your data is done in minutes.
The only times my XML code is 100% DOM is when I need to make things from scratch or do XML data manipulation.
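For instance, with Python's stdlib ElementTree (element names invented for illustration), pulling the agreed-upon fields out is a couple of lines:

```python
import xml.etree.ElementTree as ET

doc = "<sizes><size type='dress'>5</size><size type='shoes'>11</size></sizes>"
root = ET.fromstring(doc)
# Once the schema is agreed, a simple path expression extracts the data.
values = {e.get("type"): int(e.text) for e in root.findall("./size")}
print(values)  # {'dress': 5, 'shoes': 11}
```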
An embarrassment that solves no problem, not even the problem of agreeing on how to represent data?
Surely you jest. In the real world (well, my world anyway), the problem of how to transfer bits of data around in a file format that wildly different systems can understand is a major problem indeed, even if it's not a terribly sexy one. The admittedly ugly, HTML-inspired representation of data may not be anywhere near a complete solution to the data representation problem, but it's most definitely better than not making a stab at solving the problem at all.
I'm confused as well. XML is fantastic for providing an easy way to roll customized data storage and interchange documents.
The article is pointing to technologies like JSON as a "backlash" to XML. I only use JSON when sending PHP objects directly to javascript to manipulate. Why create a data interchange format if you don't need to?
XML is like Java. The language isn't friendly, but the platform has so many man-years invested into it that it may be the best tool for the job.
Maybe my "taste" is to have a solid platform at the expense of some syntactic niceties. Any any case, saying that anyone who disagrees with you is certainly ignorant just makes you come off as foolish.
It's not as dramatic as the article makes it seem.
XML is and was a great solution for a huge number of things. Overexcited developers were a tad too eager to put it to use for things outside its true scope (regardless of what it was marketed as), and now they're realizing that there are better alternatives for those particular applications.
XML was and still remains an excellent solution for the problem it originally solved: a joint human and machine readable markup language.
Uh... a tad too eager? Dude, we crossed that threshold with Ant or JSP. By the time XSLT and XQuery rolled around, we were looking at a full-on stampede of developer group-think.
XSLT happens to be my pet peeve. I simply can't understand how anyone would have ever looked at that problem ("how to turn a source data document into a presentation format" -- something that has been solved sanely a thousand times by obscure technologies like "scripts", or "PHP") and decided that the best way to handle it was a Turing-complete pattern-matching language written in XML itself! I mean, it looks more like a torture device than a programming language...
Yes, XSLT written in XML itself is abominable. It's easier to read if the program and data have different formats (Lisp notwithstanding...). But it's the escaping of less-than signs that really got me: having to write (i &lt; 5) instead of (i < 5).
But transforming trees with a functional programming approach (not PHP) is a natural fit.
XML is popular simply because HTML made the web happen. XML looked like HTML so people understood what it was for immediately, and then it got overused.