However, the fight occurs when XML is used as a generic serialization format, or even as syntax for a programming language. There, the features only add redundant complexity (what does unquoted text mean in an ant build file? why am I putting quotes around identifiers, and why am I including the redundant end tags when my editor indents for structure?). When your format already has specific semantics, the wiggle room between element/attribute/text only adds inflexible accidental complexity.
I think there'll be another chance when SOA/web services are superseded; but there's still huge scope for improvement within their present architecture/ecosystem (e.g. REST vs. SOAP vs. ?).
For programming languages, sexps are competing against all the other language syntaxes out there, not just XML. I agree that it seems odd that ant uses XML though... perhaps the extensibility of ant is easier with some kind of generic format (like XML/sexp/JSON)? yet other languages manage to be extensible via functions/classes/modules etc.
Thank goodness no one uses JSON to encode a language (in the way that ant uses XML).
Heh. Of course, they don't need to. As the French guy says in the Holy Grail, "we've already got one, it's very nice!"
JS's notation for data and code may not be identical, but they're close enough to get things done (that's JS's Lisp heritage showing). Since XML was explicitly designed to prevent people from doing a bunch of things they needed, it's not surprising that what resulted were monstrosities.
I suspect that XML is a manifestation of the kind of people who like to lock things down and specify them up front, until they're so tied up in knots of their own making that they form a committee to design the same thing all over again. As you may guess, I'm of the opposite camp. Happily, I can work in my medium and leave them to theirs.
[ "function", "byId", ["id"], [ "call", "document", "getElementById", ["id"] ] ]
["function", "byId", ["id"], [["return", ["call", "document", "getElementById", ["id"]]]]]
(basically lisp syntax substituting JS array notation.) My point is that JSON is capable of being terse, and XML is not. XML attributes are unordered, so you have to use child nodes, which have to be named. The best you could do is:
<func><sig name="byId"><arg name="id"/></sig><body><return><call object="document" method="getElementById"><args><variable name="id"/></args></call></return></body></func>
Which is significantly longer than a positional JSON serialization. XML is also harder to implement a parser for, and existing libraries tend to be difficult to use (python etree and ruby's implementations are a much better direction). Now, someone else's raw XML is often easy to understand, whereas my array based JSON format would clearly require domain knowledge. Because of this, I prefer JSON for small network requests that are consumed by scripting languages.
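For what it's worth, consuming the array-based encoding above really is trivial in a scripting language; here's a sketch in Python (the positional encoding itself is the hypothetical one from this comment, not any standard format):

```python
import json

# The positional, array-based encoding sketched above:
# ["function", name, [params], [body...]]
src = ('["function", "byId", ["id"],'
       ' [["return", ["call", "document", "getElementById", ["id"]]]]]')

kind, name, params, body = json.loads(src)  # one call, no schema setup
assert (kind, name, params) == ("function", "byId", ["id"])
assert body[0][0] == "return"
```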
For larger on-disk files, the overhead of XML is marginal, and the extra formatting might help with hand editing and error correction.
As for Greenspunning, I think it's a perspective issue. The example was one of code serialization, so the lisp syntax is particularly well suited to the problem. Programmers also have the domain knowledge, so the less verbose format is still easy to understand.
Actually, in my current project we use JSON to encode parse trees. It's no s-expressions, but it works pretty well.
The Ruby and Python bindings let you choose between JSON and the native hash/array or list/dictionary structures. You can be idiomatic and portable at the same time.
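In Python, for instance, the round trip between the native structures and JSON is lossless for the basic types, which is what makes "idiomatic and portable at the same time" work:

```python
import json

# Idiomatic Python structures on one side...
config = {"targets": ["build", "test"], "verbose": True}

# ...portable JSON on the wire, with nothing lost on the way back
wire = json.dumps(config)
assert json.loads(wire) == config
```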
Seems that it would be easier to write a build system as an embedded DSL in a general purpose language to begin with, and when further analysis tools wanted to be written, make the necessary changes (meanwhile old build scripts run just fine using the old library).
But then again, it's much easier to criticize with hindsight than to design a system that becomes popular.
Indeed - ant came about when XML was popular for far more things than it should have been used for, and java wouldn't make a very good declarative build language (which is what ant was targeting).
I found this statement in the article interesting: "The central idea of the XML family of standards is to separate code from data." It explains why all the systems that express programming constructs in XML are such monstrosities (including XSLT, which he cites approvingly and yet which totally violates this separation). I wonder what the author would say to the people who do this kind of thing? They're not using it according to specification?
Edit: some of the arguments are out of date, too. I don't know anything about Lisp documentation in LaTeX; the open-source Lisp world tends to generate HTML documentation from s-expressions, as for example here: http://www.weitz.de/cl-ppcre/.
By Google Trends, XML is winning 50 to 1 - but declining, while JSON is growing: http://www.google.com/trends?q=xml%2C+json&ctab=0
However, a factor is that people already know about XML and don't need to search for it. e.g. HTML is declining even faster: http://www.google.com/trends?q=xml%2C+json%2C+HTML&ctab=...
Do you have references for that exodus?
Not really. I'm talking about what I observe in the hacker world, which is a thoroughgoing trend away from XML. Do you really see it otherwise? It's not all going to JSON of course.
Most of the XML stuff is in big enterprise projects and, for some value of "count", those just don't count. Last I checked the IT pundits were declaring SOA dead, after having milked it for a decade.
The nice thing I see in XML is that it abstracts out grammars (using XML Schema / DTD). For JSON, a grammar isn't used - it's a nested tuple transmission format - sort of a dynamic type system, but without, er, types - just tuples that can contain anything. It's agile, and all you need in many cases. And JSON is a natural for web-client stuff.
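Concretely, without a schema language every structural check on JSON data has to be written by hand; a minimal sketch (the `valid_point` shape is invented for illustration):

```python
# No grammar to lean on: structure checks on JSON data are ad hoc code.
def valid_point(obj):
    return (isinstance(obj, dict)
            and isinstance(obj.get("x"), (int, float))
            and isinstance(obj.get("y"), (int, float)))

assert valid_point({"x": 1, "y": 2.5})
assert not valid_point({"x": 1})  # missing "y"
```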
BTW: who said SOA is dead? SOA doesn't solve any pressing problem, but all the vendors switched to it.
A friend who works in banking sent this to me, mainly because the two of us had been predicting it for years.
The post, incidentally, comes perilously close to saying that it's time to invent new bullshit acronyms because business people have stopped falling for "SOA". One could hardly ask for a better exposé of the IT racket.
Maybe, but I doubt it. When it comes to big companies, there are too many people making money off software development not getting better or cheaper. There's still too much of a culture gap between the execs who write the cheques and competent hackers. This creates opportunity for the incompetent-but-slick to step in as parasites. When I say incompetent, of course, I mean incompetent at software; they're quite competent at getting execs to write cheques. And that would be fine, except they're not adding any value (or at least not any value commensurate with what's spent). In other words, the market is simply inefficient.
Even when competent hackers work for such companies they are paid far less and have far less influence than the slickees. Moreover, the population of the competent is small, so they are drowned out demographically.
It will take a long time before the market rationalizes. I do believe this is happening, but slowly. One economic cycle won't turn it around, but I agree with you that it may help!
There's a range of people in any industry - I doubt that many fit the black-and-white stereotype that you paint. There are also real advantages of standardization (such as being able to swap different items in and out), which comes at a cost of inefficiency. The interfaces to those standards must be rigid (within tolerances), or you can't reliably swap the modules. It's a trade-off.
Re your second paragraph: Oh, come on. The XML standard doesn't allow two different programs that use XML to interoperate or one to be substituted for the other. It was never going to allow that, and it was obvious from the beginning that it was never going to allow it. It's like saying that if you're French and I'm German and we publish books using a standard font, we'll understand each other.
I was mainly addressing your comment elsewhere about people who like to lock things down, and specify them upfront. For interfaces, you really do need to agree on some things, and be strict (within those tolerances). Someone changing an interface on you can be pretty frustrating.
I worked with one of these industry-specific XML formats on another project. A bunch of oil companies took years to define it. Do you know what happened? First, the overwhelming majority of projects still use the old ASCII format which is much easier to work with. Second, those of us who tried out the new format soon found that the different vendors' implementation of this "standard" were incompatible with each other, and we had to write just as much vendor-specific code as before, only now the format was bloated and rigid and made it harder.
The whole approach just hasn't worked in practice, and if it were going to, it would have by now.
The nice thing I see in XML is that it abstracts out grammars (using XML Schema / DTD)
Have you ever used XML Schema on a real project? I tried, on a nice meaty project, for perhaps a year. It turned out to be as awful to work with in practice as it sounds good in theory. It's the kind of thing people write design specs for, and then after the standards are ratified they write books about it, without ever actually themselves building anything. Meanwhile, pity the poor schmucks who get the book and try to use it on a real system, wondering what they're doing wrong for a year until they finally figure out that the problem isn't them.
To give you an example: what happens when you give one of these systems a chunk of data that doesn't match its nicely specified schema? Well, with the tools we were using at the time, you get something like "Error type error the int at position 34,1 of element XSD:typeSpec:int32 type invalid blah blah blah". What can a system do with that other than tell its poor user, "Invalid data"?
Now I suppose you'll tell me that we just picked the wrong validator. :)
Disclaimer: I work for Mark Logic, which sells a schema-agnostic XML content server.
For a grammar specification language (like XML Schema) to do a really good job, it really should also formalize how to specify error messages for that particular grammar. I'm not sure how hard it would be to do this, and I haven't seen any research on it.
An odd thing about XML Schema is that it's not very resilient - when this was supposed to be one of the cool things about "extensible" XML. The next version is a little better at this. But it sounds like in your case, you wanted to get an error (because there was a real problem), it's just that you couldn't trace where it came from, or what its meaning was in terms of the system. It sounds like a hard problem. BTW: would using JSON or sexps have made this problem any easier? I think it's much deeper than that.
Getting back to XML... it's unsurprising that XML Schema isn't resilient. It's one of the most rigid technologies I've ever seen. Rigidity gets brittle as complexity grows. The error-handling fiasco of XML Schema isn't an accident. It's revealing of a core problem. Don't think you can sidestep the issue just by saying, well it's hard. :)
Would using JSON or sexps have made this problem any easier?
Sure. I've done it both ways, there's no comparison. It's not the data format alone that makes the difference, but the programming and thinking style that the format enables. These structures are malleable where XML is not. Malleability allows you to use a structure in related-but-different ways, which is what error handling requires (a base behavior and an error-handling one). It also makes it far easier to develop these things incrementally instead of having to design them up front. So this is far from the only problem they make easier.
The wire format doesn't affect this approach - it could be JSON or XML. However, JSON maps cleanly onto native data structures, because it's an object format already. To do the same thing with XML requires an extra level, and you get a meta-format like xmlrpc, which is pretty ugly.
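The extra level is easy to see with the standard library: the same structure serialized as JSON versus wrapped in the XML-RPC meta-format:

```python
import json
import xmlrpc.client

payload = {"user": "alice", "ids": [1, 2, 3]}

as_json = json.dumps(payload)                # the structure is the format
as_xmlrpc = xmlrpc.client.dumps((payload,))  # struct/member/value wrapping

assert "<struct>" in as_xmlrpc and "<member>" in as_xmlrpc
assert len(as_xmlrpc) > len(as_json)
```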
So I think you're talking about a kind of object serialization, with object-to-object data binding.
XML Schema is an attempt to factor out the grammar of the data structures, so that they can be checked automatically, and other grammar-based tasks can be automated. I think this is a worthy quest, succeed or fail. One specific failing we discussed was error messages.
I'm trying to grasp your point of view, and presenting what I think it is, so you tell me if I got it right or not (assuming you see this reply).
BTW: I meant that specific problem you mentioned (which was a non-conforming xml document) - how would JSON/sexps make that specific one easier to solve?
No, that's not the central idea of lisp. The closest "central idea of lisp" is that code should have a standard and convenient representation so it can be readily manipulated by programs.
Lispers separate application code from application data. They do so even when the application data is a program....
It is true that Lispers didn't engage in years of meta-whinging and defining transform languages, but the fact that the XML folk did was more of an accident of history. When the Semantic Web hype was in full swing, the Lispers were still licking their wounds from the AI winter.
And, how is that Semantic Web coming along? Pretty much where the Lispers left it....
Let's hear it for ASN.1 (http://en.wikipedia.org/wiki/ASN.1), so good that google can reinvent it as protocol buffers (http://code.google.com/apis/protocolbuffers) and get people interested.
More seriously, yes, binary formats have their problems. And ASN.1 in particular suffered/suffers from being pre-unicode (and thus having a number of different + mostly useless character set types).
But it seems to me that this nested tag-length-value structure (ASN.1 and protocol buffers) occupies a design sweet spot.
He claims that syntax is important, otherwise we'd still be using binary formats. If that's true then the only reason syntax is important is for humans, because machines can read binary formats just as well as anything else. So why compare it to s-expressions instead of to a human-friendly ideal? Is XML the best human-friendly markup language possible? Yes his paragraph is similarly readable in both forms, but would it be if wrapped in XML namespace clauses and an XML Schema? Would the s-expression version be if wrapped in a macro to parse text outside quotes as significant text?
The purpose of XML shouldn't be to make everything more XML-y, but to make everything easier. Maybe it does, in big enterprisey systems and between large data silos. Look at the recent ODF v. Office XML file formats. Have any of you been tempted to process Microsoft Office documents directly because now they're XML they must be easier than the old binary formats?
It's still not human friendly - you can't use the Jabber protocol by hand over telnet like you can use POP3, SMTP, IRC.
There's an intriguing view that text can't be used by hand either - you have to use a "text shell". Wouldn't a fair comparison use an "XML shell"? If it knew the schema, it could even autocomplete/intellisense for you, showing you the available choices at that point... it's pretty cool how XML has factored out the grammar of a language, in a reusable way.
I don't know the Jabber protocol, but the difficulty I've experienced with web-based XML protocols (web services) is that the HTTP header needs the length of the message, which is hard to do by hand - it's not due to XML (and an "HTTP shell" would fix that...)
<t>This is not bold.</t>
This isn't bold.
This isn't either.
(:p "Lisp doesn't require quoting of apostrophes (') or parentheses (()) either...")
In order to simplify parsing (and in order to enable parsing a well-formed document without knowing its DTD), XML threw out much of the syntactic sugar that SGML had to offer, such as implicitly closed tags. I.e., many people think that <tr><td>foo<td>bar... is invalid HTML because of the missing </td>. In fact, it is perfectly valid because the HTML (SGML-)DTD specifies that a <td>-tag implicitly closes a preceding <td>. Most think it's a browser-hack. It is not.
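The difference is easy to demonstrate with any conforming XML parser; a quick check in Python:

```python
import xml.etree.ElementTree as ET

# Valid HTML per the SGML DTD (the first <td> is implicitly closed),
# but not well-formed XML:
snippet = "<tr><td>foo<td>bar</td></tr>"

try:
    ET.fromstring(snippet)
    well_formed = True
except ET.ParseError:
    well_formed = False  # XML has no implicit closing

assert not well_formed
```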
What both XML (and SGML) add over S-EXPR is support for expressing grammars, a well-defined validation mechanism, and quite good support for coping with different character encodings.
The introduction covers the difficulty of parsing XML fairly well, which is another way of saying that XML does have a fairly complex "model" behind it that Lispers often ignore.
"They are not identical. The aspects you are willing to ignore are more important than the aspects you are willing to accept. Robbery is not just another way of making a living, rape is not just another way of satisfying basic human needs, torture is not just another way of interrogation. And XML is not just another way of writing S-exps. There are some things in life that you do not do if you want to be a moral being and feel proud of what you have accomplished."
Wasting what might amount to thousands of man-years by forcing talented people (who might otherwise accomplish something meaningful) to build workarounds for an inferior technology is every bit as criminal as, say, embezzlement.
Lisp programmers see everything as an ad hoc,
informally-specified, bug-ridden, slow implementation of half of Lisp,
and don't see other benefits it might have.
That is, Greenspun's Tenth Rule is true - for Lisp programmers.
Some XML standards fell into a similar trap, by wanting languages that process XML to be themselves written in XML - such as XSLT. It's a nice abstract concept to be able to process yourself... but at the price of abominations like "i &lt; 10".
Adam Bosworth pointed out that XML's XPath resisted this - by making XPath itself an embedded non-XML mini-language. Imagine an XML representation of path components - now that would be verbose!
In a politically expedient move, I'd like to point out that pg didn't fall into this trap: he made the DSL for users of Viaweb to customize their store to not be lisp (though an easily mapped subset, if I understand correctly.) It's a non-lisp mini-language.
Great link from the article, about "principle of least power", for mini-languages: http://www.w3.org/DesignIssues/Principles.html#PLP Constraints are very empowering, because you know what to expect.
Regarding XML: I'd always thought it was just one of many possible syntaxes for representing hierarchical data; and it really didn't matter which syntax you used. As in a lingua franca (or any standard), provided that it is barely adequate, the key thing is that everyone agrees on it. XML became the Chosen, de facto standard, because everyone was already familiar with HTML, propelled by the mass adoption of the web. So the question becomes: why did we get HTML (based on SGML), instead of S-expressions? The article gives reasons, but I guess the short of it is that if a group of people work towards a specific purpose for years, and are successful at it (as SGML was), it is probably a good base to start from if you want to do something similar, i.e. describe documents.
Also, more directly, if I imagine a large webpage described with S-expressions, I think HTML is a bit clearer.
Nitpick: The article omits that quotes (or apostrophes) must be escaped in XML attributes.
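Python's standard library handles exactly this case, for what it's worth:

```python
from xml.sax.saxutils import escape, quoteattr

value = 'he said "hi"'

# escape() with an extra entity map handles quotes in attribute values...
assert escape(value, {'"': "&quot;"}) == "he said &quot;hi&quot;"

# ...and quoteattr() picks a safe quoting style automatically
assert quoteattr(value) == "'he said \"hi\"'"
```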
Very telling points about LaTeX - that like XML/HTML, it also uses named end-tags; and that Lisp documentation itself uses LaTeX instead of S-expressions - drinking their kool-aid, but not eating their dog-food.
No, it's a demonstration of ignorance. LaTeX wasn't written by Lispers, it is merely used by them. The fact that they find its design decisions acceptable must be weighed against the cost of their alternatives. That doesn't imply that they wouldn't have been happier with a more lispish syntax.
At the time that those decisions were made, LaTeX was pretty much the best alternative. The fact that Lispers, like almost everyone else in related communities, made that decision merely says that Lispers don't cut off their noses to spite their face.
> If S-expressions were easier to edit, it would be most logical to edit the document in S-expressions and then write a small Scheme program to convert S-expressions into a formatting language like LaTeX. This is, what XML and SGML people have done for decades [...]
(1) As another comment points out, they have when doing so provided benefits.
(2) Lispers tend to be multi-lingual; they'll use other languages when appropriate. If XMLers can only work in XML....
>This is, what XML and SGML people have done for decades
Decades? 20 years/two decades ago is 1988. The first draft of XML is roughly 1998/10 years later/one decade ago. GML, a predecessor to SGML, didn't become public until 1973, but the "multiple use" stuff was still in the future.
SGML rode the WWW wave, but that didn't happen for technical reasons.
Note that the point of using s-expressions as a front-end would be programmatic generation, not by-human editing. (Neither s-expressions nor XML is actually all that friendly for editing text.)
Do XML folks really write front-ends for ease of editing?
How does he know they didn't?