
XML is not S-Expressions - yters
http://www.prescod.net/xml/sexprs.html
======
mindslight
XML definitely adds features over sexps, and is good for adding markup to
primarily text-based documents, where the concept of bare text has an
intuitive meaning, and elements denote a formally defined machine-readable
layer on top.

However, the fight occurs when XML is used as a generic serialization format,
or even as syntax for a programming language. There, the features only add
redundant complexity (what does unquoted text mean in an ant build file? why
am I putting quotes around identifiers, and why am I including the redundant
end tags when my editor indents for structure?). When your format already has
specific semantics, the wiggle room between element/attribute/text only adds
inflexible accidental complexity.
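
(To make that wiggle room concrete - an illustrative sketch with a made-up
record, not from any real schema - the same data can be spelled at least
three ways in XML, where a JSON encoding has one obvious shape:)

    <user id="42" name="Alice"/>
    <user id="42"><name>Alice</name></user>
    <user><id>42</id><name>Alice</name></user>

    {"id": 42, "name": "Alice"}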

~~~
11ren
JSON is another contestant in the generic data serialization space, and it is
doing better than sexps - possibly due to the ubiquity of Javascript (free-
riding on the web). Though, for this technology generation, the fight is over.

I think there'll be another chance when SOA/web services are superseded; but
there's still huge scope for improvement within their present
architecture/ecosystem (e.g. REST vs. SOAP vs. ?).

For programming languages, sexps are competing against all the other language
syntaxes out there, not just XML. I agree that it seems odd that ant uses XML
though... perhaps the extensibility of ant is easier with some kind of generic
format (like XML/sexp/JSON)? Yet other languages manage to be extensible via
functions/classes/modules etc.

Thank goodness no one uses JSON to encode a language (in the way that ant uses
XML).

~~~
gruseom
_Thank goodness no one uses JSON to encode a language (in the way that ant
uses XML)._

Heh. Of course, they don't need to. As the French guy says in the Holy Grail,
"we've already got one, it's very nice!"

~~~
11ren
If Javascript used JSON in the way I described, it would look _something_ like
this ( _view page source_ for original):

    
    
    {
      "function": {
        "signature": {
          "name": "byId",
          "args": ["id"]
        },
        "body": {
          "return": {
            "call": {
              "target": "document",
              "method": "getElementById",
              "args": ["id"]
    }}}}}
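
(For reference, the JavaScript this encodes - recoverable from the structure
above - is just:)

    function byId(id) { return document.getElementById(id); }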

~~~
ynniv
Let's assume that an array whose first element is the identifier "function" is
a function declaration, with the additional positional arguments _name_,
_argList_, and _body_. Now we get something like this:

[ "function", "byId", ["id"], [ "call", "document", "getElementById", ["id"] ]
]

~~~
11ren
Yeah, I was thinking that. You're also omitting an explicit _return_ , I would
guess by assuming the value of the last statement (as an expression) is to be
returned (this won't work in general, because Javascript allows multiple
returns, like C or Java). It's closer to a lisp syntax, by using nested lists
instead of structs/maps. I submit that it's against the spirit of JSON to be
able to name the values, and then not use that ability. I think you're
Greenspunning it ;-). Sure is shorter though. :-)

~~~
ynniv
I could include multiple statements and allow returns like this:

["function", "byId", ["id"], [["return" ["call", "document", "getElementById",
["id"]]]]]

(basically lisp syntax substituting JS array notation.) My point is that JSON
is capable of being terse, and XML is not. XML attributes are unordered, so
you have to use child nodes, which have to be named. The best you could do is:

<func><sig name="byId"><arg name="id"/></sig><body><return><call
object="document" method="getElementById"><args><variable
name="id"></args></call></return></body></func>

Which is significantly longer than a positional JSON serialization. XML is
also harder to implement a parser for, and existing libraries tend to be
difficult to use (Python's etree and Ruby's implementations are a much better
direction). Now, someone else's raw XML is often easy to understand, whereas
my array-based JSON format would clearly require domain knowledge. Because of
this, I prefer JSON for small network requests that are consumed by scripting
languages.

For larger on-disk files, the overhead of XML is marginal, and the extra
formatting might help with hand editing and error correction.

As for Greenspunning, I think it's a matter of perspective. The example was
one of code serialization, so the Lisp syntax is particularly well suited to
the problem. Programmers also have the domain knowledge, so the less verbose
format is still easy to understand.

------
gruseom
The article is refuted both by reality -- the godawful messes that people
have made out of XML, the complexity of the systems built on top of it, its
failure to amount to more than a bloated machine format, and the exodus in
favor of more lightweight notations like JSON -- and by itself: the author
expects the reader to consider the proliferation of single-shot technologies
built on XML a good thing and not a Tower of Babel. He lists DTDs, RELAX, XML
Schema, XPointer, XQuery, XPath (edit: whoops, I snuck that one in), XSLT,
and CSS, and cites the "decades of person-effort embodied in those
specifications" as if it were an argument in their favor!

I found this statement in the article interesting: "The central idea of the
XML family of standards is to separate code from data." It explains why all
the systems that express programming constructs in XML are such monstrosities
(including XSLT, which he cites approvingly and yet which totally violates
this separation). I wonder what the author would say to the people who do this
kind of thing? They're not using it according to specification?

Edit: some of the arguments are out of date, too. I don't know anything about
Lisp documentation in LaTeX; the open-source Lisp world tends to generate HTML
documentation from s-expressions, as for example here:
<http://www.weitz.de/cl-ppcre/>.

~~~
11ren
Do you have references for that exodus?

By Google Trends, XML is winning 50 to 1 - but declining, while JSON is
growing:
[http://www.google.com/trends?q=xml%2C+json&ctab=0](http://www.google.com/trends?q=xml%2C+json&ctab=0)

However, a factor is that people already know about XML and don't need to
search for it. E.g., HTML is declining even faster:
[http://www.google.com/trends?q=xml%2C+json%2C+HTML&ctab=...](http://www.google.com/trends?q=xml%2C+json%2C+HTML&ctab=0&geo=all&date=all&sort=0)

~~~
gruseom
Ooh, duelling URLs, can I play? :)

<http://www.google.com/trends?q=xml%2C+javascript>

 _Do you have references for that exodus?_

Not really. I'm talking about what I observe in the hacker world, which is a
thoroughgoing trend away from XML. Do you really see it otherwise? It's not
all going to JSON of course.

Most of the XML stuff is in big enterprise projects and, for some value of
"count", those just don't count. Last I checked the IT pundits were declaring
SOA dead, after having milked it for a decade.

~~~
11ren
Hmmm... I don't think hackers ever went _towards_ XML. The old C hackers hated
it (too inefficient).

The nice thing I see in XML is that it abstracts out grammars (using XML
Schema / DTD). For JSON, a grammar isn't used - it's a nested tuple
transmission format - sort of a dynamic type system, but without, er, types -
just tuples that can contain anything. It's agile, and all you need in many
cases. And JSON is a natural for web-client stuff.
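
(To make the grammar-checking idea concrete, a minimal sketch using Python's
lxml - the two-field schema is made up for the example; XMLSchema, validate,
and error_log are the real lxml API:)

    from lxml import etree

    # a hypothetical two-field "point" schema, inlined for the example
    schema = etree.XMLSchema(etree.fromstring(b"""
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="point">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="x" type="xs:int"/>
            <xs:element name="y" type="xs:int"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>"""))

    doc = etree.fromstring(b"<point><x>1</x><y>oops</y></point>")
    print(schema.validate(doc))  # False: "oops" is not an xs:int
    print(schema.error_log)      # the validator's (notoriously terse) reason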

BTW: who said SOA is dead? SOA doesn't solve any pressing problem, but all the
vendors switched to it.

~~~
gruseom
By the way (I can't resist one more comment), apropos this:

 _The nice thing I see in XML is that it abstracts out grammars (using XML
Schema / DTD)_

Have you ever used XML Schema on a real project? I tried, on a nice meaty
project, for perhaps a year. It turned out to be as awful to work with in
practice as it sounds good in theory. It's the kind of thing people write
design specs for, and then after the standards are ratified they write books
about it, without ever actually themselves building anything. Meanwhile, pity
the poor schmucks who get the book and try to use it on a real system,
wondering what they're doing wrong for a year until they finally figure out
that the problem isn't them.

To give you an example: what happens when you give one of these systems a
chunk of data that doesn't match its nicely specified schema? Well, with the
tools we were using at the time, you get something like "Error type error the
int at position 34,1 of element XSD:typeSpec:int32 type invalid blah blah
blah". What can a system do with that other than tell its poor user, "Invalid
data"?

Now I suppose you'll tell me that we just picked the wrong validator. :)

~~~
11ren
Error messages are difficult to do right, and it's one area where (for
example) DSLs tend to fall down. You might have a beautiful DSL and think that
it's finished, because you - as the designer - don't make mistakes with it
(perhaps because you're really smart, really know the tool, or really haven't
used it). Even some fully fledged languages have poor error reporting.

For a grammar specification language (like XML Schema) to do a really good
job, it really should also formalize how to specify error messages for that
particular grammar. I'm not sure how hard it would be to do this, and I
haven't seen any research on it.

An odd thing about XML Schema is that it's not very resilient - when
resilience was supposed to be one of the cool things about "extensible" XML.
The next version is a little better at this. But it sounds like in your case
you wanted to get an error (because there was a real problem); it's just that
you couldn't trace where it came from, or what it meant in terms of the
system. It sounds like a hard problem. BTW: would using JSON or sexps have
made this problem any easier? I think it's much deeper than that.

~~~
gruseom
Agreed about errors. A good error-handling design for system X often needs to
be nearly as complex as the design of X itself, and more importantly, needs to
have the same "shape" as that design; it needs to fit the problem that X
solves, speak the "language" that X and the users of X speak. Typically the
amount of work involved, and the importance of it, are badly underestimated.
Usually people work on what they think of as the cool parts and neglect the
rest. (This is the reason DSL error handling tends to suck.) Maybe they try to
hack the rest in later. By then it's much harder -- you have to rework the
kernel to allow for the right kind of hooks into it so your error messages can
have enough meaning. The advent of exceptions, by the way, was a huge step
backward in this respect. It made it easy to just toss the whole problem up
the stack, metaphorically and literally!

Getting back to XML... it's unsurprising that XML Schema isn't resilient. It's
one of the most rigid technologies I've ever seen. Rigidity gets brittle as
complexity grows. The error-handling fiasco of XML Schema isn't an accident.
It's revealing of a core problem. Don't think you can sidestep the issue just
by saying, well it's hard. :)

 _Would using JSON or sexps have made this problem any easier?_

Sure. I've done it both ways, there's no comparison. It's not the data format
alone that makes the difference, but the programming and thinking style that
the format enables. These structures are malleable where XML is not.
Malleability allows you to use a structure in related-but-different ways,
which is what error handling requires (a base behavior and an error-handling
one). It also makes it far easier to develop these things incrementally
instead of having to design them up front. So this is far from the only
problem they make easier.

~~~
11ren
With malleability, I think you're talking about low-level control, where you
work directly in terms of the data structures that will be serialized as JSON.
You might be translating between the domain data structures and the JSON
structure, or they might appear directly as JSON. This is malleable in that
you can tweak it however you want, and it's simple in that you have direct
access to everything. You can do validation in the same way. If something goes
wrong, you have all the information available to deal with it as you see fit.

The wire format doesn't affect this approach - it could be JSON or XML.
However, JSON maps cleanly onto data structures, because it's an object format
already. To do the same thing with XML requires an extra level, and you get a
meta-format like XML-RPC, which is pretty ugly.
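
(A sketch of the difference, using only Python's standard library; the record
is made up for the example:)

    import json
    import xmlrpc.client

    # JSON text maps straight onto native dicts/lists/strings/numbers
    obj = json.loads('{"name": "byId", "args": ["id"]}')
    assert obj["args"] == ["id"]

    # the same record through the XML-RPC meta-format: an extra layer of
    # <struct>/<member>/<value> convention on top of plain XML
    payload = xmlrpc.client.dumps(({"name": "byId", "args": ["id"]},))
    print(payload)  # ...<struct><member><name>name</name>...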

So I think you're talking about a kind of object serialization, with object-
to-object data binding.

XML Schema is an attempt to factor out the grammar of the data structures, so
that they can be checked automatically, and other grammar-based tasks can be
automated. I think this is a worthy quest, succeed or fail. One specific
failing we discussed was error messages.

I'm trying to grasp your point of view, and presenting what I think it is, so
you tell me if I got it right or not (assuming you see this reply).

------
anamax
From the article: "Nor is it an accident of history that Lisp programmers
never came up with these technologies for Lisp data. The central idea of the
XML family of standards is to separate code from data. The central idea of
Lisp is that code and data are the same and should be represented the same."

No, that's not the central idea of lisp. The closest "central idea of lisp" is
that code should have a standard and convenient representation so it can be
readily manipulated by programs.

Lispers separate application code from application data. They do so even when
the application data is a program....

It is true that Lispers didn't engage in years of meta-whinging and defining
transform languages, but the fact that the XML folk did was more of an
accident of history. When the Semantic Web hype was in full swing, the Lispers
were still licking their wounds from the AI winter.

And how is that Semantic Web coming along? Pretty much where the Lispers
left it....

------
jbert
No love for the binary formats?

Let's hear it for ASN.1 (<http://en.wikipedia.org/wiki/ASN.1>), so good that
google can reinvent it as protocol buffers
(<http://code.google.com/apis/protocolbuffers>) and get people interested.

More seriously, yes, binary formats have their problems. And ASN.1 in
particular suffered/suffers from being pre-unicode (and thus having a number
of different + mostly useless character set types).

But it seems to me that this nested tag-length-value structure (ASN.1 and
protocol buffers) occupies a design sweet spot.
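
(A toy sketch of that shape in Python: one-byte tag, four-byte big-endian
length, then the raw value. The details are invented for the example; real
ASN.1 BER and protobuf wire formats differ.)

    import struct

    def encode_tlv(tag: int, value: bytes) -> bytes:
        # tag, length, value; nesting falls out for free, because a value
        # may itself be a concatenation of TLV records
        return struct.pack(">BI", tag, len(value)) + value

    def decode_tlv(buf: bytes, offset: int = 0):
        tag, length = struct.unpack_from(">BI", buf, offset)
        start = offset + 5  # 1 tag byte + 4 length bytes
        return tag, buf[start:start + length], start + length

    # a record (tag 1) wrapping two string fields (tag 2)
    message = encode_tlv(1, encode_tlv(2, b"hello") + encode_tlv(2, b"world"))
    tag, body, _ = decode_tlv(message)  # tag == 1, body holds the nested TLVs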

------
jodrellblank
He makes some good points that I didn't consider when first I hated upon XML.
I agree that XML is better for its problem domain than s-expressions, but in
that case the whole article is an apples to oranges comparison.

He claims that syntax is important, otherwise we'd still be using binary
formats. If that's true, then the only reason syntax is important is for
humans, because machines can read binary formats just as well as anything
else. So why compare it to s-expressions instead of to a human-friendly ideal?
Is XML the best human-friendly markup language possible? Yes, his paragraph is
similarly readable in both forms, but would it be if wrapped in XML namespace
clauses and an XML Schema? Would the s-expression version be, if wrapped in a
macro to parse text outside quotes as significant text?

The purpose of XML shouldn't be to make everything more XML-y, but to make
everything easier. Maybe it does, in big enterprisey systems and between large
data silos. Look at the recent ODF v. Office XML file formats. Have any of you
been tempted to process Microsoft Office documents directly because now
they're XML they must be easier than the old binary formats?

It's still not human-friendly - you can't use the Jabber protocol by hand over
telnet like you can use POP3, SMTP, or IRC.

~~~
11ren
> you can't use the Jabber protocol by hand over telnet like you can use POP3,
> SMTP, IRC.

There's an intriguing view that text can't be used by hand either - you have
to use a "text shell". Wouldn't a fair comparison use an "XML shell"? If it
knew the schema, it could even autocomplete/intellisense for you, showing you
the available choices at that point... it's pretty cool how XML has factored
out the grammar of a language, in a reusable way.

I don't know the Jabber protocol, but the difficulty I've experienced with
web-based XML protocols (web services) is that the HTTP header needs the
length of the message (Content-Length), which is hard to compute by hand -
it's not due to XML (and an "http shell" would fix that...)

~~~
hassy
http-twiddle mode for Emacs is handy for tinkering with web-based XML stuff:

<http://github.com/hassy/http-twiddle/>

------
Hexstream
"The XML one does not use standard human-punctuation characters as markup. It
doesn't require quoting of apostrophes, double quotes or parentheses."

(:p "Lisp doesn't require quoting of apostrophes (') or parentheses (())
either...")

------
voidpointer
I think when arguing about XML, many people forget its relationship to SGML.
Many of the drawbacks in XML are design decisions made in order to maintain
backwards compatibility with SGML tools that were in use when XML was devised.

In order to simplify parsing (and to enable parsing a well-formed document
without knowing its DTD), XML threw out much of the syntactic sugar that SGML
had to offer, such as implicitly closed tags. E.g., many people think that
<tr><td>foo<td>bar... is invalid HTML because of the missing </td>. In fact,
it is perfectly valid, because the HTML (SGML) DTD specifies that a <td> tag
implicitly closes a preceding <td>. Most think it's a browser hack. It is not.
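
(That is, the SGML DTD supplies the implied end tags, so the fragment above
parses as if written:)

    <tr><td>foo</td><td>bar</td>...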

What both XML (and SGML) add over S-EXPR is support for expressing grammars, a
well-defined validation mechanism, and quite good support for coping with
different character encodings.

------
wingo
I'm surprised no one mentioned SXML, Kiselyov's representation of XML in
Scheme. His paper, "A better XML parser through functional programming", can
be found here: <http://okmij.org/ftp/papers/XML-parsing.ps.gz>.

The introduction covers the difficulty of parsing XML fairly well, which is
another way of saying that XML does have a fairly complex "model" behind it
that Lispers often ignore.
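
(For the curious: SXML encodes elements, attributes, and text as nested lists,
so a fragment like <p class="x">hi</p> comes out roughly as:)

    (p (@ (class "x")) "hi")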

~~~
anamax
Lispers know that XML has a fairly complex model behind it. That's part of
the argument against XML: the complexity doesn't seem to buy much.

------
asciilifeform
Erik Naggum on XML vs. S-Expressions:

"They are not identical. The aspects you are willing to ignore are more
important than the aspects you are willing to accept. Robbery is not just
another way of making a living, rape is not just another way of satisfying
basic human needs, torture is not just another way of interrogation. And XML
is not just another way of writing S-exps. There are some things in life that
you do not do if you want to be a moral being and feel proud of what you have
accomplished."

(<http://www.schnada.de/grapt/eriknaggum-xmlrant.html>)

~~~
eru
Pretty strong words for such a trifle.

~~~
asciilifeform
Yet it is not a trifle.

Wasting what might amount to thousands of man-years by forcing talented people
(who might otherwise accomplish something meaningful) to build workarounds for
an inferior technology is every bit as criminal as, say, embezzlement.

~~~
eru
By the same logic, QWERTY forcing talented people to develop RSI should get
the same treatment...

~~~
asciilifeform
I can see no reason why not.

------
11ren
I have a corollary to Greenspun's Tenth Law, which is that:

    
    
      Lisp programmers see everything as an ad hoc, informally-specified,
      bug-ridden, slow implementation of half of Lisp, and don't see other
      benefits it might have.
      That is, Greenspun's Tenth Law is true - for Lisp programmers.
    

I came to this conclusion because of a tragic pair of research papers, which
had a fantastic usability idea. The second half of the first paper took the
focus off the usability, and developed it into a very simple functional
language. In their next paper, they dropped the fantastic usability idea
completely, and made it into a lisp. :-(

Some XML standards fell into a similar trap, by wanting languages that process
XML to be themselves written in XML - such as XSLT. It's a nice abstract
concept to be able to process yourself... but at the price of abominations
like "i &lt; 10".

Adam Bosworth pointed out that XML's XPath resisted this - by making XPath
itself an embedded non-XML mini-language. Imagine an XML representation of
path components - now _that_ would be verbose!

In a politically expedient move, I'd like to point out that pg didn't fall
into this trap: he made the DSL for Viaweb users to customize their stores
_not_ be Lisp (though an easily mapped subset, if I understand correctly).
It's a non-Lisp mini-language.

Great link from the article about the "principle of least power" for mini-
languages: <http://www.w3.org/DesignIssues/Principles.html#PLP> Constraints
are very empowering, because you know what to expect.

Regarding XML: I'd always thought it was just one of many possible syntaxes
for representing hierarchical data; and it really didn't matter which syntax
you used. As in a lingua franca (or any standard), provided that it is barely
adequate, the key thing is that everyone agrees on it. XML became the Chosen,
de facto standard, because everyone was already familiar with HTML, propelled
by the mass adoption of the web. So the question becomes: why did we get HTML
(based on SGML), instead of S-expressions? The article gives reasons, but I
guess the short of it is that if a group of people work towards a specific
purpose for years, and are successful at it (as SGML was), it is probably a
good base to start from if you want to do something similar, i.e. describe
documents.

Also, more directly, if I imagine a large webpage described with
S-expressions, I think HTML is a bit clearer.

Nitpick: The article omits that quotes (or apostrophes) must be escaped in XML
attributes.
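
(E.g., inside a double-quoted attribute value, embedded double quotes must
become entities:)

    <a title="say &quot;hi&quot;">...</a>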

Very telling points about LaTeX: that, like XML/HTML, it also uses named end-
tags; and that Lisp documentation itself is written in LaTeX instead of
S-expressions - drinking the Kool-Aid, but not eating their own dog food.

~~~
anamax
> Very telling points about LaTeX - that like XML/HTML

No, it's a demonstration of ignorance. LaTeX wasn't written by Lispers; it is
merely used by them. The fact that they find its design decisions acceptable
must be weighed against the cost of their alternatives. That doesn't imply
that they wouldn't have been happier with a more lispish syntax.

At the time those decisions were made, LaTeX was pretty much the best
alternative. The fact that Lispers, like almost everyone else in related
communities, made that decision merely says that Lispers don't cut off their
noses to spite their faces.

~~~
11ren
The article suggests Lispers could have used sexps as a front-end to LaTeX, in
the same way that XML was used as a front-end to LaTeX. Very easy to do.

> If S-expressions were easier to edit, it would be most logical to edit the
> document in S-expressions and then write a small Scheme program to convert
> S-expressions into a formatting language like LaTeX. This is what XML and
> SGML people have done for decades [...]
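
(A minimal sketch of such a converter in Python, with already-parsed sexps
stood in for by nested tuples; the mapping rules are invented for the
example:)

    def sexp_to_latex(node):
        # strings are plain text; ("emph", ...) becomes \emph{...}, and so on
        if isinstance(node, str):
            return node
        head, *args = node
        return "\\%s{%s}" % (head, "".join(sexp_to_latex(a) for a in args))

    print(sexp_to_latex(("emph", "very ", ("textbf", "easy"), " to do")))
    # -> \emph{very \textbf{easy} to do}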

~~~
anamax
Note that such front-ends are inherently leaky. If all they do is transform
syntax, they're probably a bad idea.

Note that the point of using s-expressions as a front-end would be
programmatic generation, not by-human editing. (Neither s-expressions nor XML
is actually all that friendly for editing text.)

Do XML folks really write front-ends for ease of editing?

~~~
eru
S-Expressions are OK to work with as a human if you have a decent editor that
helps you indent and balance parens.

------
echair
...but it would be better if it were.

