Error messages are difficult to do right, and it's one area where (for example) DSLs tend to fall down. You might have a beautiful DSL, and think that it's finished, because you - as the designer - don't make mistakes with it (perhaps because you're really smart; really know the tool; or really haven't used it). Even some fully fledged languages have poor error reporting.
For a grammar specification language (like XML Schema) to do a really good job, it really should also formalize how to specify error messages for that particular grammar. I'm not sure how hard it would be to do this, and I haven't seen any research on it.
An odd thing about XML Schema is that it's not very resilient - when this was supposed to be one of the cool things about "extensible" XML. The next version is a little better at this. But it sounds like in your case, you wanted to get an error (because there was a real problem), it's just that you couldn't trace where it came from, or what its meaning was in terms of the system. It sounds like a hard problem. BTW: would using JSON or sexps have made this problem any easier? I think it's much deeper than that.
Agreed about errors. A good error-handling design for system X often needs to be nearly as complex as the design of X itself, and more importantly, needs to have the same "shape" as that design; it needs to fit the problem that X solves, speak the "language" that X and the users of X speak. Typically the amount of work involved, and the importance of it, are badly underestimated. Usually people work on what they think of as the cool parts and neglect the rest. (This is the reason DSL error handling tends to suck.) Maybe they try to hack the rest in later. By then it's much harder -- you have to rework the kernel to allow for the right kind of hooks into it so your error messages can have enough meaning. The advent of exceptions, by the way, was a huge step backward in this respect. It made it easy to just toss the whole problem up the stack, metaphorically and literally!
Getting back to XML... it's unsurprising that XML Schema isn't resilient. It's one of the most rigid technologies I've ever seen. Rigidity gets brittle as complexity grows. The error-handling fiasco of XML Schema isn't an accident. It's revealing of a core problem. Don't think you can sidestep the issue just by saying, well it's hard. :)
Would using JSON or sexps have made this problem any easier?
Sure. I've done it both ways; there's no comparison. It's not the data format alone that makes the difference, but the programming and thinking style that the format enables. These structures are malleable where XML is not. Malleability allows you to use a structure in related-but-different ways, which is what error handling requires (a base behavior and an error-handling one). It also makes it far easier to develop these things incrementally instead of having to design them up front. So this is far from the only problem they make easier.
With malleability, I think you're talking about low-level control, where you work directly in terms of the data structures that will be serialized as JSON. You might be translating between the domain data structures and the JSON structure; or they might appear directly as JSON. This is malleable in that you tweak it however you want; and it's simple in that you have direct access to everything. You can do validation in the same way. If something goes wrong, you have all the information available to deal with it as you see fit.
The wire format doesn't affect this approach - it could be JSON or XML. However, JSON and data structures map cleanly, because it's an object format already. To do the same thing with XML requires an extra level, and you get a meta-format like xmlrpc, which is pretty ugly.
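To make that concrete, here is a rough sketch of what I mean by validating directly against the parsed structure. Everything in it is made up for illustration (the `validate_order` function, the `items`/`qty` fields); the only point is that the check walks the same data the program uses, so each problem can be reported with its exact location and the offending value.

```python
import json


def validate_order(order, path="order"):
    """Return a list of (path, message) problems; an empty list means valid.

    The order/items/qty shape is a hypothetical example, not a real format.
    """
    if not isinstance(order, dict):
        return [(path, "expected an object")]

    problems = []
    items = order.get("items")
    if not isinstance(items, list) or not items:
        problems.append((f"{path}.items", "expected a non-empty list"))
        items = []

    for i, item in enumerate(items):
        qty = item.get("qty") if isinstance(item, dict) else None
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(qty, int) or isinstance(qty, bool) or qty < 1:
            problems.append(
                (f"{path}.items[{i}].qty",
                 f"expected a positive integer, got {qty!r}")
            )
    return problems


doc = json.loads('{"items": [{"qty": 2}, {"qty": 0}]}')
problems = validate_order(doc)
for where, what in problems:
    print(f"{where}: {what}")
```

Because the validator returns plain data instead of raising, the caller can decide what an error means in its own terms: log it, show it to a user, or repair the document and carry on.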
So I think you're talking about a kind of object serialization, with object-to-object data binding.
XML Schema is an attempt to factor out the grammar of the data structures, so that they can be checked automatically, and other grammar-based tasks can be automated. I think this is a worthy quest, succeed or fail. One specific failing we discussed was error messages.
I'm trying to grasp your point of view, and presenting what I think it is, so you tell me if I got it right or not (assuming you see this reply).
Incidentally, I was just parsing some XML Schema documents, and the error messages were more helpful than I expected - they gave the rule of the grammar that was causing problems. However, this rule looked like it was taken from the English specification of XML Schema, when it could be (and should be) automatically inferred from the machine-readable version of the grammar (i.e. the XML Schema for XML Schema documents).
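As a toy illustration of inferring the message from the machine-readable grammar, something like the following would do. The `GRAMMAR` table, its occurrence markers and the `check` function are all invented for this sketch; it is not how any real XML Schema processor works, and it only counts children and ignores their order.

```python
from collections import Counter

# Hypothetical grammar: element -> list of (child, occurrence marker).
GRAMMAR = {
    "order": [("customer", "1"), ("item", "+")],
    "item": [("sku", "1"), ("qty", "1")],
}

# Occurrence marker -> (minimum, maximum); None means unbounded.
BOUNDS = {"1": (1, 1), "?": (0, 1), "+": (1, None), "*": (0, None)}


def render_rule(name):
    """Render the rule for `name` as text, straight from the grammar data."""
    parts = [child + ("" if occ == "1" else occ) for child, occ in GRAMMAR[name]]
    return f"{name} ::= " + " ".join(parts)


def check(name, children):
    """Check child counts against the grammar; messages quote the violated rule."""
    counts = Counter(children)
    errors = []
    for child, occ in GRAMMAR[name]:
        lo, hi = BOUNDS[occ]
        n = counts.pop(child, 0)
        if n < lo or (hi is not None and n > hi):
            errors.append(
                f"<{name}>: found {n} <{child}>, violating rule: {render_rule(name)}"
            )
    # Anything left over is not mentioned by the rule at all.
    for extra in counts:
        errors.append(
            f"<{name}>: unexpected <{extra}>, not in rule: {render_rule(name)}"
        )
    return errors


errors = check("order", ["customer", "customer", "note"])
for message in errors:
    print(message)
```

The rule text in each message is generated from the same table the checker reads, so the two cannot drift apart the way a hand-copied English description can.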
Interesting, thanks. Would you say this "malleability" issue would be addressed if the XML could be bound (databinding) to arbitrarily different object structures?
BTW: I meant that specific problem you mentioned (which was a non-conforming XML document) - how would JSON/sexps make that specific one easier to solve?