Error messages are difficult to do right, and it's one area where (for example) ...

gruseom · on Jan 13, 2009

Agreed about errors. A good error-handling design for system X often needs to be nearly as complex as the design of X itself, and more importantly, needs to have the same "shape" as that design; it needs to fit the problem that X solves, speak the "language" that X and the users of X speak. Typically the amount of work involved, and the importance of it, are badly underestimated. Usually people work on what they think of as the cool parts and neglect the rest. (This is the reason DSL error handling tends to suck.) Maybe they try to hack the rest in later. By then it's much harder -- you have to rework the kernel to allow for the right kind of hooks into it so your error messages can have enough meaning. The advent of exceptions, by the way, was a huge step backward in this respect. It made it easy to just toss the whole problem up the stack, metaphorically and literally!

Getting back to XML... it's unsurprising that XML Schema isn't resilient. It's one of the most rigid technologies I've ever seen. Rigidity gets brittle as complexity grows. The error-handling fiasco of XML Schema isn't an accident. It's revealing of a core problem. Don't think you can sidestep the issue just by saying, well it's hard. :)

Would using JSON or sexps have made this problem any easier?

Sure. I've done it both ways; there's no comparison. It's not the data format alone that makes the difference, but the programming and thinking style that the format enables. These structures are malleable where XML is not. Malleability allows you to use a structure in related-but-different ways, which is what error handling requires (a base behavior and an error-handling one). It also makes it far easier to develop these things incrementally instead of having to design them up front. So this is far from the only problem they make easier.

11ren · on Jan 17, 2009

With malleability, I think you're talking about low-level control, where you work directly in terms of the data structures that will be serialized as JSON. You might be translating between the domain data structures and the JSON structure; or they might appear directly as JSON. This is malleable in that you can tweak it however you want; and it's simple in that you have direct access to everything. You can do validation in the same way. If something goes wrong, you have all the information available to deal with it as you see fit.
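To make that concrete, here is a rough sketch of validating directly against the parsed structures. Everything in it is an assumption for illustration: the `validate_order` function, the order fields (`customer`, `items`, `qty`), and the `$.items[1]` path notation are hypothetical, not anyone's actual code.

```python
import json


def validate_order(order):
    """Return a list of (path, message) problems found in a parsed order.

    Hypothetical example: the field names and rules are made up.
    """
    if not isinstance(order, dict):
        return [("$", "order must be an object")]

    problems = []
    if not isinstance(order.get("customer"), str):
        problems.append(("$.customer", "customer must be a string"))

    items = order.get("items")
    if not isinstance(items, list) or not items:
        problems.append(("$.items", "order needs at least one item"))
        return problems

    for i, item in enumerate(items):
        path = f"$.items[{i}]"
        if not isinstance(item, dict):
            problems.append((path, "item must be an object"))
            continue
        qty = item.get("qty")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(qty, int) or isinstance(qty, bool) or qty < 1:
            problems.append(
                (f"{path}.qty", f"qty must be a positive integer, got {qty!r}")
            )
    return problems


doc = json.loads(
    '{"customer": "acme", "items": [{"sku": "a1", "qty": 2}, {"sku": "b7", "qty": 0}]}'
)
problems = validate_order(doc)
for path, message in problems:
    print(f"{path}: {message}")
```

Because the validator walks the same structure the program already uses, each problem can carry the exact location and the offending value, and the caller decides what to do with the list instead of catching a single exception.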

The wire format doesn't affect this approach - it could be JSON or XML. However, JSON maps cleanly onto data structures, because it's an object format already. To do the same thing with XML requires an extra level, and you get a meta-format like xmlrpc, which is pretty ugly.

So I think you're talking about a kind of object serialization, with object-to-object data binding.

XML Schema is an attempt to factor out the grammar of the data structures, so that they can be checked automatically, and other grammar-based tasks can be automated. I think this is a worthy quest, succeed or fail. One specific failing we discussed was error messages.

I'm trying to grasp your point of view, and presenting what I think it is, so you tell me if I got it right or not (assuming you see this reply).

11ren · on Jan 17, 2009

Incidentally, I was just parsing some XML Schema documents, and the error messages were more helpful than I expected - they gave the rule of the grammar that was causing problems. However, this rule looked like it was taken from the English specification of XML Schema, when it could be (and should be) automatically inferred from the machine-readable version of the grammar (i.e. the XML Schema for XML Schema documents).
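A toy illustration of what "inferred from the machine-readable grammar" could mean, assuming a made-up dict-based schema rather than real XML Schema (`SCHEMA`, `describe_rule`, and `check` are all hypothetical names): the text of the violated rule is rendered from the schema itself, so it cannot drift from what is actually enforced.

```python
# Hypothetical mini-grammar: each field has a type and a required flag.
SCHEMA = {
    "name": {"type": str, "required": True},
    "age": {"type": int, "required": False},
}


def describe_rule(field, rule):
    """Render a rule as text straight from the machine-readable schema."""
    need = "required" if rule["required"] else "optional"
    return f"{field}: {need} {rule['type'].__name__}"


def check(record, schema=SCHEMA):
    """Return one message per violated rule, quoting the rule itself."""
    errors = []
    for field, rule in schema.items():
        if field not in record:
            if rule["required"]:
                errors.append(
                    f"missing field; violated rule [{describe_rule(field, rule)}]"
                )
            continue
        value = record[field]
        if not isinstance(value, rule["type"]):
            errors.append(
                f"got {type(value).__name__} {value!r}; "
                f"violated rule [{describe_rule(field, rule)}]"
            )
    return errors


errors = check({"age": "forty"})
for line in errors:
    print(line)
```

Changing an entry in `SCHEMA` changes both the check and the wording of the message, which is the property a hand-copied rule from an English specification lacks.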

11ren · on Jan 13, 2009

Interesting, thanks. Would you say this "malleability" issue would be addressed if the XML could be bound (databinding) to arbitrarily different object structures?

BTW: I meant that specific problem you mentioned (which was a non-conforming XML document) - how would JSON/sexps make that specific one easier to solve?