A thought exercice: XML entities can be self referencing, have you ever though of how you validate a document that has recursive entity definitions ?
That's one quirk that comes first to mind, but the XML format definition is probably at leat a few hundred pages and I'm hopeful there's enough in there to fuel anyone's worst nightmares.
It's kinda interesting in itself that XML is seen as a plain and boring data format. I don't wish on anyone to be the receiving end of an XML typing genius' documents.
The XML specification does contain a schema: the DTD. This is why it is 37 pages; without the DTD part it would be at most a half of that.
Other schemas are not merely popular, they are more powerful. In DTD the type of the element depends only on its own name. In XSD the type depends potentially on the context: the type of 'xxx' in 'aaa/xxx' and 'bbb/xxx' may be different. And in Relax NG we can define different types, for example, for the first 'xxx' and for the subsequent 'xxx's. These extensions make the validation somewhat more elaborate, but still linear, as it we remain within the complexity of a regular expression. These are formal validators; then there is Schematron which seems to be more like a business-rule kind of a validator that also has its uses.
I think I was seeing all the XSLT and XQuery and all the kitchen sink directly bound to XML, but those are just meta tools that don't bear directly on the language.
A thought exercice: XML entities can be self referencing, have you ever though of how you validate a document that has recursive entity definitions ?
That's one quirk that comes first to mind, but the XML format definition is probably at leat a few hundred pages and I'm hopeful there's enough in there to fuel anyone's worst nightmares.
It's kinda interesting in itself that XML is seen as a plain and boring data format. I don't wish on anyone to be the receiving end of an XML typing genius' documents.