> What I don't understand is why anyone thought using XML that way was a good idea, and why it still is popular in the enterprise. Bad habits are hard to break, I guess.
Namespaces, which then give you easy answers for internationalisation (xml:lang) and a subject-predicate-object data structure (RDF), which can lead on to logical modelling of data (RDFS/OWL), which in turn lets you look at harder questions like trust and provenance.
There's also schema validation (XSD) and transformation (XSLT), which in turn gives you tools like XPath.
The real problem is not syntax, it's communication between groups with differing experiences and interests - how do I know your messages mean the same thing as what my system expects?
If you prove to be malicious, do I have to write a strict validator before I trust your input?
If you want to ensure your messages are well formed before they are sent, do you also have to write a validator?
How do I know our validators are checking the same things?
If you want to send a large document-oriented data structure but I only care about a specific section relating to my interests, do I have to understand where to look and what all of the surrounding material is, or can I query for the relevant bits?
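To make the "query for the relevant bits" point concrete, here is a rough sketch using the limited XPath subset in Python's standard-library ElementTree. The order document, its element names and the shipping_countries helper are all made up for illustration; a real exchange would have its own schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical message: a larger order document, of which the reader
# only cares about the shipping section.
MESSAGE = """\
<order id="A-17">
  <customer><name>Example Ltd</name></customer>
  <items>
    <item sku="X1" qty="2"/>
    <item sku="Y9" qty="1"/>
  </items>
  <shipping>
    <address country="NZ">1 Example Street</address>
  </shipping>
</order>
"""


def shipping_countries(xml_text: str) -> list:
    """Return the country of every shipping address, ignoring everything else."""
    root = ET.fromstring(xml_text)
    # ".//" searches all descendants; "[@country]" keeps only addresses
    # that actually carry a country attribute.
    matches = root.findall(".//shipping/address[@country]")
    return [address.get("country") for address in matches]


print(shipping_countries(MESSAGE))  # ['NZ']
```

The receiver never has to model the customer or items sections, only the path to the part it cares about.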
On the more complicated RDF side of things - if you want to share identifiers with me, how do we both avoid calling everything record id=1?
If we are both talking about the same thing but know different parts of the story, how can I recognize your information as describing the same thing I know about?
If we both know about the same Thing, and know certain logical facts about that Thing, can we check those facts actually make sense against shared rules?
If we both know about the same Thing, and can see a logical inconsistency in data, can we reason about which data to Trust and why?
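As a rough illustration of the identifier questions above, the sketch below models triples as plain Python tuples rather than using real RDF tooling. The example.org IRI is hypothetical, the predicates borrow the Dublin Core terms namespace, and the "single-valued" rule is an assumption standing in for the kind of constraint you would normally express in RDFS/OWL.

```python
# Triples are (subject, predicate, object); IRIs are plain strings here.
BOOK = "https://example.org/id/book/moby-dick"  # hypothetical shared identifier
DC = "http://purl.org/dc/terms/"                # Dublin Core terms namespace

# Each party knows a different part of the story about the same Thing.
mine = {
    (BOOK, DC + "title", "Moby-Dick"),
    (BOOK, DC + "creator", "Herman Melville"),
}
yours = {
    (BOOK, DC + "title", "Moby-Dick"),
    (BOOK, DC + "issued", "1851"),
}


def describe(graph, subject):
    """Collect predicate -> set of objects for one subject."""
    description = {}
    for s, p, o in graph:
        if s == subject:
            description.setdefault(p, set()).add(o)
    return description


def conflicts(graph, subject, single_valued):
    """Return predicates assumed to be single-valued that carry several values."""
    description = describe(graph, subject)
    return {
        p: values
        for p, values in description.items()
        if p in single_valued and len(values) > 1
    }


# Because both sides use the same IRI, merging is just a set union.
merged = mine | yours
print(len(merged))  # 3: the shared title triple collapses into one

# A third source disagrees about the publication date.
disputed = merged | {(BOOK, DC + "issued", "1852")}
print(conflicts(disputed, BOOK, {DC + "issued"}))
```

Spotting the inconsistency is the easy part; deciding which source to trust still needs provenance information that this sketch does not model.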
Unfortunately, communicating properly is hard even with all of the tools to help.
We tend to opt towards subjecting systems to an ongoing fuzzing test because we don't value many of the above things - we tend to work in organisations with a short attention span focused on the now and a narrow set of interests.
It just kind of works 80% of the time, so we move on.
Contrast that with something like a library or museum, and you see why ideas like Dublin Core really catch on there.
Sounds great in theory. In practice it doesn't seem nearly as carefully implemented, and/or XML is used where it's actually not needed.
XML is designed to be a markup language. The fact that it has all of these other things bolted on doesn't actually make it a good generic data interchange format.
For things like RDF, maybe it's the best option we have, but that's not because XML is great, it's because XML was used in the only standardized option.
Looking at an example of xml:lang:
<?xml version="1.0" encoding="utf-8" ?>
<doc xml:lang="en">
  <list title="Titre en français" xml:lang="fr">
    <p>Texte en français.</p>
    <p xml:lang="fr-ca">Texte en québécois.</p>
    <p xml:lang="en">Second text in English.</p>
  </list>
  <p>Text in English.</p>
</doc>
...this is a nightmare. If I want to translate a document, the last thing I want to do is embed each translation inline like that. Almost certainly the best response is to "fork" the document at the highest level and ship separate language versions; otherwise, with 20 translations, the document carries 20x the text that any one reader will need.
Yes, XML gives you that particular hammer. But using XML results in a lot of sore thumbs.
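For what it's worth, here is roughly what a consumer has to do to pull a single language out of a document like the xml:lang example: walk the tree and carry the inherited xml:lang down to each element. This is a sketch with Python's standard-library ElementTree; paragraphs_in is a made-up helper, the sample drops the XML declaration for brevity, and matching is exact (so "fr" does not match "fr-ca"), which is a simplification.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

DOC = """\
<doc xml:lang="en">
  <list title="Titre en français" xml:lang="fr">
    <p>Texte en français.</p>
    <p xml:lang="fr-ca">Texte en québécois.</p>
    <p xml:lang="en">Second text in English.</p>
  </list>
  <p>Text in English.</p>
</doc>
"""


def paragraphs_in(xml_text, wanted):
    """Return the text of every <p> whose effective xml:lang equals `wanted`."""
    root = ET.fromstring(xml_text)
    found = []

    def walk(element, inherited):
        # An element without its own xml:lang inherits its parent's value.
        lang = element.get(XML_LANG, inherited)
        if element.tag == "p" and lang is not None and lang.lower() == wanted.lower():
            found.append(element.text)
        for child in element:
            walk(child, lang)

    walk(root, None)
    return found


print(paragraphs_in(DOC, "en"))  # ['Second text in English.', 'Text in English.']
print(paragraphs_in(DOC, "fr"))  # ['Texte en français.']
```

Every reader still has to download and parse all of the languages to get at one of them, which is the cost being complained about.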
Schema validation is nice, to be sure. I'm using JSON Schema validation myself [1] to verify incoming JSON, and I'm automatically generating those schemas from the TypeScript data structure specifications [2]. This is particularly good for a JavaScript language target, of course, but I find XML and XPath to be ugly or painfully slow in every language I've used them from, while JSON just has a better impedance match to data storage and interchange.
Most of that is on the front page for the technology: https://www.w3.org/standards/xml/