That point about arrays is a weakness in XML that I rarely see addressed. Arrays and lists are such common data structures in almost every programming language of the past 40 years that not having first-class syntax for representing them is absurd, and it makes XML a non-starter for me.
The qualities of sets that arrays don't have (and vice versa) are irrelevant to the point of neither being implicitly representable in XML.
You're either providing complex objects as properties or you're providing a list of complex objects. Worse, you can have a combination of both. Without a schema it is not possible to infer whether either or both is happening.
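To make the ambiguity concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. The element names (parent, child, name) and the helper naive_to_dict are made up for illustration; the only point is that a schema-less converter has to guess, and a list of one looks exactly like a single property.

```python
import xml.etree.ElementTree as ET

def naive_to_dict(elem):
    """Hypothetical schema-less converter: leaf elements become strings,
    other elements become dicts keyed by child tag."""
    if len(elem) == 0:
        return elem.text
    result = {}
    for child in elem:
        value = naive_to_dict(child)
        if child.tag in result:
            # Only a *repeated* tag reveals that this was meant as a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result

one = ET.fromstring("<parent><child><name>a</name></child></parent>")
two = ET.fromstring(
    "<parent><child><name>a</name></child><child><name>b</name></child></parent>"
)

single = naive_to_dict(one)  # "child" comes back as a dict
double = naive_to_dict(two)  # "child" comes back as a list of dicts
```

With one child the converter cannot tell "a property that is a complex object" from "a list with one element", so the shape of the result changes with the data.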
No, you've misunderstood my point. This doesn't work for cases where one child is in fact a property that is a complex object.
XML claims to solve the problem of attributes vs children but then falls at the first hurdle by not distinguishing between a single complex object as an attribute and an array of complex objects as children.
JSON and YAML do not have this problem as they are explicit in their representation.
YAML example:
parent:
  child: name
vs
parent:
  - child: name
Try converting each of these to JSON. The former will give you an object property called parent with a child key; the latter will give you an array property called parent with one element.
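As a rough illustration, the two JSON strings below are what I would expect a YAML-to-JSON converter to produce for the two YAML documents, assuming the child lines are nested under parent. Only the standard json module is used; the variable names are arbitrary.

```python
import json

# First YAML form: parent holds a single mapping.
as_object = json.loads('{"parent": {"child": "name"}}')

# Second YAML form: parent holds a sequence with one mapping in it.
as_array = json.loads('{"parent": [{"child": "name"}]}')

print(type(as_object["parent"]).__name__)
print(type(as_array["parent"]).__name__)
print(len(as_array["parent"]))
```

The container type is decided by the syntax ({} vs []), not by how many children happen to be present.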
I think the verbosity is not a problem. For example if you compare
["string1", "string2"]
to
<list>
<e>string1</e>
<e>string2</e>
</list>
then each element has about four bytes of overhead (<e> instead of " and </e> instead of ",) plus some overhead for the list itself, which may be offset by putting the name of the list itself into the element.
However, the issue is that you have to write a custom parser. There is no direct mapping between your data structure and the XML file. These developer ergonomics are a big win for JSON and, consequently, YAML.
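A small sketch of that difference in Python, using only the standard library. The <list>/<e> convention is the one from the example above; list_to_xml and xml_to_list are hypothetical helpers that have to be written by hand for that convention, whereas the JSON side needs no mapping code at all.

```python
import json
import xml.etree.ElementTree as ET

strings = ["string1", "string2"]

# JSON: the list maps directly onto the format, in both directions.
json_text = json.dumps(strings)
json_round_trip = json.loads(json_text)

# XML: the <list>/<e> convention has to be spelled out, in both directions.
def list_to_xml(items):
    root = ET.Element("list")
    for item in items:
        ET.SubElement(root, "e").text = item
    return ET.tostring(root, encoding="unicode")

def xml_to_list(text):
    return [e.text for e in ET.fromstring(text).findall("e")]

xml_text = list_to_xml(strings)
xml_round_trip = xml_to_list(xml_text)
```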
> There is no direct mapping between your data structure and the XML file.
I think that's by design, tbh.
It's only a big win for JSON (and YAML) because the default case works OK. But every time someone has a problem parsing numbers in JSON (because the value is bigger than the host language's maximum integer, e.g. Integer.MAX_VALUE in Java), this is the cause.
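A quick way to see the number problem, sketched in Python. Python's own json module keeps integers exact, so parse_int=float is used here purely as a stand-in for a host language or parser that maps every JSON number onto an IEEE 754 double; that substitution is my assumption, not something any particular parser is claimed to do.

```python
import json

big = "9007199254740993"  # 2**53 + 1, not exactly representable as a double

# Default behaviour in Python: arbitrary-precision int, value preserved.
exact = json.loads(big)

# Stand-in for a parser that stores all numbers as doubles.
lossy = json.loads(big, parse_int=float)

print(exact)       # 9007199254740993
print(int(lossy))  # 9007199254740992
```

The JSON text is identical in both cases; the result depends entirely on the implicit mapping chosen by the host.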
Yes, I understand that (and I like XML as a format and XSLT 2.0 as a language). However, from the popularity of JSON, it seems that for most cases it's the easier choice.
Take any random REST API for example. If it returns JSON, you can integrate it more easily than if it returned XML. If you need special cases like large numbers (or date-times), you handle only those.
I'm confused. Integrating XML was fairly easy back in the day. In a dynamic language, parse into a DOM and then use XPath to get data out. In a static language, deserialize into your objects.
With JSON, you can mostly do the same, so I don't necessarily see this as a huge advantage of XML, mind. Having a schema does have some advantages, though.
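For anyone who hasn't done it, the "parse into a tree, then query with XPath" route looks roughly like this in Python. The orders document is invented for the example, and ElementTree only supports a limited XPath subset, so treat this as a sketch rather than full XPath.

```python
import xml.etree.ElementTree as ET

# Made-up API response, for illustration only.
response = (
    "<orders>"
    '<order id="1"><customer>alice</customer><total>10.50</total></order>'
    '<order id="2"><customer>bob</customer><total>7.25</total></order>'
    "</orders>"
)

root = ET.fromstring(response)

# Path expression: every customer under every order.
customers = [c.text for c in root.findall("./order/customer")]

# Attribute predicate: the total of the order whose id is 2.
bob_total = root.findtext("./order[@id='2']/total")
```

Note that the values still come out as strings; turning "7.25" into a number is up to the caller or a schema-aware layer.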
JSON maps directly only to JavaScript, and only because it was designed as a subset of JavaScript. For other languages you have to use a DOM or serializers, and then there's no difference between the formats. For that matter, XML has generic serializers that can be used instead of writing a custom one every time.
If you interpret the start and end tags of the child elements as syntax indicating the type of each value, then those tags are analogous to, say, the quotes that enclose a string literal. In other words, in
<foo>hello</foo>
<foo>world</foo>
the <foo> and </foo> serve the same purpose as the double quotes in
"hello",
"world"
with the added benefit that the type system can be much richer (i.e. not everything is just a nondescript string value).
And you don’t even need a comma to separate the values! ;)
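A rough sketch of the "tags as type markers" reading, in Python. The tag names and the CONVERTERS table are hypothetical; in practice the mapping would come from a schema rather than a hard-coded dict.

```python
import xml.etree.ElementTree as ET
from datetime import date
from decimal import Decimal

# Hypothetical tag-to-type table, standing in for a real schema.
CONVERTERS = {
    "str": str,
    "int": int,
    "price": Decimal,
    "day": date.fromisoformat,
}

doc = ET.fromstring(
    "<values>"
    "<str>hello</str><int>42</int><price>9.99</price><day>2025-01-28</day>"
    "</values>"
)

# Each element's tag selects the converter applied to its text.
values = [CONVERTERS[e.tag](e.text) for e in doc]
```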
The main reason I avoid any typeless language is dates. How to represent a date/time including a time zone has been badly reinvented so many times. A string type is never the way to go there, in my opinion.
One of the classic lessons of the Falsehoods Programmers Believe about Time is that in general you can't correctly do better than simply storing the user's input (and the instant and place they entered it from) verbatim, unless you know something more about what they were entering. It's usually fine to store times in the past as a timestamp since the epoch plus a location, but the meaning of "2025-01-28 15:00 in Europe/London, for the purpose of a meeting that's being hosted there but is accessible by video call" is much more subject to change when e.g. countries change time zone. It's also not necessarily the same as "the absolute point in time 2025-01-28 15:00 assuming London's time zones stay as predicted since I entered this on 2023-09-21" or "2025-01-28 15:00 in Europe/London, for the purposes of a meeting that's being hosted in Lisbon but which I'm accessing by video call from London" (because then the Lisbon local time is the source of truth, not the London one, if Lisbon changes time zone).
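One way to sketch "store the input verbatim and resolve late" in Python. ScheduledEvent and its field names are invented for illustration, and zoneinfo needs Python 3.9+ plus IANA time zone data on the machine; the resolved instant is only as good as the zone rules installed at the moment you ask.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+, relies on installed tz data

@dataclass(frozen=True)
class ScheduledEvent:  # hypothetical record shape, for illustration only
    wall_time: str   # what the user typed, stored verbatim
    zone: str        # the zone whose local clock is the source of truth
    entered_at: str  # when the user entered it

    def resolve(self) -> datetime:
        """Compute the UTC instant under the currently installed zone rules."""
        local = datetime.fromisoformat(self.wall_time).replace(
            tzinfo=ZoneInfo(self.zone)
        )
        return local.astimezone(timezone.utc)

meeting = ScheduledEvent("2025-01-28T15:00", "Europe/London", "2023-09-21")
instant = meeting.resolve()
```

If the meeting were really hosted in Lisbon, the stored zone would be Europe/Lisbon instead, and a later rule change there would move the resolved instant without the stored record changing at all.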
The nice thing about JSON, TOML, and YAML is they have implicit structures for arrays and key/values encoded within them.
XML has a lot of different ways of representing the same data, which is what makes it a challenging configuration file format.
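For instance, here are three equally plausible XML spellings of one config value, each needing different lookup code. The server/port/setting names are made up for the example.

```python
import xml.etree.ElementTree as ET

# The same "port = 8080" setting, written three different ways.
as_attribute = ET.fromstring('<server port="8080"/>')
as_child = ET.fromstring("<server><port>8080</port></server>")
as_pair = ET.fromstring('<server><setting name="port" value="8080"/></server>')

ports = [
    as_attribute.get("port"),                            # attribute lookup
    as_child.findtext("port"),                           # child element text
    as_pair.find("setting[@name='port']").get("value"),  # name/value pair
]
```

All three yield the same string, but a reader of the config has to know which convention the author picked.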