
My issue with XML for configuration files is not knowing when to use elements vs attributes to describe data.

The nice thing about JSON, TOML, and YAML is they have implicit structures for arrays and key/values encoded within them.

XML has many different ways of representing those same structures, which is what makes it a challenging configuration file format.




That point about arrays is a weakness in XML that I rarely see addressed. Arrays and lists are such a common data structure in almost every programming language of the past 40 years that not having first-class syntax for representing them is absurd, and it makes XML a non-starter for me.


I wrote many configuration files and encountered maps and sets, but never arrays.


The qualities of sets that arrays don't have (and vice versa) are irrelevant to the point of neither being implicitly representable in XML.

You're either providing complex objects as properties or you're providing a list of complex objects. Worse, you can have a combination of both. Without a schema it is not possible to infer whether either or both is happening.
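
A minimal sketch of that ambiguity, using Python's standard `xml.etree.ElementTree`. The element names (`server`, `backend`, `host`) and the `naive_to_dict` helper are made up for illustration, and the conversion rule is just one plausible schema-less guess:

```python
import xml.etree.ElementTree as ET

def naive_to_dict(elem):
    """Schema-less conversion: repeated child tags become lists, single ones do not."""
    if len(elem) == 0:
        return elem.text
    result = {}
    for child in elem:
        value = naive_to_dict(child)
        if child.tag in result:
            # Second occurrence of the tag: promote the existing value to a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result

one = ET.fromstring("<server><backend><host>a</host></backend></server>")
two = ET.fromstring(
    "<server><backend><host>a</host></backend>"
    "<backend><host>b</host></backend></server>"
)

one_dict = naive_to_dict(one)  # backend comes out as a single object
two_dict = naive_to_dict(two)  # backend comes out as a list of objects
print(one_dict)
print(two_dict)
```

The shape of `backend` flips between an object and a list depending on how many entries happen to be in the file, which is the thing a schema would otherwise pin down.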


> not having first class syntax for representing them

I think you're mixing up the representation of XML data with its representation in a programming language.

XML does have arrays. They are called child elements.


No, you've misunderstood my point. This doesn't work for cases where one child is in fact a property that is a complex object.

XML claims to solve the problem of attributes vs children but then falls short at the first hurdle by not discerning between a single complex object as an attribute and an array of complex objects as children.

JSON and YAML do not have this problem as they are explicit in their representation.

YAML example:

    parent:
        child: name
vs

    parent:
      - child: name
Try converting each of these to JSON. The former will give you an object property called child, the latter will give you an array property called child with one element


I'm not sure about that - I think your second example will parse to "parent": [{"child": "name"}] in JSON
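
For reference, a small sketch of the two shapes in JSON. The Python literals are hand-written equivalents of the two YAML documents (an assumption about what a standard YAML loader produces, not the output of an actual YAML parse):

```python
import json

# Assumption: a standard YAML loader yields these structures for the
# two YAML documents in the parent comment.
mapping_form = {"parent": {"child": "name"}}     # parent:\n    child: name
sequence_form = {"parent": [{"child": "name"}]}  # parent:\n  - child: name

mapping_json = json.dumps(mapping_form)
sequence_json = json.dumps(sequence_form)
print(mapping_json)
print(sequence_json)
```

In the first case `parent` is an object with a `child` property; in the second, `parent` is an array holding one such object.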


Yeah this was what I was going for, same point stands


Aren't XML child elements pretty much the most verbose way you can represent an array though?


I think the verbosity is not a problem. For example if you compare

    ["string1", "string2"]
to

    <list>
        <e>string1</e>
        <e>string2</e>
    </list>
then each element has about four bytes overhead (<e> instead of " and </e> instead of ",) plus some overhead for the list itself that may be offset by putting the name of the list itself into the element.

However, the issue is that you have to write a custom parser. There is no direct mapping between your data structure and the XML file. These developer ergonomics are a big win for JSON and, consequently, YAML.
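
A rough illustration of that difference with Python's standard library, reusing the two snippets above. The XML half is only one way to do the mapping; the point is that someone has to write it:

```python
import json
import xml.etree.ElementTree as ET

# JSON: the parser hands back a native list directly.
from_json = json.loads('["string1", "string2"]')

# XML: the parser hands back an element tree. Turning it into a list is
# application code that has to know the <e> children are the items.
root = ET.fromstring("<list><e>string1</e><e>string2</e></list>")
from_xml = [e.text for e in root.findall("e")]

print(from_json)
print(from_xml)
```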


> There is no direct mapping between your data structure and the XML file.

i think that's by design tbh.

it's only a big win for JSON (and YAML) because the default case works OK - but every time someone has a problem parsing numbers in JSON (because the value is bigger than Integer.MAX in the host language), this is the cause.
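
A sketch of the number problem in Python. Python itself keeps JSON integers exact, so the lossy case is emulated here by routing integers through `float`, on the assumption that the host stores every JSON number as an IEEE 754 double (as JavaScript does). The `id` field is made up for illustration:

```python
import json

doc = '{"id": 9007199254740993}'  # 2**53 + 1

# Python's json module keeps integers exact by default.
exact = json.loads(doc)["id"]

# Assumption: emulate a host that stores all numbers as doubles.
lossy = json.loads(doc, parse_int=float)["id"]

print(exact)
print(int(lossy))
```

The two printed values differ by one, and nothing in the JSON text itself says which interpretation the author intended.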


Yes, I understand that (and I like XML as a format and XSLT 2.0 as a language). However, from the popularity of JSON, it seems that for most cases it's the easier choice.

Take any random REST API for example. If it returns JSON, you can integrate it more easily than if it returned XML. If you need special cases like large numbers (or date-times), you handle only those.


I'm confused? Integrating XML was fairly easy back in the day. If in a dynamic language, parse into a DOM and then use XPath to get data out. If in a static language, parse into your objects.

With JSON, you can mostly do the same, so I don't necessarily see this as a huge advantage of XML, mind. Having a schema does have some advantages, though.
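
Something like the following, as a sketch of the parse-then-query approach. The `orders` document is invented for the example, and `ElementTree` only supports a limited XPath subset (full XPath needs a third-party library):

```python
import xml.etree.ElementTree as ET

# Hypothetical API response, made up for illustration.
response = ET.fromstring(
    "<orders>"
    "<order id='1'><total>10.50</total></order>"
    "<order id='2'><total>99.00</total></order>"
    "</orders>"
)

# Pull values out with path expressions instead of walking the tree by hand.
total_for_2 = response.findtext("./order[@id='2']/total")
all_ids = [order.get("id") for order in response.findall("./order")]

print(total_for_2)
print(all_ids)
```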


JSON maps directly only to JavaScript, and only because it was designed as a subset of JavaScript. For other languages you have to use a DOM or serializers, and then there's no difference between the formats. For that matter, XML has generic serializers that can be used instead of writing a custom one every time.


If you interpret the start and end tags of the child elements as syntax indicating the type of each value, then those tags are analogous to, say, the quotes that enclose a string literal. In other words, in

    <foo>hello</foo>
    <foo>world</foo>
the <foo> and </foo> serve the same purpose as the double quotes in

    "hello",
    "world"
with the added benefit that the type system can be much richer (i.e. not everything is just a nondescript string value).

And you don’t even need a comma to separate the values! ;)
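
A small sketch of the tags-as-types idea in Python. The tag names and the `CONVERTERS` table are hypothetical; a real document would take them from a schema:

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical tag-to-type table standing in for a schema.
CONVERTERS = {"int": int, "date": date.fromisoformat, "str": str}

root = ET.fromstring(
    "<values><int>42</int><date>2025-01-28</date><str>hello</str></values>"
)

# Each element's tag selects how its text is interpreted.
values = [CONVERTERS[child.tag](child.text) for child in root]
print(values)
```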


The main reason I avoid any typeless language is dates. How to represent a date/time including a time zone has been badly reinvented so many times. A string type is never the way to go there, in my opinion.


One of the classic lessons of the Falsehoods Programmers Believe about Time is that in general you can't correctly do better than simply storing the user's input (and the instant and place they entered it from) verbatim, unless you know something more about what they were entering. It's usually fine to store times in the past as a timestamp since the epoch plus a location, but the meaning of "2025-01-28 15:00 in Europe/London, for the purpose of a meeting that's being hosted there but is accessible by video call" is much more subject to change when e.g. countries change time zone. It's also not necessarily the same as "the absolute point in time 2025-01-28 15:00 assuming London's time zones stay as predicted since I entered this on 2023-09-21" or "2025-01-28 15:00 in Europe/London, for the purposes of a meeting that's being hosted in Lisbon but which I'm accessing by video call from London" (because then the Lisbon local time is the source of truth, not the London one, if Lisbon changes time zone).


What’s the best representation you’ve seen for dates?


Hasn't https://en.wikipedia.org/wiki/ISO_8601 basically won, at this point?
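
A quick sketch of what an ISO 8601 string with an offset does and doesn't carry, using Python's `datetime.fromisoformat`. The two timestamps are made-up examples:

```python
from datetime import datetime

# Two ISO 8601 strings with different UTC offsets.
a = datetime.fromisoformat("2025-01-28T15:00:00+00:00")
b = datetime.fromisoformat("2025-01-28T16:00:00+01:00")

# Aware datetimes compare by instant, so these are equal.
same_instant = a == b
epoch_seconds = int(a.timestamp())

print(same_instant)
print(a.utcoffset(), b.utcoffset())
```

The string pins down an instant and a numeric offset, but not a named zone such as Europe/London, so it can't follow later changes to that zone's rules.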


UNIX timestamp. Plus timezone string, in separate field/column, if it's important for the use case (like calendar events, etc).


Epoch time + GPS coordinates.

That's as unambiguous as you could possibly make it IMO.



