TOML – Tom's Obvious, Minimal Language (toml.io)
146 points by pmoriarty on Sept 10, 2020 | 160 comments



TOML superficially looks clean, but it has many of the same problems other similar languages have.

Such file formats are typically used for configuration files. Yet, a substantial amount of effort was expended on data types not commonly seen in configuration files, such as a date-time-offset. Meanwhile, much more common data types typically used in configuration files are missing, such as GUIDs, IP addresses, and byte arrays.

Most disappointingly, TOML has a "table" type, but unfortunately, just like other "RPC as Config" formats, the column names need to be repeated for every row, which is just crazy. It also makes no sense to talk about "arrays of tables", which are actually just "rows" in standard terminology. An array of rows... is a table.

Here is a terse example of the type of thing that does come up very often in bulk provisioning of the type that needs config files full of data:

    @"
    Server,Subnet,IP
    db01,databases,10.1.2.3
    db02,databases,10.1.2.4
    user01,filers,10.3.10.1
    user02,filers,10.3.10.2
    "@ | ConvertFrom-Csv | New-VMBuildScript.ps1
I'm yet to see a config file format that can even approach this in terms of its readability and terseness.
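
For anyone without PowerShell handy, here is a rough Python equivalent of the pipeline above, as a sketch only. It uses the standard `csv` module; `parse_inventory` is a made-up helper name, and the `New-VMBuildScript.ps1` step is left out because its behaviour isn't shown here.

```python
import csv
import io

INVENTORY = """\
Server,Subnet,IP
db01,databases,10.1.2.3
db02,databases,10.1.2.4
user01,filers,10.3.10.1
user02,filers,10.3.10.2
"""


def parse_inventory(text):
    """Return one dict per row, keyed by the header line (hypothetical helper)."""
    return list(csv.DictReader(io.StringIO(text)))


rows = parse_inventory(INVENTORY)
for row in rows:
    # Every value is still a plain string at this point.
    print(row["Server"], row["Subnet"], row["IP"])
```

The column names appear once, in the header, which is the property being asked for. The trade-off is that nothing here is typed: the IP column is just text until something downstream parses it.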


It all depends on context. Honestly, I've rarely (never?) needed to store GUIDs, IP addresses, or byte arrays in a configuration file. So that's not as common of a situation as you're thinking - maybe for your experience, but there are many different uses for configuration.

The example you give would suck in any configuration file format, and honestly I don't believe it belongs in a configuration file - I'd put it into a data file (e.g. a CSV as you show) and then in the configuration file I'd include a pointer to that data file to be loaded separately.


Every server config ever requires an IP address entry which in itself is a non trivial proportion of all the user edited config files in the world.


But surely strings are valid? Unless you want some kind of static type checking/linting for eg an ipv4 vs ipv6 key? Strings are very portable as well (POD).

Plenty of real world applications use static hostnames and rely on DHCP to assign IPs. When you have systems that can fail hard and you need to replace a NIC, it saves having to update either a ton of config files, router configs, or your hosts file(s). Most networking library API calls also take strings as arguments for this reason. Not saying it's a better solution, but there are times when either an IP or a string hostname should be acceptable.

I've never had to use it, but Qt does have a hostname class, for an example of what such a dtype looks like. The class explicitly handles conversion for you:

https://doc.qt.io/qt-5/qhostaddress.html


IP is a configuration option for most server software, it’s just that it usually needs to be set to 0.0.0.0 and that is the default anyway so most admins just leave it out.


Let's use :: nowadays ;-)

After all we got ~16 % IPv6 traffic hitting our repository CDN. (seems to match with an observation of LWN https://lwn.net/Articles/808896/ )


> GUIDs, IP Addresses, and byte arrays.

In what situation is it not suitable to store these as strings, byte arrays as unbounded hex-encoded integers? 0xabcdef123400000000
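
One wrinkle with the "hex-encoded integer" idea, sketched in Python under the assumption that the value is meant to be a byte array rather than a number: decoding through an integer drops leading zero bytes, while decoding the hex digits directly keeps them. `hex_to_bytes` is a hypothetical helper, not part of any config library.

```python
def hex_to_bytes(text):
    """Decode a hex string, with or without a leading 0x, into bytes."""
    digits = text[2:] if text.lower().startswith("0x") else text
    return bytes.fromhex(digits)


payload = hex_to_bytes("0xabcdef123400000000")
print(len(payload), payload.hex())

# Going through an integer loses leading zero bytes, so the intended
# length would have to be recorded somewhere else.
as_int = int("0x00ff", 16)
via_int = as_int.to_bytes((as_int.bit_length() + 7) // 8, "big")
direct = hex_to_bytes("0x00ff")
print(via_int.hex(), direct.hex())
```

So a string works, but the reader and writer still have to agree on whether it is "a number" or "some bytes".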

As for your csv-example, I’d propose that if you have multi-value entries in a list like that large enough that it becomes unwieldy to repeat the property names every time (like in JSON or yaml), it really doesn’t belong in static configuration but in some kind of data store.

(And in the odd situation where that’s not the case, just use csv like you seem to prefer?)


It's about strong typing. I detest stringly-typed programming.

If the format specifies some stronger types, the decoders can "bubble that up" to the programming language that decodes the file. E.g.: a guid turns up as a "System.Guid" type in C# instead of "System.String".

This matters, because all of these are the "same" GUID, but not all GUID parsers can handle all of these formats:

    {123e4567-e89b-12d3-a456-426652340000}
    (123e4567-e89b-12d3-a456-426652340000)
    123e4567-e89b-12d3-a456-426652340000
    123e4567e89b12d3a456426652340000
    urn:uuid:123e4567-e89b-12d3-a456-426652340000
Similarly, with IPv6 all of the following are the "same" address, but if encoded as a string it's hit-and-miss if the far end can properly handle all of the variants:

    fe80:0000:0000:0000:01ff:fe23:4567:890a
    fe80:0000:0000:0000:1ff:fe23:4567:890a
    fe80:0:0:0:1ff:fe23:4567:890a
    [fe80::1ff:fe23:4567:890a]
    fe80::1ff:fe23:4567:890a
    fe80::1ff:fe23:4567:890a%3
Similarly, for your hex-encoded binary example, you'd be shocked at how rare it is for programming languages to provide a "hex-string-to-byte-array" decoding function. If you roll your own, it'll be about 10x less efficient than something done properly using SIMD instructions.
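
To make the "hit-and-miss" point concrete, here is a small sketch of what Python's standard `uuid` and `ipaddress` modules do with those spellings. It uses the same GUID value for every spelling, and it leaves out the `%3` zone form because support for that depends on the Python version. `try_parse` is a made-up helper.

```python
import ipaddress
import uuid

GUID_FORMS = [
    "{123e4567-e89b-12d3-a456-426652340000}",
    "(123e4567-e89b-12d3-a456-426652340000)",
    "123e4567-e89b-12d3-a456-426652340000",
    "123e4567e89b12d3a456426652340000",
    "urn:uuid:123e4567-e89b-12d3-a456-426652340000",
]

IPV6_FORMS = [
    "fe80:0000:0000:0000:01ff:fe23:4567:890a",
    "fe80:0:0:0:1ff:fe23:4567:890a",
    "[fe80::1ff:fe23:4567:890a]",
    "fe80::1ff:fe23:4567:890a",
]


def try_parse(parser, text):
    """Return the parsed value, or None if this parser rejects the spelling."""
    try:
        return parser(text)
    except ValueError:
        return None


guids = {form: try_parse(uuid.UUID, form) for form in GUID_FORMS}
addrs = {form: try_parse(ipaddress.IPv6Address, form) for form in IPV6_FORMS}

for form, value in {**guids, **addrs}.items():
    print(f"{form!r:50} -> {value}")
```

With CPython's standard library, the parenthesised GUID and the bracketed IPv6 address are rejected while the other spellings collapse to a single value each. Other languages and libraries draw the line in different places, which is the problem being described.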

Just recently I had to deal with Azure Resource Manager Templates, where even the distinction between numeric and text types has been eroded. There's several number fields that must be a string. There are also numeric fields that weirdly accept a string, but only if it is an ARM template expression that evaluates to a string containing a number at runtime.

Ugh...


Surely you shouldn’t be limited to types as interpreted and represented in the low-level config parser library but should have a strongly typed representation of the config structure that you implement in whatever language you’re using.

C# is a good example. JSON doesn’t have any of these types. It doesn’t even make a distinction between integers and other numbers. Yet it’s common practice to marshal these into the kind of types you mentioned at ingestion.

At the end of the day, if you type it with your keyboard, it’s essentially a string.

I think it’s completely fine that all this functionality is part of neither programming languages core libraries nor configuration file formats - there is no one-size-fits-all here.

There are plenty of configuration frameworks that do what you ask for.


> At the end of the day, if you type it with your keyboard, it’s essentially a string.

I mean .. yes, but that's the exact chaos we're trying to mitigate? The different types may all get serialised as strings, but their semantics are different. And the purpose of strong typing is to make it harder for people to put semantically wrong things in the wrong place.

(This is semantic markup vs <font size=2> all over again, isn't it)


The deserializer provides the desired semantics. You can't express all types in a markup language, because its type system is inherently limited.


> whatever language you’re using.

This is the #1 mistake people make when designing, evaluating, or critiquing frameworks, file formats, protocols, and the like.

I don't get to choose the language. I'm not writing 100% of the code that I use.

In fact, I have control over approximately 0.00001% of the code involved in processing a typical JSON file, or TOML file, or YAML file.

OTHER people control the language choice, and it certainly won't be ONE language. It'll be many languages.

If I submit an ARM template to Azure, it'll go through at least three languages in the process: C#, Python, and JavaScript. Possibly C++ and F#. Who knows?

Even when the language is reasonably consistent, such as JavaScript, there is very little consistency in the specific parser used.

After the parser-level inconsistencies, there's further inconsistencies in how the stringly-typed data is converted into some more strongly typed format. That's just up to whoever wrote the code that consumes the data. There is exactly zero standardisation of this. None whatsoever. It's almost never documented, and there's just no way to know without experimenting.

NONE of this is in my control, or your control. I can't emphasise this enough.

Stop thinking in terms of a developer sitting down in front of an IDE with a new project, where they type all of the code in, hit compile and ship the binary to some customer.

Start thinking in terms of having to deal with inconsistencies between Terraform, Cloudformation, Amazon, AWS, GCP, and CloudFront.

Start thinking about having to publish something like a Rust module to three different crate repositories, and cache it locally, and have it work properly with the various IDEs people use.

Start thinking in terms of security vulnerabilities of different parsers at different layers escaping data differently, or smuggling data using encodings accepted by the back-end past web application firewalls that don't understand that particular encoding.

This stuff matters.

JSON famously had a definition so short that it fit on a business card.

Less is more, right?

This "simplicity" led to fun blog posts titled "Parsing JSON is a minefield": http://seriot.ch/parsing_json.php

The designer of TOML seems to have never read that article, or if he has, he hasn't learned any lessons from it...


Don't like stringly-typed programming or configuration? There's Dhall for you https://dhall-lang.org/

> Dhall is a programmable configuration language that you can think of as: JSON + functions + types + imports

It's specifically not Turing-complete so it has a number of safeties built in because of this. I've been using it more and more wherever possible as of late.


You're assuming that all languages have a "System.Guid" type that the configuration value can be parsed into.

If you want a strongly typed configuration that works across many languages, you'll need to also provide the per-language implementation of those types to fill the gaps.

You'll notice that JSON, YAML, INI, and XML also do not have IP address nor GUID types.

What configuration format does meet your approval?


> What configuration format does meet your approval?

None at the moment.

My problem with all of them is that I can't scan them visually to spot typos or mistakes. They break the content up and move related data too far apart for the inconsistencies to be immediately apparent.

Don't laugh, but in the absence of better options, I prefer to use Excel. With tabular data, things that belong together are visually adjacent. There are no repeated headers, or unnecessary syntax. Formulas are available, but hidden by default.

In the past I liked to use XML, but only with Altova XmlSpy, which had a kind of "hierarchical table view" superficially similar to Excel, but for tree-like data.

Both of the above are easy to use from PowerShell. I even wrote myself an Import-Xlsx module that works without Excel having to be installed. It uses the .NET framework libraries for opening XLSX files. (Under the hood they're essentially just a Zip file containing plain XML data.)

Ideally?

I'd like something like the SQL Server Analysis Services MDX query language, but with a front-end more like Excel for the editing of the data.

It is purpose designed for multi-dimensional data, and in my field that's exactly what I have to do: provision every combination of a bunch of tables.

For example: For each data centre, for each availability zone, for each of PRD/UAT/DEV, deploy each of the following roles, with 'n' instances each.

This is a multi-dimensional cube, also known as a Cartesian product (a full cross join).

Naively expanding everything is not good either; there needs to be some sort of language for making "exceptions". Such as: The DEV environment is only in this location. UAT has fewer servers than PRD. Etc...

That's where a nice language would work wonders. You'd want to be able to do things like:

    *.*.dev.sql.instances = 0
    us-west.*.dev.sql.instances = 2
That would set the dev instances everywhere to zero, except US West. It's selecting a hyperplane through the cube and setting all of the cells to the constant on the right. This is the kind of thing MDX is designed to do, but no "simple" config language ever can.

So instead you get monstrosities like the Azure ARM "copy" syntax: https://docs.microsoft.com/en-us/azure/azure-resource-manage...
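
The wildcard assignments above can be mocked up in a few lines of Python. This is only a sketch of the idea, not MDX: the dimension values, the baseline rule of 4 instances, and the "later rules win" ordering are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical dimensions; the names are illustrative only.
REGIONS = ["us-west", "us-east"]
ZONES = ["az1", "az2"]
ENVS = ["prd", "uat", "dev"]
ROLES = ["sql", "web"]

# Rules are applied in order, so later (more specific) rules win.
RULES = [
    ("*.*.*.*.instances", 4),            # assumed baseline
    ("*.*.dev.sql.instances", 0),
    ("us-west.*.dev.sql.instances", 2),
]


def matches(pattern, key):
    """Compare dot-separated segments; '*' matches exactly one segment."""
    p_parts, k_parts = pattern.split("."), key.split(".")
    return len(p_parts) == len(k_parts) and all(
        p == "*" or p == k for p, k in zip(p_parts, k_parts)
    )


def expand(rules):
    """Build every cell of the cube and apply each matching rule in turn."""
    cells = {}
    for region, zone, env, role in product(REGIONS, ZONES, ENVS, ROLES):
        key = f"{region}.{zone}.{env}.{role}.instances"
        for pattern, value in rules:
            if matches(pattern, key):
                cells[key] = value
    return cells


cells = expand(RULES)
print(cells["us-east.az1.dev.sql.instances"])
print(cells["us-west.az2.dev.sql.instances"])
```

Three rule lines describe all 24 cells here. Whether "last rule wins" or "most specific rule wins" is the right policy is exactly the kind of decision such a language would have to make explicit.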


It sounds like you should store your configuration in some sort of KVS (Consul or etcd) or database (SQL) with a frontend rather than flat text files. There exists tooling for all of that. If you're really going for Excel, why not even make a plugin that hooks in to it. Personally I'd find that horrible, but if it works for you and your team has no objections, why not.

I'd wager that the major reason no one has taken it that far is that your preferences are very rare.

But who knows, if you build it and make it available maybe others will find it useful too.

You're probably getting downvoted because unless I'm misinterpreting your previous comments, you're mixing up abstraction levels and concerns when you're talking about types.

Personally I think yaml is a terrible and confusing format in general (yes I do understand it quite well), I see your point with JSON, but I wouldn't personally say it's a fundamental issue the way you're describing it.


There is an awful lot of middle ground between the scale where it's faster to just click through a GUI, and the scale where a database engine makes sense.

Sure, if I were provisioning 10K+ more-or-less-but-not-entirely-the same objects, I'd be reaching for a SQL database engine of some sort.

For about 10 things I'd just grit my teeth and click through a GUI manually.

In between 10 items and 10K is where things get interesting. Inefficient tools can waste more time than they save. Full-on programming languages are right out. You have to consider dealing with people who aren't programmers. People that aren't DBAs.

This in-between-land of, say, 50-5000 instances is where configuration data files live. It's where Excel works well enough at the moment, but I feel that something better is just waiting to be invented.


To each their own, I guess. I’m guessing many people who work with the kind of environments you’re describing are like me in that I prefer everything as pure text and pipes (*sh/vim/sed/grep/awk/jq/tr/cut/xargs/curl/etc). It definitely is a longer learning curve but at some point everything becomes the same whereas with GUIs there’s a new flow for every system and there’s a bottleneck in how much you can manage manually and visually.

There’s no going back to GUIs when the shell and keyboard become an extension of your mind.

But I think you have a point in that there’s a big user base of somewhat-power-users that prefer visual interfaces and that there’s a better middle ground to be found. The biggest hurdle I see is that tools like that easily become outdated, have selective platform support, aren’t as extendable and customizable, etc. It’s a vastly bigger undertaking to make something that can work everywhere with everything in the way text UIs can.


From an enterprise background, I’ve seen it go wrong so many times. It’s worth it to script even 1 table change or 1 column schema definition. It’s kind of a “trust me” thing, but seriously, you get used to it. Your future “you”s will thank you.


There's https://www.gnu.org/software/recutils/ aimed for that scale.


> It's about strong typing. I detest stringly-typed programming.

I feel like that's something that's more an appropriate concern for the application, and not something that ought to be baked into the configuration format itself. For example:

> E.g.: a guid turns up as a "System.Guid" type in C# instead of "System.String".

Can you not just pass the System.String you get from the decoder into System.Guid's constructor and handle the resulting error should that string not actually be a GUID?

Like, my problem with a lot of config file formats is that they're frequently too clever about assuming types of things when I would much rather they be strings by default for me to interpret later.
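
A minimal Python sketch of that "strings in the file, types in the application" approach, with `uuid.UUID` standing in for C#'s `System.Guid`. The `ServerConfig` class, its field names, and the sample document are all hypothetical.

```python
import ipaddress
import json
import uuid
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ServerConfig:
    """Hypothetical typed view of a config document."""
    tenant_id: uuid.UUID
    listen: Union[ipaddress.IPv4Address, ipaddress.IPv6Address]
    port: int


def load_config(text):
    """Decode plain JSON, then convert each string at the application boundary."""
    raw = json.loads(text)
    try:
        return ServerConfig(
            tenant_id=uuid.UUID(raw["tenant_id"]),
            listen=ipaddress.ip_address(raw["listen"]),
            port=int(raw["port"]),
        )
    except (KeyError, ValueError) as exc:
        raise ValueError(f"invalid config: {exc}") from exc


config = load_config(
    '{"tenant_id": "123e4567-e89b-12d3-a456-426652340000",'
    ' "listen": "::", "port": "8080"}'
)
print(config)
```

This keeps the file format dumb and puts validation in one place, at the cost that every consumer of the same file has to repeat (and agree on) the conversion.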

> If you roll your own ["hex-string-to-byte-array" decoding function], it'll be about 10x less efficient than something done properly using SIMD instructions.

Unless you roll your own that uses those SIMD instructions, whether in hand-written assembly or in a programming language with a compiler smart enough to use SIMD for this.

This does, however, reek of a premature optimization.


> If you roll your own, it'll be about 10x less efficient than something done properly using SIMD instructions.

Is this really an issue when loading a field from a config file? If it was something happening 1m times per second then maybe I could see your point.


JSON is now at the point where yes, it does makes sense to optimise it with AVX2 instructions.

https://github.com/simdjson/simdjson


JSON is in a slightly different area since it's commonly used as a serialization format for RPC/REST so it's in a hot path. Optimizations there make a lot of sense; optimizations on reading your configuration file are almost certainly premature optimizations.

The fact that your config will load some tiny amount faster with those optimizations is a side effect of the hot path optimizations, not a requirement for config formats.


See https://tools.ietf.org/html/rfc5952 on how to write ipv6 addresses.


I think that's two different senses of the word "configuration". TOML is great for key/value data where you want a little bit of structure / nesting / categorization on the keys, but probably not more than one level or so. Think of, say, the preferences for your text editor, or (as widely used) a metadata file for a library that specifies its dependencies and build configuration.

It looks like you have tabular data where you don't need any nesting, every row has the exact same structure, and there's no single column of a row that you'd call "the value." TOML is not well-suited for this, and I don't think it sets out to be. I see how you'd call it "configuration," but it seems like a pretty different category of thing.

(I'd buy the argument that "table" is a poor choice of name for that TOML construct; I'd personally have called it a "multi-value" or something. It's an array of objects, and there's no guarantee that the objects have homogeneous structure, and it's usually useful for them to have different structures.)



  databases:
    db01: 10.1.2.3
    db02: 10.1.2.4
  filers:
    user01: 10.3.10.1
    user02: 10.3.10.2


What if I needed to add about 5-10 additional columns to that. Would TOML still have a similarly terse way of encoding that?

E.g.: Region, AZ, Network, Subnet, Description.


I think terseness and readability are often at odds past a certain point, and TOML seems to optimize for readability at the expense of terseness.

E.g., having to specify column names again and again is tedious to write, but makes the file easier to read and modify safely.


Yeah, I think toml is great for short/simple config files, yaml is good for medium length/complexity, and I have yet to see anything that’s not painful for long config files.


There is always cutting to the chase and using SQLite.

https://www.sqlite.org/appfileformat.html


SQLite is lovely to work with, but I don't think it is entirely appropriate as a configuration format for server software considering that all of the existing tooling for configuration management (Ansible, Puppet, CFEngine) is based on template text files.


I dunno. By the time one manages a full enterprise of configuration, one puts it in SQLite and treats the cookbooks/roles/stacks as report products.

After normalizing, one could get to a tidy heterogeneous solution.

(Only the finest vaporware spoken here.)


I actually think it's almost the opposite for me.

In a tiny yaml file it's pretty easy to understand what everything is; it's huge files where toml's forcing you to fully qualify nested tables becomes useful.

Too many huge yaml files where I'm just scrolled somewhere in the middle and have no clue what part of the tree I'm actually looking at.


Maybe our ideas of scale are a bit different?

I think yaml files hold up well to about 50 lines.

I have only seen toml files up to about 10-20 lines and it looks great at that scale, but I am having a hard time imagining it being nice at 500+ lines.


Dhall bills itself as being designed for longer configuration files. It has functions, allowing for some abstraction, but it's not turing-complete.

https://github.com/dhall-lang/dhall-lang


I get what you are saying, but everything has trade offs.

Everyone would agree on 80%, but then each needs a different 20%. That's how you end up with 10k-page specs, or XML and doc types.

And if you need that just use XML, that is actually what it was designed for.


I've been using TOML recently and what displeases me most is that almost all string values need to be quoted.


Array of objects is terrible in TOML. I grudgingly use it only because it's categorically better than YAML IMO and INI isn't powerful enough.


At least in YAML you only have to quote a string if it contains a colon-space and can therefore be inferred to be a dict. In TOML all strings are quoted: https://github.com/toml-lang/toml/issues/105#issuecomment-14...


Otoh, I got plenty of support requests from people not understanding that the space at the end of a line in an .ini file is part of the value. (app is using glib ini parser)

Quoting at least makes it clear what's part of the value.


What sort of nightmare format from hell has significant whitespace at end of line?


ini file


Ugh. If whitespace is wanted the value should be quoted. The glib ini parser is insane.


string quotation is a very interesting topic, one variation I would like in a config file is a single backtick that continues up to the first line break


That's not config data that you have here, that's your program input (hint: you pipe it to stdin).

TOML is for config, config that users edit by hand, not input.


That looks like hand-edited config to me. At least, I've maintained files similar to that in past gigs. It's a nice example of how CSV can work well if you don't need to worry about quoting.


1 - you don't pipe config to stdin. If you use standard input, that's your main input.

2 - config sets your program's bootstrapping state. It's not the main data your program processes. That's your main input. IP addresses are what the script processes. It's not config.


1 - If you google 'config stdin' you'll see a number of good reasons a program might want to accept configuration on stdin. (also, not that it matters, but my program read it from a file)

2 - Those IP addresses are probably not the main data the program processes. In my case, they were configuration, describing where an auditing processor scrapes input and how to allocate compute.


Scraping is not bootstrapping. That's the main process. Main input can have several sources.


If you're redefining the main input to include all input data (including hand-edited CSV files) then you're just contradicting yourself. This conversation is tedious and unproductive, I'll leave you alone.


Config is data. Data is input.


Anything that is not hardcoded is technically input.

But there is a reason we have a word for config.

It's a specific kind of input: it feeds the bootstrapping state of your program, it's not the main data your program processes.

The difference is important: structure, complexity and dynamism requirements are not the same.


Though I'll agree that I don't love the name "table". I think it's referring to the kind of table you might see in an instruction booklet or something, with keys on the left and their values on the right, but the possible confusion with database tables seems not great.


I recently read that YAML has a way of specifying a "sequence of sequences" which could work for tabular data like that in your example. i.e

  - [Server,Subnet,IP]
  - [db01,databases,10.1.2.3]
  - [db02,databases,10.1.2.4]
  - [user01,filers,10.3.10.1]
  - [user02,filers,10.3.10.2]
That being said, the CSV does seem more straightforward.


I would love to see YAML or EDN-style tagging in TOML for use cases like marking strings as IP addresses.


This is a table.

Let's see it do a tree. (And not an adjacency matrix. That's cheating.)


A tree is a graph without cycles. A graph is a relational structure E(a,b) which can easily be stored in tables. This is why relational databases like SQL store tables. So there's nothing exceptional about storing graphs, and adjacency lists or matrices are all fine.

To connect with the other threads, SQLite would be probably the best candidate proposed here to store a tree.
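
As a sketch of the "a tree is just an edge table" point, here is a tiny adjacency list in an in-memory SQLite database, queried with a recursive common table expression. The table name, column names, and node names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (parent TEXT, child TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [
        ("root", "databases"),
        ("root", "filers"),
        ("databases", "db01"),
        ("databases", "db02"),
        ("filers", "user01"),
    ],
)


def descendants(conn, node):
    """Return every node below `node`, found with a recursive CTE."""
    rows = conn.execute(
        """
        WITH RECURSIVE sub(name) AS (
            SELECT child FROM edges WHERE parent = ?
            UNION ALL
            SELECT e.child FROM edges AS e JOIN sub ON e.parent = sub.name
        )
        SELECT name FROM sub
        """,
        (node,),
    )
    # Sort in Python so the result does not depend on traversal order.
    return sorted(name for (name,) in rows)


print(descendants(conn, "root"))
print(descendants(conn, "databases"))
```

It works, though it arguably supports the ergonomics objection too: the tree shape is not visible anywhere in the stored rows, only in the query results.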


My point was about ergonomics, or at least that's what I was trying to hit at.

TOML does a much better job than CSV.


interesting. good pattern. I frequently do something similar:

  opt=$(echo "
  # this is the foo option
  a=1

  # define the bar
  #BAR=ON
  #BAR=OFF
  BAR=MAYBE

  " | grep -Ev '^[[:space:]]*(#|$)' | sed 's/^[[:space:]]*/-D/')
or

  echo "
  one two three
  #four five siz
  seven ate tree
  " | grep -Ev '^[[:space:]]*(#|$)' | while read -r o1 o2 o3
  do
    program "$o1" "$o2" "$o3"
  done
sort of like on-the-fly readable configurability


Terseness is a bad property though, it hurts readability and correctness.


You need to use strings to store guids or ips, it should not be a configuration language’s type


Yah, you gotta point.


Of configuration file formats, TOML is my favorite. It hits the right balance of power and simplicity, it's been able to handle my needs with minimal effort and fuss.

I tried Python's INI for configuration, but it was both too simple and too complex at the same time, very frustrating.

YAML is a nightmare, the formatting is way too easy to get wrong and accidentally break your configuration.

JSON is too strict/simple. You can't even have trailing commas in a list!


I recently wrote a TOML parser in Idris2. It's a cool language, but has some quirks. For me, it's that it's just a simple KV store with some sugar (that's nice!) but it also means that a lot of information is lost forever in parsing. For example, these two documents are strictly identical:

    [info]
    name = { first="bob", last="jones" }
and

    info.name.first="bob"
    info.name.last="jones"
This can be seen as a positive (it's a really simple document) or a negative (it's hard to machine-generate nice TOML, and it's easy to think there's more to the structure of a written document than there is).


Can you expand on the negatives? I don't think I understand what you mean.


I believe the main negative he is referring to is that there is no canonical representation for a given document. The input file can be structured to share common keys (in the example above it's `info`) or not. Since both inputs are parsed into the same value it means that writing a printer is harder: you now have the choice between different output representations.

As a comparison, there's not much choice when serializing to JSON - the only variation is around whitespace. When serializing TOML you need more insight to decide what's the best representation for a human. A contributing factor to this issue is that TOML is mostly used for configuration files, so it often matters that the output is readable.


> JSON is too strict/simple

This is why I like JSON!

I have seen the horrors of XSLT and I’m never going back.


IMO it kind of sucks that you're not able to add comments to JSON easily.

In TOML (or YAML) it's just as easy as prepending it with # at the beginning of the line.


I've seen folks add comments to json... as data

  {"comment":"hi mom", ...}


> add comments to JSON easily

Simple, write comments in Morse code using spaces and tabs.


There are a few projects with a similar feel as json yet allowing nice to haves like comments and trailing commas:

* json5

* HCL

* hjson (now unmaintained)


> hjson (now unmaintained)

Which is quite sad, as I thought that it was the best one (if memory serves)..


I tried the go and rust implementations and they each had subtly broken behavior for things like leading slashes in key names or trailing commas. I want something much more strict than "human" json. Maybe a "lizard person" object notation would fit me better.


For a typescript project, after going from .ini and .toml I figured I'd just write config file in typescript. You'd get all the type hinting, auto completion and error checks.

I never liked that anything I pulled from a config file was never checked by typescript, so I just went simple with TS itself.

I understand if I use another language within the same project, it'll be a problem but until then this can't be beaten.



YAML and TOML wouldn't have been necessary if JSON had been slightly more expressive. If JSON had

  * comments
  * bareword (not double quoted) object key names
it'd be a much more suitable configuration file format for everything, and I don't think we'd see the motivation to make things like TOML and YAML.


JSON5 pretty much addresses all the problems with JSON as a config format. I'd love for it to just replace JSON entirely.

https://json5.org/


Generally looks good, but ugh, I dislike allowing single quotes. Allowing single quotes has close to zero ergonomic value but increases the number of ways to write the same fundamental thing.

But that's a minor quibble. If everyone adopted this thing instead of legacy JSON, we'd be better-off.


Having two ways to quote is a useful feature, not just one way to do the same thing.

Javascript already has this, so allowing single quotes simplifies copypasting object literals, which is a common use case.


That looks really nice. I might start using it for configuration.


YAML and JSON are about the same age: YAML dates from 2001 and JSON 2001/2.


Yep. And trailing commas,


I never realized I would want this so much.


or also no commas at all


I had tons of headaches trying to get toml to do what's trivial in JSON (nested arrays of objects of arrays of ... etc). 1/10 would not try again. For very simple (read: "flat") config it checks out. But then, so does yaml or even ini files. Anything requiring composite types was just a nightmare. I'd even prefer XML before using toml again.


toml really is best for mostly flat data. when your config grows more complex than that, i recommend you break it into multiple files and use directory structures or filenames to convey their relationship. or, of course, switch to a configuration format that is more well suited to deeply-nested data.

your comparison to yaml or ini is apt; toml's strength over yaml is syntax simplicity and toml is more-or-less a superset of ini (which itself is poorly-defined).


I never had to work with it seriously but I still don't get why people hate XML so much.


XML is verbose, surprisingly complex, and first and foremost not designed to be written and consumed by humans.

There's plenty of room for debate, e.g. element versus attribute and things get unwieldy pretty quickly [1]

The "note"-example illustrates the issue quite well: the order of the elements matters and you end up writing every element identifier twice.

Not to mention the bloat that comes with using an XML library that's actually compliant with the standard and includes all the bells 'n whistles.

[1] https://www.w3schools.com/XML/xml_dtd_el_vs_attr.asp


I would venture a guess that on the surface, it's about its verbosity and the pyramids of doom you can get when editing.

On a deeper level, I think they might be frustrated with the ability to develop custom formats (XSDs) which effectively make XML not one format, but a gazillion formats.


I think it allows too much and parsing it becomes a burden.

with json - you just suck it all into your data structure

with xml - sometimes people put stuff as a tag, sometimes as an attribute. Maybe not you, but someone else will do it.


Hipster meme that only xml has problems. Ironically they now went full circle to yaml, which is much more obtuse than xml.


Personally I find it a lot less readable than things like JSON or TOML.


I've always had a soft spot for the classic .INI style configuration.

Modding command and conquer through "rules.ini" was always an adventure.


I also like how it just does one thing: key/values. It doesn’t try to be a complex serialization format or its own full language - it’s just keys and their values. Granted, sometimes data needs to have more structure, but for settings/flags INI is very good.


One nice thing about INI is it's really easy to modify programmatically (without eg. losing comments).
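A minimal sketch of what that can look like, assuming plain `key = value` lines with no multi-line values. The function name `set_ini_value` and the sample data are made up for illustration. It edits the text line by line rather than round-tripping through Python's `configparser`, which drops comments when it writes a file back out.

```python
def set_ini_value(text: str, section: str, key: str, value: str) -> str:
    """Return `text` with `key` in `[section]` set to `value`.

    Only the matching line is rewritten, so comments and blank lines
    survive. If the section or key is missing, the text comes back unchanged.
    """
    out = []
    current = None  # name of the section we are currently inside
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            current = stripped[1:-1].strip()
        elif current == section and not stripped.startswith((";", "#")):
            name, sep, _old = line.partition("=")
            if sep and name.strip() == key:
                line = f"{name.rstrip()} = {value}"
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical sample file, loosely in the spirit of a game rules file.
SAMPLE = """\
; tuning for a hypothetical unit
[tank]
speed = 5
; armor is left alone
armor = heavy

[jeep]
speed = 12
"""

result = set_ini_value(SAMPLE, "tank", "speed", "9")
print(result)
```

Doing the same to JSON or YAML generally needs a parser that tracks comments and layout, which is probably why this feels so much easier with INI.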


It's 2020 and I'm still using them for configs. Section, key, value seems to fit so nicely for me. But! I'm only using INI for the simple (bootstrap-level) parts of the apps - once an app is up, it starts using something like etcd or other fancier stuff.


I used python configparser and there's a grey area when configs are merged (like default stuff and user-config'd stuff)
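To make the grey area concrete, here is a small sketch using only the standard library. The section and key names are invented for the example. Reading a second source into the same parser overrides matching keys, but anything under `[DEFAULT]` also shows up in every section, including ones the defaults file never mentioned.

```python
import configparser

# Hypothetical shipped defaults.
DEFAULTS = """\
[DEFAULT]
timeout = 30

[server]
host = localhost
port = 8080
"""

# Hypothetical user overrides.
USER = """\
[server]
port = 9090

[client]
retries = 3
"""

parser = configparser.ConfigParser()
parser.read_string(DEFAULTS)  # read first: lowest priority
parser.read_string(USER)      # read second: overrides matching keys

# Snapshot each section as a plain dict; all values are strings.
merged = {name: dict(parser[name]) for name in parser.sections()}
print(merged)
```

Here `merged["client"]` picks up `timeout` even though the user only wrote `retries`, and `[DEFAULT]` itself is not listed by `sections()`. Whether that counts as a feature or a surprise depends on what you expected "merge" to mean.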


I discovered TOML a couple of years ago when I started playing with Hugo, as it was Hugo's format of choice for configuration files.

I honestly find TOML harder to read and more complicated to use than YAML. I tried to look into it but frankly I still haven't found a use case where it made more sense to use TOML.


Interesting how differently competent people can perceive things. To me, YAML is a cruel joke, and I carry suspicions about the competence of anyone who opts to use it out of all possible options. I think I've even grown to like XML configuration over it.


if json only allowed comments...


+1


I’ve had all sorts of issues with yaml due to indentation, so I don’t use yaml anymore.


I tried config files in the .INI type syntax of python's configparser but turns out there is a lot of hidden complexity where the "default config" intersects the "user config"


This has been my experience as well.

There's been times I wanted YAML support in a particular app; I've never wanted TOML support. If anything, it's been "ugh, I have to use TOML".


Strict yaml claims to have fixed most of the problems of yaml and other languages used for configuration. Basically it is yaml with only string as a basic datatype, and other complexities removed. Handling of data types is done on a higher schema layer.

https://hitchdev.com/strictyaml/

It's a smaller project so probably could use some support.


Another one I've been partial to lately is recutils (https://www.gnu.org/software/recutils/), it's similar to sqlite but has the benefits of being minimalist and plain text.

Generally I prefer actual code based configuration though, there's only a small subset of people that can read and edit these config formats but would be unable to configure something like DWM through real code (https://git.suckless.org/dwm/file/config.def.h.html) and this approach gives far more flexibility without code bloat. Once customization through config is removed it's only machine-to-machine configuration that changes and these are easily handled by TOML, INI or even environment variables.


TOML is great if you think of it as JSON, but optimized for being written by humans instead of machines. I hope no new first-class types or syntax features ever get added after the 1.0 release so it can have the same stability that makes JSON great. The vast majority of the proposals that I see in the repo's issues would collectively destroy the value of TOML, and I'm glad the current maintainer seems to have the sense and fortitude to resist such additions.


“Obvious” is subjective. I find TOML to be remarkably confusing and surprising and UNobvious, whereas YAML makes perfect sense to me, even though plenty of folks hate it. Essentially, the name is pretty obnoxious in multiple ways, given that it’s basically just an extension of the long-established INI format.


YAML makes perfect sense until one of its many, many edge cases bites you.

There is a subset of yaml that is almost good. But even then, there are cases where yaml is interpreted differently than the average human would expect.


    country: no
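For anyone who hasn't been bitten by this yet: under YAML 1.1-style rules (PyYAML's default loader follows them, as far as I know) an unquoted `no` is a boolean, not a string. The snippet below is not a YAML parser, just a toy imitation of that one resolution rule, and the function names are made up.

```python
# Unquoted words that YAML 1.1-style resolvers turn into booleans.
TRUE_WORDS = {"yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
FALSE_WORDS = {"no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}


def resolve_plain_scalar(text):
    """Imitate boolean resolution for an unquoted scalar; anything else stays a string."""
    if text in TRUE_WORDS:
        return True
    if text in FALSE_WORDS:
        return False
    return text


def parse_line(line):
    """Parse a single `key: value` line (toy version, no nesting, no other types)."""
    key, _, raw = line.partition(":")
    raw = raw.strip()
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "'\"":
        return key.strip(), raw[1:-1]  # quoted scalars are always strings
    return key.strip(), resolve_plain_scalar(raw)


print(parse_line("country: no"))    # Norway becomes a boolean
print(parse_line('country: "no"'))  # quoting keeps it a string
print(parse_line("country: se"))    # Sweden is fine either way
```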


I still don't understand why the Python steering committee decided to switch to this format for package configuration instead of the perfectly serviceable and existing setup.cfg. Packaging is already a pain to set up, now you gotta learn three (3) different ways to do it (if only to translate from one to the next) because everyone's split. The choice doesn't even matter very much, just settle on something, whatever it is.


They did settle on something: TOML, in PEP 517/518. There are a ton of in-progress PEPs building on top of PEP 518 to make pyproject.toml the place to put package metadata in a way that will actually be standardized and usable by multiple different packaging tools. setup.cfg is just a crappy form of ini file, with no standards for what it contains other than "what do common tools do". It's a mess, and it's going to get better, but it takes time because consensus is hard and we're all doing this for free.


There's not much to learn. It's basically .ini on steroids.


It's not that it's hard. It's that it's different, and every difference has a cost. IMHO, and in the OP's opinion, the benefit of TOML for packages isn't worth the mental cost of having to know about three configuration methods.


What confuses me is why there isn't a toml module in the standard library.


Yeah, that's an issue in flake8. There is no stable standard library for something that is supposed to be core to python moving forward.

https://gitlab.com/pycqa/flake8/-/issues/428#note_251982786


I think the general feeling in the community is that the standard library is where packages go to die. Once something is in the standard library, it can't get meaningful features/updates outside of the yearly release cycle. pip and setuptools are already vendoring a TOML implementation, and once you have pip, you can pick from multiple great toml implementations that are all under active development. Maybe once the packages/standards for TOML are finalized and the only thing left to do is fix bugs, maybe then we'll see a TOML implementation in the standard library, but in the modern age where you can fetch dependencies at any time, there's honestly not that much incentive for CPython developers to take on more burden of maintenance and development for the already-massive standard library.


> the standard library is where packages go to die

I totally understand that sentiment. But that does seem to contradict the "batteries included" philosophy. Then again pip itself seems like one of the batteries you would expect to be included, so maybe that philosophy is just no longer as relevant to python.


my understanding is that the batteries included philosophy wasn't really successful. probably it would have been better if they had a smaller standard library providing core language-level functionality and then many blessed packages with independent versioning


It was successful at the time. Remember Python is nearly 30 years old now, and has always had the worst dependency management of any major language.


I don't know, C++ dependency management is pretty terrible.


Python now includes pip itself in the standard library (which is a spectacularly poor decision IMO).


Nit: the standard library includes a module to fetch and install pip. It doesn't include pip itself.


People keep confusing program configuration files and program stdin and are surprised that TOML is not a good fit for it.

If you are provisioning servers from a TOML file, you're doing it wrong. That's your program's main input, not config.

If you have deeply nested values, chances are you are again confusing configuration vs main input.

JSON and CSV are great main input formats.

TOML is a great configuration format.

YAML and XML try to do both, and end up doing average on either, but being abused for all of them (looking at you Ansible and Solr).

So you should use each format for the correct purpose, as usual.

A config file looks like ~/.ssh/config or /etc/nginx/nginx.conf, not like ~/.ssh/authorized_keys or /etc/nginx/site-available/default.conf.

Just because we love to put a lot of data input in config folders (.config and /etc are full of input data) and call that configuration doesn't make it so.

How to know if something is configuration and something is input?

Conf data is usually only manually edited (authorized_keys changes with the life cycle of the program) and doesn't contain logic (default.conf is full of logic). Conf is also how the program is going to behave when performing its main task, the bootstrapping state, not the data used for its main task (so not IPs for provisioning servers, which are not meta but the main course, or default.conf, which nginx uses to perform the main task). Conf rarely changes, main input often does. Conf is not piped or redirected to stdin.
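A rough sketch of that split, reusing the server list from the top of the thread as the input. Everything else here (`plan_builds`, the `[build]` section, `template`, `dry_run`) is a hypothetical name chosen for the example. The config says how the tool behaves, the CSV rows are what it works on, and in real use the rows would come from `sys.stdin` rather than a string.

```python
import configparser
import csv
import io

# Behaviour of the tool: rarely changes, edited by hand.
CONFIG = """\
[build]
template = ubuntu-22.04
dry_run = yes
"""


def plan_builds(config_text, rows):
    """Combine behaviour settings (config) with per-run records (main input)."""
    cfg = configparser.ConfigParser()
    cfg.read_string(config_text)
    template = cfg.get("build", "template")
    dry_run = cfg.getboolean("build", "dry_run")
    prefix = "would build" if dry_run else "building"
    return [
        f"{prefix} {row['Server']} ({row['IP']}) from {template}"
        for row in csv.DictReader(rows)
    ]


# Main input: changes on every run, would normally be piped in.
INPUT = io.StringIO(
    "Server,Subnet,IP\n"
    "db01,databases,10.1.2.3\n"
    "db02,databases,10.1.2.4\n"
)

plan = plan_builds(CONFIG, INPUT)
for step in plan:
    print(step)
```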


Self plug: I wrote a simple web app in the same vein as jsonlint and yamllint for toml files. https://toml-lint.com


Whoa cool. I remember seeing Tom’s initial proposal/spec for TOML on here years ago. Like maybe a decade even. Glad to see it’s still out there and actually been implemented


It's used often in rust. Rust crates are configured with toml files.


TOML used as an `.ini` file is nice, but it gets verbose and confusing pretty quickly if you try to store more complex data in it.

`[[foo]]` syntax is not obvious at all. It's a bit weird that top-level declarations use ini-like syntax, but values can use JSON-like syntax, and there are multiple syntaxes to express the same data structure.

It doesn't strike me as very elegant, or obvious. Still, it is less annoyingly-inflexible than JSON, less verbose than XML, and less footgunny than YAML.
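To illustrate the "multiple syntaxes for the same data structure" point, here is a small sketch. It assumes Python 3.11+, where `tomllib` is in the standard library, with the third-party `tomli` backport as a fallback on older versions. The key names are arbitrary.

```python
try:
    import tomllib  # standard library from Python 3.11 onward
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

# Array of tables, written with the [[foo]] header syntax.
TABLE_SYNTAX = """
[[foo]]
name = "a"

[[foo]]
name = "b"
"""

# The same data, written as an array of inline tables.
INLINE_SYNTAX = """
foo = [ { name = "a" }, { name = "b" } ]
"""

as_tables = tomllib.loads(TABLE_SYNTAX)
as_inline = tomllib.loads(INLINE_SYNTAX)

# Both spellings parse to {"foo": [{"name": "a"}, {"name": "b"}]}.
print(as_tables == as_inline)
```

Each `[[foo]]` header appends one more table to the array named `foo`, which is the part that is not obvious if you are coming from INI.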


I was constantly reading/writing config files and the structure and comments were what convinced me TOML was worth trying. I did end up going back to JSON for the schema validator bit, but I just found out that TOML is still working on it: https://github.com/toml-lang/toml/pull/116


If you need schema validation, you should probably be using XML


I can't stand TOML. I find its method of representing arrays and tables of objects so much harder to read and reason about than YAML.


TOML sucks in so many ways. It is about 8x better than YAML, INI, or JSON though. It is good enough that I can ignore its flaws.


JSON should be banned for configuration, if only for the fact it doesn’t support comments.


Or trailing commas in lists/objects. That is 90% of what burns me every time I write JSON by hand.
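A quick check with Python's standard `json` module shows both complaints in action. The helper name `is_valid_json` and the sample documents are just for illustration.

```python
import json


def is_valid_json(text):
    """Return True if `text` parses as strict JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


clean = '{"ports": [80, 443]}'
trailing_comma = '{"ports": [80, 443,]}'
with_comment = '{\n  // web ports\n  "ports": [80, 443]\n}'

for label, doc in [("clean", clean),
                   ("trailing comma", trailing_comma),
                   ("comment", with_comment)]:
    print(label, "->", "ok" if is_valid_json(doc) else "rejected")
```

Only the first document is accepted; the other two are rejected by a strict parser.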


Use HJSON, json with comments that doesn't kill you for a trailing comma.

All the benefits of JSON, no new syntax to learn, easy to fit in place in new or existing systems.

https://hjson.github.io/


Seems better than Json5.

All languages should make unnecessary characters optional like trailing semicolons and commas.


Yes. It's crazy that people somehow decided that XML is evil for configuration and JSON is good. YML is barely OK.

JSON is good for data exchange. Configs? That's crazy talk.


JSON is great, but not for config: It was never really meant to be written by humans.


I agree totally with you, but designing a language that's not "meant for humans" is an _incredibly_ stupid idea. Of course humans will try and write any protocol by hand. Who's gonna stop me?

If it looks like text, make it as easy as possible for humans to consume and write by hand. Period.

That's a small but very important reason why HTTP rules the world.


Really? How so? I thought the whole reason it won over XML for HTTP stuff was precisely because it could be easily written and read by humans.


By raw bulk most JSON is not written by humans, it is an interchange format. Yes of course a human is going to write JSON in snapshot tests. And it is for sure a feature that it is easily written by humans.

Notice I did not mention readability. JSON was always meant to be human-readable; and its popularity is a testament to the rapid turnaround on debugging that the readable aspect of JSON affords. This is, say, in contrast to BSON which is similar but not humanly readable, and unsurprisingly less popular despite its bandwidth advantages.

Config formats are, by raw bulk, generated by humans, and need to be readable (ideally reproducibly across implementations) by computers.


Of course it was meant to be written by humans - every programming language and text based data format was meant to be both written and read by humans. JSON is basically Javascript objects, and Javascript was definitely meant to be written by humans.


I hope this catches on some more. I'm tired of being bitten by all the issues that I encounter with yaml configs.


Same, but I doubt it. It's been around for many years. I had to migrate services away from it recently, even.


Just bring up Toml to be used for your next project whether it's open source or company project. That's how things get more popular.


What are these issues? I've been using YAML comfortably for years


This link has a decent overview: https://www.arp242.net/yaml-config.html

There's also the NO/Norway problem (although it seems this instance of the problem may be fixed if you're always using newer versions): https://allan.reyes.sh/programming/2018/06/20/The-YAML-NOrwa...



I've always enjoyed using HOCON in Scala projects. I wish it saw more adoption.


And so the language churn continues. This time for configuration. TOML is fine, I guess. It doesn't seem to offer much benefit over Yaml or JSON. In the absence of those it would be great. But in a world where Yaml and JSON exist it's yet another structured data format you have to learn.


It's not like TOML is new. I'm familiar with it because of Python's pip tool, and InfluxDB/Telegraf. Apparently it's also used in Rust's Cargo.


I'm personally sick of yet-another-config-formats. Why did designing metaformats become cool?

They all have warts because using the same symbolic notation for data and structure leads to encoding issues that have cognitive overhead.

So instead of getting used to the pratfalls involved in making mistakes in one format/library set, we can do it in a whole bunch of different ones. And of course if anyone uses the new format much, we'll get multiple, slightly incompatible versions, all with different bugs.

Thanks?


> I'm personally sick of yet-another-config-formats. Why did designing metaformats become cool?

I can't seem to find a date for it, but I think TOML is pretty old, so probably predates a lot of these YACFs.

I think that the problem is that it's such an easy workflow:

1. No existing config format is perfect. Look at this should-be easy thing I want to do that isn't easy!

2. Create a new format that makes the desired should-be easy thing actually easy.

If you have one or two pain points, then it's really easy to address them, and your life gets so much better for a little while. If you're lucky and you're a good YACF designer, then your life gets so much better for quite a while—and, by the time you realise why the design isn't perfect, you're already invested in it.


> I can't seem to find a date for it, but I think TOML is pretty old, so probably predates a lot of these YACFs.

Not _that_ old. TOML's first release was in 2013.

But as a testament to the config format, it was only about a year old when Rust's cargo adopted it fully.


3. Realize that making a good general purpose config language that works well for a variety of different use cases is actually quite difficult.


Unicode handling seems nice:

> \uXXXX - unicode (U+XXXX)
>
> \UXXXXXXXX - unicode (U+XXXXXXXX)


I use only yaml for humans and json for machines. i see many projects that use toml but i just hate it. maybe toml is not bad itself but i have never seen a project whose toml config was really human-friendly. it always looks like the ugly offspring of some ini monstrosity. yaml all the way, hands down.




