Hacker News
TOML, Tom's Own Markup Language (github.com)
155 points by charlieok on Feb 24, 2013 | 169 comments

I note that, like many nascent specs, TOML does not document the escape sequences accepted in strings. Nor does it exhaustively specify integer and float formats - rather ironic for a spec that advertises "TOML is designed to be unambiguous and as simple as possible."

The limitation on array types seemed fairly arbitrary at first glance, but after thinking it over I realized it aided compatibility with languages that do not support homogeneous arrays. Though as far as the types go, I would add boolean and perhaps non-quoted strings for single-word values.

Now that the technical criticism is out of the way, holy crap this guy is arrogant.

I don't know if you can call him "arrogant". None of it read as very serious to me; I more assumed he was just having fun.

fwiw: This is how I read it too; I didn't get any sense of arrogance.

Well, he's the CEO of Github, and he's probably been drinking, so I suppose a little bit of arrogance is expected.

The type of person who is arrogant while drunk is generally arrogant while sober as well ...

For the love of baby jesus, why can't people get the 'H' right.

As proper nouns become more common, they first lose any capitalization in the middle of the word, and then finally capitalization of the initial letter. It's human language. It happens.

Especially when their own logotype has it in all lowercase.

That wouldn't at all back up calling it "Github."

Yes, and I think it's arrogance on the part of Wordpress (there I did it) folks to insist that everyone capitalize it in the prescribed manner. Especially since they weren't consistent from the get-go. They even went so far as to make Wordpress (trolol) itself filter content to be capitalized if someone tries using the lower case p. http://justintadlock.com/archives/2010/07/08/lowercase-p-dan...

It's to do with protecting their trademark though. That whole human-language process of turning proper nouns into normal words: companies don't like that at all. In the case of WordPress, there's a lot of potential for abuse if anybody can call their own system by that name.

Hehe, that reminds me of iphones auto-correcting "iphone" to "iPhone". Jeez that would irritate me, I'm trying to write a text message, not look like an iDouche...

Unlike TOML, most people are case insensitive.

Don't you mean "Jesus"?

github likes daring escapades with sharks


> Now that the technical criticism is out of the way, holy crap this guy is arrogant.

Tom's not being arrogant; he's just being irreverent.

I'm just wondering what the point of having homogeneous arrays is when the dictionaries aren't...

Seconded. Otherwise it seems quite nice, but this one inconsistency stands out.

> If it's not working for you, you're not drinking enough whisky.

>I realized it aided compatibility with languages that do not support homogeneous arrays.

Don't you mean languages that only support homogeneous arrays (or languages that do not support non-homogeneous)?

The spec says that the array elements must all be of the same type, thus homogeneous.

If I am mistaken, can you please explain why?

I meant "only support homogeneous arrays," or "do not support heterogeneous arrays," and apparently got the two wordings mixed up. Thanks.

looks like you can put string arrays and int arrays into the same array though.

"data = [ ["gamma", "delta"], [1, 2] ] # just an update to make sure parsers support it"

so in a static language it would be like: Array<Array<???>>. Not sure this makes any sense
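FWIW, the reading most parsers seem to have settled on can be sketched as a one-level check, where any nested array counts as the single type "array" (a hypothetical Python sketch, not anything from the spec):

```python
def toml_type(value):
    """Collapse Python types into coarse TOML-level types."""
    if isinstance(value, bool):
        return "boolean"   # check bool before int: bool is an int subclass
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"     # nested arrays are just "array"; contents judged separately
    raise TypeError(f"unsupported value: {value!r}")

def is_homogeneous(array):
    """True if every element has the same TOML-level type."""
    return len({toml_type(v) for v in array}) <= 1

# [["gamma", "delta"], [1, 2]] passes: both elements are simply "array"
assert is_homogeneous([["gamma", "delta"], [1, 2]])
# [1, "a"] mixes integer and string, so it fails
assert not is_homogeneous([1, "a"])
```

Under that reading the `data` example above is fine, and the static-language answer is "array of arrays", with each inner array carrying its own element type.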

Urg. Off topic, but I dislike this perl/ruby tendency of calling hash tables hashes. When I see the word hash, I always think of a value (ie a hash code) and not a data structure. Why couldn't they call it a hash map, hash table, map, table, dictionary etc like all the other languages...?

I agree with that. 'map' or 'dictionary' are the best choices I think (or 'associative array', but why bring arrays into it). That's the interface, of which a hash table is just one possible implementation.

I've never liked 'dictionary'. The analogy isn't at all apparent to me. A dictionary explains what words mean. The thing we're talking about doesn't explain what keys mean. (Someone who spends most of his time writing python here.) 'map' or 'mapping'.

It's about the operations. One Does Not Simply (tm) read a dictionary. One instead performs a “lookup” for a particular item. The dictionary is designed to make this lookup fast and reliable, which matches the purpose of these data structures in software.

A dictionary maps words to their definitions. The words are the keys, the definitions are the values. Seems reasonable to me. Though as another predominantly pythoner, I do prefer map as well.

In a dictionary the value (meaning) is often (partially) implied by the key (word), by etymology etc. In the data structure there need be no relationship between the key and value other than the fact that they are a key-value pair in this instance. It introduces messy cultural concepts into what should be a clean, abstract concept.

I think dictionary is a useful high-level analogy. Small key objects mapping to potentially large, and often structured, value objects. (By structure, I mean the definition in a dictionary often includes fields like pronunciation and origin.)

Sadly "map" is also the name of the critical "map" function which operates on lists. Maybe with an indefinite article ("a map") it's clear enough.

Sure, so go the Lua route: table. Or the python route: dictionary. If neither of those do it for you, how about "mapping"?

Hash (and hash map, hash table etc) leak too much implementation detail. What if you want a tree-based mapping instead? I like how in C++ it's map (for ordered, rb-tree based maps) and unordered_map (for unordered, hash table based maps).

I'm going to try out "mapping" -- good suggestion.

This bit of ruby should take care of it

    HashMap = HashTable = Map = Table = Dictionary = Hash
And if you're feeling adventurous,

    Object.send :remove_const, :Hash

Please never, ever do this

If doing this is such a bad idea then why is it so easy?

If doing:

  #define BEGIN {
  #define END }
were such a bad idea, then why is it so easy?

Maybe the language is poorly designed.

Or maybe the connection between being able to do something and it being a good idea to do something is just in your head.

In my experience, making obviously bad things difficult or impossible improves reliability. This idea certainly resides within my cranial cavity, but that doesn't necessarily make it wrong.

How could:

  HashMap = HashTable = Map = Table = Dictionary = Hash
possibly not qualify as "obviously bad"? The only reason you've offered up is because it is easy...

I think this is fine. The obviously bad part is being able to remove/change constants, especially as these changes are global.

The obviously bad part is that you pollute the global namespace for no reason other than laziness. When someone comes across code that uses a "Table" object interchangeably with "Dictionary" and "Hash", then he's going to have to look through the source code to find this bizarre line only to find out that you renamed a built-in container for no good reason.

Yes, I suppose that's also true.

I approve of this. Don't listen to that other guy. This should be standard.

One addition though:

    Cocktionary = HashMap
"Dictionary" never really made sense.

Because the primitive is called Hash

    my %hash = ();
So naturally people talk of Hashes etc. I understand where you're coming from, but it's really not very important, and it would be more confusing to talk of Hash Tables as learners would naturally look for HashTable in the stdlib.

That was the question: why is the class named Hash instead of HashMap or Dictionary? Was it done intentionally, or was it just an accident because someone did not know English very well?

I agree Map or Dictionary would have been fine too, but it's so widely used it needs to be easy to type, so two words is not great (HashMap). However I suspect it was just named that way following Perl (written by an English speaker). Obviously it's far too late to change it now, and I can't say it bothers me or most Ruby users. It's something you get used to very quickly.

So do what python did: dict

Though even HashMap isn't bad because typing is a solved problem - with auto completion and touch typing two words really aren't an issue in my mind.

People get used to living with all kinds of things, but that doesn't make them any better. Yes I'm aware that this applies equally to my typing comment as to you having got used to hash.

> why the class is named Hash instead of HashMap

Why in golang is a function denoted by func instead of function?

I'd guess it's because programmers prefer fewer keystrokes as long as the term remains sufficiently mnemonic.

The name 'func' doesn't collide with anything else though. It is an entirely more reasonable abbreviation.

    Because we need a decent human readable format 
    that maps to a hash and the YAML spec is like 
    600 pages long and gives me rage. No, JSON 
    doesn't count. You know why.
I do not know why, and would love it if someone could explain.

Other than comments, I see no difference between the two.

Also, "human readable" is not accurate; it should be "hacker readable". You know, IT folks are the only target audience for those files.

    name = "Tom Preston-Werner"
    organization = "GitHub"
    bio = "GitHub Cofounder & CEO\nLikes tater tots and beer."
    dob = 1979-05-27T07:32:00Z # First class dates? Why not?

        "owner": {
            "name": "Tom Preston-Werner",
            "organization": "GitHub",
            "bio": "GitHub Cofounder & CEO\nLikes tater tots and beer.",
            "dob": "1979-05-27T07:32:00Z"
        }

In JSON, that datetime won't deserialize to a datetime instance in your language with a conforming parser. Further, JSON has no comments (this is a killer for a configuration format).

There are ways to fake comments by using extra fields:

{ "what i want": "what i really really want", "ಠ_ಠ":"Ignore the eyes" }

I think you just helped Tom make his argument... :)

Meteor has a way of serializing and deserializing datetime values: http://docs.meteor.com/#ejson

In the author's own words[1]:

{ 'because': { '80': 'percent' }, {'of': 'JSON', 'is': 'brackets' } }

[1] https://github.com/mojombo/toml/issues/2#issuecomment-140029...

That's actually not valid JSON -- should use double quotes

{ "because": { "80": "percent" }, {"of": "JSON", "is": "brackets" } }

That's not valid JSON either! (The second value has no key). Needs to be:

{ "because": [{ "80": "percent" }, {"of": "JSON", "is": "brackets" }] }

And this is only 10% curly braces, not counting spaces.

Despite that, the thread easily illustrates the difficulty of writing valid JSON by hand.

JSON wasn't invented, it was discovered, from a long evolution of programming languages. The punctuation isn't ceremony. It's the amount needed for it to be concise (clear and terse, not just terse).

The difficulty level is hardly extreme. It is not an unreasonable challenge to learn that writing an array of elements requires opening and closing brackets.

The issue here might be that JSON has become widely used for two things:

   Data marshalling/transfer
   Config formats
For the latter, as they are typically written by hand, it's not particularly appropriate as the syntax is noisy and multiple nesting with brackets tends to lead to errors, even if you understand it perfectly well in principle, and of course there are no comments, no datetimes etc.

I imagine this is intended as a saner version of YAML for configs.

FWIW I've also recognized the problem and wrote and implemented my own configuration syntax, which makes the aforementioned JSON data look like:


The implementation, which is in Haxe and has an informal spec in comments, can be seen here: https://github.com/triplefox/triad/blob/master/dev/com/ludam...

I didn't view bracketing as the enemy (which seems to be the focus of a lot of config syntaxes) but rather the combination of multiple types of bracketing, plus start-and-stop usage of shift keying. I only have two types of brackets, the sequence [ type and the long string {" type, and you can "feel" when you're writing a long string because of that sudden need to use the shift.

I've /never/ had a problem writing JSON by hand.

Try this in TOML:

key = "value1", "value2"

The same mistakes can be ignorantly made in any markup.

It isn't a markup language. I'd like to correct this mistake that was started by YAML. :/ http://en.wikipedia.org/wiki/Markup_language (Using the bacronym "YAML Ain't Markup Language" only helped it grow, making more people confused as to what a Markup Language is.)

I like it, though. More grepable than JSON or YAML, with the way it handles nested keys using dot notation.

Like hell it has. People know what ML means at the end of a file format.

I just wrote what it said in the README. I assume that's why he named it thus. I agree it isn't a markup language.

Yes, this was directed at the linked content, not the title. Good on ya for not being clever with the title. :)

I've always been a fan of the .INI syntax but the lack of a standard (which I think Microsoft should have championed) made the format hard to use consistently. There have been attempts at standardization [1] but, alas, they never spread widely enough. In light of the above, I'm glad to see an INI-derived format with a real spec -- not necessarily because it might replace JSON but because it might replace INI.

Speaking of INI, for the longest time the killer app for INI files for me was persistent data storage in batch scripts (.bat/.cmd files in Windows 9x/NT). Using a command line utility like [2], or a similar program from IBM that sadly wasn't legally redistributable, you were able to achieve persistence with minimum effort, which would otherwise be difficult to program in batch. I even wrote a portable clone of inifile.exe for MS-DOS and Linux to be able to reuse my scripts more easily. TOML would sure benefit from the same.

[1] http://www.cloanto.com/specs/ini/

[2] http://www.horstmuc.de/wbat32.htm#inifile

The biggest concern with JSON seems to be the lack of comments. So what voodoo is Sublime Text 2 performing? Why can't we just use that?

    // Sets the colors used within the text area
    "color_scheme": "Packages/Color Scheme - Default/Monokai.tmTheme",

    // Note that the font_face and font_size are overriden in the platform
    // specific settings file, for example, "Preferences (Linux).sublime-settings".
    // Because of this, setting them here will have no effect: you must set them
    // in your User File Preferences.
    "font_face": "",
    "font_size": 12,

    // Valid options are "no_bold", "no_italic", "no_antialias", "gray_antialias",
    // "subpixel_antialias", "no_round" (OS X only) and "directwrite" (Windows only)
    "font_options": [],

    // Characters that are considered to separate words
    "word_separators": "./\\()\"'-:,.;<>~!@#$%^&*|+=[]{}`~?",

    // Set to false to prevent line numbers being drawn in the gutter
    "line_numbers": true

Some JSON implementations support comments, others don't. If you know the one you use supports (and will continue to support) comments, go ahead and use it. It just won't be portable.

Douglas Crockford himself suggests you "Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser." That sounds like a reasonable workaround.


I don't know specifically about Sublime Text 2, but from my own experience writing a configuration library which accepts JSON as an input, you usually strip out those comments before feeding the resulting content to your appropriate json_decode function.

1. Read the contents of your JSON file.

2. Strip out the comments with some regex foo or such.

3. Feed the remaining contents to your JSON parser.
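One caveat with step 2: a bare regex will happily eat a "//" that lives inside a string value (think URLs). A character scan avoids that. A sketch in Python; the helper name is mine, and it handles only //-style line comments:

```python
import json

def strip_line_comments(text):
    """Remove // comments from JSON-with-comments text, skipping string
    literals so that values like "http://..." survive intact."""
    out = []
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):  # keep escaped chars, incl. \"
                out.append(text[i + 1])
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif text[i:i + 2] == "//":
            i = text.find("\n", i)                # drop the rest of the line
            if i == -1:
                break
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)

commented = '''{
    // Sets the colors used within the text area
    "color_scheme": "Packages/Color Scheme - Default/Monokai.tmTheme",
    "homepage": "http://example.com"  // the // inside the string is kept
}'''

config = json.loads(strip_line_comments(commented))
```

After stripping, `config["homepage"]` still contains the full URL, while both comments are gone.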

> There should only be one way to do anything.


> There are two ways to make keys.

I guess I haven't had enough whiskey yet.

> Tabs or spaces. TOML don't care.

And two ways to indent.

Given Jekyll's enormous backlog of issues and pull requests[0], can we expect this to be maintained or supported any bit beyond the late night drunken brain fart that this is?

[0] https://github.com/mojombo/jekyll

Parker Moore and I (along with many contributors) have been spending quite a bit of time on Jekyll recently. Over the last 30 days we've merged 17 pull requests and closed 62 issues. We're ramping up for a 1.0 release and there's a brand new website in the works. You can check it all out on the master branch.

Tens (or possibly hundreds) of thousands of people use Jekyll now. It's interesting to note that Jekyll started out as a "brain fart" as well. Just one amongst hundreds of blog engines. I wrote it because I was dissatisfied with everything on the market, and I thought I could do something different and better, to serve my own needs. I open sourced it, because I thought others might get a kick out of it.

I'd wager that most of the great things we use today started as nearly ephemeral emanations from someone's mind, often late at night, or helped along by a snifter of brandy. The funny thing is, if you never try out your crazy ideas, you'll never know which ones might have changed the world.

Tom, I've been waiting two years on a pull-request to the official gem which adds the ability to view open/closed issues in private repositories.


The pull has 19 people asking for integration and has some stellar comments:

"seriously? year long pull request with two lines of changes?"

"I normally would think that the github gem features for paying users would get a lot of attention from the folks at github..."

I even tried getting it pulled via pre/post-sales emails to enterprise@github.com (I'm an enterprise customer), which was met with a "yeah, I'll tap him on the shoulder to integrate". A year later, nothing.

That project isn't actively maintained at the moment (nor is it an official GitHub project), but I'll see what I can do tomorrow to get it merged in and released. Sorry for the frustration!

Cool. I'm very happy that you're back actively working on Jekyll! Guess my statement was a bit outdated then. Take it with the appropriately sized grain of salt.

Note: there's nothing wrong with releasing brain farts; quite the contrary. I didn't at all mean to imply that you shouldn't do that.

> I'd wager that most of the great things we use today started as nearly ephemeral emanations from someone's mind, often late at night, or helped along by a snifter of brandy.

Scientific support for this:




And how is this better than xml?

  <owner name="Tom Preston-Werner"
         bio="GitHub Cofounder &amp; CEO\nLikes tater tots and beer."
         dob="1979-05-27T07:32:00Z" />

  <database server=""
            ports="8001 8001 8002"
            enabled="true" />

    <alpha ip=""
           dc="eqdc10" />
    <beta ip=""
          dc="eqdc10" />

Let's see:

- No native support for numbers, dates, booleans or lists. The latter can be implemented using subelements, but it's so cumbersome that you skimped on that and used a non-typed string instead (the database ports).

- Redundant verbosity. Root elements, closing tags, way too much crap to be manually inserted.

- XML parsers are huge, complex beasts which have no place in many smaller applications.

- Being XML, it leaves way too many possibilities for crappy developers. Namespaces in config files, oh joy!


Most of your points are environment specific and I think that you forgot the strongest of them - "xml APIs usually suck". In .net they are non-issues. And about being cumbersome and verbose, the point I tried to make is that you don't have to be zealous and put every small piece of data in a separate element. No reason not to put data in attributes or even in comma/whitespace separated strings, if that piece of data can be extracted in one short line of code.

> Most of your points are environment specific

How so? .Net can't magically discover the types of values or prevent developers from abusing the format.

> you don't have to be zealous and put every small piece of data in a separate element.

But then you're layering a complex format with a custom application-specific parser, with an unknown syntax (e.g. spaces vs commas, are ranges supported, etc). It obviously can be done, but it's a mess.

Agree. And if XML supported unnamed closing tags, it'd lose a lot of its rep for verbosity. Although in this case you'd just be replacing </servers> with </>, in other documents it is a lot more noticeable.

I will note this isn't a valid XML document: you have no root node.

please let's not bring XML into this. last thing we need is someone inspired to say let's all go back to XML.

Go "back"?! There are lots of places where xml is alive and well and config files is one of them. And you can see why - empty elements with attributes look rather concise, and without all that punctuation noise JSON has.

Somewhere, in a small room in a larger building owned by a gigantic corporation, a SOAP programmer just felt validated.

cough every java project ever cough

Typesafe Config says hello.

Ruby parser here: https://gist.github.com/jm/5022483 Please fork and improve. :)

    # line 36
This will have trouble with a line like:

    tweet = "TOML is #awesomesauce"

    # line 43
    array = $1.split(",").map {|s| s.strip.gsub(/\"(.*)\"/, '\1')}
You should recurse into coerce here, or you'll just lose types. (Also you're assuming arrays of strings.)

    array = $1.split(",").map {|s| coerce(s) }

You're also not dealing with nested key groups. (eg. [servers.alpha]).


That being said, naïve string parsing is a terrible way to build a new markup language implementation. It's the reason the Markdown landscape is such a mess[1]. What this really needs is a formal grammar.

[1]: I actually tried to fix that by writing a formal lexer & informal parser for Markdown in a side-project of mine[2]. It's not quite there yet, because for practicality reasons I wrote my own parser instead of a formal AST-generating parser.


[2]: http://getmacchiato.com

Yup, array handling is weak. I was going to recurse into coerce, but then the examples made it seem like only strings would be accepted in arrays (he put "8000" in there rather than just 8000). I'll get clarification.

Made it into a proper project/gem here if you want to file issues: https://github.com/jm/toml

And good call on the nested key groups. Shouldn't be hard to knock that out.

Don't have time to work on it now, but it looks like you'll need to recurse while parsing arrays. Right now, only arrays of strings that don't contain commas are handled correctly.

Cool, the market isn't fragmented enough already.

Yes, being the CEO of Github does give you the power to do whatever you want.

Of course, drinking and coding is a great idea. The Ballmer peak isn't a joke, it's a way of life.

What's this fragmentation you speak of? Surely you won't be forced to use one particular format you don't like. An API usually supports multiple formats.

Are arrays of maps a bad idea? Someone posted a pom.xml file pointing out how horrible it was, and I thought to myself "How would this look in toml?"

I was all set to try a translation when I hit this section:

    <!-- A generated library for Google+ APIs. Visit here for more info:


How would I represent this in TOML?

  groupId    = "com.google.api-client"
  artifactId = "google-api-client"
  version    = "1.13.2-beta"

  groupId    = "com.google.api-client"
  artifactId = "google-api-client-servlet"
  version    = "1.13.1-beta"
That's not right, it clearly should be an array, but I don't think the standard supports it. At best I would think you'd have to use parallel arrays

  groupIds    = ["com.google.api-client", "com.google.api-client"]
  artifactIds = ["google-api-client"    , "google-api-client-servlet"]
  versions    = ["1.13.2-beta"          , "1.13.1-beta"]
and that's just not pretty.
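If you do resort to parallel arrays, the consuming code can at least zip the records back together. A sketch in Python, with names mirroring the hypothetical example above:

```python
# Reconstruct the list of dependency records from the three parallel
# arrays. All names here are illustrative, taken from the example above.
group_ids    = ["com.google.api-client", "com.google.api-client"]
artifact_ids = ["google-api-client", "google-api-client-servlet"]
versions     = ["1.13.2-beta", "1.13.1-beta"]

dependencies = [
    {"groupId": g, "artifactId": a, "version": v}
    for g, a, v in zip(group_ids, artifact_ids, versions)
]
```

It works, but it pushes the structure out of the config file and into every consumer, which is exactly what a list of maps would avoid.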

why not

  [dependencies.com.google.api-client]
  artifactId = "google-api-client"
  versions   = "1.13.2-beta"

  [dependencies.com.google.api-client]
  artifactId = "google-api-client-servlet"
  versions   = "1.13.1-beta"

but the syntax is colliding there: you define dependencies.com.google.api-client.artifactId twice.

Also it creates the key value maps:

which shouldn't exist, so that doesn't seem right either.

Good point, I didn't notice that one. Certainly an interesting case. I don't think the keys that exist but shouldn't are as big an issue, but it is certainly not an easy problem to solve here.

I'm trying to lay down the issues with TOML, so I added this in…


Ah, the power of fame. Implementations spreading like weeds. Four already in javascript, even though the spec is not anywhere near finished:

    npm search toml
    npm http GET https://registry.npmjs.org/-/all/since?stale=update_after&startkey=1361700343737
    npm http 200 https://registry.npmjs.org/-/all/since?stale=update_after&startkey=1361700343737
    NAME                  DESCRIPTION                   AUTHOR            DATE      
    node-toml             TOML parser                   =ricardobeat      2013-02-24 10:08
    toml                  TOML parser for Node.js       =binarymuse       2013-02-24 04:19  toml parser
    toml-node             TOML ====                     =thehydroimpulse  2013-02-24 08:01
    toml-parser           A TOML parser for node.js     =aaronblohowiak   2013-02-24 06:41

In a few years, maybe we can have "nfnpm" (noise-free node package manager).

What is wrong with JSON? Everything already supports it.

JSON has two drawbacks: a lack of comments (although you could add "#" keys in relevant places) and no binary support (arbitrary conventions include base64) but this doesn't support binary anyway.
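For what it's worth, the base64 convention looks like this in practice (a Python sketch; the "blob" field name is arbitrary):

```python
import base64
import json

# One common, entirely ad hoc convention for smuggling binary data
# through JSON: base64-encode it into a string, decode on the way out.
payload = b"\x00\x01\x02binary blob"

encoded = json.dumps({"blob": base64.b64encode(payload).decode("ascii")})
decoded = base64.b64decode(json.loads(encoded)["blob"])

assert decoded == payload
```

The catch is that nothing in the format marks the field as binary; both sides just have to agree on the convention.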

A few issues (although I do use JSON in config):

It isn't a friendly form of human input. My error rate is 50%+; you have to lint on save to catch things that are invisible to the naked eye.

No ability to override, extend or reference keys. This is most useful in config objects where, e.g., in a dev object you want to override the username and password for a database connection but not repeat all the other parameters.

No comments

You can override in pretty much the same way TOML does. Instead of replacing an underlying object, you update it with the values read from JSON.
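That update-instead-of-replace idea can be sketched as a recursive merge (Python, with invented config keys; deep_merge is my name for it):

```python
def deep_merge(base, override):
    """Return base with override's values merged in, recursing into dicts
    instead of replacing them wholesale."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

defaults = {"database": {"host": "localhost", "port": 5432, "user": "app"}}
dev      = {"database": {"user": "dev", "password": "hunter2"}}

config = deep_merge(defaults, dev)
# host and port come from defaults; user and password from the dev override
```

So the dev config only states what differs, which addresses the "don't repeat all the other parameters" complaint above without any support in the format itself.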

Lack of comments is pretty much a deal breaker for configuration. I see a lot of undocumented JSON used for configuration and I find it difficult to believe that is something we want for the future.

Lack of comments makes JSON much better for data exchange than formats with comments.

Lack of comments is at once annoying and beneficial: it forces your [JSON-based] configuration to be simple.

Like xcode project files...

couldn't you just use JS Object notation? you get comments, less cruft (having to quote string definitions in the key-space might be annoying).

Am I missing something with that?

What is the difference between "JS Object notation" and JSON? Google searches show they are the same thing. JSON definitely does not have comments http://www.json.org/

Hey Tom, why not use git's config format?


About the use of markup languages as config files: I see that in most Python apps, the config file is just another Python script, not another markup language. This makes sense in a dynamic language and feels natural. I understand it is a habit to use YAML in Ruby apps for config. Is it not possible to just use a Ruby script as a config file, since the script can be loaded dynamically? What are the pros and cons of using a markup language as a config file vs using just the app's language (Python/Ruby)?

Your configuration file might need to be read by more than one language.

It's also nice to have a configuration file mean the same thing regardless of its runtime environment.

Using script files to store config is convenient, but in some circumstances it could give malicious parties a chance to inject arbitrary code into your environment, in ways that parsing a pure data file could not.

It is also common for ruby configs to be script files. Rails, for instance, has the config/initializers folder which is a set of ruby scripts that will be run at startup. It comes down mostly to preference.

Dynamic config files are wrong for the same reason you don't want logic in your HTML templates.

This is quite nice but there are a few of things that I miss:

1. A way to have multi-line values for non array types

2. A more flexible number syntax (e.g. allow hex and binary integers, allow exponents on floats, allow NaN and +/-Inf)

3. Make it possible to have an extra comma after the last element on an array (as in Python)

4. Add a way to "include" another config file

#1 is important because some projects require all lines to have a max width of 80 characters, including in config files.

#2 is important for scientific/engineering projects. I think the current simple format shows that this format is a little too web centric. If this is going to be used for non-web stuff this is a must.

#3 is something that helps when putting this sort of configuration file in version control. Without this, adding an extra entry to a multi-line array creates a diff of two lines rather than one (since you must add a comma to the line above the one that you inserted). This is something I miss in JSON and which Python did just right (IMHO).

#4 would be useful in cases in which you want to provide a base configuration file for example.

Also, maybe I missed it but it is not super clear what would happen if you redefine an existing entry (I hope it is possible). Finally, is order important?

EDIT: typo.

+1 for #3

It's not just in the diffs. Trailing commas make editing the list easier.

It seems to me that YAML does this better already (with parsers which are already high-quality).

If we want simplicity, then why not make sure it is a subset of YAML?

Agreed, I'd much rather have a normalized subset of YAML without the object serialization stuff (I don't even understand why it's there: why take a format intended to be read by humans and then muck it up with complex and dangerous object serialization notation).

And without anchors and references too.

I agree, and high quality YAML parsers are generally available in every language one might want to use. I don't believe I've ever encountered a situation where I was unable to obtain one. Well, Rust comes to mind, but then Rust is really young and you could probably make one easily by just wrapping libyaml. That said, I might just write a TOML parser in Python just for kicks.

Python has ConfigParser which parses ini-style config files, I assume it uses some standard grammar.
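There isn't really one standard INI grammar (dialects differ on comments, nesting, and typing), but the stdlib module is easy enough to use. A minimal sketch (modern Python spells it configparser; the section and key names here mirror the TOML example earlier in the thread):

```python
import configparser

ini_text = """
[owner]
name = Tom Preston-Werner
organization = GitHub

[servers.alpha]
dc = eqdc10
"""

parser = configparser.ConfigParser()
parser.read_string(ini_text)

name = parser["owner"]["name"]      # everything comes back as a string
dc = parser["servers.alpha"]["dc"]  # "nesting" is just a dot in the section name
```

Note the contrast with TOML: ConfigParser has no typed values (no ints, dates, or arrays), so every consumer does its own coercion.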


Working node.js version: https://github.com/aaronblohowiak/toml

I just need to auth and push it to npm.

Nice work! It's too bad that this guy squatted on the `toml` package name without any implementation: https://github.com/BinaryMuse/toml-node

Update: It seems that he has provided an implementation now. I feel better about that; I can't stand when people squat on package names in Node.js.

(Meta: the edit link expired, hence the reply to myself)

I think he invited pull requests containing implementations.

Wrote my own dumb parser in CoffeeScript as an experiment, tried to publish as `toml-parser` :D


If this had been done by just any Tom, no one would have upvoted it.

So? If Git had been invented by anyone except Linus, probably hardly anyone would be using it today.

Don't want to be a naysayer, but what's wrong with something like CSON (CoffeeScript Object Notation)?

Would this be considered legal?

  [ [1,2], ["a", "b"] ]

I wondered the same thing. "No, you can't mix data types, that's stupid" leaves it ambiguous.

If you parse the outer array as just "array of arrays" (as each element is an array), you're not "mixing". But if we're supposed to be parsing it as "arrays of arrays of _type_", then we are mixing.
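Under that reading, a validity check is easy to sketch (hypothetical, since the spec doesn't define one):

```python
# Homogeneity per level: every element of an array must share one
# type; for arrays of arrays, recurse into each sub-array.
def homogeneous(arr):
    types = {type(x) for x in arr}
    if len(types) > 1:
        return False
    if types == {list}:
        return all(homogeneous(sub) for sub in arr)
    return True

print(homogeneous([[1, 2], ["a", "b"]]))  # True: outer level is all arrays
print(homogeneous([1, "a"]))              # False: mixed scalars
```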

This has now been clarified: it is legal.

Which I find to be quite odd [1]. Data types in `TOML` can be mixed at the level of the hash table. Arrays ought to be homogeneous.

[1] - https://github.com/mojombo/toml/issues/28

The README says not to mix data types because "that's stupid" (which I don't know if I agree with); but I don't know if that answers your question.

You have an array of array, which at that level, satisfies the spec. The children individually keep types contained.

That said, I'm going to assume the intent is to not allow that.

Would it be okay to understand arrays as integer indexed keys?

For example, wouldn't that mean something like:

  array = [ [1,2], ["a", "b"] ]
Be the same as this:

  0 = [1,2]
  1 = ["a", "b"]
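In Python terms that proposed reading is just (names illustrative):

```python
# Treat the outer array's positions as integer keys of a table.
array = [[1, 2], ["a", "b"]]
as_table = dict(enumerate(array))
print(as_table)  # prints {0: [1, 2], 1: ['a', 'b']}
```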

It's unspecified, I guess, but if you want to read into the spirit of it, which is to make it trivially supportable by strictly typed languages such as Haskell, you get either a [[Int]] or a [[String]].

I don't know how much it is going to help as you're going to have to wrap the values anyway to get a Map with heterogeneous values.

I'll be the first to ask: what's wrong with JSON?

Like JS from which it sprang, it lacks an integer type. Fortunately, parsers written for languages that do have integers can usually parse them correctly.

(If you don't know why this might matter, try opening your browser's Javascript console and evaluating 10000000000000001)
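The same rounding is easy to reproduce from any language by going through an IEEE-754 double, which is JavaScript's only number type:

```python
# 10000000000000001 exceeds 2**53, so a double cannot hold it exactly.
n = 10000000000000001
print(float(n))            # prints 1e+16 -- the trailing 1 is lost
print(n == int(float(n)))  # prints False
```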

That's my peeve, though. I suspect that Tom is probably more concerned with readability. TOML also looks like it can be parsed a line at a time and doesn't really need to do any recursive parsing, so you could probably parse a stream of it as it arrives, which I imagine is trickier with JSON.
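To illustrate the line-at-a-time point: a toy reader for the simplest subset ([group] headers plus key = "value" pairs) needs no recursion or lookahead. A sketch, not a conforming parser:

```python
# Toy line-oriented reader for a tiny TOML subset. Each line is
# classified on its own: comment, group header, or key/value pair.
def parse_lines(lines):
    doc = {}
    current = doc  # table that new keys land in
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = doc.setdefault(line[1:-1], {})
        else:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip().strip('"')
    return doc

print(parse_lines(['[owner]', 'name = "Tom"']))
# prints {'owner': {'name': 'Tom'}}
```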

IMHO, although it's very easy for humans to read, it's not quite as easy for humans to write.

I've always preferred INI over JSON for this reason.

I would prefer INI over JSON as well except that JSON lets me nest arrays and INI doesn't. So this looks really nice.

On the other hand I'd like to mix my data types as much as I darn well please.

I use it liberally, but the only thing that I find wrong with json is the [\u2028\u2029] issue:

{"The invisible character":"really messes with javascript

Copy the text and paste it in console.
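The underlying mismatch: U+2028 (LINE SEPARATOR) is legal raw inside a JSON string, but until ES2019 it was a syntax error inside a JavaScript string literal, so JSON pasted into JS source can break. A quick check from Python:

```python
import json

s = json.loads('"a\u2028b"')  # legal JSON: U+2028 may appear unescaped
print(len(s))                 # prints 3
print(json.dumps(s))          # prints "a\u2028b" -- dumps escapes it, which is JS-safe
```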

Nitpick: that's an issue with JavaScript, not JSON.

I'd agree if JSON had a different name, but given that it is called "JavaScript Object Notation" on the main page (http://json.org/) there's an implicit expectation that it's somehow related to javascript.

And there's an implicit expectation that JavaScript is somehow related to Java.

No comments. Lack of essential data types, forcing you to push things like dates and times into strings with hidden, unspecified semantics (this string parses as a date, that one as a time, etc.). Constrained by the limitations of JS floats (not even bigdecimal). Excessive significant punctuation. Insignificant whitespace (permitting a difference between the valid and pretty-printed forms). Looks like executable code and tempts you to parse it with eval.

Nothing, json's awesome. But like most data packaging schemes the finished product isn't designed with human-readability as a primary goal.

Edit: as mikegirouard points out it is much easier to read than (for example) serialized data, but still not as friendly as ini.

Dear industry, if you are going to add comments to JSON, please make it the /* */ variety.

Why? I dislike those comments.

Text editors will sometimes insert end-of-line characters in the name of word-wrap.

Using the end-of-line as a comment terminator would require significant refactoring of JSON parsers, which were previously at liberty to lump CR and LF together with SP and TAB. A starting and ending token, on the other hand, fits the pattern already required of a JSON parser.
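A common workaround is to strip /* */ comments before handing the text to a stock parser. A naive sketch (it doesn't guard against comment-like text inside strings):

```python
import json
import re

def loads_with_comments(text):
    # Remove /* ... */ spans, then parse as ordinary JSON.
    return json.loads(re.sub(r'/\*.*?\*/', '', text, flags=re.S))

print(loads_with_comments('{"a": 1 /* the answer */, "b": 2}'))
# prints {'a': 1, 'b': 2}
```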

> Text editors will sometimes insert end-of-line characters in the name of word-wrap.

In this decade, only a brain-damaged text editor would do that.

Can't wait to see the end result.

This reminds me of a new project I'm working on called Leewh. It's based on Wheel and kinda has the same overall function, but I needed something to get my project rolling quickly and using .ini and JSON syntax separately felt... well... too square, I guess.

I figured I'll come up with something more well rounded.

Spent a couple of hours this evening on a small parser written in CoffeeScript:


Error handling is rough and it still doesn't handle groups with dots ([alpha.beta]), but apart from that it should be fairly complete.

Another parser for Ruby is hiding over here: https://github.com/parkr/tock

It's empty:

    class Tock
      VERSION = "0.0.1"

The issues I find with TOML:


Looks interesting. What's the rationale behind disallowing something like "-.1234"? Is this how it's done elsewhere?

It makes parsing slightly easier. Also, '-.' might blur together, so you could misread '-.1234' as '-1234'.
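Requiring a digit on each side of the dot keeps the float grammar dead simple. A hypothetical pattern for such floats (the exact grammar is my guess, not the spec's):

```python
import re

# Requires at least one digit before and after the decimal point,
# so "-.1234" and "1." are both rejected.
FLOAT = re.compile(r'^-?\d+\.\d+$')

print(bool(FLOAT.match("-0.1234")))  # prints True
print(bool(FLOAT.match("-.1234")))   # prints False
```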

I like the format. A formal spec would be nice, to aid implementations in as many languages as possible.

I really like the hash key format.

Parsing JSON for arbitrarily nested keys is nasty, and this makes it extremely natural.
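For example, a dotted group name flattens the nesting that JSON forces you to walk. A sketch of the mapping (the helper name is mine):

```python
# Map a dotted TOML group name like "servers.alpha" onto nested dicts,
# creating intermediate tables as needed.
def set_group(root, dotted, table):
    node = root
    parts = dotted.split(".")
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = table

doc = {}
set_group(doc, "servers.alpha", {"ip": "10.0.0.1"})
print(doc)  # prints {'servers': {'alpha': {'ip': '10.0.0.1'}}}
```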

How is this different, or better, than the configuration format that git itself uses?

At first glance I thought this was crazy, but now I think this might actually work.
