
TOML, Tom's Own Markup Language - charlieok
https://github.com/mojombo/toml
======
LeafStorm
I note that, like many would-be specs, TOML does not document the escape
sequences accepted in strings. Nor does it exhaustively specify integer
formats and float formats - rather ironic for a spec that advertises "TOML is
designed to be unambiguous and as simple as possible."

The limitation on array types seemed fairly arbitrary at first glance, but
after thinking it over I realized it aided compatibility with languages that
do not support heterogeneous arrays. Though as far as the types go, I would add
boolean and perhaps non-quoted strings for single-word values.

Now that the technical criticism is out of the way, holy crap this guy is
arrogant.

~~~
burke
Well, he's the CEO of Github, and he's probably been drinking, so I suppose a
little bit of arrogance is expected.

~~~
nixgeek
For the love of baby jesus why can't people get the 'H' right.

~~~
stdbrouw
As proper nouns become more common, they first lose any capitalization in the
middle of the word, and then finally capitalization of the initial letter.
It's human language. It happens.

~~~
benatkin
Yes, and I think it's arrogance on the part of Wordpress (there I did it)
folks to insist that everyone capitalize it in the prescribed manner.
Especially since they weren't consistent from the get-go. They even went so
far as to make Wordpress (trolol) itself filter content to be capitalized if
someone tries using the lower case p.
<http://justintadlock.com/archives/2010/07/08/lowercase-p-dangit>

~~~
mcintyre1994
It's to do with protecting their trademark, though. That whole "human language
turns proper nouns into normal words" process is something companies don't
like at all. In the case of WordPress, there's a lot of potential for abuse if
anybody can call their own system WordPress or whatever.

------
dkersten
Urg. Off topic, but I dislike this Perl/Ruby tendency of calling hash tables
_hashes_. When I see the word _hash_, I always think of a value (i.e. a hash
code) and not a data structure. Why couldn't they call it a hash map, hash
table, map, table, dictionary etc. like all the other languages...?

~~~
charlieok
I agree with that. 'map' or 'dictionary' are the best choices I think (or
'associative array', but why bring arrays into it). That's the interface, of
which a hash table is just one possible implementation.

~~~
Myrmornis
I've never liked 'dictionary'. The analogy isn't at all apparent to me. A
dictionary explains what words mean. The thing we're talking about doesn't
explain what keys mean. (Someone who spends most of his time writing python
here.) 'map' or 'mapping'.

~~~
SoftwareMaven
A dictionary maps words to their definitions. The words are the keys, the
definitions are the values. Seems reasonable to me. Though as another
predominantly Python programmer, I do prefer map as well.

~~~
Myrmornis
In a dictionary the value (meaning) is often (partially) implied by the key
(word), by etymology etc. In the data structure there need be no relationship
between the key and value other than the fact that they are a key-value pair
in this instance. It introduces messy cultural concepts into what should be a
clean, abstract concept.

------
tzury

        Because we need a decent human readable format 
        that maps to a hash and the YAML spec is like 
        600 pages long and gives me rage. No, JSON 
        doesn't count. You know why.
    

I do not know why, and would love it if someone could explain it to me.

Other than comments, I see no difference between the two.

Also, "human readable" is not accurate; it should be "hacker readable", since
IT folks are the only target audience of these files.

    
    
        [owner]
        name = "Tom Preston-Werner"
        organization = "GitHub"
        bio = "GitHub Cofounder & CEO\nLikes tater tots and beer."
        dob = 1979-05-27T07:32:00Z # First class dates? Why not?
    
    
        {
            "owner": {
                "name": "Tom Preston-Werner",
                "organization": "GitHub",
                "bio": "GitHub Cofounder & CEO\nLikes tater tots and beer.",
                "dob": "1979-05-27T07:32:00Z"
            }
        }

~~~
kingkilr
In JSON, a conforming parser won't deserialize that datetime into a datetime
instance in your language. Further, JSON has no comments (this is a killer for
a configuration format).
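To make the datetime point concrete, here is a small Python sketch using only the standard library: `json.loads` hands back the `dob` field as a plain string, so the application has to know which fields are dates and convert them itself. The field name comes from the example above; the conversion step is one possible approach, not something JSON prescribes.

```python
import json
from datetime import datetime, timezone

doc = '{"owner": {"dob": "1979-05-27T07:32:00Z"}}'
owner = json.loads(doc)["owner"]

# JSON has no date type: a conforming parser returns a plain string here.
print(type(owner["dob"]))  # <class 'str'>

# The application must know this field is a date and convert it itself.
# fromisoformat() only accepts a trailing "Z" from Python 3.11 on, so swap
# it for an explicit UTC offset to work on older versions too.
dob = datetime.fromisoformat(owner["dob"].replace("Z", "+00:00"))
print(dob == datetime(1979, 5, 27, 7, 32, tzinfo=timezone.utc))  # True
```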

~~~
niggler
There are ways to fake comments by using extra fields:

{ "what i want": "what i really really want", "ಠ_ಠ":"Ignore the eyes" }

~~~
zwily
I think you just helped Tom make his argument... :)

------
benatkin
It isn't a markup language. I'd like to correct this mistake that was started
by YAML. :/ <http://en.wikipedia.org/wiki/Markup_language> (Using the backronym
"YAML Ain't Markup Language" only helped it grow, making more people confused
as to what a markup language is.)

I like it, though. More greppable than JSON or YAML, with the way it handles
nested keys using dot notation.

~~~
burntsushi
This has been fixed. [1]

[1] -
[https://github.com/mojombo/toml/commit/aa4ac1d6df1031ebe871c...](https://github.com/mojombo/toml/commit/aa4ac1d6df1031ebe871c472011c1a4f3fb00885)

~~~
benatkin
Like hell it has. People know what ML means at the end of a file format.

------
networked
I've always been a fan of the .INI syntax but the lack of a standard (which I
think Microsoft should have championed) made the format hard to use
consistently. There have been attempts at standardization [1] but, alas, they
never spread widely enough. In light of the above, I'm glad to see an INI-
derived format with a real spec -- not necessarily because it might replace
JSON but _because it might replace INI._

Speaking of INI, for the longest time the killer app for INI files for me was
persistent data storage in batch scripts (.bat/.cmd files in Windows 9x/NT).
Using a command line utility like [2], or a similar program from IBM that sadly
wasn't legally redistributable, you could achieve persistence with minimal
effort, which would otherwise be difficult to program in batch. I even wrote a
portable clone of inifile.exe for MS-DOS and Linux to be able to reuse my
scripts more easily. TOML would surely benefit from the same.

[1] <http://www.cloanto.com/specs/ini/>

[2] <http://www.horstmuc.de/wbat32.htm#inifile>

------
tericho
The biggest concern with JSON seems to be the lack of comments. So what voodoo
is Sublime Text 2 performing? Why can't we just use that?

    
    
      {
        // Sets the colors used within the text area
        "color_scheme": "Packages/Color Scheme - Default/Monokai.tmTheme",
    
        // Note that the font_face and font_size are overriden in the platform
        // specific settings file, for example, "Preferences (Linux).sublime-settings".
        // Because of this, setting them here will have no effect: you must set them
        // in your User File Preferences.
        "font_face": "",
        "font_size": 12,
    
        // Valid options are "no_bold", "no_italic", "no_antialias", "gray_antialias",
        // "subpixel_antialias", "no_round" (OS X only) and "directwrite" (Windows only)
        "font_options": [],
    
        // Characters that are considered to separate words
        "word_separators": "./\\()\"'-:,.;<>~!@#$%^&*|+=[]{}`~?",
    
        // Set to false to prevent line numbers being drawn in the gutter
        "line_numbers": true
      }

~~~
geon
Some JSON implementations support comments, others don't. If you know the one
you use supports (and will continue to support) comments, go ahead and use
them. It just won't be portable.

Douglas Crockford himself suggests you "Go ahead and insert all the comments
you like. Then pipe it through JSMin before handing it to your JSON parser."
That sounds like a reasonable workaround.

[https://plus.google.com/118095276221607585885/posts/RK8qyGVa...](https://plus.google.com/118095276221607585885/posts/RK8qyGVaGSr)
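A minimal Python sketch of that workaround, assuming you only need to drop `//` line comments and `/* */` block comments before parsing. It is a hand-rolled stand-in, not JSMin itself, and does no other minification; `strip_json_comments` is a name made up for this example. The one subtle part is tracking string literals, so that `//` or `/*` inside a quoted value (or an escaped `\"`) is left alone.

```python
import json


def strip_json_comments(text: str) -> str:
    """Remove // and /* */ comments that appear outside JSON string literals."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character verbatim so \" does not end the string.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Line comment: skip up to (but keep) the newline.
            end = text.find("\n", i)
            i = n if end == -1 else end
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated block comment")
            out.append(" ")  # keep the tokens on either side separated
            i = end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


settings = r'''
{
    // Sets the colors used within the text area
    "color_scheme": "Packages/Color Scheme - Default/Monokai.tmTheme",
    /* Characters that are considered to separate words */
    "word_separators": "./\\()\"'-:,.;<>~!@#$%^&*|+=[]{}`~?",
    "line_numbers": true
}
'''
config = json.loads(strip_json_comments(settings))
print(config["line_numbers"])  # True
```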

------
fruchtose
_> There should only be one way to do anything._

 _[...]_

 _> There are two ways to make keys._

I guess I haven't had enough whiskey yet.

~~~
geon
> Tabs or spaces. TOML don't care.

And two ways to indent.

------
skrebbel
Given Jekyll's enormous backlog of issues and pull requests[0], can we expect
this to be maintained or supported at all beyond the late-night drunken brain
fart that it is?

[0] <https://github.com/mojombo/jekyll>

~~~
mojombo
Parker Moore and I (along with many contributors) have been spending quite a
bit of time on Jekyll recently. Over the last 30 days we've merged 17 pull
requests and closed 62 issues. We're ramping up for a 1.0 release and there's
a brand new website in the works. You can check it all out on the master
branch.

Tens (or possibly hundreds) of thousands of people use Jekyll now. It's
interesting to note that Jekyll started out as a "brain fart" as well. Just
one amongst hundreds of blog engines. I wrote it because I was dissatisfied
with everything on the market, and I thought I could do something different
and better, to serve my own needs. I open sourced it, because I thought others
might get a kick out of it.

I'd wager that most of the great things we use today started as nearly
ephemeral emanations from someone's mind, often late at night, or helped along
by a snifter of brandy. The funny thing is, if you never try out your crazy
ideas, you'll never know which ones might have changed the world.

~~~
ghuntley
Tom, I've been waiting _two years_ on a pull-request to the official gem which
adds the ability to view open/closed issues in private repositories.

<https://github.com/defunkt/github-gem/pull/59>

The pull has 19 people asking for integration and has some stellar comments:

"seriously? year long pull request with two lines of changes?"

"I normally would think that the github gem features for paying users would
get a lot of attention from the folks at github..."

I even tried getting it pulled via pre/post-sales emails to
enterprise@github.com (I'm an enterprise customer), which was met with a "yeah,
I'll tap him on the shoulder to integrate it." A year later, nothing.

~~~
mojombo
That project isn't actively maintained at the moment (nor is it an official
GitHub project), but I'll see what I can do tomorrow to get it merged in and
released. Sorry for the frustration!

------
dfkf
And how is this better than xml?

    
    
      <owner name="Tom Preston-Werner"
             organization="GitHub"
             bio="GitHub Cofounder &amp; CEO\nLikes tater tots and beer."
             dob="1979-05-27T07:32:00Z" />
    
      <database server="192.168.1.1"
                ports="8001 8001 8002"
                connection_max="5000"
                enabled="true" />
    
      <servers>
        <alpha ip="10.0.0.1"
               dc="eqdc10" />
        <beta ip="10.0.0.2"
              dc="eqdc10" />
      </servers>

~~~
icebraining
Let's see:

. No native support for numbers, dates, booleans or lists. The latter can be
implemented using subelements, but it's so cumbersome that you skimped on that
and used a non-typed string instead (the database ports).

. Redundant verbosity. Root elements, closing tags, way too much crap to be
manually inserted.

. XML parsers are huge, complex beasts which have no place in many smaller
applications.

. Being XML, it leaves way too many possibilities for crappy developers.
Namespaces in config files, oh joy!

<http://harmful.cat-v.org/software/xml/>

~~~
dfkf
Most of your points are environment specific, and I think that you forgot the
strongest of them - "XML APIs usually suck". In .NET they are non-issues. And
about being cumbersome and verbose, the point I tried to make is that you
don't have to be zealous and put every small piece of data in a separate
element. No reason not to put data in attributes or even in comma/whitespace
separated strings, if that piece of data can be extracted in one short line of
code.

~~~
icebraining
_Most of your points are environment specific_

How so? .Net can't magically discover the types of values or prevent
developers from abusing the format.

 _you don't have to be zealous and put every small piece of data in a separate
element._

But then you're layering a custom, application-specific parser on top of a
complex format, with an unknown syntax (e.g. spaces vs commas, are ranges
supported, etc). It obviously can be done, but it's a mess.

------
jeremymcanally
Ruby parser here: <https://gist.github.com/jm/5022483> Please fork and
improve. :)

~~~
SeoxyS

        # line 36
        text.split("#").first
    

This will have trouble with a line like:

    
    
        tweet = "TOML is #awesomesauce"
    

\--

    
    
        # line 43
        array = $1.split(",").map {|s| s.strip.gsub(/\"(.*)\"/, '\1')}
    

You should recurse into coerce here, or you'll just lose types. (Also you're
assuming arrays of strings.)

    
    
        array = $1.split(",").map {|s| coerce(s) }
    

\--

You're also not dealing with nested key groups (e.g. [servers.alpha]).
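A rough Python sketch of what handling nested key groups involves, assuming each dotted header like `[servers.alpha]` names a path of nested tables that are created on demand. Values are kept as raw strings (coercion and comments are out of scope), and `parse_key_groups` is a hypothetical helper, not part of jeremymcanally's parser. Walking the path with `setdefault` also creates every intermediate table, which is the behaviour GhotiFish runs into further down with `dependencies.com`.

```python
def parse_key_groups(lines):
    """Build nested dicts from [a.b] key group headers and key = value lines."""
    root = {}
    current = root
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # [servers.alpha] -> root["servers"]["alpha"], creating parents as needed.
            current = root
            for part in line[1:-1].split("."):
                current = current.setdefault(part.strip(), {})
        else:
            # Values are left unparsed (quotes included) in this sketch.
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return root


doc = """
[servers]

  [servers.alpha]
  ip = "10.0.0.1"
  dc = "eqdc10"

  [servers.beta]
  ip = "10.0.0.2"
"""
result = parse_key_groups(doc.splitlines())
print(result["servers"]["alpha"]["ip"])  # "10.0.0.1"
```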

\--

That being said, naïve string parsing is a terrible way to build a new markup
language implementation. It's the reason the Markdown landscape is such a
mess[1]. What this really needs is a formal grammar.

[1]: I actually tried to fix that by writing a formal lexer & informal
parser for Markdown in a side project of mine[2]. It's not quite there yet,
because for practicality reasons I wrote my own parser instead of a formal
AST-generating parser.

<https://gist.github.com/kballenegger/29dabe4b6e762ee221df>

[2]: <http://getmacchiato.com>
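A minimal Python sketch of a quote-aware alternative to `text.split("#").first`: it only treats `#` as the start of a comment when it appears outside a double-quoted string. It assumes strings are double-quoted and use backslash escapes; `strip_comment` is an illustrative name, not code from the gist.

```python
def strip_comment(line: str) -> str:
    """Drop a trailing # comment, ignoring any # inside a double-quoted string."""
    in_string = False
    escaped = False
    for i, ch in enumerate(line):
        if escaped:
            # The previous character was a backslash inside a string.
            escaped = False
        elif in_string and ch == "\\":
            escaped = True
        elif ch == '"':
            in_string = not in_string
        elif ch == "#" and not in_string:
            return line[:i].rstrip()
    return line.rstrip()


print(strip_comment('tweet = "TOML is #awesomesauce"'))
# tweet = "TOML is #awesomesauce"
print(strip_comment("dob = 1979-05-27T07:32:00Z # First class dates? Why not?"))
# dob = 1979-05-27T07:32:00Z
```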

~~~
jeremymcanally
Yup array handling is weak. I was going to recurse into coerce, but then the
examples made it seem like only strings will be accepted in arrays (he put
"8000" in there rather than just 8000). I'll get clarification.

Made it into a proper project/gem here if you want to file issues:
<https://github.com/jm/toml>

And good call on the nested key groups. Shouldn't be hard to knock that out.
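For reference, a Python sketch of what "recursing into coerce" could look like. Splitting on `,` stops working once arrays nest or a string contains a comma, so this walks the text instead and calls itself for each array element. It assumes only the value types discussed in this thread (double-quoted strings without escapes, integers, floats with digits on both sides of the dot, `true`/`false`, arrays); dates are not handled, and the function names are hypothetical.

```python
import re

# Optional minus, digits, and an optional fractional part with digits after the dot.
_NUMBER = re.compile(r"-?\d+(\.\d+)?")


def coerce(text: str):
    """Parse one value and reject anything left over."""
    value, rest = _parse_value(text.strip())
    if rest.strip():
        raise ValueError(f"unexpected trailing text: {rest!r}")
    return value


def _parse_value(s: str):
    """Return (value, remaining_text) for the value at the start of s."""
    s = s.lstrip()
    if s.startswith("["):
        return _parse_array(s[1:])
    if s.startswith('"'):
        end = s.index('"', 1)  # sketch: no escape handling
        return s[1:end], s[end + 1:]
    for word, val in (("true", True), ("false", False)):
        if s.startswith(word):
            return val, s[len(word):]
    m = _NUMBER.match(s)
    if m:
        num = float(m.group()) if m.group(1) else int(m.group())
        return num, s[m.end():]
    raise ValueError(f"cannot parse value at: {s!r}")


def _parse_array(s: str):
    """Parse elements up to the closing ']', recursing for each one."""
    items = []
    s = s.lstrip()
    while not s.startswith("]"):
        item, s = _parse_value(s)  # the recursive step: elements keep their types
        items.append(item)
        s = s.lstrip()
        if s.startswith(","):
            s = s[1:].lstrip()  # note: this also tolerates a trailing comma
        elif not s.startswith("]"):
            raise ValueError(f"expected ',' or ']' at: {s!r}")
    return items, s[1:]


print(coerce('[ [1,2], ["a", "b"] ]'))  # [[1, 2], ['a', 'b']]
```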

------
egonschiele
_Cool_, the market isn't fragmented enough already.

_Yes_, being the CEO of Github does give you the power to do whatever you
want.

_Of course_, drinking and coding is a great idea. The Ballmer peak isn't a
joke, it's a way of life.

~~~
tantalor
What's this fragmentation you speak of? Surely you won't be forced to use one
particular format you don't like. An API usually supports multiple formats.

------
GhotiFish
Are arrays of maps a bad idea? Someone posted a pom.xml file pointing out how
horrible it was, and I thought to myself "How would this look in toml?"

I was all set to try a translation when I hit this section:

    
    
      <dependency>
        <groupId>com.google.apis</groupId>
        <artifactId>google-api-services-drive</artifactId>
        <version>v2-rev53-1.13.2-beta</version>
      </dependency>
      
      <dependency>
        <!-- A generated library for Google+ APIs. Visit here for more info:
              http://code.google.com/p/google-api-java-client/wiki/APIs#Google+_API
        -->
        <groupId>com.google.apis</groupId>
        <artifactId>google-api-services-plus</artifactId>
        <version>v1-rev22-1.8.0-beta</version>
      </dependency>  
    
    
      <dependency>
        <groupId>com.google.api-client</groupId>
        <artifactId>google-api-client</artifactId>
        <version>1.13.2-beta</version>
      </dependency>
    
      <dependency>
        <groupId>com.google.api-client</groupId>
        <artifactId>google-api-client-servlet</artifactId>
        <version>1.13.1-beta</version>
      </dependency>   
    

How would I represent this in TOML?

    
    
      [dependency1]
      groupId    = "com.google.api-client"
      artifactId = "google-api-client"
      version    = "1.13.2-beta"
    
      [dependency2]
      groupId    = "com.google.api-client"
      artifactId = "google-api-client-servlet"
      version    = "1.13.1-beta"
    

That's not right; it clearly should be an array, but I don't think the
standard supports it. At best, I think you'd have to use parallel arrays:

    
    
      [dependencies]
      groupIds    = ["com.google.api-client", "com.google.api-client"]
      artifactIds = ["google-api-client"    , "google-api-client-servlet"]
      versions    = ["1.13.2-beta"          , "1.13.1-beta"]
    

and that's just not pretty.
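To show the cost of the parallel-array layout, here is a Python sketch of what every consumer would have to do to turn it back into records. The dict literal stands in for what a parser would return for the `[dependencies]` group above; the length check is an assumption about how you would want mismatched arrays handled, since `zip` on its own silently stops at the shortest one.

```python
dependencies = {
    "groupIds": ["com.google.api-client", "com.google.api-client"],
    "artifactIds": ["google-api-client", "google-api-client-servlet"],
    "versions": ["1.13.2-beta", "1.13.1-beta"],
}

# Nothing in the format ties the columns together, so check it by hand.
if len({len(column) for column in dependencies.values()}) != 1:
    raise ValueError("parallel arrays have different lengths")

# Re-zip the columns into one record per dependency.
records = [
    {"groupId": g, "artifactId": a, "version": v}
    for g, a, v in zip(
        dependencies["groupIds"],
        dependencies["artifactIds"],
        dependencies["versions"],
    )
]
print(records[1]["artifactId"])  # google-api-client-servlet
```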

~~~
simcop2387
Why not:

    [dependencies.com.google.api-client]
    artifactId = "google-api-client"
    versions   = "1.13.2-beta"

    [dependencies.com.google.api-client]
    artifactId = "google-api-client-servlet"
    versions   = "1.13.1-beta"

~~~
GhotiFish
But the syntax collides there: you define
_dependencies.com.google.api-client.artifactId_ twice.

Also it creates the key value maps:

    
    
       dependencies.com
       dependencies.com.google
       

which shouldn't exist, so that doesn't seem right either.

~~~
simcop2387
Good point, I didn't notice that one. Certainly an interesting case. I don't
think the tables that exist but shouldn't are as big an issue, but it is
certainly not an easy problem to solve here.

------
ricardobeat
Ah, the power of fame. Implementations spreading like weeds. Four already in
JavaScript, even though the spec is not anywhere near finished:

    
    
        npm search toml
        npm http GET https://registry.npmjs.org/-/all/since?stale=update_after&startkey=1361700343737
        npm http 200 https://registry.npmjs.org/-/all/since?stale=update_after&startkey=1361700343737
        NAME                  DESCRIPTION                   AUTHOR            DATE      
        node-toml             TOML parser                   =ricardobeat      2013-02-24 10:08
        toml                  TOML parser for Node.js       =binarymuse       2013-02-24 04:19  toml parser
        toml-node             TOML ====                     =thehydroimpulse  2013-02-24 08:01
        toml-parser           A TOML parser for node.js     =aaronblohowiak   2013-02-24 06:41

~~~
hyperpape
In a few years, maybe we can have "nfnpm" (noise-free node package manager).

------
rogerbinns
What is wrong with JSON? Everything already supports it.

JSON has two drawbacks: a lack of comments (although you could add "#" keys in
relevant places) and no binary support (arbitrary conventions include base64),
but TOML doesn't support binary anyway.

~~~
nikcub
A few issues (although I do use JSON in config):

It isn't a friendly form of human input. My error rate is 50%+; you have to
lint on save to catch things that are invisible to the naked eye.

No ability to override, extend or reference keys. This is most useful in
config objects where, e.g., in a dev object you want to override the username
and password for a database connection but not repeat all the other parameters.

No comments

~~~
rogerbinns
You can override in pretty much the same way TOML does. Instead of replacing
an underlying object, you update it with the values read from JSON.
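One subtlety: a plain `dict.update` with the JSON object would replace the whole nested `database` table, so nikcub's case needs a recursive update. A Python sketch, with `deep_update` as a hypothetical helper and the credentials invented for illustration (the server and `connection_max` values come from the example config upthread):

```python
import json


def deep_update(base: dict, override: dict) -> dict:
    """Return a copy of base with override merged in, recursing into nested
    dicts so only the keys present in override are replaced."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_update(merged[key], value)
        else:
            merged[key] = value
    return merged


base = {
    "database": {
        "server": "192.168.1.1",
        "connection_max": 5000,
        "username": "app",     # hypothetical
        "password": "secret",  # hypothetical
    }
}
dev = json.loads('{"database": {"username": "dev", "password": "dev"}}')
config = deep_update(base, dev)
print(config["database"])
# {'server': '192.168.1.1', 'connection_max': 5000, 'username': 'dev', 'password': 'dev'}
```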

------
jmah
Hey Tom, why not use git's config format?

<http://git-scm.com/docs/git-config>

------
jemeshsu
About using a markup language as a config file: I see that in most Python
apps, the config file is just another Python script rather than another
markup language. This approach makes sense in a dynamic language and feels
natural. I understand it is a habit to use YAML in Ruby apps for config. Is it
not possible to just use a Ruby script as the config file, since the script
can be loaded dynamically? What are the pros and cons of using another markup
language as a config file vs using just the app language (Python/Ruby)?

~~~
gavinballard
Your configuration file might need to be read by more than one language.

It's also nice to have a configuration file mean the same thing regardless of
its runtime environment.

------
ezquerra
This is quite nice, but there are a few things that I miss:

1\. A way to have multi-line values for non array types

2\. A more flexible number syntax (e.g. allow hex and binary integers, allow
exponents on floats, allow NaN and +/-Inf)

3\. Make it possible to have an extra comma after the last element on an array
(as in Python)

4\. Add a way to "include" another config file

#1 is important because some projects require all lines to have a max width of
80 characters, including in config files.

#2 is important for scientific/engineering projects. I think the current
simple format shows that this format is a little too web centric. If this is
going to be used for non-web stuff this is a must.

#3 is something that helps when putting this sort of configuration file in
version control. Without this, adding an extra entry to a multi-line array
creates a diff in two lines rather than one (since you must add a comma to the
line above the one that you inserted). This is something I miss in JSON and
which Python did just right (IMHO).

#4 would be useful in cases in which you want to provide a base configuration
file for example.

Also, maybe I missed it but it is not super clear what would happen if you
redefine an existing entry (I hope it is possible). Finally, is order
important?

EDIT: typo.

~~~
tartley
+1 for #3

It's not just in the diffs. Trailing commas make editing the list easier.

------
slurgfest
It seems to me that YAML does this better already (with parsers which are
already high-quality).

If we want simplicity, then why not make sure it is a subset of YAML?

~~~
stormbrew
Agreed, I'd much rather have a normalized subset of YAML without the object
serialization stuff (I don't even understand why it's there: why take a format
intended to be read by humans and then muck it up with complex and dangerous
object serialization notation).

~~~
stock_toaster
And without anchors and references too.

------
foobar2k
Python has ConfigParser, which parses INI-style config files; I assume it uses
some standard grammar.

<http://docs.python.org/2/library/configparser.html>
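A short example with Python 3's `configparser` (the successor to the Python 2 module linked above). As far as I know, its documentation describes the INI structure it supports itself rather than pointing to an external standard, and every value comes back as a string unless you use a typed accessor such as `getint` or `getboolean`. The section and key names below are borrowed from the TOML example upthread.

```python
import configparser

ini_text = """
[owner]
name = Tom Preston-Werner
organization = GitHub

[database]
server = 192.168.1.1
connection_max = 5000
enabled = true
"""

parser = configparser.ConfigParser()
parser.read_string(ini_text)

# Plain access returns strings; typed accessors convert on request.
print(parser["owner"]["name"])                      # Tom Preston-Werner
print(parser.getint("database", "connection_max"))  # 5000
print(parser.getboolean("database", "enabled"))     # True
```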

------
aaronblohowiak
Working node.js version: <https://github.com/aaronblohowiak/toml>

I just need to auth and push it to npm.

~~~
jonpaul
Nice work! It's too bad that this guy squatted on the `toml` package name
without any implementation: <https://github.com/BinaryMuse/toml-node>

~~~
jonpaul
_Update_: It seems that he has provided an implementation now. I feel
better about that; I can't stand it when people squat on package names in
Node.js.

(Meta: the edit link expired, hence the reply to myself)

------
tferris
If this had been done by any other Tom, no one would have upvoted it.

~~~
nacker
So? If Git had been invented by anyone except Linus, probably hardly anyone
would be using it today.

------
namuol
Don't want to be a naysayer, but what's wrong with something like CSON
(CoffeeScript Object Notation)?

------
Comkid
Would this be considered legal?

    
    
      [ [1,2], ["a", "b"] ]

~~~
mikegirouard
The README says not to mix data types because "that's stupid" (I'm not sure I
agree); but I don't know if that answers your question.

You have an array of arrays, which, at that level, satisfies the spec. The
children individually keep their types homogeneous.

That said, I'm going to assume the intent is to not allow that.
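A Python sketch of the reading described above, in which every element of an array must share one type, all nested arrays count as the same type at the outer level, and each child array is then checked on its own. This is one interpretation of the README's rule, not a statement of what the spec intends; `is_homogeneous` is a hypothetical name. Python's `bool` and `int` are distinct types here, so a mix of integers and booleans is rejected.

```python
def is_homogeneous(value) -> bool:
    """True if every array (at any depth) holds elements of a single type."""
    if not isinstance(value, list):
        return True
    # At this level, all nested lists share the type `list`, whatever they hold.
    if len({type(item) for item in value}) > 1:
        return False
    # Then each child is checked on its own.
    return all(is_homogeneous(item) for item in value)


print(is_homogeneous([[1, 2], ["a", "b"]]))  # True
print(is_homogeneous([1, "a"]))              # False
```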

~~~
Comkid
Would it be okay to understand arrays as integer indexed keys?

For example, wouldn't that mean something like:

    
    
      array = [ [1,2], ["a", "b"] ]
    

Be the same as this:

    
    
      [array]
      0 = [1,2]
      1 = ["a", "b"]

------
mostly_harmless
I'll be the first to ask: what's wrong with JSON?

~~~
mikegirouard
IMHO, although it's very easily readable by humans, it's not quite as easily
written by humans.

I've always preferred INI over JSON for this reason.

~~~
krapp
I would prefer INI over JSON as well except that JSON lets me nest arrays and
INI doesn't. So this looks really nice.

On the other hand I'd like to mix my data types as much as I darn well please.

------
billpg
Dear industry, if you are going to add comments to JSON, please make it the /*
*/ variety.

~~~
GhotiFish
Why? I dislike those comments.

~~~
billpg
Text editors will sometimes insert end-of-line characters in the name of word-
wrap.

Using the end-of-line as a comment terminator would require significant
refactoring of JSON parsers, which were previously at liberty to lump CR and
LF together with SP and TAB. A starting and ending token, on the other hand,
fits the pattern already required of a JSON parser.

~~~
Shorel
> Text editors will sometimes insert end-of-line characters in the name of
> word-wrap.

In this decade, only a brain damaged text editor would do that.

------
eksith
Can't wait to see the end result.

This reminds me of a new project I'm working on called Leewh. It's based on
Wheel and kinda has the same overall function, but I needed something to get
my project rolling quickly and using .ini and JSON syntax separately felt...
well... too square, I guess.

I figured I'll come up with something more well rounded.

------
bobfunk
Spent a couple of hours this evening on a small parser written in
CoffeeScript:

<https://github.com/biilmann/coffee-toml>

Error handling is rough and it still doesn't handle groups with dots
([alpha.beta]), but apart from that it should be fairly complete.

------
nixgeek
Another parser for Ruby is hiding over here: <https://github.com/parkr/tock>

~~~
twelvechairs
It's empty.

    
    
        class Tock
           VERSION = "0.0.1"
           # TODO: IMPLEMENT ALL THE THINGS
        end

------
espadrine
The issues I find with TOML:

<https://gist.github.com/espadrine/5028426>

------
cantankerous
Looks interesting. What's the rationale behind disallowing something like
"-.1234". Is this how it's done elsewhere?

~~~
aaronblohowiak
It makes parsing slightly easier. Also, '-.' might blur together, so you
would think it is just '-1234'.
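A Python sketch of the float rule as this thread describes it, assuming a float is an optional minus sign followed by at least one digit on each side of a single dot (so `-.1234` and `1.` are both rejected). Under that rule a number token always begins with `-` or a digit, which is part of why it keeps parsing simple.

```python
import re

# Assumed rule: optional "-", one or more digits, ".", one or more digits.
FLOAT = re.compile(r"-?[0-9]+\.[0-9]+")


def is_float(token: str) -> bool:
    return FLOAT.fullmatch(token) is not None


for token in ["3.1415", "-0.01", "-.1234", "1.", "1234"]:
    print(token, is_float(token))
# 3.1415 True
# -0.01 True
# -.1234 False
# 1. False
# 1234 False
```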

------
aristidb
I like the format. A formal spec would be nice, to aid implementations in as
many languages as possible.

------
ehm_may
I really like the hash key format.

Parsing JSON for arbitrarily nested keys is nasty, and this makes it extremely
natural.

------
benatkin
How is this different, or better, than the configuration format that git
itself uses?

------
jaequery
at first glance i thought this was crazy. but now, i think this might actually
work.

