
Making Coffeescript’s Whitespace More Significant - raganwald
https://github.com/raganwald/homoiconic/blob/master/2011/11/sans-titre.md#readme
======
Groxx
I like it a lot. It would also simplify the ".end()" function in jQuery[1],
because you could control it with indentation:

    
    
      $('.class')
        .find('.sub')
          .remove()
        .find('.stuff')
          .addClass('other')
          .find('.more_stuff')
            .removeClass('things')
        .filter('li')
          .appendTo('#my_list')
    

[1]: <http://api.jquery.com/end/>

~~~
dextorious
Yes, it could control it with indentation. "Simplify" though? Are you sure?

This whitespace zig-zagging is barely readable.

~~~
Groxx
Very much so. Otherwise you get this, at best:

    
    
      $('.class')
        .find('.sub')
          .remove()
          .end()
        .find('.stuff')
          .addClass('other')
          .find('.more_stuff')
            .removeClass('things')
            .end()
          .end()
        .filter('li')
          .appendTo('#my_list')
    

Or, to match jQuery's documentation style for that function, this:

    
    
      $('.class').find('.sub').remove()
        .end().find('.stuff').addClass('other').find('.more_stuff').removeClass('things')
        .end().end().filter('li').appendTo('#my_list')
    

Matching .end() with each selector change is equivalent to writing valid XML
by hand, without the aid of an auto-tag-closer, and without a validator - you
only see the error on run, and only if you hit that code path, and only if it
does something you notice isn't correct.

None of this is meant to imply that chaining things like this is a Good Idea™,
and I avoid .end() like the plague and use intermediate variables. But when
you don't need the root or intermediate results for anything else, yes, this
is more readable, more easily optimized (you can't get it wrong, every level
is cached for you), less prone to error, and significantly fewer characters /
lines of code. That's simplifying and improving.
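
To make the bookkeeping concrete, here is a toy model of the selection stack behind `.end()`. It is a sketch only: `Selection` is a hypothetical class operating on plain arrays, not jQuery's implementation, and it assumes narrowing methods return a new object that remembers its parent while side-effecting methods return the same object.

```javascript
// Toy stand-in for a jQuery-like wrapper (hypothetical, not jQuery itself).
class Selection {
  constructor(items, previous = null) {
    this.items = items;
    this.previous = previous; // the selection this one was derived from
  }
  // Narrowing returns a NEW selection that remembers where it came from.
  filter(predicate) {
    return new Selection(this.items.filter(predicate), this);
  }
  // Side-effecting calls return the same selection, so the chain stays put.
  each(fn) {
    this.items.forEach(fn);
    return this;
  }
  // end() steps back to the selection before the last narrowing.
  end() {
    return this.previous || this;
  }
}

const log = [];
const root = new Selection([1, 2, 3, 4, 5, 6]);
const result = root
  .filter((n) => n % 2 === 0)
    .each((n) => log.push(`even ${n}`))
    .end()
  .filter((n) => n > 4)
    .each((n) => log.push(`big ${n}`))
    .end();

console.log(result === root); // true
console.log(log.join(", ")); // even 2, even 4, even 6, big 5, big 6
```

Dropping the first `.end()` would make the second `filter` run against the even numbers only, which is the kind of silent mistake described above.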

-- late-edit:

Less whitespace zigzaggery is also possible (I agree, not easy to read such
dense zigzagging), and similarly easier with significant whitespace. My
example was essentially just a trivial one, I tend to see larger ones where I
see that kind of indentation at all. Is this better?

    
    
      $('.class')
        .find('.sub').remove()
        .find('.stuff').addClass('other')
          .find('.more_stuff').removeClass('things')
        .filter('li').appendTo('#my_list')

~~~
tordek
In your last example, `.find('.more_stuff')` works on the value returned from
`.addClass('other')` (or so it'd seem), so it behaves differently.

~~~
Groxx
Which, with jQuery, is the same as the results from the most-recent selector
in the chain (in this case, the .find('.stuff') before it). Normally though,
you'd be absolutely correct, and that example would need to nest the
.addClass('other') inside its .find so it doesn't pollute the next .find:

    
    
      $('.class')
        .find('.sub').remove()
        .find('.stuff')
          .addClass('other')
          .find('.more_stuff').removeClass('things')
        .filter('li').appendTo('#my_list')

------
jashkenas
@raganwald -- Lovely post, as always. One (somewhat important) question: If
same-level continued calls mean chaining, and indented continued calls mean to
use the value of the previous line, then what do outdented continued calls
mean?

    
    
          first
        .second
    

Or heading 'offsides', in this fashion:

    
    
          first
            .second
        .third
    

Are those both syntax errors at compile time?

Also, how does this play with things that are not necessarily side-effect-ful?
For example:

    
    
        object.property
        .value
    

Currently means `object.property.value`. Under your rubric, would it evaluate
to `object.value`? Making any trailing access effectively a no-op?

~~~
raganwald
Outdented calls use the last value in the column to their left. So:

    
    
          first
        .second
    

Never means `first.second`; either `.second` is an error, or it is applied to
something above this, such as:

    
    
        someValue
          .methodName ->
            first
          .second
    

...which becomes:

    
    
        someValue.methodName(-> first)
        someValue.second
          

If such a value can't be found, it's an error. Of interest is the meaning of
`.propertyName` in the leftmost column, as in:

    
    
        object
        .property
    

I suggest it's an error. As long as you can write:

    
    
        object.property # or...
        object
          .property
    

I don't see the value in

    
    
        object
        .property
    

This:

    
    
        object.property
          .value
    

Means `object.property.value`.
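
A rough sketch of these rules as a line-by-line rewrite, assuming space-only indentation and ignoring function bodies and everything else a real lexer would handle. The name `desugar` and the stack-of-columns approach are illustrative assumptions, not part of the proposal or of the CoffeeScript compiler.

```javascript
// Hypothetical helper: rewrite each ".call" line against the last value
// found in a column strictly to its left.
function desugar(source) {
  const stack = []; // entries: { indent, expr }
  const output = [];
  for (const line of source.split("\n")) {
    if (line.trim() === "") continue;
    const indent = line.length - line.trimStart().length;
    const text = line.trim();
    // Anything at this column or deeper is no longer "to the left".
    while (stack.length > 0 && stack[stack.length - 1].indent >= indent) {
      stack.pop();
    }
    let expr = text;
    if (text.startsWith(".")) {
      if (stack.length === 0) {
        throw new SyntaxError(`no receiver for "${text}"`);
      }
      expr = stack[stack.length - 1].expr + text;
    }
    stack.push({ indent, expr });
    output.push(expr);
  }
  return output;
}

console.log(desugar("object.property\n  .value"));
// [ 'object.property', 'object.property.value' ]
console.log(desugar("a\n  .b\n  .c"));
// [ 'a', 'a.b', 'a.c' ]
```

Under this sketch, `object` followed by `.property` in the same column throws, matching the suggestion that it be an error.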

~~~
sausagefeet
IMHO, Smalltalk got this right using ; rather than whitespace. And it's easier
to read.

------
groby_b
I read the article, and I simultaneously liked the idea and was fundamentally
disturbed by it. At the same time, Smalltalk's cascading messages never
disturbed me.

So after going through the initial two reactions I have to every
thought-provoking blog post (#1: OMG - raganwald is insane!, followed by #2:
OMG - raganwald is brilliant!), here's what actually doesn't work for me:

It is overloading whitespace with both control flow _and_ data flow.
Smalltalk's cascades neatly introduce a different symbol to sidestep that.

So, ultimately, I'd rather see cascades introduced into CS than overloading
the meaning of whitespace.

~~~
wwweston
Amen. The part about data flow is insightful. The idea of overloading it into
whitespace is insane. There has to be a better way, and Smalltalk's might be
it.

Blocks delimited by whitespace work because they rely on a familiar set of
conventions that have already evolved among most programmers over 30 years or
more (and interestingly, a lot of programmers still _really_ hate the idea of
having it enforced). Moving what is essentially the alleged JavaScript
BadPart(TM) "with" into an overloaded whitespace/dot combo is going to throw
at least as many programmers for a loop as "with" has.

I also think something smells wrong about the examples. For all the
dataflow insight -- did I miss the part where he talks about how exactly we're
keeping track of the destination of the return values? And the problem with
the ".pop" example specifically might be less that you can't call it fluently
three times than that you _have_ to call it three times to get three items off
the collection instead of ".pop 3".

~~~
groby_b
There's no destination, exactly (I think). A "data scope" for lack of a better
term just implies that all functions are called on the object returned by the
statement that introduced the scope.

It is _exactly_ like JS's 'with' statement, just that we let the language
automatically infer that we probably meant "with".
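
For comparison, a small sketch of the `with` analogy in plain JavaScript, using an ordinary array as the "data scope". The `Function` constructor is there only because `with` is a syntax error in strict-mode code; `popThree` is a hypothetical name.

```javascript
// Built via the Function constructor so the body is sloppy-mode code,
// where `with` is still permitted.
const popThree = new Function(
  "stack",
  "with (stack) { return [pop(), pop(), pop()]; }"
);

const stack = [1, 2, 3, 4, 5];
const popped = popThree(stack);
console.log(popped); // [ 5, 4, 3 ]
console.log(stack); // [ 1, 2 ]
```

Each bare `pop()` is looked up on `stack` and called with `stack` as its receiver, which is the behaviour the indentation proposal would infer.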

------
iambot
I actually agree with everything that is suggested in this submission. I
wonder what the best way to get it implemented would be. Perhaps a form of
feature request poll, to see what the support for it is. Or perhaps a call
for people who agree to "watch" it on GitHub.

~~~
clutchski
It's being discussed here (along with some alternate syntax ideas):

<https://github.com/jashkenas/coffee-script/issues/1889>

~~~
raganwald
I think this is a little closer:

<https://github.com/jashkenas/coffee-script/issues/1495>

The above link is actually a discussion about Dart’s “Monocle Moustache,” so I
console myself that when people say it’s ugly, they mean the moustache and not
significant whitespace :-)

~~~
moomin
I'm the originator of that issue. There have been previous proposals (and
subsequent ones). I originally proposed different syntaxes, but the current
proposal was superior.

I've been amazed at how much interest there has been in this. Every time I
think it's over, more people pile in. The only catch is: there are a couple of
people who still need to be convinced, and they're actually the important
ones. I'm pretty sure that a pull request for this would not be accepted.

------
thedufer
I like the idea, but the lack of backwards-compatibility is less than ideal.
Some people who update CoffeeScript compilers will suddenly find their code
mysteriously doing the wrong thing (when they wrote what this suggestion
considers "cascading messages", but expect them to not cascade). I have worked
on multiple projects that would fall prey to this issue.

~~~
Semiapies
Perhaps a "legacy" option to disable cascading?

~~~
thedufer
The problem is the time it will take people to realize that they need to turn
on the "legacy" option. The failures this change could cause could be very
difficult to track down.

~~~
davej
Make it an opt-in feature for the moment and maybe make it the default for CS
2.0 or some other milestone.

------
Uncompetative
@raganwald -- fascinating ideas

Whilst the 'staircase' form forces each message to await the return from the
receiver, the 'cascade' form could be used to post commands to a concurrent
process into a separate receiving processor's message queue with no need to
await a reply - as in Eiffel's Command/Query Separation Principle.

Also, 'futures' could be used to decouple queries from having to await replies
from the receivers of their messages. All that is needed is for variables
defined through assignment to a query to remain potentially undefined until
needed by some command. At this point all of the command's arguments would
need to be defined and it would either have to await a reply from the queried
process, or await some globally visible but yet to be defined thread to bind a
value to the variable i.e. dataflow.

All of this hinges on using a language that doesn't freak out when processing
undefined variables, but regards them as their symbolic names, reducing
complex expressions with a collection of rewrite rules.

I'd be interested to know what you think about my proposal for these richer
concurrent semantics.

-- Uncompetative

~~~
raganwald
I like it! I’ve had some similar thoughts along slightly different lines
recently.

------
lisper
The problem with significant whitespace is that you can't count on whitespace
to be preserved across many common protocols. Text editors will convert spaces
to tabs and vice-versa. HTML rendering eats whitespace. Cut-and-paste may or
may not preserve whitespace.

Python has had this problem since its inception. If you're editing a Python
program in emacs python mode and you hit TAB at the wrong time you can
inadvertently change the semantics of your code. And that's just the tip of
the iceberg. I'm a big Python fan, but significant whitespace is a bad idea.

~~~
masklinn
> Text editors will convert spaces to tabs and vice-versa.

Get a good text editor?

> HTML rendering eats whitespace.

Except when you tell it not to, of course.

> Python has had this problem since its inception.

A problem mostly encountered by those who never use it, interestingly.

> If you're editing a Python program in emacs python mode and you hit TAB at
> the wrong time you can inadvertently change the semantics of your code.

So can you if you hit "}" or ";" at the wrong time in a braceful language...

The primary (and as far as I'm concerned the only significant) issue of
significant indentation (it's not even significant whitespace) is
auto-generated code (which is why Haskell has a braceful syntax and an
indentation-based transformation of it), as giving the right contextual
indentation to a piece of code may make the code generator much more complex
(codegen targeting Python should probably generate Python bytecode, rather
than generating code).

And to support my claim that significant indentation is not effectively an
issue, I will use Haskell: Haskell can be written using both a brace-and-
semicolon syntax and an indentation-based one. Both forms are perfectly
equivalent and can be translated into one another without loss of information.

I do not remember ever seeing a Haskell piece of code, article, demonstration
or example which used braces except when the article was about the braceful
syntax or about auto-generated code.

If significant indentation was such a crippling problem, would Haskell users
not have coalesced around the "less problematic" braceful syntax?

~~~
lisper
I use Python a lot. And I encounter these problems often enough for them to be
very annoying.

> So can you if you hit "}" or ";" at the wrong time in a braceful language...

The difference is that when you hit "}" or ";" the effect is always the same,
it's always visible, and it's always possible to undo by hitting DELETE. If
you hit either of those characters N times, you can always undo that by
hitting delete N times.

This is not true for the tab key. The effect of hitting TAB depends on the
context. Determining whether your last press of the TAB key had an effect or
not requires that you remember the previous state, and so undoing the effect
(or lack thereof) of hitting TAB requires that you remember the previous
state. And if you ever do a block auto-indent at the wrong time you are pretty
much hosed.

~~~
snprbob86
What tools are you using?

I had _a lot_ of problems with Python when I was trying to edit code in a
variety of different IDEs, text editors, etc.

That was _years before_ I discovered the beauty of Vim (sub in Emacs here, if
you like).

For your vimrc:

    
    
        " Indentation
        set tabstop=2
        set shiftwidth=2 softtabstop=2
        set smarttab
        set expandtab
        set smartindent
    
        " Shed light on hidden things
        set list
        set listchars=tab:»»,trail:•
        set wrap
        set linebreak
        set showbreak=↳
    

This will use soft-tabs (assert (> spaces tabs)) and expose tabs and trailing
spaces on lines using the » and • characters respectively. They show up as a
nice, obvious blue in my theme.

Sometimes, this can be annoying for other people's code, who prefer tabs. Easy
fix is to `:set nolist` on those buffers.

This also works with file formats that expect tabs, like Makefiles, which have
plugins in most Vim distributions that will forcibly type a tab when required.
Will be obvious when you see the ». If you ever want to explicitly type a tab,
go to insert mode and type ^v<tab> (that is control+v, then press tab). ^v
lets you disable custom mappings for the next chord, so instead of <tab>
meaning "indent" it will mean "type a damn tab character!"

Meanwhile, our non-technical CEO does some Haml/Sass (both whitespace
significant) using TextMate. I had to write some on-save scripts for him to
make sure he doesn't submit any trailing whitespace and always ends his files
with a trailing newline. _grumble grumble_

~~~
lisper
I use emacs and python-mode. This is not an editor issue. It's more
fundamental than that. The problem is this:

    
    
        block:
          stmt1
          stmt2
          stmt3
        stmt4
        stmt5
    

If your indenting gets screwed up for ANY reason there is not enough
information left to reconstruct it. There is enough information to reconstruct
the indent at stmt1 (thanks to the colon, which is essentially equivalent to a
left curly brace), but not enough to reconstruct the outdent at stmt4. There
are many, many ways for indentation to get screwed up.
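
A small sketch of that information loss, assuming the mangling is simply "all leading whitespace gets stripped". The snippets are the pseudo-code from above plus a second variant and hypothetical braced equivalents; `stripIndent` is an illustrative helper.

```javascript
// Simulate a tool that eats leading whitespace on every line.
const stripIndent = (src) =>
  src.split("\n").map((line) => line.trimStart()).join("\n");

// Two different programs: stmt4 is outside the block in one, inside in the other.
const outside = "block:\n  stmt1\n  stmt2\n  stmt3\nstmt4\nstmt5";
const inside = "block:\n  stmt1\n  stmt2\n  stmt3\n  stmt4\nstmt5";

// The same two programs with an explicit closing delimiter.
const bracedOutside = "block {\n  stmt1\n  stmt2\n  stmt3\n}\nstmt4\nstmt5";
const bracedInside = "block {\n  stmt1\n  stmt2\n  stmt3\n  stmt4\n}\nstmt5";

console.log(stripIndent(outside) === stripIndent(inside)); // true
console.log(stripIndent(bracedOutside) === stripIndent(bracedInside)); // false
```

Once stripped, the indentation-only versions are indistinguishable, while the braced versions still differ in where the `}` sits.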

~~~
jholman
But to reiterate masklinn's point, in C/etc, if your braces get screwed up for
ANY reason there is not enough information left to reconstruct them. So what's
the difference between meaningful braces and meaningful indentation?

To that you replied that the output of the Tab key depends on context, and
implied that in your editor(s), sometimes the result of the Tab is invisible,
and/or cannot be reversed by hitting Delete (or Backspace). And snprbob86
pointed out that in his/her editor (and mine), this isn't a problem. Tab never
does anything invisible, and it's always reversible with Backspace. So what's
the problem?

And although I assume you noticed this too, just to be clear and err on the
side of explicitness, it seems to me that there are two things going on
here. One half is arguing about whether or not there's a fundamental problem
with significant indentation that is not present in languages without
significant indentation, and the other is an attempt to solve non-fundamental
problems that others might have (e.g. complicated state in tabs, possibly due
to using a poor editor).

~~~
lisper
> if your braces get screwed up for ANY reason there is not enough information
> left to reconstruct them

That's not necessarily true. If my code is indented, then I can reconstruct
the braces from the indentation. Also, it's a lot easier to inadvertently
screw up whitespace than a brace because there are so many more things out
there in the digital world (HTML, autoindent) that muck around with whitespace
than things that muck around with braces.

The right answer is to SPECIFY block structure using braces (or something
equivalent), but then RENDER the block structure using (automatically
generated) indentation. It's perfectly fine for the compiler to complain if
they don't match. This is one case where redundancy is a feature, not a bug.
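
A minimal sketch of such a cross-check, assuming a fixed indent width and counting braces textually (so braces inside strings or comments would confuse it). `findIndentMismatches` is a hypothetical name, not a feature of any existing compiler.

```javascript
// Naive check: counts braces textually and ignores strings and comments.
function findIndentMismatches(source, indentWidth = 2) {
  const mismatches = [];
  let depth = 0;
  source.split("\n").forEach((line, index) => {
    const text = line.trim();
    if (text === "") return; // blank lines carry no indentation information
    const indent = line.length - line.trimStart().length;
    // A line that begins with "}" belongs to the enclosing level.
    const expectedDepth = text.startsWith("}") ? depth - 1 : depth;
    if (indent !== expectedDepth * indentWidth) {
      mismatches.push(index + 1); // report 1-based line numbers
    }
    const opens = (text.match(/\{/g) || []).length;
    const closes = (text.match(/\}/g) || []).length;
    depth += opens - closes;
  });
  return mismatches;
}

const tidy = "if (a) {\n  b();\n}\nc();";
const drifted = "if (a) {\n  b();\n}\n  c();";
console.log(findIndentMismatches(tidy)); // []
console.log(findIndentMismatches(drifted)); // [ 4 ]
```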

If you hate braces and love whitespace so much, why are you not urging Guido
to get rid of the colon? It's essentially equivalent to an open brace. Why is
an open brace more pythonic than a close brace?

> using a poor editor

I use emacs, but just to see if maybe I'm missing something I fired up vim and
tried editing some Python code. AFAICT vim (at least out of the box on Snow
Leopard) is not aware of Python syntax at all.

------
phzbOx
There was a conversation on this recently on HN. IIRC jashkenas said he liked
the idea but it would be better to encourage library authors to write in a
functional style that enables chaining rather than adding a new feature to the
language.

Btw, I found that missing too in Python and created Moka
(<http://www.phzbox.com/moka/> It's still in heavy construction)

~~~
moomin
Actually, that was in response to a previous request for a dedicated chaining
syntax. This syntax is actually more useful when chaining is already
implemented.

------
quitedisgusted
For posterity, the original title of this blog post was "White Power":

[https://github.com/raganwald/homoiconic/commit/bd55e8ad731cc...](https://github.com/raganwald/homoiconic/commit/bd55e8ad731cc6d83df66fa86b4547112f2fa3d4)

Reg Braithwaite doing the Clayton Bigsby. Stay classy.

------
scotty79
Where do I sign?

Do you think it would be hard to introduce such a feature to CoffeeScript on
your own?

~~~
jashkenas
No, it wouldn't be terribly hard. The source code is all annotated to make it
easier for folks to get started trying out their own flavors:

<http://coffeescript.org/documentation/docs/grammar.html>

You'd probably want to start by altering the lexer to stop considering ...

    
    
        a
          .b
          .c
          .d
    

... as effectively a single line, and turn it into some sort of "chain" node.
Then, the value of the expression "a" can be cached at the beginning, and all
further operations in the chain can be performed against the original value.
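
In JavaScript terms, the effect of such a "chain" node might look roughly like the helper below. This is a sketch of the semantics only: `cascade` and `makeStack` are hypothetical names, and this is not what the CoffeeScript compiler emits.

```javascript
// Hypothetical helper: the receiver is evaluated once, then every step runs
// against that same cached value; each step's return value is ignored.
function cascade(receiver, ...steps) {
  for (const step of steps) {
    step(receiver);
  }
  return receiver;
}

let evaluations = 0;
function makeStack() {
  evaluations += 1;
  return [1, 2, 3, 4, 5];
}

const popped = [];
const stack = cascade(
  makeStack(),
  (s) => popped.push(s.pop()),
  (s) => popped.push(s.pop()),
  (s) => popped.push(s.pop())
);

console.log(popped); // [ 5, 4, 3 ]
console.log(stack); // [ 1, 2 ]
console.log(evaluations); // 1
```

Ordinary chaining would not work here, since `pop()` returns the removed element rather than the array.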

------
jewel
I've thought about something similar, but for a different reason. I'd like to
be able to omit the parentheses on multiline, chained statements, like this:

    
    
      $ 'class'
        .addClass 'babies'
        .removeClass 'kids'

------
alexyoung
This is important to me:

s/Coffeescript/CoffeeScript/g

s/Javascript/JavaScript/g

~~~
mhartl
I don't know why so many people are cavalier about this. Proper capitalization
is part of proper spelling, which is important for clear communication.
Whether it's _37Signals_, _Github_, or _Javascript_, it irks me every time.

This cavalier attitude is so entrenched that attempts to correct it are
sometimes even met with hostility, which on HN manifests itself as downvotes.
Apparently there are those who feel that comments on this subject (e.g., this
one or its parent) don't add to the discussion. And yet, I sometimes stop
reading otherwise interesting articles simply because they exceed my
misspelling or typo threshold—reading badly edited copy is unpleasant, and the
lack of attention to detail undermines its credibility. I'd much prefer to
avoid the problem altogether. In that spirit, I'd like to offer some surefire
advice on how to prevent this kind of nitpicking: _Get it right in the first
place._ Anyone who can develop awesome web apps or write an optimizing
compiler can surely spell _37signals_, _GitHub_, and _JavaScript_ correctly
as well.

~~~
subsection1h

        I don't know why so many people are cavalier about this.
        Proper capitalization is part of proper spelling [...]
    

My favorite example of this behavior is when copywriters, graphic designers,
etc. are inconsistent regarding the capitalization of the name of their own
organization. I can't count all the times I couldn't figure out how best to
bookmark an organization's website because the copywriters, etc. referred to
it as Organization, ORGANIZATION, OrganiZation, and Organi Zation.

------
alexyoung
"writing functions to return a certain thing just to cater to how you like to
write programs is hacking around a missing language feature"

To me chaining demonstrates just how flexible JavaScript is, rather than
pointing out a fundamental missing language feature. In fact, by avoiding
adding language features like this, I feel like the language is simpler and
allows me to be more creative within its constraints.

~~~
raganwald

        Java(S|s)cript != Coffee(S|s)cript
    

:-)

Lisp does not consider indentation significant, Python does. Smalltalk has
cascading messages, Ruby doesn’t. I think these are simply design choices, and
the goal is to find a set of choices that work together harmoniously.

Note that my proposed syntax still allows you all the chaining you want.

------
PLejeck
Nice title change, much better than the racism from before, but I'm afraid my
spacebar, being white, is offended.

Additionally, my previous opinion
(<http://news.ycombinator.com/item?id=3296010>) still stands, that whitespace-
significance isn't such a good thing, and this whole "YAY TREES" stuff is
overrated.

~~~
raganwald
I regularly use languages where whitespace is not significant. However, in
those languages, whitespace is not significant. It isn't a case of being
significant some of the time and not significant the rest of the time. It is
a separator all of the time.

Coffeescript is a language where whitespace is held out to be significant, so
I’m simply saying “Great! Well in _that_ case, let’s make it _more
significant_."

I have no argument with the idea that perfectly good programming languages do
not consider whitespace significant.

~~~
PLejeck
I don't think whitespace should EVER be significant, it seems like a very bad
setup that's more prone to issues. I also have an unnatural hatred of all
things Compile-to-JS.

~~~
tomp
Which programming language do you prefer then? Don't say C, because in C,
whitespace obviously is significant:

    
    
      int a = 1;
    

vs.

    
    
      inta=1;
    

other languages, like CoffeeScript and Python, simply take it to the next
level.
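
To illustrate the point, here is a toy tokenizer showing how the space changes the token stream. It is a sketch using one simplistic regular expression, not a C lexer.

```javascript
// Toy tokenizer: identifiers, integer literals, or any single non-space character.
function tokenize(source) {
  return source.match(/[A-Za-z_]\w*|\d+|\S/g) || [];
}

console.log(tokenize("int a = 1;")); // [ 'int', 'a', '=', '1', ';' ]
console.log(tokenize("inta=1;")); // [ 'inta', '=', '1', ';' ]
```

The spaces around `=` make no difference, but the one between `int` and `a` decides whether there are two identifiers or one.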

~~~
agscala
I think your example is a little too much. Whitespace isn't significant in C.
When talking about programming languages, I don't think whitespace refers to
spaces between tokens. It's mostly a reference to indentation.

~~~
masklinn
> Whitespace isn't significant in C.

As demonstrated, it is.

> When talking about programming languages, I don't think whitespace refers to
> spaces between tokens.

But that makes no sense, it is whitespace, and it has semantic significance
(hence being significant). Whitespace was not significant in older versions of
Fortran, and that allowed you to write

    
    
        DO30I=10,100
    

which was interpreted as

    
    
        DO 30 I = 10, 100
    

_that_ is non-significant whitespace.

Ruby has long struggled with how it interpreted its whitespace, for a long
time

    
    
       sin (x) + y
    

would be interpreted as

    
    
       sin(x + y)
    

for instance, rather than

    
    
        sin(x) + y
    

how is _that_ not significant whitespace?

~~~
tordek
> But that makes no sense, it is whitespace, and it has semantic significance
> (hence being significant). Whitespace was not significant in older versions of
> Fortran, and that allowed you to write

    
    
        DO30I=10,100
    

> which was interpreted as

    
    
        DO 30 I = 10, 100
    
    

An amusing bug I saw in Expert C Programming mentioned how somebody once typed
a dot instead of a comma, and

    
    
        DO 30 I = 10. 100
    

ended up interpreted as a simple real assignment:

    
    
        DO30I = 10.1
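
A toy sketch of why that happens, assuming a parser that deletes all blanks before looking at the statement. `classify` and its two regular expressions are illustrative only and cover nothing beyond these two statement shapes.

```javascript
// Hypothetical classifier for blank-insensitive, fixed-form-style statements.
function classify(statement) {
  const squeezed = statement.replace(/\s+/g, "").toUpperCase();
  // DO <label> <var> = <from>, <to>   (the comma is what makes it a loop)
  const loop = squeezed.match(/^DO(\d+)([A-Z][A-Z0-9]*)=([^,]+),([^,]+)$/);
  if (loop) {
    return {
      kind: "loop",
      label: loop[1],
      variable: loop[2],
      from: Number(loop[3]),
      to: Number(loop[4]),
    };
  }
  // Otherwise <name> = <value> is a plain assignment.
  const assignment = squeezed.match(/^([A-Z][A-Z0-9]*)=(.+)$/);
  if (assignment) {
    return {
      kind: "assignment",
      variable: assignment[1],
      value: Number(assignment[2]),
    };
  }
  return { kind: "unknown" };
}

console.log(classify("DO 30 I = 10, 100").kind); // loop
console.log(classify("DO 30 I = 10. 100"));
// { kind: 'assignment', variable: 'DO30I', value: 10.1 }
```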

------
perfunctory
I like it

