

Sweet.js: Hygienic Macros for JavaScript - philf
https://github.com/mozilla/sweet.js

======
saurik
Despite spending a lot of time both in the design Wiki and in the talk
discussing the importance of being able to determine whether a / indicates a
regular expression literal or a division operator entirely within the lexer
(as opposed to using the parser, which is how JavaScript is generally
defined), the algorithm that this developer implemented does not actually
work.
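
To make the problem concrete, here is a hypothetical, deliberately simplified rule that decides what a `/` means from the single token before it. The name `classifySlash` and the token set are assumptions for illustration only, not the sweet.js reader. The point is that `}` and `)` cannot be settled by one token of lookbehind, and `}` is exactly where the examples below go wrong.

```javascript
// Hypothetical one-token-lookbehind rule for classifying a `/`.
// NOT the sweet.js reader: it only shows which previous tokens settle
// the question and which do not.
const REGEX_AFTER = new Set([
  "(", "[", "{", ",", ";", ":", "?", "=", "==", "===", "!", "!=", "!==",
  "&&", "||", "+", "-", "*", "%", "<", ">", "<=", ">=",
  "return", "typeof", "instanceof", "in", "new", "delete", "void", "throw", "case",
]);

function classifySlash(prevToken) {
  if (prevToken === null) return "regex"; // start of input
  // `}` may close a block (a regex follows) or a function expression /
  // object literal (division follows); `)` may close an `if (...)` head or
  // a parenthesized expression. One token of lookbehind cannot tell.
  if (prevToken === "}" || prevToken === ")") return "ambiguous";
  if (REGEX_AFTER.has(prevToken)) return "regex";
  return "division"; // identifiers, numbers, `]`, ...
}

console.log(classifySlash("a"));      // division: `a / 5`
console.log(classifySlash("return")); // regex: `return /5/`
console.log(classifySlash("}"));      // ambiguous
```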

First off, an example where it works:

    
    
        a
        /5/
        7
    

If you run this through sjs you get:

    
    
        a / 5 / 7;
    

This is because, in JavaScript, a statement continues across line boundaries
until it is either explicitly terminated by a semicolon or the next token would
be a syntax error, in which case the parse is retried at that point as if a
semicolon had been provided. In this case, that means we have a single
statement that is a division of these three expressions: a, 5, and 7.
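
This can be checked with plain JavaScript, with no sweet.js involved. A minimal sketch: the snippet is placed in a `new Function` body and prefixed with `var result =`, an addition made only so the value is observable. Automatic semicolon insertion fires only before `return`, where `7 return` would not parse.

```javascript
// The three lines from the example, prefixed with `var result =`.
const source = "var result = a\n/5/\n7\nreturn result;";
const run = new Function("a", source);

// No semicolon is inserted after `a` or after `/5/`, because `a / 5 / 7`
// is a valid continuation; a semicolon is only inserted before `return`.
console.log(run(70)); // 2, i.e. 70 / 5 / 7
```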

However, let's take a more difficult case:

    
    
        a = function() {}
        /5/
        7
    

This is also a single statement: you are entirely allowed to attempt to divide
a function literal by a number, you will simply get the value NaN as output.
If you take this file and run it through node, adding a "console.log(a)" to
the end, that is in fact what you will get: NaN. However, when first run
through sjs, you instead get "[Function]".

The reason is that sjs translated the code to:

    
    
        a = function () {
        };
        /5/;
        7;
    

This is incorrect, and demonstrates how difficult some of these underlying
issues are when parsing languages that have intertwined lexer and parser
state. :( Attempting some other test cases involving regular expressions (but
not semicolon insertion) also failed: it seems a lot more work will need to be
done on this before it will be able to process general input (and it is not
100% clear to me that the shortcut required is even possible: I haven't
thought enough about it yet to say for certain, however).
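
The root of the difficulty can be reproduced in plain JavaScript. The same `} /5/ 7` characters mean division after a function expression but a regex literal after a function declaration. So a reader has to know how the `function` keyword was used (here, whether it followed `=`) before it can tokenize the `/`. The `new Function` wrappers and the names `f`, `asExpression`, and `asDeclaration` are only scaffolding for this sketch.

```javascript
// Function *expression* (right-hand side of `=`): the `/` is division,
// so `a` ends up as (function () {}) / 5 / 7, which is NaN.
const asExpression = new Function(
  "var a = function() {}\n/5/\n7\nreturn a;"
)();

// Function *declaration* (statement position): `/5/` is a regex literal,
// and semicolon insertion then splits `/5/` and `7` into separate statements.
const asDeclaration = new Function(
  "function f() {}\n/5/\n7\nreturn typeof f;"
)();

console.log(asExpression);  // NaN
console.log(asDeclaration); // the string "function"
```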

(I work on the JavaScript parser for a compile-to-JS language used by people
doing jailbroken-iOS development for live introspection of running processes,
and thereby that was the first thing I was interested in: how well the parser
worked. ;P I have intentions to add reader macros, and then replace all of the
extra Objective-C syntax I added with them, but I haven't gotten around to it
yet. FWIW: I actually found and fixed a bug in my parser while writing this
comment. ;P)

~~~
disnet
Yeah there are certainly a few bugs remaining in the reader :)

It actually does the right thing if the function is named:

    
    
        a = function foo() {}
        /5/
        7
    

correctly translates to:

    
    
        a = function foo() {
        } / 5 / 7;
    

But clearly I missed the unnamed case. You mentioned finding a few other bugs?
Would you mind submitting a bug report on github? I would love to fix those
too!

~~~
saurik
Sorry, was distracted by the other conversation with dherman, and really
shouldn't be spending any time on this anyway, but I've verified your operator
associativity is wrong.

To start with some example code:

    
    
        for (var a = 7 in /7/ in 9 in b);
    

If I run that in node I get:

    
    
        TypeError: Cannot use 'in' operator to search for '/7/' in 9
    

This is because the code is roughly equivalent to:

    
    
        var a = 7; for (a in ((/7/ in 9) in b));
    

However, it gets converted by sjs to:

    
    
        for (var a = 7 in (/7/ in (9 in b)));
    

That's obviously different, and gives this error instead:

    
    
        ReferenceError: b is not defined
    

(I really should get back to actually doing my job now, though; if dherman
responds again I'll totally notice and follow up: that conversation is really
interesting to me.)
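
The underlying rule is that `in`, like the other relational operators, is left-associative, so `x in y in z` groups as `(x in y) in z`. The sketch below uses made-up values. It avoids the `for (var a = 7 in ...)` form, which is only accepted in sloppy-mode code, and shows that the two groupings really do behave differently:

```javascript
const inner = { true: 1 }; // has a property named "true"

// Parsed as ("x" in { x: 1 }) in inner, i.e. true in inner, i.e. "true" in inner.
const leftGrouped = "x" in { x: 1 } in inner;
console.log(leftGrouped); // true

// Forcing the right grouping: ({ x: 1 } in inner) is false, and `"x" in false` throws.
let rightGroupedError = null;
try {
  "x" in ({ x: 1 } in inner);
} catch (e) {
  rightGroupedError = e;
}
console.log(rightGroupedError instanceof TypeError); // true
```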

~~~
disnet
Thanks for taking the time to write these up! I'm tracking them here [1]. The
first two should be fixed and I should have the third ready soon.

[1] <https://github.com/mozilla/sweet.js/issues/18>

------
saurik
There seemed to be some confusion during the question and answer segment
regarding the relative hygiene of macros in Clojure near the end of the
motivation and design talk; while it was totally off-topic for the video (and
it thereby made sense to take it offline), I personally wish I had been around
afterwards to ask the guy who seemed so adamant that syntax-case was
fundamentally better than the Clojure solution (which he claimed didn't do it
correctly) why that was the case.

I'm totally willing to believe it, but based on my understanding (which sadly
is somewhat limited for Scheme, but fairly in-depth for Clojure) it isn't
intuitive to me: it would seem like the way you escape hygiene in Clojure
(which by default achieves correct hygiene by attaching namespaces to symbols
read for macros or inside of quasi-quote) is quite similar in semantics--but
simpler in practice due to being exceedingly less verbose--than using
syntax->datum and datum->syntax.

~~~
dherman
That was me. :) Caveat: I know more about hygienic macros than I do about
Clojure, so I'm not in a position to critique Clojure specifically.

Hygiene is (roughly speaking) about getting scope right _by default_ but has
never been about forcing it on the programmer. Moreover, there are two
components to it, only one of which is easy to achieve in an unhygienic
system. It's easy to ensure your macro renames introduced /bindings/ by using
gensym. But if your macro introduces /references/ to existing variables, it's
very hard to protect against those references getting captured at the site
where clients call your macro.
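
The two halves can be illustrated with toy string-template "macros" in JavaScript. Everything here (`gensym`, `swapNaive`, `swapRenamed`, `useSwap`, `addOffset`) is a hypothetical helper, not the sweet.js API, and splicing strings is far cruder than a real expander. It only shows that renaming introduced bindings fixes the first problem but does nothing for the second.

```javascript
let gensymCounter = 0;
function gensym(base) {
  gensymCounter += 1;
  return `${base}$${gensymCounter}`; // e.g. "tmp$1"; `$` is legal in identifiers
}

// Half 1: a macro that introduces a *binding* (the temporary in swap).
function swapNaive(a, b) {
  return `(function () { var tmp = ${a}; ${a} = ${b}; ${b} = tmp; })();`;
}
function swapRenamed(a, b) {
  const t = gensym("tmp");
  return `(function () { var ${t} = ${a}; ${a} = ${b}; ${b} = ${t}; })();`;
}
function useSwap(macro) {
  // A call site that happens to have its own variable named tmp.
  return new Function(`var tmp = 1, other = 2; ${macro("tmp", "other")} return [tmp, other];`)();
}
console.log(useSwap(swapNaive));   // [ 1, 2 ]: the macro's tmp shadowed the user's
console.log(useSwap(swapRenamed)); // [ 2, 1 ]: gensym fixes the binding half

// Half 2: a macro that introduces a *reference* to `offset` from its own
// definition site. The macro binds nothing, so gensym has nothing to rename.
function addOffset(x) {
  return `(${x} + offset)`;
}
const referenceResult = new Function(`
  var offset = 1000; // binding at the macro's definition site
  return (function () {
    var offset = 100; // the client's call site shadows it
    return ${addOffset("1")};
  })();
`)();
console.log(referenceResult); // 101, not the intended 1001
```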

I believe in Clojure they get around this for some cases by letting you fully
qualify a reference to a library binding, for example. But what if your macro
wants to refer to a variable that's local to it? Such as an unexported library
function, or simply a local variable. Again, I don't know if Clojure has an
answer to this.

One concrete example: write a `define-inline` macro-defining-macro. At the
call site, a user might write

    
    
        (define some-local-variable #| something |#)
        (define-inline (foo x)
          (+ x some-local-variable))
    

In an unhygienic system, they should first of all fully-qualify the `+` to be
safe (yuck!) whereas a hygienic system just gets that right by default. But
more critically, how can they be sure that some client of `foo` won't write:

    
    
        (define some-local-variable "something else")
        (foo 42)
    

Not only will that break, it's not clear how to fix it. A hygienic system gets
this right.

I do agree that the state of the art in hygienic macros is too complex. I just
haven't seen another system that makes this kind of thing work. But I would
like to experiment with Clojure's macros more to see if they have an answer.

Dave

~~~
saurik
Clojure does not have these issues: when the macro is called, the symbols are
already attributed with the full namespace qualification, and usage of quasi-
quote inside of the macro definition will also apply namespace qualification
to variables local to the definition of the macro; you have to go out of your
way to break this. You should spend more time looking into it before claiming
to people that it doesn't work correctly; you could easily have just said
"that's a good question, we'll look into that after the talk" rather than
telling the person that Clojure wasn't as good.

~~~
dherman
> Clojure does not have these issues: when the macro is called, the symbols
> are already attributed with the full namespace qualification, and usage of
> quasi-quote inside of the macro definition will also apply namespace
> qualification to variables local to the definition of the macro; you have to
> go out of your way to break this.

Again, not a Clojure expert, but a namespace is coarser-grained than
individual local scopes, right? The problem I'm talking about is when you have
a local variable inside a nested scope (e.g. inside a `let`). If this is not
named by a namespace, then you would still get collisions.

Regardless, Clojure's approach seems to be much closer in spirit to a hygienic
macro system: it attempts to get scoping correct by default, and allows you to
intentionally capture.

> You should spend more time looking into it before claiming to people that it
> doesn't work correctly; you could easily have just said "that's a good
> question, we'll look into that after the talk" rather than telling the
> person that Clojure wasn't as good.

Fair enough as far as it goes. I did react snappily, but you didn't hear the
offline conversation (this whole thing was a dialog at my office with a friend
and colleague, incidentally) where I said "I'm not entirely sure, but I
believe there are things you just can't express with systems like Clojure's."
And we concluded, just as you reprimanded me to do, that we should look into
it further when we have time.

Dave

~~~
saurik
Correct: I did not hear the private conversation. I only heard the talk that
was made public along with this project that was posted here, and which was
recommended as an information source, and pretty much serves as the web page
and documentation for this project ;P.

> Again, not a Clojure expert, but a namespace is coarser-grained than
> individual local scopes, right? The problem I'm talking about is when you
> have a local variable inside a nested scope (e.g. inside a `let`). If this
> is not named by a namespace, then you would still get collisions.

Ok, so are you concerned with the case where the person defining the macro
uses a symbol from the namespace of the person using the macro, but that symbol
has been rebound by the user inside of a let surrounding the aforementioned
usage of the macro?

If so, that requires a cyclic module dependency, which isn't allowed (as the
namespace from which you are getting the symbol would need to be required, but
it would have to require back to get access to the macro: it does eager name
binding, so that can't happen).

If not, and you are just talking about the simpler and more obvious case of a
let shadowing a binding inside of a larger scope used by a macro, that works
fine. The following code prints "1100", despite the macro expanding to
multiple uses of the same symbol "t".

    
    
        (def t 1000) 
        (defmacro run [x] 
            `(+ ~x t))
        (let [t 100] 
            (prn (run t)))
    

You might then wonder (as I have) whether this is implemented by simply
renaming the variables bound by let to something random: that would be
sufficient to implement this. However, if I go out of my way to unquote an
unqualified symbol, I can capture: the following code prints "1200".

    
    
        (def t 1000)
        (defmacro run [x]
            `(+ ~x t ~'t))
        (let [t 100]
            (prn (run t)))
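
Below is a rough JavaScript model of what the two expansions above turn into. It rests on one assumption: syntax-quote's namespace qualification (`t` becoming, e.g., `user/t` in the default `user` namespace) can be modelled as a property lookup on a namespace object `ns`, while `~'t` stays a bare, lexically captured `t`. The names `userNs`, `expandQualified`, `expandCapturing`, and `evalAtCallSite` are invented for the sketch. This is not how Clojure is implemented.

```javascript
const userNs = { t: 1000 }; // (def t 1000)

// `(+ ~x t)     expands to (+ t user/t)
const expandQualified = (x) => `(${x} + ns.t)`;
// `(+ ~x t ~'t) expands to (+ t user/t t)
const expandCapturing = (x) => `(${x} + ns.t + t)`;

function evalAtCallSite(expansion) {
  // (let [t 100] ...): a local t, plus access to the namespace object.
  return new Function("ns", `const t = 100; return ${expansion};`)(userNs);
}

console.log(evalAtCallSite(expandQualified("t"))); // 1100
console.log(evalAtCallSite(expandCapturing("t"))); // 1200
```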

~~~
dherman
> Ok, so are you concerned with the case where the person defining the
> macro uses a symbol from the namespace of the person using the macro but
> that symbol has been rebound by the user inside of a let surrounding the
> aforementioned usage of the macro?

> If so, that requires a cyclic module dependency...

Not necessarily. For example (forgive the Scheme syntax), all in one module:

    
    
        (define thing "outer thing")
        ;; define-inline is the above macro-defining-macro
        (define-inline (foo prefix)
          (string-append prefix thing))
        (let ((thing "inner thing"))
          (foo "should say outer thing: "))

~~~
saurik
That seems to be my "if not" case, which I provided some examples for; if
this is different, can you please be more explicit? It seems like your "thing"
is my "t" and your "foo" is my "run": the only difference is then that I went
out of my way to make it more complex by passing the inner thing through the
macro to demonstrate it wouldn't get mangled.

    
    
        (def thing "outer thing")
        (defmacro foo [prefix]
            `(str ~prefix thing))
        (let [thing "inner thing"]
            (prn (foo "should say outer thing: ")))
    

"should say outer thing: outer thing"

(edit:) Alternatively, maybe you are focussing on the define-inline "macro-
defining macro"; you mentioned it here again as "the above", and you had used
it above, but as it wasn't defined it didn't seem important. I tried to go
ahead and implement it, although to be honest I feel like I did it wrong
(spending more time thinking about it, I believe it is correct, modulo your
definition of "inline"); that said, it "worked".

    
    
        (defmacro def-inline [[name & args] code]
            `(defmacro ~name ~(apply vector args)
                ~code))
    
        (def thing "outer thing")
        (def-inline [foo prefix]
            (str prefix thing))
        (let [thing "inner thing"]
            (prn (foo "should say outer thing: ")))
    

"should say outer thing: outer thing"

~~~
dherman
Interesting. I don't see how this works. I wonder, is it different if you do
this?

    
    
        (let [thing "outer thing"]
          (defmacro foo [prefix]
              `(str ~prefix thing))
          (let [thing "inner thing"]
              (prn (foo "should say outer thing: "))))
    

Dave

~~~
saurik
That is not possible as written, because the "defmacro" is not executed to
define the macro until after the outer let is already executing, which is
after macro expansion of that form (and thereby its children), as it has
already been read: so what I get for that is a really weird error that I'm
passing too many arguments to "foo", as if it were a function (which it is
not; albeit I'm not certain what it _is_ ;P).

However, I can use the def-inline that I wrote in the edit to my earlier reply
to demonstrate that if you reorganized this code in a way that was
semantically equivalent but hoisted the macro, it would work the way you think
it should: the definition of the thing from the let surrounding the macro-ish
definition is used, not the one from the call site (or the global one in the
namespace).

    
    
        (defmacro def-inline [[name & args] code]
            `(defmacro ~name ~(apply vector args)
                ~code))
    
        (def thing "outer thing")
        (let [thing "middle thing"]
            (def-inline [foo prefix]
                (str prefix thing)))
        (let [thing "inner thing"]
            (prn (foo "should say middle thing: ")))
    

"should say middle thing: middle thing"

(edit:) Oh, that wasn't semantically equivalent, as the second let is not
inside of the first. However, if I do that, I get the same behavior as I get
in the other case (that it doesn't actually expand the macro at all and treats
the form as a function call), as I'm obviously just defining the macro again
inside of the same already-read form in which I'm using it.

(further:) Okay, and the reason why that is working is that the way I wrote
def-inline inlined the code from def-inline into the macro itself. That is
probably not what you wanted from def-inline: this is more like def-const (or
def-static or something). I thereby tried doing this instead:

    
    
        (let [thing "middle thing"]
            (defmacro foo [prefix]
                `(str ~prefix thing)))
        (let [thing "inner thing"]
            (prn (foo "should say middle thing: ")))
    

This, in fact, does _not_ return the "correct" string: instead, it fails to
work at all, as the "thing" used inside of the macro is supposedly not defined
(and worse, if I have a global def for "thing", I get that value). So, this
is a case where the macro is unable to use bindings that are local to the time
when the macro definition is executed: it can only deal with global-ish names.

For the record, I think that is unrelated to what I normally think of with
relation to hygiene: the macro is able to modify the code using it without
accidental capture, but humorously it, itself, is unable to take advantage of
symbols that have been bound locally around it. I totally accept that this is
probably a flaw (I only say "probably" as I'm willing to believe someone from
Clojure can convince me otherwise; it certainly _seems_ like a flaw, though).

(more:) I am increasing the weight of that "probably", as I'm noting that the
person calling this code has absolutely no way to refer to the thing that I
have access to: there is no path no matter how complex or awesome that would
let it refer to my "middle thing". I can inline it with ~, but then it isn't a
binding anymore; however, that's _actually equivalent_, as Clojure does not
have setf: "thing" is a constant, and so even if I had a function inside of
this let closed over that thing, I couldn't modify its value.

~~~
dherman
I'd forgotten about the whole Lisp tradition of expanding + evaluating one
form at a time -- once again betraying my Scheme biases. :) That's making it
hard for me to think about how to compare the two. I'm really not familiar
enough with the one-form-at-a-time approach.

Anyway, my takeaways here are:

\- I shouldn't've said anything about Clojure specifically, because a) the
design space is different and b) they have some form of hygiene-like
something-or-other that I don't know enough about. Gotta go study them!

\- I still don't know how to do hygiene in Scheme-like languages any other way
than the approaches I've seen, but I do agree they're more complicated than I
wish they were.

Dave

------
SchizoDuckie
OK, either your description page doesn't get your point across at all, or
you're missing a definition somewhere.

'Wish the function keyword in JavaScript wasn't so long? What if you could
define functions with def instead?'

Erh, no? Why in the lord's name would I? Is that your big selling point? 'I
don't like function() because it's too long?'

'Want a better way to make "classy" objects?'

<provides example that looks like php vomited on Javascript after mating with
ruby>

Why would you want to make JavaScript less like JavaScript, introduce a
dependency on JavaScript that can read your language and compile it back
into JavaScript in real time, in a way that will obviously make debugging
nearly impossible (like CoffeeScript)?

Am I the only one that doesn't understand the use case of this? Or should this
have been presented as just another lexer/parser?

~~~
disnet
> OK, either your description page doesn't get your point across at all, or
> you're missing a definition somewhere.

The description page (like the rest of the project at the moment) is
definitely a work in progress. Sorry about the confusion!

> 'Wish the function keyword in JavaScript wasn't so long? What if you could
> define functions with def instead?'

> Erh, no? Why in the lord's name would I? Is that your big selling point? 'I
> don't like function() because it's too long?'

This is just a really basic example of what you can do. Obviously you could
have the same effect with sed, but we need something simple to show what basic
macro definitions look like.
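
Why is even a trivial rename better done on tokens than with sed? Here is a hypothetical JavaScript sketch (not sweet.js code) that rewrites `def` to `function` in two ways. The textual version also rewrites the `def` inside a string literal, while the token-aware version skips string literals. It is still far from a real reader: comments, regex literals, and template literals are not handled.

```javascript
const src = 'def greet(name) { return "def " + name; }';

// sed-style: rewrites every standalone "def", including the one inside the string.
const textual = src.replace(/\bdef\b/g, "function");

// Token-aware: match string literals as whole units and leave them untouched.
const STRING_OR_DEF = /"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'|\bdef\b/g;
const tokenAware = src.replace(STRING_OR_DEF, (m) => (m === "def" ? "function" : m));

console.log(textual);    // function greet(name) { return "function " + name; }
console.log(tokenAware); // function greet(name) { return "def " + name; }
```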

> 'Want a better way to make "classy" objects?'

> <provides example that looks like php vomited on Javascript after mating
> with ruby>

I'm assuming you refer to the macro definition with $ and whatnot. If you have
ideas for better syntax please contribute! The notation is by no means final.
That said I think it's pretty good for what it needs to do, matching syntax
patterns and transforming them into new bits of syntax. Certainly more
readable than a full parser/compiler.

> Am I the only one that doesn't understand the use case of this? Or is it just
> another lexer/parser?

The idea is to provide a middle way between no syntax extensions and writing
your own full compile-to-JS language. Languages like CoffeeScript allow you to
add any kind of syntax you want, but at the expense of being able to compose
extensions. If you like the class notation of CS, you must buy into all the
syntax choices of CS, whereas macros allow you to add just the syntax you
want.

Whether you want new syntax is of course a different question. Other languages
like Scheme and Clojure have found macros useful, so sweet.js is an experiment
to find out whether the same syntactic extensibility that they provide is
useful in JavaScript.

Hope that helps!

~~~
SchizoDuckie
Okay, now that's a story that makes me understand what this is about, thanks!

How about a clearer description like the one above on your page, with some
write-ups on macros and some more complicated examples that tell me as a
programmer: "Yes, I _do_ need macros!" and I can use them for these and these
cases?

Also, don't underestimate how smart your audience is going to be by giving
them just 3-line examples. I always find it clearer when there are some
actual use cases on the wiki pages. How about a proper class, or a
TodoMVC-style demo, so that people can actually imagine what kind of
possibilities this would unlock?

