Hacker News new | comments | show | ask | jobs | submit login

Why do you need to transform to CPS? The answer to your question will depend very much on your answer to that question. In the extreme if you are not doing anything with the CPS form, then probably the best optimization of the CPSed form will completely undo the CPS, transforming it back into direct style. Of course if you are actually making use of the CPS form, that's not possible anymore.

Depending on why you want CPS, you can apply the following strategy:

Keep all code in direct style. For each function, designate a special return token CaptureCont that indicates that the continuation should be captured. The source of this token will be callcc. After each function call you check if it returned a token, and if so, extend it with the continuation for the rest of the computation. Then on the main() function of your program, pass the continuation to the appropriate place. This strategy avoids most overhead in the common case: one if instanceof check plus a branch per function call.

For example, take this function:

    def foo(x,a):
      y = g(x)
      return y+a
Transform to:

    class CaptureCont:
      member target # the function that we will pass the continuation to once we finish constructing it
      member partialCont # the part of the continuation captured so far
      def extend(f):
        newCC = new CaptureCont()
        newCC.target = target
        newCC.partialCont = lambda v: f(partialCont(v))
      def invoke():

    def foo(x):
      y = g(x)
      if y is CaptureCont:
        # now y is the part of the continuation captured so far
        # our job is to extend it with our stack frame
        # and then return it so that the callee can build it up further
        return y.extend(lambda y2: y2+a)
        return y+a
Instead of just invoking main(), do this:

      v = main()
      if v is CaptureCont: v.invoke()
      else: exit(v) # well I guess this is not necessary in JS
Eventually the CaptureCont will bubble up to main, and at that point main will invoke it so that the continuation is passed to the right target.


    def callcc(f):
      cc = new CaptureCont()
      cc.target = f
      cc.partialCont = lambda x: x
      return cc
There are various optimizations that can be applied (like eliding the capture code for functions that you statically know don't call callcc and thus will never return CaptureCont), and this approach also (trivially) works for the more general delimited continuations. As described capturing the continuation is O(n) in the size of the stack. You can make it amortized O(1) with some more trickery by lazily rebuilding the stack, but the current approach is already blazingly fast in the case of no continuation capture. I'm really tired now, so there are probably lots of errors in the above description, so tread with caution. I don't know if this approach is new or not, but it probably isn't given that it's a fairly obvious approach to capture the stack. Does anybody know what it's called? Then the OP can find a better description.

That's a smart trick, and it's possible I might be able to do something like that.

The reason I want to do CPS is for debugging in the browser. I want to be able to control the stack, so when a breakpoint hits I can pause everything and step through the code. The browser is non-blocking so I need explicit control of the stack.

You're method is neat. My language doesn't support continuations, so I don't need them until I hit the debugging mode. You've made me curious if I can compile out both CPS-ed and non-CPS-ed versions of the code and somehow switch between them.

I did a very similar thing with a python-implemented LISP interpreter, mainly so that I could implement a trampoline so I could do tail call optimization. It required CPSing the interpreter (and built-in functions), but not the interpreted program- this works because the continuation of the interpreted program is contained in the continuation of the interpreter that runs it. The interpreted "stack" ends up as a chain of Python closures.

You might be able to get away with compiling a runtime environment slightly more complex than your simple trampoline loop that does the same thing, building a chain of JavaScript closures to represent continuations without actually CPSing the rest of the program. Continuation calls would then show up on the JavaScript stack and allow you to step through them.

Wouldn't that be even slow though? Not only is it CPS code, but it's a program being interpreted by CPS code. I think I see what you're getting at though. Sounds neat; I definitely have a lot of new ideas for implementing CPS in a performant way now.

Outlet is a compiler which compiles straight to js, no interpretation.

Not sure; I haven't got anything to benchmark my implementation against. My suspicion is that it would be faster because you get chains of interpreted-continuations-implemented-as-interpreting-closures that call each other normally, rather than trampolining absolutely everything. But I would not be surprised if I turn out to be completely wrong about that.

> Outlet is a compiler which compiles straight to js, no interpretation.

Right; hence, you would have to include the implementation of the CPSed run time environment in the compiled output, just as you have to include the extra trampoline function when you do the current CPS transform.

Won't that lead to a code-size explosion due to the duplication of continuation bodies?

As described yes, but you can extract the duplicate bodies in a function. That way there is no code-size explosion. Another way if you're willing to take a 2x code explosion hit is to compile each function twice: once for normal calling and another time for restoring the stack. The way to do this is to give the function an additional parameter that indicates where in the function it should start, and then do a switch on that parameter to go to the right point. If you can't take the 2x hit you can also replace all normal calls with a call to the special function with the parameter set so that it will start at the beginning of the procedure, although this makes your code a little slower for the case that no continuations are captured.

> As described yes, but you can extract the duplicate bodies in a function.

However, if we do that globally, we're back where we started, right? I mean, the allocation rate of continuation closures will be on the same level as standard CPS.

> Another way if you're willing to take a 2x code explosion hit is to compile each function twice: ...

The idea about compiling functions twice and using a dynamic switch on function entry seems to presume that it can be determined at function call whether a continuation will be captured somewhere within the call and I don't see how you'd do that.

Alternatively, you could introduce a static analysis that identifies functions that provably are not capable of continuation capture and compile only those in direct style. I suppose you could even do the analysis on an expression-level. Of course, Lisp dialects are not particularly amenable to static analysis so it may be hard to get much of a benefit in this way.

However, even if you deal with the code-size issues, I think you are left with a technique for optimizing code for a language that must support continuation capture but you are not left with code that is easy to debug. Every function or block that is compiled in direct style will be impossible to step through or dynamically interrupt.

All criticism aside, thanks for posting the idea! I had never seen a technique like that before.

> However, if we do that globally, we're back where we started, right?

Hmm yes you're right.

> The idea about compiling functions twice and using a dynamic switch on function entry seems to presume that it can be determined at function call whether a continuation will be captured somewhere within the call and I don't see how you'd do that.

Ah! I meant that the switch determines where in the function call we were when the continuation was captured, so that if we rebuild the stack frame we can go back to the right place.

Okay so I'll here try to describe the method more clearly (also to clear it up for myself, I haven't thought about this for some time).

First, lets answer the following question: What's in a stack frame? Or equivalently: What's in a delimited continuation?

A stack frame is a data structure that we build right before making a function call, to keep track of what do after that function call completes. So what we need to save is two things: the code that we should execute, plus any of our local variables that that code will need. But there is also a third thing about a stack frame: a place indicating what to do with the return value of the function we called. All of this is neatly described by a closure: a closure is (1) a piece of code (2) a local environment (3) has a parameter. The parameter is the place we put the return value of the function we called. The return value of the closure will be the return value of our function. So if we are a function that returns a value of type T, and we call a function that returns a value of type Q, then a stack frame for completing the rest of our function call is a closure of type Q -> T. The whole stack is represented by a stack of stack frames. When we have a value of type Q, we continue running the program by popping the topmost stack frame, and calling it with the value Q, which gets us a T. Then we pop another frame, and call that with the T, and so on until the stack is empty. Contrast this with traditional continuation passing style, where the whole stack is represented by a single function which is the composition of all these stack frames.

To capture a continuation, we simply copy the stack of stack frames. But there is a problem: in most languages such a stack frame is represented by a particular structure in memory, not by any data structure that we can get our hands on from within the language. We could build it ourselves before each function call, and pass that down to the call in case that call needs to capture the continuation, but that would be prohibitively expensive because it incurs a heap allocation on each call (just like normal CPS). So instead we're going to rely on the machine stack for efficiency, and only construct those program level stack frames when a callee wants to get hold of the continuation. To do this we need a data structure that holds a partial continuation:

   struct CaptureCont:
      member stackframes # stack of stackframes
      member target # the function to pass the continuation to when we're done building it
The way that the callee signals to us that it wants a continuation is by allocating a CaptureCont, setting target to the function it wants to pass the continuation to, and return this CaptureCont. In other words, callcc(f) can be defined as:

    def callcc(f):
       cc = new CaptureCont
       cc.target = f
       cc.stackframes = new List
       return cc
If we called a function that did not return an ordinary value but it did return a CaptureCont, it is now our duty to extend the cc with the rest of the stack frames. To do this, build a data structure out of our local stack frame by saving the local variables and the bit of code that's still to be executed in our function, and push it onto cc.stackframes. Then return the cc to our caller, so that it can finish the job of pushing the rest of the stack frames onto cc.stackframes. This goes on up to main(). At that point cc.stackframes is a valid representation of a continuation. So main simply calls cc.target(new Continuation(cc.stackframes)) to pass the continuation to the right target. Continuation is just a wrapper around the stack frames. A continuation can be invoked with a value, which passes the value to the topmost stack frame, and then that result to the second topmost stack frame, etcetera as described earlier.

The crucial question is: how do we represent a stack frame? We need to do two things with a stack frame: at each call site, we potentially have to construct it. The second thing is that we need to be able to invoke it. The former is going from a machine stack frame to a program level stack frame, the latter is doing the inverse: taking a program level stack frame and constructing it back on the machine stack. If we simply represent a stack frame as a closure, this will lead to code size explosion as you note. To solve this problem we need a bit of trickery.

Take any function, like this one:

    def foo(a):
      x = bar(a);
      y = baz(x); 
Since it calls bar and baz, and since bar and baz will perhaps capture the continuation, we need to be able to represent both the stack frame for what to do after bar returns and a different stack frame for what to do after baz returns. For every function in the program we will define a single function that can take a program level stack frame and build a machine stack frame out of it. We will represent a stack frame as a triple: (1) that function that can rebuild it (2) an integer `i` indicating where in the function it should start (3) an `env` consisting of local variables. The function that rebuilds the stack frame will need 3 pieces of information: the `i` and the `env`, and the return value of the function we called (after all, the reason that we needed the stack frame is because we did a function call; this return value is the return value of that function call). For example, for foo:

    def foo_rebuild_stackframe(i, env, ret):
      if i==0: 
         a = env['a']
         x = ret
         goto L0
      if i==1: 
         a = env['a']
         x = env['x']
         y = ret
         goto L1

      y = baz(x); 
As you can see, foo_rebuild_stackframe takes the i and the env and the return value, and resumes the computation that that data represents: first it builds up the local variables, then it jumps to the right place in the computation.

Note that on the assembly level, this code duplication is not necessary, because we could replace the original foo with:

   def foo(a):
     x = bar(a);
     goto L0
But I don't think most languages or compilers allow gotos across different functions, unfortunately, so you need that 2x code duplication to avoid any overhead beyond the if statement after each call, in the case that no continuation was captured (instead of `goto L0` we could do `foo_rebuild_stackframe(0, {'a':a}, x)`, and we could even arrange it so that no allocation was necessary, but that would be one extra function call overhead per function call in the no continuation case).

Each continuation can now be represented by a triple consisting of a pointer to the <func>_rebuild_stackframe procedure, a numeric index, and an environment. Now we're going to transform the code to build such a stack frame (i.e. go from machine stack to program level stack).

To do that we take take each function foo and for each call that foo makes (say bar(a)), replace it with:

    x = bar(a)
    if x is CaptureCont:
      x.stackframes.add(new StackFrame(foo_rebuild_stackframe, n, ENV))
      return x
In the above example, n=0 and ENV = {'a':a}.

Does this sound correct to you?

I apologize for the wall of text. If I find time I might post another describing in more detail how this lets you do the more powerful delimited continuations with the same low overhead on normal calls and amortized O(1) overhead for capturing a continuation. Note that traditional stack copying is O(call stack size) since if you repeatedly capture the continuation deep inside a call stack you capture the whole stack repeatedly. But in this case by rebuilding the machine stack one stack frame at a time you never capture the same stack frame twice, which means it's O(1) amortized time "spaghetti stacks on demand".

There are a couple of problems with this. The first is that it does not address tail calls at all. Tail calls will still overflow the stack. An extension of the same approach can solve that problem: keep track of the call stack depth and when it exceeds a value that you know could lead to stack overflow, capture the continuation and immediately invoke it :)

The second problem is that some languages, like JS, don't have gotos. That means that you need to find another method to represent the same control flow as in foo_rebuild_stackframe. When foo contains loops, that might pose a challenge (jumping back into the middle of a loop). Though it probably can be solved with code duplication linear in the nesting level of loops or something. If somebody who is good at turning goto code into (efficient!) structured control flow can explain how I'd be very grateful.

A possible extension of this is an optimization for escape continuations: for escape continuations you don't need to allocate any stack frames because they will not be used anyway. This lets you implement e.g. exceptions on top of continuations more efficiently (basically it turns exceptions into return codes for error handling as used in C). Another possible optimization is for coroutines: perhaps you can somehow reuse a previous stack frame for the next coroutine pause.

Another note: these stack frames are trivially serializable, so it's very easy to implement crazy things like a function teleport(address) that serializes the current state of the computation, sends it over to another computer over the network, and continues running there.

It seems to me that there's a latent virtual machine in this design. You are breaking code into basic blocks and handling control state in a first class way, so there need to be abstractions for all that, and those abstractions define a virtual machine.

I think that what you describe can work. However, it's not something I can see myself using because I find source-to-source compilation too inflexible and simply don't use it. Such a technique remains as an optimization that I will probably never get to.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact