Depending on why you want CPS, you can apply the following strategy:
Keep all code in direct style. For each function, designate a special return token CaptureCont that indicates that the continuation should be captured. The source of this token will be callcc. After each function call you check if it returned a token, and if so, extend it with the continuation for the rest of the computation. Then on the main() function of your program, pass the continuation to the appropriate place. This strategy avoids most overhead in the common case: one if instanceof check plus a branch per function call.
For example, take this function:
y = g(x)
member target # the function that we will pass the continuation to once we finish constructing it
member partialCont # the part of the continuation captured so far
newCC = new CaptureCont()
newCC.target = target
newCC.partialCont = lambda v: f(partialCont(v))
y = g(x)
if y is CaptureCont:
# now y is the part of the continuation captured so far
# our job is to extend it with our stack frame
# and then return it so that the callee can build it up further
return y.extend(lambda y2: y2+a)
v = main()
if v is CaptureCont: v.invoke()
else: exit(v) # well I guess this is not necessary in JS
cc = new CaptureCont()
cc.target = f
cc.partialCont = lambda x: x
The reason I want to do CPS is for debugging in the browser. I want to be able to control the stack, so when a breakpoint hits I can pause everything and step through the code. The browser is non-blocking so I need explicit control of the stack.
You're method is neat. My language doesn't support continuations, so I don't need them until I hit the debugging mode. You've made me curious if I can compile out both CPS-ed and non-CPS-ed versions of the code and somehow switch between them.
Outlet is a compiler which compiles straight to js, no interpretation.
> Outlet is a compiler which compiles straight to js, no interpretation.
Right; hence, you would have to include the implementation of the CPSed run time environment in the compiled output, just as you have to include the extra trampoline function when you do the current CPS transform.
However, if we do that globally, we're back where we started, right? I mean, the allocation rate of continuation closures will be on the same level as standard CPS.
> Another way if you're willing to take a 2x code explosion hit is to compile each function twice: ...
The idea about compiling functions twice and using a dynamic switch on function entry seems to presume that it can be determined at function call whether a continuation will be captured somewhere within the call and I don't see how you'd do that.
Alternatively, you could introduce a static analysis that identifies functions that provably are not capable of continuation capture and compile only those in direct style. I suppose you could even do the analysis on an expression-level. Of course, Lisp dialects are not particularly amenable to static analysis so it may be hard to get much of a benefit in this way.
However, even if you deal with the code-size issues, I think you are left with a technique for optimizing code for a language that must support continuation capture but you are not left with code that is easy to debug. Every function or block that is compiled in direct style will be impossible to step through or dynamically interrupt.
All criticism aside, thanks for posting the idea! I had never seen a technique like that before.
Hmm yes you're right.
> The idea about compiling functions twice and using a dynamic switch on function entry seems to presume that it can be determined at function call whether a continuation will be captured somewhere within the call and I don't see how you'd do that.
Ah! I meant that the switch determines where in the function call we were when the continuation was captured, so that if we rebuild the stack frame we can go back to the right place.
Okay so I'll here try to describe the method more clearly (also to clear it up for myself, I haven't thought about this for some time).
First, lets answer the following question: What's in a stack frame? Or equivalently: What's in a delimited continuation?
A stack frame is a data structure that we build right before making a function call, to keep track of what do after that function call completes. So what we need to save is two things: the code that we should execute, plus any of our local variables that that code will need. But there is also a third thing about a stack frame: a place indicating what to do with the return value of the function we called. All of this is neatly described by a closure: a closure is (1) a piece of code (2) a local environment (3) has a parameter. The parameter is the place we put the return value of the function we called. The return value of the closure will be the return value of our function. So if we are a function that returns a value of type T, and we call a function that returns a value of type Q, then a stack frame for completing the rest of our function call is a closure of type Q -> T. The whole stack is represented by a stack of stack frames. When we have a value of type Q, we continue running the program by popping the topmost stack frame, and calling it with the value Q, which gets us a T. Then we pop another frame, and call that with the T, and so on until the stack is empty. Contrast this with traditional continuation passing style, where the whole stack is represented by a single function which is the composition of all these stack frames.
To capture a continuation, we simply copy the stack of stack frames. But there is a problem: in most languages such a stack frame is represented by a particular structure in memory, not by any data structure that we can get our hands on from within the language. We could build it ourselves before each function call, and pass that down to the call in case that call needs to capture the continuation, but that would be prohibitively expensive because it incurs a heap allocation on each call (just like normal CPS). So instead we're going to rely on the machine stack for efficiency, and only construct those program level stack frames when a callee wants to get hold of the continuation. To do this we need a data structure that holds a partial continuation:
member stackframes # stack of stackframes
member target # the function to pass the continuation to when we're done building it
cc = new CaptureCont
cc.target = f
cc.stackframes = new List
The crucial question is: how do we represent a stack frame? We need to do two things with a stack frame: at each call site, we potentially have to construct it. The second thing is that we need to be able to invoke it. The former is going from a machine stack frame to a program level stack frame, the latter is doing the inverse: taking a program level stack frame and constructing it back on the machine stack. If we simply represent a stack frame as a closure, this will lead to code size explosion as you note. To solve this problem we need a bit of trickery.
Take any function, like this one:
x = bar(a);
y = baz(x);
def foo_rebuild_stackframe(i, env, ret):
a = env['a']
x = ret
a = env['a']
x = env['x']
y = ret
y = baz(x);
Note that on the assembly level, this code duplication is not necessary, because we could replace the original foo with:
x = bar(a);
Each continuation can now be represented by a triple consisting of a pointer to the <func>_rebuild_stackframe procedure, a numeric index, and an environment. Now we're going to transform the code to build such a stack frame (i.e. go from machine stack to program level stack).
To do that we take take each function foo and for each call that foo makes (say bar(a)), replace it with:
x = bar(a)
if x is CaptureCont:
x.stackframes.add(new StackFrame(foo_rebuild_stackframe, n, ENV))
Does this sound correct to you?
I apologize for the wall of text. If I find time I might post another describing in more detail how this lets you do the more powerful delimited continuations with the same low overhead on normal calls and amortized O(1) overhead for capturing a continuation. Note that traditional stack copying is O(call stack size) since if you repeatedly capture the continuation deep inside a call stack you capture the whole stack repeatedly. But in this case by rebuilding the machine stack one stack frame at a time you never capture the same stack frame twice, which means it's O(1) amortized time "spaghetti stacks on demand".
There are a couple of problems with this. The first is that it does not address tail calls at all. Tail calls will still overflow the stack. An extension of the same approach can solve that problem: keep track of the call stack depth and when it exceeds a value that you know could lead to stack overflow, capture the continuation and immediately invoke it :)
The second problem is that some languages, like JS, don't have gotos. That means that you need to find another method to represent the same control flow as in foo_rebuild_stackframe. When foo contains loops, that might pose a challenge (jumping back into the middle of a loop). Though it probably can be solved with code duplication linear in the nesting level of loops or something. If somebody who is good at turning goto code into (efficient!) structured control flow can explain how I'd be very grateful.
A possible extension of this is an optimization for escape continuations: for escape continuations you don't need to allocate any stack frames because they will not be used anyway. This lets you implement e.g. exceptions on top of continuations more efficiently (basically it turns exceptions into return codes for error handling as used in C). Another possible optimization is for coroutines: perhaps you can somehow reuse a previous stack frame for the next coroutine pause.
Another note: these stack frames are trivially serializable, so it's very easy to implement crazy things like a function teleport(address) that serializes the current state of the computation, sends it over to another computer over the network, and continues running there.
I think that what you describe can work. However, it's not something I can see myself using because I find source-to-source compilation too inflexible and simply don't use it. Such a technique remains as an optimization that I will probably never get to.