Stack Free Recursion (1995)

barrkel · on Nov 9, 2014

The "spaghetti code" conversion - using labels instead of calls - only works if the original recursion didn't actually need to be recursive, i.e. it didn't require storage proportional to its maximum stack depth, and could already have been rewritten as a loop.

Other than that, the approach trades code (time) for data (space). Instead of making a copy of activation record variables at each level of the recursion, the mutation of the activation record variables is implicitly encoded in the return address pushed onto the call stack, because the code at the return address undoes the mutation.

icarot · on Nov 9, 2014

I'm so confused by this. His solutions end up using recursion, yet he says it's stack-free. Which doesn't make any sense at all considering deficient compilers allocate stack frames on each call for languages like C. This is all chapter 1 SICP-grade stuff. Unless you're going to do a technique (read: hack) such as trampolining (certainly not done here) I don't see how it's possible to have recursion without a stack. The runtime should keep pushing frames. Notice there's still a pointer defined in the Hanoi().

I don't think this can be computed on a register machine. What do you girls/guys think?

waterhouse · on Nov 9, 2014

The title ("Stack Free Recursion") is certainly misleading, as are the labels of several examples ("Stackless Factorial", "This Form of Routine Can Be Changed to Stack Free Recursion", the function stackfree(x), "Stackless Towers of Hanoi"), in that all these examples, and all but one example in the paper, implicitly use a stack for return addresses.

However, in the abstract, the definition of the bet, and throughout much of the article, the author does distinguish between the concepts of a "data stack" and a "return stack", and he illustrates one case in which the return stack is in fact eliminated entirely (and says, at the bottom, that it is "sometimes possible" to remove the return stack). Overall, 19/48 instances of the string "stack" are preceded by either "data" or "return".

Is it fair to use "stackless" to mean "data-stack-less", without saying so explicitly? I would say no. The stated advantage is that less memory is used (and that the result "sometimes is faster", presumably due to fewer push/pop operations). However, "stackless" implies that you've eliminated the problem completely, when you've achieved less, perhaps significantly less than that. For example, in the case of factorial, the naive approach stores two CPU words per level of recursion (the value of x and the return address), and the (data-)stackless approach stores one CPU word per recurse (the return address); this fixes half the problem.

Also, see this in the abstract: "We also present a method that uses no return stack or data stack and we derive a simple line drawing function using the information presented herein." This wording strongly suggests that the line drawing function doesn't use a return stack, when it does. A comma after "data stack" would make it a little less non-obvious that the two statements in the sentence are not related to each other. I would have to call this paper "oversold".

icarot · on Nov 10, 2014

Definitely agree on the "oversold".

I notice you said "half the problem" at the end of your third paragraph there. I'm assuming the other half is that it seems foolish to save the return address for a recurse. Only the first call should save a return. I think one problem with compiler writers is that oddly they don't separate how to treat a procedure and its recursions as different cases. They just fit every procedure into the same bucket-list algorithm of 1. allocate frames 2. allocate return address.

tromp · on Nov 9, 2014

Hanoi is not the best example of a problem calling for recursion, as the following iterative solution shows:

    max = 1 << no_of_discs;
    for (x = 1; x < max; x++)
        printf("move a disc from %d to %d\n", (x&x-1)%3, ((x|x-1)+1)%3);

icarot · on Nov 10, 2014

You're right. The de-facto way to prove an algorithm as stack-free is run it on a register machine, having no stack to begin with. As you suggest, it'd also be optimal to use a problem which calls for deep recursion.

Hence the task, should ye choose to accept, is compute the Ackermann function as an iterative process (i.e, with state variables in the recursion keeping track of which step at the process you are at). The following is the easy, non-iterative process:

(define (A x y) (cond ((= y 0) 0) ((= x 0) (* 2 y)) ((= y 1) 2) (else (A (- x 1) (A x (- y 1))))))

space is O(n) to record past results in one of the two subtrees of this tree recursion at any given time, which must both be evaluated separately, and at separate times if evaluation is deterministic (not concurrent). So if you were to recast it iteratively, you'd hold an immaculate benchmark of an abstrusely deep recursion which you could run on a register machine. I'm not sure if anyone has done this.

wyager · on Nov 9, 2014

> I don't see how it's possible to have recursion without a stack.

Well, it depends what you mean by "without a stack", and it's also the case that for some recursive algorithms, you can make them stackless.

Any recursive algorithm is isomorphic to a non-recursive algorithm and a separate stack (like a linked list), but that's still "with a stack".

Many recursive algorithms are tail-call recursive, which means that you can reduce the algorithm to a fixed-space loop. For example,

    int fac'(int accum, int n){
        if(n==0) return accum;
        else return fac'(accum*n, n-1);
    }
    int fac(int n){return fac'(1,n);}

is equivalent to a loop

    int fac(int n){
        int accum = 1;
        while(n>0){
            accum *= n;
            n--;
        }
        return accum;
    }

Compilers like GCC and Clang will make this optimization.

Other (co)recursive algorithms are WHNF tail-call recursive, meaning that the algorithm can be expressed as a function that returns a partially evaluated value and another function to evaluate the next part of the value. This also requires constant space, but you won't find it in a whole lot of languages. It's popular in Haskell, for example:

    fibs a b = a : fibs b (a+b)
    fib n = fibs 0 1 !! n

Even though "fibs" (the function that generates the list of Fibonacci numbers) is recursive, and, in fact, infinitely recursive, it requires constant space.

You may also be interested in reading http://wwwhome.ewi.utwente.nl/~fokkinga/mmf91m.pdf , which explains a number of morphisms that might be leveraged to transform certain recursive functions into constant-space variants.

icarot · on Nov 10, 2014

Thanks so much for that paper. That'll save me time thinking about transforming functions on my own.

Are you sure they make that optimization? How do you know? (I'm not familiar with either code base).

Are you absolutely sure that example is O(1)? It looks O(n) to me. I'm not experienced in Haskell, by the way: disclaimer.

Each call to 'fibs' constructs a list with its first argument and the recursive call. But for all of the recursions (aka, 2nd element and on) I think they'll have to build up deferred cons operations. I'm not sure about this at all. I sound suspicious, I know, but before you say anything, look here: http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-11.html...

don't they look suspiciously similiar? In that example, you'd have needed to store a function pointer and the argument'(s) type(s) for each call, on the stack, making space usage O(n). Each call pushes to the stack.

Does the Haskell example seem similar to you? I'm really uncertain.

wyager · on Nov 10, 2014

>Are you sure they make that optimization? How do you know?

Yeah, if you write a tail-call recursive fib function and turn on optimizations, the optimizer should definitely loop-ify it.

I tested this a while back on my machine. Writing C and compiling with LLVM yielded a 5-instruction loop, and writing tail-call recursive, strict (not WHNF-recursive) Haskell with optimizations on yielded a 4-instruction loop. Both were very fast. You can look at the generated assembly with a variety of tools. I used Apple's profiler tools.

>Each call to 'fibs' constructs a list with its first argument and the recursive call. But for all of the recursions (aka, 2nd element and on) I think they'll have to build up deferred cons operations.

Correct. If you saved the list, it would be O(n). However, since you throw away the head of the list and move on to the next element immediately (if you want the nth fib number), the garbage collector will get rid of all the generated list elements, so the space consumption is only the list element you want (or are looking at currently), so it's O(1).

It might look like this, in pseudo-code.

    def fibn(n){
        (curr, nextGenerator)=fibs(0, 1);
        for(i=0; i<n; i++){
            (curr, nextGenerator) = nextGenerator();
        }
        return curr;
    }

Since the rest of the list is passed as an unevaluated function, you only need to keep track of two things at once: the head of the list (in case it's the correct element) and the function to make the rest of the list.

Also, Haskell doesn't really use the stack like a lot of other languages do. That's why you can have infinitely recursive functions without running out of space. It accomplishes this using the technique I put in that loop up there: passing values around as unevaluated functions.

(Side note: Haskell doesn't keep track of types at runtime. The compiler erases any type information before compilation, and tries to eliminate any dynamic function lookup. As long as the types match up during compile time, they will match up during runtime.)

tome · on Nov 9, 2014

> I don't see how it's possible to have recursion without a stack.

Recursion can happen without a stack as long as function calls can happen without a stack. It's well-known how to achieve this. See Tail Call Optimization, for example.

icarot · on Nov 10, 2014

That's why I said 'deficient' compilers. Yes, it is well-known. Just reuse the stack frames on the next call; dead simple.

rogerbraun · on Nov 9, 2014

Isn't this just rewriting tail calls (http://en.wikipedia.org/wiki/Tail_call)? Am I missing something?

barrkel · on Nov 9, 2014

The "spaghetti code" approach isn't far off, no. But there is an idea which is distinct; it's reducing data usage at the cost of running more code. Not all recursive functions can be made tail recursive (at least, not without other tricks). But the approach first described in the article can be applied to any recursive function. It will still need a stack of return addresses, but no stack for parameters or locals.

LukeShu · on Nov 9, 2014

    > It will still need a stack of return addresses

Keep reading, further down he even gets rid of the stack of return addresses.

barrkel · on Nov 9, 2014

No; the stack of return addresses is required when the variable mutations don't have a consistent inverse.

segmondy · on Nov 9, 2014

This was 1995, Outside of Prolog, LISP, Scheme, and other exotic languages, what popular languages of that era offered TC? (C, Pascal, C++, BASIC, FORTRAN, etc) didn't.

dragonwriter · on Nov 10, 2014

Tail call optimization, strictly speaking, isn't a language feature (though a language standard can require it in conforming implementations, as Scheme standards have. Which brings me to:

> (C, Pascal, C++, BASIC, FORTRAN, etc) didn't.

I'm not sure that's true. gcc, in optimizing modes, could do at least tail recursion elimination at least as far back as 1999, and does more now. (The earliest reference I can find refers to gcc 2.95.3, but its not clear to me if that's the first version that did it, or just the earliest version on which the writer tested and confirmed the behavior.)

Outside of a handful of languages, tail call elimination in either restrictive (e.g., tail recursion only) or general forms is an optional optimization rather than an essential behavior.

icarot · on Nov 10, 2014

Yeah. Most languages don't. Python and Javascript compilers still don't and Emacs LISP still doesn't.

andreasvc · on Nov 9, 2014

This seems to me like the sort of optimization a compiler should do for you. Does GCC do this?

hughes · on Nov 9, 2014

As a dev who has never really got into the low-level parts of programming, I don't really understand yet how this is stackless. I thought the stack had more to do with the depth of function calls than with the passing of function parameters. Can anyone here explain it?

barrkel · on Nov 9, 2014

There are two approaches described - one a further extension of the other. The first approach still relies on a call stack (but not parameters or locals). Instead of calling a function like this:

    f(x) =
      // ...
      f(x + 1)

It's called like this:

    f = 
      x = x + 1
      f()
      x = x - 1

The return address from the function call references the code that undoes modification to global state (all locals and parameters have static storage when there is no recursion support), rather than relying on copies of parameters being made as in a modern language.

The second is an extension of the first, using goto and labels instead of function calls, but it only works if you don't need to distinguish between different call sites. If the recursive function only ever calls itself in one place, and the mutation it needs to make in order to "recurse" is always the same (so it can be consistently undone), then the call stack can be eliminated too.

chad_oliver · on Nov 9, 2014

As you probably know, the stack is made up of stack frames, and a stack frame is added each time you go 'deeper' by calling another function. A stack frame is removed every time you get to the end of a function and return. So, yes, the stack is closely linked to the depth of function calls.

However, in modern systems (or basically anything you'll ever see today, including the cheapest microcontrollers), the stack doesn't just keep track of where to return to after each function completes. It also stores all the local variables used by the function. This is vital because it allows the variables to be restored after you return from a 'deeper' function. A convenient side effect is that when the function completes and the frame is popped off the stack, the references to the local references are destroyed and the data can be garbage-collected.

In this sense, local variables are directly related to stack frames. If your system doesn't store local variables in the stack frame, it almost certainly doesn't store them anywhere else (stack frames are the natural place for them), and you've only got globals to work with.

ekr · on Nov 9, 2014

Haven't read all of it, but the author assumes the availability of a return stack, which somehow makes this approach not really stack-free. Most modern calling conventions (x86) use a register(eax/rax) for returning values.

But his approach does in fact conserve stack space.

LukeShu · on Nov 9, 2014

A little bit further down, he shows how to avoid using the return stack also, to use no stack at all.

Also, by "return stack", he isn't referring to the mechanism for returning values, he is referring to the mechanism for knowing which address to jump (return) to when a function is finished. In x86, this is still done with a stack, but the stack modifications are done "behind the scenes" by the call/ret instructions.

andrewstuart2 · on Nov 9, 2014

Mildly educated answer here. A stack in low level terms is just a first-in-first-out record of what context the CPU needs to reload when it's done with the current line of execution. If you can stay in the same context and just increment or decrement some variables, you'd probably call that stackless.

When you pass function parameters you usually need to push onto the stack or risk overwriting the value of the named variable in the current execution context, since it has the same name or register in the new execution context.

Again, mildly educated answer so if somebody can correct me, please do.

gorhill · on Nov 9, 2014

> A stack in low level terms is just a first-in-first-out record

I think you meant last-in-first-out.

andrewstuart2 · on Nov 10, 2014

Ah, yes I did.

mqsiuser · on Nov 9, 2014

Hoare writes in his (famous) paper on quicksort (in 1962): "We'd implement our own recursion schema for perf reasons"

Recursion is just a tree.

Calling a function/procedure is expensive (in memory and run-time).

Create a tree (e.g. linked list), base your recursion on that!

You are not at university anymore, where you look at naive implementations for the sake of learning.

Spoiler: I (actually do) resolve (function-call) recursion to (tree-based) recursion when I feel like I want to.

Downvoters pls explain.