Why Object-Oriented Languages Need Tail Calls (2011)

rbehrends · on Nov 14, 2017

There are a number of problems with this argument, but the biggest one is that most such calls will not actually be tail-recursive. The very specific `adjoin` example that has been provided only works with `contains` not being in a tail position because `or` is assumed to use short-circuit evaluation, and because the author is apparently willing to live with a set membership test that has linear time complexity. We can see that the `contains` method for `UnionObject` is not tail-recursive, for example, no matter how you cut it.

The primes example effectively constructs a linked list with 78,498 elements; any unsuccessful search will iterate over all those elements, and successful searches will (depending on how the search arguments are distributed) also easily traverse tens of thousands of elements. Handwaving performance issues away with "we know lots of ways this code can be optimized, but let us please focus on the behavior of the constructed set" is wrong, simply because efficient and actually usable implementations may exhibit totally different requirements w.r.t. tail recursion and what should be in an interface.

phkahler · on Nov 14, 2017

"is wrong, simply because efficient and actually usable implementations may exhibit totally different requirements w.r.t. tail recursion and what should be in an interface."

Thank you so much for that!

mikeash · on Nov 14, 2017

Why don't languages give tail calls their own syntax rather than making it an optimization? It seems like a lot of the arguments against tail calls boil down either 1) it's confusing when things go wrong and you weren't expecting stack frames to be optimized out or 2) it's hard to safely write tail-recursive code because a small change to the code or, worse, the compiler settings can make it stop being tail-recursive.

If you had to explicitly request it, say by doing `tailcall f()` instead of just `f()`, that would mostly fix these problems. Control flow will be a lot less confusing since it'll be explicit in the code (but you still lose information from the optimized-out stack frames, no way around that) and you'll get an explicit error if someone tries to add something that can no longer be tail call optimized, like `tailcall f() + 1`.

jcdavis · on Nov 14, 2017

Scala sort of has this with the @tailrec annotation. The compiler will always attempt TCO, but the annotation makes failure to do so a compilation error. (Its not a perfect situation though, since there can still be functions assumed to be tail recursive without @tailrec, and whose behavior can change without warning)

Jach · on Nov 14, 2017

Probably because people have concluded that programmers suck at explicitly specifying optimizations, so much so that things like "inline" don't guarantee inlining in certain languages. Furthermore tail recursion elimination can happen as an optimization even when the source isn't clearly tail recursive. For example, GCC can optimize this:

    int factorial(int x) {
       if (x > 1) return x * factorial(x-1);
       else return 1;
    }

to this:

    int factorial(int x) {
       int result = 1;
       while (x > 1) result *= x--;
       return result;
    }

(See http://ridiculousfish.com/blog/posts/will-it-optimize.html for other fun optimizations.)

adrianratnapala · on Nov 14, 2017

I think mikeash's point is that sometimes tail-recursion is not just an optimisation. For example it means I won't overflow my stack even if this thing recurses 10^7 times.

Right now, a programmer needing such guarantee must write explictly iterative code. But with explicit tail calls, she can write it recursively if she wants and get a compiler error if something makes it impossible to do the tail-call transformation.

Jach · on Nov 14, 2017

There's an interesting idea here that I wonder if any non-academic language has tried: annotating your expected big-O time and space complexities for a function. I wouldn't expect much compiler support except for trivial cases though due to the halting problem... AFAIK academic attempts at automated program analysis go back to Knuth and his students in the 70s, no idea what the state of the art looks like.

adrianratnapala · on Nov 15, 2017

I expect the trivial cases are the common ones for big-O.

sitkack · on Nov 15, 2017

Everytime I have used a tail call, it was because I wanted the control flow, not the optimization. Optimization is a misnomer for such a powerful construct. I think having an explicit `tailcall f(args)` would go a long way towards assuaging the non-tailers fears.

Peaker · on Nov 14, 2017

This optimization changes factorial from:

5 * (4 * (3 * (2 * 1)))

to:

(((1 * 5) * 4) * 3) * 2

It relies on the commutative and associative nature of integer multiplication.

This means it's quite a brittle optimization -- it shouldn't really work for floats, for example, which aren't associative at multiplication. And it wouldn't work on any operation for which the compiler does not know the associativity.

So while it seems general -- it's quite specific and not widely applicable.

to3m · on Nov 14, 2017

We even already have a common keyword suitable for this: 'goto'.

sevensor · on Nov 14, 2017

This is an interesting point -- structured programming has made source code vastly easier to reason about, but at the cost of turning some obvious control structures into optimizations. The upshot is that we don't have some of the really terrible control structures that were enabled by goto. I once had the pleasure of reading through some old Fortran code that freely jumped into various points in the middle of a loop, from outside the loop. Took me ages to figure out what it was doing.

javajosh · on Nov 14, 2017

heck you don't even need goto, just a labelled break, which is extremely well supported across languages (great comment btw i lol'd)

pjmlp · on Nov 14, 2017

In Scheme it isn't just an optimization, it is an actual semantic requirement.

cr__ · on Nov 14, 2017

This is how Clojure does it.

https://clojuredocs.org/clojure.core/recur

https://clojuredocs.org/clojure.core/trampoline

mikeash · on Nov 14, 2017

Nice. When I wrote that, I was sure there would be some language that had already done it, and I was about 90% sure it would be some sort of Lisp.

andreasgonewild · on Nov 14, 2017

That, or some sort of Forth; the separate keyword also enables calling anonymous functions recursively:

https://github.com/andreas-gone-wild/snackis/blob/master/sna...

ajross · on Nov 14, 2017

> Why don't languages give tail calls their own syntax rather than making it an optimization?

Grumpy old man answer: because if you need "special syntax" to express the fact that your recursively defined function can technically be evaluated in constant stack space, you probably should be writing it as an iterative function to begin with.

The whole point about recursion as a software engineering paradigm is that it's a clearer way to express the problem and thus less likely to be mistakenly implemented. Special syntax is the opposite of clear.

Iteration ain't that hard either folks, and it has the notable advantage of mapping more obviously to the behavior of actual machines we use to execute our code.

mikebenfield · on Nov 14, 2017

> because if you need "special syntax" to express the fact that your recursively defined function can technically be evaluated in constant stack space, you probably should be writing it as an iterative function to begin with.

Why? I find recursive functions are often simpler to understand and write than iteration. Why does the compiler need to stop me from doing so? You write "technically" as though it were just an obscure technicality that it's possible to implement a function call without taking up excessive stack space, but to my mind it's a fundamental fact.

> Iteration ain't that hard either folks, and it has the notable advantage of mapping more obviously to the behavior of actual machines we use to execute our code.

In what sense is this true? Recursive function: update these variables (in registers or memory) and jump back up to this instruction. Iteration: update this variable and jump back up to this instruction.

ajross · on Nov 15, 2017

> Why? I find recursive functions are often simpler to understand and write than iteration.

For simple recursive algorithms, that's true. And those tend to overlap well with ones which don't require assisted tail call tagging or whatever.

Those that do, however, tend to be pretty complicated. And the set of programmers who can understand an interative implementation of them is IMHO rather larger than that which can get their heads around the tail call analysis to know how (and when!) to make it work.

Spivak · on Nov 14, 2017

It is an obscure technicality: your code only works because a compiler optimization makes it work. If a different compiler executed your code exactly as written it would crash. Or another way, two pieces of code that are logically equivalent will succeed or crash depending on whether it's written in such a way that the compiler can recognize an optimization.

I'm in favor of having special syntax because compilers shouldn't really be in the business of 'do what I mean not what I say'.

nybble41 · on Nov 14, 2017

Despite the name "tail call optimization", the proper implementation of tail calls is not "just" an optimization. Preserving a stack frame for a function which has nothing left to do is incorrect behavior on the part of the compiler which negatively impacts both the time and space complexity of the resulting object code. The code, as written, does not require a stack frame. If the compiled object code crashes it is not because it was executed "exactly as written", but because the compiler inserted something beyond what was written.

The Scheme approach of mandating the use of tail calls for every call which occurs in "tail position" is the correct one, in my opinion.

Spivak · on Nov 15, 2017

> nothing left to do is incorrect behavior on the part of the compiler

But it's not though. By this token failing to optimize anything could be considered incorrect behavior. Obviously not turning on optimizations makes your program take longer and use more memory. When a function call is made its variables are pushed onto the stack and popped when the function returns. Doing something different when the compiler detects certain properties of your code is the definition of an optimization that at least in theory shouldn't be relied on to produce correct code.

A more correct design, I think, would be having a 'replace' keyword for function calls instead of return that causes it to replace the stack of the caller. Then the behavior is no longer an optimization but behavior. Now you have the best of both worlds. The compiler can even warn you about when you might want to use replace rather than return.

tweakism · on Nov 15, 2017

You suggest literally the same solution that the person you were arguing with suggested!

nybble41 · on Nov 17, 2017

> Obviously not turning on optimizations makes your program take longer and use more memory.

Yes, but it shouldn't take an algorithm which only requires constant space as written and turn it into object code which requires linear space.

Update: Elaborating on this point somewhat—Consider the case of a simple loop with a local variable. The variable is specific to the body of the loop; each iteration, in effect, has its own copy of the variable. What if a compiler implementer decided, for the sake of simplicity, to allocate block-local variables with alloca(), so that new memory is reserved from the stack for each loop iteration, and only freed when the enclosing function returns? Perhaps an optional optimization is provided -floop-variable-space-optimization which "lifts" the variable out of the loop, allowing the memory to be reused and restoring constant space complexity. Would you still consider this an acceptable implementation on the grounds that "obviously not turning on optimizations makes your program ... use more memory"? I think not.

> When a function call is made its variables are pushed onto the stack and popped when the function returns.

That is nothing more than an unfortunate, naïve implementation choice common among compilers which do not implement tail calls properly. Nothing in the definition of a function or a function call requires the use of a stack. What is required is only a continuation—a reference to the code to be executed when the function is finished. When there is a chain of calls F -> G -> H, and the G -> H call is in tail position, the natural continuation for that call is the same continuation which was provided to G. A compiler which does not implement tail calls properly, however, interposes an extra, unnecessary continuation for the call to H, turning a (potentially) constant-space function into a linear one.

> A more correct design, I think, would be having a 'replace' keyword for function calls instead of return that causes it to replace the stack of the caller.

Most modern languages make it simple to determine when a function call is in tail position, and in such cases I feel that a special calling syntax would be an unnecessary distraction. In Scheme, for example, the cases where a tail call is required are obvious from the structure of the code and are, moreover, spelled out in the language standard. However, I will grant that languages like C and C++ which occasionally depend on the compiler adding hidden code at the end of some functions to run destructors or implicitly transform the result type might benefit from such a keyword. It would, at any rate, be an improvement over the current situation where proper tail calls are deemed an optional optimization.

mikebenfield · on Nov 15, 2017

> your code only works because a compiler optimization makes it work. If a different compiler executed your code exactly as written it would crash.

But it's completely arbitrary that you consider this "a compiler optimization" as opposed to "the way the language behaves." The languages you're used to work the way you describe: if the stars are aligned, a particular compiler may do TCO as an extra optimization.

But there is absolutely no reason it has to be this way. It is perfectly possible for a language standard to mandate that TCO must happen and to specify in an easily understood way when it must happen. As Scheme did over 40 years ago.

goialoq · on Nov 14, 2017

How is a keyword (such as OCaml's 'rec') less clear than rewriting the whole function in an iterative form, with junk like temporary loop variables?

> mapping more obviously to the behavior of actual machines we use to execute our code.

There's a reason we usually don't write in assembly.

Spivak · on Nov 14, 2017

Should the compiler do what you mean or do what you say? Is it good design to have a system where your code only works because the compiler was able to recognize an optimization? Why not make it explicit?

ernst_klim · on Nov 15, 2017

>Why not make it explicit

Because we want programming languages to hide the hardware complexity behind elaborated semantics. Programming languages are for humans, not for machines.

catnaroek · on Nov 14, 2017

Structured looping constructs are rather removed from how a machine works - not that this is inherently a bad thing. The way it actually works in a physical machine is via jumps into code that has already run, i.e. self-reference, i.e. recursion.

adrianratnapala · on Nov 14, 2017

My way of saying the same thing is that there is surprisingly little difference between the recursive and iterative models when you start breaking things down.

Indeed, if I were to add an explicit tail syntax to some language, it would probably involve the "goto" keyword.

ajross · on Nov 14, 2017

A loop can be turned into branch-based code with an absolutely trivial transformation (e.g. evaluate the test, branch to the end if false, branch back to the start at the end), I don't quite know what you mean by that. 3GL compilers were available with working looping syntax within years of the first programmable computers. FIVE decades later, and as this thread proves, we still don't have general algorithms to turn recursive solutions into iterative ones at the compiler level.

It's true that it's more abstracted than machine code. It's absoultely not true that it's equivalently complicated to tail recursion detection.

catnaroek · on Nov 15, 2017

> It's absoultely not true that it's equivalently complicated to tail recursion detection.

Tail recursion doesn't need to be “detected”. The correct implementation of all tail calls, recursive or otherwise, is to overwrite the current activation record, rather than save it.

catnaroek · on Nov 14, 2017

I understand the case for adding O(1) overhead for the sake of debuggability, but not having proper tail calls (it's not an optimization, goddammit) is way more than O(1) overhead.

dghf · on Nov 14, 2017

> Why don't languages give tail calls their own syntax rather than making it an optimization?

Clojure does exactly that:

    (defn gcd [x y]
      (if (zero? y)
        x
        (recur y (quot x y)))

You can also recur with TCO to an explicitly labelled point within the function (effectively creating an inner, anonymous function):

    (defn factorial [n]
      (loop [n n, fac 1]
        (if (zero? n)
          fac
          (recur (dec n) (* fac n)))))

As I understand it, this syntax was developed because of limitations of the JVM w/r/t TCO.

catnaroek · on Nov 14, 2017

> As I understand it, this syntax was developed because of limitations of the JVM w/r/t TCO.

This doesn't make sense. Tail calls don't need any virtual machine support whatsoever: they're translated into jumps by the compiler. The real problem is a lack of willingness to translate functions into anything other than JVM methods, because “compatibility“, but that is a political issue, not a technical one.

denisw · on Nov 14, 2017

You don't need VM support for tail self-calls (which is exactly what Clojure supports with `recur`), but you sure do for optimizing tail calls to other functions. There is no way in the JVM to say "replace my stack frame with a new one for that other method and execute that", especially not one that gives you a useful stacktrace for debugging.

catnaroek · on Nov 15, 2017

> There is no way in the JVM to say "replace my stack frame with a new one for that other method and execute that"

Nor does there need to be one, because translating function calls to jumps is the (AOT) compiler's business, not the VM's.

pjmlp · on Nov 15, 2017

The CLR has byte codes for tail calls, properly named .tail..

_pmf_ · on Nov 14, 2017

I don't think it's that easy. Clojure code calls into Java libraries, and these are regular Java functions so you cannot reuse the same stack frame for these. Pure Clojure code, on which you could perform arbitrary transformations, is quite rare (does not exist at the moment, since the core datastructures are written in Java).

mannykannot · on Nov 14, 2017

If I understand your point correctly, it is not a problem. If your TCO-ed recursive function calls Java functions, they will expand the stack temporarily, but it will all be popped back to your stack frame by the time they return. It is no different than if you made the same calls from an equivalent non-recursive function.

catnaroek · on Nov 14, 2017

> Clojure code calls into Java libraries, and these are regular Java functions so you cannot reuse the same stack frame for these.

Somehow, programs written in <insert language that provides correctly implemented tail calls> can call routines written in C (of all languages!) without any problems. So I stand by what I previously said: the problem is political, not technical.

pjmlp · on Nov 15, 2017

They do so, by having a FFI marshaling layer that creates a stack frame the way C expects it, which is in full control of the language runtime, something that languages that pretend to be Java cannot do on the JVM.

Eventually one could do some tricks with invokedynamic, but I doubt the performance impact would be worthwhile.

stormbrew · on Nov 14, 2017

Definitely. I think, especially, in the case of retrofitting it into an existing language this is the way to go.

cryptonector · on Nov 14, 2017

One would think that by now tail call optimization (TCO) would be understood to be absolutely necessary. TFA makes a very good case, but it's hardly the first or only example of someone making a convincing argument for TCO.

Even in jq we found that adding TCO enabled a number of interesting possibilities. For example, the range() builtin in jq can be implemented as a tail-recursive function (and the version that takes two or three arguments is). There are a number of builtins in jq 1.5 which are made possible and practical only by having TCO. Before we added TCO it just didn't seem that interesting, but it's proven to be quite an enabler.

I can't think of a language that shouldn't have TCO. C/C++ absolutely should have it (and many compilers do). Java should have it (but doesn't). Lisps badly need it and generally have it -- the same goes for all functional (or mostly functional) languages. Python needs it but apparently lacks it[0]. And so on.

Yes, tail calls obscure stack traces by eliding reused frames. This could be ameliorated by leaving a flag in reused frames that could be shown on stack traces to indicate that there are missing frames. Or perhaps syntactic sugar could be used to control which tail calls are to get optimized (since often it does not matter).

[0] https://stackoverflow.com/questions/13591970/does-python-opt...

btilly · on Nov 14, 2017

This is an old debate.

On the one hand with tail calls you can write recursive code and find that it runs fast without excessive memory.

On the other hand when things blow up, it is really nice to have a stack backtrace to help debug why it blew up. But a stack backtrace can't include stack frames that were optimized away. Which makes it far harder to figure out why this call on this object turned into that call on that object.

steego · on Nov 14, 2017

> On the other hand when things blow up, it is really nice to have a stack backtrace to help debug why it blew up. But a stack backtrace can't include stack frames that were optimized away. Which makes it far harder to figure out why this call on this object turned into that call on that object.

Is that really a concern?

If you're writing a loop, do you expect your tools to show you a trace of every iteration in the loop? Of course not, because that would be silly. If you really want to trace each iteration, you can add your own tracing.

I think of tail-recursion as parameterized gotos that are wonderfully constrained and are useful for describing state transformations in a predictable way. It's not a regular function call, so the stack trace is probably a very bad idea.

tuukkah · on Nov 14, 2017

Not all calls in tail position are recursive and when they are, they may be mutually recursive. This means it's not possible to detect and optimize recursive calls only.

sbmassey · on Nov 14, 2017

Tail recursion optimisation gets rid of all tail calls in functions, not just those that might lead to loops - you cannot in general detect those calls that might be loops. This can put a lot of holes in your stack trace.

grumpyprole · on Nov 14, 2017

It's an optimisation. It can be turned off when debugging with stack traces. If you need stack traces in production code, you have too many partial functions.

Rusky · on Nov 14, 2017

But you can't ever turn it off if you rely on it for loops, which defeats the whole point.

grumpyprole · on Nov 14, 2017

You can turn it off, you just can't process large data sets, as you'd run out of stack space. But who wants to debug with large data sets?

stormbrew · on Nov 14, 2017

People who are debugging behaviour with large data sets perhaps? :P

grumpyprole · on Nov 15, 2017

The runtime could allocate "stack frames" on the heap, for such cases.

jcelerier · on Nov 14, 2017

> If you're writing a loop, do you expect your tools to show you a trace of every iteration in the loop? Of course not, because that would be silly. If you really want to trace each iteration, you can add your own tracing.

At least you can know at which iteration you are. If iterate over 12 elements and my loop counter is at 134753745, I can pinpoint what went wrong much more easily.

cryptonector · on Nov 15, 2017

No, that's only if you have a loop counter. And you can have a "loop" counter with function calls too.

bunderbunder · on Nov 14, 2017

I think it probably depends on how you think of the abstraction.

Everyone readily thinks of loops, in the abstract, as "GOTO line X and do everything over again, but using all the state that has changed since the last time we were at line X."

Ideally, you can think of recursion as nearly the same thing: "GOTO line X and do this stuff again, but using these very clearly stated changes to the data."

If you think of recursion that way, then it's fairly self-evidently easier to reason about. In practice, though, I've met few developers who think that way and haven't spent significant time doing functional programming.

I'm guessing it's because you need to spend a fair bit of time with both abstractions before you can really grok either of them. Without that, they're just kind of intimidating. Recursion more so, because everyone's taught from a young age to think imperatively.

tsuraan · on Nov 14, 2017

GHC does non-strict evaluation as well as tail call optimization, and it offers pretty decent stack traces. The team that implemented them wrote a paper on how they did it, including how they fold mutually recursive calls in the call stack representation so that a normal Haskell program won't cause unbounded growth of the stack representation: https://www.microsoft.com/en-us/research/wp-content/uploads/...

I wouldn't expect the same from an OO language, but I'm not sure that it's impossible either.

dualogy · on Nov 14, 2017

Probably could be a user option in compilers for debug builds to either disable TCO or include "faked stack" in the tco-generated loop --- in many-maybe-most cases, wouldn't a cleanly recursive stack trace just consist of ever-the-same-callee repeated a gazillion times? Not all and not the more intricate scenarios of course.

pdpi · on Nov 14, 2017

TCO also applies when you have multiple levels of indirection between the caller and the ultimate callee that actually responds to the request.

goialoq · on Nov 14, 2017

The OP addresses that concern with a link to https://www2.ccs.neu.edu/racket/pubs/cf-toplas04.pdf

btilly · on Nov 15, 2017

That is a different concern.

That paper is about implementing security models based on what is supposed to be in the stack. My concern is about emitting a debugging stack backtrace on error.

If you are concerned with a security model, it is nice to know that you can remove the stack and prove that the security model is satisfied. But you can't provide a programmer with a record of stack frames that you do not have.

pmontra · on Nov 14, 2017

Ruby's got TCO since 1.9 but it must be enabled at runtime in a pretty ugly way [1]. I'm also working on Elixir which has TCO and it's the basis for running server processes: the server function calls itself passing the server state as argument. I don't see why OO languages shouldn't do the same. The argument about losing the stack trace is pretty weak. Nobody wants a million stack frames to debug. If the program dies we check the last values of the variables and that's it.

[1] http://rpanachi.com/2016/05/30/ruby-recursion-stack-size-tai...

rurban · on Nov 15, 2017

Yes. The same argument can be made against fork(), that fork makes debugging too hard. And we don't know if the value we want to inspect is a common one (before cow) or an already copied. So do we forbid fork()?

phkahler · on Nov 14, 2017

I'm not a fan of this style of coding. It's not just the stack space. When you call methods all over the place it can result in inefficient code. And yes, I understand that I wrote "can" in that sentence. We're talking about a case where the author wants the compiler to be even smarter than it already is. I guess that's my point. Using these kinds of abstractions results in a reliance on the compiler to make things efficient. OTOH using abstractions can make things simpler and more understandable for the programmer. But there's that word "can" again.

goialoq · on Nov 14, 2017

It's [2009], not [2011]. 2011 is when the current host copied the document from the archive.

jacinabox · on Nov 14, 2017

They don't need tail calls, unless you're planning on doing idiomatic functional programming in an OO language, which, Why??

samth · on Nov 14, 2017

The answer to your question is the entire point of the blog post, which was written by Guy Steele, who worked on the original design of Java.

pjmlp · on Nov 14, 2017

Smalltalk, the grandaddy of OOP languages uses idiomatic functional programming for its messages.

FascinatedBox · on Nov 14, 2017

Some languages, such as Scala, have a mix of FP and OO concepts. Some FP features make use of TCO, so this helps to make them a first class citizen.

jacinabox · on Nov 14, 2017

Oh yes, my mistake.