
Rust will likely not support tail call optimization - steveklabnik
https://mail.mozilla.org/pipermail/rust-dev/2013-April/003557.html
======
pcwalton
Note that we do sibling call optimization per LLVM, and you can use
trampolines like Clojure does. Sibling call optimization, most notably,
includes all self-calls. This means that calls to the currently executing
function in tail position _will_ be tail call optimized, if there are no
destructors in scope.

The biggest problem here is ABIs: you have to use Pascal calling conventions
instead of C calling conventions. It's also difficult when you have
destructors: there are not many tail call positions when destructors are in
scope.

~~~
masklinn
Wouldn't it be possible to have some sort of macro able to "manually" convert
calls in a recursive style into loops? Something along the lines of Clojure's
`loop/recur` but less tricky (and maybe with a single "word")?
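
For illustration, the idea can be sketched without macros at all, as a trampoline in plain Python; the names `Recur`, `loop`, and `triangular` here are invented for the sketch, not anything Rust or Clojure actually ship:

```python
# Hypothetical sketch of Clojure-style loop/recur as a Python trampoline.
# Instead of calling itself, the function returns a Recur marker carrying
# the next arguments; a driver loop re-invokes it, so the stack never grows.

class Recur:
    def __init__(self, *args):
        self.args = args

def loop(fn, *args):
    while True:
        result = fn(*args)
        if not isinstance(result, Recur):
            return result
        args = result.args

def triangular(n, acc):
    if n == 0:
        return acc
    return Recur(n - 1, acc + n)  # "recur" instead of a real self-call

# Runs far deeper than Python's ~1000-frame recursion limit:
print(loop(triangular, 1_000_000, 0))  # 500000500000
```

A macro could hide the `Recur`/`loop` plumbing, which is essentially what `loop/recur` does.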

~~~
ufo
Self recursion is only a particular case and you can often represent it with
loops or folding patterns as you are imagining. However, the really cool use
for tail call elimination is when you are calling _other_ functions since you
can't rewrite the code without breaking encapsulation. Off the top of my
head, one important example of this is continuation passing style (you
need TCO to turn the continuations into real "gotos").
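
As a small illustration of that point, here is a continuation-passing sketch in Python (the function names are invented for the example): each function ends by calling a *different* function, so rewriting self-recursion into a loop can't help, and only true tail-call elimination would keep the stack flat.

```python
# Hypothetical mutual-recursion-in-CPS sketch: is_even and is_odd each end by
# tail-calling the *other* function, so a self-recursion-to-loop rewrite does
# not apply; without TCE every tail call still pushes a stack frame.

def is_even(n, k):
    if n == 0:
        return k(True)
    return is_odd(n - 1, k)   # tail call to a different function

def is_odd(n, k):
    if n == 0:
        return k(False)
    return is_even(n - 1, k)  # ...and back again

print(is_even(100, lambda b: b))  # True
# Without TCE, e.g. is_even(100_000, lambda b: b) would blow Python's stack.
```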

------
beagle3
Please, for the sake of $DEITY, stop calling it "Tail Call Optimization". It
is not an optimization, and the wrong term causes lots of useless discussion
and misunderstanding. Call it "Tail Call Elimination", because that's what it
is.

With finite memory (that is, in the real world), a program that relies on tail
recursion will blow up if TCE is not implemented even though it can run
forever with TCE. Thus, it changes actual program results (rather than just
runtime speed or amount of memory required), and cannot be considered just an
"optimization", since it is functionally required.

And to future-language designers: please, for the sake of $DEITY2, make the
syntax for a guaranteed-eliminated tail call different from a regular call.
It's really a bad idea that a function such as:

    
    
        def f(x):
            ...
            return 0 + f(x-1)
    

cannot in general be TCEd (floating-point addition can change the value even
when one operand is zero, so the addition is real work that must happen after
the call returns), whereas

    
    
        def f(x):
            ...
            return f(x-1)
    

can. I suggest "chain y" instead of "return y" so that the compiler can verify
that TCE can indeed happen.
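
For what it's worth, the floating-point half of this is easy to demonstrate: adding zero is not the identity for IEEE floats, so the addition after the call is real work (a quick Python check):

```python
# Adding zero is not the identity for IEEE-754 floats, so a compiler cannot
# silently rewrite `return 0 + f(x-1)` into the plain tail call `return f(x-1)`.
x = -0.0
print(0.0 + x)  # 0.0: the addition loses the sign of zero
print(x)        # -0.0
```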

~~~
millstone
_cannot be considered just an "optimization", since it is functionally
required_

Here is a C program:

    
    
        #include <stdlib.h>
        int main(void) { while (1) malloc(64); }
    

With clang, when compiled with -O0, this consumes all the memory on my system.
When compiled with -O2, it runs forever in constant memory, because clang has
optimized out the call to malloc. If I replace the call with, say,
'malloc((size_t)(-1) >> 1)', it even changes the output of the program.

Here is another example:

    
    
        int factorial(long long x) { return x <= 1 ? 1 : x * factorial(x-1); }
    

With gcc, when compiled with -O2, the factorial function can correctly compute
the factorial of a very large number. When compiled with -O0, it cannot.

I would call both of these optimizations, even though someone may rely on
either for correctness. So I don't think tail calls are unique in this
respect.

 _make the syntax for guaranteed-eliminated-tail-call different from a regular
call_

Agreed.

~~~
bcoates
Both of those programs are playing with C's infamously broad undefined
behavior rules, the first for a non-terminating loop and the second for using
too much stack (when called with a sufficiently large number). -O2 being nice
to you is a coincidence, not something you should rely on.

In a language which prefers diagnosing programmer error over punting both
should be errors.

------
dewitt
Clojure also doesn't support TCO, and from what I can tell the explicit
loop/recur is sufficient, which also has the advantage that it is very clear
(i.e., the compiler will tell you) when TCO is _not_ being invoked. With
implicit TCO you might think you're getting the benefit, only to have a stack
blow up unexpectedly at runtime with an edge-case input.

~~~
pcwalton
Especially with destructors. Whenever you have a destructor in the frame,
you're _not_ in tail-call position. This means that tail-call positions in
Rust are going to be few anyway.
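
A loose Python analogy (Rust destructors are deterministic, but the ordering issue is the same; `f`, `g`, and `log` are invented for the sketch): any cleanup registered in the frame has to run *after* the callee returns, so the call cannot be the frame's last action.

```python
# Loose Python analogy for why destructors break tail position: the `finally`
# cleanup must run *after* g returns, so the call to g cannot be the last
# action of f's frame, and that frame cannot be reused for the call.

log = []

def g(x):
    log.append("g ran")
    return x * 2

def f(x):
    try:
        return g(x)            # looks like a tail call...
    finally:
        log.append("cleanup")  # ...but this still runs after g returns

print(f(21), log)  # 42 ['g ran', 'cleanup']
```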

~~~
qznc
That calls for an interesting optimization: Using escape+liveness analysis you
could probably call some of those destructors earlier.

~~~
kragen
The problem is that TCO in the sense we're talking about isn't really an
"optimization". It's a language feature that removes the need for explicit
looping constructs by allowing you to write them as higher-order functions
instead. This only works if the guy writing the higher-order function can
prove that the compiler will successfully "optimize away" the tail call, or if
he doesn't care whether the program dies with a stack overflow. If the success
of TCO is dependent on which conservative approximation the compiler is using
for escape and liveness analysis, then people who care about their programs
continuing to run will demand explicit looping constructs anyway.

There are three separate inventions in Scheme that work this way. Closures
give you objects without an object construct, tail-call optimization gives you
loops without looping constructs, and call/cc gives you threads, exceptions,
and backtracking without thread, exception, or backtracking constructs.

(I mention Scheme because all three of these were, as far as I can tell,
introduced in Scheme, and only later adopted by other functional languages
like ML, although to be fair, TCO at least falls out naturally from
combinator-graph reduction.)

In a sense, Scheme is sort of like a functional assembly language: there are
lots of object systems, looping constructs, threads, and exception systems in
Scheme, and they aren't compatible with each other. It's sort of like the
situation with linked lists in C, where every library has its own linked-list
type.

It's exactly the opposite of assembly language in another way, though. By
making object instantiation and population implicit, both the compiler and the
maintenance programmer have to do extra work to figure out what's an object
and what's not, and what the object's fields are. The same is true of loops,
threads, and exceptions. In assembly, instead, you have the problem of things
being _too explicit_ , thus losing the signal in the noise.

That's my take, anyway. I haven't spent that much time programming in either
Scheme or assembly, although I did write an almost-Scheme compiler targeting
assembly called Ur-Scheme.

~~~
solinent
The "assembly language" for ML would actually be the lambda calculus. Scheme
isn't really defined unless you're talking about some standard, and the
semantics of the language are probably just defined in English.

~~~
kragen
Hmm, it sounds like you're trying to argue with me, but I'm not clear on what
you're saying — some quick responses — hope this helps:

① I did not claim Scheme was an "assembly language" for ML. I said that
programming in, reading programs in, and compiling programs in Scheme was like
programming in, reading programs in, and compiling programs in assembly
language in some specific ways, and very unlike it in others. The relationship
between Scheme and ML is that some of the central insights of Scheme were
adopted by ML.

② ML defines evaluation order. The lambda calculus does not. Typical ML
implementations compile to the assembly languages of actual processors. Can
you clarify?

③ I think Scheme is sufficiently well defined for this discussion — it's a
family of languages originating in some papers by Sussman and Steele in the
1970s, and continuing through the current R7RS work, including a number of
compilers. Several of the standards define the semantics of the language
symbolically, not just in English.

------
protomyth
It seems like part of the decision was reached because of limitations in
LLVM[1]. This makes me a bit uneasy, but I guess it has to follow C to meet
its goals.

1) "LLVM does support tail call optimization, but it requires using a
different calling convention than C and slowing down all function calls" from
<https://mail.mozilla.org/pipermail/rust-dev/2013-April/003556.html> and
<http://llvm.org/docs/CodeGenerator.html#tail-call-optimization>

~~~
cwzwarich
They could also add their own calling convention to LLVM to support what they
want like GHC and HiPE have done.

~~~
saidajigumi
No they can't, at least not without compromising on several points laid out in
TFA. Mainly these two:

    
    
        - Tail calls also "play badly" with assumptions in C 
          tools, including platform ABIs and dynamic linking.
    
        - Tail calls require a calling convention that is a 
          performance hit relative to the C convention.
    

See other posts in this discussion for examples of the impacts of some
alternate calling convention choices.

------
btipling
What is tail call optimization? How does it usually work? There's an
implementation example on the Wikipedia page to answer this question a little
bit:

<http://en.wikipedia.org/wiki/Tail_call>

~~~
ericbb
And, of course, "Debunking the 'Expensive Procedure Call' Myth, or, Procedure
Call Implementations Considered Harmful, or, Lambda: The Ultimate GOTO" by Guy
Steele. Available here: <http://library.readscheme.org/page1.html>

------
pekk
I'm puzzled: if Rust and Clojure won't support TCO and they are both cool, why
is there such bitter complaining every time it comes up that Python won't
support TCO?

~~~
andolanra
I can't speak for Clojure, but Rust's rationale for omitting TCO is based on
sound engineering issues (as articulated), whereas Guido's sole reason for
omitting it (as far as I've seen) is that you lose stack trace information,
which is considered something of a red herring (you can always selectively
disable TCO for debugging, and there are ways of retaining stack information
anyway). Note that the original article here is quite disappointed that they
_can't_ include TCO, because it would be a valuable tool and allow for
different, efficient styles of programming.

~~~
rit
I can't answer fully for Clojure's reasons for not supporting TCO, but it will
at least be fundamentally rooted in the fact that the JVM, as a platform, does
NOT support TCO.

Scala is usually the language brought up as a counterargument here, but Scala
has the same limitation as Clojure - the JVM can't do the TCO. Scala's
compiler rewrites certain kinds of tail calls into loops, so they are no
longer function calls at all.

But only certain calls (specifically, the call in tail position must be a
recursive call to the enclosing function itself) can be TCO'd in Scala. There
is an annotation, scala.annotation.tailrec, which can be placed above a method
you want to tail recurse.

@tailrec does not, however, "force" the compiler to do TCO – it simply forces
compilation to fail if the method in question cannot be tail call optimized.
It's a developer hook for saying "I realize the compiler _tries_ to do TCO
where possible, but I require it for this method". If it can't be done, you'll
get a compilation error with details on why the TCO failed.

~~~
derleth
> the JVM, as a platform, does NOT support TCO.

And the x86, as a platform, does?

What, specifically, does the JVM do to make TCO more difficult than it would
be in the machine code of your choice?

~~~
kragen
Yes, the x86 instruction that implements TCO is called JMP. The JVM doesn't
have an equivalent instruction; you can only jump around inside a single
method, not to the beginning of another method.

~~~
derleth
> The JVM doesn't have an equivalent instruction; you can only jump around
> inside a single method, not to the beginning of another method.

OK, why do functions in the source language have to translate directly into
methods at the JVM level? Purely for debugging?

~~~
kragen
The JVM also doesn't (I'm fairly sure) have an indirect JMP instruction, which
means you can't compile a polymorphic source-language method call into a JMP
inside of a single generated method. Instead you have to use a call to a
method. That means that tail-calls to functions passed as parameters (rather
than functions that can be statically bound at compile time) can't use JMP.

~~~
happy_dino
The “JVM” can use any instruction it wants to use.

If you want a runtime with proper tail calls, use an implementation which
supports it.

~~~
kragen
I'm talking about the bytecode instructions standardized in the "JVM"
specification. The "JVM" has to implement the semantics of the instructions
found in your bytecode program, or your program won't run. Your bytecode
program can only use bytecodes defined in the "JVM" specification, or it won't
run on the "JVM".

~~~
happy_dino
There is no need or reason to change or add any bytecode instructions to
support proper tail calls.

Take standard class files and execute them on a runtime with proper tail
calls. Done. Works.

~~~
kragen
Which JVMs support proper tail calls? Doesn't this violate requirements of the
JRE?

~~~
happy_dino
For instance oss.readytalk.com/avian. Why should it?

~~~
kragen
Because the security manager needs to be able to introspect the stack. Avian
isn't a JVM; it's "designed to provide a useful subset of Java's features" and
can run some Java code.

~~~
abecedarius
It's actually possible to implement the stack inspection in a tail-call-
friendly way (along the lines of Racket's continuation marks IIRC), though
AFAIK nobody does it.

------
mike_ivanov
A TCO-support discussion on Y combinator.

OH IRONY.

~~~
zxcdw
Elaborate for the ignorant.

~~~
qznc
The Y combinator is something important in the theory of functional
programming. It takes a function as an argument and produces that function's
fixed point. Together with lazy evaluation, this is how you can implement
recursion in a language which does not natively support it, namely the (lazy)
lambda calculus.

<http://en.wikipedia.org/wiki/Fixed-point_combinator#Y_combinator>

Actually, TCO and Y are not really related to each other, apart from the fact
that you usually learn about them in the same course about functional
programming.
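
For the curious, a strict-language variant of Y (often called the Z combinator) can be sketched in a few lines of Python; `fact_step` is an invented example function with no self-reference of its own:

```python
# A strict-language variant of the Y combinator (often called Z), sketched in
# Python: Z(f) is a fixed point of f, which yields recursion even though
# fact_step never refers to itself.

Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

fact_step = lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1)
factorial = Z(fact_step)

print(factorial(5))  # 120
```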

~~~
kragen
If you write your loops with the Y-combinator, you need tail-call elimination
to keep them from overflowing the stack when they run for enough iterations,
because the way your function f goes on to the next iteration of itself is by
calling the (Y f) that was passed as its first argument.

~~~
qznc
You can write a lambda calculus interpreter which does beta-reduction. In this
case, there is no stack which could overflow. Just a lambda expression, which
is reduced to another expression.

~~~
kragen
Yes, as I mentioned in another comment in this thread, TCE falls out
automatically from combinator-graph reduction.

------
asbut
The problem seems to be that you can't do tail calls with the C calling
convention.

For example, let's say a 0-argument function tail calls a 1-argument one: the
1-argument function expects an argument on the stack, so the 0-argument
function must push one.

However, when the 1-argument function returns, the argument will still be on
the stack, because with the C calling convention the caller removes arguments
from the stack.

But the caller called a 0-argument function, so he won't remove any argument,
and thus the stack is now misaligned due to the argument left over there,
which will crash the program soon.

However, just switching to the Pascal/stdcall convention, where the callee
removes arguments from the stack, should just work; it might be slightly
slower, but on all modern architectures (i.e. those that aren't x86-32)
parameters are going to be passed in registers anyway for most functions, so
it shouldn't matter.

The problem with that is that non-varargs functions cannot be called through a
varargs prototype; this is an issue with K&R C, which allows calling
undeclared functions, but isn't an issue with Rust.

So, I'm not quite sure why Rust doesn't just switch calling conventions and
support tail calls.

------
jevinskie
Here is some information about TCE in LLVM from the lead developer:
<http://nondot.org/sabre/LLVMNotes/GuaranteedEfficientTailCalls.txt>

------
Rickasaurus
Screw it, we might as well use Go

Seriously though, why not just use a rec keyword and then disallow any of the
things that don't play nicely when you're in the recursive function? If you
really wanted to be cool you could put those things right into the type
system.

~~~
ufo
As they mentioned in the original post, a big problem is that TCO does not
play nice with two other important features they want.

1. deterministic destructors that run at the end of functions (therefore
making things that look like tail calls not actually tail calls), and

2. binary compatibility with C and C++ libraries and tools (they say that
tail recursion doesn't let you use the C calling conventions that these tools
and libraries expect you to use).

There is no point in allowing tail recursion in restricted contexts if you
can't use these restricted functions to do the sort of stuff Rust was actually
made to do.

~~~
ihnorton
> binary compatibility with C and C++ libraries and tools (they say that tail
> recursion doesn't let you use the C calling conventions that these tools and
> libraries expect you to use)

I've read variations of this comment about Rust C++ compatibility a few times,
but haven't managed to find a source. Any references you could point me to?

~~~
kibwen
AFAIK there's no plan to support any sort of C++ interop natively in Rust.
However, it should work just fine if your C++ code exposes a C-compatible
interface. Servo has to be able to call into SpiderMonkey _somehow_.

~~~
kragen
GCC C++ ABIs are unstable enough that cross-platform C++ libraries usually
export a C-compatible interface already.

~~~
ihnorton
g++ vs itself, or g++ vs msvc? g++ uses Itanium C++ ABI on at least
linux/mac/mingw, and the mangling and vtable layouts seem to be the same (I'm
very interested in any information to the contrary - though I realize vtables
are only part of the story).

~~~
kragen
Itanic may be an exception, but there have been at least two backwards-
incompatible C++ ABI changes in G++ over the years. Casting to superclass in
the presence of multiple inheritance and exception handling have been among
the changes. Maybe they should change vtables too, I don't know.

