Joe Marshall's take on Guido Van Rossum's post on tail recursion (funcall.blogspot.com)
74 points by TY on April 25, 2009 | 35 comments



What I think stood out most as mistaken about Guido's post was his conflating recursion and recursive data structures in his third point. "For practical purposes, Python-style lists (which are flexible arrays, not linked lists), and sequences in general, are much more useful to start exploring the wonderful world of programming than recursion... Most of Python's library is written with sequences and iterators as fundamental building blocks (and dictionaries, of course), not linked lists, so you'd be locking yourself out of a lot of pre-defined functionality by not using lists or sequences." Well, chained cons cells are lists, too, and recursive functions are good for much more than operating over recursive data structures. Recursion is often the only simple solution in messy real-world stuff. Just last week I wrote recursive code to do one of the most real-worldy and least Computer Science-y things imaginable: get data from a public API. I didn't see any reasonable alternative to recursion. (But since that project was, sadly, not written in a language with TCO, I'm stuck with a function that's going to break if our little Twitter app gets popular, at which point I'll have to resort to an unreasonable hack.)
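
Roughly the shape of what I wrote, as a sketch (hypothetical endpoint, cursor scheme, and field names, not the real code):

    # Recursively follow a paginated API until the server stops returning a cursor.
    # (Endpoint, parameters, and field names are made up for illustration.)
    import json
    from urllib.request import urlopen

    def fetch_all(url, cursor=None, acc=None):
        acc = [] if acc is None else acc
        page_url = url if cursor is None else url + "?cursor=" + cursor
        page = json.load(urlopen(page_url))              # fetch and parse one page
        acc.extend(page["items"])
        if page.get("next_cursor") is None:              # base case: last page reached
            return acc
        return fetch_all(url, page["next_cursor"], acc)  # tail call; without TCO each
                                                         # page costs a stack frame

    results = fetch_all("https://api.example.com/v1/items")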

But I admire Guido and could be misinterpreting what he said.


I think what he meant is that beginners have an easier time with iterators and list indexing than with recursive techniques. An important target group for Python is beginners and non-programmers who do some programming in support of their primary work. That's why Guido values stack traces over tail call optimization. For Python, supporting beginners and amateurs is a higher goal than writing elegant functional code. Even for problems with elegant recursive solutions, beginners seem to find it easier to cobble together a more complicated loop-based solution than to find the simple recursive one. I know that makes everyone really sad :-( and maybe the long-term solution is to fix the school curriculum to teach programming earlier and better, but for now it's reality.


Good points. But favoring Python lists over recursion is like favoring chocolate over logging. However, I'm probably just playing gotcha here. My apologies to HN and Guido.

My more important point is that recursion is good for real world stuff.


Good points. But favoring Python lists over recursion is like favoring chocolate over logging.

No, no, it's not at all, not when you approach it from Guido's point of view. To him, the crucial question is, "What's the right way for beginners to process data?" Python and its built-in data structures support iteration and list indexing because that's how beginners do it. Python does not support tail calls because Guido thinks it's a mistake to teach recursion to beginners.

I think everyone is misunderstanding Guido's point because they just can't imagine anyone taking his point of view, so they imagine him saying something else. Joe Marshall even called Guido's main points (which he correctly identified, and then ungraciously caricatured) "a sideshow" and said the "true argument" is about the power of recursion. Sorry, Guido doesn't care about the power of recursion, because it isn't valuable to his target programmers, the programmers he wants to serve. He really thinks tail call support would be a very minor feature in Python because it's not important whether you can program in Python that way. He thinks complete stack trace information is very important because it makes things easy and transparent for beginners. That's what's important to Guido and, I would argue, to most Python programmers. When it comes to the future of Python, that's what matters, and Joe Marshall is the sideshow.


Python's lists are just as good for recursion as Scheme's cons cells - the data structure doesn't really matter. That was my nitpicky point.

I agree that Python's emphasis on beginner-friendliness was the smart choice and led, ironically, to its current dominance in numerical computation.


If you want to see a language which is stylistically very similar to Python*, but has TCO, check out Lua.

In practice: you can easily add pre-tail-call tracing, you have the option of starting a debugging REPL on error, using iterators is very idiomatic, it makes implementing coroutines and fully-functional closures easier, etc.

I happen to think the most significant benefit of TCO is being able to set up "looping" tail calls between multiple functions, but that's just me.

* Somewhere between Python and Javascript, specifically.


He never addresses the stack trace issue. Unless someone can figure out how to preserve useful stack traces while doing TCO, I'm still firmly in Guido's camp.

My boss made an observation recently. He did RoboCup when he was in college, and he found that the success of a team was almost directly correlated with the extensiveness of their debugging tools. The teams that built instrumentation to figure out exactly what their robot was "thinking" tended to kick the ass of the teams that coded very carefully and hoped it all worked.

A major component of Java and Python's productivity improvement over C++ comes from the ability to easily get clean, readable stack traces whenever anything goes wrong. It's already a huge pain when your Java stack trace says <compiled code> instead of giving you a method and line number; anything that makes that the default will cause far more headaches in debuggability than it solves.


The thing is, usually when you want the stack frames, /they are there/. They're present because they form part of the continuation of the computation.

Tail calls, because they don't form part of the continuation, are usually not interesting at all in stack traces.

To put it another way: a python backtrace doesn't include statements that were already executed. "Of course it doesn't, they're already executed!", you say -- well so have the tail calls! They are exactly equivalent (google for CPS transformation).
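
A tiny concrete illustration of the analogy (plain CPython):

    def already_done():
        return 1                  # this call completed; its frame is gone

    def fails():
        raise ValueError("boom")  # this frame is part of the continuation

    def run():
        already_done()            # finished statements never show up in the traceback
        fails()

    run()
    # The traceback shows run() and fails(), but not already_done(). Under tail call
    # elimination, a caller whose only remaining work is "return f(...)" is in the
    # same position as already_done(): it is no longer part of the continuation.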


I think a debug command line switch to python would suffice. You could use the heap to store stack traces when TCO is in use.


Using the heap to store stack traces when TCO is in use and a "debugging" flag is set won't work: it could cause memory to be exhausted. You're effectively making each recursive call consume memory, which is exactly what TCO is trying to avoid. If the coder assumes TCO, they could have written code that will exhaust memory once the "debugging flag" starts storing stack trace information.


So, we need tail recursion so we can prematurely optimize the space/time of our programs.

Sarcasm aside, I'm sure that for some people and some applications that is needed. But there are plenty of languages that do that and do it well. Why do people want every language to be "their" language? For Python, I'm glad Guido is following the Zen of Python.


I think the "premature optimization" phrase is usually used in contexts where you have to perform said optimizations by hand... (of course in some cases you'll have to consciously write a procedure as tail-recursive).


I must be missing something. Why can't we skip the drama & add it as an option?


Python has a pretty distinctive programming style (what's considered "pythonic"), and code that makes idiomatic use of tail calls tends to be structured quite differently. While it may have been a good addition early in Python's history, retrofitting it now would make the language more complicated and messy, and avoiding having a dozen ways to do the same thing (as in Perl) has been a goal from the start. Python's design seems to argue that any incremental improvements gained by using one coding style over another will be lost due to dealing with several different coding styles. (Particularly when somebody is actually trying to write Scheme, Haskell, etc. in Python.)

Questions about whether it'd be better if Python supported tail-call elimination, type inference, macros, etc. need to consider not just the theoretical power advantage, but the extent they would tend to fragment the existing community, make existing code harder to use/maintain, etc.

FWIW, Lua (http://lua.org) is stylistically quite similar to Python, and has an equally strong focus on keeping things simple, but approaches it differently. Its designers kept the core language extremely small, yet easily extensible in Lua or C. It's very interesting to compare and contrast its design with Python's. (It's also a great language in its own right, and it has TCO.)


Several good reasons. Firstly, Python is sufficiently dynamic that you cannot easily tell at compile time which function will be called at run time: the code can call a function that rewrites the namespace as a side effect. That's generally a terrible idea, but somebody is probably doing it. EDIT: Actually, it is not a bad idea. Consider an object that plugs a worker method into itself (self.worker = some_function) and then calls it on the object, like self.worker(self). This is pretty reasonable for Python code, but it would be hard to analyze sufficiently at compile time to eliminate tail calls.
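
Concretely, the pattern I mean looks something like this (names are made up):

    class Pipeline:
        def __init__(self, worker):
            self.worker = worker       # plug a worker function into the object

        def run(self):
            return self.worker(self)   # tail call through an attribute that can be
                                       # rebound at any time, even by the worker itself

    def count_down(p):
        p.remaining -= 1
        if p.remaining == 0:
            return "done"
        return p.run()                 # another tail call back into the object

    p = Pipeline(count_down)
    p.remaining = 3
    print(p.run())                     # "done"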

Secondly, it is difficult to deal with a tail recursive call inside a try-catch block. Consider a multithreaded recursive tree manipulation algorithm that locks nodes as it descends into them, using finally blocks to guarantee lock release during exception unwind. The core algorithm can undergo tail call elimination, possibly releasing references to numerous large data structures. Yet the exception handling variables must remain on a stack, and some references to core algorithm variables may need to be retained. It is possible to do a hybrid stack elimination, but not easy. And the exception backtrace would have bizarre gaps where variables were eliminated, complicating debugging.


> you cannot easily tell at compile time which function will be called at run time: the code can call a function that rewrites the name space as a side effect.

I think you don't need to know which function will be called at compile time to do tail call elimination, in general. As long as the compiler can see a function call, and can see that the caller has nothing to do except return the result of the call, it doesn't need to know which function is called. (It doesn't even matter if the callee has some "wrappers" like decorators; the caller can just jump to the wrapper entry.)

Is there something peculiar about Python that makes it difficult? (I don't understand how your self.worker example prevents tail call elimination, but maybe that's because I don't know enough about Python.)

> Secondly, it is difficult to deal with a tail recursive call inside a try-catch block.

A call made inside a try-catch block is not a tail call, almost by definition. The compiler may perform something clever, but from the programmer's POV it's sufficient to regard them as non-tail calls.


> I don't understand how your self.worker example prevents tail call elimination, ...

If some_function calls a bunch of random functions, it is very, very hard to tell if those calls will execute "self.worker = some_other_function" as a side effect. This happens because everything in Python can be changed on the fly; languages with true constants don't have this problem.

> A call made inside try-catch block is not a tail call, almost by definition.

Right, and since Python makes wide use of exceptions, true tail call elimination is often impossible. The best you can get is partial elimination, and it is a PITA.


> If some_function calls a bunch of random functions, it is very, very hard to tell if those calls will execute "self.worker = some_other_function" as a side effect.

I still don't see it, since changing self.worker at runtime does not matter at all. Whether a call is in tail position can be determined by looking only at the caller's code. If self.worker(self) appears in tail position, the compiler can generate a code sequence something like this:

   discard the caller's frame.
   push the value of 'self' as the argument.
   retrieve the value of the 'worker' attribute.
   jump to the method pointed to by that value.
The value of the worker attribute may be different every time this code is executed, but that's totally fine.

The only construct I can think of so far that prevents compile-time tail call elimination is first-class macros, since with them the compiler cannot determine whether a call is in tail position or not. They are the extreme end of dynamism. But even with them, you can do runtime tail call elimination. It might not be as efficient, though.

> Right, and since Python makes wide use of exceptions, true tail call elimination is often impossible.

Wait... are we talking about the same thing?

What I mean by tail call elimination is converting tail calls to jumps. It has nothing to do with non-tail calls, hence wide use of try-catch blocks has nothing to do with "true tail call elimination".

What you can argue is something like this: try-catch blocks are so abundant in Python code that tail call elimination won't add much value to typical Python programs. I don't know if that's true or not, but it's plausible. (E.g. C++ has to run destructors when stack-allocated objects go out of scope, which makes most calls non-tail calls, so the effect of tail call elimination is limited.)


I understand what Guido is saying to some extent. I thought through it, and my logic seems to go as follows.

1. Compilers are not perfect. In fact, they are often surprisingly dumb and get things wrong all the time. Add in the fact that the compiler often does not have enough information to properly make an optimization, and the output goes from bad to worse.

2. If you make certain types of optimizations and tell people that it is fine to rely on them--even if those optimizations will not always work--performance on average will get worse.

Tail recursion seems like a classic case of an optimization which, if one relied on it but the optimization failed, would have catastrophic results (a crash due to running out of stack space). In a very simple language like Scheme, relying on tail calls is probably not a problem because of how simple the optimization is. But with something of the complexity of Python, it might not be as safe. Given the number of regressions in compilers these days, imagine what would happen if your program was written for a hypothetical Python 2.5 that had tail call optimization, you then updated to 2.6, and, due to a change in the optimization code, it failed to eliminate the tail call in one particular function. Your program would crash.

I think in general it is never safe to make the assumption that an optimizer will make a particular "good decision". Compilers and optimizers are incredibly stupid, and relying on them to be smart is a sure way to end up with all sorts of problems. Yet with something like a C compiler and loop unrolling, the consequences of such a thing can never be too bad: even in the worst case, you only lose performance. But with tail recursion, a failure could mean an actual crash.
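
For what it's worth, the failure mode is easy to demonstrate in present-day CPython (toy example):

    import sys

    def countdown(n):
        if n == 0:
            return "done"
        return countdown(n - 1)        # a tail call, but CPython keeps every frame

    print(countdown(100))              # fine
    print(sys.getrecursionlimit())     # typically 1000
    print(countdown(100000))           # RecursionError: maximum recursion depth exceeded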


This is why I don't like people referring to proper tail recursion as "tail call optimization". It leads to the assumption that it is somehow optional, or something that doesn't change the meaning of programs but only affects performance.

Proper tail recursion is a much more fundamental property of a language. Whether or not you have guaranteed proper tail recursion affects the behavior of programs in terms of space complexity (as discussed in Joe's article). Consequently it affects your choice of algorithms and coding styles. For example, it starts to feel very natural to represent a state machine in terms of function calls (function = state, tail call = transition).
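
A toy sketch of that style (made-up machine; in CPython, which lacks proper tail calls, every transition pushes a frame, whereas a properly tail recursive implementation runs in constant stack):

    # A recognizer for the regular language a*b*, written as
    # "function = state, tail call = transition".
    def state_a(s, i=0):
        if i < len(s) and s[i] == "a":
            return state_a(s, i + 1)   # stay in state A
        return state_b(s, i)           # transition: tail call into state B

    def state_b(s, i):
        if i < len(s) and s[i] == "b":
            return state_b(s, i + 1)   # stay in state B
        return i == len(s)             # accept iff the whole input was consumed

    print(state_a("aaabbb"))   # True
    print(state_a("abab"))     # False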

I don't claim that proper tail recursion is the only correct answer. It is a design choice. But if you opt out, you'd better understand what you're trading.

That said, I agree with you that an optional TCO you can't rely on isn't very useful. A properly tail recursive language, like Scheme, defines exactly which calls are tail calls, so programmers can rely on it.


I think that compiler optimizations should be classified into "leaky" and "non-leaky" (much like Joel's leaky abstractions). Basically, a "leaky" optimization would be where the compiler looks for basic patterns that can be optimized away, but where sufficiently complicated and convoluted code can cause the compiler not to recognize the optimization.

A "non-leaky" optimization would be one where the compiler can always optimize away a specific coding pattern. Though, to some, tail-call elimination is dark magic, in reality the optimization can always be performed. As alluded to in the article, all that's necessary is to replace the existing stack frame when returning a value.


Even so, you need to be able to make a very specific formal declaration of exactly what constructs result in the optimization being non-leaky. It needs to be completely straightforward and easy enough to understand that no programmer gets into the situation where they think a leaky optimization is non-leaky.

And once you make an optimization non-leaky, it cannot ever become leaky again in any particular situation whatsoever, or it could break existing programs.


Garbage collected languages are good examples of places where people regularly trust complex abstractions. Compiler behavior is analogous.


Which is often equally dangerous--applications which rely on garbage collection often use vastly more memory than applications that don't.

A particularly analogous situation, however, would be a C application that relies on a non-guaranteed GC, i.e. one that is not guaranteed to garbage collect all objects. In such a case, you're almost sure to eventually run out of memory. This is as opposed to the Java GC, which, while potentially inefficient, will (AFAIK) GC all objects.

Another analogous situation would be an application built to run on a server with 2 gigabytes of memory. The application uses 1.6 gigabytes and uses GC. A change in GC results in the program actually needing 2.4 gigabytes despite the actual internal memory usage never changing, and your servers grind to a halt.


I wouldn't want my pacemaker running a garbage-collected language and yet most web apps are written with GC. Google uses GC languages. GC can be done correctly and efficiently.

TCO is not very hard to implement in a compiler/interpreter if considered from the beginning. Unless your compiler/interpreter was designed very poorly, it probably isn't very hard to implement later, either.

Given this, it doesn't really make sense to not trust TCO because it's somehow black magic. There may be other arguments against wanting TCO in your language of choice ("I really need backtraces all the time and I can't sacrifice simple, provable interpreter optimizations!") but reliability can't be it.


> Given this, it doesn't really make sense to not trust TCO because it's somehow black magic.

I wouldn't say it's black magic at all--it's just that from my experience you can rarely even trust compilers with the most basic of things, let alone black magic.


If you're doubting your compiler for the most basic things, you may want to get a new compiler.


> I wouldn't want my pacemaker running a garbage-collected language and yet most web apps are written with GC.

I'd prefer GC over memory leaks...


You don't need either in a pacemaker, though. If you're dynamically allocating memory in a pacemaker, something's wrong.


AskHN: If you were to describe this to a slow, first year CS student, how would you do that?


I tried to summarize Marshall's argument. Let me know if it makes sense, and answers your question.


Well written.

Though it seems the whole article could have been said more concisely as:

"Guido's decision is wrong because it causes Python to use more resources than necessary in certain cases." [where resources is any one of: stack, heap, file descriptors, etc.]

Which is non-ideal, in isolation, certainly. But Guido's whole post was about saying there are trade-offs, and he doesn't like the trade-off made in the other direction. (Misleading stack traces, etc.) Guido does seem to understand both sides of the argument. But ultimately it is a design decision. And he made it.


Almost; there is one unstated additional qualification: all of the above is only true if you don't believe in (or cannot use) boring old iterative loops and must implement an algorithm recursively.

I found it hard to believe the article went to the extent of explaining TCO (with assembly language!) and space complexity without bothering to mention this little assumption.

Since Python favours iteration over recursion I have some sympathy with GVR on this one.


The "Lambda: The Ultimate GOTO" paper shows a still-timely example of when you might want to express a state machine as mutually recursive functions. Granted, they compare to the actual GOTO, but you can't make state machines of any manageable size with iteration, either.


Gotcha. Thanks.



