Thinking About Recursion (solipsys.co.uk)
63 points by ColinWright on March 8, 2017 | 58 comments



Gerald Sussman covers the Tower of Hanoi problem in the SICP lectures [1]. The thing I find fascinating about this solution is that it doesn't even require the rule that you can't place larger discs on smaller discs - it simply is never optimal to do so anyway. And the solution is so simple, I wonder if you could invent a game by first coming up with a solution and then seeing what it actually solves.

[1] https://youtu.be/dlbMuv-jix8?t=47m17s
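
For reference, a minimal recursive sketch of the standard solution in Python (the function and peg names are just illustrative):

  def hanoi(n, source, target, spare):
      # move n disks from source to target, using spare as scratch space
      if n == 0:
          return
      hanoi(n - 1, source, spare, target)   # move the top n-1 disks out of the way
      print("move disk", n, "from", source, "to", target)
      hanoi(n - 1, spare, target, source)   # put the n-1 disks back on top of disk n

  hanoi(3, "A", "C", "B")   # prints 2**3 - 1 = 7 moves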


> The thing I find fascinating about this solution is that it doesn't even require the rule that you can't place larger discs on smaller discs

It had better. If you're allowed to put larger disks on smaller disks, you can solve towers of Hanoi in linear time. Call the rods A, B, and C: the goal is to move the stack from A to C. First flip the stack upside down, one disk at a time, from A to B. Then do that again, from B to C. The stack is now upright on C, after 2n moves.

(In contrast, the real problem requires O(2^n) moves.)
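
A quick Python sketch of that flip-it-twice idea, just to make the 2n count concrete (the rod representation is an illustrative assumption):

  def solve_without_rule(n):
      # disks numbered 1 (smallest) to n (largest); rod A starts with the largest at the bottom
      a, b, c = list(range(n, 0, -1)), [], []
      moves = 0
      while a:                     # flip the whole stack from A onto B; it ends up upside down,
          b.append(a.pop())        # with larger disks resting on smaller ones
          moves += 1
      while b:                     # flip it again from B onto C, which rights it
          c.append(b.pop())
          moves += 1
      return c, moves

  print(solve_without_rule(4))     # ([4, 3, 2, 1], 8): back in order after 2n = 8 moves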


Grant Sanderson has a beautiful video on the Towers of Hanoi problem too [1], part of his animated math project.

[1] https://www.youtube.com/watch?list=PLZHQObOWTQDMRtm8h9bG9P06...


Wow, fantastic insight. Thanks for sharing.


> The thing I find fascinating about this solution is that it doesn't even require the rule that you can't place larger discs on smaller discs - it simply is never optimal to do so anyway.

It seems like without that rule, the game's solution is 2n-1 steps (dump n-1 disks from the left to the right, move disk n to the center, then un-invert those n-1 disks on top of disk n), rather than 2^n-1 moves.


>dump n-1 disks from the left to the right, move disk n to the center, then un-invert those n-1 disks on top of disk n

In the case where you can place larger disks on top of smaller disks, the sub-problems don't have the same solutions. I.e., in order to move N disks from A to B you first move N-1 disks from A to C, then one disk from A to B, then the N-1 disks back from C to B. But to solve the N-1 disk sub-problem, you don't need to ensure that the largest disk ends up on the bottom -- you just move them all straight to the target.

I guess the rule is implicitly tied up in the fact that the sub-problems are solved in the same way as the larger problem.


Man, this article really shows why imperative programming is so much harder to reason about than functional programming.

Difference between "variables" in math vs. functional programming? None. The value of a variable can't change.

Difference between "function" in math vs. functional programming? None. It's just a map from one set to another.


What evidence is there that math is easier to reason about than imperative programming?


Here's an article that explains it pretty well: http://fsharpforfunandprofit.com/posts/is-your-language-unre...


Incomplete assessment.

Functional programming much better supports equational reasoning.

Imperative programming much better supports operational reasoning.

Or you could contrast "data flow" vs "control flow".

I'd have thought we wouldn't need operational reasoning in 2017, but currently I conclude it may always be a part of computer programming.


You've described the reason that I doubt I'll ever do much with functional programming. Using it to solve a non-trivial problem reminds me unpleasantly of math, rather than lists of instructions.


This is what I like about FP. As in math, you're expressing ideas about relationships that must always hold true, regardless of time. You think less about how state changes over time and more about what the fundamental "truths" of your program have to be. For me -- and for many problems but certainly not all -- this is an easier way to think.


yeah "regardless of time and space", as one observes in the resulting time and space usage. By the time the purely functional program has been tweaked for better (time and space) performance, said readability has greatly suffered ;D and I say that as an enthusiastic FPer


I don't really know much about functional programming. How can a variable not change?


In functional programming the entire program is literally a single expression. This is essentially what functional programming is. A variable in a single expression cannot be modified.


In mathematics, variables don't change inside expressions -- they define a family of expressions, each with a different value. For example, in the function

  f(x) = x^2 + x + 1
the value of x in the right-hand-side expression doesn't change in the same sense that variables in imperative languages change. But it allows the single expression "x^2 + x + 1" to have a value which varies by the value of x. For example, f(2) yields the value 7 while f(3) yields the value 13.
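
Written as a trivial Python sketch, the same point holds: each application binds x afresh, and nothing is ever reassigned:

  def f(x):
      return x**2 + x + 1

  print(f(2), f(3))   # 7 13 -- x is bound per call, never mutated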


Construct a whole program of functions using only, say, `readonly` (in C#) or `const` (in ECMAScript) or `:=` (in Go) "variable" assignment-declarations, never mutating them after the initial declaration. It's possible, and it's one of the core tenets of pure FP.


I just spent an hour reading this blog. I now feel absolutely smarter and relatively dumber. What a wonderful blog. Thanks for sharing!


Wow. Thank you.


The best way I've found to introduce the concept of recursion is to explain it like trying to escape from a maze. Suppose at your starting position you have 3 possible corridors to try, in search of the exit. You mark your current position (using bread crumbs or whatever), then try each corridor. If the corridor is a dead end you return to where you left the bread crumb and try another corridor. If one of the corridors leads you to a room with yet more corridors, you leave another bread crumb to mark your position, and then recursively try each of those corridors. Each time you hit a dead end, you'll be able to back track to the last crumb you left. Eventually, you'll find a corridor that leads to the exit. (At that point the metaphor breaks a little because stack unwinding would require you to go back and clean up all your crumbs, but whatever. Maybe a better metaphor would be collecting some items from all the corridors in the maze, and bringing them back to your starting position.)
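
A minimal sketch of that idea in Python, with the maze as a dict from positions to neighbouring positions and a `visited` set playing the role of the bread crumbs (this representation is just an illustrative assumption):

  def find_exit(maze, pos, visited=None):
      # returns a path to the exit, or None if this branch is a dead end
      if visited is None:
          visited = set()
      if pos == "exit":
          return [pos]
      visited.add(pos)                      # leave a bread crumb
      for corridor in maze.get(pos, []):
          if corridor not in visited:
              path = find_exit(maze, corridor, visited)
              if path is not None:
                  return [pos] + path
      return None                           # dead end: backtrack to the previous crumb

  maze = {"start": ["a", "b"], "a": [], "b": ["exit"]}
  print(find_exit(maze, "start"))           # ['start', 'b', 'exit']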


My favourite analogy is that of finding out which row you are in at the cinema. You don't know which row you're in, but you know you're 1 behind the guy in front of you, so you ask him and figure you'll just add 1 to that. He doesn't know either, and using the same strategy he asks the person in front of him.

Eventually, the guy in the second row asks the guy in the first row, who immediately says he's in the first row. Then each person adds 1 to the answer they get back and feed it back until eventually you get your answer.
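
As a toy Python sketch, modelling the cinema as a mapping from each person to the person directly in front of them (the names and representation are just illustrative):

  def row_of(person, cinema):
      in_front = cinema[person]
      if in_front is None:
          return 1                          # front row: the one person who just knows
      return 1 + row_of(in_front, cinema)   # ask the person in front, then add 1

  cinema = {"alice": None, "bob": "alice", "carol": "bob"}
  print(row_of("carol", cinema))            # 3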


That's a good analogy - the problem is a student could reasonably ask "why would you do it this way?" - i.e. this kind of algorithm seems like a better candidate for iteration. Instead of asking the guy in front of you, get up and count the rows. I know it's for teaching purposes of course, but I'd like the analogy to involve a task that requires that you try something, then backtrack to some earlier state, then try a different thing, etc.

Of course, I guess this just highlights how all the toy examples we see in CS courses often use trivial things like computing Fibonacci sequences or factorials, which are much easier to do iteratively. This often leaves students wondering why anyone would ever even use recursion, or at best, makes it seem like recursion vs. iteration is a mere stylistic choice.


The Ackermann function I personally find better for recursion since you can't implement it trivially with iteration--it's not primitive recursive. The downside is that it's doubly recursive (which can be problematic if you're trying to teach the concept in assembly), and it's not particularly clear why such a function should exist (although one can point out that it's basically a generalization of addition/multiplication/exponentiation/tetration to arbitrary degrees).
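
For reference, a short Python sketch of the two-argument (Ackermann-Péter) form:

  def ackermann(m, n):
      # doubly recursive: a recursive call appears inside another call's argument,
      # which is why no fixed nest of bounded for-loops can compute it
      if m == 0:
          return n + 1
      if n == 0:
          return ackermann(m - 1, 1)
      return ackermann(m - 1, ackermann(m, n - 1))

  print(ackermann(2, 3))   # 9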


>the problem is a student could reasonably ask "why would you do it this way?" - i.e. this kind of algorithm seems like a better candidate for iteration

Fantastic question. And a point I would respond with is that if you were to actually get up and walk to the front of the cinema counting rows as you go, you would realise that in doing so you would have walked forwards one row and then performed exactly the same action that the person in front of you would have done had they tried to count their row. So even this "iterative" procedure parallels recursion.

Perhaps a problem that seems harder to convert to iteration is that of evaluating arbitrary arithmetic expressions. In order to evaluate a binary expression, you evaluate its two parts (recursively) and then perform the binary operator on the two operands. In order to evaluate a numeric literal, you simply take its value. So (1 + (2 * 3)) involves evaluating 1 to itself, (2 * 3) to 6, and then finally (1 + 6) to 7.
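
A minimal Python sketch of such an evaluator, with expressions represented as nested tuples (the representation is an illustrative choice):

  def evaluate(expr):
      # numeric literal: its value is itself (base case)
      if isinstance(expr, (int, float)):
          return expr
      # binary expression: recursively evaluate both sides, then apply the operator
      left, op, right = expr
      a, b = evaluate(left), evaluate(right)
      if op == "+":
          return a + b
      if op == "*":
          return a * b
      raise ValueError("unknown operator: " + op)

  print(evaluate((1, "+", (2, "*", 3))))   # 7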


Ordinary scalar multiplication can itself be expressed recursively, e.g. (2734 * 5896). Break each number into two parts and the answer is (27 * 96 * 100) + (34 * 58 * 100) + (27 * 58 * 100 * 100) + (34 * 96).

Fine.

Now what is (27 * 96)? (2 * 6 * 10) + (7 * 9 * 10) + (2 * 9 * 10 * 10) + (7 * 6). etc. [In real life you'd do this in binary and the multiplies by 10 become shifts.]

This is not usually the fastest way to do scalar multiplication, but it illustrates that:

1. Ordinary scalar multiplication is polynomial multiplication.

2. Polynomial multiplication is convolution.

3. And since we can [sometimes] speed up convolution by moving it to the Fourier domain, we can multiply numbers by taking their FFTs first, doing an elementwise multiplication, and then taking an inverse FFT. This actually pays off on very large numbers.

Multiplication is fun stuff.
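
A rough Python sketch of that digit-splitting recursion (plain base-10 splitting, with no attempt at Karatsuba-style savings or FFTs):

  def rec_mul(a, b):
      # base case: single-digit numbers are multiplied directly
      if a < 10 and b < 10:
          return a * b
      # split each number into a high part and a low part around a power of ten
      half = max(len(str(a)), len(str(b))) // 2
      shift = 10 ** half
      a_hi, a_lo = divmod(a, shift)
      b_hi, b_lo = divmod(b, shift)
      # four sub-products, recombined with the appropriate shifts
      return (rec_mul(a_hi, b_hi) * shift * shift
              + rec_mul(a_hi, b_lo) * shift
              + rec_mul(a_lo, b_hi) * shift
              + rec_mul(a_lo, b_lo))

  print(rec_mul(2734, 5896), 2734 * 5896)   # both 16119664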


This is a great analogy, but it doesn't cover the full set of recursive algorithms. It describes depth-first search exactly.


"Let me see if I got this recursive disk juggling right. You move the top-most disk to position C. Then recursion can move the rest to position B. And finally I move the disk at position C on top of the rest at position B."

"No, no, no. Look ..."

"But to start you can only move the top-most disk?"

"Yes, but it won't work because ..."

"What? You have to move the top-most as the first move."



I feel fairly confident that I've solved the problems of teaching functions (http://akkartik.name/post/mu ; particularly the last couple of sections) and recursion (http://akkartik.name/post/swamp). Tl;dr - It just requires first introducing these ideas in a statement-oriented language without recursive expressions. Allow learners to see the steps unfold explicitly, one by one.

(Comment deliberately phrased provocatively to hear push-back and feedback.)


To those who might be under the impression that recursion is not all that useful in the practice of commercial software development (as opposed to a purely academic interest or as a method of solving logical puzzles), I would point out that recursion is the only way to emulate iteration in functional languages and, as such, it finds extensive use in the C++ template language as well as XSLT - the two most used pure functional programming languages today.
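
To make that concrete in an ordinary language (a Python sketch, since the other examples in this thread are Python; a pure functional language simply wouldn't offer the first form):

  # iterative: a loop mutating an accumulator
  def total_iter(xs):
      total = 0
      for x in xs:
          total += x
      return total

  # the same computation expressed with recursion only, no mutation
  def total_rec(xs):
      if not xs:
          return 0
      return xs[0] + total_rec(xs[1:])

  print(total_iter([1, 2, 3, 4]), total_rec([1, 2, 3, 4]))   # 10 10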


Many commonly-used data structures (e.g. trees) are nearly impossible to manage without recursion.


Not true. A loop and a stack can achieve the same thing!


As an amateur programmer, I'm intrigued by how this is achieved - would you happen to have any references you could point me to?

My first encounter with recursion was when I needed to traverse a directory tree. I couldn't figure how to go arbitrarily deep without writing tens of nested for-loops. Then recursion somehow hit me, or I found something interesting on Google, can't remember, it was over 10 years ago, pretty sure SO wasn't around.

...and then a while later I found out about os.walk (this was python).


http://blog.moertel.com/ does a decent job of explaining some practical aspects of removing recursion, but I'll give you a basic explanation of that plus a more theoretical answer showing that any computable function can be written using only while loops and a stack.

Practical: If we write all of our programs as tail-recursive functions (returning either a recursive call or a constant at the end of the function), they are easy to transform into simple while loops with no stack. The idea is that instead of calling another function, you just restart the current one with the updated arguments.

EX:

    f(n,x)=f(n-1,x+1)
    f(0,x)=x
goes to

    while (n > 0) {
      n--;
      x++;
    }
    return x;
Other techniques such as trampolining and continuation passing style also help translate recursive functions into iterative ones.
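
For instance, here's a minimal trampoline sketch in Python (the helper names are made up for illustration):

  def trampoline(fn, *args):
      # keep calling returned thunks until a non-callable value comes back
      result = fn(*args)
      while callable(result):
          result = result()
      return result

  def countdown(n):
      if n == 0:
          return "done"
      return lambda: countdown(n - 1)   # return a thunk instead of recursing directly

  print(trampoline(countdown, 100000))  # no RecursionError, despite 100000 "calls"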

Theoretical: Think about Brainfuck or Turing machines; these computing frameworks can compute any computable function using only what amounts to while loops and infinite memory.

Given then that we can produce arbitrary control flow using only while loops, we can reproduce how recursive calls are computed in computers with stack frames (provided by our stack).


Here's a recursive directory traversal in Python:

  import os
  import os.path
  
  def list_files_recursive(path):
      filenames = os.listdir(path)
      for filename in filenames:
          fullname = os.path.join(path, filename)
          if os.path.isdir(fullname):
              yield from list_files_recursive(fullname) # recursively explore folders
          else:
              yield fullname
and here is the same program with an explicit stack

  import os
  import os.path
  
  def list_files_stack(path):
      folders = [path] # create stack of unexplored folders
      while folders: # repeat until stack is empty
          path = folders.pop() # pick a folder from stack
          filenames = os.listdir(path)
          for filename in filenames:
              fullname = os.path.join(path, filename)
              if os.path.isdir(fullname):
                  folders.append(fullname) # push folder to stack
              else:
                  yield fullname


Wow, somehow I expected something complex.

Thanks a lot!


Let's say you have two functions, function A and function B. Function A does a bunch of stuff, and function B is executed in the middle of function A. When function B returns, the program must remember the context of function A in order to execute the remaining steps. It does this by storing the entire context of function A on a stack. This is known as the 'call stack.' Essentially, all recursive functions can be, and usually are, implemented using a stack.


Indeed. People who claim recursion is the only way to handle some types of data structures are missing how recursion is implemented. In practice, it boils down to a stack and a loop - and in the case of tail-recursive functions, not even the stack - though it may be a little prettier if you use recursion.


Recursion sometimes makes it easier to implement certain algorithms, since modern programming languages have an implicit call stack which conveniently provides a snapshot of the state of all local stack variables at that point in the algorithm.


Here's a mind-bending thought: it is actually the loop that is syntactic sugar, and recursion is the implementation.


You're begging the question of what the lambda calculus is. As a theoretical basis for computing it sits equivalently with Turing machines. In practical use, there are consequences with trying to represent the computation with one or the other model, and our physical hardware resembles neither, really.


In practice, lambda calculus machines are rare, and your code probably compiles down to machine code that more closely emulates a loop and a stack. I'm not saying they're not theoretically equivalent, of course.


Recursion is nothing more than using a stack: implicitly (via the call stack) or explicitly (with your own stack and a loop).

You just won't get a stack overflow exception with the loop -- you'll run out of heap instead. It's still the stack filling up.


For many algorithms you'll have to build and maintain a stack manually if you use a loop. I once saw an old Fortran routine that expressed Quicksort this way.
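
Here's the same trick as a Python sketch (not the Fortran original): the pair of recursive calls becomes two pushes onto an explicit stack of index ranges.

  def quicksort(items):
      stack = [(0, len(items) - 1)]          # ranges still waiting to be sorted
      while stack:
          lo, hi = stack.pop()
          if lo >= hi:
              continue
          pivot, i = items[hi], lo           # Lomuto partition around the last element
          for j in range(lo, hi):
              if items[j] < pivot:
                  items[i], items[j] = items[j], items[i]
                  i += 1
          items[i], items[hi] = items[hi], items[i]
          stack.append((lo, i - 1))          # "recurse" into the left part...
          stack.append((i + 1, hi))          # ...and the right part, via the stack
      return items

  print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))   # [1, 1, 2, 3, 4, 5, 6, 9]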


Quote: "These are variables, and when I've been teaching people to program sometimes I see the light dawn when they realise that a variable name is like a box with the name on the outside."

Yes. Maybe there could be some algebra in advance of -- or alongside -- early coding lessons, where this idea is central to both fields. One variable name, any numeric value.

It's the same with functions, the lifeblood of analysis -- one function name, any equation. So in a perfect world, people would be learning calculus and general analysis while writing their first program functions -- indeed, they might be learning higher math while coding their lesson materials at the same time.

http://arachnoid.com/calculus


Kinda off topic. And a kinda crazy hypothetical:

I would really like to see a production language where program-stack-based recursion was a compile-time error. Instead, only recursion as syntactic sugar is allowed, i.e. only recursion (or any cyclic function call graph) that the compiler can translate to "imperative code" should be allowed to compile.

For two reasons:

1. I honestly dream of a world where we no longer have to think about indefinitely growing program stacks.

2. It would force us to invent better optimization algorithms (like tail-call optimization) but for more general recursive patterns.


You're talking about continuation-passing style, which makes stacks superfluous. It's a mind-blowing idea. But it's also a royal pain to write code directly this way, so some compilers convert to continuations under the sheets.
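
A tiny CPS sketch in Python (it still runs on the ordinary call stack here, so it only illustrates the style, not the stacklessness):

  def factorial_cps(n, k=lambda value: value):
      # instead of returning a result, every step hands it to a continuation k
      if n == 0:
          return k(1)
      return factorial_cps(n - 1, lambda result: k(n * result))

  print(factorial_cps(5))   # 120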


It's crazy that this idea was invented in 1975 [0], and I haven't heard of it, even after:

- 4 years of self learned programming

- 4 years of university for computer engineering

- 5 years in the industry, as a software engineer

[0] https://en.wikipedia.org/wiki/Continuation-passing_style


There are two very different (yet of course functionally equivalent) paradigms for general computation: Turing Machines and Lambda Calculus (Turing vs. Church). All schools teach TMs but not all teach LC. Which is a shame, because some wonderful insights like Functional Programming and Lisp and CPS come mostly from the LC paradigm.


Oh, I do believe this was attempted with Algol 60. And there's an article on what happened. It turns out to be (possibly mathematically) infeasible, as there are many ways to sneak it in.

[0] https://news.ycombinator.com/item?id=10131664


I'm sure it can be done. The compiler just needs to manually create a stack as a variable rather than use the call stack.


You also need the iterative structure to support continuations. Of course, if you do all that---you've just reimplemented function calls.


To think about recursion, first, think about recursion.




I clicked it three times before I realized!



Good one!



