Hacker News
The Y Combinator (no, not that one) – A Crash Course on Lambda Calculus (medium.com)
251 points by juanplusjuan on Sept 10, 2014 | 46 comments

One of my favorite programming videos: Jim Weirich on the Y Combinator https://www.youtube.com/watch?v=FITJMJjASUs

Thanks for letting me discover the work of that man. His approach of learning about something by actually doing it really shines in the video you linked.

I think he passed away recently :(

He did. And that's how I learned about him and the video.

Great video, I was hoping it would be linked here. Jim's talk is not only informative but an inspiring display of his ability to teach.

If anyone hasn't seen this I highly recommend it, not least because Weirich was a great guy and very entertaining. RIP.

Thanks for this. I just learned a lot!

Nice article, definitely piqued my interest in the lambda calculus, but there were a few points that were confusing for me as my first glimpse at the lambda calculus.

The first confusing point was the lack of definition of the order of operations. I first stumbled at the line (λy.(λx. x) y) a b . Is this supposed to be (λy.(λx. x) y) (a b) or ((λy.(λx. x) y) a) b . In this case both give the same answer, but it's not obvious that associativity holds in general.

It gets worse with the Y combinator: λf. (λx. f (x x))(λx. f (x x)) . Is this (λf. (λx. f (x x)))(λx. f (x x)) or λf. ((λx. f (x x))(λx. f (x x))) . Peeking ahead, it seems to be the latter, which makes more sense (otherwise the letter f would be pressed into service in two different contexts), but it's totally ambiguous from the rules that have been presented in the article.

The other point of confusion was regarding rule #3 (If t and s are both valid λ-terms, then t s is a valid λ-term.) The article tells us a certain manipulation we can do if t is a "function" (i.e. something that begins with a λ; I don't know the technical name for this), but doesn't say what to do if t is not a "function". As far as I can tell the answer is: do nothing; there is no simplification in this case. It would be nice if this were said explicitly.

There are only two bits of syntax for lambda calculus: 'application', written "a b", and 'abstraction', which I'll write as "\a. b" (since "\" is easier to type than lambda). The words "abstraction" and "function" are pretty much interchangeable, although we might use "abstraction" to specifically refer to a "new" function definition, like this:

    \x. x
As opposed to a "calculated" function, like the result of an application:

    a b
Regarding precedence, application is "left-associative", meaning that

    a b c
is the same as

    (a b) c
Abstractions 'capture' everything after the ".", so

    \a. b c \d. \e. f
is the same as

    \a. (b c \d. (\e. f))
You need to use parentheses to go the other way, eg.

    a (b c)
    (\x. x) y
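These precedence rules can be mimicked with curried functions. A rough sketch in Python (my own illustration, not from the comment; the name `const` is mine):

```python
# Curried "K" function, i.e. \x. \y. x
const = lambda x: lambda y: x

# Application is left-associative: "const a b" means "(const a) b",
# which in Python is the chained call const(a)(b).
assert const(1)(2) == 1
assert const(1)(2) == (const(1))(2)
```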
In the case of the Y combinator, there is only one "f" variable but it gets used twice. There are two "x" variables, one per abstraction, although those abstractions just so happen to look the same. In other words, we could replace one of the "x" abstractions with "\y. f (y y)" (known as an "alpha conversion") but we can't rename one of the "f"s without renaming all three occurrences.

Regarding your last point, the only semantic values in lambda calculus are functions, so "t" can't be anything else. However, we just saw that there are two syntactic forms which "t" can take. If "t" is an abstraction, we can simplify our expression immediately via the beta-reduction rule ('applying t to s').

If "t" is an application, like "t1 t2", then we can try simplifying that to get either an abstraction (which we can apply to "s") or another application, which we can try simplifying, and so on.

It's possible (due to the Halting Problem) that we keep getting applications no matter how much we try to simplify, in which case we've hit an infinite loop while trying to calculate "t", and hence we can never reach the point where we can apply it to "s".

>It's possible (due to the Halting Problem) that we keep getting applications no matter how much we try to simplify, in which case we've hit an infinite loop while trying to calculate "t", and hence we can never reach the point where we can apply it to "s".

But note that the _order_ of the reductions doesn't affect the final result, as long as they're valid reductions: any two reduction sequences that terminate reach the same normal form, though some orders may fail to terminate where others succeed.


The Church-Rosser theorem probably deserves its own blog post.

My page on lambda diagrams at http://www.cwi.nl/~tromp/cl/diagrams.html has a nice picture of the Y-combinator at the top, and this note at the bottom:

The diagram in the title, produced by the postscript code below, is a slightly deformed alternative Y diagram made to look like a Y.

    %!PS-Adobe-2.0 EPSF-2.0
    %%BoundingBox:0 0 118 110
    /m{moveto}def/l{lineto}def/c{concat 6 m 0 6 l 7 8 m 0 8 l l l 3 6 l 2 6 m 7 6 l
    3 4 m 6 4 l 6 6 l stroke}def 3 0 0 0 1[-8 4 0 8 60 9]c 3 2 0 2 2[-1 1 0 1 0 0]c

Since SICP and Lambda papers I have a fascination for the relationship with discrete math, computing devices and their topology. This doesn't help.


A fixed point p of a function f is a value such that f(p) = p.

A semi-recursive function f is like a recursive function, except that instead of invoking itself, it invokes some other function provided as an argument to f.

The Y combinator (aka fixed-point combinator) is a function that, given a function f, finds a fixed point of f.

We can turn a semi-recursive function f into the corresponding recursive function g by finding a fixed point of f, which we can do using the Y combinator.
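As an illustration, here's a rough sketch in Python (my own, not from the comment; since Python evaluates eagerly, it uses the strict-evaluation variant of Y, usually called the Z combinator):

```python
# Z combinator: the strict-evaluation cousin of the Y combinator.
# The eta-expanded inner lambda (lambda v: ...) delays evaluation
# so the self-application doesn't loop forever under eager Python.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# A semi-recursive factorial: instead of calling itself, it calls
# whatever function it receives as its first argument.
semi_fact = lambda self: lambda n: 1 if n == 0 else n * self(n - 1)

# A fixed point of semi_fact behaves exactly like ordinary factorial.
fact = Z(semi_fact)
print(fact(5))  # 120
```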

Also, and a big also, the Y combinator exists in (untyped) lambda calculus: all you need is abstraction and application.

This came as a shock to Church when he invented it, since he wanted to use LC as a language for mathematical logic, and fixed-point combinators spell doom for logical purposes. He thus invented the simply typed lambda calculus to banish such constructions.

I once tried an insane thing: building the Y-Combinator in C, and wrote a blog post about it. http://kswizz.com/post/50429268286/fibonacci-the-y-combinato...

It was a fun thought exercise. Not something I'd use in production code.

From The Little Schemer - http://www.ccs.neu.edu/home/matthias/BTLS/sample.pdf

That chapter made me grok the Y Combinator.

Still wishing more people read that book. It is not overrated.

I love Racket, and thus Schemes of many kinds, but that chapter is really difficult for me to read compared to more academic or technical approaches - and I am really not an academic or technical person!

I find the use of some sort of Socratic dialogue to be deeply confusing, distracting and frustrating. Creepy, SCP-Foundation or Welcome-to-Night-Vale-like asides such as "We did not expect you to know this." only add to my confusion. It makes simple concepts like recursion look like some sort of House of Leaves nightmare world.

If there's one thing that the prolific rise of the Y Combinator accelerator has done, it's made the combinator much harder to google for. (Though ofc Fixed-point combinator would still work).

I actually complained about this on twitter a while ago, and pg said he didn't believe me - recommending "y combinator lambda" as a search string. But more specific things like "y combinator golang" require you to block HN and a few other sites to get meaningful results.

Has the Y combinator been useful to anything? Has it been used in any software in a role other than pedagogic?

It's a beautiful way to make a recursive call without binding the function to an identifier, but has it actually proven useful? It would seem that languages that make it easy to use the Y combinator also typically make it easy to use named recursion, with either a permanent or a temporary name.

Hey. You're completely missing it. The Y combinator shows that full general computation, including recursion and iteration, derives automatically and inevitably from just basic rewrite rules. Obviously it is too raw to be used directly. But if you design or implement any system with rewrite rules, you have provided indefinite power for recursion, and you have opened Pandora's box of undecidable termination. This means that decidable computation can only occur without rewriting, which is incredibly restrictive.

Food for thought.

I do use it directly. For example in Fexl I define the append function for lists as:

    \append=(@\append\x\y x y \h\t [h; append t y])
Where '@' is the Y-combinator.

Note that there's no direct self-reference with the symbol "append" there. I could define it equivalently as:

    \append=(@\loop\x\y x y \h\t [h; loop t y])

Why? Is it unreasonably difficult for you to implement recursion via direct self-reference in your language? Or do you just not want to because the Y combinator is there and it's cool?

True, I could implement it in terms of direct self-reference. At one point I used the "==" syntax for just that purpose:

    \append==(\x\y x y \h\t [h; append t y])
Maybe I should resurrect that syntax option. :)

It was trivial to implement. When the parser sees the "==", it just automatically abstracts the symbol "append" from the definition and applies the fixpoint operator (Y combinator).

The only reason I eliminated "==" in the first place was that I was in the throes of using different syntaxes for lazy, eager, and self-referencing definitions. Now I've settled in on "=" always meaning eager evaluation, without exception. Then I got hyper-minimalist and said that's it, there's only one syntax for definitions, and it's "=", and if you want self-reference, use "@".

However, the decision to settle in on eager evaluation now frees up "==" once again as an option for self-reference. So I may bring that back.

Thanks for the food for thought.

Postscript: You might well ask why not just use "=" only, and assume that all definitions are potentially self-referencing?

That's a non-starter because I find that redefinition is the more common intent:

    \x=(* x 5)
    \x=(+ y x)
In those cases I don't want x defined in terms of itself, but rather in terms of the previous definition of x.

That is why I would insist on a special token such as "==" to express the intention of self-reference.

OK, I went ahead and revived the "==" syntax for recursive definitions, so you don't have to use '@' (fixpoint) explicitly.


>derives automatically and inevitably from just basic rewrite rules.

Sorry, I don't understand what that means.

Also, (lambda x: x(x))(lambda x: x(x)) already gives you an infinite loop, so why do you need a Y combinator to show non-termination? Once you have functions as first-class citizens that you can copy around, you've lost control of termination. Seems pretty intuitive. What's the specific contribution of the Y combinator?

">derives automatically and inevitably from just basic rewrite rules."

I said "full general computation, including recursion and iteration, derives automatically and inevitably from just basic rewrite rules". You can't cut a sentence wherever you want and still think you're responding to the original thought.

There is an imprecision in my original comment, but it's not there. If you can stop nitpicking long enough, you may be able to see something that's pretty powerful.

Thanks for the assertion that everybody can understand you when you use terms without defining them. Not everybody here is so lucky as to have the same educational background as you.

What is a rewrite rule? To me, it's something you put in a web server configuration.

My apologies, I was only responding to the previous comment, forgot momentarily about others, sorry about that. Let me try to fix this.

A computation engine based on rewrite rules performs computations, in general, by replacing instances of certain patterns with other patterns. Patterns can be literal or generic. The rules according to which these rewrites are done are the "code" of the computation: they determine what computation the system performs. Different rules can produce a system that does one of: computing prime numbers, computing digits of pi, factoring large numbers into prime divisors, finding the shortest path visiting every city exactly once on a given map, rendering a 3D scene from a list of objects and coordinates, finding the best potential dating matches for a given user out of a list of many other users, or outputting a Schoenberg-style atonal composition as a waveform with a synthetic piano-like sound.
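As a toy illustration (my own sketch, not from the comment), here's a tiny rewrite engine in Python that replaces literal patterns until none apply:

```python
# Each rule maps a pattern to its replacement.
rules = [("aa", "b"), ("bb", "a")]

def rewrite(s):
    """Apply the rules repeatedly until no pattern matches.

    For some rule sets this loop may never terminate, which is
    exactly the undecidability being discussed in the thread.
    """
    changed = True
    while changed:
        changed = False
        for pattern, replacement in rules:
            if pattern in s:
                s = s.replace(pattern, replacement, 1)
                changed = True
    return s

print(rewrite("aaaa"))  # "aaaa" -> "baa" -> "bb" -> "a"
```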

The lambda calculus is one such type of computation system. The core element is a lambda expression of the form (\x.y), where "\" is just ASCII for the Greek letter lambda, which is hard for me to type right now. The core thing a lambda-calculus system does is take lambda expressions and apply a simple rewrite rule: where there is a sequence of the form (\x.y)z, where x, y, z can be anything, it substitutes the whole thing with "y", but wherever "x" appears inside "y", it writes "z". For example, "(\x. f x x) a" will be rewritten to "f a a".
Given an initial string, the engine that computes the lambda expression will apply this rewrite rule repeatedly until no more rewrites can be done. This may or may not reach an end (this is the halting problem - in a general case, it is impossible to know whether the computation will reach an ending point or not).

The rewrite rule is called "beta reduction", by the way, but try to remember the concept, not the name, which is arbitrary and unimportant.

Using lambda notation you can craft code that will do any of the things above. The code will look like a long lambda expression, which the lambda-calculus engine can work on following the standard rules to end up producing the end result.

The Y combinator is a specific (\x.y) lambda expression that, once applied to some value by the rewriting system, results in a computation being done on that value, plus an extra copy of the Y combinator, allowing a new iteration of the same computation. The system can thus keep on computing, which means it can compute anything requiring arbitrary recursion, such as the factorial of a number, the Ackermann function, the shortest path in an arbitrary graph, or whatever.

Although I've never seen Paul Graham explain it, my take is that the accelerator is called Y Combinator because they put some money in, this allows the startups to get to profitability, they get their money back, and they can use it again to fund new startups, thus achieving infinite effective resources from finite initial resources.

PS: the author of the original comment was handwaving and nitpicking since he didn't like being in the point of having missed something (which we all do quite often, incidentally, including myself in the previous comment). I don't like that behavior and I don't like the time and energy waste it results in, thus my dismissive response, apologies everyone for not responding constructively.

Exactly! Look at eg primitive recursion for an interesting subset of computation that does not suffer from undecidability.

Most programmers are familiar with regular expressions, too. They are another interesting subset.

What do you mean by "rewrite rules" - are you referring to beta reduction?

It has some minor uses, but not a whole lot. If, for example, you wanted to do recursion with an anonymous function, you could use the Y combinator (well, more likely the Z combinator which is a variant of the Y combinator that works with strict evaluation). However, it'd probably just be easier (and clearer!) to give the function a name and do it that way.
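For example, here's what anonymous recursion via the Z combinator looks like in Python (a sketch of my own, not from the comment):

```python
# Z combinator: the strict-evaluation variant of Y.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Sum 1..n recursively without ever naming the recursive function:
result = Z(lambda rec: lambda n: 0 if n == 0 else n + rec(n - 1))(10)
print(result)  # 55
```

As the comment says, a named helper would usually be clearer; the point is only that no name is needed.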

However, as someone pointed out, the Y combinator's utility is not as an actual tool for programmers but as a tool for mathematicians working in computational mathematics. Alonzo Church proved the Entscheidungsproblem unsolvable using the omega combinator, a primitive relative of the Y combinator. The Y combinator itself was invented by Haskell Curry (his name is familiar for a reason) in order to demonstrate what is commonly called Curry's paradox.

Well, in the early 1930s Church invented the LC in order to serve as a foundational logic for mathematics. This was around the time that Hilbert's Program was attempting to totally mechanize reasoning through rich logical languages, and Church wanted to use LC to define the notion of "effectively computable", which was part of Hilbert's specification.

The Y-combinator was originally discovered as a flaw in the LC. It meant that you couldn't use LC as a logical language, because you could form infinite loops, which are not just not effectively computable but also correspond to vacuous logical statements like "If A proves A then A is true", which, in the context of rules like reflexivity "If A then A" (provable in LC as λ x . x), is logical nonsense.

To be more clear, here's the Y combinator

    λ f . (λ x . f (x x)) (λ x . f (x x))
and here's the Y-combinator applied to `id = λ x . x`

    (λ x . (x x)) (λ x . (x x))
which reduces into itself—a dead loop and a logical inconsistency (named "omega" as the end of LC!)

Anyway, Church worked for the next 4 years to invent the Simply Typed Lambda Calculus, which imposes a Russell-like typing discipline on the LC in order to outlaw the Y-combinator. This made LC more useful for logic (and indeed STLC models Gentzen's Natural Deduction logical system) but it threw away the ability to compute arbitrary "recursively enumerable" functions. That latter notion is what eventually became the accepted definition of "effectively computable" (after the equivalence of Church's LC and Turing's Turing machines, invented the next year, was established).

And now we happen to care about writing programs which, in general, may be r.e., and so we use either untyped LC or a typed LC that has logically unsound recursion principles.

> (λ x . (x x)) (λ x . (x x))

This shows an infinite loop in lambda calculus evaluation. It can be arrived at independently (and I'd be surprised if it hadn't been) just by trying to build the simplest expression that doesn't reduce. Why do you need the Y combinator for this?

It had been. Everyone was just talking about the Y-combinator so I derived from there, but Omega was first.

Once you have any fixed-point, though, deriving the others is trivial. The explosive self-referential nature of your system has been uncovered.

Can you recommend me to any further reading regarding this or good literature? I've always meant to keep up with that, but aside from source papers is there a good overview of essentially what you're explaining? ie Further reading about Hilbert's Program.

The Stanford Encyclopedia of Philosophy is usually a good source. If you're in Boston, I'm going to be giving a Papers We Love talk on this stuff next week.

Yes, it is useful. For example, if you want to bootstrap a language implementation from the smallest possible subset, with a very trivial interpreter, then using Y-combinator as a way to implement recursion is reasonable (after all, performance does not matter for the bootstrap phase). See an example of such a use in https://github.com/combinatorylogic/mbase/blob/master/src/l/...

Named recursion requires a notion of time, which is clear from your distinction between "permanent" and "temporary" names. This can add unnecessary complexity.

Let's use an analogy: is it ever useful to call a function with another function's return value?

    p = \q. f (g q)
We can implement the same thing using named arguments with permanent or temporary names, for example:

    p = \q. let g_of_q = g q
             in f g_of_q
That forces us to write the same boilerplate every time. We can do better by encapsulating it in a function (DRY):

    compose = \a. \b. \c. let b_of_c = b c
                           in a b_of_c
    p = compose f g
The Y/Z combinators do exactly the same job as compose, except for self-application rather than chained application. They let us write down what we mean, directly, without having to break it up into artificial chunks and chain them back together. Notice that the final definition of "p" doesn't even need a lambda abstraction, since the function is being calculated automatically rather than defined manually.
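The compose analogy above can be sketched in Python (my own translation; the example functions f and g are mine):

```python
# compose packages "apply a to the result of b" as a reusable wiring
# pattern, just as Y/Z packages self-application.
compose = lambda a: lambda b: lambda c: a(b(c))

# Hypothetical f and g, standing in for the comment's examples:
f = lambda n: n + 1
g = lambda n: n * 2

# No lambda abstraction is needed to define p itself:
p = compose(f)(g)
print(p(10))  # f(g(10)) = f(20) = 21
```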

For example, I've used the Z combinator when defining arrays of functions in PHP. Without it, I'd have to define the functions separately, using brand-new function abstractions to inherit the self-reference (which is horribly verbose in PHP), then combine them into an array; or I'd have to define them separately, using brand-new function abstractions to inherit the array, then combine them into the array. Either way, it would add a bunch of complexity and boilerplate.

Y Combinator in TXR's embedded Lisp dialect:

      ;; The Y combinator:
      (defun y (f)
        [(op @1 @1)
         (op f (op [@@1 @@1]))])

      ;; The Y-combinator-based factorial:
      (defun fac (f)
        (do if (zerop @1)
               1
               (* @1 [f (- @1 1)])))

      ;; Test: 4! -> 24
      (pprinl [[y fac] 4])

There appears to be a typo in one of the lines.

The line

6 * (if 3 == 0 then 1 else 1 * (YF)(1-1))

The previous line is 6 * (λx.(if x == 0 then 1 else x * (YF)(x-1)) 1. When replacing the x's with 1s, it replaces one of the x's with 1, but replaces the first one with 3.

My guess was that this was copied from the first version, and they just forgot to change one of the threes to a 1.

(that is, unless I misunderstood something, which is of course possible)

I think the line should be

6 * (if 1 == 0 then 1 else 1 * (YF)(1-1))

Nice catch, thank you. I've fixed it now. :)

It still amazes me that you can define the Y combinator in Haskell directly:

    y f = f (y f)
And in ML, only slightly less directly:

    let rec y f x = f (y f) x
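The ML version translates almost word for word into any strict language; for instance, a sketch in Python (my own, not from the comment):

```python
# Like "let rec y f x = f (y f) x": the extra argument x
# (eta-expansion) delays evaluating y(f) until it's actually needed,
# which is what makes this definition work under strict evaluation.
def y(f):
    return lambda x: f(y(f))(x)

fact = y(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))
print(fact(6))  # 720
```

Note that, like the let rec version, this uses named recursion to define y itself; it's not a combinator in the strict lambda-calculus sense.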

I remember taking this at university; lambda calculus scared the hell out of me.

