Lecture 20 — 2018-03-29

Lambda calculus: recursion and the Y combinator

This lecture is written in literate Haskell; you can download the raw source.

The lambda calculus doesn’t have recursion built in, but we can do it anyway.

Recursion: finite prefixes

Consider a recursive version of plus on Church numerals:

plus = λm n. (isZero m) n (plus (pred m) (succ n))

Now, such a definition isn’t mathematically valid—we’ve defined a lambda calculus expression in terms of itself, which isn’t finitely solvable. But what about this:

plusF = λplusRec m n. (isZero m) n (plusRec (pred m) (succ n))

Now, plusF is a perfectly valid definition, and no matter what we give it as the first argument, it’ll give the right answer when m is zero:

plusF Ω zero n =β (isZero zero) n (Ω (pred zero) (succ n))
               =β true n (Ω (pred zero) (succ n))
               =β n

Now observe that if we give it an argument that will do another recursive step, we work on more inputs:

plusF (plusF Ω) one n =β (isZero one) n ((plusF Ω) (pred one) (succ n))
                      =β false n ((plusF Ω) (pred one) (succ n))
                      =β ((plusF Ω) (pred one) (succ n))
                      =β ((plusF Ω) zero (succ n))
                      =β (isZero zero) (succ n) (Ω (pred zero) (succ (succ n)))
                      =β true (succ n) (Ω (pred zero) (succ (succ n)))
                      =β (succ n)

If we give it plusF (plusF (plusF Ω)), then we’ll be good for m up to two, and so on: each extra plusF buys one more recursive step. So if we could only have an infinite number of plusF calls available, we’d be able to work on any input.
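
The same experiment can be run in Haskell, whose lazy evaluation never looks at an argument it doesn't need. This is only a sketch under some assumptions: ordinary Integers stand in for Church numerals, and a crashing omega stands in for the looping Ω. The names omega, plus0, plus1, and plus2 are made up for illustration.

```haskell
-- One "layer" of plus; plusRec stands for the recursive call.
plusF :: (Integer -> Integer -> Integer) -> Integer -> Integer -> Integer
plusF plusRec m n = if m == 0 then n else plusRec (m - 1) (n + 1)

-- Stand-in for Ω (assumption: the real Ω loops forever; this one fails loudly).
omega :: a
omega = error "omega: ran out of plusF layers"

-- Finite prefixes: each extra layer handles one more value of m.
plus0, plus1, plus2 :: Integer -> Integer -> Integer
plus0 = plusF omega                  -- correct when m = 0
plus1 = plusF (plusF omega)          -- correct when m <= 1
plus2 = plusF (plusF (plusF omega))  -- correct when m <= 2
```

Here plus2 2 5 evaluates to 7, while plus2 3 5 runs out of layers and reaches omega.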

What we want is technically known as a fixpoint combinator: a function Y such that Y e is a fixpoint of e, that is:

Y e = e (Y e)
    = e (e (Y e))
    = ...
    = e (... arbitrarily many times ... (Y e))
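
In Haskell we can write this equation down directly, because Haskell definitions are allowed to be recursive. That makes the sketch below a bit of a cheat (it borrows Haskell's recursion rather than building recursion from nothing), but it shows what a fixpoint operator buys us. The name fixpoint is made up; the standard library's Data.Function.fix plays the same role. Integers again stand in for Church numerals.

```haskell
-- Y e = e (Y e), written with Haskell's own recursion.
fixpoint :: (a -> a) -> a
fixpoint e = e (fixpoint e)

-- One "layer" of plus; plusRec stands for the recursive call.
plusF :: (Integer -> Integer -> Integer) -> Integer -> Integer -> Integer
plusF plusRec m n = if m == 0 then n else plusRec (m - 1) (n + 1)

-- Laziness unrolls exactly as many layers of plusF as the input needs.
plus :: Integer -> Integer -> Integer
plus = fixpoint plusF
```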

Recursion: infinite prefixes

Consider the term ω = λx. x x. What does ω do when applied to itself? It reduces right away to itself! This is kind of like running forever—Ω = (ω ω) will happily churn away, looping on its own forever.

We can use a behavior like this to get recursion in the lambda calculus, by using the paradoxical Y combinator.

Y = λf. (λx. f (x x)) (λx. f (x x))

We have, for all expressions e:

Y e =β (λx. e (x x)) (λx. e (x x))
    =β e ((λx. e (x x)) (λx. e (x x)))
    =  e (Y e) = e (e (Y e)) = ...

Whoa. How do we use this? The intuition is that any terminating recursive function calls itself only finitely many times, so some finite prefix of the infinite unrolling is always enough. Let’s try it on plusF:

plus = Y plusF

plus three one =β (isZero three) one ((Y plusF) (pred three) (succ one))
               =β false one ((Y plusF) (pred three) (succ one))
               =β (Y plusF) (pred three) (succ one)
               =β plusF (Y plusF) (pred three) (succ one)
               =β plusF (Y plusF) two two
               =β (isZero two) two ((Y plusF) (pred two) (succ two))
               =β false two ((Y plusF) (pred two) (succ two))
               =β (Y plusF) (pred two) (succ two)
               =β plusF (Y plusF) (pred two) (succ two)
               =β plusF (Y plusF) one three
               =β (isZero one) three ((Y plusF) (pred one) (succ three))
               =β false three ((Y plusF) (pred one) (succ three))
               =β (Y plusF) (pred one) (succ three)
               =β plusF (Y plusF) (pred one) (succ three)
               =β plusF (Y plusF) zero four
               =β (isZero zero) four ((Y plusF) (pred zero) (succ four))
               =β true four ((Y plusF) (pred zero) (succ four))
               =β four

Try to do this derivation on your own, without consulting these notes.
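
Once you have a derivation, you can check its result mechanically. The sketch below embeds untyped lambda terms in Haskell, assuming a made-up type D in which a term is a function on terms, plus an Int constructor used only to read numerals back out. The names D, F, N, (#), suc, prd, first, second, and toInt are all invented for this sketch, and the pair-based prd is one common way to write pred. Haskell is lazy, so the plain Y combinator works here as written.

```haskell
-- Untyped terms: F wraps a function on terms; N is only for reading results.
data D = F (D -> D) | N Int

infixl 9 #
-- Application. Haskell is lazy, so the argument is not evaluated here.
(#) :: D -> D -> D
F f # x = f x
N _ # _ = error "cannot apply a number"

true, false, zero, suc, isZero, pair, first, second, prd, y, plusF, plus :: D
true   = F $ \a -> F $ \_ -> a
false  = F $ \_ -> F $ \b -> b
zero   = F $ \_ -> F $ \z -> z
suc    = F $ \n -> F $ \s -> F $ \z -> s # (n # s # z)
isZero = F $ \n -> n # F (const false) # true
pair   = F $ \a -> F $ \b -> F $ \k -> k # a # b
first  = F $ \p -> p # true
second = F $ \p -> p # false
-- pred via the pair trick: iterate (a, b) |-> (b, succ b) from (zero, zero).
prd    = F $ \n -> first # (n # step # (pair # zero # zero))
  where step = F $ \p -> pair # (second # p) # (suc # (second # p))

-- Y = λf. (λx. f (x x)) (λx. f (x x))
y      = F $ \f -> F (\x -> f # (x # x)) # F (\x -> f # (x # x))
plusF  = F $ \plusRec -> F $ \m -> F $ \n ->
           isZero # m # n # (plusRec # (prd # m) # (suc # n))
plus   = y # plusF

-- Read a Church numeral back as an Int by applying it to (+1) and 0.
toInt :: D -> Int
toInt n = case n # F inc # N 0 of
  N k -> k
  F _ -> error "not a numeral"
  where inc (N k) = N (k + 1)
        inc (F _) = error "not a number"

one, three :: D
one   = suc # zero
three = suc # (suc # (suc # zero))
```

With these definitions, toInt (plus # three # one) evaluates to 4, matching the last line of the derivation.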

The call-by-value Y combinator

You may have noticed that we’ve used the equational theory rather than an evaluation function or a stepping relation. The equational theory is the ground truth of the lambda calculus and the easiest way for humans to reason about it. But your interpreters for HW06 run using call-by-value (CBV) semantics. Is that a problem?

Take a closer look at the derivation above. Notice that I chose specific beta reductions to make. If I wanted to, I could have derived:

plus three one =β (Y plusF) three one
               =β plusF (Y plusF) three one
               =β plusF (plusF (Y plusF)) three one
               =β plusF (plusF (plusF (Y plusF))) three one
               =β plusF (... however many times I want (Y plusF)) three one

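What’s going on? Here, I’ve simply chosen to keep unrolling Y plusF instead of ever reducing the application of plusF to its arguments. Incidentally, this is what CBV evaluation does: it insists on evaluating the argument Y plusF to a value before it calls plusF, and that evaluation never finishes.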

Should we be worried? Yes and no. Algebra has tons of equations we can use, and sometimes using them doesn’t lead anywhere we want to go. For example, an algebraic proof that n + 2 = (n + 1) + 1 might proceed along the lines of:

  (n + 1) + 1
=  n + (1 + 1)   (associativity of +)
=  n + 2         (definition of +)

But not every series of algebraic manipulations gets us somewhere worthwhile; I could just as easily have written an infinite series of irrelevant manipulations:

  (n + 1) + 1
= (1 + n) + 1    (commutativity of +)
= (n + 1) + 1    (commutativity of +)
= (1 + n) + 1    (commutativity of +)
= (n + 1) + 1    (commutativity of +)
= ...

The existence of this second set of equalities doesn’t change our earlier, more meaningful example.

So much for algebraic proof: how do we reconcile the equational theory and CBV evaluation? Y doesn’t behave right in CBV evaluation—your program just runs forever. The trick is to make sure that we don’t automatically evaluate the fixpoint—we’ll only unroll Y e as demanded by the program. We want a fixpoint that behaves like:

Y e = e (λx. (Y e) x)

CBV evaluation stops here, because the Y e is hidden under a lambda. So the call-by-value Y combinator can be defined as:

Y = λf. (λx y. f (x x) y) (λx y. f (x x) y);

For more information, the Wikipedia article on the Y combinator/fixpoint combinators is excellent. (They say “strict” where we, for our purposes, say call-by-value. What I’ve given you is a variant of the Z combinator.)
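
Haskell is lazy, so it doesn't normally run into this problem, but we can imitate call-by-value application with the Prelude operator ($!), which evaluates its argument before calling the function. The sketch below contrasts the two ways of unrolling. The names fixEager and fixCBV are made up, and Integers stand in for Church numerals.

```haskell
-- One "layer" of plus; plusRec stands for the recursive call.
plusF :: (Integer -> Integer -> Integer) -> Integer -> Integer -> Integer
plusF plusRec m n = if m == 0 then n else plusRec (m - 1) (n + 1)

-- Eager unrolling: the argument `fixEager e` must be evaluated before e is
-- called, which means evaluating `fixEager e` again, and so on forever.
-- (Defined for comparison only; calling it does not terminate.)
fixEager :: (a -> a) -> a
fixEager e = e $! fixEager e

-- Delayed unrolling, following Y e = e (λx. (Y e) x): the argument is a
-- lambda, which is already a value, so evaluation stops until it is called.
fixCBV :: ((a -> b) -> (a -> b)) -> (a -> b)
fixCBV e = e $! (\x -> fixCBV e x)

plus :: Integer -> Integer -> Integer
plus = fixCBV plusF
```

The only difference between the two definitions is the λx wrapped around the recursive occurrence, which is exactly the extra y in the call-by-value Y combinator.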

Another way to use the Y combinator

It might be hard to figure out how to use Y. Here’s a step-by-step recipe for how to go from Haskell code to code using the Y combinator.

plus 0 n = n
plus m n = plus (m-1) (n+1)

First step: eliminate pattern matching, since the lambda calculus doesn’t have that. Let’s rewrite this to use an if expression.

plus m n = if m == 0 then n else plus (m-1) (n+1)

Second step: write explicit lambdas.

plus = \m n -> if m == 0 then n else plus (m-1) (n+1)

Third step: use Church encodings.

plus = \m n -> (isZero m) n (plus (pred m) (succ n))

Fourth step: eliminate explicit recursion using Y. To do this, we come up with a new name—here, plusF—and add it as a new parameter to our function. We’ll use plusF to do a recursive call, and pass our whole function to Y.

plus = Y (\plusF m n -> (isZero m) n (plusF (pred m) (succ n)))

Fifth and final step: translate to lambda calculus syntax! This amounts to changing arrows to dots and giving the slashes little legs to make them lambdas.

plus = Y (λplusF m n. (isZero m) n (plusF (pred m) (succ n)))

CBV equivalents

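The above recipe isn’t exactly right under CBV. isZero m returns a boolean, but CBV evaluates both arguments of that boolean before the boolean gets to pick one… and one of those arguments is a recursive call! So the recursion never bottoms out.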

The solution is to ‘delay’ a little bit more. We can write the delayed Church booleans as:

true = λa b. a (λx. x)
false = λa b. b (λx. x)

The expectation here is that each choice (a or b) takes a single argument, which it ignores. By putting each choice under a lambda, we can delay evaluation until the choice is made.

Try to figure out how to write and, or, and not on your own. With these booleans, we can write our conditional by wrapping each branch in a lambda:

plus = Y (λplusF m n. (isZero m) (λx. n) (λx. (plusF (pred m) (succ n))))

Functions like these, which ignore their argument and exist only to delay evaluation, are called “thunks”.
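
To see all the CBV pieces working together, here is a sketch that embeds untyped lambda terms in Haskell and forces every argument before a call, which imitates CBV. Everything about the embedding is an assumption of this sketch: the type D, its constructors F and N, the operator (#), and the names ident, suc, prd, first, second, z, and toInt are invented here. It combines the call-by-value Y combinator (named z below) with the delayed booleans and thunked branches.

```haskell
-- Untyped terms: F wraps a function on terms; N is only for reading results.
data D = F (D -> D) | N Int

infixl 9 #
-- Call-by-value application: force the argument before calling the function.
(#) :: D -> D -> D
F f # x = x `seq` f x
N _ # _ = error "cannot apply a number"

ident, true, false, zero, suc, isZero, pair, first, second, prd, z, plusF, plus :: D
ident  = F id
-- Delayed booleans: pick a thunk, then run it by passing a dummy argument.
true   = F $ \a -> F $ \_ -> a # ident
false  = F $ \_ -> F $ \b -> b # ident
zero   = F $ \_ -> F $ \x -> x
suc    = F $ \n -> F $ \s -> F $ \x -> s # (n # s # x)
isZero = F $ \n -> n # F (const false) # true
pair   = F $ \a -> F $ \b -> F $ \k -> k # a # b
-- Pairs use plain (undelayed) selectors, since their components are not thunks.
first  = F $ \p -> p # F (\a -> F (const a))
second = F $ \p -> p # F (\_ -> F id)
prd    = F $ \n -> first # (n # step # (pair # zero # zero))
  where step = F $ \p -> pair # (second # p) # (suc # (second # p))

-- Call-by-value Y: λf. (λx y. f (x x) y) (λx y. f (x x) y)
z      = F $ \f -> w f # w f
  where w f = F $ \x -> F $ \v -> f # (x # x) # v

-- Both branches are thunks, so neither runs until the boolean picks one.
plusF  = F $ \plusRec -> F $ \m -> F $ \n ->
           isZero # m # F (const n) # F (\_ -> plusRec # (prd # m) # (suc # n))
plus   = z # plusF

-- Read a Church numeral back as an Int by applying it to (+1) and 0.
toInt :: D -> Int
toInt n = case n # F inc # N 0 of
  N k -> k
  F _ -> error "not a numeral"
  where inc (N k) = N (k + 1)
        inc (F _) = error "not a number"

one, three :: D
one   = suc # zero
three = suc # (suc # (suc # zero))
```

Here toInt (plus # three # one) evaluates to 4. If the two branches were passed directly instead of as thunks, the forced recursive call would keep unrolling and the program would not terminate.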