Homework 6.0

Untyped lambda calculus

Please submit homeworks via the new submission page.

In this homework, you’ll implement an interpreter for the untyped lambda calculus.

Your submission should be a zipfile including:

  • a Makefile, with a default target that builds an executable named interp;
  • your source files;
  • a file README.md, which lists the collaborators on this assignment (not more than three); and
  • a file fact.lc, which computes the factorial of 5.

If you’re going to use a language other than Haskell, please tell me about your choices in README.md—which version I should use, which libraries I need, and why you chose the given language.

Please submit a clean zipfile, i.e., you should include just your Makefile, source, README.md, and fact.lc.

I will grade your homework by unzipping the zipfile and running ‘make’. If that doesn’t work and the cause is something that is not my fault (like needing to install a library you used), you will get a zero.

Do not:

  • have your files inside a directory
  • include an existing interp executable
  • include any .hi or .o files
  • include weirdness like __MACOSX or .DS_Store files
  • include version control information like .git directories

To avoid getting a zero, test out your zipfile by unzipping it in a new directory and running make. Does it build your interpreter correctly?

The language

You will fundamentally be implementing the lambda calculus. A lambda calculus expression is defined by the following grammar:

e ::= x | e1 e2 | lambda x. e

Your concrete syntax should allow arbitrary whitespace between tokens (like lambda or .) and parenthesization for disambiguation (lambda x. x (x x) is different from lambda x. x x x).

Variable names should begin with an alphabetical character followed by zero or more alphanumeric or single-quote (') characters, i.e., x and foo3'5bar are valid variable names, but 12 and 'quoted' are not. As with the While language, please be careful to ensure that the identifiers and keywords are kept distinct.

Application should be left associative, i.e., x y z should parse like (x y) z.

Lambda expressions should be allowed to have more than one argument, i.e., lambda s z. s z should parse like lambda s. (lambda z. (s z)).

A program in our language is a sequence of let-statements or runnable “bare” expressions, separated by semi-colons, where the final semi-colon is optional. Here is a grammar:

program ::= statement (;statement)* ;?
statement ::= let x = e | e

For example, the following program computes the Church numeral representing the number two:

let zero = lambda s z. z;
let succ = lambda n. lambda s z. s (n s z);
succ (succ zero)

Note that each variable bound by a let statement is visible later on in the program, but not before.
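The grammar above could be turned into a parser along the following lines. This is a minimal sketch using ReadP from Haskell's base library, not the reference solution. The names Expr, Stmt, lexeme, and parseProgram are hypothetical, a lambda is only accepted at the start of an expression or inside parentheses, a single statement is accepted as a whole program, and all parse errors collapse into one uninformative message.

```haskell
import Data.Char (isAlpha, isAlphaNum)
import Text.ParserCombinators.ReadP

-- Hypothetical abstract syntax for expressions and statements.
data Expr = Var String | App Expr Expr | Lam String Expr
  deriving (Eq, Show)

data Stmt = Let String Expr | Bare Expr
  deriving (Eq, Show)

-- Run a parser, then skip any whitespace that follows it.
lexeme :: ReadP a -> ReadP a
lexeme p = p <* skipSpaces

symbol :: String -> ReadP String
symbol = lexeme . string

-- A maximal run of identifier characters. Note that isAlpha also accepts
-- non-ASCII letters, which may or may not be what you want.
word :: ReadP String
word = lexeme ((:) <$> satisfy isAlpha <*> munch (\c -> isAlphaNum c || c == '\''))

-- Keywords and identifiers are kept distinct by comparing whole words.
keyword :: String -> ReadP ()
keyword k = do
  w <- word
  if w == k then return () else pfail

ident :: ReadP String
ident = do
  w <- word
  if w `elem` ["lambda", "let"] then pfail else return w

atom :: ReadP Expr
atom = (Var <$> ident) +++ between (symbol "(") (symbol ")") expr

-- Application is left associative: x y z becomes App (App x y) z.
expr :: ReadP Expr
expr = lambda +++ (foldl1 App <$> many1 atom)

-- lambda s z. e is sugar for lambda s. (lambda z. e).
lambda :: ReadP Expr
lambda = do
  keyword "lambda"
  xs <- many1 ident
  _ <- symbol "."
  body <- expr
  return (foldr Lam body xs)

stmt :: ReadP Stmt
stmt = letStmt +++ (Bare <$> expr)
  where
    letStmt = Let <$> (keyword "let" *> ident) <*> (symbol "=" *> expr)

program :: ReadP [Stmt]
program = skipSpaces *> sepBy1 stmt (symbol ";") <* optional (symbol ";") <* eof

parseProgram :: String -> Either String [Stmt]
parseProgram src = case readP_to_S program src of
  [(stmts, "")] -> Right stmts
  _             -> Left "Parse error"
```

A real submission would probably want a parser library with source positions, so that error messages can point at a line and column.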

The interpreter

Your task is to implement an interpreter that reads in and evaluates programs. You’ll need to define syntax, a parser, a pretty printer, and an interpreter. Write a Makefile that compiles your code into a program called interp.

Your interpreter should use a call-by-value semantics, i.e., you only apply a beta rule when the argument has fully reduced to a lambda.
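To make the call-by-value rule concrete, here is a minimal sketch of an evaluator. It is an illustration under assumptions, not the required design: the Expr type and the names freeVars, fresh, subst, and eval are hypothetical, reduction never happens under a lambda, and a variable that is reached during evaluation is reported as unbound.

```haskell
import Data.List (union)

-- Hypothetical abstract syntax.
data Expr = Var String | App Expr Expr | Lam String Expr
  deriving (Eq, Show)

freeVars :: Expr -> [String]
freeVars (Var x)   = [x]
freeVars (App f a) = freeVars f `union` freeVars a
freeVars (Lam x b) = filter (/= x) (freeVars b)

-- Append primes until the name is unused; primes are legal in identifiers,
-- so renamed terms still print as parseable programs.
fresh :: String -> [String] -> String
fresh x used = until (`notElem` used) (++ "'") (x ++ "'")

-- subst x v e replaces the free occurrences of x in e with v,
-- renaming binders where v's free variables would otherwise be captured.
subst :: String -> Expr -> Expr -> Expr
subst x v e = case e of
  Var y
    | y == x    -> v
    | otherwise -> e
  App f a -> App (subst x v f) (subst x v a)
  Lam y b
    | y == x -> e
    | y `elem` fvs ->
        let y' = fresh y (x : (fvs ++ freeVars b))
        in Lam y' (subst x v (subst y (Var y') b))
    | otherwise -> Lam y (subst x v b)
  where
    fvs = freeVars v

-- Call by value: evaluate the function, then the argument, and only then
-- apply the beta rule. Lambdas are values and are returned unchanged.
eval :: Expr -> Either String Expr
eval (Var x)     = Left ("unbound variable " ++ x)
eval e@(Lam _ _) = Right e
eval (App f a)   = do
  f' <- eval f
  a' <- eval a
  case f' of
    Lam x b -> eval (subst x a' b)
    _       -> Left "cannot apply a non-function"  -- unreachable: eval only returns lambdas
```

Substitution is simple to get right but slow on large terms; an environment-based evaluator is a common alternative.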

After your interpreter evaluates all of the program, it should, by default, print out the result of each bare expression; for example, running interp on:

let zero = lambda s z. z;
let succ = lambda n. lambda s z. s (n s z);
succ (succ zero)

should produce:

lambda s z. s ((lambda s z. s ((lambda s z. z) s z)) s z)

and running interp on:

let zero = lambda s z. z;
let succ = lambda n. lambda s z. s (n s z);
succ (succ zero);
succ (succ (succ zero))

should produce:

lambda s z. s ((lambda s z. s ((lambda s z. z) s z)) s z)
lambda s z. s ((lambda s z. s ((lambda s z. s ((lambda s z. z) s z)) s z)) s z)

It’s critical (for grading) that your interpreter output parseable lambda expressions. For example:

let id = lambda x. x;
lambda y. id

should yield:

lambda y x. x
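A pretty printer that produces output in the style shown above might look like the following sketch. The Expr type and the names pretty and collect are hypothetical. The parenthesization policy (wrap every lambda that appears inside an application, and every application in argument position) is one safe choice that matches the sample outputs, not necessarily the only acceptable one.

```haskell
-- Hypothetical abstract syntax.
data Expr = Var String | App Expr Expr | Lam String Expr
  deriving (Eq, Show)

-- Gather the binders of directly nested lambdas, so that
-- lambda y. (lambda x. x) prints as lambda y x. x.
collect :: Expr -> ([String], Expr)
collect (Lam x b) = let (xs, body) = collect b in (x : xs, body)
collect e         = ([], e)

pretty :: Expr -> String
pretty (Var x)     = x
pretty e@(Lam _ _) = "lambda " ++ unwords xs ++ ". " ++ pretty body
  where
    (xs, body) = collect e
pretty (App f a) = fun f ++ " " ++ arg a
  where
    -- A lambda in function position needs parentheses; a nested
    -- application does not, because application is left associative.
    fun l@(Lam _ _) = parens l
    fun e           = pretty e
    -- Anything other than a variable in argument position is wrapped.
    arg v@(Var _) = pretty v
    arg e         = parens e
    parens e = "(" ++ pretty e ++ ")"
```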

Your interpreter should signal errors appropriately. The precise content of your error message isn’t the most important thing—though the more detailed they can be, the better!—but it is critical that interp exits with a non-zero exit code when there is an error. For example, when I run my version of interp on lambda. lambda lambda, I get the following error message:

Parse error: "parse.lc" (line 1, column 7):
unexpected keyword in place of variable (lambda)
expecting letter or digit, space or "'"

There are other errors that can occur, like unbound variables. For example, the program (lambda x. y) (lambda x. x) produces Error: unbound variable y when run. Note that (lambda a b. a) (lambda x. x) (lambda y. z) should not produce an error.

If you don’t encounter an error, your program should exit with a zero exit code.

Command-line arguments

Your interpreter should, when run without arguments, read all of the input from standard input, parse the input as a program, and then evaluate the program, pretty printing the result of each bare expression.

When given an argument, your interpreter should read the file as input (and then proceed to parse, evaluate, and print as above). If the file is specified as -, then you should follow UNIX convention and read from standard input. (Pro tip: never name a file -.) I don’t care what your program does when given more than one argument; mine uses the rightmost file given.

You should also implement two flags to your interpreter: -c for checking that all variables are well scoped, and -n for converting the final results from Church numerals to Arabic numerals.
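The argument handling described above can be sketched as a pure function over the argument list. Everything here is illustrative: the Options record and the names parseArgs and readInput are hypothetical, combined short flags are assumed to be allowed, the rightmost file wins (the behavior is unspecified for more than one file), and long options and the usage message are left out.

```haskell
-- Hypothetical record of the settings the interpreter needs.
data Options = Options
  { optCheck   :: Bool            -- -c was given
  , optNumeral :: Bool            -- -n was given
  , optInput   :: Maybe FilePath  -- Nothing means standard input
  } deriving (Eq, Show)

parseArgs :: [String] -> Options
parseArgs = foldl step (Options False False Nothing)
  where
    -- A lone "-" means standard input, following UNIX convention.
    step o "-" = o { optInput = Nothing }
    -- Short flags, alone or combined.
    step o ('-' : flags@(_ : _))
      | all (`elem` "cn") flags =
          o { optCheck   = optCheck o || 'c' `elem` flags
            , optNumeral = optNumeral o || 'n' `elem` flags
            }
    -- Anything else is treated as a file name in this sketch.
    step o file = o { optInput = Just file }

-- Read the whole program text from the chosen source.
readInput :: Options -> IO String
readInput = maybe getContents readFile . optInput
```

In a full program, the list passed to parseArgs would come from getArgs in System.Environment.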

The -c flag should turn on a “checking mode”, wherein you should check that a program is well scoped before running it. If a program is not well scoped, you should display an error message and exit with a non-zero exit code. For example:

$ echo lambda x. y >bad.lc
$ ./interp bad.lc
lambda x. y
$ ./interp -c bad.lc
Unbound variables: y
$ echo $? # displays the exit code of the last command
1

Note that the checker should run before your program does. For example:

$ echo '((lambda x. x x) (lambda x. x x))' z >bad2.lc
$ ./interp bad2.lc # runs forever
^C
$ ./interp -c bad2.lc
Unbound variables: z
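Because the check is purely syntactic, it can be written as a single pass over the parsed program, which is why it terminates even when evaluation would not. The sketch below assumes hypothetical Expr and Stmt types and a hypothetical function unboundVars. It also assumes that a let-bound name is not in scope in its own right-hand side, which is one reading of "visible later on in the program, but not before".

```haskell
import Data.List (nub)

-- Hypothetical abstract syntax.
data Expr = Var String | App Expr Expr | Lam String Expr
  deriving (Eq, Show)

data Stmt = Let String Expr | Bare Expr
  deriving (Eq, Show)

-- Collect every variable that is used without an enclosing lambda
-- or an earlier let binding it, in order of first occurrence.
unboundVars :: [Stmt] -> [String]
unboundVars = nub . go []
  where
    go _ [] = []
    -- The new name only becomes visible in the statements that follow.
    go env (Let x e : rest) = free env e ++ go (x : env) rest
    go env (Bare e : rest)  = free env e ++ go env rest

    free env (Var x)
      | x `elem` env = []
      | otherwise    = [x]
    free env (App f a) = free env f ++ free env a
    free env (Lam x b) = free (x : env) b
```

In checking mode, a non-empty result would be reported as an error before any evaluation starts.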

The -n flag should turn on a “Church numeral conversion mode”. Whenever you evaluate a bare expression in -n mode, you should then convert that expression from a Church numeral to an Arabic numeral. For example, suppose two-three.lc contains the text:

let zero = lambda s z. z;
let succ = lambda n. lambda s z. s (n s z);
succ (succ zero);
succ (succ (succ zero))

We should then have:

$ ./interp two-three.lc
lambda s z. s ((lambda s z. s ((lambda s z. z) s z)) s z)
lambda s z. s ((lambda s z. s ((lambda s z. s ((lambda s z. z) s z)) s z)) s z)
$ ./interp -n two-three.lc
2
3

Implementing the -n flag will require some creativity in extending your language. You’ll have to make some interesting internal changes—but please make no externally visible changes beyond adding the -n flag. I expect you to be able to convert any Church numeral—note that the two lambda terms we get out of two-three.lc aren’t the “standard” way of writing two and three, which would be lambda s z. s (s z) and lambda s z. s (s (s z)), respectively.

If you’re evaluating in -n mode and you get a term that can’t be interpreted as a Church numeral, you should issue an error message and exit with a non-zero exit code. For example:

$ echo lambda s z. s | ./interp -n
Couldn't extract a number from lambda s z. s
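One possible way to extend the language internally is to add a successor primitive and a number form that the parser never produces, apply the result to them, and keep evaluating. The sketch below illustrates that idea only; it is not necessarily how the reference interpreter works. The constructors Succ and Num and the function toNumeral are hypothetical, and the naive substitution assumes a closed program (for example, one that would pass -c), since call-by-value then only ever substitutes closed values.

```haskell
-- Hypothetical abstract syntax, extended with two internal-only forms.
data Expr
  = Var String
  | App Expr Expr
  | Lam String Expr
  | Succ          -- internal successor primitive
  | Num Integer   -- internal number
  deriving (Eq, Show)

-- Naive substitution: safe only when the substituted value is closed.
subst :: String -> Expr -> Expr -> Expr
subst x v e = case e of
  Var y | y == x   -> v
  App f a          -> App (subst x v f) (subst x v a)
  Lam y b | y /= x -> Lam y (subst x v b)
  _                -> e

-- Call-by-value evaluation with one extra rule for the primitive.
eval :: Expr -> Either String Expr
eval (Var x) = Left ("unbound variable " ++ x)
eval (App f a) = do
  f' <- eval f
  a' <- eval a
  case (f', a') of
    (Lam x b, _)  -> eval (subst x a' b)
    (Succ, Num n) -> Right (Num (n + 1))
    _             -> Left "stuck application"
eval v = Right v  -- Lam, Succ, and Num are values

-- Apply the term to the successor and zero, then read off the number.
toNumeral :: Expr -> Either String Integer
toNumeral e = do
  v <- eval (App (App e Succ) (Num 0))
  case v of
    Num n -> Right n
    _     -> Left "couldn't extract a number"
```

Because the term is run rather than inspected, this handles non-standard numerals such as the unreduced results of succ (succ zero).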

Note that -c and -n are orthogonal: I expect your code to work with every possible combination.

Finally, you’ll be a good citizen if you implement a usage message, as in:

$ ./interp --help
interp [OPTIONS] FILE (defaults to -, for stdin)
  lambda calculus interpreter

  -c --check    Check scope
  -n --numeral  Convert final Church numeral to a number
  -? --help     Display help message

I don’t require that you write a usage message, but it’s a good habit to get into.

Conservativity

You are free to write whatever error messages you like, though please be sure (a) that they go to stderr (not stdout) and (b) that you exit with a non-zero exit code.
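Both requirements can be met in one place at the end of the program. The following is a minimal sketch, assuming the rest of the interpreter produces either an error message or the list of lines to print; the names exitCodeFor and finish are hypothetical, and the choice of 1 as the failure code is just one non-zero value.

```haskell
import System.Exit (ExitCode (..), exitWith)
import System.IO (hPutStrLn, stderr)

-- Zero on success, non-zero on any error.
exitCodeFor :: Either String a -> ExitCode
exitCodeFor (Left _)  = ExitFailure 1
exitCodeFor (Right _) = ExitSuccess

-- Results go to stdout, error messages go to stderr,
-- and the process then exits with the matching code.
finish :: Either String [String] -> IO a
finish result = do
  case result of
    Left msg      -> hPutStrLn stderr ("Error: " ++ msg)
    Right outputs -> mapM_ putStrLn outputs
  exitWith (exitCodeFor result)
```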

Do not output unnecessary text when a program succeeds. While such debugging information may be valuable when you’re programming, it (a) is not part of the specification in this document and (b) will confuse my grader. If you’re not sure what output to give, please ask on Piazza.

Testing

I strongly recommend that you build a test suite. Include short programs and long programs; programs that fail and programs that succeed. In class we’ve been building up the booleans, the naturals, etc.—use these to test your code!

At a minimum, I expect you to turn in a file fact.lc which builds up enough of the naturals to compute the factorial of 5. We should have:

$ ./interp -cn fact.lc
120

A note on efficiency

While this class is explicitly not about efficiency, I expect your code to run in a reasonable amount of time. For example, here’s how long my code takes to calculate factorial of 5 (using Church numerals and the Y combinator):

$ time ./interp -nc fact.lc
120

real    0m0.014s
user    0m0.007s
sys     0m0.006s

Your code doesn’t need to be this fast, but if it takes more than five seconds, I think there’s cause for concern.