# Homework 5

## Parsing

This homework is written in literate Haskell; you can download the raw source to fill in yourself. You’re welcome to submit literate Haskell yourself, or to start fresh in a new file, literate or not.

Please submit homeworks via the DCI submission page.

There is a lot of coding in this assignment. Good luck! You can download working lexers and parsers for the pure lambda calculus to help you with problems 2 and 3. The starter includes two lexers—one in alex and one by hand—and two parsers—one in happy and one by hand, by recursive descent. (To install these tools, run `cabal install alex` and `cabal install happy` on your command line. They should already be installed on the lab machines.)

You should submit only one solution. You can mix and match: write the lexer however you want, then write the parser however you want. I find it most straightforward to lex manually and parse using happy, but play around and decide for yourself. Get started early.

You can test the code in the starter by unzipping it and running `make` in the `hw05_lc` directory. This should create an executable named `Main`, which can be run on any of the `.lc` files. Don’t worry if you don’t understand the code in `Main` yet—we’ll get to it soon.

You will need to submit all of your code, whether all in a single file (viable only if you write everything manually) or many files zipped up in one. If you submit more than one file, make it absolutely clear to the TAs which function runs your lexer and which runs your parser.

### Problem 1: parse trees

Please do problem 4.2 from Mitchell, page 83.

Example 4.2 specifies that multiplication and division have higher precedence than addition and subtraction, and that operators of the same precedence are left associative, e.g., 6 - 2 - 1 is interpreted as being equivalent to (6 - 2) - 1 and not 6 - (2 - 1).
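To make the grouping concrete, here is a minimal sketch in Haskell. The `Arith` type and the names `leftAssoc`, `rightAssoc`, and `precedence` are hypothetical, invented only for this illustration; they are not part of Mitchell's grammar or of the starter code. Each value is the tree that one reading of the expression would produce.

```haskell
-- Hypothetical mini-AST for arithmetic, used only to illustrate grouping.
data Arith
  = Num Int
  | Sub Arith Arith
  | Mul Arith Arith
  deriving (Show, Eq)

-- Evaluate a tree bottom-up.
eval :: Arith -> Int
eval (Num n)   = n
eval (Sub l r) = eval l - eval r
eval (Mul l r) = eval l * eval r

-- 6 - 2 - 1 read left-associatively: (6 - 2) - 1
leftAssoc :: Arith
leftAssoc = Sub (Sub (Num 6) (Num 2)) (Num 1)

-- The rejected right-associative reading: 6 - (2 - 1)
rightAssoc :: Arith
rightAssoc = Sub (Num 6) (Sub (Num 2) (Num 1))

-- 1 - 2 * 3 with multiplication binding tighter: 1 - (2 * 3)
precedence :: Arith
precedence = Sub (Num 1) (Mul (Num 2) (Num 3))
```

The two readings of 6 - 2 - 1 are different trees with different values (3 and 5), which is why the grammar has to pin down associativity.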

### Problem 2: lexing the `Expr` language

``import Data.Char``

Write a lexer for the `Expr` language.

```haskell
type Id = String

data Expr =
    EVar Id
  | ETrue
  | EFalse
  | EIf Expr Expr Expr
  | ENum Int
  | EIncr Expr
  | EDecr Expr
  | EIsZero Expr
  | EApp Expr Expr
  | ELam Id Expr
  deriving (Show, Eq)
```

Here are some sample programs in the `Expr` language’s syntax:

``incr 5``
``if isZero x then true else f (decr x)``
``\s -> \z -> s (s z)``
``(\increment -> increment 5) (\x -> incr x)``

Here are some invalid samples, which should fail in the lexer:

``?**&``
``\x -> (!)``

What kinds of tokens should you use? How do you make sure that keywords override identifiers (i.e., `\incr -> incr` isn’t valid because `incr` clashes with the built-in `incr` operation) but that identifiers can harmlessly include keywords, like `increment` above?
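One possible approach in a hand-written lexer, sketched below, is to read the longest alphanumeric run first and only then check whether the whole word is a keyword. The `Token` constructors, the `keywords` table, and `lexWord` are all hypothetical names chosen for this sketch; your token type will probably look different, and this covers only word-like tokens, not numbers, `\`, `->`, or parentheses.

```haskell
import Data.Char (isAlpha, isAlphaNum)
import Data.Maybe (fromMaybe)

-- Hypothetical token type; only the word-like tokens are shown.
data Token
  = TIncr | TDecr | TIsZero
  | TIf | TThen | TElse
  | TTrue | TFalse
  | TId String
  deriving (Show, Eq)

-- Keyword table (an assumption about how one might organise it).
keywords :: [(String, Token)]
keywords =
  [ ("incr", TIncr), ("decr", TDecr), ("isZero", TIsZero)
  , ("if", TIf), ("then", TThen), ("else", TElse)
  , ("true", TTrue), ("false", TFalse)
  ]

-- Lex one word from the front of the input: take the longest
-- alphanumeric run, then classify the whole run as keyword or identifier.
-- Returns Nothing if the input does not start with a letter.
lexWord :: String -> Maybe (Token, String)
lexWord s@(c:_)
  | isAlpha c =
      let (word, rest) = span isAlphaNum s
      in Just (fromMaybe (TId word) (lookup word keywords), rest)
lexWord _ = Nothing
```

Because the comparison happens on the complete word, `incr` becomes a keyword token while `increment` stays an identifier. Rejecting `\incr -> incr` is then the parser's job: it sees a keyword token where it expects an identifier.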

To help you get started, you can download a lexer and parser for the pure lambda calculus. Note that the syntaxes are already slightly different: that parser uses a `.` in lambdas, but we’re going to use `->`.

You have a choice for how to implement your lexer. You can either use alex, the automatic lexer generator, or you can write it by hand. There are pros and cons either way: alex gets keyword/identifier overriding correct automatically, but it’s a new tool to learn. Both versions are in Lexer.x—you only need to do one.

### Problem 3: parsing the `Expr` language

Once you have a lexer, you must write a parser for the `Expr` language. If you haven’t by this point, you should consider downloading working lexers and parsers for the pure lambda calculus. Here are some more sample programs demonstrating the syntax I want you to use for the `Expr` language, which is slightly different from that in the starter:

A file containing:

``if isZero 0 then incr 1 else decr 5``

should parse to `EIf (EIsZero (ENum 0)) (EIncr (ENum 1)) (EDecr (ENum 5))`. If you evaluated this code (which is different from parsing it!) it should return 2.

A file containing:

``(\x -> true) false``

should parse to `EApp (ELam "x" ETrue) EFalse`.

A file containing:

``a b true``

should parse to `EApp (EApp (EVar "a") (EVar "b")) ETrue`.
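The last example shows that application associates to the left. In a hand-written parser, one way to get that shape is to parse a head expression and a list of arguments, then fold them together. This is only a sketch: `applyAll` is a hypothetical helper, not something from the starter code, and the `Expr` type is repeated here so the block stands alone.

```haskell
type Id = String

data Expr =
    EVar Id
  | ETrue
  | EFalse
  | EIf Expr Expr Expr
  | ENum Int
  | EIncr Expr
  | EDecr Expr
  | EIsZero Expr
  | EApp Expr Expr
  | ELam Id Expr
  deriving (Show, Eq)

-- Hypothetical helper: apply a head expression to its arguments,
-- left to right, so that  f a b  becomes  EApp (EApp f a) b.
applyAll :: Expr -> [Expr] -> Expr
applyAll = foldl EApp

-- The tree for the source text:  a b true
example :: Expr
example = applyAll (EVar "a") [EVar "b", ETrue]
```

A left fold nests earlier applications deeper in the tree; a right fold would instead build `a (b true)`, which is a different program.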

Here are some invalid samples, which should fail in the parser:

``incr 1 2``
``\x ->``
``\1 -> 1``

### Problem 4: extending the lexer and parser

We’re going to add let and let rec to our language by merely extending the parser—`Expr` won’t change at all. That is, let and let rec will be syntactic sugar, clever encodings done in the parser.

I recommend you do this problem in two steps: first get let working, then get let rec working.

First, you’ll need to add tokens to support let syntax. The syntax of let is `let id = expr in expr`. What tokens do you need to add?

Once you’ve added the appropriate tokens, you need to get the parser to encode the let in `Expr`. Suppose you have `let x = e1 in e2`. You can encode this in the lambda calculus as (λx. e2) e1… how can you translate that to `Expr`?
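As a rough sketch of the shape of that encoding, the function below builds the `Expr` for (λx. e2) e1 from its three parts. `desugarLet` is a hypothetical name; in a happy grammar the same expression would more likely sit directly in the semantic action of the let rule. The `Expr` type is repeated so the block stands alone.

```haskell
type Id = String

data Expr =
    EVar Id
  | ETrue
  | EFalse
  | EIf Expr Expr Expr
  | ENum Int
  | EIncr Expr
  | EDecr Expr
  | EIsZero Expr
  | EApp Expr Expr
  | ELam Id Expr
  deriving (Show, Eq)

-- Hypothetical helper:  let x = e1 in e2  ==>  (\x -> e2) e1
desugarLet :: Id -> Expr -> Expr -> Expr
desugarLet x e1 e2 = EApp (ELam x e2) e1

-- let x = 5 in incr x
example :: Expr
example = desugarLet "x" (ENum 5) (EIncr (EVar "x"))
```

Note the order: the body `e2` goes under the lambda, and the bound expression `e1` becomes the argument.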

Once you’ve gotten let working, you should work on let rec. The syntax for let rec is `let rec id = expr in expr`. Let rec should allow recursion, i.e., in `let rec x = e1 in e2`, the expression `e1` should be allowed to recursively reference `x`.

We can encode recursive definitions using the Y combinator. When a programmer writes `let rec f = e1 in e2`, you can encode it as (λf. e2) (Y (λf. e1)). What tokens do you need to add? Why is that encoding correct?
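For reference, here is one way the fixed-point combinator could be written as an `Expr` value, using the standard definition λf. (λx. f (x x)) (λx. f (x x)). This is a sketch under assumptions: `yComb` and `desugarLetRec` are hypothetical names, and the assignment does not say which fixed-point combinator to use. Whether this particular one terminates depends on the evaluation strategy, which is not specified here.

```haskell
type Id = String

data Expr =
    EVar Id
  | ETrue
  | EFalse
  | EIf Expr Expr Expr
  | ENum Int
  | EIncr Expr
  | EDecr Expr
  | EIsZero Expr
  | EApp Expr Expr
  | ELam Id Expr
  deriving (Show, Eq)

-- \f -> (\x -> f (x x)) (\x -> f (x x)), written as an Expr value.
yComb :: Expr
yComb = ELam "f" (EApp half half)
  where
    half = ELam "x" (EApp (EVar "f") (EApp (EVar "x") (EVar "x")))

-- Hypothetical helper:
--   let rec f = e1 in e2  ==>  (\f -> e2) (Y (\f -> e1))
desugarLetRec :: Id -> Expr -> Expr -> Expr
desugarLetRec f e1 e2 = EApp (ELam f e2) (EApp yComb (ELam f e1))
```

Since `yComb` is a closed term, its internal names `f` and `x` cannot clash with the programmer's variables.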

Good luck, and happy hacking!

### Final “please submit the right things” plea

If you write at least one of your lexer and parser by hand, you only need to turn in one file:

• both lexer and parser in one,
• Lexer.x (for your alex lexer) with your parser included at the bottom, or
• Parser.y (for your happy parser) with your lexer included at the bottom.

If you use both alex and happy, you’ll need to turn in a zipfile containing Lexer.x and Parser.y.

You don’t need to turn in the Main.hs driver or Makefile we give you, though it’s fine if you do. We’ll ignore them.