CS 334
<Declaration> ::= <Type> <Declarator>
<Type>        ::= int | char
<Declarator>  ::= '*' <Declarator>
                | <Declarator> '[' number ']'
                | <Declarator> '(' <Type> ')'
                | '(' <Declarator> ')'
                | name
a. Prove the syntactic ambiguity of this grammar by finding a string that has two distinct parse trees. Draw the parse trees.
b. The constructs '[' number ']' and '('
c. Suppose that the first production for Declarator is changed to
make '*' a postfix operator (i.e., it goes after the Declarator
rather than in front of it). Why is the resulting grammar unambiguous?
As in last week's assignment, we will presume that we
have a parser which parses input into an abstract syntax tree, which
your interpreter should use. The definition of the ML datatype is
You are to write an ML function
interp that takes an abstract syntax tree representing a term and
returns the result of evaluating it, which will also be an abstract syntax
tree. The reduction should be done according to the rules given below. The
expression "e => v" means that the term "e" evaluates to "v" (and
then can be evaluated no further). The rules below are written for the
expressions in the original grammar. Your program should be written
for the equivalent expressions using the abstract syntax trees (elements of
type "term").
The base cases are:
(1) n => n for n an integer.
(2) true => true, and similarly for false
(3) error => error
(4) succ => succ, and similarly for the other initial
functions
The other cases are slightly more complicated. They are written in the form
of a rule in the manner of the following example:
a. Use these rules to write an interpreter, interp: term -> term, for the
subset of the language which does not include terms of the form AST_ID, AST_FUN,
or AST_REC. If your interpreter is asked to evaluate any of these three kinds of
expressions, it should return the error term, AST_ERROR.
Note: In my directory, ~kim/home/cs334stuff/ML/ML.interps,
you will find a file, parser.sml, an ML program that
parses strings or files containing an expression from the simple BNF
grammar given above into an expression built from the AST terms. Thus, if you
use "parser.sml"; and then write succ 7, the system will
return AST_APP(AST_SUCC, AST_NUM 7). Similarly, if you have a
file "foo.pcf" containing succ 7, parsefile "foo.pcf"
returns AST_APP(AST_SUCC, AST_NUM 7). Feel free to use these
functions to generate abstract syntax trees, which is much easier than
typing in the long AST terms directly. You will find in the same directory
the skeleton of a program called "PCF.interp.student.sml", which
also contains brief explanations and examples.
b. The notation e[x := v] indicates the textual substitution of v for all free
occurrences of x in e. For example, (succ x) [x:=1] is the expression
(succ 1). Please write an ML function subst that takes a term,
t, a string representing a variable, v, and a term, s,
and returns t with all free occurrences of v (actually
AST_ID v) replaced by s. Thus, the function
application (corresponding to (succ x) [x:=1], above),
Do not substitute in for bound occurrences of variables. I.e., substituting
3 for x in (x + ((fn x => 2+x) 8)) should result in
(3 + ((fn x => 2+x) 8)). The formal
parameter x, and its occurrences in the body of the function are not affected
by the substitution because of the static scoping rules.
Hint: Just as in part a, use
pattern-matching on each constructor of the abstract syntax tree, calling
subst recursively when you need to.
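The hint above can be illustrated with a minimal sketch of one possible subst, written against the term datatype from this assignment (repeated here so the fragment stands alone). It is not the only way to structure the function, and it assumes that the replacement term s contains no free variables that a binder in t could capture; that assumption is mine, not part of the assignment text.

```sml
datatype term =
    AST_ID of string | AST_NUM of int | AST_BOOL of bool
  | AST_SUCC | AST_PRED | AST_ISZERO
  | AST_IF of (term * term * term) | AST_ERROR
  | AST_FUN of (string * term) | AST_APP of (term * term)
  | AST_REC of (string * term)

(* subst t v s: replace the free occurrences of AST_ID v in t by s.
   Assumption: s has no free variables that could be captured. *)
fun subst (AST_ID y) v s = if y = v then s else AST_ID y
  | subst (AST_IF (b, e1, e2)) v s =
      AST_IF (subst b v s, subst e1 v s, subst e2 v s)
  | subst (AST_APP (e1, e2)) v s = AST_APP (subst e1 v s, subst e2 v s)
  | subst (AST_FUN (y, body)) v s =
      (* a binder for v shadows it, so the body is left alone *)
      if y = v then AST_FUN (y, body) else AST_FUN (y, subst body v s)
  | subst (AST_REC (y, body)) v s =
      if y = v then AST_REC (y, body) else AST_REC (y, subst body v s)
  | subst t _ _ = t   (* numbers, booleans, initial functions, error *)
```

With this sketch, substituting 3 for x in (x (fn x => x)) changes only the first x; the occurrence bound by the fn is untouched.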
c. Using your substitution function, extend your interp
function from part a to
include AST_FUN terms. The reduction for the terms involving
AST_FUN should be done according to the rules given below:
Functions by themselves don't do anything (just like succ and
pred above)
Notice that while terms of the form (AST_ID s) can appear whenever
s is a formal parameter, we never need to evaluate terms of the form
(AST_ID s), because they are always replaced by the subst
function before we evaluate the body of the function.
Any variables
which have not yet been replaced by other terms at the time of evaluation
represent unbound variables (those not introduced as formal parameters).
You should return AST_ERROR if your interpreter is applied to
a term of that form.
d. Surprisingly enough, evaluating recursive terms is pretty trivial.
First let's talk about what a term of the form rec x => e
actually means. It corresponds to the definition of a recursive function
called x. Let's work with an example. The term
The rules for evaluating a recursive term are pretty simple. Just evaluate
the body of the term, where all occurrences of the recursively defined
variable are replaced by the entire rec term.
Notes:
Rather than complicating the interpreter, the parser translates let clauses
as if they were function applications. Thus the general form gets
translated as if it were
e ::= x | n | true | false | succ | pred | iszero |
if e then e else e | (fn x => e) | (e e) | rec x => e
In the above, "x" is a variable, "n" stands for an integer, "true" and "false"
are the truth values, "succ" and "pred" are unary functions which add 1 to or
subtract 1 from their argument, "iszero" is a unary function which returns
"true" if its argument is 0 and "false" otherwise, "if...else..." is a
conditional expression, "fn x => e" is a function with formal parameter "x"
and body "e", and "(e e)" represents function application. (Don't worry about
"rec x => e" for now! It is used for defining recursive functions.)
datatype term =
    AST_ID of string | AST_NUM of int | AST_BOOL of bool
  | AST_SUCC | AST_PRED | AST_ISZERO
  | AST_IF of (term * term * term) | AST_ERROR
  | AST_FUN of (string * term) | AST_APP of (term * term)
  | AST_REC of (string * term)
As before, this definition mirrors the BNF grammar given above; for instance,
the constructor AST_ID makes a string into an identifier or variable,
and the constructor AST_FUN makes a string representing the formal
parameter and a term representing the body of the function into a function.
Interpreting abstract syntax trees is much easier than trying to interpret
terms directly.
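To make the correspondence between the grammar and the constructors concrete, here is a small sketch that builds three terms by hand. The datatype is the one from this assignment, repeated so the fragment stands alone; the value names succSeven, cond, and incr are illustrative only.

```sml
datatype term =
    AST_ID of string | AST_NUM of int | AST_BOOL of bool
  | AST_SUCC | AST_PRED | AST_ISZERO
  | AST_IF of (term * term * term) | AST_ERROR
  | AST_FUN of (string * term) | AST_APP of (term * term)
  | AST_REC of (string * term)

(* succ 7 *)
val succSeven = AST_APP (AST_SUCC, AST_NUM 7)

(* if (iszero 0) then 1 else 2 *)
val cond = AST_IF (AST_APP (AST_ISZERO, AST_NUM 0), AST_NUM 1, AST_NUM 2)

(* fn x => (succ x) *)
val incr = AST_FUN ("x", AST_APP (AST_SUCC, AST_ID "x"))
```

Because every component type admits equality, terms built this way can be compared with = when testing an interpreter.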
              b => true    e1 => v
(5)     ---------------------------
          if b then e1 else e2 => v
We read the rule from the bottom up: if the expression is an if-then-else with
components b, e1, and e2, and b evaluates to true and e1 returns v, then the
entire expression returns v. Of course, we also have the symmetric rule
              b => false    e2 => v
(6)     ----------------------------
          if b then e1 else e2 => v
The following are some of the cases for applications:
          e1 => succ    e2 => n
(7)     ----------------------------
           (e1 e2) => (n+1)

          e1 => pred    e2 => 0            e1 => pred    e2 => (n+1)
(8)     ---------------------------      ----------------------------
               (e1 e2) => 0                      (e1 e2) => n

          e1 => iszero    e2 => 0          e1 => iszero    e2 => (n+1)
(9)     ---------------------------      ------------------------------
              (e1 e2) => true                   (e1 e2) => false
Here is a simple example using these rules: Evaluate
if (iszero 0) then 1 else 2
According to rules 5 and 6, we must first evaluate (iszero 0).
By rule (9), this evaluates to true. Now by rule (5) (and the fact
that 1 => 1 via rule 1), this evaluates to 1.
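As a rough illustration of how rules (1) through (9) might translate into ML, the sketch below covers only the part a subset. It is one possible shape, not a reference solution. Two choices here are my own assumptions rather than part of the rules: any application or conditional that matches no rule yields AST_ERROR, and negative integers (which the rules do not mention) are treated like positive ones.

```sml
datatype term =
    AST_ID of string | AST_NUM of int | AST_BOOL of bool
  | AST_SUCC | AST_PRED | AST_ISZERO
  | AST_IF of (term * term * term) | AST_ERROR
  | AST_FUN of (string * term) | AST_APP of (term * term)
  | AST_REC of (string * term)

fun interp (AST_NUM n) = AST_NUM n                      (* rule 1 *)
  | interp (AST_BOOL b) = AST_BOOL b                    (* rule 2 *)
  | interp AST_ERROR = AST_ERROR                        (* rule 3 *)
  | interp AST_SUCC = AST_SUCC                          (* rule 4 *)
  | interp AST_PRED = AST_PRED
  | interp AST_ISZERO = AST_ISZERO
  | interp (AST_IF (b, e1, e2)) =
      (case interp b of
           AST_BOOL true => interp e1                   (* rule 5 *)
         | AST_BOOL false => interp e2                  (* rule 6 *)
         | _ => AST_ERROR)                              (* assumption *)
  | interp (AST_APP (e1, e2)) =
      (case (interp e1, interp e2) of
           (AST_SUCC, AST_NUM n) => AST_NUM (n + 1)     (* rule 7 *)
         | (AST_PRED, AST_NUM 0) => AST_NUM 0           (* rule 8, left *)
         | (AST_PRED, AST_NUM n) => AST_NUM (n - 1)     (* rule 8, right *)
         | (AST_ISZERO, AST_NUM 0) => AST_BOOL true     (* rule 9, left *)
         | (AST_ISZERO, AST_NUM _) => AST_BOOL false    (* rule 9, right *)
         | _ => AST_ERROR)                              (* assumption *)
  | interp _ = AST_ERROR   (* AST_ID, AST_FUN, AST_REC are errors in part a *)
```

Applying this interp to the tree for if (iszero 0) then 1 else 2 follows the same path as the worked example: the condition reduces by rule (9), and rule (5) then selects the first branch.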
subst (AST_APP(AST_SUCC, AST_ID "x")) "x" (AST_NUM 1)
gives the answer
AST_APP(AST_SUCC, AST_NUM 1).
(10) (fn x => e) => (fn x => e)
Computations occur when you apply these functions to arguments. The next rule
defines call-by-value function application, as in ML. If the function is of
the form fn x => e, evaluate the operand to a value, v1,
substitute v1
in for the formal parameter in e, and then evaluate the modified body:
          e1 => (fn x => e3)    e2 => v1    e3[x:=v1] => v
(11)    --------------------------------------------------------
                          (e1 e2) => v
For instance, in evaluating the application
((fn x => (succ x)) (succ 0))
we first note that the function is already fully evaluated, so we evaluate
(succ 0) to 1, and then plug this in for x in the body,
(succ x), of the function, obtaining (succ 1), which
evaluates to 2.
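The walk-through above can be traced with a deliberately cut-down sketch that handles only numbers, succ, functions, and application, which is just enough to show rules (10) and (11) at work. It is not a full interpreter. The subst included here is one possible version and assumes the substituted value has no capturable free variables; returning AST_ERROR for anything unhandled is likewise my assumption.

```sml
datatype term =
    AST_ID of string | AST_NUM of int | AST_BOOL of bool
  | AST_SUCC | AST_PRED | AST_ISZERO
  | AST_IF of (term * term * term) | AST_ERROR
  | AST_FUN of (string * term) | AST_APP of (term * term)
  | AST_REC of (string * term)

(* One possible subst; bound occurrences of v are left alone. *)
fun subst (AST_ID y) v s = if y = v then s else AST_ID y
  | subst (AST_IF (b, e1, e2)) v s =
      AST_IF (subst b v s, subst e1 v s, subst e2 v s)
  | subst (AST_APP (e1, e2)) v s = AST_APP (subst e1 v s, subst e2 v s)
  | subst (AST_FUN (y, e)) v s =
      if y = v then AST_FUN (y, e) else AST_FUN (y, subst e v s)
  | subst (AST_REC (y, e)) v s =
      if y = v then AST_REC (y, e) else AST_REC (y, subst e v s)
  | subst t _ _ = t

(* Fragment of an interpreter: only what the example needs. *)
fun interp (AST_NUM n) = AST_NUM n
  | interp AST_SUCC = AST_SUCC
  | interp (AST_FUN f) = AST_FUN f                       (* rule 10 *)
  | interp (AST_APP (e1, e2)) =
      (case interp e1 of
           AST_FUN (x, e3) =>
             (* rule 11: evaluate the operand, substitute, evaluate the body *)
             interp (subst e3 x (interp e2))
         | AST_SUCC =>
             (case interp e2 of
                  AST_NUM n => AST_NUM (n + 1)           (* rule 7 *)
                | _ => AST_ERROR)
         | _ => AST_ERROR)
  | interp _ = AST_ERROR   (* everything else is out of scope here *)
```

Note that the operand is evaluated before the substitution, which is what makes this call-by-value.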
rec sum => fn x => fn y => if (iszero x) then y else sum (pred x) (succ y)
represents the following equivalent recursive function definition:
sum x y = if x = 0 then y else sum (x - 1) (y + 1)
Thus the variable immediately after rec names the function being defined.
          e[x:=rec x => e] => v
(12)    --------------------------
           (rec x => e) => v
Thus we can evaluate:
rec sum => fn x => fn y => if (iszero x) then y else sum (pred x) (succ y)
by replacing it with
fn x => fn y => if (iszero x) then y else sum' (pred x) (succ y)
where sum' abbreviates the entire expression above which begins
with rec sum ....
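The unfolding step just described can be sketched as a small helper, here called unfold (a hypothetical name, not part of the assignment), which performs only the substitution in the premise of rule (12). An interpreter clause for AST_REC would then evaluate the result. The subst below is one possible version and assumes the rec term has no free variables that a binder could capture.

```sml
datatype term =
    AST_ID of string | AST_NUM of int | AST_BOOL of bool
  | AST_SUCC | AST_PRED | AST_ISZERO
  | AST_IF of (term * term * term) | AST_ERROR
  | AST_FUN of (string * term) | AST_APP of (term * term)
  | AST_REC of (string * term)

(* One possible subst; bound occurrences of v are left alone. *)
fun subst (AST_ID y) v s = if y = v then s else AST_ID y
  | subst (AST_IF (b, e1, e2)) v s =
      AST_IF (subst b v s, subst e1 v s, subst e2 v s)
  | subst (AST_APP (e1, e2)) v s = AST_APP (subst e1 v s, subst e2 v s)
  | subst (AST_FUN (y, e)) v s =
      if y = v then AST_FUN (y, e) else AST_FUN (y, subst e v s)
  | subst (AST_REC (y, e)) v s =
      if y = v then AST_REC (y, e) else AST_REC (y, subst e v s)
  | subst t _ _ = t

(* unfold (hypothetical helper): e[x := rec x => e] for a rec term,
   and the identity on every other term. *)
fun unfold (AST_REC (x, e)) = subst e x (AST_REC (x, e))
  | unfold t = t

(* rec sum => fn x => fn y =>
     if (iszero x) then y else sum (pred x) (succ y) *)
val sumRec =
  AST_REC ("sum",
    AST_FUN ("x", AST_FUN ("y",
      AST_IF (AST_APP (AST_ISZERO, AST_ID "x"),
              AST_ID "y",
              AST_APP (AST_APP (AST_ID "sum", AST_APP (AST_PRED, AST_ID "x")),
                       AST_APP (AST_SUCC, AST_ID "y"))))))

val unfolded = unfold sumRec
```

Here unfolded is the fn x => fn y => ... term in which the inner sum has been replaced by the whole of sumRec, matching the sum' abbreviation used above.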
let x = E1 in E2 end
For example,
let z = 2 in succ z end
is recognized by the parser. This makes it much easier to write
interesting terms of the language to evaluate.
(fn x => E2) E1
A moment's thought will show you that this has exactly the same meaning as
the let clause. Thus the example above will be parsed as:
AST_APP (AST_FUN ("z",AST_APP (AST_SUCC,AST_ID "z")),AST_NUM 2)
Thus you may use let clauses in creating examples to test your interpreter,
without having to worry about including a new clause in the interpreter.
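The translation performed by the parser amounts to a one-line rewrite, sketched below. The function name desugarLet is hypothetical and chosen only for illustration; the real parser.sml may organize this differently.

```sml
datatype term =
    AST_ID of string | AST_NUM of int | AST_BOOL of bool
  | AST_SUCC | AST_PRED | AST_ISZERO
  | AST_IF of (term * term * term) | AST_ERROR
  | AST_FUN of (string * term) | AST_APP of (term * term)
  | AST_REC of (string * term)

(* desugarLet (hypothetical helper):
   let x = e1 in e2 end   becomes   (fn x => e2) e1 *)
fun desugarLet (x, e1, e2) = AST_APP (AST_FUN (x, e2), e1)

(* let z = 2 in succ z end *)
val example = desugarLet ("z", AST_NUM 2, AST_APP (AST_SUCC, AST_ID "z"))
```

The value example is the same tree as the parsed term shown above for let z = 2 in succ z end.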
Compiler.Control.Print.printDepth := 100;
This assignment statement, which updates a variable in a module, tells ML to
print longer answers. Normally ML truncates an answer once its structure is
nested too deeply, inserting a hash symbol (#) to indicate where it has
elided parts of the answer.
kim@cs.williams.edu