Day02_types

Separate Compilation

Before getting started, we need to import all of our definitions from the previous chapter:

Types

For the Require Export to work, you first need to use coqc to compile Day01_intro.v into Day01_intro.vo. If your Proof General is well configured, the Require Export line above should trigger the compile automatically.

If it doesn't work, there are a few things you can do to fix things.

1. Check to make sure that Day02_types.v is in the same directory as Day01_intro.v.

2. Check to make sure that you didn't change any filenames. Downloading things multiple times will cause little (1)s to appear in filenames, and that will break things.

3. Try compiling manually. From the command line, go to the directory where your files are and run:

coqc -Q . DMFP Day02_types.v

You might not have coqc on your PATH. You'll need to add either /Applications/CoqIDE_.../Contents/Resources/bin/ to your PATH or C:\Coq\bin, depending on platform.

If you have trouble (e.g., if you get complaints about missing identifiers later in the file), it may be because the "load path" for Coq is not set up correctly. The Print LoadPath. command may be helpful in sorting out such issues.

In particular, if you see a message like

Compiled library Foo makes inconsistent assumptions over library Coq.Init.Bar

you should check whether you have multiple installations of Coq on your machine. If so, it may be that commands (like coqc) that you execute in a terminal window are getting a different version of Coq than commands executed by Proof General.

Compound Types

The types we have defined so far are examples of "enumerated types": their definitions explicitly enumerate a finite set of elements, each of which is just a bare constructor. Here is a more interesting type definition, where one of the constructors takes an argument:

Inductive rgb : Type :=
  | red
  | green
  | blue.

Inductive color : Type :=
  | black
  | white
  | primary (p : rgb).

Let's look at this in a little more detail.

Every inductively defined type (day, bool, rgb, color, etc.) contains a set of constructor expressions built from constructors like red, primary, true, false, monday, etc. The definitions of rgb and color say how expressions in the sets rgb and color can be built:

red, green, and blue are the constructors of rgb;
black, white, and primary are the constructors of color;
the expression red belongs to the set rgb, as do the expressions green and blue;
the expressions black and white belong to the set color;
if p is an expression belonging to the set rgb, then primary p (pronounced "the constructor primary applied to the argument p") is an expression belonging to the set color; and
expressions formed in these ways are the only ones belonging to the sets rgb and color.

We can define functions on colors using pattern matching just as we have done for day and bool.

Definition monochrome (c : color) : bool :=
  match c with
  | black ⇒ true
  | white ⇒ true
  | primary p ⇒ false
  end.

Since the primary constructor takes an argument, a pattern matching primary should include either a variable (as above) or a constant of appropriate type (as below). How does monochrome white evaluate? We can write out the computation like so, simulating the way Coq works:

monochrome white

    computes to

    match white with
     | black ⇒ true
     | white ⇒ true
     | primary p ⇒ false
     end

Coq considers each case in turn, comparing the scrutinee (here, white) against the patterns (here, black, white, and primary p).

The scrutinee will always be a value; the pattern will be a mix of constructors and variable names. The match construct tries to match the scrutinee to each pattern, where a match is when every constructor is matched with an identical constructor or a variable.

Concretely, the scrutinee white doesn't match the pattern black, because white and black are different constructors. So Coq will skip that case, and keep evaluating:

     match white with
     | black ⇒ true
     | white ⇒ true
     | primary p ⇒ false
     end

    computes to

     match white with
     | white ⇒ true
     | primary p ⇒ false
     end

We're being particularly careful with our computation steps---as we gain facility with Coq, we won't need to step each case. But here it helps us see that each pattern is considered in turn, from top to bottom.

The next pattern is white, which matches our scrutinee white, so our pattern matches and we can execute the code in that branch of the match:

     match white with
     | white ⇒ true
     | primary p ⇒ false
     end

    computes to

true

In this case, there were no variables in the match to keep track of, but in general, Coq will bind each variable in a pattern to the corresponding part of the scrutinee. For example, suppose we had run monochrome (primary blue):

monochrome (primary blue)

    computes to

     match (primary blue) with
     | black ⇒ true
     | white ⇒ true
     | primary p ⇒ false
     end

    computes to

     match (primary blue) with
     | white ⇒ true
     | primary p ⇒ false
     end

    computes to

     match (primary blue) with
     | primary p ⇒ false
     end

At this point, the argument to the primary constructor, blue, in the scrutinee corresponds to the variable p in the pattern primary p. So what Coq will do is evaluate the corresponding branch of the match, remember that p is equal to blue. (In this case, p doesn't occur at all on the right-hand side of ⇒, so it doesn't matter. But it could!)

     match (primary blue) with
     | primary p ⇒ false
     end

     computes to

false (* with p bound to blue *)

Definition isred (c : color) : bool :=
  match c with
  | black ⇒ false
  | white ⇒ false
  | primary red ⇒ true
  | primary _ ⇒ false
  end.

The pattern primary _ here is shorthand for "primary applied to any rgb constructor except red." Recall again that Coq applies patterns in order. For example:

isred (primary green)

computes to

     match (primary green) with
     | black ⇒ false
     | white ⇒ false
     | primary red ⇒ true
     | primary _ ⇒ false

computes to

     match (primary green) with
     | white ⇒ false
     | primary red ⇒ true
     | primary _ ⇒ false

computes to

     match (primary green) with
     | primary red ⇒ true
     | primary _ ⇒ false

At this point, our scrutinee primary green doesn't match the pattern primary red: while the outermost constructors are both primary, the arguments differ: green and red are different constructors of color. So Coq will skip this pattern:

     match (primary green) with
     | primary red ⇒ true
     | primary _ ⇒ false

computes to

match (primary green) with
| primary _ ⇒ false

computes to

false

In this case, the wildcard pattern _ has the same effect as the dummy pattern variable p in the definition of monochrome. Both functions don't actually use the argument to primary. We could have written isred a different way:

In this definition, we explicitly name the primary color we're working with and do a nested pattern match. We can compute isred' (primary green) as follows:

isred' (primary green)

computes to

computes to

     match (primary green) with
     | white ⇒ false
     | primary p ⇒
       match p with
       | red ⇒ true
       | _ ⇒ false
       end
     end

computes to

     match (primary green) with
     | primary p ⇒
       match p with
       | red ⇒ true
       | _ ⇒ false
       end
     end

computes to

     match green with
     | red ⇒ true
     | _ ⇒ false
     end

computes to

     match green with
     | _ ⇒ false
     end

computes to

false

It's good practice to use wildcards when you don't need to name the variable---it helps prevent mistakes, like referring to the wrong variable. Do keep in mind, though: patterns are matched top to bottom. An early wildcard may rule out later cases! Coq should warn you if this is the case.

It's also good practice to try to condense pattern matching: the definition of isred is cleaner than that of isred'.

Exercise: 1 star, standard (is_weekday')

Define a function is_weekday' that behaves like is_weekday but has fewer than four cases. Remember: the order of cases matters!

Definition is_weekday' (d:day) : bool (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
☐

Modules

Coq provides a module system, to aid in organizing large developments. In this course we won't need most of its features, but one is useful: If we enclose a collection of declarations between Module X and End X markers, then, in the remainder of the file after the End, these definitions are referred to by names like X.foo instead of just foo. We will use this feature to introduce the definition of the type nat in an inner module so that it does not interfere with the one from the standard library (which we want to use in the rest because it comes with a tiny bit of convenient special notation).

Module NatPlayground.

Numbers

In this section, we'll define the natural numbers, that is, the numbers:

0, 1, 2, 3, ...

You may have heard in a math class that 0 isn't a natural number---for our purposes, it will be. Computer scientists almost always start counting from 0!

In other programming languages, it's easy to take numbers for granted: int is simply a built-in type, represented in some low-level way on the computer (typically binary).

But in Coq we're building everything from scratch. The representation we'll be using is called unary, which is an odd sort of way of writing numbers. You can think of it like tick marks, so the number zero is represented by no tick marks, the number one is one tick mark, the number two is two tick marks, etc.

That is, a unary natural number is either:

zero (i.e., no tick marks), or
a natural number followed by another tick mark.

Notice that our definition of natural numbers is in terms of natural numbers themselves---that is, to allow the rules describing the natural numbers are inductive.

Here's the above definition translated into Coq:

Inductive nat : Type :=
| O
| S (n : nat).

The clauses of this definition can be read:

O is a natural number (note that this is the letter "O," not the numeral "0").
S can be put in front of a natural number to yield another one -- if n is a natural number, then S n is too.

Again, let's look at this in a little more detail. The definition of nat says how expressions in the set nat can be built:

O and S are constructors;
the expression O belongs to the set nat;
if n is an expression belonging to the set nat, then S n is also an expression belonging to the set nat; and
expressions formed in these two ways are the only ones belonging to the set nat.

The same rules apply for our definitions of day, bool, color, etc.

The above conditions are the precise force of the Inductive declaration. They imply that the expression O, the expression S O, the expression S (S O), the expression S (S (S O)), and so on all belong to the set nat, while other expressions built from data constructors, like true, andb true false, S (S false), and O (O (O S)) do not.

A critical point here is that what we've done so far is just to define a representation of numbers: a way of writing them down. The names O and S are arbitrary, and at this point they have no special meaning -- they are just two different marks that we can use to write down numbers (together with a rule that says any nat will be written as some string of S marks followed by an O). If we like, we can write essentially the same definition this way:

Inductive nat' : Type :=
| stop
| tick (foo : nat').

The interpretation of these marks comes from how we use them to compute. We use O and S rather than these longer names because (a) concision is nice, (b) we'll be using nat quite a bit, and (c) that's how the Coq standard library defines them.

We interpret by writing functions that pattern match on representations of natural numbers just as we did above with booleans and days -- for example, here is the predecessor function, which takes a given number and returns one less than that number, i.e.,

pred (S n) should compute to n

For example, pred (S O) computes to O, i.e., 0 is the predecessor of 1.

Definition pred (n : nat) : nat :=
  match n with
    | O ⇒ O
    | S n' ⇒ n'
  end.

Our definition has an unfortunate property: pred O computes to O, i.e., 0 is its own predecessor. A mathematician might take a different course of action, saying that pred O is undefined---there really is no natural number that comes before 0, since 0 is the first one! Coq won't let us leave anything undefined, though, so we'll simply have to be careful to remember that pred O = O.

The second branch can be read: "if n has the form S n' for some n', then return n'."

End NatPlayground.

Because natural numbers are such a pervasive form of data, Coq provides a tiny bit of built-in magic for parsing and printing them: ordinary arabic numerals can be used as an alternative to the "unary" notation defined by the constructors S and O. Coq prints numbers in arabic form by default:

Check (S (S (S (S O)))).
(* ===> 4 : nat *)

Definition minustwo (n : nat) : nat :=
  match n with
    | O ⇒ O
    | S O ⇒ O
    | S (S n') ⇒ n'
  end.

Compute (minustwo 4).
(* ===> 2 : nat *)

The constructor S has the type nat → nat, just like pred and functions like minustwo:

Check S.
Check pred.
Check minustwo.

These are all things that can be applied to a number to yield a number. However, there is a fundamental difference between the first one and the other two: functions like pred and minustwo come with computation rules -- e.g., the definition of pred says that pred 2 can be simplified to 1 -- while the definition of S has no such behavior attached. Although it is like a function in the sense that it can be applied to an argument, it does not do anything at all! It is just a way of writing down numbers. (Think about standard arabic numerals: the numeral 1 is not a computation; it's a piece of data. When we write 111 to mean the number one hundred and eleven, we are using 1, three times, to write down a concrete representation of a number.)

For most function definitions over numbers, just pattern matching is not enough: we also need recursion, i.e., functions that refer to themselves. For example, to check that a number n is even, we may need to recursively check whether n-2 is even. To write such functions, we use the keyword Fixpoint.

In your prior programming experience, you may not have spent a lot of thought on recursion. Many languages use loops (using words like while or for) rather than recursion. Coq doesn't have loops---only recursion. Don't worry if you're not particularly familiar with recursion... you'll get lots of practice!

Fixpoint evenb (n:nat) : bool :=
  match n with
  | O ⇒ true
  | S O ⇒ false
  | S (S n') ⇒ evenb n'
  end.

The call to evenb n' is a recursive call. One might worry: is it okay to define a function in terms of itself? How do we know that evenb terminates?

We can check an example: is 3 even?

evenb 3

is the same as

evenb (S (S (S O)))

computes to

     match (S (S (S O))) with
     | O ⇒ true
     | S O ⇒ false
     | S (S n') ⇒ evenb n'
     end

computes to

     match (S (S (S O))) with
     | S O ⇒ false
     | S (S n') ⇒ evenb n'
     end

computes to

     match (S (S (S O))) with
     | S (S n') ⇒ evenb n'
     end

computes to

evenb n' (* with n' bound to S O *)

is the same as

evenb (S 0)

computes to

     match (S O) with
     | O ⇒ true
     | S O ⇒ false
     | S (S n') ⇒ evenb n'
     end

computes to

     match (S O) with
     | S O ⇒ false
     | S (S n') ⇒ evenb n'
     end

computes to

false

More generally, evenb only ever makes recursive calls on smaller inputs: if we put in the number n, we'll make a recursive call on two less than n.

A surprising fact: Coq will only let us write functions that are guaranteed to terminate on all inputs. A good rule of thumb is that every recursive function should make a recursive call only on subparts of its input.

We can define oddb by a similar Fixpoint declaration, but here is a simpler definition:

Definition oddb (n:nat) : bool := negb (evenb n).

Compute (oddb 1).
Compute (oddb 4).

Module NatPlayground2.

Fixpoint plus (n : nat) (m : nat) : nat :=
  match n with
    | O ⇒ m
    | S n' ⇒ S (plus n' m)
  end.

Adding three to two now gives us five, as we'd expect.

Compute (plus 3 2).

The simplification that Coq performs to reach this conclusion can be visualized as follows, a little more succinctly than what we wrote above:

(*  plus (S (S (S O))) (S (S O))
==> S (plus (S (S O)) (S (S O)))
      by the second clause of the match
==> S (S (plus (S O) (S (S O))))
      by the second clause of the match
==> S (S (S (plus O (S (S O)))))
      by the second clause of the match
==> S (S (S (S (S O))))
      by the first clause of the match
*)

As a notational convenience, if two or more arguments have the same type, they can be written together. In the following definition, (n m : nat) means just the same as if we had written (n : nat) (m : nat).

Fixpoint mult (n m : nat) : nat :=
  match n with
    | O ⇒ O
    | S n' ⇒ plus m (mult n' m)
  end.

Compute (mult 3 3).
Compute (mult 5 2).

You can match two expressions at once by putting a comma between them:

Fixpoint minus (n m:nat) : nat :=
  match n, m with
  | O , _ ⇒ O
  | S _ , O ⇒ n
  | S n', S m' ⇒ minus n' m'
  end.

Our minus function has the same problem pred does: minus 0 n results in 0! This 'truncating' subtraction is necessary because we're working with natural numbers, i.e., there are no negative numbers.

Keep in mind: these definitions have suggestive names, but it's up to us as humans to agree that minus corresponds to our notion of subtraction. I would wager that it does: if we pick an n and an m such that m ≤ n, then minus n m yields n - m. We don't have the tools to prove this... yet!

Again, the _ in the first line is a wildcard pattern. Writing _ in a pattern is the same as writing some variable that doesn't get used on the right-hand side. This avoids the need to invent a variable name.

The pow function defines exponentiation: exp n m should yield n to the mth power.

Fixpoint pow n m :=
  match m with
    | 0 ⇒ 1
    | S m ⇒ n × (pow n m )
  end.

End NatPlayground2.

Exercise: 1 star, standard (factorial)

Here's the standard mathematical factorial function:

       factorial(0)  =  1
       factorial(n)  =  n * factorial(n-1)     (if n>0)

We'll meet this function again later in the course. For now, translate this into Coq.

Fixpoint factorial (n:nat) : nat
(* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
☐

Compute (factorial 3).
Compute (factorial 5).

Notation "x + y" := (plus x y)
                       (at level 50, left associativity)
                       : nat_scope.
Notation "x - y" := (minus x y)
                       (at level 50, left associativity)
                       : nat_scope.
Notation "x * y" := (mult x y)
                       (at level 40, left associativity)
                       : nat_scope.

Check ((0 + 1) + 1).

(The level, associativity, and nat_scope annotations control how these notations are treated by Coq's parser. The details are not important for our purposes, but interested readers can refer to the optional "More on Notation" section at the end of this chapter.)

Note that these do not change the definitions we've already made: they are simply instructions to the Coq parser to accept x + y in place of plus x y and, conversely, to the Coq pretty-printer to display plus x y as x + y.

Comparisons

When we say that Coq comes with almost nothing built-in, we really mean it: even equality testing for numbers is a user-defined operation! We now define a function eqb, which tests natural numbers for equality, yielding a boolean. Note the use of nested matches (we could also have used a simultaneous match, as we did in minus.)

The leb function tests whether its first argument is less than or equal to its second argument, yielding a boolean.

Exercise: 2 stars, standard (leb)

Fixpoint leb (n m : nat) : bool (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
(* Do not modify the following line: *)
Definition manual_grade_for_test_leb : option (nat ×string) := None.
☐

Compute (leb 2 2).
Compute (leb 2 4).
Compute (leb 4 2).

Exercise: 1 star, standard (ltb)

The blt_nat function tests natural numbers for less-than, yielding a boolean. Instead of making up a new Fixpoint for this one, define it in terms of a previously defined function.

Definition ltb (n m : nat) : bool
(* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
(* Do not modify the following line: *)
Definition manual_grade_for_ltb : option (nat ×string) := None.
☐

Compute (ltb 2 2).
Compute (ltb 2 4).
Compute (ltb 4 2).

Inductive ordering : Type :=
  | LT
  | EQ
  | GT.

Fixpoint compare (n m : nat) : ordering :=
  match n, m with
  | O, O ⇒ EQ
  | S _, O ⇒ GT
  | O, S _ ⇒ LT
  | S n', S m' ⇒ compare n' m'
  end.

We can define a notion of minuimum.

Definition min (n₁ n₂ : nat) : nat :=
  if leb n₁ n₂
  then n₁
  else n₂.

We can also define argmin, which an optimization function: given a notion of cost, choose the lower cost option. We'll meet the {A : Type} syntax in detail later, and we'll talk more about cost (a higher-order function) in the next section.

This next definition uses the let ... in form to define a local variable; Definition and Fixpoint give you new top-level definitions, but let is how you get new local definitions. (Well, each branch of a match has its own definitions, but that's not exactly the same!)

The general syntax is:
let NAME : TYPE := EXPR in
EXPR

Definition argmin {A:Type} (cost : A → nat) (o₁ o₂ : A) : A :=
  let c₁ := cost o₁ in
  let c₂ := cost o₂ in
  if leb c₁ c₂
  then o₁
  else o₂.

We can lift these definitions to talk about three things. Here's a manual lifting:

Definition min3 (n₁ n₂ n₃ : nat) : nat :=
  if leb n₁ n₂
  then if leb n₁ n₃
       then n₁
       else n₃
  else if leb n₂ n₃
       then n₂
       else n₃.

Definition argmin3 {A:Type} (cost : A → nat) (o₁ o₂ o₃ : A) : A :=
  let c₁ := cost o₁ in
  let c₂ := cost o₂ in
  let c₃ := cost o₃ in
  if leb c₁ c₂
  then if leb c₁ c₃
       then o₁
       else o₃
  else if leb c₂ c₃
       then o₂
       else o₃.

And here's how we'd reuse the definitions:

Definition min3' (n₁ n₂ n₃ : nat) : nat :=
min n₁ (min n₂ n₃).

Definition argmin3' {A:Type} (cost : A → nat) (o₁ o₂ o₃ : A) : A :=
argmin cost o₁ (argmin cost o₂ o₃).

Fixpoints and Structural Recursion

Here is a copy of the definition of addition:

Fixpoint plus' (n : nat) (m : nat) : nat :=
  match n with
  | O ⇒ m
  | S n' ⇒ S (plus' n' m)
  end.

When Coq checks this definition, it notes that plus' is "decreasing on 1st argument." What this means is that we are performing a structural recursion over the argument n -- i.e., that we make recursive calls only on strictly smaller values of n. This implies that all calls to plus' will eventually terminate. Coq demands that some argument of every Fixpoint definition is "decreasing."

This requirement is a fundamental feature of Coq's design: In particular, it guarantees that every function that can be defined in Coq will terminate on all inputs. However, because Coq's "decreasing analysis" is not very sophisticated, it is sometimes necessary to write functions in slightly unnatural ways.

Exercise: 2 stars, standard (decreasing)

To get a concrete sense of this, find a way to write a sensible Fixpoint definition (of a simple function on numbers, say) that does terminate on all inputs, but that Coq will reject because of this restriction.

(* FILL IN HERE *)
(* Do not modify the following line: *)
Definition manual_grade_for_decreasing : option (nat ×string) := None.
☐

Functions as Data

Like most modern programming languages -- especially other "functional" languages, including OCaml, Haskell, Racket, Scala, Clojure, etc. -- Coq treats functions as first-class citizens, allowing them to be passed as arguments to other functions, returned as results, stored in data structures, etc.

Higher-Order Functions

Functions that manipulate other functions are often called higher-order functions. Here's a simple one:

Definition doit3times (f:nat →nat) (n:nat) : nat :=
f (f (f n)).

The argument f here is itself a function (from X to X); the body of doit3times applies f three times to some value n.

Check doit3times : (nat → nat ) → nat → nat.

Compute (doit3times minustwo 9). (* ==> 3 *)

Compute (doit3times (fun n ⇒ S n) 2). (* ==> 5 *)

Note that fun n ⇒ S n is morally equivalent to just S:

Compute (doit3times S 2). (* ==> 5 *)

We can construct a function "on the fly" without declaring it at the top level or giving it a name.

Check (fun n ⇒ n × n). (* ==> nat -> nat *)

The expression (fun n ⇒ n + n) can be read as "the function that, given a number n, yields n + n", i.e., a function that doubles its input.

Compute (doit3times (fun n ⇒ n + n) 2). (* ==> 16 *)

Work out on paper how doit3times (fun n ⇒ n + n) 2 evaluate. Here are the first few steps:

   doit3times (fun n ⇒ n + n) 2
]] =
[[
   (fun n ⇒ n + n) ((fun n ⇒ n + n) ((fun n ⇒ n + n) 2))
]] =
[[
   (fun n ⇒ n + n) ((fun n ⇒ n + n) (2 + 2))

Pick up from here. Note: there's more than one way to do this, but you should get the same answer no matter what.

We can use the fix keyword instead of fun to define a recursive function on the fly. We'll have to give a name to it, though.

Check (fix evenb' (n : nat) : bool :=
       match n with
       | O ⇒ true
       | S O ⇒ false
       | S (S n') ⇒ evenb' n'
       end).

Functions That Construct Functions

Most of the higher-order functions we have talked about so far take functions as arguments. Let's look at some examples that involve returning functions as the results of other functions. To begin, here is a function that takes a value x (drawn from some type X) and returns a function from nat to X that yields x whenever it is called, ignoring its nat argument.

Definition constfun {X: Type} (x: X) : nat →X :=
fun (k:nat) ⇒ x.

Definition ftrue := constfun true.

Compute ftrue 0. (* ===> true *)

Compute (constfun 5) 99. (* ===> 5 *)

In fact, the multiple-argument functions we have already seen are also examples of passing functions as data. To see why, recall the type of plus.

Check plus : nat → nat → nat.

Each → in this expression is actually a binary operator on types. This operator is right-associative, so the type of plus is really a shorthand for nat → (nat → nat) -- i.e., it can be read as saying that "plus is a one-argument function that takes a nat and returns a one-argument function that takes another nat and returns a nat." In the examples above, we have always applied plus to both of its arguments at once, but if we like we can supply just the first. This is called partial application.

Definition plus3 := plus 3.
Check plus3 : nat → nat.

Compute (plus3 4). (* ==> 7 *)
Compute (doit3times plus3 0). (* ==> 9 *)
Compute (doit3times (plus 3) 0). (* ==> 9 *)

Exercise: 1 star, standard (partial_app_minus)

Partial application is a powerful tool, used frequently by functional programmers. It can be very confusing at first, though. Explain what answer you might expect just from reading the code. Why do we get the answer we do?

Compute (doit3times (minus 1) 10).

(* FILL IN HERE *)
(* Do not modify the following line: *)
Definition manual_grade_for_partial_app_minus : option (nat ×string) := None.
☐

(* Mon Oct 12 08:48:47 PDT 2020 *)