Day02_types

Separate Compilation

Before getting started, we need to import all of our definitions from the previous chapter:

Require Export DMFP.Day01_intro.

Types

For the Require Export to work, you first need to use coqc to compile Day01_intro.v into Day01_intro.vo. If your Proof General is well configured, the Require Export line above should trigger the compile automatically.
If it doesn't work, there are a few things you can do to fix things.
1. Check to make sure that Day02_types.v is in the same directory as Day01_intro.v.
2. Check to make sure that you didn't change any filenames. Downloading things multiple times will cause little (1)s to appear in filenames, and that will break things.
3. Try compiling manually. From the command line, go to the directory where your files are and run:
coqc -Q . DMFP Day02_types.v
You might not have coqc on your PATH. You'll need to add either /Applications/CoqIDE_.../Contents/Resources/bin/ to your PATH or C:\Coq\bin, depending on platform.
If you have trouble (e.g., if you get complaints about missing identifiers later in the file), it may be because the "load path" for Coq is not set up correctly. The Print LoadPath. command may be helpful in sorting out such issues.
In particular, if you see a message like
Compiled library Foo makes inconsistent assumptions over library Coq.Init.Bar
you should check whether you have multiple installations of Coq on your machine. If so, it may be that commands (like coqc) that you execute in a terminal window are getting a different version of Coq than commands executed by Proof General.

Compound Types

The types we have defined so far are examples of "enumerated types": their definitions explicitly enumerate a finite set of elements, each of which is just a bare constructor. Here is a more interesting type definition, where one of the constructors takes an argument:

Inductive rgb : Type :=
  | red
  | green
  | blue.

Inductive color : Type :=
  | black
  | white
  | primary (p : rgb).
Let's look at this in a little more detail.
Every inductively defined type (day, bool, rgb, color, etc.) contains a set of constructor expressions built from constructors like red, primary, true, false, monday, etc. The definitions of rgb and color say how expressions in the sets rgb and color can be built:
  • red, green, and blue are the constructors of rgb;
  • black, white, and primary are the constructors of color;
  • the expression red belongs to the set rgb, as do the expressions green and blue;
  • the expressions black and white belong to the set color;
  • if p is an expression belonging to the set rgb, then primary p (pronounced "the constructor primary applied to the argument p") is an expression belonging to the set color; and
  • expressions formed in these ways are the only ones belonging to the sets rgb and color.
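The membership rules above can be checked mechanically. The following is a minimal sketch, not part of the course files: the module name ColorSketch is hypothetical and only keeps these copies of rgb and color from clashing with the real ones, and Fail is a Coq command that succeeds exactly when the command it wraps is rejected.

```coq
Module ColorSketch.  (* hypothetical wrapper, only to avoid name clashes *)

Inductive rgb : Type :=
  | red
  | green
  | blue.

Inductive color : Type :=
  | black
  | white
  | primary (p : rgb).

(* A bare constructor is already an expression of its type. *)
Check red : rgb.

(* primary on its own takes an rgb and yields a color. *)
Check primary : rgb -> color.
Check (primary red) : color.

(* primary expects an rgb, and black is a color, so this is rejected. *)
Fail Check (primary black).

End ColorSketch.
```

The last command illustrates the final rule: no expression outside the listed forms belongs to color.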
We can define functions on colors using pattern matching just as we have done for day and bool.

Definition monochrome (c : color) : bool :=
  match c with
  | black ⇒ true
  | white ⇒ true
  | primary p ⇒ false
  end.
Since the primary constructor takes an argument, a pattern matching primary should include either a variable (as above) or a constant of appropriate type (as below). How does monochrome white evaluate? We can write out the computation like so, simulating the way Coq works:

    monochrome white
    computes to

    match white with
     | black ⇒ true
     | white ⇒ true
     | primary p ⇒ false
     end
Coq considers each case in turn, comparing the scrutinee (here, white) against the patterns (here, black, white, and primary p).
The scrutinee will always be a value; the pattern will be a mix of constructors and variable names. The match construct tries to match the scrutinee to each pattern, where a match is when every constructor is matched with an identical constructor or a variable.
Concretely, the scrutinee white doesn't match the pattern black, because white and black are different constructors. So Coq will skip that case, and keep evaluating:

     match white with
     | black ⇒ true
     | white ⇒ true
     | primary p ⇒ false
     end
    computes to

     match white with
     | white ⇒ true
     | primary p ⇒ false
     end
We're being particularly careful with our computation steps---as we gain facility with Coq, we won't need to step through each case. But here it helps us see that each pattern is considered in turn, from top to bottom.
The next pattern is white, which matches our scrutinee white, so our pattern matches and we can execute the code in that branch of the match:

     match white with
     | white ⇒ true
     | primary p ⇒ false
     end
    computes to

    true
In this case, there were no variables in the match to keep track of, but in general, Coq will bind each variable in a pattern to the corresponding part of the scrutinee. For example, suppose we had run monochrome (primary blue):

     monochrome (primary blue)
    computes to

     match (primary blue) with
     | black ⇒ true
     | white ⇒ true
     | primary p ⇒ false
     end
    computes to

     match (primary blue) with
     | white ⇒ true
     | primary p ⇒ false
     end
    computes to

     match (primary blue) with
     | primary p ⇒ false
     end
At this point, the argument to the primary constructor in the scrutinee, blue, corresponds to the variable p in the pattern primary p. So Coq will evaluate the corresponding branch of the match, remembering that p is bound to blue. (In this case, p doesn't occur at all on the right-hand side of ⇒, so it doesn't matter. But it could!)

     match (primary blue) with
     | primary p ⇒ false
     end
     computes to

     false (* with p bound to blue *)
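To see a case where the bound variable does matter, here is a small sketch in which p appears on the right-hand side. The function primary_or and the module BindingSketch are hypothetical names introduced only for this illustration; the module wrapper just keeps the repeated type definitions from clashing with the ones above.

```coq
Module BindingSketch.  (* hypothetical wrapper, only to avoid name clashes *)

Inductive rgb : Type := red | green | blue.
Inductive color : Type := black | white | primary (p : rgb).

(* Hypothetical helper: return the rgb inside a primary color,
   or the given default for black and white. *)
Definition primary_or (default : rgb) (c : color) : rgb :=
  match c with
  | black => default
  | white => default
  | primary p => p   (* p is bound to the argument of primary *)
  end.

Compute (primary_or red (primary blue)). (* ==> blue *)
Compute (primary_or red white).          (* ==> red *)

End BindingSketch.
```

In the first Compute, matching primary blue against primary p binds p to blue, and the branch returns exactly that binding.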

Definition isred (c : color) : bool :=
  match c with
  | black ⇒ false
  | white ⇒ false
  | primary red ⇒ true
  | primary _ ⇒ false
  end.
The pattern primary _ here is shorthand for "primary applied to any rgb constructor except red." Recall again that Coq applies patterns in order. For example:

     isred (primary green)
computes to

     match (primary green) with
     | black ⇒ false
     | white ⇒ false
     | primary red ⇒ true
     | primary _ ⇒ false
     end
computes to

     match (primary green) with
     | white ⇒ false
     | primary red ⇒ true
     | primary _ ⇒ false
     end
computes to

     match (primary green) with
     | primary red ⇒ true
     | primary _ ⇒ false
     end
At this point, our scrutinee primary green doesn't match the pattern primary red: while the outermost constructors are both primary, the arguments differ: green and red are different constructors of rgb. So Coq will skip this pattern:

     match (primary green) with
     | primary red ⇒ true
     | primary _ ⇒ false
     end
computes to

     match (primary green) with
     | primary _ ⇒ false
     end
computes to

     false
In this case, the wildcard pattern _ has the same effect as the dummy pattern variable p in the definition of monochrome. Neither function actually uses the argument to primary. We could have written isred a different way:

Definition isred' (c : color) : bool :=
  match c with
  | black ⇒ false
  | white ⇒ false
  | primary p ⇒
    match p with
    | red ⇒ true
    | _ ⇒ false
    end
  end.
In this definition, we explicitly name the primary color we're working with and do a nested pattern match. We can compute isred' (primary green) as follows:

     isred' (primary green)
computes to

     match (primary green) with
     | black ⇒ false
     | white ⇒ false
     | primary p ⇒
       match p with
       | red ⇒ true
       | _ ⇒ false
       end
     end
computes to

     match (primary green) with
     | white ⇒ false
     | primary p ⇒
       match p with
       | red ⇒ true
       | _ ⇒ false
       end
     end
computes to

     match (primary green) with
     | primary p ⇒
       match p with
       | red ⇒ true
       | _ ⇒ false
       end
     end
computes to

     match green with
     | red ⇒ true
     | _ ⇒ false
     end
computes to

     match green with
     | _ ⇒ false
     end
computes to

    false
It's good practice to use wildcards when you don't need to name the variable---it helps prevent mistakes, like referring to the wrong variable. Do keep in mind, though: patterns are matched top to bottom. An early wildcard may rule out later cases! Coq will complain if a later case can never be reached.
It's also good practice to try to condense pattern matching: the definition of isred is cleaner than that of isred'.
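Here is a sketch of what goes wrong when a wildcard comes too early. The names isred_bad and WildcardOrderSketch are hypothetical. As far as I know, current Coq versions reject the unreachable case with a redundancy error rather than a mere warning, so the definition is wrapped in Fail, which succeeds exactly when the wrapped command is rejected.

```coq
Module WildcardOrderSketch.  (* hypothetical wrapper, only to avoid name clashes *)

Inductive rgb : Type := red | green | blue.
Inductive color : Type := black | white | primary (p : rgb).

(* The wildcard case comes before the primary red case, so the
   primary red case can never be reached and Coq refuses the match. *)
Fail Definition isred_bad (c : color) : bool :=
  match c with
  | black => false
  | white => false
  | primary _ => false
  | primary red => true
  end.

End WildcardOrderSketch.
```

Swapping the last two cases gives back the definition of isred, which Coq accepts.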

Exercise: 1 star, standard (is_weekday')

Define a function is_weekday' that behaves like is_weekday but has fewer than four cases. Remember: the order of cases matters!
Definition is_weekday' (d:day) : bool (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Modules

Coq provides a module system, to aid in organizing large developments. In this course we won't need most of its features, but one is useful: If we enclose a collection of declarations between Module X and End X markers, then, in the remainder of the file after the End, these definitions are referred to by names like X.foo instead of just foo. We will use this feature to introduce the definition of the type nat in an inner module so that it does not interfere with the one from the standard library (which we want to use in the rest because it comes with a tiny bit of convenient special notation).

Module NatPlayground.

Numbers

In this section, we'll define the natural numbers, that is, the numbers:
0, 1, 2, 3, ...
You may have heard in a math class that 0 isn't a natural number---for our purposes, it will be. Computer scientists almost always start counting from 0!
In other programming languages, it's easy to take numbers for granted: int is simply a built-in type, represented in some low-level way on the computer (typically binary).
But in Coq we're building everything from scratch. The representation we'll be using is called unary, which is an odd sort of way of writing numbers. You can think of it like tick marks, so the number zero is represented by no tick marks, the number one is one tick mark, the number two is two tick marks, etc.
That is, a unary natural number is either:
  • zero (i.e., no tick marks), or
  • a natural number followed by another tick mark.
Notice that our definition of natural numbers is in terms of natural numbers themselves---that is, the rules describing the natural numbers are inductive.
Here's the above definition translated into Coq:

Inductive nat : Type :=
  | O
  | S (n : nat).
The clauses of this definition can be read:
  • O is a natural number (note that this is the letter "O," not the numeral "0").
  • S can be put in front of a natural number to yield another one -- if n is a natural number, then S n is too.
Again, let's look at this in a little more detail. The definition of nat says how expressions in the set nat can be built:
  • O and S are constructors;
  • the expression O belongs to the set nat;
  • if n is an expression belonging to the set nat, then S n is also an expression belonging to the set nat; and
  • expressions formed in these two ways are the only ones belonging to the set nat.
The same rules apply for our definitions of day, bool, color, etc.
The above conditions are the precise force of the Inductive declaration. They imply that the expression O, the expression S O, the expression S (S O), the expression S (S (S O)), and so on all belong to the set nat, while other expressions built from data constructors, like true, andb true false, S (S false), and O (O (O S)) do not.
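These membership claims can be tested directly. The sketch below repeats the definition of nat inside a hypothetical module, NatMembershipSketch, so that it stands on its own; Fail succeeds exactly when the command it wraps is rejected.

```coq
Module NatMembershipSketch.  (* hypothetical wrapper, only to avoid name clashes *)

Inductive nat : Type :=
  | O
  | S (n : nat).

(* Built from O by applying S twice, so it belongs to nat. *)
Check (S (S O)) : nat.

(* false is a bool, not a nat, so S cannot be applied to it. *)
Fail Check (S (S false)).

(* O takes no arguments, so it cannot be applied to anything. *)
Fail Check (O (O (O S))).

End NatMembershipSketch.
```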
A critical point here is that what we've done so far is just to define a representation of numbers: a way of writing them down. The names O and S are arbitrary, and at this point they have no special meaning -- they are just two different marks that we can use to write down numbers (together with a rule that says any nat will be written as some string of S marks followed by an O). If we like, we can write essentially the same definition this way:

Inductive nat' : Type :=
  | stop
  | tick (foo : nat').
The interpretation of these marks comes from how we use them to compute. We use O and S rather than these longer names because (a) concision is nice, (b) we'll be using nat quite a bit, and (c) that's how the Coq standard library defines them.
We interpret by writing functions that pattern match on representations of natural numbers just as we did above with booleans and days -- for example, here is the predecessor function, which takes a given number and returns one less than that number, i.e.,
pred (S n) should compute to n
For example, pred (S O) computes to O, i.e., 0 is the predecessor of 1.

Definition pred (n : nat) : nat :=
  match n with
    | O ⇒ O
    | S n' ⇒ n'
  end.
Our definition has an unfortunate property: pred O computes to O, i.e., 0 is its own predecessor. A mathematician might take a different course of action, saying that pred O is undefined---there really is no natural number that comes before 0, since 0 is the first one! Coq won't let us leave anything undefined, though, so we'll simply have to be careful to remember that pred O = O.
The second branch can be read: "if n has the form S n' for some n', then return n'."

End NatPlayground.
Because natural numbers are such a pervasive form of data, Coq provides a tiny bit of built-in magic for parsing and printing them: ordinary arabic numerals can be used as an alternative to the "unary" notation defined by the constructors S and O. Coq prints numbers in arabic form by default:

Check (S (S (S (S O)))).
  (* ===> 4 : nat *)

Definition minustwo (n : nat) : nat :=
  match n with
    | O ⇒ O
    | S O ⇒ O
    | S (S n') ⇒ n'
  end.

Compute (minustwo 4).
  (* ===> 2 : nat *)
The constructor S has the type nat → nat, just like pred and functions like minustwo:

Check S.
Check pred.
Check minustwo.
These are all things that can be applied to a number to yield a number. However, there is a fundamental difference between the first one and the other two: functions like pred and minustwo come with computation rules -- e.g., the definition of pred says that pred 2 can be simplified to 1 -- while the definition of S has no such behavior attached. Although it is like a function in the sense that it can be applied to an argument, it does not do anything at all! It is just a way of writing down numbers. (Think about standard arabic numerals: the numeral 1 is not a computation; it's a piece of data. When we write 111 to mean the number one hundred and eleven, we are using 1, three times, to write down a concrete representation of a number.)
For most function definitions over numbers, just pattern matching is not enough: we also need recursion, i.e., functions that refer to themselves. For example, to check that a number n is even, we may need to recursively check whether n-2 is even. To write such functions, we use the keyword Fixpoint.
In your prior programming experience, you may not have spent a lot of thought on recursion. Many languages use loops (using words like while or for) rather than recursion. Coq doesn't have loops---only recursion. Don't worry if you're not particularly familiar with recursion... you'll get lots of practice!

Fixpoint evenb (n:nat) : bool :=
  match n with
  | O ⇒ true
  | S O ⇒ false
  | S (S n') ⇒ evenb n'
  end.
The call to evenb n' is a recursive call. One might worry: is it okay to define a function in terms of itself? How do we know that evenb terminates?
We can check an example: is 3 even?

     evenb 3
is the same as

     evenb (S (S (S O)))
computes to

     match (S (S (S O))) with
     | O ⇒ true
     | S O ⇒ false
     | S (S n') ⇒ evenb n'
     end
computes to

     match (S (S (S O))) with
     | S O ⇒ false
     | S (S n') ⇒ evenb n'
     end
computes to

     match (S (S (S O))) with
     | S (S n') ⇒ evenb n'
     end
computes to

     evenb n' (* with n' bound to S O *)
is the same as

     evenb (S O)
computes to

     match (S O) with
     | O ⇒ true
     | S O ⇒ false
     | S (S n') ⇒ evenb n'
     end
computes to

     match (S O) with
     | S O ⇒ false
     | S (S n') ⇒ evenb n'
     end
computes to

    false
More generally, evenb only ever makes recursive calls on smaller inputs: if we put in the number n, we'll make a recursive call on two less than n.
A surprising fact: Coq will only let us write functions that are guaranteed to terminate on all inputs. A good rule of thumb is that every recursive function should make a recursive call only on subparts of its input.
We can define oddb by a similar Fixpoint declaration, but here is a simpler definition:

Definition oddb (n:nat) : bool := negb (evenb n).

Compute (oddb 1).
Compute (oddb 4).
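For comparison, here is a sketch of the "similar Fixpoint declaration" mentioned above. The names oddb_fix and OddbSketch are hypothetical; the function mirrors evenb with the two base cases swapped, and it uses the standard library's nat and bool.

```coq
Module OddbSketch.  (* hypothetical wrapper, only to avoid name clashes *)

(* Oddness by direct structural recursion: the recursive call is on n',
   a subpart of the input S (S n'). *)
Fixpoint oddb_fix (n : nat) : bool :=
  match n with
  | O => false
  | S O => true
  | S (S n') => oddb_fix n'
  end.

Compute (oddb_fix 1). (* ==> true *)
Compute (oddb_fix 4). (* ==> false *)

End OddbSketch.
```

Both versions give the same answers; the one-line definition via negb simply reuses work we have already done.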

Module NatPlayground2.

Fixpoint plus (n : nat) (m : nat) : nat :=
  match n with
    | O ⇒ m
    | S n' ⇒ S (plus n' m)
  end.
Adding three to two now gives us five, as we'd expect.

Compute (plus 3 2).
The simplification that Coq performs to reach this conclusion can be visualized as follows, a little more succinctly than what we wrote above:

(*  plus (S (S (S O))) (S (S O))
==> S (plus (S (S O)) (S (S O)))
      by the second clause of the match
==> S (S (plus (S O) (S (S O))))
      by the second clause of the match
==> S (S (S (plus O (S (S O)))))
      by the second clause of the match
==> S (S (S (S (S O))))
      by the first clause of the match
*)

As a notational convenience, if two or more arguments have the same type, they can be written together. In the following definition, (n m : nat) means just the same as if we had written (n : nat) (m : nat).

Fixpoint mult (n m : nat) : nat :=
  match n with
    | O ⇒ O
    | S n' ⇒ plus m (mult n' m)
  end.

Compute (mult 3 3).
Compute (mult 5 2).
You can match two expressions at once by putting a comma between them:

Fixpoint minus (n m:nat) : nat :=
  match n, m with
  | O , _ ⇒ O
  | S _ , O ⇒ n
  | S n', S m' ⇒ minus n' m'
  end.
Our minus function has the same problem pred does: minus 0 n results in 0! This 'truncating' subtraction is necessary because we're working with natural numbers, i.e., there are no negative numbers.
Keep in mind: these definitions have suggestive names, but it's up to us as humans to agree that minus corresponds to our notion of subtraction. I would wager that it does: if we pick an n and an m such that m ≤ n, then minus n m yields n - m. We don't have the tools to prove this... yet!
Again, the _ in the first line is a wildcard pattern. Writing _ in a pattern is the same as writing some variable that doesn't get used on the right-hand side. This avoids the need to invent a variable name.
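A few spot checks make the truncating behavior concrete. This is only a sketch: it repeats the definition of minus inside a hypothetical module, MinusSketch, so that it can be run on its own against the standard library's nat.

```coq
Module MinusSketch.  (* hypothetical wrapper, only to avoid name clashes *)

Fixpoint minus (n m : nat) : nat :=
  match n, m with
  | O, _ => O
  | S _, O => n
  | S n', S m' => minus n' m'
  end.

Compute (minus 5 2). (* ==> 3, ordinary subtraction since 2 <= 5 *)
Compute (minus 2 2). (* ==> 0 *)
Compute (minus 2 5). (* ==> 0, truncated: there is no natural number -3 *)

End MinusSketch.
```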
The pow function defines exponentiation: pow n m should yield n to the mth power.

Fixpoint pow n m :=
  match m with
    | 0 ⇒ 1
    | S m ⇒ n × (pow n m)
  end.

End NatPlayground2.

Exercise: 1 star, standard (factorial)

Here's the standard mathematical factorial function:
       factorial(0)  =  1
       factorial(n)  =  n * factorial(n-1)     (if n>0)
We'll meet this function again later in the course. For now, translate this into Coq.

Fixpoint factorial (n:nat) : nat
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Compute (factorial 3).
Compute (factorial 5).

Notation "x + y" := (plus x y)
                       (at level 50, left associativity)
                       : nat_scope.
Notation "x - y" := (minus x y)
                       (at level 50, left associativity)
                       : nat_scope.
Notation "x * y" := (mult x y)
                       (at level 40, left associativity)
                       : nat_scope.

Check ((0 + 1) + 1).
(The level, associativity, and nat_scope annotations control how these notations are treated by Coq's parser. The details are not important for our purposes, but interested readers can refer to the optional "More on Notation" section at the end of this chapter.)
Note that these do not change the definitions we've already made: they are simply instructions to the Coq parser to accept x + y in place of plus x y and, conversely, to the Coq pretty-printer to display plus x y as x + y.
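The effect of the level and associativity annotations can be seen with a few computations. This sketch relies only on the standard library's notations for nat, which use the same levels as the declarations above.

```coq
(* Level 40 binds tighter than level 50, so * groups before +. *)
Compute (1 + 2 * 3).   (* parsed as 1 + (2 * 3) ==> 7 *)

(* Left associativity: repeated - groups from the left. *)
Compute (5 - 2 - 1).   (* parsed as (5 - 2) - 1 ==> 2 *)

(* Explicit parentheses override the default grouping. *)
Compute (5 - (2 - 1)). (* ==> 4 *)
```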

Comparisons

When we say that Coq comes with almost nothing built-in, we really mean it: even equality testing for numbers is a user-defined operation! We now define a function eqb, which tests natural numbers for equality, yielding a boolean. Note the use of nested matches (we could also have used a simultaneous match, as we did in minus.)

Fixpoint eqb (n m : nat) : bool :=
  match n with
  | O ⇒ match m with
         | O ⇒ true
         | S m' ⇒ false
         end
  | S n' ⇒ match m with
            | O ⇒ false
            | S m' ⇒ eqb n' m'
            end
  end.
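For comparison, here is a sketch of the same test written with a simultaneous match. The names eqb' and EqbSketch are hypothetical; the final wildcard case covers both mismatched shapes (O against S _, and S _ against O).

```coq
Module EqbSketch.  (* hypothetical wrapper, only to avoid name clashes *)

Fixpoint eqb' (n m : nat) : bool :=
  match n, m with
  | O, O => true
  | S n', S m' => eqb' n' m'
  | _, _ => false   (* one is O and the other is a successor *)
  end.

Compute (eqb' 3 3). (* ==> true *)
Compute (eqb' 3 4). (* ==> false *)
Compute (eqb' 0 1). (* ==> false *)

End EqbSketch.
```

The wildcard case has to come last here: placed first, it would make the other two cases unreachable.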
The leb function tests whether its first argument is less than or equal to its second argument, yielding a boolean.

Exercise: 2 stars, standard (leb)

Fixpoint leb (n m : nat) : bool (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
(* Do not modify the following line: *)
Definition manual_grade_for_test_leb : option (nat×string) := None.

Compute (leb 2 2).
Compute (leb 2 4).
Compute (leb 4 2).

Exercise: 1 star, standard (ltb)

The ltb function tests natural numbers for less-than, yielding a boolean. Instead of making up a new Fixpoint for this one, define it in terms of a previously defined function.

Definition ltb (n m : nat) : bool
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
(* Do not modify the following line: *)
Definition manual_grade_for_ltb : option (nat×string) := None.

Compute (ltb 2 2).
Compute (ltb 2 4).
Compute (ltb 4 2).

Inductive ordering : Type :=
  | LT
  | EQ
  | GT.

Fixpoint compare (n m : nat) : ordering :=
  match n, m with
  | O, O ⇒ EQ
  | S _, O ⇒ GT
  | O, S _ ⇒ LT
  | S n', S m' ⇒ compare n' m'
  end.
We can define a notion of minimum.

Definition min (n1 n2 : nat) : nat :=
  if leb n1 n2
  then n1
  else n2.
We can also define argmin, which is an optimization function: given a notion of cost, choose the lower-cost option. We'll meet the {A : Type} syntax in detail later, and we'll talk more about cost (a higher-order function) in the next section.
This next definition uses the let ... in form to define a local variable; Definition and Fixpoint give you new top-level definitions, but let is how you get new local definitions. (Well, each branch of a match has its own definitions, but that's not exactly the same!)
The general syntax is:
    let NAME : TYPE := EXPR in
    EXPR

Definition argmin {A:Type} (cost : A → nat) (o1 o2 : A) : A :=
  let c1 := cost o1 in
  let c2 := cost o2 in
  if leb c1 c2
  then o1
  else o2.
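To see argmin in action, here is a small sketch. Two things in it are assumptions rather than part of the chapter: the standard library's Nat.leb stands in for leb (which is left as an exercise above), and bool_cost is a hypothetical cost function made up for the example.

```coq
Module ArgminSketch.  (* hypothetical wrapper, only to avoid name clashes *)

Definition argmin {A : Type} (cost : A -> nat) (o1 o2 : A) : A :=
  let c1 := cost o1 in
  let c2 := cost o2 in
  if Nat.leb c1 c2   (* assumption: stdlib Nat.leb stands in for leb *)
  then o1
  else o2.

(* Hypothetical cost function on booleans: true costs 3, false costs 1. *)
Definition bool_cost (b : bool) : nat :=
  if b then 3 else 1.

Compute (argmin bool_cost true false). (* ==> false, the cheaper option *)
Compute (argmin S 2 7).                (* ==> 2, since S 2 <= S 7 *)

End ArgminSketch.
```

Note that argmin returns one of the options, not its cost; when the costs tie, the test succeeds and the first option wins.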
We can lift these definitions to talk about three things. Here's a manual lifting:

Definition min3 (n1 n2 n3 : nat) : nat :=
  if leb n1 n2
  then if leb n1 n3
       then n1
       else n3
  else if leb n2 n3
       then n2
       else n3.

Definition argmin3 {A:Type} (cost : A → nat) (o1 o2 o3 : A) : A :=
  let c1 := cost o1 in
  let c2 := cost o2 in
  let c3 := cost o3 in
  if leb c1 c2
  then if leb c1 c3
       then o1
       else o3
  else if leb c2 c3
       then o2
       else o3.
And here's how we'd reuse the definitions:

Definition min3' (n1 n2 n3 : nat) : nat :=
  min n1 (min n2 n3).

Definition argmin3' {A:Type} (cost : A → nat) (o1 o2 o3 : A) : A :=
  argmin cost o1 (argmin cost o2 o3).

Fixpoints and Structural Recursion

Here is a copy of the definition of addition:

Fixpoint plus' (n : nat) (m : nat) : nat :=
  match n with
  | O ⇒ m
  | S n' ⇒ S (plus' n' m)
  end.
When Coq checks this definition, it notes that plus' is "decreasing on 1st argument." What this means is that we are performing a structural recursion over the argument n -- i.e., that we make recursive calls only on strictly smaller values of n. This implies that all calls to plus' will eventually terminate. Coq demands that some argument of every Fixpoint definition is "decreasing."
This requirement is a fundamental feature of Coq's design: In particular, it guarantees that every function that can be defined in Coq will terminate on all inputs. However, because Coq's "decreasing analysis" is not very sophisticated, it is sometimes necessary to write functions in slightly unnatural ways.

Exercise: 2 stars, standard (decreasing)

To get a concrete sense of this, find a way to write a sensible Fixpoint definition (of a simple function on numbers, say) that does terminate on all inputs, but that Coq will reject because of this restriction.

(* FILL IN HERE *)
(* Do not modify the following line: *)
Definition manual_grade_for_decreasing : option (nat×string) := None.

Functions as Data

Like most modern programming languages -- especially other "functional" languages, including OCaml, Haskell, Racket, Scala, Clojure, etc. -- Coq treats functions as first-class citizens, allowing them to be passed as arguments to other functions, returned as results, stored in data structures, etc.

Higher-Order Functions

Functions that manipulate other functions are often called higher-order functions. Here's a simple one:

Definition doit3times (f:nat→nat) (n:nat) : nat :=
  f (f (f n)).
The argument f here is itself a function (from nat to nat); the body of doit3times applies f three times to some value n.

Check doit3times : (nat → nat) → nat → nat.

Compute (doit3times minustwo 9). (* ==> 3 *)

Compute (doit3times (fun n ⇒ S n) 2). (* ==> 5 *)
Note that fun n ⇒ S n is morally equivalent to just S:

Compute (doit3times S 2). (* ==> 5 *)
We can construct a function "on the fly" without declaring it at the top level or giving it a name.

Check (fun n ⇒ n × n). (* ==> nat -> nat *)
The expression (fun n ⇒ n + n) can be read as "the function that, given a number n, yields n + n", i.e., a function that doubles its input.

Compute (doit3times (fun n ⇒ n + n) 2). (* ==> 16 *)
Work out on paper how doit3times (fun n ⇒ n + n) 2 evaluates. Here are the first few steps:

   doit3times (fun n ⇒ n + n) 2
=
   (fun n ⇒ n + n) ((fun n ⇒ n + n) ((fun n ⇒ n + n) 2))
=
   (fun n ⇒ n + n) ((fun n ⇒ n + n) (2 + 2))
Pick up from here. Note: there's more than one way to do this, but you should get the same answer no matter what.
We can use the fix keyword instead of fun to define a recursive function on the fly. We'll have to give a name to it, though.

Check (fix evenb' (n : nat) : bool :=
       match n with
       | O ⇒ true
       | S O ⇒ false
       | S (S n') ⇒ evenb' n'
       end).
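An anonymous recursive function can be applied on the spot, just like a fun. The sketch below applies the same fix expression to an argument; the name evenb' is only visible inside the fix itself.

```coq
Compute ((fix evenb' (n : nat) : bool :=
            match n with
            | O => true
            | S O => false
            | S (S n') => evenb' n'
            end) 4). (* ==> true *)
```

The name after fix exists so that the body can call itself; nothing named evenb' is defined at the top level afterwards.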

Functions That Construct Functions

Most of the higher-order functions we have talked about so far take functions as arguments. Let's look at some examples that involve returning functions as the results of other functions. To begin, here is a function that takes a value x (drawn from some type X) and returns a function from nat to X that yields x whenever it is called, ignoring its nat argument.

Definition constfun {X: Type} (x: X) : nat→X :=
  fun (k:nat) ⇒ x.

Definition ftrue := constfun true.

Compute ftrue 0. (* ===> true *)

Compute (constfun 5) 99. (* ===> 5 *)
In fact, the multiple-argument functions we have already seen are also examples of passing functions as data. To see why, recall the type of plus.

Check plus : nat → nat → nat.
Each → in this expression is actually a binary operator on types. This operator is right-associative, so the type of plus is really a shorthand for nat → (nat → nat) -- i.e., it can be read as saying that "plus is a one-argument function that takes a nat and returns a one-argument function that takes another nat and returns a nat." In the examples above, we have always applied plus to both of its arguments at once, but if we like we can supply just the first. This is called partial application.

Definition plus3 := plus 3.
Check plus3 : nat → nat.

Compute (plus3 4). (* ==> 7 *)
Compute (doit3times plus3 0). (* ==> 9 *)
Compute (doit3times (plus 3) 0). (* ==> 9 *)

Exercise: 1 star, standard (partial_app_minus)

Partial application is a powerful tool, used frequently by functional programmers. It can be very confusing at first, though. Explain what answer you might expect just from reading the code. Why do we get the answer we do?

Compute (doit3times (minus 1) 10).

(* FILL IN HERE *)
(* Do not modify the following line: *)
Definition manual_grade_for_partial_app_minus : option (nat×string) := None.

(* Mon Oct 12 08:48:47 PDT 2020 *)