Day04_structures

Require Export DMFP.Day03_recursion.

Structures

In this chapter we continue our development of basic concepts of functional programming. The critical new ideas are polymorphism (abstracting functions over the types of the data they manipulate) and higher-order functions (treating functions as data). We begin with polymorphism.
Along the way, we'll talk about sets, an absolutely fundamental notion in mathematics (discrete and otherwise). We'll also introduce what will be a long running case study in the course: bioinformatics and computationally manipulating DNA.

Lists

Interesting programs need to be able to manipulate data structures. Let's start with lists.
We'll need to work with lists with elements from a variety of types -- lists of nats, lists of booleans, lists of lists, etc. We could just define a new inductive datatype for each of these, for example...
Module MonoList.

Inductive natlist : Type :=
  | nat_nil
  | nat_cons (n : nat) (l : natlist).

Fixpoint natlist_length (l : natlist) : nat :=
  match l with
  | nat_nil ⇒ 0
  | nat_cons _ l' ⇒ 1 + natlist_length l'
  end.

Compute (natlist_length (nat_cons 1 (nat_cons 2 (nat_cons 0 nat_nil)))). (* ==> 3 *)

Inductive boollist : Type :=
  | bool_nil
  | bool_cons (b : bool) (l : boollist).

Fixpoint boollist_length (l : boollist) : nat :=
  match l with
  | bool_nil ⇒ 0
  | bool_cons _ l' ⇒ 1 + boollist_length l'
  end.

Compute (boollist_length (bool_cons true (bool_cons true (bool_cons false bool_nil)))). (* ==> 3 *)

End MonoList.
... but this would quickly become tedious, partly because we have to make up different constructor names for each datatype, but mostly because we wouldn't get any reuse: if you're computing the length of a list, it doesn't matter what type the list holds... so why write the function more than once?
To avoid all this repetition, Coq supports polymorphic (Greek: "poly-" many, "morphos" form, shape) inductive type definitions. For example, here is a polymorphic list datatype.
Inductive list (X:Type) : Type :=
  | nil
  | cons (x : X) (l : list X).
This is exactly like the definition of natlist above, except that the nat argument to the cons constructor has been replaced by an arbitrary type X, a binding for X has been added to the function header on the first line, and the occurrences of natlist in the types of the constructors have been replaced by list X.
What sort of thing is list itself? A good way to think about it is that the definition of list is a function from Types to Inductive definitions; or, to put it more concisely, list is a function from Types to Types. For any particular type X, the type list X is the Inductively defined set of lists whose elements are of type X.
Another name for a function that produces Types is "type constructor": list is a type constructor; if you give it a type X, then list X will "construct" the type of lists holding Xs.
Check list : Type Type.
The parameter X in the definition of list automatically becomes a parameter to the constructors nil and cons -- that is, nil and cons are now polymorphic constructors; when we use them, we must now provide a first argument that is the type of the list they are building. For example, nil nat constructs the empty list of type nat.
Check (nil nat) : list nat.
Similarly, cons nat adds an element of type nat to a list of type list nat. Here is an example of forming a list containing just the natural number 3.
Check (cons nat 3 (nil nat)) : list nat.
What might the type of nil itself be? We can tell from Check (nil nat) above that nil takes a Type as input and produces a list as input.
So you might say that nil : Type list _. But what's _? Well, it's whatever Type was provided as nil's first argument!
For nil to refer back to its type argument, we need a new feature: types. If e : (X:T1), T2, then e takes an X of type T1 and produces a value of type T2.
Check nil : X : Type, list X.
That is, nil takes a Type -- call it X -- and produces a value of type list X. The type of the return value depends on the type given as input.
Similarly, the type of cons from the definition looks like X list X list X, but using this convention to explain the meaning of X results in the type X, X list X list X.
Check cons : X : Type, X list X list X.
(Side note on notation: In .v files, the "forall" quantifier is spelled out in letters. In the generated HTML files and in the way various IDEs show .v files, depending on the settings of their display controls, is usually typeset as the usual mathematical "upside down A," though you'll still see the spelled-out "forall" in a few places. This is just a quirk of typesetting: there is no difference in meaning.)
Having to supply a type argument for every single use of a list constructor would be rather burdensome; we will soon see ways of reducing this annotation burden.
Check (cons nat 2 (cons nat 1 (nil nat)))
      : list nat.
We can now go back and make polymorphic versions of all the list-processing functions that we wrote before. Here is repeat, for example:
Fixpoint length (X : Type) (l : list X) : nat :=
  match l with
  | nil _ ⇒ 0
  | cons _ _ l' ⇒ 1 + length X l'
  end.

Compute length nat (cons nat 1 (cons nat 2 (cons nat 0 (nil nat)))). (* ==> 3 *)
Compute length bool (cons bool true (cons bool true (cons bool false (nil bool)))). (* ==> 3 *)

Check (length nat).
Check (length base).
Check length.

Fixpoint repeat (X : Type) (x : X) (count : nat) : list X :=
  match count with
  | 0 ⇒ nil X
  | S count'cons X x (repeat X x count')
  end.

Compute (repeat nat 4 2).

Compute (repeat bool false 1).

Check (repeat bool false).
Check (repeat bool).
Check (repeat day).
Check repeat.

Type Annotation Inference

Let's write the definition of repeat again, but this time we won't specify the types of any of the arguments. Will Coq still accept it?
Fixpoint repeat' X x count : list X :=
  match count with
  | 0 ⇒ nil X
  | S count'cons X x (repeat' X x count')
  end.
Indeed it will. Let's see what type Coq has assigned to repeat':
Check repeat'
  : X : Type, X nat list X.
Check repeat
  : X : Type, X nat list X.
It has exactly the same type as repeat. Coq was able to use type inference to deduce what the types of X, x, and count must be, based on how they are used. For example, since X is used as an argument to cons, it must be a Type, since cons expects a Type as its first argument; matching count with 0 and S means it must be a nat; and so on.
This powerful facility means we don't always have to write explicit type annotations everywhere, although explicit type annotations can still be quite useful as documentation and sanity checks, so we will continue to use them much of the time.

Type Argument Synthesis

To use a polymorphic function, we need to pass it one or more types in addition to its other arguments. For example, the recursive call in the body of the repeat function above must pass along the type X. But since the second argument to repeat is an element of X, it seems entirely obvious that the first argument can only be X -- why should we have to write it explicitly?
Fortunately, Coq permits us to avoid this kind of redundancy. In place of any type argument we can write a "hole" _, which can be read as "Please try to figure out for yourself what belongs here." More precisely, when Coq encounters a _, it will attempt to unify all locally available information -- the type of the function being applied, the types of the other arguments, and the type expected by the context in which the application appears -- to determine what concrete type should replace the _.
This may sound similar to type annotation inference -- and, indeed, the two procedures rely on the same underlying mechanisms. Instead of simply omitting the types of some arguments to a function, like
      repeat' X x count : list X := we can also replace the types with holes
      repeat' (X : _) (x : _) (count : _) : list X := to tell Coq to attempt to infer the missing information.
Using holes, the repeat function can be written like this:
Fixpoint repeat'' X x count : list X :=
  match count with
  | 0 ⇒ nil _
  | S count'cons _ x (repeat'' _ x count')
  end.
In this instance, we don't save much by writing _ instead of X. But in many cases the difference in both keystrokes and readability is nontrivial. For example, suppose we want to write down a list containing the numbers 1, 2, and 3. Instead of this...
Definition list123 :=
  cons nat 1 (cons nat 2 (cons nat 3 (nil nat))).
...we can use holes to write this:
Definition list123' :=
  cons _ 1 (cons _ 2 (cons _ 3 (nil _))).

Implicit Arguments

In fact, we can go further and even avoid writing _'s in most cases by telling Coq always to infer the type argument(s) of a given function.
The Arguments directive specifies the name of the function (or constructor) and then lists its argument names, with curly braces around any arguments to be treated as implicit.
Arguments nil {X}.
Arguments cons {X} x l.
Arguments length {X} l.
Arguments repeat {X} x count.
Now, we don't have to supply type arguments at all:
Definition list123'' := cons 1 (cons 2 (cons 3 nil)).
Alternatively, we can declare an argument to be implicit when defining the function itself, by surrounding it in curly braces instead of parens. For example:
Fixpoint repeat''' {X : Type} (x : X) (count : nat) : list X :=
  match count with
  | 0 ⇒ nil
  | S count'cons x (repeat''' x count')
  end.
Let's finish by implementing a few other standard list functions on our new polymorphic lists.
Fixpoint app {X : Type} (l1 l2 : list X)
             : (list X) :=
  match l1 with
  | nill2
  | cons h tcons h (app t l2)
  end.

Compute (app nil (cons 1 (cons 2 nil))).
Compute (app (cons 2 (cons 7 nil)) nil).
Compute (app (cons 2 (cons 7 nil)) (cons 3 (cons 1 (cons 0 nil)))).

Fixpoint rev {X:Type} (l:list X) : list X :=
  match l with
  | nilnil
  | cons h tapp (rev t) (cons h nil)
  end.

Compute (rev (cons 1 (cons 2 nil))).
Compute (rev (cons true nil)).

Exercise: 2 stars, standard (allb)

Complete the definition of allb : list bool bool. It should return true if every bool in the list is true.
What should you return if the list is empty? It's a question of convention, but here's a hint: your match shouldn't do anything special for single-element lists.
Fixpoint allb (l : list bool) : bool (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Compute (allb nil).
Compute (allb (cons true nil)).
Compute (allb (cons true (cons false nil))).

Exercise: 2 stars, standard (sum)

Complete the definition of sum : list nat nat. It should sum up a list.
What should you return if the list is empty? It's a question of convention, but the same hint applies: your match should only have two cases.
Fixpoint sum (l : list nat) : nat (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Compute (sum (cons 12 (cons 11 (cons 12 (cons 12 nil))))). (* ==> chirp chirp *)

Exercise: 3 stars, standard (zero_to_n)

Write a function zero_to_n that builds a list from 0 to n, inclusive. Coq's termination requirements will get in your way here.
There are two approaches: either write a helper function or post-process the list.
Why not use app? Well, appending each element to the end as a new list means you need to visit the whole list n times (why?), whereas you can make this function work in just one trip through the list (or zero)!
Coq tip: you can define helper functions externally using Fixpoint or Definition, but you can also define them inline using let. You can write:
[ let fix foo (m : TYPE) := ... in ... ]
to mean:
[ let foo = fix foo (m : TYPE) := ... in ... ]
but NB that you can't write the return type.
Definition zero_to_n (n : nat) : list nat (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Compute zero_to_n 0.
Compute zero_to_n 5.

Supplying Type Arguments Explicitly

One small problem with declaring arguments Implicit is that, once in a while, Coq does not have enough local information to determine a type argument; in such cases, we need to tell Coq that we want to give the argument explicitly just this time. For example, suppose we write this:
Fail Definition mynil := nil.
(The Fail qualifier that appears before Definition can be used with any command, and is used to ensure that that command indeed fails when executed. If the command does fail, Coq prints the corresponding error message, but continues processing the rest of the file.)
Here, Coq gives us an error because it doesn't know what type argument to supply to nil. We can help it by providing an explicit type declaration (so that Coq has more information available when it gets to the "application" of nil):
Definition mynil : list nat := nil.
Alternatively, we can force the implicit arguments to be explicit by prefixing the function name with @.
Check @nil : X : Type, list X.

Definition mynil' := @nil nat.

Polymorphic Pairs

Lists are a very common data structure---whole programming languages use them as their only data structures (the "Lisp" family comes from "LISt Programming").
But a programmer needs plenty of data structures in their toolbelt, some more complex, some less. It's often useful to have a fixed, finite number of things. can be generalized to polymorphic pairs, often called products or tuples by mathy folks:
Inductive prod (X Y : Type) : Type :=
| pair (x : X) (y : Y).

Arguments pair {X} {Y} _ _.
As with lists, we make the type arguments implicit and define the a nice concrete notation.
Notation "( x , y )" := (pair x y).
We can also use the Notation mechanism to define the standard notation for product types:
Notation "X * Y" := (prod X Y) : type_scope.
(The annotation : type_scope tells Coq that this abbreviation should only be used when parsing types, not when parsing expressions. This avoids a clash with the multiplication symbol.)
It is easy at first to get (x,y) and X×Y confused. Remember that (x,y) is a value built from two other values, while X×Y is a type built from two other types. If x has type X and y has type Y, then (x,y) has type X×Y.
The first and second projection functions extract the parts of pairs.
Definition fst {X Y : Type} (p : X × Y) : X :=
  match p with
  | (x, y)x
  end.

Definition snd {X Y : Type} (p : X × Y) : Y :=
  match p with
  | (x, y)y
  end.

Exercise: 2 stars, standard, optional (currying)

In Coq, a function f : A B C really has the type A (B C). That is, if you give f a value of type A, it will give you function f' : B C. If you then give f' a value of type B, it will return a value of type C. This allows for partial application, as in plus3. Processing a list of arguments with functions that return functions is called currying, in honor of the logician Haskell Curry.
Conversely, we can reinterpret the type A B C as (A × B) C. This is called uncurrying. With an uncurried binary function, both arguments must be given at once as a pair; there is no partial application. Concretely, currying says that it doesn't matter if you take your arguments all at once---like a math function---or if you do it one at a time.
Suppose we have the math-y function, f:
f : nat × natnat

f(x,y) = 2*x + y
Notice how the math notation has you take both x and y at once. You could write it differently, though mathematicians may squint at you:
f' : nat → (natnat)
f'(x) = g_x(y)

g_x : natnat
g_x(y) = 2*x + y
The subscript on g, written here as g_x, conveys the idea that g is "closed over" x. Here's an example:
   f(20, 7) = 2*20 + 7 = 40 + 7 = 47

   f'(20) = g_20

   g_20(7) = 2*20 + 7 = 40 + 7 = 47
Functional programming languages like Gallina typically use curried functions; most other languages use uncurried functions.
We can define currying as follows:
Definition prod_curry {X Y Z : Type}
  (f : X × Y Z) (x : X) (y : Y) : Z := f (x, y).
As an exercise, define its inverse, prod_uncurry.
Definition prod_uncurry {X Y Z : Type}
  (f : X Y Z) (p : X × Y) : Z
(* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Check @prod_curry.
Check @prod_uncurry.
(* Do not modify the following line: *)
Definition manual_grade_for_currying : option (nat×string) := None.

Polymorphic Options

Our last polymorphic type for now is polymorphic options. You can think of an option as a list with exactly zero or exactly one element.
Maybe you've programming in languages that have a 'null' or 'None' value. Gallina has no built-in notion of 'null', but options let us express the idea that a value is "nullable" or "optional".
(We put the definition inside a module because the standard library already defines option and it's this one that we want to use below.)
Module OptionPlayground.

Inductive option (X:Type) : Type :=
  | Some (x : X)
  | None.

Arguments Some {X} _.
Arguments None {X}.

End OptionPlayground.
We can now write the nth_error function, which tries to get the nth element of a list.
Fixpoint nth_error {X : Type} (l : list X) (n : nat)
                   : option X :=
  match l with
  | nilNone
  | cons a l'if eqb n O then Some a else nth_error l' (pred n)
  end.

Compute (nth_error (cons 4 (cons 5 (cons 6 (cons 7 nil)))) 0). (* ==> Some 4 *)
Compute (nth_error (cons (cons monday nil) (cons (cons tuesday nil) nil)) 1). (* ==> Some (cons tuesday nil) *)
Compute (nth_error (cons C nil) 2). (* ==> None *)

Exercise: 1 star, standard (hd_error)

Complete the definition of a polymorphic version of the hd_error function. It should return the first element of the list, if possible. Don't just call nth_error!
Definition hd_error {X : Type} (l : list X) : option X
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
Once again, to force the implicit arguments to be explicit, we can use @ before the name of the function.
Check @hd_error : X : Type, list X option X.

Compute (hd_error (cons 1 (cons 2 nil))). (* ==> Some 1 *)
Compute (hd_error (cons (cons 1 nil) (cons (cons 2 nil) nil))). (* ==> Some (cons 1 nil) *)

Fancier data structures

There are many, many other kinds of data structures. We'll define just one more in this course: trees.
Inductive bt (X : Type) : Type :=
| empty
| node (l : bt X) (v : X) (r : bt X).

Arguments empty {X}.
Arguments node {X} l v r.
If you've studied plant biology, the word 'node' may make you think of trees... but otherwise, what's going on here? What do these structures have to do with trees?
When you encounter a definition you don't understand, you need to play with it. So let's get playing!
Check empty.
Check @empty nat.
Check (node empty true empty).
Check (node empty 5 empty).
Check (node empty G (node empty C empty)).
Check (node (node empty 12 empty) 25 (node (node empty 5 empty) 5 empty)).
Okay, those are all well typed. But... what are they? Let's adopt a graph notation, where each constructor is a vertex in our graph and we draw edges from constructors to their arguments. Here are each of those examples in order.
Gallina term:
  empty as a graph:
  empty
Gallina term:
  @empty nat as a graph:
  empty
Gallina term:
  node empty true empty as a graph:
       node
       /|\
      / | \
     /  |  \
    /   |   \
empty  true  empty
Gallina term:
  node empty 5 empty as a graph:
     node
     /|\
    / | \
   /  |  \
empty 5  empty
Gallina term:
  node empty G (node empty C empty) as a graph:
       node
       /|\
      / | \
     /  |  \
empty   G  node
           /|\
          / | \
         /  |  \
     empty  C   empty
Gallina term:
  node (node empty 12 empty) 25 (node (node empty 5 empty) 5 empty) as a graph:
                 node
                 /|\
                / | \
               /  |  \
           node   25  node
           /|\        /|\
      empty | empty  / | \
            12      /  |  \
                 node  5   empty
                 /|\
                / | \
            empty 5  empty
These graph notations can get a little heavy, so if it's unambiguous, people will leave off the constructors. Here's the last one in simpler notation:
Gallina term:
  node (node empty 12 empty) 25 (node (node empty 5 empty) 5 empty) as a graph:
                 25
                 /\
                /  \
               /    \
              12     5
              /\    / \
                   5
                  /\
Note that we write nothing for empty and we write the value v for a node.
Some terminology: a node has two children, l and r. A node is a leaf node when both of its children are empty.
The topmost node is called the root. This means that computer scientists draw trees upside down. Whoops! Too late to change now.

Exercise: 4 stars, standard (bt_functions)

Let's conclude by writing some functions on binary trees. We'll compute their size (the number of nodes), determine whether they're a leaf or not with is_leaf, count their leaves with num_leaves.
Fixpoint size {X : Type} (t : bt X) : nat (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Definition is_leaf {X : Type} (t : bt X) : bool (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Fixpoint num_leaves {X : Type} (t : bt X) : nat (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

(* 2021-09-28 15:37 *)