PolyPolymorphism and Higher-Order Functions


From DMFP Require Export Lists.

Overview

In this chapter we continue our development of basic concepts of functional programming. The critical new ideas are polymorphism (abstracting functions over the types of the data they manipulate) and higher-order functions (treating functions as data). We begin with polymorphism.
Along the way, we'll talk about sets, an absolutely fundamental notion in mathematics (discrete and otherwise). We'll also introduce what will be a long running case study in the course: bioinformatics and computationally manipulating DNA.

Polymorphic Lists

For the last chapter, we've been working with lists containing just numbers. Obviously, interesting programs also need to be able to manipulate lists with elements from other types -- lists of booleans, lists of lists, etc. We could just define a new inductive datatype for each of these, for example...

Inductive boollist : Type :=
  | bool_nil
  | bool_cons (b : bool) (l : boollist).
... but this would quickly become tedious, partly because we have to make up different constructor names for each datatype, but mostly because we would also need to define new versions of all our list manipulating functions (length, rev, etc.) and all their properties (rev_length, app_assoc, etc.) for each new datatype definition.
To avoid all this repetition, Coq supports polymorphic inductive type definitions. For example, here is a polymorphic list datatype.

Inductive list (X:Type) : Type :=
  | nil
  | cons (x : X) (l : list X).
This is exactly like the definition of natlist from the previous chapter, except that the nat argument to the cons constructor has been replaced by an arbitrary type X, a binding for X has been added to the function header on the first line, and the occurrences of natlist in the types of the constructors have been replaced by list X.
What sort of thing is list itself? A good way to think about it is that the definition of list is a function from Types to Inductive definitions; or, to put it more concisely, list is a function from Types to Types. For any particular type X, the type list X is the Inductively defined set of lists whose elements are of type X.

Check list : Type Type.
The parameter X in the definition of list automatically becomes a parameter to the constructors nil and cons -- that is, nil and cons are now polymorphic constructors; when we use them, we must now provide a first argument that is the type of the list they are building. For example, nil nat constructs the empty list of type nat.

Check (nil nat) : list nat.
Similarly, cons nat adds an element of type nat to a list of type list nat. Here is an example of forming a list containing just the natural number 3.

Check (cons nat 3 (nil nat)) : list nat.
What might the type of nil be? We can read off the type list X from the definition, but this omits the binding for X which is the parameter to list. Type list X does not explain the meaning of X. (X : Type) list X comes closer. Coq's notation for this situation is X : Type, list X.

Check nil : X : Type, list X.
Similarly, the type of cons from the definition looks like X list X list X, but using this convention to explain the meaning of X results in the type X, X list X list X.

Check cons : X : Type, X list X list X.
(Side note on notation: In .v files, the "forall" quantifier is spelled out in letters. In the generated HTML files and in the way various IDEs show .v files, depending on the settings of their display controls, is usually typeset as the usual mathematical "upside down A," though you'll still see the spelled-out "forall" in a few places. This is just a quirk of typesetting: there is no difference in meaning.)
Having to supply a type argument for every single use of a list constructor would be rather burdensome; we will soon see ways of reducing this annotation burden.

Check (cons nat 2 (cons nat 1 (nil nat)))
      : list nat.
We can now go back and make polymorphic versions of all the list-processing functions that we wrote before. Here is repeat, for example:

Fixpoint repeat (X : Type) (x : X) (count : nat) : list X :=
  match count with
  | 0 ⇒ nil X
  | S count'cons X x (repeat X x count')
  end.
As with nil and cons, we can use repeat by applying it first to a type and then to an element of this type (and a number):

Example test_repeat1 :
  repeat nat 4 2 = cons nat 4 (cons nat 4 (nil nat)).
Proof. reflexivity. Qed.
To use repeat to build other kinds of lists, we simply instantiate it with an appropriate type parameter:

Example test_repeat2 :
  repeat bool false 1 = cons bool false (nil bool).
Proof. reflexivity. Qed.

Type Annotation Inference

Let's write the definition of repeat again, but this time we won't specify the types of any of the arguments. Will Coq still accept it?

Fixpoint repeat' X x count : list X :=
  match count with
  | 0 ⇒ nil X
  | S count'cons X x (repeat' X x count')
  end.
Indeed it will. Let's see what type Coq has assigned to repeat':

Check repeat'
  : X : Type, X nat list X.
Check repeat
  : X : Type, X nat list X.
It has exactly the same type as repeat. Coq was able to use type inference to deduce what the types of X, x, and count must be, based on how they are used. For example, since X is used as an argument to cons, it must be a Type, since cons expects a Type as its first argument; matching count with 0 and S means it must be a nat; and so on.
This powerful facility means we don't always have to write explicit type annotations everywhere, although explicit type annotations can still be quite useful as documentation and sanity checks, so we will continue to use them much of the time.

Type Argument Synthesis

To use a polymorphic function, we need to pass it one or more types in addition to its other arguments. For example, the recursive call in the body of the repeat function above must pass along the type X. But since the second argument to repeat is an element of X, it seems entirely obvious that the first argument can only be X -- why should we have to write it explicitly?
Fortunately, Coq permits us to avoid this kind of redundancy. In place of any type argument we can write a "hole" _, which can be read as "Please try to figure out for yourself what belongs here." More precisely, when Coq encounters a _, it will attempt to unify all locally available information -- the type of the function being applied, the types of the other arguments, and the type expected by the context in which the application appears -- to determine what concrete type should replace the _.
This may sound similar to type annotation inference -- and, indeed, the two procedures rely on the same underlying mechanisms. Instead of simply omitting the types of some arguments to a function, like
      repeat' X x count : list X :=
we can also replace the types with holes
      repeat' (X : _) (x : _) (count : _) : list X :=
to tell Coq to attempt to infer the missing information.
Using holes, the repeat function can be written like this:

Fixpoint repeat'' X x count : list X :=
  match count with
  | 0 ⇒ nil _
  | S count'cons _ x (repeat'' _ x count')
  end.
In this instance, we don't save much by writing _ instead of X. But in many cases the difference in both keystrokes and readability is nontrivial. For example, suppose we want to write down a list containing the numbers 1, 2, and 3. Instead of this...

Definition list123 :=
  cons nat 1 (cons nat 2 (cons nat 3 (nil nat))).
...we can use holes to write this:

Definition list123' :=
  cons _ 1 (cons _ 2 (cons _ 3 (nil _))).

Implicit Arguments

In fact, we can go further and even avoid writing _'s in most cases by telling Coq always to infer the type argument(s) of a given function.
The Arguments directive specifies the name of the function (or constructor) and then lists its argument names, with curly braces around any arguments to be treated as implicit.

Arguments nil {X}.
Arguments cons {X} x l.
Arguments repeat {X} x count.
Now, we don't have to supply type arguments at all:

Definition list123'' := cons 1 (cons 2 (cons 3 nil)).
Alternatively, we can declare an argument to be implicit when defining the function itself, by surrounding it in curly braces instead of parens. For example:

Fixpoint repeat''' {X : Type} (x : X) (count : nat) : list X :=
  match count with
  | 0 ⇒ nil
  | S count'cons x (repeat''' x count')
  end.
(Note that we didn't even have to provide a type argument to the recursive call to repeat'''. Indeed, it would be invalid to provide one, because Coq is not expecting it.)
We will use the latter style whenever possible, but we will continue to use explicit Argument declarations for Inductive constructors. The reason for this is that marking the parameter of an inductive type as implicit causes it to become implicit for the type itself, not just for its constructors. For instance, consider the following alternative definition of the list type:

Inductive list' {X:Type} : Type :=
  | nil'
  | cons' (x : X) (l : list').
Because X is declared as implicit for the entire inductive definition including list' itself, we now have to write just list' whether we are talking about lists of numbers or booleans or anything else, rather than list' nat or list' bool or whatever; this is a step too far.
Let's finish by re-implementing a few other standard list functions on our new polymorphic lists...

Fixpoint app {X : Type} (l1 l2 : list X)
             : (list X) :=
  match l1 with
  | nill2
  | cons h tcons h (app t l2)
  end.

Fixpoint rev {X:Type} (l:list X) : list X :=
  match l with
  | nilnil
  | cons h tapp (rev t) (cons h nil)
  end.

Fixpoint length {X : Type} (l : list X) : nat :=
  match l with
  | nil ⇒ 0
  | cons _ l'S (length l')
  end.

Example test_rev1 :
  rev (cons 1 (cons 2 nil)) = (cons 2 (cons 1 nil)).
Proof. reflexivity. Qed.

Example test_rev2:
  rev (cons true nil) = cons true nil.
Proof. reflexivity. Qed.

Example test_length1: length (cons 1 (cons 2 (cons 3 nil))) = 3.
Proof. reflexivity. Qed.

Supplying Type Arguments Explicitly

One small problem with declaring arguments Implicit is that, once in a while, Coq does not have enough local information to determine a type argument; in such cases, we need to tell Coq that we want to give the argument explicitly just this time. For example, suppose we write this:

Fail Definition mynil := nil.
(The Fail qualifier that appears before Definition can be used with any command, and is used to ensure that that command indeed fails when executed. If the command does fail, Coq prints the corresponding error message, but continues processing the rest of the file.)
Here, Coq gives us an error because it doesn't know what type argument to supply to nil. We can help it by providing an explicit type declaration (so that Coq has more information available when it gets to the "application" of nil):

Definition mynil : list nat := nil.
Alternatively, we can force the implicit arguments to be explicit by prefixing the function name with @.

Check @nil : X : Type, list X.

Definition mynil' := @nil nat.
Using argument synthesis and implicit arguments, we can define convenient notation for lists, as before. Since we have made the constructor type arguments implicit, Coq will know to automatically infer these when we use the notations.

Notation "x :: y" := (cons x y)
                     (at level 60, right associativity).
Notation "[ ]" := nil.
Notation "[ x ; .. ; y ]" := (cons x .. (cons y []) ..).
Notation "x ++ y" := (app x y)
                     (at level 60, right associativity).
Now lists can be written just the way we'd hope:

Definition list123''' := [1; 2; 3].

Exercises

Exercise: 2 stars, standard, optional (poly_exercises)

Here are a few simple exercises, just like ones in the Lists chapter, for practice with polymorphism. Complete the proofs below.

Theorem app_nil_r : (X:Type), l:list X,
  l ++ [] = l.
Proof.
  (* FILL IN HERE *) Admitted.

Theorem app_assoc : A (l m n:list A),
  l ++ m ++ n = (l ++ m) ++ n.
Proof.
  (* FILL IN HERE *) Admitted.

Lemma app_length : (X:Type) (l1 l2 : list X),
  length (l1 ++ l2) = length l1 + length l2.
Proof.
  (* FILL IN HERE *) Admitted.

Exercise: 2 stars, standard, optional (more_poly_exercises)

Here are some slightly more interesting ones...

Theorem rev_app_distr: X (l1 l2 : list X),
  rev (l1 ++ l2) = rev l2 ++ rev l1.
Proof.
  (* FILL IN HERE *) Admitted.

Theorem rev_involutive : X : Type, l : list X,
  rev (rev l) = l.
Proof.
  (* FILL IN HERE *) Admitted.

Polymorphic Pairs

Following the same pattern, the definition for pairs of numbers that we gave in the last chapter can be generalized to polymorphic pairs, often called products:

Inductive prod (X Y : Type) : Type :=
| pair (x : X) (y : Y).

Arguments pair {X} {Y} _ _.
As with lists, we make the type arguments implicit and define the familiar concrete notation.

Notation "( x , y )" := (pair x y).
We can also use the Notation mechanism to define the standard notation for product types:

Notation "X * Y" := (prod X Y) : type_scope.
(The annotation : type_scope tells Coq that this abbreviation should only be used when parsing types, not when parsing expressions. This avoids a clash with the multiplication symbol.)
It is easy at first to get (x,y) and X×Y confused. Remember that (x,y) is a value built from two other values, while X×Y is a type built from two other types. If x has type X and y has type Y, then (x,y) has type X×Y.
The first and second projection functions now look pretty much as they would in any functional programming language.

Definition fst {X Y : Type} (p : X × Y) : X :=
  match p with
  | (x, y)x
  end.

Definition snd {X Y : Type} (p : X × Y) : Y :=
  match p with
  | (x, y)y
  end.
The following function takes two lists and combines them into a list of pairs. In other functional languages, it is often called zip; we call it combine for consistency with Coq's standard library.

Fixpoint combine {X Y : Type} (lx : list X) (ly : list Y)
           : list (X×Y) :=
  match lx, ly with
  | [], _[]
  | _, [][]
  | x :: tx, y :: ty(x, y) :: (combine tx ty)
  end.

Exercise: 1 star, standard, optional (combine_checks)

Try answering the following questions on paper and checking your answers in Coq:
  • What is the type of combine (i.e., what does Check @combine print?)
  • What does
            Compute (combine [1;2] [false;false;true;true]).
    print?

Exercise: 2 stars, standard, recommended (split)

The function split is the right inverse of combine: it takes a list of pairs and returns a pair of lists. In many functional languages, it is called unzip.
Fill in the definition of split below. Make sure it passes the given unit test.

Fixpoint split {X Y : Type} (l : list (X×Y))
               : (list X) × (list Y)
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Example test_split:
  split [(1,false);(2,false)] = ([1;2],[false;false]).
Proof.
(* FILL IN HERE *) Admitted.

Polymorphic Options

Our last polymorphic type for now is polymorphic options, which generalize natoption from the previous chapter. (We put the definition inside a module because the standard library already defines option and it's this one that we want to use below.)

Module OptionPlayground.

Inductive option (X:Type) : Type :=
  | Some (x : X)
  | None.

Arguments Some {X} _.
Arguments None {X}.

End OptionPlayground.
We can now rewrite the nth_error function so that it works with any type of lists.

Fixpoint nth_error {X : Type} (l : list X) (n : nat)
                   : option X :=
  match l with
  | []None
  | a :: l'if eqb n O then Some a else nth_error l' (pred n)
  end.

Example test_nth_error1 : nth_error [4;5;6;7] 0 = Some 4.
Proof. reflexivity. Qed.
Example test_nth_error2 : nth_error [[1];[2]] 1 = Some [2].
Proof. reflexivity. Qed.
Example test_nth_error3 : nth_error [true] 2 = None.
Proof. reflexivity. Qed.

Exercise: 1 star, standard, optional (hd_error_poly)

Complete the definition of a polymorphic version of the hd_error function from the last chapter. Be sure that it passes the unit tests below.

Definition hd_error {X : Type} (l : list X) : option X
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
Once again, to force the implicit arguments to be explicit, we can use @ before the name of the function.

Check @hd_error : X : Type, list X option X.

Example test_hd_error1 : hd_error [1;2] = Some 1.
 (* FILL IN HERE *) Admitted.
Example test_hd_error2 : hd_error [[1];[2]] = Some [1].
 (* FILL IN HERE *) Admitted.

Functions as Data

Like most modern programming languages -- especially other "functional" languages, including OCaml, Haskell, Racket, Scala, Clojure, etc. -- Coq treats functions as first-class citizens, allowing them to be passed as arguments to other functions, returned as results, stored in data structures, etc.

Higher-Order Functions

Functions that manipulate other functions are often called higher-order functions. Here's a simple one:

Definition doit3times {X:Type} (f:XX) (n:X) : X :=
  f (f (f n)).
The argument f here is itself a function (from X to X); the body of doit3times applies f three times to some value n.

Check @doit3times : X : Type, (X X) X X.

Example test_doit3times: doit3times minustwo 9 = 3.
Proof. reflexivity. Qed.

Example test_doit3times': doit3times negb true = false.
Proof. reflexivity. Qed.
One common use of higher-order functions is to parameterize a definition with a notion of cost. argmin3 implements the mathematical argmin operation that chooses the argument with the lowest cost.

Definition argmin3 {A:Type} (cost : A nat) (o1 o2 o3 : A) : A :=
  let c1 := cost o1 in
  let c2 := cost o2 in
  let c3 := cost o3 in
  if leb c1 c2
  then if leb c1 c3
       then o1
       else o3
  else if leb c2 c3
       then o2
       else o3.

Filter

Here is a more useful higher-order function, taking a list of Xs and a predicate on X (a function from X to bool) and "filtering" the list, returning a new list containing just those elements for which the predicate returns true.

Fixpoint filter {X:Type} (test: Xbool) (l:list X)
                : (list X) :=
  match l with
  | [][]
  | h :: tif test h then h :: (filter test t)
                        else filter test t
  end.
For example, if we apply filter to the predicate evenb and a list of numbers l, it returns a list containing just the even members of l.

Example test_filter1: filter evenb [1;2;3;4] = [2;4].
Proof. reflexivity. Qed.

Definition length_is_1 {X : Type} (l : list X) : bool :=
  eqb (length l) 1.

Example test_filter2:
    filter length_is_1
           [ [1; 2]; [3]; [4]; [5;6;7]; []; [8] ]
  = [ [3]; [4]; [8] ].
Proof. reflexivity. Qed.
We can use filter to give a concise version of the countoddmembers function from the Lists chapter.

Definition countoddmembers' (l:list nat) : nat :=
  length (filter oddb l).

Example test_countoddmembers'1: countoddmembers' [1;0;3;1;4;5] = 4.
Proof. reflexivity. Qed.
Example test_countoddmembers'2: countoddmembers' [0;2;4] = 0.
Proof. reflexivity. Qed.
Example test_countoddmembers'3: countoddmembers' nil = 0.
Proof. reflexivity. Qed.

Anonymous Functions

It is arguably a little sad, in the example just above, to be forced to define the function length_is_1 and give it a name just to be able to pass it as an argument to filter, since we will probably never use it again. Moreover, this is not an isolated example: when using higher-order functions, we often want to pass as arguments "one-off" functions that we will never use again; having to give each of these functions a name would be tedious.
Fortunately, there is a better way. We can construct a function "on the fly" without declaring it at the top level or giving it a name.

Example test_anon_fun':
  doit3times (fun nn × n) 2 = 256.
Proof. reflexivity. Qed.
The expression (fun n n × n) can be read as "the function that, given a number n, yields n × n."
Here is the filter example, rewritten to use an anonymous function.

Example test_filter2':
    filter (fun leqb (length l) 1)
           [ [1; 2]; [3]; [4]; [5;6;7]; []; [8] ]
  = [ [3]; [4]; [8] ].
Proof. reflexivity. Qed.

Exercise: 2 stars, standard (filter_even_gt7)

Use filter (instead of Fixpoint) to write a Coq function filter_even_gt7 that takes a list of natural numbers as input and returns a list of just those that are even and greater than 7.

Definition filter_even_gt7 (l : list nat) : list nat
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Example test_filter_even_gt7_1 :
  filter_even_gt7 [1;2;6;9;10;3;12;8] = [10;12;8].
 (* FILL IN HERE *) Admitted.

Example test_filter_even_gt7_2 :
  filter_even_gt7 [5;2;6;19;129] = [].
 (* FILL IN HERE *) Admitted.

Exercise: 3 stars, standard (partition)

Use filter to write a Coq function partition:
      partition : X : Type,
                  (Xbool) → list Xlist X × list X
Given a set X, a predicate of type X bool and a list X, partition should return a pair of lists. The first member of the pair is the sublist of the original list containing the elements that satisfy the test, and the second is the sublist containing those that fail the test. The order of elements in the two sublists should be the same as their order in the original list.

Definition partition {X : Type}
                     (test : X bool)
                     (l : list X)
                   : list X × list X
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Example test_partition1: partition oddb [1;2;3;4;5] = ([1;3;5], [2;4]).
Proof. (* FILL IN HERE *) Admitted.

Example test_partition2: partition (fun xfalse) [5;9;0] = ([], [5;9;0]).
Proof. (* FILL IN HERE *) Admitted.

Map

Another handy higher-order function is called map.

Fixpoint map {X Y: Type} (f:XY) (l:list X) : (list Y) :=
  match l with
  | [][]
  | h :: t(f h) :: (map f t)
  end.
It takes a function f and a list l = [n1, n2, n3, ...] and returns the list [f n1, f n2, f n3,...] , where f has been applied to each element of l in turn. For example:

Example test_map1: map (fun xplus 3 x) [2;0;2] = [5;3;5].
Proof. reflexivity. Qed.
The element types of the input and output lists need not be the same, since map takes two type arguments, X and Y; it can thus be applied to a list of numbers and a function from numbers to booleans to yield a list of booleans:

Example test_map2:
  map oddb [2;1;2;5] = [false;true;false;true].
Proof. reflexivity. Qed.
It can even be applied to a list of numbers and a function from numbers to lists of booleans to yield a list of lists of booleans:

Example test_map3:
    map (fun n[evenb n;oddb n]) [2;1;2;5]
  = [[true;false];[false;true];[true;false];[false;true]].
Proof. reflexivity. Qed.

Exercises

Exercise: 1 star, standard (map_length)

Show that map preserves the length of lists.
Lemma map_length :
   A B (f:AB) (l:list A),
    length (map f l) = length l.
Proof.
  (* FILL IN HERE *) Admitted.

Exercise: 3 stars, standard (map_rev)

Show that map and rev commute. You may need to define an auxiliary lemma.

Theorem map_rev : (X Y : Type) (f : X Y) (l : list X),
  map f (rev l) = rev (map f l).
Proof.
  (* FILL IN HERE *) Admitted.

Exercise: 2 stars, standard, recommended (flat_map)

The function map maps a list X to a list Y using a function of type X Y. We can define a similar function, flat_map, which maps a list X to a list Y using a function f of type X list Y. Your definition should work by 'flattening' the results of f, like so:
        flat_map (fun n ⇒ [n;n+1;n+2]) [1;5;10]
      = [1; 2; 3; 5; 6; 7; 10; 11; 12].
Fixpoint flat_map {X Y: Type} (f: X list Y) (l: list X)
                   : (list Y)
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Example test_flat_map1:
  flat_map (fun n[n;n;n]) [1;5;4]
  = [1; 1; 1; 5; 5; 5; 4; 4; 4].
 (* FILL IN HERE *) Admitted.
Lists are not the only inductive type for which map makes sense. Here is a map for the option type:

Definition option_map {X Y : Type} (f : X Y) (xo : option X)
                      : option Y :=
  match xo with
    | NoneNone
    | Some xSome (f x)
  end.

Exercise: 2 stars, standard (map_option)

We can also use lists and options together. Define a function map_option that takes a function f : X option Y and a list of Xs and returns a list of those Ys for which f didn't return None. Take X and Y as implicit arguments.
Prove (in Coq) a lemma named map_option_id showing that map_option (fun x Some x) l = l for all l.

Fixpoint map_option {X Y : Type} (f : X option Y) (l : list X) : list Y
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

(* Do not modify the following line: *)
Definition manual_grade_for_map_option_id : option (nat×string) := None.

Exercise: 2 stars, standard (map_option__map_option')

The flat_map function defined above is a little bit more general than map_option. Use it to implement map_option a different way.

Definition map_option' {X Y : Type} (f: X option Y) (l : list X) : list Y
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Lemma map_option__map_option' : {X Y : Type} (f : X option Y) (l : list X),
    map_option f l = map_option' f l.
Proof.
  (* FILL IN HERE *) Admitted.

Exercise: 2 stars, standard, optional (implicit_args)

The definitions and uses of filter and map use implicit arguments in many places. Replace the curly braces around the implicit arguments with parentheses, and then fill in explicit type parameters where necessary and use Coq to check that you've done so correctly. (This exercise is not to be turned in; it is probably easiest to do it on a copy of this file that you can throw away afterwards.)

Sets from lists

Sets are perhaps the most important mathematical concept after numbers. You've probably heard math-y people talk about them. What is a set?
  • A set is a collection of zero or more distinct elements.
Let's unpack that. A set is a collection, i.e., it's grouping or mass of things. A set has zero or more elements---that is, an empty set could have zero things in it while other sets could have many more. And the elements in a set are distinct, i.e., there's no notion of something occurring more than once in a set. A given element of a set occurs at most once, because having two occurrences of the same element in a set would mean those two elements wouldn't be distinct.
Mathematicians often think of sets in terms of a "is in" predicate, written ∈. For example, here's an axiom characerizing the empty set:
  • forall x, x is not in the empty set
In mathematical notation, we might write:
  • ∀x, x ∉ ∅
Here's a definition of subset relationship, written ⊆:
  • A is a subset of B precisely when forall x, if x is in A, then x is in B
Also written:
  • A ⊆ B iff ∀x, x ∈ A ⇒ x ∈ B
Here's a definition of the union operation, ∪:
  • The union of sets A and B is a set C such that forall x, x is an element of C if and only if x is an element f A or an element of B.
Also written:
  • A ∪ B = C iff ∀x, x ∈ C ⇔ x ∈ A ∨ x ∈ B
These axioms may not make sense to you yet. That's fine! We'll meet them again in Logic. For now, let's develop an operational understanding of sets by defining some of their operations.
We'll work with sets of natural numbers, implementing them as lists.

Definition natset : Type := list nat.

Definition empty_natset : natset := [].

Definition evens_to_ten : natset := [0; 2; 4; 6; 8; 10].
Definition evens_to_ten' : natset := [2; 4; 0; 10; 8; 6].
First, we'll define the "is in" predicate as a function member.

Fixpoint member (x : nat) (l : natset) : bool :=
  match l with
  | []false
  | y::l'eqb x y || member x l'
  end.

Exercise: 2 stars, standard (is_setlike)

Write a function is_setlike that determines whether or not a list of natural numbers is 'setlike', i.e., has no duplicates.
Fixpoint is_setlike (l : natset) : bool
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
(* Do not modify the following line: *)
Definition manual_grade_for_is_setlike : option (nat×string) := None.
Here's an operation that inserts or adds an element to a set.
Definition insert (x : nat) (l : natset) : natset :=
  if member x l
  then l
  else x :: l.
Here's union, which combines two sets so that elements of either consituent set are members of the their union.

Fixpoint union (l1 : natset) (l2 : natset) : natset :=
  match l1 with
  | []l2
  | (x::l1') ⇒ insert x (union l1' l2)
  end.

Exercise: 3 stars, standard (intersection)

Define intersection. The intersection of sets A and B, written A ∩ B, is defined as containing those elements that are in both A and B.
Fixpoint intersection (l1 l2 : natset) : natset
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
(* Do not modify the following line: *)
Definition manual_grade_for_intersection : option (nat×string) := None.

Exercise: 3 stars, standard (subset)

Now define the subset predicate. We say the set A is a subset of a set B when everything that's a member of A is a member of B.
Fixpoint subset (l1 l2 : natset) : bool
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
(* Do not modify the following line: *)
Definition manual_grade_for_subset : option (nat×string) := None.

Definition remove (x : nat) (l : natset) : natset :=
  filter (fun ynegb (eqb x y)) l.

Case study: DNA strands and amino acids

A strand of DNA is just a list of bases. We might ask a variety of questions about DNA strands: are they equal or complementary?

Definition strand : Type := list base.

Definition eq_base (b1 b2 : base) : bool :=
  match (b1, b2) with
  | (C, C)true
  | (G, G)true
  | (A, A)true
  | (T, T)true
  | (_, _)false
  end.

Exercise: 1 star, standard (eq_base_refl)

Lemma eq_base_refl : (b : base),
    eq_base b b = true.
Proof.
  (* FILL IN HERE *) Admitted.

Fixpoint eq_strand (dna1 dna2 : strand) : bool :=
  match (dna1, dna2) with
  | ([], [])true
  | ([], _)false
  | (_, [])false
  | (b1 :: dna1', b2 :: dna2')
    eq_base b1 b2 && eq_strand dna1' dna2'
  end.

Exercise: 1 star, standard (eq_strand_refl)

Lemma eq_strand_refl : (dna : strand),
    eq_strand dna dna = true.
Proof.
  (* FILL IN HERE *) Admitted.

Fixpoint complementary (dna1 dna2 : strand) : bool :=
  match (dna1, dna2) with
  | ([], [])true
  | ([], _)false
  | (_, [])false
  | (b1 :: dna1', b2 :: dna2')
    eq_base b1 (complement b2) && complementary dna1' dna2'
  end.

Exercise: 2 stars, standard (complementary_complementary')

Define a different version of complementary that doesn't use recursion. (Hint: try using map!)
Prove that the two versions are equivalent.
Definition complementary' (dna1 dna2 : strand) : bool
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Lemma complementary_complementary' : dna1 dna2,
    complementary dna1 dna2 = complementary' dna1 dna2.
Proof.
  (* FILL IN HERE *) Admitted.

Exercise: 1 star, standard (complement_correct)

It's always good to check that your functions do what you think they do! Does our complementary predicate agree with our definition of complement?
Lemma complement_correct : (dna : strand),
    complementary dna (map complement dna) = true.
Proof.
  (* FILL IN HERE *) Admitted.
DNA (modeled as base) encodes, among other things, amino acids. Three DNA nucleotides encode a single codon: either an amino acid (like tryptophan (Trp) or glutamine (Glu)) or a sequence marker like STOP. These sequence of amino acids are formed into a protein by the ribosome.

Inductive codon : Type :=
| Ala
| Cys
| Asp
| Glu
| Phe
| Gly
| His
| Ile
| Lys
| Leu
| Met
| Asn
| Pro
| Gln
| Arg
| Ser
| Thr
| Val
| Trp
| Tyr
| STOP.
The truth is that the ribosome doesn't work on DNA, it works on RNA, which uses slightly different bases, replacing thymine (T) with uracil (U). To simplify things, we'll just work with DNA. Later on, we'll want some helper functions on codons, so we'll just define them now.

Definition is_amino (amino : codon) : bool :=
  match amino with
  | STOPfalse
  | _true
  end.

Fixpoint eq_codon (amino1 amino2 : codon) : bool :=
  match (amino1, amino2) with
  | (Ala, Ala)true
  | (Cys, Cys)true
  | (Asp, Asp)true
  | (Glu, Glu)true
  | (Phe, Phe)true
  | (Gly, Gly)true
  | (His, His)true
  | (Ile, Ile)true
  | (Lys, Lys)true
  | (Leu, Leu)true
  | (Met, Met)true
  | (Asn, Asn)true
  | (Pro, Pro)true
  | (Gln, Gln)true
  | (Arg, Arg)true
  | (Ser, Ser)true
  | (Thr, Thr)true
  | (Val, Val)true
  | (Trp, Trp)true
  | (Tyr, Tyr)true
  | (STOP, STOP)true
  | (_, _)false
  end.

Exercise: 3 stars, standard (encode_one)

Using the DNA codon table from Wikipedia(https://en.wikipedia.org/wiki/DNA_codon_table), implement a function that takes three nucleotides and returns the corresponding codon.
Programming is often tedious and repetitive. It pays to be careful and methodical when writing this long function.
Definition encode_one (b1 b2 b3 : base) : codon (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Example encode_one_eg1 :
  encode_one T A G = STOP.
(* FILL IN HERE *) Admitted.
(* 
Proof. reflexivity. Qed.
*)


Example encode_one_eg2 :
  encode_one G A T = Asp.
(* FILL IN HERE *) Admitted.
(* 
Proof. reflexivity. Qed.
*)


Example encode_one_eg3 :
  encode_one A T G = Met.
(* FILL IN HERE *) Admitted.
(* 
Proof. reflexivity. Qed.
*)


Lemma encode_rev_Ala : b1 b2 b3,
    encode_one b1 b2 b3 = Ala
    b1 = G b2 = C.
(* FILL IN HERE *) Admitted.
(* 
Proof.
  intros b1 b2 b3 H.
  (* Feel free to explore this proof, but there are TONS of
     cases. (Can you count how many?)  We offer the following proof
     script which might take a few seconds to run, but automatically
     discharges all of the obligations. Yikes! *)

  destruct b1; destruct b2; destruct b3; try discriminate; split; reflexivity.
Qed.
*)

Exercise: 2 stars, standard (encode)

Write a function that tries to translate a DNA strand into a list of codons. Since encode_one needs three bases to work, we'll run into a problem if we don't have the right number of nucleotides in the list. If you end up with a funny number of nucleotides (i.e., not a multiple of 3), you should return None.
Fixpoint encode (dna : strand) : option (list codon) (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Lemma encode_eg1 : encode [A] = None.
(* FILL IN HERE *) Admitted.
(* 
Proof. reflexivity. Qed.
*)


Lemma encode_eg2 : encode [A; T; G; C; G; T; T; A; T; T; A; G] = Some [Met; Arg; Tyr; STOP].
(* FILL IN HERE *) Admitted.
(* 
Proof. reflexivity. Qed.
*)

Case study: DNA edit distance

A key biological question of interest is similarity. How similar are two individuals of the same species? How similar are two species? When we see similarities or differences in expressed behavior (phenotype), can we trace these to corresponding similarities or differences in genetics (genotype)?
To ask these questions in a formal way, we need measures of similarity. One popular quantitative measure for any list-based data is called edit distance. Given two strands of DNA, how many edits do we need to make to get from one to the other?
For example, consider the following DNA sequences.

Definition dna_src : strand := [G; C; A; T].
Definition dna_tgt1 : strand := [T; C; A; T].
Definition dna_tgt2 : strand := [C; A; T].
Definition dna_tgt3 : strand := [C; A; T; G].
What edits might we need to make to get from dna_src to each of the dna_tgts?
To change dna_src into dna_tgt1, we should replace the first G, substituting it a T.
To change dna_src into dna_tgt2, we should delete the first G.
To change dna_src into dna_tgt3, we should delete the first G and add a G at the end.
We can formalize this idea explicitly: we'll define a type edit and say what it means to 'apply' an edit.
Here's what an edit is: you can either copy a nucleotide, delete a nucleotide, add a nucleotide, or substitute a nucleotide for what was already there.

Inductive edit : Type :=
| copy
| delete
| add (e : base)
| substitute (e : base).
These aren't the only edits we could have defined. For example, we don't need substitute, since we can always delete and then add (or vice versa). We could add a move edit that somehow said where to move the current base (i.e., in changing dna_src to dna_tgt3, we could say that that G moves to the end).
We've chosen these edits because they correspond to the edits invented by Vladimir Levenshtein in 1966, and are used to compute the widely used Levenshtein distance (see https://en.wikipedia.org/wiki/Levenshtein_distance). It's worth noting that this distance is very useful in computational applications in a variety of domains, but (according to folklore), Levenshtein didn't get to use computers at his Soviet institute!
In order to justify substitute's presence when add and delete would do, we define a notion of cost. It's free to copy, but every other edit has a cost of 1.

Definition cost (edit : edit) : nat :=
  match edit with
  | copy ⇒ 0
  | delete ⇒ 1
  | add _ ⇒ 1
  | substitute _ ⇒ 1
  end.
Given a list of edits, the cost is just the sum of the costs of every constituent edit.
Fixpoint total_cost (edits : list edit) : nat :=
  match edits with
  | [] ⇒ 0
  | e::edits'cost e + total_cost edits'
  end.
We've only given an intuition for edits. How do they actually work? We must define what it means to apply an edit. We'll do it in two parts: first, given an edit and a strand of DNA we're editing, apply_edit returns two things: first, an optional nucleotide which will appear at the front of the new, edited strand; and second, a (possibly modified) DNA strand that we're working on.

Definition apply_edit (edit : edit) (orig : strand) : option base × strand :=
  match edit with
  | copy
    match orig with
    | [](None, [])
    | b::orig'(Some b, orig')
    end
  | delete(None, match orig with
                     | [][]
                     | _::orig'orig'
                     end)
  | add b(Some b, orig)
  | substitute b(Some b, match orig with
                             | [][] (* just act like add *)
                             | _::orig'orig'
                             end)
  end.
It's worth paying close attention to this function, as there are several corner cases.
  • copy has two possibilities. Either the strand we're editing is done, in which there's nothing to add and nothing to continue with... or the strand has some base b at the front, which (a) we'll make sure to copy to the front of the new strand (Some b), and (b) we'll return the rest of the strand (orig').
  • delete is slightly simpler. We'll never add anything to the front (None), and we'll knock off the base at the front of the strand we're working with, returning whatever may be left (orig').
  • add b is the simplest case: add b to the front and leave the strand we're editing alone.
  • susbtitute b is trickier. We'll put b at the front no matter what (Some b), but what should we do if we're supposed to substitute b but the strand we're editing is empty? We choose to shrug and say, "That's fine, we'll pretend you meant add b and not worry about having nothing to substitute for." If, on the other hand, orig = b'::orig', then we'll ignore b' (which is what we substituted for) and give orig' to keep editing.
Once we know how to apply an individual edit, it's easy enough to apply a list of edits. We walk down the list and, for each edit, we see what should be added to the front (new) and what remains of the strand of DNA we're editing (orig').

Fixpoint apply_edits (orig : strand) (edits : list edit) : strand :=
  match edits with
  | []orig
  | edit::edits'
    let (new, orig') := apply_edit edit orig in
    match new with
    | Noneapply_edits orig' edits'
    | Some bb::apply_edits orig' edits'
    end
  end.
With a notion of edits in hand, let's verify that our formal model matches our intuition. Can we come up with the 'valid' edits that match our informal descriptions above?

Definition valid_edit (src : strand) (tgt : strand) (edits : list edit) :=
  eq_strand (apply_edits src edits) tgt = true.

Definition edit_tgt1 : list edit := [substitute T; copy; copy; copy].

Lemma valid_edit1 : valid_edit dna_src dna_tgt1 edit_tgt1.
Proof. reflexivity. Qed.

Definition edit_tgt1_worse : list edit := [delete; add T; copy; copy; copy].

Lemma valid_edit1_worse : valid_edit dna_src dna_tgt1 edit_tgt1_worse.
Proof. reflexivity. Qed.

Lemma edit1_worse_actually_worse :
  leb (total_cost edit_tgt1) (total_cost edit_tgt1_worse) = true.
Proof. reflexivity. Qed.

Definition edit_tgt1_same : list edit := [substitute T].

Lemma edit1_same :
  total_cost edit_tgt1 = total_cost edit_tgt1_same.
Proof. reflexivity. Qed.

Exercise: 3 stars, standard (edit_tgt23)

Write edits that take dna_src to dna_tgt2 and dna_tgt3. Your edits should be minimal, i.e., the lowest cost possible, while still being valid.

Definition edit_tgt2 : list edit (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Example edit_tgt2_valid :
  valid_edit dna_src dna_tgt2 edit_tgt2.
(* FILL IN HERE *) Admitted.
(* 
Proof. reflexivity. Qed.
*)


Definition edit_tgt3 : list edit (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Example edit_tgt3_valid :
  valid_edit dna_src dna_tgt3 edit_tgt3.
(* FILL IN HERE *) Admitted.
(* 
Proof. reflexivity. Qed.
*)


(* Do not modify the following line: *)
Definition manual_grade_for_edit_tgt23_minimal : option (nat×string) := None.

Exercise: 3 stars, standard (add_delete)

With our notion of edits in hand, we can contemplate defining algorithms that compute edits from one strand to another. Let's begin with the simplest one: the delete_add_edit.
To go from src to tgt, first delete everything in src and then add everything in tgt.
Now, delete_add_edit won't be minimal, but it's a place to start!
We'll define it in three parts: delete_edit takes a src strand and produces the correct number of delete edits; add_edit takes a tgt strand and produces the correct number of add edits with the right bases; and delete_add_edit combines the two.
NOTE: your solutions should:
(a) be only line each
(b) for delete_edit and add_edit, use the map function
(c) for delete_add_edit, should make use of delete_edit and add_edit.

Definition delete_edit (src : strand) : list edit
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Definition add_edit (tgt : strand) : list edit
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Definition delete_add_edit (src : strand) (tgt : strand) : list edit
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

(* Do not modify the following line: *)
Definition manual_grade_for_delete_edit : option (nat×string) := None.
(* Do not modify the following line: *)
Definition manual_grade_for_add_edit : option (nat×string) := None.
(* Do not modify the following line: *)
Definition manual_grade_for_delete_add_edit : option (nat×string) := None.

Functions That Construct Functions

Most of the higher-order functions we have talked about so far take functions as arguments. Let's look at some examples that involve returning functions as the results of other functions. To begin, here is a function that takes a value x (drawn from some type X) and returns a function from nat to X that yields x whenever it is called, ignoring its nat argument.

Definition constfun {X: Type} (x: X) : natX :=
  fun (k:nat) ⇒ x.

Definition ftrue := constfun true.

Example constfun_example1 : ftrue 0 = true.
Proof. reflexivity. Qed.

Example constfun_example2 : (constfun 5) 99 = 5.
Proof. reflexivity. Qed.
In fact, the multiple-argument functions we have already seen are also examples of passing functions as data. To see why, recall the type of plus.

Check plus : nat nat nat.
Each in this expression is actually a binary operator on types. This operator is right-associative, so the type of plus is really a shorthand for nat (nat nat) -- i.e., it can be read as saying that "plus is a one-argument function that takes a nat and returns a one-argument function that takes another nat and returns a nat." In the examples above, we have always applied plus to both of its arguments at once, but if we like we can supply just the first. This is called partial application.

Definition plus3 := plus 3.
Check plus3 : nat nat.

Example test_plus3 : plus3 4 = 7.
Proof. reflexivity. Qed.
Example test_plus3' : doit3times plus3 0 = 9.
Proof. reflexivity. Qed.
Example test_plus3'' : doit3times (plus 3) 0 = 9.
Proof. reflexivity. Qed.

Additional Exercises


Module Exercises.

Exercise: 2 stars, advanced (currying)

In Coq, a function f : A B C really has the type A (B C). That is, if you give f a value of type A, it will give you function f' : B C. If you then give f' a value of type B, it will return a value of type C. This allows for partial application, as in plus3. Processing a list of arguments with functions that return functions is called currying, in honor of the logician Haskell Curry.
Conversely, we can reinterpret the type A B C as (A × B) C. This is called uncurrying. With an uncurried binary function, both arguments must be given at once as a pair; there is no partial application. We can define currying as follows:

Definition prod_curry {X Y Z : Type}
  (f : X × Y Z) (x : X) (y : Y) : Z := f (x, y).
As an exercise, define its inverse, prod_uncurry. Then prove the theorems below to show that the two are inverses.

Definition prod_uncurry {X Y Z : Type}
  (f : X Y Z) (p : X × Y) : Z
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
As a (trivial) example of the usefulness of currying, we can use it to shorten one of the examples that we saw above:

Example test_map1': map (plus 3) [2;0;2] = [5;3;5].
Proof. reflexivity. Qed.
Thought exercise: before running the following commands, can you calculate the types of prod_curry and prod_uncurry?

Check @prod_curry.
Check @prod_uncurry.

Theorem uncurry_curry : (X Y Z : Type)
                        (f : X Y Z)
                        x y,
  prod_curry (prod_uncurry f) x y = f x y.
Proof.
  (* FILL IN HERE *) Admitted.

Theorem curry_uncurry : (X Y Z : Type)
                        (f : (X × Y) Z) (p : X × Y),
  prod_uncurry (prod_curry f) p = f p.
Proof.
  (* FILL IN HERE *) Admitted.

End Exercises.

(* Mon Apr 6 09:16:55 PDT 2020 *)