Day05_lists

Require Export DMFP.Day04_structures.

Lists

Since lists are so important, we're going to spend a whole day working with them. We'll start by defining some nicer notation for lists, write some fancier functions, and then do a short bioinformatics case study.
Using argument synthesis and implicit arguments, we can define convenient notation for lists, as before. Since we have made the constructor type arguments implicit, Coq will know to automatically infer these when we use the notations.
Notation "x :: y" := (cons x y)
                     (at level 60, right associativity).
Notation "[ ]" := nil.
Notation "[ x ; .. ; y ]" := (cons x .. (cons y []) ..).
Notation "x ++ y" := (app x y)
                     (at level 60, right associativity).
Now lists can be written concisely:
Definition list123 := @cons nat 1 (@cons nat 2 (@cons nat 3 (@nil nat))).
Definition list123' := @cons _ 1 (@cons _ 2 (@cons _ 3 (@nil _))).
Definition list123'' := cons 1 (cons 2 (cons 3 nil)).
All that's fine. But check this out.
Definition list123''' := [1; 2; 3]. (* Oh, very nice. *)
Such a notation is very common for lists and arrays to use this "concrete, element-wise" notation---a wide variety of programming languages use it.
The general idea is that:

   [ v1; v2; ...; vn-1; vn ]
is the same as

   cons v1 (cons v2 (cons ... (cons vn-1 (cons vn nil))))
It's worth noting: when using concrete notation, you only write the values, never the tail of the list.
But it's easy to get confused! The following quizzes are meant to hone your understanding.

Combine and split

The following function takes two lists and combines them into a list of pairs. In other functional languages, it is often called zip; we call it combine for consistency with Coq's standard library.
Fixpoint combine {X Y : Type} (lx : list X) (ly : list Y)
           : list (X×Y) :=
  match lx, ly with
  | [], _[]
  | _, [][]
  | x :: tx, y :: ty(x, y) :: (combine tx ty)
  end.
Try answering the following questions on paper and checking your answers in Coq:
  • What is the type of combine (i.e., what does Check @combine print?)
  • What does
            Compute (combine [1;2] [false;false;true;true]). print?

Exercise: 2 stars, standard (split)

The function split is the right inverse of combine: it takes a list of pairs and returns a pair of lists. In many functional languages, it is called unzip.
Fill in the definition of split below. Make sure it passes the given unit test.
Fixpoint split {X Y : Type} (l : list (X×Y))
               : (list X) × (list Y)
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Compute split [(1,false);(2,false)]. (* ==> (1;2,false;false) *)

Filter

Here is a more useful higher-order function, taking a list of Xs and a predicate on X (a function from X to bool) and "filtering" the list, returning a new list containing just those elements for which the predicate returns true.
Fixpoint filter {X:Type} (test: Xbool) (l:list X)
                : (list X) :=
  match l with
  | [][]
  | h :: tif test h then h :: (filter test t)
                        else filter test t
  end.
For example, if we apply filter to the predicate evenb and a list of numbers l, it returns a list containing just the even members of l.
Compute (filter evenb [1;2;3;4]). (* ==> 2;4 *)

Definition length_is_1 {X : Type} (l : list X) : bool :=
  eqb (length l) 1.

Compute (filter length_is_1 [ [1; 2]; [3]; [4]; [5;6;7]; []; [8] ]). (* ==>  [3]; [4]; [8]  *)
We can use filter to give a concise version of the countoddmembers function from the Lists chapter.
Definition countoddmembers' (l:list nat) : nat :=
  length (filter oddb l).

Compute (countoddmembers' [1;0;3;1;4;5]). (* ==> 4 *)
Compute (countoddmembers' [0;2;4]). (* ==> 0 *)
Compute (countoddmembers' nil). (* ==> 0 *)

Higher-order Functions and Anonymous Functions: Together Forever

It is arguably a little sad, in the example just above, to be forced to define the function length_is_1 and give it a name just to be able to pass it as an argument to filter, since we will probably never use it again. Moreover, this is not an isolated example: when using higher-order functions, we often want to pass as arguments "one-off" functions that we will never use again; having to give each of these functions a name would be tedious.
Fortunately, we've already seen a better way. We can construct a function "on the fly" without declaring it at the top level or giving it a name.
Compute (filter (fun leqb (length l) 1) [ [1; 2]; [3]; [4]; [5;6;7]; []; [8] ]).
(* ==>  [3]; [4]; [8]  *)

Exercise: 2 stars, standard (filter_even_gt7)

Use filter (instead of Fixpoint) to write a Coq function filter_even_gt7 that takes a list of natural numbers as input and returns a list of just those that are even and greater than 7.
Definition filter_even_gt7 (l : list nat) : list nat
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
Compute (filter_even_gt7 [1;2;6;7;9;10;3;12;8]). (* ==> 10;12;8 *)
Compute (filter_even_gt7 [5;2;6;7;19;129]). (* ==>  *)

Exercise: 3 stars, standard (partition)

Use filter to write a Coq function partition:
      partition : X : Type,
                  (Xbool) → list Xlist X × list X
Given a set X, a predicate of type X bool and a list X, partition should return a pair of lists. The first member of the pair is the sublist of the original list containing the elements that satisfy the test, and the second is the sublist containing those that fail the test. The order of elements in the two sublists should be the same as their order in the original list.
Your definition should not be recursive: just use filter!
Definition partition {X : Type}
                     (test : X bool)
                     (l : list X)
                   : list X × list X
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Compute (partition oddb [1;2;3;4;5]). (* ==> (1;3;52;4) *)
Compute (partition (fun xfalse) [5;9;0]). (* ==> (5;9;0) *)

Map

Another handy higher-order function is called map. Perhaps you've heard of systems like MapReduce or Hadoop---they're massive, distributed systems that implement efficient computations on huge datasets using what are common higher-order functions like map and reduce (a/k/a fold---we won't really talk about it).
Fixpoint map {X Y: Type} (f:XY) (l:list X) : (list Y) :=
  match l with
  | [][]
  | h :: t(f h) :: (map f t)
  end.
It takes a function f and a list l = [n1, n2, n3, ...] and returns the list [f n1, f n2, f n3,...] , where f has been applied to each element of l in turn. For example:
Compute (map (fun xplus 3 x) [2;0;2]).
(* ==> 5;3;5 *)
The element types of the input and output lists need not be the same, since map takes two type arguments, X and Y; it can thus be applied to a list of numbers and a function from numbers to booleans to yield a list of booleans:
Compute (map oddb [2;1;2;5]).
(* ==> false;true;false;true *)
It can even be applied to a list of numbers and a function from numbers to lists of booleans to yield a list of lists of booleans:
Compute (map (fun n[evenb n;oddb n]) [2;1;2;5]).
(* ==> [true;false];[false;true];[true;false];[false;true] *)
Lists are not the only inductive type for which map makes sense. Here is a map for the option type:
Definition option_map {X Y : Type} (f : X Y) (xo : option X)
                      : option Y :=
  match xo with
    | NoneNone
    | Some xSome (f x)
  end.

Exercise: 2 stars, standard (map_option)

We can also use lists and options together. Define a function map_option that takes a function f : X option Y and a list of Xs and returns a list of those Ys for which f didn't return None. Take X and Y as implicit arguments.
Fixpoint map_option {X Y : Type} (f : X option Y) (l : list X) : list Y
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Exercise: 2 stars, standard (flat_map)

The function map maps a list X to a list Y using a function of type X Y. We can define a similar function, flat_map, which maps a list X to a list Y using a function f of type X list Y. Your definition should work by 'flattening' the results of f, like so:
        flat_map (fun n ⇒ [n;n+1;n+2]) [1;5;10]
      = [1; 2; 3; 5; 6; 7; 10; 11; 12].
Fixpoint flat_map {X Y: Type} (f: X list Y) (l: list X)
                   : (list Y)
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Compute (flat_map (fun n[n;n;n]) [1;5;4]).
(* ==> 1; 1; 1; 5; 5; 5; 4; 4; 4 *)

Exercise: 2 stars, standard, optional (map_option')

The flat_map function defined above is a little bit more general than map_option. Define a function map_option' that has map_option's type, but uses flat_map to do the work. You'll need to somehow take f : X option Y and produce some other function of type X list Y.
Definition map_option' {X Y : Type} (f: X option Y) (l : list X) : list Y
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Higher-order predicates: forallb and existsb

Exercise: 4 stars, standard (forall_exists)

Define two recursive Fixpoints, forallb and existsb. The first checks whether every element in a list satisfies a given predicate:
      forallb oddb [1;3;5;7;9] = true

      forallb negb [false;false] = true

      forallb evenb [0;2;4;5] = false

      forallb (eqb 5) [] = true
The second checks whether there exists an element in the list that satisfies a given predicate:
      existsb (eqb 5) [0;2;3;6] = false

      existsb (andb true) [true;true;false] = true

      existsb oddb [1;0;0;0;0;3] = true

      existsb evenb [] = false
Next, define a nonrecursive version of existsb -- call it existsb' -- using forallb and negb. (You shouldn't call any functions other than these two.)
Remember: even if you don't do this part of the homework, you should write Definition forallb TYPE GOES HERE. Admitted. so that the autograder works. If you're stuck figuring out the type, ask for help.
(* FILL IN HERE *)
(* Do not modify the following line: *)
Definition manual_grade_for_forall_exists : option (nat×string) := None.

Case study: DNA strands and amino acids

A strand of DNA is just a list of bases. We might ask a variety of questions about DNA strands: are they equal or complementary?
Definition strand : Type := list base.
We'll redefine eq_base here, to make sure we're all on the same page. Note how we match on the pair to only specify the true cases, using _ as a "catch-all" or "default" for the remaining false cases.
Definition eq_base (b1 b2 : base) : bool :=
  match (b1, b2) with
  | (C, C)true
  | (G, G)true
  | (A, A)true
  | (T, T)true
  | (_, _)false
  end.
Now we can define predicates for strand equality (eq_strand) and complementarity (complementary).
Fixpoint eq_strand (dna1 dna2 : strand) : bool :=
  match (dna1, dna2) with
  | ([], [])true
  | ([], _)false
  | (_, [])false
  | (b1 :: dna1', b2 :: dna2')
    eq_base b1 b2 && eq_strand dna1' dna2'
  end.

Fixpoint complementary (dna1 dna2 : strand) : bool :=
  match (dna1, dna2) with
  | ([], [])true
  | ([], _)false
  | (_, [])false
  | (b1 :: dna1', b2 :: dna2')
    eq_base b1 (complement b2) && complementary dna1' dna2'
  end.

Exercise: 2 stars, standard, optional (complementary')

Define a different version of complementary that doesn't use recursion; you may of course call a recursive function. (Hint: try using map!)
Definition complementary' (dna1 dna2 : strand) : bool
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
DNA (modeled as base) encodes, among other things, amino acids. Three DNA nucleotides encode a single codon: either an amino acid (like tryptophan (Trp) or glutamine (Glu)) or a sequence marker like STOP. These sequence of amino acids are formed into a protein by the ribosome.
Inductive codon : Type :=
| Ala
| Cys
| Asp
| Glu
| Phe
| Gly
| His
| Ile
| Lys
| Leu
| Met
| Asn
| Pro
| Gln
| Arg
| Ser
| Thr
| Val
| Trp
| Tyr
| STOP.
The truth is that the ribosome doesn't work on DNA, it works on RNA, which uses slightly different bases, replacing thymine (T) with uracil (U). To simplify things, we'll just work with DNA.
Using the DNA codon table from Wikipedia https://en.wikipedia.org/wiki/DNA_codon_table), let's implement a function that takes three nucleotides and returns the corresponding codon.
Programming is often tedious and repetitive. It pays to be careful and methodical when writing this long function. But... there are definitely places you can save pattern matches!
Definition encode_one (b1 b2 b3 : base) : codon :=
  match b1 with
  | T
    match b2 with
    | Tmatch b3 with
           | TPhe
           | CPhe
           | ALeu
           | GLeu
           end
    | CSer
    | Amatch b3 with
           | TTyr
           | CTyr
           | ASTOP (* Ochre *)
           | GSTOP (* Amber *)
           end
    | Gmatch b3 with
           | TCys
           | CCys
           | ASTOP (* Opal *)
           | GTrp
           end
    end
  | C
    match b2 with
    | TLeu
    | CPro
    | Amatch b3 with
           | THis
           | CHis
           | AGln
           | GGln
           end
    | GArg
    end
  | A
    match b2 with
    | Tmatch b3 with
           | TIle
           | CIle
           | AIle
           | GMet
           end
    | CThr
    | Amatch b3 with
           | TAsn
           | CAsn
           | ALys
           | GLys
           end
    | Gmatch b3 with
           | TSer
           | CSer
           | AArg
           | GArg
           end
    end
  | G
    match b2 with
    | TVal
    | CAla
    | Amatch b3 with
           | TAsp
           | CAsp
           | AGlu
           | GGlu
           end
    | GGly
    end
  end.

Compute (encode_one T A G). (* ===> STOP *)

Compute (encode_one G A T). (* ===> Asp *)

Compute (encode_one A T G). (* ===> Met *)

Exercise: 2 stars, standard (encode)

Write a function that tries to translate a DNA strand into a list of codons. Since encode_one needs three bases to work, we'll run into a problem if we don't have the right number of nucleotides in the list. If you end up with a funny number of nucleotides (i.e., not a multiple of 3), you should return None.
Fixpoint encode (dna : strand) : option (list codon) (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Compute (encode [A]). (* ==> None *)
Compute (encode [A; T; G; C; G; T; T; A; T; T; A; G]). (* ==> Some Met; Arg; Tyr; STOP *)

(* 2021-09-28 15:37 *)