Lists

Require Export DMFP.Structures.
Since lists are so important, we're going to spend a whole day working with them. We'll start by defining some nicer notation for lists, write some fancier functions, and then do a short bioinformatics case study.
Using argument synthesis and implicit arguments, we can define convenient notation for lists, as before. Since we have made the constructor type arguments implicit, Coq will know to automatically infer these when we use the notations.
Notation "x :: y" := (cons x y)
                     (at level 60, right associativity).
Notation "[ ]" := nil.
Notation "[ x ; .. ; y ]" := (cons x .. (cons y []) ..).
Notation "x ++ y" := (app x y)
                     (at level 60, right associativity).
Now lists can be written concisely:
Definition list123 := @cons nat 1 (@cons nat 2 (@cons nat 3 (@nil nat))).
Definition list123' := @cons _ 1 (@cons _ 2 (@cons _ 3 (@nil _))).
Definition list123'' := cons 1 (cons 2 (cons 3 nil)).
All that's fine. But check this out.
Definition list123''' := [1; 2; 3]. (* Oh, very nice. *)
Such a notation is very common for lists and arrays to use this "concrete, element-wise" notation---a wide variety of programming languages use it.
The general idea is that:

   [ v1; v2; ...; vn-1; vn ]
is the same as

   cons v1 (cons v2 (cons ... (cons vn-1 (cons vn nil))))
It's worth noting: when using concrete notation, you only write the values, never the tail of the list.
But it's easy to get confused! The following quizzes are meant to hone your understanding.

Case study: DNA strands and amino acids

A strand of DNA is just a list of bases. We might ask a variety of questions about DNA strands: are they equal or complementary?
Definition strand : Type := list base.
We'll redefine eq_base here, to make sure we're all on the same page. Note how we match on the pair to only specify the true cases, using _ as a "catch-all" or "default" for the remaining false cases.
Definition eq_base (b1 b2 : base) : bool :=
  match (b1, b2) with
  | (C, C)true
  | (G, G)true
  | (A, A)true
  | (T, T)true
  | (_, _)false
  end.
Now we can define predicates for strand equality (eq_strand) and complementarity (complementary).
Fixpoint eq_strand (dna1 dna2 : strand) : bool :=
  match (dna1, dna2) with
  | ([], [])true
  | ([], _)false
  | (_, [])false
  | (b1 :: dna1', b2 :: dna2')
    eq_base b1 b2 && eq_strand dna1' dna2'
  end.

Fixpoint complementary (dna1 dna2 : strand) : bool :=
  match (dna1, dna2) with
  | ([], [])true
  | ([], _)false
  | (_, [])false
  | (b1 :: dna1', b2 :: dna2')
    eq_base b1 (complement b2) && complementary dna1' dna2'
  end.
DNA (modeled as base) encodes, among other things, amino acids. Three DNA nucleotides encode a single codon: either an amino acid (like tryptophan (Trp) or glutamine (Glu)) or a sequence marker like STOP. These sequence of amino acids are formed into a protein by the ribosome.
Inductive codon : Type :=
| Ala
| Cys
| Asp
| Glu
| Phe
| Gly
| His
| Ile
| Lys
| Leu
| Met
| Asn
| Pro
| Gln
| Arg
| Ser
| Thr
| Val
| Trp
| Tyr
| STOP.
The truth is that the ribosome doesn't work on DNA, it works on RNA, which uses slightly different bases, replacing thymine (T) with uracil (U). To simplify things, we'll just work with DNA.
Using the DNA codon table from Wikipedia https://en.wikipedia.org/wiki/DNA_codon_table), let's implement a function that takes three nucleotides and returns the corresponding codon.
Programming is often tedious and repetitive. It pays to be careful and methodical when writing this long function. But... there are definitely places you can save pattern matches!
Definition encode_one (b1 b2 b3 : base) : codon :=
  match b1 with
  | T
    match b2 with
    | Tmatch b3 with
           | TPhe
           | CPhe
           | ALeu
           | GLeu
           end
    | CSer
    | Amatch b3 with
           | TTyr
           | CTyr
           | ASTOP (* Ochre *)
           | GSTOP (* Amber *)
           end
    | Gmatch b3 with
           | TCys
           | CCys
           | ASTOP (* Opal *)
           | GTrp
           end
    end
  | C
    match b2 with
    | TLeu
    | CPro
    | Amatch b3 with
           | THis
           | CHis
           | AGln
           | GGln
           end
    | GArg
    end
  | A
    match b2 with
    | Tmatch b3 with
           | TIle
           | CIle
           | AIle
           | GMet
           end
    | CThr
    | Amatch b3 with
           | TAsn
           | CAsn
           | ALys
           | GLys
           end
    | Gmatch b3 with
           | TSer
           | CSer
           | AArg
           | GArg
           end
    end
  | G
    match b2 with
    | TVal
    | CAla
    | Amatch b3 with
           | TAsp
           | CAsp
           | AGlu
           | GGlu
           end
    | GGly
    end
  end.

Compute (encode_one T A G). (* ===> STOP *)

Compute (encode_one G A T). (* ===> Asp *)

Compute (encode_one A T G). (* ===> Met *)

Exercise: 2 stars, standard (encode)

Write a function that tries to translate a DNA strand into a list of codons. Since encode_one needs three bases to work, we'll run into a problem if we don't have the right number of nucleotides in the list. If you end up with a funny number of nucleotides (i.e., not a multiple of 3), you should return None.
Fixpoint encode (dna : strand) : option (list codon) (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Compute (encode [A]). (* ==> None *)
Compute (encode [A; T; G; C; G; T; T; A; T; T; A; G]). (* ==> Some Met; Arg; Tyr; STOP *)

Combine and split

The following function takes two lists and combines them into a list of pairs. In other functional languages, it is often called zip; we call it combine for consistency with Coq's standard library.
Fixpoint combine {X Y : Type} (lx : list X) (ly : list Y)
           : list (X×Y) :=
  match lx, ly with
  | [], _[]
  | _, [][]
  | x :: tx, y :: ty(x, y) :: (combine tx ty)
  end.
Try answering the following questions on paper and checking your answers in Coq:
  • What is the type of combine (i.e., what does Check @combine print?)
  • What does
            Compute (combine [1;2] [false;false;true;true]). print?

Exercise: 2 stars, standard (split)

The function split is the right inverse of combine: it takes a list of pairs and returns a pair of lists. In many functional languages, it is called unzip.
Fill in the definition of split below. Make sure it passes the given unit test.
Fixpoint split {X Y : Type} (l : list (X×Y))
               : (list X) × (list Y)
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Compute split [(1,false);(2,false)]. (* ==> (1;2,false;false) *)

Higher-order predicates: forallb and existsb

Exercise: 3 stars, standard (forall_exists)

Define two recursive Fixpoints, forallb and existsb. The first checks whether every element in a list satisfies a given predicate:
      forallb oddb [1;3;5;7;9] = true

      forallb negb [false;false] = true

      forallb evenb [0;2;4;5] = false

      forallb (eqb 5) [] = true
The second checks whether there exists an element in the list that satisfies a given predicate:
      existsb (eqb 5) [0;2;3;6] = false

      existsb (andb true) [true;true;false] = true

      existsb oddb [1;0;0;0;0;3] = true

      existsb evenb [] = false
Next, define a nonrecursive version of existsb -- call it existsb' -- using only forallb and negb. (You shouldn't call any functions other than these two.)
Remember: even if you don't do this part of the homework, you should write Definition forallb TYPE GOES HERE. Admitted. so that the autograder works. If you're stuck figuring out the type, ask for help.
(* FILL IN HERE *)
(* Do not modify the following line: *)
Definition manual_grade_for_forall_exists : option (nat×string) := None.
(* 2022-01-12 10:20 *)