Day01_intro

Intro: Functional Programming in Coq

The Require Export statement on the next line tells Coq to use the String module from the standard library. We won't really be using strings ourselves, but we need to Require it here so that the grading scripts can use it for internal purposes.
From Coq Require Export String.

(* REMINDER:

          #################################################
          #  PLEASE DO NOT DISTRIBUTE SOLUTIONS PUBLICLY  #
          #################################################

   (See the Preface for why.)
*)

Introduction

This course is about two things:
  • functional programming, and
  • (inductive) proof.
The primary goal of this course is to get you "thinking like a computer scientist": how to structure code, how to think about what code does, and how to justify your beliefs with proof.
The course has three parts:
  • a functional programming part in Coq (YOU ARE HERE),
  • a formal proof part in Coq, and
  • an informal proof part on paper.
In the first part of the course, we'll use Coq's (French for 'rooster') functional programming language Gallina (Spanish for 'hen') to write programs. In the second part of the course, we'll use Coq's unnamed tactic language to learn to write proofs. In the third and final part of the course, we'll adapt what we've learned to write proofs on paper.
This course is unconventional. Most computer science departments simply teach a "discrete math" course that more or less resembles the third part of this course.
There are two reasons we teach this funny course at Pomona College.
  • Some students skip the intro course (CS051), and we want everyone to have the same programming fundamentals. Teaching functional programming is a good way to achieve that.
  • A lot of what people teach in discrete math courses isn't that relevant to many computer scientists. We'd rather focus on the parts that are most important.
It's also unusual (but not unheard of) to teach undergraduates using Coq, a powerful tool not often encountered until graduate school. First, we trust you to handle this difficult material. Second, Coq is a critical part of this course's "middle": formal proof about functional programs you've written. We think that Coq makes an excellent tutor, making sure you follow the rules when you're learning how proof works. You'll be better at paper proofs after having learned formal proof in Coq.

What is functional programming?

The functional programming style is founded on simple, everyday mathematical intuition: If a procedure or method has no side effects, then (ignoring efficiency) all we need to understand about it is how it maps inputs to outputs -- that is, we can think of it as just a concrete method for computing a mathematical function. This is one sense of the word "functional" in "functional programming." The direct connection between programs and simple mathematical objects supports both formal correctness proofs and sound informal reasoning about program behavior.
The other sense in which functional programming is "functional" is that it emphasizes the use of functions (or methods) as first-class values -- i.e., values that can be passed as arguments to other functions, returned as results, included in data structures, etc. The recognition that functions can be treated as data gives rise to a host of useful and powerful programming idioms.
Finally, a third sense of "functional" is that... it works! Functional programming rules out a variety of bugs that can occur in imperative programming. Learning a functional programming language will help you think clearly about programming... in any language.
Other common features of functional languages include algebraic data types and pattern matching, which make it easy to construct and manipulate rich data structures, and sophisticated polymorphic type systems supporting abstraction and code reuse. Coq offers all of these features.
The first half of this chapter introduces the most essential elements of Coq's functional programming language, called Gallina. The second half introduces some basic tactics that can be used to prove properties of Coq programs.

Getting the tools in order

We'll be using Emacs and Proof General throughout the course.
You'll need to:
1. Download and install Coq:
  • https://coq.inria.fr/download
On macOS, it should install into /Applications/CoqIDE_8.12.0.
On Windows, it should install into C:\Coq.
2. Download and install Emacs
  • Windows: http://mirrors.ocf.berkeley.edu/gnu/emacs/windows/emacs-26/emacs-26.3-x86_64.zip
    I recommend extracting the zipfile into C:\Program Files\Emacs; you then want to add a shortcut to C:\Program Files\Emacs\bin\runemacs.exe to your Desktop and maybe pin it to your taskbar.
  • macOS: https://emacsformacosx.com/ (or `brew cask install emacs`)
  • Linux: `apt install emacs` (depending on distro)
3. Install the init.el configuration file for Emacs, which will download Proof General automatically.
  • https://cs.pomona.edu/~michael/courses/csci054f20/downloads/init.el
On macOS, you'll need to put the `init.el` file in `~/.emacs.d/`. If `init.el` is in your `Downloads` folder, run the command `mkdir ~/.emacs.d/; mv ~/Downloads/init.el ~/.emacs.d`.
On Windows, you'll need to put the `init.el` file at `C:\Users\username\AppData\Roaming\.emacs.d\init.el`. Best to copy it by hand in Explorer.
Working directly with your computer's filesystem may be new to you: you may be used to dragging and dropping things in the GUI, or using "open recent" or other automatic suggestions. Those are all great tools, but programming often means getting into the guts of the machine.
4. Make a directory where you'll keep all of your CS054 files. Don't create a subdirectory for each assignment, since each one depends on the previous ones.
You need to create a _CoqProject file in that directory. There are two ways to do this.
a. Download https://cs.pomona.edu/~michael/courses/csci054f20/downloads/_CoqProject and put it in that directory. Be attentive: your OS may not be happy with a file without an extension! If you're using Windows, we recommend enabling "developer mode" and having the Explorer show file extensions.
b. Make it yourself. With Emacs open to this file, type C-x C-f _CoqProject RET to open/create the file in your current directory. Then type -Q . DMFP as the contents, and save the file with C-x C-s.
Once you've gotten all the software installed and set up, fire up Emacs. You need to check that everything is hunky dory:
1. Run the Emacs tutorial. You can get there by pressing C-h t, i.e., press control and h at the same time, let go, then press the letter t.
2. Once you've learned the basic Emacs ropes, open up this file in Emacs with C-x C-f PATH/TO/Day01_intro.v. It might take some time to get used to this way of working! (You can also drag and drop or use the menu bar.)
3. Double check that you can compile the file. Go to the very end of the file with M-> and ask Coq to check everything up to that point with C-c RET, where RET means the enter or return key.
It's not as nice as the tutorial, but typing C-h m when you have a Coq file open will show you help for your current 'mode'. There's also documentation online at https://proofgeneral.github.io/doc/master/userman/, but it's written for a more experienced audience.
Everything working? If not, contact Prof. Greenberg or a TA. If so, great... let's get started!

Data and Functions

Enumerated Types

One notable aspect of Coq is that its set of built-in features is extremely small. For example, instead of providing the usual palette of atomic data types (booleans, integers, strings, etc.), Coq offers a powerful mechanism for defining new data types from scratch, with all these familiar types as instances.
Naturally, the Coq distribution comes preloaded with an extensive standard library providing definitions of booleans, numbers, and many common data structures like lists and so on. But there is nothing magic or primitive about these library definitions. To illustrate this, we will explicitly recapitulate all the definitions we need in this course, rather than just getting them implicitly from the library. Later on, when we're doing proofs, we'll mostly use the library definitions.

Days of the Week

To see how this definition mechanism works, let's start with a very simple example. The following declaration tells Coq that we are defining a new set of data values -- a type.

Inductive day : Type :=
  | monday
  | tuesday
  | wednesday
  | thursday
  | friday
  | saturday
  | sunday.
The type is called day, and its members are monday, tuesday, etc.
Having defined day, we can write functions that operate on days.

Definition next_weekday (d:day) : day :=
  match d with
  | monday => tuesday
  | tuesday => wednesday
  | wednesday => thursday
  | thursday => friday
  | friday => monday
  | saturday => monday
  | sunday => monday
  end.
One thing to note is that the argument and return types of this function are explicitly declared. Like most functional programming languages, Coq can often figure out these types for itself when they are not given explicitly -- i.e., it can do type inference -- but we'll generally include them to make reading easier.
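As a small sketch of type inference, here is the same function with its annotations left off. The name next_weekday' and the enclosing module InferenceSketch are assumptions made only for this illustration; the module just keeps the repeated definition of day from clashing with the one above.

```coq
Module InferenceSketch.

  (* Repeated here so the sketch stands alone. *)
  Inductive day : Type :=
    | monday | tuesday | wednesday | thursday
    | friday | saturday | sunday.

  (* No type on d and no return type: Coq works both out
     from the constructors that appear in the match. *)
  Definition next_weekday' d :=
    match d with
    | monday => tuesday
    | tuesday => wednesday
    | wednesday => thursday
    | thursday => friday
    | friday => monday
    | saturday => monday
    | sunday => monday
    end.

  Check next_weekday'.
  (* ==> next_weekday' : day -> day *)

End InferenceSketch.
```

The inferred type is the same one we wrote by hand, which is why the annotations are best thought of as documentation for human readers.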
Having defined a function, we should check that it works on some examples. There are several different ways to check your work in Coq. Later on, we'll prove our work correct! For now, we can use the Check command to type check an expression and the Compute command to evaluate an expression involving next_weekday.

Check friday.
(* ==> friday : day *)

Check (next_weekday friday).
(* ==> next_weekday friday : day *)

Compute (next_weekday friday).
(* ==> monday : day *)

Compute (next_weekday (next_weekday saturday)).
(* ==> tuesday : day *)
(We show Coq's responses in comments, but, if you have a computer handy, this would be an excellent moment to fire up Coq in Emacs with Proof General and try this for yourself. Load this file, Day01_intro.v, from the book's Coq sources, find the above example, submit it to Coq, and observe the result.)
We can ask Coq to extract, from our Definition, a program in some other, more conventional, programming language (OCaml, Scheme, or Haskell) with a high-performance compiler. This facility is very interesting, since it gives us a way to go from proved-correct algorithms written in Gallina to efficient machine code. (Of course, we are trusting the correctness of the OCaml/Haskell/Scheme compiler, and of Coq's extraction facility itself, but this is still a big step forward from the way most software is developed today.) Indeed, this is one of the main uses for which Coq was developed. We won't really talk about extraction more in this course.

Homework Submission Guidelines

If you are using Software Foundations in a course, your instructor may use automatic scripts to help grade your homework assignments. In order for these scripts to work correctly (so that you get full credit for your work!), please be careful to follow these rules:
  • The grading scripts work by extracting marked regions of the .v files that you submit. It is therefore important that you do not alter the "markup" that delimits exercises: the Exercise header, the name of the exercise, the "empty square bracket" marker at the end, etc. Please leave this markup exactly as you find it.
  • Do not delete exercises. If you skip an exercise (e.g., because it is marked Optional, or because you can't solve it), it is OK to leave a partial proof in your .v file, but in this case please make sure it ends with Admitted (not, for example, Abort).
  • It is fine to use additional definitions (of helper functions, useful lemmas, etc.) in your solutions. You can put these between the exercise header and the theorem you are asked to prove.
  • As we work our way through the files, keep in mind that we'll grade you in terms of our old definitions, not yours. If you want to use a helper function from an earlier file in a later one, be sure to copy it over.

Booleans

In a similar way, we can define the standard type bool of booleans, with members true and false.

Inductive bool : Type :=
  | true
  | false.
Although we are rolling our own booleans here for the sake of building up everything from scratch, Coq does, of course, provide a default implementation of the booleans, together with a multitude of useful functions and lemmas. (Take a look at Coq.Init.Datatypes in the Coq library documentation if you're interested.) Whenever possible, we'll name our own definitions and theorems so that they exactly coincide with the ones in the standard library.
Functions over booleans can be defined in the same way as above. First, we can use booleans to define a predicate, a function that identifies some subset of a given set:

Definition is_weekday (d:day) : bool :=
  match d with
  | monday => true
  | tuesday => true
  | wednesday => true
  | thursday => true
  | friday => true
  | saturday => false
  | sunday => false
  end.
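To see the predicate pick out its subset, we can evaluate it on a day inside the subset and a day outside it. This is only a sketch: the module name WeekdaySketch is an assumption, and day and is_weekday are repeated inside it so the snippet can be run by itself.

```coq
Module WeekdaySketch.

  (* Repeated here so the sketch stands alone. *)
  Inductive day : Type :=
    | monday | tuesday | wednesday | thursday
    | friday | saturday | sunday.

  Definition is_weekday (d : day) : bool :=
    match d with
    | monday => true
    | tuesday => true
    | wednesday => true
    | thursday => true
    | friday => true
    | saturday => false
    | sunday => false
    end.

  Compute (is_weekday wednesday).
  (* ==> true : bool *)

  Compute (is_weekday sunday).
  (* ==> false : bool *)

End WeekdaySketch.
```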
We can also define some of the usual operations on booleans. First comes not or negation, which is often written as the operator !.

Definition negb (b:bool) : bool :=
  match b with
  | true => false
  | false => true
  end.
Coq also lets you use conventional if/then/else notation for booleans, as in:

Definition negb' (b:bool) : bool :=
  if b
  then false
  else true.
Every other datatype will need you to use match, though! Depending on other languages you've learned, you may have seen a "one-armed if" before. You can't do that in Coq---every expression must return a value, and a missing else branch would leave Coq wondering what to return.
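Here is a minimal sketch of the "every if needs an else" rule. The function first_day and the module IfSketch are hypothetical names invented for this illustration. Note that the condition is a boolean, but the two branches can return any type, so long as both return the same one.

```coq
Module IfSketch.

  (* Repeated here so the sketch stands alone. *)
  Inductive day : Type :=
    | monday | tuesday | wednesday | thursday
    | friday | saturday | sunday.

  (* Both branches are required, and both produce a day. *)
  Definition first_day (weekend : bool) : day :=
    if weekend
    then saturday
    else monday.

  Compute (first_day true).
  (* ==> saturday : day *)

  Compute (first_day false).
  (* ==> monday : day *)

End IfSketch.
```

Deleting the else line makes Coq reject the definition, since there would be no day to return when weekend is false.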
Another common way of expressing functions from booleans to booleans is with a truth table.
    |---|--------|
    | b | negb b |
    |---|--------|
    | T |    F   |
    | F |    T   |
    |---|--------|
Each column of the truth table represents an expression of type bool. Here the first column represents an arbitrary input b, which can be true (written T) or false (written F). It's typical to consider the initial columns of a truth table as representing inputs and the final column as representing an output.
Each row of the truth table gives a possible assignment: you can read the first row as saying that if b = true, then negb b = false; the second row says that if b = false, then negb b = true.

Definition andb (b1:bool) (b2:bool) : bool :=
  match b1 with
  | true => b2
  | false => false
  end.
When constructing a truth table with more than one input, it's important to make sure your truth table has every possible input configuration accounted for. People have different ways of doing so, but I tend to like the following format, where we exhaust all of the possibilities for the first column to be true, and then we consider the cases where the first column is false. Electrical engineers, however, like to do it the opposite way: when false is 0 and true is 1, it makes sense to count "up".
It doesn't particularly matter which method you choose, but it's important to be consistent!
    |----|----|------------|
    | b1 | b2 | andb b1 b2 |
    |----|----|------------|
    | T  | T  |      T     |
    | T  | F  |      F     |
    | F  | T  |      F     |
    | F  | F  |      F     |
    |----|----|------------|
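Each row of the andb table can be checked by asking Coq to evaluate andb on that row's inputs. The commands below are a sketch in the same order as the table. Run outside this file, they would use the standard library's andb, which behaves the same way.

```coq
(* One Compute per row of the truth table, top to bottom. *)
Compute (andb true true).
(* ==> true : bool *)

Compute (andb true false).
(* ==> false : bool *)

Compute (andb false true).
(* ==> false : bool *)

Compute (andb false false).
(* ==> false : bool *)
```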

Definition orb (b1:bool) (b2:bool) : bool :=
  match b1 with
  | true => true
  | false => b2
  end.
    |----|----|-----------|
    | b1 | b2 | orb b1 b2 |
    |----|----|-----------|
    | T  | T  |     T     |
    | T  | F  |     T     |
    | F  | T  |     T     |
    | F  | F  |     F     |
    |----|----|-----------|
The last two of these definitions illustrate Coq's syntax for multi-argument function definitions. The corresponding multi-argument application syntax is illustrated by the following "unit tests," which constitute a complete specification -- a truth table -- for the orb function:

Compute (orb true true ).
Compute (orb true false).
Compute (orb false true ).
Compute (orb false false).
We can also introduce some familiar syntax for the boolean operations we have just defined. The Notation command defines a new symbolic notation for an existing definition.

Notation "x && y" := (andb x y).
Notation "x || y" := (orb x y).
A note on notation: In .v files, we use square brackets to delimit fragments of Coq code within comments; this convention, also used by the coqdoc documentation tool, keeps them visually separate from the surrounding text. In the html version of the files, these pieces of text appear in a different font.
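As a quick sketch of the infix notations in use, the snippet below declares them and evaluates a few expressions. The module name NotationSketch is an assumption; it only keeps the repeated Notation commands local to this example.

```coq
Module NotationSketch.

  (* The same two notations, repeated so the sketch stands alone. *)
  Notation "x && y" := (andb x y).
  Notation "x || y" := (orb x y).

  Compute (true && false).
  (* ==> false : bool *)

  Compute (false || true).
  (* ==> true : bool *)

  (* Notations nest like any other expression. *)
  Compute ((true && false) || true).
  (* ==> true : bool *)

End NotationSketch.
```

A notation is only another way of writing the underlying function: true && false means exactly andb true false.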

Exercise: 1 star, standard (nandb)

Remove "Admitted." and complete the definition of the following function; then check your work with the Compute commands below, following the model of the orb tests above. The function should return true if either or both of its inputs are false.
You can use negb, but please do not use andb when you're defining this function.

Definition nandb (b1:bool) (b2:bool) : bool
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
(* Do not modify the following line: *)
Definition manual_grade_for_nandb : option (nat*string) := None.

(* What's that box symbol? It represents the end of an exercise. Since
   it's in a comment, it doesn't _do_ anything. It shouldn't hurt if
   you remove it, but there's no need to. *)


Compute (nandb true true ).
Compute (nandb true false).
Compute (nandb false true ).
Compute (nandb false false).
Truth tables are a particularly nice way of calculating compound expressions involving booleans. In addition to having input and output columns, we can have intermediate columns representing subexpressions of the boolean we're interested in. When building such a truth table, every subexpression of the final result should show up as a column.
    |---|--------|-----------------|
    | b | negb b | orb b (negb b)  |
    |---|--------|-----------------|
    | T |    F   |        T        |
    | F |    T   |        T        |
    |---|--------|-----------------|
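The intermediate column and the final column of this table can both be checked with Compute, one row at a time. This is a sketch using negb and orb as defined above. Run outside this file, it would pick up the standard library's versions, which behave the same way.

```coq
(* Row 1: b = true *)
Compute (negb true).
(* ==> false : bool *)
Compute (orb true (negb true)).
(* ==> true : bool *)

(* Row 2: b = false *)
Compute (negb false).
(* ==> true : bool *)
Compute (orb false (negb false)).
(* ==> true : bool *)
```

Both rows end in true: whichever boolean b is, orb b (negb b) evaluates to true.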

Exercise: 1 star, standard (impb)

Write a function impb such that impb b1 b2 has the same truth table as orb (negb b1) b2. Don't just trivially define it as orb (negb b1) b2, though! Try using a match.
Definition impb (b1:bool) (b2:bool) : bool
  (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
(* Do not modify the following line: *)
Definition manual_grade_for_impb : option (nat*string) := None.

Function Types

Every expression in Coq has a type, describing what sort of thing it computes. The Check command asks Coq to print the type of an expression.

Check true.
(* ===> true : bool *)
Check (negb true).
(* ===> negb true : bool *)
Functions like negb itself are also data values, just like true and false. Their types are called function types, and they are written with arrows.

Check negb.
(* ===> negb : bool -> bool *)
The type of negb, written bool -> bool and pronounced "bool arrow bool," can be read, "Given an input of type bool, this function produces an output of type bool." Similarly, the type of andb, written bool -> bool -> bool, can be read, "Given two inputs, both of type bool, this function produces an output of type bool."

Check orb.
(* ===> orb : bool -> bool -> bool *)
You might think of the function orb as taking two arguments, but every function in Coq takes one argument at a time. Each argument gets its own arrow. You can think of this as saying that orb is a function that takes a boolean and returns another function that takes another boolean... and that returns a boolean. Look:

Check (orb true).
(* ===> orb true : bool -> bool *)

Check (orb true false).
(* ===> true || false : bool *)
Function types are right associative, i.e., parentheses go on the right, i.e., A -> B -> C is the same as A -> (B -> C).
Function application is left associative, i.e., parentheses go on the left, i.e., a b c is the same as (a b) c.

Check ((orb true) false).
(* ===> true || false : bool *)
The Fail prefix says that we expect a command to not work. It's useful for examples like this!
Fail Check (orb (true false)).
(* ===> 
  The command has indeed failed with message:
  Illegal application (Non-functional construction): 
  The expression "true" of type "bool"
  cannot be applied to the term
    "false" : "bool"
 *)
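The two associativity rules can be seen at work in a short sketch. The names or_true, twice, and the module CurrySketch are hypothetical, invented for this illustration. The first definition gives a name to orb applied to only one argument; the second takes a function as its argument, which is the one place where parentheses in a function type are required.

```coq
Module CurrySketch.

  (* orb applied to one argument is itself a function. *)
  Definition or_true : bool -> bool := orb true.

  Check or_true.
  (* ===> or_true : bool -> bool *)

  Compute (or_true false).
  (* ===> true : bool *)

  (* A function that takes another function as input. *)
  Definition twice (f : bool -> bool) (b : bool) : bool :=
    f (f b).

  Check twice.
  (* ===> twice : (bool -> bool) -> bool -> bool *)

  Compute (twice negb true).
  (* ===> true : bool *)

End CurrySketch.
```

The parentheses in (bool -> bool) -> bool -> bool cannot be dropped: without them, the type would be read as bool -> (bool -> (bool -> bool)), a function of three booleans.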

Case study: DNA nucleotides

We'll use DNA processing as a running example through the course. Bioinformatics---the application of computational techniques to biological data---has richly blossomed over the last thirty years, and we'll only skim the surface.
We can start by defining the types of nucleotides: Cytosine, Guanine, Adenine, and Thymine.

Inductive base : Type :=
| C (* cytosine *)
| G (* guanine *)
| A (* adenine *)
| T. (* thymine *)
DNA has a double helix structure comprising two paired strands, where each C corresponds to a G and each A corresponds to a T. We won't get to defining DNA strands for a few weeks, but we can already start thinking about DNA in a more detailed way.
The DNA double helix has 'complementary' structure: if you know the bases of one strand, you know the bases of the other.
We can express this idea with a function that computes the complement for a given base.

Definition complement (b : base) : base :=
  match b with
  | C => G
  | G => C
  | A => T
  | T => A
  end.
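A couple of evaluations give a feel for complement. This is only a sketch: the module name ComplementSketch is an assumption, and base and complement are repeated inside it so the snippet can be run by itself.

```coq
Module ComplementSketch.

  (* Repeated here so the sketch stands alone. *)
  Inductive base : Type :=
    | C (* cytosine *)
    | G (* guanine *)
    | A (* adenine *)
    | T. (* thymine *)

  Definition complement (b : base) : base :=
    match b with
    | C => G
    | G => C
    | A => T
    | T => A
    end.

  Compute (complement C).
  (* ==> G : base *)

  (* Complementing twice gets us back where we started. *)
  Compute (complement (complement A)).
  (* ==> A : base *)

End ComplementSketch.
```

The second result reflects the pairing described above: the complement of a strand's complement is the original strand.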

Exercise: 1 star, standard (xorb)

Here is the truth table for xorb (eXclusive OR on Booleans).
    |----|----|------------|
    | b1 | b2 | xorb b1 b2 |
    |----|----|------------|
    | T  | T  |      F     |
    | T  | F  |      T     |
    | F  | T  |      T     |
    | F  | F  |      F     |
    |----|----|------------|
Define a function xorb that takes two booleans and returns a boolean, following the above truth table.
(* FILL IN HERE *)
(* Do not modify the following line: *)
Definition manual_grade_for_xorb : option (nat*string) := None.

Exercise: 2 stars, standard (is_classday)

Write a function is_classday : day -> bool that returns true exactly when it's a day we have CS054 (in FA2020, that's MW).

(* Do not modify the following line: *)
Definition manual_grade_for_is_classday : option (nat*string) := None.

Exercise: 2 stars, standard (eq_base)

Write a function eq_base : base -> base -> bool that returns true exactly when two bases are equal.
Definition eq_base (b1 b2 : base) : bool (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

How to succeed in this course

Here's some advice on how to succeed in this course.
1. Read the book. It's very tempting to just skim things and go straight to the homework... resist! We recommend a multi-pass approach: read through the chapter but don't fill in any homework. Then watch the videos. Interleaving videos and problem solving is a good idea---having skimmed the chapter first, you should have a clear idea of whether you're properly stuck or merely haven't watched the right video yet.
2. Don't spin. It's easy to get stuck in a rut: Coq rejects everything you say, so you just try different things for an hour. Don't waste your time spinning in place! Set a timer for, say, twenty minutes. If you can't make progress that's clearly closer to where you need to be, then...
3. Ask for help. It's normal to need help: math and computer science are hard, and even more so together. We're here to help.

(* Mon Oct 12 08:48:47 PDT 2020 *)