Day07_trees

Require Export DMFP.Day06_sorting.

Binary trees

Recall the definition of binary trees bt from Day04_structures:

Inductive bt (X : Type) : Type :=
| empty
| node (l : bt X) (v : X) (r : bt X).

Check empty.
Check @empty nat.
Check (node empty true empty). (* leaf node *)
Check (node empty 5 empty). (* leaf node *)
Check (node empty G (node empty C empty)).
Check (node (node empty 12 empty) 25 (node (node empty 5 empty) 5 empty)).

Compute (size (node (node empty 12 empty) 25 (node (node empty 5 empty) 5 empty))).
We defined some terminology already: a leaf is a node where both children are empty.
The root of a tree is the topmost node.

Definition root {X : Type} (t : bt X) : option X :=
  match t with
  | emptyNone
  | node _ v _Some v
  end.

Compute (root (node (node empty 12 empty) 25 (node (node empty 5 empty) 5 empty))).
(* ===> Some 25 *)
It's common enough to want to make a leaf that we'll define a function to do it for us.
Definition leaf {X : Type} (v : X) : bt X := node empty v empty.

Compute (root (node (leaf 12) 25 (node (leaf 5) 5 empty))).
(* ===> Some 25 *)

What are trees for?

Trees are a useful and general structure. Computer scientists use trees:
  • In interpreters, compilers, and program analyses to represent and manipulate programs.
  • In machine learning's so-called "decision trees".
  • In game AI to represent strategies.
  • In operating systems to store filesystem information.
  • In databases to quickly manipulate stored values.
But it's not just CS! Biological systematists use trees to model evolutionary relationships. Linguists use trees to model syntactic structure. Genealogists use trees to model familial relationships.
Trees are a Big Deal. Here's a very contrived tree model of declarative English sentences.

Inductive proper_noun : Type :=
  | Prof_Greenberg
  | Opie_the_dog
  | Dr_Dave.

Inductive article : Type :=
  | the
  | a.

Inductive noun : Type :=
  | program
  | frisbee
  | lasagna
  | unicycle.

Inductive noun_phrase : Type :=
  | np_proper (pn : proper_noun)
  | np_compound (a : article) (n : noun).

Inductive verb_intransitive : Type :=
  | laughed
  | danced
  | barked.

Inductive verb_transitive : Type :=
  | bit
  | wrote
  | baked
  | rode.

Inductive preposition : Type :=
  | under
  | around
  | in_front_of.

Inductive adverb : Type :=
  | goofily
  | well
  | sarcastically.

Inductive modifier : Type :=
  | m_prep (p : preposition) (np : noun_phrase)
  | m_adverb (a : adverb).

Inductive verb_phrase : Type :=
  | vp_intransitive (v : verb_intransitive)
  | vp_transitive (v : verb_transitive) (o : noun_phrase)
  | vp_modifier (vp : verb_phrase) (m : modifier).

Inductive sentence : Type :=
  | s (np : noun_phrase) (vp : verb_phrase).

Exercise: 1 star, standard (sentences)

Write a tree that models a sentence where Prof. Greenberg danced in front of Opie.
Definition goofball : sentence (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
Write a tree that models a sentence where Opie bakes a lasagna well. Good boy!
Definition good_dog : sentence (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
Translate the following tree to an English sentence.
Definition so_talented : sentence :=
  s (np_proper Dr_Dave)
    (vp_modifier
       (vp_modifier
          (vp_transitive rode (np_compound the unicycle))
          (m_adverb goofily))
       (m_prep around (np_compound the frisbee))).
(* FILL IN HERE *)

(* Do not modify the following line: *)
Definition manual_grade_for_sentences : option (nat×string) := None.

Traversals

A tree is a useful data structure in a program, but it isn't always what people want to see---people often do better with lists.
Any function that looks at every node of a tree is called a traversal. The word 'traversal' often refers particularly to the process of converting a tree to a list.
There are three common notions of traversal, named for what they do with the value at a given node:
  • A pre-order traversal puts the value first, then traverses the left child, then the right child.
  • An in-order traversal first traverses the left child, then the value of the node, then the right child.
  • A post-order traversal first traverses the left child, then the right child, then the value of the node.
You'll notice that all three of these go left-to-right. That's a convention---you could look at the right children, then the left children, then the node. Which traversal you use in practice depends on
Here's an in-order traveral.
Fixpoint inorder {X : Type} (t : bt X) : list X :=
  match t with
  | empty[]
  | node l v rinorder l ++ [v] ++ inorder r
  end.

Compute (inorder (node (node empty 12 empty) 25 (node (node empty 5 empty) 5 empty))).

Exercise: 2 stars, standard (traversals)

Define traversal functions preorder and postorder that implement those respective traversals.
(* FILL IN HERE *)
Write a tree that produces different output from all three traversals. (Note the type!) It should have at least three nodes.
Definition different_all_three : bt base (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
Write a tree that produces the same output from pre- and in-order traversals (but not post-order). It should have at least three nodes.
Definition same_pre_in : bt base (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
Write a tree that produces the same output from in- and post-order traversals (but not pre-order). It should have at least three nodes.
Definition same_post_in : bt base (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
(* Do not modify the following line: *)
Definition manual_grade_for_traversals : option (nat×string) := None.

Binary search trees (BSTs)

Binary search trees are pervasive and powerful data structures. They are used to implement sets and associative maps, two of the most commonly used data structures.
BSTs refine binary trees with an invariant called the 'search property'. A binary tree t is said to have the search property when for a given ordering (i.e., a notion of 'less than' and 'greater than'):
  • t is empty, or
  • t is node l v r where:
    + l and r have the search property
    + every value in l is less than v
    + every value in r is greater than v
The search property is very useful. Suppose you have a binary search tree, and you wonder if a value n is in it. First, you check the root: if that's it, great. If not, then ask: is n less than or greater than the root? You only have to keep looking in one side of the tree! If your tree is well balanced, you'll do very little traversal at all. If you kept a list rather than a BST, you'd have to search the whole list to be sure. Nice... what a savings!
It turns out other operations on BSTs---adding and deleting elements---are also very efficient. We'll implement a few.
(Sometimes people want BSTs were values in l are less than or equal to v. It depends; for sets and maps, you want less than, because it guarantees there are no duplicates.)

Exercise: 1 star, standard (search_property)

Write two trees of naturals that have the search property and two trees that do not. All of your trees should have at least two nodes.

Definition bst1 : bt nat (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
Definition bst2 : bt nat (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
Definition non_bst1 : bt nat (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
Definition non_bst2 : bt nat (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
(* Do not modify the following line: *)
Definition manual_grade_for_search_property : option (nat×string) := None.
We've already defined eqb and leb, so we could just use those. But it's common to use a compare operation that gives us 'complete' comparison information. It saves comparisons and is less error prone. What's not to like?
Inductive ordering : Type :=
  | LT
  | EQ
  | GT.

Fixpoint compare (n m : nat) : ordering :=
  match n, m with
  | O, OEQ
  | S _, OGT
  | O, S _LT
  | S n', S m'compare n' m'
  end.

Compute (compare 0 0). (* ===> EQ *)
Compute (compare 0 10). (* ===> LT *)
Compute (compare 7 4). (* ===> GT *)
Compute (compare 1000 1000). (* ===> EQ *)
Here's a function that uses the search property to rapidly determine whether or not a value is in a BST already.
Fixpoint lookup (n : nat) (t : bt nat) : bool :=
  match t with
  | emptyfalse
  | node l v r
    match compare n v with
    | LTlookup n l
    | EQtrue
    | GTlookup n r
    end
  end.

Exercise: 3 stars, standard (insert)

Write a function that inserts a number n into a binary search tree t. To get full credit, you should use the binary search property!
What should your function do when n is already in the tree?
Fixpoint insert (n : nat) (t : bt nat) : bt nat (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Exercise: 2 stars, standard (remove_min)

Write a function that tries to remove the smallest element from a binary search tree. You'll need to return not just the smallest element, but also what's left of the tree.
Your function should return None only when t is empty. Use the search property!
Fixpoint remove_min (t : bt nat) : option (nat × bt nat) (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Exercise: 4 stars, standard (remove)

Write a function that removes a number n from a binary search tree. If n isn't in the tree, you should return the tree unchanged.
It turns out this is a hard function to write! Here's a general plan:
  • Handling empty should be simple.
  • To remove n from node l v r, look at compare n v.
    + The LT and GT cases should be simple.
    + The EQ case is tricky! Use remove_min to find a new root element from r if you can. What does it mean if remove_min r = None?
Fixpoint remove (n : nat) (t : bt nat) : bt nat (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Exercise: 4 stars, standard (is_bst)

Write a function is_bst that takes t : bt nat and returns true if t has the search property.
There are different ways to go here. A few hints:
  • You'll need at least one helper function. How can you check the "every value" parts of the search property?
  • Don't think about efficiency---you're just making things hard for yourself. Get it to work first, and then make it efficient if that'll make you happy.
We'd like you to write the signature yourself. If you can't finish writing the function, that's fine---but make sure you use Definition and Admitted to leave something behind for the grader. Talk to the TAs if you're stuck.

(* Do not modify the following line: *)
Definition manual_grade_for_is_bst : option (nat×string) := None.

(* Mon Oct 12 08:48:48 PDT 2020 *)