Trees

Require Export DMFP.Sorting.

Binary trees

There are many, many other kinds of data structures besides lists, options, pairs, bases, and weekdays. We'll define just one more in this course: trees.
Inductive bt (X : Type) : Type :=
| empty
| node (l : bt X) (v : X) (r : bt X).

Arguments empty {X}.
Arguments node {X} l v r.
If you've studied plant biology, the word 'node' may make you think of trees... but otherwise, what's going on here? What do these structures have to do with trees?
When you encounter a definition you don't understand, you need to play with it. So let's get playing!
Check empty.
Check @empty nat.
Check (node empty true empty).
Check (node empty 5 empty).
Check (node empty G (node empty C empty)).
Check (node (node empty 12 empty) 25 (node (node empty 5 empty) 5 empty)).
Okay, those are all well typed. But... what are they? Let's adopt a graph notation, where each constructor is a vertex in our graph and we draw edges from constructors to their arguments. Here are each of those examples in order.
Gallina term:
  empty as a graph:
  empty
Gallina term:
  @empty nat as a graph:
  empty
Gallina term:
  node empty true empty as a graph:
       node
       /|\
      / | \
     /  |  \
    /   |   \
empty  true  empty
Gallina term:
  node empty 5 empty as a graph:
     node
     /|\
    / | \
   /  |  \
empty 5  empty
Gallina term:
  node empty G (node empty C empty) as a graph:
       node
       /|\
      / | \
     /  |  \
empty   G  node
           /|\
          / | \
         /  |  \
     empty  C   empty
Gallina term:
  node (node empty 12 empty) 25 (node (node empty 5 empty) 5 empty) as a graph:
                 node
                 /|\
                / | \
               /  |  \
           node   25  node
           /|\        /|\
      empty | empty  / | \
            12      /  |  \
                 node  5   empty
                 /|\
                / | \
            empty 5  empty
These graph notations can get a little heavy, so if it's unambiguous, people will leave off the constructors. Here's the last one in simpler notation:
Gallina term:
  node (node empty 12 empty) 25 (node (node empty 5 empty) 5 empty) as a graph:
                 25
                 /\
                /  \
               /    \
              12     5
              /\    / \
                   5
                  /\
Note that we write nothing for empty and we write the value v for a node.
Some terminology: a node has two children, l and r. A node is a leaf node when both of its children are empty.
The topmost node is called the root. This means that computer scientists draw trees upside down. Whoops! Too late to change now.
Let's start by writing some functions on binary trees. We'll compute their size (the number of nodes), determine whether they're a leaf or not with is_leaf, count their leaves with num_leaves.

Exercise: 1 star, standard (size)

Fixpoint size {X : Type} (t : bt X) : nat (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Exercise: 1 star, standard (is_leaf)

Definition is_leaf {X : Type} (t : bt X) : bool (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Exercise: 1 star, standard (num_leaves)

Fixpoint num_leaves {X : Type} (t : bt X) : nat (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
Check empty.
Check @empty nat.
Check (node empty true empty). (* leaf node *)
Check (node empty 5 empty). (* leaf node *)
Check (node empty G (node empty C empty)).
Check (node (node empty 12 empty) 25 (node (node empty 5 empty) 5 empty)).

Compute (size (node (node empty 12 empty) 25 (node (node empty 5 empty) 5 empty))).
We defined some terminology already: a leaf is a node where both children are empty.
The root of a tree is the topmost node.
Definition root {X : Type} (t : bt X) : option X :=
  match t with
  | emptyNone
  | node _ v _Some v
  end.

Compute (root (node (node empty 12 empty) 25 (node (node empty 5 empty) 5 empty))).
(* ===> Some 25 *)
It's common enough to want to make a leaf that we'll define a function to do it for us.
Definition leaf {X : Type} (v : X) : bt X := node empty v empty.

Compute (root (node (leaf 12) 25 (node (leaf 5) 5 empty))).
(* ===> Some 25 *)

Traversals

A tree is a useful data structure in a program, but it isn't always what people want to see---people often do better with lists.
Any function that looks at every node of a tree is called a traversal. The word 'traversal' often refers particularly to the process of converting a tree to a list.
There are three common notions of traversal, named for what they do with the value at a given node:
  • A pre-order traversal puts the value first, then traverses the left child, then the right child.
  • An in-order traversal first traverses the left child, then the value of the node, then the right child.
  • A post-order traversal first traverses the left child, then the right child, then the value of the node.
You'll notice that all three of these go left-to-right. That's a convention---you could look at the right children, then the left children, then the node. Which traversal you use in practice depends on
Here's an in-order traversal.
Fixpoint inorder {X : Type} (t : bt X) : list X :=
  match t with
  | empty[]
  | node l v rinorder l ++ [v] ++ inorder r
  end.

Compute (inorder (node (node empty 12 empty) 25 (node (node empty 5 empty) 5 empty))).

Exercise: 2 stars, standard (traversals)

Define traversal functions preorder and postorder that implement those respective traversals.
(* FILL IN HERE *)
Write a tree that produces different output from all three traversals. (Note the type!) It should have at least three nodes.
Definition different_all_three : bt base (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
Write a tree that produces the same output from pre- and in-order traversals (but not post-order). It should have at least three nodes.
Definition same_pre_in : bt base (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
Write a tree that produces the same output from in- and post-order traversals (but not pre-order). It should have at least three nodes.
Definition same_post_in : bt base (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
(* Do not modify the following line: *)
Definition manual_grade_for_traversals : option (nat×string) := None.

Binary search trees (BSTs)

Binary search trees are pervasive and powerful data structures. They are used to implement sets and associative maps, two of the most commonly used data structures.
BSTs refine binary trees with an invariant called the 'search property'. A binary tree t is said to have the search property when for a given ordering (i.e., a notion of 'less than' and 'greater than'):
  • t is empty, or
  • t is node l v r where:
    + l and r have the search property
    + every value in l is less than v
    + every value in r is greater than v
The search property is very useful. Suppose you have a binary search tree, and you wonder if a value n is in it. First, you check the root: if that's it, great. If not, then ask: is n less than or greater than the root? You only have to keep looking in one side of the tree! If your tree is well balanced, you'll do very little traversal at all. If you kept a list rather than a BST, you'd have to search the whole list to be sure. Nice... what a savings!
It turns out other operations on BSTs---adding and deleting elements---are also very efficient. We'll implement a few.
(Sometimes people want BSTs were values in l are less than or equal to v. It depends; for sets and maps, you want less than, because it guarantees there are no duplicates.)

Exercise: 1 star, standard (search_property)

Write two trees of naturals that have the search property and two trees that do not. All of your trees should have at least two nodes.
Definition bst1 : bt nat (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
Definition bst2 : bt nat (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
Definition non_bst1 : bt nat (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
Definition non_bst2 : bt nat (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
(* Do not modify the following line: *)
Definition manual_grade_for_search_property : option (nat×string) := None.
We've already defined eqb and leb, so we could just use those. But it's common to use a compare operation that gives us 'complete' comparison information. It saves comparisons and is less error prone. What's not to like?
Inductive ordering : Type :=
  | LT
  | EQ
  | GT.

Fixpoint compare (n m : nat) : ordering :=
  match n, m with
  | O, OEQ
  | S _, OGT
  | O, S _LT
  | S n', S m'compare n' m'
  end.

Compute (compare 0 0). (* ===> EQ *)
Compute (compare 0 10). (* ===> LT *)
Compute (compare 7 4). (* ===> GT *)
Compute (compare 1000 1000). (* ===> EQ *)
Here's a function that uses the search property to rapidly determine whether or not a value is in a BST already.
Fixpoint lookup (n : nat) (t : bt nat) : bool :=
  match t with
  | emptyfalse
  | node l v r
    match compare n v with
    | LTlookup n l
    | EQtrue
    | GTlookup n r
    end
  end.

Exercise: 3 stars, standard (insert)

Write a function that inserts a number n into a binary search tree t. To get full credit, you should use the binary search property!
What should your function do when n is already in the tree?
Fixpoint insert (n : nat) (t : bt nat) : bt nat (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Exercise: 2 stars, standard (remove_min)

Write a function that tries to remove the smallest element from a binary search tree. You'll need to return not just the smallest element, but also what's left of the tree.
Your function should return None only when t is empty. Use the search property!
Fixpoint remove_min (t : bt nat) : option (nat × bt nat) (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Exercise: 4 stars, standard (remove)

Write a function that removes a number n from a binary search tree. If n isn't in the tree, you should return the tree unchanged.
It turns out this is a hard function to write! Here's a general plan:
  • Handling empty should be simple.
  • To remove n from node l v r, look at compare n v.
    + The LT and GT cases should be simple.
    + The EQ case is tricky! Use remove_min to find a new root element from r if you can. What does it mean if remove_min r = None?
Fixpoint remove (n : nat) (t : bt nat) : bt nat (* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Exercise: 4 stars, standard, optional (is_bst)

Write a function is_bst that takes t : bt nat and returns true if t has the search property.
There are different ways to go here. A few hints:
  • You'll need at least one helper function. How can you check the "every value" parts of the search property?
  • Don't think about efficiency---you're just making things hard for yourself. Get it to work first, and then make it efficient if that'll make you happy.
We'd like you to write the signature yourself. If you can't finish writing the function, that's fine---but make sure you use Definition and Admitted to leave something behind for the grader. Talk to the TAs if you're stuck.
(* Do not modify the following line: *)
Definition manual_grade_for_is_bst : option (nat×string) := None.
(* 2022-01-12 10:20 *)