CS62 - Spring 2010 - Lecture 14
how's the course going so far?
midterm
- average 53
- high 68.5
look at new constructor in BinaryTree for empty tree
- I've also changed the constructor for BinaryTree(E data)
- why do we have empty trees?
- put them in any place where we would normally have a null left or right child
- what does this buy us?
- look at height method in BinaryTree code
- why return -1?
- height is computed as 1 + max(left.height(), right.height()), and we don't want to count the "empty" nodes, so an empty tree has height -1 and a leaf ends up with height 0
- by using an empty node, we can assume that all of the children of a non-empty node are non-null
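- a rough sketch of what that height method might look like (assuming the same data/left/right fields and isEmpty() method used in the search code below):

public int height(){
    if( isEmpty() ){
        // empty trees have height -1 so that a leaf ends up with height 0
        return -1;
    }
    // count this node plus the taller of the two subtrees
    return 1 + Math.max(left.height(), right.height());
}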
searching in binary trees
- look at search method in BinaryTree code
- running time?
- O(n)
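- a sketch of that search (same assumptions as above); with no ordering on the tree, we may have to look in both subtrees:

public boolean search(E item){
    if( isEmpty() ){
        return false;
    } else if( data.equals(item) ){
        return true;
    } else {
        // no ordering to exploit, so check both children
        return left.search(item) || right.search(item);
    }
}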
binary search trees
- we saw that search is O(n) time to find something in a tree. Can we do better?
- binary search trees speed up the searching process by keeping the tree in a structured order. In particular:
- left.data() < this.data() <= right.data()
- what are the implications of this?
- all items to the left are less than the value at the current node
- all items to the right are greater than or equal to the value of the current node
- how does this help us when it comes to searching?
- at each node, we can compare the value we're looking for to the value at this node and we know which branch it's down!
public boolean search(E item){
    if( isEmpty() ){
        // empty tree: the item can't be here
        return false;
    } else if( data.equals(item) ){
        // found it at this node
        return true;
    } else if( item.compareTo(data) < 0 ){
        // smaller items can only be in the left subtree
        return left.search(item);
    } else {
        // everything else can only be in the right subtree
        return right.search(item);
    }
}
- notice that now rather than recursing on both the left and the right in our search method, we recurse on one or the other
- has a similar feel to binary search
- what is the running time?
- O(h)
- when is this a good running time and when is this a bad running time?
- when the tree is full (or near full) then O(h) = O(log n)
- when the tree is a twig (or near a twig) then O(h) = O(n) so we haven't gained anything
- some more methods
- How can we find the minimum value in the tree?
- left-most value in the tree
- running time? O(h)
- Max?
- right-most value in the tree
- running time? O(h)
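- as a sketch, a getMin method might look like the following (the method name is made up; getMax is symmetric, following right children instead of left; assumes the tree is non-empty):

public E getMin(){
    // keep walking left until the left child is an empty tree
    if( left.isEmpty() ){
        return data;
    }
    return left.getMin();
}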
- traversal: what kind of tree traversal would make sense?
- in-order
- visit nodes to the left first
- then visit this node
- finally, visit nodes to the right
- in-order traversal will print them in sorted order
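- a sketch of an in-order traversal that prints the data (the inOrder name is made up):

public void inOrder(){
    if( isEmpty() ){
        return;
    }
    left.inOrder();            // everything smaller first
    System.out.println(data);  // then this node
    right.inOrder();           // finally everything larger (or equal)
}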
- successor and predecessor
- sometimes we may want to know where the predecessor or successor is in the tree, that is, the previous or next in the sequence of data
- the simple case:
- predecessor is the right-most node of the left subtree, i.e. the largest node of all of the elements that are less than a node
- successor is the left-most node of the right sub-tree, i.e. the smallest node of all of the elements that are larger than a node
- complications: what if a node doesn't have a left or right subtree?
- it could be the max or the min, in which case it might not have a successor or predecessor
- successor: what if there is no right subtree?
- let x be a node with no right subtree
- let y = successor(x)
- we know that predecessor(y) = x
- to find the predecessor of y, we'd look at the right-most node of y's left subtree
- so, to find the successor of x
- keep going up as long as the node we're at is a right child of its parent
- as soon as we reach a node that is a left child, its parent is the successor of x
- predecessor is similar
- what are the running times of predecessor and successor? O(h)
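- a rough sketch of successor, assuming each node also keeps a parent reference (which our BinaryTree may not actually have) and that this node is non-empty:

public BinaryTree<E> successor(){
    if( !right.isEmpty() ){
        // simple case: left-most node of the right subtree
        BinaryTree<E> node = right;
        while( !node.left.isEmpty() ){
            node = node.left;
        }
        return node;
    }
    // no right subtree: walk up as long as we're a right child
    BinaryTree<E> child = this;
    BinaryTree<E> par = parent;
    while( par != null && child == par.right ){
        child = par;
        par = par.parent;
    }
    return par;  // null means this node was the maximum
}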
- inserting into a binary tree
- assuming no duplicates, how can we insert into a binary tree?
- always add to a leaf node
- traverse down to some leaf node similar to search
- add a node to the left or right depending on whether it is larger than or smaller than the leaf
- running time? O(h)
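- a sketch of insertion, assuming the empty node we reach can be filled in place (the add name and constructor details are assumptions):

public void add(E item){
    if( isEmpty() ){
        // we've reached an empty node: it becomes a leaf holding item
        data = item;
        left = new BinaryTree<E>();   // new empty children
        right = new BinaryTree<E>();
    } else if( item.compareTo(data) < 0 ){
        left.add(item);
    } else {
        right.add(item);
    }
}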
- deleting from a binary tree
- let's say you're given a node (i.e. a BinaryTree) and you want to delete it
- 3 cases:
- it is a leaf: just delete it and set the parent's child reference to an empty node
- it only has one child: splice it out
- it has two children: replace x with its successor!
- we know the successor has no left child (it's the left-most node of x's right subtree), so it's easy to splice out
- we know the successor is larger than everything in x's left subtree (it's larger than x)
- we know the successor is smaller than everything else in x's right subtree (it's the smallest element of that subtree)
- running time? O(h)
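- a rough sketch of the three cases, assuming isEmpty() works by checking for null data, the getMin() sketch from above, and a hypothetical remove-by-value helper (the removeRoot name is made up):

public void removeRoot(){
    if( left.isEmpty() && right.isEmpty() ){
        // leaf: turn this node back into an empty node
        data = null;
    } else if( left.isEmpty() || right.isEmpty() ){
        // one child: splice it out by copying the child up into this node
        BinaryTree<E> child = left.isEmpty() ? right : left;
        data = child.data;
        left = child.left;
        right = child.right;
    } else {
        // two children: take the successor's data, then remove the
        // successor (which has no left child) from the right subtree
        data = right.getMin();
        right.remove(data);  // hypothetical remove-by-value method
    }
}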
- we're starting to see a recurring trend that most algorithms are bounded by the height of the binary search tree
- what is the worst-case height?
- O(n) the twig
- when does this happen?
- insert elements in sorted or reverse sorted order
- what is the best case height?
- O(log_2 n)
- when it's a full tree
- Randomized BST: the expected height of a randomly built binary search tree is O(log n), i.e. a tree built by inserting the values in a random order
- this is only useful if we know beforehand all of the data we'll be inserting
- does this give you an idea for a sorting algorithm?
- randomly insert the data into a binary search tree
- in-order traversal of the tree
- running time
- best-case: O(n log n)
- worst-case: O(n^2) - we could still get unlucky
- average-case: O(n log n)
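- a sketch of that sorting idea, using the hypothetical add and inOrder methods above (note that shuffle reorders the list in place):

public static <E extends Comparable<E>> void treeSort(java.util.List<E> items){
    java.util.Collections.shuffle(items);      // random insertion order
    BinaryTree<E> tree = new BinaryTree<E>();  // start from an empty tree
    for( E item : items ){
        tree.add(item);    // O(h) per insertion
    }
    tree.inOrder();        // in-order traversal prints the sorted order
}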
balanced trees
- even randomized trees still don't give us a guaranteed O(log n) height on the tree
- however, there are approaches that can guarantee this by making sure the tree doesn't become too "unbalanced"
- AVL trees
- red-black trees
- B-trees (used in databases and for "on-disk" trees)
red-black trees
- a binary search tree with additional constraints
- a binary search tree
- each node is also labeled with a color, red or black
- all empty nodes are black (i.e. children of leaves)
- the root is always black (this isn't technically required, but doesn't hurt us and makes our life easier)
- all red nodes have two black children
- for a given node, the number of black nodes on any path from that node to any leaf is the same
- how does this guarantee us our height?
- what is the shortest possible path from the root to any leaf?
- all black nodes
- what is the longest possible path from the root to any leaf?
- alternating red and black nodes (since a red node has to have two black children)
- what is the biggest difference between the longest and shortest path?
- since all paths must have the same number of black nodes, the longest path can be at most twice as long
- the tree can be no more than a factor of 2 imbalanced, which will still guarantee us O(log n) height, since 2 is just a constant multiplier
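- a bit more precisely (sketching the standard argument): if every root-to-leaf path contains b black nodes, then

\[
n \ge 2^{b} - 1 \quad\Rightarrow\quad b \le \log_2(n+1)
\]

and since red nodes can at most double a path's length,

\[
h \le 2b \le 2\log_2(n+1) = O(\log n)
\]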
- insertion into a red-black tree
- we insert as normal into the binary tree at a leaf
- we color the node inserted as red
- then we need to fix up the tree to make it maintain the constraints
- like delete for normal BSTs, there are a number of cases, with some more complicated than others
- beyond the scope of this class, but they utilize "rotations" of the tree to alter the structure
- rotations:
- basic idea is to rotate the child up into the parent position and then give the child on the side of the rotation to the old parent
- left-rotation
- x with left subtree alpha and right subtree y with left subtree beta and right subtree gamma
- becomes: y with right subtree gamma and left subtree x with left subtree alpha and right subtree beta
- right rotation is in the opposite direction
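- a sketch of a left rotation on a node x, using the same field names as above (the helper name and returning the new subtree root are assumptions):

public static <E> BinaryTree<E> rotateLeft(BinaryTree<E> x){
    BinaryTree<E> y = x.right;
    x.right = y.left;   // beta becomes x's new right subtree
    y.left = x;         // x becomes y's left child
    return y;           // y is the new root of this subtree
}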
- how might this help us?
- insert: 1, 2, 3 into the tree
- inserting 1 and 2 is fine
- after inserting 3, we have a twig
- if we rotate left, it looks more like a balanced tree
- look at demo:
http://www.ece.uc.edu/~franco/C321/html/RedBlack/rb.orig.html
n-ary trees