CS62 - Spring 2010 - Lecture 14
how's the course going so far?
midterm
- average 53
- high 68.5
look at new constructor in BinaryTree for empty tree
- I've also changed the constructor for BinaryTree(E data)
- why do we have empty trees?
- put them in any place where we would normally have a null left or right child
- what does this buy us?
- look at height method in BinaryTree code
- why return -1?
- height is computed as 1 + max(left.height(), right.height()), and we don't want to count the "empty" nodes, so an empty tree has height -1 and a leaf ends up with height 0
- by using an empty node, we can assume that all of the children of a non-empty node are non-null
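- a rough sketch of what that height method might look like (assuming the same data/left/right fields and isEmpty() method used in the search code below):

public int height(){
    if( isEmpty() ){
        // empty trees have height -1 so that a leaf ends up with height 0
        return -1;
    }
    // count this node plus the taller of the two subtrees
    return 1 + Math.max(left.height(), right.height());
}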
searching in binary trees
- look at search method in BinaryTree code
- running time?
- O(n)
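- a sketch of that search (same assumptions as above); with no ordering on the tree, we may have to look in both subtrees:

public boolean search(E item){
    if( isEmpty() ){
        return false;
    } else if( data.equals(item) ){
        return true;
    } else {
        // no ordering to exploit, so check both children
        return left.search(item) || right.search(item);
    }
}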
binary search trees
- we saw that search is O(n) time to find something in a tree. Can we do better?
- binary search trees speed up the searching process by keeping the tree in a structured order. In particular:
- left.data() < this.data() <= right.data()
- what are the implications of this?
- all items to the left are less than the value at the current node
- all items to the right are greater than or equal to the value of the current node
- how does this help us when it comes to searching?
- at each node, we can compare the value we're looking for to the value at this node and we know which branch it's down!
public boolean search(E item){
    if( isEmpty() ){
        // empty tree: the item can't be here
        return false;
    } else if( data.equals(item) ){
        // found it at this node
        return true;
    } else if( item.compareTo(data) < 0 ){
        // smaller items can only be in the left subtree
        return left.search(item);
    } else {
        // everything else can only be in the right subtree
        return right.search(item);
    }
}
- notice that now rather than recursing on both the left and the right in our search method, we recurse on one or the other
- has a similar feel to binary search
- what is the running time?
- O(h)
- when is this a good running time and when is this a bad running time?
- when the tree is full (or near full) then O(h) = O(log n)
- when the tree is a twig (or near a twig) then O(h) = O(n) so we haven't gained anything
- some more methods
- How can we find the minimum value in the tree?
- left-most value in the tree
- running time? O(h)
- Max?
- right-most value in the tree
- running time? O(h)
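- as a sketch, a getMin method might look like the following (the method name is made up; getMax is symmetric, following right children instead of left; assumes the tree is non-empty):

public E getMin(){
    // keep walking left until the left child is an empty tree
    if( left.isEmpty() ){
        return data;
    }
    return left.getMin();
}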
- traversal: what kind of tree traversal would make sense?
- in-order
- visit nodes to the left first
- then visit this node
- finally, visit nodes to the right
- in-order traversal will print them in sorted order
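- a sketch of an in-order traversal that prints the data (the inOrder name is made up):

public void inOrder(){
    if( isEmpty() ){
        return;
    }
    left.inOrder();            // everything smaller first
    System.out.println(data);  // then this node
    right.inOrder();           // finally everything larger (or equal)
}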
- successor and predecessor
- sometimes we may want to know where the predecessor or successor is in the tree, that is, the previous or next in the sequence of data
- the simple case:
- predecessor is the right-most node of the left subtree, i.e. the largest node of all of the elements that are less than a node
- successor is the left-most node of the right sub-tree, i.e. the smallest node of all of the elements that are larger than a node
- complications: what if a node doesn't have a left or right subtree?
- it could be the max or the min, in which case it might not have a successor or predecessor
- successor: what if there is no right subtree?
- let x be a node with no right subtree
- let y = successor(x)
- we know that predecessor(y) = x
- to find the predecessor of y, we'd look at the right-most node of y's left subtree
- so, to find the successor of x
- keep going up as long as the node we're at is a right child of its parent
- as soon as we reach a node that is a left child, its parent is the successor of x
- predecessor is similar
- what are the running times of predecessor and successor? O(h)
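- a rough sketch of successor, assuming each node also keeps a parent reference (which our BinaryTree may not actually have) and that this node is non-empty:

public BinaryTree<E> successor(){
    if( !right.isEmpty() ){
        // simple case: left-most node of the right subtree
        BinaryTree<E> node = right;
        while( !node.left.isEmpty() ){
            node = node.left;
        }
        return node;
    }
    // no right subtree: walk up as long as we're a right child
    BinaryTree<E> child = this;
    BinaryTree<E> par = parent;
    while( par != null && child == par.right ){
        child = par;
        par = par.parent;
    }
    return par;  // null means this node was the maximum
}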
- inserting into a binary tree
- assuming no duplicates, how can we insert into a binary tree?
- always add to a leaf node
- traverse down to some leaf node similar to search
- add a node to the left or right depending on whether it is larger than or smaller than the leaf
- running time? O(h)
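- a sketch of insertion, assuming the empty node we reach can be filled in place (the add name and constructor details are assumptions):

public void add(E item){
    if( isEmpty() ){
        // we've reached an empty node: it becomes a leaf holding item
        data = item;
        left = new BinaryTree<E>();   // new empty children
        right = new BinaryTree<E>();
    } else if( item.compareTo(data) < 0 ){
        left.add(item);
    } else {
        right.add(item);
    }
}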
- deleting from a binary tree
- let's say you're given a node (i.e. a BinaryTree) and you want to delete it
- 3 cases:
- it is a leaf: just delete it and set the parent's child reference to an empty node
- it only has one child: splice it out
- it has two children: replace x with its successor!
- we know the successor has no left child (it's the left-most node of x's right subtree), so it's easy to splice out
- we know the successor is larger than everything in x's left subtree (it's larger than x)
- we know the successor is smaller than everything else in x's right subtree (it's the smallest element of that subtree)
- running time? O(h)
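- a rough sketch of the three cases, assuming isEmpty() works by checking for null data, the getMin() sketch from above, and a hypothetical remove-by-value helper (the removeRoot name is made up):

public void removeRoot(){
    if( left.isEmpty() && right.isEmpty() ){
        // leaf: turn this node back into an empty node
        data = null;
    } else if( left.isEmpty() || right.isEmpty() ){
        // one child: splice it out by copying the child up into this node
        BinaryTree<E> child = left.isEmpty() ? right : left;
        data = child.data;
        left = child.left;
        right = child.right;
    } else {
        // two children: take the successor's data, then remove the
        // successor (which has no left child) from the right subtree
        data = right.getMin();
        right.remove(data);  // hypothetical remove-by-value method
    }
}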
- we're starting to see a recurring trend that most algorithms are bounded by the height of the binary search tree
- what is the worst-case height?
- O(n) the twig
- when does this happen?
- insert elements in sorted or reverse sorted order
- what is the best case height?
- O(log_2 n)
- when it's a full tree
- Randomized BST: the expected height of a randomly built binary search tree is O(log n), i.e. a tree built by inserting the values in a random order
- this is only useful if we know beforehand all of the data we'll be inserting
- does this give you an idea for a sorting algorithm?
- randomly insert the data into a binary search tree
- in-order traversal of the tree
- running time
- best-case: O(n log n)
- worst-case: O(n^2) - we could still get unlucky
- average-case: O(n log n)
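- a sketch of that sorting idea, using the hypothetical add and inOrder methods above (note that shuffle reorders the list in place):

public static <E extends Comparable<E>> void treeSort(java.util.List<E> items){
    java.util.Collections.shuffle(items);      // random insertion order
    BinaryTree<E> tree = new BinaryTree<E>();  // start from an empty tree
    for( E item : items ){
        tree.add(item);    // O(h) per insertion
    }
    tree.inOrder();        // in-order traversal prints the sorted order
}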
balanced trees
- even randomized trees still don't give us a guaranteed O(log n) height on the tree
- however, there are approaches that can guarantee this by making sure the tree doesn't become too "unbalanced"
- AVL trees
- red-black trees
- B-trees (used in databases and for "on-disk" trees)
red-black trees
- a binary search tree with additional constraints
- a binary search tree
- each node is also labeled with a color, red or black
- all empty nodes are black (i.e. children of leaves)
- the root is always black (this isn't technically required, but doesn't hurt us and makes our life easier)
- all red nodes have two black children
- for a given node, the number of black nodes on any path from that node to any leaf is the same
- how does this guarantee us our height?
- what is the shortest possible path from the root to any leaf?
- all black nodes
- what is the longest possible path from the root to any leaf?
- alternating red and black nodes (since a red node has to have two black children)
- what is the biggest difference between the longest and shortest path?
- since all paths must have the same number of black nodes, the longest path can be at most twice as long
- the tree can be no more than a factor of 2 imbalanced, which will still guarantee us O(log n) height, since 2 is just a constant multiplier
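- a bit more precisely (sketching the standard argument): if every root-to-leaf path contains b black nodes, then

\[
n \ge 2^{b} - 1 \quad\Rightarrow\quad b \le \log_2(n+1)
\]

and since red nodes can at most double a path's length,

\[
h \le 2b \le 2\log_2(n+1) = O(\log n)
\]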
- insertion into a red-black tree
- we insert as normal into the binary tree at a leaf
- we color the node inserted as red
- then we need to fix up the tree to make it maintain the constraints
- like delete for normal BSTs, there are a number of cases, with some more complicated than others
- beyond the scope of this class, but they utilize "rotations" of the tree to alter the structure
- rotations:
- basic idea is to rotate the child up into the parent position and then give the child on the side of the rotation to the old parent
- left-rotation
- x with left subtree alpha and right subtree y with left subtree beta and right subtree gamma
- becomes: y with right subtree gamma and left subtree x with left subtree alpha and right subtree beta
- right rotation is in the opposite direction
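- a sketch of a left rotation on a node x, using the same field names as above (the helper name and returning the new subtree root are assumptions):

public static <E> BinaryTree<E> rotateLeft(BinaryTree<E> x){
    BinaryTree<E> y = x.right;
    x.right = y.left;   // beta becomes x's new right subtree
    y.left = x;         // x becomes y's left child
    return y;           // y is the new root of this subtree
}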
- how might this help us?
- insert: 1, 2, 3 into the tree
- inserting 1 and 2 is fine
- after inserting 3, we have a twig
- if we rotate left, it looks more like a balanced tree
- look at demo:
http://www.ece.uc.edu/~franco/C321/html/RedBlack/rb.orig.html
n-ary trees