Red-black trees also maintain balance, but they need only O(1) restructuring (rotations) after an update, although recoloring may still propagate O(log n) levels up the tree.
A red-black tree is a binary search tree with nodes colored red and black in a way that satisfies the following properties:
1. Root property: the root is black.
2. External property: every external node is black.
3. Red property: the children of a red node are black (so no red node has a red child).
4. Depth property: all external nodes have the same black depth (number of black ancestors).
Proposition: The height h of a red-black tree storing n entries is O(log n). In fact, log_2(n+1) <= h <= 2 log_2(n+1).
The idea behind this is that if we erase the red nodes (merging each into its black parent), the tree becomes perfectly balanced: every external node sits at the same depth, the black depth. Hence the black height is logarithmic in the number of black nodes. Since no red node has a red child, the red nodes can at most double the length of a path.
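This counting argument can be made precise. Here is a short derivation (a sketch, writing h for the height, bh for the common black depth of the external nodes, and n for the number of internal nodes):

    \begin{align*}
      bh &\ge h/2          && \text{(no red node has a red child)} \\
      n  &\ge 2^{bh} - 1   && \text{(the black nodes alone form a perfectly balanced tree)} \\
      n  &\ge 2^{h/2} - 1  && \Longrightarrow\ h \le 2\log_2(n+1)
    \end{align*}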
Insertion in a red-black tree: insert the new entry as in an ordinary binary search tree and color the new node n red (if n is the root, color it black and we are done). If n's parent p is black, all of the properties still hold. If p is also red, we have a double-red conflict. Let g be n's grandparent and u be n's uncle (the other child of g). If u is red, recolor p and u black and g red, and repeat the fix-up at g, since the conflict may reappear there. If u is black, we restructure with at most two rotations, as follows.
If n and p are opposite children (one is a left child and the other a right child), then perform a rotation that moves n up to p's place in the tree (and p moves down). Notice that this does not change the black depth of any external node. Now they are both right children or both left children; after this rotation we rename the nodes so that p again denotes the red child of g and n denotes its red child.
Now that n and p are the same kind of child, rotate around the grandparent g so that p moves up to the root of the subtree. Then swap the colors of p and g: p becomes black and g becomes red, so n and g are now both red children of the black node p. Notice that u is still a child of g and is black (this is the case we are in; a red u was handled above by recoloring). It is easy to verify that we have fixed the double-red conflict and not changed any other paths; in particular, the black depth of each external node is the same as in the original.
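A compact Java sketch of this fix-up is below. It is a sketch under stated assumptions, not code from the course materials: the names (RBInsertFix, rotateUp, fixDoubleRed) are illustrative, the ordinary BST insertion step is omitted, and null children stand for the (black) external nodes.

    public class RBInsertFix {
        static final boolean RED = true, BLACK = false;

        static class Node {
            int key;
            boolean color = RED;      // newly inserted nodes start out red
            Node left, right, parent;
        }

        Node root;

        // Rotate n up into its parent's place (handles both directions).
        void rotateUp(Node n) {
            Node p = n.parent, g = p.parent;
            if (p.left == n) {        // n is a left child: rotate right
                p.left = n.right;
                if (n.right != null) n.right.parent = p;
                n.right = p;
            } else {                  // n is a right child: rotate left
                p.right = n.left;
                if (n.left != null) n.left.parent = p;
                n.left = p;
            }
            p.parent = n;
            n.parent = g;
            if (g == null) root = n;
            else if (g.left == p) g.left = n;
            else g.right = n;
        }

        // Restore the properties after a BST insertion of the red node n.
        void fixDoubleRed(Node n) {
            while (n != root && n.parent.color == RED) {
                Node p = n.parent, g = p.parent;
                Node u = (g.left == p) ? g.right : g.left;   // the uncle
                if (u != null && u.color == RED) {
                    // Recoloring case: the conflict may reappear at g.
                    p.color = BLACK; u.color = BLACK; g.color = RED;
                    n = g;
                } else {
                    // Restructuring case (u is black).
                    if ((g.left == p) != (p.left == n)) {
                        rotateUp(n);  // n and p are opposite children
                        p = n;        // the old n now plays the role of p
                    }
                    rotateUp(p);      // rotate p up into g's place
                    p.color = BLACK;  // swap the colors of p and g
                    g.color = RED;
                    break;            // no double red remains
                }
            }
            root.color = BLACK;       // the root is always black
        }
    }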
The main advantage of red-black trees over AVL trees is that they need fewer rotations, so in practice they will be a bit faster. However, the algorithms are considerably more complex, so AVL trees are easier to explain.
The pictures in these lecture notes are from the Wikipedia article on red-black trees.
"Splay" means to spread outwards.
The tree can get more unbalanced with splay operations, but the depth of the nodes on the original access path is (on average) cut in half. Therefore, if we repeatedly look for the same elements, we will find them much faster, which happens in practice.
Splay trees are implemented as a specialization of binary search trees: same interface, but when we perform an insert or search, we splay the element (rotate it up to the root), as sketched below. When we remove an element, we splay its parent.
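Here is a minimal Java sketch of the splay operation itself, assuming nodes carry parent pointers; the names (SplaySketch, rotateUp, splay) are illustrative rather than taken from any particular implementation.

    public class SplaySketch {
        static class Node {
            int key;
            Node left, right, parent;
        }

        Node root;

        // Rotate x up one level (handles both left and right children).
        void rotateUp(Node x) {
            Node p = x.parent, g = p.parent;
            if (p.left == x) {
                p.left = x.right;
                if (x.right != null) x.right.parent = p;
                x.right = p;
            } else {
                p.right = x.left;
                if (x.left != null) x.left.parent = p;
                x.left = p;
            }
            p.parent = x;
            x.parent = g;
            if (g == null) root = x;
            else if (g.left == p) g.left = x;
            else g.right = x;
        }

        // Move x all the way to the root; nodes on the original access
        // path end up at roughly half their old depth.
        void splay(Node x) {
            while (x.parent != null) {
                Node p = x.parent, g = p.parent;
                if (g == null) {
                    rotateUp(x);       // zig: p is the root
                } else if ((g.left == p) == (p.left == x)) {
                    rotateUp(p);       // zig-zig: same-side children;
                    rotateUp(x);       //   rotate p first, then x
                } else {
                    rotateUp(x);       // zig-zag: opposite sides;
                    rotateUp(x);       //   rotate x up twice
                }
            }
        }
    }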
If more recently accessed elements are more likely to be accessed again, then this can be a big win, but we lose big if every element is equally likely to be accessed; even then, however, splay trees give O(log n) amortized behavior. Note that they need no extra information (such as heights or colors) at the nodes.
So far we have seen two ways in which we can represent sets:
Bitstrings are very fast and easy and support set operations using the usual bit operations of & (for intersection), | (for union) and ~ (for complement). Set subtraction, A-B, can also be supported by a combination of these operations (left as an exercise for the reader).
Unfortunately we need a discrete linear ordering for this to work (e.g., every element has a unique successor), and in fact the elements all need to be codeable as non-negative integers. Thus this works well for subranges of ints, for chars, and for enumerated types. However, it will not work well for strings or for other complex orderings.
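For instance, here is a minimal Java sketch of these operations, representing a set of small non-negative ints as the bits of a single long (element i is present iff bit i is set); the names are illustrative.

    public class BitStringSets {
        public static void main(String[] args) {
            long a = (1L << 1) | (1L << 3) | (1L << 5);  // A = {1, 3, 5}
            long b = (1L << 3) | (1L << 4);              // B = {3, 4}

            long union        = a | b;    // {1, 3, 4, 5}
            long intersection = a & b;    // {3}
            long difference   = a & ~b;   // A - B = {1, 5}: intersect A
                                          // with B's complement (the
                                          // exercise above)

            System.out.println(Long.toBinaryString(union));        // 111010
            System.out.println(Long.toBinaryString(intersection)); // 1000
            System.out.println(Long.toBinaryString(difference));   // 100010
        }
    }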
Hash tables work well for representing elements when we have a good hash function, but they don't support the set operations well at all. Taking a union or intersection would involve traversing all of the slots of a hash table (empty and non-empty) to process the individual elements. This is O(N), where N is the size of the table, which is usually larger than the number n of elements actually stored (strictly speaking big-O ignores constant factors, but you know what I mean!).
A simple alternative is to use an ordered sequence (e.g., an ordered linked list). It is easy to see how we could perform union, intersection, and set subtraction by walking the two sequences in parallel, as in the merge step of merge sort (see the sketch below). If moving to the next element, comparing, and copying are all O(1), then the entire operation is O(n+m), where n and m are the sizes of the sets involved.
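Here is a minimal Java sketch of one such merge-style operation, union, on sorted lists of distinct elements; the names are illustrative, and the on-line OrderedListSet code may be organized differently (array-backed lists are used here just to keep the sketch short).

    import java.util.ArrayList;
    import java.util.List;

    public class MergeUnion {
        // Union of two sorted lists of distinct elements in O(n + m) time.
        static List<Integer> union(List<Integer> a, List<Integer> b) {
            List<Integer> out = new ArrayList<>();
            int i = 0, j = 0;
            while (i < a.size() && j < b.size()) {
                int x = a.get(i), y = b.get(j);
                if (x < y)      { out.add(x); i++; }
                else if (y < x) { out.add(y); j++; }
                else            { out.add(x); i++; j++; }  // in both: copy once
            }
            while (i < a.size()) out.add(a.get(i++));      // copy leftovers
            while (j < b.size()) out.add(b.get(j++));
            return out;
        }

        public static void main(String[] args) {
            System.out.println(union(List.of(1, 3, 5), List.of(2, 3, 6)));
            // prints [1, 2, 3, 5, 6]
        }
    }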
See the Template Method design pattern in the text, and the code on-line in OrderedListSet.