CS136, Lecture 24

Array representations of trees
Application: Heaps and Priority Queues
1. Inserting Into a Heap
2. Deleting the Root From a Heap
Sorting with Trees

Array representations of trees

We can also represent a tree in an array:

The array, data[0..n-1], holds the values to be stored in the tree. It does not contain references to the left or right subtrees.

Instead the children of node i are stored in positions 2*i +1 and 2*i + 2, and therefore the parent of a node j, may be found at (j-1)/2

The following example shows how a binary tree would be stored. The notation under the tree is a "parenthesis" notation for a tree. A tree is represented as (Root Left Right) where Root is the value of the root of the tree and Left and Right are the representations of the left and right subtrees (in the same notation). Leaves are represented by just writing their value. When a node has only one subtree, the space for the other is filled with () to represent the absence of a subtree.

Ex. (U (O C (M () P) ) (R (E T () ) S) )

IndexRange: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

data[]: U O R C M E S - - - P T - - -

Save space for links, but it is possible that there is exponentially much wasted storage:

Storing a tree of height n requires an array of length 2ⁿ - 1 (!), even if the tree only has O(n) elements. This makes this representation very expensive if you have a long, skinny tree. However it is very efficient for holding full or complete trees.

Application: Heaps and Priority Queues

Recall that a complete binary tree is one in which every level is full except possibly the bottom level and that level has all leaves in the leftmost positions. (Note that this is more restrictive than a balanced tree.)

Def: A Min-Heap H is a complete binary tree such that

1) H is empty or

2a) The root value is the smallest value in H and

2b) The left and right subtrees of H are also heaps.

This is equivalent to saying that H[i] <= H[2*i+1], H[2*i+2] for all approp values of i in the array representation of trees. Another way of looking at Min-Heap is that any path from a leaf to the root is in non-ascending order.

This turns out to be exactly what is needed to implement a priority queue.

A priority queue is a queue in which the elements with lowest priority values are removed before elements with higher priority.

public interface PriorityQueue {
    public Comparable peek();
    // pre: !isEmpty()
    // post: returns the minimum value in priority queue

    public Comparable remove();
    // pre: !isEmpty()
    // post: returns and removes minimum value from queue

    public void add(Comparable value);
    // pre: value is non-null comparable
    // post: value is added to priority queue

    public boolean isEmpty();
    // post: returns true iff no elements are in queue

    public int size();
    // post: returns number of elements within queue

    public void clear();
    // post: removes all elements from queue
}

One can implement a priority queue as a regular queue where either you work harder to insert or to remove an element (i.e. store in priority order, or search each time to remove lowest priority elements).

Unfortunately, in these cases either adding or deleting an element will be O(n). (Which one is O(n) depends on which of the two schemes is adopted!)

Can provide more efficient implementation with heap!

- remove element with lowest priority (at root of tree) and then remake heap.

Ex.

Note: In a heap - if a node has only 1 child, it must be a left child.

IndexRange: 0 1 2 3 4 5 6 7 8 9 10

data: 10 20 14 31 40 45 60 32 33 47 -

Inserting Into a Heap

1) Place number to be inserted at the next free position.

2) "Percolate" it up to correct position

Example: Insert 15 into the heap above.

Deleting the Root From a Heap

1) Save value in root element

2) Move last element to root

3) Push down element now in the root position (it was formerly the last element) to its correct position by repeatedly swapping it with the smaller of its two children.

Example: Delete root in the above example.

These are exactly what are needed to implement add and remove methods for priority queue!

public class VectorHeap implements PriorityQueue
{
    protected Vector data;

    public VectorHeap()
    // post: constructs a new priority queue.
    {
        data = new Vector();
    }

    protected static int parentOf(int i)
    // post: returns index of parent of value at i
    {
        return (i-1)/2;
    }

    protected static int leftChildOf(int i)
    // post: returns index of left child of value at i
    {
        return 2*i+1;
    }

    protected static int rightChildOf(int i)
    // post: returns index of right child of value at i
    {
        return 2*(i+1);
    }

    public Comparable peek()
    // pre: !isEmpty()
    // post: returns minimum value in queue
    {
        return (Comparable)data.elementAt(0);
    }

    public Comparable remove()
    // pre: !isEmpty()
    // post: removes and returns minimum value in queue
    {
        Comparable minVal = peek();
        data.setElementAt(data.elementAt(data.size()-1),0);
        data.setSize(data.size()-1);
        if (data.size() > 1) pushDownRoot(0);
        return minVal;
    }

    public void add(Comparable value)
    // pre: value is non-null comparable object
    // post: adds value to priority queue
    {
        data.addElement(value);
        percolateUp(data.size()-1);
    }

    public boolean isEmpty()
    // post: returns true iff queue has no values
    {
        return data.size() == 0;
    }

    protected void percolateUp(int leaf)
    // pre: 0 <= leaf < size
    // post: takes value at leaf in near-heap,
    //       and pushes up to correct location
    {
        int parent = parentOf(leaf);
        Comparable value = (Comparable)(data.elementAt(leaf));
        while (leaf > 0 && (value.lessThan((Comparable)(data.elementAt(parent)))))
        {
           data.setElementAt(data.elementAt(parent),leaf);
           leaf = parent;
           parent = parentOf(leaf);
        }
        data.setElementAt(value,leaf);
    }

    protected void pushDownRoot(int root)
    // pre: 0 <= root < size
    // post: pushes root down into near-heap
    //       constructing heap
    {
        int heapSize = data.size();
        Comparable value = (Comparable)data.elementAt(root);
        while (root < heapSize) {
          int childpos = leftChildOf(root);
          if (childpos < heapSize)
          {
            if ((rightChildOf(root) < heapSize) &&
            (((Comparable)(data.elementAt(childpos+1))).lessThan
                   ((Comparable)(data.elementAt(childpos)))))
            {
              childpos++;
            }
            // Assert: childpos indexes smaller of two children
            if (((Comparable)(data.elementAt(childpos))).lessThan(value))
            {
              data.setElementAt(data.elementAt(childpos),root);
              root = childpos; // keep moving down
            } else { // found right location
              data.setElementAt(value,root);
              return;
            }
          } else { // at a leaf! insert and halt
            data.setElementAt(value,root);
            return;
          }       
        }
    }

    ...
}

Notice how these heap operations implement a priority queue.

When you add a new element in a priority queue, copy it into the next free position of the heap and sift it up into its proper position.

When you remove the next element from the priority queue, remove the element from the root of heap (first elt, since it has lowest number for priority), move the last element up to the first slot, and then sift it down.

How expensive are sift up and sift down?

Each are log n. This compares very favorably with holding the priority queue as regular queue and inserting new elements into the right position in the queue and removing them from the front.

Skip section 12.3.2 on skew heaps.

Sorting with Trees

1. Tree sort:

We can build a binary search tree (as explained in next chapter of text) and then do an inorder traversal. Since the cost of entering an element into a (balanced) binary tree of size n is log n, the cost of building the tree is

(log 1) + (log 2) + (log 3) + ... + (log n) = O(n log n) compares.

Traversal is O(n). Total cost is O(n log n) in both the best and average cases.

The worst case is if the list is in order, it then behaves more like an insertion sort, creating a tree with one long branch. This results in a tree search as bad as O(n²).

The heap sort described below is always better - since it automatically keeps the tree in balance.

2. Heap Sort

We build a heap with the smallest element at top (taking <= (n/2) log n compares)

Once the heap is established remove elements one at a time, putting smallest at end, second smallest next to end, etc.

In detail:

Swap top with last element, sift down, do heap sort on remaining n-1 elements.

Ex. 25, 46, 19, 58, 21, 23, 12

public void heapSort (VectorHeap aheap)
{
    int last = aheap.size()-1;      // keeps track of how much of list is now in a heap 

     // Construct the initial heap.  Push down elts starting w/parent of last elt.
    for (int index = (last-1) / 2; index >= 0; index--)
    aheap.pushDownRoot(index);

    // Extract the elements in sorted order. 
    for (int index = last; index > 0; index--)
    {
            aheap.Swap(0, index);
            aheap.pushDownRoot(0)
    }
 }

We've cheated here since pushDownRoot is not public, but could make it so

Each sift down takes <= log n steps.

Therefore total compares <= f(n,2) log n + n log n = f(3n,2) log n, in worst case. Average about same.

No extra space needed!

Actually, with a little extra work we can show that the initial "heapifying" of the list can be done in O(n) compares. The key is that we only call SiftDown on the first half of the elements of the list. That is, no calls of SiftDown are made on subscripts corresponding to leaves of the tree (corresponding to n/2 of the elements). For those elements sitting just above the leaves (n/4 of the elements), we only go through the loop once (and thus we make only two comparisons of priorities). For those in the next layer (n/8 of the elements) we only go through the loop twice (4 comparisons), and so on. Thus we make 2*(n/4) + 4*(n/8) + 6*(n/16) + ... + 2*(log n)*(1) total comparisons. We can rewrite this as n*( 1/2¹ + 2/2² + 3/2³ + ... + log n/2^{log n}). (Of course in the last term, 2^{log n} = n, so this works out as above.) The sum inside the parentheses can be rewritten as Sum for i=1 to log n of (i/2ⁱ). This is clearly bounded above by the infinite sum, Sum for i=1 to infinity of i/2ⁱ. With some work the infinite sum can be shown to be equal to 2. (The trick is to arrange the terms in a triangle:

         1/2 +  1/4  +  1/8 +  1/16 + ... =  1
                1/4  +  1/8 +  1/16 + ... =  1/2
                        1/8 +  1/16 + ... =  1/4
                               1/16 + ... =  1/8
                                      ... = ..
                 -------------------------------
          Sum for i=1 to infinity of i/2ⁱ = 2

Thus n*( 1/2¹ + 2/2² + 3/2³ + ... + log n/2^{log n}) <= 2n, and hence the time to heapify an array is O(n).

3. Comparisons of Advanced sorts:

Quicksort fastest on average - O(n log n)), but bad in worst case O(n²), and takes O(log n) extra space.

Low overhead makes it perform well on average.

HeapSort takes O(n log n) in average and worst case, no extra space.

On random data somewhat slower than Quicksort and MergeSort. If you only need the first few items in a sorted list, it can be better since the initial heapify can be done in time O(n).

MergeSort takes O(n log n) in average and worst case, O(n) extra space.

On random data somewhat slower than Quicksort.
Performs well on external files where all data will not fit into memory.

All suffer from copying of large elements except insertion and merge sorts of linked lists.

Selection sort least affected since # copies is O(n), for rest is same as # compares.