CS201 - Spring 2014 - Class 25

exercises

binary search tree height
   - most methods on a binary search tree are bounded by its height
   - what is the worst-case height?
      - O(n): the tree degenerates into a "twig" (a chain where every node has at most one child)
      - when does this happen?
         - insert elements in sorted or reverse sorted order
   - what is the best case height?
      - O(log_2 n)
      - when it's a complete tree
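
   - a minimal sketch of the two extremes, assuming a plain int-keyed BST with no rebalancing (BstHeightDemo, Node, and heightAfterInserting are made-up names for illustration, not course code); height here counts the nodes on the longest root-to-leaf path

```java
// Hypothetical demo class; the names here are not from the course code.
class BstHeightDemo {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Plain BST insert with no rebalancing (iterative version).
    static Node insert(Node root, int key) {
        Node fresh = new Node(key);
        if (root == null) return fresh;
        Node cur = root;
        while (true) {
            if (key < cur.key) {
                if (cur.left == null) { cur.left = fresh; break; }
                cur = cur.left;
            } else {
                if (cur.right == null) { cur.right = fresh; break; }
                cur = cur.right;
            }
        }
        return root;
    }

    // Height counted in nodes on the longest root-to-leaf path (empty tree = 0).
    static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    // Builds a tree by inserting the keys in the given order and reports its height.
    static int heightAfterInserting(int[] keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        return height(root);
    }

    public static void main(String[] args) {
        // Sorted order: every node hangs off the right of the previous one (the twig).
        System.out.println(heightAfterInserting(new int[] {1, 2, 3, 4, 5, 6, 7})); // prints 7
        // Median-first order: produces a complete tree.
        System.out.println(heightAfterInserting(new int[] {4, 2, 6, 1, 3, 5, 7})); // prints 3
    }
}
```

   - same seven keys, same insert code; only the insertion order changes the height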

   - Randomized BST: the expected height of a randomly built binary search tree is O(log n), i.e. a tree where the values are inserted in a random order
      - this is only useful if we know beforehand all of the data we'll be inserting
      - does this give you an idea for a sorting algorithm?
         - randomly insert the data into a binary search tree
         - in-order traversal of the tree
         - running time
            - best-case: O(n log n)
            - worst-case: O(n^2) - we could still get unlucky
            - average-case: O(n log n)
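
   - the sorting idea above could be sketched roughly as follows, assuming int data and a plain unbalanced BST (RandomizedTreeSort and its helpers are hypothetical names, not course code); the shuffle is what makes the insertion order random

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical class name; not part of the course code.
class RandomizedTreeSort {
    private static class Node {
        int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    // Standard unbalanced BST insert; duplicates go to the right.
    private static Node insert(Node root, int value) {
        Node fresh = new Node(value);
        if (root == null) return fresh;
        Node cur = root;
        while (true) {
            if (value < cur.value) {
                if (cur.left == null) { cur.left = fresh; return root; }
                cur = cur.left;
            } else {
                if (cur.right == null) { cur.right = fresh; return root; }
                cur = cur.right;
            }
        }
    }

    // In-order traversal appends the values in sorted order.
    private static void inOrder(Node node, List<Integer> out) {
        if (node == null) return;
        inOrder(node.left, out);
        out.add(node.value);
        inOrder(node.right, out);
    }

    // Shuffle a copy of the data, insert everything into a BST, then traverse in order.
    static List<Integer> sort(List<Integer> data, Random rng) {
        List<Integer> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, rng);
        Node root = null;
        for (int v : shuffled) root = insert(root, v);
        List<Integer> sorted = new ArrayList<>(data.size());
        inOrder(root, sorted);
        return sorted;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(5, 1, 4, 2, 3, 2);
        System.out.println(sort(data, new Random())); // prints [1, 2, 2, 3, 4, 5]
    }
}
```

   - the n inserts cost O(height) each and the traversal is O(n), which is where the O(n log n) average and O(n^2) worst case come from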

balanced trees
   - even randomized trees don't guarantee O(log n) height; we can still get unlucky
   - however, there are approaches that can guarantee this by making sure the tree doesn't become too "unbalanced"
      - AVL trees
      - red-black trees
      - B-trees (used in databases and for "on-disk" trees)

red-black trees
   - a binary search tree with additional constraints
      - a binary search tree
      - each node is also labeled with a color, red or black
      - the root is always black (this isn't technically required, but doesn't hurt us and makes our life easier)
      - every red node has two black children (an empty/null child counts as black)
      - for a given node, the number of black nodes on any path from that node to any leaf is the same
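
   - one way to make the color constraints concrete is a small checker; this is only a sketch, assuming null links play the role of (black) leaves, and RedBlackCheck, Node, and blackHeight are hypothetical names rather than course code; it checks colors only, not the BST ordering

```java
// Hypothetical names throughout; this is an illustration, not the course code.
class RedBlackCheck {
    static class Node {
        int key;
        boolean red; // true = red, false = black
        Node left, right;
        Node(int key, boolean red, Node left, Node right) {
            this.key = key;
            this.red = red;
            this.left = left;
            this.right = right;
        }
    }

    // Null links are treated as black.
    static boolean isRed(Node n) {
        return n != null && n.red;
    }

    // Returns the number of black nodes on every path from n down to a null link,
    // or -1 if some constraint is violated in this subtree.
    static int blackHeight(Node n) {
        if (n == null) return 0;
        // A red node may not have a red child.
        if (n.red && (isRed(n.left) || isRed(n.right))) return -1;
        int left = blackHeight(n.left);
        int right = blackHeight(n.right);
        // Both sides must be valid and must agree on the black count.
        if (left == -1 || right == -1 || left != right) return -1;
        return left + (n.red ? 0 : 1);
    }

    // Root must be black (the convention used in these notes) and all paths must agree.
    static boolean isValid(Node root) {
        return !isRed(root) && blackHeight(root) != -1;
    }

    public static void main(String[] args) {
        Node balanced = new Node(2, false,
                new Node(1, true, null, null),
                new Node(3, true, null, null));
        Node twig = new Node(1, false, null,
                new Node(2, false, null,
                        new Node(3, false, null, null)));
        System.out.println(isValid(balanced)); // prints true
        System.out.println(isValid(twig));     // prints false
    }
}
```

   - the all-black twig fails because the path through the right child sees more black nodes than the empty left side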

   - how does this guarantee us our height?
      - what is the shortest possible path from the root to any leaf?
         - all black nodes
      - what is the longest possible path from the root to any leaf?
         - alternating red and black nodes (since a red node has to have two black children)
      - what is the biggest difference between the longest and shortest path?
         - since all paths must have the same number of black nodes, the longest path can be at most twice as long as the shortest
         - the tree can be imbalanced by at most a factor of 2, which still guarantees O(log n) height, since 2 is just a constant multiplier

   - insertion into a red-black tree
      - we insert as normal into the binary tree at a leaf
      - we color the node inserted as red
      - then we need to fix up the tree to make it maintain the constraints
      - like delete for normal BSTs, there are a number of cases, with some more complicated than others
      - beyond the scope of this class, but they utilize "rotations" of the tree to alter the structure

   - rotations:
      - basic idea is to rotate a child up into its parent's position; the old parent becomes a child of that node, and the one subtree that no longer fits (beta below) is handed over to the old parent
      - left-rotation
         - x with left subtree alpha and right subtree y with left subtree beta and right subtree gamma
         - becomes: y with right subtree gamma and left subtree x with left subtree alpha and right subtree beta
      - right rotation is in the opposite direction
      - how might this help us?
      - insert: 1, 2, 3 into the tree
         - inserting 1 and 2 is fine
         - after inserting 3, we have a twig
         - if we rotate left, it looks more like a balanced tree
   - look at demo: http://www.cs.usfca.edu/~galles/visualization/RedBlack.html
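
   - a minimal sketch of the two rotations on a bare node class, applied to the 1, 2, 3 twig from above (RotationDemo and its Node are hypothetical names, and colors are left out); each rotation returns the new root of the rotated subtree, which the caller is assumed to re-attach

```java
// Hypothetical demo class; colors and parent pointers are omitted to keep the idea visible.
class RotationDemo {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Left rotation around x (assumes x.right != null):
    // y = x.right moves up, x becomes y's left child,
    // and y's old left subtree (beta) becomes x's right subtree.
    static Node rotateLeft(Node x) {
        Node y = x.right;
        x.right = y.left; // beta
        y.left = x;
        return y;         // new root of this subtree
    }

    // Right rotation is the mirror image (assumes y.left != null).
    static Node rotateRight(Node y) {
        Node x = y.left;
        y.left = x.right; // beta
        x.right = y;
        return x;
    }

    // Height counted in nodes (empty tree = 0).
    static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    public static void main(String[] args) {
        // The twig we get from inserting 1, 2, 3 in order.
        Node root = new Node(1);
        root.right = new Node(2);
        root.right.right = new Node(3);
        System.out.println(height(root)); // prints 3

        root = rotateLeft(root);
        System.out.println(root.key);     // prints 2
        System.out.println(height(root)); // prints 2
    }
}
```

   - a rotation only rewires a constant number of links, so it is O(1), and it preserves the in-order (sorted) ordering of the keys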

n-ary trees

data structures with a purpose
   - as I've mentioned before, there is no one single best data structure
   - data structures help us speed up certain operations
   - what was the purpose of binary search trees?
      - speed up searching for items when we have a dynamically changing set
         - balanced BSTs have O(log n) search, insert and delete

priority queues
   - what did queues allow us to do efficiently?
      - keep track of a sequential ordering of items
      - add to the back and remove from the front in constant time
   - Queues work well for operations when everything is equal, but this is often not the case
   - A priority queue is a queue where order is determined by an associated priority
      - items with the smallest priority value exit the queue before items with a larger priority value
   - look at PriorityQueue interface in PriorityQueue code
      - very simple interface (like queue)
      - we can add elements
      - the only way we can remove elements is via the extractMin method, which removes the smallest element from the set
   - when/where might priority queues be useful? common in scheduling tasks:
      - process scheduling
         - there are many processes running on your computer at any given time
         - each application you run has one or more processes associated with it
         - the operating system has many processes associated with it
         - why do we need priorities associated with processes?
            - some processes are just more important than others
            - enforce fairness (we can adjust priorities of those processes that aren't getting much processor time)
         - the "top" command (on Macs and Linux machines) shows you the processes and their priorities (on Windows this information is in the Task Manager: press ctrl+alt+del, select Task Manager, and then select the Processes tab)
            - shows a variety of information on the machine about the number of processes, cpu usage, memory usage, etc.
            - also shows each individual process and the cpu usage and the priority
            - typing 'q' exits top
      - network traffic scheduling
         - different information floating around the net may have higher priority than others
         - what might be some examples?
            - real-time/streamed data has higher priority over things like e-mail, etc.
            - certain customers might have higher priority
            - P2P protocol traffic (like bittorrent) often has lower priority
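
   - to see the scheduling behavior in action, here is a small sketch that uses the standard library's java.util.PriorityQueue (not the course's PriorityQueue interface); its poll() method plays the role of extractMin, and the Task class, task names, and priority numbers are invented for illustration

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical demo; Task and serviceOrder are made-up names.
class SchedulingDemo {
    static class Task {
        final String name;
        final int priority; // smaller value = leaves the queue earlier
        Task(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }
    }

    // Adds every task to a priority queue, then removes them smallest-priority-first.
    static List<String> serviceOrder(List<Task> tasks) {
        PriorityQueue<Task> queue =
                new PriorityQueue<>(Comparator.comparingInt((Task t) -> t.priority));
        queue.addAll(tasks);
        List<String> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            order.add(queue.poll().name); // poll() removes the minimum element
        }
        return order;
    }

    public static void main(String[] args) {
        List<Task> tasks = Arrays.asList(
                new Task("e-mail", 5),
                new Task("video stream", 1),
                new Task("bittorrent", 9));
        System.out.println(serviceOrder(tasks)); // prints [video stream, e-mail, bittorrent]
    }
}
```

   - the arrival order (e-mail first) does not matter; only the priority values decide who leaves first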

implementing a priority queue
   - what would be possible approaches?
      - use an ArrayList (or similar expandable linear structure)
      - two options:
         1) add at the back of the array
            - add: O(1)
            - extractMin: O(n)
         2) keep in sorted order with the minimum element (the next one to be extracted) at the back
            - add: O(n)
            - extractMin: O(1)

      - look at SimpleArrayListPriorityQueue class in PriorityQueue code
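
   - the two options could look roughly like this; a sketch only, and not the SimpleArrayListPriorityQueue class from the course code (UnsortedListPQ and SortedListPQ are hypothetical names, and throwing NoSuchElementException on an empty queue is an assumption)

```java
import java.util.ArrayList;
import java.util.NoSuchElementException;

// Option 1: unsorted list. add is O(1) (amortized), extractMin scans everything: O(n).
class UnsortedListPQ<E extends Comparable<E>> {
    private final ArrayList<E> items = new ArrayList<>();

    void add(E item) {
        items.add(item);
    }

    E extractMin() {
        if (items.isEmpty()) throw new NoSuchElementException("priority queue is empty");
        int minIndex = 0;
        for (int i = 1; i < items.size(); i++) {
            if (items.get(i).compareTo(items.get(minIndex)) < 0) minIndex = i;
        }
        // Move the last element into the hole so nothing has to shift.
        E min = items.get(minIndex);
        int last = items.size() - 1;
        items.set(minIndex, items.get(last));
        items.remove(last);
        return min;
    }

    boolean isEmpty() {
        return items.isEmpty();
    }
}

// Option 2: list kept in decreasing order, so the minimum sits at the back.
// add is O(n) (find the spot and shift), extractMin is O(1).
class SortedListPQ<E extends Comparable<E>> {
    private final ArrayList<E> items = new ArrayList<>();

    void add(E item) {
        // Walk from the back past every element smaller than the new item.
        int i = items.size();
        while (i > 0 && items.get(i - 1).compareTo(item) < 0) i--;
        items.add(i, item);
    }

    E extractMin() {
        if (items.isEmpty()) throw new NoSuchElementException("priority queue is empty");
        return items.remove(items.size() - 1);
    }

    boolean isEmpty() {
        return items.isEmpty();
    }
}
```

   - either way one of the two operations is linear; which option is better depends on whether adds or extractMins dominate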

restricting generic types
   - If we declare a generic type variable (e.g. <E>) this can be instantiated with ANY class
   - There are situations where we need to restrict which types can be used to instantiate the type variable
      - most often when you need to require that the class have certain attributes, e.g.
         - implement a particular interface
         - extend a particular class
      - you can add restrictions to the type variable
      - For example: <E extends Comparable<E>>
         - defines a type parameter E
         - the classes that can be used to instantiate this type parameter must implement "Comparable<E>"
         - in the code, we can then assume that anything of type E has the compareTo method!
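
   - a short sketch of the bound in use, on a generic method rather than a class (BoundedGenericDemo and max are hypothetical names); because of the "extends Comparable<E>" restriction, the call to compareTo compiles for any E

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of the course code.
class BoundedGenericDemo {
    // E must implement Comparable<E>, so every element is known to have compareTo.
    static <E extends Comparable<E>> E max(List<E> items) {
        if (items.isEmpty()) throw new IllegalArgumentException("empty list");
        E best = items.get(0);
        for (E item : items) {
            if (item.compareTo(best) > 0) best = item;
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(max(Arrays.asList(3, 9, 4)));              // prints 9
        System.out.println(max(Arrays.asList("pear", "apple")));      // prints pear
        // max(Arrays.asList(new Object())) would not compile:
        // Object does not implement Comparable<Object>.
    }
}
```

   - without the bound (plain <E>), the compiler would reject item.compareTo(best), since an arbitrary class need not have that method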