CS62 - Spring 2010 - Lecture 17
heap of trouble?
announcements
- office hours will start late ~11 on Wed.
- class participation if you've seen the material before...
- cs lunch on Thursday at noon in Frank West
tail recursion
- a recursive method is tail recursive if the last thing it does is call itself recursively
- which of the following are tail recursive?
public static int factorial(int n){
    if( n <= 1 ){
        return 1;
    }else{
        return n * factorial(n-1);
    }
}

public static int mystery1(int num){
    if( num%2 == 0 ){
        return num;
    }else{
        return mystery1(num/2);
    }
}

public static int pow1(int n, int i){
    if (i == 0){
        return 1;
    }else{
        return n * pow1(n, i-1);
    }
}

public static int pow2(int n, int i, int acc){
    if (i == 0){
        return acc;
    }else{
        return pow2(n, i-1, n * acc);
    }
}
- what about the heapify method in ArrayListPriorityQueue?
- why do we care about tail recursion?
- let's look at a call like pow1(2, 10)
- what would the call stack look like when we get to the base case?
- we'll have a stack frame for each of the 11 calls to pow1, one for each i = 10, 9, 8, ..., 0
- why do we need these?
- when we finally get to the base case, we can return the value, pop that frame off of the call-stack and then use the returned value in the next frame on the stack
- eventually, we'll pop all of the call frames off the stack and we will return our answer
- let's look at the corresponding call pow2(2, 10, 1)
- what would the call stack look like when we get to the base case?
- same as above
- do we need the call-stack in this case? What happens when we get to the base case?
- once we get to the base case, we actually have our answer and we just need to return the result
- for tail recursive methods, the last thing a method does is make the recursive call, therefore, we don't need to keep track of the actual call-stack since when we're done, we can just return our answer
- what is the benefit of this?
- less memory usage
- for example, if we called pow1 with a large enough i (e.g. pow1(1, 1000000)) we will get a stack overflow exception; in theory, we could avoid this since the stack isn't necessary (Java doesn't optimize for tail recursion, though :( )
- actually faster, since we don't have to deal with creating, pushing and popping stack frames
- tail recursion is interesting for two reasons:
- tail recursive methods are easy to convert into iterative methods
- some compilers will optimize for tail recursion
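To see why the conversion to an iterative method is easy, here is a sketch of pow2 rewritten as a loop (the name powIter and the wrapper class are mine, not part of the lecture code): the accumulator becomes a local variable, and the recursive call becomes the next loop iteration.

```java
public class TailDemo {
    // iterative version of the tail-recursive pow2: the parameters
    // of the recursive call simply become updated loop variables
    public static int powIter(int n, int i) {
        int acc = 1;              // plays the role of pow2's accumulator
        while (i != 0) {          // pow2's base case becomes the loop exit
            acc = n * acc;        // mirrors pow2(n, i-1, n * acc)
            i = i - 1;
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(powIter(2, 10)); // prints 1024
    }
}
```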
- converting to tail recursion
- often, we can convert a method to a tail recursive method with a little bit of work
- a common approach is what we had to do above for the pow1 method: introduce an additional accumulator variable that you pass along to keep track of the running result. When you hit the base case, you just return the result.
- write a tail recursive version of the factorial method
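One possible answer, following the accumulator approach (the names factorialAcc and factorial2, and the wrapper class, are illustrative, not the official solution):

```java
public class FactorialDemo {
    // tail-recursive helper: the running product travels in acc,
    // so nothing remains to do after the recursive call returns
    private static int factorialAcc(int n, int acc) {
        if (n <= 1) {
            return acc;                        // answer is already in acc
        } else {
            return factorialAcc(n - 1, n * acc); // tail call
        }
    }

    // wrapper so callers don't have to supply the initial accumulator
    public static int factorial2(int n) {
        return factorialAcc(n, 1);
    }

    public static void main(String[] args) {
        System.out.println(factorial2(5)); // prints 120
    }
}
```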
search
- look at the SimpleMap interface in the Hashtables code
- basic set of operations to keep track of a set
- add things to the set (via put)
- check if something exists in the set (via containsKey)
- remove things from the set
- how quickly can we implement this using a:
- ArrayList
- version 1: append to the end
- put: O(1)
- contains: O(n)
- version 2: keep in sorted order
- put: O(n)
- contains: O(log n)
- BST (balanced, e.g. Red-Black tree)
- put: O(log n)
- contains: O(log n)
- can we do better?
- what if we knew that the keys were in a certain range, e.g. between 0 and 1000?
- make an array of booleans from 0 to 1000
- the put method would simply switch that entry to true (remove to false)
- contains would just return whether that entry was true or not
- running time?
- put: O(1)
- contains: O(1)
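A minimal sketch of this boolean-array idea for keys in the range 0 to 1000 (the class and method names are illustrative, not from the lecture code):

```java
public class DirectAddressSet {
    // one slot per possible key: indices 0..1000
    private boolean[] table = new boolean[1001];

    public void put(int key)         { table[key] = true;  } // O(1)
    public void remove(int key)      { table[key] = false; } // O(1)
    public boolean contains(int key) { return table[key];  } // O(1)
}
```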
- what is the problem with this type of approach?
- not very memory efficient. What if we just have 10 things in our set?
- sometimes infeasible
- what if we want to keep track of the last names?
- last census: 88,799 last names
- how big of an array would we need, let's say assuming they're all < 10 characters long?
- 26^10 is roughly 1.4 * 10^14, a really big number
- even if we could store that, we'd be wasting a bunch of space
universe of keys
- we have some universe of keys (often called U) that we want to store, be it numbers, strings, objects, etc.
- the problem with using an array-based approach is that the array has to be the size of the universe of keys
hash functions
- a hash function is a function that maps the universe of keys to a restricted range, call it m, where m << |U|, that is m is much smaller than the universe of keys
- how does this help us?
- now we don't have to have an array of size |U|, just have to have an array of size m
- a hashtable is a data structure that uses an array of some sort to store the items. Using a hash function, any item is mapped to the array.
- to find if an item exists in the hash table, we hash the item and see if it exists in the table at the specified entry
- what can happen if m < |U|?
- we can have two things map to the same position in the array even though they're not equivalent, that is h(x) == h(y) even though !x.equals(y)
- this is called a "collision"
- a good hash function will try to avoid them but if m < |U|, they are inevitable
- why?
- pigeonhole principle: if n items are put into m pigeonholes with n > m, then at least one pigeonhole must contain more than one item
- simple idea, but often useful for proving things
hashCode
- every object in java has a method called hashCode that returns an attempt at a unique integer for that object
- how does this happen?
- it's another method (like equals and toString) that is inherited from the Object class
- the hashCode method for Object is based on the object's location in memory and does a fairly good job of providing unique numbers, however...
- if you plan on using Maps or hashtables with an Object, you should consider overriding the hashCode method
- the two requirements of the hashCode method are (http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Object.html#hashCode()):
- "Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified."
- "If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result."
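To illustrate the second requirement, here is a made-up class (not from the lecture code) whose hashCode agrees with its equals: equal points always produce the same integer.

```java
public class Point {
    private int x, y;

    public Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Point)) return false;
        Point p = (Point) other;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        // computed only from the fields that equals compares,
        // so equal points are guaranteed the same hash code
        return 31 * x + y;
    }
}
```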
- A number of the common classes (like String, Integer, etc) do have overridden hashCode methods
- how would you write a hashCode method for String?
- via ASCII we can easily get a number between 0 and 255 for each character. Now what?
- could just use the first letter?
- meets the requirement, but not a very good hash function since we'll get clumps
- add them up?
- a little bit better, but still not great (see Figure 15.7 from Bailey)
- treat it just like a big base-256 number:
\sum_{i = 0}^{l-1} s[i]c^i
where c = 256
- see Figure 15.9 from Bailey
- how did Bailey generate these tables?
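The sum above can be computed without ever forming c^i explicitly by using Horner's rule; a sketch in Java (the method name and wrapper class are mine, not Bailey's code; Java int arithmetic overflows silently, which is acceptable for hashing):

```java
public class StringHashDemo {
    // computes sum over i of s[i] * c^i with c = 256, via Horner's rule:
    // walking from the last character to the first, each step multiplies
    // the running total by c, so s[0] ends up with weight c^0
    public static int stringHash(String s) {
        final int c = 256;
        int h = 0;
        for (int i = s.length() - 1; i >= 0; i--) {
            h = h * c + s.charAt(i);
        }
        return h;
    }
}
```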
collision resolution by chaining
- ideas for solving this problem of collisions?
- a common approach is to allow multiple items to occupy a given entry in our array. How?
- rather than just storing the item at the entry, store a linked list
- put: if two items hash to the same location in the array, just add them to the linked list
- contains: search the linked list at that entry to see if the item being searched for is there
- walk through an example
- show ChainedHashtable class in the Hashtables code
- hashCodes are integers
- our table has a fixed length
- how do we remedy this?
- % table.length (look at getEntry method)
- what is the run-time of the put and containsKey methods?
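The chaining idea can be sketched in a few lines (this is a simplified illustration, not the actual ChainedHashtable class from the course code; the class and method names are mine):

```java
import java.util.LinkedList;

public class TinyChainedSet<K> {
    private LinkedList<K>[] table;

    @SuppressWarnings("unchecked")
    public TinyChainedSet(int size) {
        table = new LinkedList[size];
        for (int i = 0; i < size; i++) {
            table[i] = new LinkedList<K>();
        }
    }

    // map an arbitrary hashCode (possibly negative) to a valid index,
    // analogous to % table.length in the getEntry method
    private int getEntry(K key) {
        return Math.floorMod(key.hashCode(), table.length);
    }

    public void put(K key) {
        LinkedList<K> chain = table[getEntry(key)];
        if (!chain.contains(key)) { // colliding items share a chain
            chain.add(key);
        }
    }

    public boolean containsKey(K key) {
        // only the one chain at this entry needs to be searched
        return table[getEntry(key)].contains(key);
    }
}
```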