The purpose of this course is not just to help you write bigger and more complex programs, but to help you become smarter programmers. Thus we focus on the careful design and analysis of data structures and algorithms to solve problems. This will be an important focus of the examinations.
I will presume that you have read the "text", on-line lecture notes, and sample code discussed in class.
Please come prepared for labs by having thought through the material and created a design for the program so that you can use the lab time effectively.
In general, if we have a polynomial of the form a_0 n^k + a_1 n^(k-1) + ... + a_k, we say it is O(n^k), since for n >= 1 the whole sum is at most (|a_0| + |a_1| + ... + |a_k|) n^k.
The most common are
O(1) - for any constant
O(log n), O(n), O(n log n), O(n^2), ..., O(2^n)
We usually use these to measure the time and space complexity of algorithms.
Insertion of a new first element in an array of size n is O(n), since we must bump all the other elements up by one place.
Insertion of a new last element in a vector of size n is O(1) if there is enough room for it, and O(n) otherwise (since the contents must be copied to a larger array).
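A minimal sketch (hypothetical class, not from the text) of both cases, assuming the backing array already has room for the new element:

```java
// Sketch: why inserting at the front is O(n) while appending
// (when there is room) is O(1). Names are illustrative only.
public class InsertCost {
    static Object[] data = new Object[10];
    static int count = 0;  // number of slots in use

    // O(n): every existing element moves up one slot
    static void insertFirst(Object value) {
        for (int i = count; i > 0; i--)
            data[i] = data[i - 1];  // bump element up one place
        data[0] = value;
        count++;
    }

    // O(1) when there is room: just drop the value in the next slot
    static void insertLast(Object value) {
        data[count] = value;
        count++;
    }

    public static void main(String[] args) {
        insertLast("a"); insertLast("b");
        insertFirst("z");  // shifts "a" and "b" up one slot
        System.out.println(data[0] + " " + data[1] + " " + data[2]);  // z a b
    }
}
```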
We saw that increasing the array size by 1 at a time to build up to size n takes 1 + 2 + ... + (n-1) = n(n-1)/2 copy steps, which is O(n^2).
We saw that increasing the array size to n by doubling each time takes 1 + 2 + 4 + ... + n/2 = n - 1 copy steps, which is O(n).
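A small sketch (hypothetical class, not from the text) that counts the copies made under each growth strategy:

```java
// Sketch: count the element copies made while growing an array
// to capacity n under the two growth strategies.
public class GrowthCost {
    static long copiesGrowingByOne(int n) {
        long copies = 0;
        for (int size = 1; size < n; size++)
            copies += size;  // growing from size to size+1 copies `size` elements
        return copies;       // n(n-1)/2
    }

    static long copiesDoubling(int n) {
        long copies = 0;
        for (int size = 1; size < n; size *= 2)
            copies += size;  // growing from size to 2*size copies `size` elements
        return copies;       // n - 1 when n is a power of two
    }

    public static void main(String[] args) {
        int n = 1 << 20;  // about a million elements
        System.out.println("grow by 1: " + copiesGrowingByOne(n));  // ~5.5 * 10^11
        System.out.println("doubling:  " + copiesDoubling(n));      // 1,048,575
    }
}
```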
A table of values shows the difference.
Suppose we have operations with time complexity O(log n), O(n), O(n log n), O(n^2), and O(2^n).
And suppose each can solve a problem of size n in time t. How much time does it take to solve a problem 10, 100, or 1000 times larger?
complexity\size | 10n | 100n | 1000n |
---|---|---|---|
O(log n) | ~t+3 | ~t+7 | ~t+10 |
O(n) | 10t | 100t | 1,000t |
O(n log n) | >10t | >100t | >1,000t |
O(n^2) | 100t | 10,000t | 1,000,000t |
O(2^n) | ~t^10 | ~t^100 | ~t^1000 |
*Note that the first and last lines depend on the constant factor being 1 (and, for the first line, base-2 logarithms); otherwise the times are somewhat different.
Suppose we get a new machine that allows a certain speed-up. How much larger a problem can we solve? If the original machine allowed the solution of a problem of size k in time t, then:
complexity\speed-up | 1x | 10x | 100x | 1000x |
---|---|---|---|---|
O(log n) | k | k^10 | k^100 | k^1000 |
O(n) | k | 10k | 100k | 1,000k |
O(n log n) | k | <10k | <100k | <1,000k |
O(n^2) | k | 3k+ | 10k | 30k+ |
O(2^n) | k | k+3 | k+7 | k+10 |
We will use big-Oh notation to help us measure the complexity of algorithms.
We deal only with searches here; we'll come back to sorts later.
The code for all the searches is on-line in the Sort program example.
If the list has n elements, then linear search makes n compares in the worst case.
With each recursive call we do at most two compares: one for equality and one to decide which half to search. What is the maximum number of recursive calls? Since each call halves the range under consideration, there are at most about log n recursive calls (logs base 2), and hence about 2 log n compares in the worst case.
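A minimal recursive sketch (hypothetical code, not the course's on-line version), assuming a sorted array of ints:

```java
// Sketch: recursive binary search on a sorted int array. Each call
// makes at most two comparisons, and the range [low, high] halves on
// each recursive call, so there are at most about log2(n) calls.
public class BinSearch {
    static boolean search(int[] data, int target, int low, int high) {
        if (low > high) return false;          // empty range: not found
        int mid = (low + high) / 2;
        if (data[mid] == target) return true;  // compare 1
        if (target < data[mid])                // compare 2
            return search(data, target, low, mid - 1);
        else
            return search(data, target, mid + 1, high);
    }

    public static void main(String[] args) {
        int[] data = {2, 3, 5, 7, 11, 13};
        System.out.println(search(data, 7, 0, data.length - 1));  // true
        System.out.println(search(data, 8, 0, data.length - 1));  // false
    }
}
```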
A concrete comparison of the worst-case number of comparisons:
Search\# elts | 10 | 100 | 1000 | 1,000,000 |
---|---|---|---|---|
linear | 10 | 100 | 1000 | 1,000,000 |
binary | 8 | 14 | 20 | 40 |
We can actually make binary search faster if we don't compare for equality until only 1 element is left! Then each step makes just one compare, for about log n compares total instead of 2 log n.
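A sketch of that variant (again hypothetical code, assuming a sorted array of ints):

```java
// Sketch: binary search that defers the equality test until one
// element remains. Each loop iteration makes only one comparison,
// so roughly log2(n) compares total instead of 2*log2(n).
public class LazyBinSearch {
    static boolean search(int[] data, int target) {
        if (data.length == 0) return false;
        int low = 0, high = data.length - 1;
        while (low < high) {
            int mid = (low + high) / 2;
            if (data[mid] < target)   // the only compare per iteration
                low = mid + 1;        // target must be to the right
            else
                high = mid;           // target is at mid or to the left
        }
        return data[low] == target;   // single equality test at the end
    }

    public static void main(String[] args) {
        int[] data = {2, 3, 5, 7, 11, 13};
        System.out.println(search(data, 11));  // true
        System.out.println(search(data, 4));   // false
    }
}
```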
The data structures we examine are sometimes called container classes because they contain collections of elements. Virtually all of the data structures we will be studying in this course have interfaces which extend Container (in the structure package):
```java
package structure;

public interface Container {
    public int size();
    // post: returns number of elements contained in container

    public boolean isEmpty();
    // post: returns true iff container is empty

    public void clear();
    // post: clears container
}
```
Unfortunately, Java does not currently allow the user to define data structures that are parameterized by their element type. For example, we saw earlier that Vectors were defined to hold values of type Object. This has the disadvantage that if we put in elements of some specific type, we often have to insert a downcast in order to use an element after removing it from the Vector.
Of course, we could have defined vectors specifically to hold elements of one particular type -- Renderable, for example. The problem is that we might then have to write a different version for each application.
Until Java adds features for writing parameterized data structures (they're thinking about it), we will instead imitate Vector and write data structures designed to hold elements of type Object. This way we can insert elements of any object type, while values of base type can be wrapped in their corresponding object forms and inserted.
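For example, a small sketch (hypothetical class name) using java.util.Vector to show wrapping a base type on the way in and downcasting on the way out:

```java
// Sketch: storing values in an Object-typed container and downcasting
// on the way out.
import java.util.Vector;

public class VectorDemo {
    public static void main(String[] args) {
        Vector v = new Vector();
        v.addElement("hello");          // any object type can go in
        v.addElement(new Integer(42));  // base type int, wrapped as an Integer

        String s = (String) v.elementAt(0);            // downcast to use as String
        int n = ((Integer) v.elementAt(1)).intValue(); // unwrap the int
        System.out.println(s + " " + n);               // prints: hello 42
    }
}
```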
```java
public interface List extends Container {
    public Iterator elements();  // ignore for now!
    // post: returns an iterator allowing
    //       ordered traversal of elements in list

    public int size();           // from Container
    // post: returns number of elements in list

    public boolean isEmpty();    // from Container
    // post: returns true iff list has no elements

    public void clear();         // from Container
    // post: empties list

    public void addToHead(Object value);
    // post: value is added to beginning of list

    public void addToTail(Object value);
    // post: value is added to end of list

    public Object peek();
    // pre: list is not empty
    // post: returns first value in list

    public Object tailPeek();
    // pre: list is not empty
    // post: returns last value in list

    public Object removeFromHead();
    // pre: list is not empty
    // post: removes first value from the list

    public Object removeFromTail();
    // pre: list is not empty
    // post: removes the last value from the list

    public boolean contains(Object value);
    // post: returns true iff list contains an object equal
    //       to value

    public Object remove(Object value);
    // post: removes and returns element equal to value,
    //       otherwise returns null
}
```
We can imagine other useful operations on lists, such as returning the nth element, but we'll stick with this simple specification for now.
The text has a simple example of reading successive lines from a file and adding each line to the end of a list if it doesn't duplicate an element already in the list. This is easily handled with the operations provided.
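A minimal sketch of that example (hypothetical class names; it assumes some implementation of List, such as the VectList discussed below):

```java
// Sketch of the text's example: keep only the first occurrence of
// each line. Assumes VectList (below) implements the List interface.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class UniqueLines {
    public static void main(String[] args) throws IOException {
        List unique = new VectList();
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        String line;
        while ((line = in.readLine()) != null) {
            if (!unique.contains(line))  // skip duplicates
                unique.addToTail(line);  // keep lines in first-seen order
        }
        in.close();
        System.out.println(unique.size() + " distinct lines");
    }
}
```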
Suppose we decided to implement List using a vector:
```java
public class VectList implements List {
    protected Vector listElts;

    public VectList() {
        listElts = new Vector();
    }
    ....
}
```

How expensive would each of the operations be (worst case) if the VectList contains n elements?
Some are easy. The following are O(1). Why?
```java
size(), isEmpty(), peek(), tailPeek(), removeFromTail()
```

Others take more thought:
```java
clear();                  // O(n) currently, because we reset all slots to null,
                          // but could be O(1)
addToHead(Object value);  // O(n) - must move contents
removeFromHead();         // O(n) - must move contents
contains(Object value);   // O(n) - must search
remove(Object value);     // O(n) - must search & move contents
```
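For instance, here is a sketch of how VectList might implement two of the O(n) operations, assuming a java.util.Vector-style API for listElts:

```java
// Inside class VectList: possible implementations of two O(n)
// operations. insertElementAt and removeElementAt shift all later
// elements by one position.
public void addToHead(Object value) {
    listElts.insertElementAt(value, 0);  // shifts all n existing elements: O(n)
}

public Object removeFromHead() {
    Object first = listElts.elementAt(0);
    listElts.removeElementAt(0);         // shifts the remaining n-1 elements: O(n)
    return first;
}
```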
The trickiest is the last:

```java
addToTail(Object value);
```

If the vector holding the values is large enough, then this is clearly O(1), but if the vector needs to grow it is O(n), since all n elements must be copied. If we use the doubling strategy then, as we saw, this is O(1) on average (the n - 1 copies are spread over n additions), but O(n) on average if we grow by a fixed amount.
All of the other operations have the same big-Oh complexity in the average case as in the worst case.