CS62 - Spring 2010 - Lecture 24

destructor exercise

a few things from last time
   - C++ does NOT have a root Object class
   - memory leaks in Java
      - Most commonly, when an object whose lifespan is longer than the lifespan of another object still keeps a reference to it
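      - Circular references (note: on their own these do NOT leak in Java, since the garbage collector traces reachability rather than counting references; they are a problem for reference-counting schemes):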
         Node n1 = new Node();
         Node n2 = new Node();
         n1.setNext(n2);
         n2.setNext(n1);

a few notes about the assignment
   - make sure to pass file streams by reference:
      readHelper(ifstream& in)
      writeHelper(ofstream& out)
   - I've given you code for clear and the big 3, you just need to move it over and use it (modulo a change or two to accommodate the empty tree)
   - if you're confused about height and depth, look in the notes for binary trees
   - "empty" in the code I gave you refers to the empty tree. Ideally, you only have one empty tree for all required empty trees. You can just make this a public variable and use it whenever you need an empty tree (e.g. in the constructors).
   - talk about BinaryTreeIO:: (and why NOT to use it :)
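
   On the first point above (passing streams by reference), here is a minimal sketch. The bodies of readHelper and writeHelper are hypothetical, as is the extra "values" parameter and the roundTrip function. The sketch takes istream&/ostream& (the base classes) so that it can be exercised with an in-memory stream; an ifstream or ofstream can be passed in the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical helper: writes each value on its own line.
// The stream is taken by reference; streams cannot be copied, so
// passing one by value would not compile.
void writeHelper(std::ostream& out, const std::vector<int>& values) {
    for (std::size_t i = 0; i < values.size(); i++) {
        out << values[i] << '\n';
    }
}

// Hypothetical helper: reads ints until the stream runs out.
// Because "in" is a reference, the caller's stream position advances.
std::vector<int> readHelper(std::istream& in) {
    std::vector<int> values;
    int x;
    while (in >> x) {
        values.push_back(x);
    }
    return values;
}

// Round trip through an in-memory stream (standing in for a file).
std::vector<int> roundTrip(const std::vector<int>& values) {
    std::stringstream buffer;
    writeHelper(buffer, values);
    return readHelper(buffer);
}
```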

other announcements
   - Sr. presentations today and tomorrow
   - Pre-registration pizza
   - TA reminder

random thing in C++ for the day :)
   - arrays in C++
      - arrays in C++ inherit most of their functionality (and weirdness) from C
      - my advice... just don't use them... use the vector class and call it a day
      - If for some reason you come across them...
         - int myArray[50];
            - allocates an array of ints with 50 elements in it
            - the [] have to come AFTER the variable name (unlike in Java where either before or after the variable is fine)
            - string myArray[50], etc.
         - What is an array in Java?
            - reference
            - how could you check this?
               - I've told you that Java always uses call-by-value
               - make an array
               - pass it to a method
               - change an array entry in the method
               - see if the change is seen outside the method
         - Creating a new array does the following:
            - allocates enough memory for the, say 50, objects/items
               - the "sizeof" operator returns the size in bytes of a type or object, if you're curious
            - the array name then acts as (decays to) a pointer to the beginning of that chunk of memory
               - when you access the array, say,
                  myArray[i]

               this is just shorthand for:
                  *(myArray+i)

                  - add i "int"s worth to the pointer myArray (remember, adding 1 to an int pointer advances it by sizeof(int) bytes, typically 4!)
                     - if it were a different type, for example:
                        IntCell myIntCellArray[50];
                     then each step would advance by sizeof(IntCell) bytes, which may differ from the size of an int
                  - give me the value referenced by this new pointer
               - For example, the following are equivalent:

               for( int i = 0; i < 50; i++ ){
                  myArray[i] = 0;
               }

               int* ptr;

               for( ptr = myArray; ptr < myArray+50; ptr++ ){
                  *ptr = 0;
               }

         - other differences
            - there is no .length member variable for arrays, and once an array has been passed to a function (where it is just a pointer) there is no way of telling how long it is, so you need to pass along the length
            - there is no bounds checking, so be careful with your array indices
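
   To make the array points above concrete, here is a small sketch. The function names (sumArray, sumVector, demoArraySum) are made up for illustration. It shows the length being passed alongside the pointer, the arr[i] / *(arr + i) equivalence, and the vector alternative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The array parameter is really a pointer, so the length must be passed
// separately; nothing here checks that n matches the real array.
int sumArray(const int* arr, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; i++) {
        total += *(arr + i);   // same element as arr[i]
    }
    return total;
}

// A vector carries its own length, and at() does bounds checking.
int sumVector(const std::vector<int>& v) {
    int total = 0;
    for (std::size_t i = 0; i < v.size(); i++) {
        total += v.at(i);
    }
    return total;
}

// Hypothetical demo: fills a 5-element array and sums it both ways.
int demoArraySum() {
    int myArray[5];
    for (int i = 0; i < 5; i++) {
        myArray[i] = i + 1;    // 1, 2, 3, 4, 5
    }
    // In the scope where the array is declared, sizeof still sees the
    // whole array, so the element count can be recovered here.
    std::size_t n = sizeof(myArray) / sizeof(myArray[0]);
    std::vector<int> v(myArray, myArray + n);   // copy via a pointer range
    assert(sumArray(myArray, n) == sumVector(v));
    return sumArray(myArray, n);
}
```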

   - iterators
      - what methods does an Iterator have in Java?
         - next()
         - hasNext()
         - remove() // optional
      - how are they used?
         - to traverse through a data set
      - In C++, iterators are implemented in a fashion similar to pointers
      - unlike Java, each class has its own iterator type
         - vector<int>::iterator
         - map<int, int>::iterator
         - map<int, list<pair<int,int> > >::iterator
      - Look at vector_iterator() method in iterator.cpp code
         - we start out at the beginning of our elements with the begin() method
            - returns an iterator at the beginning of the data to iterate through
         - we can access the elements we're iterating through via the iterator variable
            - notice that we're given a pointer to that object, so we can actually modify the object
            - "it" is like a pointer, though, so you need to dereference it or use -> if you want to call methods
         - incrementing the iterator is just like incrementing a pointer
            - it++
         - the end() method also returns an iterator that is past the end of the data
            - we use this to see if we've iterated through all of the data
         - just to make it clear, iterators are NOT pointers, but by using operator overloading, they're made to function like pointers
      - What does map_iterator() method do in iterator.cpp code ?
         - first, we create a map object
            - the key is an int
            - the value is a pair of ints
         - next, we add things to that object
            - the key is i
            - the value is (i, i)
         - finally, we make an iterator and traverse the data
            - again, each different type has a different iterator type
            - the iterator for a map returns a pointer to a key/value pair
            - it->first is the key
            - it->second is the value (which is itself a pair)
      - look at map_iterator(const ...) method in iterator.cpp code
         - often we want to pass objects by constant references
         - in this case, we can't use a normal iterator!
            - a normal iterator would allow us to modify the values
         - instead, we use a const_iterator
         - almost all classes that have iterators also implement const_iterator
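
   The three iterator cases above can be sketched as follows. This is not the code from iterator.cpp; the function names (doubleAll, buildMap, sumKeysAndValues) are invented for illustration.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Doubles every element in place; *it gives access to the element itself.
void doubleAll(std::vector<int>& v) {
    for (std::vector<int>::iterator it = v.begin(); it != v.end(); it++) {
        *it = *it * 2;
    }
}

// Builds a map where key i maps to the pair (i, i).
std::map<int, std::pair<int, int> > buildMap(int n) {
    std::map<int, std::pair<int, int> > m;
    for (int i = 0; i < n; i++) {
        m[i] = std::make_pair(i, i);
    }
    return m;
}

// The map is a const reference, so a const_iterator is required.
// it->first is the key; it->second is the value (here, a pair).
int sumKeysAndValues(const std::map<int, std::pair<int, int> >& m) {
    int total = 0;
    std::map<int, std::pair<int, int> >::const_iterator it;
    for (it = m.begin(); it != m.end(); ++it) {
        total += it->first + it->second.first + it->second.second;
    }
    return total;
}
```

   Swapping const_iterator for iterator in sumKeysAndValues should fail to compile, since begin() on a const map only hands back a const_iterator.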

graphs, a quick recap
   - A graph is a set of vertices (or nodes) V and a set of edges (u,v) in E where u, v are in V
   - a path is a list of vertices p_1, p_2, ..., p_k where, for each consecutive pair, there exists an edge (p_i, p_{i+1}) in E
      - a "simple" path is a path where all edges are unique

representing graphs
   - so far, we've drawn them on the board fine, but how are we going to store them for processing?
   - adjacency list
      - each vertex u in V contains a linked list of all the vertices v such that there exists an edge (u, v) in E, that is that there is an edge from u to v

      A: B->D
      B: A->D
      C: D
      D: A->B->C->E
      E: D

   - adjacency matrix
      - a |V| by |V| matrix A, such that A_ij is 1 if edge (i, j) is in E, 0 otherwise

        A B C D E
      A 0 1 0 1 0
      B 1 0 0 1 0
      C 0 0 0 1 0
      D 1 1 1 0 1
      E 0 0 0 1 0

      - what will this matrix look like if the graph is undirected?
         - it will be symmetric
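
   As a rough sketch of the two representations, the code below builds both for the A..E example graph, with the vertices numbered 0..4 (an assumption made here so they can index a vector). The type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

typedef std::vector<std::list<int> > AdjList;
typedef std::vector<std::vector<int> > AdjMatrix;

// Builds both representations of an undirected graph from an edge list.
void buildGraph(int numVertices,
                const std::vector<std::pair<int, int> >& edges,
                AdjList& adjList, AdjMatrix& adjMatrix) {
    adjList.assign(numVertices, std::list<int>());
    adjMatrix.assign(numVertices, std::vector<int>(numVertices, 0));
    for (std::size_t k = 0; k < edges.size(); k++) {
        int u = edges[k].first;
        int v = edges[k].second;
        adjList[u].push_back(v);   // undirected: record both directions
        adjList[v].push_back(u);
        adjMatrix[u][v] = 1;
        adjMatrix[v][u] = 1;
    }
}

// True if A_ij == A_ji for every i, j.
bool isSymmetric(const AdjMatrix& m) {
    for (std::size_t i = 0; i < m.size(); i++) {
        for (std::size_t j = 0; j < m.size(); j++) {
            if (m[i][j] != m[j][i]) return false;
        }
    }
    return true;
}

// The example graph above with A..E numbered 0..4.
std::vector<std::pair<int, int> > exampleEdges() {
    std::vector<std::pair<int, int> > e;
    e.push_back(std::make_pair(0, 1));  // A-B
    e.push_back(std::make_pair(0, 3));  // A-D
    e.push_back(std::make_pair(1, 3));  // B-D
    e.push_back(std::make_pair(2, 3));  // C-D
    e.push_back(std::make_pair(3, 4));  // D-E
    return e;
}
```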
   - examples:
      - draw the following graphs
         --
         A: B C
         B: A C
         C: A B
         --
         A: D
         B: D E
         C: D B
         D: A B C D E
         E: B C
         --
           A B C
         A 1 0 0
         B 0 0 1
         C 0 1 0
   - how would we incorporate weights into both of these approaches?
      - adjacency list: just keep that additional piece of information in the linked list
      - adjacency matrix: store that value in the matrix (instead of just a 0 or a 1)
   - What are the benefits/drawbacks of each approach and when might each be useful?
      - adjacency list
         - good for sparse graphs
         - more space efficient (for sparse graphs)
         - must traverse the adjacency list to discover if an edge exists
      - adjacency matrix
         - good for dense graphs
         - constant time lookup to discover if an edge exists
         - for non-weighted graphs, only requires a boolean matrix
   - Can we get the best of both worlds (constant lookup, good sparse representation)?
   - sparse adjacency matrix
      - rather than storing adjacent vertices as a linked list, store as a hashtable
      - benefits/drawbacks?
         - constant time lookup
         - fairly space efficient (though some overhead with keeping the table)
         - not good for dense graphs
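
   One possible way to sketch the hashtable idea, assuming a C++11 compiler (for unordered_map and unordered_set). The names SparseGraph, addEdge and hasEdge are made up for this example.

```cpp
#include <cassert>
#include <unordered_map>
#include <unordered_set>

// Each vertex maps to a hash set of its neighbors.
typedef std::unordered_map<int, std::unordered_set<int> > SparseGraph;

// Adds an undirected edge by recording it under both endpoints.
void addEdge(SparseGraph& g, int u, int v) {
    g[u].insert(v);
    g[v].insert(u);
}

// Expected constant-time lookup; only edges that exist take up space.
bool hasEdge(const SparseGraph& g, int u, int v) {
    SparseGraph::const_iterator it = g.find(u);
    return it != g.end() && it->second.count(v) > 0;
}
```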

finding cycles
   - given a connected graph, how can we determine if it has a cycle in it?
      - or, given a connected graph, determine that it is not a tree
   - what is the definition of a cycle?
      - a simple path, where the endpoints are the same
   - idea:
      - start at a node, go down a path
         - stop when either we find a vertex on the path that we've already seen
         - or when we hit a dead-end
      - if we hit a dead-end, backtrack and find another path
      - if we visit all of the nodes, without finding a repeat vertex, it's acyclic
   - does this sound like anything we've seen before?
      - depth first search!

   void dfs(vertex u, visited) {
      if(!visited(u)){
         visited.add(u);

         for (v: neighbors of u){
            if (!visited(v)){
               dfs(v, visited);
            }
         }
      }
   }
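
   A runnable C++ version of the pseudocode above might look like the following sketch. It assumes vertices are numbered 0, 1, 2, ... and stored as a vector of neighbor vectors; the extra "order" parameter is added here only so that the visit order can be inspected.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

typedef std::vector<std::vector<int> > Graph;  // graph[u] = neighbors of u

// Visits every vertex reachable from u, appending each to "order"
// the first time it is seen.
void dfs(const Graph& graph, int u, std::set<int>& visited,
         std::vector<int>& order) {
    if (visited.count(u) > 0) return;
    visited.insert(u);
    order.push_back(u);
    for (std::size_t k = 0; k < graph[u].size(); k++) {
        int v = graph[u][k];
        if (visited.count(v) == 0) {
            dfs(graph, v, visited, order);
        }
    }
}
```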

   - what modifications need to be made?
      - if we visit a node that we've already visited, then we've found a cycle
      - what about where we just came from?
         - need to know where we came from so we can avoid calling that a cycle
      - want to return true if we find a cycle, false otherwise

   bool dfsCycle(vertex u, vertex parent, visited) {
      bool result = false;
      visited.add(u);

      for(v: neighbors of u){
         if(!visited(v)){
            result = result || dfsCycle(v, u, visited);
         } else if (v != parent){
            result = true;
         }
      }

      return result;
   }

   - observations:
      - what does it do?
         - runs depth first search
         - if it finds a visited node that was not its parent (i.e. a cycle) returns true
         - otherwise, false
      - how is this different from DFS that we saw before?
         - we have the additional else if to see if we've found a cycle
         - why do we need the parent as a parameter?
            - so we can distinguish finding a visited node in a cycle vs. a visited node where we just came from
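      - what does "result = result || dfsCycle(...)" do?
         - ORs the result of the recursive call into result (some languages abbreviate this as "||=", but C++ has no such operator)
         - so result ends up true if we find a cycle anywhere
         - note that || short-circuits: once result is true, the remaining recursive calls are skipped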
   - walk through an example
   - let's try and actually implement our boolean cycle detector
      - how can we represent a vertex?
         - simplest is just use a number, i.e. an int
      - we'll use an adjacency list representation
         - in C++ there is a "list" class in the STL
      - we have a few options for declaring the graph type:
         - what if we wanted to use a vector to store the vertices?
            - vector<list<int> > graph
            - what is one downside to this approach?
               - assumes the vertices are sequential, that is 0, 1, 2, ...
         - what is another option?
            - map<int, list<int> >
         - what if we wanted to add weights?
            - map<int, list<pair<int, int> > >
      - look at dfs_hasCycles in graph_algorithms.cpp code
         - what does "list<int> nbrList = adjMap.find(v)->second" do?
            - gets the adjacency list associated with vertex v
            - why the "->second"?
               - recall, the map iterator returns a pair
               - "->first" would give us the key (in this case, just v)
            - why can't we write "list<int> nbrList = adjMap[v]"?
               - the operator[] is not a const method, so it can't be called on a const map
               - it can be used to change the map (and it inserts a default value if v is not already a key)
                  adjMap[v] = ...
         - use an iterator to iterate through the neighbor list
         - what does "visited.find(v) != visited.end()" do?
            - checks to see if v is in the visited set
            - another option:
               visited.count(v) > 0
         - note the recursive call to dfs_hasCycles
      - look at graph_hasCycles in graph_algorithms.cpp code
         - takes a graph
            - why passed by reference?
               - to avoid copying
         - the "set" class is useful for keeping track of which nodes we've visited
         - why do we have the for loop?
            - graphs may not be connected!
            - still could have a cycle though
            - need to make sure we've visited all possible sections of the graph when looking for cycles
         - notice the use of a const_iterator (and the type of that iterator)
      - you will be implementing a version of this where you actually return the cycle
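
   Pulling the pieces above together, here is a minimal sketch of a cycle detector for undirected graphs. This is not the code in graph_algorithms.cpp: the names dfsHasCycle, graphHasCycle and addEdge are stand-ins. It assumes vertex ids are non-negative (so -1 can mean "no parent"), that every neighbor also appears as a key in the map, and that there are no duplicate edges. Unlike the pseudocode, it returns as soon as a cycle is found instead of accumulating a result.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <set>

typedef std::map<int, std::list<int> > AdjMap;

// DFS from u; returns true if a visited vertex other than the parent
// is reached. Assumes every neighbor also appears as a key in adjMap.
bool dfsHasCycle(const AdjMap& adjMap, int u, int parent,
                 std::set<int>& visited) {
    visited.insert(u);
    // operator[] is not const, so look the list up with find().
    const std::list<int>& nbrList = adjMap.find(u)->second;
    for (std::list<int>::const_iterator it = nbrList.begin();
         it != nbrList.end(); ++it) {
        int v = *it;
        if (visited.find(v) == visited.end()) {
            if (dfsHasCycle(adjMap, v, u, visited)) return true;
        } else if (v != parent) {
            return true;
        }
    }
    return false;
}

// Starts a DFS from every unvisited vertex so that each connected
// component is checked.
bool graphHasCycle(const AdjMap& adjMap) {
    std::set<int> visited;
    for (AdjMap::const_iterator it = adjMap.begin();
         it != adjMap.end(); ++it) {
        if (visited.count(it->first) == 0 &&
            dfsHasCycle(adjMap, it->first, -1, visited)) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: adds an undirected edge, creating keys as needed.
void addEdge(AdjMap& g, int u, int v) {
    g[u].push_back(v);
    g[v].push_back(u);
}
```

   Binding nbrList as a const reference avoids copying the list on every call; writing "list<int> nbrList = ..." works too but makes a copy.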
   - running time
      - how many times do we call dfsCycle on each vertex?
         - exactly once for a connected graph
         - the first thing we do is set visited to true for that vertex
         - and will never revisit a visited vertex
      - what is the cost of each call to dfsCycle?
         - depends on the representation
      - adjacency matrix:
         - we need to traverse all V entries to get the neighbors
         - O(|V|^2) overall
      - adjacency list:
         - a little trickier
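         - how many times do we process each edge?
            - once from each endpoint (an undirected edge (u, v) appears in both u's list and v's list), so a constant number of times
         - O(|V| + |E|), which for a connected graph is O(|E|) since |E| >= |V| - 1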
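
   To see the two running times side by side, the sketch below instruments a full DFS (no early exit) and counts how many neighbor entries are examined. Everything here is hypothetical illustration code: the test graph is the path 0-1-2-...-(n-1), which has |V| = n and |E| = n - 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Full DFS over an adjacency list; "work" counts neighbor entries examined.
void dfsList(const std::vector<std::vector<int> >& adj, int u,
             std::vector<bool>& visited, long& work) {
    visited[u] = true;
    for (std::size_t k = 0; k < adj[u].size(); k++) {
        work++;
        if (!visited[adj[u][k]]) dfsList(adj, adj[u][k], visited, work);
    }
}

// Same traversal over an adjacency matrix; every call scans a full row.
void dfsMatrix(const std::vector<std::vector<int> >& m, int u,
               std::vector<bool>& visited, long& work) {
    visited[u] = true;
    for (int v = 0; v < static_cast<int>(m.size()); v++) {
        work++;
        if (m[u][v] == 1 && !visited[v]) dfsMatrix(m, v, visited, work);
    }
}

// Entries examined on the n-vertex path graph, adjacency list version.
long listWorkOnPath(int n) {
    std::vector<std::vector<int> > adj(n);
    for (int i = 0; i + 1 < n; i++) {
        adj[i].push_back(i + 1);
        adj[i + 1].push_back(i);
    }
    std::vector<bool> visited(n, false);
    long work = 0;
    dfsList(adj, 0, visited, work);
    return work;
}

// Entries examined on the n-vertex path graph, adjacency matrix version.
long matrixWorkOnPath(int n) {
    std::vector<std::vector<int> > m(n, std::vector<int>(n, 0));
    for (int i = 0; i + 1 < n; i++) {
        m[i][i + 1] = 1;
        m[i + 1][i] = 1;
    }
    std::vector<bool> visited(n, false);
    long work = 0;
    dfsMatrix(m, 0, visited, work);
    return work;
}
```

   For n = 5 the list version examines 2|E| = 8 entries while the matrix version examines |V|^2 = 25.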