CS150 - Fall 2012

CS150 - Fall 2012 - Class 13

admin
   - CS lunch Thursday
      - Ross dining hall at 12:20pm
      - or meet up with us at MBH 632 at 12:15pm and walk over

   - Friday lab
      - partnered again
         - can work on the lab prep together if you'd like
         - both people must be there when working on it
         - work on one computer
         - change occasionally who is coding
         - E-mail me if you're looking for a partner
      - I won't be there, but the tutors will be and you still should be :)

   - Tour of Middlebury's server (and related) facilities
      - Tuesday, October 30 at 3pm
      - Warner Hemicycle

classes
   - a "class" is the blueprint describing what data and methods an object will have
   - an object is an instance of a class
      - for example, we could define a class people
         - people have attributes
         - people will have methods
         - when we define a particular person, it is an object that is an instance of the class people
   - classes define types
      - in Python, since all things are objects, every value is an instance of some class
      - though in other languages, you could have a type that is not defined by a class
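
   - a minimal sketch of the people example above; the class name Person, its attributes (name, age) and its method have_birthday are hypothetical, chosen only for illustration (writing our own classes comes later)

```python
# Hypothetical example class: every name here is an illustrative assumption.
class Person:
    def __init__(self, name, age):
        # attributes: the data each Person object stores
        self.name = name
        self.age = age

    def have_birthday(self):
        # method: behavior that every Person object has
        self.age += 1


# alice is an object: one particular instance of the class Person
alice = Person("Alice", 19)
alice.have_birthday()

print(alice.age)                   # 20
print(isinstance(alice, Person))   # True
print(type(3) is int)              # True: built-in values are instances of classes too
print(type([1, 2]) is list)        # True
```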

   - since everything we've seen is an object, all the types we've seen (int, str, list, ...) are classes
      - for any class, we can type help(class_name) to get information about the class (methods, etc.)

      >>> help(int)
      >>> help(list)

   - by the end of this class, you're going to be able to understand almost all of the information that comes back from calling help

what is a data structure?
   - a way of storing and organizing data
   - no free lunch:
      - different data structures are optimized to make different operations better (e.g. faster or more memory efficient)
      - there is no single best data structure
      - depending on the application, you will have to decide how to store your data

sets
   - what is a set, i.e. a set of data points?
      - an unordered collection of data

      - how does this differ from a list?
         - a list has a sequential order to it

   - what operations/methods might we want from a set?
      - create new/construct
      - add things to the set
      - ask if something belongs in the set
      - intersect
      - union
      - remove things from the set

set class
   >>> help(set)

   - the first thing we see is how to create new sets
   - these are called the constructors for a class
      - they define how we "construct" (or create) new objects (instances of that class)
      - we can construct a new set using a constructor

         >>> s = set()
         >>> s
         set([])
         >>> s = set([4, 3, 2, 1])
         >>> s
         set([1, 2, 3, 4])
         >>> s = set("abcd")
         >>> s
         set(['a', 'c', 'b', 'd'])

      - notice that there were two constructors
         - the empty constructor (set()), which created an empty set
         - and a constructor that took a single parameter
            - a list
            - a string
            - in general, anything that we can iterate over in a for loop (we'll get back to this later)
      - when we print out the value of s it explicitly states that it is a set
         "set([1, 2, 3, 4])"
      - notice that even though we may give it something that has an ordering, the ordering is NOT preserved
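
      - a small sketch of the constructor behavior described above; it compares sets rather than printing them, since the printed form of a set differs between Python versions, and it also shows that duplicate items are stored only once

```python
# Anything we can iterate over can be passed to the set constructor.
from_list = set([4, 3, 2, 1])
from_string = set("abcd")
from_range = set(range(5))

# Two sets are equal when they hold the same items, regardless of the
# order in which the items were supplied.
print(from_list == set([1, 2, 3, 4]))       # True
print(len(from_string))                     # 4
print(from_range == set([0, 1, 2, 3, 4]))   # True

# Duplicates are stored only once.
letters = set("hello")
print(len(letters))                         # 4 (the two l's collapse into one)
```
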
   - we've used constructors before
      >>> s = str(10)
      >>> x = int("1234")

      - notice these constructors took in an object and then created a new int/string

      - every class of objects has a constructor
      - some other ones may be useful down the road...
         >>> list("abcd")
         ['a', 'b', 'c', 'd']

   - some of the more common classes like int, float, string, list, etc. have special syntax (sometimes called "syntactic sugar") for creating the objects in a special way
         >>> 10
         10
         >>> [1, 2, 3, 4]
         [1, 2, 3, 4]
         >>> "abcd"
         'abcd'

      - but the same objects can also be created with explicit constructor calls
         >>> int(10)
         10
         >>> list([1, 2, 3])
         [1, 2, 3]
         >>> str("abcd")
         'abcd'

   - set methods
      - class methods can be broken down into two types of methods
         - mutator methods that change the underlying object
         - accessor methods that do NOT change the underlying object, but ask some question about the data and give us some information back
      - from the help output, which of the following are mutator vs. accessor?
         - add
         - clear
         - difference
         - difference_update
         - intersection
         - intersection_update
         - ...
      - mutators: add, clear, difference_update, intersection_update
         - all of these will change the object
      - accessors: difference, intersection
         - these will NOT change the object
      - other interesting methods
         - pop
         - remove
         - isdisjoint
         - issubset
         - issuperset
         - union
         - update
      - supports most of the methods you'd want for a set
         >>> s = set([1,2,3,4])
         >>> s.add(5)
         >>> s
         set([1, 2, 3, 4, 5])
         >>> s2 = set([4, 5, 6, 7])
         >>> s2
         set([4, 5, 6, 7])
         >>> s.difference(s2)
         set([1, 2, 3])
         >>> s
         set([1, 2, 3, 4, 5])
         >>> s2
         set([4, 5, 6, 7])
         >>> s.union(s2)
         set([1, 2, 3, 4, 5, 6, 7])
         >>> s.intersection(s2)
         set([4, 5])
         >>> s
         set([1, 2, 3, 4, 5])
         >>> s2
         set([4, 5, 6, 7])
         >>> s.intersection_update(s2)
         >>> s
         set([4, 5])
         >>> s2
         set([4, 5, 6, 7])
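
      - the "other interesting methods" listed above can be tried the same way; a short sketch that starts from fresh copies of s and s2 (the comments about return values and errors describe standard set behavior, not something taken from the session above)

```python
s = set([1, 2, 3, 4, 5])
s2 = set([4, 5, 6, 7])

# Accessors: answer a question and leave s unchanged.
print(s.isdisjoint(s2))               # False: 4 and 5 are in both sets
print(set([4, 5]).issubset(s))        # True
print(s.issuperset(set([1, 2])))      # True

# Mutators: change s in place and return None.
s.remove(1)      # raises KeyError if 1 is not in the set
s.update(s2)     # adds every item of s2 to s (an in-place union)
print(s == set([2, 3, 4, 5, 6, 7]))   # True

# pop removes and returns an arbitrary item; which one is not specified.
item = s.pop()
print(item in s)                      # False
print(len(s))                         # 5
```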

      - we can also ask if an item is in a set
         >>> 1 in s2
         False
         >>> 5 in s2
         True
         >>> "abc" in s2
         False
         >>> s2 in s2
         False

      - notice that you CANNOT index into a set (there is no order)
         >>> s[0]
         Traceback (most recent call last):
          File "<string>", line 1, in <fragment>
         TypeError: 'set' object does not support indexing
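
      - although indexing fails, we can still visit every item; a minimal sketch using a for loop and the built-in sorted function (the example values are arbitrary)

```python
s = set([30, 10, 20])

# s[0] would raise a TypeError, but a for loop visits every item
# (in no guaranteed order).
total = 0
for item in s:
    total += item
print(total)         # 60

# sorted() builds a new list from the set's items, and a list can be indexed.
ordered = sorted(s)
print(ordered)       # [10, 20, 30]
print(ordered[0])    # 10
```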

why sets?
   - seems like we could do all of these things and more with lists?
      - lists have operations analogous to the ones sets have (append, pop, index)
      - some nice operations like union and intersection, but we could put these in the list class
      - in fact, lists also support the "in" notation
         >>> some_list = [1, 2, 3, 4]
         >>> 4 in some_list
         True
         >>> "abc" in some_list
         False

   - why have the separate class for set?
      - performance!

   - write the following function:
      - contains(list, item)
         - returns True if the item is in the list
         - False otherwise
         - don't use the "in" membership test or the list "index" method

      def contains(list, item):
         for thing in list:
            if thing == item:
               return True

         return False

      - If we're searching for an item and we double the size of the list, how much longer (on average) do you think it would take to run this function?
         - twice as long
         - we're looping through each item in the list
         - computers are fast, but there still is a cost to each operation
      - what if we quadrupled the size of the list?
         - four times as long
      - the contains function above is called a "linear" runtime function
         - its runtime grows linearly with the size of the input
      - can we do better than linear for finding an item?
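
      - one way to see the linear behavior without a stopwatch is to count comparisons; a sketch in which count_comparisons is a hypothetical variant of contains that reports how many == tests it made (searching for a missing item is the worst case)

```python
def count_comparisons(data, item):
    """Same loop as contains, but returns how many == tests were made."""
    comparisons = 0
    for thing in data:
        comparisons += 1
        if thing == item:
            break
    return comparisons


# Searching for an item that is absent forces a look at every element.
small = list(range(1000))
large = list(range(2000))   # twice as many items

print(count_comparisons(small, -1))   # 1000
print(count_comparisons(large, -1))   # 2000
```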

look at lists_vs_sets.py code
   - two functions for generating data
      - generate_set: generates random points and puts them into a set
      - generate_list: generates random points and puts them into a list
   - query_data
      - generates num_queries random numbers
      - uses "in" to see if they are in the data set
      - times how long it takes to do num_queries
   - speed_test
      - generates equal sized data sets in both list and set form
      - then calls query_data to see how long it takes to query each one

         >>> speed_test(1000, 100)
         List creation took 0.003422 seconds
         Set creation took 0.003589 seconds
         --
         List querying took 0.002917 seconds
         Set querying took 0.000194 seconds

      - for small sizes, both are fast in absolute terms, though the set already answers queries faster
      - as we increase the size of the set and the number of queries, however, we start to see some differences

         >>> speed_test(10000, 100)
         List creation took 0.023313 seconds
         Set creation took 0.021885 seconds
         --
         List querying took 0.021288 seconds
         Set querying took 0.000179 seconds

         >>> speed_test(10000, 1000)
         List creation took 0.020332 seconds
         Set creation took 0.021198 seconds
         --
         List querying took 0.213577 seconds
         Set querying took 0.001833 seconds

         >>> speed_test(100000, 1000)
         List creation took 0.186876 seconds
         Set creation took 0.220910 seconds
         --
         List querying took 2.148366 seconds
         Set querying took 0.001881 seconds

      - we can better understand these by generating points as we increase the size of the set/list and then plotting them
         >>> speed_data(1000, 10000, 100000, 5000)
         size   list   set
         10000   0.237790   0.001881
         15000   0.358325   0.001999
         20000   0.469743   0.001956
         25000   0.602107   0.001916
         30000   0.687776   0.001889
         35000   0.824027   0.001903
         40000   0.921235   0.001952
         45000   1.009843   0.001912
         50000   1.156059   0.001927
         55000   1.386080   0.001913
         60000   1.566058   0.001984
         65000   1.722870   0.001936
         70000   2.025138   0.001966
         75000   2.363384   0.001962
         80000   2.619580   0.002030
         85000   2.897005   0.002054
         90000   2.975576   0.001946
         95000   3.418256   0.002082

      - we can copy and paste this in Excel and plot it
         - we'll look later at how to plot within Python
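
   - lists_vs_sets.py itself is not reproduced in these notes; the following is only a sketch of what the functions described above might look like. The function names and the speed_test(size, num_queries) call come from the notes, while MAX_VALUE, the exact parameter names, and the use of time.time are assumptions

```python
import random
import time

MAX_VALUE = 1000000   # assumed range for the random numbers


def generate_list(size):
    """Return a list of size random integers."""
    data = []
    for _ in range(size):
        data.append(random.randint(0, MAX_VALUE))
    return data


def generate_set(size):
    """Return a set built from size random integers (duplicates collapse)."""
    data = set()
    for _ in range(size):
        data.add(random.randint(0, MAX_VALUE))
    return data


def query_data(data, num_queries):
    """Return the seconds taken by num_queries random membership tests."""
    start = time.time()
    for _ in range(num_queries):
        found = random.randint(0, MAX_VALUE) in data
    return time.time() - start


def speed_test(size, num_queries):
    """Print creation and query times for a list and a set of the same size."""
    start = time.time()
    list_data = generate_list(size)
    print("List creation took %f seconds" % (time.time() - start))

    start = time.time()
    set_data = generate_set(size)
    print("Set creation took %f seconds" % (time.time() - start))

    print("--")
    print("List querying took %f seconds" % query_data(list_data, num_queries))
    print("Set querying took %f seconds" % query_data(set_data, num_queries))


speed_test(1000, 100)
```

   - the timings printed will vary from machine to machine and run to run, so they will not match the numbers above exactly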

when to use a set vs. a list?
   - lists have an ordering
      - if you need indexing, use a list
   - sets are faster for membership tests
      - if you don't care about the order, use a set!
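
   - the two can also be combined; a sketch of a hypothetical helper, unique_in_order, that uses a set for the fast membership checks and a list to keep the ordering

```python
def unique_in_order(items):
    """Return the distinct items in the order they were first seen."""
    seen = set()    # fast "have we seen this already?" checks
    result = []     # keeps the ordering
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(unique_in_order([3, 1, 3, 2, 1]))   # [3, 1, 2]
```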

midterm
   - Good job!
   - high of 51
   - average 47
   - median 48