CS150 - Fall 2012 - Class 13

  • admin
       - CS lunch Thursday
          - Ross dining hall at 12:20pm
          - or meet up with us at MBH 632 at 12:15pm and walk over

       - Friday lab
          - partnered again
             - can work on the lab prep together if you'd like
             - both people must be there when working on it
             - work on one computer
             - change occasionally who is coding
             - E-mail me if you're looking for a partner
          - I won't be there, but the tutors will be and you still should be :)
       
       - Tour of Middlebury server, etc. facilities
          - Tuesday, October 30 at 3pm
          - Warner Hemicycle

  • classes
       - a "class" is the blueprint describing what data and methods an object will have
       - an object is an instance of a class
          - for example, we could define a class for people
             - people have attributes (data)
             - people have methods (behavior)
             - when we create a particular person, that person is an object: an instance of the people class
       - classes define types
          - in Python, everything is an object, so every value is an instance of some class
          - in other languages, though, there can be types that are not defined by a class (e.g. Java's primitive int)

       - since everything we've seen is an object, every type we've seen (int, float, str, list, ...) is defined by a class
          - for any class, we can type help(class_name) to get information about the class (methods, etc.)

          >>> help(int)
          >>> help(list)

       - by the end of this class, you're going to be able to understand almost all of the information that comes back from calling help
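To make the "people" example concrete, here is a minimal sketch of such a class, written for Python 3. The class name `Person`, its attributes `name` and `age`, and the methods `greet` and `have_birthday` are all hypothetical, chosen only for illustration. It also shows that built-in values like 10 and [1, 2] are instances of classes in exactly the same sense.

```python
class Person(object):
    """Hypothetical blueprint: the data and methods every person object has."""

    def __init__(self, name, age):
        # attributes: data stored separately in each object
        self.name = name
        self.age = age

    def greet(self):
        # accessor-style method: answers a question, changes nothing
        return "Hi, I'm " + self.name

    def have_birthday(self):
        # mutator-style method: changes the object's data
        self.age += 1


# two different objects, both instances of the same class
alice = Person("Alice", 20)
bob = Person("Bob", 19)

print(alice.greet())               # Hi, I'm Alice
print(isinstance(alice, Person))   # True
print(type(bob) is Person)         # True

# built-in values are instances of classes too
print(isinstance(10, int))         # True
print(type([1, 2]) is list)        # True
```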

  • what is a data structure?
       - a way of storing and organizing data
       - no free lunch:
          - different data structures are optimized to make different operations better (i.e. faster, more memory efficient, etc.)
          - there is no single best data structure
          - depending on the application, you will have to decide how to store your data

  • sets
       - what is a set, i.e. a set of data points?
          - an unordered collection of data

          - how does this differ from a list?
             - a list has a sequential order to it

       - what operations/methods might we want from a set?
          - create new/construct
          - add things to the set
          - ask if something belongs in the set
          - intersect
          - union
          - remove things from the set

  • set class
       >>> help(set)
          
       - the first thing we see is how to create new sets
       - these are called the constructors for a class
          - they define how we "construct" (or create) new objects (instances of that class)
          - we can construct a new set using a constructor

             >>> s = set()
             >>> s
             set([])
             >>> s = set([4, 3, 2, 1])
             >>> s
             set([1, 2, 3, 4])
             >>> s = set("abcd")
             >>> s
             set(['a', 'c', 'b', 'd'])

          - notice that there were two constructors
             - the empty constructor (set()), which created an empty set
             - and a constructor that took a single parameter
                - a list
                - a string
                - in general, anything that we can iterate over in a for loop (we'll get back to this later)
          - when we print out the value of s it explicitly states that it is a set
             "set([1, 2, 3, 4])"
          - notice that even though we may give it something with an ordering, the ordering is NOT preserved
       - we've used constructors before
          >>> s = str(10)
          >>> x = int("1234")

          - notice these constructors took in an object and then created a new int/string

          - every class of objects has a constructor
          - some other ones may be useful down the road...
             >>> list("abcd")
             ['a', 'b', 'c', 'd']

       - some of the more common classes like int, float, string, list, etc. have special syntax (sometimes called "syntactic sugar") for creating the objects in a special way
             >>> 10
             10
             >>> [1, 2, 3, 4]
             [1, 2, 3, 4]
             >>> "abcd"
             'abcd'

          - but they build exactly the same objects that the constructors would
             >>> int(10)
             10
             >>> list([1, 2, 3, 4])
             [1, 2, 3, 4]
             >>> str("abcd")
             'abcd'
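A short sketch of the constructors above, written for Python 3 (where sets print as {1, 2, 3} instead of set([1, 2, 3]), so the output below uses sorted() to get a stable order). It also shows that the set constructor drops duplicates. In Python 2.7 and 3, {1, 2, 3} is literal syntax for a set too, but {} is an empty dict, so an empty set must be written set().

```python
# constructing sets from different iterables
empty = set()                 # empty constructor
from_list = set([4, 3, 2, 1])
from_string = set("abcd")     # one element per character
from_range = set(range(5))    # any iterable works

# duplicates are dropped and the input order is not kept
dupes = set([3, 1, 3, 2, 1])
print(len(dupes))             # 3
print(sorted(dupes))          # [1, 2, 3]

# constructors for other built-in classes
print(str(10) + "!")          # 10!
print(int("1234") + 1)        # 1235
print(list("abcd"))           # ['a', 'b', 'c', 'd']

# literal syntax builds the same objects as the constructor calls
print([1, 2, 3, 4] == list([1, 2, 3, 4]))   # True
print("abcd" == str("abcd"))                 # True
```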

       - set methods
          - a class's methods can be broken down into two types
             - mutator methods that change the underlying object
             - accessor methods that do NOT change the underlying object, but ask some question about the data and give us some information back
          - from the help output, which of the following are mutator vs. accessor?
             - add
             - clear
             - difference
             - difference_update
             - intersection
             - intersection_update
             - ...
          - mutators: add, clear, difference_update, intersection_update
             - all of these will change the object
          - accessor: difference, intersection
             - these will NOT change the object
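One quick way to tell the two kinds apart at the interpreter: the accessors return a new set and leave the original alone, while these mutators change the set in place and return None. A minimal sketch (Python 3; sorted() is used only to make the printed order predictable):

```python
s = set([1, 2, 3, 4, 5])
s2 = set([4, 5, 6, 7])

# accessor: returns a NEW set and leaves s unchanged
diff = s.difference(s2)
print(sorted(diff))   # [1, 2, 3]
print(sorted(s))      # [1, 2, 3, 4, 5]

# mutator: changes s in place and returns None
result = s.intersection_update(s2)
print(result)         # None
print(sorted(s))      # [4, 5]
```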
          - other interesting methods
             - pop
             - remove
             - isdisjoint
             - issubset
             - issuperset
             - union
             - update
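A small sketch of the other methods listed above (Python 3). Note that remove raises a KeyError if the item is missing (discard is the variant that silently does nothing), and pop removes and returns an arbitrary element, since a set has no "last" item.

```python
s = set([1, 2, 3])

s.remove(2)                # mutator: KeyError if the item is missing
print(sorted(s))           # [1, 3]

s.update([3, 4, 5])        # mutator: add every item from an iterable
print(sorted(s))           # [1, 3, 4, 5]

item = s.pop()             # mutator: removes and returns an arbitrary item
print(item in s)           # False

evens = set([2, 4, 6])
print(set([4]).issubset(evens))        # True
print(evens.issuperset(set([2, 6])))   # True
print(evens.isdisjoint(set([1, 3])))   # True

try:
    evens.remove(5)
except KeyError:
    print("5 was not in the set")
```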
          - supports most of the methods you'd want for a set
             >>> s = set([1,2,3,4])
             >>> s.add(5)
             >>> s
             set([1, 2, 3, 4, 5])
             >>> s2 = set([4, 5, 6, 7])
             >>> s2
             set([4, 5, 6, 7])
             >>> s.difference(s2)
             set([1, 2, 3])
             >>> s
             set([1, 2, 3, 4, 5])
             >>> s2
             set([4, 5, 6, 7])
             >>> s.union(s2)
             set([1, 2, 3, 4, 5, 6, 7])
             >>> s.intersection(s2)
             set([4, 5])
             >>> s
             set([1, 2, 3, 4, 5])
             >>> s2
             set([4, 5, 6, 7])
             >>> s.intersection_update(s2)
             >>> s
             set([4, 5])
             >>> s2
             set([4, 5, 6, 7])

          - we can also ask if an item is in a set
             >>> 1 in s2
             False
             >>> 5 in s2
             True
             >>> "abc" in s2
             False
             >>> s2 in s2
             False

          - notice that you CANNOT index into a set (there is no order)
             >>> s[0]
             Traceback (most recent call last):
              File "<string>", line 1, in <fragment>
             TypeError: 'set' object does not support indexing   
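Since a set has no positions, the usual ways to look at its contents are "in", a for loop, or converting it to a sorted list when an order is needed. A small sketch for Python 3, where the indexing error message reads "'set' object is not subscriptable" instead of the Python 2 message shown above:

```python
s2 = set([4, 5, 6, 7])

print(5 in s2)        # True
print("abc" in s2)    # False

# iterating works, but the visiting order is not guaranteed
total = 0
for x in s2:
    total += x
print(total)          # 22

# if you need positions, make an ordered copy first
ordered = sorted(s2)
print(ordered[0])     # 4

try:
    s2[0]
except TypeError as err:
    print("TypeError:", err)
```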

  • why sets?
       - seems like we could do all of these things and more with lists?
          - lists have equivalents of most set operations (append instead of add, pop, remove, index to find an item)
          - sets do add some nice operations like union and intersection, but those could have been added to the list class too
          - in fact, lists also support the "in" notation
             >>> some_list = [1, 2, 3, 4]
             >>> 4 in some_list
             True
             >>> "abc" in some_list
             False

       - why have the separate class for set?
          - performance!

       - write the following function:
          - contains(list, item)
             - returns True if the item is in the list
             - False otherwise
             - don't use "in" or "index"

          def contains(list, item):
             for thing in list:
                if thing == item:
                   return True
             
             return False

          - If we're searching for an item and we double the size of the list, how much longer (on average) do you think it would take to run this function?
             - twice as long
             - we're looping through each item in the list
             - computers are fast, but there still is a cost to each operation
          - what if we quadrupled the size of the list?
             - four times as long
          - the contains function above is called a "linear" runtime function
             - its runtime varies linearly with respect to the input
          - can we do better than linear for finding an item?
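One way to check the "twice as long" reasoning without a stopwatch is to count how many comparisons the search makes. The sketch below is a hypothetical instrumented variant of contains, called contains_count (the parameter is named lst to avoid shadowing the built-in list); it returns both the answer and the number of == checks. For an item that is not present, the count equals the list length, so doubling the list doubles the work.

```python
def contains_count(lst, item):
    """Linear search that also reports how many comparisons it made."""
    comparisons = 0
    for thing in lst:
        comparisons += 1
        if thing == item:
            return True, comparisons
    return False, comparisons


for n in [1000, 2000, 4000]:
    data = list(range(n))
    # worst case: the item is not in the list, so every element is checked
    found, count = contains_count(data, -1)
    print(n, found, count)
# 1000 False 1000
# 2000 False 2000
# 4000 False 4000

# average case: search once for every item that is present
avg = sum(contains_count(list(range(1000)), k)[1] for k in range(1000)) / 1000.0
print(avg)   # 500.5
```

On average a present item is found about halfway through, but that halfway point still moves twice as far when the list doubles, so the runtime stays linear.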

  • look at lists_vs_sets.py code
       - two functions for generating data
          - generate_set: generates random points and puts them into a set
          - generate_list: generates random points and puts them into a list
       - query_data
          - generates num_queries random numbers
          - uses "in" to see if they are in the data set
          - times how long it takes to do num_queries
       - speed_test
          - generates equal sized data sets in both list and set form
          - then calls query_data to see how long it takes to query each one
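The file itself is not reproduced in these notes, so the following is only a guess at its shape, reconstructed from the descriptions above. The function names match the ones described, but every parameter name, the range of random values (10 * size), and the use of time.perf_counter (Python 3) are assumptions. random.sample is used so that the list and the set each get size distinct values, which keeps them equal sized.

```python
import random
import time


def generate_list(size, max_value):
    # size distinct random integers in [0, max_value), stored in a list
    return random.sample(range(max_value), size)


def generate_set(size, max_value):
    # the same kind of random integers, stored in a set
    return set(random.sample(range(max_value), size))


def query_data(data, num_queries, max_value):
    """Time num_queries membership tests ("in") against data; return seconds."""
    queries = [random.randint(0, max_value) for _ in range(num_queries)]
    start = time.perf_counter()
    for q in queries:
        _ = q in data   # the membership test being timed
    return time.perf_counter() - start


def speed_test(size, num_queries):
    max_value = 10 * size   # hypothetical range for the random values

    start = time.perf_counter()
    data_list = generate_list(size, max_value)
    print("List creation took %f seconds" % (time.perf_counter() - start))

    start = time.perf_counter()
    data_set = generate_set(size, max_value)
    print("Set creation took %f seconds" % (time.perf_counter() - start))

    print("--")
    list_time = query_data(data_list, num_queries, max_value)
    set_time = query_data(data_set, num_queries, max_value)
    print("List querying took %f seconds" % list_time)
    print("Set querying took %f seconds" % set_time)
    return list_time, set_time
```

Calling speed_test(1000, 100) prints output in the same format as below; the actual numbers will depend on the machine.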
          
             >>> speed_test(1000, 100)
             List creation took 0.003422 seconds
             Set creation took 0.003589 seconds
             --
             List querying took 0.002917 seconds
             Set querying took 0.000194 seconds

          - for small sizes, creation times are about the same and the querying differences are only a few milliseconds
          - as we increase the data size and the number of queries, however, list querying time grows while set querying time barely changes
       
             >>> speed_test(10000, 100)
             List creation took 0.023313 seconds
             Set creation took 0.021885 seconds
             --
             List querying took 0.021288 seconds
             Set querying took 0.000179 seconds

             >>> speed_test(10000, 1000)
             List creation took 0.020332 seconds
             Set creation took 0.021198 seconds
             --
             List querying took 0.213577 seconds
             Set querying took 0.001833 seconds

             >>> speed_test(100000, 1000)
             List creation took 0.186876 seconds
             Set creation took 0.220910 seconds
             --
             List querying took 2.148366 seconds
             Set querying took 0.001881 seconds

          - we can better understand these by generating points as we increase the size of the set/list and then plotting them
             >>> speed_data(1000, 10000, 100000, 5000)
             size   list   set
             10000   0.237790   0.001881
             15000   0.358325   0.001999
             20000   0.469743   0.001956
             25000   0.602107   0.001916
             30000   0.687776   0.001889
             35000   0.824027   0.001903
             40000   0.921235   0.001952
             45000   1.009843   0.001912
             50000   1.156059   0.001927
             55000   1.386080   0.001913
             60000   1.566058   0.001984
             65000   1.722870   0.001936
             70000   2.025138   0.001966
             75000   2.363384   0.001962
             80000   2.619580   0.002030
             85000   2.897005   0.002054
             90000   2.975576   0.001946
             95000   3.418256   0.002082

          - we can copy and paste this into Excel and plot it
             - we'll look later at how to plot within Python
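A hypothetical sketch of a speed_data function that produces a table like the one above (Python 3). The parameter order (num_queries, min_size, max_size, step) is an assumption, chosen because speed_data(1000, 10000, 100000, 5000) under it yields sizes 10000 through 95000, matching the printed rows. Printing tab-separated columns is what makes the output paste cleanly into separate spreadsheet columns.

```python
import random
import time


def time_queries(data, queries):
    # seconds spent answering every membership query against data
    start = time.perf_counter()
    for q in queries:
        _ = q in data
    return time.perf_counter() - start


def speed_data(num_queries, min_size, max_size, step):
    """Print tab-separated timings for a range of data sizes and return the rows."""
    rows = []
    print("size\tlist\tset")
    for size in range(min_size, max_size, step):
        values = random.sample(range(10 * size), size)   # distinct random ints
        queries = [random.randint(0, 10 * size) for _ in range(num_queries)]
        list_time = time_queries(values, queries)
        set_time = time_queries(set(values), queries)
        print("%d\t%f\t%f" % (size, list_time, set_time))
        rows.append((size, list_time, set_time))
    return rows
```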

  • when to use a set vs. a list?
       - lists have an ordering
          - if you need indexing, use a list
       - sets are faster for asking membership
          - if you don't care about the order, use a set!
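A common pattern that follows from this: keep data in a list when order matters, but build a set once if you will ask many membership questions. The word lists below are made up for illustration (Python 3).

```python
# ordered data: positions matter, so keep a list
words = ["the", "cat", "sat", "on", "the", "mat"]
print(words[1])            # cat

# many membership questions: build a set once, then each "in" is fast
stop_words = set(["the", "on", "a"])
content = [w for w in words if w not in stop_words]
print(content)             # ['cat', 'sat', 'mat']

# unique values in first-seen order: the list keeps the order,
# the set remembers what has already been seen
seen = set()
unique = []
for w in words:
    if w not in seen:
        seen.add(w)
        unique.append(w)
print(unique)              # ['the', 'cat', 'sat', 'on', 'mat']
```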

  • midterm
       - Good job!
       - high of 51
       - average 47
       - median 48