CS150 - Fall 2012

CS150 - Fall 2012 - Class 15

quick review of dictionaries
   - creating dictionaries
      - creating an empty dictionary
         >>> d = {}
         >>> d
         {}

      - creating dictionaries with values
         >>> d = {"apple": 2, "banana": 3, "pears": 15}
         >>> d
         {'apple': 2, 'pears': 15, 'banana': 3}

   - accessing keys
      >>> d["apple"]
      2
      >>> d["banana"]
      3
      >>> d["grapefruit"]
      Traceback (most recent call last):
       File "<string>", line 1, in <fragment>
      KeyError: 'grapefruit'

   - updating/adding values
      >>> d["apple"] = 10
      >>> d
      {'apple': 10, 'pears': 15, 'banana': 3}
      >>> d["pineapple"] = 1
      >>> d
      {'pineapple': 1, 'apple': 10, 'pears': 15, 'banana': 3}

   - deleting values
      - sometimes we want to delete a key/value pair
      - "del" does this
      >>> d
      {'pineapple': 1, 'apple': 10, 'pears': 15, 'banana': 3}
      >>> del d["pineapple"]
      >>> d
      {'apple': 10, 'pears': 15, 'banana': 3}

   - useful built-in methods
      - clear(): removes everything
      - keys(): gets the keys as a list
      - values(): gets the values as a list
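
A minimal sketch of these three methods, assuming a small fruit dictionary like the one above. The variable names fruit_names and fruit_counts are made up for illustration, and print(...) with a single argument is used because it behaves the same in Python 2 and Python 3.

```python
# Hypothetical dictionary, mirroring the one used in the review above.
d = {"apple": 10, "pears": 15, "banana": 3}

# keys() and values() give the keys and the values. Sorting makes the
# order predictable, since a dictionary does not promise one.
fruit_names = sorted(d.keys())
fruit_counts = sorted(d.values())
print(fruit_names)    # ['apple', 'banana', 'pears']
print(fruit_counts)   # [3, 10, 15]

# clear() removes every key/value pair, leaving an empty dictionary.
d.clear()
print(d)              # {}
```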

exercises: histogram program
   - how could we use a dictionary to generate the counts?
      >>> data = [1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5]
      >>> print_counts(get_counts(data))
      1   3
      2   3
      3   2
      4   2
      5   2
   - first, we need to store them in a dictionary
      - look at the get_counts function in histogram.py code
         - creates an empty dictionary
         - iterates through the data
         - check if the data is in the dictionary already
            - if it is, just increment the count by 1
            - if it's not, add it to the dictionary with a count of 1
      - what types of things could we call get_counts on?
         - anything that is iterable!
            >>> get_counts(data)
            {1: 3, 2: 3, 3: 2, 4: 2, 5: 2}
            >>> get_counts("this is a string and strings are iterable")
            {'a': 4, ' ': 7, 'b': 1, 'e': 3, 'd': 1, 'g': 2, 'i': 5, 'h': 1, 'l': 1, 'n': 3, 's': 5, 'r': 4, 't': 4}
            >>> s = set([1, 2, 3, 4, 1, 2])
            >>> s
            set([1, 2, 3, 4])
            >>> get_counts(s)
            {1: 1, 2: 1, 3: 1, 4: 1}

            - though sets aren't that interesting :)
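
The steps listed for get_counts can be sketched as follows. This is only an illustration of the described approach; the actual function in histogram.py is not shown here and may differ in its details.

```python
def get_counts(data):
    """Return a dictionary mapping each item in data to how often it occurs."""
    counts = {}                  # start with an empty dictionary
    for item in data:            # works for any iterable: list, string, set, ...
        if item in counts:
            counts[item] += 1    # seen before: increment the count
        else:
            counts[item] = 1     # first time: add it with a count of 1
    return counts


data = [1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5]
print(get_counts(data) == {1: 3, 2: 3, 3: 2, 4: 2, 5: 2})   # True
```

Comparing against the expected dictionary, rather than printing the dictionary itself, keeps the check independent of the order in which the keys happen to be displayed.
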
   - now that we have the dictionary of counts, how can we print them out?
      - there are many ways we could iterate over the things in a dictionary
         - iterate over the values
         - iterate over the keys
         - iterate over the key/value pairs
      - which one is most common?
         - since lookups are done based on the keys, iterating over the keys is the most common
      - look at print_counts function in histogram.py code
         - by default, if you say:

            for key in dictionary:
               ...

            key will get associated with each key in the dictionary.
         - this is the same as writing

            for key in dictionary.keys():
               ...

         - once we have the key, we can use it to look up the value associated with that key and do whatever we want with the pair
      - if you want to iterate over the values, use the values() method, which returns a list of the values
      - what if you want to iterate over the key/value pairs?
         - there is a method called items() that returns each key/value pair as a 2-tuple
            >>> my_dict = {"dave": 1, "anna": 15}
            >>> my_dict.items()
            [('dave', 1), ('anna', 15)]
         - how could we use this in a loop?

            for (key, value) in my_dict.items():
               print "Key: " + str(key)
               print "Value: " + str(value)
         - items() returns a list of 2-tuples, which we're iterating over
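
The three ways of iterating described above can be compared side by side. This is a small sketch; the names pairs_from_keys, pairs_from_items, and total are invented for the example.

```python
my_dict = {"dave": 1, "anna": 15}

# Iterating over the dictionary itself yields the keys; each key is then
# used to look up its value.
pairs_from_keys = []
for key in my_dict:
    pairs_from_keys.append((key, my_dict[key]))

# items() hands back the same (key, value) pairs directly, and the loop
# header unpacks each 2-tuple into two variables.
pairs_from_items = []
for (key, value) in my_dict.items():
    pairs_from_items.append((key, value))

# values() gives only the values, e.g. to add them up.
total = sum(my_dict.values())

print(sorted(pairs_from_keys) == sorted(pairs_from_items))   # True
print(total)                                                 # 16
```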

   - Does the following print like you'd like it to?
      >>> print_counts(get_counts("this is some string"))
          3
      e   1
      g   1
      i   3
      h   1
      m   1
      o   1
      n   1
      s   4
      r   1
      t   2

      - prints in an arbitrary order, not a sorted one
      - like the values in sets, there is NO inherent ordering to the keys in a dictionary
   - how could we print this in sorted order?
      - get the keys first
      - sort them
      - then use them to iterate over the data
   - look at print_counts_sorted in histogram.py code
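
One possible shape for the sorted version, following the three steps above (get the keys, sort them, use them to look up the counts). The helper sorted_count_lines and the tab separator are assumptions made for this sketch; the print_counts_sorted in histogram.py may be written differently.

```python
def sorted_count_lines(counts):
    """Return one "key<TAB>count" string per key, in sorted key order."""
    keys = sorted(counts.keys())     # get the keys and sort them
    lines = []
    for key in keys:                 # use the sorted keys to look up the counts
        lines.append(str(key) + "\t" + str(counts[key]))
    return lines


def print_counts_sorted(counts):
    """Print the counts, one key per line, in sorted key order."""
    for line in sorted_count_lines(counts):
        print(line)


# Letter counts for "this" plus a few extra letters, entered by hand.
print_counts_sorted({"t": 2, "h": 1, "i": 3, "s": 4})
```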

more tuple fun
   - even if you don't supply parentheses, if you comma separate values, they're interpreted as a tuple
      >>> 1, 2
      (1, 2)
      >>> [1, 2], "b"
      ([1, 2], 'b')

   - unpacking tuples
      - given a tuple, we can "unpack" its values into variables:
         >>> my_tuple = (3, 2, 1)
         >>> my_tuple
         (3, 2, 1)
         >>> (x, y, z) = my_tuple
         >>> x
         3
         >>> y
         2
         >>> z
         1

         - be careful with unpacking
            >>> (x, y) = my_tuple
            Traceback (most recent call last):
             File "<string>", line 1, in <fragment>
            ValueError: too many values to unpack
            >>> (a, b, c, d) = my_tuple
            Traceback (most recent call last):
             File "<string>", line 1, in <fragment>
            ValueError: need more than 3 values to unpack
      - notice that we can store anything in a tuple and unpack anything in a tuple
         >>> my_tuple = ([1], [2], [3])
         >>> (x, y, z) = my_tuple
         >>> x
         [1]
         >>> x.append(4)
         >>> my_tuple
         ([1, 4], [2], [3])

         - tuples are immutable; however, the objects inside a tuple may be mutable
         - why doesn't this count as changing the tuple?
            - the tuple still references the same three lists
            - we've just appended something on to the list
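
The distinction between mutating an object inside a tuple and changing the tuple itself can be checked directly. In this sketch, same_object and assignment_failed are names invented for the illustration.

```python
my_tuple = ([1], [2], [3])
(x, y, z) = my_tuple

x.append(4)                       # mutates the list that my_tuple[0] also refers to
same_object = x is my_tuple[0]    # x and the tuple's first slot are one list

# Rebinding a slot of the tuple is what immutability forbids.
try:
    my_tuple[0] = [99]
    assignment_failed = False
except TypeError:
    assignment_failed = True

print(my_tuple)            # ([1, 4], [2], [3])
print(same_object)         # True
print(assignment_failed)   # True
```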

      - unpacking, combined with what we saw before, allows us to do some nice things:
         - initializing multiple variables

         >>> x, y, z = 1, 2, 3
         >>> x
         1
         >>> y
         2
         >>> z
         3
         >>> first, last = "dave", "kauchak"
         >>> first
         'dave'
         >>> last
         'kauchak'

         - say we have two variables x and y, how can we swap the values?
            >>> x = 10
            >>> y = 15
            # now swap the values
            >>> temp = x
            >>> x = y
            >>> y = temp
            >>> x
            15
            >>> y
            10
         - is there a better way?
            >>> x
            15
            >>> y
            10
            >>> x, y = y, x
            >>> x
            10
            >>> y
            15

returning multiple values from a function
   - there are times when it's useful to return more than one value from a function
   - any ideas how we might do this?
      - return a tuple of the values

      def multiple_return():
         x = 10
         y = 15

         return x, y

      >>> multiple_return()
      (10, 15)
   - if a function returns multiple values, how can we get it into multiple variables?
      - unpacking!

      >>> a, b = multiple_return()
      >>> a
      10
      >>> b
      15
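
A slightly more practical sketch of the same idea: a function that computes two results in one pass and returns them as a tuple. The function min_and_max is a made-up example, not part of the course code.

```python
def min_and_max(numbers):
    """Return the smallest and largest values of a non-empty list as a tuple."""
    smallest = numbers[0]
    largest = numbers[0]
    for number in numbers:
        if number < smallest:
            smallest = number
        if number > largest:
            largest = number
    return smallest, largest     # the comma packs both values into a 2-tuple


low, high = min_and_max([4, 1, 5, 2])   # unpacked into two variables
print(low)    # 1
print(high)   # 5
```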

matplotlib
   - matplotlib is a module that allows us to create our own plots within python
      - any guess where the name comes from?
         - "matlab plotting library"
      - it's a set of modules that supports plotting functionality similar to that available in matlab
      - it is NOT built-in to python
         - you have to download and install it
            - I'll post instructions on the course web page for installation
      - documentation
         - General: http://matplotlib.sourceforge.net/
         - Basic tutorial: http://matplotlib.sourceforge.net/users/pyplot_tutorial.html
         - Examples: http://matplotlib.sourceforge.net/examples/index.html
   - think about creating a graph/plot. What functionality will we want?
      - plot data
         - as points
         - as lines
         - as bars
         - ...
      - label axes
      - set the title
      - add a legend
      - annotate the graph
      - add grid
      - ...
   - matplotlib supports all of this functionality (though some is easier to get at than others)

look at basic_plotting.py code
   - look at simple_plot function
      - what does it do?
         - plots x values vs. y values
      - from matplotlib, we've imported pyplot; however, pyplot isn't a function, it's a module
         - ideas?
         - matplotlib is what is called a package
         - a package is a way of organizing modules
            - all the modules in a package are referenced using the . notation
         - why might this be done?
            - to avoid naming conflicts
            - just like we use modules to hold functions to avoid functions with the same name, packages keep modules with the same name from conflicting
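
The package/module relationship can be tried out without installing anything, using a package that ships with Python. The sketch below uses the standard library's email package and its utils module as a stand-in for matplotlib and pyplot; the address string is made up.

```python
# email is a package in the standard library; utils is one of its modules.
from email import utils      # same shape as: from matplotlib import pyplot
import email.utils           # alternative: name the module with the . notation

# parseaddr is a function that lives inside the utils module.
name, address = utils.parseaddr("Dave <dave@example.com>")
print(name)       # Dave
print(address)    # dave@example.com

# Both spellings reach the same module object.
print(email.utils is utils)   # True
```
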
      - notice that after plotting, we call the show() function
         - why do you think it's done this way?
            - so that we can make additional alterations to the plot, before displaying it
      - run simple_plot function
         - generates a nice looking graph
         - notice that matplotlib does a pretty good job at picking default axis values, etc.
         - a new window opens with your plot in it
            - this new window has some interactive functionality
            - most importantly, the ability to save the plot (as a .png file)

   - look at multiple_plot
      - what does it do?
         - plots two lines with the same x coordinates, but different y
      - the plot function can take any number of pairs of lists of x and y to plot multiple lines
      - notice that we can either do it with multiple arguments to a single call to plot or with multiple separate calls to plot

   - look at fancy_multiple_plot
      - what does it do?
         - plot two data sets
      - plot can optionally take arguments that affect how each line is drawn
         - in addition to the x and y values, you have a third argument that is a string of settings
         - each thing plotted is then a triplet
      - there are a whole bunch of options, see the documentation on plot for a description:
         - http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.plot
            - r = red
            - o = circle marker
            - b = blue
            - + = plus marker
            - - = solid line style
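
A rough sketch of how the format strings combine, assuming matplotlib is installed. The names make_data and fancy_plot_sketch and the sample data are hypothetical; the real fancy_multiple_plot in basic_plotting.py may look different.

```python
def make_data():
    """Return x values plus two y series (hypothetical sample data)."""
    xs = [1, 2, 3, 4, 5]
    squares = [x * x for x in xs]
    doubles = [2 * x for x in xs]
    return xs, squares, doubles


def fancy_plot_sketch():
    """Plot both series in one call, each with its own format string."""
    # Imported here so make_data() can be used without matplotlib installed.
    from matplotlib import pyplot

    xs, squares, doubles = make_data()
    # Each thing plotted is a triplet: x values, y values, format string.
    # "ro-" = red, circle markers, solid line; "b+" = blue plus markers.
    pyplot.plot(xs, squares, "ro-", xs, doubles, "b+")
    pyplot.show()


xs, squares, doubles = make_data()
print(squares)   # [1, 4, 9, 16, 25]
print(doubles)   # [2, 4, 6, 8, 10]
```

Calling fancy_plot_sketch() opens the plot window; it is left uncalled here so that the data portion runs on its own.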

plotting sets vs lists
   - last week, we ran an experiment that compared lists vs. sets
   - we generated data for varying numbers of queries for asking if a list or a set included a particular value
   - we wanted to plot the times as we increased the number of queries

   - how did we do this last time?
      - generated the data
      - printed it out tab delimited so that we could copy and paste into Excel to plot
      - run speed_test_old in lists_vs_sets_improved.py code

   - could we do this using matplotlib?
      - first, what do we want to plot (i.e. what are the x axis and the y axis)?
         - x axis is the number of queries
         - y axis represents the time
         - plot two lines
            - one for list times
            - one for set times
      - what do we need to change in the code?
         - we could just put the graphing code in the speed_data function instead of printing it out
            - any problem with this?
               - we lose the original functionality
         - we could copy it and put the graphing code in
            - any problem with this?
               - we have duplicate code
         - better idea: change speed_data to generate the data and then just store it and return it
            - we can then use this data however we want (e.g. print it, plot it, etc.)
      - look at speed_data in lists_vs_sets_improved.py code
         - generate three empty lists
         - populate these lists as we get our timing data
      - look at plot_speed_data in lists_vs_sets_improved.py code
         - takes the three lists as parameters and uses those to generate a plot
         - adds a few more things to the plot to make it nicer
            - xlabel: puts some text under the x-axis
            - ylabel: puts some text next to the y-axis
            - title: adds a title to the plot
            - legend: adds a legend to the plot
      - look at run_experiment in lists_vs_sets_improved.py code
         - generates the speed data and unpacks to three different variables
         - passes these three variables to our plot_speed_data
   - what about printing functionality?
      - write another function that just prints out the values obtained from speed_data
      - look at print_speed_data in lists_vs_sets_improved.py code
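
The overall design (generate the data once, return it, then print or plot it) might be sketched like this. Everything here is an assumption made for illustration: the parameters of speed_data, the helper time_queries, the collection size, and the label text are invented, and the code in lists_vs_sets_improved.py may differ.

```python
import random
import time


def time_queries(collection, queries):
    """Return the seconds taken to ask `q in collection` for every query."""
    start = time.time()
    for q in queries:
        found = q in collection    # the membership test being timed
    return time.time() - start


def speed_data(query_counts, size=1000):
    """Return three parallel lists: query counts, list times, set times."""
    values = list(range(size))
    as_list, as_set = values, set(values)
    counts, list_times, set_times = [], [], []
    for n in query_counts:
        queries = [random.randint(0, 2 * size) for _ in range(n)]
        counts.append(n)
        list_times.append(time_queries(as_list, queries))
        set_times.append(time_queries(as_set, queries))
    return counts, list_times, set_times


def print_speed_data(counts, list_times, set_times):
    """Print the data tab delimited, one row per query count."""
    for i in range(len(counts)):
        print(str(counts[i]) + "\t" + str(list_times[i]) + "\t" + str(set_times[i]))


def plot_speed_data(counts, list_times, set_times):
    """Plot list and set times against the number of queries."""
    from matplotlib import pyplot    # imported lazily; only plotting needs it
    pyplot.plot(counts, list_times, "r-", label="list")
    pyplot.plot(counts, set_times, "b-", label="set")
    pyplot.xlabel("number of queries")
    pyplot.ylabel("time (seconds)")
    pyplot.title("list vs. set membership queries")
    pyplot.legend()
    pyplot.show()


# Generate once, unpack into three variables, then reuse however we like.
counts, list_times, set_times = speed_data([100, 200, 400])
print_speed_data(counts, list_times, set_times)
```

Passing the same three lists to plot_speed_data(counts, list_times, set_times) would draw the plot instead of printing, without duplicating the timing code.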