CS150 - Fall 2012 - Class 15
quick review of dictionaries
- creating dictionaries
- creating an empty dictionary
>>> d = {}
>>> d
{}
- creating dictionaries with values
>>> d = {"apple": 2, "banana": 3, "pears": 15}
>>> d
{'apple': 2, 'pears': 15, 'banana': 3}
- accessing keys
>>> d["apple"]
2
>>> d["banana"]
3
>>> d["grapefruit"]
Traceback (most recent call last):
File "<string>", line 1, in <fragment>
KeyError: 'grapefruit'
- updating/adding values
>>> d["apple"] = 10
>>> d
{'apple': 10, 'pears': 15, 'banana': 3}
>>> d["pineapple"] = 1
>>> d
{'pineapple': 1, 'apple': 10, 'pears': 15, 'banana': 3}
- deleting values
- sometimes we want to delete a key/value pair
- "del" does this
>>> d
{'pineapple': 1, 'apple': 10, 'pears': 15, 'banana': 3}
>>> del d["pineapple"]
>>> d
{'apple': 10, 'pears': 15, 'banana': 3}
- useful built-in methods
- clear(): removes everything
- keys(): gets the keys as a list
- values(): gets the values as a list
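- a short sketch of these three methods in action. The results are sorted so the output is predictable. Note the assumption about versions: in the Python 2 used in these notes, keys() and values() return lists, while in Python 3 they return view objects, so sorted() is used here to get a list either way:

```python
d = {"apple": 2, "banana": 3, "pears": 15}

# sorted() returns a list in both Python 2 and Python 3
keys = sorted(d.keys())
values = sorted(d.values())
print(keys)      # ['apple', 'banana', 'pears']
print(values)    # [2, 3, 15]

# clear() removes every key/value pair, leaving an empty dictionary
d.clear()
print(d)         # {}
```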
exercises: histogram program
- how could we use a dictionary to generate the counts?
>>> data = [1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5]
>>> print_counts(get_counts(data))
1 3
2 3
3 2
4 2
5 2
- first, we need to store them in a dictionary
- look at the get_counts function in
histogram.py code
- creates an empty dictionary
- iterates through the data
- check if the data is in the dictionary already
- if it is, just increment the count by 1
- if it's not, add it to the dictionary with a count of 1
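- the steps above could be sketched as follows. This is one version consistent with the description, not necessarily the exact code in histogram.py:

```python
def get_counts(data):
    """Return a dictionary mapping each item in data to how often it occurs."""
    counts = {}                   # start with an empty dictionary
    for item in data:             # works for any iterable
        if item in counts:
            counts[item] += 1     # seen before: increment the count
        else:
            counts[item] = 1      # first time: start the count at 1
    return counts


data = [1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5]
counts = get_counts(data)
print(counts == {1: 3, 2: 3, 3: 2, 4: 2, 5: 2})   # True
```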
- what types of things could we call get_counts on?
- anything that is iterable!
>>> get_counts(data)
{1: 3, 2: 3, 3: 2, 4: 2, 5: 2}
>>> get_counts("this is a string and strings are iterable")
{'a': 4, ' ': 7, 'b': 1, 'e': 3, 'd': 1, 'g': 2, 'i': 5, 'h': 1, 'l': 1, 'n': 3, 's': 5, 'r': 4, 't': 4}
>>> s = set([1, 2, 3, 4, 1, 2])
>>> s
set([1, 2, 3, 4])
>>> get_counts(s)
{1: 1, 2: 1, 3: 1, 4: 1}
- though sets aren't that interesting :)
- now that we have the dictionary of counts, how can we print them out?
- there are many ways we could iterate over the things in a dictionary
- iterate over the values
- iterate over the keys
- iterate over the key/value pairs
- which one is most common?
- since lookups are done based on the keys, iterating over the keys is the most common
- look at print_counts function in
histogram.py code
- by default, if you say:
for key in dictionary:
    ...
key takes on each key in the dictionary, one at a time.
- this is the same as writing
for key in dictionary.keys():
    ...
- once we have the key, we can use it to look up the value associated with that key and do whatever we want with the pair
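- a minimal sketch of what print_counts might look like, iterating over the keys and looking up each value. The single space between key and count is an assumption based on the output shown earlier. The real histogram.py may format things differently:

```python
def print_counts(counts):
    """Print each key in the dictionary followed by its count."""
    for key in counts:                        # same as: for key in counts.keys()
        value = counts[key]                   # use the key to look up the value
        print(str(key) + " " + str(value))


print_counts({"apple": 2})   # prints: apple 2
```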
- if you want to iterate over the values, use the values() method, which returns a list of the values
- what if you want to iterate over the key/value pairs?
- there is a method called items() that returns the key/value pairs as 2-tuples
>>> my_dict = {"dave": 1, "anna": 15}
>>> my_dict.items()
[('dave', 1), ('anna', 15)]
- how could we use this in a loop?
for (key, value) in my_dict.items():
    print "Key: " + str(key)
    print "Value: " + str(value)
- items() returns a list of 2-tuples, which we're iterating over
- Does the following print like you'd like it to?
>>> print_counts(get_counts("this is some string"))
3
e 1
g 1
i 3
h 1
m 1
o 1
n 1
s 4
r 1
t 2
- prints in an arbitrary order (the first line, whose key looks blank, is the count for the space character)
- like the values in sets, there is NO inherent ordering to the keys in a dictionary
- how could we print this in sorted order?
- get the keys first
- sort them
- then use them to iterate over the data
- look at print_counts_sorted in
histogram.py code
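- the three steps above could be sketched like this. It is one possible version of print_counts_sorted, and the output format (key, space, count) is an assumption:

```python
def print_counts_sorted(counts):
    """Print the key/count pairs in sorted key order."""
    keys = list(counts.keys())    # get the keys first
    keys.sort()                   # sort them
    for key in keys:              # then use them to look up each count
        print(str(key) + " " + str(counts[key]))


print_counts_sorted({"t": 2, "s": 4, "i": 3})
# prints:
# i 3
# s 4
# t 2
```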
more tuple fun
- even if you don't supply parentheses, if you comma separate values, they're interpreted as a tuple
>>> 1, 2
(1, 2)
>>> [1, 2], "b"
([1, 2], 'b')
- unpacking tuples
- given a tuple, we can "unpack" its values into variables:
>>> my_tuple = (3, 2, 1)
>>> my_tuple
(3, 2, 1)
>>> (x, y, z) = my_tuple
>>> x
3
>>> y
2
>>> z
1
- be careful with unpacking
>>> (x, y) = my_tuple
Traceback (most recent call last):
File "<string>", line 1, in <fragment>
ValueError: too many values to unpack
>>> (a, b, c, d) = my_tuple
Traceback (most recent call last):
File "<string>", line 1, in <fragment>
ValueError: need more than 3 values to unpack
- notice that we can store anything in a tuple and unpack anything in a tuple
>>> my_tuple = ([1], [2], [3])
>>> (x, y, z) = my_tuple
>>> x
[1]
>>> x.append(4)
>>> my_tuple
([1, 4], [2], [3])
- tuples are immutable; however, the objects inside a tuple may be mutable
- why hasn't this changed the tuple?
- the tuple still references the same three lists
- we've just appended something on to the list
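- a small sketch of the difference between changing a list inside a tuple (allowed) and changing the tuple itself (not allowed). The variable name inner is only for illustration:

```python
inner = [1]
my_tuple = (inner, [2], [3])

inner.append(4)                  # mutating the list is fine
print(my_tuple)                  # ([1, 4], [2], [3])
print(my_tuple[0] is inner)      # True: the tuple still references the same list

try:
    my_tuple[0] = [9]            # making the tuple reference a different list is not
except TypeError as error:
    print("TypeError: " + str(error))
```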
- unpacking, combined with what we saw before, allows us to do some nice things:
- initializing multiple variables
>>> x, y, z = 1, 2, 3
>>> x
1
>>> y
2
>>> z
3
>>> first, last = "dave", "kauchak"
>>> first
'dave'
>>> last
'kauchak'
- say we have two variables x and y, how can we swap the values?
>>> x = 10
>>> y = 15
# now swap the values
>>> temp = x
>>> x = y
>>> y = temp
>>> x
15
>>> y
10
- is there a better way?
>>> x
15
>>> y
10
>>> x, y = y, x
>>> x
10
>>> y
15
returning multiple values from a function
- there are times when it's useful to return more than one value from a function
- any ideas how we might do this?
- return a tuple of the values
def multiple_return():
    x = 10
    y = 15
    return x, y
>>> multiple_return()
(10, 15)
- if a function returns multiple values, how can we get them into multiple variables?
- unpacking!
>>> a, b = multiple_return()
>>> a
10
>>> b
15
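- a slightly more practical sketch of the same idea. The function min_and_max is a hypothetical example, not something from the class code:

```python
def min_and_max(values):
    """Return the smallest and largest item of a non-empty list as a tuple."""
    smallest = values[0]
    largest = values[0]
    for value in values:
        if value < smallest:
            smallest = value
        if value > largest:
            largest = value
    return smallest, largest      # the comma makes this a 2-tuple


low, high = min_and_max([4, 9, 1, 7])   # unpack the returned tuple
print(low)     # 1
print(high)    # 9
```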
matplotlib
- matplotlib is a module that allows us to create our own plots within python
- any guess where the name comes from?
- "matlab plotting library"
- it's a set of modules that supports plotting functionality similar to that available in matlab
- it is NOT built-in to python
- you have to download and install it
- I'll post instructions on the course web page for installation
- documentation
- General:
http://matplotlib.sourceforge.net/
- Basic tutorial:
http://matplotlib.sourceforge.net/users/pyplot_tutorial.html
- Examples:
http://matplotlib.sourceforge.net/examples/index.html
- think about creating a graph/plot. What functionality will we want?
- plot data
- as points
- as lines
- as bars
- ...
- label axes
- set the title
- add a legend
- annotate the graph
- add grid
- ...
- matplotlib supports all of this functionality (though some is easier to get at than others)
look at
basic_plotting.py code
- look at simple_plot function
- what does it do?
- plots x values vs. y values
- from matplotlib, we've imported pyplot; however, pyplot isn't a function, it's a module
- ideas?
- matplotlib is what is called a package
- a package is a way of organizing modules
- all the modules in a package are referenced using the . notation
- why might this be done?
- to avoid naming conflicts
- just like we use modules to hold functions to avoid functions with the same name, packages keep modules with the same name from conflicting
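- to illustrate the naming-conflict point with something that needs no installation: the Python 3 standard library has two different modules that are both named parser, kept apart by the packages they live in. This example is not from the class code and assumes Python 3 (the html package does not exist in Python 2):

```python
import email.parser    # the parser module inside the email package
import html.parser     # a different parser module inside the html package

# same short name, but the package prefix keeps them distinct
print(email.parser.__name__)           # email.parser
print(html.parser.__name__)            # html.parser
print(email.parser is html.parser)     # False
```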
- notice that after plotting, we call the show() function
- why do you think it's done this way?
- so that we can make additional alterations to the plot, before displaying it
- run simple_plot function
- generates a nice looking graph
- notice that matplotlib does a pretty good job at picking default axis values, etc.
- a new window opens with your plot in it
- this new window has some interactive functionality
- most importantly, the ability to save the plot (as a .png file)
- look at multiple_plot
- what does it do?
- plots two lines with the same x coordinates, but different y
- the plot function can take any number of pairs of lists of x and y to plot multiple lines
- notice that we can either do it with multiple arguments to plot or multiple separate calls to plot
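- a sketch of both options. The data and variable names are made up for illustration, and the "Agg" backend is an assumption so the code runs without opening a window. In an interactive session you would leave that line out and call pyplot.show() at the end:

```python
import matplotlib
matplotlib.use("Agg")             # assumption: draw off-screen, no window needed
from matplotlib import pyplot

x = [1, 2, 3, 4]
squares = [1, 4, 9, 16]
cubes = [1, 8, 27, 64]

# option 1: one call to plot with two (x, y) pairs
pyplot.figure()
lines_one = pyplot.plot(x, squares, x, cubes)

# option 2: two separate calls to plot, drawn on the same figure
pyplot.figure()
lines_two = pyplot.plot(x, squares) + pyplot.plot(x, cubes)

print(len(lines_one))    # 2
print(len(lines_two))    # 2
```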
- look at fancy_multiple_plot
- what does it do?
- plot two data sets
- plot can optionally take an argument that affects how the line is drawn
- in addition to the x and y values, you have a third argument that is a string of settings
- each thing plotted is then a triplet
- there are a whole bunch of options, see the documentation on plot for a description:
- http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.plot
- r = red
- o = circle marker
- b = blue
- + = plus marker
- - = solid line style
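- a sketch of how these settings combine into format strings. The data is made up, and this is not necessarily what fancy_multiple_plot does. The "Agg" backend is an assumption so the code runs without a display. Interactively, drop that line and finish with pyplot.show():

```python
import matplotlib
matplotlib.use("Agg")             # assumption: draw off-screen, no window needed
from matplotlib import pyplot

x = [1, 2, 3, 4, 5]
y1 = [2, 4, 6, 8, 10]
y2 = [1, 4, 9, 16, 25]

# each data set is a triplet: x values, y values, format string
#   "ro-" = red, circle markers, solid line
#   "b+-" = blue, plus markers, solid line
lines = pyplot.plot(x, y1, "ro-", x, y2, "b+-")

print(lines[0].get_marker())    # o
print(lines[1].get_marker())    # +
```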
plotting sets vs lists
- last week, we ran an experiment that compared lists vs. sets
- we generated data for varying numbers of queries for asking if a list or a set included a particular value
- we wanted to plot the times as we increased the number of queries
- how did we do this last time?
- generated the data
- printed it out tab delimited so that we could copy and paste into Excel to plot
- run speed_test_old in
lists_vs_sets_improved.py code
- could we do this using matplotlib?
- first, what do we want to plot (i.e. what are the x axis and y axis)?
- x axis is the number of queries
- y axis represents the time
- plot two lines
- one for list times
- one for set times
- what do we need to change in the code?
- we could just put the graphing code in the speed_data function instead of printing it out
- any problem with this?
- we lose the original functionality
- we could copy it and put the graphing code in
- any problem with this?
- we have duplicate code
- better idea: change speed_data to generate the data and then just store it and return it
- we can then use this data however we want (e.g. print it, plot it, etc.)
- look at speed_data in
lists_vs_sets_improved.py code
- generate three empty lists
- populate these lists as we get our timing data
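- a rough sketch of this "generate, store, return" design. The parameter names, the helper time_queries, and the way queries are generated are all assumptions. The real speed_data in lists_vs_sets_improved.py may differ:

```python
import random
import time


def time_queries(collection, queries):
    """Return the seconds taken to test every query for membership (hypothetical helper)."""
    start = time.time()
    for query in queries:
        query in collection           # the membership test being timed
    return time.time() - start


def speed_data(num_items, query_counts):
    """Return three parallel lists: query counts, list times, set times."""
    data_list = list(range(num_items))
    data_set = set(data_list)

    counts = []          # x values: number of queries
    list_times = []      # y values for the list
    set_times = []       # y values for the set

    for num_queries in query_counts:
        queries = [random.randint(0, 2 * num_items) for i in range(num_queries)]
        counts.append(num_queries)
        list_times.append(time_queries(data_list, queries))
        set_times.append(time_queries(data_set, queries))

    return counts, list_times, set_times


counts, list_times, set_times = speed_data(1000, [100, 200, 400])
print(counts)    # [100, 200, 400]
```

- because the function only returns the data, the caller decides whether to print it, plot it, or both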
- look at plot_speed_data in
lists_vs_sets_improved.py code
- takes the three lists as parameters and uses those to generate a plot
- adds a few more things to the plot to make it nicer
- xlabel: puts some text under the x-axis
- ylabel: puts some text next to the y-axis
- title: adds a title to the plot
- legend: adds a legend to the plot
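- a sketch of how these four calls might fit together. The label text and the numbers passed in are made up (they are not real timings), and returning the axes is an addition so the result can be inspected. The "Agg" backend is an assumption so this runs without a display. Interactively, drop that line and call pyplot.show() at the end:

```python
import matplotlib
matplotlib.use("Agg")             # assumption: draw off-screen, no window needed
from matplotlib import pyplot


def plot_speed_data(counts, list_times, set_times):
    """Plot list and set query times against the number of queries."""
    pyplot.plot(counts, list_times, "r-", label="list")
    pyplot.plot(counts, set_times, "b-", label="set")
    pyplot.xlabel("number of queries")     # text under the x-axis
    pyplot.ylabel("time (seconds)")        # text next to the y-axis
    pyplot.title("list vs. set membership queries")
    pyplot.legend()                        # uses the label= values above
    return pyplot.gca()                    # the axes that were just drawn on


# made-up numbers, only to show the call
axes = plot_speed_data([100, 200, 300], [0.02, 0.04, 0.06], [0.001, 0.001, 0.002])
print(axes.get_title())    # list vs. set membership queries
```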
- look at run_experiment in
lists_vs_sets_improved.py code
- generates the speed data and unpacks to three different variables
- passes these three variables to our plot_speed_data
- what about printing functionality?
- write another function that just prints out the values obtained from speed_data
- look at print_speed_data in
lists_vs_sets_improved.py code
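- a sketch of what such a printing function might look like, keeping the tab-delimited format that could be pasted into Excel. The header row and column order are assumptions:

```python
def print_speed_data(counts, list_times, set_times):
    """Print the three parallel lists as tab-delimited rows."""
    print("queries\tlist\tset")
    for i in range(len(counts)):
        print(str(counts[i]) + "\t" + str(list_times[i]) + "\t" + str(set_times[i]))


# made-up numbers, only to show the call
print_speed_data([10, 20], [0.5, 1.0], [0.25, 0.25])
```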