Lecture 9: Dictionaries
Topics
Lunch with Prof. Osborn (or others)
Test 1 Monday 3/2
- in class
- paper-based
- can bring in two pages of notes (either two pieces of paper, single-sided, or one piece, double-sided)
- problems like practice problems
- coding
- what's wrong with this function
- what would this function do
- is this valid?
- what would the output be
- …
- practice writing code on paper (it's different than on the computer)
- I'll post practice problems
- covers everything through today's lecture (not recursion)
Student Presentation
Dictionaries (aka "maps")
- store keys and their associated values
- each key is associated with a value
- lookup can be done based on the key
- this is a very common phenomenon in the real world. What are some examples?
- social security number
- key = social security number
- value = name, address, etc
- phone numbers in your phone (and phone directories in general)
- key = name
- value = phone number
- websites
- key = url
- value = location of the computer that hosts this website
- car license plates
- key = license plate number
- value = owner, type of car, …
- flight information
- key = flight number
- value = departure city, destination city, time, …
- creating new dictionaries
dictionaries can be created using curly braces
>>> d = {}
>>> d
{}
dictionaries function similarly to lists, except we can put things in ANY index and can use non-numerical indices
>>> d[15] = 1
>>> d
{15: 1}
notice when a dictionary is printed out, we get the key AND the associated value
>>> d[100] = 10
>>> d
{15: 1, 100: 10}
>>> my_list = []
>>> my_list[15] = 1
Traceback (most recent call last):
  File "<string>", line 1, in <fragment>
IndexError: list assignment index out of range
- dictionaries ARE very different than lists….
we can also update the values already in a dictionary
>>> d[15] = 2
>>> d
{15: 2, 100: 10}
>>> d[100] += 1
>>> d
{15: 2, 100: 11}
keys in the dictionary can be ANY immutable object
>>> d2 = {}
>>> d2["dave"] = 1
>>> d2["anna"] = 1
>>> d2["anna"] = 2
>>> d2["seymore"] = 100
>>> d2
{'dave': 1, 'anna': 2, 'seymore': 100}
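To make "immutable" concrete, here is a small sketch (the dictionary name `points` is made up for illustration). Strings, numbers and tuples work as keys, while a list, which is mutable, is rejected with a TypeError.

```python
# Sketch: which objects can be used as dictionary keys?
points = {}

points["origin"] = 0     # strings are immutable: OK
points[42] = 1           # numbers are immutable: OK
points[(3, 4)] = 2       # tuples are immutable: OK

try:
    points[[3, 4]] = 3   # lists are mutable: not allowed as keys
except TypeError as error:
    print("cannot use a list as a key:", error)

print(points)
```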
the values can be ANY object
>>> d3 = {}
>>> d3["dave"] = []
>>> d3
{'dave': []}
>>> d3["dave"].append(1)
>>> d3["dave"].append(40)
>>> d3
{'dave': [1, 40]}
be careful to put the key in the dictionary before trying to use it
>>> d3["steve"]
Traceback (most recent call last):
  File "<string>", line 1, in <fragment>
KeyError: 'steve'
>>> d3["steve"].append(1)
Traceback (most recent call last):
  File "<string>", line 1, in <fragment>
KeyError: 'steve'
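Two common ways to avoid the KeyError are sketched here. The first only uses what we've seen so far (`in` and assignment). The second uses the built-in dictionary method `get`, which these notes haven't covered: it returns a default value instead of raising an error when the key is missing.

```python
d3 = {"dave": [1, 40]}

# Option 1: check for the key first, and add it if it is missing
if "steve" not in d3:
    d3["steve"] = []
d3["steve"].append(1)

# Option 2: get() returns a default instead of raising KeyError
print(d3.get("anna"))       # None, because "anna" is not a key
print(d3.get("anna", []))   # [] is the default we supplied

print(d3)
```

Note that `get` only looks the key up; it does not add "anna" to the dictionary.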
how do you think we can create non-empty dictionaries from scratch?
>>> another_dict = {"dave": 1, "anna": 100, "seymore": 21}
>>> another_dict
{'dave': 1, 'anna': 100, 'seymore': 21}
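The real-world examples from the start of the lecture map directly onto dictionary literals. This is only a sketch: the names, phone numbers and flight numbers are made up for illustration.

```python
# Hypothetical data, made up for illustration

# key = name, value = phone number
phone_book = {"dave": "555-0100", "anna": "555-0199"}

# key = flight number, value = (departure city, destination city, time)
# Values can be ANY object, so a tuple can bundle several facts together.
flights = {
    "XY123": ("Los Angeles", "Denver", "8:05"),
    "XY456": ("Denver", "Chicago", "11:30"),
}

print(phone_book["anna"])

departure, destination, time = flights["XY123"]
print(departure, "to", destination, "at", time)
```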
- what are some other methods you might want for dictionaries (things you might want to ask about them)?
- does it have a particular key?
- how many key/value pairs are in the dictionary?
- what are all of the values in the dictionary?
- what are all of the keys in the dictionary?
- remove all of the items in the dictionary?
dictionaries support most of the other operations you'd expect, based on what we've seen in other data structures
>>> "seymore" in another_dict
True
>>> len(another_dict)
3
dictionaries are a class of objects, just like everything else we've seen (called dict … short for dictionary)
>>> help(dict)
some of the more relevant methods:
>>> d2
{'dave': 1, 'anna': 2, 'seymore': 100}
>>> d2.values()
dict_values([1, 2, 100])
>>> d2.keys()
dict_keys(['dave', 'anna', 'seymore'])
>>> d2.pop('seymore')
100
>>> d2
{'dave': 1, 'anna': 2}
>>> d2.clear()
>>> d2
{}
Tracking frequencies
- We're going to use dictionaries to store counts like we did on paper last lecture
- Write a function called get_counts that takes a list of numbers and returns a dictionary containing the counts of each of the numbers
Key idea:
def get_counts(numbers):
    d = {}
    for num in numbers:
        # do something here
    return d
- There are two cases we need to contend with:
- if the number isn't in the dictionary
- In this case we need to add it with the value 1:
d[num] = 1
- if the number is in the dictionary
- In this case, we just need to increment it:
d[num] = d[num] + 1
- This can also be written:
d[num] += 1
- Look at the get_counts function in dictionaries.py
We can now generate the counts from our file
>>> data = read_numbers('numbers.txt')
>>> data
[1, 2, 3, 2, 1, 1, 2, 6, 7, 8, 10, 1, 5, 5, 5, 3, 8, 6, 7, 6, 4, 1, 1, 2, 3, 1, 2, 3]
>>> get_counts(data)
{1: 7, 2: 5, 3: 4, 6: 3, 7: 2, 8: 2, 10: 1, 5: 3, 4: 1}
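Putting the two cases into the skeleton gives something like the following. This is a sketch of one way to fill in the loop; the version in dictionaries.py may be written differently.

```python
def get_counts(numbers):
    """Return a dictionary mapping each number to how often it occurs."""
    d = {}
    for num in numbers:
        if num in d:
            # seen before: increment its count
            d[num] += 1
        else:
            # first time we see this number: start its count at 1
            d[num] = 1
    return d


print(get_counts([1, 2, 3, 2, 1, 1]))
```

Checking `num in d` first is what keeps us from hitting a KeyError when we try to increment a count that doesn't exist yet.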
Iterating over dictionaries
- We're almost to the point where we can find the most frequent value.
- Next, we need to go through all of the values in the dictionary to find the most frequent one.
- there are many ways we could iterate over the things in a dictionary
- iterate over the values
- iterate over the keys
- iterate over the key/value pairs
- which one is most common?
- since lookups are done based on the keys, iterating over the keys is the most common
by default, if you say:
for key in dictionary:
    ...
key will get associated with each key in the dictionary in turn
once we have the key, we can use it to look up the value associated with that key and do whatever we want with the pair
for key in dictionary:
    value = dictionary[key]
    ...
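For comparison, here is a sketch of all three ways of iterating mentioned above, using a small made-up `counts` dictionary. The `values()` and `items()` methods are built into Python dictionaries.

```python
counts = {1: 3, 2: 2, 3: 1}

# 1. iterate over the keys (the default), then look up each value
for key in counts:
    print(key, counts[key])

# 2. iterate over just the values (we lose track of the keys)
total = 0
for value in counts.values():
    total += value
print("total:", total)

# 3. iterate over key/value pairs directly
for key, value in counts.items():
    print(key, value)
```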
- look at the print_counts function
- "\t" is the tab character
>>> data = read_numbers('numbers.txt')
>>> counts = get_counts(data)
>>> print_counts(counts)
1	7
2	5
3	4
6	3
7	2
8	2
10	1
5	3
4	1
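- Notice that the keys are not in numerical order. In recent versions of Python (3.7 and later) a dictionary iterates over its keys in the order they were first added, not in sorted order, so don't count on the keys coming out sorted, only that you'll iterate over all of them.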
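A sketch of what print_counts might look like follows; the version in dictionaries.py may differ. The second function, `print_counts_sorted`, is a hypothetical addition (not part of the course code) that uses the built-in `sorted` to visit the keys in numerical order.

```python
def print_counts(counts):
    """Print each key and its count, separated by a tab."""
    for key in counts:
        print(str(key) + "\t" + str(counts[key]))


def print_counts_sorted(counts):
    """Hypothetical variant: same output, but keys in sorted order."""
    for key in sorted(counts):
        print(str(key) + "\t" + str(counts[key]))


example = {10: 1, 5: 3, 4: 1}
print_counts(example)          # keys in the order they were added
print_counts_sorted(example)   # keys in numerical order
```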
- look at the get_most_frequent_value function
- Looks very similar to the my_max function we wrote way back in the lecture 6 notes (https://cs.pomona.edu/classes/cs51a/lectures/lec06.html)
- We keep a variable (max_value) that stores the largest value we've seen so far
- We'll initialize it to -1 assuming that the numbers are all positive
- See problem set 6 for a general solution
- We then iterate through each of the key/value pairs in our dictionary
- We compare the value (i.e. counts[key]) to the largest value we've seen so far
- If it's larger, we update max_value
- The only difference with my_max is that we want to return the key associated with the largest value
- We need another variable (max_key) that stores this key
- Whenever we update max_value, we also update max_key
>>> data = read_numbers('numbers.txt')
>>> get_most_frequent_value(data)
1
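The steps described above can be sketched as follows. This is one possible version, not necessarily the one in dictionaries.py. It assumes the function builds the counts itself from a list of numbers (as the call above suggests), and starting `max_key` at None is my own choice for the empty-list case.

```python
def get_counts(numbers):
    """Return a dictionary mapping each number to how often it occurs."""
    d = {}
    for num in numbers:
        if num in d:
            d[num] += 1
        else:
            d[num] = 1
    return d


def get_most_frequent_value(numbers):
    """Return the number that occurs most often in the list."""
    counts = get_counts(numbers)
    max_value = -1    # counts are always at least 1, so -1 is a safe start
    max_key = None    # assumption: None is returned for an empty list
    for key in counts:
        if counts[key] > max_value:
            # found a more frequent number: remember both pieces
            max_value = counts[key]
            max_key = key
    return max_key


print(get_most_frequent_value([1, 2, 3, 2, 1, 1]))
```

Because the comparison uses `>` rather than `>=`, a tie is resolved in favor of whichever key is visited first.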
- It may also be useful to not only get the most frequent value, but also how frequent it is
- Anytime you want to return more than one value from a function, a tuple is often a good option
- Look at the get_most_frequent function
- the only difference is that we return a tuple and also include the max_value
>>> data = read_numbers('numbers.txt')
>>> get_most_frequent(data)
(1, 7)
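Here is a sketch of the tuple-returning version along with how a caller can unpack the result. The details are assumptions rather than the code in dictionaries.py: to keep the example short, the counting is done inline with the built-in `get` method, and the list literal stands in for reading numbers.txt.

```python
def get_most_frequent(numbers):
    """Return a tuple (most frequent number, how often it occurs)."""
    counts = {}
    for num in numbers:
        # get() gives 0 for a number we haven't seen yet
        counts[num] = counts.get(num, 0) + 1

    max_value = -1
    max_key = None
    for key in counts:
        if counts[key] > max_value:
            max_value = counts[key]
            max_key = key
    return (max_key, max_value)


data = [1, 2, 3, 2, 1, 1, 2, 6, 7, 8, 10, 1, 5, 5, 5, 3, 8, 6, 7, 6,
        4, 1, 1, 2, 3, 1, 2, 3]

# A tuple can be unpacked into separate variables in one assignment
value, frequency = get_most_frequent(data)
print(value, "occurs", frequency, "times")
```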