Lecture 9: Dictionaries

Topics

Lunch with Prof. Osborn (or others)

Test 1 Monday 3/2

in class
paper-based
can bring in two pages of notes (either two pieces of paper, single-sided, or one piece, double-sided)
problems like practice problems
- coding
- what's wrong with this function
- what would this function do
- is this valid?
- what would the output be
- …
practice writing code on paper (it's different than on the computer)
I'll post practice problems
cover everything through today's lecture (not recursion)

Student Presentation

Dictionaries (aka "maps")

store keys and an associated value
- each key is associated with a value
- lookup can be done based on the key
- this is a very common phenomenon in the real world. What are some examples?
  - social security number
    - key = social security number
    - value = name, address, etc
  - phone numbers in your phone (and phone directories in general)
    - key = name
    - value = phone number
  - websites
    - key = url
    - value = location of the computer that hosts this website
  - car license plates
    - key = license plate number
    - value = owner, type of car, …
  - flight information
    - key = flight number
    - value = departure city, destination city, time, …

creating new dictionaries

dictionaries can be created using curly braces
```
>>> d = {}
>>> d
{}
```
dictionaries function similarly to lists, except we can put things in ANY index and can use non-numerical indices
```
>>> d[15] = 1
>>> d
{15: 1}
```
- notice when a dictionary is printed out, we get the key AND the associated value
```
>>> d[100] = 10
>>> d
{15: 1, 100: 10}
>>> my_list = []
>>> my_list[15] = 1
Traceback (most recent call last):
 File "<string>", line 1, in <fragment>
IndexError: list assignment index out of range
```
- dictionaries ARE very different from lists…

we can also update the values already in a dictionary

>>> d[15] = 2
>>> d
{15: 2, 100: 10}
>>> d[100] += 1
>>> d
{15: 2, 100: 11}

keys in the dictionary can be ANY immutable object

>>> d2 = {}
>>> d2["dave"] = 1
>>> d2["anna"] = 1
>>> d2["anna"] = 2
>>> d2["seymore"] = 100
>>> d2
{'dave': 1, 'anna': 2, 'seymore': 100}
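
To see why the key has to be immutable, here is a small sketch (the names grid and try_list_key are made up for this example, they are not from the course code). A tuple works as a key, but a list does not, because lists can be changed after they are created.

```python
# Hypothetical example: tuples are immutable, so they can be used as keys.
grid = {}
grid[(0, 1)] = "start"
grid[(2, 3)] = "end"


def try_list_key():
    """Try to use a list as a key and report what happens."""
    d = {}
    try:
        d[[0, 1]] = "start"  # lists are mutable, so Python refuses
    except TypeError:
        return "TypeError"
    return "ok"


print(grid[(0, 1)])    # start
print(try_list_key())  # TypeError
```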

the values can be ANY object

>>> d3 = {}
>>> d3["dave"] = []
>>> d3
{'dave': []}
>>> d3["dave"].append(1)
>>> d3["dave"].append(40)
>>> d3
{'dave': [1, 40]}

be careful to put the key in the dictionary before trying to use it

>>> d3["steve"]
Traceback (most recent call last):
 File "<string>", line 1, in <fragment>
KeyError: 'steve'
>>> d3["steve"].append(1)
Traceback (most recent call last):
 File "<string>", line 1, in <fragment>
KeyError: 'steve'
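
One way to avoid the KeyError above is to check for the key with "in" before using it. A minimal sketch (the helper name add_score is hypothetical, not something from the course files):

```python
def add_score(scores, name, score):
    """Append score to the list stored under name, creating the list if needed."""
    if name not in scores:
        scores[name] = []  # put the key in the dictionary first
    scores[name].append(score)


d3 = {"dave": [1, 40]}
add_score(d3, "steve", 1)  # "steve" is new, so an empty list is created first
add_score(d3, "dave", 7)   # "dave" already exists, so we just append
print(d3)                  # {'dave': [1, 40, 7], 'steve': [1]}
```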

how do you think we can create non-empty dictionaries from scratch?

>>> another_dict = {"dave": 1, "anna":100, "seymore": 21}
>>> another_dict
{'dave': 1, 'anna': 100, 'seymore': 21}

what are some other methods you might want for dictionaries (things you might want to ask about them)?
- does it have a particular key?
- how many key/value pairs are in the dictionary?
- what are all of the values in the dictionary?
- what are all of the keys in the dictionary?
- remove all of the items in the dictionary?
dictionaries support most of the other things you'd expect them to, which we've seen in other data structures
```
>>> "seymore" in another_dict
True
>>> len(another_dict)
3
```
dictionaries are a class of objects, just like everything else we've seen (called dict … short for dictionary)
```
>>> help(dict)
```

some of the more relevant methods:

>>> d2
{'dave': 1, 'anna': 2, 'seymore': 100}
>>> d2.values()
dict_values([1, 2, 100])
>>> d2.keys()
dict_keys(['dave', 'anna', 'seymore'])
>>> d2.pop('seymore')
100
>>> d2
{'dave': 1, 'anna': 2}
>>> d2.clear()
>>> d2
{}

TODO dict.items() example
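
A possible example for the note above, reusing the contents d2 had before it was cleared (a sketch only; the variable names pairs, name and number are chosen just for illustration). items() gives back the key/value pairs as tuples:

```python
d2 = {"dave": 1, "anna": 2, "seymore": 100}

# items() returns a view of (key, value) tuples; list() makes it easy to look at
pairs = list(d2.items())
print(pairs)  # [('dave', 1), ('anna', 2), ('seymore', 100)]

# each pair can be unpacked directly in a for loop
for name, number in d2.items():
    print(name, number)
```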

Tracking frequencies

We're going to use dictionaries to store counts like we did on paper last lecture
Write a function called get_counts that takes a list of numbers and returns a dictionary containing the counts of each of the numbers

Key idea:

def get_counts(numbers):
  d = {}

  for num in numbers:
    # do something here

  return d

There are two cases we need to contend with:
1. if the number isn't in the dictionary
  - In this case we need to add it with the value 1: d[num] = 1
2. if the number is in the dictionary
  - In this case, we just need to increment it: d[num] = d[num] + 1
  - This can also be written d[num] += 1
Look at the get_counts function in dictionaries.py
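
The two cases above might be put together like this. This is only a sketch of one way to fill in the loop; the actual version in dictionaries.py may differ.

```python
def get_counts(numbers):
    """Return a dictionary mapping each number to how many times it occurs."""
    d = {}

    for num in numbers:
        if num not in d:
            d[num] = 1   # case 1: first time we've seen this number
        else:
            d[num] += 1  # case 2: seen it before, so increment

    return d


print(get_counts([1, 2, 3, 2, 1, 1]))  # {1: 3, 2: 2, 3: 1}
```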

We now can generate the counts from our file

>>> data = read_numbers('numbers.txt')
>>> data
[1, 2, 3, 2, 1, 1, 2, 6, 7, 8, 10, 1, 5, 5, 5, 3, 8, 6, 7, 6, 4, 1, 1, 2, 3, 1, 2, 3]
>>> get_counts(data)
{1: 7, 2: 5, 3: 4, 6: 3, 7: 2, 8: 2, 10: 1, 5: 3, 4: 1}

Iterating over dictionaries

We're almost to the point where we can find the most frequent value.
Next, we need to go through all of the values in the dictionary to find the most frequent one.
there are many ways we could iterate over the things in a dictionary
- iterate over the values
- iterate over the keys
- iterate over the key/value pairs
which one is most common?
- since lookups are done based on the keys, iterating over the keys is the most common
by default, if you say:
```
for key in dictionary:
  ...
```
- key will get associated with each key in the dictionary in turn
once we have the key, we can use it to lookup the value associated with that key and do whatever we want with the pair
```
for key in dictionary:
  value = dictionary[key]
  ...
```
look at the print_counts function
- "\t" is the tab character
```
>>> data = read_numbers('numbers.txt')
>>> counts = get_counts(data)
>>> print_counts(counts)
1  7
2  5
3  4
6  3
7  2
8  2
10  1
5  3
4  1
```
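- Notice that the keys are not in numerical order. Since Python 3.7, a dictionary iterates over its keys in the order they were first inserted, so the order here follows the data rather than the size of the keys. Older versions of Python made no guarantee about the ordering at all, only that you'd iterate over all of the keys.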
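
Based on the output above, print_counts might look roughly like the following. This is a sketch that assumes the function takes the counts dictionary and prints one key and its value per line separated by a tab; the version in the course file may differ.

```python
def print_counts(counts):
    """Print each key and its count on one line, separated by a tab."""
    for key in counts:
        value = counts[key]  # look up the value that goes with this key
        print(str(key) + "\t" + str(value))


print_counts({1: 7, 2: 5, 10: 1})
```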
look at the get_most_frequent_value function
It looks very similar to the my_max function we wrote way back in lecture 6 (https://cs.pomona.edu/classes/cs51a/lectures/lec06.html)
- We keep a variable (max_value) that stores the largest value we've seen so far
  - We'll initialize it to -1 assuming that the numbers are all positive
  - See problem set 6 for a general solution
- We then iterate through each of the key/value pairs in our dictionary
  - We compare the value (i.e. counts[key]) to the largest value we've seen so far
  - If it's larger, we update max_value
- The only difference with my_max is that we want to return the key associated with the largest value
  - We need another variable (max_key) that stores this key
  - Whenever we update max_value, we also update max_key
```
>>> data = read_numbers('numbers.txt')
>>> get_most_frequent_value(data)
1
```
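
The steps described above could be sketched as follows. This illustrative version takes the dictionary of counts directly and uses a made-up name, most_frequent_key, so that it isn't confused with the course's get_most_frequent_value, whose exact signature and body may differ.

```python
def most_frequent_key(counts):
    """Return the key with the largest count (assumes counts are positive)."""
    max_value = -1  # largest count seen so far
    max_key = None  # key that goes with max_value

    for key in counts:
        if counts[key] > max_value:
            max_value = counts[key]
            max_key = key  # update the key whenever we update the value

    return max_key


print(most_frequent_key({1: 7, 2: 5, 3: 4}))  # 1
```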
It may also be useful to not only get the most frequent value, but also how frequent it is
Anytime you want to return more than one value from a function, a tuple is often a good option
Look at the get_most_frequent function
- only difference is that we return a tuple and also include the max_value
```
>>> data = read_numbers('numbers.txt')
>>> get_most_frequent(data)
(1, 7)
```
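
Returning a tuple could look like this. Again this is only a sketch that takes the counts dictionary directly, with a hypothetical name (most_frequent_pair); the course's get_most_frequent may be written differently.

```python
def most_frequent_pair(counts):
    """Return (key, count) for the key with the largest count."""
    max_value = -1
    max_key = None

    for key in counts:
        if counts[key] > max_value:
            max_value = counts[key]
            max_key = key

    return (max_key, max_value)  # two results packed into one tuple


# the caller can unpack the tuple into two variables
key, count = most_frequent_pair({1: 7, 2: 5, 3: 4})
print(key, count)  # 1 7
```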