CS51A - Spring 2019 - Class 13

Example code in this lecture

   dictionaries.py

Lecture notes

  • admin
       - mentor hours and office hours changing

  • write a function called read_numbers that takes a file of numbers (one per line) and generates a list consisting of the numbers in that file
       - look at read_numbers function in dictionaries.py code
          - if you're reading numbers, don't forget to turn them into ints using "int"

          >>> data = read_numbers('numbers.txt')
          >>> data
          [1, 2, 3, 2, 1, 1, 2, 6, 7, 8, 10, 1, 5, 5, 5, 3, 8, 6, 7, 6, 4, 1, 1, 2, 3, 1, 2, 3]
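
       - a minimal sketch of what read_numbers might look like (the version in dictionaries.py may differ). Skipping blank lines and using a temporary demo file are assumptions made here so the example runs on its own:

```python
import os
import tempfile


def read_numbers(filename):
    """Return a list of the ints in filename (one number per line)."""
    numbers = []
    with open(filename) as file:
        for line in file:
            line = line.strip()
            if line:  # assumption: silently skip blank lines
                numbers.append(int(line))  # each line is a string until we call int
    return numbers


# Demo only: write a small temporary file so the example does not need numbers.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1\n2\n3\n2\n")
    demo_path = tmp.name

demo_data = read_numbers(demo_path)
os.remove(demo_path)
print(demo_data)  # [1, 2, 3, 2]
```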

  • what if we wanted to find the most frequent value in this data?
       - how would you do it?
       - do it on paper: [1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5]
          - how did you do it?
             - kept a tally for each number
             - each time you saw a new number, added it to your list with a count of 1
             - if it was something you'd seen already, add another tally/count
          - key idea, keeping track of two things:
             - a key, which is the thing you're looking up
             - a value, which is associated with each key

  • dictionaries (aka maps)
       - store keys and an associated value
          - each key is associated with a value
          - lookup can be done based on the key
          - this is a very common phenomenon in the real world. What are some examples?
             - social security number
                - key = social security number
                - value = name, address, etc
             - phone numbers in your phone (and phone directories in general)
                - key = name
                - value = phone number
             - websites
                - key = url
                - value = location of the computer that hosts this website
             - car license plates
                - key = license plate number
                - value = owner, type of car, ...
             - flight information
                - key = flight number
                - value = departure city, destination city, time, ...
       - creating new dictionaries
          - dictionaries can be created using curly braces
             >>> d = {}
             >>> d
             {}
          - dictionaries function similarly to lists, except we can put things in ANY index and can use non-numerical indices
             >>> d[15] = 1
             >>> d
             {15: 1}
             
             - notice when a dictionary is printed out, we get the key AND the associated value

             >>> d[100] = 10
             >>> d
             {100: 10, 15: 1}
             >>> my_list = []
             >>> my_list[15] = 1
             Traceback (most recent call last):
              File "<string>", line 1, in <fragment>
             IndexError: list assignment index out of range

             - dictionaries ARE very different than lists....
          - we can also update the values already in a dictionary
             >>> d[15] = 2
             >>> d
             {100: 10, 15: 2}
             >>> d[100] += 1
             >>> d
             {100: 11, 15: 2}
          - keys in the dictionary can be ANY immutable object
             >>> d2 = {}
             >>> d2["dave"] = 1
             >>> d2["anna"] = 1
             >>> d2["anna"] = 2
             >>> d2["seymore"] = 100
             >>> d2
             {'seymore': 100, 'dave': 1, 'anna': 2}
          - the values can be ANY object
             >>> d3 = {}
             >>> d3["dave"] = []
             >>> d3
             {'dave': []}
             >>> d3["dave"].append(1)
             >>> d3["dave"].append(40)
             >>> d3
             {'dave': [1, 40]}
          - be careful to put the key in the dictionary before trying to use it
             >>> d3["steve"]
             Traceback (most recent call last):
              File "<string>", line 1, in <fragment>
             KeyError: 'steve'
             >>> d3["steve"].append(1)
             Traceback (most recent call last):
              File "<string>", line 1, in <fragment>
             KeyError: 'steve'
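
          - two common ways to avoid the KeyError, shown as a small sketch (the names d3, missing, and fallback are just for illustration): check for the key with "in" first, or use the dict method get, which returns a default instead of raising an error

```python
d3 = {"dave": [1, 40]}

# Option 1: make sure the key is in the dictionary before using it.
if "steve" not in d3:
    d3["steve"] = []
d3["steve"].append(1)

# Option 2: get returns a default (None unless one is given) for a missing key.
# Note that get does NOT add the key to the dictionary.
missing = d3.get("anna")
fallback = d3.get("anna", [])

print(d3)        # {'dave': [1, 40], 'steve': [1]}
print(missing)   # None
print(fallback)  # []
```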
          - how do you think we can create non-empty dictionaries from scratch?
             >>> another_dict = {"dave": 1, "anna":100, "seymore": 21}
             >>> another_dict
             {'seymore': 21, 'dave': 1, 'anna': 100}
          - what are some other methods you might want for dictionaries (things you might want to ask about them)?
             - does it have a particular key?
             - how many key/value pairs are in the dictionary?
             - what are all of the values in the dictionary?
             - what are all of the keys in the dictionary?
             - remove all of the items in the dictionary?
          - dictionaries support most of the other things you'd expect them to, which we've seen in other data structures
             >>> "seymore" in another_dict
             True
             >>> len(another_dict)
             3
          - dictionaries are a class of objects, just like everything else we've seen (called dict ... short for dictionary)
             >>> help(dict)
          - some of the more relevant methods:
             >>> d2
             {'seymore': 100, 'dave': 1, 'anna': 2}
             >>> d2.values()
             dict_values([100, 1, 2])
             >>> d2.keys()
             dict_keys(['seymore', 'dave', 'anna'])
             >>> d2.pop('seymore')
             100
             >>> d2
             {'dave': 1, 'anna': 2}
             >>> d2.clear()
             >>> d2
             {}

  • generating counts
       - We're going to use dictionaries to store counts like we did on paper
       - Write a function called get_counts that takes a list of numbers and returns a dictionary containing the counts of each of the numbers
       - Key idea:

          def get_counts(numbers):
             d = {}

             for num in numbers:
                # do something here

             return d

       - There are two cases we need to contend with:
          1) if the number isn't in the dictionary

             - In this case we need to add it with the value 1

                d[num] = 1

          2) if the number is in the dictionary

             - In this case, we just need to increment it

                d[num] = d[num] + 1

             which can also be written

                d[num] += 1

       - Look at the get_counts function in dictionaries.py code
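
       - Putting the two cases together gives something like the sketch below (the version in dictionaries.py may differ in its details). It is checked here against the list we tallied on paper:

```python
def get_counts(numbers):
    """Return a dictionary mapping each number to how many times it occurs."""
    d = {}

    for num in numbers:
        if num in d:
            # case 2: we've seen this number before, so increment its count
            d[num] += 1
        else:
            # case 1: first time we've seen this number
            d[num] = 1

    return d


paper_counts = get_counts([1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5])
print(paper_counts == {1: 3, 2: 3, 3: 2, 5: 2, 4: 2})  # True
```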

       - We now can generate the counts from our file

       >>> data = read_numbers('numbers.txt')
       >>> data
       [1, 2, 3, 2, 1, 1, 2, 6, 7, 8, 10, 1, 5, 5, 5, 3, 8, 6, 7, 6, 4, 1, 1, 2, 3, 1, 2, 3]
       >>> get_counts(data)
       {1: 7, 2: 5, 3: 4, 6: 3, 7: 2, 8: 2, 10: 1, 5: 3, 4: 1}

  • Iterating over dictionaries
       - We're almost to the point where we can find the most frequent value.
       - Next, we need to go through all of the values in the dictionary to find the most frequent one.

       - there are many ways we could iterate over the things in a dictionary
          - iterate over the values
          - iterate over the keys
          - iterate over the key/value pairs
       - which one is most common?
          - since lookups are done based on the keys, iterating over the keys is the most common
       - by default, if you say:

          for key in dictionary:
             ...

          key will get associated with each key in the dictionary.
       - once we have the key, we can use it to lookup the value associated with that key and do whatever we want with the pair
          for key in dictionary:
             value = dictionary[key]
             ...

       - look at the print_counts function in dictionaries.py code
          - "\t" is the tab character
          
          >>> data = read_numbers('numbers.txt')
          >>> counts = get_counts(data)
          >>> print_counts(counts)
          1   7
          2   5
          3   4
          6   3
          7   2
          8   2
          10   1
          5   3
          4   1

          Notice that the keys are not in numerical order: dictionaries do not sort their keys. Python 3.7 and later iterate in the order the keys were inserted, while older versions make no guarantee about the ordering, only that you'll iterate over all of them.
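
       - a sketch of how print_counts could be written by iterating over the keys (the version in dictionaries.py may differ). The second function, print_counts_items, is a hypothetical variant added here to show iterating over key/value pairs with the dict method items:

```python
def print_counts(counts):
    """Print each key and its count on one line, separated by a tab."""
    for key in counts:
        value = counts[key]  # look up the value associated with this key
        print(str(key) + "\t" + str(value))


def print_counts_items(counts):
    """Hypothetical variant: items() gives us each key/value pair directly."""
    for key, value in counts.items():
        print(str(key) + "\t" + str(value))


example_counts = {1: 3, 2: 3, 5: 2}
print_counts(example_counts)
print_counts_items(example_counts)
```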

  • look at the get_most_frequent_value function in dictionaries.py code
       - Looks very similar to the my_max function we wrote in lecture8 (http://www.cs.pomona.edu/~dkauchak/classes/cs51a/lectures/lecture8-sequences.html)
          - We keep a variable (max_value) that stores the largest value we've seen so far
             - We'll initialize it to -1 assuming that the numbers are all positive
             - See problem set 6 for a general solution
          - We then iterate through each of the key/value pairs in our dictionary
             - We compare the value (i.e. counts[key]) to the largest value we've seen so far
             - If it's larger, we update max_value
          - The only difference with my_max is that we want to return the *key* associated with the largest value
             - We need another variable (max_key) that stores this key
             - Whenever we update max_value, we also update max_key

          >>> data = read_numbers('numbers.txt')
          >>> get_most_frequent_value(data)
          1
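
       - a sketch of the idea described above, assuming the function takes the list of numbers and builds the counts itself (the version in dictionaries.py may differ). Starting max_key at None is an assumption made for this sketch:

```python
def get_counts(numbers):
    """Return a dictionary mapping each number to how many times it occurs."""
    d = {}
    for num in numbers:
        if num in d:
            d[num] += 1
        else:
            d[num] = 1
    return d


def get_most_frequent_value(numbers):
    """Return the number that occurs most often in numbers."""
    counts = get_counts(numbers)

    max_value = -1   # largest count seen so far
    max_key = None   # assumption: None means "nothing seen yet"

    for key in counts:
        if counts[key] > max_value:
            # whenever we update max_value, also remember which key it belongs to
            max_value = counts[key]
            max_key = key

    return max_key


print(get_most_frequent_value([1, 2, 3, 2, 1, 1]))  # 1
```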

  • It may also be useful to not only get the most frequent value, but also how frequent it is
       - Anytime you want to return more than one value from a function, a tuple is often a good option
       - Look at the get_most_frequent function in dictionaries.py code
          - only difference is that we return a tuple and also include the max_value

          >>> data = read_numbers('numbers.txt')
          >>> get_most_frequent(data)
          (1, 7)
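
       - a sketch of the tuple-returning version under the same assumptions as before (the version in dictionaries.py may differ). It is run here on the numbers shown earlier for numbers.txt, typed in directly so the example does not need the file:

```python
def get_counts(numbers):
    """Return a dictionary mapping each number to how many times it occurs."""
    d = {}
    for num in numbers:
        if num in d:
            d[num] += 1
        else:
            d[num] = 1
    return d


def get_most_frequent(numbers):
    """Return a tuple (most frequent number, how many times it occurs)."""
    counts = get_counts(numbers)

    max_value = -1
    max_key = None  # assumption: None means "nothing seen yet"

    for key in counts:
        if counts[key] > max_value:
            max_value = counts[key]
            max_key = key

    return (max_key, max_value)


data = [1, 2, 3, 2, 1, 1, 2, 6, 7, 8, 10, 1, 5, 5, 5, 3, 8, 6, 7, 6, 4, 1, 1, 2, 3, 1, 2, 3]

# A tuple can be unpacked into separate variables.
value, count = get_most_frequent(data)
print(value, count)  # 1 7
```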