CS51A - Spring 2019

CS51A - Spring 2019 - Class 13

Example code in this lecture

Lecture notes

admin
- mentor hours and office hours changing

write a function called read_numbers that takes a file of numbers (one per line) and generates a list consisting of the numbers in that file
   - look at read_numbers function in dictionaries.py code
      - if you're reading numbers, don't forget to turn them into ints using "int"

      >>> data = read_numbers('numbers.txt')
      >>> data
      [1, 2, 3, 2, 1, 1, 2, 6, 7, 8, 10, 1, 5, 5, 5, 3, 8, 6, 7, 6, 4, 1, 1, 2, 3, 1, 2, 3]
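
   - a minimal sketch of what read_numbers might look like (the version in dictionaries.py may differ; skipping blank lines is an assumption about the file format):

```python
def read_numbers(filename):
    """Return a list of the ints in filename, which holds one number per line."""
    numbers = []
    with open(filename) as file:
        for line in file:
            line = line.strip()   # drop the trailing newline
            if line:              # assumption: blank lines are ignored
                numbers.append(int(line))  # turn the text into an int
    return numbers
```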

what if we wanted to find the most frequent value in this data?
   - how would you do it?
   - do it on paper: [1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5]
      - how did you do it?
         - kept a tally of the number
         - each time you saw a new number, added it to your list with a count of 1
         - if it was something you'd seen already, add another tally/count
      - key idea, keeping track of two things:
         - a key, which is the thing you're looking up
         - a value, which is associated with each key

dictionaries (aka maps)
   - store keys and an associated value
      - each key is associated with a value
      - lookup can be done based on the key
      - this is a very common phenomenon in the real world. What are some examples?
         - social security number
            - key = social security number
            - value = name, address, etc
         - phone numbers in your phone (and phone directories in general)
            - key = name
            - value = phone number
         - websites
            - key = url
            - value = location of the computer that hosts this website
         - car license plates
            - key = license plate number
            - value = owner, type of car, ...
         - flight information
            - key = flight number
            - value = departure city, destination city, time, ...
   - creating new dictionaries
      - dictionaries can be created using curly braces
         >>> d = {}
         >>> d
         {}
      - dictionaries function similarly to lists, except we can put things in ANY index and can use non-numerical indices
         >>> d[15] = 1
         >>> d
         {15: 1}

         - notice when a dictionary is printed out, we get the key AND the associated value

         >>> d[100] = 10
         >>> d
         {100: 10, 15: 1}
         >>> my_list = []
         >>> my_list[15] = 1
         Traceback (most recent call last):
          File "<string>", line 1, in <fragment>
         IndexError: list assignment index out of range

         - dictionaries ARE very different than lists....
      - we can also update the values already in a dictionary
         >>> d[15] = 2
         >>> d
         {100: 10, 15: 2}
         >>> d[100] += 1
         >>> d
         {100: 11, 15: 2}
      - keys in the dictionary can be ANY immutable object
         >>> d2 = {}
         >>> d2["dave"] = 1
         >>> d2["anna"] = 1
         >>> d2["anna"] = 2
         >>> d2["seymore"] = 100
         >>> d2
         {'seymore': 100, 'dave': 1, 'anna': 2}
      - the values can be ANY object
         >>> d3 = {}
         >>> d3["dave"] = []
         >>> d3
         {'dave': []}
         >>> d3["dave"].append(1)
         >>> d3["dave"].append(40)
         >>> d3
         {'dave': [1, 40]}
      - be careful to put the key in the dictionary before trying to use it
         >>> d3["steve"]
         Traceback (most recent call last):
          File "<string>", line 1, in <fragment>
         KeyError: 'steve'
         >>> d3["steve"].append(1)
         Traceback (most recent call last):
          File "<string>", line 1, in <fragment>
         KeyError: 'steve'
      - how do you think we can create non-empty dictionaries from scratch?
         >>> another_dict = {"dave": 1, "anna":100, "seymore": 21}
         >>> another_dict
         {'seymore': 21, 'dave': 1, 'anna': 100}
      - what are some other methods you might want for dictionaries (things you might want to ask about them)?
         - does it have a particular key?
         - how many key/value pairs are in the dictionary?
         - what are all of the values in the dictionary?
         - what are all of the keys in the dictionary?
         - remove all of the items in the dictionary?
      - dictionaries support most of the other operations you'd expect from the data structures we've already seen
         >>> "seymore" in another_dict
         True
         >>> len(another_dict)
         3
      - dictionaries are a class of objects, just like everything else we've seen (called dict ... short for dictionary)
         >>> help(dict)
      - some of the more relevant methods:
         >>> d2
         {'seymore': 100, 'dave': 1, 'anna': 2}
         >>> d2.values()
         dict_values([100, 1, 2])
         >>> d2.keys()
         dict_keys(['seymore', 'dave', 'anna'])
         >>> d2.pop('seymore')
         100
         >>> d2
         {'dave': 1, 'anna': 2}
         >>> d2.clear()
         >>> d2
         {}
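
   - the KeyError we got from d3["steve"] can be avoided by checking for the key with "in" before using it, or by asking for the value with the get method, which returns a default instead of raising an error. A minimal sketch (the helper name add_score is hypothetical and not part of dictionaries.py):

```python
def add_score(scores, name, score):
    """Append score to the list stored under name, creating the list if needed."""
    if name not in scores:
        scores[name] = []   # put the key in the dictionary before using it
    scores[name].append(score)

d3 = {"dave": [1, 40]}
add_score(d3, "steve", 1)   # "steve" is not a key yet, so a new list is created
add_score(d3, "dave", 7)    # "dave" already exists, so we just append

print(d3)                   # {'dave': [1, 40, 7], 'steve': [1]} on Python 3.7+
print(d3.get("anna", []))   # [] because "anna" is missing and [] is the default
```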

generating counts
   - We're going to use dictionaries to store counts like we did on paper
   - Write a function called get_counts that takes a list of numbers and returns a dictionary containing the counts of each of the numbers
   - Key idea:

      def get_counts(numbers):
         d = {}

         for num in numbers:
            # do something here

         return d

   - There are two cases we need to contend with:
      1) if the number isn't in the dictionary

         - In this case we need to add it with the value 1

            d[num] = 1

      2) if the number is in the dictionary

         - In this case, we just need to increment it

            d[num] = d[num] + 1

         which can also be written

            d[num] += 1

   - Look at the get_counts function in dictionaries.py code

   - We now can generate the counts from our file

   >>> data = read_numbers('numbers.txt')
   >>> data
   [1, 2, 3, 2, 1, 1, 2, 6, 7, 8, 10, 1, 5, 5, 5, 3, 8, 6, 7, 6, 4, 1, 1, 2, 3, 1, 2, 3]
   >>> get_counts(data)
   {1: 7, 2: 5, 3: 4, 6: 3, 7: 2, 8: 2, 10: 1, 5: 3, 4: 1}
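
   - putting the two cases together gives one possible version of get_counts (a sketch; the version in dictionaries.py may be written differently). It is run here on the list we tallied on paper:

```python
def get_counts(numbers):
    """Return a dictionary mapping each number to how many times it occurs."""
    d = {}

    for num in numbers:
        if num in d:
            d[num] += 1   # case 2: already in the dictionary, increment it
        else:
            d[num] = 1    # case 1: not in the dictionary yet, add it with 1

    return d

paper_example = [1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5]
print(get_counts(paper_example))   # {1: 3, 2: 3, 3: 2, 5: 2, 4: 2} on Python 3.7+
```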

Iterating over dictionaries
   - We're almost to the point where we can find the most frequent value.
   - Next, we need to go through all of the values in the dictionary to find the most frequent one.

   - there are many ways we could iterate over the things in a dictionary
      - iterate over the values
      - iterate over the keys
      - iterate over the key/value pairs
   - which one is most common?
      - since lookups are done based on the keys, iterating over the keys is the most common
   - by default, if you say:

      for key in dictionary:
         ...

      key will get associated with each key in the dictionary.
   - once we have the key, we can use it to lookup the value associated with that key and do whatever we want with the pair
      for key in dictionary:
         value = dictionary[key]
         ...

   - look at the print_counts function in dictionaries.py code
      - "\t" is the tab character

      >>> data = read_numbers('numbers.txt')
      >>> counts = get_counts(data)
      >>> print_counts(counts)
      1   7
      2   5
      3   4
      6   3
      7   2
      8   2
      10   1
      5   3
      4   1

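      Notice that the keys are not in numerical order. A dictionary never sorts its keys: since Python 3.7, iteration follows insertion order (here, the order in which each number first appears in the data), and older versions make no guarantee about the ordering at all. Either way, you'll iterate over all of them.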
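
   - a sketch of how print_counts could be written by iterating over the keys (the version in dictionaries.py may differ). The second function, print_counts_sorted, is a hypothetical variant that is not in the lecture code; it shows one way to get the keys in numerical order by iterating over sorted(counts):

```python
def print_counts(counts):
    """Print each key and its count, separated by a tab, one pair per line."""
    for key in counts:
        value = counts[key]   # look up the value associated with this key
        print(str(key) + "\t" + str(value))

def print_counts_sorted(counts):
    """Hypothetical variant: same output, but with the keys in sorted order."""
    for key in sorted(counts):
        print(str(key) + "\t" + str(counts[key]))

counts = {1: 7, 10: 1, 4: 1}
print_counts_sorted(counts)   # prints the pairs for keys 1, 4, 10 in that order
```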

look at the get_most_frequent_value function in dictionaries.py code
   - Looks very similar to the my_max function we wrote in lecture8 (http://www.cs.pomona.edu/~dkauchak/classes/cs51a/lectures/lecture8-sequences.html)
      - We keep a variable (max_value) that stores the largest value we've seen so far
         - We'll initialize it to -1 assuming that the numbers are all positive
         - See problem set 6 for a general solution
      - We then iterate through each of the key/value pairs in our dictionary
         - We compare the value (i.e. counts[key]) to the largest value we've seen so far
         - If it's larger, we update max_value
      - The only difference with my_max is that we want to return the *key* associated with the largest value
         - We need another variable (max_key) that stores this key
         - Whenever we update max_value, we also update max_key

      >>> data = read_numbers('numbers.txt')
      >>> get_most_frequent_value(data)
      1
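
   - a sketch of get_most_frequent_value that follows the description above (the version in dictionaries.py may differ). Starting max_key at None and breaking ties in favor of the first key encountered are assumptions, and get_counts is repeated in a compact form so the example runs on its own:

```python
def get_counts(numbers):
    """Return a dictionary mapping each number to how many times it occurs."""
    d = {}
    for num in numbers:
        d[num] = d.get(num, 0) + 1   # get returns 0 for a number we haven't seen
    return d

def get_most_frequent_value(numbers):
    """Return the number that occurs most often in numbers."""
    counts = get_counts(numbers)

    max_value = -1    # every count is at least 1, so the first key always wins
    max_key = None    # assumption: None is returned for an empty list

    for key in counts:
        if counts[key] > max_value:
            max_value = counts[key]   # largest count seen so far
            max_key = key             # the key that goes with that count

    return max_key

data = [1, 2, 3, 2, 1, 1, 2, 6, 7, 8, 10, 1, 5, 5, 5, 3, 8, 6, 7, 6, 4, 1, 1, 2, 3, 1, 2, 3]
print(get_most_frequent_value(data))   # 1
```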

It may also be useful to not only get the most frequent value, but also how frequent it is
   - Anytime you want to return more than one value from a function, a tuple is often a good option
   - Look at the get_most_frequent function in dictionaries.py code
      - only difference is that we return a tuple and also include the max_value

      >>> data = read_numbers('numbers.txt')
      >>> get_most_frequent(data)
      (1, 7)
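
   - a sketch of the tuple-returning version (the version in dictionaries.py may differ; the counting is done inline here so the example runs on its own). Because the function returns a tuple, the caller can unpack the two results into separate variables:

```python
def get_most_frequent(numbers):
    """Return a tuple (most frequent number, how many times it occurs)."""
    counts = {}
    for num in numbers:
        counts[num] = counts.get(num, 0) + 1

    max_key = None    # assumption: (None, -1) is returned for an empty list
    max_value = -1

    for key in counts:
        if counts[key] > max_value:
            max_key = key
            max_value = counts[key]

    return (max_key, max_value)

data = [1, 2, 3, 2, 1, 1, 2, 6, 7, 8, 10, 1, 5, 5, 5, 3, 8, 6, 7, 6, 4, 1, 1, 2, 3, 1, 2, 3]
value, frequency = get_most_frequent(data)   # unpack the tuple
print(value, frequency)                      # 1 7
```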