CS150 - Fall 2013 - Class 14

  • exercise

  • admin
       - CS lunch on Thursday

  • you can only put immutable objects in a set
       - any guesses as to why?
          - objects are kept track of based on their contents
          - if their contents change, there is no easy way to let the set know this
       - what can/can't we store in a set?
          - can store:
             - ints
             - floats
             - strings
             - bools
          - can't store:
             - lists
             - sets
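       - a minimal sketch of the can/can't lists above (the variable names are made up for illustration; print(...) with one argument is used so it runs in both Python 2 and Python 3):

```python
# Immutable values can be added to a set without any trouble.
s = set()
s.add(7)         # int
s.add(2.5)       # float
s.add("hello")   # string
s.add(True)      # bool

# A mutable value such as a list is rejected with a TypeError.
try:
    s.add([1, 2, 3])
    list_was_added = True
except TypeError:
    list_was_added = False

print(len(s))            # 4
print(list_was_added)    # False
```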

  • tuples
       - there are occasions when we want to have a list of things, but it's immutable
          - for example, if we want to keep track of a list of things in a set
       - a "tuple" is an immutable list
       - tuples can be created as literals using parentheses (instead of square brackets)
          >>> my_tuple = (1, 2, 3, 4)
          >>> my_tuple
          (1, 2, 3, 4)
          >>> another_tuple = ("a", "b", "c", "d")
          >>> another_tuple
          ('a', 'b', 'c', 'd')

          - notice that when they print out they are also shown using parentheses
       - tuples are sequential and share many behaviors with lists
          >>> my_tuple[0]
          1
          >>> my_tuple[3]
          4
          >>> for i in range(len(my_tuple)):
          ...    print my_tuple[i]
          ...
          1
          2
          3
          4
          >>> my_tuple[1:3]
          (2, 3)
       - tuples are immutable!
          >>> my_tuple[0] = 1
          Traceback (most recent call last):
           File "<string>", line 1, in <fragment>
          TypeError: 'tuple' object does not support item assignment
          >>> my_tuple.append(1)
          Traceback (most recent call last):
           File "<string>", line 1, in <fragment>
          AttributeError: 'tuple' object has no attribute 'append'
       
       - what about?
          >>> my_tuple = another_tuple
          >>> my_tuple
          ('a', 'b', 'c', 'd')
          >>> another_tuple
          ('a', 'b', 'c', 'd')
          
          - this is perfectly legal. We're not mutating a tuple, just reassigning our variable
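
       - because tuples are immutable, they can go where lists cannot: inside a set. A small sketch (the name "points" is just an example):

```python
# Tuples are immutable, so they can be stored in a set.
points = set()
points.add((1, 2))
points.add((3, 4))
points.add((1, 2))   # same contents as the first tuple, so nothing new is added

print(len(points))         # 2
print((3, 4) in points)    # True
```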

  • generating histograms
       - we'd like to write a function that generates a histogram based on some input data
       - what is a histogram?
          - shows the "distribution" of the data (i.e. where the values range)
          - often visualized as a bar chart
             - along the x axis are the values (or bins)
             - and the y axis shows the frequency of those values (or bins)
       - for example, run histogram.py code
          >>> data = [1, 1, 2, 3, 1, 5, 4, 2, 1]
          >>> print_counts(get_counts(data))

          - we can use Excel again to visualize this as a histogram
       - how can we do this?
          - we could do this like we did in assignment 5, where we sort and then count
          - but there's an easier way...
       - do it on paper: [1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5]
          - how did you do it?
             - kept a tally of the number
             - each time you saw a new number, added it to your list with a count of 1
             - if it was something you'd seen already, add another tally/count
          - key idea, keeping track of two things:
             - a key, which is the thing you're looking up
             - a value, which is associated with each key
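
       - the paper procedure above can be sketched using only lists, with each tally kept as an [item, count] pair. This is an illustration, not the approach used in histogram.py, and the function name "tally" is made up. Notice that we have to search the whole list of pairs for every item:

```python
def tally(data):
    """Return a list of [item, count] pairs in the order items were first seen."""
    pairs = []
    for item in data:
        found = False
        for pair in pairs:
            if pair[0] == item:
                pair[1] += 1          # seen before: add another tally mark
                found = True
        if not found:
            pairs.append([item, 1])   # new item: start its tally at 1
    return pairs

print(tally([1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5]))
# [[1, 3], [2, 3], [3, 2], [5, 2], [4, 2]]
```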

  • dictionaries (aka maps)
       - store keys and an associated value
          - each key is associated with a value
          - lookup can be done based on the key
          - this is a very common phenomenon in the real world. What are some examples?
             - social security number
                - key = social security number
                - value = name, address, etc
             - phone numbers in your phone (and phone directories in general)
                - key = name
                - value = phone number
             - websites
                - key = url
                - value = location of the computer that hosts this website
             - car license plates
                - key = license plate number
                - value = owner, type of car, ...
             - flight information
                - key = flight number
                - value = departure city, destination city, time, ...
       - like sets, dictionaries allow us to efficiently look up (and update) keys in the dictionary
       - creating new dictionaries
          - dictionaries can be created using curly braces
             >>> d = {}
             >>> d
             {}
          - dictionaries function similarly to lists, except we can put things in ANY index and can use non-numerical indices
             >>> d[15] = 1
             >>> d
             {15: 1}
             
             - notice when a dictionary is printed out, we get the key AND the associated value

             >>> d[100] = 10
             >>> d
             {100: 10, 15: 1}
             >>> my_list = []
             >>> my_list[15] = 1
             Traceback (most recent call last):
              File "<string>", line 1, in <fragment>
             IndexError: list assignment index out of range

             - dictionaries ARE very different than lists....
          - we can also update the values already in a dictionary
             >>> d[15] = 2
             >>> d
             {100: 10, 15: 2}
             >>> d[100] += 1
             >>> d
             {100: 11, 15: 2}
          - keys in the dictionary can be ANY immutable object
             >>> d2 = {}
             >>> d2["dave"] = 1
             >>> d2["anna"] = 1
             >>> d2["anna"] = 2
             >>> d2["seymore"] = 100
             >>> d2
             {'seymore': 100, 'dave': 1, 'anna': 2}
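             - a sketch of what "ANY immutable object" means for keys: a tuple works as a key, a list does not (the name "grid" and the values are made up for illustration):

```python
grid = {}
grid[(0, 0)] = "start"   # a tuple is immutable, so it can be a key
grid[(2, 3)] = "end"

# A list is mutable, so using one as a key raises a TypeError.
try:
    grid[[1, 1]] = "middle"
    list_key_ok = True
except TypeError:
    list_key_ok = False

print(grid[(2, 3)])    # end
print(list_key_ok)     # False
```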
          - the values can be ANY object
             >>> d3 = {}
             >>> d3["dave"] = set()
             >>> d3["anna"] = set()
             >>> d3
             {'dave': set([]), 'anna': set([])}
             >>> d3["dave"].add(1)
             >>> d3["dave"].add(40)
             >>> d3["anna"].add("abcd")
             >>> d3
             {'dave': set([40, 1]), 'anna': set(['abcd'])}
          - be careful to put the key in the dictionary before trying to use it
             >>> d3["steve"]
             Traceback (most recent call last):
              File "<string>", line 1, in <fragment>
             KeyError: 'steve'
             >>> d3["steve"].add(1)
             Traceback (most recent call last):
              File "<string>", line 1, in <fragment>
             KeyError: 'steve'
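             - one way to avoid the KeyError above, sketched with the "in" operator: check for the key and add it first if it is missing

```python
d3 = {"dave": set(), "anna": set()}

name = "steve"
if name not in d3:
    d3[name] = set()   # add the key with an empty set before using it
d3[name].add(1)        # safe now: the key is guaranteed to exist

print(name in d3)      # True
print(len(d3[name]))   # 1
```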
          - how do you think we can create non-empty dictionaries from scratch?
             >>> another_dict = {"dave": 1, "anna":100, "seymore": 21}
             >>> another_dict
             {'seymore': 21, 'dave': 1, 'anna': 100}
          - what are some other methods you might want for dictionaries (things you might want to ask about them)?
             - does it have a particular key?
             - how many key/value pairs are in the dictionary?
             - what are all of the values in the dictionary?
             - what are all of the keys in the dictionary?
             - remove all of the items in the dictionary?
          - dictionaries support most of the other things you'd expect them to that we've seen in other data structures
             >>> "seymore" in another_dict
             True
             >>> len(another_dict)
             3
          - dictionaries are a class of objects, just like everything else we've seen (called dict ... short for dictionary)
             >>> help(dict)
          - some of the more relevant methods:
             >>> d2
             {'seymore': 100, 'dave': 1, 'anna': 2}
             >>> d2.values()
             [100, 1, 2]
             >>> d2.keys()
             ['seymore', 'dave', 'anna']
             >>> d2.clear()
             >>> d2
             {}
          - sometimes we want to delete a key/value pair: "del" does this
             >>> d
             {'pineapple': 1, 'apple': 10, 'pears': 15, 'banana': 3}
             >>> del d["pineapple"]
             >>> d
             {'apple': 10, 'pears': 15, 'banana': 3}

  • back to our histogram example
       - how could we use a dictionary to generate the counts?
          >>> data = [1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5]
          >>> print_counts(get_counts(data))
          1   3
          2   3
          3   2
          4   2
          5   2
       - first, we need to store them in a dictionary
          - look at the get_counts function in histogram.py code
             - creates an empty dictionary
             - iterates through the data
             - check if the data is in the dictionary already
                - if it is, just increment the count by 1
                - if it's not, add it to the dictionary with a count of 1
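             - the code in histogram.py is not reproduced in these notes, so the following is only a sketch of the steps just listed; the real get_counts may differ in its details

```python
def get_counts(data):
    """Return a dictionary mapping each item in data to how many times it occurs."""
    counts = {}                  # start with an empty dictionary
    for item in data:            # works for any iterable
        if item in counts:
            counts[item] += 1    # seen before: increment its count
        else:
            counts[item] = 1     # first time: add it with a count of 1
    return counts
```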
          - what types of things could we call get_counts on?
             - anything that is iterable!
                >>> get_counts(data)
                {1: 3, 2: 3, 3: 2, 4: 2, 5: 2}
                >>> get_counts("this is a string and strings are iterable")
                {'a': 4, ' ': 7, 'b': 1, 'e': 3, 'd': 1, 'g': 2, 'i': 5, 'h': 1, 'l': 1, 'n': 3, 's': 5, 'r': 4, 't': 4}
                >>> s = set([1, 2, 3, 4, 1, 2])
                >>> s
                set([1, 2, 3, 4])
                >>> get_counts(s)
                {1: 1, 2: 1, 3: 1, 4: 1}

                - though sets aren't that interesting :)
       - now that we have the dictionary of counts, how can we print them out?
          - there are many ways we could iterate over the things in a dictionary
             - iterate over the values
             - iterate over the keys
             - iterate over the key/value pairs
          - which one is most common?
             - since lookups are done based on the keys, iterating over the keys is the most common
          - look at print_counts function in histogram.py code
             - by default, if you say:

                for key in dictionary:
                   ...

                key will get associated with each key in the dictionary.
             - this is the same as writing
             
                for key in dictionary.keys():
                   ...

             - once we have the key, we can use it to look up the value associated with that key and do whatever we want with the pair
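             - a sketch of what a print_counts function along these lines might look like; the version in histogram.py is not shown here, and the tab separator is an assumption (print(...) with one argument is used so it runs in both Python 2 and Python 3)

```python
def print_counts(counts):
    """Print each key and its count on its own line, separated by a tab."""
    for key in counts:                 # iterates over the keys of the dictionary
        value = counts[key]            # look up the value for this key
        print(str(key) + "\t" + str(value))

print_counts({"a": 2})
```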
          - if you want to iterate over the values, use the values() method, which returns a list of the values
          - what if you want to iterate over the key/value pairs?
             - there is a method called items() that returns the key/value pairs as 2-tuples
                >>> my_dict = {"dave": 1, "anna": 15}
                >>> my_dict.items()
                [('dave', 1), ('anna', 15)]
             - how could we use this in a loop?
                
                for (key, value) in my_dict.items():
                   print "Key: " + str(key)
                   print "Value: " + str(value)
             - items() returns a list of 2-tuples, which we're iterating over

       - Does the following print like you'd like it to?
          >>> print_counts(get_counts("this is some string"))
              3
          e   1
          g   1
          i   3
          h   1
          m   1
          o   1
          n   1
          s   4
          r   1
          t   2

          - prints in an arbitrary (seemingly random) order
          - like the values in sets, there is NO inherent ordering to the keys in a dictionary
       - how could we print this in sorted order?
          - get the keys first
          - sort them
          - then use them to iterate over the data
       - look at print_counts_sorted in histogram.py code
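          - the three steps above can be sketched as follows; this is an illustration rather than the code from histogram.py, and the tab separator is an assumption

```python
def print_counts_sorted(counts):
    """Print each key and its count, with the keys in sorted order."""
    keys = list(counts.keys())   # get the keys first
    keys.sort()                  # sort them
    for key in keys:             # then use them to look up each count
        print(str(key) + "\t" + str(counts[key]))

print_counts_sorted({"t": 2, "e": 1, "s": 4})
```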