CS150 - Fall 2013

CS150 - Fall 2013 - Class 14

exercise

admin
- sea and space images
- schedule for the next week

you can only put immutable objects in a set
   - any guesses as to why?
      - objects are kept track of based on their contents
      - if their contents change, there is no easy way to let the set know this
   - what can/can't we store in a set?
      - can store:
         - ints
         - floats
         - strings
         - bools
      - can't store:
         - lists
         - sets
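
   - a quick way to check the lists above for yourself: the sketch below tries to add a value to a set and reports whether it worked. The helper name can_store is made up for this example, and try/except is used only to catch the TypeError that Python raises for mutable objects. print is written with parentheses so the code runs in both Python 2 and Python 3.

```python
def can_store(value):
    """Return True if value can be added to a set, False otherwise."""
    test_set = set()
    try:
        test_set.add(value)
        return True
    except TypeError:
        # mutable objects such as lists and sets end up here
        return False

print(can_store(10))          # an int
print(can_store(3.5))         # a float
print(can_store("hello"))     # a string
print(can_store(True))        # a bool
print(can_store([1, 2, 3]))   # a list
print(can_store(set()))       # a set
```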

tuples
   - there are occasions when we want a list of things that is immutable
      - for example, if we want to keep track of a list of things in a set
   - a "tuple" is an immutable list
   - tuples can be created as literals using parentheses (instead of square brackets)
      >>> my_tuple = (1, 2, 3, 4)
      >>> my_tuple
      (1, 2, 3, 4)
      >>> another_tuple = ("a", "b", "c", "d")
      >>> another_tuple
      ('a', 'b', 'c', 'd')

      - notice that when they print out they are also shown using parentheses
   - tuples are sequential and have many of the same behaviors as lists
      >>> my_tuple[0]
      1
      >>> my_tuple[3]
      4
      >>> for i in range(len(my_tuple)):
      ...    print my_tuple[i]
      ...
      1
      2
      3
      4
      >>> my_tuple[1:3]
      (2, 3)
   - tuples are immutable!
      >>> my_tuple[0] = 1
      Traceback (most recent call last):
       File "<string>", line 1, in <fragment>
      TypeError: 'tuple' object does not support item assignment
      >>> my_tuple.append(1)
      Traceback (most recent call last):
       File "<string>", line 1, in <fragment>
      AttributeError: 'tuple' object has no attribute 'append'

   - what about?
      >>> my_tuple = another_tuple
      >>> my_tuple
      ('a', 'b', 'c', 'd')
      >>> another_tuple
      ('a', 'b', 'c', 'd')

      - this is perfectly legal. We're not mutating a tuple, just reassigning the variable my_tuple so that it refers to a different tuple
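
   - coming back to the reason we wanted tuples in the first place: because they are immutable, they can be stored in a set. A small sketch, where the variable names (points, pair) are made up for illustration:

```python
# a set of (x, y) points; each point is a tuple, so it can go in a set
points = set()
points.add((1, 2))
points.add((3, 4))
points.add((1, 2))    # already in the set, so nothing changes

print(len(points))
print((3, 4) in points)

# a list can't go in the set directly, but tuple() makes an immutable copy
pair = [5, 6]
points.add(tuple(pair))
print((5, 6) in points)
```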

generating histograms
   - we'd like to write a function that generates a histogram based on some input data
   - what is a histogram?
      - shows the "distribution" of the data (i.e. how often each value, or range of values, occurs)
      - often visualized as a bar chart
         - along the x axis are the values (or bins)
         - and the y axis shows the frequency of those values (or bins)
   - for example, run histogram.py code
      >>> data = [1, 1, 2, 3, 1, 5, 4, 2, 1]
      >>> print_counts(get_counts(data))

      - we can use Excel again to visualize this as a histogram
   - how can we do this?
      - we could do this like we did in assignment 5, where we sort and then count
      - but there's an easier way...
   - do it on paper: [1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5]
      - how did you do it?
         - kept a tally for each number
         - each time you saw a new number, added it to your list with a count of 1
         - if it was something you'd seen already, add another tally/count
      - key idea, keeping track of two things:
         - a key, which is the thing you're looking up
         - a value, which is associated with each key
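
   - the paper procedure can be written down almost literally using two parallel lists, one for the things seen so far and one for their tallies. This is only a sketch to make the key/value idea concrete (the function name tally is made up); it works, but looking up the position of each item in a list is slow for large data.

```python
def tally(data):
    """Tally data the way we did on paper, using two parallel lists."""
    keys = []      # each distinct thing we have seen so far
    counts = []    # counts[i] is the tally for keys[i]
    for item in data:
        if item in keys:
            # seen before: find its position and add one to its tally
            position = keys.index(item)
            counts[position] += 1
        else:
            # new item: add it with a count of 1
            keys.append(item)
            counts.append(1)
    return keys, counts

keys, counts = tally([1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5])
for i in range(len(keys)):
    print(str(keys[i]) + "   " + str(counts[i]))
```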

dictionaries (aka maps)
   - store keys and an associated value
      - each key is associated with a value
      - lookup can be done based on the key
      - this is a very common phenomenon in the real world. What are some examples?
         - social security number
            - key = social security number
            - value = name, address, etc
         - phone numbers in your phone (and phone directories in general)
            - key = name
            - value = phone number
         - websites
            - key = url
            - value = location of the computer that hosts this website
         - car license plates
            - key = license plate number
            - value = owner, type of car, ...
         - flight information
            - key = flight number
            - value = departure city, destination city, time, ...
   - like sets, dictionaries allow us to efficiently look up (and update) keys in the dictionary
   - creating new dictionaries
      - dictionaries can be created using curly braces
         >>> d = {}
         >>> d
         {}
      - dictionaries function similarly to lists, except we can put things in ANY index and can use non-numerical indices
         >>> d[15] = 1
         >>> d
         {15: 1}

         - notice when a dictionary is printed out, we get the key AND the associated value

         >>> d[100] = 10
         >>> d
         {100: 10, 15: 1}
         >>> my_list = []
         >>> my_list[15] = 1
         Traceback (most recent call last):
          File "<string>", line 1, in <fragment>
         IndexError: list assignment index out of range

         - dictionaries ARE very different than lists....
      - we can also update the values already in a dictionary
         >>> d[15] = 2
         >>> d
         {100: 10, 15: 2}
         >>> d[100] += 1
         >>> d
         {100: 11, 15: 2}
      - keys in the dictionary can be ANY immutable object
         >>> d2 = {}
         >>> d2["dave"] = 1
         >>> d2["anna"] = 1
         >>> d2["anna"] = 2
         >>> d2["seymore"] = 100
         >>> d2
         {'seymore': 100, 'dave': 1, 'anna': 2}
      - the values can be ANY object
         >>> d3 = {}
         >>> d3["dave"] = set()
         >>> d3["anna"] = set()
         >>> d3
         {'dave': set([]), 'anna': set([])}
         >>> d3["dave"].add(1)
         >>> d3["dave"].add(40)
         >>> d3["anna"].add("abcd")
         >>> d3
         {'dave': set([40, 1]), 'anna': set(['abcd'])}
      - be careful to put the key in the dictionary before trying to use it
         >>> d3["steve"]
         Traceback (most recent call last):
          File "<string>", line 1, in <fragment>
         KeyError: 'steve'
         >>> d3["steve"].add(1)
         Traceback (most recent call last):
          File "<string>", line 1, in <fragment>
         KeyError: 'steve'
      - how do you think we can create non-empty dictionaries from scratch?
         >>> another_dict = {"dave": 1, "anna":100, "seymore": 21}
         >>> another_dict
         {'seymore': 21, 'dave': 1, 'anna': 100}
      - what are some other methods you might want for dictionaries (things you might want to ask about them)?
         - does it have a particular key?
         - how many key/value pairs are in the dictionary?
         - what are all of the values in the dictionary?
         - what are all of the keys in the dictionary?
         - remove all of the items in the dictionary?
      - dictionaries support most of the other things you'd expect them to that we've seen in other data structures
         >>> "seymore" in another_dict
         True
         >>> len(another_dict)
         3
      - dictionaries are a class of objects, just like everything else we've seen (called dict ... short for dictionary)
         >>> help(dict)
      - some of the more relevant methods:
         >>> d2
         {'seymore': 100, 'dave': 1, 'anna': 2}
         >>> d2.values()
         [100, 1, 2]
         >>> d2.keys()
         ['seymore', 'dave', 'anna']
         >>> d2.clear()
         >>> d2
         {}
      - sometimes we want to delete a key/value pair: "del" does this
         >>> d
         {'pineapple': 1, 'apple': 10, 'pears': 15, 'banana': 3}
         >>> del d["pineapple"]
         >>> d
         {'apple': 10, 'pears': 15, 'banana': 3}
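
   - putting a few of these pieces together: one way to avoid the KeyError we saw with d3["steve"] is to check for the key with "in" and create the value first if it is missing. A minimal sketch, where the function name add_to_group is made up for illustration:

```python
def add_to_group(groups, key, item):
    """Add item to the set stored under key, creating the set if needed."""
    if key not in groups:
        # first time we've seen this key, so give it an empty set
        groups[key] = set()
    groups[key].add(item)

d3 = {}
add_to_group(d3, "dave", 1)
add_to_group(d3, "dave", 40)
add_to_group(d3, "steve", 1)    # no KeyError, even though "steve" is new

print(len(d3))
print(40 in d3["dave"])
print("anna" in d3)
```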

back to our histogram example
   - how could we use a dictionary to generate the counts?
      >>> data = [1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5]
      >>> print_counts(get_counts(data))
      1   3
      2   3
      3   2
      4   2
      5   2
   - first, we need to store them in a dictionary
      - look at the get_counts function in histogram.py code
         - creates an empty dictionary
         - iterates through the data
         - checks if each data item is already a key in the dictionary
            - if it is, just increment the count by 1
            - if it's not, add it to the dictionary with a count of 1
      - what types of things could we call get_counts on?
         - anything that is iterable!
            >>> get_counts(data)
            {1: 3, 2: 3, 3: 2, 4: 2, 5: 2}
            >>> get_counts("this is a string and strings are iterable")
            {'a': 4, ' ': 7, 'b': 1, 'e': 3, 'd': 1, 'g': 2, 'i': 5, 'h': 1, 'l': 1, 'n': 3, 's': 5, 'r': 4, 't': 4}
            >>> s = set([1, 2, 3, 4, 1, 2])
            >>> s
            set([1, 2, 3, 4])
            >>> get_counts(s)
            {1: 1, 2: 1, 3: 1, 4: 1}

            - though sets aren't that interesting :)
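
   - histogram.py is not reproduced in these notes, so the following is only a sketch of what get_counts might look like, based on the steps listed above:

```python
def get_counts(data):
    """Return a dictionary mapping each item in data to how often it occurs."""
    counts = {}
    for item in data:
        if item in counts:
            # seen before: increment the count
            counts[item] += 1
        else:
            # first occurrence: add it with a count of 1
            counts[item] = 1
    return counts

data = [1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5]
print(get_counts(data) == {1: 3, 2: 3, 3: 2, 4: 2, 5: 2})
```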
   - now that we have the dictionary of counts, how can we print them out?
      - there are many ways we could iterate over the things in a dictionary
         - iterate over the values
         - iterate over the keys
         - iterate over the key/value pairs
      - which one is most common?
         - since lookups are done based on the keys, iterating over the keys is the most common
      - look at print_counts function in histogram.py code
         - by default, if you say:

            for key in dictionary:
               ...

            key will take on each key in the dictionary, one per iteration.
         - this is the same as writing

            for key in dictionary.keys():
               ...

         - once we have the key, we can use it to look up the value associated with that key and do whatever we want with the pair
      - if you want to iterate over the values, use the values() method, which returns a list of the values
      - what if you want to iterate over the key/value pairs?
         - there is a method called items() that returns the key/value pairs as 2-tuples
            >>> my_dict = {"dave": 1, "anna": 15}
            >>> my_dict.items()
            [('dave', 1), ('anna', 15)]
         - how could we use this in a loop?

            for (key, value) in my_dict.items():
               print "Key: " + str(key)
               print "Value: " + str(value)
         - items() returns a list of 2-tuples, which we're iterating over
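
   - to compare the three ways of iterating side by side, the sketch below adds up the values in my_dict three times. The variable names are made up for illustration, and print is written with parentheses so the code runs in both Python 2 and Python 3.

```python
my_dict = {"dave": 1, "anna": 15}

# 1. iterate over the keys, then look up each value
total_from_keys = 0
for key in my_dict:
    total_from_keys += my_dict[key]

# 2. iterate over the values directly (the keys are not available here)
total_from_values = 0
for value in my_dict.values():
    total_from_values += value

# 3. iterate over the key/value pairs
total_from_items = 0
for (key, value) in my_dict.items():
    total_from_items += value

print(total_from_keys)
print(total_from_values)
print(total_from_items)
```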

   - does the following print the way you'd like it to?
      >>> print_counts(get_counts("this is some string"))
          3
      e   1
      g   1
      i   3
      h   1
      m   1
      o   1
      n   1
      s   4
      r   1
      t   2

      - prints in an arbitrary order, not sorted order
      - like the values in sets, there is NO inherent ordering to the keys in a dictionary
   - how could we print this in sorted order?
      - get the keys first
      - sort them
      - then use them to iterate over the data
   - look at print_counts_sorted in histogram.py code
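
   - since histogram.py is not shown here, this is only a sketch of how print_counts_sorted might follow the three steps above. The helper format_counts_sorted and the three-space separator are assumptions made for this example.

```python
def format_counts_sorted(counts):
    """Return one "key   count" line per key, in sorted key order."""
    keys = list(counts.keys())    # get the keys first
    keys.sort()                   # sort them
    lines = []
    for key in keys:              # then use them to look up the counts
        lines.append(str(key) + "   " + str(counts[key]))
    return lines

def print_counts_sorted(counts):
    """Print the counts with the keys in sorted order."""
    for line in format_counts_sorted(counts):
        print(line)

print_counts_sorted({"t": 2, "s": 4, "e": 1, "i": 3})
```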