CS150 - Fall 2011 - Class 10

  • Pi video
       http://www.youtube.com/user/Vihart#p/u/9/jG7vhMMXagQ

  • admin
       - No lab prep for Friday
       - I did add a few pages to read from the book about file I/O
       - Lab on Friday will be partnered
       - Test project 1 out today
          - due next Friday (10/21) at 6pm
          - honor code
             - must work alone
             - may only use: book, your notes, class notes, python.org documentation
             - may NOT: get help from other students, get help from the tutors (except for file issues, etc), look online for solutions
          - 3 problems
          - required to do some extra credit
             - 63 points total, but only 60 for just doing what I stated
          - More than a third of the points come from code style and commenting
          - follow instructions carefully!

  • problem set, problem 1c.
       - what does it do?
       - how does it work?

  • aliasing
       - what will be the output of my_list after doing the following:

          >>> my_list = [1, 2, 3, 4, 5]
          >>> other_list = my_list
          >>> other_list[2] = 100
          >>> other_list
          [1, 2, 100, 4, 5]
          >>> my_list

       - [1, 2, 100, 4, 5] ... why?
          - my_list and other_list are just references to the same object
             - this is called aliasing, since other_list is an alias (another name) for my_list
          - saying other_list = my_list does not copy anything; it does NOT create a new list, it just gives the same list a second name
          - draw a picture
       
       - notice that if I make changes to either one, changes will be seen in the other
          >>> my_list
          [1, 2, 100, 4, 5]
          >>> other_list
          [1, 2, 100, 4, 5]
          >>> my_list[0] = 0
          >>> other_list[1] = 1000
          >>> my_list
          [0, 1000, 100, 4, 5]
          >>> other_list
          [0, 1000, 100, 4, 5]      
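To check whether two names refer to the very same object, you can use the `is` operator (or compare `id()` values). The short sketch below contrasts an alias with a separate list built by `list(...)`. The variable names `same_object`, `separate`, etc. are just for illustration. It is written so it should run under both Python 2 and Python 3.

```python
# Two names bound to the same list object (aliasing).
my_list = [1, 2, 3, 4, 5]
other_list = my_list                         # no new list is created here

# "is" checks whether two names refer to the very same object.
same_object = other_list is my_list          # True
same_id = id(other_list) == id(my_list)      # True: same identity

# list(...) builds a brand-new list with equal contents.
separate = list(my_list)
equal_contents = separate == my_list         # True: == compares contents
different_object = separate is not my_list   # True: a different object

separate[0] = 0                              # changing the new list...
print(my_list)                               # ...leaves the original alone: [1, 2, 3, 4, 5]
```

Note the difference: `==` asks "same contents?", while `is` asks "same object?".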

       - aliasing can also show up in other places
          >>> my_list = [1, 2, 3, 4, 5]
          >>> def mystery(x):
          ...    x[0] = 1000
          ...
          >>> my_list
          [1, 2, 3, 4, 5]
          >>> mystery(my_list)
          >>> my_list
          [1000, 2, 3, 4, 5]

       - parameters are passed as a reference to the same object, and no copy is made (i.e. the parameter is an alias for the argument)
          - "parameter passing" describes how the values that are input to the function (i.e. the arguments) are bound to the parameters inside the function
          - be careful!
          - why do you think this is done?
             - a deep copy can be a lot of work
             - also allows us to write functions that manipulate the parameter (which we may or may not do)
          - notice that the function cannot change what my_list references (it can only mutate the object); assigning a new value to the parameter just rebinds the local name
          
             def mystery(alist):
                alist = [0]*10
                print alist

             >>> my_list = [1, 2, 3, 4, 5]
             >>> mystery(my_list)
             [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
             >>> my_list
             [1, 2, 3, 4, 5]      
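Because the parameter is an alias, a function has a design choice: mutate the caller's list, or build and return a new one. The sketch below shows both styles. `double_in_place` and `doubled_copy` are hypothetical helpers made up for illustration, not part of the course code.

```python
def double_in_place(alist):
    """Hypothetical helper: double every element; the caller's list is changed."""
    for i in range(len(alist)):
        alist[i] = alist[i] * 2      # mutates the shared list object
    # no return statement, so the function returns None


def doubled_copy(alist):
    """Hypothetical helper: return a new doubled list; alist is untouched."""
    result = []
    for value in alist:
        result.append(value * 2)
    return result


my_list = [1, 2, 3]
new_list = doubled_copy(my_list)     # my_list is still [1, 2, 3]
double_in_place(my_list)             # now my_list is [2, 4, 6]
```

If a function mutates its argument, it is worth saying so in its comment, since the caller will see the change.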

       - slicing does create a new copy
          >>> my_list = [1, 2, 3, 4, 5]
          >>> other_list = my_list[2:4]
          >>> other_list
          [3, 4]
          >>> other_list[0] = 100
          >>> other_list
          [100, 4]
          >>> my_list
          [1, 2, 3, 4, 5]
       
       - given this, how could we create a copy of my_list?
          >>> my_list = [1, 2, 3, 4, 5]
          >>> other_list = my_list[:]
          >>> other_list[3] = 100
          >>> other_list
          [1, 2, 3, 100, 5]
          >>> my_list
          [1, 2, 3, 4, 5]
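Strictly speaking, `my_list[:]` makes a *shallow* copy: a new outer list whose elements are the same objects as before. For a list of numbers that is all you need, but if the elements are themselves lists, the inner lists are still shared. The standard library's `copy.deepcopy` copies the nested lists too, as this small sketch shows (`grid` is just an example name).

```python
import copy

grid = [[1, 2], [3, 4]]          # a list whose elements are lists

shallow = grid[:]                # new outer list, same inner lists
deep = copy.deepcopy(grid)       # new outer list AND new inner lists

grid[0][0] = 100                 # mutate one of the shared inner lists

print(shallow)   # [[100, 2], [3, 4]]  (inner list is shared with grid)
print(deep)      # [[1, 2], [3, 4]]    (fully independent)
```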


  • run the sentence_stats function from word-stats.py code
       - similar idea to our scores functions except now we're doing it over strings instead of numbers
       - the string class has a "split" method that splits a sentence into a list of words, splitting on whitespace (spaces, tabs, newlines)
          
          >>> "this is a sentence".split()
          ['this', 'is', 'a', 'sentence']
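With no argument, `split()` treats any run of whitespace as one separator and ignores leading/trailing whitespace; splitting on a single space `" "` behaves differently. A quick sketch (the sample strings are made up):

```python
text = "  this   is a\tsentence \n"

# No argument: split on any run of whitespace (spaces, tabs, newlines),
# ignoring whitespace at the start and end.
words = text.split()
print(words)        # ['this', 'is', 'a', 'sentence']

# Splitting on a single space: consecutive spaces produce empty strings,
# and tabs/newlines are not treated as separators.
pieces = "a  b".split(" ")
print(pieces)       # ['a', '', 'b']
```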

       - optionally, you can specify what to split on (though this is less common)

          >>> "this is a sentence".split("s")
          ['thi', ' i', ' a ', 'entence']
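The actual `sentence_stats` code in word-stats.py is not reproduced here, so the following is only a sketch of how such a function *might* look, assuming it reports the word count, longest word, shortest word, and average word length. The name `sentence_stats_sketch` is hypothetical, and it assumes the sentence contains at least one word.

```python
def sentence_stats_sketch(sentence):
    """Hypothetical helper: return (word count, longest word,
    shortest word, average word length) for a sentence.

    Assumes the sentence contains at least one word.
    """
    words = sentence.split()            # split on whitespace
    longest = words[0]
    shortest = words[0]
    total_length = 0
    for word in words:
        if len(word) > len(longest):
            longest = word
        if len(word) < len(shortest):
            shortest = word
        total_length = total_length + len(word)
    # float() avoids integer division under Python 2
    average = float(total_length) / len(words)
    return len(words), longest, shortest, average


print(sentence_stats_sketch("this is a sentence"))   # (4, 'sentence', 'a', 3.75)
```

Like our scores functions, it walks the list once, keeping track of the best-so-far values and a running total.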

  • files
       - what is a file?
          - a chunk of data stored on the hard disk
       - why do we need files?
          - hard drives keep their data whether or not the power is on
          - when a program is running, all the data it is generating/processing is in main memory (e.g. RAM)
             - main memory is faster, but doesn't persist when the power goes off

  • reading files
       - to read a file in Python we first need to open it

          file = open("some_file_name", "r")

          - open is another function that takes two parameters
          - the first parameter is a string identifying the filename
             - be careful about the path/directory. Python looks for a relative filename in the current working directory (usually the directory the program was run from, which is often the same directory as the .py file) unless you give it a full path
          - the second parameter is another string telling Python what you want to do with the file
             - "r" stands for "read", that is, we're going to read some data from the file
          - open returns a "file" object that we can use later on for reading purposes
             - above, I've saved that in a variable called "file", but I could have called it anything else

             >>> open("english.txt", "r")
             <open file 'english.txt', mode 'r' at 0x10120a030>
             >>> type(open("english.txt", "r"))
             <type 'file'>
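If `open` complains that it cannot find a file, it can help to see where Python is actually looking. This sketch uses the standard `os` module to print the full path that a relative name like "english.txt" resolves to; it does not open or create the file.

```python
import os

# Relative file names are resolved against the current working directory.
cwd = os.getcwd()
full_path = os.path.abspath("english.txt")   # does not open or create the file
print(full_path)    # e.g. <current working directory>/english.txt
```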

       - once we have a file open, we can read a line at a time from the file using a for loop:

          for <variable> in <file_variable>:
             # do something

          - the loop body runs once for each line in the file
          - each time through, the variable is assigned the next line in the file
             - the line will be of type string
             - the line will also usually end with a newline character ("\n"), which you'll often want to get rid of (the string strip() method is good for this, though note it also removes leading/trailing spaces)
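Here is a small, self-contained sketch of the for-loop pattern. Since "english.txt" may not be available, it first writes a tiny temporary sample file (using the standard `tempfile` module, an addition for the example only), then reads it back a line at a time, showing the newline before and after `strip()`.

```python
import os
import tempfile

# Create a small sample file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
sample = os.fdopen(fd, "w")
sample.write("apple\nbanana\ncherry\n")
sample.close()

raw_lines = []
clean_lines = []
in_file = open(path, "r")
for line in in_file:                  # one line per iteration, as a string
    raw_lines.append(line)            # still ends with "\n"
    clean_lines.append(line.strip())  # newline removed
in_file.close()
os.remove(path)                       # clean up the sample file

print(raw_lines)    # ['apple\n', 'banana\n', 'cherry\n']
print(clean_lines)  # ['apple', 'banana', 'cherry']
```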
       
  • look at the file_stats function in word-stats.py code
       - what does it do?
          - opens a file
          - reads a line at a time
          - appends each entry in the file to a list called words (stripping off the end of line)
          - prints out the statistics of the word file

       - in this same directory I have a file called "english.txt" that has a large list of English words

          >>> file_stats("english.txt")
          Number of words: 47158
          Longest word: antidisestablishmentarianism
          Shortest word: Hz
          Avg. word length: 8.37891768099

          - notice how quickly it can process through the file
             - computers are fast!
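The real `file_stats` in word-stats.py is not shown here, so this is only a sketch of the same steps, assuming one word per line. The name `file_stats_sketch` is hypothetical; it returns the statistics instead of printing them, skips blank lines (an assumption), and assumes the file holds at least one word. A tiny temporary file stands in for english.txt.

```python
import os
import tempfile


def file_stats_sketch(filename):
    """Hypothetical helper: read one word per line from filename and return
    (word count, longest word, shortest word, average word length).

    Assumes the file holds at least one word; blank lines are skipped.
    """
    words = []
    in_file = open(filename, "r")
    for line in in_file:
        word = line.strip()          # drop the trailing newline
        if word != "":               # skip blank lines
            words.append(word)
    in_file.close()

    longest = max(words, key=len)    # first longest word
    shortest = min(words, key=len)   # first shortest word
    total_length = sum(len(w) for w in words)
    average = float(total_length) / len(words)
    return len(words), longest, shortest, average


# Small sample file standing in for english.txt.
fd, path = tempfile.mkstemp(suffix=".txt")
sample = os.fdopen(fd, "w")
sample.write("cat\nelephant\nox\ndog\n")
sample.close()

stats = file_stats_sketch(path)
os.remove(path)
print(stats)   # (4, 'elephant', 'ox', 4.0)
```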