CS150 - Fall 2013 - Class 17
today: a collection of topics
- finish up optional parameters
- navigating the file system with Terminal
- calling python from Terminal
- passing command-line arguments to our program
- reading data from urls
- writing files
File system basics
- what is a file system?
- it's a way of organizing files on a hard disk
- For most file systems, how are the files organized?
- they are hierarchical with nested directories
- on Macs and Linux everything starts at '/'
- on Windows everything starts at a "drive", e.g. "C:\"
- What is a directory?
- a directory is a container for files and other directories
- What is the "path" of a file?
- The path of a file is the sequence of directories leading up to the file, followed by the file name, e.g. /Users/dkauchak/Desktop/blah.png
- What is a home directory?
- On systems where multiple users can log in, the home directory is the location in the file system where a user's files reside
- commonly /home/<username>/ on Linux (for example /home/dkauchak/) and /Users/<username>/ on Macs (for example /Users/dkauchak/)
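A minimal Python sketch of the "path" idea above. The file notes.txt and the helper name split_path are made up for illustration. posixpath is the standard-library module for '/'-style paths, used here so the result is the same on any operating system.

```python
import os
import posixpath

def split_path(path):
    """Return (list of directories leading up to the file, file name)."""
    directory = posixpath.dirname(path)    # everything before the last '/'
    filename = posixpath.basename(path)    # everything after the last '/'
    directories = [d for d in directory.split("/") if d != ""]
    return directories, filename

# hypothetical path, not a real file from the course
dirs, name = split_path("/home/dkauchak/classes/cs150/notes.txt")
print(dirs)   # ['home', 'dkauchak', 'classes', 'cs150']
print(name)   # notes.txt

# the home directory of whoever is running the program (varies by machine)
print(os.path.expanduser("~"))
```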
navigating the file system with Terminal
- How do you normally navigate through the file system?
- Using finder/explorer
- By clicking on directories
- Using the mouse
- Terminal is a program that allows us to navigate the file system and run commands/programs using just the keyboard
- Terminal is an interactive shell for the operating system (just like Python has an interactive shell)
- When we first start Terminal, we see the prompt:
dkauchak-15819:~ dkauchak$
(Note: due to various configurations, your prompt will likely be different)
- Terminal has a variety of commands that we can type that allow us to move throughout the file system without clicking
- pwd (print working directory)
- prints the current directory that you are in
dkauchak-15819:~ dkauchak$ pwd
/Users/dkauchak
dkauchak-15819:~ dkauchak$
- when Terminal starts, it starts in your home directory
- ls
- lists the contents of the current directory
dkauchak-15819:~ dkauchak$ ls
Desktop Downloads Movies Pictures Sites classes research software workspaces
Documents Library Music Public bin data resources temp
dkauchak-15819:~ dkauchak$
- notice that if we navigate to the same place with Finder, we see the same files
- cd (change directory)
- changes the current directory
- we can move around the different directories by changing our current directory, e.g. after running "cd Desktop" from the home directory:
dkauchak-15819:Desktop dkauchak$ pwd
/Users/dkauchak/Desktop
dkauchak-15819:Desktop dkauchak$ ls
00006.MTS Screen shot 2011-11-02 at 10.14.37 PM.png evis_flight.pdf
DMV-VD119-Vehicle_Reg_Tax_Title_App.pdf albanian.rtf movies.rtf
DMV-VT028-Tax_Title_Application.pdf blah.png zipf_corollary.png
dkauchak-15819:Desktop dkauchak$
- if we want to go up a directory, we use "..", e.g. "cd .." goes up one directory
dkauchak-15819:Desktop dkauchak$ pwd
/Users/dkauchak/Desktop
dkauchak-15819:Desktop dkauchak$ cd ..
dkauchak-15819:~ dkauchak$ pwd
/Users/dkauchak
dkauchak-15819:~ dkauchak$ ls
Desktop Downloads Movies Pictures Sites classes research software workspaces
Documents Library Music Public bin data resources temp
- if you type "cd" without any arguments, it takes you back to your home directory
dkauchak-15819:~ dkauchak$ cd classes/
dkauchak-15819:classes dkauchak$ cd cs150/
dkauchak-15819:cs150 dkauchak$ pwd
/Users/dkauchak/classes/cs150
dkauchak-15819:cs150 dkauchak$ cd
dkauchak-15819:~ dkauchak$ pwd
/Users/dkauchak
- lots of other commands for moving, creating and manipulating files and directories
- a fairly comprehensive list:
http://ss64.com/bash/
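For comparison, the same three operations are available from inside Python through the os module. This is only a sketch: the function name explore is made up for illustration, and it returns to the starting directory when it is done.

```python
import os

def explore(directory):
    """Visit a directory the way we did with pwd, ls and cd."""
    start = os.getcwd()                  # like "pwd"
    os.chdir(directory)                  # like "cd directory"
    here = os.getcwd()
    contents = sorted(os.listdir("."))   # like "ls"
    os.chdir("..")                       # like "cd .."
    parent = os.getcwd()
    os.chdir(start)                      # go back to where we began
    return here, contents, parent

# try it on the home directory; the output depends on your machine
here, contents, parent = explore(os.path.expanduser("~"))
print(here)
print(contents)
print(parent)
```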
windows equivalents
- Windows has a similar program called "Command Prompt"
- To run it, under the start menu go to "run" and then run "cmd"
- if you really want to be hard-core, download cygwin which has a similar interface to the above
- I've posted the equivalent commands for windows on the course web page in the "Resources" section at the bottom, but here's a quick review
- directories are delimited by a backslash instead of a forward slash
- "cd" works the same
- instead of "ls" use "dir" to list the directory contents
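A small Python sketch of the slash vs. backslash difference. The standard library has one module for each convention (posixpath for Mac/Linux, ntpath for Windows), so both styles can be shown on any machine. The function name build_paths and the C:\Users\<username> layout are assumptions for illustration only.

```python
import ntpath      # Windows-style path rules
import posixpath   # Mac/Linux-style path rules

def build_paths(username):
    """Build the same Desktop path in both styles."""
    mac_style = posixpath.join("/", "Users", username, "Desktop")
    windows_style = ntpath.join("C:\\", "Users", username, "Desktop")
    return mac_style, windows_style

mac_style, windows_style = build_paths("dkauchak")
print(mac_style)       # /Users/dkauchak/Desktop
print(windows_style)   # C:\Users\dkauchak\Desktop
```

In everyday code, os.path.join picks whichever of the two conventions matches the machine the program is running on.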
Python and Terminal
- besides navigating files we can also run commands/programs from within Terminal
- for example, we can run Python by typing "python"!
dkauchak-15819:~ dkauchak$ python
Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
- when we run Python from the Terminal, it executes the Python shell
- we can interact with it just like we did in the Python shell within Wing
>>> print "hello"
hello
>>> x = 4
>>> x
4
>>> import math
>>> math.sqrt(x)
2.0
- Wing is an IDE
- what does IDE stand for?
- Integrated Development Environment
- what does that mean?
- Wing has the python shell built into it
- but it also has an editor for editing our programs
- allows us to run and debug our programs
- but it is built on top of the exact same Python we can run from Terminal
running Python programs from the Terminal
- just like we can run Python programs in Wing, we can also run Python programs from the Terminal
- first, you need to change your directory into the directory where your .py file is (using "cd")
- once you're there, you can run your program by typing python followed by the name of your .py file (i.e. the name of your program)
dkauchak-15819:examples dkauchak$ python print_vs_return.py
100
100
25
25
None
25
dkauchak-15819:examples dkauchak$
- when you run a program from Terminal:
- the program executes each step, just as if we'd run it in Wing
- Any input/output (e.g. print or raw_input) happens through the Terminal window
- when the program finishes, you end up back at the Terminal prompt (i.e. python exits)
- if you still want to be able to call functions, etc. after running a file (like in Wing), you need to run python in interactive mode:
dkauchak-15819:examples dkauchak$ python -i print_vs_return.py
100
100
25
25
None
25
>>>
command-line parameters
- when you run Python programs from the Terminal, you can also specify arguments to pass extra information to the program
- these arguments are added after the "python program_name.py"
- look at sys_args.py code
- there is a module called "sys"
- has lots of functionality regarding the Python system
- inside the module is a variable called argv (short for argument vector)
- this variable is a list and contains all of the things that were typed on the command-line when python started, after "python"
- if you're running a program, the first thing in the list is always the name of the .py file
- everything after that is an additional argument that you want to pass to your program
dkauchak-15819:examples dkauchak$ python sys_args.py
Arguments: ['sys_args.py']
0: sys_args.py
dkauchak-15819:examples dkauchak$ python sys_args.py information
Arguments: ['sys_args.py', 'information']
0: sys_args.py
1: information
dkauchak-15819:examples dkauchak$ python sys_args.py these are some arguments
Arguments: ['sys_args.py', 'these', 'are', 'some', 'arguments']
0: sys_args.py
1: these
2: are
3: some
4: arguments
- how might this be useful? what type of information might we pass it? How is this different than, say, raw_input?
- another way of interacting with the program
- often pass things like filenames, urls, numbers, etc. (similar types of things you might use for raw_input)
- allows for repeatability
- like in the Wing shell, we can just press the up arrow key to run the program again
- can run this program externally, without requiring user interaction
- this is a common phenomenon for many programs
- compare, for example, running Word by itself, vs. running it by double-clicking on a .doc(x) file
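The sys_args.py code itself is not reproduced in these notes. A minimal sketch that would produce output in the same format as the runs above might look like this. The function name describe_arguments is made up, and print is written with parentheses so the sketch should run under both the Python 2.7 used in class and Python 3.

```python
import sys

def describe_arguments(argv):
    """Return the lines to print for a given argument list."""
    lines = ["Arguments: " + str(argv)]
    for i in range(len(argv)):
        lines.append(str(i) + ": " + argv[i])
    return lines

if __name__ == "__main__":
    # sys.argv[0] is the name of the .py file, the rest are the extra arguments
    for line in describe_arguments(sys.argv):
        print(line)
```

Because the arguments are just a list of strings, anything numeric (e.g. a count) has to be converted with int or float before it is used as a number.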
web pages
- what is a web page or more specifically what's in a web page?
- just a text file with a list of text, formatting information, commands, etc.
- written mostly in html, but can also contain some scripting (i.e. mini-programs), for example in javascript
- sometimes this content can be automatically generated by a program
- this text is then parsed by the web browser to display the content
- you can view the html source of a web page from your browser
- in Safari: View->View Source
- in Firefox: View->Page Source
- in Chrome: View->Developer->View Source
- html content
- html consists of tags (a tag starts with a '<' and ends with a '>')
- generally tags come in pairs, with an opening tag and closing tag, e.g. <html> ... </html>
- lots of documentation online for html
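To make the "starts with '<' and ends with '>'" description concrete, here is a rough sketch that pulls the tags out of one line of html. The function name find_tags is made up, and the pattern is deliberately simple: it is an illustration of the tag idea, not a real html parser (comments, scripts and unusual attributes would trip it up).

```python
import re

def find_tags(text):
    """Return every piece of text that starts with '<' and ends with '>'."""
    return re.findall(r"<[^<>]+>", text)

line = "<html><head><title>CS 150</title></head></html>"
print(find_tags(line))
# ['<html>', '<head>', '<title>', '</title>', '</head>', '</html>']
```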
reading from web pages using urllib
- look at url_basics.py code: what does this program do?
- uses sys.argv to get input from the user
- if the user does not provide exactly one argument, it calls print_usage, which prints out how to run the program
- take a single argument from the command-line
- if that argument is a url (web page address)
- use the urllib module to open a connection to the web page
- urllib.urlopen takes a url as a parameter and opens a reader to that web page that reads a line at a time
- this is almost identical to file reading!
- only difference is how we open it (open, for a file vs. urllib.urlopen, for a url)
- otherwise assume it's a file
- and open it using open
- print out the contents of whatever was opened
- notice that we can read from a web page in the same way that we read from a file, a line at a time
- we can interchange a file or a web page reader once it's opened, since the functionality is the same
- we can run this from the command-line
- if we don't give it any arguments on the command-line, we get the usage
dkauchak-15819:examples dkauchak$ python url_basics.py
url_basics.py <filename or url>
- if we give it a web page, it prints out the source for the page
dkauchak-15819:examples dkauchak$ python url_basics.py http://www.cs.middlebury.edu/~dkauchak/classes/cs150/
<html>
<head>
<title>CS 150 - Computing for the Sciences - Fall 2011</title>
...
- if we give it a file, it prints out the text in the file
dkauchak-15819:examples dkauchak$ python url_basics.py url_basics.py
import urllib
import sys
def print_data(reader):
...
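Only the first few lines of url_basics.py appear above, so the following is a sketch of how a program with this behavior could be put together, not the course's actual code. print_usage and print_data are names from the notes. is_url, open_reader, read_lines and main are made up, and the test for "is this a url" (starts with http:// or https://) is an assumption. The import is written so the sketch should run under Python 2.7 (urllib.urlopen, as in the notes) and under Python 3, where the same function lives in urllib.request.

```python
import sys

try:
    from urllib.request import urlopen   # Python 3
except ImportError:
    from urllib import urlopen           # Python 2, as used in these notes

def is_url(name):
    """Assumption: anything starting with http:// or https:// is a url."""
    return name.startswith("http://") or name.startswith("https://")

def open_reader(name):
    """Open a url or a file; either way we get something we can loop over."""
    if is_url(name):
        return urlopen(name)
    return open(name, "r")

def read_lines(reader):
    """Read a line at a time, the same way for files and web pages."""
    lines = []
    for line in reader:
        if isinstance(line, bytes):      # web pages arrive as raw bytes in Python 3
            line = line.decode("utf-8", "replace")
        lines.append(line.rstrip("\n"))
    reader.close()
    return lines

def print_data(reader):
    for line in read_lines(reader):
        print(line)

def print_usage():
    print("url_basics.py <filename or url>")

def main(argv):
    if len(argv) != 2:
        print_usage()
    else:
        print_data(open_reader(argv[1]))

if __name__ == "__main__":
    main(sys.argv)
```

The point of the design is that only open_reader knows whether it is dealing with a file or a web page; everything after that treats the reader identically.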
reading web pages: ethics
- you are reading a file on a remote server
- you shouldn't be doing this repeatedly
- if you're trying to debug some code, copy the source into a file and debug that way before running live
- web site owners may restrict which content they want programs to read
- see http://www.robotstxt.org/
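One way to follow the "copy the source into a file and debug that way" advice is to let the program make the copy itself the first time and then reuse it. This is a sketch only: the function name read_with_local_copy and its parameters are made up, and it assumes the saved page can be read back as ordinary text.

```python
import os

try:
    from urllib.request import urlopen   # Python 3
except ImportError:
    from urllib import urlopen           # Python 2, as used in these notes

def read_with_local_copy(url, local_filename):
    """Contact the remote server only if we don't already have a copy."""
    if not os.path.exists(local_filename):
        reader = urlopen(url)
        data = reader.read()
        reader.close()
        out = open(local_filename, "wb")   # "wb": save the raw bytes unchanged
        out.write(data)
        out.close()
    # from here on we only touch the local file
    local = open(local_filename, "r")
    lines = local.readlines()
    local.close()
    return lines
```

While debugging, every run after the first reads the local file and leaves the web server alone. Delete the local file when you want a fresh copy.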
look at url_extractor.py code
- what does the get_note_urls function do?
- opens up the course web page
- reads a line at a time
- checks each line to see if it contains a link to lecture notes
- if so, keeps track of it in a list
- what does write_list_to_file do?
- opens a file, this time with "w" instead of "r"
- "w" stands for write
- if the file doesn't exist it will create it
- if the file does exist, it will erase the current contents and overwrite it (be careful!)
- we can also write to a file without overwriting the contents, but instead appending to the end
- "a" stands for append
- just like with reading from a file, we get a file object from open
- the "write" method writes a string to the file (other types of objects need to be converted with str first)
- why do I have the "\n" appended on to the end of item?
- write does NOT add a line return after what it writes
- if you want one, you need to put it in yourself
- what does this program do?
- gets the lecture urls from the course web page
- writes them to a file called "lectures.txt"
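The url_extractor.py code is not reproduced in these notes, so here is a sketch of the two ideas it relies on: keeping only the lines that contain some text, and writing a list to a file one item per line. write_list_to_file is a name from the notes, but the optional mode parameter is my addition to show "w" vs. "a". find_lines_containing and the sample lines are made up.

```python
import os
import tempfile

def find_lines_containing(lines, text):
    """Keep track of every line that contains text."""
    matches = []
    for line in lines:
        if text in line:
            matches.append(line.strip())
    return matches

def write_list_to_file(items, filename, mode="w"):
    """Write one item per line; mode "w" overwrites, "a" appends."""
    out = open(filename, mode)
    for item in items:
        out.write(str(item) + "\n")   # write does not add "\n" for us
    out.close()

# made-up lines standing in for a web page
page = ["<html>", "<a href=lecture1.txt>", "<p>hello</p>", "<a href=lecture2.txt>"]
found = find_lines_containing(page, "lecture")

# write into a temporary directory so we don't overwrite anything real
filename = os.path.join(tempfile.mkdtemp(), "lectures.txt")
write_list_to_file(found, filename)                       # creates the file
write_list_to_file(["one more line"], filename, "a")      # adds to the end
print(open(filename).read())
```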