For this program, you will be working with a very large file (over 6 megabytes!): several days of the log file for the CS department web server. This file contains a log entry for every access of a web page on our servers. Here is a sample of what the entries look like:
134.173.95.84 - - [11/Nov/2007:12:50:35 -0800] "GET /classes/cs151/ HTTP/1.1"
134.173.95.84 - - [11/Nov/2007:12:50:38 -0800] "GET /classes/cs151/assignments.html HTTP/1.1"
134.173.95.84 - - [11/Nov/2007:12:50:57 -0800] "GET /classes/cs151/assignments/insurance.xml HTTP/1.1"
66.249.73.50 - - [11/Nov/2007:12:51:01 -0800] "GET /~tzuyi/Classes/ProblemSets/ps10a.ps HTTP/1.1"
66.249.73.50 - - [11/Nov/2007:12:51:04 -0800] "GET /~tzuyi/Classes/Notes/030129.pdf HTTP/1.1"
66.249.73.50 - - [11/Nov/2007:12:51:48 -0800] "GET /~tzuyi/Classes/Notes/tree_code.pdf HTTP/1.1"
66.249.73.50 - - [11/Nov/2007:12:52:17 -0800] "GET /~tzuyi/Classes/Notes/041116.ppt.pdf HTTP/1.1"
65.55.209.100 - - [11/Nov/2007:12:52:47 -0800] "GET /~kim/CSC051S06/demos/PolyStructure/structure/Iterable.java HTTP/1.0"
65.55.209.102 - - [11/Nov/2007:12:54:19 -0800] "GET /cs080/ HTTP/1.0"
Actually, I cheated a bit and truncated each entry to remove some long but irrelevant information at the end of each line.
Each entry begins with the IP address of the computer making the request. The address consists of 4 groups of digits, separated by periods. Starting immediately after the first "[" is the date and time (in "universal" 24-hour format) of the request; the date ends at the first colon, and the time follows it. Finally, the URL of the page requested comes after the "GET" and continues to the next occurrence of a double quote. Thus the first line represents a request from IP address 134.173.95.84. It was made on 11 November of this year at 12:50:35 p.m. Pacific Standard Time. It involved a request for the page /classes/cs151/ on www.cs.pomona.edu (the trailing HTTP/1.1 describes the protocol used).
The starter folder you copy will include a class ParseEntry that you will need to write. It should be used to break up a line from the log into four strings corresponding to the IP address, date, time, and URL. The constructor takes a String corresponding to a line of the file as a parameter, and the class should provide methods getAddress(), getDate(), getTime(), and getURL() that return Strings representing the appropriate part of each entry. Warning: A few lines do not contain the "GET". For those lines, getURL() should return the string "No URL".
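As a rough illustration of the splitting described above, here is one possible sketch built on indexOf and substring. The class name LogFields is hypothetical (it is not the ParseEntry in the starter folder), and the sketch assumes every line is well formed up to the closing "]". It also makes one choice the handout leaves open: the URL stops at the first space after "GET", so the protocol is dropped. Reading up to the closing double quote instead would be equally consistent with the description.

```java
// Hypothetical sketch: LogFields is an illustrative name, not part of the starter code.
class LogFields {
    private static final String NO_URL = "No URL";
    private static final String GET_MARKER = "\"GET ";

    private final String address;
    private final String date;
    private final String time;
    private final String url;

    LogFields(String line) {
        // The IP address runs from the start of the line to the first space.
        address = line.substring(0, line.indexOf(' '));

        // The date starts right after the first '[' and ends at the first colon.
        int dateStart = line.indexOf('[') + 1;
        int dateEnd = line.indexOf(':', dateStart);
        date = line.substring(dateStart, dateEnd);

        // The time follows that colon and ends at the space before the zone offset.
        int timeEnd = line.indexOf(' ', dateEnd);
        time = line.substring(dateEnd + 1, timeEnd);

        // The URL follows "GET "; lines without a GET get the placeholder.
        int getPos = line.indexOf(GET_MARKER);
        if (getPos < 0) {
            url = NO_URL;
        } else {
            int urlStart = getPos + GET_MARKER.length();
            // Assumption: stop at the first space so the protocol is excluded.
            int urlEnd = line.indexOf(' ', urlStart);
            if (urlEnd < 0) {
                urlEnd = line.indexOf('"', urlStart);
            }
            if (urlEnd < 0) {
                urlEnd = line.length();
            }
            url = line.substring(urlStart, urlEnd);
        }
    }

    String getAddress() { return address; }
    String getDate() { return date; }
    String getTime() { return time; }
    String getURL() { return url; }

    public static void main(String[] args) {
        // First entry from the sample above.
        LogFields entry = new LogFields(
            "134.173.95.84 - - [11/Nov/2007:12:50:35 -0800] \"GET /classes/cs151/ HTTP/1.1\"");
        System.out.println(entry.getAddress());
        System.out.println(entry.getDate());
        System.out.println(entry.getTime());
        System.out.println(entry.getURL());
    }
}
```

Run on the first sample entry, this prints 134.173.95.84, 11/Nov/2007, 12:50:35, and /classes/cs151/ on separate lines.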
The log file "access.log" can be found in your start-up folder. Your job is to use the log file to answer the following questions:
For each computer in a dorm that accessed the CS 51 web site, please list the earliest and latest times within the midnight to 6 a.m. time frame at which it accessed the web site. One line of your output should look something like:
89-120.res.pomona.edu from 04:01:21 to 05:40:41
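One way to keep track of the earliest and latest times per computer is sketched below. It covers only the bookkeeping and deliberately leaves out deciding which computers are in a dorm and which URLs belong to the CS 51 site. The class name NightWindow and its methods are hypothetical, the sketch uses java.util.TreeMap (which may go beyond what has been covered in class), and it assumes the window runs from 00:00:00 up to but not including 06:00:00. Because times are zero-padded HH:mm:ss strings, compareTo orders them the same way a clock would. Note that it compares times only, so accesses on different days are pooled together.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: NightWindow is an illustrative name, not part of the starter code.
class NightWindow {
    // Assumed window: midnight inclusive, 6 a.m. exclusive.
    private static final String WINDOW_START = "00:00:00";
    private static final String WINDOW_END = "06:00:00";

    // Maps each host to a two-element array: {earliest, latest}.
    private final Map<String, String[]> ranges = new TreeMap<>();

    // Records one access; times outside the window are ignored.
    void record(String host, String time) {
        if (time.compareTo(WINDOW_START) < 0 || time.compareTo(WINDOW_END) >= 0) {
            return;
        }
        String[] range = ranges.get(host);
        if (range == null) {
            // First access seen for this host: it is both earliest and latest.
            ranges.put(host, new String[] {time, time});
        } else {
            if (time.compareTo(range[0]) < 0) {
                range[0] = time;
            }
            if (time.compareTo(range[1]) > 0) {
                range[1] = time;
            }
        }
    }

    // Builds one output line per host, in alphabetical order of host name.
    String report() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String[]> entry : ranges.entrySet()) {
            out.append(entry.getKey())
               .append(" from ").append(entry.getValue()[0])
               .append(" to ").append(entry.getValue()[1])
               .append("\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        NightWindow window = new NightWindow();
        window.record("89-120.res.pomona.edu", "05:40:41");
        window.record("89-120.res.pomona.edu", "04:01:21");
        window.record("89-120.res.pomona.edu", "12:50:35"); // outside the window
        System.out.print(window.report());
    }
}
```

With the three accesses in main, the report is the single line shown in the sample output above.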
You may work in pairs on this assignment and turn in only one program with both names on it.
Begin by writing the ParseEntry class and try it out. The begin method of class Trivia already contains a call to testParseEntry, a method we have provided to test whether the methods of your ParseEntry class work. When you are convinced that your ParseEntry class is correct, please remove the call to testParseEntry() from the begin method and erase the method.
The begin method opens a file and prepares it for reading (we'll talk about exactly how that works soon). It then calls the method answerQuestion(). You are to fill in the body of answerQuestion so that you can get the answer to the question above. The structure of the method is as follows:
private void answerQuestion() throws IOException {
    // declarations and initialization code
    String line = theFile.readLine();
    while (line != null) {
        // do whatever is necessary to process line
        line = theFile.readLine();
    }
    // display answers as necessary
}
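To see the same read-until-null loop run on its own, here is a small self-contained sketch. It assumes theFile behaves like a java.io.BufferedReader (the handout does not say how it is declared) and substitutes a StringReader holding three of the sample entries for the real access.log. The class ReadLoopSketch and the method countLinesContaining are hypothetical names used only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical sketch: ReadLoopSketch is an illustrative name, not part of the starter code.
class ReadLoopSketch {
    // Counts the lines that contain target, using the loop shape shown above.
    static int countLinesContaining(BufferedReader reader, String target) throws IOException {
        int count = 0;
        String line = reader.readLine();
        while (line != null) {           // readLine() returns null at end of input
            if (line.contains(target)) {
                count++;
            }
            line = reader.readLine();    // advance, or the loop never ends
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Three entries taken from the sample above, standing in for the log file.
        String sample =
            "134.173.95.84 - - [11/Nov/2007:12:50:35 -0800] \"GET /classes/cs151/ HTTP/1.1\"\n"
            + "134.173.95.84 - - [11/Nov/2007:12:50:38 -0800] \"GET /classes/cs151/assignments.html HTTP/1.1\"\n"
            + "65.55.209.102 - - [11/Nov/2007:12:54:19 -0800] \"GET /cs080/ HTTP/1.0\"\n";
        BufferedReader reader = new BufferedReader(new StringReader(sample));
        System.out.println(countLinesContaining(reader, "/classes/cs151/"));
    }
}
```

The second readLine() call at the bottom of the loop body is the part that is easiest to forget; without it the loop would process the first line forever.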
Please have your answers displayed in the TextArea named display that is created in the begin method. Remember to use the append method to add new content to the TextArea.
All programs will be due by Tuesday at 11 p.m., though I hope you will be prepared well enough for lab that you will finish by the end of lab. When your work is complete, you should deposit in the appropriate dropoff folder a copy of the entire folder containing all of your .java files. Before you do this, make sure the folder name includes your name(s) and the phrase "Lab 11". Also make sure to double-check your work for correctness, organization, and style.
Grading Point Allocations
Value | Feature |
Syntax Style (3 pts total) | |
1 pt. | Descriptive comments |
1/2 pt. | Good names |
1 pt. | Good use of constants |
1/2 pt. | Appropriate formatting |
Semantic Style (3 pts total) | |
1 pt. | Conditionals and loops |
1 pt. | General correctness/design/efficiency issues |
1 pt. | Parameters, variables, and scoping |
Correctness (4 pts total) | |
1 pt. | Parsing log lines |
3 pts. | Correct answers to questions |
Computer Science
051
Department of Computer Science
Pomona College