Department of Computer Science
Middlebury College
CS 458 - Information Retrieval
Fall 2012

instructor: Dave Kauchak
e-mail: [first_initial][last_name]@middlebury.edu
office hours: MBH 635
  M/W 2:45-3:45pm
  T 2-3pm
  Th 1-2pm
  and by appointment

time: T/Th 11am-12:15pm
location: MBH 632
web page: http://www.cs.middlebury.edu/~dkauchak/classes/cs458/ (or go/cs458)

textbook:

summary:
In this course we will explore how search engines work and will cover: basic text processing, index construction, text similarity, evaluation, and searching of other types of media. In addition, we will examine related application areas including language modeling, clustering, classification and e-commerce.

Other information:


Announcements

The midterm is available and must be taken by midnight on Oct. 23rd.
Final project specification

Schedule

Note: This is a tentative schedule and will likely change.
Date | Topic | Reading | Assignment | Comments
9/11 | admin, intro (ppt) | Ch. 1 except 1.2 | homework 1 |
9/13 | text preprocessing (ppt) | Ch. 2, 5.1 | assignment 1 |
9/18 | index construction (ppt) | Ch. 1.2, 4 | homework 2 |
9/20 | index compression (ppt) | Ch. 5 | |
9/25 | TF-IDF (ppt) | Ch. 6 except 6.4.4 | assignment 2 |
9/27 | faster TF-IDF (ppt) | Ch. 7, article 1, article 2 | |
10/2 | spelling correction (ppt) | Ch. 3.3, 3.4 | |
10/4 | evaluation (ppt) | Ch. 8 | homework 3 |
10/9 | relevance feedback/query expansion (ppt) | Ch. 9 | assignment 3, sample output |
10/11 | web search basics (ppt) | Ch. 19 except 19.3, article | |
10/16 | Fall recess | | |
10/18 | detecting duplicates (ppt) | Ch. 20 | |
10/23 | link analysis (ppt) | Ch. 21 | homework 4 |
10/25 | text segmentation (ppt) | paper | |
10/30 | audio processing (ppt) | paper | assignment 4 |
11/1 | audio search (ppt) | paper | |
11/6 | NO CLASS | paper | |
11/8 | image search (ppt), image processing (ppt) | paper | |
11/13 | project discussion | | | Quiz!
11/15 | text classification (ppt) | Ch. 13 except 13.5 | |
11/20 | text classification (ppt), exercise | Ch. 14-14.6 except 14.2, 15-15.3 | |
11/22 | Thanksgiving | | |
11/27 | Paper discussion 1 | Watson article | |
11/29 | document modeling (ppt) | document modeling | |
12/4 | Paper discussion 2 | User activity paper | |
12/6 | online advertising (ppt) | | |

We will use our final exam slot, Friday, December 14, 7-10pm, for project material.