Web Retrieval and Mining

Fall 2009

General Instructions

This course has 1 handwritten assignment and 3 programming assignments. You will work on the 3 programming assignments in groups of 2 students. Each group is required to write programs with the help of open-source projects such as Nutch or a provided text-processing tool. Demonstrations and documentation are required.
The overall goals of the programming assignments are to help you understand (1) how to construct a simple search engine, including crawling, indexing, and searching web pages based on their content and hyperlinks, and (2) how to analyze search engine logs by clustering real users' queries.
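
To make the indexing and searching steps concrete, here is a minimal sketch in Python. It is not the assignment's required design and does not use Nutch. The function names (tokenize, build_index, search), the whitespace tokenizer, and the three toy documents are all assumptions made for illustration. It builds an inverted index that maps each term to the set of documents containing it, then answers a query by intersecting those sets.

```python
from collections import defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace splitting stands in for real text processing.
    return text.lower().split()


def build_index(docs):
    # Inverted index: term -> set of document ids that contain the term.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    # Boolean AND search: return ids of documents containing every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical toy collection, for illustration only.
docs = {
    "d1": "web retrieval and mining",
    "d2": "web search engine",
    "d3": "query log mining",
}
index = build_index(docs)
print(sorted(search(index, "web")))
print(sorted(search(index, "web mining")))
```

A real search engine would also rank the matching documents, for example by combining content scores with hyperlink information; this sketch only finds which documents match.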

Assignment 1: Handwritten assignment (due on 10/16)
Assignment 2: Document search (due on 11/06)
Assignment 3: Webpage search (handwritten part) (due on 12/04)
Assignment 4: Query clustering (due on 12/25)