Web Retrieval and Mining

Spring 2013


The Web has become the largest data repository in the world. This course introduces basic and advanced techniques for (1) Web information retrieval (IR): how to search large-scale Web data, and (2) Web mining: how to discover knowledge from the diverse data resources on the Web.

The lectures cover (1) Web IR, including the fundamentals of modern IR systems, crawling, ranking algorithms, Web page classification and clustering, Chinese IR, multimedia IR, and case studies of search engines, and (2) Web mining, including Web content/text mining, Web structure mining, Web query log mining, information extraction, and taxonomy generation.

Students in this course are expected to read research papers on a topic relevant to Web IR or Web mining, complete a project, and present their work in class.


Pu-Jen Cheng

Email: pjcheng@csie.ntu.edu.tw

Homepage: http://www.csie.ntu.edu.tw/~pjcheng

Office hours: R323, 9:00 am ~ 12:00 pm, Tuesday

Class Hours: 9:10 am ~ 12:10 pm, Friday

Classroom: CSIE Room 102

Prerequisites: Data Structures, Algorithms, and Programming (programming experience is necessary for the homework and project).


Introduction to Information Retrieval (IIR), by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Cambridge University Press, 2008. (Selected Chapters)

Modern Information Retrieval, by Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Addison-Wesley, 1999. (Selected Chapters)

Search Engines: Information Retrieval in Practice, by W. Bruce Croft, Donald Metzler, and Trevor Strohman, Addison-Wesley, 2009. (Selected Chapters)

Mining the Web: Discovering Knowledge from Hypertext Data, by Soumen Chakrabarti, Morgan Kaufmann, 2002. (Selected Chapters)

Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, by Bing Liu, Springer, 2006. (Selected Chapters)

Selected papers (mainly from SIGIR, WWW, CIKM, JASIST & ACM TOIS)


Assignments (50%)

Midterm Exam (20%)

Term Project (30%)