Homework 2

In today's lecture, one slide discussed the process of "generating a flat file." We would like to investigate this process in practice.

Consider the UCI University data set, in which each instance describes one university. If we take "academic-emphasis" as the target class, this is a so-called multi-label problem: an instance may be associated with more than one class label.

We would like to convert this UCI data to LIBSVM format. Among the LIBSVM data sets, you can find some multi-label data sets. If possible, we can release this UCI University data there too.
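
As a rough illustration of the target format, here is a minimal Python sketch that writes one instance as a line of the form "label,label index:value index:value", which is the layout the multi-label LIBSVM data sets appear to use. The function name to_libsvm_line, the integer class ids, and the sample feature values are all hypothetical and are not taken from the University data.

```python
def to_libsvm_line(labels, features):
    """Format one instance as a multi-label LIBSVM-style line.

    labels:   iterable of integer class ids (assumed already mapped from
              the raw "academic-emphasis" strings)
    features: dict mapping 1-based feature index -> numeric value
    """
    # Multiple labels are joined with commas, without spaces.
    label_part = ",".join(str(label) for label in sorted(set(labels)))
    # Sparse format: zero-valued features are omitted, indices ascending.
    feature_part = " ".join(
        f"{idx}:{val:g}" for idx, val in sorted(features.items()) if val != 0
    )
    return f"{label_part} {feature_part}".rstrip()


# Hypothetical instance: two class labels and three numeric features.
line = to_libsvm_line([3, 1], {1: 0.5, 2: 0, 4: 12})
print(line)
```

Sorting the feature indices matters because LIBSVM tools generally expect them in ascending order; dropping zeros keeps the file sparse.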

You may encounter difficulties such as missing values. In your report, show us what you did and what difficulties you faced. You are not required to use R for this homework; any tools may be used.
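
One possible way to deal with missing values, sketched in Python under several assumptions: missing entries are marked by "?" or an empty string (check the actual file), categorical attributes are one-hot encoded, and a missing entry is simply left out of the sparse feature dictionary. The function encode_row and the attribute names in the example are illustrative only.

```python
MISSING = {"?", ""}  # assumed missing-value markers; verify against the data


def encode_row(row, numeric_cols, categorical_cols, index_map):
    """Turn a dict of raw string fields into a sparse {index: value} dict.

    index_map is shared across rows so that each numeric column and each
    (column, category) pair keeps the same 1-based feature index.
    """
    features = {}
    for col in numeric_cols:
        raw = row.get(col, "?")
        if raw in MISSING:
            continue  # leave the feature out instead of guessing a value
        idx = index_map.setdefault(col, len(index_map) + 1)
        features[idx] = float(raw)
    for col in categorical_cols:
        raw = row.get(col, "?")
        if raw in MISSING:
            continue
        # One-hot encoding: each distinct category gets its own index.
        idx = index_map.setdefault((col, raw), len(index_map) + 1)
        features[idx] = 1.0
    return features


# Illustrative row with one missing numeric field.
index_map = {}
row = {"state": "Ohio", "sat-math": "?", "no-of-students": "5000"}
encoded = encode_row(row, ["sat-math", "no-of-students"], ["state"], index_map)
print(encoded)
```

Note that omitting a numeric feature in a sparse format is equivalent to treating it as zero, which may or may not be sensible for a given attribute; imputing a mean or adding a separate "is missing" indicator feature are alternatives worth discussing in the report.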


Last modified: Sun Feb 25 16:27:57 CST 2007