Note that this course is highly research oriented. You should take this course only if you are interested in doing related research.
HW1-2: number of corrections of perceptron algorithm (given in the class)
HW2-2: explain why the following situation happens (this question is from Yu Lan at Tsinghua University)
HW2-3: given in the class. Showing that linearly independent data are linearly separable.
Also, due to the time limit, we consider only the RBF kernel, which is the default mapping function. Hence, you need to conduct model selection to choose C and gamma. This can be done using the file grid.py provided in libsvm (under the directory python). It calculates the cross validation accuracy for different C and gamma (in a range specified by you) and draws a contour. The usage of grid.py is in the README file of the same directory. The guide A practical guide to support vector classification provides more information and practical examples. The libsvm option -m specifies how much memory (in MB) to use. The default is 40, but you need more memory to save time. For this problem, -m 300 is enough.
To restrict the search space, you can use choose.py to choose a subset first. After identifying the promising region of parameters, you run on the whole set.
After finding the best parameters, you train on the whole training set and then predict the test set.
Note: the model selection (i.e. cross validation) can be time consuming, so you should start this homework as soon as possible.
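To make the search concrete, here is a small sketch of the exponentially spaced (C, gamma) grid that grid.py searches by default; the cross-validation scorer is a placeholder assumption (in practice grid.py invokes svm-train with the -v option), so the dummy scorer below is only for illustration.

```python
# Sketch of grid.py's default exponential grid over (C, gamma).
# cv_accuracy is a stand-in: in real use it would call svm-train -v.

def make_grid(log2c=range(-5, 17, 2), log2g=range(3, -17, -2)):
    """Return the (C, gamma) pairs of an exponentially spaced grid."""
    return [(2.0 ** c, 2.0 ** g) for c in log2c for g in log2g]

def best_parameters(cv_accuracy, grid):
    """Pick the (C, gamma) pair with the highest cross-validation accuracy."""
    return max(grid, key=lambda cg: cv_accuracy(*cg))

grid = make_grid()
print(len(grid))  # -> 110 (11 log2c values x 10 log2g values)
```

The coarse grid above is exactly why choose.py-style subsetting helps: 110 cross-validation runs on the full training set can be expensive, so one first locates the good region on a subset.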
HW4-1: the RBF kernel as an inner product when n = 2 (given in the class)
HW 4-2: given in the class
HW 4-3, 4-4: problems listed in the class note, given in the class
HW 4-5: use nearest neighbor to train/test the data set used in HW 3.
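For HW 4-5, a minimal 1-nearest-neighbor classifier is enough; this sketch assumes dense feature vectors and Euclidean distance (any distance could be substituted).

```python
import math

def nearest_neighbor_predict(train_x, train_y, x):
    """Classify x by the label of its closest training point (1-NN, Euclidean)."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    best = min(range(len(train_x)), key=lambda i: dist(train_x[i], x))
    return train_y[best]

train_x = [(0.0, 0.0), (1.0, 1.0)]
train_y = [-1, 1]
print(nearest_neighbor_predict(train_x, train_y, (0.9, 0.8)))  # -> 1
```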
HW5-1: If the error term of SVM formulation is defined as follows
(0.5/\gamma) \xi^2 if \xi <= \gamma
\xi - 0.5\gamma if \xi > \gamma,
where \gamma is a constant, derive its dual.
This is a difficult problem. We will ask one of you to give a presentation on November 24.
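Before attempting the dual, it helps to check that this error term is a Huber-style smoothing of the linear penalty: the two pieces meet at \xi = \gamma with the same value (0.5\gamma) and the same slope (1), so the loss is continuously differentiable. A small numeric check:

```python
def huber_like_penalty(xi, gamma):
    """Piecewise penalty from HW5-1: quadratic for xi <= gamma, linear beyond."""
    if xi <= gamma:
        return 0.5 / gamma * xi ** 2
    return xi - 0.5 * gamma

g = 0.4
# Both pieces give 0.5 * gamma at the joint, and both have slope 1 there,
# so the penalty is continuously differentiable at xi = gamma.
print(huber_like_penalty(g, g))  # -> 0.2 (= 0.5 * gamma)
```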
HW 5-2: We would like to study the effect of data scaling. In ~cjlin/htdocs/libsvmtools/binary there are some two-class problems in both original and scaled formats. First randomly split each file into training and testing sets. Conduct a grid search with cross validation on the training set and then predict the testing set. You should observe that for non-scaled data, it is more difficult to locate good parameters and the testing performance is not as good. Write a short (2-page) report to explain what you find.
HW5-3: given in the class.
I know you are going to check this place, so I would like to say something here. For the presentations today, some did quite well but some did not. I would encourage you to spend more time on the project. They are all interesting research topics. You don't need to read many papers, nor should you try many approaches. You should be more focused and produce some conclusive results. This is how research is done... Anyway, I will still keep the due date as December 15, but HW 8 will be due on December 22.
HW 6-1: Redo HW3 on the training set 22features and the testing set 22test. You should also compare SVM with nearest neighbor.
HW 6-2: Write a simple SVM classifier using the SMO-type algorithm taught earlier. Consider the dense input format only. Basically, you use the simple working set selection and solve two-variable optimization problems until the given stopping tolerance is satisfied. You can write the predictor in the same program so we immediately know the test accuracy. Kernel elements are calculated as needed.
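For orientation, here is a sketch of the two-variable update at the heart of an SMO solver. Caveats: it uses a random second-variable choice (Platt's simplified SMO) rather than the working set selection taught in class, it fixes a linear kernel, and it precomputes the kernel matrix for brevity, whereas the homework asks that kernel elements be computed as needed. It is a starting point, not a reference solution.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def smo_train(X, y, C=1.0, tol=1e-3, max_passes=5, seed=0):
    """Simplified SMO with a linear kernel. Sweeps over i looking for a KKT
    violation > tol; pairs i with a random j and solves the two-variable
    subproblem analytically. y entries must be +1 or -1."""
    rng = random.Random(seed)
    n = len(y)
    alpha = [0.0] * n
    b = 0.0
    K = [[dot(X[i], X[j]) for j in range(n)] for i in range(n)]

    def f(i):  # decision value at training point i
        return sum(alpha[k] * y[k] * K[k][i] for k in range(n)) + b

    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(n):
            Ei = f(i) - y[i]
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = rng.randrange(n - 1)
                if j >= i:
                    j += 1
                Ej = f(j) - y[j]
                ai, aj = alpha[i], alpha[j]
                if y[i] != y[j]:                      # box constraints for alpha[j]
                    L, H = max(0.0, aj - ai), min(C, C + aj - ai)
                else:
                    L, H = max(0.0, ai + aj - C), min(C, ai + aj)
                if L == H:
                    continue
                eta = 2 * K[i][j] - K[i][i] - K[j][j]  # second derivative (< 0)
                if eta >= 0:
                    continue
                aj_new = min(max(aj - y[j] * (Ei - Ej) / eta, L), H)
                if abs(aj_new - aj) < 1e-7:
                    continue
                ai_new = ai + y[i] * y[j] * (aj - aj_new)
                b1 = b - Ei - y[i] * (ai_new - ai) * K[i][i] - y[j] * (aj_new - aj) * K[i][j]
                b2 = b - Ej - y[i] * (ai_new - ai) * K[i][j] - y[j] * (aj_new - aj) * K[j][j]
                alpha[i], alpha[j] = ai_new, aj_new
                if 0 < ai_new < C:
                    b = b1
                elif 0 < aj_new < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b

def predict(X, y, alpha, b, x):
    """Sign of the decision function, as the in-program predictor would compute."""
    return 1 if sum(alpha[k] * y[k] * dot(X[k], x) for k in range(len(y))) + b >= 0 else -1

X = [[2.0, 2.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -2.0]]
y = [1, 1, -1, -1]
alpha, b = smo_train(X, y)
print([predict(X, y, alpha, b, xi) for xi in X])
```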
Requirement:
HW8-1: derive the dual of SVR
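As a starting point (this is the standard textbook formulation, not the derivation itself), the \epsilon-insensitive SVR primal is:

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*} \quad & \tfrac{1}{2} w^T w + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) \\
\text{subject to} \quad & y_i - (w^T \phi(x_i) + b) \le \epsilon + \xi_i, \\
& (w^T \phi(x_i) + b) - y_i \le \epsilon + \xi_i^*, \\
& \xi_i \ge 0,\; \xi_i^* \ge 0, \quad i = 1, \dots, l.
\end{aligned}
```

Introducing Lagrange multipliers \alpha_i, \alpha_i^* for the two sets of \epsilon-constraints and eliminating w, b, \xi, \xi^* gives the dual in the variables \alpha and \alpha^*.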
HW8-2: Consider the smallest four problems in software/svm/regession_data/newsvrdata. They are regression data sets. First randomly split each data set into training and testing sets. Then conduct cross validation on the training set to choose the best parameters. Write a short report (<= 2 pages) to show your results and experience.
On December 1, each group please give a 10- to 15-minute presentation about your progress.
There are several possible approaches. One is to consider methods for speeding up SVM testing; there are some papers about this. Alternatively, you may think about any possibilities for approximating the curve in a 3-D space.
The java code of svm-toy is in the java directory of libsvm package.
An earlier study of this data set is in the second part of the thesis at ~cjlin/latex/students/Bo-June.Chen.
For example, we consider the first problem of KDD cup 2001 for the prediction of molecular bioactivity for drug design -- binding to thrombin.
The training file contains 1910 instances with more than 100,000 features. In the sparse format it is available here (in zip format). For testing we have 634 instances, which are available here (in zip format).
Before using SVM or other classification methods, we should select a few hundred or thousand "important features." You should think of different ways to do such a selection and conduct experiments to compare with existing results.
You might want to check the thesis in ~cjlin/software/svm/libsvm/papers/MatthiasHeiler.ps.gz. An implementation is in ~cjlin/software/svm/libsvm/fselect.tgz.
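One simple selection criterion, used in F-score-based feature selection tools for libsvm (the exact formula in fselect.tgz may differ slightly; treat this as a sketch), measures each feature's between-class separation relative to its within-class spread:

```python
def f_score(pos, neg):
    """F-score of one feature: between-class separation over within-class spread.
    pos, neg: this feature's values in the positive and negative class."""
    def mean(v):
        return sum(v) / len(v)
    def var_sum(v, m):
        # unbiased sample variance of v around its class mean m
        return sum((x - m) ** 2 for x in v) / (len(v) - 1)
    mp, mn = mean(pos), mean(neg)
    ma = mean(pos + neg)
    return ((mp - ma) ** 2 + (mn - ma) ** 2) / (var_sum(pos, mp) + var_sum(neg, mn))

# A feature that separates the classes scores far higher than a noisy one.
informative = f_score([5.0, 5.1, 4.9], [1.0, 1.1, 0.9])
noisy = f_score([1.0, 5.0, 3.0], [2.0, 4.0, 3.0])
print(informative > noisy)  # -> True
```

Ranking the 100,000+ features by such a score and keeping the top few hundred or thousand is one baseline to compare your own selection ideas against.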
Multi-labelling is a different type of classification problem. A data instance may belong to more than one class. This project concerns how to extend libsvm for such problems. First you need to collect some multi-labelling data sets. This can be done by checking some earlier work such as paper1, paper2, and the references therein. The only multi-label problem in the UCI machine learning repository is "university," which could also be considered.
Libsvm uses the so-called 1vs1 (pairwise) voting strategy for multi-class classification. In the prediction phase, k(k-1)/2 votes are cast among the k classes, and we pick the class with the largest number of votes. The goal of this project is to construct a multi-labelling system based on the current 1vs1 strategy implemented in libsvm.
Now for multi-labelling problems, we may have to select the classes with the second or third highest votes as well. You need to design a new rule for this selection process and conduct experiments to compare with existing approaches. For example, for each (C, g), you use four folds as training and one fold as validation. Using the four folds, you first train k(k-1)/2 models. With all f_ij and x as attributes and the number of classes as the target label, you conduct SVR or multi-class SVM to obtain a model for predicting the number of classes. Then, for each instance of the validation set, you first calculate the k(k-1)/2 decision values and then predict the number of classes.
When conducting the pairwise training, if x is in both class1 and class2, in the training file you can directly have
class1 x
class2 x
Is this right or not? For the binary problem of class1 vs. class2, should we exclude data which are in both classes? By excluding data in both classes, we hope that when these data are put into the model, the probability of being in either class is 0.5. If they are included, most likely the separating plane cuts through them, so the same effect seems to occur. From this point of view, it may be fine to include data in the two classes.
Note that once the number of classes (labels) is determined, there are different ways to do the prediction. One is to simply select the classes with the highest votes. Another is to sequentially remove the votes involving the class picked earlier and then select the highest from the remaining. For example, if we would like to choose two classes and class 1 is the one with the highest votes, we remove all results of 1vs2, ..., 1vsk before selecting the other class.
You should also consider some extreme situations. For example, if x_i, i = 1, ..., l are all in classes 1, 2, and 3, will your system be able to correctly label the data? How about existing approaches?
You may want to use the new version of libsvm (2.5), in which decision values can be easily accessed. In earlier versions, decision values were not available to users. We have both C++ and python subroutines to access them. You may also want to check the svmprob example in ~cjlin/htdocs/svmprob/svmprob-1.0.tgz.
The evaluation of such data sets is also a concern. Different criteria may have to be tried. You may then compare your approach with the so-called "binary approach" mentioned in the paper by Elisseeff and Weston. This "binary approach" is based on "1 vs the rest." You can have a modification using 1vs1 as well.
The authors have provided all the data on the web. You have to transform it to our format. Reproducing the same results as in the paper is the first step; then we look for any possible improvements.
It seems that for this application, the number of support vectors is a concern. We may apply some novel techniques to improve this aspect.
Finding some tools for analyzing memory usage or data locality may help.
For this project, you would like to focus on one or two topics and propose your own methods to solve them.
Several papers have shown ways to do transductive SVMs. For example, the software svmlight has an implementation. In this project, you would re-implement existing methods (by modifying libsvm), conduct experiments, and, if possible, propose new approaches. Note that efficient implementation is not the most important thing here, as we focus on getting some observations. Hence you may use the python interface to save implementation time.
You may want to think about multi-class transductive learning as well.