Note that this course is highly research oriented. Take it only if you are interested in doing related research.
Every week we will randomly select one student to give a 10- to 15-minute
presentation (in English) about his/her homework.
Everyone has to turn in his/her homework before this
presentation.
Rules: You are not required to come every week. If you are
absent and are selected for presentation, you will be
required to present the following week. If you fail
to show up then, 5 points will be deducted from your
final score. On the other hand, every week we first ask for a
volunteer, who will get 3 bonus points.
When no one volunteers, anyone can be picked, whether or
not you have given a presentation before.
http://www.csee.usf.edu/%7Ehall/papers/plankton03a.pdf r93922020 r93944009
http://portal.acm.org/citation.cfm?id=958479 R93922038, R93942036
http://www.informedia.cs.cmu.edu/documents/acmm02_lin.pdf R93922013,R93922025
http://www.araa.asn.au/acra/acra2003/papers/42.pdf r92922006 and r93546015
http://springerlink.metapress.com/app/home/contribution.asp?wasp=4nc6a0wwrn2qyh8lwvtp&referrer=parent&backto=issue,105,169;journal,90,1768;linkingpublicationresults,1:105633,1 r91922113 r92922120
http://www.ai.univie.ac.at/~elias/publications/kne_ismir04.pdf p92922007 d93922011
http://www.comp.nus.edu.sg/~leews/publications/p31189-zhang.pdf r93922108, r93922140
http://www.sls.csail.mit.edu/sls/publications/2004/saenko_icmi_04.pdf r92922054,
- Use libsvm to train on combined_scale.bz2 and test on combined_scale.t.bz2.
Due to the time limit, we consider only the RBF kernel, which is the default kernel. Hence, you need to conduct model selection to choose C and gamma. This can be done with the file grid.py provided in libsvm (under the directory python). It calculates the cross-validation accuracy for different values of C and gamma (over a range you specify) and draws a contour plot. The usage of grid.py is described in the README file of the same directory. The libsvm option -m specifies how much memory (in MB) to use for the kernel cache. The default is 40, but you need more memory to save time. For this problem, -m 300 is enough.
To restrict the search space, you can use choose.py to choose a subset first. Once you know the promising region of parameters, run the search on the whole set.
After finding the best parameters, train on the whole training set and then predict the test set.
Note: model selection (i.e., cross validation) can be time consuming, so you should start this homework as early as possible.
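The coarse-then-fine search described above can be sketched in a few lines of Python. This is not grid.py itself, only an illustration of the idea. The names grid_search, refine, and toy_cv are hypothetical, and toy_cv is a made-up score function standing in for a real cross-validation run (for example, one that calls libsvm with its -v option on the chosen subset).

```python
import itertools
import math

def grid_search(cv_accuracy, log2c_values, log2g_values):
    """Return (best_accuracy, log2_C, log2_gamma) over an exponential grid.

    cv_accuracy is supplied by the caller (hypothetical here): it takes
    (C, gamma) and returns the cross-validation accuracy.
    """
    best = None
    for log2c, log2g in itertools.product(log2c_values, log2g_values):
        acc = cv_accuracy(2.0 ** log2c, 2.0 ** log2g)
        if best is None or acc > best[0]:
            best = (acc, log2c, log2g)
    return best

def refine(center, step, width=2):
    """Exponents on a finer grid centered on a coarse optimum."""
    return [center + step * k for k in range(-width, width + 1)]

def toy_cv(c, g):
    # Made-up stand-in for real cross validation; peaks at C=2^3, gamma=2^-5.
    return 100.0 - (math.log2(c) - 3) ** 2 - (math.log2(g) + 5) ** 2

# Coarse search (e.g. on a subset), then a finer search around the optimum.
coarse = grid_search(toy_cv, range(-5, 16, 2), range(3, -16, -2))
fine = grid_search(toy_cv, refine(coarse[1], 1), refine(coarse[2], 1))
```

The ranges here are illustrative. In practice the coarse pass would run on the subset and the fine pass on the whole training set.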
- The RBF kernel as an inner product when n = 2
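A numerical sanity check can help with this item. The sketch below assumes that n is the input dimension and that the kernel is K(x, y) = exp(-gamma * ||x - y||^2); neither is stated on this page. Expanding exp(2 * gamma * x'y) as a power series and applying the binomial theorem gives features indexed by (k, j), and a truncated version of that map should reproduce the kernel value closely. The names phi, rbf, and dot are hypothetical.

```python
import math

def rbf(x, y, gamma):
    """Assumed kernel form: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def phi(x, gamma, degree):
    """Truncated feature map for 2-D x, keeping monomials up to `degree`.

    Feature (k, j) is
      exp(-gamma*||x||^2) * sqrt((2*gamma)^k / k! * C(k, j)) * x1^j * x2^(k-j).
    """
    x1, x2 = x
    scale = math.exp(-gamma * (x1 * x1 + x2 * x2))
    feats = []
    for k in range(degree + 1):
        for j in range(k + 1):
            coef = math.sqrt((2 * gamma) ** k / math.factorial(k) * math.comb(k, j))
            feats.append(scale * coef * x1 ** j * x2 ** (k - j))
    return feats

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, y, gamma = (0.3, -0.5), (0.8, 0.1), 0.5
approx = dot(phi(x, gamma, 20), phi(y, gamma, 20))  # truncated inner product
exact = rbf(x, y, gamma)                            # kernel value
```

With enough terms the two numbers agree to within floating-point error. The exact map has infinitely many components.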
- Dual of combined loss function.
- We would like to study the effect of data scaling. In ~cjlin/htdocs/libsvmtools/binary there are some two-class problems in original and scaled formats. First randomly split each file into training and testing sets. Then conduct a grid search with cross validation on the training set and predict the testing set. You should observe that for non-scaled data it is more difficult to locate good parameters and the testing performance is not good. Write a short (2-page) report explaining what you find.
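For reference, linear scaling of each feature to [-1, 1] can be sketched as follows. This is only an illustration and not necessarily how the scaled files in that directory were produced. The function names are hypothetical, and mapping a constant feature to 0.0 is an arbitrary choice made here. The key point is that the ranges come from the training set and are then reused on the testing set.

```python
def fit_ranges(rows):
    """Per-feature (min, max), computed from the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]

def scale(rows, ranges, lower=-1.0, upper=1.0):
    """Map each feature linearly so its training range becomes [lower, upper]."""
    scaled = []
    for row in rows:
        out = []
        for value, (lo, hi) in zip(row, ranges):
            if hi == lo:
                out.append(0.0)  # constant feature: arbitrary choice in this sketch
            else:
                out.append(lower + (upper - lower) * (value - lo) / (hi - lo))
        scaled.append(out)
    return scaled

train = [[1.0, 200.0], [3.0, 1000.0], [2.0, 600.0]]
test = [[2.5, 400.0]]
ranges = fit_ranges(train)          # learned on training data
train_scaled = scale(train, ranges)
test_scaled = scale(test, ranges)   # same ranges reused for testing data
```

Without scaling, the second feature dominates the squared distance in the RBF kernel, which is one reason good values of gamma are harder to locate.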
- Derive the dual of SVR
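As a starting point, the commonly used epsilon-SVR primal is written below. The notation (l training points, feature map \phi, slack variables \xi and \xi^*) is an assumption and may differ from the one used in class.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*} \quad & \frac{1}{2} w^T w + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) \\
\text{subject to} \quad & w^T \phi(x_i) + b - y_i \le \epsilon + \xi_i, \\
& y_i - w^T \phi(x_i) - b \le \epsilon + \xi_i^*, \\
& \xi_i,\ \xi_i^* \ge 0, \quad i = 1, \dots, l.
\end{aligned}
```

The derivation then introduces one multiplier per constraint and sets the gradient of the Lagrangian with respect to w, b, \xi, and \xi^* to zero.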
- Consider the smallest four problems in software/svm/regession_data/newsvrdata. They are regression data sets. First randomly split each data set into training and testing sets. Then conduct cross validation on the training set to choose the best parameters. Write a short report (<= 2 pages) describing your results and experience.
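The random split can be done on the lines of a data file directly. The sketch below assumes one instance per line; the function name split_lines, the 20% test fraction, and the toy data are illustrative only. Fixing the seed makes the split reproducible, which helps when reporting results.

```python
import random

def split_lines(lines, test_fraction=0.2, seed=0):
    """Shuffle the instances and return (training_lines, testing_lines)."""
    rng = random.Random(seed)  # fixed seed gives a reproducible split
    shuffled = list(lines)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Toy data, one "label index:value" instance per line.
data = ["%d 1:%d" % (i, i * 2) for i in range(10)]
train, test = split_lines(data)
```

Cross validation for parameter selection then uses only the training lines, and the testing lines are used once at the end.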
Presentations will be on January 6, 2005 (30 minutes per group), with 3-4 persons per group. Each group has to turn in a report (<= 10 pages, no MS Word files please) by January 4, 12am. As these projects are highly research oriented, please start working on them as early as possible.
On December 3, each group gives a 30-minute presentation about its progress.
Last year one group (~/htdocs/courses/slt2003/projects/svmtoy.pdf) did some preliminary work, but they were unable to speed up testing. Indeed, there are some computer graphics techniques for approximating a nonlinear curve in a 3-D space.
The Java code of svm-toy is in the java directory of the libsvm package. You can contact the group who did this project last year and obtain their code. The evaluation is whether your code can really be released to the community.
The only judging criterion is whether your code is simple and good enough for release. It must be simple for future maintenance.
You should compare the number of iterations and the computational time under different parameters. However, it is impossible to analyze results under so many parameter settings. One possibility is to compare
You may also have to check the timing difference between libsvm and its R interface.
An earlier study of this data set is in the second part of the thesis at ~cjlin/latex/students/Bo-June.Chen.