Homework 3
In homework 2 you may have seen that the performance
after selecting 200 features is still not good.
Remember that we used the software libsvm
with the following parameters:
svm-train -c 32 -g 0.0001220703125 thrombin
(that is, C = 2^5 and g = 2^-13).
We suspect that maybe we did not select good parameters.
We would like to try the following two things:
- Try different combinations of C and g on the training
data and predict the test data, then report the best
result you can get. For example, if you test
g = [2^4, 2^3, ..., 2^-10] and C = [2^12, 2^11, ..., 2^-2],
then there are 15 x 15 = 225 combinations. For each combination,
train on the training data, predict the test data, and write down
the result. Then report the best of all 225 results.
This can be considered the best result obtainable
under the current setting.
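The grid search above can be sketched as follows. This is a minimal
sketch: the exact command lines and file names (thrombin training/test
files, output path) are assumptions about your local setup, so the
actual svm-train/svm-predict calls are left as comments.

```python
# Sketch of the 15 x 15 grid search over (C, g) described above.
import itertools

# g = 2^4, 2^3, ..., 2^-10 and C = 2^12, 2^11, ..., 2^-2
g_values = [2.0 ** e for e in range(4, -11, -1)]
c_values = [2.0 ** e for e in range(12, -3, -1)]
combinations = list(itertools.product(c_values, g_values))
print(len(combinations))  # 15 x 15 = 225 combinations

for C, g in combinations:
    # For each pair, one would run (hypothetical paths/file names):
    #   svm-train -c {C} -g {g} thrombin
    #   svm-predict thrombin.test thrombin.model out
    # then record the weighted accuracy on the test data.
    pass
# Finally, report the best of the 225 recorded results.
```

Each libsvm run is independent, so if several machines are available
the 225 runs can simply be split among them.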
- Conduct five-fold cross validation on the training
data using different criteria. See whether any
criterion returns parameters close
to the optimal ones you found above.
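Five-fold cross validation partitions the training data into five
folds and, for each fold in turn, trains on the other four and
validates on the held-out one. A small sketch of the index splitting
(the helper name is an assumption; any scoring criterion can then be
applied to the five validation predictions):

```python
def five_fold_splits(n, k=5):
    """Yield (train_idx, val_idx) pairs that partition range(n)
    into k folds; each fold serves as the validation set once."""
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    idx = list(range(n))
    start = 0
    for size in fold_sizes:
        val = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, val
        start += size
```

Running this grid-by-grid for each (C, g) pair gives a
cross-validation score per combination, which can then be compared
with the test-set optimum found in the first part.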
For consistency, please use the following
200-feature training and
testing files prepared by Yien (yien@csie):
training
and
testing.
Running so many combinations may take a few hours, so you
will want to start this homework as early as possible.
Write a short report (<= 2 pages) in English about what you find.
Note that the error rate is counted in a different
way: from the KDD Cup homepage, "if
there are 10 actives and 100 inactives in the test set, then each
active will effectively count 10 times as much as each
inactive."
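One way to implement this weighting: with 10 actives and 100
inactives, each active carries weight 100/10 = 10, so the two classes
contribute equally overall and the score reduces to the average of the
per-class accuracies. A sketch (the function name and the 1/-1 labels
for active/inactive are assumptions):

```python
def weighted_accuracy(y_true, y_pred, active=1):
    """Accuracy where actives are up-weighted by the inactive/active
    count ratio, per the KDD Cup rule quoted above; equivalent to
    averaging the accuracy on each class."""
    actives = [(t, p) for t, p in zip(y_true, y_pred) if t == active]
    inactives = [(t, p) for t, p in zip(y_true, y_pred) if t != active]
    acc_active = sum(t == p for t, p in actives) / len(actives)
    acc_inactive = sum(t == p for t, p in inactives) / len(inactives)
    return (acc_active + acc_inactive) / 2
```

For example, predicting both actives correctly but only half the
inactives gives (1.0 + 0.5) / 2 = 0.75, not the plain accuracy 4/6.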
For calculating cross-validation accuracy using different
criteria, you can modify the program svm-train.c,
in particular lines 144 to 157.
Last modified: Mon Oct 29 19:11:36 CST 2001