Homework 3

After the previous homework, you should already have code for quickly reading the files in this directory. We would now like to try a larger data set: 50000 training and 50000 testing instances. The files are ijcnn and ijcnn1.t.

Compare your code for reading the data with the subroutine read.matrix.csr2 in the following R code. Show which one is faster and discuss the reasons.
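As an illustration of what such a comparison involves, here is a minimal Python sketch of a sparse-file reader together with a simple timing helper. It is not the R subroutine and not a reference solution. It assumes each line has the form "label index:value index:value ..." with 1-based indices, which the handout does not state explicitly; the names parse_sparse_lines and time_call are hypothetical.

```python
import time


def parse_sparse_lines(lines):
    """Parse 'label index:value ...' lines into CSR-style arrays.

    Assumption: feature indices in the file are 1-based.
    """
    labels, values, col_idx, row_ptr = [], [], [], [0]
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        labels.append(float(tokens[0]))
        for tok in tokens[1:]:
            idx, val = tok.split(":")
            col_idx.append(int(idx) - 1)  # store 0-based column indices
            values.append(float(val))
        # row_ptr[i + 1] marks where row i ends in values/col_idx
        row_ptr.append(len(values))
    return labels, values, col_idx, row_ptr


def time_call(func, *args, repeats=3):
    """Return the best wall-clock time (seconds) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


# Tiny made-up sample, only to show the shapes of the outputs.
sample = ["1 1:0.5 3:-1.2", "-1 2:2.0"]
labels, values, col_idx, row_ptr = parse_sparse_lines(sample)
elapsed = time_call(parse_sparse_lines, sample)
```

When comparing two readers, time each on the same file several times and keep the best run, since the first read is often slowed by disk caching. Typical sources of a speed difference are reading the file in one pass rather than line by line, and growing vectors incrementally rather than preallocating them.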

As in the previous homeworks, run random forest and report the accuracy.

We would also like to use R subroutines to analyze features. The data now consist of 22 features, which are in fact transformations of the original 5 features. Check the original training file and use R subroutines to plot the distributions of the five features. Describe what you find from these figures. For example, some features may be more important than others.

Note that in the original training set, the class label is at the end of each line.
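The idea of comparing feature distributions across classes can be sketched as follows. This is a Python illustration only, assuming the original file has whitespace-separated numeric fields with the class label as the last field (the handout states only that the label comes last). The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def split_features_and_label(line):
    """Split a line into (features, label); the label is the last field."""
    fields = line.split()
    return [float(x) for x in fields[:-1]], fields[-1]


def histogram(values, lo, hi, bins=10):
    """Count values in equal-width bins over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if width == 0:
            k = 0  # all values identical: put everything in the first bin
        else:
            k = min(int((v - lo) / width), bins - 1)  # hi goes in last bin
        counts[k] += 1
    return counts


def per_class_histograms(lines, feature, bins=10):
    """Histogram one feature separately for each class label."""
    by_class = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        feats, label = split_features_and_label(line)
        by_class[label].append(feats[feature])
    all_vals = [v for vals in by_class.values() for v in vals]
    lo, hi = min(all_vals), max(all_vals)  # shared range for all classes
    return {label: histogram(vals, lo, hi, bins)
            for label, vals in by_class.items()}


# Made-up toy data: two features followed by the class label.
toy = ["0.0 2.0 1", "1.0 2.5 -1", "0.25 3.0 1", "0.75 2.2 -1"]
hists = per_class_histograms(toy, feature=0, bins=2)
```

Using the same bin range for every class makes the histograms directly comparable. A feature whose per-class histograms barely overlap is likely to be more useful for classification than one whose histograms look the same for both classes.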

Write a short report (at most 2 pages, in English) describing what you find.

Last modified: Sat Mar 6 20:32:16 CST 2004