Homework 8

We will use the same data as in HW 4, but treat it as a multi-class problem. The data file is here.

We would like to try k-NN and random forest. To evaluate these methods, randomly select 520000 instances for training and use the remaining instances for testing. Since the classes are imbalanced, the split should be "stratified": each class should appear in the training and test sets in roughly the same proportion as in the full data set.
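The assignment does not prescribe how to implement the stratified split. As a minimal sketch using only the Python standard library (the helper name stratified_split and its parameters are hypothetical), one approach is to group example indices by class, give each class a training quota proportional to its size, round the quotas so they sum to exactly the requested training size, and then sample randomly within each class:

```python
import random
from collections import defaultdict

def stratified_split(labels, n_train, seed=0):
    """Hypothetical helper (not part of the assignment).

    Return (train_idx, test_idx) such that each class appears in the
    training set in (approximately) the same proportion as in the full data.
    """
    n = len(labels)
    if not 0 <= n_train <= n:
        raise ValueError("n_train must be between 0 and len(labels)")
    if n == 0:
        return [], []
    rng = random.Random(seed)

    # Group example indices by class label (insertion order = first appearance).
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    classes = list(by_class)

    # Proportional quota per class, rounded down with exact integer arithmetic.
    counts = {c: len(by_class[c]) for c in classes}
    quota = {c: n_train * counts[c] // n for c in classes}
    remainder = {c: n_train * counts[c] % n for c in classes}

    # Give the leftover slots to the classes with the largest remainders
    # so that the quotas sum to exactly n_train (largest-remainder rounding).
    leftover = n_train - sum(quota.values())
    for c in sorted(classes, key=lambda c: remainder[c], reverse=True)[:leftover]:
        quota[c] += 1

    train_idx, test_idx = [], []
    for c in classes:
        idx = by_class[c][:]
        rng.shuffle(idx)  # random selection within each class
        train_idx.extend(idx[:quota[c]])
        test_idx.extend(idx[quota[c]:])
    return sorted(train_idx), sorted(test_idx)
```

Assuming y is the list of class labels, a call such as `train_idx, test_idx = stratified_split(y, 520000)` would produce the required split. Libraries such as scikit-learn offer the same idea through the `stratify` argument of `train_test_split`.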

As in earlier assignments, if the data set is too large to handle, start with subsets of the training data and gradually increase their size.
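A minimal sketch of the "start small, then grow" strategy, assuming the features are numeric tuples and using a brute-force k-NN written in plain Python (the function names knn_predict and accuracy_on_growing_subsets are hypothetical). For brevity the subsets here are prefixes of one random permutation, so each subset contains the previous one; a stratified draw would match the assignment more closely:

```python
import math
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Hypothetical helper: brute-force k-NN with Euclidean distance.

    Majority vote among the k nearest training points; among tied labels,
    the one whose neighbour is closest wins.
    """
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    best = max(votes.values())
    for i in nearest:  # nearest is ordered by distance
        if votes[train_y[i]] == best:
            return train_y[i]

def accuracy_on_growing_subsets(train_X, train_y, test_X, test_y, sizes, k=3, seed=0):
    """Hypothetical helper: evaluate k-NN on nested random training subsets.

    Returns a list of (subset_size, test_accuracy) pairs, one per entry of
    sizes, so you can stop growing once training or prediction gets too slow.
    """
    order = list(range(len(train_X)))
    random.Random(seed).shuffle(order)
    results = []
    for m in sizes:
        idx = order[:m]  # nested: each subset contains the previous one
        sub_X = [train_X[i] for i in idx]
        sub_y = [train_y[i] for i in idx]
        correct = sum(knn_predict(sub_X, sub_y, x, k) == y for x, y in zip(test_X, test_y))
        results.append((m, correct / len(test_y)))
    return results
```

The same loop can wrap a random forest (or any other classifier) by replacing the call to knn_predict; watching how accuracy changes with subset size also suggests whether using the full 520000 training instances is worth the cost.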


Last modified: Sun May 1 20:04:50 CST 2005