Homework 2

Consider the covertype data from the UCI machine learning repository. One attribute, Soil_Type, has 40 possible values, but in the data it is encoded as 40 binary attributes. We are interested in which representation is better: a single nominal attribute with 40 possible values, or 40 binary attributes.
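
To make the two representations concrete, here is a minimal sketch of converting between them. It assumes the 40 binary Soil_Type columns have already been extracted into an array with one row per instance; the function names binary_to_nominal and nominal_to_binary are hypothetical, and numbering the nominal values 1 to 40 is an assumption made for illustration.

```python
import numpy as np

def binary_to_nominal(soil_bits):
    """Collapse one-hot rows (n x 40) into one nominal value per row, in 1..40."""
    soil_bits = np.asarray(soil_bits)
    # Assumption: every instance has exactly one soil-type bit set to 1.
    if not np.all(soil_bits.sum(axis=1) == 1):
        raise ValueError("each row must have exactly one active soil-type bit")
    # argmax gives the 0-based position of the single 1 in each row.
    return soil_bits.argmax(axis=1) + 1

def nominal_to_binary(soil_type, num_values=40):
    """Expand nominal values in 1..num_values into one-hot rows."""
    soil_type = np.asarray(soil_type)
    bits = np.zeros((soil_type.size, num_values), dtype=int)
    # Set the column matching each instance's soil type.
    bits[np.arange(soil_type.size), soil_type - 1] = 1
    return bits
```

For example, nominal_to_binary([1, 40, 7]) yields three rows of 40 bits each, and binary_to_nominal applied to that result recovers the original values. How your classifier should be told that the single attribute is nominal rather than numeric depends on the classifier and is not covered by this sketch.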

This data set is huge. To do the comparison, first randomly select 3,000 instances for training and another 3,000 instances for testing. Then run the experiment with the same classifier you used for hw1.
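
One possible way to draw the two samples is sketched below. It assumes you know the number of instances in the file and that the training and testing sets should not share instances; the function name split_indices and the seed parameter are hypothetical, chosen only so that the split can be reproduced.

```python
import numpy as np

def split_indices(num_rows, train_size=3000, test_size=3000, seed=0):
    """Return disjoint arrays of row indices for training and testing."""
    if train_size + test_size > num_rows:
        raise ValueError("not enough rows for the requested split")
    rng = np.random.default_rng(seed)
    # Sampling without replacement guarantees no index is picked twice,
    # so the two slices below cannot overlap.
    chosen = rng.choice(num_rows, size=train_size + test_size, replace=False)
    return chosen[:train_size], chosen[train_size:]
```

Using the same index arrays for both encodings keeps the comparison fair, since the only difference between the two runs is then how Soil_Type is represented.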

Write a short report (at most 2 pages, in English) describing what you find.


Last modified: Tue Feb 28 07:02:04 CST 2006