Homework 2
Consider the covertype data from the UCI Machine Learning Repository.
One attribute, Soil_Type, has 40 possible values.
In the data set it is currently encoded as 40 binary attributes.
We are interested in whether a single nominal
attribute with 40 possible values or 40 binary
attributes is better.
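One way to obtain the single nominal attribute is to collapse the 40 binary indicators back into one category index. The following is only a minimal sketch: the function name binary_to_nominal and the example row are hypothetical, and it assumes exactly one of the 40 indicators is set to 1 in each instance.

```python
def binary_to_nominal(indicators):
    """Collapse one-hot indicator values into a single 1-based category index."""
    # Positions of the indicators that are switched on.
    active = [i for i, v in enumerate(indicators) if v == 1]
    if len(active) != 1:
        raise ValueError("expected exactly one active indicator, got %d" % len(active))
    return active[0] + 1


# Hypothetical example: 40 Soil_Type indicators with the 7th one set.
soil_bits = [0] * 40
soil_bits[6] = 1
soil_type = binary_to_nominal(soil_bits)
print(soil_type)
```

The resulting index should be treated as a nominal (unordered) value by the classifier, not as a number.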
This data set is huge.
To do the comparison, first randomly select 3,000 instances
for training and another 3,000 instances for testing.
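The random selection can be sketched as drawing 6,000 distinct row indices at once and cutting them in half, so that the training and testing sets cannot overlap. This is an illustration only: split_indices, the seed, and the row count n_rows are hypothetical placeholders, and the assignment does not prescribe any particular tool.

```python
import random


def split_indices(n_rows, n_train=3000, n_test=3000, seed=0):
    """Return two disjoint lists of row indices, chosen uniformly at random."""
    if n_train + n_test > n_rows:
        raise ValueError("not enough rows for the requested split")
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    # sample() draws without replacement, so no index appears twice.
    chosen = rng.sample(range(n_rows), n_train + n_test)
    return chosen[:n_train], chosen[n_train:]


n_rows = 100000  # hypothetical; use the actual number of rows in your file
train_idx, test_idx = split_indices(n_rows)
```

Using the same two index lists for both encodings keeps the comparison fair, since the only thing that changes is how Soil_Type is represented.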
Then use the same classifier as in hw1 to run the
experiment with both encodings.
Write a short report (<= 2 pages, in English) describing
what you find.
Last modified: Tue Feb 28 07:02:04 CST 2006