Homework 1

Go to the UCI Machine Learning Repository and download two data files: german.data and german.data-numeric.

They contain the same data in two different formats: nominal and numerical. Convert each of them to a file in ARFF format. In the first file, declare the attributes as nominal; in the second, declare everything as numeric.
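As a rough illustration of the conversion step, here is a minimal Python sketch that turns a whitespace-separated numeric file into ARFF text. The function names (to_arff, read_rows), the attribute names a1..aN, and the output file name are made up for this example and are not part of Weka or the UCI data. It assumes the last column of each row is the class. Since J48 is a classifier for nominal classes, the class column may need to be declared nominal even in the numeric file; the sketch leaves that choice to the caller. For the nominal file, the same layout applies, with declarations of the form "@attribute name {v1,v2,...}" in place of "numeric".

```python
def to_arff(relation, rows, class_values=None):
    """Render rows (lists of strings) as ARFF text; the last column is the class."""
    n_cols = len(rows[0])
    lines = ["@relation " + relation, ""]
    # Hypothetical attribute names a1..aN for every column except the class.
    for i in range(1, n_cols):
        lines.append("@attribute a%d numeric" % i)
    if class_values is None:
        # Declare the class as numeric too ("everything is numeric").
        lines.append("@attribute class numeric")
    else:
        # Declare the class as nominal with the given labels.
        lines.append("@attribute class {%s}" % ",".join(class_values))
    lines.append("")
    lines.append("@data")
    for row in rows:
        if len(row) != n_cols:
            raise ValueError("row has %d fields, expected %d" % (len(row), n_cols))
        lines.append(",".join(row))
    return "\n".join(lines) + "\n"


def read_rows(path):
    """Read a whitespace-separated data file, skipping blank lines."""
    with open(path) as f:
        return [line.split() for line in f if line.strip()]
```

A possible use, assuming german.data-numeric is in the current directory and the class labels are 1 and 2:

    rows = read_rows("german.data-numeric")
    open("german-numeric.arff", "w").write(to_arff("german-numeric", rows, ["1", "2"]))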

Split each file so that 500 instances are used for training and 500 for testing. Use the C4.5 implementation in Weka to train on and classify these two files. The C4.5 implementation in Weka is called weka.classifiers.j48.J48. Then write a short report (<= 2 pages) in English describing what you find.
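The split can be sketched in Python as follows. The function name split_arff is hypothetical, and the assignment does not say whether the split should be random or simply the first 500 and the last 500 rows, so the sketch supports both: without a seed it keeps the file order, with a seed it shuffles the rows first. Both output files repeat the full ARFF header.

```python
import random


def split_arff(text, n_train, seed=None):
    """Split ARFF text into (train_text, test_text), each with the same header."""
    lines = text.splitlines()
    # Everything up to and including the @data line is the header.
    idx = next(i for i, l in enumerate(lines) if l.strip().lower() == "@data")
    header = lines[: idx + 1]
    # Keep only real data rows: drop blank lines and % comments.
    data = [l for l in lines[idx + 1:]
            if l.strip() and not l.lstrip().startswith("%")]
    if seed is not None:
        # Assumption: a seeded shuffle, so the split is random but repeatable.
        random.Random(seed).shuffle(data)
    train, test = data[:n_train], data[n_train:]
    return ("\n".join(header + train) + "\n",
            "\n".join(header + test) + "\n")
```

With 1000 rows and n_train=500, this yields two files of 500 instances each. Weka's command-line classifiers take the training file with -t and a separate test file with -T.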

How do you run Weka on our Linux system?
If you use IBM Java:

  1. Download Weka
  2. Unpack the archive
    /opt/IBMJava2-13/bin/jar xvf weka-3-2-1.jar
  3. Test their sample data
    /opt/IBMJava2-13/bin/java -cp weka.jar weka.classifiers.j48.J48 -t ./data/iris.arff

If you use Sun Java (i.e. /usr/bin/java and jar):

  1. Download Weka
  2. Unpack the archive
    jar xvf weka-3-2-1.jar
  3. Test their sample data
    java -classpath /usr/lib/jdk1.1/lib/classes.zip:weka.jar weka.classifiers.j48.J48 -t ./data/iris.arff

Last modified: Sat Oct 6 22:49:13 CST 2001