Homework 1

Go to the UCI Machine Learning Repository and download the data files soybean-large.data and soybean-large.test.

Prepare two versions of the data in ARFF format: one with all attributes declared numeric and one with all attributes declared nominal.
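One way to produce both versions is sketched below. It assumes the UCI layout in which each line holds the class label first, followed by the comma-separated attribute values, with '?' marking a missing value. The helper name to_arff and the attribute names a1, a2, ... are hypothetical placeholders (the real names are listed in soybean-large.names). The class label is moved to the last column because Weka treats the last attribute as the class by default, and it stays nominal in the "numeric" version because J48 needs a nominal class. The nominal value sets are collected from every file you pass, so the training and test headers come out identical, which Weka expects when it evaluates on a separate test file.

```shell
# Sketch only: to_arff is a hypothetical helper. Converts a UCI soybean file
# (class label in column 1, '?' = missing) into ARFF on stdout.
# Attribute names a1, a2, ... are placeholders, not the real names.
#
# usage: to_arff MODE RELATION DATA_FILE [DOMAIN_FILE...]
#   MODE         "nominal" or "numeric" (attributes only; class stays nominal)
#   DATA_FILE    file whose rows go into the @data section
#   DOMAIN_FILE  files scanned for nominal value sets; pass both the training
#                and the test file so the two headers match
# An attribute that is '?' in every row would get an empty {} set; not handled.
to_arff() {
  mode=$1 rel=$2 data=$3
  shift 3
  if [ $# -eq 0 ]; then set -- "$data"; fi
  awk -F, -v mode="$mode" -v rel="$rel" '
    NF == 0 { next }                  # skip blank lines
    pass != 2 {                       # pass 1: record each column'"'"'s values
      if (NF - 1 > n) n = NF - 1
      for (i = 1; i <= NF; i++)
        if ($i != "?" && !((i, $i) in seen)) {
          seen[i, $i] = 1
          dom[i] = (dom[i] == "" ? "" : dom[i] ",") $i
        }
      next
    }
    !hdr {                            # pass 2: print header before first row
      print "@relation " rel "\n"
      for (i = 2; i <= n + 1; i++) {
        type = (mode == "numeric") ? "numeric" : "{" dom[i] "}"
        print "@attribute a" (i - 1) " " type
      }
      print "@attribute class {" dom[1] "}\n\n@data"
      hdr = 1
    }
    {                                 # move the class label to the last column
      row = ""
      for (i = 2; i <= NF; i++) row = row $i ","
      print row $1
    }' "$@" pass=2 "$data"
}
```

For example, after saving the function in a file such as to_arff.sh (the output file names are only suggestions):

    . ./to_arff.sh
    to_arff nominal soybean soybean-large.data soybean-large.data soybean-large.test > soybean-nominal.arff
    to_arff nominal soybean soybean-large.test soybean-large.data soybean-large.test > soybean-nominal-test.arff
    to_arff numeric soybean soybean-large.data soybean-large.data soybean-large.test > soybean-numeric.arff
    to_arff numeric soybean soybean-large.test soybean-large.data soybean-large.test > soybean-numeric-test.arff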

Use the C4.5 implementation in Weka to train a decision tree on each of these two files and to classify the test file soybean-large.test (converted to ARFF in the same way). The C4.5 implementation in Weka is called weka.classifiers.j48.J48. Then write a short report (at most 2 pages) in English describing what you find.

How to run Weka on our Linux system?
If you use IBM Java:

  1. Download Weka
  2. Unpack the distribution
    /opt/IBMJava2-13/bin/jar xvf weka-3-2-1.jar
  3. Test it on the sample data shipped with Weka
    /opt/IBMJava2-13/bin/java -cp weka.jar weka.classifiers.j48.J48 -t ./data/iris.arff

If you use Sun Java (i.e. /usr/bin/java and jar):

  1. Download Weka
  2. Unpack the distribution
    jar xvf weka-3-2-1.jar
  3. Test it on the sample data shipped with Weka
    java -classpath /usr/lib/jdk1.1/lib/classes.zip:weka.jar weka.classifiers.j48.J48 -t ./data/iris.arff
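The test commands above pass only -t, so by default Weka reports cross-validation results on iris.arff. For the homework, the tree should be trained on the soybean training file and evaluated on the separate test file, which Weka's -T option names. A minimal sketch follows; run_j48, JAVA and WEKA_CP are hypothetical names chosen here so that the same helper covers both the IBM and the Sun setups. It uses the Weka 3.2 class name given above (newer Weka releases call it weka.classifiers.trees.J48).

```shell
# Sketch only: run_j48 is a hypothetical helper. JAVA and WEKA_CP default to
# the Sun Java setup; override them for the IBM JVM or the JDK 1.1 classpath.
JAVA=${JAVA:-java}
WEKA_CP=${WEKA_CP:-weka.jar}

# usage: run_j48 TRAIN.arff TEST.arff [extra J48 options...]
# -t names the training file and -T the separate test file, so the reported
# test-set results come from TEST.arff rather than from cross-validation.
run_j48() {
  train_arff=$1 test_arff=$2
  shift 2
  "$JAVA" -cp "$WEKA_CP" weka.classifiers.j48.J48 -t "$train_arff" -T "$test_arff" "$@"
}
```

For example, with the Sun JDK 1.1 classpath from step 3 (for the IBM JVM, set JAVA=/opt/IBMJava2-13/bin/java instead; the ARFF file names are only suggestions):

    WEKA_CP=/usr/lib/jdk1.1/lib/classes.zip:weka.jar
    run_j48 soybean-nominal.arff soybean-nominal-test.arff > nominal.txt
    run_j48 soybean-numeric.arff soybean-numeric-test.arff > numeric.txt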

Last modified: Sat Oct 6 22:49:13 CST 2001