Homework 1
Go to the UCI machine learning repository and download two data files:
german.data and german.data-numeric.
They contain the same data in two different formats: nominal
and numerical. Convert each of them into a file in ARFF format.
For the first one, declare the attributes as nominal;
for the second one, declare every attribute as numeric.
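
A minimal sketch of the conversion, in Python, assuming each line of the
downloaded files is one instance with whitespace-separated values and the
class label in the last column. The names to_arff, read_rows, a1, a2, ...
and class are made up for illustration. Two choices here are assumptions,
not part of the assignment: columns whose values all parse as numbers stay
numeric even in the nominal file, and the class column is always declared
nominal, since a classification tree needs a discrete class.

```python
def _is_number(token):
    """Return True if the token parses as a number."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def read_rows(path):
    """Read whitespace-separated instances, skipping blank lines."""
    with open(path) as handle:
        return [line.split() for line in handle if line.strip()]


def to_arff(relation, rows, numeric_only=False):
    """Build ARFF text from rows of string tokens; last column is the class."""
    n_cols = len(rows[0])
    lines = ["@relation " + relation, ""]
    for col in range(n_cols):
        values = sorted({row[col] for row in rows})
        is_class = col == n_cols - 1
        # Attribute names are hypothetical; the data files carry no names.
        name = "class" if is_class else "a%d" % (col + 1)
        symbolic = not all(_is_number(v) for v in values)
        if is_class or (symbolic and not numeric_only):
            # Nominal attribute: list every value seen in the data.
            lines.append("@attribute %s {%s}" % (name, ",".join(values)))
        else:
            lines.append("@attribute %s numeric" % name)
    lines += ["", "@data"]
    lines += [",".join(row) for row in rows]
    return "\n".join(lines) + "\n"
```

For example, to_arff("german", read_rows("german.data")) would give the
nominal version, and to_arff("german-numeric",
read_rows("german.data-numeric"), numeric_only=True) the numeric one.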
Split each file into two parts: 500 instances for training and 500
for testing.
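
The split could be sketched as follows. The assignment does not say how to
choose the 500 training instances; this sketch assumes a random shuffle
with a fixed seed, and the function name split_arff is made up. The ARFF
header is copied into both output files. If both ARFF files list the
instances in the same order, using the same seed for both gives the same
partition.

```python
import random


def split_arff(arff_text, n_train=500, seed=0):
    """Split ARFF text into (train_text, test_text), repeating the header."""
    lines = arff_text.splitlines()
    # Everything up to and including the @data line is the header.
    data_at = next(i for i, line in enumerate(lines)
                   if line.strip().lower() == "@data")
    header = lines[:data_at + 1]
    rows = [line for line in lines[data_at + 1:] if line.strip()]
    # Assumption: a seeded random shuffle decides which rows go where.
    random.Random(seed).shuffle(rows)
    train, test = rows[:n_train], rows[n_train:]

    def assemble(part):
        return "\n".join(header + part) + "\n"

    return assemble(train), assemble(test)
```

With 1000 instances and n_train=500, the remaining 500 end up in the
testing file.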
Use the C4.5 implementation in Weka to train and classify these two files.
The C4.5 implementation in Weka is called weka.classifiers.j48.J48.
Then write a short report (<= 2 pages) in English
describing what you find.
How to run Weka on our Linux system?
If you use IBM Java:
- Download Weka
- Uncompress the archive
/opt/IBMJava2-13/bin/jar xvf weka-3-2-1.jar
- Test with the sample data
/opt/IBMJava2-13/bin/java -cp weka.jar weka.classifiers.j48.J48 -t ./data/iris.arff
If you use Sun Java (i.e. /usr/bin/java and jar):
- Download Weka
- Uncompress the archive
jar xvf weka-3-2-1.jar
- Test with the sample data
java -classpath /usr/lib/jdk1.1/lib/classes.zip:weka.jar weka.classifiers.j48.J48 -t ./data/iris.arff
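
The sample commands above only train on one file. For the homework, a
separate testing file can be passed with Weka's -T option in addition to
-t. A small Python sketch that builds and runs such a command is given
here; the function names and the default classpath are assumptions, so
substitute the java path and classpath from the commands above.

```python
import subprocess


def j48_command(train_arff, test_arff, classpath="weka.jar", java="java"):
    """Return the argument list for training J48 and testing on a second file."""
    return [java, "-classpath", classpath, "weka.classifiers.j48.J48",
            "-t", train_arff, "-T", test_arff]


def run_j48(train_arff, test_arff, **kwargs):
    """Run Weka and return everything it prints (tree and evaluation) as text."""
    result = subprocess.run(j48_command(train_arff, test_arff, **kwargs),
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            universal_newlines=True)
    return result.stdout
```

For instance, run_j48("train.arff", "test.arff",
java="/opt/IBMJava2-13/bin/java") would use the IBM Java from above; the
two ARFF file names are placeholders.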
Last modified: Sat Oct 6 22:49:13 CST 2001