Homework 4

You can find classification data sets in, for example, the UCI Machine Learning Repository or Statlog. One problem when running experiments is that the data in these sources may come in different input formats.

For this homework, choose any two different formats from these sources and write R code to read files in those formats. Then, in your report, describe your experience and discuss what the best way of handling such different formats might be.
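As a rough illustration of what reading two formats could look like, here is a minimal sketch in R. The two formats are assumptions chosen for the example, not ones prescribed by the homework: a comma-separated file with the class label in the last column, and a sparse text file with lines of the form "label index:value ...". The function names read_dense and read_sparse are hypothetical.

```r
# Format A (assumed): comma-separated, no header, class label in the last column.
read_dense <- function(path) {
  df <- read.csv(path, header = FALSE, stringsAsFactors = FALSE)
  p <- ncol(df)
  list(x = as.matrix(df[, -p, drop = FALSE]), y = factor(df[[p]]))
}

# Format B (assumed): sparse text, one instance per line,
# written as "label index:value index:value ..." with 1-based indices.
read_sparse <- function(path) {
  lines <- trimws(readLines(path))
  lines <- lines[nzchar(lines)]                  # drop empty lines
  tokens <- strsplit(lines, "[[:space:]]+")
  y <- factor(vapply(tokens, function(t) t[1], character(1)))
  pairs <- lapply(tokens, function(t) {
    kv <- strsplit(t[-1], ":", fixed = TRUE)
    list(idx = as.integer(vapply(kv, function(p) p[1], character(1))),
         val = as.numeric(vapply(kv, function(p) p[2], character(1))))
  })
  # The number of columns is the largest feature index seen in the file.
  p <- max(unlist(lapply(pairs, function(r) r$idx)))
  x <- matrix(0, nrow = length(pairs), ncol = p)
  for (i in seq_along(pairs)) x[i, pairs[[i]]$idx] <- pairs[[i]]$val
  list(x = x, y = y)
}

# Small made-up files, written to temporary paths for the demonstration.
f_dense <- tempfile()
writeLines(c("5.1,3.5,1.4,a", "6.2,2.9,4.3,b"), f_dense)
f_sparse <- tempfile()
writeLines(c("+1 1:0.5 3:2", "-1 2:1.5"), f_sparse)

dense <- read_dense(f_dense)
sparse <- read_sparse(f_sparse)
```

Both readers return the same shape of result (a numeric matrix x and a factor y), which already hints at one possible answer to the design question in this homework. Note that this sketch expands the sparse data into a dense matrix, which may be impractical for large, high-dimensional files.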

Writing R code for two formats is easy; the main purpose is to make you think about how to handle multiple formats in data mining/machine learning software. Feel free to write what you think is a good approach. For example, you may want to define an internal object so that all data read are converted to this type. Alternatively, you can propose that each method take care of this itself, so that each method contains an if statement and handles the data differently depending on its type.
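The two designs mentioned above can be contrasted in a short R sketch. Everything named here (the "dataset" class, new_dataset, class_counts, class_counts_any, and the labels field of the sparse representation) is hypothetical and only meant to make the trade-off concrete; it is not a recommended answer.

```r
# Option 1: one internal type. Every reader converts to it once.
new_dataset <- function(x, y) {
  x <- as.matrix(x)
  stopifnot(is.numeric(x), nrow(x) == length(y))
  structure(list(x = x, y = factor(y)), class = "dataset")
}

# A method written against the internal type needs no format checks.
class_counts <- function(d) {
  stopifnot(inherits(d, "dataset"))
  table(d$y)
}

# Option 2: each method branches on the type of input it receives.
class_counts_any <- function(data) {
  if (is.data.frame(data)) {
    # Assumption: the class label is the last column of the data frame.
    table(factor(data[[ncol(data)]]))
  } else if (is.list(data) && !is.null(data$labels)) {
    # Assumption: a sparse representation stored as a list with a 'labels' field.
    table(factor(data$labels))
  } else {
    stop("unsupported data type")
  }
}

# Made-up data for the demonstration.
df <- data.frame(f1 = c(1, 2, 3), f2 = c(0, 1, 0), label = c("a", "b", "a"))
sparse_like <- list(labels = c("a", "b", "a"))

d <- new_dataset(df[, 1:2], df$label)
counts_internal <- class_counts(d)
counts_df <- class_counts_any(df)
counts_sparse <- class_counts_any(sparse_like)
```

With the first option, supporting a new format means writing one converter; with the second, every method gains another branch. On the other hand, converting everything to one type may discard useful structure, such as sparsity.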

Again, the report should be at most 2 pages and written in English.


Last modified: Thu Mar 11 20:53:49 CST 2004