This guide assumes that Spark home is ~/spark-1.0.0-bin-hadoop1/ and Hadoop home is ~/hadoop-1.2.1/.
On some installations Spark is at ~/spark/ instead of ~/spark-1.0.0-bin-hadoop1/,
while Hadoop is placed at ~/ephemeral-hdfs/ instead of ~/hadoop-1.2.1/.
Adjust the directory names, paths, master name, and user name in this guide to match your setup.
To keep the examples simple,
we assume the master machine is named "pineapple0" and the user is named "spongebob".
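Since every path and name in this guide may differ on your cluster, one option is to define them once as shell variables. The following is a minimal sketch under the guide's assumptions. The variable names (SPARK_DIR, HADOOP_DIR, MASTER_NAME, USER_NAME, LIBLINEAR_DIR, LIBLINEAR_JAR) are hypothetical and are not read by Spark or Hadoop themselves.

```shell
# Hypothetical variable names for illustration; edit the values for your cluster.
SPARK_DIR="$HOME/spark-1.0.0-bin-hadoop1"   # Spark home assumed in this guide
HADOOP_DIR="$HOME/hadoop-1.2.1"             # Hadoop home assumed in this guide
MASTER_NAME="pineapple0"                    # master machine name
USER_NAME="spongebob"                       # user name

# Locations that follow from moving spark-liblinear-1.95/ into the Spark home.
LIBLINEAR_DIR="$SPARK_DIR/spark-liblinear-1.95"
LIBLINEAR_JAR="$LIBLINEAR_DIR/spark-liblinear-1.95.jar"

echo "Spark home:  $SPARK_DIR"
echo "Hadoop home: $HADOOP_DIR"
echo "Master/user: $MASTER_NAME / $USER_NAME"
echo "Jar file:    $LIBLINEAR_JAR"
```

With these set, a literal path in a later command can be replaced by the variable, for example "$HADOOP_DIR/bin/start-all.sh".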
Extract the Spark LIBLINEAR package and move it into the Spark home directory:
$ cd ~
$ tar zxvf spark-liblinear-1.95.tar.gz
$ mv spark-liblinear-1.95/ ~/spark-1.0.0-bin-hadoop1/
The package includes the jar file spark-liblinear-1.95.jar.
You can find it at ~/spark-1.0.0-bin-hadoop1/spark-liblinear-1.95/spark-liblinear-1.95.jar.
If you want to package the jar yourself,
see the README file at ~/spark-1.0.0-bin-hadoop1/spark-liblinear-1.95/README.spark.
Start Hadoop:
$ ~/hadoop-1.2.1/bin/start-all.sh
This guide uses the heart_scale dataset as an example.
You can find heart_scale in the Spark LIBLINEAR directory
and put it into HDFS with
$ hadoop fs -put ~/spark-1.0.0-bin-hadoop1/spark-liblinear-1.95/heart_scale heart_scale
Check that the file is now in HDFS:
$ hadoop fs -ls
Start Spark:
$ ~/spark-1.0.0-bin-hadoop1/sbin/start-all.sh
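The heart_scale file is plain text. Assuming it follows the usual LIBSVM format, each line holds a label followed by sparse index:value pairs. The sketch below splits one such line with POSIX shell parameter expansion. The helper name show_libsvm_line is hypothetical, and the sample line is made up for illustration rather than copied from heart_scale.

```shell
# show_libsvm_line is a hypothetical helper for illustration only.
# It prints the label and each index:value pair of one LIBSVM-format line.
show_libsvm_line() {
    line=$1
    label=${line%% *}                    # text before the first space
    case $line in
        *' '*) features=${line#* } ;;    # text after the first space
        *)     features= ;;              # line with a label only
    esac
    echo "label=$label"
    for pair in $features; do
        echo "feature ${pair%%:*} = ${pair#*:}"
    done
}

# Made-up sample line, not taken from heart_scale.
show_libsvm_line '+1 1:0.5 3:-1 7:0.25'
```

To look at real lines, run head -n 3 ~/spark-1.0.0-bin-hadoop1/spark-liblinear-1.95/heart_scale on the local copy.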
Start the Spark shell with the Spark LIBLINEAR jar:
$ cd ~/spark-1.0.0-bin-hadoop1/
$ ./bin/spark-shell --jars "/home/spongebob/spark-1.0.0-bin-hadoop1/spark-liblinear-1.95/spark-liblinear-1.95.jar"
Import Spark LIBLINEAR:
scala> import tw.edu.ntu.csie.liblinear._
Load the data from HDFS with loadLibSVMData:
scala> val data = Utils.loadLibSVMData(sc, "hdfs://pineapple0:9000/user/spongebob/heart_scale")
Train a model with train():
scala> val model = SparkLiblinear.train(data, "-s 0 -c 1.0 -e 1e-2")
Predict labels with predict() and compute the training accuracy:
scala> val LabelAndPreds = data.map { point =>
         val prediction = model.predict(point)
         (point.y, prediction)
       }
scala> val accuracy = LabelAndPreds.filter(r => r._1 == r._2).count.toDouble / data.count
scala> println("Training Accuracy = " + accuracy)
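The HDFS URL passed to loadLibSVMData follows from how the file was uploaded: the destination heart_scale in the hadoop fs -put command is a relative path, so HDFS stores it under the user's home directory, /user/<user name>. The sketch below composes the URL from its parts. The function name hdfs_url is hypothetical, and port 9000 is taken from the URL in this guide; on your cluster it is whatever fs.default.name in Hadoop's core-site.xml specifies.

```shell
# hdfs_url is a hypothetical helper, not part of Hadoop or Spark LIBLINEAR.
# Usage: hdfs_url MASTER PORT USER RELATIVE_PATH
hdfs_url() {
    printf 'hdfs://%s:%s/user/%s/%s\n' "$1" "$2" "$3" "$4"
}

# Values assumed throughout this guide.
hdfs_url pineapple0 9000 spongebob heart_scale
```

If your master name, port, or user name differs, pass your own values and use the resulting string as the second argument of loadLibSVMData.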