Protein secondary structure prediction is a well-known problem in the bioinformatics
field. Because techniques for predicting an unknown 3-D protein structure from its
primary amino acid sequence are still immature, scientists first try to predict the
elements of protein secondary structure from the amino acid sequence.
However, secondary structure prediction is itself a difficult problem.
Before 1993, prediction accuracy was only slightly better than random
guessing. In 1993, Rost and Sander proposed the PHD system [Rost and Sander, 1993], which
improved accuracy significantly, from 64.3% to 70.8%, by using the evolutionary
information contained in multiple sequence alignments.
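A common way to turn a multiple sequence alignment into "evolutionary information" is a per-position amino acid frequency profile: for each alignment column, the fraction of aligned sequences carrying each residue. The Python sketch below shows only that profile step, as an illustration of the general idea; the function name `column_profile` and the toy alignment are hypothetical, and PHD's actual input encoding (window size, sequence weighting, extra inputs) is not reproduced here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def column_profile(alignment: list[str]) -> list[dict[str, float]]:
    """Relative frequency of each amino acid in every column of an alignment.

    Gaps ('-') and any symbol outside AMINO_ACIDS are skipped when counting.
    A column containing only gaps yields all-zero frequencies.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values())
        profile.append({aa: counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS})
    return profile

# Toy alignment of a query (first row) with two hypothetical homologs.
msa = ["MKV-L", "MRVAL", "MKIAL"]
profile = column_profile(msa)
print(round(profile[1]["K"], 3))  # 0.667 (K appears in 2 of 3 rows)
print(profile[3]["A"])            # 1.0 (the gap is ignored)
```

Each column of the resulting profile can then replace the single residue of the query as classifier input, which is how alignment information reaches the predictor.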
Protein secondary structure prediction has been tackled by numerous learning
algorithms, including neural networks, support vector machines (SVMs), and other well-known classifiers [Riis and Krogh, 1996; Cuff and Barton, 1999; Hua and Sun, 2001; Ward et al., 2003], and
has therefore become a classic benchmark for testing the effectiveness of new
techniques.
We conducted our experiments on RS126, the best-known data set in
protein secondary structure prediction. The
RS126 data set has been studied in many publications
[Riis and Krogh, 1996; Cuff and Barton, 1999; Hua and Sun, 2001; Ward et al., 2003] and can be downloaded from
this website. We also adopted the
same 7-fold partition used by Riis and Krogh [Riis and Krogh, 1996].
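With a fixed 7-fold partition, each of the seven subsets (which appear to be the rows Set A through Set G in the tables below) is used once as the test set while the remaining six form the training set. A minimal Python sketch of that rotation follows; the helper name `seven_fold_splits` is hypothetical, and the actual assignment of proteins to subsets comes from Riis and Krogh's partition, which is not reproduced here.

```python
# Labels of the seven predefined subsets (Set A to Set G in the tables).
SUBSETS = ["A", "B", "C", "D", "E", "F", "G"]

def seven_fold_splits(subsets: list[str]):
    """Yield (test_subset, training_subsets): each subset is held out exactly once."""
    for i, test in enumerate(subsets):
        yield test, subsets[:i] + subsets[i + 1:]

for test, train in seven_fold_splits(SUBSETS):
    print(f"test on Set {test}, train on Sets {', '.join(train)}")
```

Because the partition is fixed rather than drawn at random, every classifier is trained and tested on exactly the same splits, which keeps per-set comparisons meaningful.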
For the parameter settings of the classifiers, we used the grid.py
utility in the libsvm package to perform model selection for the SVM;
the utility selects the best parameter set from 90 parameter
combinations.
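grid.py performs an exhaustive search over the penalty parameter C and the RBF kernel width gamma on a base-2 logarithmic grid, scoring each pair by cross-validation accuracy. The Python sketch below illustrates that idea; the ranges are hypothetical, chosen only so the grid has 90 points (9 values of log2 C times 10 values of log2 gamma), since the text does not state which ranges were used. In practice each evaluation would be an SVM cross-validation run (for example libsvm's `svm-train -v`); here a toy scoring function stands in for it.

```python
import math
from typing import Callable

def grid_search(cv_accuracy: Callable[[float, float], float],
                log2_c_values: list[int],
                log2_g_values: list[int]) -> tuple[float, float, float]:
    """Evaluate every (C, gamma) pair and return (best_C, best_gamma, best_accuracy).

    cv_accuracy(C, gamma) should return a cross-validation accuracy.
    Ties keep the first pair encountered.
    """
    best = (math.nan, math.nan, -math.inf)
    for log2_c in log2_c_values:
        for log2_g in log2_g_values:
            c, g = 2.0 ** log2_c, 2.0 ** log2_g
            acc = cv_accuracy(c, g)
            if acc > best[2]:
                best = (c, g, acc)
    return best

# Hypothetical ranges: 9 x 10 = 90 combinations.
LOG2_C = list(range(-5, 13, 2))    # -5, -3, ..., 11  (9 values)
LOG2_G = list(range(3, -17, -2))   # 3, 1, ..., -15   (10 values)

def toy_accuracy(c: float, g: float) -> float:
    # Stand-in for a real cross-validation run; peaks at C = 2**3, gamma = 2**-7.
    return 75.0 - (math.log2(c) - 3) ** 2 - (math.log2(g) + 7) ** 2

best_c, best_g, best_acc = grid_search(toy_accuracy, LOG2_C, LOG2_G)
print(best_c, best_g, best_acc)  # 8.0 0.0078125 75.0
```

Searching on a log scale covers several orders of magnitude with few evaluations; a finer grid around the best point can follow if needed.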
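All experiments were run in the same environment on the same
data sets, so the comparison should be objective. The detailed
accuracy results are given in Table 1 and Table 2. As these two tables
show, the proposed method delivers essentially the same accuracy as LIBSVM:
in both tables the best QuickRBF average is slightly higher than LIBSVM's
(75.08% vs. 74.92% and 71.38% vs. 71.14%), and the lowest QuickRBF averages
(73.77% and 70.44%) trail LIBSVM by only 1.15 and 0.70 percentage points, respectively.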
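Secondary structure accuracies such as those quoted above are conventionally reported as per-residue three-state accuracy (Q3): the percentage of residues whose predicted state (helix H, strand E, or coil C) matches the observed state. Assuming the tables follow that convention (the text does not say so explicitly), Q3 can be computed as in this minimal Python sketch; the function name `q3_accuracy` and the example strings are hypothetical.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state (H, E or C) equals the observed state.

    For a whole test set, the per-protein state strings can be concatenated
    before calling this function, so that every residue counts equally.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("cannot compute accuracy of an empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

print(round(q3_accuracy("HHHECCC", "HHEECCC"), 2))  # 85.71 (6 of 7 residues correct)
```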
Table 1. Accuracy (%) on RS126
RS126    LIBSVM    QuickRBF  QuickRBF  QuickRBF  QuickRBF
                   All       12000     5000      1000
Set A    74.06     74.14     74.01     73.73     72.71
Set B    77.44     77.01     76.32     75.54     74.76
Set C    74.99     75.01     75.07     74.93     73.85
Set D    73.11     73.69     73.72     72.44     71.44
Set E    74.08     74.19     74.26     73.97     73.14
Set F    76.93     77.23     77.39     77.28     76.12
Set G    73.82     74.27     74.30     74.07     74.36
Average  74.92     75.08     75.01     74.57     73.77
Table 2. Accuracy (%) on RS126
RS126    LIBSVM    QuickRBF  QuickRBF  QuickRBF  QuickRBF
                   sv        15000     12000     8000
Set A    68.14     68.45     68.73     68.61     67.73
Set B    73.33     73.86     73.9      73.29     72.65
Set C    71.72     71.4      71.57     71.09     69.98
Set D    70.45     70.89     70.33     70.65     70.01
Set E    70.26     70.19     70.26     70.76     70.04
Set F    72.39     72.94     72.34     72.03     71.78
Set G    71.67     71.95     71.39     70.94     70.88
Average  71.14     71.38     71.22     71.05     70.44
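The Average row in both tables matches the unweighted mean of the seven per-set accuracies. The short Python check below recomputes the Table 1 averages from the per-set values and prints each column's difference from LIBSVM; the dictionary layout and column labels are only for illustration, and the same check can be applied to Table 2.

```python
from statistics import mean

# Per-set accuracies (%) copied from Table 1, Sets A to G.
TABLE1 = {
    "LIBSVM":           [74.06, 77.44, 74.99, 73.11, 74.08, 76.93, 73.82],
    "QuickRBF (All)":   [74.14, 77.01, 75.01, 73.69, 74.19, 77.23, 74.27],
    "QuickRBF (12000)": [74.01, 76.32, 75.07, 73.72, 74.26, 77.39, 74.30],
    "QuickRBF (5000)":  [73.73, 75.54, 74.93, 72.44, 73.97, 77.28, 74.07],
    "QuickRBF (1000)":  [72.71, 74.76, 73.85, 71.44, 73.14, 76.12, 74.36],
}

baseline = mean(TABLE1["LIBSVM"])
for name, scores in TABLE1.items():
    avg = mean(scores)  # unweighted mean over the seven folds
    print(f"{name:18s} average {avg:6.2f}  vs LIBSVM {avg - baseline:+.2f}")
```

The recomputed averages (74.92, 75.08, 75.01, 74.57, 73.77) agree with the Average row, confirming that each fold is weighted equally.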