To prepare rcv1 multi-label sets, run the script

% gen.sh

It does the following steps:

1. Prepare the following six files

   rcv1.topics.txt (from rcv1 web site; "B.1. On-Line Appendix 1")
   rcv1-v2.topics.qrels (from rcv1 web site; "B.8. On-Line Appendix 8,")
   rcv1.industries.txt (from rcv1 web site; "B.4. On-Line Appendix 4")
   rcv1-v2.industries.qrels (from rcv1 web site; "B.9. On-Line Appendix 9,")
   rcv1.regions.txt (from rcv1 web site; "B.6. On-Line Appendix 6")
   rcv1-v2.regions.qrels (from rcv1 web site; "B.10. On-Line Appendix 10,")

   and run

   ./gen_id_label.py rcv1.topics.txt rcv1-v2.topics.qrels
   ./gen_id_label.py rcv1.industries.txt rcv1-v2.industries.qrels
   ./gen_id_label.py rcv1.regions.txt rcv1-v2.regions.qrels

   A file "id_label" is generated.

2. Obtain the original rcv1 training/testing file from the rcv1 web site
   (B.13. On-Line Appendix 13). For example, take
   lyrl2004_vectors_train.dat and run

   ./dat2svm.py lyrl2004_vectors_train.dat id_label > output_file
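
The two conversion steps above can be sketched in Python as follows. This is only an illustration, not the actual code of gen_id_label.py or dat2svm.py. It assumes that each qrels line has the form "<category code> <document id> 1", that each line of the .dat file has the form "<document id> <index>:<value> ...", and that the desired output is the LIBSVM multi-label layout "label1,label2 index:value ...". The function names build_id_labels and to_svm_lines, and the choice to number categories in listing order, are hypothetical.

```python
from collections import defaultdict


def build_id_labels(category_codes, qrels_lines):
    """Map each document id to a sorted list of integer label indices.

    Assumption: categories are numbered 0, 1, 2, ... in the order given,
    and each qrels line looks like "<code> <doc id> 1".
    """
    code_to_index = {code: i for i, code in enumerate(category_codes)}
    labels = defaultdict(set)
    for line in qrels_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        code, doc_id = parts[0], parts[1]
        if code in code_to_index:
            labels[doc_id].add(code_to_index[code])
    return {doc_id: sorted(indices) for doc_id, indices in labels.items()}


def to_svm_lines(dat_lines, id_labels):
    """Yield one 'l1,l2 idx:val ...' line per document in dat_lines.

    Assumption: each input line looks like "<doc id> <idx>:<val> ...".
    Documents without a known label get an empty label field.
    """
    for line in dat_lines:
        parts = line.split()
        if not parts:
            continue
        doc_id, features = parts[0], parts[1:]
        label_field = ",".join(str(i) for i in id_labels.get(doc_id, []))
        yield label_field + " " + " ".join(features)


if __name__ == "__main__":
    # Tiny made-up inputs, only to show the data flow.
    codes = ["C15", "CCAT", "E11"]
    qrels = ["E11 2286 1", "CCAT 2287 1", "C15 2287 1"]
    dat = ["2286  864:0.05 1523:0.04", "2287  10:0.5"]
    for svm_line in to_svm_lines(dat, build_id_labels(codes, qrels)):
        print(svm_line)
```

Keeping the label lookup separate from the feature conversion mirrors the two-step layout of the instructions: the label table is built once and can then be reused for both the training and the testing vector files.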