Advances in Cost-sensitive Multiclass and Multilabel Classification

Hsuan-Tien Lin


Advances in Cost-sensitive Multiclass and Multilabel Classification: Tutorial Talk at KDD 2019

Description

Classification is an important problem in data mining and knowledge discovery. Traditionally, regular classification aims to minimize the error rate, which treats every misprediction as equally bad. Nevertheless, many real-world data mining applications need to assign different costs to different types of misclassification errors. For instance, misclassifying a Gram-positive bacterium as a Gram-negative one leads to totally ineffective treatments and is hence more serious than misclassifying a Gram-positive bacterium as another Gram-positive one. Such a cost-sensitive classification problem can be very different from the regular one, and arises in applications such as targeted marketing, information retrieval, medical decision making, object recognition, and intrusion detection.
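To make the difference between error rate and cost concrete, here is a minimal sketch in Python. The class names, the cost values, and the helper functions `error_rate` and `average_cost` are illustrative assumptions, not taken from the tutorial; the matrix simply encodes the idea that confusing a Gram-positive species with a Gram-negative one is much worse than confusing two Gram-positive species.

```python
# Hypothetical classes and costs, chosen only for illustration.
CLASSES = ["gram_pos_A", "gram_pos_B", "gram_neg"]

# COST[true][predicted]: zero on the diagonal, small within the
# Gram-positive group, large when the Gram type itself is wrong.
COST = [
    [0, 1, 10],
    [1, 0, 10],
    [10, 10, 0],
]


def error_rate(y_true, y_pred):
    """Fraction of examples whose prediction differs from the truth."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def average_cost(y_true, y_pred, cost):
    """Mean of cost[true][predicted] over all examples."""
    return sum(cost[t][p] for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [0, 0, 1, 2]
pred_a = [1, 0, 1, 2]  # one mistake, within the Gram-positive group
pred_b = [2, 0, 1, 2]  # one mistake, crossing to Gram-negative

print(error_rate(y_true, pred_a), average_cost(y_true, pred_a, COST))
print(error_rate(y_true, pred_b), average_cost(y_true, pred_b, COST))
```

Both prediction lists make exactly one mistake and so share the same error rate, but under this assumed matrix their average costs differ by a factor of ten.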

The cost-sensitive binary classification problem has been studied since the 1990s, resulting in sampling and re-weighting tools that continue to influence many real-world applications. In the past 20 years, researchers have advanced those tools to tackle more complicated problems, including multiclass and multilabel classification. This tutorial reviews and summarizes those advances so that more real-world applications can enjoy the benefits of cost-sensitive classification. The advances range from Bayesian approaches that consider costs during inference, to reduction-based approaches that transform the cost-sensitive classification task into other tasks, to deep learning approaches that plug the costs into the optimization and feature-extraction process. We discuss the relationships among the approaches as well as their practical usage. We also introduce some successes in data mining applications, such as improving the performance of a real-world bacteria classification system and tackling the class-imbalance problem of KDDCup 1999.
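As a rough illustration of what "considering costs during inference" can mean, the sketch below picks the class with the lowest expected cost given estimated class probabilities, instead of the most probable class. The function names, the probability vector, and the cost matrix are hypothetical values chosen for illustration; this is not the specific algorithm of any paper listed in the references.

```python
# Hypothetical cost matrix, COST[true][predicted]: missing class 2 is
# assumed to be 20 times worse than any other mistake.
COST = [
    [0, 1, 1],
    [1, 0, 1],
    [20, 20, 0],
]


def expected_costs(probs, cost):
    """Expected cost of predicting each class k: sum_y P(y|x) * cost[y][k]."""
    n = len(cost)
    return [sum(probs[y] * cost[y][k] for y in range(n)) for k in range(n)]


def most_probable_class(probs):
    """Regular decision rule: ignore costs and take the argmax."""
    return max(range(len(probs)), key=lambda k: probs[k])


def min_expected_cost_class(probs, cost):
    """Cost-sensitive decision rule: take the argmin of expected cost."""
    costs = expected_costs(probs, cost)
    return min(range(len(costs)), key=lambda k: costs[k])


# Assumed output of some probabilistic classifier for one example.
probs = [0.6, 0.3, 0.1]

print(most_probable_class(probs))            # class chosen when costs are ignored
print(min_expected_cost_class(probs, COST))  # class chosen when costs are used
```

With these assumed numbers the two rules disagree: class 0 is the most probable, yet predicting class 2 has the lowest expected cost because missing a true class 2 is so expensive.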

Time and Location

Between 1:00pm and 5:00pm on August 4, 2019, in Summit 3-Ground Level, Egan

Handout Slides

cs.kdd19.handout.pdf (released on 2019/09/03)

Presenter

Hsuan-Tien Lin

Professor, Department of Computer Science and Information Engineering, National Taiwan University
Chief Data Science Consultant, Appier Inc.

References

[ABe2005]	Alina Beygelzimer, Varsha Dani, Tom Hayes, John Langford, and Bianca Zadrozny. Error-limiting reductions between classification tasks. In Proceedings of the International Conference on Machine Learning (ICML), pages 49--56, 2005.
[ABe2009]	Alina Beygelzimer, John Langford, and Pradeep Ravikumar. Error-correcting tournaments. In Proceedings of the Conference on Algorithmic Learning Theory (ALT), pages 247--262, 2009.
[AK2013]	Abhishek Kumar, Shankar Vembu, Aditya Krishna Menon, and Charles Elkan. Beam search algorithms for multilabel learning. Machine Learning, 92(1):65--89, 2013.
[BZ2003]	Bianca Zadrozny, John Langford, and Naoki Abe. Cost-sensitive learning by cost-proportionate example weighting. In Proceedings of the International Conference on Data Mining (ICDM), pages 435--442, 2003.
[CE2001]	Charles Elkan. The foundations of cost-sensitive learning. In Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI), pages 973--978, 2001.
[CH2018]	Cheng-Yu Hsieh, Yi-An Lin, and Hsuan-Tien Lin. A deep model with local surrogate loss for general cost-sensitive multi-label learning. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pages 3239--3246, 2018.
[CL2014]	Chun-Liang Li and Hsuan-Tien Lin. Condensed filter tree for cost-sensitive multi-label classification. In Proceedings of the International Conference on Machine Learning (ICML), pages 423--431, 2014.
[FT2012]	Farbound Tai and Hsuan-Tien Lin. Multilabel classification with principal label space transformation. Neural Computation, 24(9):2508--2542, 2012.
[GT2007]	Grigorios Tsoumakas and Ioannis Vlahavas. Random k-labelsets: An ensemble method for multilabel classification. In Proceedings of the 2007 European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD), pages 406--417, 2007.
[HC2018]	Hsien-Chun Chiu and Hsuan-Tien Lin. Multi-label classification with feature-aware cost-sensitive label embedding. In Proceedings of the Conference on Technologies and Applications for Artificial Intelligence (TAAI), pages 40--45, 2018.
[HLi2014]	Hsuan-Tien Lin. Reduction from cost-sensitive multiclass classification to one-versus-one binary classification. In Proceedings of the Asian Conference on Machine Learning (ACML), pages 371--386, 2014.
[HLo2011]	Hung-Yi Lo, Ju-Chiang Wang, Hsin-Min Wang, and Shou-De Lin. Cost-sensitive multi-label learning for audio tag annotation and retrieval. IEEE Transactions on Multimedia, 13(3):518--529, 2011.
[HLo2014]	Hung-Yi Lo, Shou-De Lin, and Hsin-Min Wang. Generalized k-labelsets ensemble for multi-label and cost-sensitive classification. IEEE Transactions on Knowledge and Data Engineering, 26(7):1679--1691, 2014.
[HT2010]	Han-Hsing Tu and Hsuan-Tien Lin. One-sided support vector regression for multiclass cost-sensitive classification. In Proceedings of the International Conference on Machine Learning (ICML), pages 1095--1102, 2010.
[JLa2005]	John Langford and Alina Beygelzimer. Sensitive error correcting output codes. In Proceedings of the International Conference on Computational Learning Theory (COLT), pages 158--172, 2005.
[JR2009]	Jesse Read, Bernhard Pfahringer, Geoff Holmes, and Eibe Frank. Classifier chains for multi-label classification. In Proceedings of the 2009 European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD), pages 254--269, 2009.
[KD2010]	Krzysztof Dembczyński, Weiwei Cheng, and Eyke Hüllermeier. Bayes optimal multilabel classification via probabilistic classifier chains. In Proceedings of the International Conference on Machine Learning (ICML), pages 279--286, 2010.
[KD2011]	Krzysztof Dembczyński, Willem Waegeman, Weiwei Cheng, and Eyke Hüllermeier. An exact algorithm for F-measure maximization. In Advances in Neural Information Processing Systems 24 (NeurIPS), pages 1404--1412, 2011.
[KD2012]	Krzysztof Dembczyński, Willem Waegeman, Weiwei Cheng, and Eyke Hüllermeier. On label dependence and loss minimization in multi-label classification. Machine Learning, 88(1):5--45, 2012.
[KH2017]	Kuan-Hao Huang and Hsuan-Tien Lin. Cost-sensitive label embedding for multi-label classification. Machine Learning, 106(9--10):1725--1746, 2017.
[NA2004]	Naoki Abe, Bianca Zadrozny, and John Langford. An iterative method for multi-class cost-sensitive learning. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 3--11, 2004.
[PD1999]	Pedro Domingos. MetaCost: A general method for making classifiers cost-sensitive. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 155--164, 1999.
[SK2018]	Salman Khan, Munawar Hayat, Mohammed Bennamoun, Ferdous Sohel, and Roberto Togneri. Cost-sensitive learning of deep feature representations from imbalanced data. IEEE Transactions on Neural Networks and Learning Systems, 29(8):3573--3587, 2018.
[TJ2011]	Te-Kang Jan, Hsuan-Tien Lin, Hsin-Pai Chen, Tsung-Chen Chern, Chung-Yueh Huang, Bing-Cheng Wen, Chia-Wen Chung, Yung-Jui Li, Ya-Ching Chuang, Li-Li Li, Yu-Jiun Chan, Juen-Kai Wang, Yuh-Lin Wang, Chi-Hung Lin, and Da-Wei Wang. Cost-sensitive classification on pathogen species of bacterial meningitis by Surface Enhanced Raman Scattering. In Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 390--393, 2011.
[TJ2012]	Te-Kang Jan, Da-Wei Wang, Chi-Hung Lin, and Hsuan-Tien Lin. A simple methodology of soft cost-sensitive classification. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 141--149, 2012.
[YC2012]	Yao-Nan Chen and Hsuan-Tien Lin. Feature-aware label space dimension reduction for multi-label classification. In Advances in Neural Information Processing Systems: Proceedings of the 2012 Conference (NeurIPS), pages 1529--1537, 2012.
[YC2016a]	Yu-An Chung, Hsuan-Tien Lin, and Shao-Wen Yang. Cost-aware pre-training for multiclass cost-sensitive deep learning. In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI), pages 1411--1417, 2016.
[YC2016b]	Yu-An Chung and Hsuan-Tien Lin. Cost-sensitive deep learning with layer-wise cost estimation. Technical report, National Taiwan University, November 2016.
[YW2017]	Yu-Ping Wu and Hsuan-Tien Lin. Progressive k-labelsets for cost-sensitive multi-label classification. Machine Learning, 106(5):671--694, 2017.
[YY2018]	Yao-Yuan Yang, Kuan-Hao Huang, Chih-Wei Chang, and Hsuan-Tien Lin. Cost-sensitive reference pair encoding for multi-label learning. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 143--155, 2018.
[ZL2014]	Zijia Lin, Guiguang Ding, Mingqing Hu, and Jianmin Wang. Multi-label classification via feature-aware implicit label space encoding. In Proceedings of the International Conference on Machine Learning (ICML), pages 325--333, 2014.


Last updated at CST 13:07, October 04, 2023