:::

Announcements

:::

[2020-01-03] Prof. Ye-In Chang, National Sun Yat-Sen University, "Patients and Doctors Together to Discover Medical Knowledge with Statistics and Classification: of Patients, by Patients and for Patients"

Seminar Talk Announcement
Poster:
Post date: 2019-12-23
Title: Patients and Doctors Together to Discover Medical Knowledge with Statistics and Classification: of Patients, by Patients and for Patients
Date: 2020-01-03 13:20-14:20
Location: R107, CSIE
Speaker: Prof. Ye-In Chang, National Sun Yat-Sen University
Hosted by: Prof. Feipei Lai
 
 

Abstract:

 
Knowledge discovery in databases focuses on methodologies for extracting useful information from collections of data. One approach to knowledge discovery is data mining. Data classification is a well-known and useful data mining technique that assigns categories to collected data in order to support accurate prediction. One model for data classification is the decision tree, and a key to a good decision tree is the choice of deciding factors at its internal nodes. In statistics, the chi-square test is a good way to analyze whether categorical variable A is a significant factor for categorical variable B. From our observation of research papers in medicine, we consider that a risk factor (i.e., a significant factor under the chi-square test) is strongly related to an important deciding factor in a decision tree. Therefore, in this study, we first examine chronic kidney disease as an important risk factor for bladder cancer, in cooperation with the Department of Urology, Chang Gung Memorial Hospital, Kaohsiung, Taiwan, and we propose a statistical approach to check the relation. This study requires several preprocessing steps of knowledge discovery, including data selection, cleaning of unclear data, and data enrichment. The resulting risk factor (i.e., the significant factor) can then be used as a deciding factor in a decision tree. Second, we use significant factors to improve the performance of the decision tree, and we propose an approach that aims to reduce the number of deciding factors and to determine their order in the tree. We take a public baseball database as an example to illustrate our method.
What we care about is comparing the performance of the same decision tree algorithm with and without the preprocessing step, i.e., pruning insignificant factors before constructing the decision tree. Therefore, we compare the case that uses the preprocessing step with the case that does not. Overall, our proposed method can be applied to any other database that has an extra attribute holding a class value. For each of these two research directions, we show contributions in terms of high accuracy, short processing time, and, to some degree, reduced storage. Consequently, this study proposes efficient algorithms for data classification toward knowledge discovery, based on statistics and decision trees.
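
As a rough illustration of the preprocessing idea in the abstract, the sketch below computes a Pearson chi-square statistic for each candidate attribute against the class attribute and keeps only the significant ones, which could then be handed to any decision tree learner. It is not the speaker's algorithm. It assumes binary attributes (one degree of freedom), a 0.05 significance level (critical value 3.841), and synthetic toy data; the names `chi_square_statistic`, `significant_attributes`, `factor_a`, `factor_b`, and `outcome` are all hypothetical.

```python
from collections import Counter

# Critical value of the chi-square distribution for 1 degree of freedom
# at the 0.05 significance level (assumption: all attributes are binary).
CHI2_CRITICAL_DF1_ALPHA_05 = 3.841


def chi_square_statistic(rows, attr, label):
    """Pearson chi-square statistic for two categorical columns of `rows`."""
    counts = Counter((r[attr], r[label]) for r in rows)
    attr_values = sorted({r[attr] for r in rows})
    label_values = sorted({r[label] for r in rows})
    n = len(rows)
    stat = 0.0
    for a in attr_values:
        row_total = sum(counts[(a, b)] for b in label_values)
        for b in label_values:
            col_total = sum(counts[(x, b)] for x in attr_values)
            expected = row_total * col_total / n  # count expected under independence
            stat += (counts[(a, b)] - expected) ** 2 / expected
    return stat


def significant_attributes(rows, attrs, label, critical=CHI2_CRITICAL_DF1_ALPHA_05):
    """Keep only the attributes whose statistic exceeds the critical value."""
    return [a for a in attrs if chi_square_statistic(rows, a, label) > critical]


def make_rows(factor_a, factor_b, outcome, count):
    """Build `count` identical synthetic records (toy data, not real patients)."""
    return [
        {"factor_a": factor_a, "factor_b": factor_b, "outcome": outcome}
        for _ in range(count)
    ]


# Synthetic data: factor_a is associated with the outcome, factor_b is not.
rows = (
    make_rows(1, 1, 1, 8) + make_rows(1, 0, 1, 7)
    + make_rows(1, 1, 0, 2) + make_rows(1, 0, 0, 3)
    + make_rows(0, 1, 1, 2) + make_rows(0, 0, 1, 3)
    + make_rows(0, 1, 0, 8) + make_rows(0, 0, 0, 7)
)

kept = significant_attributes(rows, ["factor_a", "factor_b"], "outcome")
print(kept)  # only factor_a passes the test on this toy data
```

On this toy data only `factor_a` survives the filter, so a tree built afterwards would have one deciding factor to consider instead of two. Ranking the surviving attributes by their statistic is one plausible way to order deciding factors, though the abstract does not say how the ordering is done.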
 
 
Biography:
 
Ye-In Chang received her B.S. degree in Computer Science and Information Engineering from National Taiwan University, Taipei, Taiwan, in 1986. She received her M.S. and Ph.D. degrees in Computer and Information Science from The Ohio State University, Columbus, Ohio, in 1987 and 1991, respectively. From August 1991 to July 1999, she was on the faculty of the Department of Applied Mathematics at National Sun Yat-Sen University, Kaohsiung, Taiwan, where she was a Professor from August 1997. Since August 1999, she has been a Professor in the Department of Computer Science and Engineering at National Sun Yat-Sen University. Her research interests include database systems, distributed systems, multimedia information systems, mobile information systems, and data mining.
 
 
 
Last modified: 2019-12-23 5:09 PM
