Data Mining and Machine Learning: Theory and Practice 2014

 

Instructors: Prof. Shou-de Lin
                    Prof. Chih-jen Lin

Guest Instructor: Prof. Hsuan-tien Lin

Classroom: CSIE 105

Meeting Time: Tue 9:10am-12:00pm

Office Hours: After class or by appointment

TAs: r01922165@ntu.edu.tw, skylyyang@gmail.com

Course Description:

While it is possible to learn a variety of machine learning and data mining theories from lectures or books, applying them accurately and efficiently to real-world data is a completely different story. Very often, data miners have to go through a painful process of trial and error due to lack of experience. Dealing with practical data issues is more an art than a science. Nevertheless, in this course we try to build up experience by tackling a real-world problem posed in an ongoing competition in the data mining community. In particular, we aim to participate in a data mining competition such as KDD Cup. We expect to run this course in an interactive way, so every week students must discuss their findings, as well as the problems they encountered, with the lecturers and their classmates.

Pre-requisite courses:
You must have taken at least one of the following courses (two or more is even better):
    Machine Learning 
    Statistical Artificial Intelligence
    Optimization and Machine Learning

Course Format and Workload:
You need to implement different kinds of intelligent systems for the competition and run extensive experiments to verify them. You will compete with the other students in the class as well as with other teams from all over the world. Note that this is an extremely intensive course. Students give a WEEKLY presentation on their progress in the previous week. Since the estimated time spent on this course is AT LEAST 10 hours per week, graduate students generally need approval from their advisor to attend.

Grades:
Grades depend on your weekly performance (judged by your effort, novelty, and presentation), weighted by how much you contribute to the overall competition results.

Syllabus:

TBA. (Please note that if we win the competition, the workload may extend into the summer in order to write a paper and prepare a poster for presentation.)

If you are interested in joining the course, please obtain your advisor's permission and contact the instructor (first come, first served).

FAQ (modified from last year's FAQ):

Q: I am interested in learning data mining and machine learning methods. Is this course the place to go?

This course aims to participate in data mining competitions (e.g., KDD Cup), so it is not the place to learn the basic material of machine learning and data mining. We assume you already know the basics. Therefore we require participants to have taken some preliminary courses (see above).


Q: What is the capacity of this class? Do we work individually or form teams?

To make sure we provide sufficient support to every student in the class, we plan to take no more than 20 students. If more than 20 students express interest in joining, we will select based on their prerequisite knowledge and motivation. Students form teams of three.


Q: May I audit this course?

In general the answer is no, because you will not learn much in this class without getting your hands dirty. We don't want to waste your time, and we hope every member of the class really does put a significant amount of effort into the competition.


Q: What is the course load?

Please anticipate spending at least 10 hours per week on this course. Simply put: the more effort you put in, the better results you will get. When your fellow classmates spend (or have to spend) a lot of time and effort on this, you will not be competitive if you don't.


Q: Is there any homework?

You have a single homework assignment (that is, the competition itself) throughout the whole course. You need to give a 20-minute presentation on your progress EVERY WEEK.


Q: Since this is team work, can I rely on some smart teammates?

No, you should work as hard as others. We will find a way to evaluate each individual student's performance.


Q: I tried many new approaches in the past week, but all gave worse results. What should I present?

Failed approaches are informative too. You should frankly present what you have tried. Competition results are related to your final score, but they do not fully determine it. We encourage creative thinking and out-of-the-box ideas. A novel idea will be rewarded even if you have not shown it to be useful.


Q: What kinds of computational resources do I need for this course? Will you provide any?

In general the department's machines (e.g., 217) should be enough. We will also provide some other computational resources for this competition.


Q: Will each team/student be allowed to submit their own results?

It depends on many factors, and the instructors will decide on the best submission strategy closer to the time. It is possible that we will allow only selected teams to submit results and/or form a new ensemble of teams for submission. In any case, every team's contribution (ideas and results, whether positive or negative) will be fairly acknowledged. At this point, the policy is that no individual or team may submit results unless permission is granted by the instructors in advance. Violating this policy will lead to serious penalties.


Q: How well did your teams do in past years?

Well, we were not perfect, of course, but we did just fine.
This year's performance will be considered satisfactory if it is similar to that of past years.