Data Mining and Machine Learning: Theory and Practice 2018


Instructor: Prof. Shou-de Lin
Classroom: CSIE 110

Meeting Time: Tue 14:20~17:00

Office Hours: After class or by appointment

TA: ΩAʷu

Course Description:

While it is possible to learn a variety of machine learning and data mining theories from lectures or books, applying them accurately and efficiently to real-world data is a completely different story. Very often, data miners have to go through a painful process of trial and error due to lack of experience. Dealing with practical data issues is more an art than a science. Nevertheless, in this course we try to build up our experience by tackling a real-world problem posed as an ongoing competition in the data mining community. We expect to run this course interactively, so every week students must discuss their findings, and the problems they encountered, with the lecturers and other classmates.

Pre-requisite courses:
You must have taken at least one of the following courses (two or more is even better):
    Machine Learning
    Probabilistic Graphical Model
    Deep Learning
    Optimization and Machine Learning

Course Format and Workload:
You need to implement different kinds of intelligent systems for the competition and run extensive experiments to verify them. You will compete with the other students in the class as well as with teams from all over the world. Note that this is an extremely intensive course. Students will give a WEEKLY presentation on their progress during the previous week. The estimated time spent on this course is AT LEAST 10 hours per week.

Your final score will depend on your weekly performance (judged by your effort, novelty, and presentation), weighted by how much you contribute to the overall competition results.



FAQ (modified from last year's FAQ):

Q: I am interested in learning data mining and machine learning methods. Is this course the place to go?

This course aims to extend your existing knowledge of machine learning and data mining to real-world tasks and to participation in data mining competitions (e.g., KDD Cup). It is therefore not the place to learn the basics of machine learning and data mining; we assume you already know them, which is why we require participants to have taken some preliminary courses (see above).

Q: What is the capacity of this class? Do we work individually or form teams?

To make sure we provide sufficient support to every student, we plan to take no more than 20 students in this class. If more than 20 students express interest in joining, we will select based on their prerequisite knowledge and intentions. Students form teams in this class.

Q: May I audit this course?

In general the answer is no, because you will not learn much in this class without getting your hands dirty. We don't want to waste your time, and we hope every member of the class will put significant effort into the competition.

Q: How heavy is the course load?

Please anticipate spending at least 10 hours per week on this course. Simply put, the more effort you put in, the better results you will get. When your fellow classmates spend (or have to spend) a lot of time and effort on this, you will not be competitive if you don't.

Q: Is there any homework?

There will be no specific homework assignments, but your team needs to give a 20-minute presentation on its progress EVERY WEEK.

Q: Since this is team work, can I rely on some smart teammates?

No, you should work as hard as the others. We will find a way to evaluate each individual student's performance.

Q: I tried many new ways in the past week, but all gave worse results. What should I present?

Failed approaches do show something, so you should frankly present what you have tried. Results matter, but they do not make up 100% of your final score. We encourage creative thinking and out-of-the-box ideas. Novel ideas will be rewarded even if you have not yet proven them useful.

Q: What kinds of computational resources do I need for this course? Will you provide any?

In general the department's machines (e.g., 217) should be enough. We will also provide additional computational resources if necessary.