While it is possible to learn a variety of machine learning and data mining theories from lectures or books, applying them accurately and efficiently to real-world data is a completely different story. Very often data miners have to suffer a painful process of trial and error due to lack of experience. Therefore, dealing with practical data issues is frequently viewed as an art rather than a science.
In this course, we try to build up our experience in this art by tackling real-world problems that appear in ongoing competitions in the data mining community. In particular, we aim to participate in ACM KDDCup 2012, which is currently the most prestigious data mining competition. We expect to run this course in an interactive way: every week, students must discuss their findings and the problems they encountered with the instructors and their classmates.
You need to implement different kinds of intelligent systems for the competition and run extensive experiments to verify them. You will compete with the other students in the class as well as with other teams all over the world in KDDCup. Note that this is an extremely intensive course. You will give a WEEKLY presentation on your progress in the previous week. Since the estimated time spent on this course is at least 10 hours per week, graduate students in general need approval from their advisor to take it.
Grades depend on your weekly performance (judged by your effort, novelty, and presentation), weighted by how much you contribute to the overall competition results. There are no exams.
This course aims to participate in data mining competitions (i.e., KDDCup), so it is not the place to learn the basic material of machine learning and data mining. We assume you already know the basics. Therefore, we require participants to have taken some prerequisites (see above).
There are so many machine learning techniques that nobody can be familiar with all of them. Therefore, a very basic understanding of machine learning might be enough. You must be able to find and learn new techniques by yourself while working on this course.
To make sure we provide sufficient support to every student, we plan to take no more than 18 students in this class. If more than 18 students express interest in joining, we will select based on prerequisite knowledge and motivation. Students form teams of three in this class.
In general the answer is no, because you will not learn much in this class without getting your hands dirty. We don't want to waste your time, and we hope every member of the class spends a significant amount of effort on the competition.
Please anticipate spending at least 10 hours per week on this course. Simply put: the more effort you put in, the better your results will be. When your classmates spend (or have to spend) a lot of time and effort on this, you will not be competitive if you don't.
We have a homepage (you are reading it now). However, the private course wiki will be the main place for details. You will see our progress on the competitions there. Every enrolled student will get a wiki account.
Continuously (on the competition task) EVERY WEEK, including presentations.
Basically no, unless you think slides convey your ideas and results better. However, your presentation must clearly show your progress and problems. Sometimes we would even like to see your code and experimental environment directly.
No, you should work as hard as others. We will find a way to evaluate each individual student's performance.
Failed approaches do show something, and you should frankly present what you have tried. Competition results affect your final score but do not fully determine it. We encourage creative thinking and out-of-the-box ideas. Novel ideas will be rewarded even if you have not proven them to be useful.
In general the department's machines (e.g., 217) should be enough. We will also provide some machines we purchased for this competition.
It depends on many factors, and the instructors will decide on the best submission strategy when the time is closer. It is possible that we will allow only selected teams to submit results and/or form a new ensemble of teams for submission. In any case, every team's contribution (ideas and results, whether positive or negative) will be fairly acknowledged. At this point, the policy is that no individual or team may submit results to KDDCup 2012 without permission granted by the instructors in advance. Violating this policy will lead to serious penalties.
Of course. You pass only if you work hard enough. (Similarly, in industry, underperformers are fired.)
Well, we were not perfect, of course, but we did OK.
This year's performance will be considered satisfactory if it is similar to that of the past two years.
At a minimum, you will fail this class. In serious cases, you could even face expulsion from the university.