Optimization Methods for Deep Learning

Instructor: Chih-Jen Lin, Room 413, CSIE Building.
The best way to contact me is by e-mail.
TAs: Cheng-Hung Liu (email: d07944009 at ntu.edu.tw) and Li-Chung Lin (email: r08922141 at ntu.edu.tw). TA hours: Wed. 14:00-15:00, online.
Your HW/exam scores will be here
Time: Monday 10:20am-1pm
We will take two 10-minute breaks, at around 11:10am and 12:10pm. The class ends at 1pm.

Place: room 105, CSIE building
FAQ of this course
This is an advanced course. We expect to admit 15 to 20 students.
We will pre-record most lectures and play them in class. During class I will give additional comments while the video is playing.

Course Outline

Deep learning involves a difficult non-convex optimization problem. The goal of this course is to study the implementation of optimization methods for deep learning. We will run this course in the following formats:

lectures (by the instructor)
project presentations (by students): we will have many.

For potential students: please make sure that you are interested in optimization for deep learning.

We will make heavy use of the simpleNN software.

Among the various types of networks, we will pay the most attention to convolutional neural networks (CNNs).

You will gain hands-on experience implementing deep learning code.

Slides and recordings

This section (and slides) will be continuously updated.

Course information ( video )
Optimization problems for deep learning
- Linear classification ( part 1: slides video )
- Fully-connected networks ( part 1: slides video )
- Convolutional networks ( part 1: slides video part 2: slides video part 3: slides video part 4: slides video )
Stochastic gradient methods for deep learning
- Gradient descent ( part 1: slides video )
- Stochastic gradient methods ( part 1: slides video part 2: slides video )
- a note on different momentum update rules: slides
Gradient calculation
- Vector form ( part 1: slides video )
- Gradient calculation ( part 1: slides video part 2: slides video part 3: slides video )
Implementation
- part 1: slides video
- We will partially cover two sets of slides from the course "numerical methods" ( part 1: slides video part 2: slides video )
- part 2: slides video 1 video 2 video 3
- part 3: slides video
Automatic differentiation
- part 1: slides video
- part 2: slides video
Newton method
- Basic ( part 1: slides video part 2: slides video )
- Algorithms ( part 1: slides video part 2: slides video part 3: slides video )
- Gauss-Newton matrix-vector product:
  - slides
  - Using only backward process ( videos: part1, part2, part3, part4, part5 )
  - Using forward and backward processes ( videos: part1, part2, part3 )

Projects

Project 1: simple experiments using SG methods. Presentation: March 15. Discussion.
Project 2: more experiments on SG methods. Presentation: March 29. Discussion.
Project 3: checking running time of major operations. Presentation: April 19. Discussion.
Project 4: making the MATLAB implementation competitive with TensorFlow. Presentation: May 10. Discussion.
Project 5: An investigation of Python profilers. Presentation: June 7. Discussion.
Project 6: robustness of Newton methods and running time analysis. Presentation: June 21. Discussion.

Some explanation: we do not have a final exam, so the class would normally end on 6/14, which is a holiday. To give you more time for the project, we will hold the Project 6 presentation on 6/21 and will not have a class on 5/31.

Exams

No exam

Grading

100% Projects.

Issues related to COVID-19

According to the school's regulations, all students must wear masks.
If the COVID-19 situation becomes serious, we will move the course online.
If you are sick, please do not come to class.

Acknowledgements: the following people have helped greatly in preparing materials for this course (including creating the software used in the course and trying out some projects): Chien-Chih Wang, Kent Loong Tan, Pin-Yen Lin, Cheng-Hung Liu (former and current members of my group), Pengrui Quan (UCLA), and Leonardo Galli (University of Florence).
