Place: room 101, CSIE building
The final project is research oriented. We expect you to conduct a deep investigation. You are welcome to discuss with the teacher while doing the project.
You may survey existing works and point out possible future directions. You may see an earlier course project report at UIUC as an example.
In our lecture, we only introduced the basic concept of automatic differentiation. Can you investigate how it is implemented in PyTorch?
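As a starting point, here is a minimal sketch that exercises PyTorch's user-facing autograd API on a toy scalar function; the investigation itself would look deeper into how the recorded graph and the backward engine work.

```python
import torch

# Toy scalar function f(w) = sum((w * x - y)^2); we let autograd compute df/dw.
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 4.0, 6.0])
w = torch.tensor(0.5, requires_grad=True)   # operations on w are recorded

loss = torch.sum((w * x - y) ** 2)
loss.backward()                             # reverse-mode AD traverses the recorded graph

print(w.grad)                               # gradient accumulated into w.grad
print(loss.grad_fn)                         # one node of the recorded computation graph
```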
In our lecture, we discussed block algorithms for reducing the cost of memory access in matrix-matrix products. That discussion was a general illustration. For GPUs, some architectural properties make them very efficient for matrix-matrix products. Can you investigate what these properties are and conduct experiments to confirm them?
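One possible experiment, sketched below with PyTorch under the assumption that a CUDA-capable GPU is available, times dense matrix-matrix products of increasing size and compares the achieved throughput on CPU and GPU; the sizes, precision, and timing method are only illustrative.

```python
import time
import torch

def matmul_gflops(n, device):
    """Time one n x n matrix-matrix product and return the achieved GFLOP/s."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    torch.matmul(a, b)                    # warm-up so library initialization is excluded
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.time()
    torch.matmul(a, b)
    if device == "cuda":
        torch.cuda.synchronize()          # GPU kernels launch asynchronously
    elapsed = time.time() - start
    return 2 * n ** 3 / elapsed / 1e9     # roughly 2n^3 floating-point operations

for n in (1024, 2048, 4096):
    cpu = matmul_gflops(n, "cpu")
    gpu = matmul_gflops(n, "cuda") if torch.cuda.is_available() else float("nan")
    print(f"n = {n}: CPU {cpu:.1f} GFLOP/s, GPU {gpu:.1f} GFLOP/s")
```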
In our discussion, we saw that many issues of Adam/AdamW remain to be investigated. For example, the original implementation in BERT does not include bias correction. Can you confirm the importance of bias correction? Also, the AdamW paper decouples the weight-decay step from the gradient-based update. Can you design experiments to show that the setting in AdamW is better?
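To make the two settings concrete, below is a simplified single-tensor sketch of an Adam-style step with weight decay folded into the gradient (L2 regularization) versus the decoupled decay of AdamW; it omits details of the official implementations and is only meant to show where the decay term and the bias correction enter.

```python
import torch

def adam_l2_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                 eps=1e-8, wd=1e-2):
    """Adam with weight decay folded into the gradient (L2 regularization)."""
    grad = grad + wd * w                        # decay enters the moment estimates
    m.mul_(beta1).add_(grad, alpha=1 - beta1)
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    m_hat = m / (1 - beta1 ** t)                # bias correction; the BERT-style
    v_hat = v / (1 - beta2 ** t)                # variant omits this step
    w -= lr * m_hat / (v_hat.sqrt() + eps)

def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, wd=1e-2):
    """AdamW: weight decay decoupled from the adaptive gradient step."""
    w.mul_(1 - lr * wd)                         # decay applied directly to the weights
    m.mul_(beta1).add_(grad, alpha=1 - beta1)
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (v_hat.sqrt() + eps)
```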
In this talk, for some document sets, linear methods are highly competitive with BERT. However, in Christian et al. (2021), the linear results are worse than BERT on the IMDB reviews set. Can you investigate, in a task-specific manner, under what situations linear methods are more robust than large pretrained language models for text classification?
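For the linear side of such a comparison, a minimal TF-IDF plus linear-classifier baseline can be set up with scikit-learn; the tiny in-line data below is only a placeholder for the document sets you would actually study.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

# Placeholder data: replace with the document sets you want to study (e.g., IMDB).
train_texts = ["a wonderful movie", "terrible and boring", "great acting", "waste of time"]
train_labels = [1, 0, 1, 0]
test_texts = ["boring plot", "wonderful acting"]
test_labels = [0, 1]

# TF-IDF features with a linear SVM: the kind of linear baseline that is
# competitive with BERT on some document sets.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(train_texts, train_labels)
print(accuracy_score(test_labels, model.predict(test_texts)))
```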
From Wang et al. (2020), Newton methods may not be able to handle networks with many layers. However, some applications need only a few layers; an example is the CNN for texts discussed in our project 2. Can you implement a Newton method and compare it with stochastic gradient methods? For this project, you must know how to do Jacobian-vector and transposed Jacobian-vector products in PyTorch.
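Below is a minimal sketch of the two products using torch.autograd.functional on a toy function; in the actual project they would be applied to the network output with respect to its parameters, without ever forming the Jacobian explicitly.

```python
import torch
from torch.autograd.functional import jvp, vjp

# Toy map f from 3 "parameters" to 2 "outputs"; a real use would differentiate
# the network output with respect to its weights.
def f(w):
    return torch.stack([w[0] * w[1], torch.sin(w[2]) + w[0] ** 2])

w = torch.randn(3)
v = torch.randn(3)            # direction in parameter space
u = torch.randn(2)            # direction in output space

_, Jv = jvp(f, (w,), (v,))    # Jacobian-vector product J(w) v
_, JTu = vjp(f, (w,), u)      # transposed Jacobian-vector product J(w)^T u

print(Jv)                     # tensor of shape (2,)
print(JTu[0])                 # tensor of shape (3,)
```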
Though Adam is widely used in deep learning, Wilson et al. (2017) showed that adaptive methods (e.g., AdaGrad, RMSProp, and Adam) are worse than non-adaptive methods (e.g., SGD) in some image classification tasks. Does the same thing happen in text classification?
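A skeleton for such a comparison is sketched below on synthetic data; in the project you would replace the data and model with a real text-classification setup and tune the learning rates of both optimizers fairly before drawing conclusions.

```python
import torch

# Synthetic binary classification data standing in for a text-classification
# task; in the actual project you would use real bag-of-words or CNN features.
torch.manual_seed(0)
X = torch.randn(512, 100)
y = (X[:, :10].sum(dim=1) > 0).float()

def train(opt_name, epochs=50):
    model = torch.nn.Linear(100, 1)
    if opt_name == "sgd":
        opt = torch.optim.SGD(model.parameters(), lr=0.1)       # non-adaptive
    else:
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)     # adaptive
    loss_fn = torch.nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X).squeeze(1), y)
        loss.backward()
        opt.step()
    return loss.item()

for name in ("sgd", "adam"):
    print(name, train(name))
```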
Last modified: