NLU-11: Week 3
Welcome to Week 3!
We are now switching lecturers in the class, and I (Shay) will be teaching you for the next few weeks. This week, we will explore the notion of attention in deep learning models, which has been crucial to making large language models work well.
Labs
- The NumPy lab mentioned last week can be found here. It should help you get acquainted with the basics of PyTorch and NumPy.
Tutorials
Tutorials will start next week. We are providing the questions this week so that you can start thinking about them while attending the lectures. [tutorial 1]
Lectures
Lecture 1: Sequence-to-sequence models with attention [pdf].
Required reading:
- Sections 7 through 9 of Neural Machine Translation and Sequence-to-sequence Models: A Tutorial, Neubig.
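As a concrete reference point for the attention mechanism covered in this lecture, below is a minimal sketch of additive (Bahdanau-style) attention over encoder states, written in PyTorch. The module name, tensor shapes, and dimensions are illustrative assumptions, not taken from the slides or the Neubig tutorial.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Minimal additive (Bahdanau-style) attention sketch.
    Names and dimensions are illustrative, not from the lecture slides."""
    def __init__(self, enc_dim, dec_dim, attn_dim):
        super().__init__()
        self.W_enc = nn.Linear(enc_dim, attn_dim, bias=False)
        self.W_dec = nn.Linear(dec_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, dec_state, enc_states):
        # dec_state:  (batch, dec_dim)          current decoder hidden state
        # enc_states: (batch, src_len, enc_dim) all encoder hidden states
        scores = self.v(torch.tanh(
            self.W_enc(enc_states) + self.W_dec(dec_state).unsqueeze(1)
        )).squeeze(-1)                           # (batch, src_len)
        weights = torch.softmax(scores, dim=-1)  # attention distribution over source positions
        context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)  # (batch, enc_dim)
        return context, weights

# Tiny usage example with random tensors
attn = AdditiveAttention(enc_dim=8, dec_dim=6, attn_dim=5)
context, weights = attn(torch.randn(2, 6), torch.randn(2, 7, 8))
print(context.shape, weights.shape)  # torch.Size([2, 8]) torch.Size([2, 7])
```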
Lecture 2: Transformers [pdf]
Required reading:
- Attention Is All You Need. Vaswani et al. (Sections 1-4)
- Transformers from Scratch. Blog entry by Peter Bloem.
- Note: The Vaswani et al. paper is quite difficult to follow. I would suggest reading the Bloem blog entry first, and then tackling the Vaswani et al. paper, which is the original work that introduced transformers.
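To anchor the reading, here is a minimal sketch of scaled dot-product attention, the core operation in the Vaswani et al. paper. It deliberately omits multi-head projections and masking, and the shapes in the example are illustrative assumptions.

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V):
    """Single-head scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.
    Omits masking and the multi-head projections described in the paper."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)  # (..., n_queries, n_keys)
    weights = torch.softmax(scores, dim=-1)            # attention weights
    return weights @ V                                  # (..., n_queries, d_v)

# Tiny usage example: 4 query positions attending over 6 key/value positions
Q, K, V = torch.randn(4, 16), torch.randn(6, 16), torch.randn(6, 32)
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # torch.Size([4, 32])
```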
Lecture 3: Pretrained Language Models [pdf].
Required reading:
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Devlin et al., NAACL 2019. (Sections 1 and 3)
Background reading:
- Chapter 9 of Speech and Language Processing, 3rd edition by Jurafsky and Martin provides an alternative presentation of contextualized word embeddings.
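If you would like to see what contextualized word embeddings look like in practice before the lecture, below is a small sketch that extracts them from a pretrained BERT model using the Hugging Face transformers library. The library and model name are assumptions for illustration, not an official course dependency.

```python
# Sketch: contextualized word embeddings from a pretrained BERT model.
# Assumes the Hugging Face `transformers` library is installed (not a course requirement).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = ["The bank raised interest rates.", "They sat on the river bank."]
inputs = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One vector per (sub)word token per sentence: the two occurrences of "bank"
# receive different vectors because their surrounding contexts differ.
embeddings = outputs.last_hidden_state  # (batch, seq_len, hidden_size)
print(embeddings.shape)
```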