Welcome to Week 3!
We are now switching lecturers in the class, and I (Shay) will be teaching you for the next three weeks. This week, we will explore the notion of attention in deep learning models, which has been crucial to making these models work as well as they do.
Jonas will hold additional TA hours, expected to be on Thursday at 2pm. An announcement with the location will be sent out during the week.
Labs
- The NumPy lab mentioned last week can be found here. It should help you get acquainted with the basics of NumPy and PyTorch.
Tutorials
We will be starting tutorials next week. We are providing the questions this week so you can start thinking about them while attending the lectures. [tutorial 1]
Lectures
- Sequence-to-sequence models with attention [pdf]. Required reading:
- Sections 7 through 9 of Neural Machine Translation and Sequence-to-sequence Models: A Tutorial, Neubig.
- Transformers [pdf]. Required reading:
- Attention is All You Need. Vaswani et al.
- Transformers from Scratch. Blog entry by Peter Bloem.
- Note: The Vaswani et al. paper is quite difficult to follow. I would suggest you read the Bloem blog entry first, and then tackle the Vaswani et al. paper, which is the original work that introduced transformers. A small code sketch of the core attention operation appears after this reading list.
- Word embeddings [pdf]. Required reading:
- Efficient estimation of word representations in vector space. Mikolov et al., ICLR Workshop 2013.
- Contextual word representations: A contextual introduction. Smith, 2019. This paper provides a conceptual overview of word embeddings, and explains why contextualized embeddings are an important innovation.
- Chapter 6 of Speech and Language Processing (3rd edition) by Jurafsky and Martin also gives a useful presentation of word embeddings and covers the wider literature on vector representations of words and similarity functions. A short sketch of comparing word vectors with cosine similarity is given below, after the attention example.
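As a companion to the Transformers readings above, here is a minimal NumPy sketch of scaled dot-product attention, the core operation described in the Vaswani et al. paper. It is an illustrative sketch only; the variable names and the toy shapes are my own, not taken from any course code or from the paper's notation for multi-head attention.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)."""
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = Q @ K.T / np.sqrt(d_k)              # (n_queries, n_keys)
    # Softmax over the keys gives the attention weights for each query.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted average of the value vectors.
    return weights @ V                            # (n_queries, d_v)

# Toy example: 2 queries attending over 3 key/value pairs.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # (2, 4)
```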
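And for the word-embeddings lecture, a small sketch of representing words as vectors and comparing them with cosine similarity, one of the similarity functions discussed in the readings. The embeddings below are made-up 4-dimensional vectors purely for illustration; real word2vec vectors are typically a few hundred dimensions and are learned from data.

```python
import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between two word vectors; closer to 1 means more similar.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical embeddings, invented for this example.
embeddings = {
    "king":  np.array([0.8, 0.1, 0.7, 0.2]),
    "queen": np.array([0.7, 0.2, 0.8, 0.3]),
    "apple": np.array([0.1, 0.9, 0.0, 0.6]),
}

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # lower
```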