This time, we will introduce the most popular architecture for neural models today: the Transformer. Transformers and their variants power the majority of modern large language models, such as OpenAI's GPT series (the basis of ChatGPT) and Google's BERT. In this section, we will consider them in the context of text generation, but next week, we will also look at ways to use them to extract knowledge from large collections and transfer it to specific tasks. The key component of the Transformer is the multi-head attention module, which builds on the attention modeling idea we introduced last time. We also touch on interpretability and observe that some of the behavior of large Transformer models is human-interpretable.
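To make the multi-head attention idea concrete, here is a minimal PyTorch sketch (illustrative only, not the lecture's exact formulation; the class and variable names are our own, and the causal masking needed for autoregressive text generation is omitted for brevity):

import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Minimal multi-head self-attention sketch (illustrative)."""
    def __init__(self, d_model, n_heads):
        super().__init__()
        assert d_model % n_heads == 0
        self.d_head = d_model // n_heads
        self.n_heads = n_heads
        # One linear projection each for queries, keys, and values,
        # plus an output projection after the heads are merged.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        b, t, _ = x.shape
        # Project, then split into heads: (batch, n_heads, seq_len, d_head)
        q = self.w_q(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.w_k(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = self.w_v(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        # Scaled dot-product attention, computed independently per head.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        weights = scores.softmax(dim=-1)
        out = weights @ v
        # Merge heads back into a single (batch, seq_len, d_model) tensor.
        out = out.transpose(1, 2).contiguous().view(b, t, -1)
        return self.w_o(out)

# Usage: the output has the same shape as the input.
attn = MultiHeadAttention(d_model=64, n_heads=8)
x = torch.randn(2, 10, 64)
y = attn(x)  # (2, 10, 64)

The point of splitting into heads is that each head applies attention over its own lower-dimensional projection of the input, so different heads can attend to different aspects of the sequence at the same cost as one full-width attention.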
The folder contains slides, required reading and a quiz.
Slides and reading
(The recorded video contains animations that are not visible in the PDF.)
Please refer to Jurafsky and Martin, 3rd edition (online), Chapter 10. However, note again that the material presented in the lecture differs from the textbook in some respects. Optionally, also study the relevant sections of Lena Voita's NLP course:
Quiz 27: Transformers
These questions are designed to test your understanding of the above course content; doing this quiz does not contribute to your overall grade. Some questions require a text answer; you can ask your tutor for formative feedback on these, or ask on Piazza. Other questions are multiple choice or require a numeric answer, and you will get immediate feedback on these. Please don't attempt this quiz until you have acquainted yourself with the lecture and the required reading.
You must be logged in to Learn to take this quiz.