The vanilla encoder-decoder models, which we discussed in the previous session, struggle to capture interactions between the input text (e.g., a sentence in the source language) and the output being generated (e.g., its translation); they are also prone to hallucination and hard to train. In this lecture, we will discuss how to integrate attention into the neural text generator. With attention, at each generation step the decoder selects the input tokens that are most relevant to the next prediction and relies on those tokens. We will discuss several variations of this idea and also do some analysis to see what the attention weights can capture.
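To make the idea concrete, here is a minimal sketch of a single decoder step with dot-product attention. This is only an illustration in NumPy, not code from the lecture or the readings; the function and variable names (attend, encoder_states, decoder_state) are invented for this example.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(decoder_state, encoder_states):
    """One decoder step of dot-product attention (illustrative sketch).

    decoder_state:  shape (d,)   current decoder hidden state
    encoder_states: shape (T, d) one hidden state per input token
    Returns the attention weights over the T input tokens and the
    context vector (weighted sum of encoder states).
    """
    scores = encoder_states @ decoder_state   # (T,) relevance of each input token
    weights = softmax(scores)                 # (T,) distribution over input tokens
    context = weights @ encoder_states        # (d,) weighted combination
    return weights, context

# Toy example: 5 input tokens, hidden size 4
rng = np.random.default_rng(0)
enc = rng.normal(size=(5, 4))
dec = rng.normal(size=(4,))
w, ctx = attend(dec, enc)
print(w)  # weights sum to 1; a larger weight means a more relevant input token
```

The weights form a probability distribution over the input tokens, so on each step the decoder can "look back" at different parts of the input; inspecting these weights is what the analysis part of the lecture refers to.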
The folder contains slides, required reading and a quiz.
Slides and reading
(The recorded video contains animations which are not visible in the PDF.)
Recommended reading: Jurafsky and Martin, 3rd edition (online), section 9.8. Note that there is material in the lecture which is not covered in J&M.
Optionally, also study the language modeling and seq2seq sections of Lena Voita's NLP course:
Quiz 26: Attention
These questions are designed to test your understanding of the above course content; this quiz does not contribute to your overall grade. Some questions require a text answer; you can ask for formative feedback on these from your tutor or on Piazza. Other questions are multiple choice or require a numeric answer, and you will get immediate feedback on these. Please don't attempt the quiz until you have acquainted yourself with this lecture and the required reading.
You must be logged in to Learn to do this quiz.