Week 7: Transfer learning and large language models

Reminders and announcements

  • The Lab 4 worksheet for this week's labs, on implementing and interpreting multi-head attention, is now available.
    • You should be able to complete the lab during your scheduled timeslot, but you may wish to review the material on attention and Transformers from last week before attending.
  • The specification for this year's assignment (including helper code and submission templates) will be released by the end of Thursday 30th of October.
    • The student groups for the assignment are available on Learn. Please contact your partner (if you have one) as soon as possible to start preparing to work together on the assignment. As mentioned last week, we only paired up students who asked to be paired. If you find any problems with the pairing, please contact the TA, Mai Dao, by Friday at noon at the latest; requests after this time will not be considered.
    • During weeks 8 and 9 (the next two weeks), we have scheduled extra help hours to assist you with the assignment: in addition to the regular help hour on Friday at 10 am, there will be a second one on Tuesday at 3 pm. All help hours take place in Appleton Tower 5.07 - Teaching Suite.
  • Tutorial exercises for the Week 8 groups are here. The tutorial includes questions on ethical and social issues in NLP, and you will also practice discussing and critiquing research papers.
  • Preview of next week's reading:
    • Large Language Models (7.3-7.6)
    • Scaling Laws (8.8.1)
    • Note: there will be no mandatory readings for the Friday lecture in week 8 as it will be a guest lecture.

Overview of this week

Welcome to Week 7! This week, we continue the discussion of Transformer architectures, focusing on their inputs (token and positional embeddings) and output (the language modelling head). Next, we introduce the concept of large language models (LLMs), i.e., language models (usually with a Transformer architecture) that are pre-trained on large amounts of data with unsupervised objectives. This is an instance of the more general framework of transfer learning. In particular, we will study how encoder LLMs (like BERT) provide contextual representations of words and can be fine-tuned for sequence classification tasks (such as Natural Language Inference). We will also cover architectures for generative LLMs, including encoder-decoder models (such as T5) and decoder-only models (such as GPT).
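To make the first of these topics concrete, here is a minimal, illustrative PyTorch sketch (not part of the course materials) of a Transformer's input and output layers: token embeddings added to learned positional embeddings on the way in, and a language-modelling head on the way out. All names, dimensions, and the weight-tying choice are assumptions for illustration only; see the lectures and the assigned reading (8.4-8.7) for the full treatment.

```python
# Illustrative sketch only: Transformer input embeddings and LM head.
# The Transformer blocks themselves (attention + feed-forward) are omitted.
import torch
import torch.nn as nn

class EmbeddingsAndLMHead(nn.Module):
    def __init__(self, vocab_size=100, d_model=32, max_len=64):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)   # token embeddings
        self.pos_emb = nn.Embedding(max_len, d_model)      # learned positional embeddings
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        self.lm_head.weight = self.tok_emb.weight          # weight tying (a common, but optional, choice)

    def embed(self, token_ids):
        # token_ids: (batch, seq_len) of vocabulary indices
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        return self.tok_emb(token_ids) + self.pos_emb(positions)   # (batch, seq_len, d_model)

    def predict_next_token(self, hidden_states):
        # hidden_states: (batch, seq_len, d_model) from the Transformer layers (omitted here)
        logits = self.lm_head(hidden_states)                # (batch, seq_len, vocab_size)
        return logits.softmax(dim=-1)                       # distribution over the vocabulary

# Example: embed a toy batch, then map (dummy) hidden states back to the vocabulary.
model = EmbeddingsAndLMHead()
ids = torch.randint(0, 100, (2, 10))
x = model.embed(ids)                     # what would be fed into the Transformer blocks
probs = model.predict_next_token(x)      # stand-in for the final hidden states
print(x.shape, probs.shape)              # torch.Size([2, 10, 32]) torch.Size([2, 10, 100])
```

In a full model, the stack of Transformer blocks covered last week sits between embed() and the language-modelling head; tying the head's weights to the token-embedding matrix is one common design choice, not a requirement.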

Lectures and reading

Lecture # | Who? | Slides | Reading
1 | EP | Transformer inputs and outputs | 8.4-8.7 (*)
2 | EP | Transfer Learning and BERT | If you're using the pdf: 9.0-9.2.1 (*), 9.3-9.4 (*); if you're using the website: 10.0-10.2.1 (*), 10.3-10.4 (*)
3 | EP | Architectures of Large Language Models | 7.1-7.2 (*)

Assignment information

The student groups for the assignment are available on Learn.

All other information regarding the assignment will be released by the end of Thursday 30 October.

License
All rights reserved, The University of Edinburgh.