Week 8: LLM Inference and Scaling Laws
Reminders and announcements
- Tutorial 3 group meetings are this week. This tutorial will tackle issues with web-scale data, dialect and discrimination, and possible mismatches between the preferences of users and those of developers/researchers.
- This week, we also have a guest lecture by Edoardo's PhD student, Piotr Nawrot. The guest lecture will focus on sparse attention in Transformers and its accuracy–efficiency trade-offs. The contents of the guest lecture are not examinable.
- Readings for next week (Post-training: Instruction Tuning, Alignment, and Test-Time Compute):
Overview of the Week
This week, we continue our exploration of LLMs. Specifically, we will discuss how GPT-3 introduced the notion of in-context learning (ICL), a test-time strategy for adapting a model to a specific task without updating its parameters: a small number of input–output examples is prepended to the LLM's context. We will then see how a model's performance across tasks can be predicted reliably by scaling laws that depend on model size (number of parameters), the amount of training data, and the number of training steps. The Friday lecture (a non-examinable guest lecture) will touch upon efficient variants of attention in which the attention weights are sparse.
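To make the ICL idea concrete, here is a minimal sketch of how a few-shot prompt is assembled by prepending labelled demonstrations to a new input. The sentiment task, the `demonstrations` list, and the `build_icl_prompt` helper are illustrative placeholders, not part of the course materials; the resulting prompt would be sent to any frozen LLM, whose parameters are never updated.

```python
# Minimal sketch of few-shot in-context learning (ICL).
# Task, examples, and helper names are hypothetical illustrations.

def build_icl_prompt(demonstrations, query):
    """Prepend input-output examples to the query; no parameters are updated."""
    lines = []
    for text, label in demonstrations:
        lines.append(f"Review: {text}\nSentiment: {label}")
    # The query follows the same pattern, with the label left blank
    # for the model to complete.
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

demonstrations = [
    ("The film was a delight from start to finish.", "positive"),
    ("A tedious, overlong mess.", "negative"),
]
prompt = build_icl_prompt(demonstrations, "Surprisingly moving and well acted.")
print(prompt)
# A frozen LLM given this prompt is expected to continue the pattern
# and emit a label such as "positive".
```

For the scaling-laws part of the week, one widely cited parameterisation (Hoffmann et al., 2022) models the pre-training loss as a function of the parameter count $N$ and the number of training tokens $D$:

$$L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},$$

where $E$, $A$, $B$, $\alpha$, and $\beta$ are constants fitted to empirical training runs; the lecture may use a different but related formulation.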
Lectures and reading
| Lecture # | Who? | Slides | Reading |
|---|---|---|---|
| 1 | EP | Prompting and In-context Learning | 7.3–7.5 (*) |
| 2 | EP | Scaling Laws and LLM Evaluation | 8.8.1 (*), 7.6 (*) |
| 3 | Piotr Nawrot (Guest lecturer) | Memory Compression and Attention Sparsity (not examinable) | Optional reading: |
License
All rights reserved, The University of Edinburgh.