Week 8: LLM Inference and Scaling Laws

Reminders and announcements

  • Tutorial 3 group meetings are this week. This tutorial will tackle issues with web-scale data, dialect and discrimination, and possible mismatches between the preferences of users and those of developers/researchers.
  • This week, we also have a guest lecture by Edoardo's PhD student, Piotr Nawrot. The guest lecture will focus on sparse attention in Transformers and its accuracy–efficiency trade-offs. The contents of the guest lecture are not examinable.
  • Readings for next week (Post-training: Instruction Tuning, Alignment, and Test-Time Compute):
    • If you're using the PDF: 10.1-10.3
    • If you're using the website: 9.1-9.3

Overview of the Week

This week, we continue our exploration of LLMs. Specifically, we will discuss how GPT-3 introduced the notion of in-context learning (ICL), a test-time strategy for adapting a model to a specific task without updating its parameters. This is achieved by prepending a small number of input–output examples to the LLM's context, as sketched below. Afterwards, we will see how a model's accuracy across tasks can be predicted reliably by scaling laws that relate performance to model size (number of parameters), the number of datapoints, and the number of training steps. The Friday lecture (a non-examinable guest lecture) will then touch upon efficient variants of attention, where the attention weights are sparse.
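To make the ICL mechanism concrete, here is a minimal Python sketch of few-shot prompt construction. The sentiment task, labels, and demonstrations are illustrative assumptions, not examples from the course materials; the resulting string would simply be passed to an LLM as its context.

    # Illustrative demonstrations for a hypothetical sentiment task
    # (not taken from the course materials).
    demonstrations = [
        ("The film was a delight from start to finish.", "positive"),
        ("I would not recommend this restaurant to anyone.", "negative"),
        ("The battery lasts far longer than advertised.", "positive"),
    ]

    def build_icl_prompt(demos, test_input):
        """Prepend input-output demonstrations to the test input.

        The model's parameters are never updated: the task is specified
        purely through the context the model conditions on.
        """
        lines = [f"Review: {text}\nSentiment: {label}\n" for text, label in demos]
        lines.append(f"Review: {test_input}\nSentiment:")  # the LLM completes the label
        return "\n".join(lines)

    prompt = build_icl_prompt(demonstrations, "The plot was predictable and dull.")
    print(prompt)

For the scaling-laws part, one widely used parametric form predicts the loss from the parameter count N and the number of training tokens D. The specific Chinchilla-style parameterisation and coefficients below are the fits reported by Hoffmann et al. (2022), shown only for illustration; they are not necessarily the form used in the course readings.

    def predicted_loss(n_params, n_tokens,
                       E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
        """Chinchilla-style scaling law: L(N, D) = E + A / N**alpha + B / D**beta.

        The default coefficients are the published Hoffmann et al. (2022) fits,
        used here purely as an illustration.
        """
        return E + A / n_params**alpha + B / n_tokens**beta

    # Example: predicted loss for a 1B-parameter model trained on 20B tokens.
    print(predicted_loss(1e9, 20e9))

Fitting such a curve to smaller training runs and extrapolating it is what makes performance at larger scales predictable before training.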

Lectures and reading

Lecture # | Who? | Slides | Reading
1 | EP | Prompting and In-context Learning | 7.3-7.5 (*)
2 | EP | Scaling Laws and LLM Evaluation | 8.8.1 (*), 7.6 (*)
3 | Piotr Nawrot (Guest lecturer) | Memory Compression and Attention Sparsity (not examinable)

Optional reading:

License
All rights reserved The University of Edinburgh