RL: Lecture Schedule
Note: See the "Coding Proficiency Self-Check" below for details about programming requirements.
Lecture times and location:
- Tuesdays and Fridays, 14.10–15.00
- 40 George Square, Lecture Theatre B (entrance in the annex building; look for signage)
Lecture slides are linked in the table below. Slides may be updated; any updated slides will be uploaded at least 24 hours before the lecture.
The course follows the second edition of the RL book, Reinforcement Learning: An Introduction by Richard Sutton and Andrew Barto, the 2022 version of which can be downloaded for free here. Required reading from the RL book for each lecture is given on the last slide of the lecture and in the table below.
Week | Date | Topic (slides) | Required Reading
1 | 14 January '25 | Introduction | Ch. 1 (1.1–1.4); Coding Proficiency Self-Check
1 | 17 January '25 | Multi-armed bandits | Ch. 2 (2.1–2.8)
2 | 21 January '25 | Markov decision processes (part 1) | Ch. 3 (3.1–3.7)
2 | 24 January '25 | Lecture cancelled due to adverse weather conditions |
3 | 28 January '25 | Markov decision processes (part 2) and Dynamic programming (part 1) | Ch. 4 (4.1–4.7)
3 | 31 January '25 | Monte Carlo methods (and DP part 2) | Ch. 5 (5.1–5.7)
4 | 04 February '25 | Temporal-difference learning | Ch. 6 (6.1–6.2, 6.4–6.6), Ch. 7 (7.1–7.3)
4 | 07 February '25 | Tutorial lecture: Building a complete RL system | Demo code on Learn
5 | 11 February '25 | Coursework introduction | Coursework on Learn (from 11 February '25)
5 | 14 February '25 | Planning and learning (incl. n-step TD) | Ch. 8 (8.1–8.3, 8.10–8.11)
Flexible learning week (no lectures)
6 | 25 February '25 | Value function approximation | Ch. 9 (9.1–9.5), Ch. 10 (10.1), Ch. 11 (11.1)
6 | 28 February '25 | Policy gradient methods | Ch. 13 (13.1–13.5)
7 | 04 March '25 | Deep reinforcement learning | These topics will not be examined.
7 | 07 March '25 | Reward: Neuroscience, Reward Hypothesis, Inverse RL, Shaping |
8 | 11 March '25 | RL Beyond the Markov Property |
8 | 14 March '25 (last lecture) | Multi-agent reinforcement learning |
License
All rights reserved, The University of Edinburgh.