#### 18 April 2024: Coursework Marks Released

Dear RL students,

Marking for the RL coursework has concluded. You should now be able to see your mark in Learn.

Please note that these are raw marks. If any late penalties are applicable, they will be applied by ITO.

Besides the mark itself, you can also access feedback in Learn. It explains the marks awarded and any deductions, and gives an overview of how many marks you received for each question and its sub-components.

If you have trouble accessing this feedback, please let us know on Piazza and we will try to sort out any difficulties.

Best of luck for your exam preparations!

Your RL team

#### 18 March 2024: Updates to Coursework

Dear RL students,

We would like to correct two typos in the coursework that were raised as questions on Piazza. We suggest you incorporate these updates into your code.

To see the equations in this post displayed nicely in math mode, please see the corresponding announcement on Piazza: https://piazza.com/class/losstjmpabz2r/post/116

**Exercise 3, part 2(a)(ii):** There is a typo in the update rule for exponential decay in the PDF. It should read:

`\epsilon_{t+1} \leftarrow r^{\frac{1}{t_{max}}} \epsilon_t`

i.e. set t=1 in the exponent, because this update is for a single timestep. However, note that the schedule hyperparameter function is not called at every timestep, but is actually called once per episode (see line 160 of train_dqn.py). This means you must also take into account the number of timesteps elapsed between epsilon updates, so the update could also be written more generally as:

`\epsilon_{t+\Delta t} \leftarrow r^{\frac{\Delta t}{t_{max}}} \epsilon_t`
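As a rough illustration of the general form above, here is a minimal Python sketch. The function and argument names (`decay_epsilon`, `decay_rate`, `delta_t`, `max_timesteps`) are hypothetical and are not taken from the coursework code, and the sketch ignores any lower bound on epsilon that your schedule may need. It only shows that updating once per episode with the elapsed timesteps gives the same result as updating at every timestep.

```python
def decay_epsilon(epsilon: float, decay_rate: float, delta_t: int, max_timesteps: int) -> float:
    """Apply the exponential decay update for delta_t elapsed timesteps.

    Implements eps_{t + delta_t} = r^(delta_t / t_max) * eps_t, where
    decay_rate plays the role of r and max_timesteps the role of t_max.
    All names here are illustrative assumptions, not the coursework API.
    """
    return (decay_rate ** (delta_t / max_timesteps)) * epsilon


def compare_schedules(decay_rate: float = 0.1, max_timesteps: int = 1000):
    """Return (epsilon updated once per episode, epsilon updated every timestep)."""
    episode_lengths = [200, 200, 100]  # made-up episode lengths, 500 timesteps in total

    # Update once per episode, passing the timesteps elapsed in that episode.
    eps_per_episode = 1.0
    for length in episode_lengths:
        eps_per_episode = decay_epsilon(eps_per_episode, decay_rate, length, max_timesteps)

    # Update at every single timestep (delta_t = 1) for the same total duration.
    eps_per_step = 1.0
    for _ in range(sum(episode_lengths)):
        eps_per_step = decay_epsilon(eps_per_step, decay_rate, 1, max_timesteps)

    return eps_per_episode, eps_per_step
```

Calling `compare_schedules()` should return two values that agree up to floating-point error, which is why the number of elapsed timesteps has to enter the exponent when the update runs only once per episode.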

**Exercise 4:** There is a typo in the constants dictionary for Q4. Within the constants file, you will see the two key-value pairs:

```
"episode_length": 31000,
"max_timesteps": 200,
```

These should be reversed like so:

```
"episode_length": 200,
"max_timesteps": 31000,
```

Technically, Q4 is still solvable with the original, mistyped constants. If you have already solved the problem that way, no marks will be deducted.

We hope this clears up any confusion on these questions.

#### 13 Feb 2024: Coursework Released

The RL coursework has been released today. The introduction slides and coursework description can be found on the Schedule page. Today's lecture, which will be recorded, will introduce the coursework and give you a chance to ask questions. We highly recommend attending.

Good luck with the coursework!

#### 9 Feb 2024: RL Tutorial Lecture

This Friday 9 Feb, the RL TAs will deliver a special tutorial lecture on how to build and evaluate a complete RL system. We highly recommend attending, as this is very good preparation for the RL coursework (released on Tuesday next week). Slides and demo code can be found on the schedule page.

#### 5 Feb 2024: Resources

We would like to draw your attention to the Resources subpage on the RL course page:

https://opencourse.inf.ed.ac.uk/rl/resources

This page contains pointers to many useful learning resources on RL, including an RL algorithm chart. We hope you find these resources useful.

#### 15 Jan 2024: Welcome to the course!

The course page with all information can be found here: https://opencourse.inf.ed.ac.uk/rl

Course announcements can be found on the course page under "Announcements" and are sent via email to the course list (all registered students are automatically added to this list).

The first lecture will take place on **Tuesday 16th Jan 2024** at 14.10 in Appleton Tower Lecture Theatre 1. Information about lecture dates and topics can be found here: https://opencourse.inf.ed.ac.uk/rl/schedule