ATML: Optimisation in machine learning

Large neural networks trained on suitable data often work well and produce highly accurate models. But why do they work? What is going on inside a neural network? And what is it about the stochastic gradient descent training algorithm that makes it so successful?

In this track, we will try to understand what makes a good model and good training.

Topics 

We will start with some basic concepts and definitions in machine learning and move toward more advanced topics. Slides, notes, and exercises will be uploaded as we proceed.

  • ML Basics — definitions and notations
    • The elements of a general ML system -- domains, hypothesis classes and loss functions
    • How to define and describe an ML system
    • Generalisation vs memorisation
  • Linear classifiers and convex optimization
    • Convex optimisation
    • Gradient descent and stochastic gradient descent (a small illustrative sketch follows this topic list)
  • Neural nets
    • ReLU vs other activations — what makes ReLU successful?
    • Why are deep networks better than shallow networks?
    • Cross entropy loss and loss landscapes — why neural networks overfit
  • Why SGD works — the various factors that affect optimisation and generalisation
    • Randomness
    • Sharp and flat minima
    • Fractal dimension and generalisation
  • What we know about neural networks
    • Neural collapse — what happens inside NN classifiers
    • Overparameterisation and pruning — why we need large networks and how much of a large network we really need
  • Additional topics
    • Fairness -- definitions and impossibility results -- why fairness is hard
    • Explainability -- what does it mean to explain a model's behaviour, and how do we do it?
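
As a small illustration of the gradient descent and stochastic gradient descent topic above, here is a minimal sketch (not part of the course material) comparing the two on a simple least-squares problem. All names and parameter values are illustrative choices of ours, not anything prescribed by the track.

```python
import numpy as np

# Toy least-squares problem: minimise f(w) = (1/2n) * ||Xw - y||^2, which is convex.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def loss(w):
    return 0.5 * np.mean((X @ w - y) ** 2)

def full_gradient(w):
    # Exact gradient of the average squared error over the whole dataset.
    return X.T @ (X @ w - y) / n

def gradient_descent(steps=200, lr=0.1):
    w = np.zeros(d)
    for _ in range(steps):
        w -= lr * full_gradient(w)
    return w

def sgd(steps=200, lr=0.1, batch_size=10):
    # Same update rule, but the gradient is estimated from a random mini-batch,
    # so each step is cheaper and noisier.
    w = np.zeros(d)
    for _ in range(steps):
        idx = rng.choice(n, size=batch_size, replace=False)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
        w -= lr * grad
    return w

print("GD  loss:", loss(gradient_descent()))
print("SGD loss:", loss(sgd()))
```

On a convex problem like this, both methods reach essentially the same minimiser; part of this track is understanding why the noisy SGD updates also behave so well on the non-convex loss landscapes of neural networks.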

License
All rights reserved, The University of Edinburgh.