ATML: Optimisation in machine learning

Large neural networks trained on suitable data often work well and produce highly accurate models. But why do they work? What is going on inside a neural network? And what is it about the stochastic gradient descent training algorithm that makes it so successful?

In this track, we will try to understand what makes a good model and good training.

Topics 

We will start with some basic concepts and definitions in machine learning and move toward more advanced topics. Slides, notes, and exercises will be uploaded as we proceed.

  • ML Basics — definitions and notations
    • The elements of a general ML system -- domains, hypothesis classes and loss functions
    • How to define and describe an ML system
    • Generalisation vs memorisation
  • Linear classifiers and convex optimization
    • Convex optimisation
    • Gradient descent and stochastic gradient descent (a small illustrative sketch follows this topic list)
  • Neural nets
    • ReLU vs other activations — what makes ReLU successful?
    • Why are deep networks better than shallow networks?
    • Cross entropy loss and loss landscapes — why neural networks overfit
  • Why SGD works — the various factors that affect optimisation and generalisation
    • Randomness
    • Sharp and flat minima
    • Fractal dimension and generalisation
  • What we know about neural networks
    • Neural collapse — what happens inside NN classifiers
    • Overparameterisation and pruning — why we need large networks and how much of a large network we really need
  • Additional topics
    • Fairness -- definitions and impossibility results -- why fairness is hard
    • Explainability -- what does it mean to explain a model's behaviour, and how do we do it?
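
As a small illustration of the gradient descent and stochastic gradient descent topic above, here is a minimal sketch (not part of the course material) comparing the two on a simple least-squares problem. All names and parameter values are illustrative choices of ours, not anything prescribed by the track.

```python
import numpy as np

# Toy least-squares problem: minimise f(w) = (1/2n) * ||Xw - y||^2, which is convex.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def loss(w):
    return 0.5 * np.mean((X @ w - y) ** 2)

def full_gradient(w):
    # Exact gradient of the average squared error over the whole dataset.
    return X.T @ (X @ w - y) / n

def gradient_descent(steps=200, lr=0.1):
    w = np.zeros(d)
    for _ in range(steps):
        w -= lr * full_gradient(w)
    return w

def sgd(steps=200, lr=0.1, batch_size=10):
    # Same update rule, but the gradient is estimated from a random mini-batch,
    # so each step is cheaper and noisier.
    w = np.zeros(d)
    for _ in range(steps):
        idx = rng.choice(n, size=batch_size, replace=False)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
        w -= lr * grad
    return w

print("GD  loss:", loss(gradient_descent()))
print("SGD loss:", loss(sgd()))
```

On a convex problem like this, both methods reach essentially the same minimiser; part of this track is understanding why the noisy SGD updates also behave so well on the non-convex loss landscapes of neural networks.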

License
All rights reserved, The University of Edinburgh.