ASR: Lectures

There are 18 lectures, taking place in weeks 1-9. Lectures are held on Mondays and Thursdays at 14:10, starting Monday 12 January. Monday lectures will be held in G.159 in Old College (just inside the Law School entrance on the north side of the quad) and Thursday lectures will be in 2.35 in the Edinburgh Futures Institute.

Lecture live streaming is available via Media Hopper Replay for students not able to attend in person – the link can be found on Learn under “Course Materials”.

All reading listed below is optional and will not be examinable. Works listed as Reading may be useful to improve your understanding of the lecture content; Background reading is for interest only.

  1. Monday 12 January 2026. Introduction to Speech Recognition
    Slides
    Reading: J&M: chapter 7, section 9.1; R&H review chapter (sec 1).
     
  2. Thursday 15 January 2026. Speech Signal Analysis 1 
    Reading: O'Shaughnessy (2000), Speech Communications: Human and Machine, chapter 2; J&M: Sec 9.3; Paul Taylor (2009), Text-to-Speech Synthesis: Ch 10 and Ch 12. 
    SparkNG MATLAB realtime/interactive tools for speech science research and education
     
  3. Monday 19 January 2026. Speech Signal Analysis 2 
    Reading: O'Shaughnessy (2000), Speech Communications: Human and Machine, chapters 3-4 
     
  4. Thursday 22 January 2026. Introduction to Hidden Markov Models
    Reading: Rabiner & Juang (1986) Tutorial; J&M: Secs 6.1-6.5, 9.2, 9.4; R&H review chapter (secs 2.1, 2.2).
     
  5. Monday 26 January 2026. HMM algorithms 
    Reading: J&M: Sec 9.7, G&Y review (sections 1, 2.1, 2.2); (J&M: Secs 9.5, 9.6, 9.8 for introduction to decoding). 
     
  6. Thursday 29 January 2026. Gaussian mixture models 
    Reading: R&H review chapter (sec 2.2) 
     
  7. Monday 2 February 2026. HMM acoustic modelling 3: Context-dependent phone modelling 
    Reading: J&M: Sec 10.3; R&H review chapter (sec 2.3); Young (2008). 
     
  8. Thursday 5 February 2026. Large vocabulary ASR 
    Background reading: Ortmanns & Ney; Young (2008), sec 27.2.4 
     
  9. Monday 9 February 2026. ASR with WFSTs
    Reading: Mohri et al (2008), Speech recognition with weighted finite-state transducers, in Springer Handbook of Speech Processing (sections 1 and 2) 
     
  10. Thursday 12 February 2026. Hybrid acoustic modelling with neural networks 
    Background Reading: Morgan and Bourlard (May 1995). Continuous speech recognition: Introduction to the hybrid HMM/connectionist approach, IEEE Signal Processing Mag., 12(3):24-42
    Mohamed et al (2012). Understanding how deep belief networks perform acoustic modelling, ICASSP-2012. 
      
    Monday 16 - Friday 20 February 2026.
    NO LECTURES OR LABS - FLEXIBLE LEARNING WEEK. 
      
     
  11. Monday 23 February 2026. Neural network architectures for ASR 
    Reading: Maas et al (2017), Building DNN acoustic models for large vocabulary speech recognition, Computer Speech and Language, 41:195-213.
    Background reading: Peddinti et al (2015). A time delay neural network architecture for efficient modeling of long temporal contexts, Interspeech-2015
    Graves et al (2013), Hybrid speech recognition with deep bidirectional LSTM, ASRU-2013.
     
  12. Thursday 26 February 2026. Speaker Adaptation 
    Reading: G&Y review, sec. 5
    Woodland (2001), Speaker adaptation for continuous density HMMs: A review, ISCA Workshop on Adaptation Methods for Speech Recognition
    Bell et al (2021), Adaptation Algorithms for Neural Network-Based Speech Recognition: An Overview, IEEE Open Journal of Signal Processing, Vol 2:33-66. 
     
  13. Monday 2 March 2026. Connectionist temporal classification 
    Reading: A Hannun et al (2014), Deep Speech: Scaling up end-to-end speech recognition, arXiv:1412.5567. 
    A Hannun (2017), Sequence Modeling with CTC, Distill. 
    Sec 27.3.1 of Young (2008), HMMs and Related Speech Recognition Technologies.
    Background Reading: Y Miao et al (2015), EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding, ASRU-2015. 
    A Maas et al (2015), Lexicon-free conversational speech recognition with neural networks, NAACL HLT 2015. 
     
  14. Thursday 5 March 2026. Encoder-decoder models 1: the RNN transducer 
    Background reading: Alex Graves (2012), Sequence Transduction with Recurrent Neural Networks, International Conference on Machine Learning (ICML) 2012 Workshop on Representation Learning
     
  15. Monday 9 March 2026. Guest lecture: Multilingual and low-resource speech recognition 
    Background reading: Besacier et al (2014), Automatic speech recognition for under-resourced languages: A survey, Speech Communication, 56:85-100.
    Huang et al (2013). Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers, ICASSP-2013. 
     
  16. Thursday 12 March 2026. Encoder-decoder models 2: attention-based models 
    Reading: W Chan et al (2015), Listen, attend and spell: A neural network for large vocabulary conversational speech recognition, ICASSP. 
    R Prabhavalkar et al (2017), A Comparison of Sequence-to-Sequence Models for Speech Recognition, Interspeech. 
    Background Reading: C-C Chiu et al (2018), State-of-the-art sequence recognition with sequence-to-sequence models, ICASSP.
    S Watanabe et al (2017), Hybrid CTC/Attention Architecture for End-to-End Speech Recognition, IEEE STSP, 11:1240--1252. 
     
  17. Monday 16 March 2026. Self-supervised learning for speech 
    Slides
    Background Reading: Baevski et al. (2020), wav2vec 2.0: A framework for self-supervised learning of speech representations, NeurIPS
    Hsu et al. (2021), HuBERT: Self-supervised speech representation learning by masked prediction of hidden units, IEEE/ACM Transactions on Audio, Speech, and Language Processing
    A van den Oord et al (2018), Representation learning with contrastive predictive coding
     
  18. Thursday 19 March 2026. ASR with Large Language Models 
    Background Reading: Zhang et al. (2023), SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities, Findings of EMNLP
    Wu et al. (2023), On decoder-only architecture for speech-to-text and large language model integration, ASRU
     
  19. Revision tutorials (various time slots in April or May)

Reading

All listed reading is optional and will not be examinable. Works listed as Reading may be useful to improve your understanding of the lecture content; Background reading is for interest only.

Textbook

  • J&M: Daniel Jurafsky and James H. Martin (2008). Speech and Language Processing, Pearson Education (2nd edition). 
    You can also look at the draft 3rd edition online – we take a much broader view of ASR than covered in this edition, but material in Appendix A and Chapter 16 is useful.

Review and Tutorial Articles

Other supplementary materials

  • In case you need more introductory articles on speech signal analysis (Lectures 2 and 3):
    Daniel P.W. Ellis, "An introduction to signal processing for speech", Chapter 22 in The Handbook of Phonetic Sciences, 2nd ed., ed. Hardcastle, Laver, and Gibbon. pp. 757-780, Blackwell, 2008.
  • Speech.zone by Prof Simon King at the University of Edinburgh.

Copyright (c) University of Edinburgh 2015-2026
The ASR course material is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (licence.txt).
This page maintained by Peter Bell.
