Welcome to FNLP!
Hello, we are pleased to welcome you to Foundations of Natural Language Processing.
This course is normally taken by third-year undergraduates and introduces you to some basic concepts and techniques in Natural Language Processing.
We will be delivering the course via 3 lectures a week (all in person, though some lectures are also available as pre-recorded videos), fortnightly tutorials in small groups (in person), fortnightly labs (in person) and chat forums (Piazza).
The course material for each week (including lecture slides, videos, required reading and quizzes) will appear on these pages (see the right-hand menu). The textbook we use for required reading is detailed under Library Resources. Visit the Learn page for the course for information about Assessment and your lab and tutorial groups (not yet assigned; check back in Week 1).
Alex Lascarides is the course organiser; she and Ivan Titov will be delivering the lectures.
Ivan Titov is a professor of natural language processing. His research focuses on natural language understanding (e.g., question answering, information extraction, and semantic parsing) and generation (e.g., machine translation), and more specifically on improving the generalization of deep learning NLP models across tasks and data distributions and making them more interpretable, controllable and accountable.
Alex Lascarides is a professor, whose research is in natural language processing (particularly discourse and dialogue understanding), interactive task learning, learning strategies in complex games, and learning to adapt to unforeseen possibilities.
We very much look forward to teaching you all.
Learning Outcomes
On successful completion of this course, you should be able to:
- Identify and analyze examples of ambiguity in natural language: ambiguity in part-of-speech, word sense, syntax, semantics and pragmatics. Explain how ambiguity presents a problem for computational analysis and NLP applications and some of the ways it can be addressed (see (2) to (5)).
- Describe and apply standard sequence models (e.g., HMMs), classification models (e.g., Naïve Bayes, MaxEnt) and parsing algorithms (e.g., statistical chart parsing and dependency parsing) for processing language at different levels (e.g., morphology, syntax and semantics), and simulate each algorithm step-by-step on toy linguistic examples with pen and paper.
- Explain and provide examples of how sparse data can be a problem for machine learning in NLP; describe and apply methods for addressing the sparse data problem.
- Identify suitable evaluation measures for testing solutions to a given NLP problem, explain the role of annotated corpora in developing those solutions, and assess and justify which sequence of algorithms is most appropriate for solving the problem, based on an understanding of the algorithms in (2) and (3).
- Implement parts of the NLP pipeline with the help of appropriate support code and/or tools. Evaluate and interpret the results of implemented methods on natural language data sets.
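To give a flavour of the kind of model named in outcome (2), here is a minimal sketch of Naïve Bayes text classification with add-one smoothing; the toy corpus, labels and function name are purely illustrative, not course materials, and in the tutorials you would work the same computation through by hand:

```python
from collections import Counter
from math import log

# Toy training data: (document tokens, class label). Purely illustrative.
train = [
    (["great", "fun", "film"], "pos"),
    (["fun", "film"], "pos"),
    (["boring", "film"], "neg"),
]

# Class frequencies and per-class word frequencies.
class_counts = Counter(label for _, label in train)
word_counts = {c: Counter() for c in class_counts}
for tokens, label in train:
    word_counts[label].update(tokens)

vocab = {w for tokens, _ in train for w in tokens}

def predict(tokens):
    """Return the class maximising log P(c) + sum_w log P(w|c),
    with add-one (Laplace) smoothing of the word probabilities."""
    best, best_score = None, float("-inf")
    for c in class_counts:
        score = log(class_counts[c] / len(train))     # log prior
        total = sum(word_counts[c].values())
        for w in tokens:                              # log likelihoods
            score += log((word_counts[c][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best

print(predict(["fun", "film"]))  # prints pos
print(predict(["boring"]))       # prints neg
```

Note that smoothing is what lets the model assign a non-zero probability to the word "boring" under the "pos" class, even though it never occurs in a "pos" training document.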
Course Outline
This course covers some of the linguistic and algorithmic foundations of natural language processing (NLP). It builds on algorithmic and data science concepts developed in second year courses, applying these to NLP problems. It also equips students for more advanced NLP courses in year 4. The course is strongly empirical, using corpus data to illustrate both core linguistic concepts and algorithms, including language modeling, part of speech tagging, syntactic processing, the syntax-semantics interface, and aspects of semantic and pragmatic processing. The theoretical study of linguistic concepts and the application of algorithms to corpora in the empirical analysis of those concepts will be interleaved throughout the course.
An indicative list of topics to be covered includes the following (although they won't be presented in this order):
1. Lexicon and lexical processing:
* morphology
* language modeling
* hidden Markov Models and associated algorithms
* part of speech tagging (e.g., for a language other than English) to illustrate HMMs
* smoothing
* text classification
2. Syntax and syntactic processing:
* the Chomsky hierarchy
* syntactic concepts: constituency (and tests for it), subcategorization, bounded and unbounded dependencies, feature representations
* context-free grammars
* lexicalized grammar formalisms (e.g., dependency grammar)
* chart parsing and dependency parsing (e.g., shift-reduce parsing)
* treebanks: lexicalized grammars and corpus annotation
* statistical parsing
3. Semantics and semantic processing:
* word senses: regular polysemy and the structured lexicon; distributional models; word embeddings (including biases found)
* compositionality, constructing a formal semantic representation from a (disambiguated) sentential syntactic analysis.
* predicate argument structure
* word sense disambiguation
* pragmatic phenomena in discourse and dialogue, including anaphora, presuppositions, implicatures and coherence relations.
* labelled corpora addressing word senses (e.g., Brown), semantic roles (e.g., Propbank, SemCor), discourse information (e.g., PDTB, STAC, RST Treebank).
4. Data and evaluation (interspersed throughout other topics):
* cross-linguistic similarities and differences
* commonly used datasets
* annotation methods and issues (e.g., crowdsourcing, inter-annotator agreement)
* evaluation methods and issues (e.g., standard metrics, baselines)
* effects of biases in data
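As a taste of two of the topics above (language modeling and smoothing), here is a minimal bigram language model with add-one smoothing; the two-sentence corpus and the boundary markers are illustrative assumptions, not course data:

```python
from collections import Counter

# Toy corpus: each sentence is padded with boundary markers. Illustrative only.
corpus = [["<s>", "the", "cat", "sat", "</s>"],
          ["<s>", "the", "dog", "sat", "</s>"]]

# Unigram and bigram counts over the whole corpus.
unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter(pair for sent in corpus for pair in zip(sent, sent[1:]))
V = len(unigrams)  # vocabulary size, used by the smoothing term

def prob(w1, w2):
    """Add-one smoothed bigram probability P(w2 | w1)."""
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

print(prob("<s>", "the"))  # (2+1)/(2+6) = 0.375
print(prob("the", "cat"))  # (1+1)/(2+6) = 0.25
print(prob("cat", "dog"))  # unseen bigram, but still non-zero: 1/7
```

The unseen bigram ("cat", "dog") would receive probability zero without smoothing; giving it a small non-zero probability is exactly the sparse-data fix described in the learning outcomes.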
Weekly Activities
This year, the course will be delivered entirely on campus:
- There are 3 in-person lectures each week (see the timetable for details). All in-person lectures will be live-streamed, and recordings will be uploaded for later viewing soon after each lecture.
- There are also 3 online post-lecture quizzes each week, to be done in your own time after watching the lecture videos and/or attending the in-person lecture, to test your understanding of the content of the lecture.
- There is 1 in-person tutorial every other week. You should attempt the tutorial exercises in advance. There are 5 tutorials, in weeks 2, 4, 6, 8 and 10. Check which tutorial group you are in under Groups on the Learn page. The tutorial exercises are available on the right-hand menu under Tutorial Exercises.
- There are lab sessions in weeks 3, 5, 7, 9 and 11. The class is divided into 2 groups; please check which group you're in under Groups. These labs are designed to enable independent work, so you can also do the lab exercises in your own time.
- As always, you can ask TAs and demonstrators questions on the discussion forum Piazza about both the lab exercises and the two pieces of coursework, as and when those queries arise.
We suggest the following weekly schedule:
| Day | Suggested activities |
| --- | --- |
| Monday | The online quizzes, required reading and videos are available in Course Materials. It should take you about 6 hours in total each week to watch the videos or attend the lectures, do the quizzes and complete the required reading set for that week. |
| Tuesday | |
| Wednesday | |
| Thursday | |
| Friday | |
Overall, each week, the Directed Learning and Independent Learning activities (i.e. the guided self-study activities, such as preparing your tutorial assignments, doing the required reading, or doing the lab exercises) should take you about 10 hours in total. This estimate does not include the time you need to do the two pieces of FNLP coursework.
License
All rights reserved, The University of Edinburgh.