This page consists of:
- three videos of short lectures. They cover:
  - Text Corpora: Motivation
  - Text Corpora: Some basic principles and experimental design
  - Text Corpora: Tokenisation
- some required reading from Jurafsky and Martin and the NLTK book
- a quiz that tests your understanding of the material presented here.
Please do the required reading and attempt the quiz. If there is anything you don't understand, you can ask questions in the lecture or on Piazza.
Lecture 3 Slides: whole!
3a: Text Corpora: Motivation
- slides: 03a_slides.pdf
3b: Text Corpora: Basic Principles and Experimental Design
- slides: 03b_slides.pdf
3c: Text Corpora: Tokenisation
- slides: 03c_slides.pdf
Recommended Reading
J&M, 2nd edition, chapter 1
NLTK, chapter 11
NOTE: The abbreviation J&M refers to the textbook:
Dan Jurafsky and James H. Martin, Speech and Language Processing.
When we specify 2nd edition, we are referring to the version of the book that was published by Pearson International in 2008.
When we specify 3rd edition, we will supply links to the drafts of the relevant parts of that book. The third edition has not been published yet, but the current draft is available here: https://web.stanford.edu/~jurafsky/slp3/.
The abbreviation NLTK refers to the textbook:
Bird, S., E. Klein and E. Loper (2009), Natural Language Processing with Python, O'Reilly Media
An (early) online version of this book is here: http://www.nltk.org/book_1ed/.
Quiz 3: Corpora and Sentiment Analysis
These questions are designed to test your understanding of the above course content; doing this quiz does not contribute to your overall grade. Some questions require a text answer. You can ask for formative feedback on these from your tutor or on Piazza. Other questions are multiple choice or require a numeric answer; you will get immediate feedback for these. Please don't attempt this quiz until you have acquainted yourself with this lecture and the required reading.
You must be logged in to Learn to do this quiz.