Week 4: Tagging and Ethics 1
Reminders and announcements
Welcome to Week 4! We are switching lecturers again: I (Sharon) will be teaching for the next three weeks, during which we'll cover various aspects of syntax as well as some ethical issues in NLP.
- Tutorial groups this week. Our first group meetings are this week. Please remember to make a good attempt at the problems before you arrive at your meeting. The problems are available under last week's Reminders and Announcements.
- Assignment 1 is due next week. Get started now if you haven't already! If you have questions, you can:
- Post a private question to Piazza. When we respond, we'll also make the question public if the answer doesn't give away solutions to other students. Please note that we do not normally staff Piazza on weekends.
- Attend extra help hours: see the bottom of this page for the available times.
- There are no lab or tutorial groups next week, to give you a little extra time for the assignment.
Overview of this week
Monday's lecture will be a bit of a change. Instead of focusing on linguistic or computational methods, we'll have the first of a few lectures where we will discuss some of the ethical issues that arise when we start to use NLP systems in the real world. In this case, we will be talking about algorithmic bias: what it is, and some examples of where and how it can arise. To illustrate these ideas, I'll talk about a paper showing bias against speakers of a particular dialect of English. I'll also talk a little about how linguists think about dialects, which might be different from how you think about them!
Then, we'll get back to technical topics and focus on parts of speech (categories such as "noun", "verb", and "adjective") and part-of-speech tagging -- the problem of taking an input sentence and returning the part of speech of each word. If you want a head start, think about why this problem is non-trivial in the first place (why not just tag every occurrence of "dog" as a noun?); there's a small illustration below.
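As a hint at the answer, here is a tiny Python illustration. The sentences and the one-entry dictionary are invented purely for this example: the same word form needs different tags in different contexts, so any fixed one-tag-per-word lookup must be wrong somewhere.

```python
# Invented sentences illustrating part-of-speech ambiguity: a naive
# word-to-tag dictionary must mislabel "dog" in at least one of them.
lookup = {"dog": "NOUN"}  # tag every occurrence of "dog" as a noun

sentences = [
    ("I walked the dog", "NOUN"),                 # here "dog" is indeed a noun
    ("Reporters dog the senator for answers", "VERB"),  # here "dog" is a verb
]
for sentence, correct_tag in sentences:
    print(f"{sentence!r}: lookup says {lookup['dog']}, correct tag is {correct_tag}")
```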
For POS tagging, we will be looking at Hidden Markov Models, which combine some of the ideas from n-gram models (sequence modelling) with those of the Naive Bayes model (latent or hidden variables). We'll also see our first example of a dynamic programming algorithm. This is an algorithmic strategy for solving problems that could take exponential time if solved naively, and it's used in a lot of classic (and some modern) NLP and machine learning methods.
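If you'd like to see the dynamic-programming idea in code before the lecture, here is a minimal sketch of Viterbi decoding on a toy HMM. The two-tag model, the tiny vocabulary, and all of the probabilities are invented for illustration; they are not taken from the course materials or the textbook.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words` under the toy HMM."""
    # best[i][t]: probability of the best tag sequence for words[: i + 1]
    # that ends in tag t; back[i][t]: the previous tag on that best path.
    best = [{} for _ in words]
    back = [{} for _ in words]

    for t in tags:
        best[0][t] = start_p[t] * emit_p[t].get(words[0], 0.0)
        back[0][t] = None

    for i in range(1, len(words)):
        for t in tags:
            # Reuse best[i - 1][*] instead of re-enumerating all tag sequences:
            # this reuse is the dynamic-programming step.
            prev, score = max(
                ((p, best[i - 1][p] * trans_p[p][t]) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score * emit_p[t].get(words[i], 0.0)
            back[i][t] = prev

    # Recover the best path by following backpointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))


# Toy two-tag model with invented probabilities (not from the course).
tags = ["NOUN", "VERB"]
start_p = {"NOUN": 0.6, "VERB": 0.4}
trans_p = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.6, "VERB": 0.4},
}
emit_p = {
    "NOUN": {"dogs": 0.4, "bark": 0.1},
    "VERB": {"dogs": 0.05, "bark": 0.5},
}

print(viterbi(["dogs", "bark"], tags, start_p, trans_p, emit_p))  # ['NOUN', 'VERB']
```

Because each cell best[i][t] is computed once and then reused at position i + 1, the running time grows with the sentence length times the square of the tag-set size, rather than exponentially in the sentence length.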
Lectures and reading
Lecture # | Who? | Slides | Reading |
---|---|---|---|
1 | SG | Dialect and discrimination | Understanding Bias, Part 1 from Machines Gone Wrong by Lim Swee Kiat. |
2 | SG | Dialect and discrimination (cont.) and Part-of-speech tagging. (The live lecture is cancelled; videos are available under Additional Materials below.) | JM3 17.0-17.2 (*), 17.3, 17.4.0-17.4.4 (*) |
3 | SG | Algorithms for HMMs | JM3 17.4.4-17.4.6 (*). (See also Additional Materials below.) |
Additional Materials
- Videos for Thu (Lecture 2): these are from a few years ago. I have made a few minor updates to the slides for this year's planned lecture, but the videos cover the same material with very similar slides:
- Video 1 (~15min): The Blodgett and O'Connor case study. Please read slides 1-3 from today's slides before starting the video (which begins at slide 4).
- Video 2 (~11min), Video 3 (~11min), Video 4 (~15min): Part-of-speech tagging. The final video runs about 5min past where I was planning to stop today, so it corresponds to the start of Friday's slides.
- To get a better intuition about the Viterbi algorithm, you can try playing around with this spreadsheet that implements the Viterbi algorithm (thanks to Shay Cohen).
- For students who are keen to learn more about the other HMM algorithms (for computing likelihood and for learning the parameter weights), feel free to read JM3 Appendix A.2-A.5. This material is optional and non-examinable!
Extra help hours
TAs will be available at the following times to help with questions about the assignment (or other course material), in the usual room (AT 5.07).
Day | Time |
---|---|
Mon 7th Oct | 10am |
Tue 8th Oct | noon |
Thu 10th Oct | 3pm |
Fri 11th Oct | 10am |
Mon 14th Oct | 10am |
Tue 15th Oct | noon |