Week 2: Classification
Reminders and Announcements
Welcome to week 2! A big congratulations to all of you for making it through the first week! It can be a lot to keep track of everything at the start, so here are a few reminders:
- Week 1 quizzes. Many of you have completed the quizzes for the Week 1 units (good job!). But if you haven't, please try to catch up as soon as you can. If you are having any technical difficulties, please do let us know.
- Lab sessions this week!
  - If you are not registered for the course by Monday morning, you won't be assigned to a lab session yet. In that case, please just pick one of the lab times to attend for this week. You'll get a permanent timeslot once you register.
  - The lab is available below, under Additional Materials. Before your scheduled lab, go through the Preliminaries section of the lab while logged into a DICE machine (i.e. in one of the Appleton Tower labs).
- Response to Intake Form. Please see the bottom of this page for our responses to the intake form.
- TA hours: A reminder that we have drop-in help hours with a TA every Wed from 2-3pm in AT 5.07, if you have a question that's not easily answered on Piazza, or just prefer to ask in person!
- If you aren't yet registered, only recently registered, or missed a lot of Week 1:
  - Make sure you go through all of the steps in the Week 0 preparation list, and start working through the Week 1 materials.
  - Especially important to do as soon as possible: Watch the lecture 1 video and read the Course Guidance document, so you know how the course is structured. Also check that you can access the quizzes.
  - If you run into problems with either of the above, please post to Piazza. In the meantime, keep working through other material to catch up.
Overview of Week 2
In the lectures this week, we will introduce the notion of a model and how we can train and evaluate one based on data. Specifically, we will focus on models for classification, the task of assigning a class to an example (e.g., deciding if an email is legitimate or spam). We will discuss two main families of models, Naive Bayes and logistic regression, comparing their strengths and weaknesses.
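To make the train-and-evaluate idea concrete before the lectures, here is a minimal sketch of a Naive Bayes spam classifier with add-one smoothing. It is only an illustration, not the course's code: the function names (`train_nb`, `predict_nb`, `accuracy`) and the tiny `TRAIN` and `TEST` sets are invented for this example, and a real evaluation would need far more data.

```python
import math
from collections import Counter, defaultdict

def train_nb(examples):
    """Count classes and words from (tokens, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior: fraction of training examples with this label.
        score = math.log(class_counts[label] / n_docs)
        total = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # skip words never seen in training
            # Add-one smoothed log likelihood of the word given the class.
            score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the gold label."""
    correct = sum(predict_nb(model, toks) == label for toks, label in examples)
    return correct / len(examples)

# Toy data, made up for illustration only.
TRAIN = [
    ("win money now".split(), "spam"),
    ("cheap money offer".split(), "spam"),
    ("meeting agenda for monday".split(), "ham"),
    ("lecture notes for monday".split(), "ham"),
]
TEST = [
    ("win cheap money".split(), "spam"),
    ("monday lecture agenda".split(), "ham"),
]

model = train_nb(TRAIN)
print(accuracy(model, TEST))
```

The model is trained on one set of examples and evaluated on a separate one, which is the pattern the lectures develop in detail. Logistic regression addresses the same task but learns weights for features rather than counting them.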
Lectures and reading
Lecture # | Who? | Slides | Reading |
---|---|---|---|
1 | EP | Data and probability | |
2 | EP | Classification and Naive Bayes | JM3 4.1-4.4, 4.7-4.8 |
3 | EP | Logistic Regression | JM3 5.1-5.6 |
Additional Materials
- Week 2 exercises. [Solutions.] Exercise sheets ask you to solve problems that will take a bit longer than the quizzes. Some of the problems have a single solution and some will be more open-ended. You should plan to work through the exercises by the beginning of the week after they are distributed.
  - Normally, exercises will be distributed in the week before a tutorial group meeting, and we will use those meetings to discuss different solutions or solution methods to the previous week's exercises (which will often be more open-ended ones). So make a good attempt at the exercises before attending your tutorial group.
  - However, it's difficult to arrange tutorial groups early in the semester, so we do not have a group meeting for this week's exercises. Instead, we will post solutions early next week. Feel free to post to Piazza or attend the TA hour if you have questions in the meantime, or are still confused after reading the solutions.
- Week 2 lab
  - Before your scheduled lab, go through the Preliminaries section of the lab while logged into a DICE machine (i.e. in one of the Appleton Tower labs).
  - The rest of the lab should be done during the lab session, working with a partner.
  - Solutions will be released after the final lab session (by the end of this week) at the bottom of this page.
- Assignment 1 partnering form. Complete this by the end of Friday if you want to work with a partner for the first assignment, which we strongly advise.
  - If you found your own partner, one of you should fill in the form to tell us who the partner is.
  - If you want us to assign you a partner, fill in the form to tell us that.
  - We assume most students will want to work together in person, but if not, you can tell us on the form.
  - If you want to work alone, you don't need to fill in the form. However, previous experience suggests that pairs tend to do better on the assignment than individuals.
Response to intake form and students' worries
Thank you to everyone who completed our intake form: we received 142 responses. We now know that there are (at least) 33 different sets of native languages and dialects you speak at home. This means that each tutorial discussion group will have students from several different language backgrounds, which should give you a rich source of different experiences that inform your discussions (and other interactions with each other). A majority of students (60%) have an academic background in computer science, but there are significant numbers of students from linguistics, maths/physics, and engineering backgrounds. So take this opportunity to also engage with your peers who are knowledgeable in different disciplines. The programming experience of ANLP students also varies, with the majority (56%) having at least 4 months of experience. A similar percentage of you have experience with NLP that goes beyond media coverage, acquired via independent learning, university courses, or projects. As expected given your academic backgrounds, you claimed on average to be more familiar with concepts from machine learning (e.g., Bayes' theorem and backpropagation) than from linguistics (e.g., syntactic constituents and the distributional hypothesis).
We have also looked over the concerns people expressed in the intake form. While we can't address every concern that was raised, there were two that came up frequently:
- Skills with English and/or writing: Unfortunately, given the size of this course and the many other learning objectives, we can't offer much individual help with these issues, though we will try to provide some general writing guidance in assignment handouts. If you are worried about these issues, please look over the University pages on support for English and the Institute for Academic Development, which offers resources on academic writing as well as many other topics.
- Background knowledge/skills: Various students were worried about the aspects of the course that they are less familiar with (linguistics, programming, maths). Please remember that only a handful of students in the course have all of this background, and we certainly don't expect any previous experience with NLP, so almost all of you will need to stretch your thinking in some way. Your classmates are a great resource, which is one reason why we try to give you opportunities to meet other students, and why we pair you up for the labs and assignments. We can't guarantee that you will get someone from a different background to yourself, but we do try where possible. We also encourage you to find other students to study with, and to ask them or post to Piazza when you're confused. We've already seen some good examples of students helping other students there!