INF2-FDS: S1 Week 9 Workshop
We're looking forward to seeing you at the S1 Week 9 workshop. The goals of the task and workshop are:
- to develop your ability to interpret multiple regression analyses (Learning Outcome 3, assessed in the exam, and potentially Coursework 2, the Project)
- to develop your skills in reading and critiquing data-driven methods and claims from case studies, in order to identify and discuss the extent to which stated conclusions are warranted given the evidence provided (Learning Outcome 4, particularly assessed in the exam)
Before the workshop: Preparation task
A. Learn about reading academic papers
Reading academic papers is a crucial skill that you'll need later in this course, including for the project, and are likely to need throughout your degree programme, particularly for your 4th-year project. We suggest you spend about 20-30 minutes reading the following resources:
- 1-page infographic on Reading a research paper, part of the University of Edinburgh Institute of Academic Development's resources on reading
- How to read a paper by S. Keshav
- How to read a research paper by Michael Mitzenmacher gives very good advice on reading a paper critically.
The key ideas are:
- Read strategically - depending on your purpose, you don't always need to read all of a paper thoroughly
- Read critically, challenging assumptions, methods and findings in the paper
- Read in multiple passes, examining the paper in more detail at each pass
- On a first or second pass, if you encounter a concept you don't understand, make a note of it, and keep reading - hopefully the rest of the paper will still make some sense
- Before moving to the next pass, look up the concepts you've noted. Wikipedia is often a reliable source, but you may wish to check it against another source; Wikipedia articles usually cite references you can follow up.
B. Read the set paper for the workshop
After Wednesday's lecture, when you should have the background to understand most of the concepts:
- Do a first-pass read of Koloğlu et al. (2018, Multiple linear regression approach for estimating the market value of football players in forward position, arXiv 1807.01104).
- Read the title, abstract, and introduction carefully
- Read the section headings, but ignore everything else
- Read the conclusions
- Now try to answer the following questions. If you can't answer a question, don't worry; try again after the second-pass read.
- Question: What question does the paper address?
- Category: What type of paper is this? A paper collecting survey data, a randomised controlled trial, an analysis of an existing dataset?
- Context: What dataset(s) does it use? How were the datasets collected? Which methods were used to analyse the problem?
- Correctness: Do the assumptions appear to be valid?
- Contributions: What are the paper’s main contributions?
- Clarity: Is the paper well written?
- Now do a second-pass read, reading everything but skipping over concepts you don't understand; the paper uses some concepts we haven't covered in the lectures. Make a note of such concepts, and keep reading - hopefully the rest of the paper will still make some sense.
- Now try to answer:
- What are the features/predictors and the response variables used in the multiple regression?
- Identify any numerical or visual diagnostics that were reported for the fit, e.g. coefficient of determination, RMSE, residual plot. (See the Week 5 and Week 6 lectures for the diagnostics.)
- Identify any problems you see in the analysis, e.g. lurking variables, low adjusted coefficient of determination.
- Look up the concepts you've noted, checking Wikipedia against other sources as described above.
- Post your findings as comments on the Piazza post for this workshop.
- Having to look up unfamiliar concepts is common, even for professional data scientists and academics.
- If you've got this far, you might want to try writing a summary of the paper and a critique.
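To make the diagnostics mentioned above concrete (coefficient of determination, adjusted coefficient of determination, RMSE, residual plot), here is a minimal sketch of how they are computed for a multiple regression fit. It is not the method used in the paper: it fits ordinary least squares with NumPy on synthetic data with two made-up predictors, and the function name `regression_diagnostics` is hypothetical. Note that conventions differ slightly between texts, e.g. whether RMSE divides by n or by n - p - 1.

```python
import numpy as np

def regression_diagnostics(X, y):
    """Fit y on the columns of X by ordinary least squares (with an intercept)
    and return the coefficients plus common fit diagnostics.
    Hypothetical helper for illustration only."""
    n, p = X.shape
    # Prepend a column of ones so the model has an intercept term.
    X1 = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    fitted = X1 @ coef
    residuals = y - fitted
    ss_res = np.sum(residuals ** 2)          # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
    r2 = 1 - ss_res / ss_tot
    # Adjusted R^2 scales the unexplained variance by (n - 1) / (n - p - 1),
    # so unlike R^2 it can fall when a predictor adds little explanatory power.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    # RMSE here divides by n; some texts report the residual standard error,
    # which divides by n - p - 1 instead.
    rmse = np.sqrt(ss_res / n)
    return {"coef": coef, "r2": r2, "adj_r2": adj_r2, "rmse": rmse,
            "fitted": fitted, "residuals": residuals}

# Synthetic data only: two made-up predictors with known true coefficients.
rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

d = regression_diagnostics(X, y)
print("coefficients:", d["coef"])
print("R^2:", d["r2"], "adjusted R^2:", d["adj_r2"], "RMSE:", d["rmse"])
# A residual plot is a scatter of d["fitted"] (x-axis) against d["residuals"];
# with matplotlib this would be plt.scatter(d["fitted"], d["residuals"]).
```

When reading the paper, it can help to check which of these quantities are actually reported, and whether the reported values are consistent with the number of observations and predictors used.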
In the workshop
We'll ask you to work with your group to look at Koloğlu et al. (2018, Multiple linear regression approach for estimating the market value of football players in forward position arXiv 1807.01104) and:
- Identify what question or prediction problem the multiple regression was being used to address. What are the predictors/features/independent variables and what are the response/target/dependent variables?
- Identify any numerical or visual diagnostics that were reported for the fit, e.g. coefficient of determination, RMSE, residual plot. (See the Week 6 and Week 8 lectures, and the lecture notes, for the diagnostics.)
- Identify any problems you see in the analysis, e.g. lurking variables, low adjusted coefficient of determination.
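One of the problems listed above, a lurking variable, can be illustrated with a small simulation. This is a sketch on synthetic data, not a reanalysis of the paper: the variables `x`, `y` and `z` and the helper `ols_coef` are hypothetical. Here `z` drives both the predictor `x` and the response `y`, while `x` has no direct effect on `y`. Regressing `y` on `x` alone suggests a strong relationship; including `z` as a second predictor makes the coefficient on `x` shrink towards zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
# Hypothetical lurking variable z (synthetic, not from the paper).
z = rng.normal(size=n)
# x is driven by z; y is driven by z, NOT by x directly.
x = 2.0 * z + rng.normal(scale=0.5, size=n)
y = 3.0 * z + rng.normal(scale=0.5, size=n)

def ols_coef(columns, y):
    """Return OLS coefficients [intercept, b1, b2, ...] for y regressed on
    the given predictor columns. Hypothetical helper for illustration."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

coef_x_only = ols_coef([x], y)        # z omitted from the model
coef_x_and_z = ols_coef([x, z], y)    # z included as a predictor
print("slope on x, z omitted: ", coef_x_only[1])
print("slope on x, z included:", coef_x_and_z[1])
print("slope on z, z included:", coef_x_and_z[2])
```

When reading a paper you usually can't rerun the analysis, so the practical question is whether plausible lurking variables were measured and included as predictors, or at least discussed.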
We'll provide a sheet with answers after the Friday workshop.
After the workshop
Our take on the paper is in the document below - you may have spotted other things in your workshop group.