S1 Week 5 Workshop - Visualisation
In the S1 Week 5 workshop, we will explore the design and evaluation of effective data presentations. The goal of this workshop is twofold:
- To familiarise yourself with the principles and guidance used for data visualisation in the course. We will use these principles during the project in Semester 2 and the exam.
- To learn how to evaluate and critique (in a constructive way) visualisations prepared by others.
To this end, we will give you a chance to peek at some samples of students’ work taken from a now-retired coursework from 2020/21. (Note that this year's assessment structure is different, but the same principles still apply.) In previous years, students have found this exercise very valuable.
You'll get the most out of the workshop if you've completed the S1 Week 4 task, but you should still benefit from it if you haven't had time to do the preparation.
Workshop Plan
In the workshop session, you will need to do the following:
- Join a table, aiming to have no more than 5 people on each table.
- Decide who will be the "screen sharer", i.e. the person who shares their screen on the table's display and shows these instructions to the group.
- Review the following (you should find copies on your table):
- Read the summarised instructions (given below) that were given to the students for their coursework in 2020/21. Note that you're not supposed to answer these yourself, but make sure you understand what the students were asked to do.
- The screen sharer should open the following in separate windows/tabs:
- The three sample submissions. Each contains code, a visualisation, and an interpretation for each of the two questions below (copies of their visualisations should be on the table):
- The shared spreadsheet corresponding to your workshop session:
- When looking at the visualisations for each sample, zoom in or out in the PDF viewer so that you can see the whole page, from top to bottom, including the page number. The page number text should be legible at this magnification.
- Find out which group your table corresponds to - the tables are labelled "A" through to "F", anticlockwise from the door of AT5.04. In the spreadsheet, click on the tab at the bottom for the sheet corresponding to your group (e.g. "Group A" or "Group B").
- Together compare Question 2 (yes, Question 2 first) for each of Sample 1, Sample 2 and Sample 3, and agree how to mark the three samples.
- For each question, there are visualisation criteria corresponding to the visualisation principles. You should assess how well the visualisation in each question meets the principles on a scale of 0 (absent) to 4 (excellent). You should not need to zoom in or out from the whole-page view you set earlier to be able to read the visualisation. (We instruct our markers not to zoom in or out in order to be able to view the visualisations.)
- You should also assess the quality of the explanation and the readability of the code.
- You can also write short comments explaining why you awarded each mark. We will not be assessing your marking; the point is for you to learn what makes a visualisation, explanation or piece of code good.
- Make sure to add your marks to the appropriate row in your spreadsheet.
- Repeat the marking process you used for Question 2 for Question 1.
- When you're finished, you can look at the "All Results By Sample" in the shared spreadsheet to see how your marking compares with other groups.
- Reflect on:
- How similar or different were your marks to those of other groups, on each sample and each criterion?
- How difficult did you find it to agree on a mark?
- What practices would you adopt in your work?
- Do you think the overall marks are reasonable, compared to the University Common Marking Scheme?
- How would you design marking criteria if you were teaching data visualisation?
- If you have time, take a look at Sample 4 - this isn't a student's submission, but rather what we got when we ran the coursework instructions through GenAI (specifically ELM). What marks would you give it?
Instructions we gave the students (summarised)
Questions and data
We ask you to analyse and visualize a dataset collected from Japanese restaurants using AirREGI / Restaurant Board (air), a reservation control and cash register system similar to Square.
The data includes the ID of the restaurant, the date and time of visits and reservation-making, restaurant location in Japan, and restaurant type. You will find explanations of the files and their variables in the Data/Information_On_Files.txt file.
General instructions
- The criteria on which you will be judged include functional code, and the quality of the textual answers and visualisations asked for.
- For answers involving figures, make sure to clearly label your plots and provide legends where necessary. You will gain marks for clear visualizations.
Imports and data loading
import os
import sys
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import datetime
import matplotlib as mpl
import warnings
warnings.filterwarnings('ignore') # Ignore warnings in deprecated code - not generally good practice!
np.random.seed(42) # For reproducibility
Question 1: Restaurant genre and location
Use a scatterplot to visualize restaurants by location and type of restaurant (you might want to add a small amount of random noise to the restaurant location to reduce overlap in the plot). You can choose how to represent the restaurant location (e.g., longitude/latitude, distance from city centre, or other). Note there are many restaurant categories. You should collapse the different categories to 4 or 5 categories based on your best judgement (e.g., Asian, International, Bar and party, Cafe and sweets). What can you infer from the plot you created about the relationship between the restaurant categories and their location?
# Your code goes here:
Your text goes here:
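For readers unfamiliar with the two techniques Question 1 mentions (collapsing many categories into a few, and adding random noise or "jitter" to reduce overlap), the following is a minimal sketch, not a model answer. The column names, the genre labels, the grouping in GENRE_GROUPS, the jitter scale and the tiny table are all made up for illustration and are not taken from the coursework data.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping from detailed genres to broad groups (illustrative only).
GENRE_GROUPS = {
    "Izakaya": "Bar and party",
    "Bar/Cocktail": "Bar and party",
    "Cafe/Sweets": "Cafe and sweets",
    "Italian/French": "International",
    "Japanese food": "Asian",
}


def collapse_genres(genres, mapping, other="Other"):
    """Map detailed genre labels to broad groups; unmapped labels become `other`."""
    return genres.map(mapping).fillna(other)


def add_jitter(values, scale, rng):
    """Add zero-mean Gaussian noise so coincident points no longer overlap exactly."""
    return values + rng.normal(0.0, scale, size=len(values))


def plot_restaurants(df, ax=None):
    """Scatter jittered longitude/latitude, one colour per broad genre group."""
    import matplotlib.pyplot as plt  # imported here so the helpers above work without it

    if ax is None:
        _, ax = plt.subplots(figsize=(8, 6))
    for group, sub in df.groupby("genre_group"):
        ax.scatter(sub["lon_jit"], sub["lat_jit"], s=12, alpha=0.6, label=group)
    ax.set_xlabel("Longitude (degrees, jittered)")
    ax.set_ylabel("Latitude (degrees, jittered)")
    ax.set_title("Restaurants by location and genre group")
    ax.legend(title="Genre group")
    return ax


# Tiny made-up table standing in for the real restaurant data.
restaurants = pd.DataFrame({
    "genre": ["Izakaya", "Cafe/Sweets", "Italian/French", "Izakaya", "Karaoke"],
    "latitude": [35.66, 35.66, 34.69, 35.66, 34.69],
    "longitude": [139.75, 139.75, 135.50, 139.75, 135.50],
})

rng = np.random.default_rng(42)  # seeded so the jitter is reproducible
restaurants["genre_group"] = collapse_genres(restaurants["genre"], GENRE_GROUPS)
restaurants["lat_jit"] = add_jitter(restaurants["latitude"], 0.01, rng)
restaurants["lon_jit"] = add_jitter(restaurants["longitude"], 0.01, rng)
```

Calling plot_restaurants(restaurants) would then draw the figure. The jitter scale is a judgement call: too small and points still overlap, too large and the locations become misleading, which is one of the things worth looking for when marking the samples.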
Question 2: Food passion
We wish to determine which restaurant genre Japanese people are most passionate about. To this end we will analyse how much time people plan ahead before visiting a restaurant.
- Compute the time difference between reservation time and visit time.
- Compare the time differences among restaurant genres using a visualization of your choice. Tip: In order to avoid outliers, it might be best to choose an upper threshold for values of preparation time. You may want to use the genre categories used in question 1 to reduce clutter.
# Your code goes here:
Your text goes here:
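The two steps in Question 2 (computing the gap between reservation and visit, then capping it with an upper threshold) could be sketched roughly as below. This is an illustration under stated assumptions rather than a model answer: the column names, the 14-day threshold and the handful of rows are invented for the example.

```python
import pandas as pd


def reservation_lead_hours(visit_time, reserve_time):
    """Hours between making a reservation and the visit itself."""
    delta = pd.to_datetime(visit_time) - pd.to_datetime(reserve_time)
    return delta.dt.total_seconds() / 3600.0


def plot_lead_times(df, ax=None):
    """Box plot of lead time per genre group, with labelled axes."""
    import matplotlib.pyplot as plt  # imported here so the helper above works without it

    if ax is None:
        _, ax = plt.subplots(figsize=(8, 5))
    groups = sorted(df["genre_group"].unique())
    data = [df.loc[df["genre_group"] == g, "lead_hours"].to_numpy() for g in groups]
    ax.boxplot(data)
    ax.set_xticklabels(groups)
    ax.set_xlabel("Genre group")
    ax.set_ylabel("Time between reservation and visit (hours)")
    return ax


# Made-up reservations standing in for the real data.
reservations = pd.DataFrame({
    "genre_group": ["Asian", "Asian", "Bar and party", "Bar and party", "Cafe and sweets"],
    "reserve_datetime": [
        "2016-01-01 10:00:00", "2016-01-01 18:00:00", "2016-01-02 09:00:00",
        "2016-01-01 12:00:00", "2016-01-03 08:00:00",
    ],
    "visit_datetime": [
        "2016-01-01 19:00:00", "2016-01-03 18:00:00", "2016-01-02 21:00:00",
        "2016-03-01 12:00:00", "2016-01-03 12:00:00",
    ],
})

reservations["lead_hours"] = reservation_lead_hours(
    reservations["visit_datetime"], reservations["reserve_datetime"]
)

MAX_LEAD_HOURS = 24 * 14  # assumed upper threshold (two weeks); a judgement call
trimmed = reservations[reservations["lead_hours"] <= MAX_LEAD_HOURS]

# The median is less sensitive than the mean to any long lead times that remain.
median_lead = trimmed.groupby("genre_group")["lead_hours"].median()
```

Whatever threshold is chosen, it changes which reservations are compared, so a good interpretation would state the threshold and how many rows it removed.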