{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data-driven Business and Behaviour Analytics (DBBA)\n", "# Lab session: 0 - Introduction and Python preliminaries\n", "\n", "## First steps: Anaconda\n", "\n", "The programming language for this course will be Python. This first notebook illustrates how to install Python (through the Anaconda platform) and how to take your first steps with it. To install Anaconda, please follow the instructions [here](https://docs.anaconda.com/anaconda/install/).\n", "\n", "Installing Anaconda provides you with a recent version of Python, together with hundreds of packages useful for data science and scientific computing.\n", "\n", "During the course we will learn how to use some of these packages. In this tutorial, we first learn some of the basic features of the Python programming language." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What is a Jupyter Notebook?\n", "\n", "Jupyter Notebook is a web application for interactive data analysis with Python (and other languages). A notebook document contains everything visible in the web application: the inputs and outputs of the computations, explanatory text, mathematics, images, and rich media representations of objects.\n", "\n", "The code is meant to be executed, while the accompanying text generally is not. In this way, a notebook file can serve as a complete computational record of an interactive session, interleaving executable code with explanatory text, mathematics, and rich representations of the resulting objects." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Launching a Jupyter notebook\n", "\n", "You can start a notebook server from a terminal (on Windows, open the Anaconda Prompt from the Start menu) with the following command:\n", "\n", "```bash\n", "jupyter-notebook\n", "```\n", "\n", "When you launch Jupyter, your browser shows the files in your current working directory. You can navigate through folders by clicking on them, or add a file from elsewhere on your computer with the \"`Upload`\" button in the upper right corner. Click \"`New`\" in the upper right corner to create a new Jupyter notebook. A file you select opens in a new browser tab, formatted and ready to edit." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Inside the notebook: Cells\n", "\n", "A Jupyter notebook consists of many **cells**. The two main types of cells you will use are **code cells** and **markdown cells**; we will describe their properties in more depth shortly. First, an overview.\n", "\n", "A code cell contains actual code that you want to run. You can specify a cell as a code cell using the pulldown menu in the toolbar of your Jupyter notebook. Alternatively, you can hit `esc` and then `y` (denoted \"`esc, y`\") while a cell is selected to turn it into a code cell. Note that you will have to hit enter afterwards to start editing it.\n", "\n", "To execute the code in a code cell, hit \"`shift + enter`\". Note that code cells are executed in the order in which you run them, not in the order in which they appear in the notebook: the sequence of cells on which you hit \"`shift + enter`\" is the order in which the code is executed. If you did not explicitly execute a cell earlier in the document, its results are not known to the Python interpreter.\n", "\n", "Markdown cells usually contain text. The text is written in **markdown**, a lightweight markup language. You can read about its syntax [here](http://daringfireball.net/projects/markdown/syntax). Note that you can also insert HTML into markdown cells, and it will be rendered properly. While you are typing, a markdown cell shows its raw source; hitting \"`shift + enter`\" renders the text in the formatting you specify.\n", "\n", "You can specify a cell as a markdown cell in the Jupyter toolbar, or by hitting \"`esc, m`\" in the cell. Again, you have to hit enter after using the shortcut to bring the cell into edit mode.\n", "\n", "In general, when you want to add a new cell, you can use the \"Insert\" pulldown menu in the Jupyter toolbar. The shortcut to insert a cell below is \"`esc, b`\" and to insert a cell above is \"`esc, a`\". Alternatively, hitting \"`alt + enter`\" executes the current cell and automatically adds a new one below it. Note that, in a code cell, only the value of the last line (if it is an expression) is displayed automatically as output; use `print()` to display other values." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Python: the basics\n", "\n", "Let's move to a brief explanation of how Python works: which datatypes it can use, which operations it can perform, etc.\n", "\n", "Starting from the type of data you can store, Python offers several datatypes for your computations." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "int" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(7)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "float" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(7.5)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "str" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type('seven')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Basic operations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can perform many different operations on these datatypes." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "12.2" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "7 + 5.2 # an int plus a float: which datatype will the result be?" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "12" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "int(7 + 5.2) # int() converts to an integer, dropping the decimal part; str() and float() convert to the other types" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'hello world'" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'hello' + ' world' # string concatenation: do not forget the whitespace, otherwise the two strings will be adjacent" 
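] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The conversion functions `str()` and `float()` work like `int()`. A minimal sketch with purely illustrative values converts between the three basic datatypes; note that `int()` drops the decimal part of a float instead of rounding it." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Converting between the basic datatypes (illustrative values)\n", "price = float('7.5')        # str -> float\n", "label = str(7) + ' days'    # int -> str, so that it can be joined to another string\n", "whole = int(price)          # float -> int: the decimal part is dropped, not rounded\n", "print(price, label, whole)  # prints: 7.5 7 days 7" 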
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'welcome to DBBA welcome to DBBA welcome to DBBA '" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "3*'welcome to DBBA ' # multiplying a string by an integer repeats it (string repetition)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "String repetition can save you some typing, but there is an even better way to make your code reusable: variables." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "var = 'welcome to the course ' # with a variable, you can change a value in one place even after writing the code that uses it" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'welcome to the course welcome to the course welcome to the course welcome to the course '" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var*4 # string repetition again, this time through the variable" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "var = 5" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "20" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var*4 # the same code now performs an integer multiplication" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data containers\n", "\n", "On top of the elementary datatypes, Python offers compound data types that collect as many elements as you want. Three useful ones are lists, tuples and dictionaries.\n", "\n", "#### Lists\n", "\n", "A list is a mutable (i.e. modifiable after creation; not all containers are) ordered container of elements of any type. Its elements are separated by commas and enclosed in square brackets. Lists come with several built-in methods that we can use directly." 
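] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since a list is mutable, its content can be changed in place after creation. A short sketch on a hypothetical `numbers` list reassigns an element and then uses the `insert`, `remove` and `sort` list methods:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "numbers = [3, 1, 2]   # a hypothetical list of integers\n", "numbers[0] = 10       # reassign an element in place -> [10, 1, 2]\n", "numbers.insert(1, 5)  # insert 5 at index 1 -> [10, 5, 1, 2]\n", "numbers.remove(2)     # remove the first occurrence of the value 2 -> [10, 5, 1]\n", "numbers.sort()        # sort in place; all elements are ints, so they are comparable\n", "print(numbers)        # prints: [1, 5, 10]" 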
] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "test_list = [2020, 2021, 'quarantine']" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2021" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test_list[1] # use square brackets to get an element of the list (remember: indices start from zero!)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[2020, 2021]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test_list[0:2] # slicing: elements from index 0 up to, but not including, index 2" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "test_list.append('normality') # add an element at the end of the list (to insert an element\n", "                              # at a given position, use test_list.insert(index, elem))" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "del test_list[0] # delete an element by its index; to delete an element by value, use\n", "                 # test_list.remove(elem)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[2021, 'normality', 'quarantine']" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# sort() compares the elements with '<', so they must be mutually comparable: here test_list.sort()\n", "# would raise TypeError, because an int cannot be compared with a str.\n", "# With key=fun, each element is first passed to fun, and the list is sorted by the returned values.\n", "test_list.sort(key=str) # sort by the string representation of each element\n", "test_list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Tuples\n", "\n", "Unlike lists, a tuple is an immutable ordered sequence of values, i.e. it cannot be changed after it has been created. Tuples are enclosed in parentheses." 
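] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because a tuple cannot be modified, obtaining a changed version means building a new object. A minimal sketch with an illustrative `point` tuple shows two options: create a new tuple from the old values, or convert it to a list, modify the list and convert it back." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "point = (4, 5)                    # an illustrative tuple\n", "moved = (point[0] + 1, point[1])  # a new tuple; point itself is unchanged\n", "as_list = list(point)             # a mutable copy of the values\n", "as_list[0] = 0\n", "changed = tuple(as_list)          # back to an immutable tuple\n", "print(point, moved, changed)      # prints: (4, 5) (5, 5) (0, 5)" 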
] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "test_tuple = (4, 5, 'hello')" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test_tuple[0] # you can read the elements of a tuple by index, as with lists" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "'tuple' object does not support item assignment", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mtest_tuple\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m2\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;34m'anything'\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;31mTypeError\u001b[0m: 'tuple' object does not support item assignment" ] } ], "source": [ "test_tuple[2] = 'anything'" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "#### Dictionaries\n", "\n", "Dictionaries are data containers (enclosed in curly brackets) that store data as (key, value) pairs. Elements are accessed by key rather than by integer position, so indices are not limited to integer values: keys can be strings, numbers, tuples or any other immutable (hashable) object." 
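] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As an illustration of keys that are not integers, here is a small sketch with a hypothetical `prices` dictionary indexed by product name. It also uses the `in` operator, which checks whether a key is present, and `len()`, which counts the (key, value) pairs." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "prices = {'apple': 0.5, 'bread': 2.0}  # hypothetical data: product name -> price\n", "prices['milk'] = 1.2                   # add a new (key, value) pair\n", "prices['apple'] = 0.6                  # assigning to an existing key replaces its value\n", "print('apple' in prices)               # True: 'in' looks at the keys\n", "print('cheese' in prices)              # False\n", "print(len(prices))                     # 3" 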
] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "d = {'cat': 'cute', 'dog': 'furry'} # Create a new dictionary with some data" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "cute\n" ] } ], "source": [ "print(d['cat']) # Get an entry from a dictionary; prints \"cute\"" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "d['fish'] = 'wet' # Set a new entry in a dictionary" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "del d['fish'] # Remove an element from a dictionary" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'furry'" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d.get('dog', 'N/A') # Get an element with a default: returns 'furry' because 'dog' is a key; d.get('monkey', 'N/A') would return 'N/A'" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['cat', 'dog'])" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d.keys()" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_values(['cute', 'furry'])" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d.values()" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_items([('cat', 'cute'), ('dog', 'furry')])" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d.items()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Loops\n", "\n", "Every programming language provides constructs to repeat the same task many times without writing it over and over: loops. Here we present the while loop and the for loop." 
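] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Two common loop patterns, sketched with illustrative names: a `for` loop over `range(n)`, which yields the integers 0, 1, ..., n-1 and often replaces a counter-based `while` loop, and a `for` loop that unpacks the (key, value) pairs returned by a dictionary's `items()` method." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sum of the integers 0, 1, ..., 9 with a for loop over range()\n", "total = 0\n", "for i in range(10):\n", "    total += i\n", "print(total)  # prints: 45\n", "\n", "# Unpack each (key, value) pair returned by items()\n", "pets = {'cat': 'cute', 'dog': 'furry'}  # illustrative dictionary\n", "for animal, adjective in pets.items():\n", "    print(animal, 'is', adjective)      # prints: cat is cute, then: dog is furry" 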
] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Inside the while loop\n", "Inside the while loop\n", "Inside the while loop\n", "Inside the while loop\n", "Inside the while loop\n", "Inside the while loop\n", "Inside the while loop\n", "Inside the while loop\n", "Inside the while loop\n", "Inside the while loop\n", "Finally outside!\n" ] } ], "source": [ "i = 0\n", "while i < 10:  # repeat the indented block as long as the condition is true\n", "    i += 1\n", "    print('Inside the while loop')\n", "print('Finally outside!')" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "cute\n", "furry\n" ] } ], "source": [ "for i in d.values():  # the for loop visits each element of an iterable in turn\n", "    print(i)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### List comprehension\n", "\n", "List comprehensions provide a concise way to create lists. Common applications are to make new lists where each element is the result of some operations applied to each member of another sequence or iterable, or to create a subsequence of those elements that satisfy a certain condition." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "squares = [x**2 for x in range(10)]  # the squares of 0, 1, ..., 9\n", "squares" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Functions\n", "\n", "A function is a reusable block of code that performs a specific task. Functions receive inputs, apply some code to them and return outputs (results). Python functions are defined using the `def` keyword. For example:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "negative\n", "zero\n", "positive\n" ] } ], "source": [ "def sign(x):\n", "    if x > 0:\n", "        return 'positive'\n", "    elif x < 0:\n", "        return 'negative'\n", "    else:\n", "        return 'zero'\n", "\n", "for x in [-1, 0, 1]:\n", "    print(sign(x))\n", "# Prints \"negative\", \"zero\", \"positive\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will often define functions that take optional keyword arguments with a default value, like this:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello, Bob\n", "HELLO, FRED!\n" ] } ], "source": [ "def hello(name, loud=False):\n", "    if loud:\n", "        print('HELLO, %s!' % name.upper())\n", "    else:\n", "        print('Hello, %s' % name)\n", "\n", "hello('Bob') # Prints \"Hello, Bob\"\n", "hello('Fred', loud=True) # Prints \"HELLO, FRED!\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Classes\n", "\n", "A class is a structure in Python that can be used as a blueprint to create objects that have\n", "\n", " - \"attributes\": variables that store the features (the state) of each object;\n", " - \"methods\": functions that can be applied to an object created from the class, i.e. to an instance of that class.\n", "\n", "We want to define a class called `Client` in which a new instance stores a client's name, balance, and account level. It will take the format of:\n", "\n", "    class Client(object):\n", "        def __init__(self, arg1, arg2):\n", "            ...  # more code\n", "\n", "`def __init__` is the special method that defines how a new instance of the class is created.\n", "\n", "The arguments of `__init__`, except for `self`, are the inputs required when creating a new instance of this class; `self` refers to the instance being created and is passed automatically by Python.\n" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "# create the Client class below\n", "class Client(object):\n", "    def __init__(self, name, balance):\n", "        self.name = name\n", "        self.balance = balance + 100  # the initial balance is increased by 100\n", "\n", "        # define the account level based on the balance\n", "        if self.balance < 5000:\n", "            self.level = \"Basic\"\n", "        elif self.balance < 15000:\n", "            self.level = \"Intermediate\"\n", "        else:\n", "            self.level = \"Advanced\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Methods\n", "\n", "Methods are functions that can be applied only to instances of your class.\n", "\n", "For example, in the case of our `Client` class, we may want to update a client's balance whenever they withdraw or deposit money. Let's create these methods below.\n", "\n", "Note that each method takes `self` as its first argument, followed by the arguments required when calling the method." 
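] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A minimal sketch of how `self` works, using a hypothetical `Wallet` class that is not part of the lab: when a method is called on an instance, Python passes that instance as the first argument, so `w.spend(20)` is equivalent to `Wallet.spend(w, 20)`." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class Wallet(object):        # hypothetical class, for illustration only\n", "    def __init__(self, amount):\n", "        self.amount = amount\n", "\n", "    def spend(self, cost):   # 'self' is the instance the method is called on\n", "        self.amount -= cost\n", "        return self.amount\n", "\n", "w = Wallet(50)\n", "w.spend(20)                  # Python passes w as 'self' automatically -> 30\n", "Wallet.spend(w, 10)          # equivalent explicit call -> 20\n", "print(w.amount)              # prints: 20" 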
] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "# Use the Client class code above and add methods for depositing and withdrawing money\n", "\n", "# create the Client class below\n", "class Client(object):\n", "    def __init__(self, name, balance):\n", "        self.name = name\n", "        self.balance = balance + 100  # the initial balance is increased by 100\n", "\n", "        # define the account level based on the balance\n", "        if self.balance < 5000:\n", "            self.level = \"Basic\"\n", "        elif self.balance < 15000:\n", "            self.level = \"Intermediate\"\n", "        else:\n", "            self.level = \"Advanced\"\n", "\n", "    def deposit(self, amount):\n", "        self.balance += amount\n", "        return self.balance\n", "\n", "    def withdraw(self, amount):\n", "        if amount > self.balance:\n", "            raise RuntimeError(\"Insufficient funds for withdrawal\")\n", "        else:\n", "            self.balance -= amount\n", "            return self.balance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Inheritance\n", "\n", "A 'child' class can be created from a 'parent' class: the child inherits the attributes and methods of its parent, and new features can be added to it as well.\n", "\n", "This is useful when you want to create multiple classes that share some features: you define the shared features once in a parent class, and each child class inherits them.\n", "\n", "Imagine we want to create different types of clients that still have all the base attributes and methods currently found in `Client`.\n", "\n", "For example, let's create a class called `Savings` that inherits from the `Client` class. In doing so, we do not need to write another `__init__` method, as `Savings` inherits it from its parent." 
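] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a self-contained sketch of the same idea, consider two hypothetical classes, `Account` and `SavingsAccount` (they are not part of the lab): the child class defines no `__init__` of its own, so it reuses its parent's constructor and methods, and it adds a new method." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class Account(object):               # hypothetical parent class\n", "    def __init__(self, owner, balance):\n", "        self.owner = owner\n", "        self.balance = balance\n", "\n", "    def deposit(self, amount):\n", "        self.balance += amount\n", "        return self.balance\n", "\n", "class SavingsAccount(Account):       # hypothetical child class\n", "    interest_rate = 0.01             # class attribute, shared by all instances\n", "\n", "    def add_interest(self):          # new method, available only on SavingsAccount\n", "        self.balance += self.balance * self.interest_rate\n", "        return self.balance\n", "\n", "acc = SavingsAccount('Ada', 1000)    # uses the __init__ inherited from Account\n", "acc.deposit(100)                     # inherited method: the balance is now 1100\n", "acc.add_interest()                   # new method: the balance becomes 1111.0\n", "print(isinstance(acc, Account))      # True: a SavingsAccount is also an Account" 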
] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "# create the Savings class below\n", "class Savings(Client):\n", " interest_rate = 0.005\n", " \n", " def update_balance(self):\n", " self.balance += self.balance*self.interest_rate\n", " return self.balance" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 4 }