# Part 1 - Introduction to Python and the Jupyter Notebook

In this course we will make extensive use of Jupyter notebooks. Jupyter notebooks offer an interactive Python session inside a browser, with added functionality such as markdown support, widgets and much more. They are a great tool for data exploration, and are frequently used in data science.

To find out more:
*http://jupyter.org/*

This site has numerous examples and tutorials:
*https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks*

### Jupyter in the LEARN environment

We are currently running a Python3 Jupyter Notebook within the LEARN system. This means there is no need for you to set Jupyter up on your own system: you can do everything that you need for the upcoming lectures inside here. You may choose to install it locally for use during your coursework. If so, you will first need to [install Python](https://www.python.org/downloads/) and then visit the [Jupyter Install page](http://jupyter.org/install) and follow the instructions there.

### Notebook Basics

The notebook is organised in cells, each of which can hold markdown text (as this one) or python code. To execute the code in a cell, press the key combination ``Shift`` + ``Return`` or click the "Run" button in the control bar along the top of the Notebook.

You can try this in the cell below, where we assign a string to the variable ``a`` and then print it:

In [None]:
a = 'ATG'
print(a)

Once the cell above has been executed, the interpreter knows about ``a``, so we can start working with it in the Python way:

In [None]:
print(a+a)
print(a+a[::-1])
print(len(a))
print(len(a)**2)
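
For reference, here is a short annotated sketch of what each expression in the cell above evaluates to. It assumes ``a`` still holds ``'ATG'``; the variable names ``doubled``, ``mirrored``, ``length`` and ``squared`` are only used here for illustration.

```python
# Assumes the same value as in the cell above
a = 'ATG'

doubled = a + a           # '+' joins two strings end to end
mirrored = a + a[::-1]    # a[::-1] is a slice with step -1, i.e. the string reversed
length = len(a)           # number of characters in the string
squared = len(a) ** 2     # '**' is the power operator

print(doubled)    # ATGATG
print(mirrored)   # ATGGTA
print(length)     # 3
print(squared)    # 9
```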

### Further Python Basics
Using the same approach as we used for uploading this Python Notebook you can download some more notebooks that introduce a range of basic python concepts from the [Python-Lectures](https://github.com/rajathkmp/Python-Lectures/blob/master/01.ipynb) GitHub site.

## Biopython

*http://biopython.org/*

Biopython provides tools for the analysis of genomic and proteomic data. We will use it throughout the course, so make sure it runs on your computer.

First we need to install the Biopython package in your environment. **You only need to do this once**; it will remain in your account space throughout the course.

In [None]:
### Install the biopython package in your Noteable environment
%pip install biopython

import Bio

You can find an excellent [biopython cookbook](http://biopython.org/DIST/docs/tutorial/Tutorial.html) written by the biopython community in the resource list for this course that you can practice with to familiarise yourself with some of its functionality.

### Create a DNA sequence using BioPython

In [None]:
from Bio.Seq import Seq
my_seq = Seq("AGTACACTGGT")
print(my_seq)

### Basic operations on DNA sequences

In [None]:
#sequence length
print(len(my_seq),"nucleotides long")

#sequence %GC content
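#NB GC() has been removed from recent Biopython releases; gc_fraction() returns a fraction between 0 and 1
from Bio.SeqUtils import gc_fraction
gc_percent = 100 * gc_fraction(my_seq)

#simple print
print("%GC content = ",gc_percent,"%")

#printing to two decimal places
print("%GC content = "+'%4.2f' % gc_percent+"%")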
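
The ``'%4.2f' %`` formatting in the cell above can also be written with an f-string, which many people find easier to read. The sketch below is only an illustration: it uses a plain Python string (``seq_text``, a name made up for this example) in place of ``my_seq`` and counts the G and C bases by hand, so it runs without Biopython.

```python
# Plain-string stand-in for my_seq, so this runs without Biopython
seq_text = "AGTACACTGGT"

# Percentage of bases that are G or C
gc_percent = 100 * (seq_text.count("G") + seq_text.count("C")) / len(seq_text)

# f-string with a format specifier: .2f means two decimal places
message = f"%GC content = {gc_percent:.2f}%"
print(message)   # %GC content = 45.45%
```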

In [None]:
#original sequence
print("original sequence",my_seq)

#sequence slicing NB this displays nucleotides 2-5
print("indexing from 1->5",my_seq[1:5])

#the sequence is indexed from 0
print("indexing from 0->5",my_seq[0:5])

In [None]:
#complement of the sequence
print(my_seq.complement())

#reverse complement of the sequence
print(my_seq.reverse_complement())
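
To make the difference between the two methods concrete, the following sketch compares them on the same sequence. The names ``comp`` and ``rev_comp`` are just illustrative; the check at the end relies on ``Seq`` objects supporting slicing in the same way as Python strings.

```python
from Bio.Seq import Seq

my_seq = Seq("AGTACACTGGT")

# complement() swaps each base for its partner (A<->T, C<->G), keeping the order
comp = my_seq.complement()

# reverse_complement() does the same swap and also reverses the order
rev_comp = my_seq.reverse_complement()

print(comp)       # TCATGTGACCA
print(rev_comp)   # ACCAGTGTACT

# Reversing the complement by slicing gives the same result
print(comp[::-1] == rev_comp)   # True
```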

### Biopython contains useful meta-data

In [None]:
from Bio.Data import CodonTable
standard_table = CodonTable.unambiguous_dna_by_id[1]

print(standard_table)

In [None]:
#and STOP codons
print(standard_table.stop_codons)
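
Besides printing the whole table, you can look up individual codons. The sketch below is a small illustration using the ``forward_table`` and ``start_codons`` attributes of the same table object.

```python
from Bio.Data import CodonTable

standard_table = CodonTable.unambiguous_dna_by_id[1]

# forward_table is a dictionary mapping each coding codon to a one-letter amino acid
print(standard_table.forward_table["ATG"])   # M (methionine)

# Stop codons do not code for an amino acid, so they are not keys in forward_table
print("TAA" in standard_table.forward_table)   # False

# Codons that can start translation in this table
print(standard_table.start_codons)
```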

### Challenge 1 - Creating a Random Sequence
An **optional** challenge to get a bit of basic practice manipulating sequences in BioPython. I will post example answers on the course Discussion Board before next week's class.

Create a random DNA sequence of length 100 base pairs and print it out. HINT: you will need to use the Python ``random`` module.

### Challenge 2 - Creating Mutated Sequences

Another **optional** challenge. Again, I will post a possible answer on the course Discussion Board before next week's class. First I will help out by introducing some functions that you will probably want to use. There are many ways to do this, so there is no one correct answer!

Using the random sequence you created above, make 20 random mutations in it, replacing the original base at each chosen position with a random lower case one. Then print the original sequence with the mutated sequence below it.

HINT 1 - You will need to convert your random sequence into a ``MutableSeq`` object, see the BioPython cookbook for an example of it in use.

HINT 2 - You will want to select a random position in the sequence. You'll probably want to use the ``random.randrange()`` function for that.
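
As a minimal sketch of the two tools mentioned in the hints (not a solution to the challenge), the example below makes a single change to a short toy sequence. The names ``toy_seq`` and ``position`` are made up for this illustration.

```python
import random
from Bio.Seq import MutableSeq

# A short toy sequence (not the 100 bp sequence the challenge asks for)
toy_seq = MutableSeq("AAAAAAAAAA")

# randrange(n) returns a random integer from 0 to n-1, so it is always a valid index
position = random.randrange(len(toy_seq))

# Unlike a Seq, a MutableSeq lets you assign to a single position
toy_seq[position] = "g"

print(position)
print(toy_seq)
```

The output differs from run to run because the position is chosen at random.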

Your final output should look like this:

```
GTAAGCGCGTTGGGTTTGAAAGCCCACCGCAAAATGAAGCTCTAAGCAAACTGGGATAAATTGGCGACCCCGCACTGTTAGGACCGAAAGGTTTGTGACA
cTgcGCcCGTTGGGTTTGcAcGCCCACtGtAAAATGAAGttCTAAGCAAACTGGGATAAATTGtCGtCtCCGCACTGTTgGGACCGgAAGGTTTGtGcCA
```

## Matplotlib

This is essentially a library for data plotting, and it is useful in combination with ``numpy``. Try this:

In [None]:
import matplotlib.pyplot as plt
import numpy as np

In [None]:
plt.plot(np.random.rand(20),np.random.rand(20),'o')
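
A slightly fuller version of the same plot is sketched below, with axis labels and a title. The seed value, labels and title are arbitrary choices for this illustration. In a notebook the figure normally appears under the cell once it has run.

```python
import matplotlib.pyplot as plt
import numpy as np

# A seeded generator makes the "random" points the same on every run
rng = np.random.default_rng(0)
x = rng.random(20)
y = rng.random(20)

# Create a figure with a single set of axes and draw on it
fig, ax = plt.subplots()
ax.plot(x, y, 'o')          # 'o' draws circle markers with no connecting line
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("20 random points")
```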

### Python help system

Python libraries and functions are well documented. To see the help for a function from a cell, type ``?function`` and execute the cell. This will show extended documentation at the bottom of the screen.

Execute this cell to try:

In [None]:
?plt.plot
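
The ``?`` shortcut is a feature of Jupyter (IPython) rather than of Python itself. As a small sketch of the equivalent in plain Python, the built-in ``help()`` function prints the same kind of documentation; ``len`` is used here only as an example.

```python
# help() is built into Python, so it also works outside Jupyter
help(len)

# The same text is stored on the function as its docstring
print(len.__doc__)
```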

### Shutting Down the Notebook

To shut down the Notebook, choose "File->Close and Halt" from the menu at the top of the Notebook.

### Questions?

If you have any questions, feel free to post them on the Discussion Forum in the "Tutorial Information & Discussion" Channel.