Week 6 - Working with Biological Databases
This week we will be practising working with biological databases in the computing lab session. In the lecture I will walk through and explain a series of Jupyter Notebooks that programmatically access data, carrying on from last week's introduction and including some solution code. These notebooks are well annotated and are there for you to use, modify and play with whenever you want to. Hopefully this will inspire you in your Python coding. As you will see this week, Python can empower you to perform large and complex studies that take advantage of the vast array of data resources and services available all over the world. We are keeping the reading light this week, as I know that a lot of people will be working hard on the coursework.
The notebooks are, as ever, available on our GitHub site - https://github.com/tisimpson/bioinformatics1/tree/main/labs/notebooks
Data Handling Using Pandas
There is a very popular Python library, which some of you may already have heard of, called 'Pandas' (Python Data Analysis Library), and we use it in all of the notebooks this week. It is a package that provides specific data structures, and functions that act upon them, to do a vast array of things that are commonly needed in data analysis. It has grown to such an extent that many books have been written about it, and the website itself is a very high quality resource that explains how to use it, with examples and reference material. If you are going to learn data science and use Python, then Pandas is the library that you are going to want to learn.
- Pandas - https://pandas.pydata.org/
- Pandas documentation - https://pandas.pydata.org/docs/
- and this 10 minute quick-start is very useful too - https://pandas.pydata.org/docs/user_guide/10min.html#min
O'Reilly have published a book, "Python for Data Analysis", that covers Pandas, NumPy and IPython (effectively notebooks), which you might find very useful to have as a reference book in the future.
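To give a flavour of what the Pandas data structures and functions look like in practice, here is a minimal sketch. It is not taken from the course notebooks: the table is a toy example, the column names are my own choice, and the expression values are invented purely for illustration (only the gene symbols and chromosomes are real). It shows a DataFrame being built, filtered and summarised by group.

```python
import pandas as pd

# Toy table: gene symbols and chromosomes are real,
# but the "expression" values are invented for illustration only.
genes = pd.DataFrame(
    {
        "symbol": ["BRCA1", "TP53", "BRCA2", "PAX6"],
        "chromosome": ["17", "17", "13", "11"],
        "expression": [12.5, 30.0, 8.0, 4.5],
    }
)

# Boolean indexing: keep only the rows for genes on chromosome 17.
chr17 = genes[genes["chromosome"] == "17"]

# Group the rows by chromosome and take the mean expression in each group.
mean_by_chrom = genes.groupby("chromosome")["expression"].mean()

print(chr17)
print(mean_by_chrom)
```

The same two ideas, selecting rows with a condition and summarising with `groupby`, carry over directly to the much larger tables that you will typically get back from biological databases.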
Lecture 6 - Working with Biological Databases
The lecture slides for Week 6 - "Exploring Biological Databases" are available here. This week I will be continuing the live demonstrations of database use on the web, as well as going through some solutions to the lab challenges in the lecture, so there are only a few slides. This also means that people can choose to catch up next week if they are pressed for time, with the coursework deadline falling just after the lecture this Friday.
The video of the lecture is available from the GitHub video area here.
Reading Lists & Resources
Each week we will have an accompanying reading list with some articles and websites for self-study to support the course. You can find the course "Resource List" - here. We will continue to curate the list throughout the course, especially when things come up in the lectures and practicals that we want to add a reference or link to, so please do check back in on the list from time to time.
We have generally tried to identify resources as "Essential", "Recommended" or "Further Reading" in an attempt to help you prioritise your reading during the course.
Finally, this is a good time to draw your attention to what you can consider the "core text" for the course, which is the excellent "Bioinformatics & Functional Genomics", Third Edition, by Jonathan Pevsner. You will be pleased to know that this textbook is available free online as part of the University's subscription portfolio. You can find it right at the top of the resource list. If you have any problems accessing or using any of the above, please do drop us a comment in the Discussion forum and we will try to get things resolved as soon as possible.
This week you should browse BFG for examples of some of these databases in use, especially the material about biological databases in Chapter 2, but you don't need to read the whole chapter. Other things you might like to browse are the guides provided by some of the resources themselves, such as:
- NCBI Training Tutorials - https://www.ncbi.nlm.nih.gov/guide/training-tutorials/ (very good)
- PubMed User Guide - https://pubmed.ncbi.nlm.nih.gov/help/ (very good)
- BioPortal Help - https://www.bioontology.org/wiki/BioPortal_Help
- BioGRID Help - https://wiki.thebiogrid.org
- Reactome User Guide - https://reactome.org/userguide (very good)
It will take more time than we have in the course for you to become comfortable with all of the resources. The aim here is to introduce you to them and give you some experience of using them (which we will do next week), so that in the future you will know where to start when looking for different kinds of data.