Course materials for introductory data science
Welcome to the course materials for BTW2401 - Data Science and Visualisation!
Here you can find the course materials for the Data Science course taught at the Bern University of Applied Sciences by Lewis Tunstall and Leandro von Werra in 2020.
Data science is a big deal in business these days, and for good reason. Among the reasons (Loukides, 2010):
In essence, data science is about the extraction of useful information and knowledge from large volumes of data, in order to improve business decision-making (Provost & Fawcett, 2013). With improved decision making comes improved productivity, market value, and competitive edge. Thus the main goal of data science is to enable data-driven decision making across the whole company, where decisions are based on the analysis of data rather than pure intuition.
There are many challenges involved with making data science an effective component of a business, but for those that succeed (Google, Amazon, Facebook, LinkedIn) the rewards are immense.
This module provides students with a hands-on introduction to the methods of data science, with an emphasis on applying these methods to solve business problems. By the end of this course, it is expected that students will:
Throughout the course, students will also have the opportunity to learn several technical skills:
The module structure consists of classroom sessions, self-study tasks, and group work. The classroom sessions will involve a mix of theoretical and practical (programming) work, with an emphasis on the latter. The self-study tasks will be used to read selected chapters from the module textbook. The outline for the module is shown in the table below.
Note: To download the notebooks, right-click on the badge, select "Save Link As...", and make sure to save each file with the .ipynb extension.
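Some browsers may add an extra extension when saving, for example turning `lesson01.ipynb` into `lesson01.ipynb.txt`. The Python sketch below shows one way to repair such a file. The helper name `ensure_ipynb` is hypothetical and not part of the course materials. It assumes the download is a notebook in nbformat 4, which is stored as JSON with top-level `cells` and `nbformat` keys, and it refuses to rename anything that does not look like one.

```python
import json
from pathlib import Path


def ensure_ipynb(path):
    """Rename a downloaded notebook so that it ends in .ipynb (hypothetical helper).

    Raises ValueError (json.JSONDecodeError is a subclass) if the file
    does not look like a Jupyter notebook.
    """
    path = Path(path)
    # A notebook file is JSON with top-level "cells" and "nbformat" keys.
    with path.open(encoding="utf-8") as f:
        content = json.load(f)
    if not isinstance(content, dict) or "cells" not in content or "nbformat" not in content:
        raise ValueError(f"{path} does not look like a Jupyter notebook")
    if path.suffix == ".ipynb":
        return path  # already correctly named
    # path.stem drops only the last suffix, e.g. "lesson01.ipynb.txt" -> "lesson01.ipynb".
    stem = path.stem
    if not stem.endswith(".ipynb"):
        stem += ".ipynb"  # e.g. "lesson01.json" -> "lesson01.ipynb"
    target = path.with_name(stem)
    path.rename(target)
    return target
```

For example, `ensure_ipynb("lesson01.ipynb.txt")` returns the path `lesson01.ipynb`. Be aware that on macOS and Linux the rename silently overwrites an existing file with the target name.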
We will be teaching most of the class via Jupyter notebooks in Python. The notebooks for each class will be made available from links on this website. Note that some parts of the notebooks are removed for you to fill in as you follow along in class. You can open and run them directly on Binder by clicking on the Binder badge (see example below) at the top of each lecture notebook. We highly encourage the use of Binder, since it requires no local installation and runs for free.
Binder badge:
A few remarks about Binder:
Since Binder does not save your changes permanently, you should download the notebooks you worked on at the end of your session. If you want to continue your session later on, you can re-upload them to Binder. See the image below for instructions on how to upload and download files.
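If you worked on several notebooks, downloading them one at a time can be tedious. A minimal sketch, assuming you run it in a notebook cell inside your Binder session: the standard-library function `shutil.make_archive` bundles every `.ipynb` file under a directory into a single zip file, which you can then download in one go from the file browser. The helper `bundle_notebooks` and its default archive name are hypothetical choices, not part of the course materials.

```python
import shutil
import tempfile
from pathlib import Path


def bundle_notebooks(source_dir=".", archive_name="my_notebooks"):
    """Zip every .ipynb file under source_dir into <archive_name>.zip (hypothetical helper).

    Returns the path of the created archive as a string.
    """
    source_dir = Path(source_dir).resolve()
    with tempfile.TemporaryDirectory() as tmp:
        staging = Path(tmp)
        # Copy only the notebooks into a staging folder, keeping subfolders.
        for nb in source_dir.rglob("*.ipynb"):
            if ".ipynb_checkpoints" in nb.parts:
                continue  # skip Jupyter's automatic checkpoint copies
            dest = staging / nb.relative_to(source_dir)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(nb, dest)
        # make_archive appends ".zip" to the base name and writes the archive
        # next to the notebooks, outside the temporary staging folder.
        return shutil.make_archive(str(source_dir / archive_name), "zip", root_dir=staging)
```

Calling `bundle_notebooks()` in the notebook's working directory creates `my_notebooks.zip` there. Right-click it in the JupyterLab file browser to download it.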
You can also install the data science lecture material locally on your laptop. In general, when working with Python it is recommended to use virtual environments. This ensures that the packages you install for this course don't interfere with packages you have already installed for other projects.
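A minimal sketch of creating such an environment with the standard-library `venv` module. Most people run the equivalent shell command `python -m venv dslectures-env` instead; the directory name `dslectures-env` and the helper `create_course_env` are arbitrary, hypothetical choices.

```python
import sys
import venv
from pathlib import Path


def create_course_env(env_dir="dslectures-env", with_pip=True):
    """Create a virtual environment and return the path to its Python interpreter (hypothetical helper)."""
    env_dir = Path(env_dir)
    # with_pip=True bootstraps pip into the environment via ensurepip;
    # clear=False leaves an existing environment in place instead of wiping it.
    venv.create(env_dir, with_pip=with_pip, clear=False)
    # The interpreter lives in Scripts\ on Windows and in bin/ elsewhere.
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"
```

After creating the environment, activate it before running the `pip install` commands in this section: `source dslectures-env/bin/activate` on macOS and Linux, or `dslectures-env\Scripts\activate` on Windows.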
To install the data science lectures library, run the following command:
pip install dslectures
To install JupyterLab (a more advanced environment than the classic Jupyter Notebook), run:
pip install jupyterlab
Next, download the course materials from the GitHub repository (or, if you already have a copy, just the notebooks you are missing). In general, you will need to copy the full set of materials rather than the notebooks alone, since some resources, such as images, are not embedded in the notebooks.
Finally, to start JupyterLab, run:
jupyter-lab
Since we are developing the materials throughout the course, you will need to update your local environment every time we move on to the next lesson. To do so, run the following command before you start JupyterLab:
pip install --upgrade dslectures
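To check which version of the course library is installed before and after upgrading, you can query the installed packages with the standard-library `importlib.metadata` module (Python 3.8 and later). This is a sketch: the helper name `installed_version` is hypothetical, and it returns `None` instead of raising an error when a package is missing.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the versions of the packages installed in this section.
for pkg in ("dslectures", "jupyterlab"):
    print(pkg, installed_version(pkg) or "not installed")
```

From the command line, `pip show dslectures` reports the same information.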
Before class
We expect students to prepare for each class by completing the self-study tasks in advance.
During class
Please refrain from using your phone or social media (Facebook, Twitter, etc.) during class; we do, however, encourage you to use the class Slack group! We also expect students to take an active role during the class by asking questions and contributing to the discussions.
Academic integrity and honesty
Your answers to homework, quizzes, and exams must be your own work (except for the group project, which is a collaborative task). Please don’t cheat or plagiarise other people’s work.
Update: Due to the COVID-19 pandemic, we have decided to remove the mid-term test from the assessment and reallocate its 10 points to the group project.
The assessment for this module involves two parts whose sum is 100 points: one group project (50 points), and one final written exam (50 points). The total number of points determines the overall grade for the module.
Data Science
Provost & Fawcett will be used as the primary textbook for the module and is the standard data science text for business programs at over 150 universities around the world. The book by VanderPlas is an excellent reference for the Python programming aspects of the module.
Python Programming
We will use the Python programming language to analyse and visualise a variety of datasets in this module. McKinney’s book is an excellent reference to have at hand and covers the nuts and bolts of the NumPy and pandas packages.
Machine Learning
Although machine learning is not the only focus of this course, Géron’s book or fast.ai’s MOOC provide the right level of technical detail to gain a deeper understanding of this exciting field.
Kaggle Learn
Kaggle Learn is a great resource to brush up on concepts like Python basics, data visualisation or pandas in an online notebook environment (similar to Binder).