Analysing Data

Who is this course for? This course is for anyone who wants to improve their proficiency in analysing data.

Learning Objectives: Attendees will be more confident assessing data and be able to analyse and draw insights from even unfamiliar datasets.

Course Length and Price: 2 days, £2,900 ex VAT

Pre-requisites: None

Course Content:
Introduction to data
Nearly all the data we analyse is in tabular format. We look at the basics (tables, columns, rows and data types), then move on to cardinality, uniqueness, relationships, aggregation and grouping. We reinforce these concepts with some exercises.
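
As a rough illustration of cardinality, grouping and aggregation, here is a minimal sketch in plain Python. The sales table, its column names and the helper functions `cardinality` and `group_sum` are all made up for this example and are not course material.

```python
from collections import defaultdict

# Hypothetical sales table: each dict is a row, each key is a column.
rows = [
    {"region": "North", "product": "Tea", "units": 10},
    {"region": "North", "product": "Coffee", "units": 4},
    {"region": "South", "product": "Tea", "units": 7},
    {"region": "South", "product": "Tea", "units": 3},
]


def cardinality(table, column):
    """Count the distinct values that appear in a column."""
    return len({row[column] for row in table})


def group_sum(table, key, value):
    """Group rows by the `key` column and sum `value` within each group."""
    totals = defaultdict(int)
    for row in table:
        totals[row[key]] += row[value]
    return dict(totals)


region_cardinality = cardinality(rows, "region")
units_by_region = group_sum(rows, "region", "units")
print(region_cardinality)
print(units_by_region)
```

A column with low cardinality, such as region here, is usually a good candidate to group by; a column where every value is unique is more likely to be an identifier.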

Getting data into shape
When data is in a tidy format, it is easier to analyse and visualise. We explain what a tidy format is and how we can transform a dataset into a tidy shape – with operations like pivot, append, merge and split. Attendees will practise these techniques on example datasets.
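
One common reshaping step is turning a "wide" table, with one column per year, into a tidy one with one row per observation. The sketch below assumes a made-up table of sales by city; the `to_tidy` helper is hypothetical and simply shows the idea in plain Python.

```python
# Hypothetical wide table: one column per year.
wide = [
    {"city": "Leeds", "2022": 120, "2023": 135},
    {"city": "Bath", "2022": 80, "2023": 95},
]


def to_tidy(table, id_column, variable_name, value_name):
    """Unpivot every column except `id_column` into (variable, value) rows."""
    tidy_rows = []
    for row in table:
        for column, value in row.items():
            if column == id_column:
                continue
            tidy_rows.append(
                {id_column: row[id_column], variable_name: column, value_name: value}
            )
    return tidy_rows


tidy = to_tidy(wide, "city", "year", "sales")
for row in tidy:
    print(row)
```

In the tidy version each row holds one city, one year and one sales figure, which is the shape most charting and analysis tools expect.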

Basic Descriptive Statistics
We quickly remind ourselves of the statistics we learned at school: mean, median, mode, range … We look at why they can sometimes be useful.
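
To see these measures side by side, here is a small sketch using Python's built-in `statistics` module. The salary figures are invented for illustration; they include one extreme value to show how the mean and median can disagree.

```python
import statistics

# Hypothetical salaries in thousands of pounds, with one extreme value.
salaries = [24, 26, 26, 28, 31, 250]

mean = statistics.mean(salaries)
median = statistics.median(salaries)
mode = statistics.mode(salaries)
value_range = max(salaries) - min(salaries)

print("mean:", round(mean, 1))
print("median:", median)
print("mode:", mode)
print("range:", value_range)
```

The single large salary pulls the mean well above the median, which is one reason the median is often the better summary of a "typical" value in skewed data.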

Exploratory Visual Analysis
The quickest and often most effective way to understand our data is to visualise it. We practise making quick, rough-and-ready charts so we can uncover the patterns in our data: trends, outliers, correlations… We’ll build some bar, scatter, line and other charts in under 10 minutes and hopefully have that “Aha” moment when an interesting pattern in the data is revealed.

Data Quality
Data is often messy: incomplete, riddled with bad values and missing important data points. We see how to identify the problems in our data and discuss techniques for fixing them, as well as approaches to avoiding them in the first place. In an exercise, the instructors provide a few datasets that have some data quality issues, and the attendees find these as quickly as possible.
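
As an example of what such checks might look like, the sketch below scans a small, made-up set of records for missing values, implausible ages and duplicate identifiers. The records, the 0 to 120 age limit and the `find_issues` helper are assumptions for illustration only.

```python
# Hypothetical records with deliberate data quality problems.
records = [
    {"id": 1, "age": 34, "email": "a@example.com"},
    {"id": 2, "age": None, "email": "b@example.com"},
    {"id": 3, "age": 212, "email": ""},
    {"id": 1, "age": 34, "email": "a@example.com"},
]


def find_issues(table):
    """Return (row index, column, problem) tuples for some simple checks."""
    issues = []
    seen_ids = set()
    for index, row in enumerate(table):
        # Missing values: None or an empty string in any column.
        for column, value in row.items():
            if value is None or value == "":
                issues.append((index, column, "missing"))
        # Bad values: an age outside an assumed plausible range.
        age = row["age"]
        if age is not None and not 0 <= age <= 120:
            issues.append((index, "age", "out of range"))
        # Duplicates: an id that has already appeared.
        if row["id"] in seen_ids:
            issues.append((index, "id", "duplicate"))
        seen_ids.add(row["id"])
    return issues


issues = find_issues(records)
for issue in issues:
    print(issue)
```

Real datasets need checks tailored to their own columns and business rules, but the pattern of listing every problem with its row and column makes it easier to decide what to fix first.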

A survey of free (and nearly free) tools
This is a demo and discussion of some of the popular tools that we can use to understand and analyse data. These include:

  • the R and Python open source languages – using cloud notebooks
  • SQL editors (SQL is a language to get data from most large databases)
  • Power BI

A very gentle overview of Machine Learning and Predictive Analytics
This is a brief description of machine learning (ML), looking at techniques and giving a couple of examples of algorithms. There will be an exercise to see if we can teach an ML model, with a few examples, to tell the difference between apples and oranges, or between Chelsea and Arsenal footballers.
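
To give a flavour of learning from a few examples, here is a minimal sketch of a nearest-neighbour classifier in plain Python. This is not necessarily the technique used in the course exercise; the fruit measurements (weight in grams and a 1 to 10 "bumpiness" score) and the `predict` function are invented for illustration.

```python
import math

# Hypothetical training examples: (weight in grams, bumpiness 1-10) and a label.
training = [
    ((150, 2), "apple"),
    ((170, 3), "apple"),
    ((140, 8), "orange"),
    ((160, 9), "orange"),
]


def predict(features, examples):
    """Return the label of the training example closest to `features`."""
    nearest = min(examples, key=lambda example: math.dist(features, example[0]))
    return nearest[1]


smooth_fruit = predict((155, 1), training)
bumpy_fruit = predict((158, 8), training)
print(smooth_fruit)
print(bumpy_fruit)
```

The model simply copies the label of the most similar example it has seen. With only four examples, and features on very different scales, it is easy to fool, which is a useful starting point for discussing why real models need more data and careful preparation.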

Below are some images from the case studies on the course.

Power BI column chart of survey results
Analysing a qualitative dataset – in this case survey responses
A word cloud shows the most popular words in Robert Louis Stevenson’s novel Dr Jekyll and Mr Hyde