Machine Learning

Machine Learning (ML) is a fundamental discipline within Artificial Intelligence (AI). ML allows us to “train” computers to perform tasks, such as classifying images into categories, without explicitly programming the rules. Such tasks are often impractical to program by hand in any case.

This is a very practical course where students will build and test ML models in guided tutorials based on publicly available datasets.  The course starts with simple models and gradually progresses to more sophisticated and powerful ones.

Duration: 2 days (short version) or 3 days (full version)

Pre-requisites:

Much of the course consists of guided exercises in which students write Python code. Students must know Python to at least the level of the Python Foundation course, and preferably to the level of the Intermediate course. We briefly recap the Python libraries from the Intermediate course that are needed here, e.g. numpy.

Contents

Introduction to ML

This covers the basic concepts in ML and includes:

  • examples of real world ML applications;
  • terms used in ML, e.g. datasets, features and labels, and training, testing and evaluating models;
  • supervised vs unsupervised approaches;
  • regression vs classification techniques.

Prepare data for ML

  • What sort of data do we need to build a model? Data collection, size, quality…
  • Prepare data for use in ML models (feature engineering), e.g. improve data quality and convert categorical text variables to numbers (one-hot encoding)
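
To make the one-hot encoding idea concrete, here is a minimal, library-free sketch in plain Python. The function name one_hot_encode and the colour values are made up for illustration; in practice a library such as scikit-learn would normally handle this step.

```python
def one_hot_encode(values):
    """Return (categories, rows); each row is a 0/1 list with one slot per category."""
    categories = sorted(set(values))  # sorted so the column order is reproducible
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the single column matching this value
        rows.append(row)
    return categories, rows


# Hypothetical text feature, invented for this example
colours = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colours)

print(categories)
for colour, row in zip(colours, encoded):
    print(colour, row)
```

Each text value becomes a row of zeros with a single one, so the model receives numbers without any implied ordering between the categories.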

ML algorithm families – a quick tour

The first step in building a model is to choose the algorithm that we will use. There are several families of these algorithms including:

  • Linear methods
  • Decision trees
  • K-nearest neighbours (KNN)
  • Deep learning (neural nets)

We will have a quick look at each of these, with a short Python script to demonstrate their use.
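
As an illustration of one of these families, the following is a minimal sketch of a k-nearest neighbours (KNN) classifier in plain Python. It is not the script used in the course; the function name knn_predict, the data points and the labels are all invented for the example, and Python 3.8 or later is assumed for math.dist.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k nearest neighbours wins
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up 2-D points in two well-separated groups
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["small", "small", "small", "large", "large", "large"]

print(knn_predict(points, labels, (1.8, 1.8)))
print(knn_predict(points, labels, (6.2, 6.4)))
```

KNN has no real training phase: it stores the labelled examples and does all its work at prediction time, which makes it a useful contrast with the other families.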

Build a model

We will discuss the process for building and testing a model over the course of several exercises.  The main steps are:

  • split the data into training and test datasets;
  • fit the model to the training dataset;
  • evaluate the model's performance (how accurate it is) on the test data, using techniques such as a confusion matrix; and
  • deploy the model into production.
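
The first three steps can be sketched in plain Python as follows. This is an illustrative sketch only: the helper names, the tiny made-up dataset and the simple nearest-centroid model are assumptions chosen to keep the example self-contained, and deployment is not shown.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and return (train_rows, test_rows)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a repeatable split
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_centroids(train_rows):
    """'Fit' the model: compute the mean feature value for each label."""
    sums, counts = {}, {}
    for value, label in train_rows:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, value):
    """Predict the label whose mean is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def confusion_matrix(actual, predicted, labels):
    """Count predictions: matrix[actual_label][predicted_label]."""
    matrix = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


# Made-up dataset: one numeric feature and a label per row
data = [(float(v), "low") for v in range(1, 9)]
data += [(float(v), "high") for v in range(21, 29)]

train, test = train_test_split(data)          # 1. split
model = fit_centroids(train)                  # 2. fit on training data only
actual = [label for _, label in test]
predicted = [predict(model, value) for value, _ in test]

labels = ["low", "high"]
matrix = confusion_matrix(actual, predicted, labels)  # 3. evaluate on test data
accuracy = sum(matrix[l][l] for l in labels) / len(test)

print(matrix)
print("accuracy:", accuracy)
```

The key point is that the model never sees the test rows while it is being fitted, so the confusion matrix reflects performance on unseen data. The two groups here are deliberately far apart; real datasets are rarely this clean.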

Python libraries used in ML

During the guided exercises we will use a few popular Python libraries for ML. These include:

  • scikit-learn, a library that provides simple and efficient tools for data mining, data analysis and ML
  • keras, a high-level neural networks API written in Python that runs on top of TensorFlow, an open-source machine learning framework developed by Google and known for its flexibility and support for deep learning
  • numpy, a package for numerical computing that underpins many Python ML libraries. It supports large arrays and matrices, together with a set of mathematical functions that operate on them.