Discovering patterns in data

This lab will first introduce you to key concepts in machine learning for social science. Our focus will be on a particular branch of machine learning called unsupervised learning, which includes techniques for clustering and dimensionality reduction. We will then turn to hands-on data analysis with scikit-learn, a Python library for machine learning. Our research objective is to group UK counties with similar mobility trends using two popular unsupervised learning techniques: k-means clustering and Principal Component Analysis (PCA).

Key themes

  • Definition of machine learning.

  • Supervised and unsupervised learning.

  • Introduction to unsupervised learning techniques, including clustering (k-means) and dimensionality reduction (Principal Component Analysis (PCA)).

  • Hands-on machine learning with scikit-learn.

  • Data-informed model parameter selection.

Learning resources

M. Molina & F. Garip. 2019. Machine learning for sociology. Annual Review of Sociology. An open-access version of the article is available at the Open Science Framework.

Ian Foster, Rayid Ghani, Ron S. Jarmin, Frauke Kreuter, Julia Lane. 2021. Chapter 7: Machine Learning. In Big Data and Social Science (2nd edition).

What is Machine Learning? OxfordSparks.

Additional resources

Kosuke Imai. 2018. Chapter 3.7.3: The k-means algorithm. In Quantitative Social Science. Princeton University Press.

Jake VanderPlas. 2016. In Depth: k-Means Clustering. In Python Data Science Handbook.

Sebastian Raschka. 2018. Python Machine Learning. Packt Publishing.


Machine Learning: What is it? What is it good for?

Field of study that gives computers the ability to learn [from data] without being explicitly programmed.

—Arthur Samuel, 1959

Data science tasks we can solve using machine learning

  1. Pattern discovery using unsupervised machine learning

  2. Prediction using supervised machine learning

Unsupervised and Supervised learning

Two types of machine learning are often distinguished in the literature: unsupervised learning and supervised learning.

  1. Unsupervised learning: no outcome variable (labeled data) is available, and the structure of the data is unknown. The goal of unsupervised learning is to explore the data and discover hidden structures and meaningful information without the guidance of an outcome variable. To uncover such hidden structures, we use unsupervised learning techniques, including clustering (e.g., k-means) and dimensionality reduction (e.g., Principal Component Analysis (PCA)).

Unsupervised Learning, from Andrew Ng's Machine Learning course

  2. Supervised learning: learn a model from labeled training data (an outcome variable) that enables us to make predictions about unseen or future data. The learning is called supervised because the outcome variable or labels that guide the learning process (e.g., whether an email is Spam or Ham, where ‘Ham’ is email that is not Spam) are already known.

Supervised Learning, from Andrew Ng's Machine Learning course
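In scikit-learn, the difference between the two types of learning shows up in how an estimator is fitted. The sketch below uses hypothetical toy data, and LogisticRegression is only one example of a supervised estimator (it is not used elsewhere in this lab). An unsupervised estimator such as KMeans is fitted to the features X alone, whereas a supervised estimator is fitted to X together with the known labels y.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: 6 observations, 2 features each
X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
              [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]])
# Hypothetical outcome variable (labels), e.g. 0 = Ham, 1 = Spam
y = np.array([0, 0, 0, 1, 1, 1])

# Unsupervised: fit() receives only X; the groups are discovered from the data
kmeans_toy = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster labels:", kmeans_toy.labels_)

# Supervised: fit() receives X and the known labels y ...
clf = LogisticRegression().fit(X, y)
# ... and the fitted model predicts labels for new, unseen observations
new_obs = np.array([[0.1, 0.2], [5.1, 5.0]])
print("Predicted labels:", clf.predict(new_obs))
```

Note that the cluster numbers found by KMeans are arbitrary identifiers, whereas the predicted labels refer to the categories defined by y.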

In this lab, we will be focusing on unsupervised learning.

Research problem: clustering counties by mobility

Let’s formulate our simple research problem: to inform a public health intervention, we need to group a number of counties in the UK with similar mobility trends. We frame this problem as a clustering task and perform k-means clustering to sort the UK counties into clusters with similar mobility trends.

k-means clustering

Clustering is an exploratory data analysis (EDA) task that aims to group a set of observations into subgroups or clusters (without any prior information about cluster membership) such that observations assigned to the same cluster are more similar to each other than those in other clusters. To cluster observations in our mobility data, we will employ the k-means algorithm.

The k-means algorithm

The k-means algorithm is an iterative algorithm: a set of operations is performed repeatedly until the results no longer change noticeably. The goal of the algorithm is to split the data into k groups of similar observations, where each group is associated with its centroid, which is equal to the within-group mean. This is done by first assigning each observation to the cluster with the closest centroid and then recomputing the centroid of each cluster based on these new cluster assignments. These two steps are iterated until the cluster assignments no longer change.

The k-means algorithm produces the prespecified number of clusters k and consists of the following steps:

  1. Choose the initial centroids of k clusters.

  2. Given the centroids, assign each observation to a cluster whose centroid is the closest to that observation.

  3. Choose the new centroid of each cluster whose coordinate equals the within-cluster mean of the corresponding variable.

  4. Repeat Steps 2 and 3 until cluster assignments no longer change.

—Kosuke Imai. 2018. Quantitative Social Science. Princeton University Press.
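The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration under simplifying assumptions, not scikit-learn's implementation: the helper name simple_kmeans and the toy data are hypothetical, the initial centroids are drawn at random from the observations (similar in spirit to init='random'), and only a single run is performed.

```python
import numpy as np


def simple_kmeans(X, k, max_iter=300, seed=0):
    """Illustrative k-means following the four steps above (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Step 1: choose k distinct observations at random as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each observation to the cluster with the closest centroid
        distances = np.linalg.norm(X[:, np.newaxis, :] - centroids[np.newaxis, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Step 4: stop once the cluster assignments no longer change
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: new centroid = within-cluster mean of each variable
        # (an empty cluster keeps its previous centroid)
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels, centroids


# Hypothetical toy data: two well-separated groups of 2-D points
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
labels, centroids = simple_kmeans(X, k=2)
print(labels)
print(centroids)
```

In practice, scikit-learn's KMeans adds smarter initialisation (k-means++), multiple runs (n_init) and optimised distance computations, so it is the tool used in the rest of this lab.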

See also Jake VanderPlas’ Python Data Science Handbook.

For the advantages and disadvantages of k-means, read here.

Recent applications of k-means clustering in social sciences

F. Garip. 2012. Discovering diverse mechanisms of migration: The Mexico–US Stream 1970–2000. Population and Development Review, 38(3), 393–433. An open-access version is available.

C. A. Bail. 2008. The configuration of symbolic boundaries against immigrants in Europe. American Sociological Review, 73(1), 37–59.

Let’s get coding with scikit-learn

Scikit-learn is a simple, efficient, and widely used library for machine learning in Python.

# Import libraries for today's lab

# Data analysis & visualisation
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

sns.set_theme(font_scale=1.5)
%matplotlib inline

# Suppress warnings to avoid potential confusion
import warnings

warnings.filterwarnings("ignore")

# Machine learning
from sklearn.cluster import KMeans  # For performing k-means
from sklearn.decomposition import PCA  # For performing PCA
from sklearn.preprocessing import StandardScaler  # For standardising data

The k-means clustering algorithm in scikit-learn

The KMeans estimator class in scikit-learn allows you to set up the algorithm parameters before fitting the estimator to the data.

Parameters of the KMeans algorithm include:

  • n_clusters (int, default=8): number of clusters k to form (the same as the number of centroids to generate).

  • init (‘k-means++’ or ‘random’, default=‘k-means++’): method for selecting the initial centroids. ‘random’ selects n_clusters observations (rows) at random from the data as the initial centroids. ‘k-means++’ spreads out the initial centroids by favouring observations that are far from the centroids already chosen, which usually speeds up convergence.

  • n_init (default=10 in scikit-learn versions before 1.4, ‘auto’ from version 1.4): number of times the k-means algorithm is run with different centroid seeds. The final result is the best of the n_init runs, measured by the sum of squared distances of the observations to their closest cluster center (the inertia). With ‘auto’, the algorithm runs once for init=‘k-means++’ and 10 times for init=‘random’.

  • max_iter (int, default=300): maximum number of iterations of the k-means algorithm for a single run.

  • random_state (default=None): determines random number generation for centroid initialisation; set it to an integer for computationally reproducible results.
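To see these parameters in action, the sketch below fits KMeans with each initialisation method to synthetic data generated with scikit-learn's make_blobs (an assumption for illustration; the lab itself uses the mobility data) and prints the inertia_ and n_iter_ attributes of the best run. It also checks that fixing random_state makes refitting reproducible.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data (illustration only): 150 points around 3 centres
X, _ = make_blobs(n_samples=150, centers=3, random_state=42)

# Same k, two initialisation methods; all other parameters set explicitly
models = {
    "random": KMeans(n_clusters=3, init="random", n_init=10, max_iter=300, random_state=0),
    "k-means++": KMeans(n_clusters=3, init="k-means++", n_init=10, max_iter=300, random_state=0),
}

for name, model in models.items():
    model.fit(X)
    # inertia_: sum of squared distances of samples to their closest centre (best of n_init runs)
    # n_iter_: number of iterations used by that best run
    print(f"{name:>10}: inertia = {model.inertia_:.2f}, iterations = {model.n_iter_}")

# With a fixed random_state, refitting gives exactly the same cluster assignments
refit = KMeans(n_clusters=3, init="k-means++", n_init=10, max_iter=300, random_state=0).fit(X)
print("Reproducible:", np.array_equal(refit.labels_, models["k-means++"].labels_))
```

Passing n_init explicitly, as this lab does, also keeps the behaviour identical across scikit-learn versions with different defaults.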

We instantiate the KMeans class with the following arguments:

kmeans = KMeans(n_clusters=3, init="k-means++", n_init=10, max_iter=300, random_state=0)

kmeans
KMeans(n_clusters=3, random_state=0)

Data preprocessing

We preprocess the data into the format expected by the scikit-learn library. First, we remove counties with one or more NaN (Not a Number) values using the pandas method dropna(). Although some scikit-learn tools, such as StandardScaler(), tolerate NaNs (they are ignored when fitting and kept in the output), others, such as the fit() method of KMeans, raise an error when the input contains NaNs. Removing NaNs at this stage avoids such problems downstream.

Tip

scikit-learn works with numeric data stored as NumPy arrays, SciPy sparse matrices, or pandas DataFrames. If needed, you can convert a pandas DataFrame into a NumPy array using the DataFrame method to_numpy().

# Drop NaNs from the DataFrame
mobility_trends_UK_mean_NaNdrop = mobility_trends_UK_mean.dropna()
mobility_trends_UK_mean_NaNdrop
                         Retail_Recreation  Grocery_Pharmacy       Parks  Transit_stations  Workplaces  Residential
sub_region_1
Aberdeen City                   -50.046371        -10.722567   20.557692        -46.127016  -42.489919    14.567010
Aberdeenshire                   -28.253669        -11.248447   22.474684        -39.953878  -37.207661    12.222222
Angus Council                   -25.955975         -6.125786   13.982143        -31.150943  -33.542339    10.831551
Antrim and Newtownabbey         -29.377358         -7.465409  -29.134328        -53.752621  -33.679435    12.859031
Ards and North Down             -27.262055          0.452830    6.838298        -41.721311  -35.991935    12.679039
...                                    ...               ...         ...               ...         ...          ...
Windsor and Maidenhead          -42.714885        -11.178197    0.379455        -43.693920  -43.711694    16.709220
Wokingham                       -39.044025        -16.285115   30.458101        -51.299790  -45.034274    18.237327
Worcestershire                  -36.025497         -9.990563   26.511954        -34.033107  -33.779758    12.112019
Wrexham Principal Area          -42.293501        -10.448637   -1.860140        -38.511530  -31.306452    11.113895
York                            -41.892276        -12.343621    2.055319        -47.364729  -44.381048    14.039666

141 rows × 6 columns
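The sketch below illustrates why the NaNs are dropped first. It uses a small hypothetical DataFrame (the county names and values are made up) with one missing value: fitting KMeans to it raises a ValueError, whereas fitting to the result of dropna() works. It also shows the optional to_numpy() conversion mentioned in the tip.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical toy data with one missing value (NaN) for "County C"
toy = pd.DataFrame(
    {"Parks": [1.0, 2.0, np.nan, 10.0], "Workplaces": [1.0, 1.5, 3.0, 9.0]},
    index=["County A", "County B", "County C", "County D"],
)

raised_error = False
try:
    KMeans(n_clusters=2, n_init=10, random_state=0).fit(toy)
except ValueError as err:
    # scikit-learn's input validation rejects data containing NaN
    raised_error = True
    print("KMeans failed:", err)

# dropna() removes every row with at least one NaN (default how="any")
toy_clean = toy.dropna()
X = toy_clean.to_numpy()  # optional: convert to a plain NumPy array
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(toy_clean.index.tolist(), km.labels_)
```

Dropping rows is the simplest option; it does shrink the sample, which is why the number of rows is worth checking after dropna().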

Data standardisation

It is good practice to standardise the input features (variables) before applying the k-means algorithm. Standardisation of input features is a common requirement for many statistical and machine learning estimators. Because k-means assigns observations to clusters based on Euclidean distances, a feature measured on a larger scale would otherwise dominate the distance calculations. Standardising converts all features to the same scale, so that the output of the clustering procedure is not influenced by how individual features are measured.

In our example data, the six features are measured on a similar scale, so one may argue that standardisation is not strictly necessary; we perform it anyway so that it becomes part of your data analysis workflow.

The sklearn.preprocessing module includes StandardScaler, among other classes for data scaling. StandardScaler calculates the standard score (z-score) of an observation x as z = (x - M) / SD, where M is the mean and SD is the standard deviation of the observations in that column. In simple words, for each observation in a column, we subtract the column mean and divide by the column standard deviation. StandardScaler uses the population standard deviation, dividing by n rather than n - 1.
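The z-score formula can be checked directly. This sketch, using hypothetical values, compares StandardScaler with the formula computed by hand in NumPy. Note the choice of standard deviation: NumPy's std() defaults to ddof=0 (dividing by n), matching StandardScaler, whereas the pandas .std() method defaults to ddof=1 (dividing by n - 1), so a hand computation in pandas would differ slightly on small samples.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical values: two variables (columns) measured on very different scales
x = np.array([[-50.0, 1000.0],
              [-28.0, 1500.0],
              [-26.0, 1200.0],
              [-29.0, 1800.0]])

z_sklearn = StandardScaler().fit_transform(x)

# Manual z-score: z = (x - M) / SD, computed column by column
M = x.mean(axis=0)
SD = x.std(axis=0)  # ddof=0: population standard deviation, as used by StandardScaler
z_manual = (x - M) / SD

print(np.allclose(z_sklearn, z_manual))  # compare the two approaches
print(z_sklearn.mean(axis=0).round(6))   # column means after standardising
print(z_sklearn.std(axis=0).round(6))    # column standard deviations after standardising
```

After standardising, each column has mean 0 and standard deviation 1, regardless of its original scale.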

# Data standardisation
scaler = StandardScaler()  # Initialising the scaler using the default arguments
mobility_trends_UK_standardised = scaler.fit_transform(
    mobility_trends_UK_mean_NaNdrop
)  # Fit to input data (continuous variable) and return the standardised variables

mobility_trends_UK_standardised
array([[-2.41537629e+00, -5.87410490e-01,  2.10134074e-01,
        -7.91288930e-01, -1.67445518e+00,  1.27693514e+00],
       [ 1.31187764e+00, -6.92727365e-01,  2.89076997e-01,
        -1.75449527e-01, -4.93614030e-01,  5.60796729e-02],
       [ 1.70485729e+00,  3.33177282e-01, -6.06512500e-02,
         7.02741514e-01,  3.25763531e-01, -6.67998047e-01],
       [ 1.11969050e+00,  6.48938482e-02, -1.83621506e+00,
        -1.55202810e+00,  2.95115745e-01,  3.87645364e-01],
       [ 1.48147561e+00,  1.65066306e+00, -3.54839347e-01,
        -3.51770707e-01, -2.21840285e-01,  2.93929581e-01],
       [ 2.00258288e+00,  2.11421107e-01,  3.05692453e-01,
         2.65069849e+00,  3.83734106e-01, -1.67810664e+00],
       [ 1.09194483e+00,  6.58562770e-01, -4.56719269e-01,
         4.51770033e-01,  9.42324863e-01, -8.67190528e-01],
       [-2.21396518e+00, -1.85507507e+00,  4.28819439e-01,
        -9.18325110e-01, -2.03772158e+00,  1.31997203e+00],
       [-3.28335111e-01, -9.81312186e-01, -4.81345232e-01,
        -8.88417676e-01,  5.17312190e-01,  6.06094340e-01],
       [-7.31534167e-01, -7.40711678e-01, -7.39828984e-01,
        -1.01490455e+00, -1.09824341e+00,  5.09791231e-01],
       [-4.17089442e-01, -2.39916438e-01, -2.18071876e+00,
         4.71429465e-01,  1.22085915e+00, -5.87127887e-01],
       [ 6.34919974e-01,  6.12376786e-01, -9.35957607e-02,
         1.92957377e+00,  1.28576034e+00, -1.41051636e+00],
       [ 4.67831907e-01, -5.43432333e-02, -6.76448867e-01,
        -1.72103240e-01,  6.05199222e-01, -2.57406512e-01],
       [-9.44169824e-01, -8.01254388e-01, -3.82735021e-02,
        -1.10237086e+00, -1.48290652e+00,  2.02436811e+00],
       [-2.07332963e-01,  5.09093962e-01, -4.33669270e-01,
        -7.48919361e-01,  3.03679097e-01,  3.32261862e-02],
       [-1.08262990e+00, -1.54858287e+00,  6.84139397e-01,
        -1.05465004e+00, -1.94172190e+00,  1.27498524e+00],
       [-2.28479278e+00, -1.54696956e+00,  1.40169462e+00,
        -1.82849460e+00, -1.80585029e+00,  1.84153895e+00],
       [-4.13371668e-01, -1.19533632e+00,  2.14491224e-01,
        -4.03248568e-01, -1.00633215e+00,  1.55218767e+00],
       [ 9.14953837e-01, -4.32207224e-01, -7.37812544e-01,
        -1.56417523e-01,  4.56918025e-01, -2.04746545e-01],
       [-9.97292725e-01, -9.48753868e-01,  2.41068305e-01,
        -4.28912502e-01, -7.31326728e-01,  1.25170028e+00],
       [-2.34543767e+00, -1.84777037e+00, -9.86176808e-01,
        -1.78656312e+00, -2.53573545e+00,  1.38310243e+00],
       [-3.73703914e-01,  1.96974821e+00,  1.42562908e-01,
        -3.20385557e-01,  6.54325820e-01, -6.92023258e-01],
       [ 1.77154910e+00,  2.35265039e+00, -4.67063822e-01,
         1.03247984e+00,  1.14649320e+00, -1.04451564e+00],
       [ 2.80203585e-01,  9.05448548e-02,  6.35362397e-01,
         7.94229726e-01, -1.33953253e-01,  9.48051851e-01],
       [ 4.42795821e-02, -9.80157490e-02, -3.14236903e-01,
        -5.89667840e-01, -7.02608759e-01, -1.27607217e+00],
       [-2.31326176e-01, -5.18900999e-01,  4.54885264e-01,
         8.00835190e-02, -4.13592167e-01,  4.57931896e-01],
       [-5.46873897e-01, -3.50548290e-01,  4.54961929e-02,
        -2.41026933e-01, -5.12150647e-01,  3.24134813e-01],
       [ 5.01536367e-01,  2.45845230e+00,  1.11553059e+00,
         1.15950961e+00,  2.57256716e-01, -8.16295581e-01],
       [ 2.71938293e+00, -3.24968745e-01,  2.62289434e+00,
         1.64105820e+00, -4.13853194e-01, -1.30424830e+00],
       [ 4.13223226e-01,  1.26185094e-01,  6.32126041e-01,
         7.13201592e-01,  7.01723227e-01, -4.52718023e-01],
       [ 8.30740701e-01,  7.93206094e-01,  8.85967884e-01,
         1.32457083e+00,  5.78600973e-02, -8.01723499e-01],
       [ 2.78207349e-03, -3.91481883e-01, -1.42321175e+00,
        -7.68578794e-01,  5.60128950e-01, -1.85592263e-01],
       [ 9.99214987e-01,  1.14264592e+00, -5.03484910e-01,
         4.16424882e-01,  7.87002143e-01, -1.14459949e+00],
       [-5.63568276e-01,  1.58499239e-01,  6.15121148e-01,
        -1.32519642e+00, -1.59643308e-01, -1.41011204e-01],
       [-2.64686613e-01,  5.73815584e-01,  9.02861806e-01,
         1.24289248e+00,  2.63312668e-01, -1.08285957e-01],
       [ 7.36033354e-01,  1.51757097e+00, -9.55616532e-01,
         7.97048121e-02,  8.03057720e-01, -7.86508306e-01],
       [ 7.98585653e-01,  7.63343834e-01,  1.53908709e+00,
         1.74000469e+00,  6.02532075e-01, -6.86214329e-01],
       [ 1.93697825e-01,  5.74194792e-01,  2.32295345e+00,
         1.17977073e+00, -4.22511086e-02, -3.27218433e-01],
       [ 7.49658561e-01,  1.32402063e+00,  4.11615487e-01,
         1.43306852e+00,  1.31550672e+00, -1.63551190e+00],
       [-6.58641059e-01, -9.62056508e-01, -4.01955754e-01,
        -2.56596972e-01, -8.50119888e-01, -3.04841967e-01],
       [ 6.38522267e-01,  9.02912210e-01, -4.52383291e-01,
         1.45879310e+00, -5.36430789e-01,  5.04308110e-01],
       [ 2.25412405e+00,  1.26062345e+00, -2.70119627e+00,
        -2.28236281e-01, -1.43287852e+00,  1.60023489e+00],
       [ 6.18399411e-01,  6.69405557e-01,  2.13101597e+00,
         1.34271879e+00,  9.37555002e-01, -6.43560376e-01],
       [ 3.89600817e-01,  4.26595899e-01,  1.26236199e+00,
         6.67435614e-01,  1.14903272e-01, -2.51113270e-01],
       [-2.93314794e+00, -1.54457592e+00, -4.06682329e-01,
        -2.27125929e+00, -3.40943264e+00,  2.59972243e+00],
       [ 2.80645148e-01, -1.58584033e-01,  4.77542547e-01,
        -7.50851189e-01, -3.29725724e-01,  6.80465213e-01],
       [ 5.45997570e-01, -8.83124919e-01, -1.77862601e-01,
        -8.02041658e-01, -4.92211395e-02,  2.49417221e-01],
       [ 5.07990413e-01,  1.81230488e+00, -1.59848407e+00,
         3.01359819e-01,  1.28981667e+00, -1.30511626e+00],
       [ 6.41032672e-01,  7.25393078e-01,  5.87751098e-01,
         1.37225816e-01,  4.42522634e-01, -2.64987231e-01],
       [ 2.46243012e-01, -8.25605623e-01, -9.66875108e-01,
        -3.21222129e-01,  7.15671917e-02, -1.88649381e-01],
       [-2.16691003e+00, -1.47356459e+00, -6.04721112e-01,
        -2.26470128e+00, -1.72079496e+00,  1.05846299e+00],
       [-2.14333635e-01, -3.06264997e-01,  5.71263043e-01,
         5.27578736e-01, -8.70086780e-02,  4.00978766e-01],
       [-1.71155996e+00, -1.35866849e+00,  7.66843214e-01,
        -1.35821786e+00, -2.53682878e+00,  2.33003813e+00],
       [-2.72949807e-01, -6.52074448e-01,  4.39292077e-01,
        -4.99890346e-01, -3.05828277e-01,  3.51667405e-01],
       [ 1.05228158e+00,  7.64781929e-01,  2.59030425e-01,
         1.53251597e+00,  5.51933518e-01, -1.39435184e+00],
       [-3.40666694e-01, -1.02394470e-01,  7.10133590e-01,
         1.57309470e-01, -1.23213130e-01,  8.55885418e-01],
       [-7.16844000e-01, -1.84076537e-01, -2.56369869e+00,
         9.33426133e-01,  1.55492416e+00, -1.24951587e+00],
       [-8.68574473e-02, -7.16864763e-01,  4.07809474e-01,
         7.89535817e-01,  1.54761863e+00, -8.79109984e-01],
       [-6.44915163e-01, -4.09765378e-01,  3.00150506e-01,
        -8.68784983e-01, -1.05038802e+00,  1.60479716e+00],
       [ 1.09817702e+00, -9.47379435e-02,  1.29748466e+00,
         4.36084315e-01,  2.60967426e-03, -1.26638571e+00],
       [ 1.61298931e-01,  4.45696781e-01, -2.95432646e+00,
         9.23988198e-01, -1.84744710e-01, -3.74400569e-01],
       [ 9.30834348e-01,  9.88358987e-01, -5.37702293e-01,
         1.22332637e+00,  1.10733781e+00, -1.41475499e+00],
       [ 1.49008100e+00,  1.51715112e+00,  1.10412350e+00,
         1.43369595e+00,  1.39978813e+00, -1.31015656e+00],
       [ 2.52791134e-01,  2.42628706e-02,  4.89546495e-01,
         1.35651975e-01,  1.00990087e-01,  3.26616517e-01],
       [-4.28650440e-01, -4.15575521e-01, -1.49066157e-01,
         6.49273481e-02,  1.27758176e+00, -1.24305429e+00],
       [ 2.81747471e-01, -1.51138242e-02,  1.18620310e+00,
         6.11550108e-01,  6.57643438e-01, -2.07875846e-01],
       [-1.62860149e+00, -1.29204685e+00,  8.76055007e-01,
        -8.61185510e-01,  2.72884070e-01, -3.00251579e-01],
       [-2.89775591e-02, -4.58944935e-02,  1.86393205e-01,
         2.83582437e-01, -2.33198855e-01,  5.95843693e-01],
       [ 1.11291967e+00,  1.32099448e+00,  9.23389281e-01,
         9.76095144e-01,  1.53256874e+00, -1.05584050e+00],
       [ 5.49941708e-01, -1.06404494e-01, -2.03025104e+00,
         3.69783876e-01, -5.08036517e-01,  1.04620831e+00],
       [ 1.60547630e-01, -3.88542941e-01, -1.14117162e+00,
        -2.44514578e+00,  1.28530964e+00, -2.04951200e-01],
       [-2.91278134e-01,  2.03615734e-01,  4.31891105e-01,
        -1.20716914e-01, -2.05275379e-01,  2.09543280e-01],
       [-8.39879102e-02, -1.42133957e-01,  2.21155977e-01,
        -3.75182043e-01, -1.24610121e-01, -1.79113271e-01],
       [ 1.06734102e+00,  5.48559756e-01, -3.06610248e+00,
         2.97031258e-01,  1.99696925e+00, -7.44713373e-01],
       [ 1.16379315e+00,  1.37776129e+00, -1.00397095e+00,
         8.44958686e-01,  5.73199329e-01, -1.48322085e-01],
       [-6.74534146e-01,  5.35579285e-02, -1.02887208e+00,
        -6.98515922e-01,  1.35336575e+00, -1.24003431e+00],
       [ 5.87231749e-01,  1.03348521e+00, -3.23060161e-01,
         7.06434607e-01, -6.90571122e-01,  6.32351230e-01],
       [-3.80270812e-01, -1.41093100e+00,  1.79398017e-01,
        -9.56948230e-01, -3.16037155e-01,  9.37975090e-01],
       [-3.37908510e-02, -1.59015043e+00, -7.77171529e-01,
         7.02532371e-01, -5.74271117e-01,  1.06204838e-01],
       [ 7.96988228e-01, -5.49345061e-01, -1.91604644e-01,
         4.00940523e-01,  9.45183094e-01, -1.96264956e+00],
       [ 6.74002805e-01,  3.34517314e+00, -2.72192267e-01,
        -1.00009998e+00,  6.06100628e-01, -4.82884217e-01],
       [-5.10347549e-01, -4.48581330e-01, -3.15824134e-01,
        -4.92510164e-01, -6.40438031e-03,  1.96856875e-01],
       [ 1.25630113e+00,  2.15658096e+00, -2.47587397e-01,
         7.96093374e-01,  1.03336681e+00, -3.38756851e-01],
       [ 9.78453000e-01,  1.54057068e-01,  1.08356714e+00,
         7.36095370e-01,  9.16510695e-01, -6.55529139e-01],
       [ 1.44884682e+00,  2.05497753e+00, -1.00373159e+00,
         1.75807659e+00,  3.99678778e-01, -5.97764058e-01],
       [ 4.48111213e-01,  2.45009017e-01,  4.14817282e-02,
         1.88084347e+00,  1.86942038e+00, -1.87495438e+00],
       [ 6.39264986e-01, -1.73817839e-01, -2.48957355e+00,
        -2.70478669e-01,  3.67707787e-01, -7.82936259e-02],
       [ 6.25956022e-01,  7.08942028e-01, -4.92139556e-01,
         1.55959998e+00,  1.60936490e+00, -1.37707138e+00],
       [ 3.46468406e-03, -5.80221591e-02,  8.56699378e-01,
        -1.57311556e+00,  7.02150836e-02,  4.21075187e-01],
       [ 7.62039740e-01,  2.55262608e-01,  8.58026425e-01,
         1.34394919e+00, -1.23652422e-01, -3.53544690e-01],
       [-1.17816393e-01, -2.25268232e-01, -5.08726992e-01,
         4.02392764e-01,  7.69069752e-01, -9.54373891e-02],
       [ 5.71631132e-01,  1.12726832e+00,  1.37285954e+00,
         1.29921707e+00,  1.54463075e-01, -6.50961458e-01],
       [-3.03161070e+00, -1.35627621e+00,  1.28228457e+00,
        -1.58863061e+00, -8.68305156e-01, -1.17706720e-01],
       [-3.19895877e-01, -2.73070121e-01,  1.10344469e-01,
         7.32221235e-01,  4.21126285e-01, -1.24111849e-01],
       [-5.35541037e-01, -1.04817560e+00,  1.13107674e-01,
        -5.19481536e-01, -1.07653918e+00,  1.45166915e+00],
       [ 1.37749376e+00,  2.06967224e+00,  1.03342075e+00,
         3.60129778e-01,  1.18125200e+00, -1.72294656e+00],
       [-5.58719885e-01, -9.33086936e-01,  3.27320542e-01,
         9.97825305e-02, -5.34177276e-01, -1.53553186e-01],
       [-2.17756147e-01, -2.83057826e-01, -1.15057885e+00,
        -9.02011964e-01,  1.22040845e+00, -7.45152709e-01],
       [-9.66288743e-01, -9.95747909e-01,  7.60416248e-01,
         5.80423774e-01,  1.24447538e-01, -7.22197418e-01],
       [-8.48884276e-01, -1.14344048e+00,  1.10948869e-01,
         3.84525108e-01, -8.42457942e-01, -7.20442253e-04],
       [ 5.11217435e-01, -9.29693301e-02,  3.85360723e-01,
         1.45695782e-01,  1.08564833e+00, -6.99833046e-01],
       [-1.96303916e+00, -6.90117766e-01,  2.70115427e-01,
        -2.31438314e+00, -1.90070795e+00,  2.53689337e+00],
       [-6.12862155e-01,  3.95314916e-01, -6.28826401e-01,
         1.00432558e+00,  1.21454931e+00, -1.14703650e+00],
       [-6.40820778e-01,  1.76738105e+00, -4.34158845e-01,
        -1.95880190e+00, -6.46852958e-01,  6.48396695e-01],
       [-6.24754977e-02,  2.36955791e-01, -7.06181374e-01,
        -5.99382187e-01,  6.53424414e-01, -2.30024611e-01],
       [ 1.23489386e+00, -1.47549684e-01,  5.87986383e-02,
         3.20010005e-01,  4.58338698e-01, -9.66638515e-01],
       [ 5.26336158e-01,  2.17542370e-01,  6.01233186e-01,
         6.01633925e-01,  1.25008009e+00, -4.60496924e-01],
       [ 4.42732842e-01, -2.28751942e+00, -1.07182966e-01,
        -1.24856175e+00, -6.95979555e-01,  1.02781782e+00],
       [ 2.44440855e-01,  2.12100527e-01,  8.73219238e-01,
         2.12948099e+00,  9.10159412e-01, -6.83801744e-01],
       [ 9.20630362e-02,  1.79092743e-01, -6.04521843e-01,
         3.93210020e-01, -5.10872875e-02, -9.01347151e-01],
       [-1.90500018e-01, -3.77676214e-01,  1.40173167e+00,
        -6.76644044e-01, -5.94877340e-01,  1.16431008e+00],
       [ 1.22055326e+00,  5.76189492e-01,  1.13972337e+00,
         1.74745081e-01,  3.19772526e-01,  3.11688572e-01],
       [-2.23343653e-01, -1.15253437e-01,  3.92435213e-01,
        -2.85070809e-01,  5.49514272e-01, -4.71387384e-01],
       [-2.03419071e+00, -6.26516482e-01, -2.05213605e-01,
        -1.01632511e+00, -1.22356368e+00,  2.09040279e-01],
       [-2.48208585e-01,  6.37987567e-01,  8.25699195e-01,
        -1.03990657e+00, -6.57219120e-01,  7.03108275e-01],
       [ 1.33278925e-01, -2.50996324e-01,  2.49996081e-01,
         6.94763834e-01,  4.59955378e-01, -1.38575503e-01],
       [-1.25145410e+00,  3.17642873e-01, -2.16452239e-02,
        -1.11261887e+00, -1.33810528e+00,  2.84212909e-01],
       [ 1.05412429e-01,  9.62104753e-01, -9.51999530e-03,
         2.31124272e-01,  9.29254484e-01, -5.84621864e-01],
       [-2.82752595e-01, -1.00763038e-01,  1.49771477e-01,
        -7.81127368e-01,  1.10600040e+00, -1.03354257e+00],
       [ 6.23017701e-02, -4.69149676e-02,  6.56474606e-01,
         9.25057384e-01,  4.09338098e-01, -3.12559651e-01],
       [-7.51786133e-01, -9.03077530e-01,  4.36249525e-01,
        -7.56957751e-01, -1.52692591e+00,  2.13425814e+00],
       [-5.52618543e-01, -8.24666042e-01,  6.60124236e-01,
        -8.96155963e-01, -6.08543226e-01,  1.29536595e-02],
       [-6.58158306e-01, -1.23012960e+00, -8.82904700e-01,
        -8.24210805e-01,  1.25756607e-03,  3.65410610e-01],
       [-3.55771528e-01, -9.40035435e-01, -7.09654842e-01,
         6.27766387e-01, -5.05732477e-02,  4.99329011e-02],
       [ 6.14123605e-01,  2.08646620e+00,  1.48454871e+00,
         1.20886733e+00,  8.52635020e-01, -1.24475213e+00],
       [-6.55693853e-01, -2.18584595e-01,  2.47348392e-01,
        -1.89314446e-01, -5.02205260e-01,  9.44491123e-02],
       [ 1.58617457e+00,  7.97110292e-01,  1.18959191e-01,
        -4.41061011e-01, -5.62120845e-01,  7.86911707e-01],
       [-4.41617876e-01, -7.18726629e-01,  8.28438586e-01,
        -3.40672419e-01, -2.99811857e-01,  2.46776904e-01],
       [-4.38669427e-01,  2.17491065e-01,  2.25606682e-01,
        -2.14202060e-02, -1.09095078e-01,  6.85098299e-01],
       [-1.03488502e+00, -1.24419495e+00, -1.36924538e-01,
        -5.88683983e-02, -1.11152747e+00,  1.95226460e+00],
       [ 6.13047931e-01,  5.96422528e-01, -3.11757828e+00,
         3.91746020e-01,  9.27502200e-02,  5.18426272e-02],
       [-5.18330456e-02,  1.37692159e+00, -1.08294970e+00,
        -4.04799959e-01, -3.09276614e-01,  4.24008514e-01],
       [-5.58457752e-01, -5.02145516e-01,  1.42395728e-01,
        -6.84205280e-01,  8.81843285e-03, -1.96627258e-02],
       [-6.17285171e-02, -1.10134851e-01,  1.22496465e+00,
        -9.72875943e-01, -1.35134053e-01,  4.23881862e-01],
       [-4.19565064e-01, -3.28739566e-01,  3.05450342e-01,
        -4.13423496e-01,  3.23716320e-02,  1.30353765e-01],
       [ 3.06583623e-01, -9.08665447e-01,  2.17643348e-01,
         3.84861943e-01,  6.21764978e-02,  1.89300542e-01],
       [-1.16145602e+00, -6.78658516e-01, -6.20818622e-01,
        -5.48560462e-01, -1.94758103e+00,  2.39231451e+00],
       [-5.33620819e-01, -1.70141038e+00,  6.17839206e-01,
        -1.30733091e+00, -2.24324202e+00,  3.18795067e+00],
       [-1.73552203e-02, -4.40813569e-01,  4.55334390e-01,
         4.15213547e-01,  2.72688684e-01, -1.29939186e-03],
       [-1.08938585e+00, -5.32551106e-01, -7.13046565e-01,
        -3.15592112e-02,  8.25592857e-01, -5.20990424e-01],
       [-1.02076352e+00, -9.12055617e-01, -5.51805464e-01,
        -9.14764652e-01, -2.09721434e+00,  1.00236397e+00]])

In the cell above, we printed out the full array. This may be useful for research transparency, but it is not always practical, for example when you work with large data. In such settings, you can select and print out only a few rows. For example, to print out the first 3 rows, type:

mobility_trends_UK_standardised[0:3,]
array([[-2.41537629, -0.58741049,  0.21013407, -0.79128893, -1.67445518,
         1.27693514],
       [ 1.31187764, -0.69272736,  0.289077  , -0.17544953, -0.49361403,
         0.05607967],
       [ 1.70485729,  0.33317728, -0.06065125,  0.70274151,  0.32576353,
        -0.66799805]])
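Because fit_transform() returns a NumPy array, the county names and variable names are missing from printouts like the one above. One option, sketched below with a small hypothetical DataFrame, is to wrap the array back into a pandas DataFrame with the original index and columns and then print only the first rows with head(). In scikit-learn 1.2 and later, calling set_output(transform="pandas") on the scaler is another way to get a DataFrame back.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the mobility DataFrame (county names as the index)
df = pd.DataFrame(
    {"Parks": [1.0, 2.0, 3.0, 4.0], "Residential": [10.0, 12.0, 11.0, 15.0]},
    index=["County A", "County B", "County C", "County D"],
)

standardised = StandardScaler().fit_transform(df)  # a NumPy array without labels

# Re-attach the row (county) and column (variable) labels
standardised_df = pd.DataFrame(standardised, index=df.index, columns=df.columns)
print(standardised_df.head(3))  # labelled equivalent of standardised[0:3,]
```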

We now fit the KMeans estimator we created earlier (kmeans) to our standardised data. This performs 10 runs of the k-means algorithm (each with a different centroid seed), with a maximum of 300 iterations per run, and keeps the best run:

kmeans.fit(mobility_trends_UK_standardised)
KMeans(n_clusters=3, random_state=0)

After fitting, you can access the estimator’s learned attributes, whose names end with an underscore ‘_’. For example, the attribute labels_ stores the cluster to which each observation or sample (in our example, each county) is assigned. To access the cluster labels, type the name of your k-means object (which we called ‘kmeans’) followed by a ‘.’ and the labels_ attribute.

kmeans.labels_
array([2, 0, 1, 0, 1, 1, 1, 2, 0, 2, 0, 1, 0, 2, 0, 2, 2, 2, 0, 2, 2, 1,
       1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1,
       2, 0, 0, 1, 1, 0, 2, 0, 2, 0, 1, 0, 0, 1, 2, 1, 0, 1, 1, 0, 0, 1,
       0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 2, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1,
       0, 1, 0, 1, 2, 0, 2, 1, 0, 0, 0, 0, 1, 2, 1, 0, 0, 1, 1, 2, 1, 0,
       2, 1, 0, 2, 0, 0, 2, 1, 0, 1, 2, 0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0,
       0, 0, 0, 0, 2, 2, 0, 0, 2], dtype=int32)

The cluster labels indicate that, for example, the first county, Aberdeen City, is assigned to cluster 2, the second county, Aberdeenshire, to cluster 0, and so on. The cluster numbers are arbitrary identifiers: they label groups but do not rank them.
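A small sketch on synthetic make_blobs data (an assumption for illustration) shows that the underscore attributes exist only once fit() has been called, that predict() assigns the training observations to the same clusters as labels_, and how np.bincount() counts the observations in each cluster. Because cluster numbers are arbitrary, a different random_state may produce the same grouping with the numbers permuted.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data (illustration only): 60 points around 3 centres
X, _ = make_blobs(n_samples=60, centers=3, cluster_std=0.5, random_state=1)

km = KMeans(n_clusters=3, n_init=10, random_state=0)
print("labels_ before fit:", hasattr(km, "labels_"))  # not yet learned
km.fit(X)
print("labels_ after fit:", hasattr(km, "labels_"))   # created by fit()

# predict() assigns observations to the nearest learned centroid;
# for the training data it matches labels_
print("predict matches labels_:", np.array_equal(km.predict(X), km.labels_))

# Number of observations in each cluster (position = cluster number)
print("cluster sizes:", np.bincount(km.labels_))
```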

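You can also access the coordinates of the cluster centers using the cluster_centers_ attribute. Each row corresponds to a cluster and each column to one of the six variables; the values are the means of the observations in each cluster. Because we fitted k-means to standardised data, these means are on the standardised (z-score) scale.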

kmeans.cluster_centers_
array([[-0.06582254, -0.20255281, -0.37011903, -0.23552421,  0.08304705,
         0.02092666],
       [ 0.78930156,  0.86012023,  0.41426658,  0.911355  ,  0.70390283,
        -0.80759439],
       [-1.32044187, -1.1068233 ,  0.15879976, -1.11968451, -1.53739783,
         1.4688833 ]])
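The statement that each center is the mean of its cluster can be checked directly, and the scaler's inverse_transform() method can translate standardised centers back into the original units, which are easier to interpret. The sketch below uses hypothetical, well-separated data in place of the mobility data; the names raw and toy_scaler are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical raw data: three well-separated groups of 20 observations, 2 variables
raw = np.vstack([
    rng.normal(loc=[-40, 10], scale=1.0, size=(20, 2)),
    rng.normal(loc=[-20, 5], scale=1.0, size=(20, 2)),
    rng.normal(loc=[0, 0], scale=1.0, size=(20, 2)),
])

toy_scaler = StandardScaler()
X = toy_scaler.fit_transform(raw)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Each row of cluster_centers_ is the mean of the (standardised) points in that cluster
group_means = np.array([X[km.labels_ == j].mean(axis=0) for j in range(3)])
print(np.allclose(km.cluster_centers_, group_means))

# Undo the standardisation to read the centers in the original units
print(toy_scaler.inverse_transform(km.cluster_centers_).round(1))
```

Applied to the lab, the same idea would be scaler.inverse_transform(kmeans.cluster_centers_), since scaler was fitted to the six mobility variables.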

You can include the cluster assignment as a column in your original DataFrame. Let’s name the new column ‘clusters’.

mobility_trends_UK_mean_NaNdrop["clusters"] = kmeans.labels_
mobility_trends_UK_mean_NaNdrop
                         Retail_Recreation  Grocery_Pharmacy       Parks  Transit_stations  Workplaces  Residential  clusters
sub_region_1
Aberdeen City                   -50.046371        -10.722567   20.557692        -46.127016  -42.489919    14.567010         2
Aberdeenshire                   -28.253669        -11.248447   22.474684        -39.953878  -37.207661    12.222222         0
Angus Council                   -25.955975         -6.125786   13.982143        -31.150943  -33.542339    10.831551         1
Antrim and Newtownabbey         -29.377358         -7.465409  -29.134328        -53.752621  -33.679435    12.859031         0
Ards and North Down             -27.262055          0.452830    6.838298        -41.721311  -35.991935    12.679039         1
...                                    ...               ...         ...               ...         ...          ...       ...
Windsor and Maidenhead          -42.714885        -11.178197    0.379455        -43.693920  -43.711694    16.709220         2
Wokingham                       -39.044025        -16.285115   30.458101        -51.299790  -45.034274    18.237327         2
Worcestershire                  -36.025497         -9.990563   26.511954        -34.033107  -33.779758    12.112019         0
Wrexham Principal Area          -42.293501        -10.448637   -1.860140        -38.511530  -31.306452    11.113895         0
York                            -41.892276        -12.343621    2.055319        -47.364729  -44.381048    14.039666         2

141 rows × 7 columns
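Once the cluster labels are stored in a column, pandas can summarise each cluster in the original units. A common next step, sketched here on a small hypothetical DataFrame rather than the lab's data, is to count the counties per cluster with value_counts() and compute within-cluster means with groupby().

```python
import pandas as pd

# Hypothetical stand-in: a few counties, two mobility variables and a cluster column
df = pd.DataFrame(
    {"Workplaces": [-42.0, -37.0, -33.0, -45.0, -31.0],
     "Residential": [15.0, 12.0, 11.0, 18.0, 11.0],
     "clusters": [2, 0, 1, 2, 0]},
    index=["County A", "County B", "County C", "County D", "County E"],
)

# Number of counties in each cluster
print(df["clusters"].value_counts().sort_index())

# Average of each variable within each cluster, in the original (unstandardised) units
profile = df.groupby("clusters").mean()
print(profile.round(1))
```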

Choosing the optimal number of clusters

In the example above, our choice of the number of clusters, k, was arbitrary. Let’s use a more informative method for choosing k for our data: the elbow method.

Using the elbow method, we run the k-means algorithm for a range of values of k and plot each value of k against the sum of squared distances between each data point (a UK county) and its cluster centroid. With k = 1, all data points are assigned to the same cluster, so the sum of squared distances is at its highest. As k increases, the sum of squared distances generally decreases, reaching zero when k equals the number of data points, because each data point is then its own cluster. We therefore look for the ‘elbow’ in the plot: the value of k after which adding more clusters reduces the sum of squared distances only marginally.
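The two endpoints described above can be verified numerically. The sketch below uses 12 hypothetical random points, fits KMeans for every k from 1 to 12, and records inertia_ (scikit-learn's name for the sum of squared distances to the closest centroid). For k = 1 the inertia equals the total sum of squares around the overall mean; for k = 12 it is essentially zero. With real data you would stop at a much smaller k and look for the elbow.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 2))  # hypothetical data: 12 distinct points in 2-D

inertias = []
for k in range(1, len(X) + 1):  # a for loop, explained in the next section
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)  # sum of squared distances to the closest centroid

# k = 1: the single centroid is the overall mean, so inertia = total sum of squares
total_ss = ((X - X.mean(axis=0)) ** 2).sum()
print(np.isclose(inertias[0], total_ss))
# k = 12: every point is its own centroid, so inertia is (numerically) zero
print(np.isclose(inertias[-1], 0.0, atol=1e-8))
print(np.round(inertias, 2))
```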

We perform multiple runs of the k-means clustering algorithm using a for loop.

Performing for loop

A for loop is used to repeatedly execute a block of code, which makes it a good fit for running the k-means algorithm many times. The for loop will iterate over a sequence of k values and, for each value of k, fit the k-means algorithm.

Let’s first look at a simple example of a for loop:

for number in range(1, 4):
    print(number)
1
2
3

In this example (and in for loops in general), there are two parts:

  • the for loop statement, which in this example is ‘for number in range(1, 4):’

    • number is the loop variable; we could have chosen a different variable name;

    • range(1, 4) specifies the set of values to loop or iterate over, here the numbers 1, 2 and 3. The first argument (1) is the starting point, and the second argument (4) is the endpoint, which is not included in the range;

    • the word ‘in’ connects the two components in the for loop statement

    • the for loop statement ends in a colon ‘:’.

  • the loop body, which contains the code to be executed at each iteration of the for loop. Each line in the loop body is indented (by convention, four spaces). This indentation is how the interpreter knows whether a line is part of the loop. In our example, ‘print(number)’ is the loop body.

At each iteration of the for loop, the variable number is assigned the next value in the range from 1 to 3, and then the value of number is printed. The loop runs once for each number in the sequence, so the loop body ‘print(number)’ executes 3 times.

This description of for loops draws on Real Python’s book Python Basics (pages 153–154) and on Kaggle’s Python tutorial.
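The k-means loop below relies on one more common pattern. It starts from an empty list and calls the list’s append() method inside the loop body, collecting one result per iteration. Here is a minimal sketch of that pattern, which squares each number from 1 to 3 (an illustrative computation, not part of the lab):

```python
squares = []  # start with an empty list

for number in range(1, 4):  # number takes the values 1, 2 and 3
    squares.append(number ** 2)  # add one result to the end of the list per iteration

print(squares)  # [1, 4, 9]
```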

Choosing k via a for loop

We are now ready to apply the for loop to the k-means algorithm. In the code below, the for loop statement is ‘for k in K’, where k is the loop variable and K is the range of values from 1 to 30. The loop body contains three statements: initialising the KMeans estimator, fitting it to the data, and storing its output. Each of these statements is indented four spaces. The loop runs 30 times, so the k-means algorithm is estimated 30 times, once for each value of k.

# Run the k-means algorithm for values of k between 1 and 30
# (no random_state is set, so the values may differ slightly between runs)

Sum_of_squared_distances = []  # Initialise an empty list to store the results

# range() with starting point 1 and endpoint 31; the endpoint is excluded, so K covers 1 to 30
K = range(1, 31)

for k in K:  # a for loop iterating over values of k from 1 to 30
    # Initialise the KMeans estimator for the current value of k
    kmeans = KMeans(n_clusters=k)
    # Fit the estimator to the standardised data with the fit() method
    kmeans.fit(mobility_trends_UK_standardised)
    # Store the sum of squared distances (kmeans.inertia_) using the list's append() method
    Sum_of_squared_distances.append(kmeans.inertia_)

Sum_of_squared_distances
[846.0,
 547.7080256569716,
 429.7112613888795,
 365.4782968827096,
 328.8362897853743,
 301.72829951678796,
 281.05116080411847,
 262.8764097559398,
 251.88350703120256,
 239.34973620094246,
 223.86683466112927,
 214.68357391767273,
 209.50534998708247,
 197.3101329363527,
 189.8774747310233,
 183.84087640511768,
 177.90242231870533,
 171.01789694172618,
 162.46276850451392,
 159.95180340703382,
 152.35221938560662,
 145.9763201437044,
 142.81737201523777,
 137.44933254954873,
 134.80252297181383,
 129.93053030365422,
 123.79731323466231,
 119.02326607285883,
 114.9870769137471,
 111.87382179317166]
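Each value in the list is the fitted model’s inertia_ attribute. This is the sum of squared Euclidean distances between each observation and the centroid of the cluster it was assigned to. To show what the number measures, the sketch below recomputes it by hand from labels_ and cluster_centers_. It uses randomly generated data as a hypothetical stand-in for the mobility data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: 100 observations of 6 standard-normal variables
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 6))

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Centroid of the cluster each observation was assigned to (one row per observation)
assigned_centres = kmeans.cluster_centers_[kmeans.labels_]

# Sum of squared distances between the observations and their assigned centroids
manual_inertia = ((X - assigned_centres) ** 2).sum()

print(manual_inertia, kmeans.inertia_)  # the two values agree up to floating-point error
```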

Elbow plot

Let’s plot k against the sum of squared distances. The plot below shows how the sum of squared distances varies with values of k between 1 and 30.

# Set the plot size
plt.figure(figsize=(8.2, 5.8))

# Plot k against the sum of squared distances; sns.lineplot() returns a Matplotlib Axes
ax = sns.lineplot(x=K, y=Sum_of_squared_distances)

# Add x and y labels
ax.set(xlabel="Number of clusters, k", ylabel="Sum of squared distances")
plt.show()
[Figure: elbow plot of the sum of squared distances against the number of clusters, k = 1 to 30]
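One rough way to quantify how quickly the curve flattens is to compute the relative drop in the sum of squared distances each time k increases by one. The sketch below does this for the first six values printed above, copied and rounded to two decimals. It is a heuristic aid for reading the plot, not a formal test.

```python
# First six values of Sum_of_squared_distances printed above, rounded to 2 decimals
ssd = [846.00, 547.71, 429.71, 365.48, 328.84, 301.73]

# Relative drop in the sum of squared distances when moving from k to k + 1
drops = [1 - following / current for current, following in zip(ssd, ssd[1:])]

for k, drop in enumerate(drops, start=1):
    print(f"k = {k} -> {k + 1}: {100 * drop:.1f}% decrease")
```

With these values, the relative decrease shrinks steadily: from about 35% (k = 1 to 2) to about 15% (k = 3 to 4) and about 10% (k = 4 to 5). There is no single sharp bend.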

For our data set, the elbow of the curve (the point where it “bends”) is not clear-cut, but the sum of squared distances decreases more slowly after k = 4. We therefore rerun the k-means algorithm with k = 4.

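k = 4

# Initialise KMeans: k-means++ initialisation, 10 runs with different centroid seeds
# (the best run is kept), at most 300 iterations per run, and a fixed random_state
# so that the results are reproducible
kmeans_k4 = KMeans(
    n_clusters=k, init="k-means++", n_init=10, max_iter=300, random_state=0
)

# fit_transform() fits the model and returns the distance from each county
# to each of the four cluster centres (one row per county, one column per cluster)
kmeans_k4.fit_transform(mobility_trends_UK_standardised)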
array([[5.04566964, 4.50949325, 3.01211976, 1.22287598],
       [2.47693818, 2.62868693, 1.62080111, 3.45425377],
       [1.3003252 , 2.26931091, 2.42065505, 4.87648492],
       [3.78652204, 1.69736822, 2.80123963, 4.08388795],
       [2.43619727, 2.27100614, 2.6755362 , 4.50826352],
       [2.35336613, 4.06071712, 4.12945515, 6.48256892],
       [1.19812634, 1.69699685, 2.36632043, 4.95119228],
       [5.63435577, 5.12630547, 3.47305719, 1.24210402],
       [3.31527914, 2.26316156, 1.36184738, 2.64584341],
       [3.91107752, 2.74699518, 1.73162013, 1.61362137],
       [3.23295316, 1.57734068, 2.87257396, 4.69992362],
       [1.42323139, 2.99252444, 3.2625403 , 5.80151349],
       [1.9868273 , 1.08868625, 1.36774609, 3.78215571],
       [4.81014859, 4.0108717 , 2.66323944, 0.7956978 ],
       [2.42377329, 1.54752186, 1.22286252, 3.2322439 ],
       [4.95676567, 4.57502553, 2.8127199 , 0.88207805],
       [6.0778808 , 5.76623296, 3.96517641, 1.80031318],
       [4.00324484, 3.53155206, 1.92038439, 1.39348625],
       [2.21221937, 1.33299872, 1.57762967, 3.85001559],
       [3.83037626, 3.38499741, 1.64938171, 1.2633768 ],
       [6.49018296, 5.30264079, 4.25308026, 1.96176341],
       [2.13250588, 2.57142549, 2.56811458, 4.62071668],
       [2.16062392, 3.06827114, 4.01195094, 6.43820377],
       [2.16336886, 2.86141265, 1.55576949, 3.30182236],
       [2.58694331, 2.26023023, 1.66979223, 3.49667667],
       [2.54594373, 2.69751661, 0.66519137, 2.43677741],
       [2.75980801, 2.43637953, 0.5873986 , 2.189648  ],
       [1.84810182, 3.7594953 , 3.49769804, 5.63411847],
       [3.36500349, 5.42478723, 4.5098375 , 6.41226609],
       [0.89475272, 2.49282263, 1.6441581 , 4.2259671 ],
       [0.82299406, 3.10233895, 2.49617003, 4.85241859],
       [3.0540805 , 1.11916236, 1.8278122 , 3.55587669],
       [1.29114706, 1.85234   , 2.62858964, 5.1557304 ],
       [2.95340139, 2.84462279, 1.35277911, 2.74014982],
       [1.43635026, 3.04140168, 1.94487335, 4.08642567],
       [1.88750728, 1.49787415, 2.70557177, 5.035078  ],
       [1.27540103, 3.79336631, 3.03485238, 5.34532345],
       [2.10744321, 4.26357273, 2.81376763, 4.65089272],
       [1.22916785, 3.22630651, 3.41029138, 5.99173615],
       [3.20253825, 2.59097533, 1.28578913, 2.34702054],
       [2.14532941, 2.4787089 , 2.42601355, 4.19214776],
       [4.98826038, 3.50358604, 4.56193468, 5.27819359],
       [1.67390117, 4.13639275, 3.12392108, 5.36639476],
       [1.26486256, 3.07494385, 1.78324735, 4.04594693],
       [7.70847761, 6.70218335, 5.51864264, 2.90076983],
       [2.74098553, 2.6120279 , 1.0038274 , 2.53682572],
       [2.87919726, 2.19946653, 1.14378677, 2.85199774],
       [2.58340653, 1.98121907, 3.53222376, 5.80422916],
       [1.05929409, 2.29986322, 1.62101985, 4.1115384 ],
       [2.77019801, 1.52054039, 1.35405766, 3.22529125],
       [5.92350259, 4.76471804, 3.64893739, 1.632762  ],
       [2.13467954, 2.69196149, 0.919198  , 2.94858772],
       [6.07288086, 5.58202021, 3.97115996, 1.46266203],
       [2.79554158, 2.68948577, 0.58751824, 2.26048492],
       [0.89842303, 2.91376263, 2.94727422, 5.4733001 ],
       [2.51033563, 2.87114994, 1.04103308, 2.63144655],
       [3.69897039, 2.42715675, 3.69509245, 5.52416318],
       [1.94330508, 2.86723277, 2.22832533, 4.63689781],
       [3.99699334, 3.47763129, 1.9320706 , 1.2443638 ],
       [1.56568879, 3.34530856, 2.33706415, 4.64591563],
       [3.70312069, 1.8928401 , 3.47236901, 4.91732907],
       [1.33156857, 2.35253708, 3.08158649, 5.65230424],
       [1.50516965, 3.79954956, 3.7845444 , 6.33227007],
       [1.81494689, 2.32581749, 0.82610413, 3.19904648],
       [2.16612644, 2.32256385, 1.99274074, 4.35574941],
       [1.35885683, 2.99160696, 1.67474588, 4.06017201],
       [3.7527546 , 3.77331365, 2.0297521 , 2.75979523],
       [2.21222118, 2.31396699, 0.80377903, 2.80661356],
       [1.12327546, 3.30440273, 3.25704519, 5.83380732],
       [3.58536247, 1.85828087, 2.61320207, 3.63128213],
       [4.12389682, 2.57491558, 2.94590157, 4.20102445],
       [2.15064991, 2.39589618, 0.64530149, 2.80374165],
       [2.15938102, 2.15279659, 0.40980959, 2.90018924],
       [3.9055487 , 2.22732609, 4.26824804, 6.26538848],
       [1.83808668, 1.70862451, 2.80959415, 5.06294459],
       [2.91013144, 1.93948426, 2.43370615, 4.45258233],
       [2.2422943 , 2.29171174, 2.06990336, 3.72582045],
       [3.77560688, 3.17735838, 1.5496857 , 1.79343351],
       [3.26910562, 2.68487471, 1.89399813, 3.05899099],
       [1.99787171, 2.59533878, 2.64601988, 5.12835041],
       [3.3462147 , 3.36015143, 3.99890849, 5.7799169 ],
       [2.76663745, 2.01683317, 0.63335189, 2.48998643],
       [1.75662781, 2.62028536, 3.33402792, 5.67671653],
       [0.93884667, 3.00363082, 2.2563041 , 4.79080907],
       [2.27849558, 2.83721861, 3.82212792, 6.03725973],
       [1.9742459 , 3.41599717, 3.62245039, 6.17410372],
       [3.51604344, 1.06343546, 2.82994931, 4.36761373],
       [1.59126789, 2.64219696, 3.25675522, 5.82599461],
       [3.14594168, 3.05188033, 1.58218108, 2.81552348],
       [1.20964787, 3.04416457, 2.12956719, 4.36785299],
       [1.95147469, 1.57197865, 1.29177045, 3.68011088],
       [1.13994117, 3.50176752, 2.69812049, 4.9377037 ],
       [5.4068497 , 5.28949581, 3.56698561, 2.73488982],
       [1.7800033 , 2.25370853, 1.13622088, 3.50381084],
       [3.99400257, 3.43832344, 1.85154615, 1.20925628],
       [1.90180548, 3.71230084, 3.80583123, 6.27537405],
       [2.75442991, 2.84043816, 0.94135893, 2.50916025],
       [2.96276092, 1.56083336, 2.14888028, 4.11439899],
       [2.6067065 , 3.30191569, 1.67113275, 3.41168003],
       [3.16743305, 3.0980149 , 1.42217789, 2.35168104],
       [1.31272557, 2.2486955 , 1.68569781, 4.32472091],
       [6.22696831, 5.3805621 , 4.07644653, 1.74208451],
       [1.94738474, 2.22118897, 2.48769421, 4.81700582],
       [4.06632173, 3.14254286, 2.9194489 , 3.4054936 ],
       [2.32398421, 1.17895364, 1.38137454, 3.58144638],
       [1.35635022, 2.16501713, 1.98342243, 4.5314781 ],
       [0.98334406, 2.51141366, 1.9751961 , 4.5795432 ],
       [4.51899203, 3.73359642, 2.54591335, 2.43381982],
       [1.46008785, 3.49327828, 2.88111841, 5.20078181],
       [1.75263745, 1.57317036, 1.54657994, 3.88408702],
       [3.38842338, 3.7252572 , 1.75606989, 2.26193643],
       [1.64889901, 2.98158215, 2.05832151, 4.22109929],
       [1.89343999, 2.24161257, 0.93183205, 3.44686497],
       [4.38759674, 3.66805042, 2.3297036 , 1.61077625],
       [3.07350247, 3.12443954, 1.6452764 , 2.5612204 ],
       [1.48812828, 2.23168941, 1.17636061, 3.68707324],
       [3.80108261, 3.21136222, 1.97261914, 1.92152769],
       [1.19751844, 1.8954884 , 1.89520261, 4.40973804],
       [2.31066106, 2.32539551, 1.76530322, 4.04419728],
       [1.26355828, 2.64443275, 1.48693168, 3.91849876],
       [4.69564833, 4.21611107, 2.65913276, 1.07012616],
       [3.21794734, 3.13735726, 1.14359739, 2.10922838],
       [3.65356191, 2.36798452, 1.57421401, 2.36670423],
       [2.70708072, 2.12178377, 1.37615431, 3.08978484],
       [1.6824539 , 3.97332374, 3.63315124, 5.96153282],
       [2.61632817, 2.54788451, 0.62779367, 2.36014248],
       [2.66351031, 2.62405874, 2.28809987, 3.84636431],
       [2.78823738, 3.04083381, 0.8450561 , 2.36798068],
       [2.42425876, 2.38244569, 0.86040971, 2.64395037],
       [4.4776361 , 3.84687199, 2.45234533, 1.36333137],
       [3.86505674, 1.67654658, 3.56763307, 4.98620071],
       [2.85933146, 1.65128535, 2.16534265, 3.59976031],
       [2.74228878, 2.38805378, 0.60926247, 2.51372308],
       [2.84256801, 3.21854111, 1.35780561, 2.74091495],
       [2.46130284, 2.37171651, 0.33238111, 2.61227263],
       [2.24577048, 2.45421304, 1.00646126, 3.14919793],
       [5.20285954, 4.25485027, 3.21405765, 1.42658464],
       [6.1704489 , 5.59614227, 4.1646318 , 2.14453473],
       [1.83319912, 2.43144008, 0.84560399, 3.29084402],
       [2.81392292, 2.09360832, 1.69170905, 3.53257811],
       [4.70707215, 3.77226267, 2.59610041, 1.0732578 ]])
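The array above has one row per county and one column per cluster. For KMeans, fit_transform() returns the Euclidean distance from each observation to each cluster centre. Each observation’s cluster label is the column with the smallest distance. The sketch below checks this on data generated with scikit-learn’s make_blobs, used here as a hypothetical stand-in for the mobility data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical data: 150 observations of 6 variables drawn around 4 centres
X, _ = make_blobs(n_samples=150, n_features=6, centers=4, random_state=0)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
distances = kmeans.fit_transform(X)  # shape (150, 4): distance to each cluster centre

# The nearest centre (smallest distance in each row) gives the cluster label
nearest = distances.argmin(axis=1)
print(distances.shape)                          # (150, 4)
print(np.array_equal(nearest, kmeans.labels_))  # True
```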

Let’s view the cluster to which each observation or sample (in our example, each UK county) in our data set was assigned:

kmeans_k4.labels_
array([3, 2, 0, 1, 1, 0, 0, 3, 2, 3, 1, 0, 1, 3, 2, 3, 3, 3, 1, 3, 3, 0,
       0, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 2, 0, 1, 0, 0, 0, 2, 0, 1, 0, 0,
       3, 2, 2, 1, 0, 2, 3, 2, 3, 2, 0, 2, 1, 0, 3, 0, 1, 0, 0, 2, 2, 0,
       2, 2, 0, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 0, 0, 2, 0, 0, 0, 0, 1, 0,
       2, 0, 2, 0, 3, 2, 3, 0, 2, 1, 2, 2, 0, 3, 0, 2, 1, 0, 0, 3, 0, 2,
       2, 0, 2, 3, 2, 2, 3, 0, 2, 0, 3, 2, 2, 2, 0, 2, 2, 2, 2, 3, 1, 1,
       2, 2, 2, 2, 3, 3, 2, 2, 3], dtype=int32)

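Here are the centres of the four clusters. Each row is a cluster and each column a mobility category (in the same order as in the DataFrame), with values in standardised units: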

kmeans_k4.cluster_centers_
array([[ 0.77303297,  0.7952798 ,  0.53804385,  0.9729365 ,  0.71215567,
        -0.83613574],
       [ 0.46188499,  0.41371826, -1.65713325, -0.18743052,  0.55582752,
        -0.21609699],
       [-0.210301  , -0.34287992,  0.15765022, -0.24250778, -0.09133243,
         0.11912676],
       [-1.40669657, -1.12453327,  0.10615267, -1.1449252 , -1.62755955,
         1.50369502]])
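Because k-means was run on standardised data, the centres are expressed in standard deviations from each variable’s mean, not in the original units (percentage change from baseline). Suppose the standardisation was done with scikit-learn’s StandardScaler; that is an assumption, since the earlier step is not shown in this section. In that case, the scaler’s inverse_transform() method maps the centres back to the original units. The sketch below uses data generated with make_blobs as a hypothetical stand-in, and the variable names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Hypothetical data in "original units": 150 observations of 6 variables
X, _ = make_blobs(n_samples=150, n_features=6, centers=3, random_state=0)

scaler = StandardScaler()
X_std = scaler.fit_transform(X)  # standardise: mean 0, standard deviation 1 per column

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_std)

# Map the centres from standardised units back to the original units
centres_original = scaler.inverse_transform(kmeans.cluster_centers_)

# Check: once k-means has converged, each centre is the mean of its cluster's observations
cluster_means = np.array([X[kmeans.labels_ == j].mean(axis=0) for j in range(3)])
print(np.allclose(centres_original, cluster_means))
```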

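k-means is an iterative algorithm: it repeatedly assigns each observation to its nearest centroid and then moves each centroid to the mean of the observations assigned to it. It stops under any of three conditions:

  • the cluster assignments no longer change;

  • the centroids move by less than a small tolerance (tol);

  • the maximum number of iterations (max_iter) is reached.

Each iteration can only reduce (or leave unchanged) the sum of squared distances from the observations to their cluster centroids. However, the algorithm may end in a local rather than the global minimum, which is why it is run n_init times and the best run is kept. How many iterations were needed for the algorithm to converge in our case? (For the best of the n_init runs, this is stored in n_iter_.)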

kmeans_k4.n_iter_
11
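Lloyd’s algorithm is the classic procedure behind k-means, and a simplified plain-NumPy sketch of it follows. It alternates an assignment step (each observation goes to its nearest centroid) and an update step (each centroid moves to the mean of its observations). It stops when no observation changes cluster. Unlike scikit-learn’s KMeans, it initialises the centroids with randomly chosen observations rather than k-means++. It also runs only once instead of n_init times and has no tolerance-based stopping rule. The function name lloyd_kmeans and the generated data are hypothetical.

```python
import numpy as np


def lloyd_kmeans(X, k, max_iter=300, seed=0):
    """Simplified k-means (Lloyd's algorithm) with random initialisation.

    Returns the labels, the centroids, the number of iterations, and the sum of
    squared distances recorded after each assignment step.
    """
    rng = np.random.default_rng(seed)
    # Initialise the centroids as k distinct observations chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    history = []
    for n_iter in range(1, max_iter + 1):
        # Assignment step: squared distance from each observation to each centroid
        sq_dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = sq_dist.argmin(axis=1)
        history.append(sq_dist[np.arange(len(X)), new_labels].sum())
        # Converged: no observation changed cluster
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned observations
        for j in range(k):
            if np.any(labels == j):  # leave the centroid in place if its cluster is empty
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids, n_iter, history


# Hypothetical data: three groups of 50 observations in two dimensions
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc, 0.5, size=(50, 2)) for loc in ([0, 0], [5, 5], [0, 5])])

labels, centroids, n_iter, history = lloyd_kmeans(X, k=3)
print(n_iter)   # number of iterations until the assignments stopped changing
print(history)  # sum of squared distances after each assignment step
```

The recorded sum of squared distances never increases from one iteration to the next, which mirrors the property described above.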

As we did earlier, we add the cluster assignment as a column to our DataFrame. We name the column ‘clusters_k4’.

# Add the 4-cluster assignment to your DataFrame
mobility_trends_UK_mean_NaNdrop["clusters_k4"] = kmeans_k4.labels_
mobility_trends_UK_mean_NaNdrop
Retail_Recreation Grocery_Pharmacy Parks Transit_stations Workplaces Residential clusters clusters_k4
sub_region_1
Aberdeen City -50.046371 -10.722567 20.557692 -46.127016 -42.489919 14.567010 2 3
Aberdeenshire -28.253669 -11.248447 22.474684 -39.953878 -37.207661 12.222222 0 2
Angus Council -25.955975 -6.125786 13.982143 -31.150943 -33.542339 10.831551 1 0
Antrim and Newtownabbey -29.377358 -7.465409 -29.134328 -53.752621 -33.679435 12.859031 0 1
Ards and North Down -27.262055 0.452830 6.838298 -41.721311 -35.991935 12.679039 1 1
... ... ... ... ... ... ... ... ...
Windsor and Maidenhead -42.714885 -11.178197 0.379455 -43.693920 -43.711694 16.709220 2 3
Wokingham -39.044025 -16.285115 30.458101 -51.299790 -45.034274 18.237327 2 3
Worcestershire -36.025497 -9.990563 26.511954 -34.033107 -33.779758 12.112019 0 2
Wrexham Principal Area -42.293501 -10.448637 -1.860140 -38.511530 -31.306452 11.113895 0 2
York -41.892276 -12.343621 2.055319 -47.364729 -44.381048 14.039666 2 3

141 rows × 8 columns

Our next step is to assess how the clusters are similar to or different from each other with respect to each mobility category. To do this, we plot the clusters against each mobility category using the Seaborn function catplot() with swarm plots (kind="swarm").

# List the six mobility categories
mobility_categories = [
    "Retail_Recreation",
    "Grocery_Pharmacy",
    "Parks",
    "Transit_stations",
    "Workplaces",
    "Residential",
]

# Use a for loop to plot the clusters across the six mobility categories

for mobility_category in mobility_categories:
    sns.catplot(
        x="clusters_k4",
        y=mobility_category,
        kind="swarm",
        data=mobility_trends_UK_mean_NaNdrop,
    )
[Figures: six swarm plots, one per mobility category (Retail_Recreation, Grocery_Pharmacy, Parks, Transit_stations, Workplaces, Residential), showing the values of the counties in each of the four clusters (clusters_k4)]
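A numerical summary can complement the swarm plots: the mean of each mobility category within each cluster, computed with pandas groupby(). In the lab, this could be written as mobility_trends_UK_mean_NaNdrop.groupby("clusters_k4")[mobility_categories].mean(). So that it runs on its own, the sketch below applies the same call to a small made-up DataFrame (hypothetical values, two categories only).

```python
import pandas as pd

# Hypothetical toy data: four areas, two mobility categories, and a cluster label
df = pd.DataFrame(
    {
        "Parks": [20.0, 10.0, -30.0, -10.0],
        "Workplaces": [-40.0, -30.0, -35.0, -45.0],
        "clusters_k4": [0, 0, 1, 1],
    },
    index=["Area A", "Area B", "Area C", "Area D"],
)

categories = ["Parks", "Workplaces"]

# Mean of each category within each cluster (one row per cluster)
cluster_profile = df.groupby("clusters_k4")[categories].mean()
print(cluster_profile)
```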

Dimensionality Reduction via PCA

The above plots visualise each variable separately. A more informative approach would be to take all six dimensions into account simultaneously. However, data with more than two or three dimensions are difficult to visualise and perceive. One solution is to use the dimensionality reduction technique Principal Component Analysis (PCA).
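PCA finds new axes, the principal components, which are linear combinations of the original variables. The first component points in the direction of greatest variance. The second points in the direction of greatest remaining variance, orthogonal to the first, and so on. For standardised data, these directions are the eigenvectors of the covariance matrix, ranked by their eigenvalues (the variance each component captures). The sketch below compares a by-hand eigendecomposition with scikit-learn’s PCA. It uses randomly generated data as a hypothetical stand-in: 141 observations of six correlated variables, not the lab’s data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 141 observations of 6 correlated variables driven by 2 latent factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(141, 2))
loadings = rng.normal(size=(2, 6))
X = latent @ loadings + 0.3 * rng.normal(size=(141, 6))
X_std = StandardScaler().fit_transform(X)

# PCA by hand: eigendecomposition of the covariance matrix of the standardised data
cov = np.cov(X_std, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
order = np.argsort(eigenvalues)[::-1]            # indices from largest to smallest eigenvalue
top2 = eigenvectors[:, order[:2]]                # 6 x 2 matrix: the first two directions
manual_scores = X_std @ top2                     # project the (already centred) data onto them

# The same with scikit-learn
pca = PCA(n_components=2)
sk_scores = pca.fit_transform(X_std)

# Eigenvectors are only defined up to their sign, so compare absolute values
print(np.allclose(np.abs(manual_scores), np.abs(sk_scores)))
# Share of the total variance captured by each of the two components
print(pca.explained_variance_ratio_)
print(eigenvalues[order[:2]] / eigenvalues.sum())
```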

We can apply PCA to reduce the six mobility categories to just two dimensions, and then use these two-dimensional approximations to visualise our clusters in a scatter plot.

The scikit-learn API is very consistent, so the workflow we used to run k-means applies to PCA too. We first initialise the PCA estimator with the default arguments, except for n_components, which we set to 2 to keep only the first two components. We could then fit the estimator with the fit() method. Instead, below we use the fit_transform() method, which fits the estimator to the data and applies the dimensionality-reduction transformation to the same data in one step.

# We reuse our standardised data set
mobility_trends_UK_standardised
array([[-2.41537629e+00, -5.87410490e-01,  2.10134074e-01,
        -7.91288930e-01, -1.67445518e+00,  1.27693514e+00],
       [ 1.31187764e+00, -6.92727365e-01,  2.89076997e-01,
        -1.75449527e-01, -4.93614030e-01,  5.60796729e-02],
       [ 1.70485729e+00,  3.33177282e-01, -6.06512500e-02,
         7.02741514e-01,  3.25763531e-01, -6.67998047e-01],
       [ 1.11969050e+00,  6.48938482e-02, -1.83621506e+00,
        -1.55202810e+00,  2.95115745e-01,  3.87645364e-01],
       [ 1.48147561e+00,  1.65066306e+00, -3.54839347e-01,
        -3.51770707e-01, -2.21840285e-01,  2.93929581e-01],
       [ 2.00258288e+00,  2.11421107e-01,  3.05692453e-01,
         2.65069849e+00,  3.83734106e-01, -1.67810664e+00],
       [ 1.09194483e+00,  6.58562770e-01, -4.56719269e-01,
         4.51770033e-01,  9.42324863e-01, -8.67190528e-01],
       [-2.21396518e+00, -1.85507507e+00,  4.28819439e-01,
        -9.18325110e-01, -2.03772158e+00,  1.31997203e+00],
       [-3.28335111e-01, -9.81312186e-01, -4.81345232e-01,
        -8.88417676e-01,  5.17312190e-01,  6.06094340e-01],
       [-7.31534167e-01, -7.40711678e-01, -7.39828984e-01,
        -1.01490455e+00, -1.09824341e+00,  5.09791231e-01],
       [-4.17089442e-01, -2.39916438e-01, -2.18071876e+00,
         4.71429465e-01,  1.22085915e+00, -5.87127887e-01],
       [ 6.34919974e-01,  6.12376786e-01, -9.35957607e-02,
         1.92957377e+00,  1.28576034e+00, -1.41051636e+00],
       [ 4.67831907e-01, -5.43432333e-02, -6.76448867e-01,
        -1.72103240e-01,  6.05199222e-01, -2.57406512e-01],
       [-9.44169824e-01, -8.01254388e-01, -3.82735021e-02,
        -1.10237086e+00, -1.48290652e+00,  2.02436811e+00],
       [-2.07332963e-01,  5.09093962e-01, -4.33669270e-01,
        -7.48919361e-01,  3.03679097e-01,  3.32261862e-02],
       [-1.08262990e+00, -1.54858287e+00,  6.84139397e-01,
        -1.05465004e+00, -1.94172190e+00,  1.27498524e+00],
       [-2.28479278e+00, -1.54696956e+00,  1.40169462e+00,
        -1.82849460e+00, -1.80585029e+00,  1.84153895e+00],
       [-4.13371668e-01, -1.19533632e+00,  2.14491224e-01,
        -4.03248568e-01, -1.00633215e+00,  1.55218767e+00],
       [ 9.14953837e-01, -4.32207224e-01, -7.37812544e-01,
        -1.56417523e-01,  4.56918025e-01, -2.04746545e-01],
       [-9.97292725e-01, -9.48753868e-01,  2.41068305e-01,
        -4.28912502e-01, -7.31326728e-01,  1.25170028e+00],
       [-2.34543767e+00, -1.84777037e+00, -9.86176808e-01,
        -1.78656312e+00, -2.53573545e+00,  1.38310243e+00],
       [-3.73703914e-01,  1.96974821e+00,  1.42562908e-01,
        -3.20385557e-01,  6.54325820e-01, -6.92023258e-01],
       [ 1.77154910e+00,  2.35265039e+00, -4.67063822e-01,
         1.03247984e+00,  1.14649320e+00, -1.04451564e+00],
       [ 2.80203585e-01,  9.05448548e-02,  6.35362397e-01,
         7.94229726e-01, -1.33953253e-01,  9.48051851e-01],
       [ 4.42795821e-02, -9.80157490e-02, -3.14236903e-01,
        -5.89667840e-01, -7.02608759e-01, -1.27607217e+00],
       [-2.31326176e-01, -5.18900999e-01,  4.54885264e-01,
         8.00835190e-02, -4.13592167e-01,  4.57931896e-01],
       [-5.46873897e-01, -3.50548290e-01,  4.54961929e-02,
        -2.41026933e-01, -5.12150647e-01,  3.24134813e-01],
       [ 5.01536367e-01,  2.45845230e+00,  1.11553059e+00,
         1.15950961e+00,  2.57256716e-01, -8.16295581e-01],
       [ 2.71938293e+00, -3.24968745e-01,  2.62289434e+00,
         1.64105820e+00, -4.13853194e-01, -1.30424830e+00],
       [ 4.13223226e-01,  1.26185094e-01,  6.32126041e-01,
         7.13201592e-01,  7.01723227e-01, -4.52718023e-01],
       [ 8.30740701e-01,  7.93206094e-01,  8.85967884e-01,
         1.32457083e+00,  5.78600973e-02, -8.01723499e-01],
       [ 2.78207349e-03, -3.91481883e-01, -1.42321175e+00,
        -7.68578794e-01,  5.60128950e-01, -1.85592263e-01],
       [ 9.99214987e-01,  1.14264592e+00, -5.03484910e-01,
         4.16424882e-01,  7.87002143e-01, -1.14459949e+00],
       [-5.63568276e-01,  1.58499239e-01,  6.15121148e-01,
        -1.32519642e+00, -1.59643308e-01, -1.41011204e-01],
       [-2.64686613e-01,  5.73815584e-01,  9.02861806e-01,
         1.24289248e+00,  2.63312668e-01, -1.08285957e-01],
       [ 7.36033354e-01,  1.51757097e+00, -9.55616532e-01,
         7.97048121e-02,  8.03057720e-01, -7.86508306e-01],
       [ 7.98585653e-01,  7.63343834e-01,  1.53908709e+00,
         1.74000469e+00,  6.02532075e-01, -6.86214329e-01],
       [ 1.93697825e-01,  5.74194792e-01,  2.32295345e+00,
         1.17977073e+00, -4.22511086e-02, -3.27218433e-01],
       [ 7.49658561e-01,  1.32402063e+00,  4.11615487e-01,
         1.43306852e+00,  1.31550672e+00, -1.63551190e+00],
       [-6.58641059e-01, -9.62056508e-01, -4.01955754e-01,
        -2.56596972e-01, -8.50119888e-01, -3.04841967e-01],
       [ 6.38522267e-01,  9.02912210e-01, -4.52383291e-01,
         1.45879310e+00, -5.36430789e-01,  5.04308110e-01],
       [ 2.25412405e+00,  1.26062345e+00, -2.70119627e+00,
        -2.28236281e-01, -1.43287852e+00,  1.60023489e+00],
       [ 6.18399411e-01,  6.69405557e-01,  2.13101597e+00,
         1.34271879e+00,  9.37555002e-01, -6.43560376e-01],
       [ 3.89600817e-01,  4.26595899e-01,  1.26236199e+00,
         6.67435614e-01,  1.14903272e-01, -2.51113270e-01],
       [-2.93314794e+00, -1.54457592e+00, -4.06682329e-01,
        -2.27125929e+00, -3.40943264e+00,  2.59972243e+00],
       [ 2.80645148e-01, -1.58584033e-01,  4.77542547e-01,
        -7.50851189e-01, -3.29725724e-01,  6.80465213e-01],
       [ 5.45997570e-01, -8.83124919e-01, -1.77862601e-01,
        -8.02041658e-01, -4.92211395e-02,  2.49417221e-01],
       [ 5.07990413e-01,  1.81230488e+00, -1.59848407e+00,
         3.01359819e-01,  1.28981667e+00, -1.30511626e+00],
       [ 6.41032672e-01,  7.25393078e-01,  5.87751098e-01,
         1.37225816e-01,  4.42522634e-01, -2.64987231e-01],
       [ 2.46243012e-01, -8.25605623e-01, -9.66875108e-01,
        -3.21222129e-01,  7.15671917e-02, -1.88649381e-01],
       [-2.16691003e+00, -1.47356459e+00, -6.04721112e-01,
        -2.26470128e+00, -1.72079496e+00,  1.05846299e+00],
       [-2.14333635e-01, -3.06264997e-01,  5.71263043e-01,
         5.27578736e-01, -8.70086780e-02,  4.00978766e-01],
       [-1.71155996e+00, -1.35866849e+00,  7.66843214e-01,
        -1.35821786e+00, -2.53682878e+00,  2.33003813e+00],
       [-2.72949807e-01, -6.52074448e-01,  4.39292077e-01,
        -4.99890346e-01, -3.05828277e-01,  3.51667405e-01],
       [ 1.05228158e+00,  7.64781929e-01,  2.59030425e-01,
         1.53251597e+00,  5.51933518e-01, -1.39435184e+00],
       [-3.40666694e-01, -1.02394470e-01,  7.10133590e-01,
         1.57309470e-01, -1.23213130e-01,  8.55885418e-01],
       [-7.16844000e-01, -1.84076537e-01, -2.56369869e+00,
         9.33426133e-01,  1.55492416e+00, -1.24951587e+00],
       [-8.68574473e-02, -7.16864763e-01,  4.07809474e-01,
         7.89535817e-01,  1.54761863e+00, -8.79109984e-01],
       [-6.44915163e-01, -4.09765378e-01,  3.00150506e-01,
        -8.68784983e-01, -1.05038802e+00,  1.60479716e+00],
       [ 1.09817702e+00, -9.47379435e-02,  1.29748466e+00,
         4.36084315e-01,  2.60967426e-03, -1.26638571e+00],
       [ 1.61298931e-01,  4.45696781e-01, -2.95432646e+00,
         9.23988198e-01, -1.84744710e-01, -3.74400569e-01],
       [ 9.30834348e-01,  9.88358987e-01, -5.37702293e-01,
         1.22332637e+00,  1.10733781e+00, -1.41475499e+00],
       [ 1.49008100e+00,  1.51715112e+00,  1.10412350e+00,
         1.43369595e+00,  1.39978813e+00, -1.31015656e+00],
       [ 2.52791134e-01,  2.42628706e-02,  4.89546495e-01,
         1.35651975e-01,  1.00990087e-01,  3.26616517e-01],
       [-4.28650440e-01, -4.15575521e-01, -1.49066157e-01,
         6.49273481e-02,  1.27758176e+00, -1.24305429e+00],
       [ 2.81747471e-01, -1.51138242e-02,  1.18620310e+00,
         6.11550108e-01,  6.57643438e-01, -2.07875846e-01],
       [-1.62860149e+00, -1.29204685e+00,  8.76055007e-01,
        -8.61185510e-01,  2.72884070e-01, -3.00251579e-01],
       [-2.89775591e-02, -4.58944935e-02,  1.86393205e-01,
         2.83582437e-01, -2.33198855e-01,  5.95843693e-01],
       [ 1.11291967e+00,  1.32099448e+00,  9.23389281e-01,
         9.76095144e-01,  1.53256874e+00, -1.05584050e+00],
       [ 5.49941708e-01, -1.06404494e-01, -2.03025104e+00,
         3.69783876e-01, -5.08036517e-01,  1.04620831e+00],
       [ 1.60547630e-01, -3.88542941e-01, -1.14117162e+00,
        -2.44514578e+00,  1.28530964e+00, -2.04951200e-01],
       [-2.91278134e-01,  2.03615734e-01,  4.31891105e-01,
        -1.20716914e-01, -2.05275379e-01,  2.09543280e-01],
       [-8.39879102e-02, -1.42133957e-01,  2.21155977e-01,
        -3.75182043e-01, -1.24610121e-01, -1.79113271e-01],
       [ 1.06734102e+00,  5.48559756e-01, -3.06610248e+00,
         2.97031258e-01,  1.99696925e+00, -7.44713373e-01],
       [ 1.16379315e+00,  1.37776129e+00, -1.00397095e+00,
         8.44958686e-01,  5.73199329e-01, -1.48322085e-01],
       [-6.74534146e-01,  5.35579285e-02, -1.02887208e+00,
        -6.98515922e-01,  1.35336575e+00, -1.24003431e+00],
       [ 5.87231749e-01,  1.03348521e+00, -3.23060161e-01,
         7.06434607e-01, -6.90571122e-01,  6.32351230e-01],
       [-3.80270812e-01, -1.41093100e+00,  1.79398017e-01,
        -9.56948230e-01, -3.16037155e-01,  9.37975090e-01],
       [-3.37908510e-02, -1.59015043e+00, -7.77171529e-01,
         7.02532371e-01, -5.74271117e-01,  1.06204838e-01],
       [ 7.96988228e-01, -5.49345061e-01, -1.91604644e-01,
         4.00940523e-01,  9.45183094e-01, -1.96264956e+00],
       [ 6.74002805e-01,  3.34517314e+00, -2.72192267e-01,
        -1.00009998e+00,  6.06100628e-01, -4.82884217e-01],
       [-5.10347549e-01, -4.48581330e-01, -3.15824134e-01,
        -4.92510164e-01, -6.40438031e-03,  1.96856875e-01],
       [ 1.25630113e+00,  2.15658096e+00, -2.47587397e-01,
         7.96093374e-01,  1.03336681e+00, -3.38756851e-01],
       [ 9.78453000e-01,  1.54057068e-01,  1.08356714e+00,
         7.36095370e-01,  9.16510695e-01, -6.55529139e-01],
       [ 1.44884682e+00,  2.05497753e+00, -1.00373159e+00,
         1.75807659e+00,  3.99678778e-01, -5.97764058e-01],
       [ 4.48111213e-01,  2.45009017e-01,  4.14817282e-02,
         1.88084347e+00,  1.86942038e+00, -1.87495438e+00],
       [ 6.39264986e-01, -1.73817839e-01, -2.48957355e+00,
        -2.70478669e-01,  3.67707787e-01, -7.82936259e-02],
       [ 6.25956022e-01,  7.08942028e-01, -4.92139556e-01,
         1.55959998e+00,  1.60936490e+00, -1.37707138e+00],
       [ 3.46468406e-03, -5.80221591e-02,  8.56699378e-01,
        -1.57311556e+00,  7.02150836e-02,  4.21075187e-01],
       [ 7.62039740e-01,  2.55262608e-01,  8.58026425e-01,
         1.34394919e+00, -1.23652422e-01, -3.53544690e-01],
       [-1.17816393e-01, -2.25268232e-01, -5.08726992e-01,
         4.02392764e-01,  7.69069752e-01, -9.54373891e-02],
       [ 5.71631132e-01,  1.12726832e+00,  1.37285954e+00,
         1.29921707e+00,  1.54463075e-01, -6.50961458e-01],
       [-3.03161070e+00, -1.35627621e+00,  1.28228457e+00,
        -1.58863061e+00, -8.68305156e-01, -1.17706720e-01],
       [-3.19895877e-01, -2.73070121e-01,  1.10344469e-01,
         7.32221235e-01,  4.21126285e-01, -1.24111849e-01],
       [-5.35541037e-01, -1.04817560e+00,  1.13107674e-01,
        -5.19481536e-01, -1.07653918e+00,  1.45166915e+00],
       [ 1.37749376e+00,  2.06967224e+00,  1.03342075e+00,
         3.60129778e-01,  1.18125200e+00, -1.72294656e+00],
       [-5.58719885e-01, -9.33086936e-01,  3.27320542e-01,
         9.97825305e-02, -5.34177276e-01, -1.53553186e-01],
       [-2.17756147e-01, -2.83057826e-01, -1.15057885e+00,
        -9.02011964e-01,  1.22040845e+00, -7.45152709e-01],
       [-9.66288743e-01, -9.95747909e-01,  7.60416248e-01,
         5.80423774e-01,  1.24447538e-01, -7.22197418e-01],
       [-8.48884276e-01, -1.14344048e+00,  1.10948869e-01,
         3.84525108e-01, -8.42457942e-01, -7.20442253e-04],
       [ 5.11217435e-01, -9.29693301e-02,  3.85360723e-01,
         1.45695782e-01,  1.08564833e+00, -6.99833046e-01],
       [-1.96303916e+00, -6.90117766e-01,  2.70115427e-01,
        -2.31438314e+00, -1.90070795e+00,  2.53689337e+00],
       [-6.12862155e-01,  3.95314916e-01, -6.28826401e-01,
         1.00432558e+00,  1.21454931e+00, -1.14703650e+00],
       [-6.40820778e-01,  1.76738105e+00, -4.34158845e-01,
        -1.95880190e+00, -6.46852958e-01,  6.48396695e-01],
       [-6.24754977e-02,  2.36955791e-01, -7.06181374e-01,
        -5.99382187e-01,  6.53424414e-01, -2.30024611e-01],
       [ 1.23489386e+00, -1.47549684e-01,  5.87986383e-02,
         3.20010005e-01,  4.58338698e-01, -9.66638515e-01],
       [ 5.26336158e-01,  2.17542370e-01,  6.01233186e-01,
         6.01633925e-01,  1.25008009e+00, -4.60496924e-01],
       [ 4.42732842e-01, -2.28751942e+00, -1.07182966e-01,
        -1.24856175e+00, -6.95979555e-01,  1.02781782e+00],
       [ 2.44440855e-01,  2.12100527e-01,  8.73219238e-01,
         2.12948099e+00,  9.10159412e-01, -6.83801744e-01],
       [ 9.20630362e-02,  1.79092743e-01, -6.04521843e-01,
         3.93210020e-01, -5.10872875e-02, -9.01347151e-01],
       [-1.90500018e-01, -3.77676214e-01,  1.40173167e+00,
        -6.76644044e-01, -5.94877340e-01,  1.16431008e+00],
       [ 1.22055326e+00,  5.76189492e-01,  1.13972337e+00,
         1.74745081e-01,  3.19772526e-01,  3.11688572e-01],
       [-2.23343653e-01, -1.15253437e-01,  3.92435213e-01,
        -2.85070809e-01,  5.49514272e-01, -4.71387384e-01],
       [-2.03419071e+00, -6.26516482e-01, -2.05213605e-01,
        -1.01632511e+00, -1.22356368e+00,  2.09040279e-01],
       [-2.48208585e-01,  6.37987567e-01,  8.25699195e-01,
        -1.03990657e+00, -6.57219120e-01,  7.03108275e-01],
       [ 1.33278925e-01, -2.50996324e-01,  2.49996081e-01,
         6.94763834e-01,  4.59955378e-01, -1.38575503e-01],
       [-1.25145410e+00,  3.17642873e-01, -2.16452239e-02,
        -1.11261887e+00, -1.33810528e+00,  2.84212909e-01],
       [ 1.05412429e-01,  9.62104753e-01, -9.51999530e-03,
         2.31124272e-01,  9.29254484e-01, -5.84621864e-01],
       [-2.82752595e-01, -1.00763038e-01,  1.49771477e-01,
        -7.81127368e-01,  1.10600040e+00, -1.03354257e+00],
       [ 6.23017701e-02, -4.69149676e-02,  6.56474606e-01,
         9.25057384e-01,  4.09338098e-01, -3.12559651e-01],
       [-7.51786133e-01, -9.03077530e-01,  4.36249525e-01,
        -7.56957751e-01, -1.52692591e+00,  2.13425814e+00],
       [-5.52618543e-01, -8.24666042e-01,  6.60124236e-01,
        -8.96155963e-01, -6.08543226e-01,  1.29536595e-02],
       [-6.58158306e-01, -1.23012960e+00, -8.82904700e-01,
        -8.24210805e-01,  1.25756607e-03,  3.65410610e-01],
       [-3.55771528e-01, -9.40035435e-01, -7.09654842e-01,
         6.27766387e-01, -5.05732477e-02,  4.99329011e-02],
       [ 6.14123605e-01,  2.08646620e+00,  1.48454871e+00,
         1.20886733e+00,  8.52635020e-01, -1.24475213e+00],
       [-6.55693853e-01, -2.18584595e-01,  2.47348392e-01,
        -1.89314446e-01, -5.02205260e-01,  9.44491123e-02],
       [ 1.58617457e+00,  7.97110292e-01,  1.18959191e-01,
        -4.41061011e-01, -5.62120845e-01,  7.86911707e-01],
       [-4.41617876e-01, -7.18726629e-01,  8.28438586e-01,
        -3.40672419e-01, -2.99811857e-01,  2.46776904e-01],
       [-4.38669427e-01,  2.17491065e-01,  2.25606682e-01,
        -2.14202060e-02, -1.09095078e-01,  6.85098299e-01],
       [-1.03488502e+00, -1.24419495e+00, -1.36924538e-01,
        -5.88683983e-02, -1.11152747e+00,  1.95226460e+00],
       [ 6.13047931e-01,  5.96422528e-01, -3.11757828e+00,
         3.91746020e-01,  9.27502200e-02,  5.18426272e-02],
       [-5.18330456e-02,  1.37692159e+00, -1.08294970e+00,
        -4.04799959e-01, -3.09276614e-01,  4.24008514e-01],
       [-5.58457752e-01, -5.02145516e-01,  1.42395728e-01,
        -6.84205280e-01,  8.81843285e-03, -1.96627258e-02],
       [-6.17285171e-02, -1.10134851e-01,  1.22496465e+00,
        -9.72875943e-01, -1.35134053e-01,  4.23881862e-01],
       [-4.19565064e-01, -3.28739566e-01,  3.05450342e-01,
        -4.13423496e-01,  3.23716320e-02,  1.30353765e-01],
       [ 3.06583623e-01, -9.08665447e-01,  2.17643348e-01,
         3.84861943e-01,  6.21764978e-02,  1.89300542e-01],
       [-1.16145602e+00, -6.78658516e-01, -6.20818622e-01,
        -5.48560462e-01, -1.94758103e+00,  2.39231451e+00],
       [-5.33620819e-01, -1.70141038e+00,  6.17839206e-01,
        -1.30733091e+00, -2.24324202e+00,  3.18795067e+00],
       [-1.73552203e-02, -4.40813569e-01,  4.55334390e-01,
         4.15213547e-01,  2.72688684e-01, -1.29939186e-03],
       [-1.08938585e+00, -5.32551106e-01, -7.13046565e-01,
        -3.15592112e-02,  8.25592857e-01, -5.20990424e-01],
       [-1.02076352e+00, -9.12055617e-01, -5.51805464e-01,
        -9.14764652e-01, -2.09721434e+00,  1.00236397e+00]])
# Initialise the Principal component analysis (PCA) algorithm with 2 components
pca = PCA(n_components=2)

# Apply the dimensionality reduction on the six mobility categories
pca_components = pca.fit_transform(mobility_trends_UK_standardised)

# Transformed values arranged as observations/samples in rows
# and number of components in columns
pca_components
array([[ 3.05048664e+00, -3.17134262e-01],
       [ 4.05220779e-02, -3.23107889e-01],
       [-1.66728899e+00, -9.58314439e-03],
       [ 1.87640960e-01,  2.20371174e+00],
       [-9.36968706e-01,  5.59535600e-01],
       [-3.10038586e+00, -8.31210709e-01],
       [-1.80731273e+00,  5.21006053e-01],
       [ 3.72895574e+00, -6.37136213e-01],
       [ 9.73241512e-01,  6.59415885e-01],
       [ 1.82555443e+00,  7.43200181e-01],
       [-7.88387434e-01,  2.10113385e+00],
       [-2.64694215e+00, -1.56654726e-01],
       [-5.21128998e-01,  7.71996986e-01],
       [ 2.88282276e+00,  3.69933394e-02],
       [ 7.95806245e-02,  6.71455150e-01],
       [ 3.09657291e+00, -7.88220843e-01],
       [ 4.17246911e+00, -1.29549204e+00],
       [ 2.05759103e+00, -3.35013940e-01],
       [-4.76628278e-01,  7.88229982e-01],
       [ 1.95474734e+00, -3.13181233e-01],
       [ 4.42040955e+00,  8.64391255e-01],
       [-1.13652904e+00,  1.69484026e-01],
       [-3.23532966e+00,  5.63305600e-01],
       [ 6.64462577e-03, -8.03614441e-01],
       [ 6.06777091e-03,  3.45992357e-01],
       [ 6.94045764e-01, -5.52641275e-01],
       [ 8.86133168e-01, -8.86819381e-02],
       [-2.23501670e+00, -1.11539958e+00],
       [-2.19232116e+00, -2.93389124e+00],
       [-1.08759838e+00, -6.72046788e-01],
       [-1.67095965e+00, -1.08452008e+00],
       [ 1.34098351e-01,  1.59032516e+00],
       [-2.00673439e+00,  5.90240968e-01],
       [ 7.74526602e-01, -2.88043344e-01],
       [-8.30288715e-01, -1.09432152e+00],
       [-1.73796984e+00,  1.13179753e+00],
       [-2.02516454e+00, -1.74505724e+00],
       [-9.57214898e-01, -2.46917047e+00],
       [-2.89027586e+00, -4.58575296e-01],
       [ 1.05349976e+00,  2.49381531e-01],
       [-7.99147025e-01,  9.72393052e-02],
       [ 3.23777523e-03,  2.61323349e+00],
       [-1.86889407e+00, -2.18463821e+00],
       [-8.04253563e-01, -1.31575944e+00],
       [ 5.75801802e+00,  3.29549538e-01],
       [ 7.47404868e-01, -3.22291435e-01],
       [ 6.09390139e-01,  3.01676098e-01],
       [-2.33252485e+00,  1.77300318e+00],
       [-9.71888291e-01, -4.70200550e-01],
       [ 2.41040653e-01,  9.53640948e-01],
       [ 3.86003082e+00,  7.50740681e-01],
       [ 2.23720925e-01, -7.14428100e-01],
       [ 4.20927731e+00, -8.65742676e-01],
       [ 9.18627776e-01, -3.94862053e-01],
       [-2.36462470e+00, -4.70201776e-01],
       [ 5.90062753e-01, -7.48040735e-01],
       [-1.35237809e+00,  2.39500247e+00],
       [-1.15655267e+00, -4.46302266e-01],
       [ 2.08749666e+00, -2.52827162e-01],
       [-1.22947606e+00, -1.33521768e+00],
       [-7.61608088e-01,  2.62135608e+00],
       [-2.54409588e+00,  4.53659633e-01],
       [-3.18022546e+00, -1.07995351e+00],
       [-7.20799620e-02, -4.81291206e-01],
       [-8.59679527e-01,  2.44635689e-01],
       [-7.87912106e-01, -1.19741052e+00],
       [ 1.35962335e+00, -7.33178259e-01],
       [ 3.00396002e-01, -2.80062111e-01],
       [-2.67717567e+00, -8.01910652e-01],
       [ 3.61905954e-01,  1.79706169e+00],
       [ 4.43242489e-01,  1.82238891e+00],
       [ 2.96461767e-01, -4.00456765e-01],
       [ 2.34515840e-01, -1.50004892e-01],
       [-2.13867296e+00,  3.18389569e+00],
       [-1.79419007e+00,  9.62754689e-01],
       [-6.49185149e-01,  1.31426108e+00],
       [-3.67981795e-01,  1.45779196e-01],
       [ 1.75616656e+00, -9.69855787e-02],
       [ 6.77395816e-01,  3.76009489e-01],
       [-1.67671887e+00,  1.76192970e-01],
       [-1.74816933e+00,  8.56887623e-01],
       [ 7.18599097e-01,  3.75520941e-01],
       [-2.43619115e+00,  3.70889175e-01],
       [-1.55417418e+00, -1.06906504e+00],
       [-2.72572155e+00,  7.77428437e-01],
       [-2.88694212e+00, -2.34139233e-01],
       [-3.18055273e-01,  2.49800819e+00],
       [-2.66060406e+00,  3.63534553e-01],
       [ 8.77899240e-01, -4.33989066e-01],
       [-1.13158151e+00, -1.12833394e+00],
       [-4.42362271e-01,  4.66876540e-01],
       [-1.65391170e+00, -1.51175211e+00],
       [ 2.95225548e+00, -1.12539691e+00],
       [-3.22774726e-01, -2.58690709e-01],
       [ 2.08733956e+00, -2.09796876e-01],
       [-2.98045249e+00, -7.39130691e-01],
       [ 7.67894239e-01, -4.91248794e-01],
       [-3.29242285e-01,  1.44777214e+00],
       [ 1.87126923e-01, -9.54311115e-01],
       [ 1.07515504e+00, -4.14535492e-01],
       [-1.09334242e+00, -2.64796296e-01],
       [ 4.25783475e+00, -2.95174027e-02],
       [-1.44701719e+00,  5.28405381e-01],
       [ 1.02339970e+00,  9.35920489e-01],
       [-2.29348699e-01,  9.21840129e-01],
       [-1.29844792e+00, -6.28606110e-02],
       [-1.38960255e+00, -5.36291856e-01],
       [ 2.09824297e+00,  1.49265015e-01],
       [-1.87165243e+00, -1.21773671e+00],
       [-6.89792016e-01,  4.91731539e-01],
       [ 1.37198238e+00, -1.28599091e+00],
       [-8.50761161e-01, -1.02196425e+00],
       [-2.09805119e-01, -2.52067293e-01],
       [ 2.27652791e+00,  1.95688169e-01],
       [ 9.48834294e-01, -5.75995451e-01],
       [-5.40630961e-01, -3.67055405e-01],
       [ 1.67585030e+00,  1.19838815e-01],
       [-1.25665227e+00,  1.47519686e-01],
       [-5.02861307e-01,  1.70139533e-01],
       [-7.49134915e-01, -8.04724329e-01],
       [ 2.76393315e+00, -5.10814314e-01],
       [ 1.27038505e+00, -5.68920759e-01],
       [ 1.32154816e+00,  9.36154333e-01],
       [ 3.12556428e-01,  4.39868270e-01],
       [-2.63579992e+00, -1.43324838e+00],
       [ 7.45754614e-01, -2.85829995e-01],
       [-1.98917555e-01,  2.01538157e-02],
       [ 9.00949676e-01, -8.14746764e-01],
       [ 4.90567982e-01, -2.15832748e-01],
       [ 2.43937985e+00, -1.11579142e-01],
       [-7.22385265e-01,  2.96453502e+00],
       [-2.38463941e-02,  1.20484219e+00],
       [ 7.38469276e-01, -2.08238125e-02],
       [ 7.66860023e-01, -9.64093589e-01],
       [ 5.48432138e-01, -2.22903710e-01],
       [ 1.29598158e-01, -3.58566640e-01],
       [ 3.07770134e+00,  4.07670688e-01],
       [ 4.07090164e+00, -6.98024160e-01],
       [-1.19942157e-01, -5.37613472e-01],
       [ 7.43040484e-02,  7.29267596e-01],
       [ 2.68592412e+00,  3.93415496e-01]])
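Before clustering on only two components, it can be worth checking how much of the variation in the six standardised variables those components retain. The sketch below does this with `PCA.explained_variance_ratio_`. The random data are a hypothetical stand-in for `mobility_trends_UK_standardised`, so the numbers will not match the lab's.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the standardised mobility data:
# 141 observations x 6 variables generated from two latent factors plus noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(141, 2))
mixing = rng.normal(size=(2, 6))
X_demo = StandardScaler().fit_transform(latent @ mixing + 0.3 * rng.normal(size=(141, 6)))

pca_demo = PCA(n_components=2)
components_demo = pca_demo.fit_transform(X_demo)

# Share of the total variance captured by each retained component
print(pca_demo.explained_variance_ratio_)
# Cumulative share retained by the two components together
print(pca_demo.explained_variance_ratio_.sum())
```

A low cumulative share suggests that the two-dimensional view may hide structure that k-means on all six variables would use.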

Now we can run the k-means algorithm on the two principal components:

k = 4
kmeans_k4_pca = KMeans(
    n_clusters=k, init="k-means++", n_init=10, max_iter=300, random_state=0
)
kmeans_k4_pca.fit(pca_components)
KMeans(n_clusters=4, random_state=0)
# Labels of the clusters to which each observation was assigned
kmeans_k4_pca.labels_
array([3, 0, 1, 2, 2, 1, 1, 3, 0, 0, 2, 1, 2, 3, 0, 3, 3, 3, 2, 3, 3, 1,
       1, 0, 0, 0, 0, 1, 1, 1, 1, 2, 1, 0, 1, 2, 1, 1, 1, 0, 1, 2, 1, 1,
       3, 0, 0, 2, 1, 0, 3, 0, 3, 0, 1, 0, 2, 1, 3, 1, 2, 1, 1, 0, 1, 1,
       0, 0, 1, 2, 2, 0, 0, 2, 2, 2, 0, 0, 0, 1, 2, 0, 1, 1, 1, 1, 2, 1,
       0, 1, 0, 1, 3, 0, 3, 1, 0, 2, 0, 0, 1, 3, 1, 0, 2, 1, 1, 3, 1, 2,
       0, 1, 0, 3, 0, 0, 0, 1, 0, 1, 3, 0, 0, 0, 1, 0, 0, 0, 0, 3, 2, 2,
       0, 0, 0, 0, 3, 3, 0, 0, 3], dtype=int32)
# Add the 4-cluster assignment on the PCA components to your DataFrame
mobility_trends_UK_mean_NaNdrop["clusters_k4_pca"] = kmeans_k4_pca.labels_
mobility_trends_UK_mean_NaNdrop
Retail_Recreation Grocery_Pharmacy Parks Transit_stations Workplaces Residential clusters clusters_k4 clusters_k4_pca
sub_region_1
Aberdeen City -50.046371 -10.722567 20.557692 -46.127016 -42.489919 14.567010 2 3 3
Aberdeenshire -28.253669 -11.248447 22.474684 -39.953878 -37.207661 12.222222 0 2 0
Angus Council -25.955975 -6.125786 13.982143 -31.150943 -33.542339 10.831551 1 0 1
Antrim and Newtownabbey -29.377358 -7.465409 -29.134328 -53.752621 -33.679435 12.859031 0 1 2
Ards and North Down -27.262055 0.452830 6.838298 -41.721311 -35.991935 12.679039 1 1 2
... ... ... ... ... ... ... ... ... ...
Windsor and Maidenhead -42.714885 -11.178197 0.379455 -43.693920 -43.711694 16.709220 2 3 3
Wokingham -39.044025 -16.285115 30.458101 -51.299790 -45.034274 18.237327 2 3 3
Worcestershire -36.025497 -9.990563 26.511954 -34.033107 -33.779758 12.112019 0 2 0
Wrexham Principal Area -42.293501 -10.448637 -1.860140 -38.511530 -31.306452 11.113895 0 2 0
York -41.892276 -12.343621 2.055319 -47.364729 -44.381048 14.039666 2 3 3

141 rows × 9 columns
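k-means numbers its clusters arbitrarily, so `clusters_k4` and `clusters_k4_pca` can describe much the same grouping under different label numbers. For example, Aberdeenshire and Worcestershire are in cluster 2 in one run and cluster 0 in the other. A cross-tabulation with `pandas.crosstab` shows how the two assignments line up. The short label lists below are hypothetical. In the lab, you would pass the two columns of `mobility_trends_UK_mean_NaNdrop` instead.

```python
import pandas as pd

# Hypothetical cluster labels for eight observations from two k-means runs
labels_all_vars = pd.Series([3, 2, 0, 1, 1, 3, 2, 2], name="clusters_k4")
labels_pca = pd.Series([3, 0, 1, 2, 2, 3, 0, 1], name="clusters_k4_pca")

# Rows: labels from the run on all six variables
# Columns: labels from the run on the two principal components
table = pd.crosstab(labels_all_vars, labels_pca)
print(table)
```

Most observations fall in one cell per row. This indicates that the runs largely agree, up to a relabelling of the clusters.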

Visualising mobility clusters

Let’s plot the resulting clusters along the two principal components using a scatter plot.

# Set figure size
plt.figure(figsize=(11.7, 8.27))

# Scatterplot with the 1st principal component on the horizontal x axis
# and the 2nd principal component on the vertical y axis
grid = sns.scatterplot(
    x=pca_components[:, 0],
    y=pca_components[:, 1],
    hue=kmeans_k4_pca.labels_,
    alpha=0.8,
    s=120,
)

# Add labels to the horizontal x axis and vertical y axis
labels = grid.set(xlabel="1st principal component", ylabel="2nd principal component")

# Plot the cluster centroids
sns.scatterplot(
    x=kmeans_k4_pca.cluster_centers_[:, 0],
    y=kmeans_k4_pca.cluster_centers_[:, 1],
    hue=range(k),
    s=220,
    alpha=0.8,
    ec="black",
    legend=False,
)

# Add title 'Cluster' to the legend and locate it in the upper right of the plot
plt.legend(title="Cluster", loc="upper right")
[Figure: UK counties on the 1st and 2nd principal components, coloured by k-means cluster (k = 4); larger dots mark the cluster centroids]

In the figure above, we plot the 1st principal component against the 2nd principal component derived from the six mobility variables. Each data point is a UK county. Larger dots represent the cluster centroids, which are typically not data points themselves. Colours indicate cluster assignment.
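In k-means, a centroid is the mean of the observations currently assigned to its cluster. This is why it usually does not coincide with any county. The toy check below uses scikit-learn's `KMeans` on six made-up points: it compares `cluster_centers_` with the group means after fitting and tests whether any centroid equals an actual observation.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: two well-separated groups of three points each
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)

kmeans_toy = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Each centroid is the mean of the points assigned to its cluster
group_means = np.array([X[kmeans_toy.labels_ == c].mean(axis=0) for c in range(2)])
print(kmeans_toy.cluster_centers_)

# Check whether any centroid coincides with an actual observation
is_data_point = [any(np.allclose(center, x) for x in X) for center in kmeans_toy.cluster_centers_]
print(is_data_point)
```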

The figure above lacks county labels, which we need in order to interpret the results of k-means clustering.

Let’s add labels to data points so that we can associate each county with its name.

# Enlarge figure size
plt.figure(figsize=(32, 24))

# Scatterplot with the 1st principal component on the horizontal x axis
# and the 2nd principal component on the vertical y axis
grid = sns.scatterplot(
    x=pca_components[:, 0],
    y=pca_components[:, 1],
    hue=kmeans_k4_pca.labels_,
    alpha=0.9,
    s=120,
)

# Add labels to the horizontal x axis and vertical y axis
labels = grid.set(xlabel="1st principal component", ylabel="2nd principal component")

# Plot the cluster centroids
sns.scatterplot(
    x=kmeans_k4_pca.cluster_centers_[:, 0],
    y=kmeans_k4_pca.cluster_centers_[:, 1],
    hue=range(k),
    s=240,
    alpha=0.8,
    ec="black",
    legend=False,
)

# This for loop adds the county name next to each data point
for line in range(mobility_trends_UK_mean_NaNdrop.shape[0]):
    grid.text(
        pca_components[line, 0] + 0.1,  # x position of the label
        pca_components[line, 1],  # y position of the label
        mobility_trends_UK_mean_NaNdrop.index[line],  # county name as label text
        horizontalalignment="left",
        size="small",
        color="black",
        weight=None,
    )

# Add title 'Cluster' to the legend and locate it in the upper right of the plot
plt.legend(title="Cluster", loc="upper right");
[Figure: the same plot enlarged, with each data point labelled with its county name]
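The labelling loop above shifts each label by a fixed 0.1 in data units. An alternative uses Matplotlib's `Axes.annotate` with `textcoords="offset points"`. This keeps the gap between dot and label constant on screen, whatever the axis range. The sketch below shows this approach; the coordinates and names are hypothetical placeholders for `pca_components` and `mobility_trends_UK_mean_NaNdrop.index`.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical 2-D coordinates and names
coords = np.array([[3.0, -0.3], [0.0, -0.3], [-1.7, 0.0]])
names = ["County A", "County B", "County C"]

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(coords[:, 0], coords[:, 1], s=60)

# Offset each label by 4 points (not data units) to the right of its dot
for (x, y), name in zip(coords, names):
    ax.annotate(
        name,
        xy=(x, y),
        xytext=(4, 0),
        textcoords="offset points",
        ha="left",
        va="center",
        fontsize="small",
    )

ax.set(xlabel="1st principal component", ylabel="2nd principal component")
```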

Because PCA transforms our six variables into a two-dimensional space, we can no longer see how a particular cluster or county is positioned with respect to any individual mobility category.
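One way to recover this interpretation is to describe each cluster by the average of the original, unstandardised variables among its members. You can do this, for example, with `groupby` on the `clusters_k4_pca` column of `mobility_trends_UK_mean_NaNdrop`. The sketch below uses hypothetical values for two of the six variables.

```python
import pandas as pd

# Hypothetical county-level means for two mobility variables,
# plus the cluster label assigned by k-means on the principal components
df = pd.DataFrame(
    {
        "Retail_Recreation": [-50.0, -28.3, -26.0, -42.7, -39.0, -36.0],
        "Workplaces": [-42.5, -37.2, -33.5, -43.7, -45.0, -33.8],
        "clusters_k4_pca": [3, 0, 1, 3, 3, 0],
    },
    index=["A", "B", "C", "D", "E", "F"],
)

# Mean of each original variable within each cluster: this puts the clusters
# back on the original, interpretable scale of the mobility variables
profile = df.groupby("clusters_k4_pca")[["Retail_Recreation", "Workplaces"]].mean()
print(profile.round(2))
```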

If you need to cluster counties with respect to a specific pair of variables, you can run k-means on just those two variables and plot the cluster assignment along them. For example, below we run the k-means algorithm on two variables: retail and recreation mobility and workplaces mobility.

# We first fit k-means to two variables, retail_recreation and workplaces,
# using the standardised data. We set the number of clusters to k = 4,
# but keep in mind that we did not perform the Elbow method
# on these two variables in particular.

k = 4
kmeans_k4_2vars = KMeans(
    n_clusters=k, init="k-means++", n_init=10, max_iter=300, random_state=0
)

# Column 0 holds the retail_recreation mobility variable
# and column 4 holds the workplaces mobility variable
kmeans_k4_2vars.fit(mobility_trends_UK_standardised[:, [0, 4]])
KMeans(n_clusters=4, random_state=0)
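The comment above notes that k = 4 was not checked for these two variables. A minimal elbow-method check fits `KMeans` for a range of k values and records `inertia_`, the within-cluster sum of squared distances to the nearest centroid. The random two-column array is a hypothetical stand-in for `mobility_trends_UK_standardised[:, [0, 4]]`. The loop variable is named `n` rather than `k` so that it does not overwrite the `k` used in the plotting code.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for the two standardised columns
# (retail and recreation, workplaces)
rng = np.random.default_rng(1)
X_two_vars = rng.normal(size=(141, 2))

ks = range(1, 9)
inertias = []
for n in ks:
    model = KMeans(n_clusters=n, init="k-means++", n_init=10, max_iter=300, random_state=0)
    model.fit(X_two_vars)
    inertias.append(model.inertia_)  # within-cluster sum of squares for this k

for n, inertia in zip(ks, inertias):
    print(n, round(inertia, 1))
```

Plot `inertias` against k and look for the point where the decrease levels off. Unstructured random data like this may show no clear elbow.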

Plot the resulting clusters along the two mobility variables — retail and recreation mobility and workplaces mobility — using a scatter plot.

# Plot the clusters
plt.figure(figsize=(11.7, 8.27))

grid = sns.scatterplot(
    x=mobility_trends_UK_standardised[:, 0],
    y=mobility_trends_UK_standardised[:, 4],
    hue=kmeans_k4_2vars.labels_,
    alpha=0.8,
    s=120,
)

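# Plot the cluster centroids. The model was fitted on two columns only,
# so column 1 of cluster_centers_ corresponds to workplaces mobility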
sns.scatterplot(
    x=kmeans_k4_2vars.cluster_centers_[:, 0],
    y=kmeans_k4_2vars.cluster_centers_[:, 1],
    hue=range(k),
    s=220,
    alpha=0.8,
    ec="black",
    legend=False,
)
grid.set(
    xlabel="Retail and Recreation Mean Change Mobility",
    ylabel="Workplaces Mean Change Mobility",
)

# Add title 'Cluster' to the legend and locate it in the upper right of the plot
plt.legend(title="Cluster", loc="upper right")
[Figure: UK counties on standardised retail and recreation mobility (x axis) and workplaces mobility (y axis), coloured by k-means cluster (k = 4), with cluster centroids]

Let’s add UK county labels to the figure, as we did before.

# Enlarge figure size
plt.figure(figsize=(28, 22))

# Scatterplot with standardised retail and recreation mobility on the horizontal x axis
# and standardised workplaces mobility on the vertical y axis
grid = sns.scatterplot(
    x=mobility_trends_UK_standardised[:, 0],
    y=mobility_trends_UK_standardised[:, 4],
    hue=kmeans_k4_2vars.labels_,
    alpha=0.9,
    s=120,
)
grid.set(
    xlabel="Retail and Recreation Mean Change Mobility",
    ylabel="Workplaces Mean Change Mobility",
)

# Plot the cluster centroids
sns.scatterplot(
    x=kmeans_k4_2vars.cluster_centers_[:, 0],
    y=kmeans_k4_2vars.cluster_centers_[:, 1],
    hue=range(k),
    s=240,
    alpha=0.8,
    ec="black",
    legend=False,
)

# This for loop adds the county name next to each data point
for line in range(mobility_trends_UK_mean_NaNdrop.shape[0]):
    grid.text(
        mobility_trends_UK_standardised[line, 0] + 0.1,  # x position of the label
        mobility_trends_UK_standardised[line, 4],  # y position of the label
        mobility_trends_UK_mean_NaNdrop.index[line],  # county name as label text
        horizontalalignment="left",
        size="small",
        color="black",
        weight=None,
    )

# Add title 'Cluster' to the legend and locate it in the upper right of the plot
plt.legend(title="Cluster", loc="upper right");
[Figure: the same plot enlarged, with each data point labelled with its county name]

Hands-on exercise

You would like to know whether mobility trends in the UK over the last year of the pandemic were similar to those in other countries and, if so, in which ones.

To find out, use k-means clustering to group the countries in the COVID-19 Community Mobility Reports data set according to their mean mobility in each of the six mobility categories.

Write your Python code and Markdown below.

Below is a solution to the hands-on exercise.

# Compute mean mobility trends by country and remove NaN (Not a Number) values
mobility_trends_countries = (
    mobility_trends.groupby("country_region")[
        [
            "Retail_Recreation",
            "Grocery_Pharmacy",
            "Parks",
            "Transit_stations",
            "Workplaces",
            "Residential",
        ]
    ]
    .mean()
    .dropna()
)
mobility_trends_countries.head()
Retail_Recreation Grocery_Pharmacy Parks Transit_stations Workplaces Residential
country_region
Afghanistan 14.629630 37.950231 6.611236 -4.188804 -7.533917 4.368709
Angola -12.054187 -0.541706 6.636268 -27.115071 -11.890746 7.796813
Antigua and Barbuda -18.242138 -8.100629 33.349057 -43.980952 -33.560899 4.367725
Argentina -41.844881 -8.477221 -59.743529 -46.950210 -12.739573 10.504889
Aruba -19.974843 -6.563941 12.723270 -45.807128 -21.149194 5.266667
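In the solution above, `.mean()` and `.dropna()` interact as follows. Within each group, `mean()` skips missing values, so a country's mean is `NaN` only if a category has no values at all for that country. `dropna()` then removes that country's whole row. The toy illustration below uses hypothetical countries.

```python
import numpy as np
import pandas as pd

# Hypothetical daily records: country "A" has one missing Parks value,
# country "B" has no Parks values at all
records = pd.DataFrame(
    {
        "country_region": ["A", "A", "B", "B"],
        "Parks": [10.0, np.nan, np.nan, np.nan],
        "Workplaces": [-20.0, -30.0, -10.0, -14.0],
    }
)

means = records.groupby("country_region")[["Parks", "Workplaces"]].mean()
print(means)  # A keeps a Parks mean (its NaN is skipped); B's Parks mean is NaN
print(means.dropna())  # B is dropped entirely, although its Workplaces mean exists
```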
# Data standardisation
scaler = StandardScaler()
StandardisedData = scaler.fit_transform(mobility_trends_countries)
StandardisedData
array([[ 2.60971986e+00,  2.72237961e+00,  3.47561843e-01,
         1.37660526e+00,  1.40145901e+00, -6.80948462e-01],
       [ 5.37783963e-01, -2.72062247e-02,  3.48512303e-01,
        -1.55760852e-01,  9.02158250e-01,  6.76779321e-02],
       [ 5.73040443e-02, -5.67161074e-01,  1.36277535e+00,
        -1.28305749e+00, -1.58128178e+00, -6.81163370e-01],
       [-1.77539368e+00, -5.94062123e-01, -2.17187512e+00,
        -1.48151933e+00,  8.04881058e-01,  6.59065338e-01],
       [-7.72363952e-02, -4.57391215e-01,  5.79630869e-01,
        -1.40511702e+00, -1.58877118e-01, -4.84853223e-01],
       [ 6.85588402e-01,  1.35974506e-01, -3.45209638e-01,
        -3.95424093e-01,  9.96267366e-01, -1.50054232e-01],
       [-9.59256359e-01, -4.91423251e-01,  1.70358047e-01,
        -1.04091624e-02, -7.87004065e-01, -1.42716295e-02],
       [-4.00453115e-01, -5.48873754e-01, -9.93222721e-01,
         3.57482646e-01,  1.16189086e-01,  6.58432689e-01],
       [ 2.80588000e-01,  6.12406852e-01,  7.55117751e-02,
         1.14437068e+00,  1.53935251e+00,  5.75779225e-01],
       [-7.71871615e-01, -1.49054175e+00, -9.97494429e-04,
        -1.59206991e+00, -1.77993395e+00,  1.03694700e-01],
       [ 3.08739143e-01,  1.67161537e-01,  6.71900417e-01,
         1.07047510e+00,  3.02930773e-01, -1.84553224e+00],
       [-6.62714327e-01, -1.62237222e-01,  1.59660009e+00,
        -2.27766518e-01, -9.03535011e-01,  7.64825718e-01],
       [-5.96197175e-02,  5.84356789e-02,  9.41737687e-02,
        -1.06134513e+00, -1.23523870e+00, -6.04717583e-02],
       [ 8.90963077e-01,  2.47462459e+00,  5.31298819e-01,
         1.08774909e+00,  1.28879947e+00, -9.24507380e-01],
       [-1.18471820e+00, -1.19493584e+00, -1.06310121e+00,
        -1.06901514e+00, -1.54589556e-01,  1.40575016e+00],
       [ 2.25368450e-01,  3.02058487e-01,  4.13225825e-01,
         5.13928237e-01,  3.96301617e-01, -1.85771275e+00],
       [ 1.27271809e+00,  1.73500103e+00,  9.79545504e-01,
         1.10532642e+00,  7.39980250e-01, -1.71686413e-02],
       [-4.61882443e-01,  7.37375413e-01, -1.09596198e+00,
        -1.29264376e-01,  1.69589951e+00,  1.44929501e-01],
       [ 3.46967054e-02,  2.40132651e-01,  5.35903503e-01,
         5.28232291e-01, -1.03415179e-01, -1.22149250e+00],
       [ 2.19300594e+00,  2.84270168e+00,  8.62167725e-01,
         2.02791800e+00,  1.75426265e+00, -1.39616643e+00],
       [ 1.82917067e-02, -1.04820437e+00, -7.09149643e-01,
        -1.07118079e+00, -3.82246744e-01,  7.75248625e-01],
       [ 1.08455597e+00,  6.31392933e-01, -5.08904929e-01,
         2.19165550e+00,  8.57361761e-01, -1.14101040e+00],
       [ 3.65005703e-01,  3.55186727e-02,  1.52413515e+00,
        -8.26928312e-01, -4.13658797e-01,  3.04221019e-01],
       [-1.44112442e+00, -1.07621542e+00, -1.08243975e+00,
        -2.46797352e+00, -1.76142224e+00,  2.30220328e-01],
       [-1.87870816e+00, -1.71853360e+00, -1.69105187e+00,
        -1.18349566e+00, -2.79844659e-01,  2.02049532e+00],
       [-1.20390316e+00, -8.43807783e-01, -1.08170059e+00,
        -7.24153849e-01,  4.55320438e-01,  1.06722439e+00],
       [-1.24096582e+00, -1.12887372e+00, -1.57856853e+00,
        -1.08780832e+00, -6.01099888e-01,  8.50275728e-01],
       [ 3.81318352e-01,  3.54428087e-01,  1.84889905e+00,
         4.64224472e-01, -9.61585749e-02, -8.76848593e-01],
       [-1.56975757e-01,  5.08740222e-01,  1.04396792e+00,
         9.27085801e-01,  2.45472164e-01, -1.38746564e-01],
       [ 1.44861882e+00,  1.44764869e+00, -1.62839254e-01,
         1.69888816e+00,  1.15778143e+00, -8.06715212e-01],
       [ 1.23934036e+00, -1.40082330e-01,  2.63231030e+00,
         3.13547434e-02, -5.62324821e-01, -7.28117566e-02],
       [-8.67416574e-01, -6.46741166e-01, -1.12970672e+00,
        -6.97323561e-01, -1.50050416e+00,  5.58797657e-02],
       [-2.58895312e-01, -5.75439484e-01, -9.21458086e-01,
        -4.44944247e-01, -1.15329882e+00,  1.38601320e+00],
       [ 6.02936229e-03,  2.81728908e+00, -5.35044981e-02,
         1.58307991e-01,  8.63472906e-01, -1.00018395e+00],
       [-5.06634907e-01, -5.65536060e-01, -1.05267817e+00,
        -4.69905338e-01, -1.03188047e+00,  7.33236883e-02],
       [ 6.11806991e-01,  3.62296169e-01,  1.33902110e+00,
         8.62924345e-01, -3.14813743e-01, -5.69959274e-01],
       [ 7.80561089e-01, -1.18055774e-01, -4.57306463e-01,
        -2.09416726e-01,  6.06462989e-01,  6.41466978e-01],
       [ 7.29302622e-01,  2.51623905e-01,  2.15083686e+00,
        -4.85176822e-01, -1.47063067e-01, -4.70006546e-01],
       [-6.57191514e-01, -7.80873179e-02,  1.26279338e+00,
         4.06174460e-01, -7.28855645e-01,  2.53747128e-01],
       [ 2.36962834e-01, -7.62992332e-01,  7.45473737e-01,
         1.09461513e+00,  3.56426654e-01, -2.12691317e-01],
       [-7.05638191e-01, -1.95306217e-01, -1.53334955e-02,
         4.13918140e-01, -7.73970300e-01, -3.72016030e-01],
       [-5.45730847e-01, -2.98970040e-02,  2.31379853e+00,
        -8.51011365e-02, -1.28256400e-01, -4.78066133e-02],
       [ 1.32254121e+00,  1.05570745e+00, -3.64732051e-01,
         1.43533729e+00,  9.20304574e-01,  7.51091572e-01],
       [-3.00954152e-01,  1.03619634e+00,  2.16237108e+00,
         4.97592955e-01, -3.07546750e-01, -4.98008899e-01],
       [-4.69571014e-01, -1.07233259e+00, -1.07699881e+00,
        -8.04957091e-01, -5.57824834e-01,  4.23465773e-01],
       [ 2.85764880e-01, -1.02597519e+00, -6.31108931e-01,
         3.31947469e-01, -6.35487520e-01, -5.78897823e-01],
       [-1.35510656e+00, -1.01540679e+00, -1.09023540e+00,
        -9.45689787e-01, -1.11893544e+00, -4.14700614e-01],
       [-1.02855317e-01,  1.22499269e+00, -6.44314566e-01,
         2.56824887e-01,  6.30527743e-01,  6.71918254e-01],
       [ 8.92127310e-01,  2.29386424e-01,  1.50473428e+00,
         1.00388711e+00,  3.67339223e-02, -3.80945919e-01],
       [-1.08182289e+00,  6.76489015e-01, -6.63625259e-01,
         1.42759251e-01,  2.31669995e-01,  1.19269034e+00],
       [ 2.62193985e-01,  1.41332519e-01, -3.62044131e-01,
        -3.56423625e-01, -1.74059546e-01, -2.54434424e-01],
       [ 1.14523025e+00,  2.05699951e+00,  4.26280866e-03,
         1.44021928e+00,  5.83043304e-01, -5.23946861e-01],
       [-1.13155958e+00,  1.69253780e-01,  2.75169920e-01,
        -7.49139637e-01, -1.39161526e+00,  8.00672452e-01],
       [-6.87765469e-01, -2.33985579e-02,  3.41621888e-01,
         3.78260253e-02, -6.99813186e-01,  4.89300105e-01],
       [-7.22326936e-01, -5.94639834e-01,  7.13854772e-01,
        -3.70624953e-01, -7.44910147e-01,  3.74841029e-01],
       [-8.39476322e-01, -1.15902052e+00, -1.03352859e+00,
        -3.45466649e-01, -1.03308583e+00,  1.36618261e-01],
       [ 7.37480150e-01,  6.61044393e-02,  8.75131152e-02,
         1.50411332e-01,  9.19477021e-01, -3.74871551e-01],
       [ 3.57109571e-02,  2.24336775e-01, -7.93705029e-01,
        -1.62148327e+00, -8.16589366e-01, -6.25899086e-02],
       [-2.11592974e-01, -3.03697124e-01, -4.32365023e-02,
         2.92252461e+00, -7.39443974e-01, -1.11339656e+00],
       [ 1.26647294e+00,  5.94257869e-03, -3.43534767e-01,
         1.38010088e+00,  1.69702652e+00,  6.72628604e-01],
       [-1.18456848e+00, -1.55512752e+00, -1.14150650e+00,
         2.21895784e-01, -1.24664414e+00,  1.43569148e+00],
       [-1.73578735e-01, -1.01294213e+00, -4.47074113e-01,
         5.07691249e-01, -1.05599756e+00, -1.78675606e+00],
       [ 9.37386549e-02, -4.19099473e-01, -7.09437029e-01,
        -5.30462583e-01,  6.73218079e-01, -3.48199496e-01],
       [ 2.21835069e-01, -2.73259256e-03,  1.12106553e+00,
         1.23114455e-02, -1.09271001e+00, -2.78330668e-01],
       [ 2.18286027e-01,  1.13549692e+00,  2.35285460e-01,
        -1.10493340e+00, -1.05692457e+00, -1.36534621e+00],
       [ 2.98101723e+00,  3.11024005e+00,  2.12204593e+00,
         3.94831985e-01,  1.59577980e+00, -1.68582793e+00],
       [-7.27894584e-01,  1.41651057e+00,  1.51666131e+00,
         5.14749299e-01, -2.83665433e-01, -7.10842852e-01],
       [-1.06061289e+00, -5.94832064e-01,  1.09639399e+00,
        -1.71103732e-01, -1.24531026e+00,  8.47550774e-01],
       [-1.05455079e+00, -3.66439793e-01, -8.97859471e-01,
        -1.29984295e+00, -5.21713309e-01,  1.56948876e+00],
       [ 1.16348395e+00,  4.75588900e-01,  1.15832278e+00,
         4.59265128e-01,  1.39381727e+00, -1.37092735e+00],
       [-1.36729495e-01, -2.74241777e-01,  3.27713121e-01,
         3.61876437e-01, -4.61157768e-01,  3.92636550e-01],
       [-2.76393329e-01, -1.01310328e+00, -1.55316146e+00,
        -1.03734911e+00, -4.99124882e-01, -7.61345795e-01],
       [-7.70003660e-01, -2.81684308e-01, -1.12465032e+00,
        -2.87550191e-01, -1.17449018e-01,  5.73076839e-01],
       [ 9.74925551e-03, -4.50831561e-01, -1.34377696e-01,
        -5.48679872e-02, -5.06359301e-01, -1.64890400e+00],
       [ 1.24065863e+00,  2.75304023e+00,  6.29501724e-01,
         2.23676034e+00,  9.74103761e-01, -2.77509033e-01],
       [-1.11776590e+00,  1.59406385e-01, -1.23565452e+00,
        -3.21890653e-01, -5.45454061e-01,  9.98943208e-01],
       [ 4.69402577e-01, -8.45542128e-02, -6.00500276e-01,
        -1.13941863e-01,  1.52232371e+00, -1.06443750e+00],
       [-1.90272391e+00, -1.92179648e+00, -1.19467045e+00,
        -1.13599833e+00, -1.80738726e+00,  2.28047066e+00],
       [ 2.88765451e-01,  3.13999559e-02, -6.45458333e-01,
        -8.28018621e-01,  5.96789709e-01, -9.22651426e-01],
       [-1.08659856e+00, -6.47769726e-01, -4.55224953e-01,
         3.30054738e-01, -6.98351671e-01,  7.02804548e-01],
       [-3.90295177e-02,  4.56995069e-02,  1.38454429e+00,
        -8.28937753e-01, -7.20447101e-01,  3.95630030e-01],
       [ 7.22462099e-01, -3.75039858e-01, -7.26638867e-01,
        -4.57408124e-01,  1.56789462e+00, -5.06454188e-01],
       [ 3.32725303e-01,  1.52992127e-01, -5.09452311e-01,
         8.81417317e-01,  9.66447057e-01, -1.03471032e+00],
       [ 2.44297544e+00,  2.03023347e-01,  3.21765590e-01,
         2.34788654e+00,  3.90246625e-01, -1.55318205e+00],
       [ 6.64022678e-01, -4.86691389e-02, -5.08901804e-01,
         1.37873314e+00,  8.04864874e-01,  4.26128643e-01],
       [-2.28607401e-01, -8.85450250e-02,  8.74516396e-01,
        -4.14359983e-01, -5.50419381e-01, -8.75006572e-01],
       [ 8.36895135e-01,  8.31491575e-02,  1.65905086e+00,
        -4.04547987e-01, -6.04913782e-01, -1.22420188e-01],
       [-5.73843041e-01, -1.19207072e+00, -1.21316319e+00,
        -1.02636257e+00, -6.40534256e-01, -6.42782493e-01],
       [ 9.74902967e-01,  7.55325962e-01,  1.73712948e-01,
         1.45859689e+00,  1.09746131e+00, -7.84225137e-01],
       [-2.35432806e+00, -1.97468330e+00, -1.98136722e+00,
        -1.59429260e+00, -2.44408967e+00,  3.03471977e+00],
       [ 2.36549057e+00,  2.28610650e+00, -5.37634567e-01,
         1.52710614e+00,  3.21864322e+00, -1.04768289e+00],
       [-5.52455912e-01, -8.55431804e-01, -1.27449222e+00,
        -1.19894010e+00,  6.00776243e-01,  7.70940231e-01],
       [-1.78662933e+00, -1.81856477e+00, -1.24490884e+00,
        -1.68865348e+00, -6.07902102e-01,  2.25359431e+00],
       [-1.58295775e+00, -1.11387360e+00, -8.54702332e-01,
        -1.52631532e+00, -1.29484305e+00,  2.53025043e+00],
       [ 6.94690480e-01,  1.72022752e-01,  8.95435053e-01,
         3.41291496e-01,  3.79257596e-01, -4.55571767e-01],
       [-7.15039222e-01, -1.06413572e-01, -4.92868154e-03,
        -5.19526884e-01, -5.95455526e-01,  9.35535450e-01],
       [-2.96899874e-01, -1.13370453e+00, -1.26899256e+00,
        -1.59261294e+00, -5.55753864e-01,  4.31728817e-01],
       [ 1.48571513e-01,  7.04614894e-01, -2.13492790e-01,
         3.11214695e-01,  8.92491173e-01,  1.45981213e-01],
       [-6.80774811e-02, -6.42804248e-02, -1.05594139e-01,
         2.56951091e-01, -2.22742484e-01, -8.73205046e-01],
       [ 3.08211423e-01, -1.60195428e-01,  6.52301211e-01,
         1.32501737e+00, -2.68802818e-01, -1.10917433e+00],
       [ 8.24337346e-01, -1.44237233e+00,  1.32557535e-01,
        -1.06086149e-01, -1.04545042e-01,  2.37919140e+00],
       [ 3.16141149e-01,  7.74592178e-02, -5.38805742e-01,
        -7.79836271e-01, -2.33639636e-01,  6.92678163e-02],
       [-1.07933951e-01,  1.50911972e-01, -5.62803762e-01,
         4.47684124e-02,  5.25634699e-01, -9.09607672e-01],
       [-9.46513642e-02, -4.50160387e-02,  2.89666314e-01,
         3.72381505e-01, -6.94852381e-01, -6.59206118e-01],
       [-5.08262713e-01, -1.39202530e-01, -7.66390821e-01,
        -5.15550322e-01, -3.70069830e-01,  2.39850561e+00],
       [-5.71632246e-01, -2.50361551e-01,  6.16595255e-01,
        -1.75982070e-01, -3.67596173e-01,  3.09117696e-02],
       [-9.03769214e-01, -1.08991709e+00, -1.39965133e-01,
        -4.67027588e-01, -7.64202266e-01,  1.84469257e-01],
       [-5.74441020e-02, -2.42177474e-01, -7.35944703e-01,
        -2.40519263e-01, -2.32484464e-02,  1.20947028e+00],
       [ 5.75324281e-01,  7.34283685e-01,  1.27604173e+00,
         1.05176821e+00,  1.33183638e+00, -5.84849413e-01],
       [-1.09601070e+00, -3.93182295e-01,  2.74186539e-01,
        -2.20528053e-01, -4.80528307e-01,  1.12893681e-01],
       [-1.11212020e+00, -1.16245567e+00, -7.38783737e-01,
        -3.23554967e-01, -1.18232768e+00,  2.10921849e+00],
       [ 8.50143823e-01, -1.13966637e-01,  1.66562682e+00,
        -2.22157492e-01, -4.75398409e-01, -8.14568782e-03],
       [-6.13501872e-01,  4.69708012e-02,  1.18117932e+00,
         2.58239604e-01, -2.01121515e-01,  1.28358576e-01],
       [ 5.09354707e-01,  1.24188035e-01, -2.51764528e-01,
         4.80641941e-01,  1.77837273e+00, -9.73755711e-01],
       [ 6.68980969e-01, -4.16822731e-01, -5.87897129e-01,
        -1.83685952e-01,  9.99673596e-01, -1.15211319e+00],
       [ 1.01969072e+00, -5.88470030e-02, -7.09130021e-02,
         1.40541017e+00,  3.32660016e-01, -6.17531539e-01],
       [ 4.35573964e-01,  2.85125220e-01, -8.51556783e-01,
        -7.35506165e-01,  6.15003984e-01, -4.96975731e-01],
       [-1.08256032e+00, -1.27497352e+00, -2.65358171e-02,
        -1.87368367e+00, -1.48449982e+00, -5.66767330e-01],
       [ 9.23056224e-01,  1.21560113e+00,  1.66357753e+00,
         1.18002037e+00,  1.06280526e+00,  1.84262818e-01],
       [-5.51726840e-01,  1.42828859e-01, -7.56255617e-01,
        -1.15265003e+00, -1.00471762e+00,  2.76486683e-01],
       [-7.54648060e-01,  6.26460624e-01,  8.85409218e-02,
        -3.18897991e-03, -1.05399021e-01,  5.20897073e-02],
       [-5.56232845e-01, -1.10094613e+00, -3.95866018e-01,
        -4.32757885e-01,  1.52130683e+00,  9.19804874e-01],
       [ 5.12025871e-02,  2.24005359e-03,  6.53180545e-01,
         5.21450940e-01, -5.20969897e-01, -1.07163481e+00],
       [ 2.65604181e-01, -1.86047121e-02, -1.01144650e+00,
        -8.02645504e-01,  3.05018590e-01,  6.45865495e-01],
       [-1.38442212e+00, -6.07023246e-01,  1.16367852e+00,
        -8.92854726e-01, -1.84434336e+00,  1.15244882e+00],
       [ 8.96489450e-01,  1.06708267e-01,  1.07430043e+00,
         9.80791148e-01, -2.55278869e-01, -1.17378150e-01],
       [-6.22636763e-01, -3.43112341e-01, -1.50835633e+00,
        -9.00068762e-01,  1.33020572e+00, -1.29723100e-01],
       [-8.70205981e-01, -3.66734412e-01, -8.65549414e-01,
        -7.35106730e-01, -8.99572993e-02,  1.17479933e+00],
       [ 8.77850377e-02, -1.16266411e-01, -6.59880105e-01,
         5.61535775e-01,  1.84240262e+00, -2.47632362e+00],
       [ 3.22928239e+00,  2.52842736e+00,  1.03820640e+00,
         2.37401223e+00,  2.22079986e+00, -1.15385326e+00],
       [ 1.84081097e+00,  9.05537356e-01,  6.46096859e-01,
         1.26147187e+00,  1.26711600e+00,  2.22544422e-01],
       [ 6.78955505e-01,  6.39393713e-01, -2.37046725e-01,
         2.38439226e-01,  1.33892009e+00,  8.61031512e-01]])
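The array above appears to be StandardisedData, the input to PCA below. Each mobility column has been rescaled to mean 0 and standard deviation 1, so that no category dominates the principal components merely because it varies on a larger scale. Assuming it was produced with scikit-learn's StandardScaler, which divides by the population standard deviation (ddof=0), the transformation is a column-wise z-score. The NumPy sketch below uses a small illustrative matrix, not the lab's data, and the helper name standardise is hypothetical.

```python
import numpy as np

def standardise(X):
    """Hypothetical helper: z-score each column by subtracting its mean and
    dividing by its population standard deviation (ddof=0, as StandardScaler does).
    A constant column would cause a division by zero."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Illustrative matrix (an assumption, not the lab's data):
# rows are countries, columns are mobility categories
X = np.array([[14.6, 37.9, 6.6],
              [-12.1, -0.5, 6.6],
              [-18.2, -8.1, 33.3],
              [-41.8, -8.5, -59.7]])
Z = standardise(X)
print(Z.mean(axis=0).round(6))  # every column mean is now (numerically) zero
print(Z.std(axis=0).round(6))   # every column standard deviation is now one
```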
# Run PCA with two components, keeping the fitted model and the
# transformed data in separately named variables
pca_model_countries = PCA(n_components=2)
pca_countries = pca_model_countries.fit_transform(StandardisedData)
pca_countries
array([[-3.99016687e+00,  7.45478364e-01],
       [-6.06845801e-01,  2.89283308e-01],
       [ 7.63659966e-01, -2.19230651e+00],
       [ 2.32057062e+00,  2.10239002e+00],
       [ 5.72102336e-01, -7.20155393e-01],
       [-5.64747850e-01,  8.55920787e-01],
       [ 9.41844934e-01, -6.74254911e-01],
       [ 7.71024049e-01,  9.38877429e-01],
       [-1.32368338e+00,  1.03287876e+00],
       [ 2.47774589e+00, -1.19196393e+00],
       [-1.68519673e+00, -6.15938441e-01],
       [ 6.66457601e-01, -1.70202543e+00],
       [ 8.98595586e-01, -8.52468676e-01],
       [-3.02763069e+00,  3.62569086e-01],
       [ 2.46416516e+00,  8.36796320e-01],
       [-1.43099629e+00, -3.84012433e-01],
       [-2.45986827e+00, -1.49673660e-01],
       [-3.53389857e-01,  1.87222486e+00],
       [-9.21811801e-01, -6.69920702e-01],
       [-4.68033064e+00,  4.23018062e-01],
       [ 1.57607279e+00,  3.90821641e-01],
       [-2.37166071e+00,  8.79282529e-01],
       [ 3.35456370e-03, -1.39772761e+00],
       [ 3.34249607e+00, -3.69875237e-01],
       [ 3.54260283e+00,  1.30182544e+00],
       [ 1.80436162e+00,  1.17274771e+00],
       [ 2.58774159e+00,  8.79736352e-01],
       [-1.36658692e+00, -1.61384962e+00],
       [-1.00894477e+00, -6.54182914e-01],
       [-2.79744195e+00,  8.75387540e-01],
       [-1.10997212e+00, -2.34111002e+00],
       [ 1.94873016e+00, -8.37499479e-02],
       [ 1.81075555e+00,  2.47568769e-01],
       [-2.04783053e+00,  4.79725413e-01],
       [ 1.44045285e+00,  1.67215156e-01],
       [-1.30744201e+00, -1.25233176e+00],
       [-9.74208903e-02,  8.64820168e-01],
       [-9.90130797e-01, -1.83079072e+00],
       [ 1.88557279e-01, -1.39924217e+00],
       [-6.76024133e-01, -3.85374162e-01],
       [ 4.17152904e-01, -5.40002802e-01],
       [-3.23711846e-01, -1.94389918e+00],
       [-1.71625520e+00,  1.14458868e+00],
       [-1.23234464e+00, -1.92954160e+00],
       [ 1.74984389e+00,  4.94598485e-01],
       [ 4.03347440e-01,  1.37301861e-02],
       [ 2.12305719e+00, -3.29723304e-02],
       [-4.32965084e-01,  1.04893747e+00],
       [-1.55831055e+00, -1.12525093e+00],
       [ 6.89855941e-01,  8.39591676e-01],
       [ 4.38759722e-02,  1.43670366e-01],
       [-2.53534797e+00,  4.53710835e-01],
       [ 1.55320122e+00, -9.78265640e-01],
       [ 6.80341516e-01, -6.29878654e-01],
       [ 1.00254362e+00, -1.01185907e+00],
       [ 1.83430181e+00,  1.24819785e-01],
       [-9.73263581e-01,  4.53666840e-01],
       [ 1.11254053e+00,  7.84332061e-02],
       [-1.14112634e+00, -4.97929405e-01],
       [-1.52715296e+00,  1.52008806e+00],
       [ 2.52856784e+00,  3.09175967e-01],
       [ 2.05912854e-01, -6.16036147e-01],
       [ 1.88781138e-01,  8.54548922e-01],
       [-1.08495433e-01, -1.55613670e+00],
       [-2.99333532e-01, -1.03978979e+00],
       [-4.87791019e+00, -7.25434649e-01],
       [-1.11014169e+00, -1.45684088e+00],
       [ 1.33845540e+00, -1.53123283e+00],
       [ 2.28182526e+00,  5.52463272e-01],
       [-2.36185105e+00, -2.38414033e-01],
       [ 2.64381591e-01, -4.58973626e-01],
       [ 1.40771913e+00,  6.92734687e-01],
       [ 1.20495805e+00,  8.50272455e-01],
       [-1.48009628e-01, -5.08049920e-01],
       [-3.48733778e+00,  3.01272427e-01],
       [ 1.54303122e+00,  7.59552832e-01],
       [-9.54089064e-01,  1.18778300e+00],
       [ 4.17793767e+00,  6.04290791e-02],
       [-1.81287250e-01,  6.71872745e-01],
       [ 1.33705960e+00,  1.04492404e-03],
       [ 3.87525832e-01, -1.47473627e+00],
       [-5.68614641e-01,  1.40464193e+00],
       [-1.22710308e+00,  8.41111305e-01],
       [-3.10421021e+00, -4.77352886e-02],
       [-9.06691968e-01,  1.04600550e+00],
       [-3.03054318e-02, -1.20383735e+00],
       [-5.49816064e-01, -1.64369898e+00],
       [ 1.62798732e+00,  3.39296439e-01],
       [-2.21603801e+00,  5.12930712e-01],
       [ 5.37359541e+00,  3.95953743e-01],
       [-4.33276444e+00,  2.40976363e+00],
       [ 1.59329862e+00,  1.37487471e+00],
       [ 3.84986210e+00,  7.81664593e-01],
       [ 3.61989016e+00,  1.71004173e-01],
       [-1.13535844e+00, -5.02391417e-01],
       [ 1.19789730e+00, -2.44059859e-01],
       [ 2.09546690e+00,  6.22797015e-01],
       [-7.61788381e-01,  7.64710667e-01],
       [-2.55044921e-01, -2.00917469e-01],
       [-1.14432769e+00, -8.01725558e-01],
       [ 1.19099093e+00,  2.57593788e-01],
       [ 4.25489781e-01,  2.89514482e-01],
       [-4.16586267e-01,  5.85427198e-01],
       [-1.51139763e-01, -7.44410317e-01],
       [ 1.78476026e+00,  7.64334221e-01],
       [ 4.41815024e-01, -7.46788241e-01],
       [ 1.54065386e+00, -4.19883445e-01],
       [ 9.11480557e-01,  7.62340541e-01],
       [-2.17469657e+00, -2.25271922e-01],
       [ 9.50845494e-01, -5.67983115e-01],
       [ 2.65995183e+00,  1.47999176e-01],
       [-5.56177937e-01, -1.55153160e+00],
       [-5.47530011e-02, -1.04896657e+00],
       [-1.49455359e+00,  1.11511490e+00],
       [-6.99086103e-01,  8.53762995e-01],
       [-1.41131205e+00,  2.59040455e-01],
       [-1.97530806e-01,  9.44954036e-01],
       [ 2.29436837e+00, -1.14446541e+00],
       [-2.33809453e+00, -5.01934493e-01],
       [ 1.41538464e+00, -1.83732321e-02],
       [ 1.12519467e-01, -1.44334893e-01],
       [ 8.13817016e-01,  1.27008292e+00],
       [-6.33102836e-01, -9.86910770e-01],
       [ 6.41872747e-01,  1.07260314e+00],
       [ 2.14135859e+00, -1.92968726e+00],
       [-1.15813831e+00, -9.14146181e-01],
       [ 7.11885073e-01,  1.84897121e+00],
       [ 1.62332945e+00,  7.42788577e-01],
       [-1.68141608e+00,  1.16900093e+00],
       [-5.32821171e+00,  6.67290840e-01],
       [-2.44390245e+00,  4.73591045e-01],
       [-8.58207885e-01,  1.20053359e+00]])
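PCA(n_components=2) centres the data, finds the two orthogonal directions along which it varies most, and returns each country's coordinates along those directions. These coordinates are what pca_countries holds. Below is a minimal NumPy sketch of the same projection using singular value decomposition. It runs on a synthetic stand-in for StandardisedData (132 rows by 6 columns, an assumption). It also computes the share of total variance each component retains, which scikit-learn exposes as explained_variance_ratio_. Component signs are arbitrary, so for the same data the sketch's scores may be a sign-flipped version of what PCA returns.

```python
import numpy as np

def pca_svd(Z, n_components=2):
    """Project data onto its leading principal components via SVD."""
    Z = np.asarray(Z, dtype=float)
    Zc = Z - Z.mean(axis=0)                  # PCA works on centred columns
    U, S, Vt = np.linalg.svd(Zc, full_matrices=False)
    components = Vt[:n_components]           # rows are the principal directions
    scores = Zc @ components.T               # each row's coordinates on those directions
    explained_variance = S**2 / (Zc.shape[0] - 1)
    ratio = explained_variance[:n_components] / explained_variance.sum()
    return scores, components, ratio

rng = np.random.default_rng(0)
# Synthetic stand-in for StandardisedData (assumption): 132 rows x 6 correlated columns
latent = rng.normal(size=(132, 2))
Z = latent @ rng.normal(size=(2, 6)) + 0.3 * rng.normal(size=(132, 6))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)

scores, components, ratio = pca_svd(Z, n_components=2)
print(scores.shape)  # (132, 2), the same shape as pca_countries
print(ratio)         # share of total variance retained by each component
```

A fitted PCA object reports the same quantity in its explained_variance_ratio_ attribute. If the two components retain only a small share of the variance, clusters found in pca_countries should be interpreted with care.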
# Elbow method: record the within-cluster sum of squared distances
# (inertia_) for k = 1 to 30 clusters

Sum_of_squared_differences_countries = []

K = range(1, 31)
for k in K:
    kmeans_countries = KMeans(n_clusters=k)
    kmeans_countries.fit(pca_countries)
    Sum_of_squared_differences_countries.append(kmeans_countries.inertia_)
Sum_of_squared_differences_countries
[593.6468345005125,
 304.3305477620855,
 206.67194353683982,
 148.95634380079338,
 121.28174974759105,
 97.43999961101579,
 80.83841686606813,
 68.90853268223835,
 60.22874329306373,
 54.14883465310518,
 48.573875243346166,
 45.43947822211138,
 40.28743116061935,
 37.842849347016006,
 34.45415389681908,
 33.05362269834717,
 31.83094041245218,
 28.85520473381342,
 26.095529452652656,
 24.52245402734946,
 23.495979871278994,
 22.14922258419889,
 19.607214460726738,
 18.627583942264145,
 17.74833201042812,
 16.640232091054198,
 16.03881696384903,
 14.79872891026243,
 13.496850551112397,
 13.02564592378948]
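The numbers above are KMeans inertia_ values. Inertia is the sum, over all points, of the squared Euclidean distance from each point to the centroid of its assigned cluster. The best achievable value can only decrease as k grows, reaching zero when every point forms its own cluster. For that reason one looks for the "elbow", where adding clusters stops producing much reduction, rather than for the minimum. A NumPy sketch of the calculation on a tiny hand-made example (the helper name inertia is hypothetical):

```python
import numpy as np

def inertia(X, labels, centroids):
    """Within-cluster sum of squared distances (what KMeans reports as inertia_)."""
    X = np.asarray(X, dtype=float)
    diffs = X - centroids[labels]  # each point minus its own cluster's centroid
    return float((diffs ** 2).sum())

# Four points forming two obvious pairs
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([X[labels == c].mean(axis=0) for c in range(2)])
print(inertia(X, labels, centroids))  # 4.0: every point is 1 unit from its centroid
```

With a single cluster, the centroid moves to (5, 1) and the inertia rises to 104.0. The large drop from k = 1 to k = 2 is the kind of change the elbow plot makes visible.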
# Plot the number of clusters against the sum of squared differences

# Set the figure size and font scale
plt.figure(figsize=(11.7, 8.27))
sns.set(font_scale=1.5)

# Generate the plot
grid = sns.lineplot(x=K, y=Sum_of_squared_differences_countries)

# Add x and y labels
labels = grid.set(xlabel="Number of clusters, k", ylabel="Total squared distances")
[Figure: line plot of the total squared distances against the number of clusters, k]
# The elbow of the curve appears at around k = 4, so we set n_clusters=4 and run k-means
kmeans_countries_k4 = KMeans(n_clusters=4)
kmeans_countries_k4.fit(pca_countries)
KMeans(n_clusters=4)
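KMeans uses Lloyd's algorithm. It starts from k initial centroids, assigns each point to its nearest centroid, moves each centroid to the mean of its assigned points, and repeats until the assignments stop changing. scikit-learn adds k-means++ initialisation and, depending on n_init, several restarts, keeping the run with the lowest inertia. The bare sketch below uses random initial centroids and a single run. It illustrates the loop and is not a replacement for KMeans. The helper lloyd_kmeans and the two synthetic blobs are assumptions. Because initialisation is random, cluster numbering (and occasionally membership) can differ between runs unless random_state is fixed, which is worth keeping in mind when reading the labels below.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical helper: minimal k-means (Lloyd's algorithm), single run,
    with random initial centroids."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids as k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Assignment step: squared distance from every point to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:  # an empty cluster keeps its previous centroid
                centroids[c] = members.mean(axis=0)
    return labels, centroids

rng = np.random.default_rng(1)
# Two well-separated synthetic blobs in 2-D (an assumed stand-in for pca_countries)
X = np.vstack([rng.normal(0, 0.5, size=(50, 2)),
               rng.normal(5, 0.5, size=(50, 2))])
labels, centroids = lloyd_kmeans(X, k=2)
print(np.bincount(labels))  # number of points in each cluster
```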
# Cluster label assigned to each country
kmeans_countries_k4.labels_
array([3, 0, 2, 1, 2, 0, 2, 0, 0, 1, 2, 2, 2, 3, 1, 2, 3, 0, 2, 3, 1, 3,
       2, 1, 1, 1, 1, 2, 2, 3, 2, 1, 1, 3, 1, 2, 0, 2, 2, 2, 2, 2, 0, 2,
       1, 2, 1, 0, 2, 0, 0, 3, 1, 2, 2, 1, 0, 1, 2, 0, 1, 2, 0, 2, 2, 3,
       2, 2, 1, 3, 2, 1, 1, 2, 3, 1, 0, 1, 0, 1, 2, 0, 0, 3, 0, 2, 2, 1,
       3, 1, 3, 1, 1, 1, 2, 1, 1, 0, 2, 2, 1, 0, 0, 2, 1, 2, 1, 1, 3, 2,
       1, 2, 2, 0, 0, 0, 0, 1, 3, 1, 2, 0, 2, 0, 1, 2, 0, 1, 0, 3, 3, 0],
      dtype=int32)
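Cluster numbers are arbitrary labels, so a quick first check is how many countries fall into each cluster. The pure-Python sketch below copies the labels printed above into a list and tallies them. In a live session, np.bincount(kmeans_countries_k4.labels_) gives the same counts. Rerunning KMeans without a fixed random_state can permute the label numbers.

```python
from collections import Counter

# Cluster labels copied from the kmeans_countries_k4.labels_ output above
labels = [3, 0, 2, 1, 2, 0, 2, 0, 0, 1, 2, 2, 2, 3, 1, 2, 3, 0, 2, 3, 1, 3,
          2, 1, 1, 1, 1, 2, 2, 3, 2, 1, 1, 3, 1, 2, 0, 2, 2, 2, 2, 2, 0, 2,
          1, 2, 1, 0, 2, 0, 0, 3, 1, 2, 2, 1, 0, 1, 2, 0, 1, 2, 0, 2, 2, 3,
          2, 2, 1, 3, 2, 1, 1, 2, 3, 1, 0, 1, 0, 1, 2, 0, 0, 3, 0, 2, 2, 1,
          3, 1, 3, 1, 1, 1, 2, 1, 1, 0, 2, 2, 1, 0, 0, 2, 1, 2, 1, 1, 3, 2,
          1, 2, 2, 0, 0, 0, 0, 1, 3, 1, 2, 0, 2, 0, 1, 2, 0, 1, 0, 3, 3, 0]

# Tally how many countries carry each label
sizes = Counter(labels)
for cluster in sorted(sizes):
    print(f"cluster {cluster}: {sizes[cluster]} countries")
```

For the labels shown, clusters 0 to 3 contain 30, 39, 45 and 18 countries. The 39 in cluster 1 matches the number of rows, the United Kingdom included, in the list of the UK's cluster at the end of this section.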
# Plot the countries along the two principal components,
# coloured by their k-means cluster label

sns.set(font_scale=1.3)
plt.figure(figsize=(20.7, 16.27))

grid = sns.scatterplot(
    x=pca_countries[:, 0], y=pca_countries[:, 1], hue=kmeans_countries_k4.labels_
)

# Annotate each point with its country name
for i in range(mobility_trends_countries.shape[0]):
    grid.text(
        pca_countries[i, 0],
        pca_countries[i, 1],
        mobility_trends_countries.index[i],
    )
[Figure: scatter plot of the countries on the first two principal components, coloured by k-means cluster and labelled with country names]
# Add the cluster membership as a new column
mobility_trends_countries["clusters_countries_k4"] = kmeans_countries_k4.labels_
mobility_trends_countries
                     Retail_Recreation  Grocery_Pharmacy       Parks  Transit_stations  Workplaces  Residential  clusters_countries_k4
country_region
Afghanistan                  14.629630         37.950231    6.611236         -4.188804   -7.533917     4.368709                      3
Angola                      -12.054187         -0.541706    6.636268        -27.115071  -11.890746     7.796813                      0
Antigua and Barbuda         -18.242138         -8.100629   33.349057        -43.980952  -33.560899     4.367725                      2
Argentina                   -41.844881         -8.477221  -59.743529        -46.950210  -12.739573    10.504889                      1
Aruba                       -19.974843         -6.563941   12.723270        -45.807128  -21.149194     5.266667                      2
...                                ...               ...         ...               ...         ...          ...                    ...
Venezuela                   -30.187251         -5.294821  -25.338645        -35.782869  -20.547809    12.866534                      1
Vietnam                     -17.849583         -1.788475  -19.921904        -16.383344   -3.686304    -3.852658                      0
Yemen                        22.608782         35.235060   24.800839         10.733753   -0.384462     2.203187                      3
Zambia                        4.727092         12.515936   14.473795         -5.911355   -8.706175     8.505976                      3
Zimbabwe                    -10.236083          8.790144   -8.785682        -21.217305   -8.079623    11.429731                      0

132 rows × 7 columns
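Cluster numbers carry no meaning of their own, so a common next step is to describe each cluster by the average of its members' original (unstandardised) mobility variables. In pandas this is typically mobility_trends_countries.groupby("clusters_countries_k4").mean(). The NumPy sketch below applies the same per-cluster averaging to the ten rows displayed above only. Its numbers illustrate the mechanics and should not be read as the cluster profiles for all 132 countries.

```python
import numpy as np

# The ten rows displayed above. Columns: Retail_Recreation, Grocery_Pharmacy,
# Parks, Transit_stations, Workplaces, Residential
X = np.array([
    [14.629630, 37.950231, 6.611236, -4.188804, -7.533917, 4.368709],       # Afghanistan
    [-12.054187, -0.541706, 6.636268, -27.115071, -11.890746, 7.796813],    # Angola
    [-18.242138, -8.100629, 33.349057, -43.980952, -33.560899, 4.367725],   # Antigua and Barbuda
    [-41.844881, -8.477221, -59.743529, -46.950210, -12.739573, 10.504889], # Argentina
    [-19.974843, -6.563941, 12.723270, -45.807128, -21.149194, 5.266667],   # Aruba
    [-30.187251, -5.294821, -25.338645, -35.782869, -20.547809, 12.866534], # Venezuela
    [-17.849583, -1.788475, -19.921904, -16.383344, -3.686304, -3.852658],  # Vietnam
    [22.608782, 35.235060, 24.800839, 10.733753, -0.384462, 2.203187],      # Yemen
    [4.727092, 12.515936, 14.473795, -5.911355, -8.706175, 8.505976],       # Zambia
    [-10.236083, 8.790144, -8.785682, -21.217305, -8.079623, 11.429731],    # Zimbabwe
])
# Their clusters_countries_k4 values, in the same order
labels = np.array([3, 0, 2, 1, 2, 1, 0, 3, 3, 0])

# Mean of each mobility variable within each cluster (one row per cluster, 0 to 3)
profiles = np.array([X[labels == c].mean(axis=0) for c in np.unique(labels)])
print(profiles.round(1))
```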

# Check which cluster the United Kingdom was assigned to
UK_cluster = mobility_trends_countries[
    mobility_trends_countries.index == "United Kingdom"
]
UK_cluster
                Retail_Recreation  Grocery_Pharmacy       Parks  Transit_stations  Workplaces  Residential  clusters_countries_k4
country_region
United Kingdom          -36.80968         -8.658667   28.105415        -38.142992  -35.856338    12.764187                      1
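# Access the UK cluster label by position with .iloc (plain [0] on a
# label-indexed Series is deprecated positional access in recent pandas)
UK_cluster.clusters_countries_k4.iloc[0]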
1
# Identify which other countries were assigned to the same cluster as
# the United Kingdom. k-means grouped these countries with the United
# Kingdom based on their mobility trends since mid-February 2020, as
# summarised by the first two principal components.
mobility_trends_countries[
    mobility_trends_countries.clusters_countries_k4
    == UK_cluster.clusters_countries_k4.iloc[0]
]
                     Retail_Recreation  Grocery_Pharmacy       Parks  Transit_stations  Workplaces  Residential  clusters_countries_k4
country_region
Argentina                   -41.844881         -8.477221  -59.743529        -46.950210  -12.739573    10.504889                      1
Barbados                    -28.920833        -21.027198   -2.568820        -48.604196  -35.294311     7.961740                      1
Bolivia                     -34.237756        -16.888959  -30.541595        -40.778590  -21.111781    13.924102                      1
Cambodia                    -18.744566        -14.834839  -21.219523        -40.810991  -23.098286    11.036915                      1
Cape Verde                  -37.539932        -15.226971  -31.050916        -61.708897  -35.132780     8.541126                      1
Chile                       -43.175436        -24.218896  -47.080019        -42.491373  -22.204740    16.739138                      1
Colombia                    -34.484833        -11.973455  -31.031449        -35.618999  -15.789791    12.373928                      1
Costa Rica                  -34.962151        -15.964143  -44.117530        -41.059761  -25.007968    11.380478                      1
Dominican Republic          -30.151327         -9.214685  -32.295794        -35.217581  -32.856045     7.742787                      1
Ecuador                     -22.314371         -8.216520  -26.811120        -31.441646  -29.826379    13.833723                      1
El Salvador                 -25.504932         -8.077880  -30.267082        -31.815097  -28.766900     7.822666                      1
Guatemala                   -25.027598        -15.172614  -30.907618        -36.827924  -24.630356     9.426033                      1
Honduras                    -36.432134        -14.375700  -31.256232        -38.933476  -29.526529     5.587909                      1
Ireland                     -33.553143          2.208573    4.704640        -35.992820  -31.905896    11.153335                      1
Jamaica                     -29.791493        -16.386174  -29.762737        -29.953326  -28.777418     8.112504                      1
Jordan                      -18.520229          2.979689  -23.446470        -49.044259  -26.888299     7.200291                      1
Kuwait                      -34.235828        -21.931346  -32.606566        -21.464818  -30.640898    14.061209                      1
Malaysia                    -32.561370         -5.290696  -26.189600        -44.232086  -24.315252    14.673892                      1
Mauritius                   -22.539723        -14.343453  -43.448380        -40.304823  -24.118148     4.000554                      1
Mexico                      -28.896777         -4.104189  -32.162622        -29.086818  -20.787698    10.111131                      1
Morocco                     -33.375498          2.070717  -35.086155        -29.600598  -24.522410    12.061255                      1
Myanmar (Burma)             -43.484728        -27.064409  -34.006752        -41.780749  -35.533865    17.929615                      1
Nepal                       -32.974104         -9.229084  -14.531873        -19.846614  -25.856574    10.705179                      1
Oman                        -26.370485        -16.848849  -34.493798        -40.140449  -25.352067     4.543478                      1
Panama                      -49.300797        -27.804781  -54.726096        -48.637450  -41.089641    21.383466                      1
Paraguay                    -26.095047        -12.136182  -36.109029        -42.722443  -14.520564    11.017186                      1
Peru                        -41.989582        -25.619250  -35.329888        -50.049217  -25.067323    17.806543                      1
Philippines                 -39.366559        -15.754154  -25.052964        -47.620418  -31.061475    19.073404                      1
Portugal                    -28.188907         -1.650543   -2.672356        -32.557503  -24.958716    11.770899                      1
Puerto Rico                 -22.803820        -16.031771  -35.964184        -48.612320  -24.612285     9.463872                      1
Rwanda                       -8.363755        -20.352866    0.948637        -26.371871  -20.675099    18.381676                      1
Singapore                   -25.525896         -2.109562  -22.727092        -32.498008  -22.992032    18.470120                      1
Slovenia                    -30.619501        -15.418783   -6.228830        -31.772042  -26.431177     8.331623                      1
South Africa                -19.719944         -3.551126  -21.925228        -28.383172  -19.965717    13.025299                      1
Sri Lanka                   -33.302789        -16.434263  -22.000000        -29.625498  -30.079681    17.145418                      1
The Bahamas                 -32.922096        -18.009420   -3.241427        -52.817518  -32.716393     4.891566                      1
Trinidad and Tobago         -26.085657          1.838645  -22.460159        -42.029880  -28.529880     8.752988                      1
United Kingdom              -36.809680         -8.658667   28.105415        -38.142992  -35.856338    12.764187                      1
Venezuela                   -30.187251         -5.294821  -25.338645        -35.782869  -20.547809    12.866534                      1