.. _PCA:
Principal Component Analysis (PCA)
===================================
.. include:: ../replace.txt
.. index:: covariance, PCA, SVD
As the name implies, Principal Component Analysis (PCA) accentuates the
principal components of data so that subtle but salient features become more
apparent. Like the SVD, PCA reduces the dimensionality of data, emphasizing
the distinguishing characteristics while removing uncorrelated noise. The PCA
algorithm maps the data onto new axes called the PCA space. From the PCA
space, the data may be reconstructed with reduced dimension, just as the
rank-:math:`k` sum of SVD outer products accomplishes. However, it is more
common to use the PCA space data for classification applications. In the PCA
space, similar items are grouped together and separated from clusters of
dissimilar items.
Suppose that you wish to automate the classification of items into groups based
on measurable features. PCA can use correlations between the features to find
a linear combination of the features such that similar items are clustered
together in the reduced dimension PCA space. PCA uses the most significant
eigenvectors found from a covariance matrix to find the linear combinations of
the features. Then, when we multiply the sample data by the most significant
eigenvectors (as determined by the eigenvalues), the differences between data
sets become more obvious, allowing for source identification.
If you have :math:`n` measured features from :math:`m` sources (samples) then
the data can be arranged in a matrix. Examples of potential data sets include
the stock values from :math:`m` companies taken over :math:`n` days, opinions
of :math:`m` customers regarding :math:`n` competing products, or the grades of
:math:`m` students in :math:`n` courses. In the examples presented here, the
matrix columns hold the measured features and each row holds the data taken
from one source. With adjustments to the equations, the data may instead be
transposed so that the features are in rows and the samples are in columns.
Use of a covariance matrix is central to PCA because it reveals how similar
and dissimilar (at variance) the sets of data are to each other
(see :ref:`covariance`).
The first step in the algorithm is to subtract the mean of each feature from
the data so that each feature has a zero mean. This is needed to make
a covariance matrix.
.. math:: \mathbf{D} = \mat{\vertbar{}, \vertbar{}, {}, \vertbar{};
(\bm{x}_1-\mu_1){}, (\bm{x}_2-\mu_2){}, \cdots{},
(\bm{x}_n-\mu_n){};
\vertbar{}, \vertbar{}, {}, \vertbar{}}
Recall from :ref:`covariance` that a covariance matrix is symmetric
with the variance of each feature on the diagonal and the covariance
between features in the off-diagonal positions.
.. math:: \mathbf{Cov_x} = \frac{\mathbf{D}^T\,\mathbf{D}}{m - 1}
.. math:: \mathbf{Cov_x} =
    \mat{Var({x_1}), Cov({x_1, x_2}), \cdots, Cov({x_1, x_n});
    Cov({x_2, x_1}), Var({x_2}), \cdots, Cov({x_2, x_n});
    \vdots, \vdots, \ddots, \vdots;
    Cov({x_n, x_1}), Cov({x_n, x_2}), \cdots, Var({x_n})}
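
As a quick numerical check of these two formulas, the covariance matrix built
from the mean-centered data agrees with the built-in ``cov`` function. The
following is a minimal sketch using made-up data; the sizes and variable names
are only for illustration.

::

    % Sketch: covariance of mean-centered data (samples in rows)
    m = 100;  n = 3;                  % m samples of n features
    X = randn(m, n) * randn(n, n);    % made-up, correlated data
    mu = mean(X);                     % row vector of feature means
    D = X - repmat(mu, m, 1);         % each feature now has zero mean
    Covx = (D' * D) / (m - 1);        % n-by-n covariance matrix
    disp(norm(Covx - cov(X)))         % near zero (round-off error)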
Next, the eigenvectors, :math:`\mathbf{V}`, and eigenvalues of the covariance
matrix are found. The eigenvalues are used to sort the eigenvectors. All of
the eigenvectors could be used, but because we wish to achieve dimensionality
reduction, only the eigenvectors with the largest eigenvalues are kept. The
PCA space is spanned by the :math:`k` most significant eigenvectors. This
subset of the eigenvectors gives us the mapping of the data onto the PCA
space.
.. math:: \mathbf{W} = \mat{\bm{v}_1, \cdots, \bm{v}_k}
.. note::
The computation performed to find the :math:`\bf{V}`
eigenvector matrix
from the :math:`\bf{D}` matrix is the same as finding the
:math:`\bf{V}` sub-matrix of the SVD (``[~, ~, V] = svd(D)``).
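
The note can be checked numerically. Here is a minimal sketch with made-up
data, using only base |M| functions: the columns of :math:`\bf{V}` returned by
``svd`` match the sorted eigenvectors of the covariance matrix, although
individual columns may differ by a factor of :math:`-1`.

::

    % Sketch: V from svd(D) vs. eigenvectors of the covariance matrix
    m = 50;  n = 3;
    D = randn(m, n);
    D = D - repmat(mean(D), m, 1);         % zero-mean data
    [Veig, S] = eig(D' * D / (m - 1));
    [~, order] = sort(diag(S), 'descend');
    Veig = Veig(:, order);                 % sort columns by eigenvalue
    [~, ~, Vsvd] = svd(D);
    disp(abs(Veig) - abs(Vsvd))            % near zero; signs may differ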
:math:`\bf{W}` holds the linear combinations of the data features that
project the data onto the PCA space. The projection is found by multiplying
each sample (each row of :math:`\bf{D}`) by the :math:`\bf{W}` matrix.
.. math:: \mathbf{Y} = \mathbf{D\,W}.
For purposes of classification and object recognition, the data in the PCA
space (:math:`\bf{Y}`) is used to display or compare the data.
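
As a cross-check on the whole procedure, the ``pca`` function from the |M|
Statistics and Machine Learning Toolbox computes the same quantities: its
first output corresponds to the sorted eigenvector matrix and its second
output to the projected data, up to the sign of each column. The following is
a minimal sketch with made-up data and sizes.

::

    % Sketch: direct PCA calculation vs. the toolbox pca function
    m = 50;  n = 4;  k = 2;                % made-up sizes
    X = randn(m, n) * randn(n, n);         % made-up, correlated data
    mu = mean(X);
    D = X - repmat(mu, m, 1);              % zero-mean data
    [V, S] = eig(D' * D / (m - 1));
    [~, order] = sort(diag(S), 'descend');
    V = V(:, order);                       % eigenvectors sorted by eigenvalue
    W = V(:, 1:k);                         % k most significant eigenvectors
    Y = D * W;                             % data projected onto the PCA space
    [coeff, score] = pca(X);               % toolbox result (if available)
    % W and coeff(:,1:k), Y and score(:,1:k) agree up to column signs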
To reconstruct the original data with reduced dimensionality, we just reverse
the steps.
.. math:: \begin{array}{ll}
        \mathbf{\tilde{X}} &= (\mathbf{D\,W})\,\mathbf{W}^T + \bm{\mu} \\
            &= \mathbf{Y\,W}^T + \bm{\mu}
    \end{array}
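
Continuing the sketch above (the variables ``X``, ``D``, ``V``, ``W``, ``Y``,
``mu``, and ``m`` are defined there), keeping all :math:`n` eigenvectors
recovers the original data exactly, while keeping only :math:`k < n` of them
gives the reduced-dimension approximation :math:`\mathbf{\tilde{X}}`.

::

    % Sketch: reconstruct the data from its PCA-space representation
    Xtilde = Y * W' + repmat(mu, m, 1);        % rank-k reconstruction
    Xexact = (D * V) * V' + repmat(mu, m, 1);  % all n axes: exact recovery
    disp(norm(Xexact - X))                     % near zero (round-off error)
    disp(norm(Xtilde - X))                     % error from the discarded axes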
.. topic:: Transposed Data
If the measured features are held in the rows with the samples in the
columns, then adjusted equations are as follows.
.. math:: \mathbf{D} = \spalignmat{\horzbar{} {(\bm{x}_1 - \mu_1)}
\horzbar{};
\horzbar{} {(\bm{x}_2 - \mu_2)} \horzbar{};
\vdots{} \vdots{} \vdots{};
\horzbar{} {(\bm{x}_n - \mu_n)} \horzbar{}}
    .. math:: \mathbf{Cov_x} = \frac{\mathbf{D\,D}^T}{m - 1}.

    .. math:: \mathbf{Y} = \mathbf{W}^T\mathbf{D}.

    .. math:: \mathbf{\tilde{X}} = \mathbf{W\,Y} + \bm{\mu}.
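
Here is a minimal sketch of the transposed convention with made-up data; the
projection is simply the transpose of the :math:`\mathbf{Y}` found when the
samples are held in the rows.

::

    % Sketch: PCA with features in rows and samples in columns
    n = 3;  m = 50;  k = 2;                % n features, m samples
    X = randn(n, n) * randn(n, m);         % made-up data, n-by-m
    mu = mean(X, 2);                       % column vector of feature means
    D = X - repmat(mu, 1, m);              % subtract each feature's mean
    Covx = (D * D') / (m - 1);             % n-by-n covariance matrix
    [V, S] = eig(Covx);
    [~, order] = sort(diag(S), 'descend');
    W = V(:, order(1:k));                  % k most significant eigenvectors
    Y = W' * D;                            % k-by-m PCA-space data
    Xtilde = W * Y + repmat(mu, 1, m);     % reduced-rank reconstruction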
.. _pcaDataAnal:
PCA for Data Analysis
------------------------
To illustrate what PCA does with a simple plot, the following code shows a
PCA example with only two variables (the two columns of ``X``). Noise was
added to the data to show how dimensionality reduction separates the essence
of the data from the uncorrelated noise. In this example, the data is
reconstructed with reduced dimensionality.
Notice the lines showing the principal and minor axes. The orientation of the
principal axis is aligned with the correlation of the data, while the minor
axis is aligned with the uncorrelated noise. Thus, as we also saw
in (:ref:`dimRed`) for the SVD, the lower rank data is
preferred to the full rank data that contains noise. :numref:`fig-PCA-plot`
shows a plot demonstrating that the reconstructed data is aligned with the
correlation between the columns of the original data.
.. note::
* The |M| Statistics and Machine Learning Toolbox has a PCA function,
but the direct calculation is only a few lines of code.
    * The ``PCA_demo`` example uses the ``repmat`` |M| function that has
      been used previously. The ``repmat`` function creates a matrix with
      copies of a scalar, vector, or matrix. In this case, the variable ``mu``
      is a row vector, so ``repmat(mu, m, 1)`` makes a matrix with :math:`m`
      rows, each holding one copy of the ``mu`` vector. This allows
      element-wise subtraction of each column's mean from the data.
::
% File: PCA_demo.m
% PCA - 2D Example
n = 2;
m = 50;
% two columns random but correlated data
X = zeros(m, n);
X(:,1) = 5*randn(m, 1) - 2.5;
X(:,2) = -1 + 1.5 * X(:,1) + 5*randn(m,1);
X = sortrows(X,1);
%% find PCA from eigenvectors of covariance
mu = mean(X);
D = X - repmat(mu, m, 1);
covx = D'*D/(m - 1); % sources in rows, features in columns
[V, S] = eig_decomp(covx);
% project onto PCA space
W = V(:,1); % only use dominant eigenvector
Y = D * W;
disp('Eigenvalue ratios to total')
disp(S/trace(S))
% Data reconstruction
Xtilde = Y*W' + repmat(mu, m, 1);
% plots
figure, scatter(X(:,1), X(:,2)), title('PCA Demo');
hold on
plot([0 5*V(1,1)], [0 5*V(2,1)]); % principal axis
plot([0 5*V(1,2)], [0 5* V(2,2)]); % minor axis
plot(Xtilde(:,1), Xtilde(:,2)); % principal component
legend({'data in', 'principal axis', 'minor axis', ...
'reconstruction'}, 'Location', 'northwest');
hold off
function [eigvec,eigval]=eig_decomp(C)
[eigvec,eigval]=eig(C);
eigval=abs(diag(eigval)');
[eigval, order]=sort(eigval, 'descend');
eigval=diag(eigval);
eigvec=eigvec(:, order);
end
.. _fig-PCA-plot:
.. figure:: PCA_plot.png
:align: center
:width: 60%
The principal component of the reconstructed data from PCA is
aligned with the correlation between the columns of the original
data.
.. topic:: Is PCA the same as Least Squares Regression?
The plot in :numref:`fig-PCA-plot` makes it appear that reconstructed data
from PCA does the same thing as least squares regression; however, it
differs in two ways.
    1. In least squares regression, the error that is minimized is parallel
       to the :math:`y`-axis; in PCA, the minimized error is orthogonal to
       the data projection line (principal axis).
2. In least squares regression, the data fit is an extension of solving
a system of linear equations for over-determined systems.
In PCA, the data fit is a by-product of dimensionality reduction.
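
The difference described in the first point can be seen numerically. The
following sketch assumes the variables ``X`` and ``V`` from ``PCA_demo`` are
still in the workspace; the two slopes generally differ because ``polyfit``
minimizes vertical error while the principal axis minimizes orthogonal error.

::

    % Sketch: least squares slope vs. principal-axis slope
    p = polyfit(X(:,1), X(:,2), 1);   % regression line (vertical error)
    pca_slope = V(2,1) / V(1,1);      % slope of the dominant eigenvector
    fprintf('regression slope: %.3f, PCA slope: %.3f\n', p(1), pca_slope)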
.. _PCAcluster:
PCA for Clustering
-------------------------
An important application of PCA is classification. Reducing the dimensions of
the data makes it easier to see how the attributes of items of the same type
are similar and how they differ from items of other types. The result of the
analysis is often a plot showing the data in the PCA space.
In this example, we will look at a public domain dataset and produce
a scatter plot showing how groups of the data are clustered in the PCA space.
The dataset can be downloaded from the `Data World `_ web
site. It comes from the USDA National Nutrient Database and was made available
by Craig Kelly. It lists many different foods and identifies food groups,
nutrition, and vitamin data for each food. We will use the nutrition
information in the PCA analysis. The scatter plot in :numref:`fig-PCA-food`
shows how different food groups are similar. For example, it shows that some
fruits and most grains are similar to sweets, probably because they are also
high in carbohydrates. Some of the food groups that are contained in the data
are not plotted.
To see a list of all food groups in the data set, type the following
after the ``FoodGroups`` variable is created by the script.
::
>> categories(FoodGroups)
Download the dataset and my MATLAB script.
:download:`nndb_flat.csv`
:download:`PCA_food.m`
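
A typical workflow for this kind of clustering plot follows the same steps as
``PCA_demo``: select the numeric feature columns, subtract the means, project
onto the two most significant eigenvectors, and color the scatter plot by food
group. The sketch below is not the downloadable ``PCA_food.m`` script; the
column selections (``groupCol``, ``featureCols``) are placeholders that would
need to be matched to the actual columns of the dataset.

::

    % Sketch: clustering plot from tabular data (placeholder column choices)
    T = readtable('nndb_flat.csv');        % load the dataset
    groupCol = 2;                          % assumed: food group column
    featureCols = 3:10;                    % assumed: numeric nutrition columns
    FoodGroups = categorical(T{:, groupCol});
    F = T{:, featureCols};
    m = size(F, 1);
    D = F - repmat(mean(F), m, 1);         % zero-mean features
    [V, S] = eig(D' * D / (m - 1));
    [~, order] = sort(diag(S), 'descend');
    Y = D * V(:, order(1:2));              % first two principal components
    gscatter(Y(:,1), Y(:,2), FoodGroups)   % color the points by food group

When the selected features have very different scales, normalizing each column
before computing the covariance may be appropriate.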
.. _fig-PCA-food:
.. figure:: PCA_food.png
:align: center
:width: 80%
Scatter plot of first two principal components from nutrition data for
several foods from seven food groups.
PCA for Recognition
-------------------------
A fun example of how PCA can be used is face recognition.
The eigenvectors found from PCA give us a matrix by which to multiply images
of people's faces. The resulting matrices, or vectors (depending on the
implementation), give a more succinct representation of who is pictured in
each image.
An example implementation is found on the MathWorks File Exchange web site.
`Two dimensional PCA for face recognition
`_
[ALSAQRE19]_
Since, like the SVD, PCA reduces the dimensionality of the data from rank
:math:`r` to rank :math:`k`, the row or column coefficients of the data beyond
:math:`k` may either be removed (making smaller matrices) or set to zero when
comparing training and testing data. Reducing the dimensionality is often
desired to make the matrices more manageable.
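
Here is a minimal sketch of the comparison step described above, with made-up
vectors standing in for the image data: both the training and test samples are
reduced to their first :math:`k` PCA coefficients, and a test sample is
matched to the nearest training sample in that reduced space. This is a
generic nearest-neighbor illustration, not the method from the File Exchange
submission.

::

    % Sketch: nearest-neighbor matching after dimensionality reduction
    m = 20;  n = 100;  k = 10;                 % made-up sizes
    train = randn(m, n);                       % m training samples in rows
    test  = train(7, :) + 0.1*randn(1, n);     % noisy copy of sample 7
    mu = mean(train);
    D = train - repmat(mu, m, 1);              % zero-mean training data
    [V, S] = eig(D' * D / (m - 1));
    [~, order] = sort(diag(S), 'descend');
    W = V(:, order(1:k));                      % k most significant eigenvectors
    Ytrain = D * W;                            % training data in the PCA space
    ytest  = (test - mu) * W;                  % test sample in the PCA space
    diffs = Ytrain - repmat(ytest, m, 1);
    [~, match] = min(sum(diffs.^2, 2));        % nearest in the PCA space
    fprintf('closest training sample: %d\n', match)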
.. topic:: Additional reading
Here is a very good tutorial on PCA
:download:`Principal_Component_Analysis_Tutorial.pdf` by Alaa Tharwat that
I found on MathWorks File Exchange ([THARWAT16]_). The tutorial shows both
the SVD and covariance PCA algorithms. I will use the covariance
algorithm. Figure 2 of the tutorial shows a diagram of the covariance
algorithm.
.. topic:: Homework
Now complete :ref:`PCA_classify` homework.
.. raw:: latex
\clearpage