9.4. Dimensionality Reduction

When a matrix contains important information that is obscured by noise or less significant details, the SVD factors can reveal the most important aspects of the data while removing the less significant details. We began our discussion of the SVD with the simple matrix–vector relationship \mathbf{A}\,\bm{v} = \sigma\,\bm{u}. Each pair of vectors, \bm{v} and \bm{u}, along with the corresponding singular value, \sigma, combines to form an outer product matrix \sigma\,\bm{u}\,\bm{v}^T. The singular values are our guide to finding the most significant information. For most matrices, a few singular values are much larger than the rest. Thus, in the complete matrix multiplication by outer products, the terms with larger singular values hold the significant information, and the lesser terms can be discarded with only minimal loss of information.

Each \sigma value is a scalar multiplier of its outer product matrix (Matrix Multiplication Properties).

\mathbf{A} = \sigma_1\,\bm{u}_1\,\bm{v}_1^T
+ \sigma_2\,\bm{u}_2\,\bm{v}_2^T + \cdots
+ \sigma_r\,\bm{u}_r\,\bm{v}_r^T
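
As a quick check of this expansion, the short sketch below (using an arbitrary random test matrix, not one from the text) sums the rank-1 outer product terms and confirms that they rebuild \mathbf{A} to within round-off error.

% Sketch: confirm that summing the rank-1 outer products rebuilds A.
A = randn(6, 4);                    % any test matrix
[U, S, V] = svd(A);
Ar = zeros(size(A));
for i = 1:min(size(A))              % sum all outer product terms
    Ar = Ar + S(i,i) * U(:,i) * V(:,i)';
end
disp(norm(A - Ar))                  % essentially zero (round-off only)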

The Eckart–Young theorem, developed in 1936, says that the closest rank k < r matrix to the original comes from the SVD [BRUNTON19], [ECKART36]. The choice of k should be based on the singular values. It might be chosen so that the sum of the first k singular values constitutes 95%, or another percentage, of the sum of all singular values. Gavish and Donoho [GAVISH14] suggest that if the noise can be modeled independently of the data, then k should be the number of singular values from the data that are larger than the largest singular value of the noise. They also suggest algorithms and equations for picking k when the noise cannot be modeled. The rank-k approximation of \mathbf{A} is given by the following sum of rank-1 matrices.

\mathbf{A}_k = \sigma_1\,\bm{u}_1\,\bm{v}_1^T
+ \sigma_2\,\bm{u}_2\,\bm{v}_2^T + \cdots
+ \sigma_k\,\bm{u}_k\,\bm{v}_k^T
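
The 95% criterion mentioned above takes only a few lines to implement. The sketch below assumes A already holds the matrix of interest and finds the smallest k whose leading singular values account for the target fraction of the singular value sum.

% Sketch: pick k so that the first k singular values account for
% 95% of the sum of all singular values, then form the rank-k A_k.
[U, S, V] = svd(A);                 % A is the matrix of interest
s = diag(S);                        % singular values as a vector
target = 0.95;                      % fraction of the singular value sum
k = find(cumsum(s)/sum(s) >= target, 1);
Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';   % rank-k approximation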

Eckart–Young Theorem

Consider two different matrices, \mathbf{A} and \mathbf{B}, of the same size. \mathbf{A} has rank r > k. Define \mathbf{A}_k as the sum of the first k rank-1 outer product matrices from the SVD. Both \mathbf{A}_k and \mathbf{B} have rank k. Then the Eckart–Young theorem tells us that \mathbf{A}_k is a closer match to \mathbf{A} than any other rank-k matrix \mathbf{B} [ECKART36]. We express this in terms of the norm (magnitude) of the differences between matrices: \norm{\mathbf{A} - \mathbf{B}} > \norm{\mathbf{A} - \mathbf{A}_k}. This relationship holds for all of the matrix norms listed in Matrix Norms.
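
A small numerical experiment illustrates the inequality. The sketch below compares the truncated SVD against an arbitrary rank-k matrix \mathbf{B} built from random factors; the Frobenius norm error of \mathbf{A}_k is the smaller of the two.

% Sketch: the rank-k SVD truncation beats an arbitrary rank-k matrix B.
A = randn(8, 6);                            % a full rank test matrix
k = 2;
[U, S, V] = svd(A);
Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';     % rank-k truncated SVD
B  = randn(8, k) * randn(k, 6);             % some other rank-k matrix
disp(norm(A - Ak, 'fro'))                   % smaller error
disp(norm(A - B,  'fro'))                   % larger error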

The bestKsvd script builds a simple pattern in a matrix and then adds noise to it. We see that the best representation of the original matrix is found in the first three terms of the SVD outer product expansion. The remaining two terms restore the noise, which we would just as well do without.

% File: bestKsvd.m
% Demonstrate the Eckart-Young theorem that says that the
% closest rank k matrix to the original comes from the SVD.

% Because of the random noise added, you may want to run this
% more than once. You should see that after about 3 SVD terms,
% the SVD matrix is closest (lowest mean-squared error) to the
% clean matrix before the noise was added. The later terms
% mostly add the noise back into the matrix. We can see why a
% rank k (k < r) matrix from SVD may be preferred to the
% full rank r matrix.

A1 = 5*eye(5) + 5*rot90(eye(5));
Aclean = [A1 A1];                 % A test pattern
A = [A1 A1] + 0.5*randn(5, 10);   % add some noise
[U, S, V] = svd(A);
figure, plot(diag(S), '*'), title('Singular Values')
Ak = zeros(5, 10);
figure

% Look at each rank k matrix from SVD
for k = 1:5
    disp(['SVD terms: ', num2str(k)])
    Ak = Ak + U(:,k) * S(k,k) * V(:,k)';
    disp( Ak )
    subplot(2, 3, k), imshow(Ak, []), title(['rank = ', num2str(k)])
    disp(['rank: ', num2str(rank(Ak))])
    disp(['Mean-squared Error: ', num2str(immse(A, Ak))])
    disp(['No noise Mean-squared Error: ', num2str(immse(Aclean, Ak))])
    disp(' ')
end
subplot(2,3,6), imshow(A, []), title('Original')
disp('Original matrix with noise')
disp(A)

% NOTE:
% >> rank(A1)
% ans =
%      3
% >> rank(A)
% ans =
%      5

The bestKsvdImage script shows another dimensionality reduction example. This one examines the SVD of an image. With the sum of a few rank-1 matrices from the SVD, the major parts of the image start to take shape. The higher terms add fine details. Something to think about: how is a low-rank image from the SVD similar to or different from a low-pass filtered image? Fig. 9.6 shows the singular values of the image, and the images with progressive rank are shown in Fig. 9.7.

% File: bestKsvdImage.m
% Use an image to demonstrate the Eckart-Young theorem that says
% that the closest rank k matrix to the original comes from the SVD.


% Use a public domain image from the Image Processing Toolbox.
A = im2double(imread('cameraman.tif'));
[m, n] = size(A);

[U, S, V] = svd(A);
figure, plot(diag(S), '*'), title('Singular Values')
%%
Ak = zeros(m, n);
figure

% Look at each rank k matrix from SVD
for k = 1:15
    disp(['SVD terms: ', num2str(k)])
    Ak = Ak + U(:,k) * S(k,k) * V(:,k)';
    subplot(4, 4, k), imshow(Ak, []), title(['rank = ', num2str(k)])
    disp(['rank: ', num2str(rank(Ak))])
    disp(['Mean-squared Error: ', num2str(immse(A, Ak))])
    disp(' ')
end
subplot(4,4,16), imshow(A, []), title('Original')
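
A back-of-the-envelope count shows the dimensionality reduction. Each rank-1 term requires a column of \mathbf{U} (m values), a column of \mathbf{V} (n values), and one singular value, so a rank-k approximation stores k(m + n + 1) values instead of m\,n. The short sketch below works out the numbers for this example.

% Sketch: storage for the rank-k approximation vs. the full image.
m = 256; n = 256;                 % cameraman.tif is 256-by-256
k = 15;
full_count = m*n                  % 65,536 values for the full image
svd_count  = k*(m + n + 1)        % 7,695 values for the rank-15 sum

For the cameraman image with k = 15, that is 7,695 values rather than 65,536, or about 12% of the original data.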

Fig. 9.6 The singular values of the cameraman image. Most of the important information in the image is contained in a small number of rank-1 images.


Fig. 9.7 Images with progressive rank from the sum of SVD outer products.