9.6. Dimensionality Reduction
The SVD factors can reveal the most important aspects of the data in a matrix while removing the less significant details. We began our discussion of the SVD with the simple matrix–vector relationship \(\mathbf{A}\,\bm{v} = \sigma\,\bm{u}\). Each pair of vectors, \(\bm{v}\) and \(\bm{u}\), along with the corresponding singular value, \(\sigma\), combines to build an outer product matrix \(\sigma\,\bm{u}\,\bm{v}^T\) (see Matrix Multiplication by Outer Products). The singular values are our guide to finding the most significant information. For most matrices, a few singular values are much larger than the rest. Thus, in the complete matrix multiplication by outer products, the terms with the larger singular values hold the significant information, and the lesser terms can be discarded with only minimal loss of information.
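As a quick numeric check of these relationships (a sketch, not one of the book's scripts), the fragment below verifies \(\mathbf{A}\,\bm{v}_1 = \sigma_1\,\bm{u}_1\) on a small random matrix and rebuilds the matrix from the sum of all of its rank-1 outer products.

% Sketch: verify A*v = sigma*u and the outer product expansion.
A = randn(4, 6);
[U, S, V] = svd(A);
sigma = diag(S);
disp(norm(A*V(:,1) - sigma(1)*U(:,1)))     % ~0: A*v_1 = sigma_1*u_1
Asum = zeros(size(A));
for i = 1:length(sigma)
    Asum = Asum + sigma(i) * U(:,i) * V(:,i)'; % rank-1 terms
end
disp(norm(A - Asum))                       % ~0: the full sum recovers A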
The Eckart–Young theorem, published in 1936, says that the closest rank-\(k\) (\(k < r\)) matrix to the original comes from the SVD [BRUNTON19], [ECKART36]. The choice of \(k\) should be based on the singular values. It might be chosen such that the sum of the first \(k\) singular values constitutes 95% of the sum of all singular values. Gavish and Donoho [GAVISH14] suggest that if the noise can be modeled independently of the data, then \(k\) should be the number of singular values from the data that are larger than the largest singular value of the noise. They also suggest algorithms and equations for picking \(k\) when the noise cannot be modeled. The rank-\(k\) approximation of \(\mathbf{A}\) is the sum of the first \(k\) rank-1 matrices from the SVD outer product expansion, \(\mathbf{A}_k = \sum_{i=1}^{k} \sigma_i\,\bm{u}_i\,\bm{v}_i^T\).
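As an illustration of these two selection rules (a sketch with made-up data, not from the text), the fragment below applies the 95% criterion and the noise-threshold count to a rank-3 matrix with added noise.

% Sketch: two ways to pick k from the singular values.
X = randn(50, 3) * randn(3, 80);           % rank-3 signal
N = 0.1 * randn(50, 80);                   % independent noise
s = svd(X + N);
k95 = find(cumsum(s)/sum(s) >= 0.95, 1)    % 95% of the singular value sum
sigmaNoise = max(svd(N));                  % largest noise singular value
kNoise = sum(s > sigmaNoise)               % count above the noise level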
Theorem 9.4 (Eckart–Young Theorem)
Let \(\mathbf{A}, \mathbf{B} \in \mathbb{R}^{m\times n}\) with \(\text{rank}(\mathbf{A}) = r > \text{rank}(\mathbf{B}) = k\). Define \(\mathbf{A}_k\) as the sum of the first \(k\) rank-1 outer product matrices from the SVD of \(\mathbf{A}\).
Then \(\mathbf{A}_k\) is at least as close to \(\mathbf{A}\) as any rank-\(k\) matrix \(\mathbf{B}\): \(\|\mathbf{A} - \mathbf{A}_k\| \leq \|\mathbf{A} - \mathbf{B}\|\).
The relationship holds for all matrix norms listed in Matrix Norms.
Eckart and Young give a proof using the Frobenius norm [ECKART36]. Mirsky extends the theorem to all unitarily invariant matrix norms [MIRSKY60].
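A quick numeric demonstration of the theorem (again a sketch, not one of the book's scripts) compares the truncated SVD against one arbitrary rank-\(k\) competitor; the first error is always the smaller of the two.

% Sketch: rank-k SVD truncation versus a random rank-k matrix B.
A = randn(8, 10);
k = 3;
[U, S, V] = svd(A);
Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';    % truncated SVD
B = randn(8, k) * randn(k, 10);            % some other rank-k matrix
disp(norm(A - Ak, 'fro'))                  % always the smaller error
disp(norm(A - B, 'fro'))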
The bestKsvd script builds a simple pattern in a matrix and then adds noise. The best representation of the original matrix is found in the first three terms of the SVD outer product. The remaining two terms mostly restore the noise, which we would prefer to leave out.
% File: bestKsvd.m
% Demonstrate the Eckart-Young theorem.
% Because of the random noise added, you may want to run this
% more than once. You should see that after 3 SVD terms,
% the SVD matrix is closest (lowest mean-squared error) to the
% clean matrix before the noise was added. The latter terms
% mostly add the noise back into the matrix.
A1 = 5*eye(5) + 5*rot90(eye(5));
Aclean = [A1 A1]; % A test pattern
A = [A1 A1] + 0.5*randn(5, 10); % add some noise
[U, S, V] = svd(A);
figure, plot(diag(S), '*'), title('Singular Values')
Ak = zeros(5, 10); figure
% Look at each rank k matrix from SVD
for k = 1:5
    disp(['SVD terms: ', num2str(k)])
    Ak = Ak + U(:,k) * S(k,k) * V(:,k)'; disp(Ak)
    subplot(2, 3, k)
    imshow(Ak, []), title(['rank = ', num2str(k)])
    disp(['rank: ', num2str(rank(Ak))])
    disp(['Mean-squared Error: ', num2str(immse(A, Ak))])
    disp(['No noise Error: ', num2str(immse(Aclean, Ak))])
    disp(' ')
end
subplot(2,3,6), imshow(A, []), title('Original')
disp('Original matrix with noise'), disp(A)
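As an aside (not part of the script), the matrix that the loop accumulates term by term can also be formed in one step from slices of the SVD factors.

% One-step equivalent of the loop's sum for the first k terms:
k = 3;
Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';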
The bestKsvdImage script is a dimensionality reduction example where the SVD of an image is examined. With the sum of a few rank-1 matrices from the SVD, the major parts of the image start to take shape. The higher terms add fine details. Something to think about: how is a low-rank image from the SVD different from a low-pass filtered image? Fig. 9.7 shows the singular values, and the images with progressive rank are shown in Fig. 9.8.
% File: bestKsvdImage.m
% Use an image to demonstrate the Eckart-Young theorem.
% Use a public domain image from the Image Processing Toolbox.
A = im2double(imread('cameraman.tif'));
[m, n] = size(A);
[U, S, V] = svd(A);
Ak = zeros(m, n);
figure, plot(diag(S), '*'), title('Singular Values')
figure % Look at each rank k matrix from SVD
for k = 1:15
    disp(['SVD terms: ', num2str(k)])
    Ak = Ak + U(:,k) * S(k,k) * V(:,k)'; % Outer products
    subplot(4, 4, k), imshow(Ak, [])
    title(['rank = ', num2str(k)])
    disp(['rank: ', num2str(rank(Ak))])
    disp(['Mean-squared Error: ', num2str(immse(A, Ak))])
    disp(' ')
end
subplot(4,4,16), imshow(A, []), title('Original')

Fig. 9.7 The singular values of the cameraman image. Most of the important data in the image is contained in a small number of rank-1 images.

Fig. 9.8 Images with progressive rank from the sum of SVD outer products
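A closing note: for a large matrix, there is no need to compute the full SVD only to discard most of it. The sketch below uses MATLAB's svds function, which returns only the leading \(k\) singular triplets; it assumes the same cameraman image as the script above.

% Sketch: leading k singular triplets only, via svds.
A = im2double(imread('cameraman.tif'));
k = 15;
[Uk, Sk, Vk] = svds(A, k);    % k largest singular values and vectors
Ak = Uk * Sk * Vk';           % same rank-k image as before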