12.5. Calculating the SVD with Unitary Transformations
As mentioned in Matrix Factorizations, the singular value decomposition has a wealth of practical applications in linear algebra as well as in many science and engineering disciplines. Whereas the eigendecomposition applies only to square matrices, the SVD provides a factoring for any matrix. Many matrices that contain data are rectangular.
The factoring is $A = U\,\Sigma\,V^H$. If $A$ is real, both $U$ and $V$ are real orthogonal matrices. If $A$ is complex, then $U$ and $V$ are complex unitary matrices. The $\Sigma$ matrix is always a diagonal matrix of positive, real, scalar values that we call the singular values.
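A quick numeric check of these properties with MATLAB's built-in svd function (the test matrix here is arbitrary):

>> A = randn(5, 3);             % random real, rectangular matrix
>> [U, S, V] = svd(A);          % full SVD: A = U*S*V'
>> norm(A - U*S*V', 'fro')      % near machine precision
>> norm(U'*U - eye(5), 'fro')   % U is orthogonal
>> norm(V'*V - eye(3), 'fro')   % V is orthogonal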
Note
The SVD originated in 1873, when Eugenio Beltrami published a paper describing it. Unaware of Beltrami’s paper, Camille Jordan independently published a paper describing the decomposition a year later [STEWART93].
Unfortunately, the classic algorithm for calculating the SVD called for finding the eigendecomposition of both $A^H A$ and $A A^H$. If $A$ is a large matrix, just calculating $A^H A$ and $A A^H$ is slow, and forming the products also squares the condition number, causing poor numerical stability.
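The classic route can be sketched with MATLAB built-ins; comparing cond(A) with cond(A'*A) shows the squared condition number (the test matrix is arbitrary):

>> A = randn(6, 4);
>> sqrt(sort(eig(A'*A), 'descend'))   % classic route to the singular values
>> svd(A)                             % same values from the stable algorithm
>> [cond(A) cond(A'*A)]               % cond(A'*A) equals cond(A)^2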
Cleve Moler, co-founder of MathWorks, observed in a blog post that the
SVD was of little practical value until
1970 [MOLER06]. Even when computers became available,
the algorithm was too slow to be useful.
In the early 1960s, the Householder and Givens transformations were available, and the eigenvalue problem was solved using the Francis orthogonal factoring algorithm. Hence, the attention of numerical analysis researchers switched from the eigenvalue problem to the SVD problem. A seminal article by Gene Golub and William Kahan in 1965 [GOLUB65] inaugurated the journey toward a modern SVD algorithm. From 1965 to 1970, Gene Golub, William Kahan, and Christian Reinsch developed a fast SVD algorithm [GOLUB70]. The algorithm that they developed is not only faster than previous algorithms, but it also provides results that are numerically more accurate [YETURU20], [LAMBERS10].
The much-improved alternative to the classic SVD algorithm uses Householder and Givens unitary transforms to drive the matrix to diagonal form. The overall procedure is similar to the diagonalization procedure for finding the Schur decomposition. The main differences are that the SVD calculation does not have the similarity requirement, and the SVD procedure must continue until the matrix is diagonal, whereas the Schur diagonalization procedure uses only similarity transformations and stops when the matrix is upper triangular.
12.5.1. Golub–Kahan 1965
Golub and Kahan propose beginning the procedure with Householder transformations to convert the matrix into bidiagonal form. The left transformations move the elements below the diagonal to zero. The transforms that multiply from the right reduce elements to the right of the superdiagonal to zeros. Matrix $A$ is factored as

$$A = P\,J\,Q^H \qquad (12.12)$$

where $P$ and $Q$ are orthogonal and $J$ is a bidiagonal matrix. Matrices $A$, $J$, and diagonal $\Sigma$ are orthogonally equivalent [1], meaning that they have the same singular values.
12.5.1.1. Orthogonally Equivalent Matrices
Theorem 12.8 (Orthogonally Equivalent Matrices)

Given matrices $A$ and $B$ such that $B = U^H A V$, then $A$ is orthogonally equivalent to $B$ if $U$ and $V$ are unitary.
Proof. Because $U$ and $V$ are unitary, $B^H B = V^H A^H U U^H A V = V^H (A^H A) V$, which is a similarity transformation of $A^H A$. Thus $B^H B$ and $A^H A$ share eigenvalues, and hence $B$ and $A$ share singular values. The unitary transformations adapt as needed to trim the original matrix to a diagonal form without changing the singular values. Differences in the SVD factors of $A$ and $B$ appear in the $U$ and $V$ factors, not in $\Sigma$.
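The theorem is easy to test numerically. This sketch draws random orthogonal factors from qr, an assumption of the example rather than part of the proof:

>> A = randn(5, 4);
>> [U, ~] = qr(randn(5));   % random orthogonal U
>> [V, ~] = qr(randn(4));   % random orthogonal V
>> B = U'*A*V;              % orthogonally equivalent to A
>> norm(svd(A) - svd(B))    % near zero: same singular values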
The bidiagonal form makes significant progress, but the bigger challenge of equation (12.12) is to find the SVD of $J$. The function bidiag found in Bidiagonal Decomposition implements the bidiagonal decomposition.
Golub and Kahan make several observations about how various algorithms known at the time could be used to compute the singular values of $J$, but they do not propose an exact algorithm. The strategy that they suggest as having the most promise is to form a symmetric, tridiagonal matrix, $T$, from the first $n$ rows of $J$ (remove the rows of zeros from $J$) and find the eigenvalues of $T$. $T$ has size $2n \times 2n$ and has a positive and a negative eigenvalue corresponding to each singular value of $J$. To generate $T$, first build a symmetric matrix, $C = \begin{bmatrix} 0 & J^H \\ J & 0 \end{bmatrix}$, from $J$ and then use a so-called “perfect shuffle” to rearrange the elements of $C$ such that the subdiagonal and superdiagonal contain the same values and zeros are on the diagonal.
To implement the perfect shuffle, create a permutation matrix, $P$, from the columns of an identity matrix. Let $e_k$ represent the $k$th column of an identity matrix of size $2n$. Then $P = [\,e_1\; e_{n+1}\; e_2\; e_{n+2}\; \cdots\; e_n\; e_{2n}\,]$, and $T = P^T C P$. The function pShuffle, listed below, shows how to build $T$ from $J$.
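Because $P$ merely permutes rows and columns, the same shuffle can be written with an index vector. This alternative sketch is equivalent to, but not taken from, the pShuffle listing:

n = size(J, 2);
p = reshape([1:n; n+1:2*n], 1, []);   % p = [1 n+1 2 n+2 ... n 2n]
C = [zeros(n) J'; J zeros(n)];        % symmetric block matrix
T = C(p, p);                          % identical to P'*C*P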
The example below shows finding the positive and negative eigenvalues of $T$ corresponding to the singular values of a square matrix using the Golub–Kahan 1965 algorithm. Since $T$ is symmetric and in Hessenberg form, we can use the Francis algorithm to find the eigenvalues of $T$.
>> A = randi(20, 3) - 8;
>> [P,J,Q] = bidiag(A);
>> J
J =
-7.8740 -7.4801 0
0 11.6766 3.0656
0 0 -0.5547
>> T = pShuffle(J)
T =
0 -7.8740 0 0 0 0
-7.8740 0 -7.4801 0 0 0
0 -7.4801 0 11.6766 0 0
0 0 11.6766 0 3.0656 0
0 0 0 3.0656 0 -0.5547
0 0 0 0 -0.5547 0
>> S1 = sort(eigFrancis(T), 'descend')
S1 =
14.8423
6.5905
0.5214
-0.5214
-6.5905
-14.8423
>> S2 = svd(A)
S2 =
14.8423
6.5905
0.5214
function T = pShuffle(J)
% This is the "perfect shuffle" of bidiagonal J to the
% tridiagonal, symmetric T. T has values from J on the
% superdiagonal and subdiagonal and zeros on the diagonal.
    n = size(J,2);
    P = zeros(2*n,2*n);  E = eye(2*n,2*n);
    for k = 0:n-1
        P(:,1+2*k) = E(:,1+k);  P(:,2+2*k) = E(:,n+k+1);
    end
    C = [zeros(n) J'; J zeros(n)];
    T = P'*C*P;
end
If $A$ is over-determined, then $U$, $\Sigma$, and $V$ all have different sizes, which prevents finding the left and right singular vectors ($U$ and $V$) directly from the procedure for finding the singular values. Golub and Kahan’s 1965 paper offers several suggestions on how the singular vectors might be found, with insights into the advantages and disadvantages of each method. However, they do not propose a specific procedure for finding the singular vectors.
12.5.2. Golub–Reinsch 1970
Following Golub and Kahan’s suggestions, researchers explored several algorithmic possibilities between 1965 and 1970. Golub and Christian Reinsch published a paper in 1970 [GOLUB70] that established the best algorithm as of that time. They recognized redundancies in processing the larger $2n \times 2n$ matrix with the Francis algorithm to find the eigenvalues of $T$. They proposed using a variant of the Francis algorithm directly on $J$. The adjustments to the Francis algorithm reflect that a bidiagonal matrix has zeros on the subdiagonal, and the similarity requirement does not constrain the SVD. The Golub–Reinsch algorithm creates the bulge at position (2, 1) from a Givens transform multiplied from the right. As with previous Francis algorithms, the bulge-creating transform includes an implicit shift from the eigenvalues of the lower-right $2 \times 2$ submatrix of $J^H J$. The products of the left and right unitary transforms form the singular vector matrices $U$ and $V$. The Golub–Reinsch algorithm accomplished the long-standing goal of developing a fast and accurate algorithm to compute the SVD of a matrix.
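As a sketch of the shift computation, assuming a bidiagonal matrix J with active (undeflated) size k (the variable names here are illustrative, not from the Golub–Reinsch paper):

T2 = J(:,k-1:k)' * J(:,k-1:k);    % trailing 2x2 block of J'*J
ev = eig(T2);                     % its two eigenvalues
[~, i] = min(abs(ev - T2(2,2)));
shift = ev(i);                    % eigenvalue nearest the corner entry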
12.5.3. Demmel–Kahan 1990
James Demmel and William Kahan published a small improvement to the Golub–Reinsch algorithm in 1990 [DEMMEL90]. Although researchers continue to explore algorithmic refinements for computing the SVD, the 1990 Demmel–Kahan algorithm, which is also known as Francis’s algorithm with a zero shift, represents the foundation for the current best-known algorithm.
The 1990 Demmel–Kahan SVD algorithm skips the step of explicitly creating a bulge with a shift at the beginning of each Francis step, which gives an improvement over the Golub–Reinsch algorithm with better accuracy when finding small singular values. The sequence of transforms in a Francis step starts with a Givens transform post-multiplied from the right on the top row. A bulge stemming from the values in the matrix is created at position (2, 1) and moves through the same sequence of positions as with the Golub–Reinsch algorithm. Each transform applied from the left targets the bulge just below the diagonal. A transform post-multiplied from the right targets a value on the superdiagonal and simultaneously sets the bulge above the superdiagonal to zero.
The example below shows an illustration of a Francis step applied to a bidiagonal matrix. A plus sign marks the bulge’s location after applying each Givens transformation. Matrix elements moved to zero by the transform are noted with a 0. Nonzero elements along the diagonal and superdiagonal are marked with a $*$.
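The sketch below assumes a 4-by-4 bidiagonal matrix; the size is illustrative.

After the first transform from the right (columns 1 and 2):

*  0
+  *  *
      *  *
         *

After the first transform from the left (rows 1 and 2):

*  *  +
0  *  *
      *  *
         *

After the second transform from the right (columns 2 and 3):

*  *  0
   *  0
   +  *  *
         *

After the second transform from the left (rows 2 and 3):

*  *
   *  *  +
   0  *  *
         *

The final pair of transforms chases the bulge to position (4, 3) and removes it, returning the matrix to bidiagonal form.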
The svd65
function returns the SVD factors of a matrix. It handles the
administrative tasks. It calls the bidiag
function to put the matrix in
bidiagonal form, calls the svdFrancis
function to run the SVD algorithm,
and returns the combined SVD factors for either the full or economy SVD. The
svdFrancis
function is a simplified implementation of the Demmel–Kahan
algorithm, which finds the SVD using a variant of the Francis algorithm with a
zero shift.
Note that it is not necessary to sort the singular values. The Francis algorithm automatically finds the singular values from smallest to largest. The $S$ matrix has the most significant singular value in the first diagonal position. The Givens transforms push the larger values toward the upper-left of the matrix. The algorithm may need to multiply singular values and the corresponding columns of $U$ by $-1$ to ensure that all singular values are positive.
The next example shows computing the SVD of a random over-determined matrix to test the accuracy of the decomposition and the orthogonality of the left and right singular vectors.
>> A = randi(20, 7, 5) - 8;
>> [U,S,V] = svd65(A);
>> norm(A*V - U*S, 'fro')
ans =
5.1878e-13
>> S
S =
22.1789 0 0 0 0
0 18.4649 0 0 0
0 0 14.8821 0 0
0 0 0 8.6344 0
0 0 0 0 1.7654
0 0 0 0 0
0 0 0 0 0
>> norm(U'*U - eye(7), 'fro')
ans =
2.7299e-15
>> norm(V'*V - eye(5), 'fro')
ans =
2.8669e-15
12.5.3.1. The svd65 Function
function [U,S,V] = svd65(A, mode)
% SVD65 - A simple singular value decomposition (SVD).
%
% [U,S,V] = SVD65(A) returns matrices such that A = U*S*V'.
% S = SVD65(A) returns the singular values of A.
% [U,S,V] = SVD65(A, 'econ') and S = SVD65(A, 'econ') find the
% economy SVD.
%
% A - input matrix (m x n), real or complex
%
% U - (m x m) unitary matrix - same as the eigenvectors of A*A'
% S - (m x n) diagonal matrix containing the sorted, real singular
%     values of A
% V - (n x n) unitary matrix - same as the eigenvectors of A'*A
% U*S*V' = A
%
% It is called svd65 because the seminal paper by Golub and Kahan
% on finding a fast and accurate SVD algorithm was published in 1965.
%
% See also: SVD, BIDIAG, SVDFRANCIS

    if ~exist("mode", "var")
        mode = 'full';
    else
        if ~any(mode == ["full" "econ"])
            disp("Optional second argument should be one of 'full' for")
            disp("a full SVD or 'econ' for the economy SVD.")
            disp("The default is 'full'.")
        end
    end
    [m, n] = size(A);
    if m >= n                        % square or over-determined
        [P,J,Q] = bidiag(A);         % Householder bidiagonal
        [X,S,Y] = svdFrancis(J);     % Francis algorithm
        U = P*X;  V = Q*Y;
        if mode == "econ"
            U = U(:, 1:n);
            S = S(1:n,:);
        end
    else    % under-determined - use A' to find the SVD of the
            % over-determined matrix, then take the transpose.
        [P,J,Q] = bidiag(A');        % Householder bidiagonal
        [X,S,Y] = svdFrancis(J);     % Francis algorithm
        % A' = P*X*S*Y'*Q'
        V = P*X;  U = Q*Y;  S = S';  % A = Q*Y*S*X'*P'
        if mode == "econ"
            S = S(:,1:m);
            V = V(:,1:m);
        end
    end
    if ~isreal(S), S = real(S); end
    if nargout <= 1, U = diag(S); end
end
12.5.3.2. The svdFrancis Function
function [U,S,V] = svdFrancis(J, tol)
% SVDFRANCIS - Singular value decomposition (SVD) of a bidiagonal matrix
%              using Francis's algorithm with zero shift. This is a
%              simplified 1990 Demmel--Kahan algorithm implementation.
%
% Usually call SVD65, not this function. It deals with under-determined
% matrices, converts to bidiagonal, calls this function, and
% returns the full or economy SVD factors.
% Input matrix J must be in bidiagonal form.
% [U,S,V] = SVDFRANCIS(J) returns the SVD factors. U and V are
% orthogonal. S is diagonal with the singular values on the diagonal.
% S = SVDFRANCIS(J) returns the singular values.
% An optional tolerance value may be specified.
%
% This function is for educational purposes only.
%
% See also: BIDIAG, GIVENS, SVD65, SVD

    [m,n] = size(J);
    if n > m    % under-determined
        error('Use the transpose of under-determined matrices.')
    end
    if ~exist('tol','var')
        tol = eps*2048;
    end
    S = J;
    k = n;
    U = eye(m);
    V = eye(n);
    while k >= 2
        for c = 1:k-1   % c = current column
            % After this transform: S(c,c+1) & S(c-1,c+1) = 0,
            % S(c+1,c) nonzero (the bulge).
            G = givens(S(c,c), S(c,c+1), c, c+1, n);
            S = S*G.';          % A = U*S*G1.'*G2.'*...*Gn.'*V'
            S(c,c+1) = 0;       % no round-off error
            if c > 1, S(c-1,c+1) = 0; end
            V = V*G.';          % V = G1.'*G2.'*...*Gn.', so
                                % V' = conj(Gn*...*G2*G1); transposing V
                                % reverses the order and transposes each G
            % After this transform: S(c+1,c) = 0,
            % S(c,c+1) & S(c,c+2) nonzero.
            G = givens(S(c,c), S(c+1,c), c, c+1, m);
            S = G*S;            % A = U*Gn*...*G2*G1*S*V'
            S(c+1,c) = 0;       % no round-off error
            U = U*G';           % U = G1'*G2'*...*Gn
        end % for - finished Francis step
        % convergence test
        if abs(S(k-1,k)) < tol
            S(k-1,k) = 0;
            k = k - 1;          % deflation
        end
    end % while
    if ~isreal(S), S = real(S); end
    % Use a logical vector to fix any negative values in S
    belowZero = diag(S) < 0;
    U(:,belowZero) = -U(:,belowZero);
    S(belowZero,belowZero) = -S(belowZero,belowZero);
    if nargout <= 1
        U = diag(S);
    else    % clear any round-off error above the diagonal
        S(1:n,1:n) = diag(diag(S));
    end
end % function
[1] The term isometric is also used to describe matrices with the same singular values.