12.5. Calculating the SVD with Unitary Transformations¶
The classic algorithm for calculating the SVD called for finding the eigendecomposition of both \(\m{A}^T\m{A}\) and \(\m{A\, A}^T\). If \(\m{A}\) is a large matrix, just calculating \(\m{A}^T\m{A}\) and \(\m{A\, A}^T\) is slow and also degrades the condition number, causing poor numerical stability. Thus, the SVD had little practical value before the improved algorithm was available [MOLER06].
The Householder and Givens transformations were available in the early 1960s and the eigenvalue problem was solved using the Francis algorithm. Hence, the attention of numerical analysis researchers switched from the eigenvalue problem to the SVD problem. A seminal article by Gene Golub and William Kahan in 1965 [GOLUB65] inaugurated the journey toward a modern SVD algorithm. From 1965 to 1970, Gene Golub, William Kahan, and Christian Reinsch developed a fast SVD algorithm [GOLUB70]. The algorithm they developed is not only faster than previous algorithms but also provides results that are numerically more accurate [YETURU20], [LAMBERS10].
The much-improved strategy to the classic SVD algorithm uses Householder and Givens unitary [1] transforms to drive the matrix to diagonal form. The overall procedure is similar to the diagonalization procedure for finding the Schur decomposition. The main differences are when the iterative parts finish, and that the SVD calculation does not have the similarity requirement. The iterative phase of the diagonalization procedure is finished when the matrix is upper triangular, while the iterative SVD procedure continues until the matrix is diagonal.
12.5.1. Golub–Kahan 1965¶
Golub and Kahan propose beginning the procedure with Householder transformations to convert the matrix into bidiagonal form. The left transformations move the elements below the diagonal to zero. The transforms that multiply from the right reduce elements to the right of the superdiagonal to zeros. Matrix \(\mathbf{A} \in \mathbb{C}^{m{\times}n}\) is factored as
where \(\mathbf{P}_{m{\times}m}\) and \(\mathbf{Q}_{n{\times}n}\) are orthogonal and \(\mathbf{J}_{m{\times}n}\) is a bidiagonal matrix.
Matrices \(\bf{A}\), \(\bf{J}\), and diagonal \(\bf{\Sigma}\) are orthogonally equivalent [2] meaning that they have the same singular values.
Orthogonally Equivalent Matrices
Theorem 12.7 (Orthogonally Equivalent Matrices)
Given matrices \(\m{C} \text{ and } \m{A} \in \mathbb{C}^{m{\times}n}\), \(\m{Q}_1 \in \mathbb{C}^{m{\times}m}\), and \(\m{Q}_2 \in \mathbb{C}^{n{\times}n}\), such that \(\mathbf{C} = \mathbf{Q}_1\,\mathbf{A\,Q}_2^{-1}\). Then \(\m{C}\) is orthogonally equivalent to \(\m{A}\), if \(\mathbf{Q}_1\) and \(\mathbf{Q}_2\) are unitary.
Proof. Definitions establish the proof. The unitary transformations adapt as needed to trim the original matrix to a diagonal form. As long as \(\m{Q}_1\) and \(\m{Q}_2\) are unitary, then \(\m{C}\) is not modified from \(\m{A}\) in a manner that would change the singular values. Differences in SVD factors of \(\m{A}\) and \(\m{C}\) are in the \(\m{U}\) and \(\m{V}\) factors, not \(\m{\Sigma}\), \(\mathbf{C} = \mathbf{Q}_1 \mathbf{U\, \Sigma\, V}^T \mathbf{Q}_2^T\). The \(\bf{U}\) and \(\bf{V}\) factors of \(\bf{C}\) are still unitary because \(\bf{Q}_1\) and \(\bf{Q}_2\) are unitary.
\(\qed\)
The bidiagonal form makes significant progress, but the bigger challenge
of equation (12.15) is to find the SVD of \(\bf{J}\).
The function bidiag
implements the bidiagonal decomposition.
Golub and Kahan make several observations about how various algorithms known at the time could be used to compute the singular values of \(\bf{J}\). Yet, they do not propose an exact algorithm. The strategy that they suggest might hold promise is to form a symmetric, tridiagonal matrix, \(\bf{T}\), from the first \(n\) rows of \(\bf{J}\) (remove the \(m-n\) rows of zeros from \(\bf{J}\)), and find the eigenvalues of \(\bf{T}\). \(\bf{T}\) has size \(2n{\times}2n\), and has a positive and negative eigenvalue equivalent to each singular value of \(\bf{J}\). To generate \(\bf{T}\), first build a symmetric \(2n{\times}2n\) matrix, \(\bf{C}\) from \(\bf{J}\) and then use a so called “perfect shuffle” to rearrange the elements of \(\bf{C}\) such that the subdiagonal and superdiagonal contain the same values, and zeros are on the diagonal.
If \(\bf{A}\) is over-determined, then \(\bf{A}\), \(\bf{J}\), and \(\bf{T}\) all have different sizes, which prevents finding the left and right singular vectors (\(\bf{U}\) and \(\bf{V}\)) directly from the procedure for finding the singular values. Golub and Kahan’s 1965 paper offers several suggestions on how the singular vectors might be found, with insights into the advantages and disadvantages of each method. However, they do not propose a specific procedure for finding the singular vectors.
12.5.2. Bidiagonal Decomposition¶
The bidiagonal decomposition is a preliminary step to the iterative
algorithm to calculate the SVD. A bidiagonal matrix is upper triangular,
where the only nonzero elements are on the diagonal and superdiagonal.
The bidiagonal decomposition of \(\mathbf{A}_{m{\times}n}\) is given
by \(\mathbf{A} = \mathbf{P\,J\,Q}^H\), where
\(\mathbf{P}_{m{\times}m}\) and \(\mathbf{Q}_{n{\times}n}\) are
orthogonal and \(\mathbf{J}_{m{\times}n}\) is bidiagonal. The
function bidiag
uses a sequence of Householder transformations to
convert the matrix to bidiagonal form. Because similarity is not
required, the transforms multiplied from the right set all row elements
to the right of the superdiagonal to zero.
function [P, J, Q] = bidiag(A)
% BIDIAG - Biagonal factoring of a matrix using Householder
% reflection transformations. Nonzero values are only
% along the main diagonal and the first superdiagonal.
%
% A - input matrix, square or over-determined (m-by-n)
%
% P - Pre-multiply transform
% J - bidiagonal of A - same singular values as A
% Q - Post-multiply transform
%
% A = P*J*Q'
%
% see also: HOUSEHOLDER, SVD65, SVD, svdFrancis
[m, n] = size(A);
if n > m % under-determined
error('Use the transform of under-determined matrices.')
end
P = eye(m); Q = eye(n); J = A;
% Use Householder transformations to clear values below the
% diagonal and to the right of the first superdiagonal.
for i=1:n-2
H = householder(J(i:m,i), m, i-1);
J = H*J; P = P*H'; J(i+1:m,i) = zeros(m-i,1);
H = householder(J(i,i+1:n)', n, i);
J = J*H'; Q = Q*H'; J(i,i+2:n) = zeros(1,n-i-1);
% For real numbers, H = H', but not for complex numbers
end
% last column of J to change for square matrix
H = householder(J(n-1:m,n-1), m, n-2);
J = H*J; P = P*H'; J(n:m,n-1) = zeros(m-n+1,1);
% last column of J for over-determined matrix
if m > n
H = householder(J(n:m,n), m, n-1);
J = H*J; P = P*H'; J(n+1:m,n) = zeros(m-n, 1);
end
if ~isreal(J)
% There are complex values at J(n-1,n) and J(n,n). We can
% only fix J(n,n).
H = eye(m); % make J(n,n) real
H(n,n) = conj(J(n,n))/abs(J(n,n));
J = H*J; P = P*H';
end
end % end of function
12.5.3. Golub–Reinsch 1970¶
Following Golub and Kahan’s suggestions, researchers explored several algorithmic possibilities between 1965 and 1970. Golub and Christian Reinsch published a paper in 1970 that computed the SVD in a novel way [GOLUB70]. They recognized redundancies in processing the larger \(\bf{T}\) matrix with the Francis algorithm to find the eigenvalues of \(\bf{T}\). They proposed using a variant of the Francis algorithm directly on \(\bf{J}\). The adjustments to the Francis algorithm reflect that a bidiagonal matrix has zeros on the subdiagonal, and the similarity requirement does not constrain the SVD. The Golub–Reinsch algorithm creates the bulge at position (2, 1) from a Givens transform multiplied from the right. As with previous Francis algorithms, the bulge-creating transform includes an implicit shift from the eigenvalues of the lower-right \(2n{\times}2n\) submatrix of \(\mathbf{J}^H\mathbf{J}\). The products of the left and right unitary transforms form the singular vector matrices \(\m{U}\) and \(\m{V}\). The Golub–Reinsch algorithm accomplished the long-standing goal of developing a fast and accurate algorithm to compute the SVD of a matrix.
12.5.4. Demmel–Kahan 1990¶
James Demmel and William Kahan published a small improvement to the Golub–Reinsch algorithm in 1990 [DEMMEL90]. Although researchers continue to explore algorithmic refinements for computing the SVD, the 1990 Demmel–Kahan algorithm represents the foundation for the current best-known algorithm.
The 1990 Demmel–Kahan SVD algorithm skips the step of explicitly creating a bulge with a shift at the beginning of each Francis step, which gives an improvement over the Golub–Reinsch algorithm with better accuracy when finding small singular values. As shown in equation (12.16), the sequence of transforms in a Francis step starts with a Givens transform post-multiplied from the right on the top row. A bulge is created at position (2, 1) and moves through the same sequence of positions as with the Golub–Reinsch algorithm. Each transform applied from the left targets the bulge just below the diagonal. A transform post-multiplied from the right targets a value on the superdiagonal and simultaneously sets the bulge above the superdiagonal to zero.
(12.16)¶\[ \begin{align}\begin{aligned}\begin{split}\begin{bmatrix} * & * & & & \\ & * & * & & \\ & & * & * & \\ & & & * & *\\ & & & & * \end{bmatrix} \quad \begin{bmatrix} * & 0 & & & \\ + & * & * & & \\ & & * & * & \\ & & & * & *\\ & & & & * \end{bmatrix} \quad \begin{bmatrix} * & * & + & & \\ 0 & * & * & & \\ & & * & * & \\ & & & * & *\\ & & & & * \end{bmatrix} \\\end{split}\\\begin{split}\begin{bmatrix} * & * & 0 & & \\ & * & 0 & & \\ & + & * & * & \\ & & & * & *\\ & & & & * \end{bmatrix} \quad \begin{bmatrix} * & * & & & \\ & * & * & + & \\ & 0 & * & * & \\ & & & * & *\\ & & & & * \end{bmatrix} \quad \begin{bmatrix} * & * & & & \\ & * & * & 0 & \\ & & * & 0 & \\ & & + & * & *\\ & & & & * \end{bmatrix} \\\end{split}\\\begin{split}\begin{bmatrix} * & * & & & \\ & * & * & & \\ & & * & * & +\\ & & 0 & * & *\\ & & & & * \end{bmatrix} \quad \begin{bmatrix} * & * & & & \\ & * & * & & \\ & & * & * & 0\\ & & & * & 0\\ & & & + & * \end{bmatrix} \quad \begin{bmatrix} * & * & & & \\ & * & * & & \\ & & * & * & \\ & & & * & *\\ & & & 0 & * \end{bmatrix}\end{split}\end{aligned}\end{align} \]
One Francis SVD step on a \(5n{\times}5n\) bidiagonal matrix. A plus sign marks the bulge’s location after applying each Givens transformation. Matrix elements moved to zero by the transform are noted with a 0. Nonzero elements along the diagonal and superdiagonal are marked with a \(*\).
The svd65
function returns the SVD factors of a matrix. It handles
the administrative tasks. It calls the bidiag
function to put the
matrix in bidiagonal form, calls the svdFrancis
function to run the
SVD algorithm, and returns the combined SVD factors for either the full
or the economy SVD. The svdFrancis
function is a simplified
implementation of the Demmel–Kahan algorithm.
Note that it is not necessary to sort the singular values. The algorithm automatically finds the singular values from the smallest absolute value to the largest. The \(\bf{\Sigma}\) matrix has the most significant singular value in the first diagonal position. The Givens transformations push the larger values toward the upper-left of the matrix. The algorithm may need to multiply singular values and the corresponding columns of \(\m{U}\) by -1 to ensure that all singular values are positive.
Here is an example of how to test the accuracy of the decomposition and the orthogonality of the left and right singular vectors.
>> A = randi(20, 7, 5) - 8;
>> [U,S,V] = svd65(A);
>> norm(A*V - U*S, 'fro')
ans =
5.1878e-13
>> % Test for orthogonal
>> norm(U'*U - eye(7), 'fro')
ans =
2.7299e-15
>> norm(V'*V - eye(5), 'fro')
ans =
2.8669e-15
function [U,S,V] = svd65(A, mode)
% SVD65 - A simple singular value decomposition (SVD).
%
% [U,S,V] = SVD65(A) returns matices such that A = U*S*V'.
% S = SVD65(A) returns the singular values of A.
% [U,S,V] = SVD65(A, 'econ') and S = SVD65(A, 'econ') finds the
% economy svd.
%
% A - input matrix (m x n), real or complex
%
% U - (m x m) unitary matrix - same as the eigenvectors of A*A'
% S - (m x n) diagonal matrix containing the sorted, real singular
% values of A
% V - (n x n) unitary matrix - same as the eigenvectors of A'*A
% U*S*V' = A
%
% It is called svd65 because the paper by Golub and Kahan
% was published in 1965.
%
% see also: SVD, BIDIAG, SVDFRANCIS
if ~exist("mode", "var")
mode = 'full';
else
if ~any(mode == ["full" "econ"])
disp("Optional second argument should be one of 'full' for")
disp("a full SVD or 'econ' for the economy SVD.")
disp("The default is 'full'.")
end
end
[m, n] = size(A);
if m >= n % square or over-determined
[P,J,Q] = bidiag(A); % Householder bidiagonal
[X,S,Y] = svdFrancis(J); % Francis algorithm
U = P*X; V = Q*Y;
if mode == "econ"
U = U(:, 1:n);
S = S(1:n,:);
end
else % under-determined - Use A' to find SVD of
% over-determined matrix, then take the transpose.
[P,J,Q] = bidiag(A'); % Householder bidiagonal
[X,S,Y] = svdFrancis(J); % Francis algorithm
% A' = P*X*S*Y'*Q'
V = P*X; U = Q*Y; S = S'; % A = Q*Y*S*X'*P'
if mode == "econ"
S = S(:,1:m);
V = V(:,1:m);
end
end
if ~isreal(S), S = real(S); end
if nargout <= 1, U = diag(S); end
end
function [U,S,V] = svdFrancis(J, tol)
% SVDFRANCIS - SVD of a bidiagonal matrix using Francis's algorithm
% with zero shift. This is a simplified 1990 Demmel--Kahan
% algorithm implementation.
%
% Call SVD65, not this function. It deals with under-determined
% matrices, converts to bidagonal, calls this function, and
% returns the full or economy SVD factors.
% Input matrix J must be in bidiagonal form.
% [U,S,V] = SVDFRANCIS(J) returns the SVD factors. U and V are
% orthogonal. S is diagonal.
% S = SVDFRANCIS(J) returns the singular values.
% An optional tolerance value may be specified.
%
% This function is for educational purposes only.
%
% See also: BIDIAG, GIVENS, SVD65, SVD
[m,n] = size(J);
if n > m % under-determined
error('Use the transform of under-determined matrices.')
end
if ~exist('tol', 'var')
tol=eps*2048;
end
S = J;
k = n;
U = eye(m);
V = eye(n);
while k >= 2
for c = 1:k-1 % c = current column
% A(c,c+1) & A(c-1,c+1) = 0, A(c+1,c) nonzero.
G = givens(S(c,c), S(c, c+1), c, c+1, n);
S = S*G.'; % A = U*S*G1.'*G2.'*...*Gn.'*V'
S(c, c+1) = 0; % no round-off error
if c>1, S(c-1,c+1) = 0; end
V = V*G.'; % V = G1.'*G2.'*...Gn.',
% V' = conj(Gn*...*G2*G1)
% A(c+1, c) = 0, A(c, c+1) & A(c,c+2) nonzero.
G = givens(S(c,c), S(c+1, c), c, c+1, m);
S = G*S; % A = U*Gn*...*G2*G1*S*V'
S(c+1, c) = 0; % no round-off error
U = U*G'; % U = G1'*G2'*...*Gn
end % for - finished Francis step
% convergence test
if abs(S(k-1, k)) < tol
S(k-1, k) = 0;
k = k-1; % deflation
end
end %while
if ~isreal(S), S = real(S); end
% Use a logical vector to fix any negative values in S
belowZero = diag(S) < 0;
U(:,belowZero) = -U(:,belowZero);
S(belowZero,belowZero) = -S(belowZero,belowZero);
if nargout<=1
U=diag(S);
else % clear any round-off error above the diagonal.
S(1:n, 1:n) = diag(diag(S));
end
end % function
Footnotes: