12.5. Calculating the SVD with Unitary Transformations

As mentioned in Matrix Factorizations, the singular value decomposition has a wealth of practical applications in linear algebra as well as in many science and engineering disciplines. Whereas the eigendecomposition applies only to square matrices, the SVD provides a factoring for any matrix, which matters because many matrices that contain data are rectangular.

The factoring is \mathbf{A} = \mathbf{U\, \Sigma \,V}^H. If \mathbf{A} is real, both \mathbf{U} and \mathbf{V} are real orthogonal matrices and \mathbf{V}^H = \mathbf{V}^T. If \mathbf{A} is complex, then \mathbf{U} and \mathbf{V} are complex unitary matrices. The \mathbf{\Sigma} matrix is always a diagonal matrix, the same size as \mathbf{A}, of real, nonnegative values that we call the singular values.
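
Before turning to algorithms, a short script using MATLAB's built-in svd function checks these properties numerically. The test matrix here is arbitrary.

A = randi(20, 4, 3) - 8;       % arbitrary rectangular test matrix
[U, S, V] = svd(A);
norm(A - U*S*V', 'fro')        % near zero: the factoring holds
norm(U'*U - eye(4), 'fro')     % near zero: U is orthogonal
norm(V'*V - eye(3), 'fro')     % near zero: V is orthogonal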

Note

The SVD dates to 1873, when Eugenio Beltrami published a paper describing it. Unaware of Beltrami’s paper, Camille Jordan independently published his own description a year later [STEWART93].

Unfortunately, the classic algorithm for calculating the SVD called for finding the eigendecomposition of both \mathbf{A}^T\mathbf{A} and \mathbf{A\, A}^T. If \mathbf{A} is a large matrix, just forming \mathbf{A}^T\mathbf{A} and \mathbf{A\, A}^T is slow, and the products square the condition number, causing poor numerical stability. Cleve Moler, co-founder of MathWorks, observed in a blog post that the SVD was of little practical value until 1970 [MOLER06]. Even when computers became available, the classic algorithm was too slow to be useful.
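
The classic approach can be sketched in a few lines. This is only a conceptual sketch: the eigenvalues of \mathbf{A}^T\mathbf{A} are the squared singular values, and because cond(A'*A) is roughly cond(A)^2, the small singular values lose accuracy.

% Conceptual sketch of the classic method: singular values from
% the eigenvalues of A'*A.
A = randi(20, 5, 3) - 8;
ev = sort(eig(A'*A), 'descend');
sigma = sqrt(max(ev, 0))       % max() guards tiny negative round-off
% Compare with svd(A), and note cond(A'*A) is roughly cond(A)^2.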

In the early 1960s, the Householder and Givens transformations were available, and the eigenvalue problem had been solved using Francis’s orthogonal factoring algorithm. Hence, the attention of numerical analysis researchers switched from the eigenvalue problem to the SVD problem. A seminal article by Gene Golub and William Kahan in 1965 [GOLUB65] inaugurated the journey toward a modern SVD algorithm. From 1965 to 1970, Gene Golub, William Kahan, and Christian Reinsch developed a fast SVD algorithm [GOLUB70]. Their algorithm is not only faster than previous algorithms, but also numerically more accurate [YETURU20], [LAMBERS10].

The much-improved strategy, relative to the classic SVD algorithm, uses Householder and Givens unitary transforms to drive the matrix to diagonal form. The overall procedure is similar to the diagonalization procedure for finding the Schur decomposition. The main differences are that the SVD calculation is not bound by the similarity requirement, and the SVD procedure must continue until the matrix is diagonal, whereas the diagonalization procedure uses similarity transformations only until the matrix is upper triangular.

12.5.1. Golub–Kahan 1965

Golub and Kahan propose beginning the procedure with Householder transformations to convert the matrix into bidiagonal form. The left transformations move the elements below the diagonal to zero. The transforms that multiply from the right reduce elements to the right of the superdiagonal to zeros. Matrix \mathbf{A} \in \mathbb{C}^{m{\times}n} is factored as

(12.12)\mathbf{A} = \mathbf{P\,J\,Q}^H,

where \mathbf{P}_{m{\times}m} and \mathbf{Q}_{n{\times}n} are unitary (orthogonal when \mathbf{A} is real) and \mathbf{J}_{m{\times}n} is a bidiagonal matrix.

\mathbf{J} = \begin{bmatrix}
       \alpha_1 & \beta_1 & & & \\
       & \alpha_2 & \beta_2 & & \\
       & & \ddots & \ddots & \\
       & & & \alpha_{n-1} & \beta_{n-1} \\
       & & & & \alpha_n \\
       0 & \cdots & & \cdots & 0 \\
       \vdots & & & & \vdots \\
       0 & \cdots & & \cdots & 0
\end{bmatrix}_{m{\times}n}
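
A short check of equation (12.12), using the bidiag function from Bidiagonal Decomposition (referenced below), confirms the factoring and previews the equivalence of singular values discussed next.

A = randi(20, 5, 3) - 8;
[P, J, Q] = bidiag(A);         % Householder bidiagonalization
norm(A - P*J*Q', 'fro')        % near zero: A = P*J*Q^H
norm(svd(A) - svd(J))          % near zero: same singular values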

Matrices \mathbf{A}, \mathbf{J}, and diagonal \mathbf{\Sigma} are orthogonally equivalent [1], meaning that they have the same singular values.

12.5.1.1. Orthogonally Equivalent Matrices

Theorem 12.8 (Orthogonally Equivalent Matrices)

Given matrices \mathbf{C}, \mathbf{A} \in
\mathbb{C}^{m{\times}n}, \mathbf{Q}_1 \in \mathbb{C}^{m{\times}m}, and \mathbf{Q}_2 \in \mathbb{C}^{n{\times}n} such that \mathbf{C} = \mathbf{Q}_1\,\mathbf{A\,Q}_2^{-1}, if \mathbf{Q}_1 and \mathbf{Q}_2 are unitary, then \mathbf{C} is orthogonally equivalent to \mathbf{A}.

Proof. Because \mathbf{Q}_2 is unitary, \mathbf{Q}_2^{-1} = \mathbf{Q}_2^H, so

\mathbf{C}^H\mathbf{C} = \mathbf{Q}_2\,\mathbf{A}^H\,\mathbf{Q}_1^H\,\mathbf{Q}_1\,\mathbf{A}\,\mathbf{Q}_2^H = \mathbf{Q}_2\left(\mathbf{A}^H\mathbf{A}\right)\mathbf{Q}_2^H.

Thus \mathbf{C}^H\mathbf{C} is similar to \mathbf{A}^H\mathbf{A}, so the two products have the same eigenvalues, and \mathbf{C} and \mathbf{A} have the same singular values. Differences in the SVD factors of \mathbf{A} and \mathbf{C} appear in the \mathbf{U} and \mathbf{V} factors, not in \mathbf{\Sigma}.

\qed
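
A numerical check of Theorem 12.8 is straightforward. Here, random orthogonal matrices come from QR factorizations of random matrices; any unitary \mathbf{Q}_1 and \mathbf{Q}_2 would do.

A = randn(5, 3);
[Q1, ~] = qr(randn(5));        % random 5x5 orthogonal matrix
[Q2, ~] = qr(randn(3));        % random 3x3 orthogonal matrix
C = Q1*A*Q2';                  % Q2' = inv(Q2) because Q2 is unitary
norm(svd(C) - svd(A))          % near zero: same singular values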

The bidiagonal form makes significant progress, but the bigger challenge of equation (12.12) is to find the SVD of \mathbf{J}. The function bidiag found in Bidiagonal Decomposition implements the bidiagonal decomposition.

\begin{aligned}
&\mathbf{J} = \mathbf{X\,\Sigma\,Y}^H \\
&\mathbf{A} = \mathbf{P\,X\,\Sigma\,Y}^H \mathbf{Q}^H \\
&\mathbf{A}  = \mathbf{U\,\Sigma\,V}^H, \: \text{where }
   \mathbf{U} = \mathbf{P\,X}, \: \text{and }
   \mathbf{V} = \mathbf{Q\,Y}
\end{aligned}
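
The composition can be verified numerically. The sketch below uses the built-in svd to factor \mathbf{J}, standing in for the Francis-based factoring developed later in this section.

A = randn(5, 3);
[P, J, Q] = bidiag(A);         % A = P*J*Q'
[X, Sigma, Y] = svd(J);        % J = X*Sigma*Y'
U = P*X;  V = Q*Y;
norm(A - U*Sigma*V', 'fro')    % near zero: A = U*Sigma*V'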

Golub and Kahan make several observations about how various algorithms known at the time could be used to compute the singular values of \mathbf{J}, but they do not propose an exact algorithm. The strategy that they suggest as the most promising is to form a symmetric, 2n{\times}2n tridiagonal matrix, \mathbf{T}, from the first n rows of \mathbf{J} (removing the m-n rows of zeros from \mathbf{J}), and find the eigenvalues of \mathbf{T}. \mathbf{T} has a positive and a negative eigenvalue corresponding to each singular value of \mathbf{J}. To generate \mathbf{T}, first build a symmetric 2n{\times}2n matrix, \mathbf{C}, from \mathbf{J}, and then use a so-called “perfect shuffle” to rearrange the elements of \mathbf{C} such that the subdiagonal and superdiagonal contain the values from \mathbf{J} and zeros are on the diagonal.

\mathbf{C} = \begin{bmatrix}
       \mathbf{0}_{n{\times}n} & \mathbf{J}^{H} \\
       \mathbf{J} & \mathbf{0}_{n{\times}n}
\end{bmatrix}

\mathbf{T} = \begin{bmatrix}
     0 & \alpha_1 & & & & & \\
     \alpha_1 & 0 & \beta_1 & & & & \\
     & \beta_1 & 0 & \alpha_2 & & & \\
     & & \alpha_2 & 0 & \ddots & & \\
     & & & \ddots & 0 & \beta_{n-1} & \\
     & & & & \beta_{n-1} & 0 & \alpha_n \\
     & & & & & \alpha_n & 0
\end{bmatrix}_{2n{\times}2n}

To implement the perfect shuffle, create a permutation matrix, \mathbf{P}, from the columns of an identity matrix. Let \bm{e}_i represent the ith column of an identity matrix of size 2n{\times}2n, then \mathbf{P} = [\bm{e}_1\:\bm{e}_{n+1}\:\bm{e}_2\:\bm{e}_{n+2}\cdots
\bm{e}_n\:\bm{e}_{2\,n}], and \mathbf{T} = \mathbf{P}^T\mathbf{C\,P}. The function pShuffle listed below shows how to build \mathbf{T} from \mathbf{J}. The example below shows finding the positive and negative eigenvalues of \mathbf{T} corresponding to the singular values of a square matrix using the Golub–Kahan 1965 algorithm. Since \mathbf{T} is symmetric and in Hessenberg (tridiagonal) form, we can use the Francis algorithm to find the eigenvalues of \mathbf{T}.

>> A = randi(20, 3) - 8;
>> [P,J,Q] = bidiag(A);
>> J
J =
   -7.8740   -7.4801         0
         0   11.6766    3.0656
         0         0   -0.5547
>> T = pShuffle(J)
T =
         0   -7.8740         0         0         0         0
   -7.8740         0   -7.4801         0         0         0
         0   -7.4801         0   11.6766         0         0
         0         0   11.6766         0    3.0656         0
         0         0         0    3.0656         0   -0.5547
         0         0         0         0   -0.5547         0
>> S1 = sort(eigFrancis(T), 'descend')
S1 =
   14.8423
    6.5905
    0.5214
   -0.5214
   -6.5905
  -14.8423
>> S2 = svd(A)
S2 =
   14.8423
    6.5905
    0.5214
function T = pShuffle(J)
% This is the "perfect shuffle" of bidiagonal J to the
% tridiagonal, symmetric T. T has values from J on the
% superdiagonal and subdiagonal and zeros on the diagonal.
% J is assumed to be square (n x n); remove any rows of zeros
% from a rectangular J before calling this function.
n = size(J,2);
P = zeros(2*n); E = eye(2*n);
for k = 0:n-1
    P(:,1+2*k) = E(:,1+k);     % odd columns: e_1 ... e_n
    P(:,2+2*k) = E(:,n+k+1);   % even columns: e_{n+1} ... e_{2n}
end
C = [zeros(n) J'; J zeros(n)];
T = P'*C*P;
end

If \mathbf{A} is over-determined, then \mathbf{A}, \mathbf{J}, and \mathbf{T} all have different sizes, which prevents finding the left and right singular vectors (\mathbf{U} and \mathbf{V}) directly from the procedure for finding the singular values. Golub and Kahan’s 1965 paper offers several suggestions on how the singular vectors might be found, with insights into the advantages and disadvantages of each method. However, they do not propose a specific procedure for finding the singular vectors.

12.5.2. Golub–Reinsch 1970

Following Golub and Kahan’s suggestions, researchers explored several algorithmic possibilities between 1965 and 1970. Golub and Christian Reinsch published a paper in 1970 [GOLUB70] that established the best algorithm as of that time. They recognized redundancies in processing the larger \mathbf{T} matrix with the Francis algorithm to find the eigenvalues of \mathbf{T}. They proposed using a variant of the Francis algorithm directly on \mathbf{J}. The adjustments to the Francis algorithm reflect that a bidiagonal matrix has zeros on the subdiagonal and that the similarity requirement does not constrain the SVD. The Golub–Reinsch algorithm creates the bulge at position (2, 1) with a Givens transform multiplied from the right. As with previous Francis algorithms, the bulge-creating transform includes an implicit shift from the eigenvalues of the lower-right 2{\times}2 submatrix of \mathbf{J}^H\mathbf{J}. The products of the left and right unitary transforms form the singular vector matrices \mathbf{U} and \mathbf{V}. The Golub–Reinsch algorithm accomplished the long-standing goal of developing a fast and accurate algorithm to compute the SVD of a matrix.
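
The shift computation can be sketched in a few lines. The variable names here are illustrative, and picking the eigenvalue closest to the last diagonal entry follows the Wilkinson shift convention used by earlier Francis algorithms.

% Sketch of the implicit shift from the trailing 2x2 of J'*J.
A = randn(5, 3);
[~, J, ~] = bidiag(A);
B = J'*J;
t = B(end-1:end, end-1:end);   % lower-right 2x2 submatrix
ev = eig(t);
[~, i] = min(abs(ev - t(2,2)));
mu = ev(i);                    % shift that starts the Francis step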

12.5.3. Demmel–Kahan 1990

James Demmel and William Kahan published a small improvement to the Golub–Reinsch algorithm in 1990 [DEMMEL90]. Although researchers continue to explore algorithmic refinements for computing the SVD, the 1990 Demmel–Kahan algorithm, which is also known as Francis’s algorithm with a zero shift, represents the foundation for the current best-known algorithm.

The 1990 Demmel–Kahan SVD algorithm skips the step of explicitly creating a bulge with a shift at the beginning of each Francis step, which improves on the Golub–Reinsch algorithm with better accuracy when finding small singular values. The sequence of transforms in a Francis step starts with a Givens transform post-multiplied from the right, acting on the top row. The values already in the matrix determine the rotation, which creates a bulge at position (2, 1) that moves through the same sequence of positions as in the Golub–Reinsch algorithm. Each transform applied from the left targets the bulge just below the diagonal. A transform post-multiplied from the right targets a value on the superdiagonal and simultaneously sets the bulge above the superdiagonal to zero.

The diagrams below illustrate one Francis step on a 5{\times}5 bidiagonal matrix. A plus sign marks the bulge’s location after applying each Givens transformation. Matrix elements moved to zero by the transform are noted with a 0. Nonzero elements along the diagonal and superdiagonal are marked with a *.

\begin{bmatrix}
        * & * &   &   & \\
          & * & * &   &  \\
          &   & * & * &  \\
          &   &   & * & *\\
          &   &   &   & *
    \end{bmatrix} \quad
    \begin{bmatrix}
        * & 0 &   &   & \\
        + & * & * &   &  \\
          &   & * & * &  \\
          &   &   & * & *\\
          &   &   &   & *
    \end{bmatrix} \quad
    \begin{bmatrix}
        * & * & + &   & \\
        0 & * & * &   &  \\
          &   & * & * &  \\
          &   &   & * & *\\
          &   &   &   & *
    \end{bmatrix}

\begin{bmatrix}
        * & * & 0 &   & \\
          & * & 0 &   &  \\
          & + & * & * &  \\
          &   &   & * & *\\
          &   &   &   & *
    \end{bmatrix} \quad
    \begin{bmatrix}
        * & * &   &   & \\
          & * & * & + &  \\
          & 0 & * & * &  \\
          &   &   & * & *\\
          &   &   &   & *
    \end{bmatrix} \quad
    \begin{bmatrix}
        * & * &   &   & \\
          & * & * & 0 &  \\
          &   & * & 0 &  \\
          &   & + & * & *\\
          &   &   &   & *
    \end{bmatrix}

\begin{bmatrix}
        * & * &   &   & \\
          & * & * &   &  \\
          &   & * & * & +\\
          &   & 0 & * & *\\
          &   &   &   & *
    \end{bmatrix} \quad
    \begin{bmatrix}
        * & * &   &   & \\
          & * & * &   &  \\
          &   & * & * & 0\\
          &   &   & * & 0\\
          &   &   & + & *
    \end{bmatrix} \quad
    \begin{bmatrix}
        * & * &   &   & \\
          & * & * &   &  \\
          &   & * & * &  \\
          &   &   & * & *\\
          &   &   & 0 & *
    \end{bmatrix}

The svd65 function returns the SVD factors of a matrix. It handles the administrative tasks: it calls the bidiag function to put the matrix in bidiagonal form, calls the svdFrancis function to run the SVD algorithm, and returns the combined SVD factors for either the full or economy SVD. The svdFrancis function is a simplified implementation of the Demmel–Kahan algorithm, which finds the SVD using a variant of the Francis algorithm with a zero shift.

Note that it is not necessary to sort the singular values. The Francis algorithm converges on the singular values from smallest to largest, deflating from the lower-right corner, while the Givens transforms push the larger values toward the upper-left of the matrix. Thus the \mathbf{\Sigma} matrix has the most significant singular value in the first diagonal position. The algorithm may need to multiply singular values and the corresponding columns of \mathbf{U} by -1 to ensure that all singular values are positive.

The next example computes the SVD of a random over-determined matrix and tests the accuracy of the decomposition and the orthogonality of the left and right singular vectors.

>> A = randi(20, 7, 5) - 8;
>> [U,S,V] = svd65(A);
>> norm(A*V - U*S, 'fro')
ans =
   5.1878e-13

>> S
S =
   22.1789         0         0         0         0
         0   18.4649         0         0         0
         0         0   14.8821         0         0
         0         0         0    8.6344         0
         0         0         0         0    1.7654
         0         0         0         0         0
         0         0         0         0         0

>> norm(U'*U - eye(7), 'fro')
ans =
   2.7299e-15
>> norm(V'*V - eye(5), 'fro')
ans =
   2.8669e-15
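
For the same 7×5 matrix, the economy option documented in svd65 trims the factors, which is often all an application needs. A usage sketch:

[U, S, V] = svd65(A, 'econ');  % U is 7x5, S is 5x5, V is 5x5
norm(A - U*S*V', 'fro')        % still reconstructs A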

12.5.3.1. The svd65 Function

function [U,S,V] = svd65(A, mode)
% SVD65 - A simple singular value decomposition (SVD).
%
% [U,S,V] = SVD65(A) returns matrices such that A = U*S*V'.
% S = SVD65(A) returns the singular values of A.
% [U,S,V] = SVD65(A, 'econ') and S = SVD65(A, 'econ') finds the
%           economy svd.
%
% A - input matrix (m x n), real or complex
%
% U - (m x m) unitary matrix - same as the eigenvectors of A*A'
% S - (m x n) diagonal matrix containing the sorted, real singular
%     values of A
% V - (n x n) unitary matrix - same as the eigenvectors of A'*A
%   U*S*V' = A
%
% It is called svd65 because the seminal paper by Golub and Kahan
% on finding a fast and accurate SVD algorithm was published in 1965.
%
% see also: SVD, BIDIAG, SVDFRANCIS

    if ~exist("mode", "var")
        mode = 'full';
    else
        if ~any(mode == ["full" "econ"])
            disp("Optional second argument should be one of 'full' for")
            disp("a full SVD or 'econ' for the economy SVD.")
            disp("The default is 'full'.")
        end
    end
    [m, n] = size(A);
    if m >= n                % square or over-determined
        [P,J,Q] = bidiag(A); % Householder bidiagonal
        [X,S,Y] = svdFrancis(J);  % Francis algorithm
        U = P*X; V = Q*Y;
        if mode == "econ"
            U = U(:, 1:n);
            S = S(1:n,:);
        end
    else             % under-determined - use A' to find the SVD of an
                     % over-determined matrix, then take the transpose.
        [P,J,Q] = bidiag(A'); % Householder bidiagonal
        [X,S,Y] = svdFrancis(J);   % Francis algorithm
        % A' = P*X*S*Y'*Q'
        V = P*X; U = Q*Y; S = S';  % A = Q*Y*S*X'*P'
        if mode == "econ"
            S = S(:,1:m);
            V = V(:,1:m);
        end
    end

    if ~isreal(S), S = real(S); end
    if nargout <= 1, U = diag(S); end
end

12.5.3.2. The svdFrancis Function

function [U,S,V] = svdFrancis(J, tol)
% SVDFRANCIS - Singular value decomposition (SVD) of a bidiagonal matrix
%              using Francis's algorithm with zero shift. This is a
%              simplified 1990 Demmel--Kahan algorithm implementation.
%
%  Usually call SVD65, not this function. SVD65 deals with
%  under-determined matrices, converts the input to bidiagonal form,
%  calls this function, and returns the full or economy SVD factors.
%  Input matrix J must be in bidiagonal form.
%  [U,S,V] = SVDFRANCIS(J) returns the SVD factors. U and V are
%  orthogonal. S is diagonal with the singular values on the diagonal.
%  S = SVDFRANCIS(J) returns the singular values.
%  An optional tolerance value may be specified.
%
%  This function is for educational purposes only.
%
%  See also: BIDIAG, GIVENS, SVD65, SVD

    [m,n] = size(J);
    if n > m    % under-determined
        error('Use the transpose of under-determined matrices.')
    end
    if ~exist('tol','var')
       tol=eps*2048;
    end
    S = J;
    k = n;
    U = eye(m);
    V = eye(n);
    while k >= 2
        for c = 1:k-1  % c = current column
            % This rotation zeros S(c,c+1) and any bulge at S(c-1,c+1);
            % a new bulge appears at S(c+1,c).
            G = givens(S(c,c), S(c, c+1), c, c+1, n);
            S = S*G.';   % A = U*S*G1.'*G2.'*...*Gn.'*V'
            S(c, c+1) = 0; % no round-off error
            if c>1, S(c-1,c+1) = 0; end
            V = V*G.';   % V = G1.'*G2.'*...*Gn.'
                         % V' = conj(Gn*...*G2*G1); the transpose
                         % reverses the order and transposes each factor.

            % This rotation zeros the bulge at S(c+1,c); fill-in
            % appears at S(c,c+1) and S(c,c+2).
            G = givens(S(c,c), S(c+1, c), c, c+1, m);
            S = G*S;    % A = U*G1'*G2'*...*Gn'*S*V'
            S(c+1, c) = 0; % no round-off error
            U = U*G';   % U = G1'*G2'*...*Gn'
        end % for - finished Francis step
        % convergence test
        if abs(S(k-1, k)) < tol
            S(k-1, k) = 0;
            k = k-1;  % deflation
        end
    end %while
    if ~isreal(S), S = real(S); end

    % Use a logical vector to fix any negative values in S
    belowZero = diag(S) < 0;
    U(:,belowZero) = -U(:,belowZero);
    S(belowZero,belowZero) = -S(belowZero,belowZero);

    if nargout<=1
       U=diag(S);
    else    % clear any round-off error above the diagonal.
       S(1:n, 1:n) = diag(diag(S));
    end
end % function
[1] The term isometric is also used to describe matrices with the same singular values.