9.1. The Geometry of the SVD

Recall from our introduction to eigenvalue problems at the beginning of Application of Eigenvalues and Eigenvectors that multiplication of a matrix and a vector usually stretches and rotates the vector. The SVD factors out the stretching and rotating components of the matrix. For an \(m{\times}n\) real matrix \(\bf{A}\) [1] and an \(n{\times}1\) unit vector \(\bm{v}\), we have

(9.1)\[\mathbf{A}\,\bm{v} = \sigma\,\bm{u},\]

where the positive, real scalar \(\sigma\) is the length of the resulting vector and the unit vector \(\bm{u}\) gives its direction. The factorization uses the \(\bm{v}\) and \(\bm{u}\) vectors as the columns of matrices \(\bf{V}\) and \(\bf{U}\), which we require to be orthogonal. The \(\sigma\) values are the singular values, which form the diagonal matrix \(\bf{\Sigma}\). The \(\bf{V}\) and \(\bf{U}\) matrices may be considered rotation matrices, while \(\bf{\Sigma}\) is a stretching matrix. The breadth of applications of the SVD stems from the orthogonal and diagonal nature of the factors. The factorization is then \(\mathbf{A} = \mathbf{U\,\Sigma\,V}^T\), or equivalently \(\mathbf{A\,V} = \mathbf{U\,\Sigma}\). For a square matrix, it looks as follows.

\[\begin{split}\begin{array}{ll} \mathbf{A\,V} &= \mathbf{A} \begin{bmatrix} \vertbar{} & \vertbar{} & {} & \vertbar{} \\ \bm{v}_1 & \bm{v}_2 & \cdots{} & \bm{v}_n \\ \vertbar{} & \vertbar{} & {} & \vertbar{} \end{bmatrix} = \begin{bmatrix} \vertbar{} & \vertbar{} & {} & \vertbar{} \\ \sigma_1\,\bm{u}_1 & \sigma_2\,\bm{u}_2 & \cdots{} & \sigma_n\,\bm{u}_n \\ \vertbar{} & \vertbar{} & {} & \vertbar{} \end{bmatrix} \\ {} \\ &= \begin{bmatrix} \vertbar{} & \vertbar{} & {} & \vertbar{} \\ \bm{u}_1 & \bm{u}_2 & \cdots{} & \bm{u}_n \\ \vertbar{} & \vertbar{} & {} & \vertbar{} \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 & \cdots{} & 0 \\ 0 & \sigma_2 & \cdots{} & 0 \\ \vdots{} & \vdots{} & \ddots{} & \vdots{} \\ 0 & 0 & \cdots{} & \sigma_n \end{bmatrix} \\ {} \\ &= \mathbf{U\,\Sigma} \end{array}\end{split}\]

Note

The previous equation shows a square matrix (\(m = n\)). For rectangular matrices, the \(\bf{\Sigma}\) matrix has rows or columns of zeros to accommodate the matrix multiplication requirements. The size of each matrix is: \(\bf{A}\): \(m{\times}n\), \(\bf{U}\): \(m{\times}m\), \(\bf{\Sigma}\): \(m{\times}n\), and \(\bf{V}\): \(n{\times}n\).

9.1.1. Ordered Columns of the SVD

The columns of the sub-matrices are ordered according to the singular values, arranged in descending order (\(\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_p\)), where \(p = \min\{m, n\}\). The columns of \(\bf{U}\) and \(\bf{V}\) are ordered to match the singular values.
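A quick numerical check in MATLAB illustrates both the factorization and the ordering. The matrix below is an arbitrary example chosen only for illustration.

% Verify A*V = U*S and the descending order of the singular values.
A = [4 -2 1; 3 6 -5; 1 0 2; 7 2 3];    % arbitrary 4x3 example
[U, S, V] = svd(A);                     % full SVD: U is 4x4, S is 4x3, V is 3x3
sigma = diag(S);                        % singular values, largest first
disp(norm(A*V - U*S))                   % near zero (round-off only)
disp(issorted(sigma, 'descend'))        % 1 (true): sigma_1 >= sigma_2 >= sigma_3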

9.1.2. SVD of Rectangular Matrices

To satisfy the size requirements for multiplying the SVD factors, the \(\bf{\Sigma}\) matrix contains rows of zeros for over-determined matrices and columns of zeros for under-determined matrices. Fig. 9.1 and Fig. 9.2 show the sizes of the sub-matrices for over-determined and under-determined matrices.


Fig. 9.1 Shape of the SVD factors of an over-determined matrix


Fig. 9.2 Shape of the SVD factors of an under-determined matrix
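The sizes of the full SVD factors are easy to confirm in MATLAB. The random matrices below are only placeholders; only their shapes matter here.

% Shapes of the full SVD factors for over- and under-determined matrices.
A = rand(5, 3);                         % over-determined (m > n)
[U, S, V] = svd(A);
size(U), size(S), size(V)               % 5x5, 5x3, 3x3; last two rows of S are zero
B = rand(3, 5);                         % under-determined (m < n)
[U, S, V] = svd(B);
size(U), size(S), size(V)               % 3x3, 3x5, 5x5; last two columns of S are zero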

9.1.3. Economy SVD

Note that in the multiplication of the factors to yield the original matrix, \((m - n)\) columns of \(\bf{U}\) for an over-determined matrix and \((n - m)\) rows of \(\mathbf{V}^T\) for an under-determined matrix are multiplied by zeros from \(\bf{\Sigma}\). They are unnecessary to recover \(\bf{A}\) from its factors. Many applications of the SVD do not need the elements of \(\bf{U}\) or \(\bf{V}\) corresponding to zeros in \(\bf{\Sigma}\). So the economy SVD, invoked with the 'econ' option to the svd function, is often used instead of the full SVD. The economy SVD removes the unused elements. Fig. 9.3 shows the sizes of the economy sub-matrices for an over-determined matrix. The primary difference to be aware of when applying the economy SVD is that the reduced \(\bf{U}\) or \(\bf{V}\) factor is no longer square, so it is orthogonal from only one side. For an over-determined matrix \(\mathbf{\tilde{U}}^T\mathbf{\tilde{U}} = \mathbf{I}\), but \(\mathbf{\tilde{U}}\mathbf{\tilde{U}}^T \ne \mathbf{I}\). Similarly, for an under-determined matrix \(\mathbf{\tilde{V}}^T\mathbf{\tilde{V}} = \mathbf{I}\), but \(\mathbf{\tilde{V}}\mathbf{\tilde{V}}^T \ne \mathbf{I}\).


Fig. 9.3 Shape of the economy SVD factors of an over-determined matrix
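The one-sided orthogonality is easy to observe in MATLAB. The matrix below is again an arbitrary over-determined example.

% Economy SVD of an over-determined matrix: Ut is m x n rather than m x m.
A = rand(6, 3);
[Ut, St, Vt] = svd(A, 'econ');
size(Ut), size(St), size(Vt)            % 6x3, 3x3, 3x3
norm(A - Ut*St*Vt')                     % near zero: A is still recovered
norm(Ut'*Ut - eye(3))                   % near zero: columns are orthonormal
norm(Ut*Ut' - eye(6))                   % not small: Ut*Ut' is a rank-3 projection, not I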

We will apply the economy SVD to vector projections in Projection and the Economy SVD.

9.1.4. The Classic SVD

This section discusses the relationship between a matrix and its SVD factors. In addition to learning more about the SVD, we will see how the equations relate to the classic method for calculating the SVD. Following the description of the SVD, we will consider the requirements and the difficulties associated with implementing the classic SVD algorithm.

You may have correctly guessed that finding the SVD factors from \(\bf{A}\) makes use of eigenvalues and eigenvectors. However, the SVD works for both rectangular and square matrices, while eigenvalues and eigenvectors are defined only for square matrices. So a square matrix derived from any \(\bf{A}\) is needed.

Consider the matrices \(\mathbf{A}^T\mathbf{A}\) and \(\mathbf{A\, A}^T\), which are always square and symmetric, have nonnegative real eigenvalues, and have orthogonal eigenvectors. The size of \(\mathbf{A}^T \mathbf{A}\) is \(n{\times}n\), while the size of \(\mathbf{A\, A}^T\) is \(m{\times}m\). When we express \(\mathbf{A}^T \mathbf{A}\) and \(\mathbf{A\, A}^T\) with the SVD factors, a pair of matrices reduces to the identity matrix and the two \(\bf{\Sigma}\) matrices combine into a diagonal matrix of squared singular values.

\[\begin{split}\begin{array}{ll} \mathbf{A}^T \mathbf{A} &= \left(\mathbf{U\, \Sigma\, V}^T\right)^T\, \mathbf{U\, \Sigma\, V}^T \\ &= \mathbf{V \, \Sigma}^T \, \mathbf{U}^T \, \mathbf{U\, \Sigma\, V}^T \\ &= \mathbf{V \, \Sigma}^T \, \mathbf{\Sigma\, V}^T \\ &= \mathbf{V \, \Sigma}^2 \, \mathbf{V}^T \\ &= \mathbf{V} \begin{bmatrix} \sigma_1^2 & 0 & \cdots{} & 0 \\ 0 & \sigma_2^2 & \cdots{} & 0 \\ \vdots{} & \vdots{} & \ddots{} & \vdots{} \\ 0 & 0 & \cdots{} & \sigma_n^2 \end{bmatrix} \mathbf{V}^T \end{array}\end{split}\]

The factoring is the diagonalization of a symmetric matrix (Diagonalization of a Symmetric Matrix). It follows that the \(\bf{V}\) matrix comes from the eigenvectors of \(\mathbf{A}^T \mathbf{A}\). Likewise, the \(\bf{\Sigma}\) matrix is the square root of the diagonal eigenvalue matrix of \(\mathbf{A}^T \mathbf{A}\).

Similarly, the \(\bf{U}\) matrix is the eigenvector matrix of \(\mathbf{A\,A}^T\).

\[\begin{split}\begin{array}{ll} \mathbf{A \,A}^T &= \mathbf{U\, \Sigma\, V}^T\, \left(\mathbf{U\, \Sigma\, V}^T\right)^T \\ &= \mathbf{U \, \Sigma} \, \mathbf{V}^T \, \mathbf{V\, \Sigma}^T \, \mathbf{U}^T \\ &= \mathbf{U \, \Sigma} \, \mathbf{\Sigma}^T \, \mathbf{U}^T \\ &= \mathbf{U \, \Sigma}^2 \, \mathbf{U}^T \\ &= \mathbf{U} \begin{bmatrix} \sigma_1^2 & 0 & \cdots{} & 0 \\ 0 & \sigma_2^2 & \cdots{} & 0 \\ \vdots{} & \vdots{} & \ddots{} & \vdots{} \\ 0 & 0 & \cdots{} & \sigma_m^2 \end{bmatrix} \mathbf{U}^T \end{array}\end{split}\]


Theorem 9.1 (Existence of the SVD)

Any matrix can be factored into a singular value decomposition,

\[\mathbf{A} = \mathbf{U\, \Sigma\, V}^T.\]

If \(\bf{A} \in \mathbb{R}^{m\times n}\), the factors \(\bf{U} \in \mathbb{R}^{m\times m}\) and \(\bf{V} \in \mathbb{R}^{n\times n}\) are orthogonal and \(\bf{\Sigma}\) \(\in \mathbb{R}^{m\times n}\) is diagonal with \(r = \text{rank}(\mathbf{A})\) positive values. If \(\bf{A}\) \(\in \mathbb{C}^{m\times n}\), then \(\bf{U}\) and \(\bf{V}\) are unitary, and \(\bf{\Sigma}\) is still real.

Proof. As explained in Strang’s introductory linear algebra text [STRANG16], the SVD exists for all \(\bf{A}\) because \(\mathbf{A}^T \mathbf{A}\), or \(\mathbf{A}^H \mathbf{A}\) in the complex case, is symmetric (Hermitian) and positive semidefinite. By the spectral decomposition theorem, \(\mathbf{A}^T \mathbf{A}\) always has orthogonal eigenvectors (Real Eigenvalues of a Symmetric Matrix, Spectral Decomposition Theorem in Schur Decomposition of Symmetric Matrices). Thus, from the diagonalization, \(\mathbf{A}^T \mathbf{A}\,\bm{v}_i = \sigma_i^2 \,\bm{v}_i\), and \(\mathbf{A}^T \mathbf{A\,V} = \mathbf{V\,\Sigma}^T \mathbf{\Sigma}\). The first \(r\) columns of \(\bf{U}\) may be computed directly from \(\bf{A}\), \(\bf{V}\), and \(\bf{\Sigma}\): we get the unit vectors \(\bm{u}_1\) to \(\bm{u}_r\) from \(\mathbf{A}\,\bm{v}_i = \sigma_i\,\bm{u}_i\). The first \(r\) columns of \(\bf{U}\) are orthogonal because the columns of \(\bf{V}\) are orthogonal, \(\bm{v}_i^T \bm{v}_j = 0\) for \(i \neq j\).

\[\bm{u}_i^T\,\bm{u}_j = \left(\frac{\mathbf{A}\,\bm{v}_i}{\sigma_i}\right)^T \left(\frac{\mathbf{A}\,\bm{v}_j}{\sigma_j}\right) = \frac{\bm{v}_i^T\mathbf{A}^T\mathbf{A}\,\bm{v}_j}{\sigma_i\,\sigma_j} = \frac{\sigma_j^2}{\sigma_i\,\sigma_j} \bm{v}_i^T \bm{v}_j = 0, \quad i \neq j\]

Columns \(r+1\) to \(m\) of \(\bf{U}\) may be computed from the eigenvectors of \(\mathbf{A\, A}^T\), which are also orthogonal.

Lyche’s text on Matrix factorizations [LYCHE20] gives a longer SVD existence proof.

\(\qed\)

The proof of existence outlines the classic algorithm for computing the SVD.
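The sketch below follows that outline in MATLAB for a real matrix with \(m \geq n\) and full column rank, returning economy-sized factors. The function name classicSVD is introduced here only for illustration; it is a teaching sketch, not a replacement for the built-in svd function.

% Classic (eigenvalue-based) SVD sketch; assumes m >= n and full column rank.
function [U, S, V] = classicSVD(A)
    M = (A'*A + (A'*A)')/2;             % symmetrize to guard against round-off
    [V, D] = eig(M);                    % A'*A = V * Sigma^2 * V'
    [d, idx] = sort(diag(D), 'descend');
    V = V(:, idx);                      % order columns by singular value
    S = diag(sqrt(max(d, 0)));          % Sigma (guard tiny negative eigenvalues)
    U = A*V / S;                        % u_i = A*v_i / sigma_i
end

For a well-conditioned test matrix, norm(A - U*S*V') is near machine precision and the singular values agree with those from svd(A, 'econ'), although individual columns of \(\bf{U}\) and \(\bf{V}\) may differ from MATLAB's by a factor of \(-1\).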

9.1.5. Implementation Difficulties

Finding the SVD using the classic algorithm is problematic.

  • A difficulty arises from the rows or columns of zeros in \(\bf{\Sigma}\) for the full SVD. The full \(\bf{U}\) matrix for an over-determined matrix cannot be found directly from \(\bf{A}\), \(\bf{V}\), and \(\bf{\Sigma}\) as may be done for square matrices or for the economy SVD of a rectangular matrix. The same problem exists for finding the full \(\bf{V}\) matrix for an under-determined matrix. However, finding both \(\bf{U}\) and \(\bf{V}\) from both \(\mathbf{AA}^T\) and \(\mathbf{A}^T\mathbf{A}\) is fraught with problems. Because the sign of each eigenvector is arbitrary, eigenvectors of \(\mathbf{A}^T\mathbf{A}\) and \(\mathbf{AA}^T\) computed independently of each other may leave certain columns of \(\bf{U}\) or \(\bf{V}\) needing to be multiplied by \(-1\) for the factoring to be correct. It is best to do only one eigenvector calculation.

  • Some applications may have large \(\bf{A}\) matrices (even hundreds of thousands of rows and columns), so calculating \(\mathbf{AA}^T\) might be slow and may have unacceptable round-off errors.

  • Small singular values of \(\bf{A}\) cannot be computed accurately with this approach. By the “loss of information through squaring” phenomenon [WATKINS10], a singular value of \(10^{-5}\) for \(\bf{A}\) corresponds to an eigenvalue of \(10^{-10}\) for \(\mathbf{A}^T\mathbf{A}\), which cannot be computed to the needed accuracy. Golub and Reinsch [GOLUB70] offer a simple experiment to observe the lack of accuracy. Consider the matrix

    \[\begin{split}\mathbf{A} = \begin{bmatrix} 1 & 1 \\ b & 0 \\ 0 & b \end{bmatrix},\end{split}\]

    then

    \[\begin{split}\mathbf{A}^T\mathbf{A} = \begin{bmatrix} {1+b^2} & 1 \\ 1 & {1+b^2} \end{bmatrix}.\end{split}\]

    The singular values are: \(\sigma_1 = \sqrt{2 + b^2}\), and \(\sigma_2 = \abs{b}\). Reduce the \(b\) variable by powers of 10 from 0.01 to \(10^{-8}\) and solve for the singular values with the classic SVD algorithm. Small errors in \(\sigma_2\) are returned from the beginning, and the errors grow as \(b\) is reduced. At \(b = 10^{-8}\), the error is complete and \(\sigma_2 = 0\).
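    The short script below reproduces the experiment, comparing the eigenvalue-based estimate of \(\sigma_2\) with its exact value \(\abs{b}\).

    % Loss of information through squaring: estimate sigma_2 from eig(A'*A).
    for b = 10.^(-2:-1:-8)
        A = [1 1; b 0; 0 b];
        lambda = sort(eig(A'*A), 'descend');
        sigma2 = sqrt(max(lambda(2), 0));   % classic estimate (guard round-off)
        fprintf('b = %-8.0e  classic sigma_2 = %-14.6e  exact = %.6e\n', ...
                b, sigma2, b);
    end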

The difficulties associated with finding the SVD using the classic approach motivate the search for alternative algorithms, which are explored in Calculating the SVD with Unitary Transformations.

Footnote: