12.4.11. Convergence by Similarity Transforms

Eigenvalue-finding algorithms must use an iterative sequence of similarity transformations to reduce the matrix to upper triangular form, where the eigenvalues appear on the diagonal. Starting from the upper Hessenberg form, the iterations drive the subdiagonal elements toward zero. As each subdiagonal element \(\m{T}_{k, k-1}\) converges to zero, the diagonal element \(\m{T}_{k,k}\) converges to eigenvalue \(\lambda_k\). Here, we address the basic concept of why repeated similarity transformations cause convergence. The complete explanation of convergence is quite advanced. More advanced numerical analysis texts, such as those by Demmel [DEMMEL97], Watkins [WATKINS10], and Golub and Van Loan [GOLUB13], cover convergence extensively. Watkins makes two notable observations about convergence. First, he states that convergence is not guaranteed, but robust implementations can detect when the algorithm is not progressing toward convergence. When non-convergence is detected, a random shift is applied, which usually shakes things up and puts the algorithm back on a path toward convergence. Second, he states that algorithms using similarity transformations and the power method converge for the same reason: changes to the vector subspace.

We can easily see from the power method (The Power Method) that raising each eigenvalue to successively larger powers \(k\) causes the largest eigenvalue term to dominate the equation. But there is more going on than a change to the basis vectors. The equations step through a sequence of subspaces with each multiplication by \(\m{A}\). The subspaces, \(\mathcal{S} \subset \mathbb{R}^n\), used by the power method include [1]

\[\mathcal{S},\, \m{A}\mathcal{S},\, \m{A}^2\mathcal{S},\, \m{A}^3\mathcal{S},\, \ldots,\, \text{where } \mathcal{S} = \mathcal{E}_k = span\{\bm{e}_1, \bm{e}_2, \ldots, \bm{e}_k\}.\]
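
For readers who like to experiment, here is a minimal MATLAB sketch of subspace iteration, assuming a matrix with a gap between the magnitudes of its \(k\) largest eigenvalues and the rest. The names A, k, and the iteration count are illustrative choices, not part of any particular algorithm.

```matlab
% Subspace iteration sketch: the span of Q tracks the sequence
% S, A*S, A^2*S, ..., starting from S = E_k = span{e_1, ..., e_k}.
n = 8;  k = 3;
A = randn(n);  A = (A + A')/2;     % symmetric example so all eigenvalues are real
Q = eye(n, k);                     % orthonormal basis for E_k
for iter = 1:100
    Z = A * Q;                     % move to the next subspace, A*S
    [Q, ~] = qr(Z, 0);             % re-orthonormalize (economy-size QR)
end
% The eigenvalues of the k-by-k matrix Q'*A*Q now approximate the k
% largest-magnitude eigenvalues of A (assuming the gap exists).
eig(Q' * A * Q)
```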

As Watkins explains, subspace iteration is how the power method relates to eigenvalue algorithms that use similarity transformations [WATKINS11]. But similarity transform algorithms do not step through subspaces built from powers of \(\m{A}\) as the power method does. Rather, in each iteration, the subspace changes from the standard Euclidean subspace to one derived from \(p(\m{T})\) and then back to the Euclidean subspace. The \(p(\m{T})\) function is defined by the degree of the iteration, \(m\), and by shift estimates, \(\rho_1, \ldots, \rho_m\), of the eigenvalues near \(\lambda_k\), which is the eigenvalue that we expect to see at position \((k,\,k)\) when the algorithm has converged. The degree is usually one or two, but it could be higher.

\[p(\m{T}) = (\m{T} - \rho_1\,\m{I}) \ldots (\m{T} - \rho_m\,\m{I})\]
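
As a concrete example, the sketch below forms \(p(\m{T})\) explicitly for a degree-two iteration, taking the shifts to be the eigenvalues of the trailing \(2 \times 2\) block (a common choice, though not the only one). The variable names T, rho, and pT are illustrative.

```matlab
% Forming p(T) explicitly for a degree-two iteration (m = 2).
n = 6;
T = hess(randn(n));                 % an example upper Hessenberg matrix
rho = eig(T(n-1:n, n-1:n));         % two shift estimates near lambda_k
pT = (T - rho(1)*eye(n)) * (T - rho(2)*eye(n));
pT = real(pT);    % the shifts are either real or a conjugate pair, so p(T) is real
```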

The QR algorithm calculates \(p(\m{T})\) as a matrix, but the Francis algorithm is only concerned with the first column, so it is a column vector, \(p(\m{T})\bm{e}_1\). The subspaces then include

\[\mathcal{S},\, p(\m{T})\mathcal{S},\, p(\m{T})^2\mathcal{S},\, p(\m{T})^3\mathcal{S},\, \ldots\,.\]
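
The first column can be computed without forming the full matrix. The following self-contained sketch evaluates \(p(\m{T})\,\bm{e}_1\) for degree two in real arithmetic; because \(\m{T}\) is upper Hessenberg, only the first three entries can be nonzero. The example matrix and shift choice are illustrative.

```matlab
% The column the Francis algorithm needs, p(T)*e1, computed in real
% arithmetic.  s and p are the sum and product of the two shifts.
n = 6;
T = hess(randn(n));                          % an example upper Hessenberg matrix
s = T(n-1,n-1) + T(n,n);                     % rho1 + rho2
p = T(n-1,n-1)*T(n,n) - T(n-1,n)*T(n,n-1);   % rho1 * rho2
x = T*T(:,1) - s*T(:,1) + p*eye(n, 1);       % p(T)*e1 = (T^2 - s*T + p*I)*e1
disp(x)                                      % entries 4 through n are zero
```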

The unitary transform, \(\m{Q}\), is formed by orthogonalizing the columns of \(p(\m{T})\), as in a QR factorization. Applying it to \(\m{T}\) from the left with \(\m{Q}^H\,\m{T}\) changes the coordinate system from the basis vectors \(\bm{e}_1,\ldots,\bm{e}_n\) to \(\bm{q}_1,\ldots,\bm{q}_n\), but then we complete the similarity transform (\(\m{Q}^H\,\m{T}\,\m{Q}\)) and the matrix is back to the \(\bm{e}_1,\ldots,\bm{e}_n\) basis vectors. Although the matrix is back to the standard coordinate system, the matrix has changed. Of course, the eigenvalues are the same. We expect the values on the subdiagonal to have smaller absolute values and those on the diagonal to be closer to the eigenvalues.
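
A short experiment illustrates the effect. The sketch below reuses the trailing \(2 \times 2\) shift choice from the earlier example, repeats the similarity transform, and prints the subdiagonal entry just above the trailing \(2 \times 2\) block; driving that entry toward zero is what lets two eigenvalues be deflated. The matrix size, iteration limit, and tolerance are arbitrary.

```matlab
% Repeated similarity transforms built from p(T).  Each pass orthogonalizes
% the columns of p(T) into Q and replaces T with Q'*T*Q.
n = 6;
T = hess(randn(n));
for iter = 1:15
    rho = eig(T(n-1:n, n-1:n));                           % shift estimates
    pT  = real((T - rho(1)*eye(n)) * (T - rho(2)*eye(n)));
    [Q, ~] = qr(pT);                                      % orthogonalized columns of p(T)
    T = Q' * T * Q;                                       % back to the standard basis
    fprintf('iter %2d:  |T(n-1,n-2)| = %.2e\n', iter, abs(T(n-1, n-2)));
    if abs(T(n-1, n-2)) < 1e-12, break, end   % a real implementation would deflate here
end
```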

Subspaces from powers of \(\m{A}\) emphasize the dominant eigenvalue. Subspaces from powers of \(p(\m{T})\), however, emphasize the eigenvalue at position \((k,\,k)\), where \(k\) is the last column used in the current iteration. The proof of convergence then rests on what happens to \(\m{T}_{k,k-1}\) in the limit after repeated similarity transformations.

Watkins presents the convergence of the Francis algorithm without reference to the QR algorithm in his book on matrix computations [WATKINS10] and in a 2011 journal article [WATKINS11]. The more common pedagogy is to explain the convergence of the QR algorithm and then tie the Francis algorithm into the explanation with the Implicit Q Theorem. Observing how convergence occurs with the QR algorithm is a bit easier. So, let’s shift our discussion to more pragmatic considerations for both algorithms.

The bulge-chasing aspect of the Francis algorithm merits special consideration. As the bulge travels down the matrix’s diagonal, it carries the information from the first column of \(\m{Q}\) and blends it with what it finds in the matrix. The Francis algorithm applies its similarity transforms with the intent of moving the bulge; restoring the matrix to Hessenberg form is a side effect. The subdiagonal elements are diminished because they are unwitting victims of the bulge-chasing effort. It may be helpful to witness how the Francis algorithm changes a matrix by using the MATLAB debugger to trace its steps.
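
To make the bulge chasing tangible, here is a minimal sketch of a degree-one (single-shift) Francis step built from Givens rotations via MATLAB’s planerot function. The function name francis_step_single and the shift argument rho are illustrative; production implementations typically work with degree two in real arithmetic and add shift selection and deflation logic.

```matlab
function H = francis_step_single(H, rho)
% One degree-one Francis step on an upper Hessenberg matrix H.
% The first rotation is built from the first column of p(H) = H - rho*I;
% the loop then chases the resulting bulge off the bottom of the matrix.
    n = size(H, 1);
    x = [H(1,1) - rho; H(2,1)];
    G = planerot(x);                  % 2x2 Givens rotation with G*x = [r; 0]
    H(1:2, :) = G * H(1:2, :);        % apply from the left ...
    H(:, 1:2) = H(:, 1:2) * G';       % ... and the right: a bulge appears at (3,1)
    for i = 2:n-1
        G = planerot(H(i:i+1, i-1));  % rotation that zeros the entry below the subdiagonal
        H(i:i+1, :) = G * H(i:i+1, :);
        H(:, i:i+1) = H(:, i:i+1) * G';
        H(i+1, i-1) = 0;              % clear the old bulge position; it has moved to (i+2, i)
    end
end
```

Calling this function repeatedly, for example with rho = H(n,n), and watching abs(H(n,n-1)) in the debugger shows the subdiagonal elements shrinking as the bulge passes over them.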

The strategy used by the QR algorithm to reduce the subdiagonal stands in clear contrast to the Francis algorithm’s approach. QR temporarily puts the matrix into upper triangular form and then completes the similarity transform, which returns it to Hessenberg form. So we have a step forward followed by a small step backward, and the forward steps eventually triumph.
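
A minimal sketch of that forward-and-back pattern, using the Rayleigh shift \(\rho = \m{T}_{n,n}\) as an illustrative choice, looks like this:

```matlab
% One explicitly shifted QR iteration.  The factorization temporarily
% produces the upper triangular R; forming R*Q + rho*I completes the
% similarity transform Q'*T*Q and returns the matrix to Hessenberg form.
n = 6;
T = hess(randn(n));                  % an example upper Hessenberg matrix
rho = T(n, n);                       % Rayleigh shift (illustrative choice)
[Q, R] = qr(T - rho*eye(n));         % step forward: orthogonal times triangular
T = R*Q + rho*eye(n);                % small step back: a similar Hessenberg matrix
```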

Footnote: