12.4.7. Convergence by Similarity Transforms
Eigenvalue finding algorithms must use an iterative sequence of
similarity transformations to shift the matrix into upper triangular
form, where the eigenvalues are on the diagonal. From the matrix in
upper Hessenberg form, the iterations work to drive the subdiagonal
elements toward zero. As each subdiagonal element at position
:math:`(k, k-1)` converges to zero, the diagonal element at position
:math:`(k, k)` converges to eigenvalue :math:`\lambda_k`. The speed of
convergence varies because algorithms use different similarity
transformations. Here, we address the basic concept of why repeated
applications of similarity transformations cause convergence. The
complete explanation of convergence is quite advanced. More advanced
numerical analysis texts such as those by Demmel [DEMMEL97],
Watkins [WATKINS10], and Golub and Van Loan [GOLUB13] have extensive
coverage of convergence.
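As a quick numeric illustration of the end state, and assuming only MATLAB built-ins, the complex Schur form computed by schur is upper triangular with the eigenvalues on its diagonal and is related to the original matrix by a unitary similarity transform.

.. code-block:: matlab

   % The Schur form is reached by a unitary similarity transform, so its
   % diagonal holds the eigenvalues of A.
   A = randn(5);
   [U, T] = schur(A, 'complex');         % T is upper triangular, U is unitary
   disp(norm(U*T*U' - A))                % similarity: A = U*T*U', near machine precision
   disp([sort(diag(T)), sort(eig(A))])   % the two columns agree up to rounding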
Watkins makes two notable observations about convergence. First, he
states that convergence is not guaranteed but that robust
implementations can detect when the algorithm is not progressing toward
convergence. When non-convergence is detected, a random shift is
applied, which usually shakes things up, putting the algorithm on a
path toward convergence. Second, he states that algorithms using
similarity transformations and the power method converge for the same
reason—changes to the vector subspace.
We can easily see from the power method (The Power Method) that
raising each eigenvalue to the power of :math:`k` causes some terms of
the equation to go to near zero while the term for the largest
eigenvalue dominates the equation. But there is more going on than a
change to the basis vectors. The equations step through a sequence of
subspaces with each multiplication by :math:`\bm{A}`. The subspaces
used by the power method include

.. math::

   \mathcal{E},\; \bm{A}\,\mathcal{E},\; \bm{A}^2\,\mathcal{E},\;
   \bm{A}^3\,\mathcal{E},\; \ldots

Here by :math:`\mathcal{E}` we mean the standard Euclidean space spanned by
:math:`\{\bm{e}_1, \ldots, \bm{e}_n\}`, where :math:`\bm{e}_j` is a standard
basis vector with a one in position :math:`j` and zeros elsewhere.
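A small power method sketch, assuming a symmetric test matrix so that the eigenvalues are real (the variable names are illustrative), shows the dominant term taking over:

.. code-block:: matlab

   % Repeated multiplication by A rotates the iterate toward the dominant
   % eigenvector, and the Rayleigh quotient approaches the eigenvalue of
   % largest magnitude.
   M = randn(5); A = M + M';     % symmetric test matrix (real eigenvalues)
   x = eye(5, 1);                % start from the standard basis vector e_1
   for k = 1:50
       x = A*x;
       x = x/norm(x);            % normalize so the iterate does not grow
   end
   rayleigh = x'*A*x;            % estimate of the dominant eigenvalue
   e = eig(A); [~, i] = max(abs(e));
   disp([rayleigh, e(i)])        % the two values usually agree closely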
As Watkins explains, subspace iteration is how the power method relates to
eigenvalue algorithms using similarity transformations [WATKINS11]. But
similarity transform algorithms do not change to a subspace from powers of
:math:`\bm{A}` as the power method does. Rather, in each iteration, the
subspace changes from the standard Euclidean subspace to one derived from
:math:`p(\bm{A})` and then back to the Euclidean subspace. The
:math:`p(\bm{A})` function is defined relative to the degree of the
iteration, :math:`m`, and an estimate, :math:`\mu`, of the closest
eigenvalue to :math:`a_{k,k}`, which is the eigenvalue that we expect to
see at position :math:`(k, k)` when the algorithm has converged. The
degree is usually one or two, but it could be higher. We are mostly
concerned with degree one iterative steps, but the degree two Francis step
procedure is presented in Francis Steps of Degree Two.
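For reference, a standard way to write the shift polynomial for degree one and degree two steps (the symbols :math:`\mu`, :math:`\mu_1`, and :math:`\mu_2` here are assumed notation, not a quotation from the text) is

.. math::

   p(\bm{A}) = \bm{A} - \mu\,\bm{I}
   \quad \text{(degree one)}, \qquad
   p(\bm{A}) = (\bm{A} - \mu_1\,\bm{I})(\bm{A} - \mu_2\,\bm{I})
   \quad \text{(degree two)}.

In the degree two case, :math:`\mu_1` and :math:`\mu_2` may be chosen as a complex conjugate pair, which keeps :math:`p(\bm{A})` real even when the sought eigenvalues are complex.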
The QR algorithm calculates :math:`p(\bm{A})` as a matrix, but the
Francis algorithm is only concerned with the first column, so it is a
column vector, :math:`\bm{x} = p(\bm{A})\,\bm{e}_1`. The subspaces then include

.. math::

   \mathcal{E},\; p(\bm{A})\,\mathcal{E},\; \mathcal{E},\;
   p(\bm{A})\,\mathcal{E},\; \ldots
The unitary transform, :math:`\bm{Q}`, is formed from the
orthogonalized columns of :math:`p(\bm{A})` and applied from the
left to :math:`\bm{A}`, :math:`\bm{Q}^T\bm{A}`. This step is a
change of the coordinate system from basis vectors
:math:`\{\bm{e}_1, \ldots, \bm{e}_n\}` to :math:`\{\bm{q}_1, \ldots, \bm{q}_n\}`,
but then we complete the similarity transform
(:math:`\bm{Q}^T\bm{A}\bm{Q}`) and the matrix is back to
:math:`\{\bm{e}_1, \ldots, \bm{e}_n\}` basis vectors. Although the matrix is
back to a standard coordinate system, the matrix did change. Of course,
the eigenvalues are the same, and the matrix is usually still upper
Hessenberg [1]. We expect the values on the subdiagonal to have smaller
absolute values and those on the diagonal to be closer to eigenvalues.
The composite Schur vectors will also have been updated with the columns
of :math:`\bm{Q}`. The procedure is repeated as many times as needed
with new :math:`\bm{Q}` and :math:`p(\bm{A})` variables until the
Schur triangular form is reached.
In this subspace iteration, the subspace remains fixed
(:math:`\text{span}\{\bm{e}_1, \ldots, \bm{e}_n\}`) and the matrix changes.
Contrast this with the subspace iteration used by the power method, where the
matrix stays fixed and the subspace changes.
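A minimal sketch of this repetition, using the explicit shifted QR form of the step with a simple corner shift and a symmetric test matrix (both simplifications for illustration, not the Wilkinson shift or the Francis bulge chase), shows the last subdiagonal element collapsing toward zero:

.. code-block:: matlab

   % Repeated similarity transforms: form p(A) = A - mu*I, take Q from the
   % orthogonalized columns of p(A), then replace A with Q'*A*Q.
   M = randn(6); A = hess(M + M');         % symmetric, so the Hessenberg form is tridiagonal
   for iter = 1:20
       mu = A(end, end);                   % simple corner shift
       [Q, R] = qr(A - mu*eye(size(A)));   % orthogonalize the columns of p(A)
       A = Q'*A*Q;                         % complete the similarity transform
       fprintf('iter %2d: |a(n,n-1)| = %.2e\n', iter, abs(A(end, end-1)));
   end

A practical implementation deflates once :math:`|a_{n,n-1}|` is negligible and continues on the leading submatrix; here the corner element :math:`a_{n,n}` approaches an eigenvalue of :math:`\bm{M} + \bm{M}^T` within a few iterations.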
Subspaces from powers of :math:`\bm{A}` emphasize the dominant
eigenvalue. However, subspaces from powers of :math:`p(\bm{A})`
emphasize the eigenvalue at position (:math:`k, k`), where :math:`k` is
the last column used in the current iteration. Then the proof of
convergence relies on what happens in the limit to these subspaces
after several iterations of similarity transformations.
Watkins presents the convergence of the Francis algorithm without reference to the QR algorithm in his book on matrix computations [WATKINS10] and in a 2011 journal article [WATKINS11]. The specific tenets of how subspace iterations allow the Francis algorithm to converge are rather abstract. The more common pedagogy is to explain the convergence of the QR algorithm and then tie the Francis algorithm into the explanation with the implicit Q theorem (Implicit Q Theorem). Observing how convergence occurs with the QR algorithm is a bit easier. So, let’s shift our discussion to more pragmatic convergence considerations for both algorithms.
Two aspects of the Francis algorithm require special consideration—the
bulge and similarity by Hessenberg factoring. As the bulge travels down
the matrix’s diagonal, it changes to incorporate the first column of
:math:`p(\bm{A})` and what it finds in the matrix. The Francis
algorithm uses similarity transforms intended to move the bulge.
Restoring the matrix to Hessenberg form is a side effect. The
subdiagonal elements are diminished because they are unwitting victims
of the bulge-chasing effort. It is more challenging to see exactly how
the matrix is changed from Hessenberg form to upper triangular because
the subdiagonal is reduced by indirect actions. Readers may find it
helpful to witness examples of how the Francis algorithm changes a
matrix using the MATLAB debugger to trace its steps.
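For readers who prefer code to the debugger, here is a minimal sketch of a single degree one Francis step built from Givens rotations. The names francis_step_deg1 and givens_cs are illustrative rather than taken from the text's code, the shift is assumed real, and a robust implementation would add deflation tests and careful scaling.

.. code-block:: matlab

   function H = francis_step_deg1(H, mu)
       % One degree one Francis step on an upper Hessenberg matrix H with
       % a real shift mu: create the bulge from the first column of
       % p(H) = H - mu*I, then chase it back out of the matrix.
       n = size(H, 1);
       x = [H(1,1) - mu; H(2,1)];        % only two nonzero entries of p(H)*e_1
       [c, s] = givens_cs(x(1), x(2));
       G = [c s; -s c];
       H(1:2, :) = G*H(1:2, :);          % apply from the left ...
       H(:, 1:2) = H(:, 1:2)*G';         % ... and the right; a bulge appears at (3,1)
       for k = 1:n-2                     % chase the bulge down the subdiagonal
           [c, s] = givens_cs(H(k+1, k), H(k+2, k));
           G = [c s; -s c];
           H(k+1:k+2, :) = G*H(k+1:k+2, :);
           H(:, k+1:k+2) = H(:, k+1:k+2)*G';
           H(k+2, k) = 0;                % remove rounding residue
       end
   end

   function [c, s] = givens_cs(a, b)
       % Coefficients of a Givens rotation that zeros b against a.
       r = hypot(a, b);
       if r == 0, c = 1; s = 0; else, c = a/r; s = b/r; end
   end

Saving this as francis_step_deg1.m and calling it repeatedly on an upper Hessenberg matrix with real eigenvalues, for example A = hess(M + M') with M = randn(6) and mu = A(end, end), then stepping through it in the debugger, shows the bulge appear at position (3, 1), travel down the subdiagonal, and leave the matrix in Hessenberg form with a smaller value at position (n, n-1).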
The strategy used by the QR algorithm to reduce the subdiagonal is a clear contrast to the Francis algorithm’s approach. QR temporarily puts the matrix into upper triangular form until the similarity transform is completed. So we have a step forward followed by a half step backward. The forward steps will eventually triumph over the smaller backward steps.
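To watch the forward and backward steps, a quick check along the lines of the footnote below (assuming a symmetric positive definite test matrix so the eigenvalues are real and distinct) is:

.. code-block:: matlab

   % One unshifted QR step: R is upper triangular (the step forward), and
   % completing the similarity with A1 = R*Q returns to Hessenberg form
   % (the half step backward) with a typically smaller subdiagonal.
   M = randn(5); A = hess(M*M');                 % SPD matrix in tridiagonal Hessenberg form
   [Q, R] = qr(A);                               % step forward: R is upper triangular
   A1 = R*Q;                                     % half step back: A1 = Q'*A*Q
   disp([norm(diag(A, -1)), norm(diag(A1, -1))]) % subdiagonal size before and after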
Here are a few points to consider relative to how these two algorithms converge.
- Convergence is reached when the shift equals an eigenvalue, :math:`\mu = \lambda_k`. If the calculated Wilkinson shift from the submatrix from rows and columns :math:`k-1` and :math:`k` yields :math:`\mu = \lambda_k`, then the following QR or Francis step will converge to :math:`\lambda_k`. This can be seen either by working through the similarity transform equations manually or with numeric examples by setting breakpoints in the MATLAB debugger to track the variables [2].
- Convergence to eigenvalue :math:`\lambda_k` is marked by a near zero value at position (:math:`k, k-1`) on the subdiagonal. If deflation were not used to reduce the matrix, then any similarity transforms beyond this point would be identity matrices. So, it is time to move up the diagonal and work on finding :math:`\lambda_{k-1}`. Once convergence is complete for all eigenvalues, future :math:`\bm{Q}` matrices will be identity matrices, which, of course, are no longer transformation matrices because they do not change the matrix.
- As eigenvalues are found and deflation moves the algorithm up the diagonal, changes are made to all diagonal and subdiagonal matrix elements. Complex conjugate values move toward each other. When tracking the state of the matrix, perhaps using the debugger, one can see the diagonal values approach the eigenvalues.
- Starting from Hessenberg form, the goal of the algorithm is to move the subdiagonal elements to zero, not to place eigenvalues on the diagonal. By the Schur triangularization theorem (Schur Triangularization Theorem), the eigenvalues will be on the diagonal automatically after similarity transforms push the matrix into upper triangular form.
- Probability favors convergence. The similarity transforms applied from the left are designed to move matrix elements below the diagonal to zero. Completing the similarity requirement impedes progress, but convergence will eventually win. The left transforms are designed to clear out a column, but the magnitude of the nonzero values placed below the diagonal by the transpose of the transform applied from the right is a matter of chance.
- The Wilkinson shift is not required to find real eigenvalues of a real matrix, but it always makes convergence faster. A shift is strictly necessary to find complex eigenvalues. The shift speeds up convergence because the value we seek to find on the diagonal of the shifted matrix should be much smaller than the values on the diagonal from 1 to :math:`k-1`. The convergence rate is proportional to the ratio of the second smallest to the smallest eigenvalue of the shifted matrix. A sketch after this list illustrates the shift calculation and the convergence test.
- Eigenvalues (:math:`\lambda_k`) are found when processing column :math:`k-1`, not column :math:`k`. That is because the element at position (:math:`k, k-1`) is set to zero by the Householder or Givens transformation operating on column :math:`k-1`. In the case of the QR algorithm, the value at position (:math:`k, k`) is not finalized until the similarity transformation is completed with :math:`\bm{R}\,\bm{Q}`, but the QR decomposition prepares the matrix for convergence.
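The following sketch, assuming a real symmetric matrix and an illustrative tolerance tol, shows the Wilkinson shift computed from the trailing 2-by-2 submatrix and the deflation test on the last subdiagonal element. It uses the explicit shifted QR step rather than the text's eigQR code:

.. code-block:: matlab

   % Shifted QR iteration with the Wilkinson shift and a deflation test.
   M = randn(6); A = hess(M + M');              % symmetric, tridiagonal Hessenberg form
   tol = 1e-12; k = size(A, 1);
   while k > 1
       % Wilkinson shift: the eigenvalue of the trailing 2-by-2 block
       % (rows and columns k-1 and k) that is closest to A(k,k).
       a = A(k-1, k-1); b = A(k, k-1); c = A(k, k);
       d = (a - c)/2;
       if d == 0, mu = c - abs(b);
       else, mu = c - sign(d)*b^2/(abs(d) + sqrt(d^2 + b^2)); end
       % One explicit shifted QR step on the active submatrix.
       [Q, R] = qr(A(1:k, 1:k) - mu*eye(k));
       A(1:k, 1:k) = R*Q + mu*eye(k);
       % Deflation test: a negligible subdiagonal element marks the
       % convergence of the eigenvalue at position (k, k).
       if abs(A(k, k-1)) < tol*(abs(A(k-1, k-1)) + abs(A(k, k)))
           k = k - 1;
       end
   end
   disp(sort(diag(A)))                          % compare with sort(eig(M + M'))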
[1] The exceptions are complex eigenvalues with the step of degree two and after convergence to upper triangular form.
[2] Use the eigQR function and set a breakpoint in qrFactor after the Householder transform is applied (R = H*R;). Then, enter A1 = R*Q in the command window. It may be easier to see what is happening if you start with a symmetric matrix.