12.4.9. Iterative Francis Algorithm¶
The distinctive feature of the Francis algorithm is the implementation
of the shift. Like the QR algorithm, the Wilkinson shift helps find
complex eigenvalues and facilitates convergence. As illustrated in
in equation (12.8), an initial similarity transform
incorporates the shift into a bulge in the first column of the matrix.
A bulge is a nonzero element below the subdiagonal of a matrix that is
otherwise in Hessenberg form. Then, a sequence of similarity transforms
chases the bulge down the diagonal and out of the matrix. The Francis
algorithm contrasts with the QR algorithm, where the shift is applied
and removed external to the algorithm’s application of similarity
transforms. The schurFrancis
function finds the Schur
decomposition [1] of a matrix using Francis steps of degree one and
degree two.
12.4.9.1. Francis Steps of Degree One¶
The first similarity transform creates the bulge in position \((3,1)\). Then, the following similarity transforms chase the bulge down the diagonal, always two positions below the diagonal. Each transform from the left moves the bulge to the right of the diagonal. Then the similarity completion from the right moves the bulge below the diagonal and one column to the right of its previous position. The final similarity transform pushes the bulge below the last row of the matrix.
(12.8)¶\[\begin{split}\begin{aligned} &\begin{bmatrix} * & * & * & * & *\\ * & * & * & * & *\\ + & * & * & * & *\\ & & * & * & *\\ & & & * & * \end{bmatrix} \qquad \begin{bmatrix} * & * & * & * & *\\ * & * & * & * & *\\ & * & * & * & *\\ & + & * & * & *\\ & & & * & * \end{bmatrix} \\ &\begin{bmatrix} * & * & * & * & *\\ * & * & * & * & *\\ & * & * & * & *\\ & & * & * & *\\ & & + & * & * \end{bmatrix} \qquad \begin{bmatrix} * & * & * & * & *\\ * & * & * & * & *\\ & * & * & * & *\\ & & * & * & *\\ & & & * & * \end{bmatrix} \end{aligned}\end{split}\]
The creation and chasing of a bulge for one Francis step of degree one on a \(5{\times}5\) non-symmetric matrix in Hessenberg form. After each similarity transform, the bulge locations are shown with plus signs (\(+\)).
Each introduction of a bulge and chasing the bulge out of the matrix is
called a Francis step. Each Francis step diminishes the values of the
elements along the subdiagonal. The degree one Francis steps chase the
bulge using Givens transforms (Givens Rotation Matrices). As with the QR
algorithm, the algorithm finds the eigenvalue \(\lambda_k\) at
T(k, k)
when the subdiagonal element, T(k, k-1)
, converges to a
value close to zero. The accomplishment of a Francis step is comparable
to a QR step from the QR algorithm, which is why the algorithm has been
called the implicitly shifted QR algorithm. However, references to the
QR algorithm in the name have confused learners because the algorithm
never performs QR decomposition. The “implicitly shifted” label refers
to the implicit scheme for introducing the shifts into the matrix. The
confusion caused by the name prompted David Watkins, with his book on
matrix computations [WATKINS10], to start a movement
to rename the algorithm to the Francis algorithm after its inventor,
John Francis.
12.4.9.2. Francis Steps of Degree Two¶
Complex numbers are not a problem for modern computer systems. However, that was not always the case. Due to the limitations of his computer, Francis developed a variant algorithm to minimize the need for storing or processing complex numbers. The algorithm uses a degree two Francis step, which takes advantage of complex eigenvalues in conjugate pairs. Complex conjugate pairs become real when they are either multiplied or added together. As with the degree one Francis step, the Wilkinson shift is calculated from the two eigenvalues \(\{\rho_1, \rho_2\}\) of the lower-right \(2{\times}2\) elements of the real, upper Hessenberg matrix \(\mathbf{T_i}\). A three-element bulge is introduced into the matrix from a similarity transform using a Householder reflection matrix derived from \(p\left(\mathbf{T}_i\right)\) of equation (12.9). The \(p\) function yields a real vector, so the similarity transforms \(\m{Q}_i\) and \(\m{T}_{i+1} = \m{Q}_i^H\,\m{T}_i\,\m{Q}_i\) are always real. Because the bulge is larger, the degree two Francis step is completed by restoring the matrix to upper Hessenberg form with Householder transforms.
The algorithm leaves real eigenvalues on the diagonal. However, complex eigenvalues are represented with \(2{\times}2\) real submatrices. The lower-left element of each \(2{\times}2\) submatrix is below the diagonal. So, we do not have a proper Schur decomposition in the form of an upper triangular matrix with the eigenvalues on the diagonal.
Degree two steps converge quickly because each step is the equivalent of
two degree one steps. Also, complex conjugate eigenvalue pairs are
always next to each other, which is not always true with the QR
algorithm or degree one Francis steps. For this reason, the
schurFrancis
function first uses the degree two algorithm, even when
complex eigenvalues are needed. The degree one algorithm is used after
the degree two algorithm to put the complex eigenvalues on the diagonal,
which is quick because most of the iterative computations were completed
by the degree two algorithm. The degree two algorithm option is also
well suited to cases where only the eigenvalues are needed because the
eigenvalue solver can easily find the eigenvalues from the
\(2{\times}2\) submatrices.
The example below shows the output from the Schur decomposition with the
mode
option set to ’real’
so that only degree two Francis steps
are taken.
>> T = schurFrancis(A, 'real')
T =
27.1312 -8.5259 -11.5434 -3.4973 -5.0473
0 10.8960 9.3448 5.5289 -10.6843
0 0 -3.3198 17.2476 2.9627
0 0 -8.1861 -0.6561 -7.7879
0 0 0 0 -1.0513
>> eigs = eigFrancis(A)
eigs =
27.1312 + 0.0000i
10.8960 + 0.0000i
-1.9879 +11.8075i
-1.9879 -11.8075i
-1.0513 + 0.0000i
Convergence to an eigenvalue with degree one Francis steps occurs when the value at \(\mathbf{T}_{k,k-1}\) is near zero. However, convergence to a pair of complex conjugate eigenvalues using the degree two procedure occurs when the value at \(\mathbf{T}_{k-1,k-2}\) is near zero.
Note
Some convergence points, although small, do not go below the convergence threshold. So, convergence can also be detected when the subdiagonal value does not change between Francis steps.
12.4.9.3. Zeros on the Subdiagonal¶
A mixture of zero and nonzero values on the subdiagonal is problematic
for the Francis algorithm. The Francis algorithm requires that the
matrix be properly Hessenberg, which means that the matrix has all
nonzero elements on the subdiagonal. The implicit Q theorem
(Implicit Q Theorem) shows that the bulge in the first column is
applied to the remaining columns because the matrix is properly
Hessenberg. A zero on the subdiagonal prevents moving the bulge below
the subdiagonal when completing the similarity transform. If there is no
bulge, the next transform matrix is an identity matrix, and convergence
stops. The schurFrancis
function splits the matrix into properly
Hessenberg submatrix regions by changing the variables s
and k
that control the columns where the algorithm starts and ends each
Francis step.
12.4.9.4. Eigenvalues of a Triangular Block Matrix¶
When the eigFrancis
function is seeking only the eigenvalues of a
non-symmetric matrix, the ’real’
option is passed to the
schurFrancis
function, which leaves nonzero values on the
subdiagonal where there is a \(2{\times}2\) submatrix for a pair of
complex conjugate eigenvalues.
Theorem 12.5 (Eigenvalues of a triangular block matrix)
Let matrix \(\bf{T}\) be a quasitriangular block matrix with at least one zero value on the subdiagonal. The set of eigenvalues of \(\bf{T}\) comprises the union of the eigenvalues from its diagonal blocks. A zero value on the subdiagonal indicates a division point of the blocks, with a block above and to the left of the zero and the other block to the right and below the zero. With reference to equation (12.10), \(\lambda(\mathbf{T}) = \{\lambda(\mathbf{T}_1) \cup \lambda(\mathbf{T}_2)\}\).
As depicted below, a quasitriangular matrix may be split into blocks based on zero and nonzero values on the subdiagonal. The eigenvalues of the \(2{\times}2\) submatrix holding \(\tilde{\lambda_3}\) and \(\tilde{\lambda_4}\) are needed to find \(\lambda_3\) and \(\lambda_4\).
Proof. Since the eigenvalues are the roots of the characteristic equation given by \(det(\mathbf{T} - \lambda\,\mathbf{I}) = \bm{0}\), we look at how determinants of block-diagonal matrices are calculated to gain the form of the characteristic polynomial. First, establish the determinant of matrix products, \(det(\mathbf{A\,B}) = det(\mathbf{A})\,det(\mathbf{B})\). Next, consider block diagonal matrices with an identity submatrix, which follows from Laplace expansion and \(det(\mathbf{I}) = 1\).
Now express equation (12.10) as a product of matrices.
The determinant of the middle matrix of equation (12.12) is one because it is upper triangular with ones on the diagonal. From equation (12.11), the determinant of equation (12.12) is \(det(\mathbf{T}_1)\,det(\mathbf{T}_2)\). By applying these determinants to characteristic equations, we get the roots of the polynomials from \(\bf{T}_1\) and \(\bf{T}_2\) equating to a union of eigenvalues.
\(\qed\)
The conclusion of Eigenvalues of a Triangular Block Matrix is that a zero on the
subdiagonal of a matrix in Hessenberg form means that we may split the
matrix at that point when we only wish to find the eigenvalues. The code
in eigFrancis
takes advantage of this. When a zero is found on the
subdiagonal, the block of elements above the zero is closed, and a new
block is started with the elements to the right and below the zero.
Consecutive zeros will close and start new blocks, so each diagonal
element is an eigenvalue. A nonzero element on the subdiagonal extends
the current block.
12.4.9.5. Implicit Q Theorem¶
The implicit Q theorem establishes that the Francis algorithm has the same potential to converge as the QR algorithm. The theorem is listed here with the equations used in the texts by Stewart [STEWART73] and by Demmel [DEMMEL97].
Theorem 12.6 (Implicit Q Theorem)
Given matrices \(\mathbf{T, Q} \in \mathbb{C}^{n{\times}n}\) with \(\mathbf{Q}\) unitary and \(\mathbf{T}_{i+1} = \mathbf{Q}^H\,\mathbf{T}_i\,\mathbf{Q}\) in unreduced upper Hessenberg [2] form, then columns 2 through \(n\) of \(\mathbf{Q}\) are uniquely determined by the first column of \(\mathbf{Q}\).
Proof. The key aspect of the Francis algorithm that makes the implicit Q theorem true comes from using matrices in upper Hessenberg form. Each iteration begins with a unitary similarity transform, \(\mathbf{Q}_1\), designed to operate on the first column of \(\mathbf{T}_i - \rho_i\,\mathbf{I}\). The similarity transformation of \(\mathbf{Q}_0^H\,\mathbf{T\, Q}_0\) creates the bulge in the first column below the subdiagonal. Then, the subsequent similarity transforms, (\(\mathbf{Q}_1, \mathbf{Q}_2, \ldots \mathbf{Q}_{k-2}\)), push the bulge from column to column, always below the subdiagonal, eventually pushing the bulge out of \(\bf{T}\).
Removing the iteration count for clarity, let \(\mathbf{A} = \mathbf{T}_i\), \(\mathbf{B} = \mathbf{T}_{i+1}\), and introduce a new matrix, \(\mathbf{C} = \mathbf{A}\,\mathbf{Q} = \mathbf{Q\,B}\). We can find an expression for columns 2 through \(n\) of \(\bf{Q}\) by examining each column of \(\bf{C}\). Each element of \(\bf{C}\) comes from an abbreviated sum (stopping at \(k + 1\)) since \(\bf{A}\) and \(\bf{B}\) are both in Hessenberg form.
Column \(k\) of \(\bf{C}\) is then
The final term of equation (12.13) uniquely determines column \(k + 1\) of \(\bf{Q}\) for \(k = 1,\ldots n-1\) from the first column of \(\bf{Q}\), \(\bf{A}\), and columns 1 to \(k\) of \(\bf{B}\).
Each Francis step achieves the same effect as a QR step from the QR algorithm because both the implicit bulge and the explicit shift are derived from \(\mathbf{T}_i - \rho_i\,\mathbf{I}\).
\(\qed\)
12.4.10. The eigFrancis
Function¶
The eigFrancis
function handles the administrative details of
calculating eigenvalues and eigenvectors with the Francis algorithm. It
calls the schurFrancis
function to find the eigenvalues and the
Schur vectors. The eigenvectors are the product of the Schur vectors and
the eigenvectors of the upper triangular matrix (Eigenvectors Following the Schur Decomposition).
function [X, Lambda] = eigFrancis(A)
% EIGFRANCIS - Find the eigenvalues and eigenvectors of a matrix
% using Francis's algorithm, a.k.a. implicitly-shifted QR.
%
% A - Square matrix
% EIGFRANCIS(A) or L = EIGFRANCIS(A) returns the eigenvalues of A.
% [X, L] = EIGFRANCIS(A) returns the eigenvectors of A as the
% columns of the X matrix and the eigenvalues of A are
% along the diagonal of matrix L.
%
% The Schur decomposition (schurFrancis) does the real work.
%
% NOTE: This code is for educational purposes.
% For other purposes, use MATLAB's eig function.
%
% see also: EIG, EIG22, HESS, SCHURFRANCIS, UTEIGVECT
[m, n] = size(A);
if m ~= n
error("Matrix must be square to find eigenvalues")
end
if m == 1, if nargout <= 1, X = A; else, X = 1; Lambda = A; end
return, end
if m == 2
[l1, l2] = eig22(A);
if nargout > 1
Lambda = diag([l1 l2]);
X = [null(A - l1*eye(2)), null(A - l2*eye(2))];
else, X = [l1; l2];
end, return
end
tol = 1e-14;
if nargout <= 1 % only want eigenvalues
B = hess(A);
% Split if any zeros are on the subdiagonal
for k = 1:n-1
if abs(B(k+1,k)) < tol
L1 = eigFrancis(B(1:k,1:k));
L2 = eigFrancis(B(k+1:m,k+1:n));
X = [L1; L2];
return;
end % Any other zeros found in recursion.
end
if isequal(A, A') % Hermitian symmetric
L = schurFrancis(B, 'real');
else
L = schurFrancis(B);
end
X = diag(L);
return
else % Find eigenvectors also
if isequal(A, A') % Hermitian symmetric
% Schur decomposition = eigendecomposition
[X, L] = schurFrancis(A, 'real');
else
[Q, L] = schurFrancis(A); % L is upper triangular
V = UTeigVect(L);
X = Q*V;
for k=1:n % no round-off on imag of real numbers
if abs(imag(L(k,k))) < tol
X(:,k) = real(X(:,k));
L(k,k) = real(L(k,k));
end
end
end
Lambda = L.*eye(n); % zeros except on diagonal
end
end
function [Q, T] = schurFrancis(A, mode)
% SCHURFRANCIS - Francis's (implicit shift)
% iterative Schur decomposition
% [Q,T] = schurFrancis(A, mode);
%
% A - Input Square matrix
% mode - Optional input, if mode is "real" and A is real,
% degree two steps are used for complex eigenvalues.
%
% Q - Schur-Vectors, Unitary matrix, often complex
% T - Upper-triangular with the eigenvalues on the diagonal
% for all real eigenvalues or no 'real' option.
% - Quasitriangular if some eigenvalues are complex and
% mode == 'real'.
% - T is similar to A -- same eigenvalues.
% A = Q*T*Q', likewise T = Q'*A*Q
%
% NOTE: This code is for educational purposes.
%
% See also: SCHUR, EIG22, EIGFRANCIS, GIVENS, HOUSEHOLDER
[m, n] = size(A);
if m ~= n
error("Matrix must be square for Schur decomposition")
end
% Start with upper Hessenberg - zeros below first subdiagonal
[Q, H] = hess(A);
% Call the degree-2 algorithm first because it puts complex
% conjugate eigenvalues next to each other.
if isreal(A)
[Q, T] = schurHelper(H, Q, n, true);
if ~(exist('mode', 'var') && isequal(mode, 'real'))
[Q, T] = schurHelper(T, Q, n, false);
end
else
[Q, T] = schurHelper(H, Q, n, false);
end
if nargout <= 1, Q = T; end
end
function [Q, T] = schurHelper(T, Q, n, degree2)
% Implementation of Francis algorithm for Schur decomposition.
tol = eps(4); % 1e-14;
% Test subdiagonal for not properly Hessenberg
zeroSub = abs(diag(T, -1)) < tol;
if any(zeroSub)
if all(zeroSub) return; % upper triangular
else
k = find(~zeroSub, 1, 'last') + 1;
s = find(zeroSub(1:k-2), 1, 'last') + 1;
notProper = true;
end
else
k = n; s = 1; notProper = false;
end
if degree2, p = zeros(n, 1); end
while k > s
[rho1, rho2] = eig22(T(k-1:k,k-1:k));
if degree2 && k >= 2 && ~isreal(rho2) % degree 2 step
if k == 2, break; end % finished
if abs(T(k-1,k-2)) < tol*(abs(T(k,k)) + abs(T(k-1,k-1)))
T(k-1,k-2) = 0; k = k - 2; % deflation
if notProper && s > 1 && k <= s
s = nextStart(zeroSub(1:k-2), k);
end
continue;
end
lastSub = abs(T(k-1,k-2));
p(1:3) = [(T(s,s)-rho1)*(T(s,s)-rho2) ...
+ T(s,s+1)*T(s+1,s);
T(s+1,s)*((T(s,s) + T(s+1,s+1)) ...
- (rho1 + rho2)); T(s+2,s+1)*T(s+1,s)];
Hr = householder(real(p(1:k-s+1)), k-s+1, 0); % p is real
H = eye(n); H(s:k,s:k) = Hr;
T = H*T*H'; % create bulge with similarity transform
% Chase the bulge, put T back to Hessenberg form
[P, T] = hess(T); Q = Q*H'*P;
if abs(abs(T(k,k-2)) - lastSub) < tol
T(k-1,k-2) = 0; k = k - 2;
if notProper && s > 1 && k <= s
s = nextStart(zeroSub(1:k-2), k);
end
end
else % degree one step
lastSub = abs(T(k,k-1));
if abs(T(k,k-1)) < tol*(abs(T(k,k)) + abs(T(k-1,k-1)))
T(k, k-1) = 0;
if abs(imag(T(k,k))) < tol
T(k,k) = real(T(k,k)); % no imag round-off
end
k = k - 1; % deflation
if notProper && s > 1 && k <= s
s = nextStart(zeroSub(1:k-2), k);
end
continue;
end
if abs(T(k,k)-rho1) < abs(T(k,k)-rho2)
shift = rho1; else, shift = rho2;
end
G = givens((T(s,s) - shift), T(s+1,s), s, s+1, n);
T = G*T*G'; % create bulge with similarity transform
Q = Q*G'; % due to shift T(2,1) is not 0.
% chase the bulge
for c = s:k-2 % c = current column
G = givens(T(c+1,c), T(c+2,c), c+1, c+2, n);
T = G*T*G'; T(c+2,c) = 0; % no round-off error
Q = Q*G'; % Q = G0'*G1'*G2'*...*Gn'
end
if abs(abs(T(k,k-1)) - lastSub) < tol
T(k,k-1) = 0; k = k - 1;
if notProper && s > 1 && k <= s
s = nextStart(zeroSub(1:k-2), k);
end
end
end
end
end
function s = nextStart(zeroSub, k)
% Helper function to find the start of the next
% Francis step when there are zeros on the subdiagonal.
if any(zeroSub)
if all(zeroSub), s = k; return; end
if k > 2, s = find(zeroSub, 1, 'last') + 1;
else, s = 1;
end
else
s = 1;
end
end
Footnotes: