12.4.9. Iterative Francis Algorithm

The distinctive feature of the Francis algorithm is the implementation of the shift. Like the QR algorithm, the Wilkinson shift helps find complex eigenvalues and facilitates convergence. As illustrated in in equation (12.8), an initial similarity transform incorporates the shift into a bulge in the first column of the matrix. A bulge is a nonzero element below the subdiagonal of a matrix that is otherwise in Hessenberg form. Then, a sequence of similarity transforms chases the bulge down the diagonal and out of the matrix. The Francis algorithm contrasts with the QR algorithm, where the shift is applied and removed external to the algorithm’s application of similarity transforms. The schurFrancis function finds the Schur decomposition [1] of a matrix using Francis steps of degree one and degree two.

12.4.9.1. Francis Steps of Degree One

The first similarity transform creates the bulge in position \((3,1)\). Then, the following similarity transforms chase the bulge down the diagonal, always two positions below the diagonal. Each transform from the left moves the bulge to the right of the diagonal. Then the similarity completion from the right moves the bulge below the diagonal and one column to the right of its previous position. The final similarity transform pushes the bulge below the last row of the matrix.

(12.8)\[\begin{split}\begin{aligned} &\begin{bmatrix} * & * & * & * & *\\ * & * & * & * & *\\ + & * & * & * & *\\ & & * & * & *\\ & & & * & * \end{bmatrix} \qquad \begin{bmatrix} * & * & * & * & *\\ * & * & * & * & *\\ & * & * & * & *\\ & + & * & * & *\\ & & & * & * \end{bmatrix} \\ &\begin{bmatrix} * & * & * & * & *\\ * & * & * & * & *\\ & * & * & * & *\\ & & * & * & *\\ & & + & * & * \end{bmatrix} \qquad \begin{bmatrix} * & * & * & * & *\\ * & * & * & * & *\\ & * & * & * & *\\ & & * & * & *\\ & & & * & * \end{bmatrix} \end{aligned}\end{split}\]

The creation and chasing of a bulge for one Francis step of degree one on a \(5{\times}5\) non-symmetric matrix in Hessenberg form. After each similarity transform, the bulge locations are shown with plus signs (\(+\)).

Each introduction of a bulge and chasing the bulge out of the matrix is called a Francis step. Each Francis step diminishes the values of the elements along the subdiagonal. The degree one Francis steps chase the bulge using Givens transforms (Givens Rotation Matrices). As with the QR algorithm, the algorithm finds the eigenvalue \(\lambda_k\) at T(k, k) when the subdiagonal element, T(k, k-1), converges to a value close to zero. The accomplishment of a Francis step is comparable to a QR step from the QR algorithm, which is why the algorithm has been called the implicitly shifted QR algorithm. However, references to the QR algorithm in the name have confused learners because the algorithm never performs QR decomposition. The “implicitly shifted” label refers to the implicit scheme for introducing the shifts into the matrix. The confusion caused by the name prompted David Watkins, with his book on matrix computations [WATKINS10], to start a movement to rename the algorithm to the Francis algorithm after its inventor, John Francis.

12.4.9.2. Francis Steps of Degree Two

Complex numbers are not a problem for modern computer systems. However, that was not always the case. Due to the limitations of his computer, Francis developed a variant algorithm to minimize the need for storing or processing complex numbers. The algorithm uses a degree two Francis step, which takes advantage of complex eigenvalues in conjugate pairs. Complex conjugate pairs become real when they are either multiplied or added together. As with the degree one Francis step, the Wilkinson shift is calculated from the two eigenvalues \(\{\rho_1, \rho_2\}\) of the lower-right \(2{\times}2\) elements of the real, upper Hessenberg matrix \(\mathbf{T_i}\). A three-element bulge is introduced into the matrix from a similarity transform using a Householder reflection matrix derived from \(p\left(\mathbf{T}_i\right)\) of equation (12.9). The \(p\) function yields a real vector, so the similarity transforms \(\m{Q}_i\) and \(\m{T}_{i+1} = \m{Q}_i^H\,\m{T}_i\,\m{Q}_i\) are always real. Because the bulge is larger, the degree two Francis step is completed by restoring the matrix to upper Hessenberg form with Householder transforms.

(12.9)\[\begin{split} \begin{aligned} p\left(\mathbf{T}_i\right) &= \left(\mathbf{T}_i - \rho_1 \mathbf{I}\right) \, \left(\mathbf{T}_i - \rho_2 \mathbf{I}\right)\,\bm{e}_1 \\ &= \begin{bmatrix} {t_{11} - \rho_1} & t_{12} & * & * \\ t_{21} & {t_{22} - \rho_1} & * & * \\ 0 & t_{32} & * & * \\ 0 & 0 & * & * \end{bmatrix}\, \begin{bmatrix} {t_{11} - \rho_2} & * & * & * \\ t_{21} & * & * & * \\ 0 & * & * & * \\ 0 & 0 & * & * \end{bmatrix}\, \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} \\ &= \begin{bmatrix} {\left(t_{11} - \rho_1\right)\,\left(t_{11} - \rho_2\right) + t_{12}t_{21}} \\ {t_{21}\left(\left(t_{11} + t_{22}\right) -\left(\rho_1 + \rho_2\right)\right)} \\ {t_{32}\,t_{21}} \\ 0 \\ 0 \end{bmatrix} \end{aligned}\end{split}\]

The algorithm leaves real eigenvalues on the diagonal. However, complex eigenvalues are represented with \(2{\times}2\) real submatrices. The lower-left element of each \(2{\times}2\) submatrix is below the diagonal. So, we do not have a proper Schur decomposition in the form of an upper triangular matrix with the eigenvalues on the diagonal.

Degree two steps converge quickly because each step is the equivalent of two degree one steps. Also, complex conjugate eigenvalue pairs are always next to each other, which is not always true with the QR algorithm or degree one Francis steps. For this reason, the schurFrancis function first uses the degree two algorithm, even when complex eigenvalues are needed. The degree one algorithm is used after the degree two algorithm to put the complex eigenvalues on the diagonal, which is quick because most of the iterative computations were completed by the degree two algorithm. The degree two algorithm option is also well suited to cases where only the eigenvalues are needed because the eigenvalue solver can easily find the eigenvalues from the \(2{\times}2\) submatrices.

The example below shows the output from the Schur decomposition with the mode option set to ’real’ so that only degree two Francis steps are taken.

>> T = schurFrancis(A, 'real')
T =
   27.1312   -8.5259  -11.5434   -3.4973   -5.0473
         0   10.8960    9.3448    5.5289  -10.6843
         0         0   -3.3198   17.2476    2.9627
         0         0   -8.1861   -0.6561   -7.7879
         0         0         0         0   -1.0513
>> eigs = eigFrancis(A)
eigs =
  27.1312 + 0.0000i
  10.8960 + 0.0000i
  -1.9879 +11.8075i
  -1.9879 -11.8075i
  -1.0513 + 0.0000i

Convergence to an eigenvalue with degree one Francis steps occurs when the value at \(\mathbf{T}_{k,k-1}\) is near zero. However, convergence to a pair of complex conjugate eigenvalues using the degree two procedure occurs when the value at \(\mathbf{T}_{k-1,k-2}\) is near zero.

Note

Some convergence points, although small, do not go below the convergence threshold. So, convergence can also be detected when the subdiagonal value does not change between Francis steps.

12.4.9.3. Zeros on the Subdiagonal

A mixture of zero and nonzero values on the subdiagonal is problematic for the Francis algorithm. The Francis algorithm requires that the matrix be properly Hessenberg, which means that the matrix has all nonzero elements on the subdiagonal. The implicit Q theorem (Implicit Q Theorem) shows that the bulge in the first column is applied to the remaining columns because the matrix is properly Hessenberg. A zero on the subdiagonal prevents moving the bulge below the subdiagonal when completing the similarity transform. If there is no bulge, the next transform matrix is an identity matrix, and convergence stops. The schurFrancis function splits the matrix into properly Hessenberg submatrix regions by changing the variables s and k that control the columns where the algorithm starts and ends each Francis step.

12.4.9.4. Eigenvalues of a Triangular Block Matrix

When the eigFrancis function is seeking only the eigenvalues of a non-symmetric matrix, the ’real’ option is passed to the schurFrancis function, which leaves nonzero values on the subdiagonal where there is a \(2{\times}2\) submatrix for a pair of complex conjugate eigenvalues.

Theorem 12.5 (Eigenvalues of a triangular block matrix)

Let matrix \(\bf{T}\) be a quasitriangular block matrix with at least one zero value on the subdiagonal. The set of eigenvalues of \(\bf{T}\) comprises the union of the eigenvalues from its diagonal blocks. A zero value on the subdiagonal indicates a division point of the blocks, with a block above and to the left of the zero and the other block to the right and below the zero. With reference to equation (12.10), \(\lambda(\mathbf{T}) = \{\lambda(\mathbf{T}_1) \cup \lambda(\mathbf{T}_2)\}\).

(12.10)\[\begin{split}\mathbf{T} = \begin{bmatrix} \mathbf{T}_1 & \mathbf{*} \\ \mathbf{0} & \mathbf{T}_2 \end{bmatrix}\end{split}\]

As depicted below, a quasitriangular matrix may be split into blocks based on zero and nonzero values on the subdiagonal. The eigenvalues of the \(2{\times}2\) submatrix holding \(\tilde{\lambda_3}\) and \(\tilde{\lambda_4}\) are needed to find \(\lambda_3\) and \(\lambda_4\).

\[\begin{split}\begin{bmatrix} \begin{matrix} \boxed{\lambda_1} & * \\ 0 & \boxed{\lambda_2} \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{matrix} & \begin{matrix} \begin{matrix} * & * \\ * & * \end{matrix} \\ \boxed{ \begin{matrix} \tilde{\lambda_3} & * \\ * & \tilde{\lambda_4} \end{matrix} } \\ \begin{matrix} 0 & 0 \end{matrix} \end{matrix} & \begin{matrix} * \\ * \\ * \\ * \\ \boxed{\lambda_5} \end{matrix} \end{bmatrix}\end{split}\]

Proof. Since the eigenvalues are the roots of the characteristic equation given by \(det(\mathbf{T} - \lambda\,\mathbf{I}) = \bm{0}\), we look at how determinants of block-diagonal matrices are calculated to gain the form of the characteristic polynomial. First, establish the determinant of matrix products, \(det(\mathbf{A\,B}) = det(\mathbf{A})\,det(\mathbf{B})\). Next, consider block diagonal matrices with an identity submatrix, which follows from Laplace expansion and \(det(\mathbf{I}) = 1\).

(12.11)\[\begin{split}det\begin{bmatrix} \mathbf{T} & \mathbf{0} \\ \mathbf{0} & \mathbf{I} \end{bmatrix} = det\begin{bmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{0} & \mathbf{T} \end{bmatrix} = det(\mathbf{T})\end{split}\]

Now express equation (12.10) as a product of matrices.

(12.12)\[\begin{split}\mathbf{T} = \begin{bmatrix} \mathbf{T}_1 & \mathbf{*} \\ \mathbf{0} & \mathbf{T}_2 \end{bmatrix} = \begin{bmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{0} & \mathbf{T}_2 \end{bmatrix}\, \begin{bmatrix} \mathbf{I} & \mathbf{*} \\ \mathbf{0} & \mathbf{I} \end{bmatrix}\, \begin{bmatrix} \mathbf{T}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{I} \end{bmatrix}\end{split}\]

The determinant of the middle matrix of equation (12.12) is one because it is upper triangular with ones on the diagonal. From equation (12.11), the determinant of equation (12.12) is \(det(\mathbf{T}_1)\,det(\mathbf{T}_2)\). By applying these determinants to characteristic equations, we get the roots of the polynomials from \(\bf{T}_1\) and \(\bf{T}_2\) equating to a union of eigenvalues.

\(\qed\)

The conclusion of Eigenvalues of a Triangular Block Matrix is that a zero on the subdiagonal of a matrix in Hessenberg form means that we may split the matrix at that point when we only wish to find the eigenvalues. The code in eigFrancis takes advantage of this. When a zero is found on the subdiagonal, the block of elements above the zero is closed, and a new block is started with the elements to the right and below the zero. Consecutive zeros will close and start new blocks, so each diagonal element is an eigenvalue. A nonzero element on the subdiagonal extends the current block.

12.4.9.5. Implicit Q Theorem

The implicit Q theorem establishes that the Francis algorithm has the same potential to converge as the QR algorithm. The theorem is listed here with the equations used in the texts by Stewart [STEWART73] and by Demmel [DEMMEL97].

Theorem 12.6 (Implicit Q Theorem)

Given matrices \(\mathbf{T, Q} \in \mathbb{C}^{n{\times}n}\) with \(\mathbf{Q}\) unitary and \(\mathbf{T}_{i+1} = \mathbf{Q}^H\,\mathbf{T}_i\,\mathbf{Q}\) in unreduced upper Hessenberg [2] form, then columns 2 through \(n\) of \(\mathbf{Q}\) are uniquely determined by the first column of \(\mathbf{Q}\).

Proof. The key aspect of the Francis algorithm that makes the implicit Q theorem true comes from using matrices in upper Hessenberg form. Each iteration begins with a unitary similarity transform, \(\mathbf{Q}_1\), designed to operate on the first column of \(\mathbf{T}_i - \rho_i\,\mathbf{I}\). The similarity transformation of \(\mathbf{Q}_0^H\,\mathbf{T\, Q}_0\) creates the bulge in the first column below the subdiagonal. Then, the subsequent similarity transforms, (\(\mathbf{Q}_1, \mathbf{Q}_2, \ldots \mathbf{Q}_{k-2}\)), push the bulge from column to column, always below the subdiagonal, eventually pushing the bulge out of \(\bf{T}\).

Removing the iteration count for clarity, let \(\mathbf{A} = \mathbf{T}_i\), \(\mathbf{B} = \mathbf{T}_{i+1}\), and introduce a new matrix, \(\mathbf{C} = \mathbf{A}\,\mathbf{Q} = \mathbf{Q\,B}\). We can find an expression for columns 2 through \(n\) of \(\bf{Q}\) by examining each column of \(\bf{C}\). Each element of \(\bf{C}\) comes from an abbreviated sum (stopping at \(k + 1\)) since \(\bf{A}\) and \(\bf{B}\) are both in Hessenberg form.

\[c_{j,k} = \sum_{i=1}^{k+1} a_{(j,i)}\, q_{(i,k)} = \sum_{i=1}^{k+1} q_{(j,i)}\, b_{(i,k)}, \quad \text{for } j = 1 \ldots k+1, k = 1 \ldots n-1\]

Column \(k\) of \(\bf{C}\) is then

(12.13)\[\bm{c}_k = \mathbf{A}\,\bm{q}_k = \sum_{i=1}^{k+1} \bm{q}_i b_{i,k}, \quad \text{for } k = 1 \ldots n-1.\]

The final term of equation (12.13) uniquely determines column \(k + 1\) of \(\bf{Q}\) for \(k = 1,\ldots n-1\) from the first column of \(\bf{Q}\), \(\bf{A}\), and columns 1 to \(k\) of \(\bf{B}\).

\[\bm{q}_{k+1} = \frac{\mathbf{A}\,\bm{q}_k - \sum_{i=1}^{k} \bm{q}_i b_{i,k}}{b_{k+1, k}}\]

Each Francis step achieves the same effect as a QR step from the QR algorithm because both the implicit bulge and the explicit shift are derived from \(\mathbf{T}_i - \rho_i\,\mathbf{I}\).

\(\qed\)

12.4.10. The eigFrancis Function

The eigFrancis function handles the administrative details of calculating eigenvalues and eigenvectors with the Francis algorithm. It calls the schurFrancis function to find the eigenvalues and the Schur vectors. The eigenvectors are the product of the Schur vectors and the eigenvectors of the upper triangular matrix (Eigenvectors Following the Schur Decomposition).

function [X, Lambda] = eigFrancis(A)
% EIGFRANCIS - Find the eigenvalues and eigenvectors of a matrix
%     using Francis's algorithm, a.k.a. implicitly-shifted QR.
%
% A - Square matrix
% EIGFRANCIS(A) or L = EIGFRANCIS(A) returns the eigenvalues of A.
% [X, L] = EIGFRANCIS(A) returns the eigenvectors of A as the
%     columns of the X matrix and the eigenvalues of A are
%     along the diagonal of matrix L.
%
% The Schur decomposition (schurFrancis) does the real work.
%
% NOTE: This code is for educational purposes.
% For other purposes, use MATLAB's eig function.
%
% see also: EIG, EIG22, HESS, SCHURFRANCIS, UTEIGVECT
    [m, n] = size(A);
    if m ~= n
        error("Matrix must be square to find eigenvalues")
    end
    if m == 1, if nargout <= 1, X = A; else, X = 1; Lambda = A; end
    return, end
    if m == 2
        [l1, l2] = eig22(A);
        if nargout > 1
            Lambda = diag([l1 l2]);
            X = [null(A - l1*eye(2)), null(A - l2*eye(2))];
        else, X = [l1; l2];
        end, return
    end
    tol = 1e-14;

    if nargout <= 1 % only want eigenvalues
        B = hess(A);
        % Split if any zeros are on the subdiagonal
        for k = 1:n-1
            if abs(B(k+1,k)) < tol
                L1 = eigFrancis(B(1:k,1:k));
                L2 = eigFrancis(B(k+1:m,k+1:n));
                X = [L1; L2];
                return;
            end  % Any other zeros found in recursion.
        end
        if isequal(A, A') % Hermitian symmetric
            L = schurFrancis(B, 'real');
        else
            L = schurFrancis(B);
        end
        X = diag(L);
        return
    else            % Find eigenvectors also
        if isequal(A, A') % Hermitian symmetric
            % Schur decomposition = eigendecomposition
            [X, L] = schurFrancis(A, 'real');
        else
            [Q, L] = schurFrancis(A); % L is upper triangular
            V = UTeigVect(L);
            X = Q*V;
            for k=1:n % no round-off on imag of real numbers
                if abs(imag(L(k,k))) < tol
                    X(:,k) = real(X(:,k));
                    L(k,k) = real(L(k,k));
                end
            end
        end
        Lambda = L.*eye(n); % zeros except on diagonal
    end
end
function [Q, T] = schurFrancis(A, mode)
% SCHURFRANCIS -  Francis's (implicit shift)
%             iterative Schur decomposition
% [Q,T] = schurFrancis(A, mode);
%
% A - Input Square matrix
% mode - Optional input, if mode is "real" and A is real,
%   degree two steps are used for complex eigenvalues.
%
% Q - Schur-Vectors, Unitary matrix, often complex
% T - Upper-triangular with the eigenvalues on the diagonal
%     for all real eigenvalues or no 'real' option.
%   - Quasitriangular if some eigenvalues are complex and
%     mode == 'real'.
%   - T is similar to A -- same eigenvalues.
% A = Q*T*Q', likewise T = Q'*A*Q
%
% NOTE: This code is for educational purposes.
%
% See also: SCHUR, EIG22, EIGFRANCIS, GIVENS, HOUSEHOLDER
    [m, n] = size(A);
    if m ~= n
        error("Matrix must be square for Schur decomposition")
    end
    % Start with upper Hessenberg - zeros below first subdiagonal
    [Q, H] = hess(A);
    % Call the degree-2 algorithm first because it puts complex
    % conjugate eigenvalues next to each other.
    if isreal(A)
        [Q, T] = schurHelper(H, Q, n, true);
        if ~(exist('mode', 'var') && isequal(mode, 'real'))
            [Q, T] = schurHelper(T, Q, n, false);
        end
    else
        [Q, T] = schurHelper(H, Q, n, false);
    end
    if nargout <= 1, Q = T; end
end

function [Q, T] = schurHelper(T, Q, n, degree2)
% Implementation of Francis algorithm for Schur decomposition.

tol = eps(4); % 1e-14;
% Test subdiagonal for not properly Hessenberg
zeroSub = abs(diag(T, -1)) < tol;
if any(zeroSub)
    if all(zeroSub) return; % upper triangular
    else
        k = find(~zeroSub, 1, 'last') + 1;
        s = find(zeroSub(1:k-2), 1, 'last') + 1;
        notProper = true;
    end
else
    k = n; s = 1; notProper = false;
end
if degree2, p = zeros(n, 1); end
while k > s
    [rho1, rho2] = eig22(T(k-1:k,k-1:k));
    if degree2 && k >= 2 && ~isreal(rho2) % degree 2 step
        if k == 2, break; end % finished
        if abs(T(k-1,k-2)) < tol*(abs(T(k,k)) + abs(T(k-1,k-1)))
            T(k-1,k-2) = 0; k = k - 2;   % deflation
            if notProper && s > 1 && k <= s
                s = nextStart(zeroSub(1:k-2), k);
            end
            continue;
        end
        lastSub = abs(T(k-1,k-2));
        p(1:3) = [(T(s,s)-rho1)*(T(s,s)-rho2) ...
                    + T(s,s+1)*T(s+1,s);
                   T(s+1,s)*((T(s,s) + T(s+1,s+1)) ...
                    - (rho1 + rho2)); T(s+2,s+1)*T(s+1,s)];
        Hr = householder(real(p(1:k-s+1)), k-s+1, 0); % p is real
        H = eye(n); H(s:k,s:k) = Hr;
        T = H*T*H';  % create bulge with similarity transform
        % Chase the bulge, put T back to Hessenberg form
        [P, T] = hess(T); Q = Q*H'*P;
        if abs(abs(T(k,k-2)) - lastSub) < tol
            T(k-1,k-2) = 0; k = k - 2;
            if notProper && s > 1 && k <= s
                s = nextStart(zeroSub(1:k-2), k);
            end
        end

    else                % degree one step
        lastSub = abs(T(k,k-1));
        if abs(T(k,k-1)) < tol*(abs(T(k,k)) + abs(T(k-1,k-1)))
            T(k, k-1) = 0;
            if abs(imag(T(k,k))) < tol
                T(k,k) = real(T(k,k)); % no imag round-off
            end
            k = k - 1;   % deflation
            if notProper && s > 1 && k <= s
                s = nextStart(zeroSub(1:k-2), k);
            end
            continue;
        end
        if abs(T(k,k)-rho1) < abs(T(k,k)-rho2)
            shift = rho1; else, shift = rho2;
        end
        G = givens((T(s,s) - shift), T(s+1,s), s, s+1, n);
        T = G*T*G';   % create bulge with similarity transform
        Q = Q*G';     % due to shift T(2,1) is not 0.
        % chase the bulge
        for c = s:k-2  % c = current column
            G = givens(T(c+1,c), T(c+2,c), c+1, c+2, n);
            T = G*T*G'; T(c+2,c) = 0; % no round-off error
            Q = Q*G';   % Q = G0'*G1'*G2'*...*Gn'
        end
        if abs(abs(T(k,k-1)) - lastSub) < tol
            T(k,k-1) = 0; k = k - 1;
            if notProper && s > 1 && k <= s
                s = nextStart(zeroSub(1:k-2), k);
            end
        end
    end
end
end

function s = nextStart(zeroSub, k)
% Helper function to find the start of the next
% Francis step when there are zeros on the subdiagonal.
    if any(zeroSub)
        if all(zeroSub), s = k; return; end
        if k > 2, s = find(zeroSub, 1, 'last') + 1;
        else, s = 1;
        end
    else
        s = 1;
    end
end

Footnotes: