12.2. The Unitary Transformation Matrices

For ease of future reference, the two unitary transforms are presented here rather than in the context of the historical development.

Householder transformations have the advantage of setting several values of a column or row to zero at once, while the application of a Givens transformation moves only one matrix element to zero. Because it zeros many elements per application, the Householder transformation is a good choice for direct algorithms such as the QR, Hessenberg, and bidiagonal factoring algorithms.

Both Givens and Householder transformations affect other matrix values besides those we want to set to zero. Thus, we must be careful about where and in which order they are used. Householder reflection transforms multiplied from the left change values along the rows, left to right, from the column operated on. Each Givens transform changes values on only two rows when multiplying from the left or two columns when multiplying from the right. The restricted impact of the Givens transformations makes them well suited for the iterative Francis algorithm as it trims a matrix from Hessenberg form to upper triangular, or as it thins a bidiagonal matrix down to diagonal.

We can apply a transform from the left to set matrix elements in a column to zero (\mathbf{T} = \mathbf{H\, T}). We can also apply a transform's conjugate transpose from the right to set elements along a row to zero (\mathbf{T} = \mathbf{T \, H}^H).

Note

Both Givens and Householder transforms produce real values at the targeted positions of the matrix, which is why the singular values of the SVD are always real. Eigenvalues can be complex because other matrix elements are affected by the multiplications, and the completion of similarity transforms allows them to remain complex where needed.

12.2.1. Givens Rotation Matrices

Fig. 12.1 depicts a Givens rotation on two elements of a column. The two elements are regarded as the coordinates of a point on a plane. The Givens rotation rotates the point clockwise by \theta to set the second vector value to zero. [1] If the rotator operates on elements of a row, then the transpose of the rotator is multiplied from the right.

../_images/GivensRot.png

Fig. 12.1 Givens rotations view two elements of a column as sides of a right triangle. The Givens rotator is a 2{\times}2 rotation matrix that when applied to a column vector rotates the vector by -\theta, which flattens the triangle and sets the second element of the vector to a value of zero.

../_images/GivensMat.png

Fig. 12.2 A Givens rotation matrix is an m{\times}m identity matrix except in four elements that hold the coefficients of the 2{\times}2 Givens rotator matrix. The indices, either rows or columns, of the two matrix elements rotated are not required to be consecutive.

The computationally fastest way to apply a Givens rotation is to use the 2{\times}2 rotator and multiply it by only the two rows impacted by the rotation. For simplicity, the MATLAB givens function below returns an m{\times}m matrix for application to an m{\times}n matrix. The transformation matrix can be applied to a whole matrix because, as shown in Fig. 12.2, the rotator matrix is placed at the appropriate position along the diagonal of an identity matrix.

function G = givens(x1, x2, i1, i2, m)
% GIVENS - Create a Givens rotation matrix
%
%  x1, x2 - Real or complex vector values
%  i1 - first value index number.
%  i2 - second value index number
%  m - Size of transformation matrix
%
%  G - Square, unitary rotation matrix
%
% see also: SCHURFRANCIS, SVDFRANCIS, HOUSEHOLDER

    G = eye(m); r = norm([x1; x2]);
    if r < eps(2) || abs(x2) < eps
        return
    end
    c = x1/r; s = x2/r;
    if isreal(x1) && isreal(x2) % real case
        G(i1,i1) = c; G(i1,i2) = s;
    else   % complex case
        G(i1,i1) = conj(c); G(i1,i2) = conj(s);
    end
    G(i2,i1) = -s; G(i2,i2) = c;
end
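
When performance matters, the m{\times}m matrix need not be formed at all. The following sketch (not one of the listed functions; the matrix and indices are illustrative assumptions) applies the 2{\times}2 rotator directly to the two affected rows.

    % Sketch: apply a Givens rotation without forming the m-by-m matrix.
    % Only rows i1 and i2 of A change (real case shown).
    A = [1 2 0; 3 4 5; 0 6 7];    % example matrix
    i1 = 1; i2 = 2; j = 1;        % zero A(i2,j) by rotating rows i1 and i2
    x1 = A(i1,j); x2 = A(i2,j);
    r = norm([x1; x2]); c = x1/r; s = x2/r;
    A([i1 i2],:) = [c s; -s c]*A([i1 i2],:);   % O(n) work per rotation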

12.2.1.1. Verifying the real and complex solution

We want the application of the Givens rotator to replace the first matrix element (x_1) with \sqrt{x_1^2 + x_2^2} and to replace the second matrix element (x_2) with 0. For both the real and complex cases, we have the following variables.

r = \sqrt{x_1^2 + x_2^2}, \quad
            c = \cos \theta = \frac{x_1}{r}, \quad
            s = \sin \theta = \frac{x_2}{r}

The application of the Givens rotator to real numbers is as follows.

\mat{c s; -s c}\vector{x_1; x_2} =
        \frac{1}{r}\,\mat{x_1 x_2; -x_2 x_1}\,\vector{x_1; x_2} =
        \frac{1}{r}\,\vector{{x_1^2 + x_2^2}; {-x_1\,x_2 + x_1\,x_2}}
        = \vector{r; 0}
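
A quick numeric check using the givens function and a 3-4-5 right triangle confirms the result.

>> G = givens(3, 4, 1, 2, 2);
>> G*[3; 4]
ans =
     5
     0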

The easiest way to verify the complex case is with variable substitutions, using complex exponentials to track the relationship between each variable's real and imaginary parts. One could multiply while keeping real and imaginary parts separate, but it is simpler to add exponents when multiplying.

\begin{array}{lcl}
            u = \abs{x_1}  &\qquad  &v = \abs{x_2} \\
            x_1 = u\,e^{i\,\phi} &\qquad  &x_2 = v\,e^{i\,\alpha} \\
                               &r = \sqrt{u^2 + v^2}  \\
            c = \cos \theta = \frac{u\,e^{i\,\phi}}{r} &\qquad
                        &s = \sin \theta = \frac{v\,e^{i\,\alpha}}{r} \\
            \bar{c} = \frac{u\,e^{-i\,\phi}}{r} &\qquad
                    &\bar{s} = \frac{v\,e^{-i\,\alpha}}{r}
        \end{array}

\begin{aligned}
            \mat{\bar{c} \bar{s}; -s c}\,\vector{x_1; x_2}
            &= \frac{1}{r} \mat{u\,e^{-i\,\phi} v\,e^{-i\,\alpha};
            -v\,u\,e^{i\,\alpha} u\,e^{i\,\phi}}\,
            \vector{u\,e^{i\,\phi}; v\,e^{i\,\alpha}}  \\
            &= \frac{1}{r}\,\vector{{u^2\,e^{i\,(\phi - \phi)} +
            v^2\,e^{i\,(\alpha - \alpha)}};
            {-v\,u\,e^{i\,(\alpha + \phi)} + v\,u\,e^{i\,(\alpha + \phi)}}}\,
            = \frac{1}{r}\,\vector{{u^2 + v^2}; 0} \,
            = \vector{r; 0}
        \end{aligned}
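
The complex case can likewise be checked numerically. Here r = \sqrt{30} \approx 5.4772, and the first element of the rotated vector is real (output shown to MATLAB's default precision).

>> x = [3 + 4i; 1 - 2i];
>> G = givens(x(1), x(2), 1, 2, 2);
>> G*x
ans =
   5.4772 + 0.0000i
   0.0000 + 0.0000i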

12.2.1.2. Verifying Unitary

We can verify the unitary property of the Givens rotator matrices with the required relationship \mathbf{G\, G}^H = \mathbf{I}. Let c = \cos \theta and s = \sin \theta.

Real case:

\mat{c s; -s c}\,\mat{c -s; s c}
        = \mat{{c^2 + s^2}, {-cs + cs};
               {-cs + cs}, {s^2 + c^2}} = \mat{1 0; 0 1}

Complex case:

\mat{\bar{c} \bar{s}; -s c}\,\mat{c -\bar{s}; s \bar{c}}
        = \mat{{\bar{c}c + \bar{s}s}, {-\bar{c}\bar{s} + \bar{c}\bar{s}};
               {-cs + cs}, {\bar{s}s + c\bar{c}}} = \mat{1 0; 0 1}
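
A numeric spot check of the unitary property is a one-liner; the result is on the order of machine epsilon.

>> G = givens(3 + 4i, 1 - 2i, 1, 2, 2);
>> norm(G*G' - eye(2))    % approximately 1e-16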

12.2.1.3. Example use of Givens rotations

The example below comes from Francis's algorithm for finding the SVD of a matrix, an iterative procedure that drives the matrix toward diagonal form (Demmel–Kahan 1990). The example shows a Francis step where Givens transformations are applied in a pattern from the upper left to the lower right of the matrix. The starting matrix is bidiagonal, with nonzero values on the diagonal and superdiagonal. Because of the bidiagonal structure, left multiplication by a Givens transformation changes values only within a 2{\times}3 region. The range of impact when operating on row elements with right multiplication is 3{\times}2. Extra nonzero values appear after the transformations are applied; those unwanted values are then chased out of the matrix by subsequent transformations. The resulting matrix is shown after the first two transformations and after the final listed transformation. The norm of the superdiagonal is noted before and after the Givens transformations. Note that the matrix is closer to diagonal after the Francis step than before; additional iterations of the Francis step are needed for the matrix to converge to diagonal form.

>> A = [1 2 0 0; 0 1 2 0; 0 0 1 2; 0 0 0 1];

% Initial superdiagonal energy
>> norm(diag(A, 1))
ans =
    3.4641
>> G = givens(A(1,1), A(1,2), 1, 2, 4); A = A*G'
A =
       2.2361         0         0         0
       0.8944    0.4472    2.0000         0
            0         0    1.0000    2.0000
            0         0         0    1.0000

>> G = givens(A(1,1), A(2,1), 1, 2, 4); A = G*A
A =
       2.4083    0.1661    0.7428         0
            0    0.4152    1.8570         0
            0         0    1.0000    2.0000
            0         0         0    1.0000

>> G = givens(A(1,2), A(1,3), 2, 3, 4); A = A*G';
>> G = givens(A(2,2), A(3,2), 2, 3, 4); A = G*A;
>> G = givens(A(2,3), A(2,4), 3, 4, 4); A = A*G';
>> G = givens(A(3,3), A(4,3), 3, 4, 4); A = G*A
A =
       2.4083    0.7611         0         0
            0    2.1385    0.9181         0
            0   -0.0000    2.0477    0.0527
            0    0.0000         0    0.0948

% Final superdiagonal energy
>> norm(diag(A, 1))
ans =
    1.1937
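
The six transformations above make up one sweep of the matrix. A minimal sketch of repeating such sweeps until convergence follows, continuing with A from the console session above. This is a zero-shift version for illustration only; the Demmel–Kahan algorithm adds shifts and deflation tests that are omitted here.

    % Sketch: repeat zero-shift Francis sweeps on the bidiagonal A
    % until the superdiagonal is negligible. Convergence is linear,
    % so many sweeps may be needed.
    n = size(A, 1);
    while norm(diag(A, 1)) > 1e-10
        for j = 1:n-1
            if j == 1    % the first rotation starts the sweep
                G = givens(A(1,1), A(1,2), 1, 2, n);
            else         % later rotations chase the bulge in row j-1
                G = givens(A(j-1,j), A(j-1,j+1), j, j+1, n);
            end
            A = A*G';                                  % column rotation
            G = givens(A(j,j), A(j+1,j), j, j+1, n);   % row rotation
            A = G*A;
        end
    end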

12.2.2. Householder Reflection Matrices

A Householder reflection matrix is part identity matrix and part Householder reflector. The transformations are applied starting at the first column and advancing one column at a time. The transformation moves column elements below a specified element to zero. Columns already processed are multiplied by the identity portion of the transformation, while the remaining columns are multiplied by the reflector matrix.

When a column vector is multiplied by its Householder reflector, only the first element of the vector is left with a nonzero value.

\mathbf{Hr}\,\bm{x} = \mathbf{Hr} \vector{{\times}
{\times} {\times}} = \vector{{\times} 0 0}

The following equation specifies the general reflector matrix [HIGHAM02].

(12.1)\mathbf{Hr} = \mathbf{I} - \frac{2}{\bm{u}^H\bm{u}}\bm{u}\bm{u}^H, \qquad
    \bm{u} \neq 0, \bm{u} \in \mathbb{C}^n

Since we will set \norm{\bm{u}} = 1, the calculation of \bf{Hr} from equation (12.1) is simplified.

(12.2)\mathbf{Hr} = \mathbf{I} - 2\,\bm{u}\,\bm{u}^H

Householder reflector matrices derive from vector projections ( Over-determined Systems and Vector Projections). We find a reflection of vector \bm{x} about the hyperplane \text{span}(\bm{u}^{\perp}) by multiplying \bm{x} by a reflector matrix \bf{Hr} as illustrated in figure Fig. 12.3. The equations used to determine \bf{Hr} come directly from the geometry depicted in the diagram. Vector \bm{z} is the projection of \bm{x} onto \bm{u}. The length of \bm{z} is the scalar dot product \norm{\bm{z}} = \bm{u}^H\bm{x}. Its vector equation is \bm{z} = \bm{u\,(u}^H \bm{x}). Similarly, the vector \bm{x} - \left(\bm{u}^H\bm{x}\right)\bm{u} is an alternate equation for the projection of \bm{x} onto the hyperplane. By symmetry, the reflection of \bm{x} about the hyperplane is \bm{x} - 2\,\left(\bm{u}^H\bm{x}\right)\bm{u}. We need to use an identity matrix to factor \bm{x} out because \bm{u\,u}^H is a matrix, as is \mathbf{Hr}.

(12.3)\begin{aligned}
        \mathbf{Hr}\,\bm{x} &= \bm{x} - 2\,\left(\bm{u}^H\bm{x}\right)\bm{u}
        = \bm{x} - 2\,\bm{u\, u}^H\bm{x} \\
        &= \left(\mathbf{I} - 2\,\bm{u\, u}^H\right)\bm{x}\\
        \mathbf{Hr} &= \mathbf{I} - 2\,\bm{u\, u}^H
    \end{aligned}

Equation (12.3) gives us a match to the simplified general reflection matrix in equation (12.2).
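
A small numeric illustration of equation (12.3), with an arbitrarily chosen unit vector \bm{u} (assumed only for this demonstration), shows the three defining behaviors of a reflector: it preserves length, it reverses \bm{u}, and it leaves vectors on the hyperplane unchanged.

    u = [1; 2; 2]/3;          % a unit vector, norm(u) == 1
    Hr = eye(3) - 2*(u*u');   % reflector from equation (12.3)
    x = [4; -1; 7];
    norm(Hr*x) - norm(x)      % zero: reflection preserves length
    Hr*u + u                  % zero vector: u reflects to -u
    p = [2; -1; 0];           % u'*p == 0, so p lies on the hyperplane
    Hr*p - p                  % zero vector: hyperplane vectors are fixed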

../_images/Householder_reflect.png

Fig. 12.3 Householder reflector matrix \bf{Hr} times \bm{x} is a reflection of \bm{x} about the hyperplane \text{span}(\bm{u}^{\perp}). We can think of the hyperplane as a mirror. The diagram shows a side view of the mirror. If the vector \bm{x} is viewed in the mirror, its reflection is the vector \mathbf{Hr}\,\bm{x}. The hyperplane represents the set of vectors perpendicular to the \bm{u} vector.

function H = householder(x, n, k)
% HOUSEHOLDER - Find Householder reflection matrix
%
%  x - Real or complex column vector
%  n - Size of Householder transformation matrix
%  k - Size of the leading identity matrix. If you are working
%      on column j, then set k = j-1.
%      H = [eye(k) zeros(k, n-k); zeros(n-k, k) Hr], where Hr is
%      the reflector. Size of Hr is (n-k)-by-(n-k).
%  H - Square n-by-n, Hermitian, unitary reflection matrix.
%  Note: If H is real, then it is not only unitary but because it
%        is symmetric, it is self-inverted (H == H', H*H == I).
%        This is not the case if H is complex (H*H' == I).
%
% see also: BIDIAG, HESSFORM, QRFACTOR, EIGQR, GIVENS

    % Modify the sign function so that sign(0) = 1.
    sig = @(u) sign(u) + (u==0);
    m = size(x, 1);
    % catch trivial cases
    if m < 2 || abs(norm(x(2:m))) < 1e-16
        H = eye(n);
        return;
    end
    if isreal(x(1))     % real case
        x(1) = x(1) + sig(x(1))*norm(x);
        u = x/norm(x);
        Hr = eye(m) - 2*(u*u'); % reflection
    else                % complex case
        expTheta = x(1)/norm(x(1));   % e^{i theta}
        expNegTheta = conj(expTheta); % e^{-i theta}
        x(1) = x(1) + expTheta*norm(x);
        u = x/norm(x);
        Hr = expNegTheta * (eye(m) - 2*(u*u'));
    end
    H = zeros(n);
    H(1:k,1:k) = eye(k);
    H(k+1:n, k+1:n) = Hr;
end    % end of function
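
For example, the following console session uses the function to zero the first column of a matrix (an illustrative 3{\times}3 example) below the (1,1) position. Since no leading identity is needed for the first column, k = 0. Note that the new (1,1) element is -\norm{\bm{x}}_2 = -5; the sign is -\text{sign}(x_1) because the algorithm adds, rather than subtracts, \norm{\bm{x}}_2 to avoid cancellation.

>> A = [4 1 2; 3 5 7; 0 2 1];
>> H = householder(A(:,1), 3, 0);
>> H*A
ans =
   -5.0000   -3.8000   -5.8000
         0    3.4000    4.4000
         0    2.0000    1.0000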

Note

Householder transformations work with complex-valued matrices and vectors as well as real ones.

In some applications, one might start with \bm{x} and either the hyperplane or \bm{u} and then determine the reflection \mathbf{Hr}\,\bm{x}. For our application, we start with \bm{x}, but we need to find the vector \bm{u} such that all elements of \mathbf{Hr}\,\bm{x}, except the first, are zero. That is, the reflected vector must lie along the first coordinate axis. [2]

\mathbf{Hr}\,\bm{x} = \mat{c, 0, \ldots, 0}^T = c\,\bm{e}_1,
\; \text{where}\: \bm{e}_1 = \mat{1, 0, \ldots, 0}^T

From the geometry, and since \norm{\bm{x}}_2 = \norm{\mathbf{Hr}\,\bm{x}}_2 =
\abs{c}, \bm{u} must be parallel to \bm{\tilde{u}} = \bm{x} \pm
\norm{\bm{x}}_2\,\bm{e}_1. So we can find \bm{u} from \bm{x} simply by adding \norm{\bm{x}}_2, with the same sign as x_1, to x_1 and then scaling \bm{\tilde{u}} to a unit vector. Matching the sign of x_1 avoids a loss of accuracy from subtractive cancellation.

\bm{\tilde{u}} = \vector{{x_1 + \text{sign}(x_1)\,\norm{x}_2}
        {x_2} {\vdots} {x_n}}, \qquad
    \bm{u} = \frac{\bm{\tilde{u}}}{\norm{\bm{\tilde{u}}}_2}

The Complex Case

If the first value of \bm{x} is complex, then an adjustment to the algorithm is needed [GOLUB13]. The objective is to fold the complex phase into the transformation so that after the transform is applied, the new vector holds only real numbers. In the equation for the vector \bm{u}, the sign of the first value of \bm{x} (x_1) is replaced with the orientation of x_1 in the complex plane (e^{i\theta}), which from Euler's complex exponential equation is just x_1 scaled to a unit-length complex number, e^{i\theta} = \cos \theta + i \sin \theta. [3] Then, to make the first element of the vector real when the transform is applied, we multiply the transform by e^{-i\theta}, the complex conjugate of that unit-length factor. See Euler's Complex Exponential Equation if you are not familiar with complex exponentials.
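
A short numeric check illustrates the complex adjustment (the vector is an illustrative assumption). Here \norm{\bm{x}}_2 = \sqrt{30}, and the first element of the transformed vector is real (output shown to MATLAB's default precision).

>> x = [3 + 4i; 2; 1i];
>> H = householder(x, 3, 0);
>> H*x
ans =
  -5.4772 + 0.0000i
   0.0000 + 0.0000i
   0.0000 + 0.0000i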

Each Householder reflection matrix combines an identity matrix, zeros, and a Householder reflector. It sets the reflector in the needed location to affect the desired column of \bf{A}. It uses an identity matrix to leave the previously completed work alone.

\mathbf{H}_1 = \mathbf{Hr}_1, \qquad
\mathbf{H}_2 = \left[ \begin{array}{c|c}
    1 & \scalemath{1.2}{\mathbf{0}} \\ \hline \\[-10pt]
    \scalemath{1.2}{\mathbf{0}} & \scalemath{1.1}{\mathbf{Hr}_2}
  \end{array} \right], \qquad
\mathbf{H}_3 = \left[ \begin{array}{c|c}
    \begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix} & \scalemath{1.5}{\mathbf{0}}
  \\ \hline \\[-10pt]
    \scalemath{1.5}{\mathbf{0}} & \scalemath{1.3}{\mathbf{Hr}_3}
  \end{array} \right]
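
In code, the k argument of the householder function builds these block matrices directly. A sketch for the second transformation of a 4{\times}4 matrix \bf{A} (assuming \mathbf{H}_1 has already been applied) follows.

    % Sketch: H2 leaves row and column 1 alone and zeroes A(3,2), A(4,2).
    % The x argument is the part of column 2 at and below the diagonal.
    H2 = householder(A(2:4, 2), 4, 1);
    A = H2*A;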

12.2.2.1. Verifying Unitary

We can verify the unitary property of the Householder reflector matrices with the required relationship \mathbf{Hr\,Hr}^H = \mathbf{I}. Note again that \bm{u}^H\bm{u} = 1, but \bm{u}\,\bm{u}^H is a matrix. Also, \left(\bm{u\, u}^H\right)^H = \left(\bm{u}^H\right)^H\bm{u}^H =
\bm{u\, u}^H.

\begin{aligned}
        \mathbf{Hr}\,\mathbf{Hr}^H &= \left(\mathbf{I} - 2\,\bm{u\,u}^H\right)\,
                            \left(\mathbf{I} - 2\,\bm{u\, u}^H\right)^H \\
    &=\left(\mathbf{I} - 2\,\bm{u\, u}^H\right)\,
                \left(\mathbf{I} - 2\,\bm{u\, u}^H\right) \\
        &= \mathbf{I} - 2\,\bm{u\,u}^H - 2\,\bm{u\, u}^H
                + 4\,\bm{u}\left(\bm{u}^H \bm{u}\right)\bm{u}^H \\
        &= \mathbf{I} + (-2 - 2 + 4)\,\bm{u\, u}^H = \mathbf{I}
    \end{aligned}

12.2.3. Numerical Stability

Orthogonal factoring algorithms are regarded as numerically accurate, or stable. Numerical computations can yield inaccurate results for two reasons. First, the finite data storage of computers limits the possible accuracy. Operations such as adding or subtracting numbers of vastly different magnitudes can cause round-off error, where the contribution of the smaller number is lost. We will not consider this situation in detail here since it is covered in Numerical Stability and is unlikely to be an issue for algorithms that use unitary transformation matrices. The second common cause of lost accuracy is perturbation of the data. The accuracy of a solution to a system of equations (\mathbf{A}\,\bm{x} = \bm{b}) can, in some cases, be highly dependent on the accuracy of the values in \bm{b}. If the values in \bm{b} come from an experiment, they may not be as accurate as desired. This is not a problem if the \bf{A} matrix is well-conditioned. However, if \bf{A} is ill-conditioned (also called poorly conditioned), then a small error in \bm{b} results in a significant error in the calculated solution, \bm{x}. The metric for measuring the condition of a matrix is called the condition number.

A matrix with a relatively small condition number is well-conditioned. The condition of \bf{A} is denoted as \kappa({\mathbf{A}}) [WATKINS10].

\kappa(\mathbf{A}) = \text{cond}(\mathbf{A}) =
            \norm{\mathbf{A}}\norm{\mathbf{A}^{-1}}

The condition number of a product of matrices usually grows with each multiplication.

\kappa(\m{A\,B})  \leq \kappa(\m{A})\,\kappa(\m{B}),
    \quad \text{and} \quad
    \kappa(\m{A}^T\m{A})  = \kappa(\m{A})^2

Thus, algorithms that perform operations such as \mathbf{A}^T\mathbf{A} can become poorly-conditioned. However, unitary matrices have a condition number of one. So, although orthogonal algorithms do matrix multiplications, they are considered numerically stable because the multiplication does not degrade the system’s condition. See Numerical Stability for more details about numerical stability, especially with regard to systems of equations.
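
The console sketch below illustrates both points with a Hilbert matrix, a standard ill-conditioned example (the quoted magnitudes are approximate).

>> A = hilb(6);           % notoriously ill-conditioned
>> cond(A)                % large, roughly 1.5e7
>> cond(A'*A)             % roughly cond(A)^2, about 2e14
>> [Q, R] = qr(randn(6));
>> cond(Q)                % 1 (to round-off): Q is orthogonal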

[1] It is not necessary to calculate the angle \theta. The two coordinates define the cosine and sine of \theta.
[2] The notation \bm{e}_i is common in the linear algebra literature to indicate a vector of all zeros except for a one in the i{\text{th}} position.
[3] It is not necessary to compute \theta; just express x_1 as a unit-length complex number by dividing it by its magnitude.