6.11.3. Projections Onto a Hyperplane

We can extend projections to \mathbb{R}^3 and still visualize the projection as projecting a vector onto a plane. Here, the column space of matrix \bf{A} is spanned by two 3-dimensional vectors, \bm{a}_1 and \bm{a}_2.

\mathbf{A} = \left[ \begin{array}{cc}
    \vertbar & \vertbar \\
    \bm{a}_1 & \bm{a}_2 \\
    \vertbar & \vertbar
  \end{array} \right]

The span of two vectors in \mathbb{R}^3 forms a plane. As before, we have an approximate solution to \mathbf{A}\,\bm{x} = \bm{b}, which can be written as \mathbf{A}\,\bm{\hat{x}} = \bm{p}. The error is \bm{e} = \bm{b} - \bm{p} = \bm{b} - \mathbf{A}\,\bm{\hat{x}}. We want to find \bm{\hat{x}} such that \bm{b} - \mathbf{A}\,\bm{\hat{x}} is orthogonal to the plane, so again we set the dot products equal to zero.

\bm{a}_1^{T}(\bm{b} - \mathbf{A}\,\bm{\hat{x}}) = 0

\bm{a}_2^{T}(\bm{b} - \mathbf{A}\,\bm{\hat{x}}) = 0

\left[ \begin{array}{lll}
    \horzbar  & \bm{a}_1^{T} & \horzbar \\ {} \\
    \horzbar  & \bm{a}_2^{T} & \horzbar
  \end{array} \right]
  \left [ \begin{array}{c} \vertbar \\
        \bm{b} - \mathbf{A}\,\bm{\hat{x}} \\ \vertbar
          \end{array} \right]  =
\left[ \begin{array}{c} 0 \\ {} \\ 0 \end{array} \right]

The left matrix is just \mathbf{A}^T, which is size 2{\times}3. The size of \bm{\hat{x}} is 2{\times}1, so the size of \mathbf{A}\,\bm{\hat{x}}, like \bm{b}, is 3{\times}1.

\begin{aligned}
        &\mathbf{A}^T(\bm{b} - \mathbf{A}\,\bm{\hat{x}}) = 0 \\
        &\mathbf{A}^{T}\mathbf{A}\,\hat{\bm{x}} = \mathbf{A}^{T}\bm{b} \\
        &\boxed{\hat{\bm{x}} =
{\left(\mathbf{A}^{T}\mathbf{A}\right)}^{-1}\mathbf{A}^{T}\bm{b}}
    \end{aligned}

The orthogonal solution makes use of QR decomposition. Because \bf{A} is over-determined, the last m - n rows of \bf{R} will be all zeros, so we can reduce the size of \bf{Q} and \bf{R} with the economy QR option, where only the first n columns of \bf{Q} are returned (\bf{Q} is now m {\times} n) and \bf{R} is returned as an n {\times} n square matrix. After finding \bf{Q} and \bf{R}, computing the solution only requires multiplying the n {\times} m matrix \mathbf{Q}^T by an m {\times} 1 vector and then an n {\times} n back-substitution of an upper triangular matrix.

\begin{aligned}
        &\mathbf{A} = \mathbf{Q\,R} \\
        &\mathbf{A}\,\hat{\bm{x}} = \mathbf{Q\,R}\,\hat{\bm{x}} = \bm{b} \\
        &\mathbf{R}\,\hat{\bm{x}} = \mathbf{Q}^T\bm{b} \\
        &\hat{\bm{x}} = \mathbf{R}^{-1}\mathbf{Q}^T\bm{b}
    \end{aligned}

In MATLAB, we can find \hat{\bm{x}} as:

x_hat = (A'*A) \ (A'*b);   % normal equations

% or

[Q, R] = qr(A, 0);         % economy QR
x_hat = R \ (Q'*b);        % back-substitution with triangular R

Note

(A \ b) finds the same answer. The left-divide operator handles under-determined and over-determined systems as well as square matrix systems (Left-Divide Operator).
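
As a quick check that the normal equations, the economy QR solution, and the left-divide operator all agree, here is a minimal sketch using an assumed small random over-determined system:

% Compare the three least squares solutions on an
% arbitrary random over-determined system.
A = randi(10, 6, 2);
b = randi(10, 6, 1);
x1 = (A'*A) \ (A'*b);    % normal equations
[Q, R] = qr(A, 0);       % economy QR
x2 = R \ (Q'*b);
x3 = A \ b;              % left-divide operator
disp([x1 x2 x3])         % the three columns should match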

The projection vector is then [1]:

\bm{p} = \mathbf{A}\,\bm{\hat{x}} =
\mathbf{A}{\left(\mathbf{A}^{T}\mathbf{A}\right)}^{-1}
    \mathbf{A}^{T}\bm{b} =
    \mathbf{A\,R}^{-1}\mathbf{Q}^T\bm{b}.

The projection matrix is:

\mathbf{P} = \mathbf{A}{\left(\mathbf{A}^{T} \mathbf{A}\right)}^{-1}
    \mathbf{A}^{T} = \mathbf{A\,R}^{-1}\,\mathbf{Q}^T.
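
A projection matrix is symmetric (\mathbf{P}^T = \mathbf{P}) and idempotent (\mathbf{P}^2 = \mathbf{P}), since projecting a second time changes nothing. Here is a minimal numeric check, assuming the same 3{\times}2 column space used in the plotting example later in this section:

% Verify that P is symmetric and idempotent.
A = [1 -1; 1 2; 0 1];    % columns a1 and a2
P = A*((A'*A)\A');       % projection matrix
disp(norm(P - P'))       % near zero --> symmetric
disp(norm(P*P - P))      % near zero --> idempotent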

Note

\bf{A} is not a square matrix, so it is not invertible. If it were invertible, then \mathbf{P} = \bf{I} and there would be no need to do the projection. Try to verify why this is the case. Recall that (\mathbf{BA})^{-1} =
\mathbf{A}^{-1}\mathbf{B}^{-1}.
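
For reference, if \bf{A} were square and invertible, the inverse of the product would split apart and the factors would cancel:

\mathbf{P} = \mathbf{A}{\left(\mathbf{A}^{T}\mathbf{A}\right)}^{-1}\mathbf{A}^{T}
    = \mathbf{A}\,\mathbf{A}^{-1}{\left(\mathbf{A}^{T}\right)}^{-1}\mathbf{A}^{T}
    = \mathbf{I}.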

6.11.3.1. Over-determined Pseudo-inverse

The matrix \left(\mathbf{A}^{T}\mathbf{A}\right)^{-1}\mathbf{A}^{T} has a special name and symbol. It is called the Moore–Penrose pseudo-inverse of an over-determined matrix. It is used when the simpler \mathbf{A}^{-1} cannot be used. A superscript plus sign (\mathbf{A}^{+}) is used as a shorthand math symbol for the pseudo-inverse. MATLAB has a function called pinv(A) that returns the pseudo-inverse of \bf{A}.
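
As a quick check (using an assumed small random over-determined matrix), pinv returns the same result as the direct formula:

% pinv(A) should match the direct pseudo-inverse formula.
A = randi(10, 5, 2);
disp(norm(pinv(A) - (A'*A)\A'))   % should be near zero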

MATLAB uses the SVD to calculate the pseudo-inverse of over-determined systems and under-determined systems (The Preferred Under-determined Solution), which is more accurate and usually faster than the direct matrix calculation (Over-determined Pseudo-inverse).

Note

Unlike the direct matrix equations, the same equation in terms of the SVD sub-matrices finds the pseudo-inverse for both over-determined and under-determined matrices. Recall that since \bf{U} and \bf{V} are orthogonal, their inverses are just their transposes.

\mathbf{A}^+ = \left(\mathbf{U\,\Sigma\,V}^T \right)^+
= \mathbf{V\,\Sigma}^+\,\mathbf{U}^T

Since \bf{\Sigma} is a diagonal matrix, its pseudo-inverse is formed by taking the reciprocal of each nonzero element on the diagonal. The zero elements remain zero.

>> S = diag([2 3 0])
S =
    2     0     0
    0     3     0
    0     0     0
>> pinv(S)
ans =
    0.5000         0         0
         0    0.3333         0
         0         0         0

Here is an example of using the pseudo-inverse on an over-determined matrix equation.

>> A=randi(20,15,4)-8;
>> y=randi(10,4,1)-4
y =
     4
    -1
     2
     3
>> b = A*y;
>> [U, S, V] = svd(A);
>> pinvA=V*pinv(S)*U';
>> x = pinvA*b
x =
    4.0000
   -1.0000
    2.0000
    3.0000
>> x = pinv(A)*b
x =
    4.0000
   -1.0000
    2.0000
    3.0000

6.11.3.2. Projection Example

The script program in projectR3.m and the resulting plot in Fig. 6.26 demonstrate the calculation and display of the projection of a vector onto the column space of a matrix. The dot product is also calculated to verify orthogonality.


Fig. 6.26 The projection of a vector onto the column space of \bf{A}, which spans a plane in \mathbb{R}^3

Two vectors, \bm{a}_1 and \bm{a}_2, define a plane. To find the equation of the plane, we first determine a vector, \bm{n}, that is orthogonal to both vectors; the cross product provides this. The vector \bm{n} is defined as \bm{n} = [a\: b\: c]^T. Then, given one point on the plane (x_0, y_0, z_0), we can write the equation for the plane.

a(x - x_0) + b(y - y_0) + c(z - z_0) + d = 0

In this case, we know that the point (0, 0, 0) is on the plane, so with d = 0 the plane equation simplifies to ax + by + cz = 0.

The points of the plane are determined by first defining a region for x and y and then using the plane equation to calculate the corresponding points for z. A simple helper function called vector3 is used to plot the vectors. A plot of the projection is shown in Fig. 6.26.

% File: projectR3.m
% Projection of the vector B onto the column space of A.
% Note that with three sample points, we can visualize
% the projection in R^3.

%% Find the projection
% Define the column space of A, which forms a plane in R^3.
a1 = [1 1 0]';  % vector from (0, 0, 0) to (1, 1, 0).
a2 = [-1 2 1]';
A = [a1 a2];
b = [1 1 3]';
x_hat = (A'*A)\(A'*b);
p = A*x_hat;          % projection vector
e = b - p;            % error vector
fprintf('Error is orthogonal to the plane if close to zero: %f\n', ...
    p'*e);

%% Plot the projection
% To make the equation for a plane, we need n, which is the
% orthogonal normal vector to the column space, found from
% the cross product.
n = cross(a1,a2);

% Plane is: a(x-x0) +b(y-y0) + c(z-z0) + d = 0
% Variables a, b, and c come from the normal vector.
% Keep it simple and use point (0, 0, 0) as the point on
% the plane (x0, y0, z0).
a=n(1); bp=n(2); c=n(3); d=0;
[x, y] = meshgrid(-1:0.25:3);  % Generate x and  y data
z = -1/c*(a*x + bp*y + d);     % Generate z data

figure;
surf(x,y,z);               % Plot the plane
daspect([1 1 1]);          % consistent aspect ratio
hold on;
z0 = [0 0 0]';             % vector starting point
vector3(z0, a1, 'k');      % column 1 of A
vector3(z0, a2, 'k');      % column 2 of A
vector3(z0, b, 'g');       % target vector
vector3(z0, p, 'b');       % projection
vector3(p, b, 'r');        % error from p to b
hold off;
xlabel('X');
ylabel('Y');
zlabel('Z');
title('Projection of a vector onto a plane');

% helper function
function vector3( v1, v2, clr )
% VECTOR3 Plot a vector from v1 to v2 in R^3.
    plot3([v1(1) v2(1)], [v1(2) v2(2)], ...
      [v1(3) v2(3)], 'Color', clr,'LineWidth',2);
end

6.11.3.3. Alternate Projection Equation

Not all linear algebra textbooks present the same equations for projection of a vector onto a vector space. The approach that we have used might be described as projecting a vector onto the column space of a matrix. An alternate approach projects a vector onto an equivalent vector space that is spanned by an orthonormal set of basis vectors. So the first step in the alternate approach is to find the orthonormal basis vectors of the matrix columns. (See Vector Spaces for help with some of the terminology.)

Let \mathbf{U} = \{ \bm{u}_1, \bm{u}_2, \ldots , \bm{u}_n \} be an orthonormal basis for the column space of matrix \bf{A}. Then the projection of vector \bm{b} onto \text{Col}( \mathbf{A} ) is defined by

\text{proj}_\mathbf{A}\,\bm{b} = (\bm{b}^T \bm{u}_1)\,\bm{u}_1 +
(\bm{b}^T \bm{u}_2)\,\bm{u}_2
    + \cdots + (\bm{b}^T \bm{u}_n)\,\bm{u}_n = \mathbf{U\,U}^T \bm{b}.

There is a trade-off between the two algorithms. The alternate approach finds the orthonormal basis vectors of the columns of \bf{A} using either the Gram–Schmidt process (Finding Orthogonal Basis Vectors), QR factorization (QR Decomposition), or the economy SVD method (Projection and the Economy SVD), which is used by the orth function. In contrast, projecting onto the column space requires solving an \mathbf{A}\,\bm{x} = \bm{b} problem.
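
For a compact version of the alternate approach, MATLAB's orth function returns an orthonormal basis directly. This is a minimal sketch, reusing the same column space as the other examples in this section:

% Alternate projection using an orthonormal basis from orth.
A = [1 -1; 1 2; 0 1];    % columns a1 and a2
b = [1 1 3]';
U = orth(A);             % orthonormal basis for Col(A)
p = U*(U'*b);            % projection of b onto Col(A)
disp(p)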

The geometry of the alternate algorithm provides a simple way to see that it also projects a vector onto a vector space.

  • Any vector in the vector space must be a linear combination of the orthonormal basis vectors.
  • The \bm{b}^T \bm{u}_i coefficients in the sum are scalars from the dot products of \bm{b} and each basis vector. Since the basis vectors are unit length, each term in the sum is \left(\norm{\bm{b}}\,\cos(\theta_i)\right)\,\bm{u}_i, where \theta_i is the angle between \bm{b} and the basis vector; thus, each term in the sum is the projection of \bm{b} onto a basis vector.

The example in the file alt_project.m finds the projection using the two methods. The mod_gram_schmidt function from Implementation of Modified Gram–Schmidt is used to find the orthogonal basis vectors.

% File: alt_project.m
%% Comparison of two projection equations
% Define the column space of A, which forms a plane in R^3.
a1 = [1 1 0]';  % vector from (0, 0, 0) to (1, 1, 0).
a2 = [-1 2 1]';
A = [a1 a2];
b = [1 1 3]';
%% Projection onto the column space of matrix A.
x_hat = (A'*A)\(A'*b);
p = A*x_hat;          % projection vector
disp('Projection onto column space: ')
disp(p)

%% Alternate projection
% The projection is the vector sum of projections onto
% the orthonormal basis vectors of the column space of A.
[U, ~] = mod_gram_schmidt(A); % Could use the orth function
u1 = U(:,1);
u2 = U(:,2);
p_alt = b'*u1*u1 + b'*u2*u2;
disp('Projection onto basis vectors: ')
disp(p_alt)

The output from the vector projection example follows.

>> alt_project
Projection onto column space:
    0.1818
    1.8182
    0.5455
Projection onto basis vectors:
    0.1818
    1.8182
    0.5455

6.11.3.4. Higher Dimension Projection

If the matrix \bf{A} is larger than 3{\times}2, then we cannot visually plot the projection as we did above. However, the equations still hold. One of the nice things about linear algebra is that we can visually see what is happening when the vectors are in \mathbb{R}^2 or \mathbb{R}^3, but higher dimensions are fine too. The projection equations are used in least squares regression, where we want as many rows of data as possible to fit an equation to the data accurately.
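
To connect the projection equations to regression, here is a minimal sketch that fits a line y = c_1 + c_2\,x to noisy data; the sample points are made up for illustration:

% Least squares line fit using the projection equations.
x = (0:0.5:5)';                     % sample points
y = 2 + 3*x + 0.4*randn(size(x));   % noisy data near a line
A = [ones(size(x)) x];              % over-determined 11x2 matrix
c_hat = (A'*A) \ (A'*y);            % [intercept; slope]
disp(c_hat')                        % should be close to [2 3]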

Footnote:

[1] It is tempting to replace \bf{A} here with \mathbf{Q\,R} and cancel to get \mathbf{Q\,Q}^T\bm{b}, and then to assume that this is just \bm{b}. With the economy QR, however, \bf{Q} is not square, so \mathbf{Q\,Q}^T \ne \mathbf{I}; the product \mathbf{Q\,Q}^T\bm{b} is the projection \bm{p} itself. Remember that our projection equation is an estimate of \bm{b} when \bm{b} is not in the column space of \bf{A}.