6.12. Under-determined Systems

As discussed in When Does a Solution Exist?, under-determined systems of linear equations have an infinite set of solutions rather than a unique solution. If additional equations can not be found to make the system full rank, then one might find an acceptable solution from the set of vectors that satisfy the system of equations. Under-determined systems are found in linear programming problems, especially in the context of machine learning and artificial intelligence [CLINE76], [CHAPRA15].

6.12.1. RREF and Under-determined Systems

Using elimination, we can put a matrix into what is called reduced row echelon form (RREF) as described in Reduced Row Echelon Form, which will reveal the set of solution vectors where the system of equations is satisfied. In MATLAB, the rref function performs elimination with partial pivoting to put a matrix into RREF. It provides a set of solutions but does not tell us the single best solution. When we need a working solution, MATLAB’s left-divide operator usually provides a solution that will meet our needs.

When the first m columns of an under-determined system are independent of each other then the augmented matrix in RREF takes the form of the following example that shows an identity matrix in the first m columns.

>> A = randi(10, 3, 4) - randi(5, 3, 4);
>> rank(A)
ans =
    3
>> x = [4; 2; 5; 1]; %  The values that we hope to find
>> b = A*x;
>> C = rref([A b])
C =
    1.0000         0         0   -0.6091    3.3909
         0    1.0000         0    0.3864    2.3864
         0         0    1.0000    0.9136    5.9136

Notice that each of the first three columns have only a single one. But the fourth column, for x_4, has values in each row. MATLAB found the fourth column to be a linear combination of the first three columns by the weights indicated. So the values of the first three variables, which are pivot columns, are fixed by the equation. We call the variables associated with the pivot columns basic variables. Any variables, such as x_4, that are not associated with pivot columns are called free variables, meaning that they can take any value. We don’t have an equation for x_4 since it is a free variable, so we can just say that x_4 = x_4. We can also replace it with an independent scalar variable, a.

The new system of equations is:

(6.32)\begin{array}{rl}
  x_1 - 0.6091\,x_4 &= 3.3909 \\
  x_2 + 0.3864\,x_4 &= 2.3864 \\
  x_3 + 0.9136\,x_4 &= 5.9136 \\
  x_4 &= x_4
\end{array}

The solution is a set of lines. We can see this if we write the line equations in what is called parametric form.

(6.33)\vector{x_1; x_2; x_3; x_4} =
\vector{3.3909; 2.3864; 5.9136; 0}
+ a\,\vector{0.6091; -0.3864; -0.9136; 1} = \bm{u} + a\,\bm{v}

6.12.1.1. Particular and General Solution

Part of the solution that we found by elimination with rref is called the particular solution, which is any vector that solves \mathbf{A}\,\bm{u} = \bm{b} when a = 0. The \bm{u} vector is the last column of the augmented matrix in RREF. We also found a general solution, which are the vectors, \{\bm{v}, \bm{w}, \ldots\}, that can be multiplied by scalars, such that \mathbf{A}\,(a\,\bm{v} + b\,\bm{w} +
\ldots) = \bm{0}. As reflected in the example equation (6.32), the augmented equations in RREF have the equal signs between the columns from \bf{A} and \bm{b}. Then, as shown in equation (6.33), when the equations are put into parametric form, the general solution vectors are moved to the right side of the equal signs with a sign change.

Note

Rows need to be added to the particular solution (\bm{u}) and the general solution vectors to match the number of columns in \bf{A}. Make the added rows to the general solution vectors a 1 in the row corresponding to the column number and a 0 in the other rows.

>> A = randi(20, 3,  4) - 8;
>> b = randi(20, 3, 1) - 8;
>> z = rref([A b]);
>> u = [z(:,5); 0];
%  Verify that A*u = b
>> A*u - b
ans =
     0
     0
     0
>> v = [-z(:,4); 1];
%  Verify A*a*v = 0
>> A*v
ans =
     0
     0
     0
>> A*2*v
ans =
     0
     0
     0

Note that if \bm{x} = \bm{u} + a\,\bm{v}, then

\mathbf{A}\,\bm{x} = \mathbf{A}\,(\bm{u} + a\,\bm{v})
= \bm{b} + \bm{0} = \bm{b}.

In addition to finding the general solution from the parametric form, the null space solution of \bf{A} also provides a general solution. If needed, we can find the specific value of a to match the null solution. When there is only one general solution then finding a_0 is straight forward, a_0\,\bm{v} = \text{Null}(\mathbf{A}).

>> a_null = null(A)./v
a_null =
    0.6516
    0.6516
    0.6516
    0.6516
>> a_0 = a_null(1);

If the general solution has more than one vector, \{\bm{v}, \bm{w}, \ldots\}, the same basic relationship holds.

\mathbf{A}\,\bm{x} = \mathbf{A}\,(\bm{u} + a\,\bm{v} + b\,\bm{w})
= \bm{b} + \bm{0} + \bm{0} = \bm{b}

However, the multiplier matching the general solution vectors to the null space solution is a matrix rather than a scalar.

\mat{\bm{v} \bm{w}}\, \mathbf{\mathbf{D}} = null(\mathbf{A})

Here is a quick example showing a two column general solution. Remember that in parametric form, we change the sign of the general solution columns as they are moved them from the left to right side of the equal sign. The added rows to \bm{v} and \bm{w} are such that x_3 = a and x_4 = b.

>> A
A =
    8    -6     4    -3
   10    10    -7     2
>> b
b =
    14
    58
>> z = rref([A b])
z =
   1.0000         0   -0.0143   -0.1286    3.4857
        0    1.0000   -0.6857    0.3286    2.3143
% general solutions
>> v = [-z(:,3); 1; 0];
>> w = [-z(:,4); 0; 1];
>> A*(v + w)   % a = 1; b = 1;
ans =
    0
    0
>> A*(v + 2*w) % a = 1; b = 2;
ans =
    0
    0
% particular solution
>> A*z(:,5)
ans =
    14.0000
    58.0000

6.12.2. The Preferred Under-determined Solution

The best solution to an under-determined system of equations is usually the smallest \bm{x} that satisfies the equations. The length of \bm{x} is found with the l_2–norm, but the l_1–norm or other vector norms described in Vector Norms may be used.

One possibility is to find a scalar multiplier, a, that minimizes \norm{\bm{x} = \bm{u} + a\,\bm{v}}_2. We could use an optimization algorithm as described in Finding a Minimum to minimize \norm{\bm{x}}_2 with respect to the multipliers (a, b, \ldots). MATLAB’s lsqminnorm combines the particular and general solutions to find a solution with a smaller l_2–norm than what the left-divide operators finds using just the particular solution. In this example, lsqminnorm uses the same \bf{A} and \bm{b} as the previous examples.

>> x = lsqminnorm(A, b)
x =
    3.4774
    1.5379
   -1.1043
    0.0582
>> norm(x)
ans =
    3.9599
>> x = A\b
x =
    3.4857
    2.3143
         0
         0
>> norm(x)
ans =
    4.1840

Another approach is to let a = 0 and focus on the particular solution. Cline and Plemmons showed that the particular solution that minimizes \norm{\bm{x} = \bm{u}}_2 is found with the Moore–Penrose pseudo-inverse of under-determined matrices [CLINE76]. Solutions that minimize the l_2–norm of \bm{x} are also referred to as the least squares solution.

\bm{x} = \mathbf{A}^+ \bm{b}

Under-determined Pseudo-inverse

The Moore–Penrose pseudo-inverse, denoted \mathbf{A}^+, fills the role of a matrix inverse for rectangular matrices. We can find a pseudo-inverse for an over-determined matrix (\mathbf{A}_o^+) and an under-determined matrix (\mathbf{A}_u^+). We show in Projections Onto a Hyperplane that the pseudo-inverse of an over-determined matrix is:

\mathbf{A}_o^+ = \left(\mathbf{A}^T \mathbf{A}\right)^{-1}\mathbf{A}^T.

We use this result to find the pseudo-inverse of an under-determined matrix. We have an over-determined matrix in \mathbf{A}^T. The transpose of the pseudo-inverse of \mathbf{A}^T is the under-determined pseudo-inverse of \bf{A}.

\begin{array}{rl}
\mathbf{A}_u^+ &= \left(\left(\mathbf{A}^T\right)_o^+\right)^T
= \left(\left(\mathbf{A\,A}^T\right)^{-1}\,\mathbf{A} \right)^T
    \\ \hfill \\
&= \mathbf{A}^T\left(\left(\mathbf{A\,A}^T\right)^{-1} \right)^T
 = \mathbf{A}^T\left(\left(\mathbf{A\,A}^T\right)^T \right)^{-1}
    \\ \hfill \\
&= \mathbf{A}^T\,(\mathbf{A\,A}^T)^{-1}
  \end{array}

As with the over-determined case, the direct matrix equation can be slow and prone to inaccuracy for large and poorly conditioned matrices, so orthogonal techniques such as the SVD or QR factorization algorithms are used instead. MATLAB’s pinv function uses the SVD as described in Over-determined Pseudo-inverse.

\mathbf{A}_u^+ = \mathbf{V\,\Sigma}^+\mathbf{U}^T

The left-divide operator uses QR factoring to find solutions to an under-determined systems of equations as described in Left-Divide of Under-determined Systems.

Here is an example of the pseudo-inverse of an under-determined matrix, which is calculated with the SVD, the matrix equation, and the pinv function.

>> A = [2 4 5; 3 8 9];
>> [U,S,V] = svd(A);
>> V*pinv(S)*U'
ans =
    1.4390   -0.7561
   -1.1707    0.6829
    0.5610   -0.2439
% The right-divide operator
>> A'/(A*A')
ans =
    1.4390   -0.7561
   -1.1707    0.6829
    0.5610   -0.2439
>> pinv(A)
ans =
    1.4390   -0.7561
   -1.1707    0.6829
    0.5610   -0.2439

In addition to minimizing the length of the \bm{x} vector, we also have a preference for solutions that are sparse (lots of zeros). Sparse solutions are especially desired in some machine learning applications. The smallest \bm{x} in terms of the l_1 –norm is known to be a sparse solution. Numerical methods for finding the sparse l_1 –norm based solution are discussed in CVX for Disciplined Convex Programming.

As discussed in Left-Divide of Under-determined Systems, MATLAB’s left-divide operator finds the least squares particular solution using the QR factorization algorithm. Before using QR, dependent columns of \bf{A} are zeroed out, which yields more zeros in the solution. The left-divide operator zeros out n - r columns of \bf{A} where r is the rank of \bf{A}. The columns to zero out are found sequentially. The first column zeroed out is the column that when replaced with zeros results in the smallest l_2–norm of the solution. The evaluation is repeated until n - r columns are zeroed out [SYSEQDOC].

We conclude the discussion of under-determined systems by returning to our first under-determined example and finding solutions with MATLAB using the left-divide operator, RREF, and the pseudo-inverse with the last column zeroed out, which gives the same solution as the left-divide operator. The algorithm that the left-divide operator uses to find the solution with the QR factorization is listed in Left-Divide of Under-determined Systems.

>> A\b
ans =
    3.3909
    2.3864
    5.9136
         0

% Add extra rows of zeros to get the desired number of
% rows for the particular solution.
>> rref([[A b]; 0 0 0 0 0])
ans =         % particular solution in last column
    1.0000         0         0   -0.6091    3.3909
         0    1.0000         0    0.3864    2.3864
         0         0    1.0000    0.9136    5.9136
         0         0         0         0         0

% Set the forth column of A to zero, which gives the
% same result with pinv that left-dived returns.
>> A(:,4) = [0 0 0]';
>> pinv(A)*b
ans =
    3.3909
    2.3864
    5.9136
         0