6.12. Under-determined Systems

As discussed in When Does a Solution Exist?, under-determined systems of linear equations have an infinite set of solutions rather than a unique solution. If additional equations cannot be found to make the system full rank, one might find an acceptable solution from the set of vectors that satisfy the system of equations. Under-determined systems are found in linear programming problems, especially in machine learning and artificial intelligence applications [CLINE76], [CHAPRA15].

6.12.1. RREF and Under-determined Systems

Using elimination, we can put a matrix into what is called reduced row echelon form (RREF), as described in Reduced Row Echelon Form, which reveals the set of solution vectors that satisfy the system of equations. In MATLAB, the rref function performs elimination with partial pivoting to put a matrix into RREF. It characterizes the full solution set but does not identify a single best solution. When we just need a working solution, MATLAB’s left-divide operator usually provides one that meets our needs.
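For instance, here is a quick sketch of a left-divide solution; the matrix and vector are arbitrary test data, not from the examples that follow.

>> A = [2 4 5; 3 8 9];   % two equations, three unknowns
>> b = [3; 5];
>> x = A\b;              % one workable solution from the infinite set
>> norm(A*x - b)         % expect a value near zero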

When the first \(m\) columns of an under-determined system are linearly independent, the augmented matrix in RREF takes the form of the following example, which shows an identity matrix in the first \(m\) columns.

>> A = randi(10, 3, 4) - randi(5, 3, 4);
>> rank(A)
ans =
    3
>> x = [4; 2; 5; 1]; %  The values that we hope to find
>> b = A*x;
>> C = rref([A b])
C =
    1.0000         0         0   -0.6091    3.3909
         0    1.0000         0    0.3864    2.3864
         0         0    1.0000    0.9136    5.9136

Notice that each of the first three columns has only a single one. But the fourth column, for \(x_4\), has values in each row. MATLAB found the fourth column to be a linear combination of the first three columns with the weights indicated. So the values of the first three variables, which correspond to the pivot columns, are fixed by the equations. We call the variables associated with the pivot columns basic variables. Any variables, such as \(x_4\), that are not associated with pivot columns are called free variables, meaning they can take any value. We don’t have an equation for \(x_4\) since it is a free variable, so we can only say that \(x_4 = x_4\). We can also replace it with an independent scalar variable, \(a\).
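As an aside, rref can also return the indices of the pivot columns, which identifies the basic variables directly. Continuing the session above:

>> [C, pivots] = rref([A b]);
>> pivots    % expect [1 2 3], the columns of the basic variables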

The new system of equations is:

(6.30)\[\begin{split}\begin{array}{rl} x_1 - 0.6091\,x_4 &= 3.3909 \\ x_2 + 0.3864\,x_4 &= 2.3864 \\ x_3 + 0.9136\,x_4 &= 5.9136 \\ x_4 &= x_4 \end{array}\end{split}\]

The solution set forms a line in four-dimensional space. We can see this if we write the equations in what is called parametric form.

(6.31)\[\begin{split}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 3.3909 \\ 2.3864 \\ 5.9136 \\ 0 \end{bmatrix} + a\,\begin{bmatrix} 0.6091 \\ -0.3864 \\ -0.9136 \\ 1 \end{bmatrix} = \bm{u} + a\,\bm{v}\end{split}\]
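Continuing the session, we can verify numerically that any value of the free variable \(a\) in equation (6.31) yields a solution to the original system.

>> u = [C(:,5); 0];       % the u vector in equation (6.31)
>> v = [-C(:,4); 1];      % the v vector in equation (6.31)
>> a = 7;                 % any scalar works for the free variable
>> norm(A*(u + a*v) - b)  % expect a value near zero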

6.12.1.1. Particular and General Solution

The part of the solution that we found by elimination with rref is called the particular solution. It is any vector \(\bm{u}\) that solves \(\mathbf{A}\,\bm{u} = \bm{b}\); it is the solution when \(a = 0\). The \(\bm{u}\) vector is the last column of the augmented matrix in RREF. We also found a general solution, which consists of the vectors \(\{\bm{v}, \bm{w}, \ldots\}\) that can be multiplied by scalars such that \(\mathbf{A}\,(a\,\bm{v} + b\,\bm{w} + \ldots) = \bm{0}\). As reflected in the example equation (6.30), the augmented equations in RREF have the equal signs between the columns from \(\bf{A}\) and \(\bm{b}\). Then, as shown in equation (6.31), when the equations are put into parametric form, the general solution vectors move to the right side of the equal signs with a sign change.

Note

Rows need to be added to the particular solution (\(\bm{u}\)) and the general solution (\(\bm{v}\)) vectors to match the number of columns in \(\bf{A}\).

>> A = randi(20, 3,  4) - 8;
>> b = randi(20, 3, 1) - 8;
>> z = rref([A b]);
>> u = [z(:,5); 0];
>> %  Verify that A*u = b
>> A*u - b
ans =
     0
     0
     0
>> v = [-z(:,4); 1];
>> %  Verify A*a*v = 0
>> A*v
ans =
     0
     0
     0
>> A*2*v
ans =
     0
     0
     0

Note that if \(\bm{x} = \bm{u} + a\,\bm{v}\), then

\[\mathbf{A}\,\bm{x} = \mathbf{A}\,(\bm{u} + a\,\bm{v}) = \bm{b} + \bm{0} = \bm{b}.\]

If the general solution has more than one vector, \(\{\bm{v}, \bm{w}, \ldots\}\), the same basic relationship holds.

\[\mathbf{A}\,\bm{x} = \mathbf{A}\,(\bm{u} + a\,\bm{v} + b\,\bm{w}) = \bm{b} + \bm{0} + \bm{0} = \bm{b}\]

Here is a quick example showing a two-column general solution. Remember that in parametric form, we change the sign of the general solution columns as they move from the left to the right side of the equal sign. The rows added to \(\bm{v}\) and \(\bm{w}\) are such that \(x_3 = a\) and \(x_4 = b\).

Note

Rows are added to the particular solution (\(\bm{u}\)) and to the general solution vectors to match the number of columns in \(\bf{A}\). The rows added to \(\bm{u}\) are zeros, while each general solution vector gets a 1 in the row corresponding to its free variable’s column and a 0 in the other added rows.

>> A
A =
    8    -6     4    -3
   10    10    -7     2
>> b
b =
    14
    58
>> z = rref([A b])
z =
   1.0000         0   -0.0143   -0.1286    3.4857
        0    1.0000   -0.6857    0.3286    2.3143
>> % general solutions
>> v = [-z(:,3); 1; 0];
>> w = [-z(:,4); 0; 1];
>> A*(v + w) % a = 1; b = 1;
ans =
    0
    0
>> % a = 1; b = 2;
>> A*(v + 2*w)
ans =
    0
    0
>> % particular solution
>> A*[z(:,5); 0; 0]
ans =
    14.0000
    58.0000

6.12.2. The Preferred Under-determined Solution

The best solution to an under-determined system of equations is usually the smallest \(\bm{x}\) that satisfies the equations. The length of \(\bm{x}\) is found with the \(l_2\)–norm, but the \(l_1\)–norm or other vector norms described in Vector Norms may be used.

One possibility is to find a scalar multiplier, \(a\), that minimizes \(\norm{\bm{u} + a\,\bm{v}}_2\). We could use an optimization algorithm, as described in Finding a Minimum, to minimize \(\norm{\bm{x}}_2\) with respect to the multipliers (\(a, b, \ldots\)). Fortunately, such a strategy is not necessary. Cline and Plemmons showed that the least squares solution that minimizes \(\norm{\bm{x}}_2\) is found with the Moore–Penrose pseudo-inverse of under-determined matrices [CLINE76].

\[\bm{x} = \mathbf{A}^+ \bm{b}\]
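As a quick check of this claim, here is a minimal sketch, using a small hypothetical system and assumed search bounds for fminbnd, that compares a one-dimensional minimization of \(\norm{\bm{u} + a\,\bm{v}}_2\) to the pseudo-inverse solution.

>> A = [2 4 5; 3 8 9]; b = [3; 5];  % hypothetical test system
>> z = rref([A b]);
>> u = [z(:,4); 0];                 % particular solution
>> v = [-z(:,3); 1];                % general solution vector
>> a_min = fminbnd(@(a) norm(u + a*v), -10, 10);
>> norm((u + a_min*v) - pinv(A)*b)  % expect a value near zero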

6.12.2.1. Under-determined Pseudo-inverse

Projection equations are not directly applicable to under-determined systems, so the pseudo-inverse of an under-determined matrix (\(\mathbf{A}_u^+\)) derives from the pseudo-inverse of an over-determined matrix (\(\mathbf{A}_o^+\)). From Projections Onto a Hyperplane:

\[\mathbf{A}_o^+ = \left(\mathbf{A}^T \mathbf{A}\right)^{-1}\mathbf{A}^T.\]

We have an over-determined matrix in \(\mathbf{A}^T\), and the transpose of the pseudo-inverse of \(\mathbf{A}^T\) is the pseudo-inverse of \(\bf{A}\).

(6.32)\[\begin{split}\begin{array}{rl} \mathbf{A}_u^+ &= \left(\left(\mathbf{A}^T\right)_o^+\right)^T = \left(\left(\mathbf{A\,A}^T\right)^{-1}\,\mathbf{A} \right)^T \\ \hfill \\ &= \mathbf{A}^T\left(\left(\mathbf{A\,A}^T\right)^{-1} \right)^T = \mathbf{A}^T\left(\left(\mathbf{A\,A}^T\right)^T \right)^{-1} \\ \hfill \\ &= \mathbf{A}^T\,(\mathbf{A\,A}^T)^{-1} \end{array}\end{split}\]
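Equation (6.32) also shows that \(\mathbf{A}_u^+\) is a right inverse of a full row rank \(\bf{A}\), since \(\mathbf{A}\,\mathbf{A}_u^+ = \mathbf{A\,A}^T\left(\mathbf{A\,A}^T\right)^{-1} = \mathbf{I}\). A quick check with an arbitrary matrix:

>> A = randi(10, 2, 4);      % arbitrary, almost surely full row rank
>> norm(A*pinv(A) - eye(2))  % expect a value near zero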

As with the over-determined case, the direct matrix equation can be slow and prone to inaccuracy for large and poorly conditioned matrices, so a method based on the SVD factorization is preferred. When \(\mathbf{A} = \mathbf{U\,\Sigma}\,\mathbf{V}^T\) is substituted into equation (6.32), several products simplify to identity matrices, just as in the over-determined case.

\[\mathbf{A}_u^+ = \mathbf{V\,\Sigma}^+\mathbf{U}^T\]

Here is an example where the pseudo-inverse of an under-determined matrix is calculated three ways: with the SVD, with the matrix equation, and with the pinv function.

>> A = [2 4 5; 3 8 9];
>> [U,S,V] = svd(A);
>> V*pinv(S)*U'
ans =
    1.4390   -0.7561
   -1.1707    0.6829
    0.5610   -0.2439
>> % The matrix equation, using the right-divide operator
>> A'/(A*A')
ans =
    1.4390   -0.7561
   -1.1707    0.6829
    0.5610   -0.2439
>> pinv(A)
ans =
    1.4390   -0.7561
   -1.1707    0.6829
    0.5610   -0.2439

MATLAB also has a function called lsqminnorm that finds the minimum \(l_2\)–norm solution to \(\mathbf{A}\,\bm{x} = \bm{b}\).

>> A = [-4 9 3 12; -6 1 -5 12];
>> b = [4; 9];
>> x = lsqminnorm(A, b)
x =
   -0.2802
   -0.1625
   -0.3925
    0.4599
>> norm(x)
ans =
    0.6859
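For a full rank matrix, lsqminnorm returns the same minimum \(l_2\)–norm solution as the pseudo-inverse, although it uses a different factorization internally. Continuing the example:

>> norm(pinv(A)*b - x)   % expect a value near zero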

6.12.2.2. QR and Under-determined Systems

The QR decomposition is also used to find the least squares solution to under-determined systems. The solution, however, is complicated because the \(\bf{R}\) matrix is not square. So we start with the economy QR decomposition of \(\mathbf{A}^T\) rather than of \(\bf{A}\). We denote the factors of \(\mathbf{A}^T\) as \(\mathbf{Q}_t\) and \(\mathbf{R}_t\). The orthogonality of \(\mathbf{Q}_t\) and the triangular structure of \(\mathbf{R}_t\) again yield an efficient solution algorithm.

(6.33)\[\begin{split} \begin{aligned} &\mathbf{A}^T = \mathbf{Q}_t\, \mathbf{R}_t \\ &\mathbf{A} = \left(\mathbf{A}^T\right)^T = \left(\mathbf{Q}_t\, \mathbf{R}_t\right)^T = \mathbf{R}_t^T\,\mathbf{Q}_t^T \\ &\mathbf{A}\,\bm{x} = \mathbf{R}_t^T\,\mathbf{Q}_t^T\,\bm{x} = \bm{b} \\ &\bm{x} = \mathbf{Q}_t\, \left(\mathbf{R}_t^T\right)^{-1}\,\bm{b} \end{aligned}\end{split}\]

The last line follows from the substitution \(\bm{x} = \mathbf{Q}_t\,\bm{y}\), which restricts \(\bm{x}\) to the column space of \(\mathbf{Q}_t\). Since \(\mathbf{Q}_t^T\,\mathbf{Q}_t = \mathbf{I}\), the system reduces to \(\mathbf{R}_t^T\,\bm{y} = \bm{b}\), which is solved quickly because \(\mathbf{R}_t^T\) is triangular.

>> A = [-4 9 3 12;
       -6 1 -5 12];
>> b = [4; 9];
>> [Qt, Rt] = qr(A', 0);
>> x = Qt * (Rt'\b)
x =
   -0.2802
   -0.1625
   -0.3925
    0.4599
>> norm(x)
ans =
    0.6859

In addition to minimizing the length of the \(\bm{x}\) vector, we sometimes also prefer sparse solutions (lots of zeros). Sparse solutions are especially desired in some machine learning applications. The smallest \(\bm{x}\) in terms of the \(l_1\)-norm is known to be a sparse solution. Numerical methods for finding the sparse \(l_1\)-norm based solution are discussed in CVX for Disciplined Convex Programming. Left-Divide of Under-determined Systems describes how the left-divide operator uses QR factoring to solve under-determined systems such that the solution contains at least \(n-r\) zeros.
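As a quick illustration, reusing the matrix from the lsqminnorm example: left-divide returns a basic solution with at least \(n-r\) zeros, at the cost of a larger norm than the minimum norm solution.

>> A = [-4 9 3 12; -6 1 -5 12]; b = [4; 9];
>> x_bs = A\b;           % basic solution, expect at least n-r = 2 zeros
>> nnz(x_bs)             % at most rank(A) = 2 nonzero elements
>> norm(x_bs) >= norm(lsqminnorm(A, b))   % expect logical 1 (true)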