6.12. Under-determined Systems¶
As discussed in When Does a Solution Exist?, under-determined systems of linear equations have an infinite set of solutions rather than a unique solution. If additional equations can not be found to make the system full rank, then one might find an acceptable solution from the set of vectors that satisfy the system of equations. Under-determined systems are found in linear programming problems, especially in the context of machine learning and artificial intelligence [CLINE76], [CHAPRA15].
6.12.1. RREF and Under-determined Systems¶
Using elimination, we can put a matrix into what is called reduced row
echelon form (RREF) as described in Reduced Row Echelon Form, which will reveal
the set of solution vectors where the system of equations is satisfied.
In MATLAB, the rref
function performs elimination with partial
pivoting to put a matrix into RREF. It provides a set of solutions but
does not tell us the single best solution. When we need a working
solution, MATLAB’s left-divide operator usually provides a solution that
will meet our needs.
When the first columns of an under-determined system are independent of each other then the augmented matrix in RREF takes the form of the following example that shows an identity matrix in the first columns.
>> A = randi(10, 3, 4) - randi(5, 3, 4);
>> rank(A)
ans =
3
>> x = [4; 2; 5; 1]; % The values that we hope to find
>> b = A*x;
>> C = rref([A b])
C =
1.0000 0 0 -0.6091 3.3909
0 1.0000 0 0.3864 2.3864
0 0 1.0000 0.9136 5.9136
Notice that each of the first three columns have only a single one. But the fourth column, for , has values in each row. MATLAB found the fourth column to be a linear combination of the first three columns by the weights indicated. So the values of the first three variables, which are pivot columns, are fixed by the equation. We call the variables associated with the pivot columns basic variables. Any variables, such as , that are not associated with pivot columns are called free variables, meaning that they can take any value. We don’t have an equation for since it is a free variable, so we can just say that . We can also replace it with an independent scalar variable, .
The new system of equations is:
(6.32)¶
The solution is a set of lines. We can see this if we write the line equations in what is called parametric form.
(6.33)¶
6.12.1.1. Particular and General Solution¶
Part of the solution that we found by elimination with rref
is
called the particular solution, which is any vector that solves
when . The
vector is the last column of the augmented matrix in RREF. We also found
a general solution, which are the vectors,
, that can be multiplied by scalars,
such that . As reflected in the example equation
(6.32), the augmented equations in RREF have
the equal signs between the columns from and
. Then, as shown in equation
(6.33), when the equations are put into
parametric form, the general solution vectors are moved to the right
side of the equal signs with a sign change.
Note
Rows need to be added to the particular solution () and the general solution vectors to match the number of columns in . Make the added rows to the general solution vectors a 1 in the row corresponding to the column number and a 0 in the other rows.
>> A = randi(20, 3, 4) - 8;
>> b = randi(20, 3, 1) - 8;
>> z = rref([A b]);
>> u = [z(:,5); 0];
% Verify that A*u = b
>> A*u - b
ans =
0
0
0
>> v = [-z(:,4); 1];
% Verify A*a*v = 0
>> A*v
ans =
0
0
0
>> A*2*v
ans =
0
0
0
Note that if , then
In addition to finding the general solution from the parametric form, the null space solution of also provides a general solution. If needed, we can find the specific value of to match the null solution. When there is only one general solution then finding is straight forward, .
>> a_null = null(A)./v
a_null =
0.6516
0.6516
0.6516
0.6516
>> a_0 = a_null(1);
If the general solution has more than one vector, , the same basic relationship holds.
However, the multiplier matching the general solution vectors to the null space solution is a matrix rather than a scalar.
Here is a quick example showing a two column general solution. Remember that in parametric form, we change the sign of the general solution columns as they are moved them from the left to right side of the equal sign. The added rows to and are such that and .
>> A
A =
8 -6 4 -3
10 10 -7 2
>> b
b =
14
58
>> z = rref([A b])
z =
1.0000 0 -0.0143 -0.1286 3.4857
0 1.0000 -0.6857 0.3286 2.3143
% general solutions
>> v = [-z(:,3); 1; 0];
>> w = [-z(:,4); 0; 1];
>> A*(v + w) % a = 1; b = 1;
ans =
0
0
>> A*(v + 2*w) % a = 1; b = 2;
ans =
0
0
% particular solution
>> A*[z(:,5); 0; 0]
ans =
14.0000
58.0000
6.12.2. The Preferred Under-determined Solution¶
The best solution to an under-determined system of equations is usually the smallest that satisfies the equations. The length of is found with the –norm, but the –norm or other vector norms described in Vector Norms may be used.
One possibility is to find a scalar multiplier, , that
minimizes . We could use an
optimization algorithm as described in Finding a Minimum to minimize
with respect to the multipliers
(). MATLAB’s lsqminnorm
combines the particular
and general solutions to find a solution with a smaller –norm
than what the left-divide operators finds using just the particular
solution. In this example, lsqminnorm
uses the same
and as the previous examples.
>> x = lsqminnorm(A, b)
x =
3.4774
1.5379
-1.1043
0.0582
>> norm(x)
ans =
3.9599
>> x = A\b
x =
3.4857
2.3143
0
0
>> norm(x)
ans =
4.1840
Another approach is to let and focus on the particular solution. Cline and Plemmons showed that the particular solution that minimizes is found with the Moore–Penrose pseudo-inverse of under-determined matrices [CLINE76]. Solutions that minimize the –norm of are also referred to as the least squares solution.
Under-determined Pseudo-inverse
The Moore–Penrose pseudo-inverse, denoted , fills the role of a matrix inverse for rectangular matrices. We can find a pseudo-inverse for an over-determined matrix () and an under-determined matrix (). We show in Projections Onto a Hyperplane that the pseudo-inverse of an over-determined matrix is:
We use this result to find the pseudo-inverse of an under-determined matrix. We have an over-determined matrix in . The transpose of the pseudo-inverse of is the under-determined pseudo-inverse of .
As with the over-determined case, the direct matrix equation can be
slow and prone to inaccuracy for large and poorly conditioned
matrices, so orthogonal techniques such as the SVD or QR
factorization algorithms are used instead. MATLAB’s pinv
function
uses the SVD as described in Over-determined Pseudo-inverse.
The left-divide operator uses QR factoring to find solutions to an under-determined systems of equations as described in Left-Divide of Under-determined Systems.
Here is an example of the pseudo-inverse of an under-determined matrix,
which is calculated with the SVD, the matrix equation, and the pinv
function.
>> A = [2 4 5; 3 8 9];
>> [U,S,V] = svd(A);
>> V*pinv(S)*U'
ans =
1.4390 -0.7561
-1.1707 0.6829
0.5610 -0.2439
% The right-divide operator
>> A'/(A*A')
ans =
1.4390 -0.7561
-1.1707 0.6829
0.5610 -0.2439
>> pinv(A)
ans =
1.4390 -0.7561
-1.1707 0.6829
0.5610 -0.2439
In addition to minimizing the length of the vector, we also have a preference for solutions that are sparse (lots of zeros). Sparse solutions are especially desired in some machine learning applications. The smallest in terms of the –norm is known to be a sparse solution. Numerical methods for finding the sparse –norm based solution are discussed in CVX for Disciplined Convex Programming.
As discussed in Left-Divide of Under-determined Systems, MATLAB’s left-divide operator finds the least squares particular solution using the QR factorization algorithm. Before using QR, dependent columns of are zeroed out, which yields more zeros in the solution. The left-divide operator zeros out columns of where is the rank of . The columns to zero out are found sequentially. The first column zeroed out is the column that when replaced with zeros results in the smallest –norm of the solution. The evaluation is repeated until columns are zeroed out [SYSEQDOC].
We conclude the discussion of under-determined systems by returning to our first under-determined example and finding solutions with MATLAB using the left-divide operator, RREF, and the pseudo-inverse with the last column zeroed out, which gives the same solution as the left-divide operator. The algorithm that the left-divide operator uses to find the solution with the QR factorization is listed in Left-Divide of Under-determined Systems.
>> A\b
ans =
3.3909
2.3864
5.9136
0
% Add extra rows of zeros to get the desired number of
% rows for the particular solution.
>> rref([[A b]; 0 0 0 0 0])
ans = % particular solution in last column
1.0000 0 0 -0.6091 3.3909
0 1.0000 0 0.3864 2.3864
0 0 1.0000 0.9136 5.9136
0 0 0 0 0
% Set the forth column of A to zero, which gives the
% same result with pinv that left-dived returns.
>> A(:,4) = [0 0 0]';
>> pinv(A)*b
ans =
3.3909
2.3864
5.9136
0