6.12. Under-determined Systems¶
As discussed in When Does a Solution Exist?, under-determined systems of linear equations have an infinite set of solutions rather than a unique solution. If additional equations can not be found to make the system full rank, then one might find an acceptable solution from the set of vectors that satisfy the system of equations. Under-determined systems are found in linear programming problems, especially in the context of machine learning and artificial intelligence [CLINE76], [CHAPRA15].
6.12.1. RREF and Under-determined Systems¶
Using elimination, we can put a matrix into what is called reduced row
echelon form (RREF) as described in Reduced Row Echelon Form, which will reveal
the set of solution vectors where the system of equations is satisfied.
In MATLAB, the rref
function performs elimination with partial
pivoting to put a matrix into RREF. It provides a set of solutions but
does not tell us the single best solution. When we need a working
solution, MATLAB’s left-divide operator usually provides a solution that
will meet our needs.
When the first columns of an under-determined system are
independent of each other then the augmented matrix in RREF takes the
form of the following example that shows an identity matrix in the first
columns.
>> A = randi(10, 3, 4) - randi(5, 3, 4);
>> rank(A)
ans =
3
>> x = [4; 2; 5; 1]; % The values that we hope to find
>> b = A*x;
>> C = rref([A b])
C =
1.0000 0 0 -0.6091 3.3909
0 1.0000 0 0.3864 2.3864
0 0 1.0000 0.9136 5.9136
Notice that each of the first three columns have only a single one. But
the fourth column, for , has values in each row. MATLAB found
the fourth column to be a linear combination of the first three columns
by the weights indicated. So the values of the first three variables,
which are pivot columns, are fixed by the equation. We call the
variables associated with the pivot columns basic variables. Any
variables, such as
, that are not associated with pivot
columns are called free variables, meaning that they can take any
value. We don’t have an equation for
since it is a free
variable, so we can just say that
. We can also replace
it with an independent scalar variable,
.
The new system of equations is:
(6.32)¶
The solution is a set of lines. We can see this if we write the line equations in what is called parametric form.
(6.33)¶
6.12.1.1. Particular and General Solution¶
Part of the solution that we found by elimination with rref
is
called the particular solution, which is any vector that solves
when
. The
vector is the last column of the augmented matrix in RREF. We also found
a general solution, which are the vectors,
, that can be multiplied by scalars,
such that
. As reflected in the example equation
(6.32), the augmented equations in RREF have
the equal signs between the columns from
and
. Then, as shown in equation
(6.33), when the equations are put into
parametric form, the general solution vectors are moved to the right
side of the equal signs with a sign change.
Note
Rows need to be added to the particular solution () and
the general solution vectors to match the number of columns in
. Make the added rows to the general solution vectors a
1 in the row corresponding to the column number and a 0 in the other
rows.
>> A = randi(20, 3, 4) - 8;
>> b = randi(20, 3, 1) - 8;
>> z = rref([A b]);
>> u = [z(:,5); 0];
% Verify that A*u = b
>> A*u - b
ans =
0
0
0
>> v = [-z(:,4); 1];
% Verify A*a*v = 0
>> A*v
ans =
0
0
0
>> A*2*v
ans =
0
0
0
Note that if , then
In addition to finding the general solution from the parametric form,
the null space solution of also provides a general
solution. If needed, we can find the specific value of
to
match the null solution. When there is only one general solution then
finding
is straight forward,
.
>> a_null = null(A)./v
a_null =
0.6516
0.6516
0.6516
0.6516
>> a_0 = a_null(1);
If the general solution has more than one vector,
, the same basic relationship holds.
However, the multiplier matching the general solution vectors to the null space solution is a matrix rather than a scalar.
Here is a quick example showing a two column general solution. Remember
that in parametric form, we change the sign of the general solution
columns as they are moved them from the left to right side of the equal
sign. The added rows to and
are such that
and
.
>> A
A =
8 -6 4 -3
10 10 -7 2
>> b
b =
14
58
>> z = rref([A b])
z =
1.0000 0 -0.0143 -0.1286 3.4857
0 1.0000 -0.6857 0.3286 2.3143
% general solutions
>> v = [-z(:,3); 1; 0];
>> w = [-z(:,4); 0; 1];
>> A*(v + w) % a = 1; b = 1;
ans =
0
0
>> A*(v + 2*w) % a = 1; b = 2;
ans =
0
0
% particular solution
>> A*[z(:,5); 0; 0]
ans =
14.0000
58.0000
6.12.2. The Preferred Under-determined Solution¶
The best solution to an under-determined system of equations is usually
the smallest that satisfies the equations. The length of
is found with the
–norm, but the
–norm or other vector norms described in Vector Norms
may be used.
One possibility is to find a scalar multiplier, , that
minimizes
. We could use an
optimization algorithm as described in Finding a Minimum to minimize
with respect to the multipliers
(
). MATLAB’s
lsqminnorm
combines the particular
and general solutions to find a solution with a smaller –norm
than what the left-divide operators finds using just the particular
solution. In this example,
lsqminnorm
uses the same
and
as the previous examples.
>> x = lsqminnorm(A, b)
x =
3.4774
1.5379
-1.1043
0.0582
>> norm(x)
ans =
3.9599
>> x = A\b
x =
3.4857
2.3143
0
0
>> norm(x)
ans =
4.1840
Another approach is to let and focus on the particular
solution. Cline and Plemmons showed that the particular solution that
minimizes
is found with the
Moore–Penrose pseudo-inverse of under-determined
matrices [CLINE76]. Solutions that minimize the
–norm of
are also referred to as the least
squares solution.
Under-determined Pseudo-inverse
The Moore–Penrose pseudo-inverse, denoted , fills
the role of a matrix inverse for rectangular matrices. We can find a
pseudo-inverse for an over-determined matrix (
)
and an under-determined matrix (
). We show in
Projections Onto a Hyperplane that the pseudo-inverse of an over-determined
matrix is:
We use this result to find the pseudo-inverse of an under-determined
matrix. We have an over-determined matrix in .
The transpose of the pseudo-inverse of
is the
under-determined pseudo-inverse of
.
As with the over-determined case, the direct matrix equation can be
slow and prone to inaccuracy for large and poorly conditioned
matrices, so orthogonal techniques such as the SVD or QR
factorization algorithms are used instead. MATLAB’s pinv
function
uses the SVD as described in Over-determined Pseudo-inverse.
The left-divide operator uses QR factoring to find solutions to an under-determined systems of equations as described in Left-Divide of Under-determined Systems.
Here is an example of the pseudo-inverse of an under-determined matrix,
which is calculated with the SVD, the matrix equation, and the pinv
function.
>> A = [2 4 5; 3 8 9];
>> [U,S,V] = svd(A);
>> V*pinv(S)*U'
ans =
1.4390 -0.7561
-1.1707 0.6829
0.5610 -0.2439
% The right-divide operator
>> A'/(A*A')
ans =
1.4390 -0.7561
-1.1707 0.6829
0.5610 -0.2439
>> pinv(A)
ans =
1.4390 -0.7561
-1.1707 0.6829
0.5610 -0.2439
In addition to minimizing the length of the vector, we
also have a preference for solutions that are sparse (lots of zeros).
Sparse solutions are especially desired in some machine learning
applications. The smallest
in terms of the
–norm is known to be a sparse solution. Numerical methods for
finding the sparse
–norm based solution are discussed in
CVX for Disciplined Convex Programming.
As discussed in Left-Divide of Under-determined Systems, MATLAB’s left-divide operator finds
the least squares particular solution using the QR factorization
algorithm. Before using QR, dependent columns of are
zeroed out, which yields more zeros in the solution. The left-divide
operator zeros out
columns of
where
is the rank of
. The columns to zero out are
found sequentially. The first column zeroed out is the column that when
replaced with zeros results in the smallest
–norm of the
solution. The evaluation is repeated until
columns are
zeroed out [SYSEQDOC].
We conclude the discussion of under-determined systems by returning to our first under-determined example and finding solutions with MATLAB using the left-divide operator, RREF, and the pseudo-inverse with the last column zeroed out, which gives the same solution as the left-divide operator. The algorithm that the left-divide operator uses to find the solution with the QR factorization is listed in Left-Divide of Under-determined Systems.
>> A\b
ans =
3.3909
2.3864
5.9136
0
% Add extra rows of zeros to get the desired number of
% rows for the particular solution.
>> rref([[A b]; 0 0 0 0 0])
ans = % particular solution in last column
1.0000 0 0 -0.6091 3.3909
0 1.0000 0 0.3864 2.3864
0 0 1.0000 0.9136 5.9136
0 0 0 0 0
% Set the forth column of A to zero, which gives the
% same result with pinv that left-dived returns.
>> A(:,4) = [0 0 0]';
>> pinv(A)*b
ans =
3.3909
2.3864
5.9136
0