.. _underdetermined: Under-determined Systems ======================== .. index:: under-determined systems As discussed in :ref:`when`, under-determined systems of linear equations have an infinite set of solutions rather than a unique solution. If additional equations can not be found to make the system full rank, then one might find an acceptable solution from the set of vectors that satisfy the system of equations. Under-determined systems are found in linear programming problems, especially in the context of machine learning and artificial intelligence [CLINE76]_, [CHAPRA15]_. .. _parametricForm: RREF and Under-determined Systems --------------------------------- .. index:: rref .. index:: reduced row echelon form Using elimination, we can put a matrix into what is called *reduced row echelon form* (RREF) as described in :ref:`rref`, which will reveal the set of solution vectors where the system of equations is satisfied. In MATLAB, the ``rref`` function performs elimination with partial pivoting to put a matrix into RREF. It provides a set of solutions but does not tell us the single *best* solution. When we need a working solution, MATLAB’s left-divide operator usually provides a solution that will meet our needs. When the first :math:`m` columns of an under-determined system are independent of each other then the augmented matrix in RREF takes the form of the following example that shows an identity matrix in the first :math:`m` columns. :: >> A = randi(10, 3, 4) - randi(5, 3, 4); >> rank(A) ans = 3 >> x = [4; 2; 5; 1]; % The values that we hope to find >> b = A*x; >> C = rref([A b]) C = 1.0000 0 0 -0.6091 3.3909 0 1.0000 0 0.3864 2.3864 0 0 1.0000 0.9136 5.9136 Notice that each of the first three columns have only a single one. But the fourth column, for :math:`x_4`, has values in each row. MATLAB found the fourth column to be a linear combination of the first three columns by the weights indicated. So the values of the first three variables, which are pivot columns, are fixed by the equation. We call the variables associated with the pivot columns *basic variables*. Any variables, such as :math:`x_4`, that are not associated with pivot columns are called *free variables*, meaning that they can take any value. We don’t have an equation for :math:`x_4` since it is a free variable, so we can just say that :math:`x_4 = x_4`. We can also replace it with an independent scalar variable, :math:`a`. The new system of equations is: .. math:: :label: eq-underdet-rref-out \begin{array}{rl} x_1 - 0.6091\,x_4 &= 3.3909 \\ x_2 + 0.3864\,x_4 &= 2.3864 \\ x_3 + 0.9136\,x_4 &= 5.9136 \\ x_4 &= x_4 \end{array} The solution is a set of lines. We can see this if we write the line equations in what is called *parametric form*. .. index:: parametric form .. math:: :label: eq-underdet-paramtric \vector{x_1; x_2; x_3; x_4} = \vector{3.3909; 2.3864; 5.9136; 0} + a\,\vector{0.6091; -0.3864; -0.9136; 1} = \bm{u} + a\,\bm{v} Particular and General Solution ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Part of the solution that we found by elimination with ``rref`` is called the *particular* solution, which is any vector that solves :math:`\mathbf{A}\,\bm{u} = \bm{b}` when :math:`a = 0`. The :math:`\bm{u}` vector is the last column of the augmented matrix in RREF. We also found a *general* solution, which are the vectors, :math:`\{\bm{v}, \bm{w}, \ldots\}`, that can be multiplied by scalars, such that :math:`\mathbf{A}\,(a\,\bm{v} + b\,\bm{w} + \ldots) = \bm{0}`. As reflected in the example equation :eq:`eq-underdet-rref-out`, the augmented equations in RREF have the equal signs between the columns from :math:`\bf{A}` and :math:`\bm{b}`. Then, as shown in equation :eq:`eq-underdet-paramtric`, when the equations are put into parametric form, the general solution vectors are moved to the right side of the equal signs with a sign change. .. note:: Rows need to be added to the particular solution (:math:`\bm{u}`) and the general solution vectors to match the number of columns in :math:`\bf{A}`. Make the added rows to the general solution vectors a 1 in the row corresponding to the column number and a 0 in the other rows. :: >> A = randi(20, 3, 4) - 8; >> b = randi(20, 3, 1) - 8; >> z = rref([A b]); >> u = [z(:,5); 0]; % Verify that A*u = b >> A*u - b ans = 0 0 0 >> v = [-z(:,4); 1]; % Verify A*a*v = 0 >> A*v ans = 0 0 0 >> A*2*v ans = 0 0 0 Note that if :math:`\bm{x} = \bm{u} + a\,\bm{v}`, then .. math:: \mathbf{A}\,\bm{x} = \mathbf{A}\,(\bm{u} + a\,\bm{v}) = \bm{b} + \bm{0} = \bm{b}. In addition to finding the general solution from the parametric form, the null space solution of :math:`\bf{A}` also provides a general solution. If needed, we can find the specific value of :math:`a` to match the null solution. When there is only one general solution then finding :math:`a_0` is straight forward, :math:`a_0\,\bm{v} = \text{Null}(\mathbf{A})`. :: >> a_null = null(A)./v a_null = 0.6516 0.6516 0.6516 0.6516 >> a_0 = a_null(1); .. index:: rref .. index:: null If the general solution has more than one vector, :math:`\{\bm{v}, \bm{w}, \ldots\}`, the same basic relationship holds. .. math:: \mathbf{A}\,\bm{x} = \mathbf{A}\,(\bm{u} + a\,\bm{v} + b\,\bm{w}) = \bm{b} + \bm{0} + \bm{0} = \bm{b} However, the multiplier matching the general solution vectors to the null space solution is a matrix rather than a scalar. .. math:: \mat{\bm{v} \bm{w}}\, \mathbf{\mathbf{D}} = null(\mathbf{A}) Here is a quick example showing a two column general solution. Remember that in parametric form, we change the sign of the general solution columns as they are moved them from the left to right side of the equal sign. The added rows to :math:`\bm{v}` and :math:`\bm{w}` are such that :math:`x_3 = a` and :math:`x_4 = b`. :: >> A A = 8 -6 4 -3 10 10 -7 2 >> b b = 14 58 >> z = rref([A b]) z = 1.0000 0 -0.0143 -0.1286 3.4857 0 1.0000 -0.6857 0.3286 2.3143 % general solutions >> v = [-z(:,3); 1; 0]; >> w = [-z(:,4); 0; 1]; >> A*(v + w) % a = 1; b = 1; ans = 0 0 >> A*(v + 2*w) % a = 1; b = 2; ans = 0 0 % particular solution >> A*[z(:,5); 0; 0] ans = 14.0000 58.0000 .. _prefunder: The Preferred Under-determined Solution --------------------------------------- The best solution to an under-determined system of equations is usually the smallest :math:`\bm{x}` that satisfies the equations. The length of :math:`\bm{x}` is found with the :math:`l_2`–norm, but the :math:`l_1`–norm or other vector norms described in :ref:`vecNorms` may be used. One possibility is to find a scalar multiplier, :math:`a`, that minimizes :math:`\norm{\bm{x} = \bm{u} + a\,\bm{v}}_2`. We could use an optimization algorithm as described in :ref:`optMin` to minimize :math:`\norm{\bm{x}}_2` with respect to the multipliers (:math:`a, b, \ldots`). MATLAB’s ``lsqminnorm`` combines the particular and general solutions to find a solution with a smaller :math:`l_2`–norm than what the left-divide operators finds using just the particular solution. In this example, ``lsqminnorm`` uses the same :math:`\bf{A}` and :math:`\bm{b}` as the previous examples. .. index:: lsqminnorm :: >> x = lsqminnorm(A, b) x = 3.4774 1.5379 -1.1043 0.0582 >> norm(x) ans = 3.9599 >> x = A\b x = 3.4857 2.3143 0 0 >> norm(x) ans = 4.1840 Another approach is to let :math:`a = 0` and focus on the particular solution. Cline and Plemmons showed that the particular solution that minimizes :math:`\norm{\bm{x} = \bm{u}}_2` is found with the Moore–Penrose pseudo-inverse of under-determined matrices [CLINE76]_. Solutions that minimize the :math:`l_2`–norm of :math:`\bm{x}` are also referred to as the *least squares* solution. .. math:: \bm{x} = \mathbf{A}^+ \bm{b} .. topic:: Under-determined Pseudo-inverse .. index:: pseudo-inverse .. index:: Moore-Penrose pseudo-inverse The Moore–Penrose pseudo-inverse, denoted :math:`\mathbf{A}^+`, fills the role of a matrix inverse for rectangular matrices. We can find a pseudo-inverse for an over-determined matrix (:math:`\mathbf{A}_o^+`) and an under-determined matrix (:math:`\mathbf{A}_u^+`). We show in :ref:`projectR3` that the pseudo-inverse of an over-determined matrix is: .. math:: \mathbf{A}_o^+ = \left(\mathbf{A}^T \mathbf{A}\right)^{-1}\mathbf{A}^T. We use this result to find the pseudo-inverse of an under-determined matrix. We have an over-determined matrix in :math:`\mathbf{A}^T`. The transpose of the pseudo-inverse of :math:`\mathbf{A}^T` is the under-determined pseudo-inverse of :math:`\bf{A}`. .. math:: \begin{array}{rl} \mathbf{A}_u^+ &= \left(\left(\mathbf{A}^T\right)_o^+\right)^T = \left(\left(\mathbf{A\,A}^T\right)^{-1}\,\mathbf{A} \right)^T \\ \hfill \\ &= \mathbf{A}^T\left(\left(\mathbf{A\,A}^T\right)^{-1} \right)^T = \mathbf{A}^T\left(\left(\mathbf{A\,A}^T\right)^T \right)^{-1} \\ \hfill \\ &= \mathbf{A}^T\,(\mathbf{A\,A}^T)^{-1} \end{array} As with the over-determined case, the direct matrix equation can be slow and prone to inaccuracy for large and poorly conditioned matrices, so orthogonal techniques such as the SVD or QR factorization algorithms are used instead. MATLAB’s ``pinv`` function uses the SVD as described in :ref:`SVDpinv`. .. math:: \mathbf{A}_u^+ = \mathbf{V\,\Sigma}^+\mathbf{U}^T The left-divide operator uses QR factoring to find solutions to an under-determined systems of equations as described in :ref:`left-under`. .. index:: pinv Here is an example of the pseudo-inverse of an under-determined matrix, which is calculated with the SVD, the matrix equation, and the ``pinv`` function. :: >> A = [2 4 5; 3 8 9]; >> [U,S,V] = svd(A); >> V*pinv(S)*U' ans = 1.4390 -0.7561 -1.1707 0.6829 0.5610 -0.2439 % The right-divide operator >> A'/(A*A') ans = 1.4390 -0.7561 -1.1707 0.6829 0.5610 -0.2439 >> pinv(A) ans = 1.4390 -0.7561 -1.1707 0.6829 0.5610 -0.2439 In addition to minimizing the length of the :math:`\bm{x}` vector, we also have a preference for solutions that are sparse (lots of zeros). Sparse solutions are especially desired in some machine learning applications. The smallest :math:`\bm{x}` in terms of the :math:`l_1` –norm is known to be a sparse solution. Numerical methods for finding the sparse :math:`l_1` –norm based solution are discussed in :ref:`cvxOpt`. As discussed in :ref:`left-under`, MATLAB’s left-divide operator finds the least squares particular solution using the QR factorization algorithm. Before using QR, dependent columns of :math:`\bf{A}` are zeroed out, which yields more zeros in the solution. The left-divide operator zeros out :math:`n - r` columns of :math:`\bf{A}` where :math:`r` is the rank of :math:`\bf{A}`. The columns to zero out are found sequentially. The first column zeroed out is the column that when replaced with zeros results in the smallest :math:`l_2`–norm of the solution. The evaluation is repeated until :math:`n - r` columns are zeroed out [SYSEQDOC]_. We conclude the discussion of under-determined systems by returning to our first under-determined example and finding solutions with MATLAB using the left-divide operator, RREF, and the pseudo-inverse with the last column zeroed out, which gives the same solution as the left-divide operator. The algorithm that the left-divide operator uses to find the solution with the QR factorization is listed in :ref:`left-under`. :: >> A\b ans = 3.3909 2.3864 5.9136 0 % Add extra rows of zeros to get the desired number of % rows for the particular solution. >> rref([[A b]; 0 0 0 0 0]) ans = % particular solution in last column 1.0000 0 0 -0.6091 3.3909 0 1.0000 0 0.3864 2.3864 0 0 1.0000 0.9136 5.9136 0 0 0 0 0 % Set the forth column of A to zero, which gives the % same result with pinv that left-dived returns. >> A(:,4) = [0 0 0]'; >> pinv(A)*b ans = 3.3909 2.3864 5.9136 0