6.13. Left-Divide Operator
In this chapter, we focused primarily on solutions to systems of linear
equations, which we express in terms of the
\(\mathbf{A}\,\bm{x} = \bm{b}\) problem. A constant reference point
in the discussion has been MATLAB's left-divide, or backslash, operator
(\), which efficiently finds \(\bm{x}\) for all but singular
systems of equations. The results and applications are more important than the
details of MATLAB’s algorithms. Still, it is illuminating to briefly
discuss the algorithms in light of what we have learned about systems of
equations.
The left-divide operator invokes the mldivide function. The
documentation does not reveal every secret behind the implementation of
the algorithms, but it does describe the basic strategies. Please look at
the MATLAB documentation of mldivide. The documentation shows flow
charts depicting how full and sparse matrices are evaluated by their
size, shape, and special properties to determine the most efficient
algorithm. Note that the work of mldivide usually progresses
from more complex problems toward simpler ones using matrix
factorization. In particular, lower and upper triangular factors are
sought so that efficient triangular solvers can find
solutions by forward or back substitution.
6.13.1. Left-Divide of Square Matrix Systems
As discussed in LU Decomposition, when matrix \(\bf{A}\) is square, the primary algorithm is LU decomposition with row exchanges (partial pivoting). Matrices smaller than \(16{\times}16\) (real) or \(8{\times}8\) (complex) are always solved with LU decomposition. Other algorithms for larger square matrices use the matrix’s special structural features to compute efficient matrix factorizations.
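As a minimal sketch of this primary path, the following example factors an assumed small nonsingular matrix with lu and then solves with two triangular substitutions; the version inside mldivide is, of course, more optimized:
A = [2 1 1;
     4 3 3;
     8 7 9];          % assumed nonsingular example
b = [1; 2; 3];
[L, U, P] = lu(A);    % partial pivoting: P*A = L*U
x = U \ (L \ (P*b));  % forward substitution, then back substitution
% x agrees with A \ b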
A square, dense (not sparse) matrix is first checked to see if it is triangular or can be multiplied by a permutation matrix to become triangular. Triangular systems are solved with forward and back substitution.
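To make the triangular solver idea concrete, here is a minimal back substitution sketch for an assumed upper triangular system; the built-in triangular solver is far more optimized:
U = [2 1 3;
     0 4 1;
     0 0 5];          % assumed upper triangular example
y = [13; 9; 10];
n = length(y);
x = zeros(n, 1);
for i = n:-1:1        % solve from the bottom row upward
    x(i) = (y(i) - U(i,i+1:n)*x(i+1:n)) / U(i,i);
end
% x agrees with U \ y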
Next, it is checked to see if it is Hermitian (Matrix Transpose Properties). If \(\mathbf{A}\) is Hermitian and positive definite, the symmetry allows quick factoring with the Cholesky decomposition \(\mathbf{A} = \mathbf{R}^T\,\mathbf{R}\), where \(\bf{R}\) is upper triangular (Cholesky Decomposition for Positive Definite Systems). Cholesky decomposition is attempted if \(\bf{A}\) appears to be positive definite, but if it fails because \(\bf{A}\) is not positive definite, then the LU decomposition is used.
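The following sketch, using an assumed positive definite matrix, shows the Cholesky route: one factorization followed by two triangular substitutions:
A = [4 1 1;
     1 3 0;
     1 0 2];          % assumed symmetric positive definite example
b = [6; 4; 3];
R = chol(A);          % A = R'*R with R upper triangular
x = R \ (R' \ b);     % forward substitution, then back substitution
% x agrees with A \ b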
Note
MATLAB checks if the diagonal values of a Hermitian matrix are all positive or all negative, which is a necessary but not sufficient condition for being positive definite. If the diagonal values are all negative, then the negative of \(\bf{A}\) is used, which may be positive definite. If these two preliminary tests (Hermitian structure and a consistent diagonal sign) pass, then the Cholesky decomposition is attempted.
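A rough sketch of that screening order, mimicking the documented behavior rather than MATLAB's internal code, might look like the following:
A = [4 1 1;
     1 3 0;
     1 0 2];           % assumed Hermitian test matrix
d = diag(A);
p = 1;                 % nonzero p means Cholesky was not used
if all(d > 0)
    [R, p] = chol(A);  % p == 0 when the factorization succeeds
elseif all(d < 0)
    [R, p] = chol(-A); % the negation may be positive definite
end
if p ~= 0
    [L, U, P] = lu(A); % fall back to LU with partial pivoting
end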
A square matrix that is not triangular or Hermitian is solved with either LU decomposition, as presented in LU Decomposition, or with a variation of the LU decomposition algorithm that takes advantage of the structure of upper Hessenberg matrices. An upper Hessenberg matrix is one diagonal away from being upper triangular: it contains zeros below the first subdiagonal (just below the main diagonal). A special solver for Hessenberg matrices may seem obscure, but they occur in many problems. The class of upper Hessenberg matrices also includes tridiagonal matrices, which contain nonzero elements only on the main diagonal and the diagonals immediately above and below it. Heat conduction and fluid flow problems often result in tridiagonal matrices [BUTT10].
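For instance, the following sketch builds a small tridiagonal system of the kind that arises in one-dimensional heat conduction models; the size and coefficients are assumed for illustration:
n = 6;
A = diag(-2*ones(n,1)) + diag(ones(n-1,1), 1) + diag(ones(n-1,1), -1);
b = -ones(n, 1);     % assumed source term
x = A \ b;           % eligible for the structured (Hessenberg) path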
6.13.2. Left-Divide of Over-determined Systems
As discussed in Over-determined Systems and Vector Projections, the solution to over-determined
systems is based on vector projections. It is an exact solution in the
rare event that vector \(\bm{b}\) is in the column space of
\(\bf{A}\), but it is usually an approximation. We have shown how
the least squares solution is found from the vector projection equation
\(\bm{x} = \left(\mathbf{A}^T\mathbf{A}\right)^{-1}\mathbf{A}^T\bm{b}\)
or from the pseudo-inverse (\(\bm{x} = \mathbf{A}^+\bm{b}\)), which
MATLAB finds from the economy SVD. However, the mldivide function
finds the least squares solution from the economy QR factorization
given in equation (6.27) in Projections Onto a Hyperplane. The
solution is found quickly because \(\bf{Q}\) is orthogonal and
\(\bf{R}\) is upper triangular, so finding the solution only
requires a matrix transpose, a matrix–vector multiplication, and back
substitution with the triangular solver. Note that for an
over-determined matrix, the \(\bf{R}\) matrix ends with
\(m-n\) rows of all zeros, so we pass an extra zero argument to
qr to use the economy QR factorization, which removes those rows. Here is a
quick example:
>> A = randi(20, 5, 2) - 10;
>> b = randi(20, 5, 1) - 10;
>> x = A\b
x =
0.6383
0.1901
>> [Q, R] = qr(A, 0);
>> x = R\(Q'*b)
x =
0.6383
0.1901
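For comparison, the projection equation and the pseudo-inverse reach the same least squares solution, up to roundoff. This sketch assumes that A, b, Q, and R from the session above are still in the workspace:
xNormal = (A'*A) \ (A'*b);   % vector projection (normal equations)
xPinv   = pinv(A)*b;         % pseudo-inverse from the economy SVD
xQR     = R \ (Q'*b);        % economy QR route shown above
% all three agree with A \ b to within roundoff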
6.13.3. Left-Divide of Under-determined Systems
As discussed in Under-determined Systems, the solution to
under-determined systems is an infinite set rather than a unique
solution. So any solution satisfying the equation
\(\mathbf{A}\,\bm{x} = \bm{b}\) is correct, but some solutions are
better than others. The minimum length solution is desired, but so is a
sparse solution with several zeros. The documentation for mldivide
states that the returned solution has at most \(r\) nonzero values,
where \(r\) is the rank of \(\bf{A}\). It replaces \(n-r\) columns of
\(\bf{A}\) with zeros before the QR factorization to produce the
zero-valued results. The columns to zero out are found sequentially. The first
column zeroed out is the column that, when replaced with zeros, yields
the solution with the smallest \(l_2\)-norm. The evaluation is
repeated until \(n-r\) columns are zeroed
out [SYSEQDOC]. The calculation uses the economy QR
factorization of \(\mathbf{A}^T\) rather than \(\bf{A}\), as
listed in equation (6.33) of QR and Under-determined Systems.
The following example uses the same \(\bf{A}\) and \(\bm{b}\) as the example from QR and Under-determined Systems. Note that this solution contains more zeros, but has a larger \(l_2\)-norm, than the minimum norm solution found there.
>> A = [-4 9 3 12;
-6 1 -5 12];
>> b = [4; 9];
>> x = A\b
x =
0
-0.6250
0
0.8021
>> norm(x)
ans =
1.0168
>> A(:,1) = [0; 0];
>> A(:,3) = [0; 0];
>> [Qt, Rt] = qr(A', 0);
>> x = Qt * (Rt'\b)
x =
0.0000
-0.6250
0
0.8021
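For contrast, the minimum norm solution of the full system can be found with pinv (or with lsqminnorm in newer MATLAB releases). This sketch, reusing the same \(\bf{A}\) and \(\bm{b}\), confirms that it has a smaller \(l_2\)-norm than the sparse backslash solution, though generally no zero elements:
A = [-4 9 3 12;
     -6 1 -5 12];
b = [4; 9];
xmin = pinv(A)*b;   % minimum norm solution
norm(xmin)          % smaller than the 1.0168 found above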