6.2. Working with Matrices

We introduced both vectors and matrices in Vectors and Matrices in MATLAB and discussed how to create them in MATLAB. Here, we present mathematical operations on matrices and review important properties of matrices.

6.2.1. Matrix Math

6.2.1.1. Addition and Subtraction

Addition and subtraction of matrices is performed element-wise.

\mat{a b; c d} - \mat{e f; g h} = \mat{a-e b-f; c-g d-h}

Scalar and element-wise operations between matrices work the same as with vectors. Of course, the sizes of both matrices must be the same (Element-wise Arithmetic).
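
For example, addition, subtraction, and element-wise multiplication of two matrices of the same size (the values in the comments are what MATLAB returns):

A = [1 2; 3 4];
B = [5 6; 7 8];
A + B       % [6 8; 10 12]
A - B       % [-4 -4; -4 -4]
A .* B      % element-wise product: [5 12; 21 32]
2*A         % scalar multiple: [2 4; 6 8]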

6.2.1.2. Matrix Multiplication

Multiplication of a vector by a matrix and multiplication of two matrices are common linear algebra operations. The former is used in the expression of a system of linear equations of the form \mathbf{A}\,\bm{x} = \bm{b}. Points and vectors can be moved, scaled, rotated, or skewed when multiplied by a matrix. This has application to physical geometry and also to the pixels in an image. Two primary uses of matrix–to–matrix multiplication are explored in later sections. First, we discuss geometric transformation matrices in Geometric Transforms. The transformation matrices are multiplied with movable points to determine new coordinate locations. Multiplication of the transformation matrices is also used to combine transformations. Second, we cover several matrix factorings later in this chapter and in the next chapter, such as \mathbf{A} = \mathbf{L\,U}, \mathbf{A} =
\mathbf{Q\,R}, \mathbf{A} = \mathbf{U\,\Sigma\,V}^T, and \mathbf{A} = \mathbf{X\,\Lambda\,X}^{-1}. Each matrix factoring facilitates solving particular problems. It is matrix multiplication that verifies that the product of the factors yields the original matrix.

Here is an example showing both the product of matrix factors and how multiplying a vector by a matrix scales and rotates vectors and points. We begin with a vector on the x axis and a matrix that rotates points and vectors by \pi/4 radians (45°). Rotation matrices are discussed in Rotation of a Point. We multiply the rotation matrix by two so that it also scales the vector. Then we use the singular value decomposition [1] (SVD), which is described later in Singular Value Decomposition (SVD). Matrix multiplication confirms that the product of the factors is the same as our original matrix. Then Fig. 6.11 shows that successive multiplications by the factors rotate, scale, and again rotate the vector to yield the same result as multiplication by the original scaling and rotating matrix (\mathbf{p}_3 = 2\,\mathbf{R\,p} = \mathbf{U\,\Sigma\,V}^T
\mathbf{p}). [2]

2\,\mathbf{R} = \mat{1.4142, -1.4142; 1.4142, 1.4142} =
\underbrace{\mat{-0.7071, -0.7071; -0.7071, 0.7071}}_{\mathbf{U}}
\underbrace{\mat{2 0; 0 2}}_{\mathbf{\Sigma}}
\underbrace{\mat{-1 0; 0 1}}_{\mathbf{V}^T}

../_images/succMatMult.png

Fig. 6.11 The first matrix multiplication rotates the vector. The second multiplication scales the vector. The third multiplication rotates the vector again. The three successive multiplications achieve the same result as multiplying the vector by 2\,\mathbf{R}.

p = [1;0];
R = [0.7071 -0.7071;
     0.7071 0.7071];
[U,S,V] = svd(2*R);
p1 = V'*p;  % (-1, 0)
p2 = S*p1;  % (-2, 0)
p3 = U*p2;  % (1.4, 1.4)

The elements of a matrix product are found by inner products. To complete the inner products, the number of columns of the left matrix must match the number of rows in the right matrix. We say that the inner dimensions of the two matrices must agree. As shown in Size of Matrix Multiplications, an m{\times}p matrix may be multiplied by a p{\times}n matrix to yield an m{\times}n matrix. The size of the product matrix is given by the outer dimensions of the two matrices.

../_images/matmult-size.png
Table 6.1 Size of Matrix Multiplications
First Matrix             | Second Matrix            | Output Matrix
1{\times}n row vector    | n{\times}1 column vector | 1{\times}1 scalar
1{\times}p row vector    | p{\times}n matrix        | 1{\times}n row vector
m{\times}1 column vector | 1{\times}n row vector    | m{\times}n matrix
m{\times}p matrix        | p{\times}1 column vector | m{\times}1 column vector
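
MATLAB reports these sizes with the size function, and raises an error when the inner dimensions do not agree:

A = ones(3,2);      % 3x2 matrix
B = ones(2,4);      % 2x4 matrix
size(A*B)           % [3 4] -- inner dimensions (2 and 2) agree
size(A'*A)          % [2 2]
size(A*A')          % [3 3]
% B*A is an error: inner dimensions 4 and 3 do not agree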

With regard to matrix multiplication, we will view a vector as a matrix with only one column (size m{\times}1). Multiplication of a matrix by a vector is defined either as the linear combination of the columns of the matrix weighted by the vector elements, or as the inner products between the rows of the left matrix and the column vector.

\begin{array}{ll}
\mathbf{A}\,\bm{x} & = \spalignmat{1 2 3; 2 5 2; 6 -3 1}\,
       \spalignvector{2 1 1} \\ \\
  & = 2\, \spalignvector{1 2 6} + 1\,\spalignvector{2 5 -3}
     + 1\,\spalignvector{3 2 1} \\ \\
  & = \spalignmat{ \spalignmat{1 2 3};;
                 \spalignmat{2 5 2};;
                  \spalignmat{6 -3 1} }\:\spalignvector{2 1 1} \\
                 \\
  & = \spalignvector{7 11 10}
\end{array}
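
The same product in MATLAB, computed directly and as a linear combination of the columns of A:

A = [1 2 3; 2 5 2; 6 -3 1];
x = [2; 1; 1];
A*x                                % [7; 11; 10]
2*A(:,1) + 1*A(:,2) + 1*A(:,3)     % same result, column view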

The individual values of a matrix product \mathbf{C}_{i,j} are calculated as inner products between the rows of the left matrix (\bf{A}) and the columns of the right matrix (\bf{B}).

\mathbf{C}_{i,j} = \sum_{n=1}^{p} \mathbf{A}_{i,n} \,
\mathbf{B}_{n,j}

For 2{\times}2 matrices:

\begin{array}{l}
\spalignmat{a b; c d}\, \spalignmat{e f;g h} =
\spalignmat{(a\,e + b\,g) (a\,f + b\,h);
            (c\,e + d\,g) (c\,f + d\,h)} \\ \\
\spalignmat{1 2; 3 4}\, \spalignmat{5 6; 7 8} =
\spalignmat{(1\cdot5 + 2\cdot7) (1\cdot6 + 2\cdot8);
            (3\cdot5 + 4\cdot7) (3\cdot6 + 4\cdot8)} =
\spalignmat{19 22; 43 50}
\end{array}
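
The inner product formula can be coded directly as three nested loops, which is a sketch of what the * operator computes (MATLAB's built-in multiplication is, of course, far more efficient):

A = [1 2; 3 4];
B = [5 6; 7 8];
[m, p] = size(A);
n = size(B, 2);
C = zeros(m, n);
for i = 1:m
    for j = 1:n
        for k = 1:p
            C(i,j) = C(i,j) + A(i,k)*B(k,j);   % inner product of row i and column j
        end
    end
end
C      % [19 22; 43 50], the same as A*B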

6.2.1.3. Matrix Multiplication Properties

  1. Matrix multiplication is associative: (\bf{A\,B})\,\bf{C} = \bf{A}\,(\bf{B\,C}).

  2. In general, matrix multiplication is NOT commutative, even for square matrices: \bf{A\,B} \neq \bf{B\,A}. The most obvious exceptions are multiplication by the identity matrix and by a matrix inverse. Note below that the two products of the same matrices are not the same (both products, and the property in the next item, are verified in the MATLAB snippet after this list).

    \begin{array}{rl}
\mat{3 0; 4 3}\mat{2 -2; 3 4} &= \mat{6 -6; 17 4} \\
\hfill \\
\mat{2 -2; 3 4}\mat{3 0; 4 3} &= \mat{-2 -6; 25 12}
\end{array}

  3. \bf{A\,B} = \bf{A\,C} does not imply that \bf{B} = \bf{C}. For example, consider matrices:

    \begin{array}{c}
\mathbf{A} = \mat{1 2; 2 4}, \mathbf{B} = \mat{2 1; 1 3},
\mathbf{C} = \mat{4 3; 0 2} \\ \hfill \\
\mathbf{A\,B} = \mathbf{A\,C} = \mat{4 7; 8 14}
\end{array}
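
These two properties are easy to check numerically (the values in the comments are what MATLAB returns):

A = [3 0; 4 3];  B = [2 -2; 3 4];
A*B                    % [6 -6; 17 4]
B*A                    % [-2 -6; 25 12] -- not equal to A*B
A = [1 2; 2 4];  B = [2 1; 1 3];  C = [4 3; 0 2];
isequal(A*B, A*C)      % true, even though B and C differ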

The Outer Product View of Matrix Multiplication

We normally think of matrix multiplication as finding each term of the matrix product from the sum of products between the rows of the left matrix and the columns of the right matrix. This is the inner product view of matrix multiplication. But there is also an outer product view that is columns times rows.

\mathbf{A\,B} = \mat{\vertbar{} \vertbar{} {} \vertbar{};
        \bm{a}_1 \bm{a}_2 \cdots{} \bm{a}_p;
        \vertbar{} \vertbar{} {} \vertbar{}}
\mat{ \horzbar{} \bm{b}_1 \horzbar{};
         \horzbar{} \bm{b}_2 \horzbar{};
         {} \vdots{} {};
         \horzbar{} \bm{b}_p \horzbar{}}
= \bm{a}_1\,\bm{b}_1 + \cdots + \bm{a}_p\,\bm{b}_p

Each column \bm{a}_k of an m{\times}p matrix multiplies \bm{b}_k, the k\mbox{th} row of a p{\times}n matrix. The product \bm{a}_k \bm{b}_k is an m{\times}n matrix of rank one formed from the outer product of the two vectors. The final product \bf{A\,B} is the sum of the p rank-one matrices.

Outer product matrix multiplication is not how we usually multiply two matrices. Rather, it gives an alternative perspective of what matrix multiplication accomplishes. This perspective will be useful later when we learn about the singular value decomposition.
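
A quick numerical check of the outer product view:

A = [1 2; 3 4; 5 6];        % 3x2
B = [7 8 9; 10 11 12];      % 2x3
C = zeros(size(A,1), size(B,2));
for k = 1:size(A,2)
    C = C + A(:,k)*B(k,:);  % add the k-th rank-one outer product
end
isequal(C, A*B)             % true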

6.2.1.4. Matrix Division

Matrix division, in the sense of scalar division, is not defined for matrices. Multiplication by the inverse of a matrix accomplishes the same objective as division does in scalar arithmetic. MATLAB also has two alternative operators to the matrix inverse that act as one might expect from division operators, left-divide (\) and right-divide (/); however, they are more complicated than simple division. As shown in MATLAB’s Matrix Division Operators, they are used to solve systems of linear equations as described in Systems of Linear Equations.

Table 6.2 MATLAB’s Matrix Division Operators
Operator | Name         | Equation | MATLAB Solution | Inverse Equivalent
\        | left-divide  | A*x = b  | x = A\b         | x = inv(A)*b
/        | right-divide | x*A = b  | x = b/A         | x = b*inv(A)
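
For example, left-divide solves A*x = b without explicitly forming the matrix inverse:

A = [2 -3 0; 4 -5 1; 2 -1 -3];
b = [1; 2; 3];
x = A\b          % preferred: solved by elimination
x2 = inv(A)*b    % same answer, but slower and generally less accurate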

6.2.2. Special Matrices

Zero Matrix

A zero vector or matrix contains all zero elements and is denoted as \bf{0}.

Diagonal Matrix

A diagonal matrix has zeros everywhere except on the main diagonal, which is the elements where the row index and column index are the same. Diagonal matrices are usually square (same number of rows and columns), but they may be rectangular with extra rows or columns of zeros.

\mat{d_1 0 0; 0 d_2 0; 0 0 d_3}

The main diagonal, also called the forward diagonal, or the major diagonal, is the set of diagonal matrix elements from upper left to lower right. The set of diagonal elements from lower left to upper right is of significantly less interest to us, but has several names including the antidiagonal, back diagonal, secondary diagonal, or the minor diagonal.

Identity Matrix

An identity matrix (\bf{I}) is a square, diagonal matrix where all of the elements on the main diagonal are one. Identity matrices are like a one in scalar math. That is, the product of any matrix with the identity matrix yields itself.

\mathbf{A\,I} = \mathbf{A} = \mathbf{I\,A}

\begin{array}{ll}
    \mathbf{I}_{2{\times}2} & = \spalignmat{1 0; 0 1} \\ \\
\mathbf{I}_{3{\times}3} & = \spalignmat{1 0 0; 0 1 0; 0 0 1} \\ \\
\mathbf{I}_{4{\times}4} & = \spalignmat{1 0 0 0; 0 1 0 0;0 0 1 0;0 0 0 1}
\end{array}

MATLAB has a function called eye that takes one argument for the matrix size and returns an identity matrix.

>> I2 = eye(2)
I2 =
    1     0
    0     1
>> I3 = eye(3)
I3 =
    1     0     0
    0     1     0
    0     0     1

Upper Triangular Matrix

Upper triangular matrices have all zero values below the main diagonal. Any nonzero elements are on or above the main diagonal.

\mathbf{U} = \mat{a b c; 0 d e; 0 0 f}

Lower Triangular Matrix

Lower triangular matrices have all zero values above the main diagonal. Any nonzero elements are on or below the main diagonal.

\mathbf{L} = \mat{a 0 0; b c 0; d e f}
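
MATLAB's triu and tril functions return the upper and lower triangular parts of a matrix:

A = magic(3);
U = triu(A)     % zeros below the main diagonal
L = tril(A)     % zeros above the main diagonal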

Symmetric Matrix

A symmetric matrix (\bf{S}) has symmetry relative to the main diagonal. If the matrix were written on a piece of paper and you folded the paper along the main diagonal, the off-diagonal elements with the same value would lie on top of each other. For symmetric matrices, \mathbf{S}^T = \mathbf{S}. Note that for any matrix \bf{A}, the products \mathbf{A}^T\mathbf{A} and \mathbf{A\,A}^T both yield symmetric matrices, as checked in the code example below. Here are a couple of examples of symmetric matrices.

\begin{array}{ccc}
    \mat{1 2;2 3} & \hspace{35pt} & \mat{2 3 6; 3 1 5; 6 5 2}
\end{array}
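
The claim that \mathbf{A}^T\mathbf{A} and \mathbf{A\,A}^T are always symmetric is easy to check:

A = [1 2 3; 4 5 6];    % not square and not symmetric
S1 = A'*A;             % 3x3
S2 = A*A';             % 2x2
issymmetric(S1)        % true
issymmetric(S2)        % true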

The complex counterpart of a symmetric matrix, one that is equal to its conjugate (Hermitian) transpose, is said to be Hermitian.

Orthonormal columns

A matrix, \bf{Q}, has orthonormal columns if the columns are unit vectors (length of 1) and the dot product between any pair of columns is zero (\cos(\theta) = 0). That is to say, the columns are all orthogonal to each other.

Unitary Matrix

A matrix \mathbf{Q} \in \mathbb{C}^{n\,\times\,n} is unitary if it is square and its columns are orthonormal. The inverse of a unitary matrix is its conjugate (Hermitian) transpose, \mathbf{Q}^{-1} = \mathbf{Q}^H. The product of a unitary matrix with its Hermitian transpose is the identity matrix: \mathbf{Q\,Q}^H = \bf{I} and \mathbf{Q}^H\,\mathbf{Q} = \bf{I}.

Orthogonal Matrix

A matrix is orthogonal if its elements are real, \mathbf{Q} \in \mathbb{R}^{n\,\times\,n}, and it is unitary.

Here is an example of multiplication of an orthogonal matrix by its transpose.

\begin{array}{lc}
    &\mat{0.5 -0.866; 0.866 0.5}\mat{0.5 0.866; -0.866 0.5} =\\
    \hfill \\
    &\mat{(0.5^2 + 0.866^2) (0.5\cdot0.866 - 0.866\cdot0.5);
   (0.866\cdot0.5 - 0.5\cdot0.866) (0.866^2 + 0.5^2)} = \\
    \hfill \\
    &\mat{1 0; 0 1}
\end{array}

When we discuss the accuracy of solutions to systems of equations (\mathbf{A}\,\bm{x} = \bm{b}) in Numerical Stability, we will see that the condition number of the \bf{A} matrix influences the level of obtainable accuracy (Matrix Condition Number). Orthogonal matrices have a condition number of 1, so Orthogonal Matrix Decompositions are used to obtain the most accurate results, especially when \bf{A} is rectangular.

Note

Orthogonal matrices may seem a bit abstract at this point because we have not yet encountered an application for them. We need an orthogonal matrix to provide the basis vectors for vector projections, which is described in Alternate Projection Equation. Rotation matrices discussed in Rotation of a Point are orthogonal. We can obtain orthogonal columns from an existing matrix by using the modified Gram–Schmidt algorithm (Implementation of Modified Gram–Schmidt), QR factorization (QR Decomposition), or from the SVD (Finding Orthogonal Basis Vectors, and Projection and the Economy SVD). Orthogonal matrices are also the result from finding the eigenvectors of symmetric matrices, which is proven in The Schur Decomposition of Symmetric Matrices.

6.2.3. Special Matrix Relationships

6.2.3.1. Matrix Inverse

The inverse of a square matrix, \bf{A}, is another matrix, \mathbf{A}^{-1}, that multiplies with the original matrix to yield the identity matrix.

\mathbf{A}^{-1} \mathbf{A} = \mathbf{A\,A}^{-1} = \bf{I}

Not all square matrices have an inverse, and calculating the inverse of a large matrix requires significant computation.

In MATLAB, the function inv(A) returns the inverse of matrix \bf{A}. MATLAB’s left-divide operator provides a more efficient and numerically stable alternative to using a matrix inverse to solve a system of equations. So while matrix inverses frequently appear in written documentation, we seldom calculate a matrix inverse.

Here are a couple of properties of matrix inverses: {\left(\mathbf{A}^{-1}\right)}^{-1} = \mathbf{A}, and {(\mathbf{A\,B})}^{-1} = \mathbf{B}^{-1}\,\mathbf{A}^{-1} (note the reversed order, as with the transpose of a product).

6.2.3.2. Matrix Transpose Properties

We described the transpose for vectors and matrices in Transpose.

  • {\left( \mathbf{A}^{T} \right)}^T = \bf{A}

  • The transpose with respect to addition, (\mathbf{A} + \mathbf{B})^T
= \mathbf{A}^T + \mathbf{B}^T.

  • (\mathbf{A\,B})^T = \mathbf{B}^T \, \mathbf{A}^T. Notice that the order is reversed.

    \begin{array}{rcl}
    (i, j) \mbox{ value of } (\mathbf{A\,B})^T
    & = & (j, i) \mbox{ value of } \mathbf{A\,B} \\
    & = & j \mbox{ row of } \mathbf{A} \times i
            \mbox{ column of } \mathbf{B} \\
    & = & i \mbox{ row of } \mathbf{B}^T \times j
            \mbox{ column of } \mathbf{A}^T \\
    & = & (i, j) \mbox{ value of } \mathbf{B}^T\,\mathbf{A}^T
\end{array}

  • We often pre-multiply vectors and matrices by their transpose (\mathbf{A}^T \mathbf{A}). The result is a scalar when \mathbf{A} is a column vector, and a square, symmetric matrix when \mathbf{A} is a row vector, a rectangular matrix, or a square matrix. If \mathbf{A} is an m{\times}n matrix, \mathbf{A}^T \mathbf{A} is an n{\times}n square, symmetric matrix.

    Since \mathbf{S} = \mathbf{A}^T \mathbf{A} is symmetric, then \mathbf{S}^T = \mathbf{S}.

    \mathbf{S}^T = {\left( \mathbf{A}^T \mathbf{A} \right)}^T =
\mathbf{A}^T {\left( \mathbf{A}^T \right)}^T
= \mathbf{A}^T \mathbf{A} = \mathbf{S}

    Another interesting result is that the trace of \mathbf{S} =
\mathbf{A}^T \mathbf{A}, denoted as Tr(\mathbf{S}), is the sum of the squares of all the elements of \bf{A}. The trace of a matrix is the sum of its diagonal elements.

    \mathbf{A}^T\mathbf{A} = \mat{a b c; d e f} \mat{a d; b e; c f}
= \mat{{a^2 + b^2 + c^2}, {ad + be + cf};
       {ad + be + cf}, {d^2 + e^2 + f^2}}

  • For an orthogonal matrix, \mathbf{Q}^{-1} = \mathbf{Q}^T. Note that the columns of the matrix must be unit vectors, not merely orthogonal to each other, for this property to be true.

    Since each of the columns are unit vectors, it follows that the diagonal of \mathbf{Q}^T \mathbf{Q} is ones. The off-diagonal values are zeros because the dot product between the columns is zero. Thus,

    \begin{array}{l}
\mathbf{Q}^T \mathbf{Q} = \mathbf{Q\,Q}^T = \mathbf{I} \\
\hfill \\
\mathbf{Q}^T = \mathbf{Q}^{-1}
\end{array}

    Here is an example of an orthogonal rotation matrix.

    >> Q
    Q =
        0.5000   -0.8660
        0.8660    0.5000
    
    >> Q'
    ans =
        0.5000    0.8660
       -0.8660    0.5000
    
    >> Q(:,1)'*Q(:,2)   % column dot products => orthogonal
    ans =
        0
    
    >> Q'*Q             % orthogonal columns
    ans =
        1.0000   -0.0000
       -0.0000    1.0000
    >> Q*Q'             % orthogonal rows
    ans =
        1.0000    0.0000
        0.0000    1.0000
    

    Another interesting result about orthogonal matrices is that the product of two orthogonal matrices is also orthogonal. This is because both the inverse and the transpose of a matrix product reverse the order of the matrices. Proving orthogonality only requires showing that \mathbf{Q}^T = \mathbf{Q}^{-1}.

    Consider two orthogonal matrices \mathbf{Q}_1 and \mathbf{Q}_2.

    {\left(\mathbf{Q}_1 \mathbf{Q}_2 \right)}^{-1} =
\mathbf{Q}_2^{-1} \mathbf{Q}_1^{-1} =
\mathbf{Q}_2^T \mathbf{Q}_1^T =
{\left(\mathbf{Q}_1 \mathbf{Q}_2 \right)}^T

  • When a transpose is made of a matrix with complex values, we usually also take the complex conjugate of each value, which is called a Hermitian transpose. The conjugate of each value is often needed in numerical algorithms because when a complex number is multiplied by its conjugate, the product is a real number. The complex matrix product \mathbf{S} = \mathbf{A}^H\,\mathbf{A} is said to be Hermitian. Hermitian matrices have real numbers along the diagonal and the other values are symmetric conjugates. In technical writing, the Hermitian transpose is usually indicated as \mathbf{A}^H. In some older publications, it may be written as \mathbf{A}^* or even as \mathbf{A}^{\dagger}. Also in technical writing, symmetric matrices that may or may not have complex values are usually referred to as Hermitian rather than as symmetric.

    In MATLAB, A' is the Hermitian transpose of A, while A.' is the plain transpose, where the complex conjugates of the values in A are not taken, as the example below shows.
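
    For example, with a small complex matrix (the values in the comments are what MATLAB returns):

    A = [1+2i 3; 4 5-1i];
    A'      % Hermitian transpose: [1-2i 4; 3 5+1i]
    A.'     % plain transpose:     [1+2i 4; 3 5-1i]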

6.2.4. Determinant

The determinant of a square matrix is a number. At one time, determinants were a significant component of linear algebra. But that is not so much the case today [STRANG99]. The importance of determinants is now more for analytical than computational purposes. Thus, while it is important to know what a determinant is, we will avoid using them when possible.

  • Either of two notations are used to indicate the determinant of a matrix, \det(\bf{A}) or \vert\bf{A}\vert.
  • The classic technique for computing a determinant is called the Laplace expansion. The complexity of computing a determinant by the classic Laplace expansion method has run-time complexity on the order of n! (n–factorial), which is fine for small matrices, but is too slow for large (n > 3) matrices.
  • In practice, the determinant for matrices is computed using a faster method that is described in Determinant Shortcut and improves the performance to \mathcal{O}(n^3). [3] The faster method uses the result that \det(\mathbf{A\,B)} = \det(\mathbf{A})\,\det(\mathbf{B}). The LU decomposition factors are computed as described in LU Decomposition, then the determinant is the product of the determinants of the LU decomposition factors.
  • In some applications, such as machine learning, very large matrices are often used, so computing determinants is impractical. Fortunately, we now have alternate algorithms that do not require determinants.

The determinant of a 2{\times}2 matrix is simple, but it gets more complicated as the matrix dimension increases.

\spaligndelims\vert\vert \spalignmat{a b; c d} =
a\,d - c\,b

The n{\times}n determinant makes use of n (n-1){\times}(n-1) determinants, called cofactors. In each stage of Laplace expansion, the elements of a chosen row or column are each multiplied by a cofactor determinant formed from the other rows and columns, excluding the row and column of the multiplying element. The determinant is then the sum of the expansion along the chosen row or column. But then the (n-1){\times}(n-1) determinants need to be calculated, and likewise for each smaller determinant until the remaining determinants are 2{\times}2 and can be solved directly. It works as follows for a 3{\times}3 determinant.

\spaligndelims\vert\vert \spalignmat{a b c;d e f;g h i} =
a\, \spaligndelims\vert\vert \spalignmat{e f;h i} -
b\, \spaligndelims\vert\vert \spalignmat{d f;g i} +
c\, \spaligndelims\vert\vert \spalignmat{d e;g h}

Another approach to remembering the sum of products needed to compute a 3{\times}3 determinant is to compute diagonal product terms. The terms going from left to right are added while the right to left terms are subtracted. Note: This only works for 3{\times}3 determinants.

\begin{array}{rl}
    \spaligndelims\vert\vert \spalignmat{a b c;d e f;g h i} &=
\mat{a, {}, b, {}, c, {}, a, {} b;
    {}, \diagdown, {}, \diagdown,{}, \diagdown, {}, {}, {};
    d, {}, e, {}, f, {}, d, {} e;
    {}, {}, {}, \diagdown, {}, \diagdown,{}, \diagdown, {};
    g, {}, h, {}, i, {}, g, {} h} \\ & \hfill \\
&- \mat{a, {}, b, {}, c, {}, a, {} b;
    {}, {}, {}, \diagup, {}, \diagup,{}, \diagup, {};
    d, {}, e, {}, f, {}, d, {} e;
    {}, \diagup, {}, \diagup,{}, \diagup, {}, {}, {};
    g, {}, h, {}, i, {}, g, {} h}
\end{array}

\spaligndelims\vert\vert \spalignmat{a b c;d e f;g h i} =
aei + bfg + cdh - ceg - afh - bdi

For a 4{\times}4 determinant, the pattern continues with each cofactor term now being a 3{\times}3 determinant. The pattern likewise continues for higher order matrices. The number of computations needed with this method of computing a determinant is on the order of n!, which is prohibitive for large matrices.

Notice that the signs of the cofactor additions alternate according to the pattern:

sign_{i,j} = (-1)^{i+j} =
\spalignmat{+ - + -;- + - +;+ - + -;- + - +}

It is not necessary to always use the top row for Laplace expansion. Any row or column will work, just make note of the signs of the cofactors. The best strategy is to apply Laplace expansion along the row or column with the most zeros.

Here is a bigger determinant problem, but it has lots of zeros, so it is not too bad. At each stage of the expansion there is a row or column with only one nonzero element, so it is not necessary to find the sum of several cofactor determinants. Trace the Laplace expansion as we find the determinant.

\spaligndelims\vert\vert
    \begin{array}{rl}
\mat{8 5 4 3 0; 7 0 6 1 0; 8 -5 4 3 -5; -3 0 0 0 0; 4 0 2 2 0}
        &= 3 \times \mat{5 4 3 0; 0 6 1 0; -5 4 3 -5; 0 2 2 0} \\ \hfill \\
= 3 \times 5 \times \mat{5 4 3; 0 6 1; 0 2 2}
        &= 3 \times 5 \times 5 \times \mat{6 1; 2 2} = 750
    \end{array}

MATLAB has a function called det that takes a square matrix as input and returns the determinant.
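
The det function confirms the expansion above:

A = [8 5 4 3 0; 7 0 6 1 0; 8 -5 4 3 -5; -3 0 0 0 0; 4 0 2 2 0];
det(A)      % 750 (possibly with a small floating-point error)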

6.2.5. Calculating a Matrix Inverse

The inverse of a square matrix, \bf{A}, is another matrix, \mathbf{A}^{-1} that multiplies with the original matrix to yield the identity matrix.

\mathbf{A}^{-1} \mathbf{A} = \mathbf{A}\,\mathbf{A}^{-1} = \bf{I}

Unfortunately, calculating the inverse of a matrix is computationally slow. A formula known as Cramer’s Rule provides a neat and tidy equation for the inverse, but for matrices beyond a 2{\times}2, it is prohibitively slow. Cramer’s rule requires the calculation of the determinant and the full set of cofactors for the matrix. Cofactors are determinants of matrices one dimension less than the original matrix. To find the inverse of a 3{\times}3 matrix requires finding one 3{\times}3 determinant and 9 2{\times}2 determinants. The run time complexity of Cramer’s rule is \mathcal{O}(n\cdot\,n!).

For a 2{\times}2 matrix, Cramer’s rule can be found by algebraic substitution.

\begin{array}{rl}
  \mathbf{A\,A}^{-1} &= \bf{I} \\
  \mat{a b;c d} \mat{x_1 x_2; y_1 y_2} &= \mat{1 0;0 1}
\end{array}

One can perform the above matrix multiplication and find four simple equations from which the terms of the matrix inverse may be derived.

\mathbf{A} = \spalignmat{a b;c d} \mbox{ has inverse }
\mathbf{A}^{-1} = \frac{\spalignmat{d -b; -c a}} {det(\mathbf{A})}
            = \frac{1}{(a\,d - c\,b)}\spalignmat{d -b; -c a}
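
A quick numerical check of the 2{\times}2 formula against MATLAB's inv function:

A = [4 7; 2 6];
A_inv = [6 -7; -2 4] / (4*6 - 2*7)   % [d -b; -c a]/det(A) = [0.6 -0.7; -0.2 0.4]
inv(A)                               % same result
A_inv*A                              % identity, within round-off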

Because of the computational complexity, Cramer’s rule is not used by MATLAB or similar software. Another technique known as elimination is used. The elimination algorithm is described in Elimination. The run time complexity of calculating a matrix inverse by elimination is \mathcal{O}(n^3). The good news is that we do not need to manually calculate the inverse of matrices because MATLAB has a matrix inverse function called inv.

>> A = [2 -3 0;4 -5 1;2 -1 -3];
>> A_inv = inv(A)
A_inv =
    -1.6000    0.9000    0.3000
    -1.4000    0.6000    0.2000
    -0.6000    0.4000   -0.2000

Note

For square matrices, the left-divide operator uses an efficient elimination technique called LU decomposition to solve a system of equations. Thus, it is seldom necessary to compute the full inverse of the matrix.

6.2.6. Invertible Test

Not all matrices can be inverted. A matrix that is not invertible is said to be singular. A matrix is singular if some of its rows or columns are dependent on each other as described in Independent Vectors. An invertible matrix must derive from the same number of independent equations as the number of unknown variables. A square matrix from independent equations has rank equal to the dimension of the matrix, which means that it is full rank, has a nonzero determinant, and is invertible. We will define rank more completely after discussing elimination. For now, just think of rank as the number of independent rows and columns in the matrix. See also Systems of Linear Equations that describes how linear equations are mapped to matrix equations. A function described in The Column and Row Factorization shows which columns are independent and how the columns are combined to yield any dependent columns. Linearly Independent Vectors gives more formal definitions of independent and dependent vectors.

The following matrix is singular because column 3 is just column 1 multiplied by 2.

\spalignmat{1 0 2;2 1 4;1 2 2}

Sometimes it is difficult to observe that the rows or columns are not independent. When a matrix is singular, its rank is less than the dimension of the matrix, and its determinant also evaluates to zero.

>> A = [1 0 2;2 1 4;1 2 2];
>> rank(A)
ans =
     2
>> det(A)
ans =
    0
>> det(A')  % the transpose is also singular
ans =
    0

If you work much with MATLAB, you will occasionally run across a warning message saying that a matrix is close to singular. The message may also reference RCOND, the reciprocal condition number. The condition number and its reciprocal are metrics indicating how close a matrix is to being singular, which we will describe in Matrix Condition Number. For our discussion here, we will focus on the rank as our invertible test.

6.2.7. Cross Product

The cross product is a special operation using two vectors in \mathbb{R}^3 with application to geometry and physical systems. Although the cross product is an operation on vectors, it is included here because matrix operations (a determinant or a matrix multiplication) are used to compute it.

The cross product of two nonparallel 3-D vectors is a vector perpendicular to both vectors and the plane which they span and is called the normal vector to the plane. The direction of the normal vector follows the right hand rule as shown in figure Fig. 6.12. Finding the normal vector to a plane is needed to write the equation of a plane. Projections Onto a Hyperplane shows an example of using a cross product to find the equation of a plane.

../_images/right_hand_rule.png

Fig. 6.12 Right Hand Rule: \bm{u}{\times}\bm{v} points in the direction of your right thumb when the fingers curl from \bm{u} to \bm{v}.

  • \bm{u} \times \bm{v} is spoken as “\bm{u} cross \bm{v}”.
  • \bm{u} \times \bm{v} is perpendicular to both \bm{u} and \bm{v}.
  • \bm{v} \times \bm{u} is -(\bm{u} \times \bm{v}).
  • The length \norm{\bm{v} \times \bm{u}} = \norm{\bm{u}}\,\norm{\bm{v}}\,|\sin \theta|, where \theta is the angle between \bm{u} and \bm{v}.
  • The MATLAB function cross(u, v) returns \bm{u} \times \bm{v}.

Cross Product by Determinant

The cross product of \bm{u} = (u_1, u_2, u_3) and \bm{v} = (v_1, v_2, v_3) is a vector.

\begin{array}{ll}
\bm{u} \times \bf{v} & = \spaligndelims\vert\vert
\spalignmat{\vec{i} \vec{j} \vec{k};
            u_1 u_2 u_3; v_1 v_2 v_3} \\ \\
& = \vec{i}(u_2 v_3 - u_3 v_2)
  - \vec{j}(u_1 v_3 - u_3 v_1)
  + \vec{k}(u_1 v_2 - u_2 v_1)
\end{array}

Vectors \vec{i}, \vec{j}, and \vec{k} are the unit vectors in the directions of \bm{x}, \bm{y}, and \bm{z} of the 3-D axis.

Cross Product by Skew-Symmetric Multiplication

An alternative way to compute \bm{u} \times \bm{v} is by multiplication of a skew-symmetric, or anti-symmetric matrix.

  • The skew-symmetric matrix of \bm{u} is given the math symbol, [\bm{u}]_\times. Such a matrix has a zero diagonal and is always singular. The transpose of a skew-symmetric matrix is equal to its negative.

    [\bm{u}]_{\times}^T = -[\bm{u}]_{\times}

  • A skew(u) function returns [\bm{u}]_\times. Note that skew is not a built-in MATLAB function; a minimal version is given after this list.

  • For vectors in \mathbb{R}^3:

    [\bm{u}]_\times = \mat{0 -u_3 u_2; u_3 0 -u_1; -u_2 u_1 0}

    \bm{u} \times \bm{v} = [\bm{u}]_{\times}\,\bm{v}
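
The console example below uses a skew function. Since skew is not a built-in MATLAB function, here is a minimal sketch of one (any toolbox version that builds the same matrix will also work):

function S = skew(u)
% SKEW  Skew-symmetric matrix [u]_x such that skew(u)*v equals cross(u,v).
    S = [    0   -u(3)   u(2);
          u(3)       0  -u(1);
         -u(2)    u(1)      0];
end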

>> u = [1 2 3]';
>> v = [1 0 1]';
>> cross(u,v)
ans =
    2
    2
   -2
>> s = skew(u)
s =
    0    -3     2
    3     0    -1
   -2     1     0
>> s*v
ans =
    2
    2
   -2

Footnotes

[1]The meaning of the SVD factors is not our concern at this point. In this application, it is only an example factoring. However, the SVD was chosen because the three factors have the interesting property of rotating, scaling, and then rotating points or vectors again.
[2]The factors of the SVD are \mathbf{A} =
\mathbf{U\,\Sigma\,V}^T. In this example, \bf{V} is symmetric, so \mathbf{V} = \mathbf{V}^T.
[3]The symbol \mathcal{O}(\,) is called Big–O. It is a measure of the order of magnitude of the number of calculations needed relative to the size of the data operated on, n. If we say that an algorithm has run time complexity of \mathcal{O}(n^3), we are not saying that n^3 is the exact number of calculations needed to run the algorithm. Rather, we are saying that the order of magnitude of the number of calculations needed has a cubic relationship to n. So if the number of data elements is doubled, then the number of calculations needed grows by a factor of 8, \mathcal{O}((2n)^3).