8.6. Change of Basis and Difference Equations

When we wish to multiply \mathbf{A}^k by a vector (\mathbf{A}^k\bm{v}), we could diagonalize the matrix to find \mathbf{A}^k and then multiply by the vector. But there is another approach that also makes apparent the steady state of the system (when k goes to infinity). The change of basis method of computing \mathbf{A}^k\bm{v} has particular application to difference equations, which are commonly used as a model for discrete systems.

First we need to express the vector as a linear combination of the eigenvectors of \bf{A}.

\bm{v} = c_1\,\bm{x}_1 + c_2\,\bm{x}_2 + \cdots + c_n\,\bm{x}_n.

This is called a change of basis, because as shown in Fig. 8.3, we are using the eigenvectors as a new coordinate frame to describe the vector instead of the standard Cartesian basis vectors.

../_images/chBasis.png

Fig. 8.3 Vector \bm{v} is changed from coordinates (a_1, a_2) using the standard basis vectors to coordinates (c_1, c_2) using the eigenvectors of \bf{A} as the basis vectors.

In matrix form, the vector \bm{v} is described as

\bm{v} = \mathbf{X}\,\bm{c} =
\spalignmat{\vertbar{} \vertbar{} {} \vertbar{};
               \bm{x}_1 \bm{x}_2 \cdots{} \bm{x}_n;
               \vertbar{} \vertbar{} {} \vertbar{} }
               \vector{c_1; c_2; \vdots; c_n}.

Finding the linear coefficients is just a matter of solving a system of linear equations: \bm{c} = \mathbf{X}^{-1}\bm{v}, or in MATLAB: c = X\v.

Now, we will take advantage of \mathbf{A\,X} = \mathbf{X\,\Lambda} along with \bm{v} = \mathbf{X}\,\bm{c} to multiply \bm{v} by powers of \bf{A},

\begin{array}{rl}
       \mathbf{A}\,\bm{v} &= \mathbf{A\,X}\,\bm{c} \\
                         &= \mathbf{X\,\Lambda}\,\bm{c} \\
&= \lambda_1\,c_1\,\bm{x}_1 + \lambda_2\,c_2\,\bm{x}_2
            + \cdots + \lambda_n\,c_n\,\bm{x}_n \\
       \mathbf{A}^2\bm{v} &= \mathbf{A\,X\,\Lambda}\,\bm{c} \\
                         &= \mathbf{X\,\Lambda}^2\,\bm{c} \\
&= \lambda_1^2\,c_1\,\bm{x}_1 + \lambda_2^2\,c_2\,\bm{x}_2
            + \cdots + \lambda_n^2\,c_n\,\bm{x}_n \\
       \mathbf{A}^k\bm{v} &= \mathbf{A\,X\,\Lambda}^{k-1}\,\bm{c} \\
                         &= \mathbf{X\,\Lambda}^k\,\bm{c} \\
&= \lambda_1^k\,c_1\,\bm{x}_1 + \lambda_2^k\,c_2\,\bm{x}_2
            + \cdots + \lambda_n^k\,c_n\,\bm{x}_n \\
        \end{array}

The \bf{\Lambda} matrix is the same diagonal eigenvalue matrix from the diagonalization factorization.

Let’s illustrate this with an example. Consider the following matrix and its eigenvectors and eigenvalues. We will find \mathbf{A}^2 \bm{v} using the change of basis strategy, where \bm{v} = (3, 11) is expressed as a linear combination of the eigenvectors.

>> A = [1 2; 5 4];
>> [X, L] = eig(A);
>> x1 = X(:,1);
>> x2 = X(:,2);
>> l1 = L(1,1);
>> l2 = L(2,2);
>> v = [3; 11];
>> c = X\v;

>> A^2 * v
ans =
   143
   361

% check with
% eigenvalues and eigenvectors
>> l1^2*c(1)*x1 + l2^2*c(2)*x2
ans =
  143.0000
  361.0000

To generalize, we can say that when

\bm{v} = c_1\,\bm{x}_1 + c_2\,\bm{x}_2 + \cdots + c_n\,\bm{x}_n,

then

(8.5)\mathbf{A}^k\bm{v} = \lambda_1^k c_1\,\bm{x}_1 +
    \lambda_2^k c_2\,\bm{x}_2
+ \cdots + \lambda_n^k c_n\,\bm{x}_n.

Note

The eigenvalues and eigenvectors of a real matrix are often complex, but when the terms are added together, the result is real (zero imaginary part, up to round-off error).
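
For example, here is a small sketch with an assumed 90-degree rotation matrix, whose eigenvalues are the imaginary pair \pm i; the change of basis sum still reproduces the real product \mathbf{A}^k\bm{v}.

% A small sketch (assumed example): complex eigenvalues, real result
A = [0 -1; 1 0];      % 90-degree rotation, eigenvalues +i and -i
v = [1; 2];
[X, L] = eig(A);
c = X\v;
k = 3;
u = L(1,1)^k*c(1)*X(:,1) + L(2,2)^k*c(2)*X(:,2);
real(u)               % matches A^k * v; imag(u) is zero to round-off
A^k * v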

Change of basis for steady state analysis

The number of computations to find \mathbf{A}^k\bm{v} by diagonalization and change of basis is essentially the same. The reason to employ the change of basis method is to more easily see what the equation will do when k \to \infty, which is steady-state analysis.

  • If any 0 \leq \lambda_i < 1.0, then that term goes to zero when k goes to infinity.
  • If any \lambda_i > 1.0, the equation goes to infinity.
  • If any \lambda_i = 1.0, that term will have a stable, nonzero steady-state value.
  • If all 0 \leq \lambda_i \leq 1.0, then the equation has a stable steady-state value.
  • If any -1.0 < \lambda_i < 0, then that term oscillates between positive and negative, but goes to zero when k goes to infinity.
  • If any \lambda_i = -1.0, that term will oscillate between a nonzero positive and negative value.
  • If any \lambda_i < -1.0, the equation oscillates between positive and negative and goes to infinity when k goes to infinity.
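
Here is a minimal sketch of the first and third cases above, using an assumed 2-by-2 matrix whose eigenvalues are 0.9 and 1: the first term decays away and the \lambda = 1 term supplies the steady state.

% A small sketch (assumed example) of reading the steady state
% from the eigenvalues
A = [0.9 0.2; 0 1.0];    % eigenvalues 0.9 and 1.0
v = [1; 1];
[X, L] = eig(A);
l = diag(L);
c = X\v;
% the |lambda| < 1 term decays; the lambda = 1 term remains
idx = abs(l - 1) < 1e-10;
steady_state = c(idx)*X(:,idx)
A^50 * v                 % already close to the steady state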

8.6.1. Difference Equations

One of the main applications where you might want to compute the power of a matrix multiplied by a vector is for a difference equation, which is the discrete form of a differential equation. Difference equations are often used in the study of control systems, which is important to nearly every engineering discipline.

We have a difference equation when the state of a system at observation k is its state at the previous observation multiplied by a matrix. The state of a system can be any set of desired conditions of the system collected into a vector, e.g., position, velocity, acceleration, or anything else that describes a property of the system at each observation.

\bm{u}_k = \mathbf{A}\,\bm{u}_{k-1}.

The state of the system at the first observation as given by its initial condition is

\bm{u}_1 = \mathbf{A}\,\bm{u}_0.

After the second observation, it is

\bm{u}_2 = \mathbf{A}\,\bm{u}_1 =
\mathbf{A\,A}\,\bm{u}_0 = \mathbf{A}^2\,\bm{u}_0.

After k observations, it is

\bm{u}_k = \mathbf{A}^k\,\bm{u}_0.

Applying equation (8.5) gives us an equation for the state of a system defined by a difference equation after k observations.

(8.6)\bm{u}_k = \mathbf{A}^k\bm{u}_0 = \lambda_1^k c_1\,\bm{x}_1 +
    \lambda_2^k c_2\,\bm{x}_2
+ \cdots + \lambda_n^k c_n\,\bm{x}_n,

where \lambda_i and \bm{x}_i are the eigenvalues and eigenvectors of \bf{A}, and the coefficients c_i are the elements of the vector \bm{c} = \mathbf{X}^{-1}\bm{u}_0.
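
As a quick check of equation (8.6), here is a minimal sketch that reuses the earlier matrix, with \bm{u}_0 = (3, 11) as an assumed initial state and an arbitrary k.

% A minimal sketch (assumed example) of equation (8.6)
A = [1 2; 5 4];
u0 = [3; 11];
k = 5;
[X, L] = eig(A);
c = X\u0;               % change of basis coefficients
l = diag(L);
uk = l(1)^k*c(1)*X(:,1) + l(2)^k*c(2)*X(:,2)
A^k * u0                % same result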

8.6.2. Application: Fibonacci Sequence

A simple application of a difference equation is the Fibonacci sequence. This is the sequence of numbers starting with 0 and 1 and continuing with each number being the sum of the previous two: {0, 1, 1, 2, 3, 5, 8, 13, 21, \ldots }, \text{Fibonacci}(k) = \text{Fibonacci}(k-1) + \text{Fibonacci}(k-2). The sequence appears often in nature and is common fodder for programming problems using recursion and dynamic programming, but it can also be described as a difference equation, from which the change of basis method gives a closed form solution known as Binet’s formula.

The state of the system here is a vector with \text{Fibonacci}(k) and \text{Fibonacci}(k - 1).

\begin{array}{rl}  \bm{F(k)} =
\vector{\text{Fibonacci}(k); \text{Fibonacci}(k-1)}
                &= \mat{1 1; 1 0}
             \vector{\text{Fibonacci}(k-1);
                \text{Fibonacci}(k-2)}  \\ \hfill \\
        &= \mat{1 1; 1 0}^k \vector{1 ; 0}
\end{array}

Let us use the change of basis strategy to compute \bm{F}(k) for any value of k.

Using the quadratic formula, one can quickly find the two eigenvalues from the determinant of the characteristic matrix, with m being half of the trace and p the determinant of the matrix.

m = \frac{1}{2}, \;\; p = -1, \;\; \lambda_{1,2} = \frac{1}{2} \pm
\sqrt{\frac{1}{4} + 1} = \frac{1 \pm \sqrt{5}}{2}

\lambda_1 = \frac{1 + \sqrt{5}}{2}, \mbox{  and  }
\lambda_2 = \frac{1 - \sqrt{5}}{2}.

The algebra to find the symbolic equations for the eigenvectors and the change of basis coefficients gets a little messy, but Chamberlain shows the derivation in a nice blog post [CHAMBERLAIN16].

We can also let MATLAB compute the eigenvalues and eigenvectors using the Symbolic Math Toolbox to get the exact equations so that they can be used in the solution.

[X, L] = eig(sym([1 1; 1 0]))

The eigenvectors are:

\bm{x}_1 = \vector{\left(\frac{1 + \sqrt{5}}{2}\right); 1}, \mbox{ and }
    \bm{x}_2 = \vector{\left(\frac{1 - \sqrt{5}}{2}\right); 1}

With elimination, we find the change of basis coefficients.

\bm{c} = \vector{\frac{1}{\sqrt{5}}; -\frac{1}{\sqrt{5}}}
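
A quick symbolic check of these coefficients, sketched with the Symbolic Math Toolbox, builds \mathbf{X} from the eigenvectors above and solves \mathbf{X}\,\bm{c} = \bm{u}_0 exactly.

% A symbolic check (sketch) of the change of basis coefficients
lambda1 = (1 + sqrt(sym(5)))/2;
lambda2 = (1 - sqrt(sym(5)))/2;
X  = [lambda1 lambda2; 1 1];
u0 = sym([1; 0]);
c  = simplify(X \ u0)   % [1/sqrt(5); -1/sqrt(5)], shown as +/- sqrt(5)/5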

Then each Fibonacci number is given by the first element of the difference equation's state vector.

\text{Fibonacci}(k) = c_1 \lambda_1^k x_{1,1} +
c_2 \lambda_2^k x_{1,2}

With this solution, \text{Fibonacci}(0) = 1, but we observe that the eigenvector elements used are the same as the eigenvalues (x_{1,1} = \lambda_1 and x_{1,2} = \lambda_2), so we are actually raising the eigenvalues to the power k + 1. Thus we can shift the terms and get a result with \text{Fibonacci}(0) = 0.

\text{Fibonacci}(k) = c_1 \lambda_1^k + c_2 \lambda_2^k
= \frac{\left(\frac{1 + \sqrt{5}}{2}\right)^k -
  \left(\frac{1 - \sqrt{5}}{2}\right)^k} {\sqrt{5}}

The Fibonacci sequence function below is closed form (constant run time). Because the calculation uses the irrational number \sqrt{5}, the round function removes the small floating-point error so that the result is the nearest integer. The commented lines show the code that yields the equations used in the function.

function fib = fibonacci(k)
% FIBONACCI(k) - kth value of Fibonacci sequence
% (Binet's formula)

%     A = [1 1; 1 0];
%     [X, L] = eig(sym(A));
%     X = [(1 + sqrt(5))/2 (1 - sqrt(5))/2; 1 1];
    L = [(1 + sqrt(5))/2 0; 0 (1 - sqrt(5))/2];

%     u0 = [1; 0];
%     c = X\u0;
    c = [1/sqrt(5); -1/sqrt(5)];

    % only need the first vector value
    fib = round(c(1)*L(1,1)^k + c(2)*L(2,2)^k);
end
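
A brief usage check (hypothetical session) confirms the shifted indexing, with \text{Fibonacci}(0) = 0.

fibonacci(10)                % expected 55
arrayfun(@fibonacci, 0:8)    % expected 0 1 1 2 3 5 8 13 21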

8.6.3. Application: Markov Matrices

A Markov matrix is a stochastic (statistical) matrix showing the probabilities of a system transitioning from one state to another in one observation period. There are many applications where things can be thought of as having one of a finite number of states. We often visualize state transition probabilities with a finite state machine (FSM) diagram. Then from an FSM, we can build a Markov matrix and make predictions about the future. Markov models are named after the Russian mathematician Andrei Andreevich Markov (1856 - 1922).

Fig. 8.4 shows a simple FSM with the odds for the weather on the next day given the current weather conditions.

../_images/weatherFSM.png

Fig. 8.4 Weather prediction finite state machine.

We interpret the FSM diagram to say that if it is sunny today, there is a 70% chance that it will also be sunny tomorrow, a 20% chance that it will be cloudy tomorrow, and a 10% chance of rain tomorrow. Similarly, we can assign probabilities for tomorrow’s weather when it is cloudy or raining today.

We put the transition probabilities into a Markov matrix. Since the values in the matrix are probabilities, each value is between zero and one. Each column holds the transition probabilities for starting in a given state, so the sum of each column must be one. Each row holds the probabilities of being in a given state at the next observation.

The columns and rows are here ordered as sunny, cloudy, and rain.

\bf{M} = \mat{0.7, 0.4, 0.3;
0.2, 0.4, 0.5;
0.1, 0.2, 0.2}

Matrix multiplication tells us the weather forecast for tomorrow if it is cloudy today.

>> M = [0.7, 0.4, 0.3; 0.2, 0.4, 0.5; 0.1, 0.2, 0.2]
M =
    0.7000    0.4000    0.3000
    0.2000    0.4000    0.5000
    0.1000    0.2000    0.2000
>> M * [0; 1; 0]
ans =
    0.4000      %  probability sunny
    0.4000      %  probability cloudy
    0.2000      %  probability rain

We need to raise \bf{M} to the kth power to predict the weather in k days.

\bm{p}_k = \mathbf{M}^k \, \bm{p}_0.

So if it is cloudy today, we can expect the following on the day after tomorrow.

>> M^2 * [0; 1; 0]
ans =
    0.5000     %  probability sunny
    0.3400     %  probability cloudy
    0.1600     %  probability rain

We will always see something interesting when we look at the eigenvalues of a Markov matrix. One eigenvalue will always be one, and the magnitudes of the others will never be greater than one (here they are strictly less than one). As we saw before, this means that the forecast for the distant future converges and becomes independent of the current state of the system.

>> [X, L] = eig(M)
X =
    0.8529    0.7961    0.1813
    0.4713   -0.5551   -0.7801
    0.2245   -0.2410    0.5988
L =
    1.0000         0         0
         0    0.3303         0
         0         0   -0.0303

The steady state of the system depends only on the eigenvector corresponding to the eigenvalue equal to one. The other terms go to zero in future observations.

>> c_sun = X\[1; 0; 0];
>> c_clouds = X\[0; 1; 0];
>> c_rain = X\[0; 0; 1];
>> steady_state = X(:,1)*c_sun(1)
steady_state =
    0.5507
    0.3043
    0.1449
>> steady_state = X(:,1)*c_clouds(1)
steady_state =
    0.5507
    0.3043
    0.1449
>> steady_state = X(:,1)*c_rain(1)
steady_state =
    0.5507
    0.3043
    0.1449
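
Since the steady state probabilities must sum to one, an equivalent quick check is to rescale the \lambda = 1 eigenvector so that its elements sum to one.

>> X(:,1) / sum(X(:,1))
ans =
    0.5507
    0.3043
    0.1449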

We can plot the forecast to see the convergence. The MarkovWeather script applies the change of basis calculation and plots the weather forecast. The markers in Fig. 8.5 show the start and end of the forecast period. The coordinates of each point in the 3-D plot are the probabilities of sun, clouds, and rain for one day. We see from the plot that the distant forecast converges to the same probabilities regardless of the weather conditions today.

../_images/MarkovWeather.png

Fig. 8.5 A 3-D plot showing the Markov matrix forecast probabilities given that today is each of sunny, cloudy, or raining.

% File: MarkovWeather.m
% Script showing change of basis calculation for
% a Markov matrix.

M = [0.7, 0.4, 0.3; 0.2, 0.4, 0.5; 0.1, 0.2, 0.2];
[X, L] = eig(M);
l = diag(L);
% coefficients from initial conditions
c_sun = X\[1; 0; 0];
c_clouds = X\[0; 1; 0];
c_rain = X\[0; 0; 1];

N = 20;
F_sun = zeros(3, N);
F_clouds = zeros(3, N);
F_rain = zeros(3, N);
% build forecast for next N days
for k = 1:N
    F_sun(:,k) = c_sun(1)*X(:,1) + c_sun(2)*l(2)^k*X(:,2) ...
                + c_sun(3)*l(3)^k*X(:,3);
    F_clouds(:,k) = c_clouds(1)*X(:,1) + c_clouds(2)*l(2)^k*X(:,2) ...
                + c_clouds(3)*l(3)^k*X(:,3);
    F_rain(:,k) = c_rain(1)*X(:,1) + c_rain(2)*l(2)^k*X(:,2) ...
                + c_rain(3)*l(3)^k*X(:,3);
end

% plot probabilities for next 20 days
figure, hold on
plot3(F_sun(1,:), F_sun(2,:), F_sun(3,:))
plot3(F_clouds(1,:), F_clouds(2,:), F_clouds(3,:))
plot3(F_rain(1,:), F_rain(2,:), F_rain(3,:))
% mark starting points
plot3(F_sun(1,1), F_sun(2,1), F_sun(3,1), 'r', 'Marker', '*')
plot3(F_clouds(1,1), F_clouds(2,1), F_clouds(3,1), 'b', 'Marker', '+')
plot3(F_rain(1,1), F_rain(2,1), F_rain(3,1), 'm', 'Marker', 'x')
% mark final point
plot3(F_sun(1,N), F_sun(2,N), F_sun(3,N), 'k', 'Marker', '*')
grid on
xlabel('sunny')
ylabel('cloudy')
zlabel('rain')
legend({'start sunny', 'start cloudy', 'start raining'})
hold off

8.6.3.1. Markov Eigenvalues

Markov matrices must have an eigenvalue of one because they have a steady state, \lim_{k\to\infty}\mathbf{M}^k\,\bm{v}. By definition, the steady state is neither zero nor infinite, so there must be at least one eigenvalue equal to one and no eigenvalue with magnitude greater than one.

Markov matrices have at least one eigenvalue of one because the columns add to one. Recall that each eigenvalue \lambda makes the characteristic matrix, \mathbf{C} = \mathbf{M} - \lambda\,\mathbf{I}, singular, which occurs here when \lambda = 1. Subtracting one from the diagonal elements makes each column sum to zero, so the rows are dependent because each row is -1 times the sum of the other rows.

>> M = [0.7, 0.4, 0.3; 0.2, 0.4, 0.5; 0.1, 0.2, 0.2];
>> rank(M)
ans =
     3
>> d = M - eye(3)
d =
   -0.3000    0.4000    0.3000
    0.2000   -0.6000    0.5000
    0.1000    0.2000   -0.8000
>> rank(d)
ans =
     2
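
Because \mathbf{d} = \mathbf{M} - \mathbf{I} is singular with rank 2, its null space is one dimensional, and that direction is the \lambda = 1 eigenvector of \bf{M}. Rescaled to sum to one, it is the steady state, as this quick sketch shows.

% the null space direction, rescaled to sum to one, is the steady state
n = null(M - eye(3));
n / sum(n)          % matches the steady state found above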

From Change of Basis and Difference Equations, we know that if any eigenvalue is > 1 then \mathbf{M}^k\bm{p}_0 \to\infty when k\to\infty, so it is not possible for a stochastic matrix to have an eigenvalue greater than one.
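
A quick numerical confirmation, sketched here: the spectral radius (largest eigenvalue magnitude) of the weather matrix is one.

M = [0.7, 0.4, 0.3; 0.2, 0.4, 0.5; 0.1, 0.2, 0.2];
max(abs(eig(M)))    % 1, to within round-off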