7. Least Squares Regression
See also
Here is some additional reading material on Least Squares Regression.
It is one section from Gilbert Strang’s Linear Algebra textbook.
ila0403.pdf
In science and engineering, we often conduct experiments where we measure an output of a system with respect to a control variable. The output and control variables might be time, voltage, weight, length, volume, temperature, pressure, or any number of parameters relative to the system under investigation. We might expect that the output and control variable are related by a polynomial equation, but we likely do not know the coefficients of the polynomial. We might not even know if the relationship of the measured data is linear, quadratic, or a higher order polynomial. We are conducting the experiment to learn about the system. Unfortunately, in addition to measuring the relationship between the output and control variables, we will also measure some noise caused by volatility of the system and measurement errors.
If $t$ is the control variable, the measured output data can be modeled as

$$b(t) = f(t) + n(t)$$

where

- $b(t)$ is the output data.
- $f(t)$ is the underlying relationship that we wish to find.
- $n(t)$ is the measurement noise, which may be correlated to $t$, or it may be independent of $t$.

The accuracy of our findings will increase with more sample points. After conducting the experiment, we want to fit the data to a polynomial that minimizes the square of the error ($\|e\|^2$).
Note
Least squares regression results can be very good if the noise is moderate and the data does not have outliers that will skew the results. After initial regression fitting, a further analysis of the noise might lead to removal of outlier data so that a better fit can be achieved.
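As a minimal sketch of such an outlier check, the residuals of an initial fit can be screened against a threshold and the fit repeated; this assumes the design matrix A, data vector b, and fitted coefficients u_hat defined in the linear regression example below, and the two standard deviation threshold is only an illustrative choice.
% Sketch: drop samples whose residual magnitude exceeds two standard
% deviations, then refit (the 2*std threshold is an assumed choice).
e = b - A*u_hat;                 % residuals from the initial fit
keep = abs(e) < 2*std(e);        % logical index of non-outlier samples
u_refit = (A(keep,:)'*A(keep,:)) \ (A(keep,:)'*b(keep));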
7.1. Linear Regression

In linear regression we model the data as following a line, so we represent the relationship between the measured data points, $b_i$, and the control variable, $t_i$, with line equations.

$$b_i = C + D\,t_i, \qquad i = 1, \ldots, m$$

The coefficients $C$ and $D$ are the unknowns that we need to find.

As a matrix equation, the system is $A\,\hat{u} = b$, where

$$A = \begin{bmatrix} 1 & t_1 \\ 1 & t_2 \\ \vdots & \vdots \\ 1 & t_m \end{bmatrix}$$

defines the control variables. This matrix is often called a design matrix in regression terminology.

Vector $\hat{u} = \begin{bmatrix} C & D \end{bmatrix}^T$ represents the unknown polynomial coefficients.
The equations for least squares regression calculations are given in three mathematical disciplines — Linear Algebra, Calculus, and Statistics. The computations for the three are actually the same, but the equations look different depending on one’s vantage point.
7.1.1. Linear Algebra Based Linear Regression

Because the system of equations is over-determined and the measured output data, $b$, is not likely to lie in the column space of $A$, either the projection equations or QR decomposition may be used to approximate $\hat{u}$ (Projections Onto a Hyperplane).

$$A^T A\,\hat{u} = A^T b \qquad \Longrightarrow \qquad \hat{u} = \left(A^T A\right)^{-1} A^T b$$

The projection $p = A\,\hat{u}$ follows directly.
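A minimal MATLAB sketch of the two approaches, assuming a design matrix A and measured data vector b like those built in the example script below (the variable names are illustrative):
% Sketch: two ways to compute the least squares solution.
u_normal = (A'*A) \ (A'*b);   % solve the normal equations A'*A*u_hat = A'*b
u_qr = A \ b;                 % backslash solves over-determined systems with QR
p = A*u_normal;               % projection of b onto the column space of A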
7.1.2. Calculus Based Linear Regression

The error between the projection, $p = A\,\hat{u}$, and the target vector, $b$, is $e = b - A\,\hat{u}$. To account for both positive and negative errors, we wish to minimize the squares of the error vector, $\|e\|^2$.

Note that the squared length of a vector is its dot product with itself.

$$\|e\|^2 = e^T e = (b - A\,\hat{u})^T (b - A\,\hat{u}) = b^T b - 2\,\hat{u}^T A^T b + \hat{u}^T A^T A\,\hat{u}$$

The minimum occurs where the derivative is zero. The derivative is with respect to the vector $\hat{u}$, which is the gradient of $\|e\|^2$. Appendix C of [YANG05] and the vector calculus chapter of [DEISENROTH20] list the derivative results that we need.

$$\frac{\partial\,\|e\|^2}{\partial\,\hat{u}} = -2\,A^T b + 2\,A^T A\,\hat{u} = 0 \qquad \Longrightarrow \qquad A^T A\,\hat{u} = A^T b$$

Calculus gives us the same equation that we get from linear algebra.
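As a quick numerical sanity check, here is a sketch that assumes the A, b, and u_hat variables from the example script below; the gradient evaluated at the least squares solution should be zero to within round-off.
% Sketch: the gradient of ||b - A*u||^2 is 2*A'*(A*u - b);
% it should be essentially zero at the least squares solution.
grad = 2*A'*(A*u_hat - b);
disp(norm(grad))              % near zero, limited only by round-off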
7.1.3. Statistical Linear Regression

We start by calculating some statistical parameters from both the results, $b$, and the control variable, $t$. The equations for calculating the mean and standard deviation are found in Common Statistical Functions.

- Mean of $b$ is $\bar{b}$.
- Mean of $t$ is $\bar{t}$.
- Standard deviation of $b$ is $\sigma_b$.
- Standard deviation of $t$ is $\sigma_t$.
- The correlation between $t$ and $b$ is $r$.
We turn to statistics textbooks for the linear regression coefficients ([HOGG78], [MOORE18]).

$$D = r\,\frac{\sigma_b}{\sigma_t}, \qquad C = \bar{b} - D\,\bar{t}$$

We write the matrix and vector terms as sums of the variables $t$ and $b$ to show that the equations for linear regression from the study of statistics and linear algebra are the same.

$$A^T A\,\hat{u} = A^T b \qquad \Longrightarrow \qquad \begin{bmatrix} m & \sum t_i \\ \sum t_i & \sum t_i^2 \end{bmatrix} \begin{bmatrix} C \\ D \end{bmatrix} = \begin{bmatrix} \sum b_i \\ \sum t_i\,b_i \end{bmatrix}$$

Using elimination to solve, we get:

$$D = \frac{m \sum t_i\,b_i - \sum t_i \sum b_i}{m \sum t_i^2 - \left(\sum t_i\right)^2}$$

We get a match for the coefficient $D$ by identifying statistical equations. The numerator is $m\,(m-1)\,r\,\sigma_t\,\sigma_b$ and the denominator is $m\,(m-1)\,\sigma_t^2$, so $D = r\,\sigma_b/\sigma_t$.

The coefficient $C$ comes from back substitution of the elimination matrix.

$$m\,C + D \sum t_i = \sum b_i \qquad \Longrightarrow \qquad C = \bar{b} - D\,\bar{t}$$
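The agreement between the two formulations is easy to confirm numerically. Here is a minimal sketch, assuming t and b hold the control variable and measured data from the example script below; corrcoef, mean, and std are standard MATLAB functions.
% Sketch: compute the line coefficients from statistics and from the
% normal equations; the two results should agree.
tc = t(:); bc = b(:);            % force column vectors
R = corrcoef(tc, bc);            % 2-by-2 correlation matrix
r = R(1,2);                      % correlation between t and b
D = r*std(bc)/std(tc);           % slope from the statistical formula
C = mean(bc) - D*mean(tc);       % intercept by back substitution
A = [ones(numel(tc),1), tc];     % design matrix
u_hat = (A'*A) \ (A'*bc);        % should match [C; D]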
7.1.4. Linear Regression Example
Here is a simple linear regression example. The exact output is different each time the script is run because of the random noise. A plot of the output is shown in Fig. 7.1.
% File: linRegress.m
% Least Squares linear regression example showing the projection of the
% test data onto the design matrix A.
%
% Using control variable t, an experiment is conducted.
% Model the data as: b(t) = C + Dt.
f = @(t) 2 + 4.*t;
% A*u_hat = b, where u_hat = [ C D ]'
% Define the design matrix A.
m = 20; n = 2;
t = linspace(-2, 3, m);
A = ones(m, n);
A(:,2) = t';
b = f(t)' + 3*randn(m, 1);
u_hat = (A'*A)\(A'*b); % least squares line
p = A*u_hat; % projection vector
e = b - p; % error vector
C = u_hat(1);
D = u_hat(2);
fprintf('Equation estimate = %.2f + %.2f*t\n', C, D);
%% Plot the linear regression
figure, plot(t, b, 'o');
hold on
plot(t, p, 'Color', 'k');
hold off
grid on, xlabel('t'), ylabel('b')
title('Linear Regression')
7.2. Quadratic and Higher Order Regression

Application of least squares regression to quadratic polynomials only requires adding an additional column to the design matrix for the square of the control variable [STRANG07]. The model becomes

$$b = C + D\,t + E\,t^2, \qquad A = \begin{bmatrix} 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \\ \vdots & \vdots & \vdots \\ 1 & t_m & t_m^2 \end{bmatrix}, \qquad \hat{u} = \begin{bmatrix} C \\ D \\ E \end{bmatrix}$$

Higher order polynomials would likewise require additional columns in the matrix.
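As an illustration of the pattern, here is a short sketch (not from the text) that builds the design matrix for an arbitrary polynomial degree; the degree d is an illustrative choice and the vectors t and b are assumed to already exist.
% Sketch: design matrix for a degree-d polynomial fit.
d = 3;                           % polynomial degree (illustrative choice)
A = ones(numel(t), d+1);         % first column is the constant term
for k = 1:d
    A(:,k+1) = t(:).^k;          % columns of increasing powers of t
end
u_hat = (A'*A) \ (A'*b(:));      % coefficients [C; D; E; ...]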
Here is a quadratic regression example. The plot of the data is shown in Fig. 7.2.
% File: quadRegress.m
%% Quadratic Least Squares Regression Example
%
% Generate data per a quadratic equation, add noise to it, use
% least squares regression to find the equation.
%% Generate data
f = @(t) 1 - 4*t + t.^2; % Quadratic function as a handle
Max = 10;
N = 40; % number of samples
std_dev = 5; % standard deviation of the noise
t = linspace(0, Max, N);
y = f(t) + std_dev*randn(1,N); % randn -> mean = 0, std_deviation = 1
%% Least Squares Fit to Quadratic Equation
% model data as: b = C + Dt + Et^2
% Projection is A*u_hat, where u_hat = [C D E]'
% Determine A matrix from t.
A = ones(N,3); % pre-allocate A - Nx3 matrix
% First column of A is already all ones for offset term.
A(:,2) = t'; % linear term, second column
A(:,3) = t'.^2; % Quadratic term, third column
% b vector comes from sampled data
b = y';
u_hat = (A'*A)\(A'*b); % least squares equation
p = A*u_hat; % projection vector
% e = b - p; % error vector
C = u_hat(1); D = u_hat(2); E = u_hat(3);
fprintf('Equation estimate = %.2f + %.2f*t + %.2f*t^2\n', C, D, E);
%% Plot the results
figure, plot(t, y, 'ob');
hold on
plot(t, p, 'k')
plot(t, f(t), 'm')
labels = {'Sampled Data', 'Regression', 'No Noise Line'};
legend(labels, 'Location', 'northwest');
hold off
title('Quadratic Regression');
xlabel('Control Variable');
ylabel('Output');
7.3. polyfit function¶
An erudite engineer should understand the principles of how to do
least squares regression for fitting a polynomial equation to a set of data.
However, as might be expected with MATLAB, it has a function that
does the real work for us. The polyfit
function does the regression
calculation to find the coefficients. Then the polyval
function can
be used to apply the coefficients to create the –axis data points.
%% MATLAB's polyfit function
coef = polyfit(t, y, 2);                % coefficients in descending powers: [E D C]
disp(['Polyfit Coefficients: ', num2str(coef)])
pvals = polyval(coef, t);               % evaluate the fitted polynomial at t
7.4. Goodness of a Fit

Statistics gives us a metric called the coefficient of determination, $R^2$, to measure how closely a regression fit models the data. Let $y_i$ be the data values, $\hat{y}_i$ be the regression fit values, and $\bar{y}$ be the mean of the data values.

$$R^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}$$

The range of $R^2$ is from 0 to 1. If the regression fit does not align well with the data, the $R^2$ value will be close to 0. If the model is an exact fit to the data, then $R^2$ will be 1. In reality, it will be somewhere in between, and hopefully closer to 1 than 0.
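A minimal sketch of the calculation, assuming y holds the sampled data and p holds the regression fit values as in the quadratic example above:
% Sketch: coefficient of determination for a regression fit.
SS_res = sum((y(:) - p(:)).^2);     % residual sum of squares
SS_tot = sum((y(:) - mean(y)).^2);  % total sum of squares
R_squared = 1 - SS_res/SS_tot;
fprintf('R^2 = %.4f\n', R_squared)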
7.5. Generalized Least Squares Regression

The key to least squares regression success is to correctly model the data with an appropriate set of basis functions. When fitting the data to a polynomial, we use progressive powers of $t$ as the basis functions.

The regression algorithm then uses a simple linear algebra vector projection algorithm to compute the coefficients ($\alpha$) to fit the data to the polynomial.

The basis functions need not be polynomials, however. They can be any function that offers a reasonable model for the data. Consider a set of basis functions consisting of a constant offset and oscillating trigonometry functions.

$$y(x) = \alpha_1 + \alpha_2 \sin(x) + \alpha_3 \cos(x)$$

The design matrix for the regression is then,

$$A = \begin{bmatrix} 1 & \sin(x_1) & \cos(x_1) \\ 1 & \sin(x_2) & \cos(x_2) \\ \vdots & \vdots & \vdots \\ 1 & \sin(x_m) & \cos(x_m) \end{bmatrix}$$
Here is a function that implements generalized least squares regression. The basis functions are given as an input in the form of a cell array containing function handles.
function [ alpha, varargout ] = genregression( f, x, y )
% GENREGRESSION Regression fit of data to specified functions
% Inputs:
% f - cell array of basis function handles
% x - x values of data
% y - y values of data
% Output:
% alpha - Coefficients of the regression fit
% yHat - Curve fit data (optional)
% A will be m-by-n, b will be m-by-1
% alpha will be n-by-1
if length(x) ~= length(y)
    error('Input vectors not the same size');
end
m = length(x);
n = length(f);
A = ones(m,n);
if size(y, 2) == 1 % a column vector
    b = y;
    cv = true;
else
    b = y';
    cv = false;
end
if size(x, 2) == 1 % column vector
    c = x;
else
    c = x';
end
for k = 1:n
    A(:,k) = f{k}(c);
end
alpha = (A'*A) \ (A'*b);
if nargout > 1
    yHat = A * alpha;
    if ~cv
        yHat = yHat';
    end
    varargout{1} = yHat;
end
end
Here is a script to test the genregression function. A plot of the output is shown in Fig. 7.3.
% File: trigRegress.m
f = {@(x) 1, @sin, @cos };
x = linspace(-pi, pi, 1000)';
y = 0.5 + 2*sin(x) - 3*cos(x) + randn(size(x));
[alpha, curve] = genregression(f, x, y);
disp(alpha)
figure, plot(x, y, 'b.')
hold on
plot(x, curve,'r', 'LineWidth', 2);
xlabel('x')
ylabel('y')
title('Regression Fit of Trig Functions')
legend('data', 'regression', 'Location', 'north')
hold off
axis tight
A fun application of generalized least squares regression is finding the Fourier series coefficients of a periodic waveform. It turns out that the regression calculation can also find Fourier series coefficients when the basis functions are sine waves of the appropriate frequency and phase. The fourierRegress script demonstrates this for a square wave; for an ideal unit square wave, the odd-harmonic coefficients are $4/(k\pi)$ for $k = 1, 3, 5, 7$, so the regression output should be close to those values. The plot of the data is shown in Fig. 7.4.
% File: fourierRegress.m
% Use least squares regression to find Fourier series coefficients.
f = {@(t)1,@(t) sin(t),@(t) sin(3*t),@(t) sin(5*t),@(t) sin(7*t)};
x = linspace(0, 4*pi, 1000)';
y = sign(sin(x));
%y = y + 0.1*randn(length(x),1);
[alpha, curve] = genregression(f, x, y);
disp(alpha)
figure, plot(x, y, 'b.')
hold on
plot(x, curve,'r', 'LineWidth', 2);
xlabel('x')
ylabel('y')
title('Regression Fit of Square Wave')
legend('data', 'regression') %, 'Location', 'north')
hold off
axis tight
7.6. Fitting Exponential Data

If the data is exponential, use the logarithm function to convert the data to linear form before doing the regression fit.
- The data takes the form $y = a\,e^{k\,x}$.
- Taking the logarithm of the data converts it to linear data, $\ln(y) = \ln(a) + k\,x$.
- Use linear regression to find $\ln(a)$ and $k$.
- Then $\hat{y} = e^{\ln(a) + k\,x} = a\,e^{k\,x}$.
Here is an example of using a logarithm to facilitate least squares regression of exponential data. The plot of the output is shown in Fig. 7.5.
% File: expRegress.m
x = linspace(0,5);
y = 25*exp(-0.5*x) + randn(1, 100);    % exponential data with additive noise
Y = log(y);                            % convert to linear form: log(y) = log(a) + k*x
A = ones(100, 2); A(:,2) = x';         % design matrix for the linear fit
y_hat = exp(A*(A\Y'));                 % fit the log data, then undo the log
figure, hold on
plot(x,y, 'o')
plot(x, y_hat', 'r')
hold off
legend('data', 'regression')
title('Regression Fit of Exponential Data')
xlabel('x'), ylabel('y')
Note
Now complete the Least Squares Regression Homework.