11.1. Optimization¶
11.1.1. Numerical Roots of a Function¶
See also
If analytic roots are needed, see Solve.
To find the numeric roots of a polynomial, see Roots of a Polynomial by Eigenvalues.
The numerical methods discussed here find a root of any function within a
specified range, even if the function is non-linear. If the function (equation)
is a polynomial, see the roots()
function described in Roots of a Polynomial by Eigenvalues.
The roots of a function are the values of $x$ at which the function equals zero.
Three classic numerical algorithms find where a function evaluates to zero. The first, Newton's method, works best when we have an equation for the function and can take its derivative. The other two algorithms require only that we can evaluate or estimate the value of the function over a range of points.
11.1.1.1. Newton-Raphson Method¶
The Newton-Raphson method, or just Newton’s method, uses lines that are tangent to the curve of a function to estimate where the function crosses the x-axis.
The algorithm evaluates the function at a sequence of estimates until the root is found.
Evaluate the function and its derivative at the first estimate, $f(x_0)$ and $f'(x_0)$.
The equation for a tangent line passing through the x-axis is
$f'(x_0) = \dfrac{f(x_0) - 0}{x_0 - x_1}$.
The estimate for the root is then
$x_1 = x_0 - \dfrac{f(x_0)}{f'(x_0)}$.
If $x_1$ is not close enough to the root, let $x_0 = x_1$ and repeat the steps.
For example, to find $\sqrt{K}$, let $f(x) = x^2 - K$; then $f'(x) = 2x$ and the Newton update simplifies to $x_1 = (x_0 + K/x_0)/2$.
Here is a short function that uses the Newton-Raphson method to find the square root of a number.
function x = newtSqrt(K)
% NEWTSQRT - Square root using the Newton-Raphson method
%    x = newtSqrt(K) returns the square root of K
%    when K is a real number, K > 0.
    if ~isreal(K) || K < 0
        error('K must be a real, positive number.')
    end
    tol = 1e-7;   % reduce tolerance for more accuracy
    x = K;        % starting point
    while abs(x^2 - K) > tol
        x = (x + K/x)/2;   % Newton update for f(x) = x^2 - K
    end
>> newtSqrt(64)
ans =
8.0000
>> newtSqrt(0.5)
ans =
0.7071
In the case of a polynomial function such as the $x^2 - K$ used for the square root
function, one can also use the roots
function. For example, the square root
of 64 could be found as follows.
>> roots([1 0 -64])
ans =
8.0000
-8.0000
We can also write a general function to implement the Newton-Raphson method. We will use function handles to evaluate the function and its derivative.
function x = newtRoot(f, df, start)
% NEWTROOT - A root of function f using the Newton-Raphson method
%    x = newtRoot(f, df, start) returns x, where f(x) = 0.
%    f - function, df - derivative of function,
%    start - initial x value to evaluate.
%    Tip: First plot f(x) with fplot and pick a start value where
%    the slope of f(start) points towards f(x) = 0.
    tol = 1e-7;   % reduce tolerance for more accuracy
    x = start;    % starting point
    while abs(f(x)) > tol
        d = df(x);
        d(d==0) = eps;   % prevent divide by 0
        x = x - f(x)/d;
    end
>> newtRoot((@(x) cos(x) + sin(x) - 0.5), (@(x) cos(x) -sin(x)), 1.5)
ans =
1.9948
>> cos(ans) + sin(ans) - 0.5
ans =
-1.4411e-13
11.1.1.2. Bisection Method¶
The bisection algorithm converges slowly to the root of a function, but it has the advantage that it will always find a root. The bisection algorithm requires two values, $a$ and $b$, such that the function crosses the x-axis between the two points. A new point, $c = (a + b)/2$, is found halfway between the two points.
If $f(c)$ is not within the allowed tolerance of the root, then one of the end points is changed to $c$ such that we have a new, smaller range spanning the x-axis crossing.
function x = bisectRoot(f, a, b)
% BISECTROOT find root of function f(x), where a <= x <= b.
%    x = bisectRoot(f, a, b)
%    f - a function handle
%    sign(f(a)) not equal to sign(f(b))
    tol = 1e-7;   % reduce tolerance for more accuracy
    if sign(f(a)) == sign(f(b))
        error('sign of f(a) and f(b) must be different.')
    end
    if f(a) > 0   % swap end points so f(a) < 0, f(b) > 0
        c = a;
        a = b;
        b = c;
    end
    c = (a + b)/2;   % midpoint
    fc = f(c);
    while abs(fc) > tol
        if fc < 0    % root lies between c and b
            a = c;
        else         % root lies between a and c
            b = c;
        end
        c = (a + b)/2;
        fc = f(c);
    end
    x = c;
>> f = @(x) x^2 - 64;
>> bisectRoot(f, 5, 10)
ans =
8.0000
>> bisectRoot((@(x) cos(x) - 0.5), 0, pi/2)
ans =
1.0472
11.1.1.3. Secant Method¶
The secant algorithm is similar to the bisection algorithm in that we consider a range of values that gets smaller with each iteration. The secant method finds its next point by drawing a line between the end points and finding where that line crosses the x-axis.
function x = secantRoot(f, a, b)
% SECANTROOT find root of function f(x), where a <= x <= b.
%    x = secantRoot(f, a, b)
%    f - a function handle
%    sign(f(a)) not equal to sign(f(b))
    tol = 1e-7;   % reduce tolerance for more accuracy
    if sign(f(a)) == sign(f(b))
        error('sign of f(a) and f(b) must be different.')
    end
    if a > b      % swap end points so that a < b
        c = a;
        a = b;
        b = c;
    end
    sign_a = sign(f(a));
    % c is where the secant line from (a, f(a)) to (b, f(b)) crosses the x-axis
    c = a + abs(f(a)*(b - a)/(f(b) - f(a)));
    fc = f(c);
    while abs(fc) > tol
        if sign(fc) == sign_a
            a = c;
        else
            b = c;
        end
        c = a + abs(f(a)*(b - a)/(f(b) - f(a)));
        fc = f(c);
    end
    x = c;
>> f = @(x) x^2 - 64;
>> secantRoot(f, 5, 10)
ans =
8.0000
>> secantRoot((@(x) cos(x) - 0.3), 0, pi/2)
ans =
1.2661
>> secantRoot((@(x) sin(x) - 0.3), 0, pi/2)
ans =
0.3047
11.1.1.4. Fzero¶
MATLAB has a function called fzero
that uses numerical methods to find a root
of a function. Its algorithm combines the bisection
and secant methods with inverse quadratic interpolation.
The arguments passed to fzero
are a function handle and a two-element row vector giving
the range of values to consider. The function should be positive at one of the
vector values and negative at the other. It may help to plot the function
to find good values to use.
The code below uses the function fplot
to display a plot of the values of a
function over a range.
>> f = @(x) cos(x) - x;
>> fplot(f, [0 pi/2])
>> grid on
>> fzero(f, [0.5 1])
ans =
0.7391
>> f(ans)
ans =
1.1102e-16
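Note that fzero also accepts a scalar starting point instead of a bracketing interval, in which case it searches near that value for a sign change. A minimal sketch, reusing the f defined above:
>> fzero(f, 1)   % converges to the same root, approximately 0.7391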
See also
MATLAB documentation for fzero and fplot.
11.1.2. Finding a Minimum¶
Finding a minimum value of a function can be a harder problem than finding a function root (where the function is zero). Ideally, we would like to use analytic techniques to find the optimal (minimal) values. We learn from calculus that a minimum or maximum of a function occurs where the first derivative is zero. Indeed, derivatives are always an important part of finding where a minimum occurs. But when the function is non-linear and involves several variables, finding an analytic solution is often not possible, so instead we make use of numerical search algorithms.
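To make the connection to derivatives concrete, here is a sketch that finds a minimum by locating a zero of the derivative with the newtRoot function from the previous section. The function and starting value are chosen for illustration; the same function appears again in the Fminbnd example below.
f   = @(x) x.^2 - sin(x) - 2*x;   % function to minimize
df  = @(x) 2*x - cos(x) - 2;      % first derivative; zero at the minimum
ddf = @(x) 2 + sin(x);            % second derivative, needed by Newton's method
xmin = newtRoot(df, ddf, 1)       % approximately 1.1871, agreeing with fminbnd below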
There are several numerical algorithms for finding where a minimum occurs. Our coverage here is restricted to algorithms operating on restricted convex regions. By convex, we mean shaped like a bowl: if we pour water over a convex surface, the minimum point is where the water stops flowing and begins to fill the bowl-like surface. Some functions are strictly convex, while others have local minimum points that are not the global minimum. In the latter case, we need to restrict the domain of the search. MATLAB includes a minimum search algorithm, which is discussed below. MathWorks also sells an Optimization Toolbox with enhanced algorithms that are not discussed here. There are also third-party and open source resources with enhanced capabilities; one such resource is discussed below.
The mathematics behind how these algorithms work is a little complicated and
beyond the scope of this text. Thus the focus here is restricted to what the
algorithms are able to do and how to use them. As with algorithms that find a
root of a function, the minimum search algorithms call the passed function many
times as they search for the minimum result. MATLAB has two minimum search
functions: the first, fminbnd, works with functions of one variable,
while the second, fminsearch, handles functions of multiple variables.
11.1.2.1. Fminbnd¶
The MATLAB function x = fminbnd(f, x1, x2)
returns the value of $x$ that minimizes the
function $f(x)$ in the range $x_1 < x < x_2$. The algorithm is
based on golden section search and parabolic interpolation and is described in
a book co-authored by MathWorks' chief mathematician and cofounder, Cleve Moler
[FORSYTHE76].
>> f = @(x) x.^2 - sin(x) - 2*x;
>> fplot(f, [0 pi])
>> grid on
>> fminbnd(f, 1, 1.5)
ans =
1.1871
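When the minimum function value is needed along with its location, fminbnd can return it as a second output:
>> [xmin, fmin] = fminbnd(f, 1, 1.5);   % fmin = f(xmin), approximately -1.8923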
11.1.2.2. Fminsearch¶
The MATLAB function fminsearch
uses the Nelder-Mead simplex search method to
find the minimum value of a function of one or more variables. Unlike the other
searching functions, fminsearch takes a starting point for the search
rather than a search range. The starting point may be a scalar or an array
(vector). If it is an array, then the function must use array indexing to
access its elements. Best results are achieved when the supplied
starting point is close to the minimum location.
Need a maximum instead of a minimum?
Just use the negative of the function and find the minimum.
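For example, here is a quick sketch that finds the maximum of sin(x) on the interval [0, pi] by minimizing its negative with fminbnd from the previous section:
>> fminbnd(@(x) -sin(x), 0, pi)
ans =
    1.5708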
The following evaluates a function to find the location of its minimum. The overall shape of the surface is a bowl, but with local hills and valleys from a cosine term. The surface is plotted in Fig. 11.6.
% fminsearch example
f = @(x) x(1).^2 + x(2).^2 + 10*cos(x(1).*x(2));
m = fminsearch(f, [1 1]);
disp(['min at: ', num2str(m)]);
disp(['min value: ', num2str(f(m))]);
% plot it now
g = @(x, y) x.^2 + y.^2 + 10*cos(x.*y);
[X, Y] = meshgrid(linspace(-5, 5, 50), linspace(-5, 5, 50));
Z = g(X, Y);
surf(X, Y, Z)
The output shows the location and value of the minimum.
min at: 1.7147 1.7147
min value: -3.9175
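If tighter convergence is needed, fminsearch accepts an options structure created with optimset as an optional third argument. The tolerance values in this sketch are arbitrary choices:
>> opts = optimset('TolX', 1e-8, 'TolFun', 1e-8);
>> m = fminsearch(f, [1 1], opts);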
A common application of fminsearch
is to define a function handle that
computes an error function that we wish to minimize. An example of that
follows. We can use matrix multiplication
to calculate the final position of a serial-link robotic arm. We can then use
fminsearch
to calculate the inverse kinematics of a robot, which finds the
joint angles needed to position the robot end effector at a specified location.
For robots with only a few joints, it may be possible to find an analytic
inverse kinematic solution, but more complex robots often require numerical
methods to find the desired joint angles. Multiple joint angle sets can
position the robot arm at the same location. For example, we can consider
either elbow-up or elbow-down configurations. Thus, the starting point
value is critical for finding the desired joint angles.
Here is a simple example for a robot with two joints and two rigid arm segments (a and b). The function to be minimized computes the length of the error vector between the forward kinematic calculation and the desired location of the end effector. The forward kinematics function uses functions from Peter Corke's Spatial Math Toolbox to produce coordinate rotation and translation matrices as used in Coordinate Transformations in 2-D.
% File: fminsearch2.m
% Script using fminsearch for inverse kinematic
% calculation of a 2 joint robot arm.
E = [5.4 4.3]';                       % desired end effector position
errFun = @(Q) norm(E - fwdKin(Q));    % length of the position error vector
Qmin = fminsearch(errFun, [0 0]);
disp(['Q = ', num2str(Qmin)]);
disp(['Position = ', num2str(fwdKin(Qmin)')]);

function P = fwdKin(Q)
    a = 5;    % length of first arm segment
    b = 2;    % length of second arm segment
    PB = trot2(Q(1)) * transl2(a, 0) * trot2(Q(2)) * transl2(b, 0);
    P = PB(1:2,3);    % (x, y) position from the homogeneous transform
end
The output of the script is the joint angles for the robot and forward kinematic calculation of the end effector position with those joint angles.
>> fminsearch2
Q = 0.56764 0.3695
Position = 5.4 4.3
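Because multiple joint angle sets reach the same position, a different starting guess may converge to the other configuration. As a sketch (the starting point [1 -1] is a hypothetical choice), one might try:
>> Qmin2 = fminsearch(errFun, [1 -1])   % may converge to the elbow-down pose (q2 < 0)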
11.1.2.3. CVX for Disciplined Convex Programming¶
CVX is a MATLAB-based modeling system for convex
optimization. CVX allows constraints and objectives to be specified using
standard MATLAB syntax. This allows for more flexibility and control in the
specification of the optimization problem. Commands to specify and constrain
the problem are placed between cvx_begin
and cvx_end
statements. CVX
defines a set of commands and functions that can be used to describe the
optimization problem. Most applications of CVX will include a call to
either the minimize
or maximize
function [CVX20].
An example can best illustrate how to use CVX. Recall from the Under-determined Systems section that the preferred solution to an under-determined system of equations is one that minimizes $\|x\|$. The least squares solution found by the QR algorithm, as discussed in Left-Divide Operator, minimizes the $l_2$-norm of $x$. But it is known that minimizing the $l_1$-norm instead tends to yield a solution that is sparse (has more zeros). We turn to CVX to replace the inherent $l_2$-norm that we get from our projection equations with an $l_1$-norm.
>> A
A =
2 4 20 3 16
5 17 2 20 17
19 11 9 1 18
>> b
b =
5
20
13
>> cvx_begin
>> variable x(5);
>> minimize( norm(x, 1) );
>> subject to
>> A*x == b;
>> cvx_end
>> x
x =
-0.0000
1.1384
0.0000
0.0103
0.0260
>> A*x
ans =
5.0000
20.0000
13.0000
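For comparison, the minimum $l_2$-norm solution can be computed with pinv. It satisfies the same constraints, but is generally dense, with no zero entries:
>> x2 = pinv(A)*b;   % minimum l2-norm solution; typically all entries non-zero
>> A*x2              % also reproduces b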
Sparse Matrices
Very large but sparse matrices are common in some application domains. MATLAB
supports a sparse storage class to efficiently hold large matrices that
have mostly zeros. A sparse matrix contains the indices and values of only
the non-zero elements. Most functions in MATLAB work directly with sparse matrices. The
sparse
command converts a standard matrix to a sparse matrix. The
full
command restores a sparse matrix to a standard matrix [MOLER04].
S = sparse(A);
...
A = full(S);
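To see the storage savings, here is a minimal sketch comparing a sparse identity matrix with its dense equivalent; the size of 5,000 is an arbitrary choice:
n = 5000;
S = speye(n);   % sparse identity: stores only the n non-zero entries
F = full(S);    % dense equivalent: stores all n^2 doubles (about 200 MB)
whos S F        % compare the Bytes column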