11.1. Optimization

11.1.1. Numerical Roots of a Function

See also

If analytic roots are needed, see Solve.

To find the numeric roots of a polynomial, see Roots of a Polynomial by Eigenvalues.

The numerical methods discussed here find a root of any function, even a non-linear one, within a specified range. If the function is a polynomial, see the roots() function described in Roots of a Polynomial by Eigenvalues. A root of a function is a point where the function evaluates to zero.

Three classic numerical algorithms find where a function evaluates to zero. The first, Newton’s method, works best when we have an equation for the function and can take its derivative. The other two algorithms require only that we can find or estimate the value of the function over a range of points.

11.1.1.1. Newton-Raphson Method

The Newton-Raphson method, or just Newton’s method, uses lines that are tangent to the curve of a function to estimate where the function crosses the x-axis.

../_images/newton.png

Fig. 11.1 The Newton-Raphson method extends a tangent line to the x-axis to estimate a root value and repeats the process until the root is found within an allowed tolerance.

The algorithm evaluates the function at a sequence of estimates until the root is found.

  1. Evaluate the function and its derivative at the first estimate, f(x_a) and f'(x_a).

  2. The tangent line at x_a crosses the x-axis at the next estimate, x_b, so

    0 = f(x_a) + f'(x_a)(x_b - x_a).

  3. The estimate for the root is then

    x_b = x_a - \frac{f(x_a)}{f'(x_a)}.

  4. If x_b is not close enough to the root, let x_a = x_b and repeat the steps.

The above graph shows a function for finding \sqrt{K}. Let f(x) = x^2 - K, then f'(x) = 2\,x.

\begin{array}{rl}
  x_b &= x_a - \frac{(x_a^2 - K)}{2\,x_a} \\
      &= \frac{x_a + \frac{K}{x_a}}{2}
\end{array}

Here is a short function that uses the Newton-Raphson method to find the square root of a number.

function x = newtSqrt(K)
% NEWTSQRT - Square root using the Newton-Raphson method
%   x = newtSqrt(K) returns the square root of K
%                   when K is a real number, K >= 0.

    if ~isreal(K) || K < 0
        error('K must be a real, non-negative number.')
    end
    tol = 1e-7; % reduce tolerance for more accuracy
    x = K;      % starting point
    while abs(x^2 - K) > tol
        x = (x + K/x)/2;
    end

>> newtSqrt(64)
ans =
    8.0000
>> newtSqrt(0.5)
ans =
    0.7071

In the case of a polynomial function, such as the one used for the square root function, one can also use the roots function. For example, the square root of 64 can be found as follows.

>> roots([1 0 -64])
ans =
    8.0000
   -8.0000

We can also write a general function to implement the Newton-Raphson method. We will use function handles to evaluate the function and its derivative.

function x = newtRoot(f, df, start)
% NEWTROOT - A root of function f using the Newton-Raphson method
%   x = newtRoot(f, df, start) returns x, where f(x) = 0.
%       f - function, df - derivative of function,
%       start - initial x value to evaluate.
%   Tip: First plot f(x) with fplot and pick a start value where
%        the slope of f(start) points towards f(x) = 0.

    tol = 1e-7; % reduce tolerance for more accuracy
    x = start;  % starting point
    while abs(f(x)) > tol
        d = df(x);
        d(d==0) = eps; % prevent divide by 0
        x = x - f(x)/d;
    end

>> newtRoot((@(x) cos(x) + sin(x) - 0.5), (@(x) cos(x) - sin(x)), 1.5)
ans =
    1.9948
>> cos(ans) + sin(ans) - 0.5
ans =
   -1.4411e-13

11.1.1.2. Bisection Method

The bisection algorithm converges slowly to the root of a function, but it has the advantage that it will always find a root. The algorithm requires two x values, a and b, where the function has opposite signs, so that the function crosses the x-axis between the two points. A new point is found halfway between the end points.

c = \frac{a + b}{2}

If c is not within the allowed tolerance of the root, then one of the end points is replaced by c so that we have a new, smaller range spanning the x-axis crossing. Each iteration halves the search range, so after n iterations the root is known to within (b - a)/2^n.

../_images/bisection.png

Fig. 11.2 The bisection algorithm cuts the range between a and b until the root is found midway between a and b.

function x = bisectRoot(f, a, b)
% BISECTROOT  find root of function f(x), where a <= x <= b.
%    x = bisectRoot(f, a, b)
%    f - a function handle
%    sign(f(a)) not equal to sign(f(b))

    tol = 1e-7; % reduce tolerance for more accuracy
    if sign(f(a)) == sign(f(b))
        error('sign of f(a) and f(b) must be different.')
    end
    if f(a) > 0 % swap end points so that f(a) < 0 and f(b) > 0
        c = a;
        a = b;
        b = c;
    end
    c = (a + b)/2;
    fc = f(c);
    while abs(fc) > tol
        if fc < 0
            a = c;
        else
            b = c;
        end
        c = (a + b)/2;
        fc = f(c);
    end
    x = c;

>> f = @(x) x^2 - 64;
>> bisectRoot(f, 5, 10)
ans =
    8.0000
>> bisectRoot((@(x) cos(x) - 0.5), 0, pi/2)
ans =
    1.0472

11.1.1.3. Secant Method

The secant algorithm is similar to the bisection algorithm in that the range of x values under consideration shrinks with each iteration. The secant method finds its next estimate by drawing a line between the end points and finding where that line crosses the x-axis.

../_images/secant.png

Fig. 11.3 The secant algorithm cuts the range between a and b by finding a point c where a line between f(a) and f(b) crosses the x–axis. The range is repeatedly reduced until f(c) is within an accepted tolerance of zero.

function x = secantRoot(f, a, b)
% SECANTROOT  find root of function f(x), where a <= x <= b.
%    x = secantRoot(f, a, b)
%    f - a function handle
%    sign(f(a)) not equal to sign(f(b))

    tol = 1e-7; % reduce tolerance for more accuracy
    if sign(f(a)) == sign(f(b))
        error('sign of f(a) and f(b) must be different.')
    end
    if a > b % swap end points a < b
        c = a;
        a = b;
        b = c;
    end
    sign_a = sign(f(a));
    c = a + abs(f(a)*(b - a)/(f(b) - f(a)));
    fc = f(c);
    while abs(fc) > tol
        if sign(fc) == sign_a
            a = c;
        else
            b = c;
        end
        c = a + abs(f(a)*(b - a)/(f(b) - f(a)));
        fc = f(c);
    end
    x = c;

>> f = @(x) x^2 - 64;
>> secantRoot(f, 5, 10)
ans =
    8.0000
>> secantRoot((@(x) cos(x) - 0.3), 0, pi/2)
ans =
    1.2661
>> secantRoot((@(x) sin(x) - 0.3), 0, pi/2)
ans =
    0.3047

11.1.1.4. Fzero

MATLAB has a function called fzero that uses numerical methods to find a root of a function. It uses an algorithm that is a combination of the bisection and secant methods.

The arguments passed to fzero are a function handle and a row vector with the range of values to consider. The function should be positive at one of the vector values and negative at the other value. It may help to plot the function to find good values to use.

The code below uses the function fplot to display a plot of the values of a function over a range.

>> f = @(x) cos(x) - x;
>> fplot(f, [0 pi/2])
>> grid on
>> fzero(f, [0.5 1])
ans =
    0.7391
>> f(ans)
ans =
   1.1102e-16
../_images/fzero.png

Fig. 11.4 The plot from fplot helps one select the search range to pass to fzero.

See also

MATLAB documentation for fzero and fplot.

11.1.2. Finding a Minimum

Finding the minimum value of a function can be a harder problem than finding a root of the function (where the function is zero). Ideally, we would like to use analytic techniques to find the minimum. We learn from calculus that a minimum or maximum of a function occurs where the first derivative is zero. Indeed, derivatives are always an important part of finding where a minimum occurs. But when the function is non-linear and involves several variables, finding an analytic solution is often not possible, so we instead make use of numerical search algorithms.
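
When the derivative of a one-variable function is available, a minimum can also be located with the root-finding tools of the previous section by searching for where the derivative is zero. Here is a minimal sketch using fzero with the same function that is minimized by fminbnd below.

% A sketch: locate the minimum of f by finding the root of f'.
f  = @(x) x.^2 - sin(x) - 2*x;  % function to minimize
df = @(x) 2*x - cos(x) - 2;     % its derivative
xmin = fzero(df, [0 pi])        % matches the fminbnd result below (1.1871)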

There are several numerical algorithms for finding where a minimum occurs. Our coverage here is restricted to algorithms operating on restricted convex regions. By convex, we mean like a bowl. If we pour water over a convex surface, the minimum point is where the water stops flowing and begins to fill the bowl-like surface. Some functions are strictly convex, while others have local minimum points that are not the global minimum point. In the latter case, we need to restrict the domain of the search. MATLAB includes a minimum search algorithm, which is discussed below. MathWorks also sells an Optimization Toolbox with enhanced algorithms that are not discussed here. There are also third-party and open source resources with enhanced capabilities. One such resource is discussed below.

The mathematics behind how these algorithms work is a little complicated and beyond the scope of this text. Thus the focus here is restricted to what the algorithms are able to do and how to use them. As with the root-finding algorithms, the minimum search algorithms call the passed function many times as they search for the minimum. MATLAB has two minimum search functions. The first, fminbnd, works with functions of one variable, while the second, fminsearch, handles functions of multiple variables.

11.1.2.1. Fminbnd

The MATLAB function x = fminbnd(f, x1, x2) returns the value of x that minimizes f(x) in the range x_1 < x < x_2. The algorithm is based on golden section search and parabolic interpolation and is described in a book co-authored by MathWorks’ chief mathematician and cofounder, Cleve Moler [FORSYTHE76].

>> f = @(x) x.^2 - sin(x) - 2*x;
>> fplot(f, [0 pi])
>> grid on
>> fminbnd(f, 1, 1.5)
ans =
    1.1871
../_images/fminbnd.png

Fig. 11.5 The plot from fplot shows that the minimum value of the function f(x) = x^2 - \sin(x) - 2\,x is between 1 and 1.5.

11.1.2.2. Fminsearch

The MATLAB function fminsearch uses the Nelder-Mead simplex search method to find the minimum value of a function of one or more variables. Unlike the other searching functions, fminsearch takes a starting point for the search rather than a search range. The starting point may be a scalar or an array (vector). If it is an array, then the function needs to use array indexing to access the array elements. Best results are achieved when the supplied starting point is close to the minimum location.

Need a maximum instead of a minimum?

Just use the negative of the function and find the minimum.
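
For instance, here is a minimal sketch using fminbnd (the example function is ours, chosen for illustration): negate the function, minimize, and evaluate the original function at the result if the maximum value itself is needed.

% A sketch: maximize f by minimizing its negative.
f = @(x) sin(x) - x.^2/4;          % assumed example function
xmax = fminbnd(@(x) -f(x), 0, 2);  % location of the maximum of f
fmax = f(xmax);                    % the maximum value itself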

The following evaluates a function to find the location of its minimum. The overall shape of the surface is a bowl, but with local hills and valleys from a cosine term. The surface is plotted in Fig. 11.6.

% fminsearch example
f = @(x) x(1).^2 + x(2).^2 + 10*cos(x(1).*x(2));
m = fminsearch(f, [1 1]);
disp(['min at: ', num2str(m)]);
disp(['min value: ', num2str(f(m))]);
% plot it now
g = @(x, y) x.^2 + y.^2 + 10*cos(x.*y);
[X, Y] = meshgrid(linspace(-5, 5, 50), linspace(-5, 5, 50));
Z = g(X, Y);
surf(X, Y, Z)

The output shows the location and value of the minimum.

min at: 1.7147      1.7147
min value: -3.9175
../_images/fminsearch.png

Fig. 11.6 Surface searched by fminsearch to find the location of the minimum value.

A common application of fminsearch is to define a function handle that computes an error function that we wish to minimize. An example of that follows. We can use matrix multiplication to calculate the final position of a serial-link robotic arm. We can then use fminsearch to calculate the inverse kinematics of a robot, which finds the joint angles needed to position the robot end effector at a specified location. For robots with only a few joints, it may be possible to find an analytic inverse kinematic solution, but more complex robots often require numerical methods to find the desired joint angles. Multiple joint angle sets can position the robot arm at the same location. For example, we can consider either elbow-up or elbow-down configurations. Thus, the starting point value is critical for finding the desired joint angles.

Here is a simple example for a robot with two joints and two rigid arm segments (a and b). The minimized function finds the length of the error vector between forward kinematic calculation and the desired location of the end effector. The forward kinematics function uses functions from Peter Corke’s Spatial Math Toolbox to produce coordinate rotation and translation matrices as used in Coordinate Transformations in 2-D.

% File: fminsearch2.m
% Script using fminsearch for inverse kinematic
% calculation of a 2 joint robot arm.
E = [5.4 4.3]';                     % desired end effector position
errFun = @(Q) norm(E - fwdKin(Q));  % avoid shadowing the built-in error()
Qmin = fminsearch(errFun, [0 0]);
disp(['Q = ', num2str(Qmin)]);
disp(['Position = ', num2str(fwdKin(Qmin)')]);

function P = fwdKin(Q)
    a = 5;
    b = 2;
    PB = trot2(Q(1)) * transl2(a, 0) * trot2(Q(2)) * transl2(b, 0);
    P = PB(1:2,3);
end

The output of the script is the joint angles for the robot and forward kinematic calculation of the end effector position with those joint angles.

>> fminsearch2
Q = 0.56764      0.3695
Position = 5.4         4.3

11.1.2.3. CVX for Disciplined Convex Programming

CVX is a MATLAB-based modeling system for convex optimization. CVX allows constraints and objectives to be specified using standard MATLAB syntax. This allows for more flexibility and control in the specification of the optimization problem. Commands to specify and constrain the problem are placed between cvx_begin and cvx_end statements. CVX defines a set of commands and functions that can be used to describe the optimization problem. Most applications of CVX will include a call to either the minimize or maximize function [CVX20].

An example can best illustrate how to use CVX. Recall from the Under-determined Systems section that the preferred solution to an under-determined system of equations is one that minimizes the norm of \bm{x}. The least squares solution found by the QR algorithm, as discussed in Left-Divide Operator, minimizes the l_2-norm of \bm{x}. But it is known that minimizing the l_1-norm instead tends to yield a solution that is sparse (has more zeros). We turn to CVX to replace the inherent l_2-norm that we get from our projection equations with an l_1-norm.

>> A
A =
    2     4    20     3    16
    5    17     2    20    17
   19    11     9     1    18
>> b
b =
    5
   20
   13

>> cvx_begin
>>  variable x(5);
>>  minimize( norm(x, 1) );
>>  subject to
>>      A*x == b;
>> cvx_end
>> x
x =
   -0.0000
    1.1384
    0.0000
    0.0103
    0.0260
>> A*x
ans =
     5.0000
    20.0000
    13.0000
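
For comparison, the minimum l_2-norm solution can be computed with the pseudo-inverse (a sketch, separate from the CVX session above). It satisfies the same equations but is typically dense, and because CVX minimized the l_1-norm over all solutions, its l_1-norm is at least as large as that of x above.

x2 = pinv(A)*b;  % minimum l_2-norm solution; typically no zero entries
norm(x2, 1)      % >= norm(x, 1) for the CVX solution above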

Sparse Matrices

Very large but sparse matrices are common in some application domains. MATLAB supports a sparse storage class to efficiently hold large matrices that are mostly zeros. A sparse matrix stores the indices and values of only the non-zero elements. Most MATLAB functions and operators work with sparse matrices. The sparse command converts a standard matrix to a sparse matrix. The full command restores a sparse matrix to a standard matrix [MOLER04].

S = sparse(A);

...

A = full(S);
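
As a small illustration (the matrix size here is arbitrary), the storage savings can be checked with whos.

I = eye(2000);   % full storage: 2000^2 doubles
S = sparse(I);   % sparse storage: only the 2000 non-zero elements
whos I S         % compare the Bytes column for the two variables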