Calculus IV with Maple
Copyright 2002, Dr. Jack Wagner
j.wagner@intelligentsearch.com
Lesson 3:
Applications of Vector Derivatives: Multivariate extrema, Newton's method and Taylor series
Topics : The ideas and methods of of the calculus of a single variable are extended to the problem of maxima and minima of a function of two variables. Degenerate critical points. Taylor series for functions of more than one variable including truncation error estimate Newton's method with multiple variables.
Maple commands introduced:
Eigenvalues, Hessian, mtaylor, sphere, coeftayl
Maxima, Minima and Saddle Points
Given
F
:
->
R
on an open set,
U.
We define the set of critical points, {X}, of
F
, by
{x|grad
F
(x) = 0
} for all x in X.
A point,
, is a local maximum (minimum) for
F
if there exists a neighborhood,
V
of
, such that for every x in
V
,
. If, at a point,
,
F(x) attains a local maximum (minimum), we know from the calculus of a single variable that
(
) = 0 for every k; i.e grad
F
(
) = 0
.
Consequently, every extreme point of F lies in {X} although there may be points in {X} that are not extreme points. The hypothesis that
U
be open is critical since otherwise a maximum (minimum) point might lie at an endpoint of
U
.
Example 3.1
Find the local maxima and minima of the function:
| x = -1..1, y =
..
| > | restart: with(plots):with(VectorCalculus):with(LinearAlgebra): |
| > | F := (x, y) -> cos(x + y) + x^2: |
| > | plot3d([x, y, F(x, y)], x = -1..1, y = -1.5 * Pi..1.5 * Pi, axes = framed, color = blue, style = wireframe, labels = [x, y, z]); |
At a local maximum (minimum)
= 0
for every v. From the plot, however, it is apparent that there are points where
= 0, but which are neither local maxima nor minima because F increases in some directions but decreases in others. Such a point is called a saddle point.
| > | dF[x] := diff(F(x, y), x); dF[y] := diff(F(x, y), y); |
By inspecting the plot we select appropriate ranges in which to look for solutions.
| > | fsolve({dF[x], dF[y]}, {x = -1..1, y = -1/2 * Pi..1/2 * Pi}); |
| > | fsolve({dF[x], dF[y]}, {x = -1..1, y = -3/2 * Pi..-1/2 * Pi}); |
| > | fsolve({dF[x], dF[y]}, {x = -1..1, y = 1/2 * Pi..3/2 * Pi}); |
These points could be visualized with pointplot3d, but they would be lost in the plot of F. In order to visualize them together with F, we need to see them as small spheres. In the plottools package there is a sphere function.
sphere
| > | with(plottools): p1 := sphere([0, 0, F(0, 0)], .07): p2 := sphere([0, -Pi, F(0, -Pi)], .07): p3 := sphere([0, Pi, F(0, Pi)], .07): |
| > | P1 := plot3d([x, y, F(x, y)], x = -1..1, y = -1.5 * Pi..1.5 * Pi, axes = framed, color = blue, style = wireframe, labels = [x, y, z]): |
| > | display(p1, p2, p3, P1); |
The spheres are seen as ellipsoidal because the scaling is not "constrained".
Now we need a criterion for determining whether a point is a local minimum, maximum or saddle point. We are going to look for a generalization of what we already know from the calculus of a single variable; i.e. that the second derivative is negative (positive) at a relative maximum (minimum). Since, at a critical point,
, for every v = {
,
,
, ...
},
= 0
.
This alternative criterion for a critical point extends naturally to criteria for determining whether a critical point is a local maximum a local minimum or a saddle point.
We look at
; that is, the derivative in every direction of the first derivatives. Since every vector can be represented in a basis consisting of the coordinate vectors, it suffices to consider
=
In matrix form this looks like:
Notice that, when dealing with matrices, it is necessary to distinguish between horizontal and vertical vectors. The matrix,
is known as the Hessian,
.
Evaluated at a point x, it is written
.
The Hessian is found with the Maple functions, hessian and Hessian.
Hessian
hessian
| > | h[f] := Hessian(F(x, y), [x, y]); |
We need a functional form.
| > | H[f] := X -> subs({x = X[1], y = X[2]}, h[f]); |
The criteria we are looking for is this. When, at a critical point, the second derivative is negative (positive) in every direction, then that point is a local maximum (minimum). In other words we need to determine whether, in some sense, the Hessian is positive or negative. What does it mean for the Hessian matrix to be negative (positive)? The correct interpretation of this statement is that for any vector v,
, (>0)
.
It suffices to consider only unit vectors, u, because, if the statement is true for a normalized vector, it will be true of the original vector. We are led therefore, to consider the following problem. For which x will
be such that
Max{
}, subject to the constraint that
.
We apply the method of Lagrange multipliers, noting that
is symmetric because of the equality of the mixed partial derivatives.
= 2
Compute the derivative of the primary equation.
Compute the derivative of the constraint.
=
Use the Lagrange multiplier principle.
By taking transposes we obtain:
=
Lambda is thus an eigenvalue, and
u
an eigenvector, of
.
Multiplying on the left by
:
=
=
Because
is symmetric, we are guaranteed that all its eigenvalues are real. We have found, therefore, that if the eigenvalues of
are all negative,
x
is a local maximum, whereas if they are all positive,
x
is a local minimum. If some are positive and others negative, then we are at a saddle point, where the value of F is increasing in some directions and decreasing in others.
Maple provides a number of functions in the linalg package for working with eigenvalues and eigenvectors. We will require only one of these functions;
eigenvalues
or
Eigenvalues
Eigenvalues
eigenvalues
| > | Eigenvalues(H[f]([0, Pi]), output = 'list'); |
| > | Eigenvalues(H[f]([0, -Pi]), output = 'list'); |
| > | Eigenvalues(H[f]([0, 0]), output = 'list'); |
The two points, (0,
) and (0,
)
are relative minima and the point
(0, 0)
is a saddle point.
With the method now in hand we will try another example.
Example 3.2
=
| > | restart: with(plots):with(LinearAlgebra): with(VectorCalculus): |
| > | F := X -> exp(cos(X[1])) + exp(cos(X[2])): |
| > | plot3d([x, y, F([x, y])], x = -3/2 * Pi..3/2 * Pi, y = -1..1, axes = framed, color = blue, style = wireframe, labels = [x, y, z]); |
We need this plot in order to select appropriate ranges for the fsolve function.
| > | dF[x] := diff(F([x, y]), x); |
| > | dF[y] := diff(F([x, y]), y); |
| > | fsolve({dF[x] = 0, dF[y] = 0}, {x, y}); |
| > | fsolve({dF[x] = 0, dF[y] = 0}, {x = 2..4, y = -1..1}); |
| > | fsolve({dF[x] = 0, dF[y] = 0}, {x = -4..-2, y = -1..1}); |
| > | with(plottools): |
| > | p1 := sphere([0, 0, F([0, 0])], .07): p2 := sphere([3.141592654, 0, F([3.141592654, 0])], .07): p3 := sphere([-3.141592654, 0, F([-3.141592654, 0])], .07): |
| > | P1 := plot3d([x, y, F([x, y])], x = -3/2 * Pi..3/2 * Pi, y = -1..1, axes = framed, color = blue, style = wireframe, labels = [x, y, z]): |
| > | display(p1, p2, p3, P1); |
This plot is not a necessity but it furnishes a satisfying visual confirmation that we are on target.
| > | h[f] := Hessian(F([x, y]), [x, y]); H[f] := X -> subs({x = X[1], y = X[2]}, h[f]): |
| > | Eigenvalues(H[f]([0, Pi]), 'output' = list); |
| > | radsimp(%); |
| > | Eigenvalues(H[f]([0, 0]), 'output' = list); |
| > | Eigenvalues(H[f]([0, -Pi]), 'output' = list); |
| > | radsimp(%); |
| > | evalf(F([0, 0])); |
| > | evalf(F([0, Pi])); |
| > | evalf(F([0, -Pi])); |
The points
(0,
,
)
and
(0,
,
)
are saddle points and the point
(0, 0, 5.436563656)
is a local maximum.
What happens if the machinery doesn't work?
Example 3.3
Here is a perfectly innocuous looking function.
We will look for extreme values of F.
| > | restart: with(plots):with(LinearAlgebra): with(VectorCalculus): |
To gain an idea of what we are working with we plot the surface
.
| > | F := x^4-12 * x^3 + 54 * x^2-108 * x + 339 + y^4-16 * y^3 + 96 * y^2-256 * y-z^4: |
| > | r := x = 0..10, y = 0..10, z = 0..10: |
| > | implicitplot3d(F, r, axes = boxed, grid=[25,25,25], lightmodel=light2, style = patchnogrid); |
| > | dF[x] := diff(F, x); |
| > | dF[y] := diff(F, y); |
| > | dF[z] := diff(F, z); |
| > | fsolve(dF[x], x); |
| > | fsolve(dF[y], y); |
| > | fsolve(dF[z], z); |
| > | h := Hessian(F, [x, y, z]); |
| > | H := subs({x = 3, y = 4, z = 0}, h); |
If the determinant of the Hessian is zero at a critical point, then that point is a degenerate critical point. How to handle this situation in all generality is not a simple question. It requires the use of higher order partial derivatives. A fuller (but not complete) explanation will appear in the section on Taylor polynomials. The present case is not too difficult because the Hessian is a diagonal matrix whose entries are polynomials in a single variable. Taking third derivatives we have:
This is still the zero matrix at (3, 4, 0). Repeating the process
This matrix has eigenvalues 24, 24, 0, suggesting that the point (3, 4, 0) is a local minimum. This is easily confirmed by trying values of the variables in a neighborhood of (3, 4, 0).
| > | f := unapply(F, [x, y, z]); |
| > | for k from 1 to 20 do v.k := f(2 + .1 * k, 3 + .1 * k, -1 + .1 * k) od: |
| > | S := [seq([n, v.n], n = 1..20)]: |
| > | pointplot(S, connect = true, labels = ["k", "F"], title = "Variation of F(x, y, z) in a neighborhood of (3, 4, 0)."); |
This plot clearly shows that F is a minimum when k = 10, i.e. at the point [3, 4, 0].
Newton's Method
Recall from the calculus of a single variable the recursion formula for Newton's method.
.
We would like to have something similar for the multivariate problem. We seek a solution to the equation
f
(
x
) = 0, where
x
= (
, ...
). Let
x
be the exact solution and
be the initial approximation. Then,
x
0 =
f
(
x
) =
f
(
+
x) ~
f
(
) +
Df
(
)(
x
-
)
Keep in mind that
Df
(
)
is the Jacobian matrix evaluated at
.
Our next approximation to
x
,
say
will be found by solving
f
(
) +
Df
(
)(
-
)
=
0, thereby obtaining
*
f
(
); a result completely analogous to the single variable case.
To implement this we need an approximation to the solution. This can be a far from trivial problem.
Example 3.6
Solve:
For systems having only two variables, Maple makes it easy.
| > | restart: with(plots):with(LinearAlgebra):with(VectorCalculus): |
| > | g1 := 4 * x^2-y + cos(x + y)-4: g2 := y^2-3 * x^2 + log(x + y)-3: |
| > | implicitplot({g1, g2}, x = 0..4, y = 0..4); |
By pointing and clicking we find the graphical estimate (1.35, 2.69)
| > | a := <1.35, 2.69>; |
Define a vector valued function.
| > | F := X -> subs({x = X[1], y = X[2]}, <g1, g2>): F(X); |
Now compute the Jacobian.
| > | dF := Jacobian(<g1, g2>, [x, y]); |
Define another vector valued function that is the inverse of dF.
| > | invdF := MatrixInverse(dF); |
| > |
| > | invdF2 := X -> subs({x = X[1], y = X[2]}, invdF ): invdF2(X); |
The Newton recurrence relationship is simple.
| > | Delta := <1.0|1.0>: while(Norm(Delta, 2)>10^(-7)) do Delta := invdF(a).F(a): a := a-Delta: od: print(a); |
We check the answer.
| > | print(evalf(F(a))); |
Beyond two unknowns, the problem of getting an initial estimate becomes more complex; there may not be a solution. In the case of three variables a 3d plot may give a satisfactory estimate.
Example 3.4
Solve the following system of equations.
| > | restart: with(plots):with(LinearAlgebra): with(VectorCalculus): |
| > | f1 := x + sin(y)-5: f2 := y^2 + log(x)-10: f3 := x-y-z + 5: |
| > | p1 := implicitplot3d(f1, x = 0..8, y = 0..8, z=-5..10, color = red): p2 := implicitplot3d(f2, x = 0..8, y = 0..8, z=-5..10, color = blue, axes = framed): p3 := implicitplot3d(f3, x = 0..8, y = 0..8, z=-5..10, color = green, axes = framed): display(p1,p2,p3, axes = framed, style=patchnogrid); |
The intersection of the three surfaces approximates the solution to the system of equations.. Call the estimated answer a .
| > | a := <5.0, 2.5, 7.5>; |
With the method of the previous example in hand we make a few obvious changes for three dimensions.
| > | F := X -> subs({x = X[1], y = X[2], z = X[3]}, <f1, f2, f3>): F(X); |
| > | dF := Jacobian(<f1, f2, f3>, [x, y, z]); |
| > | invdF := MatrixInverse(dF); |
| > | invdF2 := X -> subs({x = X[1], y = X[2], z = X[3]}, invdF): invdF2(X); |
| > | Delta := <1.0, 1.0, 1.0>: while(Norm(Delta, 2)>10^(-7)) do a := evalf(a-invdF(a).F(a)): Delta := evalf(invdF(a).F(a)): od: print(a); print(evalf(F(a))); |
In addition to the answer, we have computed the value of F at that point as a check.
Taylor Series
The Taylor series and Taylor remainder formula for functions of a single variable generalize to the multivariate case. Because we can only plot functions in three dimensions we will examine the case of two variables, but the same ideas and principles apply equally to higher dimensions. We will denote differentiation with respect to x by
and differentiation with respect to y by
. Higher order derivatives will be designated by
etc. For two variables, x and y, expanded around x = a and y = b the Taylor formula reads,
+
+
To third degree the Taylor series for a function of two variables is,
+
+ R
R is the error introduced by truncating the series at this point.
Maple provides the mtaylor function in the standard library (readlib).
mtaylor
Example 3.5
| > | restart: |
As an illustration, up to terms of the fourth degree, the series for sin(x + y) expanded around the point (1, -2) is:
| > | f := sin(x + y): |
| > | `sin(x + y)`[4] := mtaylor(f, {x = 1, y = -2}, 5); |
When a function is expanded around a point, the expansion is likely to be most accurate near that point. It is of interest to know how accurate. That is, we need an upper bound for the truncation error, R. The upper bound for R in an expansion to fourth degree terms, can be estimated as follows. If the function is expanded around (
a, b
) and evaluated at a point (
) then let
h
= (
) - (
a, b
),
,
. Let M be the maximum value of all the mixed fifth order derivatives of sin(x + y) evaluated on the interval (
)..(
) Then the truncation error is bounded by
. We will illustrate the idea with the slightly simpler example of expansion around (0, 0).
Example 3.6
Compute the upper bound on the truncation error, after third degree terms, of the Taylor expansion for the function sin(x + y) expanded around (0, 0) and evaluated at (0.1, 0.05); approximately x = 5 degrees , y = 3 degrees.
| > | restart: with(plots): |
Warning, the name changecoords has been redefined
| > | f := sin(x + y): |
| > | `sin(x + y)`[4] := mtaylor(f, {x, y}, 5); |
All the fourth degree coefficients are zero so we need to look at the fifth degree coefficients using the coeftayl function.
coeftayl
This is a tricky syntax. Coeftayl must be informed by the use of an equation, the names of the variables (left side) and the points around which they are to be evaluated (right side). Then you must insert, as a list, the powers to which each of the variables are to be raised in the term whose coefficient is being calculated. In the present case, with two variables the fifth order term has the following possibilities for the powers of x and y; [5, 0], [4, 1], [3, 2], [2, 3], [1.4], [0, 5]. Because we want to see all the coefficients of fifth order terms we create a sequence using coeftayl. First we must call up coeftayl from the standard library.
| > | C := [seq(coeftayl(f, [x, y] = [0, 0], [a, 5-a] ), a = 0..5)]; |
The largest coefficients belong to the terms containing variables raised to the [3, 2] or [2, 3] powers.
| > | for k from 3 to 4 do f||k := C[k] * x^(k-1) * y^(6-k)od; |
The easiest way to look for the maximum of these terms on the interval ([0, 0], [0.1, 0.05]) is to plot them each in turn.
| > | plot3d(f||3, x = 0..1, y = 0..1, axes = framed); |
| > | plot3d(f||4, x = 0..1, y = 0..1, axes = framed); |
We rapidly determine that .08 is the maximum value attained by either of these terms on this interval. Now put the pieces together.
| > | R := .08 * (.1 + .05)^5; |
This is the maximum error introduced by truncating the series at third order terms. To check on the accuracy of the method we calculate.
| > | s[1] := subs({x = .1, y = .05}, `sin(x + y)`[4]); |
| > | s[2] := sin(.1 + .05); |
| > | s[2]-s[1]; |
The importance of the Taylor series is not limited to calculation. It finds application in theoretical considerations.
The second degree terms in the Taylor series are,
which can be rewritten as:
[
]
=
[
x-a, y-b
]
By shifting the origin, f(a, b) can always be made zero. At a critical point, all the first derivatives are zero, so in a neighborhood of a critical point the function is approximated by the terms containing second derivatives. If
= 0, then the function will be approximated by terms containing third derivatives, etc. This is the reason why the nature of a critical point must sometimes be determined by higher derivatives.
Problems
1.
For
find the critical points on the interval
and determine their nature.
2. Solve the following systems of equations.
a.
b.
c.
3.
Find the Taylor series for
and find the maximum truncation error after terms of fourth degree over the range x = {.5..1.5}, y = {.5..1.5}