Section 5.6 Higher Order Derivatives and Taylor Expansion
When
\(\bff :U\subset \bbR^{n}\mapsto \bbR^{m}\) is differentiable in
\(U\text{,}\) then its Jacobian matrix
\([D\bff (\bx)]\) can be considered as a map from
\(U\) into the vector space of
\(m\times n\) matrices, which can be identified with
\(\bbR^{m\times n}\text{.}\) Thus it makes sense to consider whether
\([D\bff (\bx)]\) is differentiable in
\(U\text{.}\)
If \([D\bff (\bx)]\) is differentiable at \(\bx \in U\text{,}\) then for \(\bh\) with \(|\bh|\) small,
\begin{equation*}
\Vert [D\bff (\bx+\bh)]-[D\bff (\bx)] - [D\left(D\bff\right) (\bx)]\bh\Vert /\Vert \bh \Vert \to 0, \text{ as $\bh\to 0$,}
\end{equation*}
where \(\bh \mapsto [D\left(D\bff\right) (\bx)]\bh\) is a linear map from \(\bbR^{n}\) into the vector space of \(m\times n\) matrices \(\bbR^{m\times n}\text{:}\) if \(\bff(\bx)=(f_{1}(\bx),\cdots, f_{m}(\bx))\) and \(\bh=(h_{1},\cdots, h_{n})\text{,}\) then each component \(\frac{\partial f_{i}}{\partial x_{j}}\) of \([D\bff (\bx)]\) is differentiable at \(\bx\) and has directional derivatives at \(\bx\) in any direction, and
\begin{equation*}
[D\left(D\bff\right) (\bx)]\bh= \sum_{k=1}^{n} h_{k}[ D_{x_{k}}\left(D\bff\right) (\bx)].
\end{equation*}
In terms of the \((i, j)\) entry of the output matrix, it is
\begin{equation*}
\sum_{k=1}^{n} h_{k}D_{x_{k}}\left( \frac{\partial f_{i}}{\partial x_{j}}\right) (\bx)
:= \sum_{k=1}^{n} h_{k} \frac{\partial^{2} f_{i}}{\partial x_{k}\partial x_{j}} (\bx).
\end{equation*}
These quantities \(D_{x_{k}}\left( \frac{\partial f_{i}}{\partial x_{j}}\right) (\bx)
= \frac{\partial^{2} f_{i}}{\partial x_{k}\partial x_{j}} (\bx)\) are called the second derivatives of \(f_{i}(\bx)\text{.}\) Other commonly used notations for the second derivatives of a function \(f(\bx)\) include \(\partial^{2}_{kj} f(\bx)\text{,}\) \(D^{2}_{kj} f(\bx)\text{,}\) and \(f_{x_j x_k}(\bx)\text{;}\) sometimes the superscript that indicates the order of the derivative is omitted. Note that the order of the subscripts in \(f_{x_j x_k}(\bx)\) is the reverse of that in \(D^{2}_{kj} f(\bx)\text{:}\) the former indicates applying partial differentiation to the right of \(f\text{,}\) first in \(x_j\) and then in \(x_k\text{,}\) while the latter means the same, but the operator \(D_j\) is applied first on the left of \(f\text{,}\) followed by the operator \(D_k\text{.}\) Fortunately, by Clairaut's theorem, to be discussed below, under reasonable conditions the order of the subscripts in the notation is not important, as long as we understand that both denote second derivatives.
We will not spend energy on the more abstract concept of higher order differentials, but focus on the higher order (partial) derivatives of a scalar-valued function, where we define, say, third order derivatives of a scalar function \(f(\bx)\) via
\begin{equation*}
D^{3}_{x_{l}x_{k}x_{j}} f(\bx):=
\frac{\partial^{3} f}{\partial x_{l} \partial x_{k}\partial x_{j}} (\bx)=
D_{x_{l}}\left( \frac{\partial^{2} f}{\partial x_{k}\partial x_{j}} \right)(\bx),
\end{equation*}
when this derivative is defined. We will often work in a setting where all the \(k\)th order partial derivatives of a function are continuous in a region; then all of its \(j\)th order partial derivatives, for \(j\le k-1\text{,}\) are differentiable there.
One basic question is whether the order in which to take the different mixed higher order derivatives affects the outcomes.
Example 5.6.1.
Define \(f(x, y)=x^{y}\) for \(x, y \gt 0\text{.}\) Then
\begin{equation*}
D_{x} f(x, y)= y x^{y-1}, \; D_{y}f(x, y)=x^{y} \ln x\text{,}
\end{equation*}
and
\begin{align*}
\frac{\partial^2 f}{\partial y \partial x} = \frac{\partial }{\partial y}\left( \frac{\partial f}{\partial x}
\right) = \frac{\partial }{\partial y}\left( y x^{y-1} \right) \amp= x^{y-1}+ y x^{y-1} \ln x,\\
\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial }{\partial x}\left( \frac{\partial f}{\partial y}
\right) = \frac{\partial }{\partial x}\left(x^{y} \ln x \right) \amp= y x^{y-1} \ln x + x^{y-1}.
\end{align*}
So \(\frac{\partial^2 f}{\partial y \partial x}= \frac{\partial^2 f}{\partial x \partial y}\) for this function. But the equality of the mixed second derivatives does require some conditions on the function.
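As a quick sanity check, the computation above can be verified symbolically; the use of SymPy here is our own illustration, not part of the text:

```python
# A symbolic check (our own illustration) that the two mixed partials
# of f(x, y) = x^y coincide for x, y > 0.
import sympy as sp

x, y = sp.symbols("x y", positive=True)
f = x**y

f_yx = sp.diff(f, x, y)  # differentiate first in x, then in y
f_xy = sp.diff(f, y, x)  # differentiate first in y, then in x

# both equal x^(y-1) + y x^(y-1) ln x, as computed above
assert sp.simplify(f_yx - f_xy) == 0
```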
Consider
\begin{equation*}
f(x, y)=\begin{cases} xy \frac{x^{2}-y^{2}}{x^{2}+y^{2}}\quad \amp (x, y)\ne (0, 0)\\
0 \quad \amp (x, y)= (0, 0)\\
\end{cases}
\end{equation*}
Then at \((x, y)\ne (0, 0)\text{,}\)
\begin{align*}
\frac{\partial f}{\partial x}(x, y) \amp= \frac{y \left(4 x^2 y^2+x^4-y^4\right)}{\left(x^2+y^2\right)^2},\\
\frac{\partial f}{\partial y}(x, y) \amp=\frac{-4 x^3 y^2+x^5-x y^4}{\left(x^2+y^2\right)^2},\\
\frac{\partial^{2} f}{\partial y \partial x}(x, y) \amp=\frac{\left(x^2-y^2\right) \left(10 x^2 y^2+x^4+y^4\right)}{\left(x^2+y^2\right)^3},\\
\frac{\partial^{2} f}{\partial x \partial y}(x, y) \amp=\frac{\left(x^2-y^2\right) \left(10 x^2 y^2+x^4+y^4\right)}{\left(x^2+y^2\right)^3},
\end{align*}
so we see that
\begin{equation*}
\frac{\partial^{2} f}{\partial y \partial x}(x, y)= \frac{\partial^{2} f}{\partial x \partial y}(x, y) \quad \text{ when }
(x, y)\ne (0, 0).
\end{equation*}
We can also verify directly by definition that
\begin{equation*}
\frac{\partial f}{\partial x}(0, 0)=0, \quad \frac{\partial f}{\partial y}(0, 0)=0,
\end{equation*}
and to compute \(\frac{\partial^{2} f}{\partial y \partial x}(0, 0)\text{,}\) we only need to examine the derivative with respect to \(y\) of \(\frac{\partial f}{\partial x}(0, y)= -y\text{,}\) which gives \(-1\text{;}\) while to compute \(\frac{\partial^{2} f}{\partial x \partial y}(0, 0)\text{,}\) we only need to examine the derivative with respect to \(x\) of \(\frac{\partial f}{\partial y}(x, 0)=x\text{,}\) which gives \(1\text{.}\) Thus
\begin{equation*}
\frac{\partial^{2} f}{\partial y \partial x}(0, 0)=-1\ne 1= \frac{\partial^{2} f}{\partial x \partial y}(0, 0).
\end{equation*}
Theorem 5.6.2. Clairaut's Theorem.
Suppose that
\(D_{i}f(\bx), D_{j}f(\bx), D_{ij}f(\bx)\) exist in a neighborhood of
\(\bx\) and are continuous at
\(\bx\text{.}\) Then
\(D_{ji}f(\bx)\) exists and equals
\(D_{ij}f(\bx)\text{.}\)
Proof.
Without loss of generality, we may assume \(\bx=\mathbf 0\text{,}\) \(i=1\text{,}\) \(j=2\text{,}\) and \(n=2\text{.}\) Then we need to show that
\begin{equation*}
D_{21}f(0, 0)= \lim_{y\to 0}\frac{D_{1}f(0, y)-D_{1}f(0, 0)}{y}
\end{equation*}
exists and equals \(D_{12}f(0, 0)\text{.}\) But
\begin{equation*}
\frac{D_{1}f(0, y)-D_{1}f(0, 0)}{y} = \lim_{x\to 0}\frac{ f(x, y)-f(0, y)-f(x, 0)+f(0,0)}{xy}.
\end{equation*}
Applying the mean value theorem to \(f(x, y)-f(0, y)\) as a function of \(y\text{,}\) we get
\begin{equation*}
f(x, y)-f(0, y)-[f(x, 0)-f(0, 0)]=\left[D_{2}f(x, y^{*})-D_{2}f(0, y^{*})\right]y
\end{equation*}
for some \(y^{*}\) between \(0\) and \(y\) which may also depend on \(x\text{.}\) Applying the mean value theorem to \(D_{2}f(x, y^{*})-D_{2}f(0, y^{*})\) as a function of \(x\text{,}\) we get
\begin{equation*}
D_{2}f(x, y^{*})-D_{2}f(0, y^{*})=D_{12}f(x^{*}, y^{*})x
\end{equation*}
for some \(x^{*}\) between \(0\) and \(x\text{.}\) Since \(\vert x^{*}\vert \le \vert x\vert\) and \(\vert y^{*}\vert \le \vert y\vert\text{,}\) using the continuity of \(D_{12}f(x, y)\) at \((0, 0)\text{,}\) it follows that
\begin{equation*}
D_{21}f(0, 0)=\lim_{y\to 0} \lim_{x\to 0} D_{12}f(x^{*}, y^{*}) \text{ exists and }=D_{12}f(0, 0).
\end{equation*}
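The second difference quotient at the heart of this proof can be watched converging numerically; the sample function below is our own choice:

```python
# A numerical illustration (our own) of the second difference quotient used
# in the proof: for a smooth f it converges to D12 f(0, 0).
import math

def f(x, y):
    # sample smooth function; D12 f(0, 0) = cos(0) * e^0 = 1
    return math.sin(x) * math.exp(y)

def second_quotient(h):
    return (f(h, h) - f(0.0, h) - f(h, 0.0) + f(0.0, 0.0)) / (h * h)

assert abs(second_quotient(1e-4) - 1.0) < 1e-3
```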
Definition 5.6.4.
Suppose that
\(U\subset \bbR^{n}\) is open and
\(k\in \bbN\text{.}\) We define
\(C^{k}(U)\) to be the space of functions which have
\(j\) th order continuous partial derivatives in
\(U\) for
\(1\le j\le k\text{.}\) When
\(k=1\) we say functions in
\(C^{1}(U)\) are
continuously differentiable in
\(U\text{,}\) and when
\(k\gt 1\text{,}\) we say that functions in
\(C^{k}(U)\) are
\(k\)-times continuously differentiable in
\(U\text{.}\) We define
\(C^{k}(\bar U)\) to be the space of functions in
\(C^{k}(U)\) such that each of its
\(j\) th order partial derivative has a continuous extension to
\(\bar U\text{.}\)
For any multi-index \(\alpha=(\alpha_{1},\ldots, \alpha_{n})\) we denote \(|\alpha| :=\alpha_{1}+\ldots+\alpha_{n}\) and \(D^{\alpha}f := \frac{\partial^{|\alpha|} f}{\partial x_{1}^{\alpha_{1}}\cdots \partial x_{n}^{\alpha_{n}}}\text{.}\) Note that
\begin{equation*}
\Vert f\Vert_{C^{k}(\bar U)}:=\sum_{j=0}^{k}\sum_{|\alpha|=j}\max_{\bx \in \bar U}|D^{\alpha}f(\bx)|
\end{equation*}
defines a norm on \(C^{k}(\bar U)\) and makes the latter a complete metric space.
Suppose that \(f\in C^{k}(U)\) and \(\bx \in U\text{.}\) Take any vector \(\bv \in \bbR^{n}\) and consider \(f(\bx+ t\bv)\) as a one variable function \(g(t)\) of \(t\) for \(t\) near \(0\text{.}\) Then by the chain rule
\begin{align*}
g'(t)\amp =\sum_{j=1}^{n} v_{j}\frac{\partial f}{\partial x_{j}}(\bx+ t\bv)\\
g''(t)\amp =\sum_{i, j=1}^{n} v_{j}v_{i}\frac{\partial^{2} f}{\partial x_{i}\partial x_{j}}(\bx+ t\bv)\\
g^{(k)}(t)\amp =\sum_{j_{1},\ldots, j_{k}=1}^{n}v_{j_{1}}\cdots v_{j_{k}}
\frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}(\bx+ t\bv)
\end{align*}
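These chain-rule formulas for \(g'(t)\) and \(g''(t)\) can be verified symbolically for a sample function, base point, and direction (all our own choices):

```python
# A symbolic check (our own illustration) of the chain-rule formulas for
# g(t) = f(bx + t v).
import sympy as sp

x1, x2, t = sp.symbols("x1 x2 t", real=True)
f = sp.exp(x1) * sp.sin(x2)                   # sample smooth function
v = (2, 3)                                    # sample direction
bx = (sp.Rational(1, 2), sp.Rational(1, 3))   # sample base point
line = {x1: bx[0] + t*v[0], x2: bx[1] + t*v[1]}

g = f.subs(line)

# g'(t) = sum_j v_j (partial_j f)(bx + t v)
g1 = sum(v[j] * sp.diff(f, s).subs(line) for j, s in enumerate((x1, x2)))
assert sp.simplify(sp.diff(g, t) - g1) == 0

# g''(t) = sum_{i,j} v_i v_j (partial_i partial_j f)(bx + t v)
g2 = sum(v[i] * v[j] * sp.diff(f, (x1, x2)[i], (x1, x2)[j]).subs(line)
         for i in range(2) for j in range(2))
assert sp.simplify(sp.diff(g, t, 2) - g2) == 0
```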
Then the one variable Taylor expansion
\begin{equation}
g(t)=g(0)+g'(0)t+\frac{g''(0)}{2!}t^{2}+\cdots+ \frac{g^{(k)}(0)}{k!}
t^{k} +R_{k}(t)\tag{5.6.1}
\end{equation}
gives rise to
\begin{align*}
f(\bx+ t\bv)= \amp f(\bx)+ \sum_{j=1}^{n} t v_{j}\frac{\partial f}{\partial x_{j}}(\bx)+
\frac{t^{2}}{2!} \sum_{i, j=1}^{n} v_{j}v_{i}\frac{\partial^{2} f}{\partial x_{i}\partial x_{j}}(\bx)\\
\amp +\cdots + \frac{t^{k}}{k!}\sum_{j_{1},\ldots,j_{k}=1}^{n} v_{j_{1}}\cdots v_{j_{k}}\frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}(\bx)+R_{k}(t).
\end{align*}
The remainder has the property that \(\vert R_{k}(t)\vert/\vert t\vert^{k} \to 0\) as \(t \to 0\text{.}\)
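For a concrete sanity check, one can build the second-order expansion from this formula for a sample function (our own choice) and watch the remainder ratio shrink:

```python
# Build the order-2 Taylor polynomial from the formula above and check
# that |R_2| / ||h||^2 becomes small as h -> 0 (sample f of our choosing).
import sympy as sp

x1, x2, h1, h2 = sp.symbols("x1 x2 h1 h2", real=True)
f = sp.cos(x1) * sp.exp(x2)
bx = {x1: sp.Rational(1, 4), x2: sp.Rational(1, 5)}
syms, hs = (x1, x2), (h1, h2)

T2 = f.subs(bx) \
    + sum(hs[j] * sp.diff(f, syms[j]).subs(bx) for j in range(2)) \
    + sp.Rational(1, 2) * sum(hs[i] * hs[j]
          * sp.diff(f, syms[i], syms[j]).subs(bx)
          for i in range(2) for j in range(2))

R = f.subs({x1: bx[x1] + h1, x2: bx[x2] + h2}) - T2

def ratio(s):
    # |R_2| / ||h||^2 along h = (s, s), where ||h||^2 = 2 s^2
    return abs(float(R.subs({h1: s, h2: s}))) / (2 * s**2)

assert ratio(1e-3) < ratio(1e-2) < ratio(1e-1)
assert ratio(1e-3) < 1e-3
```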
To get the dependence of
\(R_{k}(t)\) in
(5.6.1) on
\(\bv\text{,}\) we use a version of
(5.6.1) with an integral remainder term:
\begin{align}
\amp g(t) \notag\\
= \amp g(0)+g'(0)t+\frac{g''(0)}{2!}t^{2}+\cdots+ \frac{g^{(k-1)}(0)}{(k-1)!}t^{k-1}\notag\\
\amp +\frac{1}{(k-1)!}\int_{0}^{t}g^{(k)}(s)(t-s)^{k-1}\, ds\text{,}\tag{5.6.2}
\end{align}
from which we find
\begin{equation*}
R_{k}(t)=\frac{1}{(k-1)!}\int_{0}^{t}\left( g^{(k)}(s)- g^{(k)}(0)\right) (t-s)^{k-1}\, ds.
\end{equation*}
For \(g(t)=f(\bx +t\bv)\text{,}\) if we make the change of variable \(s=t \tau\) in the above integral, we see that \(R_{k}(t)\) equals
\begin{align*}
\amp \frac{1}{(k-1)!}\int_{0}^{1}
\sum_{j_{1},\ldots, j_{k}=1}^{n} t^{k} v_{j_{1}}\cdots v_{j_{k}}
\left( \frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}(\bx+ \tau t\bv)-
\frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}(\bx)\right) (1-\tau)^{k-1}\, d\tau\\
=\amp \frac{1}{(k-1)!}\int_{0}^{1}
\sum_{j_{1},\ldots, j_{k}=1}^{n} h_{j_{1}}\cdots h_{j_{k}}
\left( \frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}(\bx+ \tau \bh)-
\frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}(\bx)\right) (1-\tau)^{k-1}\, d\tau
\end{align*}
so \(R_{k}(t)\) is actually a function of \(\bx\) and \(\bh=t\bv\text{.}\)
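The one-variable identity (5.6.2) underlying this computation can itself be checked symbolically; the sample function and order below are our own choices:

```python
# A symbolic check (our own illustration) that the order k-1 Taylor
# polynomial plus the integral remainder of (5.6.2) reproduces g exactly.
import sympy as sp

t, s = sp.symbols("t s", real=True)
g = sp.sin(t)    # sample one-variable function
k = 3

# order k-1 Taylor polynomial of g at 0 ...
poly = sum(sp.diff(g, t, j).subs(t, 0) * t**j / sp.factorial(j)
           for j in range(k))
# ... plus the integral remainder term of (5.6.2)
remainder = sp.integrate(sp.diff(g, t, k).subs(t, s) * (t - s)**(k - 1),
                         (s, 0, t)) / sp.factorial(k - 1)

assert sp.simplify(poly + remainder - g) == 0
```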
Using the continuity of \(\frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}\) at \(\bx\text{,}\) we find that, for any \(\epsilon > 0\text{,}\) there exists some \(\delta > 0\) such that
\begin{equation*}
\left\vert \frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}(\bx +\bh) -
\frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}(\bx)\right\vert \lt \epsilon
\end{equation*}
for all \(\bh\) with \(\Vert \bh \Vert \lt \delta\text{.}\) Thus when \(\Vert \bh \Vert \lt \delta\text{,}\) we have
\begin{align*}
\vert R_{k}(t)\vert \le \amp \frac{\epsilon}{(k-1)!} \int_{0}^{1}
\sum_{j_{1},\ldots,j_{k}=1}^{n} \vert h_{j_{1}}\vert \cdots \vert h_{j_{k}}\vert (1-\tau)^{k-1}\, d\tau \\
\le \amp \frac{C(n, k) \epsilon \Vert \bh\Vert^{k} }{k!}
\end{align*}
where we have used \(\sum_{j_{1},\ldots,j_{k}=1}^{n} \vert h_{j_{1}}\vert
\cdots \vert h_{j_{k}}\vert = \left( \sum_{j=1}^{n} \vert h_{j}\vert \right)^{k} \le C(n, k) \Vert \bh\Vert^{k}\text{,}\) which holds with \(C(n, k)=n^{k/2}\) because \(\sum_{j=1}^{n} \vert h_{j}\vert \le \sqrt{n}\, \Vert \bh \Vert\) by the Cauchy-Schwarz inequality.
Theorem 5.6.5. Taylor Expansion.
Suppose that \(f\in C^{k}(U)\) and \(\bx \in U \subset \bbR^{n}\text{.}\) Then the \(k\)th order Taylor expansion of \(f\) at \(\bx\text{,}\) \(T_{k}(f,\bx)(\bh) \text{,}\) defined as
\begin{align*}
\amp f(\bx)+ \sum_{j=1}^{n} h_{j}\frac{\partial f}{\partial x_{j}}(\bx)+
\frac{1}{2!} \sum_{i, j=1}^{n} h_{j}h_{i}\frac{\partial^{2} f}{\partial x_{i}\partial x_{j}}(\bx)
+\cdots \\
+ \amp \frac{1}{k!}\sum_{j_{1},\ldots,j_{k}=1}^{n} h_{j_{1}}\cdots h_{j_{k}}\frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}(\bx)
\end{align*}
satisfies
\begin{equation}
\left\lvert f(\bx +\bh)- T_{k}(f,\bx)(\bh) \right\rvert/\Vert \bh \Vert^{k} \to 0 \text{ as $\bh \to \mathbf 0$}.\tag{5.6.3}
\end{equation}
Furthermore, under the assumption here, for any subdomain \(U'\) of \(U\) such that the closure of \(U'\) is compact and any \(\epsilon > 0\text{,}\) there exists \(\delta > 0\) such that
\begin{equation*}
\left\lvert f(\bx +\bh)- T_{k}(f,\bx)(\bh) \right\rvert \le \epsilon \Vert \bh \Vert^{k} \text{ for all $\bx\in U',
\Vert \bh \Vert \lt \delta$.}
\end{equation*}
The Taylor expansion of Theorem 5.6.5 is most often used when \(k=2\text{,}\) where we can write
\begin{equation*}
\sum_{i, j=1}^{n} v_{j}v_{i}\frac{\partial^{2} f}{\partial x_{i}\partial x_{j}}(\bx)
=\bv^{\rm t}[D^{2}f(\bx)]\bv
\end{equation*}
with \([D^{2}f(\bx)]\) denoting the Hessian matrix of \(f\) at \(\bx\) with entries \(\frac{\partial^{2} f}{\partial x_{i}\partial x_{j}}(\bx)\text{,}\) and \(\bv^{\rm t}\) denoting the transpose of \(\bv\text{.}\) If \(\bx\) is an interior minimum of \(f\text{,}\) then for any vector \(\bv\text{,}\) the one variable function \(g(t)=f(\bx+t\bv)\) has an interior minimum at \(t=0\text{.}\) Therefore
\begin{equation*}
g''(0)=\sum_{i, j=1}^{n} v_{j}v_{i}\frac{\partial^{2} f}{\partial x_{i}\partial x_{j}}(\bx) \ge 0.
\end{equation*}
This then implies that the Hessian matrix \([D^{2}f(\bx)]\) is non-negative definite. Conversely, if \(\bx\) is an interior critical point of a twice continuously differentiable \(f\text{,}\) namely, \(D_{i}f(\bx)=0\) for all \(i=1,\ldots, n\text{,}\) and \([D^{2}f(\bx)]\) is positive definite, then the Taylor expansion of order \(2\) above would show that \(\bx\) is a local minimum of \(f\text{.}\)
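This second-order criterion can be illustrated with symbolic computation; the sample function below, with a critical point at the origin, is our own choice:

```python
# Sketch (our own example): at a critical point with positive definite
# Hessian, the second-order test certifies a local minimum.
import sympy as sp

x1, x2 = sp.symbols("x1 x2", real=True)
f = x1**2 + x1*x2 + x2**2 + x1**3   # sample function, critical point at 0

grad = [sp.diff(f, v) for v in (x1, x2)]
H = sp.Matrix(2, 2, lambda i, j: sp.diff(f, (x1, x2)[i], (x1, x2)[j]))

origin = {x1: 0, x2: 0}
assert all(g.subs(origin) == 0 for g in grad)   # gradient vanishes at 0
assert H.subs(origin).is_positive_definite      # Hessian [[2,1],[1,2]] > 0
```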
Exercise 5.6.7.
Prove
(5.6.3) under the assumption that all partial derivatives of
\(f\) of order up to
\(k-1\) are defined and continuous in a neighborhood of
\(\bx\text{,}\) and all partial derivatives of
\(f\) of order
\((k-1)\) are differentiable at
\(\bx\text{.}\)
Hint.
Use
(5.6.2) with the integral remainder term at order
\(k-1\) and use the differentiability of the order
\((k-1)\) partial derivatives of
\(f\) at
\(\bx\text{,}\) which appear in
\(g^{(k-1)}(s)\) in the integral remainder term.
Exercise 5.6.8.
Prove that, if
(5.6.4) holds for some
\(P_{k}(\bx; \bh)\text{,}\) then it is unique.
Exercise 5.6.9.
Construct an example of a function such that
(5.6.4) holds for some
\(P_{k}(\bx; \bh)\text{,}\) but
\(f\) fails to have derivatives at a sequence
\(\bx_{m}\to \bx\text{.}\)