Section 5.6 Higher Order Derivatives and Taylor Expansion
When
\(\bff :U\subset \bbR^{n}\to \bbR^{m}\) is differentiable in
\(U\text{,}\) then its Jacobian matrix
\([D\bff (\bx)]\) can be considered as a map from
\(U\) into the vector space of
\(m\times n\) matrices, which can be identified with
\(\bbR^{m\times n}\text{.}\) Thus it makes sense to consider whether
\([D\bff (\bx)]\) is differentiable in
\(U\text{.}\)
If \([D\bff (\bx)]\) is differentiable at \(\bx \in U\text{,}\) then for \(\bh\) with \(|\bh|\) small,
\begin{equation*}
\Vert [D\bff (\bx+\bh)]-[D\bff (\bx)] - [D\left(D\bff\right) (\bx)]\bh\Vert /\Vert \bh \Vert \to 0, \text{ as $\bh\to 0$,}
\end{equation*}
where \(\bh \mapsto [D\left(D\bff\right) (\bx)]\bh\) is a linear map from \(\bbR^{n}\) into the vector space of \(m\times n\) matrices \(\bbR^{m\times n}\text{:}\) if \(\bff(\bx)=(f_{1}(\bx),\cdots, f_{m}(\bx))\) and \(\bh=(h_{1},\cdots, h_{n})\text{,}\) then each component \(\frac{\partial f_{i}}{\partial x_{j}}\) of \([D\bff (\bx)]\) is differentiable at \(\bx\) and has directional derivatives at \(\bx\) in any direction, and
\begin{equation*}
[D\left(D\bff\right) (\bx)]\bh= \sum_{k=1}^{n} h_{k}[ D_{x_{k}}\left(D\bff\right) (\bx)].
\end{equation*}
In terms of the \((i, j)\) entry of the output matrix, it is
\begin{equation*}
\sum_{k=1}^{n} h_{k}D_{x_{k}}\left( \frac{\partial f_{i}}{\partial x_{j}}\right) (\bx)
:= \sum_{k=1}^{n} h_{k} \frac{\partial^{2} f_{i}}{\partial x_{k}\partial x_{j}} (\bx)\text{.}
\end{equation*}
These quantities \(D_{x_{k}}\left( \frac{\partial f_{i}}{\partial x_{j}}\right) (\bx)
= \frac{\partial^{2} f_{i}}{\partial x_{k}\partial x_{j}} (\bx)\) are called the second derivatives of \(f_{i}(\bx)\text{.}\)
We will not spend energy on the more abstract concept of higher order differentials, but focus on the higher order (partial) derivatives of a scalar-valued function, where we define, say, third order derivatives of a scalar function \(f(\bx)\) via
\begin{equation*}
D^{3}_{x_{l}x_{k}x_{j}} f(\bx):=
\frac{\partial^{3} f}{\partial x_{l} \partial x_{k}\partial x_{j}} (\bx)=
D_{x_{l}}\left( \frac{\partial^{2} f}{\partial x_{k}\partial x_{j}} \right)(\bx),
\end{equation*}
when this derivative is defined. We will often work in a setting where all the \(k\)th order partial derivatives of a function are continuous in a region, in which case all of its \(j\)th order partial derivatives, for \(j\le k-1\text{,}\) are differentiable.
One basic question is whether the order in which the mixed higher order derivatives are taken affects the outcome.
Example 5.6.1.
Define \(f(x, y)=x^{y}\) for \(x, y \gt 0\text{.}\) Then
\begin{equation*}
D_{x} f(x, y)= y x^{y-1}, \; D_{y}f(x, y)=x^{y} \ln x\text{,}
\end{equation*}
and
\begin{align*}
\frac{\partial^2 f}{\partial y \partial x} = \frac{\partial }{\partial y}\left( \frac{\partial f}{\partial x}
\right) = \frac{\partial }{\partial y}\left( y x^{y-1} \right) \amp= x^{y-1}+ y x^{y-1} \ln x,\\
\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial }{\partial x}\left( \frac{\partial f}{\partial y}
\right) = \frac{\partial }{\partial x}\left(x^{y} \ln x \right) \amp= y x^{y-1} \ln x + x^{y-1}.
\end{align*}
So \(\frac{\partial^2 f}{\partial y \partial x}= \frac{\partial^2 f}{\partial x \partial y}\) for this function. But this property does require some conditions on the function.
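As a numerical sanity check of the computation above (a Python sketch; the sample point and step sizes are chosen here for illustration, not taken from the text), nested central differences approximate both mixed partials of \(f(x,y)=x^{y}\) and match the closed form \(x^{y-1}+ y x^{y-1}\ln x\):

```python
import math

def f(x, y):
    return x ** y                       # defined for x > 0

def d_dx(g, x, y, h=1e-4):
    # central difference approximation of dg/dx
    return (g(x + h, y) - g(x - h, y)) / (2 * h)

def d_dy(g, x, y, h=1e-4):
    # central difference approximation of dg/dy
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

x0, y0 = 1.5, 2.0                       # sample point, x0 > 0
f_yx = d_dy(lambda x, y: d_dx(f, x, y), x0, y0)   # d/dy (df/dx)
f_xy = d_dx(lambda x, y: d_dy(f, x, y), x0, y0)   # d/dx (df/dy)

# closed form from the text: x^{y-1} + y x^{y-1} ln x
closed = x0 ** (y0 - 1) + y0 * x0 ** (y0 - 1) * math.log(x0)
```

Both nested differences agree with each other and with the closed form to within the discretization error.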
Consider
\begin{equation*}
f(x, y)=\begin{cases} xy \frac{x^{2}-y^{2}}{x^{2}+y^{2}}\quad \amp (x, y)\ne (0, 0)\\
0 \quad \amp (x, y)= (0, 0)\\
\end{cases}
\end{equation*}
Then at \((x, y)\ne (0, 0)\text{,}\)
\begin{align*}
\frac{\partial f}{\partial x}(x, y) \amp= \frac{y \left(4 x^2 y^2+x^4-y^4\right)}{\left(x^2+y^2\right)^2},\\
\frac{\partial f}{\partial y}(x, y) \amp=\frac{-4 x^3 y^2+x^5-x y^4}{\left(x^2+y^2\right)^2},\\
\frac{\partial^{2} f}{\partial y \partial x}(x, y) \amp=\frac{\left(x^2-y^2\right) \left(10 x^2 y^2+x^4+y^4\right)}{\left(x^2+y^2\right)^3},\\
\frac{\partial^{2} f}{\partial x \partial y}(x, y) \amp=\frac{\left(x^2-y^2\right) \left(10 x^2 y^2+x^4+y^4\right)}{\left(x^2+y^2\right)^3},
\end{align*}
so we see that
\begin{equation*}
\frac{\partial^{2} f}{\partial y \partial x}(x, y)= \frac{\partial^{2} f}{\partial x \partial y}(x, y) \quad \text{ when }
(x, y)\ne (0, 0).
\end{equation*}
We can also verify directly by definition that
\begin{equation*}
\frac{\partial f}{\partial x}(0, 0)=0, \quad \frac{\partial f}{\partial y}(0, 0)=0,
\end{equation*}
and to compute \(\frac{\partial^{2} f}{\partial y \partial x}(0, 0)\text{,}\) we only need to examine the derivative with respect to \(y\) of \(\frac{\partial f}{\partial x}(0, y)= -y\text{,}\) which gives \(-1\text{;}\) while to compute \(\frac{\partial^{2} f}{\partial x \partial y}(0, 0)\text{,}\) we only need to examine the derivative with respect to \(x\) of \(\frac{\partial f}{\partial y}(x, 0)=x\text{,}\) which gives \(1\text{.}\) Thus
\begin{equation*}
\frac{\partial^{2} f}{\partial y \partial x}(0, 0)=-1\ne 1= \frac{\partial^{2} f}{\partial x \partial y}(0, 0).
\end{equation*}
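The sign discrepancy at the origin can also be observed numerically (a Python sketch; the step sizes are chosen here for illustration). The first partials along the axes are approximated by central differences and then differenced once more toward the origin:

```python
def f(x, y):
    # the counterexample from the text
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def fx(x, y, h=1e-6):
    # df/dx by central difference
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def fy(x, y, h=1e-6):
    # df/dy by central difference
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

t = 1e-3
# D_y D_x f(0,0): difference quotient of fx(0, y), which is close to -y
f_yx0 = (fx(0.0, t) - fx(0.0, 0.0)) / t
# D_x D_y f(0,0): difference quotient of fy(x, 0), which is close to x
f_xy0 = (fy(t, 0.0) - fy(0.0, 0.0)) / t
```

The two approximations land near \(-1\) and \(+1\) respectively, matching the hand computation.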
Theorem 5.6.2. Clairaut's Theorem.
Suppose that
\(D_{i}f(\bx), D_{j}f(\bx), D_{ij}f(\bx)\) exist in a neighborhood of
\(\bx\) and are continuous at
\(\bx\text{.}\) Then
\(D_{ji}f(\bx)\) exists and equals
\(D_{ij}f(\bx)\text{.}\)
Proof.
Without loss of generality, we may set \(\bx=\mathbf 0\text{,}\) \(i=1, j=2\text{,}\) and \(n=2\text{.}\) Then
\begin{align*}
D_{21}f(0, 0)=\amp \lim_{y\to 0}\frac{D_{1}f(0, y)-D_{1}f(0, 0)}{y}\\
=\amp \lim_{y\to 0} \lim_{x\to 0}\frac{ f(x, y)-f(0, y)-f(x, 0)+f(0,0)}{xy}.
\end{align*}
But applying the mean value theorem to \(f(x, y)-f(0, y)\) as a function of \(y\text{,}\) we get
\begin{equation*}
f(x, y)-f(0, y)-[f(x, 0)-f(0, 0)]=\left[D_{2}f(x, y^{*})-D_{2}f(0, y^{*})\right]y
\end{equation*}
for some \(y^{*}\) between \(0\) and \(y\) which may also depend on \(x\text{.}\) Applying the mean value theorem to \(D_{2}f(x, y^{*})-D_{2}f(0, y^{*})\) as a function of \(x\text{,}\) we get
\begin{equation*}
D_{2}f(x, y^{*})-D_{2}f(0, y^{*})=D_{12}f(x^{*}, y^{*})x
\end{equation*}
for some \(x^{*}\) between \(0\) and \(x\text{.}\) Using the continuity of \(D_{12}f(x, y)\) at \((0, 0)\text{,}\) it follows that
\begin{equation*}
D_{21}f(0, 0)=\lim_{y\to 0} \lim_{x\to 0} D_{12}f(x^{*}, y^{*})=D_{12}f(0, 0).
\end{equation*}
Definition 5.6.4.
Suppose that
\(U\subset \bbR^{n}\) is open and
\(k\in \bbN\text{.}\) We define
\(C^{k}(U)\) to be the space of functions which have continuous
\(j\)th order partial derivatives in
\(U\) for
\(1\le j\le k\text{.}\) When
\(k=1\) we say functions in
\(C^{1}(U)\) are
continuously differentiable in
\(U\text{,}\) and when
\(k\gt 1\text{,}\) we say that functions in
\(C^{k}(U)\) are
\(k\)-times continuously differentiable in
\(U\text{.}\) We define
\(C^{k}(\bar U)\) to be the space of functions in
\(C^{k}(U)\) such that each of its
\(j\)th order partial derivatives has a continuous extension to
\(\bar U\text{.}\)
For any multi-index \(\alpha=(\alpha_{1},\ldots, \alpha_{n})\) of nonnegative integers we denote \(|\alpha| :=\alpha_{1}+\ldots+\alpha_{n}\) and \(D^{\alpha}f:=\frac{\partial^{|\alpha|} f}{\partial x_{1}^{\alpha_{1}}\cdots \partial x_{n}^{\alpha_{n}}}\text{.}\) Note that
\begin{equation*}
\Vert f\Vert_{C^{k}(\bar U)}:=\sum_{j=0}^{k}\sum_{|\alpha|=j}\max_{\bx \in \bar U}|D^{\alpha}f(\bx)|
\end{equation*}
defines a norm on \(C^{k}(\bar U)\) and makes the latter a complete metric space.
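As a concrete instance of this norm (a Python sketch; the function, interval, and grid are chosen here for illustration): for \(f(x)=\sin x\) on \(\bar U=[0,1]\text{,}\) the \(C^{1}\) norm is \(\max_{[0,1]}|\sin x| + \max_{[0,1]}|\cos x| = \sin 1 + 1\text{,}\) which a grid maximum recovers:

```python
import math

# approximate the C^1 norm of f(x) = sin(x) on [0, 1]:
# sum over j = 0, 1 of the maximum of |f^{(j)}| over the interval
xs = [k / 1000 for k in range(1001)]
norm_c1 = (max(abs(math.sin(x)) for x in xs)
           + max(abs(math.cos(x)) for x in xs))

# sin is increasing and cos decreasing on [0, 1], so the maxima
# are attained at the grid points x = 1 and x = 0 respectively
exact = math.sin(1.0) + 1.0
```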
Suppose that \(f\in C^{k}(U)\) and \(\bx \in U\text{.}\) Take any vector \(\bv \in \bbR^{n}\) and consider \(f(\bx+ t\bv)\) as a one variable function \(g(t)\) of \(t\) for \(t\) near \(0\text{.}\) Then by the chain rule
\begin{align*}
g'(t)\amp =\sum_{j=1}^{n} v_{j}\frac{\partial f}{\partial x_{j}}(\bx+ t\bv)\\
g''(t)\amp =\sum_{i, j=1}^{n} v_{j}v_{i}\frac{\partial^{2} f}{\partial x_{i}\partial x_{j}}(\bx+ t\bv)\\
\amp \;\;\vdots\\
g^{(k)}(t)\amp =\sum_{j_{1},\ldots, j_{k}=1}^{n}v_{j_{1}}\cdots v_{j_{k}}
\frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}(\bx+ t\bv).
\end{align*}
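These chain-rule formulas are easy to test numerically for small \(k\) (a Python sketch; the sample function \(f(x_{1},x_{2})=\sin x_{1}\, e^{x_{2}}\text{,}\) base point, and direction are chosen here for illustration):

```python
import math

def f(x1, x2):
    return math.sin(x1) * math.exp(x2)

x = (0.3, 0.5)          # base point
v = (1.0, 2.0)          # direction vector

def g(t):
    return f(x[0] + t * v[0], x[1] + t * v[1])

# g'(0) by central difference vs. the chain-rule sum over j
h = 1e-5
g1_num = (g(h) - g(-h)) / (2 * h)
g1_chain = (v[0] * math.cos(x[0]) * math.exp(x[1])     # v1 * df/dx1
            + v[1] * math.sin(x[0]) * math.exp(x[1]))  # v2 * df/dx2

# g''(0) by a second difference vs. the double sum over i, j
h2 = 1e-4
g2_num = (g(h2) - 2 * g(0.0) + g(-h2)) / (h2 * h2)
g2_chain = (v[0] * v[0] * (-math.sin(x[0])) * math.exp(x[1])
            + 2 * v[0] * v[1] * math.cos(x[0]) * math.exp(x[1])
            + v[1] * v[1] * math.sin(x[0]) * math.exp(x[1]))
```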
Then the one variable Taylor expansion
\begin{equation}
g(t)=g(0)+g'(0)t+\frac{g''(0)}{2!}t^{2}+\cdots+ \frac{g^{(k)}(0)}{k!}
t^{k} +R_{k}(t)\tag{5.6.1}
\end{equation}
gives rise to
\begin{align*}
f(\bx+ t\bv)= \amp f(\bx)+ \sum_{j=1}^{n} t v_{j}\frac{\partial f}{\partial x_{j}}(\bx)+
\frac{t^{2}}{2!} \sum_{i, j=1}^{n} v_{j}v_{i}\frac{\partial^{2} f}{\partial x_{i}\partial x_{j}}(\bx)\\
\amp +\cdots + \frac{t^{k}}{k!}\sum_{j_{1},\ldots,j_{k}=1}^{n} v_{j_{1}}\cdots v_{j_{k}}\frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}(\bx)+R_{k}(t).
\end{align*}
The remainder has the property that \(|R_{k}(t)|/|t|^{k} \to 0\) as \(t \to 0\text{.}\)
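The decay of the remainder can be observed numerically for \(k=2\) (a Python sketch; the sample function \(f(x_{1},x_{2})=e^{x_{1}}\cos x_{2}\text{,}\) base point, and direction are chosen here for illustration): since the remainder behaves like \(\Vert \bh\Vert^{3}\) for a \(C^{3}\) function, the ratio \(|R_{2}|/\Vert \bh\Vert^{2}\) should shrink roughly linearly with \(\Vert \bh\Vert\text{.}\)

```python
import math

def f(x1, x2):
    return math.exp(x1) * math.cos(x2)

x = (0.2, 0.4)
e, c, s = math.exp(x[0]), math.cos(x[1]), math.sin(x[1])

# closed-form gradient and Hessian of exp(x1) cos(x2) at x
grad = (e * c, -e * s)
hess = ((e * c, -e * s),
        (-e * s, -e * c))

def T2(h1, h2):
    # second order Taylor polynomial of f at x
    return (e * c + grad[0] * h1 + grad[1] * h2
            + 0.5 * (hess[0][0] * h1 * h1
                     + 2 * hess[0][1] * h1 * h2
                     + hess[1][1] * h2 * h2))

def ratio(t):
    h1, h2 = 0.6 * t, 0.8 * t            # direction (0.6, 0.8), so ||h|| = t
    return abs(f(x[0] + h1, x[1] + h2) - T2(h1, h2)) / (t * t)

# the ratio |R_2| / ||h||^2 decreases as ||h|| = t decreases
ratios = [ratio(10.0 ** (-m)) for m in (1, 2, 3)]
```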
To get the dependence of
\(R_{k}(t)\) in
(5.6.1) on
\(\bv\text{,}\) we use a version of
(5.6.1) with an integral remainder term:
\begin{align}
\amp g(t) \notag\\
= \amp g(0)+g'(0)t+\frac{g''(0)}{2!}t^{2}+\cdots+ \frac{g^{(k-1)}(0)}{(k-1)!}t^{k-1}
+\frac{1}{(k-1)!}\int_{0}^{t}g^{(k)}(s)(t-s)^{k-1}\, ds\text{,}\tag{5.6.2}
\end{align}
from which we find
\begin{equation*}
R_{k}(t)=\frac{1}{(k-1)!}\int_{0}^{t}\left( g^{(k)}(s)- g^{(k)}(0)\right) (t-s)^{k-1}\, ds.
\end{equation*}
For \(g(t)=f(\bx +t\bv)\text{,}\) if we make the change of variable \(s=t \tau\) in the above integral, we see that \(R_{k}(t)\) equals
\begin{align*}
\amp \frac{1}{(k-1)!}\int_{0}^{1}
\sum_{j_{1},\ldots, j_{k}=1}^{n} t^{k} v_{j_{1}}\cdots v_{j_{k}}
\left( \frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}(\bx+ \tau t\bv)-
\frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}(\bx)\right) (1-\tau)^{k-1}\, d\tau\\
=\amp \frac{1}{(k-1)!}\int_{0}^{1}
\sum_{j_{1},\ldots, j_{k}=1}^{n} h_{j_{1}}\cdots h_{j_{k}}
\left( \frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}(\bx+ \tau \bh)-
\frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}(\bx)\right) (1-\tau)^{k-1}\, d\tau
\end{align*}
so \(R_{k}(t)\) is actually a function of \(\bx\) and \(\bh=t\bv\text{.}\)
Using the continuity of \(\frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}\) at \(\bx\text{,}\) we find that, for any \(\epsilon > 0\text{,}\) there exists some \(\delta >0\) such that
\begin{equation*}
\left\Vert \frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}(\bx +\bh) -
\frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}(\bx)\right\Vert \lt \epsilon
\end{equation*}
for all \(\bh\) with \(\Vert \bh \Vert \lt \delta\text{.}\) Thus when \(0\le \Vert \bh \Vert \lt \delta\text{,}\) we have
\begin{align*}
\vert R_{k}(t)\vert \le \amp \frac{\epsilon}{(k-1)!} \int_{0}^{1}
\sum_{j_{1},\ldots,j_{k}=1}^{n} \vert h_{j_{1}}\vert \cdots \vert h_{j_{k}}\vert (1-\tau)^{k-1}\, d\tau \\
= \amp \frac{\epsilon}{k!} \sum_{j_{1},\ldots,j_{k}=1}^{n} \vert h_{j_{1}}\vert \cdots \vert h_{j_{k}}\vert
\le \frac{C(n, k)\, \epsilon\, \Vert \bh\Vert^{k} }{k!}\text{,}
\end{align*}
where we have used \(\sum_{j_{1},\ldots,j_{k}=1}^{n} \vert h_{j_{1}}\vert
\cdots \vert h_{j_{k}}\vert = \left( \sum_{j=1}^{n} \vert h_{j}\vert \right)^{k} \le C(n, k) \Vert \bh\Vert^{k}\) with \(C(n, k)=n^{k/2}\text{,}\) since \(\sum_{j=1}^{n}\vert h_{j}\vert \le \sqrt{n}\,\Vert \bh\Vert\) by the Cauchy–Schwarz inequality.
Theorem 5.6.5. Taylor Expansion.
Suppose that \(f\in C^{k}(U)\) and \(\bx \in U \subset \bbR^{n}\text{.}\) Then the \(k\)th order Taylor expansion of \(f\) at \(\bx\text{,}\) \(T_{k}(f,\bx)(\bh) \text{,}\) defined as
\begin{align*}
\amp f(\bx)+ \sum_{j=1}^{n} h_{j}\frac{\partial f}{\partial x_{j}}(\bx)+
\frac{1}{2!} \sum_{i, j=1}^{n} h_{j}h_{i}\frac{\partial^{2} f}{\partial x_{i}\partial x_{j}}(\bx)
+\cdots \\
+ \amp \frac{1}{k!}\sum_{j_{1},\ldots,j_{k}=1}^{n} h_{j_{1}}\cdots h_{j_{k}}\frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}(\bx)
\end{align*}
satisfies
\begin{equation}
\Vert f(\bx +\bh)- T_{k}(f,\bx)(\bh) \Vert/\Vert \bh \Vert^{k} \to 0 \text{ as $\bh \to \mathbf 0$}.\tag{5.6.3}
\end{equation}
Furthermore, under the assumption here, for any subdomain \(U'\) of \(U\) such that the closure of \(U'\) is compact and any \(\epsilon >0\text{,}\) there exists \(\delta >0\) such that
\begin{equation*}
\Vert f(\bx +\bh)- T_{k}(f,\bx)(\bh) \Vert \le \epsilon \Vert \bh \Vert^{k} \text{ for all $\bx\in U',
\Vert \bh \Vert \lt \delta$.}
\end{equation*}
The expansion in Theorem 5.6.5 is used most often when
\(k=2\text{,}\) where we can write
\begin{equation*}
\sum_{i, j=1}^{n} v_{j}v_{i}\frac{\partial^{2} f}{\partial x_{i}\partial x_{j}}(\bx)
=\bv^{\rm t}[D^{2}f(\bx)]\bv
\end{equation*}
with \([D^{2}f(\bx)]\) denoting the Hessian matrix of \(f\) at \(\bx\) with entries \(\frac{\partial^{2} f}{\partial x_{i}\partial x_{j}}(\bx)\text{,}\) and \(\bv^{\rm t}\) denoting the transpose of \(\bv\text{.}\) If \(\bx\) is an interior minimum of \(f\text{,}\) then for any vector \(\bv\text{,}\) the one variable function \(g(t)=f(\bx+t\bv)\) has \(t=0\) as an interior minimum. Therefore
\begin{equation*}
g''(0)=\sum_{i, j=1}^{n} v_{j}v_{i}\frac{\partial^{2} f}{\partial x_{i}\partial x_{j}}(\bx) \ge 0.
\end{equation*}
This then implies that the Hessian matrix \([D^{2}f(\bx)]\) is non-negative definite. Conversely, if \(\bx\) is an interior critical point of a twice continuously differentiable \(f\text{,}\) namely, \(D_{i}f(\bx)=0\) for all \(i=1,\ldots, n\text{,}\) and \([D^{2}f(\bx)]\) is positive definite, then the second order Taylor expansion above shows that \(\bx\) is a strict local minimum of \(f\text{.}\)
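A minimal numerical illustration of this second derivative test (a Python sketch; the quadratic example is chosen here for illustration): \(f(x,y)=x^{2}+xy+y^{2}\) has a critical point at the origin, and its Hessian there has diagonal entries \(2\) and off-diagonal entries \(1\text{,}\) which is positive definite since the leading principal minors are \(2\) and \(3\text{;}\) accordingly \(f\) exceeds \(f(0,0)=0\) at every nearby nonzero point.

```python
import math

def f(x, y):
    return x * x + x * y + y * y

# Hessian at the critical point (0, 0) is [[2, 1], [1, 2]];
# positive definiteness via leading principal minors: 2 and 2*2 - 1*1 = 3
minors = (2.0, 2.0 * 2.0 - 1.0 * 1.0)

# sample f on a small circle around the origin: every value should be > 0
r = 1e-3
vals = [f(r * math.cos(a), r * math.sin(a))
        for a in (2 * math.pi * k / 360 for k in range(360))]
```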
Exercise 5.6.7.
Prove
(5.6.3) under the assumption that all partial derivatives of
\(f\) of order up to
\(k-1\) are defined and continuous in a neighborhood of
\(\bx\text{,}\) and all partial derivatives of
\(f\) of order
\((k-1)\) are differentiable at
\(\bx\text{.}\)
Hint.
Use
(5.6.2) with the integral remainder term at order
\(k-1\) and use the differentiability of the order
\((k-1)\) partial derivatives of
\(f\) at
\(\bx\text{,}\) which appear in
\(g^{(k-1)}(s)\) in the integral remainder term.
Exercise 5.6.8.
Prove that, if
(5.6.4) holds for some
\(P_{k}(\bx; \bh)\text{,}\) then it is unique.
Exercise 5.6.9.
Construct an example of a function such that
(5.6.4) holds for some
\(P_{k}(\bx; \bh)\text{,}\) but
\(f\) fails to have derivatives at a sequence
\(\bx_{m}\to \bx\text{.}\)