
Section 5.6 Higher Order Derivatives and Taylor Expansion

When \(\bff :U\subset \bbR^{n}\to \bbR^{m}\) is differentiable in \(U\text{,}\) its Jacobian matrix \([D\bff (\bx)]\) can be considered as a map from \(U\) into the vector space of \(m\times n\) matrices, which can be identified with \(\bbR^{m\times n}\text{.}\) Thus it makes sense to consider whether \([D\bff (\bx)]\) is differentiable in \(U\text{.}\)
If \([D\bff (\bx)]\) is differentiable at \(\bx \in U\text{,}\) then for \(\bh\) with \(\Vert \bh\Vert\) small,
\begin{equation*} \Vert [D\bff (\bx+\bh)]-[D\bff (\bx)] - [D\left(D\bff\right) (\bx)]\bh\Vert /\Vert \bh \Vert \to 0, \text{ as $\bh\to 0$,} \end{equation*}
where \(\bh \mapsto [D\left(D\bff\right) (\bx)]\bh\) is a linear map from \(\bbR^{n}\) into the vector space of \(m\times n\) matrices \(\bbR^{m\times n}\text{:}\) if \(\bff(\bx)=(f_{1}(\bx),\cdots, f_{m}(\bx))\) and \(\bh=(h_{1},\cdots, h_{n})\text{,}\) then each component \(\frac{\partial f_{i}}{\partial x_{j}}\) of \([D\bff (\bx)]\) is differentiable at \(\bx\) and has directional derivatives at \(\bx\) in any direction, and
\begin{equation*} [D\left(D\bff\right) (\bx)]\bh= \sum_{k=1}^{n} h_{k}[ D_{x_{k}}\left(D\bff\right) (\bx)]. \end{equation*}
In terms of the \((i, j)\) entry of the output matrix, it is
\begin{equation*} \sum_{k=1}^{n} h_{k}D_{x_{k}}\left( \frac{\partial f_{i}}{\partial x_{j}}\right) (\bx) := \sum_{k=1}^{n} h_{k} \frac{\partial^{2} f_{i}}{\partial x_{k}\partial x_{j}} (\bx)\text{.} \end{equation*}
These quantities \(D_{x_{k}}\left( \frac{\partial f_{i}}{\partial x_{j}}\right) (\bx) = \frac{\partial^{2} f_{i}}{\partial x_{k}\partial x_{j}} (\bx)\) are called the second derivatives of \(f_{i}(\bx)\text{.}\)
We will not spend energy on the more abstract concept of higher order differentials, but focus on the higher order (partial) derivatives of a scalar-valued function, where we define, say, third order derivatives of a scalar function \(f(\bx)\) via
\begin{equation*} D^{3}_{x_{l}x_{k}x_{j}} f(\bx):= \frac{\partial^{3} f}{\partial x_{l} \partial x_{k}\partial x_{j}} (\bx)= D_{x_{l}}\left( \frac{\partial^{2} f}{\partial x_{k}\partial x_{j}} \right)(\bx), \end{equation*}
when this derivative is defined. We will often work in a setting where all the \(k\)th order partial derivatives of a function are continuous in a region; then all of its \(j\)th order partial derivatives, for \(j\le k-1\text{,}\) are differentiable there.
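These iterated partial derivatives can be approximated by nested difference quotients. The following sketch is purely illustrative: the test function \(f(x,y)=x^{3}y^{2}\) and the step size are ad hoc choices, and \(D^{3}_{xxy}f=12xy\) is easy to verify by hand.

```python
def f(x, y):
    # ad hoc test function whose partial derivatives are easy to find by hand
    return x**3 * y**2

def dx(g, h=1e-3):
    # central difference quotient in the x variable
    return lambda x, y: (g(x + h, y) - g(x - h, y)) / (2 * h)

def dy(g, h=1e-3):
    # central difference quotient in the y variable
    return lambda x, y: (g(x, y + h) - g(x, y - h)) / (2 * h)

# D^3_{x x y} f = D_x(D_x(D_y f)); for this f the exact value is 12*x*y
d3 = dx(dx(dy(f)))
print(d3(1.0, 1.0))  # close to 12
```

Shrinking \(h\) improves the truncation error until floating point round-off takes over.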
One basic question is whether the order in which mixed higher order partial derivatives are taken affects the outcome.

Example 5.6.1.

Define \(f(x, y)=x^{y}\) for \(x, y \gt 0\text{.}\) Then
\begin{equation*} D_{x} f(x, y)= y x^{y-1}, \; D_{y}f(x, y)=x^{y} \ln x\text{,} \end{equation*}
and
\begin{align*} \frac{\partial^2 f}{\partial y \partial x} = \frac{\partial }{\partial y}\left( \frac{\partial f}{\partial x} \right) = \frac{\partial }{\partial y}\left( y x^{y-1} \right) \amp= x^{y-1}+ y x^{y-1} \ln x,\\ \frac{\partial^2 f}{\partial x \partial y} = \frac{\partial }{\partial x}\left( \frac{\partial f}{\partial y} \right) = \frac{\partial }{\partial x}\left(x^{y} \ln x \right) \amp= y x^{y-1} \ln x + x^{y-1}. \end{align*}
So \(\frac{\partial^2 f}{\partial y \partial x}= \frac{\partial^2 f}{\partial x \partial y}\) for this function. But this property does require some conditions on the function.
Consider
\begin{equation*} f(x, y)=\begin{cases} xy \frac{x^{2}-y^{2}}{x^{2}+y^{2}}\quad \amp (x, y)\ne (0, 0)\\ 0 \quad \amp (x, y)= (0, 0)\\ \end{cases} \end{equation*}
Then at \((x, y)\ne (0, 0)\text{,}\)
\begin{align*} \frac{\partial f}{\partial x}(x, y) \amp= \frac{y \left(4 x^2 y^2+x^4-y^4\right)}{\left(x^2+y^2\right)^2},\\ \frac{\partial f}{\partial y}(x, y) \amp=\frac{-4 x^3 y^2+x^5-x y^4}{\left(x^2+y^2\right)^2},\\ \frac{\partial^{2} f}{\partial y \partial x}(x, y) \amp=\frac{\left(x^2-y^2\right) \left(10 x^2 y^2+x^4+y^4\right)}{\left(x^2+y^2\right)^3},\\ \frac{\partial^{2} f}{\partial x \partial y}(x, y) \amp=\frac{\left(x^2-y^2\right) \left(10 x^2 y^2+x^4+y^4\right)}{\left(x^2+y^2\right)^3}, \end{align*}
so we see that
\begin{equation*} \frac{\partial^{2} f}{\partial y \partial x}(x, y)= \frac{\partial^{2} f}{\partial x \partial y}(x, y) \quad \text{ when } (x, y)\ne (0, 0). \end{equation*}
We can also verify directly by definition that
\begin{equation*} \frac{\partial f}{\partial x}(0, 0)=0, \quad \frac{\partial f}{\partial y}(0, 0)=0, \end{equation*}
and to compute \(\frac{\partial^{2} f}{\partial y \partial x}(0, 0)\text{,}\) we only need to examine the derivative with respect to \(y\) of \(\frac{\partial f}{\partial x}(0, y)= -y\text{,}\) which gives \(-1\text{;}\) while to compute \(\frac{\partial^{2} f}{\partial x \partial y}(0, 0)\text{,}\) we only need to examine the derivative with respect to \(x\) of \(\frac{\partial f}{\partial y}(x, 0)=x\text{,}\) which gives \(1\text{.}\) Thus
\begin{equation*} \frac{\partial^{2} f}{\partial y \partial x}(0, 0)=-1\ne 1= \frac{\partial^{2} f}{\partial x \partial y}(0, 0). \end{equation*}
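For readers who like to experiment, the sign discrepancy can be reproduced numerically. In the sketch below, the step sizes \(h\) and \(t\) are ad hoc choices; the two mixed second derivatives at the origin are approximated by central difference quotients.

```python
def f(x, y):
    # the counterexample above
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def d1(x, y, h=1e-6):
    # central difference quotient for df/dx
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def d2(x, y, h=1e-6):
    # central difference quotient for df/dy
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

t = 1e-3
d21 = (d1(0.0, t) - d1(0.0, -t)) / (2 * t)  # approximates d^2 f/(dy dx)(0,0)
d12 = (d2(t, 0.0) - d2(-t, 0.0)) / (2 * t)  # approximates d^2 f/(dx dy)(0,0)
print(d21, d12)  # close to -1 and 1
```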

Theorem 5.6.2.

Suppose that \(f\) is defined in a neighborhood of a point \(\bx\in \bbR^{n}\text{,}\) and that for some pair \(i\ne j\text{,}\) the partial derivatives \(D_{i}f\text{,}\) \(D_{j}f\text{,}\) and \(D_{ij}f=D_{i}\left(D_{j}f\right)\) exist in this neighborhood, with \(D_{ij}f\) continuous at \(\bx\text{.}\) Then \(D_{ji}f(\bx)\) exists and \(D_{ji}f(\bx)=D_{ij}f(\bx)\text{.}\) This statement is commonly known as Clairaut’s theorem.

Proof.

Without loss of generality, we may assume \(\bx=\mathbf 0\text{,}\) \(i=1\text{,}\) \(j=2\text{,}\) and \(n=2\text{.}\) Then
\begin{align*} D_{21}f(0, 0)=\amp \lim_{y\to 0}\frac{D_{1}f(0, y)-D_{1}f(0, 0)}{y}\\ =\amp \lim_{y\to 0} \lim_{x\to 0}\frac{ f(x, y)-f(0, y)-f(x, 0)+f(0,0)}{xy}. \end{align*}
But applying the mean value theorem to \(f(x, y)-f(0, y)\) as a function of \(y\text{,}\) we get
\begin{equation*} f(x, y)-f(0, y)-[f(x, 0)-f(0, 0)]=\left[D_{2}f(x, y^{*})-D_{2}f(0, y^{*})\right]y \end{equation*}
for some \(y^{*}\) between \(0\) and \(y\) which may also depend on \(x\text{.}\) Applying the mean value theorem to \(D_{2}f(x, y^{*})-D_{2}f(0, y^{*})\) as a function of \(x\text{,}\) we get
\begin{equation*} D_{2}f(x, y^{*})-D_{2}f(0, y^{*})=D_{12}f(x^{*}, y^{*})x \end{equation*}
for some \(x^{*}\) between \(0\) and \(x\text{.}\) Using the continuity of \(D_{12}f(x, y)\) at \((0, 0)\text{,}\) it follows that
\begin{equation*} D_{21}f(0, 0)=\lim_{y\to 0} \lim_{x\to 0} D_{12}f(x^{*}, y^{*})=D_{12}f(0, 0). \end{equation*}

Remark 5.6.3.

Note that both the formulation and proof of Clairaut’s theorem only involve the behavior of the function along the plane spanned by two specific coordinate directions, so it does not by itself imply the differentiability of \(D_{i}f\text{.}\) In fact, it is possible to have a function \(f\) of two variables such that \(D_{i}f, D_{ij}f\) exist and \(D_{ij}f=D_{ji}f\text{,}\) yet \(D_{i}f\) may fail to be differentiable. Here is a simple example:
\begin{equation*} f(x, y)=\begin{cases} \frac{x^{2}y^{2}}{x^{2}+y^{2}}\amp (x, y)\ne (0, 0);\\ 0 \amp (x, y)=(0, 0). \end{cases} \end{equation*}
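For this \(f\text{,}\) one computes by hand that \(D_{1}f(x,y)=2xy^{4}/(x^{2}+y^{2})^{2}\) away from the origin and \(D_{1}f(0,0)=0\text{.}\) The following numerical sketch (with ad hoc step sizes) illustrates the failure of differentiability: both partial derivatives of \(D_{1}f\) vanish at the origin, yet \(D_{1}f(t,t)=t/2\) is not \(o(\vert t\vert)\text{.}\)

```python
def d1f(x, y):
    # D_1 f for f(x,y) = x^2 y^2/(x^2+y^2), computed by hand:
    # 2xy^4/(x^2+y^2)^2 away from the origin, and 0 at the origin
    if x == 0.0 and y == 0.0:
        return 0.0
    return 2 * x * y**4 / (x * x + y * y) ** 2

# both partial derivatives of D_1 f at the origin vanish:
d11 = (d1f(1e-6, 0.0) - d1f(0.0, 0.0)) / 1e-6
d21 = (d1f(0.0, 1e-6) - d1f(0.0, 0.0)) / 1e-6
# yet along the diagonal D_1 f(t,t) = t/2, which is not o(|t|),
# so D_1 f cannot be differentiable at (0,0)
ratio = d1f(1e-6, 1e-6) / 1e-6
print(d11, d21, ratio)  # ratio stays near 1/2
```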
Fortunately, in most contexts we will work with functions whose second order derivatives are continuous in the domain of interest, so their first order derivatives are differentiable.

Definition 5.6.4.

Suppose that \(U\subset \bbR^{n}\) is open and \(k\in \bbN\text{.}\) We define \(C^{k}(U)\) to be the space of functions which have continuous \(j\)th order partial derivatives in \(U\) for \(1\le j\le k\text{.}\) When \(k=1\) we say functions in \(C^{1}(U)\) are continuously differentiable in \(U\text{,}\) and when \(k\gt 1\text{,}\) we say that functions in \(C^{k}(U)\) are \(k\)-times continuously differentiable in \(U\text{.}\) We define \(C^{k}(\bar U)\) to be the space of functions in \(C^{k}(U)\) such that each of their \(j\)th order partial derivatives has a continuous extension to \(\bar U\text{.}\)
For any multi-index \(\alpha=(\alpha_{1},\ldots, \alpha_{n})\) we denote \(|\alpha| :=\alpha_{1}+\ldots+\alpha_{n}\) and \(D^{\alpha}f := \frac{\partial^{|\alpha|} f}{\partial x_{1}^{\alpha_{1}}\cdots \partial x_{n}^{\alpha_{n}}}\text{.}\) Note that
\begin{equation*} \Vert f\Vert_{C^{k}(\bar U)}:=\sum_{j=0}^{k}\sum_{|\alpha|=j}\max_{\bx \in \bar U}|D^{\alpha}f(\bx)| \end{equation*}
defines a norm on \(C^{k}(\bar U)\) and makes the latter a complete metric space.
Suppose that \(f\in C^{k}(U)\) and \(\bx \in U\text{.}\) Take any vector \(\bv \in \bbR^{n}\) and consider \(f(\bx+ t\bv)\) as a one variable function \(g(t)\) of \(t\) for \(t\) near \(0\text{.}\) Then by the chain rule
\begin{align*} g'(t)\amp =\sum_{j=1}^{n} v_{j}\frac{\partial f}{\partial x_{j}}(\bx+ t\bv)\\ g''(t)\amp =\sum_{i, j=1}^{n} v_{j}v_{i}\frac{\partial^{2} f}{\partial x_{i}\partial x_{j}}(\bx+ t\bv)\\ g^{(k)}(t)\amp =\sum_{j_{1},\ldots, j_{k}=1}^{n}v_{j_{1}}\cdots v_{j_{k}} \frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}(\bx+ t\bv) \end{align*}
Then the one variable Taylor expansion
\begin{equation} g(t)=g(0)+g'(0)t+\frac{g''(0)}{2!}t^{2}+\cdots+ \frac{g^{(k)}(0)}{k!} t^{k} +R_{k}(t)\tag{5.6.1} \end{equation}
gives rise to
\begin{align*} f(\bx+ t\bv)= \amp f(\bx)+ \sum_{j=1}^{n} t v_{j}\frac{\partial f}{\partial x_{j}}(\bx)+ \frac{t^{2}}{2!} \sum_{i, j=1}^{n} v_{j}v_{i}\frac{\partial^{2} f}{\partial x_{i}\partial x_{j}}(\bx)\\ \amp +\cdots + \frac{t^{k}}{k!}\sum_{j_{1},\ldots,j_{k}=1}^{n} v_{j_{1}}\cdots v_{j_{k}}\frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}(\bx)+R_{k}(t). \end{align*}
The remainder has the property that \(\vert R_{k}(t)\vert/\vert t\vert^{k} \to 0\) as \(t \to 0\text{.}\)
To get the dependence of \(R_{k}(t)\) in (5.6.1) on \(\bv\text{,}\) we use a version of (5.6.1) with an integral remainder term:
\begin{align} \amp g(t) \notag\\ = \amp g(0)+g'(0)t+\frac{g''(0)}{2!}t^{2}+\cdots+ \frac{g^{(k-1)}(0)}{(k-1)!}t^{k-1} +\frac{1}{(k-1)!}\int_{0}^{t}g^{(k)}(s)(t-s)^{k-1}\, ds\text{,}\tag{5.6.2} \end{align}
from which we find
\begin{equation*} R_{k}(t)=\frac{1}{(k-1)!}\int_{0}^{t}\left( g^{(k)}(s)- g^{(k)}(0)\right) (t-s)^{k-1}\, ds. \end{equation*}
For \(g(t)=f(\bx +t\bv)\text{,}\) making the change of variable \(s=t \tau\) in the above integral and writing \(\bh=t\bv\text{,}\) we see that \(R_{k}(t)\) equals
\begin{align*} \amp \frac{1}{(k-1)!}\int_{0}^{1} \sum_{j_{1},\ldots, j_{k}=1}^{n} t^{k} v_{j_{1}}\cdots v_{j_{k}} \left( \frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}(\bx+ \tau t\bv)- \frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}(\bx)\right) (1-\tau)^{k-1}\, d\tau\\ =\amp \frac{1}{(k-1)!}\int_{0}^{1} \sum_{j_{1},\ldots, j_{k}=1}^{n} h_{j_{1}}\cdots h_{j_{k}} \left( \frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}(\bx+ \tau \bh)- \frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}(\bx)\right) (1-\tau)^{k-1}\, d\tau \end{align*}
so \(R_{k}(t)\) is actually a function of \(\bx\) and \(\bh=t\bv\text{.}\)
Using the continuity of \(\frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}\) at \(\bx\text{,}\) we find that, for any \(\epsilon > 0\text{,}\) there exists some \(\delta >0\) such that
\begin{equation*} \left\Vert \frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}(\bx +\bh) - \frac{\partial^{k} f}{\partial x_{j_{k}}\cdots \partial x_{j_{1}}}(\bx)\right\Vert \lt \epsilon \end{equation*}
for all \(\bh\) with \(\Vert \bh \Vert \lt \delta\text{.}\) Thus when \(0\le \Vert \bh \Vert \lt \delta\text{,}\) we have
\begin{align*} \vert R_{k}(t)\vert \le \amp \frac{\epsilon}{(k-1)!} \int_{0}^{1} \sum_{j_{1},\ldots,j_{k}=1}^{n} \vert h_{j_{1}}\vert \cdots \vert h_{j_{k}}\vert (1-\tau)^{k-1}\, d\tau \\ \le \amp \frac{C(n, k) \epsilon \Vert \bh\Vert^{k} }{k!} \end{align*}
where we have used \(\sum_{j_{1},\ldots,j_{k}=1}^{n} \vert h_{j_{1}}\vert \cdots \vert h_{j_{k}}\vert \le C(n, k) \Vert \bh\Vert^{k}\text{.}\)
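The constant here can be made explicit: one may take \(C(n,k)=n^{k/2}\text{,}\) since by the Cauchy–Schwarz inequality
\begin{equation*} \sum_{j_{1},\ldots,j_{k}=1}^{n} \vert h_{j_{1}}\vert \cdots \vert h_{j_{k}}\vert =\Big(\sum_{j=1}^{n}\vert h_{j}\vert\Big)^{k} \le \left(\sqrt{n}\,\Vert \bh \Vert\right)^{k} = n^{k/2}\Vert \bh\Vert^{k}. \end{equation*}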
We summarize this as follows.

Theorem 5.6.5.

Suppose that \(f\in C^{k}(U)\text{,}\) where \(U\subset \bbR^{n}\) is open, and \(\bx \in U\text{.}\) Define
\begin{equation*} T_{k}(f,\bx)(\bh):= f(\bx)+ \sum_{l=1}^{k} \frac{1}{l!} \sum_{j_{1},\ldots,j_{l}=1}^{n} h_{j_{1}}\cdots h_{j_{l}} \frac{\partial^{l} f}{\partial x_{j_{l}}\cdots \partial x_{j_{1}}}(\bx). \end{equation*}
Then
\begin{equation} \vert f(\bx +\bh)- T_{k}(f,\bx)(\bh)\vert/\Vert \bh \Vert^{k} \to 0 \text{ as $\bh \to \mathbf 0$}.\tag{5.6.3} \end{equation}

Remark 5.6.6.

(5.6.3) can also be established under the weaker assumption that all partial derivatives of \(f\) of order up to \(k-1\) are defined and continuous in a neighborhood of \(\bx\text{,}\) and all partial derivatives of \(f\) of order \(k-1\) are differentiable at \(\bx\text{.}\)
\(T_{k}(f,\bx)(\bh)\) is a polynomial in \(\bh\) of degree at most \(k\text{.}\) There are contexts where one works with a function \(f\) with the property that at some \(\bx\text{,}\) there exists a polynomial in \(\bh\) of degree at most \(k\text{,}\) \(P_{k}(\bx;\bh)\) such that
\begin{equation} \Vert f(\bx +\bh)- P_{k}(\bx;\bh)\Vert/\Vert \bh \Vert^{k} \to 0 \text{ as $\bh \to \mathbf 0$}.\tag{5.6.4} \end{equation}
When this holds, such a \(P_{k}(\bx;\bh)\) is unique and \(f\) is differentiable at \(\bx\) with \(f(\bx)=P_{k}(\bx; \mathbf 0)\text{,}\) \(D_{\bx}f(\bx)=D_{\bh}P_{k}(\bx; \mathbf 0)\text{,}\) but \(f\) may fail to have derivatives at points near \(\bx\text{.}\)
The expansion (5.6.3) is used often when \(k=2\text{,}\) where we can write
\begin{equation*} \sum_{i, j=1}^{n} v_{j}v_{i}\frac{\partial^{2} f}{\partial x_{i}\partial x_{j}}(\bx) =\bv^{\rm t}[D^{2}f(\bx)]\bv \end{equation*}
with \([D^{2}f(\bx)]\) denoting the Hessian matrix of \(f\) at \(\bx\) with entries \(\frac{\partial^{2} f}{\partial x_{i}\partial x_{j}}(\bx)\text{,}\) and \(\bv^{\rm t}\) denoting the transpose of \(\bv\text{.}\) If \(\bx\) is an interior minimum of \(f\text{,}\) then for any vector \(\bv\text{,}\) the one variable function \(f(\bx+t\bv)\) has \(t=0\) as an interior minimum. Therefore
\begin{equation*} g''(0)=\sum_{i, j=1}^{n} v_{j}v_{i}\frac{\partial^{2} f}{\partial x_{i}\partial x_{j}}(\bx) \ge 0. \end{equation*}
This then implies that the Hessian matrix \([D^{2}f(\bx)]\) is non-negative definite. Conversely, if \(\bx\) is an interior critical point of a twice continuously differentiable \(f\text{,}\) namely, \(D_{i}f(\bx)=0\) for all \(i=1,\ldots, n\text{,}\) and \([D^{2}f(\bx)]\) is positive definite, then the Taylor expansion of order \(2\) above would show that \(\bx\) is a local minimum of \(f\text{.}\)
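To see the order-\(2\) expansion in action, here is a small numerical sketch; the test function \(e^{x}\cos y\) and the step sizes are ad hoc choices. With the hand-computed gradient \((1,0)\) and Hessian \(\operatorname{diag}(1,-1)\) at the origin, the ratio \(\vert R_{2}(\bh)\vert/\Vert\bh\Vert^{2}\) shrinks as \(\bh\to \mathbf 0\text{.}\)

```python
import math

def f(x, y):
    # ad hoc smooth test function
    return math.exp(x) * math.cos(y)

# gradient and Hessian of f at the origin, computed by hand
grad = (1.0, 0.0)
hess = ((1.0, 0.0), (0.0, -1.0))

def taylor2(h1, h2):
    # order-2 Taylor polynomial of f at the origin
    quad = hess[0][0] * h1 * h1 + 2 * hess[0][1] * h1 * h2 + hess[1][1] * h2 * h2
    return f(0.0, 0.0) + grad[0] * h1 + grad[1] * h2 + quad / 2.0

# |R_2(h)| / ||h||^2 along h = (t, t): tends to 0 with t
ratios = [abs(f(t, t) - taylor2(t, t)) / (2 * t * t) for t in (1e-1, 1e-2, 1e-3)]
print(ratios)
```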

Exercise 5.6.7.

Prove (5.6.3) under the assumption that all partial derivatives of \(f\) of order up to \(k-1\) are defined and continuous in a neighborhood of \(\bx\text{,}\) and all partial derivatives of \(f\) of order \(k-1\) are differentiable at \(\bx\text{.}\)
Hint.
Use (5.6.2) with the integral remainder term at order \(k-1\text{,}\) and use the differentiability at \(\bx\) of the order \((k-1)\) partial derivatives of \(f\text{,}\) which appear in \(g^{(k-1)}(s)\) in the integral remainder term.

Exercise 5.6.8.

Exercise 5.6.9.

Construct an example of a function such that (5.6.4) holds for some \(P_{k}(\bx; \bh)\text{,}\) but \(f\) fails to have derivatives at a sequence \(\bx_{m}\to \bx\text{.}\)