
Section 5.3 Differentiation

Differentiability of a function of several variables is defined in terms of a linear approximation. The notions of directional derivative and partial derivative arise when such a function is studied along a one-dimensional line through a point.

Subsection 5.3.1 Differentiability, Directional Derivative, and Partial Derivative

Definition 5.3.1. Differentiability and Jacobian Matrix.

Suppose that \(X\) and \(Y\) are normed vector spaces, that \(E\subset X\) and \(\bx\) is an interior point of \(E\text{.}\) \({\mathbf f}:E\mapsto Y\) is said to be differentiable at \(\mathbf x\) if there exists a linear map \(A: X\mapsto Y\) with finite operator norm such that
\begin{equation} \lim_{{\mathbf h}\to {\mathbf 0}} \frac{\Vert {\mathbf f}({\mathbf x}+{\mathbf h})- {\mathbf f}({\mathbf x})-A{\mathbf h} \Vert_{Y}}{\Vert {\mathbf h} \Vert_{X}}=0.\tag{5.3.1} \end{equation}
In other words, for any \(\epsilon >0\text{,}\) there exists some \(\delta>0\) such that
\begin{equation*} \Vert {\mathbf f}({\mathbf x}+{\mathbf h})- {\mathbf f}({\mathbf x})-A{\mathbf h}\Vert_{Y} \le \epsilon \Vert {\mathbf h} \Vert_{X} \text{ for all } {\mathbf h} \text { with } \Vert {\mathbf h} \Vert < \delta. \end{equation*}
When \(X=\bbR^{n}\text{,}\) \(Y=\bbR^{m}\text{,}\) and \(\bff\) is differentiable at \(\bx\text{,}\) the linear map \(\bh\mapsto A\bh\) can be represented as matrix multiplication on \(\bh\) in the usual rectangular coordinates. We will denote this matrix also by \(A\text{,}\) call it the Jacobian matrix (also called the Jacobian derivative or total derivative) of \(\bff\) at \(\bx\text{,}\) and also denote it by \([D\mathbf f (\bx)]\text{.}\)
The key here is that
  • The linear approximation part, \(A {\mathbf h}\text{,}\) depends on \({\mathbf h}\) in a linear fashion with a finite operator norm: \(||A {\mathbf h}||_{Y}\le ||A|| ||{\mathbf h}||_{X}\) for all \({\mathbf h}\text{.}\)
  • The \(\delta\) works for all \({\mathbf h}\) with \(\Vert {\mathbf h} \Vert < \delta\text{,}\) regardless of its direction.
  • If we set
    \begin{equation*} R_{\bff}(\bx; \bh)= {\mathbf f}({\mathbf x}+{\mathbf h})- {\mathbf f}({\mathbf x})-A{\mathbf h}\text{,} \end{equation*}
    it is the remainder term when \({\mathbf f}({\mathbf x}+{\mathbf h})- {\mathbf f}({\mathbf x})\) is approximated by \(A{\mathbf h}\text{,}\) and (5.3.1) is equivalent to
    \begin{equation*} \Vert R_{\bff}(\bx; \bh) \Vert_{Y}/\Vert \bh \Vert_{X}\to 0 \text{ as $\bh\to \mathbf 0$.} \end{equation*}
One should check that when \(\bff\) is differentiable at \(\bx\text{,}\) only one linear map \(A\) can satisfy the condition in the definition.
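The remainder condition can be checked numerically. Below is a minimal sketch (my own illustration, not from the text) for the map \(f(x, y)=(x^{2}+y,\, xy)\text{,}\) whose Jacobian at \((1, 2)\) is \(A=\begin{bmatrix}2&1\\2&1\end{bmatrix}\) by a hand computation; the ratio \(\Vert R_{\bff}(\bx;\bh)\Vert/\Vert\bh\Vert\) shrinks with \(\bh\text{.}\)

```python
import numpy as np

# For f(x, y) = (x^2 + y, x*y), the Jacobian at x0 = (1, 2) is A below,
# and the remainder R(h) = f(x0 + h) - f(x0) - A h satisfies ||R||/||h|| -> 0.

def f(v):
    x, y = v
    return np.array([x**2 + y, x * y])

x0 = np.array([1.0, 2.0])
A = np.array([[2.0, 1.0], [2.0, 1.0]])  # hand-computed Jacobian of f at x0

rng = np.random.default_rng(0)
for scale in [1e-2, 1e-4, 1e-6]:
    h = scale * rng.standard_normal(2)
    R = f(x0 + h) - f(x0) - A @ h
    print(scale, np.linalg.norm(R) / np.linalg.norm(h))
```

A direct computation shows \(R(\bh)=(h_1^{2},\, h_1 h_2)\text{,}\) so the printed ratio is exactly \(|h_1|\text{,}\) of the same order as \(\Vert\bh\Vert\text{.}\)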

Definition 5.3.2. Directional Derivative.

Suppose \(D\subset X\text{,}\) \(\bx\) is an interior point of \(D\text{,}\) and \({\mathbf u}\) is a non-zero vector (often taken to be a unit vector). \({\mathbf f}:D\mapsto Y\) is said to have a directional derivative at \({\mathbf x}\) in the direction \({\mathbf u}\) if the one-variable function \(t\mapsto {\mathbf f}({\mathbf x}+t{\mathbf u})\) is differentiable at \(t=0\text{.}\) In such a case the derivative of this one-variable function at \(t=0\) is called the directional derivative of \(\bff\) at \({\mathbf x}\) in the direction \({\mathbf u}\text{,}\) and is denoted \(D_{{\mathbf u}} {\mathbf f}({\mathbf x})\text{.}\)
Note that the existence of the directional derivative of \(\bff\) at \(\bx\) in the direction \({\mathbf u}\) means that
\begin{equation*} \lim_{t\to 0} \frac{ {\mathbf f}({\mathbf x}+t{\mathbf u})- {\mathbf f}({\mathbf x})}{t}={\mathbf v} \end{equation*}
for some vector \({\mathbf v}\text{.}\) This can also be formulated as
\begin{equation*} \lim_{t\to 0} \Vert \frac{ {\mathbf f}({\mathbf x}+t{\mathbf u})- {\mathbf f}({\mathbf x})-t {\mathbf v} }{t} \Vert_{Y}=0, \end{equation*}
or equivalently, for any \(\epsilon >0\text{,}\) there exists some \(\delta>0\) such that
\begin{equation*} \Vert {\mathbf f}({\mathbf x}+t{\mathbf u})- {\mathbf f}({\mathbf x})-t {\mathbf v} \Vert_{Y} < \epsilon |t| \text{ for all } t \text{ with } |t|< \delta. \end{equation*}
But in this formulation there is no condition on how \({\mathbf v}\) may depend on \({\mathbf u}\) and the \(\delta\) here may also depend on \({\mathbf u}\text{.}\)

Remark 5.3.3.

The definition of differentiability involves the norms on \(X\) and \(Y\text{.}\) But due to Lemma 5.2.13, when \(X\) is finite dimensional, it does not matter which norm we use on \(X\text{;}\) similarly, when \(Y\) is finite dimensional, it does not matter which norm we use on \(Y\text{.}\)
From now on we will restrict to the case of maps between finite dimensional vector spaces, and usually take \(D=\bbR^{n}\text{.}\) When \(D_{{\mathbf u}} {\mathbf f}({\mathbf x})\) exists with \(\bu\) equal to the standard unit vector \(\be_{j}\) along the \(x_{j}\) coordinate, we say that \(\bff\) has a partial derivative at \(\bx\) in the \(x_{j}\) variable, and denote this partial derivative by \(\frac{\partial \bff }{\partial x_{j}}(\bx)\text{.}\) Thus \(\frac{\partial \bff (\bx)}{\partial x_{j}}= D_{{\mathbf e}_{j}}{\mathbf f}({\mathbf x})\text{.}\) Other commonly used notations for \(D_{{\mathbf e}_{j}}{\mathbf f}({\mathbf x})\) are \(D_{j}{\mathbf f}({\mathbf x})\) and \(\partial_{j}{\mathbf f}({\mathbf x})\text{.}\)
When \(\bff\) is differentiable at \(\bx\text{,}\) taking any unit vector \(\bu\) and \({\mathbf h} =t {\mathbf u}\text{,}\) we get
\begin{align} \amp \lim_{{\mathbf h}\to {\mathbf 0}} \frac{\Vert {\mathbf f}({\mathbf x}+{\mathbf h})- {\mathbf f}({\mathbf x})-[D\mathbf f (\bx)]{\mathbf h} \Vert}{\Vert {\mathbf h} \Vert}\tag{5.3.2}\\ = \amp \lim_{t\to 0} \Vert \frac{ {\mathbf f}({\mathbf x}+t{\mathbf u})- {\mathbf f}({\mathbf x})-t [D\mathbf f (\bx)]{\mathbf u} }{t} \Vert=0.\tag{5.3.3} \end{align}
This means that the directional derivative of \({\mathbf f}\) at \({\mathbf x}\) in the direction \({\mathbf u}\) exists, and \(D_{{\mathbf u}} {\mathbf f}({\mathbf x})=[D\mathbf f (\bx)]\bu\text{.}\) In particular, if we take \({\mathbf u}={\mathbf e}_{j}\text{,}\) the standard unit vector along the \(x_{j}\) coordinate, we get the partial derivative of \(\bff\) at \(\bx\text{:}\) \(D_{{\mathbf e}_{j}}{\mathbf f}({\mathbf x})=[D\mathbf f (\bx)]{\mathbf e}_{j}\text{,}\) which is the \(j\)th column of \([D\mathbf f (\bx)]\text{.}\)
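The identity \(D_{\bu}\bff(\bx)=[D\bff(\bx)]\bu\) can be illustrated numerically. Here is a hedged sketch for a map of my own choosing, \(f(x, y)=(y\sin x,\, x+y^{2})\text{,}\) comparing a central difference quotient along \(\bu\) with the matrix-vector product.

```python
import numpy as np

# Check D_u f(x0) = [Df(x0)] u for f(x, y) = (y*sin(x), x + y^2)
# at x0 = (0.5, -1.0) (an example of mine, not from the text).

def f(v):
    x, y = v
    return np.array([np.sin(x) * y, x + y**2])

def jacobian(v):
    # hand-computed Jacobian: rows are gradients of the components
    x, y = v
    return np.array([[y * np.cos(x), np.sin(x)],
                     [1.0, 2.0 * y]])

x0 = np.array([0.5, -1.0])
u = np.array([3.0, 4.0]) / 5.0          # a unit direction
t = 1e-6
num = (f(x0 + t * u) - f(x0 - t * u)) / (2 * t)  # central difference for D_u f
print(num, jacobian(x0) @ u)
```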

Proof.

The only part for which we have not provided detail is the continuity of \(\bff\) at \(\bx\text{.}\) By differentiability, for any \(\epsilon >0\text{,}\) there exists some \(\delta>0\) such that
\begin{equation*} \Vert {\mathbf f}({\mathbf x}+{\mathbf h})- {\mathbf f}({\mathbf x})-A{\mathbf h}\Vert\le \epsilon \Vert {\mathbf h} \Vert \text{ for all } {\mathbf h} \text { with } \Vert {\mathbf h} \Vert < \delta. \end{equation*}
Then
\begin{equation*} \Vert {\mathbf f}({\mathbf x}+{\mathbf h}) - {\mathbf f}({\mathbf x})\Vert \le \Vert {\mathbf f}({\mathbf x}+{\mathbf h})- {\mathbf f}({\mathbf x})-A{\mathbf h}\Vert + \Vert A{\mathbf h}\Vert \le \epsilon \Vert {\mathbf h} \Vert + \Vert A{\mathbf h}\Vert. \end{equation*}
Using \(\Vert A{\mathbf h}\Vert \le \Vert A\Vert \Vert {\mathbf h}\Vert\) we can certainly adjust \(0\lt \delta \lt 1\) to make sure that when \(\Vert {\mathbf h} \Vert < \delta\text{,}\) we have \(\Vert A{\mathbf h}\Vert \lt \epsilon\text{,}\) which would guarantee that \(\Vert {\mathbf f}({\mathbf x}+{\mathbf h}) - {\mathbf f}({\mathbf x})\Vert \le 2 \epsilon\text{,}\) proving the continuity of \(\bff\) at \(\bx\text{.}\)

Question.

Here are a few basic questions related to the concept of a function differentiable at a point and having partial derivatives or directional derivatives there.
  • Suppose a function has partial derivatives at a point in all its coordinate directions. Does it imply that the function has directional derivatives at that point in any direction? Does it imply that the function is differentiable at that point? Does it imply that the function is continuous there?
  • Suppose a function has directional derivatives at a point in any direction. Does it imply that the function is differentiable at that point? Does it imply that the function is continuous at that point?

Example 5.3.5.

We discuss below two examples showing that the answers to all the above questions are negative.
(a)
A function may have partial derivatives at some \({\mathbf x}\) in each \(x_{j}\) direction, yet fail to be continuous there, and fail to have a directional derivative in any direction other than the coordinate directions.
Solution.
Here is a simple example for Task 5.3.5.a.
\begin{equation*} f(x, y)=\begin{cases} \frac{x y}{x^{2}+y^{2}} \amp\text{ if } (x, y)\ne (0, 0),\\ 0 \amp\text{ if } (x, y)= (0, 0).\\ \end{cases} \end{equation*}
Its restriction to either the \(x\) or \(y\) coordinate is identically zero, so its partial derivatives \(\frac{\partial f}{\partial x}(0,0)=\frac{\partial f}{\partial y}(0,0)=0\text{.}\) Yet for any direction \({\mathbf u}=(\cos\theta, \sin\theta)\text{,}\)
\begin{equation*} f(t\cos\theta, t\sin\theta)= \begin{cases} \cos\theta \sin\theta \amp\text{ if } t\ne 0,\\ 0 \amp\text{ if } t= 0,\\ \end{cases} \end{equation*}
which is not continuous at \(t=0\text{,}\) unless \(\cos\theta\sin\theta =0\text{.}\) Thus \(f\) is not continuous at \((0, 0)\text{,}\) and \(D_{(\cos\theta, \sin\theta)}f(0,0)\) does not exist unless \({\mathbf u}\) points along the \(x\) or \(y\) coordinate axis.
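The discontinuity is easy to see numerically: along the axes the function vanishes, while along the line \(y=x\) it is constantly \(\frac 12\text{.}\) A quick sketch (my own check, not part of the text):

```python
# f(x, y) = x*y / (x^2 + y^2), extended by 0 at the origin.
def f(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else x * y / (x**2 + y**2)

# Along the x-axis f is identically 0; along y = x it is identically 1/2,
# so f has no limit at (0, 0).
for t in [1e-1, 1e-3, 1e-6]:
    print(f(t, 0.0), f(t, t))
```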
(b)
A function may have directional derivative at some \({\mathbf x}\) in each direction, yet can still fail to be differentiable there.
Solution.
Here is a simple example for Task 5.3.5.b.
\begin{equation*} f(x, y)=\begin{cases} \frac{x^{2}y}{x^{2}+y^{2}} \amp\text{ if } (x, y)\ne (0, 0),\\ 0 \amp\text{ if } (x, y)= (0, 0).\\ \end{cases} \end{equation*}
For any direction \({\mathbf u}=(\cos\theta, \sin\theta)\text{,}\) \(f(t\cos\theta, t\sin\theta)=t \cos^{2}\theta \sin\theta\text{,}\) so
\begin{equation*} D_{(\cos\theta, \sin\theta)}f(0,0)= \cos^{2}\theta \sin\theta\text{.} \end{equation*}
But \(D_{(\cos\theta, \sin\theta)}f(0,0)\) does not depend on \({\mathbf u}=(\cos\theta, \sin\theta)\) in a linear fashion, so \(f\) is not differentiable at \((0, 0)\text{.}\)
A formal way of proving that this \(f\) is not differentiable at \((0, 0)\) is to argue by contradiction. If it were differentiable at \((0, 0)\text{,}\) then the linear approximation must be given by \(f(0,0)+(f_{x}(0, 0), f_{y}(0, 0))\cdot (x, y)\text{.}\) But it is easy to see by definition that \(f_{x}(0, 0)= f_{y}(0, 0)=0\text{.}\) Thus we would have
\begin{equation*} \frac{|f(x, y)-f(0, 0) -0|}{\sqrt{x^{2}+y^{2}}} =\frac{|x^{2}y|}{(x^{2}+y^{2})^{3/2}}\to 0 \end{equation*}
as \(\sqrt{x^{2}+y^{2}}\to 0\text{.}\) But that is not the case: when \((x, y)=t(\cos\theta, \sin\theta)\text{,}\) this quotient is \(|\cos^{2}\theta \sin\theta|\text{,}\) which does not tend to \(0\) as \(\sqrt{x^{2}+y^{2}}\to 0\) whenever \(\cos\theta\sin\theta \ne 0\text{.}\)
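The failure of linearity in \({\mathbf u}\) can be seen directly by computing difference quotients at the origin; the following sketch (mine, not from the text) shows \(D_{\bu}f(0,0)=0\) along both axes, yet \(D_{\bu}f(0,0)=2^{-3/2}\ne 0\) along the diagonal \(\theta=\pi/4\text{.}\)

```python
import numpy as np

# f(x, y) = x^2 y / (x^2 + y^2), extended by 0 at the origin.
def f(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else x**2 * y / (x**2 + y**2)

def D_u(theta, t=1e-7):
    # difference quotient for the directional derivative at (0, 0)
    return f(t * np.cos(theta), t * np.sin(theta)) / t

# If u -> D_u f(0,0) were linear, the value along (e1 + e2)/sqrt(2) would be
# (D_{e1} + D_{e2})/sqrt(2) = 0; instead it is cos^2(pi/4) sin(pi/4) = 2^{-3/2}.
print(D_u(0.0), D_u(np.pi / 2), D_u(np.pi / 4))
```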

Example 5.3.6.

A function may have directional derivatives at some \({\mathbf x}\) in every direction, yet can even fail to be continuous there. Here is a simple example:
\begin{equation*} f(x, y)=\begin{cases} \frac{x^{3}y}{x^{6}+y^{2}} \amp\text{ if } (x, y)\ne (0, 0),\\ 0 \amp\text{ if } (x, y)= (0, 0).\\ \end{cases} \end{equation*}
For any direction \({\mathbf u}=(\cos\theta, \sin\theta)\text{,}\)
\begin{equation*} f(t\cos\theta, t\sin\theta)=t^{2} \frac{\cos^{3}\theta \sin\theta}{t^{4} \cos^{6}\theta+ \sin^{2}\theta} \end{equation*}
has derivative equal to \(0\) at \(t=0\text{.}\) However, as \(t\to 0\text{,}\) if we choose \(\theta\) (depending on \(t\)) to satisfy \(t^{2}\cos^{3}\theta =\sin\theta\text{,}\) we get \(f(t\cos\theta, t\sin\theta) =\frac 12\text{,}\) which does not tend to \(0\text{.}\)
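Equivalently, the points satisfying \(t^{2}\cos^{3}\theta=\sin\theta\) lie along the curve \(y=x^{3}\text{,}\) on which \(f\) is identically \(\frac 12\text{.}\) A short numerical sketch (my own check):

```python
# f(x, y) = x^3 y / (x^6 + y^2), extended by 0 at the origin.
def f(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else x**3 * y / (x**6 + y**2)

# Along any straight line through the origin the values tend to 0,
# but along the curve y = x^3 the value is identically 1/2.
for t in [1e-1, 1e-2, 1e-3]:
    print(f(t, t), f(t, t**3))
```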

Example 5.3.7.

A function may be differentiable at a point but may not have partial derivatives at nearby points. \(f(x, y)=|xy|\) near \((0, 0)\) is such an example.
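For \(f(x, y)=|xy|\text{,}\) differentiability at the origin follows from \(|xy|\le \frac 12(x^{2}+y^{2})\) with \(A=0\text{,}\) while at a point such as \((1, 0)\) the one-sided difference quotients in \(y\) are \(+1\) and \(-1\text{,}\) so \(f_{y}(1, 0)\) does not exist. A quick numerical sketch (mine):

```python
def f(x, y):
    return abs(x * y)

# At (1, 0) the difference quotient in y is |t|/t = sign(t), so the two
# one-sided quotients disagree and the partial derivative f_y(1, 0) fails to exist.
t = 1e-8
q_plus = (f(1.0, t) - f(1.0, 0.0)) / t
q_minus = (f(1.0, -t) - f(1.0, 0.0)) / (-t)
print(q_plus, q_minus)
```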
In addition to the differences in behavior illustrated above when the domain has dimension greater than one, there may also be differences when the function is vector-valued. If \(f(x)\) is a real-valued function, differentiable on \((a, b)\) and continuous on \([a, b]\text{,}\) then the mean value theorem implies that \(f(b)-f(a)=f'(c)(b-a)\) for some \(c\) between \(a\) and \(b\text{.}\) Does this property hold for vector-valued (e.g. complex-valued) functions under similar assumptions?

Exercise 5.3.8. Mean Value Theorem.

Examine whether the mean value theorem holds for \(\bff (t)=(\cos t, \sin t)\) for \(t\in \bbR\text{.}\)

Proof.

If we assume, in addition, that \(\bff'(x)\) is continuous on \((a, b)\text{,}\) then we have an easy proof using \(\bff(b)-\bff(a)=\int_{a}^{b}\bff'(x)\, dx\text{.}\)
For the general case, it suffices to prove that for any \(\epsilon \gt 0\text{,}\)
\begin{equation} \Vert\bff(c)-\bff(a)\Vert\le (M+\epsilon)(c-a) +\epsilon \text{ for all $c\in [a, b]$.}\tag{5.3.4} \end{equation}
Fix any \(\epsilon \gt 0\text{.}\) If (5.3.4) fails for some \(c\text{,}\) let \(c^{*}\) be the infimum of the values \(c\in [a, b]\) for which (5.3.4) fails. First we show that \(c^{*}>a\text{:}\) by the continuity of \(\bff\) at \(a\text{,}\) there is some \(\delta >0\) such that (5.3.4) holds for \(c\in [a, a+\delta]\text{.}\) By the definition of \(c^{*}\text{,}\) (5.3.4) holds for every \(c \lt c^{*}\text{,}\) and by the continuity of \(\bff\) at \(c^{*}\text{,}\) (5.3.4) continues to hold at \(c^{*}\text{.}\) Thus \(c^{*}\lt b\) under our assumption. Then, using \(\Vert\bff'(c^{*})\Vert\le M\text{,}\) there exists some \(\delta^{*}>0\) such that \(\Vert \bff(c)-\bff(c^{*})\Vert\le (M+\epsilon)(c-c^{*})\) for \(c\in [c^{*}, c^{*}+\delta^{*}]\text{.}\) It then follows that, for \(c\in [c^{*}, c^{*}+\delta^{*}]\text{,}\)
\begin{align*} \Vert\bff(c)-\bff(a)\Vert\le \amp \Vert\bff(c)- \bff(c^{*})\Vert + \Vert\bff(c^{*})-\bff(a)\Vert\\ \le \amp (M+\epsilon)(c-c^{*}) + (M+\epsilon)(c^{*}-a) +\epsilon \end{align*}
showing that (5.3.4) continues to hold for \(c\in [c^{*}, c^{*}+\delta^{*}]\text{,}\) contradicting the definition of \(c^{*}\text{.}\)
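For the curve \(\bff(t)=(\cos t, \sin t)\) of the exercise above, \(\sup\Vert\bff'\Vert=1\text{,}\) so the inequality \(\Vert\bff(b)-\bff(a)\Vert\le M(b-a)\) holds even though the equality form of the mean value theorem fails: with \(a=0\text{,}\) \(b=2\pi\) the left side is \(0\) while \(\Vert\bff'(c)\Vert(b-a)=2\pi\) for every \(c\text{.}\) A numeric sanity check of my own:

```python
import numpy as np

def f(t):
    return np.array([np.cos(t), np.sin(t)])

a, b = 0.0, 2 * np.pi
lhs = np.linalg.norm(f(b) - f(a))
# lhs is (numerically) 0 while b - a = 2*pi: the inequality
# ||f(b) - f(a)|| <= M (b - a) holds with M = 1, but no c can give equality
# in the one-dimensional mean value theorem form, since ||f'(c)|| = 1 always.
print(lhs, b - a)
```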

Remark 5.3.10.

For a function of several variables, in general it does not make sense to define differentiability in terms of the limit of the difference quotient
\begin{equation*} \frac{ {\mathbf f}({\mathbf x}+{\mathbf h})- {\mathbf f}({\mathbf x})}{{\mathbf h}} \end{equation*}
as there is no meaningful quotient operation between vectors in a general context. There are some exceptions. When \(n=m=2\text{,}\) there is a well defined multiplication and quotient between vectors in \({\mathbb R}^{2}\) when we represent them as complex numbers. When a complex valued function \(\bff\) satisfies
\begin{equation*} \lim_{{\mathbf h} \to {\mathbf 0}}\frac{ {\mathbf f}({\mathbf x}+{\mathbf h})- {\mathbf f}({\mathbf x})}{{\mathbf h}}={\mathbf v} \text{ exists and is independent of how } {\mathbf h} \to {\mathbf 0}, \end{equation*}
it is said to have a complex derivative.
This is a stronger condition than differentiability in the linear approximation sense over vectors in \(\bbR^{2}\) as introduced above: differentiability only gives a map \(\bh\mapsto A\bh\) that is linear in \(\bh\) over the reals, while having a complex derivative implies in addition that \(A(i \bh)=i A(\bh)\text{,}\) and as a result \(A(e^{i\theta} \bh)=e^{i\theta} A(\bh)\text{,}\) so \(D_{e^{i\theta} {\mathbf u}}f(z)=e^{i\theta }D_{{\mathbf u}}f(z)\text{.}\)
In terms of vector operations in \({\mathbb R}^{2}\cong \mathbb C\text{,}\) \(e^{i\theta} {\mathbf u}\) corresponds to \(R_{\theta} {\mathbf u}\text{,}\) where \(R_{\theta} \) represents rotation with respect to the origin in \({\mathbb R}^{2}\) of angle \(\theta\text{.}\) If \(\bff\) is merely differentiable in the linear approximation sense, then
\begin{equation*} A R_{\theta} {\mathbf u} \text{ may not equal } R_{\theta}A {\mathbf u}. \end{equation*}
In fact, a complex valued function \(f\) has a complex derivative at some \(z\) if and only if it is differentiable in the linear approximation sense and the linear approximation \(A {\mathbf u}\) further satisfies
\begin{equation} A R_{\theta}= R_{\theta}A \text{ for any rotation matrix } R_{\theta}.\tag{5.3.5} \end{equation}
It is easy to see that if \({\mathbf f}\) is differentiable at \({\mathbf x}\) in the linear approximation sense, then the linear approximation part \(A {\mathbf h}\) is uniquely determined. It will be denoted as \({\mathbf f}'({\mathbf x}){\mathbf h} \text{.}\) It is easier to interpret this in the matrix-vector multiplication sense, with the entry in \((i, j)\) position of \({\mathbf f}'({\mathbf x})\) given by the partial derivative \(D_{j}f_{i}({\mathbf x}):=\frac{\partial f_{i} }{\partial x_{j}}({\mathbf x})\text{.}\)
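Condition (5.3.5) can be tested on concrete maps. As a sketch (my own illustration): \(z\mapsto z^{2}\text{,}\) written in real coordinates as \((x, y)\mapsto (x^{2}-y^{2}, 2xy)\text{,}\) has a Jacobian of the scaled-rotation form, which commutes with every \(R_{\theta}\text{;}\) while \(z\mapsto \bar z\text{,}\) i.e. \((x, y)\mapsto (x, -y)\text{,}\) has a Jacobian that does not.

```python
import numpy as np

def R(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

x, y = 1.3, -0.7
A_sq = np.array([[2 * x, -2 * y], [2 * y, 2 * x]])   # Jacobian of z -> z^2
A_conj = np.array([[1.0, 0.0], [0.0, -1.0]])          # Jacobian of z -> conj(z)

theta = 0.9
print(np.allclose(A_sq @ R(theta), R(theta) @ A_sq))      # commutes
print(np.allclose(A_conj @ R(theta), R(theta) @ A_conj))  # does not commute
```

This matches the fact that \(z^{2}\) has a complex derivative everywhere, while \(\bar z\) has one nowhere.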

Exercise 5.3.11. Matrix Commuting with Rotation Matrix.

Let \(\displaystyle{A=\begin{bmatrix} a \amp b \\ c \amp d \end{bmatrix}}\text{.}\) Verify that \(A\) satisfies (5.3.5) iff there exist some real \(r, \varphi\) such that \(\displaystyle{A=\begin{bmatrix} r\cos\varphi \amp -r \sin\varphi \\ r \sin\varphi \amp r \cos\varphi \end{bmatrix}}\text{.}\)
The most important properties regarding differentiability are the chain rule and differentiability of functions with continuous partial derivatives.

Subsection 5.3.2 Chain Rule

Proof.

We first work out \(\mathbf g \circ \mathbf f(\bx) - \mathbf g \circ \mathbf f(\bx_0)\) by using the differentiability of \(\mathbf f\) at \(\bx_0\) and of \(\mathbf g\) at \(\by_0=\mathbf f(\bx_0)\text{:}\)
\begin{align*} \mathbf g \circ \mathbf f(\bx) - \mathbf g \circ \mathbf f(\bx_0) \amp = \mathbf g (\mathbf f(\bx)) - \mathbf g ( \mathbf f(\bx_0)) \\ \amp = [D\mathbf g(\mathbf f(\bx_0))] \left[ \mathbf f(\bx)- \mathbf f(\bx_0)\right] +\bz(\mathbf f(\bx), \mathbf f(\bx_0)) \end{align*}
where \(\Vert \bz(\by, \by_0)\Vert/\Vert \by -\by_0\Vert \to 0\) as \(\by \to \by_0\text{;}\) and
\begin{equation*} \mathbf f(\bx) - \mathbf f(\bx_0)=[D\mathbf f(\bx_0)](\bx -\bx_0) +\bw (\bx, \bx_0), \end{equation*}
where \(\Vert \bw(\bx, \bx_0)\Vert/\Vert \bx -\bx_0\Vert \to 0\) as \(\bx \to \bx_0\text{,}\) so we have
\begin{align*} \mathbf g \circ \mathbf f(\bx) - \mathbf g \circ \mathbf f(\bx_0) =\amp [D\mathbf g (\mathbf f(\bx_0))][D\mathbf f(\bx_0)](\bx -\bx_0)\\ \amp + [D\mathbf g (\mathbf f(\bx_0))]\bw(\bx, \bx_0) + \bz(\mathbf f(\bx), \mathbf f(\bx_0)). \end{align*}
The differentiability of \(\mathbf g \circ \mathbf f\) at \(\bx_0\text{,}\) with Jacobian \([D\mathbf g (\mathbf f(\bx_0))][D\mathbf f(\bx_0)]\text{,}\) is then equivalent to
\begin{equation*} \Vert [D\mathbf g (\mathbf f(\bx_0))]\bw(\bx, \bx_0) + \bz(\mathbf f(\bx), \mathbf f(\bx_0)) \Vert/\Vert \bx -\bx_0\Vert \to 0 \text{ as } \bx \to \bx_0. \end{equation*}
Using the property of matrix norm on \(\Vert [D\mathbf g (\mathbf f(\bx_0))]\bw(\bx, \bx_0) \Vert\text{,}\) we have
\begin{equation*} \Vert [D\mathbf g (\mathbf f(\bx_0))]\bw(\bx, \bx_0) \Vert \le \Vert [D\mathbf g (\mathbf f(\bx_0))]\Vert_{\cF} \Vert\bw(\bx, \bx_0) \Vert, \end{equation*}
therefore
\begin{equation*} \Vert [D\mathbf g (\mathbf f(\bx_0))]\bw(\bx, \bx_0) \Vert/\Vert \bx -\bx_0\Vert \to 0 \text{ as } \bx \to \bx_0. \end{equation*}
For \(\bz(\mathbf f(\bx), \mathbf f(\bx_0))\text{,}\) informally
\begin{equation*} \frac{\Vert \bz (\by, \by_0) \Vert}{\Vert \bx -\bx_0 \Vert} =\frac{\Vert \bz (\by, \by_0) \Vert}{\Vert \by -\by_0\Vert} \frac{\Vert \by -\by_0\Vert}{\Vert \bx -\bx_0 \Vert}, \end{equation*}
where \(\by =\mathbf f (\bx)\) and \(\by_0=\mathbf f (\bx_0)\text{,}\)
\begin{equation*} \frac{\Vert \by -\by_0\Vert}{\Vert \bx -\bx_0 \Vert} \text{ remains bounded as $\bx \to \bx_0$ and $\frac{\Vert \bz (\by, \by_0) \Vert}{\Vert \by -\by_0\Vert} \to 0$ as $\by \to \by_0$.} \end{equation*}
But this argument has a minor flaw, for \(\Vert \by -\by_0\Vert\) could be \(0\text{.}\) To fix this issue, for any \(\epsilon \gt 0\text{,}\) there exists some \(\delta \gt 0\) such that \(\Vert \bz (\by, \by_0) \Vert \le \epsilon \Vert \by -\by_0\Vert\) whenever \(\Vert \by -\by_0\Vert \lt \delta\text{.}\) Then using
\begin{align*} \Vert \by -\by_0 \Vert \le \amp \Vert \bw (\bx, \bx_0) + [D\mathbf f(\bx_0)](\bx -\bx_0)\Vert\\ \le \amp \Vert \bw (\bx, \bx_0) \Vert + \Vert [D\mathbf f(\bx_0)](\bx -\bx_0)\Vert \\ \le \amp \Vert \bw (\bx, \bx_0) \Vert + \Vert [D\mathbf f(\bx_0)]\Vert \Vert \bx -\bx_0\Vert \end{align*}
and \(\frac{\Vert \bw (\bx, \bx_0) \Vert}{\Vert \bx -\bx_0 \Vert}\to 0\) as \(\bx\to \bx_0\text{,}\) we can find some \(\sigma \gt 0\text{,}\) small enough that \(\left( \epsilon + \Vert [D\mathbf f(\bx_0)]\Vert\right)\sigma \lt \delta\text{,}\) such that when \(\Vert \bx -\bx_0\Vert \lt \sigma\text{,}\) \(\Vert \bw (\bx, \bx_0) \Vert \lt \epsilon \Vert \bx -\bx_0\Vert\text{,}\) so
\begin{align*} \Vert \by-\by_0 \Vert\le \amp \epsilon \Vert \bx -\bx_0\Vert + \Vert [D\mathbf f(\bx_0)]\Vert \Vert \bx -\bx_0\Vert\\ \le \amp \left( \epsilon + \Vert [D\mathbf f(\bx_0)]\Vert\right) \Vert \bx -\bx_0\Vert \lt \delta. \end{align*}
Putting these together, when \(\Vert \bx -\bx_0\Vert \lt \sigma\) we have
\begin{equation*} \Vert \bz (\by, \by_0) \Vert \le \epsilon \Vert \by-\by_0 \Vert\le \epsilon \left( \epsilon + \Vert [D\mathbf f(\bx_0)]\Vert\right) \Vert \bx -\bx_0\Vert, \end{equation*}
which shows the differentiability of \(\bg \circ \bff \) at \(\bx_0\text{.}\)
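The conclusion \([D(\mathbf g\circ\mathbf f)(\bx_0)]=[D\mathbf g(\mathbf f(\bx_0))][D\mathbf f(\bx_0)]\) is easy to verify numerically. Here is a hedged sketch with maps of my own choosing, comparing a finite-difference Jacobian of the composition against the product of the hand-computed Jacobians.

```python
import numpy as np

# f: R^2 -> R^2 and g: R^2 -> R^3, with hand-computed Jacobians.
def f(p):
    x, y = p
    return np.array([x * y, x + y])

def Df(p):
    x, y = p
    return np.array([[y, x], [1.0, 1.0]])

def g(q):
    u, v = q
    return np.array([np.sin(u), u * v, v**2])

def Dg(q):
    u, v = q
    return np.array([[np.cos(u), 0.0], [v, u], [0.0, 2.0 * v]])

def num_jac(F, p, eps=1e-6):
    # central-difference Jacobian, one column per input coordinate
    p = np.asarray(p, dtype=float)
    cols = []
    for j in range(len(p)):
        e = np.zeros_like(p); e[j] = eps
        cols.append((F(p + e) - F(p - e)) / (2 * eps))
    return np.column_stack(cols)

x0 = np.array([0.8, -0.3])
lhs = num_jac(lambda p: g(f(p)), x0)       # Jacobian of g∘f, numerically
rhs = Dg(f(x0)) @ Df(x0)                   # chain rule prediction
print(np.max(np.abs(lhs - rhs)))
```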

Example 5.3.13.

Suppose that \(\mathbf f:D\subset \bbR^n\mapsto E\subset \bbR^m\) and \(\mathbf g:E \mapsto D\) are inverses of each other:
\begin{equation*} {\mathbf g}\circ {\mathbf f}(\bx)=\bx \text{ for all $\bx\in D$ and ${\mathbf f}\circ {\mathbf g}(\by)=\by$ for all $\by \in E$.} \end{equation*}
Suppose further that \(\mathbf f\) is differentiable at \(\bx_0\text{,}\) with \(\by_0=\mathbf f(\bx_0) \in E\text{,}\) and \(\mathbf g\) is differentiable at \(\by_0\text{.}\) Then
\begin{equation*} [D \mathbf g(\by_{0})] [D\mathbf f(\bx_{0})] =I_{n\times n}, \quad [D \mathbf f(\bx_{0})] [D \mathbf g(\by_{0})]=I_{m\times m}. \end{equation*}
As a result both \([D \mathbf f(\bx_{0})]\) and \([D \mathbf g(\by_{0})]\) are inverse matrices of each other and \(n=m\text{.}\)
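As an illustration of my own (not part of the text): take \(\mathbf f(x, y)=(e^{x}\cos y, e^{x}\sin y)\) on a strip where it is invertible (say \(y\in(-\pi,\pi)\)), with local inverse \(\mathbf g(u, v)=(\log\sqrt{u^{2}+v^{2}},\, \operatorname{atan2}(v, u))\text{;}\) the two Jacobians multiply to the identity.

```python
import numpy as np

def Df(x, y):
    # Jacobian of f(x, y) = (e^x cos y, e^x sin y)
    ex = np.exp(x)
    return np.array([[ex * np.cos(y), -ex * np.sin(y)],
                     [ex * np.sin(y),  ex * np.cos(y)]])

def Dg(u, v):
    # Jacobian of g(u, v) = (log sqrt(u^2 + v^2), atan2(v, u))
    r2 = u**2 + v**2
    return np.array([[u / r2,  v / r2],
                     [-v / r2, u / r2]])

x0, y0 = 0.4, 0.7
u0, v0 = np.exp(x0) * np.cos(y0), np.exp(x0) * np.sin(y0)   # (u0, v0) = f(x0, y0)
print(Dg(u0, v0) @ Df(x0, y0))   # should be the 2x2 identity
```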

Exercise 5.3.14. Jacobian Matrix of Composite Function.

Define \(S(x, y)=(x^{2}-y^{2}, 2 x y)\) and \(f(x, y)=(e^{x}\cos y, e^{x}\sin y)\text{.}\) Compute the Jacobian matrix of \(S, f\text{,}\) \(f\circ S\text{,}\) and \(S\circ f\text{.}\)

Exercise 5.3.15. Chain Rule Involving Polar Coordinates.

Suppose that \(f(x, y)\) is differentiable for \((x, y)\in \bbR^2\text{.}\) Let \((r, \theta)\) be the polar coordinates of \((x, y)\text{,}\) namely \((x, y)=P(r,\theta)=(r\cos\theta, r\sin\theta)\text{.}\) Compute the Jacobian matrix of \(P\) and verify that
\begin{equation*} \left\{ \begin{aligned} \frac{\partial f}{\partial r} &= \cos \theta \frac{\partial f}{\partial x} +\sin \theta \frac{\partial f}{\partial y}\\ \frac{\partial f}{\partial \theta} &= - r\sin \theta \frac{\partial f}{\partial x} +r \cos \theta \frac{\partial f}{\partial y} \end{aligned} \right. \end{equation*}
In matrix form, this is written as
\begin{equation*} \begin{bmatrix} \frac{\partial f}{\partial r} & \frac{\partial f}{\partial \theta} \end{bmatrix} = \begin{bmatrix} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \end{bmatrix} \begin{bmatrix} \cos \theta & -r \sin\theta \\ \sin\theta & r\cos\theta \end{bmatrix} \end{equation*}
Note that we have abused the notation on the left hand side, as the function on the left hand side really represents the composition \(f\circ P\) of \(f\) with \(P\text{.}\)
Note also that if we use \(r^{-1}\frac{\partial f }{\partial \theta}\) instead of \(\frac{\partial f}{\partial \theta}\) in the relation above we would get a simpler relation using an orthogonal matrix:
\begin{equation*} \begin{bmatrix} \frac{\partial f}{\partial r} & r^{-1} \frac{\partial f}{\partial \theta} \end{bmatrix} = \begin{bmatrix} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \end{bmatrix} \begin{bmatrix} \cos \theta & - \sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \end{equation*}
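As a sanity check of the chain-rule relation above, here is a numerical sketch for one specific choice of mine, \(f(x, y)=x^{2}y\text{,}\) comparing the predicted \(\frac{\partial f}{\partial r}\) and \(\frac{\partial f}{\partial \theta}\) with central differences of \(f\circ P\text{.}\)

```python
import numpy as np

# Partial derivatives of f(x, y) = x^2 * y (an example chosen for the check).
def fx(x, y): return 2 * x * y
def fy(x, y): return x**2

r, th = 1.5, 0.6
x, y = r * np.cos(th), r * np.sin(th)

# Chain-rule predictions in polar coordinates.
df_dr = np.cos(th) * fx(x, y) + np.sin(th) * fy(x, y)
df_dth = -r * np.sin(th) * fx(x, y) + r * np.cos(th) * fy(x, y)

# Central differences of (f ∘ P)(r, theta).
def fP(r, th): return (r * np.cos(th))**2 * (r * np.sin(th))
eps = 1e-6
num_r = (fP(r + eps, th) - fP(r - eps, th)) / (2 * eps)
num_th = (fP(r, th + eps) - fP(r, th - eps)) / (2 * eps)
print(df_dr - num_r, df_dth - num_th)
```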

Subsection 5.3.3 A Differentiability Criterion

Proof.

Suppose that \(\bff (\bx)=(f_1(\bx),\cdots, f_m(\bx))\text{.}\) It suffices to prove that each component function \(f_i(\bx)\) is differentiable at \(\bx_0\text{.}\) For simplicity, we first write out a proof for the case \(n=2\) and set \(\bx_0=(0, 0)\text{.}\)
For \(\bx=(x_1, x_2)\text{,}\) we have
\begin{align*} f_i(\bx)-f_i(\bx_0)=\amp f_i(x_1, x_2)-f_i(x_1, 0)+f_i(x_1, 0)-f_i(0, 0)\\ = \amp \frac{\partial f_i}{\partial x_2}(x_1, b)x_2+ \frac{\partial f_i}{\partial x_1}(a, 0)x_1\\ = \amp \frac{\partial f_i}{\partial x_1}(0, 0) x_1 + \frac{\partial f_i}{\partial x_2}(0, 0)x_2 + \left[ \frac{\partial f_i}{\partial x_1}(a, 0)-\frac{\partial f_i}{\partial x_1}(0, 0)\right] x_1 \\ \amp + \left[ \frac{\partial f_i}{\partial x_2}(x_1, b)- \frac{\partial f_i}{\partial x_2}(0, 0)\right] x_2 \end{align*}
for some \(a\) between \(x_1\) and \(0\) and some \(b\) between \(x_2\) and \(0\text{.}\) Using the continuity at \(\bx_0=(0, 0)\) of the partial derivatives, for any \(\epsilon \gt 0\text{,}\) we can find some \(\delta \gt 0\) such that whenever \(\Vert \bx -\bx_0\Vert \lt \delta\text{,}\) we have \(\vert \frac{\partial f_i}{\partial x_j}(\bx)- \frac{\partial f_i}{\partial x_j}(\bx_0)\vert \lt \epsilon\text{.}\) This implies that
\begin{align*} \amp \vert \left[ \frac{\partial f_i}{\partial x_1}(a, 0)-\frac{\partial f_i}{\partial x_1}(0, 0)\right] x_1+ \left[ \frac{\partial f_i}{\partial x_2}(x_1, b)- \frac{\partial f_i}{\partial x_2}(0, 0)\right] x_2\vert \\ \le \amp \epsilon \left(|x_1|+|x_2|\right)\le \sqrt 2 \epsilon \Vert \bx-\bx_0\Vert, \end{align*}
which shows the differentiability of \(f_i(\bx)\) at \(\bx_0=(0, 0)\text{.}\) The general case can be worked out in a similar way.

Remark 5.3.17.

The converse of the above theorem does not hold, even in one dimension. Here is a simple example:
\begin{equation*} f(x)=\begin{cases} x^{2}\sin\frac 1x \amp\text{ if } x\ne 0, \\ 0 \amp\text{ if } x=0.\\ \end{cases} \end{equation*}
By definition, \(f'(0)=0\text{,}\) and \(\frac{f(x)-f(0)-f'(0)x}{x}=x\sin\frac 1x \to 0\) as \(x\to 0 \text{,}\) so it is differentiable at \(x=0\text{.}\) Yet
\begin{equation*} f'(x)=\begin{cases}2 x \sin\frac 1x- \cos \frac 1x \amp\text{ if } x\ne 0, \\ 0 \amp\text{ if } x=0,\\ \end{cases} \end{equation*}
and \(f'(x)\) has no limit as \(x\to 0\text{,}\) so \(f'\) is not continuous at \(x=0\text{.}\)
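The oscillation of \(f'\) near \(0\) can be seen concretely: along the points \(x_{k}=\frac{1}{2\pi k}\) we have \(f'(x_{k})=2x_{k}\sin(2\pi k)-\cos(2\pi k)\to -1\ne f'(0)=0\text{.}\) A short numerical sketch (mine):

```python
import numpy as np

def fprime(x):
    # derivative of f(x) = x^2 sin(1/x) (with f'(0) = 0 by definition)
    return 0.0 if x == 0 else 2 * x * np.sin(1 / x) - np.cos(1 / x)

# f'(0) = 0, yet along x_k = 1/(2*pi*k) the values f'(x_k) approach -1,
# so f' has no limit at 0.
for k in [1, 10, 100]:
    x = 1 / (2 * np.pi * k)
    print(x, fprime(x))
```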

Exercise 5.3.18. Differentiability of a norm function on \(\bbR^{n}\).

Consider the norm \(||(x_{1}, \ldots, x_{n})||_{p}:= \left( \sum_{i=1}^{n}|x_{i}|^{p}\right)^{1/p}\) for some \(1 \le p \lt \infty\) as a function on \(\bbR^{n}\text{.}\) Identify the set of points at which this function is differentiable. Repeat the exercise when the norm is \(||(x_{1}, \ldots, x_{n})||_{\infty}:=\max_{1\le i \le n} |x_{i}|\text{.}\)