Inverse and Implicit Function Theorems

Section 5.5 Inverse and Implicit Function Theorems

Subsection 5.5.1 Inverse Function Theorem

Theorem 5.5.1.

Suppose that \({\mathbf f}\) is a differentiable mapping from a neighborhood of \({\mathbf x}_0\) in \(\mathbb R^n\) into \(\mathbb R^n\text{,}\) with \({\mathbf y}_0 = {\mathbf f}({\mathbf x}_0)\text{,}\) and that \(D {\mathbf f}({\mathbf x}_{0})\) is bijective and \(D {\mathbf f}({\mathbf x})\) is continuous at \({\mathbf x}_{0}\text{.}\) Then

There is a ball \(B({\mathbf y}_0, r)\) in \(\mathbb R^n\) and a neighborhood \(U\) of \({\mathbf x}_0\) in \(\mathbb R^n\) such that \(\forall {\mathbf y} \in B({\mathbf y}_0, r), \exists ! {\mathbf x} \in U\) satisfying \({\mathbf f}({\mathbf x}) = {\mathbf y}\text{,}\) and \({\mathbf x}\) depends on \({\mathbf y}\) continuously.

Denote the mapping \({\mathbf y} \mapsto {\mathbf x}\) by \({\mathbf x} ={\mathbf f}^{-1} ({\mathbf y})\) for \({\mathbf y} \in B({\mathbf y}_0, r)\text{.}\) Then \({\mathbf f}^{-1}\) is differentiable at \({\mathbf y}_0\text{,}\) and
\begin{equation*} D{\mathbf f}^{-1} ({\mathbf y_{0}}) = \left[D{\mathbf f}({\mathbf x}_{0})\right]^{-1}. \end{equation*}

If, furthermore, we assume that \({\mathbf f}\) is \(C^1\) in this neighborhood, then \({\mathbf f}^{-1}\) is \(C^1\) in \(B({\mathbf y}_0, r)\) and
\begin{equation*} D{\mathbf f}^{-1} ({\mathbf y}) = \left[D{\mathbf f}({\mathbf x})\right]^{-1}\Big|_{{\mathbf x}={\mathbf f}^{-1}({\mathbf y})}. \end{equation*}


Proof.

The heuristic idea is to use the differentiability of \({\mathbf f}\) to approximate \({\mathbf f}({\mathbf x}) \text{,}\) when \(\bx\) is near \(\bx_{0}\text{,}\) by the mapping \({\mathbf f}({\mathbf x}_{0})+D{\mathbf f}({\mathbf x}_{0})({\mathbf x}-{\mathbf x}_{0})\text{.}\) If there were no remainder term, then solving \({\mathbf f}({\mathbf x})= {\mathbf y}\) would be equivalent to solving

\begin{equation*} {\mathbf f}({\mathbf x}_{0})+D{\mathbf f}({\mathbf x}_{0})({\mathbf x}-{\mathbf x}_{0})= {\mathbf y}\text{,} \end{equation*}

which is straightforward under the assumption that \(D{\mathbf f}({\mathbf x}_{0})\) is bijective. In the presence of the remainder

\begin{equation*} {\mathbf r}({\mathbf x}; {\mathbf x}_{0}) := {\mathbf f}({\mathbf x}) -\left\{ {\mathbf f}({\mathbf x}_{0})+D{\mathbf f}({\mathbf x}_{0})({\mathbf x}-{\mathbf x}_{0}) \right\} \end{equation*}

we would need to solve

\begin{equation} {\mathbf r}({\mathbf x}; {\mathbf x}_{0})+ {\mathbf f}({\mathbf x}_{0}) + D{\mathbf f}({\mathbf x}_{0})({\mathbf x}-{\mathbf x}_{0})= {\mathbf y} .\tag{5.5.1} \end{equation}

Using \({\mathbf y}_{0}={\mathbf f}({\mathbf x}_{0})\) and the invertibility assumption on \(D{\mathbf f}({\mathbf x}_{0})\text{,}\) the above is equivalent to

\begin{equation*} {\mathbf x}-{\mathbf x}_{0}= \left[D{\mathbf f}({\mathbf x}_{0})\right]^{-1}\left\{ {\mathbf y}- {\mathbf y}_{0}-{\mathbf r}({\mathbf x}; {\mathbf x}_{0})\right\}. \end{equation*}

In other words, we look for a fixed point \({\mathbf x}\) near \({\mathbf x}_{0}\) of the mapping

\begin{align*} \phi ({\mathbf x}) := \amp {\mathbf x}_{0} + \left[D{\mathbf f}({\mathbf x}_{0})\right]^{-1}\left\{ {\mathbf y}- {\mathbf y}_{0}-{\mathbf r}({\mathbf x}; {\mathbf x}_{0})\right\}\\ =\amp {\mathbf x}_{0} + \left[D{\mathbf f}({\mathbf x}_{0})\right]^{-1}\left\{ {\mathbf y}- {\mathbf f}({\mathbf x}) + D{\mathbf f}({\mathbf x}_{0})({\mathbf x}-{\mathbf x}_{0}) \right\}\\ =\amp {\mathbf x}+ \left[D{\mathbf f}({\mathbf x}_{0})\right]^{-1}\left\{ {\mathbf y}- {\mathbf f}({\mathbf x}) \right\}. \end{align*}

The map \(\phi\) is differentiable wherever \({\mathbf f}\) is, and

\begin{equation*} D\phi ({\mathbf x})= I - \left[D{\mathbf f}({\mathbf x}_{0})\right]^{-1}D {\mathbf f}({\mathbf x}) =\left[D{\mathbf f}({\mathbf x}_{0})\right]^{-1}\left[ D{\mathbf f}({\mathbf x}_{0})- D {\mathbf f}({\mathbf x})\right]. \end{equation*}

The continuity of \(D {\mathbf f}({\mathbf x})\) at \({\mathbf x}_{0}\) implies that there is some \(\delta >0\) such that

\begin{equation} \Vert \left[D{\mathbf f}({\mathbf x}_{0})\right]^{-1}\left[ D{\mathbf f}({\mathbf x}_{0})- D {\mathbf f}({\mathbf x})\right] \Vert \le \frac 12 \text{ for all } {\mathbf x}, \Vert {\mathbf x}-{\mathbf x}_{0}\Vert \le \delta.\tag{5.5.2} \end{equation}

By Theorem 5.3.9 it follows that

\begin{equation} \Vert \phi ({\mathbf x}_{1})-\phi ({\mathbf x}_{2})\Vert \le \frac 12 \Vert {\mathbf x}_{1}-{\mathbf x}_{2}\Vert\tag{5.5.3} \end{equation}

for \({\mathbf x}_{1}, {\mathbf x}_{2}\in \overline{B({\mathbf x}_{0}, \delta)}\text{.}\) It remains to show that one can choose an open set \(V\) containing \({\mathbf y}_{0}\) such that \(\phi ({\mathbf x})\in \overline{B({\mathbf x}_{0}, \delta)}\) when \({\mathbf x}\in \overline{B({\mathbf x}_{0}, \delta)}\) and \({\mathbf y}\in V\text{.}\)


Choose \(r>0\) so that

\begin{equation*} r \Vert \left[D{\mathbf f}({\mathbf x}_{0})\right]^{-1}\Vert = \frac{\delta}{2}. \end{equation*}

Since \(\phi ({\mathbf x}_{0})- {\mathbf x}_{0}= \left[D{\mathbf f}({\mathbf x}_{0})\right]^{-1}({\mathbf y}-{\mathbf y}_{0})\text{,}\) every \({\mathbf y}\) with \(\Vert {\mathbf y}-{\mathbf y}_{0}\Vert < r\) satisfies

\begin{equation*} \Vert \phi ({\mathbf x}_{0})- {\mathbf x}_{0}\Vert \le \Vert \left[D{\mathbf f}({\mathbf x}_{0})\right]^{-1}\Vert \Vert {\mathbf y}-{\mathbf y}_{0}\Vert < \frac{\delta}{2}. \end{equation*}

Combining this with (5.5.3), for any \({\mathbf x} \in \overline{ B({\mathbf x}_{0}, \delta)}\text{,}\)

\begin{equation*} \Vert \phi ({\mathbf x})- {\mathbf x}_{0}\Vert \le \Vert \phi ({\mathbf x})-\phi ( {\mathbf x}_{0})\Vert + \Vert \phi ({\mathbf x}_{0})- {\mathbf x}_{0}\Vert < \frac 12 \Vert {\mathbf x}-{\mathbf x}_{0} \Vert + \frac{\delta}{2} \le \delta. \end{equation*}

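So if we now take \(V=B({\mathbf y}_{0}, r)\text{,}\) then for each \({\mathbf y}\in V\) the map \(\phi\) is a contraction of \(\overline{B({\mathbf x}_{0}, \delta)}\) into itself, and thus, by the contraction mapping theorem, has a unique fixed point \({\mathbf x}\) there. Since the fixed points of \(\phi\) are exactly the solutions of \({\mathbf f}({\mathbf x})={\mathbf y}\text{,}\) this \({\mathbf x}\) is the unique solution in \(\overline{B({\mathbf x}_{0}, \delta)}\text{.}\) In fact, the estimate above shows that \(\phi\) maps \(\overline{B({\mathbf x}_{0}, \delta)}\) into the open ball, so the fixed point satisfies \(\Vert {\mathbf x}- {\mathbf x}_{0}\Vert < \delta\text{.}\) This gives a well-defined inverse of \({\mathbf f}\) on \(V=B({\mathbf y}_{0}, r)\text{,}\) namely \({\mathbf f}^{-1}: V\to U:={\mathbf f}^{-1}(V)\cap B({\mathbf x}_{0}, \delta)\text{,}\) where \(U\) is open (by the continuity of \({\mathbf f}\)) and contains \({\mathbf x}_{0}\text{.}\)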
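The fixed-point construction above is also a practical algorithm. Below is a minimal numerical sketch, assuming NumPy is available; the test map \(f(x, y)=(e^{x}\cos y, e^{x}\sin y)\) (the map of Exercise 5.5.4), the base point \({\mathbf x}_{0}=(0,0)\text{,}\) the target value, the tolerances, and the helper names `f`, `Df`, `local_inverse` are illustrative choices, not part of the text. It iterates \(\phi ({\mathbf x}) = {\mathbf x}+ \left[D{\mathbf f}({\mathbf x}_{0})\right]^{-1}\{{\mathbf y}- {\mathbf f}({\mathbf x})\}\) with the derivative frozen at \({\mathbf x}_{0}\text{.}\)

```python
import numpy as np

def f(p):
    """Illustrative map (x, y) -> (e^x cos y, e^x sin y)."""
    x, y = p
    return np.array([np.exp(x) * np.cos(y), np.exp(x) * np.sin(y)])

def Df(p):
    """Jacobian matrix of f at p."""
    x, y = p
    ex = np.exp(x)
    return np.array([[ex * np.cos(y), -ex * np.sin(y)],
                     [ex * np.sin(y),  ex * np.cos(y)]])

def local_inverse(f, Df, x0, y, tol=1e-12, max_iter=200):
    """Solve f(x) = y for x near x0 by iterating
    phi(x) = x + [Df(x0)]^{-1} (y - f(x)), derivative frozen at x0."""
    A_inv = np.linalg.inv(Df(x0))
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_next = x + A_inv @ (y - f(x))
        if np.linalg.norm(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence; y may be too far from f(x0)")

x0 = np.array([0.0, 0.0])   # so y0 = f(x0) = (1, 0)
y = np.array([1.1, 0.2])    # a target value near y0
x = local_inverse(f, Df, x0, y)
print(x, f(x))
```

Because the derivative is frozen at \({\mathbf x}_{0}\text{,}\) the iteration converges only linearly, at least as fast as the rate \(\frac12\) in (5.5.3); re-evaluating \(D{\mathbf f}\) at each iterate would turn it into Newton's method. For this particular map the result can be checked against \((\frac12\log(u^{2}+v^{2}), \arg(u+iv))\) for \((u,v)\) near \((1,0)\text{.}\)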


Next we prove the continuity of \(\bff^{-1}(\by)\) for \(\by \in B({\mathbf y}_{0}, r)\text{.}\) Setting \({\mathbf x}_{1}= {\mathbf f}^{-1} ({\mathbf y}_{1}), {\mathbf x}_{2}= {\mathbf f}^{-1} ({\mathbf y}_{2})\) for \({\mathbf y}_{1}, {\mathbf y}_{2}\in B({\mathbf y}_{0}, r)\text{,}\) we examine

\begin{align} \amp {\mathbf y}_{2}- {\mathbf y}_{1}\notag\\ = \amp {\mathbf f}({\mathbf x}_2)- {\mathbf f}({\mathbf x}_1)\notag\\ = \amp \left( {\mathbf f}({\mathbf x}_2)- \left[D{\mathbf f}({\mathbf x}_{0})\right]{\mathbf x}_2 \right) - \left({\mathbf f}({\mathbf x}_1)- [D{\mathbf f}({\mathbf x}_{0})]{\mathbf x}_1\right) + [D{\mathbf f}({\mathbf x}_{0})]({\mathbf x}_2 - {\mathbf x}_1),\tag{5.5.4} \end{align}

where we group \({\mathbf f}({\mathbf x})- [D{\mathbf f}({\mathbf x}_{0})]{\mathbf x}\) together because its derivative in \({\mathbf x}\) is \(D{\mathbf f}({\mathbf x})- D{\mathbf f}({\mathbf x}_{0})\text{;}\) by (5.5.2), \(\left[D{\mathbf f}({\mathbf x}_{0})\right]^{-1}\left[D{\mathbf f}({\mathbf x})- D{\mathbf f}({\mathbf x}_{0})\right]\) has norm at most \(\frac 12\) for \({\mathbf x}\in \overline{B({\mathbf x}_{0}, \delta)}\text{.}\) Since \({\mathbf x}_{1}, {\mathbf x}_{2}\in B({\mathbf x}_{0}, \delta)\text{,}\) Theorem 5.3.9 then gives

\begin{equation*} \Vert \left[D{\mathbf f}({\mathbf x}_{0})\right]^{-1}\left[ \left( {\mathbf f}({\mathbf x}_2)- \left[D{\mathbf f}({\mathbf x}_{0})\right]{\mathbf x}_2 \right) - \left({\mathbf f}({\mathbf x}_1)- [D{\mathbf f}({\mathbf x}_{0})]{\mathbf x}_1\right) \right] \Vert \le \frac 12 \Vert {\mathbf x}_2 - {\mathbf x}_1 \Vert\text{.} \end{equation*}

Applying \(\left[D{\mathbf f}({\mathbf x}_{0})\right]^{-1}\) to (5.5.4) and using the triangle inequality, we get

\begin{align*} \Vert {\mathbf x}_2 - {\mathbf x}_1 \Vert \le \amp \Vert \left[D{\mathbf f}({\mathbf x}_{0})\right]^{-1}\left( {\mathbf y}_{2}- {\mathbf y}_{1}\right)\Vert\\ \amp + \Vert \left[D{\mathbf f}({\mathbf x}_{0})\right]^{-1}\left[ \left( {\mathbf f}({\mathbf x}_2)- \left[D{\mathbf f}({\mathbf x}_{0})\right]{\mathbf x}_2 \right) - \left({\mathbf f}({\mathbf x}_1)- [D{\mathbf f}({\mathbf x}_{0})]{\mathbf x}_1\right) \right] \Vert\\ \le \amp \Vert \left[D{\mathbf f}({\mathbf x}_{0})\right]^{-1}\Vert \Vert {\mathbf y}_{2}- {\mathbf y}_{1}\Vert+ \frac 12 \Vert {\mathbf x}_2 - {\mathbf x}_1 \Vert. \end{align*}

This then implies

\begin{equation} \Vert {\mathbf x}_{1}-{\mathbf x}_{2}\Vert \le 2 \Vert \left[D{\mathbf f}({\mathbf x}_{0})\right]^{-1}\Vert \Vert {\mathbf f}({\mathbf x}_2)- {\mathbf f}({\mathbf x}_1) \Vert,\tag{5.5.5} \end{equation}

which shows the (Lipschitz) continuity of \({\mathbf f}^{-1}\) on \(V\text{.}\)
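The two estimates (5.5.2) and (5.5.5) can be spot-checked numerically. The sketch below assumes NumPy and reuses the illustrative map \((x, y)\mapsto(e^{x}\cos y, e^{x}\sin y)\) with \({\mathbf x}_{0}=(0,0)\text{,}\) where \(D{\mathbf f}({\mathbf x}_{0})=I\text{.}\) It takes \(\delta = 0.3\text{,}\) an assumed value: for this map \(\Vert I - D{\mathbf f}({\mathbf x})\Vert = |1-e^{x+iy}| \le e^{0.3}-1 < \frac12\) on the ball. Random sampling only tests finitely many points, so it illustrates the inequalities rather than proving them.

```python
import numpy as np

def f(p):
    x, y = p
    return np.array([np.exp(x) * np.cos(y), np.exp(x) * np.sin(y)])

def Df(p):
    x, y = p
    ex = np.exp(x)
    return np.array([[ex * np.cos(y), -ex * np.sin(y)],
                     [ex * np.sin(y),  ex * np.cos(y)]])

x0 = np.zeros(2)
A_inv = np.linalg.inv(Df(x0))
norm_A_inv = np.linalg.norm(A_inv, 2)   # operator (spectral) norm
delta = 0.3                             # assumed radius, see the lead-in

rng = np.random.default_rng(0)

def sample_ball(n):
    """n random points in the closed ball of radius delta about x0."""
    v = rng.normal(size=(n, 2))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    radii = delta * np.sqrt(rng.uniform(size=(n, 1)))
    return x0 + radii * v

# (5.5.2): ||A^{-1}(A - Df(x))|| <= 1/2, checked at sample points only
worst_552 = max(np.linalg.norm(A_inv @ (Df(x0) - Df(p)), 2)
                for p in sample_ball(2000))

# (5.5.5): ||x1 - x2|| <= 2 ||A^{-1}|| ||f(x1) - f(x2)||, as a ratio <= 1
ratios = [np.linalg.norm(p - q) / (2 * norm_A_inv * np.linalg.norm(f(p) - f(q)))
          for p, q in zip(sample_ball(2000), sample_ball(2000))]
print(worst_552, max(ratios))
```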


Equation (5.5.4) can also be rewritten as

\begin{equation*} \left[D{\mathbf f}({\mathbf x}_{0})\right]^{-1}\left[{\mathbf y}_{2}- {\mathbf y}_{1}\right]= {\mathbf x}_2 - {\mathbf x}_1- \left[\phi({\mathbf x}_2)-\phi({\mathbf x}_1)\right], \end{equation*}

where \(\phi\) is defined with either \({\mathbf y}={\mathbf y}_{1}\) or \({\mathbf y}={\mathbf y}_{2}\) (the difference \(\phi({\mathbf x}_2)-\phi({\mathbf x}_1)\) does not depend on this choice) and satisfies (5.5.3). This leads to the same conclusion (5.5.5).


Finally, the relation (5.5.1) gives rise to

\begin{equation*} {\mathbf x}- {\mathbf x}_{0}=\left[D{\mathbf f}({\mathbf x}_{0})\right]^{-1} ({\mathbf y}- {\mathbf y}_{0}) - \left[D{\mathbf f}({\mathbf x}_{0})\right]^{-1} {\mathbf r}({\mathbf x}; {\mathbf x}_{0}). \end{equation*}

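Apply (5.5.5) to the pair \({\mathbf x}={\mathbf f}^{-1}({\mathbf y})\) and \({\mathbf x}_{0}\text{.}\) Note that \({\mathbf x}\ne {\mathbf x}_{0}\) when \({\mathbf y}\ne {\mathbf y}_{0}\text{,}\) that \({\mathbf x}\to {\mathbf x}_{0}\) as \({\mathbf y}\to {\mathbf y}_{0}\) by the continuity just proved, and that \(\Vert {\mathbf r}({\mathbf x}; {\mathbf x}_{0})\Vert / \Vert {\mathbf x}- {\mathbf x}_{0}\Vert \to 0\) by the differentiability of \({\mathbf f}\) at \({\mathbf x}_{0}\text{.}\) We then see that, as \(\Vert {\mathbf y}- {\mathbf y}_{0}\Vert \to 0\text{,}\)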

\begin{equation*} \frac{\Vert \left[D{\mathbf f}({\mathbf x}_{0})\right]^{-1} {\mathbf r}({\mathbf x}; {\mathbf x}_{0}) \Vert} {\Vert {\mathbf y}- {\mathbf y}_{0}\Vert } \le \frac{2\Vert \left[D{\mathbf f}({\mathbf x}_{0})\right]^{-1}\Vert^{2} \Vert {\mathbf r}({\mathbf x}; {\mathbf x}_{0}) \Vert}{ \Vert {\mathbf x}- {\mathbf x}_{0} \Vert } \to 0\text{.} \end{equation*}

This shows the differentiability of \({\mathbf f}^{-1}\) at \({\mathbf y}_{0}\) with

\begin{equation*} D{\mathbf f}^{-1} ({\mathbf y_{0}}) = \left[D{\mathbf f}({\mathbf x}_{0})\right]^{-1}. \end{equation*}
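The conclusion \(D{\mathbf f}^{-1} ({\mathbf y_{0}}) = \left[D{\mathbf f}({\mathbf x}_{0})\right]^{-1}\) can be checked by central finite differences. In this sketch (NumPy assumed), the map \({\mathbf f}(x_1, x_2) = (x_1 + x_2^{2}, x_2 + \sin x_1)\) is a hypothetical example with \({\mathbf x}_{0}={\mathbf 0}\text{,}\) \({\mathbf y}_{0}={\mathbf 0}\) and \(D{\mathbf f}({\mathbf 0}) = \begin{bmatrix}1\amp 0\\ 1\amp 1\end{bmatrix}\text{;}\) the local inverse is computed with the same fixed-point map as in the proof, and the step `h` is an assumed value.

```python
import numpy as np

def f(p):
    x1, x2 = p
    return np.array([x1 + x2**2, x2 + np.sin(x1)])

def Df(p):
    x1, x2 = p
    return np.array([[1.0, 2.0 * x2],
                     [np.cos(x1), 1.0]])

x0 = np.zeros(2)
y0 = f(x0)
A_inv = np.linalg.inv(Df(x0))

def f_inv(y, tol=1e-14, max_iter=500):
    """Local inverse near x0 via phi(x) = x + A^{-1}(y - f(x))."""
    x = x0.copy()
    for _ in range(max_iter):
        x_next = x + A_inv @ (y - f(x))
        if np.linalg.norm(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence")

# Central differences of f_inv at y0; column j approximates d(f_inv)/dy_j.
h = 1e-6
J_fd = np.column_stack([(f_inv(y0 + h * e) - f_inv(y0 - h * e)) / (2 * h)
                        for e in np.eye(2)])
print(J_fd)
print(A_inv)
```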


Remark 5.5.2.

In our formulation and proof, we take \(V\) to be a ball and define \({\mathbf f}^{-1}\) on \(V\text{.}\) Since \(U={\mathbf f}^{-1}(V)\cap B(\bx_{0}, \delta)\) is open and contains \(\bx_{0}\text{,}\) there exists some \(\delta' >0\) such that \(B(\bx_{0}, \delta')\subset U\text{.}\) Then \(\bff(B(\bx_{0}, \delta'))\text{,}\) which is the preimage of \(B(\bx_{0}, \delta')\) under the continuous map \(\bff^{-1}: V\to U\text{,}\) is an open neighborhood of \(\by_{0}\text{,}\) and \(\bff^{-1}\) restricts to a well-defined inverse of \(\bff\) on this open set.


In fact one can also prove that \(\bff\) is injective on \(B(\bx_{0}, \delta)\text{,}\) where \(\delta\) is chosen to satisfy (5.5.2), so that \(\bff\) has a well-defined inverse on \(\bff(B(\bx_{0}, \delta))\text{.}\)


Exercise 5.5.3.

Identify points \((x_{0}, y_{0})\) where the function \(S(x, y)=(x^{2}-y^{2}, 2 x y)\) satisfies the conditions of the Inverse Function Theorem; at points \((x_{0}, y_{0})\) where the conditions of the Inverse Function Theorem are not satisfied, examine the solvability of \(S(x, y)=(u, v)\) for \((u, v)\) near \(S(x_{0}, y_{0})\text{.}\)


Exercise 5.5.4.

Compute the Jacobian matrix of the function \((u, v)=f(x, y)=(e^{x}\cos y, e^{x}\sin y)\) and verify that it satisfies the conditions of the Inverse Function Theorem. Determine the largest disc \(U\) around \((x, y)=(0, 0)\) on which the function \(f\) has a well-defined differentiable inverse function. Determine the largest disc \(V\) around \((u, v)=(1, 0)\) on which \(f^{-1}\) is a well-defined differentiable function. What if one drops the condition that \(U\) or \(V\) be a disc, but just requires it to be an open set?


Example 5.5.5. A geometric application of the Inverse Function Theorem.

Suppose that \(\bff(\bx)\) is a mapping from a neighborhood of \(\bx_0\) in \(\mathbb R^m \) into \(\mathbb R^n\) (\(m \lt n\)), with \(\bff(\bx_0)=\by_0=(y_{01},\ldots, y_{0n})\text{,}\) that it has continuous partial derivatives, and that the submatrix \([\frac{\partial f_i}{\partial x_j}], 1 \le i, j \le m\text{,}\) of the Jacobian matrix \(\bff_{\bx}(\bx_0)\) is invertible. Such an \(\bff\) is called an immersion near \(\bx_0\text{.}\)


Then by the Inverse Function Theorem the mapping \(\bx\mapsto (f_{1}(\bx),\ldots, f_{m}(\bx))\) has a differentiable inverse defined in a neighborhood \(V\) of \((y_{01},\ldots, y_{0m})\text{.}\) Call it \(\Phi\) and let \(U=\Phi(V)\text{.}\) Then \(U\) is an open neighborhood of \(\bx_{0}\) in \(\bbR^{m}\text{,}\) and any point \(\bff(\bx)\) for \(\bx \in U\) can be represented as

\begin{equation*} (y_{1},\ldots, y_{m}, f_{m+1}(\Phi(y_{1},\ldots, y_{m})), \ldots, f_{n}(\Phi(y_{1},\ldots, y_{m}))) \end{equation*}

for \((y_{1},\ldots, y_{m})\in V\text{.}\) That is, \(\bff(U)\) is the graph over \(V\) of a map with continuous partial derivatives.


Note that \(\bff\) could have been defined on a bigger domain \(\cM\text{;}\) the above discussion only says that when \(\bff\) is restricted to a small domain \(U\) near \(\bx_{0}\text{,}\) \(\bff(U)\) is represented as a graph. It does not say that \(\bff(\cM)\) near \(\bff(\bx_{0})\) is represented as a graph. A simple example is the Lemniscate of Gerono given by \(t\mapsto G(t):=(x, y)=(\sin t, \sin t \cos t)\) for \(t\in \bbR\text{.}\) Its image in \(\bbR^{2}\) is a figure eight crossing itself at the origin. Near \(t=0\text{,}\) \(x'(0)=\cos 0=1\) and \(y'(0)=\cos^{2}0-\sin^{2}0=1\text{,}\) so \(t\mapsto x=\sin t\) has an inverse \(t=\sin^{-1}(x)\) near \(x=0\text{;}\) from it one gets \(y\) as a function of \(x\text{,}\) and \(\frac{d y}{dx}\) can be computed via the chain rule

\begin{equation*} \frac{d y}{dx}=\frac{dy}{dt} \frac{dt}{dx}= \frac{dy}{dt}/ \frac{dx}{dt}, \end{equation*}

which evaluates to \(1\) at \(t=0\text{.}\) In this case the parametric curve near \(t=0\) can be represented explicitly as \(y=x\sqrt{1-x^{2}}\text{;}\) but this graph does not include the other branch through \((0, 0)\text{.}\)
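A quick numerical check of this computation in plain Python (the function names are illustrative): the chain-rule slope \(\frac{dy}{dt}/\frac{dx}{dt}\) at \(t=0\) is compared with a finite-difference slope of the explicit branch \(y=x\sqrt{1-x^{2}}\text{,}\) and a few points \(G(t)\) with \(t\) near \(0\) are checked to lie on that branch.

```python
import math

def x_of(t):
    return math.sin(t)

def y_of(t):
    return math.sin(t) * math.cos(t)

def slope(t):
    """dy/dx = (dy/dt) / (dx/dt) along G(t); requires dx/dt != 0."""
    dx_dt = math.cos(t)
    dy_dt = math.cos(t) ** 2 - math.sin(t) ** 2
    return dy_dt / dx_dt

def graph(x):
    """Explicit branch through the origin for t near 0."""
    return x * math.sqrt(1 - x * x)

h = 1e-6
fd_slope = (graph(h) - graph(-h)) / (2 * h)   # slope of the explicit graph at x = 0
print(slope(0.0), fd_slope)

# Points G(t) for small t lie on y = x sqrt(1 - x^2) (here cos t > 0).
for t in (-0.3, 0.1, 0.4):
    assert abs(y_of(t) - graph(x_of(t))) < 1e-12
```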


We could equally well apply the Inverse Function Theorem to \(t\mapsto y=\sin t \cos t\) at \(t=0\text{,}\) since \(y'(0)=1\text{,}\) to get an inverse \(t=g(y)\text{.}\) An explicit form of \(g(y)\) can be worked out here, but the analysis can be carried out without knowing it. So the same parametric curve near \(t=0\) can also be represented as \(x=\sin (g(y))\text{.}\)
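To illustrate that \(g\) can be handled without a formula, the sketch below (plain Python) inverts \(y=\sin t\cos t\) by bisection. The name `g`, the bracketing interval \((-\frac{\pi}{4}, \frac{\pi}{4})\text{,}\) on which \(t\mapsto \sin t\cos t\) is increasing, and the tolerance are assumptions for illustration. For comparison it also prints the explicit inverse \(g(y)=\frac12\sin^{-1}(2y)\text{,}\) which follows from \(\sin t\cos t=\frac12\sin 2t\text{.}\)

```python
import math

def y_of(t):
    return math.sin(t) * math.cos(t)

def g(y, lo=-math.pi / 4, hi=math.pi / 4, tol=1e-14):
    """Invert t -> sin t cos t on (-pi/4, pi/4), where it is increasing,
    by bisection; no closed form is used."""
    if not y_of(lo) < y < y_of(hi):
        raise ValueError("y outside the range of the monotone piece")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if y_of(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for y in (-0.3, 0.05, 0.2):
    t = g(y)
    x = math.sin(t)   # the representation x = sin(g(y))
    print(y, t, 0.5 * math.asin(2 * y), x)
```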


Exercise 5.5.6.

Set \(g_{k}(y_{1},\ldots, y_{m})=f_{k}(\Phi(y_{1},\ldots, y_{m}))\) for \(k=m+1,\ldots, n\) in the above setting. Use the chain rule to determine \(\frac{\partial g_k}{\partial y_j}\) in terms of the \(\frac{\partial f_i}{\partial x_j}, 1 \le i\le n, 1\le j \le m\text{.}\)


Exercise 5.5.7.

Can the Inverse Function Theorem be applied to the Lemniscate of Gerono with respect to the \(x\) variable at \(t=\pi\text{?}\) If so, find \(\frac{dy}{dx}\) at \((0, 0)\text{,}\) where \(y=g(x)\) is a graph representation of \(G((\pi -\delta, \pi+\delta))\) for small \(\delta>0\text{.}\)


Can the Inverse Function Theorem be applied with respect to the \(y\) variable at \(t=0\text{,}\) \(t=\frac{\pi}{2}\text{,}\) and \(t=\pi\text{?}\) If so, find \(\frac{dx}{dy}\) at the corresponding point.


Can the Inverse Function Theorem be applied with respect to the \(x\) variable at \(t=\frac{\pi}{2}\text{?}\)


Subsection 5.5.2 Implicit Function Theorem

We first introduce some notation. Suppose that \({\mathbf f}({\mathbf x},{\mathbf y})\) is differentiable in \(({\mathbf x},{\mathbf y})\in \mathbb R^n \times \mathbb R^m\) at some \(({\mathbf x}_{0},{\mathbf y}_{0})\text{.}\) We use \(D{\mathbf f}({\mathbf x}_{0},{\mathbf y}_{0})\) to denote the (Jacobian) derivative of \(\bff\) at \(({\mathbf x}_{0},{\mathbf y}_{0})\text{,}\) which is a linear map on \(\mathbb R^n \times \mathbb R^m\text{,}\) and \(D_{\bx}{\mathbf f}({\mathbf x}_{0},{\mathbf y}_{0})\) to denote the restriction of this linear map to \(\mathbb R^n \times \{\mathbf 0\}\text{,}\) identified with \(\mathbb R^n\text{.}\) In other words

\begin{equation*} D_{\bx}{\mathbf f}({\mathbf x}_{0},{\mathbf y}_{0})\bh=D{\mathbf f}({\mathbf x}_{0},{\mathbf y}_{0})(\bh, \mathbf 0) \quad \text{for } \bh \in \bbR^{n}\text{.} \end{equation*}

Similarly we define \(D_{\by}{\mathbf f}({\mathbf x}_{0},{\mathbf y}_{0})\) by

\begin{equation*} D_{\by}{\mathbf f}({\mathbf x}_{0},{\mathbf y}_{0})\bk=D{\mathbf f}({\mathbf x}_{0},{\mathbf y}_{0})(\mathbf 0, \bk) \quad \text{for } \bk\in \bbR^{m}\text{.} \end{equation*}


If \({\mathbf f}({\mathbf x},{\mathbf y})\) is differentiable in \(\bx\) when \(\by\) is held fixed, we also use \(D_{\bx}{\mathbf f}({\mathbf x},{\mathbf y})\) to denote its derivative in \(\bx\) at \(({\mathbf x},{\mathbf y})\text{.}\)


This notation could be confused with the notation \(D_{\bu}{\mathbf f}({\mathbf x}_{0},{\mathbf y}_{0})\) for the directional derivative of \(\bff\) at \(({\mathbf x}_{0},{\mathbf y}_{0})\) in the direction \(\bu\text{,}\) but the intended meaning is usually clear from the context.


Theorem 5.5.8. Implicit Function Theorem.

Let \({\mathbf f}({\mathbf x},{\mathbf y})\) be a continuous mapping from a neighborhood \(U_{0}\times V_{0}\) of \(({\mathbf x}_0,{\mathbf y}_0)\) in \(\mathbb R^n \times \mathbb R^m\) into \(\mathbb R^n\text{,}\) with \({\mathbf f}({\mathbf x}_0,{\mathbf y}_0)={\mathbf 0}\text{.}\) Assume that \({\mathbf f}({\mathbf x},{\mathbf y})\) is differentiable in the \({\mathbf x}\) variable in this neighborhood and that \(D_{{\mathbf x}}{\mathbf f}({\mathbf x},{\mathbf y})\) is continuous at \(({\mathbf x}_0,{\mathbf y}_0)\text{.}\) If \(D_{{\mathbf x}}{\mathbf f}({\mathbf x}_0,{\mathbf y}_0)\) is a bijection from \(\mathbb R^n\) onto \(\mathbb R^n\text{,}\) then

There is a ball \(B({\mathbf y}_0, r)\subset V_{0}\) in \(\mathbb R^m\) and a neighborhood \(U\subset U_{0}\) of \({\mathbf x}_0\) in \(\mathbb R^n\) such that \(\forall {\mathbf y} \in B({\mathbf y}_0, r), \exists ! {\mathbf x} \in U\) satisfying \(\bff({\mathbf x}, {\mathbf y})={\mathbf 0}\text{,}\) and \({\mathbf x}\) depends on \({\mathbf y}\) continuously.

Denote the mapping \({\mathbf y} \mapsto {\mathbf x}\) by \({\mathbf x} = {\mathbf u}({\mathbf y})\text{.}\) If \(\bff\) is \(C^1\) jointly in \(({\mathbf x}, {\mathbf y})\in U\times B({\mathbf y}_0, r)\text{,}\) then \({\mathbf u}\) is \(C^1\) in \({\mathbf y}\in B({\mathbf y}_0, r)\) and
\begin{equation*} D{\mathbf u} ({\mathbf y}) = -\left[D_{\mathbf x}{\mathbf f}({\mathbf u}({\mathbf y}),{\mathbf y})\right]^{-1} \circ D_{\mathbf y}{\mathbf f}({\mathbf u}({\mathbf y}),{\mathbf y}). \end{equation*}


Proof.

The heuristic for the first part is similar to that in the proof of the Inverse Function Theorem: for \((\bx, \by)\in U_{0}\times V_{0}\text{,}\) use the linear approximation of \({\mathbf f}\) in the \({\mathbf x}\) variable to approximate \({\mathbf f}({\mathbf x}, {\mathbf y})\text{.}\) More precisely, set

\begin{equation*} {\mathbf r}({\mathbf x}; {\mathbf x}_{0}, {\mathbf y}):={\mathbf f}({\mathbf x}, {\mathbf y})- {\mathbf f}({\mathbf x}_{0}, {\mathbf y})-D_{{\mathbf x}} {\mathbf f}({\mathbf x}_{0}, {\mathbf y}) ({\mathbf x}-{\mathbf x}_{0}). \end{equation*}

Then the equation \({\mathbf f}({\mathbf x}, {\mathbf y})={\mathbf 0}\) is equivalent to

\begin{equation*} {\mathbf r}({\mathbf x}; {\mathbf x}_{0}, {\mathbf y})=- {\mathbf f}({\mathbf x}_{0}, {\mathbf y})-D_{{\mathbf x}} {\mathbf f}({\mathbf x}_{0}, {\mathbf y}) ({\mathbf x}-{\mathbf x}_{0}), \end{equation*}

that is, \({\mathbf x}\) is a fixed point of the mapping

\begin{align*} \phi ({\mathbf x}) := \amp {\mathbf x}_{0} + \left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y})\right]^{-1} \left\{ - {\mathbf f}({\mathbf x}_{0}, {\mathbf y}) -{\mathbf r}({\mathbf x}; {\mathbf x}_{0}, {\mathbf y})\right\}\\ =\amp {\mathbf x}_{0} + \left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y})\right]^{-1} \left\{ - {\mathbf f}({\mathbf x}, {\mathbf y}) + D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y})({\mathbf x}-{\mathbf x}_{0}) \right\}\\ =\amp {\mathbf x}+ \left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y})\right]^{-1} \left\{ - {\mathbf f}({\mathbf x}, {\mathbf y}) \right\}. \end{align*}


Then, in a similar way, one can show that \(\phi\) has a unique fixed point in \(\overline{B({\mathbf x}_{0}, \delta)}\) when \(\delta>0\) and \(r>0\) are chosen so that, for each \({\mathbf y}\in B({\mathbf y}_0, r)\text{,}\) \(\phi\) satisfies (5.5.3) on \(\overline{B({\mathbf x}_{0}, \delta)}\) and maps \(\overline{B({\mathbf x}_{0}, \delta)}\) into itself. This shows the existence of \({\mathbf x} = {\mathbf u}({\mathbf y})\) for \({\mathbf y}\in B({\mathbf y}_0, r) \text{.}\)


In fact, there is some flexibility in setting up \(\phi\text{.}\) One could use a modified \(\phi\) such as

\begin{equation*} \phi ({\mathbf x}) :={\mathbf x}+ \left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y}_{0})\right]^{-1} \left\{ - {\mathbf f}({\mathbf x}, {\mathbf y}) \right\} \end{equation*}

and use its fixed point to construct \({\mathbf x} = {\mathbf u}({\mathbf y})\text{.}\)
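Both choices of \(\phi\) can be tried on a scalar example. In this plain-Python sketch, \(f(x, y)=x^{2}+xy+y^{2}-1\) with \((x_{0}, y_{0})=(1, 0)\) is a hypothetical example: \(f(1,0)=0\) and \(D_{x}f(1, 0)=2\ne 0\text{.}\) The illustrative flag `frozen_at_y0` switches between dividing by \(D_{x}f(x_{0}, y_{0})\) and by \(D_{x}f(x_{0}, y)\text{;}\) both iterations are compared with the root of the quadratic that lies near \(x_{0}\text{.}\)

```python
import math

def f(x, y):
    return x * x + x * y + y * y - 1.0

def fx(x, y):
    """Partial derivative of f in x."""
    return 2.0 * x + y

x0, y0 = 1.0, 0.0   # f(x0, y0) = 0 and fx(x0, y0) = 2 != 0

def solve(y, frozen_at_y0, tol=1e-14, max_iter=200):
    """Fixed point of phi(x) = x - f(x, y) / a, where
    a = fx(x0, y0) if frozen_at_y0 else fx(x0, y)."""
    a = fx(x0, y0) if frozen_at_y0 else fx(x0, y)
    x = x0
    for _ in range(max_iter):
        x_next = x - f(x, y) / a
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence")

y = 0.2
exact = (-y + math.sqrt(4.0 - 3.0 * y * y)) / 2.0   # root of x^2 + y x + (y^2 - 1) near 1
print(solve(y, True), solve(y, False), exact)
```

The variant frozen at \(({\mathbf x}_{0}, {\mathbf y}_{0})\) needs the derivative at a single point only, which can be convenient when \(D_{\bx}\bff\) is expensive to evaluate.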


To prove the continuity of \({\mathbf x} = {\mathbf u}({\mathbf y})\text{,}\) one takes \({\mathbf y}_{1}, {\mathbf y}_{2}\in B({\mathbf y}_0, r) \) and tries to use the relation

\begin{align*} {\mathbf 0} =\amp {\mathbf f}({\mathbf u}({\mathbf y}_{2}), {\mathbf y}_{2})- {\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{1})\\ =\amp {\mathbf f}({\mathbf u}({\mathbf y}_{2}), {\mathbf y}_{2})-{\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{2})+ {\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{2})- {\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{1}) \end{align*}

and the information that \({\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{2})- {\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{1}) \to \mathbf 0\) as \(\by_{2}\to \by_{1}\) to show that \({\mathbf u}({\mathbf y}_{2})\to {\mathbf u}({\mathbf y}_{1})\text{.}\)


But a standard application of the mean value theorem only gives an upper bound for \(\Vert {\mathbf f}({\mathbf u}({\mathbf y}_{2}), {\mathbf y}_{2})-{\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{2})\Vert\) in terms of \(\Vert {\mathbf u}({\mathbf y}_{2})- {\mathbf u}({\mathbf y}_{1})\Vert\text{,}\) whereas a lower bound is needed here. Since \(D_{\bx}\bff(\bx, \by_{2})\) is close to \(D_{\bx}\bff(\bx_{0}, \by_{0})\) when \(\bx \in B({\mathbf x}_{0}, \delta)\) and \(\by_{2} \in B(\by_{0}, r)\text{,}\) the derivative of \(\bff(\bx, \by_{2})-D_{\bx}\bff(\bx_{0}, \by_{0})\bx\) with respect to \(\bx\) is small in the same neighborhood. In other words,

\begin{equation*} \bff(\bx, \by_{2})=D_{\bx}\bff(\bx_{0}, \by_{0})\bx+ \left[ \bff(\bx, \by_{2})-D_{\bx}\bff(\bx_{0}, \by_{0})\bx\right] \end{equation*}

"behaves like" \(D_{\bx}\bff(\bx_{0}, \by_{0})\bx\) as a function of \(\bx\in B({\mathbf x}_{0}, \delta)\text{.}\) We implement this as

\begin{align*} {\mathbf 0} =\amp {\mathbf f}({\mathbf u}({\mathbf y}_{2}), {\mathbf y}_{2})- {\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{1})\\ =\amp \left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y}_{0})\right] \left[{\mathbf u}({\mathbf y}_{2})- {\mathbf u}({\mathbf y}_{1})\right] + {\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{2})- {\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{1})\\ \amp + \left\{ {\mathbf f}({\mathbf u}({\mathbf y}_{2}), {\mathbf y}_{2})- \left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y}_{0})\right]{\mathbf u}({\mathbf y}_{2})\right\} -\left\{ {\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{2})- \left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y}_{0})\right]{\mathbf u}({\mathbf y}_{1})\right\} \end{align*}

from which one gets

\begin{align*} {\mathbf u}({\mathbf y}_{2})- {\mathbf u}({\mathbf y}_{1}) =\amp - \left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y}_{0})\right]^{-1} \left\{ {\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{2})- {\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{1})\right\}\\ \amp+ \left\{{\mathbf u}({\mathbf y}_{2})- \left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y}_{0})\right]^{-1}{\mathbf f}({\mathbf u}({\mathbf y}_{2}),{\mathbf y}_{2})\right\}\\ \amp - \left\{{\mathbf u}({\mathbf y}_{1})- \left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y}_{0})\right]^{-1}{\mathbf f}({\mathbf u}({\mathbf y}_{1}),{\mathbf y}_{2})\right\} \end{align*}

One then uses the fact that the \({\mathbf x}\) derivative of \({\mathbf x}\mapsto {\mathbf x}- \left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y}_{0})\right]^{-1} {\mathbf f}(\mathbf x,{\mathbf y}_{2})\) has norm at most \(1/2\) when \({\mathbf x}\in B({\mathbf x}_{0}, \delta)\) and \({\mathbf y}_{2}\in B({\mathbf y}_0, r)\) (after shrinking \(\delta\) and \(r\) if necessary) to get

\begin{equation*} \Vert {\mathbf u}({\mathbf y}_{2})- {\mathbf u}({\mathbf y}_{1}) \Vert \le 2 \Vert \left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y}_{0})\right]^{-1}\Vert \Vert {\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{2})- {\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{1})\Vert, \end{equation*}

which, together with the continuity of \({\mathbf f}\) in \({\mathbf y}\text{,}\) shows the continuity of \({\mathbf u}({\mathbf y})\text{.}\)


The differentiability of \(\bu (\by)\) is shown in the same way as in the proof of the Inverse Function Theorem.
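The derivative formula in part 2 of Theorem 5.5.8 can be compared with finite differences of a numerically computed \({\mathbf u}\text{.}\) In this sketch (NumPy assumed), the system \(f_1 = x_1 + \frac12\sin x_2 - y\text{,}\) \(f_2 = x_2 + \frac12 x_1^{2} - 2y\) with \(({\mathbf x}_{0}, y_{0}) = ({\mathbf 0}, 0)\) is a hypothetical example with \(n=2, m=1\text{.}\) Here \({\mathbf u}\) is computed with the modified \(\phi\) that freezes \(D_{\bx}\bff\) at \(({\mathbf x}_{0}, {\mathbf y}_{0})\text{,}\) and the linear system in the formula is solved with `np.linalg.solve` rather than by forming an inverse matrix.

```python
import numpy as np

def f(x, y):
    x1, x2 = x
    return np.array([x1 + 0.5 * np.sin(x2) - y,
                     x2 + 0.5 * x1**2 - 2.0 * y])

def Dx_f(x, y):
    x1, x2 = x
    return np.array([[1.0, 0.5 * np.cos(x2)],
                     [x1, 1.0]])

def Dy_f(x, y):
    return np.array([[-1.0], [-2.0]])   # f is affine in y in this example

x0, y0 = np.zeros(2), 0.0               # f(x0, y0) = 0
A0_inv = np.linalg.inv(Dx_f(x0, y0))

def u(y, tol=1e-14, max_iter=500):
    """Implicit function via phi(x) = x - [Dx f(x0, y0)]^{-1} f(x, y)."""
    x = x0.copy()
    for _ in range(max_iter):
        x_next = x - A0_inv @ f(x, y)
        if np.linalg.norm(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence")

y = 0.1
x = u(y)
# Du(y) = -[Dx f(u(y), y)]^{-1} Dy f(u(y), y), as a 2x1 matrix
Du_formula = -np.linalg.solve(Dx_f(x, y), Dy_f(x, y))
h = 1e-5
Du_fd = ((u(y + h) - u(y - h)) / (2 * h)).reshape(2, 1)
print(Du_formula.ravel(), Du_fd.ravel())
```

At \(y=0\) the same formula gives \(D{\mathbf u}(0) = (0, 2)^{T}\text{.}\)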


Remark 5.5.9.

If, for part 1, one assumes that \(\bff\) is jointly differentiable in \(({\mathbf x}, {\mathbf y})\) and that its derivative is continuous at \((\bx_{0}, \by_{0})\text{,}\) then one gets a simple proof from the Inverse Function Theorem. One simply defines

\begin{equation*} F(\bx, \by)=(\bff (\bx, \by), \by)\in \bbR^{n+m}\text{.} \end{equation*}

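Then \(F(\bx_{0}, \by_{0})=(\mathbf 0, \by_{0})\text{,}\) and the Jacobian matrix of \(F\) at \((\bx_{0}, \by_{0})\) has the block form \(\begin{bmatrix} D_{\bx}\bff(\bx_{0}, \by_{0}) \amp D_{\by}\bff(\bx_{0}, \by_{0})\\ 0 \amp I_{m}\end{bmatrix}\text{,}\) whose determinant equals \(\det D_{\bx}\bff(\bx_{0}, \by_{0})\ne 0\text{.}\) So this \((n+m)\times (n+m)\) matrix is invertible, and there exist a neighborhood \(U\) of \((\bx_{0}, \by_{0})\text{,}\) a neighborhood \(V\) of \((\mathbf 0, \by_{0})\text{,}\) and an inverse \(G\) of \(F\) defined on \(V\text{.}\) Since \(F\) leaves the \(\by\) component unchanged, so does \(G\text{;}\) restricting \(G\) to points \((\mathbf 0, \by)\in V\) and writing out its \(\bbR^{n}\) and \(\bbR^{m}\) components, we get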

\begin{equation*} G(\mathbf 0, \by)=(g(\mathbf 0, \by), \by), \quad \bff(g(\mathbf 0, \by),\by)=\mathbf 0. \end{equation*}


Remark 5.5.10.

The rule for determining the derivative of the implicit function is just a form of implicit differentiation. For example, on the level set \(f(x, y, z)=x^{2}+y^{2}+z^{2}=1\text{,}\) since \(f_{z}(x, y, z)=2z\text{,}\) at any point \((x, y, z)\) where \(z\ne 0\) the Implicit Function Theorem implies that \(z\) can be solved as a function of \((x, y)\) from \(x^{2}+y^{2}+z^{2}=1\text{,}\) and differentiating this equation with respect to \(x\) and \(y\) gives

\begin{equation*} 2x + 2z \frac{\partial z}{\partial x} =0,\quad 2y + 2z \frac{\partial z}{\partial y} =0 \end{equation*}

from which one gets \(\frac{\partial z}{\partial x} =-\frac xz\text{,}\) \(\frac{\partial z}{\partial y} =-\frac yz\text{.}\)
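A numerical check of these formulas in plain Python: at a point with \(z>0\) the implicit function is the upper hemisphere \(z=\sqrt{1-x^{2}-y^{2}}\text{,}\) so its finite-difference partial derivatives can be compared with \(-\frac xz\) and \(-\frac yz\text{.}\) The sample point \((0.3, -0.4)\) and the step `h` are arbitrary choices.

```python
import math

def z_upper(x, y):
    """Upper hemisphere: the branch of x^2 + y^2 + z^2 = 1 with z > 0."""
    return math.sqrt(1.0 - x * x - y * y)

x, y = 0.3, -0.4
z = z_upper(x, y)
h = 1e-6
dz_dx_fd = (z_upper(x + h, y) - z_upper(x - h, y)) / (2 * h)
dz_dy_fd = (z_upper(x, y + h) - z_upper(x, y - h)) / (2 * h)
print(dz_dx_fd, -x / z)   # finite difference vs. implicit differentiation
print(dz_dy_fd, -y / z)
```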


Our formulation allows us to study \(\bff (\bx, \by)=\mathbf 0\) even when \(\bff\) is not differentiable in \(\by\text{.}\) For example, \(f(x, y)=x |y|+e^{x}\) is differentiable in \(x\text{,}\) with \(f(0, 0)=1\text{,}\) and \(D_{x}f(x, y)=|y|+e^{x}\) is continuous with \(D_{x}f(0, 0)=1\text{.}\) Applying Theorem 5.5.8 to \(f(x, y)-1\text{,}\) which vanishes at \((0, 0)\text{,}\) we conclude that there exists a continuous \(x=g(y)\) for \(y\) near \(0\) with \(g(0)=0\) such that \(|y|g(y)+e^{g(y)}=1\text{.}\)


The formulation of Theorem 5.5.8 specifies a splitting of the variables into \((\bx, \by)\text{.}\) In applications, one often has some flexibility in choosing this splitting. For example, suppose that \(f(x, y, z)\) has continuous partial derivatives, and one is interested in understanding the set \(\{(x, y, z): f(x, y, z)=f(x_{0}, y_{0},z_{0})\}\) for \((x, y, z)\) near \((x_{0}, y_{0},z_{0})\text{;}\) this set is called a level set of \(f\text{.}\) Since \(f\) is scalar-valued, any application of Theorem 5.5.8 solves for one of the variables in terms of the remaining two.


To solve for \(z\) in terms of \((x, y)\text{,}\) one needs to check whether \(D_{z}f(x_{0}, y_{0},z_{0})\text{,}\) as a linear map from \(\bbR\) to \(\bbR\text{,}\) is invertible; here this is equivalent to \(\frac{\partial f}{\partial z }(x_{0}, y_{0},z_{0})\ne 0\text{.}\) Even if this fails, it is possible that \(\frac{\partial f}{\partial x}(x_{0}, y_{0},z_{0})\ne 0\) or \(\frac{\partial f}{\partial y}(x_{0}, y_{0},z_{0})\ne 0\text{;}\) then near \((x_{0}, y_{0},z_{0})\) the set \(\{(x, y, z): f(x, y, z)=f(x_{0}, y_{0},z_{0})\}\) can be described as a graph of \(x\) over \((y, z)\text{,}\) or as a graph of \(y\) over \((x, z)\text{,}\) respectively. Thus as long as

\begin{equation*} Df(x_{0}, y_{0},z_{0})=(\frac{\partial f}{\partial x}(x_{0}, y_{0},z_{0}), \frac{\partial f}{\partial y}(x_{0}, y_{0},z_{0}),\frac{\partial f}{\partial z}(x_{0}, y_{0},z_{0})) \ne (0, 0, 0) \end{equation*}

one can apply Theorem 5.5.8 with respect to one of the variables to conclude that the level set \(\{(x, y, z): f(x, y, z)=f(x_{0}, y_{0},z_{0})\}\) for \((x, y, z)\) near \((x_{0}, y_{0},z_{0})\) is given by the graph of a function of two of the variables.


Suppose \(g(x, y, z)\) is another function with continuous partial derivatives and one is interested in understanding the set

\begin{equation} \{(x, y, z): f(x, y, z)=f(x_{0}, y_{0},z_{0}), g(x, y, z)=g(x_{0}, y_{0},z_{0})\}\text{.}\tag{5.5.6} \end{equation}

We are imposing two conditions, so an application of Theorem 5.5.8 would solve for two of the variables in terms of the remaining one. Geometrically, the set above is the intersection of a level set of \(f\) with a level set of \(g\text{.}\)


To check whether the conditions of Theorem 5.5.8 hold for solving for \((x, y)\) in terms of \(z\text{,}\) we need to check whether

\begin{equation*} \begin{bmatrix} \frac{\partial f}{\partial x}(x_{0}, y_{0},z_{0}) \amp \frac{\partial f}{\partial y}(x_{0}, y_{0},z_{0})\\ \frac{\partial g}{\partial x}(x_{0}, y_{0},z_{0}) \amp \frac{\partial g}{\partial y}(x_{0}, y_{0},z_{0})\end{bmatrix} \end{equation*}

is invertible. Similar criteria can be formulated for the other choices of a pair of variables. Only one of these invertibility criteria needs to hold in order to apply Theorem 5.5.8 and conclude that, near \((x_{0}, y_{0},z_{0})\text{,}\) the set in (5.5.6) is a curve represented as a graph over one of the three variables. Thus a streamlined condition is that the matrix

\begin{equation*} \begin{bmatrix} \frac{\partial f}{\partial x}(x_{0}, y_{0},z_{0}) \amp \frac{\partial f}{\partial y}(x_{0}, y_{0},z_{0}) \amp \frac{\partial f}{\partial z}(x_{0}, y_{0},z_{0}) \\ \frac{\partial g}{\partial x}(x_{0}, y_{0},z_{0}) \amp \frac{\partial g}{\partial y}(x_{0}, y_{0},z_{0}) \amp \frac{\partial g}{\partial z}(x_{0}, y_{0},z_{0}) \end{bmatrix} \end{equation*}

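has rank \(2\text{,}\) since a \(2\times 3\) matrix has rank \(2\) exactly when at least one of its \(2\times 2\) minors is nonzero. Geometrically, this means that the two row vectors of the above matrix, the gradients of \(f\) and \(g\) at \((x_{0}, y_{0},z_{0})\text{,}\) are linearly independent.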
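The rank condition can be checked mechanically by listing which \(2\times 2\) minors are nonzero; each nonzero minor names a pair of variables that Theorem 5.5.8 lets us solve for in terms of the third. In this sketch (NumPy assumed), the sphere \(f=x^{2}+y^{2}+z^{2}\) and the plane \(g=x+y+z\) at \((1,0,0)\) are a hypothetical example, and the name `solvable_pairs` and the tolerance `tol` for floating-point input are assumptions.

```python
from itertools import combinations

import numpy as np

def solvable_pairs(J, names=("x", "y", "z"), tol=1e-12):
    """Pairs of variables whose 2x2 minor of the 2x3 Jacobian J is nonzero,
    i.e. pairs that can be solved for in terms of the remaining variable."""
    return [(names[i], names[j])
            for i, j in combinations(range(3), 2)
            if abs(np.linalg.det(J[:, [i, j]])) > tol]

# Hypothetical example: f = x^2 + y^2 + z^2, g = x + y + z at (1, 0, 0).
x, y, z = 1.0, 0.0, 0.0
J = np.array([[2 * x, 2 * y, 2 * z],   # gradient of f
              [1.0, 1.0, 1.0]])        # gradient of g
print(np.linalg.matrix_rank(J), solvable_pairs(J))
```

Here the minor for \((y, z)\) vanishes: the tangent direction of the circle at \((1,0,0)\) is parallel to \(\nabla f\times\nabla g=(0,-2,2)\text{,}\) which has no \(x\)-component, and indeed \(x\) attains its maximum on the circle there, so the curve cannot be written as a graph over \(x\) near that point.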


Remark 5.5.11.

From the structure of the proof, one can see that only the differentiability and continuity of the maps involved and the completeness of the underlying space (used in the iteration scheme in the proof of the contraction mapping theorem) are used. So the Inverse Function and Implicit Function Theorems generalize readily to infinite-dimensional complete normed vector spaces, which are called Banach spaces. A simple example of such a space is \(C[0,1]\text{,}\) the space of continuous functions on the interval \([0,1]\text{,}\) with the norm \(\Vert f\Vert_0 = \max_{x \in [0,1]} |f(x)|\text{.}\)


The Inverse Function and Implicit Function Theorems are often used to establish the local solvability of an equation near a known solution. In geometric contexts, they are often used to construct manifolds. Below is an example.


Example 5.5.12. A geometric application of the Implicit Function Theorem.

Suppose that \(\bff(\bx)\) is a mapping from a neighborhood of \(\bx_0=(x_{01},\ldots, x_{0n})\) in \(\mathbb R^n \) into \(\mathbb R^m\) (\(m \lt n\)), with \(\bff(\bx_0)=\by_0\text{,}\) that it has continuous partial derivatives, and that the submatrix \([\frac{\partial f_i}{\partial x_j}], 1 \le i, j \le m\text{,}\) of the Jacobian matrix \(\bff_{\bx}(\bx_0)\) is invertible. Such an \(\bff\) is called a submersion near \(\bx_0\text{.}\)


Then by the Implicit Function Theorem there are a neighborhood \(V\) of \((x_{0 (m+1)},\ldots, x_{0n})\text{,}\) a neighborhood \(U\) of \((x_{0 1},\ldots, x_{0m})\text{,}\) and a differentiable map \(\Phi: V\to U\) such that

\begin{equation*} \bff (\Phi(x_{m+1},\ldots, x_{n}), x_{m+1},\ldots, x_{n})=\by_{0} \end{equation*}

for all \((x_{m+1},\ldots, x_{n})\in V\text{.}\) Furthermore, for any \((x_{m+1},\ldots, x_{n})\in V\text{,}\) the only solution of \(\bff(\bx)=\by_{0}\) in \(U\times V\) whose last \(n-m\) components are \((x_{m+1},\ldots, x_{n})\) is \((\Phi(x_{m+1},\ldots, x_{n}), x_{m+1},\ldots, x_{n})\text{.}\)
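
To see the implicitly defined map \(\Phi\) concretely, here is a minimal Python sketch for a hypothetical submersion with \(n=4\), \(m=2\): \(\bff(\bx)=(x_1+x_2^2+x_3x_4,\ x_2+x_1x_3-x_4)\), \(\bx_0=\mathbf 0\), \(\by_0=\mathbf 0\), whose \(2\times 2\) submatrix \([\partial f_i/\partial x_j]_{i,j\le 2}\) at \(\mathbf 0\) is the identity. The value \(\Phi(x_3,x_4)\) is computed by Newton's method in the unknowns \((x_1,x_2)\); Newton's method is our choice of numerical solver, not part of the theorem.

```python
def f(x1, x2, x3, x4):
    """Hypothetical submersion R^4 -> R^2 with f(0) = y0 = (0, 0)."""
    return (x1 + x2 ** 2 + x3 * x4, x2 + x1 * x3 - x4)


def partial_jacobian(x1, x2, x3, x4):
    """The m x m submatrix [df_i/dx_j], i, j = 1, 2 (the identity at 0)."""
    return ((1.0, 2 * x2), (x3, 1.0))


def phi(x3, x4, tol=1e-14, max_iter=50):
    """Return (x1, x2) = Phi(x3, x4) with f(x1, x2, x3, x4) = (0, 0),
    computed by Newton's method in (x1, x2) starting from x0 = 0."""
    x1, x2 = 0.0, 0.0
    for _ in range(max_iter):
        r1, r2 = f(x1, x2, x3, x4)
        (j11, j12), (j21, j22) = partial_jacobian(x1, x2, x3, x4)
        det = j11 * j22 - j12 * j21  # nonzero near 0 by the hypothesis
        # Newton step: solve J (d1, d2) = -(r1, r2) by Cramer's rule.
        d1 = (-r1 * j22 + j12 * r2) / det
        d2 = (-j11 * r2 + j21 * r1) / det
        x1, x2 = x1 + d1, x2 + d2
        if abs(d1) + abs(d2) < tol:
            return x1, x2
    raise RuntimeError("Newton iteration did not converge")


x1, x2 = phi(0.1, -0.2)
print(x1, x2, f(x1, x2, 0.1, -0.2))
```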

To describe the set of all solutions of \({\mathbf f}({\mathbf x})= {\mathbf y}_0 \text{,}\) suppose that the Jacobian of \({\mathbf f}\) has rank \(m\) at every solution of \({\mathbf f}({\mathbf x})= {\mathbf y}_0\text{.}\) Then at each solution some \(m\times m\) submatrix of the Jacobian is invertible, so after relabeling the variables one can apply the Implicit Function Theorem near that solution and conclude that the set of solutions near it is described by a \(C^1\) graph over an \((n-m)\)-dimensional ball. These are examples of what are called \(C^1\) manifolds. The simplest case is \(m=1\text{,}\) where the condition on the Jacobian becomes the non-vanishing of the gradient vector \((D_{1}f({\mathbf x}_{0}), \ldots, D_{n}f({\mathbf x}_{0}))\) at each solution \({\mathbf x}_{0}\text{,}\) and the resulting manifold is a piece of a hypersurface.
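
For \(m=1\), the recipe "pick a coordinate whose partial derivative does not vanish and solve for it" can be sketched in Python. As an assumed illustration we use \(f(x,y,z)=x^2+y^2-z^2\) at level \(1\) (the hyperboloid from the example below); on this level set the gradient \((2x,2y,-2z)\) never vanishes, since \(x^2+y^2=1+z^2\ge 1\). The helper names are hypothetical, and Newton's method is just one convenient way to evaluate the implicit function.

```python
def f(p):
    x, y, z = p
    return x * x + y * y - z * z


def grad(p):
    x, y, z = p
    return (2 * x, 2 * y, -2 * z)


def graph_coordinate(p):
    """Index of the coordinate whose partial derivative at p is largest in
    absolute value; near p the level set is a graph over the other two."""
    g = grad(p)
    return max(range(3), key=lambda j: abs(g[j]))


def solve_for_coordinate(p, i, others, level=1.0, tol=1e-14, max_iter=50):
    """Starting from p on {f = level}, set the coordinates other than i to
    `others` and solve f = level for coordinate i by Newton's method."""
    q = list(p)
    free = [j for j in range(3) if j != i]
    for j, value in zip(free, others):
        q[j] = value
    for _ in range(max_iter):
        step = (f(q) - level) / grad(q)[i]
        q[i] -= step
        if abs(step) < tol:
            return tuple(q)
    raise RuntimeError("Newton iteration did not converge")


p = (2.0, 2.0, 7 ** 0.5)   # on the level set, since 4 + 4 - 7 = 1
i = graph_coordinate(p)    # here z has the largest partial derivative
q = solve_for_coordinate(p, i, (2.1, 1.9))
print(i, q, f(q))
```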

Here is a simple example with \(m=2, n=3\text{:}\) the set

\begin{equation*} \{(x, y, z): x^{2}+y^{2}-z^{2}=1, x -y =c\} \end{equation*}

represents the intersection of the hyperboloid \(\{(x, y, z): x^{2}+y^{2}-z^{2}=1\}\) and the vertical plane \(\{(x, y, z): x -y =c\}\text{.}\) We set \(f(x, y, z)= x^{2}+y^{2}-z^{2}\) and \(g(x, y, z)=x-y\) and check the rank of

\begin{equation*} \begin{bmatrix} \frac{\partial f}{\partial x} \amp \frac{\partial f}{\partial y} \amp \frac{\partial f}{\partial z} \\ \frac{\partial g}{\partial x} \amp \frac{\partial g}{\partial y} \amp \frac{\partial g}{\partial z} \end{bmatrix} =\begin{bmatrix} 2x \amp 2y \amp -2z \\ 1 \amp -1 \amp 0 \end{bmatrix}\text{.} \end{equation*}

Since \(\det \begin{bmatrix} 2y \amp -2z \\ -1 \amp 0 \end{bmatrix}=-2z\text{,}\) which is nonzero whenever \(z\ne 0\text{,}\) the Implicit Function Theorem applies with respect to the \((y, z)\) variables at every point of the intersection with \(z\ne 0\text{:}\) near such a point, the intersection can be represented in the form \((y, z)=(\varphi(x), \psi(x))\) for some differentiable \(\varphi, \psi\text{.}\) The intersection contains points with \(z=0\) only when \(c\) is in the range \([-\sqrt 2, \sqrt 2]\text{,}\) since \(x^{2}+y^{2}=1\) forces \(|x-y|\le\sqrt 2\text{.}\) We can thus conclude that if \(c\) is not in this range, then near each of its points the intersection can be represented in the form \((y, z)=(\varphi(x), \psi(x))\) for some differentiable \(\varphi, \psi\text{.}\) The derivatives can be found by implicit differentiation:

\begin{equation*} 2x + 2y \frac{\partial y}{\partial x} - 2 z \frac{\partial z}{\partial x}=0, \quad 1-\frac{\partial y}{\partial x}=0\text{,} \end{equation*}

which give \(\frac{\partial y}{\partial x}=1\) and \(\frac{\partial z}{\partial x}=\frac{x+y}{z}\text{.}\)
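
A quick numerical sanity check of these formulas, as a minimal Python sketch: for \(c=2\) (an assumed value outside \([-\sqrt 2,\sqrt 2]\)), the intersection can be written explicitly as \(y=x-c\), \(z=\pm\sqrt{2x^2-2cx+c^2-1}\), where \(2x^2-2cx+c^2-1=2(x-c/2)^2+c^2/2-1>0\). The two signs give the two branches of the intersection, each a graph over \(x\). The code compares \(\partial z/\partial x=(x+y)/z\) with a central difference quotient.

```python
import math

C = 2.0  # assumed value of c with |c| > sqrt(2)


def branch(x, sign=1.0, c=C):
    """Point (y, z) of the intersection over x; sign = +1 or -1 picks the
    branch z > 0 or z < 0 (z never vanishes when |c| > sqrt(2))."""
    y = x - c
    z = sign * math.sqrt(2 * x * x - 2 * c * x + c * c - 1)
    return y, z


def implicit_derivatives(x, y, z):
    """Solve 2x + 2y y' - 2z z' = 0 and 1 - y' = 0 for (y', z')."""
    dy = 1.0
    dz = (x + y * dy) / z
    return dy, dz


x, h = 0.3, 1e-6
for sign in (1.0, -1.0):
    y, z = branch(x, sign)
    dy, dz = implicit_derivatives(x, y, z)
    fd = (branch(x + h, sign)[1] - branch(x - h, sign)[1]) / (2 * h)
    print(sign, dz, fd)
```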

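In the above setting one could also apply the Implicit Function Theorem with respect to the \((x, z)\) variables to represent the intersection locally as a graph over the \(y\) variable: the relevant determinant is \(\det \begin{bmatrix} 2x \amp -2z \\ 1 \amp 0 \end{bmatrix}=2z\text{,}\) which is again nonzero whenever \(z\ne 0\text{.}\)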

Exercise 5.5.13.

Study the applicability of the Implicit Function Theorem with respect to the \((x, y)\) variables in the level set \(\{(x, y, z): x^{2}+y^{2}-z^{2}=1, x -y =c\}\) for \(c\) in the range \([-\sqrt 2, \sqrt 2]\text{.}\) When it is applicable, find \(\frac{\partial x}{\partial z}\) and \(\frac{\partial y}{\partial z}\text{.}\)

Exercise 5.5.14. Limacon curve.

The level set of \(f(x, y; a)=(x^{2}+y^{2}-x)^{2}-a^{2}(x^{2}+y^{2})=0\) is a curve called a Limacon; we treat \(a\) as a parameter.

Identify the point(s) where this curve intersects the \(x\) axis, and examine whether Theorem 5.5.8 is applicable at each such point; the result may depend on the value of the parameter \(a\text{.}\)

Note that \((0, 0)\) is a solution for any choice of \(a\text{.}\) Examine the set of solutions near \((0, 0)\) for different values of the parameter \(a\text{.}\)

Hint.

Examining plots of the curve for several values of \(a\) gives some idea of the behavior to expect.

Exercise 5.5.15. Differentiability of the level set of \(\Vert \bx \Vert =c\).

Consider \(\Vert\bx\Vert =\left(\sum_{i=1}^{n}|x_{i}|^{p}\right)^{1/p}\) as a function on \(\bbR^{n}\text{.}\) Identify conditions on \(p\) and \(\bx_{0}\) such that the level set \(\{\bx: \Vert\bx\Vert =\Vert\bx_{0}\Vert \}\) near \(\bx_{0}\) is the graph of a differentiable function of \((n-1)\) variables.

Exercise 5.5.16. Foliation of a neighborhood by level surfaces.

Suppose that \(f(\bx)\) is continuously differentiable in a neighborhood of \(\mathbf 0\in\bbR^{n}\text{,}\) \(f(\mathbf 0)=0\text{,}\) and \(D_{n}f(\mathbf 0)\ne 0\text{.}\) Prove that there exist a ball \(B(\mathbf 0, r)\subset \bbR^{n-1}\text{,}\) \(\delta >0\text{,}\) and a continuously differentiable function \(g(\bx', t)\) defined for \((\bx', t)\in B(\mathbf 0, r) \times (-\delta, \delta)\) such that (a) \(f(\bx', g(\bx', t))=t\text{,}\) (b) \(g(\mathbf 0, 0)=0\text{,}\) and (c) \(V:=\{(\bx', g(\bx', t)): (\bx', t)\in B(\mathbf 0, r)\times(-\delta, \delta) \}\) forms a neighborhood of \(\mathbf 0\in\bbR^{n}\text{.}\)

Note that for each fixed \(t\in (-\delta, \delta)\text{,}\) \(f=t\) on the set \(\{(\bx', g(\bx', t)): \bx'\in B(\mathbf 0, r)\}\text{,}\) so this neighborhood is foliated by leaves which are level surfaces of \(f\text{.}\)

Example 5.5.17. Differentiability of eigenvectors and eigenvalues.

Often we are interested in whether eigenvectors and eigenvalues of a matrix depend on the entries of the matrix in a continuous or differentiable way. Since any non-zero multiple of an eigenvector of a matrix is still an eigenvector, we need to impose some normalizing condition to speak of a continuously varying eigenvector.

Let \(A_{0}\) be an \(n\times n\) matrix and let \(\bx_{0}\ne \mathbf 0\) satisfy \(A_{0}\bx_{0}=\lambda_{0}\bx_{0}\) for some \(\lambda_{0}\text{.}\) We normalize \(\bx_{0}\) so that \(\Vert \bx_{0}\Vert=1\) (we use the Euclidean norm here because \(\Vert \bx\Vert^{2}=\bx\cdot\bx\) is then a polynomial in the components of \(\bx\text{,}\) hence differentiable). Then a unit eigenvector \(\bx\) of a matrix \(A\text{,}\) together with its eigenvalue \(\lambda\text{,}\) is characterized by the equation

\begin{equation} F(A; \bx, \lambda)=((A-\lambda I)\bx, \Vert \bx \Vert^{2})=(\mathbf 0, 1)\in \bbR^{n+1}\tag{5.5.7} \end{equation}

Note that \(F(A; \bx, \lambda)\) has continuous partial derivatives as a function of \((A; \bx, \lambda)\in \bbR^{n\times n}\times \bbR^{n}\times \bbR\text{.}\) Our aim is to solve for \((\bx, \lambda)\) as a function of \(A\text{.}\)

Claim. Suppose that \(A_{0}\) is symmetric and \(\bx_{0}\) is a simple eigenvector with eigenvalue \(\lambda_{0}\text{,}\) namely, the null space of \(A_{0}-\lambda_{0}I\) is one-dimensional and spanned by \(\bx_{0}\text{.}\) Then there exist some \(\delta >0\) and differentiable functions \(Evc(A)\in \bbR^{n}\) and \(Evl(A)\in \bbR\text{,}\) defined for matrices \(A\) with \(\Vert A-A_{0}\Vert \lt \delta\text{,}\) such that \(Evc(A_{0})=\bx_{0}\text{,}\) \(Evl(A_{0})=\lambda_{0}\text{,}\) and (5.5.7) holds with \(\bx=Evc(A)\) and \(\lambda=Evl(A)\text{.}\)

According to the Implicit Function Theorem, we need to verify that

\begin{equation*} D_{(\bx, \lambda)}F(A_{0},\bx_{0},\lambda_{0}) \end{equation*}

is an invertible map of \(\bbR^{n+1}\text{.}\) But

\begin{equation*} D_{(\bx, \lambda)}F(A_{0},\bx_{0},\lambda_{0})(\bv, s)= ((A_{0}-\lambda_{0}I)\bv -s \bx_{0}, 2 \bx_{0}\cdot \bv). \end{equation*}

We need to establish the unique solvability in \((\bv, s)\) of

\begin{align} (A_{0}-\lambda_{0}I)\bv -s \bx_{0} \amp = \bw \tag{5.5.8}\\ 2 \bx_{0}\cdot \bv \amp = t\tag{5.5.9} \end{align}

for any given \(\bw \in \bbR^{n}\text{,}\) \(t\in \bbR\text{.}\)
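
In matrix form, \(D_{(\bx,\lambda)}F(A_0,\bx_0,\lambda_0)\) is the \((n+1)\times(n+1)\) bordered matrix \(\begin{bmatrix} A_0-\lambda_0 I \amp -\bx_0\\ 2\bx_0^{T} \amp 0\end{bmatrix}\), and unique solvability of (5.5.8)-(5.5.9) is the same as this matrix being invertible. The minimal Python sketch below builds it and computes its determinant by Gaussian elimination, for the assumed example \(A_0=\mathrm{diag}(1,3)\), \(\lambda_0=1\), \(\bx_0=(1,0)\); the function names are hypothetical.

```python
def bordered_matrix(A0, x0, lam0):
    """Matrix of (v, s) -> ((A0 - lam0 I) v - s x0, 2 x0 . v) on R^(n+1)."""
    n = len(x0)
    rows = [[A0[i][j] - (lam0 if i == j else 0.0) for j in range(n)] + [-x0[i]]
            for i in range(n)]
    rows.append([2.0 * xj for xj in x0] + [0.0])
    return rows


def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [list(row) for row in M]
    n = len(M)
    result = 1.0
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        if M[p][k] == 0.0:
            return 0.0
        if p != k:
            M[k], M[p] = M[p], M[k]
            result = -result
        result *= M[k][k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n):
                M[i][j] -= factor * M[k][j]
    return result


A0 = [[1.0, 0.0], [0.0, 3.0]]
print(det(bordered_matrix(A0, [1.0, 0.0], 1.0)))  # 4.0
```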

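Under our assumptions, the matrix \(A_{0}-\lambda_{0}I\) has rank \(n-1\text{,}\) and, since it is symmetric, its range is the orthogonal complement of its null space \(\operatorname{span}\{\bx_{0}\}\text{;}\) thus \((A_{0}-\lambda_{0}I)\bv=\bc\) has a solution iff \(\bc\cdot \bx_{0}=0\text{.}\) Hence (5.5.8) has a solution exactly when \((s\bx_{0}+\bw)\cdot \bx_{0}=0\text{,}\) which, since \(\Vert\bx_{0}\Vert=1\text{,}\) determines \(s\) uniquely: \(s=- \bw\cdot \bx_{0}\text{.}\) With this \(s\text{,}\) the solutions of (5.5.8) form a line \(\bv_{p}+\alpha\bx_{0}\text{,}\) \(\alpha\in\bbR\text{,}\) and (5.5.9) fixes \(\alpha\) uniquely, since \(2\bx_{0}\cdot(\bv_{p}+\alpha\bx_{0})=2\bx_{0}\cdot\bv_{p}+2\alpha\text{.}\) Thus \(\bv\) is also determined uniquely.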
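
The argument above is constructive and can be followed numerically. In the minimal Python sketch below, \(s=-\bw\cdot\bx_0\) comes from the solvability condition. To pick a particular solution \(\bv_p\perp\bx_0\) we solve \((A_0-\lambda_0I+\bx_0\bx_0^{T})\bv_p=\bw+s\bx_0\); this deflation trick is our own choice rather than part of the text (for symmetric \(A_0\) with simple \(\lambda_0\) and \(\Vert\bx_0\Vert=1\) this matrix is invertible, and dotting the equation with \(\bx_0\) shows \(\bv_p\cdot\bx_0=0\)). Then \(\alpha=t/2\). The example data are assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting
    (M is assumed invertible)."""
    n = len(M)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, n):
            factor = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= factor * aug[k][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def solve_linearized(A0, x0, lam0, w, t):
    """Solve (5.5.8)-(5.5.9) for (v, s), assuming A0 symmetric, lam0 simple,
    and |x0| = 1."""
    n = len(x0)
    s = -dot(w, x0)                           # solvability condition
    c = [w[i] + s * x0[i] for i in range(n)]  # now c . x0 = 0
    # Deflated matrix A0 - lam0 I + x0 x0^T: its solution v_p satisfies
    # both (A0 - lam0 I) v_p = c and v_p . x0 = 0.
    M = [[A0[i][j] - (lam0 if i == j else 0.0) + x0[i] * x0[j]
          for j in range(n)] for i in range(n)]
    v_p = solve(M, c)
    alpha = t / 2                             # from 2 x0 . (v_p + alpha x0) = t
    return [v_p[i] + alpha * x0[i] for i in range(n)], s


r = 2 ** -0.5
v, s = solve_linearized([[2.0, 1.0], [1.0, 2.0]], [r, r], 3.0, [0.3, -0.7], 0.4)
print(v, s)
```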

Note that the matrices \(A\) near \(A_{0}\) need not be symmetric; only \(A_{0}\) is assumed to be symmetric.
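
One consequence can be checked numerically. Differentiating \((A_0+\varepsilon B-\lambda(\varepsilon)I)\bx(\varepsilon)=\mathbf 0\) at \(\varepsilon=0\) gives \(B\bx_0+(A_0-\lambda_0I)\bx'(0)-\lambda'(0)\bx_0=\mathbf 0\). Dotting with \(\bx_0\) and using \(\Vert\bx_0\Vert=1\) and the symmetry of \(A_0\) (so that \(\bx_0\cdot(A_0-\lambda_0I)\bx'(0)=0\)) yields \(\lambda'(0)=\bx_0\cdot B\bx_0\), even for non-symmetric \(B\). The minimal Python sketch compares this with a difference quotient for the assumed example \(A_0=\begin{bmatrix}2\amp 1\\1\amp 2\end{bmatrix}\), \(\lambda_0=3\), \(B=\begin{bmatrix}0\amp 1\\0\amp 0\end{bmatrix}\), where \(Evl(A_0+\varepsilon B)=2+\sqrt{1+\varepsilon}\).

```python
import math


def top_eigenvalue_2x2(M):
    """Larger eigenvalue (a + d)/2 + sqrt(((a - d)/2)^2 + bc) of a real
    2x2 matrix [[a, b], [c, d]], assuming its eigenvalues are real."""
    (a, b), (c, d) = M
    disc = ((a - d) / 2) ** 2 + b * c
    if disc < 0:
        raise ValueError("eigenvalues are not real")
    return (a + d) / 2 + math.sqrt(disc)


A0 = ((2.0, 1.0), (1.0, 2.0))   # symmetric; eigenvalues 1 and 3
B = ((0.0, 1.0), (0.0, 0.0))    # non-symmetric perturbation direction
x0 = (2 ** -0.5, 2 ** -0.5)     # unit eigenvector of A0 for lambda0 = 3


def perturbed(eps):
    return tuple(tuple(A0[i][j] + eps * B[i][j] for j in range(2)) for i in range(2))


def predicted_derivative():
    """lambda'(0) = x0 . B x0."""
    Bx0 = [sum(B[i][j] * x0[j] for j in range(2)) for i in range(2)]
    return sum(x0[i] * Bx0[i] for i in range(2))


h = 1e-6
difference_quotient = (top_eigenvalue_2x2(perturbed(h))
                       - top_eigenvalue_2x2(perturbed(-h))) / (2 * h)
print(predicted_derivative(), difference_quotient)
```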

Question: Can either the symmetry assumption on \(A_{0}\) or the simplicity assumption on the eigenspace be dropped entirely while keeping the same conclusions?

Hint: Examine the behavior of eigenvectors and eigenvalues of simple \(2\times 2\) matrices.
