The heuristic for the first part is similar to that in the proof of the Inverse Function Theorem: for \((\bx, \by)\in U_{0}\times V_{0}\text{,}\) use the linear approximation of \({\mathbf f}\) in the \({\mathbf x}\) variable to approximate \({\mathbf f}({\mathbf x}, {\mathbf y})\text{;}\) more precisely, define the remainder
\begin{equation*}
{\mathbf r}({\mathbf x}; {\mathbf x}_{0}, {\mathbf y}):={\mathbf f}({\mathbf x}, {\mathbf y})-
{\mathbf f}({\mathbf x}_{0}, {\mathbf y})-D_{{\mathbf x}} {\mathbf f}({\mathbf x}_{0}, {\mathbf y})
({\mathbf x}-{\mathbf x}_{0}).
\end{equation*}
Then the equation \({\mathbf f}({\mathbf x}, {\mathbf y})={\mathbf 0}\) is equivalent to
\begin{equation*}
{\mathbf r}({\mathbf x}; {\mathbf x}_{0}, {\mathbf y})=-
{\mathbf f}({\mathbf x}_{0}, {\mathbf y})-D_{{\mathbf x}} {\mathbf f}({\mathbf x}_{0}, {\mathbf y})
({\mathbf x}-{\mathbf x}_{0}),
\end{equation*}
or, provided \(D_{{\mathbf x}} {\mathbf f}({\mathbf x}_{0}, {\mathbf y})\) is invertible, to \({\mathbf x}\) being a fixed point of the mapping
\begin{align*}
\phi ({\mathbf x}) := \amp {\mathbf x}_{0} +
\left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y})\right]^{-1}
\left\{ - {\mathbf f}({\mathbf x}_{0}, {\mathbf y})
-{\mathbf r}({\mathbf x}; {\mathbf x}_{0}, {\mathbf y})\right\}\\
=\amp {\mathbf x}_{0} +
\left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y})\right]^{-1}
\left\{ - {\mathbf f}({\mathbf x}, {\mathbf y}) + D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y})({\mathbf x}-{\mathbf x}_{0}) \right\}\\
=\amp {\mathbf x}+ \left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y})\right]^{-1}
\left\{ - {\mathbf f}({\mathbf x}, {\mathbf y}) \right\}.
\end{align*}
Then, in a similar way, one can show that \(\phi\) has a unique fixed point in \(\overline{B({\mathbf x}_{0}, \delta)}\) when \(\delta>0\) and \(r>0\) are chosen appropriately so that \(\phi\) satisfies (5.5.3) for \({\mathbf x} \in B({\mathbf x}_{0}, \delta)\) and \({\mathbf y}\in B({\mathbf y}_0, r) \text{.}\) This shows the existence of \({\mathbf x} = {\mathbf u}({\mathbf y})\) for \({\mathbf y}\in B({\mathbf y}_0, r) \text{.}\)
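
To indicate how such a choice of \(\delta\) and \(r\) might be made, here is a sketch under assumptions not restated in this passage: \({\mathbf f}\) is continuously differentiable, \({\mathbf f}({\mathbf x}_{0}, {\mathbf y}_{0})={\mathbf 0}\text{,}\) and \(D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y})\) is invertible with uniformly bounded inverse for \({\mathbf y}\) near \({\mathbf y}_{0}\text{.}\) Writing \(I\) for the identity map and differentiating the last expression for \(\phi\) gives
\begin{equation*}
D_{{\mathbf x}}\phi({\mathbf x}) = I - \left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y})\right]^{-1} D_{{\mathbf x}}{\mathbf f}({\mathbf x}, {\mathbf y})
= \left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y})\right]^{-1}
\left\{ D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y}) - D_{{\mathbf x}}{\mathbf f}({\mathbf x}, {\mathbf y}) \right\},
\end{equation*}
whose norm can be made at most \(1/2\) for \({\mathbf x} \in B({\mathbf x}_{0}, \delta)\) and \({\mathbf y}\in B({\mathbf y}_0, r)\) by the continuity of \(D_{{\mathbf x}}{\mathbf f}\text{,}\) while
\begin{equation*}
\phi({\mathbf x}_{0}) - {\mathbf x}_{0} = - \left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y})\right]^{-1} {\mathbf f}({\mathbf x}_{0}, {\mathbf y})
\end{equation*}
can be made at most \(\delta/2\) in norm by shrinking \(r\) further. Under these two bounds,
\(\Vert \phi({\mathbf x}) - {\mathbf x}_{0}\Vert \le \Vert \phi({\mathbf x}) - \phi({\mathbf x}_{0})\Vert + \Vert \phi({\mathbf x}_{0}) - {\mathbf x}_{0}\Vert \le \delta\)
for \({\mathbf x}\in \overline{B({\mathbf x}_{0}, \delta)}\text{,}\) so \(\phi\) would be a contraction of \(\overline{B({\mathbf x}_{0}, \delta)}\) into itself.
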
In fact, there is some flexibility in setting up \(\phi\text{.}\) One could use a modified \(\phi\) such as
\begin{equation*}
\phi ({\mathbf x}) :={\mathbf x}+
\left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y}_{0})\right]^{-1}
\left\{ - {\mathbf f}({\mathbf x}, {\mathbf y}) \right\}
\end{equation*}
and use its fixed point to construct \({\mathbf x} = {\mathbf u}({\mathbf y})\text{.}\)
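
As a concrete illustration (an example chosen here, not taken from the text), consider the scalar function \(f(x, y) = x^{2}+y^{2}-1\) with \((x_{0}, y_{0}) = (1, 0)\text{.}\) Then \(D_{x}f(x_{0}, y_{0}) = 2\text{,}\) and the modified map reads
\begin{equation*}
\phi(x) = x - \tfrac{1}{2}\left( x^{2}+y^{2}-1 \right), \qquad \phi'(x) = 1 - x.
\end{equation*}
A fixed point of \(\phi\) satisfies \(x^{2}+y^{2}=1\text{,}\) and \(\vert \phi'(x)\vert \le 1/2\) for \(\vert x-1\vert \le 1/2\text{.}\) Since \(\vert \phi(1)-1\vert = y^{2}/2 \le 1/4\) when \(y^{2}\le 1/2\text{,}\) the map \(\phi\) is then a contraction of \([1/2, 3/2]\) into itself, and its unique fixed point there is \(x = u(y) = \sqrt{1-y^{2}}\text{.}\)
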
To prove the continuity of \({\mathbf x} = {\mathbf u}({\mathbf y})\text{,}\) one takes \({\mathbf y}_{1}, {\mathbf y}_{2}\in B({\mathbf y}_0, r) \) and tries to use the relation
\begin{align*}
{\mathbf 0} =\amp {\mathbf f}({\mathbf u}({\mathbf y}_{2}), {\mathbf y}_{2})-
{\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{1})\\
=\amp {\mathbf f}({\mathbf u}({\mathbf y}_{2}), {\mathbf y}_{2})-{\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{2})+ {\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{2})-
{\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{1})
\end{align*}
and the information that \({\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{2})-
{\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{1}) \to \mathbf 0\) as \(\by_{2}\to \by_{1}\) to show that \({\mathbf u}({\mathbf y}_{2})\to
{\mathbf u}({\mathbf y}_{1})\text{.}\)
But a standard application of the mean value theorem can only give an upper bound of \(\Vert {\mathbf f}({\mathbf u}({\mathbf y}_{2}), {\mathbf y}_{2})-{\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{2})\Vert\) in terms of \(\Vert {\mathbf u}({\mathbf y}_{2})-
{\mathbf u}({\mathbf y}_{1})\Vert\text{,}\) whereas the argument needs a lower bound. Since \(D_{\bx}\bff(\bx, \by_{2})\) is close to \(D_{\bx}\bff(\bx_{0}, \by_{0})\) when \(\bx \in B({\mathbf x}_{0}, \delta)\) and \(\by_{2} \in B(\by_{0}, r)\text{,}\) the derivative of \(\bff(\bx, \by_{2})-D_{\bx}\bff(\bx_{0}, \by_{0})\bx\) with respect to \(\bx\) is small in the same neighborhood. In other words,
\begin{equation*}
\bff(\bx, \by_{2})=D_{\bx}\bff(\bx_{0}, \by_{0})\bx+ \left[ \bff(\bx, \by_{2})-D_{\bx}\bff(\bx_{0}, \by_{0})\bx\right]
\end{equation*}
``behaves like'' \(D_{\bx}\bff(\bx_{0}, \by_{0})\bx\) as a function of \(\bx\in B({\mathbf x}_{0}, \delta)\text{.}\) We implement this idea by writing
\begin{align*}
{\mathbf 0} =\amp {\mathbf f}({\mathbf u}({\mathbf y}_{2}), {\mathbf y}_{2})-
{\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{1})\\
=\amp \left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y}_{0})\right]
\left[{\mathbf u}({\mathbf y}_{2})- {\mathbf u}({\mathbf y}_{1})\right]
+ {\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{2})-
{\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{1})\\
\amp + \left\{ {\mathbf f}({\mathbf u}({\mathbf y}_{2}), {\mathbf y}_{2})-
\left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y}_{0})\right]{\mathbf u}({\mathbf y}_{2})\right\}
-\left\{ {\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{2})-
\left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y}_{0})\right]{\mathbf u}({\mathbf y}_{1})\right\},
\end{align*}
from which one gets
\begin{align*}
{\mathbf u}({\mathbf y}_{2})- {\mathbf u}({\mathbf y}_{1}) =\amp - \left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y}_{0})\right]^{-1}
\left\{ {\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{2})-
{\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{1})\right\}\\
\amp+ \left\{{\mathbf u}({\mathbf y}_{2})-
\left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y}_{0})\right]^{-1}{\mathbf f}({\mathbf u}({\mathbf y}_{2}),{\mathbf y}_{2})\right\}\\
\amp - \left\{{\mathbf u}({\mathbf y}_{1})-
\left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y}_{0})\right]^{-1}{\mathbf f}({\mathbf u}({\mathbf y}_{1}),{\mathbf y}_{2})\right\}.
\end{align*}
One then uses the fact that the \({\mathbf x}\)-derivative of \({\mathbf x}\mapsto {\mathbf x}-
\left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y}_{0})\right]^{-1}
{\mathbf f}(\mathbf x,{\mathbf y}_{2})\) can be made smaller than \(1/2\) in norm when \({\mathbf x}\in B({\mathbf x}_{0}, \delta)\) and \({\mathbf y}_{2}\in B({\mathbf y}_0, r) \) to get
\begin{equation*}
\Vert {\mathbf u}({\mathbf y}_{2})- {\mathbf u}({\mathbf y}_{1}) \Vert
\le 2 \Vert \left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y}_{0})\right]^{-1}\Vert
\Vert {\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{2})-
{\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{1})\Vert,
\end{equation*}
which shows the continuity of \({\mathbf u}({\mathbf y})\text{.}\)
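
The step leading to this bound can be sketched as follows, writing \({\mathbf g}({\mathbf x}) := {\mathbf x}-
\left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y}_{0})\right]^{-1}{\mathbf f}({\mathbf x},{\mathbf y}_{2})\) (a shorthand introduced here for illustration). Rearranging \({\mathbf 0} = {\mathbf f}({\mathbf u}({\mathbf y}_{2}), {\mathbf y}_{2})-{\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{1})\) gives
\begin{equation*}
{\mathbf u}({\mathbf y}_{2})- {\mathbf u}({\mathbf y}_{1}) = - \left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y}_{0})\right]^{-1}
\left\{ {\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{2})-
{\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{1})\right\}
+ {\mathbf g}({\mathbf u}({\mathbf y}_{2})) - {\mathbf g}({\mathbf u}({\mathbf y}_{1})).
\end{equation*}
Assuming the derivative of \({\mathbf g}\) has norm at most \(1/2\) on a ball containing \({\mathbf u}({\mathbf y}_{1})\) and \({\mathbf u}({\mathbf y}_{2})\text{,}\) the mean value theorem gives \(\Vert {\mathbf g}({\mathbf u}({\mathbf y}_{2})) - {\mathbf g}({\mathbf u}({\mathbf y}_{1}))\Vert \le \tfrac{1}{2}\Vert {\mathbf u}({\mathbf y}_{2})- {\mathbf u}({\mathbf y}_{1})\Vert\text{,}\) so that
\begin{equation*}
\Vert {\mathbf u}({\mathbf y}_{2})- {\mathbf u}({\mathbf y}_{1}) \Vert
\le \Vert \left[D_{{\mathbf x}}{\mathbf f}({\mathbf x}_{0}, {\mathbf y}_{0})\right]^{-1}\Vert
\Vert {\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{2})-
{\mathbf f}({\mathbf u}({\mathbf y}_{1}), {\mathbf y}_{1})\Vert
+ \tfrac{1}{2}\Vert {\mathbf u}({\mathbf y}_{2})- {\mathbf u}({\mathbf y}_{1}) \Vert.
\end{equation*}
Moving the last term to the left-hand side and multiplying by \(2\) yields the estimate above.
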
The differentiability of
\(\bu (\by)\) is shown in the same way as in the proof of the Inverse Function Theorem.
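
Once differentiability is known, the derivative can be identified by applying the chain rule to \({\mathbf f}({\mathbf u}({\mathbf y}), {\mathbf y}) = {\mathbf 0}\text{.}\) The following is the standard computation, recorded as a sketch under the assumptions that \({\mathbf f}\) is continuously differentiable and that \(D_{{\mathbf x}}{\mathbf f}({\mathbf u}({\mathbf y}), {\mathbf y})\) is invertible for \({\mathbf y}\in B({\mathbf y}_0, r)\text{:}\)
\begin{align*}
{\mathbf 0} =\amp D_{{\mathbf x}}{\mathbf f}({\mathbf u}({\mathbf y}), {\mathbf y})\, D{\mathbf u}({\mathbf y}) + D_{{\mathbf y}}{\mathbf f}({\mathbf u}({\mathbf y}), {\mathbf y}),\\
D{\mathbf u}({\mathbf y}) =\amp - \left[D_{{\mathbf x}}{\mathbf f}({\mathbf u}({\mathbf y}), {\mathbf y})\right]^{-1} D_{{\mathbf y}}{\mathbf f}({\mathbf u}({\mathbf y}), {\mathbf y}).
\end{align*}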