We first work out \(\mathbf g \circ \mathbf f(\bx) - \mathbf g \circ \mathbf f(\bx_0)\) by using the differentiability of \(\mathbf f\) at \(\bx_0\) and of \(\mathbf g\) at \(\by_0=\mathbf f(\bx_0)\text{:}\)
\begin{align*}
\mathbf g \circ \mathbf f(\bx) - \mathbf g \circ \mathbf f(\bx_0) \amp
= \mathbf g (\mathbf f(\bx)) - \mathbf g ( \mathbf f(\bx_0)) \\
\amp = [D\mathbf g(\mathbf f(\bx_0))] \left[ \mathbf f(\bx)- \mathbf f(\bx_0)\right] +\bz(\mathbf f(\bx), \mathbf f(\bx_0))
\end{align*}
where \(\Vert \bz(\by, \by_0)\Vert/\Vert \by -\by_0\Vert \to 0\) as \(\by \to \by_0\text{;}\) and
\begin{equation*}
\mathbf f(\bx) - \mathbf f(\bx_0)=[D\mathbf f(\bx_0)](\bx -\bx_0) +\bw (\bx, \bx_0),
\end{equation*}
where \(\Vert \bw(\bx, \bx_0)\Vert/\Vert \bx -\bx_0\Vert \to 0\) as \(\bx \to \bx_0\text{,}\) so we have
\begin{align*}
\mathbf g \circ \mathbf f(\bx) - \mathbf g \circ \mathbf f(\bx_0)
=\amp [D\mathbf g (\mathbf f(\bx_0))][D\mathbf f(\bx_0)](\bx -\bx_0)\\
\amp + [D\mathbf g (\mathbf f(\bx_0))]\bw(\bx, \bx_0)
+ \bz(\mathbf f(\bx), \mathbf f(\bx_0)).
\end{align*}
The differentiability of \(\mathbf g \circ \mathbf f(\bx)\) at \(\bx_0\) is equivalent to
\begin{equation*}
\Vert [D\mathbf g (\mathbf f(\bx_0))]\bw(\bx, \bx_0)
+ \bz(\mathbf f(\bx), \mathbf f(\bx_0)) \Vert/\Vert \bx -\bx_0\Vert \to 0 \text{ as } \bx \to \bx_0.
\end{equation*}
Using the property of matrix norm on \(\Vert [D\mathbf g (\mathbf f(\bx_0))]\bw(\bx, \bx_0) \Vert\text{,}\) we have
\begin{equation*}
\Vert [D\mathbf g (\mathbf f(\bx_0))]\bw(\bx, \bx_0) \Vert \le
\Vert [D\mathbf g (\mathbf f(\bx_0))]\Vert_{\cF} \Vert\bw(\bx, \bx_0) \Vert,
\end{equation*}
therefore
\begin{equation*}
\Vert [D\mathbf g (\mathbf f(\bx_0))]\bw(\bx, \bx_0) \Vert/\Vert \bx -\bx_0\Vert \to 0 \text{ as } \bx \to \bx_0.
\end{equation*}
For \(\bz(\mathbf f(\bx), \mathbf f(\bx_0))\text{,}\) informally
\begin{equation*}
\frac{\Vert \bz (\by, \by_0) \Vert}{\Vert \bx -\bx_0 \Vert}
=\frac{\Vert \bz (\by, \by_0) \Vert}{\Vert \by -\by_0\Vert} \frac{\Vert \by -\by_0\Vert}{\Vert \bx -\bx_0 \Vert},
\end{equation*}
where \(\by =\mathbf f (\bx)\) and \(\by_0=\mathbf f (\bx_0)\text{,}\)
\begin{equation*}
\frac{\Vert \by -\by_0\Vert}{\Vert \bx -\bx_0 \Vert} \text{ remains bounded
as $\bx \to \bx_0$ and
$\frac{\Vert \bz (\by, \by_0) \Vert}{\Vert \by -\by_0\Vert} \to 0$ as $\by \to \by_0$.}
\end{equation*}
But this argument has a minor flaw, for \(\Vert \by -\by_0\Vert\) could be \(0\text{.}\) To fix this issue, for any \(\epsilon \gt 0\text{,}\) there exists some \(\delta \gt 0\) such that \(\Vert \bz (\by, \by_0) \Vert \le \epsilon \Vert \by -\by_0\Vert\) whenever \(\Vert \by -\by_0\Vert \lt \delta\text{.}\) Then using
\begin{align*}
\Vert \by -\by_0 \Vert \le \amp \Vert \bw (\bx, \bx_0) + [D\mathbf f(\bx_0)](\bx -\bx_0)\Vert\\
\le \amp \Vert \bw (\bx, \bx_0) \Vert + \Vert [D\mathbf f(\bx_0)](\bx -\bx_0)\Vert \\
\le \amp \Vert \bw (\bx, \bx_0) \Vert + \Vert [D\mathbf f(\bx_0)]\Vert \Vert \bx -\bx_0\Vert
\end{align*}
and \(\frac{\Vert \bw (\bx, \bx_0) \Vert}{\Vert \bx -\bx_0 \Vert}\to 0\) as \(\bx\to \bx_0\text{,}\) we can find some \(\sigma \gt 0\) such that when \(\Vert \bx -\bx_0\Vert \lt \sigma\text{,}\) \(\Vert \bw (\bx, \bx_0) \Vert \lt \epsilon \Vert \bx -\bx_0\Vert\text{,}\) so
\begin{align*}
\Vert \by-\by_0 \Vert\le \amp \epsilon \Vert \bx -\bx_0\Vert + \Vert [D\mathbf f(\bx_0)]\Vert \Vert \bx -\bx_0\Vert\\
\le \amp \left( \epsilon + \Vert [D\mathbf f(\bx_0)]\Vert\right) \Vert \bx -\bx_0\Vert \lt \delta.
\end{align*}
Putting these together, when \(\Vert \bx -\bx_0\Vert \lt \sigma\) we have
\begin{equation*}
\Vert \bz (\by, \by_0) \Vert \le \epsilon \Vert \by-\by_0 \Vert\le \epsilon
\left( \epsilon + \Vert [D\mathbf f(\bx_0)]\Vert\right) \Vert \bx -\bx_0\Vert,
\end{equation*}
which shows the differentiability of \(\bg \circ \bff \) at \(\bx_0\text{.}\)