
Section 5.2 Linear Functions

Before we discuss linear approximation of a function between normed vector spaces, we first review linear functions between normed vector spaces. Even though much of the discussion makes sense in this general setting, we will mostly focus on the cases where the vector spaces are finite dimensional, in particular, where they are the usual Cartesian vector spaces. In such situations, much of this discussion is a review of relevant concepts and properties from linear algebra. Perhaps the only new concept is the norm of a linear function (also called a linear map or a linear transformation).

Subsection 5.2.1 Definition of Linear Functions and Their Matrix Representation

Definition 5.2.1.

Let \(X, Y\) be two vector spaces over \(\bbR\text{.}\) A function \(L: X\mapsto Y\) is called linear if
\begin{equation*} L(a \bx +b \by)=aL(\bx)+bL(\by)\quad \text{for any $\bx, \by \in X, a, b\in \bbR$.} \end{equation*}
A linear function is also called a linear map or a linear transformation.
A function \(A: X\mapsto Y\) is called affine if there is a linear function \(L: X\mapsto Y\) and a vector \(\by_0\in Y\) such that \(A(\bx)=\by_0+L(\bx)\text{.}\)
Note that \(\by_0=A(\mathbf 0)\text{,}\) so an equivalent condition for \(A\) to be affine is that \(\bx\mapsto A(\bx)-A(\mathbf 0)\) is linear. We often use a certain affine function \(A(\bx)\) to approximate another function and in such a context we call it a linear approximation.
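This is easy to experiment with in coordinates. The following is a minimal Python sketch (the map \(L\) and vector \(\by_0\) are hypothetical choices on \(\bbR^2\)) checking that \(\bx\mapsto A(\bx)-A(\mathbf 0)\) recovers the linear part of an affine map.

```python
# A minimal sketch (hypothetical L and y0 on R^2) checking that
# x -> A(x) - A(0) recovers the linear part of an affine map.

def L(x):  # an example linear map on R^2
    return (2 * x[0] + x[1], -x[0] + 3 * x[1])

y0 = (5.0, -1.0)

def A(x):  # affine: A(x) = y0 + L(x)
    Lx = L(x)
    return (y0[0] + Lx[0], y0[1] + Lx[1])

def B(x):  # B(x) = A(x) - A(0), which should equal L(x)
    A0, Ax = A((0.0, 0.0)), A(x)
    return (Ax[0] - A0[0], Ax[1] - A0[1])

# spot-check the defining identity B(a x + b y) = a B(x) + b B(y)
a, b = 2.0, -3.0
x, y = (1.0, 4.0), (-2.0, 0.5)
lhs = B((a * x[0] + b * y[0], a * x[1] + b * y[1]))
rhs = (a * B(x)[0] + b * B(y)[0], a * B(x)[1] + b * B(y)[1])
print(lhs, rhs)
```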
The most useful relation is that for any linear map \(T:X\mapsto Y\) between two finite dimensional vector spaces \(X\) and \(Y\text{,}\) once a basis \(\left\{{\mathbf x}_{1}, \ldots, {\mathbf x}_{n}\right\}\) of \(X\) and a basis \(\left\{{\mathbf y}_{1}, \ldots, {\mathbf y}_{m}\right\}\) of \(Y\) are chosen, \(T\) can be represented through a matrix multiplication as follows: There exist coefficients \(a_{ij}\) such that for each \(1\le j \le n\text{,}\)
\begin{equation} T( {\mathbf x}_{j})=\sum_{i=1}^{m}a_{ij} {\mathbf y}_{i},\tag{5.2.1} \end{equation}
Then any \({\mathbf x} \in X\) can be written as \({\mathbf x}= \sum_{j=1}^{n}c_{j} {\mathbf x}_{j}\) for some coefficients \(c_{j}\text{,}\) \(1\le j \le n\text{,}\)
thus
\begin{align*} T({\mathbf x}) \amp =\sum_{j=1}^{n}c_{j} T({\mathbf x}_{j}) \\ \amp= \sum_{j=1}^{n}c_{j} \sum_{i=1}^{m}a_{ij} {\mathbf y}_{i}\\ \amp= \sum_{i=1}^{m} d_{i} {\mathbf y}_{i} \end{align*}
where
\begin{equation} d_{i}= \sum_{j=1}^{n} a_{ij} c_{j}, 1\le i \le m \text{.}\tag{5.2.2} \end{equation}
In other words, the action of \(T\) in terms of the coordinates with respect to the two bases is represented through matrix multiplication by \((a_{ij} )\text{.}\)
Both (5.2.1) and (5.2.2) can be represented more cleanly using matrix notation: with
\begin{align*} {\mathbf x} \amp = \begin{bmatrix} {\mathbf x}_{1}\amp \ldots \amp {\mathbf x}_{n}\end{bmatrix} \begin{bmatrix} c_{1}\\ \vdots \\ c_{n}\end{bmatrix},\\ \begin{bmatrix} T({\mathbf x}_{1}) \amp \ldots\amp T({\mathbf x}_{n})\end{bmatrix} \amp = \begin{bmatrix} {\mathbf y}_{1} \amp \ldots \amp {\mathbf y}_{m}\end{bmatrix} \begin{bmatrix} a_{11} \amp a_{12}\amp \ldots \amp a_{1n} \\ a_{21}\amp a_{22}\amp \ldots \amp a_{2n} \\ \vdots \amp \vdots \amp \vdots \amp \vdots \\ a_{m1}\amp a_{m2}\amp \ldots \amp a_{mn}\end{bmatrix}, \end{align*}
we have
\begin{align*} T({\mathbf x}) \amp = \begin{bmatrix} T({\mathbf x}_{1})\amp \ldots\amp T({\mathbf x}_{n})\end{bmatrix} \begin{bmatrix} c_{1}\\ \vdots \\ c_{n}\end{bmatrix}\\ \amp =\begin{bmatrix} {\mathbf y}_{1}\amp \ldots\amp {\mathbf y}_{m}\end{bmatrix} \begin{bmatrix} a_{11} \amp a_{12}\amp \ldots \amp a_{1n} \\ a_{21}\amp a_{22} \amp \ldots \amp a_{2n} \\ \vdots \amp \vdots \amp \vdots \amp \vdots \\ a_{m1}\amp a_{m2}\amp \ldots \amp a_{mn}\end{bmatrix} \begin{bmatrix} c_{1}\\ \vdots \\ c_{n}\end{bmatrix}\\ \amp = \begin{bmatrix} {\mathbf y}_{1}\amp \ldots\amp {\mathbf y}_{m}\end{bmatrix} \begin{bmatrix} d_{1} \\ \vdots \\ d_{m} \end{bmatrix}, \end{align*}
where
\begin{align*} \begin{bmatrix} d_{1} \\ \vdots \\ d_{m} \end{bmatrix} \amp = \begin{bmatrix} a_{11} \amp a_{12}\amp \ldots \amp a_{1n} \\ a_{21}\amp a_{22} \amp \ldots \amp a_{2n} \\ \vdots \amp \vdots \amp \vdots \amp \vdots \\ a_{m1}\amp a_{m2}\amp \ldots \amp a_{mn}\end{bmatrix} \begin{bmatrix} c_{1}\\ \vdots \\ c_{n}\end{bmatrix}. \end{align*}
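The recipe above can be carried out numerically. The following Python sketch (with a hypothetical linear map \(T:\bbR^{2}\mapsto\bbR^{3}\) in the standard bases) builds the matrix \((a_{ij})\) from the columns \(T({\mathbf e}_{j})\) and verifies that the coordinate action of \(T\) is multiplication by this matrix, as in (5.2.2).

```python
# A numerical sketch: column j of the matrix holds the coordinates of T(e_j),
# and the coordinates of T(x) come from multiplying by (a_ij).

def T(x):
    return (2 * x[0] + x[1], x[0] - x[1], 3 * x[1])  # example linear map R^2 -> R^3

cols = [T((1, 0)), T((0, 1))]                           # T(e_1), T(e_2)
A = [[cols[j][i] for j in range(2)] for i in range(3)]  # A[i][j] = a_ij

c = [3, -2]                                             # coordinates of x
d = [sum(A[i][j] * c[j] for j in range(2)) for i in range(3)]  # d_i as in (5.2.2)
print(d)           # coordinates of T(x) via the matrix
print(T((3, -2)))  # the same vector computed directly
```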
There is a natural addition \(S+T\) of two linear maps \(S\) and \(T\) from a vector space \(X\) to a vector space \(Y\text{,}\) and a scalar multiplication \(c S\) of a linear map. This makes the set \(L(X, Y)\) of linear maps from \(X\) to \(Y\) a vector space. Furthermore, when \(X\) and \(Y\) are finite dimensional, after a basis of \(X\) and a basis of \(Y\) are chosen, if \(S\) is represented by matrix \(A\text{,}\) and \(T\) is represented by matrix \(B\text{,}\) then \(S+T\) is represented by matrix \(A+B\text{.}\)
Suppose \(S\) is a linear map from \(X\) to \(Y\text{,}\) and \(T\) is a linear map from \(Y\) to \(Z\text{,}\) then the natural composition map \(T\circ S: X\mapsto Z\) is also a linear map. When \(X\text{,}\) \(Y\) and \(Z\) are all finite dimensional, and a basis has been chosen in each vector space, with \(A\) representing \(S\) and \(B\) representing \(T\text{,}\) then the matrix representation for \(T\circ S\) is the matrix product \(BA\text{.}\) In fact, the matrix multiplication is defined precisely based on this natural property. We often omit the composition operator \(\circ\) between \(S\) and \(T\) and write \(T\circ S\) as \(TS\text{.}\)
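The correspondence between composition and matrix product can be checked directly. Below is a short Python sketch with hypothetical matrices: \(A\) (size \(3\times 2\)) represents \(S:\bbR^{2}\mapsto\bbR^{3}\) and \(B\) (size \(2\times 3\)) represents \(T:\bbR^{3}\mapsto\bbR^{2}\) in the standard bases, so \(T\circ S\) is represented by \(BA\).

```python
# A numerical sketch: applying the product matrix BA agrees with
# applying A and then B, i.e. with composing the two maps.

def matvec(M, c):
    return [sum(row[j] * c[j] for j in range(len(c))) for row in M]

def matmul(B, A):
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

A = [[1, 2], [0, 1], [3, 0]]   # represents S: R^2 -> R^3
B = [[1, 0, 1], [2, 1, 0]]     # represents T: R^3 -> R^2

BA = matmul(B, A)
v = [1, 1]
print(matvec(BA, v))            # (T∘S)(v) via the product matrix
print(matvec(B, matvec(A, v)))  # the same, composing the two maps
```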

Definition 5.2.2. Invertible Linear Map.

If \(T:X\mapsto Y\) is a linear map, and there exists a linear map \(S: Y\mapsto X\) such that \(S\circ T=I_{X}\) and \(T\circ S=I_{Y}\text{,}\) namely, \(S(T(\bx))=\bx\) for all \(\bx\in X\) and \(T(S(\by))=\by\) for all \(\by \in Y\text{,}\) we say that \(T\) is an invertible linear map.
When such an \(S\) exists, it is uniquely determined. It is called the inverse of \(T\) and denoted as \(T^{-1}\text{.}\)

Exercise 5.2.3. Composition of Linear Maps.

Define \(T(x, y)=(x, y, x+y)\) and \(S(x, y, z)=(x+y+z, x-y+z, x+z)\text{.}\)
  1. Determine \(S\circ T\text{.}\)
  2. Find the matrix representation for \(T\text{,}\) \(S\text{,}\) and \(S\circ T\) respectively in the respective standard bases.
  3. Are \(T\) or \(S\) invertible? Are they injective or surjective?

Exercise 5.2.4. Matrix Representation of the Derivative Operator.

Let \(\cP_{k}\) denote the span of \(\{1, \cos t, \sin t, \ldots, \cos (kt), \sin (kt)\}\) and let \(D:\cP_{k}\mapsto \cP_{k}\) be the derivative operator.
  1. Find the matrix representation of \(D\) and \(D\circ D\) in the given basis.
  2. Does \(D\) map the span of \(\{ \cos t, \sin t, \ldots, \cos (kt), \sin (kt)\}\) to itself? If so, determine whether this map is invertible.

Subsection 5.2.2 Operator Norm of a Linear Map

Definition 5.2.5. Operator Norm of a Linear Map.

If \(X\) and \(Y\) are normed vector spaces, and \(T:X\mapsto Y\) is a linear map, then the operator norm of \(T\) is defined as
\begin{equation*} ||T|| :=\sup\left\{ ||T({\mathbf x})||_{Y}: ||{\mathbf x}||_{X} = 1\right\}. \end{equation*}
Equivalently,
\begin{equation*} ||T|| =\sup\left\{ ||T({\mathbf x})||_{Y}/||{\mathbf x}||_{X}: {\mathbf x}\ne {\mathbf 0}\right\}. \end{equation*}
Sometimes we write \(||T||_{\mathrm{op}}\) for this norm to distinguish it from other norms on spaces of linear maps.
It follows that
\begin{equation*} ||T({\mathbf x})||_{Y} \le ||T|| ||{\mathbf x}||_{X} \text{ for any vector } {\mathbf x}, \end{equation*}
and \(||T||\) is the smallest number \(C\) such that
\begin{equation*} ||T({\mathbf x})||_{Y} \le C ||{\mathbf x}||_{X} \text{ for any vector } {\mathbf x}. \end{equation*}
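Since every unit vector \({\mathbf x}\) gives the lower bound \(||T({\mathbf x})||_{Y}\le ||T||\text{,}\) one can approximate the operator norm by sampling. The Python sketch below does this for a hypothetical \(2\times 2\) matrix map with the Euclidean norms.

```python
# A numerical sketch: sampling unit vectors gives lower bounds for ||T||
# that approach the supremum in the definition of the operator norm.
import math
import random

A = [[3.0, 1.0], [0.0, 2.0]]  # example matrix representing T on R^2

def apply(M, x):
    return [sum(M[i][j] * x[j] for j in range(2)) for i in range(2)]

def norm(x):
    return math.sqrt(sum(t * t for t in x))

random.seed(0)
best = 0.0
for _ in range(2000):
    theta = random.uniform(0.0, 2.0 * math.pi)
    u = [math.cos(theta), math.sin(theta)]   # a unit vector
    best = max(best, norm(apply(A, u)))      # ||T(u)|| with ||u|| = 1
print(best)  # approaches ||T||, the sqrt of the largest eigenvalue of A^t A
```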
In applications we often work with normed vector spaces and linear maps having a finite operator norm, and when such a linear map is invertible, we are interested in knowing whether its inverse also has a finite operator norm.

Remark 5.2.6.

\(||T||\) depends on the specific norms used on \(X, Y\text{.}\) For example,
\begin{equation*} I(u)(x) :=\int_{a}^{x}u(t)\, dt \end{equation*}
is a linear map from \(X=C[a, b]\) to \(Y=\{v\in C^{1}[a, b]: v(a)=0\}\text{,}\) where both \(X\) and \(Y\) are endowed with the \(C[a, b]\) norm: \(||u||_{C[a, b]} :=\max_{x\in [a, b]} |u(x)|\text{.}\) Then clearly \(||I(u)||_{C[a, b]} \le (b-a) ||u||_{C[a, b]}\text{.}\)
\(I:X\mapsto Y\) here is invertible, with \(I^{-1}(v)=v'\) for any \(v\in Y\text{.}\) However, \(I^{-1}\) does not have a finite operator norm with respect to the norms we have chosen here, for that would require the existence of some \(C'>0\) such that
\begin{equation*} ||I^{-1}(v)||_{C[a, b]}\le C' ||v||_{C[a, b]} \text{ for all $v\in Y$.} \end{equation*}
But \(v_{k}=\sin k(x-a)\) is a family in \(Y\text{,}\) with \(||v_{k}||_{C[a, b]}=1\text{,}\) yet \(||v_{k}'||_{C[a, b]}=|k|\text{,}\) so the above inequality can’t hold when \(|k|\to \infty\text{.}\)
If we endow \(Y\) with the \(C^{1}[a, b]\) norm:
\begin{equation*} ||v||_{C^{1}[a, b]} :=\max_{x\in [a, b]} |v(x)| + \max_{x\in [a, b]} |v'(x)|\text{,} \end{equation*}
then \(I\) still has a finite norm: \(||I(u)||_{C^{1}[a, b]}\le (b-a+1) ||u||_{C[a, b]}\) for all \(u\in X\text{,}\) and its inverse also has a finite norm:
\begin{equation*} ||I^{-1}(v)||_{C[a, b]} \le ||v||_{C^{1}[a, b]} \text{ for all $v\in Y$.} \end{equation*}
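The unboundedness of differentiation in the sup norm is easy to see numerically. The following Python sketch evaluates \(v_k(x)=\sin k(x-a)\) and its derivative on a grid over the sample interval \([a,b]=[0,1]\) (a hypothetical choice).

```python
# A numerical sketch: v_k(x) = sin(k(x - a)) stays bounded by 1 in the sup
# norm on [a, b] = [0, 1], while |v_k'| reaches k, so no bound
# ||v'|| <= C' ||v|| can hold uniformly.
import math

a, b = 0.0, 1.0
xs = [a + (b - a) * i / 10000 for i in range(10001)]

for k in (1, 10, 100):
    sup_v = max(abs(math.sin(k * (x - a))) for x in xs)        # sup norm of v_k
    sup_dv = max(abs(k * math.cos(k * (x - a))) for x in xs)   # sup norm of v_k'
    print(k, sup_v, sup_dv)  # the ratio sup_dv / sup_v grows without bound
```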

Remark 5.2.7.

As seen from the example above, when \(X\) is not finite dimensional it is possible that a linear map from \(X\) to \(Y\) may not have a finite norm. However, when \(X\) is finite dimensional, we will prove the following proposition.

Proposition 5.2.8.

If \(X\) is a finite dimensional normed vector space and \(Y\) is a normed vector space, then any linear map \(T:X\mapsto Y\) has a finite operator norm.

Proof.

Let \(\left\{{\mathbf x}_{1}, \ldots, {\mathbf x}_{n}\right\}\) be a basis of \(X\text{.}\) Then any \({\mathbf x}\in X\) has coordinates \((c_{1}, \ldots, c_{n})\) in this basis: \({\mathbf x}=\sum_{j=1}^{n}c_{j} {\mathbf x}_{j}\text{,}\) and
\begin{align*} ||T({\mathbf x})||=||\sum_{j=1}^{n}c_{j} T({\mathbf x}_{j})||\le \amp \sum_{j=1}^{n}|c_{j}| || T({\mathbf x}_{j})|| \\ \le \amp \left( \sum_{j=1}^{n}|| T({\mathbf x}_{j})||^{2}\right)^{1/2} \left( \sum_{j=1}^{n}|c_{j}|^{2}\right)^{1/2}. \end{align*}
At the end of this subsection, we will prove a Lemma which implies that there exists some constant \(C>0\) such that
\begin{equation*} \left( \sum_{j=1}^{n}|c_{j}|^{2}\right)^{1/2} \le C|| \sum_{j=1}^{n}c_{j} {\mathbf x}_{j}||=C||{\mathbf x}|| \text{ for all $\bx \in X$}. \end{equation*}
This shows that \(||T||\) is finite and \(||T||\le C \left( \sum_{j=1}^{n}|| T({\mathbf x}_{j})||^{2}\right)^{1/2}\text{.}\)
Unless indicated otherwise, we will restrict to the situation that \(X\) is a finite dimensional normed vector space.
Suppose \(S\) and \(T\) are linear maps from \(X\) to \(Y\text{.}\) Using the property of the operator norm, we see that
\begin{equation*} ||(T+S)({\mathbf x})||=|| T ({\mathbf x}) + S ({\mathbf x})|| \le ||T|| || {\mathbf x}||+||S|| ||{\mathbf x}|| =\left(||T||+||S||\right) ||{\mathbf x}|| \end{equation*}
for any vector \({\mathbf x}\text{,}\) so it follows that
\begin{equation*} ||T+S||\le ||T||+||S||. \end{equation*}
It is easier to see that \(||cT||=|c| ||T||\) for any scalar \(c\text{.}\) Thus the set \(L(X, Y)\) of linear maps from \(X\) to \(Y\) becomes a normed vector space.
Suppose \(S\) is a linear map from \(X\) to \(Y\text{,}\) and \(T\) is a linear map from \(Y\) to \(Z\text{.}\) Using the property of the operator norm, we see that
\begin{equation*} ||TS({\mathbf x})||\le ||T|| ||S({\mathbf x})||\le ||T|| ||S|| ||{\mathbf x}|| \text{ for any vector } {\mathbf x}, \end{equation*}
thus \(||TS||\le ||T|| ||S||\text{.}\)
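The submultiplicative inequality \(||TS||\le ||T||\, ||S||\) can be checked numerically. The Python sketch below uses random matrices and the Euclidean operator norm (the largest singular value, available via NumPy).

```python
# A numerical sketch checking ||TS|| <= ||T|| ||S|| for random matrices,
# with the Euclidean operator norm (ord=2, the largest singular value).
import numpy as np

rng = np.random.default_rng(0)
for _ in range(100):
    S = rng.standard_normal((3, 4))  # represents S: R^4 -> R^3
    T = rng.standard_normal((2, 3))  # represents T: R^3 -> R^2
    lhs = np.linalg.norm(T @ S, 2)   # operator norm of the composition
    rhs = np.linalg.norm(T, 2) * np.linalg.norm(S, 2)
    assert lhs <= rhs + 1e-9
print("ok")
```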

Remark 5.2.9.

The operator norm depends on the norms on \(X\) and \(Y\text{.}\) When \(X\) and \(Y\) are Cartesian vector spaces such as \(\mathbb R^{n}\text{,}\) the most commonly used norm is the Euclidean norm
\begin{equation*} ||(x_{1}, \ldots, x_{n})||=\sqrt{\sum_{i=1}^{n}|x_{i}|^{2}}. \end{equation*}
But other norms may also be used, such as
\begin{equation*} ||(x_{1}, \ldots, x_{n})||_{1} := \sum_{i=1}^{n}|x_{i}|, \end{equation*}
or
\begin{equation*} ||(x_{1}, \ldots, x_{n})||_{\infty} :=\max_{1\le i\le n}|x_{i}|. \end{equation*}
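For a concrete comparison of these three norms, here is a short Python sketch evaluating them on one sample vector.

```python
# A sketch computing the Euclidean, l1, and l-infinity norms of one vector.
import math

x = [3.0, -4.0, 1.0]
norm2 = math.sqrt(sum(t * t for t in x))   # Euclidean norm
norm1 = sum(abs(t) for t in x)             # ||x||_1
norm_inf = max(abs(t) for t in x)          # ||x||_infinity
print(norm2, norm1, norm_inf)
```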
It is usually not easy to get the precise value of the operator norm. Often one tries to give an estimate for this norm. When a linear map \(T\) is represented by a matrix \(A\text{,}\) it would make sense to estimate \(||T||\) in terms of the matrix \(A\text{;}\) but one needs to be aware that \(T\) has different matrix representations depending on the choice of bases used, and any estimate of \(||T||\) in terms of the matrix \(A\) has to take this freedom into account.
When \(X=\mathbb R^{n}\) and \(Y=\mathbb R^{m}\) are equipped with the standard Euclidean norm, and their standard bases are used, recall that
\begin{equation*} T(x_{1}, \ldots, x_{n})=A\begin{bmatrix} x_{1} \\ \vdots \\ x_{n} \end{bmatrix}, \end{equation*}
it follows that
\begin{equation*} ||T(x_{1}, \ldots, x_{n})||^{2}= \begin{bmatrix} x_{1} \amp \ldots \amp x_{n} \end{bmatrix}A^{\rm t}A \begin{bmatrix} x_{1} \\ \vdots \\ x_{n} \end{bmatrix}, \end{equation*}
so \(||T||\) in this setting is identified as the square root of the largest eigenvalue of \(A^{\rm t}A\text{.}\)
Using the Cauchy-Schwarz inequality, one can easily get an estimate of the form
\begin{equation*} ||T||\le \sqrt{\sum_{1\le i \le m, 1\le j \le n} |a_{ij}|^{2}}. \end{equation*}
The latter is the square mean norm of the matrix, commonly called the Frobenius norm and denoted \(\Vert A\Vert_{F}\text{.}\)
When \(m=n\) and \(A\) is an orthogonal matrix, this estimate would give \(||T||\le \sqrt{n}\) as \(\sum_{1\le j \le n} |a_{ij}|^{2}=1\) for each \(1\le i \le n\text{,}\) while the above characterization, or the defining property of an orthogonal map gives \(||T||=1\text{.}\)
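The gap in the orthogonal case can be seen numerically. The Python sketch below builds a (hypothetical) random orthogonal matrix via a QR factorization, then compares the operator norm, computed as the square root of the top eigenvalue of \(A^{\rm t}A\text{,}\) with the entrywise estimate.

```python
# A numerical sketch: for an orthogonal matrix Q the operator norm is 1,
# while the entrywise estimate sqrt(sum |a_ij|^2) only gives sqrt(n).
import numpy as np

n = 4
# build an orthogonal matrix from the QR factorization of a random matrix
Q, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((n, n)))

op = np.sqrt(np.linalg.eigvalsh(Q.T @ Q).max())  # sqrt of top eigenvalue of A^t A
entrywise = np.sqrt((Q ** 2).sum())              # the Cauchy-Schwarz estimate
print(op, entrywise)
```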

Exercise 5.2.10. Dependence of operator norm on vector space norm.

\(S(x_{1},\ldots, x_{n})=\sum_{i=1}^{n}a_{i}x_{i} \in \bbR\) is a linear function on \(\bbR^{n}\text{.}\)
  1. Determine the operator norm of \(S\) if \(\bbR^{n}\) is equipped with the norm \(||(x_{1}, \ldots, x_{n})||_{1}\text{.}\)
  2. Determine the operator norm of \(S\) if \(\bbR^{n}\) is equipped with the norm \(||(x_{1}, \ldots, x_{n})||_{\infty}\text{.}\)
  3. Determine the operator norm of \(S\) if \(\bbR^{n}\) is equipped with the norm \(||(x_{1}, \ldots, x_{n})||_{p}:= \left( \sum_{i=1}^{n}|x_{i}|^{p}\right)^{1/p}\) for some \(1 \lt p \lt \infty\text{.}\)

Remark 5.2.11.

When a linear map \(T\) from \(X\) to \(Y\) has a finite norm, it is a continuous map, as
\begin{equation*} ||T({\mathbf x}) -T({\mathbf y})||=||T({\mathbf x}- {\mathbf y})||\le ||T|| || {\mathbf x}- {\mathbf y}||. \end{equation*}
In fact, the converse is also true. For, if \(T: X\mapsto Y\) is linear and continuous at \(\mathbf 0\text{,}\) then for any \(\epsilon >0\text{,}\) there exists \(\delta>0\) such that \(||T({\mathbf x})||\le \epsilon\) whenever \(||{\mathbf x}||\le \delta\text{.}\) But for any \({\mathbf x}\ne {\mathbf 0}\text{,}\) \(\Vert \frac{\delta {\mathbf x}}{||{\mathbf x}||} \Vert = \delta\text{,}\) so we have
\begin{equation*} ||T(\frac{\delta {\mathbf x}}{||{\mathbf x}||})||\le \epsilon, \end{equation*}
from which we conclude that
\begin{equation*} ||T({\mathbf x})||\le \frac{\epsilon}{\delta} ||{\mathbf x}||. \end{equation*}
When \(X\) and \(Y\) are finite dimensional, most questions about a linear map from \(X\) to \(Y\) can be formulated as a question about its matrix representation and answered that way. For example, if \(X\) and \(Y\) have the same dimension, then a linear map \(T\) from \(X\) to \(Y\) is injective iff the null space of its matrix representation is trivial, from which one also knows that \(T\) is injective iff it is surjective.
However, when \(X\) and \(Y\) are not finite dimensional, we lose this matrix representation, and many of the conclusions or deductions in the finite dimensional setting no longer work. For example, if \(X=Y=l^{2}\text{,}\) and \(L\) and \(R\) are the left and right shift operators respectively, then \(L\) is surjective but not injective, while \(R\) is injective but not surjective.
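The behavior of the shifts is easy to model on finitely supported sequences. The Python sketch below mirrors the \(l^{2}\) situation: the composition \(L\circ R\) is the identity, while \(R\circ L\) erases the first entry.

```python
# A sketch of the shift operators on finitely supported sequences (lists):
# L drops the first entry, R prepends a zero.

def left_shift(x):   # L(x_1, x_2, x_3, ...) = (x_2, x_3, ...)
    return x[1:]

def right_shift(x):  # R(x_1, x_2, ...) = (0, x_1, x_2, ...)
    return [0] + x

x = [1, 2, 3]
print(left_shift(right_shift(x)))  # L∘R is the identity
print(right_shift(left_shift(x)))  # R∘L erases the first entry
```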
In the context of Fourier series, we may consider \(f\mapsto (a_{0}, a_{1}, b_{1}, \ldots)\) as a linear map \(F\) from either \(C[-\pi, \pi]\) or \({\mathcal R}[-\pi, \pi]\) to \(l^{2}\text{,}\) where \((a_{0}, a_{1}, b_{1}, \ldots)\) is the vector of Fourier coefficients of \(f\text{.}\) Then the uniqueness of the Fourier series (which is equivalent to the completeness of the sequence of standard trigonometric functions) implies that this transformation is injective. Bessel’s inequality implies that, as a linear map from \(C[-\pi, \pi]\) to \(l^{2}\text{,}\) it has a bounded norm, as
\begin{equation*} \frac{1}{2\pi} \int_{-\pi}^{\pi}|f(x)|^{2}\;dx \le (\max_{[-\pi, \pi]}|f(x)|)^{2} =||f||_{C[-\pi, \pi]}^{2} \text{ for all } f\in C[-\pi, \pi], \end{equation*}
and
\begin{align*} \Vert (a_{0}, a_{1}, b_{1}, \ldots) \Vert_{l^2} \amp \le \left(2|a_{0}|^{2}+ \sum_{n=1}^{\infty} \left[ |a_{n}|^{2}+|b_{n}|^{2}\right]\right)^{1/2}\\ \amp \le \frac{1}{\sqrt \pi} \left( \int_{-\pi}^{\pi}|f(x)|^{2}\;dx \right)^{1/2}\\ \amp \le \sqrt 2 ||f||_{C[-\pi, \pi]}. \end{align*}
But if we consider this linear map as a map from \({\mathcal R}[-\pi, \pi]\) to \(l^{2}\text{,}\) it does not have a bounded norm if we equip \({\mathcal R}[-\pi, \pi]\) with the norm \(\int_{-\pi}^{\pi}|f(x)|\, dx\text{,}\) as for any \(C>0\text{,}\) there exists \(f\in {\mathcal R}[-\pi, \pi]\) such that
\begin{equation*} \Vert (a_{0}, a_{1}, b_{1}, \ldots) \Vert_{l^2}\ge \left( \frac{1}{2\pi} \int_{-\pi}^{\pi}|f(x)|^{2}\;dx\right)^{1/2}\ge C \int_{-\pi}^{\pi}|f(x)|\;dx. \end{equation*}
A natural question in this context is whether \(F\) is surjective, considered either as a map from \(C[-\pi, \pi]\) or from \({\mathcal R}[-\pi, \pi]\) to \(l^{2}\text{.}\) The answer to this question turns out to be negative, and it is related to whether \(C[-\pi, \pi]\) or \({\mathcal R}[-\pi, \pi]\) is a complete normed space when equipped with the norm \(\left( \frac{1}{2\pi} \int_{-\pi}^{\pi}|f(x)|^{2}\;dx\right)^{1/2}\text{,}\) as the latter is also a well defined norm on either space.
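Bessel's inequality can also be checked numerically. The Python sketch below uses the test function \(f(x)=x\) (a hypothetical choice), with \(a_0\) taken as the average value of \(f\) as assumed above, and verifies that the coefficient sum stays below \(\frac{1}{\pi}\int_{-\pi}^{\pi}|f|^{2}\text{.}\)

```python
# A numerical sketch of Bessel's inequality for f(x) = x on [-pi, pi]:
# 2 a_0^2 + sum (a_n^2 + b_n^2) stays below (1/pi) * integral of f^2.
import math

M = 5000
h = 2 * math.pi / M
xs = [-math.pi + h * i for i in range(M + 1)]

def trap(vals):  # trapezoid rule on the grid xs
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

f = list(xs)  # samples of f(x) = x
a0 = trap(f) / (2 * math.pi)
total = 2 * a0 * a0
for n in range(1, 31):
    an = trap([f[i] * math.cos(n * xs[i]) for i in range(M + 1)]) / math.pi
    bn = trap([f[i] * math.sin(n * xs[i]) for i in range(M + 1)]) / math.pi
    total += an * an + bn * bn

rhs = trap([v * v for v in f]) / math.pi  # equals 2*pi^2/3 for this f
print(total, rhs)
```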

Remark 5.2.12.

The \(N\)th Fourier series partial sum \(s_{N}(f; x)\) of \(f\in C[-\pi, \pi]\) is a linear map from \(C[-\pi, \pi]\) to itself. Using the integral expression for \(s_{N}(f; x)\text{,}\) it is easy to see that its operator norm is \(\frac{1}{2\pi} \int_{-\pi}^{\pi} |D_{N}(t)|\,dt\text{,}\) which tends to \(\infty\) as \(N\to \infty\text{.}\)
Fixing any \(x\in [-\pi, \pi]\text{,}\) one may consider \(f\mapsto s_{N}(f; x)\) as a linear map from \(C[-\pi, \pi]\) to the normed vector space \(\mathbb R\text{,}\) and the norm of this transformation is also \(\frac{1}{2\pi} \int_{-\pi}^{\pi} |D_{N}(t)|\,dt\text{.}\) This fact plays a role in showing that, for any designated point, there are continuous functions whose Fourier series diverge at that point.
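The growth of these norms can be computed numerically. The Python sketch below integrates \(|D_{N}|\) with \(D_{N}(t)=\frac{\sin((N+\frac12)t)}{\sin(t/2)}\text{,}\) the standard closed form of the Dirichlet kernel.

```python
# A numerical sketch computing (1/2pi) * integral of |D_N| over [-pi, pi];
# these constants increase with N (roughly like log N).
import math

def dirichlet(N, t):
    if abs(math.sin(t / 2)) < 1e-12:   # limiting value at t = 0
        return 2 * N + 1
    return math.sin((N + 0.5) * t) / math.sin(t / 2)

def lebesgue_constant(N, M=20000):
    h = 2 * math.pi / M
    ts = [-math.pi + h * i for i in range(M + 1)]
    vals = [abs(dirichlet(N, t)) for t in ts]
    # trapezoid rule, divided by 2*pi
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1])) / (2 * math.pi)

consts = [lebesgue_constant(N) for N in (1, 5, 25, 125)]
print([round(c, 3) for c in consts])
```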

Subsection 5.2.3 Any Two Norms on a Finite Dimensional Vector Space Are Equivalent

Lemma 5.2.13.

Let \(X\) be an \(n\) dimensional vector space with a basis \(\left\{{\mathbf v}_{1}, \ldots, {\mathbf v}_{n}\right\}\text{,}\) and for \({\mathbf x}\in X\) write \({\mathbf x}=\sum_{k=1}^{n}{\mathbf x}(k){\mathbf v}_{k}\text{.}\) Then for any norm \(||\cdot||\) on \(X\text{,}\) there exist constants \(c_{1}, c_{2}>0\) such that
\begin{equation*} ||{\mathbf x}|| \le c_{2}\left(\sum_{k=1}^{n}|{\mathbf x}(k)|^{2}\right)^{1/2} \quad\text{and}\quad c_{1}\left(\sum_{k=1}^{n}|{\mathbf x}(k)|^{2}\right)^{1/2} \le ||{\mathbf x}|| \quad \text{for all } {\mathbf x}\in X. \end{equation*}
As a consequence, any two norms on a finite dimensional vector space are equivalent.

Proof.

The general statement of the Lemma follows from the inequality above, as \(\left(\sum_{k=1}^{n}|{\mathbf x}(k)|^{2}\right)^{1/2}\) is a concrete norm on \(X\text{,}\) and any two norms satisfy a similar inequality via their relations with this norm.
The first inequality follows from triangle inequality. For the second inequality, if it does not hold, then using homogeneity of norms, there would exist a sequence \({\mathbf x}_{m}\) such that
\begin{equation*} \left(\sum_{k=1}^{n}|{\mathbf x}_{m}(k)|^{2}\right)^{1/2}=1, \text{ but } ||{\mathbf x}_{m}||\to 0. \end{equation*}
The sequence \(({\mathbf x}_m(1), \ldots, {\mathbf x}_m(n))\) is a sequence in \(\mathbb R^{n}\) with unit Euclidean norm. By the Bolzano-Weierstrass Theorem, it has a convergent subsequence. Still call it \(({\mathbf x}_m(1), \ldots, {\mathbf x}_m(n))\) to simplify notation, and let \(({\mathbf x}_{\infty}(1), \ldots, {\mathbf x}_{\infty}(n))\) be such that \(({\mathbf x}_{m}(1), \ldots, {\mathbf x}_{m}(n))\to ({\mathbf x}_{\infty}(1), \ldots, {\mathbf x}_{\infty}(n))\) as \(m\to \infty\text{.}\) Then \(\left(\sum_{k=1}^{n}|{\mathbf x}_{\infty}(k)|^{2}\right)^{1/2}=1\text{.}\) Thus \({\mathbf x}_{\infty}:=\sum_{k=1}^{n}{\mathbf x}_{\infty}(k) {\mathbf v}_{k}\ne {\mathbf 0}\text{.}\)
But by the first inequality, which is already established,
\begin{align*} ||{\mathbf x}_{\infty} || \amp \le ||{\mathbf x}_{\infty}- {\mathbf x}_{m}|| + ||{\mathbf x}_{m} ||\\ \amp \le c_{2} \left(\sum_{k=1}^{n}|{\mathbf x}_{m}(k)-{\mathbf x}_{\infty}(k)|^{2}\right)^{1/2} + ||{\mathbf x}_{m} ||. \end{align*}
Now using
\begin{equation*} \lim_{m\to \infty} \left(\sum_{k=1}^{n}|{\mathbf x}_{m}(k)-{\mathbf x}_{\infty}(k)|^{2}\right)^{1/2}=0 \quad \text{ and } \lim_{m\to \infty} ||{\mathbf x}_{m} ||=0, \end{equation*}
it follows that \(||{\mathbf x}_{\infty} || =0\text{,}\) which contradicts the property of a norm. This concludes our proof of the Lemma.
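As a concrete instance of the equivalence, with \(X=\bbR^{n}\) and the standard basis, one has \(||{\mathbf x}||_{2} \le ||{\mathbf x}||_{1} \le \sqrt{n}\, ||{\mathbf x}||_{2}\text{,}\) so \(c_{1}=1\) and \(c_{2}=\sqrt{n}\) work for the \(l^{1}\) norm. The following Python sketch checks this on random vectors.

```python
# A numerical sketch of norm equivalence on R^5:
# ||x||_2 <= ||x||_1 <= sqrt(n) ||x||_2 on random vectors.
import math
import random

random.seed(0)
n = 5
for _ in range(1000):
    x = [random.uniform(-1.0, 1.0) for _ in range(n)]
    n1 = sum(abs(t) for t in x)                # the l1 norm
    n2 = math.sqrt(sum(t * t for t in x))      # the Euclidean norm
    assert n2 <= n1 + 1e-12
    assert n1 <= math.sqrt(n) * n2 + 1e-12
print("ok")
```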