Subsection 5.2.1 Definition of Linear Functions and Their Matrix Representation
Definition 5.2.1.
Let \(X, Y\) be two vector spaces over \(\bbR\text{.}\) A function \(L: X\mapsto Y\) is called linear if
\begin{equation*}
L(a \bx +b \by)=aL(\bx)+bL(\by)\quad \text{for any $\bx, \by \in X, a, b\in \bbR$.}
\end{equation*}
A linear function is also called a linear map or a linear transformation.
A function
\(A: X\mapsto Y\) is called
affine if there is a linear function
\(L: X\mapsto Y\) and a vector
\(\by_0\in Y\) such that
\(A(\bx)=\by_0+L(\bx)\text{.}\)
Note that
\(\by_0=A(\mathbf 0)\text{,}\) so an equivalent condition for
\(A\) to be affine is that
\(\bx\mapsto A(\bx)-A(\mathbf 0)\) is linear. We often use a certain affine function
\(A(\bx)\) to approximate another function and in such a context we call it a linear approximation.
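The equivalent condition can be illustrated numerically. The following Python sketch (using NumPy; the matrix \(M\) and offset \(\by_0\) are made up for illustration) checks that for an affine map \(A\), the map \(\bx\mapsto A(\bx)-A(\mathbf 0)\) satisfies the linearity identity.

```python
import numpy as np

# A hypothetical affine map A(x) = y0 + M x on R^2 (M and y0 made up).
M = np.array([[2.0, 1.0], [0.0, 3.0]])   # matrix of the linear part
y0 = np.array([1.0, -1.0])               # the constant offset, y0 = A(0)

def A(x):
    return y0 + M @ x

# The equivalent condition: x -> A(x) - A(0) should be linear.
def L(x):
    return A(x) - A(np.zeros(2))

x, y, a, b = np.array([1.0, 2.0]), np.array([-3.0, 0.5]), 2.0, -1.5
lhs = L(a * x + b * y)
rhs = a * L(x) + b * L(y)
print(np.allclose(lhs, rhs))  # True
```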
The most useful relation is that for any linear map \(T:X\mapsto Y\) between two finite dimensional vector spaces \(X\) and \(Y\text{,}\) once a basis \(\left\{{\mathbf x}_{1}, \ldots, {\mathbf x}_{n}\right\}\) of \(X\) and a basis \(\left\{{\mathbf y}_{1}, \ldots, {\mathbf y}_{m}\right\}\) of \(Y\) are chosen, \(T\) can be represented through matrix multiplication as follows: there exist coefficients \(a_{ij}\) such that for each \(1\le j \le n\text{,}\)
\begin{equation}
T( {\mathbf x}_{j})=\sum_{i=1}^{m}a_{ij} {\mathbf y}_{i},\tag{5.2.1}
\end{equation}
Then for any \({\mathbf x} \in X\text{,}\) there exist coefficients \(c_{j}\text{,}\) \(1\le j \le n\text{,}\) such that \({\mathbf x}= \sum_{j=1}^{n}c_{j} {\mathbf x}_{j}\text{,}\) thus
\begin{align*}
T({\mathbf x}) \amp =\sum_{j=1}^{n}c_{j} T({\mathbf x}_{j}) \\
\amp= \sum_{j=1}^{n}c_{j} \sum_{i=1}^{m}a_{ij} {\mathbf y}_{i}\\
\amp= \sum_{i=1}^{m} d_{i} {\mathbf y}_{i}
\end{align*}
where
\begin{equation}
d_{i}= \sum_{j=1}^{n} a_{ij} c_{j}, 1\le i \le m \text{.}\tag{5.2.2}
\end{equation}
In other words, the action of \(T\) in terms of the coordinates with respect to the two bases is represented through matrix multiplication by \((a_{ij} )\text{.}\)
Both
(5.2.1) and
(5.2.2) can be represented more cleanly using matrix notation: with
\begin{align*}
{\mathbf x} \amp = \begin{bmatrix} {\mathbf x}_{1}\amp \ldots \amp {\mathbf x}_{n}\end{bmatrix}
\begin{bmatrix} c_{1}\\ \vdots \\ c_{n}\end{bmatrix},\\
\begin{bmatrix} T({\mathbf x}_{1}) \amp \ldots\amp T({\mathbf x}_{n})\end{bmatrix} \amp
= \begin{bmatrix} {\mathbf y}_{1} \amp \ldots \amp {\mathbf y}_{m}\end{bmatrix}
\begin{bmatrix} a_{11} \amp a_{12}\amp \ldots \amp a_{1n} \\
a_{21}\amp a_{22}\amp \ldots \amp a_{2n} \\
\vdots \amp \vdots \amp \vdots \amp \vdots \\
a_{m1}\amp a_{m2}\amp \ldots \amp a_{mn}\end{bmatrix},
\end{align*}
we have
\begin{align*}
T({\mathbf x}) \amp = \begin{bmatrix} T({\mathbf x}_{1})\amp \ldots\amp T({\mathbf x}_{n})\end{bmatrix}
\begin{bmatrix} c_{1}\\ \vdots \\ c_{n}\end{bmatrix}\\
\amp =\begin{bmatrix} {\mathbf y}_{1}\amp \ldots\amp {\mathbf y}_{m}\end{bmatrix}
\begin{bmatrix} a_{11} \amp a_{12}\amp \ldots \amp a_{1n} \\
a_{21}\amp a_{22} \amp \ldots \amp a_{2n} \\
\vdots \amp \vdots \amp \vdots \amp \vdots \\
a_{m1}\amp a_{m2}\amp \ldots \amp a_{mn}\end{bmatrix}
\begin{bmatrix} c_{1}\\ \vdots \\ c_{n}\end{bmatrix}\\
\amp = \begin{bmatrix} {\mathbf y}_{1}\amp \ldots\amp {\mathbf y}_{m}\end{bmatrix}
\begin{bmatrix} d_{1} \\ \vdots \\ d_{m} \end{bmatrix},
\end{align*}
where
\begin{align*}
\begin{bmatrix} d_{1} \\ \vdots \\ d_{m} \end{bmatrix} \amp =
\begin{bmatrix} a_{11} \amp a_{12}\amp \ldots \amp a_{1n} \\
a_{21}\amp a_{22} \amp \ldots \amp a_{2n} \\
\vdots \amp \vdots \amp \vdots \amp \vdots \\
a_{m1}\amp a_{m2}\amp \ldots \amp a_{mn}\end{bmatrix} \begin{bmatrix} c_{1}\\ \vdots \\ c_{n}\end{bmatrix}.
\end{align*}
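The recipe above can be checked numerically. The following Python sketch (using NumPy; the map \(T\) and the two bases are made up for illustration) builds the matrix \((a_{ij})\) column by column as in (5.2.1) and verifies that the coordinates transform as in (5.2.2).

```python
import numpy as np

# A hypothetical linear map T: R^2 -> R^3, T(u, v) = (u + 2v, 3u, u - v).
def T(x):
    u, v = x
    return np.array([u + 2 * v, 3 * u, u - v])

# Bases: the columns of X_basis are x_1, x_2; the columns of Y_basis are y_1, y_2, y_3.
X_basis = np.array([[1.0, 1.0],
                    [0.0, 1.0]])
Y_basis = np.array([[1.0, 0.0, 0.0],
                    [1.0, 1.0, 0.0],
                    [1.0, 1.0, 1.0]]).T

# Column j of A holds the coordinates of T(x_j) in the y-basis, i.e. (5.2.1).
A = np.linalg.solve(Y_basis, np.column_stack([T(X_basis[:, j]) for j in range(2)]))

# For x with coordinates c in the x-basis, T(x) has coordinates d = A c, i.e. (5.2.2).
c = np.array([2.0, -1.0])
x = X_basis @ c
d = A @ c
print(np.allclose(T(x), Y_basis @ d))  # True
```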
There is a natural addition
\(S+T\) of two linear maps
\(S\) and
\(T\) from a vector space
\(X\) to a vector space
\(Y\text{,}\) and a scalar multiplication
\(c S\) of a linear map. This makes the set
\(L(X, Y)\) of linear maps from
\(X\) to
\(Y\) a vector space. Furthermore, when
\(X\) and
\(Y\) are finite dimensional, after a basis of
\(X\) and a basis of
\(Y\) are chosen, if
\(S\) is represented by matrix
\(A\text{,}\) and
\(T\) is represented by matrix
\(B\text{,}\) then
\(S+T\) is represented by matrix
\(A+B\text{.}\)
Suppose
\(S\) is a linear map from
\(X\) to
\(Y\text{,}\) and
\(T\) is a linear map from
\(Y\) to
\(Z\text{,}\) then the natural composition map
\(T\circ S: X\mapsto Z\) is also a linear map. When
\(X\text{,}\) \(Y\) and
\(Z\) are all finite dimensional, and a basis has been chosen in each vector space, with
\(A\) representing
\(S\) and
\(B\) representing
\(T\text{,}\) then the matrix representation for
\(T\circ S\) is the matrix product
\(BA\text{.}\) In fact, matrix multiplication is defined precisely so that this correspondence holds. We often omit the composition operator
\(\circ\) between
\(S\) and
\(T\) and write
\(T\circ S\) as
\(TS\text{.}\)
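In coordinates this correspondence is easy to test numerically. The sketch below (hypothetical matrices \(A\) and \(B\) representing \(S\) and \(T\) in the standard bases) confirms that applying \(S\) and then \(T\) agrees with multiplying by the product \(BA\).

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])                 # S: R^2 -> R^3
B = np.array([[1.0, 0.0, 2.0],
              [-1.0, 1.0, 0.0]])            # T: R^3 -> R^2

x = np.array([0.5, -2.0])
# (T o S)(x) computed two ways: apply the maps in turn, or multiply by BA once.
print(np.allclose(B @ (A @ x), (B @ A) @ x))  # True
```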
Definition 5.2.2. Invertible Linear Map.
If
\(T:X\mapsto Y\) is a linear map, and there exists a linear map
\(S: Y\mapsto X\) such that
\(S\circ T=I_{X}\) and
\(T\circ S=I_{Y}\text{,}\) namely,
\(S(T(\bx))=\bx\) for all
\(\bx\in X\) and
\(T(S(\by))=\by\) for all
\(\by \in Y\text{,}\) we say that
\(T\) is an invertible linear map.
When such an
\(S\) exists, it is uniquely determined. It is called the inverse of
\(T\) and denoted as
\(T^{-1}\text{.}\)
Exercise 5.2.3. Composition of Linear Maps.
Define \(T(x, y)=(x, y, x+y)\) and \(S(x, y, z)=(x+y+z, x-y+z, x+z)\text{.}\)
-
Determine \(S\circ T\text{.}\)
-
Find the matrix representation for \(T\text{,}\) \(S\text{,}\) and \(S\circ T\) respectively in the respective standard bases.
-
Are \(T\) or \(S\) invertible? Are they injective or surjective?
Exercise 5.2.4. Matrix Representation of the Derivative Operator.
Let \(\cP_{k}\) denote the span of \(\{1, \cos t, \sin t, \ldots, \cos (kt), \sin (kt)\}\) and let \(D:\cP_{k}\mapsto \cP_{k}\) be the derivative operator.
-
Find the matrix representation of \(D\) and \(D\circ D\) in the given basis.
-
Does \(D\) map the span of \(\{ \cos t, \sin t, \ldots, \cos (kt), \sin (kt)\}\) to itself? If so, determine whether this map is invertible.
Subsection 5.2.2 Operator Norm of a Linear Map
Definition 5.2.5. Operator Norm of a Linear Map.
If \(X\) and \(Y\) are normed vector spaces, and \(T:X\mapsto Y\) is a linear map, then the operator norm (also called the induced norm) of \(T\) is defined as
\begin{equation*}
||T|| :=\sup\left\{ ||T({\mathbf x})||_{Y}: ||{\mathbf x}||_{X} = 1\right\}.
\end{equation*}
Equivalently,
\begin{equation*}
||T|| :=\sup\left\{ ||T({\mathbf x})||_{Y}/||{\mathbf x}||_{X}: {\mathbf x}\ne {\mathbf 0}\right\}.
\end{equation*}
Sometimes we use the notation \(||T||_{\cF}\text{.}\)
It follows that
\begin{equation*}
||T({\mathbf x})||_{Y} \le ||T|| ||{\mathbf x}||_{X} \text{ for any vector } {\mathbf x},
\end{equation*}
and \(||T||\) is the smallest number \(C\) such that
\begin{equation*}
||T({\mathbf x})||_{Y} \le C ||{\mathbf x}||_{X} \text{ for any vector } {\mathbf x}.
\end{equation*}
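For a matrix acting between Euclidean spaces (one particular choice of norms; the definition allows any norms on \(X\) and \(Y\)), the operator norm is the largest singular value. The sketch below (a made-up \(2\times 2\) matrix) compares a brute-force estimate of the supremum over unit vectors with NumPy's built-in spectral norm.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

# Sample many unit vectors and take the largest value of ||A x||.
samples = rng.standard_normal((2, 10000))
samples /= np.linalg.norm(samples, axis=0)          # project onto the unit circle
estimate = np.linalg.norm(A @ samples, axis=0).max()

# For the Euclidean norm, ||A|| is the largest singular value.
exact = np.linalg.norm(A, 2)
print(estimate <= exact + 1e-9 and exact - estimate < 0.05)  # True
```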
In applications we often work with normed vector spaces and linear maps with a finite operator norm, and when a linear map with a finite operator norm is invertible, we are interested in knowing whether its inverse also has a finite operator norm.
Proposition 5.2.8.
Suppose that
\(X\) is a finite dimensional normed vector space, then for any normed vector space
\(Y\) and
\(T: X\mapsto Y\) a linear map from
\(X\) to
\(Y\text{,}\) its operator norm
\(||T||\) is finite.
Proof.
Let \(\left\{{\mathbf x}_{1}, \ldots, {\mathbf x}_{n}\right\}\) be a basis of \(X\text{.}\) Then any \({\mathbf x}\in X\) has coordinates \((c_{1}, \ldots, c_{n})\) in this basis: \({\mathbf x}=\sum_{j=1}^{n}c_{j} {\mathbf x}_{j}\text{,}\) and
\begin{align*}
||T({\mathbf x})||=||\sum_{j=1}^{n}c_{j} T({\mathbf x}_{j})||\le \amp
\sum_{j=1}^{n}|c_{j}| || T({\mathbf x}_{j})|| \\
\le \amp
\left( \sum_{j=1}^{n}|| T({\mathbf x}_{j})||^{2}\right)^{1/2} \left( \sum_{j=1}^{n}|c_{j}|^{2}\right)^{1/2}.
\end{align*}
At the end of this subsection, we will prove a Lemma which implies that there exists some constant \(C>0\) such that
\begin{equation*}
\left( \sum_{j=1}^{n}|c_{j}|^{2}\right)^{1/2} \le C|| \sum_{j=1}^{n}c_{j} {\mathbf x}_{j}||=C||{\mathbf x}||
\text{ for all $\bx \in X$}.
\end{equation*}
This shows that \(||T||\) is finite and \(||T||\le C \left( \sum_{j=1}^{n}|| T({\mathbf x}_{j})||^{2}\right)^{1/2}\text{.}\)
Unless indicated otherwise, we will restrict to the situation that
\(X\) is a finite dimensional normed vector space.
Suppose \(S\) and \(T\) are linear maps from \(X\) to \(Y\text{.}\) Using the property of the operator norm, we see that
\begin{equation*}
||(T+S)({\mathbf x})||=|| T ({\mathbf x}) + S ({\mathbf x})|| \le ||T|| || {\mathbf x}||+||S|| ||{\mathbf x}||
=\left(||T||+||S||\right) ||{\mathbf x}||
\end{equation*}
for any vector \({\mathbf x}\text{,}\) so it follows that
\begin{equation*}
||T+S||\le ||T||+||S||.
\end{equation*}
It is even easier to see that \(||cT||=|c|\, ||T||\) for any scalar \(c\text{.}\) Thus the set \(L(X, Y)\) of linear maps from \(X\) to \(Y\) becomes a normed vector space.
Suppose \(S\) is a linear map from \(X\) to \(Y\text{,}\) and \(T\) is a linear map from \(Y\) to \(Z\text{.}\) Using the property of the operator norm, we see that
\begin{equation*}
||TS({\mathbf x})||\le ||T|| ||S({\mathbf x})||\le ||T|| ||S|| ||{\mathbf x}|| \text{ for any vector } {\mathbf x},
\end{equation*}
thus \(||TS||\le ||T|| ||S||\text{.}\)
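The submultiplicative inequality \(||TS||\le ||T||\, ||S||\) can be spot-checked for matrices with the spectral norm; a minimal sketch with random matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.standard_normal((3, 2))   # S: R^2 -> R^3
T = rng.standard_normal((2, 3))   # T: R^3 -> R^2

# Operator (spectral) norms; T @ S represents the composition T o S.
lhs = np.linalg.norm(T @ S, 2)
rhs = np.linalg.norm(T, 2) * np.linalg.norm(S, 2)
print(lhs <= rhs + 1e-12)  # True
```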
Exercise 5.2.10. Dependence of operator norm on vector space norm.
\(S(x_{1},\ldots, x_{n})=\sum_{i=1}^{n}a_{i}x_{i} \in \bbR\) is a linear function on
\(\bbR^{n}\text{.}\)
-
Determine the operator norm of
\(S\) if
\(\bbR^{n}\) is equipped with the norm
\(||(x_{1}, \ldots, x_{n})||_{1}\text{.}\)
-
Determine the operator norm of
\(S\) if
\(\bbR^{n}\) is equipped with the norm
\(||(x_{1}, \ldots, x_{n})||_{\infty}\text{.}\)
-
Determine the operator norm of
\(S\) if
\(\bbR^{n}\) is equipped with the norm
\(||(x_{1}, \ldots, x_{n})||_{p}:= \left( \sum_{i=1}^{n}|x_{i}|^{p}\right)^{1/p}\) for some
\(1 \lt p \lt
\infty\text{.}\)
When
\(X\) and
\(Y\) are finite dimensional, most questions about a linear map from
\(X\) to
\(Y\) can be formulated as a question about its matrix representation and answered that way. For example, if
\(X\) and
\(Y\) have the same dimension, then a linear map
\(T\) from
\(X\) to
\(Y\) is injective iff the null space of its matrix representation is trivial, from which one also knows that
\(T\) is injective iff it is surjective.
However, when
\(X\) and
\(Y\) are not finite dimensional, we lose this matrix representation, and many of the conclusions or deductions in the finite dimensional setting do not work any more. For example, if
\(X=Y=l^{2}\text{,}\) and
\(L\) and
\(R\) are the left and right shift operators respectively, then
\(L\) is surjective but not injective, while
\(R\) is injective but not surjective.
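The shift operators can be illustrated on finite prefixes of sequences (a sketch only; genuine \(l^{2}\) sequences are infinite):

```python
# Left and right shifts, acting on finite prefixes of sequences.
def left_shift(seq):
    return list(seq[1:])        # L(x1, x2, x3, ...) = (x2, x3, ...)

def right_shift(seq):
    return [0] + list(seq)      # R(x1, x2, ...) = (0, x1, x2, ...)

# L is not injective: two different sequences can have the same image.
print(left_shift([1, 2, 3]) == left_shift([7, 2, 3]))  # True
# L o R is the identity, which reflects that L is surjective and R is injective;
# R is not surjective, since nothing maps onto a sequence with nonzero first entry.
print(left_shift(right_shift([1, 2, 3])) == [1, 2, 3])  # True
```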
In the context of Fourier series, we may consider \(f\mapsto (a_{0}, a_{1}, b_{1}, \ldots)\) as a linear map \(F\) from either \(C[-\pi, \pi]\) or \({\mathcal R}[-\pi, \pi]\) to \(l^{2}\text{,}\) where \((a_{0}, a_{1}, b_{1}, \ldots)\) is the vector of Fourier coefficients of \(f\text{.}\) Then the uniqueness of the Fourier series (which is equivalent to the completeness of the sequence of standard trigonometric functions) implies that this transformation is injective. Bessel's inequality implies that, as a linear map from \(C[-\pi, \pi]\) to \(l^{2}\text{,}\) it has a bounded norm, as
\begin{equation*}
\frac{1}{2\pi} \int_{-\pi}^{\pi}|f(x)|^{2}\;dx \le (\max_{[-\pi, \pi]}|f(x)|)^{2} =||f||_{C[-\pi, \pi]}^{2}
\text{ for all } f\in C[-\pi, \pi],
\end{equation*}
and
\begin{align*}
\Vert (a_{0}, a_{1}, b_{1}, \ldots) \Vert_{l^2}
\amp \le \left(2|a_{0}|^{2}+ \sum_{n=1}^{\infty} \left[ |a_{n}|^{2}+|b_{n}|^{2}\right]\right)^{1/2}\\
\amp \le \frac{1}{\sqrt \pi} \left( \int_{-\pi}^{\pi}|f(x)|^{2}\;dx \right)^{1/2}\\
\amp \le \sqrt 2 ||f||_{C[-\pi, \pi]}.
\end{align*}
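The chain of inequalities above can be spot-checked numerically. The sketch below (a made-up continuous function; coefficients computed by a midpoint rule, so only approximately) verifies \(\Vert (a_{0}, a_{1}, b_{1}, \ldots) \Vert_{l^2}\le \sqrt 2\, ||f||_{C[-\pi, \pi]}\) for a truncated coefficient vector.

```python
import numpy as np

f = lambda x: x * np.cos(x)          # a sample continuous function on [-pi, pi]

# Midpoint-rule quadrature on [-pi, pi].
n_pts = 20000
dx = 2 * np.pi / n_pts
xs = np.linspace(-np.pi, np.pi, n_pts, endpoint=False) + dx / 2
fx = f(xs)
integ = lambda g: np.sum(g) * dx

# Fourier coefficients in the convention a0 = (1/2pi) int f, an, bn = (1/pi) int f cos/sin.
coeffs = [integ(fx) / (2 * np.pi)]
for n in range(1, 50):
    coeffs.append(integ(fx * np.cos(n * xs)) / np.pi)   # a_n
    coeffs.append(integ(fx * np.sin(n * xs)) / np.pi)   # b_n

l2_norm = np.linalg.norm(coeffs)
bound = np.sqrt(2) * np.abs(fx).max()   # sqrt(2) * ||f||_{C[-pi, pi]} (sampled max)
print(l2_norm <= bound)  # True
```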
But if we consider this linear map as a map from \({\mathcal R}[-\pi, \pi]\) to \(l^{2}\text{,}\) it does not have a bounded norm if we equip \({\mathcal R}[-\pi, \pi]\) with the norm \(\int_{-\pi}^{\pi}|f(x)|\, dx\text{:}\) for any \(C>0\text{,}\) there exists \(f\in {\mathcal R}[-\pi, \pi]\) such that
\begin{equation*}
\Vert (a_{0}, a_{1}, b_{1}, \ldots) \Vert_{l^2}\ge \left( \frac{1}{2\pi} \int_{-\pi}^{\pi}|f(x)|^{2}\;dx\right)^{1/2}\ge C
\int_{-\pi}^{\pi}|f(x)|\;dx.
\end{equation*}
A natural question in this context is whether
\(F\) is surjective considered either as a map from
\(C[-\pi, \pi]\) or
\({\mathcal R}[-\pi, \pi]\) to
\(l^{2}\text{.}\) The answer to this question turns out to be negative, and it is related to whether
\(C[-\pi, \pi]\) or
\({\mathcal R}[-\pi, \pi]\) is a
complete normed space equipped with the norm
\(\left( \frac{1}{2\pi} \int_{-\pi}^{\pi}|f(x)|^{2}\;dx\right)^{1/2}\text{,}\) as the latter is also a well defined norm on either
\(C[-\pi, \pi]\) or
\({\mathcal R}[-\pi, \pi]\text{.}\)