Kosambi-Karhunen-Loève Theorem

Let $\mathcal{X}$ be a compact set and $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space. Consider a second-order process $\lbrace X(s) \in L^2(\Omega): s\in\mathcal{X} \rbrace$ over $(\Omega,\mathcal{F},\mathbb{P})$ with zero mean and continuous covariance function $k(\cdot,\cdot).$ Then $k(\cdot,\cdot)$ is a Mercer’s kernel, i.e. there exist eigenvalues $\lbrace \lambda_ l\rbrace_ {l\in\mathbb{N}^{*}}$ with $\lambda_1\geq\lambda_2\geq\cdots\geq 0$ and an orthonormal basis $\lbrace \phi_l\rbrace_{l\in\mathbb{N}^*}\subset L^2(\mathcal{X})$ such that

$$k(s,t) = \sum_{l=1}^\infty \lambda_l\phi_l(s)\phi_l(t),\ \forall s,t\in\mathcal{X}.$$

Theorem 1 (Kosambi-Karhunen-Loève). The process $X(\cdot)$ defined above admits the following expansion:

$$X(t) = \sum_{l=1}^\infty\sqrt{\lambda_l}\xi_l\phi_l(t),$$

where

$$\xi_l = \frac{1}{\sqrt{\lambda_l}}\int_{\mathcal{X}} X(t)\phi_l(t)\mathrm{d}t,$$

and the series converges in the $L^2$ sense, uniformly in $t\in\mathcal{X}.$ Furthermore, $\lbrace\xi_l\rbrace_{l\in\mathbb{N}^{*}}$ are uncorrelated, with

$$\mathbb{E}\xi_i=0,\ \mathrm{Cov}(\xi_i,\xi_{j}) = \delta_{ij},\ \forall i,j\in\mathbb{N}^*.$$

Proof. By definition,

$$\begin{aligned} \mathbb{E}\xi_i &= \frac{1}{\sqrt{\lambda_i}}\int_{\mathcal{X}} \mathbb{E}[X(t)]\phi_i(t)\mathrm{d}t = 0,\\ \mathrm{Cov}(\xi_i,\xi_{j}) = \mathbb{E}[\xi_i\xi_j] &= \frac{1}{\sqrt{\lambda_i}}\frac{1}{\sqrt{\lambda_j}}\int_\mathcal{X}\int_\mathcal{X}\mathbb{E}[X(s)X(t)]\phi_i(s)\phi_j(t)\mathrm{d}s\mathrm{d}t \\ &= \frac{1}{\sqrt{\lambda_i}}\frac{1}{\sqrt{\lambda_j}}\int_\mathcal{X}\left(\int_\mathcal{X}k(s,t)\phi_i(s)\mathrm{d}s\right)\phi_j(t)\mathrm{d}t\\ &= \frac{1}{\sqrt{\lambda_i}}\frac{1}{\sqrt{\lambda_j}}\int_\mathcal{X}\lambda_i\phi_i(t)\phi_j(t)\mathrm{d}t\\ & = \frac{\lambda_i}{\sqrt{\lambda_i\lambda_j}}\int_\mathcal{X}\phi_i(t)\phi_j(t)\mathrm{d}t\\ & = \frac{\lambda_i}{\sqrt{\lambda_i\lambda_j}}\delta_{ij} = \delta_{ij}. \end{aligned}$$

Also, we have

$$\begin{aligned} \mathbb{E}[X(t)\xi_l] &= \frac{1}{\sqrt{\lambda_l}}\mathbb{E}\left[\int_\mathcal{X} X(t)X(s)\phi_l(s)\mathrm{d}s\right]\\ &= \frac{1}{\sqrt{\lambda_l}}\int_\mathcal{X} k(t,s)\phi_l(s)\mathrm{d}s = \sqrt{\lambda_l}\phi_l(t). \end{aligned}$$

To show the convergence, we define $X_L(\cdot)$ as

$$X_L(t) = \sum_{l=1}^L\sqrt{\lambda_l}\xi_l\phi_l(t),$$

$$\begin{aligned} \mathbb{E}\left[\left(X(t) - X_L(t)\right)^2\right] &= \mathbb{E}\left[X(t)^2\right] - 2\mathbb{E}\left[X(t)X_L(t)\right] + \mathbb{E}\left[X_L(t)^2\right]\\ &= \mathbb{E}\left[X(t)^2\right] - 2\sum_{l=1}^L\sqrt{\lambda_l}\phi_l(t)\mathbb{E}\left[X(t)\xi_l\right] + \sum_{l=1}^L\lambda_l\phi_l(t)^2\mathbb{E}[\xi_l^2]\\ &= k(t,t) - \sum_{l=1}^L\lambda_l\phi_l(t)^2. \end{aligned}$$

Note that $k(t,t) = \mathbb{E}[X(t)^2] < \infty$ for every $t.$ Moreover, by Mercer's theorem the series $\sum_{l=1}^\infty\lambda_l\phi_l(t)^2$ converges to $k(t,t)$ uniformly on $\mathcal{X}$ (a consequence of Dini's theorem, since $k$ is continuous and the partial sums are continuous and nondecreasing in $L$). Hence

$$\lim_{L\to\infty} \left\lbrace k(t,t) - \sum_{l=1}^L\lambda_l\phi_l(t)^2\right\rbrace = 0 $$

uniformly in $t\in\mathcal{X}.$ Therefore $X_L(\cdot)\overset{L^2}{\to}X(\cdot)$ uniformly on $\mathcal{X},$ which concludes the proof.

Remark.

  • The Karhunen-Loève expansion gives the representation of a stochastic process as an infinite linear combination of orthogonal functions.
  • (Gaussian case) For a Gaussian process $X(\cdot)\sim\mathcal{GP}(0,k(\cdot,\cdot))$ with Mercer’s kernel $k(s,t)=\sum_{l\in\mathbb{N}^{*}}\lambda_l\phi_l(s)\phi_l(t),$ it can be represented as

    $$X(t) = \sum_{l=1}^\infty\sqrt{\lambda_l}Z_l\phi_l(t),\ \ Z_l\overset{\mathrm{i.i.d.}}{\sim} N(0,1).$$

  • If $X(\cdot)$ is finite dimensional, i.e. there exists some $N$ such that $\lambda_n = 0$ if $n>N,$ then the Karhunen-Loève expansion has a truncated version:

    $$X(t) = \sum_{l=1}^N\sqrt{\lambda_l}\xi_l\phi_l(t).$$

    Note that the proof above needs to be slightly modified, since the $\xi_n$ with $n > N$ are undefined (the factor $1/\sqrt{\lambda_n}$ makes no sense when $\lambda_n=0$). A numerical sketch of a truncated Gaussian expansion of this form follows.
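
To make the Gaussian-case and finite-dimensional remarks above concrete, here is a minimal numerical sketch (the spectrum $\lambda_l = 2^{-l},$ the orthonormal cosine functions $\phi_l(t)=\sqrt{2}\cos(l\pi t)$ on $[0,1],$ the truncation level, and the sample sizes are toy choices for illustration, not tied to any particular kernel): sample paths are generated as $\sum_{l=1}^N\sqrt{\lambda_l}Z_l\phi_l(t)$ with i.i.d. $Z_l\sim N(0,1),$ and the empirical covariance of many paths is compared with $k(s,t)=\sum_{l=1}^N\lambda_l\phi_l(s)\phi_l(t).$

```python
import numpy as np

def kl_gaussian_paths(t, n_terms=50, n_paths=20000, seed=0):
    """Sample paths of X(t) = sum_{l<=N} sqrt(lambda_l) * Z_l * phi_l(t) with
    toy eigenvalues lambda_l = 2^{-l} and orthonormal cosine functions
    phi_l(t) = sqrt(2) * cos(l * pi * t) on [0, 1] (illustrative choices only)."""
    rng = np.random.default_rng(seed)
    l = np.arange(1, n_terms + 1)
    lam = 2.0 ** (-l)                                      # toy eigenvalues
    phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(t, l))    # shape (len(t), n_terms)
    Z = rng.standard_normal((n_terms, n_paths))            # i.i.d. N(0, 1) weights
    return phi @ (np.sqrt(lam)[:, None] * Z), lam, phi

t = np.linspace(0.0, 1.0, 101)
X, lam, phi = kl_gaussian_paths(t)

# Empirical covariance of the sampled paths vs. k(s, u) = sum_l lambda_l phi_l(s) phi_l(u).
emp_cov = X @ X.T / X.shape[1]
model_cov = (phi * lam) @ phi.T
print(np.max(np.abs(emp_cov - model_cov)))  # small, up to Monte Carlo error
```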

Function Approximation with Orthogonal Bases

We consider the problem of function approximation. Let $\mathcal{X}$ be a compact set. For a space of (real-valued) functions defined on $\mathcal{X}$, we may seek a basis $\lbrace \psi_i\rbrace_{i\in\mathbb{N}^{*}}$ such that the first $n$ elements can be used to approximate the functions in this space. To evaluate the approximation error, we define the $\mu$-weighted $L^2$ norm:

$$\Vert f\Vert_{L^2_\mu} = \sqrt{\int_\mathcal{X} \vert f(x)\vert^2\mathrm{d}\mu}$$

where $\mu$ is a $\sigma$-finite measure on $\mathcal{X}.$ This norm has an associated (real) inner product

$$\langle f,g\rangle_{L^2_\mu} = \int_\mathcal{X} f(x)g(x)\mathrm{d}\mu.$$

Consider approximating functions in the $L^2_\mu$ space

$$L^2_\mu(\mathcal{X}) = \left\lbrace f:\mathcal{X}\to\mathbb{R}, \Vert f\Vert_{L^2_\mu} < \infty\right\rbrace.$$

A basis $\lbrace \psi_i\rbrace_{i\in\mathbb{N}^{*}}$ for $L^2_ \mu(\mathcal{X})$ is a collection of functions such that any $f\in L^2_ \mu(\mathcal{X})$ can be uniquely represented as

$$f=\sum_{l=1}^\infty c_l\psi_l$$

in the sense that the series of partial sums converges in norm:

$$\lim_{n\to\infty}\left\Vert f - \sum_{l=1}^n c_l\psi_l\right\Vert_{L^2_\mu} = 0.$$

Orthogonal Basis

Definition (Orthogonality). A basis $\lbrace \psi_i\rbrace_{i\in\mathbb{N}^{*}}$ for $L^2_ \mu(\mathcal{X})$ is called orthogonal if $\langle \psi_i,\psi_j\rangle_{L^2_\mu} = 0,\ \forall i\neq j.$

Properties. Suppose $\lbrace \psi_ i\rbrace_ {i\in\mathbb{N}^{*}}$ is an orthogonal basis over $L^2_ \mu(\mathcal{X}).$ For any $f\in L^2_ \mu(\mathcal{X}),$ it has the representation $f=\sum_ {n\in\mathbb{N}^{*}} c_ n\psi_ n.$

  • (Coefficient Matching). The coefficient $c_l$ can be easily found by taking the inner product with a basis function:

    $$\langle f,\psi_l\rangle_{L^2_\mu} = \sum_{n=1}^\infty c_n\langle \psi_l,\psi_n\rangle_{L^2_\mu} = c_l\Vert\psi_l\Vert_{L^2_\mu}^2\ \Longrightarrow\ c_l = \frac{\langle f,\psi_l\rangle_{L^2_\mu}}{\Vert\psi_l\Vert_{L^2_\mu}^2}.$$

  • (Best approximation). The approximation

    $$f_N=\sum_{l=1}^N c_l\psi_l$$

    is the best approximation to $f$ in the subspace $\mathcal{S}_N=\mathrm{span}\lbrace\psi_1,\cdots,\psi_N\rbrace$ in the sense of minimizing the norm of the error, i.e.

    $$f_N = \underset{g\in\mathcal{S}_N}{\mathrm{argmin}} \Vert f - g\Vert_{L^2_\mu}.$$

  • (Parseval’s theorem).

    $$\Vert f-f_N\Vert_{L^2_\mu} = \sqrt{\sum_{l=N+1}^\infty c_l^2\Vert \psi_l\Vert_{L_\mu^2}^2}.$$
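
The properties above are easy to check numerically. The following is a minimal sketch (the weighting measure is the Lebesgue measure on $[0,1],$ and the orthonormal cosine basis $\psi_0\equiv 1,\ \psi_l(x)=\sqrt{2}\cos(l\pi x)$ and the test function $f(x)=x$ are arbitrary illustrative choices): the coefficients are obtained by coefficient matching, and the truncation error computed directly agrees with the Parseval tail formula.

```python
import numpy as np

m = 20000
dx = 1.0 / m
x = (np.arange(m) + 0.5) * dx              # midpoint quadrature grid on [0, 1]

def inner(f, g):
    """Approximate L^2 inner product on [0, 1] with the Lebesgue measure."""
    return np.sum(f * g) * dx

def psi(l):
    """Orthonormal cosine basis on [0, 1]: psi_0 = 1, psi_l(x) = sqrt(2) cos(l*pi*x)."""
    return np.ones_like(x) if l == 0 else np.sqrt(2.0) * np.cos(l * np.pi * x)

f = x.copy()                                # illustrative test function f(x) = x
N = 10
c = np.array([inner(f, psi(l)) for l in range(N + 1)])   # coefficient matching

f_N = sum(c[l] * psi(l) for l in range(N + 1))           # best approximation in S_N
err_direct = np.sqrt(inner(f - f_N, f - f_N))            # ||f - f_N|| computed directly
err_parseval = np.sqrt(inner(f, f) - np.sum(c ** 2))     # Parseval tail (||psi_l|| = 1)
print(err_direct, err_parseval)                          # agree up to quadrature error
```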

Examples. In the following discussion, we consider function approximation in the $L^2$-space, i.e. the weighting measure $\mu$ is the Lebesgue measure.

  • (Fourier Series). Consider function approximation on the closed interval $[0,2\pi].$ An orthogonal basis is $\lbrace 1 \rbrace\cup\lbrace \cos(nx)\rbrace_{n\in\mathbb{N}^*}\cup\lbrace \sin(nx) \rbrace_{n\in\mathbb{N}^*}.$ Then we can expand a function $f\in L^2([0,2\pi])$ into its Fourier series:

    $$f(x) = A_0 + \sum_{n=1}^\infty A_n\cos(nx) + \sum_{n=1}^\infty B_n\sin(nx),$$

    where

    $$\begin{aligned} A_0 &= \frac{1}{2\pi}\int_0^{2\pi}f(x)\mathrm{d}x,\\ A_n &= \frac{1}{\pi}\int_0^{2\pi}f(x)\cos(nx)\mathrm{d}x,\ n\geq 1,\\ B_n &= \frac{1}{\pi}\int_0^{2\pi}f(x)\sin(nx)\mathrm{d}x,\ n\geq 1.\end{aligned}$$

  • (Legendre Polynomials). Consider function approximation on the closed interval $[-1,1].$ Beginning from $P_0(x)=1,\ P_1(x)=x,$ one can apply the Gram-Schmidt process to the monomials to find an orthogonal polynomial basis $\lbrace P_n\rbrace_{n\in\mathbb{N}}$ on $[-1,1].$ A general expression is given by Rodrigues' formula:

    $$P_n(x) = \frac{1}{2^n n!}\frac{\mathrm{d}^n}{\mathrm{d}x^n}(x^2 - 1)^n.$$
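
A quick numerical check of the Legendre example (the test function $f(x)=e^x$ and the truncation degree are arbitrary illustrative choices): Rodrigues' formula implies the orthogonality relation $\int_{-1}^1 P_a(x)P_b(x)\mathrm{d}x = \frac{2}{2b+1}\delta_{ab},$ so by coefficient matching the best-approximation coefficients are $c_n = \frac{2n+1}{2}\int_{-1}^1 f(x)P_n(x)\mathrm{d}x.$ The sketch below uses numpy.polynomial.legendre.

```python
import numpy as np
from numpy.polynomial import legendre

m = 20000
dx = 2.0 / m
x = -1.0 + (np.arange(m) + 0.5) * dx        # midpoint quadrature grid on [-1, 1]

def P(k):
    """Evaluate the Legendre polynomial P_k on the grid."""
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0                          # Legendre-series coefficients of P_k
    return legendre.legval(x, coeffs)

# Orthogonality implied by Rodrigues' formula: int_{-1}^{1} P_a P_b dx = 2/(2b+1) * delta_{ab}.
print(np.sum(P(2) * P(3)) * dx)              # ~ 0
print(np.sum(P(3) * P(3)) * dx, 2.0 / 7.0)   # ~ 2/7

# Best L^2 approximation of f(x) = exp(x) (illustrative) by polynomials of degree <= N.
f = np.exp(x)
N = 5
c = np.array([(2 * k + 1) / 2.0 * np.sum(f * P(k)) * dx for k in range(N + 1)])
f_N = legendre.legval(x, c)                  # f_N = sum_k c_k P_k
print(np.sqrt(np.sum((f - f_N) ** 2) * dx))  # L^2 error of the degree-N truncation
```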

Optimality of the Truncated Karhunen-Loève Expansion

Let $X(t),\, t\in\mathcal{X}$ be a zero-mean second-order stochastic process over $(\Omega,\mathcal{F},\mathbb{P}),$ whose covariance $k(\cdot,\cdot)$ is a Mercer’s kernel. Let $\lbrace\psi_l\rbrace_{l\in\mathbb{N}^*}$ be an orthonormal basis of $L^2(\mathcal{X}),$ i.e. $\int_\mathcal{X}\psi_i(t)\psi_j(t)\mathrm{d}t = \delta_{ij}.$ Then we can expand $X(t)$ as an infinite series:

$$X(t) = \sum_{l=1}^\infty \sqrt{\nu_l}\psi_l(t)\xi_l,$$

where

$$\xi_l = \frac{1}{\sqrt{\nu_l}} \int_\mathcal{X} X(t)\psi_l(t)\mathrm{d}t.$$

Then the truncation of $X(t)$ at order $p$ and the corresponding error term can be written as

$$\begin{aligned} X_p(t) &= \sum_{l=1}^p \sqrt{\nu_l}\psi_l(t)\xi_l,\\ e_p(t) &= X(t) - X_p(t) = \sum_{l=p+1}^\infty \sqrt{\nu_l}\psi_l(t)\xi_l = \sum_{l=p+1}^\infty\int_\mathcal{X} X(s)\psi_l(s)\psi_l(t)\mathrm{d}s. \end{aligned}$$

Furthermore, the integrated mean squared error (IMSE) is defined as

$$\begin{aligned} \mathcal{E}_p &= \int_\mathcal{X}\mathbb{E}\left[e_p(t)^2\right]\mathrm{d}t\\ &= \int_\mathcal{X}\mathbb{E}\left[\sum_{m=p+1}^\infty\sum_{n=p+1}^\infty\int_\mathcal{X}\int_\mathcal{X}X(s)X(u)\psi_m(s)\psi_m(t)\psi_n(u)\psi_n(t)\mathrm{d}u\mathrm{d}s\right]\mathrm{d}t\\ &= \sum_{m=p+1}^\infty\sum_{n=p+1}^\infty\left(\int_\mathcal{X}\psi_m(t)\psi_n(t)\mathrm{d}t\right)\left(\int_\mathcal{X}\int_\mathcal{X}k(s,u)\psi_m(s)\psi_n(u)\mathrm{d}u\mathrm{d}s\right)\\ &= \sum_{m=p+1}^\infty\left(\int_\mathcal{X}\int_\mathcal{X}k(s,u)\psi_m(s)\psi_m(u)\mathrm{d}u\mathrm{d}s\right). \end{aligned}$$

The following theorem states the optimality of the truncated Karhunen-Loève expansion in the sense of minimizing the IMSE.

Theorem 2 (Optimality of truncated Karhunen-Loève expansion). Among all truncated expansions of order $p$ expressed as a finite linear combination of orthonormal basis functions, the truncated Karhunen-Loève expansion minimizes the IMSE.

Proof. We fix the truncation order $p$. Recalling the expression of the IMSE $\mathcal{E}_p,$ consider the following optimization problem:

$$\min_{\lbrace \psi_l\rbrace_{l>p}} \sum_{l=p+1}^\infty\left(\int_\mathcal{X}\int_\mathcal{X}k(s,u)\psi_l(s)\psi_l(u)\mathrm{d}u\mathrm{d}s\right)\ \ \mathrm{s.t.}\ \int_\mathcal{X}\psi_l(t)^2\mathrm{d}t = 1,\ l\geq p+1.$$

Construct the Lagrangian function

$$\mathcal{L} = \sum_{l=p+1}^\infty\left\lbrace\left(\int_\mathcal{X}\int_\mathcal{X}k(s,u)\psi_l(s)\psi_l(u)\mathrm{d}u\mathrm{d}s\right) + \zeta_l\left(1 - \int_\mathcal{X}\psi_l(t)^2\mathrm{d}t\right)\right\rbrace.$$

Differentiating $\mathcal{L}$ with respect to $\psi_l(\cdot),\ l\geq p+1,$ we obtain the functional derivative

$$\frac{\delta\mathcal{L}}{\delta\psi_l(t)} = 2\left(\int_\mathcal{X}k(s,t)\psi_l(s)\mathrm{d}s - \zeta_l\psi_l(t)\right).$$

Setting this derivative to zero yields a Fredholm integral equation:

$$ \zeta_l\psi_l(t) = \int_\mathcal{X}k(s,t)\psi_l(s)\mathrm{d}s.$$

Hence the optimal $\psi_l$ are eigenfunctions of the integral operator $(T_k f)(t) = \int_\mathcal{X}k(t,s)f(s)\mathrm{d}s,$ with the Lagrange multipliers $\zeta_l$ as the corresponding eigenvalues, which yields exactly the Karhunen-Loève expansion.
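
Here is a minimal numerical check of Theorem 2, under stated assumptions: the kernel $k(s,u)=\exp\lbrace-(s-u)^2/0.1\rbrace$ on $[0,1]$ is an arbitrary illustrative choice, the (approximate) Karhunen-Loève basis is obtained by eigendecomposition of the discretized kernel (a simple Nyström-type approximation), and the competitor is the orthonormal sine basis $\sqrt{2}\sin(l\pi t).$ Since $\int_\mathcal{X}k(t,t)\mathrm{d}t = \sum_{l=1}^\infty\int_\mathcal{X}\int_\mathcal{X}k(s,u)\psi_l(s)\psi_l(u)\mathrm{d}u\mathrm{d}s$ for any complete orthonormal basis, the IMSE can be evaluated as $\mathcal{E}_p = \int_\mathcal{X}k(t,t)\mathrm{d}t - \sum_{l=1}^p\int_\mathcal{X}\int_\mathcal{X}k(s,u)\psi_l(s)\psi_l(u)\mathrm{d}u\mathrm{d}s.$

```python
import numpy as np

# Discretize [0, 1]; with grid spacing h, integrals become weighted sums.
n = 500
h = 1.0 / n
t = (np.arange(n) + 0.5) * h

# Illustrative squared-exponential kernel (any continuous covariance would do).
K = np.exp(-(t[:, None] - t[None, :]) ** 2 / 0.1)

# Approximate Karhunen-Loeve basis: eigenvectors of the discretized operator K * h.
eigval, eigvec = np.linalg.eigh(K * h)
kl_basis = eigvec[:, np.argsort(eigval)[::-1]] / np.sqrt(h)   # ~ orthonormal in L^2([0,1])

# Competitor: orthonormal sine basis, which is not the eigenbasis of this kernel.
sine_basis = np.sqrt(2.0) * np.sin(np.pi * np.outer(t, np.arange(1, 51)))

def imse(basis, p):
    """Discretized IMSE of the order-p truncation:
    int k(t,t) dt - sum_{l<=p} int int k(s,u) psi_l(s) psi_l(u) du ds."""
    total = np.trace(K) * h
    captured = sum(basis[:, l] @ K @ basis[:, l] * h * h for l in range(p))
    return total - captured

for p in (1, 2, 5, 10):
    print(p, imse(kl_basis, p), imse(sine_basis, p))   # KL value is never larger
```

Up to discretization error, the printed IMSE of the (approximate) Karhunen-Loève basis is never larger than that of the sine basis for any truncation order $p,$ in line with Theorem 2.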

Example: Brownian Motion

Recall that the standard Brownian motion $W(\cdot)$ is a Gaussian process with mean zero and covariance $k(s,t) = \min(s,t).$ For simplicity, the domain of this process is assumed to be $[0,1],$ i.e. $\mathcal{X}=[0,1].$ The integral equation is

\[\int_0^1 \min(s,t)\phi(s)\mathrm{d}s = \lambda\phi(t),\ \lambda > 0.\tag{1}\]

Equivalently,

\[\int_0^t s\phi(s)\mathrm{d}s + t\int_t^1 \phi(s)\mathrm{d}s= \lambda\phi(t),\ \lambda > 0.\tag{2}\]

Differentiating both sides with respect to $t$ twice, we get

$$\begin{align} \int_t^1 \phi(s)\mathrm{d}s &= \lambda\frac{\mathrm{d}}{\mathrm{d}t}\phi(t),\tag{3} \\ -\phi(t) &= \lambda\frac{\mathrm{d}^2}{\mathrm{d}t^2}\phi(t). \tag{4} \end{align}$$

Solving the second-order differential equation (4), we get

\[\phi(t) = A\cos\left(\frac{t}{\sqrt{\lambda}}\right) + B\sin\left(\frac{t}{\sqrt{\lambda}}\right). \tag{5}\]

Taking $t=0$ in the integral equation (1), we have $\phi(0)=0,$ which implies $A=0,$ i.e. $\phi(t) = B\sin\left(\frac{t}{\sqrt{\lambda}}\right).$ Plugging this into (3), we obtain

\[B\cos\left(\frac{t}{\sqrt{\lambda}}\right) - B\cos\left(\frac{1}{\sqrt{\lambda}}\right) = B\cos\left(\frac{t}{\sqrt{\lambda}}\right),\]

hence $\cos\left(\frac{1}{\sqrt{\lambda}}\right)=0,$ i.e. $\frac{1}{\sqrt{\lambda}} = \left(l - \frac{1}{2}\right)\pi$ for some $l\in\mathbb{N}^*.$ The eigenvalues and eigenfunctions are

\[\lambda_l = \frac{1}{\left(l - \frac{1}{2}\right)^2\pi^2},\ \phi_l(t) = B\sin\bigl[(l - \frac{1}{2})\pi t\bigr],\ l=1,2,\cdots.\]

Moreover, the orthonormality condition implies

\[1 = \int_0^1\phi_l(t)^2\mathrm{d}t = \frac{1}{2}B^2 - \frac{1}{4}\sqrt{\lambda_l}B^2\sin\left((2l-1)\pi\right) = \frac{1}{2}B^2,\]

hence $B=\pm\sqrt{2}.$ (The sign does not matter since the standard Gaussian distribution is symmetric.)

Therefore, the standard Brownian motion $W(t),\ 0\leq t\leq 1$ can be represented as its Karhunen-Loève expansion:

$$W(t) = \sqrt{2}\sum_{l=1}^\infty \frac{2}{(2l-1)\pi}\sin\bigl[(l - \frac{1}{2})\pi t\bigr]Z_l,$$

where $Z_l\overset{\mathrm{i.i.d.}}{\sim} N(0,1).$
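
Finally, here is a minimal sketch that simulates $W(\cdot)$ from the truncated Karhunen-Loève expansion above and compares the empirical covariance with $k(s,t)=\min(s,t)$ (the number of terms and sample paths are arbitrary illustrative choices):

```python
import numpy as np

def kl_brownian_paths(t, n_terms=200, n_paths=20000, seed=0):
    """Approximate standard Brownian motion on [0, 1] via the truncated
    Karhunen-Loeve expansion W(t) ~ sum_l sqrt(2) * 2/((2l-1)pi) * sin((l-1/2)pi t) * Z_l."""
    rng = np.random.default_rng(seed)
    l = np.arange(1, n_terms + 1)
    coef = np.sqrt(2.0) * 2.0 / ((2 * l - 1) * np.pi)   # sqrt(2) * sqrt(lambda_l)
    phi = np.sin(np.pi * np.outer(t, l - 0.5))          # sin((l - 1/2) pi t)
    Z = rng.standard_normal((n_terms, n_paths))         # i.i.d. N(0, 1)
    return phi @ (coef[:, None] * Z)                    # shape (len(t), n_paths)

t = np.linspace(0.0, 1.0, 101)
W = kl_brownian_paths(t)

# Empirical covariance of the simulated paths vs. the exact k(s, u) = min(s, u).
emp_cov = W @ W.T / W.shape[1]
exact_cov = np.minimum(t[:, None], t[None, :])
print(np.max(np.abs(emp_cov - exact_cov)))  # small, limited by truncation and Monte Carlo error
```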

