Spatial Statistics (V): Reproducing Kernel Hilbert Space

Definition and Properties

Definition (Reproducing kernel Hilbert space). Let $\mathcal{X}$ be a non-empty set and $\mathcal{H}$ be a Hilbert space of real-valued functions defined on $\mathcal{X}$ and equipped with inner product $\langle\cdot,\cdot\rangle_\mathcal{H}$. A symmetric and positive definite function $k:\mathcal{X}\times\mathcal{X}\to\mathbb{R}$ is called a reproducing kernel, and $\mathcal{H}$ is a reproducing kernel Hilbert space, if $k$ satisfies the following two conditions:

$\forall x\in\mathcal{X},\ k(\cdot,x)\in\mathcal{H},$
(Reproducing property) $\forall x\in\mathcal{X},\ \forall f\in\mathcal{H},\ f(x)=\left\langle f,k(\cdot,x) \right\rangle_\mathcal{H}.$

By definition, for any $x,y\in\mathcal{X}$, it holds $k(x,y)=\langle k(\cdot, x),k(\cdot,y)\rangle_\mathcal{H}$.

Property. Define $\delta_x$ to be the evaluation functional at $x$, i.e.,

$$f(x) = \delta_x[f] = \int_\mathcal{X} \delta_x(t) f(t)\mathrm{d}t,$$

then $\forall x\in\mathcal{X}$, $\delta_x$ is a bounded operator on $\mathcal{H}$, i.e., there exists a corresponding $\lambda_x\geq 0$ such that

$$\vert f(x)\vert = \left\vert\delta_x[f]\right\vert \leq \lambda_x\Vert f\Vert_{\mathcal{H}}.$$

Proof. Given a Hilbert space $\mathcal{H}$ with reproducing kernel $k(\cdot,\cdot)$, it holds

$$\left\vert \delta_x[f]\right\vert = \vert f(x)\vert = \left\vert\left\langle f,k(\cdot,x) \right\rangle_\mathcal{H}\right\vert \leq \left\Vert k(\cdot, x)\right\Vert_\mathcal{H}\left\Vert f \right\Vert_\mathcal{H} = \sqrt{k(x,x)}\cdot\left\Vert f \right\Vert_\mathcal{H}.$$

Hence $\delta_x: \mathcal{H}\to\mathbb{R}$ is a bounded operator with $\lambda_x = \sqrt{k(x,x)}$.

Remark. An equivalent definition of RKHS is stated as follows: $\mathcal{H}$ is a RKHS if the evaluation functional $\delta_x$ is bounded for all $x\in\mathcal{X}$. We have proven that the first definition implies the second. The proof of the other direction uses the Riesz representation theorem.

Construction of RKHS

From Pre-Hilbert Space to its Completion: Moore-Aronszajn Theorem

Theorem (Moore-Aronszajn). RKHS and positive definite kernel are one-to-one correspondent, i.e., for each positive definite kernel $k(\cdot,\cdot)$, there exists a unique RKHS with $k(\cdot,\cdot)$ as its reproducing kernel.

Proof. Let $\mathcal{X}$ be a non-empty set and $k(\cdot,\cdot)$ be a positive definite kernel on $\mathcal{X}\times\mathcal{X}$. Define

\[\mathcal{H}_ 0 = \mathrm{span} \left\lbrace k(\cdot, x): x\in\mathcal{X}\right\rbrace = \left\lbrace\sum_{i=1}^n c_i k(\cdot,x_i): n\in\mathbb{N},\ c_1,\cdots,c_n\in\mathbb{R},\ x_1,\cdots,x_n\in\mathcal{X}\right\rbrace,\]

then $\mathcal{H}_0$ is a pre-Hilbert space with inner product

$$\left\langle f,g \right\rangle_{\mathcal{H}_0} = \sum_{i=1}^m\sum_{j=1}^n a_i b_j k(x_i, y_j),$$

where $f,g\in\mathcal{H}_0$ have representations

$$f=\sum_{i=1}^m a_i k(\cdot, x_i),\ g=\sum_{j=1}^n b_j k(\cdot, y_j),\ x_1,\cdots,x_m,y_1,\cdots,y_n\in\mathcal{X}.$$

Let $\mathcal{H}$ be the completion of $\mathcal{H}_0$ under $\Vert\cdot\Vert _{\mathcal{H}_0} = \langle\cdot,\cdot\rangle _{\mathcal{H}_0}$, i.e., $\mathcal{H}$ consists of functions $f\in\mathbb{R}^\mathcal{X}$ of the form

$$f=\sum_{i=1}^\infty {c_i}k(\cdot,x_i),\ \text{with}\ \lim_{n\to\infty}\sup_{m\in\mathbb{N}^*}\left\Vert\sum_{i=n+1}^{n+m} c_ik(\cdot, x_i)\right\Vert _{\mathcal{H}_0} = 0.$$

We claim that $\mathcal{H}$ is a RKHS with $k(\cdot,\cdot)$ as its reproducing kernel. To show this it suffices to check the reproducing property: $\forall x \in\mathcal{X}$,

$$\left\langle f, k(\cdot, x)\right\rangle_{\mathcal{H}} = \left\langle \sum_{i=1}^\infty {c_i}k(\cdot,x_i), k(\cdot, x)\right\rangle_{\mathcal{H}} = \sum_{i=1}^\infty {c_i}\left\langle k(\cdot,x_i), k(\cdot, x)\right\rangle_{\mathcal{H}_ 0} = \sum_{i=1}^\infty c_i k(x,x_i) = f(x).$$

Now we prove that $\mathcal{H}$ is unique. Suppose that $\tilde{\mathcal{H}}$ is another RKHS with $k(\cdot,\cdot)$ as its reproducing kernel. Recall the properties of RKHS, $\mathcal{H}_ 0 \subset\tilde{\mathcal{H}}$, and $\langle\cdot,\cdot\rangle_{\mathcal{H}} = \langle\cdot,\cdot\rangle_{\tilde{\mathcal{H}}}$ on $\mathcal{H}_0$. Since $\tilde{\mathcal{H}}$ is complete, it contains the closure of $\mathcal{H}_0$, i.e. $\mathcal{H}\subseteq\tilde{\mathcal{H}}$. it remains to prove that $\mathcal{H}\supseteq\tilde{\mathcal{H}}$.

Let $\tilde{\mathcal{H}} = \mathcal{H}\oplus\mathcal{H}^\perp$, $\forall f\in\tilde{\mathcal{H}}$, it can be decomposed as $f=f^* + f^\perp$, where $f^*\in\mathcal{H},f^\perp\in\mathcal{H}^\perp$, then $\forall x\in\mathcal{X}$,

$$f(x) = \langle f, k(\cdot, x)\rangle_ {\tilde{\mathcal{H}}} = \langle f^*, k(\cdot,x)\rangle_ {\tilde{\mathcal{H}}} + \langle f^\perp,k(\cdot,x)\rangle _{\tilde{\mathcal{H}}} = \langle f^*, k(\cdot,x)\rangle_{\tilde{\mathcal{H}}} = \langle f^*, k(\cdot,x)\rangle_{\mathcal{H}} = f^*(x),$$

which implies $f^\perp = 0$ on $\mathcal{X}$. Hence $f=f^*\in\mathcal{H}$, which concludes the proof.

Eigenanalysis: Mercer’s Theorem

Regular conditions. In the following discussion, we fix the domain $\mathcal{X}$ as a compact set, and we make the following five assumptions of our kernel function $k:\mathcal{X}\times\mathcal{X}\to\mathbb{R}$.

Continuous: $k$ is continuous on $\mathcal{X}\times\mathcal{X}$;
Symmetric: $\forall x, y\in\mathcal{X},\ k(x,y) = k(y,x)$;
Bounded: $\sup_{x\in\mathcal{X}}k(x,x) < \infty$;
Square integrable: $\int_\mathcal{X}\int_\mathcal{X}k(x,y)^2\mathrm{d}x\mathrm{d}y < \infty$;
Positive-definite: $\forall f\in L_2(\mathcal{X}),\ \int_\mathcal{X}\int_\mathcal{X} f(x)k(x,y)f(y)\mathrm{d}x\mathrm{d}y \geq 0.$

Definition (Integral operator). Fix a kernel $k:\mathcal{X}\times\mathcal{X}\to\mathbb{R}$ on a compact set $\mathcal{X}\subset\mathbb{R}^d$, the operator $T_k:L_2(\mathcal{X})\to\mathcal{X}$ is defined as

\[T_k[f] (\cdot) := \int_\mathcal{X} k(\cdot, x)f(x)\mathrm{d}x,\ \ f\in L_2(\mathcal{X}).\]

$T_k$ is said to be positive definite if $\forall f\in L_2(\mathcal{X})$, it holds $\left\langle f, T_k[f]\right\rangle_{L_2(\mathcal{X})} \geq 0$, that is,

$$\int_\mathcal{X}\int_\mathcal{X} f(x)k(x,y)f(y)\mathrm{d}x\mathrm{d}y \geq 0.$$

Properties. The integral operator given by the definition above have following properties:

(Linear) $T_k[\alpha f + \beta g] = \alpha T_k[f] + \beta T_k[g],\ \alpha,\beta\in\mathbb{R},\ f,g\in L_2(\mathcal{X});$
(Bounded) By Cauchy-Schwarz inequality,
$$\left\vert T_k[f](x)\right\vert^2 \leq \left(\int_\mathcal{X}k(x,y)^2\mathrm{d}y\right)\left(\int_\mathcal{X}f(y)^2\mathrm{d}y\right),$$

furthermore,

$$\left\Vert T_k[f]\right\Vert_{L_2(\mathcal{X})} \leq C_k\Vert f\Vert^2_{L_2(\mathcal{X})},\ C_k = \left(\int_\mathcal{X}\int_\mathcal{X}k(x,y)^2\mathrm{d}x\mathrm{d}y\right)^{1/2};$$
(Symmetric/Self-adjoint)
$$\left\langle T_k[f], g \right\rangle_{L_2(\mathcal{X})} = \left\langle f, T_k[g]\right\rangle_{L_2(\mathcal{X})},\ f,g\in L_2(\mathcal{X}).$$
(Eigendecomposition) In functional analysis it is shown that $T_k$ is not just bounded but compact. Spectral theorem gives the eigendecomposition of compact and self-adjoint operator $T_k$:
- $T_k$ has at most countably many eigenvalues $\lambda_1\geq\lambda_2\geq\cdots\geq 0$ such that $\lim_{n\to\infty}\lambda_n = 0;$
- The corresponding eigenfunctions $\lbrace \phi_n\rbrace$, satisfying $T_k[\phi_n] = \lambda_n\phi_n$, form an orthogonal basis in $L_2(\mathcal{X}),$ i.e. $\int_\mathcal{X}\phi_i(x)\phi_j(x)\mathrm{d}x = \delta_{ij},\ i, j\in\mathbb{N}^*.$

Proposition 1. For every $f\in L_2(\mathcal{X})$, it has expansion

$$f(x)=\sum_{n=1}^\infty\left\langle f, \phi_n \right\rangle_{L_2(\mathcal{X})}\phi_n(x),$$

and the convergence holds in $L_2$ sense.

Proof. Since $\lbrace \phi_n\rbrace$ form an orthogonal basis in $L_2(\mathcal{X})$, $f$ has some representation of the form

$$f=\sum_{n=1}^\infty f_n\phi_n,\ \lbrace f_n\rbrace\subset\mathbb{R}.$$

By orthogonality, it can be calculated that

$$f_n = \left\langle f, \phi_n \right\rangle_{L_2(\mathcal{X})} = \int_\mathcal{X}f(x)\phi_n(x)\mathrm{d}x,\ \ \Vert f \Vert_{L_2(\mathcal{X})}^2 = \sum_{n=1}^\infty \left\langle f, \phi_n \right\rangle_{L_2(\mathcal{X})}^2 = \sum_{n=1}^\infty f_n^2 < \infty,$$

then we have

$$\begin{aligned} \left\Vert f - \sum_{j=1}^n f_j\phi_j \right\Vert_{L_2(\mathcal{X})}^2 &= \left\Vert f\right\Vert_{L_2(\mathcal{X})}^2 - 2\sum_{j=1}^n f_j\left\langle f, \phi_j \right\rangle_{L_2(\mathcal{X})} + \left\Vert \sum_{j=1}^n f_j\phi_j \right\Vert_{L_2(\mathcal{X})}^2 \\ &= \left\Vert f\right\Vert_{L_2(\mathcal{X})}^2 - \sum_{j=1}^n f_j^2\overset{n\to\infty}{\ \longrightarrow\ } 0;\end{aligned}$$

Thus we conclude the proof.

Theorem (Mercer). Under regular conditions, $k(\cdot,\cdot)$ admits the following spectral representation:

$$k(x,y) = \sum_{l=1}^\infty \lambda_l\phi_l(x)\phi_l(y),$$

Proof. By definition of the integral operator,

$$\left\langle k(\cdot, y), \phi_l\right\rangle_{L_2(\mathcal{X})} = T_k[\phi_l](y) = \lambda_l\phi_l(y);$$

Proposition 1 implies that

$$k(x, y) = \sum_{l=1}^\infty \left\langle k(\cdot, y), \phi_l\right\rangle_{L_2(\mathcal{X})}\phi_l(x) = \sum_{l=1}^\infty \lambda_l\phi_l(x)\phi_l(y); $$

Remark. Further study shows that this converge is absolute and uniform, i.e.,

$$\lim_{m,n\to\infty}\sup_{x,y\in\mathcal{X}}\sum_{l=m+1}^n\lambda_l\left\vert\phi_l(x)\phi_l(y)\right\vert = 0,\ \ \ \lim_{n\to\infty}\sup_{x,y\in\mathcal{X}}\left\vert k(x,y) - \sum_{l=1}^n\lambda_l\phi_l(x)\phi_l(y)\right\vert = 0.$$

Corollary (Trace of Functions). Under regular conditions, the trace of kernel $k$ can be calculated by

$$\int_\mathcal{X} k(x,x)\mathrm{d}x = \sum_{l=1}^\infty \lambda_l,\ \int_\mathcal{X}\int_\mathcal{X} k(x,y)^2\mathrm{d}x\mathrm{d}y = \int_\mathcal{X}\left(\int_\mathcal{X} k(y,x)k(x,y)\mathrm{d}x\right)\mathrm{d}y = \sum_{l=1}^\infty \lambda_l^2,$$

More generally, extend the matrix multiplication to functions:

$$k^{(1)}(x,y)=k(x,y),\ k^{(n)}(x,y) = \int_\mathcal{X}k^{(n-1)}(x,z)k(z,y)\mathrm{d}z,\ n \geq 2;$$

then

$$k^{(n)}(x,y) = \sum_{l=1}^\infty\lambda^n\phi_l(x)\phi_l(y),\ \ \int_\mathcal{X}k^{(n)}(x,x)\mathrm{d}x = \sum_{l=1}^\infty\lambda_l^n,\ \ n\geq 1.$$

Theorem (An alternative construction of RKHS). Let $\mathcal{H}$ to be a Hilbert space defined as

$$\mathcal{H}=\left\lbrace f=\sum_{l=1}^\infty f_l\phi_l:\ \Vert f\Vert_\mathcal{H}^2 = \sum_{l=1}^\infty\frac{f_l^2}{\lambda_l}<\infty,\ \lbrace f_l\rbrace \subset\mathbb{R}\right\rbrace,$$

with the inner product defined as

$$\langle f,g\rangle_\mathcal{H} = \sum_{l=1}^\infty \frac{f_l g_l}{\lambda_l},\ \ \text{where}\ f,g\in\mathcal{H}\ \text{have representations}\ f=\sum_{l=1}^\infty f_l\phi_l,\ g=\sum_{l=1}^\infty g_l\phi_l,$$

then $\mathcal{H}$ is the RKHS associated with reproducing kernel $k(\cdot,\cdot)$.

Proof. Recall that $k(\cdot, x) = \sum_{l\in\mathbb{N}^*} \lambda_l\phi_l(\cdot)\phi_l(x)$,

$$\Vert k(\cdot, x)\Vert_\mathcal{H}^2 = \sum_{l=1}^\infty \frac{\lambda_l^2\phi_l(x)^2}{\lambda_l} = \sum_{l=1}^\infty \lambda_l\phi_l(x)^2 = k(x,x) \leq \sup_{x\in\mathcal{X}} k(x,x) < \infty,$$

hence $k(\cdot, x) \in \mathcal{H},\ \forall x\in\mathcal{X}.$

Now we prove the reproducing property. For an arbitrary $f = \sum_{l\in\mathbb{N}^*}f_l\phi_l \in\mathcal{H},$

$$\langle f(\cdot),k(\cdot,x)\rangle_\mathcal{H} = \left\langle\sum_{l=1}^\infty f_l\phi_l(\cdot), \sum_{l=1}^\infty \lambda_l\phi_l(x)\phi_l(\cdot)\right\rangle_\mathcal{H} = \sum_{l=1}^\infty\frac{f_l\lambda_l\phi_l(x)}{\lambda_l} = \sum_{l=1}^\infty f_l\phi_l(x) = f(x);$$

Therefore $\mathcal{H}$ is a reproducing kernel Hilbert space with reproducing kernel $k(\cdot,\cdot)$.

Remarks.

To see into the structure of the inner product defined above, let’s derive it under the original definition of RKHS. Using the reproducing properties, $\forall l\in\mathbb{N}^*$,
$$\phi_l(x) = \left\langle \phi_l(\cdot), \sum_{l'=1}^\infty\lambda_{l'}\phi_{l'}(\cdot)\phi_{l'}(x)\right\rangle = \sum_{l'=1}^\infty\lambda_{l'}\phi_{l'}(x)\langle \phi_l(\cdot),\phi_{l'}(\cdot)\rangle_\mathcal{H},\ \forall x\in\mathcal{X};$$

which implies

$$\left\langle \phi_l(\cdot),\phi_{l'}(\cdot)\right\rangle_\mathcal{H} = \frac{\delta_{ll'}}{\lambda_{l'}} = \frac{\delta_{ll'}}{\lambda_{l}}.$$

It is seen that the orthogonality of eigenfunctions is preserved in RKHS.
The RKHS norm and $L_2$ norm are not equivalent. Note that $\lim_{n\to\infty} \lambda_n = 0$, we have
$$\sup_{f\in\mathcal{H}} \frac{\Vert f\Vert_\mathcal{H}}{\Vert f\Vert_{L_2(\mathcal{X})}} \geq \lim_{n\to\infty} \frac{\Vert \phi_n\Vert_\mathcal{H}}{\Vert \phi_n\Vert_{L_2(\mathcal{X})}} = \infty.$$

One reason is that the RKHS norm measures not only the magnitude of a function but also its smoothness.

Spatial Statistics (IV): Kriging

Spatial Statistics (VI): Kernel Ridge Regression