Spatial Statistics (III): Gaussian Process
Definition and properties
The Gaussian process can be seen as an extension of the multivariate Gaussian distribution from the finite-dimensional to the infinite-dimensional case. It is a stochastic process or random field (a collection of random variables indexed by time or space) such that the joint distribution of every finite collection of those random variables is Gaussian.
Definition (Gaussian process). Given a probability space $(\Omega,\mathcal{F},\mathbb{P})$ and an index set $\mathcal{D},$ a random field $\left\lbrace Z(s)\in L_2(\Omega):\ s\in\mathcal{D}\right\rbrace$ is called a Gaussian process, if for every finite set of indices $\lbrace s_1,\cdots,s_n\rbrace \subset \mathcal{D},$ the joint distribution of random variables $Z(s_1),\cdots,Z(s_n)$ is a multivariate Gaussian distribution.
The mean and covariance functions of a Gaussian process are defined by
$$\mu(s):=\mathbb{E}[Z(s)],\ k(s,t):=\mathrm{Cov}\left\lbrace Z(s),Z(t)\right\rbrace,$$
and they completely determine a Gaussian process $\mathcal{GP}\left(\mu(\cdot),k(\cdot,\cdot)\right).$
Properties of Gaussian processes
- A Gaussian process can be seen as a distribution over real-valued functions $\lbrace f:\mathcal{D}\to\mathbb{R}\rbrace.$ Suppose that $f\sim\mathcal{GP}\left(\mu(\cdot),k(\cdot,\cdot)\right);$ then for all $n\in\mathbb{N}^*$ and $x_1,\cdots,x_n\in\mathcal{D},$ the joint distribution of $f(x_1),\cdots,f(x_n)$ is given by
$$\begin{pmatrix} f(x_1) \\ \vdots\\ f(x_n) \end{pmatrix} \sim N\left\lbrace \begin{pmatrix} \mu(x_1) \\ \vdots\\ \mu(x_n) \end{pmatrix}, \begin{pmatrix} k(x_1,x_1) & \cdots & k(x_1,x_n) \\ \vdots & \ddots & \vdots\\ k(x_n, x_1) & \cdots & k(x_n,x_n) \end{pmatrix}\right\rbrace.$$
- The covariance function $k$ is a kernel, i.e. it is symmetric and positive definite. A wide range of kernels can be selected as the covariance function, such as the Gaussian RBF and Matérn kernels.
- If a Gaussian process $Z(\cdot)$ is (weakly) stationary, i.e. it has constant mean $\mu$ and a covariance $k(s,t)$ that depends only on the difference $s-t$, then for all $n\in\mathbb{N}^*,$ $s_1,\cdots,s_n\in\mathcal{D},$ and all $h$ such that $s_1+h,\cdots,s_n+h\in\mathcal{D},$
\[\left(Z(s_1),\cdots,Z(s_n)\right) \overset{d}{=} \left(Z(s_1 + h),\cdots,Z(s_n + h)\right).\]
In a nutshell, weak stationarity implies strict stationarity for Gaussian processes.
Some Gaussian processes with zero mean and different covariance kernels are visualized below.
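As an illustration of the finite-dimensional property above, here is a minimal sketch (assuming NumPy; the kernels and the length scale `ell` are illustrative choices, not necessarily those used in the figures) that draws zero-mean GP sample paths by sampling $N(0,\mathbf{K})$ on a grid:

```python
import numpy as np

def rbf_kernel(s, t, ell=0.5):
    """Gaussian RBF kernel: k(s, t) = exp(-|s - t|^2 / (2 ell^2))."""
    return np.exp(-0.5 * ((s - t) / ell) ** 2)

def matern32_kernel(s, t, ell=0.5):
    """Matern kernel with smoothness nu = 3/2."""
    r = np.abs(s - t) / ell
    return (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)

def sample_gp(kernel, x, n_draws=3, jitter=1e-8, rng=None):
    """Build K = {k(x_i, x_j)} on the grid x and draw columns from N(0, K)."""
    rng = np.random.default_rng() if rng is None else rng
    K = kernel(x[:, None], x[None, :])                    # n x n covariance
    L = np.linalg.cholesky(K + jitter * np.eye(len(x)))   # jitter for stability
    return L @ rng.standard_normal((len(x), n_draws))     # each column is a path

x = np.linspace(0.0, 1.0, 200)
paths_rbf = sample_gp(rbf_kernel, x)       # smooth sample paths
paths_m32 = sample_gp(matern32_kernel, x)  # rougher sample paths
```

The RBF kernel produces smooth paths, while the Matérn 3/2 kernel produces visibly rougher ones.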
Hierarchical model for geostatistical processes
In practice, we often apply a hierarchical model to simulate spatial or geostatistical data. A hierarchical model has two components: a process model, which specifies a latent process with a covariance structure, and a data model, which generates observations from the latent process with measurement error.
Process Model. Suppose that $\mathbf{m}(\cdot)\in\mathbb{R}^p$ is a column vector of covariates related to the mean function. Given regression coefficients $\beta\in\mathbb{R}^p$ and kernel $k(\cdot,\cdot),$ let
\[Z(\cdot)\sim\mathcal{GP}\left(\mathbf{m}(\cdot)^\top\beta, k(\cdot,\cdot)\right).\]
Data Model. Conditioning on the process $Z(\cdot),$ the observation at location $s\in\mathcal{D}$ is
\[Y(s)|Z(\cdot) \sim N(Z(s), \sigma^2),\ \sigma^2>0.\]
Equivalently, for $n$ locations $s_1,\cdots,s_n,$
\[Y(s_i) = Z(s_i) + e_i,\ e_i \overset{\textrm{i.i.d.}}{\sim} N(0,\sigma^2),\ i=1,\cdots,n.\]
Hence we can use an error process $\epsilon(\cdot)$ to model our observational data:
$$Y(s_i) = \mathbf{m}(s_i)^\top\beta + \epsilon(s_i),\ \begin{pmatrix}\epsilon(s_1)\\ \vdots \\ \epsilon(s_n)\end{pmatrix}\sim N(0,\mathbf{K} + \sigma^2\mathbf{I}_n),$$
where $\mathbf{K} = \lbrace k(s_i,s_j)\rbrace_{i,j=1}^n.$
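A minimal sketch of simulating this hierarchical model, assuming NumPy; the covariates $\mathbf{m}(s) = (1, s)^\top$, the RBF kernel, and all parameter values are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta, sigma2 = 100, np.array([1.0, 2.0]), 0.25

s = np.sort(rng.uniform(0.0, 1.0, n))        # observation locations
M = np.column_stack([np.ones(n), s])         # rows are covariate vectors m(s_i)^T
K = np.exp(-0.5 * ((s[:, None] - s[None, :]) / 0.2) ** 2)  # kernel matrix

# Process model: Z ~ N(M beta, K).
L = np.linalg.cholesky(K + 1e-8 * np.eye(n))
Z = M @ beta + L @ rng.standard_normal(n)

# Data model: Y | Z ~ N(Z, sigma^2 I); marginally, eps ~ N(0, K + sigma^2 I).
Y = Z + np.sqrt(sigma2) * rng.standard_normal(n)
```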
Analytical Examples
Brownian Motion
Definition (Brownian motion/Wiener process). Given a probability space $(\Omega,\mathcal{F},\mathbb{P}),$ a stochastic process $\lbrace W(t): \Omega\to\mathbb{R},\ t\geq 0\rbrace$ is a Brownian motion if
- $\mathbb{P}\lbrace W(0) = 0\rbrace = 1;$
- The sample paths of $W(\cdot)$ are continuous almost surely;
- $W(t)$ has independent and stationary increments;
- $W(t)\sim N(0,\sigma^2t),\ \sigma > 0.$
For convenience of our discussion, we will consider standard Brownian motion in which $\sigma^2=1.$
Properties. Let $W(\cdot)$ be a standard Brownian motion.
- (Mean and Covariance) $\mathbb{E}[W(t)] = 0,\ \mathrm{Cov}\lbrace W(s),W(t)\rbrace = \min(s,t).$
Proof. The mean of $W(t)$ is zero by definition, so we only consider the covariance.
Without loss of generality, suppose $s < t$ (the case $s = t$ is evident). Since $W(\cdot)$ has independent increments, we have
\[W(s)\perp W(t) - W(s),\ \mathbb{E}\left[W(s)\left(W(t)-W(s)\right)\right] = 0,\]
hence
$$\begin{aligned}\mathrm{Cov}\lbrace W(s),W(t)\rbrace &= \mathbb{E}[W(s)W(t)]\\ &= \mathbb{E}[W(s)^2] + \mathbb{E}[W(s)(W(t)-W(s))] = s. \end{aligned}$$
- (Gaussian process) $W(\cdot)$ is a Gaussian process with zero mean and kernel $k(s,t):=\min(s,t).$
Proof. It suffices to show that for all $n\geq 1$ and $0 < t_1 < \cdots < t_n,$ the joint distribution of $\mathbf{W} = (W(t_1),\cdots,W(t_n))^\top$ is Gaussian. Denote the vector of Brownian increments
$$\mathbf{Z}=\left(W(t_1),W(t_2)-W(t_1),\cdots,W(t_n) - W(t_{n-1})\right)^\top.$$
By the stationarity and independence of increments, we have $\mathbf{Z}\sim N(0,\mathbf{D}),$ where $\mathbf{D} = \mathrm{diag}\lbrace t_1, t_2-t_1,\cdots,t_n-t_{n-1}\rbrace.$
Denote the lower triangular matrix
$$\mathbf{L} = \left\lbrace\mathbb{1}(i\geq j)\right\rbrace_{i,j=1}^n = \begin{pmatrix} 1 & 0 & \cdots & 0 & 0 \\ 1 & 1 & \cdots & 0 & 0\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ 1 & 1 & \cdots & 1 & 0\\ 1 & 1 & \cdots & 1 & 1\end{pmatrix}.$$
Since each $W(t_i)$ is the sum of the first $i$ increments, $\mathbf{W} = \mathbf{LZ},$ and
$$\mathbf{W} = \mathbf{LZ} \sim N(0,\mathbf{LDL}^\top) = N(0,\mathbf{K}),$$
where $\mathbf{K}=\left\lbrace \min(t_i,t_j)\right\rbrace_{i,j=1}^n$: indeed, the $(i,j)$ entry of $\mathbf{LDL}^\top$ is $\sum_{k\leq\min(i,j)}(t_k - t_{k-1}) = \min(t_i,t_j).$ Hence $\mathbf{W}$ is Gaussian, and $W(\cdot)$ is a Gaussian process.
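A minimal sketch (assuming NumPy) that mirrors the construction $\mathbf{W} = \mathbf{LZ}$ in the proof: cumulative sums of independent Gaussian increments, whose empirical covariance should be close to $\min(t_i,t_j)$:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.1, 1.0, 10)            # 0 < t_1 < ... < t_n
d = np.diff(t, prepend=0.0)              # increment variances t_i - t_{i-1}

n_paths = 200_000
Z = np.sqrt(d)[:, None] * rng.standard_normal((len(t), n_paths))  # Z ~ N(0, D)
W = np.cumsum(Z, axis=0)                 # W = L Z via cumulative sums

emp_cov = np.cov(W)                      # empirical covariance across paths
K = np.minimum(t[:, None], t[None, :])   # theoretical kernel min(t_i, t_j)
print(np.abs(emp_cov - K).max())         # small Monte Carlo error
```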
Ornstein–Uhlenbeck process (O-U process)
Given a probability space $(\Omega,\mathcal{F},\mathbb{P}),$ an Ornstein–Uhlenbeck process $\lbrace X(t): \Omega\to\mathbb{R},\ t\geq 0\rbrace$ is defined by the following stochastic differential equation (SDE):
$$\mathrm{d}X(t) = -\theta X(t)\mathrm{d}t + \sigma\mathrm{d}W(t),$$
where $\theta>0,\sigma>0$ and $W(t)$ is a standard Brownian motion.
Now we derive the solution of the SDE above, using the integrating factor $\mathrm{e}^{\theta t}$:
$$\begin{aligned}\mathrm{d}X(t) + \theta X(t)\mathrm{d}t &= \sigma\mathrm{d}W(t),\\ \mathrm{d}\left(X(t)\mathrm{e}^{\theta t}\right) &= \sigma\mathrm{e}^{\theta t}\mathrm{d}W(t),\\ X(t)\mathrm{e}^{\theta t} - X(0) &= \sigma\int_0^t\mathrm{e}^{\theta s}\mathrm{d}W(s). \end{aligned}$$
Hence
$$X(t) = X(0)\mathrm{e}^{-\theta t} + \sigma\int_0^t\mathrm{e}^{-\theta(t-s)}\mathrm{d}W(s).$$
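Before turning to its properties, here is a minimal sketch of simulating one path of this SDE by Euler–Maruyama discretization (assuming NumPy; the step size and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma, dt, n_steps = 2.0, 0.5, 1e-3, 5_000

x = np.empty(n_steps + 1)
x[0] = 1.0                                    # initial value X(0)
for i in range(n_steps):
    dW = np.sqrt(dt) * rng.standard_normal()  # Brownian increment ~ N(0, dt)
    x[i + 1] = x[i] - theta * x[i] * dt + sigma * dW
# x holds X on the grid t = 0, dt, ..., n_steps * dt; the path decays toward
# zero (the conditional mean X(0) e^{-theta t}) with random fluctuations.
```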
Properties. Let $X(\cdot)$ be an Ornstein–Uhlenbeck process defined above.
- Conditional on $X(0),$ the mean and covariance of $X(\cdot)$ are given by
$$\begin{aligned} \mathbb{E}[X(t)|X(0)] &= X(0)\mathrm{e}^{-\theta t},\\ \mathrm{Cov}\left\lbrace X(s), X(t)|X(0)\right\rbrace &= \frac{\sigma^2}{2\theta}\left(\mathrm{e}^{-\theta\vert t-s\vert} - \mathrm{e}^{-\theta(s+t)}\right). \end{aligned}$$
Proof. The conditional expectation of $X(t)$ is its deterministic part, since the stochastic part is an integral of Brownian increments with zero mean. Now we derive the covariance.
$$\mathrm{Cov}\left\lbrace X(s), X(t)|X(0)\right\rbrace = \sigma^2\mathrm{e}^{-\theta(s+t)}\mathbb{E}\left[\int_0^s \mathrm{e}^{\theta u}\mathrm{d}W(u)\int_0^t \mathrm{e}^{\theta u}\mathrm{d}W(u)\right].$$
By the independence of Brownian increments and the Itô isometry, we have
$$\begin{aligned}\mathbb{E}\left[\int_0^s \mathrm{e}^{\theta u}\mathrm{d}W(u)\int_0^t \mathrm{e}^{\theta u}\mathrm{d}W(u)\right] &= \mathbb{E}\left[\left(\int_0^{\min(s,t)} \mathrm{e}^{\theta u}\mathrm{d}W(u)\right)^2\right]\\ &= \int_0^{\min(s,t)} \mathrm{e}^{2\theta u}\mathrm{d}u = \frac{1}{2\theta}\left(\mathrm{e}^{2\theta\min(s,t)} - 1\right). \end{aligned}$$
Hence
$$\mathrm{Cov}\left\lbrace X(s), X(t)|X(0)\right\rbrace = \frac{\sigma^2}{2\theta}\left(\mathrm{e}^{-\theta\vert t-s\vert} - \mathrm{e}^{-\theta(t+s)}\right).$$
- We impose a Gaussian distribution $N(0,\frac{\sigma^2}{2\theta})$ on $X(0),$ independent of $W(\cdot);$ then
$$\begin{aligned} \mathbb{E}[X(t)] &= 0,\\ \mathrm{Cov}\left\lbrace X(s), X(t)\right\rbrace &= \frac{\sigma^2}{2\theta}\mathrm{e}^{-\theta\vert t-s\vert}, \end{aligned}$$
which implies that an (unconditioned) Ornstein–Uhlenbeck process is stationary with zero mean and an exponential (Laplacian) kernel.
Proof. Since $X(0)$ is independent of $W(\cdot),$ we only need to add the term $\mathrm{e}^{-\theta(s+t)}\mathrm{Var}\lbrace X(0)\rbrace = \frac{\sigma^2}{2\theta}\mathrm{e}^{-\theta(s+t)}$ to the conditional covariance, which exactly cancels the second term.
- Each Ornstein–Uhlenbeck process is a Gaussian process. Since the part $X(0)\mathrm{e}^{-\theta t}$ only shifts the mean (or contributes an independent Gaussian term under the stationary initialization), we only consider the stochastic part of $X(t):$
$$\int_0^t\mathrm{e}^{-\theta(t-s)}\mathrm{d}W(s) = \lim_{n \to \infty}\sum_{[s_{i-1},s_i]\in \pi_n}\mathrm{e}^{-\theta(t-s_{i-1})}\left(W(s_i) - W(s_{i-1})\right),$$
where $\lbrace\pi_n\rbrace$ is a sequence of partitions of $[0,t]$ whose maximum sub-interval length goes to zero. Since all Brownian increments $W(s_i) - W(s_{i-1})$ are independent Gaussian variables, any finite linear combination of them is Gaussian. Furthermore, the Itô integral, as the $L^2$-limit of a sequence of Gaussian variables, is still Gaussian. Hence $X(\cdot)$ is a Gaussian process.
Remark. A more general form of the Ornstein–Uhlenbeck process includes a long-run mean $\mu$ toward which the process reverts:
$$\mathrm{d}X(t) = \theta (\mu -X(t))\mathrm{d}t + \sigma\mathrm{d}W(t).$$
The drift term does not change the form of covariance, but the conditional mean is $\mathbb{E}[X(t)\vert X(0)] = X(0)\mathrm{e}^{-\theta t} + \mu(1-\mathrm{e}^{-\theta t}).$
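To close, a minimal sketch (assuming NumPy; parameter values illustrative) that checks the stationary covariance numerically: starting from the stationary law $N(\mu, \frac{\sigma^2}{2\theta})$ and stepping with the exact Gaussian transition implied by the solution formula, the empirical covariance at lag $h$ should match $\frac{\sigma^2}{2\theta}\mathrm{e}^{-\theta h}$:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, mu, sigma, dt = 2.0, 1.0, 0.5, 0.01
n_paths, lag = 100_000, 100

a = np.exp(-theta * dt)                            # one-step decay factor
sd = np.sqrt(sigma**2 / (2 * theta) * (1 - a**2))  # exact transition std dev

# Start in the stationary law N(mu, sigma^2 / (2 theta)).
x0 = mu + np.sqrt(sigma**2 / (2 * theta)) * rng.standard_normal(n_paths)
x = x0.copy()
for _ in range(lag):                               # advance `lag` exact steps
    x = mu + (x - mu) * a + sd * rng.standard_normal(n_paths)

emp = np.mean((x0 - mu) * (x - mu))                # empirical covariance at lag
print(emp, sigma**2 / (2 * theta) * np.exp(-theta * lag * dt))
```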