2021-07-26
In Part I of this series, we defined a Gaussian process as a random function. We observed that function at a discrete set of points, which turned the Gaussian process into a multivariate Gaussian distribution, and we saw how to sample from that distribution using the Cholesky decomposition of the covariance matrix. While this method works for any Gaussian process for which you can calculate a covariance function, the Cholesky decomposition is computationally expensive, and we'd like to do it a little more quickly if possible.
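As a refresher, here is a minimal sketch of that approach in Python with NumPy; the squared-exponential covariance, the grid of points, and the jitter term are stand-in choices for illustration, not anything fixed by Part I.

```python
import numpy as np

# Observation locations and a stand-in covariance function
# (squared-exponential, used here purely for illustration).
s = np.linspace(0.0, 1.0, 200)

def cov(s, t, ell=0.1):
    return np.exp(-0.5 * (s[:, None] - t[None, :])**2 / ell**2)

Sigma = cov(s, s) + 1e-8 * np.eye(s.size)  # small jitter for numerical stability

# Cholesky factor L with Sigma = L @ L.T, cost O(n^3)
L = np.linalg.cholesky(Sigma)

# Sample: Y = L @ Z with Z ~ N(0, I), so Y ~ N(0, Sigma)
rng = np.random.default_rng(0)
Y = L @ rng.standard_normal(s.size)
```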
The key to the Cholesky decomposition method was the identity that if Z ~ N(0,I) and A is a matrix of the right size, then the vector Y = A*Z has distribution N(0,A*A'). The Cholesky decomposition gives us a way of turning a covariance matrix Σ into a lower triangular matrix A that we can use to sample from a general multivariate Gaussian distribution, but if we can find another, cheaper way to design a transformation matrix, then we can avoid the computational cost of the Cholesky decomposition.
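To see the identity in action, here is a small numerical check, as a sketch: the matrix A below is just a random matrix with no special structure, and draws of Y = A*Z end up with sample covariance close to A*A'.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 3
A = rng.standard_normal((n, m))      # any n-by-m matrix will do

# Draw many samples Y = A @ Z with Z ~ N(0, I)
Z = rng.standard_normal((m, 100_000))
Y = A @ Z

empirical = np.cov(Y)                # sample covariance of the draws
exact = A @ A.T                      # covariance predicted by the identity
print(np.max(np.abs(empirical - exact)))  # small, shrinking with more draws
```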
One approach is to design a basis function expansion. Returning to the idea of the Gaussian process as a random function of space, we can separate the *random* part from the *function of space* part. We first choose a set of basis functions, [φ₀(s),φ₁(s),...,φₙ(s)], defined over the region of space that we are interested in. An example might be monomials:
φₖ(s) = s^k
or sinusoids
φ₂ₖ(s) = sin(k*s)
φ₂ₖ₊₁(s) = cos(k*s)
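For concreteness, here is one way the sinusoidal basis above could be evaluated on a grid of points; the interval, the number of frequencies, and the handling of the k = 0 term are all illustrative choices.

```python
import numpy as np

def sinusoid_basis(s, n_freq):
    """Evaluate [1, sin(s), cos(s), sin(2s), cos(2s), ...] at the points s.
    The k = 0 sine term is identically zero, so only the constant is kept."""
    columns = [np.ones_like(s)]
    for k in range(1, n_freq + 1):
        columns.append(np.sin(k * s))
        columns.append(np.cos(k * s))
    return np.stack(columns, axis=1)   # shape (len(s), 2*n_freq + 1)

s = np.linspace(0.0, 2 * np.pi, 200, endpoint=False)
Phi = sinusoid_basis(s, n_freq=8)      # each column is one basis function on the grid
```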
We then define a set of independent, zero-mean normal random variables [X₀,X₁,...,Xₙ], where Xₖ has variance λₖ. We can then define our Gaussian process as
Y(s) = Σₖ Xₖ*φₖ(s) = Σₖ √λₖ*Zₖ*φₖ(s)
where Zₖ ~ N(0,1). As usual, we observe Y at a finite collection of points and collect those observations into a vector Y. We also evaluate each basis function at each observation point and collect those values into a matrix Φᵢⱼ = φⱼ(sᵢ). Our model becomes
Y = Φ*X = Φ*√Λ*Z
where √Λ is a diagonal matrix with √λₖ on the diagonal.
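Putting this together, sampling through the basis expansion might look like the following sketch; the decaying variances λₖ are an arbitrary choice made only for illustration.

```python
import numpy as np

# Basis matrix Phi: same sinusoidal construction as the previous snippet
s = np.linspace(0.0, 2 * np.pi, 200, endpoint=False)
Phi = np.column_stack([np.ones_like(s)]
                      + [f(k * s) for k in range(1, 9) for f in (np.sin, np.cos)])

# Arbitrary decaying variances for the coefficients (illustrative only)
lam = 1.0 / (1.0 + np.arange(Phi.shape[1]))**2

# Y = Phi @ sqrt(Lambda) @ Z with Z ~ N(0, I), so Y ~ N(0, Phi Lambda Phi')
rng = np.random.default_rng(0)
Z = rng.standard_normal(Phi.shape[1])
Y = Phi @ (np.sqrt(lam) * Z)

# Cost per sample is O(n_points * n_basis) once Phi is built,
# versus O(n_points^3) for the Cholesky factorization.
```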
Using our identity from before, Y ~ N(0,ΦΛΦ'). This looks a lot like an eigenvalue decomposition of the covariance matrix, and indeed it is if the matrix of basis functions Φ is orthonormal (i.e. Φ'Φ = I), in which case the functions φₖ are the eigenfunctions of the covariance operator and the variances λₖ are its eigenvalues.
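A quick numerical illustration of that claim, as a sketch (the orthonormal Φ here comes from a QR factorization of a random matrix, purely for convenience):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 50, 5

# Columns of Phi are orthonormal: Phi' Phi = I
Phi, _ = np.linalg.qr(rng.standard_normal((n, m)))
lam = np.array([5.0, 4.0, 3.0, 2.0, 1.0])

Sigma = Phi @ np.diag(lam) @ Phi.T

# Each column of Phi is an eigenvector of Sigma with eigenvalue lam[k]
for k in range(m):
    print(np.allclose(Sigma @ Phi[:, k], lam[k] * Phi[:, k]))   # True
```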
If you had a covariance matrix, you could compute its eigenvalue decomposition and use the eigenvectors and eigenvalues to sample from the Gaussian distribution, but this is just as hard as the Cholesky decomposition. More likely, you would choose the eigenfunctions and eigenvalues yourself and form the transformation matrix from them. The resulting covariance function is
R(s,t) = Σₖ λₖ*φₖ(s)*φₖ(t)
which might be a little weird if you want to have a nice, explicit formula for your covariance, but it is just as valid a covariance function.
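As a check that this really is the covariance you get, the sketch below tabulates R on a grid and compares it with the sample covariance of many draws of Y = Φ*√Λ*Z, using the same sinusoidal basis and arbitrary variances as above.

```python
import numpy as np

# Sinusoidal basis and decaying variances, as in the earlier snippets
s = np.linspace(0.0, 2 * np.pi, 200, endpoint=False)
Phi = np.column_stack([np.ones_like(s)]
                      + [f(k * s) for k in range(1, 9) for f in (np.sin, np.cos)])
lam = 1.0 / (1.0 + np.arange(Phi.shape[1]))**2

# R(sᵢ,sⱼ) = Σₖ λₖ φₖ(sᵢ) φₖ(sⱼ), i.e. the matrix Phi Lambda Phi'
R = (Phi * lam) @ Phi.T

# The sample covariance of many draws converges to R
rng = np.random.default_rng(2)
Z = rng.standard_normal((Phi.shape[1], 50_000))
Y = Phi @ (np.sqrt(lam)[:, None] * Z)
print(np.max(np.abs(np.cov(Y) - R)))   # small, shrinking with more draws
```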
The Cholesky decomposition lets us compute an appropriate transformation matrix from a given covariance matrix, but it is expensive. We can instead go in the other direction: choose a simple transformation matrix A and use the resulting covariance matrix A*A'. This is equivalent to expanding the Gaussian process in a fixed basis with random coefficients.
We retain a lot of freedom in choosing our basis, and actually with the right choice of basis, we can represent just about any covariance function we desire. This is good because it allows us to be flexible in our modeling, but bad because we now have to decide which basis functions we are going to use.
There is one basis that is particularly convenient to use -- the Fourier basis of complex exponentials -- and we'll see in the next post how this basis lets us replace the transformation matrix with an even faster algorithm.