2021-07-29
In Part 2 of this series, we thought about using a basis function expansion to draw samples from a Gaussian process. With a basis function expansion, our Gaussian process observed at a collection of points, Y, is given by
Y = Φ*√Λ*Z
where Φᵢⱼ = φⱼ(sᵢ) is the matrix of basis functions evaluated at the observation points, √Λ is a diagonal scaling matrix, and Z ~ N(0,I) is a standard multivariate normal random vector.
This is all well and good, but now you have to choose a set of basis functions and the scaling coefficients λₖ so that you end up with an appropriate covariance matrix
Σ = Φ*Λ*Φ'
One particularly convenient basis is the Fourier basis composed of sines and cosines
φ₂ₖ(s) = cos(2π*k*s)
φ₂ₖ₊₁(s) = sin(2π*k*s)
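To make this concrete, here is a minimal sketch that builds Φ explicitly and draws a sample as Y = Φ*√Λ*Z. The grid, the number of frequencies K, and the 1/k² decay of the λₖ are placeholder choices for illustration, not anything canonical:

```julia
# Draw Y = Φ*√Λ*Z directly from an explicit Fourier basis matrix.
N, K = 200, 20
s = range(0, 1, length=N)

# Columns of Φ: cos(2πks) and sin(2πks) for k = 1, ..., K
Φ = hcat([f.(2π * k .* s) for k in 1:K for f in (cos, sin)]...)
λ = [1 / k^2 for k in 1:K for _ in 1:2]   # one coefficient per column; 1/k² decay is a placeholder

Z = randn(2K)
Y = Φ * (sqrt.(λ) .* Z)                   # Y = Φ*√Λ*Z
```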
There is a lot to say about Fourier series, and we won't get too deep into it here. The main reason why it is so convenient is that there is a fast algorithm for performing the matrix multiplication Φ*X called the fast Fourier transform, and implementations of the FFT are available in many languages including Julia
```julia
using FFTW

# Y = Φ*X
Y = ifft(X)
```
Here we actually use the *inverse* fast Fourier transform `ifft` because `fft` usually implements the adjoint or transposed operation Φ'X in our notation. There are a few details about using the fast Fourier transform that are a little tricky. First of all: the FFT only computes the value of Y at points on an evenly spaced grid. This is good if you are using Gaussian processes for image analysis or for other data that comes in a gridded format, but annoying if your data are irregularly spaced. We'll see later how we might get around this limitation.
More confusing is how the coefficients X are actually represented. The notation that we have used above would imply that X₂ₖ is the coefficient of the cosine term with frequency k and X₂ₖ₊₁ is the coefficient of the sine term with frequency k. Many FFT implementations, including FFTW, which Julia's is based on, store X in a vector of complex numbers aₖ + im*bₖ, where aₖ is the cosine coefficient and bₖ is the sine coefficient.

There are other details having to do with symmetries of the coefficients, but we can actually avoid thinking too hard about this. Instead of drawing a random vector that is stored in the right format, we can draw a standard normal random vector u ~ N(0,I) and compute its FFT to get a complex random vector that is organized properly. We then multiply by our diagonal scaling matrix and take the inverse FFT to get our sample Y
```julia
u = randn(N)
Y = ifft(sqrt.(Λ) .* fft(u))   # real up to roundoff when the λₖ are symmetric; take real(Y) if needed
```
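As a quick sanity check on that storage convention, the FFT of a real vector comes back in conjugate-symmetric pairs, with one cosine coefficient in the real part and one sine coefficient in the imaginary part of each entry:

```julia
using FFTW

N = 8
u = randn(N)
X = fft(u)

# Real input gives conjugate-symmetric output: X[1 + k] == conj(X[1 + N - k])
@assert X[2] ≈ conj(X[8])
```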
Mathematically, we think of this fft-then-scale-then-ifft recipe as implementing the matrix multiplication
Y = A*u = Φ*√Λ*Φ'u
Using our rules for covariance matrices, we can check that we end up with the right Σ
Σ = A*A' = Φ*√Λ*Φ'Φ*√Λ*Φ' = Φ*Λ*Φ'
since the inverse FFT (Φ) and the FFT (Φ') are inverses of each other.
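If you want to convince yourself of this numerically, you can materialize Φ and Φ' as dense matrices by transforming the columns of the identity. This is just a sketch, and the spectrum λ here is an arbitrary choice:

```julia
using FFTW, LinearAlgebra

N = 8
λ = exp.(-abs.(fftfreq(N, N)))             # arbitrary symmetric spectrum, just for this check

F  = fft(Matrix{ComplexF64}(I, N, N), 1)   # dense Φ': the FFT applied to each column of the identity
Fi = ifft(Matrix{ComplexF64}(I, N, N), 1)  # dense Φ: the inverse FFT, likewise

A = Fi * Diagonal(sqrt.(λ)) * F            # A = Φ*√Λ*Φ'
@assert A * A' ≈ Fi * Diagonal(λ) * F      # Σ = A*A' = Φ*Λ*Φ'
```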
Constructing the Gaussian process this way lets us interpret it in a really useful way. We have a vector u of standard normal random variables defined on an evenly spaced grid. When we take the FFT and multiply by the scaling matrix √Λ, we are really multiplying pointwise by the diagonal elements of that matrix. Something called the convolution theorem states that the Fourier transform of the *convolution* of two signals is the pointwise product of their Fourier transforms. Convolution is basically a fancy word for filtering, and when we use this FFT algorithm, we are really filtering the white noise u to come up with the random function Y. We choose the coefficients λₖ to implement the kind of filter that we want. We can also design a filter in the signal domain (rather than in the Fourier domain) by putting the filter coefficients in a real-valued vector v and then using
```julia
Y = ifft(fft(v) .* fft(u))
```
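For instance, here is a sketch that filters the noise with a made-up 5-point moving average, zero-padded to the grid length:

```julia
using FFTW

N = 256
u = randn(N)

# A hypothetical filter: 5-point moving average, zero-padded to length N
v = zeros(N)
v[1:5] .= 1/5

Y = real(ifft(fft(v) .* fft(u)))   # circular convolution of v with u
```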
A Gaussian process constructed by filtering white noise in this manner is stationary: its covariance function R(s,t) = R(s - t) = R(h) depends only on the difference h = s - t, not on the actual locations s and t. The statistical properties of a stationary process don't vary in space, which is sometimes a useful assumption to make.
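In fact, since Σ = Φ*Λ*Φ', the covariance at lag h works out to be entry 1 + h of the inverse FFT of the λₖ, which we can check by simulation. The spectrum below is again a placeholder:

```julia
using FFTW, Statistics

N = 256
λ = exp.(-abs.(fftfreq(N, N)) ./ 10)   # placeholder spectrum
R = real(ifft(λ))                      # R[1 + h] is the covariance at lag h

# Empirical check at lag h = 10
samples = [real(ifft(sqrt.(λ) .* fft(randn(N)))) for _ in 1:10_000]
R̂ = mean(Y[1] * Y[11] for Y in samples)   # should be close to R[11], up to Monte Carlo error
```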
Another good reason to use the FFT method is that it is really easy to draw samples in two or three dimensions using multidimensional FFTs. You draw u on a two- or three-dimensional grid, and the pointwise multiplication by a scaling vector (or a filter) becomes a pointwise multiplication by a multidimensional scaling array (or filter).
```julia
using FFTW

u = randn(N, N)
Y = ifft(sqrt.(Λ) .* fft(u))   # Λ is now an N×N array of scaling coefficients
```
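One way to build the two-dimensional scaling array, sketched below with a made-up spectrum, is to make it a function of the radial frequency of each grid cell, which gives a roughly isotropic process:

```julia
using FFTW

N = 256
u = randn(N, N)

# Hypothetical isotropic spectrum: decay with the radial frequency of each cell
f = fftfreq(N, N)
Λ = [exp(-sqrt(fi^2 + fj^2) / 10) for fi in f, fj in f]

Y = real(ifft(sqrt.(Λ) .* fft(u)))
```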
The fast Fourier transform gives us an easy way to generate samples on an evenly spaced grid from a stationary Gaussian process, one whose statistical properties do not vary in space, by filtering white noise. However, the covariance function of the resulting sample is kind of weird. It is periodic, with a period equal to the size of the grid. Samples near opposite edges of the grid are correlated with each other, which creates a kind of long-range interaction that might be unrealistic in many applications. It is common, if you have a given stationary covariance function R(s), to sample it on the grid, take its FFT, and use that as the scaling vector Λ. This is only an approximation and does not actually give you samples with the right covariance function. The right way to do this is a method called *circulant embedding*, which we'll talk about in a future post. But in many applications you are kind of guessing at the covariance function anyway, so it can be okay to use the FFT method and just be careful about the boundaries.
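Here is a sketch of that approximate recipe, with a made-up squared-exponential covariance standing in for R. Evaluating R at the circular grid distance (which keeps its FFT real) and clamping any small negative values in the resulting spectrum are practical choices on my part, not part of the recipe itself:

```julia
using FFTW

N = 256
h = [min(i, N - i) for i in 0:N-1]   # circular distance on the grid
R = exp.(-(h ./ 20) .^ 2)            # placeholder squared-exponential covariance

Λ = max.(real(fft(R)), 0)            # approximate spectrum; clamp small negative values to zero
Y = real(ifft(sqrt.(Λ) .* fft(randn(N))))
```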