Kernels

This package implements the different covariance kernels that one can use in volcapy.

Its main goal is to compute the covariance pushforward \(K F^t\), where \(K\) is the model covariance matrix and \(F\) is the forward operator.

IMPORTANT: Note that we always strip the variance parameter \(\sigma_0^2\) from the covariance matrix. Hence, when using the covariance pushforward computed here, one has to manually multiply by \(\sigma_0^2\) for expressions to make sense.
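As a minimal sketch of that rescaling, assume a stripped pushforward \(K F^t\) has already been returned by one of the kernels. The array values, the value of `sigma0`, and the helper name `rescale_pushforward` are purely illustrative and not part of volcapy.

```python
import numpy as np


def rescale_pushforward(cov_pushforward, sigma0):
    """Reinstate the variance sigma0^2 on a stripped pushforward K F^t.

    Hypothetical helper, shown only to illustrate the rescaling step.
    """
    return sigma0 ** 2 * cov_pushforward


# Stand-in for a stripped pushforward with 3 model cells and 2 data points.
stripped = np.array([[1.0, 0.5],
                     [0.5, 1.0],
                     [0.2, 0.4]])
sigma0 = 2.0

# Every entry is scaled by sigma0^2 = 4.
full = rescale_pushforward(stripped, sigma0)
```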

Each of the kernels should implement the three functions below:

def compute_cov_pushforward(lambda0, F, cells_coords, device,
        n_chunks=200, n_flush=50):
    """ Compute covariance pushforward

    """

def compute_cov(lambda0, cells_coords, i, j):
    """ Compute the covariance between cells i and j of the model.

    """

def compute_full_cov(lambda0, cells_coords, device,
        n_chunks=200, n_flush=50):
    """ Compute the full covariance matrix. Note that due to the
    :math:`n_m^2` size, this should only be
    attempted on small models.

    """

A detailed description of the arguments is available at the end of this section.

Handling Out-of-Memory Errors

Due to the size of the covariance matrix, care has to be taken when computing its product with the forward operator. Let \(n_m\) be the number of model cells. Then the covariance matrix has \(n_m^2\) entries. In single precision (4 bytes per entry) this is 0.4 GB for 10000 cells, and already 160 GB for 200000 cells.
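A quick way to check whether a full covariance matrix is feasible is to estimate its dense storage cost. The sketch below assumes single-precision entries (4 bytes each), which the text does not state; `full_cov_bytes` is a hypothetical helper, not a volcapy function.

```python
def full_cov_bytes(n_cells, bytes_per_entry=4):
    """Memory needed to store a dense n_cells x n_cells matrix.

    bytes_per_entry=4 assumes float32; use 8 for float64.
    """
    return n_cells ** 2 * bytes_per_entry


# Report the footprint in GB (10^9 bytes) for two model sizes.
for n_cells in (50_000, 200_000):
    gigabytes = full_cov_bytes(n_cells) / 1e9
    print(f"{n_cells} cells: {gigabytes:.1f} GB")
```

Because the cost grows quadratically, quadrupling the number of cells multiplies the footprint by sixteen.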

The strategy used here is to compute the matrix in chunks. We compute matrix products of the form \(K A\) by splitting the rows of the resulting matrix into n_chunks chunks, so that only the corresponding block of rows of the covariance matrix is needed at any one time. Each such block is computed on the GPU and multiplied with the right-hand side matrix, and the result is sent back to the CPU, where it is concatenated with the previously computed chunks. The freed GPU memory is then used to compute the next chunk.

We noticed that CUDA tends to keep arbitrary data in cache, which after computing a certain number of chunks will fill the GPU memory. The cache thus has to be manually flushed every n_flush chunks.

Flushing takes a long time, so it should not be done too often: n_flush should be set as high as possible without running out of memory. The optimal value should be determined experimentally by the user.
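The chunking and flushing strategy described in this section can be sketched on the CPU with NumPy. This is an illustration under stated assumptions, not the volcapy implementation: `chunked_cov_product`, `matern32` and the `flush` callback are hypothetical names, and the kernel is the standard Matérn 3/2 form. On a GPU with PyTorch, the callback could for instance call `torch.cuda.synchronize()` followed by `torch.cuda.empty_cache()`.

```python
import numpy as np


def matern32(dists, lambda0):
    """Standard Matern 3/2 correlation (variance stripped)."""
    scaled = np.sqrt(3.0) * dists / lambda0
    return (1.0 + scaled) * np.exp(-scaled)


def chunked_cov_product(lambda0, A, cells_coords, n_chunks=200, n_flush=50,
                        flush=None):
    """Compute K @ A one block of rows at a time, never storing K in full."""
    results = []
    # Split the cells (rows of K) into n_chunks blocks.
    row_blocks = np.array_split(cells_coords, n_chunks, axis=0)
    for i, block in enumerate(row_blocks):
        # Distances between the cells of this block and all cells.
        diffs = block[:, None, :] - cells_coords[None, :, :]
        dists = np.sqrt((diffs ** 2).sum(axis=-1))
        # Only this block of rows of K exists in memory at this point.
        results.append(matern32(dists, lambda0) @ A)
        # Release cached memory every n_flush chunks (no-op if flush is None).
        if flush is not None and (i + 1) % n_flush == 0:
            flush()
    return np.concatenate(results, axis=0)


# Usage on a small random model; F is a made-up forward operator.
rng = np.random.default_rng(0)
cells_coords = rng.uniform(size=(100, 3))   # n_cells x n_dims
F = rng.normal(size=(5, 100))               # n_data x n_cells
flush_calls = []
pushforward = chunked_cov_product(
    0.5, F.T, cells_coords, n_chunks=10, n_flush=4,
    flush=lambda: flush_calls.append(1))
```

More chunks mean smaller blocks and a lower peak memory use, at the price of more iterations. A larger n_flush means fewer of the slow flushes.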

Matérn 3/2

The implementation of the Matérn 3/2 kernel is provided as an example below.

volcapy.covariance.matern32.compute_cov(lambda0, cells_coords, i, j)

Compute the covariance between two points.

Note that, as always, the variance sigma0^2 has been stripped.

Parameters
lambda0: float

Length-scale parameter

cells_coords: tensor

n_cells * n_dims: cells coordinates

i: int

Index of first cell (index in the cells_coords array).

j: int

Index of second cell.

Returns
Tensor

(Stripped) covariance between cell nr i and cell nr j.
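For orientation, the standard Matérn 3/2 correlation between two points at distance \(d\) is \(k(d) = (1 + \sqrt{3}\, d / \lambda_0) \exp(-\sqrt{3}\, d / \lambda_0)\). The sketch below evaluates it for two cells using plain Python. It assumes this standard form and a Euclidean distance between cell coordinates, neither of which is spelled out above, and `matern32_cov` is a hypothetical stand-in rather than the volcapy function itself.

```python
import math


def matern32_cov(lambda0, cells_coords, i, j):
    """Stripped Matern 3/2 covariance between cells i and j."""
    # Euclidean distance between the two cells.
    d = math.dist(cells_coords[i], cells_coords[j])
    scaled = math.sqrt(3.0) * d / lambda0
    return (1.0 + scaled) * math.exp(-scaled)


# Three made-up cells in 3D.
cells_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]

same_cell = matern32_cov(1.0, cells_coords, 0, 0)   # distance 0
neighbours = matern32_cov(1.0, cells_coords, 0, 1)  # distance 1
far_apart = matern32_cov(1.0, cells_coords, 0, 2)   # distance 2
```

With the variance stripped, a cell has covariance 1 with itself, and the value decays as the distance between cells grows.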

volcapy.covariance.matern32.compute_cov_pushforward(lambda0, F, cells_coords, device, n_chunks=200, n_flush=50)

Compute the covariance pushforward.

The covariance pushforward is just KF^T, where K is the model covariance matrix.

Note that sigma0^2 is not included, and one has to manually multiply by it when using the covariance pushforward computed here.

Parameters
lambda0: float

Length-scale parameter

F: tensor

Forward operator matrix

cells_coords: tensor

n_cells * n_dims: cells coordinates

device: torch.device

Device to perform the computation on, CPU or GPU.

n_chunks: int

Number of chunks to split the matrix into. Default is 200. Increase this if you get out-of-memory (OOM) errors.

n_flush: int

Synchronize threads and flush GPU cache every n_flush iterations. This is necessary to avoid OOM errors. Default is 50.

Returns
Tensor

n_model * n_data covariance pushforward K F^t.

volcapy.covariance.matern32.compute_full_cov(lambda0, cells_coords, device, n_chunks=200, n_flush=50)

Compute the full covariance matrix.

Note that sigma0^2 is not included, and one has to manually multiply by it when using the covariance matrix computed here.

Parameters
lambda0: float

Length-scale parameter

cells_coords: tensor

n_cells * n_dims: cells coordinates

device: torch.device

Device to perform the computation on, CPU or GPU.

n_chunks: int

Number of chunks to split the matrix into. Default is 200. Increase this if you get out-of-memory (OOM) errors.

n_flush: int

Synchronize threads and flush GPU cache every n_flush iterations. This is necessary to avoid OOM errors. Default is 50.

Returns
Tensor

n_cells * n_cells covariance matrix.