# Reduce GPU memory usage

## About

Reduce GPU memory usage for the pipeline, and update the required memory estimations.

## To do

- Update `estimate_required_memory`
- Update `estimate_max_chunk_size`
- Update `CudaVerticalShifts` (close #414)
  - Implement in-place mul-add with x-y subregion
    - `MulAdd`
    - `CudaMulAdd`
    - Unit test
  - Use it in `CudaVerticalShifts`
- Reduce memory usage for padding
  - Avoid creating huge indices arrays for `CudaPadding` and `OpenCLPadding`
- Reduce memory usage for FBP
- End-to-end reconstruction test

## Notes

### Vertical shifts CUDA implementation

`CudaVerticalShifts` performed many updates of the form `self._d_radio_tmp[:-s0] += radio[s0:] * f`, where the right-hand side allocates a new temporary array at each iteration. To avoid this, a new "mul-add" kernel was added.

### Padding CUDA/OpenCL implementation

`XXPadding` provides generic padding through a coordinate transform, but it currently allocates two 2D images (`coords_rows` and `coords_cols`) for this transform. Only one 1D array is needed for each:
```python
import numpy as np
from nabu.processing.padding_base import PaddingBase

# "constant" mode excluded: the check below concerns coordinate-transform modes only
for mode in set(PaddingBase.supported_modes) - set(["constant"]):
    pad = PaddingBase((12, 13), ((5, 6), (7, 8)), mode=mode)
    assert np.max(np.std(pad.coords_cols, axis=0)) == 0
    assert np.max(np.std(pad.coords_rows, axis=1)) == 0
```
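As a sketch of the proposed reduction (the `rows_1d`/`cols_1d` names are illustrative, not an existing nabu API): both 2D coordinate images can be rebuilt from one 1D array per axis, so only the 1D arrays need to live on the GPU:

```python
import numpy as np
from nabu.processing.padding_base import PaddingBase

# Pick any non-constant mode supported by PaddingBase
mode = next(iter(set(PaddingBase.supported_modes) - set(["constant"])))
pad = PaddingBase((12, 13), ((5, 6), (7, 8)), mode=mode)

# One 1D array per axis carries the same information as the two 2D images
rows_1d = pad.coords_rows[:, 0]
cols_1d = pad.coords_cols[0, :]
assert np.array_equal(pad.coords_rows, np.broadcast_to(rows_1d[:, None], pad.coords_rows.shape))
assert np.array_equal(pad.coords_cols, np.broadcast_to(cols_1d[None, :], pad.coords_cols.shape))

# Memory reduction factor for the coordinate arrays
print((pad.coords_rows.nbytes + pad.coords_cols.nbytes) / (rows_1d.nbytes + cols_1d.nbytes))
```

A padding kernel can then index the input with `rows_1d[i]` and `cols_1d[j]` instead of reading two full 2D index images.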

### FBP

Given a sinogram of shape `(n_a, n_x)`, the memory footprint for FBP is `n_x * (5*n_a + n_x)` elements, assuming R2C transforms for filtering and no FFT plans stored:
- `sino`: `(n_a, n_x)`
- `sino_padded`: `(n_a, 2*n_x)` (in the best case! usually `next_power(2*n) > 2*n`)
- `sino_padded_fourier`: `(n_a, 2*n_x//2 + 1)` complex values
- `reco`: `(n_x, n_x)`
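As a quick sanity check of this count (a sketch; `fbp_footprint_bytes` is not an existing nabu helper), assuming float32 data and the best-case padded width of exactly `2*n_x`:

```python
def fbp_footprint_bytes(n_a, n_x, itemsize=4):
    """Best-case FBP footprint in bytes: R2C filtering, no FFT plans,
    padded width exactly 2*n_x (hypothetical helper)."""
    sino = n_a * n_x
    sino_padded = n_a * 2 * n_x
    sino_padded_fourier = n_a * (2 * n_x // 2 + 1) * 2  # complex = 2 scalars per value
    reco = n_x * n_x
    return (sino + sino_padded + sino_padded_fourier + reco) * itemsize

# ~ n_x * (5*n_a + n_x) * 4 bytes; about 15 GB for (n_a, n_x) = (43200, 16384),
# a best-case lower bound compared to the figures quoted below
print(fbp_footprint_bytes(43200, 16384) / 1e9)
```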
This memory usage could be reduced to the minimum (sinogram + reconstructed slice) if the filtering is done in-place. This would have two drawbacks:
- The user "loses" the input sinogram, though this is probably fine in most cases
- Filtering is not as efficient: a batched 1D FFT becomes a series of 1D FFTs, with the overhead of Python loops. A compromise could be to process batches of hundreds of lines, as sketched below.
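A minimal CPU/NumPy sketch of that compromise (the function name, batch size and the plain `|f|` ramp are illustrative, not nabu's actual filter):

```python
import numpy as np

def filter_sino_inplace_batched(sino, batch_size=256):
    """Filter a sinogram in place, a few hundred lines at a time, so that only
    one small padded batch exists at any time instead of a full padded copy."""
    n_a, n_x = sino.shape
    n_pad = 2 * n_x  # best case, see above
    ramp = np.abs(np.fft.rfftfreq(n_pad))  # simplistic ramp filter
    for start in range(0, n_a, batch_size):
        chunk = sino[start:start + batch_size]
        chunk_padded = np.zeros((chunk.shape[0], n_pad), dtype="f")
        chunk_padded[:, :n_x] = chunk
        filtered = np.fft.irfft(np.fft.rfft(chunk_padded, axis=1) * ramp, n=n_pad, axis=1)
        chunk[:] = filtered[:, :n_x]  # overwrite the input lines
    return sino
```

The same loop structure applies on the GPU, where the Python-loop overhead is amortized over batches of hundreds of lines.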
With these modifications, a test (using the VKFFT backend) on a sinogram of shape `(n_a, n_x) = (43200, 16384)` will use 21 GB. On the other hand, `5 * sino.nbytes/1e9 + rec_big.nbytes/1e9` gives 18 GB.