Adapt chunks not fitting in GPU

Nabu processes the data in chunks, as does PyHST.

The current design defines a "maximum possible chunk height" based on GPU memory, on the assumption that the whole chunk should fit in GPU memory.

However, some steps might put a constraint on the minimum chunk height. For example, the Paganin filter might have a large kernel (see MARGE in PyHST). In this case, we might have min_required_chunk_height > max_gpu_chunk_height.
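To make the conflict concrete, here is a minimal sketch of how the two bounds could be compared. The function names, the `n_buffers` factor and the "twice the margin plus one useful row" rule are assumptions made for illustration, not Nabu's actual estimation logic.

```python
def max_gpu_chunk_height(gpu_mem_bytes, n_angles, width, itemsize=4, n_buffers=2):
    """Largest number of detector rows whose stack fits in GPU memory.

    Assumption: one detector row of the chunk costs
    n_angles * width * itemsize bytes, times n_buffers working copies.
    """
    bytes_per_row = n_angles * width * itemsize * n_buffers
    return gpu_mem_bytes // bytes_per_row


def min_required_chunk_height(margin_v, min_useful_rows=1):
    """Smallest chunk height leaving at least min_useful_rows valid rows.

    Assumption: the filter invalidates margin_v rows at the top
    and margin_v rows at the bottom of the chunk.
    """
    return 2 * margin_v + min_useful_rows


def chunk_exceeds_gpu(gpu_mem_bytes, n_angles, width, margin_v):
    """True when min_required_chunk_height > max_gpu_chunk_height."""
    max_height = max_gpu_chunk_height(gpu_mem_bytes, n_angles, width)
    return min_required_chunk_height(margin_v) > max_height
```

With 16 GiB of GPU memory, 4000 projections of width 2048 in float32 and a vertical margin of 200 rows, these assumptions give a maximum of 262 rows against a minimum of 401, which is the situation described above.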

Nabu should be able to handle this case, i.e. process chunks whose height exceeds what fits in GPU memory:

Use a chunk height that satisfies the "Phase retrieval requirement" (min_required_chunk_height).
For radio-processing, process radios individually (as is already the case). These (sub-)radios might be taller than max_gpu_chunk_height, but each one is processed on its own.
For sino-processing, use sub-chunks (i.e. sub-stacks here). This entails memory transfers between host and GPU.
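The sino-processing step could look roughly like the sketch below. It is plain Python with no GPU calls: slicing and `extend` stand in for the host-to-device and device-to-host transfers, and `sub_chunk_bounds`, `process_sinos_by_sub_chunks` and `process_sub_stack` are hypothetical names, not existing Nabu functions.

```python
def sub_chunk_bounds(chunk_height, max_gpu_chunk_height):
    """Return (start, end) row ranges covering the chunk.

    Each range spans at most max_gpu_chunk_height rows.
    """
    if max_gpu_chunk_height < 1:
        raise ValueError("max_gpu_chunk_height must be at least 1")
    return [
        (start, min(start + max_gpu_chunk_height, chunk_height))
        for start in range(0, chunk_height, max_gpu_chunk_height)
    ]


def process_sinos_by_sub_chunks(sinos, max_gpu_chunk_height, process_sub_stack):
    """Process a host-side stack of sinograms one sub-stack at a time.

    sinos: sequence with one sinogram per detector row (kept in host memory).
    process_sub_stack: hypothetical callable taking a sub-stack and
        returning one result per sinogram, in the same order.
    """
    results = []
    for start, end in sub_chunk_bounds(len(sinos), max_gpu_chunk_height):
        sub_stack = sinos[start:end]  # stand-in for the host-to-device transfer
        results.extend(process_sub_stack(sub_stack))  # stand-in for device-to-host
    return results
```

For a chunk of 401 rows and a GPU limit of 262 rows, this yields two sub-stacks, rows 0 to 262 and rows 262 to 401, so the chunk costs two round trips instead of one.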

With this approach, the limiting factor for the maximum chunk height becomes the main (CPU) memory rather than GPU memory.

A positive side effect is that this is a step toward heterogeneous computing, where workers have different computing resources.

Edited Jan 28, 2020 by Pierre Paleo