nabu-multicor: performance improvement
In `nabu.app.multicor`, there is this unholy mess:
```python
# Get sinogram into contiguous array
# TODO Can't do memcpy2D ?! It used to work in cuda 11.
# For now: transfer to host... not optimal
sino = pipeline._d_radios[:, pipeline._d_radios.shape[1] // 2, :].get() # pylint: disable=E1136
```
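It would be good to (a) use a real device-to-device (D2D) memcpy instead of the round trip through the host, and (b) make sure that the correct sinogram is extracted, rather than always taking the middle detector row (`shape[1] // 2`).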
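A minimal sketch of (a), assuming `_d_radios` is a C-contiguous pycuda `GPUArray` of shape `(n_angles, n_z, n_x)`. Extracting row `z` of every radio is a pitched 2D copy: one row of `n_x` elements per angle, with consecutive rows spaced `n_z * n_x` elements apart in the source. The helper names `sinogram_copy_params` and `extract_sinogram_d2d` are hypothetical. The sketch uses pycuda's `Memcpy2D` with `aligned=False`, which selects the unaligned driver path (`cuMemcpy2DUnaligned`). Whether an alignment restriction is what broke the copy after CUDA 11 is an untested guess. For (b), the row index `z` is passed explicitly instead of being hard-coded to `shape[1] // 2`. Choosing the right value is left to the caller.

```python
def sinogram_copy_params(radios_shape, itemsize, z):
    """Byte-level parameters of a pitched 2D copy extracting radios[:, z, :].

    Assumes a C-contiguous stack of shape (n_angles, n_z, n_x).
    """
    n_angles, n_z, n_x = radios_shape
    if not 0 <= z < n_z:
        raise ValueError(f"row index z={z} out of range [0, {n_z})")
    row_bytes = n_x * itemsize
    return {
        "src_x_in_bytes": z * row_bytes,  # skip z detector rows inside each radio
        "src_pitch": n_z * row_bytes,     # byte distance between two consecutive radios
        "dst_pitch": row_bytes,           # destination sinogram is contiguous
        "width_in_bytes": row_bytes,      # copy one detector row per angle
        "height": n_angles,               # one copied row per projection angle
    }


def extract_sinogram_d2d(d_radios, z, d_sino=None, stream=None):
    """Copy d_radios[:, z, :] into a contiguous device array, without a host round trip.

    Hypothetical helper; pycuda is imported lazily so the pure-Python part
    above can be used without a GPU.
    """
    import pycuda.driver as cuda
    from pycuda import gpuarray

    n_angles, _, n_x = d_radios.shape
    if d_sino is None:
        # Destination: contiguous (n_angles, n_x) sinogram on the device
        d_sino = gpuarray.empty((n_angles, n_x), d_radios.dtype)
    params = sinogram_copy_params(d_radios.shape, d_radios.dtype.itemsize, z)

    copy = cuda.Memcpy2D()
    copy.set_src_device(int(d_radios.gpudata))
    copy.set_dst_device(int(d_sino.gpudata))
    for name, value in params.items():
        setattr(copy, name, value)

    if stream is None:
        copy(aligned=False)  # synchronous, tolerates device-side misalignment
    else:
        copy(stream)         # asynchronous on the given stream
    return d_sino
```

Usage would then be something like `sino = extract_sinogram_d2d(pipeline._d_radios, z)`, keeping the sinogram on the device. If host data is still needed afterwards, `sino.get()` transfers only the contiguous result.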