Pre-processing API consistency and GPU memory

Technically, some pre-processing can be done in-place on radios/sinograms.
All processing that can be done in-place should be done in-place by default.

Otherwise, we can either:

1. Provide an output= parameter that takes a pre-allocated device array.
2. "Force" in-place processing by using temporary arrays and copying the result back to the input array.
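The two options above could look roughly like the sketch below. It uses NumPy arrays as a CPU stand-in for device arrays so that it runs anywhere. The class name `PreprocessingSketch`, the helper `_smooth_rows` and the one-radio scratch buffer are assumptions made for illustration, not an existing API.

```python
import numpy as np


def _smooth_rows(image, out):
    """Hypothetical stand-in for a GPU kernel that needs a separate
    output buffer: 3-point mean along each row, edges replicated."""
    padded = np.pad(image, ((0, 0), (1, 1)), mode="edge")
    out[:] = (padded[:, :-2] + padded[:, 1:-1] + padded[:, 2:]) / 3.0


class PreprocessingSketch:
    """Illustrative pre-processing class supporting both options."""

    def __init__(self, radios_shape, dtype=np.float32):
        self.n_radios, n_rows, n_cols = radios_shape
        # Scratch buffer the size of ONE radio, allocated once and reused
        # (assumption: the processing can be done radio by radio).
        self._tmp = np.empty((n_rows, n_cols), dtype=dtype)

    def apply(self, radios, output=None):
        if output is not None:
            # First option: write into the caller's pre-allocated array.
            # The input stack is left untouched.
            for i in range(self.n_radios):
                _smooth_rows(radios[i], output[i])
            return output
        # Second option: "forced" in-place. Process into the scratch
        # buffer, then copy the result back over the input.
        for i in range(self.n_radios):
            _smooth_rows(radios[i], self._tmp)
            radios[i][:] = self._tmp
        return radios
```

With this layout, `proc.apply(radios)` overwrites `radios`, while `proc.apply(radios, output=out)` fills `out` and leaves `radios` unchanged. In-place stays the default, and callers who can afford a second stack opt in explicitly.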

Approach (1), although more efficient since it avoids the extra copies, is much more memory-demanding (the whole stack is replicated: one copy for the input, one for the output).
Approach (2) entails more memory copies.
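To give an order of magnitude for this trade-off, here is a small back-of-the-envelope sketch. The stack dimensions (500 radios of 2048 x 2048 float32 pixels) are made up for illustration, and approach (2) is assumed to need a temporary array the size of a single radio.

```python
def stack_nbytes(n_radios, n_rows, n_cols, itemsize=4):
    """Size in bytes of a stack of radios (itemsize=4 for float32)."""
    return n_radios * n_rows * n_cols * itemsize


def peak_bytes_with_output(n_radios, n_rows, n_cols, itemsize=4):
    """Approach (1): input stack plus a full-size output stack."""
    return 2 * stack_nbytes(n_radios, n_rows, n_cols, itemsize)


def peak_bytes_forced_inplace(n_radios, n_rows, n_cols, itemsize=4):
    """Approach (2): input stack plus a one-radio temporary (assumption)."""
    return (stack_nbytes(n_radios, n_rows, n_cols, itemsize)
            + stack_nbytes(1, n_rows, n_cols, itemsize))


MiB = 2**20
print(peak_bytes_with_output(500, 2048, 2048) // MiB)     # 16000
print(peak_bytes_forced_inplace(500, 2048, 2048) // MiB)  # 8016
```

Under these assumptions, approach (1) needs about twice the device memory of approach (2), at the cost of one extra device-to-device copy per radio for approach (2).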

Since memory is the critical resource on GPUs when processing large-scale data, approach (2) should be preferred. However, approach (1) is easy to implement. Therefore, both approaches should be implemented in all pre-processing classes.

Edited Dec 09, 2019 by Pierre Paleo