Spring cleaning: refactoring, API changes, consistency checks

Nabu has gained features at a steady pace since the beginning. However, some of the design choices made along the way turned out not to be the best. This issue lists the things that can be improved.

The ultimate goal is a module whose architecture is easy to understand: external people should be able to "dive" into the code base without too much hassle.

This "spring cleaning" WILL result in API changes, although we should try to minimize their impact. It is better done now than later, with a larger user base.

Modules to be renamed/moved

Currently the processing pipelines (FullField, FullRadios, ...) are put in nabu.app. The "app" name might suggest an unstable API, which we would like to avoid. nabu.pipeline would be better. (This name was avoided in the past to make it clear that no generic pipeline engine would be implemented).
nabu.distributed is to be removed
nabu.cuda contains general-purpose processing (GPP) modules like convolution and medfilt. However, nabu.misc contains unsharp_cuda, which is also a GPP module. So either we put all the GPP CUDA modules in nabu.cuda, or we keep only the CUDA-specific helpers (kernels, etc.) there.
nabu.io.reader and nabu.io.writer might be better split into specific files for clarity (e.g. nabu.io.edf_reader, nabu.io.nxwriter), with all of their contents still accessible from nabu.io.reader and nabu.io.writer.
nabu.preproc.phase only contains PaganinPhaseRetrieval, while CTF has its own file. So []Paganin classes might better be put in dedicated files, and nabu.preproc.phase would "redirect" to the correct classes.
nabu.preproc.ccd might be too generic
nabu.resources looks like a "misc" module. It contains too many modules with different purposes. Its scope should be re-defined. For example nabu.resources.nxflatfield should be moved to nabu.io.
The CLI tools should not be in nabu.resources but in nabu.app
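
The "redirect" idea mentioned in the list above (old module path still importable after a move) could be sketched with a module-level `__getattr__` (PEP 562). This is only an illustration, not how nabu does it: the module names `demo_app` and `demo_pipeline`, the class `FullFieldPipeline`, and the choice to emit a `DeprecationWarning` are all assumptions made for the example.

```python
import types
import warnings

# Stand-in for the NEW location (think "nabu.pipeline"); hypothetical content.
pipeline_mod = types.ModuleType("demo_pipeline")


class FullFieldPipeline:
    """Placeholder for a processing pipeline class."""


pipeline_mod.FullFieldPipeline = FullFieldPipeline

# Stand-in for the OLD location (think "nabu.app"), kept as a thin redirect.
app_mod = types.ModuleType("demo_app")


def _redirect(name):
    # PEP 562: called only when `name` is not found in the old module itself.
    if hasattr(pipeline_mod, name):
        warnings.warn(
            f"demo_app.{name} has moved to demo_pipeline.{name}",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(pipeline_mod, name)
    raise AttributeError(f"module 'demo_app' has no attribute {name!r}")


# In a real module file this would simply be `def __getattr__(name): ...`
app_mod.__getattr__ = _redirect
```

With such a shim, existing user code that accesses `demo_app.FullFieldPipeline` keeps working and gets a warning pointing to the new path, which would limit the impact of the renames.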

Classes/inheritance/constraints

Don't add useless constraints in the API.

In the beginning, main classes like CCDProcessing and SinoProcessing were created. They were supposed to provide a clear architecture and avoid boilerplate code. However, it turns out that these classes cause trouble:

Multiple inheritance + "diamond problem". Example: CudaSinoProcessing and SinoDeringer both inherit from SinoProcessing, and CudaSinoDeringer should inherit from both of them.
There is actually almost no "boilerplate" code that can be factored into a general-purpose parent class.
Too many constraints due to the API. For example, the shape might not have to be fixed at class instantiation.

The classes CCDProcessing and SinoProcessing are bound to be removed. In fact, few classes inherit from them, and none (?) does so in a meaningful way.
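
The diamond described above can be reproduced in a few lines. The class names come from this issue, but the constructor signatures (`sinos_shape`, `sigma`, `cuda_options`) and the `init_calls` counter are hypothetical, added only to make the problem visible. It is a sketch of the inheritance shape, not of the actual nabu classes.

```python
class SinoProcessing:
    def __init__(self, sinos_shape):
        # Count how many times the base initializer runs (illustration only).
        self.init_calls = getattr(self, "init_calls", 0) + 1
        self.sinos_shape = sinos_shape


class CudaSinoProcessing(SinoProcessing):
    def __init__(self, sinos_shape, cuda_options=None):
        super().__init__(sinos_shape)
        self.cuda_options = cuda_options or {}


class SinoDeringer(SinoProcessing):
    def __init__(self, sinos_shape, sigma=1.0):
        super().__init__(sinos_shape)
        self.sigma = sigma


class CudaSinoDeringer(CudaSinoProcessing, SinoDeringer):
    def __init__(self, sinos_shape, sigma=1.0, cuda_options=None):
        # The parents take different arguments, so each one is called
        # explicitly. But super() inside CudaSinoProcessing already follows
        # the MRO to SinoDeringer, so the shared base is initialized twice.
        CudaSinoProcessing.__init__(self, sinos_shape, cuda_options=cuda_options)
        SinoDeringer.__init__(self, sinos_shape, sigma=sigma)


deringer = CudaSinoDeringer((100, 500, 2048), sigma=2.0)
mro_names = [cls.__name__ for cls in CudaSinoDeringer.__mro__]
```

Here `mro_names` shows the linearized order (`CudaSinoDeringer`, `CudaSinoProcessing`, `SinoDeringer`, `SinoProcessing`, `object`), and `deringer.init_calls` shows that `SinoProcessing.__init__` ran more than once. Making this cooperative requires every class in the hierarchy to agree on a common `__init__` signature, which is exactly the kind of API constraint this issue wants to avoid.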

Building blocks (processing classes and functions) should primarily act on arrays

Data processing and data reading must be decoupled.
For example, in FlatField, it makes no sense to pass a DataUrl object to a processing class.
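
As a rough sketch of what "acting on arrays" could look like, the function below applies the textbook flat-field normalization `(radio - dark) / (flat - dark)` to NumPy arrays that the caller has already loaded. The function name `flatfield_normalize`, its parameters, and the `eps` guard are assumptions for this example; nabu's FlatField class may work differently (for instance by interpolating between several flats).

```python
import numpy as np


def flatfield_normalize(radios, flat, dark, eps=1e-6):
    """Return (radios - dark) / (flat - dark), computed in float32.

    `radios` may be a single 2D image or a 3D stack; `flat` and `dark`
    are 2D images that broadcast over the leading axis. No file access
    happens here: reading the data is entirely the caller's job.
    """
    radios = np.asarray(radios, dtype=np.float32)
    flat = np.asarray(flat, dtype=np.float32)
    dark = np.asarray(dark, dtype=np.float32)
    # Assumption: flat > dark everywhere; clamp to avoid dividing by zero.
    denominator = np.maximum(flat - dark, eps)
    return (radios - dark) / denominator


# Usage: the arrays could come from any reader (EDF, HDF5, ...).
dark = np.full((2, 2), 10.0)
flat = np.full((2, 2), 110.0)
stack = np.full((3, 2, 2), 60.0)

normalized_stack = flatfield_normalize(stack, flat, dark)
normalized_single = flatfield_normalize(stack[0], flat, dark)
```

Because the function takes plain arrays, it does not need to know the data shape in advance, and it can be tested without any file on disk.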

`nabu.resources` module

This module will be split into two modules with different scopes.

`nabu.resources` (or `nabu.utils`)

Purpose: provide various utility tools: logger, gpu, machinesdb, ...

`nabu.parsing` or `nabu.pipeline.XXX`

Purpose: make the link between a (config_file, dataset) pair and the processing pipeline.

The principle is:

Nabu ingests a user configuration (file) and a dataset
This module generates an internal pipeline description (processing steps and options)
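
The two steps above might look like the following sketch. Everything in it is hypothetical: the configuration sections and keys (`[preproc] flatfield`, `[phase] method`), the `DatasetInfo` and `ProcessingStep` classes, and the function `build_pipeline_description` are invented for illustration and do not describe nabu's actual configuration format or internals.

```python
import configparser
from dataclasses import dataclass, field


@dataclass
class DatasetInfo:
    """Minimal stand-in for what is known about a dataset (hypothetical)."""
    n_angles: int
    radio_shape: tuple


@dataclass
class ProcessingStep:
    """One entry of the internal pipeline description (hypothetical)."""
    name: str
    options: dict = field(default_factory=dict)


def build_pipeline_description(config_text, dataset):
    """Turn a (configuration, dataset) pair into an ordered list of steps."""
    cfg = configparser.ConfigParser()
    cfg.read_string(config_text)

    steps = []
    if cfg.getboolean("preproc", "flatfield", fallback=False):
        steps.append(ProcessingStep("flatfield"))
    phase_method = cfg.get("phase", "method", fallback="none").lower()
    if phase_method != "none":
        options = {"method": phase_method, "shape": dataset.radio_shape}
        steps.append(ProcessingStep("phase", options))
    # Reconstruction is always the last step in this toy example.
    steps.append(ProcessingStep("reconstruction", {"n_angles": dataset.n_angles}))
    return steps


EXAMPLE_CONFIG = """
[preproc]
flatfield = 1

[phase]
method = paganin
"""

example_dataset = DatasetInfo(n_angles=1800, radio_shape=(2048, 2048))
example_steps = build_pipeline_description(EXAMPLE_CONFIG, example_dataset)
step_names = [step.name for step in example_steps]
```

The point of the design is that the resulting list is plain data: the processing classes never see the configuration file, and the parsing code never touches arrays.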

A dedicated page in the documentation will be written on these steps (see comment below).

Edited Mar 11, 2021 by Pierre Paleo