Spring cleaning: refactoring, API changes, consistency checks
Nabu has gained features at a steady pace since the beginning. However, some of the choices made along the way turned out not to be the best. This issue lists the things that can be improved.
The ultimate goal is a module whose architecture is easy to understand: external people should be able to "dive" into the code base without too much hassle.
This "spring cleaning" WILL result in API changes, although we should try to minimize the impact. It is best done now rather than later, with a larger user base.
Modules to be renamed/moved
- Currently the processing pipelines (FullField, FullRadios, ...) are put in nabu.app. The "app" name might suggest an unstable API, which we would like to avoid. nabu.pipeline would be better. (This name was avoided in the past to make it clear that no generic pipeline engine would be implemented.)
- nabu.distributed is to be removed.
- nabu.cuda contains general-purpose processing (GPP) modules like convolution and medfilt. However, nabu.misc contains unsharp_cuda, which is also a GPP module. So either we put all the GPP cuda modules in nabu.cuda, or we only put in cuda the specific helpers (kernel, etc).
- nabu.io.reader and nabu.io.writer might be better split into specific files for clarity (ex. nabu.io.edf_reader, nabu.io.nxwriter), of course all of them being accessible from nabu.io.reader and nabu.io.writer.
- nabu.preproc.phase only contains PaganinPhaseRetrieval, while CTF has its own file. So []Paganin classes might better be put in dedicated files, and nabu.preproc.phase would "redirect" to the correct classes.
- nabu.preproc.ccd might be too generic.
- nabu.resources looks like a "misc" module. It contains too many modules with different purposes. Its scope should be re-defined. For example, nabu.resources.nxflatfield should be moved to nabu.io.
- The CLI tools should not be in nabu.resources but in nabu.app.
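To limit the impact of these renames, an old module path could keep working for a while and "redirect" to the new one with a deprecation warning. The following is only a minimal sketch of that idea, not something Nabu currently does: the helper install_deprecated_alias and the module names demo_app and demo_pipeline are hypothetical stand-ins (for something like nabu.app and nabu.pipeline), and it relies on module-level `__getattr__` (PEP 562, Python 3.7+).

```python
import importlib
import sys
import types
import warnings


def install_deprecated_alias(old_name, new_name):
    """Make `import old_name` keep working by forwarding to `new_name`."""
    alias = types.ModuleType(old_name)

    def _forward(attr):
        # Only called for attributes that are missing from the alias module
        target = importlib.import_module(new_name)
        value = getattr(target, attr)  # AttributeError propagates unchanged
        warnings.warn(
            f"{old_name} is deprecated, use {new_name} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return value

    # A function stored as `__getattr__` in a module namespace is used as
    # the fallback for attribute lookups on that module (PEP 562)
    alias.__getattr__ = _forward
    sys.modules[old_name] = alias
    return alias


# Hypothetical stand-ins for the new and the old module locations
new_module = types.ModuleType("demo_pipeline")
new_module.FullFieldPipeline = type("FullFieldPipeline", (), {})
sys.modules["demo_pipeline"] = new_module
install_deprecated_alias("demo_app", "demo_pipeline")

import demo_app  # resolved from sys.modules, no file on disk is needed

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pipeline_cls = demo_app.FullFieldPipeline  # old path, still works
```

In a real package the alias would more likely be a small file at the old location that imports from the new one; the dynamic version above just keeps the sketch in a single runnable block.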
Classes/inheritance/constraints
Don't add useless constraints to the API.
In the beginning, main classes like CCDProcessing and SinoProcessing were created. They were supposed to provide a clear architecture and avoid boilerplate code. However, it turns out that these classes cause trouble:
- Multiple inheritance + "diamond problem". Example: CudaSinoProcessing and SinoDeringer both inherit from SinoProcessing, and CudaSinoDeringer should inherit from both of them.
- There is actually almost no "boilerplate" code that can be factored into a general-purpose parent class.
- Too many constraints due to the API. For example, the shape might not have to be fixed at class instantiation.
The classes CCDProcessing and SinoProcessing are bound to be removed. Actually few classes inherit from them, and none (?) do in a meaningful way.
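To make the diamond concrete, here is a minimal sketch using empty skeletons that only borrow the class names from the example above; the bodies are hypothetical and are not Nabu's actual code. It also shows one possible alternative, a standalone class (hypothetical name StandaloneDeringer) with no common parent that reads the shape from each input. The column-mean subtraction is a toy placeholder, not Nabu's deringer algorithm.

```python
class SinoProcessing:
    """Skeleton of a common parent that fixes the shape at instantiation."""

    def __init__(self, sinos_shape):
        self.sinos_shape = tuple(sinos_shape)


class CudaSinoProcessing(SinoProcessing):
    """Skeleton of the GPU flavour of the parent class."""


class SinoDeringer(SinoProcessing):
    """Skeleton of the CPU deringer."""


class CudaSinoDeringer(CudaSinoProcessing, SinoDeringer):
    """Diamond: SinoProcessing is reachable through both parents."""


# Python linearizes the diamond, so the order of the two parents decides
# which implementation wins whenever both override the same method
mro_names = [cls.__name__ for cls in CudaSinoDeringer.__mro__]


class StandaloneDeringer:
    """Alternative without a common parent: the shape comes from the input."""

    def remove_rings(self, sino):
        # Toy placeholder: subtract from each column its mean over all rows
        n_rows = len(sino)
        col_means = [sum(col) / n_rows for col in zip(*sino)]
        return [[v - m for v, m in zip(row, col_means)] for row in sino]


deringer = StandaloneDeringer()
small = deringer.remove_rings([[1.0, 2.0], [3.0, 2.0]])
wide = deringer.remove_rings([[1.0, 2.0, 4.0], [3.0, 2.0, 0.0]])
```

The same StandaloneDeringer instance handles both inputs even though their shapes differ, which the shape-at-instantiation constraint would forbid.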
Building blocks (processing classes and functions) should primarily act on arrays
Data processing and data reading must be decoupled.
For example, in FlatField, it makes no sense to pass a DataUrl object to a processing class.
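A minimal sketch of what this decoupling could look like, under stated assumptions: the function names flatfield_normalize and normalize_from_reader are hypothetical, the formula is the standard flat-field correction (radio - dark) / (flat - dark) rather than a copy of what FlatField does, and nested lists stand in for arrays so that the block has no dependencies.

```python
def flatfield_normalize(radio, flat, dark):
    """Processing layer: return (radio - dark) / (flat - dark) per pixel.

    Only takes array-like data (here, lists of rows); knows nothing
    about files, URLs or datasets.
    """
    return [
        [(r - d) / (f - d) for r, f, d in zip(radio_row, flat_row, dark_row)]
        for radio_row, flat_row, dark_row in zip(radio, flat, dark)
    ]


def normalize_from_reader(read_image, radio_key, flat_key, dark_key):
    """I/O layer: fetch the data, then delegate to the array-only function.

    `read_image` is any callable that returns an image for a given key.
    """
    return flatfield_normalize(
        read_image(radio_key), read_image(flat_key), read_image(dark_key)
    )


# Stand-in for a file reader: images kept in a dictionary
fake_storage = {
    "radio": [[3.0, 5.0]],
    "flat": [[5.0, 9.0]],
    "dark": [[1.0, 1.0]],
}
normalized = normalize_from_reader(fake_storage.get, "radio", "flat", "dark")
```

With this split, the processing function can be tested and reused on in-memory data, and swapping the file format only touches the reader.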
nabu.resources module
This module will be split into two modules with different scopes.
nabu.resources (or nabu.utils)
Purpose: provide various utility tools: logger, gpu, machinesdb, ...
nabu.parsing or nabu.pipeline.XXX
Purpose: make the link between a (config_file, dataset) pair and the processing pipeline.
The principle is:
- Nabu ingests a user configuration (file) and a dataset
- This module generates an internal pipeline description (processing steps and options)
A dedicated page in the documentation will be written on these steps (see comment below).
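The two-step principle above could be sketched as follows. Everything in this block is an assumption made for illustration: the section and key names ([preproc] flatfield, [phase] method), the dataset fields (n_flats, n_angles), the ProcessingStep container and the function build_pipeline_description are all hypothetical and do not describe Nabu's actual configuration format or API.

```python
import configparser
from dataclasses import dataclass, field


@dataclass
class ProcessingStep:
    """One entry of the internal pipeline description."""

    name: str
    options: dict = field(default_factory=dict)


def build_pipeline_description(config_text, dataset_info):
    """Turn a (config, dataset) pair into an ordered list of steps."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)

    steps = []
    # Hypothetical key: enable flat-field normalization
    if parser.getboolean("preproc", "flatfield", fallback=False):
        steps.append(
            ProcessingStep("flatfield", {"n_flats": dataset_info["n_flats"]})
        )
    # Hypothetical key: phase retrieval method, "none" disables the step
    method = parser.get("phase", "method", fallback="none").lower()
    if method != "none":
        steps.append(ProcessingStep("phase", {"method": method}))
    # In this sketch, reconstruction is always the last step
    steps.append(
        ProcessingStep("reconstruction", {"n_angles": dataset_info["n_angles"]})
    )
    return steps


example_config = """
[preproc]
flatfield = 1

[phase]
method = Paganin
"""
example_dataset = {"n_flats": 2, "n_angles": 1800}
description = build_pipeline_description(example_config, example_dataset)
```

The pipeline itself would then only consume the list of steps, without ever looking at the configuration file or at the dataset layout.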