"dry run" pipeline
It would be nice to have a "dry run" pipeline implementation.
The benefits would be:
- Faster prototyping: no need to load data and send it through all the processing classes. Each individual processing class is normally unit-tested, so the "cabling" is usually the most tedious part.
- The dry-run pipeline would define a contract/blueprint for the actual pipelines.
- On the user side: a trace of what the processing would do (memory used, output files, etc.).
There are two possible approaches:
Approach 1: Subclass each pipeline

BasePipeline
    ChunkedPipeline
        CudaChunked
        DryRunChunked
    GroupedPipeline
        CudaGrouped
        DryRunGrouped
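Approach 1 could look like the following minimal sketch. All names here (`ChunkedPipeline`, `process_chunk`, the `(name, callable)` step pairs) are hypothetical, not the actual API: the dry-run subclass re-uses the parent's cabling but overrides execution to only trace the steps.

```python
class ChunkedPipeline:
    """Minimal sketch of a chunked pipeline (hypothetical interface)."""

    def __init__(self, steps):
        # steps: list of (name, callable) pairs - the "cabling"
        self.steps = steps

    def process_chunk(self, chunk):
        # Actual backend: run each processing step on the chunk
        for _name, step in self.steps:
            chunk = step(chunk)
        return chunk


class DryRunChunkedPipeline(ChunkedPipeline):
    """Subclass that keeps the cabling but never touches data."""

    def process_chunk(self, chunk):
        # No data is loaded or processed: only report what would happen
        return [f"would run {name}" for name, _step in self.steps]
```

The drawback, visible even in this sketch, is one extra subclass per pipeline flavor.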
Approach 2: Each pipeline class is dry-run by default; only subclasses implement a backend

BasePipeline
    ChunkedPipeline <- dry run
        NumpyChunked <- our current ChunkedPipeline
        CudaChunked
    GroupedPipeline <- dry run
        NumpyGrouped <- our current GroupedPipeline
        CudaGrouped
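Approach 2 could be sketched as follows; again, all names (`_execute_step`, the `(name, callable)` step pairs) are hypothetical. The base class records what it would do, and backend subclasses only override the execution hook, so no dedicated dry-run class is needed.

```python
class GroupedPipeline:
    """Sketch of a pipeline that is dry-run by default (hypothetical names)."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, callable) pairs
        self.trace = []

    def _execute_step(self, name, step, data):
        # Base class has no backend: just record the planned operation
        self.trace.append(f"would run {name}")
        return data

    def process(self, data=None):
        for name, step in self.steps:
            data = self._execute_step(name, step, data)
        return data


class NumpyGroupedPipeline(GroupedPipeline):
    """Backend subclass: overrides only the execution hook."""

    def _execute_step(self, name, step, data):
        return step(data)
```

With this layout the class count stays the same as today; the dry run comes for free from the base class.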
Edited by Pierre Paleo